HYDERABAD CAMPUS
FIRST SEMESTER 2016-2017
INFORMATION RETRIEVAL (CS F469) - COMPREHENSIVE MAKEUP EXAM
A. Construct the phrase alignment matrix with English words as rows and Hindi words as
columns. [4 M]
B. Assuming that the alignment matrix from question A is the intersection of P(f|e) and P(e|f),
identify whether the following phrase pairs are consistent with the alignment [2 M]
i.
ii. ( )
C. Which is the longest phrase pair that is consistent with the alignment? [2 M]
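The consistency test used in questions B and C can be sketched as follows. This is a minimal illustration, assuming the usual definition: a phrase pair is consistent if no alignment point links a word inside the pair to a word outside it, and at least one alignment point lies inside. The function name `is_consistent` and the 3x3 `toy_alignment` are hypothetical and are not the exam's sentence pair.

```python
def is_consistent(alignment, e_span, f_span):
    """Check a phrase pair against a word alignment.

    alignment: set of (english_index, hindi_index) alignment points.
    e_span, f_span: inclusive (start, end) index ranges of the phrase pair.
    """
    e_lo, e_hi = e_span
    f_lo, f_hi = f_span
    has_point_inside = False
    for e, f in alignment:
        e_in = e_lo <= e <= e_hi
        f_in = f_lo <= f <= f_hi
        if e_in != f_in:
            # An alignment point crosses the phrase boundary.
            return False
        if e_in and f_in:
            has_point_inside = True
    return has_point_inside


# Hypothetical alignment: word 0 <-> 0, word 1 <-> 2, word 2 <-> 1.
toy_alignment = {(0, 0), (1, 2), (2, 1)}

print(is_consistent(toy_alignment, (0, 0), (0, 0)))
print(is_consistent(toy_alignment, (1, 2), (1, 2)))
print(is_consistent(toy_alignment, (1, 1), (1, 1)))
```

The longest consistent pair can then be found by running the same check over all candidate spans and keeping the largest one that passes.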
D. Compute the reordering distance between the following 2 phrase pairs given in question B.
[4 M]
E. In the mathematical model of phrase-based translation, why is the reordering distance not
used directly, but an exponentially decaying cost function
$d(x) = \alpha^{|\text{start}_i - \text{end}_{i-1} - 1|}$ is used instead? [2 M]
F. In the exponentially decaying cost function
$d(x) = \alpha^{|\text{start}_i - \text{end}_{i-1} - 1|}$, what should the value of $\alpha$ be if
the movement of phrases has to be penalized? [2 M]
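The reordering distance and the decaying cost in questions D to F can be illustrated with a short sketch. It assumes the cost has the form alpha ** |start_i - end_{i-1} - 1| with alpha in (0, 1], and that the position before the first foreign word counts as 0. The function names and the four toy phrase spans are hypothetical, not the exam's phrase pairs.

```python
def reordering_distance(start_i, end_prev):
    """Distance start_i - end_{i-1} - 1 between consecutive translated phrases."""
    return start_i - end_prev - 1


def distortion_cost(start_i, end_prev, alpha=0.5):
    """Exponentially decaying cost alpha ** |start_i - end_{i-1} - 1|."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1]")
    return alpha ** abs(reordering_distance(start_i, end_prev))


# Hypothetical foreign spans (start, end), listed in the order they are translated.
spans = [(1, 3), (6, 6), (4, 5), (7, 7)]

distances = []
end_prev = 0  # assumed: position before the first foreign word
for start, end in spans:
    distances.append(reordering_distance(start, end_prev))
    end_prev = end

print(distances)
print([distortion_cost(s, e) for s, e in [(1, 0), (6, 3), (4, 6)]])
```

With alpha = 1 every move costs the same, while any alpha below 1 makes the score shrink as the absolute distance grows.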
G. If a spurious phrase pair occurs only once in the whole parallel corpus, what will be the
values of φ(f,e) and φ(e,f)? [1 M]
H. If a spurious phrase translation pair occurs only once, how will you compute the phrase
translation values? Show with the help of an example. [3 M]
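The effect behind questions G and H can be seen in a small sketch. It assumes the phrase translation value is estimated by relative frequency, count(e, f) divided by the total count of e over all its translations. The function name and the toy phrase pairs (including the spurious pair "zzz"/"qux") are hypothetical.

```python
from collections import Counter


def phrase_translation_table(pairs):
    """Relative-frequency estimate of phi(f | e) from extracted (e, f) phrase pairs."""
    pair_counts = Counter(pairs)
    e_counts = Counter(e for e, _ in pairs)
    return {(f, e): count / e_counts[e] for (e, f), count in pair_counts.items()}


# Hypothetical extracted phrase pairs; ("zzz", "qux") is a spurious pair seen once.
extracted = [
    ("house", "ghar"),
    ("house", "ghar"),
    ("house", "ghar"),
    ("house", "makaan"),
    ("zzz", "qux"),
]

table = phrase_translation_table(extracted)
for (f, e), value in sorted(table.items()):
    print(f"phi({f} | {e}) = {value:.2f}")
```

Because "zzz" occurs with only one translation, the relative-frequency estimate for the spurious pair is as high as it can be, which is why such one-off estimates are usually treated with caution.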
E. You are hired by Bing to work on its search engine to use the concept of collaborative filtering to
recommend documents to a query. [Hint: Here the query is considered as an active user to
whom you will recommend items]. Answer questions i-iv [1+1+2+6 = 10 M]
i. What do you mean by neighboring users in this scenario?
ii. What do you mean by the items in this scenario?
iii. What is the rating in this scenario?
iv. Briefly sketch the algorithm, preferably with some formulas. Assume that r(Q, D) is a retrieval
function that can give you a positive similarity value for any query and document.
[Hint: map the given problem to the user-item matrix and find analogies to the problem]
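One possible mapping, sketched under stated assumptions rather than as a model answer: queries play the role of users, documents the role of items, and r(Q, D) the role of the rating. Neighbouring queries are past queries whose score vectors are most similar to the active query's, and a document's predicted score is the similarity-weighted average of its scores under those neighbours. The names `cosine`, `recommend`, and all data below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {document: r(Q, D)} vectors."""
    shared = set(u) & set(v)
    dot = sum(u[d] * v[d] for d in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(active, past, k=2):
    """Score unseen documents for the active query from its k nearest past queries."""
    neighbours = sorted(
        ((cosine(active, scores), q) for q, scores in past.items()), reverse=True
    )[:k]
    weighted, norm = {}, {}
    for sim, q in neighbours:
        if sim <= 0:
            continue  # ignore queries with no overlap
        for doc, score in past[q].items():
            if doc in active:
                continue  # already retrieved for the active query
            weighted[doc] = weighted.get(doc, 0.0) + sim * score
            norm[doc] = norm.get(doc, 0.0) + sim
    return sorted(((weighted[d] / norm[d], d) for d in weighted), reverse=True)


# Hypothetical r(Q, D) values for two past queries and the active query.
past_queries = {
    "q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7},
    "q2": {"d4": 0.9, "d5": 0.6},
}
active_query = {"d1": 0.8, "d2": 0.9}

recommendations = recommend(active_query, past_queries)
print(recommendations)
```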
F. Given the following SVD for M, where the columns of M are the ratings of Matrix, Alien, Star
Wars, Casablanca and Titanic, answer questions i to v.
Suppose Leslie assigns rating 3 to Alien and rating 4 to Titanic. [2+2+2+3+3=12 M]
i. Show how we can represent Leslie as a vector.
ii. What is her representation in movie space?
iii. Find the representation of Leslie in concept space.
iv. What does that representation predict about how well Leslie would like the other movies
appearing in our example data?
v. How would you guess the movies a person would most like? How would you use a similar
technique to guess the people that would most like a given movie, if all you had were the
ratings of that movie by a few people?
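The mapping between movie space and concept space can be sketched as below. Since the SVD itself is not reproduced here, the matrix `V_T` is a hypothetical stand-in with two concepts whose rows are orthonormal. The movie-space vector follows from the ratings stated in the question (3 for Alien, 4 for Titanic, 0 elsewhere). The function names are assumptions.

```python
import math

# Hypothetical V^T (concepts x movies), NOT the exam's SVD.
# Column order: Matrix, Alien, Star Wars, Casablanca, Titanic.
s3 = 1 / math.sqrt(3)
s2 = 1 / math.sqrt(2)
V_T = [
    [s3, s3, s3, 0.0, 0.0],  # assumed first concept
    [0.0, 0.0, 0.0, s2, s2],  # assumed second concept
]

# Leslie in movie space: rating 3 for Alien, 4 for Titanic.
leslie = [0, 3, 0, 0, 4]


def to_concept_space(ratings, v_t):
    """Project a movie-space rating vector onto each concept (ratings times V)."""
    return [sum(r * w for r, w in zip(ratings, row)) for row in v_t]


def to_movie_space(concepts, v_t):
    """Map a concept-space vector back to movie space (concepts times V^T)."""
    n_movies = len(v_t[0])
    return [sum(c * row[j] for c, row in zip(concepts, v_t)) for j in range(n_movies)]


concept_vector = to_concept_space(leslie, V_T)
predicted = to_movie_space(concept_vector, V_T)
print(concept_vector)
print(predicted)
```

Mapping back to movie space gives a score for every movie, including the ones Leslie has not rated. The same idea applied with U instead of V would relate a movie's few known ratings to people.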
Q6. Link Analysis
For the web graph given in Figure 1, answer questions A-F. [4+4+4+2+3=17 M]
A. Write the flow equations for calculating the PageRank of all the pages in the web graph.
B. Using the power iteration method, what will be the PageRank of all the pages after
2 iterations?
C. Show the transition matrix A that will be used by the PageRank algorithm, assuming that
with probability β a random surfer will follow the links on the current page, and with
probability (1 - β) he/she will transition to any of the (three) pages with uniform
probability, where β is set to 0.5.
Figure 1
D. Suppose we set β to 0. What will be the PageRanks associated with the three pages?
[Note: You need not compute the PageRank; just a 2-line justification is expected]
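The construction in questions B to D can be sketched on a toy graph. Figure 1 is not reproduced here, so the three-page graph below is hypothetical. The link-following probability is written `beta` in the code (an assumed name), and the matrix is built column-stochastically as beta * M + (1 - beta) / n, assuming the graph has no dead ends.

```python
def transition_matrix(links, beta):
    """Build A = beta * M + (1 - beta) / n, with columns indexed by source page."""
    pages = sorted(links)
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[(1 - beta) / n] * n for _ in range(n)]
    for src, outs in links.items():
        for dst in outs:
            matrix[index[dst]][index[src]] += beta / len(outs)
    return pages, matrix


def power_iterate(matrix, steps):
    """Start from the uniform vector and apply r <- A r for the given number of steps."""
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(steps):
        rank = [sum(matrix[i][j] * rank[j] for j in range(n)) for i in range(n)]
    return rank


# Hypothetical web graph (not Figure 1): a -> b, a -> c, b -> c, c -> a.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

pages, A_half = transition_matrix(toy_links, beta=0.5)
print(pages, power_iterate(A_half, steps=2))

_, A_zero = transition_matrix(toy_links, beta=0.0)
print(power_iterate(A_zero, steps=2))
```

With the link-following probability at 0, every entry of the matrix equals 1/n, so the link structure no longer influences the result.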
E. Show the working of the HITS algorithm in vector notation for two iterations on the web
graph in Figure 1.
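The vector form of the HITS updates can be sketched as follows, assuming the common formulation a = L^T h followed by h = L a, with each vector rescaled so that its largest component is 1. The adjacency matrix is for a hypothetical three-page graph, not Figure 1, and the function names are assumptions.

```python
def scale(vector):
    """Rescale so the largest component is 1 (one common normalisation choice)."""
    largest = max(vector)
    return [x / largest for x in vector] if largest else vector


def hits(adj, iterations=2):
    """Run HITS on adjacency matrix adj, where adj[i][j] = 1 if page i links to page j."""
    n = len(adj)
    hubs = [1.0] * n
    authorities = [1.0] * n
    for _ in range(iterations):
        # a = L^T h: a page's authority sums the hub scores of pages linking to it.
        authorities = scale(
            [sum(adj[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        )
        # h = L a: a page's hub score sums the authority of pages it links to.
        hubs = scale(
            [sum(adj[i][j] * authorities[j] for j in range(n)) for i in range(n)]
        )
    return hubs, authorities


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a (rows and columns ordered a, b, c).
L = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

hub_scores, authority_scores = hits(L, iterations=2)
print(hub_scores)
print(authority_scores)
```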
F. Does the web graph in Figure 2 have spider traps and dead ends? [2 M]
Figure 2
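Both conditions can be checked mechanically on a small graph. The sketch below assumes a dead end is a page with no out-links, and a spider trap is a non-empty proper subset of pages in which every page has out-links and none of them leave the subset. The brute-force subset search only suits tiny graphs, and it also reports supersets of a trap when those are closed too. The graph is hypothetical, not Figure 2.

```python
from itertools import combinations


def dead_ends(links):
    """Pages with no out-links."""
    return sorted(page for page, outs in links.items() if not outs)


def spider_traps(links):
    """Proper subsets of pages that have out-links, none of which leave the subset."""
    pages = sorted(links)
    traps = []
    for size in range(1, len(pages)):
        for subset in combinations(pages, size):
            members = set(subset)
            if all(links[p] and set(links[p]) <= members for p in subset):
                traps.append(members)
    return traps


# Hypothetical graph: a -> b, a -> c, b -> b (self-loop), c has no out-links.
toy_graph = {"a": ["b", "c"], "b": ["b"], "c": []}

print(dead_ends(toy_graph))
print(spider_traps(toy_graph))
```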
Squared chord
C. Using the concepts learnt in this course, suggest an application of your choice that could be
useful and ease the life of the common man. Also show the architecture of your proposed system.
[3 M]