
Learning Ranking Functions with SVMs

CS4780 – Machine Learning
Fall 2009

Thorsten Joachims
Cornell University

T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the
ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.

Adaptive Search Engines

• Current Search Engines
– One-size-fits-all
– Hand-tuned retrieval function
• Hypothesis
– Different users need different retrieval functions
– Different collections need different retrieval functions
• Machine Learning
– Learn improved retrieval functions
– User Feedback as training data

Overview

• How can we get training data for learning improved retrieval functions?
– Explicit vs. implicit feedback
– Absolute vs. relative feedback
– User study with eye-tracking and relevance judgments
• What learning algorithms can use this training data?
– Ranking Support Vector Machine
– User study with meta-search engine

Sources of Feedback

• Explicit Feedback
– Overhead for user
– Only few users give feedback
=> not representative
• Implicit Feedback
– Queries, clicks, time, mousing, scrolling, etc.
– No Overhead
– More difficult to interpret

Feedback from Clickthrough Data

Example result list:
1. Kernel Machines
   http://svm.first.gmd.de/
2. Support Vector Machine
   http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines
   http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References
   http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ...
   http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet
   http://svm.research.bell-labs.com/SVT/SVMsvt.html
8. Royal Holloway Support Vector Machine
   http://svm.dcs.rhbnc.ac.uk

Relative Feedback: Clicks reflect preference between observed links.
   (3 < 2), (7 < 2), (7 < 4), (7 < 5), (7 < 6)

Absolute Feedback: The clicked links are relevant to the query.
   Rel(1), NotRel(2), Rel(3), NotRel(4), NotRel(5), NotRel(6), Rel(7)

User Study: Eye-Tracking and Relevance

• Scenario
– WWW search
– Google search engine
– Subjects were not restricted
– Answer 10 questions
• Eye-Tracking
– Record the sequence of eye movements
– Analyze how users scan the results page of Google
• Relevance Judgments
– Ask relevance judges to explicitly judge the relevance of all pages encountered
– Compare implicit feedback from clicks to explicit judgments

What is Eye-Tracking?

Eye tracking device: device to detect and record where and what people look at
– Fixations: ~200-300ms; information is acquired
– Saccades: extremely rapid movements between fixations
– Pupil dilation: size of pupil indicates interest, arousal

“Scanpath” output depicts pattern of movement throughout screen. Black markers represent fixations.

Conclusion: Viewing Behavior

• Users most frequently view two abstracts
• Users typically view results in order from top to bottom
• Users view links one and two more thoroughly and often
• Users click most frequently on link one
• Users typically do not look at links below before they click (except maybe the next link)

=> Design strategies for interpreting clickthrough data that respect these properties!

Are Clicks Absolute Relevance Judgments?

• Clicks depend not only on relevance of a link, but also
– On the position in which the link was presented
– The quality of the other links

=> Interpreting Clicks as absolute feedback extremely difficult!

Strategies for Generating Relative Feedback

Strategies
• “Click > Skip Above”
– (3>2), (5>2), (5>4)
• “Last Click > Skip Above”
– (5>2), (5>4)
• “Click > Earlier Click”
– (3>1), (5>1), (5>3)
• “Click > Skip Previous”
– (3>2), (5>4)
• “Click > Skip Next”
– (1>2), (3>4), (5>6)

Example result list:
1. Kernel Machines
   http://www.kernel-machines.org/
2. Support Vector Machine
   http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to SVMs
   http://www.support-vector.net/
5. Support Vector Machine and ...
   http://svm.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR...
   http://www.jisc.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet
   http://svm.bell-labs.com/SVMsvt.html
8. Royal Holloway SVM
   http://svm.dcs.rhbnc.ac.uk
9. SVM World
   http://www.svmworld.com
10. Fraunhofer FIRST SVM page
   http://svm.first.gmd.de
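
The five strategies can be sketched as small functions over a click log. This is a minimal illustration, not code from the study: it assumes clicks are given as 1-based result positions in chronological order, and that the example pairs on the slide correspond to clicks on links 1, 3, and 5 (an inference from the listed pairs). All function names are hypothetical. Each pair (a, b) reads "a is preferred over b".

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (preferred position, less preferred position)


def click_skip_above(clicks: List[int]) -> Set[Pair]:
    """Each clicked link beats every unclicked link ranked above it."""
    clicked = set(clicks)
    return {(c, s) for c in clicked for s in range(1, c) if s not in clicked}


def last_click_skip_above(clicks: List[int]) -> Set[Pair]:
    """Only the chronologically last click beats the unclicked links above it."""
    if not clicks:
        return set()
    clicked, last = set(clicks), clicks[-1]
    return {(last, s) for s in range(1, last) if s not in clicked}


def click_earlier_click(clicks: List[int]) -> Set[Pair]:
    """Each click beats every link that was clicked before it in time."""
    return {(later, earlier)
            for i, later in enumerate(clicks)
            for earlier in clicks[:i] if earlier != later}


def click_skip_previous(clicks: List[int]) -> Set[Pair]:
    """Each click beats the link directly above it, if that link was not clicked."""
    clicked = set(clicks)
    return {(c, c - 1) for c in clicked if c > 1 and (c - 1) not in clicked}


def click_skip_next(clicks: List[int], n_results: int) -> Set[Pair]:
    """Each click beats the link directly below it, if that link was not clicked."""
    clicked = set(clicks)
    return {(c, c + 1) for c in clicked
            if c + 1 <= n_results and (c + 1) not in clicked}


if __name__ == "__main__":
    clicks = [1, 3, 5]  # assumed click sequence for the slide's example
    print(sorted(click_skip_above(clicks)))
    print(sorted(click_skip_next(clicks, n_results=10)))
```

With the assumed clicks on 1, 3, and 5, each function yields the pairs listed next to its strategy above.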

Comparison with Explicit Feedback

[Figure: comparison of the feedback strategies with explicit relevance judgments]

=> All but “Click > Earlier Click” appear accurate

Learning Retrieval Functions from Pairwise Preferences

Idea: Learn a ranking function, so that number of violated
pair-wise training preferences is minimized.

Form of Ranking Function: sort by
   rsv(q,di) = w1 * (#of query words in title of di)
             + w2 * (#of query words in anchor)
             + …
             + wn * (page-rank of di)
             = w * Φ(q,di)

Training: Select w so that
   if user prefers di to dj for query q,
   then rsv(q, di) > rsv(q, dj)
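
A small sketch of the linear ranking function and of the quantity that training tries to minimize, the number of violated preferences. The three features, the weights, and the document ids are made up for illustration; only the form rsv = w * Φ(q, d) comes from the slide.

```python
import numpy as np


def rsv(w: np.ndarray, phi: np.ndarray) -> float:
    """Retrieval status value: dot product of weights and feature vector."""
    return float(np.dot(w, phi))


def rank(w: np.ndarray, features: dict) -> list:
    """Sort document ids by decreasing rsv."""
    return sorted(features, key=lambda d: rsv(w, features[d]), reverse=True)


def count_violations(w: np.ndarray, features: dict, preferences: list) -> int:
    """Count pairs (better, worse) where rsv(better) is not strictly larger."""
    return sum(
        1 for better, worse in preferences
        if rsv(w, features[better]) <= rsv(w, features[worse])
    )


# Hypothetical feature vectors for one query:
# [query words in title, query words in anchor, page-rank]
features = {
    "d1": np.array([2.0, 0.0, 0.1]),
    "d2": np.array([0.0, 1.0, 0.9]),
    "d3": np.array([1.0, 1.0, 0.5]),
}
w = np.array([1.0, 0.5, 0.2])            # hypothetical weights
preferences = [("d3", "d2"), ("d2", "d1")]  # (preferred, less preferred)

if __name__ == "__main__":
    print(rank(w, features))
    print(count_violations(w, features, preferences))
```

Here the weights score d1 highest, so the preference (d2 over d1) is violated while (d3 over d2) is satisfied; a learner would adjust w to reduce that count.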

Ranking Support Vector Machine

• Find ranking function with low error and large margin

[Figure: example with points labeled 1 to 4]

• Properties
– Convex quadratic program
– Non-linear functions using Kernels
– Implemented as part of SVM-light
– http://svmlight.joachims.org

Experiment

Meta-Search Engine “Striver”
– Implemented meta-search engine on top of Google,
MSNSearch, Altavista, Hotbot, Excite
– Retrieve top 100 results from each search engine
– Re-rank results with learned ranking functions
Experiment Setup
– User study on group of ~20 German machine learning
researchers and students
=> homogeneous group of users
– Asked users to use the system like any other search engine
– Train ranking SVM on 3 weeks of clickthrough data
– Test on 2 following weeks
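
To make the Ranking SVM training step concrete, here is a rough sketch for the linear case. It assumes the usual formulation from the cited KDD 2002 paper: each preference "di over dj" becomes a difference vector Φ(q,di) - Φ(q,dj), and w is chosen to minimize 0.5 * ||w||^2 + C * (sum of hinge losses on those differences). The optimizer below is plain subgradient descent chosen for brevity; it is not the quadratic-program solver in SVM-light, and the function name, step size, and toy data are all assumptions.

```python
import numpy as np


def train_ranking_svm(diffs, C: float = 1.0, epochs: int = 200, lr: float = 0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - w.diff)).

    diffs: array of shape (n_pairs, n_features); row k is
           phi(preferred doc) - phi(less preferred doc) for preference k.
    """
    diffs = np.asarray(diffs, dtype=float)
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        margins = diffs @ w
        violated = diffs[margins < 1.0]           # pairs inside the margin
        grad = w - C * violated.sum(axis=0)       # subgradient of the objective
        w -= lr * grad
    return w


# Toy data: three documents with two hypothetical features each.
phi = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([0.5, 0.5]),
}
prefs = [("a", "c"), ("c", "b"), ("a", "b")]      # (preferred, less preferred)
diffs = np.array([phi[i] - phi[j] for i, j in prefs])

if __name__ == "__main__":
    w = train_ranking_svm(diffs)
    print(w, diffs @ w)  # positive entries mean the preference is respected
```

Because every constraint is a dot product with a difference vector, any linear SVM solver can be reused on the transformed data, which is why the problem stays a convex quadratic program.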

Which Ranking Function is Better?

Ranking A:
1. Kernel Machines
   http://svm.first.gmd.de/
2. Support Vector Machine
   http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines
   http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...
   http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm_light/

Ranking B:
1. Kernel Machines
   http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm_light/
3. Support Vector Machine and Kernel ... References
   http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies: SVM demo applet
   http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine
   http://svm.dcs.rhbnc.ac.uk

(Labels in the figure: Google, Learned)

Combined ranking:
1. Kernel Machines
   http://svm.first.gmd.de/
2. Support Vector Machine
   http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines
   http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References
   http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ...
   http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet
   http://svm.research.bell-labs.com/SVT/SVMsvt.html
8. Royal Holloway Support Vector Machine
   http://svm.dcs.rhbnc.ac.uk

• Approach
– Experiment setup generating “unbiased” clicks for fair evaluation.
• Validity
– Clickthrough in combined ranking gives same results as explicit
feedback under mild assumptions [Joachims, 2003].

Results

Ranking A   Ranking B    A better   B better   Tie   Total
Learned     Google          29         13       27     69
Learned     MSNSearch       18          4        7     29
Learned     Toprank         21          9       11     41

Result:
– Learned > Google
– Learned > MSNSearch
– Learned > Toprank

Toprank: rank by increasing minimum rank over all 5 search engines
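
One way to build a combined ranking like the one in "Which Ranking Function is Better?" is to alternate between the two rankings and skip documents already shown, so that the top of the combined list always draws about equally from A and B. The sketch below is an illustration under that assumption, not necessarily the exact procedure used in the experiment; the function name and the short document ids are hypothetical, and continuing with the remaining list once one ranking is exhausted is a simplification.

```python
from typing import List


def combine_rankings(a: List[str], b: List[str], a_first: bool = True) -> List[str]:
    """Interleave rankings a and b, skipping documents already in the result."""
    combined: List[str] = []
    ka = kb = 0  # how far we have read into a and b
    while ka < len(a) or kb < len(b):
        # Take from a if b is exhausted, or if a is behind (ties go to a_first).
        take_a = kb >= len(b) or (
            ka < len(a) and (ka < kb or (ka == kb and a_first))
        )
        if take_a:
            doc, ka = a[ka], ka + 1
        else:
            doc, kb = b[kb], kb + 1
        if doc not in combined:
            combined.append(doc)
    return combined


# Short ids standing in for the result entries of Ranking A and Ranking B.
ranking_a = ["kernel-machines", "jbolivar", "intro", "archives", "svm-light"]
ranking_b = ["kernel-machines", "svm-light", "refs", "lucent", "royal-holloway"]

if __name__ == "__main__":
    for pos, doc in enumerate(combine_rankings(ranking_a, ranking_b), start=1):
        print(pos, doc)
```

On these two five-item lists the sketch reproduces the order of the eight-item combined ranking above. Clicks on the combined list can then be attributed to whichever ranking placed the clicked documents higher, which is the basis for the "A better / B better / Tie" counts.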

Learned Weights

Weight   Feature
 0.60    cosine between query and abstract
 0.48    ranked in top 10 from Google
 0.24    cosine between query and the words in the URL
 0.24    doc ranked at rank 1 by exactly one of the 5 engines
 ...
 0.22    host has the name “citeseer”
 …
 0.17    country code of URL is ".de"
 0.16    ranked top 1 by HotBot
 ...
-0.15    country code of URL is ".fi"
-0.17    length of URL in characters
-0.32    not ranked in top 10 by any of the 5 search engines
-0.38    not ranked top 1 by any of the 5 search engines

Conclusions

• Clickthrough data can provide accurate feedback
– Clickthrough provides relative instead of absolute judgments
• Ranking SVM can learn effectively from relative preferences
– Improved retrieval through personalization in meta search
• Current and future work
– Exploiting query chains
– Adapting intranet search for Cornell Library Web Collection and
Physics E-Print ArXiv
– Implementation of methods in Osmot Search Engine
– Robustness to “click-spam”
– Learning theory for interactive learning with preference
– Further user studies to get more operational model of user behavior
• Info and Papers
– http://www.joachims.org
