
Is More Always Better?

Dumais et al., Microsoft Research

A presentation by George Karlos

What is a Question Answering System ?

Combines information retrieval and natural language processing
Answers questions posed in natural language
Retrieves answers, not documents or passages

TREC Question Answering Track


Fact-based, short-answer questions
Who killed Abraham Lincoln? How tall is Mount Everest?

Dumais et al.'s System


Focuses on the same type of questions
Techniques are more broadly applicable

More Data

Other QA groups:

Variety of linguistic resources
(part-of-speech tagging, syntactic parsing, semantic relations, named entity extraction, dictionaries, WordNet, etc.)

Higher accuracy

Dumais et al.'s system:

Use the Web as the source

Web Redundancy!

Multiple, differently phrased occurrences of the answer

Small information source: hard to find answers

Likely only one answer exists
Complex relations between question and answer
(anaphora resolution, synonymy, alternate syntactic formulations, indirect answers: NLP)

Web: more likely to find answers in a simple relation to the question. Less likely to run into the difficulties of NLP systems.

Enables Simple Query Rewrites


More sources include the answer in a form simply related to the question.

e.g. Who killed Abraham Lincoln?

1. ______ killed Abraham Lincoln.
2. Abraham Lincoln was killed by ______
etc.
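As a rough sketch (not the authors' actual rewrite rules), a "Who &lt;verb&gt; &lt;object&gt;?" question can be turned into declarative answer patterns with simple string manipulation; the function and patterns below are hypothetical.

```python
# Minimal sketch of pattern-based query rewriting (hypothetical rules,
# not the exact rewrite rules used by Dumais et al.).
def rewrite_who_question(question):
    """Turn 'Who <verb> <object>?' into declarative answer patterns."""
    words = question.rstrip("?").split()
    if words[0].lower() != "who" or len(words) < 3:
        return []
    verb, rest = words[1], " ".join(words[2:])
    return [
        f"{verb} {rest}",         # "killed Abraham Lincoln"  (answer expected on the left)
        f"{rest} was {verb} by",  # "Abraham Lincoln was killed by"  (answer on the right)
    ]

print(rewrite_who_question("Who killed Abraham Lincoln?"))
# ['killed Abraham Lincoln', 'Abraham Lincoln was killed by']
```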

Facilitates Answer Mining


Improves answer extraction even if no obvious answer appears in any single snippet.

e.g. How many times did Bjorn Borg win Wimbledon?


1. Bjorn Borg <text> <text> Wimbledon <text> <text> 5
2. <text> Wimbledon <text> <text> <text> Bjorn Borg <text> 37
3. <text> <text> Bjorn Borg <text> <text> 5 <text> <text> Wimbledon
4. 5 <text> <text> Wimbleton <text> <text> Bjorn Borg

5 is the most frequent number, so it is most likely the correct answer.
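A minimal sketch of this frequency-based mining idea, using snippets that mirror the Bjorn Borg example above; the snippet strings and regular expression are illustrative stand-ins.

```python
# Sketch: pick the most frequent candidate number across retrieved snippets
# ("..." stands for surrounding text in each snippet).
from collections import Counter
import re

snippets = [
    "Bjorn Borg ... Wimbledon ... 5",
    "... Wimbledon ... Bjorn Borg ... 37",
    "... Bjorn Borg ... 5 ... Wimbledon",
    "5 ... Wimbleton ... Bjorn Borg",
]

counts = Counter(n for s in snippets for n in re.findall(r"\d+", s))
print(counts.most_common(1))  # [('5', 3)] -> 5 occurs most often, likely the correct answer
```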

Four main components:


Rewrite Query
N-gram Mining
N-gram Filtering/Reweighting
N-gram Tiling

Rewrite Query

Rewrite the query into a form in which a possible answer is likely to appear


e.g. When was Abraham Lincoln born?

Abraham Lincoln was born on <DATE>

7 categories of questions
A different set of rewrite rules per category
1-5 rewrite types each

Rewrite Query

Output: [string, L/R/-, weight]

string: the reformulated search query
L/R/-: the position (left, right, or either side) where the answer is expected
weight: how much an answer from this rewrite is preferred

High-precision query: "Abraham Lincoln was born on" is more likely to be correct

Lower-precision query: Abraham AND Lincoln AND born
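For concreteness, a hypothetical set of rewrite tuples for the birth-date example might look as follows; the exact strings and weights are illustrative, not the paper's actual rule output.

```python
# Sketch of [string, L/R/-, weight] rewrite tuples for
# "When was Abraham Lincoln born?" (strings and weights are illustrative only).
rewrites = [
    ('"Abraham Lincoln was born on"', "R", 5),   # high-precision phrase; answer expected to the right
    ('"was born on" "Abraham Lincoln"', "R", 3),
    ("Abraham AND Lincoln AND born", "-", 1),    # low-precision backoff AND query
]
for query, position, weight in rewrites:
    print(f"weight={weight}  answer_position={position}  query={query}")
```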

Rewrite Query

Simple string manipulations


Where is the Louvre Museum located?

The Louvre Museum is located in

1. W1 IS W2 W3 ... Wn
2. W1 W2 IS W3 ... Wn
etc.

More rewrites, but the proper rewrite is guaranteed to be found.
Using a parser could result in the proper rewrite not being found.

Final rewrite: ANDing of the non-stop words
Louvre Museum AND located
Stop words (in, the, etc.) are important indicators of likely answers
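A minimal sketch of these positional rewrites, assuming a question whose verb is "is"; the stop-word list and tokenization are simplified stand-ins for whatever the real system uses.

```python
# Sketch: generate rewrites by moving the verb through every position
# (W1 is W2 ... Wn, W1 W2 is W3 ... Wn, ...) plus a backoff AND rewrite.
STOP_WORDS = {"the", "is", "in", "where", "a", "an", "of"}  # illustrative list only

def positional_rewrites(question):
    words = question.rstrip("?").lower().split()
    if "is" not in words:
        return []
    rest = [w for w in words if w not in ("is", "where")]
    rewrites = [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(1, len(rest) + 1)]
    rewrites.append(" AND ".join(w for w in rest if w not in STOP_WORDS))
    return rewrites

for r in positional_rewrites("Where is the Louvre Museum located?"):
    print(r)
# the is louvre museum located
# the louvre is museum located
# the louvre museum is located
# the louvre museum located is
# louvre AND museum AND located
```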

N-Gram Mining

N-Grams

Short sequences of consecutive words or letters that appear frequently on the web


1. 1-, 2-, and 3-grams are extracted from the summaries
2. Each n-gram is scored based on the weight of the query rewrite that retrieved it
3. Scores are summed across summaries
4. The final score is based on the rewrite rules and the number of unique summaries in which the n-gram occurred
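A sketch of the mining step under these assumptions: n-grams come from summary text, and each n-gram's score is the sum of the weights of the rewrites whose summaries contained it. The data structures and weights are illustrative, not the paper's exact scoring.

```python
# Sketch: mine 1-, 2-, 3-grams from search-result summaries and score each
# n-gram by summing the weights of the rewrites that retrieved it.
from collections import defaultdict

def ngrams(text, max_n=3):
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def mine(summaries_by_rewrite):
    """summaries_by_rewrite: list of (rewrite_weight, [summary, ...]) pairs."""
    scores = defaultdict(float)
    for weight, summaries in summaries_by_rewrite:
        for summary in set(summaries):          # count each summary once
            for gram in set(ngrams(summary)):   # count each n-gram once per summary
                scores[gram] += weight
    return scores

scores = mine([(5, ["Abraham Lincoln was born on February 12 1809"]),
               (1, ["Lincoln born 1809 in Kentucky"])])
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```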

N-Gram Filtering/Reweighting

The query is assigned one of seven question types (who, what, how many, etc.)
The question type determines which filters to apply:
Boost the score of a potential answer
Remove strings from the candidate list

Candidate answers are analyzed and rescored
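A minimal sketch of type-based filtering, assuming a simple regex filter per question type; the regexes and type names are illustrative, not the system's actual filters.

```python
# Sketch: apply a type-specific filter to candidate n-grams, e.g. keep only
# date-like candidates for "when" questions (filters here are illustrative).
import re

FILTERS = {
    "when":     lambda g: bool(re.search(r"\b1\d{3}\b|\b20\d{2}\b", g)),  # contains a year
    "how many": lambda g: bool(re.search(r"\b\d+\b", g)),                 # contains a number
}

def filter_candidates(question_type, candidates):
    keep = FILTERS.get(question_type, lambda g: True)
    return {gram: score for gram, score in candidates.items() if keep(gram)}

print(filter_candidates("when", {"february 12 1809": 6.0, "kentucky": 2.0}))
# {'february 12 1809': 6.0}
```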

N-Gram Tiling

Answer tiling algorithm


Merges similar answers
Assembles longer answers out of answer fragments

e.g. "A B C" and "B C D" → "A B C D"
The algorithm stops when no n-grams can be tiled further.
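A greedy sketch of the tiling idea: keep merging candidates whose word sequences overlap until nothing more can be merged. This is a simplified stand-in for the paper's tiling algorithm (it ignores candidate scores).

```python
# Sketch of greedy answer tiling: repeatedly merge overlapping candidates
# ("A B C" + "B C D" -> "A B C D") until no further merge is possible.
def tile(a, b):
    """Return the merged string if b overlaps the end of a, else None."""
    aw, bw = a.split(), b.split()
    for k in range(min(len(aw), len(bw)), 0, -1):
        if aw[-k:] == bw[:k]:
            return " ".join(aw + bw[k:])
    return None

def tile_all(candidates):
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(candidates):
            for j, b in enumerate(candidates):
                if i != j and (t := tile(a, b)):
                    candidates = [c for k, c in enumerate(candidates) if k not in (i, j)] + [t]
                    merged = True
                    break
            if merged:
                break
    return candidates

print(tile_all(["A B C", "B C D"]))  # ['A B C D']
```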

TREC vs. WEB


AND Rewrites Only, Top 100 (Google)

Source                         MRR     NumCorrect   PropCorrect
Web1                           0.450   281          0.562
TREC, Contiguous Snippet       0.186   117          0.234
TREC, Non-Contiguous Snippet   0.187   128          0.256
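MRR here is mean reciprocal rank: the average over questions of 1/rank of the first correct answer, counting 0 when no correct answer is returned. A minimal sketch of the computation (not tied to the authors' evaluation scripts):

```python
# Mean Reciprocal Rank: average of 1/rank of the first correct answer,
# counting questions with no correct answer as 0.
def mrr(first_correct_ranks):
    """first_correct_ranks: list of ranks (int, 1-based) or None when no answer was correct."""
    return sum(1.0 / r for r in first_correct_ranks if r) / len(first_correct_ranks)

print(mrr([1, 2, None, 5]))  # (1 + 0.5 + 0 + 0.2) / 4 = 0.425
```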

AND Rewrites Only, Top 100 (MSNSearch)

Source                         MRR     NumCorrect   PropCorrect
Web1                           0.450   281          0.562
TREC, Contiguous Snippet       0.186   117          0.234
TREC, Non-Contiguous Snippet   0.187   128          0.256
Web2, Contiguous Snippet       0.355   227          0.454
Web2, Non-Contiguous Snippet   0.383   243          0.486

Lack of redundancy in TREC!

TREC & WEB(Google) Combined


Larger source: better?
TREC: 0.262 MRR
Web: 0.416 MRR

Combined QA results:
MRR: 0.433
NumCorrect: 283
A 4.1% improvement over the Web alone ((0.433 - 0.416) / 0.416 ≈ 4.1%)

QA Accuracy and number of snippets


More is better, up to a limit

Smaller collections: lower accuracy

Snippet quality is less important
