
Conceptual structures in modern information retrieval

Claudio Carpineto
Fondazione Ugo Bordoni
Roma
carpinet@fub.it
Overview

• Keyword-based IR and early conceptual approaches

• Context and concepts in modern topical IR

• Emerging IR tasks requiring knowledge structures

• Research at FUB

• Conclusions
Vector-based IR

Documents Query

Vectors of Vector of
weighted keywords weighted keywords

Matching

Retrieved documents
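
The matching step in this diagram can be sketched in a few lines. The slide does not name a similarity measure, so cosine similarity is an assumption here, and the documents, weights, and function names are hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical weighted-keyword vectors (not taken from the slides).
docs = {
    "d1": {"bank": 2.0, "credit": 1.0},
    "d2": {"bank": 1.0, "river": 3.0},
    "d3": {"concert": 1.0},
}
query = {"bank": 1.0, "credit": 1.0}
ranking = rank(query, docs)
```

Documents sharing no keyword with the query get a score of zero, which is the vocabulary problem in miniature.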
Term weighting

• tf.idf and the vector space model (Salton): very popular in the 70’s and 80’s

• BM25 (Robertson): the state of the art in the 90’s

• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)

• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
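$$W = \mathrm{Inf}_1 \cdot \mathrm{Inf}_2$$

$$\mathrm{Inf}_1:\; tf \cdot \log\frac{N + 1}{n + 0.5},\; \ldots \qquad \mathrm{Inf}_2:\; \frac{tf}{tf + 1},\; \ldots$$

$$tfn = tf \cdot \log\left(1 + K \cdot \frac{avg\_l}{l}\right)$$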
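
As a rough illustration, the formulas on this slide can be evaluated as below. This is a sketch under several assumptions that the slide does not state: the natural logarithm is used, the normalized frequency tfn replaces tf inside both factors, and N, n, l, and avg_l are read as collection size, document frequency, document length, and average document length. All function and parameter names are hypothetical.

```python
import math

def normalized_tf(tf, doc_len, avg_len, k=1.0):
    """tfn = tf . log(1 + K . avg_l / l), as written on the slide."""
    return tf * math.log(1.0 + k * avg_len / doc_len)

def inf1(tf, n_docs, doc_freq):
    """First factor as shown on the slide: tf . log[(N + 1) / (n + 0.5)]."""
    return tf * math.log((n_docs + 1) / (doc_freq + 0.5))

def inf2(tf):
    """Second factor as shown on the slide: tf / (tf + 1)."""
    return tf / (tf + 1.0)

def weight(tf, doc_len, avg_len, n_docs, doc_freq, k=1.0):
    """W = Inf1 . Inf2, assuming both factors are computed on tfn."""
    tfn = normalized_tf(tf, doc_len, avg_len, k)
    return inf1(tfn, n_docs, doc_freq) * inf2(tfn)

# A term occurring 3 times in an average-length document of a 1000-document collection.
rare = weight(tf=3, doc_len=100, avg_len=100, n_docs=1000, doc_freq=10)
common = weight(tf=3, doc_len=100, avg_len=100, n_docs=1000, doc_freq=500)
```

With these choices a term found in few documents receives a larger weight than one found in many, and long documents are penalized through tfn.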


Inherent limitations of keyword-based IR

• Vocabulary problem

• Relations are ignored


Early approaches to conceptual IR

• n-grams (Salton 1975, Maarek 1989)

• parse tree (Dillon 1983, Metzler 1989)

• case relations (Fillmore 1968, Somers 1987)

• conceptual graphs (Dick 1991)


Why early conceptual IR was not successful

• No best representation scheme


• Manual coding too costly
• Automated coding too hard
• Training required both for the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate
Overview

• Vector-based IR and early conceptual approaches

• Context and concepts in modern topical IR

• Emerging IR tasks requiring knowledge structures

• Research at FUB

• Conclusions
Evolution of topical IR

• Very short queries


• Heterogeneous collections
• Unreliable sources
• Interactive sessions
Model of modern topical IR

Docs Query Context

Indexing Indexing

Ranking

Visualization

Interaction

Use
Query Expansion

[Diagram: Query → Ranking (Inverted File) → Select top D docs → Compute σ(ω) + norm → Select top E terms → Weighted query formulation → Ranking → Docs]
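
The retrieval feedback loop in this diagram (rank, select top D documents, score terms, select top E terms, reweight the query) can be sketched as follows. The slide does not specify the term-scoring function or the reweighting rule, so the summed term weight, the max-normalization, and the `beta` mixing parameter are stand-in assumptions. The data are hypothetical.

```python
from collections import Counter

def rank_docs(query, docs):
    """Rank document ids by dot product between query and document vectors."""
    scores = {d: sum(w * vec.get(t, 0.0) for t, w in query.items())
              for d, vec in docs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def expand_query(query, docs, top_d=2, top_e=2, beta=0.5):
    """Add the top_e best-scoring terms of the top_d documents to the query."""
    top_docs = rank_docs(query, docs)[:top_d]
    term_scores = Counter()
    for d in top_docs:
        term_scores.update(docs[d])      # stand-in score: summed term weight
    if not term_scores:
        return dict(query)
    max_score = max(term_scores.values())
    expanded = dict(query)
    for term, score in term_scores.most_common(top_e):
        # Normalize the term score to [0, 1] and mix it into the query weight.
        expanded[term] = expanded.get(term, 0.0) + beta * score / max_score
    return expanded

# Hypothetical toy collection.
docs = {
    "d1": {"bank": 2.0, "loan": 3.0},
    "d2": {"bank": 1.0, "loan": 1.0, "credit": 2.0},
    "d3": {"river": 4.0, "bank": 0.5},
}
query = {"bank": 1.0}
expanded = expand_query(query, docs)
reranked = rank_docs(expanded, docs)
```

Because the expansion terms come from the top-ranked documents whether or not they are relevant, a poor initial ranking feeds poor terms back into the query.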
Performance of retrieval feedback versus query difficulty

[Plot: TREC-7, unexpanded vs. expanded queries; axis ticks from 0.1 to 0.8]
Ranking based on interdocument similarity

Cluster hypothesis (van Rijsbergen 1978)

Approaches

- Matching the query against document clusters (Willett 1988)

- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)

- Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking

[Figure: concept lattice built from the query and documents D1 to D7 over the terms ACCOUNT, BANK, CREDIT, FINANCE, KBS, NNS, RIVER, and WATERS; nodes are annotated with integers from 0 to 4.]
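
A simplified sketch of ranking by distance in a concept lattice is given below. It assumes that conceptual distance can be approximated by the shortest path length between the query node and each document node in the lattice diagram; the published method may differ in its details. The context is a hypothetical toy example, not the one in the figure.

```python
from collections import deque
from itertools import combinations

def closed_intents(context):
    """All term sets obtainable by intersecting object descriptions, plus the full set."""
    all_terms = frozenset().union(*context.values())
    intents = {frozenset(terms) for terms in context.values()} | {all_terms}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common not in intents:
                intents.add(common)
                changed = True
    return intents

def cover_edges(intents):
    """Undirected edges between intents that are immediate neighbours by inclusion."""
    edges = {i: set() for i in intents}
    for a in intents:
        for b in intents:
            if a < b and not any(a < c < b for c in intents):
                edges[a].add(b)
                edges[b].add(a)
    return edges

def lattice_distances(context, query_id):
    """Breadth-first distance from the query node to each document node."""
    edges = cover_edges(closed_intents(context))
    start = frozenset(context[query_id])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {d: dist[frozenset(t)] for d, t in context.items() if d != query_id}

# Hypothetical context: the query is treated as one more object.
context = {
    "q": {"bank", "finance"},
    "d1": {"bank", "finance", "credit"},
    "d2": {"bank", "river"},
    "d3": {"finance", "credit"},
}
distances = lattice_distances(context, "q")
```

Here `d3` shares only one term with the query yet ties with `d2`, because the lattice also connects it to the query through the document `d1`. Enumerating all intersections is exponential in the worst case, which is consistent with the scaling concern raised for this approach.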
Performance of order-theoretical ranking

• Better than hierarchic clustering and comparable to best matching on the whole collection

• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents

• Order-theoretical ranking does not scale up well but it is synergistic with best matching document ranking
Overview

• Vector-based IR and early conceptual approaches

• Context and concepts in modern topical IR

• Emerging IR tasks requiring knowledge structures

• Research at FUB

• Conclusions
Question Answering

Task:
Closed-class questions in unrestricted domains with
no guarantee of answer and result possibly scattered
over multiple documents
Question Answering

Approach:

1. Recognize the type of question
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
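
The four steps above can be sketched as a toy pipeline. Everything here is a simplifying assumption: question types are detected from the leading words, passages are ranked by word overlap, and "near question words" is reduced to "inside the best-overlapping passage". The patterns and passages are hypothetical.

```python
import re

# Hypothetical answer patterns keyed by question type.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),   # four-digit years
    "how many": re.compile(r"\b\d+(?:,\d{3})*\b"),       # plain numbers
}

def question_type(question):
    """Step 1: recognize the question type from its leading words."""
    q = question.lower()
    for key in ANSWER_PATTERNS:
        if q.startswith(key):
            return key
    return None

def overlap(question, passage):
    """Number of distinct words shared by question and passage."""
    q = set(re.findall(r"\w+", question.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p)

def answer(question, passages):
    # Step 2: retrieve passages, best word overlap first.
    ranked = sorted(passages, key=lambda p: overlap(question, p), reverse=True)
    qtype = question_type(question)
    if qtype is not None:
        # Step 3: look for an entity of the expected type in the ranked passages.
        for passage in ranked:
            match = ANSWER_PATTERNS[qtype].search(passage)
            if match:
                return match.group(0)
    # Step 4: fall back to the best-matching passage.
    return ranked[0] if ranked else None

passages = [
    "The vector space model was proposed by Salton.",
    "TREC-7 was held in 1998 at NIST.",
]
typed = answer("When was TREC-7 held?", passages)
fallback = answer("Who proposed the vector space model?", passages)
```

The second call has no known question type, so it returns the whole best-matching passage rather than a short answer.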
Web Information Retrieval

Current tasks:

named-entity finding task


topic distillation task

Approach:

1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
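
One way to combine the results of several methods is sketched below. The slide does not say which schemes were used, so min-max normalization followed by linear interpolation is an assumption, as are the two hypothetical runs (a content-based score and a link-based score) and their weights.

```python
def min_max(scores):
    """Rescale a {doc: score} run to the [0, 1] interval."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(runs, weights):
    """Weighted sum of normalized scores; documents missing from a run contribute 0."""
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical runs on different score scales.
content = {"a": 12.0, "b": 8.0, "c": 4.0}
links = {"a": 0.1, "b": 0.9, "c": 0.5}
fused = interpolate([content, links], [0.7, 0.3])
```

Normalizing first matters because the raw scores of different methods are not on comparable scales.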
XML document retrieval

Goal:
Use document structure to improve precision and
recall of unstructured queries
“concerts this weekend at Sofia under 20 euros”

Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
Overview

• Vector-based IR and early conceptual approaches

• Context and concepts in modern topical IR

• Emerging IR tasks requiring knowledge structures

• Research at FUB

• Conclusions
Recommender systems

“Related keyword” feature

versus

Context-dependent query reformulation


[Diagram: Query → Document Ranking → Docs; Term ranking 1 + Term ranking 2 + Term ranking 3 → Query]
Combining text retrieval and text mining with concept lattices

Goal

Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a unique Web interface
Conclusions

The use of conceptual structures surfaces in traditional topic relevance retrieval and it is at the heart of many non-topical retrieval tasks

Towards conceptual search


• Understands term meaning
• Adapts to the user
• Translates between applications
• Is explainable
• Is capable of filtering and summarization
