Conceptual Structures in Modern Information Retrieval: Claudio Carpineto

Conceptual structures in
modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni
Roma
carpinet@fub.it
Overview
• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Vector-based IR
Documents: vectors of weighted keywords
Query: vector of weighted keywords
Matching of the query vector against the document vectors
Retrieved documents
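The matching step above can be sketched in a few lines. This is a minimal illustration, assuming a plain tf.idf weighting with idf = log(N / df) and cosine similarity as the matching function; the function names (`tfidf_vectors`, `cosine`, `rank`) and the toy documents are hypothetical and not taken from the slides.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf.idf vector (term -> weight) per tokenized document,
    together with the idf table used to build them."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return the indices of documents with a non-zero match, best first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0 (an assumption).
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scored = [(cosine(q, d), i) for i, d in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]

# Toy collection, purely illustrative.
docs = [
    ["bank", "credit", "finance"],
    ["river", "bank", "waters"],
    ["neural", "networks"],
]
ranking = rank(["credit", "bank"], docs)
```

The third toy document shares no keyword with the query and is not retrieved, which is the vocabulary problem in miniature.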
Term weighting
• tf.idf and vector space model (Salton): very popular in the 70's and 80's
• BM25 (Robertson): the state of the art in the 90's
• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
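$W = \mathrm{Inf}_1 \cdot \mathrm{Inf}_2$
$\mathrm{Inf}_1 = tf \cdot \log\left[(N + 1) / (n + 0.5)\right], \ldots$
$\mathrm{Inf}_2 = tf / (tf + 1), \ldots$
$tfn = tf \cdot \log\left(1 + K \cdot avg\_l / l\right)$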
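As a rough numerical illustration of the weighting framework above, the sketch below multiplies the two information factors after length normalization. It rests on several assumptions the slide does not spell out: that the normalized frequency tfn is what gets plugged into both factors in place of the raw tf, that logarithms are base 2, and that N is the number of documents in the collection, n the number of documents containing the term, l the document length, and avg_l the average document length. The function names are hypothetical.

```python
import math

def normalized_tf(tf, doc_len, avg_len, K=1.0):
    """tfn = tf * log(1 + K * avg_l / l); base-2 logarithm is an assumption."""
    return tf * math.log2(1.0 + K * avg_len / doc_len)

def term_weight(tf, doc_len, avg_len, N, n, K=1.0):
    """W = Inf1 * Inf2, using the two factors as written on the slide.
    Assumption: tfn replaces tf inside both factors."""
    tfn = normalized_tf(tf, doc_len, avg_len, K)
    inf1 = tfn * math.log2((N + 1) / (n + 0.5))
    inf2 = tfn / (tfn + 1.0)
    return inf1 * inf2

# A document of average length, a term occurring once, in 1 of 5 documents:
# tfn = 1, Inf1 = log2(6 / 1.5) = 2, Inf2 = 1 / 2.
w = term_weight(tf=1, doc_len=100, avg_len=100, N=5, n=1)
```

Under these assumptions a term becomes less informative as it appears in more documents, and the same raw frequency counts for less in a longer document.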

Inherent limitations of keyword-based IR
• Vocabulary problem
• Relations are ignored

Early approaches to conceptual IR
• n-grams (Salton 1975, Maarek 1989)
• parse tree (Dillon 1983, Metzler 1989)
• case relations (Fillmore 1968, Somers 1987)
• conceptual graphs (Dick 1991)

Why early conceptual IR was not successful
• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required both for the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate
Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Evolution of topical IR
• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions
Model of modern topical IR
[Diagram labels: Docs, Query, Context; Indexing; Ranking; Visualization; Interaction; Use]
Query Expansion
[Diagram labels: Docs, Inverted File, Query, Ranking, Select top D docs, Compute σ(ω) + norm, Select top E terms, Weighted Query Form.]
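The steps named in the query expansion diagram (select the top D documents, score their terms, keep the top E terms, build a weighted query) can be sketched as follows. This is an illustration under stated assumptions, not the scoring function used in the talk: the term score is a Kullback-Leibler style contribution comparing the feedback documents with the whole collection, scores are normalized by their maximum, and original and expansion weights are combined with hypothetical parameters `alpha` and `beta`. The first-pass ranking is assumed to be given, the query is assumed non-empty, and the feedback documents are assumed to belong to the collection.

```python
import math
from collections import Counter

def expand_query(query, ranked_docs, collection, top_d=2, top_e=2,
                 alpha=1.0, beta=0.5):
    """Return a weighted query (term -> weight) after retrieval feedback.

    ranked_docs: tokenized documents, already sorted by a first-pass ranking.
    collection:  all tokenized documents, used for background statistics.
    """
    # Step 1: select the top D documents as pseudo-relevant feedback.
    fb_counts = Counter(t for doc in ranked_docs[:top_d] for t in doc)
    coll_counts = Counter(t for doc in collection for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    # Step 2: score each feedback term (assumed KL-style score).
    scores = {}
    for term, count in fb_counts.items():
        p_fb = count / fb_total
        p_coll = coll_counts[term] / coll_total
        score = p_fb * math.log(p_fb / p_coll)
        if score > 0.0:
            scores[term] = score

    # Step 3: keep the top E terms (ties broken alphabetically).
    top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:top_e]
    max_score = max(scores.values(), default=0.0)

    # Step 4: combine normalized original and expansion weights.
    q_counts = Counter(query)
    max_q = max(q_counts.values())
    weights = {t: alpha * c / max_q for t, c in q_counts.items()}
    for term in top_terms:
        weights[term] = weights.get(term, 0.0) + beta * scores[term] / max_score
    return weights

# Toy data, purely illustrative.
collection = [
    ["bank", "credit", "finance"],
    ["bank", "account", "credit"],
    ["river", "bank", "waters"],
    ["neural", "networks"],
]
weights = expand_query(["bank"], ranked_docs=collection[:3], collection=collection)
```

In the toy run the expanded query gains `credit` and `account`, terms that are frequent in the top documents relative to the collection.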
Performance of retrieval feedback versus query difficulty
[Plot: TREC-7, unexpanded versus expanded queries; axis ticks from 0.1 to 0.8]
Ranking based on interdocument similarity
Cluster hypothesis (van Rijsbergen 1978)
Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking
[Figure: concept lattice relating the query and documents D1 to D7 through sets of the terms KBS, CREDIT, FINANCE, NNS, BANK, WATERS, ACCOUNT, RIVER; nodes carry numeric labels from 0 to 4]
Performance of order-theoretical ranking
• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking
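Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions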
Question Answering
Task:
Closed-class questions in unrestricted domains with
no guarantee of answer and result possibly scattered
over multiple documents
Question Answering
Approach:
1. Recognize the type of query
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
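A toy sketch of steps 1, 3 and 4 of the approach above may help fix ideas. Everything in it is a simplifying assumption made for illustration: the question type is read off the first word, the sought entity is found with a hand-written pattern (`ANSWER_PATTERNS` is hypothetical), proximity is measured in characters, and the passages are assumed to have been retrieved already (step 2).

```python
import re

# Hypothetical patterns for the entity sought by each question type.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b(?:1\d{3}|20\d{2})\b"),             # a four-digit year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),    # a capitalized name
}

def words(text):
    """Lowercased word tokens of a string."""
    return re.findall(r"\w+", text.lower())

def answer(question, passages):
    """Return the candidate entity closest to a question word,
    or the best-matching passage if no candidate is found."""
    q_tokens = words(question)
    # Step 1: recognize the question type from its first word.
    qtype = q_tokens[0] if q_tokens and q_tokens[0] in ANSWER_PATTERNS else None
    q_words = set(q_tokens) - {qtype}

    # Step 3: look for sought entities near question words.
    best = None
    if qtype is not None:
        for passage in passages:
            anchors = [m.start() for m in re.finditer(r"\w+", passage)
                       if m.group().lower() in q_words]
            if not anchors:
                continue
            for cand in ANSWER_PATTERNS[qtype].finditer(passage):
                distance = min(abs(cand.start() - a) for a in anchors)
                if best is None or distance < best[0]:
                    best = (distance, cand.group())
    if best is not None:
        return best[1]

    # Step 4: fall back to the passage sharing most words with the question.
    return max(passages, key=lambda p: len(q_words & set(words(p))), default=None)

# Toy passages, purely illustrative.
passages = ["The museum opened in 1987 near the river.", "Tickets cost ten euros."]
found = answer("When did the museum open?", passages)
fallback = answer("Where is the museum?", passages)
```

The second call has no recognized question type, so it returns the best-matching passage instead of an entity.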
Web Information Retrieval
Current tasks:
named-entity finding task
topic distillation task
Approach:
1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
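One common way to combine the results of several methods is to bring each score list onto a shared scale and then take a weighted sum. The sketch below assumes min-max normalization and linear interpolation; the slides do not say which schemes were actually used, and the run names and weights are hypothetical. Each run is assumed to be non-empty.

```python
def min_max_normalize(scores):
    """Rescale a non-empty run (doc -> score) to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(runs, weights):
    """Linearly interpolate normalized runs; return documents best first.
    A document missing from a run contributes 0 for that run."""
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return sorted(combined, key=lambda doc: (-combined[doc], doc))

# Two hypothetical runs on different score scales.
content_run = {"a": 12.0, "b": 8.0, "c": 4.0}
link_run = {"b": 0.9, "c": 0.5, "d": 0.1}
fused = interpolate([content_run, link_run], weights=[0.6, 0.4])
```

Document `b` is not the top result in the content run but rises to first place once the second source of evidence is interpolated in.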
XML document retrieval
Goal:
Use document structure to improve precision and
recall of unstructured queries
“concerts this weekend at Sofia under 20 euros”
Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
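Overview
• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions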
Recommender systems
“Related keyword” feature
versus
Context-dependent query reformulation

[Diagram labels: Docs, Query, Document Ranking, Term ranking 1 + Term ranking 2, Term ranking 3, Query]
Combining text retrieval and text mining with concept lattices
Goal
Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval and it is at the heart of many non-topical retrieval tasks
Towards conceptual search
• Understands term meaning
• Adapts to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization
