Professional Documents
Culture Documents
A topic is …
a seminal event or activity, along with all
directly related events and activities.
A story is …
a topically cohesive segment of news that
includes two or more DECLARATIVE
independent clauses about
a single event.
Example Topic
Transcription:
text (words)
Story:
Non-story:
• 3 Source Conditions:
• text sources and manual transcription of the audio
sources
• text sources and ASR transcription of the audio
sources
• text sources and the sampled data signal for audio
sources
• 2 Story Boundary Conditions:
• Reference story boundaries provided
• No story boundaries provided
The Topic Detection Task:
First Stories
= Topic 1 Time
= Topic 2
same topic?
0.01
rin
h
lis
da
g
En
n
Ma
1999 TDT3 Tracking Results
Required Evaluation Condition
4 English Training Stories, Multilingual Test Texts,
Newswire Text+Broadcast News ASR, Given ASR Boundaries
95%
Confidence
0.092
Bars
1 Actual
Normalized Tracking Cost
Decision Cost
Mi
nim
Gra um D
ph
0.1 Co ET
st
0.01
1
1
E1
1
d1
1
a1
nn
U
N
on
M
G
CM
ow
BB
Pe
ag
U
UI
U
Dr
1999 TDT3 Tracking Results
Required Evaluation Condition
Story Segmentation using
Decision Trees
Using Maximum Entropy
Language Models
Idea: compare perplexities of adaptive trigram with
general English trigram
M1
q1
M2 q2
q3 ?
…
Mk
k
P (q N +1 | q1...q N ) = P(q N +1 | M i ) P( M i | q1...q N )
i =1
Using Relevance Models in Link
Detection
• Question:
• Are stories S1 and S2 linked?
• Approach
• Create are relevance model for S1 and S2
• Measure the distance between the models
Building Relevance Models as
Topic Models
• S1 :
– Generate queries from it
– Retrieve documents from the collection
– Estimate
tf w, D cf w
P( w | D) = λ + (1 − λ )
|D| Coll.Size
P( w | S1 ) = P ( w | q1...q N )
= P( w | D ) P ( D |q1...q N )
D∈R
Measuring Distances
Kullback-Leibler Distance
P( w | S1 )
D( S1 || S 2 ) = P( w | S1 ) log
w P( w | S 2 )
Symmetric Kullback-Leibler Distance
Dsym ( S1 || S 2 ) = (D( S1 || S 2 ) + D ( S 2 || S1 ) )
1
2
Kullback-Leibler Distance with “Clarity”
P( w | S 2 )
DCl ( S1 || S 2 ) = P( w | S1 ) log
w P( w | GE )
(GE : general english)
Comparison on Training Data
Comparison on Evaluation Data
Distance Metric /Smoothing
• TDT:
• International Benchmark
• Various sub tasks
• Link detection