$$p_{s_i}(w' \mid w) \stackrel{\mathrm{def}}{=} \frac{f_{c_i}(w\,w') + \delta_1}{f_{c_i}(w) + \delta_1 |V|}$$

where $f_{c_i}(y)$ is the frequency with which word sequence $y$ occurs within the sentences in cluster $c_i$, and $V$ is the vocabulary. But because we want the insertion state $s_{m+1}$ to model digressions or unseen topics, we take the novel step of forcing its language model to be complementary to those of the other states by setting

4 Evaluation Tasks

We apply the techniques just described to two tasks that stand to benefit from models of content and changes in topic: information ordering for text generation and information selection for single-document summarization. These are two complementary tasks that rely on disjoint model functionalities: the ability to order a set of pre-selected information-bearing items, and the ability
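As a concrete illustration of the state-specific language models described above, the smoothed bigram estimate can be sketched as follows. This is a minimal sketch, not the paper's implementation: the sentences, the value of $\delta_1$, and all function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical smoothing constant; the paper's delta_1 is tuned, not fixed here.
DELTA_1 = 0.1

def bigram_lm(cluster_sentences, vocab):
    """Build the smoothed bigram estimator for one cluster/state s_i."""
    unigrams, bigrams = Counter(), Counter()
    for sent in cluster_sentences:
        tokens = sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    def prob(w_next, w):
        # p(w' | w) = (f(w w') + delta_1) / (f(w) + delta_1 * |V|)
        return (bigrams[(w, w_next)] + DELTA_1) / (unigrams[w] + DELTA_1 * len(vocab))

    return prob

# Toy "cluster" of domain sentences (invented for illustration).
sentences = ["the quake struck at dawn", "the quake damaged homes"]
vocab = {w for s in sentences for w in s.split()}
p = bigram_lm(sentences, vocab)
# "quake" follows "the" in 2 of 2 occurrences: (2 + 0.1) / (2 + 0.1 * 7)
print(round(p("quake", "the"), 3))
```

Note that the additive $\delta_1$ term gives every vocabulary word nonzero probability after any context word, which is what lets each state score sentences containing unseen bigrams.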
[Figure 3: Summarization performance (extraction accuracy) on Earthquakes as a function of training-set size (x-axis: 0–30 summary/source pairs; y-axis: summarization accuracy, 0–90; curves: content-model, word+loc, lead).]

sentence-level classifier on the full training set. Clearly, this performance gain demonstrates the effectiveness of content models for the summarization task.

5.5 Relation Between Ordering and Summarization Methods

Since we used two somewhat orthogonal tasks, ordering and summarization, to evaluate the quality of the content-model paradigm, it is interesting to ask whether the same parameterization of the model does well in both cases. Specifically, we looked at the results for different model topologies, induced by varying the number of content-model states. For these tests, we experimented with the Earthquakes data (the only domain for which we could evaluate summarization performance), and exerted direct control over the number of states, rather than utilizing the cluster-size threshold; that is, in order to create exactly m states for a specific value of m, we merged the smallest clusters until m clusters remained.

Table 5 shows the performance of the different-sized content models with respect to the summarization task and the ordering task (using OSO prediction rate). While the ordering results seem to be more sensitive to the number of states, both metrics induce similar ranking on the models. In fact, the same-size model yields top performance on both tasks. While our experiments are limited to only one domain, the correlation in results is encouraging: optimizing parameters on one task promises to yield good performance on the other. These findings provide support for the hypothesis that content models are not only helpful for specific tasks, but can serve as effective representations of text structure in general.

6 Conclusions

In this paper, we present an unsupervised method for the induction of content models, which capture constraints on topic selection and organization for texts in a particular domain. Incorporation of these models in ordering and summarization applications yields substantial improvement over previously-proposed methods. These results indicate that distributional approaches widely used to model various inter-sentential phenomena can be successfully applied to capture text-level relations, empirically validating the long-standing hypothesis that word distribution patterns strongly correlate with discourse patterns within a text, at least within specific domains.

An important future direction lies in studying the correspondence between our domain-specific model and domain-independent formalisms, such as RST. By automatically annotating a large corpus of texts with discourse relations via a rhetorical parser (Marcu, 1997; Soricut and Marcu, 2003), we may be able to incorporate domain-independent relationships into the transition structure of our content models. This study could uncover interesting connections between domain-specific stylistic constraints and generic principles of text organization. In the literature, discourse is frequently modeled using a hierarchical structure, which suggests that probabilistic context-free grammars or hierarchical Hidden Markov Models (Fine et al., 1998) may also be applied for modeling content structure. In the future, we plan to investigate how to bootstrap the induction of hierarchical models using labeled data derived from our content models. We would also like to explore how domain-independent discourse constraints can be used to guide the construction of the hierarchical models.

Acknowledgments We are grateful to Mirella Lapata for providing us the results of her system on our data, and to Dominic Jones and Cindi Thompson for supplying us with their document collection. We also thank Eli Barzilay, Sasha Blair-Goldensohn, Eric Breck, Claire Cardie, Yejin Choi, Marcia Davidson, Pablo Duboue, Noémie
Elhadad, Luis Gravano, Julia Hirschberg, Sanjeev Khudanpur, Jon Kleinberg, Oren Kurland, Kathy McKeown, Daniel Marcu, Art Munson, Smaranda Muresan, Vincent Ng, Bo Pang, Becky Passoneau, Owen Rambow, Ves Stoyanov, Chao Wang and the anonymous reviewers for helpful comments and conversations. Portions of this work were done while the first author was a postdoctoral fellow at Cornell University. This paper is based upon work supported in part by the National Science Foundation under grants ITR/IM IIS-0081334 and IIS-0329064 and by an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation or Sloan Foundation.

References

F. C. Bartlett. 1932. Remembering: A Study in Experimental and Social Psychology. Cambridge University Press.

R. Barzilay, L. Lee. 2003. Learning to paraphrase: An unsupervised approach using multiple-sequence alignment. In HLT-NAACL 2003: Main Proceedings, 16–23.

D. Beeferman, A. Berger, J. Lafferty. 1997. Text segmentation using exponential models. In Proceedings of EMNLP, 35–46.

S. F. Chen, K. Seymore, R. Rosenfeld. 1998. Topic adaptation for language modeling using unnormalized exponential models. In Proceedings of ICASSP, volume 2, 681–684.

G. DeJong. 1982. An overview of the FRUMP system. In W. G. Lehnert, M. H. Ringle, eds., Strategies for Natural Language Processing, 149–176. Lawrence Erlbaum Associates, Hillsdale, New Jersey.

P. A. Duboue, K. R. McKeown. 2003. Statistical acquisition of content selection rules for natural language generation. In Proceedings of EMNLP, 121–128.

S. Fine, Y. Singer, N. Tishby. 1998. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32(1):41–62.

R. Florian, D. Yarowsky. 1999. Dynamic non-local language modeling via hierarchical topic-based adaptation. In Proceedings of the ACL, 167–174.

D. Gildea, T. Hofmann. 1999. Topic-based language models using EM. In Proceedings of EUROSPEECH, 2167–2170.

Z. Harris. 1982. Discourse and sublanguage. In R. Kittredge, J. Lehrberger, eds., Sublanguage: Studies of Language in Restricted Semantic Domains, 231–236. Walter de Gruyter, Berlin; New York.

M. Hearst. 1994. Multi-paragraph segmentation of expository text. In Proceedings of the ACL, 9–16.

R. Iyer, M. Ostendorf. 1996. Modeling long distance dependence in language: Topic mixtures vs. dynamic cache models. In Proceedings of ICSLP, 236–239.

D. R. Jones, C. A. Thompson. 2003. Identifying events using similarity and context. In Proceedings of CoNLL, 135–141.

R. Kittredge, T. Korelsky, O. Rambow. 1991. On the need for domain communication language. Computational Intelligence, 7(4):305–314.

J. Kupiec, J. Pedersen, F. Chen. 1999. A trainable document summarizer. In I. Mani, M. T. Maybury, eds., Advances in Automatic Summarization, 55–60. MIT Press, Cambridge, MA.

M. Lapata. 2003. Probabilistic text structuring: Experiments with sentence ordering. In Proceedings of the ACL, 545–552.

W. C. Mann, S. A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. TEXT, 8(3):243–281.

D. Marcu. 1997. The rhetorical parsing of natural language texts. In Proceedings of the ACL/EACL, 96–103.

K. R. McKeown. 1985. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, Cambridge, UK.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery. 1997. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, second edition.

L. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.

D. R. Radev, K. R. McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469–500.

O. Rambow. 1990. Domain communication knowledge. In Fifth International Workshop on Natural Language Generation, 87–94.

J. Reynar, A. Ratnaparkhi. 1997. A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, 16–19.

R. E. Schapire, Y. Singer. 2000. BoosTexter: A boosting-based system for text categorization. Machine Learning, 2/3:135–168.

R. Soricut, D. Marcu. 2003. Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the HLT/NAACL, 228–235.

M. White, T. Korelsky, C. Cardie, V. Ng, D. Pierce, K. Wagstaff. 2001. Multi-document summarization via information extraction. In Proceedings of the HLT Conference, 263–269.

A. Wray. 2002. Formulaic Language and the Lexicon. Cambridge University Press, Cambridge.

J. Wu, S. Khudanpur. 2002. Building a topic-dependent maximum entropy language model for very large corpora. In Proceedings of ICASSP, volume 1, 777–780.