I. INTRODUCTION

II.
1) Word-based Scoring: The first methods used for sentence scoring were based on word scoring. In such approaches, each word receives a score, and the weight of a sentence is the sum of the scores of its constituent words. The most important word-based scoring methods are listed below; a short code sketch of two of them follows the list.
Word Frequency: As the name of the method suggests, the more frequently a word occurs in the text, the higher its score;
Proper Noun: This method hypothesizes that sentences containing a higher number of proper nouns are likely more important than others;
Sentence Resemblance to the Title: The vocabulary overlap between the sentence and the document title;
Sentence Centrality: The vocabulary overlap between a sentence and the other sentences in the document;
Sentence Length: This feature penalizes sentences that are either too short or too long.
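To make these word-based scores concrete, the sketch below implements two of them: word frequency and sentence resemblance to the title. It assumes a deliberately simple tokenizer (lowercase, word characters only) with no stemming or stop-word removal, and the function names are illustrative rather than taken from [3].

    import re
    from collections import Counter

    def tokenize(text):
        # Deliberately simple: lowercase and keep runs of word characters.
        return re.findall(r"\w+", text.lower())

    def word_frequency_scores(sentences):
        # A sentence's score is the sum of its words' frequencies in the document.
        freq = Counter(w for s in sentences for w in tokenize(s))
        return [sum(freq[w] for w in tokenize(s)) for s in sentences]

    def title_resemblance_score(sentence, title):
        # Vocabulary overlap (Jaccard) between a sentence and the document title.
        s, t = set(tokenize(sentence)), set(tokenize(title))
        return len(s & t) / len(s | t) if s | t else 0.0

    sentences = [
        "The cat sat on the mat.",
        "Cats are popular pets.",
        "The mat was red.",
    ]
    print(word_frequency_scores(sentences))                              # [11, 4, 7]
    print(title_resemblance_score(sentences[0], "The cat and the mat"))  # 0.5

Sentence centrality can be written the same way, replacing the title vocabulary with the union of the other sentences' vocabularies.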
Three different contexts were used to assess the sentence scoring methods. Combinations of the sentence scoring algorithms from [3] were used in order to yield better-quality summaries. This section describes: (i) the datasets used; (ii) the methodology followed in the assessment experiments; (iii) the abbreviations used, to make the experiments easier to follow; (iv) the results; and (v) the conclusions.
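A common way to combine scoring methods, assumed here purely for illustration (the exact scheme used in the experiments is described in [3]), is to min-max normalize each method's scores so they are comparable, sum them, and extract the top-k sentences in their original order:

    def normalize(scores):
        # Min-max normalize to [0, 1] so different methods are comparable.
        lo, hi = min(scores), max(scores)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in scores]

    def combined_summary(sentences, scorers, k=2):
        # Sum the normalized score each method assigns to each sentence,
        # then keep the k best sentences in their original document order.
        totals = [0.0] * len(sentences)
        for scorer in scorers:
            for i, x in enumerate(normalize(scorer(sentences))):
                totals[i] += x
        top = sorted(range(len(sentences)), key=totals.__getitem__, reverse=True)[:k]
        return [sentences[i] for i in sorted(top)]

Reusing word_frequency_scores from the earlier sketch, combined_summary(sentences, [word_frequency_scores], k=2) returns the two best-scoring sentences in document order; any scorer with the same signature can be added to the list.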
A. Corpus
Three different datasets were used in the assessment presented. They are detailed in the following subsections.
TABLE I. Abbreviations of the sentence scoring methods

Abbreviation    Method
WF              Word Frequency
TFIDF           TF/IDF
UpCase          Upper Case
PropNoun        Proper Noun
WCOcurrency     Word Co-Occurrence
LexicalS        Lexical Similarity
CueP            Cue-Phrase
NumData         Numerical Data
SenLength       Sentence Length
SPosition       Sentence Position
SCentral        Sentence Centrality
ResTitle        Resemblance-Title
AggSim          Aggregate Similarity
TextRankS       TextRank Score
BushyP          Bushy Path
B. Evaluation Methodology

This section describes the methodology followed in the experiments to assess the quality of the summaries.

1) Quantitative Assessment: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [9] was used to quantitatively evaluate the summaries generated using the different scoring methods. ROUGE is a fully automated, widely used evaluator that essentially measures the content similarity between system-generated summaries and the corresponding gold-standard summaries.
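The official ROUGE toolkit computes several n-gram and subsequence variants; purely to illustrate the underlying idea, the sketch below computes ROUGE-1 recall, precision, and F-measure using clipped unigram counts (rouge_1 is a hypothetical helper, not part of the official toolkit).

    from collections import Counter

    def rouge_1(candidate, reference):
        # Clipped unigram overlap between a system summary and a gold summary.
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum((cand & ref).values())
        recall = overlap / max(sum(ref.values()), 1)
        precision = overlap / max(sum(cand.values()), 1)
        f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        return recall, precision, f

    r, p, f = rouge_1("the cat sat on the mat", "the cat lay on the mat")
    print(f"recall={r:.2f} precision={p:.2f} f={f:.2f}")  # all 0.83 here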
D. Results

For each dataset, we use the performance results reported for each individual algorithm in [3] to analyze combinations of these algorithms. The combinations are:
All algorithms;
Algorithms
TABLE II. Combinations - CNN

com01   LexicalS + ResTitle
com02   WF + TFIDF + LexicalS
com03   WF + TFIDF + SPosition
com04   WF + LexicalS + SPosition
com05   WF + LexicalS + ResTitle
com06   TFIDF + SPosition + ResTitle
com07   LexicalS + SPosition + ResTitle
com08   WF + TFIDF + LexicalS + SPosition
com09   TFIDF + LexicalS + SPosition + ResTitle
com10   WF + TFIDF + LexicalS + SPosition + ResTitle
TABLE IV. Results of ROUGE having CNN dataset as gold standard applied to the proposed algorithm combinations

        Recall        Precision     F-measure
com01   0.73 (0.17)   0.36 (0.12)   0.48 (0.13)
com02   0.69 (0.18)   0.40 (0.12)   0.49 (0.13)
com03   0.72 (0.17)   0.36 (0.12)   0.48 (0.14)
com04   0.69 (0.18)   0.39 (0.12)   0.49 (0.13)
com05   0.74 (0.16)   0.36 (0.12)   0.48 (0.14)
com06   0.72 (0.17)   0.37 (0.12)   0.48 (0.14)
com07   0.74 (0.16)   0.36 (0.12)   0.48 (0.14)
com08   0.71 (0.17)   0.37 (0.12)   0.48 (0.13)
com09   0.69 (0.18)   0.39 (0.12)   0.49 (0.13)
com10   0.72 (0.17)   0.37 (0.12)   0.48 (0.13)
TABLE III. Combinations - Blog Summarization dataset

com01   TFIDF + SenLength
com02   TFIDF + TextRankS
com03   WF + TFIDF + SenLength
com04   WF + TFIDF + TextRankS
com05   WF + SenLength + TextRankS
com06   TFIDF + LexicalS + TextRankS
com07   TFIDF + SenLength + TextRankS
com08   WF + TFIDF + LexicalS + SenLength
com09   WF + TFIDF + LexicalS + TextRankS
com10   TFIDF + LexicalS + SenLength + TextRankS
TABLE V. Results of ROUGE having Blog Summarization dataset as gold standard applied to the proposed algorithm combinations

        Recall        Precision     F-measure
com01   0.77 (0.10)   0.63 (0.14)   0.69 (0.13)
com02   0.74 (0.11)   0.64 (0.14)   0.68 (0.12)
com03   0.75 (0.11)   0.62 (0.14)   0.68 (0.13)
com04   0.74 (0.12)   0.63 (0.14)   0.68 (0.13)
com05   0.75 (0.11)   0.63 (0.14)   0.68 (0.12)
com06   0.76 (0.11)   0.63 (0.14)   0.68 (0.12)
com07   0.74 (0.12)   0.63 (0.14)   0.68 (0.13)
com08   0.74 (0.12)   0.63 (0.15)   0.68 (0.13)
com09   0.75 (0.11)   0.63 (0.14)   0.68 (0.13)
com10   0.75 (0.11)   0.63 (0.15)   0.68 (0.13)
Fig. 1. Number of Correct Sentences x Combinations - Using CNN dataset

com01 achieved the best results compared with the other combinations and with the single algorithms.
Fig. 2. Number of Correct Sentences x Combinations - Using Blog Summarization dataset
3) Assessment Using the SUMMAC Dataset: To assess the summarization of scientific papers, we performed an experiment using the SUMMAC dataset. This dataset contains larger documents than the other datasets used here: each document is usually 6-8 pages long and well structured. To evaluate the algorithm combinations, Table VI presents the 10 best-performing combinations for this dataset.
TABLE VI. Combinations - SUMMAC

com01   CueP + ResTitle
com02   SPosition + TextRankS
com03   TFIDF + CueP + SPosition
com04   TFIDF + SPosition + ResTitle
com05   CueP + SPosition + ResTitle
com06   CueP + SPosition + TextRankS
com07   SPosition + ResTitle + TextRankS
com08   TFIDF + CueP + SPosition + ResTitle
com09   TFIDF + CueP + SPosition + TextRankS
com10   CueP + SPosition + ResTitle + TextRankS
TABLE VII. Results of ROUGE having SUMMAC dataset as gold standard applied to the proposed algorithm combinations

        Recall        Precision     F-measure
com01   0.38 (0.12)   0.27 (0.10)   0.30 (0.08)
com02   0.41 (0.10)   0.26 (0.10)   0.30 (0.08)
com03   0.42 (0.11)   0.24 (0.10)   0.29 (0.08)
com04   0.45 (0.12)   0.24 (0.10)   0.29 (0.08)
com05   0.34 (0.11)   0.29 (0.10)   0.29 (0.07)
com06   0.41 (0.10)   0.26 (0.10)   0.30 (0.08)
com07   0.45 (0.11)   0.25 (0.10)   0.31 (0.10)
com08   0.45 (0.12)   0.24 (0.10)   0.29 (0.08)
com09   0.49 (0.10)   0.21 (0.10)   0.28 (0.09)
com10   0.45 (0.11)   0.25 (0.10)   0.30 (0.08)
IV. GENERAL CONCLUSIONS

Combination conclusions:
ACKNOWLEDGEMENTS