
Super Parsing: Sentiment Classification with Review Extraction

Jian Liu, JianXin Yao and Gengfeng Wu
Department of Computer Science, Shanghai University, 149 Yanchang Rd, 200072 Shanghai, PR China
{liujian, jianxin yao, gfwu}@mail.shu.edu.cn

Abstract
This paper describes sentiment classification with review extraction. The whole process can be illustrated logically as: (1) extract the review expressions on specific subjects and attach a sentiment tag and a weight to each expression; (2) calculate the sentiment indicator of each tag by accumulating the weights of all the expressions with the corresponding tag; (3) given the indicators on the different tags, use a classifier to predict the sentiment label of the text. A system named Approximate Text Analysis (ATA) is used for review extraction in stage 1. It follows the idea of Super Parsing, which enables non-adjacent constituents to be merged to deduce a new one. To traverse the valid constituent combinations in Super Parsing, an algorithm named the Candidate List Algorithm (CLA) is proposed. The performance of three kinds of classifiers in stage 3 (a simple linear classifier, SVM and a decision tree) is then studied. Experiments on on-line documents show that the SVM algorithm achieves the best performance.

Figure 1. Sentiment Classification with IE

Introduction

Nowadays, with the rapid expansion of the Internet and e-commerce, ever more documents appear on the web in the form of news, reports, BBS posts, blogs and so on. As part of the effort to better organize this substantial on-line information for users, researchers have been actively investigating the problem of automatic text classification, especially according to the author's sentiment (attitude) toward the subject. This paper describes sentiment classification with review extraction. The whole process is (see Figure 1): 1. extract the review expressions on specific subjects and attach a sentiment tag and a weight to each expression; 2. calculate the sentiment indicator for each tag by accumulating the weights of all the expressions with the corresponding tag; 3. use a classifier to predict the sentiment label with the indicators on the different sentiment tags.

The rest of this paper is organized as follows: Section 2 reviews previous work related to this research. Section 3 introduces Super Parsing. Section 4 describes an implementation of Super Parsing, named Approximate Text Analysis (ATA). Going into more detail than the work in [11], we propose the Candidate List Algorithm (CLA) to search the valid constituent combinations in Super Parsing (see section 3.2). This paper also studies the accuracy of predicting the sentiment class label from the indicators when different classifiers are adopted (see sections 5 and 6).
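The three-stage pipeline described above can be sketched as follows. This is a minimal illustration of stages 2 and 3 only: stage 1 (review extraction by ATA) is assumed to have already produced a list of (sentiment tag, weight) pairs, and all function and variable names are illustrative, not the authors' actual implementation.

```python
# Minimal sketch of stages 2 and 3 of the pipeline. Stage 1 (extraction)
# is assumed to have already produced (tag, weight) pairs.

def sentiment_indicators(expressions):
    """Stage 2: accumulate the weights of all expressions per tag."""
    indicators = {"POS": 0.0, "NEG": 0.0}
    for tag, weight in expressions:
        indicators[tag] = indicators.get(tag, 0.0) + weight
    return indicators

def classify(indicators, threshold=0.0):
    """Stage 3: a simple linear decision over the two indicators."""
    if indicators["POS"] - indicators["NEG"] >= threshold:
        return "positive"
    return "negative"

expressions = [("POS", 1.0), ("POS", 0.5), ("NEG", 0.25)]
print(classify(sentiment_indicators(expressions)))  # -> positive
```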

Related Work

One area related to this work is Information Extraction (IE), whose goal is to transform text into a structured format

Proceedings of the 2005 Fifth International Conference on Computer and Information Technology (CIT'05), 0-7695-2432-X/05 $20.00 © 2005 IEEE

and thereby reduce the information in a document to a tabular structure. Most extraction systems (such as AutoSlog, LIEP, PALKA, WHISK, RAPIER, SRV etc. [12]) use templates to detect the relevant events, ranging from house renting and job hunting to terrorist activities. As a further step, some researchers have explored extracting evaluation-related expressions. Such expressions can be sentimental reviews or neutral descriptions of some subjects. Yi and Nasukawa manually construct tag patterns to extract sentiment expressions on a specific subject. In [13] a moving window is mentioned for the seeking process. Eventually the expressions are output in the form of ternary and binary tuples [13, 16]. Kobayashi and Inui [8] explore sentiment expression extraction at a finer granularity, in which expressions about specific features of products can be identified with some co-occurrence patterns. Additionally, Hu and Liu [5, 6] use Semantic Orientation to pick up evaluative sentences on various features of a pre-determined product, and classify these sentences by polarity and feature to generate a sentiment summary. Different from the work mentioned above, the extraction approach here not only detects the sentiment expressions, but also evaluates the validity and/or strength of expressions with weights. It thus provides quantitative information for the prediction of the sentiment label in the later process.

Another related area is Sentiment Classification. As a prevailing approach, Semantic Orientation (SO) is defined to evaluate the sentiment direction and strength of a text or a word for classification. Turney's work in [15] defined the SO of a specific word by the mutual information between this word and the words "excellent" and "poor", where the mutual information is computed using statistics gathered by a search engine. Fei et al. [4] present a phrase pattern-based method.
More extensive than Turney's work, the phrase patterns include not only adjectives and verbs, but also nouns, adverbs, conjunctions and prepositions. The semantic orientation methods search linguistic relations within each sentence, and so are not sensitive to some inter-sentential semantic relations. In contrast, this work introduces a Super Parsing strategy to seek valid relations beyond the sentential scope. It can detect some sentiment expressions that are hardly recognized by pure syntactic approaches, and so is more helpful for the evaluation of the author's sentiment.
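Turney's SO-PMI measure summarized above can be sketched as follows. This is an illustrative sketch only: the real method gathers hit counts from a search engine via NEAR queries, while the counts below are invented for demonstration.

```python
import math

# Semantic orientation via pointwise mutual information, in the spirit
# of Turney's method: SO(phrase) = PMI(phrase, "excellent") minus
# PMI(phrase, "poor"). The shared corpus terms cancel, leaving the
# log-ratio of hit counts below.

def so_pmi(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    return math.log2((hits_near_excellent * hits_poor) /
                     (hits_near_poor * hits_excellent))

# A phrase co-occurring mostly with "excellent" receives a positive SO:
print(so_pmi(1500, 100, 10000, 9000) > 0)  # -> True
```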

Super Parsing

Due to some linguistic phenomena, such as anaphora and ellipsis, syntactical extraction approaches can hardly discover some inter-sentential semantic relations. Ignoring these relations always loses some valuable information, and eventually leads to a partial evaluation of the author's attitude. Take the following sentences as an example:

    The new government consists of many elites. It has achieved greatly in the recent years. The leader is quite aspiring.

Here the word "government" is a specific subject related to a specific domain, politics, and each underlined word is an evaluative word ("achieve") or an aspect ("leader"). From the view of a human reader, these underlined words semantically depend on "government". But many extraction techniques, which analyze the text with mere syntactical knowledge, easily neglect the relations among these constituents, since they mainly adopt the divide-and-conquer parsing idea, in which only adjacent constituents can be merged to create a new one. Motivated by this, we employ Super Parsing to search for the potential associations among the constituents. It allows the merge of non-adjacent constituents even when these constituents exist in adjacent sentences, and allows a constituent to contribute to the generation of PLURAL new constituents in the process of iterative deduction. Therefore, it is able to recognize some inter-sentential relations, such as "government achieved" (NP-VP) or "government leader" (NP-NP) in the above example.

3.1 Formalism

Deduction is the elementary step in parsing. It merges some existing constituents to generate a new one. Assume a rule X1 ⋯ Xn → C. The production C is called the mother of the rule, while each constraint Xi is called a daughter. If the daughter Xi is matched by a constituent yi = ⟨xi, [li, ri]⟩, this is denoted xi ∈ Xi in this work. The constituent expression y = ⟨x, [l, r]⟩ means that the constituent y takes the text region from l to r and that its linguistic information is stored as x. A constituent yi can represent a word, a phrase, a grammar structure or a semantic concept. Traditionally, only adjacent constituents can be merged: for yi and yi+1, it is necessary that ri = li+1. However, Super Parsing employs a loose policy ri ≤ li+1 in deduction, which enables non-adjacent constituents to match the adjacent daughters of a rule. The following is the definition of our loose deduction:

Definition 1. Given a rule r: X1 ⋯ Xn → C and constituents yi = ⟨xi, [li, ri]⟩ (i = 1, ⋯, n), if (1) xi ∈ Xi, (2) li < ri, (3) ri ≤ li+1, then a constituent z = ⟨c, [l1, rn]⟩ is generated by loose deduction, where c ∈ C. Such a generation is denoted y1, ⋯, yn ⇒r z.

To illustrate Super Parsing more clearly, some definitions are given as follows:


Definition 2. G(S, r) = { z ∉ S | ∃ y1, ⋯, yn ∈ S : y1, ⋯, yn ⇒r z }

For a constituent set S, the generating set G(S, r) is the set of constituents that can be generated from S by rule r via loose deduction.

Definition 3. F(S, R) ⇔ (∀r ∈ R | G(S, r) = ∅)

The boolean function F(S, R) is constructed to decide whether the parsing process should cease. It is true if and only if no more new constituents can be generated by the rule set R on the constituent set S.
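The loose deduction of Definition 1 can be sketched as follows. A constituent is modeled as a (symbol, l, r) triple and a rule maps a daughter sequence to a mother symbol; the names are illustrative, not the paper's implementation.

```python
# Loose deduction (Definition 1): the policy r_i <= l_{i+1} allows gaps
# between the merged constituents, unlike strict adjacency r_i = l_{i+1}.

def loose_deduce(rule, constituents):
    """Return the new constituent if the rule's daughters are matched
    under the loose policy, else None."""
    daughters, mother = rule
    if len(constituents) != len(daughters):
        return None
    for (sym, l, r), d in zip(constituents, daughters):
        if sym != d or not l < r:          # conditions (1) and (2)
            return None
    for (_, _, r1), (_, l2, _) in zip(constituents, constituents[1:]):
        if r1 > l2:                        # condition (3): gaps ok, overlaps not
            return None
    return (mother, constituents[0][1], constituents[-1][2])

# "government"(NP) and the later, non-adjacent "achieved"(VP) still merge:
rule = (("NP", "VP"), "EVT")
np_u = ("NP", 4, 14)    # text region of "government" (illustrative offsets)
vp_u = ("VP", 22, 30)   # text region of "achieved"
print(loose_deduce(rule, [np_u, vp_u]))  # -> ('EVT', 4, 30)
```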
S-PARSE(S, R)
1  while F(S, R) = false
2    do
3      for r ∈ R
4        do
5          S = S ∪ G(S, r)

Figure 2. Pseudo code for Super Parsing

Figure 2 describes the whole process of Super Parsing. Given the initial constituent set S, Super Parsing iteratively casts rules on existing constituents to generate new ones and adds them back into S, until no more new constituents can be generated. Following the sample mentioned in this section, suppose a super parser has limited rules to recognize the words "government" (NP), "achieved" (VP) and "leader" (NP), and the grammar knowledge NP + VP → EVT and NP + NP → NP. All the words in the sample are extracted as the initial constituent set. Each recognizable word then deduces to a corresponding grammar element (e.g. "leader" deduces NP). In the following deduction, an EVT denoting "government achieved" is created by incorporating the NP for "government" and the VP for "achieved". Meanwhile, an NP denoting "government leader" is generated similarly. The inter-sentential event "government achieved" and the semantic entity "government leader", which cannot be found directly by pure syntactical analysis, are thus recognized by Super Parsing. However, loose deduction also raises the risk of importing some constituents that are grammatically correct but semantically false. We alleviate this in two ways: by constraining the search scope for constituent combinations (see sections 3.2 and 3.3), and by describing constituents more informatively, i.e. using a more complex data structure instead of a single symbol (see section 4.2).

3.2 Candidate List Algorithm

To implement Super Parsing effectively, a tabling-based method, named the Candidate List Algorithm (CLA), is proposed in this paper¹. The tabling data structure is called a candidate list. As illustrated in Figure 3, the candidate list CLr,j is used to store the constituents that match daughter j of the rule r.

Figure 3. Candidate lists of a given rule r

CLA(CL, r, u)
1  n ← #(r.D)
2  for i ← 1 to n
3    do
4      if u ∈ r.Di
5        then
6          if i = n
7            then UCMB(CL, r, u);
8            else (CLr,i).push(u);

UCMB(CL, r, u)
1  n ← #(r.D)
2  for X ∈ { c : U^n; j : N | 1 ≤ j < n ∧ cj ∈ CLr,j ∧ cn = u • c }
3    do
4      if RV(X) = true
5        then
6          X ⇒r Y;
7          output(Y);

Figure 4. Candidate List Algorithm

Figure 4 shows the process of the Candidate List Algorithm. Given a rule r, an input unit u and the candidate list set CL, the algorithm traverses the valid constituent combinations from the candidate lists and outputs the new constituents generated by the rule for each combination.

Definition 4. RV(X) ⇔ (∀i < #X | Xi = ⟨xi, [li, ri]⟩ ∧ ri ≤ li+1)

The function RV(X) checks the region validity of the constituent list X, where the operator #X returns the number of members of list X. If the input and output of CLA are connected to a constituent queue, Super Parsing can easily be implemented as an incremental procedure. It receives a constituent from the queue and outputs newly generated ones back to the queue, if any. The procedure runs in a cycle until no more constituents can be
¹ A more detailed embodiment is revealed in [10].


obtained from the input. Such an incremental system of Super Parsing is detailed in section 4.
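The candidate-list bookkeeping of section 3.2 and the incremental queue-driven loop above can be sketched as follows. For brevity this sketch stores a unit in the candidate list of every non-final daughter it matches and tries full combinations only when the final daughter is completed; names are illustrative, not the authors' implementation.

```python
from collections import defaultdict, deque
from itertools import product

# Constituents are (symbol, l, r) triples; rules map a daughter sequence
# to a mother symbol. Region validity follows Definition 4.

def region_valid(combo):
    return all(r1 <= l2 for (_, _, r1), (_, l2, _) in zip(combo, combo[1:]))

def super_parse(initial_units, rules):
    queue = deque(initial_units)
    cl = defaultdict(list)               # (rule_id, daughter_id) -> units
    results = set(initial_units)
    while queue:
        u = queue.popleft()
        for ri, (daughters, mother) in enumerate(rules):
            n = len(daughters)
            for di, d in enumerate(daughters):
                if u[0] != d:
                    continue
                if di < n - 1:
                    cl[(ri, di)].append(u)   # wait for the final daughter
                else:                        # u completes the rule
                    pools = [cl[(ri, j)] for j in range(n - 1)] + [[u]]
                    for combo in product(*pools):
                        if region_valid(combo):
                            z = (mother, combo[0][1], combo[-1][2])
                            if z not in results:
                                results.add(z)
                                queue.append(z)  # new units re-enter the queue
    return results

# "government"(NP) ... "achieved"(VP) ... "leader"(NP) from the example:
units = [("NP", 4, 14), ("NP", 50, 56), ("VP", 22, 30)]
rules = [(("NP", "VP"), "EVT"), (("NP", "NP"), "NP")]
parsed = super_parse(units, rules)
print(("EVT", 4, 30) in parsed)  # -> True: the inter-sentential event is found
```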

3.3 Combinational Explosion

Super Parsing is powerful for finding potential constituent relations. However, it also leads to a combinational explosion when searching for the qualified constituents matching rules, so some pruning methods are required to curtail the search scale. This work mainly employs a weighting mechanism to evaluate the validity of a constituent. If a constituent y is produced by constituents x1, ⋯, xn by a certain rule, the weight of y is lower when x1, ⋯, xn are more sparsely distributed and/or they are less weighted. In this work, the following empirical formula is adopted to calculate the weight of a constituent:

    w(y) = W(x1, ⋯, xn) = Σ_{i=1}^{n} w(xi) − Σ_{i=1}^{n−1} log2(D(xi, xi+1) + 2)    (1)

where w(x) is the weight of constituent x, and D(x, y) is the distance between constituents x and y. If x and y take the regions ⟨lx, rx⟩ and ⟨ly, ry⟩ respectively, then

    D(x, y) = min(ly − rx, lx − ry)    (2)

Additionally, the weight of each initial constituent is set to 1.0. With a weight on each constituent, a threshold can be established to eliminate new constituents that have poor reliability. Meanwhile, another threshold can also be set to eliminate a combination member that is distant from its neighbors in a combination. Hence, the scale of constituent combination can be restricted effectively.

Approximate Text Analysis

This section describes an incremental system of Super Parsing, named Approximate Text Analysis (ATA)². It is used here to extract the sentiment expressions on specific subjects and to evaluate the strength of the different sentiments. The dashed box in Figure 1 is the whole framework of ATA. The approach receives a raw text as input and outputs the review units. A raw text can be pure text, or a tagged document such as HTML and XML, while a Linguistic Unit (or unit for short) is the data structure of a constituent. For each output unit, its weight and sentiment tags will be used for the calculation of the sentiment genre and strength of the corresponding review. This work sets two sentiment tags: POS (positive) and NEG (negative). Since the Sentiment Accumulator is highly related to ATA in this work, it is also described in section 4.4.

² A more detailed embodiment is revealed in [9].

4.1 Tag Lexicon

ATA uses tags to describe the different linguistic meanings of a constituent. In this work, the tag SBJ denotes that the attached constituent is a subject, possibly a product, a service or something else. The tag COM denotes that the attached constituent is a commendation of something, while the tag DER is derogatory. More specifically, the tags COM_S and COM_O are commendatory to the subject constituent and the object constituent respectively, and the tags DER_S and DER_O can be understood in a similar way. There are two sentiment tags, POS and NEG, in this work, representing positive and negative evaluation respectively.

4.2 Linguistic Unit

The term Linguistic Unit is used to denote the data structure corresponding to a constituent. In this work, a Linguistic Unit contains three components: text region, key feature, and additional feature set. The Text Region is the text area occupied by the given constituent. The Key Feature stores the tag denoting the essence of a linguistic unit. For example, a unit attached with WRD as its key feature means that it is a word. The Additional Feature Set is a tag set that stores the secondary descriptive information for the unit. In the following sentence,

    The 1 congress 2 , 3 I 4 think 5 , 6 is 7 wise 8

"the congress is wise" is a constituent about an expression. Its text region can be described in multiple forms. The region of the above constituent can be depicted as a set of continuous locations {⟨1, 2⟩, ⟨6, 8⟩}, or a bit-pattern 0100011, where a 1 indicates that the constituent occupies this position [7]. More roughly, it can be depicted as a pair ⟨1, 8⟩. For convenience, the location pair is adopted in this work to represent the region. The Key Feature of the unit is set to EXP to denote a sentiment expression. The Additional Feature Set can contain several tags describing a constituent with different meanings. To mark the constituent as a positive expression, the tag POS is added into the Additional Feature Set.

4.3 Deduction Rules

In this work, Deduction Rules are invoked in the Deduction Unit to generate new linguistic constituents from a given constituent combination. They are used by Super Parsing to detect the words, phrases, grammar structures and semantic relations relevant to sentiment expressions. Figure 5 shows some sample rules for ATA. For example, rule 1 means that the word "congress" is a subject for analysis. The word "wise" is a positive evaluation, so the rule can be written as shown in rule 2. When the word


1.  WRD:congress {} → ENT {SBJ}
2.  WRD:wise {} → ADJ {COM}
3.  WRD:defeat {} → VER {COM_S, DER_O}
4.  WRD:attack {} → VER {DER_S}
5.  ADJ {DER} + ENT {SBJ+} → ENT {NEG}
6.  ADJ {COM} + ENT {SBJ+} → ENT {POS}
7.  ENT {SBJ} + ADJ {DER} → EXP {NEG}
8.  ENT {SBJ} + ADJ {COM} → EXP {POS}
9.  ENT {SBJ} + VER {COM_S} → VER {POS}
10. ENT {SBJ} + VER {DER_S} → VER {NEG}
11. VER {COM_O} + ENT {SBJ} → EXP {POS}
12. VER {DER_O} + ENT {SBJ} → EXP {NEG}

Figure 5. Some sample rules for ATA

"defeat" always implies a desirable effect on the subject constituent and an undesirable effect on the object constituent, the tags COM_S and DER_O are used to denote these respectively in rule 3. Similarly, the word "attack" expresses a negative feeling toward the subject constituent, so the tag DER_S is marked, as defined in rule 4. As shown in rule 8, a sentence like "the congress is wise" can be recognized as a positive expression on the specific subject "congress". Because of Super Parsing, the words "the" and "is" are ignored, but the expression can still be recognized. In the case of "the congress attacks the new policy", a human reader can feel the author's negative sentiment toward the congress, as interpreted in rule 10. In some rules, a postfix "+" follows a tag. It means that both the matched unit and the new unit generated by the rule should contain the tag. For example, the constituent matching the second daughter of rule 5 must have the tag SBJ, and the new unit generated by this rule will also have the tag SBJ. However, if a tag is presented without the postfix, the matched unit must contain it but the tag is not kept in the new constituent. All the other additional features of the matched units not required by the rule will be added into the new unit.
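The tag-matching behavior of such rules, including the "+" postfix and the carry-over of unrequired tags, can be sketched as follows. Regions and weights are omitted for brevity, and all names are illustrative, not the authors' implementation.

```python
# A unit carries a key feature and an additional tag set. A daughter
# requires a key feature plus tags; tags marked "+" are copied into the
# new unit, and any extra tags not required by the rule carry over too.

def match(unit, daughter):
    key, tags = daughter
    return unit["key"] == key and all(t.rstrip("+") in unit["tags"] for t in tags)

def deduce(rule, units):
    daughters, (new_key, new_tags) = rule
    if len(units) != len(daughters) or not all(map(match, units, daughters)):
        return None
    kept = set(new_tags)
    for u, (_, tags) in zip(units, daughters):
        required = {t.rstrip("+") for t in tags}
        plus = {t.rstrip("+") for t in tags if t.endswith("+")}
        kept |= plus | (u["tags"] - required)
    return {"key": new_key, "tags": kept}

# Rule 8: ENT{SBJ} + ADJ{COM} -> EXP{POS}
rule8 = ((("ENT", ["SBJ"]), ("ADJ", ["COM"])), ("EXP", ["POS"]))
ent = {"key": "ENT", "tags": {"SBJ"}}
adj = {"key": "ADJ", "tags": {"COM"}}
print(deduce(rule8, [ent, adj]))  # -> {'key': 'EXP', 'tags': {'POS'}}
```

Applying a rule like rule 6 (with SBJ+) would instead keep SBJ in the generated unit, which is what lets the subject tag propagate through iterated deductions.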

4.4 Sub Components

The Scanner is a component that provides initial linguistic units for ATA. When the Queue is empty, the Scanner is informed to offer a unit. It then scans the raw text, moving the reading pointer forward until it finds a word. The Scanner creates a new unit for the word, sends it into the Queue and waits for the next request. For instance, when the word "congress" is found at text location ⟨1, 2⟩, the component creates a unit ⟨⟨1, 2⟩, WRD, ∅, "congress"⟩. More details of the linguistic unit can be seen in section 4.2.

The Deduction Unit is the core of ATA, enforcing the Candidate List Algorithm. Each time, it obtains one unit from the Queue. In case the Queue is empty, the Deduction Unit informs the Scanner to generate a unit, and waits until a unit is ready in the Queue. For each newly accepted unit, the Deduction Unit runs the CLA for each rule and sends any new units back into the Queue.

The Peeker monitors the units sent from the Deduction Unit into the Queue. When it finds a constituent containing a sentiment tag (POS or NEG in this work), the Peeker sends the sentiment tag, as well as the weight of the constituent, to the Sentiment Accumulator. The Sentiment Accumulator sums up the weights of all the received units by sentiment tag. When the whole text has been scanned by ATA, the eventual weight sum for each tag is its sentiment indicator. These indicators are then passed to a classifier for sentiment label prediction.

Classifiers

Figure 1 also shows the sentiment label prediction with indicators. The classifier accepts the sentiment indicators as input and outputs a class label for the corresponding text. This work tests three different classifiers, namely a simple linear classifier (SLC for short), the Support Vector Machine (SVM) and the C4.5 algorithm. The classifier SLC is defined as follows:

    SLC(P, N; c) = { positive  if P − N ≥ c
                   { negative  otherwise          (3)

It receives the indicators on the two tags POS and NEG, and returns the label positive or negative. The threshold c is a parameter, chosen to make the best discrimination between the positive and negative instances of the training set. The classifier SLC is implemented in GNU C++.

Support vector machines (SVMs) [1] are a set of related supervised learning methods, applicable to both classification and regression. Basically, an SVM algorithm creates a maximum-margin hyperplane in a transformed input space. Given training examples labeled either positive or negative, the hyperplane splits the positive and negative training examples such that the distance from the closest examples (the margin) to the hyperplane is maximized. The SVM implementation in this work is the software package SVMTorch³, developed by Collobert [3, 2].

C4.5 [14] is a well-known decision tree algorithm. A decision tree is a predictive model, that is, a mapping from observations about an item to conclusions about the item's target value. Each inner node corresponds to a variable; an arc to a child represents a possible value of that variable. A leaf represents the predicted value of the target variable given
³ http://www.idiap.ch/machine_learning.php?content=Torch/en_SVMTorch.txt


the values of the variables represented by the path from the root. C4.5 uses the information gain ratio to select the best variable for node division. The software C4.5 Release 8, written by Quinlan⁴, is adopted in this work.

⁴ http://www.rulequest.com/Personal/c4.5r8.tar.gz

Experiments

This section uses some on-line documents to test the performance of ATA with different classifiers. The experimental documents cover two domains: politics and religion. All the documents were collected from www.google.com and groups-beta.google.com; the latter is a news-group search engine. Each document involves one of the two classes of subjects and has an overall attitude toward its subject (either positive or negative). A subject can be a political expression, a party, a politician, a religious concept or a religious leader, and an attitude can be either positive (desirable, praiseful) or negative (undesirable, critical). Three measures, namely precision, recall and accuracy, are employed to evaluate each experiment. Their definitions can be found in Figure 6.

                Classified as Positive    Classified as Negative
    Positive    A                         C
    Negative    B                         D

    Precision of Positive = A / (A + B)             (4)
    Recall of Positive    = A / (A + C)             (5)
    Precision of Negative = D / (D + C)             (6)
    Recall of Negative    = D / (D + B)             (7)
    Accuracy = (A + D) / (A + D + B + C)            (8)

Figure 6. Measures for evaluation

The corpus politics contains 826 articles; 672 of them are positive and 154 are negative. The corpus religion consists of 2856 articles; 620 of them are positive and 2236 are negative. For each corpus, 2/3 of the documents are randomly selected to construct the Deduction Rules. The classification takes a cross-validation method: 5-fold for politics and 10-fold for religion.

In the process of rule construction, grammar rules are created manually, while the word rules are built in a semi-automated way. For each given article, words and phrases will be recognized by a lexical analyzer (or a word segmentation program for Chinese text). For each corpus, the word list is sorted by frequency and given to a human reviewer, who selects the domain-specific keywords from the list and creates the rule for the corresponding word or phrase. As for the evaluative words, some English sentiment words are first collected from the General Inquirer (GI)⁵. Then, from WordNet (or HowNet for Chinese text), the synonyms of the known sentiment words are extracted. Since their part-of-speech tags and sentiment labels are given, the rule generation for evaluative words is quite straightforward.

Figure 7 shows the experimental results on the two corpora. Both experiments show a similar performance ranking: the SVM algorithm achieves the best performance, above 90%, better than the C4.5 algorithm. The SLC classifier gets the lowest accuracy, but it is still beyond 85%, which is reasonable enough for application.

Conclusions

This paper describes sentiment classification with review extraction. A Super-Parsing-based system, named Approximate Text Analysis, is employed to extract the sentiment expressions. The Candidate List Algorithm is proposed to traverse the constituent combinations incrementally. Three different classifiers, i.e. a simple linear classifier (SLC), the Support Vector Machine and the C4.5 algorithm, are tested for classifying the sentiment label given the sentiment indicators. The experiments on the on-line documents illustrate that the SVM algorithm achieves the best performance. Although SLC gets the lowest accuracy of the three classifiers, it is still reasonable for application.

Acknowledgments

The authors deeply thank Xuanjin Huang, Jin Min, and Xiaochun Wu of Fudan University for their valuable corpora. The anonymous reviewers from CIT 2005 are also greatly thanked for their informative advice on this paper. Gratitude is also given to the Science and Technology Commission of the Shanghai Municipal Government for their financial support (Project NO: 035115028).

References
[1] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.
⁵ http://www.wjh.harvard.edu/inquirer


    Classifier   Positive             Negative             Accuracy
                 Precision  Recall    Precision  Recall
    SLC          85.71%     96.48%    82.35%     57.89%    85.82%
    SVM          95.98%     93.07%    98.63%     79.55%    90.91%
    C4.5         95.98%     89.58%    50.98%     74.29%    87.64%

    (a) Experiment on corpus Politics

    Classifier   Positive             Negative             Accuracy
                 Precision  Recall    Precision  Recall
    SLC          83.98%     96.65%    99.20%     95.73%    95.90%
    SVM          86.09%     93.53%    98.88%     97.42%    96.85%
    C4.5         84.11%     92.03%    98.63%     97.05%    96.32%

    (b) Experiment on corpus Religion

Figure 7. Experimental results

[2] R. Collobert and S. Bengio. SVMTorch: A support vector machine for large-scale regression and classification problems. Journal of Machine Learning Research, 1:143-160, 2001.
[3] R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1:143-160, 2001.
[4] Z. Fei, J. Liu, and G. Wu. Sentiment classification using phrase patterns. In The Fourth International Conference on Computer and Information Technology (CIT'04), Wuhan, China, pages 1147-1152, Sept 2004.
[5] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, Aug 2004.
[6] M. Hu and B. Liu. Mining opinion features in customer reviews. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, San Jose, USA, July 2004.
[7] M. Johnson. Parsing with discontinuous constituents. In Proceedings of the 23rd Conference of the Association for Computational Linguistics, Chicago, Illinois, pages 127-132, 1985.
[8] N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima. Collecting evaluative expressions for opinion extraction. In Proceedings of the 1st International Joint Conference on Natural Language Processing, Sanya City, Hainan Island, China, pages 584-589, Mar 2004.
[9] J. Liu and G. Wu. Apparatus and method for approximate text analysis, 2005. Application NO: 200510023589.8, Chinese Patent.
[10] J. Liu and G. Wu. Apparatus and method for global reduction, 2005. Application NO: 200510023588.3, Chinese Patent.
[11] J. Liu, J. Yao, and G. Wu. Sentiment classification using information extraction technique. In Proceedings of the 6th International Symposium on Intelligent Data Analysis, Madrid, Spain, Sept 2005.
[12] I. Muslea. Extraction patterns for information extraction tasks: A survey. In The AAAI Workshop on Machine Learning for Information Extraction, 1999.
[13] T. Nasukawa and J. Yi. Sentiment analysis: Capturing favorability using natural language processing. In The Second International Conference on Knowledge Capture (K-CAP 2003), Sanibel Island, FL, USA, pages 70-77, Oct 2003.
[14] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[15] P. D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 417-424, 2002.
[16] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In The Third IEEE International Conference on Data Mining, November 2003.

