What is it?
Input: raw text on some topic
Output: the opinion expressed (positive, negative or neutral)
Why is it hard?
- The opinion has to be determined from the text as a whole, not just from the subject of the topic
We know
The Web holds an enormous amount of data
Topical categorization is an active research area
Why is it interesting?
Represents the voice of a broad audience on a particular topic
Examples: product reviews, movie reviews, book reviews
Important to business intelligence applications
- What do people (dis)like about the Nikon D40?
Naive approach
Idea: people tend to use certain words to express strong sentiment; compile a list of such words and rely on it to classify text
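The word-list idea can be sketched in a few lines of Python. This is only an illustration: the two word lists and the function name `classify_by_word_list` are hypothetical and are not taken from the slides, and a usable list would be far longer.

```python
import re

# Hypothetical hand-picked word lists (assumption, for illustration only).
POSITIVE_WORDS = {"love", "great", "excellent", "wonderful", "brilliant"}
NEGATIVE_WORDS = {"bad", "awful", "terrible", "boring", "worst"}


def classify_by_word_list(text):
    """Label text by comparing how many positive and negative list words it contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie, including the case where no list word occurs
```

A call such as `classify_by_word_list("A brilliant, wonderful film")` counts two positive words and no negative ones. The weakness is visible immediately: any text whose sentiment is carried by words outside the lists falls through to "neutral".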
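Naive Bayes
Assign to a given document d the class $c^* = \arg\max_c P(c \mid d)$
Naive Bayes rule: $P(c \mid d) = \frac{P(c)\,P(d \mid c)}{P(d)}$, with the features of d assumed conditionally independent given the class c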
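A minimal sketch of a Naive Bayes text classifier follows, assuming whitespace-tokenized documents, a multinomial word model, and add-one smoothing. These choices and the function names `train_naive_bayes` and `classify_naive_bayes` are assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label) pairs. Returns (log_prior, log_likelihood, vocab)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {tok for tokens, _ in docs for tok in tokens}

    # log P(c): fraction of training documents in each class
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}

    # log P(w | c) with add-one smoothing (an assumption of this sketch)
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify_naive_bayes(tokens, model):
    """Return the class maximizing log P(c) + sum of log P(w | c) over known words."""
    log_prior, log_likelihood, vocab = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

P(d) is the same for every class, so it is dropped from the comparison. Words never seen in training are skipped rather than scored.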
Maximum Entropy
Idea: make the fewest assumptions about the data while still being consistent with it
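For reference, a maximum entropy classifier is usually written in the exponential form below. The symbols are notation assumed here rather than taken from the slides: $\lambda_{i,c}$ are the learned feature weights, $F_{i,c}$ is a feature/class indicator function, and $Z(d)$ is the normalizer that makes the probabilities sum to one over classes.

```latex
P_{\mathrm{ME}}(c \mid d) = \frac{1}{Z(d)} \exp\!\Big( \sum_i \lambda_{i,c}\, F_{i,c}(d, c) \Big),
\qquad
Z(d) = \sum_{c'} \exp\!\Big( \sum_i \lambda_{i,c'}\, F_{i,c'}(d, c') \Big)
```

Unlike Naive Bayes, this form does not assume the features are independent of one another; the weights are fitted so that the model's expected feature values match those observed in the training data.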
Evaluations
Randomly selected 700 positive and 700 negative sentiment documents
Automatically removed rating indicators and extracted the textual information from the original HTML
Added NOT_ to every word between a negation word (not, isn't) and the first punctuation mark that follows
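The negation-tagging step might look like the sketch below. The slides name only "not" and "isn't" as negation words; the rest of `NEGATION_WORDS`, the set of punctuation marks, and the tokenizer regex are assumptions made for illustration.

```python
import re

# The slides name "not" and "isn't"; the other entries are assumed additions.
NEGATION_WORDS = {"not", "isn't", "no", "never", "didn't"}
PUNCTUATION = {".", ",", ":", ";", "!", "?"}


def mark_negation(text):
    """Prefix NOT_ to every word between a negation word and the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,:;!?]", text.lower())
    tagged = []
    negating = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False  # punctuation ends the negated span
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok in NEGATION_WORDS:
                negating = True  # start tagging from the next word
    return tagged
```

With this tagging, "like" and "NOT_like" become distinct features, so a unigram model can tell "I like it" from "I did not like it".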
Results
Conclusion
Unigram presence information turned out to be the most effective
The superiority of presence information over feature frequency indicates a difference between sentiment and topic categorization
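The difference between frequency and presence features can be made concrete with a small sketch. The function names and the toy vocabulary are hypothetical and serve only to illustrate the two encodings.

```python
from collections import Counter


def frequency_features(tokens, vocab):
    """One count per vocabulary word: how many times it occurs in the document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def presence_features(tokens, vocab):
    """One binary flag per vocabulary word: 1 if it occurs at all, else 0."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]
```

For the tokens `["good", "good", "good", "plot"]` and the vocabulary `["good", "bad", "plot"]`, the frequency encoding records that "good" occurs three times, while the presence encoding records only that it occurs.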