
Sentiment Analysis

An Overview of Concepts and Selected Techniques

Terms

Sentiment

A thought, view, or attitude, especially one
based mainly on emotion instead of reason

Sentiment Analysis

Also known as opinion mining
Use of natural language processing (NLP) and
computational techniques to automate the
extraction or classification of sentiment from
typically unstructured text

Motivation

Consumer information

Product reviews

Marketing

Consumer attitudes
Trends

Politics

Politicians want to know voters' views
Voters want to know politicians' stances and who else
supports them

Social

Find like-minded individuals or communities

Problem

Which features to use?

Words (unigrams)
Phrases/n-grams
Sentences

How to interpret features for sentiment detection?

Bag of words (IR)


Annotated lexicons (WordNet, SentiWordNet)
Syntactic patterns
Paragraph structure
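
The unigram and n-gram features listed above can be sketched as a simple bag-of-words counter. This is a minimal illustration, not a method from the slides: the whitespace tokenizer and the function names `ngrams` and `bag_of_words` are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Count all n-grams for n = 1..max_n; position in the text is discarded."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


features = bag_of_words("not a good movie")
```

Here the unigram `("good",)` looks positive on its own, while the negation only becomes visible in longer n-grams, which is one reason unigrams alone can miss sentiment.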

Challenges
Harder than topical classification, for which
bag-of-words features perform well
Must consider other features due to

Subtlety of sentiment expression

irony
expression of sentiment using neutral words

Domain/context dependence

words/phrases can mean different things in different
contexts and domains

Effect of syntax on semantics

Approaches

Machine learning

Naive Bayes
Maximum Entropy Classifier
SVM

Assume pairwise independent features

Markov Blanket Classifier

Accounts for conditional feature dependencies
Allowed reduction of discriminating features from
thousands of words to about 20 (movie review
domain)
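
As an illustration of the independence assumption, here is a minimal Naive Bayes sketch with add-one smoothing. The training sentences, labels, and function names are made up for the example; it is not the setup used in any of the cited papers.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs is a list of (tokens, label) pairs; returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # Each word contributes independently of the others (the "naive" step).
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data
train = [
    ("great fun great cast".split(), "pos"),
    ("dull plot bad acting".split(), "neg"),
    ("great acting".split(), "pos"),
    ("bad dull film".split(), "neg"),
]
model = train_naive_bayes(train)
```

Because each word's contribution is simply added in log space, the model cannot represent interactions such as negation, which is the kind of dependency the Markov Blanket approach is meant to capture.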

Unsupervised methods

Use lexicons

LingPipe Polarity Classifier

First eliminate objective sentences, then
use remaining sentences to classify
document polarity (reduce noise)

Uses unigram features extracted from
movie review data
Assumes that adjacent sentences are
likely to have similar subjective-objective
(SO) polarity
Uses a min-cut algorithm to efficiently
extract subjective sentences
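
The min-cut idea can be sketched as follows, under stated assumptions: each sentence has a hypothetical per-sentence probability of being subjective, adjacent sentences are tied together by a single association weight `assoc`, and a small Edmonds-Karp max-flow routine stands in for whatever solver the actual system uses. Sentences left on the source side of the minimum cut are treated as subjective.

```python
from collections import defaultdict, deque


def min_cut_subjective(p_subj, assoc, eps=1e-9):
    """Return indices of sentences on the source (subjective) side of a minimum cut."""
    source, sink = "s", "t"
    cap = defaultdict(dict)  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the reverse residual edge exists

    for i, p in enumerate(p_subj):
        add_edge(source, i, p)        # cut if sentence i is labeled objective
        add_edge(i, sink, 1.0 - p)    # cut if sentence i is labeled subjective
    for i in range(len(p_subj) - 1):
        add_edge(i, i + 1, assoc)     # cut if neighbors get different labels
        add_edge(i + 1, i, assoc)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    parent = reachable()
    while sink in parent:  # augment along shortest paths until none remain
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
        parent = reachable()

    return sorted(v for v in parent if isinstance(v, int))
```

With `assoc` set to 0 each sentence is labeled on its own score; raising it pulls a weakly objective sentence toward the label of its neighbors, which is the adjacency assumption stated above.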

Figure: Graph for classifying three items.

As accurate as the baseline while using only 22% of
the content in the test data (on average)
Metrics suggest properties of movie
review structure

SentiWordNet

Based on WordNet synsets

http://wordnet.princeton.edu/

Ternary classifier

Positive, negative, and neutral scores for each
synset

Provides a means of gauging sentiment for a text

SentiWordNet: Construction

Created training sets of synsets, Lp and Ln

Start with small number of synsets with fundamentally
positive or negative semantics, e.g., nice and nasty
Use WordNet relations, e.g., direct antonymy, similarity,
derived-from, to expand Lp and Ln over K iterations
Lo (objective) is set of synsets not in Lp or Ln

Trained classifiers on training set

Rocchio and SVM
Use four values of K to create eight classifiers with
different precision/recall characteristics
As K increases, precision (P) decreases and recall (R) increases
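
The seed-expansion step described above can be sketched roughly as below. The relation tables are tiny hand-made stand-ins, not real WordNet data, and the rule used (similarity keeps polarity, antonymy flips it) is a simplifying assumption for the example.

```python
def expand_seeds(pos_seeds, neg_seeds, similar, antonyms, k):
    """Grow the positive (Lp) and negative (Ln) sets for k iterations."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(similar.get(term, ()))   # similar terms keep polarity
            new_neg.update(antonyms.get(term, ()))  # antonyms flip polarity
        for term in neg:
            new_neg.update(similar.get(term, ()))
            new_pos.update(antonyms.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


# Hypothetical toy relations
similar = {"nice": ["pleasant"], "pleasant": ["agreeable"], "nasty": ["awful"]}
antonyms = {"nice": ["nasty"], "pleasant": ["unpleasant"]}
all_terms = {"nice", "pleasant", "agreeable", "nasty", "awful", "unpleasant", "table"}

lp, ln = expand_seeds({"nice"}, {"nasty"}, similar, antonyms, k=2)
lo = all_terms - lp - ln  # objective set: everything not reached
```

Each extra iteration reaches terms further from the seeds, so the sets grow but the added members are less reliably polar, which matches the precision/recall trade-off noted for larger K.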

SentiWordNet: Results

24.6% of synsets have Objective < 1.0

Many terms are classified with some degree of
subjectivity

10.45% have Objective <= 0.5
0.56% have Objective <= 0.125

Only a few terms are classified as definitively
subjective

Difficult (if not impossible) to accurately
assess performance

SentiWordNet: How to use it

Use score to select features (+/-)

e.g. Zhang and Zhang (2006) used words in
corpus with subjectivity score of 0.5 or greater

Combine pos/neg/objective scores to
calculate document-level score

e.g. Devitt and Ahmad (2007) conflated
polarity scores with a WordNet-based graph
representation of documents to create
predictive metrics
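
Both uses can be sketched together: keep only words whose subjectivity (1 minus the objective score) meets a threshold, then average positive minus negative over the kept words. The lexicon values below are made up for illustration and are keyed by word rather than by synset, which is a simplification; this is not the scoring used by either cited paper.

```python
# Hypothetical scores: word -> (positive, negative, objective), each triple sums to 1.0
LEXICON = {
    "good": (0.75, 0.0, 0.25),
    "terrible": (0.0, 0.875, 0.125),
    "plot": (0.0, 0.0, 1.0),
    "fine": (0.25, 0.125, 0.625),
}


def subjective_words(tokens, lexicon, threshold=0.5):
    """Keep known words whose subjectivity (1 - objective) is at least the threshold."""
    return [t for t in tokens if t in lexicon and 1.0 - lexicon[t][2] >= threshold]


def document_score(tokens, lexicon, threshold=0.5):
    """Average (positive - negative) over the selected words; 0.0 if none qualify."""
    words = subjective_words(tokens, lexicon, threshold)
    if not words:
        return 0.0
    return sum(lexicon[w][0] - lexicon[w][1] for w in words) / len(words)


tokens = "good plot terrible acting fine".split()
score = document_score(tokens, LEXICON)
```

A negative score suggests negative overall polarity, a positive score the opposite; the threshold controls how many weakly subjective words are allowed to vote.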

References
1. http://www.answers.com/sentiment, 9/22/08
2. B. Pang, L. Lee, and S. Vaithyanathan, Thumbs up? Sentiment
classification using machine learning techniques, in Proc. Conf.
on Empirical Methods in Natural Language Processing (EMNLP),
pp. 79-86, 2002.
3. A. Esuli and F. Sebastiani, SentiWordNet: A Publicly Available Lexical
Resource for Opinion Mining, in Proc. of LREC 2006, 5th Conf.
on Language Resources and Evaluation, 2006.
4. E. Zhang and Y. Zhang, UCSC on TREC 2006 Blog Opinion Mining,
TREC 2006 Blog Track, Opinion Retrieval Task.
5. A. Devitt and K. Ahmad, Sentiment Polarity Identification in Financial
News: A Cohesion-based Approach, ACL 2007.
6. B. Pang and L. Lee, A sentimental education: Sentiment
analysis using subjectivity summarization based on minimum
cuts, in Proc. of the 42nd Annual Meeting on Association for
Computational Linguistics, p. 271-es, July 21-26, 2004.
