Professional Documents
Culture Documents
and are subsequently combined to find the users in the form of hashtags to express their opinions
sentiment polarity of the complete document. about current movies opinion. We then pre-processed
IV. Feature-level or aspect-level: It helps to analyze the dataset by :
what people are trying to suggest. Feature level • removing stop words,
tries to extract sentiment from the opinion directly. • removing all URLs (e.g. www.abc.com), hash
tags (e.g. #movie), targets (@username),
Recent studies have proven that with Twitter it is • Correct the spellings; sequence of repeated
possible to get people’s insight from their profiles in characters is to be handled.
contrast to traditional ways of obtaining information • Remove all punctuations ,symbols, numbers
about perceptions. Furthermore, authors of [4]
proposed an algorithm for exploiting the emotions After preprocessing ,Polarity and subjectivity were
from tweets while considering a large scale of data calculated using three different libraries,
for sentiment analysis. To identify social SentiWordNet, W-WSD and TextBlob. The
communities with influential impact, a novel method following steps explain the tools and techniques that
was proposed by [5] and implemented by assigning used us in sentiment analysis. Furthermore,
metric value to each of the user’s emotional posts. sentiment-analysis framework is explained in Figure
Subsequently, the contribution of this paper includes 1
the analysis of movies sentiments gathered from
Twitter profiles, with various sentiment analyzers. In A- Data Gathering
addition, this paper presents the comparisons of
results obtained from each analyzer. In our analysis To gather tweets from twitter based on our movies
comparison of different sentiment analyzers are done keywords to view the opinion of the public about
with the different classifiers. Sentiment prediction is these movies. For this, we used Tweepy API [7,12].
done on Twitter data to show which technique has a We have created an account on Tweepy API linked to
better capability. our Twitter account. To retrieve the tweets, Tweepy
API accepts parameters and provides the Twitter
II. LITERATURE SURVEY account’s data in return. Retrieved tweets, from
Twitter accounts, were further used for opinion
The analysis of hollywood movies was done on the mining. under the following fields: twitter_id,
basis of real-time Twitter data [6], extracted from hashtag, tweet_created, user_id, screen_name,
Twitter by using Twitter-streaming application tweet_text, retweet_count, follower_count, and
programming interface (API) [7]. favourite_count of each tweet. We select hashtags
Two sentiment analyzers named SentiWordNet [8] that were trending on Twitter, which represents
and WordNet [9] were used to find positive and public view for the movies.
negative scores. To add accuracy to the model,
negation handling [10] and word sequence def DownloadData(self):
disambiguation (WSD) were used. Twitter streaming # authenticating
API was also used to gather data by the authors of
[11] for the prediction of the Indonesian presidential consumerKey =
elections. The aim was to use Twitter data to 'tVIcrrTSV25b0L2KCRWsthbAZ'
understand public opinion. For this purpose, after the consumerSecret =
collection of data, the study performed automatic 'SFJSQrCao5prmbqWSGcT5YFsfKX'
buzzer detection to remove unnecessary tweets and accessToken = '1947267980-
then analyzed the tweets sentimentally by breaking fX8uOOkVTm9GKQ6rmqR3cE'
each tweet into several sub-tweets. After that, it accessTokenSecret =
calculated sentiment polarity and, to predict opinion '9nf4XXJUKx6URh9NQwOi77z8a'
outcome, used positive tweets associated with movie, auth = tweepy.OAuthHandler(consumerKey,
Twitter-based prediction was 0.61% better than the consumerSecret)
same type of surveys conducted traditionally. auth.set_access_token(accessToken,
accessTokenSecret)
III. METHODOLGY api = tweepy.API(auth)
In this paper we have explained a process from the # input for term to be searched and how many
collection, sentiment analysis, and classification of tweets to
Twitter opinions. We collects the tweets of particular search
movies & considered tweets that were posted by
searchTerm = input("Enter Keyword/Tag to we pre-processed from tweets, because all the above
search about: ") conditions will get information that was not a
NoOfTerms = int(input("Enter how many tweets requirement for sentiment analysis in our approach.
to search: "))
C. Polarity Calculation and Sentiment Analysis
# searching for tweets
self.tweets = tweepy.Cursor(api.search, From social media platforms we can collect valuable
q=searchTerm, lang = information of sentiment analysis by detecting
emotions or opinions from a large volume of data
"en").items(NoOfTerms) present in unstructured format. Sentiment analysis
includes five polarity classes, which are strong
negative, negative, neutral, positive, highly positive.
The polarity of each tweet
is determined by assigning a score from -1 to 1 based
on the words used, where a negative score from -1 to
Data Gathering (Tweepy API) -0.5 means a highly negative sentiment, negative
score from -0.5 to 0 means a negative sentiment, a
positive score from 0 to 0.5 means a positive
Preprocessing sentiment, a positive score from 0.5 to 1 means a
highly positive sentiment while the zero value is
considered a neutral sentiment.
Removing stop words
Removing all URLs We used Textblob, SentiWordNet and Word Sense
Disambiguation (WSD) sentiment analyzers for
Correct the spellings
detecting the polarity of movie reviews, and to give a
Remove all punctuations , clear view of the most accurate analyzer for the
symbols, numbers polarity calculator. Textblob [8] is a Python (2 and 3)
library for processing textual data. Natural language
Polarity Calculation processing (NLP) tasks such as stopword removal,
url removal, noun phrase extraction, part-of-speech
Text Blob Sentiwordnet WSD tagging, selling correction ,sentiment analysis,
classification, translation, and more are provided in a
simple API. Similarly, we used SentiWordNet [9],
which is a publicly available analyzer of the English
Twitter DataSet Sentimental Analysis language that contains opinions extracted from a
wordnet database. In addition to that, W-WSD [11]
Figure
Highly2. Framework Neutral
Positive of Negative
tweet sentiment Highly
analysis; has the ability to detect the correct word sense within
Positive Negative a specified context. Baseline words/Unigrams are
clear representatives for the calculation of polarity.
Consequently, we used Unigram data in WSD.
B. Data Pre-processing Despite admitting uncertain observations, WSD is far
better than other natural-language processing context
In the pre-processing step, we removed the irrelevant recognition functions.
Twitter data. First, we just kept tweets that were in
English. IV. RESULT AND DISCUSSION
Furthermore, we pre-processed the dataset by :
• Removing stop words, This section contains the result of sentimental
• Removing all URLs (e.g. www.abc.com), hash analysis of movie tweets along with a discussion of
tags (e.g. #movie), targets (@username), the results. At the first step tweets were collected and
• Correct the spellings; sequence of repeated analyzed from various sentiment analyzers of
characters is to be handled. sentiment analysis with the aim of determining the
• Remove all punctuations ,symbols, numbers (e.g. accuracy of these analyzers. The main purpose of
!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~). using a different sentiment analyzers such as
TextBlob, SentiWordNet, and WSD was to give a
comparison of results with different analyzer.
Tweets of movie Race 3 gathered from public lexicons. But, in this research we are providing best
accounts were 3000 in number. Among the three adopted sentiment analysis for movie opinion by
sentiment analyzers we compared in this research, we giving comparison between sentiment lexicons (W-
found that TextBlob had the highest rate of tweets WSD, SentiWordNet, TextBlob). As a result, we
with neutral sentiment, 1729 in number and 57.6 % in calculated sentiments from three analyzers named
percentage. SentiWordNet gave 377 & 197 the SentiWordNet, TextBlob, and W-WSD. Although
highest negative & highly negative sentiment rate, TextBlob results were relatively better as is clearly
12.5% & 6.6 % respectively, which can be viewed in shown in the results. In order to take our initiative to
Table 1 and Figure 3. next level, we will find the patterns of any product or
person based on Twitter reviews in future research.
REFERENCES
[1] S. Argamon, K. Bloom, A. Esuli, and F.
Sebastiani, “Automatically determining attitude
type and force for sentiment analysis,” in Human
Language Technology. Challenges of the
Information Society. Springer, 2009, pp. 218–
231.
[2] Boštjan Kaluža, “Machine Learning in Java”, first
published: Published by Packt Publishing Ltd,
UK, 2016.
[3] R. Feldman, “Techniques and applications for
Figure 3. Polarity calculation with each sentiment sentiment analysis,” Communications of the
analyzer. ACM, vol. 56, no. 4, pp. 82–89, 2013.
[4] Kanavos, A.; Nodarakis, N.; Sioutas, S.;
Table 1. Polarity calculation with sentiment analyzer Tsakalidis, A.; Tsolis, D.; Tzimas, G. Large
percentage accuracy. scale implementations for twitter sentiment
Sentiment Text classification. Algorithms 2017
Sentiwordnet WSD
Analyzer Blob [5] Kanavos, A.; Perikos, I.; Hatzilygeroudis, I.;
Highly 284 (9.5 Tsakalidis, A. Emotional community detection in
212 (7.07 %) 392 (13.1 %)
Positive %) social networks. Comput. Electr. Eng. 2017, 65,
591 542 (18.06 449–460.
Positive 828 (27.6 %) [6] U. R. Hodeghatta, ‘‘Sentiment analysis of
(19.7 %) %)
1729 1672 (55.7 Hollywood movies on Twitter,’’ in Proc.
Neutral 1476 (49.2 %) IEEE/ACM ASONAM, Aug. 2013, pp. 1401
(57.6 %) %)
249 ( 1404.
Negative 377 (12.5 %) 189 (6.3 %) [7]Twitter Apps. Available online:
8.3 %)
Highly 147 (4.9 http://www.tweepy.org/ (accessed on 26
197 (6.6 %) 114 3.8 %) February 2018).
Negative %)
[8] TextBlob Docs. Available online
https://textblob.readthedocs.io/en/dev/
[9] Miller, G.A.Wordnet: A lexical database for
V. CONCLUSIONS AND FUTURE english. Commun. ACM 1995, 38, 39–41.
WORK [10] Hogenboom, A.; Van Iterson, P.; Heerschop, B.;
Frasincar, F.; Kaymak, U. Determining negation
This paper focuses on the adoption of various scope and strength in sentiment analysis. In
sentiment analyzers to determine the approach with Proceedings of the 2011 IEEE International
the highest accuracy rate for learning about movie Conference on Systems, Man, and Cybernetics,
opinion. In a lexicon-based sentiment analysis, Anchorage, AK, USA, 9–12 October 2011; pp.
semantic orientation is of words, phrases or sentences 2589–2594.
calculated in a document. Polarity in the lexicon- [11] Ibrahim, M.; Abdillah, O.; Wicaksono, A.F.;
based method is calculated on the basis of the Adriani, M. Buzzer detection and sentiment
dictionary, that consists of a semantic score of a analysis for predicting presidential election
particular word. Considerable work has been done results in a twitter nation. In Proceedings of the
in the field of sentiment analysis from sentiment 2015 IEEE International Conference on Data