
Aspect Based Sentiment Analysis Using NeuroNER and Bidirectional Recurrent Neural Network


Ngoc Minh Tran
Hanoi University of Science and Technology
Hanoi, Vietnam
light.pearl.gre@gmail.com

ABSTRACT

Nowadays, understanding the sentiment of what customers say, think and review plays an important part in the success of every business. Consequently, Sentiment Analysis (SA) has become vital from both academic and commercial standpoints in recent years. However, most current sentiment analysis approaches focus only on detecting the overall polarity of a whole sentence or paragraph. For that reason, this work focuses on another approach to the task: Aspect Based Sentiment Analysis (ABSA).

The proposed ABSA system has two main phases: aspect term extraction and aspect sentiment prediction. The first phase is treated as a named-entity recognition (NER) task and is performed by reusing the NeuroNER [1] program without any modifications, because it is currently one of the best NER tools available. For the sentiment prediction task, a bidirectional gated recurrent unit (BiGRU) Recurrent Neural Network (RNN) model is implemented, which processes 4 features as input: word embeddings, SenticNet [2], Part of Speech and Distance. However, the performance of this network architecture on the SemEval 2016 [3] dataset showed some drawbacks and limitations that influenced the polarity prediction result. For this reason, this work proposes some adjustments to the model to solve these problems and improve the accuracy of the second task.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence → Natural language processing

KEYWORDS
Sentiment Analysis, Gated Recurrent Units, Bidirectional Recurrent Neural Network

ACM Reference format:
Ngoc Minh Tran. 2018. Aspect Based Sentiment Analysis Using NeuroNER and Bidirectional Recurrent Neural Network. In SoICT ’18: Ninth International Symposium on Information and Communication Technology, December 6–7, 2018, Da Nang City, Viet Nam. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3287921.3287922

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
SoICT 2018, December 6–7, 2018, Danang City, Viet Nam
© 2018 Association for Computing Machinery
ACM ISBN 978-1-4503-6539-0/18/12…$15.00
https://doi.org/10.1145/3287921.3287922

1 INTRODUCTION
People's opinions and feelings about different things in their daily life are a huge data source that has attracted the interest of many companies. This data includes essential information such as reviews and opinions about a certain product or service, which companies collect in order to survive and improve in today's competitive business environment. Sentiment Analysis (SA) has turned out to be one of the best techniques for analyzing these reviews and opinions, as it can identify the sentiment polarity of the text.

Traditional approaches to SA aim to classify the sentiment polarity of textual documents as positive, negative or neutral. Usually, this classification is performed by considering the document as a single unit (document-level sentiment analysis) or by considering each sentence of a document (sentence-level sentiment analysis). In both cases, information about the aspects, i.e., features or properties of products or services, is not explored. However, the sentiment polarity might differ between aspects of the same product or service. For instance, in the sentence “I liked the image resolution of the TV, but its remote control is terrible”, there is a positive opinion about the image resolution and a negative opinion about the remote control. To deal with this type of scenario, we can use ABSA to enhance sentiment-based decision making through specific information about each aspect of a product or service.

ABSA is more complex and challenging than document-level or sentence-level sentiment analysis. It is normally divided into two main steps: extracting the terms/words related to the aspects, and performing sentiment polarity prediction for each identified aspect. Specifically, the Aspect Based Sentiment Analysis system in this work takes text sentences, which are reviews about a specific domain (e.g. restaurants, hotels), as input. The system then analyzes the sentiment of the input using a two-step method. Firstly, it extracts the named entities in the sentence with respect to their aspect. After that, it predicts the polarity of each entity in the

SoICT 2018, December 2018, Da Nang, Vietnam N. Tran

sentence in terms of three kinds of sentiment (positive, negative, neutral) to get the sentiment for each aspect.

There are currently two approaches to ABSA, and which one is used depends on the type of dataset given. The first kind of dataset gives only the aspects and their polarity for each training sentence (e.g. “The food is delicious and the waiters are helpful” {#food – positive} {#services – positive}). The second kind of dataset additionally gives the entity terms (e.g. “The food is delicious and the waiters are helpful” {food - #food – positive} {waiters - #services – positive}). In the former case, aspect extraction is treated as a label classification problem. In the latter case, aspect extraction must first perform named entity recognition to extract the term and then classify the aspect label for it. The objective of the second ABSA task, aspect polarity prediction, is the same for both kinds of dataset. Because the dataset used in this work, SemEval 2016, is of the second kind, the proposed ABSA system follows the latter approach.

In this work, the proposed ABSA system contains 2 subtasks. The first subtask is Aspect Term Extraction, for which the NeuroNER named entity recognition tool is used. For the second subtask, Aspect Polarity Prediction, a Bidirectional Gated Recurrent Units (BiGRU) neural network is implemented, and some modifications are added to the network to deal with its remaining problems, which are explored in the later sections of this paper.

The remainder of this paper is organized as follows: Sect. 2 discusses related work on ABSA. Sect. 3 presents the overall architecture of the proposed ABSA system. Sect. 4 discusses the BiGRU neural network of the Aspect Polarity Prediction subtask, its disadvantages, and the proposed solutions. Sect. 5 introduces the datasets and explains the experimental results and assessments. Sect. 6 concludes this research and provides plans for future work.

2 RELATED WORKS
This work is in line with the growing interest in applying Recurrent Neural Networks (RNN) to ABSA. According to Yin et al. [5], RNNs have recently performed better than Convolutional Neural Networks (CNN) in most Natural Language Processing tasks, including sentiment classification. This section describes three related approaches that inspired my work.

Firstly, Ruder et al. [6] present a system that uses a hierarchical bidirectional Long Short-Term Memory (BiLSTM) RNN. Specifically, this approach stacks two BiLSTM layers on top of each other and takes two features as input: word embeddings and aspect embeddings. The former feature is fed to the first layer, while the latter is fed to the layer above. The output of the stacked BiLSTM network is fed to the final layer to determine the polarity of the aspect. This system is reported to show competitive results without using any hand-engineered features or external resources.

Secondly, Jebbara and Cimiano [7] address the ABSA task with a two-step neural network architecture that contains the same two main subtasks mentioned above: aspect term extraction and aspect polarity prediction. The authors used a BiGRU RNN for both subtasks. In the first step, the neural network extracts aspects from the text by treating the problem as a sequence labeling task. In the second step, the network predicts the sentiment polarity label for each extracted aspect with respect to its context. It uses pretrained semantic word embedding features from WordNet [11] combined with part of speech and distance features. The system succeeded in combining the two tasks of ABSA into one system and showed a decent result for polarity prediction, but less so for aspect term extraction.

The third related paper provides the network architecture idea used to develop the model for the Aspect Polarity Prediction task. Nio and Murakami [8] present a sentiment classification model based on a BiLSTM network over three features: word embeddings, part of speech tags and SentiWordnet [9]. The architecture achieved a state-of-the-art result on a Japanese language dataset. This motivated me to transfer the architecture to an English dataset for the sentiment polarity prediction task.

This research presents a two-step system that solves the two subtasks like the work of Jebbara and Cimiano [7], and builds on the architecture of the third paper to develop the aspect sentiment classification for the second subtask on an English dataset.

3 SYSTEM ARCHITECTURE
The proposed ABSA system contains 2 main subtasks: Aspect Term Extraction and Aspect Polarity Prediction. In the first step, the named entity recognition tool NeuroNER is used to extract entity terms and their corresponding aspects from the text. The output of the first subtask is a predicted tag sequence in the IOB format, in which the aspect term is marked in the input sentence. This result is used to calculate the Distance feature, which represents the relative distance of each word in the sentence to the detected aspect term. In the second step, a recurrent neural network processes each extracted aspect with respect to its context and predicts a sentiment polarity label. Specifically, each word of the input sentence is represented as a vector concatenated from several features (Word Embeddings, SenticNet [2], Part-of-Speech tagging and Distance). The network then processes this sequence of feature vectors using a Bidirectional Gated Recurrent Units layer and a regular feed-forward layer. The output of the network is a single predicted polarity label for the aspect term of interest. The aspect term for which a polarity label is to be predicted is marked in the input sentence. Figure 1 shows the overall structure of the system.

For the first subtask, the system of Jebbara et al. [7] showed a significant drawback by yielding below 50 percent in F1 score, precision, and recall. The small number of correct aspect terms in the first subtask might lead to high sentiment prediction accuracy in the second subtask. That is the

Aspect Based Sentiment Analysis Using NeuroNER and BiRNN SoICT 2018, December 2018, Da Nang, Vietnam

reason why, instead of using a recurrent neural network to detect the aspect term in the input sentence, the named entity recognition tool NeuroNER is used. NeuroNER is an open source and freely available named entity recognition tool based on artificial neural networks. The tool takes the sentiment analysis input data, with the labeled entities of each sentence and their respective aspects as labels. The output of NeuroNER is the IOB tagging of the entities in each sentence together with their aspects. To evaluate the performance of NeuroNER, it was run on the SemEval 2016 English restaurant dataset. Compared with the results of other systems on the same dataset for the Aspect Term Extraction task, NeuroNER achieved a decent result, ranking 13th out of the 30 systems that participated in the competition. Table 1 shows the F1 score of NeuroNER compared with other systems in the aspect term extraction task for this dataset.

The second task of the system, Aspect Polarity Prediction, contains the majority of this work's contribution and is discussed in detail in the next section.

[Figure 1 diagram: the input sentence “The food was great, but the waiter was rude” passes through ASPECT TERM EXTRACTION (NeuroNER), which outputs the IOB tags O B O O O O B O O with the aspects #food and #service; ASPECT POLARITY PREDICTION (BiGRU) then outputs positive for “food” and negative for “waiter”.]

Figure 1: Overall system architecture contains 2 subtasks: Aspect Term Extraction and Aspect Polarity Prediction. $\vec{s}$, $\vec{w}$, $\vec{d}$, $\vec{p}$ denote the 4 feature vectors: SenticNet, word embeddings, distance and part of speech.

Table 1: Aspect Term Extraction result on SemEval 2016 English restaurant dataset [20]

System        F1      Ranking
NLANGP [18]   0.730   1/30
ESI           0.679   10/30
NeuroNER      0.674   13/30
IIT-T [19]    0.612   20/30
BUAP          0.372   30/30

4 ASPECT POLARITY PREDICTION
This section describes the second subtask of the ABSA system, Aspect Polarity Prediction. The main objective of this subtask is to predict the polarity of each aspect term detected by the first subtask, Aspect Term Extraction. To address this problem, as mentioned above, the network architecture proposed by Nio and Murakami [8] is used as a reference. In their work, the authors present a sentiment polarity prediction architecture that uses a BiLSTM RNN with three features as input: word embeddings, part of speech tagging and SentiWordnet. The architecture for the second ABSA subtask is based on this network, with some modifications. Instead of a BiLSTM RNN, a BiGRU RNN is used to process the input features. LSTM and GRU are the two most popular RNN models. However, GRU has a simpler architecture than LSTM and fewer parameters, so it demands less computation and trains faster, while still producing competitive results compared with LSTM in many NLP tasks. Additionally, the bidirectional model makes the network aware of both the previous and the subsequent context of the input data. For the input of the neural network, 4 features are used: word embeddings, SenticNet, Part of Speech tagging and Distance.

4.1 Features
4.1.1 Word Embeddings. This is the most important feature and has been successfully used in numerous NLP tasks. In this work, fastText [10], a library for learning word embeddings for 294 languages introduced by Facebook’s AI Research (FAIR) lab, is used. It applies the skip-gram model to a corpus of 50 thousand restaurant reviews collected by Mehrbod Safari to learn a word representation model. This model is then used to compute a 100-dimensional embedding vector for each word. The sequence of word embedding vectors for a sentence with words 1…N is denoted as:

$[w]_1^N = \{w_1, \dots, w_N\}$ with $w_i \in \mathbb{R}^{100}$    (1)

4.1.2 SenticNet. SenticNet 3 [2] is a publicly available semantic resource for conceptual and affective information conveyed by natural language. This resource gives nearly the same kind of information as SentiWordnet, which is used in the work of Nio and Murakami; however, it is more powerful. With a concept-level knowledge base of more than 100000 natural language concepts, SenticNet 3 can provide a semantic vector for each word


specifying 5 values: pleasantness, attention, sensitivity, aptitude, polarity. These scores are included in the model as an additional input source that the neural network can draw information from. Since SenticNet provides semantic and polarity information about a concept, the aspect polarity prediction system benefits from this knowledge. Consequently, a 5-dimensional feature vector, referred to as the sentic vector, is constructed for each word of the input sentence using SenticNet 3. If a word is outside the knowledge base, it is given a default score of zero for each of the 5 values. Each sentic vector $s_i$ is considered an additional word vector for word i. The sequence of sentic vectors for a sentence with words 1…N is denoted as:

$[s]_1^N = \{s_1, \dots, s_N\}$ with $s_i \in \mathbb{R}^{5}$    (2)

4.1.3 Part of Speech. Part of Speech is the next feature in the system besides word embeddings and sentic vectors. Based on the work of Nio and Murakami, this feature can aid sentiment polarity prediction. A 1-of-K coding scheme transforms each tag into a K-dimensional vector that represents that tag. Specifically, in this research, the NLTK POS Tagger [12], which has a total of 35 tags, is used. These vectors are then concatenated with their respective word vectors before being fed to the neural network. The sequence of POS tag vectors for a sentence with words 1…N is denoted as:

$[p]_1^N = \{p_1, \dots, p_N\}$ with $p_i \in \mathbb{R}^{35}$    (3)

4.1.4 Distance. This feature has been used in relation extraction [13] and Semantic Role Labeling [14]. The idea behind it is to give the model information about the position of the aspect inside the sentence and its relative distance to the other words. For this task, a technique similar to the one used for relation extraction and Semantic Role Labeling is applied. Each word is tagged with a number showing its relative distance to the aspect term:

This   restaurant   has   great   service
 -1        0         1      2        3

In this example, the word “restaurant” is the aspect term for which we want to predict the polarity. The relative distance to the aspect term is shown below each word. Words to the left of the aspect term receive negative values and words to its right receive positive values. In this system, the raw value for each word is taken as input for the network by concatenating it with the other 3 features. The sequence of distance values for a sentence with words 1…N is denoted as:

$[d]_1^N = \{d_1, \dots, d_N\}$    (4)

4.2 Model
Assume we have a sentence with aspect terms extracted by the first subtask. After all the feature vectors for each word of the input sentence have been obtained, they are concatenated so that each word becomes a 141-dimensional vector. The sequence of input vectors for a sentence with N words is denoted as:

$[u]_1^N = \{(w_1, s_1, p_1, d_1)^T, \dots, (w_N, s_N, p_N, d_N)^T\}$ with $u_i \in \mathbb{R}^{100+5+35+1}$    (5)

The resulting sequence is then fed to the BiGRU layer, which produces an output sequence of recurrent states:

$[g]_1^N = \mathrm{BiGRU}([u]_1^N) = \{(\overrightarrow{g}_1, \overleftarrow{g}_1)^T, \dots, (\overrightarrow{g}_N, \overleftarrow{g}_N)^T\}$ with $\overrightarrow{g}_i, \overleftarrow{g}_i \in \mathbb{R}^{25}$    (6)

One layer processes the input from left to right and the other layer processes it in the reverse order. The final states of the forward and backward GRU are then concatenated to obtain a fixed-size representation $h = (\overrightarrow{g}_N, \overleftarrow{g}_1)^T \in \mathbb{R}^{50}$ of the aspect term. This representation is passed to a densely connected feed-forward layer, producing another hidden representation. Finally, a densely connected layer with a softmax activation function maps that hidden representation to a 3-dimensional vector representing a probability distribution over the three polarity labels: positive, negative and neutral. The label with the highest probability is chosen as the predicted label for the aspect. To update the parameters and optimize the model, the Adam [15] optimizer is used.

Figure 2 shows the neural network architecture for the Aspect Polarity Prediction task.

Figure 2: Neural network architecture for the Aspect Polarity Prediction subtask. The network processes the input sentence as a sequence of word embedding $w_i$, distance $d_i$, sentic $s_i$ and part of speech $p_i$ vectors concatenated together, using a BiGRU layer and a regular feed-forward layer. The output of the network is a single predicted polarity tag for the currently processed aspect.

4.3 Drawbacks and Proposed Solutions
The network architecture above was tested on the SemEval 2016 dataset. Although it produced a decent accuracy of 68 percent on the test dataset, the terms whose polarity was predicted incorrectly pointed to two limitations of the architecture that affected its accuracy. This section discusses these problems, with incorrectly predicted examples from the dataset, and the proposed solutions for them.
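As a concrete reference for the input representation of Sections 4.1 and 4.2, on which the following problems build, the sketch below assembles the 141-dimensional vector of equation (5) for each token. It is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the placeholder tag set, the toy embedding and sentic lookups, and the zero vector for words missing from the embedding table are all hypothetical.

```python
from typing import Dict, List, Sequence

WORD_DIM, SENTIC_DIM, POS_DIM = 100, 5, 35  # dimensions stated in Section 4.1


def distance_feature(num_tokens: int, aspect_index: int) -> List[int]:
    """Signed token offset of every position relative to the aspect term."""
    return [i - aspect_index for i in range(num_tokens)]


def one_hot(tag: str, tagset: Sequence[str]) -> List[float]:
    """1-of-K coding: a K-dimensional vector with a single 1 at the tag's index."""
    vec = [0.0] * len(tagset)
    vec[tagset.index(tag)] = 1.0
    return vec


def sentic_vector(word: str, sentic: Dict[str, List[float]]) -> List[float]:
    """Five sentic scores for a word, or five zeros if it is not in the lookup."""
    return sentic.get(word.lower(), [0.0] * SENTIC_DIM)


def build_inputs(tokens, pos_tags, aspect_index, embed, sentic, tagset):
    """Concatenate (w_i, s_i, p_i, d_i) into one 141-dimensional vector per token."""
    dists = distance_feature(len(tokens), aspect_index)
    rows = []
    for tok, tag, d in zip(tokens, pos_tags, dists):
        # Assumption of this sketch: unknown words get an all-zero embedding.
        w = embed.get(tok.lower(), [0.0] * WORD_DIM)
        rows.append(w + sentic_vector(tok, sentic) + one_hot(tag, tagset) + [float(d)])
    return rows


# Toy data: placeholder tag names stand in for the 35 tags of the POS tagger.
TAGSET = [f"TAG{i}" for i in range(POS_DIM)]
tokens = ["This", "restaurant", "has", "great", "service"]
pos_tags = ["TAG0", "TAG1", "TAG2", "TAG3", "TAG1"]
embed = {"restaurant": [0.1] * WORD_DIM}
sentic = {"great": [0.5, 0.1, 0.0, 0.6, 0.8]}

inputs = build_inputs(tokens, pos_tags, 1, embed, sentic, TAGSET)
print(distance_feature(len(tokens), 1))  # [-1, 0, 1, 2, 3]
print(len(inputs), len(inputs[0]))       # 5 141
```

With “restaurant” (index 1) as the aspect term, the distance values reproduce the example of Section 4.1.4, and every token vector has 100 + 5 + 35 + 1 = 141 entries. The resulting sequence is what a BiGRU layer would consume.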


4.3.1 Problem 1: Predicting the incorrect polarity for sentences that have two or more aspect terms with different polarities:

Example 1:
“Nice ambiance but highly overrated price”
Predicted: ambiance – positive; price – positive
Correct: ambiance – positive; price – negative

Example 2:
“The food was great, the margaritas too but the waitress was too busy being nice to her other larger party”
Predicted: food – positive; margaritas – positive; waitress – positive
Correct: food – positive; margaritas – positive; waitress – negative

The cause of this problem might be that the neural network takes the whole sentence as input. In a sentence that has two or more aspect terms, only some of the words, those related to a given aspect term, decide its polarity. The idea of the solution is therefore to limit the range of words surrounding the aspect term that are passed to the network when predicting the polarity of the term. Consider one incorrectly predicted example from the dataset (the aspect terms are “ambiance” and “price”):

“Nice ambiance but highly overrated price”

The architecture above predicted positive polarity for both “ambiance” and “price” because it takes all the words in the sentence as input to predict the polarity. This may lead to a situation in which the network uses the positive SenticNet score of the adjective “Nice” to decide the polarity of the term “price”. As a result, the term “price” got the wrong polarity, positive. That is why the range of input words is limited: only the words related to each aspect term should be passed to the network for prediction. If the network only gets “nice ambiance” and “highly overrated price” as input, the predicted polarities for “ambiance” and “price” will be correct.

The technique that improved the system performance is called “Nearest Adjective”. This technique limits the input word window of an aspect term to the span from the term itself to the adjective nearest to it. If the sentence does not contain any adjectives, all the words of the sentence are kept as the input of the network, as in the original version. Below is an example of how this technique works (the aspect terms are “food” and “service”; the adjectives are “delicious” and “terrible”):

Sentence: “The food is delicious but the service is terrible”
Term 1 window: “food is delicious”
Term 2 window: “service is terrible”

4.3.2 Problem 2: Predicting the incorrect polarity when sentences have negation words:

Example 1:
“I liked the atmosphere very much but the food was not worth the price”
Predicted: atmosphere – positive; food – positive
Correct: atmosphere – positive; food – negative

Example 2:
“The food was not great and the waiters were rude”
Predicted: food – positive; waiter – negative
Correct: food – negative; waiter – negative

The cause of this problem is that the sentence contains negation words such as “not” or “neither”. Because the neural network had no mechanism for dealing with negation, it could predict the wrong polarity for aspect terms in sentences that include negation words.

To solve this problem, a list of negation words and the two lists of positive and negative words proposed by Minqing Hu and Bing Liu [16] are used. Each input word window is given a score called the sentiment score. For each word of the window that is in the negative list, 1 is subtracted from the score, and for each word in the positive list, 1 is added. For each sentence that has negation words, the system finds the nearest aspect term for each negation word. The sentiment score of the aspect term linked to that negation word is multiplied by minus 1. The final sentiment score is treated as a 1-dimensional feature vector and concatenated to the word vector that already contains the other 4 features. The new input vector for each word of a sentence with N words thus consists of a 100-dimensional word embedding vector, a 5-dimensional sentic vector, a 35-dimensional part of speech vector, a 1-dimensional distance value and a 1-dimensional sentiment score:

$[u]_1^N = \{(w_1, s_1, p_1, d_1, \mathrm{score}_1)^T, \dots, (w_N, s_N, p_N, d_N, \mathrm{score}_N)^T\}$ with $u_i \in \mathbb{R}^{100+5+35+1+1}$    (7)

Below is an example that shows how this solution works on the sentence “The food was not great and the waiters were rude”:

Nearest aspect to negation word “not”: food
Sentiment score for input “food was not great”:
(0+0+0+1) x -1 = -1
Sentiment score for input “waiters were rude”:
(0+0-1) = -1

5 EXPERIMENTS AND RESULTS

5.1 Dataset
The dataset used in this task is the English restaurant dataset from SemEval-2016 Task 5, which is used for Aspect Based Sentiment Analysis. Each sentence in the training dataset contains a pair of entity E and attribute A towards which an opinion is expressed. E and A are taken from the inventories of entity types (e.g. restaurant, food, drinks) and attribute labels (e.g. prices, quality). Each such E and A pair defines an aspect and is assigned a polarity from the set {negative, positive, neutral}.

5.2 Results and Evaluation


This section shows experimental results for the second subtask of the ABSA system: Aspect Polarity Prediction. Following the architecture and the improvements described in Section 4, the results of three versions of the system are presented in Table 2. The first is the BiGRU RNN network without any modification. The second is the network combined with the Nearest Adjectives technique to solve problem 1. The third is the second one combined with the solution for the negation problem. The accuracy is calculated as the number of correct polarity predictions over the set of aspect terms correctly extracted in subtask 1. All experiments were performed with the deep learning library Keras [17] and use many of its implemented algorithms.

After the experimental results for the three versions of the second subtask, the improved architecture is evaluated and the model result is compared with the other models submitted for SemEval-2016 Task 5.

Table 2. Aspect Polarity Prediction accuracy for three versions. The 1st version is the raw BiGRU RNN. NA denotes the Nearest Adjectives solution and Negation is the solution for the negation words problem.

Version               Train Acc.   Test Acc.
1st version           0.741        0.682
1st + NA              0.742        0.724
1st + NA + Negation   0.850        0.787

Based on the results of these experiments, the solutions for the two problems of the BiGRU RNN architecture worked: the Nearest Adjectives (NA) technique raised test accuracy by about 4 percentage points (0.682 to 0.724), and combining it with the negation technique raised accuracy by roughly 10 percentage points over the first version on both the training set (0.741 to 0.850) and the test set (0.682 to 0.787). Table 4 shows some examples from the dataset for which the improved version (NA + Negation) fixed the incorrect prediction of the first version.

Compared with the other system results submitted to SemEval 2016, as shown in Table 3, the accuracy of the proposed system ranked 18th out of 30 systems, about 2.4 percentage points below the 10th-ranked system (0.811) and about 9.4 percentage points below the top-ranked system (0.881).

Table 3. Aspect Polarity Prediction result on SemEval 2016 English restaurant dataset [20]

System        Accuracy   Ranking
XRCE [21]     0.881      1/30
SeemGo        0.811      10/30
This system   0.787      18/30
LeeHu         0.781      20/30
BUAP          0.608      30/30

Table 4. Example sentences from the dataset that have incorrect aspect polarity predictions in the first version and correct aspect polarity predictions after applying the proposed solutions

Example sentence / aspect                          First version   Improved version
“The food is great, the margaritas is good too, but the waiters were busy being nice to others”
    Food                                           Positive        Positive
    Margaritas                                     Positive        Positive
    Waiters                                        Positive        Negative
“Nice ambiance but highly overrated place”
    Ambiance                                       Positive        Positive
    Place                                          Positive        Negative
“The restaurant is nice and the services are not bad”
    Restaurant                                     Positive        Positive
    Services                                       Neutral         Positive
“Neither pizza was good enough to serve last night”
    Pizza                                          Positive        Negative

6 CONCLUSION
In this work, a two-phase ABSA system that contains two subtasks, Aspect Term Extraction and Aspect Polarity Prediction, is proposed. The first subtask is handled by the named entity recognition tool NeuroNER. The second subtask is addressed with a bidirectional Gated Recurrent Unit Recurrent Neural Network. To improve the performance of the system, two techniques that address the remaining problems of the model are proposed. The solutions achieved a significant accuracy increase for the second subtask and a decent ranking compared with other systems working on the same dataset.

Although the proposed solutions showed good results for the system performance, there is still room for improvement, as the solutions fixed only part of the incorrect predictions. In the future, I plan to research better methods for these problems to further improve the accuracy of the system.

REFERENCES
[1] Frank Dernoncourt, Ji Young Lee, Peter Szolovits. 2017, NeuroNER: an easy-to-use program for named-entity recognition based on neural networks, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 97-102.
[2] Erik Cambria, Daniel Olsher, Dheeraj Rajagopal. 2014, SenticNet 3: A Common and Common-sense Knowledge Base for Cognition-Driven Sentiment Analysis, Proceeding AAAI'14 Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 1515-1521


[3] Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos.
2016, SemEval-2016 Task 5: Aspect Based Sentiment Analysis, Proceedings of
SemEval 2016, 19-20
[4] Bollegala, Danushka, Weir David, and Carroll John. 2011, Using multiple
sources to construct a sentiment sensitive thesaurus for cross-domain sentiment
classification. Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies, 1,
132–141
[5] Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schutze. 2017, Comparative
Study of CNN and RNN for Natural Language Processing, arXiv:1702.01923
[6] Sebastian Ruder, Parsa Ghaffari, John G Breslin. 2016, A Hierarchical Model
of Reviews for Aspect-based Sentiment Analysis, Proceedings of the 2016
Conference on Empirical Methods in Natural Language Processing, 999-1005
[7] Soufian Jebbara, Philipp Cimiano. 2016, Aspect-Based Relational Sentiment
Analysis Using a Stacked Neural Network Architecture, ECAI 2016 - 22nd
European Conference on Artificial Intelligence, 29 August-2 September 2016,
The Hague, The Netherlands - Including Prestigious Applications of Artificial
Intelligence (PAIS 2016). 1123 - 1131
[8] Lasguido Nio, Koji Murakami. 2018, Japanese Sentiment Classification Using
Bidirectional Long Short-Term Memory Recurrent Neural Network,
Proceedings of the 24th Annual Meeting Association for Natural Language
Processing, 1119-1122
[9] Andrea Esuli, Fabrizio Sebastiani. 2006, SentiWordnet: A Publicly Available
Lexical Resource for Opinion Mining, Proceedings of the 5th Conference on
Language Resources and Evaluation (LREC’06), 417-422
[10] Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov. 2016,
Enriching Word Vectors with Subword Information, Transactions of the
Association for Computational Linguistics – Volume 5, Issue 1, 135-146
[11] Fellbaum, C. 2005, Wordnet and wordnets. Encyclopedia of Language and
Linguistics, 665-670. Elsevier, Oxford
[12] Steven Bird, Edward Loper. 2004, NLTK: The Natural Language Toolkit,
Proceeding ETMTNLP '02 Proceedings of the ACL-02 Workshop on Effective
tools and methodologies for teaching natural language processing and
computational linguistics - Volume 1, 63-70
[13] Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J. 2014, Relation Classification via
Convolutional Deep Neural Network, Proceedings of the 25th International
Conference on Computational Linguistics (COLING), 2335-2344
[14] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.
2011, Natural language processing (almost) from scratch. Journal of Machine
Learning Research 12, 2493-2537
[15] Diederik P. Kingma, Jimmy Lei Ba. 2014, Adam: A Method for Stochastic
Optimization, Proceedings of the 3rd International Conference on Learning
Representations (ICLR), arXiv preprint arXiv:1412.6980
[16] Minqing Hu, Bing Liu. 2004, Mining and Summarizing Customer Reviews,
Proceeding KDD '04 Proceedings of the tenth ACM SIGKDD international
conference on Knowledge discovery and data mining, 168-177
[17] Francois Chollet. 2015, Keras - theano-based deep learning
library. https://github.com/fchollet/keras
[18] Zhiqiang Toh, Jian Su. 2016, NLANGP at SemEval-2016 Task 5: Improving
Aspect Based Sentiment Analysis using Neural Network Features, Proceedings
of the 10th International Workshop on Semantic Evaluation (SemEval-2016),
282-288
[19] Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann. 2016,
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining
Domain Dependency and Distributional Semantics Features for Aspect Based
Sentiment Analysis, Proceedings of the 10th International Workshop on
Semantic Evaluation (SemEval-2016), 1129-1135
[20] Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos,
Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan
Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki,
Xavier Tannier, Natalia Loukachevitch, Evgeny Kotelnikov, Nuria Bel, Salud
María Jiménez-Zafra, Gülşen Eryiğit. 2016, SemEval-2016 Task 5: Aspect
Based Sentiment Analysis, 2016, Proceedings of the 10th International
Workshop on Semantic Evaluation (SemEval-2016), 19-30
[21] Caroline Brun, Julien Perez, Claude Roux. 2016, XRCE at SemEval-2016 Task
5: Feedbacked Ensemble Modelling on Syntactico-Semantic Knowledge for
Aspect Based Sentiment Analysis, Proceedings of the 10th International
Workshop on Semantic Evaluation (SemEval-2016), 277-281
