Aspect Based Sentiment Analysis Using NeuroNER and BiRNN
SoICT 2018, December 2018, Da Nang, Vietnam N. Tran
sentence in terms of three kinds of sentiment (positive, negative, neutral) to get the sentiment for each aspect.

There are two current approaches to ABSA, used depending on the type of the given dataset. The first kind of dataset gives only the aspects and their polarity for each training sentence (e.g. "The food is delicious and the waiters are helpful" {#food – positive} {#services – positive}). The second kind of dataset additionally gives the entity terms (e.g. "The food is delicious and the waiters are helpful" {food - #food – positive} {waiters - #services – positive}). In the former case, aspect extraction is treated as a label classification problem. In the latter case, aspect extraction must first perform named entity recognition to extract each term and then classify its aspect label. The objective of the second ABSA task, aspect polarity prediction, is the same for both kinds of dataset. Because the dataset used in this work, SemEval 2016, is of the second kind, the proposed ABSA system follows the latter approach.

In this work, the proposed ABSA system contains two subtasks. The first subtask is Aspect Term Extraction, for which the named entity recognition tool NeuroNER is used. For the second subtask, Aspect Polarity Prediction, a Bidirectional Gated Recurrent Units (BiGRU) neural network is implemented, with some modifications added to address its remaining problems; these are explored further in later sections of this paper.

The remainder of this paper is organized as follows: Sect. 2 discusses related work on ABSA. Sect. 3 presents the overall architecture of the proposed ABSA system. Sect. 4 discusses the BiGRU neural network of the Aspect Polarity Prediction subtask, its disadvantages, and the proposed solutions. Sect. 5 introduces the datasets and explains the experimental results and assessments. Sect. 6 concludes this research and outlines future work.

2 RELATED WORKS
This work is in line with the growing interest in applying Recurrent Neural Networks (RNNs) to ABSA. RNNs have recently outperformed Convolutional Neural Networks (CNNs) on most Natural Language Processing tasks, including sentiment classification, according to Yin et al., 2017 [5]. This section discusses three related approaches that inspired this work.

Firstly, Ruder et al. [6] present a system that uses a hierarchical bidirectional Long Short-Term Memory (BiLSTM) RNN. This approach stacks two BiLSTM layers and takes two features as input: word embeddings and aspect embeddings. The former is fed to the first layer while the latter is fed to the layer above it. The output of the stacked BiLSTM network is fed to a final layer that determines the polarity of the aspect. This system is reported to show competitive results without using any hand-engineered features or external resources.

Secondly, Jebbara and Cimiano [7], addressing the ABSA task, propose a Two-Step Neural Network architecture that contains the two main subtasks mentioned above: aspect term extraction and aspect polarity prediction. The authors used a BiGRU RNN for both subtasks. In the first step, the neural network extracts aspects from the text by treating the problem as a sequence labeling task. In the second step, the network predicts the sentiment polarity label for each extracted aspect with respect to its context, using pretrained semantic word embedding features from WordNet [11] combined with part-of-speech and distance features. The system succeeded in combining the two ABSA tasks into one system and showed a decent impact for polarity prediction, but less so for aspect term extraction.

The third related paper provides the network architecture idea used to develop the model for the Aspect Polarity Prediction task. Nio and Murakami [8] present a sentiment classification model based on a BiLSTM network over three features: word embeddings, part-of-speech tags, and SentiWordNet [9]. The architecture achieved state-of-the-art results on a Japanese dataset, which motivated me to adapt it to an English dataset for the sentiment polarity prediction task.

This research presents a two-step system that solves the two subtasks as in Jebbara and Cimiano's work [7] and builds on the architecture of the third paper to develop the aspect sentiment classifier for the second subtask on an English dataset.

3 SYSTEM ARCHITECTURE
The proposed ABSA system contains two main subtasks: Aspect Term Extraction and Aspect Polarity Prediction. In the first step, the named entity recognition tool NeuroNER is used to extract entity terms and their corresponding aspects from the text. The output of the first subtask is a predicted tag sequence in the IOB format, with the aspect terms outlined in the input sentence. This result is used to calculate the Distance feature, which represents the relative distance of each word in the sentence to the detected aspect term. In the second step, a recurrent neural network processes each extracted aspect with respect to its context and predicts a sentiment polarity label. Specifically, each word of the input sentence is represented as a concatenated vector of several features (Word Embeddings, SenticNet [2], Part-of-Speech tags, and Distance). The network then processes this sequence of input feature vectors using a Bidirectional Gated Recurrent Units layer and a regular feed-forward layer. The output of the network is a single predicted polarity label for the aspect term of interest; the aspect term for which a polarity label is to be predicted is outlined in the input sentence. Figure 1 shows the overall structure of the system.

For the first subtask of the system, Jebbara and Cimiano's system [7] showed a significant drawback, yielding below 50 percent in F1, precision, and recall. The small number of correctly extracted aspect terms in the first subtask might lead to a high sentiment prediction accuracy in the second subtask. That is the
reason why, instead of using a recurrent neural network to detect the aspect terms in the input sentence, the named entity recognition tool NeuroNER is used. NeuroNER is an open-source, freely available named entity recognition tool based on an artificial neural network. This tool takes the sentiment analysis input data with pretrained entities for each sentence and their respective aspects as labels. The output of NeuroNER is the IOB entity tagging for each sentence and its aspects. To evaluate the performance of NeuroNER, it is run on the SemEval 2016 English restaurant dataset. Compared with the other systems evaluated on the same dataset for the Aspect Term Extraction task, NeuroNER achieved a decent result, ranking 13th out of the 30 systems that participated in the competition. Table 1 shows the F1 score of NeuroNER compared with the other systems on the aspect term extraction task for this dataset.

The second subtask of the system, Aspect Polarity Prediction, contains the majority of this work's contribution and will be discussed in detail in the next section.

Figure 1: Overall structure of the proposed ABSA system. (In the example, the input sentence "The food was great, but the waiter was rude" is tagged by Aspect Term Extraction (NeuroNER) as "O B O O O O B O O" with aspects #food and #service; Aspect Polarity Prediction (BiGRU) then consumes the feature vectors w, s, p, d and predicts positive for food and negative for waiter.)

Table 1: Aspect Term Extraction results on the SemEval 2016 English restaurant dataset [20]

Version       F1     Ranking
NLANGP [18]   0.730  1/30
ESI           0.679  10/30
NeuroNER      0.674  13/30
IIT-T [19]    0.612  20/30
BUAP          0.372  30/30

4 ASPECT POLARITY PREDICTION
This section describes the second subtask of the ABSA system, Aspect Polarity Prediction. The main objective of this subtask is to predict the polarity of each aspect term detected by the first subtask, Aspect Term Extraction. To address this problem, as mentioned above, the network architecture proposed by Nio and Murakami [8] is taken as a reference. In their work, the authors present a sentiment polarity prediction architecture that uses a BiLSTM RNN with three input features: word embeddings, part-of-speech tags, and SentiWordNet. Based on this network, the architecture for the second ABSA subtask is implemented with some modifications. Instead of a BiLSTM RNN, a BiGRU RNN is used to process the input features. LSTM and GRU are the two most popular RNN cell types; however, GRU has a simpler architecture than LSTM and fewer parameters, and is thus computationally less demanding and faster to train while still producing results competitive with LSTM on many NLP tasks. Additionally, the bidirectional model allows the network to be aware of both the previous and subsequent context of the input data. Four features are used as input to the neural network: word embeddings, SenticNet, part-of-speech tags, and distance.

4.1 Features
4.1.1 Word Embeddings. This is the most important feature and has been successfully used in numerous NLP tasks. In this work, fastText [10], a library for learning word embeddings for 294 languages introduced by Facebook's AI Research (FAIR) lab, is used. It applies the skip-gram model to a corpus of 50 thousand restaurant reviews collected by Mehrbod Safari to learn the word representation model. This model is then used to compute a 100-dimensional embedding vector for each word. The sequence of word embedding vectors for a sentence with words 1…N is denoted as:

$[w]_1^N = \{w_1, \dots, w_N\}$ with $w_i \in \mathbb{R}^{100}$   (1)
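As an illustration of the lookup that turns a tokenized sentence into the sequence $[w]_1^N$ (this is not the actual fastText training code; the vocabulary and random vectors below are made-up stand-ins for the trained restaurant-review model):

```python
import random

EMB_DIM = 100  # embedding dimensionality used in the paper

# Hypothetical stand-in for the trained fastText model:
# a plain dict mapping each known word to a 100-dimensional vector.
random.seed(0)
embeddings = {
    word: [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    for word in ["the", "food", "was", "great", "but", "waiter", "rude"]
}

def embed_sentence(tokens):
    """Map each token to its embedding vector (zeros for unknown words)."""
    return [embeddings.get(t.lower(), [0.0] * EMB_DIM) for t in tokens]

sentence = "The food was great , but the waiter was rude".split()
w_seq = embed_sentence(sentence)  # this is [w]_1^N for the example sentence
assert len(w_seq) == len(sentence)
assert all(len(w) == EMB_DIM for w in w_seq)
```

A real fastText model additionally composes subword n-gram vectors, so it can embed out-of-vocabulary words instead of falling back to zeros as this sketch does.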
specifying 5 values: pleasantness, attention, sensitivity, aptitude, and polarity. These scores are included in the model as an additional input source from which the neural network can draw information. Since SenticNet provides the semantics and polarity information of a concept, the aspect polarity prediction system benefits from this knowledge. Consequently, a 5-dimensional feature vector is constructed for each word in the input sentence using SenticNet 3; it is referred to as the sentic vector. If a word is outside the knowledge base, it is given default zero scores for all 5 values. Each sentic vector $s_i$ is considered an additional word vector for word $i$. The sequence of sentic vectors for a sentence with words 1…N is denoted as:

$[s]_1^N = \{s_1, \dots, s_N\}$ with $s_i \in \mathbb{R}^{5}$   (2)

4.1.3 Part of Speech. Part of speech is the next feature in the system besides word embeddings and sentic vectors. Based on the work of Nio and Murakami, this feature can aid sentiment polarity prediction. A 1-of-K coding scheme transforms each tag into a K-dimensional vector representing that tag. Specifically, this research uses the NLTK POS Tagger [12], which has a total of 35 tags. These vectors are concatenated with their respective word vectors before being fed to the neural network. The sequence of POS tag vectors for a sentence with words 1…N is denoted as:

$[p]_1^N = \{p_1, \dots, p_N\}$ with $p_i \in \mathbb{R}^{35}$   (3)

The four feature vectors of each word are concatenated to receive a sequence in which each word is a 141-dimensional vector. The sequence of input vectors for a sentence with N words is denoted as:

$[u]_1^N = \{(w_1, s_1, p_1, d_1)^T, \dots, (w_N, s_N, p_N, d_N)^T\}$ with $u_i \in \mathbb{R}^{100+5+35+1}$   (5)

The resulting sequence is then fed to the BiGRU layer, which produces an output sequence of recurrent states:

$[g]_1^N = \mathrm{BiGRU}([u]_1^N) = \{(\overrightarrow{g}_1, \overleftarrow{g}_1)^T, \dots, (\overrightarrow{g}_N, \overleftarrow{g}_N)^T\}$ with $\overrightarrow{g}_i, \overleftarrow{g}_i \in \mathbb{R}^{25}$   (6)

One layer processes the input from left to right and the other processes it in the reverse order. The final states of the forward and backward GRUs are then concatenated to obtain a fixed-size representation $h = (\overrightarrow{g}_N, \overleftarrow{g}_1)^T \in \mathbb{R}^{50}$ of the aspect term. This representation is passed to a densely connected feed-forward layer producing another hidden representation. Finally, a densely connected layer with a softmax activation function maps that hidden representation to a 3-dimensional vector representing a probability distribution over the three polarity labels positive, negative, and neutral. The label with the highest probability is chosen as the predicted label for the aspect. To update the parameters and optimize the model, the Adam [15] optimizer is used.

Figure 2 below shows the neural network architecture for the Aspect Polarity Prediction task.
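As a rough sketch of how the per-word input vector $u_i$ of equation (5) is assembled (the SenticNet lookup, the 35-tag set, and the distance computation below are simplified stand-ins, not the real resources):

```python
EMB_DIM, SENTIC_DIM = 100, 5
POS_TAGS = [f"TAG{i}" for i in range(35)]  # placeholder for the 35 NLTK tags

def one_hot_pos(tag):
    """1-of-K encoding of a POS tag (eq. 3)."""
    vec = [0.0] * len(POS_TAGS)
    vec[POS_TAGS.index(tag)] = 1.0
    return vec

def distance_feature(i, aspect_start, aspect_end):
    """Relative distance of word i to the detected aspect term span
    (0 inside the span, negative before it, positive after it)."""
    if aspect_start <= i <= aspect_end:
        return 0.0
    return float(i - aspect_end if i > aspect_end else i - aspect_start)

def build_input(tokens, tags, sentic, embeddings, aspect_span):
    """Concatenate word embedding, sentic vector, POS one-hot and distance (eq. 5)."""
    start, end = aspect_span
    seq = []
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        u = (embeddings.get(tok, [0.0] * EMB_DIM)
             + sentic.get(tok, [0.0] * SENTIC_DIM)  # zeros outside SenticNet
             + one_hot_pos(tag)
             + [distance_feature(i, start, end)])
        seq.append(u)
    return seq

tokens = ["the", "food", "was", "great"]
tags = ["TAG0", "TAG1", "TAG2", "TAG3"]
seq = build_input(tokens, tags, sentic={}, embeddings={}, aspect_span=(1, 1))
assert all(len(u) == 100 + 5 + 35 + 1 for u in seq)  # 141 dimensions, as in eq. (5)
```

With empty lookup tables the embedding and sentic blocks are all zeros, but the shapes match equation (5): one 141-dimensional vector per word, ready to be fed to the BiGRU layer.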
4.3.1 Problem 1: Predicting the incorrect polarity for sentences that have two or more aspect terms with different polarities.

Example 1:
"Nice ambiance but highly overrated price"
Predicted: ambiance – positive; price – positive
Correct: ambiance – positive; price – negative

Example 2:
"The food was great, the margaritas too but the waitress was too busy being nice to her other larger party"
Predicted: food – positive; margaritas – positive; waitress – positive
Correct: food – positive; margaritas – positive; waitress – negative

The likely cause of this problem is that the neural network takes the whole sentence as input. In a sentence that has two or more aspect terms, only some of the words related to each aspect term decide its polarity. The idea of the solution is therefore to limit the range of words surrounding the aspect term that are passed to the network for predicting the polarity of that term. Consider one predicted example from the dataset (bold words are aspect terms):

"Nice ambiance but highly overrated price"

The above architecture predicted a positive polarity for both "ambiance" and "price" because it takes all words in the sentence as input. This may lead to a situation in which the network takes the positive SenticNet score of the adjective "Nice" to decide the polarity of the term "price". As a result, the term "price" got the wrong polarity, positive instead of negative. That is why the range of input words is limited as in the example above: the bold words should be the only words passed to the network for prediction. If the network only receives "Nice ambiance" and "highly overrated price" as input, the predicted polarities for "ambiance" and "price" will be correct.

The applied technique that improved the system performance is called "Nearest Adjective". It limits the input word window of an aspect term to the span from the term itself to the adjective nearest to it. If the sentence does not have any adjectives, all words of the sentence are kept as the network input, as in the old version. Below is an example of how this technique works (underlined words are aspect terms and bold words are adjectives):

Sentence: "The food is delicious but the service is terrible"
Term 1 window: "food is delicious"
Term 2 window: "service is terrible"

4.3.2 Problem 2: Predicting the incorrect polarity when sentences have negation words.

Example 1:
"I liked the atmosphere very much but the food was not worth the price"
Predicted: atmosphere – positive; food – positive
Correct: atmosphere – positive; food – negative

Example 2:
"The food was not great and the waiters were rude"
Predicted: food – positive; waiter – negative
Correct: food – negative; waiter – negative

The cause of this problem is that these sentences contain negation words, for example "not", "neither", etc. Because the neural network had no mechanism to deal with negation, it predicted the wrong polarity for aspect terms in sentences that include negation words.

To solve this problem, a list of negation words and the two lists of positive and negative words proposed by Minqing Hu and Bing Liu, 2004 [16] are used. Each sentence is given a score called the sentiment score: for each word of the input word window that is in the negative list, the score is decreased by 1, and for each word in the positive list it is increased by 1. For each sentence that has negation words, the system detects the nearest aspect term to each negation word, and the sentiment score of the aspect term linked to that negation word is multiplied by -1. The final sentiment score is treated as a 1-dimensional feature vector and concatenated to the word vector that already contains the other 4 features. The new sequence of input vectors for a sentence with N words now becomes (100-dimensional word embedding vector + 5-dimensional sentic vector + 35-dimensional part-of-speech vector + 1-dimensional distance vector + 1-dimensional sentiment score vector):

$[u]_1^N = \{(w_1, s_1, p_1, d_1, score_1)^T, \dots, (w_N, s_N, p_N, d_N, score_N)^T\}$ with $u_i \in \mathbb{R}^{100+5+35+1+1}$   (7)

Below is an example of how this solution works on the sentence "The food was not great and the waiters were rude":
Nearest aspect to negation word "not": food
Sentiment score for input "food was not great": (0+0+0+1) × -1 = -1
Sentiment score for input "waiters were rude": (0+0-1) = -1

5 EXPERIMENTS AND RESULTS

5.1 Dataset
The dataset used in this task is the English restaurant dataset from SemEval-2016 Task 4, used for Aspect Based Sentiment Analysis. Each sentence in the training dataset contains a pair of an entity E and an attribute A towards which an opinion is expressed. E and A are drawn from the inventory of entity types (e.g. restaurant, food, drinks) and attribute labels (e.g. prices, quality). Each such E–A pair defines an aspect and is assigned a polarity from the set {negative, positive, neutral}.

5.2 Results and Evaluation
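Per the evaluation used in this section, accuracy for the second subtask counts correct polarity predictions only over the aspect terms that subtask 1 extracted correctly. A minimal sketch, with hypothetical prediction dictionaries in place of real system output:

```python
def polarity_accuracy(predictions, gold, correctly_extracted):
    """Accuracy = correct polarity predictions / correctly extracted aspect terms.

    `predictions` and `gold` map aspect-term identifiers to polarity labels;
    only terms in `correctly_extracted` (output of subtask 1) are counted.
    """
    hits = sum(1 for term in correctly_extracted
               if predictions.get(term) == gold.get(term))
    return hits / len(correctly_extracted)

# Made-up example: 2 of the 3 correctly extracted terms get the right polarity.
preds = {"food": "negative", "waiters": "negative", "price": "positive"}
gold = {"food": "negative", "waiters": "negative", "price": "negative"}
acc = polarity_accuracy(preds, gold, ["food", "waiters", "price"])
assert abs(acc - 2 / 3) < 1e-9
```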
This section shows experimental results for the second subtask of the ABSA system, Aspect Polarity Prediction. Following the architecture and the improved solutions described in Section 4, the results of three versions of the system are presented in Table 2. The first is the BiGRU RNN network without any modification. The second is the network combined with the Nearest Adjective technique to solve Problem 1. The third is the second version combined with the solution for the negation problem. Accuracy is calculated as the number of correct polarity predictions over the set of correctly extracted aspect terms from subtask 1. All experiments were performed with the deep learning library Keras [17], using many of its implemented algorithms.

Following the experimental results for the three versions of the second subtask, an evaluation of the improved architecture and a comparison between this model's results and the other models submitted for SemEval 2016 Task 4 are discussed.

Table 4: Dataset sentence examples that have an incorrect aspect polarity prediction in the first version and a correct aspect polarity prediction after applying the proposed solutions

Example Sentences | First Version | Improved Version
"The food is great, the margaritas is good too, but the waiters were busy being nice to others" | Food: Positive; Margaritas: Positive; Waiters: Positive | Food: Positive; Margaritas: Positive; Waiters: Negative
"Nice ambiance but highly overrated price" | Ambiance: Positive | Ambiance: Positive
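The two fixes behind the corrections shown in Table 4, the Nearest Adjective window of Section 4.3.1 and the negation-aware sentiment score of Section 4.3.2, can be sketched roughly as follows. The tiny inline word sets stand in for a real POS tagger and for the Hu and Liu (2004) opinion lexicons; the function names are illustrative, not the actual implementation:

```python
# Stand-in lexicons; the real system uses NLTK POS tags and the
# Hu & Liu (2004) positive/negative word lists.
ADJECTIVES = {"nice", "great", "rude", "delicious", "terrible", "overrated"}
POSITIVE, NEGATIVE = {"great", "nice", "delicious"}, {"rude", "terrible"}
NEGATIONS = {"not", "neither", "never"}

def nearest_adjective_window(tokens, aspect_idx):
    """Limit the input window to the span between the aspect term and the
    adjective nearest to it; fall back to the whole sentence (Sect. 4.3.1)."""
    adj_idxs = [i for i, t in enumerate(tokens) if t in ADJECTIVES]
    if not adj_idxs:
        return tokens
    nearest = min(adj_idxs, key=lambda i: abs(i - aspect_idx))
    lo, hi = sorted((aspect_idx, nearest))
    return tokens[lo:hi + 1]

def sentiment_score(window):
    """+1 per positive word, -1 per negative word; flip the sign when the
    window contains a negation word linked to the aspect (Sect. 4.3.2)."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in window)
    if any(t in NEGATIONS for t in window):
        score *= -1
    return score

tokens = "the food was not great and the waiters were rude".split()
assert nearest_adjective_window(tokens, 1) == ["food", "was", "not", "great"]
assert sentiment_score(["food", "was", "not", "great"]) == -1   # worked example, Sect. 4.3.2
assert sentiment_score(["waiters", "were", "rude"]) == -1
```

Run on the worked example of Section 4.3.2, the sketch reproduces the scores -1 for "food was not great" and -1 for "waiters were rude".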