The preceding and following words also help define the final orientation, since they also affect the identification. Negative words can change the result as well, and we pick them up into a negative word set. For the whole category of items we build the specific feature assessment sets as follows. The specific feature set of the $i$-th item is $F_i = \{f_{i1}, f_{i2}, \ldots, f_{ik_i}\}$, where $k_i$ is the known feature number. In fact, reviews in the database only cover part of an item's features, so we do not have assessments for every feature, and items in the database may still have unknown features. Each item therefore has its own specific feature set, and we carry out sentiment analysis on its reviews with respect to this specific feature set, attaching every assessment to the feature it refers to. Now it is time to summarize the number of positive assessments, which is indicated as $p_{ij}$, and the number of negative ones, which is indicated as $n_{ij}$, for the $j$-th feature, in order to attain the desired assessment set that is in one-to-one mapping with the specific feature set $F_i$. Here we can put $p_{ij}$ and $n_{ij}$ together and write $A_i = \{(p_{i1}, n_{i1}), (p_{i2}, n_{i2}), \ldots, (p_{ik_i}, n_{ik_i})\}$. Fig.5 is an example.
Fig.5 The assessment set of an example item
Here we provide two approaches: one is called Window Mechanism (WM) and the other is called Syntax Matching (SM).
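To make the bookkeeping concrete, the following minimal Python sketch assembles such an assessment set; the dictionary layout and the judge_orientation callback (standing in for the WM/SM judgment described next) are illustrative assumptions, not the system's actual code.

from collections import defaultdict

def build_assessment_set(reviews, feature_set, judge_orientation):
    """Summarize (positive, negative) counts per known feature.

    reviews: iterable of review strings for one item.
    feature_set: the item's known features F_i.
    judge_orientation: callback returning +1 or -1 for a
        (feature, review) pair, e.g. a WM or SM implementation.
    Returns {feature: (p, n)}, the assessment set A_i.
    """
    counts = defaultdict(lambda: [0, 0])  # feature -> [p, n]
    for review in reviews:
        for feature in feature_set:
            if feature not in review:
                continue  # the review does not mention this feature
            if judge_orientation(feature, review) > 0:
                counts[feature][0] += 1  # one more positive assessment
            else:
                counts[feature][1] += 1  # one more negative assessment
    return {f: tuple(pn) for f, pn in counts.items()}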
a. Window Mechanism (WM)
WM is a pragmatic and applied method built on the basis of Chinese lexical analysis and part-of-speech tagging, which is proposed by Fan Miao [9]. Its principle scheme is shown in Fig.6. WM makes use of the nearest-associated principle, meaning that the nearest modifier is supposed to have the greatest influence on the headword, namely the feature, according to the most idiomatic expression ways of Chinese. It starts from the headword and searches for the nearest modifier, identifying its orientation, in both the forward and backward directions within a settled range, the so-called Big Window. But this alone is not adequate, since there may exist negative words or double negation that can change the orientation. We decide to introduce the WM again, where we regard the searched modifier as the new headword and define a new range named the Small Window. Similarly, the search starts from the new headword and looks for negative words in both directions, however, within the intersection of the Big Window and the Small Window. If the number of discovered negative words is even, the original orientation maintains the same, while the original orientation reverses if the number is odd. Thus the identification accuracy is improved.
Fig.6 The principle scheme of Window Mechanism
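A minimal Python sketch of WM as we read it is given below; the window sizes, the token list format, and the two lexicons are illustrative assumptions, since the paper fixes none of these values.

def window_mechanism(tokens, head_idx, modifiers, negatives, big=4, small=2):
    """Judge the orientation of the feature at tokens[head_idx].

    modifiers: {word: +1 or -1} orientation lexicon.
    negatives: set of negative words.
    Returns +1, -1, or 0 when no modifier lies in the Big Window.
    """
    n = len(tokens)
    mod_idx = None
    # Search forward and backward for the nearest modifier (Big Window).
    for dist in range(1, big + 1):
        for idx in (head_idx - dist, head_idx + dist):
            if 0 <= idx < n and tokens[idx] in modifiers:
                mod_idx = idx
                break
        if mod_idx is not None:
            break
    if mod_idx is None:
        return 0
    orientation = modifiers[tokens[mod_idx]]
    # Small Window around the modifier, intersected with the Big Window.
    big_rng = set(range(max(0, head_idx - big), min(n, head_idx + big + 1)))
    small_rng = set(range(max(0, mod_idx - small), min(n, mod_idx + small + 1)))
    neg_count = sum(1 for i in big_rng & small_rng if tokens[i] in negatives)
    # An even number of negative words keeps the orientation; odd reverses it.
    return -orientation if neg_count % 2 else orientation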
b. Syntax Matching (SM)
SM is a completely different way from WM. Its core idea is the training of syntax paths. The syntax path here refers to the shortest path from feature to modifier in the parsing trees. To illustrate it, Fig.7 shows a syntax path from the headword Deposit to the modifier New.
Fig.7 An example of a syntax path of a common Chinese sentence
The syntax path ought to be: UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP, where UP/DOWN means going up and down, @TAG represents the node, and # stands for the partition between different actions.
We have utilized abundant natural language materials as training data to acquire 3319 distinct types of syntax paths as models, which are stored in the database. Therefore, we can compare a newly obtained syntax path with those in the database. Once they match each other successfully, the relationship between the headword and the modifier in the handled review can be recognized, and so can the sentiment orientation. Besides, SM has an extra advantage since it is based on machine learning, meaning that a new path can be added into the database so that the models will be upgraded and the accuracy will be increased.
As for the issue of negative words and double negation, SM makes use of a similar approach to WM, so we will not repeat it here.
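The matching step itself reduces to string comparison once a path is serialized in the notation above; the sketch below assumes the trained models are stored as plain strings, and it leaves the extraction of the walk from a parse tree out of scope.

def serialize_path(steps):
    """Encode a tree walk as 'UP@NP#DOWN@VP#...': UP/DOWN is the
    direction, @TAG the node label, # the partition between actions."""
    return "#".join(f"{direction}@{tag}" for direction, tag in steps)

def syntax_match(steps, path_models):
    """True when the feature-to-modifier path matches a trained model."""
    return serialize_path(steps) in path_models

# Usage with the path of Fig.7:
models = {"UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP"}
walk = [("UP", "NP"), ("UP", "NP"), ("DOWN", "CP"),
        ("DOWN", "CP"), ("DOWN", "IP"), ("DOWN", "VP")]
assert syntax_match(walk, models)

Because the models are an ordinary set of strings, the learning step amounts to inserting newly verified paths, which is exactly how the accuracy grows over time.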
Finally, we integrated both WM and SM in our system, and thus the actual analysis accuracy has been greatly improved. The following figure shows the result of the analysis for a certain review with these two approaches in our prototype system.
Fig.8 A sentiment analysis result with both WM and SM
D. Acquiring specific feature weight set
Now we have already got the specific feature assessment set $A_i$ of the $i$-th item. Intuitively, counting over every known feature of the $i$-th item, its mark should be

$grade_i = \sum_{j=1}^{k_i} (p_{ij} - n_{ij})$, (1)

where $k_i$ is the known feature number of the $i$-th item.
However, to a particular customer, the importance of different features cannot be the same. Take a mobile phone for example: a certain customer may pay more attention to its price rather than whether it has access to the Internet. So we have to distribute different weights to different features. Specifically, for the $i$-th item we define the specific feature weight set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{ik_i}\}$.
Fig.9 The customer_browser table
To work out $W_i$, we count feature frequencies in the two tables: let the frequency of feature $f_{ij}$ in the customer_browser table be $b_{ij}$, while its frequency in the customer_preference table is indicated as $q_{ij}$. For the $i$-th item, comparing the browsing frequency of its $j$-th feature with that of the whole item, there is a ratio

$B_{ij} = b_{ij} / \sum_{j=1}^{k_i} b_{ij}$. (2)

While in the customer_preference table the frequency is $q_{ij}$ and there is also a ratio

$P_{ij} = q_{ij} / \sum_{j=1}^{k_i} q_{ij}$. (3)

Then we set the weight of feature $f_{ij}$ as

$w_{ij} = (B_{ij} + P_{ij}) / 2$. (4)
For those features that never appear in the two tables, their weight will be 0, as is known from equation (4). This is reasonable because we want the items whose features the customer mainly cares about to stand out when doing recommendation.
However, if one of these two tables is empty, its influence on the final weight should be dropped, so $w_{ij}$ becomes either $B_{ij}$ alone or $P_{ij}$ alone. Furthermore, complete emptiness reveals that the customer has never bought anything or cared about any feature. It shows that all the features have the same importance for the customer, and we let the feature weight set satisfy $w_{ij} = 1/k_i$ for the $i$-th item.
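Read together, equations (2)-(4) and the two fallback rules translate into a short routine; the reconstruction of the formulas above, and the table layout as feature-to-frequency maps, are our assumptions.

def feature_weights(browser_freq, preference_freq, features):
    """Compute the weight w_ij of every feature per equations (2)-(4).

    browser_freq, preference_freq: {feature: frequency} taken from the
    customer_browser and customer_preference tables respectively.
    """
    b_total = sum(browser_freq.get(f, 0) for f in features)
    p_total = sum(preference_freq.get(f, 0) for f in features)
    if b_total == 0 and p_total == 0:
        # Both tables empty: every feature is equally important.
        return {f: 1.0 / len(features) for f in features}
    weights = {}
    for f in features:
        B = browser_freq.get(f, 0) / b_total if b_total else None
        P = preference_freq.get(f, 0) / p_total if p_total else None
        if B is None:
            weights[f] = P            # browser table empty: use P alone
        elif P is None:
            weights[f] = B            # preference table empty: use B alone
        else:
            weights[f] = (B + P) / 2  # equation (4)
    return weights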
Till now, we can get the specific feature weight set, and the total mark of the $i$-th item is

$grade_i = \sum_{j=1}^{k_i} w_{ij}\,(p_{ij} - n_{ij})$, (5)
E. Acquiring overall feature weight set
Although equation (5) has considered the situation where a certain customer may pay more attention to some particular features, it is still not complete. There is a problem: in the initial database, the number of times different features are mentioned is not the same. Directly using equation (5) may weaken the influence of those features that are seldom mentioned. Here is an example:
Suppose items A and B have two features: quality and price. For A, its quality has 100 positive and 0 negative reviews; its price has 10 positive and 0 negative reviews. For B, its quality has 210 positive and 0 negative reviews; its price has 0 positive and 100 negative reviews. According to equation (5), and assuming $w_{ij} = 1/2$ for both A and B, then

$grade_{A,quality} = \frac{1}{2}(100 - 0) = 50$,
$grade_{A,price} = \frac{1}{2}(10 - 0) = 5$,
$grade_A = 55$,
$grade_{B,quality} = \frac{1}{2}(210 - 0) = 105$,
$grade_{B,price} = \frac{1}{2}(0 - 100) = -50$,
$grade_B = 55$.

So $grade_A = grade_B$.
Although A is praised on both features while B is criticized on its price, their marks are the same. Apparently the influence of price is weakened. Since features are born equal, we need to take measures to balance their influences. For the feature $f_j$, count the total number of times $t_j$ it is mentioned in the database. Then we let $c_j = 1/t_j$, so that the overall feature weight set can be got, which is $C = \{c_1, c_2, \ldots, c_m\}$. Every feature will be influenced by its $c_j$, so equation (5) is changed to be

$grade_i = \sum_{j=1}^{k_i} c_j\, w_{ij}\, (p_{ij} - n_{ij})$, (6)
With equation (6), the problem of the previous example can be solved. The quality feature is mentioned $100 + 210 = 310$ times in total and the price feature $10 + 100 = 110$ times, so

$grade_A = \frac{1}{2}\cdot\frac{100}{310} + \frac{1}{2}\cdot\frac{10}{110} \approx 0.21$,
$grade_B = \frac{1}{2}\cdot\frac{210}{310} - \frac{1}{2}\cdot\frac{100}{110} \approx -0.12$.

Thus the grade of B is lower than that of A, which satisfies common sense.
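The example can be checked mechanically; the snippet below evaluates the reconstructed equations (5) and (6) on items A and B, computing the totals $t_j$ from the example's own counts.

items = {
    "A": {"quality": (100, 0), "price": (10, 0)},
    "B": {"quality": (210, 0), "price": (0, 100)},
}
w = 0.5  # equal feature weights, as assumed in the example

# Equation (5): weighted sum of (positive - negative) counts.
grade5 = {k: sum(w * (p - n) for p, n in v.values()) for k, v in items.items()}
assert grade5["A"] == grade5["B"] == 55  # the unwanted tie

# Equation (6): rescale by c_j = 1/t_j, t_j = total mentions of feature j.
t = {f: sum(p + n for v in items.values() for g, (p, n) in v.items() if g == f)
     for f in ("quality", "price")}  # quality: 310, price: 110
grade6 = {k: sum(w * (p - n) / t[f] for f, (p, n) in v.items())
          for k, v in items.items()}
assert grade6["B"] < grade6["A"]  # B's criticized price now matters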
F. Acquiring item weight set
Additionally, we have to notice that different items have different numbers of known features, as mentioned above. This difference can likewise be too big to grade and rank items fairly. To overcome this problem and balance the inequity, different items should be assigned weights, and such weights form a new set called the item weight set, which is $D = \{d_1, d_2, \ldots, d_t\}$; for the $i$-th item, we have its weight $d_i$. Hence, equation (6) can be further improved as

$grade_i = d_i \sum_{j=1}^{k_i} c_j\, w_{ij}\, (p_{ij} - n_{ij})$, (7)
Now we eventually acquire the final form of the grading equation, and we are able to use it for real grading, ranking and recommendation.
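Putting the pieces together, a minimal grading-and-ranking routine under this reconstruction of equation (7) looks as follows; since the exact definition of $d_i$ is not recoverable from the text, it is left to the caller.

def grade_item(assessment, w, c, d):
    """Equation (7): grade = d * sum_j c_j * w_j * (p_j - n_j).

    assessment: {feature: (p, n)} -- the assessment set A_i.
    w, c: per-feature specific and overall weights; d: item weight d_i.
    """
    return d * sum(c[f] * w[f] * (p - n) for f, (p, n) in assessment.items())

def rank_items(catalog, weights_of, c, d_of):
    """Sort item names by descending grade for recommendation.

    catalog: {item: assessment set}; weights_of(item) -> w; d_of(item) -> d_i.
    """
    grades = {name: grade_item(a, weights_of(name), c, d_of(name))
              for name, a in catalog.items()}
    return sorted(grades, key=grades.get, reverse=True)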
III. EXPERIMENTAL RESULTS OF SOME MAJOR PROCESSES
After the design of the comprehensive recommendation algorithm, we took advantage of the information on multi-band mobiles from Amazon.cn (http://www.amazon.cn). Their titles and reviews are stored in the database. On the basis of these real-life data we tested some major processes of our proposed algorithm.
A. Extraction of the feature, modifier and negative word sets
Table I shows the experimental results of set extraction.
We then added these sets into our database for further use.
TABLE I
RESULTS OF SET EXTRACTION
Item               Number
Reviews            15099
Overall features   91
Opinion modifiers  850
Negative words     35
B. Judgment of sentiment orientation for reviews
Table II shows the results of orientation judgment for reviews. Among the 10267 reviews that hit a feature, 8829 are judged correctly; through the integration of WM and SM, the final accuracy is thus around 86% (8829/10267 ≈ 0.86).
TABLE II
RESULTS OF ORIENTATION JUDGMENT FOR REVIEWS
Item                 Number
Reviews              15099
Feature-hit reviews  10267
Correct judgements   8829
C. Analysis of time complexity
Our approach has a time complexity of $O(t \cdot m)$ in the worst situation, where $t$ refers to the number of items and $m$ refers to the number of overall features. However, the existing item-based collaborative filtering has a complexity of
t