
The Design and Implementation of Feature-Grading
Recommendation System for E-Commerce


Luo Yi
International School
Beijing University of Posts and
Telecommunications
Beijing, China, 100876
luoyi.zxx@gmail.com
Fan Miao
School of Software Engineering
Beijing University of Posts and
Telecommunications
Beijing, China, 100876
fanmiao1120@gmail.com
Zhou Xiaoxia
School of Insurance and Economics
University of International Business and
Economics
Beijing, China, 100029
xiaoxiazhou0719@126.com

Abstract - In this paper we present a novel approach named
Feature-Grading, a comprehensive algorithm for recommending
commodities in e-commerce. It integrates feature mining,
sentimental analysis, and records of customers' historical
behaviors. The overall process of Feature-Grading can be
separated into 5 key steps: 1. extracting the overall feature set of
a category of commodities; 2. extracting the modifier set and
negative word set; 3. acquiring the specific feature set and feature
assessment set; 4. acquiring the specific feature weight set;
5. acquiring the item weight set. After these 5 steps, we are able to
grade and rank all items with the resulting grading equation,
so that relevant, top-ranked items can be recommended.
We use real information on mobile phones and their reviews
from the e-commerce website Amazon.cn as our experimental
data and discuss some important results, which indicate that
Feature-Grading works well (about 86% accuracy in judging the
orientation of reviews). Finally, we briefly introduce the prototype
recommendation system we developed on the basis of Feature-Grading.

Keywords - Feature-Grading; Feature mining; Sentimental
Analysis; Historical behaviors; Recommendation

I. INTRODUCTION
In e-commerce, there are two major ways for
customers to come across items. One, which we call
Customer-active, is driven by the customers themselves
through search engines. The other, which we call Item-active,
is driven by merchants, who use a recommendation system to
recommend commodities.

For Customer-active, what a customer enters in a search
engine reveals what he/she wants. Existing search engines for
commodities use techniques similar to those for
normal web pages, which are based on keyword matching,
meaning that items saved in the database should be tagged
with enough keywords. Most of these keywords, however, are
manually appended by merchants. This mechanism is
inefficient, and it is easy to neglect some vital features.
If a system can automatically mine the key
features (i.e. the keywords) of a category of items,
the tagging process can be completed with less
manual work, which improves overall efficiency.
This is our first mission, since the mining of features
not only benefits the existing Customer-active searching
approaches, but also serves as the foundation of our proposed
recommendation algorithm.

As for Item-active, there is more to say because
it is the proper function of a recommendation system.
Since the birth of e-commerce, many recommendation
algorithms have arisen. A recent and popular method is
Collaborative Filtering [1, 2, 3]. It has two typical
types: user-based and item-based. The main
idea of the user-based type is that users with similar
purchasing behaviors are put into the same group.
Once a member has bought a certain item, this item is
recommended to the other members of the group. The
item-based approach, in contrast, connects similar commodities
rather than users. If an item is purchased, a similar one
may be recommended. The integration of these two approaches
achieves relatively good performance, resulting in the wide
use of Collaborative Filtering in contemporary large
e-commerce websites [4, 5].

However, such an algorithm fails to consider the diverse
assessments and reviews attached to each item. As a result,
low-rated items are sometimes recommended merely because they
are similar to what the user has purchased. A better system
should therefore know how to rank recommended commodities
and provide items that are both related and highly appreciated.
This involves evaluation, which can only be done by
customers. Our task is therefore to analyze
the customer reviews and extract their sentimental
orientation to accomplish the final grading and ranking
process.

Besides, we also believe the current general model of
recommendation will gradually become more personalized.
That is why we further propose an improved algorithm that
makes personal recommendations to a specific
customer based on his/her historical behaviors.
Thus, a more complete, reliable, and personalized
recommendation algorithm is proposed in this paper on
the basis of practical business demands and the drawbacks of
existing systems. We call it the Feature-Grading algorithm. We
have also developed a corresponding prototype system (see
Fig.1). Our Chinese experimental data, multi-brand mobile phones
and their reviews, come from Amazon.cn.
Proceeding of the IEEE
International Conference on Information and Automation
Shenzhen, China June 2011
978-1-61284-4577-0270-9/11/$26.00 2011 IEEE
Fig.1 The prototype system of Feature-Grading algorithm
II. THE DESIGN OF ALGORITHM
The overall process of the Feature-Grading algorithm is
designed as shown in Fig.2. At the beginning, only commodities
and their reviews are stored in the database.

Fig.2 The overall process of Feature-Grading algorithm
By following the arrows we arrive at the final grading
equation for recommendation.
Now we move on to the details.
A. Extracting overall feature set of a category of
commodities
This critical step is the base of the whole system.
First, we use ICTCLAS (Institute of Computing
Technology, Chinese Lexical Analysis System) [6] to split
each review into independent words with part-of-speech tags.
With further optimization, we obtain satisfactory splitting and
tagging results. An example is shown below.

Fig.3 An example of a split and tagged review
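As an illustration of working with split and tagged reviews, the following minimal sketch parses a tagged review into (word, tag) pairs and selects words by tag. It assumes the tagger output is a space-separated string of `word/tag` tokens; the exact ICTCLAS output format is not given in this paper, so that format, the English sample sentence, and the helper names `parse_tagged` and `words_with_tag` are all hypothetical. The tags follow the convention used in this paper (/n for nouns, /a for adjectives, /d for negative words).

```python
from typing import Iterable, List, Tuple


def parse_tagged(review: str) -> List[Tuple[str, str]]:
    """Split a 'word/tag word/tag ...' string into (word, tag) pairs.

    Tokens without both a word and a tag are skipped.
    """
    pairs = []
    for token in review.split():
        # rpartition splits on the LAST slash, so words containing '/' survive
        word, sep, tag = token.rpartition("/")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


def words_with_tag(pairs: Iterable[Tuple[str, str]], tags: set) -> List[str]:
    """Keep only the words whose tag is in the given tag set."""
    return [word for word, tag in pairs if tag in tags]


# Hypothetical, English stand-in for a split and tagged Chinese review.
sample = "screen/n very/d clear/a price/n not/d cheap/a"
pairs = parse_tagged(sample)
nouns = words_with_tag(pairs, {"n"})       # candidate features
modifiers = words_with_tag(pairs, {"a"})   # candidate modifiers
```

Keeping the parsed pairs in review order matters for the later steps, which search for modifiers and negative words near a feature.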
After getting the desired reviews, we now need to extract
typical features from these separate words. According to
Chinese expression habits, features are always nouns, which
are tagged with /n.
We first came up with the idea of using Association
Mining, typically the Apriori Algorithm [7], to identify
features. This is achieved by mining associated words
that appear frequently in the reviews. The reason to use it is that
once a customer comments on an item, associated words
are involved. By processing all the reviews with the
Apriori Algorithm, we can eventually acquire an appropriate
feature set.
Another plain method is Word Frequency Count.
Normally, features are nouns and occur more
frequently than other words in a sentence. We can simply keep
the top-ranked words to form the final feature set. It is quick
but less reliable.
In practice we integrated these two methods and applied
appropriate manual optimization. We denote the resulting
feature set of a category of commodities as
$F = \{f_1, f_2, \dots, f_m\}$, where $m$ is the total number of features.

B. Extracting modifier set and negative words set
For the following sentimental analysis, we first have to
recognize modifiers in reviews and judge their orientation. In
the split reviews, modifiers are always the words tagged
with /a and /v. Here we only consider polarized modifiers,
namely those that can only be positive or negative. We
use a simple and effective method that relies on the
modifier synonym groups in WordNet [8] to identify a modifier
and its orientation.
In WordNet, synonyms are placed in one group, so
we can simply treat words in the same group as having the same
sentimental orientation. (See Fig.4)

Fig.4 Polarized modifier synonym group in WordNet
We initially pick some qualified modifiers as seeds by
hand and label them with their orientation. Then, when dealing with a
certain modifier, we first check whether the modifier is itself
a seed. If yes, its orientation is known directly; if not, we
traverse the synonyms of the modifier in
WordNet. Once a synonym turns out to be a seed, the orientation
of the original modifier is judged through that synonym.
Additionally, we add each newly judged modifier to the database so that
the stored seed group is enlarged and subsequent
judgments become more accurate. We refer to the final
result as the modifier set.
Then we need to extract negative words since they also
contribute to the final sentimental identification. Negative
words are always tagged with /d, so we can pick them out
manually. They form the negative word set.
C. Acquiring specific feature set and feature assessment set
of an individual item
After obtaining the feature set of a whole category of
commodities, we are able to acquire the specific feature set of
each individual item from its reviews. For the $i$-th item this
specific feature set is denoted
$F_i = \{f_{i1}, f_{i2}, \dots, f_{ik_i}\}$, where $k_i$ is the
number of known features the item owns. In fact, reviews in the
database cannot cover all the features of an item, so
uncommented features are unknown to us and we have no
right to decide them subjectively. Therefore, items in the database
are treated as if they had no unknown features.
Obviously, $F_i$ is a subset of $F$, and different items have
different $k_i$. We then apply sentimental analysis to the
reviews to get the assessment set.
In particular, for the $i$-th item, the specific feature set
is $F_i = \{f_{ij}\}$, where $j$ refers to the feature
number. We now count the number of positive
assessments, denoted $a^{+}_{ij}$, and the number of
negative ones, denoted $a^{-}_{ij}$, for the $j$-th feature
in order to obtain the assessment set $A_i$, which is in
one-to-one correspondence with the specific feature set $F_i$. Here
$A_i = \{(a^{+}_{i1}, a^{-}_{i1}), \dots, (a^{+}_{ik_i}, a^{-}_{ik_i})\}$.
We can put $F_i$ and $A_i$ into a table; Fig.5 is an example.

Fig.5 The specific feature set and feature assessment set of an example item
The question is how to get $a^{+}_{ij}$ and $a^{-}_{ij}$. Here we provide
two approaches: one is called Window Mechanism (WM) and
the other is called Syntax Matching (SM).
a. Window Mechanism (WM)
WM is a pragmatic method based on
Chinese lexical analysis and part-of-speech tagging, which was
proposed by Fan Miao [9]. Its principle is shown in Fig.6.
WM makes use of the nearest-association principle,
meaning that the nearest modifier is supposed to have the
greatest influence on the headword, namely the feature,
in line with most idiomatic Chinese expressions. It
starts from the headword and searches for the nearest modifier,
identifying its orientation, in both the forward and backward
directions within a fixed range, the so-called Big
Window. But this is not adequate, since there may be negative
words or double negation that can change the orientation. We
therefore apply WM again, treating the modifier found
as the new headword and defining a new range
named the Small Window. Similarly, the search starts from the new
headword and looks for negative words in both directions,
but only within the intersection of the Big Window and the
Small Window. If the number of
discovered negative words is even, the original orientation
stays the same, while the original orientation is reversed if
the number is odd. Thus the identification accuracy is
improved.
Fig.6 The principle scheme of Window Mechanism
b. Syntax Matching (SM)
SM is a completely different approach from WM. Its core idea
is the training of syntax paths. The syntax path here refers to
the shortest path from feature to modifier in the parse tree.
To illustrate, Fig.7 shows a syntax path from the headword
"Deposit" to the modifier "New".
Fig.7 An example of a syntax path of a common Chinese sentence
The syntax path is:
UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP, where
UP/DOWN means going up or down, @TAG represents the
node, and # separates different actions.
We used abundant natural language material as
training data to acquire 3319 distinct types of syntax path as
models, which are stored in the database. We can then
compare a newly obtained syntax path with those in the
database. Once a match is found, the
relationship between the headword and the modifier in the
review being processed is recognized, and so is the sentimental
orientation. Besides, SM has an extra advantage since it is based on
machine learning: a new path can be added
to the database so that the models are upgraded and accuracy
increases.
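The syntax-path idea of SM can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each word is described by its chain of parse-tree nodes from the root down to its lowest constituent, that each UP or DOWN step records the tag of the node arrived at, and that stored models are matched by exact string equality. The node ids, the small tree, and the names `syntax_path` and `PathModel` are hypothetical; the tree is chosen only so that it reproduces the path string quoted in this paper, and it is not the tree of Fig.7.

```python
from typing import Iterable, List, Tuple

Node = Tuple[int, str]  # (unique node id, constituent tag)


def syntax_path(feature_chain: List[Node], modifier_chain: List[Node]) -> str:
    """Encode the shortest tree path from the feature to the modifier.

    Both chains run from the root down to the word's lowest constituent.
    """
    # Count the shared prefix; its last node is the lowest common ancestor.
    shared = 0
    for a, b in zip(feature_chain, modifier_chain):
        if a != b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("the two words do not belong to the same tree")
    # Climb from the feature up to the common ancestor ...
    ups = [tag for _, tag in reversed(feature_chain[shared - 1:-1])]
    # ... then descend to the modifier.
    downs = [tag for _, tag in modifier_chain[shared:]]
    steps = ["UP@" + t for t in ups] + ["DOWN@" + t for t in downs]
    return "#".join(steps)


class PathModel:
    """A store of known feature-to-modifier paths that can grow over time."""

    def __init__(self, paths: Iterable[str] = ()):
        self.paths = set(paths)

    def matches(self, path: str) -> bool:
        return path in self.paths

    def learn(self, path: str) -> None:
        self.paths.add(path)


# Hypothetical chains for a feature word and a modifier word.
feature_chain = [(0, "NP"), (1, "NP"), (2, "NN")]
modifier_chain = [(0, "NP"), (3, "CP"), (4, "CP"), (5, "IP"), (6, "VP")]
path = syntax_path(feature_chain, modifier_chain)

model = PathModel()
model.learn(path)
related = model.matches(path)
```

Using node ids next to the tags keeps two different NP nodes from being mistaken for a shared ancestor.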
As for negative words and double negation,
SM uses the same approach as WM, so we do
not repeat it here.
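The window search and the negation-parity rule shared by WM and SM can be sketched as below. This is a simplified illustration under stated assumptions: windows are measured in word positions, the window sizes `big` and `small` are arbitrary defaults, ties between an equally near backward and forward modifier go to the backward one, and the English words stand in for tagged Chinese words. The function name `wm_orientation` and all sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set


def wm_orientation(words: List[str], head: int, polarity: Dict[str, int],
                   negatives: Set[str], big: int = 4,
                   small: int = 2) -> Optional[int]:
    """Return +1 or -1 for the feature at words[head], or None if no
    polarized modifier lies inside the Big Window."""
    # Big Window: positions within `big` words of the headword.
    lo, hi = max(0, head - big), min(len(words) - 1, head + big)
    mod = None
    for dist in range(1, big + 1):           # nearest modifier first
        for idx in (head - dist, head + dist):
            if lo <= idx <= hi and words[idx] in polarity:
                mod = idx
                break
        if mod is not None:
            break
    if mod is None:
        return None
    # Small Window around the modifier, clipped to the Big Window.
    s_lo, s_hi = max(lo, mod - small), min(hi, mod + small)
    n_neg = sum(1 for idx in range(s_lo, s_hi + 1) if words[idx] in negatives)
    sign = polarity[words[mod]]
    # An even number of negative words keeps the orientation, odd flips it.
    return sign if n_neg % 2 == 0 else -sign


polarity = {"cheap": 1, "clear": 1, "slow": -1}   # seed-style labels
negatives = {"not"}

plain = wm_orientation(["screen", "very", "clear"], 0, polarity, negatives)
negated = wm_orientation(["price", "not", "cheap"], 0, polarity, negatives)
double = wm_orientation(["price", "not", "not", "cheap"], 0, polarity, negatives)
```

Here `plain` keeps the modifier's own orientation, `negated` is flipped by a single negative word, and `double` is flipped twice and so keeps the original orientation.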
Finally, we integrated both WM and SM in our system,
which greatly improved the actual analysis accuracy.
The following figure shows the analysis result for a certain
review with these two approaches in our prototype system.


Fig.8 A sentimental analysis result with both WM and SM
D. Acquiring specific feature weight set
Now we have the feature assessment set $A_i$ of the $i$-th
item. It is time to consider how to recommend
good items, which requires us to grade each item and rank
them.
A direct idea for grading is to grade each feature of an item
individually and take the sum as its final mark. For
the $j$-th feature of the $i$-th item, more positive reviews mean
the feature is more appreciated, so we simply take the number of
positive assessments minus the number of negative ones,
$a^{+}_{ij} - a^{-}_{ij}$, as its grade. The total mark of the $i$-th
item is then
$$G_i = \sum_{j=1}^{k_i} \left(a^{+}_{ij} - a^{-}_{ij}\right), \tag{1}$$
where $k_i$ is the number of specific features of the $i$-th item.
However, to a particular customer, different features are not
equally important. Take a mobile phone for
example: a certain customer may pay more attention to its
price than to whether it has access to the Internet. So we
have to assign different weights to different features.
Specifically, for the $i$-th item we define a feature weight
set $W_i = \{w_{i1}, w_{i2}, \dots, w_{ik_i}\}$.


Fig.9 The customer_browser table
To work out $W_i$, we first introduce two tables. The first
one is the customer_browser table, which records a
customer's historical purchase details. Whenever the customer
buys a certain item, its features and the corresponding
number of times each is mentioned (both positively and negatively)
are recorded in the table. Fig.9 is a real example. The second
one is the customer_preference table, which reveals what a customer
has deliberately cared about. See Fig.10.


Fig.10 The customer_preference table
Obviously, together these two tables to some
degree reflect a customer's interests and demands, so
they directly influence the feature weight set.
At first we only consider the situation where
these two tables are not empty. Let the frequency (number of
appearances) of feature $f_i$ in the customer_browser table
be $b_i$ ($i = 1, 2, \dots, m$); then the total frequency of all
features is $B = \sum_{i=1}^{m} b_i$, where $m$ is the number of all
features. Similarly, the frequency of feature $f_i$ in the
customer_preference table is denoted $p_i$, and the total
frequency is $P = \sum_{i=1}^{m} p_i$. Now we focus on the $i$-th item,
whose $j$-th feature is $f_{ij}$. Its frequency in the customer_browser
table is $b_{ij}$, which gives the ratio
$$r^{B}_{ij} = \frac{b_{ij}}{B}. \tag{2}$$
In the customer_preference table its frequency is $p_{ij}$,
which gives the ratio
$$r^{P}_{ij} = \frac{p_{ij}}{P}. \tag{3}$$
Then we set the weight of feature $f_{ij}$ to
$$w_{ij} = r^{B}_{ij} + r^{P}_{ij}. \tag{4}$$
For features that never appear in the two tables,
the weight is 0 according to equation (4). This is
reasonable because we want the items whose
features the customer mainly cares about to stand out when
making recommendations.
However, if one of these two tables is empty, its influence
on the final $w_{ij}$ is set to nil, which means $w_{ij}$
is either the customer_browser ratio $r^{B}_{ij}$ of equation (2)
or the customer_preference ratio $r^{P}_{ij}$ of equation (3).
Furthermore, if both tables are empty, the customer has never
bought anything or cared about any feature. In that case all
features have the same importance for the customer, and we let
the feature weight set satisfy
$w_{i1} = w_{i2} = \dots = w_{ik_i}$ for the $i$-th
item.
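The weighting rules above can be sketched in a few lines. This is a minimal sketch under stated assumptions: equation (4) is taken to be the sum of the two frequency ratios, the term of an empty table is simply dropped, and the common weight used when both tables are empty is set to 1.0 (this paper only requires the weights to be equal). The function name `feature_weights`, the dictionary layout of the two tables, and the sample numbers are hypothetical.

```python
from typing import Dict, List


def feature_weights(features: List[str], browser: Dict[str, int],
                    preference: Dict[str, int]) -> Dict[str, float]:
    """Weight each feature of one item from a customer's two tables.

    `browser` and `preference` map a feature name to its frequency in the
    customer_browser and customer_preference tables respectively.
    """
    total_b = sum(browser.values())      # B: total browser frequency
    total_p = sum(preference.values())   # P: total preference frequency
    if total_b == 0 and total_p == 0:
        # No history at all: every feature is equally important (assumed 1.0).
        return {f: 1.0 for f in features}
    weights = {}
    for f in features:
        ratio_b = browser.get(f, 0) / total_b if total_b else 0.0
        ratio_p = preference.get(f, 0) / total_p if total_p else 0.0
        weights[f] = ratio_b + ratio_p   # assumed form of equation (4)
    return weights


browser = {"price": 3, "screen": 1}
preference = {"price": 1, "battery": 1}
item_features = ["price", "screen", "camera"]

both = feature_weights(item_features, browser, preference)
only_browser = feature_weights(item_features, browser, {})
no_history = feature_weights(item_features, {}, {})
```

A feature such as "camera" that appears in neither table gets weight 0, so it does not contribute to the item's mark for this customer.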
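With the specific feature weight set, the total mark of
the $i$-th item is revised to
$$G_i = \sum_{j=1}^{k_i} w_{ij} \left(a^{+}_{ij} - a^{-}_{ij}\right), \tag{5}$$
where $a^{+}_{ij}$ and $a^{-}_{ij}$ are the numbers of positive and
negative assessments of the $j$-th feature.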
E. Acquiring overall feature weight set
Although equation (5) accounts for the situation where
a certain customer pays more attention to some particular
features, it is still not complete. There is a problem: in the initial
database, different features are mentioned a different number of
times. Using equation (5) directly may weaken the
influence of seldom-mentioned features. Here is an
example:
Suppose items A and B have two features: quality and
price. For A, quality has 100 positive and 0 negative
reviews, and price has 10 positive and 0 negative reviews. For
B, quality has 210 positive and 0 negative reviews, and price
has 0 positive and 100 negative reviews. According to
equation (5), and assuming $w_{\text{quality}} = w_{\text{price}} = w$
for both A and B, we have
$G_{A,\text{quality}} = 100w$,
$G_{A,\text{price}} = 10w$,
$G_A = 110w$,
$G_{B,\text{quality}} = 210w$,
$G_{B,\text{price}} = -100w$,
$G_B = 110w$,
so $G_A = G_B$.
Although A is praised on both features while B is
criticized on its price, their marks are the same. Apparently the
influence of price is weakened. Since all features should count
equally, we need to take measures to balance their influence.
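For the feature $f_j$ of the overall feature set $F$, let the
number of times it is mentioned (both positively and negatively) in
the reviews in the database be $T_j$. We then let $C_j = 1/T_j$, so
that the overall feature weight set is
$C = \{C_1, C_2, \dots, C_m\}$. Obviously it is in one-to-one
correspondence with the overall feature set $F$.
Every feature is influenced by $C$, so equation (5)
becomes
$$G_i = \sum_{j=1}^{k_i} w_{ij}\, C_{ij} \left(a^{+}_{ij} - a^{-}_{ij}\right), \tag{6}$$
where $C_{ij}$ is the element of $C$ that corresponds to feature $f_{ij}$.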
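With equation (6), the problem of the previous example is
solved. Counting mentions over the reviews of A and B only gives
$T_{\text{quality}} = 310$ and $T_{\text{price}} = 110$, and with
overall feature weights $1/T_{\text{quality}}$ and $1/T_{\text{price}}$:
$G_A = (100/310 + 10/110)\,w \approx 0.41w$,
$G_B = (210/310 - 100/110)\,w \approx -0.23w$.
Thus the grade of B is lower than that of A, which agrees with
common sense.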
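The A/B example can be checked numerically with a short sketch. It assumes that the overall feature weight of a feature is the reciprocal of its total number of mentions, that the database holds only the reviews of A and B, and that every feature weight is $w = 1$. The names `total_mentions`, `grade_eq5` and `grade_eq6` are hypothetical.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # feature -> (positive, negative)

items: Dict[str, Counts] = {
    "A": {"quality": (100, 0), "price": (10, 0)},
    "B": {"quality": (210, 0), "price": (0, 100)},
}


def total_mentions(all_items: Dict[str, Counts]) -> Dict[str, int]:
    """Count how often each feature is mentioned across all items."""
    totals: Dict[str, int] = {}
    for counts in all_items.values():
        for feature, (pos, neg) in counts.items():
            totals[feature] = totals.get(feature, 0) + pos + neg
    return totals


def grade_eq5(counts: Counts, w: float = 1.0) -> float:
    """Equation (5) with one common feature weight w."""
    return sum(w * (pos - neg) for pos, neg in counts.values())


def grade_eq6(counts: Counts, totals: Dict[str, int], w: float = 1.0) -> float:
    """Equation (6), assuming the overall weight of a feature is 1 / T."""
    return sum(w * (pos - neg) / totals[f] for f, (pos, neg) in counts.items())


totals = total_mentions(items)
for name, counts in items.items():
    print(name, grade_eq5(counts), round(grade_eq6(counts, totals), 3))
```

Under equation (5) both items receive the same mark, while the normalized form separates them and places A above B.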
F. Acquiring item weight set
Additionally, we have to notice that different items have
different numbers of known features, as mentioned above. This
difference can be too big to grade and rank
items fairly. To overcome this problem and balance the inequity,
each item is assigned a weight, and these
weights form a new set called the item weight set,
$V = \{v_1, v_2, \dots, v_t\}$, where $t$ is the number of items.
Assuming the number of features is $k_i$ for the $i$-th
item, we have $v_i = 1/k_i$.
Hence, equation (6) can be further improved to
$$G_i = v_i \sum_{j=1}^{k_i} w_{ij}\, C_{ij} \left(a^{+}_{ij} - a^{-}_{ij}\right). \tag{7}$$
We have now arrived at the final form of the grading
equation and can use it for real grading, ranking and
recommendation.
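Grading and ranking with the final equation can be sketched as follows. The sketch rests on assumptions: the item weight is taken as the reciprocal of the number of known features, the overall feature weight as the reciprocal of the total number of mentions, and the customer's feature weights are passed in as a plain dictionary. The function names `final_grade` and `recommend` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, Tuple[int, int]]  # feature -> (positive, negative)


def final_grade(counts: Counts, weights: Dict[str, float],
                totals: Dict[str, int]) -> float:
    """Grade one item: weighted, normalized marks averaged over its features."""
    k = len(counts)
    if k == 0:
        return 0.0  # an item without known features gets a neutral grade
    mark = 0.0
    for feature, (pos, neg) in counts.items():
        mentions = totals.get(feature, 0)
        if mentions:  # skip features that were never mentioned
            mark += weights.get(feature, 0.0) * (pos - neg) / mentions
    return mark / k  # assumed item weight 1 / k


def recommend(all_items: Dict[str, Counts], weights: Dict[str, float],
              totals: Dict[str, int], top_n: int = 3) -> List[str]:
    """Return the names of the top_n items, best grade first."""
    graded = {name: final_grade(c, weights, totals)
              for name, c in all_items.items()}
    return sorted(graded, key=graded.get, reverse=True)[:top_n]


catalog: Dict[str, Counts] = {
    "A": {"quality": (100, 0), "price": (10, 0)},
    "B": {"quality": (210, 0), "price": (0, 100)},
    "C": {"quality": (5, 0)},
}
mention_totals = {"quality": 315, "price": 110}
customer_weights = {"quality": 1.0, "price": 1.0}

ranking = recommend(catalog, customer_weights, mention_totals)
```

Dividing by the number of known features keeps an item with many reviewed features from outranking others merely because it has more terms in its sum.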
III. EXPERIMENTAL RESULTS OF SOME MAJOR PROCESSES
After designing the comprehensive recommendation
algorithm, we made use of information on multi-brand
mobile phones from Amazon.cn (http://www.amazon.cn). Their
titles and reviews are stored in the database. On the basis of
these real-life data we tested some major processes of our
proposed algorithm.
A. Extraction of the feature, modifier, and negative word sets
Table I shows the experimental results of set extraction.
We then added these sets into our database for further use.
TABLE I
RESULTS OF SET EXTRACTION
Item                Number
Review              15099
Overall features    91
Opinion modifier    850
Negative            35
B. Judgment of sentimental orientation for reviews
Table II shows the result of orientation judgment for
reviews. With WM and SM integrated, 8829 of the 10267 reviews
that hit a feature are judged correctly, a final accuracy of
around 86%.
TABLE II
RESULTS OF ORIENTATION JUDGMENT FOR REVIEWS
Item                  Number
Review                15099
Feature hit review    10267
Correct judgement     8829
C. Analysis of time complexity
Our approach has a time complexity of $O(t \times m)$ in the
worst case, where $t$ is the number of items and $m$
is the number of overall features. In contrast, the complexity of
existing item-based collaborative filtering can reach
$O(t^{2} \times p)$ in the worst case, where $p$ is the number of
customers who bought the item. The comparison shows that our
method has much lower time complexity than existing ones.
IV. THE IMPLEMENTATION OF PROTOTYPE SYSTEM
Based on our proposed Feature-Grading algorithm, we have
developed the prototype system (see Fig.1). It connects to the
underlying database and keeps recording a particular
customer's historical behaviors. A customer can use it to get
real-time recommendation and synthesized recommendation.
For real-time recommendation, the customer is asked to
input his/her preferred features. The system then grades
and ranks all the items based on the customer's current wants
using the Feature-Grading method.
Fig.10 shows the results of real-time recommendation
for the customer Luo Yi.

Fig.10 The real-time recommendation to customer Luo Yi
For synthesized recommendation, since a customer's
historical behaviors are saved, the system directly uses
those records to grade items. Fig.11 is an example of
synthesized recommendation for customer Fan Miao.


Fig.11 The synthesized recommendation to customer Fan Miao
Besides, the module for sentimental analysis of reviews is
added to the function list so that a customer can write his/her own
review of an item (see Fig.8). New reviews will of
course influence future recommendations.
V. CONCLUSIONS AND FUTURE WORK
This paper has proposed a recommendation
algorithm called Feature-Grading and introduced its
corresponding prototype system. We mainly focused on the
design of the 5 processes of this algorithm and discussed some
key experimental results. These results indicate that the
Feature-Grading method works well: on our Amazon.cn data, the
orientation of about 86% of the feature-hit reviews is judged
correctly. It overcomes some
drawbacks of existing recommendation systems and extends
their ability as well.
Our future efforts will be spent on improving the
sentimental analysis of reviews. We plan to extend its
scope from simple sentences to compound sentences, including
transitional, comparative, and imperative sentences.
REFERENCES
[1] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl,
"GroupLens: An Open Architecture for Collaborative Filtering of
Netnews," in Proceedings of CSCW '94, Chapel Hill, NC.
[2] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Analysis of
Recommendation Algorithms for E-Commerce," in Proceedings of
ACM E-Commerce, 2000.
[3] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, "Recommending and
Evaluating Choices in a Virtual Community of Use," in Proceedings of
CHI '95.
[4] U. Shardanand and P. Maes, "Social Information Filtering: Algorithms
for Automating 'Word of Mouth'," in Proceedings of CHI '95, Denver,
CO.
[5] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl,
"GroupLens: Applying Collaborative Filtering to Usenet News,"
Communications of the ACM, 40(3), pp. 77-87.
[6] Hua-Ping Zhang, Hong-Kui Yu, De-Yi Xiong, and Qun Liu,
"HHMM-based Chinese Lexical Analyzer ICTCLAS," in Proceedings of the
Second SIGHAN Workshop on Chinese Language Processing, 2003,
pp. 184-187.
[7] Rakesh Agrawal and Ramakrishnan Srikant, "Fast Algorithms for Mining
Association Rules," in Proceedings of the 20th International Conference
on Very Large Data Bases (VLDB), pp. 487-499, Santiago, Chile,
September 1994.
[8] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek
Gross, and Katherine Miller, "Introduction to WordNet: An On-line
Lexical Database," International Journal of Lexicography, 1990,
pp. 235-244.
[9] Miao Fan, Guoshi Wu, and Jing Li, "Feature-Item Recommender System
for E-Commerce," 2011 International Conference on Computer Control
and Automation.