Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Article info
Article history:
Received 8 April 2013
Received in revised form 3 January 2014
Accepted 6 January 2014
Available online 10 January 2014
Keywords:
Neuro-Fuzzy inference system
Higher order singular value decomposition
Subtractive clustering
Sparsity
Scalability
Multi-criteria collaborative filtering
Abstract
Collaborative Filtering (CF) is the most widely used prediction technique in recommender systems. It makes recommendations based on ratings that users have assigned to items. Most current CF recommender systems maintain only single user ratings inside the user-item ratings matrix. Multi-criteria CF offers the possibility of more accurate recommendations by considering user preferences in multiple aspects of items. However, in multi-criteria CF, users' judgments about item features are frequently subjective, imprecise and vague. This induces uncertainty in reasoning about and representing item features that cannot be resolved with crisp machine learning techniques. In contrast, approaches such as fuzzy methods can better handle this uncertainty. In addition, fuzzy methods can predict users' preferences more accurately and better alleviate the sparsity problem in the overall rating by taking user perception of item features into account. Apart from this, in multi-criteria CF users provide ratings on different aspects (criteria) of an item in new dimensions, thereby aggravating the scalability problem. Appropriate dimensionality reduction techniques are thus needed to capture all the high dimensions together, without reducing them to lower dimensions, in order to reveal the latent associations among the components. This study presents a new model for multi-criteria CF using the Adaptive Neuro-Fuzzy Inference System (ANFIS) combined with subtractive clustering and Higher Order Singular Value Decomposition (HOSVD). HOSVD is used for dimensionality reduction to mitigate the scalability problem, and ANFIS is used to extract fuzzy rules from the experimental dataset, alleviate the sparsity problem in the overall ratings, and represent and reason about users' behavior on item features. Experimental results on a real-world dataset show that the combination of the two techniques remarkably improves the predictive accuracy and recommendation quality of multi-criteria CF.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
During the last decade, the amount of information available online has increased exponentially, and information overload has become one of the major challenges faced by information retrieval and information filtering systems. Recommender systems are one solution to the information overload problem. Recommender systems emerged as an independent research area in the mid-1990s, when researchers shifted their focus to recommendation problems that explicitly rely on a user rating structure [1].
Recommender systems based on Collaborative Filtering (CF) are particularly popular and are used by large online retailers [2–4]. CF algorithms can be divided into two categories: memory-based algorithms and model-based algorithms [3,5,6]. Memory-based (or heuristic-based) methods, such as correlation analysis and vector similarity,
Corresponding author. Tel.: +60 197608281.
E-mail address: nilashidotnet@hotmail.com (M. Nilashi).
0950-7051/$ - see front matter 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.knosys.2014.01.006
search the user database for user profiles that are similar to the profile of the active user for whom the recommendation is made [7]. Heuristic-based approaches are classed into user-based and item-based approaches [6,8]. User-based CF has been the most popular and commonly used (memory-based) CF strategy [9]. It is based on the premise that similar users will like similar items. Item-based CF was first proposed by [10] as an alternative style of CF that avoids the scalability bottleneck associated with the traditional user-based algorithm. The bottleneck arises from the search for neighbors in a population of users that is continuously growing. In item-based CF, similarities are calculated between items rather than between users, the intuition being that a user will be interested in items which are similar to items he has liked in the past. Two of the most popular approaches to computing similarities between users and items are the Pearson correlation coefficient and cosine-based coefficients.
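As a minimal sketch of the user-based, memory-based approach, the following code predicts an active user's rating as a Pearson-similarity-weighted average of the neighbors' mean-centered ratings; the tiny ratings dictionary is hypothetical and only for illustration.

```python
import math

# Hypothetical toy ratings matrix (users -> item ratings), for illustration only.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i1": 1, "i2": 5, "i3": 2},
}

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    n = len(common)
    if n == 0:
        return 0.0
    mu_u = sum(ratings[u][i] for i in common) / n
    mu_v = sum(ratings[v][i] for i in common) / n
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common)) * \
          math.sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, item):
    """Similarity-weighted deviation from each neighbor's mean rating."""
    mu_a = sum(ratings[active].values()) / len(ratings[active])
    num = den = 0.0
    for v in ratings:
        if v == active or item not in ratings[v]:
            continue
        s = pearson(active, v)
        mu_v = sum(ratings[v].values()) / len(ratings[v])
        num += s * (ratings[v][item] - mu_v)
        den += abs(s)
    return mu_a + num / den if den else mu_a
```

Item-based CF follows the same pattern with the roles of users and items exchanged, computing similarities between item columns instead of user rows.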
One of the main problems in recommender systems, specifically in CF, is the sparsity problem [11–14]. Memory-based CF approaches also suffer from the scalability problem; scaling these systems up to real datasets is therefore one of the main challenges, and many studies have attempted to overcome it [15–18].
Compared with memory-based algorithms, model-based algorithms usually scale better in terms of their resource requirements (memory and computing time) and do not require keeping actual user profiles for predictions [5,10,19]. Model-based CF adopts an eager learning strategy in which a model of the data, i.e. the users, items and their ratings for those items, is pre-computed [6,8,20]. Several studies have suggested that model-based CF can also produce better predictive accuracy than memory-based collaborative filtering by using more sophisticated techniques such as matrix factorisation and dimensionality reduction, for example [21,22].
1.1. Multi-criteria collaborative filtering
The ratings provided by users for items are the key input to CF recommender systems. They convey information about the quality of the item along with the preference of the user who shared the rating. Most recommender systems are developed for single-valued ratings. According to Adomavicius and Kwon [23], pure CF-based recommender systems rely solely on product ratings provided by a large user community to generate personalized recommendation lists for each individual online user. In traditional CF systems the assumption is that customers provide an overall rating for the items which they have purchased, for example using a 5-star rating system. However, given the value of customer feedback to the business, customers in some domains are nowadays given the opportunity to provide more fine-grained feedback and to rate products and services along various dimensions [24,25].
Adomavicius and Kwon [23] introduced schemes for incorporating multi-criteria rating information into the recommendation process. For example, they considered for each item multiple criteria ratings and an overall rating that indicates how much the item is liked by the user, based on the user's perception of the item's features. They characterized single-rating CF recommenders as systems that attempt to estimate a rating function of the form Users × Items → R0 for predicting a rating for any given user-item pair, where R0 is a totally ordered set, typically composed of real-valued numbers inside a certain range. They further discussed that in multi-criteria recommender systems, in comparison, the rating function takes the form Users × Items → R0 × R1 × ⋯ × Rk. Therefore, in the multi-criteria CF problem there are m users, n items and k criteria in addition to an overall rating. Thus, users can provide a number of explicit ratings for items; an overall rating R0 must be predicted in addition to the k criteria ratings (R1, …, Rk). The system can be configured to push new items to users in two ways: either by producing a Top-N list of recommendations for a given target user, or by predicting the target user's likely utility (or rating) for a particular unseen item. We will refer to these as the recommendation task and the rating prediction task in multi-criteria CF, respectively.
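The multi-criteria rating function Users × Items → R0 × R1 × ⋯ × Rk can be sketched as a simple lookup structure. The users, items and rating values below are made up; the four criteria names follow the Yahoo!Movies dataset used later in the paper.

```python
# Hypothetical multi-criteria ratings: each (user, item) pair carries an
# overall rating R0 plus k = 4 criteria ratings (R1..R4).
multi_ratings = {
    ("u1", "movie1"): {"overall": 12, "acting": 11, "directing": 12,
                       "story": 13, "visuals": 10},
    ("u2", "movie1"): {"overall": 9, "acting": 8, "directing": 10,
                       "story": 9, "visuals": 9},
}

def rating_vector(user, item):
    """Return (R0, R1, ..., Rk) for a user-item pair, or None if unrated."""
    r = multi_ratings.get((user, item))
    if r is None:
        return None
    return (r["overall"], r["acting"], r["directing"],
            r["story"], r["visuals"])
```

The rating prediction task fills in the missing tuples; the recommendation task ranks unseen items by the predicted overall rating R0 and returns the Top-N.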
1.2. The problems and our contributions
In the context of personalization applications, traditional single-rating CF has been highly successful; however, CF with multi-criteria ratings for items has rarely been touched, and the issue is largely unexplored. According to Adomavicius and Kwon [23], the problem of multi-criteria recommendation with a single overall rating is still considered an optimization problem. Moreover, rating items on multiple aspects in CF recommender systems presents new challenges, such as sparsity in the criteria and overall ratings, and scalability
Yahoo!Movies network and several comparisons are conducted between our method and other algorithms.
Thus, in comparison with research efforts found in the literature, our work has the following differences. In this research:
A new hybrid recommendation model using HOSVD and Neuro-Fuzzy techniques is proposed to increase the predictive accuracy and improve the scalability of multi-criteria CF.
The sparsity issue in overall ratings is addressed using the Neuro-Fuzzy technique.
HOSVD is used for scalability improvement.
The remainder of this paper is organized as follows: In Section 2, research background and related work are described. The HOSVD dimensionality reduction technique, the k-Nearest Neighbor (k-NN) classifier, ANFIS and subtractive clustering are introduced in separate subsections in Section 3. Section 4 provides an overview of the research methodology. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.
considered as linguistic labels by using fuzzy sets. Pinto et al. [45] proposed a model that combined fuzzy numbers, product positioning (from marketing theory) and item-based CF. Zhang et al. [30] developed a hybrid recommendation approach combining user-based and item-based CF techniques with fuzzy set techniques and applied it to mobile product and service recommendation. They tested the prediction accuracy of their hybrid approach using the MovieLens 100K dataset.
In the case of multi-criteria CF, little research has been conducted on extending the similarity calculation of the traditional memory-based CF approach to multi-criteria ratings [23,46,47]; in these works the similarities between users are estimated by aggregating traditional similarities from individual criteria or by applying multidimensional distance metrics. In the aggregation-function approach of Adomavicius and Kwon [23], the overall rating r0 is viewed as an aggregate of the multi-criteria ratings. Under this assumption, the method finds an aggregation function f representing the connection between the overall and multi-criteria ratings:

r0 = f(r1, …, rk)
Jannach et al. [24] further improved the accuracy of multi-criteria CF by proposing a method using Support Vector Regression (SVR) to automatically detect the existing relationships between detailed item ratings and the overall ratings. The SVR models were learned per item and per user, and the individual predictions were finally combined in a weighted approach. Similar to our research, they evaluated their methods using the Yahoo!Movies dataset.
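As a minimal stand-in for the aggregation-function idea r0 = f(r1, …, rk), the sketch below fits f by ordinary least squares rather than the SVR used by Jannach et al.; the criteria/overall rating pairs are fabricated for illustration.

```python
import numpy as np

# Rows: [r1, r2, r3, r4] criteria ratings; y: corresponding overall ratings.
# All values here are made up.
X = np.array([[11, 12, 13, 10],
              [8, 10, 9, 9],
              [13, 13, 12, 12],
              [5, 6, 7, 5],
              [10, 9, 11, 10]], dtype=float)
y = np.array([12.0, 9.0, 13.0, 6.0, 10.0])

# Add an intercept column and solve min ||Aw - y||^2 for the weights of f.
A = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_overall(criteria):
    """Apply the fitted linear aggregation function f to criteria ratings."""
    return float(np.dot(np.append(np.asarray(criteria, dtype=float), 1.0), w))
```

A linear f is the simplest choice; the SVR of Jannach et al. replaces the least-squares fit with a kernel regression while keeping the same input/output structure.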
3. Materials and methods
3.1. Higher Order Singular Value Decomposition (HOSVD)
To represent and recognize high-dimensional data effectively, dimensionality reduction is conducted on the original dataset to obtain a low-dimensional representation [57]. Visualizing, comparing, and decreasing the processing time of data are the main advantages of dimensionality reduction techniques. HOSVD is a powerful dimensionality reduction technique for tensor decomposition, proposed by Lathauwer et al. [58] as a generalization of the SVD to tensors. HOSVD is computed in the following steps:
Step 1: Unfolding of the mode-d tensor T ∈ R^(I1×⋯×Id), which yields the matrices A(1), …, A(d), defined as

A(n) ∈ R^(In × (I1⋯I(n−1)I(n+1)⋯Id)),  n = 1, …, d.

The matrix unfolding of a tensor can be defined as a matrix representation of that tensor in which all the column (row, etc.) vectors are stacked one after the other [58]. In the case of 3rd-order tensors T ∈ R^(I1×I2×I3), there exist three matrix unfoldings (see Fig. 1):

mode 1: j = i2 + (i3 − 1)I3,
mode 2: j = i3 + (i1 − 1)I1,
mode 3: j = i1 + (i2 − 1)I2.

Step 2: Identifying the d left singular matrices U(1), …, U(d), obtained from the SVD of each unfolding:

A(n) = U(n) Σ(n) (V(n))ᵀ,  n = 1, …, d,   (3)

where Σ(n) ∈ R^(In×In) holds the mode-n singular values σi(n).

Step 3: Computing the core tensor

S = T ×1 (U(1))ᵀ ×2 (U(2))ᵀ ⋯ ×d (U(d))ᵀ,

where the sub-tensors S(in=a) of S ∈ R^(I1×I2×⋯×Id), found by fixing the nth index to a, satisfy the ordering property

σ1(n) ≥ σ2(n) ≥ ⋯ ≥ σIn(n) ≥ 0.   (5)
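The three steps above can be sketched numerically with NumPy; the tensor sizes are arbitrary toy values, and the unfolding uses NumPy's default axis ordering rather than a particular index convention.

```python
import numpy as np

T = np.random.rand(4, 5, 3)  # toy sizes: users x items x criteria

def unfold(T, mode):
    """Mode-n unfolding: mode-n fibres of T become the columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Step 2: left singular matrices of each unfolding.
U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(3)]

# Step 3: core tensor S = T x1 U1^T x2 U2^T x3 U3^T.
S = T
for n in range(3):
    S = np.moveaxis(np.tensordot(U[n].T, np.moveaxis(S, n, 0), axes=1), 0, n)

# Sanity check: multiplying back by the U(n) recovers T exactly,
# since each U(n) is orthogonal here.
R = S
for n in range(3):
    R = np.moveaxis(np.tensordot(U[n], np.moveaxis(R, n, 0), axes=1), 0, n)
assert np.allclose(R, T)
```

Truncating each U(n) to its first rn columns (and S to the matching block) gives the low-rank approximation used later for dimensionality reduction.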
Fig. 1. The three matrix unfoldings of a third-order tensor: A(1) ∈ R^(I1×I2I3), A(2) ∈ R^(I2×I1I3) and A(3) ∈ R^(I3×I1I2).
The Frobenius norm of the tensor is preserved by the decomposition:

‖T‖²F = Σ_{i=1}^{I1} (σi(1))² = ⋯ = Σ_{i=1}^{Id} (σi(d))² = ‖S‖²F,

and truncating each mode n to rank Rn yields the approximation error bound

‖T − T̃‖²F ≤ Σ_{n=1}^{d} Σ_{in=Rn+1}^{In} (σin(n))².
Table 1
Computational cost for the main steps in HOSVD.

Step — Cost
Constructing the unfoldings A(n) — O(I1I2⋯IN)
Determining A(n)(A(n))ᵀ to obtain U(n) — O(In²·I1⋯I(n−1)I(n+1)⋯IN), plus O(In³) for the decomposition
Contracting the tensor T with the matrices U(n) to get the core tensor S — O(In²·I1⋯I(n−1)I(n+1)⋯IN)
Fig. 2. Procedure for decomposing tensors via HOSVD [59].

For an n-dimensional feature space, the k-NN classifier can measure the distance between an observation xs and a query yt in several ways:

Euclidean: dst = sqrt( Σ_{j=1}^{n} (xsj − ytj)² ),   (7)

City-block: dst = Σ_{j=1}^{n} |xsj − ytj|,   (8)

Correlation: dst = 1 − (xs − x̄s)(yt − ȳt)ᵀ / ( sqrt((xs − x̄s)(xs − x̄s)ᵀ) · sqrt((yt − ȳt)(yt − ȳt)ᵀ) ),   (9)

where x̄s = (1/n) Σ_j xsj and ȳt = (1/n) Σ_j ytj.   (10)
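The three distance measures translate directly into code; this is a minimal sketch over plain Python sequences.

```python
import math

def euclidean(x, y):
    """Eq. (7): straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Eq. (8): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """Eq. (9): one minus the correlation of the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return 1 - num / den
```

Perfectly correlated vectors have correlation distance 0 even when their Euclidean distance is large, which is why the choice of metric matters for k-NN accuracy.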
Rule 1: IF In1 is A1 AND In2 is B1 THEN f11 = p11 In1 + q11 In2 + r11
Rule 2: IF In1 is A1 AND In2 is B2 THEN f12 = p12 In1 + q12 In2 + r12
Rule 3: IF In1 is A2 AND In2 is B1 THEN f21 = p21 In1 + q21 In2 + r21
Rule 4: IF In1 is A2 AND In2 is B2 THEN f22 = p22 In1 + q22 In2 + r22

where the parameters A1, A2, B1 and B2 are linguistic labels indicating the MFs for the input parameters In1 and In2, respectively. Also, the parameters pij, qij and rij (i, j = 1, 2) denote the parameters of the output MFs.
In Fig. 4, each layer of the ANFIS performs a different function, as detailed below:

Layer 1: In this layer, membership grades are provided by the nodes, which are adaptive. The outputs of this layer are obtained by:

O1,Ai = μAi(In1),  i = 1, 2;  O1,Bj = μBj(In2),  j = 1, 2,   (11)

with Gaussian membership functions

μAi(In1; σi, ci) = exp(−(In1 − ci)² / (2σi²)),  i = 1, 2,
μBj(In2; σj, cj) = exp(−(In2 − cj)² / (2σj²)),  j = 1, 2,   (12)

where {σi, ci} and {σj, cj} are the parameters governing the Gaussian functions. The parameters of this layer are usually referred to as premise parameters.

Layer 2: There is a fixed number of nodes in the second layer, labeled Π. The outputs of the second layer are the rule firing strengths:

O2,ij = Wij = μAi(In1) · μBj(In2),  i, j = 1, 2.   (13)
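Layers 1 and 2 can be sketched as follows; the Gaussian premise parameters are illustrative values, not ones fitted in the paper.

```python
import math

def gauss(x, sigma, c):
    """Gaussian MF, eq. (12)."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# Illustrative premise parameters (sigma, c) for A1, A2 and B1, B2.
A = [(1.0, 10.5), (1.0, 12.5)]
B = [(1.2, 10.0), (1.2, 12.0)]

def layer1(in1, in2):
    """Eq. (11): membership grades of the two inputs."""
    mu_a = [gauss(in1, s, c) for s, c in A]
    mu_b = [gauss(in2, s, c) for s, c in B]
    return mu_a, mu_b

def layer2(mu_a, mu_b):
    """Eq. (13): firing strength W_ij of each of the four rules."""
    return [[mu_a[i] * mu_b[j] for j in range(2)] for i in range(2)]
```

An input lying at the center of a membership function receives grade 1, and the product in layer 2 implements the fuzzy AND of the two antecedents.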
Layer 3: The nodes of this layer normalize the firing strengths:

O3,ij = W̄ij = Wij / (Σ_{i=1}^{2} Σ_{j=1}^{2} Wij),  i, j = 1, 2.   (14)

Layer 4: Each node computes the weighted consequent of its rule:

O4,ij = W̄ij fij = W̄ij (pij In1 + qij In2 + rij),  i, j = 1, 2,   (15)

where W̄ij is the output of layer 3, and {pij, qij, rij} is the parameter set.

Layer 5: There is only one single fixed node, labeled Σ. This node performs the summation of all incoming signals. Hence, the overall output of the model is given by:

Out = O5 = Σ_{i=1}^{2} Σ_{j=1}^{2} W̄ij fij = Σ_{i=1}^{2} Σ_{j=1}^{2} W̄ij (pij In1 + qij In2 + rij) = Σ_{i=1}^{2} Σ_{j=1}^{2} (W̄ij pij In1 + W̄ij qij In2 + W̄ij rij),   (16)

where the overall output Out is a linear combination of the consequent parameters when the values of the premise parameters are fixed.

In general, a first-order Takagi-Sugeno rule has the form

If x1 is A1, x2 is A2, …, xk is Ak then y = g(x1, x2, …).   (17)

In subtractive clustering, every data point is treated as a potential cluster center, with potential

Pi = Σ_{j=1}^{N} exp(−‖Xi − Xj‖² / (ra/2)²),   (18)

where Xi = [Xi1, Xi2, …, Xin] and Xj = [Xj1, Xj2, …, Xjn] are data vectors for the input and output dimensions, ra is a positive constant defining the neighborhood range of the cluster, or simply the radius of the hypersphere cluster in data space, and ‖·‖ indicates the Euclidean distance. ra is a critical parameter that determines the number of cluster centers and their locations. The first cluster center is selected as the data point c1 with the highest potential value, Pc1. For the second cluster center, the new density values are determined by subtracting the influence of the first cluster center as follows:

Pi ← Pi − Pc1 exp(−‖Xi − Xc1‖² / (rb/2)²),  rb = η ra,   (19)

and, in general, after the kth cluster center has been found:

Pi ← Pi − Pck exp(−‖Xi − Xck‖² / (rb/2)²).   (20)
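A minimal sketch of this potential-subtraction loop follows; the radius ra and the factor η are illustrative defaults, not values tuned in the paper.

```python
import math

def subtractive_clustering(points, ra=1.0, eta=1.5, n_centers=2):
    """Eqs. (18)-(20): pick the point with the highest potential as a
    center, subtract its influence from every potential, and repeat."""
    rb = eta * ra

    def pot(radius, xi, xj):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
        return math.exp(-d2 / (radius / 2) ** 2)

    # Eq. (18): initial potential of every data point.
    P = [sum(pot(ra, x, y) for y in points) for x in points]
    centers = []
    for _ in range(n_centers):
        k = max(range(len(points)), key=lambda i: P[i])
        pk, ck = P[k], points[k]
        centers.append(ck)
        # Eqs. (19)-(20): suppress potentials near the newly found center.
        P = [p - pk * pot(rb, x, ck) for p, x in zip(P, points)]
    return centers
```

Because rb > ra, the subtraction suppresses a wider neighborhood than the one that built the potential, which keeps successive centers well separated.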
(Figure: the multi-criteria dataset — ratings on criteria 1, …, k plus the overall rating — is partitioned into clusters 1, …, n; one IF-THEN rule is generated per cluster, and the rules form the fuzzy rule database.)
ratings database. Previous studies [74,75] have indicated the benefits of applying clustering in recommender systems. Using HOSVD and cosine-similarity approaches, we perform the clustering task in an effective way for multi-criteria CF.
As discussed earlier, in the recommendation task of multi-criteria CF, recommender systems deal with high-dimensional data, which makes the computational cost extremely high, and even infeasible, for traditional dimensionality reduction techniques. Given the scalability challenge, in this paper HOSVD is able to (1) factorize large tensors efficiently, using much less time than standard methods, and (2) obtain low-rank factors that preserve the main variance of the tensors. Thus, thanks to the dimensionality reduction, we can better form and pre-compute the neighborhood, which makes prediction generation much faster in multi-criteria CF; forming neighborhoods in the low-dimensional eigenspace provides better quality and performance. In addition, after the tensor is decomposed by HOSVD, the data can be clustered effectively using cosine-based similarity, and once the clustering is complete the performance of multi-criteria CF can be very good, since the size of the group that must be analyzed is much smaller.
To apply HOSVD, the 3-dimensional data is stored in the 3-dimensional tensor A ∈ R^(I1×I2×I3), where I1 corresponds to the number of users, I2 to the number of items that were rated, and I3 to the number of criteria used. Each entry of the tensor A is a number between 1 and 13. Using HOSVD, the tensor A ∈ R^(I1×I2×I3) that contains the user ratings of items on four criteria was decomposed as A = S ×1 U ×2 V ×3 W, in which U ∈ R^(I1×I1), V ∈ R^(I2×I2) and W ∈ R^(I3×I3) are orthonormal matrices, and S ∈ R^(I1×I2×I3) is a core tensor which satisfies the all-orthogonality and ordering properties. Similar to the truncated SVD for low-rank approximation and dimensionality reduction of matrices, low-rank approximation and dimensionality reduction of higher-order tensors can be done by the truncated HOSVD (but with better approximation and computation); that is, take the first r1 columns of U, the first r2 columns of V, the first r3 columns of W, and the top-left r1 × r2 × r3 block of S. In that direction, HOSVD is an effective method for dimensionality reduction of a 3-dimensional dataset, and it is flexible in that a different rank r can be chosen for each mode of the tensor. The size of the data goes down from I1I2I3 to r1r2r3 + I1r1 + I2r2 + I3r3, and if r1 = r2 = r3 = r the size of the data goes down to r³ + r(I1 + I2 + I3). If we instead flatten the tensor into an I1 × I2I3 matrix, the size of the data only goes down to r² + r(I1 + I2I3). The result of the HOSVD decomposition of the 3rd-order tensor of user ratings is therefore the matrices U, V and W, which capture the user-user, item-item, and criterion-criterion relations, respectively. This decomposition is obtained without splitting the 3-dimensional space into pairwise relations. For the sake of conciseness, a very simple example with only 4 users, 6 items and 4 criteria is demonstrated in the following. Table 2 shows the ratings of items by users based on 4 criteria, and the decomposition into the matrices U, V, W, S(:,:,1), S(:,:,2), S(:,:,3) and S(:,:,4) is shown in Table 3.
The similarity between two vectors A and B is measured by the cosine of the angle between them:

similarity = cos(A, B) = A·B / (‖A‖ ‖B‖) = Σ_{i=1}^{n} Ai Bi / ( sqrt(Σ_{i=1}^{n} Ai²) · sqrt(Σ_{i=1}^{n} Bi²) ).   (21)
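Eq. (21) translates directly into code:

```python
import math

def cosine_similarity(a, b):
    """Eq. (21): cos(A, B) = A.B / (||A|| ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Applied to rows of the truncated factor matrices, this measure groups users (or items) whose low-dimensional representations point in the same direction.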
Table 2
Multi-criteria ratings for 4 users and 6 items (I1-I6) on 4 criteria (C1-C4), together with a new user's ratings on the four criteria; ratings take values in 1-13 (the individual table entries are not reproduced here).

The fuzzy rules extracted for the four inputs Acting (A), Directing (D), Story (S) and Visuals (V) have the form:

Rule 1: IF A is A1 AND D is B1 AND S is C1 AND V is D1 THEN f1 = p1 A + q1 D + r1 S + t1 V + p1
Rule 2: IF A is A2 AND D is B2 AND S is C2 AND V is D2 THEN f2 = p2 A + q2 D + r2 S + t2 V + p2
⋮
Rule n: IF A is An AND D is Bn AND S is Cn AND V is Dn THEN fn = pn A + qn D + rn S + tn V + pn   (22)
Table 3
Decomposition of the example ratings tensor into the matrices U, V and W and the core tensor slices S(:,:,1), S(:,:,2), S(:,:,3) and S(:,:,4) (the numerical entries are not reproduced here).

(Figure: users U1-U4, items I1-I6 and the new user plotted in the two-dimensional eigenspace.)

The membership functions for the inputs S and V are Gaussian:

μAi(S) = exp(−(S − bi)² / (2ai²)),   (23)

μAi(V) = exp(−(V − bi)² / (2ai²)),   (24)

(Figure: degree of membership of clusters 1-4 over the rating range 10-13 for the two inputs.)
of clusters was used as the training data, the second group, comprising 10% of the total dataset of clusters, was used as the checking data, and the remaining 10% of the cluster data was used as the testing data.
5. Results and discussion
Fig. 8. Architecture of implemented ANFIS model for two inputs, one output and
two rules.
where {ai, bi, ci} is the parameter set of the MFs in the premise part of the fuzzy IF-THEN rules that changes the shapes of the MFs. The parameters in this layer are referred to as the premise parameters.
From the ANFIS architecture shown in Fig. 8, it can be observed that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters. In symbols, the output O can be rewritten as:

O = (w1 / (w1 + ⋯ + wn)) f1 + ⋯ + (wn / (w1 + ⋯ + wn)) fn
  = w̄1 f1 + ⋯ + w̄n fn
  = w̄1 (p1 A + q1 D + r1 S + t1 V + p1) + ⋯ + w̄n (pn A + qn D + rn S + tn V + pn)
  = (w̄1 A) p1 + (w̄1 D) q1 + (w̄1 S) r1 + (w̄1 V) t1 + w̄1 p1 + ⋯ + (w̄n A) pn + (w̄n D) qn + (w̄n S) rn + (w̄n V) tn + w̄n pn,   (25)

which is linear in the consequent parameters pi, qi, ri, ti and the constant terms. Fig. 9 shows the architecture of the implemented ANFIS, which consists of four inputs, four rules, sixteen MFs for the inputs, and one output.
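Because eq. (25) is linear in the consequent parameters, they can be estimated in closed form by least squares once the normalized firing strengths are known (the least-squares half of ANFIS hybrid learning). The sketch below builds the design matrix from synthetic firing strengths and recovers known consequent parameters; all data is randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_rules = 40, 2                       # samples, rules (toy sizes)
X = rng.uniform(8, 13, size=(m, 4))      # inputs: A, D, S, V
W = rng.uniform(0.1, 1.0, size=(m, n_rules))
Wbar = W / W.sum(axis=1, keepdims=True)  # normalized firing strengths

# Target generated from known consequents so recovery can be checked.
true_theta = rng.normal(size=5 * n_rules)

def design(X, Wbar):
    """One row per sample: [w1*A, w1*D, w1*S, w1*V, w1, ..., wn*A, ..., wn],
    matching the linear expansion in eq. (25)."""
    cols = []
    for r in range(Wbar.shape[1]):
        w = Wbar[:, r:r + 1]
        cols.append(np.hstack([w * X, w]))
    return np.hstack(cols)

Phi = design(X, Wbar)
y = Phi @ true_theta
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

In hybrid learning this least-squares pass alternates with gradient updates of the premise (membership-function) parameters, which is what keeps ANFIS training fast.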
4.2.1. Training the ANFIS and model validation using the checking and testing datasets
In this study, three sets of data were used for ANFIS modeling: training, checking and testing data. ANFIS uses the training data to construct the model of the target system; the rows of the training data are used as inputs and outputs for constructing the target model. The checking data is used to test the generalization capability of the FIS at each epoch, which prevents over-fitting and verifies the identified ANFIS. The checking and testing data have the same format as the training data, but their elements are generally different from those of the training data. Each cluster obtained using HOSVD was divided into three groups: the first group of data, comprising 80% of the total dataset
Table 4
A sample of the multi-criteria ratings: Movie ID, User ID, and ratings (1-13) for Directing, Story, Visual, Acting and the Overall rating (the individual table entries are not reproduced here).
Table 5
Information of the Yahoo!Movies dataset.

Name      #Users  #Items  #Overall ratings
YM-20-20  429     491     18,504
YM-10-10  1827    1471    48,026
YM-5-5    5978    3079    82,599
The Silhouette coefficient of a point i is defined as

sci = (bi − ai) / max{ai, bi},   (26)

where ai is the average distance from point i to the other points of its own cluster and bi is the average distance from point i to the points of the nearest other cluster. Equivalently,

sci = 1 − ai/bi  if ai < bi;  sci = 0  if ai = bi;  sci = bi/ai − 1  if ai > bi.   (27)

The average Silhouette coefficient of a clustering with nk points is

SCk = (1/nk) Σ_{i=1}^{nk} sci.   (28)
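Eqs. (26)-(28) can be sketched as a brute-force computation; this minimal version assumes every cluster contains at least two points.

```python
import math

def silhouette(points, labels):
    """Average Silhouette coefficient SC_k of a labelled point set,
    using sc_i = (b_i - a_i) / max(a_i, b_i) per point (eq. 26)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # a_i: mean distance to the other points of i's own cluster.
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own)
        # b_i: mean distance to the nearest other cluster.
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for j in range(len(points)) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)  # SC_k, eq. (28)
```

Well-separated, compact clusters push the average toward 1, which is how the rank approximations in Table 7 are compared.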
Table 6
Rule of thumb for the interpretation of the Silhouette coefficient.

Range       Interpretation
>0.70       Strong structure
0.50-0.70   Reasonable structure
0.25-0.50   Weak structure
<0.25       No substantial structure
Table 7
Average Silhouette coefficient value for the clusters obtained from different rank approximations.

Rank approximation  Number of clusters  Average SC
Rank                11                  0.837
Rank                14                  0.858
Rank                13                  0.833
Rank                16                  0.790
Rank                12                  0.867
Rank                14                  0.811
Rank                15                  0.828
Rank                15                  0.798
Rank                13                  0.787
Rank                13                  0.765
The prediction error was measured by MSE, R², MAPE and RMSE:

MSE = (1/n) Σ_{O=1}^{n} (actual(O) − prediction(O))²,   (29)

R² = 1 − Σ_{O=1}^{n} (actual(O) − prediction(O))² / Σ_{O=1}^{n} (actual(O) − mean(actual))²,   (30)

MAPE = (1/n) Σ_{O=1}^{n} |(actual(O) − prediction(O)) / actual(O)|,   (31)

RMSE = sqrt( (1/n) Σ_{O=1}^{n} (actual(O) − prediction(O))² ),   (32)
where actual(O) indicates the real overall rating provided by the user, prediction(O) is the predicted overall rating value, and n corresponds to the number of user ratings used.
Usually the RMSE and MSE measures are used to test the prediction model during training; in this study, however, additional performance measures were used for a more thorough evaluation, namely the coefficient of determination R² and MAPE. The coefficient of determination R² takes a value between 0 and 1 that describes how well the proposed network has been trained; a value closer to 1 indicates more successful learning. MAPE was also used, as it accurately identifies the model deviations.
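The four error measures translate directly into code:

```python
def mse(actual, pred):
    """Eq. (29): mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Eq. (32): root of the mean squared error."""
    return mse(actual, pred) ** 0.5

def mape(actual, pred):
    """Eq. (31): mean absolute percentage error (actual values nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def r_squared(actual, pred):
    """Eq. (30): coefficient of determination."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```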
After implementing the ANFIS model using the fuzzy logic toolbox in MATLAB 7.10.0, the training and checking data were tested for error estimation. Data from the four inputs was given to the trained ANFIS model along with the actual overall ratings. From the input values, the suitable MFs (see Fig. 7(a) and (b)) were selected to predict the overall ratings using the extracted rules (see Table 8). The process of overall rating prediction by selecting the MFs can be visualized in the fuzzy rule viewer of the established ANFIS model, shown in Fig. 10; it indicates the behavior of users over changes in the values of all four inputs for the overall rating. In the fuzzy rule viewer, when the input Acting is at 11, Directing at 12, Story at 12, and Visuals at 11, an output overall rating of 12 is obtained.
Table 9 presents the errors for a sample of the training and checking datasets. As can be seen from the errors of the nineteen samples in Table 9, the ANFIS model has been trained effectively using the training data.
Table 8
Formation of the rules extracted by ANFIS.

Rule 1: IF (Acting is cluster1) AND (Directing is cluster1) AND (Story is cluster1) AND (Visuals is cluster1) THEN (OveralRating is out1cluster1) (1)
Rule 2: IF (Acting is cluster2) AND (Directing is cluster2) AND (Story is cluster2) AND (Visuals is cluster2) THEN (OveralRating is out1cluster2) (1)
Rule 3: IF (Acting is cluster3) AND (Directing is cluster3) AND (Story is cluster3) AND (Visuals is cluster3) THEN (OveralRating is out1cluster3) (1)
Rule 4: IF (Acting is cluster4) AND (Directing is cluster4) AND (Story is cluster4) AND (Visuals is cluster4) THEN (OveralRating is out1cluster4) (1)
Fig. 10. Fuzzy rule viewer for input and output variables of ANFIS model.
Table 9
Training and checking errors for prediction overall ratings by ANFIS.
Sample #   Training data (actual / predicted / error)    Checking data (actual / predicted / error)
1          12 / 12 / 0                                   11 / 11.01 / 0.01
2          10 / 10.0001 / 0.0001                         12 / 12.009 / 0.009
3          13 / 13 / 0                                   10 / 10.009 / 0.009
4          12 / 12 / 0                                   12 / 12.0001 / 0.0001
5          12 / 12 / 0                                   11 / 11.0008 / 0.0008
6          13 / 13 / 0                                   11 / 11.009323 / 0.009323
7          12 / 12 / 0                                   12 / 12.0004 / 0.0004
8          13 / 13 / 0                                   12 / 12.00383 / 0.00383
9          12 / 12 / 0                                   12 / 11.998 / 0.002
10         12 / 12 / 0                                   13 / 12.999998 / 0.000002
11         12 / 12 / 0                                   11 / 11.003 / 0.003
12         10 / 10.0013 / 0.0013                         11 / 11.0005 / 0.0005
13         13 / 13 / 0                                   12 / 12.00276 / 0.00276
14         12 / 12 / 0                                   12 / 12.0003 / 0.0003
15         11 / 10.9999 / -0.0001                        12 / 11.9917 / 0.0083
16         12 / 12 / 0                                   12 / 11.99346 / 0.00654
17         12 / 12 / 0                                   11 / 10.99299 / 0.00701
18         13 / 13 / 0                                   10 / 10.009 / 0.009
19         12 / 12 / 0                                   11 / 10.9901 / 0.0099
Fig. 11. Training and checking error for nineteen samples in the dataset.
In this study, the average error for the checking data was 0.0001904; after 200 epochs, the average RMSE, MSE, MAPE and R² were calculated as 0.02144, 0.00912, 0.18230 and 0.82460, respectively. The average error for the training data was 0.000162221; after 200 epochs, the average RMSE, MSE, MAPE and R² were calculated as 0.01272, 0.00912, 0.18230 and 0.99460, respectively. Also, after 200 epochs, the average error for the testing data was 0.000172361, and the average RMSE, MSE, MAPE and R² were calculated as 0.01951, 0.00949, 0.10230 and 0.91150, respectively. The average training and checking errors after 200 epochs are shown in Fig. 12.
Fig. 13 illustrates, through control surfaces, the interdependency of the four input parameters and the overall rating obtained from the fuzzy rules generated by ANFIS combined with subtractive clustering. The level of the overall rating can be depicted as a continuous function of its input parameters Acting, Directing, Story and Visuals. The surface plots in this figure depict the variation of the overall rating based on the identified fuzzy rules. Fig. 13(a) shows the interdependency of the overall rating on Directing and Acting. Fig. 13(b) depicts the interdependency of the overall rating on Acting and Story. Fig. 13(c) shows the interdependency of the overall rating on Visuals and Acting. Fig. 13(d) depicts the interdependency of the overall rating on Story and Directing. Fig. 13(e) depicts the interdependency of the overall
Fig. 12. The error of each observation for checking and training data.
rating on Visuals and Directing and Fig. 13(f) shows interdependency of overall rating on Story and Visuals.
These surface plots clearly show the users' perception and behavior regarding any two item features, within a cluster of users with similar preferences. In addition, the results depicted in the surface plots are valuable for revealing users' behavior on item features in multi-criteria CF. Thus, the users' preferences in any cluster of users can be modeled by ANFIS, and the recommender system can recognize which item features (criteria), and at which level, are tailored to their preferences. The curves presented in Fig. 14(a-d) also distinctly reveal the users' behavior on each item feature. As can be seen in these curves, the overall rating increases significantly with the Story criterion relative to the other criteria; it can be inferred that the Story criterion is the most important one for the users in that cluster.
5.3. Multi-criteria CF evaluation
In this section, we focus entirely on multi-criteria CF recommendation using the proposed method. As mentioned before, we used k-NN for classifying the data, and we stated that the choice of k and of the distance metric is important for the classification accuracy of k-NN. Therefore, in this study, the optimal distance metric and k were chosen using cross-validation [77], so that the classifier could accurately predict the testing data. Five-fold cross-validation was applied to choose the type of distance metric and the best value of k.
Using five-fold cross-validation for the values k = 1, 3, 5, 7 and 9 and three different distance measures (Euclidean, Correlation and City-Block), the resulting averaged classification accuracies are presented in Table 10. From Table 10, the highest averaged classification accuracy, about 98.91%, is obtained using the Euclidean distance metric with k = 5, in comparison to the City-Block (95.89%) and Correlation (96.76%) distance metrics. Also, using the Euclidean metric, the averaged classification rate is higher than with the Correlation and City-Block metrics for all values of k. Thus, based on this result, we adopted the optimal value k = 5 obtained by five-fold cross-validation together with the Euclidean distance metric.
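The model-selection step can be sketched as follows; the interleaved fold scheme and the helper names are illustrative, not the paper's exact protocol.

```python
import math
from collections import Counter

def knn_predict(train, labels, x, k, dist):
    """Majority vote among the k nearest training points."""
    idx = sorted(range(len(train)), key=lambda i: dist(train[i], x))[:k]
    return Counter(labels[i] for i in idx).most_common(1)[0][0]

def cv_accuracy(data, labels, k, dist, folds=5):
    """Cross-validated classification accuracy for a given k and metric."""
    n = len(data)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # simple interleaved folds
        tr = [data[i] for i in range(n) if i not in test_idx]
        trl = [labels[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(tr, trl, data[i], k, dist) == labels[i]:
                correct += 1
    return correct / n

euclid = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
cityblock = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
```

Running `cv_accuracy` over a grid of k values and metrics and keeping the best pair reproduces the selection procedure behind Table 10.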
We determined the precision and the recall of the Top-N list for each element in the test set and computed the arithmetic mean of these values. The recommender's prediction accuracy was measured by RMSE [78], a widely used metric for evaluating the statistical accuracy of recommendation algorithms, given by

RMSE = sqrt( (1/|X|) * sum_{(u_i, o_j) in X} |a_ij - p_ij|^2 )    (33)

where X = {(u_i, o_j) | u_i had rated o_j in the probe set}, a_ij is the actual rating and p_ij is the predicted rating. A lower value of RMSE indicates a higher accuracy of the recommendation system. Recall, precision and the F1 measure are defined as

Recall = |relevant items ∩ Top-N items| / |relevant items|    (34)

Precision = |relevant items ∩ Top-N items| / |Top-N items|    (35)

F1 = (2 * Recall * Precision) / (Recall + Precision)    (36)
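As an illustrative sketch (not the authors' implementation), Eqs. (33)–(36) can be computed as below; the ratings and item lists are made up for the example.

```python
import math

def rmse(actual, predicted):
    """Eq. (33): root mean squared error over rated (user, item) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def precision_recall_f1(recommended, relevant):
    """Eqs. (34)-(36) for one user's Top-N list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(rmse([4, 3, 5], [3.5, 3, 4]))                       # RMSE over 3 ratings
print(precision_recall_f1(["a", "b", "c"], ["b", "c", "d", "e"]))
```

In the evaluation these per-user values are averaged over all test users, as described above.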
Fig. 13. Interdependency of overall rating on (a) Directing and Acting, (b) Acting and Story, (c) Visuals and Acting, (d) Story and Directing, (e) Visuals and Directing, and (f)
Story and Visuals.
Fig. 14. Curves for revealing the relationship between overall rating and (a) Visuals, (b) Directing, (c) Story, and (d) Acting.
Table 10
Averaged classification accuracy (%) for distance metrics and values of k.

Distance metric    k=1      k=3      k=5      k=7      k=9
Euclidean          96.63    97.34    98.91    97.67    95.56
City-Block         94.72    94.87    95.89    93.89    93.73
Correlation        95.28    95.38    96.76    94.88    94.87
Table 11
Coverage and RMSE for YM-5-5, YM-10-10 and YM-20-20.

                    Size of neighborhood
Dataset             5          10         15         20         25         30
YM-5-5 (RMSE)       0.551097   0.549707   0.544308   0.538909   0.530209   0.528609
YM-10-10 (RMSE)     0.5365     0.5310     0.5289     0.5200     0.5184     0.5174
YM-20-20 (RMSE)     0.53158    0.52988    0.52039    0.51558    0.51129    0.50349
Fig. 15. RMSE and neighborhood size (for YM-5-5, YM-10-10 and YM-20-20).
MAE = (1/N) * sum_{i=1}^{N} |pred_{u,i} - act_{u,i}|    (37)
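Eq. (37) is straightforward to implement; this small Python sketch uses made-up predicted and actual ratings purely for illustration.

```python
def mae(predicted, actual):
    """Eq. (37): mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

print(mae([3.5, 4.0, 2.0], [4, 4, 3]))  # -> 0.5
```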
[Figures: MAE(%) versus neighborhood size for YM-10-10 and YM-20-20; F-measure versus Top-N and versus size of neighbors for 5, 15, 25 and 35 neighbors.]
Table 14
Recommendation accuracy using MAE for different neighborhood sizes.

Neighborhood size    MAE(%) YM-10-10    MAE(%) YM-20-20
10                   0.7325             0.7105
20                   0.7370             0.7088
30                   0.7249             0.7093
40                   0.7260             0.7015
50                   0.7184             0.6902
70                   0.7112             0.6724
90                   0.7180             0.6688
Table 12
MAE, precision at Top-5 and Top-7 of the proposed method, HOSVD and truncated SVD for YM-10-10 (neighborhood size: all users).

Algorithm                                 Precision@5    Precision@7    MAE
HOSVD                                     75.34          72.85          1.17
Truncated SVD                             74.03          72.19          1.75
HOSVD-ANFIS and subtractive clustering    81.44          80.78          0.96
Table 13
MAE, precision at Top-5 and Top-7 for the proposed method, HOSVD and truncated SVD for YM-20-20 (neighborhood size: all users).

Algorithm                                 Precision@5    Precision@7    MAE
HOSVD                                     78.57          76.43          0.95
Truncated SVD                             75.12          73.21          1.45
HOSVD-ANFIS and subtractive clustering    83.34          81.32          0.91
Also, according to the rank-12 approximation defined for the HOSVD decomposition, we evaluated the precision for different numbers of clusters. For the rank-12 approximation, the number of clusters was changed iteratively: starting from 3 clusters, the number of clusters was increased by 3 after each iteration, up to 12. Fig. 19 illustrates the precision values for Precision@5 and Precision@7 versus the number of clusters.
As can be seen in Fig. 19, the worst precision is obtained for YM-10-10 at Precision@5 with three clusters, and the best precision is
Fig. 19. Precision versus number of clusters in Precision@5 and Precision@7 for different datasets.
Table 15
Coverage and RMSE for YM-5-5, YM-10-10 and YM-20-20.

                                                      Size of neighborhood
Method / Dataset                                      5       10      15      20      25      30
Multi-criteria CF with proposed method
  YM-5-5                                              0.99    0.99    0.99    0.99    0.99    1
  YM-10-10                                            0.99    0.99    0.99    0.99    1       1
  YM-20-20                                            0.99    0.99    0.99    0.99    1       1
Multi-criteria CF with similarity-based approach
  YM-5-5                                              0.54    0.59    0.65    0.66    0.68    0.69
  YM-10-10                                            0.70    0.73    0.76    0.79    0.81    0.83
  YM-20-20                                            0.75    0.80    0.82    0.84    0.86    0.88

Fig. 20. Throughput of proposed method versus similarity-based approach (throughput in Recs./Sec versus number of clusters, from 3 to 12).
multi-criteria CF recommender system as the number of recommendations generated per second for k selected users (k = 5). From the curves in this plot, we see that when HOSVD and the cosine-based approach are used for clustering the high-dimensional data, the throughput is substantially higher than that of the multi-criteria CF based on the similarity-based approach. The reason is that, with the clustered approach using HOSVD and cosine-based similarity, the prediction algorithm uses only a fraction of the neighbors. The throughput of the multi-criteria recommender system increases rapidly as the number of clusters grows while the cluster sizes are small. Since the multi-criteria CF based on the similarity approach has to scan through all the neighbors, the number of clusters does not impact its throughput.
We also evaluated the recommendation quality using coverage measures. Coverage measures the percentage of items for which a CF system can provide a prediction, or that ever appear in a recommendation list [81]. It should be noted that a recommender system should maintain a good level of coverage so that most of the items are connected in some way to the rest of the data; otherwise they will be isolated and essentially dormant in the system.
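Coverage as defined here can be computed directly; the following Python sketch uses a hypothetical five-item catalogue (the item IDs are made up for illustration).

```python
def coverage(predictable_items, all_items):
    """Fraction of catalogue items for which the CF system can make a prediction."""
    return len(set(predictable_items) & set(all_items)) / len(set(all_items))

# Hypothetical catalogue of 5 items, 4 of which have enough neighbors to score.
print(coverage({"m1", "m2", "m3", "m4"}, {"m1", "m2", "m3", "m4", "m5"}))  # -> 0.8
```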
The curves shown in Fig. 21 present the quality of the recommendations of the proposed method and reveal that the coverage is strongly related to the neighborhood size. Table 15 presents the coverage obtained from the proposed method. To experimentally show the effectiveness of clustering using HOSVD and cosine-based similarity on coverage, we also performed the experiments with the similarity-based approach, as presented in Table 15.
Fig. 21. Neighborhood size and coverage (for YM-5-5, YM-10-10 and YM-20-20).
From Table 15, the proposed method maintains a good level of coverage in relation to the similarity-based approach for different neighborhood sizes. In addition, the results also confirm that the proposed method and the similarity-based approach have good coverage on YM-20-20.
[17] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inform. Syst. (TOIS) 22 (2004) 5–53.
[18] J.S. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1998, pp. 43–52.
[19] K. Goldberg, T. Roeder, D. Gupta, C. Perkins, Eigentaste: a constant time collaborative filtering algorithm, Inform. Retriev. 4 (2001) 133–151.
[20] L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, M.A. Rueda-Morales, Using second-hand information in collaborative recommender systems, Soft Comput. 14 (2010) 785–798.
[21] Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009) 30–37.
[22] P. Symeonidis, M.M. Ruxanda, A. Nanopoulos, Y. Manolopoulos, Ternary semantic analysis of social tags for personalized music recommendation, in: ISMIR, Citeseer, 2008, pp. 219–224.
[23] G. Adomavicius, Y. Kwon, New recommendation techniques for multicriteria rating systems, Intell. Syst., IEEE 22 (2007) 48–55.
[24] D. Jannach, Z. Karakaya, F. Gedikli, Accuracy improvements for multi-criteria recommender systems, in: Proceedings of the 13th ACM Conference on Electronic Commerce, ACM, 2012, pp. 674–689.
[25] G. Adomavicius, N. Manouselis, Y. Kwon, Multi-criteria recommender systems, in: Recommender Systems Handbook, Springer, 2011, pp. 769–803.
[26] V. Nourani, M. Komasi, A geomorphology-based ANFIS model for multi-station modeling of rainfall–runoff process, J. Hydrol. (2013).
[27] J. Marx-Gomez, C. Rautenstrauch, A. Nürnberger, R. Kruse, Neuro-fuzzy approach to forecast returns of scrapped products to recycling and remanufacturing, Knowl.-Based Syst. 15 (2002) 119–128.
[28] S. Sen, J. Vig, J. Riedl, Tagommenders: connecting users to items through tags, in: Proceedings of the 18th International Conference on World Wide Web, ACM, 2009, pp. 671–680.
[29] J. Lu, Q. Shambour, Y. Xu, Q. Lin, G. Zhang, A web-based personalized business partner recommendation system using fuzzy semantic techniques, Comput. Intell. 29 (2013) 37–69.
[30] Z. Zhang, H. Lin, K. Liu, D. Wu, G. Zhang, J. Lu, A hybrid fuzzy-based personalized recommender system for telecom products/services, Inform. Sci. 235 (2013) 117–129.
[31] Q. Shambour, J. Lu, A trust-semantic fusion-based recommendation approach for e-business applications, Dec. Supp. Syst. 54 (2012) 768–780.
[32] Q. Shambour, J. Lu, A hybrid trust-enhanced collaborative filtering recommendation approach for personalized government-to-business e-services, Int. J. Intell. Syst. 26 (2011) 814–843.
[33] J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, Z. Chen, CubeSVD: a novel approach to personalized Web search, in: Proceedings of the 14th International Conference on World Wide Web, ACM, 2005, pp. 382–390.
[34] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, Tag recommendations based on tensor dimensionality reduction, in: Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, 2008, pp. 43–50.
[35] Y. Xu, L. Zhang, W. Liu, Cubic analysis of social bookmarking for personalized recommendation, in: Frontiers of WWW Research and Development – APWeb 2006, Springer, 2006, pp. 733–738.
[36] M. Leginus, V. Zemaitis, Speeding Up Tensor Based Recommenders with Clustered Tag Space and Improving Quality of Recommendations with Non-Negative Tensor Factorization, Master's thesis, Aalborg University, 2011.
[37] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis, IEEE Trans. Knowl. Data Eng. 22 (2010) 179–192.
[38] Q. Li, C. Wang, G. Geng, Improving personalized services in mobile commerce by a novel multicriteria rating approach, in: WWW, 2008, pp. 1235–1236.
[39] M. Lesaffre, M. Leman, Using fuzzy logic to handle the user's semantic descriptions in a music retrieval system, in: Theoretical Advances and Applications of Fuzzy Logic and Soft Computing, Springer, 2007, pp. 89–98.
[40] Y. Cao, Y. Li, An intelligent fuzzy-based recommendation system for consumer electronic products, Expert Syst. Appl. 33 (2007) 230–240.
[41] G. Castellano, A. Fanelli, M. Torsello, A neuro-fuzzy collaborative filtering approach for web recommendation, Int. J. Comput. Sci. 1 (2007) 27–29.
[42] L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, A collaborative recommender system based on probabilistic inference from fuzzy observations, Fuzzy Sets Syst. 159 (2008) 1554–1576.
[43] R.R. Yager, Fuzzy logic methods in recommender systems, Fuzzy Sets Syst. 136 (2003) 133–149.
[44] J. Carbo, J.M. Molina, Agent-based collaborative filtering based on fuzzy recommendations, Int. J. Web Eng. Technol. 1 (2004) 414–426.
[45] M.A. Pinto, R. Tanscheit, M. Vellasco, Hybrid recommendation system based on collaborative filtering and fuzzy numbers, in: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2012, IEEE, 2012, pp. 1–6.
[46] T.Y. Tang, G. McCalla, The pedagogical value of papers: a collaborative-filtering based paper recommender, J. Digital Inform. 10 (2009).
[47] N. Manouselis, C. Costopoulou, Experimental analysis of design choices in multiattribute utility collaborative filtering, Int. J. Pattern Recog. Artif. Intell. 21 (2007) 311–331.
[48] N. Sahoo, R. Krishnan, G. Duncan, J.P. Callan, Collaborative filtering with multi-component rating for recommender systems, in: Proceedings of the Sixteenth Workshop on Information Technologies and Systems, 2006.
[66] M. Sugeno, Industrial Applications of Fuzzy Control, Elsevier Science Inc., 1985.
[67] M. Nilashi, K. Bagherifard, O. Ibrahim, N. Janahmadi, M. Barisami, An application expert system for evaluating effective factors on trust in B2C websites, Engineering 3 (2011) 1063–1071.
[68] M. Nilashi, M. Fathian, M.R. Gholamian, O. bin Ibrahim, A. Talebi, N. Ithnin, A comparative study of adaptive neuro fuzzy inference system (ANFIS) and fuzzy inference system (FIS) approach for trust in B2C electronic commerce websites, JCIT 6 (2011) 25–43.
[69] M. Nilashi, M. Fathian, M.R. Gholamian, O.B. Ibrahim, Propose a model for customer purchase decision in B2C websites using adaptive neuro-fuzzy inference system, Int. J. Bus. Res. Manage. (IJBRM) 1 (2011) 1–18.
[70] Q. Liang, J.M. Mendel, An introduction to type-2 TSK fuzzy logic systems, in: IEEE International Fuzzy Systems Conference Proceedings, FUZZ-IEEE'99, IEEE, 1999, pp. 1534–1539.
[71] S.L. Chiu, Fuzzy model identification based on cluster estimation, J. Intell. Fuzzy Syst. 2 (1994) 267–278.
[72] A. Bouchachia, W. Pedrycz, Enhancement of fuzzy clustering by mechanisms of partial supervision, Fuzzy Sets Syst. 157 (2006) 1733–1759.
[73] M.T. Hayajneh, A.M. Hassan, F. Al-Wedyan, Monitoring defects of ceramic tiles using fuzzy subtractive clustering-based system identification method, Soft Comput. 14 (2010) 615–626.
[74] A. Bilge, H. Polat, A comparison of clustering-based privacy-preserving collaborative filtering schemes, Appl. Soft Comput. 13 (2013) 2478–2489.
[75] A. Bilge, H. Polat, A scalable privacy-preserving recommendation scheme via bisecting k-means clustering, Inform. Process. Manage. 49 (2013) 912–927.
[76] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 2009.
[77] G. Hamerly, G. Speegle, Efficient model selection for large-scale nearest-neighbor data mining, in: Data Security and Security Data, Springer, 2012, pp. 37–54.
[78] A. Gunawardana, C. Meek, A unified approach to building hybrid recommender systems, in: Proceedings of the Third ACM Conference on Recommender Systems, ACM, 2009, pp. 117–124.
[79] D. Billsus, M.J. Pazzani, User modeling for adaptive news access, User Model. User-Adap. Interact. 10 (2000) 147–180.
[80] K. Bagherifard, M. Nilashi, O. Ibrahim, N. Ithnin, L.A. Nojeem, Measuring semantic similarity in grids using ontology, Int. J. Innov. Appl. Stud. 2 (2013) 230–237.
[81] J.L. Herlocker, J.A. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, in: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 1999, pp. 230–237.