
Knowledge-Based Systems 60 (2014) 82-101


Multi-criteria collaborative filtering with high accuracy using higher order singular value decomposition and Neuro-Fuzzy system
Mehrbakhsh Nilashi, Othman bin Ibrahim, Norafida Ithnin
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

Article info

Article history:
Received 8 April 2013
Received in revised form 3 January 2014
Accepted 6 January 2014
Available online 10 January 2014
Keywords:
Neuro-Fuzzy inference system
Higher order singular value decomposition
Subtractive clustering
Sparsity
Scalability
Multi-criteria collaborative filtering

Abstract
Collaborative Filtering (CF) is the most widely used prediction technique in recommender systems. It makes recommendations based on ratings that users have assigned to items. Most current CF recommender systems maintain only single user ratings inside the user-item ratings matrix. Multi-criteria based CF presents an opportunity to provide more accurate recommendations by considering user preferences on multiple aspects of items. However, in multi-criteria CF the users' rating behavior on item features is frequently subjective, imprecise and vague. This induces uncertainty in the reasoning about and representation of item features that cannot be handled exactly by crisp machine learning techniques. In contrast, approaches such as fuzzy methods can better address this uncertainty; in addition, fuzzy methods can predict users' preferences more accurately and better alleviate the sparsity problem in the overall ratings by considering the users' perception of item features. Apart from this, in multi-criteria CF users provide ratings on different aspects (criteria) of an item in new dimensions, thereby increasing the scalability problem. Appropriate dimensionality reduction techniques are thus needed to capture the high dimensions all together, without reducing them into lower dimensions, in order to reveal the latent associations among the components. This study presents a new model for multi-criteria CF using the Adaptive Neuro-Fuzzy Inference System (ANFIS) combined with subtractive clustering and Higher Order Singular Value Decomposition (HOSVD). HOSVD is used for dimensionality reduction to address the scalability problem, and ANFIS is used for extracting fuzzy rules from the experimental dataset, alleviating the sparsity problem in the overall ratings, and representing and reasoning about users' rating behavior on item features. Experimental results on a real-world dataset show that the combination of the two techniques remarkably improves the predictive accuracy and recommendation quality of multi-criteria CF.
© 2014 Elsevier B.V. All rights reserved.

1. Introduction
During the last decade the amount of information available online has increased exponentially, and the information overload problem has become one of the major challenges faced by information retrieval and information filtering systems. Recommender systems are one solution to the information overload problem. Recommender systems became an active research domain in the mid-1990s, when researchers shifted their focus to recommendation problems that explicitly rely on the user rating structure, and the topic emerged as an independent research area [1].
Recommender systems based on Collaborative Filtering (CF) are particularly popular and widely used online [2-4]. CF algorithms can be divided into two categories: memory-based algorithms and model-based algorithms [3,5,6]. Memory-based (or heuristic-based) methods, such as correlation analysis and vector similarity,

search the user database for user profiles that are similar to the profile of the active user for whom the recommendation is made [7]. Heuristic-based approaches are classified into user-based and item-based approaches [6,8]. User-based CF has been the most popular and commonly used (memory-based) CF strategy [9]; it is based on the premise that similar users will like similar items. Item-based CF was first proposed by [10] as an alternative style of CF that avoids the scalability bottleneck associated with the traditional user-based algorithm; the bottleneck arises from the search for neighbors in a continuously growing population of users. In item-based CF, similarities are calculated between items rather than between users, the intuition being that a user will be interested in items which are similar to items he has liked in the past. Two of the most popular approaches to computing similarities between users and items are the Pearson correlation coefficient and cosine-based coefficients.
One of the main problems in recommender systems, specifically in CF, is known as the sparsity problem [11-14]. Memory-based CF approaches also suffer from the scalability problem; therefore, scaling these systems up to real datasets is one of the main challenges, and many studies have been proposed to overcome it [15-18].
Compared with memory-based algorithms, model-based algorithms usually scale better in terms of their resource requirements (memory and computing time) and do not require keeping the actual user profiles for making predictions [5,10,19]. Model-based CF adopts an eager learning strategy in which a model of the data, i.e. the users, the items and their ratings for those items, is pre-computed [6,8,20]. Several studies have suggested that model-based CF can also produce better predictive accuracy than memory-based collaborative filtering by using more sophisticated techniques such as matrix factorisation and dimensionality reduction, for example [21,22].
1.1. Multi-criteria collaborative filtering
The ratings provided by users for items are the key input to CF recommender systems. They carry information about the quality of the item along with the preference of the user who shared the rating. The large majority of recommender systems are developed for single-valued ratings. According to Adomavicius and Kwon [23], pure CF-based recommender systems rely solely on product ratings provided by a large user community to generate personalized recommendation lists for each individual online user. In traditional CF systems the assumption is that customers provide an overall rating for the items which they have purchased, for example using a 5-star rating system. However, given the value of customer feedback to the business, customers in some domains are nowadays given the opportunity to provide more fine-grained feedback and to rate products and services along various dimensions [24,25].
Adomavicius and Kwon [23] introduced schemes for incorporating multi-criteria rating information into the recommendation process. For example, they considered for each item multiple criteria ratings and an overall rating that indicates how much the item is liked by the user, based on the user's perception of the item's features. They stated that single-rating CF recommenders are systems that attempt to estimate a rating function of the form $R: \text{Users} \times \text{Items} \to R_0$ for predicting a rating for any given user-item pair, where $R_0$ is a totally ordered set, typically composed of real-valued numbers inside a certain range. They further discussed that in multi-criteria recommender systems, in comparison, the rating function takes the form $R: \text{Users} \times \text{Items} \to R_0 \times R_1 \times \cdots \times R_k$. Therefore, in the multi-criteria CF problem there are m users, n items and k criteria in addition to an overall rating. Users can thus provide a number of explicit ratings for items, and a general rating R0 must be predicted in addition to the k additional criteria ratings (R1, ..., Rk). The system can be configured to push new items to users in two ways: either by producing a Top-N list of recommendations for a given target user, or by predicting the target user's likely utility (or rating) for a particular unseen item. We will refer to these as the recommendation task and the rating prediction task in multi-criteria CF, respectively.
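As an illustration only (this data structure is not prescribed by the paper), a single multi-criteria rating record for the movie domain used later could be represented as follows; the class and field names are assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MultiCriteriaRating:
    """One user-item rating event with k = 4 criteria plus an overall rating.
    None marks a missing (sparse) value that the system must predict."""
    user_id: int
    item_id: int
    acting: Optional[int]     # R1
    story: Optional[int]      # R2
    visuals: Optional[int]    # R3
    directing: Optional[int]  # R4
    overall: Optional[int]    # R0

# Rating prediction task: estimate overall for a given (user, item) pair.
# Recommendation task: rank unseen items for a user and return a Top-N list.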
1.2. The problems and our contributions
In the context of personalization applications, traditional single-rating CF has been highly successful; however, the research area of CF with multi-criteria ratings for items has rarely been touched, and the issue is largely unexplored. According to Adomavicius and Kwon [23], the problem of multi-criteria recommendation with a single overall rating is still considered an optimization problem. Moreover, providing ratings for items on multiple aspects in CF recommender systems presents new challenges, such as the sparsity problem in the criteria and overall ratings, the scalability
problem with increasing new dimensions, and the representation of and reasoning about users' behavior and preferences regarding item features.
With regard to the scalability problem, it is obvious that a multi-criteria recommender system deals with high-dimensional data; therefore, applying a dimensionality reduction technique to more than 2 dimensions is an important issue for multi-criteria CF that has rarely been considered in prior research on such systems. Instead, the 3-dimensional space is usually split into pair relations, such as user-item, user-criteria and criteria-item, in order to apply existing dimensionality reduction techniques and reveal the latent associations between the components; as a result, part of the total interaction between any pair in the three or higher dimensions is lost.
CF recommender systems also suffer from sparsity, or missing values, in the user-item ratings matrix, and this influences the predictive accuracy. Multi-criteria CF recommender systems suffer from this problem on two sides, missing values in the overall ratings and in the criteria ratings, and the system has to predict these missing ratings with new approaches [24]. In addition, to alleviate sparsity, traditional crisp methods such as clustering and regression do not consider the exact user preferences in multi-criteria CF. With regard to multi-criteria CF, the experimental results obtained in this study demonstrate that fuzzy methods can better alleviate the sparsity problem in the overall ratings, and this in turn leads to an improvement in predictive accuracy.
For representing and reasoning about users' behavior, the user's perception of and behavior towards item features are imprecise, subjective and vague, and all of these have to be taken into account. In multi-criteria CF it is important to deal with the non-stochastic uncertainty induced by vagueness and imprecision when representing and reasoning about item features. Regarding user behavior and perception, for example interest, the uncertainty concerns how the user's interest can be precisely represented and measured. Accordingly, in multi-criteria CF the system has to take this issue into account and predict the overall preference for an item according to the user's behavior on the different dimensions of the item. Systems developed solely with clustering methods, dimensionality reduction techniques, or regression approaches usually fail to predict the exact user preferences on item features. Therefore, sophisticated methods are needed that can accurately predict the user's overall preference on the item features and provide recommendations that are more tailored to the user's taste.
We describe the contributions of our study as follows:
There have been no prior applications of Higher Order Singular Value Decomposition (HOSVD) and the Neuro-Fuzzy approach to the multi-criteria CF problem; with respect to this, a new model based on HOSVD and ANFIS is developed. For the first time in a multi-criteria CF problem, ANFIS is used for solving the sparsity in the overall ratings and the uncertainty problem, and HOSVD is applied to the scalability problem.
In order to overcome the above-mentioned problems, the proposed method combines two well-known techniques from the fields of artificial intelligence and dimensionality reduction. We apply HOSVD for dimensionality reduction on the high-dimensional dataset of user ratings. In addition, the result of the decomposition through HOSVD is used for clustering based on cosine-based similarity. The aim of clustering by this method is to provide a model of similar users for extracting fuzzy rules with high accuracy using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and improving the model efficiency [26]. Indeed, HOSVD is used to capture the high dimensions all together, without reducing them into lower dimensions, where the traditional approaches have failed; therefore, it substantially improves the efficiency and scalability of multi-criteria CF. Due to the nature of the experimental dataset, we perform HOSVD on a third-order tensor. However, it can also be
applied to tensors with more than 3 dimensions. This is one of the main advantages of HOSVD, which makes it a flexible and effective approach for multi-criteria CF where other traditional machine learning techniques have failed. It should be noted that with HOSVD the computation time of the decomposition procedure grows as the tensor order increases; however, the decomposition can be performed in the offline phase, with incremental learning used for the data approximation procedure in the online phase.
In the proposed model, ANFIS aims to extract knowledge (rules) from the users' ratings on multiple aspects, to be used in the overall rating prediction task. The extracted rules are employed to predict unknown ratings, alleviating the sparsity problem in the overall ratings and also revealing the real level of user preference on item features. ANFIS provides a flexible structure for the defined problem that is suitable for generating the stipulated input-output pairs using a set of induced fuzzy IF-THEN rules with appropriate and varied MFs [27]. The produced Fuzzy Inference System (FIS), with proper training, serves to predict the user's overall preference based on item features. The elements of this model are a fuzzy set, a neural network and data clustering. In addition, the non-stochastic uncertainty emerging from vagueness and imprecision is handled using ANFIS. The MFs produced by ANFIS are used for representing and reasoning about the users' rating behavior according to their perception of item features; the MFs formed by ANFIS are continuous and more accurate in representing the features of items and the user feedback. Furthermore, to prevent the problem of over-fitting discussed in previous research [24,28], subtractive clustering is applied to minimize over-fitting by fine-tuning the ANFIS models, and a checking set is used to address this problem in the training data.
In the context of product recommendation, in practical applications and situations customers are interested in rating items or expressing their preferences in linguistic terms, such as {low interest}, {high interest} or {no interest}, for the item features. This suggests designing multi-criteria CF to be user-friendly and convenient for users when giving ratings to items. Therefore, for multi-criteria CF, fuzzy logic and fuzzy sets are more appropriate for human linguistic reasoning with imprecise concepts than crisp approaches. In addition, linguistic terms are more suitable than numerical values for assessing qualitative information, which is usually related to human perceptions, opinions and tastes. Hence, in multi-criteria CF it is more appropriate to let users express their preferences, knowledge and personal judgments in linguistic terms [29]. From this perspective, we can define a user's degree of preference for a particular item feature over a set of linguistic terms such as {low interest}, {high interest} or {no interest}. Furthermore, the fuzzy approach provides a way to quantify the non-stochastic uncertainty induced by imprecision, vagueness and subjectivity; modeling with the fuzzy approach is more reliable than traditional statistical methods, such as the Bayesian method, which handle uncertainty due to randomness. Moreover, the fuzzy rules discovered from the users' ratings through ANFIS can be maintained in a rule database to be used in subsequent predictions for item recommendation. These properties promise to provide the framework for addressing the representation and inference challenges in multi-criteria CF research.
In this study, we consider the proposed method for movie-domain recommender systems. However, the method can also be adopted for e-business and e-government recommender systems, such as those developed by Zhang et al. [30] and Shambour and Lu [31,32] for e-business and e-government applications, respectively.
Finally, we perform an in-depth experimental evaluation, in which users' ratings of items on multiple aspects were obtained from the Yahoo!Movies network, and several comparisons are conducted between our method and other algorithms.
Thus, in comparison with research efforts found in the literature, our work has the following differences. In this research:
• A new hybrid recommendation model using HOSVD and Neuro-Fuzzy techniques is proposed for increasing the predictive accuracy and improving the scalability of multi-criteria CF.
• The sparsity issue in the overall ratings is solved using the Neuro-Fuzzy technique.
• HOSVD is used for scalability improvement.
The remainder of this paper is organized as follows. In Section 2, the research background and related work are described. The HOSVD dimensionality reduction technique, the k-Nearest Neighbor (k-NN) classifier, ANFIS and subtractive clustering are introduced in separate subsections of Section 3. Section 4 provides an overview of the research methodology. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.

2. Research background and related work


In the area of personalized web search, Sun et al. [33] proposed Cube singular value decomposition (CubeSVD) to improve Web search. Based on their CubeSVD analysis, which also used the HOSVD technique, web search activities were carried out more efficiently; they evaluated the method on MSN search engine data. In the field of recommender systems, several recommendation models have been proposed which use three-dimensional tensors for recommending music, objects and tags. Recommender models using HOSVD for dimensionality reduction have been proposed for recommending personalized music [22] and tags [34]. Xu et al. [35] used HOSVD to provide item recommendations; their work was compared with a standard CF algorithm, without focusing on tag recommendations. Leginus et al. [36] utilized clustering techniques for reducing the tag space, which improved the quality of the recommendations as well as the execution time of the factorization and decreased the memory demands; their proposed method was adaptable to HOSVD, and they also introduced a heuristic method to speed up the parameter tuning process for HOSVD recommenders. Symeonidis et al. [37] introduced a recommender based on HOSVD where each tagging activity for a given item from a particular user is represented by the value 1 in the initial tensor, and all other cases are represented by 0. Li et al. [38] presented a multi-criteria rating approach to improve personalized services in mobile commerce using Multi-linear Singular Value Decomposition (MSVD); the aim of their paper was to exploit context information about the user as well as multi-criteria ratings in the recommendation process.
The fuzzy logic field has grown considerably in a number of applications across a wide variety of domains, such as semantic music recommendation systems [39] and product recommendation [40]. Castellano et al. [41] developed a Neuro-Fuzzy strategy combined with soft computing approaches for recommending Uniform Resource Locators (URLs) to active users; they used fuzzy clustering to create user profiles based on similar browsing behavior. de Campos et al. [42] proposed a model combining Bayesian networks, for governing the relationships between the users, with fuzzy set theory, for representing the vagueness in the description of users' ratings. A conceptual framework based on fuzzy logic was proposed by Yager [43] to represent and justify recommendation rules; in this framework, an internal description of the items was used that relied solely on the preferences of the active user. Carbo and Molina [44] developed a CF-based algorithm in which ratings and recommendations were
considered as linguistic labels using fuzzy sets. Pinto et al. [45] proposed a model that combined fuzzy numbers, product positioning (from marketing theory) and item-based CF. Zhang et al. [30] developed a hybrid recommendation approach combining user-based and item-based CF techniques with fuzzy set techniques and applied it to mobile product and service recommendation; they tested the prediction accuracy of their hybrid recommendation approach using the MovieLens 100K dataset.
In the case of multi-criteria CF, little research has been conducted on extending the similarity calculation of the traditional memory-based CF approach to multi-criteria ratings [23,46,47], where the similarities between users are estimated by aggregating traditional similarities over the individual criteria or by applying multidimensional distance metrics. The aggregation function approach was described by Adomavicius and Kwon [23] as treating the overall rating r0 as an aggregate of the multi-criteria ratings; under this assumption, the method finds an aggregation function f representing the connection between the overall and the multi-criteria ratings as:

$r_0 = f(r_1, \ldots, r_k)$
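One simple, commonly used instance of such an aggregation function is a linear combination of the criteria ratings whose weights are estimated from the data, for example by regression; the form below is purely illustrative and not a formula from the paper:

$r_0 \approx w_1 r_1 + w_2 r_2 + \cdots + w_k r_k + b$

where the weights $w_i$ and the intercept $b$ would be fitted from the observed overall and criteria ratings.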

Developing the idea of Adomavicius and Kwon [23] further, Sahoo et al. [48,49] extended the flexible mixture model (FMM) developed by Si and Jin [50] to multi-criteria recommender systems. The assumption of FMM is that two latent variables Zu and Zi (for customers and products) generate the single rating of user u on item i. They discovered the dependency structure between the overall rating (r0) and the multi-criteria ratings. Liu et al. [51] presented a multi-criteria recommendation approach based on the clustering of users; their idea was that for each user one of the criteria is dominant, and users are grouped according to their criteria preferences. They applied linear least-squares regression, assigned each user to one cluster, and evaluated different schemes for the generation of predictions. They applied the methods on a hotel-domain dataset with five criteria: Value, Location, Rooms, Service and Cleanliness. Zhang et al. [52] proposed two types of multi-criteria probabilistic latent semantic analysis algorithms extended from the single-rating version. First, a mixture of multivariate Gaussian distributions was assumed to be the underlying distribution of the multi-criteria ratings of each user. Second, they further assumed a mixture of linear Gaussian regression models as the underlying distribution of the multi-criteria ratings of each user, inspired by the Bayesian network and linear regression.
Shambour and Lu [53] implemented a hybrid Multi-Criteria Semantic-enhanced CF (MC-SeCF) approach to alleviate limitations of item-based CF techniques such as sparsity and cold-start. Their experimental results on the MovieLens dataset demonstrated the effectiveness of the proposed approach in alleviating the sparsity and cold-start item problems; they achieved higher accuracy and more coverage on very sparse and new-item datasets than the benchmark item-based CF recommendation algorithms. In the method proposed here for building a model using HOSVD and ANFIS, explicit ratings are needed. However, based on Nielsen's 90-9-1 principle [54], more people will lurk in a virtual community than will participate. Hence, considering Nielsen's 90-9-1 principle, appropriate strategies need to be incorporated in multi-criteria CF, such as the method developed by Shambour and Lu [53], which uses semantic information about items. Generally, we view the MC-SeCF approach as complementary to our method; an opportunity for future work is therefore to combine the predictions of the MC-SeCF approach with our method in a hybrid approach. With respect to the improvements achieved by Shambour and Lu [53], major problems such as sparsity and cold-start can be remarkably alleviated. This suggests that the methods proposed by Shambour and Lu [53] and Kernel-SVD [55,56], combined with HOSVD, could be incorporated into multi-criteria CF to address the sparseness problem.

Jannach et al. [24] further improved the accuracy of multi-criteria CF by proposing a method using Support Vector Regression (SVR) to automatically detect the existing relationships between the detailed item ratings and the overall ratings. In their approach, SVR models were learned per item and per user, and the individual predictions were finally combined in a weighted approach. Similar to our research, they evaluated their methods using the Yahoo!Movies dataset.
3. Materials and methods
3.1. Higher Order Singular Value Decomposition (HOSVD)
To represent and recognize high-dimensional data effectively, dimensionality reduction is conducted on the original dataset to obtain a low-dimensional representation [57]. Visualizing, comparing, and decreasing the processing time of the data are the main advantages of dimensionality reduction techniques. HOSVD is one of the powerful dimensionality reduction techniques for tensor decomposition; it was proposed by Lathauwer et al. [58] as a generalization of the SVD to the decomposition of tensors. Computing the HOSVD requires the following steps:
• Step 1: Unfolding of the mode-d tensor $T \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, which yields the matrices $A^{(1)}, \ldots, A^{(d)}$. They are defined by placing the tensor element with indices $(i_1, \ldots, i_d)$ at row $i_n$ and column

$j = i_{n+1} I_{n+2} I_{n+3} \cdots I_d I_1 I_2 \cdots I_{n-1} + i_{n+2} I_{n+3} I_{n+4} \cdots I_d I_1 I_2 \cdots I_{n-1} + \cdots + i_d I_1 I_2 \cdots I_{n-1} + i_1 I_2 I_3 \cdots I_{n-1} + \cdots + i_{n-1}, \quad i_n \in \{0, 1, \ldots, I_n - 1\}$   (2)

of $A^{(n)}$. The matrix unfolding of a tensor can be defined as a matrix representation of that tensor in which all the column (row, etc.) vectors are stacked one after the other [58]. In the case of 3rd-order tensors $T \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, there exist three matrix unfoldings (see Fig. 1):
- mode 1: $j = i_2 + (i_3 - 1) I_3$,
- mode 2: $j = i_3 + (i_1 - 1) I_1$,
- mode 3: $j = i_1 + (i_2 - 1) I_2$.
• Step 2: Identifying the d left singular matrices $U^{(1)}, \ldots, U^{(d)}$, obtained from the SVDs of the unfoldings:

$A^{(n)} = U^{(n)} \Sigma^{(n)} V^{(n)T}, \quad n = 1, \ldots, d$   (3)

In Eq. (3), $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$, and $\Sigma^{(n)} \in \mathbb{R}^{I_n \times (I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_d)}$ is a diagonal matrix containing the singular values in descending order. The matrices $V^{(n)}$ are the right singular matrices, with $V^{(n)T} V^{(n)} = I$ and $U^{(n)T} U^{(n)} = I$; these singular matrices are orthonormal.
• Step 3: Finding the core tensor $S \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ by contracting the original tensor T with the left singular matrices $U^{(n)}$:

$S = T \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_d U^{(d)T}$   (4)

The sub-tensors $S_{i_n = a}$ of $S$, obtained by fixing the nth index to a, satisfy the ordering property

$\|S_{i_n = 1}\|_F = \sigma^{(n)}_1 \ge \|S_{i_n = 2}\|_F = \sigma^{(n)}_2 \ge \cdots \ge \|S_{i_n = I_n}\|_F = \sigma^{(n)}_{I_n} \ge 0$   (5)

In Eq. (5), for all possible values of n, $\sigma^{(n)}_i = \|S_{i_n = i}\|_F$ (a Frobenius norm) denotes the ith n-mode singular value of the tensor T. Fig. 2 shows pseudo code for the HOSVD algorithm.
Fig. 2. Procedure for decomposing tensors via HOSVD [59].

The computational cost of the main steps of HOSVD is shown in Table 1.
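To make the three steps concrete, the following is a minimal NumPy sketch of HOSVD. It is an illustration only (the paper's experiments were implemented in MATLAB), and the unfolding uses NumPy's reshape ordering rather than the exact column ordering of Eq. (2).

import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """Return the factor matrices U^(1..d) and the core tensor S."""
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0]
         for n in range(T.ndim)]          # Step 2: left singular vectors per mode
    S = T
    for n, Un in enumerate(U):            # Step 3: S = T x_1 U1^T x_2 U2^T ...
        S = np.moveaxis(np.tensordot(Un.T, S, axes=([1], [n])), 0, n)
    return U, S

# Tiny example: a 4 x 6 x 4 user-item-criteria tensor of random 13-level ratings.
T = np.random.randint(1, 14, size=(4, 6, 4)).astype(float)
U, S = hosvd(T)
print([u.shape for u in U], S.shape)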

Fig. 1. Unfolding of a 3rd-order tensor into the matrices $A^{(1)} \in \mathbb{R}^{I_1 \times I_2 I_3}$, $A^{(2)} \in \mathbb{R}^{I_2 \times I_1 I_3}$ and $A^{(3)} \in \mathbb{R}^{I_3 \times I_1 I_2}$.

3.1.1. Truncated HOSVD

The truncated HOSVD is defined as a multi-rank approximation and is taken as the first approximation of an iterative algorithm: the matrices and the core tensor are updated iteratively, starting from Eq. (4), and the algorithm stops when it ceases to improve the approximation or when it reaches a maximum number of iterations [59]. This iterative method belongs to the family of alternating least-squares methods and is called higher-order orthogonal iteration [58].
According to Lathauwer et al. [58], the decomposition determined by HOSVD satisfies the following norm identity:

$\|T\|_F^2 = \sum_{i=1}^{R_1} (\sigma^{(1)}_i)^2 = \cdots = \sum_{i=1}^{R_d} (\sigma^{(d)}_i)^2 = \|S\|_F^2$   (6)

where $R_n$ denotes the n-rank of S. Let $R_n$ ($1 \le n \le d$) be the n-mode rank of the tensor T. A tensor $\tilde{T}$ can be defined by keeping the largest $I'_n$ n-mode singular values and ignoring the remaining ones. Because of this rank truncation, the error is bounded by [58]:

$\|T - \tilde{T}\|_F^2 \le \sum_{n=1}^{d} \sum_{i_n = I'_n + 1}^{R_n} (\sigma^{(n)}_{i_n})^2$   (7)

In practice, using a procedure analogous to that in Fig. 2, the rank-$(R_1, R_2, \ldots, R_d)$ truncated core tensor $\tilde{S}$ is obtained by using only the $R_n$ leading left singular vectors, rather than all of them, to build the transformation matrices $\tilde{U}^{(n)}$.
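A corresponding self-contained NumPy sketch of the truncated HOSVD is given below; the ranks (2, 2, 2) are illustrative assumptions, and the unfolding again uses NumPy's ordering rather than that of Eq. (2).

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_hosvd(T, ranks):
    """Keep the ranks[n] leading left singular vectors per mode (cf. Section 3.1.1)."""
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    S = T
    for n, Un in enumerate(U):                       # truncated core tensor
        S = np.moveaxis(np.tensordot(Un.T, S, axes=([1], [n])), 0, n)
    T_hat = S
    for n, Un in enumerate(U):                       # low-rank approximation
        T_hat = np.moveaxis(np.tensordot(Un, T_hat, axes=([1], [n])), 0, n)
    return U, S, T_hat

T = np.random.randint(1, 14, size=(4, 6, 4)).astype(float)
U, S, T_hat = truncated_hosvd(T, ranks=(2, 2, 2))
print(np.linalg.norm(T - T_hat))   # approximation error, cf. the bound in Eq. (7)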

Table 1
Computational cost for the main steps in HOSVD.

Step                                                                        Cost
Unfolding the tensor T                                                      $O(I_1 I_2 \cdots I_N)$
Constructing $A^{(n)} A^{(n)T}$                                             $O(I_n^2\, I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$
Determining the eigenvectors of $A^{(n)} A^{(n)T}$ to obtain $U^{(n)}$      $O(I_n^3)$
Contracting the tensor T with the matrices $U^{(n)}$ to get the tensor S    $O(I_n^2\, I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$

3.2. k-Nearest Neighbor (k-NN) classifier

The k-Nearest Neighbor (k-NN) classifier is a well-known and powerful instance-based machine learning technique for classifying data [60]. By learning from all stored training instances, k-NN can simply be applied to obtain results from the training instances. The k-NN algorithm consists of two phases: a training phase and a classification phase. In the training phase, the training examples are vectors (each with a class label) in a multidimensional feature space; in this phase, the feature vectors and class labels of the training samples are stored. In the classification phase, k is a user-defined constant (see Fig. 3), and a query or test point (an unlabelled vector) is classified by assigning it the label that is most frequent among the k training samples nearest to that query point. In other words, the k-NN method compares the query point, or input feature vector, with a library of reference vectors, and the query point is labeled with the class of the nearest library feature vector. This way of categorizing query points based on their distance to points in a training dataset is a simple, yet effective, way of classifying new points. One of the main advantages of the k-NN method in classifying objects is that it requires only a few parameters to be tuned, k and the distance metric, to achieve sufficiently high classification accuracy. Thus, in k-NN based implementations, the best choice of k and of the distance metric for computing the nearest distance is an important task. In the k-NN classifier, the distance function is usually the Euclidean distance when the input vectors are real numbers and the outputs are discrete classes. In this study, we use the Euclidean, City-Block and correlation distance metrics for the distance calculation in k-NN.
Assume $x_{s1}, x_{s2}, \ldots, x_{sn}$ denotes the first row vector $x_s$ and $y_{t1}, y_{t2}, \ldots, y_{tn}$ denotes the second row vector $y_t$; the various distance metrics for measuring the distance between $x_s$ and $y_t$ are defined as follows:

$d_{st} = \sqrt{\sum_{j=1}^{n} (x_{sj} - y_{tj})^2}$   (8)

$d_{st} = \sum_{j=1}^{n} |x_{sj} - y_{tj}|$   (9)

$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}, \quad \bar{x}_s = \frac{1}{n}\sum_j x_{sj}, \; \bar{y}_t = \frac{1}{n}\sum_j y_{tj}$   (10)

where Eqs. (8)-(10) are the Euclidean, City-Block and correlation distance metrics, respectively.

Fig. 3. k-NN for k = 8 and k = 5.
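As an illustration (not the paper's code), the following sketch implements the three distance metrics of Eqs. (8)-(10) together with a brute-force k-NN vote; the data and the choice k = 5 are arbitrary.

import numpy as np

def euclidean(x, y):            # Eq. (8)
    return np.sqrt(np.sum((x - y) ** 2))

def city_block(x, y):           # Eq. (9)
    return np.sum(np.abs(x - y))

def correlation(x, y):          # Eq. (10): 1 - Pearson correlation
    xc, yc = x - x.mean(), y - y.mean()
    return 1.0 - np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc) + 1e-12)

def knn_predict(query, X_train, labels, k=5, dist=euclidean):
    """Label the query with the most frequent label among its k nearest rows."""
    d = np.array([dist(query, x) for x in X_train])
    nearest = labels[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

X = np.random.randint(1, 14, size=(20, 4)).astype(float)  # 20 rating vectors, 4 criteria
y = np.random.randint(0, 3, size=20)                      # cluster labels 0..2
print(knn_predict(np.array([5.0, 7.0, 9.0, 6.0]), X, y, k=5, dist=correlation))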
3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)

Soft computing techniques are known for their efficiency in dealing with complicated problems when conventional analytical methods are infeasible or too expensive and only sets of operational data are available. Fuzzy logic (FL) and Fuzzy Inference Systems (FIS), first proposed by Zadeh [61], provide a solution for decision making based on vague, ambiguous, imprecise or missing data. FL represents models or knowledge using IF-THEN rules. A Neuro-Fuzzy system is functionally equivalent to a FIS. A FIS mimics a human reasoning process by implementing fuzzy sets and an approximate reasoning mechanism that uses numerical values instead of logical values. A FIS requires a domain expert to define the MFs and to determine the associated parameters, both in the MFs and in the reasoning section [62,63]. However, there is no standard for the knowledge acquisition process, so the results may differ if a different knowledge engineer acquires the knowledge from the experts. A Neuro-Fuzzy system can replace the human knowledge acquisition process with a training process over a set of input-output training data: instead of depending on human experts, the Neuro-Fuzzy system determines its parameters through training, by minimizing an error criterion. A popular Neuro-Fuzzy system is the ANFIS. ANFIS is a fuzzy system that uses Artificial Neural Network (ANN) theory to determine its properties (fuzzy sets and fuzzy rules) [64-69]. It consists of five feed-forward layers, as shown in Fig. 4.
The ANFIS is functionally equivalent to the Takagi-Sugeno-Kang (TSK) fuzzy model. It can also express its knowledge in the IF-THEN rule format as follows:

Rule 1: IF In1 is A1 AND In2 is B1 THEN f11 = p11 In1 + q11 In2 + r11
Rule 2: IF In1 is A1 AND In2 is B2 THEN f12 = p12 In1 + q12 In2 + r12
Rule 3: IF In1 is A2 AND In2 is B1 THEN f21 = p21 In1 + q21 In2 + r21
Rule 4: IF In1 is A2 AND In2 is B2 THEN f22 = p22 In1 + q22 In2 + r22

Fig. 4. The structure of ANFIS.

where A1, A2, B1 and B2 are the labels of the MFs for the input parameters In1 and In2, respectively, and pij, qij and rij (i, j = 1, 2) denote the parameters of the output MFs.
In Fig. 4, the layers of ANFIS perform different actions, detailed as below:
• Layer 1: In this layer, membership grades are provided by the nodes, which are adaptive nodes. The outputs of this layer are obtained by:

$O^1_{A_i} = \mu_{A_i}(In_1), \; i = 1, 2; \qquad O^1_{B_j} = \mu_{B_j}(In_2), \; j = 1, 2$   (11)

where $A_i$ and $B_j$ denote appropriate MFs for the input parameters In1 and In2, which can be defined as triangular, trapezoidal or Gaussian functions. The Gaussian MFs for $A_i$ and $B_j$ and the input parameters In1 and In2 are defined as:

$\mu_{A_i}(In_1; \sigma_i, c_i) = \exp\left(-\frac{(In_1 - c_i)^2}{2\sigma_i^2}\right), \; i = 1, 2; \qquad \mu_{B_j}(In_2; \sigma_j, c_j) = \exp\left(-\frac{(In_2 - c_j)^2}{2\sigma_j^2}\right), \; j = 1, 2$   (12)

where $\{\sigma_i, c_i\}$ and $\{\sigma_j, c_j\}$ are the parameter sets governing the Gaussian functions. The ANFIS parameters of this layer are usually called the premise parameters.
• Layer 2: The second layer contains a fixed number of nodes, labeled with $\Pi$. The outputs of the second layer are defined as:

$O^2_{ij} = W_{ij} = \mu_{A_i}(In_1)\,\mu_{B_j}(In_2), \; i, j = 1, 2$   (13)

where the symbol $W_{ij}$ represents the firing strength (weight) of a rule.
• Layer 3: In this layer, every node i, labeled with N, determines the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

$O^3_{ij} = \bar{W}_{ij} = \frac{W_{ij}}{\sum_{i=1}^{2}\sum_{j=1}^{2} W_{ij}}, \; i, j = 1, 2$   (14)

The outputs of this layer represent the normalized firing strengths.
• Layer 4: The nodes of this layer are adaptive nodes. The output of each node is simply the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus, the outputs of this layer are given by:

$O^4_{ij} = \bar{W}_{ij} f_{ij} = \bar{W}_{ij}(p_{ij} In_1 + q_{ij} In_2 + r_{ij}), \; i, j = 1, 2$   (15)

where $\bar{W}_{ij}$ is the output of Layer 3 and $\{p_{ij}, q_{ij}, r_{ij}\}$ is the parameter set.
• Layer 5: There is only a single fixed node, labeled with $\Sigma$. This node performs the summation of all incoming signals; hence, the overall output of the model is given by:

$Out = O^5 = \sum_{i=1}^{2}\sum_{j=1}^{2} \bar{W}_{ij} f_{ij} = \sum_{i=1}^{2}\sum_{j=1}^{2} \bar{W}_{ij}(p_{ij} In_1 + q_{ij} In_2 + r_{ij}) = \sum_{i=1}^{2}\sum_{j=1}^{2} \left[(\bar{W}_{ij} In_1) p_{ij} + (\bar{W}_{ij} In_2) q_{ij} + \bar{W}_{ij} r_{ij}\right]$   (16)

where the overall output Out is a linear combination of the consequent parameters when the values of the premise parameters are fixed.
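For concreteness, the following is a minimal sketch of the forward pass of such a two-input, four-rule first-order Sugeno system with Gaussian MFs, mirroring Layers 1-5 above; all parameter values are arbitrary placeholders (in ANFIS they would be learned), and this is not the authors' MATLAB implementation.

import numpy as np

def gaussmf(x, sigma, c):                      # Layer 1, Eq. (12)
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_output(in1, in2, premise, consequent):
    mu_A = [gaussmf(in1, s, c) for s, c in premise["A"]]   # memberships of In1
    mu_B = [gaussmf(in2, s, c) for s, c in premise["B"]]   # memberships of In2
    W = np.array([[a * b for b in mu_B] for a in mu_A])    # Layer 2, Eq. (13)
    W_bar = W / W.sum()                                    # Layer 3, Eq. (14)
    p, q, r = consequent                                   # 2x2 arrays each
    f = p * in1 + q * in2 + r                              # rule consequents
    return float(np.sum(W_bar * f))                        # Layers 4-5, Eqs. (15)-(16)

premise = {"A": [(1.5, 3.0), (1.5, 10.0)],   # (sigma, c) for A1, A2
           "B": [(1.5, 3.0), (1.5, 10.0)]}   # (sigma, c) for B1, B2
consequent = (np.full((2, 2), 0.4),          # p_ij
              np.full((2, 2), 0.4),          # q_ij
              np.full((2, 2), 1.0))          # r_ij
print(sugeno_output(in1=8.0, in2=11.0, premise=premise, consequent=consequent))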
3.4. Subtractive clustering


The idea in the TSK model is that each rule in a rule base indicates a region of a model, which can be linear [70]. The TSK rule structure in its basic form is:

If $f(x_1 \text{ is } A_1, x_2 \text{ is } A_2, \ldots, x_k \text{ is } A_k)$ then $y = g(x_1, x_2, \ldots, x_k)$   (17)

where the sentences forming the condition are connected through the logical function f, and the output y is obtained from g, a function of the inputs $x_i$.
In order to establish an effective TSK model of a process, using subtractive clustering to generate clusters of data is constructive. The main goal of using subtractive clustering as a cluster analyser is to partition the dataset into a number of homogeneous and natural subsets. The subtractive clustering method assumes each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define a cluster center, based on the density of the surrounding data points. The amount of computation is therefore proportional to the number of data points and independent of the dimensionality of the problem. Although the actual cluster centers are not necessarily located at data points, in most cases this is a good approximation, especially given the reduced computation this approach requires [71]. In this method, the data point with the highest potential, which is a function of the distance measure, is considered a cluster center, and the data points close to a new cluster center are penalized in order to facilitate the emergence of new cluster centers [72]. From Eq. (18), the potential $P_i$ of a data point $x_i$ as a cluster center is obtained as:

$P_i = \sum_{j=1}^{m} \exp\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right)$   (18)

where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$ are data vectors over the input and output dimensions, $r_a$ is a positive constant defining the neighborhood range of the cluster (the radius of the hypersphere cluster in data space) and $\|\cdot\|$ denotes the Euclidean distance. $r_a$ is a critical parameter that determines the number of cluster centers. The first cluster center $c_1$ is selected as the data point with the highest potential value, $P_{c_1}$. For the second cluster center, the contribution of the first cluster center is subtracted when determining the new density values, as follows:

$P_i = P_i - P_{c_1} \exp\left(-\frac{\|x_i - x_{c_1}\|^2}{(r_b/2)^2}\right), \quad r_b = \eta\, r_a$   (19)

where $r_b$ is a positive constant defining a neighborhood with measurable reductions in the density measure, and $\eta$ is a constant greater than 1 used to avoid cluster centers being in too close proximity [73]. According to Eq. (19), the potential measure is significantly reduced for data points near the first cluster center $c_1$. The data point with the largest remaining potential is then chosen as the second cluster center $c_2$.
In general, after the kth cluster center $c_k$ has been determined, the potential is revised according to Eq. (20):

$P_i = P_i - P_{c_k} \exp\left(-\frac{\|x_i - x_{c_k}\|^2}{(r_b/2)^2}\right)$   (20)

where $P_{c_k}$ is the largest potential density value and $c_k$ denotes the location of the kth cluster center. After revising the density function, the next cluster center is selected as the point having the greatest density value. This process continues until a sufficient number of clusters is attained, at which point all data points lie within the radius of some cluster center.
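The listing below is a compact sketch of this procedure (Eqs. (18)-(20)); the radii, the value of eta and the fixed number of centers are illustrative assumptions rather than the settings used in the paper.

import numpy as np

def subtractive_clustering(X, r_a=0.5, eta=1.5, n_centers=3):
    r_b = eta * r_a
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # pairwise ||xi - xj||^2
    P = np.exp(-d2 / (r_a / 2.0) ** 2).sum(axis=1)               # Eq. (18): potentials
    centers = []
    for _ in range(n_centers):
        k = int(np.argmax(P))                                    # highest remaining potential
        centers.append(X[k])
        # Eqs. (19)-(20): subtract the influence of the new center from all potentials
        P = P - P[k] * np.exp(-np.sum((X - X[k]) ** 2, axis=1) / (r_b / 2.0) ** 2)
    return np.array(centers)

X = np.random.rand(100, 5)            # e.g. normalized [criteria..., overall] vectors
print(subtractive_clustering(X).shape)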

4. Research methodology

Fig. 5 shows the general framework of the proposed method, which combines HOSVD for dimensionality reduction with ANFIS and subtractive clustering for discovering knowledge from users' ratings and predicting the overall ratings.
In the first step, we apply HOSVD for dimensionality reduction to reveal the latent associations among the components of the user-item-criteria tensor. Then, we perform cosine-based similarity clustering to obtain groups of similar users and determine labels for the clusters; in this way, high-quality clusters are obtained, which is necessary for developing an efficient ANFIS model. ANFIS is then applied to the clusters to extract fuzzy rules and predict the null values in the overall ratings. The main tasks of the dimensionality reduction process are reducing the dimensionality, obtaining the best approximation of the data in the tensor of user preferences on multiple aspects of items, and finding users with similar preferences on items and criteria. Measuring user similarity based on their ratings on the criteria makes it possible to apply a clustering method. After the clustering method provides the classes of users with similar taste, ANFIS is used to extract knowledge (fuzzy rules) from the resulting clusters. To increase the accuracy of the rule-based system, reduce the amount of data in each class and minimize over-fitting on the training data, subtractive clustering is combined with ANFIS. Thus, the main steps for developing the model in the offline phase are:
• Step 1: HOSVD is applied on the training data, stored in a 3rd-order tensor, for dimensionality reduction to get the best approximation of the rating information.
• Step 2: The data approximated by HOSVD is used for clustering based on cosine similarity. In fact, in this step a label is defined for each vector of ratings, to be used by the k-NN method in the online phase.
• Step 3: ANFIS combined with subtractive clustering is used to train on the data of the clusters obtained in the previous step, extracting fuzzy rules and forming rule clusters.
• Step 4: The fuzzy rules are used for predicting the existing null values of the overall ratings in the offline and online phases. It should be noted that, for predicting the unknown overall ratings, we solved the sparsity problem in the criteria ratings using the neighborhood formation within each cluster; for predicting the unknown criteria ratings of the target item, we relied on cosine-based similarity, computed on the approximated data obtained by HOSVD, as the similarity measure.

Fig. 5. Proposed model using ANFIS and HOSVD: the multi-criteria dataset (criteria 1 to k and the overall rating) is reduced via HOSVD, clustered, and passed to ANFIS combined with subtractive clustering to extract fuzzy rules (stored in a fuzzy rule database) for overall rating prediction.
After learning the model in the offline phase, the recommender system performs the recommendation and prediction tasks of multi-criteria CF recommender systems in the online phase in 3 main steps:
• Step 1: Using the k-NN method, the recommender system predicts the class label for the new data.
• Step 2: The recommender system refers to the corresponding fuzzy rule cluster and predicts the overall rating for the active user (see Section 4.2 for more detail).
• Step 3: After the overall rating prediction, the recommender system forms the neighborhood of the active user from the corresponding cluster using the cosine similarity presented in Eq. (21), and makes predictions and Top-N recommendations. A schematic sketch of these three steps follows.
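The sketch below strings the three online steps together; knn_predict, anfis_models, cluster_ratings and the cosine helper are hypothetical stand-ins for components produced in the offline phase, not APIs defined by the paper.

import numpy as np

def cosine(a, b):                                   # Eq. (21)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def online_recommend(criteria_vec, knn_predict, anfis_models, cluster_ratings, top_n=10):
    label = knn_predict(criteria_vec)                         # Step 1: cluster label via k-NN
    overall = anfis_models[label].predict(criteria_vec)       # Step 2: overall rating from the
                                                              # cluster's fuzzy rule base
    users, items = cluster_ratings[label]                     # Step 3: neighbors inside that cluster
    sims = [(cosine(criteria_vec, u), idx) for idx, u in enumerate(users)]
    neighbors = [idx for _, idx in sorted(sims, reverse=True)[:top_n]]
    return overall, neighbors                                 # rating prediction + Top-N basis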

4.1. Clustering the experimental dataset using HOSVD and improving the scalability of multi-criteria CF
Multi-criteria CF needs to quickly produce high-quality recommendations for very large-scale problems. In this paper, we address the performance issues by scaling up the neighborhood formation process through the use of dimensionality reduction techniques. Scalability is an issue in multi-criteria CF because the data tensor is composed of multiple dimensions, each of which can be very large. Clustering techniques reduce sparsity and improve the scalability of recommender systems by effectively partitioning the ratings database, and previous studies [74,75] have indicated the benefits of applying clustering in recommender systems. Using HOSVD and cosine-similarity approaches, we perform the clustering task in an effective way for multi-criteria CF.
As discussed earlier, for the recommendation task in multi-criteria CF, recommender systems deal with high-dimensional data, which makes the computational cost extremely high and even infeasible for traditional dimensionality reduction techniques. Given the scalability challenge, in this paper HOSVD is able to (1) factorize large tensors efficiently, using much less time than standard methods, and (2) obtain low-rank factors that preserve the main variance of the tensors. Thanks to the dimensionality reduction, we can better form and pre-compute the neighborhoods, which makes prediction generation much faster in multi-criteria CF; forming neighborhoods in the low-dimensional eigenspace provides better quality and performance. In addition, after tensor decomposition by HOSVD, the clustering of the data using cosine-based similarity is performed effectively, and once the clustering is complete the performance of multi-criteria CF can be very good, since the size of the group that must be analyzed is much smaller.
For applying HOSVD, the 3-dimensional data is stored in the 3rd-order tensor $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, where $I_1$ corresponds to the number of users, $I_2$ to the number of rated items and $I_3$ to the number of criteria used. Each entry of the tensor A is a number between 1 and 13. Using HOSVD, the tensor $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ that contains the user ratings for items on four criteria was decomposed as $A = S \times_1 U \times_2 V \times_3 W$, in which $U \in \mathbb{R}^{I_1 \times I_1}$, $V \in \mathbb{R}^{I_2 \times I_2}$ and $W \in \mathbb{R}^{I_3 \times I_3}$ are orthonormal matrices and $S \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is a core tensor which satisfies the all-orthogonality and ordering properties. Similar to the truncated SVD for low-rank approximation and dimensionality reduction of matrices, low-rank approximation and dimensionality reduction of higher-order tensors can be done by the truncated HOSVD (with better approximation and computation): take the first $r_1$ columns of U, the first $r_2$ columns of V, the first $r_3$ columns of W, and the top-left $r_1 \times r_2 \times r_3$ block of S. In that direction, HOSVD is an effective method for dimensionality reduction of 3-dimensional datasets, and it is flexible in allowing a different $r$ for each mode of the tensor. The size of the data goes down from $I_1 I_2 I_3$ to $r_1 r_2 r_3 + I_1 r_1 + I_2 r_2 + I_3 r_3$, and if $r_1 = r_2 = r_3 = r$ the size goes down to $r^3 + r(I_1 + I_2 + I_3)$. If we instead flatten the tensor into an $I_1 \times I_2 I_3$ matrix, the size of the data only goes down to $r^2 + r(I_1 + I_2 I_3)$. The result of the HOSVD decomposition of the 3rd-order tensor of users' ratings is therefore the matrices U, V and W, which capture the relations between users, between items and between criteria, respectively. This decomposition is obtained without splitting the 3-dimensional space into pair relations. For the sake of conciseness, a very simple example with only 4 users, 6 items and 4 criteria is demonstrated in the following. Table 2 shows the ratings given by the users to the items on the 4 criteria, and its decomposition into the matrices U, V, W, S(:,:,1), S(:,:,2), S(:,:,3) and S(:,:,4) is shown in Table 3.
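As a purely illustrative calculation (the ranks are assumptions, not the values used in the experiments), for the $1500 \times 500 \times 4$ training tensor used later in Section 5.1 and ranks $r_1 = r_2 = 20$, $r_3 = 4$, the stored size becomes

$r_1 r_2 r_3 + I_1 r_1 + I_2 r_2 + I_3 r_3 = 1600 + 30{,}000 + 10{,}000 + 16 = 41{,}616$

entries, compared with $I_1 I_2 I_3 = 3{,}000{,}000$ entries for the full tensor.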

Folding the new user's ratings into the reduced user space (using the matrices V, W and the core tensor S) yields $U_{new} = [0.5091 \;\; 0.2729]$.

As can be seen in Fig. 6, the users similar to the new user can be found using the cosine similarity in Eq. (21). The cosine similarity between two vectors A and B is defined as:

$\text{similarity} = \cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$   (21)

By applying this method, the system is able to cluster the data based on user similarity from the matrix U. Since the k-NN predictor requires supervised learning, cosine-based similarity is selected to obtain clusters from the data approximated by HOSVD and to provide labels for them.

Table 2
Multi-criteria ratings for 4 users and 6 items on 4 criteria, and a new user's ratings on the four criteria C1, C2, C3 and C4 (13-level rating scale).

From the truncated matrix U, the first row of the matrix is selected and the system calculates its cosine similarity, through Eq. (21), with the second row, the third row and so on, until it reaches the last row. The row with the highest cosine similarity is clustered with the first row. Applying this procedure to the rows, the system obtains clusters with small numbers of similar users. Given a specific target number of clusters, the system can merge close clusters by calculating their cosine-based similarity. Finally, after constructing the clusters, the system assigns a category label to each vector of users' ratings. Following the same procedure, a specific number of clusters can be obtained from the matrix V.
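A simplified, illustrative sketch of this row-grouping step is given below; the greedy seeding strategy and the similarity threshold are assumptions, not the exact procedure or parameters used in the paper.

import numpy as np

def cosine(a, b):                                    # Eq. (21)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cluster_rows(U, threshold=0.9):
    """Greedily group each row of U with the existing cluster whose seed row is
    most cosine-similar to it; otherwise start a new cluster."""
    labels = np.full(U.shape[0], -1, dtype=int)
    seeds = []                                       # one representative row per cluster
    for i, row in enumerate(U):
        sims = [cosine(row, U[s]) for s in seeds]
        if sims and max(sims) >= threshold:
            labels[i] = int(np.argmax(sims))
        else:
            seeds.append(i)
            labels[i] = len(seeds) - 1
    return labels

U = np.random.rand(10, 2)                            # e.g. the first two columns of U
print(cluster_rows(U))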
4.2. ANFIS architecture of the proposed method and solving the sparsity problem in multi-criteria CF

As discussed earlier, multi-criteria CF recommender systems suffer from the sparsity problem on two sides, missing values in the overall ratings and in the criteria ratings, and the system has to predict these missing ratings with new approaches. In this paper, we solve the problem of sparsity in the overall ratings using a Neuro-Fuzzy system. Generating proper MFs and extracting fuzzy rules for the prediction of the overall ratings are the main advantages of this method, and both can be used in the online and offline phases. Because in multi-criteria CF the overall ratings are based on the users' perception of item features, the sparsity problem in the overall ratings can be better alleviated using the MFs and fuzzy rules generated from the users' preferences on item features. In addition, solving and alleviating the sparsity problem in multi-criteria CF recommender systems improves their predictive accuracy, as has been shown in prior research [24,53]. Based on the experimental results, we will also demonstrate that the proposed method significantly improves the predictive accuracy of multi-criteria CF: using ANFIS, the prediction error in the overall ratings is very low, and even zero in many cases, which shows the capability of ANFIS to alleviate the sparsity problem in an exact and effective way.

In this study, discovering the knowledge (fuzzy rules) from users' ratings and generalizing the relationship Y = f(X1, X2, ..., Xn) are the main goals of applying ANFIS for accurate prediction of the overall ratings, which accordingly leads to an improvement of the predictive accuracy in multi-criteria CF. In this relationship, X1, X2, ..., Xn stand for the input variables and Y stands for the output variable. In the current study, the overall rating, i.e. the user's overall preference for an item, can be determined as a function of the item's features or criteria; thus, we associate the variable Y with the overall rating and the variables X1, X2, ..., Xn with the criteria ratings. Predicting the relationship between the inputs and the output is one of the important tasks that ANFIS performs. Based on the experimental dataset, the input parameters of the ANFIS model under consideration are Acting (A), Directing (D), Story (S) and Visuals (V), and the Overall rating (O), defined as the overall preference, is the output. These attributes are naturally vague, imprecise and incomplete fuzzy terms, which leads to uncertainty about the user's interest in item features such as Acting, Story, Visuals and Directing. Thus, in ANFIS they can be introduced and expressed by fuzzy linguistic values (uncertainty modeling) such as {cluster 1}, {cluster 2}, {cluster 3} and {cluster 4}, which divide the domain of user interest in Acting, Directing, Story and Visuals into four regions using MFs; these are shown in Fig. 7a and b for the two inputs Visuals and Directing, respectively.

Fig. 7. Membership functions for (a) Visuals and (b) Directing.

The relationship between the input variables (criteria) and the output (overall rating) can be defined as

$\text{Overall rating} = f(\text{Acting}, \text{Directing}, \text{Story}, \text{Visuals})$   (22)

In ANFIS models, the output is related to the inputs by a mathematical mapping expressed through fuzzy rules. Fuzzy rules play an important role in ANFIS models and are the backbone of such systems. The shape of the fuzzy rules in ANFIS is defined as

Rule 1: IF A is A1 AND D is B1 AND S is C1 AND V is D1 THEN f1 = p1 A + q1 D + r1 S + t1 V + u1
Rule 2: IF A is A2 AND D is B2 AND S is C2 AND V is D2 THEN f2 = p2 A + q2 D + r2 S + t2 V + u2
...
Rule n: IF A is An AND D is Bn AND S is Cn AND V is Dn THEN fn = pn A + qn D + rn S + tn V + un

Table 3
Matrices generated after applying HOSVD to the tensor of users' ratings: the core tensor slices S(:,:,1), S(:,:,2), S(:,:,3), S(:,:,4) and the factor matrices U, V and W.

Fig. 6. 2D graph of users and items plotted against the first and second columns of U and V, including the projected new user U_new.

For example, in this study, from the users' ratings of movies, ANFIS extracts fuzzy rules by training on the vectors of users' ratings in each cluster, such as:

IF the Acting of a movie is cluster1 AND Directing is cluster1 AND Story is cluster1 AND Visuals is cluster1 THEN the Overall Rating is out1cluster1.

According to the fuzzy rules extracted by ANFIS, the value out1cluster1 for the overall rating is obtained from the membership degrees of the 4 input variables. Also, by using subtractive clustering in ANFIS, the system improves the precision of the fuzzy rules extracted from the users' ratings of movies and minimizes over-fitting when training on the data. It reveals the users' preferences about item features in soft clusters and divides the user preferences on item features into fuzzy clusters, so that the system can predict the exact relation between each criterion and the overall rating.

To illustrate a simple model of ANFIS applied to multi-criteria CF, assume the system has two criteria, S and V, one output, and two fuzzy IF-THEN rules. Fig. 8 shows the first-order Sugeno FIS, i.e. the ANFIS model with two rules.

Fig. 8. Architecture of the implemented ANFIS model for two inputs, one output and two rules.

In Fig. 8, S and V indicate the crisp inputs related to node i, and Ai and Bi are the linguistic labels characterized by the appropriate MFs $\mu_{A_i}$ and $\mu_{B_i}$, respectively. In this study, ANFIS uses the Gaussian MF:

$\mu_{A_i}(S) = e^{-\frac{(S - b_i)^2}{2 a_i^2}}$   (23)

$\mu_{B_i}(V) = e^{-\frac{(V - b_i)^2}{2 a_i^2}}$   (24)

92

M. Nilashi et al. / Knowledge-Based Systems 60 (2014) 82101

of clusters was used for the training data and the second group
of data including 10% of the total dataset of clusters was used for
the checking data. The remaining 10% data of clusters was used
for the testing data.
5. Result and discussion

Fig. 8. Architecture of implemented ANFIS model for two inputs, one output and
two rules.

where {ai, bi, ci} is the parameter set of the MFs in the premise part
of fuzzy IFTHEN rules that change the shapes of the MFs. Parameters in this layer are referred to as the premise parameters.
From the ANFIS architecture shown in Fig. 8, it can be observed
that when the values of the premise parameters are xed, the overall output can be expressed as a linear combination of the consequent parameters. In symbols, the output O can be rewritten as:

O = \bar{w}_1 f_1 + \bar{w}_2 f_2 + ... + \bar{w}_n f_n,   where \bar{w}_i = w_i / (w_1 + w_2 + ... + w_n)
  = \bar{w}_1 (p_1 A + q_1 D + r_1 S + t_1 V + \pi_1) + ... + \bar{w}_n (p_n A + q_n D + r_n S + t_n V + \pi_n)
  = (\bar{w}_1 A) p_1 + (\bar{w}_1 D) q_1 + (\bar{w}_1 S) r_1 + (\bar{w}_1 V) t_1 + \bar{w}_1 \pi_1 + ... + (\bar{w}_n A) p_n + (\bar{w}_n D) q_n + (\bar{w}_n S) r_n + (\bar{w}_n V) t_n + \bar{w}_n \pi_n        (25)

which is linear in the consequent parameters p_i, q_i, r_i, t_i and \pi_i. Fig. 9 shows the architecture of the implemented ANFIS, which consists of four inputs, sixteen input MFs, four rules and one output.
4.2.1. Training the ANFIS and model validation using checking and testing datasets
In this study, three sets of data were used for ANFIS modeling: training, checking and testing data. ANFIS uses the training data to construct the model of the target system; the rows of the training data serve as inputs and outputs for building that model. The checking data is used to test the generalization capability of the FIS at each epoch, which prevents over-fitting and verifies the identified ANFIS. The checking and testing data have the same format as the training data, but their elements are generally different from those of the training data.
Each cluster obtained using HOSVD was divided into three groups: the first group, comprising 80% of the cluster's data, was used as training data; the second group, comprising 10%, was used as checking data; and the remaining 10% was used as testing data.

5. Results and discussion

In order to analyse the effectiveness of the proposed method, several experiments were conducted on the Yahoo!Movies dataset provided by the Yahoo! Research Alliance Webscope program (http://webscope.sandbox.yahoo.com).
On the Yahoo!Movies network, users could rate movies in four dimensions (Story, Acting, Direction and Visuals) and assign an overall rating, using a 13-level rating scale. The four features of each movie were considered as C1 = Acting, C2 = Story, C3 = Visuals and C4 = Directing. As can be seen in Table 4, all user ratings take values between 1 and 13 on a quantitative scale.
The experimental dataset contains 257,317 rating tuples from 127,829 users on 8272 movies. However, the resulting ratings tensor is extremely sparse, because many of the user-item-criteria entries are empty. The sparsity level of the dataset is about 97.57% (sparsity level = 1 - density, where density is the fraction of user-item entries that are filled); that is, not even 2.43% of all entries in the rating tensor are filled. Similar to the work by Jannach et al. [24], we pre-processed the dataset, created test datasets with different density and quality levels, and applied the proposed method on YM-20-20, YM-10-10 and YM-5-5. The description of these datasets is presented in Table 5.
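For reference, the sparsity computation reduces to a single expression; the sketch below applies it to the YM-20-20 counts from Table 5 (any other user/item/rating counts can be substituted).

```python
def sparsity_level(n_ratings, n_users, n_items):
    """Sparsity = 1 - density, where density is the filled fraction of the user-item matrix."""
    density = n_ratings / float(n_users * n_items)
    return 1.0 - density

# Sparsity of the YM-20-20 subset (counts taken from Table 5).
print(sparsity_level(n_ratings=18504, n_users=429, n_items=491))
```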
5.1. Performance of HOSVD clustering

Because HOSVD can be computed quickly, it is applied to the training tensor A \in R^{1500x500x4}, which corresponds to the training set. As a result, an approximation \tilde{A}_{a_1,a_2,a_3} is retained. The values chosen for a_1, a_2 and a_3 determine the dimensions of the core tensor.
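A compact numpy sketch of a rank-(a1, a2, a3) truncated HOSVD is given below: each factor matrix holds the leading left singular vectors of the corresponding mode unfolding, and the core tensor is obtained by contracting the data tensor with the factor transposes. The small random tensor only stands in for the 1500 x 500 x 4 training tensor.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def truncated_hosvd(tensor, ranks):
    """Rank-(a1, a2, a3) HOSVD: leading left singular vectors of each unfolding,
    core obtained by contracting the tensor with the factor transposes."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):      # core = A x1 U1' x2 U2' x3 U3'
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    approx = core
    for mode, u in enumerate(factors):      # A_hat = core x1 U1 x2 U2 x3 U3
        approx = np.moveaxis(np.tensordot(u, np.moveaxis(approx, mode, 0), axes=1), 0, mode)
    return core, factors, approx

# Small random user x item x criteria tensor standing in for the training tensor.
A = np.random.default_rng(0).random((30, 20, 4))
core, (U, V, W), A_hat = truncated_hosvd(A, ranks=(12, 12, 4))
print(core.shape, U.shape, V.shape, W.shape, A_hat.shape)
```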
It should be noted that all the experiments in this study were implemented in MATLAB on a Microsoft Windows operating system with an Intel Core i5 processor (2.66 GHz) and 4 GB of RAM.
To estimate the performance of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20 approximations, we adopt the Silhouette coefficient [76] as the standard measure of clustering quality and use it to determine the best cluster formation. The Silhouette coefficient assesses the quality of a clustering. It is an internal index that measures how well the clustering fits the original data based on statistical properties of the clustered data; external indices, by contrast, measure the quality of a clustering by comparing it with an external (supervised) labeling. The Silhouette coefficient of an element i of a cluster k is defined by the average distance a(i) between i and the other elements of k (the intra-cluster distance) and the distance b(i) between i and the nearest element in the nearest cluster (the minimal inter-cluster distance).
Fig. 9. Architecture of the implemented ANFIS.

Table 4
A sample of the multi-criteria dataset from the Yahoo!Movies (columns: Movie ID, User ID, Directing, Story, Visual, Acting and Overall rating; all ratings on the 13-level scale).

Table 5
Information of Yahoo!Movies dataset.

Name        #Users    #Items    #Overall ratings
YM-20-20    429       491       18,504
YM-10-10    1827      1471      48,026
YM-5-5      5978      3079      82,599

sc(i) = (b(i) - a(i)) / max{a(i), b(i)}        (26)

which can be written as:

sc(i) = 1 - a(i)/b(i)    if a(i) < b(i)
sc(i) = 0                if a(i) = b(i)
sc(i) = b(i)/a(i) - 1    if a(i) > b(i)        (27)

An overall score for a set of n_k elements (a cluster or the entire clustering) is calculated by taking the average of the Silhouette coefficients sc(i) of all elements i in the set. Thus, SC_k can be defined as

SC_k = (1/n_k) \sum_{i=1}^{n_k} sc(i)        (28)

The Silhouette coefficient takes values between -1 and 1; the closer to 1, the better the clustering fits the data. Table 6 lists a general rule of thumb for interpreting the Silhouette coefficient.

Table 6
Rule of thumb for the interpretation of the Silhouette coefficient.

Range        Interpretation
>0.70        Strong structure has been found
0.50-0.70    Reasonable structure has been found
0.25-0.50    The structure is weak and could be artificial
<0.25        No substantial structure has been found

Table 7
Average Silhouette coefficient value for clusters obtained from different rank approximations.

Rank approximation    Number of clusters    Average Silhouette coefficient
Rank 2                11                    0.837
Rank 4                14                    0.858
Rank 6                13                    0.833
Rank 8                16                    0.790
Rank 10               12                    0.867
Rank 12               14                    0.811
Rank 14               15                    0.828
Rank 16               15                    0.798
Rank 18               13                    0.787
Rank 20               13                    0.765

Table 7 shows the average Silhouette coefficient of the HOSVD clustering for the different rank approximations. According to Table 7, the highest average Silhouette coefficient, 0.867, is obtained for the rank 10 approximation. This value is reasonably good. Based on this observation, lower approximation ranks do better than higher approximation ranks, which supports our claim that truncated HOSVD gives better results.
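The sketch below computes sc(i) and the average silhouette of Eqs. (26)-(28) for a toy configuration; following the common convention, b(i) is taken as the mean distance to the nearest other cluster.

```python
import numpy as np

def silhouette_coefficients(X, labels):
    """Silhouette coefficient sc(i) = (b(i) - a(i)) / max(a(i), b(i)) for every element (Eq. (26))."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = (labels == labels[i])
        a = d[i, same & (np.arange(len(X)) != i)].mean()          # mean intra-cluster distance
        b = min(d[i, labels == c].mean()                          # mean distance to nearest other cluster
                for c in set(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Average silhouette (Eq. (28)) of a toy two-cluster configuration.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(silhouette_coefficients(X, labels).mean())
```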

5.2. Evaluation of proposed ANFIS model

After cluster analysis, the ANFIS model was applied to one of the clusters with the maximum Silhouette coefficient. In that cluster, four fuzzy clusters were determined for the 190 users' ratings in the third cluster generated by the HOSVD method for the rank 12 approximation. The number of fuzzy rules was equal to the number of cluster centers, each representing the characteristic of its cluster, as given in Table 8.

For evaluating the ANFIS model, several measures of accuracy were used to determine the model's capability for predicting the overall rating. For this reason, the models were evaluated by four estimators: Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R^2). These estimators are determined by

MSE = (1/n) \sum_{O=1}^{n} (actual(O) - prediction(O))^2        (29)

R^2 = 1 - \sum_{O=1}^{n} (actual(O) - prediction(O))^2 / \sum_{O=1}^{n} (actual(O) - \bar{actual})^2        (30)

MAPE = (1/n) \sum_{O=1}^{n} |(actual(O) - prediction(O)) / actual(O)|        (31)

RMSE = \sqrt{ (1/n) \sum_{O=1}^{n} (actual(O) - prediction(O))^2 }        (32)

where actual(O) indicates the real overall rating provided by the user, prediction(O) is the predicted overall rating, \bar{actual} is the mean of the actual overall ratings, and n corresponds to the number of users' ratings used.
Usually, the RMSE and MSE measures are used to test the prediction model during training; in this study, additional performance measures, namely the coefficient of determination R^2 and MAPE, were also used for a more thorough performance evaluation. The coefficient of determination R^2 takes values in the interval [0, 1] for the training of the proposed network, and a value closer to 1 indicates more successful learning. MAPE was used because it accurately identifies the model deviations.

After implementing the ANFIS model using the Fuzzy Logic Toolbox in MATLAB 7.10.0, the training and checking data were used for error estimation. Data from the four inputs were given to the trained ANFIS model along with the actual overall ratings. From the input values, the suitable MFs (see Fig. 7(a) and (b)) were selected to predict the overall ratings using the extracted rules (see Table 8). The fuzzy rule viewer of the established ANFIS model, shown in Fig. 10, visualizes the process of overall rating prediction through the selection of MFs; it indicates the behavior of users over changes in the values of all four inputs for the overall rating. In the fuzzy rule viewer, when the input parameters are Acting at 11, Directing at 12, Story at 12 and Visuals at 11, an output overall rating of 12 is obtained.

Table 9 presents the errors for a sample of the training and checking datasets. As can be seen from the errors of the nineteen samples in Table 9, the ANFIS model has been trained effectively using the training data.

Table 8
Formation of extracted rules by ANFIS.

Rule #    Extracted fuzzy rules
1         IF (Acting is cluster1) AND (Directing is cluster1) AND (Story is cluster1) AND (Visuals is cluster1) THEN (OverallRating is out1cluster1) (1)
2         IF (Acting is cluster2) AND (Directing is cluster2) AND (Story is cluster2) AND (Visuals is cluster2) THEN (OverallRating is out1cluster2) (1)
3         IF (Acting is cluster3) AND (Directing is cluster3) AND (Story is cluster3) AND (Visuals is cluster3) THEN (OverallRating is out1cluster3) (1)
4         IF (Acting is cluster4) AND (Directing is cluster4) AND (Story is cluster4) AND (Visuals is cluster4) THEN (OverallRating is out1cluster4) (1)


Fig. 10. Fuzzy rule viewer for input and output variables of ANFIS model.

Table 9
Training and checking errors for prediction overall ratings by ANFIS.

Sample #   Training data   Training ANFIS output   Training error (%)   Checking data   Checking ANFIS output   Checking error (%)
1          12              12                      0                    11              11.01                   0.01
2          10              10.0001                 0.0001               12              12.009                  0.009
3          13              13                      0                    10              10.009                  0.009
4          12              12                      0                    12              12.0001                 0.0001
5          12              12                      0                    11              11.0008                 0.0008
6          13              13                      0                    11              11.009323               0.009323
7          12              12                      0                    12              12.0004                 0.0004
8          13              13                      0                    12              12.00383                0.00383
9          12              12                      0                    12              11.998                  0.002
10         12              12                      0                    13              12.999998               0.000002
11         12              12                      0                    11              11.003                  0.003
12         10              10.0013                 0.0013               11              11.0005                 0.0005
13         13              13                      0                    12              12.00276                0.00276
14         12              12                      0                    12              12.0003                 0.0003
15         11              10.9999                 -0.0001              12              11.9917                 0.0083
16         12              12                      0                    12              11.99346                0.00654
17         12              12                      0                    11              10.99299                0.00701
18         13              13                      0                    10              10.009                  0.009
19         12              12                      0                    11              10.9901                 0.0099

For subtractive clustering, the parameters were defined by a trial-and-error approach as follows: range of influence 0.5, squash factor 1.25, accept ratio 0.5 and reject ratio 0.15. In addition, we tested the effect of the two variables r_a and r_b, which represent the radius of neighborhood, on the overall rating prediction error for the training, checking and test data. The lowest error was obtained for r_b = 1.5 r_a, with r_a varied from 0.3 to 0.8 for the radius of neighborhood. Fig. 11 presents the overall rating prediction error of checking and training for nineteen samples.
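A simplified sketch of subtractive clustering is shown below: every point's potential is a sum of Gaussian contributions with radius r_a, the point of highest potential becomes a centre, and the potential around each accepted centre is suppressed with radius r_b = 1.5 r_a. The full algorithm also applies an accept-ratio and a distance test between the two thresholds, which are omitted here, and the two-blob data is synthetic.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, squash=1.5, reject=0.15, max_centers=10):
    """Simplified subtractive clustering (Chiu, 1994): points with the highest potential
    become cluster centres; each accepted centre suppresses the potential around it.
    X is assumed to be normalised to [0, 1] in every dimension."""
    X = np.asarray(X, float)
    alpha = 4.0 / ra ** 2                      # radius of influence r_a
    beta = 4.0 / (squash * ra) ** 2            # suppression radius r_b = squash * r_a
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak, centers = potential.max(), []
    while len(centers) < max_centers:
        k = int(np.argmax(potential))
        if potential[k] < reject * first_peak:  # remaining peaks are too weak: stop
            break
        centers.append(X[k])
        potential = potential - potential[k] * np.exp(-beta * d2[k])
    return np.array(centers)

# Two well-separated blobs on the 13-point scale, rescaled to [0, 1].
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(11, 0.3, (40, 4)), rng.normal(13, 0.3, (40, 4))]) / 13.0
print(len(subtractive_clustering(X)))   # typically finds 2 centres
```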
Fig. 11. Training and checking error for nineteen samples in the dataset.

In this study, the average error for the checking data was 0.0001904; after 200 epochs, the average RMSE, MSE, MAPE and R^2 were 0.02144, 0.00912, 0.18230 and 0.82460, respectively. The average error for the training data was 0.000162221; after 200 epochs, the average RMSE, MSE, MAPE and R^2 were 0.01272, 0.00912, 0.18230 and 0.99460, respectively. Also, after 200 epochs, the average error for the testing data was 0.000172361, and the average RMSE, MSE, MAPE and R^2 were 0.01951, 0.00949, 0.10230 and 0.91150, respectively. The average training and checking errors after 200 epochs are shown in Fig. 12.
Fig. 12. The error of each observation for checking and training data.

Fig. 13 illustrates the interdependency between the four input parameters and the overall rating, obtained from the fuzzy rules generated by ANFIS combined with subtractive clustering, through control surfaces. The level of the overall rating can be depicted as a continuous function of its input parameters: Acting, Directing, Story and Visuals. The surface plots in this figure show the variation of the overall rating based on the identified fuzzy rules. Fig. 13(a) shows the interdependency of the overall rating on Directing and Acting, Fig. 13(b) on Acting and Story, Fig. 13(c) on Visuals and Acting, Fig. 13(d) on Story and Directing, Fig. 13(e) on Visuals and Directing, and Fig. 13(f) on Story and Visuals.

Fig. 13. Interdependency of overall rating on (a) Directing and Acting, (b) Acting and Story, (c) Visuals and Acting, (d) Story and Directing, (e) Visuals and Directing, and (f) Story and Visuals.
These surface plots show the users' perception of and behavior towards any two item features within a cluster of users with similar preferences. The results depicted in the surface plots are valuable for revealing users' behavior with respect to item features in multi-criteria CF. Thus, the users' preferences in any cluster can be modeled by ANFIS, and the recommender system can recognize which item features (criteria), and at which level, are tailored to their preferences. In addition, the curves presented in Fig. 14(a-d) distinctly reveal the users' behavior on each item feature. As can be seen in these curves, the overall rating increases significantly with the Story criterion relative to the other criteria; it can therefore be inferred that the Story criterion is the most important one for users in that cluster.
5.3. Multi-criteria CF evaluation

In this section, we focus on multi-criteria CF recommendation using the proposed method. As mentioned before, we used k-NN for classifying the data, and we noted that selecting k and the distance metric is important for the accuracy of k-NN classification. Therefore, in this study, the optimal distance metric and k were chosen using cross-validation [77], so that the classifier could accurately predict the testing data. Five-fold cross-validation was applied to choose the type of distance metric and the best value of k.
Using the five-fold cross-validation approach for the values k = 1, 3, 5, 7 and 9 and three distance metrics (Euclidean, Correlation and City-Block), the resulting averaged classification accuracies are presented in Table 10. From Table 10, the highest averaged classification accuracy, about 98.91%, is obtained using the Euclidean distance metric with k = 5, compared with City-Block (95.89%) and Correlation (96.76%). Moreover, using the Euclidean metric, the averaged classification rate is higher than with the Correlation and City-Block metrics for all values of k. Based on this result, we adopted the optimal value k = 5, obtained using five-fold cross-validation, and the Euclidean distance metric.
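The choice of k and distance metric can be reproduced with a small grid search like the one below; it implements k-NN and five-fold cross-validation directly with numpy (the correlation metric is omitted for brevity, and the two-class data is synthetic rather than the clustered rating vectors).

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k, metric="euclidean"):
    """Majority-vote k-NN with either Euclidean or city-block distance."""
    if metric == "euclidean":
        d = np.sqrt(((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1))
    else:  # city-block
        d = np.abs(test_X[:, None, :] - train_X[None, :, :]).sum(-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

def cross_validate(X, y, k, metric, folds=5, seed=0):
    """Average accuracy of k-NN over `folds` random folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    parts = np.array_split(idx, folds)
    accs = []
    for f in range(folds):
        test = parts[f]
        train = np.concatenate([parts[g] for g in range(folds) if g != f])
        pred = knn_predict(X[train], y[train], X[test], k, metric)
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# Toy search over k and distance metric on random two-class data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(2, 1, (60, 4))])
y = np.repeat([0, 1], 60)
for metric in ("euclidean", "cityblock"):
    print(metric, {k: round(cross_validate(X, y, k, metric), 3) for k in (1, 3, 5, 7)})
```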
We determined the precision and the recall of the Top-N list of each element in the test set and took the arithmetic mean of these values. The recommender's prediction accuracy was measured by RMSE [78], a widely used metric for evaluating the statistical accuracy of recommendation algorithms, given by

RMSE = \sqrt{ (1/|X|) \sum_{(u_i, o_j) \in X} |a_{ij} - p_{ij}|^2 }        (33)

where X = {(u_i, o_j) | u_i had rated o_j in the probe set}. A lower value of RMSE indicates a higher accuracy of the recommendation system.

Table 11 presents the RMSE obtained from the proposed approach on YM-5-5 (each movie has at least 5 ratings), YM-10-10 (each movie has at least 10 ratings) and YM-20-20 (each movie has at least 20 ratings). Fig. 15 shows the prediction accuracy for different neighborhood sizes on the YM-5-5, YM-10-10 and YM-20-20 datasets.

To compare the proposed method with HOSVD, truncated SVD and some state-of-the-art approaches in multi-criteria CF, we employ the recall and precision metrics, which are widely used in recommender systems to evaluate the quality of recommendations [79,80]. Precision is the ratio of relevant items recommended to the total number of items recommended. Recall is the ratio of relevant items recommended to the total number of relevant items that exist. The two measures are inversely related and depend on the length of the recommendation list: the longer the recommendation list, the easier it becomes to achieve high recall, but the more difficult it becomes to achieve good precision. The F measure is the weighted harmonic mean that combines both precision and recall [24].

Recall = (Number of correctly recommended items) / (Number of interesting items)        (34)

Precision = (Number of correctly recommended items) / (Number of recommended items)        (35)

where items of interest to a customer u refer to products in the test set that were purchased by u, and correctly recommended items are items that match the items of interest. Although these measures are simple to compute and intuitively appealing, they are in conflict, because increasing the size of the recommendation set improves the recall at the expense of reducing the precision [8].

The F1-metric [24,79], which combines precision and recall, is also widely used to evaluate the quality of recommendations. Specifically, this measure balances the trade-off between precision and recall by assigning equal weights to both metrics. Therefore, we use the F1-metric in our evaluation, as shown in Eq. (36).

F1 = (2 x Recall x Precision) / (Recall + Precision)        (36)

We ran the experiments on the YM-10-10 and YM-20-20 datasets for N equal to 1, 5, 7, 15, 25, 35 and 40, where N is the number of items to be recommended by the Top-N recommender.

From the two F1 curves in Figs. 16 and 17, we can see that the proposed method gives a high level of accuracy when the size of the neighborhood is increased for Top-N recommendation. This outcome demonstrates the significance of combining the HOSVD method and ANFIS with subtractive clustering for overcoming the problems connected to multi-criteria CF.


The results also clearly reveal that the proposed method gives better results for YM-20-20. In Figs. 16 and 17, the significant change in accuracy measured by F1 between neighborhood sizes 15 and 25 indicates that higher accuracy is obtained for large neighborhoods compared with small neighborhoods. According to the experiments, these outcomes are related to the quality of the clusters and of the fuzzy rules extracted from the YM-20-20 and YM-10-10 datasets.


Fig. 14. Curves for revealing the relationship between overall rating and (a) Visuals, (b) Directing, (c) Story, and (d) Acting.

Table 10
Averaged classification accuracy for distance metrics and values of k.

Distance metric    k = 1    k = 3    k = 5    k = 7    k = 9
Euclidean          96.63    97.34    98.91    97.67    95.56
City block         94.72    94.87    95.89    93.89    93.73
Correlation        95.28    95.38    96.76    94.88    94.87

Table 11
RMSE for YM-5-5, YM-10-10 and YM-20-20 for different neighborhood sizes.

Size of neighborhood    YM-5-5 RMSE    YM-10-10 RMSE    YM-20-20 RMSE
5                       0.551097       0.5365           0.53158
10                      0.549707       0.5310           0.52988
15                      0.544308       0.5289           0.52039
20                      0.538909       0.5200           0.51558
25                      0.530209       0.5184           0.51129
30                      0.528609       0.5174           0.50349

Fig. 15. RMSE and neighborhood size.
In order to compare the proposed method with previous work [23,24,52], we also evaluated our approach on YM-10-10 using an additional set of metrics. In Table 12, we report Precision@5 and Precision@7 values as well as the Mean Absolute Error (MAE). We also applied the SVD and HOSVD techniques without ANFIS and subtractive clustering on the YM-10-10 and YM-20-20 datasets; the results are presented in Table 13. The MAE is determined as the average absolute deviation between predicted ratings and true ratings, as shown in Eq. (37):

MAE(pred, act) = (1/N) \sum_{i=1}^{N} |pred_{u,i} - act_{u,i}|        (37)

where N is the number of items on which a user u has expressed an opinion.
From the results, we find that the precision at Top-5 and Top-7 of the proposed method outperforms the algorithms in the previous work and the methods using solely HOSVD and SVD.

In order to compare the proposed method with MC-SeCF, developed by Shambour and Lu [53] and evaluated on the MovieLens dataset, we also evaluated our approach on YM-20-20 and YM-10-10 using the MAE metric for different neighborhood sizes. The MAE comparison is shown in Fig. 18. Looking at the curves in this figure, a significant improvement in recommendation accuracy is obtained for large neighborhood sizes. We present the recommendation accuracy using MAE for YM-20-20 and YM-10-10 in Table 14. From Fig. 18 and Table 14, it can be observed that the proposed method improves recommendation accuracy slightly more on YM-20-20 for large neighborhood sizes. Compared with MC-SeCF, the MAE values of our method are slightly higher; however, quite interestingly, better recommendation accuracy is obtained by our method on the YM-20-20 dataset. This indicates that on YM-20-20 the accuracy is relatively high because better fuzzy rules are discovered. Also, as mentioned earlier, semantic information about items can be incorporated into multi-criteria CF to obtain more accurate recommendations.


Fig. 16. F1 measure and Top-N recommendation for YM-10-10.

Fig. 18. Recommendation accuracy for different neighborhood sizes on YM-20-20 and YM-10-10.

Table 14
Recommendation accuracy using MAE for different neighborhood size.

Neighborhood size    MAE(%) YM-10-10    MAE(%) YM-20-20
10                   0.7325             0.7105
20                   0.7370             0.7088
30                   0.7249             0.7093
40                   0.7260             0.7015
50                   0.7184             0.6902
70                   0.7112             0.6724
90                   0.7180             0.6688

Fig. 17. F1 measure and Top-N recommendation for YM-20-20.

Table 12
MAE, precision at Top-5 and Top-7 of proposed method, HOSVD and truncated SVD for YM-10-10 (neighborhood size: all users).

Algorithm                                 Precision@5    Precision@7    MAE
HOSVD                                     75.34          72.85          1.17
Truncated SVD                             74.03          72.19          1.75
HOSVD-ANFIS and subtractive clustering    81.44          80.78          0.96

Table 13
MAE, precision at Top-5 and Top-7 for proposed method, HOSVD and truncated SVD for YM-20-20 (neighborhood size: all users).

Algorithm                                 Precision@5    Precision@7    MAE
HOSVD                                     78.57          76.43          0.95
Truncated SVD                             75.12          73.21          1.45
HOSVD-ANFIS and subtractive clustering    83.34          81.32          0.91

Also, according to the rank 12 approximation defined for the HOSVD decomposition, we measured the precision for different numbers of clusters. For the rank 12 approximation, the number of clusters was changed iteratively, starting from 3 clusters and increasing by 3 after each iteration up to 12 clusters. Fig. 19 illustrates the precision values for Precision@5 and Precision@7 versus the number of clusters.

Fig. 19. Precision versus number of clusters in Precision@5 and Precision@7 for different datasets.

As can be seen in Fig. 19, the worst precision is obtained for YM-10-10 at Precision@5 with three clusters, and the best precision is achieved for YM-20-20 at Precision@5 with twelve clusters. This result demonstrates, for both YM-10-10 and YM-20-20, that precision increases with an increasing number of clusters.
To experimentally show the effectiveness of clustering using HOSVD and cosine-based similarity, we performed experiments with the similarity-based approach developed by Adomavicius and Kwon [23] and compared it with the proposed method. They proposed different ways to calculate the similarity between users based on their criteria ratings; it should be noted that the Chebyshev distance metric performed best among their similarity-based approaches.
Fig. 20 presents the performance results of our experiments for the proposed method and the similarity-based approach using the Chebyshev distance metric. The throughput is plotted as a function of the number of clusters in Fig. 20. We define the throughput of a multi-criteria CF recommender system as the number of recommendations generated per second for k selected users (k = 5).



Table 15
Coverage for YM-5-5, YM-10-10 and YM-20-20 (proposed method vs. similarity-based approach).

                        Multi-criteria CF with proposed method    Multi-criteria CF with similarity-based approach
Size of neighborhood    YM-5-5    YM-10-10    YM-20-20            YM-5-5    YM-10-10    YM-20-20
5                       0.99      0.99        0.99                0.54      0.70        0.75
10                      0.99      0.99        0.99                0.59      0.73        0.80
15                      0.99      0.99        0.99                0.65      0.76        0.82
20                      0.99      0.99        0.99                0.66      0.79        0.84
25                      0.99      1           1                   0.68      0.81        0.86
30                      1         1           1                   0.69      0.83        0.88

Fig. 20. Throughput of proposed method versus similarity-based approach.

From the curves in this plot, we see that with the HOSVD and cosine-based approach for clustering the high-dimensional data, the throughput is substantially higher than for multi-criteria CF based on the similarity-based approach. This is because, with the clustered approach using HOSVD and cosine-based similarity, the prediction algorithm uses only a fraction of the neighbors. The throughput of the multi-criteria recommender system increases rapidly with the number of clusters when the clusters are small. Since multi-criteria CF based on the similarity approach has to scan through all the neighbors, the number of clusters does not impact its throughput.
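The throughput gain comes from restricting the neighbor search to a single cluster, which can be sketched as follows; the cosine-based cluster assignment and the synthetic user profiles are illustrative, not the exact clustering pipeline of the proposed method.

```python
import numpy as np

def cosine_sim(a, B):
    """Cosine similarity between a single profile `a` and the rows of `B`."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)

def neighbours_in_cluster(user, users, labels, centroids, k=5):
    """Assign the user to the most similar cluster centroid, then search
    neighbours only inside that cluster instead of over all users."""
    cluster = int(np.argmax(cosine_sim(user, centroids)))
    members = np.where(labels == cluster)[0]
    sims = cosine_sim(user, users[members])
    return members[np.argsort(-sims)[:k]]

# Toy profiles: 200 users x 4 criteria-preference features, two synthetic clusters.
rng = np.random.default_rng(0)
users = np.vstack([rng.normal(11, 1, (100, 4)), rng.normal(5, 1, (100, 4))])
labels = np.repeat([0, 1], 100)
centroids = np.array([users[labels == c].mean(axis=0) for c in (0, 1)])
print(neighbours_in_cluster(rng.normal(11, 1, 4), users, labels, centroids))
```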
We also evaluated the recommendation quality using coverage measures. Coverage measures the percentage of items for which a CF system can provide a prediction, or that ever appear in a recommendation list [81]. It should be noted that a recommender system should maintain a good level of coverage so that most of the items are connected in some way to the rest of the data; otherwise they will be isolated and essentially dormant in the system.
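Coverage itself is a simple ratio, as in this short sketch (the item identifiers are hypothetical):

```python
def coverage(predictable_items, all_items):
    """Coverage: share of catalogue items for which the recommender can produce a prediction."""
    return len(set(predictable_items) & set(all_items)) / len(set(all_items))

# 3 of 4 catalogue items receive a prediction.
print(coverage(predictable_items=["m1", "m2", "m4"], all_items=["m1", "m2", "m3", "m4"]))
```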
The curves shown in Fig. 21 present the quality of the recommendations of the proposed method and reveal that the coverage is strongly related to the neighborhood size. Table 15 presents the coverage obtained from the proposed method. To experimentally show the effectiveness of clustering using HOSVD and cosine-based similarity on coverage, we also performed the experiments with the similarity-based approach, as presented in Table 15.

Fig. 21. Neighborhood size and coverage.

From Table 15, the proposed method maintains a good level of coverage in relation to the similarity-based approach for different neighborhood sizes. In addition, the results confirm that both the proposed method and the similarity-based approach have good coverage on YM-20-20.

6. Conclusion and future work


In this paper, a new method was proposed that combines HOSVD with ANFIS and subtractive clustering to improve the recommendation quality and predictive accuracy of multi-criteria CF. We proposed this method to overcome existing shortcomings such as predicting the overall ratings, sparsity, scalability, and the uncertainty induced by vagueness and imprecision in representing and reasoning about item features in multi-criteria CF.
Using HOSVD, we effectively reduced the noise of the high-dimensional data and improved the scalability problem. Also, with HOSVD, we considered all factors in the third-order tensor of users, items and criteria together to reveal the latent relationships between them. The results of applying the HOSVD method to the high-dimensional dataset helped us to obtain clusters of high quality using cosine-based similarity. In addition, tensor decomposition using HOSVD on the experimental dataset demonstrated its advantages for dimensionality reduction in more than two dimensions, providing a favorable approximation of the information. From the experiments, we observed that the proposed method using HOSVD and ANFIS achieves better recommendation accuracy than the algorithms in previous work and the methods using solely SVD or HOSVD.
The experimental results on the movie dataset clearly demonstrated the capability of ANFIS modeling, using MFs and fuzzy rules, without human expert intervention in multi-criteria CF. In addition, the ANFIS model combined with subtractive clustering was used to extract knowledge from user ratings and preferences on item features; this was done by incorporating the element of training into the existing Neuro-Fuzzy system. Furthermore, with the ANFIS training data, the rules and the MFs were properly tuned to predict the unknown overall ratings and thereby alleviate the sparsity problem, with advantages in terms of the simplicity of the algorithm and the speed of training convergence. Moreover, users' ratings on items in multi-criteria CF accumulate over time, and the fuzzy rules can be amended and maintained in a rule database for prediction tasks. The advantage of this method is its flexibility and extendibility, as it can be developed for any number of dimensions and criteria/features of the dataset.
We analysed the predictive accuracy of the proposed method on a real-world dataset in the domain of movie recommendation provided by Yahoo!Movies. We used the popular measurement metrics F1, RMSE, MAE and coverage. The proposed method was also evaluated in terms of MAE, Precision@5 and Precision@7 to be comparable with the algorithms in previous work. Our experiments confirmed that the hybrid of the HOSVD technique and ANFIS combined with subtractive clustering significantly improves predictive accuracy and recommendation quality, as evaluated by standard accuracy metrics.
For future work, we plan to investigate different classification and tensor decomposition techniques with dynamic processes in the context of multi-criteria CF, using different types of multi-criteria datasets. Future studies will also focus on further improving the quality of multi-criteria CF by incorporating sophisticated methods such as fuzzy semantic techniques and considering content-based recommendation.
Acknowledgements
The authors would like to acknowledge the support of Universiti Teknologi Malaysia (UTM) for providing financial assistance. We would like to thank Prof. Dietmar Jannach for providing us with a multi-criteria dataset for our experiment and Ehsan Shekarian for his helpful comments in revising the paper. Appreciation also goes to the editors and anonymous reviewers for their valuable comments and suggestions, which were helpful in improving the paper.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.knosys.2014.01.006.
References
[1] S.S. Anand, B. Mobasher, Intelligent techniques for web personalization, in: Proceedings of the 2003 International Conference on Intelligent Techniques for Web Personalization, Springer-Verlag, 2003, pp. 1-36.
[2] J.-M. Yang, K.F. Li, Recommendation based on rational inferences in collaborative filtering, Knowl.-Based Syst. 22 (2009) 105-114.
[3] J. Bobadilla, F. Ortega, A. Hernando, J. Alcalá, Improving collaborative filtering recommender system results and performance using genetic algorithms, Knowl.-Based Syst. 24 (2011) 1310-1316.
[4] N. Mehrbakhsh, B. Karamollah, I. Othman, A. Hamid, N.L. Ayodele, R. Nazanin, Collaborative filtering recommender systems, Res. J. Appl. Sci., Eng. Technol. 5 (2013) 4168-4182.
[5] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. 17 (2005) 734-749.
[6] M. Deshpande, G. Karypis, Item-based top-n recommendation algorithms, ACM Trans. Inform. Syst. (TOIS) 22 (2004) 143-177.
[7] G. Bordogna, G. Pasi, A flexible multi criteria information filtering model, Soft Comput. 14 (2010) 799-809.
[8] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Application of dimensionality reduction in recommender system - a case study, in: Proceedings of the ACM WebKDD Workshop, 2000.
[9] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon, J. Riedl, GroupLens: applying collaborative filtering to Usenet news, Commun. ACM 40 (1997) 77-87.
[10] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, pp. 285-295.
[11] H.J. Ahn, H. Kang, J. Lee, Selecting a small number of products for effective user profiling in collaborative filtering, Expert Syst. Appl. 37 (2010) 3055-3062.
[12] H.-N. Kim, A.-T. Ji, I. Ha, G.-S. Jo, Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation, Electron. Comm. Res. Appl. 9 (2010) 73-83.
[13] Y.-J. Park, K.-N. Chang, Individual and group behavior-based customer profile model for personalized product recommendation, Expert Syst. Appl. 36 (2009) 1932-1939.
[14] B. Jeong, J. Lee, H. Cho, An iterative semi-explicit rating method for building collaborative recommender systems, Expert Syst. Appl. 36 (2009) 6181-6186.
[15] C.-F. Tsai, C. Hung, Cluster ensembles in collaborative filtering recommendation, Appl. Soft Comput. 12 (2012) 1417-1425.
[16] G. Chen, F. Wang, C. Zhang, Collaborative filtering using orthogonal nonnegative matrix tri-factorization, Inform. Process. Manage. 45 (2009) 368-379.
[17] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inform. Syst. (TOIS) 22 (2004) 5-53.
[18] J.S. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1998, pp. 43-52.
[19] K. Goldberg, T. Roeder, D. Gupta, C. Perkins, Eigentaste: a constant time collaborative filtering algorithm, Inform. Retriev. 4 (2001) 133-151.
[20] L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, M.A. Rueda-Morales, Using second-hand information in collaborative recommender systems, Soft Comput. 14 (2010) 785-798.
[21] Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009) 30-37.
[22] P. Symeonidis, M.M. Ruxanda, A. Nanopoulos, Y. Manolopoulos, Ternary semantic analysis of social tags for personalized music recommendation, in: ISMIR, Citeseer, 2008, pp. 219-224.
[23] G. Adomavicius, Y. Kwon, New recommendation techniques for multicriteria rating systems, IEEE Intell. Syst. 22 (2007) 48-55.
[24] D. Jannach, Z. Karakaya, F. Gedikli, Accuracy improvements for multi-criteria recommender systems, in: Proceedings of the 13th ACM Conference on Electronic Commerce, ACM, 2012, pp. 674-689.
[25] G. Adomavicius, N. Manouselis, Y. Kwon, Multi-criteria recommender systems, in: Recommender Systems Handbook, Springer, 2011, pp. 769-803.
[26] V. Nourani, M. Komasi, A geomorphology-based ANFIS model for multi-station modeling of rainfall-runoff process, J. Hydrol. (2013).
[27] J. Marx-Gomez, C. Rautenstrauch, A. Nürnberger, R. Kruse, Neuro-fuzzy approach to forecast returns of scrapped products to recycling and remanufacturing, Knowl.-Based Syst. 15 (2002) 119-128.
[28] S. Sen, J. Vig, J. Riedl, Tagommenders: connecting users to items through tags, in: Proceedings of the 18th International Conference on World Wide Web, ACM, 2009, pp. 671-680.
[29] J. Lu, Q. Shambour, Y. Xu, Q. Lin, G. Zhang, A web-based personalized business partner recommendation system using fuzzy semantic techniques, Comput. Intell. 29 (2013) 37-69.
[30] Z. Zhang, H. Lin, K. Liu, D. Wu, G. Zhang, J. Lu, A hybrid fuzzy-based personalized recommender system for telecom products/services, Inform. Sci. 235 (2013) 117-129.
[31] Q. Shambour, J. Lu, A trust-semantic fusion-based recommendation approach for e-business applications, Dec. Supp. Syst. 54 (2012) 768-780.
[32] Q. Shambour, J. Lu, A hybrid trust-enhanced collaborative filtering recommendation approach for personalized government-to-business e-services, Int. J. Intell. Syst. 26 (2011) 814-843.
[33] J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, Z. Chen, CubeSVD: a novel approach to personalized Web search, in: Proceedings of the 14th International Conference on World Wide Web, ACM, 2005, pp. 382-390.
[34] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, Tag recommendations based on tensor dimensionality reduction, in: Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, 2008, pp. 43-50.
[35] Y. Xu, L. Zhang, W. Liu, Cubic analysis of social bookmarking for personalized recommendation, in: Frontiers of WWW Research and Development - APWeb 2006, Springer, 2006, pp. 733-738.
[36] M. Leginus, V. Zemaitis, Speeding Up Tensor Based Recommenders with Clustered Tag Space and Improving Quality of Recommendations with Non-Negative Tensor Factorization, Master's thesis, Aalborg University, 2011.
[37] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis, IEEE Trans. Knowl. Data Eng. 22 (2010) 179-192.
[38] Q. Li, C. Wang, G. Geng, Improving personalized services in mobile commerce by a novel multicriteria rating approach, in: WWW, 2008, pp. 1235-1236.
[39] M. Lesaffre, M. Leman, Using fuzzy logic to handle the users' semantic descriptions in a music retrieval system, in: Theoretical Advances and Applications of Fuzzy Logic and Soft Computing, Springer, 2007, pp. 89-98.
[40] Y. Cao, Y. Li, An intelligent fuzzy-based recommendation system for consumer electronic products, Expert Syst. Appl. 33 (2007) 230-240.
[41] G. Castellano, A. Fanelli, M. Torsello, A neuro-fuzzy collaborative filtering approach for web recommendation, Int. J. Comput. Sci. 1 (2007) 27-29.
[42] L.M. de Campos, J.M. Fernández-Luna, J.F. Huete, A collaborative recommender system based on probabilistic inference from fuzzy observations, Fuzzy Sets Syst. 159 (2008) 1554-1576.
[43] R.R. Yager, Fuzzy logic methods in recommender systems, Fuzzy Sets Syst. 136 (2003) 133-149.
[44] J. Carbo, J.M. Molina, Agent-based collaborative filtering based on fuzzy recommendations, Int. J. Web Eng. Technol. 1 (2004) 414-426.
[45] M.A. Pinto, R. Tanscheit, M. Vellasco, Hybrid recommendation system based on collaborative filtering and fuzzy numbers, in: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2012, IEEE, 2012, pp. 1-6.
[46] T.Y. Tang, G. McCalla, The pedagogical value of papers: a collaborative-filtering based paper recommender, J. Digital Inform. 10 (2009).
[47] N. Manouselis, C. Costopoulou, Experimental analysis of design choices in multiattribute utility collaborative filtering, Int. J. Pattern Recog. Artif. Intell. 21 (2007) 311-331.
[48] N. Sahoo, R. Krishnan, G. Duncan, J.P. Callan, Collaborative filtering with multicomponent rating for recommender systems, in: Proceedings of the Sixteenth Workshop on Information Technologies and Systems, 2006.
[49] N. Sahoo, R. Krishnan, G. Duncan, J. Callan, Research note - the halo effect in multicomponent ratings and its implications for recommender systems: the case of Yahoo! Movies, Inform. Syst. Res. 23 (2012) 231-246.
[50] L. Si, R. Jin, Flexible mixture model for collaborative filtering, in: ICML, 2003, pp. 704-711.
[51] L. Liu, N. Mehandjiev, D.-L. Xu, Multi-criteria service recommendation based on user criteria preferences, in: Proceedings of the Fifth ACM Conference on Recommender Systems, ACM, 2011, pp. 77-84.
[52] Y. Zhang, Y. Zhuang, J. Wu, L. Zhang, Applying probabilistic latent semantic analysis to multi-criteria recommender system, AI Commun. 22 (2009) 97-107.
[53] Q. Shambour, J. Lu, A hybrid multi-criteria semantic-enhanced collaborative filtering approach for personalized recommendations, in: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011, IEEE, 2011, pp. 71-78.
[54] J. Nielsen, Participation inequality: encouraging more users to contribute, Jakob Nielsen's Alertbox 9 (2006).
[55] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[56] T.-J. Chin, K. Schindler, D. Suter, Incremental kernel SVD for face recognition with image sets, in: 7th International Conference on Automatic Face and Gesture Recognition, 2006, FGR 2006, IEEE, 2006, pp. 461-466.
[57] Z. Zhang, N. Ye, Learning a tensor subspace for semi-supervised dimensionality reduction, Soft Comput. 15 (2011) 383-395.
[58] L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., Rn) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324-1342.
[59] T.G. Kolda, B.W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009) 455-500.
[60] M. Nilashi, O. Ibrahim, K. Bagherifard, J. Nasim, B. Mahdi, Application of k-nearest neighbour predictor for classifying online customer trust, J. Theoret. Appl. Inform. Technol. 36 (2011) 18-25.
[61] L.A. Zadeh, Fuzzy sets, Inform. Control 8 (1965) 338-353.
[62] M. Nilashi, N. Janahmadi, Assessing and prioritizing affecting factors in E-learning websites using AHP method and fuzzy approach, in: Information and Knowledge Management, 2012, pp. 46-61.
[63] M. Nilashi, O. Ibrahim, A model for detecting customer level intentions to purchase in B2C websites using TOPSIS and fuzzy logic rule-based system, Arab. J. Sci. Eng. (2013) 1-16.
[64] B. Cetisli, A. Barkana, Speeding up the scaled conjugate gradient algorithm and its application in neuro-fuzzy classifier training, Soft Comput. 14 (2010) 365-378.
[65] S. Petrovic-Lazarevic, K. Coghill, A. Abraham, Neuro-fuzzy modelling in support of knowledge management in social regulation of access to cigarettes by minors, Knowl.-Based Syst. 17 (2004) 57-60.
[66] M. Sugeno, Industrial Applications of Fuzzy Control, Elsevier Science Inc., 1985.
[67] M. Nilashi, K. Bagherifard, O. Ibrahim, N. Janahmadi, M. Barisami, An application expert system for evaluating effective factors on trust in B2C websites, Engineering 3 (2011) 1063-1071.
[68] M. Nilashi, M. Fathian, M.R. Gholamian, O. bin Ibrahim, A. Talebi, N. Ithnin, A comparative study of adaptive neuro fuzzy inference system (ANFIS) and fuzzy inference system (FIS) approach for trust in B2C electronic commerce websites, JCIT 6 (2011) 25-43.
[69] M. Nilashi, M. Fathian, M.R. Gholamian, O.B. Ibrahim, Propose a model for customer purchase decision in B2C websites using adaptive neuro-fuzzy inference system, Int. J. Bus. Res. Manage. (IJBRM) 1 (2011) 1-18.
[70] Q. Liang, J.M. Mendel, An introduction to type-2 TSK fuzzy logic systems, in: IEEE International Fuzzy Systems Conference Proceedings, 1999, FUZZ-IEEE'99, IEEE, 1999, pp. 1534-1539.
[71] S.L. Chiu, Fuzzy model identification based on cluster estimation, J. Intell. Fuzzy Syst. 2 (1994) 267-278.
[72] A. Bouchachia, W. Pedrycz, Enhancement of fuzzy clustering by mechanisms of partial supervision, Fuzzy Sets Syst. 157 (2006) 1733-1759.
[73] M.T. Hayajneh, A.M. Hassan, F. Al-Wedyan, Monitoring defects of ceramic tiles using fuzzy subtractive clustering-based system identification method, Soft Comput. 14 (2010) 615-626.
[74] A. Bilge, H. Polat, A comparison of clustering-based privacy-preserving collaborative filtering schemes, Appl. Soft Comput. 13 (2013) 2478-2489.
[75] A. Bilge, H. Polat, A scalable privacy-preserving recommendation scheme via bisecting k-means clustering, Inform. Process. Manage. 49 (2013) 912-927.
[76] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 2009.
[77] G. Hamerly, G. Speegle, Efficient model selection for large-scale nearest-neighbor data mining, in: Data Security and Security Data, Springer, 2012, pp. 37-54.
[78] A. Gunawardana, C. Meek, A unified approach to building hybrid recommender systems, in: Proceedings of the Third ACM Conference on Recommender Systems, ACM, 2009, pp. 117-124.
[79] D. Billsus, M.J. Pazzani, User modeling for adaptive news access, User Model. User-Adap. Interact. 10 (2000) 147-180.
[80] K. Bagherifard, M. Nilashi, O. Ibrahim, N. Ithnin, L.A. Nojeem, Measuring semantic similarity in grids using ontology, Int. J. Innov. Appl. Stud. 2 (2013) 230-237.
[81] J.L. Herlocker, J.A. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, in: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 1999, pp. 230-237.
