
A Recommendation Algorithm Based on Weighted Slope One Algorithm and User-Based Collaborative Filtering


WANG Panpan, QIAN Qian, SHANG Zhenhong, LI Jingsong
Yunnan Key Laboratory of Computer Technology Applications, Kunming University of Science and Technology, Kunming 650504, China
E-mail: qianqian_yn@126.com

Abstract: Personalized recommendation is one of the most popular marketing methods, and collaborative filtering is one of the most successful recommendation technologies. However, the data sparsity problem leads to low prediction accuracy and poor recommendation quality. To resolve this problem, this study proposes an improved recommendation method based on the weighted Slope one algorithm. The method calculates the similarity between users from their ratings to find each user's nearest neighbors. Based on the nearest neighbors' ratings, the weighted Slope one algorithm predicts the unknown ratings of the target user and produces the recommendation results. In the experiment, the MovieLens data set was used to test the recommendation accuracy of the method. The experimental results suggest that the improved algorithm effectively improves both rating-prediction accuracy and recommendation performance.
Key Words: Personalized Recommendation, Similarity, Collaborative Filtering, Weighted Slope One Algorithm

1 INTRODUCTION

With the development of the Internet and e-commerce, online shopping has become a popular way of consumption, and more and more people rely on a range of online services. However, the number of commodities keeps growing. To help people find their favorite products among a large number of items, recommendation systems have been developed to provide the items that users are interested in. The core task of a recommendation system is to recommend matching products to users. Such products should be personalized, affordable, and closely matched to the user. These requirements pose many challenges for both technology development and psychology research. Existing recommendation technologies provide solutions for different areas (such as e-commerce sites, social networking sites, financial products, and real estate). Real-world examples include MovieFinder, Amazon, CDNow, Dangdang, and Douban. Personalized recommendation systems have good prospects, but they also face problems such as data sparsity, cold start, and scalability [1][2]. With further research, these problems have been partly resolved, and satisfactory recommendations have been produced for movies, music, and other fields.

Current recommendation algorithms are mainly divided into four kinds: content-based, knowledge-based, collaborative filtering, and hybrid algorithms. Among them, collaborative filtering is one of the most widely used recommendation techniques. Slope one is a simple collaborative filtering algorithm based on a linear regression model, and it uses rating deviations to predict unrated items. Daniel Lemire and Anna Maclachlan proposed the Slope one algorithm in 2005 [9]. It has attracted much attention because it is easy to understand and implement. However, when data are sparse, Slope one performs no better than traditional collaborative filtering algorithms, so it does not resolve the data sparsity problem. Many improved Slope one algorithms have been proposed to obtain higher recommendation accuracy:
- Pu Wang and HongWu Ye [1] used the Slope one scheme to fill the blank ratings of the user-item rating matrix. They then applied user-based collaborative filtering to make recommendations.
- DeJia Zhang [2] proposed an item-based collaborative filtering recommendation algorithm using Slope one scheme smoothing. This algorithm uses items' neighbors to predict the ratings of unrated items.
- Hua Chai and JianYi Liu [3] proposed an improved Slope one algorithm to address poor recommendation quality. Their algorithm uses item similarity and singular value decomposition to make the results more reasonable.
- MaoKang Du, Miao Liu, ShaoHua Li, and Qin Pu [4] proposed a collaborative filtering algorithm that combines k-nearest items with Slope one.

The studies above all build on the basic Slope one algorithm and make predictions from the nearest items. However, different users have different nearest neighbors, and those neighbors have different preferences for the items. This paper proposes an improved Slope one algorithm grounded in user-based collaborative filtering. The algorithm works in three stages:
1. It finds each user's nearest neighbors from the similarity between users.
2. It determines an appropriate number of neighbors based on user-based collaborative filtering.
3. It applies the weighted Slope one algorithm within the user's nearest neighbors to predict the empty ratings and generate recommendations.

This research is supported by the National Science Foundation of China (61462053, 61462052, and 31300938).

978-1-4673-9714-8/16/$31.00 ©2016 IEEE 2431
2 THE SIMILARITY BETWEEN USERS

To find a user's neighbors, this study uses collaborative filtering to calculate the similarity between users and to find the nearest-neighbor set. Collaborative filtering is mainly classified into two types: model-based and memory-based collaborative filtering. The latter includes user-based and item-based collaborative filtering. The main idea of user-based collaborative filtering is to use historical data to find users whose behavior is similar to that of the target user. The behavior of these similar users is then recommended to the target user. The first step of the algorithm is therefore to calculate the similarities among users. Recommendations are then made to the target user, and the recommended items should be items that the target user has not rated. The aim of the similarity calculation is to find similar users and to build the neighbor set from the similarity values. The similarity is particularly important because it affects the quality of recommendation. The specific similarity calculations are introduced below.

The fundamental methods for calculating the similarity between users from their ratings are cosine correlation, adjusted cosine vector similarity, and Pearson correlation. First, the user rating matrix $R(m \times n)$ is defined, where $m$ denotes the number of users and $n$ the number of items. $R_{i,j}$ is the rating of item $j$ by user $i$. $I_{i,j}$ is the set of items rated by both user $i$ and user $j$, that is, $I_{i,j} = I_i \cap I_j$.

1) Cosine correlation
Each user's ratings are treated as a vector in the $n$-dimensional item space, and user $i$ and user $j$ are represented by the vectors $\vec{i}$ and $\vec{j}$. If a user has not rated an item, that rating is defined as zero. The similarity between user $i$ and user $j$ is then:

$$ \mathrm{Sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|_2 \cdot \|\vec{j}\|_2} \tag{1} $$

where the numerator is the inner product of the two users' rating vectors and the denominator is the product of the norms of those vectors.

2) Adjusted cosine vector similarity
Let $I_i$ be the set of items rated by user $i$ and $I_j$ the set of items rated by user $j$. The formula is:

$$ \mathrm{Sim}(i,j) = \frac{\sum_{m \in I_{i,j}} (R_{i,m} - \bar{R}_i)(R_{j,m} - \bar{R}_j)}{\sqrt{\sum_{m \in I_i} (R_{i,m} - \bar{R}_i)^2}\,\sqrt{\sum_{m \in I_j} (R_{j,m} - \bar{R}_j)^2}} \tag{2} $$

where $\bar{R}_i$ is the average rating of user $i$ over all the co-rated items, and $\bar{R}_j$ is the average rating of user $j$ over all the co-rated items.

3) Pearson correlation

$$ \mathrm{Sim}(i,j) = \frac{\sum_{m \in I_{i,j}} (R_{i,m} - \bar{R}_i)(R_{j,m} - \bar{R}_j)}{\sqrt{\sum_{m \in I_{i,j}} (R_{i,m} - \bar{R}_i)^2}\,\sqrt{\sum_{m \in I_{i,j}} (R_{j,m} - \bar{R}_j)^2}} \tag{3} $$

where $\bar{R}_i$ is the average rating of user $i$ on the item set $I_{i,j}$, and $\bar{R}_j$ is the average rating of user $j$ on the item set $I_{i,j}$.

Cosine correlation is simple to compute, but when the data are sparse it is not a good way to find the nearest neighbors. Pearson correlation can achieve better results than the other methods [11], and many recommendation systems use it. In this research, the similarities among users are calculated with Pearson correlation.

3 SLOPE ONE ALGORITHM

Slope one is a typical item-based collaborative filtering algorithm. It uses two kinds of information: ratings from other users who rated the same items, and ratings of other items by the same user. Its predictions are based on the rating differences between items. The basic Slope one algorithm [5][6][7] is derived from a simple linear model $y = f(x) = x + b$, where $x$ is a known rating and the constant $b$ represents the rating difference between items. For example, the ratings of five items by user1, user2, user3, and user4 are shown in Table 1 ("-" marks a missing rating):

Table 1. Ratings for five items

user \ item   item1   item2   item3   item4   item5
user1           5       1       1       -       3
user2           4       2       3       2       2
user3           4       3       5       3       -
user4           -       3       2       -       ?

Using Table 1, we want to predict how user4 would rate item5. First, we calculate the average rating differences between item5 and the items user4 has rated (item2 and item3). These averages are taken over the users who rated both items (user1 and user2): ((3 − 1) + (2 − 2)) / 2 = 1 and ((3 − 1) + (2 − 3)) / 2 = 0.5. The predicted rating is then ((3 + 1) + (2 + 0.5)) / 2 = 3.25. This value fills the vacant rating of user4 for item5.

Given the training set $\chi$, the average difference between item $i$ and item $j$ is defined as $\mathrm{dev}_{j,i}$:

$$ \mathrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{\mathrm{card}(S_{j,i}(\chi))} \tag{4} $$

where:
- $u_j$ is the rating of user $u$ for item $j$, and $u_i$ is the rating of user $u$ for item $i$;
- $S_{j,i}(\chi)$ is the set of users who have rated both item $i$ and item $j$;
- $\mathrm{card}(\cdot)$ is the number of elements in a set.
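The Table 1 example and formula (4) can be checked with a short Python sketch. It is a minimal illustration, not the authors' implementation. The dictionary layout and the function names `deviation` and `slope_one_predict` are hypothetical choices made here. The positions of the missing entries in Table 1 are inferred from the arithmetic of the worked example.

```python
# Ratings stored sparsely: user -> {item: rating}; absent items are unrated.
TABLE1 = {
    "user1": {"item1": 5, "item2": 1, "item3": 1, "item5": 3},
    "user2": {"item1": 4, "item2": 2, "item3": 3, "item4": 2, "item5": 2},
    "user3": {"item1": 4, "item2": 3, "item3": 5, "item4": 3},
    "user4": {"item2": 3, "item3": 2},  # item5 is the rating to predict
}


def deviation(ratings, j, i):
    """Formula (4): mean of u_j - u_i over S_ji, the users who rated both items.

    Returns (dev_ji, card(S_ji)); dev_ji is None when S_ji is empty.
    """
    diffs = [r[j] - r[i] for r in ratings.values() if i in r and j in r]
    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)


def slope_one_predict(ratings, user, j):
    """Average dev_ji + u_i over the items i (i != j) rated by `user` that
    share at least one co-rating user with item j; None if there are none."""
    predictions = []
    for i, u_i in ratings[user].items():
        if i == j:
            continue
        dev, count = deviation(ratings, j, i)
        if count > 0:  # skip items with no co-rating users
            predictions.append(dev + u_i)
    if not predictions:
        return None
    return sum(predictions) / len(predictions)


print(deviation(TABLE1, "item5", "item2"))          # (1.0, 2)
print(deviation(TABLE1, "item5", "item3"))          # (0.5, 2)
print(slope_one_predict(TABLE1, "user4", "item5"))  # 3.25
```

Each deviation is returned together with its count $\mathrm{card}(S_{j,i}(\chi))$, so the same helper can also support a count-weighted average.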

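The first two similarity measures of Section 2 can be sketched in Python as follows. Each user's ratings are held in a dict mapping item IDs to ratings. That representation and the function names are assumptions of this sketch, not part of the paper. Under formula (1), unrated items count as zero, so only co-rated items contribute to the dot product. Under formula (2) as written in the text, the means are taken over the co-rated items, while the two denominator sums run over each user's full item set.

```python
import math

# Each user's ratings: {item: rating}; items missing from the dict are unrated.


def cosine_sim(ri, rj):
    """Formula (1): cosine of the two rating vectors, unrated items taken as 0.

    With zeros for unrated items, only co-rated items add to the dot product,
    while each norm covers every item that user rated.
    """
    dot = sum(r * rj[m] for m, r in ri.items() if m in rj)
    norm_i = math.sqrt(sum(r * r for r in ri.values()))
    norm_j = math.sqrt(sum(r * r for r in rj.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def adjusted_cosine_sim(ri, rj):
    """Formula (2) as stated in the text: means over the co-rated items I_ij,
    numerator summed over I_ij, denominator sums over I_i and I_j."""
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri[m] for m in common) / len(common)
    mean_j = sum(rj[m] for m in common) / len(common)
    num = sum((ri[m] - mean_i) * (rj[m] - mean_j) for m in common)
    den_i = math.sqrt(sum((r - mean_i) ** 2 for r in ri.values()))
    den_j = math.sqrt(sum((r - mean_j) ** 2 for r in rj.values()))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)


print(cosine_sim({"x": 3, "y": 4}, {"x": 3, "y": 4}))  # 1.0
```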
2432 2016 28th Chinese Control and Decision Conference (CCDC)
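Pearson correlation (formula (3)) is the measure this method uses. It can be sketched in the same way, together with selecting the k most similar users as the nearest-neighbor set. Both helpers, `pearson_sim` and `nearest_neighbors`, are hypothetical illustrations, not the authors' code. Returning 0.0 when two users share no rated items follows the paper's rule that such users have zero similarity. Breaking ties by input order is an arbitrary choice made here.

```python
import math


def pearson_sim(ri, rj):
    """Formula (3): Pearson correlation over the co-rated items I_ij.

    Means and both denominator sums use only I_ij. Returns 0.0 when the
    users share no rated items or either user's co-ratings are constant.
    """
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    mean_i = sum(ri[m] for m in common) / len(common)
    mean_j = sum(rj[m] for m in common) / len(common)
    num = sum((ri[m] - mean_i) * (rj[m] - mean_j) for m in common)
    den = math.sqrt(sum((ri[m] - mean_i) ** 2 for m in common)) * math.sqrt(
        sum((rj[m] - mean_j) ** 2 for m in common)
    )
    return num / den if den > 0 else 0.0


def nearest_neighbors(ratings, target, k):
    """Return the k users most similar to `target` by pearson_sim.

    `ratings` maps user -> {item: rating}. The sort is stable, so ties keep
    dict order.
    """
    scored = [(pearson_sim(ratings[target], r), v)
              for v, r in ratings.items() if v != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in scored[:k]]


users = {
    "a": {"x": 1, "y": 2, "z": 3},
    "b": {"x": 2, "y": 4, "z": 6},  # same trend as a
    "c": {"x": 3, "y": 2, "z": 1},  # opposite trend
}
print(nearest_neighbors(users, "a", 1))  # ['b']
```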


Finally, the algorithm uses $\mathrm{dev}_{j,i} + u_i$ as a prediction of the rating of item $j$. Averaging over all possible predictions gives:

$$ P(u)_j = \frac{1}{\mathrm{card}(R_j)} \sum_{i \in R_j} (\mathrm{dev}_{j,i} + u_i) \tag{5} $$

where $P(u)_j$ is the predicted rating of item $j$, and $R_j$ is the set of items $i$ that satisfy three conditions: user $u$ has rated $i$, $i \neq j$, and $S_{j,i}(\chi)$ is non-empty. $\mathrm{card}(R_j)$ is the number of such items.

When computing the average deviation $\mathrm{dev}_{j,i}$ between item $i$ and item $j$, the basic Slope one algorithm ignores how many users each deviation is based on. In other words, the deviations are not equally reliable. For instance, suppose 1000 users rated both item $j$ and item $k$, whereas only 10 users rated both item $j$ and item $l$. Then $\mathrm{dev}_{j,k}$ is likely to be more precise than $\mathrm{dev}_{j,l}$. Daniel Lemire defined the weighted Slope one algorithm [9] as follows:

$$ P^{wS1}(u)_j = \frac{\sum_{i \in S(u) - \{j\}} (\mathrm{dev}_{j,i} + u_i)\, c_{j,i}}{\sum_{i \in S(u) - \{j\}} c_{j,i}} \tag{6} $$

Here $P^{wS1}(u)_j$ is the predicted rating of the unrated item $j$, and $S(u)$ is the set of items rated by user $u$. The weight is $c_{j,i} = \mathrm{card}(S_{j,i}(\chi))$.

4 WEIGHTED SLOPE ONE AND USER-BASED COLLABORATIVE FILTERING

Although the Slope one algorithm performs well, it does not take the differences between users into account. To solve this problem, this paper combines the weighted Slope one algorithm with user-based collaborative filtering. A key step of the algorithm is calculating the similarities between users. Pearson correlation is used to calculate these similarities and to select the nearest neighbors of the target user. The weighted Slope one algorithm then predicts the ratings of the unrated items and produces the recommendation results. The number of neighbors is denoted by $k$, whose value is determined experimentally, and the nearest-neighbor set is denoted by $S(k)$. The prediction formula is:

$$ P(u)_j = \frac{\sum_{i \in S(k)} (\mathrm{dev}_{j,i} + u_i)\, c_{j,i}}{\sum_{i \in S(k)} c_{j,i}} \tag{7} $$

To achieve higher accuracy, the algorithm predicts the missing ratings from the nearest-neighbor set, which is chosen according to the similarity between users. The algorithm is as follows:

Input: the training data set, the rating matrix $R(m \times n)$, the target user $u$, and the target item $j$.
Output: the predicted rating $P(u)_j$.
1) Find the users who have rated some of the same items as the target user, and use formula (3) to calculate their similarities to the target user. If a user has no items rated in common with the target user, their similarity is set to zero.
2) Add the $k$ users with the highest similarities to the target user's nearest-neighbor set $S(k)$.
3) Use formula (7) to calculate the ratings $P(u)_j$ of the items that the target user has not rated. If an item has not been rated by any neighbor, its predicted rating is replaced by the average of the target user's ratings.
4) Recommend the top $N$ ($N = 10$) items according to the ratings calculated above.

5 EXPERIMENTAL EVALUATION

5.1 Data Set
The MovieLens data set [8] was used in this experiment. This widely used data set consists of 100,000 ratings from 943 users on 1682 movies, and each user rated at least 20 movies. Ratings range from 1 to 5 and express how much users like the movies: 4 and 5 are positive ratings, and 1 and 2 are negative ratings. The density of the data set is

$$ \frac{100000}{943 \times 1682} \times 100\% \approx 6.3\% , $$

so its sparsity level is $1 - 6.3\% = 93.7\%$. The data set is therefore very sparse.

The data set was divided into training and test sets using 5-fold cross-validation. It was split into five disjoint subsets, giving a training/test ratio of 0.8/0.2. The training set contains the historical rating records, and recommendations for the users in the test set are made from it.

5.2 Metrics
The metrics for evaluating the accuracy of prediction algorithms fall into two main categories [9][10][11][12]: decision-support metrics and statistical accuracy metrics. In this study, a statistical accuracy metric, the Mean Absolute Error (MAE), was used to measure the prediction performance of the proposed algorithm. The MAE is computed from the differences between the users' predicted and actual ratings:

$$ \mathrm{MAE} = \frac{\sum_{i=1}^{n} |P_i - R_i|}{n} \tag{8} $$

where $P_i$ is the predicted rating, $R_i$ is the actual rating in the test set, and $n$ is the total number of ratings in the test set. The smaller the MAE, the better the prediction accuracy.

5.3 Selecting the Number of Neighbors
In the proposed algorithm, the size of the neighbor set [13] affects recommendation quality. The number of neighbors is denoted by $k$. To select an appropriate value of $k$, the rating-prediction accuracy was measured for $k$ = 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50. Each value of $k$ was tested five times, and the average of the five results was used as the final result. The results are shown in Figure 1.

Fig 1. Comparison of MAE for different numbers of neighbors

As Figure 1 shows, the MAE first decreases as the number of neighbors increases, then rises once $k$ exceeds 10. That is, prediction accuracy first improves and then declines for $k > 10$. Therefore, $k = 10$ was selected as the best value.

5.4 Prediction Results
Finally, the MAE of the improved weighted Slope one algorithm was compared with that of the weighted Slope one algorithm and of the user-based collaborative filtering algorithm, as shown in Figure 2.

Fig 2. MAE of different algorithms

With $k = 10$, the MAE of the improved algorithm is lower than that of the basic weighted Slope one algorithm. The proposed algorithm alleviates the data sparsity problem. The experimental results show that the improved Slope one algorithm is more accurate than the weighted Slope one algorithm.

6 CONCLUSIONS

The traditional Slope one algorithm is less accurate because it does not take the similarities between users into consideration. To improve recommendation accuracy, this study proposes an improved weighted Slope one algorithm based on the similarity between users. The most similar users are chosen as neighbors, and the weighted Slope one algorithm is then used to fill in the missing ratings. If a user has no neighbors ($k = 0$), the ratings of the unrated items are replaced by the average ratings. The experimental results show that the improved weighted Slope one algorithm is more accurate than the weighted Slope one algorithm. Satisfactory results can also be achieved when the data are sparse.

REFERENCES
[1] P. Wang, H.W. Ye, A personalized recommendation algorithm combining slope one scheme and user based collaborative filtering, Proceedings of International Conference on Industrial and Information Systems, 152-154, 2009.
[2] D.J. Zhang, An item-based collaborative filtering recommendation algorithm using slope one scheme smoothing, Proceedings of 2nd International Symposium on Electronic Commerce and Security, IEEE Computer Society Press, 215-217, 2009.
[3] H. Chai, J.Y. Liu, Research on improved Slope one recommendation algorithm, Netinfo Security, Vol.2, 77-81, 2015.
[4] M.K. Du, M. Liu, S.H. Li, Q. Pu, Slope one collaborative filtering recommendation algorithm based on neighbor, Journal of Chongqing University of Posts and Telecommunications, Vol.26, No.3, 421-426, 2014.
[5] T.Q. Jiang, W. Lu, H.T. Xiong, Personalized collaborative filtering based on improved slope one algorithm, Proceedings of International Conference on Systems and Informatics, 2312-2315, 2012.
[6] C. Lv, Weighted item deviation Slope One collaborative filtering recommendation algorithm based on the user similarities, Journal of Nanchang University, Vol.38, No.4, 342-347, 2014.
[7] D.J. Lin, X.W. Meng, Slope one algorithm based on single value decomposition, New Industrialization Strategy, Vol.2, No.11, 12-17, 2012.
[8] J. Chen, Y. Pan, Z.H. Zhang, F. Pan, Slope One Model and Algorithm Based on Real-time User Behavior, Operations Research and Management Science, Vol.24, No.1, 89-92, 2015.
[9] D. Lemire, A. Maclachlan, Slope one predictors for online rating-based collaborative filtering, Proceedings of SIAM Data Mining Conference, 2005.
[10] Y. Wang, H.Y. Lou, Improved slope one algorithm for collaborative filtering, Computer Science, Vol.38, No.10A, 192-194, 2011.
[11] M.W. Wang, H.L. Tao, X.Y. Xiong, A collaborative filtering recommendation algorithm based on iterative bidirectional clustering, Journal of Chinese Information Processing, Vol.22, No.4, 61-65, 2008.
[12] J.J. Wang, K.H. Lin, J. Li, A collaborative filtering recommendation algorithm based on user clustering and Slope one scheme, Proceedings of 8th Computer Science & Education, 1473-1476, 2013.
[13] C.G. Huang, J. Yin, J. Wang, Uncertain neighbors' collaborative filtering recommendation algorithm, Chinese Journal of Computers, Vol.33, No.8, 1369-1377, 2010.
