Bachelor of Engineering
in
Information Technology
by
Harshita Krishna (BE/10731/2013)
Apoorva Rastogi (BE/10580/2013)
Anisha Dutta (BE/10329/2013)
Supervised by:
May 2017
DECLARATION CERTIFICATE
This is to certify that the work presented in this project entitled “Recommender
Systems”, in partial fulfillment of the requirement for the award of Degree of
Bachelor of Engineering in Information Technology, submitted to the
Department of Computer Science and Engineering of Birla Institute of
Technology, Mesra, Ranchi, Jharkhand is a bona fide work carried out by Harshita
Krishna, Apoorva Rastogi and Anisha Dutta under my supervision and
guidance.
To the best of my knowledge, the content of this project, either partially or fully,
has not been submitted to any other institution for the award of any other degree.
Head, Department of CSE, BIT Mesra
Dean, Undergraduate Studies, BIT Mesra
CERTIFICATE OF APPROVAL
ACKNOWLEDGEMENT
We owe our deepest gratitude to our advisor Dr. Abhijit Mustafi, for his constant
support and motivation, despite his extremely busy schedule. Our interactions with
him always resulted in new ideas and proved beneficial towards our work. Without
his constant presence and supervision our work would not have been successful.
We are very grateful to Dr. Sandip Dutta, Head of the Department, Department
of Computer Science and Engineering, Birla Institute of Technology, Mesra
for extending all the facilities at all times for pursuing this course.
Lastly, we thank our batch mates, who have always been there with valuable
suggestions and support in our endeavors.
Harshita Krishna
Apoorva Rastogi
Anisha Dutta
ABSTRACT
Recommender systems are information filtering tools that predict the ratings
users would give to items, typically from large volumes of data, in order to
recommend items the users are likely to enjoy. Movie recommendation helps users
discover movies they would like to watch based on the behavior of similar users.
It also helps users expand their horizons by recommending movies from unexplored
genres that similar users have liked. This makes a recommender system a
significant part of websites and e-commerce applications.
This project focuses on movie recommendation. Its primary objective is to build
a recommender system by treating the sparse movie ratings matrix as a matrix
completion problem. The first version approaches the matrix completion problem
with a statistical method. The second version uses a more advanced technique,
non-negative matrix factorization. Both versions are run on the MovieLens
dataset, which is freely available on the internet. The two versions are then
compared, and the results are analyzed and interpreted. Mean squared error is
used as the evaluation metric to give a quantitative measure of the accuracy of
the two versions.
TABLE OF CONTENTS
S. No. Title
1 Introduction
3 Real-Life Examples
6 MovieLens Dataset
7 Project Objective
10 Conclusion
11 References
INTRODUCTION
The increasing importance of the Web as a medium for electronic and business
transactions has served as a driving force for the development of recommender
systems technology. An important catalyst in this regard is the ease with which the
Web enables users to provide feedback about their likes or dislikes. For example,
consider a scenario of a content provider such as Netflix. In such cases, users are
able to easily provide feedback with a simple click of a mouse. A typical
methodology to provide feedback is in the form of ratings, in which users select
numerical values from a specific evaluation system (e.g., five-star rating system)
that specify their likes and dislikes of various items.
Other forms of feedback are not quite as explicit but are even easier to collect in
the Web-centric paradigm. For example, the simple act of a user buying or
browsing an item may be viewed as an endorsement for that item. Such forms of
feedback are commonly used by online merchants such as Amazon.com, and the
collection of this type of data is completely effortless in terms of the work required
of a customer. The basic idea of recommender systems is to utilize these various
sources of data to infer customer interests.
The entity to which the recommendation is provided is referred to as the user, and
the product being recommended is referred to as an item. Therefore,
recommendation analysis is often based on the previous interaction between users
and items, because past interests and proclivities are often good indicators of future
choices.
So, what is the basic principle that underlies the working of recommendation
algorithms?
The basic principle of recommendations is that significant dependencies exist
between user and item-centric activity. For example, a user who is interested in a
historical documentary is more likely to be interested in another historical
documentary or an educational program, rather than in an action movie. In many
cases, various categories of items may show significant correlations, which can be
leveraged to make more accurate recommendations. Alternatively, the
dependencies may be present at the finer granularity of individual items rather than
categories. These dependencies can be learned in a data-driven manner from the
ratings matrix, and the resulting model is used to make predictions for target users.
GOALS OF RECOMMENDER SYSTEMS
Novelty: Recommender systems are truly helpful when the recommended item is
something that the user has not seen in the past. For example, popular movies of a
preferred genre would rarely be novel to the user. Repeated recommendation of
popular items can also lead to reduction in sales diversity.
Netflix was founded as a mail-order digital video disc (DVD) rental company for
movies and television shows, and eventually expanded to streaming
delivery. At the present time, the primary business of Netflix is that of providing
streaming delivery of movies and television shows on a subscription basis. Netflix
provides users the ability to rate the movies and television shows on a 5-point
scale. Furthermore, the user actions in terms of watching various items are also
stored by Netflix. These ratings and actions are then used by Netflix to make
recommendations.
Collaborative Filtering
Content-based filtering systems are mostly used with text documents as the
information source. A standard approach to term parsing selects single words from
documents. The vector space model and latent semantic indexing are two methods
that use these terms to represent documents as vectors in a multidimensional
space.
Relevance feedback, genetic algorithms, neural networks, and the Bayesian
classifier are among the learning techniques for learning a user profile. The vector
space model and latent semantic indexing can both be used by these learning
methods to represent documents. Some of the learning methods also represent the
user profile as one or more vectors in the same multidimensional space, which
makes it easy to compare documents and profiles. Other learning methods such as
the Bayesian classifier and neural networks do not use this space but represent the
user profile in their own way.
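As a rough illustration of the vector space idea described above, the sketch below represents documents and a user profile as term-frequency vectors and compares them with the cosine measure. It is a minimal sketch under stated assumptions: the function names, the whitespace tokenizer, the raw term counts (no weighting) and the sample texts are all hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a sparse bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical item descriptions, invented for illustration only.
docs = {
    "doc_a": "historical documentary about ancient rome",
    "doc_b": "action movie with car chases",
}

# Assumed profile model: the summed term vector of texts the user liked.
liked = ["documentary on roman history", "ancient history documentary"]
profile = Counter()
for text in liked:
    profile.update(term_vector(text))

# Rank documents by similarity to the profile, most similar first.
ranked = sorted(
    docs,
    key=lambda name: cosine(profile, term_vector(docs[name])),
    reverse=True,
)
```

Because the profile and the documents live in the same term space, ranking reduces to a single similarity computation per document. Real systems usually add term weighting and stop-word removal, which this sketch omits.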
As previously detailed, Pandora Radio is a popular example of a content-based
recommender system that plays music with similar characteristics to that of a song
provided by the user as an initial seed. There are also a large number of content-
based recommender systems aimed at providing movie recommendations; a few
such examples include Rotten Tomatoes, Internet Movie Database, Jinni, Rovi
Corporation, and Jaman. Document-related recommender systems aim at providing
document recommendations to knowledge workers. Public health professionals
have been studying recommender systems to personalize health education and
preventative strategies.
Recommendation systems are great for discovery. For example, the "Genius
Recommendations" feature of iTunes and the "Frequently Bought Together"
feature of Amazon.com make surprising recommendations that are similar to
what we already like. The "Touching the Void" and "Into Thin Air" example
discussed in class is a good illustration.
Neither the University of Minnesota nor any of the researchers involved can
guarantee the correctness of the data, its suitability for any particular purpose, or
the validity of results based on the use of the data set. The data set may be used for
any research purposes under the following conditions:
* The user may not state or imply any endorsement from the University of
Minnesota or the GroupLens Research Group.
* The user must acknowledge the use of the data set in publications resulting from
the use of the data set.
* The user may not redistribute the data without separate permission.
* The user may not use this information for any commercial or revenue-bearing
purposes without first obtaining permission from a faculty member of the
GroupLens Research Project at the University of Minnesota.
PROJECT OBJECTIVE
PROCESS FLOW:
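User similarity is calculated between the target user and all the other
users. Similarity is calculated using cosine similarity:
$$\mathrm{sim}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
where A and B are the rating vectors of the two users.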
The iterative function returns the K nearest neighbors and their respective
similarities with the target user.
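The similarity and nearest-neighbor steps above can be sketched as follows. This is only an illustration, not the project's implementation: it assumes the ratings are stored as a list of per-user rows with 0 marking an unrated movie, and the function names and the toy matrix are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest_neighbors(ratings, target, k):
    """Return the k (user, similarity) pairs most similar to the target user."""
    sims = [
        (user, cosine_similarity(ratings[target], row))
        for user, row in enumerate(ratings)
        if user != target
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


# Toy user x movie matrix (rows are users); the values are made up.
R = [
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [1, 0, 5, 4],
    [5, 5, 1, 0],
]

neighbors = k_nearest_neighbors(R, target=0, k=2)
```

Treating unrated movies as 0 is a simplifying assumption. It lowers the similarity of users who have few movies in common, which may or may not be the behavior wanted in practice.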
Having found the users who are most similar to our target user, we predict the
target user's ratings for the movies those neighbors have watched using the formula:
where sim(a,i) = similarity between target user ‘a’ and the nearest
neighbor ‘i’ in the set NS_a (the set of nearest neighbors of ‘a’)
r_a = mean rating of target user ‘a’
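The prediction formula itself is not reproduced in the text. A commonly used mean-centered form that is consistent with the symbols defined above is shown here as an assumption, not as the authors' exact expression. In it, $p_{a,j}$ is the predicted rating of target user a for movie j, $\bar{r}_a$ is the target user's mean rating defined above, and $r_{i,j}$ and $\bar{r}_i$ (symbols introduced here) are neighbor i's rating of movie j and neighbor i's mean rating.

```latex
p_{a,j} = \bar{r}_a +
  \frac{\sum_{i \in NS_a} \mathrm{sim}(a,i)\,\bigl(r_{i,j} - \bar{r}_i\bigr)}
       {\sum_{i \in NS_a} \lvert \mathrm{sim}(a,i) \rvert}
```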
We therefore recommend the movies whose predicted rating exceeds a set threshold (here, 3).
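The prediction and thresholding steps might be sketched as below. The sketch assumes a mean-centered, similarity-weighted average over the nearest neighbors, which is a common choice but is not confirmed by the text. The function names, the toy ratings and the neighbor similarities are hypothetical, and 0 is assumed to mark an unrated movie.

```python
def mean_rating(row):
    """Mean of the ratings a user has actually given (0 means unrated)."""
    rated = [r for r in row if r > 0]
    return sum(rated) / len(rated) if rated else 0.0


def predict_rating(ratings, target, movie, neighbors):
    """Assumed mean-centered weighted average over (user, similarity) pairs."""
    num = 0.0
    den = 0.0
    for user, sim in neighbors:
        r = ratings[user][movie]
        if r > 0:  # only neighbors who rated this movie contribute
            num += sim * (r - mean_rating(ratings[user]))
            den += abs(sim)
    base = mean_rating(ratings[target])
    return base if den == 0 else base + num / den


def recommend(ratings, target, neighbors, threshold=3.0):
    """Unwatched movies whose predicted rating exceeds the threshold."""
    recs = []
    for movie, r in enumerate(ratings[target]):
        if r == 0:
            predicted = predict_rating(ratings, target, movie, neighbors)
            if predicted > threshold:
                recs.append((movie, predicted))
    return sorted(recs, key=lambda pair: pair[1], reverse=True)


# Toy data: row 0 is the target user; the similarities are made up.
R = [
    [5, 4, 0, 0],
    [5, 5, 4, 1],
    [4, 5, 5, 2],
]
neighbors = [(1, 0.9), (2, 0.8)]

recs = recommend(R, target=0, neighbors=neighbors, threshold=3.0)
```

Note that a mean-centered prediction can fall outside the rating scale, so a real system would probably clamp it to the valid range.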
The list of movies that the user (userID=731) has already watched is as
follows:
The recommended movies and their predicted ratings for our target user
(userID=731) are as follows:
The relationship between mean square error (predicted rating and mean rating) and
K is shown in the given plot:
The list of movies that the user (userID=122) has already watched is as
follows:
The recommended movies and their predicted ratings for our target user
(userID=122) are as follows:
The relationship between mean square error (predicted rating and mean
rating) and K is shown in the given plot:
PHASE 2:
NON-NEGATIVE MATRIX FACTORIZATION
NMF ALGORITHM
NEEDED COMPUTATIONS
Compute the predicted element for each user-movie pair (the dot product of a row
of P and a column of Q):
Compute the squared error for each user-movie pair (needed to compute the
gradient):
Find the gradient (the slope of the error curve) by differentiating the squared
error with respect to each element of P and Q.
Update each element of P and Q using a learning rate, α, which determines
how far to travel along the gradient. α is generally small because a step
size that is too large could overshoot the minimum.
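The equations for these four steps are not reproduced in the text. Under the usual gradient-descent formulation of matrix factorization they would read roughly as follows. The index names are assumptions introduced here: $r_{ij}$ is the known rating of user i for movie j, $p_{ik}$ and $q_{kj}$ are elements of P and Q, d is the number of latent features, and $e_{ij} = r_{ij} - \hat{r}_{ij}$.

```latex
\begin{align*}
\hat{r}_{ij} &= \sum_{k=1}^{d} p_{ik}\, q_{kj} \\
e_{ij}^{2} &= \left(r_{ij} - \hat{r}_{ij}\right)^{2} \\
\frac{\partial e_{ij}^{2}}{\partial p_{ik}} &= -2\, e_{ij}\, q_{kj}, \qquad
\frac{\partial e_{ij}^{2}}{\partial q_{kj}} = -2\, e_{ij}\, p_{ik} \\
p_{ik} &\leftarrow p_{ik} + 2\alpha\, e_{ij}\, q_{kj}, \qquad
q_{kj} \leftarrow q_{kj} + 2\alpha\, e_{ij}\, p_{ik}
\end{align*}
```

The plus sign in the update follows from stepping against the gradient, which is negative in the direction that reduces the error.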
PREDICTED RECOMMENDATIONS
Our movie recommendations for the target user (userID=731) after performing
NMF are as follows:
The relationship between mean square error (predicted rating and mean rating) and
K (no. of iterations) is shown in the given plot:
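A curve of error against the number of iterations could be produced by a loop like the one sketched below. It is a minimal sketch under several assumptions that the text does not confirm: the error is measured against the observed ratings, factors are clipped at zero after each step to keep them non-negative, and the function name, hyperparameters and toy matrix are hypothetical.

```python
import random


def factorize(ratings, n_features=2, n_iters=200, alpha=0.01, seed=0):
    """Gradient-descent factorization R ~ P x Q with per-iteration MSE."""
    rng = random.Random(seed)
    n_users, n_movies = len(ratings), len(ratings[0])
    P = [[rng.random() for _ in range(n_features)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(n_movies)] for _ in range(n_features)]
    # Only observed ratings (non-zero entries) contribute to the error.
    observed = [
        (i, j)
        for i in range(n_users)
        for j in range(n_movies)
        if ratings[i][j] > 0
    ]
    history = []
    for _ in range(n_iters):
        for i, j in observed:
            pred = sum(P[i][k] * Q[k][j] for k in range(n_features))
            err = ratings[i][j] - pred
            for k in range(n_features):
                p_ik, q_kj = P[i][k], Q[k][j]
                # Step against the gradient; clipping at 0 is an assumption
                # made here to keep the factors non-negative.
                P[i][k] = max(0.0, p_ik + 2 * alpha * err * q_kj)
                Q[k][j] = max(0.0, q_kj + 2 * alpha * err * p_ik)
        squared = 0.0
        for i, j in observed:
            pred = sum(P[i][k] * Q[k][j] for k in range(n_features))
            squared += (ratings[i][j] - pred) ** 2
        history.append(squared / len(observed))
    return P, Q, history


# Toy user x movie matrix; 0 marks a missing rating. Values are made up.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]

P, Q, history = factorize(R)
```

Plotting `history` against the iteration index gives the kind of curve described above. The missing entries of R are then filled in by the corresponding dot products of P and Q.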
CONCLUSION
Using a statistical approach we get the recommended movies and predicted ratings
for our target user.