You are on page 1of 16

734 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO.

6, JUNE 2005

Toward the Next Generation of Recommender


Systems: A Survey of the State-of-the-Art and
Possible Extensions
Gediminas Adomavicius, Member, IEEE, and Alexander Tuzhilin, Member, IEEE

Abstract—This paper presents an overview of the field of recommender systems and describes the current generation of
recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid
recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses
possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader
range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of
the contextual information into the recommendation process, support for multcriteria ratings, and a provision of more flexible and less
intrusive types of recommendations.

Index Terms—Recommender systems, collaborative filtering, rating estimation methods, extensions to recommender systems.

1 INTRODUCTION

R ECOMMENDER systems have become an important


research area since the appearance of the first papers
on collaborative filtering in the mid-1990s [45], [86], [97].
methods, incorporation of various contextual information
into the recommendation process, utilization of multcriteria
ratings, development of less intrusive and more flexible
There has been much work done both in the industry and recommendation methods that also rely on the measures
academia on developing new approaches to recommender that more effectively determine performance of recommen-
systems over the last decade. The interest in this area still der systems.
remains high because it constitutes a problem-rich In this paper, we describe various ways to extend the
research area and because of the abundance of practical capabilities of recommender systems. However, before
applications that help users to deal with information doing this, we first present a comprehensive survey of the
overload and provide personalized recommendations, state-of-the-art in recommender systems in Section 2. Then,
content, and services to them. Examples of such applica- we identify various limitations of the current generation of
tions include recommending books, CDs, and other recommendation methods and discuss some initial ap-
products at Amazon.com [61], movies by MovieLens proaches to extending their capabilities in Section 3.
[67], and news at VERSIFI Technologies (formerly
AdaptiveInfo.com) [14]. Moreover, some of the vendors
have incorporated recommendation capabilities into their
2 THE SURVEY OF RECOMMENDER SYSTEMS
commerce servers [78]. Although the roots of recommender systems can be traced
However, despite all of these advances, the current back to the extensive work in cognitive science [87],
generation of recommender systems still requires further approximation theory [81], information retrieval [89],
improvements to make recommendation methods more forecasting theories [6], and also have links to management
effective and applicable to an even broader range of real-life science [71] and to consumer choice modeling in marketing
applications, including recommending vacations, certain [60], recommender systems emerged as an independent
research area in the mid-1990s when researchers started
types of financial services to investors, and products to
focusing on recommendation problems that explicitly rely
purchase in a store made by a “smart” shopping cart [106].
on the ratings structure. In its most common formulation,
These improvements include better methods for represent- the recommendation problem is reduced to the problem of
ing user behavior and the information about the items to be estimating ratings for the items that have not been seen by a
recommended, more advanced recommendation modeling user. Intuitively, this estimation is usually based on the
ratings given by this user to other items and on some other
information that will be formally described below. Once we
. G. Adomavicius is with the Carlson School of Management, University of
Minnesota, 321 19th Avenue South, Minneapolis, MN 55455. can estimate ratings for the yet unrated items, we can
E-mail: gedas@umn.edu. recommend to the user the item(s) with the highest
. A. Tuzhilin is with the Stern School of Business, New York University, estimated rating(s).
44 West 4th Street, New York, NY 10012. E-mail: atuzhili@stern.nyu.edu.
More formally, the recommendation problem can be
Manuscript received 8 Mar. 2004; revised 14 Oct. 2004; accepted 10 Dec. formulated as follows: Let C be the set of all users and let S
2004; published online 20 Apr. 2005.
For information on obtaining reprints of this article, please send e-mail to: be the set of all possible items that can be recommended,
tkde@computer.org, and reference IEEECS Log Number TKDE-0071-0304. such as books, movies, or restaurants. The space S of
1041-4347/05/$20.00 ß 2005 IEEE Published by the IEEE Computer Society
Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 735

TABLE 1
A Fragment of a Rating Matrix for a Movie Recommender System

possible items can be very large, ranging in hundreds of and 2) estimating the utility function that optimizes certain
thousands or even millions of items in some applications, performance criterion, such as the mean square error.
such as recommending books or CDs. Similarly, the user Once the unknown ratings are estimated, actual
space can also be very large—millions in some cases. Let u recommendations of an item to a user are made by
be a utility function that measures the usefulness of item s selecting the highest rating among all the estimated
to user c, i.e., u : C  S ! R, where R is a totally ordered set ratings for that user, according to (1). Alternatively, we
(e.g., nonnegative integers or real numbers within a certain can recommend the N best items to a user or a set of
range). Then, for each user c 2 C, we want to choose such users to an item.
item s0 2 S that maximizes the user’s utility. More formally: The new ratings of the not-yet-rated items can be
estimated in many different ways using methods from
8c 2 C; s0c ¼ arg max uðc; sÞ: ð1Þ machine learning, approximation theory, and various
s2S
heuristics. Recommender systems are usually classified
In recommender systems, the utility of an item is usually according to their approach to rating estimation and, in the
represented by a rating, which indicates how a particular user next section, we will present such a classification that was
liked a particular item, e.g., John Doe gave the movie “Harry proposed in the literature and will provide a survey of
Potter” the rating of 7 (out of 10). However, as indicated different types of recommender systems. The commonly
earlier, in general, utility can be an arbitrary function, accepted formulation of the recommendation problem was
including a profit function. Depending on the application, first stated in [45], [86], [97] and this problem has been
utility u can either be specified by the user, as is often done for studied extensively since then. Moreover, recommender
the user-defined ratings, or is computed by the application, as systems are usually classified into the following categories,
can be the case for a profit-based utility function. based on how recommendations are made [8]:
Each element of the user space C can be defined with a
profile that includes various user characteristics, such as age, . Content-based recommendations: The user will be
gender, income, marital status, etc. In the simplest case, the recommended items similar to the ones the user
profile can contain only a single (unique) element, such as preferred in the past;
User ID. Similarly, each element of the item space S is . Collaborative recommendations: The user will be
defined with a set of characteristics. For example, in a recommended items that people with similar tastes
movie recommendation application, where S is a collection and preferences liked in the past;
of movies, each movie can be represented not only by its ID, . Hybrid approaches: These methods combine colla-
but also by its title, genre, director, year of release, leading borative and content-based methods.
actors, etc. In addition to recommender systems that predict the
The central problem of recommender systems lies in that absolute values of ratings that individual users would give
utility u is usually not defined on the whole C  S space, to the yet unseen items (as discussed above), there has been
but only on some subset of it. This means u needs to be work done on preference-based filtering, i.e., predicting the
extrapolated to the whole space C  S. In recommender relative preferences of users [22], [35], [51], [52]. For
systems, utility is typically represented by ratings and is example, in a movie recommendation application, prefer-
initially defined only on the items previously rated by the ence-based filtering techniques would focus on predicting
users. For example, in a movie recommendation application the correct relative order of the movies, rather than their
(such as the one at MovieLens.org), users initially rate some individual ratings. However, this paper focuses primarily
subset of movies that they have already seen. An example on rating-based recommenders since it constitutes the most
of a user-item rating matrix for a movie recommendation popular approach to recommender systems.
application is presented in Table 1, where ratings are
specified on the scale of 1 to 5. The “” symbol for some of 2.1 Content-Based Methods
the ratings in Table 1 means that the users have not rated In content-based recommendation methods, the utility
the corresponding movies. Therefore, the recommendation uðc; sÞ of item s for user c is estimated based on the utilities
engine should be able to estimate (predict) the ratings of the uðc; si Þ assigned by user c to items si 2 S that are “similar”
nonrated movie/user combinations and issue appropriate to item s. For example, in a movie recommendation
recommendations based on these predictions. application, in order to recommend movies to user c, the
Extrapolations from known to unknown ratings are content-based recommender system tries to understand the
usually done by 1) specifying heuristics that define the commonalities among the movies user c has rated highly in
utility function and empirically validating its performance the past (specific actors, directors, genres, subject matter,

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
736 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

etc.). Then, only the movies that have a high degree of and the content of document dj is defined as
similarity to whatever the user’s preferences are would be
recommended. Contentðdj Þ ¼ ðw1j ; . . . wkj Þ:
The content-based approach to recommendation has its As stated earlier, content-based systems recommend
roots in information retrieval [7], [89] and information items similar to those that a user liked in the past [56], [69],
filtering [10] research. Because of the significant and early [77]. In particular, various candidate items are compared
advancements made by the information retrieval and
with items previously rated by the user and the best-
filtering communities and because of the importance of
matching item(s) are recommended. More formally, let
several text-based applications, many current content-based
systems focus on recommending items containing textual ContentBasedP rofileðcÞ be the profile of user c containing
information, such as documents, Web sites (URLs), and tastes and preferences of this user. These profiles are
Usenet news messages. The improvement over the tradi- obtained by analyzing the content of the items previously
tional information retrieval approaches comes from the use seen and rated by the user and are usually constructed
of user profiles that contain information about users’ tastes, using keyword analysis techniques from information
preferences, and needs. The profiling information can be retrieval. For example, ContentBasedP rofileðcÞ can be
elicited from users explicitly, e.g., through questionnaires, defined as a vector of weights ðwc1 ; . . . ; wck Þ, where each
or implicitly—learned from their transactional behavior weight wci denotes the importance of keyword ki to user c
over time. and can be computed from individually rated content
More formally, let ContentðsÞ be an item profile, i.e., a set vectors using a variety of techniques. For example, some
of attributes characterizing item s. It is usually computed by averaging approach, such as Rocchio algorithm [85], can be
extracting a set of features from item s (its content) and is used to compute ContentBasedP rofileðcÞ as an “average”
used to determine the appropriateness of the item for vector from an individual content vectors [8], [56]. On the
recommendation purposes. Since, as mentioned earlier, other hand, [77] uses a Bayesian classifier in order to
content-based systems are designed mostly to recommend estimate the probability that a document is liked. The
text-based items, the content in these systems is usually
Winnow algorithm [62] has also been shown to work well
described with keywords. For example, a content-based
for this purpose, especially in the situations where there are
component of the Fab system [8], which recommends Web
pages to users, represents Web page content with the many possible features [76].
100 most important words. Similarly, the Syskill & Webert In content-based systems, the utility function uðc; sÞ is
system [77] represents documents with the 128 most usually defined as:
informative words. The “importance” (or “informative- uðc; sÞ ¼ scoreðContentBasedP rofileðcÞ; ContentðsÞÞ: ð5Þ
ness”) of word kj in document dj is determined with some
weighting measure wij that can be defined in several Using the above-mentioned information retrieval-based
different ways. paradigm of recommending Web pages, Web site URLs,
One of the best-known measures for specifying keyword or Usenet news messages, both ContentBasedP rofileðcÞ of
weights in Information Retrieval is the term frequency/inverse user c and ContentðsÞ of document s can be represented as
document frequency (TF-IDF) measure [89] that is defined as TF-IDF vectors w~c and w ~s of keyword weights. Moreover,
follows: Assume that N is the total number of documents utility function uðc; sÞ is usually represented in the
that can be recommended to users and that keyword kj information retrieval literature by some scoring heuristic
appears in ni of them. Moreover, assume that fi;j is the defined in terms of vectors w ~c and w
~s , such as the cosine
number of times keyword ki appears in document dj . Then, similarity measure [7], [89]:
T Fi;j , the term frequency (or normalized frequency) of
keyword ki in document dj , is defined as w~c  w
~s
uðc; sÞ ¼ cosð~ ~s Þ ¼
wc ; w
jj~
wc jj2  jj~
ws jj2
fi;j PK ð6Þ
T Fi;j ¼ ; ð2Þ i¼1 wi;c wi;s
maxz fz;j ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi PK 2 ;
PK 2 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
i¼1 wi;c i¼1 wi;s
where the maximum is computed over the frequencies fz;j
of all keywords kz that appear in the document dj . where K is the total number of keywords in the system.
However, keywords that appear in many documents are For example, if user c reads many online articles on the
not useful in distinguishing between a relevant document topic of bioinformatics, then content-based recommenda-
and a nonrelevant one. Therefore, the measure of inverse tion techniques will be able to recommend other bioinfor-
document frequency ðIDFi Þ is often used in combination matics articles to user c. This is the case because these
with simple term frequency ðT Fi;j Þ. The inverse document articles will have more bioinformatics-related terms (e.g.,
frequency for keyword ki is usually defined as “genome,” “sequencing,” “proteomics”) than articles on
N other topics and, therefore, ContentBasedP rofileðcÞ, as
IDFi ¼ log : ð3Þ defined by vector w ~c , will represent such terms ki with
ni
high weights wic . Consequently, a recommender system
Then, the TF-IDF weight for keyword ki in document dj is using the cosine or a related similarity measure will assign
defined as higher utility uðc; sÞ to those articles s that have high-
weighted bioinformatics terms in w ~s and lower utility to the
wi;j ¼ T Fi;j  IDFi ð4Þ
ones where bioinformatics terms are weighted less.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 737

Besides the traditional heuristics that are based mostly features from text documents, some other domains have an
on information retrieval methods, other techniques for inherent problem with automatic feature extraction. For
content-based recommendation have also been used, such example, automatic feature extraction methods are much
as Bayesian classifiers [70], [77] and various machine harder to apply to multimedia data, e.g., graphical images,
learning techniques, including clustering, decision trees, audio streams, and video streams. Moreover, it is often not
and artificial neural networks [77]. These techniques differ practical to assign attributes by hand due to limitations of
from information retrieval-based approaches in that they resources [97].
calculate utility predictions based not on a heuristic Another problem with limited content analysis is that, if
formula, such as a cosine similarity measure, but rather two different items are represented by the same set of
are based on a model learned from the underlying data features, they are indistinguishable. Therefore, since text-
using statistical learning and machine learning techni- based documents are usually represented by their most
ques. For example, based on a set of Web pages that were important keywords, content-based systems cannot distin-
rated as “relevant” or “irrelevant” by the user, [77] uses guish between a well-written article and a badly written
the naive Bayesian classifier [31] to classify unrated Web one, if they happen to use the same terms [97].
pages. More specifically, the naive Bayesian classifier is
used to estimate the following probability that page pj 2.1.2 Overspecialization
belongs to a certain class Ci (e.g., relevant or irrelevant) When the system can only recommend items that score
given the set of keywords k1;j ; . . . ; kn;j on that page: highly against a user’s profile, the user is limited to being
recommended items that are similar to those already rated.
P ðCi jk1;j & . . . &kn;j Þ: ð7Þ For example, a person with no experience with Greek
cuisine would never receive a recommendation for even the
Moreover, [77] uses the assumption that keywords are
greatest Greek restaurant in town. This problem, which has
independent and, therefore, the above probability is
also been studied in other domains, is often addressed by
proportional to
introducing some randomness. For example, the use of
genetic algorithms has been proposed as a possible solution
Y
P ðCi Þ P ðkx;j jCi Þ: ð8Þ
x in the context of information filtering [98]. In addition, the
problem with overspecialization is not only that the
While the keyword independence assumption does not
content-based systems cannot recommend items that are
necessarily apply in many applications, experimental results
different from anything the user has seen before. In certain
demonstrate that naı̈ve Bayesian classifiers still produce
cases, items should not be recommended if they are too
high classification accuracy [77]. Furthermore, both
similar to something the user has already seen, such as a
P ðkx;j jCi Þ and P ðCi Þ can be estimated from the underlying
different news article describing the same event. Therefore,
training data. Therefore, for each page pj , the probability some content-based recommender systems, such as Daily-
P ðCi jk1;j & . . . &kn;j Þ is computed for each class Ci and page pj Learner [13], filter out items not only if they are too different
is assigned to class Ci having the highest probability [77]. from the user’s preferences, but also if they are too similar
While not explicitly dealing with providing recommen- to something the user has seen before. Furthermore, Zhang
dations, the text retrieval community has contributed several et al. [112] provide a set of five redundancy measures to
techniques that are being used in content-based recommen- evaluate whether a document that is deemed to be relevant
der systems. One example of such a technique would be the contains some novel information as well. In summary, the
research on adaptive filtering [101], [112], which focuses on diversity of recommendations is often a desirable feature in
becoming more accurate at identifying relevant documents recommender systems. Ideally, the user should be pre-
incrementally by observing the documents one-by-one in a sented with a range of options and not with a homogeneous
continuous document stream. Another example would be set of alternatives. For example, it is not necessarily a good
the work on threshold setting [84], [111], which focuses on idea to recommend all movies by Woody Allen to a user
determining the extent to which documents should match a who liked one of them.
given query in order to be relevant to the user. Other text
retrieval methods are described in [50] and can also be 2.1.3 New User Problem
found in the proceedings of the Text Retrieval Conference The user has to rate a sufficient number of items before a
(TREC) (http://trec.nist.gov). content-based recommender system can really understand
As was observed in [8], [97], content-based recommender the user’s preferences and present the user with reliable
systems have several limitations that are described in the recommendations. Therefore, a new user, having very few
rest of this section. ratings, would not be able to get accurate recommendations.
2.1.1 Limited Content Analysis 2.2 Collaborative Methods
Content-based techniques are limited by the features that Unlike content-based recommendation methods, collabora-
are explicitly associated with the objects that these systems tive recommender systems (or collaborative filtering systems)
recommend. Therefore, in order to have a sufficient set of try to predict the utility of items for a particular user based
features, the content must either be in a form that can be on the items previously rated by other users. More formally,
parsed automatically by a computer (e.g., text) or the the utility uðc; sÞ of item s for user c is estimated based on
features should be assigned to items manually. While the utilities uðcj ; sÞ assigned to item s by those users cj 2 C
information retrieval techniques work well in extracting who are “similar” to user c. For example, in a movie

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
738 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

recommendation application, in order to recommend simðc; c0 Þ, is essentially a distance measure and is used as a
movies to user c, the collaborative recommender system weight, i.e., the more similar users c and c0 are, the more
tries to find the “peers” of user c, i.e., other users that have weight rating rc0 ;s will carry in the prediction of rc;s . Note
similar tastes in movies (rate the same movies similarly). that simðx; yÞ is a heuristic artifact that is introduced in
Then, only the movies that are most liked by the “peers” of order to be able to differentiate between levels of user
user c would be recommended. similarity (i.e., to be able to find a set of “closest peers” or
There have been many collaborative systems developed “nearest neighbors” for each user) and, at the same time,
in the academia and the industry. It can be argued that the simplify the rating estimation procedure. As shown in
Grundy system [87] was the first recommender system, (10b), different recommendation applications can use their
which proposed using stereotypes as a mechanism for own user similarity measure as long as the calculations are
building models of users based on a limited amount of normalized using the normalizing factor k, as shown above.
information on each individual user. Using stereotypes, The two most commonly used similarity measures will be
the Grundy system would build individual user models described below. One problem with using the weighted
and use them to recommend relevant books to each user. sum, as in (10b), is that it does not take into account the fact
Later on, the Tapestry system relied on each user to that different users may use the rating scale differently. The
identify like-minded users manually [38]. GroupLens [53], adjusted weighted sum, shown in (10c), has been widely
[86], Video Recommender [45], and Ringo [97] were the used to address this limitation. In this approach, instead of
first systems to use collaborative filtering algorithms to using the absolute values of ratings, the weighted sum uses
automate prediction. Other examples of collaborative their deviations from the average rating of the correspond-
recommender systems include the book recommendation ing user. Another way to overcome the differing uses of the
system from Amazon. com, the PHOAKS system that rating scale is to deploy preference-based filtering [22], [35],
helps people find relevant information on the WWW [103], [51], [52], which focuses on predicting the relative prefer-
and the Jester system that recommends jokes [39]. ences of users instead of absolute rating values, as was
According to [15], algorithms for collaborative recom- pointed out earlier in Section 2.
mendations can be grouped into two general classes: Various approaches have been used to compute the
memory-based(or heuristic-based) and model-based. similarity simðc; c0 Þ between users in collaborative recom-
Memory-based algorithms [15], [27], [72], [86], [97] mender systems. In most of these approaches, the similarity
essentially are heuristics that make rating predictions based between two users is based on their ratings of items that
on the entire collection of previously rated items by the both users have rated. The two most popular approaches are
users. That is, the value of the unknown rating rc;s for user c correlation and cosine-based. To present them, let Sxy be the
and item s is usually computed as an aggregate of the set of all items corated by both users x and y, i.e.,
ratings of some other (usually, the N most similar) users for Sxy ¼ fs 2 Sjrx;s 6¼  & ry;s 6¼ g. In collaborative recom-
the same item s: mender systems, Sxy is used mainly as an intermediate
result for calculating the “nearest neighbors” of user x and
rc;s ¼ aggr rc0 ;s ; ð9Þ is often computed in a straightforward manner, i.e., by
c0 2C^ computing the intersection of sets Sx and Sy . However,
where C^ denotes the set of N users that are the most similar some methods, such as the graph-theoretic approach to
to user c and who have rated item s (N can range anywhere collaborative filtering [4], can determine the nearest
from 1 to the number of all users). Some examples of the neighbors of x without computing Sxy for all users y. In
aggregation function are: the correlation-based approach, the Pearson correlation
coefficient is used to measure the similarity [86], [97]:
1X
ðaÞ rc;s ¼ rc0 ;s ; P
ðrx;s  rx Þðry;s  ry Þ
N 0 ^
c 2C s2Sxy
X simðx; yÞ ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi: ð12Þ
ðbÞ rc;s ¼k simðc; c0 Þ  rc0 ;s ; ð10Þ
P
ðrx;s  rx Þ2
P
ðry;s  ry Þ2
c0 2C^ s2Sxy s2Sxy
X
ðcÞ rc;s ¼ rc þ k simðc; c0 Þ  ðrc0 ;s  rc0 Þ; In the cosine-based approach [15], [91], the two users x
c0 2C^
and y are treated as two vectors in m-dimensional space,
where multiplier k serves as a normalizing factor and is where m ¼ jSxy j. Then, the similarity between two vectors
0 can be measured by computing the cosine of the angle
P
usually selected as k ¼ 1 c0 2C^ jsimðc; c Þj, and where the
average rating of user c, rc , in (10c) is defined as1 between them:
P
  X rx;s ry;s
rc ¼ 1 jSc j r ; where Sc ¼ fs 2 Sjrc;s 6¼ g: ð11Þ
s2S c;s x ~
~ y s2Sxy
c
simðx; yÞ ¼ cosð~ yÞ ¼
x; ~ ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P 2 ;
P 2 ffirffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
In the simplest case, the aggregation can be a simple jj~
xjj2  jj~
yjj2 r r x;s y;s
s2Sxy s2Sxy
average, as defined by (10a). However, the most common
aggregation approach is to use the weighted sum, shown in ð13Þ
(10b). The similarity measure between users c and c0 ,
x ~
where ~ y denotes the dot-product between the vectors ~
x
1. We use the rc;s ¼  notation to indicate that item s has not been rated and ~
y. Still another approach to measuring similarity
by user c. between users uses the mean squared difference measure

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 739

and is described in [97]. Note that different recommender the user’s class membership, the user ratings are assumed
systems may take different approaches in order to imple- to be independent, i.e., the model structure is that of a naive
ment the user similarity calculations and rating estimations Bayesian model. The number of classes and the parameters
as efficiently as possible. One common strategy is to of the model are learned from the data. The second model
calculate all user similarities simðx; yÞ (including the represents each item in the domain as a node in a Bayesian
calculation of Sxy ) in advance and recalculate them only network, where the states of each node correspond to the
once in a while (since the network of peers usually does not possible rating values for each item. Both the structure of
change dramatically in a short time). Then, whenever the the network and the conditional probabilities are learned
user asks for a recommendation, the ratings can be from the data. One limitation of this approach is that each
efficiently calculated on demand using precomputed user can be clustered into a single cluster, whereas some
similarities. recommendation applications may benefit from the ability
Note that both the content-based and the collaborative to cluster users into several categories at once. For example,
approaches use the same cosine measure from information in a book recommendation application, a user may be
retrieval literature. However, in content-based recommen- interested in one topic (e.g., programming) for work
der systems, it is used to measure the similarity between purposes and a completely different topic (e.g., fishing)
vectors of TF-IDF weights, whereas, in collaborative for leisure.
systems, it measures the similarity between vectors of the Moreover, [11] proposed a collaborative filtering method
actual user-specified ratings. in a machine learning framework, where various machine
Many performance-improving modifications, such as learning techniques (such as artificial neural networks)
default voting, inverse user frequency, case amplification [15], coupled with feature extraction techniques (such as singular
and weighted-majority prediction [27], [72], have been value decomposition—an algebraic technique for reducing
proposed as extensions to these standard correlation-based dimensionality of matrices) can be used. Both [15] and [11]
and cosine-based techniques. For example, the default compare their respective model-based approaches with
voting [15] is an extension to the memory-based approaches standard memory-based approaches and report that, in
described above. It was observed that, whenever there are some applications, model-based methods outperform mem-
relatively few user-specified ratings, these methods would ory-based approaches in terms of accuracy of recommenda-
not work well in computing the similarity between users x tions. However, the comparison in both cases is purely
and y since the similarity measure is based on the empirical and no underlying theoretical evidence support-
intersection of the itemsets, i.e., sets of items rated by both ing this claim is provided.
users x and y. It was empirically shown that the rating There have been several other model-based collaborative
prediction accuracy could improve if we assume some recommendation approaches proposed in the literature. A
default rating value for the missing ratings [15]. statistical model for collaborative filtering was proposed in
Also, while the above techniques traditionally have been [105], and several different algorithms for estimating the
used to compute similarities between users, Sarwar et al. model parameters were compared, including K-means
[91] proposed using the same correlation-based and cosine- clustering and Gibbs sampling. Other collaborative filtering
based techniques to compute similarities between items methods include a Bayesian model [20], a probabilistic
instead and obtain the ratings from them. This idea has relational model [37], a linear regression [91], and a
been further extended in [29] for the top-N item recom- maximum entropy model [75]. More recently, a significant
mendations. In addition, [29], [91] present empirical amount of research has been done in trying to model the
evidence that item-based algorithms can provide better
recommendation process using more complex probabilistic
computational performance than traditional user-based
models. For instance, Shani et al. [96] view the recommen-
collaborative methods while, at the same time, providing
dation process as a sequential decision problem and
comparable or better quality than the best available user-
propose using Markov decision processes (a well-known
based algorithms.
stochastic technique for modeling sequential decisions) for
In contrast to memory-based methods, model-based
generating recommendations. Other probabilistic modeling
algorithms [11], [15], [37], [39], [47], [64], [75], [105] use
techniques for recommender systems include probabilistic
the collection of ratings to learn a model, which is then used
latent semantic analysis [47], [48] and a combination of
to make rating predictions. For example, [15] proposes a
multinomial mixture and aspect models using generative
probabilistic approach to collaborative filtering, where the
semantics of Latent Dirichlet Allocation [64]. Similarly, Si
unknown ratings are calculated as
and Jin [99] also use probabilistic latent semantic analysis to
n
X propose a flexible mixture model that allows modeling the
rc;s ¼ Eðrc;s Þ ¼ i  Prðrc;s ¼ ijrc;s0 ; s0 2 Sc Þ ð14Þ classes of users and items explicitly with two sets of latent
i¼0
variables. Furthermore, Kumar et al. [55] use a simple
and it is assumed that rating values are integers between 0 probabilistic model to demonstrate that collaborative filter-
and n and the probability expression is the probability that ing is valuable with relatively little data on each user, and
user c will give a particular rating to item s given that user’s that, in certain restricted settings, simple collaborative
ratings of the previously rated items. To estimate this filtering algorithms are almost as effective as the best
probability, [15] proposes two alternative probabilistic possible algorithms in terms of utility.
models: cluster models and Bayesian networks. In the first As in the case of content-based techniques, the main
model, like-minded users are clustered into classes. Given difference between collaborative model-based techniques

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
740 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

and heuristic-based approaches is that the model-based can also be addressed using hybrid recommendation
techniques calculate utility (rating) predictions based not on approaches, described in the next section.
some ad hoc heuristic rules, but, rather, based on a model
learned from the underlying data using statistical and 2.2.3 Sparsity
machine learning techniques. A method combining both In any recommender system, the number of ratings already
memory-based and model-based approaches was proposed obtained is usually very small compared to the number of
in [79], where it was empirically demonstrated that the use ratings that need to be predicted. Effective prediction of
of this combined approach can provide better recommen- ratings from a small number of examples is important. Also,
dations than pure memory-based and model-based colla- the success of the collaborative recommender system
borative approaches. depends on the availability of a critical mass of users. For
A different approach to improving the performance of example, in the movie recommendation system, there may
existing collaborative filtering algorithms was taken in [108], be many movies that have been rated by only few people
where the input set of user-specified ratings is carefully
and these movies would be recommended very rarely, even
selected using several techniques that exclude noise, re-
if those few users gave high ratings to them. Also, for the
dundancy, and exploit the sparsity of the ratings’ data. The
user whose tastes are unusual compared to the rest of the
empirical results demonstrate the increase in accuracy and
population, there will not be any other users who are
efficiency for model-based collaborative filtering algorithms.
It is also suggested that the proposed input selection particularly similar, leading to poor recommendations [8].
techniques may help the model-based algorithms to address One way to overcome the problem of rating sparsity is to
the problem of learning from large databases [108]. use user profile information when calculating user similar-
Furthermore, among the latest developments, [109] pro- ity. That is, two users could be considered similar not only if
poses a probabilistic approach to collaborative filtering that they rated the same movies similarly, but also if they belong
constitutes yet another way to combine the memory-based to the same demographic segment. For example, [76] uses
and model-based techniques. In particular, [109] proposes the gender, age, area code, education, and employment
1) using an active learning approach to learn the probabil- information of users in the restaurant recommendation
istic model of each user’s preferences and 2) using the stored application. This extension of traditional collaborative
user profiles in a mixture model to calculate recommenda- filtering techniques is sometimes called “demographic
tions. The latter aspect of the proposed approach deploys filtering” [76]. Another approach that also explores simila-
some of the ideas used in traditional memory-based rities among users has been proposed in [49], where the
algorithms. sparsity problem is addressed by applying associative
The pure collaborative recommender systems do not retrieval framework and related spreading activation algo-
have some of the shortcomings that content-based systems rithms to explore transitive associations among consumers
have. In particular, since collaborative systems use other through their past transactions and feedback. A different
users’ recommendations (ratings), they can deal with any approach for dealing with sparse rating matrices was used
kind of content and recommend any items, even the ones in [11], [90], where a dimensionality reduction technique,
that are dissimilar to those seen in the past. However, Singular Value Decomposition (SVD), was used to reduce
collaborative systems have their own limitations [8], [57], as
the dimensionality of sparse ratings matrices. SVD is a well-
described below.
known method for matrix factorization that provides the
2.2.1 New User Problem best lower rank approximations of the original matrix [90].
It is the same problem as with content-based systems. In 2.3 Hybrid Methods
order to make accurate recommendations, the system must
Several recommendation systems use a hybrid approach by
first learn the user’s preferences from the ratings that the
combining collaborative and content-based methods, which
user gives. Several techniques have been proposed to
address this problem. Most of them use the hybrid helps to avoid certain limitations of content-based and
recommendation approach, which combines content-based collaborative systems [8], [9], [21], [76], [94], [100], [105].
and collaborative techniques. The next section describes Different ways to combine collaborative and content-based
hybrid recommender systems in more detail. An alternative methods into a hybrid recommender system can be
approach is presented in [83], [109], where various classified as follows:
techniques are explored for determining the best (i.e., most
informative to a recommender system) items for a new user 1. implementing collaborative and content-based
to rate. These techniques use strategies that are based on methods separately and combining their predictions,
item popularity, item entropy, user personalization, and 2. incorporating some content-based characteristics
combinations of the above [83], [109]. into a collaborative approach,
3. incorporating some collaborative characteristics into
2.2.2 New Item Problem a content-based approach, and
New items are added regularly to recommender systems. 4. constructing a general unifying model that in-
Collaborative systems rely solely on users’ preferences to corporates both content-based and collaborative
make recommendations. Therefore, until the new item is characteristics.
rated by a substantial number of users, the recommender All of the above approaches have been used by recommen-
system would not be able to recommend it. This problem der systems researchers, as described below.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 741

2.3.1 Combining Separate Recommenders probabilistic method for combining collaborative and
One way to build hybrid recommender systems is to content-based recommendations, which is based on the
implement separate collaborative and content-based sys- probabilistic latent semantic analysis [46]. Yet, another
tems. Then, we can have two different scenarios. First, we approach is proposed by [25] and [5], where Bayesian
can combine the outputs (ratings) obtained from individual mixed-effects regression models are used that employ
recommender systems into one final recommendation using Markov chain Monte Carlo methods for parameter estima-
either a linear combination of ratings [21] or a voting tion and prediction. In particular, [5] uses the profile
scheme [76]. Alternatively, we can use one of the individual information of users and items in a single statistical model
recommenders, at any given moment choosing to use the that estimates unknown ratings rij for user i and item j:
one that is “better” than others based on some recommen-
rij ¼ xij  þ zi j þ wj i þ eij ;
dation “quality” metric. For example, the DailyLearner
system [13] selects the recommender system that can give eij  Nð0; 2 Þ;
ð15Þ
the recommendation with the higher level of confidence, i  Nð0; Þ;
while [104] chooses the one whose recommendation is more j  Nð0; Þ;
consistent with past ratings of the user.
where i ¼ 1; . . . ; I and j ¼ 1; . . . ; J represent users and
2.3.2 Adding Content-Based Characteristics to items, respectively, and eij , i , and j are random variables
Collaborative Models taking into effect noise, unobserved sources of user
Several hybrid recommender systems, including Fab [8] and heterogeneity, and item heterogeneity, respectively. Also,
the “collaboration via content” approach, described in [76], xij is a matrix containing user and item characteristics, zi is
are based on traditional collaborative techniques but also a vector of user characteristics, and wj is a vector of item
maintain the content-based profiles for each user. These characteristics. The unknown parameters of this model are
content-based profiles, and not the commonly rated items, , 2 , , and and they are estimated from the data of
are then used to calculate the similarity between two users. already known ratings using Markov chain Monte Carlo
As mentioned in [76], this allows to overcome some methods. In summary, [5] uses user attributes fzi g
sparsity-related problems of a purely collaborative ap- constituting a part of a user profile, item attributes fwj g
proach since, typically, not many pairs of users will have a constituting a part of an item profile, and their interactions
significant number of commonly rated items. Another fxij g to estimate the rating of an item.
benefit of this approach is that users can be recommended Hybrid recommendation systems can also be augmented
an item not only when this item is rated highly by users by knowledge-based techniques [17], such as case-based
with similar profiles, but also directly, i.e., when this item reasoning, in order to improve recommendation accuracy
scores highly against the user’s profile [8]. Good et al. [40] and to address some of the limitations (e.g., new user, new
employ a somewhat similar approach in using the variety of item problems) of traditional recommender systems. For
different filterbots—specialized content-analysis agents that example, knowledge-based recommender system Entrée
act as additional participants in a collaborative filtering [17] uses some domain knowledge about restaurants,
community. As a result, the users whose ratings agree with cuisines, and foods (e.g., that “seafood” is not “vegetarian”)
some of the filterbots’ ratings would be able to receive better to recommend restaurants to its users. The main drawback
recommendations [40]. Similarly, [65] uses a collaborative of knowledge-based systems is a need for knowledge
approach where the traditional user’s ratings vector is acquisition—a well-known bottleneck for many artificial
augmented with additional ratings, which are calculated intelligence applications. However, knowledge-based re-
using a pure content-based predictor. commendation systems have been developed for applica-
tion domains where domain knowledge is readily available
2.3.3 Adding Collaborative Characteristics to Content- in some structured machine-readable form, e.g., as an
Based Models ontology. For example, the Quickstep and Foxtrot systems
The most popular approach in this category is to use some [66] use research paper topic ontology to recommend online
dimensionality reduction technique on a group of content- research articles to the users.
based profiles. For example, [100] uses latent semantic Moreover, several papers, such as [8], [65], [76], [100],
indexing (LSI) to create a collaborative view of a collection empirically compare the performance of the hybrid with the
of user profiles, where user profiles are represented by term pure collaborative and content-based methods and demon-
vectors (as discussed in Section 2.1), resulting in a strate that the hybrid methods can provide more accurate
performance improvement compared to the pure content- recommendations than pure approaches.
based approach.
2.4 Summary and Conclusions
2.3.4 Developing a Single Unifying Recommendation As described in Sections 2.1, 2.2, and 2.3, there has been
Model much research done on recommendation technologies over
Many researchers have followed this approach in recent the past several years that have used a broad range of
years. For instance, [9] proposes using content-based and statistical, machine learning, information retrieval, and other
collaborative characteristics (e.g., the age or gender of users techniques that have significantly advanced the state-of-the-
or the genre of movies) in a single rule-based classifier. art in comparison to early recommender systems that
Popescul et al. [80] and Schein et al. [94] propose a unified utilized collaborative and content-based heuristics. As was

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
742 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

TABLE 2
Classification of Recommender Systems Research

discussed above, recommender systems can be categorized However, both collaborative and content-based methods
as being 1) content-based, collaborative, or hybrid, based on the have certain limitations, described earlier in this section.
recommendation approach used, and 2) heuristic-based or Moreover, in order to provide better recommendations
model-based, based on the types of recommendation techni- and to be able to use recommender systems in arguably
ques used for the rating estimation. We use these two more complex types of applications, such as recommend-
orthogonal dimensions to classify the recommender systems ing vacations or certain types of financial services, most of
research in the 2  3 matrix presented in Table 2. the methods reviewed in this section would need
The recommendation methods described in this section significant extensions. For example, even for a traditional
have performed well in several applications, including the movie recommendation application, [3] showed that, by
ones for recommending books, CDs, and news articles extending the traditional memory-based collaborative
[64], [88], and some of these methods are used in the filtering approach to take into consideration the contextual
“industrial-strength” recommender systems, such as the information, such as when, where, and with whom a
ones deployed at Amazon [61], MovieLens [67], and movie is seen, the resulting recommender system could
VERSIFI Technologies (formerly AdapiveInfo.com) [14]. outperform the pure traditional collaborative filtering

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 743

method. Many real-life recommendation applications, the profile of user i be defined as a vector of p features, i.e.,
including several business applications, such as the ones ci ¼ ðai1 ; . . . ; aip Þ. Also, let the profile of item j be defined as
~
described above, are arguably more complex than a movie a vector of r features, i.e., ~ sj ¼ ðbj1 ; . . . ; bjr Þ. We deliberately
recommender system and would require taking more did not precisely define the meanings of features aij and bkl
factors into the recommendation consideration. Therefore, because they can mean different concepts in different
the need to develop more advanced recommendation applications, such as numbers, categories, rules, sequences,
methods is even more pressing for such types of etc. Also, let ~ c be a vector of all user profiles, i.e.,
applications. In the next section, we review various ways c ¼ ð~
~ cm Þ, and let ~
c1 ; . . . ;~ s be a vector of all item profiles,
to extend recommendation methods in order to support s ¼ ð~
i.e., ~ sn Þ. Then, the most general rating estima-
s1 ; . . . ;~
more complex types of recommendation applications. tion procedure can be defined as

0 rij ; if rij 6¼ 
3 EXTENDING CAPABILITIES OF RECOMMENDER rij ¼ ð16Þ
uij ðR;~ sÞ; if rij ¼ 
c;~
SYSTEMS
that estimates each unknown rating r0ij ¼ uij ðR;~ c;~sÞ in
Recommender systems, as described in Section 2 and
terms of known ratings R ¼ frij 6¼ g, user profiles ~ c, and
summarized in Table 2, can be extended in several ways
item profiles ~ s. We can use various methods for estimating
that include improving the understanding of users and
utility function uij , including various heuristics, nearest-
items, incorporating the contextual information into the
neighbor classifiers, decision trees, spline methods, radial
recommendation process, supporting multcriteria ratings,
basis functions, regressions, neural networks, and relational
and providing more flexible and less intrusive types of
learning methods (since ~ c and ~ s can be matrices or, using
recommendations. Such more comprehensive models of
database tehcnology, relational tables). Moreover, we
recommender systems can provide better recommendation
would like to point out that (16) presents the most general
capabilities. In the remainder of this section, we describe the
model that depends on a whole range of inputs, including
proposed extensions and also identify various research
the characteristics of user i (~ ci ) and possibly other users
opportunities for developing them.
c ¼ ð~
~ cm Þ, characteristics of item j (~
c1 ; . . . ;~ sj ), and possibly
3.1 Comprehensive Understanding of Users and other items ~ s ¼ ð~ sn Þ, ratings (preferences) Ri ex-
s1 ; . . . ;~
Items pressed by user i, and ratings (preferences) expressed by all
As was pointed out in [2], [8], [54], [105], most of the other users R ¼ frij 6¼ g. Therefore, function uij clearly
recommendation methods produce ratings that are based subsumes the collaborative, content-based, and hybrid
on a limited understanding of users and items as captured methods discussed in Section 2. However, most of the
by user and item profiles and do not take full advantage of existing recommender systems make function uij depen-
the information in the user’s transactional histories and dent only on a (small) subset of the whole input space R, ~ c,
other available data. For example, classical collaborative and ~ s. For example, function uij for traditional memory-
filtering methods [45], [86], [97] do not use user and item based collaborative filtering methods does not depend on
profiles at all for recommendation purposes and rely inputs ~ c and ~ s and restricts R only to column Rj and usually
exclusively on the ratings information to make recommen- only to the set of N nearest-neighbors rij for column Rj .2
dations. Although there has been some progress made on An interesting research problem would be to extend the
incorporating user and item profiles into some of the attribute-based profiles, as defined by ~ c and ~ s, to utilize the
methods since the earlier days of recommender systems more advanced profiling techniques described above, such
[13], [76], [79], these profiles still tend to be quite simple as rule, sequence, and signature-based methods.
and do not utilize some of the more advanced profiling
3.2 Extensions for Model-Based Recommendation
techniques. In addition to using traditional profile features,
Techniques
such as keywords and simple user demographics [69], [77],
more advanced profiling techniques based on data mining As discussed in Section 2, some of the model-based
rules [1], [34], sequences [63], and signatures [26] that approaches provide rigorous rating estimation methods
describe a user’s interests can be used to build user profiles. utilizing various statistical and machine learning techniques.
Also, in addition to using the traditional item profile However, other areas of mathematics and computer science,
features, such as keywords [9], [76], similar advanced such as mathematical approximation theory [16], [73], [81], can
profiling techniques can also be used to build comprehen- also contribute to developing better rating estimation
sive item profiles. With respect to recommender systems, methods defined by (16). One example of an approxima-
advanced profiling techniques that are based on data tion-based approach to defining function uij in (16) consti-
mining have been used mainly in the context of Web usage tutes radial basis functions [16], [30], [92] that are defined as
analysis [59], [68], [110], i.e., to discover the navigational follows: Given a set of points X ¼ fx1 ; . . . ; xm g (where
Web usage patterns (i.e., page view sequences) of users in xi 2 IRN ) and the values of an unknown function f (e.g., the
order to provide better Web site recommendations; how- rating function) at these points, i.e., fðx1 Þ; . . . ; fðxm Þ, a radial
ever, such techniques have not been widely adopted in basis function rf;X estimates the values of f in the whole IRN ,
rating-based recommender systems. given rf;X ðxi Þ ¼ fðxi Þ for all i ¼ 1; . . . ; m, as
Once user and item profiles are built, the most general
2. Actually, the situation is a little more complicated than this because
ratings estimation function can be defined in terms of these the estimation of nearest neighbors may involve other values of matrix R for
profiles and the previously specified ratings as follows: Let some of the collaborative filtering methods.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
744 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005
Xm
rf;X ðxÞ ¼ i¼1
i ðkx  xi kÞ; ð17Þ two-dimensional User  Item recommendation methods to
multdimensional settings. In addition, [43] argued that the
where f 1 ; . . . ; m g are coefficients from IR, kxk is a norm inclusion of the knowledge about the user’s task into the
(e.g., L2 ), and  is a positive definite function, i.e., a function recommendation algorithm in certain applications can lead
satisfying the condition to better recommendations.
m X
X m   In order to take into consideration the contextual
i j ðxi  xj Þ > 0 ð18Þ information, [2] proposes defining the utility (or ratings)
i¼1 j¼1 function over a multidimensional space D1  . . .  Dn (as
opposed to the traditional two-dimensional User  Item
for all distinct points x1 ; . . . ; xm in IRN and all the coefficients
space) as
1 ; . . . ; m from IR. Then, a well-known theorem [92] states
that, if  is a positive definite function, then there exists a u : D1  . . .  Dn ! R: ð19Þ
unique function rf;X of the form (17) satisfying the
conditions rf;X ðxi Þ ¼ fðxi Þ for all i ¼ 1; . . . ; m. Some popular Then, a recommendation problem is defined by selecting
examples of positive definite functions  are: certain “what” dimensions Di1 ; . . . ; Dik ðk < nÞ and certain
“for whom” dimensions Dj1 ; . . . ; Djl ðl < nÞ that do not
1.ðrÞ ¼ r , where > 0 is a positive odd number, overlap, i.e., fDi1 ; . . . ; Dik g \ fDj1 ; . . . ; Djl g ¼ , and recom-
2.ðrÞ ¼ rk logðrÞ, where k 2 IN (thin-plate splines), mending, for each tuple ðdj1 ; . . . ; djl Þ 2 Dj1  . . .  Djl , the
and tuple ðdi1 ; . . . ; dik Þ 2 Di1  . . .  Dik that maximizes the
2
3. ðrÞ ¼ e r , where > 0 (Gaussian). utility uðd1 ; . . . ; dn Þ, i.e.,
One of the advantages of radial basis functions is that
they have been extensively studied in approximation theory 8ðdj1 ; . . . ; djl Þ 2 Dj1  . . .  Djl ;
and their theoretical properties and utilization of radial ðdi1 ; . . . ; dik Þ ¼ arg max uðd01 ; . . . ; d0n Þ: ð20Þ
basis functions in many practical applications have been ðd0i1 ;...;d0ik Þ2Di1 ...Dik
ðd0j1 ;...;d0jl Þ¼ðdj1 ;...;djl Þ
understood very well [16], [92]. Therefore, it should be
interesting to apply them to estimating unknown ratings in For example, in the case of a movie recommender system,
recommender systems. one needs to consider not only characteristics of the
One caveat with using radial basis functions in recom- movie d1 and of the person who wants to see the movie
mender systems, though, is that the recommendation space d2 , but also such contextual information as 1) d3 : where and
c ~
~ s does not usually constitute an N-dimensional Eu- how the movie will be seen (e.g., in the movie theater, at
clidean space IRN . Therefore, one research challenge is to home on TV, on video or DVD), 2) d4 : with whom the movie
extend radial basis methods from the real numbers to other will be seen (e.g., alone, with girlfriend/boyfriend, friends,
domains and apply them to recommender systems pro- parents, etc.), and 3) d5 : when will the movie be seen (e.g.,
blems. The applicability of other approximation methods on weekdays or weekends, in the morning/afternoon/
for estimating uij in (16) constitutes another interesting evening, during the opening night, etc.). As discussed
research topic. earlier, each of the components d1 , d2 , d3 , d4 , d5 can be
3.3 Multidimensionality of Recommendations defined as a vector of its characteristics and the overall
The current generation of recommender systems operates utility function uðd1 ; d2 ; d3 ; d4 ; d5 Þ can be quite complex and
in the two-dimensional User  Item space. That is, they take into consideration various interaction effects among
make their recommendations based only on the user and vectors d1 , d2 , d3 , d4 , d5 .
item information and do not take into consideration As was argued in [2], [3], many of the two-dimensional
additional contextual information that may be crucial in recommendation algorithms cannot be directly extended to
some applications. However, in many situations, the utility the multidimensional case. Furthermore, [3] proposes a
of a certain product to a user may depend significantly on reduction-based recommendation approach which uses only
time (e.g., the time of the year, such as season or month, or the ratings that pertain to the context of the user-specified
the day of the week). It may also depend on the person(s) criteria in which a recommendation is made. For example,
with whom the product will be consumed or shared and to recommend a movie to a person who wants to see it in a
under which circumstances. In such situations, it may not movie theater on a Saturday night, the reduction-based
be sufficient to simply recommend items to users; the approach would use only the available ratings of the movies
recommender system must take additional contextual seen in the movie theaters over the weekends, if it is
information, such as time, place, and the company of a determined from the data that the place and the time of the
user, into consideration when recommending a product.
week dimensions affect the moviegoers’ behavior. By
For example, when recommending a vacation package, the
selecting only the ratings relevant to a recommendation
system should also consider the time of the year, with
whom the user plans to travel, traveling conditions and context, the reduction-based approach projects the multdi-
restrictions at that time, and other contextual information. mensional cube of ratings on the two primary User and Item
As another example, a user can have significantly different dimensions. Then, any standard two-dimensional recom-
preferences for the types of movies she wants to see when mendation method described in Section 2 can be used to
she is going out to a movie theater with a boyfriend on a produce a recommendation. Since these recommendations
Saturday night as opposed to watching a rental movie at are based only on the context-specific set of ratings, this
home with her parents on a Wednesday evening. As was amounts to building a local model producing context-
argued in [2] and [3], it is important to extend traditional specific recommendations.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 745

Another possible approach to producing multdimen- for the maximum of fc ðrÞ. However, as with decor and
sional recommendations would be to deploy the hierarchical service ratings, we might not have the user’s food ratings
Bayesian method presented in [5], which can be extended fc ðrÞ for all such restaurants and, thus, will also need to use
from two to multdimensional case as follows: Instead of a rating estimation procedure for fc ðrÞ before making any
considering the two-dimensional case, as defined in (15), recommendations.
where user characteristics d1 are defined with vector zi and We believe that the problem of finding the Pareto-
item characteristics d2 with vector wj , we can also add optimal solution set and the iterative method of consecutive
contextual dimensions d3 ; . . . ; dn , where di ¼ ðdi1 ; . . . ; dixi Þ is single criterion optimizations for multcriteria recommenda-
a vector of characteristics for dimension Di . Then, the rating tion problems mentioned above should also constitute
function r ¼ uðd1 ; d2 ; . . . ; dn Þ is extended from (15) to the interesting and challenging problems.
linear combination of d1 ; d2 ; . . . ; dn and also includes inter-
action effects among these dimensions (i.e., interaction 3.5 Nonintrusiveness
effects, as defined by matrix fxij g in (15), should be extended Many recommender systems are intrusive in the sense that
to include other dimensions). One of the research challenges they require explicit feedback from the user and often at a
is to make these extensions scalable for large values of n. significant level of user involvement. For example, before
recommending any newsgroup articles, the system needs to
3.4 Multcriteria Ratings acquire the ratings of previously read articles and, often,
Most of the current recommender systems deal with single- many of them. Since it is impractical to elicit many ratings
criterion ratings, such as ratings of movies and books. of these articles from the user, some recommender systems
However, in some applications, such as restaurant recom- use nonintrusive rating determination methods where
menders, it is crucial to incorporate multcriteria ratings into certain proxies are used to estimate real ratings. For
recommendation methods. For example, many restaurant example, the amount of time a user spends reading a
guides, such as Zagat’s Guide, provide three criteria for newsgroup article can serve as a proxy of the article’s rating
restaurant ratings: food, decor, and service. Although given by this user. Some nonintrusive methods of getting
multicriteria ratings have not yet been examined in the user feedback are presented in [18], [53], [66], [74], [94].
recommender systems literature, they have been exten- However, nonintrusive ratings (such as time spent reading
sively studied in the Operations Research community [33], an article) are often inaccurate and cannot fully replace
[102]. Typical solutions to the multcriteria optimization explicit ratings provided by the user. Therefore, the
problems include: problem of minimizing intrusiveness while maintaining
certain levels of accuracy of recommendations needs to be
1. finding Pareto optimal solutions, addressed by the recommender systems researchers.
2. taking a linear combination of multiple criteria and One way to explore the intrusiveness problem is to
reducing the problem to a single-criterion optimiza- determine an optimal number of ratings the system should
tion problem, ask from a new user. For example, before recommending
3. optimizing the most important criterion and con- any movies, MovieLens.org first asks the user to rate a
verting other criteria to constraints, and predefined number of movies (e.g., 20). This request incurs
4. consecutively optimizing one criterion at a time, certain costs on the end-user that can be modeled in various
converting an optimal solution to constraint(s), and ways, the simplest model being a fixed-cost model (i.e., the
repeating the process for other criteria. cost of rating each movie is C and the cost of rating n movies
An example of the latter approach is the method of is C  n). Then, the intrusiveness problem can be formulated
successive concessions [102]. as an optimization problem that tries to find an optimal
To illustrate how some of these methods can be used in number of initial rating requests n as follows: Each
recommender systems, consider the application of ap- additional rating supplied by the user increases the
proach 3 to the problem of recommending restaurants r to accuracy of recommendations (or any other effectiveness
user c based on the user’s criteria of food quality fc ðrÞ, decor measure) and, therefore, results in certain benefits for the
dc ðrÞ, and service sc ðrÞ. We can take food quality fc ðrÞ to be user. One interesting intrusiveness-related research pro-
the primary criterion and use others as constraints, i.e., we blem would be to develop formal models for defining and
want to find restaurants r that maximize fc ðrÞ, subject to the measuring benefit BðnÞ of supplying n initial ratings in
constraints dc ðrÞ > c and sc ðrÞ > c , where c and c are terms of the increased accuracy of predictions based on
minimal ratings for decor and service (e.g., user c will not these ratings. Once it is known how to measure benefits
go to any restaurant having decor and service ratings below BðnÞ (e.g., by measuring the predictive accuracy of a
10, out of possible 30, regardless of the quality of food recommender system), we need to determine an optimal
there). This problem is complicated by the fact that we number of initial ratings n that maximizes expression
usually will not have the user’s decor dc ðrÞ and service BðnÞ  C  n. Clearly, the optimal value of n is reached
sc ðrÞ ratings for all the restaurants. Then, the task of a when marginal benefits are equal to marginal costs, i.e.,
recommender system is to estimate unknown ratings d0c ðrÞ when BðnÞ ¼ C. The optimal solution should exist under
and s0c ðrÞ, e.g., using the rating estimation methods the assumption that BðnÞ is a monotonically increasing
described in Section 2, and find all the restaurants r function in n with decreasing marginal benefits BðnÞ that
satisfying constraints d0c ðrÞ > c and s0c ðrÞ > c . Once we asymptotically converge to zero.
find all the restaurants satisfying the constraints with these Another interesting research opportunity lies in devel-
estimated ratings, we can use those restaurants in searching oping marginal cost models that are more advanced than

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
746 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

the fixed-cost model described above and that can poten- measures can be either statistical or decision-support [41].
tially include a cost/benefit analysis of using both implicit Statistical accuracy metrics mainly compare the estimated
and explicit ratings in a recommender system. ratings (e.g., as defined in (16)) against the actual ratings R
Finally, the issue of incrementally selecting good training in the User  Item matrix and include Mean Absolute Error
data for modeling purposes is the problem of active learning, (MAE), root mean squared error, and correlation between
which is a fairly well-studied area in the machine learning predictions and ratings. Decision-support measures deter-
literature, and numerous approaches have been proposed mine how well a recommender system can make predic-
to addressing this problem [23], [24], [36], [58]. We believe tions of high-relevance items (i.e., items that would be rated
highly by the user). They include classical IR measures of
that applying active learning methods to address the
precision (the percentage of truly “high” ratings among
nonintrusiveness issue constitutes another interesting re-
those that were predicted to be “high” by the recommender
search opportunity.
system), recall (the percentage of correctly predicted “high”
3.6 Flexibility ratings among all the ratings known to be “high”), F-
measure (a harmonic mean of precision and recall), and
Most of the recommendation methods are inflexible in the
Receiver Operating Characteristic (ROC) measure demon-
sense that they are “hard-wired” into the systems by the
strating the trade-off between true positive and false positive
vendors and, therefore, support only a predefined and fixed rates in recommender systems [41].
set of recommendations. Therefore, the end-user cannot Although popular, these empirical evaluation measures
customize recommendations according to his or her needs have certain limitations. One limitation is that these
in real time. This problem has been identified in [2] and measures are typically performed on test data that the
Recommendation Query Language (RQL) has been pro- users chose to rate. However, items that users choose to rate
posed to address it [2]. RQL is SQL-like language for are likely to constitute a skewed sample, e.g., users may rate
expressing flexible user-specified recommendation re- mostly the items that they like. In other words, the
quests. For example, the request “recommend to each user empirical evaluation results typically only show how
from New York the best three movies that are longer than accurate the system is on items the user decided to rate,
two hours” can be expressed in RQL as: whereas the ability of the system to properly evaluate a
random item (which it should be able to do during its
RECOMMEND Movie TO User normal real-life use) is not tested. Understandably, it is
BASED ON Rating expensive and time-consuming to conduct controlled
SHOW TOP 3 experiments with users in the recommender systems
FROM MovieRecommender settings, therefore, the experiments that test recommenda-
WHERE Movie.Length > 120 tion quality on an unbiased random sample are rare, e.g.,
AND User.City = “New York”. [69]. However, high-quality experiments are necessary in
order to truly understand the benefits and limitations of the
Also, most of the recommender systems recommend
proposed recommendation techniques.
only individual items to individual users and do not deal
In addition, although crucial for measuring the
with aggregation. However, it is important to be able to
accuracy of recommendations, the technical measures
provide aggregated recommendations in a number of
mentioned earlier often do not adequately capture
applications, such as recommend brands or categories of
“usefulness” and “quality” of recommendations. For
products to certain segments of users. For example, a travel-
example, as [107] observes for a supermarket application,
related recommender system may want to recommend
recommending obvious items (such as milk or bread)
vacations in Florida (category of products) to the under-
that the consumer will buy anyway will produce high
graduate students from the Northeast (user segment) accuracy rates; however, it will not be very helpful to the
during the spring break. One way to support aggregated consumer. Therefore, it is also important to develop
recommendations is by utilizing the OLAP-based approach economics-oriented measures that capture the business
[19] to multidimensional recommendations. OLAP-based value of recommendations, such as return on investments
systems naturally support aggregation hierarchies and the (ROI) and customer lifetime value (LTV) measures [32],
initial approaches to deploying OLAP-based methods in [88], [95]. Developing and studying the measures that
recommender systems are presented in [2], [3]. However, would remedy the limitations described in this section
more work is required to develop a more comprehensive constitutes an interesting and important research topic.
understanding of how to use the OLAP approach in
recommender systems and this constitutes an interesting 3.8 Other Extensions
and challenging research problem. Other important research issues that have been explored in
the recommender systems literature include explainability
3.7 Effectiveness of Recommendations [12], [42], trustworthiness [28], scalability [4], [39], [91], [93],
The problem of developing good metrics to measure the and privacy [82], [93] issues of recommender systems.
effectiveness of recommendations has been extensively However, we will not review this work and will not discuss
addressed in the recommender systems literature. Some research opportunities in these areas because of space
examples of this work include [41], [44], [69], [107]. In most limitation.
of the recommender systems literature, the performance
evaluation of recommendation algorithms is usually done
in terms of coverage and accuracy metrics. Coverage 4 CONCLUSIONS
measures the percentage of items for which a recommender Recommender systems made significant progress over the
system is capable of making predictions [41]. Accuracy last decade when numerous content-based, collaborative,

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 747

and hybrid methods were proposed and several “indus- [18] A. Caglayan, M. Snorrason, J. Jacoby, J. Mazzu, R. Jones, and K.
Kumar, “Learn Sesame—A Learning Agent Engine,” Applied
trial-strength” systems have been developed. However, Artificial Intelligence, vol. 11, pp. 393-412, 1997.
despite all of these advances, the current generation of [19] S. Chaudury and U. Dayal, “An Overview of Data Warehousing
recommender systems surveyed in this paper still requires and OLAP Technology,” ACM SIGMOD Record, vol. 26, no. 1,
pp. 65-74, 1997.
further improvements to make recommendation methods [20] Y.-H. Chien and E.I. George, “A Bayesian Model for Collaborative
more effective in a broader range of applications. In this Filtering,” Proc. Seventh Int’l Workshop Artificial Intelligence and
paper, we reviewed various limitations of the current Statistics, 1999.
[21] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and
recommendation methods and discussed possible exten- M. Sartin, “Combining Content-Based and Collaborative Filters in
sions that can provide better recommendation capabilities. an Online Newspaper,” Proc. ACM SIGIR ’99 Workshop Recom-
These extensions include, among others, the improved mender Systems: Algorithms and Evaluation, Aug. 1999.
[22] W.W. Cohen, R.E. Schapire, and Y. Singer, “Learning to Order
modeling of users and items, incorporation of the con- Things,” J. Artificial Intelligence Research, vol. 10, pp. 243-270, 1999.
textual information into the recommendation process, [23] D. Cohn, L. Atlas, and R. Ladner, “Improving Generalization with
support for multcriteria ratings, and provision of a more Active Learning,” Machine Learning, vol. 15, no. 2, pp. 201-221,
1994.
flexible and less intrusive recommendation process. We
[24] D. Cohn, Z. Ghahramani, and M. Jordan, “Active Learning with
hope that the issues presented in this paper will advance Statistical Models,” J. Artificial Intelligence Research, vol. 4, pp. 129-
the discussion in the recommender systems community 145, 1996.
about the next generation of recommendation technologies. [25] M. Condliff, D. Lewis, D. Madigan, and C. Posse, “Bayesian
Mixed-Effects Models for Recommender Systems,” Proc. ACM
SIGIR ’99 Workshop Recommender Systems: Algorithms and Evalua-
tion, Aug. 1999.
REFERENCES [26] C. Cortes, K. Fisher, D. Pregibon, A. Rogers, and F. Smith,
[1] G. Adomavicius and A. Tuzhilin, “Expert-Driven Validation of “Hancock: A Language for Extracting Signatures from Data
Rule-Based User Models in Personalization Applications,” Data Streams,” Proc. Sixth ACM SIGKDD Int’l Conf. Knowledge Discovery
Mining and Knowledge Discovery, vol. 5, nos. 1 and 2, pp. 33-58, and Data Mining, 2000.
2001a. [27] J. Delgado and N. Ishii, “Memory-Based Weighted-Majority
[2] G. Adomavicius and A. Tuzhilin, “Multidimensional Recommen- Prediction for Recommender Systems,” Proc. ACM SIGIR ’99
der Systems: A Data Warehousing Approach,” Proc. Second Int’l Workshop Recommender Systems: Algorithms and Evaluation, 1999.
Workshop Electronic Commerce (WELCOM ’01), 2001b. [28] C. Dellarocas, “The Digitization of Word of Mouth: Promise and
[3] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin, Challenges of Online Feedback Mechanisms,” Management Science,
“Incorporating Contextual Information in Recommender Systems vol. 49, no. 10, pp. 1407-1424, 2003.
Using a Multidimensional Approach,” ACM Trans. Information [29] M. Deshpande and G. Karypis, “Item-Based Top-N Recommenda-
Systems, vol. 23, no. 1, Jan. 2005. tion Algorithms,” ACM Trans. Information Systems, vol. 22, no. 1,
[4] C.C. Aggarwal, J.L. Wolf, K-L. Wu, and P.S. Yu, “Horting Hatches pp. 143-177, 2004.
an Egg: A New Graph-Theoretic Approach to Collaborative
[30] J. Duchon, “Splines Minimizing Rotation-Invariate Semi-Norms in
Filtering,” Proc. Fifth ACM SIGKDD Int’l Conf. Knowledge Discovery
Sobolev Spaces,” Constructive Theory of Functions of Several
and Data Mining, Aug. 1999.
Variables, W. Schempp and Zeller, ed., pp. 85-100, Springer, 1979.
[5] A. Ansari, S. Essegaier, and R. Kohli, “Internet Recommendations
Systems,” J. Marketing Research, pp. 363-375, Aug. 2000. [31] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. John
[6] J.S. Armstrong, Principles of Forecasting—A Handbook for Researchers Wiley & Sons, 2001.
and Practitioners. Kluwer Academic, 2001. [32] F.R. Dwyer, “Customer Lifetime Valuation to Support Marketing
[7] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Decision Making,” J. Direct Marketing, vol. 3, no. 4, 1989.
Addison-Wesley, 1999. [33] M. Ehrgott, Multicriteria Optimization. Springer Verlag, Sept. 2000.
[8] M. Balabanovic and Y. Shoham, “Fab: Content-Based, Collabora- [34] T. Fawcett and F. Provost, “Combining Data Mining and Machine
tive Recommendation,” Comm. ACM, vol. 40, no. 3, pp. 66-72, Learning for Efficient User Profiling,” Proc. Second Int’l Conf.
1997. Knowledge Discovery and Data Mining (KDD-96), 1996.
[9] C. Basu, H. Hirsh, and W. Cohen, “Recommendation as [35] Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer, “An Efficient
Classification: Using Social and Content-Based Information in Boosting Algorithm for Combining Preferences,” Proc. 15th Int’l
Recommendation,” Recommender Systems. Papers from 1998 Work- Conf. Machine Learning, 1998.
shop, Technical Report WS-98-08, AAAI Press 1998. [36] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby, “Selective
[10] N. Belkin and B. Croft, “Information Filtering and Information Sampling Using the Query by Committee Algorithm,” Machine
Retrieval,” Comm. ACM, vol. 35, no. 12, pp. 29-37, 1992. Learning, vol. 28, nos. 2-3, pp. 133-168, 1997.
[11] D. Billsus and M. Pazzani, “Learning Collaborative Information [37] L. Getoor and M. Sahami, “Using Probabilistic Relational Models
Filters,” Proc. Int’l Conf. Machine Learning, 1998. for Collaborative Filtering,” Proc. Workshop Web Usage Analysis and
[12] D. Billsus and M. Pazzani, “A Personal News Agent that Talks, User Profiling (WEBKDD ’99), Aug. 1999.
Learns and Explains,” Proc. Third Ann. Conf. Autonomous Agents,
[38] D. Goldberg, D. Nichols, B.M. Oki, and D. Terry, “Using
1999.
Collaborative Filtering to Weave an Information Tapestry,” Comm.
[13] D. Billsus and M. Pazzani, “User Modeling for Adaptive News
ACM, vol. 35, no. 12, pp. 61-70, 1992.
Access,” User Modeling and User-Adapted Interaction, vol. 10, nos. 2-
3, pp. 147-180, 2000. [39] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, “Eigentaste: A
[14] D. Billsus, C.A. Brunk, C. Evans, B. Gladish, and M. Pazzani, Constant Time Collaborative Filtering Algorithm,” Information
“Adaptive Interfaces for Ubiquitous Web Access,” Comm. ACM, Retrieval J., vol. 4, no. 2, pp. 133-151, July 2001.
vol. 45, no. 5, pp. 34-38, 2002. [40] N. Good, J.B. Schafer, J.A. Konstan, A. Borchers, B. Sarwar, J.L.
[15] J.S. Breese, D. Heckerman, and C. Kadie, “Empirical Analysis of Herlocker, and J. Riedl, “Combining Collaborative Filtering with
Predictive Algorithms for Collaborative Filtering,” Proc. 14th Conf. Personal Agents for Better Recommendations,” Proc. Conf. Am.
Uncertainty in Artificial Intelligence, July 1998. Assoc. Artificial Intelligence (AAAI-99), pp. 439-446, July 1999.
[16] M.D. Buhmann, “Approximation and Interpolation with Radial [41] J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl, “An
Functions,” Multivariate Approximation and Applications, N. Dyn, Algorithmic Framework for Performing Collaborative Filtering,”
D. Leviatan, D. Levin, and A. Pinkus, eds., Cambridge Univ. Proc. 22nd Ann. Int’l ACM SIGIR Conf. Research and Development in
Press, 2001. Information Retrieval (SIGIR ’99), 1999.
[17] R. Burke, “Knowledge-Based Recommender Systems,” Encyclope- [42] J.L. Herlocker, J.A. Konstan, and J. Riedl, “Explaining Collabora-
dia of Library and Information Systems, A. Kent, ed., vol. 69, tive Filtering Recommendations,” Proc. ACM Conf. Computer
Supplement 32, Marcel Dekker, 2000. Supported Cooperative Work, 2000.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
748 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005

[43] J.L. Herlocker and J.A. Konstan, “Content-Independent Task- [68] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, “Discovery and
Focused Recommendation,” IEEE Internet Computing, vol. 5, no. 6, Evaluation of Aggregate Usage Profiles for Web Personalization,”
pp. 40-47, Nov./Dec. 2001. Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 61-82, 2002.
[44] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl, [69] R.J. Mooney and L. Roy, “Content-Based Book Recommending
“Evaluating Collaborative Filtering Recommender Systems,” Using Learning for Text Categorization,” Proc. ACM SIGIR ’99
ACM Trans. Information Systems, vol. 22, no. 1, pp. 5-53, 2004. Workshop Recommender Systems: Algorithms and Evaluation, 1999.
[45] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, “Recommending [70] R.J. Mooney, P.N. Bennett, and L. Roy, “Book Recommending
and Evaluating Choices in a Virtual Community of Use,” Proc. Using Text Categorization with Extracted Information,” Proc.
Conf. Human Factors in Computing Systems, 1995. Recommender Systems Papers from 1998 Workshop, Technical Report
[46] T. Hofmann, “Probabilistic Latent Semantic Analysis,” Proc. 15th WS-98-08, 1998.
Conf. Uncertainty in Artificial Intelligence, pp. 289-296, 1999. [71] B.P.S. Murthi and S. Sarkar, “The Role of the Management
[47] T. Hofmann, “Collaborative Filtering via Gaussian Probabilistic Sciences in Research on Personalization,” Management Science,
Latent Semantic Analysis,” Proc. 26th Ann. Int’l ACM SIGIR Conf., vol. 49, no. 10, pp. 1344-1362, 2003.
2003. [72] A. Nakamura and N. Abe, “Collaborative Filtering Using
[48] T. Hofmann, “Latent Semantic Models for Collaborative Filter- Weighted Majority Prediction Algorithms,” Proc. 15th Int’l Conf.
ing,” ACM Trans. Information Systems, vol. 22, no. 1, pp. 89-115, Machine Learning, 1998.
2004. [73] G. Nurnberger, Approximation by Spline Functions. Springer-Verlag,
[49] Z. Huang, H. Chen, and D. Zeng, “Applying Associative Retrieval 1989.
Techniques to Alleviate the Sparsity Problem in Collaborative [74] D.W. Oard and J. Kim, “Implicit Feedback for Recommender
Filtering,” ACM Trans. Information Systems, vol. 22, no. 1, pp. 116- Systems,” Proc. Recommender Systems. Papers from 1998 Workshop,
142, 2004. Technical Report WS-98-08, 1998.
[50] D.A. Hull, “The TREC-7 Filtering Track: Description and [75] D. Pavlov and D. Pennock, “A Maximum Entropy Approach to
Analysis,” Proc. Seventh Text Retrieval Conf. (TREC-7), 1999. Collaborative Filtering in Dynamic, Sparse, High-Dimensional
[51] R. Jin, L. Si, and C. Zhai, “Preference-Based Graphic Models for Domains,” Proc. 16th Ann. Conf. Neural Information Processing
Collaborative Filtering,” Proc. 19th Conf. Uncertainty in Artificial Systems (NIPS ’02), 2002.
Intelligence (UAI 2003), Aug. 2003a. [76] M. Pazzani, “A Framework for Collaborative, Content-Based, and
Demographic Filtering, Artificial Intelligence Rev., pp. 393-408, Dec.
[52] R. Jin, L. Si, C. Zhai, and J. Callan, “Collaborative Filtering with
1999.
Decoupled Models for Preferences and Ratings,” Proc. 12th Int’l
Conf. Information and Knowledge Management (CIKM 2003), Nov. [77] M. Pazzani and D. Billsus, “Learning and Revising User Profiles:
2003b. The Identification of Interesting Web Sites,” Machine Learning,
vol. 27, pp. 313-331, 1997.
[53] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon,
and J. Riedl, “GroupLens: Applying Collaborative Filtering to [78] C.C. Peddy and D. Armentrout, Building Solutions with Microsoft
Usenet News,” Comm. ACM, vol. 40, no. 3, pp. 77-87, 1997. Commerce Server 2002. Microsoft Press, 2003.
[79] D.M. Pennock and E. Horvitz, “Collaborative Filtering by
[54] J.A. Konstan, J. Riedl, A. Borchers, and J.L. Herlocker, “Recom-
Personality Diagnosis: A Hybrid Memory And Model-Based
mender Systems: A GroupLens Perspective,” Proc. Recommender
Approach,” Proc. Int’l Joint Conf. Artificial Intelligence Workshop:
Systems, Papers from 1998 Workshop, Technical Report WS-98-08,
Machine Learning for Information Filtering, Aug. 1999.
1998.
[80] A. Popescul, L.H. Ungar, D.M. Pennock, and S. Lawrence,
[55] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins,
“Probabilistic Models for Unified Collaborative and Content-
“Recommendation Systems: A Probabilistic Analysis,” J. Computer
Based Recommendation in Sparse-Data Environments,” Proc. 17th
and System Sciences, vol. 63, no. 1, pp. 42-61, 2001.
Conf. Uncertainty in Artificial Intelligence, 2001.
[56] K. Lang, “Newsweeder: Learning to Filter Netnews,” Proc. 12th
[81] M.J.D. Powell, Approximation Theory and Methods. Cambridge
Int’l Conf. Machine Learning, 1995.
Univ. Press, 1981.
[57] W.S. Lee, “Collaborative Learning for Recommender Systems,” [82] N. Ramakrishnan, B.J. Keller, B.J. Mirza, A.Y. Grama, and G.
Proc. Int’l Conf. Machine Learning, 2001. Karypis, “Privacy Risks in Recommender Systems,” IEEE Internet
[58] D. Lewis and J. Catlett, “Heterogeneous Uncertainty Sampling for Computing, vol. 5, no. 6, pp. 54-62, Nov./Dec. 2001.
Supervised Learning,” Proc. 11th Int’l Conf. Machine Learning, [83] A.M. Rashid, I. Albert, D. Cosley, S.K. Lam, S.M. McNee, J.A.
pp. 148-156, 1994. Konstan, and J. Riedl, “Getting to Know You: Learning New User
[59] J. Li and O.R. Zaı̈ane, “Combining Usage, Content, and Structure Preferences in Recommender Systems,” Proc. Int’l Conf. Intelligent
Data to Improve Web Site Recommendation,” Proc. Fifth Int’l Conf. User Interfaces, 2002.
Electronic Commerce and Web Technologies (EC-Web ’04), pp. 305-315, [84] S. Robertson and S. Walker, “Threshold Setting in Adaptive
2004. Filtering,” J. Documentation, vol. 56, pp. 312-331, 2000.
[60] G.L. Lilien, P. Kotler, and K.S. Moorthy, Marketing Models. Prentice [85] J.J. Rocchio, “Relevance Feedback in Information Retrieval,”
Hall, 1992. SMART Retrieval System—Experiments in Automatic Document
[61] G. Linden, B. Smith, and J. York, “Amazon.com Recommenda- Processing, G. Salton, ed., chapter 14, Prentice Hall, 1971.
tions: Item-to-Item Collaborative Filtering,” IEEE Internet Comput- [86] P. Resnick, N. Iakovou, M. Sushak, P. Bergstrom, and J. Riedl,
ing, Jan./Feb. 2003. “GroupLens: An Open Architecture for Collaborative Filtering of
[62] N. Littlestone and M. Warmuth, “The Weighted Majority Netnews,” Proc. 1994 Computer Supported Cooperative Work Conf.,
Algorithm,” Information and Computation, vol. 108, no. 2, pp. 212- 1994.
261, 1994. [87] E. Rich, “User Modeling via Stereotypes,” Cognitive Science, vol. 3,
[63] H. Mannila, H. Toivonen, and A.I. Verkamo, “Discovering no. 4, pp. 329-354, 1979.
Frequent Episodes in Sequences,” Proc. First Int’l Conf. Knowledge [88] S. Rosset, E. Neumann, U. Eick, N. Vatnik, and Y. Idan, “Customer
Discovery and Data Mining (KDD-95), 1995. Lifetime Value Modeling and Its Use for Customer Retention
[64] B. Marlin, “Modeling User Rating Profiles for Collaborative Planning,” Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge
Filtering,” Proc. 17th Ann. Conf. Neural Information Processing Discovery and Data Mining (KDD-2002), July 2002.
Systems (NIPS ’03), 2003. [89] G. Salton, Automatic Text Processing. Addison-Wesley, 1989.
[65] P. Melville, R.J. Mooney, and R. Nagarajan, “Content-Boosted [90] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Application of
Collaborative Filtering for Improved Recommendations,” Proc. Dimensionality Reduction in Recommender Systems—A Case
18th Nat’l Conf. Artificial Intelligence, 2002. Study,” Proc. ACM WebKDD Workshop, 2000.
[66] S.E. Middleton, N.R. Shadbolt, and D.C. de Roure, “Ontological [91] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-Based
User Profiling in Recommender Systems,” ACM Trans. Information Collaborative Filtering Recommendation Algorithms,” Proc. 10th
Systems, vol. 22, no. 1, pp. 54-88, 2004. Int’l WWW Conf., 2001.
[67] B.N. Miller, I. Albert, S.K. Lam, J.A. Konstan, and J. Riedl, [92] R. Schaback and H. Wendland, “Characterization and Construc-
“MovieLens Unplugged: Experiences with an Occasionally Con- tion of Radial Basis Functions,” Multivariate Approximation and
nected Recommender System,” Proc. Int’l Conf. Intelligent User Applications, N. Dyn, D. Leviatan, D. Levin, and A. Pinkus, eds.,
Interfaces, 2003. Cambridge Univ. Press, 2001.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.
ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 749

[93] J.B. Schafer, J.A. Konstan, and J. Riedl, “E-Commerce Recommen- Gediminas Adomavicius received the PhD
dation Applications,” Data Mining and Knowledge Discovery, vol. 5, degree in computer science from New York
nos. 1/2, pp. 115-153, 2001. University in 2002. He is an assistant professor
[94] A.I. Schein, A. Popescul, L.H. Ungar, and D.M. Pennock, in the Department of Information and Decision
“Methods and Metrics for Cold-Start Recommendations,” Proc. Sciences at the Carlson School of Management,
25th Ann. Int’l ACM SIGIR Conf., 2002. University of Minnesota. Dr. Adomavicius’ re-
[95] D.C. Schmittlein, D.G. Morrison, and R. Colombo, “Counting search focuses on personalization technologies,
Your Customers: Who Are They and What Will They Do Next?” data mining, and combinatorial auction mechan-
Management Science, vol. 33, no. 1, 1987. isms. He has published more than 20 refereed
[96] G. Shani, R. Brafman, and D. Heckerman, “An MDP-Based journal and conference papers in these areas.
Recommender System,” Proc. 18th Conf. Uncertainty in Artificial He is a member of the ACM, the IEEE, and the IEEE Computer Society.
Intelligence, Aug. 2002.
[97] U. Shardanand and P. Maes, “Social Information Filtering:
Algorithms for Automating ‘Word of Mouth’,” Proc. Conf. Human Alexander Tuzhilin received the PhD degree in
Factors in Computing Systems, 1995. computer science from the Courant Institute of
[98] B. Sheth and P. Maes, “Evolving Agents for Personalized Mathematical Sciences, New York University
Information Filtering,” Proc. Ninth IEEE Conf. Artificial Intelligence (NYU). He is currently an associate professor of
for Applications, 1993. information systems at the Stern School of
[99] L. Si and R. Jin, “Flexible Mixture Model for Collaborative Business, NYU. His current research interests
Filtering,” Proc. 20th Int’l Conf. Machine Learning, Aug. 2003. include knowledge discovery in databases,
[100] I. Soboroff and C. Nicholas, “Combining Content and Collabora- personalization, and CRM technologies. He
tion in Text Filtering,” Proc. Int’l Joint Conf. Artificial Intelligence has published widely in leading IEEE Computer
Workshop: Machine Learning for Information Filtering, Aug. 1999. Society, ACM, and IS journals and conference
[101] G. Somlo and A. Howe, “Adaptive Lightweight Text Filtering,” proceedings and served on program committees of numerous IEEE
Proc. Fourth Int’l Symp. Intelligent Data Analysis, 2001. Computer Society and IS conferences. Dr. Tuzhilin was as a cochair of
[102] R.B. Statnikov and J.B. Matusov, Multicriteria Optimization and the Third IEEE International Conference on Data Mining in 2003. He
Engineering. Chapman & Hall, 1995. currently serves on the editorial boards of the IEEE Transactions on
[103] L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter, Knowledge and Data Engineering, the Data Mining and Knowledge
“PHOAKS: A System for Sharing Recommendations,” Comm. Discovery Journal, the INFORMS Journal on Computing, and the
ACM, vol. 40, no. 3, pp. 59-62, 1997. Electronic Commerce Research Journal. He is a member of the IEEE.
[104] T. Tran and R. Cohen, “Hybrid Recommender Systems for
Electronic Commerce,” Proc. Knowledge-Based Electronic Markets,
Papers from the AAAI Workshop, Technical Report WS-00-04, AAAI
. For more information on this or any other computing topic,
Press, 2000.
please visit our Digital Library at www.computer.org/publications/dlib.
[105] L.H. Ungar and D.P. Foster, “Clustering Methods for Collabora-
tive Filtering,” Proc. Recommender Systems, Papers from 1998
Workshop, Technical Report WS-98-08 1998.
[106] W. Wade, “A Grocery Cart that Holds Bread, Butter, and
Preferences,” New York Times, Jan. 16, 2003.
[107] Y. Yang and B. Padmanabhan, “On Evaluating Online Persona-
lization,” Proc. Workshop Information Technology and Systems, pp. 35-
41, Dec. 2001.
[108] K. Yu, X. Xu, J. Tao, M. Ester, and H.-P. Kriegel, “Instance
Selection Techniques for Memory-Based Collaborative Filtering,”
Proc. Second SIAM Int’l Conf. Data Mining (SDM ’02), 2002.
[109] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel,
“Probabilistic Memory-Based Collaborative Filtering,” IEEE Trans.
Knowledge and Data Eng., vol. 16, no. 1, pp. 56-69, Jan. 2004.
[110] Proc. WEBKDD 2002—Mining Web Data for Discovering Usage
Patterns and Profiles, O.R. Zaı̈ane, J. Srivastava, M. Spiliopoulou,
B. M. Masand, eds., 2003.
[111] Y. Zhang and J. Callan, “Maximum Likelihood Estimation for
Filtering Thresholds,” Proc. 24th Ann. Int’l ACM SIGIR Conf., 2001.
[112] Y. Zhang, J. Callan, and T. Minka, “Novelty and Redundancy
Detection in Adaptive Filtering,” Proc. 25th Ann. Int’l ACM SIGIR
Conf., pp. 81-88, 2002.

Authorized licensd use limted to: IE Xplore. Downlade on May 10,2 at 19:023 UTC from IE Xplore. Restricon aply.

You might also like