
2007 IFIP International Conference on Network and Parallel Computing - Workshops

Immune-Inspired Collaborative Filtering Technology for Rating-Based Recommendation System


YUE Xun*, LI Quan-zhong
College of Information Sciences and Engineering, Shandong Agricultural University, Taian, Shandong, China 271018
(E-mail: yuexun@sdau.edu.cn)

Abstract
A novel technology inspired by the adaptive and self-organizing nature of the immune system is applied to rating-based recommendation. Unlike the usual vector-space model, a multiple-interests model of users is introduced to represent antigens and antibodies. The artificial immune network model, a type of competitive learning algorithm, is capable of extracting the relevant features contained in the antigens; useful predictions and recommendations are then made from the memory antibody cells, which represent an internal image of the antigens. The approach provides better recommendation quality by alleviating the data sparsity problem, and experimental results indicate its effectiveness and wide applicability. Its main expected advantage is ease of adaptation to a dynamic environment.

1. Introduction
Rating-based collaborative filtering (CF) technology aggregates user preferences, usually a set of votes on items, to find similarities between users based on their ratings [1]. Different machine learning algorithms, such as Bayesian networks [2], clustering [3] and rule-based approaches [4], are used to combine the preferences of neighbors to produce a prediction or a top-N recommendation for the active user. Unfortunately, traditional collaborative filtering systems cannot make accurate recommendations because of the sparsity problem; to address it, researchers have proposed dimensionality reduction of the user-item matrix [5] and computing the similarity among users with a neural network or association rules [6]. Another major technical limitation of CF is the ability of a recommendation system to adapt to an environment in which users have many completely different interests or items have completely different content.

Steve Cayzer and Uwe Aickelin [7][8] have presented an artificial immune system (AIS) applied to film recommendation by collaborative filtering; their AIS model is based on ARB network ideas. However, Aickelin's model suffers from poor scalability: its computational cost grows quickly in both time and space as the number of users increases, because the entire existing data set must be maintained and analyzed repeatedly whenever new user ratings are added. In this paper, we provide an alternative approach to rating-based collaborative filtering based on an artificial immune network, which produces high-quality recommendations by defining similarity neighbors. Unlike the conventional approach, in which only user-item matrix data are used, users who have voted represent the set of antigens to be recognized, a multiple-interests model of users is introduced to represent antigens and antibodies, and an immune clone algorithm is applied.
An artificial immune network model inspired by the adaptive and self-organizing nature of the immune system is capable of extracting the relevant features contained in the antigens, and useful predictions and recommendations for the active user are made from the internal image of the antigens. As with the natural immune system, its strongest expected advantage over current approaches is ease of adaptation to a dynamic environment; it also provides better recommendation quality by alleviating the data sparsity problem.

2. Methodology
2.1. Overall procedure
The proposed method consists of the following phases. First, users who have voted represent the antigens to be recognized; unlike the widely used vector-space model of users and items, a multiple-interests model of users is introduced to represent antigens and antibodies. Second, the artificial immune network model extracts the relevant features contained in the antigens. As suggested by clonal selection theory, the recognition of an antigen results in cell proliferation, mutation and selection, while the recognition of antibodies within the network itself results in network suppression; the network size is therefore controlled by the dynamic and metadynamic processes of the immune network [9]. The network cells (antibodies) represent an internal image of the antigens, new antigens can be added incrementally to adjust the model, and this internal image contributes to recommendation quality. Third, useful predictions and recommendations are made from the antibodies of the internal image; finally, a top-N list of movies is generated as a recommendation for the active user.

0-7695-2943-7/07 $25.00 © 2007 IEEE, DOI 10.1109/NPC.2007.24

2.2. Representation of antigen and antibody

The most obvious representation of a user profile is an n × m user-item matrix containing the historical purchasing information of n customers on m items: each row is a vector whose length is the number of items, whose positions are the item identifiers and whose entries are the votes. In fact, users clearly have many completely different interests, which we call multiple interests, as an analysis of the category information of the MovieLens dataset (Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, etc.) shows. On the other hand, a movie can also have multiple interest attributes; for example, Star Wars belongs to Action, Adventure, Fantasy and Sci-Fi. Unlike the vector-space model of users and items, our model first uses formula (1) to calculate the interest degree of each interest attribute, based on the unique identifiers of the movies the user has rated:

$$\mathrm{Degree}(i) = \frac{m(i)}{N}, \qquad i = 1, 2, \ldots, n \qquad (1)$$

where m(i) is the number of movies the user has voted for in the i-th interest attribute (the rating of a counted vote is not less than 2), N is the total number of movies the user has voted for, and n is the number of interest attributes. Secondly, the representation of antigen and antibody, which describes the voting behavior of a user, is defined in Table 1.

Table 1. Representation of antigen and antibody
Parameters | Descriptions
User id | Integer, such as 1, 2, ..., 943
Age | Integer, such as 20, 21, 22, ...
Gender | 0 = M, 1 = F
Occupation | Integer, such as 1, 2, ..., 21, meaning administrator, artist, doctor, ..., writer
Degree in the i-th interest attribute | One value for each interest: unknown, Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western

Example 2: the second user has user id 12, age 28, gender F and occupation "other". The user's votes, given as (movie id, rating) pairs, are: {(4,5); (69,5); (88,5); (98,5); (127,4); (133,4); (159,4); (161,5); (168,4); (170,4); (174,5); (195,4); (202,4); (202,3); (215,4); (216,5); (228,4); (238,4); (242,5); (328,4); (392,4); (416,3); (480,4); (591,5); (708,38)}. Table 2 shows this user's voting behavior grouped by interest.

Table 2. Behavior of user vote in multiple interests
Interest | Movies voted by user 12 | Degree
unknown | none | 0
Action | 4, 127, 161, 174, 195, 228, 328 | 0.135
Adventure | 174, 228 | 0.038
Animation | none | 0
Children's | 416 | 0.019
Comedy | 4, 69, 88, 169, 170, 202, 216, 238, 242, 480 | 0.192
Crime | 127 | 0.019
Documentary | none | 0
Drama | 4, 98, 127, 133, 170, 215, 392, 416, 591, 708 | 0.192
Fantasy | none | 0
Film-Noir | none | 0
Horror | none | 0
Musical | none | 0
Mystery | 159, 328 | 0.038
Romance | 69, 88, 133, 161, 170, 202, 216 | 0.154
Sci-Fi | 195, 228 | 0.038
Thriller | 98, 159, 195, 328, 480, 591 | 0.115
War | 69, 133 | 0.038
Western | 203 | 0.019

The antigen for user 12 (user id, age, gender, occupation, followed by the 19 interest degrees of Table 2) can then be described as: {12, 28, 1, 14, 0, 0.135, 0.038, 0, 0.019, 0.192, 0.019, 0, 0.192, 0, 0, 0, 0, 0.038, 0.154, 0.038, 0.115, 0.038, 0.019}
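Formula (1) and the Table 1 layout can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: N is read as the count of all the user's votes while m(i) counts only votes rated 2 or higher, the `genres` lookup (movie id to interest attributes) is a hypothetical input, and the toy movies and genre assignments are invented for illustration rather than taken from MovieLens.

```python
from typing import Dict, List, Sequence, Tuple

# The 19 interest attributes of Table 1, in table order.
INTERESTS: List[str] = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def interest_degrees(votes: Sequence[Tuple[int, int]],
                     genres: Dict[int, List[str]],
                     min_rating: int = 2) -> List[float]:
    """Degree(i) = m(i) / N, formula (1).

    votes  -- (movie_id, rating) pairs of one user
    genres -- hypothetical lookup from movie id to its interest attributes
    Assumption: N counts every voted movie; m(i) counts only votes with
    rating >= min_rating.
    """
    n_voted = len(votes)
    if n_voted == 0:
        return [0.0] * len(INTERESTS)
    m = dict.fromkeys(INTERESTS, 0)  # m(i) for each interest
    for movie, rating in votes:
        if rating >= min_rating:
            for name in genres.get(movie, []):
                m[name] += 1
    return [m[name] / n_voted for name in INTERESTS]


def encode_antigen(user_id: int, age: int, gender: int, occupation: int,
                   votes: Sequence[Tuple[int, int]],
                   genres: Dict[int, List[str]]) -> List[float]:
    """Table 1 layout: id, age, gender (0 = M, 1 = F), occupation, 19 degrees."""
    header = [float(user_id), float(age), float(gender), float(occupation)]
    return header + interest_degrees(votes, genres)


# Toy data: hypothetical genre assignments, not MovieLens records.
example_genres = {1: ["Action", "Sci-Fi"], 2: ["Comedy"],
                  3: ["Drama", "Romance"], 4: ["Comedy", "Romance"]}
example_votes = [(1, 5), (2, 4), (3, 1), (4, 3)]
antigen = encode_antigen(12, 28, 1, 14, example_votes, example_genres)
```

In the toy data, movie 3 is rated 1, so it adds to N but not to Drama or Romance; Comedy then gets degree 2/4 = 0.5.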

2.3. Artificial immune network model


Mathematically, the central task of the immune network model can be formally defined as follows. The antigens to be recognized, which form the input data of ICainet, are

$$Ag = [Ag_1, Ag_2, \ldots, Ag_N]^T, \qquad Ag_i = [Ag_{i1}, Ag_{i2}, \ldots, Ag_{ip}] \in \mathbb{R}^p, \quad i = 1, 2, \ldots, N.$$

The recognition of an antigen results in antibody proliferation, mutation and selection, while the recognition of components of the network itself results in network suppression; thus the network size is controlled by the dynamic and metadynamic processes of the immune network. The network antibodies are

$$Ab = [Ab_1, Ab_2, \ldots, Ab_q]^T, \qquad Ab_j = [Ab_{j1}, Ab_{j2}, \ldots, Ab_{jp}] \in \mathbb{R}^p, \quad j = 1, 2, \ldots, q,$$

and the internal image of the antigens,

$$M = [m_1, m_2, \ldots, m_h]^T, \qquad m_k = [m_{k1}, m_{k2}, \ldots, m_{kp}] \in \mathbb{R}^p, \quad k = 1, 2, \ldots, h, \quad h \ll N,$$

is capable of extracting the relevant features contained in the antigens.

ICainet algorithm
Input: the profiles of users who have voted, as antigens Ag.
Output: the memory network M, which extracts the relevant features contained in the antigens.
Step 1: Initialization: create an initial population of network antibodies Ab = {y1, y2, ..., yl}.
Step 2: For each antigen Ag_j, transform and encode the data, then do:
Step 3: Clonal selection and expansion:
  3-1: For each network element, determine its affinity with the antigen presented. Select the Nc elements with the highest affinity, reproduce (clone) them in proportion to their affinity based on the learning (hypermutation) rate, and place them into a clonal memory set C.
  3-2: Determine the affinity between Ag and all elements of C; from C, re-select the antibodies with the highest affinity and put them into the clone memory Abe.
  3-3: Clonal interactions: determine the network interactions (affinity) among all elements of the clone memory Abe.
  3-4: Clonal suppression: eliminate those memory clones whose affinity is less than a pre-specified threshold.
  3-5: Determine the affinity between Ag and all elements of Abe; from Abe, re-select the antibodies with the highest affinity and put them into the memory Ab.
  3-6: Determine the network interactions (affinity) among all elements of the memory Ab; eliminate those memory clones whose affinity is less than the pre-specified pruning threshold.
  3-7: Put the antibodies of Ab into the memory network M.
  3-8: Determine the network interactions (affinity) among all elements of the memory network M; eliminate those memory clones whose affinity is less than the pre-specified pruning threshold.
Step 4: Diversity: introduce a number of randomly generated cells into the network.
Step 5: Repeat until a pre-specified number of iterations has been performed.

Suppose the maximum number of antibodies is m, k high-affinity elements are selected for each network element, and the maximum number of clones is c. The computational complexity of Step 3 is then O(m*k), and, with n antigens, the total computational complexity is O(m^3). In the initialization step, we do not create an initial random population of network antibodies Ab = {y1, y2, ..., yl}; instead, the first and second VIP users in every interest are selected.

For immune networks, we make no distinction between the network cells and their surface molecules (antibodies). The Ag-Ab and Ab-Ab interactions are quantified through proximity (or similarity) measures, and several expressions can be employed to determine the degree of matching between elements, for example the Hamming distance (DH), the cosine measure, the Pearson correlation and the Euclidean distance. The task of the immune network model is to extract the relevant features contained in the antigens by comparing user information (user profiles); generally, a user profile is generated from information about the movies the user has voted for. Antibodies with a high similarity level undergo cell proliferation, mutation and selection. The clone algorithm used in our model is described as follows:
Step 1: Determine the interest degree between the antibodies and the antigen, and select the high-affinity elements among the multiple interests.
Step 2: Select the movie ids in these interests from the antibodies and the antigen.
Step 3: Find the movie ids that the antibodies and the antigen have in common and those that differ.
Step 4: Restrict the differing movie ids, then reselect movie ids from them according to C, cloning C in proportion to the stimulation level.
Step 5: Produce the next generation of antibodies.
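The Python sketch below illustrates one possible reading of the ICainet training loop, run on interest-degree vectors only (the numeric part of the antigen, since raw ids and ages would dominate a cosine score). Several details are assumptions, not taken from the text: hypermutation moves a clone toward the antigen by a random step scaled by the learning rate and by (1 - affinity), in the style of aiNet; the threshold of step 3-4 is applied to clone-antigen affinity (`suppression_threshold`, defaulting to the 0.15 suppression rate reported in Section 3); the pruning of steps 3-6 and 3-8 removes near-duplicate cells; and initialization takes a caller-supplied list instead of the "VIP users". All parameter names and defaults are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here for both Ag-Ab and Ab-Ab affinity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune(cells: List[Vector], pruning_threshold: float) -> List[Vector]:
    """Network suppression (assumed reading): keep a cell only if its
    affinity with every already-kept cell is at most pruning_threshold."""
    kept: List[Vector] = []
    for cell in cells:
        if all(cosine(cell, other) <= pruning_threshold for other in kept):
            kept.append(cell)
    return kept


def train_network(antigens: List[Vector], initial: List[Vector],
                  n_c: int = 3, max_clones: int = 5, rate: float = 0.5,
                  keep: int = 2, suppression_threshold: float = 0.15,
                  pruning_threshold: float = 0.95, iterations: int = 3,
                  n_random: int = 1, seed: int = 0) -> List[Vector]:
    """Return the memory network M as a list of antibody vectors."""
    rng = random.Random(seed)
    dim = len(antigens[0])
    ab = [list(cell) for cell in initial]       # Step 1: initial antibodies
    memory: List[Vector] = []                   # memory network M
    for _ in range(iterations):                 # Step 5: fixed iteration count
        for ag in antigens:                     # Step 2: present each antigen
            # 3-1: clone the n_c highest-affinity antibodies in proportion to
            # affinity; each clone moves toward the antigen (assumed rule)
            parents = sorted(ab, key=lambda c: cosine(c, ag), reverse=True)[:n_c]
            clones: List[Vector] = []
            for cell in parents:
                aff = cosine(cell, ag)
                for _ in range(max(1, round(aff * max_clones))):
                    step = rate * (1.0 - aff) * rng.random()
                    clones.append([x + step * (g - x) for x, g in zip(cell, ag)])
            # 3-2: re-select the best clones into the clone memory Abe
            abe = sorted(clones, key=lambda c: cosine(c, ag), reverse=True)[:keep]
            # 3-3/3-4: drop clones below the suppression threshold, then
            # remove near-duplicates among the remaining clones
            abe = prune([c for c in abe if cosine(c, ag) >= suppression_threshold],
                        pruning_threshold)
            # 3-5 to 3-8: add the survivors to M and prune M as a whole
            memory = prune(memory + abe, pruning_threshold)
        # Step 4: diversity, add randomly generated cells for the next pass
        ab = memory + [[rng.random() for _ in range(dim)] for _ in range(n_random)]
    return memory
```

For example, `train_network(antigens, initial=[antigens[0], antigens[2]])` on a handful of four-interest degree vectors returns a memory network in which no two cells have cosine similarity above the pruning threshold; the fixed `seed` keeps runs reproducible.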

2.4. Recommendation mechanism


Once the immune model has stabilized under the above algorithm, we use the memory network M, which extracts the relevant features contained in the antigens, to weigh the neighbors. Based on the similarity calculation, we find neighbors with tendencies similar to those of a particular user and identify a subset of movies in which the user may be interested. The procedure is as follows:

Encode the user for whom to make predictions as antigen Ag
WHILE (AIS not stabilised) & (Reviewers available) DO
    Calculate matching scores between Ag and the antibodies Ab of memory network M
    Select the antibodies of M with the best k (absolute) correlation scores
    Determine the interest degree between the antibodies and the antigen; select the high-affinity elements among the multiple interests
    Select the movie ids in these interests from the antibodies and the antigen
    Find the movie ids that the antibodies and the antigen have in common and those that differ
    Restrict the differing movie ids, then reselect movie ids from them according to C, and select c% of the movies as recommendation results
WHILE (AIS at full size) & (AIS not stable) DO
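A minimal Python sketch of the recommendation step, under assumptions that go beyond the text: each memory antibody is a hypothetical `Profile` record holding the interest degrees and the movie ids voted for in each interest (as in Table 2); cosine similarity stands in for the correlation score; "high-affinity interests" is read as the active user's `n_interests` largest degrees; candidate movies are ranked by how many neighbours share them; and `c` is a fraction rather than a percentage. The restriction step "according to C" is not reproduced.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class Profile:
    """Hypothetical antigen/antibody record: interest degrees plus the
    movie ids voted for in each interest (cf. Table 2)."""
    degrees: List[float]
    movies: Dict[int, Set[int]] = field(default_factory=dict)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def recommend(active: Profile, memory: List[Profile], k: int = 2,
              n_interests: int = 2, c: float = 0.5) -> List[int]:
    """Return roughly a fraction c of the candidate movie ids, best first."""
    # Antibodies of M with the best k (absolute) correlation scores
    neighbours = sorted(memory,
                        key=lambda ab: abs(cosine(ab.degrees, active.degrees)),
                        reverse=True)[:k]
    # High-affinity interests: the active user's n_interests largest degrees
    top = sorted(range(len(active.degrees)),
                 key=lambda i: active.degrees[i], reverse=True)[:n_interests]
    seen: Set[int] = set().union(*active.movies.values())
    counts: Counter = Counter()
    for ab in neighbours:
        for i in top:
            # Movies the neighbour has in this interest that the user lacks
            counts.update(ab.movies.get(i, set()) - seen)
    # Rank by how many neighbours share a movie, then by id for determinism
    ranked = sorted(counts, key=lambda movie: (-counts[movie], movie))
    n = max(1, round(c * len(ranked))) if ranked else 0
    return ranked[:n]


example_active = Profile([0.6, 0.4, 0.0, 0.0], {0: {1, 2}, 1: {3}})
example_memory = [
    Profile([0.5, 0.5, 0.0, 0.0], {0: {1, 4, 5}, 1: {3, 6}}),
    Profile([0.7, 0.3, 0.0, 0.0], {0: {2, 4}, 1: {7}}),
    Profile([0.0, 0.0, 0.9, 0.1], {2: {8, 9}}),
]
```

With these toy profiles, the two neighbours closest to the active user both offer movie 4 in the user's top interests, so `recommend(example_active, example_memory)` puts it first and returns `[4, 5]`.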

3. Experimental evaluation
The data sets come from the MovieLens data [10], which were collected through the MovieLens web site over seven months by the GroupLens Research Project (GroupLens website). For the present investigation, two datasets were used: the training dataset consists of 80,000 ratings of 943 users on 1682 movies on a 1-5 integer rating scale, and the test dataset consists of 20,000 ratings. There are eighteen movie interests, as described in Section 2.2, and occupation has 21 classes.

Recommender systems research has used several types of measures for evaluating the quality of a recommender system; the Mean Absolute Error (MAE) between ratings and predictions is a widely used metric. In our immune network model, the suppression rate provides a fixed threshold on the correlation of any antibody with the antigen; other parameters did not appear to make any significant difference. At low suppression rates it may prove difficult to fill the immune network completely; conversely, at very high suppression rates it may not be necessary to examine all the supplied users. What, then, is a good value for the suppression rate? The internal image of the antigens obtained with different parameter values affects recommendation quality. Figure 1 shows the effect of different suppression rates; in our work, the suppression rate is set to 0.15.

Figure 1. Effect of the suppression rate

Figure 2 compares the similarity measures: Pearson correlation, cosine similarity and Euclidean distance. Our results confirm that the ICainet algorithm based on cosine similarity is superior to the other methods considered in this paper.

Figure 2. Comparison of similarity measure methods
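The quantities used in this evaluation can be sketched in Python as follows. MAE follows its standard definition; the sparsity level is computed with the common definition 1 - ratings / (users × items), which is an assumption about how the sparsity levels in Figure 4 are measured; and the Euclidean distance is turned into a similarity as 1 / (1 + d), another assumed mapping, so that all three measures compared in Figure 2 grow with similarity.

```python
from typing import Sequence


def mae(ratings: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean Absolute Error between actual ratings and predicted ratings."""
    if not ratings or len(ratings) != len(predictions):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(r - p) for r, p in zip(ratings, predictions)) / len(ratings)


def sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of empty cells in the user-item matrix (assumed definition)."""
    return 1.0 - n_ratings / (n_users * n_items)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, computed as the cosine of mean-centred vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance mapped to (0, 1] as 1 / (1 + d) (assumed mapping)."""
    d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + d)


# Training split described above: 80,000 ratings by 943 users on 1682 movies.
train_sparsity = sparsity(80_000, 943, 1682)  # roughly 0.95
```

For the training split, fewer than 5.1% of the 943 × 1682 user-movie cells hold a rating, which is the sparsity problem the method targets.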


Figure 3 shows the comparison, over all datasets, between our ICainet and other methods, namely Aickelin's model and SP (Simple Pearson predictor). We computed the average MAE of ICainet, Aickelin's model and SP; as Figure 3 shows, ICainet has the lowest average MAE.

Figure 3. Compared results on all datasets between our ICainet and other methods

Figure 4. MAE of the classic CF and our CF method of trust inferences for different sparsity levels

Our immune network model was tested at different levels of sparsity over a pre-selected set of 80,000 ratings extracted randomly from the set of actual ratings. Figure 4 illustrates the sensitivity of the algorithms to the different levels of sparsity, comparing our ICainet with other methods such as Principal Component Analysis (PCA) and the information retrieval technique Latent Semantic Indexing (LSI). For typical sparsity levels of recommendation systems, ICainet performs as much as 8.1% and 10.1% better than PCA and LSI, respectively. Considering that most of the alternative methods proposed for dealing with the sparsity problem degrade recommendation quality, the performance of our prediction algorithm is very satisfactory.

4. Conclusions and Future Prospects

Inspired by the natural immune system, this paper provides an alternative approach to rating-based collaborative filtering that produces high-quality recommendations. Its strongest expected advantage over current approaches is ease of adaptation to the changing, dynamic environment that characterizes the World Wide Web, and it also provides better recommendation quality by alleviating the data sparsity problem; it can therefore be used in different applications, such as profiling for personalization. Future research will focus on a multi-layered immune network model that can continuously identify similar groups while minimizing both the time needed to process each stream element and the amount of memory used by the query processor.

5. Acknowledgements
This study was supported in part by the Outstanding Young Research Fund of Shandong Agricultural University. We would like to thank the international review team for their thoughtful review comments.

6. References
[1] Herlocker, J., Konstan, J. A., Terveen, L., and Riedl. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems, 22(1), 2004, pp. 234-258.
[2] Canny, J. Collaborative filtering with privacy via factor analysis. SIGIR 2002.
[3] Chee, S. H. S. RecTree: A linear collaborative filtering algorithm. Master's thesis, Simon Fraser University, November 2000.
[4] Chee, S. H. S., J. H., and Wang, K. RecTree: An efficient collaborative filtering method. Lecture Notes in Computer Science, 2114, 2001.
[5] Billsus, D., and Pazzani, M. J. Learning collaborative information filters. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin, 1998, pp. 46-54.
[6] Shepherd, M., Watters, C., and Marath, A. T. Adaptive user modeling for filtering electronic news. In Proceedings of the 35th Annual Hawaii International Conference on System Sciences, Hawaii, USA, 2002, p. 102b.
[7] Cayzer, S., and Aickelin, U. A Recommender System based on the Immune Network. In Proceedings of CEC 2002, Honolulu, USA, 2002, pp. 807-813.
[8] Cayzer, S., and Aickelin, U. A Recommender System based on Idiotypic Artificial Immune Networks. Journal of Mathematical Modelling and Algorithms, 4(2), 2005, pp. 181-198.
[9] Ishiguro, A., Ichikawa, S., and Uchikawa, Y. A Gait Acquisition of Six-Legged Robot Using Immune Networks. In Proceedings of the International Conference on Intelligent Robotics and Systems (IROS 94), volume 2, Munich, Germany, 1994, pp. 1034-1041.
[10] http://www.movelens.umn.edu
