
14/03/2016

Matrix Factorization: A Simple Tutorial and Implementation in Python @ quuxlabs

Matrix Factorization: A Simple Tutorial and Implementation in Python

By Albert Au Yeung on September 16, 2010
There is probably no need to say that there is too much information on the Web nowadays. Search engines help us a little bit. What is better is to have something interesting recommended to us automatically without asking. Indeed, from as simple as a list of the most popular bookmarks on Delicious, to some more personalized recommendations we receive on Amazon, we are usually offered recommendations on the Web.
Recommendations can be generated by a wide range of algorithms. While user-based or item-based collaborative filtering methods are simple and intuitive, matrix factorization techniques are usually more effective because they allow us to discover the latent features underlying the interactions between users and items. Of course, matrix factorization is simply a mathematical tool for playing around with matrices, and is therefore applicable in many scenarios where one would like to find out something hidden under the data.

In this tutorial, we will go through the basic ideas and the mathematics of matrix factorization, and then we will present a simple implementation in Python. We will proceed with the assumption that we are dealing with user ratings (e.g. an integer score from the range of 1 to 5) of items in a recommendation system.

Table of Contents:
Basic Ideas
The mathematics of matrix factorization
Regularization
Implementation in Python
Further Information
Source Code
References

Basic Ideas

Just as its name suggests, matrix factorization is, obviously, to factorize a matrix, i.e. to find out two (or more) matrices such that when you multiply them you will get back the original matrix.
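As a minimal sketch of this idea in numpy (the matrices here are arbitrary illustrative values, not learned factors):

```python
import numpy as np

# Two small hand-picked factor matrices, 4x2 and 2x3.
P = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0],
              [2.0, 2.0]])
Q = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

# Multiplying them yields a 4x3 matrix, so P and Q together
# are a factorization of that larger matrix.
R = np.dot(P, Q)   # R has shape (4, 3)
```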

http://www.quuxlabs.com/blog/2010/09/matrixfactorizationasimpletutorialandimplementationinpython/

As I have mentioned above, from an application point of view, matrix factorization can be used to discover latent features underlying the interactions between two different kinds of entities. (Of course, you can consider more than two kinds of entities and you will be dealing with tensor factorization, which would be more complicated.) And one obvious application is to predict ratings in collaborative filtering.

In a recommendation system such as Netflix or MovieLens, there is a group of users and a set of items (movies for the above two systems). Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, such that we can make recommendations to the users. In this case, all the information we have about the existing ratings can be represented in a matrix. Assume now we have 5 users and 4 items, and ratings are integers ranging from 1 to 5; the matrix may look something like this (a hyphen means that the user has not yet rated the movie):
     D1   D2   D3   D4
U1    5    3    -    1
U2    4    -    -    1
U3    1    1    -    5
U4    1    -    -    4
U5    -    1    5    4
Hence, the task of predicting the missing ratings can be considered as filling in the blanks (the hyphens in the matrix) such that the values would be consistent with the existing ratings in the matrix.

The intuition behind using matrix factorization to solve this problem is that there should be some latent features that determine how a user rates an item. For example, two users would give high ratings to a certain movie if they both like the actors/actresses of the movie, or if the movie is an action movie, which is a genre preferred by both users. Hence, if we can discover these latent features, we should be able to predict a rating with respect to a certain user and a certain item, because the features associated with the user should match with the features associated with the item.
In trying to discover the different features, we also make the assumption that the number of features would be smaller than the number of users and the number of items. It should not be difficult to understand this assumption because clearly it would not be reasonable to assume that each user is associated with a unique feature (although this is not impossible). And anyway, if this were the case there would be no point in making recommendations, because each of these users would not be interested in the items rated by other users. Similarly, the same argument applies to the items.

The mathematics of matrix factorization

Having discussed the intuition behind matrix factorization, we can now go on to work on the mathematics. Firstly, we have a set $U$ of users, and a set $D$ of items. Let $\mathbf{R}$ of size $|U| \times |D|$ be the matrix that contains all the ratings that the users have assigned to the items. Also, we assume that we would like to discover $K$ latent features. Our task, then, is to find two matrices $\mathbf{P}$ (a $|U| \times K$ matrix) and $\mathbf{Q}$ (a $|D| \times K$ matrix) such that their product approximates $\mathbf{R}$:

$$\mathbf{R} \approx \mathbf{P} \times \mathbf{Q}^T = \hat{\mathbf{R}}$$

In this way, each row of $\mathbf{P}$ would represent the strength of the associations between a user and the features. Similarly, each row of $\mathbf{Q}$ would represent the strength of the associations between an item and the features. To get the prediction of a rating of an item $d_j$ by user $u_i$, we can calculate the dot product of the two vectors corresponding to $u_i$ and $d_j$:

$$\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^{K} p_{ik} q_{kj}$$
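The prediction formula can be illustrated numerically; the feature vectors below are hypothetical values standing in for one learned row of $\mathbf{P}$ and one of $\mathbf{Q}$:

```python
import numpy as np

# Hypothetical learned feature vectors (K = 2) for one user and one item.
p_i = np.array([1.2, 0.8])   # user u_i's associations with the latent features
q_j = np.array([1.5, 0.5])   # item d_j's associations with the same features

# The predicted rating is the dot product of the two vectors.
r_hat = np.dot(p_i, q_j)     # 1.2*1.5 + 0.8*0.5, approximately 2.2
```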

Now, we have to find a way to obtain $\mathbf{P}$ and $\mathbf{Q}$. One way to approach this problem is to first initialize the two matrices with some values, calculate how 'different' their product is to $\mathbf{R}$, and then try to minimize this difference iteratively. Such a method is called gradient descent, aiming at finding a local minimum of the difference.

The difference here, usually called the error between the estimated rating and the real rating, can be calculated by the following equation for each user-item pair:

$$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = \left(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj}\right)^2$$

Here we consider the squared error because the estimated rating can be either higher or lower than the real rating.


To minimize the error, we have to know in which direction we have to modify the values of $p_{ik}$ and $q_{kj}$. In other words, we need to know the gradient at the current values, and therefore we differentiate the above equation with respect to these two variables separately:

$$\frac{\partial}{\partial p_{ik}} e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij}) q_{kj} = -2 e_{ij} q_{kj}$$

$$\frac{\partial}{\partial q_{kj}} e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij}) p_{ik} = -2 e_{ij} p_{ik}$$

Having obtained the gradient, we can now formulate the update rules for both $p_{ik}$ and $q_{kj}$:

$$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + 2\alpha e_{ij} q_{kj}$$

$$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + 2\alpha e_{ij} p_{ik}$$

Here, $\alpha$ is a constant whose value determines the rate of approaching the minimum. Usually we will choose a small value for $\alpha$, say 0.0002. This is because if we make too large a step towards the minimum we may run the risk of missing the minimum and end up oscillating around it.
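A single update step can be sketched directly from these rules; all the values below are illustrative assumptions, and both updates deliberately use the old values of the vectors:

```python
import numpy as np

alpha = 0.0002               # learning rate, as suggested above
r_ij = 5.0                   # one observed rating
p_i = np.array([0.5, 0.5])   # hypothetical current user features (K = 2)
q_j = np.array([0.5, 0.5])   # hypothetical current item features

e_ij = r_ij - np.dot(p_i, q_j)           # error for this user-item pair
p_new = p_i + alpha * 2 * e_ij * q_j     # update rule for p_ik, all k at once
q_new = q_j + alpha * 2 * e_ij * p_i     # update rule for q_kj, all k at once

# After the step, the prediction has moved slightly towards r_ij.
```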
A question might have come to your mind by now: if we find two matrices $\mathbf{P}$ and $\mathbf{Q}$ such that $\mathbf{P} \times \mathbf{Q}^T$ approximates $\mathbf{R}$, won't our predictions of all the unseen ratings all be zeros? In fact, we are not really trying to come up with $\mathbf{P}$ and $\mathbf{Q}$ such that we can reproduce $\mathbf{R}$ exactly. Instead, we will only try to minimise the errors of the observed user-item pairs. In other words, if we let $T$ be a set of tuples, each of which is in the form of $(u_i, d_j, r_{ij})$, such that $T$ contains all the observed user-item pairs together with the associated ratings, we are only trying to minimise every $e_{ij}$ for $(u_i, d_j, r_{ij}) \in T$. (In other words, $T$ is our set of training data.) As for the rest of the unknowns, we will be able to determine their values once the associations between the users, items and features have been learnt.

Using the above update rules, we can then iteratively perform the operation until the error converges to its minimum. We can check the overall error as calculated using the following equation and determine when we should stop the process:

$$E = \sum_{(u_i, d_j, r_{ij}) \in T} e_{ij}^2 = \sum_{(u_i, d_j, r_{ij}) \in T} \left(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj}\right)^2$$
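The overall error $E$ sums only over the observed pairs; a small illustration with made-up training tuples and factor matrices:

```python
import numpy as np

# T holds the observed (user index, item index, rating) training tuples.
T = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.5)]
P = np.array([[2.0, 1.0], [1.5, 1.0]])   # hypothetical user-feature matrix
Q = np.array([[2.0, 1.0], [1.0, 1.0]])   # hypothetical item-feature matrix (rows = items)

# Only the pairs in T contribute; unobserved entries are ignored entirely.
E = sum((r - np.dot(P[i], Q[j])) ** 2 for i, j, r in T)
```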


Regularization

The above algorithm is a very basic algorithm for factorizing a matrix. There are a lot of methods to make things look more complicated. A common extension to this basic algorithm is to introduce regularization to avoid overfitting. This is done by adding a parameter $\beta$ and modifying the squared error as follows:

$$e_{ij}^2 = \left(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj}\right)^2 + \frac{\beta}{2} \sum_{k=1}^{K} \left(p_{ik}^2 + q_{kj}^2\right)$$

In other words, the new parameter $\beta$ is used to control the magnitudes of the user-feature and item-feature vectors, such that $\mathbf{P}$ and $\mathbf{Q}$ would give a good approximation of $\mathbf{R}$ without having to contain large numbers. In practice, $\beta$ is set to some small value such as 0.02. The new update rules for this squared error can be obtained by a procedure similar to the one described above. The new update rules are as follows:

$$p'_{ik} = p_{ik} + \alpha \left(2 e_{ij} q_{kj} - \beta p_{ik}\right)$$

$$q'_{kj} = q_{kj} + \alpha \left(2 e_{ij} p_{ik} - \beta q_{kj}\right)$$
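With regularization, the update step is almost identical, except that $\beta$ shrinks each parameter towards zero; again, all the values below are illustrative:

```python
import numpy as np

alpha, beta = 0.0002, 0.02   # learning rate and regularization parameter
r_ij = 5.0                   # one observed rating
p_i = np.array([0.5, 0.5])   # hypothetical current user features
q_j = np.array([0.5, 0.5])   # hypothetical current item features

e_ij = r_ij - np.dot(p_i, q_j)
# Regularized update rules: the -beta terms keep the factors small.
p_new = p_i + alpha * (2 * e_ij * q_j - beta * p_i)
q_new = q_j + alpha * (2 * e_ij * p_i - beta * q_j)
```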

Implementation in Python

Once we have derived the update rules as described above, it actually becomes very straightforward to implement the algorithm. The following is a function that implements the algorithm in Python (note that this implementation requires the numpy module).

Note: The complete Python code is available for download in the section Source Code at the end of this post.
import numpy

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        # Update P and Q for every observed rating (0 means "not rated").
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        # Compute the overall regularized error over the observed ratings.
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:], Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T

We can try to apply it to our example mentioned above and see what we would get. Below is a code snippet in Python for running the example.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]

R = numpy.array(R)

N = len(R)
M = len(R[0])
K = 2

P = numpy.random.rand(N, K)
Q = numpy.random.rand(M, K)

nP, nQ = matrix_factorization(R, P, Q, K)
nR = numpy.dot(nP, nQ.T)

And the matrix obtained from the above process would look something like this:
     D1     D2     D3     D4
U1   4.97   2.98   2.18   0.98
U2   3.97   2.40   1.97   0.99
U3   1.02   0.93   5.32   4.93
U4   1.00   0.85   4.59   3.93
U5   1.36   1.07   4.89   4.12

We can see that for existing ratings we have approximations very close to the true values, and we also get some 'predictions' of the unknown values. In this simple example, we can easily see that U1 and U2 have similar taste and they both rated D1 and D2 high, while the rest of the users preferred D3 and D4. When the number of features (K in the Python code) is 2, the algorithm is able to associate the users and items to two different features, and the predictions also follow these associations. For example, we can see that the predicted rating of U4 on D3 is 4.59, because U4 and U5 both rated D4 high.

Further Information

We have discussed the intuitive meaning of the technique of matrix factorization and its use in collaborative filtering. In fact, there are many different extensions to the above technique. An important extension is the requirement that all the elements of the factor matrices ($\mathbf{P}$ and $\mathbf{Q}$ in the above example) should be non-negative. In this case it is called non-negative matrix factorization (NMF). One advantage of NMF is that it results in intuitive meanings of the resultant matrices. Since no elements are negative, the process of multiplying the resultant matrices to get back the original matrix would not involve subtraction, and can be considered as a process of generating the original data by linear combinations of the latent features.
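For experimentation, scikit-learn ships an NMF implementation; the sketch below assumes scikit-learn is installed, and note that its NMF treats the zeros in R as actual zero ratings rather than missing values, so this is only an illustration of the non-negativity constraint, not a drop-in replacement for the algorithm above:

```python
import numpy as np
from sklearn.decomposition import NMF

# The example rating matrix from this tutorial (0 = unobserved).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(R)   # user-feature matrix, all entries >= 0
H = model.components_        # feature-item matrix, all entries >= 0
approx = np.dot(W, H)        # a (5, 4) non-negative approximation of R
```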

Source Code

The full Python source code of this tutorial is available for download at:
mf.py

References

There have been quite a lot of references on matrix factorization. Below are some of the related papers.

Gábor Takács et al (2008). Matrix factorization and neighbor based algorithms for the Netflix prize problem. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, October 23-25, 267-274.

Patrick Ott (2008). Incremental Matrix Factorization for Collaborative Filtering. Science, Technology and Design 01/2008, Anhalt University of Applied Sciences.

Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556-562.

Daniel D. Lee and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, Vol. 401, No. 6755. (21 October 1999), pp. 788-791.
