CS534
Spectral Clustering
Represent data points as the vertices V of a graph G.
Vertices are connected by edges E.
Edges have weights W.
Large weights mean that the adjacent vertices are very similar; small weights imply dissimilarity.
Methods that use the spectrum of the similarity matrix W to cluster are known as spectral clustering.
Motivations / Objectives
There are different ways to interpret spectral clustering.
One can view spectral clustering as finding the partition of the graph that minimizes the Normalized Cut.
Alternatively, we can view it as performing a random walk on the graph.
Graph partitioning
Graph Terminologies
Degree of a node: $d_i = \sum_j W_{i,j}$
Volume of a set: $\mathrm{Vol}(A) = \sum_{i \in A} d_i$
Graph Cut
Consider a partition of the graph into two parts A and B.
Cut(A, B): the sum of the weights of the edges that connect the two groups, $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} W_{i,j}$.
An intuitive goal is to find the partition that minimizes the cut.
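The graph quantities used so far (degree, volume, cut) can be sketched in a few lines of NumPy. The toy graph and the helper names `degree`, `volume`, and `cut` are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w  # undirected graph, so the weights are symmetric

def degree(W):
    """Degree of every node: the sum of the weights on its edges."""
    return W.sum(axis=1)

def volume(W, A):
    """Volume of a node set A: the sum of the degrees of its nodes."""
    return degree(W)[A].sum()

def cut(W, A, B):
    """Cut(A, B): total weight of the edges running between A and B."""
    return W[np.ix_(A, B)].sum()

A, B = [0, 1, 2], [3, 4, 5]
print(degree(W))     # per-node degrees
print(volume(W, A))  # Vol(A)
print(cut(W, A, B))  # weight crossing the partition
```

Only the weak edge (2,3) crosses this partition, so the cut is small compared with the volume of either side.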
Min Cut Objective
Min cut: minimize the weight of the connections between groups
$\min_{A, B} \mathrm{cut}(A, B)$
Problem:
Prefers degenerate solutions (e.g. the red partition in the figure)
Need to express a preference for more balanced solutions
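Normalized Cut
Consider the connectivity between groups relative to the volume of each group:
$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)}$
$\mathrm{Ncut}(A, B) = \mathrm{cut}(A, B) \, \frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A) \, \mathrm{Vol}(B)}$
For a fixed cut value, this is minimized when Vol(A) and Vol(B) are equal, which encourages a balanced cut.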
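A small numeric comparison may help show why the normalization matters. In the sketch below, the 7-node graph is a made-up example in which the plain cut prefers slicing off a single weakly attached node, while Ncut prefers the balanced split. The helper names `cut` and `ncut` are illustrative.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by a weak edge (2,3), plus
# node 6 hanging off node 5 by an even weaker edge.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1), (5, 6, 0.05)]
W = np.zeros((7, 7))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

def cut(W, A, B):
    """Total weight of the edges between node lists A and B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, A, B):
    """Ncut(A, B) = cut/Vol(A) + cut/Vol(B)."""
    d = W.sum(axis=1)
    c = cut(W, A, B)
    return c / d[A].sum() + c / d[B].sum()

balanced = ([0, 1, 2], [3, 4, 5, 6])
degenerate = ([6], [0, 1, 2, 3, 4, 5])

for name, (A, B) in [("balanced", balanced), ("degenerate", degenerate)]:
    print(name, "cut =", cut(W, A, B), "Ncut =", ncut(W, A, B))
```

The degenerate partition has the smaller cut, but its Ncut is above 1 because Vol({6}) is tiny. The balanced partition has a much smaller Ncut.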
Solving NCut
How to minimize Ncut?
Let $W$ be the similarity matrix, $W(i, j) = W_{i,j}$;
Let $D$ be the diagonal degree matrix, $D(i, i) = \sum_j W(i, j)$;
Let $x$ be a vector in $\{1, -1\}^N$, with $x(i) = 1 \Leftrightarrow i \in A$.
With some simplifications, we can show:
$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W) y}{y^T D y}$ (a Rayleigh quotient)
Subject to: $y^T D \mathbf{1} = 0$ ($y$ takes discrete values)
NP-hard!
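The link between Ncut and the Rayleigh quotient can be checked numerically. The slide does not spell out the discrete values of y. The sketch below assumes the usual encoding from the normalized-cut literature, y_i = 1 for i in A and y_i = -Vol(A)/Vol(B) for i in B. The function name `rayleigh_vs_ncut` and the toy graph are illustrative.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

def rayleigh_vs_ncut(W, in_A):
    """Return (Ncut, Rayleigh quotient, y^T D 1) for a boolean membership mask in_A."""
    d = W.sum(axis=1)
    D = np.diag(d)
    vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
    cut_AB = W[np.ix_(in_A, ~in_A)].sum()
    ncut = cut_AB / vol_A + cut_AB / vol_B
    # Assumed discrete encoding: 1 on A, -Vol(A)/Vol(B) on B.
    y = np.where(in_A, 1.0, -vol_A / vol_B)
    rayleigh = (y @ (D - W) @ y) / (y @ D @ y)
    constraint = y @ D @ np.ones(len(y))
    return ncut, rayleigh, constraint

for mask in ([True, True, True, False, False, False],
             [True, False, False, False, False, False]):
    print(rayleigh_vs_ncut(W, np.array(mask)))
```

Under this encoding, the first two returned numbers agree for any partition and the third is zero, which matches the constraint stated above.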
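Solving NCut
Relax the optimization problem into the continuous domain by solving a generalized eigenvalue system:
$\min_y \; y^T (D - W) y$ subject to $y^T D y = 1$
Which gives:
$(D - W) y = \lambda D y$
Note that $(D - W) \mathbf{1} = 0$, so the first eigenvector is $y_1 = \mathbf{1}$ with eigenvalue 0.
The eigenvector with the second smallest eigenvalue is the real-valued solution to this problem!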
2-way Normalized Cuts
1. Compute the affinity matrix $W$ and the degree matrix $D$; $D$ is diagonal and $D(i, i) = \sum_j W(i, j)$.
2. Solve $(D - W) y = \lambda D y$, where $L = D - W$ is called the Laplacian matrix.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
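The three steps can be sketched with NumPy alone. Instead of calling a generalized eigensolver, this sketch uses the equivalent symmetric problem D^{-1/2} (D - W) D^{-1/2} z = lambda z with y = D^{-1/2} z. It assumes W is symmetric with no isolated nodes and splits at 0. The function name `two_way_ncut` is illustrative.

```python
import numpy as np

def two_way_ncut(W):
    """Bipartition a graph given its symmetric affinity matrix W."""
    d = W.sum(axis=1)                    # step 1: degrees, D = diag(d)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                   # Laplacian L = D - W
    # Step 2: symmetric form of (D - W) y = lambda D y.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, Z = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * Z[:, 1]             # eigenvector of the second smallest eigenvalue
    labels = (y > 0).astype(int)         # step 3: threshold at 0 (one possible choice)
    return labels, y, eigvals

# Hypothetical toy graph: two triangles joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

labels, y, eigvals = two_way_ncut(W)
print(labels)       # one label per node; which side is called 1 is arbitrary
print(eigvals[:2])  # the smallest eigenvalue is numerically 0
```

The sign of an eigenvector is arbitrary, so only the grouping of the labels is meaningful, not which group is labelled 0 or 1.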
How to Create the Graph?
It is common to use a Gaussian kernel to compute the similarity between objects.
One could create:
A fully connected graph
A K-nearest-neighbor graph (each node is only connected to its K nearest neighbors)
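Both graph constructions can be sketched as follows. The slide does not give the kernel formula, so the form W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) is an assumption. So are the bandwidth `sigma`, the zeroed diagonal, and the choice to keep an edge if either endpoint lists the other among its K nearest neighbors. The function names are illustrative.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph: Gaussian kernel on squared Euclidean distances."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops (a modelling choice)
    return W

def knn_graph(W, k):
    """Keep only edges to each node's k most similar neighbors, then symmetrize."""
    n = W.shape[0]
    idx = np.argsort(-W, axis=1)[:, :k]  # k largest weights per row
    keep = np.zeros(W.shape, dtype=bool)
    keep[np.repeat(np.arange(n), k), idx.ravel()] = True
    keep = keep | keep.T                 # edge survives if either endpoint selected it
    return np.where(keep, W, 0.0)

# Hypothetical data: two small groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
W_full = gaussian_affinity(X, sigma=1.0)
W_knn = knn_graph(W_full, k=2)
print(np.round(W_knn, 3))
```

Because of the symmetrization, a node can end up with more than K neighbors. Requiring mutual selection instead (`keep & keep.T`) is another common variant.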
Creating a Bipartition Using the 2nd Eigenvector
Sometimes there is no clear threshold for splitting on the second eigenvector, since it takes continuous values.
How to choose the splitting point?
a) Pick a constant value (0, or 0.5).
b) Pick the median value as the splitting point.
c) Look for the splitting point that has the minimum Ncut value:
   1. Choose n possible splitting points.
   2. Compute the Ncut value for each.
   3. Pick the minimum.
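Option (c) can be sketched as a small search over candidate thresholds. The evenly spaced candidates, the helper names, and the weighted path graph below are illustrative assumptions. The second eigenvector is computed through the symmetric form of the generalized eigenproblem.

```python
import numpy as np

def second_eigenvector(W):
    """Eigenvector of (D - W) y = lambda D y with the second smallest eigenvalue."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    L_sym = s[:, None] * (np.diag(d) - W) * s[None, :]
    _, Z = np.linalg.eigh(L_sym)
    return s * Z[:, 1]

def ncut_value(W, mask):
    """Ncut of the partition (mask, not mask), where mask is a boolean array."""
    d = W.sum(axis=1)
    c = W[np.ix_(mask, ~mask)].sum()
    return c / d[mask].sum() + c / d[~mask].sum()

def best_split(W, y, n_candidates=10):
    """Try evenly spaced thresholds on y and keep the one with the smallest Ncut."""
    candidates = np.linspace(y.min(), y.max(), n_candidates + 2)[1:-1]
    best_t, best_val = None, np.inf
    for t in candidates:
        mask = y > t
        if mask.all() or not mask.any():
            continue  # skip thresholds that leave one side empty
        val = ncut_value(W, mask)
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val

# Hypothetical weighted path 0-1-2-3-4-5 with a weak middle edge.
W = np.zeros((6, 6))
for i, w in enumerate([1.0, 1.0, 0.2, 1.0, 1.0]):
    W[i, i + 1] = W[i + 1, i] = w

y = second_eigenvector(W)
t, val = best_split(W, y)
print(t, val, (y > t).astype(int))
```

On this graph different thresholds produce different partitions, and the search settles on the one that cuts only the weak middle edge.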
K-way Partition?
Recursive bipartitioning (Hagen et al., '91)
Recursively apply the bipartitioning algorithm in a hierarchical, divisive manner.
Disadvantages: inefficient, unstable
Cluster multiple eigenvectors
Build a reduced space from multiple eigenvectors.
Commonly used in recent papers
A preferable approach: it is like doing dimension reduction and then k-means.
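The multiple-eigenvector approach can be sketched as an embedding step followed by k-means. The minimal k-means below (farthest-point initialization, fixed iteration count) is a stand-in chosen to keep the example self-contained, not a method the slides prescribe. All function names and the 9-node graph are illustrative.

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows are nodes; columns are the k generalized eigenvectors of
    (D - W) y = lambda D y with the smallest eigenvalues."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - s[:, None] * W * s[None, :]
    _, Z = np.linalg.eigh(L_sym)  # ascending eigenvalues
    return s[:, None] * Z[:, :k]  # y = D^{-1/2} z

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's algorithm with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave empty clusters where they are
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical graph: three triangles chained by two weak edges.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (6, 7, 1.0), (6, 8, 1.0), (7, 8, 1.0),
         (2, 3, 0.1), (5, 6, 0.1)]
W = np.zeros((9, 9))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

labels = kmeans(spectral_embedding(W, 3), 3)
print(labels)
```

In the embedded space the nodes of each triangle sit close together, so even this simple k-means separates them.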
A Random Walk View of Spectral Clustering
Imagine a random walk starting from node i on the graph.
Assume that the probability of taking a step from node i to node j is given by the transition matrix P:
$P_{i,j} = \frac{W_{i,j}}{\sum_k W_{i,k}}$, i.e. $P = D^{-1} W$
Starting within one cluster and taking a random walk governed by P, we are likely to remain in the same cluster for a long time.
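The claim about staying inside a cluster can be checked on a toy graph. This sketch assumes the transition matrix is the row-normalized affinity matrix, P = D^{-1} W. The graph and the step count `t` are illustrative.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

P = W / W.sum(axis=1, keepdims=True)  # P = D^{-1} W, so every row sums to 1

t = 5
P_t = np.linalg.matrix_power(P, t)    # entry (i, j): probability of i -> j in t steps

stay = P_t[0, :3].sum()               # start at node 0, end inside {0, 1, 2}
leave = P_t[0, 3:].sum()              # start at node 0, end inside {3, 4, 5}
print(stay, leave)
```

The only way out of the first triangle is the weak edge (2,3), so after a few steps most of the probability mass is still in the starting cluster.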
Property of Random Walk
If we start at node $i$, where will we end up after $t$ steps?
The probability of being at node $j$ after $t$ steps is $[P^t]_{i,j}$.
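Transition Matrix Decomposition
Recall that $P = D^{-1} W$, thus we have:
$P^t = D^{-1/2} \left( D^{-1/2} W D^{-1/2} \right)^t D^{1/2}$
We will focus on the symmetric variant of this matrix, $D^{-1/2} W D^{-1/2}$, for now.
We can decompose it using its eigenvectors:
$D^{-1/2} W D^{-1/2} = \sum_k \lambda_k z_k z_k^T$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$
Spectral graph theory states that under mild conditions, we have $\lambda_1 = 1$, and the rest of the eigenvalues are less than 1 in absolute value.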
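Random Walk of Infinite Steps
Thus $\left( D^{-1/2} W D^{-1/2} \right)^t \to z_1 z_1^T$ as $t \to \infty$, where $z_1$ is the unit eigenvector of $D^{-1/2} W D^{-1/2}$ with eigenvalue 1, and so $P^\infty = D^{-1/2} z_1 z_1^T D^{1/2}$.
Given infinite time steps, the probability of ending in a particular node is independent of the starting node.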
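The independence from the starting node can be observed by raising P to a large power. The sketch assumes P = D^{-1} W on a connected, non-bipartite toy graph. The candidate limit d_j / sum(d) is the standard stationary distribution of such a walk and is stated here as a known fact rather than something taken from the slide.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

d = W.sum(axis=1)
P = W / d[:, None]                        # P = D^{-1} W
pi = d / d.sum()                          # candidate limit: degree / total volume

P_long = np.linalg.matrix_power(P, 2000)  # many steps as a stand-in for infinity
print(np.abs(P_long - pi).max())          # largest gap between any row of P^t and pi
```

Every row of the long-run matrix is numerically the same vector, so the end node no longer depends on where the walk began.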
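Finite Step Random Walk
Given large but finite $t$, we can focus on $z_2$, the eigenvector of $D^{-1/2} W D^{-1/2}$ with the second largest eigenvalue $\lambda_2$:
$\left( D^{-1/2} W D^{-1/2} \right)^t \approx z_1 z_1^T + \lambda_2^t z_2 z_2^T$, so the probability of starting at node $i$ and ending up at node $j$ is increased if $z_{2i}$ and $z_{2j}$ have the same sign.
This suggests that we should cluster based on the sign of $z_{2i}$.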
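The two-term approximation and the resulting sign rule can be checked numerically. The sketch below assumes the symmetric matrix is S = D^{-1/2} W D^{-1/2}. The names `S`, `z1`, `z2`, the step count `t`, and the toy graph are illustrative.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by one weak edge (2,3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

d = W.sum(axis=1)
s = 1.0 / np.sqrt(d)
S = s[:, None] * W * s[None, :]   # S = D^{-1/2} W D^{-1/2}

lam, Z = np.linalg.eigh(S)        # ascending order, so the largest come last
lam1, z1 = lam[-1], Z[:, -1]      # largest eigenvalue (equal to 1)
lam2, z2 = lam[-2], Z[:, -2]      # second largest eigenvalue

t = 20
S_t = np.linalg.matrix_power(S, t)
approx = lam1 ** t * np.outer(z1, z1) + lam2 ** t * np.outer(z2, z2)
print(np.abs(S_t - approx).max()) # error of the two-term approximation

labels = (z2 > 0).astype(int)     # cluster by the sign of z2
print(labels)
```

The eigenvector sign is arbitrary, so the two labels may be swapped between runs or platforms, but nodes of the same triangle share a label.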