
Spectral Clustering

CS534
Spectral Clustering
Represent data points as the vertices V of a graph G.
Vertices are connected by edges E.
Edges have weights W.
Large weights mean that the adjacent vertices are very similar; small weights imply dissimilarity.

Methods that use the spectrum of the similarity matrix W to cluster are known as spectral clustering.
Motivations/Objectives
There are different ways to interpret spectral clustering.
One can view spectral clustering as finding partitions of the graph that minimize the Normalized Cut.
Alternatively, we can view it as performing a random walk on the graph.
Graph partitioning
Graph Terminologies
Degree of a node: $d_i = \sum_j W_{i,j}$

Volume of a set: $\mathrm{Vol}(A) = \sum_{i \in A} d_i$
Graph Cut
Consider a partition of the graph into two parts A and B.

Cut(A, B): the sum of the weights of the edges that connect the two groups,
$\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} W_{i,j}$

An intuitive goal is to find the partition that minimizes the cut.
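As a concrete illustration of degree, volume, and cut, here is a minimal sketch on a made-up 6-node graph (two triangles joined by one weak edge). The weights and the helper names `degree`, `volume`, and `cut` are assumptions chosen for illustration, not part of the lecture.

```python
# Toy symmetric graph: two triangles {0,1,2} and {3,4,5} joined by a weak edge.
# All weights are invented for illustration.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 0.1}
n = 6
W = [[0.0] * n for _ in range(n)]
for (i, j), w in edges.items():
    W[i][j] = W[j][i] = w  # undirected graph, so W is symmetric


def degree(W, i):
    """Degree of node i: sum of the weights of its edges."""
    return sum(W[i])


def volume(W, A):
    """Volume of a node set A: sum of the degrees of its nodes."""
    return sum(degree(W, i) for i in A)


def cut(W, A, B):
    """Total weight of the edges running between A and B."""
    return sum(W[i][j] for i in A for j in B)


A, B = [0, 1, 2], [3, 4, 5]
print("degree of node 2:", degree(W, 2))
print("Vol(A):", volume(W, A))
print("cut(A, B):", cut(W, A, B))
```

Only the single weak edge between nodes 2 and 3 crosses this partition, so the cut equals that edge's weight.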
Min Cut Objective
Min-cut: minimize the weight of connections between groups
$\min_{A,B} \mathrm{cut}(A, B)$
Problem:
Min-cut tends to prefer degenerate solutions (e.g. the red partition)

Need to express a preference for more balanced solutions
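Normalized Cut
Consider the connectivity between groups relative to the volume of each group:
$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)}$

$\mathrm{Ncut}(A, B) = \mathrm{cut}(A, B)\,\frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A)\,\mathrm{Vol}(B)}$

For a fixed cut value, this is minimized when Vol(A) and Vol(B) are equal, which encourages a balanced cut.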
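The contrast between min-cut and Ncut can be checked on a small example. The sketch below uses a made-up 7-node graph: two triangles joined by a bridge of weight 0.5, plus one outlier node attached by an edge of weight 0.4. The graph, the weights, and the helper names are assumptions for illustration only.

```python
def build(n, edges):
    """Build a symmetric weight matrix from a dict {(i, j): weight}."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        W[i][j] = W[j][i] = w
    return W


def cut(W, A, B):
    return sum(W[i][j] for i in A for j in B)


def vol(W, A):
    return sum(sum(W[i]) for i in A)


def ncut(W, A, B):
    c = cut(W, A, B)
    return c / vol(W, A) + c / vol(W, B)


# Two triangles, a bridge between them, and a weakly attached outlier (node 6).
W = build(7, {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
              (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
              (2, 3): 0.5, (5, 6): 0.4})

partitions = {
    "balanced":   ([0, 1, 2], [3, 4, 5, 6]),
    "degenerate": ([6], [0, 1, 2, 3, 4, 5]),
}
for name, (A, B) in partitions.items():
    print(name, "cut =", cut(W, A, B), "Ncut =", ncut(W, A, B))
```

On this graph the degenerate partition has the smaller cut, while the balanced partition has the smaller Ncut, because the outlier's tiny volume makes the term cut/Vol large.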
Solving NCut
How to minimize Ncut?
Let W be the similarity matrix, $W(i, j) = W_{i,j}$;
Let D be the diagonal matrix with $D(i, i) = \sum_j W(i, j)$;
Let x be a vector in $\{1, -1\}^N$, with $x(i) = 1 \Leftrightarrow i \in A$.

With some simplifications, we can show:
$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W) y}{y^T D y}$ (a Rayleigh quotient)
Subject to: $y^T D \mathbf{1} = 0$ (y takes discrete values)

NP-Hard!
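The equality between Ncut and the Rayleigh quotient can be checked numerically. The sketch below assumes one particular discrete encoding, taken from the standard Shi and Malik construction rather than from these slides: y(i) = 1 for i in A and y(i) = -b for i in B, with b = Vol(A)/Vol(B). The toy graph and function names are also assumptions.

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
d = W.sum(axis=1)   # node degrees
D = np.diag(d)      # degree matrix


def ncut(mask):
    """Ncut of the partition (A = mask, B = ~mask)."""
    c = W[mask][:, ~mask].sum()
    return c / d[mask].sum() + c / d[~mask].sum()


def rayleigh(mask):
    """Rayleigh quotient for the assumed encoding y in {1, -b}."""
    b = d[mask].sum() / d[~mask].sum()   # b = Vol(A) / Vol(B)
    y = np.where(mask, 1.0, -b)
    quotient = (y @ (D - W) @ y) / (y @ D @ y)
    constraint = y @ d                   # y^T D 1, should be 0
    return quotient, constraint


for members in ([0, 1, 2], [0, 1], [0, 1, 2, 3]):
    mask = np.zeros(6, dtype=bool)
    mask[members] = True
    q, c = rayleigh(mask)
    print(members, "Ncut =", ncut(mask), "quotient =", q, "y^T D 1 =", c)
```

For every partition tried, the two quantities agree up to floating-point error and the constraint evaluates to zero.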
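Solving NCut
Relax the optimization problem into the continuous domain by solving a generalized eigenvalue system:
$\min_y y^T (D - W) y$ subject to $y^T D y = 1$

Which gives: $(D - W) y = \lambda D y$
Note that $(D - W)\mathbf{1} = 0$, so the first eigenvector is $y_1 = \mathbf{1}$ with eigenvalue 0.
The eigenvector with the second smallest eigenvalue is the real-valued solution to this problem!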
2-way Normalized Cuts
1. Compute the affinity matrix W and the degree matrix D; D is diagonal with $D(i, i) = \sum_j W(i, j)$.

2. Solve $(D - W) y = \lambda D y$, where $D - W$ is called the Laplacian matrix.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
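The three steps can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the generalized problem is solved through the equivalent symmetric problem on $D^{-1/2}(D - W)D^{-1/2}$ (substituting $z = D^{1/2} y$), the split threshold is 0, and the function name `two_way_ncut` and the toy graph are hypothetical.

```python
import numpy as np


def two_way_ncut(W):
    """Bipartition a graph from the second generalized eigenvector.

    Solves (D - W) y = lambda D y via the symmetric matrix
    D^{-1/2} (D - W) D^{-1/2}, then maps back with y = D^{-1/2} z.
    Assumes every node has positive degree.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                                  # Laplacian D - W
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, Z = np.linalg.eigh(L_sym)                  # ascending eigenvalues
    y = d_inv_sqrt * Z[:, 1]                            # second smallest
    return eigvals, y, y > 0                            # threshold at 0


# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

eigvals, y, labels = two_way_ncut(W)
print("smallest eigenvalues:", eigvals[:2])
print("second eigenvector:", y)
print("partition:", labels)
```

The sign of each entry of `y` assigns the node to one side; which side is labelled `True` is arbitrary, since eigenvectors are defined only up to sign.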
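How to Create the Graph?
It is common to use a Gaussian kernel to compute the similarity between objects, for example
$W(i, j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
One could create
A fully connected graph
A K-nearest-neighbor graph (each node is only connected to its K nearest neighbors)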
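Both graph constructions can be sketched as follows. Several choices here are assumptions rather than anything the slides specify: the kernel form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, a zero diagonal (no self-loops), and symmetrizing the K-nearest-neighbor graph by keeping an edge if either endpoint selects the other. The function names and data points are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph with Gaussian-kernel weights (assumed form)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # assumption: no self-loops
    return W


def knn_graph(W, k):
    """Keep only edges to each node's k most similar neighbors."""
    n = W.shape[0]
    nearest = np.argsort(-W, axis=1)[:, :k]   # k largest weights per row
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), k), nearest.ravel()] = True
    keep = keep | keep.T                      # assumption: symmetrize by union
    return np.where(keep, W, 0.0)


# Two small, well separated groups of 2-D points (invented data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
W_full = gaussian_affinity(X, sigma=1.0)
W_knn = knn_graph(W_full, k=2)
print(np.round(W_full, 3))
print(np.round(W_knn, 3))
```

With `k=2`, every point's nearest neighbors lie inside its own group, so the K-nearest-neighbor graph drops all cross-group edges that the fully connected graph keeps with tiny weights.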
Creating a Bipartition Using the 2nd Eigenvector
Sometimes there is no clear threshold for splitting on the second eigenvector, since it takes continuous values.
How to choose the splitting point?
a) Pick a constant value (0, or 0.5).
b) Pick the median value as the splitting point.
c) Look for the splitting point that has the minimum Ncut value:
1. Choose n possible splitting points.
2. Compute the Ncut value for each.
3. Pick the minimum.
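Option (c) can be sketched as a simple sweep over candidate thresholds. The candidates are taken evenly spaced between the smallest and largest entries of the vector, which is one assumption among several possible ways to choose them. The vector `y` below is a made-up stand-in for a second eigenvector, and `best_split` is a hypothetical helper name.

```python
import numpy as np


def ncut_value(W, mask):
    """Ncut of the partition (A = mask, B = ~mask)."""
    d = W.sum(axis=1)
    c = W[mask][:, ~mask].sum()
    return c / d[mask].sum() + c / d[~mask].sum()


def best_split(W, y, n_points=20):
    """Try n_points thresholds on y and keep the one with minimum Ncut."""
    # Evenly spaced candidates strictly between min(y) and max(y).
    candidates = np.linspace(y.min(), y.max(), n_points + 2)[1:-1]
    best_t, best_val = None, np.inf
    for t in candidates:
        mask = y > t
        if mask.all() or not mask.any():
            continue                      # skip empty sides
        val = ncut_value(W, mask)
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val


# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

# Invented continuous values standing in for the second eigenvector.
y = np.array([-0.9, -0.8, -0.7, 0.6, 0.8, 1.0])
t, value = best_split(W, y)
print("threshold:", t, "Ncut:", value, "partition:", y > t)
```

Any threshold falling in the gap between the two groups of values yields the same partition, so the sweep returns the first such candidate.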
K-way Partition?
Recursive bi-partitioning (Hagen et al., '91)
Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner.
Disadvantages: inefficient, unstable
Cluster multiple eigenvectors
Build a reduced space from multiple eigenvectors.
Commonly used in recent papers
A preferable approach; it is like doing dimension reduction and then k-means
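The multiple-eigenvector approach can be sketched as an embedding step followed by k-means. This is a minimal sketch under several assumptions: the embedding uses the k generalized eigenvectors of $(D - W) y = \lambda D y$ with the smallest eigenvalues, the k-means is a bare-bones Lloyd iteration with a deterministic farthest-point initialization, and the 9-node graph is invented. Published variants differ in how they normalize the embedding.

```python
import numpy as np


def spectral_embedding(W, k):
    """Rows are nodes; columns are the k smallest generalized eigenvectors."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - s[:, None] * W * s[None, :]
    _, Z = np.linalg.eigh(L_sym)          # ascending eigenvalues
    return s[:, None] * Z[:, :k]          # y = D^{-1/2} z


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):       # leave empty clusters unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels


# Three triangles joined in a ring by weak edges (weights invented).
W = np.zeros((9, 9))
for start in (0, 3, 6):
    for i in range(start, start + 3):
        for j in range(i + 1, start + 3):
            W[i, j] = W[j, i] = 1.0
for i, j in [(2, 3), (5, 6), (8, 0)]:
    W[i, j] = W[j, i] = 0.1

embedding = spectral_embedding(W, k=3)
labels = kmeans(embedding, k=3)
print(labels)
```

In the embedded space the nodes of each triangle sit close together, so even this simple k-means separates them; the cluster numbering itself is arbitrary.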
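A Random Walk View of Spectral Clustering
Imagine a random walk starting from node i on the graph.
Assume that the probability of taking a step from node i to node j is given by the transition matrix P:
$P = D^{-1} W$, i.e. $P_{i,j} = W_{i,j} / d_i$, where $d_i = \sum_j W_{i,j}$
If we start within one cluster and take a random walk governed by P, we are likely to remain in the same cluster for a long time.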
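A small numerical illustration of the transition matrix, assuming $P = D^{-1} W$ (each row of W divided by the node's degree) and a made-up graph of two triangles joined by a weak edge:

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

d = W.sum(axis=1)        # node degrees
P = W / d[:, None]       # P = D^{-1} W: normalize each row by its degree

print("row sums:", P.sum(axis=1))

# One-step probability of staying inside cluster {0, 1, 2},
# for each possible starting node in that cluster.
stay = P[:3, :3].sum(axis=1)
print("probability of staying in the cluster:", stay)
```

Each row of P sums to 1, and a walker inside a triangle can only leave through the weak edge, so the one-step probability of staying is close to 1 for every starting node.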
Property of Random Walk
If we start at node i, where will we end up after t steps? The probability of being at node j after t steps is the (i, j) entry of $P^t$.
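Transition Matrix Decomposition
Recall that $P = D^{-1} W$, thus we have:
$P = D^{-1/2} \left(D^{-1/2} W D^{-1/2}\right) D^{1/2}$
We will focus on the symmetric variant $D^{-1/2} W D^{-1/2}$ of this matrix for now.
We can decompose it using its eigenvectors:
$D^{-1/2} W D^{-1/2} = \sum_k \lambda_k z_k z_k^T$, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$

Spectral graph theory states that under mild conditions, we have $\lambda_1 = 1$, and the rest of the eigenvalues are less than 1 in absolute value.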
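Random Walk of Infinite Steps
$\left(D^{-1/2} W D^{-1/2}\right)^t = \sum_k \lambda_k^t z_k z_k^T$, where $\lambda_k$ and $z_k$ are the eigenvalues and orthonormal eigenvectors of $D^{-1/2} W D^{-1/2}$
Thus
$P^t = D^{-1/2} \left(\sum_k \lambda_k^t z_k z_k^T\right) D^{1/2}$
Since $|\lambda_k| < 1$ for $k \ge 2$, when $t \to \infty$ we have:
$P^\infty = D^{-1/2} z_1 z_1^T D^{1/2}$
Given infinite time steps, the probability of ending in a particular node is independent of the starting node.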
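The loss of dependence on the starting node can be observed numerically. The sketch below assumes $P = D^{-1} W$ on a made-up connected graph and compares a large power of P with the distribution $d_j / \sum_k d_k$, which is the well-known stationary distribution of a random walk on an undirected graph. The step counts 3 and 2000 are arbitrary choices.

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

d = W.sum(axis=1)
P = W / d[:, None]                       # P = D^{-1} W

P_short = np.linalg.matrix_power(P, 3)   # a few steps
P_long = np.linalg.matrix_power(P, 2000) # many steps
stationary = d / d.sum()                 # d_j / Vol(V)

print("after 3 steps, rows 0 and 5:")
print(np.round(P_short[[0, 5]], 3))
print("after 2000 steps, rows 0 and 5:")
print(np.round(P_long[[0, 5]], 3))
print("d / sum(d):", np.round(stationary, 3))
```

After a few steps the rows still depend strongly on the starting node; after many steps every row matches the same distribution, proportional to the node degrees.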
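Finite Step Random Walk
$P^t \approx D^{-1/2} \left(z_1 z_1^T + \lambda_2^t z_2 z_2^T\right) D^{1/2}$, where $z_1$ and $z_2$ are the eigenvectors of $D^{-1/2} W D^{-1/2}$ with the two largest eigenvalues, $\lambda_1 = 1$ and $\lambda_2$
Given large but finite t, we can focus on the eigenvector with the second largest eigenvalue, $z_2$.
For $\lambda_2 > 0$, the (i, j) entry of the second term carries the sign of $z_2(i)\, z_2(j)$, so the probability of starting at i and ending up at j is increased if $z_2(i)$ and $z_2(j)$ have the same sign.
This suggests that we should cluster based on the sign of $z_2$.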
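The two-term approximation and the sign-based clustering can be sketched as follows, assuming $P = D^{-1} W$ and the symmetric matrix $D^{-1/2} W D^{-1/2}$, with `z1` and `z2` denoting its eigenvectors for the two largest eigenvalues. The toy graph and the step count t = 30 are made up for illustration.

```python
import numpy as np

# Toy graph: two triangles joined by a weak edge (weights invented).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

d = W.sum(axis=1)
s = np.sqrt(d)
S = W / np.outer(s, s)                   # D^{-1/2} W D^{-1/2}
lam, Z = np.linalg.eigh(S)               # ascending eigenvalues
lam1, lam2 = lam[-1], lam[-2]            # two largest eigenvalues
z1, z2 = Z[:, -1], Z[:, -2]

t = 30
exact = np.linalg.matrix_power(W / d[:, None], t)
# D^{-1/2} M D^{1/2} scales entry (i, j) of M by sqrt(d_j) / sqrt(d_i).
two_terms = lam1 ** t * np.outer(z1, z1) + lam2 ** t * np.outer(z2, z2)
approx = two_terms * s[None, :] / s[:, None]

print("largest eigenvalues:", lam1, lam2)
print("max approximation error:", np.abs(exact - approx).max())

labels = z2 > 0                          # cluster on the sign of z2
print("clusters from the sign of z2:", labels)
```

On this graph the remaining eigenvalues are small in magnitude, so their contribution has decayed by t = 30 and the two-term expression closely tracks the exact $P^t$; the sign pattern of `z2` recovers the two triangles, up to an arbitrary global sign flip.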
