Mining Heterogeneous Information Networks
Rokia Missaoui
LARIM
Université du Québec en Outaouais, Canada
http://w3.uqo.ca/missaoui
Acknowledgement
Internet Map
Tutorial Mining Heterogeneous Information Networks EGC2013 - Toulouse 2/17/2013
Outline
Introduction
Network representation and kinds of networks
Why mining heterogeneous information networks (HINs)?
Research work of Han's team on mining HINs
Combining ranking with clustering
Combining ranking with classification
Meta-path-based exploration of information networks
Role discovery and evolution analysis
Other contributions
Our current research on HINs
Conclusion
References
Introduction
Social networks
A social structure of nodes (e.g., individuals or
organizations) that are related to each other by various
ties such as friendship, affinity, collaboration,
Typical social networks
Social bookmarking (Del.icio.us)
Friendship networks (Facebook, Myspace)
Professional networks (LinkedIn)
Media Sharing (Flickr, Youtube)
Folksonomy: collaborative tagging using three entities:
users, resources and tags
Introduction
Information network analysis (Sun & Han, 2012)
Database as an information network: entities and relationships
Focus on heterogeneous information networks since they
contain rich and inter-related semantics
Data mining (DM) techniques: clustering, classification, ranking,
similarity search, link prediction, trends and evolution analysis
Construction of semantically rich networks by exploring links
among node types through DM techniques
A lot of topics that still need to be explored
Main reference
Network Representations
A network/graph: G = (V, E), where V: vertices/nodes, E:
edges/links
Adjacency matrix:
Aij = 1 if there is an edge between vertices i and j; 0 otherwise
Weighted graph:
Edges having weight (strength), usually a real number
Directed network (directed graph): if each edge has a direction
Labeled graph:
Edges have a label (e.g., creation date)
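The representations above can be sketched in a few lines. This is only an illustration: the helper name `adjacency_matrix`, the tuple format for edges, and the 3-node graph are assumptions, not part of the tutorial.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix from (i, j) or (i, j, weight) tuples."""
    matrix = [[0] * n for _ in range(n)]
    for edge in edges:
        i, j = edge[0], edge[1]
        weight = edge[2] if len(edge) == 3 else 1  # unweighted edges count as 1
        matrix[i][j] = weight
        if not directed:
            matrix[j][i] = weight  # undirected: the matrix is symmetric
    return matrix

# Hypothetical 3-node graph: an unweighted edge 0-1 and a weighted edge 1-2.
example_edges = [(0, 1), (1, 2, 0.5)]
undirected = adjacency_matrix(3, example_edges)
directed = adjacency_matrix(3, example_edges, directed=True)
```

In the directed case only the cell for the edge's own direction is set, so the matrix is no longer symmetric.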
Information Network (IN)
A network where each node represents an entity (e.g., actor in a
social network) and each link a relationship between entities
Each node/link may have attributes, labels, and weights
Links may carry rich semantic information
Homogeneous networks
Single object type and single link type (one-mode data)
Web: a collection of linked Web pages
Heterogeneous or multi-typed networks
Multiple object and link types
Medical network: patients, doctors, diseases, treatments
Bibliographic network: publications, authors, venues
Information Network
Information network (Sun & Han, 2012)
Heterogeneous information network
When the number of object types or link types is more than 1
Homogeneous vs. Heterogeneous Networks
Mining Information Networks
Homogeneous networks can often be derived from their original heterogeneous networks
E.g., co-author networks can be derived from author-paper-conference networks by projection on authors only
Paper citation networks can be derived from a complete bibliographic network with papers and citations projected
Heterogeneous INs (HINs) carry richer information than their corresponding projected homogeneous networks
Typed HINs vs. non-typed HINs (i.e., not distinguishing different types of nodes)
Typed nodes and links imply a more structured IN, and thus often lead to more informative discovery
Why Mining INs?
Information networks are everywhere!
Biological networks
Bibliographic networks: DBLP, ArXiv, PubMed,
Social networks: Facebook > 100 million active users
World Wide Web (WWW): > 3 billion nodes, > 50 billion edges
Cyber-physical networks
[Figure: yeast protein interaction network, the Web network, co-author network, social network sites]
What Can Be Mined from Heterogeneous Networks?
DBLP: A Computer Science bibliographic database
> 1.8M papers, > 0.7M authors, > 10K venues, > 70K terms (appearing more than once)
Bipartite Graphs
$G = (V_1 \cup V_2,\ E \subseteq V_1 \times V_2)$
Incidence matrix where cell $A_{ij} = 1$ if there exists a link between $i$ and $j$, 0 otherwise
One can convert two-mode network data into two one-mode data sets, but with information loss
$A \times A^T$ gives the number of nodes in $V_2$ co-linked by both the row and the column in $V_1$. E.g., two authors have papers in both AAAI and ICML
$A^T \times A$ gives the number of nodes in $V_1$ which are linked to both the row and the column in $V_2$. E.g., Jack and Tracy have papers in one conference (SDM)
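The two projections can be checked on a small example. The sketch below assumes that V1 holds conferences (rows) and V2 holds authors (columns). The third author "Ann" and the specific incidence values are hypothetical, chosen so that the two statements on the slide hold.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

conferences = ["AAAI", "ICML", "SDM"]   # V1 (rows)
authors = ["Jack", "Tracy", "Ann"]      # V2 (columns); "Ann" is hypothetical
# A[i][j] = 1 if author j has a paper in conference i (hypothetical data)
A = [
    [0, 1, 1],  # AAAI: Tracy, Ann
    [0, 1, 1],  # ICML: Tracy, Ann
    [1, 1, 0],  # SDM:  Jack, Tracy
]

conf_by_conf = matmul(A, transpose(A))      # A x A^T: shared authors per conference pair
author_by_author = matmul(transpose(A), A)  # A^T x A: shared conferences per author pair

shared_authors_aaai_icml = conf_by_conf[0][1]
shared_confs_jack_tracy = author_by_author[0][1]
```

The diagonal of each product gives the degree of the node in the original bipartite graph, which is one way to see what the projection keeps and what it discards: the identity of the shared neighbors is lost, only their count survives.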
Clustering and Ranking: Two Critical Functions
[Figure: the same objects A-J shown clustered and ranked. Clustering alone: not distinguishing objects in each cluster? Ranking alone: comparing apples and oranges? A better solution: integrating clustering with ranking]
RankClus: Integrating Clustering with Ranking
A New Methodology: RankClus
Ranking as the feature of the cluster
Ranking is conditional on a specific cluster
E.g., VLDB's rank in Theory vs. its rank in the DB area
The distributions of ranking scores over objects are different in each cluster
Clustering and ranking are mutually enhanced
Better clustering: rank distributions for clusters are more distinguishing from each other
Better ranking: better metric for objects is learned from the ranking
Not every object should be treated equally in clustering!
Simple Ranking vs. Authority Ranking
Simple Ranking
Proportional to # of publications of an author / a conference
Considers only immediate neighborhood in the network
Authority Ranking:
More sophisticated rank rules are needed
Propagate the ranking scores in the network over different types
Rules for Authority Ranking
Rule 1: Highly ranked authors publish many papers in highly ranked conferences
Rule 2: Highly ranked conferences attract many papers from many highly ranked authors
Rule 3: The rank of an author is enhanced if he or she co-authors with many highly ranked authors
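The contrast between the two ranking functions can be sketched as follows. This is a minimal illustration and not the authors' implementation: it propagates scores using Rules 1 and 2 only (Rule 3, the co-author term, is left out), and the function names, the iteration count, and the paper counts are all assumptions.

```python
def normalize(scores):
    """Scale scores so they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}

def simple_ranking(papers):
    """Rank conferences and authors by their share of publications.

    papers[conference][author] = number of papers (hypothetical input format).
    """
    conf_scores = {c: sum(row.values()) for c, row in papers.items()}
    author_scores = {}
    for row in papers.values():
        for author, count in row.items():
            author_scores[author] = author_scores.get(author, 0) + count
    return normalize(conf_scores), normalize(author_scores)

def authority_ranking(papers, iterations=50):
    """Propagate scores between conferences and authors (Rules 1 and 2 only)."""
    conf_rank, author_rank = simple_ranking(papers)  # starting point
    for _ in range(iterations):
        # Rule 1: an author scores high by publishing in highly ranked conferences.
        new_author = {a: 0.0 for a in author_rank}
        for conf, row in papers.items():
            for author, count in row.items():
                new_author[author] += count * conf_rank[conf]
        author_rank = normalize(new_author)
        # Rule 2: a conference scores high by attracting highly ranked authors.
        conf_rank = normalize({
            conf: sum(count * author_rank[a] for a, count in row.items())
            for conf, row in papers.items()
        })
    return conf_rank, author_rank

# Hypothetical paper counts.
papers = {
    "VLDB": {"ann": 3, "bob": 2},
    "SIGMOD": {"ann": 2, "bob": 1},
    "Workshop": {"cat": 3, "bob": 1},
}
simple_conf, simple_author = simple_ranking(papers)
auth_conf, auth_author = authority_ranking(papers)
```

On this toy data the simple ranking places "Workshop" above "SIGMOD" because it has more papers, while the propagated ranking reverses the order because SIGMOD's papers come from the more highly ranked authors.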
RankClus: Algorithm Framework
[Figure: iteration loop between sub-network ranking and clustering]
Initialization
Randomly partition
Repeat
Ranking
Ranking objects in each sub-network induced from each cluster
Generating new measure space
Estimate mixture model coefficients for each target object
Adjusting cluster
Until stable
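The shape of the loop above can be sketched with a deliberately simplified stand-in. This is not RankClus itself: the mixture-model estimation is replaced by a plain score (how well each cluster's author ranking explains a conference's papers), ranking is the simple publication share, and a fixed initial partition is used instead of a random one so the run is reproducible. All names and data are hypothetical.

```python
def rank_authors(papers, cluster):
    """Conditional ranking: each author's share of the papers in this cluster."""
    counts = {}
    for conf in cluster:
        for author, n in papers[conf].items():
            counts[author] = counts.get(author, 0) + n
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}

def rank_then_cluster(papers, assignment, k, max_rounds=20):
    """Alternate ranking and cluster adjustment until the assignment is stable."""
    for _ in range(max_rounds):
        # Ranking step: rank authors in the sub-network induced by each cluster.
        clusters = [[c for c, label in assignment.items() if label == i]
                    for i in range(k)]
        ranks = [rank_authors(papers, cluster) for cluster in clusters]
        # New measure space: one score per cluster for every conference.
        new_assignment = {}
        for conf, row in papers.items():
            scores = [sum(n * rank.get(a, 0.0) for a, n in row.items())
                      for rank in ranks]
            # Cluster adjustment: move the conference to its best-scoring cluster.
            new_assignment[conf] = max(range(k), key=lambda i: scores[i])
        if new_assignment == assignment:  # stable
            break
        assignment = new_assignment
    return assignment

# Hypothetical bi-typed network: two database venues and two data mining venues.
papers = {
    "VLDB": {"a1": 1, "a2": 1},
    "SIGMOD": {"a1": 1, "a2": 1},
    "KDD": {"a3": 1, "a4": 1},
    "ICDM": {"a3": 1, "a4": 1},
}
initial = {"VLDB": 0, "SIGMOD": 0, "KDD": 0, "ICDM": 1}  # deliberately wrong for KDD
final = rank_then_cluster(papers, initial, k=2)
```

Starting from a partition that misplaces KDD, one round of ranking followed by adjustment moves it next to ICDM, and the second round confirms that the assignment no longer changes.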
Step-by-Step Running Case Illustration
[Figure: running case illustration; final panel labeled "Stable"]
RankClus: Clustering & Ranking CS Conferences
Time Complexity: Linear to # of Links
At each iteration, |E|: edges in network, m: number of target objects, K: number of clusters
Ranking for sparse network
~O(|E|)
Mixture model estimation
~O(K|E| + mK)
Cluster adjustment
~O(mK^2)
In all, linear w.r.t. |E|
~O(K|E|)
Note: SimRank will be at least quadratic at each iteration since it evaluates distance between every pair in the network
NetClus [KDD'09]: Beyond Bi-Typed Networks
Beyond bi-typed information network
A Star Network Schema [richer information]
Split a network into different layers
Each represented by a network cluster
Multi-Typed Networks Lead to Better Results
The network cluster for database area: Conferences, Authors, and Terms
Better clustering and ranking than RankClus
NetClus: Database System Cluster
Terms
database      0.0995511
databases     0.0708818
system        0.0678563
data          0.0214893
query         0.0133316
systems       0.0110413
queries       0.0090603
management    0.00850744
object        0.00837766
relational    0.0081175
processing    0.00745875
based         0.00736599
distributed   0.0068367
xml           0.00664958
oriented      0.00589557
design        0.00527672
web           0.00509167
information   0.0050518
model         0.00499396
efficient     0.00465707
Conferences
VLDB          0.318495
SIGMOD Conf.  0.313903
ICDE          0.188746
PODS          0.107943
EDBT          0.0436849
Authors
Surajit Chaudhuri      0.00678065
Michael Stonebraker    0.00616469
Michael J. Carey       0.00545769
C. Mohan               0.00528346
David J. DeWitt        0.00491615
Hector Garcia-Molina   0.00453497
H. V. Jagadish         0.00434289
David B. Lomet         0.00397865
Raghu Ramakrishnan     0.0039278
Philip A. Bernstein    0.00376314
Joseph M. Hellerstein  0.00372064
Jeffrey F. Naughton    0.00363698
Yannis E. Ioannidis    0.00359853
Jennifer Widom         0.00351929
Per-Ake Larson         0.00334911
Rakesh Agrawal         0.00328274
Dan Suciu              0.00309047
Michael J. Franklin    0.00304099
Umeshwar Dayal         0.00290143
Abraham Silberschatz   0.00278185
[Figure: ranking authors in XML]
Interesting Results from Other Domains
RankCompete: Organize your photo album automatically!
Rank treatments for AIDS from MEDLINE
From RankClus to RankClass
RankClus [EDBT'09]: Clustering and ranking working together
No training, no available class labels, no expert knowledge
RankClass [KDD'11]: Integration of ranking and classification
Ranking: informative understanding & summary of each class
Class membership is critical information when ranking objects
Let ranking and classification mutually enhance each other!
Output: Classification results + ranking list of objects within each class
Classification Generates Good Ranking Results
DBLP: 4-fields data set (DB, DM, AI, IR) forming a heterog. info. network
Rank objects within each class (with extremely limited label information)
Obtain high classification accuracy and excellent rankings within each class
List objects with the highest confidence measure belonging to conf. & terms
                     Database   Data Mining     AI         IR
Top-5 ranked         VLDB       KDD             IJCAI      SIGIR
conferences          SIGMOD     SDM             AAAI       ECIR
                     ICDE       ICDM            ICML       CIKM
                     PODS       PKDD            CVPR       WWW
                     EDBT       PAKDD           ECML       WSDM
Top-5 ranked         data       mining          learning   retrieval
terms                database   data            knowledge  information
                     query      clustering      reasoning  web
                     system     classification  logic      search
                     xml        frequent        cognition  text
Similarity Search: Find Similar Objects in Networks
DBLP
Who are the most similar to "Christos Faloutsos"?
IMDB
Which movies are the most similar to "Little Miss Sunshine"?
E-Commerce
Which products are the most similar to "Kindle"?
Y. Sun, J. Han, X. Yan, P. S. Yu, and Tianyi Wu, "PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks", VLDB'11
Network Schema and Meta-Path
Network schema
Meta-level description of a network
Meta-Path
Meta-level description of a path between two objects
A path on network schema
Denote an existing or concatenated relation between two object types
[Figure: path instances (Jim-P1-Ann, Mike-P2-Ann, Mike-P3-Bob), their meta-path, and the relation it describes (co-authorship). Relation: describes the type of relationships]
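The link between path instances and a meta-path can be made concrete with a short sketch. It assumes the three instances on the slide come from a paper-to-authors mapping and that the meta-path is Author-Paper-Author. The function name and the data layout are hypothetical.

```python
from itertools import combinations

# Hypothetical paper -> authors mapping matching the path instances on the slide.
paper_authors = {
    "P1": ["Jim", "Ann"],
    "P2": ["Mike", "Ann"],
    "P3": ["Mike", "Bob"],
}

def apa_instances(paper_authors):
    """List path instances of the meta-path Author-Paper-Author.

    One instance is emitted per unordered pair of authors on the same paper.
    """
    return [
        (a, paper, b)
        for paper, authors in paper_authors.items()
        for a, b in combinations(authors, 2)
    ]

instances = apa_instances(paper_authors)

# The relation described by the meta-path (co-authorship), queried for one author.
coauthors_of_ann = sorted(
    {a if b == "Ann" else b for a, _, b in instances if "Ann" in (a, b)}
)
```

Every instance has the same type signature (author, paper, author); the meta-path is that signature, and the co-authorship relation is what it denotes.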
Different Meta-Paths Tell Different Semantics
Who are most similar to Christos Faloutsos?
One Meta-Path Is Better Than Others
Which pictures are most similar to [query image]?
[Figure labels: Group, Image]
PathSim: Similarity in Terms of "Peers"
Why peers?
Strongly connected, while similar visibility
[Figure: Amazon Kindle, B&N Nook, Sony Reader, Kobo eReader]
In addition to meta-path
Need to consider similarity measures
Existing Similarity Measures
[Figure: definitions of existing similarity measures between objects x and y]
Note: P-PageRank and SimRank do not distinguish object type and relationship type
Only PathSim Can Find Peers
Limitations of Existing Measures
Random walk (RW): Favor highly visible objects
objects with large degrees
Pairwise random walk (PRW): Favor "pure" objects
objects with highly skewed distribution in their in-links or out-links
PathSim
Favor "peers": objects with strong connectivity and similar visibility under the given meta-path
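For a symmetric meta-path, PathSim as defined in the cited VLDB'11 paper is s(x, y) = 2 |paths x to y| / (|paths x to x| + |paths y to y|). The sketch below computes it for the meta-path Author-Paper-Venue-Paper-Author, where the number of path instances between two authors is the sum over venues of the product of their paper counts. The paper counts are hypothetical.

```python
# Hypothetical paper counts: papers_in[author][venue] = number of papers.
papers_in = {
    "Mike": {"SIGMOD": 2, "VLDB": 1},
    "Jim": {"SIGMOD": 50, "VLDB": 20},
    "Bob": {"SIGMOD": 2, "VLDB": 1},
    "Ann": {"ICDE": 1, "KDD": 1},
}

def path_count(w, x, y):
    """Number of Author-Paper-Venue-Paper-Author path instances from x to y."""
    return sum(n * w[y].get(venue, 0) for venue, n in w[x].items())

def pathsim(w, x, y):
    """PathSim: connectivity between x and y, normalized by their own visibility."""
    return 2 * path_count(w, x, y) / (path_count(w, x, x) + path_count(w, y, y))

mike_bob = pathsim(papers_in, "Mike", "Bob")
mike_jim = pathsim(papers_in, "Mike", "Jim")
mike_ann = pathsim(papers_in, "Mike", "Ann")
```

Jim shares many more paths with Mike than Bob does, yet Bob comes out as the closer peer: the denominator penalizes Jim's much larger visibility, which is the behavior that a plain random walk lacks.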
Comparing Similarity Measures in DBLP Data
Which venues are most similar to DASFAA?
[Figure annotation: Favor highly visible objects]
Are these tiny forums most similar to SIGMOD?
Find Academic Peers by PathSim
Anhai Doan: CS, Wisconsin; Database area; PhD: 2002
Jignesh Patel: CS, Wisconsin; Database area; PhD: 1998
Meta-Path: Author-Paper-Venue-Paper-Author
PathPredict: Meta-Path Based Relationship Prediction
Wide applications
Whom should I collaborate with?
Which paper should I cite for this topic?
Whom else should I follow on Twitter?
Whether Ann will buy the book "Steve Jobs"?
Whether Bob will click the ad on hotel?
Relationship Prediction vs. Link Prediction
Link prediction in homogeneous networks [Liben-Nowell and Kleinberg, 2003; Hasan et al., 2006]
E.g., friendship prediction
Relationship prediction in heterogeneous networks
Different types of relationships need different prediction models
Different connection paths need to be treated separately!
Meta-path-based approach to define topological features.
Why Prediction Using Heterogeneous Info Networks?
Meta-Path Based Co-authorship Prediction in DBLP
Co-authorship prediction problem
Whether two authors are going to collaborate for the first time
Co-authorship encoded in meta-path
Author-Paper-Author
Topological features encoded in meta-paths
Meta-paths between authors under length 4
Case Study: Predicting Concrete Co-Authors
High quality predictive power for such a difficult task
Using data in T0 = [1989; 1995] and T1 = [1996; 2002]
Predict new co-author relationship in T2 = [2003; 2009]
When Will It Happen?
From "whether" to "when"
"Whether": Will Jim rent the movie "Avatar" in Netflix?
Within 1 month? 3 months? 1 year? Need to build different models!
"When": When will Jim rent the movie "Avatar"?
What is the probability Jim will rent "Avatar" within 2 months?
By when will Jim rent "Avatar" with 90% probability?
What is the expected time it will take for Jim to rent "Avatar"?
Y. Sun, J. Han, C. C. Aggarwal, and N. Chawla, "When Will It Happen? Relationship Prediction in Heterogeneous Information Networks", WSDM'12, Feb. 2012
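Once a distribution over the time until the relationship forms is available, the three "when" questions reduce to a CDF value, a quantile, and a mean. The sketch below assumes, purely for illustration, an exponentially distributed time with a hypothetical rate. It does not reproduce the model of the cited paper.

```python
import math

def prob_within(rate, t):
    """P(T <= t) for an exponential time-to-event with the given rate."""
    return 1 - math.exp(-rate * t)

def time_for_prob(rate, p):
    """Smallest t such that P(T <= t) = p (the p-quantile)."""
    return -math.log(1 - p) / rate

def expected_time(rate):
    """Mean time until the event."""
    return 1 / rate

rate = 0.5  # hypothetical: 0.5 events per month

p_two_months = prob_within(rate, 2)   # probability of renting within 2 months
t_ninety = time_for_prob(rate, 0.9)   # by when, with 90% probability
t_mean = expected_time(rate)          # expected waiting time, in months
```

A single fitted time model answers all three questions, whereas the "whether" formulation needs a separate classifier for each time horizon.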
Role Discovery in Networks: Why Does It Matter?
[Figure: army communication network (imaginary); automatically infer roles: Commander, Captain, Soldier]
Role Discovery
Objective: Extract semantic meaning from plain links to finely model and better organize information networks
Challenges
Latent semantic knowledge
Interdependency
Scalability
Opportunity
Human intuition
Realistic constraint
Cross-check with collective intelligence
Methodology: propagate simple intuitive rules and constraints over the whole network
Discovery of Advisor-Advisee Relationships in DBLP Network
Input: DBLP research publication network
Output: Potential advising relationship and its ranking (r, [st, ed])
Ref. C. Wang, J. Han, et al., "Mining Advisor-Advisee Relationships from Research Publication Networks", SIGKDD 2010
[Figure: input temporal collaboration network (Ada, Bob, Jerry, Ying, Smith; 1999-2004); output relationship analysis with scored candidate relations, e.g., (0.9, [/, 1998]), (0.8, [1999, 2000]), (0.65, [2002, 2004]); visualized chronological hierarchies]
Mining Evolution and Dynamics in HINs
Many networks are with time information
E.g., according to paper publication year, DBLP networks can form network sequences
Motivation: Model evolution of communities in heterogeneous network
Automatically detect the best number of communities in each timestamp
Model the smoothness between communities of adjacent timestamps
Model the evolution structure explicitly
Birth, death, split
EvoNetClus: Modeling evolution of dynamic heterogeneous networks
Co-evolution within a community
heterogeneous multi-typed object/links
Discovery of evolution structures among different communities
Y. Sun, et al., "Studying Co-Evolution of Multi-Typed Objects in Dynamic Heterogeneous Information Networks", MLG'10
Evolution: Idea Illustration
From network sequences to evolutionary communities
Related Work
Backstrom & Leskovec (2011)
Supervised random walks for link prediction and recommendation in HINs
Dong et al. (2012)
A ranking factor graph model (RFG) for predicting links in HINs
Tang et al. (2008, 2012)
Community evolution in HINs (called multi-mode networks) using a clustering method on evolving networks
Davis et al. (2011)
Link prediction in HINs using an extension to Adamic/Adar measure and exploiting classification
Our Current Work
Goal
Mining heterogeneous information networks
Approach
Exploit the potential and theoretical basis of formal concept analysis (FCA) and two of its extensions to manage multidimensionality and heterogeneity in networks
Use and adapt a set of findings on concept pruning, core/peripheral node identification, network partitioning (e.g., biclustering and triclustering), taxonomy-based mining to better analyze large networks and extract rich patterns such as groups and association rules
Our Current Work
Networks under study
Analysis of a HIN with an interaction network together with an affiliation one for link prediction and recommendation
Generalize the approach to an arbitrary HIN
Analysis of a network with tri-dimensional and even n-dimensional data using triadic (and later on polyadic) concept analysis
Exploration of our previous work on formal concept analysis (e.g., implications with negation, attribute/object generalization, operations on lattices) to detect richer and user-oriented patterns in HINs
Our Current Work
Link prediction and recommendation
Add a new node Ni and at least one link in a HIN with two types of nodes and links
Use formal concept analysis together with concept (cluster) pruning and weighting to suggest a set of links to be added between the new node and existing nodes
[Figure: new node (Alan) and the links to recommend]
Our Current Work
Networks with tri-dimensional data
Objects with attributes under conditions
E.g., events (1..5), researchers (P, N, R, K, S) and roles (a, b, c, d)
a: speaker (at a given event), b: organizer, c: author, d: PC member
E.g., Researcher K attends Event 2 with two roles: author and PC member
Our Current Work
Triadic concept analysis (Lehmann & Wille, 1995)
Our Current Work
E.g., the triadic concept (345, RK, ab) means that Events 3, 4 & 5 attract Researchers R & K with roles a and b
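A triadic concept is a triple of sets whose full Cartesian product lies inside the ternary relation and that cannot be enlarged in any component. The sketch below checks this for the example above. The relation is hypothetical (the slide's data table was not recoverable); it only contains the triples needed for the two examples in the text plus one extra triple.

```python
from itertools import product

def is_box(relation, events, researchers, roles):
    """True if every (event, researcher, role) combination is in the relation."""
    return all(triple in relation for triple in product(events, researchers, roles))

def is_triadic_concept(relation, events, researchers, roles):
    """True if the triple of sets is a box that cannot be extended in any component."""
    if not is_box(relation, events, researchers, roles):
        return False
    all_events = {e for e, _, _ in relation}
    all_researchers = {r for _, r, _ in relation}
    all_roles = {c for _, _, c in relation}
    # Maximality: adding any single element to one component must break the box.
    for e in all_events - set(events):
        if is_box(relation, set(events) | {e}, researchers, roles):
            return False
    for r in all_researchers - set(researchers):
        if is_box(relation, events, set(researchers) | {r}, roles):
            return False
    for c in all_roles - set(roles):
        if is_box(relation, events, researchers, set(roles) | {c}):
            return False
    return True

# Hypothetical ternary relation of (event, researcher, role) triples.
relation = set(product({3, 4, 5}, {"R", "K"}, {"a", "b"}))
relation |= {(2, "K", "c"), (2, "K", "d")}  # K attends Event 2 as author and PC member
relation |= {(1, "P", "a")}                 # extra triple, purely illustrative

full_concept = is_triadic_concept(relation, {3, 4, 5}, {"R", "K"}, {"a", "b"})
partial_box = is_triadic_concept(relation, {3, 4}, {"R", "K"}, {"a", "b"})
```

The second call returns False even though every combination is present, because Event 5 could still be added: a box is a concept only when it is maximal.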
Our Current Work
Triadic association rules
E.g., any role (e.g., speaker) played by S is also played by P
$N \xrightarrow{ad} P$: whenever researcher N attends events as a speaker and PC member, P does so
Rokia Missaoui & Léonard Kwuida. Mining Triadic Association Rules from Ternary Relations. ICFCA 2011, p. 204-218.
Conclusion (Sun & Han, 2012)
Rich knowledge can be mined from information networks
What is the magic?
Heterogeneous, structured information networks!
Clustering, ranking and classification: Integrated clustering, ranking and classification: RankClus, RankClass,
Meta-path-based similarity search and relationship prediction
Role discovery and evolutionary analysis
Knowledge is power, but knowledge is hidden in massive links!
Mining heterogeneous information networks: Much more to be explored!!
Future Research (Sun & Han, 2012)
Discovering ontology and structure in information networks
Discovering and mining hidden information networks
Mining information networks formed by structured data linking with unstructured data (text, multimedia and Web)
Mining cyber-physical networks (networks formed by dynamic sensors, image/video cameras, with information networks)
Enhancing the power of knowledge discovery by transforming massive unstructured data: Incremental information extraction, role discovery, multi-dimensional structured info-net
Mining noisy, uncertain, un-trustable massive data sets by information network analysis approach
Turning Wikipedia and/or Web into structured or semi-structured databases by heterogeneous information network analysis
References: Books on Network Analysis
A. L. Barabasi. Linked: How Everything Is Connected to Everything Else and What It Means. Plume, 2003.
M. Buchanan. Nexus: Small Worlds and the Groundbreaking Theory of Networks. W. W. Norton & Company, 2003.
D. J. Cook and L. B. Holder. Mining Graph Data. John Wiley & Sons, 2007.
S. Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003.
A. Degenne and M. Forse. Introducing Social Networks. Sage Publications, 1999.
P. J. Carrington, J. Scott, and S. Wasserman. Models and Methods in Social Network Analysis. Cambridge University Press, 2005.
J. Davies, D. Fensel, and F. van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. John Wiley & Sons, 2003.
D. Fensel, W. Wahlster, H. Lieberman, and J. Hendler. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2002.
L. Getoor and B. Taskar (eds.). Introduction to Statistical Relational Learning. MIT Press, 2007.
B. Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, 2006.
J. P. Scott. Social Network Analysis: A Handbook. Sage Publications, 2005.
D. J. Watts. Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2003.
D. J. Watts. Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, 2003.
S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
References: Some Overview Papers
T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.
C. Cooper and A. Frieze. A general model of web graphs. Algorithms, 22, 2003.
S. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv., 38, 2006.
T. Dietterich, P. Domingos, L. Getoor, S. Muggleton, and P. Tadepalli. Structured machine learning: The next ten years. Machine Learning, 73, 2008.
S. Dumais and H. Chen. Hierarchical classification of web content. SIGIR'00.
S. Dzeroski. Multi-relational data mining: An introduction. ACM SIGKDD Explorations, July 2003.
L. Getoor. Link mining: a new data mining challenge. SIGKDD Explorations, 5:84-89, 2003.
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of relational structure. ICML'01.
D. Jensen and J. Neville. Data mining in networks. In Papers of the Symp. Dynamic Social Network Modeling and Analysis, National Academy Press, 2002.
T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explorations, 5, 2003.
References: Some Influential Papers
A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33, 2000.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW'98.
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the web's link structure. COMPUTER, 32, 1999.
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM'99.
M. Girvan and M. E. J. Newman. Community structure in social and biological networks. In Proc. Natl. Acad. Sci. USA 99, 2002.
B. A. Huberman and L. A. Adamic. Growth dynamics of world wide web. Nature, 399:131, 1999.
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. KDD'02.
D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. KDD'03.
J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models, and methods. COCOON'99.
J. M. Kleinberg. Small world phenomena and the dynamics of information. NIPS'01.
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. FOCS'00.
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45, 2003.
References: Clustering and Ranking (1)
E. Airoldi, D. Blei, S. Fienberg and E. Xing, "Mixed Membership Stochastic Blockmodels", JMLR'08
Liangliang Cao, Andrey Del Pozo, Xin Jin, Jiebo Luo, Jiawei Han, and Thomas S. Huang, "RankCompete: Simultaneous Ranking and Clustering of Web Photos", WWW'10
G. Jeh and J. Widom, "SimRank: a measure of structural-context similarity", KDD'02
Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han, "Community Outliers and their Efficient Detection in Information Networks", KDD'10
M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks", Physical Review E, 2004
M. E. J. Newman and M. Girvan, "Fast algorithm for detecting community structure in networks", Physical Review E, 2004
J. Shi and J. Malik, "Normalized cuts and image segmentation", CVPR'97
Yizhou Sun, Yintao Yu, and Jiawei Han, "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema", KDD'09
Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng, and Tianyi Wu, "RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis", EDBT'09
References: Clustering and Ranking (2)
Yizhou Sun, Jiawei Han, Jing Gao, and Yintao Yu, "iTopicModel: Information Network-Integrated Topic Modeling", ICDM'09
Yizhou Sun, Charu C. Aggarwal, and Jiawei Han, "Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes", PVLDB 5(5), 2012
A. Wu, M. Garland, and J. Han. Mining scale-free networks using geodesic clustering. KDD'04
Z. Wu and R. Leahy, "An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation", IEEE Trans. Pattern Anal. Mach. Intell., 1993.
X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. SCAN: A structural clustering algorithm for networks. KDD'07
Xiaoxin Yin, Jiawei Han, Philip S. Yu. "LinkClus: Efficient Clustering via Heterogeneous Semantic Links", VLDB'06.
Yintao Yu, Cindy X. Lin, Yizhou Sun, Chen Chen, Jiawei Han, Binbin Liao, Tianyi Wu, ChengXiang Zhai, Duo Zhang, and Bo Zhao, "iNextCube: Information Network-Enhanced Text Cube", VLDB'09 (demo)
X. Yin, J. Han, and P. S. Yu. Cross-relational clustering with user's guidance. KDD'05
References: Network Classification (1)
A. Appice, M. Ceci, and D. Malerba. Mining model trees: A multi-relational approach. ILP'03
Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, and Jiawei Han, "Bipartite Graph-based Consensus Maximization among Supervised and Unsupervised Models", NIPS'09
L. Getoor, N. Friedman, D. Koller and B. Taskar, "Learning Probabilistic Models of Link Structure", JMLR'02.
L. Getoor, E. Segal, B. Taskar and D. Koller, "Probabilistic Models of Text and Link Structure for Hypertext Classification", IJCAI WS "Text Learning: Beyond Classification", 2001.
L. Getoor, N. Friedman, D. Koller, and A. Pfeffer, "Learning Probabilistic Relational Models", chapter in Relation Data Mining, eds. S. Dzeroski and N. Lavrac, 2001.
M. Ji, Y. Sun, M. Danilevsky, J. Han, and J. Gao, "Graph-based classification on heterogeneous information networks", ECML-PKDD'10.
M. Ji, J. Han, and M. Danilevsky, "Ranking-based Classification of Heterogeneous Information Networks", KDD'11.
Q. Lu and L. Getoor, "Link-based classification", ICML'03
D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks", CIKM'03
References: Network Classification (2)
J. Neville, B. Gallagher, and T. Eliassi-Rad. Evaluating statistical tests for within-network classifiers of relational data. ICDM'09.
J. Neville, D. Jensen, L. Friedland, and M. Hay. Learning relational probability trees. KDD'03
Jennifer Neville, David Jensen, "Relational Dependency Networks", JMLR'07
M. Szummer and T. Jaakkola, "Partially labeled classification with markov random walks", In NIPS, volume 14, 2001.
M. J. Rattigan, M. Maier, and D. Jensen. Graph clustering with network structure indices. ICML'07
P. Sen, G. M. Namata, M. Galileo, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine, 29, 2008.
B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. IJCAI'01
B. Taskar, P. Abbeel, M. F. Wong, and D. Koller, "Relational Markov Networks", chapter in L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, 2007
X. Yin, J. Han, J. Yang, and P. S. Yu, "CrossMine: Efficient Classification across Multiple Database Relations", ICDE'04.
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf, "Learning with local and global consistency", In NIPS 16, Vancouver, Canada, 2004.
X. Zhu and Z. Ghahramani, "Learning from labeled and unlabeled data with label propagation", Technical Report, 2002.
References: Social Network Analysis
B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar, A. Joshi, and T. Finin. Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. WWW'06
R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu. Mining newsgroups using networks arising from social behavior. WWW'03
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. WWW'04
D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Community mining from multi-relational networks. PKDD'05
P. Domingos. Mining social networks for viral marketing. IEEE Intelligent Systems, 20, 2005.
P. Domingos and M. Richardson. Mining the network value of customers. KDD'01
P. DeRose, W. Shen, F. Chen, A. Doan, and R. Ramakrishnan. Building structured web community portals: A top-down, compositional, and incremental approach. VLDB'07
G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35, 2002.
J. Kubica, A. Moore, and J. Schneider. Tractable group detection on large link data sets. ICDM'03
References: Link and Relationship Prediction
V. Leroy, B. B. Cambazoglu, and F. Bonchi, "Cold start link prediction", KDD'10.
D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks", CIKM'03.
R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction", KDD'10.
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal and Jiawei Han, "Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks", ASONAM'11.
Yizhou Sun, Jiawei Han, Charu C. Aggarwal, and Nitesh V. Chawla, "When Will It Happen? Relationship Prediction in Heterogeneous Information Networks", WSDM'12.
B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller, "Link prediction in relational data", NIPS'03.
Xiao Yu, Quanquan Gu, Mianwei Zhou, and Jiawei Han, "Citation Prediction in Heterogeneous Bibliographic Networks", SDM'12.
References: Role Discovery and Summarization
D. Archambault, T. Munzner, and D. Auber. Topolayout: Multilevel graph layout by topological features. IEEE Trans. Vis. Comput. Graph, 2007.
Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Dhiraj Joshi, and Jiawei Han, "iRIN: Image Retrieval in Image-Rich Information Networks", WWW'10 (demo paper)
Lu Liu, Feida Zhu, Chen Chen, Xifeng Yan, Jiawei Han, Philip Yu, and Shiqiang Yang, "Mining Diversity on Networks", DASFAA'10
Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. SIGMOD'08
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo, "Mining Advisor-Advisee Relationships from Research Publication Networks", KDD'10
Zhijun Yin, Manish Gupta, Tim Weninger and Jiawei Han, "LINKREC: A Unified Framework for Link Recommendation with User Attributes and Graph Structure", WWW'10
Peixiang Zhao, Xiaolei Li, Dong Xin, and Jiawei Han, "Graph Cube: On Warehousing and OLAP Multidimensional Networks", SIGMOD'11
Peixiang Zhao and Jiawei Han, "On Graph Query Optimization in Large Networks", Proc. 2010 Int. Conf. on Very Large Data Bases (VLDB'10), Singapore, Sept. 2010
References: Network Evolution
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. KDD'06
M.-S. Kim and J. Han. A particle-and-density based evolutionary clustering method for dynamic networks. VLDB'09
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. KDD'05
Yizhou Sun, Jie Tang, Jiawei Han, Manish Gupta, Bo Zhao, "Community Evolution Detection in Dynamic Heterogeneous Information Networks", KDD-MLG'10