
INARC I3.2 Mid-Year Report
I3.2: Modeling and Mining of Text-Rich Information Networks

Dan Roth (Task Co-Lead), Jiawei Han (Task Co-Lead), Heng Ji (CUNY), Xifeng Yan (UCSB)
University of Illinois at Urbana-Champaign
NS-CTA: INARC

Urbana-Champaign, May 12, 2012

I3.2: Modeling and Mining of Text-Rich Information Networks


Key Objectives:
- Structurally model a text-rich info. network and investigate methods for mining knowledge from such networks
- Enhance keyword search and knowledge discovery capability by the text-rich info. network model

Deliverables:
- Q1: Methodologies for modeling and construction of multidimensional, relatively structured info. networks by progressive info. network analysis
- Q2: Models for enhanced text data analysis using relatively structured, heterogeneous info. networks
- Q3: Methods for multi-facet search in text-rich info. networks
- Q4: System prototype demo of the approaches

Impact:
- Modeling, principles, and methodologies developed for text-rich info. networks will lead to more relevant query results

Key Technical Innovations:
- Exploitation of mostly unstructured data from reports along with some relatively structured metadata and the links (e.g., hyperlinks) between reports to discover key entities associated to a given query. The exploitation builds upon semantic processing (e.g., topic modeling), network analysis (e.g., iTopics) and data mining (e.g., topic/text cubes) technologies
- Efficient algorithms to enrich text mining techniques with the info. network topology
- Information trustworthiness analysis in text-rich info. networks and other text-rich networks

Researchers:
- Lead: D. Roth, UIUC (INARC)
- Lead: J. Han, UIUC (INARC)
- Primary: H. Ji, CUNY (INARC)
- Primary: X. Yan, UCSB (INARC)
- Collaborators: N. Chawla, Notre Dame (SCNARC) (linked with E2.3); J. J. Garcia-Luna-Aceves, UCSC (CNARC); M. Magdon-Ismail, RPI (SCNARC) (linked with S2.1); Z. Wen, IBM (SCNARC)

Total: $322K

Advancing the State-of-the-Art of Network Science

Text-Rich Information Networks: Combining contents & network
- Focused on large heterogeneous information networks
  - Collections of news articles from diverse resources, blogs and forums
  - Wikipedia, an information network consisting of structured and unstructured data
- Developed state-of-the-art algorithmic tools
  - Supporting knowledge acquisition, information extraction, text modeling and integrated information structure discovery
  - Utilizing deep text analysis & large scale statistical models over the content and the structure of the network
  - Making use of both explicit network structure and hidden "ontological" structure (e.g., category structure)
- Advanced our understanding of how to:
  - Acquire and extract information from heterogeneous information networks when data is noisy, volatile, uncertain, and incomplete

Overall Task Organization

Subtask 1: Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis (diagram label: Text-rich InfoNet Construction)

Subtask 2: Enhanced text data analysis using relatively structured, heterogeneous information networks (diagram label: Topic Modeling and Discovery with InfoNet)

Subtask 3: Multi-facet search in text-rich information networks (diagram label: Multi-Facet Search)

Novelty Claims

Subtask 1: Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Explicitly capture the interplay between textual topics and network structure

Subtask 2: Enhanced text data analysis using relatively structured, heterogeneous information networks
Novel theories and methods to make text data and information network mutually enhance each other in text understanding and information analysis

Subtask 3: Multi-facet search in text-rich information networks

Exploring effective methods for search and mining in text-rich information networks

Subtask 1: Text-Rich Network Modeling and Construction


Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis

- Data Fusion and Information Network Fusion:
  - Web structure mining for integration of web data with info. networks [WWW11, SIGMOD11 demo]
  - Wikification (integration of Wikipedia for entity/concept resolution) [ACL11]
- Enrichment & disambiguation of information network:
  - Dynamic Acquisition of Taxonomic Relations Network [EMNLP10]
  - Leverage Semantic Information Network to Enhance Entity Co-reference Resolution and Entity Identification [ACL-HLT11]
  - Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution [EMNLP2011 SUB]
  - Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery [TAC10]

Growing Parallel Paths for Web Structure Mining


[Figure: parallel link paths through HTML DOM elements (HTML, DIV, UL, LI, P, TABLE, TR, TD) leading from Page A to entity pages B, C, D, E and F, followed by a "Result" panel.]

Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, Growing Parallel Paths for Entity-Page Discovery, WWW'11, Mar. 2011
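
The intuition behind parallel paths can be illustrated on a single page: links that sit at the same DOM path (for example HTML > DIV > UL > LI) tend to point to pages of the same kind. The sketch below is a simplified, single-page illustration built on Python's standard `html.parser`; it is not the algorithm from the cited paper. The names `LinkPathCollector` and `parallel_link_groups` and the `min_size` threshold are hypothetical, and the code assumes well-formed HTML with explicit closing tags.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag; they are not pushed on the path stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class LinkPathCollector(HTMLParser):
    """Records, for every <a href=...>, the DOM tag path leading to it."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.links_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links_by_path["/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (assumes well-formed markup).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parallel_link_groups(html, min_size=2):
    """Return {dom_path: [hrefs]} for paths shared by at least min_size links."""
    collector = LinkPathCollector()
    collector.feed(html)
    return {
        path: hrefs
        for path, hrefs in collector.links_by_path.items()
        if len(hrefs) >= min_size
    }
```

Links that share a path form one candidate group; an isolated link (such as a lone "About" link inside a paragraph) is filtered out by `min_size`.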

WinaCS: Web Information Network Analysis for Computer Science


- Web structure-guided information extraction and integration
- Integration of DBLP information networks
- Integration of mined web structures with DBLP networks for knowledgebase construction

Supports intelligent querying & mining

Tim Weninger, Marina Danilevsky, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", ACM SIGMOD'11 (system demo), Athens, Greece, June 2011.

Wikification: Example: Entity Resolution & Tracking

Regi Blinker played three matches for Oranje. The left winger started his pro career at Feyenoord and played 400 official matches for Feyenoord, Celtic and Sparta. He retired from football in 2003. Where is he now?

Wikification [ACL2011]

Given:
- An information network consisting of news articles and blogs
- Wikipedia: text, structured information, network (hyperlink) structure and ontological (category) structure

Goal:
- Identify all entities and concepts mentioned in articles and blogs
- Disambiguate & map each entity and concept to its appropriate Wikipedia page (entity and concept resolution)
- Associate with each concept a collection of semantic attributes
- Progressively enrich the information network and enable better access to it

Approach: a global optimization problem that accounts for
- Local, node-specific information
- Global, node and network structure information
- Ontological network structure
Machine Learning algorithms determine candidates and rank nodes.

Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, Local and Global Algorithms for Disambiguation to Wikipedia, ACL11
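
To make the local/global split concrete, here is a minimal sketch that ranks candidate Wikipedia pages for one mention. It assumes, purely for illustration, that the local score is a bag-of-words cosine between the mention context and the page text, and that the global score is the average Jaccard overlap between the candidate's links and the links of pages already chosen for other mentions. The function names, the `weight_global` parameter and the toy page data are hypothetical and are not taken from the cited system.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(links_a, links_b):
    """Overlap between two sets of linked page titles."""
    a, b = set(links_a), set(links_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def rank_candidates(context_tokens, candidates, context_pages, weight_global=1.0):
    """Rank candidate pages by local (text) plus global (link coherence) score.

    candidates: {title: {"text": [tokens], "links": {titles}}}
    context_pages: link sets of pages already chosen for other mentions.
    """
    scored = []
    for title, page in candidates.items():
        local = cosine(context_tokens, page["text"])
        if context_pages:
            coherence = sum(jaccard(page["links"], links) for links in context_pages)
            coherence /= len(context_pages)
        else:
            coherence = 0.0
        scored.append((local + weight_global * coherence, title))
    return [title for _, title in sorted(scored, reverse=True)]
```

For the "Sparta" mention in the Regi Blinker example, a football-club page that shares vocabulary with the context and links with the page chosen for "Feyenoord" would be ranked above the page for the ancient city.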

Identification of Taxonomic Relations [EMNLP2010]

Examples:
- (Honda, Toyota) are siblings
- M1A2 is-a Tank is-a Vehicle
- AK-47 is-a Gun

The use of information networks to acquire taxonomic relations.

Given:
- An information network consisting of news articles and blogs
- Pairs of concepts or entities

Make use of:
- Wikipedia: text, structured information, network (hyperlink) structure and ontological (category) structure

Goal:
- Developing large ontologies is essential to progressively enrich the information network and enable better access to it
- A huge amount of work has been done on developing stationary networks; these suffer from low coverage, noise, and brittleness

Approach: a Machine Learning & Optimization based approach
- Exploits the fact that data in heterogeneous information networks is noisy, uncertain, and incomplete
- Considers multiple relations and makes use of a global constraint optimization process to leverage both Wikipedia and the web
- Significantly outperforms existing well-known taxonomical networks

Quang Do and Dan Roth, Constraints based Taxonomic Relation Classification, EMNLP10

Leverage Semantic Information Network to Enhance Entity Coreference Resolution / Entity Identification
[Figure panels: Disambiguation; Name Variant Clustering]

9.4% absolute improvement in micro-averaged accuracy


(CUNY) Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". ACL-HLT2011.

Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution

- Previous methods only focused on the target node and one learning theory itself
- Propose a new collaborative network ranking theory which imitates human collaborative learning
- Leverage inter-connections among collaborative entities in information networks
- Automatic profiling for each node
- Construct a collaborative network for each entity based on graph-based clustering
- Rank multiple decisions from collaborative entities (micro) and algorithms (macro) based on global prediction
- 7% absolute improvement in micro-averaged accuracy
- On-going CUNY+UIUC work: using topic modeling for entity clustering

[Figure: collaborative network connecting a query node q to collaborators cq1 through cq7, with example edge weights 0.7, 0.4, 0.3 and 0.6, and candidate objects oA and oB with their correct rank.]

(CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Proc. EMNLP2011 [SUB]

Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery
[Figure: "911 Suspect Terrorist Network" and "Terrorist Information Network". Person nodes (Wail Al-Shehri, Waleed Al-Shehri, Abdul Aziz Al-Omari, Abdul Rahman Al-Omari, Mohamed Atta) are linked to sources (forum, twitter, news page, web blog) and to attribute values (Al-Qaeda, Khamis Mushait, Boston, Saudi Arabian Airlines) through edges labeled member, origin, residence, sibling and pilot.]

- Discovered 26 roles for persons, 16 roles for organizations and 13 roles for locations
- Markov Logic Networks for cross-slot and cross-query reasoning based on InfoNet and textual linkages to resolve conflicts and predict missing links
  - Weight=15: ∀x,y,z Ambiguous(x, y) ∧ TextualLinkage(y, z) ∧ Pilot(x) ∧ Pilot(z) ⇒ Remove(x)
  - Weight=100: ∀x,y,z Sibling(x, y) ∧ Origin(y, z) ⇒ Origin(x, z)
- Maximum Entropy based Learning-to-rank model to re-rank candidate answers
- 13%-22% absolute F-measure improvement

(CUNY) Chen et al. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. TAC2010 and Lecture Notes in Computer Science, 2010
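
As a rough illustration of how a weighted rule can predict a missing link, the sketch below scores two candidate "worlds" the way a Markov Logic Network does in its exponent: weight times the number of satisfied groundings of a formula. It reads the weight-100 rule as Sibling(x, y) ∧ Origin(y, z) ⇒ Origin(x, z). The function names and the tiny fact set are hypothetical (the person and place names are borrowed from the figure only as labels), and a real MLN would also normalize over all worlds and learn the weights.

```python
from itertools import product


def sibling_origin_holds(facts, x, y, z):
    """One grounding of: Sibling(x, y) AND Origin(y, z) IMPLIES Origin(x, z)."""
    body = ("Sibling", x, y) in facts and ("Origin", y, z) in facts
    head = ("Origin", x, z) in facts
    return (not body) or head


def world_score(facts, people, places, weight=100.0):
    """Weight times the number of satisfied groundings (unnormalized log-score)."""
    satisfied = sum(
        sibling_origin_holds(facts, x, y, z)
        for x, y, z in product(people, people, places)
    )
    return weight * satisfied


PEOPLE = ["Wail Al-Shehri", "Waleed Al-Shehri"]
PLACES = ["Khamis Mushait", "Boston"]

# Toy evidence: one sibling link and one known origin.
EVIDENCE = {
    ("Sibling", "Waleed Al-Shehri", "Wail Al-Shehri"),
    ("Origin", "Wail Al-Shehri", "Khamis Mushait"),
}
# Candidate world that additionally fills the missing Origin slot.
WITH_INFERRED = EVIDENCE | {("Origin", "Waleed Al-Shehri", "Khamis Mushait")}
```

The world that fills in the sibling's origin violates no grounding of the rule, so it scores higher than the world that leaves the slot empty; this is the sense in which the rule "predicts" the missing link.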

Subtask 2. Network-Enhanced Text Analysis


Enhanced text data analysis using relatively structured, heterogeneous information networks

- Progressive Dynamic Information Network Analysis [EMNLP11, ACL-HLT2011 (sub)]
- Integration of Heterogeneous Info. Network and Topic Modeling (Biased Propagation) [KDD11 (sub)]
- Topic Modeling for Active Learning and Inference in Event Network Construction [ACL-HLT2011 (sub)]
- Geographical Topic Discovery & Comparison [WWW11]
- Latent Association Analysis of Document Pairs [KDD11 (sub)]

Progressive Dynamic Information Network Analysis


Motivations
- Most information obtained on text-rich InfoNet construction so far is viewed as static, ignoring the temporal dimension of many types of attributes

Approaches
- Temporal Role Representation: [T1 T2 T3 T4] = <start-start, end-start, start-end, end-end>
- New Evaluation Metric
- Local temporal role discovery using new kernel methods based on dependency paths
- Global inference and aggregation to resolve conflicts using Integer Linear Programming (ongoing collaboration with Dan Roth at UIUC)

Results
- State-of-the-art temporal role classification accuracy and lowest vagueness/over-constraining

[Figure: example information network around Ali Larijani (Islamic Republic of Iran Broadcasting, Iran Supreme National Security Council, Tehran University, Farideh Motahari, Hassan Rowhani), and a results chart comparing Baseline, Aggregation over 2 tuples, Aggregation over 10 tuples, and Our Approach on Information Networks.]

(CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information Extraction. EMNLP2011, ACL-HLT2011 [SUB]
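
The four-tuple representation lends itself to a simple cross-document aggregation, sketched below. The sketch assumes the tuple is read as bounds on a role's interval (T1 <= start <= T2 and T3 <= end <= T4), so that evidence from several documents can be combined by tightening each bound. That reading, the function names, and the use of `date.min`/`date.max` as "unknown" bounds are assumptions made for illustration; the ILP-based conflict resolution mentioned on the slide is not shown.

```python
from datetime import date

UNKNOWN_LOW = date.min   # stands in for "no lower bound known"
UNKNOWN_HIGH = date.max  # stands in for "no upper bound known"


def aggregate(tuples):
    """Tighten the bounds of several (T1, T2, T3, T4) tuples for the same role.

    Lower bounds (T1, T3) take the latest value seen; upper bounds (T2, T4)
    take the earliest value seen.
    """
    t1 = max(t[0] for t in tuples)
    t2 = min(t[1] for t in tuples)
    t3 = max(t[2] for t in tuples)
    t4 = min(t[3] for t in tuples)
    return (t1, t2, t3, t4)


def is_consistent(four_tuple):
    """A tuple is usable only if each interval is non-empty and start can precede end."""
    t1, t2, t3, t4 = four_tuple
    return t1 <= t2 and t3 <= t4 and t1 <= t4


# Document 1 only says the role held on 2005-08-15.
held_on = (UNKNOWN_LOW, date(2005, 8, 15), date(2005, 8, 15), UNKNOWN_HIGH)
# Document 2 says the role started on 2004-05-01.
started_on = (date(2004, 5, 1), date(2004, 5, 1), date(2004, 5, 1), UNKNOWN_HIGH)

combined = aggregate([held_on, started_on])
```

When two documents disagree, the aggregated tuple can become inconsistent (for example T1 later than T2), which is the kind of conflict a global inference step would need to resolve.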

Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks [KDD11 (sub)]

Problem and Motivation:
- Discover latent topics & identify clusters of multi-typed objects simultaneously
- Treat multi-typed objects differently (e.g., documents D with rich text and objects U without explicit text)

Solution and Contribution:
- Basic idea: biased topic propagation
- Propose a novel TMBP algorithm that directly incorporates a heterogeneous InfoNet, instead of a homogeneous InfoNet, with topic modeling (improves 20%-40% over PLSA)

[Figure panels: Topic model; Topic modeling with heterogeneous InfoNet; Biased propagation]

(UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, and Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks", KDD'11 (sub)
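
The propagation idea can be sketched in a few lines of plain Python. The code below is only an illustration under simplifying assumptions: objects without text receive the weighted average of the topic distributions of the documents they link to, and each document then mixes its own distribution with what its neighbors propagate back, controlled by a bias parameter. The parameter name `xi`, the function names and the toy data are hypothetical; the full TMBP algorithm estimates the distributions with an EM-style procedure that is not shown here.

```python
def weighted_average(vectors_with_weights, num_topics):
    """Weighted mean of topic vectors; uniform if there is no weight at all."""
    total = sum(w for _, w in vectors_with_weights)
    if total == 0:
        return [1.0 / num_topics] * num_topics
    return [
        sum(w * vec[k] for vec, w in vectors_with_weights) / total
        for k in range(num_topics)
    ]


def propagate_to_objects(doc_topics, links, num_topics):
    """Objects without text inherit topics from the documents they link to.

    links: {object_id: {doc_id: link_weight}}
    """
    return {
        obj: weighted_average([(doc_topics[d], w) for d, w in docs.items()], num_topics)
        for obj, docs in links.items()
    }


def biased_doc_update(doc_topics, obj_topics, links, xi, num_topics):
    """Mix each document's own topics with those propagated from its neighbors."""
    updated = {}
    for doc, own in doc_topics.items():
        neighbors = [(obj_topics[o], docs[doc]) for o, docs in links.items() if doc in docs]
        if not neighbors:
            updated[doc] = list(own)
            continue
        propagated = weighted_average(neighbors, num_topics)
        updated[doc] = [(1 - xi) * own[k] + xi * propagated[k] for k in range(num_topics)]
    return updated
```

With `xi = 0` the update leaves the text-only topic estimates untouched, and larger values let the network structure pull linked documents toward each other.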

Topic Modeling for Active Learning and Inference in Event Network Construction
[Figure: documents Doc 1 through Doc N grouped into topic clusters, each shown with example topic words (e.g., north, talks, Korea, south, Putin, Pyongyang, officials, Washington; nuclear, program, China, weapons, United States; troops, Saddam, Iraqi, fighting, army, regime, British, forces, military, city, Kurdish, control, Baghdad; court, York, dollars, case, million, AFP, government, media, convicted, billion, company) and linked to event types.]

Event types shown in the figure:
- Event Type: "Contact"; Trigger: talk, meet, etc.; Arguments: "Entity", "Instrument", "Place", "Time-Within"
- Event Type: "Business"; Trigger: form, dissolve; Arguments: "Org", "Place", "Time-Within", "Agent"
- Event Type: "Attack"; Trigger: blew, attack; Arguments: "Attacker", "Target", "Place", "Time-Within"
- Event Type: "Justice"; Trigger: arrest, jail; Arguments: "Defendant", "Time-Within", "Adjudicator", "Place"
- Event Type: "Transaction"; Trigger: borrow, launch; Arguments: "Giver", "Recipient", "Money", "Seller", "Artifact", "Buyer"

- Topic modeling can enhance information network construction by grouping similar event types together and converging information distributions
- Using topic modeling, with only 1/4 of the training data we can achieve performance comparable to passive learning
- Cross-document inference within topic clusters provided a 10% improvement over state-of-the-art event extraction and significant gains over IR-based clustering
- Ongoing work: apply new entity-driven and biased-propagation-based topic modeling methods

(CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for Event Extraction. ACL-HLT2011 [sub]

Geographical Topic Discovery & Comparison

Motivation: Analyze GPS-associated documents, e.g., geo-tagged photos and tweets sent from iPhones

Problem: Given a collection of GPS-associated documents and a number of topics K, discover K geo-topics along with the topic distribution in different geographical locations

Latent Geographical Topic Analysis
- Combine text and GPS location info
- Words that are close to each other are more likely to be in the same region
- Words that are in the same regions are more likely to be in the same topic
- Regions are not known beforehand. Our framework adopts the region discovery process according to the dataset

Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, Geographical Topic Discovery and Comparison, WWW'11, Mar. 2011


Latent Association Analysis of Document Pairs


[Figure: document pairs linked through a correlation factor, with a topic simplex for Corpus 1 and a topic simplex for Corpus 2.]

- Latent Association Analysis (LAA) mines the topics of two document sets simultaneously, taking the bipartite network between the two document sets into consideration
- One of the first attempts to analyze the topic structures of two connected document sets, aiming to infer their mapping network model
- LAA significantly outperforms existing algorithms with 70% accuracy improvement
Gengxin Miao, et al., Latent Association Analysis of Document Pairs, KDD11 (sub)

Subtask 3: Multi-Facet Search and Mining

- Information Network-Based Trustworthiness Analysis [COLING10, Army Sci10 (Best Paper Award), WWW11]
- Progressive Network Analysis for Expert Search (Diffusion through Co-occurrence Relationships for Expert Search on the Web) [SIGIR11 sub]
- Modeling and Exploiting Heterogeneous Sources for Expertise Ranking [SIGIR11 sub]
- Personalized Recommendation on Information Networks [SIGIR11 sub]
- Multi-facet Search in Self-Boosting Information Networks (Demo: Terrorism Network Search and Browsing) [SIGIR11 sub]

Information Network-Based Trustworthiness Analysis

Given:
- Multiple information networks: websites, blogs, forums, sensor networks
- Some claims, e.g., [Person A travelled to France], [There is a fire in downtown Chicago]
- Prior beliefs and background knowledge

Our goal is to:
- Score trustworthiness of claims based on
  - support across multiple (trusted) sources in the network
  - source characteristics: reputation, interest-group, verifiability of information, etc.
  - prior beliefs and background knowledge
- Rate databases/sources as more/less trustworthy
- Track how the trustworthiness of a fact / database varies as the text corpus grows over time

New framework for incorporating prior knowledge into any fact-finding algorithm
- Done via a Linear Programming approach
- Highly expressive declarative constraints
- Tractable (polynomial time)
- Prior knowledge improves results; it is absolutely essential when the user's judgment varies from the norm

Dan Roth et al., COLING10, Army Sci10 (Best Paper Award), WWW11
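
For readers unfamiliar with fact-finding algorithms, the sketch below shows one generic member of that family, a hubs-and-authorities style iteration often called Sums: a source's trust is the sum of the beliefs in its claims, and a claim's belief is the sum of the trust of the sources asserting it, with both normalized each round. It is offered only as background for what "any fact-finding algorithm" may look like; it does not implement the Linear Programming framework for prior knowledge described above. The function name, the claim identifiers and the toy sources are hypothetical.

```python
def sums_fact_finder(claims_by_source, iterations=20):
    """Iteratively score sources (trust) and claims (belief).

    claims_by_source: {source_id: set of claim_ids asserted by that source}
    Returns (trust, belief), each normalized so the maximum value is 1.0.
    """
    sources = list(claims_by_source)
    claims = sorted({c for cs in claims_by_source.values() for c in cs})
    belief = {c: 1.0 for c in claims}
    trust = {s: 1.0 for s in sources}
    for _ in range(iterations):
        # A source is as trustworthy as the claims it makes are believed.
        trust = {s: sum(belief[c] for c in claims_by_source[s]) for s in sources}
        top = max(trust.values()) or 1.0
        trust = {s: t / top for s, t in trust.items()}
        # A claim is as believable as the sources asserting it are trusted.
        belief = {
            c: sum(trust[s] for s in sources if c in claims_by_source[s])
            for c in claims
        }
        top = max(belief.values()) or 1.0
        belief = {c: b / top for c, b in belief.items()}
    return trust, belief


# Toy input: two sources report a fire, one source denies it.
TOY_SOURCES = {
    "blog_a": {"fire_downtown_chicago"},
    "news_b": {"fire_downtown_chicago", "person_a_travelled_to_france"},
    "forum_c": {"no_fire_downtown_chicago"},
}
```

In this toy input the claim backed by two sources ends up with the highest belief, and the lone dissenting source loses trust over the iterations. Prior knowledge (for example, that the two fire claims are mutually exclusive) is exactly what such a plain iteration cannot express.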

Progressive Network Analysis for Expert Search

Goal: find and rank people who have the expertise described by a user query
- Web pages are noisier and contain more spam than an enterprise corpus, so both relevance and reputation should be considered
- Use a heterogeneous hypergraph to model the co-occurrence relationships among people and words, and devise a heat diffusion model on the hypergraph
- Applied to 0.5B web pages
- Accuracy: 50%-200% improvement over the leading language model methods; significantly overcomes noise in the Web

Ziyu Guan, et al., Diffusion through Co-occurrence Relationships for Expert Search on the Web, SIGIR11 (sub)
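
A heat diffusion model can be sketched on an ordinary weighted graph, as below. This is a deliberate simplification: the slide describes diffusion on a heterogeneous hypergraph, whereas this sketch uses pairwise co-occurrence edges between words and people, places the initial heat on the query word, and applies a discrete update in which heat flows along each edge in proportion to the temperature difference. The function name, the step size `alpha` and the toy edges are hypothetical.

```python
def diffuse_heat(edges, initial_heat, alpha=0.1, steps=5):
    """Discrete heat diffusion on an undirected weighted graph.

    edges: list of (node_u, node_v, weight) co-occurrence links
    initial_heat: {node: heat}; nodes not listed start at 0
    alpha: step size; keep alpha * (sum of weights at any node) <= 1 for stability
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    heat = {node: initial_heat.get(node, 0.0) for node in neighbors}
    for _ in range(steps):
        updated = {}
        for node, nbrs in neighbors.items():
            # Net inflow: heat moves from hotter neighbors to cooler nodes.
            flow = sum(w * (heat[other] - heat[node]) for other, w in nbrs)
            updated[node] = heat[node] + alpha * flow
        heat = updated
    return heat


# Toy graph: the query word co-occurs often with alice, rarely with bob;
# carol only co-occurs with bob.
TOY_EDGES = [
    ("word:mining", "alice", 3.0),
    ("word:mining", "bob", 1.0),
    ("carol", "bob", 1.0),
]
```

People are then ranked by the heat they have accumulated: those who co-occur strongly with the query word rank first, and people reachable only through other people still receive some heat, which is how reputation can spread beyond direct matches.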


Modeling and Exploiting Heterogeneous Sources for Expertise Ranking


Problem: How to leverage both heterogeneous network and documents to identify the relevant experts for a given query?
Baseline: The expertise of a person could be characterized based on his/her associated documents (doc-based method)

Intuitions:
- Citation graph: Similar documents are likely to have similar relevance to a given query
- Coauthor graph: Two authors are most likely to share similar expertise if they coauthor many papers
- Document-author bipartite graph: mutually reinforced between documents (x) and authors (y)

Solution: We formulate a joint regularization framework to incorporate several hypotheses to capture the information of different graphs together with textual documents

Result: Using DBLP with 2M nodes and 10M edges. Significant improvements over the baseline.

[Figure: citation graph and coauthor graph; top-10 experts for the query "Information retrieval".]
Hongbo Deng, et al., Modeling and Exploiting Heterogeneous Sources for Expertise Ranking, SIGIR11 (sub)


Multi-Facet Search in Self-Boosting Information Networks (Example: Terrorism Network Search and Browsing)
Demo: http://blender2.cs.qc.cuny.edu/BlenderGraph/
Video: http://nlp.cs.qc.cuny.edu/terrorism.m4v

- Facilitate a military analyst in expert finding and in terrorist information search, gathering, control and analysis for any given query
- Entity-topic analyzer for self-expansion and self-boosting: terrorism organization members, status of members (die, arrest, ...) and information networks associated with each member
(CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting Information Networks. SIGIR2011 [SUB]

Military Relevance

Subtask 1: Text-Rich Network Modeling and Construction
- Object search enhanced by entity disambiguation and role discovery can provide methods for finding groups of soldiers and identifying terrorists with certain expertise

Subtask 2: Network-Enhanced Text Analysis
- Asymmetric wars and counter-terrorism need to understand text-rich networks
- Text mining for monitoring potential threats and detecting terrorism with entity-topic modeling and event detection and tracking

Subtask 3: Multi-Facet Search and Mining in Text-Rich Networks
- Most military applications need to search in multi-facets on text and unstructured data, including emails, reports, telecommunication messages, military-related news and blogs
- Our multi-facet, multi-dimensional information network search and browsing tool has rich functions and provides intelligent network expansions

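I3.2's Collaboration Network

[Figure: collaboration diagram centered on I3.2.]
- I3.2: Han, Ji, Roth, Yan (weekly telecons, frequent emails, 5 joint papers)
- Linked tasks and collaborators: I1.1 (Roth, Huang); I1.2 (Tarek, Charu); I3.1 (Han, Yan); T1.1 (Adali); T1.4 (Parsons); T1.5 (Lin, Wen); S1.1 (Lin); E2.3 (Han); ARL (Cole, Winkler); IRC (Leung)
- Collaboration topics labeled in the diagram: Logic Reasoning for Information Validation; Military Data for Topic Analysis; Data & Experiments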

Next Six Months and Path Ahead to 2012

Continue research on mining text-intensive information networks
- Research in three frontiers: (1) integrated classification and clustering in network mining, (2) build up a theory on link/relationship analysis in heterogeneous networks, and (3) explore military applications

Collaborations with researchers in other networks
- Work with Nitesh Chawla, who has done much work on link prediction, on evaluation of mining methods for clustering and classification of heterogeneous networks
- Work with SCNARC (Boleslaw Szymanski et al.) on using the method developed here to mine social and cognitive networks

Next year research planned if funded
- Effective theory and methods for mining heterogeneous networks involving social and communication networks
- Network classification and clustering modeling in heterogeneous information, social, and communication networks
- Application of role discovery, network classification, and anomaly detection methods in military applications

Research Papers (Accepted/Published, 2011)


1. Tim Weninger, Marina Danilevsky, Fabio Fumarola, Joshua Hailpern, Jiawei Han, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", Proc. of 2011 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'11), (system demo paper), Athens, Greece, June 2011.
2. Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, "Geographical Topic Discovery and Comparison", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Full paper).
3. Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, "Growing Parallel Paths for Entity-Page Discovery", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Poster paper).
4. Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". Accepted by Proc. the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011), 2011.
5. Heng Ji, Adam Lee and Wen-Pin Lin. "Information Network Construction and Alignment from Automatically Acquired Comparable Corpora". Invited book chapter for Building and Using Comparable Corpora. Springer, 2011.
6. Heng Ji, Benoit Favre, Wen-Pin Lin, Dan Gillick, Dilek Hakkani-Tur and Ralph Grishman. 2011. "Open-domain Multi-document Summarization via Information Extraction: Challenges and Prospects". Invited book chapter for Multi-source, Multilingual Information Extraction and Summarisation. Springer.
7. Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, "Local and Global Algorithms for Disambiguation to Wikipedia", ACL11.
8. Y. Chan and D. Roth, "Exploiting Syntactico-Semantic Structures for Relation Extraction", ACL11.

Research Papers (Published, Sept.-Dec. 2010)


1. Manish Gupta, Rui Li, Zhijun Yin, and Jiawei Han, "Survey on Social Tagging Techniques", SIGKDD Explorations, 12(1):58-72, 2010.
2. Lu Liu, Jie Tang, Jiawei Han, Meng Jiang, Shiqiang Yang, "Mining Topic-Level Influence in Heterogeneous Networks", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010.
3. Tim Weninger, Fabio Fumarola, Jiawei Han, Donato Malerba, "Mapping Web Pages to Database Records via Link Paths", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010.
4. Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han, "The Wisdom of Social Multimedia: Using Flickr for Prediction and Forecast", Proc. 2010 ACM Multimedia Int. Conf. (ACM-Multimedia10), Florence, Italy, Oct. 2010.
5. Zheng Chen, Suzanne Tamang, Adam Lee, Xiang Li, Wen-Pin Lin, Javier Artiles, Matthew Snover, Marissa Passantino and Heng Ji. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. Text Analytics Conference (TAC2010), 2010.
6. Hao Li, Xiang Li, Heng Ji and Yuval Marton. 2010. "Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation". Proc. the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 2010).
7. J. Pasternack and Dan Roth, "Comprehensive Trust Metrics for Information Networks", Army Science Conf.10 (Best Paper Award), Dec. 2010.
8. Q. Do and D. Roth, "Constraints based Taxonomic Relation Classification", EMNLP10, Oct. 2010.

Research Papers (Submitted, 2011)


1. (UIUC + U. Michigan) Cindy Xide Lin, Qiaozhu Mei, Yunliang Jiang, and Jiawei Han, "Inferring the Diffusion and Evolution of Topics in Social Communities", KDD'11 (sub)
2. (UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks", KDD'11 (sub)
3. (UIUC) Zhijun Yin (UIUC), Liangliang Cao (UIUC), Jiawei Han (UIUC), Chengxiang Zhai (UIUC), Thomas Huang (UIUC), "LPTA: A Probabilistic Model for Latent Periodic Topic Analysis", KDD'11 (sub)
4. (CUNY + UIUC) Heng Ji and Jiawei Han. 2011. Web-Scale Knowledge Discovery and Information Extraction. Invited Paper for IEEE Special Issue on Web-Scale Multimedia Processing and Applications. In Preparation.
5. (CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for Event Extraction. Submitted to the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011)
6. (CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting Information Networks. Submitted to the 34th Annual International ACM SIGIR Conference (SIGIR2011)
7. (CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information Extraction. Submitted to Empirical Methods in Natural Language Processing (EMNLP2011)
8. (CUNY) Javier Artiles, Enrique Amigo, Qi Li and Heng Ji. 2011. Evaluating Temporal Information Extraction. Submitted to ACL-HLT2011
9. (CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
10. (CUNY) Qi Li, Javier Artiles and Heng Ji. 2011. Dependency Paths Kernel for Temporal Relation Classification. Submitted to 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011).
11. (CUNY) Suzanne Tamang and Heng Ji. 2011. Learning-to-Rank for Slot Filling System Combination and Assessment. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
12. (CUNY) Zheng Chen, Suzanne Tamang, Adam Lee and Heng Ji. 2011. A Toolkit for Knowledge Base Population. Submitted to the 34th Annual International ACM SIGIR Conference (SIGIR2011)
13. (CUNY) Xiang Li and Heng Ji. 2011. Comment-guided Reinforcement Learning for Slot Filling. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)

Other Technical Contributions

- (Book: UIC + UIUC + CMU) Philip S. Yu, Jiawei Han, and Christos Faloutsos (Editors), LINK MINING: MODELS, ALGORITHMS AND APPLICATIONS, Springer, 2010.
- (UIUC) Jiawei Han has received the Daniel C. Drucker Eminent Faculty Award at UIUC.
- (UCSB) Ms. Gengxin Miao, who was supported by the INARC program, has received an IBM Ph.D. Fellowship for 2011-2012. Gengxin Miao is co-supervised by Xifeng Yan at INARC.
- (CUNY) Heng Ji. CUNY Chancellor's "Salute to Scholar" Award, November 2010.
- (CUNY) Heng Ji. National Science Foundation Research Experiences for Undergraduates, March 2011.
- Jiawei Han, Towards Integrated Mining of Multiple Social and Information Networks (keynote speech), The 2011 Int. Conf. on Advances in Social Network Analysis and Mining (ASONAM11), July 2011.
- Jiawei Han, Exploring the Power of Heterogeneous Information Networks in Data Mining (keynote speech), The 2011 Int. SIAM Data Mining Conf. (SDM11), April 2011.
- Jiawei Han, Construction and Analysis of Web-Based Computer Science Information Networks (keynote speech), The 2011 Int. Conf. on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, June 2011.
- Latifur Khan, Wei Fan, Jiawei Han, Jing Gao, Mohammad Mehedy Masud, Data Stream Mining: Challenges and Techniques (tutorial), The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2011), May 2011.
- Jiawei Han, Web Structure Mining and Information Network Analysis: An Integrated Approach, invited speech at the Third International Workshop on Network Theory: Web Science Meets Network Science, March 2011.
- Heng Ji, Web-Scale Knowledge Discovery and Population from Unstructured Data, keynote speech, ACLCLP 2010 Information Retrieval Conference, December 2010.
- Heng Ji, Overview of the TAC2010 Knowledge Base Population Track, keynote speech at Web People Search (WePS-3) Conference, September 2010.

Personalized Recommendation on Information Networks


[Figure: concept extraction, mapping text to concepts]
Combine text & links in heterogeneous networks

Find good conceptual associations of user interests; distinguish clean sources and noisy sources

(UIUC) Chi Wang, et al., Learning Relevance in a Heterogeneous Social Network and Its Application in Online Targeting, SIGIR11 (sub)

