I3.2.pptx (take note)

INARC I3.2 Mid-Year Report
I3.2: Modeling and Mining of Text-Rich Information Networks
Dan Roth (Task Co-Lead), Jiawei Han (Task Co-Lead), Heng Ji (CUNY), Xifeng Yan (UCSB)
University of Illinois at Urbana-Champaign
NS-CTA: INARC
Urbana-Champaign, May 12, 2012
I3.2: Modeling and Mining of Text-Rich Information Networks

Key Objectives:
  Structurally model a text-rich info. network and investigate methods for mining knowledge from such networks
  Enhance keyword search and knowledge discovery capability through the text-rich info. network model
Deliverables:
  Q1: Methodologies for modeling and construction of multidimensional, relatively structured info. networks by progressive info. network analysis
  Q2: Models for enhanced text data analysis using relatively structured, heterogeneous info. networks
  Q3: Methods for multi-facet search in text-rich info. networks
  Q4: System prototype demo of the approaches
Impact:
  Modeling, principles, and methodologies developed for text-rich info. networks will lead to more relevant query results
Key Technical Innovations:
  Exploitation of mostly unstructured data from reports, along with some relatively structured metadata and the links (e.g., hyperlinks) between reports, to discover key entities associated with a given query. The exploitation builds upon semantic processing (e.g., topic modeling), network analysis (e.g., iTopics), and data mining (e.g., topic/text cubes) technologies
  Efficient algorithms to enrich text mining techniques with the info. network topology
  Information trustworthiness analysis in text-rich info. networks and other text-rich networks
Researchers (by role):
  Lead: D. Roth, UIUC (INARC)
  Lead: J. Han, UIUC (INARC)
  Primary: H. Ji, CUNY (INARC)
  Primary: X. Yan, UCSB (INARC)
  Collaborators: N. Chawla, Notre Dame (SCNARC) (linked with E2.3); J. J. Garcia-Luna-Aceves, UCSC (CNARC); M. Magdon-Ismail, RPI (SCNARC) (linked with S2.1); Z. Wen, IBM (SCNARC)
Total $322K
Advancing the State-of-the-Art of Network Science
Text-Rich Information Networks: Combining contents & network
Focused on large heterogeneous information networks:
  Collections of news articles from diverse resources, blogs, and forums
  Wikipedia, an information network consisting of structured and unstructured data
Developed state-of-the-art algorithmic tools:
  Supporting knowledge acquisition, information extraction, text modeling, and integrated information structure discovery
  Utilizing deep text analysis & large-scale statistical models over the content and the structure of the network
  Making use of both explicit network structure and hidden "ontological" structure (e.g., category structure)
Advanced our understanding of how to:
  Acquire and extract information from heterogeneous information networks when data is noisy, volatile, uncertain, and incomplete
Overall Task Organization
Subtask 1 (Text-rich InfoNet Construction): Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Subtask 2 (Topic Modeling and Discovery with InfoNet): Enhanced text data analysis using relatively structured, heterogeneous information networks
Subtask 3 (Multi-Facet Search): Multi-facet search in text-rich information networks
Novelty Claims
Subtask 1: Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Explicitly capture the interplay between textual topics and network structure
Subtask 2: Enhanced text data analysis using relatively structured, heterogeneous information networks
Novel theories and methods that make text data and information networks mutually enhance each other in text understanding and information analysis
Subtask 3: Multi-facet search in text-rich information networks
Exploring effective methods for search and mining in text-rich information networks
Subtask 1: Text-Rich Network Modeling and Construction

Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Data Fusion and Information Network Fusion:
  Web structure mining for integration of web data with info. networks [WWW11, SIGMOD11 demo]
  Wikification (integration of Wikipedia for entity/concept resolution) [ACL11]
Enrichment & disambiguation of information network:
  Dynamic Acquisition of Taxonomic Relations Network [EMNLP10]
  Leverage Semantic Information Network to Enhance Entity Co-reference Resolution and Entity Identification [ACL-HLT11]
  Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution [EMNLP2011 SUB]
  Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery [TAC10]
Growing Parallel Paths for Web Structure Mining

[Figure: example DOM paths (HTML, DIV, UL, LI, TABLE, TR, TD tags) across Pages A through F]
Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, Growing Parallel Paths for Entity-Page Discovery, WWW'11, Mar. 2011
WinaCS: Web Information Network Analysis for Computer Science

Web structure-guided information extraction and integration
Integration of DBLP information networks
Integration of mined web structures with DBLP networks for knowledge base construction
Supports intelligent querying & mining
Tim Weninger, Marina Danilevsky, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", ACM SIGMOD'11 (system demo), Athens, Greece, June 2011.
Wikification: Example: Entity Resolution & Tracking
Regi Blinker played three matches for Oranje. The left winger started his pro career at Feyenoord and played 400 official matches for Feyenoord, Celtic and Sparta. He retired from football in 2003. Where is he now?
Wikification [ACL2011]
Given:
  An information network consisting of news articles and blogs
  Wikipedia: Text, Structured Information, Network (hyperlink) Structure, and Ontological (Category) Structure
Goal:
  Identify all entities and concepts mentioned in articles and blogs
  Disambiguate & map each entity and concept to its appropriate Wikipedia page (Entity and Concept Resolution)
  Associate with each concept a collection of semantic attributes
  Progressively enrich the information network and enable better access to it
Approach:
  A global optimization problem that accounts for:
    Local, node-specific information
    Global, node and network structure information
    Ontological network structure
  Machine learning algorithms determine candidates and rank nodes
Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, Local and Global Algorithms for Disambiguation to Wikipedia, ACL11
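The local-plus-global idea on this slide can be illustrated with a minimal sketch. This is not the ACL11 system: the scoring functions (bag-of-words cosine for local evidence, link-set Jaccard overlap for global coherence), the `alpha` mixing weight, and the toy `PAGES` and `MENTIONS` data are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(x, y):
    """Overlap between two sets of outgoing links."""
    return len(x & y) / len(x | y) if (x or y) else 0.0


def disambiguate(mentions, pages, alpha=0.5):
    """Map each mention to one candidate page title.

    mentions: {mention: (context_words, [candidate titles])}
    pages:    {title: {"text": [words], "links": set of titles}}
    alpha:    hypothetical weight of global coherence vs. local similarity
    """
    # Pass 1 (local): compare the mention context with each candidate's text.
    local = {}
    for m, (ctx, cands) in mentions.items():
        ctx_bag = Counter(ctx)
        local[m] = {c: cosine(ctx_bag, Counter(pages[c]["text"])) for c in cands}
    first_guess = {m: max(s, key=s.get) for m, s in local.items()}

    # Pass 2 (global): prefer candidates coherent with the other mentions' guesses.
    result = {}
    for m, scores in local.items():
        others = [t for o, t in first_guess.items() if o != m]

        def total(c):
            coh = sum(jaccard(pages[c]["links"], pages[t]["links"]) for t in others)
            coh = coh / len(others) if others else 0.0
            return (1 - alpha) * scores[c] + alpha * coh

        result[m] = max(scores, key=total)
    return result


# Toy, made-up page data (not real Wikipedia content).
PAGES = {
    "Sparta Rotterdam": {"text": ["football", "club", "rotterdam", "matches"],
                         "links": {"Eredivisie", "Rotterdam"}},
    "Sparta": {"text": ["ancient", "greek", "city"],
               "links": {"Greece", "Peloponnese"}},
    "Feyenoord": {"text": ["football", "club", "rotterdam"],
                  "links": {"Eredivisie", "Rotterdam"}},
}
MENTIONS = {
    "Sparta": (["played", "matches", "football"], ["Sparta Rotterdam", "Sparta"]),
    "Feyenoord": (["football", "club"], ["Feyenoord"]),
}
print(disambiguate(MENTIONS, PAGES))
```

The two-pass structure keeps the sketch small; a joint optimization over all mentions, as the slide describes, would replace the fixed first-pass guesses.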
Identification of Taxonomic Relations [EMNLP2010]
(Honda, Toyota) are Siblings M1A2 is-a Tank is-a Vehicle AK-47 is-a Gun
The use of information networks to acquire Taxonomic Relations.
Given:
  An information network consisting of news articles and blogs
  Pairs of Concepts or Entities
Make use of:
  Wikipedia: Text, Structured Information, Network (hyperlink) Structure, and Ontological (Category) Structure
Goal:
  Developing large ontologies is essential to progressively enrich the information network and enable better access to it
  A huge amount of work has been done on developing stationary networks, which suffer from low coverage, noise, and brittleness
A Machine Learning & Optimization based approach:
  Exploits the fact that data in heterogeneous information networks is noisy, uncertain, and incomplete
  Considers multiple relations and makes use of a global constraint optimization process to leverage both Wikipedia and the web
  Significantly outperforms existing well-known taxonomical networks
Quang Do and Dan Roth, Constraints based Taxonomic Relation Classification, EMNLP10
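As a rough illustration of how a global constraint can override a locally preferred label, the sketch below jointly labels three concept pairs under one transitivity constraint. The label set, the constraint, and the scores are hypothetical; the EMNLP10 paper's actual constraints and classifier are not reproduced here.

```python
from itertools import product

LABELS = ("is-a", "sibling", "none")


def consistent(ab, bc, ac):
    """Assumed constraint: is-a is transitive (a is-a b, b is-a c => a is-a c)."""
    if ab == "is-a" and bc == "is-a":
        return ac == "is-a"
    return True


def joint_inference(score_ab, score_bc, score_ac):
    """Pick the highest-scoring label triple that satisfies the constraint.

    Each argument maps a label to a local classifier score for that pair.
    """
    best, best_val = None, float("-inf")
    for ab, bc, ac in product(LABELS, repeat=3):
        if not consistent(ab, bc, ac):
            continue
        val = score_ab[ab] + score_bc[bc] + score_ac[ac]
        if val > best_val:
            best, best_val = (ab, bc, ac), val
    return best


# Made-up local scores for (M1A2, Tank), (Tank, Vehicle), (M1A2, Vehicle).
m1a2_tank = {"is-a": 0.90, "sibling": 0.05, "none": 0.05}
tank_vehicle = {"is-a": 0.80, "sibling": 0.05, "none": 0.15}
m1a2_vehicle = {"is-a": 0.40, "sibling": 0.10, "none": 0.50}

print(joint_inference(m1a2_tank, tank_vehicle, m1a2_vehicle))
```

Taken alone, the third pair would be labeled "none"; the constraint pushes it to "is-a" because the other two pairs are confidently "is-a".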
Leverage Semantic Information Network to Enhance Entity Coreference Resolution / Entity Identification
Disambiguation
Name Variant Clustering
9.4% absolute improvement in micro-averaged accuracy

(CUNY) Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". ACL-HLT2011.
Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution
Previous methods only focused on the target node and one learning theory itself
Propose a new collaborative network ranking theory which imitates human collaborative learning
Leverage inter-connections among collaborative entities in information networks
Automatic profiling for each node
Construct a collaborative network for each entity based on graph-based clustering
Rank multiple decisions from collaborative entities (micro) and algorithms (macro) based on global prediction
7% absolute improvement in micro-averaged accuracy
On-going CUNY+UIUC work: using topic modeling for entity clustering
[Figure: collaborative network of query q with collaborators cq1 through cq7, edge weights 0.7, 0.4, 0.3, 0.6, and candidate objects oA and oB]
(CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Proc. EMNLP2011 [SUB]
Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery
[Figure: terrorist information network and 911 suspect terrorist network. Person nodes: Wail Al-Shehri, Waleed Al-Shehri, Abdul Aziz Al-Omari, Abdul Rahman Al-Omari, Mohamed Atta. Other nodes: Al-Qaeda, Khamis Mushait, Boston, Saudi Arabian Airlines. Sources: forum, twitter, news page, web blog. Edge labels: member, origin, residence, sibling, pilot]
Discovered 26 roles for persons, 16 roles for organizations, and 13 roles for locations
Markov Logic Networks for cross-slot and cross-query reasoning based on InfoNet and textual linkages to resolve conflicts and predict missing links
  Weight=15: x, y, z: Ambiguous(X, Y) TextualLinkage(Y, Z) Pilot(X) Pilot(Z) Remove X
  Weight=100: ∀x, y, z: Sibling(X, Y) ∧ Origin(Y, Z) ⇒ Origin(X, Z)
Maximum Entropy based learning-to-rank model to re-rank candidate answers
13%-22% absolute F-measure improvement
(CUNY) Chen et al. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. TAC2010 and Lecture Notes in Computer Science, 2010
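To make the missing-link prediction concrete, here is a small sketch of the sibling/origin rule, read as "if X and Y are siblings and Y has origin Z, then X has origin Z". It applies the rule as a hard forward-chaining step, which is a simplification: a Markov Logic Network would treat the rule as a weighted soft constraint. Treating Sibling as symmetric, and the placeholder names `person_a`, `person_b`, and `city_z`, are assumptions.

```python
def infer_origins(siblings, origins):
    """Fill in missing origins by forward chaining over sibling pairs.

    siblings: iterable of (x, y) pairs, treated as symmetric (assumption)
    origins:  dict mapping a person to a known place of origin
    Returns a new dict containing both known and inferred origins.
    """
    inferred = dict(origins)
    changed = True
    while changed:
        changed = False
        for x, y in siblings:
            # Apply the rule in both directions because Sibling is symmetric.
            for target, source in ((x, y), (y, x)):
                if source in inferred and target not in inferred:
                    inferred[target] = inferred[source]
                    changed = True
    return inferred


known = {"person_b": "city_z"}
pairs = [("person_a", "person_b")]
print(infer_origins(pairs, known))
```

Existing values are never overwritten, so conflicting evidence is left for a separate resolution step such as the weighted reasoning the slide describes.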
Subtask 2. Network-Enhanced Text Analysis

Enhanced text data analysis using relatively structured, heterogeneous information networks
Progressive Dynamic Information Network Analysis [EMNLP11, ACL-HLT2011 (sub)]
Integration of Heterogeneous Info. Network and Topic Modeling (Biased Propagation) [KDD11 (sub)]
Topic Modeling for Active Learning and Inference in Event Network Construction [ACL-HLT2011 (sub)]
Geographical Topic Discovery & Comparison [WWW11]
Latent Association Analysis of Document Pairs [KDD11 (sub)]
Progressive Dynamic Information Network Analysis

Motivations
  Most information obtained on text-rich InfoNet construction so far is viewed as static, ignoring the temporal dimension of many types of attributes
Approaches
  Temporal Role Representation: [T1 T2 T3 T4] = <start-start, end-start, start-end, end-end>
  New Evaluation Metric
Our Approach on Information Networks
  Local temporal role discovery using new kernel methods based on dependency paths
  Global inference and aggregation to resolve conflicts using Integer Linear Programming (ongoing collaboration with Dan Roth at UIUC)
Results
  State-of-the-art temporal role classification accuracy and lowest vagueness/over-constraining
[Figure: example network with nodes Ali Larijani, Islamic Republic of Iran Broadcasting, Iran Supreme National Security Council, Tehran University, Farideh Motahari, and Hassan Rowhani; chart series: Baseline, Aggregation over 2 tuples, Aggregation over 10 tuples]
(CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information Extraction. EMNLP2011, ACL-HLT2011 [SUB]
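The four-element temporal representation can be sketched as follows, assuming T1 and T2 bound the start of a role and T3 and T4 bound its end, with `None` for an unknown bound. Aggregating several tuples by intersecting their bounds is also an assumption made for illustration; the slide does not spell out the aggregation rule, and the dates below are invented.

```python
from datetime import date


def tighten(current, new, pick):
    """Combine two optional bounds, keeping the tighter one chosen by `pick`."""
    if current is None:
        return new
    if new is None:
        return current
    return pick(current, new)


def aggregate(tuples):
    """Merge 4-tuples (t1, t2, t3, t4) extracted from different documents.

    Assumed reading: t1 <= start <= t2 and t3 <= end <= t4.
    Lower bounds take the latest value, upper bounds the earliest.
    """
    t1 = t2 = t3 = t4 = None
    for a, b, c, d in tuples:
        t1 = tighten(t1, a, max)
        t2 = tighten(t2, b, min)
        t3 = tighten(t3, c, max)
        t4 = tighten(t4, d, min)
    return (t1, t2, t3, t4)


# Two made-up extractions about the same role.
doc1 = (date(2005, 1, 1), None, None, None)
doc2 = (None, date(2005, 12, 31), date(2007, 1, 1), None)
print(aggregate([doc1, doc2]))
```

Intersection never detects contradictory evidence (for example t1 later than t2); resolving such conflicts is where a global inference step would come in.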
Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks [KDD11 (sub)]
Problem and Motivation:
  Discover latent topics & identify clusters of multi-typed objects simultaneously
  Treat multi-typed objects differently (e.g., D with rich text & U without explicit text)
Solution and Contribution:
  Basic idea: biased topic propagation
  Propose a novel TMBP algorithm that directly incorporates a heterogeneous InfoNet, instead of a homogeneous InfoNet, into topic modeling (improves 20%-40% over PLSA)
[Figure: topic model combined with a heterogeneous InfoNet through biased propagation]
(UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, and Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks", KDD'11 (sub)
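One propagation step in the spirit of biased topic propagation can be sketched as below. This is not the TMBP algorithm from the KDD'11 submission: the averaging rule, the mixing parameter `xi`, and the toy data are assumptions. The sketch only shows the asymmetry between the two object types, where text-less objects inherit topics from linked documents and documents are then nudged toward their neighbors.

```python
def mean_vec(vectors):
    """Element-wise mean of equally sized topic vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def normalize(v):
    """Rescale a non-negative vector so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v] if s else list(v)


def propagate(doc_topics, links, xi=0.3):
    """One biased propagation step.

    doc_topics: {doc: [P(topic | doc), ...]} for objects with rich text
    links:      {obj: [docs]} for objects without explicit text
    xi:         hypothetical weight of the neighborhood signal
    Returns (updated doc_topics, topic vectors of text-less objects).
    """
    # Text-less objects take the average topic vector of their documents.
    obj_topics = {o: mean_vec([doc_topics[d] for d in docs])
                  for o, docs in links.items() if docs}

    # Documents keep most of their own estimate and move slightly toward neighbors.
    new_docs = {}
    for d, theta in doc_topics.items():
        neigh = [obj_topics[o] for o, docs in links.items() if d in docs]
        if not neigh:
            new_docs[d] = list(theta)
            continue
        avg = mean_vec(neigh)
        new_docs[d] = normalize([(1 - xi) * t + xi * a for t, a in zip(theta, avg)])
    return new_docs, obj_topics


docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0]}
links = {"u1": ["d1", "d2"]}
new_docs, obj_topics = propagate(docs, links)
print(new_docs, obj_topics)
```

In a full model this step would alternate with re-estimating the document topics from the text, which the sketch leaves out.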
Topic Modeling for Active Learning and Inference in Event Network Construction
Event types:
  Event Type: "Contact"; Trigger: talk, meet, etc.; Arguments: "Entity", "Instrument", "Place", "Time-Within"
  Event Type: "Business"; Trigger: form, dissolve; Arguments: "Org", "Place", "Time-Within", "Agent"
  Event Type: "Attack"; Trigger: blew, attack; Arguments: "Attacker", "Target", "Place", "Time-Within"
  Event Type: "Justice"; Trigger: arrest, jail; Arguments: "Defendant", "Time-Within", "Adjudicator", "Place"
  Event Type: "Transaction"; Trigger: borrow, launch; Arguments: "Giver", "Recipient", "Money", "Seller", "Artifact", "Buyer"
[Figure: documents Doc 1 through Doc N grouped into topics. Example topic words: north, talks, Korea, south, Putin, Pyongyang, officials, Washington; nuclear, program, China, weapons, United States; troops, Saddam, Iraqi, fighting, army, regime, British, forces, military, city, Kurdish, control, Baghdad; court, York, dollars, case, million, AFP, government, media, convicted, billion, company]
Topic modeling can enhance information network construction by grouping similar event types together and converging information distributions
Using topic modeling, with only 1/4 of the training data we can achieve performance comparable to passive learning
Cross-document inference within topic clusters provided a 10% improvement over state-of-the-art event extraction and significant gains over IR-based clustering
Ongoing work: apply new entity-driven and biased-propagation-based topic modeling methods
(CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for Event Extraction. ACL-HLT2011 [sub]
Geographical Topic Discovery & Comparison
Motivation: Analyze GPS-associated documents, e.g., geo-tagged photos and tweets sent from iPhones
Problem: Given a collection of GPS-associated documents and the number of topics K, discover K geographical topics along with the topic distribution in different geographical locations
Latent Geographical Topic Analysis:
  Combine text and GPS location info
  Words that are close to each other in space are more likely to be in the same region
  Words that are in the same region are more likely to be in the same topic
  Regions are not known beforehand; our framework adapts the region discovery process to the dataset
Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, Geographical Topic Discovery and Comparison, WWW'11, Mar. 2011
Latent Association Analysis of Document Pairs

[Figure: document pairs linked by a correlation factor between the topic simplex for Corpus 1 and the topic simplex for Corpus 2]
Latent Association Analysis (LAA) mines the topics of two document sets simultaneously, taking the bipartite network between the two document sets into consideration
One of the first attempts to analyze the topic structures of two connected document sets, aiming to infer their mapping network model
LAA significantly outperforms existing algorithms with 70% accuracy improvement
Gengxin Miao, et al., Latent Association Analysis of Document Pairs, KDD11 (sub)
Subtask 3: Multi-Facet Search and Mining
Information Network-Based Trustworthiness Analysis [COLING10, Army Sci10 (Best Paper Award), WWW11]
Progressive Network Analysis for Expert Search (Diffusion through Co-occurrence Relationships for Expert Search on the Web) [SIGIR11 sub]
Modeling and Exploiting Heterogeneous Sources for Expertise Ranking [SIGIR11 sub]
Personalized Recommendation on Information Networks [SIGIR11 sub]
Multi-facet Search in Self-Boosting Information Networks (Demo: Terrorism Network Search and Browsing) [SIGIR11 sub]
Information Network-Based Trustworthiness Analysis
Given:
  Multiple information networks: websites, blogs, forums, sensor networks
  Some claims, e.g., [Person A travelled to France], [There is a fire in downtown Chicago]
  Prior beliefs and background knowledge
Our goal is to:
  Score trustworthiness of claims based on:
    Support across multiple (trusted) sources in the network
    Source characteristics: reputation, interest group, verifiability of information, etc.
    Prior beliefs and background knowledge
  Rate databases/sources as more/less trustworthy
  Track how the trustworthiness of a fact or database varies as the text corpus grows over time
New framework for incorporating prior knowledge into any fact-finding algorithm:
  Done via a Linear Programming approach
  Highly expressive declarative constraints
  Tractable (polynomial time)
Prior knowledge improves results, and is absolutely essential when the user's judgment varies from the norm
Dan Roth et al, COLING10, Army Sci10 (Best Paper Award), WWW11
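For readers unfamiliar with fact-finding algorithms, the sketch below shows a very simple one in the hubs-and-authorities style: a claim's belief is the sum of the trust of the sources asserting it, and a source's trust is the sum of the belief in its claims. It is a baseline illustration only; it does not include the prior-knowledge constraints or the Linear Programming step described on the slide, and the source and claim names are made up.

```python
def sums_fact_finder(assertions, iterations=20):
    """Iteratively score sources and claims.

    assertions: {source: set of claims asserted by that source}
    Returns (trust, belief), each normalized so the top score is 1.0.
    """
    claims = {c for cs in assertions.values() for c in cs}
    trust = {s: 1.0 for s in assertions}
    belief = {c: 1.0 for c in claims}
    for _ in range(iterations):
        # Belief in a claim: total trust of the sources asserting it.
        belief = {c: sum(trust[s] for s, cs in assertions.items() if c in cs)
                  for c in claims}
        top = max(belief.values(), default=0.0) or 1.0
        belief = {c: b / top for c, b in belief.items()}

        # Trust in a source: total belief in the claims it asserts.
        trust = {s: sum(belief[c] for c in cs) for s, cs in assertions.items()}
        top = max(trust.values(), default=0.0) or 1.0
        trust = {s: t / top for s, t in trust.items()}
    return trust, belief


reports = {
    "site_1": {"fire_downtown"},
    "site_2": {"fire_downtown"},
    "site_3": {"no_fire_downtown"},
}
trust, belief = sums_fact_finder(reports)
print(belief)
```

With two sources against one, the majority claim ends up with the top belief score; prior knowledge would matter precisely in cases where such majority voting is misleading.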
Progressive Network Analysis for Expert Search
Goal: find and rank people who have the expertise described by a user query
Web pages are noisier and contain more spam than an enterprise corpus
Both relevance and reputation should be considered
Use a heterogeneous hypergraph to model the co-occurrence relationships among people and words, and devise a heat diffusion model on the hypergraph
Applied to 0.5B web pages
Accuracy: 50%-200% improvement over the leading language model methods
Significantly overcomes noise in the Web
Ziyu Guan, et al., Diffusion through Co-occurrence Relationships for Expert Search on the Web, SIGIR11 (sub)
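Heat diffusion for ranking can be illustrated on an ordinary weighted graph, which is a deliberate simplification of the heterogeneous hypergraph used in the work above. Heat starts at the query node and flows along co-occurrence edges; people who end up with more heat rank higher. The graph, the `rate`, and the step count are illustrative assumptions.

```python
def diffuse(graph, heat, rate=0.5, steps=3):
    """Run discrete heat diffusion on a symmetric weighted graph.

    graph: {node: {neighbor: weight}}; every neighbor must also be a key,
           and weights must be symmetric so that total heat is conserved
    heat:  {node: initial heat}; missing nodes start at 0
    rate:  step size; rate * (sum of a node's weights) should stay <= 1
    """
    h = {n: heat.get(n, 0.0) for n in graph}
    for _ in range(steps):
        nxt = {}
        for n, nbrs in graph.items():
            # Net inflow is proportional to the heat difference on each edge.
            flow = sum(w * (h[m] - h[n]) for m, w in nbrs.items())
            nxt[n] = h[n] + rate * flow
        h = nxt
    return h


# Toy co-occurrence graph: a query term linked to two people.
graph = {
    "query": {"alice": 0.6, "bob": 0.2},
    "alice": {"query": 0.6},
    "bob": {"query": 0.2},
}
scores = diffuse(graph, {"query": 1.0})
ranking = sorted(("alice", "bob"), key=scores.get, reverse=True)
print(ranking, scores)
```

Stopping after a few steps matters: run to convergence, heat on a connected graph becomes uniform and the ranking signal disappears.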
Modeling and Exploiting Heterogeneous Sources for Expertise Ranking

Problem: How to leverage both heterogeneous network and documents to identify the relevant experts for a given query?
Baseline: The expertise of a person can be characterized based on his/her associated documents (doc-based method)
Intuitions:
  Citation graph: Similar documents are likely to have similar relevance to a given query
  Coauthor graph: Two authors are most likely to share similar expertise if they coauthor many papers
  Document-author bipartite graph: relevance is mutually reinforced between documents (x) and authors (y)
Solution: We formulate a joint regularization framework that incorporates several hypotheses to capture the information of different graphs together with textual documents
Result: Using DBLP with 2M nodes and 10M edges, significant improvements over the baseline
[Figure: citation graph and coauthor graph; top-10 experts for the query "Information retrieval"]
Hongbo Deng, et al., Modeling and Exploiting Heterogeneous Sources for Expertise Ranking, SIGIR11 (sub)
Multi-Facet Search in Self-Boosting Information Networks (Example: Terrorism Network Search and Browsing)
Demo: http://blender2.cs.qc.cuny.edu/BlenderGraph/ Video: http://nlp.cs.qc.cuny.edu/terrorism.m4v
Helps a military analyst with expert finding and with terrorist information search, gathering, control, and analysis for any given query
Entity-topic analyzer for self-expansion and self-boosting: terrorism organization members, status of members (die, arrest, ...), and information networks associated with each member
(CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting Information Networks. SIGIR2011 [SUB]
Military Relevance
Subtask 1: Text-Rich Network Modeling and Construction
  Object search enhanced by entity disambiguation and role discovery can provide methods for finding groups of soldiers and identifying terrorists with certain expertise
Subtask 2: Network-Enhanced Text Analysis
  Asymmetric wars and counter-terrorism need to understand text-rich networks
  Text mining for monitoring potential threats and detecting terrorism with entity-topic modeling and event detection and tracking
Subtask 3: Multi-Facet Search and Mining in Text-Rich Networks
  Most military applications need multi-facet search over text and unstructured data, including emails, reports, telecommunication messages, and military-related news and blogs
  Our multi-facet, multi-dimensional information network search and browsing tool has rich functions and provides intelligent network expansions
I3.2's Collaboration Network

I1.1 Roth, Huang
I1.2 Tarek, Charu
T1.5 Lin, Wen

S1.1 Lin
I3.1 Han, Yan
I3.2: Han, Ji, Roth, Yan (weekly telecons, frequent emails, 5 joint papers)
Logic Reasoning for Information Validation
T1.4 Parsons
ARL Cole Winkler
Military Data for Topic Analysis
T1.1 Adali
IRC Leung Data & Experiments
E2.3 Han
Next Six Months and Path Ahead to 2012
Continue research on mining text-intensive information networks
  Research on three frontiers: (1) integrated classification and clustering in network mining, (2) building up a theory of link/relationship analysis in heterogeneous networks, and (3) exploring military applications
Collaborations with researchers in other networks
  Work with Nitesh Chawla, who has done much work on link prediction, on evaluation of mining methods for clustering and classification of heterogeneous networks
  Work with SCNARC (Boleslaw Szymanski et al.) on using the methods developed here to mine social and cognitive networks
Next year's research planned, if funded
  Effective theory and methods for mining heterogeneous networks involving social and communication networks
  Network classification and clustering modeling in heterogeneous information, social, and communication networks
  Application of role discovery, network classification, and anomaly detection methods in military applications
Research Papers (Accepted/Published, 2011)

1. Tim Weninger, Marina Danilevsky, Fabio Fumarola, Joshua Hailpern, Jiawei Han, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", Proc. of 2011 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'11), (system demo paper), Athens, Greece, June 2011.
2. Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, "Geographical Topic Discovery and Comparison", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Full paper).
3. Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, "Growing Parallel Paths for Entity-Page Discovery", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Poster paper).
4. Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". Accepted by Proc. the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011), 2011.
5. Heng Ji, Adam Lee and Wen-Pin Lin. "Information Network Construction and Alignment from Automatically Acquired Comparable Corpora". Invited book chapter for Building and Using Comparable Corpora. Springer, 2011.
6. Heng Ji, Benoit Favre, Wen-Pin Lin, Dan Gillick, Dilek Hakkani-Tur and Ralph Grishman. 2011. "Open-domain Multi-document Summarization via Information Extraction: Challenges and Prospects". Invited book chapter for Multi-source, Multilingual Information Extraction and Summarisation. Springer.
7. Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, "Local and Global Algorithms for Disambiguation to Wikipedia", ACL11.
8. Y. Chan and D. Roth, "Exploiting Syntactico-Semantic Structures for Relation Extraction", ACL11.
Research Papers (Published, Sept.-Dec. 2010)

1. Manish Gupta, Rui Li, Zhijun Yin, and Jiawei Han, "Survey on Social Tagging Techniques", SIGKDD Explorations, 12(1):58-72, 2010.
2. Lu Liu, Jie Tang, Jiawei Han, Meng Jiang, Shiqiang Yang, "Mining Topic-Level Influence in Heterogeneous Networks", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010.
3. Tim Weninger, Fabio Fumarola, Jiawei Han, Donato Malerba, "Mapping Web Pages to Database Records via Link Paths", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010.
4. Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han, "The Wisdom of Social Multimedia: Using Flickr for Prediction and Forecast", Proc. 2010 ACM Multimedia Int. Conf. (ACM-Multimedia10), Florence, Italy, Oct. 2010.
5. Zheng Chen, Suzanne Tamang, Adam Lee, Xiang Li, Wen-Pin Lin, Javier Artiles, Matthew Snover, Marissa Passantino and Heng Ji. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. Text Analytics Conference (TAC2010), 2010.
6. Hao Li, Xiang Li, Heng Ji and Yuval Marton. 2010. "Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation". Proc. the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 2010).
7. J. Pasternack and Dan Roth, "Comprehensive Trust Metrics for Information Networks", Army Science Conf.10 (Best Paper Award), Dec. 2010.
8. Q. Do and D. Roth, "Constraints based Taxonomic Relation Classification", EMNLP10, Oct. 2010.
Research Papers (Submitted, 2011)

1. (UIUC + U. Michigan) Cindy Xide Lin, Qiaozhu Mei, Yunliang Jiang, and Jiawei Han, "Inferring the Diffusion and Evolution of Topics in Social Communities", KDD'11 (sub)
2. (UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks", KDD'11 (sub)
3. (UIUC) Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, Thomas Huang, "LPTA: A Probabilistic Model for Latent Periodic Topic Analysis", KDD'11 (sub)
4. (CUNY + UIUC) Heng Ji and Jiawei Han. 2011. Web-Scale Knowledge Discovery and Information Extraction. Invited Paper for IEEE Special Issue on Web-Scale Multimedia Processing and Applications. In Preparation.
5. (CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for Event Extraction. Submitted to the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011)
6. (CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting Information Networks. Submitted to the 34th Annual International ACM SIGIR Conference (SIGIR2011)
7. (CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information Extraction. Submitted to Empirical Methods in Natural Language Processing (EMNLP2011)
8. (CUNY) Javier Artiles, Enrique Amigo, Qi Li and Heng Ji. 2011. Evaluating Temporal Information Extraction. Submitted to ACL-HLT2011
9. (CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
10. (CUNY) Qi Li, Javier Artiles and Heng Ji. 2011. Dependency Paths Kernel for Temporal Relation Classification. Submitted to 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011).
11. (CUNY) Suzanne Tamang and Heng Ji. 2011. Learning-to-Rank for Slot Filling System Combination and Assessment. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
12. (CUNY) Zheng Chen, Suzanne Tamang, Adam Lee and Heng Ji. 2011. A Toolkit for Knowledge Base Population. Submitted to the 34th Annual International ACM SIGIR Conference (SIGIR2011)
13. (CUNY) Xiang Li and Heng Ji. 2011. Comment-guided Reinforcement Learning for Slot Filling. Submitted to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
Other Technical Contributions
(Book: UIC + UIUC + CMU) Philip S. Yu, Jiawei Han, and Christos Faloutsos (Editors), LINK MINING: MODELS, ALGORITHMS AND APPLICATIONS, Springer, 2010.
(UIUC) Jiawei Han received the Daniel C. Drucker Eminent Faculty Award at UIUC.
(UCSB) Ms. Gengxin Miao, who was supported by the INARC program, received an IBM Ph.D. Fellowship for 2011-2012. Gengxin Miao is co-supervised by Xifeng Yan at INARC.
(CUNY) Heng Ji. CUNY Chancellor's "Salute to Scholar" Award, November 2010.
(CUNY) Heng Ji. National Science Foundation Research Experiences for Undergraduates, March 2011.
Jiawei Han, Towards Integrated Mining of Multiple Social and Information Networks (keynote speech), The 2011 Int. Conf. on Advances in Social Network Analysis and Mining (ASONAM11), July 2011.
Jiawei Han, Exploring the Power of Heterogeneous Information Networks in Data Mining (keynote speech), The 2011 Int. SIAM Data Mining Conf. (SDM11), April 2011.
Jiawei Han, Construction and Analysis of Web-Based Computer Science Information Networks (keynote speech), The 2011 Int. Conf. on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, June 2011.
Latifur Khan, Wei Fan, Jiawei Han, Jing Gao, Mohammad Mehedy Masud, Data Stream Mining: Challenges and Techniques (tutorial), The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2011), May 2011.
Jiawei Han, Web Structure Mining and Information Network Analysis: An Integrated Approach, invited speech at the Third International Workshop on Network Theory: Web Science Meets Network Science, March 2011.
Heng Ji, Web-Scale Knowledge Discovery and Population from Unstructured Data, Keynote Speech, ACLCLP 2010 Information Retrieval Conference, December 2010.
Heng Ji, Overview of the TAC2010 Knowledge Base Population Track, Keynote Speech at Web People Search (WePS-3) Conference, September 2010.
Personalized Recommendation on Information Networks

[Figure: concept extraction from text to concepts]
Combine text & links in heterogeneous networks
Find good conceptual associations of user interests; distinguish clean sources and noisy sources
(UIUC) Chi Wang, et al., Learning Relevance in a Heterogeneous Social Network and Its Application in Online Targeting, SIGIR11 (sub)
