
Journal of Computer Applications (JCA) ISSN: 0974-1925, Volume V, Issue 3, 2012

Second or Higher Order Associations Between a Name and Candidate Name Aliases
A. Muthusamy a,*, Dr. A. Subramani b,1

Abstract - People are often referred to on the web by several name aliases. Accurate identification of the aliases of a given personal name is useful in various web-related tasks such as information retrieval, opinion mining, personal name disambiguation, and relation extraction. The lexical pattern technique extracts the aliases of a given personal name from the web. First, it extracts a set of candidate aliases. Second, it ranks the extracted candidates according to the likelihood of each candidate being a correct alias of the given name. The large set of candidate aliases is extracted automatically from snippets retrieved from a web search engine. Numerous ranking scores are defined to evaluate the candidate aliases using three approaches: lexical pattern frequency, word co-occurrences, and page counts on the web. It has been shown experimentally that knowledge of aliases helps to identify a particular person among his or her namesakes on the web. Moreover, the aliases extracted in this way have been successfully utilized in an information retrieval task. However, the existing namesake disambiguation algorithm assumes that the real name of a person is given and does not attempt to disambiguate people who are referred to only by aliases. To overcome these demerits, we propose creating a word co-occurrence graph using the definition of anchor text-based co-occurrences and running graph mining algorithms to identify second or higher order associations between a name and candidate aliases.
Index Terms - Web mining, information extraction, web text analysis.

I. INTRODUCTION
Today, searching for information on the web is a daily activity for many people. Information retrieval is a field concerned with the analysis, structure, organization, storage, searching, and retrieval of information. Retrieving information about people from web search engines can become a difficult task when a person has name aliases. Many user queries ask for documents about personal names. Many research scholars from various fields are referred to by their unique names on the web. A substantial share of the queries [1] [7] submitted to web search engines include person names. The search engine may return relevant documents that meet the information need of the user's query. However, many web pages about a person may refer to that person only by an alias. A user who searches only with the personal name will therefore not retrieve all information about that person. To retrieve complete information about a person, the user must know that person's aliases on the web. Various types of words are used as aliases on the web, so identifying aliases helps information retrieval. The aliases are extracted using the previously proposed alias extraction method [1]. The search engine expands a query on a person name by grouping the extracted aliases, so that it retrieves relevant web pages that refer to the person by the original name as well as by aliases, thereby improving recall and Mean Reciprocal Rank (MRR).

II. LITERATURE SURVEY
S. Sekine and J. Artiles [2] describe the task definition, annotation scheme, data statistics, and evaluation scheme of the Web People Search attribute extraction task. D. Bollegala, Y. Matsuo, and M. Ishizuka [3] proposed a technique that exploits page counts and text snippets returned from a web search engine. C. Galvez and F. Moya-Anegon [4] used approximate string matching to extract variants or abbreviations of personal names. Y. Matsuo, J. Mori, M. Hamasaki, K. Ishida, T. Nishimura, H. Takeda, K. Hasida, and M. Ishizuka [5] developed Polyphonet, a social network extraction system that makes extraction scalable and utilizes person-to-word relations. T. Hokama and H. Kitagawa [6] proposed a method for extracting the mnemonic names of people from the web and reported experimental results using real web data. J. Artiles, J. Gonzalo, and F. Verdejo [7] proposed a testbed for evaluating people searching strategies on the World Wide Web.
In personal name disambiguation [8], the goal is to disambiguate the various people that share the same name; that is, the objective is to identify the different entities that are referred to by the same ambiguous name. In alias extraction, by contrast, the interest is in extracting all references to a single entity from the web. P. Mika [9] proposed a unified model of social networks and semantics and showed how community-based semantics emerges from this model through a process of graph transformation.
Manuscript received 20/July/2012. Manuscript selected 14/Aug/2012.
A. Muthusamy, Assistant Professor, Department of MCA, K.S.R. College of Engineering, Tamilnadu, India. E-mail: muthusamy.arumugam@gmail.com
Dr. A. Subramani, Professor & Head, Department of MCA, K.S.R. College of Engineering, Tamilnadu, India. E-mail: subramani.appavu@gmail.com


III. MODEL

Previous Model

Figure 1. Existing Model

Figure 1 shows the previous model, Automatic Discovery of Personal Name Aliases from the Web [1]. Given a personal name, the method first extracts a set of candidate aliases. It uses automatically extracted lexical patterns to efficiently extract a large set of candidate aliases from snippets retrieved from a web search engine. Next, it ranks the extracted candidates according to the likelihood of each candidate being a correct alias of the given name. Numerous ranking scores are defined to evaluate candidate aliases using three approaches:
a. Lexical pattern frequency
b. Word co-occurrences
c. Page counts on the web

Lexical Pattern Frequency
The method extracts numerous lexical patterns that are used to describe aliases of a personal name. The pattern extraction algorithm can extract a large number of lexical patterns. The approach is used in information retrieval.

Word Co-occurrences
Word co-occurrence is used in various tasks such as synonym extraction, query translation in cross-language information retrieval, and ranking and classification of web pages. Anchor texts pointing to a URL provide useful semantic clues related to the resource represented by the URL, and co-occurrences in anchor texts are used to measure the association between a name and its aliases on the web.

Page Counts on the Web
Page counts retrieved from a web search engine for the conjunctive query "p AND x", for a name p and a candidate alias x, can be regarded as an approximation of their co-occurrences on the web.
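
As an illustration of how page counts can be turned into a ranking score, the following minimal sketch computes a Dice coefficient from three page counts and sorts candidate aliases by it. The choice of the Dice coefficient is an assumption made here for illustration (the text above does not name a specific measure), the function names are hypothetical, and the page counts are invented numbers rather than real search engine results.

```python
from typing import Dict, List, Tuple


def web_dice(count_p: int, count_x: int, count_both: int) -> float:
    """Dice coefficient from page counts (an assumed measure, for illustration).

    count_p    -- page count for the name p alone
    count_x    -- page count for the candidate alias x alone
    count_both -- page count for the conjunctive query "p AND x"
    """
    total = count_p + count_x
    if total == 0:
        return 0.0  # no pages at all: treat the association as zero
    return 2.0 * count_both / total


def rank_candidates(
    count_p: int, candidates: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float]]:
    """Score each candidate alias and sort by descending score.

    candidates maps an alias to (count_x, count_both).
    """
    scored = [
        (alias, web_dice(count_p, count_x, count_both))
        for alias, (count_x, count_both) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical page counts, invented for illustration only.
NAME_COUNT = 1000
CANDIDATES = {
    "alias_a": (500, 300),
    "alias_b": (2000, 300),
    "alias_c": (100, 0),
}

ranking = rank_candidates(NAME_COUNT, CANDIDATES)
for alias, score in ranking:
    print(f"{alias}: {score:.2f}")
```

In this sketch a candidate that shares many pages with the name relative to its own frequency ranks higher than a frequent word that shares the same number of pages, which is the intuition behind using page counts as a co-occurrence approximation.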

Proposed Model

Figure 2. Proposed Model

The new model shown in Figure 2 creates a word co-occurrence graph using the definition of anchor text-based co-occurrences and runs graph mining algorithms to identify second or higher order associations between a name and candidate aliases.

Need of Graph Mining Algorithm
Graph mining is a special case of structured data mining. Structured data mining is the process of finding and extracting useful information from semi-structured data sets. Semi-structured data contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as schemaless or self-describing structure.

IV. CONCLUSION
The anchor text-based co-occurrences among the given personal name and its aliases are used to create a word co-occurrence graph, in which nodes representing the name and the aliases are connected based on their second or higher order associations with each other. The graph mining algorithm finds the hop distances between nodes, which are used to identify the association orders among the name and its aliases. The web search engine can then expand a query on a personal name by grouping aliases in the order of their associations with the name to retrieve all relevant results, which is expected to improve recall and achieve a substantial MRR compared to previously proposed methods.
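
The hop-distance idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that two anchor texts co-occur when they point to the same URL, uses breadth-first search as the graph algorithm, and relies on hypothetical function names and invented (anchor text, URL) pairs.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_cooccurrence_graph(
    links: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Build an undirected graph from (anchor_text, url) pairs.

    Assumption: two anchor texts co-occur if they point to the same URL.
    """
    by_url: Dict[str, Set[str]] = {}
    for anchor, url in links:
        by_url.setdefault(url, set()).add(anchor)

    graph: Dict[str, Set[str]] = {}
    for anchors in by_url.values():
        for anchor in anchors:
            graph.setdefault(anchor, set())  # keep isolated nodes too
        for a, b in combinations(sorted(anchors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hop_distances(graph: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: number of hops from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def order_aliases(
    graph: Dict[str, Set[str]], name: str, candidates: Iterable[str]
) -> List[Tuple[str, int]]:
    """Return reachable candidates as (alias, association order), nearest first."""
    distances = hop_distances(graph, name)
    reachable = [
        (c, distances[c]) for c in candidates if c in distances and c != name
    ]
    return sorted(reachable, key=lambda pair: (pair[1], pair[0]))


# Hypothetical anchor-text data, invented for illustration only.
LINKS = [
    ("name", "url1"), ("alias_a", "url1"),
    ("alias_a", "url2"), ("alias_b", "url2"),
    ("alias_b", "url3"), ("alias_c", "url3"),
    ("alias_d", "url9"),
]

graph = build_cooccurrence_graph(LINKS)
ordered = order_aliases(graph, "name", ["alias_c", "alias_b", "alias_a", "alias_d"])
print(ordered)
```

Here a hop distance of 1 corresponds to a first order association, 2 to a second order association, and so on; a candidate that is not reachable from the name (alias_d in the invented data) is left out of the expanded query.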


REFERENCES
[1] D. Bollegala, Y. Matsuo, and M. Ishizuka, Automatic Discovery of Personal Name Aliases from the Web, IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, June 2011.
[2] S. Sekine and J. Artiles, WePS 2 Evaluation Campaign: Overview of the Web People Search Attribute Extraction Task, Proc. Second Web People Search Evaluation Workshop (WePS '09) at 18th Int'l World Wide Web Conf., 2009.
[3] D. Bollegala, Y. Matsuo, and M. Ishizuka, Measuring Semantic Similarity between Words Using Web Search Engines, Proc. Int'l World Wide Web Conf. (WWW '07), pp. 757-766, 2007.
[4] C. Galvez and F. Moya-Anegon, Approximate Personal Name-Matching through Finite-State Graphs, J. Am. Soc. for Information Science and Technology, vol. 58, pp. 1-17, 2007.
[5] Y. Matsuo, J. Mori, M. Hamasaki, K. Ishida, T. Nishimura, H. Takeda, K. Hasida, and M. Ishizuka, Polyphonet: An Advanced Social Network Extraction System, Proc. WWW '06, 2006.
[6] T. Hokama and H. Kitagawa, Extracting Mnemonic Names of People from the Web, Proc. Ninth Int'l Conf. Asian Digital Libraries (ICADL '06), pp. 121-130, 2006.
[7] J. Artiles, J. Gonzalo, and F. Verdejo, A Testbed for People Searching Strategies in the WWW, Proc. SIGIR '05, pp. 569-570, 2005.
[8] R. Bekkerman and A. McCallum, Disambiguating Web Appearances of People in a Social Network, Proc. Int'l World Wide Web Conf. (WWW '05), pp. 463-470, 2005.
[9] P. Mika, Ontologies Are Us: A Unified Model of Social Networks and Semantics, Proc. Int'l Semantic Web Conf. (ISWC '05), 2005.

BIOGRAPHY
Mr. A. Muthusamy is currently working as an Assistant Professor, Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode, and as a Research Scholar at Bharathiar University, Coimbatore. He received his MCA degree from Anna University, Chennai. He has published one technical paper in a national conference and one in an international journal. His areas of research include Mobile Communications, Data Mining, and Web Mining.

Dr. A. Subramani is currently working as Professor and Head, Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode, and as a Research Guide in various universities. He received his Ph.D. degree in Computer Applications from Anna University, Chennai. He is a reviewer for 10 national/international journals and is on the editorial board of 6 international/national journals. He is an Associate Editor of the Journal of Computer Applications. He has published more than 30 technical papers in various international and national journals and conference proceedings. His areas of research include High Speed Networks, Routing Algorithms, Soft Computing, Wireless Communications, and Mobile Ad-hoc Networks.
