Volume 2, Issue 12, December - 2015. ISSN 2348 4853, Impact Factor 1.317
I. INTRODUCTION
The growing volume of unstructured data on the web is making it difficult and challenging for web users to extract useful information. Therefore, a good information retrieval system is essential for the user to obtain effective and efficient results. The most significant part of accessing useful information through information retrieval (IR) and natural language processing (NLP) is ranking. IR and NLP perform several tasks that address the central problem of ranking, including text retrieval, entity search, meta-search, personalized search, text summarization, and question answering. In this paper we focus on text retrieval since, according to the report by iProspect, 56% of web users use web search every day and 88% use it every week [2]. Therefore, ranking plays a key role in determining the relevance of searched text.
This paper is organized as follows. In the next section, we review related work on learning-to-rank methods for information retrieval. In section 3, we discuss the learning (data fusion) techniques. In sections 4 and 5, we explain the methods implemented on the architecture based on the learning-to-rank methodology. In sections 6 and 7, the simulation as well as the applications of the learning methodology are discussed. In the final section we conclude the work presented in this paper and outline future research directions.
www.ijafrc.org
Fig. 1. Retrieval system: a query Q_n is issued against a database of N documents D_n = {D_1, D_2, ..., D_N}; the retrieval system returns the documents D_{q,1}, D_{q,2}, ..., D_{q,n_q} ranked by relevance.
A. Borda Count
In 1770 Jean-Charles de Borda proposed a voting-based data fusion method, the Borda Count [11]. It is an unsupervised method for rank aggregation that can be applied in a meta-search engine. Borda Count ranks the documents based on their positions in the basic rankings: a document that is ranked high in the basic rankings also receives a high rank in the final ranking list. The scores of the documents in the final ranking list are calculated as
S_RD = F(S_1, S_2, ..., S_M) = Σ_{i=1}^{M} S_i   (1)

S_i = (S_{i,1}, S_{i,2}, ..., S_{i,n})^T   (2)
where S_{i,j} is the number of documents ranked behind document j in basic ranking i, π_i(j) denotes the rank of document j in basic ranking i (so S_{i,j} = n − π_i(j)), and n denotes the number of documents.
For example, documents a, b, and c can be ordered by three basic rankings:

Ranking 1: a, b, c
Ranking 2: a, c, b
Ranking 3: b, a, c

Writing the score vectors in the document order (a, b, c):

S_RD = (2, 1, 0)^T + (2, 0, 1)^T + (1, 2, 0)^T = (5, 3, 1)^T   (3)
The final ranking list created by Borda Count is based on these scores:

S_RD: a (5), b (3), c (1)   (4)
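As a minimal sketch (not the authors' implementation), the Borda Count fusion of Eqs. (1)-(4) can be written as follows; a document's score from each basic ranking is the number of documents ranked behind it, and the final list sorts by the summed score:

```python
from collections import defaultdict

def borda_count(rankings):
    """Fuse basic rankings with Borda Count.

    rankings: list of ranked lists (best document first).
    From ranking i, document j receives S_ij = n - rank(j), the number
    of documents ranked behind it; final score is the sum over rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):  # position 0 = best
            scores[doc] += n - 1 - position       # documents behind doc
    # Final list: highest total score first
    return sorted(scores, key=scores.get, reverse=True), dict(scores)

# The paper's example: three basic rankings of documents a, b, c
order, scores = borda_count([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
# scores: a = 5, b = 3, c = 1, so the final ranking is a, b, c
```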
B. Reciprocal Rank
In this approach, the documents retrieved by the individual systems are merged into a unified list and re-ranked by their rank positions; the retrieval system is accordingly called a rank-position or reciprocal-rank system. The rank score of document l is computed from its position in each of the systems (m = 1, ..., M):

SR(D_l) = Σ_{m=1}^{M} 1 / position(d_{l,m})   (5)
First, the method calculates the rank-position score of every document, and then it combines the documents using these rank-position scores, sorted in ascending or descending order. The top documents are known as Pseudorels. In our experiment, we used two data fusion techniques for determining the Pseudorels: Rank Position and Borda Count.

Data fusion methods accept and merge two or more ranked lists, with the aim of providing better performance than the individual systems being fused [10].
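A small sketch of the rank-position fusion of Eq. (5), under the assumption that a document missing from a system's list simply contributes nothing to the sum:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings):
    """Rank-position (reciprocal-rank) fusion:
    SR(D_l) = sum over systems m of 1 / position(d_lm),
    with 1-based positions; absent documents contribute 0.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / position
    # Sort descending: the top documents are the Pseudorels
    return sorted(scores, key=scores.get, reverse=True), dict(scores)

# Two hypothetical systems returning the same three documents
fused, rr_scores = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
# a: 1/1 + 1/3 ≈ 1.333; b: 1/2 + 1/1 = 1.5; c: 1/3 + 1/2 ≈ 0.833
```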
V. ARCHITECTURE OF LEARNING TO RANK FOR IR SYSTEM
Learning to rank here uses a meta-search engine system, which sends the user's query to several search engines and aggregates the results obtained from them.
Fig. 2. Standard Architecture of MSEs: the user's query passes through the Meta-Search Query Interface to the Query Dispatcher, which forwards it to the individual Search Engines; the Information Extractor and the Record/Result Collector gather the returned records, and the Result Merger and Result Ranker (backed by a DBMS) combine and order them before the results are presented to the User.
User Query Interface: The user sends queries to the search engine, with options for four types of search and for the search engines to be used.
Query Dispatcher: It generates or fetches an actual query for each search engine according to the user's query.
Information Extraction and Result Merger: Information Extraction (IE) is a very important component for extracting the results; it operates on the data records gathered by the collector component. The result merger merges the documents retrieved from the search engines selected by the user and combines them into a single ranked list. These documents are arranged in descending order of their global similarity, and the top documents, having the highest global similarity in the ranked list, are returned to the user interface.
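One simple way to sketch such a result merger (the similarity values and the max-combination rule are illustrative assumptions, not the paper's exact formula):

```python
def merge_results(engine_results):
    """Merge per-engine lists of (doc, similarity) pairs into one ranked
    list ordered by global similarity, highest first. A document seen by
    several engines keeps its maximum similarity (one simple choice)."""
    global_sim = {}
    for results in engine_results:
        for doc, sim in results:
            global_sim[doc] = max(sim, global_sim.get(doc, 0.0))
    return sorted(global_sim.items(), key=lambda item: item[1], reverse=True)

# Hypothetical similarities from two engines
merged = merge_results([
    [("d1", 0.9), ("d2", 0.4)],
    [("d2", 0.7), ("d3", 0.5)],
])
# merged: [("d1", 0.9), ("d2", 0.7), ("d3", 0.5)]
```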
Display: It generates the results page from the replies received, using ranking, parsing, and clustering to obtain the search results.
Personalization/Knowledge: It maintains, for each user, weights for the search results, the query, and the search engines.
Output: A final ranked list of web pages produced using the Rank Position and Borda Count techniques.
Methods:
Step 1. Take a query and obtain results from different databases or search engines, such as Yahoo, Google, and MSN.
Step 2. Take the top k documents from each search engine.
Step 3. Find the union of the top k documents obtained from the individual engines and remove duplicates, so that unique web pages are obtained across the different search engines.
For k = number of documents = 20; for the number of queries, see the query list. The top k results from Yahoo, Google, and MSN are shown in Table 2.
Step 4. Using the data fusion techniques, compare the performance of the two techniques in automatic ranking.
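Steps 2 and 3 above can be sketched as follows (the engine names and document identifiers are illustrative, not actual retrieved results):

```python
def union_top_k(results_per_engine, k=20):
    """Take the top-k documents from each engine's result list and form
    their union, removing duplicates while keeping first-seen order."""
    seen, unique = set(), []
    for results in results_per_engine:
        for doc in results[:k]:
            if doc not in seen:
                seen.add(doc)
                unique.append(doc)
    return unique

# Hypothetical top results from three engines for one query
engines = {
    "Yahoo":  ["d1", "d2", "d3"],
    "Google": ["d2", "d1", "d4"],
    "MSN":    ["d3", "d5", "d1"],
}
pool = union_top_k(engines.values(), k=3)
# pool: ["d1", "d2", "d3", "d4", "d5"]
```

The deduplicated pool is then handed to the data fusion techniques (Rank Position or Borda Count) in Step 4.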
Query List:

Query/Keyword no.  Query           Query/Keyword no.  Query
Q1                 Cancer          Q6                 Brain Tumor
Q2                 Eye Diseases    Q7                 Animal Disease
Q3                 Blood Sugar     Q8                 Heart Attack
Q4                 Breast Cancer   Q9                 X-Ray
Q5                 Dengue          Q10                Root Canal
Learning to rank has a wide variety of applications in IR and NLP. Most of them are document search and retrieval tasks, such as expert search, meta-search, personalized search, online advertisement search, question answering, keyphrase extraction, document summarization, and machine translation.
A. Searching on Web

Web search is the most widely used application of learning to rank, whose learned models are also known as ranking models. It can be applied to different problems such as web search, personalized search, federated search, and online advertisement search.
B.
This application basically examines the ratings of items and produces a ranked list of items. It can be formalized as a classification problem, because users assign ratings to the items they are more likely to prefer.
E. Machine Translation

Re-ranking is a typical ranking problem in machine translation, and the re-ranking approach has many advantages. A generative machine translation model produces many candidate translations; with the help of a discriminative model, these candidate translations can be re-ranked. Through re-ranking, the accuracy of translation may be enhanced, since the discriminative model is used for the final translation selection; the efficiency of translation may also be improved.
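The re-ranking step can be sketched as selecting, from an n-best list, the candidate that maximizes a discriminative score; here the scoring function is a deliberately toy stand-in (candidate length), not a real discriminative model:

```python
def rerank_nbest(candidates, discriminative_score):
    """Re-rank an n-best list of candidate translations produced by a
    generative model, selecting the final translation with a
    discriminative scoring function."""
    return max(candidates, key=discriminative_score)

# Hypothetical n-best list; the toy score simply prefers longer candidates
nbest = ["the cat sat", "cat sat mat", "the cat sat on the mat"]
best = rerank_nbest(nbest, discriminative_score=len)
# best: "the cat sat on the mat"
```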
VII. CONCLUSION
Information retrieval and natural language processing are very important for retrieving results in a more effective and efficient manner. We have presented different ranking methods that are applied to learning to rank for IR. A meta-search engine aggregates the results obtained from different search engines; the main issues in a Meta Search Engine (MSE) arise in the selection of databases and documents and in result merging. In this paper, we have shown how to combine search results from multiple search engines and obtain a single ranked list. We experimented with the Rank Position (Reciprocal Ranking) and Borda Count techniques for retrieval-system relevance judgment and compared the results of both. Our results indicate that natural language search systems can have practical use in society, and that these algorithms can be used to improve the precision of medical document retrieval. In future work, other text mining techniques can be applied to large amounts of text data.
VIII. REFERENCES

[1]

[2] Meng, W., Yu, C. and Liu, K. Building Effective and Efficient Metasearch Engines. ACM Computing Surveys, Vol. 34, No. 1, March 2002.

[3] W. Meng. Metasearch Engines. Department of Computer Science, State University of New York at Binghamton, 2008.

[4]

[5]

[6]

[7] Y. Lu, W. Meng, L. Shu, C. Yu, and K. Liu. Evaluation of Result Merging Strategies for Metasearch Engines. 6th International Conference on Web Information Systems Engineering (WISE), New York, 2005.

[8] H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu. Fully Automatic Wrapper Generation for Search Engines. World Wide Web Conference, Chiba, Japan, 2005.

[9] S. Souldatos, T. Dalamagas, and T. Sellis. Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space. School of Electrical and Computer Engineering, National Technical University of Athens, November 2005.

[10]

[11]

[12]

[13] Yuwono, B. and Lee, D. Search and Ranking Algorithms for Locating Resources on the World Wide Web. In Proceedings of the International Conference on Data Engineering, pages 164-177, 1996.

[14] Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing. Morgan & Claypool, 2011.

[15] X. Jiang, Yunhua Hu, and Hang Li. A Ranking Approach to Keyphrase Extraction. SIGIR '09, July 19-23, 2009, Boston, Massachusetts, USA.