A2 980560

Working of Search Engines
A2-39
Avinash Kumar Widhani, Ankit Tripathi, Rohit Sharma
LNMIIT
16umm006@lnmiit.ac.in, 16ume010@lnmiit.ac.in, 16ucc078@lnmiit.ac.in
Abstract
This article argues that search engines raise not merely technical issues
but also political ones. Our study of search engines suggests that they
systematically exclude (in some cases by design and in some,
accidentally) certain sites and certain types of sites in favor of others,
systematically giving prominence to some at the expense of others. We
argue that such biases, which would lead to a narrowing of the Web's
functioning in society, run counter to the basic architecture of the Web
as well as to the values and ideals that have fueled widespread support
for its growth and development. We consider ways of addressing the
politics of search engines, raising doubts whether, in particular, the
market mechanism could serve as an acceptable corrective.
Introduction
A search engine is the practical application of information retrieval techniques
to large-scale text collections. A web search engine is the obvious example, but
search engines can be found in many different applications,
such as desktop search or enterprise search. Search engines have been around for
many years. For example, MEDLINE, the online medical literature search system,
started in the 1970s. The term search engine was originally used to refer
to specialized hardware for text search. From the mid-1980s onward, however, it
gradually came to be used in preference to information retrieval system as the
name for the software system that compares queries to documents and produces
ranked result lists of documents. There is much more to a search engine than the
ranking algorithm.
Search engines come in a number of configurations that reflect the applications
they are designed for. Web search engines, such as Google and Yahoo, must
be able to capture many terabytes of data, and then provide subsecond
response times to millions of queries submitted every day from around the world.
Enterprise search engines (for example, Autonomy) must be able to process
the large variety of information sources in a company and use company-specific
knowledge as part of search and related tasks, such as data mining. Data mining
refers to the automatic discovery of interesting structure in data and includes
techniques such as clustering. Desktop search engines, such as the Microsoft Vista
search feature, must be able to rapidly incorporate new documents, web pages,
and email as the person creates or looks at them, as well as provide an intuitive
interface for searching this very heterogeneous mix of information. These categories
overlap: Google, for example, is available in configurations for enterprise and
desktop search.
Open source search engines are another important class of systems that have
somewhat different design goals than the commercial search engines. There are a
number of these systems, and the Wikipedia page for information retrieval provides
links to many of them. Three systems of particular interest are Lucene,
Lemur, and the system provided with this book, Galago. Lucene is a popular
Java-based search engine that has been used for a wide range of commercial
applications.
LITERATURE REVIEW
So many search engines have been created that it is difficult for users to know
where they are, how to use them, and what topics they best address. Meta search
engines reduce the user burden by dispatching queries to multiple search engines
in parallel. The SAVVYSEARCH meta search engine is designed to efficiently query
other search engines by carefully selecting those search engines likely to return
useful results and responding to fluctuating load demands on the web.
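The parallel dispatch described above can be illustrated with a minimal sketch. It is not SAVVYSEARCH's implementation: the two engine functions, their names, and the example URLs are hypothetical stand-ins for real search engines, and a thread pool is just one convenient way to issue the calls concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines. Each takes a query string
# and returns a list of result URLs; a real system would issue HTTP requests.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]

def engine_b(query):
    return [f"http://b.example/{query}/1"]

def metasearch(query, engines):
    """Send the query to every engine in parallel and collect results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Submit all requests first so they run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        # Then wait for each engine's answer.
        return {name: fut.result() for name, fut in futures.items()}

results = metasearch("jazz", {"A": engine_a, "B": engine_b})
```

The user submits one query and receives the results of every engine, without needing to know where each engine is or how to use it.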
Metasearch System: SAVVYSEARCH

SAVVYSEARCH is designed to balance two potentially conflicting goals: (1)
maximizing the likelihood of returning good links and (2) minimizing computational
and web resource consumption. The key to this compromise is knowing which search
engines to contact for specific queries at particular times. SAVVYSEARCH tracks
long-term performance of search engines on specific query terms to determine
which are appropriate and monitors recent performance of search engines to
determine whether it is even worth trying to contact them.
Submitting a Query
When submitting a query, the user can set options that cover the treatment of the
query terms, the display of results, and the interface language. Three aspects of the
results display can be varied: (1) the number of links returned, (2) the format of
the link descriptions, and (3) the timing.
By default, 10 links are displayed with the uniform resource locators (URLs) and
descriptions when available, and the results of each search engine are listed
separately as they arrive.
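The display defaults described above could be modelled as a small options record. This is only a sketch: the class name, field names, and the `render` helper are hypothetical, and applying the 10-link limit per engine is an assumption, since the text does not say whether the limit is per engine or overall.

```python
from dataclasses import dataclass

@dataclass
class DisplayOptions:
    # Field names are hypothetical; the default values follow the text.
    num_links: int = 10              # number of links returned
    show_descriptions: bool = True   # show descriptions when available
    integrate_results: bool = False  # False: list each engine's results separately

def render(engine_name, links, opts=None):
    """Format one engine's (url, description) pairs as display lines."""
    opts = opts or DisplayOptions()
    lines = [f"Results from {engine_name}:"]
    # Assumption: the link limit is applied to each engine's list.
    for url, desc in links[:opts.num_links]:
        if opts.show_descriptions and desc:
            lines.append(f"{url} - {desc}")
        else:
            lines.append(url)
    return lines

page = render("A", [("http://x.example", "X page"), ("http://y.example", None)])
```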
Processing a Query
When a user submits the query, SAVVYSEARCH must make two decisions: (1) how
many search engines to contact simultaneously and (2) what order the search
engines should be contacted in.
Resource Reasoning
Each search engine queried expends network and local computational resources. Thus,
modifying concurrency (the number of search engines queried in parallel) is the best
way to moderate resource consumption.
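One way to picture this trade-off is a function that lowers concurrency as load rises. The sketch below is an assumption for illustration only: the load estimate in the range 0.0 to 1.0, the linear mapping, and the limits of 1 and 4 engines are hypothetical and are not taken from SAVVYSEARCH.

```python
def choose_concurrency(load, max_concurrency=4, min_concurrency=1):
    """Map a load estimate in [0.0, 1.0] to a number of engines to query in parallel.

    Low load allows more parallel queries; high load allows fewer.
    """
    load = min(max(load, 0.0), 1.0)          # clamp out-of-range estimates
    span = max_concurrency - min_concurrency
    return max_concurrency - int(load * span)

idle = choose_concurrency(0.0)   # nothing else running: query the maximum
busy = choose_concurrency(1.0)   # fully loaded: query one engine at a time
```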
Ranking Search Engines
The purpose of ranking is to determine which search engines are most worthwhile to
contact for a given query. Search engines are ranked based on learned associations
between search engines and query terms (stored in a metaindex) and recent data on
search-engine performance.
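The ranking idea can be sketched as follows. The text does not give the actual scoring formula, so summing per-term association weights and subtracting a recent-performance penalty is an assumption made here for illustration. The metaindex contents, the penalty values, and the engine names are hypothetical.

```python
from collections import defaultdict

# Hypothetical metaindex: term -> engine -> learned association weight.
metaindex = {
    "jazz": {"A": 0.9, "B": 0.2},
    "music": {"A": 0.4, "B": 0.5, "C": 0.1},
}
# Hypothetical penalty per engine for poor recent performance
# (for example, slow or failed responses).
recent_penalty = {"A": 0.0, "B": 0.1, "C": 0.0}

def rank_engines(query, metaindex, recent_penalty):
    """Return engine names ordered from most to least worthwhile for the query."""
    scores = defaultdict(float)
    # Long-term component: add up the associations for each query term.
    for term in query.lower().split():
        for engine, weight in metaindex.get(term, {}).items():
            scores[engine] += weight
    # Short-term component: subtract the recent-performance penalty.
    adjusted = {e: s - recent_penalty.get(e, 0.0) for e, s in scores.items()}
    return sorted(adjusted, key=lambda e: adjusted[e], reverse=True)

order = rank_engines("jazz music", metaindex, recent_penalty)
```

Engines with no association to any query term are left out of the ranking, which corresponds to not contacting them for that query.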
Dispatching a Query
