1. Introduction
2. Search Terminology
9. Conclusion
SEARCH ENGINES
Introduction:
Think back to the library card catalogue analogy. In the old card
files, and even in today’s computer terminal library catalogues, you find
information by searching on either the author, the title, or the subject. You
usually choose the subject option when you want to cover a broad range of
information.
Example: You’d like to create your own home page on the Web, but you
don’t know how to write HTML, you’ve never created a graphic file, and
you’re not sure how you’d post a page on the Web even if you knew how
to write one. In short, you need a lot of information on a rather broad
topic--Web publishing.
Your best bet is not a search engine, but a Web directory like Yahoo.
Yahoo is a subject-tree style catalogue that organizes the Web into 14
major topics, including Arts, Business and Economy, Computers and
Internet, Education, Entertainment, Government, Health, News,
Recreation, Reference, Regional, Science, Social Science, Society and
Culture. Under each of these topics is a list of subtopics, and under each of
those is another list, and another, and so on, moving from the more general
to the more specific.
Example: To find out about Web page publishing from Yahoo, select the
Computers and Internet topic, under which you find a subtopic on the
World Wide Web. Click on that and you find another list of subtopics,
several of which are pertinent to your search: Web Page Authoring, CGI
Scripting, Java, HTML, Page Design, and Tutorials. Selecting any of these
subtopics eventually takes you to Web pages that have been posted
precisely for the purpose of giving you the information you need.
Important note: More and more search engines are incorporating Web
directories into their sites. These directories interact with the main search
engine on the site in various ways. See Excite, Infoseek and Lycos, even
Alta Vista--they are no longer “just a search engine.” They are now
characterizing themselves as Web portals or hubs -- places where people
come on the Web to get information about a multitude of subjects, and
even to chat, send email and form online communities.
Search Terminology
Query semantics: A set of rules that defines the meaning of a query.
Search engines, like all other web sites, are housed on high-speed
computers called WWW servers. These servers are dedicated to providing
effective search services 24 hours a day. Search engine servers are
connected to the backbone (high-speed infrastructure) of the WWW via
extremely fast, expensive telephone lines called T3 lines. Most of Yahoo’s
servers, for example, are located in Santa Clara, California.
web sites. At most, these collections sport an impressive, frequently
updated and detailed majority of the WWW.
account, these algorithms generate a relevancy score for the first web page
in their memory. They then proceed to do the same for the second, third
and millionth web pages. Finally, the relevancy scores are sorted in order
from most relevant to least, and the corresponding web pages are listed in
this order with informative summary information from the database.
Voilà! The surfer (hopefully) gets the results he or she was looking for.
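The score-then-sort loop described above can be sketched roughly as follows. This is only an illustration: the scoring rule (the fraction of a page's words that match a query term) is an assumption, not the formula any particular search engine uses, and the function names and example URLs are hypothetical.

```python
def relevancy_score(query_terms, page_text):
    """Toy score (assumed rule): fraction of the page's words that match a query term."""
    words = page_text.lower().split()
    if not words:
        return 0.0
    matches = sum(1 for word in words if word in query_terms)
    return matches / len(words)


def rank_pages(query, pages):
    """Score every page, then list the pages from most to least relevant."""
    query_terms = set(query.lower().split())
    scored = [(relevancy_score(query_terms, text), url) for url, text in pages.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Pages that match no query term at all are left out of the listing.
    return [url for score, url in scored if score > 0]


# Hypothetical pages standing in for the engine's database.
pages = {
    "a.example/html": "html tutorial for writing html pages",
    "b.example/cook": "recipes for cooking pasta",
    "c.example/web": "publishing a web page with html",
}
ranking = rank_pages("html pages", pages)
```

Here the first page matches three of its six words and is listed ahead of the third page, which matches only one; the cooking page is dropped.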
Indexing:
Documents should be indexed to make searching easier and less time
consuming. Indexing is the processing of a document representation by
assigning content descriptors or terms to the document. Each document
has objective terms (for example, the author’s name, the document URL, and
the date of publication) and non-objective terms, known as content terms,
which are intended to reflect the information in the document.
they have a user interface for obtaining and presenting results. Search tools
employ robots for indexing web documents, and these can be classified as
type1 and type2.
Search services:
Search sites:
There are basically two types of search sites on the web: search
directories and search engines.
Search engines use software robots to survey the Web and build their
databases. Web documents are retrieved and indexed. When you enter a
query at a search engine web site, your input is checked against the search
engine’s keyword indices. The best matches are then returned to you as
hits.
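A keyword index of the kind described here is commonly built as an inverted index, which maps each word to the documents that contain it. The sketch below assumes whitespace tokenization and requires every query word to match; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical documents retrieved by the robot.
documents = {
    1: "web page authoring tutorial",
    2: "java tutorial",
    3: "web page design",
}
index = build_index(documents)
hits = lookup(index, "web page")
```

Because the index is built once, answering a query only requires a few set lookups rather than a scan of every document.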
SEARCH ENGINE COMPONENTS:
User interface: The screen in which you type a query and which displays
the search results.
Searcher: The part that searches a database for information to match your
query.
Indexer: The function that categorizes the data obtained by the gatherer.
KEYWORD SEARCHING:
This is the most common form of text search on the Web. Most
search engines do their text query and retrieval using keywords.
Unless the author of the Web document specifies the keywords for
her document (this is possible by using meta tags in the latest version of
HTML), it’s up to the search engine to determine them. Essentially, this
means that search engines pull out and index words that are believed to be
significant. Words that are mentioned towards the top of a document and
words that are repeated several times throughout the document are more
likely to be deemed important.
Some sites index every word on every page. Others index only part
of the document. For example, Lycos indexes the title, headings,
subheadings and the hyperlinks to other sites, along with the first 20 lines
of text and the 100 words that occur most often.
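The partial-indexing rule described for Lycos can be approximated as below. This is a sketch of the selection rule only, not Lycos's actual code: hyperlink text is omitted for brevity, tokenization is a plain whitespace split, and the function and parameter names are hypothetical.

```python
from collections import Counter


def select_index_terms(title, headings, body_lines, first_lines=20, top_words=100):
    """Collect terms from the title, headings, leading lines, and frequent words."""
    def words(text):
        return text.lower().split()

    terms = set(words(title))
    for heading in headings:
        terms.update(words(heading))
    for line in body_lines[:first_lines]:
        terms.update(words(line))
    # Most frequent words across the whole body, not only the leading lines.
    counts = Counter(word for line in body_lines for word in words(line))
    terms.update(word for word, _ in counts.most_common(top_words))
    return terms


# Small limits are used here so the effect is visible on a tiny document.
terms = select_index_terms(
    title="Web Publishing",
    headings=["Getting Started"],
    body_lines=["write html first", "then post the page", "html html everywhere"],
    first_lines=1,
    top_words=1,
)
```

With these limits, "html" is indexed both because it is in the first line and because it is the most frequent word, while "everywhere" is never indexed.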
Search engines also cannot return hits on keywords that mean the
same, but are not actually entered in your query. A query on heart disease
would not return a document that used the word “cardiac” instead of
“heart”.
Unlike keyword search systems, concept-based search systems try to
determine what you mean, not just what you say. In the best
circumstances, a concept-based search returns hits on documents that are
“about” the subject/theme you’re exploring, even if the words in the
document don’t precisely match the words you enter into the query.
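The difference between the two approaches can be illustrated with a toy comparison. Real concept-based systems infer related terms from the documents themselves; the hand-written concept table below is only a stand-in for that analysis, and all names in it are hypothetical.

```python
# Hypothetical, hand-written concept table; a real system would derive
# such groupings from the documents themselves.
CONCEPTS = {
    "heart": {"heart", "cardiac", "coronary"},
    "disease": {"disease", "illness", "disorder"},
}


def expand(term):
    """Return the term together with any related words in the same concept."""
    for related in CONCEPTS.values():
        if term in related:
            return related
    return {term}


def concept_match(query, document):
    """True if every query term, or a related word, appears in the document."""
    doc_words = set(document.lower().split())
    return all(expand(term) & doc_words for term in query.lower().split())


def keyword_match(query, document):
    """True only if every query term appears literally in the document."""
    doc_words = set(document.lower().split())
    return all(term in doc_words for term in query.lower().split())


doc = "symptoms of cardiac disease in adults"
keyword_hit = keyword_match("heart disease", doc)
concept_hit = concept_match("heart disease", doc)
```

The keyword match fails because the document never uses the word "heart", while the concept match succeeds through "cardiac".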
Search refining options differ from one search engine to another, but
some of the possibilities include the ability to search on more than one
word, to give more weight to one search term than you give to another, and
to exclude words that might be likely to muddy the results. You might also
be able to search on proper names, on phrases, and on words that are found
within a certain proximity to other search terms.
Some search engines also allow you to specify what form you’d like
your results to appear in, and whether you wish to restrict your search to
certain fields on the internet (e.g., Usenet or the Web) or to specific parts of
Web documents (e.g., the title or URL).
Many, but not all, search engines allow you to use so-called Boolean
operators to refine your search. These are the logical terms AND, OR,
NOT, and the so-called proximal locators, NEAR and FOLLOWED BY.
Boolean AND means that all the terms you specify must appear in
the documents, e.g., “heart” AND “attack”. You might use this if you
wanted to exclude common hits that would be irrelevant to your query.
Boolean OR means that at least one of the terms you specify must
appear in the documents, e.g., bronchitis, acute OR chronic. You might use
this if you didn’t want to rule out too much.
Boolean NOT means that the term that follows NOT must
not appear in the documents. You might use this if you anticipated results
that would be totally off-base, e.g., nirvana AND Buddhism, NOT Cobain.
Not quite Boolean: + and –. Some search engines use the characters +
and – instead of Boolean operators to include and exclude terms.
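The Boolean operators described above correspond directly to set operations on the documents that contain each term. The sketch below assumes whitespace tokenization; the helper name and the sample documents are hypothetical.

```python
def docs_containing(term, documents):
    """Return the ids of documents whose text contains the term as a word."""
    term = term.lower()
    return {doc_id for doc_id, text in documents.items() if term in text.lower().split()}


# Hypothetical documents for the nirvana example.
documents = {
    1: "nirvana in buddhism",
    2: "nirvana and kurt cobain",
    3: "buddhism and meditation",
    4: "nirvana buddhism cobain lyrics",
}

nirvana = docs_containing("nirvana", documents)
buddhism = docs_containing("buddhism", documents)
cobain = docs_containing("cobain", documents)

and_hits = nirvana & buddhism              # AND: both terms must appear
or_hits = nirvana | buddhism               # OR: at least one term must appear
not_hits = (nirvana & buddhism) - cobain   # NOT: drop documents mentioning cobain
```

A query written with + and – can be read the same way: terms marked + are intersected, and terms marked – are subtracted.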
NEAR means that terms you enter should be within a certain number
of words of each other. FOLLOWED BY means that one term must
directly follow the other. ADJ, for adjacent, serves the same function. A
search engine that will allow you to search on phrases uses, essentially, the
same method (i.e., determining adjacency of keywords).
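Proximity operators can be sketched by comparing word positions. The default distance of five words is an assumption (the text says only "a certain number"), and the function names are hypothetical.

```python
def positions(term, words):
    """Return every index at which the term occurs in the word list."""
    return [i for i, word in enumerate(words) if word == term]


def near(first, second, text, max_distance=5):
    """True if the two terms occur within max_distance words of each other."""
    words = text.lower().split()
    return any(
        abs(i - j) <= max_distance
        for i in positions(first, words)
        for j in positions(second, words)
    )


def followed_by(first, second, text):
    """True if the second term comes immediately after the first."""
    words = text.lower().split()
    return any(
        j - i == 1
        for i in positions(first, words)
        for j in positions(second, words)
    )


text = "a heart condition can lead to an attack"
close_enough = near("heart", "attack", text, max_distance=10)
too_far = near("heart", "attack", text)
adjacent = followed_by("heart", "condition", text)
```

Phrase searching is the special case in which each word of the phrase must be followed directly by the next.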
Capitalization: This is essential for searching on proper names of people,
companies or products. Unfortunately, many words in English are used
both as proper and common nouns – Bill, bill, Gates, gates, Oracle, oracle,
Lotus, lotus, Digital, digital – the list is endless.
Some search engines are now indexing Web documents by the meta
tags in the document’s HTML (at the beginning of the document in the
so-called “head” tag). What this means is that the Web page author can have
some influence over which keywords are used to index the document, and
even over the description of the document that appears when it comes up as a
search engine hit.
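As an illustration of how an indexer might read these tags, the sketch below uses Python's standard html.parser module to pull the keywords and description out of a page's head. The class name and the sample page are hypothetical, and real search engines differ in whether and how they use this data.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect the keywords and description meta tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attributes.get("content") or ""


# Hypothetical page with author-supplied meta tags.
page = """<html><head>
<title>Web Publishing Basics</title>
<meta name="keywords" content="html, web publishing, tutorial">
<meta name="description" content="A beginner's guide to publishing on the Web.">
</head><body><p>Welcome.</p></body></html>"""

reader = MetaTagReader()
reader.feed(page)
keywords = [k.strip() for k in reader.meta.get("keywords", "").split(",")]
description = reader.meta.get("description", "")
```

The keywords list would feed the index, and the description is the text an engine could show alongside the hit.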
for long. There is a lot of conflicting information out there on meta-
tagging. If you’re confused it may be because different search engines
look at meta tags in different ways. Some rely heavily on meta tags, others
don’t use them at all.
The search engines are aware of such deceptive tactics, and have
devised various methods to circumvent them, so be careful. Use keywords
that are appropriate to your subject, and make sure they appear in the top
paragraphs of actual text on your web page. Many search engine
algorithms score the words that appear towards the top of your document
more highly than the words that appear towards the bottom. Words that
appear in HTML header tags (H1, H2, H3, etc) are also given more weight
by some search engines. It sometimes helps to give your page a file name
that makes use of one of your prime keywords, and to include keywords in
the “Alt” text of image tags.
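The weighting described above, where words near the top and words in header tags count for more, can be sketched as follows. The specific weights are assumptions chosen only for illustration, and the function name and page layout are hypothetical; each engine applies its own undisclosed rules.

```python
# Assumed weights, chosen only for illustration.
HEADING_BONUS = {"h1": 3.0, "h2": 2.0, "h3": 1.5}


def keyword_weight(keyword, blocks):
    """Sum a weight for each occurrence, favouring headings and early blocks."""
    keyword = keyword.lower()
    total = 0.0
    for position, (tag, text) in enumerate(blocks):
        occurrences = text.lower().split().count(keyword)
        position_weight = 1.0 / (1 + position)    # earlier blocks count more
        tag_weight = HEADING_BONUS.get(tag, 1.0)  # header tags count more
        total += occurrences * position_weight * tag_weight
    return total


# A page as (tag, text) blocks in document order.
blocks = [
    ("h1", "Web publishing guide"),
    ("p", "Learn web publishing step by step"),
    ("p", "Contact the author about cooking"),
]
top_weight = keyword_weight("publishing", blocks)
bottom_weight = keyword_weight("cooking", blocks)
```

A keyword that appears in the main heading and the opening paragraph ends up with a much higher weight than one mentioned only at the bottom of the page.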
Remember that all the major search engines have slightly different
policies. If you’re designing a website and meta-tagging your documents,
we recommend that you take the time to check out what the major search
engines say in their help files about how they each use meta tags. You
might want to optimize your meta tags for the search engines you believe
are sending the most traffic to your site.
1. The search engines they send your search terms to (size, content,
number of search engines, your ability to choose the search
engines you prefer); all of them search subject directories as well
as search engines and intermix results from all.
2. How they handle your search terms and search syntax (Boolean
operators, phrases, and defaults they impose).
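One simple way to intermix results from several engines is to interleave their ranked lists and drop duplicates. The round-robin rule below is an assumption made for illustration, not the method of any particular meta-search engine, and the engine result lists are hypothetical.

```python
from itertools import zip_longest


def intermix(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            # zip_longest pads shorter lists with None; skip the padding.
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results returned by two engines for the same query.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]
merged = intermix([engine_a, engine_b])
```

Each engine's top hit appears before any engine's second hit, and a page returned by both engines is listed only once.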
Quantity in results does not equal satisfaction. If you get more
results than you want, try refining the results by going directly
to AltaVista Advanced Search, Northern Light, or Infoseek by
clicking on their link in the results. Choose meta-search
engines that offer some of these as options.
Conclusion:
Though there are many search engines available on the web, both the
searching methods and the engines themselves have a long way to go before
they retrieve information on relevant topics efficiently. As the technology
advances at a rapid pace, it is reasonable to expect an efficient search
engine that addresses all of these needs.
Choosing the right search engine takes patience and experience.
Meta search engines can help: they reduce your search effort to a great extent.
The good news is that new search engines are evolving every day to
improve retrieval efficiency.