
Effective Web Searching

T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science
Bangalore - 560 012
(E-mail: raja@ncsi.iisc.ernet.in)



How do we use libraries and IR systems?
Organization of the web
Accessing web-based information: key problems
Tools for information retrieval on the web:
  Directories/guides
  Search engines
  Meta search tools
  People finding tools
Strategies for web searching
Guides to search tools
Keeping current
T.B. Rajashekar, November 2000

How Do We Use Libraries and IR Systems?


Libraries:
  How the documents are organised: document types, classification system used
  Access tools: catalogues, indexes, automated catalogues, access points
  Our information need (search topic): translated in terms of the organisation scheme employed by the library
Information retrieval systems (e.g. bibliographic databases):
  How the database is organised: record content, fields, search elements
  Indexing and query language: thesaurus, Boolean logic, truncation, etc.
  Our information need: formulated as a search expression using the query language

Organization of the Web


Adopt the same strategy while searching the Web:
  Understand the web information architecture
  Understand the information access tools and the access mechanisms they provide
  Represent our query in terms of the mechanisms supported by these tools, and search the web
Web sites:
  How the content is organised (document types, structuring and navigation)
  Searchable/indexable and non-searchable/non-indexable content
  Structure of web pages
  Meta tags, page attributes (properties)


Organization of the Web...


The Web is the totality of web pages stored on web servers
Spectacular growth in web-based information sources and services:
  Education and research
  Entertainment
  Business and commerce
  Personal home pages
Estimated to contain over 1 billion indexable web pages, doubling each year
Over 80 million web sites


Accessing Web-based Information: Key Problems


Identification of sources (documents):
  No central card catalog
  Most web pages are not indexed in a standard vocabulary, unlike library catalogues or journal article indexes
  Impossible to reach all related pages/sites directly
  Need to use intermediate, resource-finding tools


Information Retrieval on the Web


How to find relevant documents on the Web?
Informal:
  Browsing (and bookmarking for later use)
  Friends
  Print sources
  Discussion forums (mailing lists)
  Current awareness services (e.g. Scout Report)
  Guessing web site addresses!
Formal (using information-finding tools):
  Web directories/guides
  Web search engines
  Meta search tools
  Specialty search engines

Web Directories/ Guides


Also called virtual libraries and Internet resource catalogues
Organised collections of descriptions of, and links to, Internet sources
Organisation: by subject category (hierarchical); by resource type (patents, e-journals, institutes, etc.)
Most use human experts for source selection, indexing and classification
Some include reviews/ratings of listed sites


Web Directories/ Guides...


Examples of general web directories:
  Librarians' Index to the Internet (www.lii.org)
  Britannica's Web's best sites (www.britannica.com)
  Infomine (infomine.ucr.edu)
  Scout Report Signpost (www.signpost.org)
  BUBL link (bubl.ac.uk/link)
  Yahoo (www.yahoo.com)
  Magellan (www.mckinley.com)
  Galaxy (www.galaxy.com)
  Looksmart (www.looksmart.com)
  Snap (www.snap.com)


Web Directories/ Guides...


Guides to directories:
  WWW Virtual Library (www.vlib.org)
  Argus Clearinghouse (www.clearinghouse.net)
  Gogettem (www.gogettem.com/)
Subject-specific guides (subject gateways):
  Edinburgh Engineering Virtual Library (www.eevl.ac.uk)
  Social Science Information Gateway (sosig.ac.uk)
  The Internet Pilot To Physics (physicsweb.org/TIPTOP)
  Chemcenter (www.acs.com)
  Programmer's Heaven (www.programmersheaven.com)
Resource type guides:
  Patents (www.european-patent-office.org)
  Electronic journals (www.publist.com)

Web Directories/ Guides...


Most web directories support searching within categories and descriptions, in addition to browsing
Advantages:
  Access to high-quality sources
  Do not contain redundant links
  Faster access to sources
Disadvantages:
  One needs to be aware of such directories/guides
  May not be up to date
  May not be exhaustive
  Categories (subject hierarchies) vary across directories


Web Directories/ Guides...


When to use web directories/guides?
  For broad/general topics, where keyword searching on search engines retrieves too many irrelevant sites
  When you want a few highly relevant sites and the intention is not an exhaustive/comprehensive search
When not to use web directories/guides?
  For concept/keyword searches
  When the search terms are distinctive
Effective directory/guide usage:
  Take advantage of the sub-search within categories supported by most directories/guides
  Join their mailing lists for automatic updates on new sites

Web Directories/ Guides...


Demonstration of directories/guides:
  Librarians' Index to the Internet (www.lii.org)
  Britannica's Web's best sites (www.britannica.com)
  Scout Report Signpost (www.signpost.org)
  BUBL link (bubl.ac.uk/link)
  Yahoo (www.yahoo.com)
  WWW Virtual Library (www.vlib.org)
  Argus Clearinghouse (www.clearinghouse.net)


Web Search Engines


Just as abstracting and indexing (A&I) journals index published literature, web search engines build a full-text index to web pages gathered from web sites and provide a keyword search interface to this index
  Spider programs periodically visit web sites and gather web pages for indexing
  Search engines also index web sites submitted by site developers
  A brief summary of each indexed web page is also prepared
  The index usually contains URLs, titles, headings and other words from the HTML document
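As a rough illustration of the full-text index described above, the sketch below builds an inverted index that maps each word to the set of pages containing it. It is only a minimal sketch: real engines also store word positions, fields and page summaries, and the page data, URLs and function names here are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word.

    `pages` maps a URL to a (title, body) pair, standing in for the pages
    a spider has gathered (hypothetical input format).
    """
    index = defaultdict(set)
    for url, (title, body) in pages.items():
        for word in tokenize(title + " " + body):
            index[word].add(url)
    return index

# Hypothetical crawled pages; the example.org URLs are placeholders.
pages = {
    "http://example.org/sludge": ("Activated sludge",
                                  "Modelling the activated sludge process"),
    "http://example.org/lca": ("Light Combat Aircraft",
                               "The LCA programme in India"),
}
index = build_index(pages)
print(sorted(index["sludge"]))  # ['http://example.org/sludge']
```

A keyword search then becomes a lookup in this dictionary rather than a scan of every page, which is why engines can answer queries over very large collections quickly.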

Web Search Engines...


The search engines provide a forms-based search interface for entering queries
Most support both simple and advanced search interfaces
Search results are returned as a list of web sites matching the query
Some key features supported:
  Phrase searching (double quotes)
  Boolean searching (AND, OR, NOT)
  Implied Boolean: term inclusion (+), term exclusion (-)
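The implied Boolean operators above can be sketched as set operations over an inverted index: "+" terms are intersected (AND) and "-" terms are subtracted (NOT). This is a minimal sketch under an assumed behaviour for bare terms (they are matched with OR only when no "+" term is given); individual engines differ, and the function name and index contents are hypothetical.

```python
def implied_boolean(query, index):
    """Evaluate a query written with implied Boolean operators.

    +term -> the page must contain term (like AND)
    -term -> the page must not contain term (like NOT)
    term  -> used (with OR) only when no +term is present in this sketch
    `index` maps lower-case words to sets of page identifiers.
    """
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    if required:
        # Intersect the page sets of all required terms (AND); returns a new set.
        result = set.intersection(*(index.get(t, set()) for t in required))
    else:
        # No required terms: any optional term may match (OR).
        result = set().union(*(index.get(t, set()) for t in optional))
    for term in excluded:
        result -= index.get(term, set())  # drop excluded pages (NOT)
    return result

# Hypothetical index over three pages identified by number.
index = {
    "polymer": {1, 2, 3},
    "companies": {1, 3},
    "courses": {3},
}
print(sorted(implied_boolean("+polymer +companies -courses", index)))  # [1]
```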


Web Search Engines


Key features:
  Proximity searches (NEAR, ADJ, BEFORE, AFTER)
  Use of parentheses to group search terms
  Truncation searches (industr*)
  Field-specific searching (title, URL, text)
  Natural language queries (Why is the sky blue?)
  Relevance ranking of search results, based on:
    Number of search terms present in the page
    Number of times each search term occurs
    Proximity of search terms
    Location of search terms (title, text)
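To make the relevance ranking factors above concrete, here is a toy scoring function that uses three of them: how many query terms occur, how often they occur, and where (title hits weighted more heavily). The title weight of 3 and the tuple-based ordering are arbitrary assumptions for illustration, not any engine's actual formula, and proximity is left out to keep the sketch short.

```python
import re

def score(query_terms, title, body, title_weight=3):
    """Toy relevance score from the ranking factors listed above.

    Returns (distinct terms matched, weighted occurrence count), where an
    occurrence in the title counts `title_weight` times (assumed weight).
    """
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    matched = 0
    total = 0
    for term in query_terms:
        term = term.lower()
        hits = title_weight * title_words.count(term) + body_words.count(term)
        if hits:
            matched += 1
            total += hits
    # Tuples compare element by element, so pages matching more distinct
    # terms outrank pages that merely repeat one term; then frequency and
    # location break ties.
    return (matched, total)

# Hypothetical pages: (title, body).
pages = {
    "a": ("Activated sludge process", "Simulation of an activated sludge plant"),
    "b": ("Sludge disposal", "sludge sludge sludge"),
}
query = ["activated", "sludge"]
ranked = sorted(pages, key=lambda p: score(query, *pages[p]), reverse=True)
print(ranked)  # ['a', 'b']
```

Page "b" repeats one term many times but still ranks below page "a", which matches both terms; this is also why keyword repetition ("salting") was a problem for simpler frequency-only schemes.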


Web Search Engines


Key features:
  Sub-searching (searching within retrieved records)
  Case sensitivity
  Limit by language
  Limit by age of documents
  Limit by audio, video and image type
  Translation of search results (title and description)
  Limit by domain, host
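A "limit by domain, host" option can be thought of as a filter on the host part of each result URL. The sketch below assumes a simple rule (keep the domain itself and any of its subdomains); the function name and the sample result list are hypothetical.

```python
from urllib.parse import urlparse

def limit_by_domain(urls, domain):
    """Keep only URLs whose host is `domain` or one of its subdomains."""
    domain = domain.lower().lstrip(".")
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match "ernet.in" itself or anything ending in ".ernet.in",
        # but not unrelated hosts such as "noternet.in".
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept

# Hypothetical result list.
results = [
    "http://www.iisc.ernet.in/",
    "http://www.example.com/page",
    "http://ncsi.iisc.ernet.in/",
]
print(limit_by_domain(results, "ernet.in"))
```

Checking for a leading dot before the domain avoids false matches on hosts that merely end with the same letters.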


Web Search Engines...


Examples:
  Fastsearch (alltheweb.com)
  Altavista (www.altavista.com)
  Google (www.google.com)
  Northernlight (www.northernlight.com)
  HotBot (www.hotbot.com)
  Excite (www.excite.com)
  Lycos (www.lycos.com)
  InfoSeek Guide (www.infoseek.com)
  WebCrawler (www.webcrawler.com)
  Worldwide Web Worm (www.goto.com)


Web Search Engines...


Specialty search engines:
  Country-specific search engines: www.khoj.com, www.123india.com
  Subject-specific search engines:
    Chemfinder (www.chemfinder.com)
    Engineering Resources Online (www.er-online.co.uk)
    MathSearch (www.maths.usyd.edu.au:8000/MathSearch.html)
    Netpart: company site locator (www.websense.com/locator.cfm)
    World Trade Locator (www.intl-tradenet.com)
  Resource-specific search engines:
    Patents (www.uspto.gov)
    Journal articles (www.findarticles.com)

Web Search Engines...


Advantages of search engines:
  Best suited for complex keyword/concept searches
  Control over the search: search terms can be combined as required
  Searches can be limited by period of time, fields, source type, etc.
  Currency of information, made possible by the regular addition of pages by web spiders
  Exhaustive information can be retrieved (with lots of patience!)
Disadvantages:
  Time consuming
  False positives
  Search engines vary in their search techniques/syntax
  Dead links, redundant links (the same document is displayed more than once)
  Spamming (salting of pages)
  Higher ranking of paying sites

Web Search Engines...


Limitations of web search engines:
  Poor retrieval effectiveness (relevance), as little vocabulary control is exercised by web site developers and the indexing engines
  Different search engines return different search results due to variation in their indexing and search processes (40% non-overlap)
  None of the search engines comes close to indexing the entire web, much less the entire Internet
  Content not indexed:
    PDF documents
    Content that requires a login
    Databases searched using CGI programs
    Web content on intranets behind firewalls

Web Search Engines...


Demonstration of search engines:
  Fastsearch (www.alltheweb.com)
  Altavista (www.altavista.com)
  Google (www.google.com)
  Northernlight (www.northernlight.com)


Meta Search Tools


Exhaustive searches require the use of more than one web search engine and familiarity with their search interfaces
Meta search tools provide a common interface, conduct searches in many search engines simultaneously and return the results in a uniform format
They do not gather web pages, build indexes, accept URL additions, or classify or review web sites
Some features supported:
  Duplicate hit removal
  Ranking of results
  Selection of the search engine(s) to be used
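Duplicate removal and result ranking in a meta search tool can be sketched as follows: URLs from each engine are crudely normalised so the same page is recognised, and each page is scored by summing 1/rank over the engines that returned it (a simple form of reciprocal rank fusion). The merging rule is an assumption chosen for illustration; actual meta search tools use their own rules, and the function names and result lists are hypothetical.

```python
def normalize(url):
    """Crude URL normalisation so the same page from two engines matches."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url

def merge_results(result_lists):
    """Merge ranked result lists from several engines into one list.

    Duplicates are removed by normalised URL; each page scores the sum of
    1/rank over the engines that returned it (assumed merging rule).
    """
    scores, first_seen = {}, {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)  # keep the first spelling seen
    ordered = sorted(scores, key=lambda k: scores[k], reverse=True)
    return [first_seen[k] for k in ordered]

# Hypothetical top results from two engines for the same query.
engine_a = ["http://www.lii.org/", "http://www.yahoo.com/", "http://bubl.ac.uk/link"]
engine_b = ["http://yahoo.com", "http://www.vlib.org/"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Because only the top results of each engine are merged, this also shows why meta search results are not exhaustive.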


Meta Search Tools...

[Figure: searching with multiple search engines individually vs. searching through a meta search tool]


Meta Search Tools...


Meta search tools (remote sites):
  MetaCrawler (www.metacrawler.com)
  Ixquick (www.ixquick.com)
  Dogpile (www.dogpile.com)
  ProFusion (www.profusion.com)
Meta search tools (local, installable software):
  Copernic (www.copernic.com)
  SearchPad (www.searchpad.com)
  LexiBot (www.completeplanet.com)


Meta Search Tools...


Advantages:
  A query can be run across multiple search engines
  The user needs to learn only the search interface of the meta search tool
  Better results: retrieves top-ranking pages from individual search engines
Disadvantages:
  Unique features of individual search engines are lost
  Not exhaustive: uses only the top results returned by each search engine


Meta Search Tools...


When to use meta search tools?
  They need to be used cautiously
  Good for simple searches, particularly if the search terms are distinctive or unique
  Good for testing a few keywords to find which individual search engine returns good results
  Good for quick-and-dirty searching, if you are in a hurry and want to find a few relevant sites quickly
  For complex searches involving many search terms, Boolean logic, etc., it is better to use individual search engines


Meta Search Tools...


Demonstration:
  MetaCrawler (www.metacrawler.com)
  Ixquick (www.ixquick.com)
  Dogpile (www.dogpile.com)
  ProFusion (www.profusion.com)


People Finding Tools


Let users register names and addresses, and find e-mail addresses
Examples:
  Bigfoot (www.bigfoot.com)
  Peoplesearch (www.peoplesearch.net)
  Ahoy (ahoy.cs.washington.edu:6060/)
  Four11 (www.four11.com)
  Switchboard (www.switchboard.com)
  Whowhere (www.whowhere.lycos.com/)
Most search engines also support people searches (e.g. Altavista, Google, Yahoo!)


People Finding Tools


Using people finding tools:
  The person should have registered with the tool(s)
  The searcher should know both the surname and the first name, else too many names will be retrieved
  Bias towards U.S.-based people
  Often, the required e-mail address cannot be retrieved through these tools
  Alternatively, any search engine may be used (phrase search using the person's name)
  If the person's affiliation is known, the Yahoo! directory may be used to locate the institution and the e-mail address


Web Search Strategies


Search steps:
1. Analyze the search topic and identify the search terms (both inclusion and exclusion), their synonyms (if any), phrases and Boolean relations (if any)
2. Select the search tool(s) to be used (meta search engine, directory, general search engine, specialty search engine)
3. Translate the search terms into search statements of the selected search engine
4. Perform the search
5. Refine the search based on the results
6. Visit the actual site(s) and save the information (using the File > Save option of the browser)
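Steps 1 and 3 above can be sketched as a small translation routine: synonyms identified for each concept are combined with OR, the concepts are combined with AND, multi-word terms are quoted as phrases, and exclusion terms are attached with AND NOT. The AltaVista-style advanced syntax is an assumption (check each engine's help pages), and `build_query` is a hypothetical helper.

```python
def build_query(concepts, exclude=()):
    """Translate analysed search terms into one Boolean search statement.

    `concepts` is a list of concept groups: terms inside a group are
    synonyms (joined with OR) and groups are joined with AND.
    Multi-word terms are quoted so they are searched as phrases, and
    `exclude` terms are added with AND NOT (assumed AltaVista-style syntax).
    """
    def fmt(term):
        return f'"{term}"' if " " in term else term

    parts = []
    for group in concepts:
        terms = [fmt(t) for t in group]
        if len(terms) == 1:
            parts.append(terms[0])
        else:
            parts.append("(" + " OR ".join(terms) + ")")
    query = " AND ".join(parts)
    for term in exclude:
        query += " AND NOT " + fmt(term)
    return query

print(build_query([["simulat*", "model*"], ["activated sludge process"]]))
# (simulat* OR model*) AND "activated sludge process"
print(build_query([["polymer"], ["companies"]], exclude=["courses"]))
# polymer AND companies AND NOT courses
```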

Web Search Strategies


Tips for effective web searching:
  Broad or general concept searches: start with directory-based services (when you want a few highly relevant sites for a broad topic)
  Highly specific topics, or topics with unique terms/many concepts: use the search tools
  Go through the help pages of search tools carefully
  Gather sufficient information about the search topic before searching: spelling variations, synonyms, broader and narrower terms
  Use specific keywords; rare/unusual words are better than common ones

Web Search Strategies...


Tips for effective web searching:
  Prefer phrase and adjacency searching to Boolean ("stuffed animal" rather than stuffed AND animal)
  Use as many synonyms as possible: search engines use statistical retrieval methods and produce better results with more query words
  Avoid very common words (e.g. computer)
  Enter search terms in lower case; use upper case to force an exact match (e.g. Light Combat Aircraft, LCA)
  Use the "More like this" option, if supported by the search engine (e.g. Excite, Google)
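The difference between a phrase search and a Boolean AND can be shown on two made-up pages: both contain the words "stuffed" and "animal", but only one contains them next to each other. The helper names and page texts are hypothetical; this is a simplified model of how the two query types match.

```python
import re

def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_and(text, terms):
    """True if every term occurs anywhere in the text (stuffed AND animal)."""
    present = set(words(text))
    return all(t in present for t in terms)

def phrase_match(text, phrase):
    """True if the phrase's words occur consecutively ("stuffed animal")."""
    ws, ps = words(text), words(phrase)
    n = len(ps)
    return any(ws[i:i + n] == ps for i in range(len(ws) - n + 1))

page1 = "Handmade stuffed animal toys for children"
page2 = "Stuffed peppers recipe; our animal shelter news"
print(boolean_and(page1, ["stuffed", "animal"]), phrase_match(page1, "stuffed animal"))  # True True
print(boolean_and(page2, ["stuffed", "animal"]), phrase_match(page2, "stuffed animal"))  # True False
```

The second page is a false positive for the Boolean query but is correctly rejected by the phrase query.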


Web Search Strategies...


Tips for effective web searching:
  Repeat the search, varying the search terms and their combinations; try this on different search tools
  Enter the most important terms first: some search tools are sensitive to word order
  Use the NOT operator to exclude unwanted pages (e.g. biodata, resumes, courses)
  Go through at least 5 pages of search results before giving up the scan
  Select 2 or 3 search tools and master their search techniques


Sample Web Searches


Companies dealing with polymers:
  Do not use search engines (too many irrelevant hits)
  Use directory sources (e.g. www.yahoo.com)
  Follow the categories: Business and Economy > Business-to-Business > Chemicals
  Do a sub-search on Polymers
  Use specialty search engines (e.g. www.bizweb.com)


Sample Web Searches...


Web pages related to Light Combat Aircraft:
  The keywords are unique
  Use search tools (e.g. www.altavista.com)
  Search for "Light Combat Aircraft" (phrase search in the simple search interface)
  The double quotes force the search engine to treat the set of keywords as a phrase
  The search can be limited to specific dates
  A more refined search in the advanced search interface: "Light Combat Aircraft" AND India


Sample Web Searches...


Web sources related to simulation or modeling of the activated sludge process:
  This is a concept search, so search tools are better
  Using Altavista, the query may be submitted as (simulat* OR model*) AND "activated sludge process"
  Note the use of * to cover word variations like simulated, simulate, models, etc.
  Note the use of the phrase form for "activated sludge process"
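Truncation such as simulat* can be modelled as "any word that starts with this stem". The sketch below converts a term with a trailing * into a whole-word regular expression and shows which words in a made-up vocabulary it would match. Only a trailing * is handled, and the helper name and word list are hypothetical.

```python
import re

def truncation_to_regex(pattern):
    """Turn a truncated term such as 'simulat*' into a whole-word regex.

    Only a trailing '*' (any run of letters) is supported in this sketch.
    """
    stem = re.escape(pattern.rstrip("*").lower())
    suffix = "[a-z]*" if pattern.endswith("*") else ""
    return re.compile(stem + suffix)

vocabulary = ["simulate", "simulated", "simulation", "model", "models",
              "modelling", "moderate", "similar"]
for term in ("simulat*", "model*"):
    rx = truncation_to_regex(term)
    # fullmatch requires the whole word to fit the pattern.
    print(term, [w for w in vocabulary if rx.fullmatch(w)])
# simulat* ['simulate', 'simulated', 'simulation']
# model* ['model', 'models', 'modelling']
```

Note that a short stem such as mod* would also pick up "moderate", which is why truncated stems should be chosen carefully.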


Guides to Search Tools


www.beaucoup.com (guide to 2,000+ search engines, indices and directories)
www.searchpower.com (a very comprehensive search engine directory; claims over 16,000 search engine listings!)
www.123go.com/drw/search/search.htm (Dr. Webster's Big Page of Search Engines)
www.finderseeker.com (the search engine of search engines)
www.virtualfreesites.com (over 1,000 specialised search engines)

Keeping Current
AskScott (www.askscott.com): provides a very comprehensive tutorial on search engines
SearchEngineWatch (www.searchenginewatch.com): offers information about new developments in search engines, and provides reviews and tutorials
Botspot (www.botspot.com): a collection of, and guide to, a variety of bots (intelligent agents)

