Data/Document Creation Process

Data/Document Storage System

Data/Document Retrieval System

Data/Documents are represented by Representations.

Descriptions describe Data Objects. Data Objects make up Feeds, Feeds are organized into Tables, and Tables provide structure to Databases.

Data Objects: objects residing in databases that are not referenceable by a simple URI, such as company names, addresses, phone numbers, names of people, personal profiles, horoscopes, credit card numbers, product names, prices, maps, event listings, news clips, and weather.

Feeds: views of related data in a database.

Databases: a repository of data that could be stored in RDF format. A database can be updated independently of the index that references it. Examples: News, Products, Related Links, Trademarks, Recirculation, Stocks, 3rd Parties, Alexa, Bigfoot.

Graph: metadata entries in the graph can reference each other. For instance, the entry for a stock quote request could point to the symbol, the symbol could point to the company name and the current quote, the current quote could point to the quote history, and the quote history could point to a quote histogram.
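The Graph's chain of references can be modeled as an adjacency map. The sketch below is illustrative only: it assumes a plain Python dict stands in for the graph store, and `STOCK_GRAPH` and `reachable` are hypothetical names. It follows references breadth-first from a single entry.

```python
from collections import deque

# Hypothetical metadata graph: each entry lists the entries it points to.
STOCK_GRAPH = {
    "stock quote request": ["symbol"],
    "symbol": ["company name", "current quote"],
    "company name": [],
    "current quote": ["quote history"],
    "quote history": ["quote histogram"],
    "quote histogram": [],
}


def reachable(graph, start):
    """Return every entry reachable from `start`, in breadth-first order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        entry = queue.popleft()
        for ref in graph.get(entry, []):
            if ref not in seen:  # skip entries already visited
                seen.add(ref)
                order.append(ref)
                queue.append(ref)
    return order


print(reachable(STOCK_GRAPH, "stock quote request"))
```

Starting from the stock quote request, the traversal reaches the symbol first, then the company name and current quote, and finally the quote history and the quote histogram.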
Attributes describe WWW Sites. WWW Sites make up Submissions, Submissions are organized into Hierarchical Taxonomies, and Hierarchical Taxonomies provide structure to Directory Databases. A WWW Site could be contained in a Graph.

WWW Sites: pages in context.

Submissions are gathered by, or created by, WWW Users and Editors.

Examples: Usenet, Yahoo!, Open Directory, Infoseek, Dewey Decimal, Library of Congress.
WWW Users can be Editors. Editors are paid or volunteer, and can assign quality ratings such as editor’s choice or cool site of the day.

Relevance Ranking: a collection of technologies that attempt to rank the relative value of a set of results.

Amazon ranks related books and CDs by keeping track of the purchasing habits of groups (group profiles).

Grapevine allows users to actively rank results (a customization process) but then attempts to guess what the user will want based on what it has learned (a personalization process).

Google ranks results during the indexing process by examining the relationships created by the hyperlinks in the documents. More popular sites score higher.
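Google's method is described here only at a high level. As a rough illustration of how hyperlink relationships can make more popular sites score higher, the following is a simplified PageRank-style power iteration. It is a sketch under stated assumptions, not Google's actual implementation: `link_popularity` is a hypothetical name, and the damping factor of 0.85 and the fixed 50 iterations are conventional choices, not values from this document.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them (PageRank-style power iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of the total score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score
```

For example, with links `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` receives links from two pages and ends up with the highest score, while `b`, which nothing links to, scores lowest.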
Clever does the ranking during post processing. It uses an iterative process that looks at the popularity of a site (much like Google) but does so in regard to the keywords used in the search.

Direct Hit analyzes the time spent by users on a given result. It also keeps track of whether or not a user returns to the results page after viewing a result. If the user returns, the result receives a lower score.

WWW Objects make up WWW Pages, WWW Pages make up Page Databases, Page Databases are organized into Sorted Lists, and Sorted Lists provide structure to WWW Document Databases.
WWW Pages are gathered by Crawlers.

WWW Objects: objects referenceable by a URI, such as html, text, gif, jpeg, png, avi, mpeg, real, quicktime, ra, wav, pdf, rss, xml, and xul files.

WWW Pages: groups of WWW Objects in context.

Examples of WWW document databases: Inktomi, Google, Alta Vista, Lycos.

Sorted Lists: the organizing principle could be alphabetical, by relevance, or by modification date.
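To make the Sorted Lists' organizing principles concrete, here is a minimal sketch that orders results alphabetically, by relevance, or by modification date. The `Result` record and its fields are hypothetical; any result type with a title, a relevance score, and a modification date would do.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """Hypothetical result record; field names are illustrative."""
    title: str
    relevance: float
    modified: date


def sort_results(results, principle):
    """Order a result list by one of the organizing principles."""
    if principle == "alphabetical":
        # Case-insensitive so "Alpha" sorts before "beta".
        return sorted(results, key=lambda r: r.title.lower())
    if principle == "relevance":
        return sorted(results, key=lambda r: r.relevance, reverse=True)
    if principle == "modified":
        # Most recently modified first.
        return sorted(results, key=lambda r: r.modified, reverse=True)
    raise ValueError(f"unknown organizing principle: {principle!r}")
```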
Analyzers: perform analysis such as text analysis, link analysis, and metadata analysis.

Post Processors: any processes done on the data as returned from the index. Could involve sorting, ordering, or compiling the data retrieved from the index.

Crawlers: algorithms used to collect data. A crawler could start at a web page, then continue gathering text on all of the pages linked to the given page, then follow the links on those pages, and so on.
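The crawling procedure above is a breadth-first traversal of the link graph. This sketch assumes a hypothetical `get_links(page)` function in place of real fetching and link extraction (so it runs offline), and caps the crawl with an assumed `max_pages` limit. A production crawler would also need politeness delays and robots.txt handling, which are omitted.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Breadth-first crawl: visit `start`, then the pages it links to, and so on.

    get_links(page) is a hypothetical stand-in for fetching a page and
    extracting its hyperlinks.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # a real crawler would gather the page text here
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "web" so the example runs without network access.
WEB = {
    "home": ["about", "news"],
    "about": ["home", "team"],
    "news": ["archive"],
    "team": [],
    "archive": ["home"],
}

print(crawl("home", lambda page: WEB.get(page, [])))
```

With the toy `WEB` map, the crawl visits `home`, `about`, `news`, `team`, and `archive` in that order, never revisiting `home`.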
Representations are “sensed” by Sensors. Sensors send data to Analyzers, Analyzers send data to Indexers, Indexers create an Index, and the Index sends data to a Post Processor.

Representations: stored proxies of physical world objects and ideas.

Sensors: the input devices of the data/document storage system.

I-search and PLS are existing technologies.

Indexers: create a lookup table from all of the data fed into them. If a graph is not in use, all of the data fed into an indexer is stored as entries in the lookup table.

Index: if a graph is in place, the index functions as a lightweight lookup table. If no graph is used, the index contains all of the data in addition to the lookup table, in a list format.
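A minimal sketch of the Indexer and Index behavior described above, assuming whitespace tokenization and lowercase terms (both simplifications). With `graph_in_use=True` only the term-to-document lookup table is kept, as when a graph holds the data; with `graph_in_use=False` the index also stores the documents themselves. `build_index`, `lookup_term`, and the parameter name are hypothetical.

```python
from collections import defaultdict


def build_index(docs, graph_in_use=True):
    """Build a term -> document-id lookup table from docs (id -> text).

    When a graph holds the data (graph_in_use=True) only the lookup table is
    kept; otherwise the index also stores every document's data.
    """
    lookup = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            lookup[term].add(doc_id)
    stored = {} if graph_in_use else dict(docs)
    return dict(lookup), stored


def lookup_term(lookup, term):
    """Return the sorted ids of documents containing `term`."""
    return sorted(lookup.get(term.lower(), set()))
```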
Results Data: could be metadata and raw data, such as XML, XUL, or RDF, in which the data is self-describing and separate from the form it will eventually take; or HTML streams, in which form and content are merged. HTML streams could be sent to an Interpreter.
Interpreter: separates form information from the data stream and classifies the remaining data. Could handle boolean operators, spell checking, word stemming, case folding, internationalization, thesauri, phrase searching, and related terms. The Interpreter sends data to an Aggregator.
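A few of the Interpreter's listed duties (case folding, word stemming, and phrase searching) can be sketched as below. The suffix-stripping stemmer is a deliberately crude stand-in for a real stemming algorithm, and implicit AND between terms is an assumed default; full boolean operators, spell checking, and thesauri are not shown. All function names are hypothetical.

```python
import re

# Crude, hypothetical suffix rules; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def interpret(query):
    """Split a query into quoted phrases and case-folded, stemmed terms."""
    phrases = [p.casefold() for p in re.findall(r'"([^"]+)"', query)]
    remainder = re.sub(r'"[^"]+"', " ", query)
    terms = [stem(w.casefold()) for w in re.findall(r"\w+", remainder)]
    return phrases, terms


def matches(document, phrases, terms):
    """Implicit AND: every phrase and every stemmed term must occur."""
    text = document.casefold()
    words = {stem(w) for w in re.findall(r"\w+", text)}
    return all(p in text for p in phrases) and all(t in words for t in terms)
```

For instance, the query `Stock Quotes "Open Directory"` becomes the phrase `open directory` plus the terms `stock` and `quote`, so it matches a document mentioning "stock quote" and "Open Directory".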
Aggregator: combines or interleaves data from multiple sources, and could receive data directly.
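One simple way an Aggregator could interleave data from multiple sources is a round-robin merge that takes one item from each source in turn and drops duplicates. This is a sketch under that assumption; real aggregators may weight sources differently. Items are assumed to be hashable (for example, URLs), and `interleave` is a hypothetical name.

```python
from itertools import zip_longest


def interleave(*sources):
    """Round-robin merge of ranked result lists, keeping first occurrences only."""
    merged, seen = [], set()
    missing = object()  # placeholder for lists that have run out
    for row in zip_longest(*sources, fillvalue=missing):
        for item in row:
            if item is not missing and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

`interleave(["a", "b", "c"], ["x", "b"], ["y"])` yields `["a", "x", "y", "b", "c"]`: the second `b` is dropped because it already appeared.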
Articulator (or ‘Layout Engine’): combines form and content. The Articulator could receive data directly and makes use of Templates. Its output could be sent to an Output Device.

Templates: provide the architecture, ... Templates can influence Views.
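The Articulator's job of combining form and content can be illustrated with Python's `string.Template`: the templates hold the form, and each result dictionary supplies the content. The template strings, the `url`/`title`/`source` keys, and `articulate` are hypothetical; values are HTML-escaped so content cannot break the form.

```python
from html import escape
from string import Template

# Hypothetical templates: the "form", kept separate from the content.
ITEM = Template('<li><a href="$url">$title</a> ($source)</li>')
PAGE = Template("<ul>\n$items\n</ul>")


def articulate(results):
    """Merge content (result dicts) into form (templates) to produce HTML."""
    items = "\n".join(
        ITEM.substitute({k: escape(str(v)) for k, v in r.items()}) for r in results
    )
    return PAGE.substitute(items=items)
```

Keeping the templates separate means the same results can be re-rendered with a different look and feel simply by swapping `ITEM` and `PAGE`.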
Input Device: could be the same as the Output Device.

Output Device: any device that can receive data and render it for the user. Views could be redrawn by an Output Device.

Keywords: can contribute to the specification of Queries.

Active Searchers: scope is reduced by specifying additional restrictions upfront. Best for data retrieval.

Views: control the display order of results, which could be by relevance, creation date, modification date, alphabetical, by source, or by media type. Views could also involve progressive disclosure, pagination, or custom look and feel (themes).
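Of the Views listed above, pagination is easy to sketch. This assumes 1-based page numbers and a default of 10 results per page (both assumptions); `paginate` is a hypothetical helper.

```python
import math


def paginate(results, page, per_page=10):
    """Return (items on `page`, total page count); pages are numbered from 1."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    total_pages = max(1, math.ceil(len(results) / per_page))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * per_page
    return results[start:start + per_page], total_pages
```

For 25 results at 10 per page, page 3 holds the last five results and there are 3 pages in total.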
Customization: made using Settings, which can contribute to the specification of options that are stored either locally or on a server.

Results Pages: any UI page or widget that displays the results data.

Options: such as keyverbs, boolean operators, data type, media type, language, or domain. Queries can contain Options.
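Options let an Active Searcher reduce scope by specifying restrictions upfront. The sketch below applies three of the listed options (media type, language, and domain) to a result list. The result dictionary keys and `apply_options` are hypothetical, and matching a domain by host-name suffix is an assumed interpretation of a domain restriction.

```python
from urllib.parse import urlparse


def apply_options(results, media_type=None, language=None, domain=None):
    """Keep only results that satisfy every option specified up front.

    Options left as None are not applied.
    """
    kept = []
    for r in results:
        if media_type is not None and r["media_type"] != media_type:
            continue
        if language is not None and r["language"] != language:
            continue
        if domain is not None:
            host = urlparse(r["url"]).hostname or ""
            # Match the domain itself or any subdomain of it.
            if host != domain and not host.endswith("." + domain):
                continue
        kept.append(r)
    return kept
```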
People make Queries of a Source in order to get Answers.

People: can be described by their goals, age, gender, income, geographic location, education, hardware and software, connection speed, and member status.

Behavioral History: click paths, session tracking, and decision tracking. Behavioral History can be stored in Profiles, and contains data that can improve the specification of Queries.

Profiles: behaviors and states that are stored either locally or on a server. Profiles can be collected into Group Profiles.

Passive Searchers: scope is reduced by choices from a found set, in a sequence of ever smaller sets. Best for document retrieval.

Personalization: should empower people and help them attain the information they need.

Group Profiles: large collections of profiles that can be analyzed to produce trend information.
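Group Profiles can be analyzed for trend information, and the Amazon example in this map ranks related items by group purchasing habits. A minimal sketch of both ideas, assuming each profile is simply a list of item identifiers: `trends` counts how many profiles contain each item, and `also_chosen` counts co-occurrence with a given item. The names and data layout are hypothetical, and this is not Amazon's actual method.

```python
from collections import Counter


def trends(profiles, top=3):
    """Count how many profiles contain each item and return the most common."""
    counts = Counter(item for profile in profiles for item in set(profile))
    return counts.most_common(top)


def also_chosen(profiles, item, top=3):
    """Items that most often appear in the same profiles as `item`."""
    counts = Counter()
    for profile in profiles:
        items = set(profile)
        if item in items:
            counts.update(items - {item})
    return [name for name, _ in counts.most_common(top)]
```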
Data/Document Retrieval Process

The retrieval process is made up of Information and Actions. Information could involve the current page, monitor size, color settings, and javascript capability, and is used to enable Actions.