Introduction To Information Rertrieval Recitation

Uploaded by

Hafidz Jazuli Luthfi

0% found this document useful (0 votes)

44 views2 pages

Copyright

Available Formats

ODT, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as ODT, PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

44 views2 pages

Introduction To Information Rertrieval Recitation

Uploaded by

Hafidz Jazuli Luthfi

Copyright:

Available Formats

Download as ODT, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 2

Search inside document

1.

1 An example information retrieval problem

Let say, we want to built a document collection of novel books. To do that, we need to create a matrix
of terms and documents. If we found one billion terms in one thousand documents, our document
collection we be represented as matrix of one billion rows multiply by one thousand coloums. If an
document dy included a term tx, then we added value 1, else we added 0. So, our document collection
will be an binary matrix.
In the we need to find any document that matched with query, for example “Harry Potter The Gandalf
”. We may used boolean logic that translate our query into “Harry AND Potter AND The AND
Gandalf”. Technically, we send a command to find any documents that exactly included terms “Harry”,
“Potter”, “The”, and “Gandalf”. If a document does not has a term “Harry”, but has rest of three terms,
then a documents is not match. For example: we have five documents and the boolean operation will
be:
10010 AND 01010 AND 00010 AND 10111 = 00010
We can said that only document with id four (d4) is matched to our query.

1.2 A first take at building an inverted index

If we have very big document collection, then using matrix to store our terms and documents id is
efficient, since we store value 0 for every not matched terms. A good solution is by only store matched
terms. One good data structure that make it possible is inverted index. In the real world, you can find
similar implementation in the last page of book called as “Index” where some important terms ordered
alphabetically and for each term included page number where we can found its term. In the computer
science, we use linked list data structure to implement an inverted index.
The building of an inverted index will looked like picture below:
Dictionary is set of terms found in collection sorted alphabetically and Postings is set of sorted
document identifier that found in specific term.

1.3 Processing Boolean Queries

Processing an boolean queries required an merge algorithm to find union for disjunction operation and
intersection for conjunction operation. We could optimize query processing through order boolean
execution by postings size of term.

1.4 The extended Boolean model versus ranked retrieval

Boolean retrieval model have been widely used in search engine system before 1990. Westlaw is largest
commercial legal search service since 1975 which adopted extended boolean model as main search
engine until 1992. Below the examples of boolean model querying:
• “John AND Drive is same sentence”: John /s Drive
• “John AND Drive is same paragraph”: John /p Drive
• “John OR Joe”: (John Joe)

• “Any term consisted post-”: post!

However, extended boolean model still not capable to rank retrieved information because the essential
of boolean model is only capable to identify any terms in query found in any documents. Boolean
query can not judge how similar each document and how relevant each document to information need.

SearchingForInformation-Lanning2014
Document17 pages
SearchingForInformation-Lanning2014
Eduar Aguirre
No ratings yet
Lectures Merged
Document327 pages
Lectures Merged
Alankrit Kr. Singh
No ratings yet
A Review of 'The Anatomy of A Large-Scale Hypertextual Web Search Engine'
Document27 pages
A Review of 'The Anatomy of A Large-Scale Hypertextual Web Search Engine'
Nilay Khandelwal
No ratings yet
IR Unit 2
Document54 pages
IR Unit 2
jaganbecs
No ratings yet
Lecture 3-Skip Pointers and Phrase Queries
Document12 pages
Lecture 3-Skip Pointers and Phrase Queries
Yash Gupta
No ratings yet
Made By:-Bhawana Agarwal Cs Iiiyr
Document29 pages
Made By:-Bhawana Agarwal Cs Iiiyr
Bhawana Agarwal
No ratings yet
Deep Search: How to Explore the Internet More Effectively
From Everand
Deep Search: How to Explore the Internet More Effectively
Alan Pearce
Rating: 5 out of 5 stars
5/5 (7)
Cs8080 Ir Unit2 I Modeling and Retrieval Evaluation
Document42 pages
Cs8080 Ir Unit2 I Modeling and Retrieval Evaluation
Gnanasekaran
No ratings yet
Lecture 4
Document38 pages
Lecture 4
Satzhan Kadir
No ratings yet
Introduction of IR Models
Document67 pages
Introduction of IR Models
Magarsa Bedasa
No ratings yet
Preferential Infinitesimals for Information Retrieval: An SEO-Optimized Title
Document34 pages
Preferential Infinitesimals for Information Retrieval: An SEO-Optimized Title
khushi maheshwari
No ratings yet
MASWS Assignment 3 A Review On Picky, A SW Search Engine: 1 Doing Search Over Semantically Organized Text?
Document5 pages
MASWS Assignment 3 A Review On Picky, A SW Search Engine: 1 Doing Search Over Semantically Organized Text?
Qijun Liu
No ratings yet
Explaining Vector Databases in 3 Levels of Difficulty - by Leonie Monigatti - Jul, 2023 - Towards Data Science
Document12 pages
Explaining Vector Databases in 3 Levels of Difficulty - by Leonie Monigatti - Jul, 2023 - Towards Data Science
van mai
No ratings yet
Introduction of IR Models
Document62 pages
Introduction of IR Models
Magarsa Bedasa
No ratings yet
Lesson 3 A Internet As Nursing Resource
Document64 pages
Lesson 3 A Internet As Nursing Resource
beer_ettaa
No ratings yet
Searching Document in Google
Document23 pages
Searching Document in Google
Bikram Prajapati
No ratings yet
Course Notes For Unit 1 of The Udacity Course CS262 Programming Languages
Document32 pages
Course Notes For Unit 1 of The Udacity Course CS262 Programming Languages
Iain McCulloch
No ratings yet
Exercise 1.2010.solutions
Document3 pages
Exercise 1.2010.solutions
marwa mk
83% (24)
Semantic Web Ontology
Document12 pages
Semantic Web Ontology
Zarana Kanani
No ratings yet
Google Dorks
Document2 pages
Google Dorks
facebookdotcom
No ratings yet
Finding Similar Items: Aadil Ahmad, Pawan Kumar, Himanshu Kamboj, and Sunil Kumar
Document3 pages
Finding Similar Items: Aadil Ahmad, Pawan Kumar, Himanshu Kamboj, and Sunil Kumar
Aadil Ahmad
No ratings yet
Vector Space Model Explained
Document11 pages
Vector Space Model Explained
jeysam
No ratings yet
SP-6 - Jenis-Jenis Pencarian
Document3 pages
SP-6 - Jenis-Jenis Pencarian
Mas Simon
No ratings yet
KL 5 Ei 7 Vfegr BIO7 W 99 Ed 9 XT 2 F Yq TEn 4 BB ULEX9 FN
Document13 pages
KL 5 Ei 7 Vfegr BIO7 W 99 Ed 9 XT 2 F Yq TEn 4 BB ULEX9 FN
Yoga Pratama R.
No ratings yet
Ontology-Based Semantic Web Search
Document41 pages
Ontology-Based Semantic Web Search
Andy Satria
No ratings yet
Semantic Search - Reader View
Document21 pages
Semantic Search - Reader View
Santa Cruz
No ratings yet
Natural Language Processing Lec 5
Document12 pages
Natural Language Processing Lec 5
Touseef sultan
No ratings yet
Information Retrieval Systems Explained
Document33 pages
Information Retrieval Systems Explained
Hailemariam Setegn
No ratings yet
Some of The Most Commonly Used Google Dorks Are:: 1. Site
Document2 pages
Some of The Most Commonly Used Google Dorks Are:: 1. Site
Alex A
No ratings yet
Introduction to Semantic Web Data Integration
Document184 pages
Introduction to Semantic Web Data Integration
meenakshigm
No ratings yet
Widc Tfidf
Document20 pages
Widc Tfidf
nyoman s
No ratings yet
Google-based Information Extraction Finds People and Movies
Document8 pages
Google-based Information Extraction Finds People and Movies
hectorboudin
No ratings yet
Assignment 6
Document3 pages
Assignment 6
Udaypreet Singh
No ratings yet
Google Search Operators.
Document8 pages
Google Search Operators.
superkan619
No ratings yet
Information Retrieval Models
Document4 pages
Information Retrieval Models
HellenNdegwa
No ratings yet
Monster Boolean Basics
Document2 pages
Monster Boolean Basics
Eeza Ortile
No ratings yet
Unit Ii Modeling
Document15 pages
Unit Ii Modeling
SASIKALA C CS106
No ratings yet
Question Answering
Document29 pages
Question Answering
SV PR
No ratings yet
Web Scraping Cheat Sheet (2021), Python For Web Scraping by Frank Andrade Geek Culture - Medium
Document26 pages
Web Scraping Cheat Sheet (2021), Python For Web Scraping by Frank Andrade Geek Culture - Medium
lukinmyeyes
No ratings yet
Anti-Serendipity: Finding Useless Documents and Similar Documents
Document9 pages
Anti-Serendipity: Finding Useless Documents and Similar Documents
feysalnur
No ratings yet
Learning Python from Zero to Hero
Document23 pages
Learning Python from Zero to Hero
Ramesh Kumar
No ratings yet
Lucene Solr
Document52 pages
Lucene Solr
Rubila Dwi Adawiyah
No ratings yet
Term Frequency and Inverse Document Frequency
Document26 pages
Term Frequency and Inverse Document Frequency
lalitha sri
No ratings yet
PPT08-Natural Language Processing
Document44 pages
PPT08-Natural Language Processing
Guntur Wibisono
No ratings yet
Book246 PDF
Document15 pages
Book246 PDF
Philippe Biton
No ratings yet
Elasticsearch Blueprints - Sample Chapter
Document24 pages
Elasticsearch Blueprints - Sample Chapter
Packt Publishing
No ratings yet
Boolean Retrieval: Dr. Hosam Refaat
Document21 pages
Boolean Retrieval: Dr. Hosam Refaat
Hussain Saeed
No ratings yet
Introduction To Information Storage and Retrieval Systems: BY-Research Scholar
Document42 pages
Introduction To Information Storage and Retrieval Systems: BY-Research Scholar
umraojigyasa
No ratings yet
Information Retrieval and Web Search
Document69 pages
Information Retrieval and Web Search
Shiluti Tsuhah
No ratings yet
m04 Study Questions-Cataloging
Document3 pages
m04 Study Questions-Cataloging
api-541774438
No ratings yet
NLP Mod-5
Document17 pages
NLP Mod-5
sharonsajan438690
No ratings yet
IR Techniques for Unstructured Document Retrieval
Document14 pages
IR Techniques for Unstructured Document Retrieval
Chandrashekhar Naik
No ratings yet
Sec 1
Document36 pages
Sec 1
Olivia Michel
No ratings yet
Optimize Access to Large Datasets with Index Sequential Files
Document7 pages
Optimize Access to Large Datasets with Index Sequential Files
Kadiri precious Eghosa
No ratings yet
XHTML
From Everand
XHTML
Jitendra Patel
No ratings yet
Useful Python
From Everand
Useful Python
Stuart Langridge
No ratings yet
Bitemporal Data: Theory and Practice
From Everand
Bitemporal Data: Theory and Practice
Tom Johnston
No ratings yet
Troubleshooting Oracle Performance
From Everand
Troubleshooting Oracle Performance
Christian Antognini
Rating: 5 out of 5 stars
5/5 (2)
Practical Web Development with Haskell: Master the Essential Skills to Build Fast and Scalable Web Applications
From Everand
Practical Web Development with Haskell: Master the Essential Skills to Build Fast and Scalable Web Applications
Ecky Putrady
No ratings yet
Structured Search for Big Data: From Keywords to Key-objects
From Everand
Structured Search for Big Data: From Keywords to Key-objects
Mikhail Gilula
No ratings yet
Graph Algorithm
Document14 pages
Graph Algorithm
Hafidz Jazuli Luthfi
No ratings yet
Data Compression
Document12 pages
Data Compression
Hafidz Jazuli Luthfi
No ratings yet
String Algorithm
Document17 pages
String Algorithm
Hafidz Jazuli Luthfi
No ratings yet
Introduction To Information Rertrieval Answer
Document6 pages
Introduction To Information Rertrieval Answer
Hafidz Jazuli Luthfi
100% (4)
Artificial Intilligence A Modern Approach Recitation
Document7 pages
Artificial Intilligence A Modern Approach Recitation
Hafidz Jazuli Luthfi
No ratings yet
Regular Expressions
Document4 pages
Regular Expressions
Hafidz Jazuli Luthfi
No ratings yet
String Matching
Document4 pages
String Matching
Hafidz Jazuli Luthfi
No ratings yet
Graph Algorithm
Document4 pages
Graph Algorithm
Hafidz Jazuli Luthfi
No ratings yet
Union-Find Algorithm
Document3 pages
Union-Find Algorithm
Hafidz Jazuli Luthfi
No ratings yet
Chapter 1 - Transaction Processing and MGT
Document70 pages
Chapter 1 - Transaction Processing and MGT
nathan eyasu
No ratings yet
Penggunaan Focus Group Discussion (FGD) Dalam Meneroka Permasalahan Pengurusan Kunci Di Kolej Komuniti Masjid Tanah (KKMT)
Document10 pages
Penggunaan Focus Group Discussion (FGD) Dalam Meneroka Permasalahan Pengurusan Kunci Di Kolej Komuniti Masjid Tanah (KKMT)
ferdi kuswandi
No ratings yet
Oracle Testking 1z0-070 v2019-03-20 by Rodrigo 56q
Document43 pages
Oracle Testking 1z0-070 v2019-03-20 by Rodrigo 56q
Moataz Abdelhamid
No ratings yet
CS2032 Data Warehousing and Data Mining PPT Unit I
Document88 pages
CS2032 Data Warehousing and Data Mining PPT Unit I
vdsrihari
No ratings yet
Info HQ HL7 Specification RevF (732037-00F)
Document54 pages
Info HQ HL7 Specification RevF (732037-00F)
pinheiromarcelino
No ratings yet
Relational Data Model and ER/EER-to-Relational Mapping
Document85 pages
Relational Data Model and ER/EER-to-Relational Mapping
Thạnh Hồ
No ratings yet
Agra TERM2 XII CS QP PB1
Document6 pages
Agra TERM2 XII CS QP PB1
ꜱ.ꜱʀɪᴠᴀʀꜱнᴀɴ
No ratings yet
Jena
Document29 pages
Jena
luong nguyen
No ratings yet
DGkids GRADE 6
Document2 pages
DGkids GRADE 6
mohamed elkholany
No ratings yet
National Data Management Office - Session Day 01
Document60 pages
National Data Management Office - Session Day 01
Murad
No ratings yet
Esri and Oracle
Document16 pages
Esri and Oracle
sasa.vukoje
No ratings yet
Guidewire Best Practice UpD
Document33 pages
Guidewire Best Practice UpD
Anumesh Shetty
0% (1)
Devops Engineering Aws Lab-Guide-1.6
Document165 pages
Devops Engineering Aws Lab-Guide-1.6
m m
No ratings yet
02 - OceanStor Dorado V6 Architecture and Key Technology V1.1 PDF
Document62 pages
02 - OceanStor Dorado V6 Architecture and Key Technology V1.1 PDF
Николай Галанов
No ratings yet
CDS - Value Helps
Document9 pages
CDS - Value Helps
Ajay Gupta
100% (1)
Concepts of Database Management Eighth Edition
Document41 pages
Concepts of Database Management Eighth Edition
Isaac Pavesich
No ratings yet
MySQL Performance Tuning Step by Step
Document36 pages
MySQL Performance Tuning Step by Step
Jairo Serrano
No ratings yet
SQL Server command-line utilities guide
Document17 pages
SQL Server command-line utilities guide
new2track
No ratings yet
Analyzing Data to Guide Decisions Using Descriptive, Diagnostic, Predictive, and Prescriptive Analytics
Document6 pages
Analyzing Data to Guide Decisions Using Descriptive, Diagnostic, Predictive, and Prescriptive Analytics
Emily Cleveland
No ratings yet
Computer YEAR 13 WORKSHEET
Document35 pages
Computer YEAR 13 WORKSHEET
Damien Wong
No ratings yet
Hitachi Data Systems Software Matrix Product Line Card PDF
Document12 pages
Hitachi Data Systems Software Matrix Product Line Card PDF
vcosmin
No ratings yet
3 Modular - Engineering - enUS - en-US - PDF - COMOS Platform Mod
Document1 page
3 Modular - Engineering - enUS - en-US - PDF - COMOS Platform Mod
Ivan Nikodijevic
No ratings yet
Class 12 CBSE MySQL Computer Science Worksheet
Document3 pages
Class 12 CBSE MySQL Computer Science Worksheet
yazhinirekha4444
No ratings yet
Digital Technologies and The Management of Conservation Documentation
Document92 pages
Digital Technologies and The Management of Conservation Documentation
David Green
No ratings yet
Jacob Berg Resume 2012
Document2 pages
Jacob Berg Resume 2012
jacobsberg
No ratings yet
Innovative HUAWEI CLOUD Services
Document50 pages
Innovative HUAWEI CLOUD Services
Marco Marco
No ratings yet
Art Gallery Management Systems
Document15 pages
Art Gallery Management Systems
sameerroushan
83% (6)
Submittal Transmittal Form Review
Document1 page
Submittal Transmittal Form Review
César Cárdenas Cuentas
No ratings yet
Assignment Paper - Week 2-1B
Document7 pages
Assignment Paper - Week 2-1B
arthurmpendamani
No ratings yet