
IS414: Search Engine Technologies

Lucene Assignment
Last Updated on 8 July 2011

Table of Contents
Summary
Deliverables
Document Set
Students' Testing Before Submission
Index Structure
Mandatory Features
Test Queries
Students' Own Testing Before Submission

Summary

Objective: Set up a web-based search engine.


Purpose: After completing this assignment, you will be able to implement a text search function in your
future projects.
Method: Implement a JSP page on Apache Tomcat that invokes Lucene to search a specified index. The
search result should be displayed in the browser.
Start: Use the Lucene Javascript Query Constructor as a starting point.
o You need to add a text field for the index location.
o You need to change the constructed query to match the format expected by the Lucene code.
Hints:
o Explore the documentation and samples in the C:\is414\lucene-2.1.0 folder.
o Although the File Indexer (refer to lab 3) is not required in the submission, you may need to modify
some of its code. You need to build an index that conforms to the required structure in order to
test your search engine.
This group assignment is worth 7% of the course grade.
Due: 14 September 2011, noon.
Late submission: Minus 1 point per day.
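
The step of changing the constructed query to match the format expected by the Lucene code can be sketched as follows. This is only an illustration, not the required implementation: the class name QueryStringBuilder, the method names, and the four parameters are hypothetical, and the sketch assumes that the form fields arrive as comma- or space-separated words and that the Lucene code parses the string with the standard query syntax (+ for required, - for prohibited, double quotes for a phrase, parentheses for grouping). It does not escape special characters in the user input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the four search-form fields into one Lucene query string.
class QueryStringBuilder {

    /** Splits a form value such as "content, document" into individual words. */
    static List<String> words(String input) {
        List<String> out = new ArrayList<>();
        if (input == null) {
            return out;
        }
        for (String w : input.split("[,\\s]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Builds the query string; empty or null fields contribute no clause. */
    static String build(String allWords, String exactPhrase,
                        String anyWords, String withoutWords) {
        List<String> clauses = new ArrayList<>();
        // "All of the words": every word becomes a required term.
        for (String w : words(allWords)) {
            clauses.add("+" + w);
        }
        // "Exact phrase": one required, quoted phrase.
        if (exactPhrase != null && !exactPhrase.trim().isEmpty()) {
            clauses.add("+\"" + exactPhrase.trim() + "\"");
        }
        // "At least one of the words": a required group of optional terms.
        List<String> any = words(anyWords);
        if (!any.isEmpty()) {
            clauses.add("+(" + String.join(" ", any) + ")");
        }
        // "Without the words": every word becomes a prohibited term.
        for (String w : words(withoutWords)) {
            clauses.add("-" + w);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints: +content +document
        System.out.println(build("content, document", "", "", ""));
        // prints: +(how new document) -content
        System.out.println(build("", "", "how, new, document", "content"));
    }
}
```

The terms are left unqualified, on the assumption that the Lucene code searches the body field by default.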

Deliverables
You should submit working code that implements the 7 mandatory features, in a packaged Web solution
in .war format.
Your code should adhere strictly to the following requirements; if not, we may deduct points or even decide
that your code does not work:

o When we test your submission, we will use our own pre-built index. Therefore, your code should work
with the index structure specified in this document; in particular, use exactly the same field names.
o Our index may reside in a different directory from yours. Your interface should include a text field for
specifying the index location. The index location should be passed to your Lucene code.
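
Because the index location comes from a text field, it may help to check the value before handing it to the Lucene code. The following is a minimal sketch under stated assumptions: the class name IndexLocation and the method resolve are hypothetical, and the sketch only verifies that the value names an existing directory, not that the directory actually contains a Lucene index.

```java
import java.io.File;

// Hypothetical helper: validates the index location typed into the text field.
class IndexLocation {

    /**
     * Turns the raw text-field value into a directory, or throws an exception
     * whose message the page can show to the user.
     */
    static File resolve(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            throw new IllegalArgumentException("Index location is required.");
        }
        File dir = new File(rawValue.trim());
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(
                    "Index location is not a directory: " + dir.getPath());
        }
        return dir;
    }

    public static void main(String[] args) {
        // The JVM's temp directory stands in for a real index folder here.
        File dir = resolve("  " + System.getProperty("java.io.tmpdir") + "  ");
        System.out.println("Using index at " + dir.getPath());
    }
}
```

In the JSP page, the returned directory (or its path) would then be passed on to the search code in place of a hard-coded location.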

Document Set

Students' Testing Before Submission

Folder: C:\is414\Sample2 (this is only for students' own validation before submission)

1. C:\is414\Sample2\PlainText.txt
2. C:\is414\Sample2\PlainText2.txt
3. C:\is414\Sample2\RTF.rtf
4. C:\is414\Sample2\MSWord.doc
5. C:\is414\Sample2\HTML.html
6. C:\is414\Sample2\PDF.pdf

Index Structure

Field               Description
body                Content of the searched document
fileName            File format of the searched document (pdf, doc, html, txt)
lastModifiedDate    Last modified date of the searched document

Mandatory Features
All the features in the following sample GUI should be supported. Each feature is worth 1 mark. If any of our test
queries for a feature does not return the correct results, we will assign a zero score for that feature.

[Figure: sample GUI with its features numbered 1 to 7. Refer to the Lucene Javascript Query Constructor.]

Test Queries
Students' Own Testing Before Submission

Query 1
  All of the words: content, document
  Results
  1. C:\is414\Sample2\MSWord.doc
  2. C:\is414\Sample2\PlainText.txt
  3. C:\is414\Sample2\RTF.rtf

Query 2
  Exact phrase: new
  Results
  1. C:\is414\Sample2\HTML.html

Query 3
  At least one of the words: how, new, document
  Without the words: content
  Results
  1. C:\is414\Sample2\PlainText2.txt
  2. C:\is414\Sample2\HTML.html
  3. C:\is414\Sample2\PDF.pdf

Query 4
  All of the words: content, document
  Format: NOT plain text (.txt)
  Results
  1. C:\is414\Sample2\MSWord.doc
  2. C:\is414\Sample2\RTF.rtf
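
One way to run your own check against the expected results is to compare the paths your search returns with the paths listed above. The sketch below is only an illustration: the class name ResultCheck and its methods are hypothetical, the "actual" list in main is made-up output rather than real search output, and the comparison ignores result order on the assumption that only the set of returned documents matters.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares returned file paths with the expected ones.
class ResultCheck {

    /** Returns the expected paths that are absent from the actual results. */
    static Set<String> missing(List<String> expected, List<String> actual) {
        Set<String> out = new LinkedHashSet<>(expected);
        out.removeAll(actual);
        return out;
    }

    /** Returns the actual paths that were not expected. */
    static Set<String> unexpected(List<String> expected, List<String> actual) {
        Set<String> out = new LinkedHashSet<>(actual);
        out.removeAll(expected);
        return out;
    }

    /** True when both lists contain the same paths, in any order. */
    static boolean matches(List<String> expected, List<String> actual) {
        return missing(expected, actual).isEmpty()
                && unexpected(expected, actual).isEmpty();
    }

    public static void main(String[] args) {
        // Expected results for "All of the words: content, document".
        List<String> expected = Arrays.asList(
                "C:\\is414\\Sample2\\MSWord.doc",
                "C:\\is414\\Sample2\\PlainText.txt",
                "C:\\is414\\Sample2\\RTF.rtf");
        // Made-up output standing in for what a search page returned.
        List<String> actual = Arrays.asList(
                "C:\\is414\\Sample2\\MSWord.doc",
                "C:\\is414\\Sample2\\RTF.rtf",
                "C:\\is414\\Sample2\\HTML.html");

        System.out.println(matches(expected, actual));    // prints: false
        System.out.println(missing(expected, actual));    // prints: [C:\is414\Sample2\PlainText.txt]
        System.out.println(unexpected(expected, actual)); // prints: [C:\is414\Sample2\HTML.html]
    }
}
```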
