Due Date: February 19, 2014 (11:55 pm)

Submission Instructions:
1. Submit ONE Microsoft Word file containing the answers (including the source code) to all the problems in this assignment via Slate.
2. Name the file <ROLLNO>.docx, for example i090055.docx.

Plagiarism: Plagiarism is not allowed. If your submission is found to be plagiarized, you will receive zero marks for the assignment.
Run Files: Uncompressing input.tar.gz yields 129 run files. Each run file was generated by a search engine and contains ranked lists for 50 different information needs (the queries). Each line of a run file contains the following fields, in this order:
1. Query ID
2. Dummy value (ignore this field)
3. Document ID
4. Rank
5. Score (how well the document matches the query)
6. System name (ignore this field)
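As an illustration only (the assignment itself requires Perl), here is a minimal Python sketch of reading one run file into per-query ranked lists. It assumes the six fields are whitespace-separated and that documents should be ordered by the Rank field; the function name parse_run and the sample lines are hypothetical.

```python
from collections import defaultdict


def parse_run(lines):
    """Group a run's lines into ranked document lists, one list per query.

    Assumes six whitespace-separated fields per line:
    query_id, dummy, doc_id, rank, score, system_name.
    """
    entries = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        query_id, doc_id, rank = fields[0], fields[2], int(fields[3])
        entries[query_id].append((rank, doc_id))
    # Order each query's documents by the Rank field (1 = best).
    return {q: [doc for _, doc in sorted(docs)] for q, docs in entries.items()}


# Hypothetical sample lines, not taken from input.tar.gz.
sample = [
    "401 Q0 DOC-3 2 11.0 sysA",
    "401 Q0 DOC-7 1 12.5 sysA",
    "402 Q0 DOC-9 1 8.2 sysA",
]
print(parse_run(sample))  # {'401': ['DOC-7', 'DOC-3'], '402': ['DOC-9']}
```

Sorting by the Rank field is one choice; sorting by Score in descending order is a common alternative when ranks and scores disagree.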
QREL File: Qrel.txt contains the gold set, i.e., the relevance labels for selected query-document pairs. Each line of the QREL file contains the following fields, in this order:
1. Query ID
2. Dummy value (ignore this field)
3. Document ID
4. Relevance label (0 means non-relevant, 1 means relevant)
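A matching Python sketch (again illustrative, not the required Perl program) that collects, for each query, the set of documents labeled relevant. It assumes four whitespace-separated fields per line; parse_qrels and the sample lines are hypothetical. Queries with no relevant documents simply do not appear in the result.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each query ID to the set of document IDs labeled relevant.

    Assumes four whitespace-separated fields per line:
    query_id, dummy, doc_id, relevance (0 = non-relevant, 1 = relevant).
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query_id, doc_id, label = fields[0], fields[2], int(fields[3])
        if label > 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


# Hypothetical sample lines, not taken from Qrel.txt.
sample_qrels = [
    "401 0 DOC-7 1",
    "401 0 DOC-3 0",
    "402 0 DOC-9 1",
]
print(parse_qrels(sample_qrels))  # {'401': {'DOC-7'}, '402': {'DOC-9'}}
```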
IR Evaluation: Your task is to evaluate the search engines (the runs) and arrange them in decreasing order of MAP (mean average precision). To do this, compute each run's average precision for every query, then average these values over the queries to obtain the run's MAP. Present your results as a plot (run vs. MAP value). Write the program that computes the MAP values in Perl. Submit (a) the source code, (b) a text file listing each run's name and its MAP value, ordered from best to worst, and (c) the plot.
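To make the computation concrete, here is a Python sketch of the standard definitions (the submission itself must be in Perl). Average precision for one query is AP = (1/R) · Σ P@k, summed over the ranks k at which a relevant document appears, where R is the total number of relevant documents for that query in the qrels. MAP is the mean of AP over queries. Two assumptions are labeled in the code: MAP averages over queries that have at least one relevant document, and a query the run did not answer scores AP = 0. The function names and toy data are hypothetical.

```python
def average_precision(ranking, relevant):
    """Average precision for one query.

    ranking  -- document IDs in rank order (best first)
    relevant -- set of document IDs judged relevant in the qrels
    Sums precision@k at every rank k holding a relevant document, then divides
    by the total number of relevant documents, so relevant documents the run
    never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """MAP of one run: mean AP over queries with at least one relevant
    document (assumption). A query missing from the run gets AP = 0."""
    queries = [q for q, rel in qrels.items() if rel]
    if not queries:
        return 0.0
    total = sum(average_precision(run.get(q, []), qrels[q]) for q in queries)
    return total / len(queries)


def rank_runs(map_by_run):
    """Return (run_name, MAP) pairs sorted from best to worst MAP."""
    return sorted(map_by_run.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the assignment's files.
run = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["x", "y"]}
qrels = {"q1": {"d1", "d3", "d5"}, "q2": {"y"}}
print(average_precision(run["q1"], qrels["q1"]))  # (1/1 + 2/3) / 3 = 5/9
print(mean_average_precision(run, qrels))         # (5/9 + 1/2) / 2 = 19/36
```

In the toy example, d5 is relevant but never retrieved, which is why the q1 sum is divided by 3 rather than by the 2 relevant documents actually found. Applying mean_average_precision to each of the 129 runs and passing the results to rank_runs gives the best-to-worst ordering the text file asks for.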