botaohu/cs276-pa3

This folder contains the following files:

1. Data
  a. queryDocTrainData
     This file contains the training data for this assignment. For each (query, url) pair, several features are given (see the assignment description for details).
  b. queryDocTrainRel
     This file contains the relevance values for each (query, url) pair given in the queryDocTrainData file. It can be used for evaluation while building the model.
  c. AllQueryTerms
     This file contains all tokens that appear in the queries across both the train and test data.

2. Helper code 
  a. rank0.py
     This is baseline skeleton code provided to help you get started. It contains functions to parse the feature data and write the ranked results to stdout. You do not have to use this code, but make sure your output format matches the one produced by this file (and described in the handout).

     The baseline simply ranks the urls in decreasing order of number of body_hits across all query terms.
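
The ranking rule described above can be sketched as follows. This is only an illustration, not the code in rank0.py: the function name rank_by_body_hits and the nested-dictionary layout of its input are assumptions made for this example, and the real feature format is the one defined in the assignment description.

```python
def rank_by_body_hits(features):
    """Rank urls per query by total body hits, highest first.

    `features` is a hypothetical in-memory layout:
        {query: {url: {query_term: body_hit_count}}}
    """
    rankings = {}
    for query, urls in features.items():
        # Sum the body hits over all query terms for each url.
        totals = {url: sum(hits.values()) for url, hits in urls.items()}
        # Sort by total, descending. sorted() is stable, so urls with
        # equal totals keep their input order.
        rankings[query] = sorted(totals, key=totals.get, reverse=True)
    return rankings
```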
  b. ndcg.py
     This is the code for calculating the ndcg score of your ranking algorithm. You can run the code as follows:
       $ python ndcg.py <your ranked file> <file with relevance values>

     For example, if you store the results of the baseline in a file called "ranked.txt", you can calculate its NDCG score with the following command:
       $ python ndcg.py ranked.txt queryDocTrainRel
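
For intuition about what the score measures, here is a minimal sketch of one common NDCG formulation. It is not the code in ndcg.py: the gain function (2^rel - 1), the log2 position discount, and the value returned when no document is relevant are all assumptions of this example, so its numbers may differ from the official script.

```python
import math

def dcg(rels):
    """Discounted cumulative gain of relevance values in ranked order."""
    # Position i (0-based) is discounted by log2(i + 2), so the top
    # result has a discount of log2(2) = 1.
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(ranked_rels):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_rels, reverse=True))
    # Assumption: a query with no relevant documents scores 1.0.
    return dcg(ranked_rels) / ideal if ideal > 0 else 1.0
```

Here ranked_rels is the list of relevance values for a single query, in the order your ranking produced them. A ranking that is already sorted by decreasing relevance scores 1.0.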

3. rank.sh
   This is the script we will call to execute your program. The script takes two arguments: 1) the id of the task (0/1/2/3/4, where 0 is the baseline and 4 is extra credit), and 2) the input data file (in the specified format). For example, to run the baseline code, execute:
       $ ./rank.sh 0 queryDocTrainData

   You can use any language to do the assignment as long as you follow these requirements:
     - rank.sh should work with the two parameters mentioned above
     - rank.sh should output your ranking results in the correct format to stdout
     - your code can take any number of extra arguments, but the script itself should take only these two
     - as currently written, the script assumes that the files for the tasks are called rank1.py, rank2.py, rank3.py, rank4.py (extra credit). You can change the script if you want, as long as it meets the input/output requirements
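
The mapping from task id to script name can be sketched as below. This is a hypothetical helper, not the contents of rank.sh: the name build_command and the validation step are assumptions of this example, and it only follows the rank0.py to rank4.py naming described above.

```python
import sys

# Task ids accepted by the script, per the description above.
VALID_TASKS = {"0", "1", "2", "3", "4"}

def build_command(task_id, data_file):
    """Return the command line for a task, e.g. rank0.py for task "0"."""
    if task_id not in VALID_TASKS:
        raise ValueError("task id must be one of 0/1/2/3/4, got %r" % task_id)
    # sys.executable is the Python interpreter running this code.
    return [sys.executable, "rank%s.py" % task_id, data_file]
```

Passing the returned list to subprocess.call would run the chosen script with the data file as its only argument, which keeps the two-argument interface intact.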

4. submit.py
   This is the submit script used for the assignment. Please submit each task (and report) individually. In order to submit a task, simply run the following command:
       $ python submit.py

   and follow the instructions. Note that 1/2/3 are the tasks described in the assignment, 0 is for the report, and 4 is for extra credit (optional). The report must be in the same folder and be named "report.pdf".
