GitHub - botaohu/cs276-pa3


This folder contains the following files:

1. Data
a. queryDocTrainData
This file contains the training data for this assignment. Several features are given for each (query, url) pair (details are available in the assignment description).
b. queryDocTrainRel
This file contains the relevance values for each (query, url) pair given in the queryDocTrainData file. It can be used for evaluation while building the model.
c. AllQueryTerms
This file contains the tokens that appear in the queries across both the train and test data.

2. Helper code
a. rank0.py
This is baseline skeleton code provided to help you get started. It contains functions to parse the feature data and write the ranked results to stdout. You do not have to use this code; just make sure your output format is the same as the one produced by this file (and described in the handout).

The baseline simply ranks the URLs in decreasing order of the total number of body_hits across all query terms.
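
The baseline ordering described above can be sketched as follows. This is only an illustration, not the code in rank0.py: the function name rank_by_body_hits, the in-memory mapping from URL to per-term hit counts, and the tie-breaking by URL are all assumptions made for the example. The real input format is described in the handout.

```python
def rank_by_body_hits(url_hits):
    """Order the URLs of one query by total body hits, highest first.

    url_hits is a hypothetical mapping: url -> list of body-hit counts,
    one count per query term. Parsing the real data file is not shown.
    """
    totals = {url: sum(counts) for url, counts in url_hits.items()}
    # Sort by descending total; ties are broken by URL (an assumption)
    # so that the output is deterministic.
    return sorted(totals, key=lambda url: (-totals[url], url))


# Made-up example data for a two-term query.
example = {
    "http://a.example/": [3, 0],
    "http://b.example/": [1, 5],
    "http://c.example/": [0, 0],
}
ranking = rank_by_body_hits(example)
```

Here the URL with 6 total hits comes first, followed by the one with 3, then the one with none.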
b. ndcg.py
This is the code for calculating the NDCG score of your ranking algorithm. You can run the code as follows:
$ python ndcg.py <your ranked file> <file with relevance values>

For example, if you store the results of the baseline in a file called "ranked.txt", you can calculate its NDCG score by running the following command:
$ python ndcg.py ranked.txt queryDocTrainRel
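
For intuition about what the score measures, here is a minimal sketch of one common NDCG formulation. It is not taken from ndcg.py, which may use a different gain, discount, or averaging over queries. The function names, the exponential gain 2^rel - 1, the log2 discount, and the choice to return 1.0 when no document is relevant are all assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values in ranked order."""
    # Position i (0-based) is discounted by log2(i + 2), so the top
    # result has discount log2(2) = 1.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Assumption: a query with no relevant documents scores 1.0.
    return dcg(relevances) / ideal if ideal > 0 else 1.0
```

A ranking that already lists documents from most to least relevant scores 1.0 under this definition. Any other order of the same documents scores no higher.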

3. rank.sh
This is the script we will call to execute your program. The script takes two arguments: 1) the ID of the task (0/1/2/3/4, where 0 is the baseline and 4 is extra credit), and 2) the input data file (in the specified format). Therefore, in order to run the baseline code, you can execute:
$ ./rank.sh 0 queryDocTrainData

You can use any language to do the assignment as long as you meet the following requirements:
- rank.sh should work with the two parameters mentioned above
- rank.sh should output your ranking results in the correct format to stdout
- your code can take any number of extra arguments, but the script should only take these two
- as currently written, the script assumes that the files for the tasks are called rank1.py, rank2.py, rank3.py and rank4.py (extra credit). You can change the script if you want, as long as it meets the input/output requirements
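
The naming convention above can be sketched in Python as follows. This is not the contents of rank.sh, which is a shell script and is not reproduced here. The helper names script_for_task and build_command are hypothetical and only show how a task ID might map to a script name and a command line.

```python
import sys


def script_for_task(task_id):
    """Map a task ID (0-4) to the script name the convention assumes."""
    task = int(task_id)
    if task not in range(5):
        raise ValueError("task id must be 0, 1, 2, 3 or 4")
    return "rank%d.py" % task


def build_command(task_id, data_file):
    """Build the argument list a dispatcher could pass to subprocess.call."""
    # sys.executable is the Python interpreter currently running.
    return [sys.executable, script_for_task(task_id), data_file]
```

For example, build_command("0", "queryDocTrainData") yields a command that runs rank0.py on the training data, mirroring the baseline invocation shown earlier.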

4. submit.py
This is the submission script for the assignment. Please submit each task (and the report) individually. To submit a task, simply run the following command:
$ python submit.py

and follow the instructions. Note that 1/2/3 are the tasks mentioned in the assignment, 0 is for the report, and 4 is for extra credit (optional). The report should be present in the same folder with the name "report.pdf".
