botaohu/cs276-pa3
This folder contains the following files:

1. Data

   a. queryDocTrainData
      This file contains the training data for this assignment. For each (query, url) pair, several features are given (details are available in the assignment description).

   b. queryDocTrainRel
      This file contains the relevance values for each (query, url) pair in the queryDocTrainData file. It can be used for evaluation while building the model.

   c. AllQueryTerms
      This file contains the tokens that appear in query terms across the train and test data.

2. Helper code

   a. rank0.py
      This is baseline skeleton code provided to help you get started. It contains functions to parse the feature data and write the ranked results to stdout. You may or may not use this code; just make sure your output format is the same as the one produced by this file (and described in the handout). The baseline simply ranks the urls in decreasing order of the number of body_hits across all query terms.

   b. ndcg.py
      This is the code for calculating the NDCG score of your ranking algorithm. You can run it as follows:

      $ python ndcg.py <your ranked file> <file with relevance values>

      For example, if you store the results of the baseline in a file called "ranked.txt", you can calculate its NDCG score with the following command:

      $ python ndcg.py ranked.txt queryDocTrainRel

3. rank.sh
   This is the script we will call to execute your program. The script takes 2 arguments:
   1) the id of the task (0/1/2/3/4; 0 is the baseline, 4 is for extra credit)
   2) the input data file (in the specified format)

   Therefore, to run the baseline code, you can execute:

   $ ./rank.sh 0 queryDocTrainData

   You can use any language for the assignment as long as you meet two requirements:
   - rank.sh should work with the two parameters described above
   - rank.sh should output your ranking results in the correct format to stdout

   Also note:
   - your code can take any number of extra arguments, but the script itself should take only these two
   - as currently written, the script assumes that the files for the tasks are called rank1.py, rank2.py, rank3.py and rank4.py (extra credit). You can change the script if you want, as long as it meets the input/output requirements

4. submit.py
   This is the submit script used for the assignment. Please submit each task (and the report) individually. To submit a task, run the following command and follow the instructions:

   $ python submit.py

   Note that 1/2/3 are the tasks described in the assignment, 0 is for the report and 4 is for extra credit (optional). The report should be present in the same folder with the name "report.pdf".
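The baseline described for rank0.py (rank urls by decreasing total body_hits across all query terms) can be sketched as follows. This is only an illustration, not the contents of rank0.py: the function name rank_by_body_hits, the nested-dictionary layout and the example urls are hypothetical, and the real file format of queryDocTrainData is the one defined in the handout.

```python
# Minimal sketch of the baseline idea. The input layout below is an
# assumption made for illustration:
#   {query: {url: {query_term: number_of_body_hits}}}
# It is not the on-disk format of queryDocTrainData.

def rank_by_body_hits(features):
    """Return {query: [urls]} with urls in decreasing order of total body hits."""
    ranked = {}
    for query, urls in features.items():
        ranked[query] = sorted(
            urls,
            # Total hits for a url = sum of hit counts over all query terms.
            key=lambda url: sum(urls[url].values()),
            reverse=True,
        )
    return ranked


# Hypothetical example data: one query with three candidate urls.
example = {
    "stanford pool": {
        "http://a.example/": {"stanford": 1, "pool": 0},
        "http://b.example/": {"stanford": 3, "pool": 2},
        "http://c.example/": {"stanford": 2, "pool": 0},
    }
}

ranking = rank_by_body_hits(example)
for query, urls in ranking.items():
    print(query, urls)
```

Because Python's sort is stable, urls with the same total keep their input order, which may or may not match how the provided baseline breaks ties.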
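For intuition about what ndcg.py reports, here is a sketch of one common NDCG definition for a single query. It is not taken from ndcg.py: the gain 2^rel - 1, the log2(rank + 1) discount, and the value returned when every relevance is zero are all assumptions, so scores from this sketch may differ from the provided script.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values listed in ranked order.

    Assumed variant: gain 2^rel - 1, discount log2(rank + 1), ranks from 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances)
    )


def ndcg(ranked_relevances):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    if ideal == 0:
        # Assumption: a query with no relevant documents scores 1.0,
        # since every ordering of it is equally good.
        return 1.0
    return dcg(ranked_relevances) / ideal


# Relevance values of the urls in the order your ranker returned them.
perfect = ndcg([3, 2, 1])   # already in the ideal order
reversed_order = ndcg([1, 2, 3])
print(perfect, reversed_order)
```

A per-query score like this would typically be averaged over all queries to get a single number for a ranked file.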