GitHub - yjkogan/CS205-Final-Project

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
Code		Code
RawData/helper_scripts		RawData/helper_scripts
Submission		Submission
Website		Website
DESIGN		DESIGN
README		README
results.csv		results.csv
results.xlsx		results.xlsx
website_video_urls.txt		website_video_urls.txt

Repository files navigation

To run the project, use the following command line arguments:

python [LAUNCHER] [MAPPER] [DATASET_TO_COMPARE_WITH] [DATASET_TO_READ_FROM]

By default, the script runs locally. With a properly configured .mrjob file, the flag -emr can be used to run the script on Amazon EMR. For example:

python mr_launcher2.py matrix_mr.py s3://temp-205/large_dataset.csv large_dataset.csv -emr

To change the number of mapper instances you need to go into mr_launcher2.py. While there, you can also set a limit on the number of times to launch a job on EMR.

The result of a successful run is the creation of file scoreDict.txt and a line in the file perflog.csv. Lines in perflog.csv have the following format:

[ROWS COMPUTED],[TOTAL RUNTIME],[AVGRUNTIME],[TIMESTAMP]

You can also just directly run one of the scripts in the typical mr.job way using the following form:

python [MAPPER] [-r emr] [--jobconfmapred.red.tasks=n] [DATASET_TO_COMPARE_WITH] &gt; [OUTPUT_FILE]

However, you need to be careful that you give mappers the correctly formatted data file.

The files serial_example.py, n2_inside_mr.py and foremr_headts.py all need to be run locally and need some hard coded values to be altered in order to run properly. They are run with the following command line arguments:

python [MAPPER] [DATABASE] e.g. python foremr_headts.py user2_export.csv

serial_example.py needs the path to the database hard-coded to the variable filename

n2_inside_mr.py needs the path to the database hard-coded to the variable filename.

foremr_headts.py needs the path to the database of pre-computed teachers and the list of results from that computation. The output of n2_inside_mr.py can serve as this list as it is the correct format. Running n2_inside_mr.py on a subset of the database was how we generated our list.

About

No description, website, or topics provided.

Readme

Activity

1 star

3 watching

0 forks

Report repository

Releases

No releases published

Packages

No packages published

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code

Code

RawData/helper_scripts

RawData/helper_scripts

Submission

Submission

Website

Website

DESIGN

DESIGN

README

README

results.csv

results.csv

results.xlsx

results.xlsx

website_video_urls.txt

website_video_urls.txt

Repository files navigation

About

Releases

Packages

Contributors 2

Languages

yjkogan/CS205-Final-Project

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Languages