We've provided several sample data files that we've parsed from the MSD (Million Song Database).


** README **


This project focuses on grouping songs based on meaningful, measurable attributes of tracks found on the EchoNest via the Million Song Database, parallelizing Isomap across tracks. The process is divided into three main parts:

  1. Data extraction & processing
  2. Applying isomap on the data
  3. Visualization and extraction

** FILES **


The following files are included in this submission:

  • songs_parallel.py
  • songs_serial.py
  • mrjob_config_file.txt
  • tiny.dat
  • out100.dat
  • out1000.dat
  • out10000.dat
  • isomap_parallel.py
  • driver.py
  • driver_parallel.py
  • visualization.py

** Data Extraction & Processing **


If you would like to test it on a small subset of the online data, use tiny.dat as your input file.

To extract the data serially on your local machine, run: $ python songs_serial.py [input file] > [output file]

To extract the data in parallel on your local machine, run: $ python songs_parallel.py [input file] > [output file]
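
For reference, below is a minimal sketch of an mrjob job along the lines of songs_parallel.py. The tab-separated field positions and the choice of attributes (tempo, loudness, duration) are assumptions for illustration, not necessarily the fields this project extracts.

    # Hypothetical sketch of an mrjob extractor (not the actual songs_parallel.py).
    # Field indices below are assumed for illustration and would need to match
    # the real tbmmsd TSV schema.
    from mrjob.job import MRJob


    class MRSongExtract(MRJob):

        def mapper(self, _, line):
            fields = line.strip().split('\t')
            if len(fields) < 5:
                return  # skip malformed lines
            track_id = fields[0]
            try:
                # assumed positions: tempo, loudness, duration
                tempo = float(fields[2])
                loudness = float(fields[3])
                duration = float(fields[4])
            except ValueError:
                return
            yield track_id, [tempo, loudness, duration]

        def reducer(self, track_id, values):
            # each track should appear once; emit the first record seen
            for v in values:
                yield track_id, v
                break


    if __name__ == '__main__':
        MRSongExtract.run()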

To set up EMR, edit mrjob_config_file.txt to include your AWS credentials, then run: $ export MRJOB_CONF=/home/you/yourpath/mrjob_config_file.txt
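
For reference, a minimal mrjob_config_file.txt might look roughly like the following. This is an assumed template; the exact option names depend on your mrjob version.

    # assumed mrjob EMR configuration template (option names vary across mrjob versions)
    runners:
      emr:
        aws_access_key_id: <your access key>
        aws_secret_access_key: <your secret key>
        aws_region: us-east-1
        num_ec2_instances: 4
        ec2_instance_type: m1.large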

To extract the data in parallel on Amazon EMR, run: $ python songs_parallel.py -r emr [input file] > [output file]

To extract the full data set from the Million Song Database on S3, run: $ python songs_parallel.py -r emr 's3://tbmmsd/*.tsv.*' > [output file]

This was done to produce out100.dat, out1000.dat, and out10000.dat, with 100, 1000, and 10000 songs respectively.


** Applying Isomap **


This part of the project applies the Isomap algorithm to the rows of song data pulled from the Million Song Database.

For comparison, we've implemented both a serial and a parallel version of Isomap. The serial version is run with:

$ python driver.py

and the parallel version with:

$ python driver_parallel.py

The input file for analysis can be specified in the driver.py and driver_parallel.py files respectively.
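
As a rough sketch of what the serial version computes (illustrative only, not the actual driver.py or isomap code): Isomap builds a k-nearest-neighbour graph over the song feature rows, approximates geodesic distances with shortest paths through that graph, and embeds the result with classical MDS. A minimal serial version using numpy/scipy might look like:

    # Minimal serial Isomap sketch (illustrative; assumes the k-NN graph is connected).
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import shortest_path

    def isomap(X, n_neighbors=10, n_components=2):
        # 1. pairwise Euclidean distances between song feature rows
        D = cdist(X, X)
        n = D.shape[0]
        # 2. keep only each point's k nearest neighbours (inf = no edge)
        G = np.full((n, n), np.inf)
        for i in range(n):
            idx = np.argsort(D[i])[:n_neighbors + 1]   # includes the point itself
            G[i, idx] = D[i, idx]
        # 3. geodesic distances = shortest paths through the k-NN graph
        geo = shortest_path(G, method='D', directed=False)
        # 4. classical MDS on the geodesic distance matrix
        H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
        B = -0.5 * H @ (geo ** 2) @ H
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:n_components]
        return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

The parallel version in driver_parallel.py presumably distributes the distance and shortest-path work across processes; the sketch above is only the serial baseline.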


** Visualization & Extraction **


visualization.py processes the data and produces plots. Its input consists of the output file from the isomap algorithm above (produced by python driver_parallel.py) together with the input file to driver_parallel.py; the isomap scores are zipped together with the associated track data for plotting.

It can be run on outputs of various sizes; the input file names and the 'TRACKS' count must be adjusted in the file. By default, visualization.py runs on the largest data set, since it produces the most useful visualizations.

Simply run:

$ mpirun -n 4 python visualization.py

The script runs with 2, 4, or 8 MPI processes.

Output: 5 plots. Close each graph to proceed to the next one.
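
For illustration, an MPI run of this kind might be organised roughly as follows (a hypothetical mpi4py sketch, not the actual visualization.py; the file name and TRACKS value are placeholders): each rank handles a slice of the isomap scores, and rank 0 gathers the slices and draws the plot.

    # Hypothetical mpi4py sketch: split rows across ranks, plot on rank 0.
    # 'isomap_out.dat' and TRACKS are placeholders, not the project's real values.
    import numpy as np
    from mpi4py import MPI
    import matplotlib.pyplot as plt

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    TRACKS = 10000                              # must match the data set being plotted
    scores = np.loadtxt('isomap_out.dat')       # placeholder isomap output (TRACKS x 2)

    # each rank takes a contiguous slice of the rows
    chunk = np.array_split(np.arange(TRACKS), size)[rank]
    local = scores[chunk]                       # per-rank processing would happen here

    # rank 0 gathers the slices back and produces the plot
    gathered = comm.gather(local, root=0)
    if rank == 0:
        all_scores = np.vstack(gathered)
        plt.scatter(all_scores[:, 0], all_scores[:, 1], s=2)
        plt.title('Isomap embedding of %d tracks' % TRACKS)
        plt.show()                              # close the window to continue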

** About **


CS 205 Final Project - Music Recommendation Engine