GitHub - nanodust/MusicMining: mining music using the million song dataset

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
Centers		Centers
.gitignore		.gitignore
ER_new.png		ER_new.png
README		README
apriori.r		apriori.r
centers.r		centers.r
compared.r		compared.r
conf.ini		conf.ini
db_helper.py		db_helper.py
db_modified.csv		db_modified.csv
hdf5_getters.py		hdf5_getters.py
hdf5_import.py		hdf5_import.py
hdf5_import_threading.py		hdf5_import_threading.py
hdf5_to_csv.py		hdf5_to_csv.py
hdf5_to_mysql.py		hdf5_to_mysql.py
migration_script.sql		migration_script.sql
music-mining.pdf		music-mining.pdf
musickmeans.r		musickmeans.r
musickmeansReduced.r		musickmeansReduced.r
musickmeansWrite.r		musickmeansWrite.r
mysql_helper.py		mysql_helper.py
p3_h5tocsv.py		p3_h5tocsv.py
push_to_db.py		push_to_db.py
rules_sample.csv		rules_sample.csv
rules_to_mysql.py		rules_to_mysql.py
singleAtt.csv		singleAtt.csv
test.py		test.py

Repository files navigation

#~~~~~~~ MySQL Import ~~~~~~~~
To import the h5 files to a MySQL server follow these steps:

First, run the migration_script.sql in your MySQL server. This will create a
'millionsong' database and several tables.

Next, we need to run the next script (make sure conf.ini matches your MySQL server configuration):

	python hdf5_to_mysql.py PATH_TO/MillionSongSubset/data [a] [b]

[a] and [b] specify the range of files to import (from 0 to 10,000), if you leave it empty it will import all files.

#~~~~~~~ K means clustering analysis~~~~~
To perform the k means analysis on the data the file db_modified.csv has
to be created. This is created from the db.csv file created using the
command:
    $python hdf5_to_csv.py PATH_TO/MillionSongSubset/data

The output file db.csv is opened and saved as a proper csv file(db_modified.csv)
for R has trouble handling multiple separator tokens.

Kmeans is performed on the resulting csv for 2 orders of partitioning using the
musickmeans.r script.
Within R terminal:
    >source("musickmeans.r")
To output the partitions created to separate csv files, within the R terminal:
    >source("musickmeansWrite.r")

To see the cluster centers created in musickmeansWrite.r, the centers can be
exported via the r script:
    >source("centers.r")

To see the accuracy of clustering within one level of clustering run the
R script within the R terminal:
    >source("compared.r")

#~~~~~~~~ Association ~~~~~~~~~

To create association rules run the aprior.r script. You need to install the 'arules' library.
The apriori.r script expects "train_triplets.txt" file to be present in the same working directory.
	>source("apriori.r")

This will create a file named "rules.csv" in the same working directory. This file will contain the 
association rules, support, confidence and lift values.

To import "rules.csv" to MySQL you may run the following script:
	python rules_to_mysql.py rules.csv

This will import the rules.csv values to the 'song_association' table. (see migration_script.sql)
Also, make sure conf.ini matches your MySQL configuration before running this script.


#~~~~~~~~ Team Roles ~~~~~~~~~~~~
There are a few things on which all of us worked together - like, selecting the dataset, discussing what to put in the reports and poster, selecting attributes, etc.

Alyssa worked mainly on the data analysis using clustering and she was the R expert in the team. She is the first person in the group to understand the data completely. She was also in charge of the report write-up and proofreading.

Prashanth worked on creating the database schema, generating diagrams and tuning the schema. He also developed the basic scripts that would write data to the database from python data structures. He was in charge of preparing the poster.

Carlos worked on the python scripts to read data from hdf5 files and putting them into csv. He also worked on python scripts that read data from files and put them into python data structures, which can make use of the scripts created by Prashanth (and improvising them). He also worked on analysing the data using association.