ppreet/EECS595_Project
EECS 595 Class Project
-----------------------------------------
USAGE:
Use "python3 main.py [binary | frequency | tf-idf] [naive_bayes | logistic_regression | knn | svm | baseline | random_forest | gradient_boosted_tree | neural_network | k_means]"
-----------------------------------------
DEVELOPMENT:
To get feature maps, use the following code:

(1). For binary features:
    from data.features import LyricsDataSet
    lyricsData = LyricsDataSet('binary')

(2). For term frequency:
    from data.features import LyricsDataSet
    lyricsData = LyricsDataSet('frequency')

(3). For tf-idf:
    from data.features import LyricsDataSet
    lyricsData = LyricsDataSet('tf-idf')

To get the train and test splits, use the following code:
    train_x = lyricsData.get_train_x()
    train_y = lyricsData.get_train_y()
    test_x = lyricsData.get_test_x()
    test_y = lyricsData.get_test_y()

Note that train_x and test_x are lists of scipy.sparse matrices. To convert an element to a NumPy array, use train_x[i].toarray().

------------------
|db_processing.py|
------------------
This file transforms the original dataset, mxm_dataset.db, into several .txt files: data/frequency_features.txt, data/vocabulary.txt, and data/genreList.txt. The new files are much smaller than the original file (<60 MB vs. 2.6 GB), which cuts the time needed to read the train and test splits from 10 min to 20 s. If you downloaded the data/ directory, there is no need to run this script.
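The three feature types (binary, frequency, tf-idf) differ only in how each song's word counts are weighted. The following is a minimal sketch, assuming the input is a songs x words count matrix. `make_features` and `rows_to_dense` are hypothetical helpers, not part of `data.features`, and the tf-idf variant shown here (raw counts times log(N / df), no normalization) is one common choice that may not match what `LyricsDataSet` actually computes. `rows_to_dense` assumes each element of `train_x` is a 1 x V sparse row.

```python
import numpy as np
from scipy import sparse


def make_features(counts, kind):
    """Build 'binary', 'frequency', or 'tf-idf' features from a songs x words count matrix.

    Hypothetical helper: LyricsDataSet may weight or normalize differently.
    """
    counts = sparse.csr_matrix(counts, dtype=np.float64)
    counts.sum_duplicates()   # canonical form: one stored entry per (row, col)
    counts.eliminate_zeros()  # so every stored entry is a real occurrence
    if kind == "binary":
        # 1.0 wherever a word occurs in a song, 0 elsewhere
        out = counts.copy()
        out.data = np.ones_like(out.data)
        return out
    if kind == "frequency":
        # raw term counts, unchanged
        return counts
    if kind == "tf-idf":
        n_songs, n_words = counts.shape
        # document frequency: number of songs in which each word appears
        df = np.bincount(counts.indices, minlength=n_words)
        # plain idf = log(N / df); words absent everywhere get df clamped to 1
        idf = np.log(n_songs / np.maximum(df, 1))
        # scale each column j by idf[j]
        return (counts @ sparse.diags(idf)).tocsr()
    raise ValueError(f"unknown feature kind: {kind!r}")


def rows_to_dense(rows):
    """Stack a list of 1 x V sparse rows (assumed shape of train_x) into one dense N x V array."""
    return sparse.vstack(rows).toarray()
```

For example, `rows_to_dense(lyricsData.get_train_x())` would give a single dense matrix, assuming all rows share the same vocabulary size. With a full vocabulary this dense array can be large, so keeping the sparse form and calling `.toarray()` only where a model needs it avoids that memory cost.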
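The speedup from db_processing.py comes from reading mxm_dataset.db once and writing compact text files. The following is a rough sketch of that idea, not the repository's actual script. It assumes the musiXmatch SQLite schema, with a table `lyrics` that has columns `track_id`, `word`, and `count`. The function name `export_frequency_features` and the output line format (`track_id index:count ...`) are hypothetical, and the sketch covers only the vocabulary and frequency outputs, not data/genreList.txt.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def export_frequency_features(conn: sqlite3.Connection, vocab_out, features_out):
    """Write a vocabulary file and per-track word counts from an mxm-style DB.

    Hypothetical sketch: assumes a table lyrics(track_id, word, count, ...);
    the output format is illustrative, not db_processing.py's.
    Returns (number of words, number of tracks).
    """
    # Vocabulary: every distinct word, sorted, one per line; line number = index
    words = [w for (w,) in conn.execute(
        "SELECT DISTINCT word FROM lyrics ORDER BY word")]
    index = {w: i for i, w in enumerate(words)}
    for w in words:
        vocab_out.write(w + "\n")

    # Rows sorted by track so groupby sees each track's words contiguously
    rows = conn.execute(
        "SELECT track_id, word, count FROM lyrics ORDER BY track_id, word")
    n_tracks = 0
    for track_id, group in groupby(rows, key=itemgetter(0)):
        # One line per track: "<track_id> <word_index>:<count> ..."
        pairs = " ".join(f"{index[word]}:{count}" for _, word, count in group)
        features_out.write(f"{track_id} {pairs}\n")
        n_tracks += 1
    return len(words), n_tracks
```

Against the real dataset you would pass `sqlite3.connect("mxm_dataset.db")` and two text files opened in write mode. Later runs then parse only these small files instead of querying the 2.6 GB database.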