The required modules are listed in requirements.txt. You can install them with: pip install -r requirements.txt. Setting up the environment in a virtualenv is strongly recommended.
Steps:
- Extract data.tar.gz.
- Run merge.py to produce a single CSV file instead of thousands of separate files. (The merged file is already provided, so this step is optional.)
- Run preprocessing.py after changing the method to the one you are going to apply (TFIDF or NN).
- Run run_tfidf.py or run_cnn.py.
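
For readers curious what the extract and merge steps amount to, here is a minimal sketch in Python. It is not the project's merge.py. It assumes the archive unpacks into a directory tree of plain-text files with one document per file. The function names, the CSV column names (filename, text), and the output layout are all hypothetical and chosen only for illustration.

```python
import csv
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def merge_to_csv(data_dir, csv_path):
    """Write one CSV row per file found under data_dir; return the row count.

    Assumption: every file is text with one document per file. The column
    names below are hypothetical, not taken from the project's merge.py.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "text"])
        # Sort the paths so the row order is reproducible between runs.
        for path in sorted(Path(data_dir).rglob("*")):
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="replace")
                writer.writerow([path.relative_to(data_dir).as_posix(), text])
                rows += 1
    return rows
```

Under these assumptions, calling extract_archive("data.tar.gz", "data") and then merge_to_csv("data", "merged.csv") would leave a single CSV next to the extracted directory. The output file should be kept outside the data directory so it is not picked up as an input. The name merged.csv is a placeholder, so check merge.py for the file name the later scripts actually expect.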
Result on Kaggle: accuracy 0.89709. The final report is ML_Project_Report.pdf.