Keywords - Natural Language Processing, Machine Learning, Python, NLTK, scikit-learn
- Built and trained machine learning models to classify the binary (positive/negative) sentiment of movie reviews from the IMDB movie reviews dataset of 50,000 reviews.
- Experimented with and tuned Decision Tree, Naïve Bayes and SVM models for the classification, achieving an accuracy of 87.4% with a Linear SVM model on a frequency bag-of-words text representation.
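As a rough illustration of the best-performing setup (a linear SVM over frequency bag-of-words features), the sketch below trains a tiny linear SVM in plain Python. It is not the project's code: the project uses scikit-learn, whereas this version minimises the linear SVM primal objective (L2-regularised hinge loss) with plain stochastic subgradient descent so it runs without dependencies. The toy reviews, the whitespace tokeniser and the hyper-parameters `lam`, `eta` and `epochs` are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokeniser: lower-case and split on whitespace.
    return text.lower().split()


def build_vocab(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(doc, vocab):
    """Frequency bag-of-words plus a constant 1.0 feature acting as the bias term."""
    vec = [0.0] * (len(vocab) + 1)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:  # words unseen during training are ignored
            vec[vocab[tok]] = float(count)
    vec[-1] = 1.0
    return vec


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Stochastic subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    Labels must be +1 (positive) or -1 (negative). The bias is folded into w
    through the constant feature, so it is regularised too (a simplification).
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * dot(w, xi) < 1  # inside the margin or misclassified
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= eta * grad
    return w


def predict(w, doc, vocab):
    return 1 if dot(w, featurize(doc, vocab)) >= 0 else -1


# Hypothetical toy data standing in for the IMDB reviews.
train_docs = ["great movie loved it", "awful movie boring plot",
              "loved the acting great fun", "boring and awful acting"]
train_labels = [1, -1, 1, -1]
vocab = build_vocab(train_docs)
X = [featurize(d, vocab) for d in train_docs]
w = train_linear_svm(X, train_labels)
print(predict(w, "loved it great", vocab))     # → 1
print(predict(w, "boring awful plot", vocab))  # → -1
```

In scikit-learn the same pipeline would more typically be assembled from `CountVectorizer` (which produces word counts by default) and `LinearSVC`, which uses a dedicated liblinear solver rather than the simple subgradient loop above.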
- `Code` folder contains Python code files to pre-process the raw text data and train the machine learning models.
  - The set of Python files `generate_dataset_*.py` clean and pre-process the text data.
  - `model_training_bagOfwords.py` trains several machine learning models using the bag-of-words text representation.
  - `model_training_freqency_bagOfwords.py` trains several machine learning models using the frequency bag-of-words text representation.
- `Dataset` folder contains the raw as well as the pre-processed dataset.
- At the root location:
  - Project Presentation (`Sentiment Analysis - Project Presentation - Final.pdf`)
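The pre-processing scripts and the two training scripts can be pictured with the dependency-free sketch below. The cleaning rules in `clean_review` (lower-casing, replacing the `<br />` HTML line breaks that IMDB reviews often contain, keeping letters only, dropping stop words) and the tiny hypothetical `STOP_WORDS` list are assumptions, not taken from `generate_dataset_*.py`; NLTK's English stop-word list would be the more realistic choice. It also assumes that the plain bag-of-words representation records word presence (0/1) while the frequency variant records word counts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a fuller list (for example
# NLTK's English stop words) would normally be used instead.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "was"}


def clean_review(text):
    """Lower-case, strip HTML line breaks, keep letters only, drop stop words."""
    text = re.sub(r"<br\s*/?>", " ", text.lower())  # replace <br /> tags with spaces
    text = re.sub(r"[^a-z]+", " ", text)            # replace digits/punctuation with spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(tokens, vocab):
    """Binary representation: 1 if the word occurs in the review, else 0."""
    vec = [0] * len(vocab)
    for tok in set(tokens):
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec


def frequency_bag_of_words(tokens, vocab):
    """Frequency representation: how many times each word occurs in the review."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


review = "Great film.<br /><br />Great cast, and the ending was GREAT!"
tokens = clean_review(review)
vocab = build_vocab([tokens])
print(tokens)                                 # → ['great', 'film', 'great', 'cast', 'ending', 'great']
print(bag_of_words(tokens, vocab))            # → [1, 1, 1, 1]
print(frequency_bag_of_words(tokens, vocab))  # → [3, 1, 1, 1]
```

The repeated "great" shows the difference: the binary vector caps it at 1, while the frequency vector keeps the count of 3. In scikit-learn, `CountVectorizer(binary=True)` and the default `CountVectorizer()` produce these two representations respectively.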