Machine-Learning-based-classification-for-Sentimental-analysis-of-IMDb-reviews

Introduction

This is a course project for CS229 Machine Learning at Stanford University in 2020, on a Natural Language Processing topic. It is written in Python and uses TensorFlow, Keras, and scikit-learn. The project aims to classify IMDb reviews as positive or negative.

Dataset

For this analysis we use a dataset of 50,000 movie reviews taken from IMDb. The data was compiled by Andrew Maas and can be found here: IMDb Reviews. The data is split evenly, with 25k reviews intended for training and 25k for testing the classifier. Moreover, each set has 12.5k positive and 12.5k negative reviews. IMDb lets users rate movies on a scale from 1 to 10. To label the reviews, the curator of the data marked anything with ≤ 4 stars as negative and anything with ≥ 7 stars as positive. Reviews with 5 or 6 stars were left out.
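The labeling rule described above can be written as a small function. This is only an illustrative sketch: the name `label_from_rating` and the choice to return `None` for excluded reviews are assumptions, not part of the dataset's tooling.

```python
from typing import Optional


def label_from_rating(stars: int) -> Optional[str]:
    """Map an IMDb star rating (1-10) to a sentiment label.

    Hypothetical helper: ratings of 4 or lower are negative, 7 or higher
    are positive, and 5 or 6 are left out (returned as None).
    """
    if not 1 <= stars <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {stars}")
    if stars <= 4:
        return "negative"
    if stars >= 7:
        return "positive"
    return None  # neutral ratings are excluded from the dataset


# Apply the rule to every possible rating.
labels = {stars: label_from_rating(stars) for stars in range(1, 11)}
print(labels)
```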

Preprocess

The words are lemmatized, and the reviews are then converted into a feature matrix using binary, count, and TF-IDF vectorizers.
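As a rough illustration of the three representations, the sketch below builds binary, count, and TF-IDF matrices with scikit-learn. The toy sentences are made up, lemmatization is skipped, and the vectorizer settings are scikit-learn defaults, which may differ from the ones used in the project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy stand-ins for lemmatized reviews (hypothetical, not project data).
docs = [
    "great movie great cast",
    "bad movie",
    "great plot bad ending",
]

vectorizers = {
    "binary": CountVectorizer(binary=True),  # 1 if the term occurs, else 0
    "count": CountVectorizer(),              # raw term counts per review
    "tfidf": TfidfVectorizer(),              # counts reweighted by inverse document frequency
}

# Each vectorizer learns its own vocabulary and returns a sparse matrix
# with one row per review and one column per term.
matrices = {name: vec.fit_transform(docs) for name, vec in vectorizers.items()}

for name, X in matrices.items():
    print(name, X.shape)
    print(X.toarray().round(2))
```

The binary and count matrices differ only where a term repeats within a review (here, "great" in the first sentence), while TF-IDF additionally downweights terms that appear in many reviews.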

Model

The models include logistic regression, random forest, gradient boosting, and an MLP classifier, all built with scikit-learn. In addition, the project uses three embedding methods (NNLM 50, NNLM 128, and NNLM 128 with normalization), as well as the universal encoder and a Long Short-Term Memory (LSTM) network, to classify the text.
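A minimal sketch of one of the scikit-learn setups, assuming a count vectorizer feeding a logistic regression classifier. The training sentences are invented for illustration and the hyperparameters are library defaults, not the project's tuned values.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: 1 = positive, 0 = negative.
train_texts = [
    "great film wonderful acting",
    "wonderful story great cast",
    "loved it great fun",
    "terrible film awful acting",
    "awful story terrible cast",
    "hated it terrible mess",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The pipeline vectorizes raw text and then fits the classifier on the counts.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

predictions = model.predict(["great wonderful", "terrible awful"])
print(predictions)
```

The other scikit-learn models named above (`RandomForestClassifier`, `GradientBoostingClassifier`, `MLPClassifier`) can be swapped in as the final pipeline step in the same way.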

Result

The NNLM embedding model reaches a training accuracy of around 98%. On validation accuracy, however, the scikit-learn models with the count vectorizer perform better.

JoshWuuu/Machine-Learning-based-classification-for-Sentimental-analysis-of-IMDb-reviews