Sentiment Analysis with Yelp Reviews

Source code for

Requirements

gensim 3.5.0
scikit-learn 0.20.1
wordcloud 1.5.0
spacy 2.0.13
imbalanced-learn 0.4.3
dask 0.20.0

Files

The script preprocess.py converts the original JSON file into a roughly processed Parquet file. The next script, process.py, uses spaCy for further processing and text analysis. Before the processed review texts can be vectorized, bigrams and trigrams must be identified and a dictionary built from the resulting tokens. These and other helper methods are defined in util.py. Training happens in training.py, and visualization in visualization.py.
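The phrase detection and dictionary steps can be illustrated with a small standard-library sketch. This is not the code in util.py, which relies on gensim. The helper names (find_bigrams, merge_bigrams, build_dictionary, to_bow), the toy reviews, and the min_count and threshold values are all assumptions made for illustration. The score is a simple collocation score in the spirit of the one gensim's phrase models use by default.

```python
from collections import Counter


def find_bigrams(docs, min_count=1, threshold=1.0):
    """Return adjacent word pairs whose collocation score exceeds threshold."""
    unigrams = Counter(tok for doc in docs for tok in doc)
    pairs = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    vocab_size = len(unigrams)
    keep = set()
    for (a, b), n_ab in pairs.items():
        # Pairs that co-occur more often than their parts suggest score higher.
        score = (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            keep.add((a, b))
    return keep


def merge_bigrams(doc, bigrams):
    """Join detected pairs into single tokens such as 'ice_cream'."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in bigrams:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out


def build_dictionary(docs):
    """Map every token to an integer id, in order of first appearance."""
    token2id = {}
    for doc in docs:
        for tok in doc:
            token2id.setdefault(tok, len(token2id))
    return token2id


def to_bow(doc, token2id):
    """Convert a tokenized document to sorted (token_id, count) pairs."""
    counts = Counter(token2id[tok] for tok in doc if tok in token2id)
    return sorted(counts.items())


# Toy tokenized reviews (hypothetical data, not taken from the Yelp dataset).
docs = [
    ["the", "ice", "cream", "was", "great"],
    ["great", "ice", "cream", "and", "friendly", "staff"],
    ["the", "staff", "was", "rude"],
]
bigrams = find_bigrams(docs)
phrased = [merge_bigrams(doc, bigrams) for doc in docs]
token2id = build_dictionary(phrased)
bow = [to_bow(doc, token2id) for doc in phrased]
```

Running the same detection pass a second time on the already merged documents is one common way to obtain trigrams, since a merged bigram can then pair with a neighbouring word.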

The trigram phraser model was too large to include, so both the bigram and trigram phraser models are left out of the repository. The dictionary can be loaded from gensim_dct and the trained model from sentiment_classifier.sav.
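How the saved model is restored depends on how it was written, which this README does not state. The sketch below assumes sentiment_classifier.sav was serialized with Python's pickle module. If it was written with joblib instead, joblib.load would be the counterpart. The helper load_artifact and the stand-in object are hypothetical, and the demo round-trips a plain dict rather than the real classifier. The dictionary file gensim_dct is presumably a gensim dictionary and would need gensim itself to load.

```python
import pickle
import tempfile
from pathlib import Path


def load_artifact(path):
    """Unpickle an object from path. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Demo with a stand-in object, since the real classifier needs scikit-learn.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_classifier.sav"  # hypothetical file name
    stand_in = {"classes": ["negative", "positive"]}  # hypothetical content
    with open(demo_path, "wb") as fh:
        pickle.dump(stand_in, fh)
    restored = load_artifact(demo_path)
```

Because pickled estimators are tied to the library version that produced them, loading the real file is most likely to work with the scikit-learn version listed under Requirements.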

ViviLearns2Code/yelp-review
