Amazon Text Analysis using NLTK

In this project we apply natural language processing techniques and ML models to predict the gender of Amazon grocery and gourmet food reviewers.

As of 14 September 2018, we can predict the gender of the writer with 72.24% accuracy using a Keras CNN. Prior gender prediction research in the field has reported accuracy rates of 60-70%.

Project Brief

Amazon has contracted your team to do an exploratory data analysis on product reviews. In particular, they are interested in being able to classify people as male or female based on their reviews. They have given you a dataset of customer reviews of grocery and gourmet food items. Create a model that identifies a person as male/female based on their review (regardless of product).

Dataset Used

Grocery and Gourmet Food Dataset

Amazon Dataset Guidelines

Methodology

The original Grocery and Gourmet Food dataset does not include gender labels, only the names or usernames of the reviewers. We therefore used the Gender Guesser library to infer labels from those names for use in our prediction models. This method labeled roughly 25% of the available samples. Additional manual text processing allowed us to label nearly the full dataset.
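The name-based labeling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`first_name`, `label_reviewer`, `gender_guesser_lookup`) are hypothetical, and the choice to keep only the confident `male` and `female` answers while discarding ambiguous ones such as `andy` or `mostly_female` is an assumption. The lookup is passed in as a callable so the logic can be tried without the library installed.

```python
import re
from typing import Callable, Optional

# Assumption: only unambiguous answers are kept as training labels.
CONFIDENT_LABELS = {"male", "female"}


def first_name(reviewer_name: str) -> Optional[str]:
    """Take the leading run of letters from a reviewer name or username."""
    match = re.match(r"[A-Za-z]+", reviewer_name.strip())
    # Capitalize because name lookups are typically case-sensitive.
    return match.group(0).capitalize() if match else None


def label_reviewer(reviewer_name: str, guess: Callable[[str], str]) -> Optional[str]:
    """Return 'male' or 'female', or None when the name is missing or ambiguous."""
    name = first_name(reviewer_name)
    if name is None:
        return None
    label = guess(name)
    return label if label in CONFIDENT_LABELS else None


def gender_guesser_lookup() -> Callable[[str], str]:
    """Build the lookup from the gender-guesser package (must be installed)."""
    import gender_guesser.detector as detector

    return detector.Detector().get_gender
```

With the package installed, `label_reviewer("Lisa M.", gender_guesser_lookup())` would run the lookup on "Lisa". Rows that come back as `None` are the ones that would still need manual processing.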

We evaluated several prediction models, with accuracy ranging from 50% to over 70% for the best models.

Contributors

Brenner Haverlock
Kelvin Li
Lorin Fields
Saranya Mandava
Tina Kovacova

This Capstone Project was developed by Lambda School Machine Learning and Data Science students.

