Simple "translation" of Amazon product reviews classified by language, topic, and sentiment
License
curtislb/ReviewTranslation
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
These are the files in our project in usage order with a description of what they do. count_reviews.py is a script to count the number of english and spanish reviews there are in a gzip file. Used at the beginning to judge what subset of the dataset we should use. review_data.py is an io utility file proc_language.py reads the dataset, detects the languages, and prints out only English and Spanish reviews with only the necessary fields filter_reviews.py removes reviews that have less than 5 words fix_acute.py changes all accents in Spanish to the unaccented letter remove_stopwords.py removes stopwords from both Spanish and English topic_extraction.py loads the reviews and fits an LDA using a CountVectorizer and outputs the top 50 words of n_topics topics. categorize_reviews.py Uses the LDA words and weights to categorize the dataset print_categories.py Takes the LDA words and weights and prints them in correct order in human readable format
About
Simple "translation" of Amazon product reviews classified by language, topic, and sentiment
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published