GitHub - jshph/LIN-350-Final-Project-Forum-Opinion

About this project

Final project for LIN 350 at UT Austin: an experiment in categorizing hotel reviews by the features they describe (e.g. room service, amenities, comfort). It uses the Python NLTK (Natural Language Toolkit) library and implements TextRank, an algorithm that extracts keywords from text in an unsupervised fashion.
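To give a rough idea of how TextRank picks keywords, here is a minimal sketch. It is not the code in TextRank.py: it skips the part-of-speech filtering usually done with NLTK, uses a tiny hypothetical stopword list instead of stopwords-augmented.txt, and the function name and parameters (`textrank_keywords`, `window`, `damping`) are assumptions made for illustration. Words that co-occur within a small window are linked in a graph, and a PageRank-style iteration scores each word by how well connected it is.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not the repo's stopwords-augmented.txt).
STOPWORDS = {"the", "a", "an", "and", "but", "was", "were", "is", "are",
             "of", "to", "in", "it", "very", "had", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Return up to top_n keywords ranked by a PageRank-style score."""
    # Lowercase, keep alphabetic tokens only, and drop stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Build an undirected co-occurrence graph: two words are linked if they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the PageRank update on the unweighted graph.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    review = ("The room was clean and the room service was fast. "
              "The pool was small but the room was quiet.")
    print(textrank_keywords(review))
```

In this toy review, "room" co-occurs with every other content word, so it ends up with the highest score. A fuller version would keep only nouns and adjectives before building the graph.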

The original motivation for this project was to categorize forum thread discussions about various products. We didn't pursue this because annotating our own dataset of forum threads was out of scope, so we pivoted to hotel reviews, for which we had an annotated dataset. One lesson that may help our future work with this kind of data: forums feature more comparative experiences, whereas product reviews are generally standalone, anecdotal value judgments.

Notes to get started

Store the datasets below in the root of this folder:

http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz

^^^ extracted and stored as reviews_Electronics_amazon.json

http://www.cs.cmu.edu/~jiweil/review.txt.zip

^^^ extracted and stored as hotels_review_tripadvisor.txt
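Once the Amazon file is extracted, it can be read along these lines. This is a sketch under assumptions: it assumes the file holds one JSON object per line and that each record has `asin` and `reviewText` fields, which should be checked against the downloaded data. The helper names `iter_reviews` and `group_text_by_asin` are hypothetical and do not come from JSONparse.py.

```python
import json
from collections import defaultdict


def iter_reviews(path):
    """Yield one review dict per non-empty line (assumes JSON-lines format)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def group_text_by_asin(path):
    """Collect review texts per product ID (assumed 'asin' and 'reviewText' keys)."""
    grouped = defaultdict(list)
    for review in iter_reviews(path):
        grouped[review["asin"]].append(review.get("reviewText", ""))
    return dict(grouped)


if __name__ == "__main__":
    reviews_by_product = group_text_by_asin("reviews_Electronics_amazon.json")
    print(len(reviews_by_product), "products loaded")
```

Reading line by line keeps memory use low while iterating, which helps with a file of this size; the grouping step does hold all review texts in memory, so it may need to be limited to a subset of products.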

Folders and files (36 commits)

.idea
LDAtuning_tets
Papers
__pycache__
.DS_Store
.gitignore
ABSA15_Hotels_Test.xml
ABSA15_Hotels_parsed.txt
ComputationalSemantics-IntermediateReport.docx
JSONparse.py
LDAtuning.py
Laptop_Reviews.py
Opinionator.py
Opinionator_to_file.py
Parsed_Hotel_Data.py
PhamJoshua_ProjectProposal.pdf
README.md
TextRank.py
asin_grouped_keywords.txt
depparse.py
gdict_originals_words.dict
hotel_textrank_output.txt
hw2module.py
initial_proj_description_feedback.txt
keywordfreq.txt
ranking.py
relextract.py
sentences_parsed.p
spider.py
stopwords-augmented.txt
test1.txt
testoutput.txt
textrank_hotel_output_feature_counts.txt
textrank_hotels_output.txt
textrank_hotels_output_featurecount2.txt
textrank_hotels_output_featurecount3.txt
tfidf_textrank.py
tfidf_textrank_output.txt
wordCount.py
wordCountEdited.py
wordCountEditedv2.py
