This is a project by Hemeng Maggie Li, Chudan Ivy Liu, Jiawen Jasmine Zhu.
(Because the submission site does not let us select two partners, two of us submitted one copy of the project together and the third submitted a copy on her own. Please grade this content for all three of us.)
- Install package TextBlob
  - Run with commands:
    pip install -U textblob
    python -m textblob.download_corpora
- Install package NLTK
  - Run with command:
    sudo pip install -U nltk
- Install package geoplotlib
  - Under the directory of geoplotlib, run with command:
    pip install geoplotlib
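
Once the packages are installed, it can help to confirm that the interpreter actually finds them. The snippet below is only a sketch and is not part of the project: the helper name `missing_packages` is made up for illustration, and it assumes Python 3.4 or newer because it relies on `importlib.util.find_spec`.

```python
import importlib.util

# Import names of the three packages listed in the install steps.
REQUIRED = ["textblob", "nltk", "geoplotlib"]


def missing_packages(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported missing, rerun the matching install command with the same Python environment that runs the project.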
- Machine Learning Part:
  - main.py: the executable of the machine learning part
  - data_processing_geo.py: processes data from the business JSON file of the Yelp Dataset for data visualization
  - data_processing_ml.py: processes data from the review JSON file of the Yelp Dataset for the machine learning part of the project
  - extract_features.py: extracts features from a given review text
  - training_tuning.py: runs cross validation on the processed data using the KNN and decision tree algorithms to determine the best parameters for training
  - predict.py: fits the model using the training data and predicts the test data. There is also one function that predicts a review with KNN.
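
To illustrate the idea behind the cross-validation step, here is a minimal from-scratch sketch of choosing the number of neighbours for KNN by k-fold cross validation. It is not the code in training_tuning.py: the function names, the interleaved fold split, and the toy feature vectors are assumptions made for this example, and the decision tree is left out. The same loop could be used to compare tree depths.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # Squared Euclidean distance is enough for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=5):
    """Mean accuracy of KNN over `folds` interleaved folds."""
    scores = []
    for f in range(folds):
        # Sample i belongs to test fold i % folds; the rest is training data.
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / folds


def best_k(X, y, candidates, folds=5):
    """Return the candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(X, y, k, folds))


if __name__ == "__main__":
    # Toy data: one hypothetical feature per review, star rating as label.
    X = [[0.0], [0.1], [0.2], [0.3], [0.4], [1.0], [1.1], [1.2], [1.3], [1.4]]
    y = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]
    print("best k:", best_k(X, y, candidates=[1, 3, 5]))
```

When several candidates tie on accuracy, `max` keeps the first one in the list, so the order of `candidates` acts as the tie-breaker.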
- Geographical Visualization Part:
  - star_rating_geo.py: the executable of the star rating geographical plot
  - price_range_geo.py: the executable of the price range geographical plot
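
Both plots need coordinates pulled out of the business JSON file. The sketch below shows one way that preparation could look. It is not the project's own code: the function name `load_points` is hypothetical, and it assumes the file holds one JSON object per line with `latitude`, `longitude`, and `stars` fields.

```python
import json


def load_points(lines, min_stars=0.0):
    """Collect coordinates of businesses rated at least `min_stars`.

    `lines` is any iterable of JSON strings, one business per line.
    """
    points = {"lat": [], "lon": [], "stars": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        lat = record.get("latitude")
        lon = record.get("longitude")
        stars = record.get("stars")
        # Skip records with missing coordinates or rating.
        if lat is None or lon is None or stars is None:
            continue
        if stars >= min_stars:
            points["lat"].append(lat)
            points["lon"].append(lon)
            points["stars"].append(stars)
    return points
```

An open file handle can be passed directly as `lines`. The result is a set of equal-length `lat` and `lon` columns, which is the kind of input geoplotlib layers such as `geoplotlib.dot` are built around; the exact wrapping they expect should be checked against the geoplotlib documentation.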