denisafanasev/CS598_YelpWizard

CS598_Yelp_Wizard

Introduction and general information

The goal of this task is to apply data mining techniques to build a small-scale application that lets its intended end users (i.e., people who will benefit from the results generated by data mining algorithms) upload a new data set through a Web interface and mine it with at least one of the algorithms developed or experimented with.

Application description

For this task I developed the "Yelp wizard" application, which lets users explore a dataset of restaurant reviews in order to find and choose a new restaurant to visit, based on the topics and cuisines that interest them.

Dataset description

For this task we use the Yelp reviews data set:

  • "yelp_academic_dataset_review.json"
  • "yelp_academic_dataset_business.json"

The data set contains 703508 reviews for 14035 businesses. Each business is linked to a set of categories, such as the type of business, cuisines, and so on. Before topic mining we pre-process these files: we keep only the reviews for venues in the "Restaurant" category and extract the set of cuisines, which gives 239 cuisines in total. By default the application operates on the whole data set, but a user can also upload any subset of it for evaluation.
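The pre-processing step described above could look roughly like the sketch below. It is an illustration rather than the application's actual code, and it rests on several assumptions: both files hold one JSON object per line, the field names are `business_id`, `categories` and `text`, the restaurant category is spelled `"Restaurants"`, and every other category of a restaurant is treated as a cuisine (a simplification). The function names are hypothetical.

```python
import json
from collections import defaultdict


def load_restaurants(business_lines, restaurant_category="Restaurants"):
    """Map business_id -> set of cuisine labels, for restaurant venues only.

    Assumption: each line is one JSON object with "business_id" and
    "categories" fields; the category label itself is also an assumption.
    """
    restaurants = {}
    for line in business_lines:
        line = line.strip()
        if not line:
            continue
        business = json.loads(line)
        categories = business.get("categories") or []
        # Some dumps store categories as a comma-separated string, not a list.
        if isinstance(categories, str):
            categories = [c.strip() for c in categories.split(",")]
        if restaurant_category in categories:
            # Simplification: every remaining category counts as a cuisine.
            cuisines = set(categories) - {restaurant_category}
            restaurants[business["business_id"]] = cuisines
    return restaurants


def reviews_by_cuisine(review_lines, restaurants):
    """Group review texts by cuisine, dropping reviews of non-restaurants."""
    texts = defaultdict(list)
    for line in review_lines:
        line = line.strip()
        if not line:
            continue
        review = json.loads(line)
        for cuisine in restaurants.get(review["business_id"], ()):
            texts[cuisine].append(review["text"])
    return dict(texts)


# Tiny made-up sample standing in for the two dataset files.
sample_businesses = [
    '{"business_id": "b1", "categories": ["Restaurants", "Italian"]}',
    '{"business_id": "b2", "categories": ["Shopping"]}',
    '{"business_id": "b3", "categories": "Thai, Restaurants"}',
]
sample_reviews = [
    '{"business_id": "b1", "text": "Great pasta"}',
    '{"business_id": "b2", "text": "Nice shoes"}',
    '{"business_id": "b3", "text": "Spicy curry"}',
]

restaurants = load_restaurants(sample_businesses)
cuisine_texts = reviews_by_cuisine(sample_reviews, restaurants)
```

With the real files, the same functions could be fed open file handles instead of the sample lists, since both only iterate over lines.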

Functions and goals

The key goal of the application is to let a user find a restaurant of interest using data mining algorithms instead of the standard filters available on the Yelp service. The application therefore has two key functions:

  • Topic mining - lets the user apply the LDA algorithm with different parameters to all reviews in the dataset and choose a topic of interest. Based on this choice, the application shows a list of cuisines and a list of restaurants for which the chosen topic has the highest weight. This lets the user choose a restaurant by topics and the key words of each topic, instead of an ordinary keyword search.
  • Text similarities - lets the user choose a cuisine using a measure of similarity between cuisines computed from the review texts. This lets the user explore the set of cuisines and find an interesting one based on the similarity between texts.
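The selection step in topic mining (showing the restaurants for which the chosen topic has the highest weight) can be sketched as follows. This is a minimal illustration, not the application's code: it assumes the per-restaurant topic weights have already been produced by an LDA model, and the function names, restaurant names and weights are all hypothetical.

```python
def dominant_topic(weights):
    """Return the index of the topic with the highest weight."""
    return max(range(len(weights)), key=lambda k: weights[k])


def restaurants_for_topic(topic_weights, chosen_topic):
    """Names of restaurants whose dominant topic is chosen_topic.

    Results are ordered by the weight of the chosen topic, strongest first.
    """
    matches = [
        (weights[chosen_topic], name)
        for name, weights in topic_weights.items()
        if dominant_topic(weights) == chosen_topic
    ]
    return [name for _, name in sorted(matches, reverse=True)]


# Hypothetical topic weights per restaurant, one value per LDA topic.
example_weights = {
    "Luigi's": [0.7, 0.2, 0.1],
    "Bangkok Spice": [0.1, 0.8, 0.1],
    "Pasta Palace": [0.5, 0.3, 0.2],
}

selected = restaurants_for_topic(example_weights, chosen_topic=0)
```

The same function could be applied to per-cuisine weights to build the list of cuisines for the chosen topic.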

Toolkit and libraries

To develop the application, I used:

  • Python - as general language
  • Flask - application server
  • amCharts - for charts and visualisation
  • Bootstrap - for user interface

For data mining I used the following tools and libraries for Python:

  • Sklearn - for classification
  • Gensim - for text processing
  • NumPy - for some additional utilities
  • NLTK - for text processing
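To make the text similarity function more concrete, here is a dependency-free sketch of one common way to measure similarity between cuisines from their review texts: TF-IDF weighting followed by cosine similarity. The README does not state which measure the application uses, so this choice is an assumption, and a real implementation would more likely rely on the libraries listed above. The tokenizer, function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Turn {name: text} into {name: {term: tf-idf weight}}."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        vectors[name] = {
            # Smoothed idf, so terms present in every document keep some weight.
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up review text, concatenated per cuisine.
cuisine_reviews = {
    "Italian": "pasta pizza tomato basil pasta",
    "Pizza": "pizza cheese tomato crust",
    "Thai": "curry noodles basil spicy",
}

vectors = tfidf_vectors(cuisine_reviews)
italian_vs_pizza = cosine(vectors["Italian"], vectors["Pizza"])
italian_vs_thai = cosine(vectors["Italian"], vectors["Thai"])
```

In this toy sample "Italian" shares more vocabulary with "Pizza" than with "Thai", so the first score comes out higher than the second.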

User guide

[https://github.com/denisafanasev/CS598_YelpWizard/blob/master/docs/YelpWizard_User_guide.pdf]

Run

flask run --host=0.0.0.0 &

About

CS598 Capstone project
