CS598_Yelp_Wizard

Introduction and general information

The goal of this task is to leverage data mining techniques to build a small-scale application that lets the envisioned end users (i.e., people who benefit from the results generated by data mining algorithms) upload a new dataset and, through a Web interface, mine it with at least one algorithm that you developed or experimented with.

Application description

In this task I developed a "Yelp wizard" application that lets users explore a dataset of restaurant reviews in order to find and choose a new restaurant to visit, based on topics and cuisines that may interest them.

Dataset description

The application uses Yelp's review dataset, which consists of two files:

"yelp_academic_dataset_review.json"
"yelp_academic_dataset_business.json"

The dataset contains 703,508 reviews of 14,035 businesses. Each business is linked to a set of categories, such as the type of business and its cuisines. Before topic mining, we pre-process these files: we keep only reviews of venues in the "Restaurant" category and extract the set of cuisines, which yields 239 cuisines in total. By default, the application operates on the whole dataset, but a user can also upload any subset of it for evaluation.
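The filtering step described above can be sketched in Python as follows. This is a minimal sketch, not the repository's actual code: it assumes both files are in JSON Lines format (one object per line) with `business_id`, `categories` and `text` fields, and it treats every category of a restaurant other than the restaurant category itself as a cuisine label, which is only an approximation. The function name `filter_restaurant_reviews` is hypothetical.

```python
import json


def filter_restaurant_reviews(business_lines, review_lines, category="Restaurants"):
    """Keep only reviews of businesses tagged with `category`.

    Returns (reviews, cuisines_by_business, all_cuisines), where reviews is a
    list of (business_id, text) pairs. Assumption: every other category on a
    restaurant is treated as a cuisine label.
    """
    cuisines_by_business = {}
    for line in business_lines:
        if not line.strip():
            continue
        business = json.loads(line)
        categories = business.get("categories") or []
        if isinstance(categories, str):
            # Some dataset releases store categories as "A, B, C".
            categories = [c.strip() for c in categories.split(",")]
        if category in categories:
            cuisines_by_business[business["business_id"]] = sorted(
                c for c in categories if c != category
            )

    reviews = []
    for line in review_lines:
        if not line.strip():
            continue
        review = json.loads(line)
        if review["business_id"] in cuisines_by_business:
            reviews.append((review["business_id"], review["text"]))

    all_cuisines = sorted({c for cs in cuisines_by_business.values() for c in cs})
    return reviews, cuisines_by_business, all_cuisines
```

Because the function accepts any iterable of lines, open file objects can be passed directly, e.g. `filter_restaurant_reviews(open("yelp_academic_dataset_business.json"), open("yelp_academic_dataset_review.json"))`.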

Functions and goals

The key goal of the application is to let users find restaurants that may interest them using data mining algorithms, rather than the standard filters available on Yelp. The application therefore provides two key functions:

Topic mining - lets the user run the LDA algorithm with different parameters on all reviews in the dataset and choose a topic of interest. Based on this choice, the application shows a list of cuisines and a list of restaurants for which the chosen topic has the highest weight. This lets users choose a restaurant by topics and the key words of each topic, instead of the usual keyword search.
Text similarities - lets the user choose a cuisine based on a similarity measure between cuisines computed from their review texts. This lets users explore the set of cuisines and find an interesting one based on the similarity of their reviews.

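The text-similarity function can be sketched by treating all reviews of one cuisine as a single document, weighting its terms with TF-IDF, and comparing cuisines by cosine similarity. TF-IDF with cosine similarity is an assumed choice of measure (this README does not state which measure the application uses), and `cuisine_similarity` and `most_similar` are hypothetical names.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cuisine_similarity(cuisine_reviews):
    """cuisine_reviews: iterable of (cuisine, review_text) pairs.

    Returns the sorted cuisine names and their pairwise cosine-similarity matrix.
    """
    docs = defaultdict(list)
    for cuisine, text in cuisine_reviews:
        docs[cuisine].append(text)
    names = sorted(docs)
    # One document per cuisine: all of its reviews concatenated.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(
        [" ".join(docs[name]) for name in names]
    )
    return names, cosine_similarity(tfidf)


def most_similar(names, sim, cuisine, k=3):
    """The k cuisines closest to `cuisine`, with their similarity scores."""
    i = names.index(cuisine)
    ranked = sorted(
        (j for j in range(len(names)) if j != i),
        key=lambda j: sim[i, j],
        reverse=True,
    )
    return [(names[j], float(sim[i, j])) for j in ranked[:k]]


pairs = [
    ("Japanese", "fresh sushi and grilled fish over rice"),
    ("Sushi Bars", "sushi with raw fish and sashimi"),
    ("Italian", "homemade pasta with tomato sauce and red wine"),
    ("Pizza", "pizza with melted cheese, tomato sauce and crispy crust"),
]
names, sim = cuisine_similarity(pairs)
```

Concatenating reviews per cuisine keeps the similarity matrix at one row per cuisine (239 in the full dataset) regardless of how many reviews each cuisine has.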
Toolkit and libraries

To develop the application, I used:

Python - as general language
Flask - application server
amCharts - for charts and visualisation
Bootstrap - for user interface
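Uploading a subset of the dataset through the Web interface could look roughly like the Flask route below. This is a hypothetical sketch: the `/upload` URL, the `dataset` form field, the in-memory `ACTIVE_REVIEWS` store and the JSON Lines input format are assumptions, not the repository's actual endpoint.

```python
import json

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store for the review subset currently being analysed.
ACTIVE_REVIEWS = []


@app.route("/upload", methods=["POST"])
def upload_reviews():
    """Replace the active dataset with an uploaded JSON Lines file of reviews."""
    uploaded = request.files.get("dataset")
    if uploaded is None:
        return jsonify(error="missing file field 'dataset'"), 400
    reviews = []
    try:
        for line in uploaded.stream.read().decode("utf-8").splitlines():
            if line.strip():
                review = json.loads(line)
                reviews.append((review["business_id"], review["text"]))
    except (ValueError, KeyError, TypeError) as exc:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return jsonify(error=f"invalid review file: {exc}"), 400
    ACTIVE_REVIEWS[:] = reviews
    return jsonify(count=len(reviews))
```

With the server running, the route could be exercised with `curl -F "dataset=@reviews.json" http://localhost:5000/upload`.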

For data mining I used the following tools and libraries for Python:

Sklearn (scikit-learn) - for classification
Gensim - for text processing
NumPy - for numerical operations
NLTK - for text processing

User guide

The full user guide (PDF) is available at https://github.com/denisafanasev/CS598_YelpWizard/blob/master/docs/YelpWizard_User_guide.pdf

Run

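From the repository root, install the dependencies and start the Flask server in the background, listening on all network interfaces (Flask detects app.py automatically):

pip install -r requirements.txt
flask run --host=0.0.0.0 &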
