242-2016

Project for the "Para-normal Distributions" team (Team #0) of CMPS242 Fall 2016. The group members are:

Eriq Augustine
Varun Embar
Dhawal Joharapurkar
Xiao Li

The aim of this project is to experiment with clustering as a way to find similar businesses in the Yelp Challenge Dataset.

Code

We are using Python 3. The code directory contains the code for our clustering and evaluation.

Running run.py performs a short clustering run on a small (100 points) subset of our data.

Running experiments.py performs our actual experiments. It runs on the entire test set and tries many parameter combinations, so we suggest that you do not run it.

To run the tests, you can use the test.sh script. It is a very small script, but the command to run all tests in a directory is easy to forget.

The tests, run.py, and experiments.py do not hit the database by default. Instead, they load the data from a pickle generated by running data.py. If you want to use the database, then you will need a file called secrets.py that defines the constants used to connect to the database. The following constants must be defined:

DB_HOST
DB_PORT
DB_NAME
DB_USER
DB_PASS
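As an illustration, a secrets.py file might look like the sketch below. Every value is a placeholder to be replaced with your own settings, and the connection_kwargs helper is hypothetical: it only shows how the five constants could map onto the keyword arguments of psycopg2.connect, not how the project code actually consumes them.

```python
# secrets.py: a minimal sketch. All values are placeholders (assumptions).
DB_HOST = "localhost"
DB_PORT = 5432
DB_NAME = "yelp"
DB_USER = "yelp_user"
DB_PASS = "change-me"


def connection_kwargs():
    """Hypothetical helper: map the constants to psycopg2.connect keywords."""
    return {
        "host": DB_HOST,
        "port": DB_PORT,
        "dbname": DB_NAME,
        "user": DB_USER,
        "password": DB_PASS,
    }


# With psycopg2 installed, a connection could then be opened with:
#   import psycopg2
#   connection = psycopg2.connect(**connection_kwargs())
```

Since the file holds credentials, it should stay out of version control.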

Dependencies

Our project uses the numpy library, which will need to be installed prior to running.

In addition, if you are going to connect to the database, then you will also need to install the psycopg2 Postgres driver.

Data

The data directory mainly contains scripts for:

Generating SQL files from the Yelp JSON dataset.
Creating tables to hold the data.
Inserting the data.
Optimizing the data for our specific queries.

The build.sh script takes the data from the JSON files to optimized database tables.
