- Python for fast generation of SQL queries
- Jupyter Notebook and pandas for a better interface to query results
- HAWQ, a fast MPP (massively parallel processing) SQL engine on top of HDFS
- PL/Python for custom functions that the HAWQ engine executes in parallel
- MADlib for in-database machine learning and statistics, run in parallel
- I also wrote some Python scripts to scrape Airbnb and generate the dataset
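As an illustration of generating SQL from Python instead of hand-writing each query, here is a minimal sketch. It assumes a hypothetical `listings` table with columns such as `city`, `room_type` and `price` (not necessarily the real schema). Column names cannot be bind parameters, so they are checked against a whitelist; values go through psycopg2-style `%s` placeholders.

```python
# Hypothetical schema: a `listings` table with these columns.
ALLOWED_COLUMNS = {"city", "room_type", "neighbourhood", "price", "reviews"}


def build_avg_query(group_by, metric, min_reviews=0):
    """Return (sql, params) for an aggregate over the hypothetical `listings` table.

    Identifiers are validated against ALLOWED_COLUMNS because they cannot be
    passed as bind parameters; the numeric filter uses a %s placeholder.
    """
    for col in group_by + [metric]:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
    cols = ", ".join(group_by)
    sql = (
        f"SELECT {cols}, AVG({metric}) AS avg_{metric}, COUNT(*) AS n "
        "FROM listings "
        "WHERE reviews >= %s "
        f"GROUP BY {cols} "
        f"ORDER BY {cols}"
    )
    return sql, (min_reviews,)


# Generate one query per grouping rather than writing each by hand.
for grouping in (["city"], ["city", "room_type"]):
    sql, params = build_avg_query(grouping, "price", min_reviews=5)
    print(sql, params)
```

The `(sql, params)` pair can then be passed to a DB-API cursor (`cursor.execute(sql, params)`) or to `pandas.read_sql(sql, con, params=params)` to get the result back as a DataFrame.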
The easiest way to run this is to install Anaconda, a Python distribution that includes the conda package manager, available here: https://www.continuum.io/downloads
- If a package isn't found, install it with `conda install <package name>`.
- To run the Jupyter notebook, open a terminal (or cmd on Windows), navigate to the directory containing your `.ipynb` file, and run `jupyter notebook`.
- You will need to modify the connection string to connect to your Airbnb database.
- There are several TODOs at the bottom of the notebook.
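Because HAWQ speaks the PostgreSQL protocol, a `postgresql://` URL is one way to write the connection string. The sketch below builds such a URL with the standard library. The helper name `hawq_url` and all the host, user, port and database values are hypothetical placeholders for your own cluster's settings, not the notebook's actual configuration.

```python
from urllib.parse import quote_plus


def hawq_url(user, password, host, port, dbname):
    """Build a PostgreSQL-style URL for a HAWQ master (hypothetical helper).

    User and password are URL-escaped so characters such as '@' or '/'
    don't break the URL.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, dbname
    )


# Hypothetical values: replace with your own cluster's settings.
url = hawq_url("gpadmin", "s3cret@pw", "hawq-master.example.com", 5432, "airbnb")
print(url)  # postgresql://gpadmin:s3cret%40pw@hawq-master.example.com:5432/airbnb

# In the notebook, roughly (requires sqlalchemy, psycopg2 and pandas):
#   import pandas as pd
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
#   df = pd.read_sql("SELECT ...", engine)
```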