# Chicago Police Officer Scheduling System
The following data sets were used in this project:
- Chicago crimes: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
- Chicago traffic tracker: https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Historical-Congestion-Esti/77hq-huss
- Chicago traffic crashes: https://data.cityofchicago.org/Transportation/Traffic-Crashes-Crashes/85ca-t3if
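Records exported from the portal above carry timestamps in a 12-hour "MM/DD/YYYY hh:mm:ss AM/PM" format (an assumption based on the portal's CSV export, not something pinned down in this repo). A minimal sketch of parsing it with the standard library:

```python
# Sketch: parsing the timestamp format used by the Chicago data portal's
# CSV exports. The "%m/%d/%Y %I:%M:%S %p" format string is an assumption.
from datetime import datetime

def parse_portal_date(raw):
    """Convert a portal-style date string into a datetime object."""
    return datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")

print(parse_portal_date("09/05/2015 01:30:00 PM"))  # 2015-09-05 13:30:00
```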
The following must be installed on every VM in the cluster, including both the master node and the slave nodes. The instructions assume the VMs are running Ubuntu 16.04.6 LTS:
- Hadoop 3.1.3
- Spark 3.0.0
- Python 3.7.6
- Anaconda 4.5.11
- Install all the packages from `requirements.txt`
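After installing, a quick sanity check that the environment is in order can be scripted. A minimal sketch (the package names below are placeholders; the real list lives in `requirements.txt`):

```python
# Sketch of a post-install sanity check. The packages passed to
# check_environment are hypothetical examples, not the repo's actual list.
import importlib.util
import sys

def check_environment(required=("numpy", "pandas")):
    """Return a dict mapping each required package to True if importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in required}

# The cluster setup above assumes Python 3.7.x.
print(sys.version_info[:2])
print(check_environment())
```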
After running `jupyter notebook`, the notebooks should be straightforward to work through, as long as Hadoop and Spark are installed correctly.
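For reference, a notebook cell that talks to the cluster typically starts by creating a `SparkSession` and reading a data set from HDFS. A minimal sketch, assuming a hypothetical HDFS path and application name (neither is specified in this repo):

```python
# Sketch of the first cell of a notebook: connect to Spark and load a CSV.
# The app name and hdfs:// path below are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("crime-scheduling").getOrCreate()

crimes = (
    spark.read
    .option("header", True)       # first row holds column names
    .option("inferSchema", True)  # let Spark guess column types
    .csv("hdfs:///data/chicago_crimes.csv")
)
crimes.printSchema()
```

If this cell fails, the usual culprits are `SPARK_HOME`/`HADOOP_HOME` not being set or the HDFS daemons not running, rather than the notebook itself.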
Each folder has a `README.md` file that further explains what that folder's contents do.