- Census: https://www2.census.gov/programs-surveys/popest/datasets/2010/modified-race-data-2010/stco-mr2010_mt_wy.csv
- New York crimes: https://data.cityofnewyork.us/api/views/qgea-i56i/rows.csv?accessType=DOWNLOAD
- Washington Post shootings: https://github.com/domk11/BigDataProjectCrimes/blob/master/shootings_wash_post.csv
- Police Deaths: https://github.com/domk11/BigDataProjectCrimes/blob/master/police_killings.csv
- Presidential Elections (2016): https://github.com/domk11/BigDataProjectCrimes/blob/master/presidential_polls_2016.csv
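The three GitHub links point at the repository's HTML file viewer (`/blob/` URLs), so fetching them directly returns a web page rather than the CSV. Below is a minimal sketch, assuming the files stay on the `master` branch, that rewrites such links to their `raw.githubusercontent.com` form and downloads them with the standard library. The `data` destination folder and the helper names are hypothetical.

```python
import os
import urllib.parse
import urllib.request


def to_raw_url(url: str) -> str:
    """Rewrite a github.com/<owner>/<repo>/blob/<branch>/<path> link to raw.githubusercontent.com."""
    parts = urllib.parse.urlsplit(url)
    if parts.netloc != "github.com" or "/blob/" not in parts.path:
        return url  # not a GitHub blob link (e.g. the Census or NYC URLs): leave it untouched
    owner_repo, branch_and_path = parts.path.split("/blob/", 1)
    return f"https://raw.githubusercontent.com{owner_repo}/{branch_and_path}"


def download(url: str, dest_dir: str = "data") -> str:
    """Download url into dest_dir (hypothetical folder) and return the local file path."""
    os.makedirs(dest_dir, exist_ok=True)
    raw = to_raw_url(url)
    filename = os.path.basename(urllib.parse.urlsplit(raw).path)
    path = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(raw, path)
    return path
```

Non-GitHub URLs pass through unchanged, so the same `download` call can be used for every entry in the list. Note that the New York export is served as `rows.csv`, so you may want to rename it after downloading.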
- Load the datasets into a database named 'datascience', using the collection names defined in the contracts. Create an output folder as specified in the settings file.
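The term "collections" suggests a MongoDB database, although this is an assumption. The following is a minimal loading sketch that inserts one document per CSV row in batches. It accepts any object exposing `insert_many(list_of_dicts)`, which pymongo's `Collection` does. Values are kept as strings and left for the cleaning step to convert. The helper names and the batch size are hypothetical, and the actual collection and output-folder names must come from the contracts and the settings file.

```python
import csv
import os


def load_csv(path, collection, batch_size=1000, encoding="utf-8"):
    """Insert every row of a CSV file into `collection`, one document per row.

    `collection` is assumed to provide insert_many(list_of_dicts), as pymongo's Collection does.
    Returns the number of inserted rows.
    """
    inserted = 0
    batch = []
    with open(path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):  # header line becomes the document field names
            batch.append(dict(row))
            if len(batch) == batch_size:
                collection.insert_many(batch)
                inserted += len(batch)
                batch = []
    if batch:  # flush the last, partially filled batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted


def ensure_output_dir(path):
    """Create the output folder (path taken from the settings file) if it does not exist."""
    os.makedirs(path, exist_ok=True)
    return path
```

With pymongo this would be called as `load_csv("data/police_killings.csv", MongoClient()["datascience"][name])`, where `name` is the collection name given in the contracts. A command-line alternative is `mongoimport --db datascience --collection <name> --type csv --headerline --file <file>`. Unlike the sketch above, `mongoimport` infers numeric types.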
- Run dataset filtering and cleaning
python filter_original_dataset.py
- Run New York crime analysis
python crimes_type.py
- Run Washington Post killings analysis
python shoots.py
- Run Presidential elections analysis
python polls.py
- Run Police Deaths analysis
python police_deaths.py
- Run district census analysis
python pop_distribution.py
- Run census analysis with MapReduce (from the src/mapreduce directory)
cd src/mapreduce
python run.py
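The steps above run in a fixed order, and each one depends on the filtered data produced by the first step. Below is a minimal driver sketch that runs them in that order with the current Python interpreter and stops at the first failure. It assumes that the six top-level scripts are run from one directory (the hypothetical `root` parameter) and that `src/mapreduce` is relative to that same directory. Adjust `STEPS` if the repository layout differs. `dry_run` only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# (script, working directory relative to root), in the order listed in this README.
STEPS = [
    ("filter_original_dataset.py", "."),
    ("crimes_type.py", "."),
    ("shoots.py", "."),
    ("polls.py", "."),
    ("police_deaths.py", "."),
    ("pop_distribution.py", "."),
    ("run.py", "src/mapreduce"),
]


def build_commands(root="."):
    """Return (argv, cwd) pairs for every step, using the interpreter running this driver."""
    root_path = Path(root)
    return [([sys.executable, script], str(root_path / cwd)) for script, cwd in STEPS]


def run_all(root=".", dry_run=False):
    """Run every step in order; check=True raises CalledProcessError on the first failing step."""
    for argv, cwd in build_commands(root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_all(dry_run=True)` first is a cheap way to check the paths before starting the full pipeline.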