This code models childhood lead poisoning in the city of Chicago. The project is under development by Eric Potash and Joe Walsh at the University of Chicago's Center for Data Science and Public Policy, in partnership with the Chicago Department of Public Health (CDPH). For an overview of the project, see our preliminary results, published in the 21st ACM SIGKDD Proceedings.
This work is closely based on previous work by Joe Brew, Alex Loewi, Subho Majumdar, and Andrew Reece as part of the 2014 Data Science for Social Good Summer Fellowship.
The pipeline is divided into the phases described below. The code for each phase is located in the corresponding subdirectory and is executed using drake. The output of each phase is stored in a database schema of the same name.
###input
Preprocess and import our data into the database. CDPH provided us with three private databases:
- Blood Lead Level Tests
- Home Inspections
- WIC enrollment and program data
We supplemented that data with the following public datasets:
- Chicago addresses
- Cook County Assessor Data
- Building Footprints
- Building Permits
- Building Violations
- American Community Survey
###dedupe
Deduplicate the names of children from the blood tests and the WIC Cornerstone database.
###buildings
Analyze the Chicago buildings shapefile to extract all addresses and group them into buildings and complexes.
###aux
Process the data to prepare for model building. That includes summarizing and spatially joining datasets.
###output
Generate model features by aggregating the datasets at a variety of spatial and temporal resolutions.
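
As a rough illustration of what aggregating at several spatial and temporal resolutions can look like, the sketch below counts events per spatial unit within a trailing time window. It is not the project's implementation: the function `aggregate_counts`, the record fields (`address`, `tract`, `date`), and the sample records are all hypothetical.

```python
from collections import Counter
from datetime import date


def aggregate_counts(events, as_of, spatial_key, window_days):
    """Count events per spatial unit in the window_days before as_of."""
    counts = Counter()
    for event in events:
        age_days = (as_of - event["date"]).days
        # Keep only events on or before as_of and inside the window.
        if 0 <= age_days < window_days:
            counts[event[spatial_key]] += 1
    return counts


# Hypothetical records; the real data lives in the database.
events = [
    {"address": "A1", "tract": "T1", "date": date(2014, 6, 1)},
    {"address": "A2", "tract": "T1", "date": date(2013, 3, 1)},
    {"address": "A1", "tract": "T1", "date": date(2010, 1, 1)},
]

as_of = date(2015, 1, 1)

# One feature table per (spatial level, window length) pair.
features = {
    (level, days): aggregate_counts(events, as_of, level, days)
    for level in ("address", "tract")
    for days in (365, 3 * 365)
}
```

Each key in `features` identifies one resolution, so adding a coarser spatial level or a longer window only means extending the two tuples in the comprehension.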
###model
Use our drain pipeline to run models in parallel and serialize the results.
We run the workflow using drake. Specify the following environment variables in the lead/default_profile file:

    # PostgreSQL database connection information
    PGHOST=
    PGDATABASE=
    PGUSER=
    PGPASSWORD=
    ASSESSOR_FILE= # Cook County Tax Assessor MDB file
    CURRBLLSHORT_FILE= # Current blood lead levels CSV file
    M7_FILE= # Old blood lead levels CSV file
    INSPECTIONS_FILE= # Inspections CSV file
    CORNERSTONE_DIR= # Directory containing Cornerstone DBF files
    CORNERSTONE_ADDRESSES_FILE= # Geocoded Cornerstone addresses CSV file
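
Before starting a long run it can help to confirm that every variable in the profile is actually set. The following is a minimal sketch of such a check, assuming the profile has already been loaded into the environment. The helper `missing_vars` is hypothetical and not part of this repository; only the variable names come from the list above.

```python
import os

# Variable names copied from the profile listed above.
REQUIRED_VARS = [
    "PGHOST",
    "PGDATABASE",
    "PGUSER",
    "PGPASSWORD",
    "ASSESSOR_FILE",
    "CURRBLLSHORT_FILE",
    "M7_FILE",
    "INSPECTIONS_FILE",
    "CORNERSTONE_DIR",
    "CORNERSTONE_ADDRESSES_FILE",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Passing a plain dictionary to `missing_vars` instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.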