CDPH Childhood Lead Poisoning Model

This code models childhood lead poisoning in the city of Chicago. The project is under development by Eric Potash and Joe Walsh at the University of Chicago's Center for Data Science and Public Policy, in partnership with the Chicago Department of Public Health. For an overview of the project, see our preliminary results, which were published in the 21st ACM SIGKDD Proceedings.

This project is closely based on previous work by Joe Brew, Alex Loewi, Subho Majumdar, and Andrew Reece as part of the 2014 Data Science for Social Good Summer Fellowship.

The Solution

The code for each phase is located in the corresponding subdirectory and is executed using drake. The output of each phase is stored in a database schema of the same name.

### input

Preprocess and import our data into the database. CDPH provided us with three private databases:

  • Blood Lead Level Tests
  • Home Inspections
  • WIC enrollment and program data

We supplemented that data with the following public datasets:

### dedupe

Deduplicate the names of children from the blood tests and the WIC Cornerstone database.
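To illustrate the idea only, here is a minimal sketch of name deduplication by exact match on a normalized key. It is not the method used in this repository. The record layout (first name, last name, date of birth) and the helper names `normalize` and `group_records` are assumptions made for the example; a real pipeline would likely need fuzzy or probabilistic matching.

```python
from collections import defaultdict
from datetime import date


def normalize(name):
    """Lowercase, drop non-letter characters, and collapse whitespace."""
    kept = "".join(c for c in name.lower() if c.isalpha() or c.isspace())
    return " ".join(kept.split())


def group_records(records):
    """Group record indices that share a normalized (first, last, dob) key."""
    groups = defaultdict(list)
    for i, (first, last, dob) in enumerate(records):
        key = (normalize(first), normalize(last), dob)
        groups[key].append(i)
    return list(groups.values())


# Hypothetical records, not real data.
records = [
    ("Maria", "Lopez", date(2010, 5, 1)),
    ("MARIA ", "Lopez", date(2010, 5, 1)),
    ("Maria", "Lopez", date(2011, 5, 1)),
    ("John", "O'Neil", date(2009, 3, 2)),
    ("john", "ONeil", date(2009, 3, 2)),
]
clusters = group_records(records)
```

Each inner list in `clusters` holds the indices of records treated as the same child.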

### buildings

Analyze the Chicago buildings shapefile to extract all addresses and group them into buildings and complexes.

### aux

Process the data to prepare for model building. That includes summarizing and spatially joining datasets.

### output

Generate model features by aggregating the datasets at a variety of spatial and temporal resolutions.
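As a rough sketch of what aggregating at several spatial and temporal resolutions can look like, the function below summarizes blood test records per spatial unit over a lookback window. The field names (`address`, `tract`, `date`, `bll`), the function `aggregate`, and the sample values are all hypothetical and are not taken from this repository's schema.

```python
from datetime import date


def aggregate(tests, level, as_of, years):
    """Count tests and average BLL per spatial unit in the `years` before `as_of`.

    Assumes `as_of` is not February 29, so the shifted start date is valid.
    """
    start = date(as_of.year - years, as_of.month, as_of.day)
    sums = {}
    for t in tests:
        if start <= t["date"] < as_of:
            total, n = sums.get(t[level], (0.0, 0))
            sums[t[level]] = (total + t["bll"], n + 1)
    return {
        unit: {"count": n, "mean_bll": total / n}
        for unit, (total, n) in sums.items()
    }


# Hypothetical records, not real data.
tests = [
    {"address": "a1", "tract": "t1", "date": date(2012, 6, 1), "bll": 4.0},
    {"address": "a2", "tract": "t1", "date": date(2013, 6, 1), "bll": 8.0},
    {"address": "a1", "tract": "t1", "date": date(2010, 6, 1), "bll": 10.0},
    {"address": "a3", "tract": "t2", "date": date(2013, 2, 1), "bll": 2.0},
]
as_of = date(2014, 1, 1)

# One feature set per (spatial level, lookback window) pair.
features = {
    (level, years): aggregate(tests, level, as_of, years)
    for level in ("address", "tract")
    for years in (3, 5)
}
```

Keying the result by `(level, years)` keeps each resolution separate, so the sets can later be joined onto a per-child table as distinct feature columns.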

### model

Use our drain pipeline to run models in parallel and serialize the results.
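The general pattern of running several model configurations concurrently and serializing each result can be sketched with the Python standard library. This example does not use drain and does not reflect its API. The `fit` function is a hypothetical stand-in for real training, and a thread pool is used only to keep the sketch simple; CPU-bound training would more likely use a process pool.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fit(params):
    """Hypothetical stand-in for model training; returns a result dict."""
    threshold = params["threshold"]
    return {"params": params, "score": 1.0 / (1.0 + threshold)}


def run_and_save(params, out_dir):
    """Fit one configuration and pickle its result to `out_dir`."""
    result = fit(params)
    path = Path(out_dir) / "model_{}.pkl".format(params["threshold"])
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return path


# Hypothetical parameter grid.
grid = [{"threshold": t} for t in (1, 3, 4)]
out_dir = tempfile.mkdtemp()

with ThreadPoolExecutor(max_workers=3) as pool:
    paths = list(pool.map(lambda p: run_and_save(p, out_dir), grid))

# Load one serialized result back to check the round trip.
with open(paths[0], "rb") as f:
    loaded = pickle.load(f)
```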

Running the model

We run the workflow using drake. Specify the following environment variables in the lead/default_profile file:

# PostgreSQL database connection information
PGHOST=
PGDATABASE=
PGUSER=
PGPASSWORD=

ASSESSOR_FILE= # Cook County Tax Assessor MDB file
CURRBLLSHORT_FILE= # Current blood lead levels CSV file
M7_FILE= # Old blood lead levels CSV file
INSPECTIONS_FILE= # Inspections CSV file
CORNERSTONE_DIR= # Directory containing Cornerstone DBF files
CORNERSTONE_ADDRESSES_FILE= # Geocoded Cornerstone addresses CSV file
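Before starting the workflow, it can help to confirm that every variable listed above is set. The check below is a small sketch and not part of this repository. The helper name `missing_variables` is hypothetical, and it assumes the profile has already been loaded into the process environment.

```python
import os

# Variable names copied from the profile listing above.
REQUIRED = [
    "PGHOST",
    "PGDATABASE",
    "PGUSER",
    "PGPASSWORD",
    "ASSESSOR_FILE",
    "CURRBLLSHORT_FILE",
    "M7_FILE",
    "INSPECTIONS_FILE",
    "CORNERSTONE_DIR",
    "CORNERSTONE_ADDRESSES_FILE",
]


def missing_variables(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing variables: " + ", ".join(missing))
```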

Software we use
