Skip to content

ndanielsen/canimakeit-app

Repository files navigation

##Data Exploration for DDL Incubator Group 4

##Team

Mason Unger - Statistics, Graphing, Team Lead

Nathan Danielsen -- Data Analysis, Model building and validation, and Django stuff

Jordan Lewis - Data wrangling, Database

Max Heiber - Front End, APIs, Scraping

##Problem Statement

Distill economic data from BLS and various public sources, combined with user input about income and spending to simply answer a simple question:

“Can I make it in this city?”

###Research Questions

Can we make a prediction on if someone can 'make it' if they move to DC (or any other large city)?

What is making it in DC? Can we define making it or is it defined by user preferences?

Does 'making it' require a certain income threshold or budget breakdown for different locations (zip code, neighborhood) depending on the persons gross household income and/or household size?

Can we recommend or predict where the best place is for a person to live based on a combination of their resources, quality of life preferences and other information/ preferences?

What are the strongest dependant variables (ie. costs for housing, grocery or insurance; or perhaps walkability score?) that feed into 'making it' or not?

Can we predict if 'making it' will change over time due to inflation or other changes in consumer prices?

###Hypothesis

I hypothesize that by using open data sources (open and user-given) and by using widely accepted models for 'making it', we will find that a ratio of housing price to net-income, cost/ease of the work commute, and ability to save are the greatest determinants of making it.

###Value Proposition

We can give users a realistic picture of the total costs of living in a certain area and give them guidance on if they can make it or not - in short term and long term.

###Strategy for Analysis

For the DC Market:

Segment - Divide and segment publically available consumer expenditure data into demographic groups (young singles under 25 and 25+, married without kids etc)

Location - Break down all data in DC by Ward, Zip code and/or neighboorhood

Machine Learning Applications - For microdata, unsupervised clustering approaches would interesting to pull out patters.

Output

Attempt to create a model (min, mean, max) of consumer purchasing power by segment and location in DC

With this model, can we complement this model with additional information to give users a more realistic understanding of 'can they make it'?

##Data Sources

Seed Data

Bureau of Labor Statistics Data Consumer Expenditure (CE) tabular data CE microdata Component CPI price indexes by elementary item and area

Guide for CE Survey Data in R

Getting started with Consumer Expenditure Survey Public-Use Microdata

Can we find more data out there on this?

Complementary Data

Craigs List's Apartment Listings

Zillow API

Data Feature Backlog

Mint Intergration

##Budget Models We should define 3 classes of "Making It" Budgets. Anything below the Min would be the danger zone for not making it.

"Min - Barely Making It" Frugal Budget

"Mean - Making It" Young Adult Budget - Can Make it

"Max - Got It Made" Where the poor and rich really spend their money

*Links are illistrative for getting a sense of the budget models

##Already Invented Features (To refactor and not reinvent wheels)

###Craigslist

Craigslist Apartment Crawler Craig's List Apartment Crawler

craigsuck: A Craigslist RSS poller

###Zillow API

PyZillow

###Mint API a screen-scraping API for Mint.com

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published