This repository has been archived by the owner on Sep 19, 2019. It is now read-only.

texastribune/the-dp

The Texas Higher Education Data Project

A very rough guide to starting development

Example .env file for environment variables:

DJANGO_SETTINGS_MODULE=exampleproject.settings.dev
DATABASE_URL=postgis:///tx_highered
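To make the two variables concrete, here is a minimal sketch that parses the example `.env` text and splits `DATABASE_URL` into its parts using only the standard library. The `parse_env` helper is hypothetical and written for illustration; how the project itself reads these variables is not shown here.

```python
from urllib.parse import urlsplit


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# The example .env contents from above.
EXAMPLE = """\
DJANGO_SETTINGS_MODULE=exampleproject.settings.dev
DATABASE_URL=postgis:///tx_highered
"""

env = parse_env(EXAMPLE)
url = urlsplit(env["DATABASE_URL"])

print(url.scheme)            # postgis
print(url.hostname)          # None (the URL names no host)
print(url.path.lstrip("/"))  # tx_highered
```

The triple slash means the URL carries no host, user, or port, only the database name `tx_highered`.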

Complete guide to getting started (skip any steps you don't need):

# prerequisite: install the postgresql and libpq-dev system packages

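# $PROJECT_DIR is a placeholder for the checkout directory
# (don't use $PATH here: that is the shell's executable search path)
git clone $REPOSITORY $PROJECT_DIR && cd $PROJECT_DIR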
mkvirtualenv tx_higher_ed
setvirtualenvproject
add2virtualenv .
pip install -r requirements.txt

# if you need to create a database:
# `postdoc` greatly simplifies connecting to Docker databases
pip install postdoc
phd createdb --encoding=UTF8 -T template0
echo "CREATE EXTENSION postgis;" | phd psql
echo "CREATE EXTENSION postgis_topology;" | phd psql

# or if you need to reset your database:
make resetdb

# syncdb and load fixtures
make syncdb

#######################################################################
# You can stop at this point if you're just playing with the project. #
#######################################################################

# if using 2012 data, bump it up to 2014 standards
python tx_highered/scripts/2014_update.py

# get ipeds data, requires https://github.com/texastribune/ipeds_reporter
../ipeds_reporter/csv_downloader/csv_downloader.py \
  --uid data/ipeds/ipeds_institutions.uid --mvl data/ipeds
mv ~/Downloads/Data_*.csv data/ipeds
# get thecb data
cd data && make all
# load data
#   timing: 10m25.069s
make load
# post-process the data
python exampleproject/manage.py tx_highered_process


####################################
# placeholder for post-2014 update #
####################################
# once the next update lands, the 2012->2014-specific steps can be removed
# and the import instructions above can be updated

Database

This project currently requires a PostGIS database (hopefully not for long):

$ phd createdb
$ phd psql

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;

Moving data between databases

You can use a SQL dump to move data from one PostgreSQL database to another (excluding geo info):

$ phd SOURCE_DATABASE_URL pg_dump --no-owner --no-acl --table='tx_highered*' --clean > tx_highered.sql
$ phd DEST_DATABASE_URL psql -f tx_highered.sql
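If you would rather script the move, the same two commands can be driven from Python. This is only a sketch: the function names are made up for illustration, and it assumes `phd`, `pg_dump`, and `psql` are on your path and that both environment variables are set. Passing the arguments as a list means no shell is involved, so the `tx_highered*` pattern reaches `pg_dump` untouched.

```python
import subprocess


def dump_command(source_var="SOURCE_DATABASE_URL"):
    """Argument list for dumping the tx_highered* tables via postdoc."""
    return [
        "phd", source_var, "pg_dump",
        "--no-owner", "--no-acl", "--table=tx_highered*", "--clean",
    ]


def restore_command(dest_var="DEST_DATABASE_URL", dump_file="tx_highered.sql"):
    """Argument list for loading the dump into the destination database."""
    return ["phd", dest_var, "psql", "-f", dump_file]


def move_data(dump_file="tx_highered.sql"):
    """Dump from the source database, then load into the destination."""
    with open(dump_file, "wb") as out:
        subprocess.run(dump_command(), stdout=out, check=True)
    subprocess.run(restore_command(dump_file=dump_file), check=True)


# Show the commands without running them; call move_data() to execute.
print(" ".join(dump_command()))
print(" ".join(restore_command()))
```

`check=True` makes a failed dump raise an exception before the restore step runs, so a broken dump file is never loaded.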

After deploy

  1. Freeze the current data in a fixture
    1. Edit the tx_highered_YYYY.json.gz make task
    2. Run the task to save the data
  2. Adjust the loading scripts to reference the new fixture
  3. Deprecate (or delete) any one-time data migration scripts, e.g. 2014_update.py won't be necessary after 2015
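After freezing the data, it can be worth sanity-checking the fixture before pointing the loading scripts at it. The sketch below assumes the fixture is a gzipped JSON file in Django's standard fixture layout (a list of objects, each with a `model` key). The `summarize_fixture` helper is hypothetical and not part of this repository.

```python
import gzip
import json
from collections import Counter


def summarize_fixture(path):
    """Count the objects per model in a gzipped Django JSON fixture."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        objects = json.load(handle)
    return Counter(obj["model"] for obj in objects)
```

Calling it on the file produced by the make task returns a mapping of model labels to object counts, which makes an empty or truncated dump easy to spot.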

Getting Data from the IPEDS Data Center

When the IPEDS Data Center asks you for an institution, enter the list of UnitIDs generated by running this in the Django shell:

from tx_highered.models import Institution  # assumed import path
list(Institution.objects.filter(ipeds_id__isnull=False).values_list('ipeds_id', flat=True))
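The query returns a Python list, which usually needs tidying before it can be pasted into a form. Here is a small sketch that deduplicates the IDs and joins them with commas. It assumes the Data Center accepts a comma-separated list, so check the form before relying on it. The `format_unit_ids` helper is hypothetical, and the sample values are placeholders, not real UnitIDs.

```python
def format_unit_ids(ids):
    """Return the unique UnitIDs, sorted, as one comma-separated string."""
    unique = sorted({int(i) for i in ids if i is not None})
    return ",".join(str(i) for i in unique)


# Placeholder values standing in for the queryset result above.
sample = [100002, 100001, None, 100002]
print(format_unit_ids(sample))  # 100001,100002
```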

Getting Data from the Texas Higher Education Coordinating Board

If you want to re-download data from THECB's website, first find the data file that you want to refresh. It will be named something like "top_10_percent.html". Next to it there will be a file called "top_10_percent.POST", which holds the form data that was posted to generate the report. From that file you can recreate the report with the command:

curl -X POST -d @top_10_percent.POST http://www.txhighereddata.org/interactive/accountability/InteractiveGenerate.cfm -s -v > blahblahblah.html

If you need to modify the report, you can reverse-engineer it from the POST data and the form markup.
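As a starting point for that reverse engineering, the sketch below decodes a saved `.POST` file into readable field/value pairs and rebuilds the same request in Python. It assumes the `.POST` file holds a URL-encoded form body, as the `curl -d @file` usage implies. The helper names are hypothetical. Like `curl -d @file`, it strips carriage returns and newlines from the file before sending.

```python
from pathlib import Path
from urllib.parse import parse_qsl
from urllib.request import Request, urlopen

REPORT_URL = (
    "http://www.txhighereddata.org/interactive/accountability/"
    "InteractiveGenerate.cfm"
)


def read_post_body(post_file):
    """Read a saved .POST file, dropping line breaks as `curl -d @file` does."""
    raw = Path(post_file).read_bytes()
    return raw.replace(b"\r", b"").replace(b"\n", b"")


def form_fields(body):
    """Decode the form-encoded body into (name, value) pairs for inspection."""
    return parse_qsl(body.decode("utf-8"), keep_blank_values=True)


def build_request(post_file, url=REPORT_URL):
    """Build (but do not send) the POST request that regenerates a report."""
    return Request(
        url,
        data=read_post_body(post_file),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_report(post_file, out_file):
    """Send the request and save the HTML response to out_file."""
    with urlopen(build_request(post_file)) as response:
        Path(out_file).write_bytes(response.read())
```

Printing `form_fields(read_post_body("top_10_percent.POST"))` lists each form field on its own, which is easier to compare against the form markup than the raw encoded string.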

(c) 2012 The Texas Tribune