Quick Feed is a series of interconnected scripts that parse CSV files in the Voting Information Project (VIP) CSV format and, assuming no serious issues, build the data into a VIP XML document. It also provides an audit of the build process, ranging from a high-level summary to low-level reports on data quality.
While these scripts are currently the method of VIP XML document creation, they will remain largely underdeveloped because they will eventually be superseded by the new data processing pipeline, Metis. Many of the inclusions are admittedly hacks to make the building process a bit simpler.
Quick Feed requires the following components to be installed first:
- Python 2.7
- PostgreSQL 9+
To follow the installation instructions below, Homebrew is also needed.
The simplest way to use Quick Feed is via Docker. If you can't or won't use Docker, see the next section for full installation instructions.
docker build -t quick_feed .
docker run -v /your/data/dir:/data -v /your/feed/output/dir:/feeds -v /your/error/reports/dir:/reports quick_feed --state co
The following instructions assume installation on a Mac.
First, if you haven't already, install Homebrew. Life will become infinitely easier.
Install both Python and PostgreSQL using Homebrew
$ brew install postgres python
Clone the repository
$ git clone https://github.com/votinginfoproject/quick_feed.git
Install virtualenv, set up a virtual environment, and install the required libraries
$ pip install virtualenv
$ virtualenv --no-site-packages ~/path/to/virtualenv
$ source ~/path/to/virtualenv/bin/activate
$ pip install -r requirements.txt
At this point, as long as the other dependencies are set up correctly, quick_feed will work when given the proper arguments:
usage: quick_feed.py [-h] --report-dir REPORT_DIR --feed-dir FEED_DIR
                     [--tmp-dir TMP_DIR] --state STATE [--county COUNTY]
                     [--data-type {db_flat,element_flat,feed}] [--voterfile]
                     [--dbname DATABASE] [--dbuser USERNAME]
                     [--dbpass PASSWORD] [--schema SCHEMA_URL]
                     CSV_DIR

This application processes structured VIP CSV data into a VIP XML document
while performing validation on the data.

positional arguments:
  CSV_DIR               the directory containing VIP CSV files

optional arguments:
  -h, --help            show this help message and exit
  --report-dir REPORT_DIR
                        the output directory for reports
  --feed-dir FEED_DIR   the output directory for the feed
  --tmp-dir TMP_DIR     a temp directory for parsing
  --state STATE         the abbreviation of the state
  --county COUNTY       the full name of the county
  --data-type {db_flat,element_flat,feed}
                        the VIP CSV data file type (NB: only current supported
                        type is "db_flat")
  --voterfile           flag to validate the data with the assumption it's
                        from a voterfile
  --dbname DATABASE     the database name
  --dbuser USERNAME     username to connect to the database
  --dbpass PASSWORD     password to connect to the database
  --schema SCHEMA_URL   the url to a VIP schema version
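To make the required arguments concrete, here is a minimal sketch that assembles a quick_feed.py command line from Python. The helper name build_quick_feed_command and the example paths are hypothetical, and the snippet is a standalone illustration rather than part of quick_feed. Only the flags themselves come from the help text above.

```python
def build_quick_feed_command(csv_dir, report_dir, feed_dir, state,
                             county=None, script="quick_feed.py"):
    """Return an argv list for quick_feed.py (hypothetical helper).

    Only flags listed in the help text are used. --report-dir, --feed-dir,
    --state and the positional CSV_DIR are the required ones.
    """
    cmd = ["python", script,
           "--report-dir", report_dir,
           "--feed-dir", feed_dir,
           "--state", state]
    if county is not None:
        cmd += ["--county", county]  # optional: full name of the county
    cmd.append(csv_dir)  # the positional CSV_DIR goes last
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own directories.
    command = build_quick_feed_command(
        csv_dir="/your/data/dir",
        report_dir="/your/error/reports/dir",
        feed_dir="/your/feed/output/dir",
        state="co",
    )
    print(" ".join(command))
    # To actually run it from an activated virtualenv:
    #   import subprocess; subprocess.check_call(command)
```

Building the command as a list (rather than one string) avoids quoting problems if a directory or county name contains spaces.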
Install VirtualBox to add a virtual machine to your Windows computer, choosing Ubuntu as the guest OS
A tutorial
Read this settings article to choose your settings while you create and set up your VM
Be sure to select the option to have a shared clipboard - this will make things easier when you try to copy and paste commands or errors between machines.
Install the Linux Guest Additions
Set up three shared folders between Windows and your VM: one for feeds, one for reports, and one for data. See the VirtualBox documentation for instructions on setting up shared folders.
You do not need to install PostgreSQL or Python, because they should both be included in Ubuntu's default distribution.
Install git
sudo apt-get install git-core
Clone the repository
git clone https://github.com/votinginfoproject/quick_feed.git
Create a new database, user and password for quick_feed to use
psql postgres
CREATE USER vip WITH SUPERUSER PASSWORD '[choose pw]';
CREATE DATABASE quickfeed OWNER vip;
Install easy_install
sudo apt-get update
sudo apt-get install python-setuptools python-dev build-essential git-core -y
Install pip
sudo easy_install pip
Install virtualenv
sudo pip install virtualenv
Create a directory for your virtual environment
mkdir ~/virtualenvs
Create a virtualenv
virtualenv --no-site-packages ~/virtualenvs/quickfeed
Load your virtualenv
source ~/virtualenvs/quickfeed/bin/activate
Install the requirements
pip install -r requirements.txt
You are now ready to use quick_feed at the command line, as shown above. Every time you want to begin using quick_feed, you should do the following:
1. Mount the three shared folders:
sudo mount.vboxsf [shared data folder name] ~/data
sudo mount.vboxsf [shared feeds folder name] ~/feeds
sudo mount.vboxsf [shared reports folder name] ~/reports
2. Load the virtualenv: source ~/virtualenvs/quickfeed/bin/activate
3. cd to the directory: cd ~/quick_feed
When you are finished running quick_feed, type deactivate to shut down your virtualenv.
A successful run takes a while, exits gracefully, and produces a report_summary.txt file wherever you pointed the reports directory. If things went well, it also produces the feed wherever you pointed the feed output. Check the summary file for errors, and see the accompanying error files to find out whether anything needs to be cleaned up.
However, more than likely there will be data file errors that prevent a report from being generated. See the section below for some of the typical errors. In these cases you'll most likely get a stack trace output, and have to use that as your guide to figuring out the type of error you encountered.
- Non-ASCII characters. The data fairly frequently contains non-ASCII characters that kill the feed generation process. The error gives you the 0xNN character code, so you can look up the code, find the character, copy it, and then search for it in the data and replace it with whatever is appropriate. Common non-ASCII characters include curly single and double quotes and em dashes. Referenda might also include things like the section sign character.
- File and column names. Files need to be named after the tables they are loaded into, and their columns need to be named appropriately as well. So far all I can suggest is to find a good data set and use it as a guide, unless you want to pick apart the schema from one of the Python files (feedconf.py).
- Invalid field inputs. Some of the schema fields are enums, and while they handily accept an empty string value, if the export has the value NULL, this will fail. I'm sure there are many other examples of this.
- No election. Every import needs an election to base the data off of, so if election.txt/election.csv is missing, this will fail with an obscure error that points to a line in the file trying to read the date from a database cursor/result set.
- More than one election. Also, there should only be a single election in the feed building stage. Fun!
- Non-data files in data directory. The import is very fragile, and expects all the files in the directory to be part of the import and fails if some are not.
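Several of the errors above can be caught before starting a build. The following is a rough pre-flight sketch, not part of quick_feed: the function names are hypothetical, and it assumes the data files are comma-delimited, have a header row, and end in .txt or .csv. It is written for Python 3 as a standalone helper, so run it outside quick_feed's Python 2.7 virtualenv. It does not check table or column names, since those depend on the schema in feedconf.py.

```python
import csv
import os
import sys

DATA_EXTENSIONS = (".txt", ".csv")  # assumption: data files use these extensions


def find_non_ascii(path):
    """Yield (line_number, byte_offset, hex_code) for each non-ASCII byte."""
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            for offset, byte in enumerate(bytearray(line)):
                if byte > 0x7F:
                    yield line_number, offset, "0x%02X" % byte


def find_null_literals(path):
    """Yield (row_number, column_name) for cells holding the literal text NULL."""
    # latin-1 decodes any byte, so odd characters cannot abort this check.
    with open(path, "r", encoding="latin-1", newline="") as handle:
        # Row 1 is the header, so data rows are numbered from 2.
        for row_number, row in enumerate(csv.DictReader(handle), start=2):
            for column, value in row.items():
                if value == "NULL":
                    yield row_number, column


def count_elections(data_dir):
    """Return the number of election rows, or None if no election file exists."""
    for name in ("election.txt", "election.csv"):
        path = os.path.join(data_dir, name)
        if os.path.isfile(path):
            with open(path, "r", encoding="latin-1", newline="") as handle:
                return sum(1 for _ in csv.DictReader(handle))
    return None


def preflight(data_dir):
    """Return a list of human-readable problems found in data_dir."""
    problems = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path) or not name.lower().endswith(DATA_EXTENSIONS):
            problems.append("%s: not a .txt/.csv data file" % name)
            continue
        for line_number, offset, code in find_non_ascii(path):
            problems.append("%s line %d byte %d: non-ASCII byte %s"
                            % (name, line_number, offset, code))
        for row_number, column in find_null_literals(path):
            problems.append("%s row %d column %s: literal NULL"
                            % (name, row_number, column))
    elections = count_elections(data_dir)
    if elections is None:
        problems.append("no election.txt or election.csv found")
    elif elections != 1:
        problems.append("expected exactly 1 election row, found %d" % elections)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in preflight(sys.argv[1]):
        print(problem)
```

Run it with the data directory as the only argument. An empty output means none of these particular checks fired, not that the build is guaranteed to succeed.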