As of 2012-05-28, the EPSRC "Grants on the Web" database contains grants dating back to 1985:
- 43,317 grants
- £10,935,828,250 of committed funding
This repository contains tools for scraping and manipulating this data into a useful format.
Instead of using the code in this repository, why not download the database generated by it? A listing of the generated databases can be found at CKAN.
- Latest database (SQL format, updated 2012-05-28)
- See this data visualised at OpenSpending
To get up and running, you will need python and pip. Instructions for installing python are beyond the scope of this document, but you should be able to get pip if you don't have it by running:
$ easy_install pip
If you use virtualenv, you might want to create an environment to play around in:
$ virtualenv --distribute pyenv
$ source pyenv/bin/activate
Now install the scraper and its dependencies:
$ pip install -e .
Create the database (or migrate your existing database to the current schema version) by running:
$ ./migrate.sh epsrc.db 4
This scraper operates on a local mirror of the EPSRC GOW website. You must first make this mirror (this will take a long time to complete, often several days):
$ wget -c -m gow.epsrc.ac.uk
Then you can use the epsrc-scrape tool to parse this data:
$ epsrc-scrape epsrc.db ./gow.epsrc.ac.uk
Each grant has a unique grant reference, e.g. GR/J50118/01. We can check that the example grant above made it into the database with the following command:
$ sqlite3 -line epsrc.db "select * from grants where id='GR/J50118/01'"
id = GR/J50118/01
title = USE OF CATALYTIC MEMBRANES FOR IN SITU REACTION AND SEPARATION
value = 114608
principal_investigator_id = 40792
department_id = 5273
created_at = 2011-06-16 12:36:35.164790
modified_at = 2011-06-16 12:36:35.164800
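The same lookup can be done from Python with the standard sqlite3 module. The following is a minimal sketch, not part of the scraper itself: the function names get_grant and grant_totals are hypothetical, and the only schema it assumes is a grants table with the id and value columns shown in the output above. It also assumes that value holds a whole number of pounds, which the README does not state explicitly.

```python
import sqlite3


def get_grant(conn, grant_id):
    """Return one grant as a dict keyed by column name, or None if absent.

    Assumes a `grants` table with an `id` column, as in the output above.
    """
    cur = conn.execute("SELECT * FROM grants WHERE id = ?", (grant_id,))
    row = cur.fetchone()
    if row is None:
        return None
    # cursor.description holds one 7-tuple per column; item 0 is the name.
    columns = [col[0] for col in cur.description]
    return dict(zip(columns, row))


def grant_totals(conn):
    """Return (number of grants, sum of the `value` column).

    Assumption: `value` is a numeric amount in pounds.
    COALESCE makes the sum 0 rather than NULL for an empty table.
    """
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(value), 0) FROM grants"
    ).fetchone()
    return count, total
```

With a populated database, get_grant(sqlite3.connect("epsrc.db"), "GR/J50118/01") should return the row shown above as a dictionary, and grant_totals gives a quick way to compare your local database against the headline figures at the top of this document.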
Be careful with the data this scraper generates. Care has been taken to sanitize the data, but as with any scraper, there may be inconsistencies or missing entries. Please let me know if you find any problems.
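One way to look for such problems is to query for rows that are obviously incomplete. The sketch below is illustrative only: find_suspect_grants is a hypothetical helper, it assumes the grants table has the id, title and value columns shown earlier, and its idea of "suspect" (a blank title, or a missing or non-positive value) is an assumption rather than a rule the scraper enforces.

```python
import sqlite3


def find_suspect_grants(conn):
    """Return the ids of grants that look incomplete, sorted by id.

    A grant is flagged when its title is NULL or blank, or when its
    value is NULL or not greater than zero. These criteria are an
    assumption for illustration, not part of the scraper.
    """
    rows = conn.execute(
        "SELECT id FROM grants "
        "WHERE title IS NULL OR TRIM(title) = '' "
        "OR value IS NULL OR value <= 0 "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

Running find_suspect_grants(sqlite3.connect("epsrc.db")) returns a list of grant references worth checking by hand against the EPSRC website before reporting them.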