Takes data from various sources and puts it into a "Data Catalogue".
See the instructions in the https://github.com/nhsengland/iit-infrastructure/tree/master/ansible README.rst file.
1. Set up a virtualenv using your favourite tool for doing so, and activate it.
2. git clone https://github.com/nhsengland/publish-o-matic.git
3. python setup.py install (or python setup.py develop if you insist on the code being changeable)
4. See below for setting up cronjobs.
-
To manually run a scraper, do:

    run_scraper <NAME>

where <NAME> is the name of a module in the datasets package.
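An entry point like run_scraper is typically a thin dynamic-import dispatcher. A minimal sketch, assuming each dataset module exposes a main() function (that layout and function name are assumptions for illustration, not the project's actual code):

```python
import importlib
import sys


def run_scraper(name):
    """Run the dataset module datasets.<name>.

    Hypothetical sketch: assumes each module under the datasets
    package exposes a main() entry point, which may not match
    publish-o-matic's real layout.
    """
    try:
        module = importlib.import_module("datasets." + name)
    except ImportError:
        sys.exit("No such dataset module: {}".format(name))
    module.main()
```

The dynamic import keeps the dispatcher ignorant of individual scrapers, so adding a dataset means adding a module, not editing the runner.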
TODO: Merge steps 2 and 3.
crontool isn't fully trusted yet, so review its output before installing it:
$ crontool > mycrontab
$ less mycrontab
** does it look sane? **
$ crontab mycrontab
Or, if you're feeling brave (how wrong can it be?):
$ crontool | crontab
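A crontool-style generator just emits one crontab line per scraper. A sketch of the idea, assuming a nightly schedule and the run_scraper command above (the schedule string and output format are illustrative assumptions, not what crontool actually produces):

```python
def make_crontab(scrapers, schedule="0 2 * * *"):
    """Build crontab text running each scraper on a fixed schedule.

    Hypothetical sketch of a crontool-style generator; the real
    tool's schedule and command format may differ. The default
    schedule "0 2 * * *" means 02:00 every day.
    """
    lines = ["{} run_scraper {}".format(schedule, name) for name in scrapers]
    return "\n".join(lines) + "\n"
```

Generating the file and piping it through less-then-crontab, as above, keeps a human in the loop until the output is trusted.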
Contains individual STL (Scrape, Transform, Load) procedures for curated datasets.
Each dataset directory is expected to contain a data dir (for cached/retrieved data files) and three files:
- scrape.py - scrape the data files and metadata
- transform.py - make adjustments / additions to scraped metadata as required
- load.py - load the datasets into a CKAN instance.
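The three-step contract above can be sketched as a small driver that runs a dataset's stages in order. The stage signatures and the data-dir convention here are assumptions for illustration, not the project's actual interfaces:

```python
import os


def run_dataset(dataset_dir, scrape, transform, load):
    """Run an STL pipeline: scrape into data/, transform, then load.

    Hypothetical sketch: assumes scrape() returns metadata for the
    files it cached under the dataset's data dir, transform() returns
    adjusted metadata, and load() pushes it to a CKAN instance.
    """
    data_dir = os.path.join(dataset_dir, "data")
    os.makedirs(data_dir, exist_ok=True)  # cache dir for retrieved files
    metadata = scrape(data_dir)        # scrape.py: fetch data files + metadata
    metadata = transform(metadata)     # transform.py: adjust/augment metadata
    load(metadata)                     # load.py: load datasets into CKAN
```

Keeping the three stages as separate files means a failed load can be retried from cached data without re-scraping the source.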