This extension is a modification of the already existing extension that can be found here https://github.com/ckan/ckanext-dcat. The extension provides functionality for logging data about harvest activity done on each dataset, and functionality that checks if the dataset already exists based on its resources. It alse extends the original extension by adding two custom harvesters which are adapted for harvesting data from http://data.norge.noand http://geonorge.no.
-
Install ckanext-harvest (https://github.com/ckan/ckanext-harvest#installation)
-
Install the extension on your virtualenv:
(pyenv) $ pip install -e git+https://github.com/OpenTransportDataProject/ckanext-customharvesters.git#egg=ckanext-customharvesters
-
Install the extension requirements:
(pyenv) $ cd /usr/lib/ckan/default/src/ (pyenv) $ pip install -r ckanext-customharvesters/requirements.txt
-
Enable the required plugins in your ini file:
ckan.plugins = dcat dcat_rdf_harvester dcat_json_interface geonorgeHarvester datanorgeHarvester
-
Restart apache:
sudo service apache2 restart
-
Navigate to the harvester interface by going to http://yourCKANIP/harvest
-
Press the ”Add Harvest Source” button and fill in the necessary information about your harvest source.
-If you want to add a general tag to all datasets(for example ”OTD”), add the following line in the configurationfield:"default_tags": [{"name": "OTD"}]
-Example URL for data.norge.no: http://data.norge.no/api/dcat/data.json
-Example URL for geonorge.no: https://kartkatalog.geonorge.no/api/search?text=kystverket&limit=10000000
Remember to hit save when you are done
-
Choose the harvest source you created from the list found at http://yourCKANIP/harvest
-
Press the ”Admin”-button. (you must be logged in as a sysadmin user)
-
Press the ”Reharvest”-button, you have now created a new harvest job.
-
Login to your CKAN server through SSH.(How to do this depends on your own setup)
-
Activate your CKAN virtual environment(How to do this may vary, depending on your CKAN installation):
/usr/lib/ckan/default/bin/activate
-
Run the gather_consumer-command:
(pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini
-
On another terminal, run the fetch_consumer-command:
(pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini
-
Finally, on a third console, run the run-command to start any pending harvesting jobs:
(pyenv) $ paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini