IATI.cloud



IATI.cloud extracts all published IATI XML files from the IATI Registry and makes them available in a normalised PostgreSQL database that you can access through a RESTful API serving XML, JSON and CSV. The project also stores all of the parsed data in Apache Solr cores, allowing for faster querying, and the codebase is deployed in a high-availability cluster for IATI as the IATI Datastore. The IATI.cloud project currently encompasses two APIs.

IATI is a global aid transparency standard that makes information about aid spending easier to access, re-use and understand through a unified open standard. You can find out more about the IATI data standard at www.iatistandard.org.

Requirements

  • Python 3.6.5. Tip: for managing multiple versions of Python you can use pyenv.
  • PostgreSQL (latest version).
  • PostGIS (latest version). This might already be installed, depending on how PostgreSQL was installed.
  • RabbitMQ (latest version).
  • Apache Solr 8.2.0.
  • Python requirements: installed from requirements.txt (see the instructions below).
  • Disk space: at least 1 GB is recommended so that the repository, the PostgreSQL database, Apache Solr and the required services can be installed. Keep in mind that parsing and indexing datasets increases the overall size of the IATI.cloud project, which can reach 80 GB or more.

Setting up your IATI.cloud environment

  1. Go to folder root/OIPA.
  2. Create a virtual environment with the correct Python version; the recommended name is 'env' (ex: virtualenv env -p python3)
  3. Activate the virtual environment (ex: source env/bin/activate)
  4. Install the required libraries using pip install -r requirements.txt
  5. Make sure PostgreSQL is running on your installation (ex: sudo systemctl status postgresql)
  6. Run pre-commit install --hook-type commit-msg
  7. Create a PostgreSQL database
  8. Add the following .env file to the current working directory:
OIPA_DB_NAME=oipa
OIPA_DB_USER=oipa
OIPA_DB_PASSWORD=oipa
DJANGO_SETTINGS_MODULE=OIPA.development_settings
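These variables configure the database connection that Django uses. As a purely illustrative sketch of how such values are typically read (the actual settings live in OIPA's settings modules, and the PostGIS backend here is an assumption based on the PostGIS requirement above):

import os

# values come from the environment, populated from the .env file above
DATABASES = {
    'default': {
        'ENGINE': 'django.contrib.gis.db.backends.postgis',  # assumed, since PostGIS is required
        'NAME': os.environ.get('OIPA_DB_NAME', 'oipa'),
        'USER': os.environ.get('OIPA_DB_USER', 'oipa'),
        'PASSWORD': os.environ.get('OIPA_DB_PASSWORD', 'oipa'),
        'HOST': 'localhost',
        'PORT': '5432',
    }
}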
  9. Add the file local_settings.py with the following information to the folder root/OIPA/OIPA:
SOLR = {
  'indexing': True,
  'url': 'http://localhost:8983/solr',
  'cores': {
       'activity': 'activity',
       'budget': 'budget',
       'dataset': 'dataset',
       'organisation': 'organisation',
       'publisher': 'publisher',
       'result': 'result',
       'transaction': 'transaction',
  }
}
DOWNLOAD_DATASETS = False
  10. Go back to the folder root/OIPA and run the database migrations with the command python manage.py migrate
  11. Start the development server with the command python manage.py runserver
  12. Create a superuser account for Django with the command python manage.py createsuperuser
  13. Start RabbitMQ with brew services start rabbitmq on macOS or sudo service rabbitmq-server start on Linux.
  14. Start a Celery worker with the command celery -A OIPA worker --loglevel=info --concurrency=10 (change the concurrency to your liking).
  15. Start Celery beat with the command celery -A OIPA beat --loglevel=info -S django
  16. Start Celery flower with the command celery flower -A OIPA --port=5555
  17. Navigate to your Solr installation.
  18. To use Apache Solr you will need to create the following 7 cores:
  • activity
  • budget
  • dataset
  • organisation
  • publisher
  • result
  • transaction

To create a core:

  • Input the following command on the command line: bin/solr create -c [name of your core].
  • Copy the ‘managed-schema’ file from OIPA/solr/[name of your core]/conf/ and paste it in the server/solr/[name of your core]/conf/ folder of the solr core.
  19. Run the command bin/solr start to start Solr (a scripted version of the core setup is sketched below).
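If you prefer to script the core setup, a rough sketch of the same steps is shown below. It assumes Solr has already been started, that the script is run from the Solr installation directory, and that OIPA_REPO points at your checkout of this repository; adjust the paths to your setup.

import shutil
import subprocess
from pathlib import Path

OIPA_REPO = Path("/path/to/iati.cloud")   # adjust to your checkout
SOLR_HOME = Path("server/solr")           # relative to the Solr installation directory
CORES = ["activity", "budget", "dataset", "organisation",
         "publisher", "result", "transaction"]

for core in CORES:
    # equivalent to: bin/solr create -c <core>
    subprocess.run(["bin/solr", "create", "-c", core], check=True)
    # copy the managed-schema shipped with OIPA into the new core's conf folder
    shutil.copy(OIPA_REPO / "OIPA" / "solr" / core / "conf" / "managed-schema",
                SOLR_HOME / core / "conf" / "managed-schema")

After copying the schema files, restart Solr (or reload each core) so that the new schemas are picked up.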

You can now access the Django admin page at http://localhost:8000/admin/, the Flower dashboard at http://localhost:5555 and the Apache Solr administrative dashboard at http://localhost:8983/solr/#/.

Debugging Celery or Apache Solr

  • Install telnet.
  • Add the following at the line in your code you want to debug:
from celery.contrib import rdb
rdb.set_trace()

When the code reaches that line, you will see a notification in the worker's terminal stating that it is waiting for a debugger on port [port number].
In another terminal you can then launch telnet localhost [port number].
This will open the debugger.
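For illustration, this is roughly what it looks like inside a Celery task; the task name and module here are made up and not part of the OIPA codebase:

from celery import shared_task
from celery.contrib import rdb

@shared_task
def parse_example_dataset(dataset_id):
    # ... normal task code runs up to this point ...
    rdb.set_trace()  # the worker pauses here and prints the port it is waiting on
    # ... execution continues once you step past this line in the debugger ...

The worker log shows the exact port to connect to with telnet.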

Parsing/indexing the data

This process is managed from the Django administration page. The following is a step-by-step description of everything that needs to be done to load the data into your local PostgreSQL database.

  1. Disable Apache Solr indexing within OIPA/OIPA/local_settings.py by changing 'indexing': True to False.
  2. On the django administration page, run the task to import codelists.
    • Wait for this to finish.
  3. On the django administration page, force run the task to update exchange rates.
    • Wait for this to finish.
    • Activate a scheduled version of this task; it should run monthly. This is not strictly necessary on a local installation.
  4. Enable Apache Solr indexing.
  5. On the django administration page, run the task to import datasets.
    • In case you want to make use of the IATI validator, make sure that you set DOWNLOAD_DATASETS = True in OIPA/OIPA/local_settings.py.
    • Wait for this to finish.
  6. Wait for the IATI Validator to finish its validation. You can check the status here; if no data is returned, it has finished.
  7. If you want to parse ALL available datasets: on the django administration page, run the task to validate the datasets. If you want to parse a specific organisation or a specific dataset, use those tasks instead.
    • Wait for this to finish.
  8. We can now parse and index the datasets that have been prepared in the previous steps. On the django administration page, run the task to parse all datasets.
    • Wait for this to finish.

After this, all the data is available in the database as well as in Solr. Solr can now be used to query the data: simply select the core containing the information you're interested in, go to the Query tab and run your query.
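Besides the dashboard, you can query the cores directly over HTTP. A minimal sketch using Solr's standard select endpoint and the requests library; the query itself is just an example, and the fields you can filter on depend on each core's schema:

import requests

# fetch the first 10 documents from the activity core
response = requests.get(
    "http://localhost:8983/solr/activity/select",
    params={"q": "*:*", "rows": 10, "wt": "json"},
)
response.raise_for_status()
docs = response.json()["response"]["docs"]
print(f"{len(docs)} activities returned")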

API Documentation

Full API documentation for iati.cloud can be found at docs.iati.cloud.

About the project

Can I contribute?

Yes! We are mainly looking for coders to help on the project. If you are a coder, feel free to fork the repository and send us your amazing pull requests!

How should I contribute?

Python already has clear PEP 8 code style guidelines, so it's difficult to add anything to them, but there are certain key points to follow when contributing:

  • PEP 8 code style guidelines should always be followed. Tested with flake8 OIPA.
  • Commitlint is used to check your commit messages.
  • Always try to reference issues in commit messages or pull requests ("related to #614", "closes #619", etc.).
  • Avoid huge code commits whose diff cannot even be rendered by browser-based web apps (GitHub, for example). Smaller commits make it much easier to understand why and how the changes were made and whether they introduce certain bugs.
  • When developing a new feature, write at least some basic tests for it. This helps avoid breaking other things in the future. Tests can be run with pytest.
  • If there's a reason to commit code that is commented out (there usually should be none), always leave a "FIXME" or "TODO" comment so it's clear for other developers why this was done.
  • When using external dependencies that are not on PyPI (from GitHub, for example), pin them to a particular commit (e.g. git+https://github.com/Supervisor/supervisor@ec495be4e28c694af1e41514e08c03cf6f1496c8#egg=supervisor), so that an update to the library doesn't break everything.
  • Automatic code quality and testing checks (continuous integration tools) verify all of these things when pushing or merging new branches. Quality is key!

Running the tests

pytest-django is used to run the tests. It is installed automatically when the project is set up. To run the tests from the top-level directory of the project, run pytest OIPA/. If you are in the directory where manage.py is, running pytest on its own is sufficient. Refer to the pytest-django documentation for details.

Tip: to be able to use debuggers (for example ipdb) with pytest, run it with the -s option (to turn off capturing of test output).

Testing and code quality settings can be found in the setup.cfg file. Test coverage settings (for the pytest-cov plugin) can be found in the .coveragerc file.
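As a rough illustration of what a basic test can look like with pytest-django (this example only touches Django's built-in User model rather than the project's own models):

import pytest
from django.contrib.auth.models import User

# the django_db marker from pytest-django gives the test access to the test database
@pytest.mark.django_db
def test_user_can_be_created():
    User.objects.create_user(username="test-user")
    assert User.objects.filter(username="test-user").exists()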

