WFP VAM API Collector

Collector for the WFP VAM API.


Usage

If you are on a Unix machine, you can use the Makefile to run this collector:

test:
  bash bin/test.sh;

setup:
  bash bin/setup.sh;

run:
  bash bin/run.sh;
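
For example, to set up the environment and then start a collection run:

$ make setup
$ make run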

Or you can run it directly using Python:

$ python scripts/wfp_collect/

The results will be stored in CSV files, JSON files, and/or a SQLite database called "scraperwiki.sqlite".

Cleaning Data

The modified GAUL boundary set provided by the VAM unit contains around 50k administrative codes. However, coverage starts at the admin 2 level, meaning that admin 0 and admin 1 codes don't have individual records. We need those records in order to query for admin 0 or admin 1 units without specifying a further level of disaggregation. The clean_admin_codes.R script solves that issue by creating those missing records. To run it:

$ Rscript code/clean_admin_codes.R

A new CSV file named modified_admin_units.csv will be generated in the config directory.
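
The record-filling logic lives in the R script itself; as a rough illustration of the idea, here is a minimal Python sketch. The input path and the column names (ADM0_CODE, ADM1_CODE, ADM2_CODE) are hypothetical and may not match the actual boundary file.

# Sketch of the record-filling idea; the file path and column names
# are hypothetical, and the actual logic lives in clean_admin_codes.R.
import pandas as pd

boundaries = pd.read_csv('config/admin_units.csv')  # hypothetical input file

# One admin 1 record per unique (admin 0, admin 1) pair, with the
# admin 2 code left empty.
admin1 = boundaries[['ADM0_CODE', 'ADM1_CODE']].drop_duplicates()
admin1['ADM2_CODE'] = None

# One admin 0 record per unique admin 0 code, with the lower levels
# left empty.
admin0 = boundaries[['ADM0_CODE']].drop_duplicates()
admin0['ADM1_CODE'] = None
admin0['ADM2_CODE'] = None

# Append the new higher-level records to the original admin 2 records.
complete = pd.concat([boundaries, admin1, admin0], ignore_index=True)
complete.to_csv('config/modified_admin_units.csv', index=False)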

Making Queries

Each query returns a single, specific slice of the data. That is, a user has to make a large number of queries (hundreds of thousands) in order to collect the complete database. This scraper was designed to make those queries automatically and store the resulting data.

API Design

The current API design assumes that users already know a considerable amount of information before issuing queries: the exact combination of administrative units, indicator type IDs, and other variables needed to get the series they are interested in. In sum, the API isn't designed for exploration.

To work around this issue, this scraper issues queries for every combination of the available query parameters. Considering that there are around 60,000 locations available, the combinations result in nearly one million queries. This is inefficient and costly in computational terms.
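
In effect, query generation is a Cartesian product over the available parameters. Here is a minimal sketch of the idea, with hypothetical parameter names and values; the real collector builds these lists from the cleaned admin units and the API's indicator metadata.

from itertools import product

# Hypothetical parameter lists; in practice the location list alone
# holds around 60,000 codes.
admin_codes = [262, 263, 264]
indicator_ids = [1, 2, 3]
years = range(2010, 2016)

# Every combination of parameters becomes one query.
queries = [
    {'admin_code': a, 'indicator_id': i, 'year': y}
    for a, i, y in product(admin_codes, indicator_ids, years)
]
print(len(queries))  # 54 here; with the full lists, nearly one million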

Parallel Requests

This collector makes N parallel requests to the WFP API. Inside the __main__ script of the wfp_collect module, you can tweak that parameter as follows:

kwargs = {'query_limit': 50}

A number of considerations, including system resources (i.e. memory), bandwidth, and server status, may affect the maximum number of parallel requests allowed. We've had mixed results, but have settled on 50 requests at a time.
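
As an illustration of the batching idea, here is a minimal sketch using Python's concurrent.futures; the endpoint URL and query format are placeholders, not the actual VAM API, and the module's real implementation may differ.

from concurrent.futures import ThreadPoolExecutor

import requests

QUERY_LIMIT = 50  # mirrors kwargs = {'query_limit': 50}

def fetch(query):
    # Placeholder endpoint; the real collector targets the WFP VAM API.
    response = requests.get('https://example.org/api', params=query, timeout=30)
    response.raise_for_status()
    return response.json()

def collect(queries):
    # Issue up to QUERY_LIMIT requests at a time; higher values put
    # more pressure on memory, bandwidth, and the remote server.
    with ThreadPoolExecutor(max_workers=QUERY_LIMIT) as pool:
        return list(pool.map(fetch, queries))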
