libhxl-python

Python support library for the Humanitarian Exchange Language (HXL) data standard. It supports both Python 2.7+ and Python 3.

About HXL: http://hxlstandard.org

Usage

The hxl() function (in the package hxl) reads HXL from a file object, filename, URL, or list of arrays and makes it available for processing, much like $() in jQuery:

import sys
from hxl import hxl

dataset = hxl(sys.stdin)
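To make the "list of arrays" input concrete, here is a minimal sketch of how such data is typically laid out: an optional text header row, a row of HXL hashtags, then the data rows. The sample values and the helper find_tag_row are hypothetical and exist only to illustrate the layout; they are not part of the library.

```python
# Hypothetical sample data: text header row, HXL hashtag row, then data rows.
DATA = [
    ["Organisation", "Sector", "Country"],
    ["#org", "#sector", "#country"],
    ["Org A", "WASH", "Kenya"],
    ["Org B", "Health", "Kenya"],
]


def find_tag_row(rows):
    """Return the index of the first row whose non-empty cells all start with '#', or -1."""
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells and all(cell.startswith("#") for cell in cells):
            return index
    return -1


tag_index = find_tag_row(DATA)
print(DATA[tag_index])
```

Based on the description above, a list shaped like DATA could be passed to hxl() in place of sys.stdin.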

You can chain additional methods to process the data. This example shows an identity transformation in a pipeline (see "Generators", below):

for line in hxl(sys.stdin).gen_csv():
    print(line)

This is the same transformation, but it loads the entire dataset into memory as an intermediate step (see "Filters", below):

for line in hxl(sys.stdin).cache().gen_csv():
    print(line)

Filters

You can apply a number of filters in a stream to a HXL dataset. This example uses the with_rows() filter to find every row that has a #sector of "WASH" and print the organisation mentioned in the row:

for row in hxl(sys.stdin).with_rows('#sector=WASH'):
    print('The organisation is {}'.format(row.get('#org')))

This example removes the WASH sector from the results, then counts the number of times each organisation appears in the remaining rows:

url = 'http://example.org/data.csv'
result = hxl(url).with_rows('#sector!=WASH').count('#org')
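As a rough illustration of what that filter-then-count pipeline computes, the plain-Python sketch below applies the same two steps to rows already keyed by hashtag. The sample rows and the function name count_orgs_outside_sector are hypothetical, and this is not how the library implements its filters.

```python
from collections import Counter

# Hypothetical rows, already keyed by HXL hashtag.
ROWS = [
    {"#org": "Org A", "#sector": "WASH"},
    {"#org": "Org A", "#sector": "Health"},
    {"#org": "Org B", "#sector": "Health"},
    {"#org": "Org A", "#sector": "Education"},
]


def count_orgs_outside_sector(rows, sector):
    """Drop rows in the given sector, then count how often each #org appears."""
    remaining = (row for row in rows if row["#sector"] != sector)
    return Counter(row["#org"] for row in remaining)


counts = count_orgs_outside_sector(ROWS, "WASH")
print(dict(counts))
```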

The following filters are available:

| Filter method | Description |
| --- | --- |
| `Dataset.cache()` | Cache an in-memory version of the dataset (for processing multiple times). |
| `Dataset.with_columns(patterns)` | Include only columns that match the tag pattern(s), e.g. "#org+impl". |
| `Dataset.without_columns(patterns)` | Include all columns except those that match the tag patterns. |
| `Dataset.with_rows(queries)` | Include only rows that match at least one of the queries, e.g. "#sector=WASH". |
| `Dataset.without_rows(queries)` | Exclude rows that match at least one of the queries, e.g. "#sector=WASH". |
| `Dataset.sort(patterns, reverse=False)` | Sort the rows, optionally using the pattern(s) provided as sort keys. Set _reverse_ to True for a descending sort. |
| `Dataset.count(patterns, aggregate_pattern=None)` | Count the number of value combinations that appear for the pattern(s), e.g. ['#sector', '#org']. |
| `Dataset.add_columns(specs, before=False)` | Add columns with fixed values to the dataset, e.g. "Country#country=Kenya" to add a new column #country with the text header "Country" and the value "Kenya" in every row. |

Sinks

Sinks take a HXL stream and convert it into something that's not HXL.

Validation

To validate a HXL dataset against a schema (also in HXL), use the validate sink:

is_valid = hxl(url).validate('my-schema.csv')

If you don't specify a schema, the library will use a simple, built-in schema:

is_valid = hxl(url).validate()

If you include a callback, you can collect details about the errors and warnings:

def my_callback(error_info):
    # error_info is a HXLValidationException; convert it to text before writing,
    # because sys.stderr.write() accepts only strings
    sys.stderr.write(str(error_info) + '\n')

is_valid = hxl(url).validate(schema='my-schema.csv', callback=my_callback)
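If you would rather keep the problems for later reporting than write them out immediately, one option is a callback that appends to a list. The sketch below is a hypothetical helper (make_error_collector is not part of the library), and the ValueError objects only stand in for the exceptions the validator would pass.

```python
def make_error_collector():
    """Return (errors, callback); the callback appends each reported problem to errors."""
    errors = []

    def callback(error_info):
        # Store the text form so the list is easy to log or print later.
        errors.append(str(error_info))

    return errors, callback


errors, collect = make_error_collector()

# Simulate the validator reporting two problems (stand-ins for real validation errors).
collect(ValueError("missing #org"))
collect(ValueError("bad #date"))
print(len(errors))
```

In real use, collect would be passed as the callback argument to validate(), and errors inspected once validation has finished.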

Generators

Generators allow the re-serialising of HXL data, returning something that works like an iterator. Example:

for line in hxl(url).gen_csv():
    print(line)

The following generators are available (you can use the parameters to turn the text headers and HXL tags on or off):

| Generator method | Description |
| --- | --- |
| `Dataset.gen_raw(show_headers=True, show_tags=True)` | Generate arrays of strings, one row at a time. |
| `Dataset.gen_csv(show_headers=True, show_tags=True)` | Generate encoded CSV rows, one row at a time. |
| `Dataset.gen_json(show_headers=True, show_tags=True)` | Generate encoded JSON rows, one row at a time. |
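To show what the show_headers and show_tags switches mean in practice, here is a standalone sketch of a CSV generator built on the standard csv module. The function gen_csv_sketch and its sample data are hypothetical; it mimics the behaviour described above and is not the library's own implementation.

```python
import csv
import io


def encode_row(row):
    """Encode one row as a CSV line without the trailing line terminator."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue().rstrip("\r\n")


def gen_csv_sketch(headers, tags, rows, show_headers=True, show_tags=True):
    """Yield CSV lines, optionally preceded by the text header and hashtag rows."""
    if show_headers:
        yield encode_row(headers)
    if show_tags:
        yield encode_row(tags)
    for row in rows:
        yield encode_row(row)


# Hypothetical sample data.
lines = list(gen_csv_sketch(
    ["Organisation", "Sector"],
    ["#org", "#sector"],
    [["A, Inc", "WASH"]],
    show_headers=False,
))
for line in lines:
    print(line)
```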

Caching

libhxl uses the Python requests library for opening URLs. If you want to enable caching (for example, to avoid beating up on your source with repeated requests), your code can use the requests_cache plugin, like this:

import requests_cache
requests_cache.install_cache('demo_cache', expire_after=3600)

The default caching backend is a SQLite database at the location specified.

Installation

This repository includes a standard Python setup.py script for installing the library and scripts (applications) on your system. On a Unix-like operating system, you can install using the following command:

python setup.py install

If you don't need to install from source, you can simply use pip:

pip install libhxl

Once you've installed the library, you can import the HXL libraries from any Python application and call scripts like hxlvalidate from the command line.
