Copyright 2012-2014 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. See LICENSE in the project root directory.
Python modules and scripts for working with Concrete, an HLT data specification defined using Thrift.
This repository contains the Python classes generated by the Thrift compiler, but not the .thrift definition files that were used to generate these classes. The .thrift definition files can be found in the Concrete-Thrift GitHub repository: https://github.com/hltcoe/concrete-thrift
Concrete-Python requires the following:
- Python >= 2.7.x
- 'networkx' Python package
- 'thrift' Python package >= 0.9.1
You do not need to install the Thrift compiler to use this library.
You can install Concrete using the pip package manager:
pip install git+https://github.com/hltcoe/concrete-python.git#egg=concrete
or by cloning this repository and running setup.py:
git clone https://github.com/hltcoe/concrete-python.git
cd concrete-python
python setup.py test
python setup.py install
The Concrete Python package comes with two scripts.
- concrete2json.py reads in a Concrete Communication and prints a JSON version of the Communication to stdout. The JSON is "pretty printed" with indentation and whitespace, which makes the JSON easier to read and to use for diffs.
- validate_communication.py reads in a Concrete Communication file and prints out information about any invalid fields. This script is a command-line wrapper around the functionality in the concrete.validate library.
Use the '-h/--help' flag for details about the scripts' command line arguments.
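To illustrate why pretty-printed JSON is convenient for diffs, here is a minimal sketch using only the Python standard library. It is not the implementation of concrete2json.py: the dictionaries are hypothetical stand-ins for a Communication, and the helper names pretty() and line_diff() are invented for this example.

```python
import difflib
import json

# Hypothetical stand-in data; a real Communication has many more fields.
before = {"id": "doc-1", "text": "hello world", "type": "example"}
after = {"id": "doc-1", "text": "hello, world", "type": "example"}


def pretty(obj):
    """Serialize with indentation and sorted keys so the output is stable."""
    return json.dumps(obj, indent=2, sort_keys=True)


def line_diff(old, new):
    """Return only the added and removed lines of a unified diff."""
    diff = difflib.unified_diff(
        pretty(old).splitlines(), pretty(new).splitlines(), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    print(pretty(after))
    print("\n".join(line_diff(before, after)))
```

Because each field sits on its own line, a change to one field shows up as a single removed line and a single added line, rather than as a change to one very long line.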
Compiled Python classes end up in the concrete namespace. You can
use them by importing them as follows:
from concrete import Communication
foo = Communication()
foo.text = 'hello world'
...
The Python version of the Thrift libraries does not perform any
validation of Thrift objects. You should use the
validate_communication() function after reading and before writing a
Concrete Communication:
from concrete.util import read_communication_from_file
from concrete.validate import validate_communication
comm = read_communication_from_file('tests/testdata/serif_dog-bites-man.concrete')
# Returns True|False, logs details using Python stdlib 'logging' module
validate_communication(comm)
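Since validate_communication() returns a boolean rather than raising, one way to enforce the "validate after reading and before writing" rule is a small guard that turns a failed check into an exception. The sketch below is an assumption about how you might structure this, not part of Concrete-Python: require_valid() and fake_validator() are hypothetical names, a plain dict stands in for a Communication, and the INFO log level is a guess at what is needed to see the validator's messages.

```python
import logging

# Show log messages on the console; the level chosen here is an assumption.
logging.basicConfig(level=logging.INFO)


def require_valid(comm, validator):
    """Return comm unchanged if validator(comm) is truthy, else raise ValueError."""
    if not validator(comm):
        raise ValueError("Communication failed validation; see log for details")
    return comm


def fake_validator(comm):
    """Stand-in validator so this sketch runs without Concrete installed."""
    ok = bool(comm.get("id"))
    if not ok:
        logging.error("Communication is missing required field 'id'")
    return ok


if __name__ == "__main__":
    print(require_valid({"id": "doc-1"}, fake_validator))
```

In real use you would pass validate_communication as the validator, for example require_valid(read_communication_from_file(path), validate_communication), and call the same guard again just before writing.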
Thrift fields have three levels of requiredness:
- explicitly labeled as required
- explicitly labeled as optional
- no requiredness label given ("default required")
The Java version of the Thrift libraries will raise an exception if a
required field is missing on deserialization or serialization, and
will raise an exception if a "default required" field is missing on
serialization. The Python version of the Thrift libraries (as of
Thrift 0.9.1) does not perform any validation of Thrift objects on
serialization or deserialization. The Python Thrift libraries do
provide a validate() function, but this function only checks for
explicitly required fields, and not "default required" fields.
The Thrift validate() function also performs only shallow validation:
nested data structures are not checked for required fields.
The validate_communication() function recursively checks a
Communication object for required fields, and additionally checks for
UUID mismatches.
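The difference between shallow and recursive validation can be sketched without Thrift installed. Everything below is hypothetical: Struct, Section, Document and find_missing() are stand-ins invented for this example, not the classes generated by the Thrift compiler and not the code in concrete.validate. The sketch assumes that a recursive check should report both explicitly required and "default required" fields, and it omits the UUID mismatch checks entirely.

```python
REQUIRED, OPTIONAL, DEFAULT = "required", "optional", "default"


class Struct(object):
    """Hypothetical stand-in for a Thrift-generated class."""

    # Maps field name -> requiredness level; subclasses override this.
    spec = {}

    def __init__(self, **fields):
        for name in self.spec:
            setattr(self, name, fields.get(name))

    def validate(self):
        """Shallow check: explicitly required fields of this object only."""
        for name, level in self.spec.items():
            if level == REQUIRED and getattr(self, name) is None:
                raise ValueError("Required field %s is unset" % name)


class Section(Struct):
    spec = {"uuid": REQUIRED, "kind": DEFAULT, "label": OPTIONAL}


class Document(Struct):
    spec = {"uuid": REQUIRED, "sections": DEFAULT}


def find_missing(obj, path="root"):
    """Recursively list unset required and default-required fields."""
    problems = []
    if isinstance(obj, (list, tuple)):
        for i, item in enumerate(obj):
            problems.extend(find_missing(item, "%s[%d]" % (path, i)))
    elif isinstance(obj, Struct):
        for name, level in sorted(obj.spec.items()):
            value = getattr(obj, name)
            child = "%s.%s" % (path, name)
            if value is None:
                if level != OPTIONAL:
                    problems.append(child)
            else:
                problems.extend(find_missing(value, child))
    return problems


# The nested Section is missing its required uuid and default-required kind.
doc = Document(uuid="d1", sections=[Section()])
doc.validate()  # passes: the nested Section is never inspected
print(find_missing(doc))
```

Here the shallow validate() call succeeds because the top-level Document has its uuid set, while the recursive walk reports the two unset fields inside the nested Section and ignores the unset optional label.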