JSON schema and validation code for HEPData submissions
- Documentation: http://hepdata-validator.readthedocs.io
If you can, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the validator to use CLoader for faster loading of YAML files. The difference is negligible for small files, but loading is markedly faster for larger documents.
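To check whether your PyYAML installation can actually use LibYAML, a minimal sketch is shown here. It relies on PyYAML exposing `CLoader` only when it was built against LibYAML; the helper name `libyaml_available` is made up for this example and is not part of hepdata-validator.

```python
def libyaml_available():
    """Return True if PyYAML is installed and was built with LibYAML support."""
    try:
        import yaml
    except ImportError:
        # PyYAML itself is missing, so no loader of any kind is available.
        return False
    # PyYAML only defines CLoader when the LibYAML bindings were compiled in.
    return hasattr(yaml, "CLoader")


print("LibYAML available:", libyaml_available())
```

If this prints `False`, YAML files can still be loaded, just with the slower pure-Python loader.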
Via pip:
pip install hepdata-validator
Via GitHub (for developers):
git clone https://github.com/HEPData/hepdata-validator
cd hepdata-validator
pip install --upgrade -e . -r requirements.txt
pytest testsuite
To validate files, you need to instantiate a validator (I love OO).
from hepdata_validator.submission_file_validator import SubmissionFileValidator
submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'
# the validate method takes a string representing the file path.
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)
# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()
# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)
Data file validation is exactly the same.
from hepdata_validator.data_file_validator import DataFileValidator
data_file_validator = DataFileValidator()
# the validate method takes a string representing the file path.
data_file_validator.validate(file_path='data.yaml')
# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()
# the error messages can be printed
data_file_validator.print_errors('data.yaml')
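When several files need checking, one validator instance can be reused. The following is a sketch only: `collect_failures` is a hypothetical helper, and it assumes that `validate` returns a truthy value on success and that `get_messages` accepts the same file path that was passed to `validate`, as in the examples in this README.

```python
def collect_failures(validator, paths):
    """Validate each file and return {path: messages} for the files that failed."""
    failures = {}
    for path in paths:
        if not validator.validate(file_path=path):
            # Messages are looked up by the file path given to validate().
            failures[path] = validator.get_messages(path)
    return failures
```

It could be called as `collect_failures(DataFileValidator(), ['data1.yaml', 'data2.yaml'])`, where the file names are placeholders.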
Optionally, if you have already loaded the YAML object, you can pass it in as a data object. You must also pass the file_path,
since it is used as the key in the error message lookup map.
from hepdata_validator.data_file_validator import DataFileValidator
import yaml
# yaml.load() without a Loader fails on recent PyYAML versions, so use safe_load
with open('data.yaml', 'r') as f:
    file_contents = yaml.safe_load(f)
data_file_validator = DataFileValidator()
data_file_validator.validate(file_path='data.yaml', data=file_contents)
data_file_validator.get_messages('data.yaml')
data_file_validator.print_errors('data.yaml')
An example offline validation script uses the hepdata_validator package to validate the submission.yaml file and all YAML data files of a HEPData submission.
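The general shape of such a script can be sketched as follows. This is not the referenced script itself, only an illustration under stated assumptions: the helper names are hypothetical, the validators are passed in by the caller, and each table document in submission.yaml is assumed to name its data file under a `data_file` key.

```python
import os


def referenced_data_files(documents):
    """Return the data_file entries found in the documents of a submission.yaml."""
    return [doc["data_file"] for doc in documents
            if isinstance(doc, dict) and "data_file" in doc]


def validate_submission(directory, documents, submission_validator, data_validator):
    """Validate submission.yaml, then every data file it references.

    Returns True only if all files validate.
    """
    submission_path = os.path.join(directory, "submission.yaml")
    all_valid = bool(submission_validator.validate(file_path=submission_path))
    for name in referenced_data_files(documents):
        data_path = os.path.join(directory, name)
        # Keep going after a failure so that every file gets checked.
        if not data_validator.validate(file_path=data_path):
            all_valid = False
    return all_valid
```

Here `documents` would typically come from `list(yaml.safe_load_all(f))` on the opened submission.yaml, since that file holds several YAML documents, and the two validators would be a `SubmissionFileValidator()` and a `DataFileValidator()`.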
There are currently two versions of the JSON schemas, 0.1.0 and 1.0.0. In most cases you should use 1.0.0 (the default). If you need to use a different version, you can pass the keyword argument schema_version when initialising the validator:
submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')
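If it is unclear which schema version an older file was written against, one option is to try each version in turn. The sketch below assumes a caller-supplied factory; `first_valid_schema_version` and `make_validator` are illustrative names, not part of the package.

```python
def first_valid_schema_version(make_validator, file_path, versions=("1.0.0", "0.1.0")):
    """Return the first schema version under which file_path validates, or None."""
    for version in versions:
        # A fresh validator per version keeps the error messages separate.
        validator = make_validator(version)
        if validator.validate(file_path=file_path):
            return version
    return None
```

For example, `first_valid_schema_version(lambda v: SubmissionFileValidator(schema_version=v), 'submission.yaml')` would report which of the two versions, if any, accepts the file.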