Project Overview

Python modules to generate BEL resource documents.

Resource Generator

gp_baseline.py - acts as the driver for the resource-generator. This module uses configuration.py to determine which parsers to run over which datasets. After parsing and storing the data in a usable form, gp_baseline.py calls out to namespaces.py, annotate.py, and equiv.py to generate the new .belns, .belanno, and .beleq files.
configuration.py - matches each dataset to the proper parser. This module can be used to customize which parsers to run. To run/not run a particular parser, simply uncomment/comment it.
parsers.py - contains parsers for each dataset, and in some cases multiple parsers over the same data. Multiple parsers are needed because withdrawn or deprecated terms are sometimes excluded during resource generation but are still needed to resolve lost terms in the change log. Also included are parsers for the not-yet fully implemented PubChem names and IDs namespace.
parsed.py - acts as a storage module. Takes the data handed to it by the parser and stores it in a DataObject. Currently all data used by this module is kept in memory. See the bug tracker for a possible solution to this memory constraint.
datasets.py - each DataObject that holds a particular dataset is defined in this module. These objects act as an interface to the underlying dictionaries, and do various manipulations over the data to assist in generating the BEL resource files.
namespaces.py - uses the parser instance itself to determine which namespace to generate. Properly encodes each term, and writes out each .belns file.
annotate.py - simple module that uses the MeSH data set to generate .belanno files.
equiv.py - the main function in this module will take a DataObject as a parameter, and use that object's defined functions to generate the new .beleq files.
common.py - defines some common functions used throughout the program, namely a download() function and a function that will open and read a gzipped file.
constants.py - any constants used throughout the program are defined in this module.
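
The comment/uncomment approach described for configuration.py can be pictured with a minimal sketch. Every name here (ParserStub, BASELINE_DATA, run_baseline, and the dataset labels) is hypothetical and not taken from the actual modules, and the records are placeholders. It only illustrates how a driver might pair datasets with parsers and skip the ones that are commented out.

```python
# Hypothetical sketch; none of these names come from the real modules.


class ParserStub:
    """Stand-in for a parser class such as those in parsers.py."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def parse(self):
        # A real parser would read a downloaded dataset file here.
        yield from self.records


# Maps a dataset label to the parser that should run over it.
# Commenting out an entry disables that parser for the next run.
BASELINE_DATA = {
    "dataset_a": ParserStub("dataset_a", [{"id": "1", "name": "TERM_A"}]),
    "dataset_b": ParserStub("dataset_b", [{"id": "2", "name": "TERM_B"}]),
    # "dataset_c": ParserStub("dataset_c", [{"id": "3", "name": "TERM_C"}]),
}


def run_baseline(config):
    """Run every enabled parser and keep the parsed records in memory."""
    parsed = {}
    for label, parser in config.items():
        parsed[label] = list(parser.parse())
    return parsed


results = run_baseline(BASELINE_DATA)
print(sorted(results))
```

Keeping the mapping in one module means the driver itself never needs editing when a dataset is switched on or off.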

Change-Log

change_log.py - a separate module from gp_baseline.py. This module downloads and parses the old .belns, .belanno, and .beleq files and compares those results with the newly generated files stored locally by gp_baseline.py. Currently, change_log.py must be run with the flag -n <res_files>, where res_files is the directory containing the newly generated resource files. The result of running change_log.py is a dictionary mapping each old term to either its replacement term or the string withdrawn. This dictionary can be consumed by an update script to resolve lost terms in older-versioned BEL documents.
changelog_config.py - the configuration file for change_log.py. Much like configuration.py, this module maps which parsers will be needed, and the corresponding datasets for those parsers.
write_log.py - the only task for this module is to write the change-log data out to a file in JSON format.
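
As a rough illustration of the change-log output described above, the sketch below writes a dictionary of old terms to JSON and shows how an update script might consume it. The flat dictionary layout, the term names, and the helper functions write_log and resolve are assumptions made for this example; the real change-log may be structured differently (for instance, grouped by namespace).

```python
import json
import os
import tempfile

# Hypothetical change-log content: old term -> replacement term, or "withdrawn".
change_log = {
    "OLD_TERM_A": "NEW_TERM_A",
    "OLD_TERM_B": "withdrawn",
}


def write_log(log, path):
    """Write the change-log dictionary to a file as JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(log, fp, indent=2, sort_keys=True)


def resolve(term, log):
    """Return the current term, or None if the term was withdrawn."""
    replacement = log.get(term, term)  # terms absent from the log are unchanged
    return None if replacement == "withdrawn" else replacement


path = os.path.join(tempfile.mkdtemp(), "change_log.json")
write_log(change_log, path)

# An update script would load the file and look up each old term.
with open(path, encoding="utf-8") as fp:
    loaded = json.load(fp)

print(resolve("OLD_TERM_A", loaded))
print(resolve("OLD_TERM_B", loaded))
print(resolve("UNCHANGED_TERM", loaded))
```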

Dependencies

To run these Python scripts, the following software must be installed:

Python 3.x - modules are written in Python 3.2.3
lxml - used to parse various XML documents.

jhourani/resource-generator
