KGX (Knowledge Graph Exchange) is a Python library and set of command line utilities for exchanging Knowledge Graphs (KGs) that conform to or are aligned to the Biolink Model.
The core data model is a Property Graph (PG), represented internally in Python using a networkx MultiDiGraph.
KGX allows conversion to and from:
- RDF serializations (read/write) and SPARQL endpoints (read)
- Neo4j endpoints (read) or Neo4j dumps (write)
- CSV/TSV and JSON (see associated data formats and example script to load CSV/TSV to Neo4j)
- Any format supported by networkx
KGX will also provide validation to ensure that KGs conform to the Biolink Model: that nodes are categorized using Biolink classes, edges are labeled using valid Biolink relationship types, and valid properties are used.
The internal representation is a networkx MultiDiGraph, which is a property graph.
The structure of this graph is expected to conform to the Biolink Model standard, briefly summarized here:
- Nodes
  - id : required
  - name : string
  - category : string. Broad, high-level type; corresponds to the label in Neo4j
  - other extensible properties, depending on the node
- Edges
  - subject : required
  - edge_label : required
  - object : required
  - other extensible properties, depending on the edge
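The structure summarized above can be sketched directly in networkx. This is only an illustration, not KGX's own loader or validator: the identifiers (`EX:gene1`, `EX:disease1`), the property values, and the helper names `build_example_graph` and `missing_required` are all hypothetical, and the check covers only the fields marked as required in the list.

```python
import networkx as nx

# Required keys taken from the summary above; all other properties are optional.
REQUIRED_NODE_KEYS = {"id"}
REQUIRED_EDGE_KEYS = {"subject", "edge_label", "object"}


def build_example_graph():
    """Build a tiny property graph; identifiers and values are placeholders."""
    graph = nx.MultiDiGraph()
    graph.add_node("EX:gene1", id="EX:gene1", name="example gene", category="gene")
    graph.add_node("EX:disease1", id="EX:disease1", name="example disease", category="disease")
    # A MultiDiGraph allows several parallel edges between the same two nodes,
    # so each edge_label is also used as the edge key here.
    for label in ("related_to", "contributes_to"):
        graph.add_edge(
            "EX:gene1",
            "EX:disease1",
            key=label,
            subject="EX:gene1",
            edge_label=label,
            object="EX:disease1",
        )
    return graph


def missing_required(graph):
    """Return (kind, identifier, missing_keys) tuples for incomplete records."""
    problems = []
    for node_id, attrs in graph.nodes(data=True):
        missing = REQUIRED_NODE_KEYS - set(attrs)
        if missing:
            problems.append(("node", node_id, sorted(missing)))
    for subj, obj, key, attrs in graph.edges(keys=True, data=True):
        missing = REQUIRED_EDGE_KEYS - set(attrs)
        if missing:
            problems.append(("edge", (subj, obj, key), sorted(missing)))
    return problems


if __name__ == "__main__":
    example = build_example_graph()
    print(example.number_of_nodes(), "nodes,", example.number_of_edges(), "edges")
    print("problems:", missing_required(example))
```

Using the edge label as the edge key is a design choice made for this sketch so that parallel edges stay distinguishable; it is not a claim about how KGX keys its edges.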
Note: the installation of KGX requires Python 3.7+
You should first confirm which version of Python you are running and upgrade to 3.7 or later if necessary, following best practices for your operating system. It is also assumed that the common development tools are installed, including git, pip, and all necessary development libraries for your operating system.
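One way to confirm the interpreter version from Python itself is sketched below. The helper name `python_is_supported` and the constant `MINIMUM_VERSION` are illustrative and not part of KGX; the minimum is simply the 3.7 stated above.

```python
import sys

MINIMUM_VERSION = (3, 7)  # minimum stated in the note above


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.7+ requirement")
    else:
        print(f"Python {current} is too old; 3.7 or later is required")
```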
Go to where you wish to host your local project repository and git clone the project, namely:
cd /path/to/your/local/git/project/folder
git clone https://github.com/NCATS-Tangerine/kgx.git
# then enter into the cloned project repository
cd kgx
For convenience, make use of the Python venv module to create a lightweight virtual environment.
Note that you may also have to install the appropriate venv package for Python 3.7. For example, under Ubuntu Linux, you might run:
sudo apt-get install python3.7-venv
Once venv is available, type:
python3 -m venv venv
source venv/bin/activate
To exit the environment, type:
deactivate
To re-enter, source the activate command again.
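To confirm that the environment is actually active, a short check such as the following can be run with the `python` on your path. This is a sketch only, and the function name `in_virtual_environment` is hypothetical. It relies on the fact that inside a venv `sys.prefix` differs from `sys.base_prefix`; it does not detect conda environments, which are complete Python installations of their own.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base prefix."""
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        # sys.base_prefix points at the installation the venv was created from.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("venv is active:", sys.prefix)
    else:
        print("no venv detected; interpreter prefix is", sys.prefix)
```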
Alternatively, you can use a conda environment to manage packages and the development environment:
conda create -n translator-modules python=3.7
conda activate translator-modules
Some IDEs (e.g. PyCharm) may also have provisions for directly creating a virtual environment. This should work fine.
The Python dependencies of the application need to be installed into the local environment using a version of pip matched to your Python 3.7 installation (assumed here to be called pip3).
Again, follow the specific directives of your operating system for the installation.
For example, under Ubuntu Linux, to install the Python 3.7 matched version of pip, type the following:
sudo apt-get install python3-pip
which will install the pip3 command.
At this point, it is advisable to separately install the wheel package dependency before proceeding further (note: it is assumed here that your venv is activated):
pip3 install wheel
After installing the wheel package, the remaining KGX Python package dependencies should install without error:
pip3 install .
It is sometimes better to use 'python -m pip' rather than the bare 'pip' command, to ensure that the pip belonging to the python3 in your virtual environment is used. Check your pip version: on some systems, 'pip' may resolve to the operating system's version, which may not be compatible with the Python 3.7 installed in your venv.
python -m pip install .
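To see which pip is tied to the interpreter you are running, the same `python -m pip` form can be invoked from Python. This is a minimal sketch; the function name `pip_version_for_current_interpreter` is hypothetical. It runs `pip --version` through `sys.executable`, so the reported path should sit inside your venv when the venv is active.

```python
import subprocess
import sys


def pip_version_for_current_interpreter():
    """Run `<this python> -m pip --version` and return (exit code, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=False,  # a missing pip is reported through the exit code
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    code, output = pip_version_for_current_interpreter()
    if code == 0:
        print(output)
    else:
        print("pip is not available for", sys.executable)
```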
Some components of KGX make use of Docker. If Docker is not installed in your operating system environment, the following instructions may be followed to install it.
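Before installing anything, it may be worth checking whether a `docker` executable is already on your path. The sketch below uses only the standard library; the function name `find_docker` is illustrative. It confirms only that the command exists, not that the Docker daemon is running.

```python
import shutil


def find_docker(command="docker"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_docker()
    if location:
        print("Docker found at", location)
    else:
        print("Docker not found on PATH; installation is needed")
```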