This project provides a framework for organizing electronic structure (and other) calculations on organic molecules into a rational file structure and relational database. MESS.DB scales happily to millions of molecules.
- Clone the repository and submodules:
    git clone --recursive git@github.com:vamin/MESS.DB.git messdb
    cd messdb
- Alias the mess executable:
    alias mess=${PWD}/bin/mess
- Get help:
    mess -h
    mess [tool] -h
- Import the sample source molecules:
    mess import fda
- Generate 3D structures for all molecules in the database:
    mess select | mess calculate balloon
- Select only molecules in a particular molecular weight range (here, MW above 250):
    mess select -hd -n MW -o '>' -v 250
- Find matches to a SMARTS string (quoted so the shell does not treat the brackets as a glob pattern):
    mess select | mess match -hd -m '[CX3]=[OX1]'
Dependencies:
- Python 2.7+
- SQLite 3.7+
- Open Babel 2.3+ with Python bindings
In addition, mess calculate provides access to various calculation methods, each of which may have its own dependencies. You can learn about a module's dependencies by running, e.g., mess calculate balloon -h.
Every molecule is represented with its own directory and a collection of records in a relational database.
Each molecule is identified by its InChIKey. For example, morphine's InChIKey is BQJCRHHNABKAKU-KBQPJGBKSA-N. Molecule directories are based on the InChIKey, so all files related to morphine are stored in molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/. In addition to any input and output files generated during a calculation, each molecule directory contains:
- INCHIKEY.inchi -- the molecule in InChI format
- INCHIKEY.log -- a log tracking the molecule's calculation history
- INCHIKEY.notes -- a space for manual annotations of the molecule
- INCHIKEY.png -- a 2D representation of the molecule
- sources.tsv -- a table of sources for the molecule, including where to buy if the source is commercial
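The directory layout described above can be sketched in a few lines of Python. This is a minimal illustration inferred from the morphine example (first character, next two characters, then the remainder of the key), not MESS.DB's own code; the function name molecule_dir and the root argument are hypothetical.

```python
import posixpath


def molecule_dir(inchikey, root="molecules"):
    """Return the molecule directory for an InChIKey.

    Assumption: the split (1 character / 2 characters / remainder) is
    inferred from the morphine example in this README.
    """
    # posixpath keeps forward slashes regardless of the host platform.
    return posixpath.join(root, inchikey[0], inchikey[1:3], inchikey[3:])


print(molecule_dir("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

Splitting the key into nested prefixes like this keeps any single directory from accumulating millions of entries.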
The relational database is stored in db/mess.db, an SQLite database that is initialized on the first run of mess. The schema is described in db/schema. It is possible to query the database directly via mess select, which also provides a command-line interface for common queries.
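Because db/mess.db is an ordinary SQLite file, it can also be inspected with Python's standard sqlite3 module. The sketch below makes no assumptions about the MESS.DB schema: it only lists whatever tables exist. The helper name list_tables is hypothetical, and the database must already have been initialized by running mess.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file, sorted."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise IOError("database not found: %s" % db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("db/mess.db") from the repository root would then show which tables to look up in db/schema.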
For each example, we will assume that a corpus of molecules to investigate has already been imported with mess import [source].
- Generate 3D structures with Balloon:
    mess select | mess calculate balloon
  Running mess select with no options outputs a list of every molecule in the database.
- Calculate electronic structure with MOPAC:
    mess select | mess calculate mopac -p 2
  The -p 2 option specifies that the calculation should use the geometry in path id 2 (the Balloon-generated geometry) as input.
- Select candidate molecules:
    mess select -n MW -o '<' -v 250 | mess select -n 'IONIZATION POTENTIAL' -o '<' -v '7'
  The resulting table can be sorted by piping it to common Unix tools like sort.
- Generate 3D structures with Balloon:
    mess select | mess calculate balloon
- Compare halogen-containing molecules to an aspirin target geometry by Spectrophore:
    mess match -m '[F,Cl,Br,I]' | mess match -t aspirin.xyz -s -p 2
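The pipelines in the examples above can also be driven from a script rather than typed into a shell. The following is a minimal sketch using Python's standard subprocess module; run_pipeline is a hypothetical helper, not part of MESS.DB, and it simply connects each command's standard output to the next command's standard input.

```python
import subprocess


def run_pipeline(commands):
    """Run a list of argv lists as a pipeline; return the final stdout as bytes."""
    if not commands:
        raise ValueError("at least one command is required")
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Drop our copy of the pipe so only the downstream process reads it.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    for index, proc in enumerate(procs):
        if proc.returncode != 0:
            raise RuntimeError(
                "command %d exited with status %d" % (index, proc.returncode)
            )
    return output
```

For instance, run_pipeline([["bin/mess", "select"], ["bin/mess", "calculate", "balloon"]]) would mirror the first example when run from the repository root. Shell aliases are not visible to subprocess, so the path to the executable is given explicitly.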
For more information about any of the following tools, run mess [tool] -h:
- annotate
- backup
- calculate
- check
- import
- inspect
- match
- remove
- select
- transform
There is also a helper script for setting up sources for import, sources/setup_source.sh.
Known bugs: many. This software is currently in alpha, meaning that not every feature has been implemented and those that have been may behave unexpectedly. When the project has progressed to the point that it is safe to use, this section will be updated with a specific bug list.
Code and documentation copyright Victor Amin 2013-2014 and made available under the AGPL license.