
heitzerh/MESS.DB

 
 


MESS.DB - Molecular Electronic Structure (and other Stuff) DB

This project provides a framework for organizing electronic structure (and other) calculations on organic molecules into a rational file structure and relational database.

Motivation and Design Philosophy

MESS.DB is designed to facilitate large scale screening. Molecules are represented in the database and file structure by their InChIKeys. MESS.DB is intended to be simple, portable, and human-friendly.

How it Works

Suppose you import the molecule morphine. Its InChIKey (BQJCRHHNABKAKU-KBQPJGBKSA-N) is calculated and a directory is created for it:

molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi <- the molecule in InChI format
BQJCRHHNABKAKU-KBQPJGBKSA-N.log <- a log tracking what has been done to the molecule
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes <- a blank space for notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png <- a 2D representation of the molecule
sources.tsv <- a table of sources for the molecule, including where to buy if the source is commercial
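
The directory layout above appears to split the InChIKey into its first character, the next two characters, and the remainder. Here is a minimal sketch of that mapping, inferred only from the example path; the function name `molecule_dir` is hypothetical and is not taken from the MESS.DB code base:

```python
import posixpath


def molecule_dir(inchikey, root="molecules"):
    """Build a molecule directory path from an InChIKey.

    Assumption (inferred from the README example): the key is split
    into 1 character / 2 characters / the remaining characters.
    """
    return posixpath.join(root, inchikey[0], inchikey[1:3], inchikey[3:])


print(molecule_dir("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

Sharding on the leading characters keeps any single directory from accumulating one entry per molecule, which helps when the collection grows large.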

In addition, morphine, along with its SMILES, InChI, IUPAC name, synonyms, and basic properties (like MW, charge, etc.) will be imported to MESS.DB, an SQLite relational database. For the curious, the schema is in db/schema.sql.
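
To illustrate the idea of selecting molecules by property, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. Only the table name `molecule` comes from the usage examples in this README; the columns (`inchikey`, `name`, `mw`, `charge`) are a simplified assumption made for illustration and are not the real schema, which lives in db/schema.sql:

```python
import sqlite3

# In-memory stand-in for mess.db. The column layout is hypothetical;
# consult db/schema.sql for the actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule ("
    "inchikey TEXT PRIMARY KEY, name TEXT, mw REAL, charge INTEGER)"
)
conn.executemany(
    "INSERT INTO molecule VALUES (?, ?, ?, ?)",
    [
        ("BQJCRHHNABKAKU-KBQPJGBKSA-N", "morphine", 285.34, 0),
        ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "aspirin", 180.16, 0),
    ],
)

# Select neutral molecules heavier than 200 g/mol.
rows = conn.execute(
    "SELECT inchikey, name FROM molecule "
    "WHERE mw > ? AND charge = 0 ORDER BY name",
    (200,),
).fetchall()

for inchikey, name in rows:
    print(inchikey, name)
```

Because molecules are keyed by InChIKey, a query result like this is enough to locate each molecule's directory on disk.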

Methods (which, as far as MESS is concerned, are plugins that describe how to run a particular calculation) can be run against the database (or a subset of it). If I apply the balloon141 method, which generates 3D structures from SMILES strings, a new folder appears in the molecule's directory:

molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
balloon141_FROM_import_PATH_2/ <- contains logs and output from running balloon
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi
BQJCRHHNABKAKU-KBQPJGBKSA-N.log
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png
sources.tsv

If balloon generates any new properties that are not in the database, they are added. Now we can use the balloon 3D coordinates to run another calculation, and get:

molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
balloon141_FROM_import_PATH_2/ <- contains logs and output from running balloon
pm7_mopac2012_FROM_balloon141_PATH_3/ <- contains logs and output from running mopac
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi
BQJCRHHNABKAKU-KBQPJGBKSA-N.log
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png
sources.tsv
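
The method folders in these listings seem to follow the pattern `<method>_FROM_<parent method>_PATH_<path id>`. The sketch below builds and parses such names; the pattern is inferred from the two examples above, and the helper names `method_dir_name` and `parse_method_dir_name` are hypothetical rather than part of MESS.DB:

```python
def method_dir_name(method, parent_method, path_id):
    """Compose a folder name such as 'balloon141_FROM_import_PATH_2'."""
    return "{0}_FROM_{1}_PATH_{2}".format(method, parent_method, path_id)


def parse_method_dir_name(name):
    """Split a folder name back into (method, parent_method, path_id).

    Assumes the '_FROM_' and '_PATH_' separators each occur once,
    as in the README examples.
    """
    head, path_id = name.rsplit("_PATH_", 1)
    method, parent_method = head.split("_FROM_", 1)
    return method, parent_method, int(path_id)


print(method_dir_name("pm7_mopac2012", "balloon141", 3))
print(parse_method_dir_name("balloon141_FROM_import_PATH_2"))
```

Encoding the parent method and path id in the folder name makes the provenance of each calculation visible from the file system alone.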

Even though most relevant properties are imported into the database after a run, all output files are retained for your reading and copying pleasure.

MESS.DB scales happily to thousands, if not millions, of molecules.

How to Install

First, clone the repository and set up an empty database:

git clone git@github.com:vamin/MESS.DB.git messdb
cd messdb
python mess/scripts/setup_db.py

MESS.DB can be run from the messdb directory without installation:

python mess

or

./bin/mess

If you would like to install MESS for all users on a system (experimental):

python setup.py install

MESS.DB works best with Python 2.7+, though it will work with lower versions of Python so long as they have Python 2.7's default modules installed. Open Babel and its Python module, pybel, are also required for most operations.

Modules also have their own dependencies, which you can learn about by running them.

Usage Examples

import a set of molecules

mess import sources/fda

This imports the "FDA-approved drugs" data set into mess.db and the molecules dir.

apply a method to all molecules in the database

mess select 'select * from molecule' | mess calculate -m balloon141

Balloon generates 3D structures from SMILES strings.

apply a method that uses the output of a previous series of methods (parent path)

mess select 'select * from molecule' | mess calculate -m pm7_mopac2012 -pp 2

Run a semiempirical calculation using the output from path 2 (the balloon 3D structures in this case, if you've been following along).

Current Features

-import from most common molecule formats (smi, inchi, xyz, sdf, etc.)
-rational file structure with graceful duplicate handling
-relational database of all molecules, sources, methods, and properties
-source tracking
-select molecules based on sql queries of their properties
-apply calculations (methods) to any selection

Planned Features

-report generation
-self-integrity checking
-handling of multiple molecular states (e.g., cation, anion, triplet, conformers)
-database backup/restore
-database pruning

Contributors

Victor Amin, 2013-
