1. Using Python and R for Quantitative Structure-Activity Relationship (QSAR) modeling and medicinal chemistry data analysis
I have been developing predictive models in R and Python, and this repository is where I publish my scripts for QSAR modeling and medicinal chemistry data analysis. Because I do a lot of QSAR work in R and scikit-learn, I have written various scripts that make that modeling and analysis easier. Some are for private use only, but I have released a few on GitHub because they seem to be generally useful.
2. Development of web-based informatics tools for use by chemists.
Software tools for tasks such as modeling similarities among drug-like small molecules, analyzing high-throughput screening (HTS) data, analyzing and visualizing Structure-Activity Relationships (SAR), and elucidating target/mechanism-of-action (MOA) hypotheses are important in drug discovery and chemical biology/genomics. In this area I am developing the MedChem Companion environment, an integrated web-based platform for drug discovery and chemical biology informatics. This modular software infrastructure currently consists of a structure-searchable chemistry database of hard-to-find, biologically active natural products, intended as a source of novel scaffolds for drug discovery (lessons from nature inspiring the design of new drugs), together with a user-friendly web interface.
Ongoing work includes integrating the following software components: a Matched Molecular Pairs Identifier; a Pharmacophore Generator; a QSAR Model Builder (a new QSAR modeling environment for generating and analyzing QSAR models of compound activities and for automating QSAR modeling to drive compound design); the KNIME environment; and a range of RDKit-based applications that facilitate drug discovery chemistry. These components are exposed through the user-friendly web interface, named MedChem Companion Tools, which is intended for non-expert users (Figure 1).
Integrating cheminformatics tools with the RDKit programming environment has many advantages for small-molecule discovery, such as easy access to a wide spectrum of statistical methods, machine learning algorithms and graphics utilities. There are plans to implement a toolbox for generating, validating and updating in-silico ADME/Tox models to support lead identification and optimization programs.
Ultimately, the MedChem Companion toolkit will provide utilities for the following areas of discovery and chemical biology informatics: processing large numbers of molecules, predicting physicochemical/structural properties, structural similarity searching, and classifying and clustering compound libraries and screening results with a wide spectrum of algorithms. It will also provide an integrated platform for exploring drug polypharmacology.
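To make the structural similarity searching mentioned above concrete, here is a minimal sketch in plain Python. It is not the MedChem Companion implementation. It assumes each molecule has already been reduced to a fingerprint, represented here as a set of on-bit indices; in practice those fingerprints would typically be computed with a cheminformatics toolkit such as RDKit. The compound names, the toy fingerprints and the helper names `tanimoto` and `similarity_search` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of bit positions that are "on".
# (Assumption for this sketch; real fingerprints would come from a toolkit.)
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        # Two empty fingerprints carry no information; treat as dissimilar.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy library; bit indices are arbitrary illustration values.
LIBRARY: Dict[str, Fingerprint] = {
    "cmpd_A": frozenset({1, 4, 7, 9, 12}),
    "cmpd_B": frozenset({1, 4, 7, 30}),
    "cmpd_C": frozenset({2, 3, 20, 21}),
}
QUERY: Fingerprint = frozenset({1, 4, 7, 9, 12, 15})

if __name__ == "__main__":
    for name, score in similarity_search(QUERY, LIBRARY, threshold=0.4):
        print(f"{name}\t{score:.3f}")
```

The same pairwise scores could also feed a clustering step, since 1 minus the Tanimoto coefficient is a commonly used distance between fingerprints. The threshold of 0.4 is arbitrary here and would need tuning for a real fingerprint type and compound collection.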