Solving chemical identity crises with open data.

hilldrew/chemex

chemex

This project aims to make public chemical hazard information more usable, using Python and a limited amount of labor. It has two main goals:

  • Transforming certain useful resources, such as hazard classification lists, into simple & uniform data formats.
  • Helping to link different chemical identifiers, especially authority-controlled IDs & open structure-based IDs.
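
As a rough illustration of both goals, here is a minimal sketch that maps rows from differently shaped source lists onto one uniform record and checks CAS Registry Numbers with the standard check-digit rule. It is not code from the chemex package: the function names, the `field_map` argument, the record keys, and the sample row are all hypothetical, and the hazard text is a placeholder rather than a real classification.

```python
import re

# CAS RN layout: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` is well formed and its check digit is correct."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def normalize_row(row, field_map, source):
    """Map one source-specific row (a dict) onto a uniform record.

    `field_map` (hypothetical) names the source columns that hold the
    CAS RN, the chemical name, and the hazard text.
    """
    cas = row.get(field_map["cas"], "").strip()
    return {
        "source": source,
        "cas_rn": cas if is_valid_cas(cas) else None,  # drop malformed IDs
        "name": row.get(field_map["name"], "").strip(),
        "hazard": row.get(field_map["hazard"], "").strip(),
    }


# Made-up row, for illustration only.
example = normalize_row(
    {"CAS No.": " 50-00-0 ", "Substance": "Formaldehyde", "Class": "example hazard class"},
    {"cas": "CAS No.", "name": "Substance", "hazard": "Class"},
    source="example-list",
)
print(example)
```

Keeping the validated identifier in its own field, and setting it to `None` when the check fails, makes it easier to see which rows still need manual attention before linking them to structure-based IDs.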

Context

Many public sources of chemical hazard information don't have the kind of programmatic accessibility that's generally expected from open data. For the major sources of open chemical data on the web, there are helpful Python interfaces: see PubChemPy, ChemSpiPy, CIRpy, BioServices. But those resources still leave gaps in the accessibility of certain types of information, and those gaps are what I'm interested in.

Organization

  • The chemex package is just a loose collection of convenient code.
  • scripts/ contains a few scripts that help transform and clean data from useful public-domain sources. See the README(s) in that directory.
  • notebooks/ contains notebooks for explanation and/or testing.

Requirements

  • Python 3.x, or 2.7 with the future package
  • requests
  • beautifulsoup4
  • lxml
  • pandas & numpy
  • xlrd
  • boltons
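
A quick way to see which of these are missing from an environment is sketched below. This is a convenience check, not part of the repository. It assumes Python 3, since it relies on `importlib.util.find_spec`, and it uses import names rather than package names (beautifulsoup4 is imported as `bs4`). The function name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the requirements listed above.
REQUIRED_MODULES = ["requests", "bs4", "lxml", "pandas", "numpy", "xlrd", "boltons"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```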

Fine print

Files in data/ are from the public domain.

Everything else here is free and unencumbered software released into the public domain (see LICENSE).
