For more information and the full documentation please visit http://lpm-hms.github.io/COSMOS2/.
To chat with the author and other users (many of whom use COSMOS to build bioinformatics NGS workflows), join the project's Gitter channel.
pip install cosmos-wfm
COSMOS is a workflow management system for Python. It allows you to efficiently program complex workflows of command line tools that automatically take advantage of a compute cluster, and provides a web dashboard to monitor, debug, and analyze your jobs. COSMOS scales on a traditional cluster with a shared filesystem, managed by a scheduler such as LSF or GridEngine. It is especially powerful when combined with spot instances on Amazon Web Services and StarCluster.
COSMOS was designed to solve the problem of compute-intensive and complex scientific data pipelines. Its primary objective is to provide a simple but flexible API for specifying complex job DAGs, a way to resume modified or failed workflows, and debugging and provenance tracking that are as easy as possible.
COSMOS was published as an Application Note in the journal Bioinformatics, but has evolved a lot since its original inception. If you use COSMOS for research, please cite its manuscript. This means a lot to the author.
Since the original publication, it has been re-written and open-sourced by the original author, in a collaboration between The Lab for Personalized Medicine at Harvard Medical School, the Wall Lab at Stanford University, and Invitae, a clinical genetic sequencing diagnostics laboratory.
- Written in Python, which is easy to learn, powerful, and popular. A programmer with limited experience can begin writing COSMOS workflows right away.
- Powerful syntax for the creation of complex and highly parallelized workflows.
- Reusable recipes and definitions of tools and subworkflows allow for DRY code.
- Keeps track of workflows, job information, resource utilization, and provenance in an SQL database.
- The ability to visualize all jobs and job dependencies as a convenient image.
- Monitor and debug running workflows, and browse a history of all workflows, via a web dashboard.
- Alter and resume failed workflows.
- Support for DRMs such as SGE and LSF. DRMAA coming soon. Adding support for more DRMs is very straightforward.
- Support for MySQL, PostgreSQL, Oracle, and SQLite via the SQLAlchemy ORM.
- Extremely well suited for cloud computing, especially when used in conjunction with AWS and StarCluster.
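To make the idea of a job DAG with parallel stages concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not the COSMOS API: the job names and the `parallel_batches` helper are hypothetical and exist only to show how dependencies determine which jobs could run at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical job DAG: each key is a job, each value is the set of jobs
# that must finish before it can start.
dag = {
    "align_sample1": set(),
    "align_sample2": set(),
    "call_variants_sample1": {"align_sample1"},
    "call_variants_sample2": {"align_sample2"},
    "merge_calls": {"call_variants_sample1", "call_variants_sample2"},
}


def parallel_batches(dag):
    """Group jobs into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run concurrently
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished
    return batches


batches = parallel_batches(dag)
for number, batch in enumerate(batches):
    print(number, batch)
```

In this toy example the two alignment jobs form the first batch, the two variant-calling jobs the second, and the merge job runs last. A workflow manager would submit each ready job to the cluster instead of printing it.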
Please use the GitHub Issue Tracker.
Some pretty big changes here, made during a hackathon at Invitae where a lot of feedback and contributions were received. Primarily, the API was simplified and made more intuitive. A new COSMOS primitive called a Dependency was created, which we have found extremely useful for generalizing subworkflow recipes. The API is now considered to be much more stable.
- Renamed Execution -> Workflow
- Reworked the Workflow.add_task() API; see its docstring.
- Renamed task.tags -> task.params.
- Require that a task's params contain only keywords that exist in the parameters of the task's function.
- Require that a user specify a task uid (unique identifier), which is now used for resuming instead of a Task's params.
- Created cosmos.api.Dependency, which provides a way to specify a parent and input at the same time.
- Removed the one2one, one2many, etc. helpers. Found they confused people more than they helped.
- Various stability improvements to the drmaa jobmanager module
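Two of the changes above, checking params against the task function's parameters and resuming by uid, can be illustrated with a small standalone sketch. This is an assumption-laden toy, not the COSMOS implementation: `validate_params`, `add_task_sketch`, `word_count`, and the in-memory `completed` dict (standing in for the workflow database) are all hypothetical names.

```python
import inspect


def validate_params(func, params):
    """Raise ValueError if params has a keyword that func does not accept."""
    accepted = set(inspect.signature(func).parameters)
    unknown = sorted(set(params) - accepted)
    if unknown:
        raise ValueError(
            f"{func.__name__}() does not accept: {', '.join(unknown)}"
        )


def word_count(in_txt, out_txt, chars=False):
    # Hypothetical task function: returns the shell command to run.
    flag = "-c" if chars else "-w"
    return f"wc {flag} {in_txt} > {out_txt}"


completed = {}  # uid -> command; stands in for the workflow database


def add_task_sketch(func, params, uid):
    """Return (command, was_newly_created); reuse prior work for a known uid."""
    validate_params(func, params)
    if uid in completed:
        return completed[uid], False  # resumed: the uid was seen before
    command = func(**params)
    completed[uid] = command
    return command, True


first = add_task_sketch(word_count, dict(in_txt="a.txt", out_txt="a.count"), uid="count_a")
again = add_task_sketch(word_count, dict(in_txt="a.txt", out_txt="a.count"), uid="count_a")
print(first, again)
```

Keying the lookup on the uid rather than on the params means a task can be recognized as already done even if its params are later renamed or extended, which is presumably why the uid became the basis for resuming.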