This repository contains the code examples from the "SCons for data science and computational biology pipelines" post on my blog. For a full explanation, please see the introductory post.
In short, SCons is a build tool (akin to `make`) whose build scripts are written in Python. These build scripts are valuable for data scientists and computational biologists as a way to iteratively specify the structure of a complex analysis with many individual components. Compared with running individual programs/analyses one at a time, or chaining them together in a shell script, SCons offers the following advantages:
- Reproducibility: it's easier to reproduce research when you have a script that runs everything for you
- Once you get things working, you can hit go and walk away
- If you change an input or intermediate result, only the results downstream of it are rebuilt on the next build, saving time
- Running independent steps in parallel becomes a snap (e.g. `scons -j 4`)
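To make the last two points concrete, here is a toy sketch of the bookkeeping a build tool does: hash each source file, mark targets stale if any input changed, and group stale targets into batches whose members do not depend on each other. This is an illustration of the idea, not how SCons is implemented. SCons compares content signatures (MD5 by default), while this sketch uses SHA-256. The pipeline and file names below are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def signature(data: bytes) -> str:
    """Content hash of a file's bytes (SHA-256 here; SCons defaults to MD5)."""
    return hashlib.sha256(data).hexdigest()


def plan_rebuild(deps, contents, stored):
    """Decide which targets must be rebuilt after some inputs changed.

    deps:     target -> list of its direct inputs (source files or other targets)
    contents: source file -> its current bytes
    stored:   source file -> signature recorded at the previous build
    Returns a list of batches; targets within one batch do not depend on
    each other, so each batch could be run in parallel.
    """
    # Sources whose content differs from the last build (or were never built).
    dirty = {src for src, data in contents.items()
             if stored.get(src) != signature(data)}
    sorter = TopologicalSorter(deps)  # maps each node to its predecessors
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()  # nodes whose inputs have all been handled
        # A target is stale if any of its direct inputs is dirty.
        batch = sorted(n for n in ready if n in deps and dirty.intersection(deps[n]))
        dirty.update(batch)
        if batch:
            batches.append(batch)
        sorter.done(*ready)
    return batches


# Hypothetical pipeline: file names are illustrative, not taken from this repository.
deps = {
    "aligned.fasta": ["seqs.fasta"],
    "tree.nwk": ["aligned.fasta"],
    "summary.csv": ["metadata.csv"],
    "report.html": ["tree.nwk", "summary.csv"],
}
contents = {"seqs.fasta": b">a\nACGT\n", "metadata.csv": b"id,host\nA,human\n"}
# Pretend only metadata.csv was edited since the last build.
stored = {"seqs.fasta": signature(contents["seqs.fasta"]),
          "metadata.csv": signature(b"id,host\n")}

print(plan_rebuild(deps, contents, stored))  # [['summary.csv'], ['report.html']]
print(plan_rebuild(deps, contents, {}))
# [['aligned.fasta', 'summary.csv'], ['tree.nwk'], ['report.html']]
```

In the first call the alignment and tree are skipped because their inputs are unchanged. SCons does this bookkeeping for you, storing signatures in its `.sconsign` database between runs, and `scons -j N` runs up to N independent steps at once.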
First, install the prerequisites.
If you are running Ubuntu, you can simply run `./install_prereqs.sh`.
(If you are on OS X and feel like specifying a suitable setup with `homebrew` et al., please be my guest.)
Note that you will be asked for your password so that the script can run `apt-get install` and `sudo pip install`.
If you're paranoid, take a look at the script for yourself to make sure there is no funny business, or just install the various libraries manually.
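If you install things manually (or on a non-Ubuntu system), a quick sanity check is to confirm the command-line tools are on your `PATH`. The sketch below is a hypothetical helper, not part of this repository; it only checks for `scons`, the one tool this README names explicitly, so extend the list with whatever `install_prereqs.sh` installs.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical tool list: add any other executables your examples need.
    missing = missing_tools(["scons"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```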
Once that's out of the way, you should be able to `cd` into an `example*` directory and run `scons` to build the given analysis.
And that's it!
The files produced by the analysis should be in the `output` directory of your `example*` directory.
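If you want to build every example in one go, a small driver script can loop over the `example*` directories and run `scons` in each. This is a hypothetical convenience sketch, not something the repository provides; it assumes each example's build file uses SCons's default name, `SConstruct`.

```python
import glob
import os
import subprocess


def example_dirs(root="."):
    """Return the example* subdirectories of `root` that contain an SConstruct file."""
    pattern = os.path.join(root, "example*")
    return sorted(d for d in glob.glob(pattern)
                  if os.path.isfile(os.path.join(d, "SConstruct")))


def build_all(root=".", jobs=1):
    """Run scons in each example directory; return {directory: exit code}."""
    results = {}
    for d in example_dirs(root):
        # -j lets SCons run up to `jobs` independent build steps in parallel.
        proc = subprocess.run(["scons", "-j", str(jobs)], cwd=d)
        results[d] = proc.returncode
    return results
```

Calling `build_all(jobs=4)` from the repository root would build each example in turn; a nonzero exit code flags an example whose build failed.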
The sequence data was obtained from a GenBank submission by Shiino et al., 2012.