Odetta is a set of tools for discovering and analyzing novel transcript isoforms using paired-end RNA-Seq data.
External software is used at various points in the pipeline:
- CASHX is used for sequence alignment.
- multisplat is used to discover splice junctions.
- GMB is used to discover novel gene models.
TODO: note about rtree
Using Odetta might look like this...
TODO
You can set up an mrjob configuration file. For example...
mrjob.conf
    runners:
      local:
        base_tmp_dir: /path/to/tmp/dir
        jobconf:
          mapreduce.job.maps: 8
          mapreduce.job.reduces: 7
Use the configuration with...
    python example.py --conf-path ./mrjob.conf input_file > output_file
The mapreduce.job.maps and mapreduce.job.reduces settings are particularly useful for using all available processors when running locally (i.e., not on Hadoop).
The mrjob docs describe all available options.
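If you would rather not hard-code the task counts, the configuration can be generated from the number of processors on the machine. The following is only a minimal sketch: the function name render_mrjob_conf is hypothetical and not part of Odetta or mrjob, and the rule of one map task per processor with one fewer reduce task is an assumption chosen to match the 8/7 example above, not a recommendation from the mrjob docs.

```python
import os


def render_mrjob_conf(tmp_dir, cpus=None):
    """Return mrjob.conf text with task counts sized to the processor count.

    Hypothetical helper for illustration; not part of Odetta or mrjob.
    """
    # os.cpu_count() can return None, so fall back to a single processor.
    cpus = cpus or os.cpu_count() or 1
    # Assumption: one map task per processor, one fewer reduce task (minimum 1).
    maps = cpus
    reduces = max(1, cpus - 1)
    lines = [
        "runners:",
        "  local:",
        f"    base_tmp_dir: {tmp_dir}",
        "    jobconf:",
        f"      mapreduce.job.maps: {maps}",
        f"      mapreduce.job.reduces: {reduces}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the configuration so it can be redirected into ./mrjob.conf.
    print(render_mrjob_conf("/path/to/tmp/dir"), end="")
```

Saving the script under a name of your choosing and redirecting its output to ./mrjob.conf should give a file that can be passed to --conf-path as shown above.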
Odetta uses mrjob for map/reduce processing. mrjob makes developing and running map/reduce easy, both locally and on Hadoop.
I have not tested this on Hadoop.
nose is used for unit testing. You can run the tests using...
    nosetests tests/