maize utility libraries

A collection of Python libraries for parsing bioinformatics files and performing common tasks related to annotation and comparative genomics.

Dependencies

The following third-party Python packages are used by some routines in the library. These dependencies are not mandatory, since only a few modules use them.

Other Python modules are imported here and there in various scripts. The simplest approach is to install each one with pip install when you see an ImportError.
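The "install on ImportError" approach can also be handled inside a script. The following is a minimal sketch, not code from maize: the helper name `optional_import` and the deliberately fake module name are assumptions made for illustration.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Point the user at a fix instead of failing with a bare traceback.
        print(f"Optional dependency '{module_name}' is missing; "
              f"try: pip install {module_name}")
        return None


# 'json' ships with Python, so this import succeeds.
json_mod = optional_import("json")
# A deliberately non-existent module name, used only to show the failure path.
missing = optional_import("surely_not_an_installed_module_xyz")
```

Note that the name passed to pip install is not always the same as the module name in the import statement, so the printed hint is only a starting point.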

Installation

The easiest way is to install it via PyPI:

To install the development version:

pip install git+https://github.com/orionzhou/maize.git

Alternatively, if you want to install manually:

cd ~/code  # or any directory of your choice
git clone https://github.com/orionzhou/maize.git
export PYTHONPATH=~/code:$PYTHONPATH

Replace ~/code above with any directory you like, as long as it contains the maize directory. To avoid setting PYTHONPATH every time, add the export command to your .bashrc or .bash_profile.
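After a manual install, it can be useful to confirm that Python actually finds the package. This is a small sketch using only the standard library; the helper name `is_importable` is an assumption for illustration and is not part of maize.

```python
import importlib.util
import sys


def is_importable(package_name):
    """True if Python can locate the top-level package on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None


if is_importable("maize"):
    print("maize found")
else:
    # PYTHONPATH must list the directory that contains maize, not maize itself.
    print("maize not found; check that PYTHONPATH includes its parent directory")
    print("current search path:", sys.path)
```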

In addition, a few modules might ask for the locations of external programs if these cannot be found in your PATH. The external programs that are often used are:

Most of the scripts in this package contain multiple actions. Taking fasta and gff as examples:

usage: fasta [-h]
             {size,desc,clean,extract,split,tile,merge,gaps,rename,rmdot,cleanid,2aln,translate}
             ...

fasta utilities

optional arguments:
  -h, --help            show this help message and exit

available commands:
  {size,desc,clean,extract,split,tile,merge,gaps,rename,rmdot,cleanid,2aln,translate}
    size                Report length for each sequence
    desc                Report description for each sequence
    clean               Remove irregular chararacters
    extract             retrieve fasta sequences
    split               run pyfasta to split a set of fasta records evenly
    tile                create sliding windows that tile the entire sequence
    merge               merge multiple fasta files and update IDs
    gaps                report gap ('N's) locations in fasta sequences
    rename              rename/normalize sequence IDs, merge short
                        scaffolds/contigs
    rmdot               replace periods (.) in an alignment fasta by dashes
                        (-)
    cleanid             clean sequence IDs in a fasta file
    2aln                convert fasta alignment file to clustal format
    translate           translate nucleotide seqs to amino acid seqs

usage: gff [-h]
           {summary,filter,fix,fixboundaries,fixpartials,index,extract,cluster,chain,format,note,splicecov,picklong,2gtf,2tsv,2bed12,2fas,fromgtf,merge}
           ...

gff utilities

optional arguments:
  -h, --help            show this help message and exit

available commands:
  {summary,filter,fix,fixboundaries,fixpartials,index,extract,cluster,chain,format,note,splicecov,picklong,2gtf,2tsv,2bed12,2fas,fromgtf,merge}
    summary             print summary stats for features of different types
    filter              filter the gff file based on Identity and Coverage
    fix                 fix gff fields using various options
    fixboundaries       fix boundaries of parent features by range chaining
                        child features
    fixpartials         fix 5/3 prime partial transcripts, locate nearest in-
                        frame start/stop
    index               index gff db
    extract             extract contig or features from gff file
    cluster             cluster transcripts based on shared splicing structure
    chain               fill in parent features by chaining children
    format              format gff file, change seqid, etc.
    note                extract certain attribute field for each feature
    splicecov           extract certain attribute field for each feature
    picklong            pick longest transcript
    2gtf                convert gff3 to gtf format
    2tsv                convert gff3 to tsv format
    2bed12              convert gff3 to bed12 format
    2fas                extract feature (e.g. CDS) seqs and concatenate
    fromgtf             convert gtf to gff3 format
    merge               merge several gff files into one

Then, to run any action, you can just do:

python -m maize.formats.fasta size

python -m maize.formats.gff fix

This will tell you the options and arguments it expects.
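The "one script, many actions" layout shown in the help text can be illustrated with a short sketch. This is not the maize implementation: it assumes a dispatcher built on argparse subparsers, and the `size` action here is a toy stand-in that takes sequences directly on the command line rather than reading a fasta file.

```python
import argparse


def size(args):
    """Toy action: return the length of each sequence passed as an argument."""
    return [len(seq) for seq in args.seqs]


def build_parser():
    parser = argparse.ArgumentParser(prog="fasta", description="fasta utilities")
    sub = parser.add_subparsers(title="available commands", dest="command")
    sub.required = True  # running without an action is an error
    sp = sub.add_parser("size", help="Report length for each sequence")
    sp.add_argument("seqs", nargs="+", help="sequences (stand-in for a fasta file)")
    sp.set_defaults(func=size)  # bind the action name to its function
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)


# Equivalent to running: fasta size ACGT ACGTAC
lengths = main(["size", "ACGT", "ACGTAC"])
print(lengths)  # [4, 6]
```

With this pattern, each action owns its own options, which is why invoking an action without arguments is enough to see what it expects.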
