
sasonbol/Integron_Finder

 
 

Integron_Finder
===============

Find integrons in DNA sequences

See the documentation for more details: http://integronfinder.readthedocs.org/en/latest/

# Dependencies

- Python 2.7
   - Pandas 0.15.1
   - Numpy 1.9.1
   - Biopython 1.65
   - Matplotlib 1.4.3
   - psutil 2.1.3
- HMMER 3.1b1
- INFERNAL 1.1
- Prodigal V2.6.2
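
Before a first run, it can help to confirm that the external programs are reachable. The sketch below is a standalone helper, not part of integron_finder. It assumes the executables are named `hmmsearch`, `cmsearch` and `prodigal` (the names used in the option descriptions under Usage) and that the helper itself is run with Python 3, since `shutil.which` is not available in Python 2.7.

```python
# Standalone helper (needs Python 3.3+ for shutil.which); not part of integron_finder.
import shutil

# Executable names assumed from the --hmmsearch, --cmsearch and --prodigal options.
REQUIRED_TOOLS = ("hmmsearch", "cmsearch", "prodigal")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found in PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All external tools found in PATH.")
```

If a tool is installed outside PATH, its full path can be passed with the corresponding `--hmmsearch`, `--cmsearch` or `--prodigal` option instead.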

# Usage

```
usage: integron_finder [-h] [--local_max] [--func_annot] [--cpu CPU]
                       [-dt DISTANCE_THRESH] [--outdir .] [--linear]
                       [--union_integrases] [--cmsearch CMSEARCH]
                       [--hmmsearch HMMSEARCH] [--prodigal PRODIGAL]
                       [--path_func_annot bank_hmm] [--gembase]
                       [--attc_model file.cm] [--evalue_attc 1]
                       [--keep_palindromes] [--no_proteins] [--eagle_eyes]
                       [-V]
                       replicon

positional arguments:
  replicon              Path to the replicon file (in fasta format), eg :
                        path/to/file.fst or file.fst

optional arguments:
  -h, --help            show this help message and exit
  --local_max           Allows thorough local detection (slower but more
                        sensitive and do not increase false positive rate).
  --func_annot          Functional annotation of CDS associated with integrons
                        HMM files are needed in Func_annot folder.
  --cpu CPU             Number of CPUs used by INFERNAL and HMMER
  -dt DISTANCE_THRESH, --distance_thresh DISTANCE_THRESH
                        Two elements are aggregated if they are distant of
                        DISTANCE_THRESH [4kb] or less
  --outdir .            Set the output directory (default: current)
  --linear              Consider replicon as linear. If replicon smaller than
                        20kb, it will be considered as linear
  --union_integrases    Instead of taking intersection of hits from Phage_int
                        profile (Tyr recombinases) and integron_integrase
                        profile, use the union of the hits
  --cmsearch CMSEARCH   Complete path to cmsearch if not in PATH. eg:
                        /usr/local/bin/cmsearch
  --hmmsearch HMMSEARCH
                        Complete path to hmmsearch if not in PATH. eg:
                        /usr/local/bin/hmmsearch
  --prodigal PRODIGAL   Complete path to prodigal if not in PATH. eg:
                        /usr/local/bin/prodigal
  --path_func_annot bank_hmm
                        Path to file containing all hmm bank paths (one per
                        line)
  --gembase             Use gembase formatted protein file instead of
                        Prodigal. Folder structure must be preserved
  --attc_model file.cm  path or file to the attc model (Covariance Matrix)
  --evalue_attc 1       set evalue threshold to filter out hits above it
                        (default: 1)
  --keep_palindromes    for a given hit, if the palindromic version is found,
                        don't remove the one with highest evalue
  --no_proteins         Don't annotate CDS and don't find integrase, just look
                        for attC sites.
  --eagle_eyes          Synonym of --local_max. Like a soaring eagle in the
                        sky, catching rabbits(or attC sites) by surprise.
  -V, --version         show program's version number and exit

```
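
To give an intuition for `--distance_thresh`, here is a minimal sketch of distance-based aggregation. It is an illustration only, not the program's implementation: the function name `aggregate` is hypothetical, positions are treated as linear coordinates, the gap is assumed to be measured from the end of one element to the start of the next, and strand and circular replicons are ignored.

```python
def aggregate(elements, distance_thresh=4000):
    """Group (begin, end) positions whose gaps are <= distance_thresh.

    Illustration only: linear coordinates, no strand, no circular wrap-around.
    """
    clusters = []
    for begin, end in sorted(elements):
        # Gap between this element and the end of the current cluster.
        if clusters and begin - clusters[-1]["end"] <= distance_thresh:
            clusters[-1]["members"].append((begin, end))
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
        else:
            # Too far from the previous cluster (or first element): start a new one.
            clusters.append({"end": end, "members": [(begin, end)]})
    return [c["members"] for c in clusters]


hits = [(1000, 1100), (3000, 3120), (12000, 12090)]
print(aggregate(hits))
```

With the default 4 kb threshold, the first two hits (1900 bp apart) fall into one group and the third (8880 bp further on) forms its own.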


### Example

    integron_finder myfastafile.fst --local_max --func_annot
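
When several replicons have to be processed, the same call can be scripted. The sketch below assumes `integron_finder` is in PATH and uses only options listed in the help above. The helper names `build_command` and `run` are hypothetical and not part of the program.

```python
import subprocess


def build_command(replicon, outdir=None, cpu=None, local_max=False, func_annot=False):
    """Assemble an integron_finder command line from documented options."""
    cmd = ["integron_finder", replicon]
    if local_max:
        cmd.append("--local_max")
    if func_annot:
        cmd.append("--func_annot")
    if cpu is not None:
        cmd += ["--cpu", str(cpu)]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


def run(replicon, **options):
    """Run integron_finder; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(build_command(replicon, **options))
```

For instance, `build_command("myfastafile.fst", local_max=True, func_annot=True)` yields the same command as the example above, and `run` can be called in a loop over a list of FASTA files.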

## Output

The program creates a folder named Results\_id\_genome, which contains the following files:

- `*.gbk`: contains the input sequence with all integrons and features found.
- `*.integrons`: contains the list of all elements detected (attC, protein near attC, integrase, Pc, attI, Pint) with position, strand, e-value, etc.
- `*.pdf`: representation of the complete integrons detected (those with an integrase (reddish) and at least one attC (bluish)). If a protein has a hit with an antibiotic resistance gene, it is yellow, otherwise grey.

It also contains one folder, `other`, with the outputs of the different steps of the program.
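
To gather these files from a results folder in a script, a small helper such as the following can be used. It is a sketch that relies only on the three extensions listed above. The function name `collect_outputs` is hypothetical, and the folder path passed to it is whatever results folder the run produced.

```python
from pathlib import Path


def collect_outputs(results_dir):
    """Map each documented extension to the matching files in results_dir."""
    results_dir = Path(results_dir)
    return {
        ext: sorted(results_dir.glob("*" + ext))
        for ext in (".gbk", ".integrons", ".pdf")
    }
```

The returned dictionary has one key per extension. An empty list means no file of that type was found at the top level of the folder (the `other` subfolder is not searched).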

# Mobyle

You can use this program without installing it, through a web server:

http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::integron_finder
 
# Citation

The paper has been submitted. A pre-print is available:

Automatic and accurate identification of integrons and cassette arrays in bacterial genomes reveals unexpected patterns.
Jean Cury, Thomas Jové, Marie Touchon, Bertrand Néron, Eduardo PC Rocha.
bioRxiv, doi: http://dx.doi.org/10.1101/030866

Please also cite the following articles:
 
 - Nawrocki, E.P. and Eddy, S.R. (2013) Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics, 29, 2933-2935.
 - Eddy, S.R. (2011) Accelerated Profile HMM Searches. PLoS Comput Biol, 7, e1002195.
 - Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W. and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11, 119.
 
If you use the `--func_annot` option, which relies on Resfams, please also cite:
 
 - Gibson, M.K., Forsberg, K.J. and Dantas, G. (2015) Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J, 9, 207-216.
