This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

centre-for-microbiome-research/GenomeFISH

GenomeFISH

Scripts are contained in the scripts directory and are written in Python 2.x. No installation is required. Simply copy the scripts to your system and ensure you have the required dependencies as detailed below.

Calculating pairwise average nucleotide identity

Script: ani.py

Requirements:

Calculates pairwise average nucleotide identity (ANI) values between a set of genomes:

ani.py <gene_dir> <output_dir>

where <gene_dir> contains FASTA files of genes in nucleotide space for two or more genomes and <output_dir> is the desired output directory for results. FASTA files must end with a '.fna' or '.fasta' extension. The number of CPUs to use can be specified with the "--threads" parameter.
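The input requirements above can be checked before launching a run. The following is a minimal sketch, not part of the GenomeFISH scripts: the helper names `find_gene_files` and `build_ani_command` are hypothetical, and the `--threads <n>` argument form is an assumption based on the parameter name given above (check `ani.py -h` for the exact syntax).

```python
import os

# Extensions the README says ani.py accepts.
FASTA_EXTENSIONS = ('.fna', '.fasta')


def find_gene_files(gene_dir):
    """Return sorted paths of files in gene_dir ending in '.fna' or '.fasta'."""
    return sorted(
        os.path.join(gene_dir, name)
        for name in os.listdir(gene_dir)
        if name.endswith(FASTA_EXTENSIONS)
    )


def build_ani_command(gene_dir, output_dir, threads=1):
    """Build the ani.py argument list after checking the input directory.

    Hypothetical helper: it only assembles the command and does not run it.
    The '--threads <n>' form is an assumption.
    """
    gene_files = find_gene_files(gene_dir)
    if len(gene_files) < 2:
        raise ValueError(
            'expected FASTA files for two or more genomes, found %d'
            % len(gene_files)
        )
    return ['ani.py', gene_dir, output_dir, '--threads', str(threads)]
```

The returned list can be passed to `subprocess.check_call` once ani.py is on the system path.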

Simulating in silico probes

Script: in_silico_probes.py

Requirements:

  • biolib Python library
  • BLASTn >= 2.6.0+ must be on your system path
  • melt.pl from the UNAFold software library must be on your system path

Calculates the percent identity and free energy error between in silico probes from a reference genome and a target genome:

in_silico_probes.py <genome_dir> <ani_matrix> <output_dir>

where <genome_dir> contains genomic FASTA files for two or more genomes, <ani_matrix> is a file with pairwise average nucleotide identity values between genomes (see: ani.py), and <output_dir> is the desired output directory for results. The genomic FASTA files must end in a '.fasta' or '.fna' extension, and all pairwise combinations of genomes are considered as reference and target genomes.

To reduce computational requirements, in silico probes are only simulated every 120 bp (i.e. the same length as the probe). At this step size it takes ~2 hours to compare a pair of genomes. The length of the probes can be changed with the "--probe_size" parameter and the spacing between probes with the "--probe_step_size" parameter. The number of CPUs to use can be specified with the "--threads" parameter. Multiple genome pairs can be compared in parallel, so using multiple CPUs will substantially reduce processing time when multiple pairs of genomes are being processed. For other optional parameters see the command line help (i.e. in_silico_probes.py -h).
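To make the probe spacing concrete, the sketch below enumerates probe windows along a single sequence and lists reference/target pairs. It is an illustration only and not the code used by in_silico_probes.py: the function names are hypothetical, and two behaviours are assumptions, namely that a trailing window shorter than the probe is dropped and that each pair of genomes is processed in both directions (A as reference with B as target, and the reverse).

```python
from itertools import permutations


def probe_windows(sequence_length, probe_size=120, probe_step_size=120):
    """Yield (start, end) coordinates of full-length probe windows.

    Coordinates are 0-based and end-exclusive. Dropping the final partial
    window is an assumption made for this sketch.
    """
    if probe_size <= 0 or probe_step_size <= 0:
        raise ValueError('probe_size and probe_step_size must be positive')
    last_start = sequence_length - probe_size
    for start in range(0, last_start + 1, probe_step_size):
        yield start, start + probe_size


def reference_target_pairs(genome_ids):
    """Return every ordered (reference, target) pair of distinct genomes.

    Treating pairs as ordered is an assumption made for this sketch.
    """
    return list(permutations(genome_ids, 2))
```

With the 120 bp probe length and step described above, the windows do not overlap; a smaller step yields overlapping probes and proportionally more of them, which is why the step size drives the running time.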

About

Simulation of probe hybridization events between target and reference genomes
