Skip to content

ginerorama/Blaster

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Blaster

alt text

Require

python modules: Tkinter, biopython, pillow, seaborn

other binary programs:

  1. ncbi blast
  2. mview
  3. muscle

Important Note: Orthoprok.gif file has to be at the same path that Blaster.py.

Usage


python Blaster.py


Input

Proteins and genomes sequences have to be in separated fasta (.fa, .fna, .fasta) files (multifasta is not supported).

load genome files selecting the folder containning the target genome in /genome folder label load query proteins selecting the folder containning your proteins sequences in fasta format in /Queries folder Indicate the path in which your project is going to be save at /Save folder.

Analysis

Firs yoou need to generate a log files picking up log buttom. Next perform a blast a analysis picking up at Blast buttom. Finally you can perform as many analysis you want changing the coverage and identity patterns (Note: the results are going to rewrite your old results unless you change your /Save folder.

Use collapse to collapse all the genomes that belog to the same species. Use absence analysis to obtain those genomes that not contains you query proteins (default: presence analysis)

Output

Blaster analysis generates two different folders. /Result folder that contains all the analysis result and /tmp folder in which several intermediate-tables analysis are stored.

/result/ file description:

  • Analysis_presence_summary.csv = table containning number of copies found by blast for query proteins in every genome.
  • Analysis_pseudo_summary.csv = table containning number of pseudogenes found by blast for query proteins in every genome.
  • Protein_ident.csv = table containning % identity value of proteins found by blast in every genome.
  • Protein_size.csv = table containning % coverage value of proteins found by blast in every genome.
  • summarystats.csv = number of total queries proteins found in all genomes, this data are graphical representaed in the grpahic_stats.pdf file.

In /result/blast/ you can find all the blast hits found for every query in the target genomes according to coverage and identity parameters used in the analysis.

Alignment analysis:

all the protein sequences found according to coverage and identity values set are going to be alignned with MUSCLE and reformated MVIEW.

Graphic analysis

In this section different graphic analysis can be performed and width, height and font scale of the output figure could be controlled.

Protein plot analysis:

In these graphics the coverage of all homologous proteins found in the target genomes for a determinated query protein will be plotted. Resulting graphic will be pop up but it will also be saved at /result/proteins/

Heatmap analysis:

next files present in /tmp:

  • groupbyserovar_1normalized_absence.csv
  • groupbyserovar_1normalized_merge.csv
  • groupbyserovar_1normalized_pseudo.csv

are used to plot presence/absence of query proteins in all genomes. In the case collapse genomes option is set, all genome from the same species will be collapsed in the heatmap. This analysis will generate three differente graphic files:

  • clustermap_presence.pdf
  • clustermap_pseudogenes.pdf
  • clustermap_presence_pseudogenes.pdf

Blast plot:

This function plot the coverage and the identity of all query protein making easy the visual inspection of the whole Blast analysis.

This figure is stored in blast_plot.tif file

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages