transPACT

trans-AT PKS Annotation and Comparison Tool

transPACT is a joint collaboration between the University of Wisconsin-Madison, ETH Zurich, and Wageningen University.

Reference:

EJN Helfrich*, R Ueoka*, MG Chevrette*, F Hemmerling, X Lu, S Leopold-Messer, AY Burch, SE Lindow, J Handelsman, J Piel†, MH Medema†. Evolution of combinatorial diversity in trans-acyltransferase polyketide synthase assembly lines across bacteria. 2021. Nature Communications 12, 1422. 10.1038/s41467-021-21163-x

* equal contributions

† to whom correspondence should be addressed; JP: jpiel (at) ethz.ch | MHM: marnix.medema (at) wur.nl

Brief description

Trans-acyltransferase polyketide synthases (trans-AT PKSs) are multimodular enzymes that biosynthesize diverse pharmaceutically and ecologically important natural products. Here, we developed and applied a phylogenomic algorithm, transPACT (trans-AT PKS Annotation and Comparison Tool), to perform a global computational analysis of trans-AT PKS gene clusters, identifying hundreds of evolutionarily conserved module blocks. Network analysis of their exchange patterns reveals a widespread diversification mechanism for these enzymes. This repository contains the transPACT implementation for assigning substrate specificity to the ketosynthase (KS) domains of trans-AT PKSs, as well as the helper scripts used to generate the global trans-AT PKS network. transPACT is typically run as a standalone tool, but it is built on the antiSMASH 4.x architecture [paper] [repo].

Set up environment

Dependencies are listed in conda_packages.txt. We strongly recommend creating a dedicated conda environment from this file, e.g.:

conda create --name transPACT --file conda_packages.txt

This creates a new environment called transPACT with all dependencies installed. Activate it with:

conda activate transPACT

Installation and setup on a "normal" desktop computer should take less than 5 minutes. In our tests, setup completed in 26 seconds with:

date && git clone https://github.com/chevrm/transPACT.git && cd transPACT && conda create --name transPACTtest --file conda_packages.txt && conda activate transPACTtest && date

Running transPACT to assign KS substrate specificity

  • python2 transPACT_substrate_from_faa.py <protein_fasta_of_KS_domains.faa>

    • Predicts the substrate specificity of trans-AT PKS KS domains from a protein FASTA file. An example input is provided in example/test.faa.
    • Tab-separated output (written to STDOUT by default; redirect it to a file to save the results)
    • Run time on a "normal" desktop computer should be less than 1 minute per KS domain. Run time for a single KS domain was benchmarked at 7 seconds with: date && python2 transPACT_substrate_from_faa.py example/test.faa && date
  • python2 ./data/dendrogram20200320/generate_dendrogram_userweights.py <Jaccard_weight> <DSS_weight> <AdjacencyIndex_weight>

    • Generates a dendrogram of trans-AT PKS pathways
    • Jaccard index (JI), domain sequence similarity (DSS), and adjacency index (AI) are implemented as described in BiG-SCAPE [paper]. Briefly, JI measures the fraction of domain types shared between two pathways, DSS measures the sequence identity between their protein domains, and AI measures the fraction of shared pairs of adjacent domains.
      • Suggested weights are JI = 0, DSS = 0.32, and AI = 0.68, the same weights BiG-SCAPE uses in its distance calculation for trans-AT PKS pathways.
    • Not provided in this repository (due to its size): the all-vs.-all DIAMOND table, whose filename is set at line 576 of the script.
    • Output is a dendrogram in Newick format that can be visualized with any tree-viewing software. We recommend iTOL, which we used for our global analysis [iTOL] [our analysis]. Extensive documentation on annotating iTOL trees is available in the iTOL help pages. Our annotation files are data/dendrogram20200227/itol_bin.txt, which denotes whether a BGC lies on a contig edge, and data/dendrogram20200227/itol_dom.txt, which annotates the KS-domain clades of each pathway.
    • Brief instructions to recreate the dendrogram from the transPACT manuscript:
      • Create and enter a new dendrogram folder and copy dendrogram script, e.g.: mkdir data/dendrogram_test && cd data/dendrogram_test && cp ../dendrogram20200320/generate_dendrogram_userweights.py ./
      • Create the all-vs.-all DIAMOND table (paths are relative to the new dendrogram folder) with: diamond makedb -d all --in ../dendrogram20190514/KS_precomputed_1405_hmmalign_trimmed_renamed.fasta && diamond blastp -d all -q ../dendrogram20190514/KS_precomputed_1405_hmmalign_trimmed_renamed.fasta -o full.dbp
      • Update line 576 of your copy of generate_dendrogram_userweights.py so that it points to the absolute path of the full.dbp file created above.
      • Generate the dendrogram with the suggested weights, using the edited copy of the script in the current folder: python2 generate_dendrogram_userweights.py 0 0.32 0.68
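Because run time scales with the number of KS domains in the input, it can help to count them before a run. The Python 3 sketch below assumes one KS domain per FASTA record and uses the "less than 1 minute per KS domain" figure above as an upper bound; read_fasta() and worst_case_minutes() are hypothetical helpers, not part of transPACT.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-separated token after '>'.
    Assumes one KS domain per record; a repeated ID overwrites the earlier one.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""  # empty ID for a bare '>'
            records[current] = []
        elif current is not None:
            records[current].append(line)  # a sequence may span several lines
    return {rid: "".join(chunks) for rid, chunks in records.items()}


def worst_case_minutes(n_domains: int, minutes_per_domain: float = 1.0) -> float:
    """Upper-bound run time from the '< 1 minute per KS domain' guidance."""
    return n_domains * minutes_per_domain
```

For example, len(read_fasta(open("example/test.faa").read())) gives the number of KS records in the bundled example input, and passing that count to worst_case_minutes() gives a rough ceiling on run time.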
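The weighted combination of the three indices can be sketched as follows. This is a simplified Python 3 illustration of a BiG-SCAPE-style distance, d = 1 - (w_JI·JI + w_DSS·DSS + w_AI·AI), not the code in generate_dendrogram_userweights.py: it assumes each pathway is an ordered list of domain labels, treats adjacent pairs as unordered, and takes DSS as a precomputed value in [0, 1] rather than deriving it from the DIAMOND table. The function names are hypothetical.

```python
from typing import Hashable, List, Set, Tuple


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Size of intersection / size of union; 0.0 when both sets are empty (a convention chosen here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacent_pairs(domains: List[str]) -> Set[Tuple[str, ...]]:
    """Unordered pairs of neighbouring domains, e.g. [KS, DH, KR] -> {(DH, KS), (DH, KR)}."""
    return {tuple(sorted(pair)) for pair in zip(domains, domains[1:])}


def pathway_distance(domains_a: List[str], domains_b: List[str], dss: float,
                     w_ji: float = 0.0, w_dss: float = 0.32, w_ai: float = 0.68) -> float:
    """Distance = 1 - weighted similarity; defaults are the suggested JI/DSS/AI weights."""
    ji = jaccard(set(domains_a), set(domains_b))                          # shared domain types
    ai = jaccard(adjacent_pairs(domains_a), adjacent_pairs(domains_b))    # shared neighbour pairs
    similarity = w_ji * ji + w_dss * dss + w_ai * ai
    return 1.0 - similarity
```

Running the script with 0 0.32 0.68 corresponds to these default weights: JI then contributes nothing, so two pathways with the same domain content but a different domain order are separated only through DSS and AI.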

What's actually happening when I run transPACT?

The core transPACT algorithm is in antismash/specific_modules/nrpspks/nrpspksdomainalign/substrate_from_faa.py, which is symlinked as transPACT_substrate_from_faa.py for convenience. Query KS domains (supplied as a protein FASTA) are aligned with MUSCLE to a reference alignment of a core set of 647 experimentally characterized KS domains (see align_ks_domains(); invoked on line 533). This alignment is used to place each query sequence phylogenetically on a reference phylogeny with pplacer (see run_pipeline_pplacer(); invoked on line 534), and each query is then assigned to a clade and functional classification based on monophyly (see parse_pplacer()).
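The monophyly-based assignment step can be illustrated with a toy reference tree. The Python 3 sketch below is a simplification under stated assumptions, not the logic of parse_pplacer(): it takes the node directly below the edge pplacer chose for a query and assigns the query to a clade only if every reference leaf under that node carries the same clade label. Node, find_node(), assign_clade(), and the clade labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Node of a reference tree; leaves carry a (hypothetical) clade label."""
    name: str
    clade: Optional[str] = None  # only meaningful on leaves
    children: List["Node"] = field(default_factory=list)


def leaf_clades(node: Node) -> List[Optional[str]]:
    """Collect the clade labels of all leaves in the subtree rooted at `node`."""
    if not node.children:
        return [node.clade]
    labels: List[Optional[str]] = []
    for child in node.children:
        labels.extend(leaf_clades(child))
    return labels


def find_node(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for the node with the given name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find_node(child, name)
        if hit is not None:
            return hit
    return None


def assign_clade(tree: Node, placement_node: str, unassigned: str = "unassigned") -> str:
    """Return the clade if all reference leaves below the placement edge share it."""
    target = find_node(tree, placement_node)
    if target is None:
        raise KeyError(placement_node)
    clades = set(leaf_clades(target))
    if len(clades) == 1 and None not in clades:
        return next(iter(clades))  # monophyletic: every leaf has the same label
    return unassigned
```

In this sketch a placement on a terminal edge inherits that leaf's label, while a placement on an edge whose subtree mixes clades stays unassigned.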