- Partially reconstruct gene family lineages in Escherichia coli
- Identify horizontal gene transfer events
These two Python scripts are very similar: each takes a file containing blocks of genes (.afa) and converts it to FASTA format.
This is a shell script that converts FASTA files to PHYLIP format.
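For reference, here is a minimal Python sketch of the same kind of FASTA to PHYLIP conversion. It is not the shell script from this repository: the function names `read_fasta` and `fasta_to_phylip` are made up for illustration, and it assumes the sequences are already aligned and that a relaxed sequential PHYLIP layout (a header line with the counts, then one `name sequence` line per record) is acceptable.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_phylip(text):
    """Convert aligned FASTA text to relaxed sequential PHYLIP text."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must be aligned to the same length")
    # Header line: number of sequences, then alignment length.
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"
```

Calling `fasta_to_phylip(open("genes.fasta").read())` would return the PHYLIP text as a string; `genes.fasta` is a placeholder file name.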
mrca(tree, family) returns the most recent common ancestor for a given gene family.
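As an illustration of the idea only, here is a small sketch of a most recent common ancestor lookup. The data layout is an assumption, not necessarily what the script uses: the tree is a nested tuple whose leaves are species names, the family is given as the set of species in which it occurs, and the species names A to E are placeholders.

```python
def leaves(tree):
    """Return the set of species names under a node."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def mrca(tree, family):
    """Return the smallest subtree that contains every species in family.

    Returns None if the family is empty or names a species not in the tree.
    """
    family = set(family)
    if not family or not family <= leaves(tree):
        return None
    if isinstance(tree, str):
        return tree
    # Descend while a single child still covers the whole family.
    for child in tree:
        if family <= leaves(child):
            return mrca(child, family)
    # No single child covers it, so this node is the common ancestor.
    return tree


# Hypothetical 5-species tree, written as nested tuples.
example_tree = ((("A", "B"), "C"), ("D", "E"))
```

With this layout, `mrca(example_tree, {"A", "C"})` returns the subtree `(("A", "B"), "C")`, and a family spanning `A` and `D` returns the whole tree.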
The correct phylogenetic tree for our 5 sample species. I typed it out by hand from a tree generated by RAxML; no script was directly involved in making it.
This is a collection of scripts that generates the information needed to run dupDel. It's a good idea to run it once and keep the output files around. Note that its input files are produced once by other scripts that I haven't looked into.
Input:
- fam.out (SiLiX results)
- geneSpeciesMap.txt
- dbList.txt (file with sample species)
- geneOrder.txt
Output:
- famGenes.txt
- famInfoResult.txt
- adjacencyInfo.txt
Sample command: python processFamGenes.py -f fam.out -m geneSpeciesMap.txt -d dbList.txt -g geneOrder.txt
This script calculates the minimum cost and associated duplications/deletions for every gene family.
Input:
- testATree
- famInfoResult.txt
Output:
- dupDelAll.txt
Sample command: python dupDel.py -t testATree -f famInfoResult.txt -d 3 -c 5 -n 1
This is a wrapper for the main pipeline. Make sure you've done all the preprocessing steps before running it (preprocessing is not included in this repository).
Example (each argument is a placeholder to replace with your own file or value): python htrans.py -f siLiX_families -m gene_species_map -d list_of_species_of_interest -g gene_order -t phylogenetic_tree -b deletion_cost -c duplication_cost -s number_of_species -o full_species_list
Kevin Heath & Zunyan Wang
Email me at kevin.n.heath@gmail.com