forked from gwct/core

My personal core functions and scripts for manipulating sequence data, phylogenetic trees, and other things.

rtraborn/core


CORE: COde for Romps in Evolutionary data

A mixture of scripts and libraries to help with sequence data manipulation, tree parsing, and other things.

Author

Gregg Thomas

About

These scripts can be used for many tasks, including sequence handling, tree making, and sequence alignment.
Some of these programs are mainly wrappers that make it easy to run other genomics or phylogenetics programs on a bunch of files. Pay attention to the dependencies listed for each script to make sure you have the required programs installed.
Please note that many of these scripts expect input as FASTA files. For my scripts, these must have the extension .fa. If you don't have FASTA-formatted files, you can use seq_convert to convert them to FASTA format and fa_edit to make any changes you need afterwards.
Almost all of these scripts are written in Python 2.7 (https://www.python.org/downloads/).
For any script, use the -h flag for specific usage details.
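
To illustrate the .fa input convention described above, here is a minimal sketch of reading a FASTA file into a dictionary. The function name `read_fasta` is hypothetical and is not taken from corelib/core.py; the helper in that file may be named and behave differently.

```python
def read_fasta(path):
    """Return {header: sequence} for a FASTA file.

    Hypothetical helper for illustration only, not the corelib API.
    """
    # Mirror the convention that FASTA input must use the .fa extension.
    if not path.endswith(".fa"):
        raise ValueError("expected a .fa file, got: " + path)

    seqs = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts; drop the leading '>'.
                header = line[1:]
                seqs[header] = ""
            elif header is not None:
                # Sequence lines may wrap, so append to the current record.
                seqs[header] += line
    return seqs
```

Calling `read_fasta("genes.fa")` (the file name is only an example) would return one dictionary entry per sequence, keyed by the full header line.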

CORE scripts

  1. cafecore/cafe_report_analysis.py
  • This script reads the report output file from a CAFE run and makes the results more understandable. It has a lot of options for output based on the files you have.
  2. corelib/core.py
  • General helper functions such as reading sequences to a dictionary. See the file itself for the full list of functions.
  3. corelib/nj_tree.r
  • Simple R script to get a Neighbor Joining tree. Used by supertreemaker and probably not helpful standalone.
  • Dependencies:
     i. R (https://www.r-project.org/)
  4. corelib/treeparse.py
  • A couple of functions that read (rooted) Newick-formatted trees and return all relevant information in a form that is easier to work with in code.
  5. count_aln.py
  • This script gathers statistics about a single alignment file, or a directory full of alignment files.
  6. count_pos.py
  • This script simply counts the number of amino acids or nucleotides in a file or directory.
  7. fa_concat.py
  • Concatenates many FASTA-formatted sequence files into a single FASTA file.
  8. fa_edit.py
  • A general purpose FASTA handling script. Can relabel and trim headers and remove start and stop AAs.
  9. how_many_trees
  • Just a little script to show the number of possible rooted tree topologies for a given number of species.
  10. paml_lrt.py
  • Performs a likelihood ratio test on output from the branch-site test in codeml.
  • Dependencies: Output from two run_codeml.py runs with -b 1 (null model) and -b 2 (alternate model).
  11. run_codeml.py
  12. run_gblocks.py
  • A script to run GBlocks to mask a directory full of alignments in FASTA format. Note: This currently runs GBlocks at the most relaxed settings for phylogenetic tree inference. It will reject any masks that remove more than 20% of the columns from the original alignment.
  • Dependencies:
     i. GBlocks (http://molevol.cmima.csic.es/castresana/Gblocks.html) called as gblocks
  13. run_muscle.py
  14. run_pasta_aln.py
  15. run_raxml.py
  16. seq_convert.py
  • A sequence file format conversion tool. Currently converts between FASTA (.fa), Phylip (.ph), and Nexus (.nex) formats. It assumes files will have those extensions. Remember, these formats vary a lot in the details, so they might not work right away for everything. Let me know if you run into problems and I'll try to fix it.
  17. supertreemaker.py
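
As a rough illustration of what how_many_trees reports, the sketch below counts rooted topologies under the assumption that trees are bifurcating and tips are labeled, in which case the count for n species is the double factorial (2n-3)!!. The function name `num_rooted_trees` is hypothetical, and the actual script may count trees differently.

```python
def num_rooted_trees(n_species):
    """Number of rooted, bifurcating, labeled tree topologies for n_species tips.

    Assumes the standard (2n - 3)!! formula; hypothetical helper, not the
    how_many_trees implementation.
    """
    if n_species < 1:
        raise ValueError("n_species must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_species - 2, 2):
        count *= k
    return count
```

For example, `num_rooted_trees(4)` returns 15, and the count grows very quickly from there, which is why exhaustive tree searches become impractical for more than a handful of species.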
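
The likelihood ratio test that paml_lrt.py performs can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes the test statistic is twice the difference in log-likelihood between the alternate and null models, compared against a chi-square distribution with 1 degree of freedom. The function name `lrt_pvalue` and its arguments are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Return (statistic, p-value) for a likelihood ratio test.

    Assumes a chi-square null distribution with 1 degree of freedom;
    hypothetical helper, not the paml_lrt.py implementation.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        # The alternate model fits no better than the null model.
        return 0.0, 1.0
    # For 1 degree of freedom, the chi-square survival function
    # has the closed form erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The two log-likelihoods would come from the codeml output of the null (-b 1) and alternate (-b 2) runs. Using `math.erfc` keeps the sketch free of third-party dependencies; a different number of degrees of freedom would need a general chi-square routine instead.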
