GRASP-resources

This is the GRASP-resources page, where you will find:

1. Supplementary material for the GRASP paper

2. Python notebooks designed to help with curating and analysing ancestral sequence reconstruction datasets

You can also learn more about GRASP at the GRASP-suite website and use GRASP now.

You might also be interested in:

GRASP - the code for the web GUI version of GRASP

bnkit - the code for performing inference within GRASP

Supplementary material

All of the supplementary material for the GRASP paper is stored in the Supplementary Material folder. Refer to the README within that folder for further information.

Notebooks

These Jupyter notebooks are split into two sections.

Curation - aligning, curating, and handling files before ancestral inference

Post Inference Analysis - analysing data sets after ancestral inference

How to use this repository

  1. Clone this repository to your desktop
git clone https://github.com/bodenlab/GRASP-resources.git

  2. Install the required Python modules as specified in requirements.txt (we assume Python >= 3.5)
pip install -r requirements.txt

Some notebooks require additional code that is stored in the /src folder. As long as you keep the src folder in the same location relative to the notebooks, this code will be found and run correctly.

  3. For Curation 5, the standard package of MAFFT is required for multiple sequence alignment.

Follow the MAFFT installation instructions for your platform.

  4. Now you can start a Jupyter notebook from the main folder
jupyter-notebook

You will then be able to navigate to the different notebooks and run the Python code within them.
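Before launching the notebooks, it can help to confirm that the pieces mentioned in the setup steps are in place. The following is a minimal sketch, not part of this repository: the function name `check_setup` is hypothetical, and it assumes that the MAFFT executable is called `mafft` and is on your PATH, and that you run it from the main folder of the cloned repository.

```python
import shutil
import sys
from pathlib import Path


def check_setup(repo_root="."):
    """Report whether the items named in the setup steps can be found.

    Hypothetical helper, not part of GRASP-resources.
    """
    root = Path(repo_root)
    return {
        # The README assumes Python 3.5 or later
        "python_ok": sys.version_info >= (3, 5),
        # requirements.txt and src are expected in the main folder
        "requirements_found": (root / "requirements.txt").is_file(),
        "src_found": (root / "src").is_dir(),
        # Assumption: the MAFFT executable is named "mafft"
        "mafft_on_path": shutil.which("mafft") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(name, "OK" if ok else "MISSING")
```

MAFFT is only needed for Curation 5, so a missing `mafft` does not prevent the other notebooks from running.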

Notebooks Table of Contents

  • Curation 1 - Basic file handling

This notebook shows ways to read FASTA files into Python and perform basic operations on them.
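As an illustration of the kind of operation this notebook covers, here is a minimal pure-Python sketch of reading FASTA records. It is not the code used in the notebook, which may rely on a library instead; the function name `parse_fasta` and the example sequences are invented for this illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined so wrapped sequences become one string
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented toy data; the first sequence is wrapped over two lines
example = """>seq1 example protein
MKTAYIAK
QRQISFVK
>seq2 another protein
MKTAYLAK
""".splitlines()

records = dict(parse_fasta(example))
lengths = {name: len(seq) for name, seq in records.items()}
```

Because `parse_fasta` accepts any iterable of lines, an open file handle can be passed in place of the `example` list.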

  • Curation 2 - Sequence curation

This notebook shows how to filter sequence data sets on the basis of their headers and how to summarise the species information within them.
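A rough sketch of header-based filtering and species summarising is shown below. It assumes UniProt-style headers in which the species follows an `OS=` tag; your headers may be formatted differently, and the notebook's own approach may differ. The helper names, accessions, and sequences are all invented for illustration.

```python
import re
from collections import Counter

# Assumption: UniProt-style headers, where the species name follows "OS="
# and ends at the next two-letter tag (e.g. "OX=") or the end of the header.
SPECIES_PATTERN = re.compile(r"OS=(.+?)(?:\s+[A-Z]{2}=|$)")


def species_of(header):
    """Return the species named in a header, or None if no OS= tag is found."""
    match = SPECIES_PATTERN.search(header)
    return match.group(1) if match else None


def filter_by_header(records, exclude_terms):
    """Keep records whose header contains none of the terms (case-insensitive)."""
    lowered = [term.lower() for term in exclude_terms]
    return {
        header: seq
        for header, seq in records.items()
        if not any(term in header.lower() for term in lowered)
    }


# Invented toy records, not real database entries
records = {
    "tr|A0A000|A0A000_HUMAN Cytochrome P450 (Fragment) OS=Homo sapiens OX=9606": "MKT",
    "sp|P00001|CP2X_MOUSE Cytochrome P450 2X OS=Mus musculus OX=10090 GN=Cyp2x": "MKTAYIAK",
    "sp|P00002|CP2Y_HUMAN Cytochrome P450 2Y OS=Homo sapiens OX=9606": "MKTAYLAK",
}

kept = filter_by_header(records, ["fragment"])
species_counts = Counter(species_of(header) for header in kept)
```

Here `kept` drops the record marked as a fragment, and `species_counts` tallies how many of the remaining sequences come from each species.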

  • Curation 3 - Checking exon counts

This notebook shows how to query the NCBI database to retrieve exon counts for a sequence data set.

  • Curation 4 - Mapping exon structure

This notebook shows how to map the exon structure information onto a multiple sequence alignment.
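The core of such a mapping is translating positions in the ungapped sequence into alignment columns. The sketch below shows one way this could be done; it is not taken from the notebook. The function names are hypothetical, and the input format (a list of exon lengths measured in residues) is an assumption made for illustration.

```python
def sequence_to_alignment_columns(aligned_seq, gap="-"):
    """Return a list whose item i is the 0-based alignment column of residue i."""
    return [col for col, char in enumerate(aligned_seq) if char != gap]


def map_exon_boundaries(aligned_seq, exon_lengths):
    """Convert exon lengths (in residues) to inclusive (start_col, end_col) pairs.

    Assumption: every exon length is a positive number of residues.
    """
    columns = sequence_to_alignment_columns(aligned_seq)
    if sum(exon_lengths) != len(columns):
        raise ValueError("exon lengths do not add up to the ungapped sequence length")
    spans, start = [], 0
    for length in exon_lengths:
        end = start + length - 1
        # Look up where the first and last residue of this exon sit in the alignment
        spans.append((columns[start], columns[end]))
        start = end + 1
    return spans


# Invented aligned sequence: 9 residues spread over 12 alignment columns
aligned = "MK--TAYI-AKQ"
spans = map_exon_boundaries(aligned, [3, 4, 2])
```

Note that a span can be wider than the exon itself when gap columns fall inside it, as with the first exon here, which covers three residues but spans columns 0 to 4.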

  • Curation 5 - Sequence curation for ancestral sequence reconstruction

This notebook shows how to automatically and iteratively remove sequences from a data set on the basis of length, bad characters, motifs, and internal deletions.
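To give a feel for this kind of filtering, here is a simplified single-pass sketch covering length, bad characters, and motifs. It is not the notebook's implementation: the function name `curate`, its parameters, and the toy sequences are invented, and it does not cover the iterative removal or the internal-deletion check, which depend on a multiple sequence alignment.

```python
import re

# The 20 standard amino acids; anything else counts as a "bad character" here
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def curate(records, min_length=0, max_length=None, required_motif=None):
    """Split records into kept sequences and removed names with a reason.

    Hypothetical helper; required_motif is a regular expression.
    """
    kept, removed = {}, {}
    for name, seq in records.items():
        upper = seq.upper()
        if len(seq) < min_length:
            removed[name] = "too short"
        elif max_length is not None and len(seq) > max_length:
            removed[name] = "too long"
        elif set(upper) - STANDARD_AMINO_ACIDS:
            removed[name] = "non-standard characters"
        elif required_motif and not re.search(required_motif, upper):
            removed[name] = "motif missing"
        else:
            kept[name] = seq
    return kept, removed


# Invented toy sequences
records = {
    "a": "MKTAYIAKCGHQ",
    "b": "MKT",
    "c": "MKTAXIAKCGHQ",
    "d": "MKTAYIAKAAAA",
}

kept, removed = curate(records, min_length=5, required_motif=r"C.H")
```

Recording a reason for each removed sequence makes it easier to review what was discarded and to adjust the thresholds before rerunning.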

  • Post inference analysis 1 - Analysis of fractional distance

This notebook allows you to analyse how the amino acid sequence at equivalent nodes changes as we increase data set size. You can specify nodes of interest in the smallest data set, which are then mapped to the equivalent nodes in the larger data sets, and then the fractional distance is calculated and plotted for all given nodes. This analysis was performed in the GRASP paper (see Figure 3).

The default notebook uses the DHAD and CYP2 data sets and recreates figures from the GRASP paper; however, it can easily be adapted to your own data sets.
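As a rough illustration of the quantity being plotted, the sketch below assumes that fractional distance means the proportion of positions that differ between two sequences of equal length. This is a simplifying assumption; the notebook's exact definition, including how gaps are treated, may differ. The function name and the sequences are invented.

```python
def fractional_distance(seq_a, seq_b):
    """Return the fraction of positions at which two equal-length sequences differ.

    Assumption: distance is a simple per-position mismatch proportion.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Invented sequences standing in for the same node inferred from
# a smaller and a larger data set
small_data_set_node = "MKTAYIAK"
large_data_set_node = "MKSAYLAK"

distance = fractional_distance(small_data_set_node, large_data_set_node)
```

In this toy case the two sequences differ at 2 of 8 positions, giving a fractional distance of 0.25; a value of 0 would mean the inferred sequence at that node did not change as the data set grew.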
