fhlab/TRIAD

TRIAD

Scripts for analysing the composition of libraries generated by TRIAD cloning, which accompany the manuscript by Stephane Emond, Maya Petek, Emily Kay, Brennen Heames, Sean Devenish, Nobuhiko Tokuriki and Florian Hollfelder.

Setup and installation

Alignment software

All of the required alignment tools are currently (August 2019) available as packages in the standard Ubuntu repositories, apart from PEAR, which now requires a separate download due to a change in licensing.

The versions given here were used to process the Illumina dataset described in the manuscript. The pickled summary data and reference sequences for alignment are provided in the folder manuscript_data.

Python and Conda dependencies

These scripts were developed and tested within an Anaconda environment, which is specified in TRIAD.yml. To install the environment, go to the TRIAD directory and run:

conda env create -f TRIAD.yml

Environment set-up usually takes about 10 minutes. Activate the environment with conda activate TRIAD (preferred for conda 4.4 or later) or source activate TRIAD (older versions).

Possible issue with conda and its resolution

While BioPython is included in the conda environment file, you may run into an issue where BioPython cannot be loaded. The workaround is to first install pip3 with your preferred package manager, then create the conda environment, and finally install BioPython with sudo pip3 install biopython. Alternatively, conda install -c conda-forge biopython installs it from the conda-forge channel inside the activated environment.

Short version of scripts

1a. If you are working on a cluster where PEAR is difficult to install, assemble the reads separately:

pear -f $forwardReads -r $reverseReads -o $baseName.$activity --keep-original --min-overlap 5 --min-assembly-length 0 --quality-threshold 15 --max-uncalled-base 0.01
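When several fractions have to be assembled, the PEAR call above can be generated per sample instead of being typed out each time. The following is a minimal sketch, not part of the repository: the function name pear_command, the sample sheet and the read file names are hypothetical, while the PEAR options are copied unchanged from the command above.

```python
import shlex


def pear_command(forward_reads, reverse_reads, base_name, activity):
    """Build the PEAR argument list from step 1a for one pair of read files."""
    return [
        "pear",
        "-f", forward_reads,
        "-r", reverse_reads,
        "-o", f"{base_name}.{activity}",
        "--keep-original",
        "--min-overlap", "5",
        "--min-assembly-length", "0",
        "--quality-threshold", "15",
        "--max-uncalled-base", "0.01",
    ]


# Hypothetical sample sheet: activity label -> (forward reads, reverse reads).
samples = {
    "input": ("input_R1.fastq", "input_R2.fastq"),
    "3bp_deletion": ("del3_R1.fastq", "del3_R2.fastq"),
}

for activity, (forward, reverse) in samples.items():
    cmd = pear_command(forward, reverse, "PTE", activity)
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True) on a machine where PEAR is installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if a file path contains spaces.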

1b. Run count.sh reads_fw reads_rv reference.fa base_name activity or count_PC.sh [arguments], as appropriate to your environment.

Arguments:

  • reads_fw: (only for PC version) path to forward fastq.gz reads
  • reads_rv: (only for PC version) path to reverse reads
  • reference.fa: file path to the reference fasta file
  • base_name: usually a shorthand for the gene being analysed, e.g. PTE
  • activity: a label for the fraction / activity gate / input library these reads came from

Steps in the pre-processing script:

  • (If using paired-end reads as in 1a: merge the reads with PEAR. Take the opportunity to filter out very broken data.)
  • Align all reads against the reference.
  • While we have SAM files, take the opportunity to calculate the depth per position.
  • Extract reads that are correctly mapped, keeping their names.
  • Throw away reads that fully match the reference. This is faster than running NW alignment on all reads later.
  • Feed the interesting reads to the EMBOSS Needleman-Wunsch aligner.
  • Output FASTA ALN files.
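The two filtering steps in the list above (keep mapped reads, drop reads identical to the reference) can be illustrated on SAM text records. This is a minimal sketch under stated assumptions, not the logic of count.sh itself: it assumes the aligner writes the standard NM:i edit-distance tag, and it treats a read as a full match when its CIGAR string is a single M run and NM is 0. The function name interesting_reads and the example records are hypothetical.

```python
import re

# A CIGAR made of one match/mismatch run only, e.g. "12M" (no indels, no clipping).
FULL_MATCH_CIGAR = re.compile(r"\d+M")


def interesting_reads(sam_lines):
    """Yield (name, sequence) for mapped reads that differ from the reference."""
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # malformed record: 11 mandatory columns
            continue
        name, flag, cigar, seq = fields[0], int(fields[1]), fields[5], fields[9]
        if flag & 0x4:  # SAM flag bit 0x4 marks an unmapped read
            continue
        edit_distance = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                edit_distance = int(tag[5:])
        if FULL_MATCH_CIGAR.fullmatch(cigar) and edit_distance == 0:
            continue  # identical to the reference, nothing to analyse
        yield name, seq


# Hypothetical records: a perfect match, a 3 bp deletion, an unmapped read
# and a single substitution.
records = [
    "@SQ\tSN:ref\tLN:12",
    "read1\t0\tref\t1\t42\t12M\t*\t0\t0\tACGTACGTACGT\tIIIIIIIIIIII\tNM:i:0",
    "read2\t0\tref\t1\t42\t6M3D3M\t*\t0\t0\tACGTACCGT\tIIIIIIIII\tNM:i:3",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGG\tIIIIIIII",
    "read4\t0\tref\t1\t42\t12M\t*\t0\t0\tACGTACGTACGA\tIIIIIIIIIIII\tNM:i:1",
]
kept = list(interesting_reads(records))
print([name for name, _ in kept])
```

Reads without an NM tag are kept rather than discarded, so a missing tag never silently removes a mutated read.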

These scripts will output an alignment file in FASTA format from each pair of input forward & reverse reads. The output names are specified as options to count.sh such that alignments will be named base_name.activity.aln (for example, PTE.3bp_deletion.aln).
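For downstream inspection, such an alignment file can be read back as pairs of gapped sequences. The sketch below assumes that each alignment is stored as two consecutive FASTA records of equal length; which of the two is the reference depends on how the aligner was invoked and is not assumed here. The function names read_fasta and alignment_pairs and the demo sequences are hypothetical, and the parser uses only the standard library rather than BioPython.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def alignment_pairs(handle):
    """Group consecutive FASTA records into pairs of equal-length gapped sequences."""
    records = list(read_fasta(handle))
    if len(records) % 2:
        raise ValueError("expected an even number of FASTA records")
    for first, second in zip(records[0::2], records[1::2]):
        if len(first[1]) != len(second[1]):
            raise ValueError(f"unequal alignment lengths for {first[0]}")
        yield first, second


# Hypothetical content of a file such as PTE.3bp_deletion.aln.
demo = io.StringIO(">ref\nACGTAC\nGT\n>read1\nACG--CGT\n")
pairs = list(alignment_pairs(demo))
print(pairs)
```

With a real file, replace the StringIO object with open("PTE.3bp_deletion.aln").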

Counting substitutions, deletions and combinations for each sequencing file

For the PTE library data, the script was run with the following command and options:

PTE_composition.py --folder /path/to/aln/files --reference TRIAD/manuscript_data/full_fragment.fa --start_offset 200 --end_trail 97 --output S6_full

The script performs the following steps:

  • Load a dictionary of all interesting mutations we are considering.
  • Read each read + reference pair into a SeqRecord.
  • Run various checks that the read is not broken.
  • Barcodes: as long as the read does not contain insertions, the barcode is ignored and does not contribute to detected mutations.
  • If the mutation is defined as interesting, work out what kind it is and add it to the valid_counts dictionary.
  • Add the mutation to a dictionary counting everything.
  • Save both dictionaries for later viewing.
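The counting idea can be sketched on plain gapped strings. This is an illustration under stated assumptions, not the code of PTE_composition.py: the function find_mutations, the tuple encoding of a mutation, the example sequences and the set called interesting are all hypothetical, and the sketch ignores barcodes and the trimming controlled by --start_offset and --end_trail.

```python
import pickle
from collections import Counter


def find_mutations(reference, read):
    """Return a tuple of (kind, reference_position, detail) events.

    Positions are 0-based reference coordinates; consecutive gap columns are
    merged into a single deletion or insertion event.
    """
    reference, read = reference.upper(), read.upper()
    if len(reference) != len(read):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0
    i = 0
    while i < len(reference):
        if reference[i] == "-":  # gap in reference: bases inserted in the read
            inserted = []
            while i < len(reference) and reference[i] == "-":
                inserted.append(read[i])
                i += 1
            mutations.append(("insertion", ref_pos, "".join(inserted)))
        elif read[i] == "-":  # gap in read: reference bases deleted
            start = ref_pos
            while i < len(reference) and read[i] == "-" and reference[i] != "-":
                ref_pos += 1
                i += 1
            mutations.append(("deletion", start, ref_pos - start))
        else:
            if reference[i] != read[i]:
                mutations.append(("substitution", ref_pos, f"{reference[i]}>{read[i]}"))
            ref_pos += 1
            i += 1
    return tuple(mutations)


# Hypothetical aligned (reference, read) pairs.
aligned = [
    ("ACGTACGT", "ACG--CGT"),
    ("ACGTACGT", "ACCTACGT"),
    ("ACGTACGT", "ACG--CGT"),
]
interesting = {(("deletion", 3, 2),)}  # hypothetical set of expected mutations

all_counts, valid_counts = Counter(), Counter()
for reference, read in aligned:
    found = find_mutations(reference, read)
    all_counts[found] += 1
    if found in interesting:
        valid_counts[found] += 1

# Counters pickle like ordinary dictionaries; a round trip stands in for
# saving the summaries to disk and loading them again later.
restored = pickle.loads(pickle.dumps({"all": all_counts, "valid": valid_counts}))
print(restored["valid"])
```

Using the whole tuple of events as the dictionary key keeps combinations of mutations in one read distinct from the same mutations occurring in separate reads.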

Jupyter notebook that reproduces the figures and statistics in the manuscript

Start Jupyter with jupyter lab and open TRIAD_composition_figures.ipynb to view the results.
