Skip to content

pfiziev/AluMorphix

Repository files navigation

This file provides a brief description of my project and all scripts that I have programmed.

SYSTEM REQUIREMENTS:
1. Python 2.7
2. Bowtie 2
3. Samtools
4. Pysam

A collection of shell scripts run

run.sh: This script runs the whole analysis pipeline:
1. Generate a diploid subject genome
2. Generate a set of random pair-end reads
3. Align all reads against a database of known Alu sequences
4. Tag reads that map to Alus
5. Align the reads against the reference genome
6. Process alignments and report potential insertions, deletions and if they are heterozygous or homozygous

run_random.sh - runs the whole pipeline, but generates a random reference genome instead of using real data.

Various scripts to generate the subject genome and the short reads:
generate_random_genome.py
generate_reads.py
generate_subject_genome_and_reads.py
generate_subject_genome.py
generate_subject_genome_with_deletions.py


bowtie2_build.sh - builds an index for the reference genome

bowtie2.sh - mapsthe short reads to the reference genome
bt2_vs_alu.sh - maps shorts reads to the Alu db

mark_ALU_reads.py - scans the alignments of the reads and the known Alu sequences and tags reads that come from an Alu.

utils.py - some utility methods

alumorphix.py - determines potential insertions and deletions and if they are heterozygous or homozygous

The project assumens that the data is organized as following:

chr22_diploid_30x/:

chr22_diploid_30x/alu: # put all information about known Alus here
alu.fa  # the sequence of the Alu
alu.fa.fai  #indexed with samtools
bowtie2_index   # bowtie2 index of the Alu sequence

chr22_diploid_30x/genome:   # the reference genome
alu.fa
alu.fa.fai
bowtie2_index
chr22_ALU
chr22_ALU.bed
genome.fa
genome.fa.fai


chr22_diploid_30x/alu/bowtie2_index:
alu_bt2.1.bt2
alu_bt2.2.bt2
alu_bt2.3.bt2
alu_bt2.4.bt2
alu_bt2.rev.1.bt2
alu_bt2.rev.2.bt2

chr22_diploid_30x/genome/bowtie2_index:
genome.1.bt2
genome.2.bt2
genome.3.bt2
genome.4.bt2
genome.rev.1.bt2
genome.rev.2.bt2

About

Alu Polymorphism Detector

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published