Structual Variant Pipeline

computed 0.23
computed 0.13

TODO

add adaptable background rate for smith waterman filtering

Usage

The basic pipeline may be ran with the follow command.

python run.py four data/ambig_calls/ figs/all.pdf --L 600

The first argument specifies the type of pipeline that is being run. In this case four indicates that the pdf with the four plots per page should be generated. Other options include simple, which does (TODO), and sixteen which generates a pdf page with 16 subplots, where each subplot is the overlap graph for a different L.

The second argument, data/ambig_calls, specifies the folder where the input files are found. The four pipeline requires 3 files per region: x-refcoords.bed, x_merged.m4, x.fa, where x is the name of the region. The pipeline uses all *.m4 files in the specified directory as regions to examine.

The third argument specifies the output pdf. If pdftk is not installed (and in which case the pipeline will run correctly), then figs/all.pdf will not exist, but figs/ will be populated with a pdf for each region.

The fourth argument, --L, is an optional argument specifying the minimum matching length to say that a read overlaps another read. The L parameter is fed into minimap; see minimap for more information.

Notes

If running pipeline via ssh, login with ssh -X

Dependencies

numpy, networkx, bipython... the works
- see requirements.txt for more info
align: github.com/aled1027/align
- clone the repo and install with python setup.py develop
pdftk
- Not necessary, but it combines many pdfs into a single pdf.
- On debian, install with sudo apt-get install pdfkt

Profiling

python -m cProfile -s time the_pipeline.py > profile.text 2>&1

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
data		data
incomplete_data		incomplete_data
sv_pipeline		sv_pipeline
.gitignore		.gitignore
.gitmodules		.gitmodules
connect.sh		connect.sh
minimap		minimap
readme.md		readme.md
run.py		run.py
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

incomplete_data

incomplete_data

sv_pipeline

sv_pipeline

.gitignore

.gitignore

.gitmodules

.gitmodules

connect.sh

connect.sh

minimap

minimap

readme.md

readme.md

run.py

run.py

run.sh

run.sh

Repository files navigation

Structual Variant Pipeline

TODO

Usage

Notes

Dependencies

Profiling

About

Releases

Packages

Contributors 2

Languages

aled1027/sv_pipeline

Folders and files

Latest commit

History

Repository files navigation

Structual Variant Pipeline

TODO

Usage

Notes

Dependencies

Profiling

About

Resources

Stars

Watchers

Forks

Languages