Create sample sequences from a reference genome and a VCF file

JoseBlanca/vcf2fasta

vcf2fasta creates sequences for each sample in a VCF file.

I wrote this script to help me with a phylome analysis.

It requires:

* a reference genome FASTA file
* a VCF file
* a BED file with the regions to process

It will substitute the alleles found in the VCF file into the genome reference to create the sample sequences.
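The substitution step can be illustrated with a minimal sketch. It is not the script's actual code: the function name `apply_snps` and the dictionary layout are hypothetical, and it handles only single-base alleles, which keeps every sample sequence the same length as the reference. How the script treats indels or heterozygous calls is not covered here.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with single-base alleles substituted.

    `snps` maps 1-based positions (as in the VCF POS column) to the allele
    called for one sample. Hypothetical layout, for illustration only.
    """
    seq = list(reference)
    for pos, allele in snps.items():
        if len(allele) != 1:
            raise ValueError("this sketch only handles single-base alleles")
        seq[pos - 1] = allele  # VCF positions are 1-based, Python indices 0-based
    return "".join(seq)


reference = "ACGTACGT"
sample_snps = {2: "T", 5: "G"}  # position -> allele for one sample
print(apply_snps(reference, sample_snps))  # ATGTGCGT
```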

It will create one FASTA file for each region in the BED file. Each FASTA file contains the aligned sequences of all the samples in the VCF file, one record per sample.
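A rough sketch of the per-region output follows, assuming the sample sequences have already been built and are held in memory. The names `parse_bed` and `region_fasta`, the nested dictionary layout, and the output file naming are assumptions for illustration, not the script's interface. BED coordinates are 0-based and half-open, so they map directly onto Python slices.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping comments."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def region_fasta(region, sample_seqs):
    """Build the FASTA text for one region, with one record per sample.

    `sample_seqs` maps sample name -> {chrom: full-length sequence}
    (a hypothetical in-memory layout).
    """
    chrom, start, end = region
    records = []
    for sample, chrom_seqs in sample_seqs.items():
        # BED intervals are 0-based, half-open: the same convention as slicing
        records.append(f">{sample}\n{chrom_seqs[chrom][start:end]}\n")
    return "".join(records)


bed_lines = ["chr1\t2\t6\n"]
sample_seqs = {
    "sample1": {"chr1": "ACGTACGT"},
    "sample2": {"chr1": "ATGTGCGT"},
}
for region in parse_bed(bed_lines):
    # A real run would write this text to one file per region,
    # e.g. a hypothetical name such as "chr1_2_6.fasta".
    print(region_fasta(region, sample_seqs), end="")
```

Because only substitutions of equal length are applied in this sketch, the records in each file have the same length and can be read as an alignment.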

Optionally, a file with the coverages found in a list of BAM files can be given. This file can be generated with the command:

$ samtools depth -f bam_list.txt -q 20 -Q 55 | gzip > coverages.csv.gz

If this coverage file is given, all positions with a coverage lower than a given threshold will be set to N.
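The masking can be sketched as below. This is an illustration under stated assumptions, not the script's code: `parse_depth_lines` and `mask_low_coverage` are hypothetical names, the example uses a single depth column, and how the script matches BAM columns to VCF samples is not shown. `samtools depth` writes tab-separated lines with the chromosome, a 1-based position, and one depth column per BAM file; by default it omits positions with no coverage, so the sketch treats missing positions as depth 0.

```python
def parse_depth_lines(lines, chrom, column=0):
    """Collect {1-based position: depth} for one chromosome.

    `column` selects which BAM's depth to use (0 = first depth column).
    """
    depths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        depths[int(fields[1])] = int(fields[2 + column])
    return depths


def mask_low_coverage(seq, depths, min_depth):
    """Replace every base whose depth is below `min_depth` with N."""
    return "".join(
        base if depths.get(pos, 0) >= min_depth else "N"  # missing -> depth 0
        for pos, base in enumerate(seq, start=1)
    )


depth_lines = ["chr1\t1\t12\n", "chr1\t2\t3\n", "chr1\t4\t10\n"]
depths = parse_depth_lines(depth_lines, "chr1")
print(mask_low_coverage("ACGT", depths, min_depth=5))  # ANNT
```

For the gzipped file produced by the command above, the lines could be read with `gzip.open("coverages.csv.gz", "rt")` and passed to `parse_depth_lines` in place of the in-memory list.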

Alternatives to this script include GATK's FastaAlternateReferenceMaker and bcftools consensus.
