
tiehan/Rhea_Chip

 
 

Rhea_Chip is a simple pipeline for analysing human data generated by next-generation sequencing (NGS). With Rhea_Chip, a full suite of NGS tools can be up and running on any high-end workstation in an afternoon.

The pipeline implements the following workflow:

1. Read cleaning: the FASTQ files produced by the Illumina pipeline are first processed to remove adaptor sequences and low-quality reads.
2. Alignment: cleaned reads are aligned to the GRCh37/hg19 human reference genome (UCSC Genome Browser) with the Burrows-Wheeler Aligner (BWA), using the aln algorithm (with sampe for paired-end reads) or the mem algorithm (Li & Durbin, 2010).
3. Sorting: the resulting BAM files are sorted with SortSam from the Picard toolkit (http://sourceforge.net/projects/picard/).
4. Merging: Picard MergeSamFiles merges all sorted BAM files belonging to one sample into a single BAM file.
5. Duplicate marking: potential PCR duplicates are marked with Picard MarkDuplicates.
6. Indel realignment: alignment to the reference is often imperfect around small insertions and deletions (indels), so reads in these regions may be misaligned. To improve indel identification, alignments are refined with the Genome Analysis Toolkit (GATK) RealignerTargetCreator and IndelRealigner tools, which together use the full alignment context to determine whether an indel is present.
7. Base quality score recalibration: read quality scores are recalibrated with GATK BaseRecalibrator and PrintReads so that they more closely match the actual probability of a mismatch against the reference, correcting quality variation associated with machine cycle and sequence context (DePristo et al., 2011).
8. Indexing: an index file is generated for each BAM file with samtools index.
9. Variant calling: variants are called with GATK HaplotypeCaller (https://www.broadinstitute.org/gatk/), which calls both single nucleotide polymorphisms and indels by de novo assembly of haplotypes in the target region.
10. Variant filtering: the called variants, exported in Variant Call Format (VCF), are filtered with GATK VariantFiltration.
11. Combining: the SNV and indel VCF files are merged into one with GATK CombineVariants.

In addition, Rhea_Chip provides a new method for detecting copy number variation (CNV).
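The alignment-to-VCF workflow can be summarised as a sequence of command lines. The sketch below is a hedged illustration, not Rhea_Chip's own implementation. It assumes GATK 3.x (the "-T <Tool>" syntax that the listed tools belong to), the legacy Picard KEY=VALUE argument style, paired-end reads aligned with BWA-MEM, a dbSNP VCF as the known-sites resource for BaseRecalibrator, and hypothetical file names (ref.fa, dbsnp.vcf, <sample>_<lane>_R1.clean.fq.gz and so on). It also assumes SNVs and indels are split with SelectVariants before filtering; the filter expressions are illustrative values. The helper only builds and prints argument lists; each Step could be executed with subprocess.run(step.argv, check=True), redirecting stdout to step.stdout where it is set.

```python
import shlex
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    argv: List[str]
    stdout: Optional[str] = None  # file to capture stdout into (bwa mem writes SAM to stdout)


# Hypothetical tool locations and resources; adjust to the local installation.
PICARD = ["java", "-jar", "picard.jar"]
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # GATK 3.x "-T <Tool>" syntax
REF = "ref.fa"             # GRCh37/hg19 FASTA, bwa-indexed, with .fai and .dict
KNOWN_SITES = "dbsnp.vcf"  # known variants for BaseRecalibrator (assumption)


def build_steps(sample: str, lanes: List[str]) -> List[Step]:
    steps: List[Step] = []
    sorted_bams: List[str] = []
    for lane in lanes:
        # Align cleaned paired-end reads; bwa expects a literal "\t" inside the read-group string.
        read_group = rf"@RG\tID:{lane}\tSM:{sample}"
        sam, bam = f"{sample}_{lane}.sam", f"{sample}_{lane}.sorted.bam"
        steps.append(Step(f"bwa_mem_{lane}", ["bwa", "mem", "-R", read_group, REF,
                          f"{sample}_{lane}_R1.clean.fq.gz", f"{sample}_{lane}_R2.clean.fq.gz"], sam))
        steps.append(Step(f"sort_{lane}", PICARD + ["SortSam", f"I={sam}", f"O={bam}",
                                                    "SORT_ORDER=coordinate"]))
        sorted_bams.append(bam)

    # Merge all lanes of the sample, then mark PCR duplicates.
    merged, dedup = f"{sample}.merged.bam", f"{sample}.dedup.bam"
    steps.append(Step("merge", PICARD + ["MergeSamFiles"] + [f"I={b}" for b in sorted_bams]
                      + [f"O={merged}"]))
    steps.append(Step("markdup", PICARD + ["MarkDuplicates", f"I={merged}", f"O={dedup}",
                                           f"M={sample}.dup_metrics.txt"]))
    steps.append(Step("index_dedup", ["samtools", "index", dedup]))  # GATK 3 needs a BAM index

    # Local realignment around indels.
    intervals, realigned = f"{sample}.intervals", f"{sample}.realigned.bam"
    steps.append(Step("targets", GATK + ["-T", "RealignerTargetCreator", "-R", REF,
                                         "-I", dedup, "-o", intervals]))
    steps.append(Step("realign", GATK + ["-T", "IndelRealigner", "-R", REF, "-I", dedup,
                                         "-targetIntervals", intervals, "-o", realigned]))

    # Base quality score recalibration.
    table, recal = f"{sample}.recal.table", f"{sample}.recal.bam"
    steps.append(Step("bqsr", GATK + ["-T", "BaseRecalibrator", "-R", REF, "-I", realigned,
                                      "-knownSites", KNOWN_SITES, "-o", table]))
    steps.append(Step("print_reads", GATK + ["-T", "PrintReads", "-R", REF, "-I", realigned,
                                             "-BQSR", table, "-o", recal]))
    steps.append(Step("index_recal", ["samtools", "index", recal]))

    # Variant calling.
    raw = f"{sample}.raw.vcf"
    steps.append(Step("call", GATK + ["-T", "HaplotypeCaller", "-R", REF, "-I", recal, "-o", raw]))

    # Split, hard-filter and recombine SNVs and indels (expressions are illustrative assumptions).
    filtered: List[str] = []
    for kind, expr in (("SNP", "QD < 2.0 || FS > 60.0 || MQ < 40.0"),
                       ("INDEL", "QD < 2.0 || FS > 200.0")):
        selected, flt = f"{sample}.{kind}.vcf", f"{sample}.{kind}.filtered.vcf"
        steps.append(Step(f"select_{kind}", GATK + ["-T", "SelectVariants", "-R", REF, "-V", raw,
                                                   "-selectType", kind, "-o", selected]))
        steps.append(Step(f"filter_{kind}", GATK + ["-T", "VariantFiltration", "-R", REF,
                                                   "-V", selected, "--filterExpression", expr,
                                                   "--filterName", f"{kind}_hard_filter", "-o", flt]))
        filtered.append(flt)
    combine = GATK + ["-T", "CombineVariants", "-R", REF, "-genotypeMergeOptions", "UNSORTED"]
    for vcf in filtered:
        combine += ["--variant", vcf]
    steps.append(Step("combine", combine + ["-o", f"{sample}.final.vcf"]))
    return steps


if __name__ == "__main__":
    for step in build_steps("sample01", ["L001", "L002"]):
        redirect = f" > {step.stdout}" if step.stdout else ""
        print(f"{step.name}: {shlex.join(step.argv)}{redirect}")
```

Keeping the steps as plain argument lists makes the order of the workflow easy to inspect and test before anything is run. Quoting with shlex.join keeps the filter expressions and read-group strings intact when the commands are printed for a shell.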
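The read-cleaning step is not described in detail above. As a hedged illustration only, the sketch below shows one common approach. It cuts each read at an exact match to a known adaptor sequence, then drops reads that are too short or have too many low-quality bases. The adaptor string, the Phred+33 encoding and the thresholds (Q20, at most 50% low-quality bases, minimum length 30) are assumptions, not Rhea_Chip's actual settings.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical adaptor (common Illumina TruSeq prefix); replace with the kit's real sequence.
ADAPTOR = "AGATCGGAAGAGC"

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, seq, qual) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def trim_adaptor(seq: str, qual: str, adaptor: str = ADAPTOR) -> Tuple[str, str]:
    """Cut the read at the first exact adaptor occurrence (no mismatches, no partial matches)."""
    pos = seq.find(adaptor)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def passes_quality(qual: str, min_q: int = 20, max_low_frac: float = 0.5, offset: int = 33) -> bool:
    """True if at most max_low_frac of the bases have Phred quality below min_q."""
    if not qual:
        return False
    low = sum(1 for c in qual if ord(c) - offset < min_q)
    return low / len(qual) <= max_low_frac


def clean_reads(reads: Iterable[Read], min_len: int = 30) -> Iterator[Read]:
    """Trim adaptors, then keep reads that are long enough and of acceptable quality."""
    for header, seq, qual in reads:
        seq, qual = trim_adaptor(seq, qual)
        if len(seq) >= min_len and passes_quality(qual):
            yield header, seq, qual
```

Real data usually also needs mismatch-tolerant and partial (3'-end) adaptor matching, which a dedicated trimming tool handles better than this exact-match sketch.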
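Duplicate marking groups reads that look like copies of the same original fragment and keeps one representative. The sketch below is a deliberately simplified, single-end illustration of that idea, not Picard's implementation. Reads are keyed by (chromosome, strand, unclipped 5' position). Within each group, every read except the one with the highest sum of base qualities is flagged. The Read dataclass and its fields are hypothetical. Picard MarkDuplicates additionally handles read pairs, clipping and optical duplicates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    strand: str      # '+' or '-'
    five_prime: int  # unclipped 5' coordinate (alignment start for '+', end for '-')
    qual_sum: int    # sum of base qualities, used to choose the representative


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return the names of reads flagged as duplicates."""
    groups: Dict[Tuple[str, str, int], List[Read]] = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.strand, read.five_prime)].append(read)

    duplicates: Set[str] = set()
    for group in groups.values():
        # Keep the highest-quality read; ties are broken by name so the result is deterministic.
        best = max(group, key=lambda r: (r.qual_sum, r.name))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

Duplicates are only flagged, not removed, which matches the "marked" wording of the workflow: downstream tools can then choose to ignore them.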
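Base quality score recalibration rests on a simple idea. Within each bin of covariates (for example reported quality and machine cycle), it compares how often bases actually mismatch the reference at sites not expected to vary. The sketch below shows only that core arithmetic, Q = -10·log10(error rate). The +1/+2 pseudo-count is a choice made here to avoid log(0), and the (reported quality, cycle) binning is a simplified assumption. It is not GATK's BaseRecalibrator model, which uses additional covariates and a more elaborate statistical treatment.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Covariates = Tuple[int, int]  # (reported quality, machine cycle)


def empirical_quality(mismatches: int, observations: int) -> float:
    """Phred-scaled empirical quality with a +1/+2 pseudo-count (assumption, avoids log(0))."""
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


def recalibration_table(bases: Iterable[Tuple[int, int, bool]]) -> Dict[Covariates, float]:
    """bases: (reported_q, cycle, is_mismatch) for each aligned base at non-variant sites."""
    counts: Dict[Covariates, List[int]] = defaultdict(lambda: [0, 0])  # [mismatches, observations]
    for reported_q, cycle, is_mismatch in bases:
        bin_counts = counts[(reported_q, cycle)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    return {key: empirical_quality(mm, obs) for key, (mm, obs) in counts.items()}
```

For example, a bin of bases reported as Q30 that shows 9 mismatches in 998 observations has an empirical error rate of 10/1000 = 1%. It is therefore recalibrated to Q20.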
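VariantFiltration evaluates user-supplied expressions against each VCF record. It writes the names of the filters that match into the FILTER column and marks other records as passing. The Python sketch below mimics that behaviour for a single VCF data line. The filter names are hypothetical, and the SNV thresholds (QD < 2.0, FS > 60.0, MQ < 40.0) are commonly cited GATK hard-filtering values used here as assumptions, not the values configured in Rhea_Chip.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical filter names; thresholds follow commonly cited GATK SNV hard filters (assumption).
# A missing annotation never triggers a filter in this sketch.
SNV_FILTERS: List[Tuple[str, Callable[[Dict[str, float]], bool]]] = [
    ("LowQD", lambda info: info.get("QD", float("inf")) < 2.0),
    ("StrandBias", lambda info: info.get("FS", 0.0) > 60.0),
    ("LowMQ", lambda info: info.get("MQ", float("inf")) < 40.0),
]


def parse_info(info_field: str) -> Dict[str, float]:
    """Keep numeric KEY=VALUE pairs from the INFO column; flags and non-numeric values are skipped."""
    values: Dict[str, float] = {}
    for entry in info_field.split(";"):
        key, sep, raw = entry.partition("=")
        if not sep:
            continue  # flag such as "DB"
        try:
            values[key] = float(raw)
        except ValueError:
            continue  # e.g. comma-separated per-allele values
    return values


def apply_filters(vcf_line: str, filters=SNV_FILTERS) -> str:
    """Return the data line with the 7th column (FILTER) set to PASS or the failing filter names."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # 8th column is INFO
    failed = [name for name, test in filters if test(info)]
    cols[6] = ";".join(failed) if failed else "PASS"
    return "\t".join(cols)
```

Like VariantFiltration, this marks records rather than deleting them, so failing variants remain visible in the output VCF.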

Refer to [RheaChip/install.py] for installation instructions.

About

A simple pipeline for the analysis of human data generated by NGS approaches.

Languages

  • Python 100.0%