Filter annovar annotated variant list.

RubiscoHQ/VarFilter

VarFilter

VarFilter is a tool for filtering ANNOVAR-annotated variant files. ANNOVAR is a wonderful tool that annotates genome-wide variants. While filtering, you can specify rules and apply different disease models. Author: Hanqing Liu, Zhejiang University (liuhanqing93@gmail.com).

Command

-I --input <file_route>

The file containing the variants. If you only have a VCF file, annotate it with ANNOVAR or wANNOVAR (a web interface to ANNOVAR). See test/test.annovar.txt for an example; it was generated by wANNOVAR and carries the suffix ".annovar".

-SI -sample_info <file_route>

The file containing the samples' information. It should contain at least the columns FamilyID, SampleID, Gender, Type, Father, and Mother. See test/sample_info.txt for an example.

-S -sample <str>

Filter variants by sample ID. For multiple samples, separate the IDs with spaces. A variant is kept even if only one of the given samples carries it.

-G -gene <str>

Filter variants by gene name. For multiple genes, separate the names with spaces.

-R -region <str num num>

Filter variants by genomic region, in the format "chromosome start end", for example "chr2 1000 100000". For multiple regions, separate them with spaces.
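As a rough illustration of the "chromosome start end" format, the sketch below groups a flat list of tokens into regions and tests whether a position falls inside one of them. The function names are hypothetical and this is not VarFilter's own code; it also assumes that both the start and the end coordinate are inclusive, which the text above does not state.

```python
def parse_regions(tokens):
    """Group a flat token list like ['chr2', '1000', '100000'] into (chrom, start, end) tuples."""
    if len(tokens) % 3 != 0:
        raise ValueError("each region needs three values: chromosome start end")
    return [
        (tokens[i], int(tokens[i + 1]), int(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


def in_region(chrom, pos, regions):
    """Return True if (chrom, pos) lies inside any region (bounds assumed inclusive)."""
    return any(chrom == c and start <= pos <= end for c, start, end in regions)


regions = parse_regions(["chr1", "215800000", "216200000", "chr2", "1000", "100000"])
print(in_region("chr1", 216000000, regions))
```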

-CF -column_filter <str str str bool>

Filter variants using the information in one column of the input file, in the format "column_name logic query_value na_remain", for example "SIFT_score '>' 0.5 T". To filter on several columns, specify this flag once per column.

  • column_name: the name of the column, as it appears in your input file.
  • logic: one of ['>', '<', '=', '!=', '>=', '<='] (remember to add quotes) when the query value is a number, or one of [in, !in, include, !include, is, !is] when the query value is a string.
  • query_value: the value to compare against.
  • na_remain: 'T' to keep NA values (information not provided in the input file, usually left blank or '.'), or 'F' to exclude them.
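The four fields above can be read as a small rule evaluated against one cell of the input file. The following is a minimal sketch of that idea, not VarFilter's actual implementation: `column_filter` is a hypothetical helper, it covers only the numeric operators plus `is` and `!is`, and it assumes `is` means exact string equality.

```python
import operator

# Numeric comparison operators listed for the "logic" field.
NUMERIC_OPS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
}


def column_filter(cell, logic, query_value, na_remain):
    """Return True if a single cell passes the rule (hypothetical helper)."""
    # Missing annotations are usually blank or '.'; na_remain decides whether to keep them.
    if cell is None or cell.strip() in ("", "."):
        return na_remain == "T"
    if logic in NUMERIC_OPS:
        return NUMERIC_OPS[logic](float(cell), float(query_value))
    # Assumption: 'is' / '!is' compare the whole string for equality.
    if logic == "is":
        return cell == query_value
    if logic == "!is":
        return cell != query_value
    raise ValueError(f"logic not covered by this sketch: {logic}")


print(column_filter("0.7", ">", "0.5", "T"))
```

The remaining string operators (in, !in, include, !include) are left out because their exact semantics are not described here.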

-TL -total_logic

If you specify -CF more than once, you should also state the overall logic used to combine the column filters. It should be one of ['ALL_TRUE', 'NOT_ALL_TRUE', 'ALL_FALSE', 'NOT_ALL_FALSE', 'N_TRUE', 'N_FALSE'], where N is the number of true/false columns.
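One way to picture the overall logic is as a function that takes the per-filter results for a variant and returns a single decision. The sketch below is illustrative only: `total_logic` is a hypothetical name, and it assumes that 'N_TRUE' and 'N_FALSE' mean exactly N filters are true or false, which is one possible reading of the description.

```python
def total_logic(results, rule):
    """Combine per-filter booleans under a rule such as 'ALL_TRUE' or '2_TRUE'."""
    n_true = sum(results)
    n_false = len(results) - n_true
    if rule == "ALL_TRUE":
        return n_false == 0
    if rule == "NOT_ALL_TRUE":
        return n_false > 0
    if rule == "ALL_FALSE":
        return n_true == 0
    if rule == "NOT_ALL_FALSE":
        return n_true > 0
    # Assumption: 'N_TRUE' / 'N_FALSE' require exactly N true / false filters.
    count, _, kind = rule.partition("_")
    if count.isdigit() and kind == "TRUE":
        return n_true == int(count)
    if count.isdigit() and kind == "FALSE":
        return n_false == int(count)
    raise ValueError(f"unknown rule: {rule}")


print(total_logic([True, False, False], "1_TRUE"))
```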

-M -model

Apply Mendelian inheritance models to your input file, using the sample information provided by the sample file. The value should be one or more of [Dom, ResHom, ResComp]; for multiple models, separate them with spaces. Within each family:

  • 'Dom': Dominant. All patients carry the variant and no healthy individual does.
  • 'ResHom': Recessive Homozygote. All patients are homozygous for the variant and no healthy individual is.
  • 'ResComp': Recessive Compound Heterozygote. For at least two variants in the same gene, all patients are heterozygous for each of them (but not homozygous, which is covered by ResHom), and healthy individuals are not.
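The Dom and ResHom rules can be sketched for a single variant in a single family as follows. This is an illustration under stated assumptions rather than VarFilter's code: the function names are hypothetical, genotypes are encoded as the number of alternate alleles (0, 1 or 2), and disease status is passed as a plain boolean per sample. ResComp is omitted because it needs to look at pairs of variants within a gene.

```python
def fits_dominant(genotypes, affected):
    """Dom: every patient carries the variant and no healthy individual does."""
    return all((genotypes[s] > 0) == affected[s] for s in genotypes)


def fits_recessive_hom(genotypes, affected):
    """ResHom: every patient is homozygous and no healthy individual is."""
    return all((genotypes[s] == 2) == affected[s] for s in genotypes)


# Hypothetical trio: both parents are unaffected carriers, the child is affected.
genotypes = {"father": 1, "mother": 1, "child": 2}
affected = {"father": False, "mother": False, "child": True}
print(fits_dominant(genotypes, affected), fits_recessive_hom(genotypes, affected))
```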

-O -output

Specify the filename for the filter result.

Examples

  • Load input file, sample file, output to a file.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result

  • Filter by genes, samples, regions.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -S 1 2 3 -G USH2A -R chr1 215800000 216200000

  • Use two column filters: one selects variants whose allele frequency in 1000G_ALL (1000 Genomes, all populations) is no larger than 0.10, and the other selects variants whose Polyphen2 prediction is not B (Benign). The overall logic is ALL_TRUE, which means both filters must evaluate to TRUE.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -CF 1000G_ALL '<=' 0.10 F -CF Polyphen2_HDIV_pred '!is' B F -TL ALL_TRUE

  • Select samples from the same family and apply the Dominant model to them.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -S 1 2 3 -G USH2A -M Dom
