VarFilter

VarFilter is a tool for filtering variant files annotated by Annovar, a tool that annotates genome-wide variants. You can specify filtering rules and apply different disease models while filtering. Author: Hanqing Liu, Zhejiang University (liuhanqing93@gmail.com).

Command

-I --input <file_route>

The file containing the variants. If you only have a VCF file, annotate it first with Annovar or wannovar (a web interface to Annovar). See test/test.annovar.txt for an example; it was generated by wannovar and carries the suffix ".annovar".

-SI -sample_info <file_route>

The file containing the sample information. It should contain at least the columns FamilyID, SampleID, Gender, Type, Father and Mother. See test/sample_info.txt for an example.
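As a rough illustration of the column requirement, the sketch below checks the header of a sample information table before filtering. It assumes a tab-separated file with a header row; the delimiter, the function name `missing_sample_columns` and the example header are assumptions made for this illustration and are not taken from VarFilter's code.

```python
import csv
import io

# Columns the sample information file should contain at least
REQUIRED_COLUMNS = ["FamilyID", "SampleID", "Gender", "Type", "Father", "Mother"]

def missing_sample_columns(handle, delimiter="\t"):
    """Return the required columns that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # empty file -> every column is missing
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]

# Hypothetical header, made up for this example
example = io.StringIO("FamilyID\tSampleID\tGender\tType\n")
print(missing_sample_columns(example))  # ['Father', 'Mother']
```

With a real file you would pass an open file handle instead of the in-memory example.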

-S -sample <str>

Filter variants by the given sample IDs; separate multiple samples with spaces. A variant is kept even if only one of the given samples carries it.

-G -gene <str>

Filter variants by the given gene names; separate multiple genes with spaces.

-R -region <str num num>

Filter variants by the given region, in the format "chromosome start end", for example "chr2 1000 100000". Separate multiple regions with spaces.
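The region format can be sketched as follows: a flat list of space-separated tokens is grouped into "chromosome start end" triples, and a variant position is then tested against them. This is only an illustration; the helper names `parse_regions` and `in_any_region` are hypothetical, and treating both region ends as inclusive is an assumption, since the behaviour of VarFilter at the boundaries is not stated here.

```python
def parse_regions(tokens):
    """Group a flat token list into (chromosome, start, end) tuples."""
    if len(tokens) % 3 != 0:
        raise ValueError("regions must be given as 'chromosome start end' triples")
    regions = []
    for i in range(0, len(tokens), 3):
        chrom, start, end = tokens[i], int(tokens[i + 1]), int(tokens[i + 2])
        if start > end:
            raise ValueError(f"start is greater than end in region {chrom} {start} {end}")
        regions.append((chrom, start, end))
    return regions

def in_any_region(chrom, pos, regions):
    """True if the position lies in at least one region (inclusive ends: an assumption)."""
    return any(c == chrom and s <= pos <= e for c, s, e in regions)

regions = parse_regions(["chr1", "215800000", "216200000", "chr2", "1000", "100000"])
print(in_any_region("chr1", 216000000, regions))  # True
print(in_any_region("chr2", 500, regions))        # False
```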

-CF -column_filter <str str str bool>

Filter variants by the content of one column of the input file, in the format "column_name logic query_value na_remain", for example "SIFT_score '>' 0.5 T". To apply several column filters, specify this flag multiple times.

column_name: the name of a column in your input file.
logic: one of ['>', '<', '=', '!=', '>=', '<='] (remember to add quotes) when the query value is a number, or one of [in, !in, include, !include, is, !is] when the query value is a string.
query_value: the value to compare against.
na_remain: 'T' to keep NA values (information not provided in the input file, usually left blank or given as '.'), or 'F' to exclude them.
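How the four fields interact can be sketched for the numeric operators. The function below is a minimal illustration, not VarFilter's implementation: the name `numeric_column_filter` is hypothetical, and the string operators (in, !in, include, !include, is, !is) are left out because their exact semantics are not described here.

```python
import operator

# Numeric comparison operators listed for the "logic" field
NUMERIC_OPS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
}
NA_VALUES = {"", "."}  # blank or '.', as described for na_remain

def numeric_column_filter(value, logic, query_value, na_remain):
    """Decide whether one cell passes a numeric column filter."""
    if value is None or value.strip() in NA_VALUES:
        return na_remain == "T"  # keep NA cells only when na_remain is 'T'
    return NUMERIC_OPS[logic](float(value), float(query_value))

print(numeric_column_filter("0.7", ">", "0.5", "T"))  # True
print(numeric_column_filter(".", ">", "0.5", "T"))    # True (NA kept)
print(numeric_column_filter(".", ">", "0.5", "F"))    # False (NA excluded)
```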

-TL -total_logic

If you specify -CF more than once, you should state the overall logic used to combine the column filters. It should be one of ['ALL_TRUE', 'NOT_ALL_TRUE', 'ALL_FALSE', 'NOT_ALL_FALSE', 'N_TRUE', 'N_FALSE'], where N is the number of true/false columns.
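One way to read these options is as rules over the list of per-filter results for a variant. The sketch below is written under two assumptions that are not confirmed here: that N is spelled as a literal number (for example "2_TRUE"), and that it means exactly N filters, not at least N. The function name `combine_filters` is hypothetical.

```python
import re

def combine_filters(results, total_logic):
    """Combine per-filter booleans for one variant into a single decision."""
    results = list(results)
    n_true = sum(1 for r in results if r)
    n_false = len(results) - n_true
    if total_logic == "ALL_TRUE":
        return n_false == 0
    if total_logic == "NOT_ALL_TRUE":
        return n_false > 0
    if total_logic == "ALL_FALSE":
        return n_true == 0
    if total_logic == "NOT_ALL_FALSE":
        return n_true > 0
    # Assumed spelling for N_TRUE / N_FALSE: a number, an underscore, TRUE or FALSE
    match = re.fullmatch(r"(\d+)_(TRUE|FALSE)", total_logic)
    if match is None:
        raise ValueError(f"unknown total logic: {total_logic}")
    count = int(match.group(1))
    observed = n_true if match.group(2) == "TRUE" else n_false
    return observed == count  # "exactly N" is an assumption

print(combine_filters([True, True], "ALL_TRUE"))       # True
print(combine_filters([True, False], "NOT_ALL_TRUE"))  # True
print(combine_filters([True, False, True], "2_TRUE"))  # True
```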

-M -model

Apply Mendelian inheritance models to your input file, using the sample information provided by the sample file. Should be one or more of [Dom, ResHom, ResComp]; separate multiple models with spaces. For every family:

'Dom': Dominant. All patients carry the variant and no healthy person does.
'ResHom': Recessive Homozygote. All patients are homozygous for the variant and no healthy person is.
'ResComp': Recessive Compound Heterozygote. For at least two variants in one gene, all patients are heterozygous for both (but not homozygous, which is covered by ResHom) and no healthy person is.
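The first two rules can be sketched per variant and per family as below. This is an illustration only: representing genotypes as the number of alternate alleles (0, 1 or 2) and disease status as a boolean is an assumption made for the example, the function names are hypothetical, and ResComp is omitted because it needs to look at several variants of one gene together.

```python
def fits_dominant(alt_counts, affected):
    """Dom: every affected sample carries the variant, no unaffected sample does."""
    return all((alt_counts[s] > 0) == affected[s] for s in alt_counts)

def fits_recessive_hom(alt_counts, affected):
    """ResHom: every affected sample is homozygous, no unaffected sample is."""
    return all((alt_counts[s] == 2) == affected[s] for s in alt_counts)

# Hypothetical family: affected father and child, healthy mother
affected = {"father": True, "mother": False, "child": True}
alt_counts = {"father": 1, "mother": 0, "child": 1}  # alternate allele counts

print(fits_dominant(alt_counts, affected))       # True
print(fits_recessive_hom(alt_counts, affected))  # False
```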

-O -output

Specify the filename of the filter result.

Examples

Load the input file and the sample file, and write the result to a file.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result

Filter by genes, samples, regions.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -S 1 2 3 -G USH2A -R chr1 215800000 216200000

Use two column filters: one selects variants whose allele frequency in 1000G_ALL (1000 Genomes, all populations) is at most 0.10, the other selects variants whose Polyphen2 prediction is not B (benign). The overall logic is ALL_TRUE, which means both filters must evaluate to TRUE.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -CF 1000G_ALL '<=' 0.10 F -CF Polyphen2_HDIV_pred '!is' B F -TL ALL_TRUE

Select samples from the same family and apply the Dominant model to them.

python main.py -I ./test/test.annovar -SI ./test/sample_info.txt -O filter_result -S 1 2 3 -G USH2A -M Dom
