
Description of the WGS data processing scripts in this repository

genotype_calling/

post_processing.sh : Post-processes VCF files generated by the bcftools calling pipeline: clean-up and filtering

call.sh : Runs standard QC steps on genotype data: sex check, IBD, PCA, missing genotype rate, mendel error rate.

run_HLA_typing_HLA-LA.sh : Runs HLA-LA on a list of bam files

variant_processing/

merge_excl_both.sh : Merges 2 plink datasets, removing SNPs that have discordant alleles from both datasets

liftover.sh : Runs a liftover/liftdown for a plink genotype dataset

exclude_duplicate_pos_in_bim_missingness.py : Finds SNPs at duplicate positions in a plink genotype fileset and, for each duplicate, prints the SNPs with the higher missing rate so they can be excluded (part of the liftover.sh pipeline)

update_pos_after_liftover.py : Updates coordinates in bim file after liftover (part of the liftover.sh pipeline)

fixVcfAlleles.py : Aligns alleles in a VCF file to a reference FASTA file, swapping them if necessary

extract_bed_from_plink.sh : Filters a plink dataset, removing SNPs within intervals given as a 0-based BED file

correctForBatchEffect.R : Removes SNPs with a low chi-squared p-value in a test for differences in genotype counts between the Mallick and Pagani datasets

convert_table_to_fam.py : Creates a plink fam file from a tab-separated sample description table.

convertBedToGds.R : Converts a plink dataset to GDS format

convertGdsToBed.R : Converts a GDS file to plink format

plink2treemix.py : Converts a plink dataset to TreeMix format

prune.sh : Prunes a plink dataset
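
extract_bed_from_plink.sh takes its intervals as a 0-based BED file, whereas positions in a plink .bim file are 1-based, so the two coordinate systems have to be reconciled before filtering. The Python sketch below illustrates only that conversion step; it is not the script's actual implementation, the function names (load_bed, in_bed, snps_to_exclude) are hypothetical, and it assumes chromosome names are written the same way in both files (e.g. "1" rather than "chr1"). A SNP at 1-based position p falls inside the half-open BED interval [start, end) exactly when start < p <= end.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into merged, sorted intervals per chromosome.

    BED intervals are 0-based and half-open: [start, end).
    Returns {chrom: (starts, ends)} with non-overlapping intervals.
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_bed(merged, chrom, pos):
    """Return True if the 1-based position `pos` lies inside a BED interval."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    zero_based = pos - 1  # convert the plink (1-based) coordinate to BED (0-based)
    i = bisect_right(starts, zero_based) - 1  # last interval starting at or before it
    return i >= 0 and zero_based < ends[i]


def snps_to_exclude(bim_lines, merged):
    """Yield IDs of .bim SNPs (columns: chrom, id, cM, bp, a1, a2) inside the intervals."""
    for line in bim_lines:
        chrom, snp_id, _cm, bp = line.split()[:4]
        if in_bed(merged, chrom, int(bp)):
            yield snp_id
```

With an interval `1 100 200`, positions 101 to 200 are excluded and position 100 is kept. The resulting IDs could be written one per line and passed to plink's `--exclude` option.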

variant_QC/

run_genotype_QC.sh : Runs standard QC steps on genotype data: sex check, IBD, PCA, missing genotype rate, mendel error rate.

plot_all_QC.R : Plots the QC metrics obtained by the run_genotype_QC.sh script

parse_bcftools_stats.py : Gathers QC statistics from per-population bcftools stats files and combines the results into one table

getSexChrCounts.py : Counts the numbers of alleles on chrX and chrY for each sample
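
parse_bcftools_stats.py combines per-population bcftools stats output into a single table. A minimal Python sketch of that idea is shown below, assuming only the SN ("summary numbers") lines are collected; which metrics the actual script keeps is not described here, and the function names are hypothetical. bcftools stats writes each SN line as tab-separated fields: the tag `SN`, a block ID, a key such as `number of SNPs:` and its value.

```python
import csv
import sys


def parse_sn_section(lines):
    """Return {metric: count} from the 'SN' lines of one bcftools stats file."""
    summary = {}
    for line in lines:
        if line.startswith("SN\t"):
            # Fields: "SN", block id, "<metric>:", value
            fields = line.rstrip("\n").split("\t")
            summary[fields[2].rstrip(":")] = int(fields[3])
    return summary


def combine_stats(stats_by_population):
    """Build a table (list of rows): one row per population, one column per metric."""
    parsed = {pop: parse_sn_section(lines) for pop, lines in stats_by_population.items()}
    metrics = sorted({metric for summary in parsed.values() for metric in summary})
    rows = [["population"] + metrics]
    for pop, summary in parsed.items():
        # Metrics absent from a population's file are reported as NA.
        rows.append([pop] + [summary.get(metric, "NA") for metric in metrics])
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python sketch.py POP1=pop1.stats POP2=pop2.stats > qc.tsv
    stats = {}
    for arg in sys.argv[1:]:
        pop, path = arg.split("=", 1)
        with open(path) as handle:
            stats[pop] = handle.readlines()
    csv.writer(sys.stdout, delimiter="\t", lineterminator="\n").writerows(combine_stats(stats))
```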

variant_annotation_and_AFs/

get_rs_by_positions.py : Annotates a text table of SNPs given as chr:pos with their rs IDs

get_positions_by_rs.py : Gets SNP chr:pos coordinates for given rs IDs

get_AF_AC_per_population.py : Counts the allele number (AN) and allele counts (AC) for each SNP in each population group.

parse_combine_DBs.py : Downloads the GWAS Catalog, (HGMD) and ClinVar, reformats them into a unified format and merges annotations for the same SNPs

parseVEP.py : Parses the output of Ensembl VEP and writes the result as a table

findInterestingVEP2_withAFs.py : Parses Ensembl VEP annotation; an older version of the parseVEP.py script. Use parseVEP.py instead.

calculate_overlap_and_maf.py : Calculates the number of overlapping SNPs in 3 datasets
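
get_AF_AC_per_population.py reports allele number (AN) and allele counts (AC) per population. The Python sketch below shows the underlying arithmetic for a single site: AN counts called (non-missing) alleles, AC counts alternate alleles, and AF = AC / AN. It assumes biallelic sites, VCF-style GT strings and a sample-to-population mapping; the function name and input format are hypothetical, not the script's actual interface.

```python
import math
import re
from collections import defaultdict


def allele_stats_by_population(genotypes, sample_pop):
    """Return {population: (AC, AN, AF)} for one biallelic site.

    genotypes:  {sample_id: GT string such as "0/1", "1|1", "./." or haploid "1"}
    sample_pop: {sample_id: population label}
    """
    ac = defaultdict(int)  # alternate-allele count per population
    an = defaultdict(int)  # number of called (non-missing) alleles per population
    for sample, gt in genotypes.items():
        pop = sample_pop.get(sample)
        if pop is None:
            continue  # sample without a population label is skipped
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing allele: not counted in AN
            an[pop] += 1
            if allele != "0":
                ac[pop] += 1
    # AF is NaN for populations with no called alleles at this site.
    return {
        pop: (ac[pop], an[pop], ac[pop] / an[pop] if an[pop] else math.nan)
        for pop in sorted(set(sample_pop.values()))
    }
```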

population_genetics/

run_ibd.sh : Runs Beagle IBD sharing analysis

rename_samples_by_regions.py : Renames samples from their original sample IDs to "Source_Population_index"

plot_admixture_results_ivan.R : Plots ADMIXTURE results in R

plot_PCA.R : Makes various PCA plots

plot_PCA_rus-FU.R : Plots Russians together with Uralic populations on the PC plane

corplots.R : Makes corplots of D statistics for Russians vs Uralic populations

corplots_all_FU.R : Makes D-statistics corplots for Uralic vs Uralic populations

plot_ibd.R : Plots IBD results

plot_finestructure.R : Modified version of the plotting function provided with fineSTRUCTURE

param_dstats.TEMPLATE.txt : Template parameter file for D statistics

make_all_possible_triplets.py : Makes all triplets from an .ind file

make_all_possible_quadruples.py : Makes all quadruples from an .ind file
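
make_all_possible_triplets.py and make_all_possible_quadruples.py build population tuples from an .ind file. A minimal Python sketch is below, assuming the EIGENSTRAT .ind layout (sample ID, sex, population label) and that "all possible" means unordered combinations of distinct populations, written one tuple per line; whether the actual scripts use combinations or ordered tuples, or filter any labels, is not stated here, and the function names are hypothetical.

```python
import sys
from itertools import combinations


def read_populations(ind_lines):
    """Return population labels (3rd column of an .ind file) in first-seen order."""
    populations = []
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2] not in populations:
            populations.append(fields[2])
    return populations


def all_tuples(populations, k):
    """All unordered k-tuples of distinct populations (k=3: triplets, k=4: quadruples)."""
    return list(combinations(populations, k))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python sketch.py samples.ind 4 > quadruples.txt
    with open(sys.argv[1]) as handle:
        pops = read_populations(handle)
    for combo in all_tuples(pops, int(sys.argv[2])):
        print(" ".join(combo))
```

`itertools.combinations` yields each set of populations once, in input order. Statistics whose value depends on how the populations are arranged, such as the pairing in a D statistic, may need `itertools.permutations` or an explicit ordering rule instead.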

logs/

merge_GR+papers.log.sh : Merges GR genotypes with the Mallick and Pagani datasets and applies basic filtering

DashaZhernakova/GR_scripts

Folders and files

Latest commit

History

Repository files navigation

Description of the WGS data processing scripts in this repository

About

Resources

Stars

Watchers

Forks

Languages