DashaZhernakova/GR_scripts

Description of the WGS data processing scripts in this repository

genotype_calling/

post_processing.sh : Post-processes VCF files generated by the bcftools calling pipeline: clean-up and filtering

call.sh : Runs standard QC steps on genotype data: sex check, IBD, PCA, missing genotype rate, mendel error rate.

run_HLA_typing_HLA-LA.sh : Runs HLA-LA on a list of bam files

variant_processing/

merge_excl_both.sh : Merges two plink datasets, removing SNPs with discordant alleles from both datasets

liftover.sh : Runs a liftover/liftdown for a plink genotype dataset

exclude_duplicate_pos_in_bim_missingness.py : Finds duplicate SNPs in a plink genotype fileset and prints SNPs with higher missing rate to be excluded (as part of the liftover.sh pipeline)
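The duplicate-position rule described above can be sketched as follows. This is a minimal illustration, not the script itself: it assumes the bim rows and per-SNP missing rates (for example, from the `F_MISS` column of a plink `.lmiss` file) have already been parsed, that SNP ids are unique, and that the SNP with the lowest missing rate at each position is the one to keep. The function name `snps_to_exclude` is hypothetical.

```python
from collections import defaultdict

def snps_to_exclude(bim_rows, f_miss):
    """Return ids of duplicate-position SNPs to drop.

    bim_rows: iterable of (chrom, snp_id, pos) taken from a .bim file.
    f_miss:   dict snp_id -> missing genotype rate (assumed pre-computed).
    """
    by_pos = defaultdict(list)
    for chrom, snp_id, pos in bim_rows:
        by_pos[(chrom, pos)].append(snp_id)

    exclude = []
    for ids in by_pos.values():
        if len(ids) > 1:
            # Keep the SNP with the lowest missing rate (first one on ties);
            # SNPs without a known rate are treated as fully missing.
            keep = min(ids, key=lambda s: f_miss.get(s, 1.0))
            exclude.extend(s for s in ids if s != keep)
    return exclude

rows = [("1", "rs1", 100), ("1", "rs2", 100), ("1", "rs3", 200)]
miss = {"rs1": 0.01, "rs2": 0.05, "rs3": 0.0}
to_drop = snps_to_exclude(rows, miss)
```

The resulting id list could then be passed to plink's `--exclude` option.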

update_pos_after_liftover.py : Updates coordinates in bim file after liftover (part of the liftover.sh pipeline)

fixVcfAlleles.py : Aligns alleles in a VCF file according to a reference fasta file, swaps them if necessary
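As an illustration of the allele alignment idea, the sketch below handles a single biallelic SNP. It assumes the reference base at the variant position has already been looked up in the fasta file, and that swapping REF and ALT also requires flipping the genotype codes. The function name `align_to_reference` is hypothetical, and the actual script may treat edge cases differently.

```python
def align_to_reference(ref_base, ref, alt, genotypes):
    """Align a biallelic SNP to the reference base.

    Returns (ref, alt, genotypes), swapped if ALT matches the reference,
    or None if neither allele matches the reference base.
    """
    ref_base = ref_base.upper()
    if ref.upper() == ref_base:
        return ref, alt, genotypes  # already aligned
    if alt.upper() == ref_base:
        # Swap REF/ALT and flip 0 <-> 1 in each genotype string;
        # separators ("/", "|") and missing calls (".") are left as-is.
        swap = {"0": "1", "1": "0"}
        flipped = ["".join(swap.get(ch, ch) for ch in gt) for gt in genotypes]
        return alt, ref, flipped
    return None  # neither allele matches; caller decides how to handle it

aligned = align_to_reference("A", "G", "A", ["0/0", "0/1", "1|1", "./."])
```

Note that after a swap, allele-dependent INFO fields such as AC and AF no longer describe the new ALT allele and would need to be recomputed.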

extract_bed_from_plink.sh : Filters a plink dataset, removing SNPs that fall within intervals given as a 0-based bed file
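The coordinate conventions matter here: bim positions are 1-based, while standard BED intervals are 0-based and half-open, so a SNP at 1-based position `p` lies inside a BED interval `(start, end)` when `start < p <= end`. The sketch below shows that check. It assumes the bed file follows the standard half-open convention, and the function names are hypothetical.

```python
def in_bed_interval(pos_1based, start, end):
    """True if a 1-based position lies in a 0-based half-open BED interval."""
    return start < pos_1based <= end

def snps_outside_intervals(snps, intervals):
    """Return ids of SNPs that do not fall inside any interval.

    snps:      iterable of (chrom, snp_id, pos) with 1-based positions.
    intervals: list of (chrom, start, end) in BED coordinates.
    """
    kept = []
    for chrom, snp_id, pos in snps:
        hit = any(c == chrom and in_bed_interval(pos, s, e)
                  for c, s, e in intervals)
        if not hit:
            kept.append(snp_id)
    return kept

# The BED interval 99-100 covers exactly one base: 1-based position 100.
intervals = [("1", 99, 100)]
snps = [("1", "a", 99), ("1", "b", 100), ("1", "c", 101), ("2", "d", 100)]
kept_ids = snps_outside_intervals(snps, intervals)
```

The linear scan over intervals is fine for a sketch; for large interval sets, sorting the intervals per chromosome and using binary search would be a more practical choice.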

correctForBatchEffect.R : Removes SNPs that have a low chi-squared p-value for testing difference between genotype counts in Mallick vs Pagani datasets

convert_table_to_fam.py : Creates a plink fam file from a tab-separated sample description table.

convertBedToGds.R : Converts plink to GDS format

convertGdsToBed.R : Converts GDS to plink format

plink2treemix.py : Converts plink to treemix format

prune.sh : Prunes a plink dataset

variant_QC/

run_genotype_QC.sh : Runs standard QC steps on genotype data: sex check, IBD, PCA, missing genotype rate, mendel error rate.

plot_all_QC.R : Plots QC metrics obtained by run_genotype_QC.sh script

parse_bcftools_stats.py : Gathers QC statistics from per-population bcftools stats files and combines the QC results into one table

getSexChrCounts.py : Counts the numbers of alleles on chrX and chrY for each sample

variant_annotation_and_AFs/

get_rs_by_positions.py : Annotates a text table with SNPs as chr:pos with rs ids

get_positions_by_rs.py : Gets SNP chr:position by rs id

get_AF_AC_per_population.py : Computes the allele number (AN) and allele count (AC) for each SNP in each population group.
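For one biallelic SNP, per-population counting could look like the sketch below. It assumes genotypes are VCF-style strings (`0/1`, `1|1`, `./.`) keyed by sample id and that a sample-to-population mapping is available. The allele number is the count of non-missing alleles, and the allele count is the number of non-reference alleles among them. The function name `count_alleles` is hypothetical.

```python
from collections import defaultdict

def count_alleles(genotypes, sample_pop):
    """Return {population: (allele_count, allele_number)} for one SNP.

    genotypes:  dict sample_id -> genotype string such as "0/1" or "./."
    sample_pop: dict sample_id -> population label
    """
    counts = defaultdict(lambda: [0, 0])  # population -> [AC, AN]
    for sample, gt in genotypes.items():
        pop = sample_pop[sample]
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles do not contribute to AN
            counts[pop][1] += 1
            if allele != "0":
                counts[pop][0] += 1
    return {pop: tuple(v) for pop, v in counts.items()}

gts = {"s1": "0/1", "s2": "1/1", "s3": "./.", "s4": "0|0"}
pops = {"s1": "A", "s2": "A", "s3": "A", "s4": "B"}
per_pop = count_alleles(gts, pops)
```

The allele frequency for a population then follows as allele count divided by allele number, provided the allele number is non-zero.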

parse_combine_DBs.py : Downloads the GWAS Catalog, (HGMD) and ClinVar, converts them to a unified format, and merges annotations for the same SNPs

parseVEP.py : Parses the output of Ensembl VEP and writes the result as a table

findInterestingVEP2_withAFs.py : Parses Ensembl VEP annotation; an older version of the parseVEP.py script. Use parseVEP.py instead.

calculate_overlap_and_maf.py : Calculates the number of overlapping SNPs in 3 datasets

population_genetics/

run_ibd.sh : Runs Beagle IBD sharing analysis

rename_samples_by_regions.py : Renames samples from original sample ids to "Source_Population_index"
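A possible reading of the "Source_Population_index" scheme is sketched below. It assumes the index is a running counter, starting at 1, within each source and population pair. That interpretation, the function name `rename_samples`, and the example labels are assumptions for illustration only.

```python
from collections import Counter

def rename_samples(samples):
    """Map original ids to "Source_Population_index" names.

    samples: list of (sample_id, source, population) in input order.
    """
    seen = Counter()
    mapping = {}
    for sample_id, source, population in samples:
        key = (source, population)
        seen[key] += 1  # running index per source/population pair
        mapping[sample_id] = f"{source}_{population}_{seen[key]}"
    return mapping

example = [("X1", "GR", "PopA"), ("X2", "GR", "PopA"), ("X3", "Mallick", "PopB")]
new_ids = rename_samples(example)
```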

plot_admixture_results_ivan.R : Plots Admixture results in R

plot_PCA.R : Makes various PCA plots

plot_PCA_rus-FU.R : Plots Russians together with Uralic populations on the PC plane

corplots.R : Makes corplots with D statistics for Russians vs Uralic

corplots_all_FU.R : Makes D statistics corplots for Uralic vs Uralic populations

plot_ibd.R : Plots IBD results

plot_finestructure.R : Modified version of the plotting function provided with fineSTRUCTURE

param_dstats.TEMPLATE.txt : Template for the parameter file for D statistics

make_all_possible_triplets.py : Makes all triplets from an .ind file

make_all_possible_quadruples.py : Makes all quadruples from an .ind file
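Enumerating triplets and quadruples can be sketched with `itertools.combinations`. The example assumes an EIGENSTRAT-style .ind file (sample id, sex, population label per line) and that the tuples are built from distinct population labels, one tuple per output line. Whether the scripts emit unordered combinations or ordered permutations is not stated here, so this sketch uses unordered combinations as an assumption. Function names are hypothetical.

```python
from itertools import combinations

def populations_from_ind(lines):
    """Collect distinct population labels (third column) in file order."""
    pops = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2] not in pops:
            pops.append(fields[2])
    return pops

def all_tuples(pops, size):
    """Return all unordered combinations of `size` populations, one per line."""
    return [" ".join(combo) for combo in combinations(pops, size)]

ind_lines = ["s1 M A", "s2 F B", "s3 U A", "s4 M C", "s5 F D"]
pops = populations_from_ind(ind_lines)
triplets = all_tuples(pops, 3)
quadruples = all_tuples(pops, 4)
```

If the order of populations within a tuple matters for the downstream statistic, `itertools.permutations` could be substituted for `combinations`.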

logs/

merge_GR+papers.log.sh : Merges GR genotypes with the Mallick and Pagani datasets and applies basic filtering
