genotype_calling/
post_processing.sh : Post-processing (clean-up and filtering) of VCF files generated by the bcftools calling pipeline
call.sh : Runs the bcftools genotype calling pipeline that produces the VCF files handled by post_processing.sh
run_HLA_typing_HLA-LA.sh : Runs HLA-LA on a list of bam files
variant_processing/
merge_excl_both.sh : Merges 2 plink datasets, removing SNPs that have discordant alleles from both datasets
liftover.sh : Runs a liftover/liftdown for a plink genotype dataset
exclude_duplicate_pos_in_bim_missingness.py : Finds duplicate SNPs in a plink genotype fileset and prints SNPs with higher missing rate to be excluded (as part of the liftover.sh pipeline)
update_pos_after_liftover.py : Updates coordinates in bim file after liftover (part of the liftover.sh pipeline)
fixVcfAlleles.py : Aligns alleles in a VCF file according to a reference fasta file, swaps them if necessary
extract_bed_from_plink.sh : Filters a plink dataset, removing SNPs within intervals given in a 0-based bed file
correctForBatchEffect.R : Removes SNPs that have a low chi-squared p-value for testing difference between genotype counts in Mallick vs Pagani datasets
convert_table_to_fam.py : Creates a plink fam file from a tab-separated sample description table.
convertBedToGds.R : Converts plink to GDS format
convertGdsToBed.R : Converts GDS to plink format
plink2treemix.py : Converts plink to treemix format
prune.sh : Prunes a plink dataset
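
The duplicate-position step listed above (exclude_duplicate_pos_in_bim_missingness.py) can be pictured with the minimal Python sketch below. It is not the script itself: the function name, the in-memory inputs and the tie-breaking rule are assumptions made for illustration. It groups SNPs by chromosome and position and, for each group, reports every SNP except the one with the lowest missing rate.

```python
from collections import defaultdict


def snps_to_exclude(bim_rows, missing_rate):
    """Return ids of duplicate-position SNPs to drop.

    bim_rows: iterable of (chrom, snp_id, pos) tuples taken from a .bim file.
    missing_rate: dict snp_id -> fraction of missing genotypes
                  (for example the F_MISS column of a plink .lmiss file).
    Both inputs are hypothetical in-memory structures, not the real
    command-line interface of the script.
    """
    by_position = defaultdict(list)
    for chrom, snp_id, pos in bim_rows:
        by_position[(chrom, pos)].append(snp_id)

    exclude = []
    for snp_ids in by_position.values():
        if len(snp_ids) < 2:
            continue  # position is unique, nothing to exclude
        # Keep the SNP with the lowest missing rate.
        # Assumption: on ties the first SNP listed in the bim file is kept.
        keep = min(snp_ids, key=lambda s: missing_rate[s])
        exclude.extend(s for s in snp_ids if s != keep)
    return exclude


# Toy data: rs1 and rs2 share a position, rs2 has more missing genotypes.
bim = [("1", "rs1", 1000), ("1", "rs2", 1000), ("1", "rs3", 2000)]
lmiss = {"rs1": 0.02, "rs2": 0.10, "rs3": 0.00}
print(snps_to_exclude(bim, lmiss))
```

The printed ids could then be passed to plink as an exclusion list.
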
variant_QC/
run_genotype_QC.sh : Runs standard QC steps on genotype data: sex check, IBD, PCA, missing genotype rate, mendel error rate.
plot_all_QC.R : Plots QC metrics obtained by run_genotype_QC.sh script
parse_bcftools_stats.py : Gathers QC statistics from per-population bcftools stats files and combines the QC results into one table
getSexChrCounts.py : Counts the numbers of alleles on chrX and chrY for each sample
variant_annotation_and_AFs/
get_rs_by_positions.py : Annotates a text table with SNPs as chr:pos with rs ids
get_positions_by_rs.py : Gets SNP chr:position by rs id
get_AF_AC_per_population.py : Counts allele number and allele counts for each SNP in each population group.
parse_combine_DBs.py : Downloads GWAS catalog, (HGMD) and ClinVar, reformats them in a unified way, merges annotations for the same SNPs
parseVEP.py : Parses the output of Ensembl VEP and writes the result as a table
findInterestingVEP2_withAFs.py : Parses Ensembl VEP annotation. This is an older version of parseVEP.py; prefer parseVEP.py.
calculate_overlap_and_maf.py : Calculates the number of overlapping SNPs in 3 datasets
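
As an illustration of what "allele number and allele counts per population group" means for get_AF_AC_per_population.py above, here is a minimal Python sketch for a single SNP. The function name and input dictionaries are hypothetical, and it assumes diploid genotypes coded as the number of alternate alleles (0, 1, 2), with None for a missing call. The real script may read its input and define the counted allele differently.

```python
from collections import defaultdict


def allele_counts_per_population(genotypes, sample_population):
    """Compute allele number (AN) and allele count (AC) per population for one SNP.

    genotypes: dict sample_id -> number of alternate alleles (0, 1 or 2),
               or None when the genotype is missing.
    sample_population: dict sample_id -> population label.
    """
    counts = defaultdict(lambda: {"AN": 0, "AC": 0})
    for sample, alt_alleles in genotypes.items():
        if alt_alleles is None:
            continue  # missing genotypes contribute to neither AN nor AC
        pop = sample_population[sample]
        counts[pop]["AN"] += 2  # assumption: diploid site, two alleles per called sample
        counts[pop]["AC"] += alt_alleles
    return dict(counts)


# Toy data: two populations, one missing genotype in population B.
genotypes = {"s1": 0, "s2": 1, "s3": 2, "s4": None}
populations = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
result = allele_counts_per_population(genotypes, populations)
print(result)
```

The allele frequency in a population is then AC / AN whenever AN is non-zero.
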
population_genetics/
run_ibd.sh : Runs Beagle IBD sharing analysis
rename_samples_by_regions.py : Renames samples from original sample ids to "Source_Population_index"
plot_admixture_results_ivan.R : Plots Admixture results in R
plot_PCA.R : Makes various PCA plots
plot_PCA_rus-FU.R : Plots Russians together with Uralic populations on the PC plane
corplots.R : Makes corplots with D statistics for Russians vs Uralic
corplots_all_FU.R : Makes D statistics corplots for Uralic vs Uralic populations
plot_ibd.R : Plots IBD results
plot_finestructure.R : Modified version of the plotting function provided with fineSTRUCTURE
param_dstats.TEMPLATE.txt : Template for parameter file for D statistics
make_all_possible_triplets.py : Makes all triplets from an .ind file
make_all_possible_quadruples.py : Makes all quadruples from an .ind file
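
The triplet and quadruple generation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the scripts themselves: it assumes the EIGENSOFT .ind layout (sample id, sex, population label, separated by whitespace), that groups are built from population labels rather than individual samples, and that unordered combinations are wanted. The function names are hypothetical.

```python
from itertools import combinations


def read_populations(ind_lines):
    """Collect unique population labels in order of first appearance.

    Assumes each line has at least three whitespace-separated fields:
    sample id, sex, population label.
    """
    seen = []
    for line in ind_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2] not in seen:
            seen.append(fields[2])
    return seen


def all_groups(populations, size):
    """Return every unordered group of `size` distinct populations."""
    return list(combinations(populations, size))


# Toy .ind content with four populations.
ind = [
    "s1 M Russian",
    "s2 F Russian",
    "s3 M Komi",
    "s4 F Mari",
    "s5 M Finnish",
]
pops = read_populations(ind)
triplets = all_groups(pops, 3)
quadruples = all_groups(pops, 4)
for group in triplets:
    print("\t".join(group))
```

If the order of populations within a group matters for the downstream D statistics run, itertools.permutations could be swapped in for combinations.
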
logs/
merge_GR+papers.log.sh : Merge GR genotypes with Mallick and Pagani, apply basic filtering