mmhe is an implementation of the moment-matching method for SNP-based heritability estimation.
For the Python scripts, you will need Python and the required packages: argparse, numpy, os, and struct (argparse, os, and struct ship with the Python standard library, so typically only numpy has to be installed separately).
For the Matlab scripts, you will need to install Matlab.
You can get mmhe by cloning this repo with
git clone https://github.com/chiayenchen/mmhe.git
or by downloading the entire repo from the GitHub website (https://github.com/chiayenchen/mmhe).
-
Python version: mmhe.py
Usage:
python ./mmhe.py --grm my_grm_prefix --pheno my.pheno --mpheno 1 --covar my.covar
You can get a description of the input files with
python ./mmhe.py --help
mmhe.py takes a pre-computed genetic relationship matrix (GRM), a phenotype file (which may contain multiple phenotypes), and a covariate file (usually containing principal components for ancestry adjustment). Specifications of these files are listed below:
- The phenotype file follows the GCTA phenotype file format (the same as the PLINK phenotype format). The first 2 columns are the family ID and individual ID of the subjects. These IDs are used to match the phenotype to the GRM, so make sure they correspond to the IDs in the genotype file used to calculate the GRM. If you have multiple phenotypes in the file, you can specify which one to use in the current analysis with --mpheno.
- The covariate file follows the GCTA --qcovar file format. The first 2 columns are the family ID and individual ID of the subjects. These IDs are used to match the covariates to the GRM, so make sure they correspond to the IDs in the genotype file used to calculate the GRM. Note that all covariates in the file will be included in the analysis, and all covariates are treated as continuous variables.
- The GRM follows the GCTA binary GRM format (PREFIX.grm.bin, PREFIX.grm.N.bin, and PREFIX.grm.id). However, mmhe.py only requires PREFIX.grm.bin and PREFIX.grm.id.
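As a rough illustration of these inputs, the following minimal sketch (not the code used in mmhe.py) reads PREFIX.grm.id and PREFIX.grm.bin and aligns one column of a phenotype or covariate file to the GRM order by family and individual ID. It assumes the usual GCTA layout of PREFIX.grm.bin (lower triangle including the diagonal, stored row by row as little-endian 4-byte floats) and whitespace-separated FID/IID tables without a header; the function names read_grm_id, read_grm_bin, and align_to_grm are hypothetical.

```python
import numpy as np


def _to_float(text):
    """Parse a numeric field; non-numeric entries such as "NA" become NaN."""
    try:
        return float(text)
    except ValueError:
        return np.nan


def read_grm_id(prefix):
    """Read PREFIX.grm.id: family ID and individual ID on each line."""
    with open(prefix + ".grm.id") as fh:
        return [tuple(line.split()[:2]) for line in fh if line.strip()]


def read_grm_bin(prefix, n_subj):
    """Read PREFIX.grm.bin into a full symmetric n_subj x n_subj matrix.

    Assumes the GCTA layout: lower triangle including the diagonal, stored
    row by row as little-endian 4-byte floats (K11, K21, K22, K31, ...).
    """
    vals = np.fromfile(prefix + ".grm.bin", dtype="<f4").astype(np.float64)
    n_expected = n_subj * (n_subj + 1) // 2
    if vals.size != n_expected:
        raise ValueError(f"expected {n_expected} values, found {vals.size}")
    rows, cols = np.tril_indices(n_subj)  # same row-by-row order as the file
    K = np.zeros((n_subj, n_subj))
    K[rows, cols] = vals
    K[cols, rows] = vals  # mirror into the upper triangle
    return K


def align_to_grm(grm_ids, table_path, col=1):
    """Return one column of a FID/IID table (phenotype or covariate file),
    reordered to match grm_ids. col=1 is the first value column after the
    IDs, analogous to --mpheno 1. IDs absent from the table give NaN."""
    lookup = {}
    with open(table_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2 + col:
                lookup[(fields[0], fields[1])] = _to_float(fields[1 + col])
    return np.array([lookup.get(i, np.nan) for i in grm_ids])
```

For example, ids = read_grm_id(prefix) followed by K = read_grm_bin(prefix, len(ids)) and y = align_to_grm(ids, "my.pheno", col=1). Subjects with missing values come back as NaN and would need to be dropped from both y and K; this sketch does not treat GCTA's -9 missing code specially.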
Link to GCTA: http://cnsgenomics.com/software/gcta/index.html
Link to PLINK2: https://www.cog-genomics.org/plink2
The output of mmhe.py is the point estimate and standard error of SNP-based heritability. The computation time is also reported.
-
Matlab version: mmhe.m
Once you have read the GRM, phenotype, and covariates into Matlab as matrices, mmhe.m gives the point estimate and standard error of SNP-based heritability. Specifications of the input data format are listed below:
- y: n_subj x 1 vector of phenotype
- X: n_subj x n_cov matrix of covariates
- K: n_subj x n_subj empirical genetic similarity matrix (n_subj is the number of subjects analyzed)
The outputs from mmhe.m are:
- h2: SNP heritability estimate
- se: standard error estimate of h2
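To give a feel for what a moment-matching estimate from y, X, and K involves, here is a minimal Python sketch of a generic moment-matching (Haseman-Elston-style) estimator. It is an illustration under stated assumptions, not a port of mmhe.m: it fits the genetic and residual variance components by least-squares matching of the residualized phenotype outer product to its expectation, returns only the point estimate h2 = sigma_g^2 / (sigma_g^2 + sigma_e^2), and omits the standard error. The function name mm_h2 is hypothetical, and the explicit n x n projection makes it suitable only for moderate n.

```python
import numpy as np


def mm_h2(y, X, K):
    """Point estimate of SNP heritability by moment matching (sketch).

    y: length-n phenotype, X: n x q covariates (add a column of ones for an
    intercept), K: n x n GRM. Fits sigma_g^2, sigma_e^2 by least squares on
        || r r^T - sigma_g^2 P0 K P0 - sigma_e^2 P0 ||_F^2,
    where P0 = I - X (X^T X)^{-1} X^T and r = P0 y.
    """
    y = np.asarray(y, dtype=float).ravel()
    n = y.size
    X = np.asarray(X, dtype=float).reshape(n, -1)
    K = np.asarray(K, dtype=float)
    Q, _ = np.linalg.qr(X)            # orthonormal basis of span(X); X assumed full rank
    P0 = np.eye(n) - Q @ Q.T          # projection that removes the covariates
    A = P0 @ K @ P0
    r = P0 @ y
    # Normal equations; P0 is idempotent, so tr(A P0) = tr(A) and tr(P0 P0) = tr(P0).
    M = np.array([[np.sum(A * A), np.trace(A)],
                  [np.trace(A), np.trace(P0)]])
    b = np.array([r @ A @ r, r @ r])  # tr(A r r^T) and tr(P0 r r^T)
    sigma_g2, sigma_e2 = np.linalg.solve(M, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)
```

Because the covariates are projected out before matching, the estimate is unchanged by rescaling y or by adding any linear combination of the columns of X to it.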
-
Python version: mmhe_col.py
Usage:
python ./mmhe_col.py --grmdir my_grm_dir my_grm_prefix --pheno my.pheno --mpheno 1 --covar my.covar
For a dataset with more than 100,000 subjects (or any large dataset), mmhe_col.py can load the GRM in blocks. Specifications of the input data format are listed below:
- The phenotype file and covariate file follow the GCTA formats (see the description for the single-GRM version, mmhe.py).
- Block GRM files with a PREFIX.grm.id file. The block GRM files are n_subj x k matrices that are column blocks of the full GRM. Typically you would have floor(n_subj/k) block GRM files, each with k columns of the full GRM, plus 1 block GRM file at the end that holds the remaining (fewer than k) columns when n_subj is not a multiple of k. These files should be plain text files saved as PREFIX.{block_num}.grm (e.g., PREFIX.1.grm, PREFIX.2.grm, ...) in a single directory. The PREFIX.grm.id file should follow the GCTA GRM file format, with the family ID in the first column and the individual ID in the second column.
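With the block layout described above, the quantities a moment-matching estimator needs can be accumulated one column block at a time, so the full GRM never has to sit in memory. The sketch below is hypothetical (load_blocks and mm_h2_blockwise are not functions from mmhe_col.py) and assumes each PREFIX.{block_num}.grm file is a whitespace-delimited n_subj x k text matrix whose rows and columns follow the order of PREFIX.grm.id. It uses P0 = I - QQ^T, with Q an orthonormal basis of the covariates, together with the identities tr(P0 K P0 K) = tr(KK) - 2||KQ||_F^2 + ||Q^T K Q||_F^2 and tr(P0 K) = tr(K) - tr(Q^T K Q), which need only the running product KQ.

```python
import os
import numpy as np


def load_blocks(grm_dir, prefix, n_blocks):
    """Yield the plain-text blocks PREFIX.1.grm ... PREFIX.{n_blocks}.grm in order."""
    for b in range(1, n_blocks + 1):
        yield np.loadtxt(os.path.join(grm_dir, f"{prefix}.{b}.grm"), ndmin=2)


def mm_h2_blockwise(y, X, blocks):
    """Moment-matching h2 point estimate, reading K one column block at a time.

    Matches r r^T to sigma_g^2 P0 K P0 + sigma_e^2 P0 by least squares, with
    P0 = I - Q Q^T (Q: orthonormal basis of the covariates) and r = P0 y.
    """
    y = np.asarray(y, dtype=float).ravel()
    n = y.size
    Q, _ = np.linalg.qr(np.asarray(X, dtype=float).reshape(n, -1))
    q = Q.shape[1]
    r = y - Q @ (Q.T @ y)                 # r = P0 y
    KQ = np.zeros((n, q))                 # accumulates K @ Q
    ss_K = tr_K = rKr = 0.0               # tr(K K), tr(K), r^T K r
    start = 0
    for block in blocks:                  # block = K[:, start:start + k]
        k = block.shape[1]
        cols = slice(start, start + k)
        KQ += block @ Q[cols]
        ss_K += np.sum(block * block)
        tr_K += np.trace(block[cols])     # diagonal entries K[j, j] in this block
        rKr += r @ (block @ r[cols])
        start += k
    if start != n:
        raise ValueError(f"blocks cover {start} columns, expected {n}")
    QKQ = Q.T @ KQ
    tr_AA = ss_K - 2.0 * np.sum(KQ * KQ) + np.sum(QKQ * QKQ)  # tr(A A), A = P0 K P0
    tr_A = tr_K - np.trace(QKQ)                               # tr(A)
    M = np.array([[tr_AA, tr_A], [tr_A, n - q]])  # tr(P0) = n - q for full-rank X
    b = np.array([rKr, r @ r])            # r^T A r = r^T K r because P0 r = r
    sigma_g2, sigma_e2 = np.linalg.solve(M, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)
```

With k columns per file, the number of files is ceil(n_subj / k), so a call might look like mm_h2_blockwise(y, X, load_blocks(grm_dir, prefix, math.ceil(n_subj / k))). Beyond the current block, only the n_subj x q matrix KQ is kept in memory.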
-
Matlab version: mmhe_col.m
For a dataset with more than 100,000 subjects (or any large dataset), mmhe_col.m can load the GRM in blocks. Specifications of the input data format are listed below:
- y: n_subj x 1 vector of phenotype
- X: n_subj x n_cov matrix of covariates
- grm_dir: directory containing the block columns of the empirical genetic similarity matrix; we assume here that each block column variable K is saved as GRM_col{col_num}.mat (e.g., GRM_col1.mat, GRM_col2.mat, ...) in this directory
- blk_size: size (number of columns) of each block
The outputs from mmhe_col.m are:
- h2: SNP heritability estimate
- se: standard error estimate of h2
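If, as assumed here, the block columns are contiguous and stored in order, the column range held by each GRM_col{col_num}.mat file follows directly from blk_size. The hypothetical helper below lists that layout using 1-based, inclusive column indices as in Matlab; only the last block can be narrower than blk_size.

```python
import math


def block_layout(n_subj, blk_size):
    """Return (file name, first column, last column) for each block column.

    Assumes contiguous, in-order blocks named GRM_col{col_num}.mat and uses
    1-based inclusive column indices, as Matlab does.
    """
    n_blocks = math.ceil(n_subj / blk_size)
    layout = []
    for col_num in range(1, n_blocks + 1):
        first = (col_num - 1) * blk_size + 1
        last = min(col_num * blk_size, n_subj)  # the last block may be shorter
        layout.append((f"GRM_col{col_num}.mat", first, last))
    return layout
```

For example, with n_subj = 10 and blk_size = 4 the three files would hold columns 1 to 4, 5 to 8, and 9 to 10.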
Please contact Tian Ge (tge1@mgh.harvard.edu) or Chia-Yen Chen (chiayen.chen@mgh.harvard.edu) for any questions and comments.
Please cite the bioRxiv paper if you use this software:
Tian Ge, Chia-Yen Chen, Benjamin M. Neale, Mert R. Sabuncu, Jordan W. Smoller. Phenome-wide Heritability Analysis of the UK Biobank. bioRxiv. doi: https://doi.org/10.1101/070177
http://biorxiv.org/content/early/2016/08/18/070177