Lab-Research-On-DNA-Methylation

Project Title: Interpreting Correlations Between DNA Methylation Levels and DNA Sequences

Overview:

DNA methylation is known to regulate gene expression. However, methylation levels show different patterns across brain cell types. We are interested in understanding where these differences come from and what causes this variation in methylation levels. We hypothesize that differences in methylation levels are driven by DNA sequence features that correlate with cell type. The purpose of this project is to explain the correlation between DNA methylation levels and DNA sequences with the help of machine learning models. We will first extract features from DNA sequences and fit linear models such as Lasso regression. Features with nonzero coefficients will be candidates for further analysis. The coefficient of determination (R²) will be calculated to evaluate model performance. To further improve prediction performance, neural networks will be used to extract higher-level features and help predict DNA methylation levels.
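A minimal sketch of the Lasso step: it fits Lasso by cyclic coordinate descent on the objective that scikit-learn's `Lasso` documents, (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1. It reads off the features with nonzero coefficients and reports mean squared error and R² for several values of alpha. The code uses only NumPy, so it runs without extra dependencies. The data are synthetic, the function names are hypothetical and not taken from this repository, and evaluation is in-sample for brevity. A held-out split would be used to judge real performance.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 penalty)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Fit Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw - b||^2 + alpha * ||w||_1.
    X and y are centered first so the intercept b is not penalized.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 per feature
    w = np.zeros(p)
    residual = yc.copy()  # invariant: residual == yc - Xc @ w
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no signal
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = Xc[:, j] @ residual / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            residual -= Xc[:, j] * (new_wj - w[j])
            w[j] = new_wj
    intercept = y_mean - x_mean @ w
    return w, intercept


def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data (NOT real methylation data): 200 regions, 20 features,
# of which only features 0, 5 and 12 actually influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[[0, 5, 12]] = [1.5, -2.0, 0.8]
y = X @ true_w + 0.1 * rng.normal(size=200)

results = {}
for alpha in [1.0, 0.1, 0.01]:
    w, b = lasso_coordinate_descent(X, y, alpha)
    pred = X @ w + b
    results[alpha] = {
        "selected": np.flatnonzero(w).tolist(),  # candidate features (nonzero coefficients)
        "n_nonzero": int(np.count_nonzero(w)),
        "mse": mean_squared_error(y, pred),
        "r2": r_squared(y, pred),
    }
    print(f"alpha={alpha}: {results[alpha]['n_nonzero']} nonzero, "
          f"MSE={results[alpha]['mse']:.4f}, R^2={results[alpha]['r2']:.3f}")
```

Larger alpha values zero out more coefficients at the cost of a higher error. Sweeping alpha in this way traces the trade-off between the number of nonzero coefficients and mean squared error.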

Brief Description of Project and Methods:

DNA methylation is well known to modify gene function and affect gene expression. This project aims to predict and interpret DNA methylation at differentially methylated regions (DMRs), which are regions whose methylation levels differ between brain cell types. Our data come from whole-genome bisulfite sequencing of mouse brain samples, paired with the corresponding DNA sequences. The datasets are large-scale and high-dimensional, with ~60,000 DMRs for each of 16 cell types. Models will be evaluated by the mean squared error between predicted and observed methylation values.

Features are extracted by scanning each DNA sequence and counting occurrences of 2,080 distinct 6-base-pair sequences (k-mers, k = 6). The number 2,080 is how many 6-mers remain when each of the 4^6 = 4,096 possible 6-mers is merged with its reverse complement. In addition to k-mer counts, further features will be added, such as CpG attributes, DNA structural properties, and histone modifications of the corresponding sequences.

The first method we apply is Lasso regression. By varying the regularization parameter alpha, we will examine how the mean squared error changes with the number of nonzero coefficients. Afterwards, nonlinear machine learning models will be used to further improve prediction and help interpret the correlations. For example, a convolutional neural network (CNN) may extract higher-level features, letting us interpret these DNA sequences at a broader scale.
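A minimal sketch of the k-mer counting step, in pure Python. It assumes the 2,080 features are canonical 6-mers, meaning each k-mer is merged with its reverse complement. It also assumes that windows containing ambiguous bases such as `N` are skipped. Both are assumptions, and the helper names are hypothetical rather than taken from `CNN_features.py` or any other file in the repository.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = frozenset("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmer_index(k=6):
    """Map each canonical k-mer to a feature column index (sorted order)."""
    kmers = sorted({canonical("".join(bases)) for bases in product("ACGT", repeat=k)})
    return {kmer: i for i, kmer in enumerate(kmers)}


def kmer_count_vector(seq, index, k=6):
    """Count canonical k-mers in seq with a sliding window of width k.

    Windows containing anything other than A/C/G/T (e.g. 'N') are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(index)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if set(window) <= VALID_BASES:
            counts[index[canonical(window)]] += 1
    return counts


index = canonical_kmer_index(6)
print(len(index))  # → 2080
counts = kmer_count_vector("ACGTACGT", index)
print({kmer: counts[i] for kmer, i in index.items() if counts[i]})  # → {'ACGTAC': 2, 'CGTACG': 1}
```

In the example, `GTACGT` is the reverse complement of `ACGTAC`, so both windows add to the same column. `CGTACG` is its own reverse complement. Stacking one count vector per DMR produces the feature matrix that a linear model such as Lasso would take as input.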

Repository Files:

SimulationCNN
CNN_features.py
CNN_methylation.py
CNN_weight.npy
Convolutional Neural Network on REAL Data.ipynb
GLM_Binomial.npy
Generalized Linear Model (Binomial).ipynb
KNN.py
Lasso_params.npy
LinearRegression.ipynb
Logistic Regression.ipynb
Motif_Comparision.ipynb
README.md
Random Forest Regressor.ipynb
h5li/Lab-Research-On-DNA-Methylation