Healthy-Aging

Folder structure

age_predict/
	age_predict/
		contains a custom Python package we developed for data analysis
		


analysis/

	EWAS/ (indicates data came from the EWAS Data Hub, the Epigenome-Wide Association Study data hub)

		blood/
			contains DNA Methylation analysis with data from whole blood
		
		brain/
			contains DNA Methylation analysis with data from brain tissue
		
		breast/
			contains DNA Methylation analysis with data from breast tissue
		
		leukocyte/
			contains DNA Methylation analysis with data from leukocytes
		
		healthy_vs_unhealthy/
			contains DNA Methylation analysis comparing healthy and unhealthy individuals
		
		gene_identification/
			contains mappings of CpG sites to genes
		


	PPMI/ (indicates data came from the Parkinson's Progression Markers Initiative database)
		
		blood_chem_EDA/
			contains PPMI blood chemistry EDA

		ppmi_120_methylation_profiling/
			contains PPMI methylation data EDA

		data_summary.xlsx
			contains summary of data in various PPMI files

NOTES ON DATA

The data for this project came from the EWAS Data Hub https://bigd.big.ac.cn/ewas/datahub/index.
The data on healthy individuals came from the pre-prepared EWAS data cut called "age_methylation_v1.zip", and its metadata came from the file "sample_age_methylation_v1.zip". Once unzipped, the data file is called "age_methylation_v1.txt". It is 22 GB in size, too large to place on GitHub. It can be downloaded at: https://bigd.big.ac.cn/ewas/datahub/download
In the directory 'Healthy-Aging/analysis/EWAS/blood/feature_selection' there is a Jupyter notebook entitled "Load_all_data_select_out_tissue_save_ranked_dfs". Given a path to the data file "age_methylation_v1.txt" and a tissue type, it reads in the data and selects the data from just that tissue. Optionally, it then imputes the missing values using our standard procedure and creates Train and Test sets containing data from only the CpG sites we have ranked as most important. These datasets are saved as pandas dataframes in the feature_selection directories, and these dataframes are what the modeling routines in the modeling directories use.
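For readers who want a feel for the tissue-selection step without opening the notebook, here is a minimal sketch. It is not the notebook's code. It assumes the data file is tab-delimited with one CpG site per row and one sample per column, that a sample-to-tissue mapping has already been built from the metadata file, and that missing values appear as "NA" or an empty field. The function names and the row-mean imputation are hypothetical stand-ins; our standard imputation procedure lives in the notebook itself. The sketch streams the file row by row, which matters for a 22 GB input.

```python
import csv
import io


def select_tissue_columns(data_lines, sample_tissue, tissue):
    """Yield rows restricted to the samples that belong to one tissue.

    Assumed layout (not confirmed by the source): tab-delimited text whose
    first line is a header of sample IDs and whose first column is the CpG ID.
    """
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    # Column indices of the samples whose tissue matches the request.
    keep = [i for i, name in enumerate(header)
            if i > 0 and sample_tissue.get(name) == tissue]
    yield [header[0]] + [header[i] for i in keep]
    for row in reader:
        yield [row[0]] + [row[i] for i in keep]


def impute_row_mean(values, missing=("", "NA")):
    """Replace missing entries with the mean of the observed ones.

    Placeholder only: this is a simple stand-in, not the project's procedure.
    """
    present = [float(v) for v in values if v not in missing]
    fill = sum(present) / len(present) if present else float("nan")
    return [fill if v in missing else float(v) for v in values]


# Tiny in-memory stand-in for "age_methylation_v1.txt" (values are made up).
demo = io.StringIO(
    "cpg\tS1\tS2\tS3\n"
    "cg0001\t0.25\tNA\t0.75\n"
    "cg0002\t0.50\t0.70\t0.90\n"
)
tissues = {"S1": "whole blood", "S2": "brain", "S3": "whole blood"}

rows = list(select_tissue_columns(demo, tissues, "whole blood"))
for row in rows:
    print(row)
```

To run this against the real file, pass an open file handle in place of `demo`, after checking that the delimiter, header layout, and missing-value marker match the assumptions above.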
The data on unhealthy individuals came from the pre-prepared EWAS data cut called "disease_methylation_v1.zip", and its metadata came from the file "sample_disease_methylation_v1.zip". This file is too large to place on GitHub. It can be downloaded at: https://bigd.big.ac.cn/ewas/datahub/download
The cuts of the data required for the analysis can be produced using the following notebooks:

alz_brain_top_56.csv and hunt_brain_top_56.csv: get_shared_healthy_unhealthy.ipynb

combined_healthy_unhealthy_107_cpgs.csv: produce_combined_healthy_unhealthy.ipynb

alz_brain_unhealthy_all.csv and hunt_brain_unhealthy_all.csv: read_all_unhealthy_brain_data.ipynb
