Code and Data for 2016 PSB publication
####Note: This GitHub repository contains most of the code and some of the data used by this project. Because of file-size limits, some large datasets and third-party algorithms/software (all publicly available) are not uploaded. Detailed information will be provided upon request.
1 ./data
- contains some datasets used by this project
- ./mental-disorder-years-threshold-3percent
  - contains the processed data from CT.GOV
- ./pubmed_matched
  - matched diseases for the CT.GOV and PubMed datasets (automatic matching)
2 ./script
- all scripts (mainly Python and R) used by this project
- rawDataProcessing.py: main script
- ./pylib: Python libraries used in the project
- ./rlib: R library used in the project (Rscript users can import rlib.Rproj to run all the R code)
3 ./result
- some intermediate results
- ./Comparison_in_ex all: results of comparing CT.GOV inclusion and exclusion criteria
- ./Comparison_pubmed_ex: results of comparing CT.GOV exclusion criteria with PubMed data
- ./network_In_Ex_Pubmed: network analysis results
- ./match_pubmed: matched diseases for the CT.GOV and PubMed datasets (semi-automatic matching with manual corrections)
4 ./REF
- all references used in the paper
mapDiseaseName
: map Did and Disease Name (optional)
get_cde_st_table()
: get CDE-ST table for all
cytoscapePrepare[N]()
: prepare the Cytoscape input file (from the inpath files, using R)
The programs in this folder are used to analyze the output generated by the preceding rawDataProcessing step.
Take the following steps:
- Rscript CDE_unique_analysis.R #source
- source('./ST_analysis (original).R') #cannot be finished in one run
  - group=in
  - group=ex
- source("./networkAnalysis.R")
Note: there are duplicated entries in the PubMed edge list (edgelist:pubmed).
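
Duplicated edges can inflate counts in a network analysis, so they are worth removing before the edge list is used. The following is a minimal Python sketch, not the project's own code: the function name `dedupe_edges`, the two-column tab-separated layout, the sample IDs, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
import csv
import io


def dedupe_edges(rows, directed=False):
    """Return edges with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for source, target in rows:
        # For undirected edges, (a, b) and (b, a) count as the same edge.
        key = (source, target) if directed else tuple(sorted((source, target)))
        if key not in seen:
            seen.add(key)
            unique.append((source, target))
    return unique


# Hypothetical two-column edge list; the real file layout may differ.
sample = "D1\tD2\nD2\tD1\nD1\tD3\nD1\tD2\n"
edges = [tuple(row) for row in csv.reader(io.StringIO(sample), delimiter="\t")]
unique_edges = dedupe_edges(edges)
print(unique_edges)  # [('D1', 'D2'), ('D1', 'D3')]
```

Pass `directed=True` if the direction of an edge carries meaning, in which case only exact repeats are dropped.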
- see overlap: Rscript Comparison_overlap.R (source() will not plot; run it from bash)
- see top table, both overall and per disease: source('./Comparison_topCEF.R')
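
The R steps above can be listed as shell commands with a small Python helper. This is only a sketch under assumptions: the helper names `build_commands` and `run_all` are hypothetical, the order simply follows the order in which the scripts are mentioned, and it does not set the `group` value (in/ex), since how that value is chosen is not documented here. Because the ST_analysis script is noted as not finishing in one run, it may still need to be run by hand.

```python
import shlex
import subprocess

# Script names come from the steps above; the order is an assumption
# based on the sequence in which they are listed.
R_SCRIPTS = [
    "CDE_unique_analysis.R",
    "ST_analysis (original).R",
    "networkAnalysis.R",
    "Comparison_overlap.R",
    "Comparison_topCEF.R",
]


def build_commands(scripts, rscript="Rscript"):
    """Build one argument list per script, suitable for subprocess.run."""
    return [[rscript, script] for script in scripts]


def run_all(scripts, workdir="."):
    """Run each script in order and stop at the first failure."""
    for cmd in build_commands(scripts):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands(R_SCRIPTS):
        print(shlex.join(cmd))
```

Building argument lists rather than one shell string keeps the file name containing spaces and parentheses intact without manual quoting.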
For any questions or requests, please contact me at handongma.work at gmail.com
or