Code for ICML 2019 paper "Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates"

Author: George H. Chen (georgechen [at symbol] cmu.edu)

Code requirements:

Anaconda Python 3.6
Additional packages: joblib, lifelines
Cython compilation is required:

python setup_random_survival_forest_cython.py build_ext --inplace

The main code implementing the nonparametric survival methods from the paper is in npsurvival_models.py. Cython helper code for random survival forests is in random_survival_forest_cython.pyx. There are two main utility files: survival_datasets.py loads datasets (the "pbc" dataset is loaded from the statsmodels Python package; the "gbsg2" and "recid" datasets are loaded from the "data/" folder), and util.py has some helper calculation functions. Note: the "kidney" dataset is not public, so I have removed it from this distribution.

The Python files just mentioned should not be run directly. Instead, run the demo_*.py files (e.g., python demo_rsfann.py config_tiny.ini, which saves results to the directory output_tiny). To generate all the experimental results for the "pbc", "gbsg2", and "recid" datasets (and save them to csv files in the directory output), run ./demo.sh (warning: this takes a while to run).
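For readers who want a feel for what a nearest neighbor survival estimator computes before opening npsurvival_models.py, here is a minimal pure-Python sketch of a k-nearest-neighbor Kaplan-Meier estimate. It is not taken from this repository: the function name knn_kaplan_meier, its arguments, and the toy data are all hypothetical, and the actual code may differ in distance metric, weighting, and tie handling.

```python
def knn_kaplan_meier(train_x, train_times, train_events, query_x, k, t):
    """Estimate the survival probability S(t | query_x).

    Hypothetical illustration: runs the Kaplan-Meier product-limit
    estimator on the k training points closest to query_x.
    train_events[i] is 1 if the event was observed, 0 if censored.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Rank training points by (squared) Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], query_x))
    neighbors = order[:k]
    times = [train_times[i] for i in neighbors]
    events = [train_events[i] for i in neighbors]

    survival = 1.0
    # Walk through the distinct observed event times in increasing order.
    for event_time in sorted({y for y, d in zip(times, events) if d == 1}):
        if event_time > t:
            break
        at_risk = sum(1 for y in times if y >= event_time)
        deaths = sum(1 for y, d in zip(times, events)
                     if d == 1 and y == event_time)
        survival *= 1.0 - deaths / at_risk
    return survival


# Toy data (made up for illustration): one feature, four subjects.
toy_x = [[0.0], [1.0], [2.0], [10.0]]
toy_times = [2.0, 4.0, 6.0, 1.0]
toy_events = [1, 0, 1, 1]
print(knn_kaplan_meier(toy_x, toy_times, toy_events, [0.5], k=3, t=3.0))
```

With k=3 the far-away fourth subject is ignored, so the estimate at t=3.0 reflects only the single observed event at time 2.0 among three subjects at risk.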

After running demo.sh, a simple way to display all the tabulated outputs is to run python table_aggregator.py config.ini. To produce the plots (excluding the "kidney" dataset) in the main part of the paper (i.e., not the extended results), run python table_aggregator_plot_short.py config.ini. To produce the plots in the appendix (the extended results, excluding the "kidney" dataset), run python table_aggregator_plot.py config.ini. Note that these display/plot scripts require the auxiliary text files survival_estimator_names.txt and survival_estimator_names_short.txt.

Important: If you do not want to re-run all the methods but still want to produce plots (excluding the "kidney" dataset), I have included precomputed csv tables in the folder precomputed. Please move the csv files in this folder into the output directory (as specified in the configuration file used; with the provided config.ini, the output directory is output) and run the plotting code to regenerate the plots.
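If you would rather script this step than move the files by hand, something like the following sketch should work. It is not part of this repository: the helper name copy_precomputed_tables is hypothetical, it assumes the csv files sit directly inside precomputed (not in subfolders), and it copies rather than moves the files so the originals stay in place.

```python
import shutil
from pathlib import Path


def copy_precomputed_tables(precomputed_dir="precomputed", output_dir="output"):
    """Copy every csv file from precomputed_dir into output_dir.

    Hypothetical helper; the default directory names follow the README
    (folder "precomputed", default output directory "output").
    Returns the sorted list of file names that were copied.
    """
    source = Path(precomputed_dir)
    destination = Path(output_dir)
    # Create the output directory if it does not exist yet.
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for csv_path in sorted(source.glob("*.csv")):
        shutil.copy2(str(csv_path), str(destination / csv_path.name))
        copied.append(csv_path.name)
    return copied
```

Calling copy_precomputed_tables() from the repository root and then running, for example, python table_aggregator_plot_short.py config.ini should reproduce the plots, assuming config.ini still points at the default output directory.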

