lifelines

What is survival analysis and why should I learn it? Historically, survival analysis has been developed and applied heavily by the actuarial and medical communities. Its general purpose is to answer why events occur now rather than later under uncertainty (where events might refer to deaths, disease remission, etc.). This makes it valuable to researchers who measure lifetimes: they can answer questions such as "What factors might influence deaths?"

But outside of medicine and actuarial science, there are many interesting and exciting applications of this lesser-known technique. SaaS providers are interested in measuring customer lifetimes; sociologists are interested in measuring the lifetimes of political parties, relationships, or marriages; telecom companies are interested in understanding customer behaviour, etc.

#### Dependencies

The usual Python data stack: numpy, pandas, matplotlib (optional)

(Quick) Intro to lifelines and survival analysis

**Work in progress (30%)**

If you are new to survival analysis, wonder why it is useful, or want examples, I recommend running the Tutorial and Examples.ipynb notebook, or viewing it online here.

Documentation

**Work in progress (75%)**

I've added documentation to a notebook, Documentation.ipynb, that describes the classes, methods and data types in more detail. You can open it with the IPython notebook, or view it online.

Enough talk - just show me the examples!

Generating Datasets

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines.generate_datasets import *
from lifelines.estimation import *

n_ind = 4  # how many lifetimes we observe
n_dim = 5  # the number of covariates to generate
t = np.linspace(0, 40, 400)  # time grid on which the hazards are evaluated

hz, coefs, covart = generate_hazard_rates(n_ind, n_dim, t, model="aalen")
# you're damn right these are dataframes

hz.plot()
```

(This Matplotlib styling is available in the styles/ folder.)
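Under the Aalen additive model that `model="aalen"` refers to, each individual's hazard is a baseline function plus a covariate-weighted sum of time-varying coefficient functions: `h_i(t) = b_0(t) + sum_j x_ij * b_j(t)`. The sketch below shows that construction in plain numpy. `additive_hazards` is a hypothetical helper, not the code inside `generate_hazard_rates`; it returns an array (one column per individual) rather than a DataFrame, and it does not enforce non-negativity, so the coefficient functions are assumed to keep every hazard at or above zero.

```python
import numpy as np


def additive_hazards(X, coef_funcs, t):
    """Hazards under an Aalen additive model (hypothetical helper).

    X          : (n_individuals, n_covariates) covariate matrix
    coef_funcs : n_covariates + 1 callables; coef_funcs[0] is the baseline b_0(t),
                 coef_funcs[j] is the coefficient function b_j(t)
    t          : 1-D time grid
    returns    : (len(t), n_individuals) array, one column per individual
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    # prepend a column of ones so the baseline enters every individual's hazard
    design = np.column_stack([np.ones(X.shape[0]), X])
    if len(coef_funcs) != design.shape[1]:
        raise ValueError("need one coefficient function per covariate, plus a baseline")
    # evaluate each coefficient function on the grid; constants are broadcast
    B = np.column_stack([np.broadcast_to(f(t), t.shape) for f in coef_funcs])
    # h_i(t) = b_0(t) + sum_j x_ij * b_j(t), for every grid point and individual
    return B @ design.T
```

For example, `additive_hazards(X, [lambda s: 0.05, lambda s: 0.01 * (1 + np.sin(s)), lambda s: 0.02], t)` gives two covariates, one of which has a periodic effect over time.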

```python
sv = construct_survival_curves(hz, t)
sv.plot()  # moar dataframes
```
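A survival curve follows from a hazard curve through `S(t) = exp(-H(t))`, where `H(t)` is the cumulative hazard, the integral of `h` from 0 to `t`. The numpy sketch below (`survival_from_hazard` is a hypothetical name, not necessarily how `construct_survival_curves` is implemented) integrates the hazard on the grid with the trapezoid rule and assumes the grid starts at the time origin, so that `H(t[0]) = 0`.

```python
import numpy as np


def survival_from_hazard(hazard, t):
    """S(t) = exp(-cumulative hazard), integrated with the trapezoid rule.

    hazard : array of shape (len(t),) or (len(t), n_individuals)
    t      : 1-D time grid starting at the time origin
    """
    hazard = np.asarray(hazard, dtype=float)
    t = np.asarray(t, dtype=float)
    dt = np.diff(t)
    if hazard.ndim == 2:
        dt = dt[:, None]  # broadcast the step sizes across individuals
    # area of each trapezoid between consecutive grid points
    steps = 0.5 * (hazard[1:] + hazard[:-1]) * dt
    # cumulative hazard, with H = 0 at the first grid point
    cumulative_hazard = np.concatenate([np.zeros_like(hazard[:1]), np.cumsum(steps, axis=0)])
    return np.exp(-cumulative_hazard)
```

For a constant hazard `lam`, this reproduces the exponential survival curve `exp(-lam * t)`, since the trapezoid rule is exact for constants.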

```python
# using the hazard curves, we can sample survival times (50 draws per individual)
rv = generate_random_lifetimes(hz, t, 50)
print(rv)
```

```
array([[ 9.4235589 ,  3.60902256,  3.0075188 ,  0.60150376],
       [ 1.00250627,  3.20802005,  0.70175439,  0.30075188],
       [ 5.71428571,  8.02005013,  5.41353383,  0.30075188],
       ...,
       [ 3.70927318,  4.41102757,  3.30827068,  0.30075188],
       [ 1.80451128,  1.5037594 ,  0.30075188,  0.40100251],
       [ 1.40350877,  1.5037594 ,  0.80200501,  0.10025063]])
```
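One standard way to sample lifetimes from a hazard curve is inverse-transform sampling: if `T` has survival function `S`, then `S(T)` is uniform on (0, 1), so solving `H(T) = -log(U)` for a uniform draw `U` gives a lifetime. The sketch below (`sample_lifetimes` is a hypothetical helper; we do not know that `generate_random_lifetimes` works this way) handles a single hazard curve and snaps each draw to the first grid point where the cumulative hazard reaches the target, which explains why sampled times land on grid values.

```python
import numpy as np


def sample_lifetimes(hazard, t, size, rng=None):
    """Inverse-transform sampling of lifetimes from a hazard on the grid t."""
    rng = np.random.default_rng() if rng is None else rng
    hazard = np.asarray(hazard, dtype=float)
    t = np.asarray(t, dtype=float)
    # cumulative hazard H on the grid (trapezoid rule, H(t[0]) = 0)
    steps = 0.5 * (hazard[1:] + hazard[:-1]) * np.diff(t)
    H = np.concatenate([[0.0], np.cumsum(steps)])
    # S(T) = exp(-H(T)) is Uniform(0, 1), so H(T) = -log(U); 1 - uniform avoids log(0)
    targets = -np.log(1.0 - rng.uniform(size=size))
    # first grid index where H reaches each target (H is non-decreasing for h >= 0)
    idx = np.searchsorted(H, targets)
    # draws beyond the end of the grid are clipped to the last grid time
    return t[np.minimum(idx, len(t) - 1)]
```

With a constant hazard of 0.5 the draws are (discretized) exponential with mean 2.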

```python
# take the 50 sampled lifetimes of the first individual, as a column vector
survival_times = rv[:, 0][:, None]

# estimation is clean and built to resemble scikit-learn's API
kmf = KaplanMeierFitter()
kmf.fit(survival_times)
kmf.survival_function_.plot()
```
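For reference, the Kaplan-Meier estimator multiplies, over each distinct death time `t_i`, the fraction of the `n_i` individuals still at risk who survive the `d_i` deaths at that time: `S(t) = prod over t_i <= t of (1 - d_i / n_i)`. The numpy sketch below (`kaplan_meier` is a hypothetical function, not the `KaplanMeierFitter` code) also accepts an optional boolean array marking which deaths were observed, which is what the censorship section relies on.

```python
import numpy as np


def kaplan_meier(durations, observed=None):
    """Kaplan-Meier survival estimate (hypothetical helper).

    durations : 1-D array of observed times
    observed  : boolean array, True if the death was observed, False if
                right-censored; defaults to all True
    returns   : (event_times, survival just after each event time)
    """
    durations = np.asarray(durations, dtype=float)
    if observed is None:
        observed = np.ones(durations.shape, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    # n_i: individuals still at risk just before each death time
    at_risk = np.array([(durations >= s).sum() for s in event_times])
    # d_i: observed deaths at each death time
    deaths = np.array([((durations == s) & observed).sum() for s in event_times])
    return event_times, np.cumprod(1.0 - deaths / at_risk)
```

Censored individuals never contribute a death, but they stay in the risk set until their censoring time, which is exactly how the estimator uses their partial information.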

```python
naf = NelsonAalenFitter()
naf.fit(survival_times)
naf.cumulative_hazard_.plot()
```
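The Nelson-Aalen estimator targets the cumulative hazard directly by summing, rather than multiplying, the per-time death fractions: `H(t) = sum over t_i <= t of d_i / n_i`. A minimal numpy sketch follows; `nelson_aalen` is a hypothetical name and is not the `NelsonAalenFitter` implementation.

```python
import numpy as np


def nelson_aalen(durations, observed=None):
    """Nelson-Aalen cumulative hazard estimate (hypothetical helper).

    observed : boolean array, True if the death was observed; defaults to all True
    returns  : (event_times, cumulative hazard just after each event time)
    """
    durations = np.asarray(durations, dtype=float)
    if observed is None:
        observed = np.ones(durations.shape, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    # n_i: individuals at risk just before each death time
    at_risk = np.array([(durations >= s).sum() for s in event_times])
    # d_i: observed deaths at that time
    deaths = np.array([((durations == s) & observed).sum() for s in event_times])
    return event_times, np.cumsum(deaths / at_risk)
```

`np.exp(-H)` gives an alternative survival estimate that is close to Kaplan-Meier when the death fractions are small.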

Censorship events and estimation

When some events are right-censored (the simplest case being individuals who are still alive when we stop observing), we need to be more careful and account for these unobserved deaths. The API for this is a natural extension of the one above:

```python
t = np.linspace(0, 40, 1000)
hz, coefs, covart = generate_hazard_rates(1, 2, t, model="aalen")

# generate random lifetimes with uniform censoring; C is the boolean censorship indicator
T, C = generate_random_lifetimes(hz, t, size=750, censor=True)
```

In the above line, C is a boolean array that is True iff we observed the death event; otherwise the individual was right-censored. T is the time of the death event or, if censored, the lifetime observed up to the moment of censorship.
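The relationship between T, C and the true (partly unobserved) lifetimes can be sketched in a few lines. Here `censor_uniformly` is a hypothetical helper, and drawing each censoring time uniformly on `[0, max_censor_time]` is our assumption about what "uniform censoring" means; the library's scheme may differ.

```python
import numpy as np


def censor_uniformly(lifetimes, max_censor_time, rng=None):
    """Apply independent uniform right-censoring to true lifetimes (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    lifetimes = np.asarray(lifetimes, dtype=float)
    # each individual stops being observed at a uniformly drawn time (assumption)
    censor_times = rng.uniform(0.0, max_censor_time, size=lifetimes.shape)
    # T: the death time if it happened first, otherwise the censoring time
    T = np.minimum(lifetimes, censor_times)
    # C: True iff the death was observed before censoring
    C = lifetimes <= censor_times
    return T, C
```

Wherever `C` is False, `T` is strictly smaller than the true lifetime, which is why treating those entries as deaths underestimates survival.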

```python
kmf = KaplanMeierFitter()
kmf.fit(T, t, censorship=C)  # pass in the censorship indicator here

# plot the estimate against the true survival curve
ax = kmf.survival_function_.plot()
sv = construct_survival_curves(hz, t)
sv.plot(ax=ax)

# what if we had ignored the censorship events?
kmf.fit(T, t, column_name="KM-estimate without factoring censorship")
kmf.survival_function_.plot(ax=ax)

plt.show()
```
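A small worked example, with hypothetical data rather than library output, shows why ignoring censorship biases the estimate downward. Take four individuals with times 2, 3, 5 and 8, where the deaths at 2 and 5 are observed and the individuals at 3 and 8 are censored. With the Kaplan-Meier product over death times `t_i` (with `d_i` deaths among `n_i` at risk), the censored fit only counts the two real deaths, while the naive fit treats all four times as deaths:

```math
\begin{aligned}
\hat{S}(t) &= \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right) \\
\hat{S}_{\text{with censorship}}(5) &= \left(1 - \tfrac{1}{4}\right)\left(1 - \tfrac{1}{2}\right) = \tfrac{3}{8} \\
\hat{S}_{\text{ignoring censorship}}(5) &= \left(1 - \tfrac{1}{4}\right)\left(1 - \tfrac{1}{3}\right)\left(1 - \tfrac{1}{2}\right) = \tfrac{1}{4}
\end{aligned}
```

Counting the censored individual at time 3 as a death adds a spurious factor of 2/3, so the naive curve drops faster than the true one.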

Survival Regression

Currently, the Aalen additive model is implemented.

```python
from lifelines.estimation import AalenAdditiveFitter

# will fit the cumulative hazards
# X: a DataFrame or numpy array of covariates (not constructed in this snippet)
aaf = AalenAdditiveFitter(fit_intercept=True)
aaf.fit(T[None, :], X, censorship=C)
aaf.cumulative_hazards_.plot()
```
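Aalen's additive model is usually estimated by least squares at each death time: regress the death indicators of the individuals still at risk on their covariates (plus an intercept), and accumulate the coefficients into cumulative regression functions `B_j(t)`. The numpy sketch below (`aalen_additive` is a hypothetical function, not the `AalenAdditiveFitter` code) shows the idea. With an intercept only, each increment is `d_i / n_i`, so the intercept column reduces to the Nelson-Aalen estimate.

```python
import numpy as np


def aalen_additive(durations, observed, X):
    """Least-squares estimate of cumulative regression functions (hypothetical helper).

    Model: h_i(t) = b_0(t) + sum_j x_ij * b_j(t).
    returns (event_times, B), B of shape (n_event_times, 1 + n_covariates);
    column 0 is the cumulative baseline (intercept).
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single covariate given as a flat array
    design = np.column_stack([np.ones(len(durations)), X])
    event_times = np.unique(durations[observed])
    increments = []
    for s in event_times:
        at_risk = durations >= s
        # dN: 1 for individuals in the risk set who die at s, 0 for the rest
        dN = ((durations == s) & observed)[at_risk].astype(float)
        # regress the death indicators on the risk-set design matrix; once the risk set
        # is smaller than the number of columns, lstsq returns a minimum-norm solution
        dB, *_ = np.linalg.lstsq(design[at_risk], dN, rcond=None)
        increments.append(dB)
    return event_times, np.cumsum(np.array(increments), axis=0)
```

In practice the estimator is only trusted while the risk-set design matrix has full column rank; late times with few individuals at risk are noisy.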

```python
# plot the kernel-smoothed hazards
aaf.smoothed_hazards(20).plot()
```
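A cumulative hazard estimate jumps at each death time, so a hazard curve is recovered by spreading each jump over nearby times with a kernel: `h(t) = (1/b) * sum_i K((t - t_i) / b) * dH_i` for bandwidth `b`. We read the argument `20` above as such a bandwidth, which is an assumption, and the Epanechnikov kernel used below is our choice, not necessarily the library's. `smooth_hazard` is a hypothetical helper; `increments` can be 1-D or have one column per regression function.

```python
import numpy as np


def smooth_hazard(event_times, increments, t, bandwidth):
    """Kernel-smoothed hazard from cumulative-hazard jumps (Epanechnikov kernel)."""
    event_times = np.asarray(event_times, dtype=float)
    increments = np.asarray(increments, dtype=float)
    t = np.asarray(t, dtype=float)
    # scaled distance from every grid point to every event time: (len(t), n_events)
    u = (t[:, None] - event_times[None, :]) / bandwidth
    # Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero outside
    kernel = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return kernel @ increments / bandwidth
```

Jumps can be taken from a cumulative estimate `B` with `np.diff(B, axis=0, prepend=0)`. Near the ends of the time range this simple estimator is biased, because part of each kernel falls outside the observed times.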

Plotting

The styling in the above graphs comes from a custom matplotlibrc file, which you can find in the styles/ directory.

lifelines includes a plotting module, lifelines.plotting, which can visualize the lifetimes of individuals (useful mainly for sanity-checking small samples).

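```python
from lifelines.plotting import plot_lifetimes

N = 20
current = 10  # the present time; anyone still alive now is censored
birthtimes = current * np.random.uniform(size=(N,))  # individuals enter at random times before `current`
# each individual can be followed for at most current - birthtime
T, C = generate_random_lifetimes(hz, t, size=N, censor=current - birthtimes)
plot_lifetimes(T, censorship=C, birthtimes=birthtimes)
```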
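A lifetime plot like this one draws one horizontal segment per individual, from their birth (entry) time to their death or censoring time. The numpy sketch below computes those segments for the staggered-entry setup above, where everyone still alive at `current` is censored. `lifetime_segments` is a hypothetical helper and says nothing about how `plot_lifetimes` is written internally.

```python
import numpy as np


def lifetime_segments(lifetimes, birthtimes, current):
    """Segments for a lifetime plot when observation stops at `current` (hypothetical helper)."""
    lifetimes = np.asarray(lifetimes, dtype=float)
    birthtimes = np.asarray(birthtimes, dtype=float)
    # the longest each individual can be followed is the time from birth to `current`
    max_observable = current - birthtimes
    # True iff the death happens before `current`, i.e. it is observed
    observed = lifetimes <= max_observable
    durations = np.minimum(lifetimes, max_observable)
    # calendar time at which each segment ends
    ends = birthtimes + durations
    return birthtimes, ends, durations, observed
```

The starts and ends can be handed to matplotlib's `Axes.hlines(range(N), starts, ends)`, with a marker at the end of each segment whose death was observed.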

Moar examples?

There are some IPython notebook files in the repo (Tutorial and Examples.ipynb and Documentation.ipynb); you can also view them online.

Basqiat/lifelines

Folders and files

Latest commit

History

Repository files navigation

lifelines

(Quick) Intro to lifelines and survival analysis

Documentation

Enough talk - just show me the examples!

Generating Datasets

Censorship events and estimation

Survival Regression

Plotting

Moar examples?

About

Resources

License

Stars

Watchers

Forks