Exercise 2: Confidence intervals and bands for the survival function




Theory

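The Kaplan-Meier [KM] estimator of the survival function is \[ \widehat{S}(t)=\prod_{\{i: t_i\leq t\}}\biggl[1-\frac{\Delta\overline{N}(t_i)}{\overline{Y}(t_i)}\biggr], \] where \(t_i\) are the observed times, \(\overline{N}(t)\) is the number of events observed up to time \(t\), \(\Delta\overline{N}(t)=\overline{N}(t)-\overline{N}(t-)\) is the number of events at time \(t\), and \(\overline{Y}(t)\) is the number of subjects at risk at time \(t\) (censoring times contribute a factor of 1 to the product). A uniformly consistent estimator of the asymptotic variance of \(\sqrt{n}[\widehat{S}(t)-S(t)]\) is \[ \widehat{V}(t)=\widehat{S}^2(t)\widehat{\sigma}(t)= \widehat{S}^2(t)\int_0^t \frac{n\,d\overline{N}(u)}{\overline{Y}(u)[\overline{Y}(u)-\Delta\overline{N}(u)]} \] (the Greenwood formula).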
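To see the product-limit formula and the Greenwood quantity \(\widehat{\sigma}(t)\) at work, here is a minimal sketch that computes both by hand on a small, made-up censored sample and compares the result with survfit from the survival package. The vectors obs_time and status are hypothetical data; the comparison with fit$std.err assumes that survfit reports the standard error of \(-\log\widehat{S}(t)\), which for the KM estimator is \(\sqrt{\widehat{\sigma}(t)/n}\).

```r
library(survival)

# Hypothetical toy sample: status 1 = event, 0 = censored
obs_time <- c(2, 3, 3, 5, 6, 8, 9)
status   <- c(1, 1, 0, 1, 1, 1, 0)
n <- length(obs_time)

# Distinct observed times t_i with Ybar(t_i) (number at risk) and Delta Nbar(t_i) (events)
t_i <- sort(unique(obs_time))
Y   <- sapply(t_i, function(s) sum(obs_time >= s))
dN  <- sapply(t_i, function(s) sum(obs_time == s & status == 1))

S_hat     <- cumprod(1 - dN / Y)              # Kaplan-Meier estimate at each t_i
sigma_hat <- n * cumsum(dN / (Y * (Y - dN)))  # sigma-hat(t_i); finite here because Y > dN
se_S      <- S_hat * sqrt(sigma_hat / n)      # sqrt(V-hat(t_i) / n), Greenwood s.e. of S-hat

fit <- survfit(Surv(obs_time, status) ~ 1)
print(cbind(t_i, Y, dN, S_hat, se_S, survfit_surv = fit$surv))
```

The censored observation at time 3 stays in the risk set at that time, which is why \(\overline{Y}(3)=6\) although only one event occurs there.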

An asymptotic pointwise \(100(1-\alpha)\)% confidence interval for \(S(t)\) at a fixed \(t\) is \[ \biggl( \widehat{S}(t)\Bigl[1-u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr],\, \widehat{S}(t)\Bigl[1+u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr] \biggr), \] where \(u_{1-\alpha/2}\) is the \((1-\alpha/2)\)-quantile of the standard normal distribution.

Asymptotic simultaneous \(100(1-\alpha)\)% Hall-Wellner confidence bands for \(S(t)\) over \(t\in\langle 0,\tau\rangle\) (for some pre-specified \(\tau\)) are \[ \biggl( \widehat{S}(t)\Bigl\{1-k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\},\, \widehat{S}(t)\Bigl\{1+k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\} \biggr), \] where \(\widehat{K}(t)=\widehat{\sigma}(t)/[1+\widehat{\sigma}(t)]\) and \(k_{1-\alpha}(t)\), \(t\in(0,1\rangle\), satisfies the equation \[ \text{P}\bigl[\sup_{u\in\langle 0,t\rangle}|B(u)|>k_{1-\alpha}(t)\bigr]=\alpha, \] where \(B\) is a Brownian bridge on \(\langle 0,1\rangle\).
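The critical value \(k_{1-\alpha}(t)\) is usually taken from tables. As a rough alternative, the sketch below approximates it by Monte Carlo: it simulates Brownian bridges \(B(u)=W(u)-uW(1)\) on a grid and takes the \((1-\alpha)\)-quantile of \(\sup_{u\le t}|B(u)|\). For \(t=1\) this is the Kolmogorov distribution, whose 95% quantile is about 1.358; the grid makes the simulated supremum slightly too small. The helper names hw_crit and km_limits, the aml example data, and \(\tau=48\) are illustrative choices, not functions or settings from any package.

```r
library(survival)

# Monte Carlo approximation of k_{1-alpha}(a): the (1 - alpha)-quantile of
# sup_{0 <= u <= a} |B(u)| for a Brownian bridge B simulated on a grid of [0, 1]
hw_crit <- function(a, alpha = 0.05, n_grid = 1000, n_sim = 5000) {
  u <- seq_len(n_grid) / n_grid
  sups <- vapply(seq_len(n_sim), function(i) {
    w <- cumsum(rnorm(n_grid, sd = sqrt(1 / n_grid)))  # Brownian motion W(u) on the grid
    b <- w - u * w[n_grid]                             # bridge B(u) = W(u) - u W(1)
    max(abs(b[u <= a]))                                # assumes a >= 1 / n_grid
  }, numeric(1))
  unname(quantile(sups, 1 - alpha))
}

# Pointwise limits and Hall-Wellner band on <0, tau> for a one-sample survfit object
# (limits are not truncated to [0, 1]; assumes Y > dN at every event time)
km_limits <- function(fit, tau, alpha = 0.05) {
  n <- fit$n
  keep <- fit$time <= tau
  S <- fit$surv[keep]
  sigma <- n * cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event)))[keep]
  u <- qnorm(1 - alpha / 2)
  K_tau <- sigma[length(sigma)] / (1 + sigma[length(sigma)])  # K-hat(tau)
  k <- hw_crit(K_tau, alpha)
  data.frame(time = fit$time[keep], surv = S,
             pw_lower = S * (1 - u * sqrt(sigma / n)),
             pw_upper = S * (1 + u * sqrt(sigma / n)),
             hw_lower = S * (1 - k * (1 + sigma) / sqrt(n)),
             hw_upper = S * (1 + k * (1 + sigma) / sqrt(n)))
}

set.seed(1)
fit <- survfit(Surv(time, status) ~ 1, data = aml)
print(head(km_limits(fit, tau = 48)))
```

Because the band uses one critical value for the whole interval, its half-width grows with \(1+\widehat{\sigma}(t)\) rather than with \(\sqrt{\widehat{\sigma}(t)}\), so it does not collapse to zero near \(t=0\) as the pointwise interval does.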

There are also variants of the pointwise confidence intervals and of the confidence bands for \(S(t)\) based on transformations of \(\widehat{S}(t)\) (\(\log\), \(\log(-\log)\), \(\arcsin\), …). Their formulae can be derived by the delta method.
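As an illustration of the delta-method route, here is a sketch of the \(\log(-\log)\) interval, assuming \(0<\widehat{S}(t)<1\) and using \(\widehat{S}^2(t)\widehat{\sigma}(t)/n\) from the Greenwood formula above as the variance estimate of \(\widehat{S}(t)\). Software implementations may differ in details.

```latex
\begin{align*}
g(s) &= \log(-\log s), \qquad g'(s) = \frac{1}{s\log s},\\
\widehat{\operatorname{var}}\bigl[g(\widehat{S}(t))\bigr]
  &\approx \bigl[g'(\widehat{S}(t))\bigr]^2\,\frac{\widehat{S}^2(t)\,\widehat{\sigma}(t)}{n}
   = \frac{\widehat{\sigma}(t)}{n\,\log^2\widehat{S}(t)},\\
\theta(t) &= \frac{u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}}{\bigl|\log\widehat{S}(t)\bigr|},\\
\bigl(S_L(t),\,S_U(t)\bigr) &= \Bigl(\widehat{S}(t)^{\exp\{\theta(t)\}},\ \widehat{S}(t)^{\exp\{-\theta(t)\}}\Bigr).
\end{align*}
```

The last line follows by applying \(S=\exp\{-e^{g}\}\), which is decreasing in \(g\), to the interval \(g(\widehat{S}(t))\pm\theta(t)\). Unlike the untransformed interval, the result always lies inside \((0,1)\).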

Implementation in R

Pointwise confidence intervals

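library(survival)
# dn: data frame with follow-up time x and event indicator delta (1 = event, 0 = censored);
# grp: a grouping variable in the same data frame
fit <- survfit(Surv(x, delta) ~ 1, data = dn, conf.type = "plain", conf.int = 0.9)
fit2 <- survfit(Surv(x, delta) ~ grp, data = dn, conf.type = "plain", conf.int = 0.9)
cbind(fit$time, fit$lower, fit$upper)         # 90% pointwise limits at each observed time
summary(fit)
plot(fit2[1], conf.int = TRUE)                # first group with its pointwise intervals
lines(fit2[2], conf.int = TRUE, col = "red")  # second group added to the same plot in red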

The function survfit in the library survival calculates pointwise confidence intervals. The argument conf.type specifies the transformation (conf.type="plain" means no transformation; the default is "log"), and the argument conf.int specifies the confidence level (default 0.95).
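A minimal sketch that recomputes the conf.type = "plain" limits from the pointwise formula in the Theory section and compares them with the lower and upper components returned by survfit. The aml data set shipped with survival is used only as example data, and the limits are clipped to [0, 1], which is what survfit appears to do for plain intervals.

```r
library(survival)

# aml (shipped with the survival package) serves only as example data
fit <- survfit(Surv(time, status) ~ 1, data = aml, conf.type = "plain", conf.int = 0.9)
z <- qnorm(1 - (1 - 0.9) / 2)  # u_{1-alpha/2} for a 90% interval

# Greenwood: sqrt(sigma-hat(t)/n) = sqrt(sum over t_i <= t of dN / (Y (Y - dN)))
se_log <- sqrt(cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event))))

# Plain interval S-hat(t) [1 -/+ z sqrt(sigma-hat(t)/n)], clipped to [0, 1]
lower <- pmax(fit$surv * (1 - z * se_log), 0)
upper <- pmin(fit$surv * (1 + z * se_log), 1)

print(cbind(time = fit$time, lower, survfit_lower = fit$lower,
            upper, survfit_upper = fit$upper))
```

Late in follow-up the unclipped lower limit becomes negative, which illustrates why transformed intervals are often preferred.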

The confidence intervals are stored in the output object of the function survfit as the components lower and upper. The contents of survfit objects can also be displayed by the function summary.
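A short sketch of reading the limits, again on the aml example data from survival: the components lower and upper are aligned with fit$time and fit$surv, and summary with its times argument reports the estimate and limits at chosen time points, carrying values forward from the last observed time.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ 1, data = aml, conf.type = "plain")

# lower and upper are aligned with time and surv
print(head(cbind(time = fit$time, surv = fit$surv, lower = fit$lower, upper = fit$upper)))

# Estimate and limits at chosen times (step function evaluated at each requested time)
s <- summary(fit, times = c(10, 20, 30))
print(data.frame(time = s$time, surv = s$surv, lower = s$lower, upper = s$upper))
```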

The function plot called on a survfit object draws the Kaplan-Meier curve together with the confidence limits stored in the object. The logical argument conf.int determines whether the confidence limits are drawn; by default they are drawn only when the object contains a single curve.

Simultaneous confidence bands

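library(OIsurv)
# Hall-Wellner band up to t = 240; lines() adds it to an existing plot, e.g., one from plot(fit)
out <- confBands(Surv(dn$x, dn$delta), confType = "plain", confLevel = 0.95, type = "hall", tU = 240)
lines(out, lty = 3, col = "blue")

library(km.ci)
fit <- survfit(Surv(x, delta) ~ 1, data = dn, conf.type = "plain", conf.int = 0.95)
# km.ci returns a survfit object whose lower/upper components hold the band (tl, tu: time range)
out <- km.ci(fit, conf.level = 0.95, tl = 0.03, tu = 240, method = "hall-wellner")
summary(out)
plot(out)                                           # new plot: KM curve with the band
lines(out, conf.int = TRUE, lty = 3, col = "blue")  # or: add the band to an existing plot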

There are two different R libraries that can calculate Hall-Wellner simultaneous confidence bands.

library(OIsurv) includes a function called confBands, which requires a Surv object as the input and returns a list of three vectors (time, lower, upper). There is a lines method for confBands objects but no plot method, so the band has to be added to an existing plot, e.g., one created by plot(fit).

library(km.ci) includes a function called km.ci, which requires a survfit object as the input and returns another survfit object with recalculated lower and upper components. The output can therefore be processed by any function that accepts survfit objects, e.g., plot, summary, or lines. Note that lines draws the confidence limits of a survfit object only when conf.int = TRUE.

Task 1

Download the dataset km_all.RData.

The data frame inside is called all. It includes 101 observations and three variables. The observations are acute lymphatic leukemia [ALL] patients who had undergone a bone marrow transplant. The variable time contains the time (in months) from transplantation to death, relapse, or the end of follow-up, whichever occurred first. The outcome of interest is the time to death or relapse of ALL (relapse-free survival). The variable delta is the event indicator (1 = death or relapse, 0 = censoring). The variable type distinguishes two types of transplant (1 = allogeneic, 2 = autologous).

Calculate and plot the Kaplan-Meier estimate, 95% pointwise confidence intervals, and 95% Hall-Wellner confidence bands for all patients together, for patients with allogeneic transplants, and for patients with autologous transplants.

Task 2

Generate \(n=50\) right-censored observations as follows. The survival distribution is Weibull with shape parameter \(\alpha=0.7\) and scale parameter \(1/\lambda=2\); its expectation is \(\Gamma(1+1/\alpha)/\lambda=2\Gamma(17/7)\doteq 2.53\). The censoring distribution is exponential with rate \(\mu=0.2\) (the expectation is \(1/\mu=5\)), independent of the survival times.
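A minimal sketch of the data-generating mechanism described above, assuming R's rweibull parameterization, whose survival function is \(\exp\{-(t/b)^a\}\) for shape \(a\) and scale \(b\), so that \(a=\alpha=0.7\) and \(b=1/\lambda=2\). The object names, the seed, and the use of integrate as a numerical check of the stated expectation are illustrative choices, not part of the assignment.

```r
library(survival)

n       <- 50
shape_w <- 0.7  # Weibull shape alpha
scale_w <- 2    # Weibull scale 1/lambda
rate_c  <- 0.2  # rate of the exponential censoring distribution

# Expected survival time Gamma(1 + 1/alpha) / lambda = 2 * Gamma(17/7)
mean_T <- scale_w * gamma(1 + 1 / shape_w)
print(mean_T)

# True survival function S(t) = exp(-(lambda t)^alpha); its integral over (0, Inf) equals E[T]
S_true <- function(t) exp(-(t / scale_w)^shape_w)
print(integrate(S_true, 0, Inf)$value)

set.seed(2024)  # arbitrary seed, for reproducibility only
surv_time <- rweibull(n, shape = shape_w, scale = scale_w)
cens_time <- rexp(n, rate = rate_c)  # generated independently of surv_time
sim <- data.frame(x = pmin(surv_time, cens_time),
                  delta = as.numeric(surv_time <= cens_time))  # 1 = event observed

fit <- survfit(Surv(x, delta) ~ 1, data = sim)
```

S_true can then be drawn over the Kaplan-Meier plot, e.g., with curve(S_true, add = TRUE, col = "red").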

Calculate and plot the Kaplan-Meier estimate of \(S(t)\) together with 95% pointwise confidence intervals and 95% Hall-Wellner confidence bands. Include the true survival function in the plot (in a different color) and a legend explaining which curve is which.

Task 3 (optional)

Conduct a simulation study with data generated as in Task 2. Generate 500 such datasets and estimate the probability that the true survival curve is wholly covered by the 95% pointwise confidence intervals and by the 95% Hall-Wellner confidence bands (restrict the comparison to a reasonable finite time interval).