The Kaplan-Meier [KM] estimator of the survival function is \[ \widehat{S}(t)=\prod_{\{i: t_i\leq t\}}\biggl[1-\frac{\Delta\overline{N}(t_i)}{\overline{Y}(t_i)}\biggr]. \] A uniformly consistent estimator of the variance of \(\sqrt{n}[\widehat{S}(t)-S(t)]\) is \[ \widehat{V}(t)=\widehat{S}^2(t)\widehat{\sigma}(t)= \widehat{S}^2(t)\int_0^t \frac{nd\overline{N}(u)}{\overline{Y}(u)[\overline{Y}(u)-\Delta\overline{N}(u)]} \] (Greenwood formula).
An asymptotic pointwise \(100(1-\alpha)\)% confidence interval for \(S(t)\) at a fixed \(t\) is \[ \biggl( \widehat{S}(t)\Bigl[1-u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr],\, \widehat{S}(t)\Bigl[1+u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr] \biggr). \] Asymptotic simultaneous \(100(1-\alpha)\)% Hall-Wellner confidence bands for \(S(t)\) over \(t\in\langle 0,\tau\rangle\) (for some pre-specified \(\tau\)) are \[ \biggl( \widehat{S}(t)\Bigl\{1-k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\},\, \widehat{S}(t)\Bigl\{1+k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\} \biggr), \] where \(\widehat{K}(t)=\widehat{\sigma}(t)/[1+\widehat{\sigma}(t)]\) and \(k_{1-\alpha}(t)\), \(t\in(0,1\rangle\), satisfies the equation \[ \text{P}\bigl[\sup_{u\in\langle 0,t\rangle}|B(u)|>k_{1-\alpha}(t)\bigr]=\alpha, \] where \(B\) is the Brownian bridge.
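The Kaplan-Meier estimator, the Greenwood formula and the plain pointwise interval can be checked numerically. The following is only a minimal sketch: the ten observations are made up for illustration, and the variable names (ti, Y, dN, S_hat, sigma_hat) are ad hoc choices that mirror the notation above. The hand calculation is compared with the output of survfit, which truncates plain limits to the interval from 0 to 1.

```r
library(survival)

# Toy data, invented for illustration: observed times and event indicators
x     <- c(2, 3, 3, 5, 7, 8, 8, 10, 12, 15)
delta <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
n     <- length(x)

# Distinct event times t_i, numbers at risk Y(t_i), numbers of events dN(t_i)
ti <- sort(unique(x[delta == 1]))
Y  <- sapply(ti, function(s) sum(x >= s))
dN <- sapply(ti, function(s) sum(x == s & delta == 1))

# Kaplan-Meier estimate and sigma-hat from the Greenwood formula
S_hat     <- cumprod(1 - dN / Y)
sigma_hat <- cumsum(n * dN / (Y * (Y - dN)))

# Plain 95% pointwise confidence limits (not truncated)
u     <- qnorm(0.975)
lower <- S_hat * (1 - u * sqrt(sigma_hat / n))
upper <- S_hat * (1 + u * sqrt(sigma_hat / n))

# Reference values from survfit at the same event times
fit <- survfit(Surv(x, delta) ~ 1, conf.type = "plain", conf.int = 0.95)
ref <- summary(fit, times = ti)

# survfit truncates the plain limits to [0, 1], so truncate before comparing
all.equal(S_hat, ref$surv)
all.equal(pmax(lower, 0), ref$lower)
all.equal(pmin(upper, 1), ref$upper)
```

Note that the untruncated plain limits may fall outside the unit interval, which is one motivation for the transformed intervals.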
There are variations of confidence intervals and confidence bands for \(S(t)\) based on various transformations (\(\log\), \(\log(-\log)\), \(\arcsin\), …). Formulae for these intervals can be derived by the delta method.
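As an illustration of the delta method: since the asymptotic variance of \(\log\widehat{S}(t)\) is estimated by \(\widehat{\sigma}(t)/n\), the \(\log\)-transformed interval is \(\widehat{S}(t)\exp\bigl[\pm u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\bigr]\), and the \(\log(-\log)\)-transformed interval is \(\widehat{S}(t)^{\exp[\pm u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}/\log\widehat{S}(t)]}\). The sketch below checks both against survfit on made-up toy data. It relies on the std.err component of a survfit object, which holds the standard error of the cumulative hazard, that is, of \(\log\widehat{S}(t)\); the remaining variable names are ad hoc.

```r
library(survival)

# Toy data, invented for illustration
x     <- c(2, 3, 3, 5, 7, 8, 8, 10, 12, 15)
delta <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
u     <- qnorm(0.975)

# log transformation
fit_log <- survfit(Surv(x, delta) ~ 1, conf.type = "log", conf.int = 0.95)
S  <- fit_log$surv
se <- fit_log$std.err              # estimated SE of log S-hat, sqrt(sigma-hat / n)
lower_log <- S * exp(-u * se)
upper_log <- pmin(S * exp(u * se), 1)   # survfit truncates the upper limit at 1

# log(-log) transformation; log(S) < 0, so the signs swap
fit_ll   <- survfit(Surv(x, delta) ~ 1, conf.type = "log-log", conf.int = 0.95)
lower_ll <- S^exp(-u * se / log(S))
upper_ll <- S^exp( u * se / log(S))

all.equal(lower_log, fit_log$lower)
all.equal(upper_log, fit_log$upper)
all.equal(lower_ll,  fit_ll$lower)
all.equal(upper_ll,  fit_ll$upper)
```

The \(\log(-\log)\) limits always stay inside the unit interval, so no truncation is needed there.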
library(survival)
fit <- survfit(Surv(x,delta)~1,data=dn,conf.type="plain",conf.int=0.9)
fit2 <- survfit(Surv(x,delta)~grp,data=dn,conf.type="plain",conf.int=0.9)
cbind(fit$time,fit$lower,fit$upper)
summary(fit)
plot(fit2[1],conf.int=TRUE)
lines(fit2[2],conf.int=TRUE,col="red")
The function survfit in the library survival calculates pointwise confidence intervals. The argument conf.type specifies the transformation (conf.type="plain" means no transformation; the default is "log"), and the argument conf.int specifies the coverage probability (default 0.95). Confidence intervals are stored in the components upper and lower of the object returned by survfit. The contents of survfit objects can also be displayed by the function summary.
The function plot called on survfit objects plots the confidence intervals included in the input object. The logical argument conf.int determines whether or not confidence intervals are plotted.
library(OIsurv)
out <- confBands(Surv(dn$x,dn$delta),confType="plain",confLevel=0.95,type="hall",tU=240)
lines(out,lty=3,col="blue")
library(km.ci)
fit <- survfit(Surv(x,delta)~1,data=dn,conf.type="plain",conf.int=0.95)
out <- km.ci(fit,conf.level=0.95,tl=0.03,tu=240,method="hall-wellner")
summary(out)
lines(out,lty=3,col="blue")
plot(out)
There are two different R libraries that can calculate Hall-Wellner simultaneous confidence bands.
The library OIsurv includes a function called confBands, which requires a survival object (created by Surv) as the input and returns a list of three vectors (time, lower, upper). There is a lines method for confBands objects, but no plot method.
The library km.ci includes a function called km.ci, which requires a survfit object as the input and returns another survfit object with recalculated lower and upper components. The output can be processed by any function that accepts survfit objects, e.g., plot, summary, lines.
Download the dataset km_all.RData.
The dataframe inside is called all. It includes 101 observations and three variables. The observations are acute lymphatic leukemia [ALL] patients who had undergone bone marrow transplant. The variable time contains time (in months) from transplantation to either death/relapse or end of follow-up, whichever occurred first. The outcome of interest is time to death or relapse of ALL (relapse-free survival). The variable delta contains the event indicator (1 = death or relapse, 0 = censoring). The variable type distinguishes two different types of transplant (1 = allogeneic, 2 = autologous).
Calculate and plot the Kaplan-Meier estimate, 95% pointwise confidence intervals and 95% Hall-Wellner confidence bands for all patients together, for patients with allogeneic transplants, and for patients with autologous transplants.
Generate \(n=50\) censored observations as follows: the survival distribution is Weibull with shape parameter \(\alpha=0.7\) and scale parameter \(1/\lambda=2\). Its expectation is \(\Gamma(1+1/\alpha)/\lambda=2\Gamma(17/7)\doteq 2.53\). The censoring distribution is exponential with rate \(\lambda=0.2\) (the expectation is \(1/\lambda=5\)), independent of survival.
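The data-generating mechanism can be sketched as follows. This is only one possible way to set it up: the seed and the names t_surv, t_cens, dn and S_true are arbitrary choices, not part of the assignment. The function rweibull uses the shape/scale parametrization, so scale = 2 corresponds to \(1/\lambda=2\) above, whereas rexp is parametrized by the rate.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible
n <- 50

# Latent survival and censoring times, generated independently
t_surv <- rweibull(n, shape = 0.7, scale = 2)
t_cens <- rexp(n, rate = 0.2)

# Observed time and event indicator (1 = event, 0 = censored)
dn <- data.frame(
  x     = pmin(t_surv, t_cens),
  delta = as.numeric(t_surv <= t_cens)
)

# True survival function S(t) = exp(-(t/2)^0.7), useful for plotting
S_true <- function(t) pweibull(t, shape = 0.7, scale = 2, lower.tail = FALSE)

# Theoretical expectations of the survival and censoring distributions
mean_surv <- 2 * gamma(1 + 1 / 0.7)
mean_cens <- 1 / 0.2
```

The data frame dn has the same column names as in the survfit calls shown earlier, so those calls can be applied to it directly.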
Calculate and plot the Kaplan-Meier estimator of \(S(t)\) together with 95% pointwise confidence intervals and 95% Hall-Wellner confidence bands. Include the true survival function in the plot (use a different color). Include a legend explaining which curve is which.
Conduct a simulation study with data generated as in Task 2. Generate 500 such datasets and estimate the probability that the true survival curve is wholly covered by the 95% pointwise confidence intervals and by the 95% Hall-Wellner confidence bands (restrict the task to a reasonable finite interval).
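A possible skeleton for the pointwise part of the simulation is sketched below. Several things here are assumptions rather than part of the assignment: the seed, the evaluation interval from 0.5 to 4, and the approximation of "wholly covered" by checking coverage on a fine grid. The Hall-Wellner part would follow the same pattern with the limits taken from km.ci or confBands.

```r
library(survival)

set.seed(2024)                        # arbitrary seed
n_sim <- 500
n     <- 50
grid  <- seq(0.5, 4, by = 0.01)       # assumed evaluation interval
S_true <- pweibull(grid, shape = 0.7, scale = 2, lower.tail = FALSE)

covered <- logical(n_sim)
for (b in seq_len(n_sim)) {
  # One censored sample generated as in Task 2
  t_surv <- rweibull(n, shape = 0.7, scale = 2)
  t_cens <- rexp(n, rate = 0.2)
  x      <- pmin(t_surv, t_cens)
  delta  <- as.numeric(t_surv <= t_cens)

  fit <- survfit(Surv(x, delta) ~ 1, conf.type = "plain", conf.int = 0.95)

  # Right-continuous step functions for the limits (equal to 1 before the
  # first observed time)
  lo <- stepfun(fit$time, c(1, fit$lower))
  up <- stepfun(fit$time, c(1, fit$upper))

  # Missing limits (possible once the estimate reaches 0) count as non-coverage
  covered[b] <- isTRUE(all(lo(grid) <= S_true & S_true <= up(grid)))
}

coverage <- mean(covered)   # estimated simultaneous coverage of pointwise CIs
coverage
```

The interval starts above zero because the limits are degenerate (both equal to 1) before the first observed time, so the true curve cannot be covered there.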