The Kaplan-Meier [KM] estimator of the survival function is \[ \widehat{S}(t)=\prod_{\{i: t_i\leq t\}}\biggl[1-\frac{\Delta\overline{N}(t_i)}{\overline{Y}(t_i)}\biggr]. \] A uniformly consistent estimator of the variance of \(\sqrt{n}[\widehat{S}(t)-S(t)]\) is \[ \widehat{V}(t)=\widehat{S}^2(t)\widehat{\sigma}(t)= \widehat{S}^2(t)\int_0^t \frac{nd\overline{N}(u)}{\overline{Y}(u)[\overline{Y}(u)-\Delta\overline{N}(u)]} \] (Greenwood formula).
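The two formulas above can be traced by hand on a small sample. The following is a minimal sketch in base R, assuming a made-up toy dataset (the values of `x` and `delta` are hypothetical and only serve as an illustration); for step-function counting processes the Greenwood integral reduces to a sum over the distinct event times.

```r
# Toy right-censored sample (hypothetical values):
# x = observed time, delta = 1 for an event, 0 for censoring
x     <- c(2, 3, 3, 5, 6, 8, 9, 12)
delta <- c(1, 1, 0, 1, 0, 1, 1, 0)
n     <- length(x)

ti <- sort(unique(x[delta == 1]))                        # distinct event times t_i
dN <- sapply(ti, function(t) sum(x == t & delta == 1))   # number of events at t_i
Y  <- sapply(ti, function(t) sum(x >= t))                # number at risk just before t_i

S_hat     <- cumprod(1 - dN / Y)               # Kaplan-Meier estimate at t_i
sigma_hat <- cumsum(n * dN / (Y * (Y - dN)))   # Greenwood sum, sigma-hat(t_i)
V_hat     <- S_hat^2 * sigma_hat               # estimated variance of sqrt(n)[S-hat(t) - S(t)]

km <- data.frame(time = ti, at_risk = Y, events = dN, S_hat, sigma_hat, V_hat)
print(round(km, 4))
```

The Greenwood term is undefined when everyone at risk fails at the same time (\(\overline{Y}=\Delta\overline{N}\)); the toy data avoid this case.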

An asymptotic pointwise \(100(1-\alpha)\)% confidence interval for \(S(t)\) at a fixed \(t\) is \[
\biggl(
\widehat{S}(t)\Bigl[1-u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr],\,
\widehat{S}(t)\Bigl[1+u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr]
\biggr).
\] Asymptotic simultaneous \(100(1-\alpha)\)% *Hall-Wellner confidence bands* for \(S(t)\) over \(t\in\langle 0,\tau\rangle\) (for some pre-specified \(\tau\)) are \[
\biggl(
\widehat{S}(t)\Bigl\{1-k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\},\,
\widehat{S}(t)\Bigl\{1+k_{1-\alpha}(\widehat{K}(\tau))\bigl[1+\widehat{\sigma}(t)\bigr]/\sqrt{n}\Bigr\}
\biggr),
\] where \(\widehat{K}(t)=\widehat{\sigma}(t)/[1+\widehat{\sigma}(t)]\) and \(k_{1-\alpha}(t)\), \(t\in(0,1\rangle\), satisfies the equation \[
\text{P}\bigl[\sup_{u\in\langle 0,t\rangle}|B(u)|>k_{1-\alpha}(t)\bigr]=\alpha,
\] where \(B\) is the Brownian bridge.
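The critical value \(k_{1-\alpha}(t)\) has no simple closed form, but it can be approximated by simulation. The sketch below is one possible Monte Carlo approach, not the method used by any particular package: it simulates the Brownian bridge on a grid and takes an empirical quantile of the supremum. The function names `hw_crit` and `hw_band`, the grid size, the number of replications, the input values, and the clipping of the band to \([0,1]\) are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Monte Carlo approximation of k_{1-alpha}(a), the (1 - alpha)-quantile of
# sup_{u in [0, a]} |B(u)|, with the Brownian bridge simulated on a grid
hw_crit <- function(a, alpha = 0.05, n_sim = 4000, n_grid = 1000) {
  u <- (1:n_grid) / n_grid
  keep <- u <= a
  sups <- replicate(n_sim, {
    W <- cumsum(rnorm(n_grid, sd = sqrt(1 / n_grid)))  # Brownian motion on the grid
    B <- W - u * W[n_grid]                             # bridge: B(u) = W(u) - u W(1)
    max(abs(B[keep]))
  })
  unname(quantile(sups, 1 - alpha))
}

# Band limits from the displayed formula; clipping to [0, 1] is an added convenience
hw_band <- function(S_hat, sigma_hat, n, k) {
  half <- k * (1 + sigma_hat) / sqrt(n)
  cbind(lower = pmax(S_hat * (1 - half), 0),
        upper = pmin(S_hat * (1 + half), 1))
}

# Hypothetical inputs: estimates at three event times from a sample of size n = 8
S_hat     <- c(0.875, 0.75, 0.6)
sigma_hat <- c(1/7, 1/3, 11/15)
K_tau     <- sigma_hat[3] / (1 + sigma_hat[3])   # K-hat(tau), tau = last time used
k         <- hw_crit(K_tau)
band      <- hw_band(S_hat, sigma_hat, n = 8, k = k)
print(band)
```

For \(t=1\) the simulated value should come out close to the 0.95 quantile of the Kolmogorov distribution (about 1.36), typically slightly lower because the supremum is taken over a finite grid.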

There are variations of confidence intervals and confidence bands for \(S(t)\) based on various transformations (\(\log\), \(\log(-\log)\), \(\arcsin\), …). Formulae for these intervals can be derived by the delta method.
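As a sketch of how the delta method produces such intervals (the symbols \(g\) and \(c(t)\) are notation introduced here, and the approximation \(\text{var}\,\widehat{S}(t)\approx\widehat{S}^2(t)\widehat{\sigma}(t)/n\) is taken from the Greenwood formula above): for a smooth transformation \(g\), \[ \text{var}\,g\bigl(\widehat{S}(t)\bigr)\approx\bigl[g'\bigl(\widehat{S}(t)\bigr)\bigr]^2\,\widehat{S}^2(t)\,\widehat{\sigma}(t)/n. \] With \(g(s)=\log s\) we have \(g'(s)=1/s\), so the variance of \(\log\widehat{S}(t)\) is approximately \(\widehat{\sigma}(t)/n\), and back-transforming the symmetric interval for \(\log S(t)\) gives \[
\biggl(
\widehat{S}(t)\exp\Bigl\{-u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr\},\,
\widehat{S}(t)\exp\Bigl\{u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}\Bigr\}
\biggr).
\] With \(g(s)=\log(-\log s)\) we have \(g'(s)=1/(s\log s)\), so the variance of \(\log[-\log\widehat{S}(t)]\) is approximately \(\widehat{\sigma}(t)/[n\log^2\widehat{S}(t)]\), and the back-transformed interval is \[
\Bigl(
\widehat{S}(t)^{\exp\{c(t)\}},\,
\widehat{S}(t)^{\exp\{-c(t)\}}
\Bigr),
\qquad
c(t)=\frac{u_{1-\alpha/2}\sqrt{\widehat{\sigma}(t)/n}}{\bigl|\log\widehat{S}(t)\bigr|}.
\] Unlike the untransformed interval, the \(\log(-\log)\) interval always stays inside \((0,1)\).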

```
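library(survival)
# dn: data frame with follow-up time x, event indicator delta and grouping variable grp
fit <- survfit(Surv(x, delta) ~ 1, data = dn, conf.type = "plain", conf.int = 0.9)
fit2 <- survfit(Surv(x, delta) ~ grp, data = dn, conf.type = "plain", conf.int = 0.9)
cbind(fit$time, fit$lower, fit$upper)          # times with lower and upper confidence limits
summary(fit)
plot(fit2[1], conf.int = TRUE)                 # first group
lines(fit2[2], conf.int = TRUE, col = "red")   # second group added to the same plot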
```

The function `survfit` in the package `survival` calculates pointwise confidence intervals. The argument `conf.type` specifies the transformation (`conf.type="plain"` means no transformation), and the argument `conf.int` specifies the coverage probability (default 0.95).

Confidence intervals are stored in the output object of the function `survfit`; the components are called `upper` and `lower`. The contents of `survfit` objects can also be displayed by the function `summary`.

The function `plot` called on `survfit` objects plots the confidence intervals included in the input object. The logical argument `conf.int` determines whether or not confidence intervals are plotted.
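To see the effect of `conf.type` on actual numbers, the sketch below fits the same toy data twice and compares the limits. The data frame `dn` and its values are hypothetical; only `survfit`, `Surv`, and the components `time`, `surv`, `lower`, and `upper` come from the `survival` package.

```r
library(survival)

# Hypothetical toy data; the names x and delta mirror the placeholders used above
dn <- data.frame(
  x     = c(2, 3, 3, 5, 6, 8, 9, 12),
  delta = c(1, 1, 0, 1, 0, 1, 1, 0)
)

fit_plain  <- survfit(Surv(x, delta) ~ 1, data = dn,
                      conf.type = "plain", conf.int = 0.95)
fit_loglog <- survfit(Surv(x, delta) ~ 1, data = dn,
                      conf.type = "log-log", conf.int = 0.95)

# Limits of both interval types at every observed time
cbind(time         = fit_plain$time,
      surv         = fit_plain$surv,
      plain_lower  = fit_plain$lower,
      plain_upper  = fit_plain$upper,
      loglog_lower = fit_loglog$lower,
      loglog_upper = fit_loglog$upper)

plot(fit_plain, conf.int = TRUE)                           # untransformed intervals
lines(fit_loglog, conf.int = TRUE, col = "red", lty = 2)   # log(-log) intervals
```

Both fits share the same point estimate; only the `lower` and `upper` components differ.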

```
library(OIsurv)
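# tU is the upper time limit of the band
out <- confBands(Surv(dn$x,dn$delta),confType="plain",confLevel=0.95,type="hall",tU=240)
# lines() adds the band to an already existing plot of the Kaplan-Meier estimate
lines(out,lty=3,col="blue")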
```

```
library(km.ci)
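fit <- survfit(Surv(x,delta)~1,data=dn,conf.type="plain",conf.int=0.95)
# tl and tu are the lower and upper time limits of the band
out <- km.ci(fit,conf.level=0.95,tl=0.03,tu=240,method="hall-wellner")
summary(out)
lines(out,lty=3,col="blue")   # add the band to an already existing plot
plot(out)                     # or draw a new plot of the estimate with the band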
```

There are two different R packages that can calculate Hall-Wellner simultaneous confidence bands.

The package `OIsurv` includes a function called `confBands`, which requires a survival object (created by `Surv`) as the input and returns a list of three vectors (`time`, `lower`, `upper`). There is a `lines` method for adding a `confBands` object to a plot, but no `plot` method.

The package `km.ci` includes a function called `km.ci`, which requires a `survfit` object as the input and returns another `survfit` object with recalculated `lower` and `upper` components. The output can be processed by any function that accepts `survfit` objects, e.g., `plot`, `summary`, `lines`.

Download the dataset km_all.RData.

The dataframe inside is called `all`. It includes 101 observations and three variables. The observations are acute lymphatic leukemia [ALL] patients who had undergone bone marrow transplant. The variable `time` contains the time (in months) from transplantation to either death/relapse or end of follow-up, whichever occurred first. The outcome of interest is time to death or relapse of ALL (relapse-free survival). The variable `delta` contains the event indicator (1 = death or relapse, 0 = censoring). The variable `type` distinguishes two different types of transplant (1 = allogeneic, 2 = autologous).

Calculate and plot the Kaplan-Meier estimate, 95% pointwise confidence intervals and 95% Hall-Wellner confidence bands for all patients together, for patients with allogeneic transplants, and for patients with autologous transplants.

Generate \(n=50\) censored observations as follows: the survival distribution is Weibull with shape parameter \(\alpha=0.7\) and scale parameter \(1/\lambda=2\). Its expectation is \(\Gamma(1+1/\alpha)/\lambda=2\Gamma(17/7)\doteq 2.53\). The censoring distribution is exponential with rate \(\lambda_C=0.2\) (the expectation is \(1/\lambda_C=5\)), independent of survival.
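The data-generating mechanism described above can be sketched as follows. The sketch assumes that the Weibull parametrization in the text matches that of R's `rweibull`, i.e. \(S(t)=\exp\{-(t/2)^{0.7}\}\), which agrees with the stated expectation; the seed and the names `surv_time`, `cens_time`, `dn`, and `S_true` are arbitrary choices made here.

```r
set.seed(2024)   # arbitrary seed, only for reproducibility
n <- 50

surv_time <- rweibull(n, shape = 0.7, scale = 2)   # latent survival times
cens_time <- rexp(n, rate = 0.2)                   # independent censoring times

x     <- pmin(surv_time, cens_time)                # observed time
delta <- as.integer(surv_time <= cens_time)        # 1 = event observed, 0 = censored
dn    <- data.frame(x, delta)

# True survival function under the assumed parametrization
S_true <- function(t) pweibull(t, shape = 0.7, scale = 2, lower.tail = FALSE)

2 * gamma(1 + 1 / 0.7)   # theoretical mean survival time
mean(delta)              # observed proportion of uncensored observations
```

The function `S_true` can be evaluated on a grid of time points, for example to overlay the true curve on a plot.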

Calculate and plot the Kaplan-Meier estimator of \(S(t)\) together with 95% pointwise confidence intervals and 95% Hall-Wellner confidence bands. Include the true survival function in the plot (use a different color). Include a legend explaining which curve is which.

Conduct a simulation study with data generated as specified in Task 2. Generate 500 such datasets and estimate, separately for the 95% pointwise confidence intervals and for the 95% Hall-Wellner confidence bands, the probability that the true survival curve is wholly covered (restrict the task to a reasonable finite interval).