We observe n independent triplets (Xi,δi,{Zi(t),t>0}),i=1,…,n, where
In this exercise we will assume that at least one of our given covariates varies in time.
Conditioning by stochastic process needs to be performed carefully. We define the model through hazard function \lambda (intensity of observing an event) so that value of \lambda at time t depends on value of \mathcal{Z} process at time t, i.e. \mathbf{Z}(t): \lambda(t\mid \mathcal{Z}) \equiv \lim \limits_{h\to 0_{+}} \dfrac{1}{h} \mathsf{P} \left[ t \leq T < t+h \mid T\geq t, \mathbf{Z}(t) \right].
In the Cox model with time-varying covariates we assume: \lambda\left(t\mid \mathcal{Z}\right) = \lambda_0(t) \exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}(t) \right\}, where:
Usual structures of covariates:
coxph(Surv(time,delta) ~ I(age + time), data)
is not the solution!Proportinal hazard property of the Cox model is now violated:
\dfrac{\lambda(t\mid \mathcal{Z}_1)}{\lambda(t\mid \mathcal{Z}_2)} = \dfrac{\lambda_0(t)}{\lambda_0(t)} \cdot \dfrac{\exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}_1 (t)\right\}}{\exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}_2 (t)\right\}} = \exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} (\mathbf{Z}_1(t) - \mathbf{Z}_2(t)) \right\} \quad \text{still depends on time } t. Interpretation of \beta_j connected to constant covariate Z_j remains the same: \exp \beta_j = \dfrac{\lambda_0(t)}{\lambda_0(t)} \cdot \dfrac{\exp \left\{\boldsymbol{\beta}^{\mathsf{T}} \left(\mathbf{Z} (t) + \mathbf{e}_j\right)\right\}}{\exp \left\{\boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z} (t) \right\}}. However, coefficient \beta_k connected to time-varying covariate Z_k(t) needs to be interpreted according to the properties of Z_k(t), see blackboard notes (and if interested current age or time transformation) for explanation.
Regardless of time-dependence of covariates (as long as \mathcal{Z} is predictable) the partial likelihood method preserves the same properties as classical maximum likelihood method. Therefore, tests from previous Exercise 7 are still valid.
General cumulative hazard \Lambda(t\mid \mathcal{Z}) and survival function S(t\mid \mathcal{Z}) can be defined and estimated only for constant processes (\mathcal{Z} = \mathbf{Z}). Breslow estimator of baseline cumulative hazard function \Lambda_0(t) \widehat{\Lambda}_0(t) = \int \limits_{0}^{t} \dfrac{ \, \mathrm{d} \overline{N}(u)}{\sum \limits_{i=1}^n Y_i (u) \exp \left\{ \widehat{\boldsymbol{\beta}}^{\mathsf{T}} \mathbf{Z}_i(u) \right\}}, still works and corresponds to the patient with constant zero trajectory of the process \mathcal{Z}. Survival curves for subjects with constant trajectories of covariates can be estimated using \widehat{S}(t \mid \mathcal{Z} = \mathbf{Z}) = \exp \left\{ -\widehat{\Lambda}_0(t) \exp \left\{\widehat{\boldsymbol{\beta}}^{\mathsf{T}} \mathbf{Z} \right\} \right\}.
From September 1967 till March 1974, men with serious heart disease were enrolled into a follow-up study. The follow-up was closed in April 1974. During the follow-up, some of the men underwent transplantation of the heart. The goal of the analysis is to estimate the effect of heart transplant on survival.
This can be done by specifying a Cox regression model with a time-varying covariate indicating whether or not the heart has already been transplanted. At the start of the follow-up, all patients have zero in this covariate. After transplantation, the covariate is switched to 1. The model formula is \lambda(t\mid \mathcal{Z})=\lambda_0(t)\exp\{\beta Z(t)\} where Z(t) is the time-varying indicator of transplantation. The baseline hazard \lambda_0(t) is the risk of death before transplantation. The value \mathrm{e}^\beta expresses the relative change in the risk of death after the transplantation.
The original format of the dataset (called jasa
from
library(survival)
) is this:
Variable name | Description | Values |
---|---|---|
birth.dt | birth date | “yyyy-mm-dd” |
accept.dt | enrollment date | “yyyy-mm-dd” |
tx.date | date of transplantation | “yyyy-mm-dd” or “<NA>” if no transplantation |
fu.date | date of the end of follow-up | “yyyy-mm-dd” |
fustat | status at the end of follow-up | 0 = alive, 1 = dead |
surgery | indicator of bypass surgery before enrollment | 0 = no, 1 = yes |
transplant | indicator of transplantation (at any time during follow-up) | 0 = no, 1 = yes |
library(survival)
print(subset(jasa,select=c(birth.dt:fustat,surgery,transplant))[1:10,])
## birth.dt accept.dt tx.date fu.date fustat surgery transplant
## 1 1937-01-10 1967-11-15 <NA> 1968-01-03 1 0 0
## 2 1916-03-02 1968-01-02 <NA> 1968-01-07 1 0 0
## 3 1913-09-19 1968-01-06 1968-01-06 1968-01-21 1 0 1
## 4 1927-12-23 1968-03-28 1968-05-02 1968-05-05 1 0 1
## 5 1947-07-28 1968-05-10 <NA> 1968-05-27 1 0 0
## 6 1913-11-08 1968-06-13 <NA> 1968-06-15 1 0 0
## 7 1917-08-29 1968-07-12 1968-08-31 1970-05-17 1 0 1
## 8 1923-03-27 1968-08-01 <NA> 1968-09-09 1 0 0
## 9 1921-06-11 1968-08-09 <NA> 1968-11-01 1 0 0
## 10 1926-02-09 1968-08-11 1968-08-22 1968-10-07 1 0 1
In order to be analyzed, this dataset must be transformed into a different format, where the follow-up period is divided into subintervals and the subject’s data are written into several lines, one line for each subinterval. The time-varying covariate is created by changing the value of the covariate between the lines pertaining to the same subject. The main idea is illustrated in blackboard notes.
The transformed dataset (called heart
from
library(survival)
) looks like this:
Variable name | Description | Values |
---|---|---|
id | identification number | row index in jasa |
start | open start of the interval | days from enrollment |
stop | closed end of the interval | days from enrollment |
event | status at the end of follow-up | 0 = alive, 1 = dead |
age | age at enrollment | years - 48 |
year | enrollment time after “1967-10-01” | years |
surgery | indicator of bypass surgery before enrollment | 0 = no, 1 = yes |
transplant | indicator of transplantation (during this time interval) | 0 = no, 1 = yes |
(help(jasa)
says that year = year of acceptance in years
after 1 Nov 1967, however, calculation confirmed it is actually in years
after 1 Oct 1967)
print(subset(heart,select=c(id,start:transplant),id<=10))
## id start stop event age year surgery transplant
## 1 1 0 50 1 -17.1553730 0.1232033 0 0
## 2 2 0 6 1 3.8357290 0.2546201 0 0
## 3 3 0 1 0 6.2970568 0.2655715 0 0
## 4 3 1 16 1 6.2970568 0.2655715 0 1
## 5 4 0 36 0 -7.7371663 0.4900753 0 0
## 6 4 36 39 1 -7.7371663 0.4900753 0 1
## 7 5 0 18 1 -27.2142368 0.6078029 0 0
## 8 6 0 3 1 6.5954825 0.7008898 0 0
## 9 7 0 51 0 2.8692676 0.7802875 0 0
## 10 7 51 675 1 2.8692676 0.7802875 0 1
## 11 8 0 40 1 -2.6502396 0.8350445 0 0
## 12 9 0 85 1 -0.8377823 0.8569473 0 0
## 13 10 0 12 0 -5.4976044 0.8624230 0 0
## 14 10 12 58 1 -5.4976044 0.8624230 0 1
The first subject died 50 days after enrollment without having a transplant.
## id start stop event age year surgery transplant
## 1 1 0 50 1 -17.15537 0.1232033 0 0
Subject #4 lived without transplant for 36 days, was getting a
transplant on day 36, and died tree days later, on day 39. There are two
rows for subject #4; the first has transplant=0
, the second
has transplant=1
. The death of this subject is indicated by
event=1
on the second row. The first row has
event=0
because subject #4 was still alive at the end of
the first interval (day 36).
## id start stop event age year surgery transplant
## 5 4 0 36 0 -7.737166 0.4900753 0 0
## 6 4 36 39 1 -7.737166 0.4900753 0 1
Subject #3 was transplanted on the day of enrollment. In the transformed data, transplantation was moved to day 1 because the intervals cannot have zero width. (Actually, the time of enrollment was shifted one day earlier for all patients.) Another possible solution would be to consider the patient transplanted at the time of enrollment.
## id start stop event age year surgery transplant
## 3 3 0 1 0 6.297057 0.2655715 0 0
## 4 3 1 16 1 6.297057 0.2655715 0 1
If the subject is written over multiple lines, \delta=event
is zero in all
lines except the last (because the subject is observed in subsequent
intervals, it must have survived). The value of event
in
the last line shows the final survival status of the subject
(0=censored, 1=died). The final time of follow-up period X = \min \{T, C\} is also in the last
line in the column stop
.
Investigate the structure of the transformed dataset
heart
and try to reproduce it from the
original dataset jasa
. Can you deal with all obstacles?
Concentrate on the important steps required for successful usage of
coxph
for time-varying covariates. Goal is to prove to me
and to yourself that in the future you will be able to perform this data
preparation on your own.
The survival object used on the left-hand side of the model formula must be adapted to express the interval structure of the data. Therefore, it is written with three arguments:
Surv(start,stop,delta)
Here, start
is the left boundary of the time interval,
stop
is the right boundary, and delta
is the
survival status at the end of this interval (the value of
stop
).
The proportional hazards model is specified as usual, with the
three-argument survival object as the outcome. For example, the model
introduced at the beginning of this assignment would be fitted on the
transformed heart
data by the code
fit <- coxph(Surv(start,stop,event)~transplant,data=heart)
Of course, the time-invariant covariates age
,
year
and surgery
could be also included in the
model.
Estimate and test the effect of heart transplantation on the survival of the patients. Adjust the effect of transplantation for age at enrollment, time of enrollment, and previous bypass surgery. Consider the interactions of heart transplantation with these covariates.
Provide both quantitative (effect size, confidence intervals if possible) and qualitative (does transplantation help to survive longer?) answers.
Provide plot of estimated survival function corresponding to a regular patient with and without transplantation and interpret it.