Time-varying covariates in the Cox model

Theory - the Cox model with time-varying covariates

We observe \(n\) independent triplets \(\left(X_i, \delta_i, \left\{\mathbf{Z}_i(t), t>0\right\}\right), i = 1, \ldots, n\), where

\(X_i = \min \{T_i, C_i\}\) is censored failure time,
\(\delta_i = \mathbb{I}_{(T_i \leq C_i)}\) is failure indicator,
\(\mathbf{Z}_i(t)\) is a vector of covariates that may change over time. In fact, individual covariate \(Z_i(t)\) is a stochastic process on \((0,\infty)\).

In this exercise we will assume that at least one of our given covariates varies in time.

Conditioning by stochastic process needs to be performed carefully. We define the model through hazard function \(\lambda\) (intensity of observing an event) so that value of \(\lambda\) at time \(t\) depends on value of \(\mathcal{Z}\) process at time \(t\), i.e. \(\mathbf{Z}(t)\): \[ \lambda(t\mid \mathcal{Z}) \equiv \lim \limits_{h\to 0_{+}} \dfrac{1}{h} \mathsf{P} \left[ t \leq T < t+h \mid T\geq t, \mathbf{Z}(t) \right]. \]

In the Cox model with time-varying covariates we assume: \[ \lambda\left(t\mid \mathcal{Z}\right) = \lambda_0(t) \exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}(t) \right\}, \] where:

\(\lambda_0(t)\) is the baseline hazard function,
\(\boldsymbol{\beta} \in \mathbb{R}^p\) is unknown vector of regression coefficients.

Usual structures of covariates:

\(Z(t) = const\) - characteristics of the subjects that cannot (or are not allowed to) change,
- gender, genotype, …
\(Z(t) = \mathbb{I}(E \leq t)\) - event indicators (covariate activates the effect only after observing random event at time \(E\)),
- transplantation indicator, surgery indicator, …
- more in blackboard notes
\(Z_i(t) = \sum \limits_{j=1}^{n_i} z_{ij} \mathbb{I}_{[t_{i,j-1}, t_{i,j})} (t)\) - piecewise constant process for gradually updated values,
- blood pressure, cholesterol concentration, cumulative amount of alcohol consumption, …
- more in blackboard notes
\(Z(t) = A + t\) - linear function of time,
- \(A\) age at entry of the study, \(t\) time that has passed from entering the study, \(A+t\) is then the age of subject at time \(t\).
- Makes only sense if in interaction with other covariates (more in NMST532).
- Warning: coxph(Surv(time,delta) ~ I(age + time), data) is not the solution!
- If interested, see this.
Any other suitable transformation of measured data, that can change over time.
- \(Z(t)\) can be some summarization of that covariate from all previous observations.
- Remember, predictability of process \(Z(t)\) is the key property - it is not allowed to depend on information coming from future observations!
- Case \(Z(t) = Z \cdot g(t)\) is, for your interest only, discussed here.

Proportinal hazard property of the Cox model is now violated:

\[ \dfrac{\lambda(t\mid \mathcal{Z}_1)}{\lambda(t\mid \mathcal{Z}_2)} = \dfrac{\lambda_0(t)}{\lambda_0(t)} \cdot \dfrac{\exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}_1 (t)\right\}}{\exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z}_2 (t)\right\}} = \exp \left\{ \boldsymbol{\beta}^{\mathsf{T}} (\mathbf{Z}_1(t) - \mathbf{Z}_2(t)) \right\} \quad \text{still depends on time } t. \] Interpretation of \(\beta_j\) connected to constant covariate \(Z_j\) remains the same: \[ \exp \beta_j = \dfrac{\lambda_0(t)}{\lambda_0(t)} \cdot \dfrac{\exp \left\{\boldsymbol{\beta}^{\mathsf{T}} \left(\mathbf{Z} (t) + \mathbf{e}_j\right)\right\}}{\exp \left\{\boldsymbol{\beta}^{\mathsf{T}} \mathbf{Z} (t) \right\}}. \] However, coefficient \(\beta_k\) connected to time-varying covariate \(Z_k(t)\) needs to be interpreted according to the properties of \(Z_k(t)\), see blackboard notes (and if interested current age or time transformation) for explanation.

Regardless of time-dependence of covariates (as long as \(\mathcal{Z}\) is predictable) the partial likelihood method preserves the same properties as classical maximum likelihood method. Therefore, tests from previous Exercise 7 are still valid.

General cumulative hazard \(\Lambda(t\mid \mathcal{Z})\) and survival function \(S(t\mid \mathcal{Z})\) can be defined and estimated only for constant processes (\(\mathcal{Z} = \mathbf{Z}\)). Breslow estimator of baseline cumulative hazard function \(\Lambda_0(t)\) \[ \widehat{\Lambda}_0(t) = \int \limits_{0}^{t} \dfrac{ \, \mathrm{d} \overline{N}(u)}{\sum \limits_{i=1}^n Y_i (u) \exp \left\{ \widehat{\boldsymbol{\beta}}^{\mathsf{T}} \mathbf{Z}_i(u) \right\}}, \] still works and corresponds to the patient with constant zero trajectory of the process \(\mathcal{Z}\). Survival curves for subjects with constant trajectories of covariates can be estimated using \[ \widehat{S}(t \mid \mathcal{Z} = \mathbf{Z}) = \exp \left\{ -\widehat{\Lambda}_0(t) \exp \left\{\widehat{\boldsymbol{\beta}}^{\mathsf{T}} \mathbf{Z} \right\} \right\}. \]

Stanford heart transplant data

From September 1967 till March 1974, men with serious heart disease were enrolled into a follow-up study. The follow-up was closed in April 1974. During the follow-up, some of the men underwent transplantation of the heart. The goal of the analysis is to estimate the effect of heart transplant on survival.

This can be done by specifying a Cox regression model with a time-varying covariate indicating whether or not the heart has already been transplanted. At the start of the follow-up, all patients have zero in this covariate. After transplantation, the covariate is switched to 1. The model formula is \[ \lambda(t\mid \mathcal{Z})=\lambda_0(t)\exp\{\beta Z(t)\} \] where \(Z(t)\) is the time-varying indicator of transplantation. The baseline hazard \(\lambda_0(t)\) is the risk of death before transplantation. The value \(\mathrm{e}^\beta\) expresses the relative change in the risk of death after the transplantation.

The original format of the dataset (called jasa from library(survival)) is this:

Variable name	Description	Values
birth.dt	birth date	“yyyy-mm-dd”
accept.dt	enrollment date	“yyyy-mm-dd”
tx.date	date of transplantation	“yyyy-mm-dd” or “<NA>” if no transplantation
fu.date	date of the end of follow-up	“yyyy-mm-dd”
fustat	status at the end of follow-up	0 = alive, 1 = dead
surgery	indicator of bypass surgery before enrollment	0 = no, 1 = yes
transplant	indicator of transplantation (at any time during follow-up)	0 = no, 1 = yes

library(survival)
print(subset(jasa,select=c(birth.dt:fustat,surgery,transplant))[1:10,])

##      birth.dt  accept.dt    tx.date    fu.date fustat surgery transplant
## 1  1937-01-10 1967-11-15       <NA> 1968-01-03      1       0          0
## 2  1916-03-02 1968-01-02       <NA> 1968-01-07      1       0          0
## 3  1913-09-19 1968-01-06 1968-01-06 1968-01-21      1       0          1
## 4  1927-12-23 1968-03-28 1968-05-02 1968-05-05      1       0          1
## 5  1947-07-28 1968-05-10       <NA> 1968-05-27      1       0          0
## 6  1913-11-08 1968-06-13       <NA> 1968-06-15      1       0          0
## 7  1917-08-29 1968-07-12 1968-08-31 1970-05-17      1       0          1
## 8  1923-03-27 1968-08-01       <NA> 1968-09-09      1       0          0
## 9  1921-06-11 1968-08-09       <NA> 1968-11-01      1       0          0
## 10 1926-02-09 1968-08-11 1968-08-22 1968-10-07      1       0          1

In order to be analyzed, this dataset must be transformed into a different format, where the follow-up period is divided into subintervals and the subject’s data are written into several lines, one line for each subinterval. The time-varying covariate is created by changing the value of the covariate between the lines pertaining to the same subject. The main idea is illustrated in blackboard notes.

The transformed dataset (called heart from library(survival)) looks like this:

Variable name	Description	Values
id	identification number	row index in `jasa`
start	open start of the interval	days from enrollment
stop	closed end of the interval	days from enrollment
event	status at the end of follow-up	0 = alive, 1 = dead
age	age at enrollment	years - 48
year	enrollment time after “1967-10-01”	years
surgery	indicator of bypass surgery before enrollment	0 = no, 1 = yes
transplant	indicator of transplantation (during this time interval)	0 = no, 1 = yes

(help(jasa) says that year = year of acceptance in years after 1 Nov 1967, however, calculation confirmed it is actually in years after 1 Oct 1967)

print(subset(heart,select=c(id,start:transplant),id<=10))

##    id start stop event         age      year surgery transplant
## 1   1     0   50     1 -17.1553730 0.1232033       0          0
## 2   2     0    6     1   3.8357290 0.2546201       0          0
## 3   3     0    1     0   6.2970568 0.2655715       0          0
## 4   3     1   16     1   6.2970568 0.2655715       0          1
## 5   4     0   36     0  -7.7371663 0.4900753       0          0
## 6   4    36   39     1  -7.7371663 0.4900753       0          1
## 7   5     0   18     1 -27.2142368 0.6078029       0          0
## 8   6     0    3     1   6.5954825 0.7008898       0          0
## 9   7     0   51     0   2.8692676 0.7802875       0          0
## 10  7    51  675     1   2.8692676 0.7802875       0          1
## 11  8     0   40     1  -2.6502396 0.8350445       0          0
## 12  9     0   85     1  -0.8377823 0.8569473       0          0
## 13 10     0   12     0  -5.4976044 0.8624230       0          0
## 14 10    12   58     1  -5.4976044 0.8624230       0          1

The first subject died 50 days after enrollment without having a transplant.

##   id start stop event       age      year surgery transplant
## 1  1     0   50     1 -17.15537 0.1232033       0          0

Subject #4 lived without transplant for 36 days, was getting a transplant on day 36, and died tree days later, on day 39. There are two rows for subject #4; the first has transplant=0, the second has transplant=1. The death of this subject is indicated by event=1 on the second row. The first row has event=0 because subject #4 was still alive at the end of the first interval (day 36).

##   id start stop event       age      year surgery transplant
## 5  4     0   36     0 -7.737166 0.4900753       0          0
## 6  4    36   39     1 -7.737166 0.4900753       0          1

Subject #3 was transplanted on the day of enrollment. In the transformed data, transplantation was moved to day 1 because the intervals cannot have zero width. (Actually, the time of enrollment was shifted one day earlier for all patients.) Another possible solution would be to consider the patient transplanted at the time of enrollment.

##   id start stop event      age      year surgery transplant
## 3  3     0    1     0 6.297057 0.2655715       0          0
## 4  3     1   16     1 6.297057 0.2655715       0          1

If the subject is written over multiple lines, \(\delta=\)event is zero in all lines except the last (because the subject is observed in subsequent intervals, it must have survived). The value of event in the last line shows the final survival status of the subject (0=censored, 1=died). The final time of follow-up period \(X = \min \{T, C\}\) is also in the last line in the column stop.

Task 1

Investigate the structure of the transformed dataset heart and try to reproduce it from the original dataset jasa. Can you deal with all obstacles?

Concentrate on the important steps required for successful usage of coxph for time-varying covariates. Goal is to prove to me and to yourself that in the future you will be able to perform this data preparation on your own.

Fitting Cox proportional hazards model with time-varying covariates

The survival object used on the left-hand side of the model formula must be adapted to express the interval structure of the data. Therefore, it is written with three arguments:

Surv(start,stop,delta)

Here, start is the left boundary of the time interval, stop is the right boundary, and delta is the survival status at the end of this interval (the value of stop).

The proportional hazards model is specified as usual, with the three-argument survival object as the outcome. For example, the model introduced at the beginning of this assignment would be fitted on the transformed heart data by the code

fit <- coxph(Surv(start,stop,event)~transplant,data=heart)

Of course, the time-invariant covariates age, year and surgery could be also included in the model.

Task 2

Estimate and test the effect of heart transplantation on the survival of the patients. Adjust the effect of transplantation for age at enrollment, time of enrollment, and previous bypass surgery. Consider the interactions of heart transplantation with these covariates.

Provide both quantitative (effect size, confidence intervals if possible) and qualitative (does transplantation help to survive longer?) answers.

Provide plot of estimated survival function corresponding to a regular patient with and without transplantation and interpret it.

Deadline for report: Monday 16th January 9:00, but the sooner you deliver it, the sooner you get the course credit :-)

Time-varying covariates in the Cox model

Jan Vávra

Exercise 8 (2nd January 2023)

Theory - the Cox model with time-varying covariates

Stanford heart transplant data

Fitting Cox proportional hazards model with time-varying covariates