Exercise 7: Time-varying covariates

Stanford heart transplant data

From September 1967 till March 1974, men with serious heart disease were enrolled into a follow-up study. The follow-up was closed in April 1974. During the follow-up, some of the men underwent transplantation of the heart. The goal of the analysis is to estimate the effect of heart transplant on survival.

This can be done by specifying a Cox regression model with a time-varying covariate indicating whether or not the heart has already been transplanted. At the start of the follow-up, this covariate is zero for all patients. After transplantation, it is switched to 1. The model is $$\lambda(t\mid Z(t))=\lambda_0(t)\exp\{\beta Z(t)\}$$ where $$Z(t)$$ is the time-varying indicator of transplantation. The baseline hazard $$\lambda_0(t)$$ is the hazard of death before transplantation. The value $$\mathrm{e}^\beta$$ is the hazard ratio: the factor by which the hazard of death is multiplied after the transplantation.
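
To see why $$\mathrm{e}^\beta$$ can be read as a hazard ratio, compare the model hazard at the same time $$t$$ for a patient who has already been transplanted with that of a patient who has not. This is a short worked step that uses only the symbols defined above:

$$\frac{\lambda(t\mid Z(t)=1)}{\lambda(t\mid Z(t)=0)}=\frac{\lambda_0(t)\,\mathrm{e}^{\beta}}{\lambda_0(t)}=\mathrm{e}^{\beta}$$

The baseline hazard cancels, so the ratio does not depend on $$t$$. A value of $$\beta<0$$ corresponds to a lower hazard after transplantation, and $$\beta>0$$ to a higher one.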

The original format of the dataset is this:

library(survival)
# First 33 patients: date columns, final status, and the ever-transplanted indicator
print(subset(jasa, select = c(birth.dt:fustat, transplant))[1:33, ])
##      birth.dt  accept.dt    tx.date    fu.date fustat transplant
## 1  1937-01-10 1967-11-15       <NA> 1968-01-03      1          0
## 2  1916-03-02 1968-01-02       <NA> 1968-01-07      1          0
## 3  1913-09-19 1968-01-06 1968-01-06 1968-01-21      1          1
## 4  1927-12-23 1968-03-28 1968-05-02 1968-05-05      1          1
## 5  1947-07-28 1968-05-10       <NA> 1968-05-27      1          0
## 6  1913-11-08 1968-06-13       <NA> 1968-06-15      1          0
## 7  1917-08-29 1968-07-12 1968-08-31 1970-05-17      1          1
## 8  1923-03-27 1968-08-01       <NA> 1968-09-09      1          0
## 9  1921-06-11 1968-08-09       <NA> 1968-11-01      1          0
## 10 1926-02-09 1968-08-11 1968-08-22 1968-10-07      1          1
## 11 1920-08-22 1968-08-15 1968-09-09 1969-01-14      1          1
## 12 1915-07-09 1968-09-17       <NA> 1968-09-24      1          0
## 13 1914-02-22 1968-09-19 1968-10-05 1968-12-08      1          1
## 14 1914-09-16 1968-09-20 1968-10-26 1972-07-07      1          1
## 15 1914-12-04 1968-09-27       <NA> 1968-09-27      1          0
## 16 1919-05-16 1968-10-26 1968-11-22 1969-08-29      1          1
## 17 1948-06-29 1968-10-28       <NA> 1968-12-02      1          0
## 18 1911-12-27 1968-11-01 1968-11-20 1968-12-13      1          1
## 19 1909-10-04 1968-11-18       <NA> 1968-12-24      1          0
## 20 1913-10-19 1969-01-29 1969-02-15 1969-02-25      1          1
## 21 1925-09-29 1969-02-01 1969-02-08 1971-11-29      1          1
## 22 1926-06-05 1969-03-18 1969-03-29 1969-05-07      1          1
## 23 1910-12-02 1969-04-11 1969-04-13 1971-04-13      1          1
## 24 1917-07-07 1969-04-25 1969-07-16 1969-11-29      1          1
## 25 1936-02-06 1969-04-28 1969-05-22 1974-04-01      0          1
## 26 1938-10-18 1969-05-01       <NA> 1973-03-01      0          0
## 27 1960-07-21 1969-05-04       <NA> 1970-01-21      1          0
## 28 1915-05-30 1969-06-07 1969-08-16 1969-08-17      1          1
## 29 1919-02-06 1969-07-14       <NA> 1969-08-17      1          0
## 30 1924-09-20 1969-08-19 1969-09-03 1971-12-18      1          1
## 31 1914-10-04 1969-08-23       <NA> 1969-09-07      1          0
## 32 1905-04-02 1969-08-29 1969-09-14 1969-11-13      1          1
## 33 1921-01-01 1969-11-27 1970-01-16 1974-04-01      0          1

The columns are: birth date, enrollment date, date of transplantation (missing if no transplantation), date of the end of follow-up, survival status at the end of follow-up (1=dead, 0=alive), indicator of transplantation (at any time during follow-up).

In order to be analyzed, this dataset must be transformed into a different format, where the follow-up period is divided into subintervals and the subject’s data are written into several lines, one line for each subinterval. The time-varying covariate is created by changing the value of the covariate between the lines pertaining to the same subject.

The transformed dataset looks like this:

# Rows of the first 10 patients in the interval (counting-process) format
print(subset(heart, id <= 10, select = c(id, start:transplant)))
##    id start stop event         age      year surgery transplant
## 1   1     0   50     1 -17.1553730 0.1232033       0          0
## 2   2     0    6     1   3.8357290 0.2546201       0          0
## 3   3     0    1     0   6.2970568 0.2655715       0          0
## 4   3     1   16     1   6.2970568 0.2655715       0          1
## 5   4     0   36     0  -7.7371663 0.4900753       0          0
## 6   4    36   39     1  -7.7371663 0.4900753       0          1
## 7   5     0   18     1 -27.2142368 0.6078029       0          0
## 8   6     0    3     1   6.5954825 0.7008898       0          0
## 9   7     0   51     0   2.8692676 0.7802875       0          0
## 10  7    51  675     1   2.8692676 0.7802875       0          1
## 11  8     0   40     1  -2.6502396 0.8350445       0          0
## 12  9     0   85     1  -0.8377823 0.8569473       0          0
## 13 10     0   12     0  -5.4976044 0.8624230       0          0
## 14 10    12   58     1  -5.4976044 0.8624230       0          1

The variable id identifies the patient; its values correspond to the row numbers of the untransformed dataset jasa. The variables start and stop define the intervals (in days after the start of the follow-up). The intervals are considered open on the left and closed on the right, that is, (start, stop]. The variable event shows the survival status at the end of each interval. The variable transplant is the time-varying transplantation indicator. The variables age, year, and surgery are time-invariant: age is the age at enrollment in years minus 48, year is the enrollment time in years after Oct. 1, 1967 (subject #1, enrolled on 1967-11-15, has year = 45/365.25 ≈ 0.1232), and surgery is a binary indicator of bypass surgery before enrollment.

The first subject died 50 days after enrollment without having a transplant. Subject #4 lived without a transplant for 36 days, was transplanted on day 36, and died three days later, on day 39. There are two rows for subject #4; the first has transplant=0, the second has transplant=1. The death of this subject is indicated by event=1 on the second row. The first row has event=0 because subject #4 was still alive at the end of the first interval (day 36). Subject #3 was transplanted on the day of enrollment. In the transformed data, the transplantation was moved to day 1 because an interval cannot have zero width. Another possible solution would be to consider the patient transplanted at the time of enrollment, that is, to use a single row with transplant=1 for the whole follow-up.

Investigate the structure of the transformed dataset heart and think how it might have been created from the original dataset jasa.
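
One possible route from jasa to an interval format is sketched below with tmerge() from the survival package. This is only an illustration under stated assumptions, not necessarily how heart was produced. The names tdata, cp, death and tx are made up for this example, and the half-day adjustments for ties are one convention among several. The resulting times need not match heart exactly. For example, heart records 50 days for subject #1, whereas the difference between the two dates in jasa is 49 days.

```r
library(survival)

# Times in days since enrollment, one row per patient
# (ids follow the row order of jasa).
tdata <- with(jasa, data.frame(
  id     = seq_len(nrow(jasa)),
  # assumption: a death on the enrollment day gets half a day of follow-up
  futime = pmax(0.5, as.numeric(fu.date - accept.dt)),
  # assumption: a transplant on the last follow-up day is moved half a day earlier
  txtime = ifelse(tx.date == fu.date,
                  as.numeric(tx.date - accept.dt) - 0.5,
                  as.numeric(tx.date - accept.dt)),
  fustat = fustat
))

# tmerge() splits each patient's follow-up at the transplant time:
# death is the status at the end of each interval,
# tx is the 0/1 time-dependent transplant indicator.
cp <- tmerge(data1 = tdata["id"], data2 = tdata, id = id,
             death = event(futime, fustat),
             tx    = tdc(txtime))

print(subset(cp, id <= 10))
```

In this sketch a patient transplanted on the enrollment day is treated as transplanted from the start, which corresponds to the alternative mentioned above for subject #3.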

Fitting Cox proportional hazards model with time-varying covariates

The survival object used on the left-hand side of the model formula must be adapted to express the interval structure of the data. Therefore, it is written with three arguments:

Surv(start,stop,delta)

Here, start is the left boundary of the time interval, stop is the right boundary, and delta is the survival status at the end of this interval (the value of stop). If the subject is written over multiple lines, delta is zero in all lines except the last (because the subject is observed in subsequent intervals, it must have survived). The value of delta in the last line shows the final survival status of the subject (0=censored, 1=died).
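
The following sketch builds the three-argument survival object on the heart data and checks the rule about the status indicator described above. The names y and last_row are chosen only for this example, and the check assumes that the rows of each subject are stored in time order, as they are in heart.

```r
library(survival)

# Counting-process response: each element is an interval (start, stop]
# together with the status at the end of that interval.
y <- with(heart, Surv(start, stop, event))
print(y[1:6])

# TRUE for the last row of each subject, FALSE for all earlier rows.
last_row <- !duplicated(heart$id, fromLast = TRUE)

# Every row that is not the last one for its subject must have event == 0.
stopifnot(all(heart$event[!last_row] == 0))

# The final status of each subject is read from the last row only.
print(table(final_status = heart$event[last_row]))
```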

The proportional hazards model is specified as usual, with the three-argument survival object as the outcome. For example, the model introduced at the beginning of this assignment would be fitted on the transformed heart data by the code

# Cox model with transplant as the only (time-varying) covariate
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)

Of course, the time-invariant covariates age, year and surgery could also be included in the model.
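
A minimal sketch of such an adjusted model follows. The object names fit_adj and hr are chosen for this example, and the additive form without interactions is only one possible specification.

```r
library(survival)

# Transplant as a time-varying covariate, adjusted for the
# time-invariant covariates age, year and surgery.
fit_adj <- coxph(Surv(start, stop, event) ~ transplant + age + year + surgery,
                 data = heart)

# Hazard ratios exp(beta) with 95% confidence limits.
hr <- cbind(HR = exp(coef(fit_adj)), exp(confint(fit_adj)))
print(round(hr, 3))
```

The row for transplant estimates $$\mathrm{e}^\beta$$ from the model formula, now interpreted for patients with the same values of age, year and surgery.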