Exercise 5: Comparing more than two distributions and stratification


Back to course page


\(P\)-sample and stratified logrank tests

We have been discussing two sample problems. In practice, more complex settings often arise:

  • There are more than two treatments or groups, and the question of interest is whether the groups differ from each other.
  • We are interested in a comparison between two groups, but we wish to adjust for another factor that may confound the analysis.
  • We want to adjust for lots of covariates.
library(survival)

Three or more survival curves – \(P\)-sample logrank

There are more than two groups, and the question of interest is whether the groups differ from each other.

Given more than two samples, is there a statistical difference between the survival times? We can formulate the hypotheses in the context of survival functions as \[ H_0:\quad S_1(t)=\ldots=S_P(t) \] against the alternative \[ H_1:\quad \exists t\in \mathbb{R}\mbox{ and pair }i\neq j: S_i(t)\neq S_j(t). \]

Let

  • \(t_i\) be times where events are observed (assume these are ordered and there are \(d\) such times),
  • \(d_{ik}\) be the number of observed events from group \(k\) at time \(t_i\),
  • \(y_{ik}\) be the number of subjects in group \(k\) that are at risk at time \(t_i\),
  • \(d_i = \sum_{j=1}^n d_{ij}\),
  • \(y_i = \sum_{j=1}^n y_{ij}\), and
  • \(W(t_i)\) be the weight of the observations at time \(t_i\).

Then to test the hypothesis above, a vector \(Z\) is computed, where the \(k\)th element is \[ Z_k=\sum_{i=1}^d W(t_i)\left[d_{ik}-y_{ik}\frac{d_i}{y_i}\right]. \] The covariance matrix \(\widehat{\Sigma}\) is also computed from the data. Under the null hypothesis, the test statistic \(X^2 = Z^{\top}\widehat{\Sigma}^{-1}Z\) follows a \(\chi^2\) distribution with \(P\) degrees of freedom. That is, if \(X^2 > \chi^2_{1-\alpha,df=P}\), the data provide strong evidence against the null hypothesis and we reject \(H_0\).

Example: Time taken to finish a test with 3 different noise distractions. All tests were stopped after 12 minutes.

time = c(9, 9.5, 9, 8.5, 10, 10.5, 10, 12, 12, 11, 12,
  10.5, 12, 12, 12, 12, 12, 12)
event = c(1,1,1,1,1,1,1,1,0,1,1,1,1,0,0,0,0,0)
group = rep(1:3, times=c(6,6,6))
(cbind(group,time,event))
##       group time event
##  [1,]     1  9.0     1
##  [2,]     1  9.5     1
##  [3,]     1  9.0     1
##  [4,]     1  8.5     1
##  [5,]     1 10.0     1
##  [6,]     1 10.5     1
##  [7,]     2 10.0     1
##  [8,]     2 12.0     1
##  [9,]     2 12.0     0
## [10,]     2 11.0     1
## [11,]     2 12.0     1
## [12,]     2 10.5     1
## [13,]     3 12.0     1
## [14,]     3 12.0     0
## [15,]     3 12.0     0
## [16,]     3 12.0     0
## [17,]     3 12.0     0
## [18,]     3 12.0     0
(noise.lrt = survdiff(Surv(time, event) ~ group, rho=0))
## Call:
## survdiff(formula = Surv(time, event) ~ group, rho = 0)
## 
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## group=1 6        6     1.57   12.4463   17.2379
## group=2 6        5     4.53    0.0488    0.0876
## group=3 6        1     5.90    4.0660    9.4495
## 
##  Chisq= 20.4  on 2 degrees of freedom, p= 3.75e-05
noise.km = survfit(Surv(time, event) ~ group)
plot(noise.km, col = 2:4, xlab = "Minutes",
  ylab = "Time to finish test",
  main = "Time to finish test for 3 noise levels")

Stratified Logrank

Sometimes, even though we are interested in comparing two groups (or maybe \(P\)) groups, we know there are other factors that also affect the outcome. It would be useful to adjust for these other factors in some way.

Example: For the nursing home data (see previous exercise), a logrank test comparing length of stay for those under and over 85 years of age suggests a significant difference (\(p=0.03\)). However, we know that gender has a strong association with length of stay, and also age. Hence, it would be a good idea to stratify the analysis by gender when trying to assess the age effect.

A stratified logrank allows one to compare groups, but allows the shapes of the hazards of the different groups to differ across strata. It makes the assumption that the group 1 vs group 2 hazard ratio is constant across strata. In other words: \(\lambda_{1s}(t)/\lambda_{2s}(t)=\theta\) where \(\theta\) is constant over the strata (\(s =1,\ldots,S\)).

nurshome =
  read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/NursingHome.dat",
  head=F, skip=14, col.names = c("los", "age", "rx", "gender", "married", "health", "censor"))
nurshome$discharged = 1 - nurshome$censor # event indicator
nurshome$under85 = (nurshome$age < 85) + 0
(nurs.slrt = survdiff( Surv(los, discharged) ~ under85 + strata(gender), data = nurshome))
## Call:
## survdiff(formula = Surv(los, discharged) ~ under85 + strata(gender), 
##     data = nurshome)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## under85=0 689      537      560     0.926      1.73
## under85=1 912      742      719     0.721      1.73
## 
##  Chisq= 1.7  on 1 degrees of freedom, p= 0.189

Task 1

Activate the dataset veteran by calling data(veteran). It includes 137 observations and eight variables. The observations are lung cancer patients. The variable time contains survival time (in days) since the start of treatment. The variable status includes the event indicator (1 = death, 0 = censoring). The variable trt distinguishes two different treatments (1 = standard, 2 = experimental). The treatment was randomly assigned to the patients.

Karnofsky score for each patient is stored in variable karno. Make a dichotomized version of karno for Kanofsky score up to 60 and over 60.

Perform a comparison of survival curves with respect to the treatment when stratifying for the dichotomized Kanofsky score. Moreover, perform a comparison of survival curves with respect to the dichotomized Kanofsky score when stratifying for the treatment. Comment on the suitability of these two analyses.

Due date: Dec. 21