We have been discussing two sample problems. In practice, more complex settings often arise:
library(survival)
There are more than two groups, and the question of interest is whether the groups differ from each other.
Given more than two samples, is there a statistical difference between the survival times? We can formulate the hypotheses in the context of survival functions as H0:S1(t)=…=SP(t) against the alternative H1:∃t∈R and pair i≠j:Si(t)≠Sj(t).
Let
Then to test the hypothesis above, a vector Z is computed, where the kth element is Zk=d∑i=1W(ti)[dik−yikdiyi]. The covariance matrix ˆΣ is also computed from the data. Under the null hypothesis, the test statistic X2=Z⊤ˆΣ−1Z follows a χ2 distribution with P degrees of freedom. That is, if X2>χ21−α,df=P, the data provide strong evidence against the null hypothesis and we reject H0.
Example: Time taken to finish a test with 3 different noise distractions. All tests were stopped after 12 minutes.
time = c(9, 9.5, 9, 8.5, 10, 10.5, 10, 12, 12, 11, 12,
10.5, 12, 12, 12, 12, 12, 12)
event = c(1,1,1,1,1,1,1,1,0,1,1,1,1,0,0,0,0,0)
group = rep(1:3, times=c(6,6,6))
(cbind(group,time,event))
## group time event
## [1,] 1 9.0 1
## [2,] 1 9.5 1
## [3,] 1 9.0 1
## [4,] 1 8.5 1
## [5,] 1 10.0 1
## [6,] 1 10.5 1
## [7,] 2 10.0 1
## [8,] 2 12.0 1
## [9,] 2 12.0 0
## [10,] 2 11.0 1
## [11,] 2 12.0 1
## [12,] 2 10.5 1
## [13,] 3 12.0 1
## [14,] 3 12.0 0
## [15,] 3 12.0 0
## [16,] 3 12.0 0
## [17,] 3 12.0 0
## [18,] 3 12.0 0
(noise.lrt = survdiff(Surv(time, event) ~ group, rho=0))
## Call:
## survdiff(formula = Surv(time, event) ~ group, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## group=1 6 6 1.57 12.4463 17.2379
## group=2 6 5 4.53 0.0488 0.0876
## group=3 6 1 5.90 4.0660 9.4495
##
## Chisq= 20.4 on 2 degrees of freedom, p= 3.75e-05
noise.km = survfit(Surv(time, event) ~ group)
plot(noise.km, col = 2:4, xlab = "Minutes",
ylab = "Time to finish test",
main = "Time to finish test for 3 noise levels")
Sometimes, even though we are interested in comparing two groups (or maybe P) groups, we know there are other factors that also affect the outcome. It would be useful to adjust for these other factors in some way.
Example: For the nursing home data (see previous exercise), a logrank test comparing length of stay for those under and over 85 years of age suggests a significant difference (p=0.03). However, we know that gender has a strong association with length of stay, and also age. Hence, it would be a good idea to stratify the analysis by gender when trying to assess the age effect.
A stratified logrank allows one to compare groups, but allows the shapes of the hazards of the different groups to differ across strata. It makes the assumption that the group 1 vs group 2 hazard ratio is constant across strata. In other words: λ1s(t)/λ2s(t)=θ where θ is constant over the strata (s=1,…,S).
nurshome =
read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/NursingHome.dat",
head=F, skip=14, col.names = c("los", "age", "rx", "gender", "married", "health", "censor"))
nurshome$discharged = 1 - nurshome$censor # event indicator
nurshome$under85 = (nurshome$age < 85) + 0
(nurs.slrt = survdiff( Surv(los, discharged) ~ under85 + strata(gender), data = nurshome))
## Call:
## survdiff(formula = Surv(los, discharged) ~ under85 + strata(gender),
## data = nurshome)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## under85=0 689 537 560 0.926 1.73
## under85=1 912 742 719 0.721 1.73
##
## Chisq= 1.7 on 1 degrees of freedom, p= 0.189
Task 1
Activate the dataset veteran
by calling data(veteran)
. It includes 137 observations and eight variables. The observations are lung cancer patients. The variable time
contains survival time (in days) since the start of treatment. The variable status
includes the event indicator (1 = death, 0 = censoring). The variable trt
distinguishes two different treatments (1 = standard, 2 = experimental). The treatment was randomly assigned to the patients.
Karnofsky score for each patient is stored in variable karno
. Make a dichotomized version of karno
for Kanofsky score up to 60 and over 60.
Perform a comparison of survival curves with respect to the treatment when stratifying for the dichotomized Kanofsky score. Moreover, perform a comparison of survival curves with respect to the dichotomized Kanofsky score when stratifying for the treatment. Comment on the suitability of these two analyses.
Due date: Dec. 21