Consider an ordinary regression model for log survival time of the form \[ Y_i=\log T_i=-{\bf X}_i{\boldsymbol\beta}+\sigma{\bf W}_i \] where the error term \(W_i\) has a suitable distribution, e.g., extreme value, generalized extreme value, normal, or logistic. This leads to Weibull, generalized gamma, log-normal or log-logistic models for \(T_i\).
glm
function from R
and ingoring censoringAtlantic halibut survival times data
One conservation measure suggested for trawl fishing is a minimum size limit for halibut (32 inches). However, this size limit would only be effective if captured fish below the limit survived until the time of their release. An experiment was conducted to evaluate the survival rates of halibut caught by trawls or longlines, and to assess other factors which might contribute to survival (duration of trawling, maximum depth fished, size of fish, and handling time).
halibut = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/halibut.csv", head=T)
dim(halibut)
## [1] 294 8
head(halibut)
## No Time Death Towd Deldepth Length Handtime Logcat
## 1 1 209 1 30 13 41 8 6.992
## 2 2 209 1 30 13 44 8 6.992
## 3 3 209 1 30 13 47 10 6.992
## 4 4 209 1 30 13 34 10 6.992
## 5 5 38 1 30 13 40 11 6.992
## 6 6 209 1 30 13 42 11 6.992
summary(halibut)
## No Time Death Towd
## Min. : 1.00 Min. : 1.10 Min. :0.0000 Min. : 30.00
## 1st Qu.: 74.25 1st Qu.: 13.20 1st Qu.:1.0000 1st Qu.: 30.00
## Median :147.50 Median : 29.80 Median :1.0000 Median :100.00
## Mean :147.50 Mean : 88.91 Mean :0.9286 Mean : 74.52
## 3rd Qu.:220.75 3rd Qu.: 82.28 3rd Qu.:1.0000 3rd Qu.:100.00
## Max. :294.00 Max. :1200.00 Max. :1.0000 Max. :100.00
## Deldepth Length Handtime Logcat
## Min. : 1.000 Min. :29.00 Min. : 1.00 Min. :2.904
## 1st Qu.: 5.000 1st Qu.:38.00 1st Qu.: 6.00 1st Qu.:4.059
## Median : 8.000 Median :43.00 Median :10.00 Median :4.203
## Mean : 9.966 Mean :42.51 Mean :12.51 Mean :4.886
## 3rd Qu.:10.000 3rd Qu.:46.75 3rd Qu.:17.00 3rd Qu.:5.323
## Max. :58.000 Max. :57.00 Max. :38.00 Max. :8.690
# Time = survival time (minutes)
# Death = event indicator
# Towd = towing duration
# Deldepth = depth range of towing
# Length = length of fish (cm)
# Handtime = handling time (min)
# Logcat = log total catch
Gamma distribution of log survival time
summary(halibut.gamma <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(identity), data = halibut))
## Warning: glm.fit: algorithm did not converge
##
## Call:
## glm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime +
## Logcat, family = Gamma(identity), data = halibut)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.02542 -0.30063 0.00607 0.22418 1.08881
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.926200 0.617453 4.739 3.38e-06 ***
## Towd -0.014068 0.003175 -4.431 1.33e-05 ***
## Deldepth -0.016685 0.009073 -1.839 0.066952 .
## Length 0.048097 0.012928 3.720 0.000239 ***
## Handtime -0.055115 0.011497 -4.794 2.63e-06 ***
## Logcat 0.066403 0.071847 0.924 0.356143
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.1874759)
##
## Null deviance: 101.027 on 293 degrees of freedom
## Residual deviance: 79.962 on 288 degrees of freedom
## AIC: 1095
##
## Number of Fisher Scoring iterations: 25
### Remark: canonical link = inverse, not identity or log
names(halibut.gamma)
## [1] "coefficients" "residuals" "fitted.values"
## [4] "effects" "R" "rank"
## [7] "qr" "family" "linear.predictors"
## [10] "deviance" "aic" "null.deviance"
## [13] "iter" "weights" "prior.weights"
## [16] "df.residual" "df.null" "y"
## [19] "converged" "boundary" "model"
## [22] "call" "formula" "terms"
## [25] "data" "offset" "control"
## [28] "method" "contrasts" "xlevels"
halibut.gamma$deviance
## [1] 79.96172
halibut.gamma$null.deviance
## [1] 101.0268
### summary(halibut.gamma <- glm(Time ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(log), data = halibut))
### Why not?
Exponential distribution of log survival time
halibut.exp <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = Gamma(identity), data = halibut, control = list(maxit = 500))
summary(halibut.exp, dispersion = 1)
##
## Call:
## glm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime +
## Logcat, family = Gamma(identity), data = halibut, control = list(maxit = 500))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.02519 -0.30160 0.00546 0.22557 1.08931
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.927170 1.428074 2.050 0.0404 *
## Towd -0.014133 0.007334 -1.927 0.0540 .
## Deldepth -0.015612 0.020705 -0.754 0.4508
## Length 0.048473 0.029843 1.624 0.1043
## Handtime -0.054857 0.026564 -2.065 0.0389 *
## Logcat 0.060973 0.165771 0.368 0.7130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 1)
##
## Null deviance: 101.027 on 293 degrees of freedom
## Residual deviance: 79.954 on 288 degrees of freedom
## AIC: 1095
##
## Number of Fisher Scoring iterations: 61
Normal distribution of log survival time
summary(halibut.lm <- lm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, data = halibut))
##
## Call:
## lm(formula = log(Time) ~ Towd + Deldepth + Length + Handtime +
## Logcat, data = halibut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9516 -0.9400 0.0826 0.8058 3.9494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.338289 0.602193 3.883 0.000128 ***
## Towd -0.013225 0.002587 -5.113 5.81e-07 ***
## Deldepth -0.031566 0.011064 -2.853 0.004643 **
## Length 0.042868 0.012805 3.348 0.000923 ***
## Handtime -0.059059 0.011797 -5.006 9.69e-07 ***
## Logcat 0.258838 0.070134 3.691 0.000267 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.275 on 288 degrees of freedom
## Multiple R-squared: 0.3155, Adjusted R-squared: 0.3036
## F-statistic: 26.55 on 5 and 288 DF, p-value: < 2.2e-16
### OR summary(halibut.normal <- glm(log(Time) ~ Towd + Deldepth + Length + Handtime + Logcat, family = gaussian, data = halibut))
### WHY NOT ? summary(halibut.normal <- glm(Time ~ Towd + Deldepth + Length + Handtime + Logcat, family = gaussian(log), data = halibut))
glm
function from R
and taking into account censoringExponential regression with arbitrary random censoring
Let \(T_1,\ldots,T_n\) be distributed according to \(\mathsf{Exp}(\lambda_i)\) and let \(C_1,\ldots,C_n\) be independent of each other and independent of \(T_1,\ldots,T_n\), with arbitrary distributions. Introduce covariate vectors \({\bf Z}_i\) of dimension \(p\) representing a random sample from an arbitrary distribution. We observe independent triplets \[ (X_1, {\bf Z}_1, \delta_1), \ldots, (X_n, {\bf Z}_n, \delta_n), \] where \(X_i =\min\{T_i,C_i\}\) and \(\delta_i =\mathcal{I}(T_i\leq C_i\}\). Suppose there exists a \(p\)-vector \({\boldsymbol\beta}\) of regression parameters such that \[ \lambda_i=\exp\left\{{\bf Z}_i{\boldsymbol\beta}\right\}. \] The score statistic of the above defined model is equivalent to the score statistic of a Poisson loglinear model with \(\delta_i\) as the response and \(\log X_i\) as the offset. Algorithms for finding the maximum likelihood estimator of \({\boldsymbol\beta}\), calculating the observed information matrix, approximating the distribution of the estimated \({\boldsymbol\beta}\), performing tests about \({\boldsymbol\beta}\) and building confidence intervals are all taken from the Poisson loglinear model.
### Death is censoring indicator !!!
summary(halibut.cens <- glm(Death ~ Towd + Deldepth + Length + Handtime + Logcat + offset(log(Time)), family=poisson, data = halibut))
##
## Call:
## glm(formula = Death ~ Towd + Deldepth + Length + Handtime + Logcat +
## offset(log(Time)), family = poisson, data = halibut)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.4391 -0.1859 0.6323 1.3659 2.4642
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.581185 0.519335 -4.970 6.69e-07 ***
## Towd 0.008305 0.002023 4.105 4.05e-05 ***
## Deldepth 0.016099 0.007578 2.124 0.0336 *
## Length -0.046115 0.010234 -4.506 6.60e-06 ***
## Handtime 0.072703 0.010106 7.194 6.29e-13 ***
## Logcat -0.292355 0.052443 -5.575 2.48e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 683.77 on 293 degrees of freedom
## Residual deviance: 516.20 on 288 degrees of freedom
## AIC: 1074.2
##
## Number of Fisher Scoring iterations: 7
Task 1
Build up a suitable regression model for the halibut survival times within the exponential family framework and taking into account censoring.
Due date: Nov. 15