NMST539 | Lab Session 4

Wishart Distribution

(application for confidence bands and statistical tests)

LS 2017 | Tuesday 21/03/2017

Rmd file (UTF8 encoding)

The R-software is available for download from the website: https://www.r-project.org

A user-friendly interface (one of many): RStudio.

Manuals and introduction into R (in Czech or English):

Bína, V., Komárek, A. a Komárková, L.: Jak na jazyk R. (PDF súbor)
Komárek, A.: Základy práce s R. (PDF súbor)
Kulich, M.: Velmi stručný úvod do R. (PDF súbor)
De Vries, A. a Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)

Wishart distribution

The Wishart distribution is, as an analogy to the \(\chi^2\) distribution and the variance inference in the univariate case, a very useful tool for the analysis of the variance-covariance matrix of same random sample \(X_{1}, \dots, X_{n}\) for \(n \in \mathbb{N}\) where each \(X_{i}\) in the sample is a \(p\)-dimensional random vector (\(p\) different covariates recorded on each subject). Thus, it is probability distributions defined over symmetric and nonnegative-definite random matrices.

In general, the Wishart distribution is a multi-variate generalization of the \(\chi^2\) distribution: for instance, for a normaly distributed random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n}\) drawn from some multivariate distribution \(N_{p}(\boldsymbol{0}, \Sigma)\), with the zero mean vector and \(\Sigma\) being some variance-covariance matrix we have that \[ \mathbb{X}^{\top}\mathbb{X} \sim W_{p}(\Sigma, n), \] for \(\mathbb{X} = (\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n})^{\top}\). Similarly, for a random sample \(X_{1}, \dots, X_{n}\) drawn from \(N(0,1)\) we have that \[ \boldsymbol{X}^{\top}\boldsymbol{X} \sim \chi_{n}^2, \] where \(\boldsymbol{X} = (X_{1}, \dots, X_{n})^\top\). Thus, for a sample of size \(n \in \mathbb{N}\) drawn from \(N(0,1)\) the corresponding distribution of \(\mathbb{X}^\top\mathbb{X}\) is \(W_{1}(1, n) \equiv \chi_{n}^2\).

The corresponding density function of the Wishart distribution takes the form \[ f(\mathcal{X}) = \frac{1}{2^{np/2} |\Sigma|^{n/2} \Gamma_{p}(\frac{n}{2})} \cdot |\mathcal{X}|^{\frac{n - p - 1}{2}} e^{-(1/2) tr(\Sigma^{-1}\mathcal{X})}, \] for \(\mathcal{X}\) being a \(p\times p\) random matrix and \(\Gamma_{p}(\cdot)\) is a multivariate generalization of the Gamma function \(\Gamma(\cdot)\) In the R software there are various options (packages) on how to use and apply the Wishart distribution.

To Do by Yourself (Theoretical and Practical)

Let the multivariate random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n}\) comes from some general multivariate distribution \(N_{p}(\boldsymbol{\mu}, \Sigma)\), with some mean vector \(\boldsymbol{\mu} \in \mathbb{R}^p\) and the variance-covariance positive definite matrix \(\Sigma\). Show, that the random matrix defined as \(n\mathbb{X}^\top \mathcal{H}\mathbb{X}\) follows the Wishart distribution \(W_{p}(\Sigma, n - 1)\), for \(\mathcal{H}\) being the idenpotent centering matrix \(\mathcal{H} = \mathbb{I}_{n} - \frac{1}{n}\boldsymbol{1}_{n}\boldsymbol{1}_{n}^\top\).
use the help session in R to find out more about some standard commands used in R for the Wishart distribution (in particular, check the help session for the command rWishart());
some other commands and additional extensions are available in some packages which needs to be downloaded and installed separately, e.g. function Wishart() in the library ‘MCMCpack’, or Wishart() in the library ‘mixAK’, or rwishart() in the library ‘dlm’, dist.Wishart() in the library ‘LaplacesDemon’, and others;

A simple random sample from Wishart distribution \(W_{1}(\Sigma = 1, n = 10)\) (equivalently a \(\chi^2\) distribution with \(n = 10\) degrees of freedom) can be obtained, for instance, as

S <- as.matrix(1)
sample <- rWishart(10, df = 10, S)

Using a standard approach for generating a univariate random sample we would use the R function rchisq() for this scenario. Try to use both and using some histograms compare whether do they correspond or not.For large sample size \(n \in \mathbb{N}\) we they should be quite similar/identical.

set.seed(1234)
sampleWishart <- rWishart(5000, df = 10, S)
sampleChiSq <- rchisq(5000, df = 10)

par(mfrow = c(1,2))
hist(sampleWishart, col = "lightblue", main = expression(paste("Wishart Distribution", sep ="")), xlim = c(0, 40), breaks = 30, freq = F)
lines(density(sampleWishart), col = "red", lwd = 2)
hist(sampleChiSq, col = "lightblue", main = expression(paste(chi^2, "Distribution", sep = "")), xlim = c(0,40), breaks = 30, freq = F)
lines(density(sampleChiSq), col = "red", lwd = 2)

To Do by Yourself

consider some Wishart distribution for some \(p \in \mathbb{N}\) such that \(p > 1\);
generate a sample from the Wishart distribution \(W_{p}\) with some reasonable parameters \(\Sigma\) and \(df\);
consider some general matrix \(A \in \mathbb{R}^{p \times q}\) for some \(q \in \mathbb{N}\) such that \(p \neq q\) and verify (e.g. by histograms) that \(A^\top \mathbb{M} A \sim W_{q}(A^\top\Sigma A, n)\), where \(\mathbb{M} \sim W_{p}(\Sigma, n)\);

Hotelling’s \(\boldsymbol{T^2}\) Distribution

In a similar manner as we define a classical \(t\)-distribution in an univariate case (i.e. standard normal \(N(0,1)\) variable devided by a square root of a \(\chi^2\) variable normalized by its degrees of freedom) we define a multivariate generalization (Hotelling’s \(T^{2}\) distribution) as \[ n \boldsymbol{Y}^{\top} \mathbb{M}^{-1} \boldsymbol{Y} \sim T^{2}(n, p), \] to be a random variable with the Hotelling’s \(T^2\) distribution with \(p \in \mathbb{N}\) to be the dimension of \(Y \sim N_{p}(0, \mathbb{I})\) and \(n \in \mathbb{N}\) being the parameter of the Wishart distribution of \(\mathbb{M} \sim W_{p}(\mathbb{I}, n)\).

A special case for \(p = 1\) gives the standard Fisher distribution with one and \(n\) degrees of freedom (equivalently a square of the \(t\)-distribution with \(n\) degrees of freedom). The Hotelling’s \(T^2\) distribution with parameters \(p, n \in \mathbb{N}\) can be identivied with the Fisher distribution using the following expression \[ T^{2}(p, n) \equiv \frac{n p}{n - p + 1}F_{p, n - p + 1}. \]

In general, the relationship betwenn the Hotelling’s \(T^2\) distribution and the Fisher distribution is given by the same expression \[ T^{2}(p, n) \equiv \frac{n p}{n - p + 1}F_{p, n - p + 1}. \]

The effect of different parameter settings can be (a little) visualized, for instance, by the following figure:

samples <- NULL
samples <- cbind(samples, rf(n = 1000, df1 = 1, df2 = 1))
samples <- cbind(samples, rf(n = 1000, df1 = 1, df2 = 10))

samples <- cbind(samples, rf(n = 1000, df1 = 10, df2 = 10))
samples <- cbind(samples, rf(n = 1000, df1 = 10, df2 = 100))

samples <- cbind(samples, rf(n = 1000, df1 = 100, df2 = 10))
samples <- cbind(samples, rf(n = 1000, df1 = 100, df2 = 100))

samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 100))
samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 1000))

par(mfrow = c(4,2))
for (i in 1:dim(samples)[2]){
  hist(samples[,i], xlab=paste("Sample no.", i, sep = ""), col = "yellow", main = "", freq = F, breaks = 20)
  lines(density(samples[,i]), col = "red")
}

Moreover, using the relationship between the Hotelling’s \(T^2\) distribution and the Fisher’s F distriubtion we can effectively use the \(F\)-distribution to draw the corresponding critical values when needed for some statistical tests.

For some multivariate random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}, \Sigma)\) for some unknown mean vector \(\boldsymbol{\mu} \in \mathbb{R}^{p}\) and some variance-covariance matrix \(\Sigma\), we have that \[ (n - 1)\Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \sim T^2(p, n - 1), \] which can be also equivalently expressed as \[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \sim F_{p, n - p}. \] This can be now used to construct confidence regions for the unknown mean vector \(\boldsymbol{\mu}\) of for testing hypothesis about the true value of the vector of parameters \(\boldsymbol{\mu} = (\mu_{1}, \dots, \mu_{p})^{\top}\). However, rather than construction a confidence region for \(\boldsymbol{\mu}\) (which can be impractical in higher dimensions for even slightly larger values of \(p\)) one focusses on construction confidence intervals for the elements of \(\boldsymbol{\mu}\) such that the mutual coverage is under control (usually we require a simultaneous coverage of \((1 - \alpha)\times 100~\%\) for some small \(\alpha \in (0,1)\)).

For a hypothesis test

\[ H_{0}: \boldsymbol{\mu} = \boldsymbol{\mu}_{0} \in \mathbb{R}^{p} \]

\[ H_{1}: \boldsymbol{\mu} \neq \boldsymbol{\mu}_{0} \in \mathbb{R}^{p} \]

we can use the following test statistic:

\[ (n - 1)\Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big), \]

which, under the null hypothesis, follows the \(T^2(p, n - 1)\) distribution. Equivalently we also have that

\[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big) \]

follows the Fisher \(F\)-distribution with \(p\) and \(n - p\) degrees of freedom.

In the R software we can use the library install.packages("DescTools") - if it is installed in the R environment, we can load the library by the command

library(DescTools)

and we can use the function HotellingsT2Test() to perform the test on multivariate mean vector (use the help session to find out how the function works).

Using the same approach we can also construct the confidence elipsoid for \(\boldsymbol{\mu} \in \mathbb{R}^p\) - it holds that \[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big) \sim F_{p, n - p}, \] and therefore, the following set \[ \left\{\boldsymbol{\mu} \in \mathbb{R}^p;~ \Big( \overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big( \overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \leq \frac{p}{n - p} F_{p, n- p}(1 - \alpha) \right\} \] is a confidence region at the confidence level of \(\alpha = 1 - \alpha\) for the vector of parameters \(\boldsymbol{\mu} \in \mathbb{R}^p\) - an interior of iso-distance ellipsoid in \(\mathbb{R}^p\).

A brief example on how it works (example from the lecture notes):

library(mvtnorm)
s=matrix(c(1,-0.5,-0.5,1),2);x=seq(-3,3,by=0.015) ### variance-covariance matrix

contour(x,x,outer(x,x, function(x,y){dmvnorm(cbind(x,y),sigma=s)}))
n <- 20
X <- rmvnorm(n,sigma=s)
m <- apply(X,2,mean)
S <- cov(X)
points(m[1],m[2],pch=8,col="red",cex=2)

S1=solve(S)
contour(x,x,outer(x,x,function(x,y){(n-2)*
                                      apply(t(t(cbind(x,y))-m),1,function(x){t(x)%*%S1%*%x})<
                                      2*qf(0.95,2,n-2)}),col="red",add=TRUE)
bodyx=m[1]+c(-1,1)*sqrt(S[1,1]*2*qf (0.95,2,n-2) /(n-2))
bodyy=m[2]+c(-1,1)*sqrt(S[2,2]*2*qf (0.95,2,n-2) /(n-2))
polygon(x=bodyx[c(1,1,2,2,1)],y=bodyy[c(1,2,2,1,1)],border="blue")

(which is the 95% confidence region the true mean vector)

To Do by Yourself

Use the approach described above and construct a confidence region for the true mean vector for some data set of your choice;
Explain and interpret the obtained results. What is the conclusion?

In an analogous way one can also construct a two sample Hotelling test to compare two population means.

Difference of two multivariate means with the same variance-covariance matrix

We assume a multivariate sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}_{1}, \Sigma)\) and some another sample \(\boldsymbol{Y}_{1}, \dots, \boldsymbol{Y}_{M} \sim N_{p}(\boldsymbol{\mu}_{2}, \Sigma)\), with some generally different mean parameter \(\boldsymbol{\mu}_{1} \neq \boldsymbol{\mu}_{2}\). We are interested in testing the null hypothesis \[ H_{0}: \boldsymbol{\mu} = \boldsymbol{\mu}_{0} \] against the alternative hypothesis, that the null does not hold. We have that the following holds: \[ (\overline{X}_{1} - \overline{X}_{2}) \sim N_{p}\Bigg(\boldsymbol{\Delta}, \frac{n + m}{n m} \Sigma\Bigg), \] and also \[ n\mathcal{S}_{1} + m\mathcal{S}_{2} \sim W_{p}(\Sigma, n + m - 2), \] where \(\mathcal{S}_{1}\) and \(\mathcal{S}_{2}\) respectively are the empirical estimates of the variance-covariance matrix \(\Sigma\), given the first sample and the second sample respectively. Then the rejection region is defined as \[ \frac{nm(n + m - p - 1)}{p(n + m)^2}(\overline{\boldsymbol{X}}_{1} - \overline{\boldsymbol{X}}_{2})^{\top} \mathcal{S}^{-1} (\overline{\boldsymbol{X}}_{1} - \overline{\boldsymbol{X}}_{2}) \geq F_{p, n + m - p - 1}(1 - \alpha). \]

Difference of two multivariate means with unequal variance-covariance matrix

Now, we assume a multivariate sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}_{1}, \Sigma_{1})\) and some another sample \(\boldsymbol{Y}_{1}, \dots, \boldsymbol{Y}_{M} \sim N_{p}(\boldsymbol{\mu}_{2}, \Sigma_{2})\), with some generally different mean parameter \(\boldsymbol{\mu}_{1} \neq \boldsymbol{\mu}_{2}\). We are interested in testing the null hypothesis \[ H_{0}: \boldsymbol{\mu} = \boldsymbol{\mu}_{0} \] against the alternative hypothesis, that the null does not hold. Again, we have that the following holds: \[ (\overline{X}_{1} - \overline{X}_{2}) \sim N_{p}\Bigg(\boldsymbol{\Delta}, \frac{\Sigma_{1}}{n} + \frac{\Sigma_{2}}{m}\Bigg), \] and therefore, it also holds that \[ (\overline{X}_{1} - \overline{X}_{2})^\top \Big(\frac{\Sigma_{1}}{n} + \frac{\Sigma_{2}}{m}\Big)^{-1} (\overline{X}_{1} - \overline{X}_{2}) \sim \chi_{p}^{2}. \]

To Do by Yourself

download and install the R library ‘Hotelling’ and ‘DescTools’;
for one sample and two sample Hotelling tests use the command HotellingsT2Test() from ‘DescTools’ package;
for two sample Hotelling tests you can also use the command hotelling.test() from ‘Hotelling’ package;
use the help session to find out more about commands hotelling.stat() and hotelling.test() which can be used to test the null hypothesis about the true value of the mean vector;

library("Hotelling")

There is a dataset called container.df available in the package ‘Hotelling’. The data represent some concentration measurements on nine different elements (see the help session ?container.df for further details).

data(container.df)
summary(container.df)

##        gp            Ti                Al               Fe         
##  Min.   :1.0   Min.   :0.02800   Min.   :0.6590   Min.   :0.02700  
##  1st Qu.:1.0   1st Qu.:0.03175   1st Qu.:0.7405   1st Qu.:0.02975  
##  Median :1.5   Median :0.03300   Median :0.7850   Median :0.06950  
##  Mean   :1.5   Mean   :0.03350   Mean   :0.8047   Mean   :0.07665  
##  3rd Qu.:2.0   3rd Qu.:0.03700   3rd Qu.:0.8730   3rd Qu.:0.12200  
##  Max.   :2.0   Max.   :0.03900   Max.   :0.9410   Max.   :0.16800  
##        Mn               Mg               Ca              Ba         
##  Min.   :0.0020   Min.   :0.0900   Min.   :5.511   Min.   :0.00900  
##  1st Qu.:0.0020   1st Qu.:0.1130   1st Qu.:6.422   1st Qu.:0.01200  
##  Median :0.0030   Median :0.1595   Median :6.989   Median :0.01550  
##  Mean   :0.0034   Mean   :0.1759   Mean   :7.001   Mean   :0.01615  
##  3rd Qu.:0.0050   3rd Qu.:0.2362   3rd Qu.:7.578   3rd Qu.:0.02000  
##  Max.   :0.0050   Max.   :0.2700   Max.   :7.780   Max.   :0.02300  
##        Sr               Zr         
##  Min.   :0.0130   Min.   :0.00900  
##  1st Qu.:0.0150   1st Qu.:0.01000  
##  Median :0.0160   Median :0.01150  
##  Mean   :0.0162   Mean   :0.01125  
##  3rd Qu.:0.0170   3rd Qu.:0.01200  
##  Max.   :0.0190   Max.   :0.01400

The estimates for the mean vector and variance-covariance matrix can be obtained as

apply(container.df[,2:10], 2, mean) ### mean vector estimate

##      Ti      Al      Fe      Mn      Mg      Ca      Ba      Sr      Zr 
## 0.03350 0.80470 0.07665 0.00340 0.17590 7.00055 0.01615 0.01620 0.01125

cov(container.df[,2:10])            ### variance-covariance estimate

##               Ti           Al            Fe            Mn            Mg
## Ti  1.194737e-05 2.289474e-04  1.527632e-04  4.421053e-06  1.975789e-04
## Al  2.289474e-04 7.775589e-03  3.868363e-03  1.112842e-04  5.884863e-03
## Fe  1.527632e-04 3.868363e-03  2.465608e-03  7.072632e-05  3.294963e-03
## Mn  4.421053e-06 1.112842e-04  7.072632e-05  2.147368e-06  9.583158e-05
## Mg  1.975789e-04 5.884863e-03  3.294963e-03  9.583158e-05  4.784832e-03
## Ca  6.710526e-06 2.654805e-03  2.015711e-04  1.297895e-05 -4.767895e-05
## Ba  1.355263e-05 3.768895e-04  2.209500e-04  6.410526e-06  3.091211e-04
## Sr -2.105263e-07 1.484211e-06 -1.876842e-05 -6.631579e-07 -1.371579e-05
## Zr  1.921053e-06 6.907895e-05  2.451316e-05  6.842105e-07  4.423684e-05
##               Ca            Ba            Sr            Zr
## Ti  6.710526e-06  1.355263e-05 -2.105263e-07  1.921053e-06
## Al  2.654805e-03  3.768895e-04  1.484211e-06  6.907895e-05
## Fe  2.015711e-04  2.209500e-04 -1.876842e-05  2.451316e-05
## Mn  1.297895e-05  6.410526e-06 -6.631579e-07  6.842105e-07
## Mg -4.767895e-05  3.091211e-04 -1.371579e-05  4.423684e-05
## Ca  4.574089e-01  5.564921e-04 -3.695368e-04 -1.588158e-05
## Ba  5.564921e-04  2.139737e-05 -1.821053e-06  2.644737e-06
## Sr -3.695368e-04 -1.821053e-06  2.484211e-06  1.210526e-06
## Zr -1.588158e-05  2.644737e-06  1.210526e-06  1.671053e-06

and we can test whether the mean vector of concentrations in the first container (gp == 1) and the second container (gp == 2) is the same or not. Try to perform a one sample Hotelling test to test whether the unknown vector of mean parameters equals some prespecified vector \(\boldsymbol{\mu}_{0} \in \mathbb{R}^{p}\).

For a two sample problem we can use the first covariate in the data to define two populations and we provide a test whether the mean vectors are equal or not. The corresponding Hotelling test statistic equals

split.data = split(container.df[,-1],container.df$gp)
x <- split.data[[1]]
y <- split.data[[2]]
statistic <- hotelling.stat(x, y)

and the corresponding test is performed by

hotellingTest <- hotelling.test(.~gp, data = container.df)
hotellingTest

## Test stat:  126.11 
## Numerator df:  9 
## Denominator df:  10 
## P-value:  4.233e-09

Questions

What is the interpretation of the test results?
Try to perform a simple two sample \(t\)-test(s). What are the conclussions now?
What would be the corresponding simultaneous confidence intervales for the elements of the mean vector in this example?

The correponding simultaneous confidence intervals for all possible linear combinations \(\boldsymbol{a}^\top \boldsymbol{\mu}\) of the mean vector \(\boldsymbol{\mu} \in \mathbb{R}^{p}\) is given as

\[ P\Big(\forall \boldsymbol{a} \in \mathbb{R}^{p};~ \boldsymbol{a}^\top \boldsymbol{\mu} \in \big( \boldsymbol{a}^\top\overline{\boldsymbol{X}} - \sqrt{K_{\alpha} \boldsymbol{a}^\top \mathcal{S} \boldsymbol{a}}, \boldsymbol{a}^\top\overline{\boldsymbol{X}} + \sqrt{K_{\alpha} \boldsymbol{a}^\top \mathcal{S} \boldsymbol{a}} \big) \big)\Big) = 1 - \alpha, \] where \(K_{\alpha}\) is the corresponding quantile obtained from the Fisher \(F\) distribution (\(K_{\alpha} = \frac{p}{n - p} F_{p, n - p}(1 - \alpha)\)) and \(\mathcal{S}\) is the sample variance-covariance matrix.

Fquant <- (9/(20 - 9)) * qf(1 - 0.05, 9, 11)
meanVec <- apply(container.df[,2:10], 2, mean)
LB <- diag(9) %*% meanVec - sqrt(Fquant * diag(diag(9) %*% cov(container.df[,2:10])  %*% diag(9)))
UB <- diag(9) %*% meanVec + sqrt(Fquant * diag(diag(9) %*% cov(container.df[,2:10])  %*% diag(9)))

confInt <- data.frame(LB, UB, meanVec,  row.names = names(container.df)[2:10])
names(confInt) <- c("Lower Boundary", "Upper Boundary", "Mean Estimate")

confInt

##    Lower Boundary Upper Boundary Mean Estimate
## Ti   0.0281791989    0.038820801       0.03350
## Al   0.6689600906    0.940439909       0.80470
## Fe   0.0002131292    0.153086871       0.07665
## Mn   0.0011442333    0.005655767       0.00340
## Mg   0.0694184851    0.282381515       0.17590
## Ca   5.9594482060    8.041651794       7.00055
## Ba   0.0090293265    0.023270674       0.01615
## Sr   0.0137737525    0.018626247       0.01620
## Zr   0.0092600784    0.013239922       0.01125

To Do by Yourself

Consider some other linear combination for the simultaneous confidence intervals.
Define necessary linear combinations in order to obtain all pair-wise mean comparissons and their corresponding \(95~\%\) confidence (simultaneous) intervals.
What approach should be used to construct a condidence region (elipsoid)?

Homework Assignment

(Deadline: fifth lab session / 04.04.2007)

Use the command below to instal the SMSdata packages with various multivariate datasets:

install.packages(pkgs="http://www.karlin.mff.cuni.cz/~hlavka/sms2/SMSdata_1.0.tar.gz", repos=NULL, type="source")

Chose one dataset of your choise from the list of all available datasets in the package:

data(package = "SMSdata")

There are 21 different datasets and you can load each of them by typing and running data(dataset_name), for instance:

data(plasma)

Choose one dataset where you can assume at least two differen groups. Calculate the estimates for the mean vector in each group and apply the statistical test, that both mean vectors are the same.
Choose one group only, and define a reasonable null vector (\(\boldsymbol{\mu}_{0}\)) for the true mean vector (\(\boldsymbol{\mu}\)) and test the null hypothesis that \(\boldsymbol{\mu}\) is equal to \(\boldsymbol{\mu}_{0}\). Provide the corresponding \(95~\%\) confidence region and interpret the results.