NMST539, LS 2015/16
Cvičenie 3 (týždeň 4)
Multivariate normal distribution)
(marginal and conditional distributions)
Let us firstly recall the two-dimensional normal distribution of some random vector \(\Big(\begin{array}{x}X_{1}\\X_{2}\end{array}\Big)\) denoted as
\(\Big(\begin{array}{x}X_{1}\\X_{2}\end{array}\Big) \sim N_{2}\left(\boldsymbol{\mu} = \Big(\begin{array}{c} \mu_{1} \\ \mu_{2}\end{array}\Big), \Sigma = \left( \begin{array}{cc} \sigma_{1}^{2} & \sigma_{21} \\\sigma_{12} & \sigma_{2}^{2} \end{array} \right) \right)\)
where \(\boldsymbol{\mu}\) is the expectation vector and \(\Sigma\) is the variance-covariance matrix, which is a positive definite and symmetric. The correspoding density function (of the two dimensional normal distrubution) is given by the expression
\(\large{f(\boldsymbol{x}) = \frac{1}{2 \pi |\Sigma|^{1/2}} exp\Big\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \Big\},}\)
for \(\boldsymbol{x} = (x_{1}, x_{2})^{\top} \in \mathbb{R}^{2}\). It is commonly useful to be able to derive marginal distrubution of random variables \(X_{1}\) and \(X_{2}\) or the conditional distrubution of \(X_{1}\) given \(X_{2}\) (or \(X_{2}\) given \(X_{1}\) respectively).
library("mvtnorm")
For a simple example with a two dimensional normal distribution we draw marginal densities and a conditional density of \(X_{2}\) given \(X_{1} = 0.7\).
Sigma <- matrix(c(1,.8,.8,1), nrow=2) ## variance-covariance matrix
x <- seq(-3,3,0.01)
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}))
abline(v=.7, lwd=2, lty=2, col = "red")
text(0.75, -2, labels=expression(x[1]==0.7), col = "red", pos = 4)
### conditional distribution of X2 | X1 = 0.7
y <- dnorm(x, mean = 0.8 * 0.7, sd = sqrt(1 - 0.8^2))
lines(y-abs(min(x)),x,lty=2,lwd=2, col = "red")
### marginals
m1 <- m2 <- dnorm(x, 0, 1)
lines(x, m1 - abs(min(x)), lty = 1, lwd = 1)
lines(m2 - abs(min(x)), x, lty = 1, lwd = 1)

And we can add some other conditianal distribution of (e.g. for \(X_{2} | X_{1} = -1\)) by adding additional lines of an R code:
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}))
abline(v=-1, lwd=2, lty=2, col = "blue")
### conditional distribution of X2 | X1 = - 1
y2 <- dnorm(x, mean = 0.8 * (- 1), sd = sqrt(1 - 0.8^2))
lines(-y2 + max(x),x,lty=2,lwd=2, col = "blue")

Another option:
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}))
abline(v=-1, lwd=1, lty=1, col = "blue")
lines(y2 - 1 ,x,lty=2,lwd=2, col = "blue")

And even more generalized version for different values of \(x_{1} \in \mathbb{R}\):
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}))
condN <- function(x, cx){dnorm(x, mean = 0.8 * cx, sd = sqrt(1 - 0.8^2))}
for (i in seq(-2, 2, by = 0.25)){col <- colors()[grep("red",colors())][4*i + 9];
lines(condN(x, i) + i, x, lwd = 2, col = col);
abline(v = i, lwd=1, lty=1, col = col)}

Conditional multivariate normal with ‘condMVNorm()’
There is a nice (and quite brief) R library available from the R cran mirrors which can be used to calculate the conditional distribution of some random vector \(\boldsymbol{X}_{1}\) given another random vector (value) \(\boldsymbol{X}_{2} = \boldsymbol{x}_{2}\) where the joint distribution of \((\boldsymbol{X}_{1}^{\top}, \boldsymbol{X}_{2}^{\top})^{\top}\) is multivariate normal (with the corresponding dimension). The library can be installed by using command install.packages("condMVNorm") .
library("condMVNorm")
## Warning: package 'condMVNorm' was built under R version 3.1.2
Functions dcmvnorm() and condMVN can be directly used to obtain the conditional distribution of interest (see the help session for more details - ?dcmvnorm and ?condMVN ).
We apply these functions to the previous example with a two dimensional normal distribution with a zero mean vector and \(\Sigma = \left( \begin{array}{cc} 1 & 0.8 \\ 0.8 & 1 \end{array} \right)\). We want to obtain the conditional distribution of \(X_{2}\) given \(X_1 = 0.7\).
condDist <- condMVN(mean = c(0,0), sigma = Sigma, dependent.ind = 2, given.ind = 1, X.given = 0.7)
and also
dcmvnorm(0, mean = c(0,0), sigma = Sigma, dependent.ind = 2, given.ind = 1, X.given = 0.7)
## [1] 0.4301297
which should be compared with
dnorm(0, condDist$condMean, sqrt(condDist$condVar))
## [1] 0.4301297
Conditional normal distribution example
In this example we consider a dataset mtcars which is standardly available in the R environment (for a more detailed description see the help session: ?mtcars ).
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
attach(mtcars)
Let us assume that covariates ‘mpg’ (miles per gallon), ‘hp’ (horse power) and ‘disp’ (displacement) are all normaly distributed and the triple \((X_{1}, X_{2}, X_{3})^{\top} = (mpg, hp, disp)^{\top}\) follows a three dimensional normal distribution with some unknown mean vector \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2}, \mu_{3})^{\top} \in \mathbb{R}^{3}\) and some positive definite and symmetric variance-covariance matrix \(\Sigma\).

What is the (estimated) distribution if the miles per gallon (‘mpg’) given the horse power and displacement, where \((X_{2}, X_{3})^{\top} = (120,120)^{\top}\)?
The corresponding estimates can be obtained by
mu <- apply(cbind(mpg, hp, disp), 2, mean)
Sigma <- cov(cbind(mpg, hp, disp))
Thus, the corresponding conditional mean and variance are obtained as
condMean <- mu[1] + Sigma[1, 2:3] %*% solve(Sigma[2:3,2:3]) %*% (c(120, 120) - mu[2:3])
condSigma <- Sigma[1,1] - Sigma[1, 2:3] %*% solve(Sigma[2:3,2:3]) %*% Sigma[2:3, 1]
Which gives us the following distribution function:
condMean
## [,1]
## [1,] 24.11354
condSigma
## [,1]
## [1,] 9.14495
plot(dnorm(x <- seq(0, 35, length = 500), mean = condMean, sd = sqrt(condSigma)) ~ x,
xlab = "Miles per gallon", ylab = "Conditional Density",
type = "l", lwd = 2, col = "red",
xlim = c(0, 35), ylim = c(0,0.18))
lines(dnorm(x, mean(mpg), sd(mpg)) ~ x, lwd = 1)
legend(0, 0.175, legend = c("Conditional distribution: MPG | HP = 120 & Disp = 120",
"Estimated Normal Distribution of MPG"), col = c("red", "black"), lty = c(1,1))

Domáca úloha
(Deadline: piaty týždeň / 21.03.2016)
Na webovej stránke Doc. Hlávku
http://www1.karlin.mff.cuni.cz/~hlavka/teac.html
je k dispozícii niekoľko dátových súborov, ktoré do Rka stačí načítať pomocou príkazu load(nazov_suboru.rda) .
-
vyberte si jeden dátový súbor, u ktorého je možne predpokladať aspoň dve premenné, ktoré by mohli mať normálne rozdelenie (napr. odhadnúť pomocou histogramu) načítajte ho v Rku;
-
do již vytvoreného
Rnw súboru, pridajte aspoň jeden obrázok podobný s tými, ktoré sme vytvárali na začiatku cvičenia;
-
príslušný obrázok by mal obsahovať kontúry, marginálne hustory, podmienenú hustotu a vhodný popis;
|
Comment
Keep in minds that some functions in R require standard deviation to be specified when working with the normal distribution (e.g. functions
dnorm()
,pnorm()
,qnorm()
, etc.) while others need the variance-covariance matrix (variance in one dimension) to be provided instead.