# NMST539 | Lab Session 3

## Multivariate Normal Distribution

### LS 2017 | Monday 05/03/18

###### Rmd file (UTF8 coding)

A user-friendly interface (one of many): RStudio.

Manuals and introductions to R (in Czech or English):

• Bína, V., Komárek, A. and Komárková, L.: Jak na jazyk R. (PDF file)
• Komárek, A.: Základy práce s R. (PDF file)
• Kulich, M.: Velmi stručný úvod do R. (PDF file)
• De Vries, A. and Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)

#### 1. Conditional Normal Distribution

Let us consider a two-dimensional normal distribution of some random vector $$\Big(\begin{array}{c}X_{1}\\X_{2}\end{array}\Big)$$. The corresponding distribution is usually denoted as

$$\Big(\begin{array}{c}X_{1}\\X_{2}\end{array}\Big) \sim N_{2}\left(\boldsymbol{\mu} = \Big(\begin{array}{c} \mu_{1} \\ \mu_{2}\end{array}\Big), \Sigma = \left( \begin{array}{cc} \sigma_{1}^{2} & \sigma_{12} \\\sigma_{21} & \sigma_{2}^{2} \end{array} \right) \right)$$,

where $$\boldsymbol{\mu} \in \mathbb{R}^2$$ is the vector of expected values and $$\Sigma$$ is the variance-covariance matrix, which is positive definite and symmetric, thus $$\sigma_{12} = \sigma_{21}$$. The corresponding density function (of the two-dimensional normal distribution) is given by the expression

$$\large{f(\boldsymbol{x}) = \frac{1}{2 \pi |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \Big\},}$$

for an arbitrary $$\boldsymbol{x} = (x_{1}, x_{2})^{\top} \in \mathbb{R}^{2}$$.
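The density formula can be evaluated directly in base R. The following is a minimal sketch: `dbvnorm` is a hypothetical helper name (not a function from any package), and the values of `mu` and `Sigma` are illustrative choices. If the mvtnorm package happens to be installed, the result is compared with `dmvnorm()`.

```r
# Bivariate normal density written directly from the formula above.
# `dbvnorm` is a hypothetical helper name, not part of mvtnorm.
dbvnorm <- function(x, mu, Sigma) {
  d <- x - mu                                # deviation from the mean vector
  q <- drop(t(d) %*% solve(Sigma) %*% d)     # quadratic form (x - mu)' Sigma^{-1} (x - mu)
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

mu    <- c(0, 0)                                  # illustrative mean vector
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)      # illustrative covariance matrix

f_at_mean <- dbvnorm(c(0, 0), mu, Sigma)     # quadratic form is 0 here: 1 / (2 * pi * sqrt(det(Sigma)))
f_at_pt   <- dbvnorm(c(0.7, 0.5), mu, Sigma)

# Optional cross-check, only run when mvtnorm is available
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  print(all.equal(f_at_pt, mvtnorm::dmvnorm(c(0.7, 0.5), mean = mu, sigma = Sigma)))
}
```

At the mean the exponent vanishes, so the density reduces to the normalising constant $$1 / (2\pi|\Sigma|^{1/2})$$.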

This density can be used to derive the marginal distributions of the random variables $$X_{1}$$ and $$X_{2}$$ or the conditional distribution of $$X_{1}$$ given $$X_{2}$$ (or $$X_{2}$$ given $$X_{1}$$, respectively). In the following we will do both.

• For the marginal density of $$X_{1}$$ we need $$f(x_{1}) = \int_{\mathbb{R}} f(x_{1}, x_{2}) \mbox{d}x_{2}$$, and analogously for the marginal density of $$X_{2}$$, where we integrate the joint density with respect to the first variable instead. Both marginals are again normally distributed and it holds that
$$X_{1} \sim N(\mu_1, \sigma_1^2)~~~$$ and $$~~~X_{2} \sim N(\mu_2, \sigma_2^2)$$.

• In the two-dimensional normal case, the conditional distribution of $$X_{2}$$ given $$X_{1} = x_{1}$$ is again normal, and it holds that (analogously for the distribution of $$X_{1}$$ given $$X_{2}$$)

$$(X_{2} | X_{1 } = x_{1}) \sim N\Big(\mu_{2} + \frac{\sigma_{21}(x_1 - \mu_1)}{\sigma_{1}^2}, \sigma_{2}^2 - \frac{\sigma_{12}\sigma_{21}}{\sigma_{1}^2}\Big).$$
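One way to see where the conditional formula comes from is the following standard argument, given here as a sketch. The auxiliary variable $$Z$$ is introduced only for this derivation. Define

$$Z = X_{2} - \frac{\sigma_{21}}{\sigma_{1}^{2}} X_{1}, \qquad \mathrm{Cov}(Z, X_{1}) = \sigma_{21} - \frac{\sigma_{21}}{\sigma_{1}^{2}} \sigma_{1}^{2} = 0.$$

Since $$Z$$ and $$X_{1}$$ are jointly normal and uncorrelated, they are independent. Conditionally on $$X_{1} = x_{1}$$, the variable $$X_{2} = Z + \frac{\sigma_{21}}{\sigma_{1}^{2}} x_{1}$$ is therefore just $$Z$$ shifted by a constant, which gives

$$E(X_{2} | X_{1} = x_{1}) = \mu_{2} - \frac{\sigma_{21}}{\sigma_{1}^{2}} \mu_{1} + \frac{\sigma_{21}}{\sigma_{1}^{2}} x_{1} = \mu_{2} + \frac{\sigma_{21}(x_{1} - \mu_{1})}{\sigma_{1}^{2}},$$

$$\mathrm{Var}(X_{2} | X_{1} = x_{1}) = \mathrm{Var}(Z) = \sigma_{2}^{2} - 2 \frac{\sigma_{21}}{\sigma_{1}^{2}} \sigma_{12} + \frac{\sigma_{21}^{2}}{\sigma_{1}^{4}} \sigma_{1}^{2} = \sigma_{2}^{2} - \frac{\sigma_{12}\sigma_{21}}{\sigma_{1}^{2}},$$

where the last equality uses $$\sigma_{12} = \sigma_{21}$$. Note that the conditional variance does not depend on the value $$x_{1}$$.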

Now we can apply the formulas given above to obtain the marginal and conditional distributions. We will use the R library mvtnorm (which must be installed first, for example with install.packages("mvtnorm")). The library is loaded into the working environment by running the command

library("mvtnorm")

Let us consider a simple example of a two-dimensional normal distribution with the zero mean vector $$\boldsymbol{\mu} = (0,0)^\top$$ and the variance-covariance matrix $$\Sigma = \left( \begin{array}{cc} 1 & 0.8 \\0.8& 1\end{array} \right)$$. We would like to calculate the conditional distribution of $$X_{2}$$ given $$X_{1} = 0.7$$.
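Plugging these values into the conditional formula can be done by hand or with a few lines of base R. The sketch below uses a hypothetical helper, `cond_x2_given_x1`, which is not part of mvtnorm and is named only for this illustration.

```r
# Parameters of X2 | X1 = x1 for a bivariate normal, from the conditional formula.
# `cond_x2_given_x1` is a hypothetical helper name used only in this sketch.
cond_x2_given_x1 <- function(mu, Sigma, x1) {
  m <- mu[2] + Sigma[2, 1] * (x1 - mu[1]) / Sigma[1, 1]        # conditional mean
  v <- Sigma[2, 2] - Sigma[1, 2] * Sigma[2, 1] / Sigma[1, 1]   # conditional variance
  list(mean = m, var = v, sd = sqrt(v))
}

mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
cond  <- cond_x2_given_x1(mu, Sigma, x1 = 0.7)

cond$mean   # 0.8 * 0.7 = 0.56
cond$sd     # sqrt(1 - 0.8^2) = 0.6
```

So for this example $$(X_{2} | X_{1} = 0.7) \sim N(0.56, 0.36)$$: the conditional mean is pulled towards the observed value of $$X_{1}$$ and the conditional variance is smaller than the marginal variance $$\sigma_{2}^{2} = 1$$.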

#### Do by Yourselves

• Is there any linear relationship between the covariates $$X_1$$ and $$X_2$$? Can you quantitatively express how strong this relationship is?

• In terms of the linear regression modelling approach: imagine you obtain a sample from the given two-dimensional normal distribution and you fit a simple regression line to the data. What parameter estimates do you expect to obtain when fitting the linear regression model?
Try the following:

n <- 100
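## draw n observations from N2(0, Sigma); the name 'smpl' avoids masking base::sample()
smpl <- rmvnorm(n, mean = c(0, 0), sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2))
## regress the first coordinate on the second
summary(lm(smpl[, 1] ~ smpl[, 2]))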
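One way to check an answer to the questions above is to compare the fitted values with their theoretical counterparts. For this distribution the regression slope should be near $$\sigma_{12} / \sigma_{2}^{2} = 0.8$$, the intercept near 0, the residual variance near $$1 - 0.8^2 = 0.36$$, and the sample correlation near 0.8. The sketch below makes two assumptions that differ from the code above: it uses a larger sample size, and, to stay within base R, it draws the sample through a Cholesky factor instead of `rmvnorm()`. The seed is arbitrary.

```r
set.seed(539)                                 # arbitrary seed, for reproducibility only
n     <- 10000                                # larger than n = 100, so estimates are tighter
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

# Base-R substitute for rmvnorm(): with R = chol(Sigma) we have t(R) %*% R = Sigma,
# so the rows of Z %*% R have covariance matrix Sigma.
Z  <- matrix(rnorm(2 * n), ncol = 2)
xy <- Z %*% chol(Sigma)

fit       <- lm(xy[, 1] ~ xy[, 2])            # first coordinate regressed on the second
intercept <- unname(coef(fit)[1])             # theoretical value: 0
slope     <- unname(coef(fit)[2])             # theoretical value: 0.8
res_var   <- summary(fit)$sigma^2             # theoretical value: 1 - 0.8^2 = 0.36
r         <- cor(xy[, 1], xy[, 2])            # theoretical value: 0.8

round(c(intercept = intercept, slope = slope, res_var = res_var, cor = r), 3)
```

Because both variances equal 1 here, the slope and the correlation coincide. In general the slope is the correlation scaled by the ratio of the standard deviations.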

Use the following R code to compare the joint distribution, the marginal distributions and the conditional distribution of $$X_{2}$$ given $$X_{1} = 0.7$$. Derive the theoretical expressions for the marginal distributions and the conditional distribution.

Sigma <- matrix(c(1,.8,.8,1), nrow=2) ## variance-covariance matrix

x <- seq(-3,3,0.01)
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}), col = "blue")

abline(v=.7, lwd=2, lty=2, col = "red")
text(0.75, -2, labels=expression(x[1]==0.7), col = "red", pos = 4)

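### conditional distribution of X2 | X1 = 0.7
### (mean 0.8 * 0.7 and variance 1 - 0.8^2, from the conditional formula)
y <- dnorm(x, mean =  0.8 * 0.7, sd = sqrt(1 - 0.8^2))
### density drawn sideways, starting from the left edge of the plot (min(x))
lines(y-abs(min(x)),x,lty=2,lwd=2, col = "red")

### marginals of X1 (along the bottom edge) and X2 (along the left edge)
m1 <- m2 <- dnorm(x, 0, 1)
lines(x, m1 - abs(min(x)), lty = 1, lwd = 2, col = "gray30")
lines(m2 - abs(min(x)), x, lty = 1, lwd = 2, col = "gray30")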
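The conditional distribution can also be checked empirically: simulate many observations, keep those whose first coordinate falls close to 0.7, and look at the second coordinate of the retained points. The sketch below is an approximation, not an exact method. The window half-width 0.05, the sample size and the seed are arbitrary assumptions, and the sample is drawn with a Cholesky factor in base R rather than with `rmvnorm()`.

```r
set.seed(539)                                  # arbitrary seed, for reproducibility only
n     <- 200000                                # large, so that enough points fall in the window
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)

# Rows of Z %*% chol(Sigma) have covariance Sigma (base-R substitute for rmvnorm())
xy <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)

near    <- abs(xy[, 1] - 0.7) < 0.05           # assumed window: X1 within 0.05 of 0.7
x2_near <- xy[near, 2]

n_near        <- sum(near)                     # number of retained observations
cond_mean_hat <- mean(x2_near)                 # compare with 0.8 * 0.7 = 0.56
cond_sd_hat   <- sd(x2_near)                   # compare with sqrt(1 - 0.8^2) = 0.6

round(c(n_near = n_near, mean = cond_mean_hat, sd = cond_sd_hat), 3)
```

A narrower window reduces the approximation error from conditioning on an interval instead of a single point, but it leaves fewer observations and hence noisier estimates.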

The conditional distribution can be obtained for any value $$X_{1} = x_1$$. For instance, we obtain the conditional distribution of $$X_{2} | X_{1} = -1$$:

contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}), col = "blue")
abline(v=-1, lwd=2, lty=2, col = "red")

### conditional distribution of X2 | X1 = -1
y2 <- dnorm(x, mean = 0.8 * (- 1), sd = sqrt(1 - 0.8^2))
### density mirrored and drawn from the right edge of the plot (max(x))
lines(-y2 + max(x),x,lty=2,lwd=2, col = "red")