NMST539  Lab Session 3
Multivariate Normal Distribution
(marginal and conditional distributions)
LS 2017  Monday 05/03/18
The R software is available for download from the website: https://www.r-project.org
A user-friendly interface (one of many): RStudio.
Manuals and introductions to R (in Czech or English):

Bína, V., Komárek, A. and Komárková, L.: Jak na jazyk R. (PDF file)

Komárek, A.: Základy práce s R. (PDF file)

Kulich, M.: Velmi stručný úvod do R. (PDF file)

De Vries, A. and Meys, J.: R for Dummies. (ISBN-13: 9781119055808)
1. Conditional Normal Distribution
Let us consider a two-dimensional normal distribution of some random vector \(\Big(\begin{array}{c}X_{1}\\X_{2}\end{array}\Big)\). The corresponding distribution is usually denoted as
\(\Big(\begin{array}{c}X_{1}\\X_{2}\end{array}\Big) \sim N_{2}\left(\boldsymbol{\mu} = \Big(\begin{array}{c} \mu_{1} \\ \mu_{2}\end{array}\Big), \Sigma = \left( \begin{array}{cc} \sigma_{1}^{2} & \sigma_{12} \\\sigma_{21} & \sigma_{2}^{2} \end{array} \right) \right)\),
where \(\boldsymbol{\mu} \in \mathbb{R}^2\) is the vector of expected values and \(\Sigma\) is the variance-covariance matrix, which is positive definite and symmetric, thus \(\sigma_{12} = \sigma_{21}\). The corresponding density function (of the two-dimensional normal distribution) is given by the expression
\(\large{f(\boldsymbol{x}) = \frac{1}{2 \pi |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \Big\},}\)
for an arbitrary \(\boldsymbol{x} = (x_{1}, x_{2})^{\top} \in \mathbb{R}^{2}\).
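The density can also be written out directly in base R. The following is only a minimal sketch: the helper name dbvnorm and the values of mu and Sigma are assumptions made for illustration, not part of any package.

```r
# Bivariate normal density, coded directly from the formula for f(x).
# dbvnorm is a hypothetical helper name, not a function from any package.
dbvnorm <- function(x, mu, Sigma) {
  d <- x - mu                               # deviation from the mean vector
  q <- drop(t(d) %*% solve(Sigma) %*% d)    # (x - mu)' Sigma^{-1} (x - mu)
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

# Illustrative parameter values (assumed for this example)
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
dbvnorm(c(0.7, 0.5), mu, Sigma)

# Sanity check: with an identity covariance matrix the density factorises
# into the product of two standard normal densities.
all.equal(dbvnorm(c(0.7, 0.5), mu, diag(2)), dnorm(0.7) * dnorm(0.5))
```

In practice a packaged implementation is preferable; the point of the sketch is only to show how each term of the formula maps to code.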
This density can be used to derive the marginal distributions of the random variables \(X_{1}\) and \(X_{2}\), or the conditional distribution of \(X_{1}\) given \(X_{2}\) (or of \(X_{2}\) given \(X_{1}\), respectively). In the following we will do both.

For the marginal density of \(X_{1}\) we need to obtain \(f(x_{1}) = \int_{\mathbb{R}} f(x_{1}, x_{2}) \mbox{d}x_{2}\), and analogously for the marginal density of \(X_{2}\), where we integrate the joint density with respect to the first variable instead. Both marginals are again normally distributed and it holds that
\(X_{1} \sim N(\mu_1, \sigma_1^2)~~~\) and \(~~~X_{2} \sim N(\mu_2, \sigma_2^2)\).
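The marginal result can be checked numerically by integrating the joint density over \(x_2\) at a fixed \(x_1\). The sketch below uses base R only; the helper joint_density and the parameter values are assumptions chosen for illustration.

```r
# Illustrative parameters (assumed for this example)
mu    <- c(1, -2)
Sigma <- matrix(c(2, 0.6, 0.6, 1), nrow = 2)
Sinv  <- solve(Sigma)

# Joint density of (X1, X2), vectorised in x2 so that integrate() can use it.
# joint_density is a hypothetical helper name.
joint_density <- function(x1, x2) {
  d1 <- x1 - mu[1]
  d2 <- x2 - mu[2]
  q  <- Sinv[1, 1] * d1^2 + 2 * Sinv[1, 2] * d1 * d2 + Sinv[2, 2] * d2^2
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

# Integrate x2 out at a fixed x1 and compare with the N(mu1, sigma1^2) density
x1 <- 0.3
numeric_marginal <- integrate(function(x2) joint_density(x1, x2), -Inf, Inf)$value
exact_marginal   <- dnorm(x1, mean = mu[1], sd = sqrt(Sigma[1, 1]))
c(numeric = numeric_marginal, exact = exact_marginal)
```

The two values should agree up to the tolerance of the numerical integration.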

For a two-dimensional normal distribution, the conditional distribution of \(X_{2}\) given \(X_{1} = x_{1}\) is again normal and it holds that (analogously also for the distribution of \(X_{1}\) given \(X_{2}\))
\((X_{2} | X_{1} = x_{1}) \sim N\Big(\mu_{2} + \frac{\sigma_{21}(x_1 - \mu_1)}{\sigma_{1}^2}, \sigma_{2}^2 - \frac{\sigma_{12}\sigma_{21}}{\sigma_{1}^2}\Big).\)
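The conditional mean and variance translate into a few lines of R. This is a sketch under stated assumptions: cond_x2_given_x1 is a hypothetical helper name and the parameter values are chosen only for illustration.

```r
# Mean and variance of X2 | X1 = x1 for a bivariate normal vector.
# cond_x2_given_x1 is a hypothetical helper name used only for illustration.
cond_x2_given_x1 <- function(x1, mu, Sigma) {
  cond_mean <- mu[2] + Sigma[2, 1] * (x1 - mu[1]) / Sigma[1, 1]
  cond_var  <- Sigma[2, 2] - Sigma[1, 2] * Sigma[2, 1] / Sigma[1, 1]
  c(mean = cond_mean, var = cond_var)
}

# Illustrative parameter values (assumed for this example)
mu    <- c(1, -2)
Sigma <- matrix(c(2, 0.6, 0.6, 1), nrow = 2)
res   <- cond_x2_given_x1(x1 = 0.3, mu = mu, Sigma = Sigma)
res
```

Note that only the conditional mean depends on the conditioning value \(x_1\); the conditional variance is the same for every \(x_1\).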
Now we can apply the formulas given above to obtain the marginal and conditional distributions. We will use the R library mvtnorm, which must first be installed (for instance with install.packages("mvtnorm")). The library is loaded into the working environment by running the command
library("mvtnorm")
Let us consider a simple example of a two-dimensional normal distribution with the zero mean vector \(\boldsymbol{\mu} = (0,0)^\top\) and the variance-covariance matrix \(\Sigma = \left( \begin{array}{cc} 1 & 0.8 \\0.8& 1\end{array} \right)\). We would like to calculate the conditional distribution of \(X_{2}\) given \(X_{1} = 0.7\).
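As a quick worked check, substituting \(\mu_1 = \mu_2 = 0\), \(\sigma_1^2 = \sigma_2^2 = 1\), \(\sigma_{12} = \sigma_{21} = 0.8\) and \(x_1 = 0.7\) into the expressions for the conditional mean and variance gives
\(\mu_{2} + \frac{\sigma_{21}(x_1 - \mu_1)}{\sigma_{1}^2} = 0 + \frac{0.8 \cdot (0.7 - 0)}{1} = 0.56~~~\) and \(~~~\sigma_{2}^2 - \frac{\sigma_{12}\sigma_{21}}{\sigma_{1}^2} = 1 - \frac{0.8 \cdot 0.8}{1} = 0.36\),
so that \((X_{2} | X_{1} = 0.7) \sim N(0.56, 0.36)\), i.e., a conditional standard deviation of \(0.6\).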
Do it by Yourselves

Is there any linear relationship between the covariates \(X_1\) and \(X_2\)? Can you quantitatively express how strong this relationship is?

In terms of the linear regression modelling approach: imagine you obtain a sample from the given two dimensional normal distribution and you fit a simple regression line to the data. Do you have some expectation about the parameter estimates you obtain when fitting the linear regression model? Try the following:
n <- 100
sample <- rmvnorm(n, c(0, 0), matrix(c(1, 0.8, 0.8, 1),2,2))
summary(lm(sample[,1] ~ sample[,2]))
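For comparison with whatever estimates the fit returns, the sketch below repeats the experiment with a larger sample and puts the estimates next to their theoretical counterparts. It is an illustration under assumptions: it uses base R only (a Cholesky factor instead of rmvnorm), and the seed and sample size are arbitrary choices.

```r
set.seed(539)                               # arbitrary seed, for reproducibility
n     <- 10000                              # larger than in the exercise, to reduce noise
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)

# Draw from N2(0, Sigma) with base R: if the rows of Z are independent N(0, 1)
# pairs and Sigma = R'R (R from chol()), the rows of Z %*% R have covariance Sigma.
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z %*% chol(Sigma)

fit <- lm(X[, 1] ~ X[, 2])

# Theoretical values for regressing X1 on X2:
#   intercept   mu1 - slope * mu2                   = 0
#   slope       sigma12 / sigma2^2                  = 0.8
#   residual sd sqrt(sigma1^2 - sigma12^2/sigma2^2) = 0.6
comparison <- rbind(estimated   = c(coef(fit), sigma = summary(fit)$sigma),
                    theoretical = c(0, 0.8, 0.6))
comparison
```

With unit variances the slope coincides with the correlation coefficient, which is why the strength of the linear relationship and the regression slope are both 0.8 here.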
Use the following piece of R code to compare the joint distribution, the marginal distributions and the conditional distribution of \(X_{2}\) given \(X_{1} = 0.7\). Derive the theoretical expressions for the marginal distributions and the conditional distribution.
Sigma <- matrix(c(1,.8,.8,1), nrow=2) ## variance-covariance matrix
x <- seq(-3,3,0.01)
### contour plot of the joint density
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}), col = "blue")
abline(v=.7, lwd=2, lty=2, col = "red")
text(0.75, 2, labels=expression(x[1]==0.7), col = "red", pos = 4)
### conditional distribution of X2 | X1 = 0.7
y <- dnorm(x, mean = 0.8 * 0.7, sd = sqrt(1 - 0.8^2))
### shifted by abs(min(x)) so that the curve starts at the left edge of the plot
lines(y - abs(min(x)),x,lty=2,lwd=2, col = "red")
### marginals
m1 <- m2 <- dnorm(x, 0, 1)
lines(x, m1 - abs(min(x)), lty = 1, lwd = 2, col = "gray30")
lines(m2 - abs(min(x)), x, lty = 1, lwd = 2, col = "gray30")
The conditional distribution can be obtained for any value \(X_{1} = x_1\); for instance, we obtain the conditional distribution of \(X_{2} | X_{1} = -1\):
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),sigma=Sigma)}), col = "blue")
abline(v=-1, lwd=2, lty=2, col = "red")
### conditional distribution of X2 | X1 = -1
y2 <- dnorm(x, mean = 0.8 * (-1), sd = sqrt(1 - 0.8^2))
### mirrored so that the curve is drawn inwards from the right edge of the plot
lines(-y2 + max(x),x,lty=2,lwd=2, col = "red")
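The curve plotted as the conditional density can be cross-checked against the definition \(f(x_2 | x_1) = f(x_1, x_2) / f(x_1)\). The following base R sketch assumes the same \(\Sigma\) (unit variances, covariance 0.8); the variable names are chosen for illustration and any conditioning value could be used in place of -1.

```r
rho <- 0.8                                  # sigma12 for unit variances
x1  <- -1                                   # conditioning value (any value works)
x2  <- seq(-3, 3, 0.01)

# Joint N2(0, Sigma) density with unit variances and covariance rho
joint <- exp(-(x1^2 - 2 * rho * x1 * x2 + x2^2) / (2 * (1 - rho^2))) /
  (2 * pi * sqrt(1 - rho^2))

ratio       <- joint / dnorm(x1)            # f(x1, x2) / f(x1)
closed_form <- dnorm(x2, mean = rho * x1, sd = sqrt(1 - rho^2))
max_diff    <- max(abs(ratio - closed_form))
max_diff                                    # numerically zero
```

The difference is at the level of floating-point rounding, which confirms that the closed-form normal density and the ratio of joint to marginal density describe the same curve.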
