NMST539 | Lab Session 2
Multivariate Normal Distribution
LS 2017 | Thursday 02/03/17
Rmd file (UTF8 coding)

In the second NMST539 lab session we will explore some tools, available within the R software installation, which are meant for working with the multivariate normal distribution. R is available for download from the website https://www.r-project.org. A user-friendly interface (one of many) is RStudio. Manuals and introductions to R (in Czech or English):
R package ‘mvtnorm’

To begin, we need to make sure that the library for the multivariate normal distribution (
We can now use the loaded library to easily generate random values from a multivariate normal distribution with a pre-specified variance-covariance matrix \(\Sigma\). For more details about the library (including the manual) see the library website on CRAN:

Comment

Whenever we are about to use a random number generator in R (or in any other software), it is good practice to set the initial values of the generator beforehand (in R there is a command called

For instance, try (several times in a row) the command:
and compare it with the same command, but with the initial setting of the generator done by the
Is it clear what the difference between these two outputs is?
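The effect of fixing the generator can be illustrated with a minimal sketch. It assumes the base R functions `rnorm()` and `set.seed()`; the seed value 1234 and the sample size 5 are arbitrary choices made for this illustration.

```r
# Without a fixed seed, two consecutive calls give different draws.
x1 <- rnorm(5)
x2 <- rnorm(5)

# With the same seed set before each call, the draws are identical.
set.seed(1234)  # arbitrary seed (assumption of this sketch)
y1 <- rnorm(5)
set.seed(1234)
y2 <- rnorm(5)

identical(x1, x2)  # two independent draws
identical(y1, y2)  # the same draw, reproduced
```

Setting the seed once at the top of a script is usually enough to make the whole analysis reproducible.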
To begin, let us start with a simple two-dimensional example: we will generate a random sample of size \(n \in \mathbb{N}\) from the two-dimensional normal distribution \(N_{2}\Big(\boldsymbol{\mu} = \Big(\begin{array}{c}2\\3\end{array}\Big), \Sigma = \Big(\begin{array}{cc}10^2 & 6^2\\ 6^2 & 6^2\end{array}\Big) \Big)\). Remember that in the case of the one-dimensional random generator for the normal distribution in R (command

The sample of size \(n=1000\) from the given two-dimensional normal distribution can be generated by the following command:
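A minimal sketch of such a call, assuming the `rmvnorm()` function from the `mvtnorm` package (taken to be installed already) and an arbitrarily chosen seed; the object names `mu`, `Sigma` and `s` are illustrative only:

```r
library(mvtnorm)  # assumed to be installed

set.seed(1234)  # arbitrary seed, for reproducibility only

n <- 1000
mu <- c(2, 3)
# rmvnorm() expects the variance-covariance matrix,
# whereas the one-dimensional rnorm() expects a standard deviation.
Sigma <- matrix(c(10^2, 6^2,
                  6^2,  6^2), nrow = 2, byrow = TRUE)

s <- rmvnorm(n, mean = mu, sigma = Sigma)
dim(s)  # n rows (observations), 2 columns (coordinates)
```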
We can plot the generated random sample in a scatterplot using, for instance, the command
Using the mean estimates and the sample variance-covariance matrix, we can easily draw some ‘depth’ contours in the figure. Another option, of course, is to use the theoretical values, since we know the distribution we generated from (which is usually not the case in real-life situations). In the following, however, we will use the sample and calculate the sample mean vector and the sample variance-covariance matrix.
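The two sample quantities can be obtained, for example, as in the sketch below. It assumes a sample generated by `rmvnorm()` from `mvtnorm` with an arbitrary seed; `m_hat` and `S_hat` are illustrative names.

```r
library(mvtnorm)  # assumed to be installed

set.seed(1234)  # arbitrary seed
Sigma <- matrix(c(100, 36,
                  36,  36), nrow = 2, byrow = TRUE)
s <- rmvnorm(1000, mean = c(2, 3), sigma = Sigma)

m_hat <- colMeans(s)  # sample mean vector (one mean per column)
S_hat <- cov(s)       # sample variance-covariance matrix (divisor n - 1)

m_hat
S_hat
```

With 1000 observations both estimates should be reasonably close to the theoretical values \(\boldsymbol{\mu}\) and \(\Sigma\).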
The two-dimensional normal density function is given by \(f(\boldsymbol{x}) = f(x_{1}, x_{2}) = \frac{1}{2 \pi |\Sigma|^{1/2}} \exp \left\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \right\}\), where \(\boldsymbol{x} = (x_{1}, x_{2})^\top \in \mathbb{R}^2\) and \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2})^\top\) is the mean vector. Thus, we also need the inverse of the sample variance-covariance matrix \(\widehat{\Sigma}\) to be able to draw the corresponding contours. The R function
This can be used as follows:
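As a minimal sketch, assuming the base R function `solve()` is used for the inversion and taking the theoretical \(\Sigma\) as a stand-in for \(\widehat{\Sigma}\), the inverse and the density formula above can be coded as follows. The helper `dens2()` is a hypothetical name introduced only for this illustration.

```r
# Stand-in for the sample covariance matrix (here: the theoretical Sigma).
Sigma <- matrix(c(100, 36,
                  36,  36), nrow = 2, byrow = TRUE)
mu <- c(2, 3)

Sigma_inv <- solve(Sigma)  # matrix inverse

# Bivariate normal density evaluated at a single point x (length-2 vector).
dens2 <- function(x, mu, Sigma_inv, det_Sigma) {
  d <- x - mu
  q <- as.numeric(t(d) %*% Sigma_inv %*% d)  # quadratic form
  exp(-q / 2) / (2 * pi * sqrt(det_Sigma))
}

dens2(c(2, 3), mu, Sigma_inv, det(Sigma))  # density at the mean
round(Sigma %*% Sigma_inv, 10)             # should be the identity matrix
```

At the mean the quadratic form vanishes, so the value reduces to \(1 / (2 \pi |\Sigma|^{1/2})\), which gives a quick check of the implementation.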
The final plot is produced by the following commands:
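One possible way to draw such a plot is sketched below, under the assumption that the contours are level sets of the fitted density evaluated by `dmvnorm()` from `mvtnorm` on a regular grid. The grid size, colours, and seed are arbitrary choices of this sketch.

```r
library(mvtnorm)  # assumed to be installed

set.seed(1234)  # arbitrary seed
Sigma <- matrix(c(100, 36,
                  36,  36), nrow = 2, byrow = TRUE)
s <- rmvnorm(1000, mean = c(2, 3), sigma = Sigma)

m_hat <- colMeans(s)
S_hat <- cov(s)

# Regular grid covering the range of the sample.
x1 <- seq(min(s[, 1]), max(s[, 1]), length.out = 100)
x2 <- seq(min(s[, 2]), max(s[, 2]), length.out = 100)
grid <- as.matrix(expand.grid(x1, x2))  # first coordinate varies fastest

# Fitted density at each grid point, reshaped so that z[i, j] = f(x1[i], x2[j]).
z <- matrix(dmvnorm(grid, mean = m_hat, sigma = S_hat), nrow = length(x1))

plot(s, pch = 20, col = "grey", xlab = "x1", ylab = "x2")
contour(x1, x2, z, add = TRUE, col = "red")
```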
Do by Yourselves
Another option which can be used to get the same results is the function predefined in the R package
Within the
Do by Yourselves

Can you reconstruct the contour from the figure above? How is the red ellipse calculated there?
Alternative Approaches with the library
Comment
Remember that once we know the joint distribution function, all marginals are easily obtainable. On the other hand, knowing all the marginals is still not enough to reconstruct the whole joint (multivariate) distribution. Using this idea, we can apply the following principle to generate a sample from some multivariate normal distribution, BUT we cannot control the overall dependence structure, that is, the covariance matrix among the marginals.
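A minimal sketch of this idea, assuming each coordinate is drawn separately with the base R function `rnorm()` from the marginals \(N(2, 10^2)\) and \(N(3, 6^2)\) of the earlier example (the seed and the object names are arbitrary):

```r
set.seed(1234)  # arbitrary seed
n <- 1000

# Each coordinate is generated on its own from its marginal distribution.
x1 <- rnorm(n, mean = 2, sd = 10)
x2 <- rnorm(n, mean = 3, sd = 6)
s_indep <- cbind(x1, x2)

cov(s_indep)  # diagonal close to 100 and 36, off-diagonal close to 0
cor(x1, x2)   # close to 0: the coordinates are independent
```

The marginals match those of the target distribution, yet the off-diagonal covariance stays near zero instead of \(6^2\), because independent draws carry no information about the dependence between the coordinates.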