NMST539, summer semester 2015/16
Lab 8 (week 9)
Multidimensional Scaling & Clustering

Multidimensional Scaling (MDS)

Multidimensional scaling is another common statistical method for visualizing high-dimensional data. However, unlike principal component analysis (and, to some extent, factor analysis), multidimensional scaling focuses on visualizing similarities and dissimilarities between observations rather than the main directions of variability in the data. The main idea is to identify meaningful underlying dimensions that allow us to detect existing similarities and dissimilarities between the observed data points.

There are, of course, many different ways to define similarities and dissimilarities. If we choose to measure them between variables in the sense of a classical correlation matrix (between the available covariates), we obtain a classical factor analysis approach. On the other hand, if we measure them using the standard Euclidean distance, we end up with principal component analysis. Many other options are, however, possible.

The starting point for the MDS analysis (algorithm) is the so-called distance matrix (or, more generally, a similarity/dissimilarity matrix) between all pairs of observations. The distances are calculated with respect to the available covariates, and various definitions of distance can be applied. In some situations, multidimensional scaling can also be performed on a similarity/dissimilarity matrix which is not based on a typical distance (see the nonmetric MDS approach below).

In the statistical software R, one can use the standard function ‘dist()’, which is available in the standard R installation, to calculate similarities/dissimilarities (see the help page for further details).
Such a matrix can then be used by the MDS algorithm, which represents the observations in a (usually) lower-dimensional space in such a way that the original distances are preserved as well as possible. We will again start with the dataset which represents different river localities in the Czech Republic. The biodiversity is measured by a set of 17 different bio metrics, and we are interested in identifying similar localities in the dataset.

Measures of Similarities and Dissimilarities in R

There are various distance definitions which can be used to calculate mutual similarities/dissimilarities between pairs of observations. Considering the R function
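
As a rough illustration of how the choice of distance definition changes the result, the sketch below applies the standard function `dist()` with three of its `method` options. The built-in `USArrests` data is used purely as a stand-in (an assumption), since the river localities dataset is not reproduced here.

```r
# Stand-in data (an assumption): five rows of the built-in USArrests dataset,
# standardized so that no single variable dominates the distances.
X <- scale(USArrests[1:5, ])

# The same five observations under three distance definitions.
D_euclid    <- dist(X, method = "euclidean")  # square root of summed squares
D_manhattan <- dist(X, method = "manhattan")  # sum of absolute differences
D_maximum   <- dist(X, method = "maximum")    # largest absolute difference

# dist() returns a compact "dist" object; as.matrix() expands it into the
# full symmetric matrix with zeros on the diagonal.
round(as.matrix(D_euclid), 2)
round(as.matrix(D_manhattan), 2)
```

For any pair of observations the Manhattan distance is at least as large as the Euclidean one, which in turn is at least as large as the maximum distance, so the three matrices differ in scale as well as in the ordering of some pairs.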

Multidimensional Scaling

We again start with the dataset of bio metrics at different localities in the Czech Republic. If we stick with the Euclidean metric for calculating distances (similarities or dissimilarities) between the localities (with respect to the 17 available bio metrics), we already have the corresponding matrix stored in an R object. A classical function in the R environment which performs multidimensional scaling is
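
A minimal sketch of classical (metric) MDS in base R is given below. It assumes the function `cmdscale()` from the standard `stats` package and again uses `USArrests` as a stand-in dataset; neither the object names nor the data come from the original lab. The last lines check numerically the connection mentioned earlier: with Euclidean distances, the MDS coordinates agree with the principal component scores up to the sign of each axis.

```r
# Stand-in data (an assumption): standardized USArrests, 50 observations.
X <- scale(USArrests)
D <- dist(X)                      # Euclidean distances by default

# Classical MDS in two dimensions; eig = TRUE also returns the eigenvalues
# and goodness-of-fit measures.
mds <- cmdscale(D, k = 2, eig = TRUE)

head(mds$points)                  # coordinates of the observations in 2D
mds$GOF                           # goodness of fit of the 2D representation

# Comparison with PCA: the first two principal component scores should
# match the MDS coordinates up to a sign flip of each axis.
pca <- prcomp(X)
max(abs(abs(mds$points) - abs(pca$x[, 1:2])))
```

The component `points` holds one row per observation, so it can be passed directly to plotting functions.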
What does the result represent? Plotting the original data using only two dimensions (while still preserving the original distances as well as possible) can be done with the following command:
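
One possible way to draw such a two-dimensional representation is sketched here, assuming `cmdscale()` for the scaling step and the stand-in `USArrests` data (both are assumptions, not the original commands). The correlation at the end gives a rough impression of how well the original distances are preserved.

```r
# Stand-in data (an assumption) and a two-dimensional classical MDS fit.
X   <- scale(USArrests)
D   <- dist(X)
mds <- cmdscale(D, k = 2)         # plain matrix with one row per observation

# Empty plot with equal axis scaling (asp = 1), so that distances on the
# screen are comparable in both directions; observations shown as labels.
plot(mds[, 1], mds[, 2], type = "n", asp = 1,
     xlab = "MDS coordinate 1", ylab = "MDS coordinate 2",
     main = "Classical MDS")
text(mds[, 1], mds[, 2], labels = rownames(X), cex = 0.7)

# Rough check of distance preservation: original distances versus
# distances between the points in the 2D configuration.
cor(as.vector(D), as.vector(dist(mds)))
```

Observations that end up close together in the plot are similar with respect to the chosen distance, which is the property used when looking for similar localities.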
To Do
Question

Classical MDS vs. Nonmetric MDS

The key difference between these two approaches is that the first one uses a matrix of similarities/dissimilarities defined in the sense of a classical distance, while the second one only uses the rank order of the similarities/dissimilarities (so the result is the same for an arbitrary monotone function of the distances). For nonmetric MDS one can use the R function
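
A minimal sketch of a nonmetric fit is shown below. It assumes the function `isoMDS()` from the recommended package `MASS` (Kruskal's nonmetric MDS), which may or may not be the function used in the original lab, and the stand-in `USArrests` data.

```r
library(MASS)                     # provides isoMDS()

# Stand-in data (an assumption); isoMDS() requires strictly positive
# distances between distinct observations, i.e. no duplicated rows.
X <- scale(USArrests)
D <- dist(X)

# Nonmetric MDS in two dimensions; by default the classical solution from
# cmdscale() is used as the starting configuration.
nmds <- isoMDS(D, k = 2, trace = FALSE)

head(nmds$points)                 # fitted two-dimensional configuration
nmds$stress                       # final stress in percent (lower is better)
```

Because only the ordering of the dissimilarities enters the fit, the quality of the solution is summarized by the stress value rather than by eigenvalues.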
The results can again be plotted using an analogous set of commands:
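
To make the comparison concrete, the sketch below draws the classical and the nonmetric configuration side by side. The functions `cmdscale()` and `MASS::isoMDS()` and the stand-in `USArrests` data are assumptions made for illustration.

```r
library(MASS)

# Stand-in data (an assumption) and both two-dimensional configurations.
X    <- scale(USArrests)
D    <- dist(X)
cmds <- cmdscale(D, k = 2)
nmds <- isoMDS(D, k = 2, trace = FALSE)$points

# Two panels side by side; observations are drawn as text labels.
op <- par(mfrow = c(1, 2))
plot(cmds, type = "n", asp = 1, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Classical MDS")
text(cmds, labels = rownames(X), cex = 0.6)
plot(nmds, type = "n", asp = 1, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Nonmetric MDS")
text(nmds, labels = rownames(X), cex = 0.6)
par(op)                           # restore the previous graphical settings
```

The two pictures need not be identical, because the nonmetric solution only tries to reproduce the ordering of the distances.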
Question
Another useful graphical tool (especially for a small number of observations) is a graph in two dimensions constructed using a multidimensional scaling approach. It is available in the R library ‘igraph’ (use
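
One possible way to build such a graph is sketched below. The exact commands of the original lab are not known, so the construction here (a minimum spanning tree of the complete distance graph, positioned by an MDS layout) is an assumption, as is the stand-in `USArrests` data. The `igraph` functions used are `graph_from_adjacency_matrix()`, `mst()` and `layout_with_mds()`; the code is skipped if the package is not installed.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  # Stand-in data (an assumption): ten observations keep the graph readable.
  X <- scale(USArrests[1:10, ])
  D <- as.matrix(dist(X))

  # Complete undirected graph whose edge weights are the pairwise distances.
  g <- igraph::graph_from_adjacency_matrix(D, mode = "undirected",
                                           weighted = TRUE, diag = FALSE)

  # Keep only the minimum spanning tree, so that each observation is linked
  # to its nearest neighbours and the picture stays uncluttered.
  tree <- igraph::mst(g)

  # Vertex positions from multidimensional scaling of the distance matrix.
  xy <- igraph::layout_with_mds(tree, dist = D)

  plot(tree, layout = xy, vertex.size = 5, vertex.label.cex = 0.7)
}
```

The vertex positions carry the MDS information, while the edges highlight which observations are closest to each other.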
Besides the two functions already mentioned above (

Homework (optional)

Several data files are available on the web page of Doc. Hlávka, http://www1.karlin.mff.cuni.cz/~hlavka/teac.html, which can be loaded into R simply by using the command