Lectures: | Thursday 14:00 in K11 |

Exercises: | Thursday 15:40 in K11 |

- Lectures and exercises will be taught in English if at least one student requires it. If nobody requires English as the teaching language, both lectures and exercises will be taught in Czech.

Matúš Čellár | Adéla Drabinová | Jaroslav Dufek | Karel Chuchel | Kateřina Janoušková | Dominik Matula | Jan Moravec |

- Please install the SAS software on your laptop (if you have one) by the end of March. The software is available **(for academic purposes only)** to all MFF UK students within the SAS Academic Programme. Installation is provided by RNDr. Ing. Jaroslav Richter on the 3rd floor. Please arrange an appointment either by telephone (221 913 206, or just line 3206 from any phone in the Karlín building) or by e-mail (`richter[AT]karlin.ETC-YOU-KNOW-WELL-WHAT`).

Lecture 1 (20/02/2014)

**Topic:** HTML and bibliographic information sources on the Internet.

HTML tags: | Page at w3schools Page at htmldog |

CSS Templates: | CSS Templates For Free Andreas Viklund Example from CSS Templates For Free |

Classification systems: | MSC JEL |

Bibliographic databases: | Web of Science (WOS) MathSciNet Scopus ZentralBlatt MATH Google Scholar |

DOI number: | doi.org DOI at Wiki |

Articles databases: | JSTOR Wiley Online Library ScienceDirect SpringerLink |

htpasswd: | .htaccess Example .htaccess Example 2 On-line htpasswd generator |

- Create your homepage on the Artax server and then send a link to this page to the lecturer by e-mail.
- Add to this webpage information about your Bachelor or Master thesis, including its MSC and/or JEL classification and keywords in both Czech/Slovak and English. Further, provide three references from your thesis, each with the following information: DOI number (as an active link), number of citations according to WOS and Scopus, and whether the full text of the publication is available from the IP addresses of MFF UK. If it is, include the link to this full text.

Lecture 2 (27/02/2014)

nmst440-latex.tex | nmst440-latex.bib | akplainnat.bst | Makefile | nmst440-latex.pdf |

AK_small.jpg | AK_small.eps | AK.jpg | ||

nmst440-tdens.R | dt_1.eps | dt_all.pdf | dt_all.gif |

- Use the LaTeX package custom-bib to prepare a `bst` file that produces a list of references as close as possible to the style requested by the *Statistical Modelling* journal, see here.
- Use one of the databases (WOS, MathSciNet, ...) introduced last week, or other resources, to find references related to the keywords from the previous assignment. Find at least five papers and at least one book.
- Create a `bib` file containing those references.
- Use LaTeX and the `bst` style file from assignment 1 to write a short text in which you use different types of referencing (direct, indirect) to works from your `bib` file (when working on this part, also try other standard bibliography styles such as `plain`, `unsrt`, `abbrv`, ...).
- Use `Gimp` and `ps2pdf` to convert any `jpg` file (any photograph, screenshot, ...) into `eps` and `pdf` and include it in your LaTeX document.
- Prepare a series of plots illustrating the central limit theorem applied to the chi-squared distribution (do not forget that some standardization of the chi-squared density is needed) and include the plots in your LaTeX file. Prepare plots not only of the densities but also of the corresponding cumulative distribution functions (cdf's). Create a `pdf` document from your LaTeX file. Include a link to this `pdf` file on your webpage.
- Use `convert` to prepare animated `gif` files (for both densities and cdf's) from the plots prepared in the previous item. Include those `gif` files on your webpage.
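The standardization mentioned in the central limit theorem item can be sketched as follows. If *X* has a chi-squared distribution with *k* degrees of freedom (mean *k*, variance 2*k*), then *Z* = (*X* − *k*)/√(2*k*) has density √(2*k*)·*f*(*k* + √(2*k*)·*z*), where *f* is the chi-squared density, and this density approaches the standard normal one as *k* grows. The sketch is written in Python purely for illustration (the course materials use R); the function names and the chosen values of *k* are assumptions, not part of the assignment.

```python
import math


def chisq_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    log_pdf = ((k / 2 - 1) * math.log(x) - x / 2
               - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_pdf)


def standardized_chisq_pdf(z, k):
    """Density of Z = (X - k) / sqrt(2k), where X is chi-squared with k df."""
    scale = math.sqrt(2 * k)
    return scale * chisq_pdf(k + scale * z, k)


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def max_abs_gap(k, lo=-4.0, hi=4.0, step=0.01):
    """Largest absolute difference between the two densities on a grid."""
    n_steps = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n_steps + 1))
    return max(abs(standardized_chisq_pdf(z, k) - normal_pdf(z)) for z in grid)


# Illustrative degrees of freedom (an assumption, any increasing sequence works).
gaps = {k: max_abs_gap(k) for k in (2, 10, 100, 1000)}
for k, gap in gaps.items():
    print(f"k = {k:5d}: max |f_Z - phi| = {gap:.4f}")
```

Plotting `standardized_chisq_pdf` for increasing *k* on a common axis, with the normal density overlaid, gives one frame per value of *k* for the series of plots.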

Lecture 3 (06/03/2014)

nmst440-graphics.R | pchShow.R | dmix2.R | |

Adobe symbols encoding | tiger.ps | ||

nmst440-readData.R | auta2004.dat | auta2004.csv | |

cars.xls | cars.csv |

- Consider bivariate t-distributions with *ν = 5* and *ν = 50* degrees of freedom and a scale matrix with values 1 and 4 on the diagonal and an off-diagonal value of 1. Draw a heat map, supplemented by contour lines, of the densities of those t-distributions. Further, draw a 3D plot of those densities. Additionally, randomly sample 100 points from each of those distributions and add the sample points to the heat maps. Include all plots in the LaTeX document from the previous assignment.

  *Remark:* The multivariate t-distribution (density, distribution function, random sampling) is implemented, e.g., in the R package `mvtnorm`.
- Take the data included in the Excel sheet partners.xls related to this questionnaire and prepare an `RData` file containing a data frame with well-formatted data (no gross errors, categorical variables as factors, ...). At this stage, keep the two date columns `DOB` and `DateInterv` as class `character`. Additionally, create the following derived variables:
  - `NumPtnr`: Real number of reported partners.
  - `Vppnarg`: Self-reported number of acts where the partner is a regular one (spouse or boy-/girlfriend). Set it to NA if there is no regular partner.
  - `Vppncns`: Self-reported number of protected acts where the partner is other than a spouse. Set it to NA if the participant has no partner other than a spouse.
  - `Vppnamm`: Self-reported number of acts where the participant and the partner are both male. Set it to NA if the participant is female.
  - `Vppagdf`: Age difference between the participant and his/her most frequent sexual partner of the opposite sex. Set it to NA if the participant has no partner of the opposite sex.
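For the bivariate t-distribution item above, the following is a minimal sketch of the density and of random sampling, written in Python for illustration only (in R, the `mvtnorm` package mentioned in the remark provides both). It assumes a zero location vector, which the assignment does not state, and all function names are hypothetical. Sampling uses the standard representation *X* = *L Z* / √(*W*/ν), where *L* is the Cholesky factor of the scale matrix, *Z* is standard bivariate normal and *W* is chi-squared with ν degrees of freedom.

```python
import math
import random

# Scale matrix from the assignment: 1 and 4 on the diagonal, 1 off the diagonal.
SIGMA = ((1.0, 1.0), (1.0, 4.0))


def bivariate_t_pdf(x, y, nu, sigma=SIGMA):
    """Density at (x, y) of a bivariate t with nu df and location (0, 0)."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    # Quadratic form (x, y) Sigma^{-1} (x, y)^T for a symmetric 2x2 matrix.
    q = (d * x * x - 2 * b * x * y + a * y * y) / det
    log_const = (math.lgamma((nu + 2) / 2) - math.lgamma(nu / 2)
                 - math.log(nu * math.pi) - 0.5 * math.log(det))
    return math.exp(log_const - (nu + 2) / 2 * math.log1p(q / nu))


def sample_bivariate_t(n, nu, sigma=SIGMA, rng=None):
    """Draw n points from the same distribution."""
    rng = rng or random.Random()
    (a, b), (_, d) = sigma
    # Cholesky factor of the scale matrix.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        w = rng.gammavariate(nu / 2, 2.0)  # chi-squared with nu df
        s = math.sqrt(nu / w)
        points.append((s * l11 * z1, s * (l21 * z1 + l22 * z2)))
    return points


# Density values on a grid (input for a heat map or contour plot)
# and 100 sampled points to overlay; the seed is arbitrary.
grid = [[bivariate_t_pdf(x / 10, y / 10, nu=5) for x in range(-40, 41)]
        for y in range(-60, 61)]
points = sample_bivariate_t(100, nu=5, rng=random.Random(440))
```

The grid limits are arbitrary choices; with ν = 5 the tails are heavy, so some sampled points may fall outside any fixed plotting window.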

- Paul Murrell (2011). *R Graphics. Second Edition.* Boca Raton: CRC Press. ISBN 978-1-4398-3176-2.

Lecture 4 (13/03/2014)

nmst440-partners.R | partners.csv | ||

nmst440-tables.R | p2string.R | cars.RData | |

nmst440-Sweave.Rnw | nmst440-Sweave.bib | akplainnat.bst | |

Sweave.sty | SweaveAK.sty | sweaveIt.R |

- Use `Sweave` to create a `PDF` report on the analysis of the *partners* data, trying to answer the following question: *Does the value of `Vppagdf` depend on the gender and age of the participant?*

  Examine both the marginal and the partial effect (i.e., adjusted for the effect of the other factor) of gender and age on `Vppagdf`. Provide the results in a table similar to this table. Also include two plots suitable for evaluating the marginal relationship between `Vppagdf` and gender and between `Vppagdf` and age. In the plot of `Vppagdf` against age, use different symbols/colors to distinguish male and female participants.

Lecture 5 (20/03/2014)

nmst440-simul1.R | nmst440-simul2.R | nmst440-bootstrap.R | |

nmst440-simulScript.R | nmst440-simulScript.sh |

- As you all (hopefully) know, the χ^{2} distribution of the test statistic of the Pearson χ^{2} test of independence in a contingency table is only asymptotic. It is traditionally claimed that the asymptotic χ^{2} approximation works reasonably well when all *expected* counts (under independence) are higher than the magical number 5. Perform a simulation study exploring the true significance level of the χ^{2} test of independence in a 2x2 table corresponding to a comparison of two independent binomial distributions. This is in fact a test comparing the proportions of a certain property ("success") in two independent populations. In the following, let *p*_{1} and *p*_{2} be the proportions of "success" in populations 1 and 2, respectively, and let *n*_{1} and *n*_{2} be the sample sizes in populations 1 and 2, respectively. Consider a χ^{2} test of independence with a nominal significance level of 5% and use the continuity correction when calculating the value of the test statistic. Further, assume equal sample sizes in the two groups, i.e., *n*_{1} = *n*_{2} = *n*, and consider three scenarios (of independence): *p*_{1} = *p*_{2} = *p* = 0.01; *p*_{1} = *p*_{2} = *p* = 0.1; *p*_{1} = *p*_{2} = *p* = 0.5.

  For each scenario, consider values of *n* (sample size in each group) that gradually correspond to the lowest value of the expected count (under the respective scenario) being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100. That is, you have in total 3 x 12 = 36 scenarios. Use a simulation length of at least *M* = 10000.

  Report the results (empirical significance levels) in well-formatted table(s) included in a document prepared using `LaTeX`. Also use a suitable plot to visualize the results. `Sweave` can be used to prepare the report.

  **Remark:** Before you start the simulation, think a little about whether some scenarios might pose computational/theoretical problems.
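One possible shape of the simulation for a single scenario is sketched below, in Python for illustration only (the course scripts are in R). Several points are assumptions rather than part of the assignment: the sample size is taken as *n* = *e*/*p*, reading "lowest expected count *e*" as *n*·*p* under the true *p*; tables in which the statistic is undefined (no successes or no failures at all) are counted as non-rejections and reported separately; all function names are hypothetical. For a 2x2 table with grand total *N*, the continuity-corrected statistic equals *N*·max(0, |*ad* − *bc*| − *N*/2)^{2} divided by the product of the four marginal totals.

```python
import math
import random
from itertools import accumulate

CRIT_5PCT = 3.841458820694124  # 0.95 quantile of chi-squared with 1 df


def binomial_pmf(n, p):
    """Probabilities P(X = k), k = 0..n, for X ~ Binomial(n, p), 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
            for k in range(n + 1)]


def yates_statistic(x1, x2, n):
    """Continuity-corrected statistic for rows (x1, n - x1) and (x2, n - x2).

    Returns None when a column total is zero, where the statistic is 0/0.
    """
    successes = x1 + x2
    failures = 2 * n - successes
    if successes == 0 or failures == 0:
        return None
    total = 2 * n
    # ad - bc simplifies to n * (x1 - x2) for equal group sizes.
    excess = max(0.0, abs(n * (x1 - x2)) - total / 2)
    return total * excess ** 2 / (n * n * successes * failures)


def empirical_level(n, p, m=10000, rng=None):
    """Return (rejection rate, share of undefined tables) over m replications."""
    rng = rng or random.Random()
    cum = list(accumulate(binomial_pmf(n, p)))
    x1s = rng.choices(range(n + 1), cum_weights=cum, k=m)
    x2s = rng.choices(range(n + 1), cum_weights=cum, k=m)
    rejections = undefined = 0
    for x1, x2 in zip(x1s, x2s):
        stat = yates_statistic(x1, x2, n)
        if stat is None:
            undefined += 1  # assumption: treated as "do not reject"
        elif stat > CRIT_5PCT:
            rejections += 1
    return rejections / m, undefined / m


# A few of the 36 scenarios, with a shortened simulation length for speed.
rng = random.Random(2014)
for p in (0.01, 0.1, 0.5):
    for e in (1, 5, 10):
        n = round(e / p)
        level, undef = empirical_level(n, p, m=2000, rng=rng)
        print(f"p = {p:4.2f}, n = {n:5d}: level = {level:.4f}, undefined = {undef:.4f}")
```

How undefined tables are handled is a design decision that affects the reported level for small expected counts, so it is worth stating explicitly in the report.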

Lecture 6 (27/03/2014)

Test statistic of a certain independence test | |||

nmst440-indTest.R | indTest.R | rMVN2.R | |

nmst440-devel.R | indTest.c | Makefile | |

nmst440-simIndTest.R | nmst440-prepareScripts.R | nmst440-resultIndTest.R | Sněhurka results |

- Take the IQ data and use the test of independence implemented in nmst440-indTest.R (with *a* = 1) to evaluate, separately for *boys* and *girls* (variable `fgender`), whether IQ (variable `iq`) depends on the average grade from the 8th year of primary school (variable `zn8`). Use the bootstrap to calculate the P-values of the tests.

  **Remark:** An explanation of how to use the bootstrap to calculate the P-value of the considered test of independence will be provided during the lecture.
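As a rough illustration of the general idea, and not necessarily the procedure presented in the lecture, one common way to obtain a bootstrap P-value for a test of independence is to resample the two variables independently of each other with replacement, which mimics the null hypothesis, and to compare the observed statistic with the resampled ones. The sketch below is in Python for illustration only. The statistic `abs_correlation` is a placeholder and is NOT the statistic implemented in nmst440-indTest.R, and the data are synthetic, not the IQ data.

```python
import math
import random


def abs_correlation(xs, ys):
    """Placeholder statistic (not the course's): |Pearson correlation|."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def bootstrap_p_value(xs, ys, statistic=abs_correlation, b=999, rng=None):
    """P-value from b bootstrap samples drawn under independence."""
    rng = rng or random.Random()
    observed = statistic(xs, ys)
    n = len(xs)
    exceed = 0
    for _ in range(b):
        # Resampling x and y separately destroys any dependence between them.
        xb = rng.choices(xs, k=n)
        yb = rng.choices(ys, k=n)
        if statistic(xb, yb) >= observed:
            exceed += 1
    return (1 + exceed) / (b + 1)


# Synthetic example: one dependent pair and one independent pair.
rng = random.Random(6)
x = [rng.gauss(0, 1) for _ in range(60)]
y_dep = [xi + rng.gauss(0, 0.5) for xi in x]
y_ind = [rng.gauss(0, 1) for _ in range(60)]
p_dep = bootstrap_p_value(x, y_dep, rng=rng)
p_ind = bootstrap_p_value(x, y_ind, rng=rng)
print(f"dependent pair: P = {p_dep:.3f}; independent pair: P = {p_ind:.3f}")
```

Any statistic taking two equally long samples can be passed as `statistic`, so the same loop could wrap a port of the course's test statistic.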

Lectures 7–10 (03/04/2014, 10/04/2014, 17/04/2014, 24/04/2014)

Lecture 11 (15/05/2014)

SAS/STAT Documentation | SAS/STAT Procedures | ||

SAS proc nlmixed | SAS proc glimmix | ||

nmst440-nlme.pdf | nmst440-nlme.R | nmst440-nlmixed.sas | argconc.txt |

- See Section 2 of nmst440-nlme.pdf for details. Data for the assignments: toenail.txt.

Lecture 12 (22/05/2014)

S-plus (probably not any more) | |||

-> TIBCO Spotfire | |||

SPSS (IBM, Acrea in CZ) | |||

Statistica (StatSoft/Dell) | |||

NCSS | |||

Stata | |||

Statgraphics | |||

Minitab | |||

nmst440-rpanel.R | rp_samples.R |

- Michael Lawrence, John Verzani (2012). *Programming Graphical User Interfaces in R.* Boca Raton: CRC Press. ISBN 978-1-4398-5682-6.
