NMST552

SIS pages of the course: | ENG | CZE |

Tutorial: | Thursday 14:00 in K11 |

- In 2017 the course will be taught in Czech; the supporting materials will mostly be in English.

Markéta Horejšová | Daniel Jahn | Jan Jeliga | Tomáš Šlampiak | Jan Vávra | Václav Veselý |

PuTTY | Download |

FileZilla | Download |

GIMP | Download |

Ghostscript and GhostView | Download |

GNU Emacs | Download |

ESS plugin for Emacs | Download |

Cygwin | Download |

Rtools | Download |

R package devtools |

Tutorial 1 (23/02/2017)

**Topic:** HTML and bibliographic information sources on the Internet.

HTML tags: | Page at w3schools Page at htmldog |

CSS Templates: | CSS Templates For Free Andreas Viklund Example from CSS Templates For Free |

Classification systems: | MSC JEL |

Bibliographic databases: | Web of Science (WOS) Scopus MathSciNet ZentralBlatt MATH Google Scholar |

DOI number: | doi.org DOI at Wiki |

Articles databases: | JSTOR JSTOR (Statistics) Wiley Online Library ScienceDirect SpringerLink |

htpasswd: | .htaccess Example .htaccess Example 2 On-line htpasswd generator |

- Create your homepage on the Artax server and then send a link to this page to the lecturer by e-mail.
- Add to this webpage information concerning your Bachelor or Master thesis, including its MSC and/or JEL classification and keywords in both Czech/Slovak and English. Further, provide three references from your thesis, each with the following information: the DOI number (as an active link), the number of citations according to WOS and Scopus, and whether a full text of the publication is available from the IP addresses of MFF UK. If it is, include the link to this full text.

Tutorial 2 (02/03/2017)

Data.R | Data.xls | Data.csv |

RC2dob.gender.R |

The data in the LibreOffice sheet Consum.ods
contain information on the spending of participants at a certain scientific event during their stay
at the conference site. Personal information comprises *gender* (m/f),
*category* (professor (prof), associate professor (doc),
assistant (asist), researcher (res), Ph.D. student (phd), guest (host)) and *institution*.
Additionally, the length of a talk (if given) is included. The remaining columns provide the numbers
of consumed drinks of different types, total spending and spending on other services (column *other*). Missing
values are indicated by an empty cell or the string "`na`".

- Prepare the data for a statistical analysis whose main aim would be to explore relationships between
personal and consumption variables, or among the consumption variables themselves. Use your subject-matter knowledge
to clean the data, especially the information on
*institution*.
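
As an illustration of the cleaning step, here is a minimal sketch. It assumes the sheet has been exported to CSV; the column names, the toy rows and the institution spellings below are invented for the example and are not taken from Consum.ods.

```r
# Toy stand-in for an exported sheet (hypothetical columns and values).
raw <- "gender,category,institution,talk,beer,total
m,prof,MFF UK,30,2,150
f,phd,mff uk ,na,,80
m,doc,Charles University - MFF,45,1,na
"

# Treat both empty cells and the string "na" as missing values.
consum <- read.csv(text = raw, na.strings = c("", "na"),
                   strip.white = TRUE, stringsAsFactors = FALSE)

# Unify spellings of the same institution (the mapping is an assumption).
inst <- toupper(trimws(consum$institution))
inst[inst %in% c("MFF UK", "CHARLES UNIVERSITY - MFF")] <- "MFF UK"
consum$institution <- factor(inst)

# Store categorical variables as factors with explicit levels.
consum$gender <- factor(consum$gender, levels = c("m", "f"))

str(consum)
```

Deciding which spellings denote the same institution is the part that needs subject-matter knowledge; the code only applies a mapping once it has been chosen.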

Tutorial 3 (09/03/2017)

Individual work at home.

Tutorials 4–5 (16/03 and 23/03/2017)

R script (big data and apply) | R script (classes and methods) | R script (functions and programming) |

Data (Kojeni) | CovMat.R |

Write an R function which takes an object of class `glm` and creates two tables (each being returned as a `data.frame`).

Table 1 will contain, for each non-intercept coefficient, (i) the exponential of the MLE of the coefficient (which has a useful interpretation for many GLMs), (ii) the related standard error calculated by means of the delta method, (iii) the p-value from the Wald test, (iv) the p-value from the likelihood-ratio (deviance) test, (v) the confidence interval for the exponential of the coefficient that is dual to the Wald test, (vi) the confidence interval for the exponential of the coefficient that is dual to the likelihood-ratio test. The user should be able to specify the coverage of the confidence intervals.
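
The Wald-based columns of Table 1 can be sketched as follows. This is only an illustration on simulated data: the function name `wald_table` and the variables `x`, `g`, `y` are hypothetical, and the likelihood-ratio columns are deliberately left out. The delta method gives the standard error of exp(β̂) as exp(β̂) · SE(β̂).

```r
# Simulated toy data (not the course data).
set.seed(1)
d <- data.frame(x = rnorm(200),
                g = factor(sample(c("a", "b"), 200, replace = TRUE)))
d$y <- rbinom(200, 1, plogis(-0.5 + 0.8 * d$x))
fit <- glm(y ~ x + g, family = binomial, data = d)

wald_table <- function(fit, level = 0.95) {
  est  <- coef(summary(fit))
  est  <- est[rownames(est) != "(Intercept)", , drop = FALSE]
  beta <- est[, "Estimate"]
  se   <- est[, "Std. Error"]
  z    <- qnorm(1 - (1 - level) / 2)
  data.frame(
    exp.coef = exp(beta),
    se.exp   = exp(beta) * se,             # delta method
    p.wald   = 2 * pnorm(-abs(beta / se)), # two-sided Wald test
    lower    = exp(beta - z * se),         # Wald interval, transformed
    upper    = exp(beta + z * se),
    row.names = rownames(est)
  )
}

tab1 <- wald_table(fit, level = 0.95)
tab1
```

The interval is computed on the coefficient scale and then exponentiated, which keeps it dual to the Wald test of the coefficient.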

Table 2 will contain for each term (effect) included in the model (i) related degrees of freedom, (ii) Wald test statistic and a p-value, (iii) likelihood-ratio test statistic and a p-value.
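
One possible way to obtain the term-wise tests is sketched below on simulated data. The function name `term_tests` is hypothetical. The likelihood-ratio statistic is taken as a difference of deviances, which assumes a family with known dispersion (binomial, Poisson), and the refit relies on the data being visible where the function is called.

```r
# Simulated toy data; x, g, y are illustrative names only.
set.seed(2017)
n <- 300
d <- data.frame(x = rnorm(n),
                g = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
d$y <- rbinom(n, 1, plogis(-0.3 + 0.7 * d$x + 0.5 * (d$g == "c")))
fit <- glm(y ~ x + g, family = binomial, data = d)

term_tests <- function(fit) {
  labs   <- attr(terms(fit), "term.labels")
  assign <- attr(model.matrix(fit), "assign")  # maps columns to terms
  out <- lapply(seq_along(labs), function(j) {
    idx  <- which(assign == j)                 # coefficients of term j
    b    <- coef(fit)[idx]
    V    <- vcov(fit)[idx, idx, drop = FALSE]
    wald <- as.numeric(t(b) %*% solve(V, b))   # Wald chi-square statistic
    # Refit without term j and compare deviances (LR statistic).
    reduced <- update(fit, as.formula(paste(". ~ . -", labs[j])))
    lr <- deviance(reduced) - deviance(fit)
    df <- length(idx)
    data.frame(df = df,
               wald = wald, p.wald = pchisq(wald, df, lower.tail = FALSE),
               lr = lr,     p.lr   = pchisq(lr, df, lower.tail = FALSE),
               row.names = labs[j])
  })
  do.call(rbind, out)
}

tab2 <- term_tests(fit)
tab2
```

For a term with one degree of freedom the Wald statistic reduces to the squared z value reported by `summary()`.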

Additionally, write a function that prints the results in a nice form. Minimal niceness consists of (a) providing explanatory titles for the two tables, (b) formatting printed p-values so that
those lower than 0.001 are printed as `<0.001`, (c) rounding printed numbers (other than p-values) to a number of digits specified by the user (take 2 as the default number of digits
after the decimal point).
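
The formatting rules (b) and (c) could be implemented along these lines. The helper names `format_p` and `format_num` are hypothetical; base R's `format.pval` offers similar behaviour.

```r
# p-values below 0.001 are shown as "<0.001", others with fixed decimals.
format_p <- function(p, digits = 3) {
  ifelse(is.na(p), NA_character_,
         ifelse(p < 0.001, "<0.001",
                formatC(p, format = "f", digits = digits)))
}

# Other numbers: fixed number of digits after the decimal point.
format_num <- function(x, digits = 2) {
  formatC(x, format = "f", digits = digits)
}

format_p(c(0.00004, 0.0312, 0.5))
format_num(c(1.2345, 10))
```

Returning character vectors keeps trailing zeros, which `round()` alone would drop when the table is printed.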

Test your function on a logistic model based on the Consum data, with the response being an indicator of whether more was spent on alcoholic than on non-alcoholic drinks (count 25 CZK for Radler/nealko and 30 CZK for liquer, do not count `other` spending) and with covariates (all included additively) (i) gender, (ii) category, (iii) talk categorized as `none`/`at most 30 min`/`more than 30 min`. Missing values for `talk` should be considered as no talk. Disregard subjects with `category` `guest`.
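
The categorization of `talk` can be sketched with `cut()`. The vector below is made up, and the sketch assumes that talk length is recorded in minutes, with a missing value (or zero) meaning no talk.

```r
# Hypothetical talk lengths in minutes; NA means no talk was given.
talk <- c(NA, 20, 30, 45, 0)

talk_cat <- cut(ifelse(is.na(talk), 0, talk),
                breaks = c(-Inf, 0, 30, Inf),   # (-Inf,0], (0,30], (30,Inf)
                labels = c("none", "at most 30 min", "more than 30 min"))

table(talk_cat)
```

Because `cut()` uses right-closed intervals by default, a talk of exactly 30 minutes falls into `at most 30 min`.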

Tutorial 6 (30/03/2017)

R script (routine analysis) | formatOut.R | funTabDescr.R |

Data (nelsNE) | report (LaTeX) | report (pdf) |

Write an R function to convert tables from Assignment 4–5 into LaTeX tables. Use LaTeX to prepare a toy report in pdf containing results of the test analysis from Assignment 4–5.
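
A bare-bones conversion of a `data.frame` into a LaTeX `tabular` might look like the sketch below. The function name `df_to_latex` is hypothetical, and no escaping of special characters (such as `_`, `%` or `<`) is attempted, so real tables may need extra care.

```r
df_to_latex <- function(df, digits = 2) {
  # Format numeric columns with a fixed number of decimals.
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) formatC(x, format = "f", digits = digits))

  header <- paste(c("", names(df)), collapse = " & ")
  body   <- paste(rownames(df), do.call(paste, c(df, sep = " & ")), sep = " & ")
  align  <- paste0("l", strrep("r", ncol(df)))   # row names left, rest right

  c(paste0("\\begin{tabular}{", align, "}"),
    "\\hline", paste(header, "\\\\"),
    "\\hline", paste(body, "\\\\"),
    "\\hline", "\\end{tabular}")
}

tab <- data.frame(est = c(1.234, 2), se = c(0.5, 0.25), row.names = c("x", "y"))
out <- df_to_latex(tab)
writeLines(out)
```

The result is a character vector, so it can be written to a file with `writeLines(out, "table.tex")` and pulled into the report with `\input`.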

Tutorial 7 (06/04/2017)

R script (routine analysis) | report (Sweave) | report (pdf) |

Data (nelsNE, processed) | report 2 (Sweave) | report 2 (pdf) |

R script (process Sweave) | bib file |

TeX style | bib style |

Convert the LaTeX document from Assignment 6 into a Sweave document.

Tutorial 8 (13/04/2017)

R script (graphics) |

Tutorial 9 (20/04/2017)

R script (simulation 1) | R script (simulation 2) |

R script (batch) | Shell script |

As you all (hopefully) know, the χ^{2} distribution of the test statistic of the Pearson χ^{2} test
of independence in a contingency table is only asymptotic. It is traditionally claimed that the asymptotic
χ^{2} approximation works reasonably well when all *expected* counts (under independence) are higher
than the magic number 5.

Perform a simulation study to explore the true significance level and the true distribution of the test statistic
of the χ^{2} test of independence
in a 2x2 table corresponding to a comparison of two independent binomial distributions. This is in fact a test comparing
the proportions of a certain property ("success") in two independent populations.

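In the following, let *p*_{1} and *p*_{2} denote the probabilities of success in the two populations. Consider the following three scenarios:

*p*_{1} = *p*_{2} = *p* = 0.01; *p*_{1} = *p*_{2} = *p* = 0.1; *p*_{1} = *p*_{2} = *p* = 0.5.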

For each scenario, consider values of *n* (the sample size in each group) that successively correspond
to the lowest expected count (under the respective scenario) being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100.
That is, you have in total 3 x 12 = 36 scenarios. Use a simulation length of at least *M* = 10000.

Report results (empirical significance levels) in a form of well-formatted table(s) included in a LaTeX (or Sweave) document that you already prepared for Assignments 6 and 7.

Additionally, use suitable graphical tools to compare empirical distributions of the test statistics (under considered scenarios) to assumed χ^{2} distributions. Include relevant plots in the document.

**Remark:** Before you start the simulation, think a little about whether some scenarios might pose computational/theoretical problems.
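
The core of such a simulation can be sketched as follows. The function names are hypothetical, the statistic is the Pearson χ^{2} statistic without continuity correction written out for two groups of equal size, and counting an undefined statistic as a non-rejection is just one possible convention, not necessarily the one intended in the assignment.

```r
# Pearson chi-square statistic for a 2x2 table with n trials per group
# and x1, x2 successes; vectorized over x1 and x2.
chisq_stat_2x2 <- function(x1, x2, n) {
  pooled <- (x1 + x2) / (2 * n)
  n * (x1 / n - x2 / n)^2 / (2 * pooled * (1 - pooled))
}

sim_level <- function(p, n, M = 10000, alpha = 0.05) {
  stat <- chisq_stat_2x2(rbinom(M, n, p), rbinom(M, n, p), n)
  undefined <- is.nan(stat)              # 0/0 when a column total is zero
  reject <- !undefined & stat > qchisq(1 - alpha, df = 1)
  c(n = n, level = mean(reject), undefined = mean(undefined))
}

# Sample sizes giving the lowest expected count n * p (valid for p <= 0.5).
min_expected <- c(1, 5, 10)
p <- 0.1
n_grid <- round(min_expected / p)

set.seed(552)
res <- t(sapply(n_grid, sim_level, p = p))
res
```

Keeping the simulated statistics (rather than only the rejection rate) would also allow a comparison with the χ^{2} distribution with one degree of freedom, for instance through a QQ plot.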

Tutorial 10 (27/04/2017)

Seminar introducing the capabilities of SAS (Statistical Analysis System) given by Petr Klášterecký, MFF UK graduate and employee of the SAS company.

**Links**:
SAS academic programme in the Czech Republic

Tutorial 11 (04/05/2017)

R script (parallel simulation) | R script (some test of independence) |

indTest.c (C file) | indTest.R (R function) | rMVN2.R (R function) |

Interface function .C |

dyn.load and dyn.unload |

Creating shared objects |

Random number generation |

Numerical analysis subroutines |

(distribution and mathematical functions, mathematical constants) |

Optimization |

Integration |

Take the Consum data and use the test of independence implemented in indTest.R (with a = 1) to evaluate the dependence between **spending** on beer (total for beer and Plzeň) and spending on non-alcoholic drinks (total for Radler/nealko, cola/kofola, čaj). Count 25 CZK for Radler/nealko. Perform the analysis for (a) the whole dataset, (b) "senior" people only (categories prof, doc, asist, res, host), (c) "junior" people only (category phd). Use the bootstrap to calculate the P-values of the tests. Include the results in the document being created in the framework of the previous assignments.

**Remark:** An explanation of how to use the bootstrap to calculate the P-value of the considered test of independence is provided during the lecture.
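
For orientation only, the sketch below shows one common way of bootstrapping a P-value for a test of independence: both samples are resampled separately with replacement, which mimics independence. The statistic used here (absolute Pearson correlation) is merely a stand-in for the one in indTest.R, the function name `boot_pvalue` is hypothetical, and the scheme explained in the lecture may differ.

```r
boot_pvalue <- function(x, y, stat = function(x, y) abs(cor(x, y)), B = 999) {
  t_obs <- stat(x, y)
  n <- length(x)
  # Resample x and y independently of each other to mimic the null hypothesis.
  t_star <- replicate(B, stat(sample(x, n, replace = TRUE),
                              sample(y, n, replace = TRUE)))
  (1 + sum(t_star >= t_obs)) / (B + 1)
}

# Simulated toy data (not the Consum data).
set.seed(11)
x <- rnorm(100)
y_dep <- x + rnorm(100, sd = 0.5)   # clearly dependent on x
y_ind <- rnorm(100)                 # independent of x

p_dep <- boot_pvalue(x, y_dep)
p_ind <- boot_pvalue(x, y_ind)
c(dependent = p_dep, independent = p_ind)
```

Adding one to both the numerator and the denominator keeps the estimated P-value strictly positive.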

Tutorial 12 (11/05/2017)

Seminar introducing the capabilities of other traditional commercial software products for statistical data analysis given by Tomáš Jurczyk, MFF UK graduate and employee of the QUEST software company (development, sale and support of STATISTICA).

Tutorial 13 (18/05/2017)

R script (simulation) | R script (prepare bash scripts) | R script (process results) |

R markdown files (tar.gz archive) |

Sněhurka (Karlín) | IT4Innovations (Ostrava) |

R Markdown | knitr |

Tutorial 14 (25/05/2017)

R shiny files (tar.gz archive) |

R script (lattice, ggplot2) | R script (shape maps) | R script (Google maps) |

Additional files for R maps (tar.gz archive) |

R shiny |

lattice package description | Getting started with lattice graphics |

ggplot2 package page | Short tutorial on ggplot2 |
