NMST552

SIS pages of the course: ENG | CZE

Tutorial: Monday 15:40 in K10B

There will be no tutorial lecture on Monday November 6, 2023.

This course is primarily intended as a supplementary course for students of the master's programme *Probability, mathematical statistics and econometrics*. The following knowledge and skills are assumed:

- Foundations of statistical inference (statistical test, confidence interval, standard error, consistency);
- Basic procedures of statistical inference (asymptotic tests on expected value, one- and two-sample t-test, one-way analysis of variance, chi-square test of independence);
- Linear model;
- Intermediate knowledge of R, a free software environment for statistical computing and graphics (https://www.r-project.org);
- Working knowledge of LaTeX;
- Ability to program algorithms (in any language, e.g., Python, Pascal, C/C++, Fortran, ...).

This course **is not** a cook-book course on R.

- PuTTY
- FileZilla
- GNU Emacs
- ESS plugin for Emacs
- Cygwin
- Rtools
- R package devtools

Tutorial 1 (02/10/2023 and 09/10/2023)

**Topic:** Data management in R

**Files:**

Data.R | Data.xls | Data.csv | |

RC2dob.gender.R |

The data in the LibreOffice sheet Consum.ods
contain information on the spending of participants in a certain scientific event during their stay
at the conference site. Personal information consists of *gender* (m/f),
*category* (professor (prof), associate professor (doc),
assistant (asist), researcher (res), Ph.D. student (phd), guest (host)) and *institution*.
Additionally, the length of a talk (if given) is included. The remaining columns provide the numbers
of consumed drinks of different types, the total spending and the spending on other services (column *other*). Missing
values are indicated by an empty cell or by the string `na`.

Prepare the data for a statistical analysis whose main aim would be to explore relationships between
personal and consumption variables, or among the consumption variables themselves. Use your subject-matter knowledge
to clean the data, especially the information on *institution*.
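A minimal sketch of the first import and cleaning steps is given below. The toy data, the column names `talk`, `beer` and `total`, and the institution spellings are hypothetical stand-ins for the real sheet; only the missing-value codes (empty cell or `na`) and the category codes come from the description above.

```r
# Toy stand-in for the real sheet; column names and values are hypothetical.
raw <- "gender,category,institution,talk,beer,total
m,prof,MFF UK,30,2,120
f,phd,mff uk ,na,,45
f,doc,Charles University - MFF,20,1,na"

# Treat both empty cells and the string "na" as missing values.
consum <- read.csv(text = raw, na.strings = c("", "na"),
                   strip.white = TRUE, stringsAsFactors = FALSE)

# Categorical variables as factors with explicitly listed levels.
consum$gender   <- factor(consum$gender, levels = c("m", "f"))
consum$category <- factor(consum$category,
                          levels = c("prof", "doc", "asist", "res", "phd", "host"))

# A first, purely mechanical normalisation of institution names.
# Merging different names of the same institution still has to be done by hand.
consum$institution <- toupper(trimws(consum$institution))

str(consum)
```

Listing the factor levels explicitly turns any unexpected code in the raw data into `NA`, which makes typos easy to spot with `summary(consum)`.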

Tutorial 2 (16/10/2023)

R script (functions and programming) | R script (classes and methods) | CovMat.R |

No new assignment for this tutorial. Work on previous assignments.

Tutorial 3 (23/10/2023)

R script (glm object) |

Write an R function which takes an object of class `glm` and creates two tables (each being returned as a `data.frame`).

Table 1 will contain for each non-intercept coefficient (i) the exponential of the MLE of the coefficient (which has a useful interpretation for many GLMs), (ii) the related standard error calculated by means of the delta method, (iii) the p-value from the Wald test, (iv) the p-value from the likelihood-ratio (deviance) test, (v) the confidence interval for the exponential of the coefficient that is dual to the Wald test, (vi) the confidence interval for the exponential of the coefficient that is dual to the likelihood-ratio test. The user should be able to specify the coverage of the confidence intervals.

Table 2 will contain for each term (effect) included in the model (i) the related degrees of freedom, (ii) the Wald test statistic and its p-value, (iii) the likelihood-ratio test statistic and its p-value.
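The following sketch illustrates how most columns of Table 1 could be obtained; it is not a complete solution. The function name `glmCoefTable` and the column names are hypothetical. It assumes a family with dispersion fixed at 1 (binomial, Poisson), so that normal and chi-square reference distributions apply, and a model without aliased coefficients. The likelihood-ratio confidence interval is left out.

```r
# Hypothetical helper: coefficient-level table for a fitted glm.
# Assumes dispersion fixed at 1 and no aliased (NA) coefficients.
glmCoefTable <- function(fit, level = 0.95) {
  stopifnot(inherits(fit, "glm"))
  est <- coef(fit)
  se  <- sqrt(diag(vcov(fit)))
  q   <- qnorm(1 - (1 - level) / 2)
  X   <- model.matrix(fit)
  # Likelihood-ratio test of a single coefficient: refit without its column
  # of the model matrix and compare the deviances (1 degree of freedom).
  pLR <- vapply(seq_along(est), function(j) {
    fit0 <- glm.fit(X[, -j, drop = FALSE], fit$y,
                    weights = fit$prior.weights, offset = fit$offset,
                    family = fit$family)
    pchisq(fit0$deviance - fit$deviance, df = 1, lower.tail = FALSE)
  }, numeric(1))
  tab <- data.frame(
    expCoef = exp(est),
    seExp   = exp(est) * se,               # delta method: d exp(b) / db = exp(b)
    pWald   = 2 * pnorm(-abs(est / se)),
    pLR     = pLR,
    lower   = exp(est - q * se),           # Wald interval, transformed by exp
    upper   = exp(est + q * se)
  )
  tab[names(est) != "(Intercept)", , drop = FALSE]
}

# Illustration on a built-in data set (not the Consum data).
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
tab <- glmCoefTable(fit, level = 0.95)
tab
```

For term-level likelihood-ratio tests, `drop1(fit, test = "Chisq")` is an existing base R tool against which a hand-written Table 2 can be checked.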

Additionally, write a function that prints the results in a nice form. Minimal niceness consists of (a) giving explanatory titles to the two tables, (b) formatting printed p-values so that
those lower than 0.001 are printed as `<0.001`, and (c) rounding printed numbers (other than p-values) to a number of digits specified by the user (take 2 digits
after the decimal point as the default).
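Requirement (b) could be handled by a small helper such as the one sketched here. The name `formatP` is hypothetical, and printing the remaining p-values with three decimal places is an assumption, since the assignment only fixes the `<0.001` rule.

```r
# Hypothetical helper: format p-values for printing.
# Values below 0.001 become "<0.001"; the rest get three decimal places.
formatP <- function(p) {
  out <- formatC(p, format = "f", digits = 3)
  out[!is.na(p) & p < 0.001] <- "<0.001"
  out
}

formatP(c(0.0004, 0.0312, 0.5))
```

Base R also offers `format.pval()` with an `eps` argument, which may serve as an alternative.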

Test your function on a logistic model based on the Consum data, with the response being an indicator of whether more was spent on alcoholic than on non-alcoholic drinks (count 25 CZK for Radler/nealko and 30 CZK for liqueur; do not count `other` spending). Include the following covariates, all in an additive way: (i) gender, (ii) category, (iii) talk categorized as `none`/`at most 30 min`/`more than 30 min`. Missing values of `talk` should be treated as no talk. Disregard subjects with `category` `guest`.
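The categorization of the talk length can be sketched with `cut()`. The vector `talkMin` is a hypothetical example; the sketch assumes that the length is recorded in minutes and that both a missing value and a zero mean no talk.

```r
# Hypothetical talk lengths in minutes; NA means that no talk was given.
talkMin <- c(NA, 0, 15, 30, 45, 60)

# Replace missing values by 0 and cut into (-Inf, 0], (0, 30], (30, Inf).
talkCat <- cut(ifelse(is.na(talkMin), 0, talkMin),
               breaks = c(-Inf, 0, 30, Inf),
               labels = c("none", "at most 30 min", "more than 30 min"))

table(talkCat)
```

Because `cut()` uses right-closed intervals by default, a talk of exactly 30 minutes falls into `at most 30 min`.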

Tutorial 4 (23/10/2023)

R script (big data and apply) | Data (Kojeni) |

No new assignment for this tutorial. Work on previous assignments.

Tutorial 5 (30/10/2023)

R script (routine analysis) | formatOut.R | funTabDescr.R |

Data (nelsNE2) | report (LaTeX) | report (pdf) |

Write an R function to convert tables from Assignment 3 into LaTeX tables. Use LaTeX to prepare a toy report in pdf containing results of the test analysis from Assignment 3.
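One possible starting point, using base R only, is sketched below. The function name `df2latex` is hypothetical; the sketch rounds numeric columns to a fixed number of decimal places and does not escape special LaTeX characters (such as `_` or `%`) in column or row names.

```r
# Hypothetical helper: turn a data.frame into the lines of a LaTeX tabular.
df2latex <- function(df, digits = 2) {
  # Format numeric columns with a fixed number of decimal places.
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) formatC(x, format = "f", digits = digits))
  header <- paste(c("", names(df)), collapse = " & ")
  body   <- apply(cbind(rownames(df), as.matrix(df)), 1, paste, collapse = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(df)), "}"),
    "\\hline",
    paste(header, "\\\\"),
    "\\hline",
    paste(body, "\\\\"),
    "\\hline",
    "\\end{tabular}")
}

out <- df2latex(data.frame(a = c(1.234, 2), row.names = c("x", "y")))
writeLines(out)
```

The returned character vector can be written to a file with `writeLines(out, "table.tex")` and included in the report via `\input{table.tex}`.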

Tutorial 6 (13/11/2023 and 13/11/2023)

R script (simulation 1) | R script (simulation 2) |

As you all (hopefully) know, the χ^{2} distribution of the test statistic of the Pearson χ^{2} test
of independence in a contingency table is only asymptotic. It is traditionally claimed that the asymptotic
χ^{2} approximation works reasonably well when all *expected* counts (under independence) are higher
than the magical number 5.

Perform a simulation study to explore the true significance level and the true distribution of the test statistic
of the χ^{2} test of independence
in a 2x2 table corresponding to a comparison of two independent binomial distributions. This is in fact a test
comparing the proportions of a certain property ("success") in two independent populations.

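In the following, let *p_{1}* and *p_{2}* denote the probabilities of success in the two populations, and consider the following three scenarios:

- *p_{1} = p_{2} = p* = 0.01;
- *p_{1} = p_{2} = p* = 0.1;
- *p_{1} = p_{2} = p* = 0.5.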

In other words, the data will consist of *2 n* binary observations (*n* from each population) generated from two independent Bernoulli populations whose probabilities of success and failure are as follows:

| | Success | Failure | Sum |
|---|---|---|---|
| Population 1 | p_{1} | 1 - p_{1} | 1 |
| Population 2 | p_{2} | 1 - p_{2} | 1 |

The contingency table of *expected* counts under the hypothesis *p_{1} = p_{2} (= p)* is then

| | Success | Failure | Margin |
|---|---|---|---|
| Population 1 | n p | n (1 - p) | n |
| Population 2 | n p | n (1 - p) | n |
| Margin | 2 n p | 2 n (1 - p) | 2 n |

For each scenario, choose the values of *n* (the sample size in each group) so that the lowest expected count,
i.e., the value of *n p* under the respective scenario, is (approximately) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100.
That is, you have 3 x 12 = 36 scenarios in total. Use a simulation length of at least *M* = 10000.
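A single scenario could be simulated along the following lines. This is a sketch under stated assumptions, not the required solution: the function names are hypothetical, the nominal level 0.05 is an assumption, the Pearson statistic is computed without continuity correction, and tables in which the statistic is undefined are counted separately and treated as non-rejections, which is only one possible convention.

```r
# Pearson chi-square statistic (no continuity correction) for a 2x2 table with
# x1 and x2 successes out of n trials in each group:
#   X2 = 2 n (x1 - x2)^2 / (s (2 n - s)),  where s = x1 + x2.
# The result is NaN when s == 0 or s == 2 n.
pearson2x2 <- function(x1, x2, n) {
  s <- x1 + x2
  2 * n * (x1 - x2)^2 / (s * (2 * n - s))
}

# Hypothetical helper: simulate M tables under p1 = p2 = p.
simChisq2x2 <- function(n, p, M = 10000, alpha = 0.05) {
  x1   <- rbinom(M, size = n, prob = p)
  x2   <- rbinom(M, size = n, prob = p)
  stat <- pearson2x2(x1, x2, n)
  degenerate <- is.nan(stat)
  # Assumption: an undefined statistic is counted as "not rejected".
  reject <- !degenerate & stat > qchisq(1 - alpha, df = 1)
  list(level = mean(reject), degenerate = mean(degenerate), stat = stat)
}

set.seed(552)
res <- simChisq2x2(n = 100, p = 0.1)   # lowest expected count n p = 10
res$level        # empirical significance level
res$degenerate   # proportion of tables with an undefined statistic
```

Generating the binomial counts directly avoids storing the *2 n* individual Bernoulli observations, and the vectorised formula avoids calling `chisq.test()` *M* times.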

Report the results (empirical significance levels) in the form of well-formatted table(s) included in the LaTeX (or Sweave) document that you have already prepared for the earlier assignments.

Additionally, use suitable graphical tools to compare the empirical distributions of the test statistics (under the considered scenarios) to the assumed χ^{2} distributions. Include the relevant plots in the document.
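One suitable graphical tool is a quantile-quantile plot against the χ^{2} distribution with 1 degree of freedom, sketched below for a single scenario. The chosen scenario (*n* = 50, *p* = 0.1), the number of plotted quantiles and the decision to drop undefined statistics are assumptions made only for illustration.

```r
# Simulate test statistics for one scenario (n p = 5), equal group sizes.
set.seed(1)
n <- 50; p <- 0.1; M <- 10000
x1 <- rbinom(M, size = n, prob = p)
x2 <- rbinom(M, size = n, prob = p)
s  <- x1 + x2
stat <- 2 * n * (x1 - x2)^2 / (s * (2 * n - s))
stat <- stat[is.finite(stat)]          # drop undefined statistics (s == 0 or 2 n)

# Empirical versus theoretical quantiles.
probs <- ppoints(200)
qEmp  <- quantile(stat, probs, names = FALSE)
qTheo <- qchisq(probs, df = 1)

plot(qTheo, qEmp,
     xlab = "Quantiles of the chi-square distribution (df = 1)",
     ylab = "Empirical quantiles of the test statistic",
     main = "n = 50, p = 0.1")
abline(0, 1, lty = 2)                  # perfect agreement lies on this line
```

The step pattern in such a plot reflects the discreteness of the test statistic, which is most pronounced for small expected counts.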

**Remark:** Before you start the simulation, think a little about whether some scenarios might pose computational or theoretical problems.

Tutorial 7 (27/11/2023)

R script (graphics) | Data (nelsNE2) | pchShow.R |

No new assignment for this tutorial. Work on previous assignments.

Tutorial 8 (27/11/2023)

R script (palettes) | data (PS PČR 2021, csv) | data (PS PČR 2021, RData) |

R script (shape maps) | shape files (CZE_adm.tar.gz) | |

R script (3D plots) | R function (dmixn) |

Take the results of the PS PČR 2021 elections (or of other elections, e.g., the fresh 2023 parliamentary elections in Slovakia might be more interesting for some students) and calculate the conditional distributions of votes by region. For at least one party, visualize the respective regional proportions in a map. Include the map in a separate section of the document that you have been preparing in the previous assignments.
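The conditional distributions can be computed with `xtabs()` and `prop.table()`, as sketched below. The data frame `votes`, its column names and all numbers are made up for illustration and do not correspond to any real election results.

```r
# Made-up toy data in long format: one row per region and party.
votes <- data.frame(region = c("A", "A", "B", "B"),
                    party  = c("X", "Y", "X", "Y"),
                    n      = c(60, 40, 30, 90))

# Region x party table of vote counts.
tab <- xtabs(n ~ region + party, data = votes)

# Conditional distribution of votes given region: each row sums to 1.
condDist <- prop.table(tab, margin = 1)
condDist
```

A column of `condDist` then holds the regional proportions of one party, which is the quantity to be shown in the map.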

Tutorial 9 (04/12/2023)

Sněhurka (Karlín) | Snow White (Karlín) | IT4Innovations (Ostrava) |

Try to implement the simulation study from Assignment 6 in an efficient way.

The course credit will be awarded to the student who hands in a satisfactory solution to each assignment by the prescribed deadline. The nature of these requirements precludes any possibility of additional attempts to obtain the course credit.

**DEADLINE** for delivery of all the files: **Sunday 25/02/2024** (23:59 CET).
