This example illustrates the usage of the proposed multivariate permutation approach based on the discrete optimal transport for a \(d\)-dimensional test statistic \(\mathbf{T}\) for \(d=3\). For simplicity, we use the same three sample problem as in the first example, but now we will consider all three pairwise comparisons.
Consider three independent random samples from shifted continuous distributions. Namely, let \(f\) be a density with a finite variance, and \(X_{i,j}\), \(j=1,\dots,n_i\), is a random sample from a distribution with density \(f(x-\mu_i)\) for \(i=1,2,3\).
The null hypothesis \[ H_0:\mu_1=\mu_2=\mu_3 \] will be tested via partial hypotheses \[ H_{0,1}:\mu_1=\mu_2, \ H_{0,2}:\mu_1=\mu_3 \text{ and } H_{0,3}:\mu_2=\mu_3. \] The test statistic is \(\mathbf{T}=(T_1,T_2,T_3)^\top\), where \(T_k\) is the classical t-statistics for testing \(H_{0,k}\), \(k=1,2,3\).
We use the same setup as in the first
example, i.e. three balanced samples from
\(N(\mu_j,1)\) \(j\in\{1,2,3\}\) of sizes \(n_1=n_2=n_3=5\), and \(\mu_1=\mu_2=0\), \(\mu_3=2\).
The resulting data are shown in the following plot:
First, we compute the test statistics from the data to get the vector
\(\mathbf{T}_0\). In the code below
n1=5
is the sample size of each of the samples.
T1.0 <- t.test(x[1:n1], x[(n1 + 1):(2 * n1)])$stat
T2.0 <- t.test(x[1:n1], x[(2 * n1 + 1):(3 * n1)])$stat
T3.0 <- t.test(x[(n1 + 1):(2 * n1)], x[(2 * n1 + 1):(3 * n1)])$stat
(T.0 <- c(T1.0, T2.0, T3.0))
## t t t
## 0.3750625 -4.5776556 -3.9617182
We proceed similarly as in the first example, according to the following algorithm:
Algorithm 1.
Input: A \(d\)-variate test statistic \(\mathbf{T}_0\) for testing \(H_0\).
If the null hypothesis is rejected, it is also desirable to
In this example we take \(B=799\), so \(B+1=800\).
We will consider a product grid with \(n_R=20\) and \(n_S=40\). Its construction is explained on a page about grid sets.
n.R <- 20
n.S <- 40
d <- 3
N <- n.R * n.S
require(randtoolbox)
x <- halton(n.S, dim = 2) # Halton sequence of n.S points
# transformation to the unit sphere
s <- matrix(NA, nrow=n.S, ncol = 3)
s[ ,1] <- (1-2 * x[ ,1])
s[ ,2] <- 2 * sqrt(x[ ,1] * (1-x[ ,1])) * cos(2 * pi * x[ ,2])
s[ ,3] <- 2 * sqrt(x[ ,1] * (1-x[ ,1])) * sin(2 * pi * x[ ,2])
# final product grid
grid <- matrix(rep(0, d * N), ncol = d)
radius <- (1:n.R)/(n.R + 1)
for(i in 1:n.R)
for(j in 1:n.S)
grid[(i-1) * n.S+j, ]=radius[i] * s[j,]
The resulting grid is plotted in the following interactive plot:
In this step, the vectors \(\mathbf{T}_1,\dots,\mathbf{T}_B\) of permutation test statistics are computed. The procedure is the same as for \(d=2\).
set.seed(1234)
B <- n.R * n.S - 1
#perm. test statistics T_b (in rows):
T.b <- matrix(NA, ncol = 3, nrow = B)
for(b in 1:B){
x.b <- sample(x, size = length(x), replace = FALSE)
T.b[b,1] <- t.test(x.b[1 : n1], x.b[(n1 + 1) : (2 * n1)])$stat
T.b[b,2] <- t.test(x.b[1 : n1], x.b[(2 * n1 + 1) : (3 * n1)])$stat
T.b[b,3] <- t.test(x.b[(n1 + 1):(2*n1)], x.b[(2 * n1 + 1):(3 * n1)])$stat
}
Dimension \(d=3\) still allows us to visualize the obtained permutation test statistics using a 3D plot. The blue points correspond to \(\mathbf{T}_1,\dots,\mathbf{T}_B\) and the red point is \(\mathbf{T}_0\). Obviously, it lies on the edge of the cloud. The next interactive plot enables rotation and so a good inspection of the whole situation.
In this step we will compute the transportation \(F_{\pm}^{(B)}\) of \(\mathbf{T}_0,\mathbf{T}_1,\dots,\mathbf{T}_B\)
to the target grid. The optimization is performed using the function
solve
from package clue
, similarly as for
\(d=2\) in the
first example.
require(clue)
sqdist <- function(a, b){
return(abs(a - b)^2)
}
d <- 3
y <- rbind(T.0, T.b)
costes <- t(outer(grid[ ,1], y[ ,1], "sqdist"))
for(j in 2:d) costes <- costes + t(outer(grid[ ,j], y[ ,j], "sqdist"))
asignacion <- solve_LSAP(costes)
FB <- grid[asignacion, ]
Here, again, FB[1,] is \(F_{\pm}^{(B+1)}(\mathbf{T}_0)\) and FB[b+1,] corresponds to \(F_{\pm}^{(B+1)}(\mathbf{T}_b)\).
The value corresponding to \(F_{\pm}^{(B)}(\mathbf{T}_0)\) is stressed by a red color in the next interactive plot.
Notice that the red point (i.e. the image of \(\boldsymbol{T}_0\)) lies on the most outer orbit of the grid. Since we chose \(n_R=20\) as in the example for \(d=2\), the same p-value (either \(\widehat{p}_e\) or \(\widehat{p}_a\)) is obtained.
vnorm <- function(x) sqrt(sum(x^2))
(p.a <- 1-vnorm(FB[1, ]))
## [1] 0.04761905
FB.norms <- round(apply(FB, 1, vnorm), dig = 6)
(p.e <- 1/(B + 1) * (1 + sum(FB.norms[-1] >= FB.norms[1])))
## [1] 0.05
To evaluate the partial contributions, we compute the directional vector \(\mathbf{D}\):
D <- FB[1, ]/vnorm(FB[1, ])
# contributions in percentages
D^2 * 100
## [1] 3.515625 45.437154 51.047221
The contribution of the first, second and third test statistic to the rejection of \(H_0\) is 4 %, 45 % and 51 % respectively.