We analyze the intraday Bitcoin prices from 2022. The data come from a larger database available at kaggle.com repository. We consider intraday data with frequency 20 min and calculate the logarithmic cumulative intraday returns (CIR). The resulting data are available at bitcoin_CIR.csv file. The observations for each day will be treated as functional data measured at \(J=72\) points in \([0,1]\).


Note that the setting used in the code below differs from the one used in (Hudecová, Hlubinka, and Hlávka 2025), so the output differs as well. Namely, here we use a larger number of permutation samples \(B\) to gain more efficiency. The overall conclusion stays the same, but there is a difference in the final vector of individual contributions.

Data

bitcoin <- read.csv("bitcoin_CIR.csv", stringsAsFactors = TRUE)

We are concerned with the question whether the behavior of CIR differs for working and weekend days. In particular, we consider the cumulative intraday returns for Mondays (sample 1), Wednesdays (sample 2) and Saturdays (sample 3), and test \[ H_0: \varphi_1=\varphi_2=\varphi_3, \] where \(\varphi_i\) is the characteristic functional of sample \(i\in\{1,2,3\}\).

sample1 <- matrix(bitcoin[bitcoin$weekday=="Monday", "logCIR"], byrow = TRUE, ncol =72)
sample2 <- matrix(bitcoin[bitcoin$weekday=="Wednesday", "logCIR"], byrow = TRUE, ncol = 72)
sample3 <- matrix(bitcoin[bitcoin$weekday=="Saturday", "logCIR"], byrow = TRUE, ncol = 72)

data.all <- rbind(sample1, sample2, sample3)
c(nrow(sample1), nrow(sample2), nrow(sample3))
## [1] 52 52 53

The three samples, of sizes \(n_1=52\), \(n_2=52\), and \(n_3=53\), are plotted below:

Test statistic

The null hypothesis is tested via pairwise comparisons of samples (1,2), (1,3) and (2,3), respectively. For pair \((i,j)\), the test statistic for the comparison of the corresponding samples is computed as a suitable \(L_2\) distance of the empirical characteristic functions, as described in (Hlávka, Hlubinka, and Koňasová 2022) or (Hudecová, Hlubinka, and Hlávka 2025). The code for its computation is available in file HHP.R as a function Two.sample.ECDF. The function requires specification of weight matrix \(\boldsymbol{V}\). Based on empirical evidence from (Hlávka, Hlubinka, and Koňasová 2022), it is recommended to choose the matrix \(\boldsymbol{V}\) as data dependent, namely as an approximation to the inverse of the sample covariance matrix, because this choice leads to generally reasonable results under the null as well as against various alternatives. Hence, this is set as default, but other possibilities are available as well.

source("HHP.R")

First, we define a function ECF.TT that computes the multivariate test statistic \(\boldsymbol{T}\) for a general test of \[ H_0: \varphi_1=\dots=\varphi_K. \]

The inputs are:

  • data : \(N\times J\) matrix of \(N\) observations (in rows) of curves evaluated at \(J\) points in \([0,1]\) (columns)
  • nn : \(K\)-dimensional vector of sample sizes

The output is a \(d={K \choose 2}\) dimensional test statistic \(\boldsymbol{T}\), whose components are test statistics for the pairwise comparisons.

ECF.TT <- function(data, nn){

  K <- length(nn)
  
  id <- cumsum(nn)
  
  # transform the matrix data into list of individual data sets
  data.list <- list()
  data.list[[1]] <- data[1:nn[1], ]
  for(k in 2:K) data.list[[k]] <- data[(id[k-1]+1):id[k], ]
  
  TT <- numeric(choose(K, 2)) # the test statistic T of length d=choose(K,2) to be computed
  d <- 1
  
  # pairwise comparisons -> compare all j<l
  for(j in 1:(K-1)) for(l in (j+1):K){
    V <- compute.V(data1 = data.list[[j]], data2 = data.list[[l]], data.all = data, nn = nn, n.grid = ncol(data))
    TT[d] <- Two.sample.ECDF(data.list[[j]], data.list[[l]], V = V)
    d <- d + 1
  }
  
return(TT)
}

Now we can compute the test statistic \(\boldsymbol{T}_0\) for the comparison of the three samples (Monday, Wednesday, and Saturday):

nn <- c(nrow(sample1), nrow(sample2), nrow(sample3))
T0 <- ECF.TT(data = data.all, nn = nn)
T0
## [1] 0.0001038486 0.0001099517 0.0001807475

Permutation sample

We choose the number of permutation replicas as \(B=1999\). The values \(\boldsymbol{T}_1,\dots,\boldsymbol{T}_B\) are calculated using the permutation principle and function ECF.TT. Note that the resulting set depends on the setting of the random generator and on the choice of \(B\). The same holds for the entire multivariate permutation test.

set.seed(1)

B <- 1999

  
T.mat <- matrix(nrow = B, ncol = length(T0))
  
for(b in 1:B){
    s <- sample(1:nrow(data.all))
    data.b <- data.all[s, ]
    T.mat[b,] <- ECF.TT(data.b, nn)
  }

The cloud of 3-dimensional test statistics can be visualized using 3D graph, where \(\boldsymbol{T}_0\) is stressed by red color.

library(plotly)

o.dat <- as.data.frame(cbind(10^4 * rbind(T0, T.mat), c(1, rep(2, B)), c(10, rep(2, B))))
names(o.dat) <- c("T1", "T2", "T3", "ID", "size")
o.dat$color <-  c("red", rep("#808080", B))

o.dat$symbol <-  c("square", rep("circle", B))
fig <- plot_ly(o.dat, x = ~T1, y = ~T2, z = ~T3, type = 'scatter3d', mode = 'markers',
  marker = list(size = ~size, color = ~color, symbol = ~symbol, line = list(color = ~color)))

fig %>% hide_colorbar()

Optimal transport

The multivariate permutation test based on the optimal measure transport (OMT) requires specification of a grid in the unit ball in \(\mathbb{R}^3\). Since the components of \(\boldsymbol{T}\) take only positive values and only large positive values indicate violation of \(H_0\), we need to construct a grid in the set \(\{\boldsymbol{x}\in [0,1]^3: \| \boldsymbol{x}\|\leq 1\}\). A product grid is computed with \(n_R=50\) and \(n_S=40\), see a separate page on grid sets for details on the construction.

nR <- 50
nS <- 40

library(randtoolbox)

y <- halton(nS, dim = 2)
s <- matrix(NA, ncol = 3, nrow = nS)
 
s[,1] <- 1-y[,1]
phi1 <- 1/pi * acos(1 - y[, 1])
phi2 <- y[, 2]/4
s[,2] <- sin(pi * phi1) * cos(2 * pi * phi2)
s[,3] <- sin(pi * phi1) * sin(2 * pi * phi2)
  
grid <- matrix(rep(0, 3*(B + 1)), ncol = 3)
radius <- (1:nR)/(nR + 1)
  
for(i in 1:nR) 
  for(j in 1:nS)
      grid[(i-1)*nS+j, ] <- radius[i] * s[j, ]

Computation of the optimal transport:

require(clue)

sqdist<-function(a,b){
  return(abs(a - b)^2)
}

T.all <- rbind(T0,T.mat)

costes <- t(outer(grid[, 1], T.all[, 1], "sqdist")) + t(outer(grid[, 2], T.all[, 2], "sqdist")) + t(outer(grid[, 3], T.all[, 3], "sqdist"))
asignacion <- solve_LSAP(costes)
FB <- grid[asignacion,]

Visualization of the grid and transported value \(F^*(\boldsymbol{T}_0)\) (red color):

FB.dat <- as.data.frame(cbind(FB, c(1, rep(2, B)), c(10, rep(2,B))))
names(FB.dat) <- c("T1", "T2", "T3", "ID", "size")
FB.dat$color <-  c("red", rep("#808080", B))

FB.dat$symbol <-  c("square", rep("circle", B))
fig <- plot_ly(FB.dat, x = ~T1, y = ~T2, z = ~T3, type = 'scatter3d', mode = 'markers',
  marker = list(size = ~size,color = ~color,symbol = ~symbol,line = list(color=~color)))

fig %>% hide_colorbar()

Permutation p-value

The final p-value is computed as the proportion of cases that are more or the same extreme as \(F^*(\boldsymbol{T}_0)\):

vnorm <- function(x) sqrt(sum(x^2))

FB.norms <- apply(FB, 1, vnorm)
FB.norms <- round(FB.norms, dig = 6)
(p1.e <- 1/(B + 1) * (1 + sum(FB.norms[-1] >= FB.norms[1])))
## [1] 0.04
(p1.a <- 1 - vnorm(FB[1, ]))
## [1] 0.03921569

The null hypothesis of equality of distribution is rejected on significance level \(\alpha=0.05\). Therefore, it makes sense to explore the vector of individual contributions:

D <- FB[1, ]/vnorm(FB[1, ])
round(D^2, 3)
## [1] 0.079 0.215 0.706

These results indicate that mainly the comparison between Wednesday and Saturday contributes to the rejection of \(H_0\), while the contribution of the comparison between Monday and Wednesday is rather negligible.

Comparison: Functional ANOVA

For a comparison, we conduct also various ANOVA tests for testing \[ \widetilde{H}_0: \mu_1=\mu_2=\mu_3, \] where \(\mu_i\) is the mean function of sample \(i\in\{1,2,3\}\). We use a battery of various tests available in package fdANOVA, (Górecki and Smaga 2019):

library(fdANOVA)

grp <- c(rep(1, nrow(sample1)), rep(2, nrow(sample2)), rep(3, nrow(sample3))) 
F.anova <- fanova.tests(t(data.all), group.label = grp, parallel = FALSE, test = "ALL")

pval <- c(F.anova$FP$pval, F.anova$CS$pval, F.anova$L2B$pval, F.anova$L2b$pval, F.anova$FB$pval, F.anova$Fb$pval)

The resulting p-values are shown in the following table. We see that none of the tests rejects the equality of mean functions.

P-values of ANOVA tests from fdANOVA
FP CS L2B L2b FB Fb
0.793 0.7849 0.772499 0.7819 0.7748013 0.7918

Back to the main page.

References

Górecki, Tomasz, and Łukasz Smaga. 2019. fdANOVA: An r Software Package for Analysis of Variance for Univariate and Multivariate Functional Data.” Computational Statistics 34 (2): 571–97.
Hlávka, Zdeněk, Daniel Hlubinka, and Kateřina Koňasová. 2022. “Functional ANOVA Based on Empirical Characteristic Functionals.” Journal of Multivariate Analysis 189: 104878.
Hudecová, Šárka, Daniel Hlubinka, and Zdeněk Hlávka. 2025. “Functional k Sample Problem via Multivariate Optimal Measure Transport-Based Permutation Test.” In International Workshop on Functional and Operatorial Statistics, 249–57. Springer.