Title: | General Hypothesis Testing Problems for Multivariate Coefficients of Variation |
---|---|
Description: | Performs test procedures for general hypothesis testing problems for four multivariate coefficients of variation (Ditzhaus and Smaga, 2023 <arXiv:2301.12009>). We can verify the global hypothesis about equality as well as the particular hypotheses defined by contrasts, e.g., we can conduct post hoc tests. We also provide the simultaneous confidence intervals for contrasts. |
Authors: | Marc Ditzhaus [aut], Lukasz Smaga [aut, cre] |
Maintainer: | Lukasz Smaga <[email protected]> |
License: | LGPL-2 | LGPL-3 | GPL-2 | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-07 03:59:10 UTC |
Source: | https://github.com/cran/GFDmcv |
Returns different contrast matrices, namely the centering matrix as well as the matrices of Dunnett's and Tukey's contrasts for a given number of groups.
contr_mat(k, type = c("center", "Dunnett", "Tukey"))
k |
an integer denoting a number of groups |
type |
a character string denoting the type of contrasts. The possible values are "center", "Dunnett" and "Tukey". |
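The centering matrix is $\mathbf{P}_k = \mathbf{I}_k - \mathbf{J}_k / k$, where $\mathbf{I}_k$ is the $k \times k$ unity matrix and $\mathbf{J}_k = \mathbf{1}\mathbf{1}^\top \in \mathbb{R}^{k \times k}$ is the matrix consisting of 1's only.
The matrix of Dunnett's contrasts:
$$\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & 0 & 0 & \cdots & 1 \end{pmatrix} \in \mathbb{R}^{(k-1) \times k}$$
The matrix of Tukey's contrasts:
$$\begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ -1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ -1 & 0 & 0 & \cdots & 0 & 1 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix} \in \mathbb{R}^{\binom{k}{2} \times k}$$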
The matrix of contrasts.
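The three matrix types can be illustrated with a minimal base-R sketch. The helper name `contrast_sketch` is hypothetical and this is not the package's source code; in particular, the sign convention and row order of the Dunnett and Tukey rows (the lower-numbered group enters with -1) are assumptions made for illustration.

```r
# Hypothetical helper (not part of GFDmcv): builds contrast matrices in base R.
contrast_sketch <- function(k, type = c("center", "Dunnett", "Tukey")) {
  type <- match.arg(type)
  if (type == "center") {
    # P_k = I_k - J_k / k
    diag(k) - matrix(1 / k, k, k)
  } else if (type == "Dunnett") {
    # each row compares one of the groups 2, ..., k with group 1 (assumed sign)
    cbind(-1, diag(k - 1))
  } else {
    # one row per pair (i, j) with i < j: -1 in column i, 1 in column j
    pairs <- utils::combn(k, 2)
    h <- matrix(0, ncol(pairs), k)
    for (r in seq_len(ncol(pairs))) {
      h[r, pairs[1, r]] <- -1
      h[r, pairs[2, r]] <- 1
    }
    h
  }
}

contrast_sketch(4, type = "center")
contrast_sketch(4, type = "Dunnett")
contrast_sketch(4, type = "Tukey")
```

Every row of each matrix sums to zero, which is what makes it a contrast matrix.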
Ditzhaus M., Smaga L. (2022) Permutation test for the multivariate coefficient of variation in factorial designs. Journal of Multivariate Analysis 187, 104848.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Dunnett C. (1955) A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096-1121.
Tukey J.W. (1953) The problem of multiple comparisons. Princeton University.
contr_mat(4, type = "center")
contr_mat(4, type = "Dunnett")
contr_mat(4, type = "Tukey")
Calculates the estimators, with the respective confidence intervals at level conf_level, for the four different variants of the multivariate coefficient of variation (MCV) and their reciprocals by Reyment (1960), Van Valen (1974), Voinov and Nikulin (1996) and Albert and Zhang (2010).
e_mcv(x, conf_level = 0.95)
x |
a matrix of data of size $n \times d$ |
conf_level |
a confidence level. By default, it is equal to 0.95. |
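The function e_mcv() calculates four different variants of the multivariate coefficient of variation for $d$-dimensional data. These variants were introduced by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) and Albert and Zhang (2010, AZ):
$$\widehat{C}^{RR} = \sqrt{\frac{(\det \widehat{\Sigma})^{1/d}}{\widehat{\mu}^\top \widehat{\mu}}}, \quad \widehat{C}^{VV} = \sqrt{\frac{\mathrm{tr}\, \widehat{\Sigma}}{\widehat{\mu}^\top \widehat{\mu}}}, \quad \widehat{C}^{VN} = \sqrt{\frac{1}{\widehat{\mu}^\top \widehat{\Sigma}^{-1} \widehat{\mu}}}, \quad \widehat{C}^{AZ} = \sqrt{\frac{\widehat{\mu}^\top \widehat{\Sigma} \widehat{\mu}}{(\widehat{\mu}^\top \widehat{\mu})^2}},$$
where $\widehat{\mu}$ is the empirical mean vector and $\widehat{\Sigma}$ is the empirical covariance matrix of a sample of size $n$.
In the univariate case ($d = 1$), all four variants reduce to the coefficient of variation. Furthermore, their reciprocals, the so-called standardized means, are determined:
$$\widehat{B}^{RR} = \frac{1}{\widehat{C}^{RR}}, \quad \widehat{B}^{VV} = \frac{1}{\widehat{C}^{VV}}, \quad \widehat{B}^{VN} = \frac{1}{\widehat{C}^{VN}}, \quad \widehat{B}^{AZ} = \frac{1}{\widehat{C}^{AZ}}.$$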
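The point estimators can be illustrated with a short base-R sketch using the definitions from the cited literature. The helper name `mcv_sketch` is hypothetical and this is not the code of e_mcv(); it uses `cov()`, which divides by $n - 1$, and the package may normalize the covariance matrix differently, so small numerical differences are possible. No confidence intervals are computed here.

```r
# Hypothetical helper (not part of GFDmcv): point estimates of the four MCVs.
mcv_sketch <- function(x) {
  x <- as.matrix(x)
  d <- ncol(x)
  mu <- colMeans(x)          # empirical mean vector
  sigma <- cov(x)            # empirical covariance (n - 1 denominator, an assumption)
  mu2 <- sum(mu^2)           # mu' mu
  c_est <- c(
    RR = sqrt(det(sigma)^(1 / d) / mu2),
    VV = sqrt(sum(diag(sigma)) / mu2),
    VN = sqrt(1 / drop(t(mu) %*% solve(sigma, mu))),
    AZ = sqrt(drop(t(mu) %*% sigma %*% mu) / mu2^2)
  )
  # the reciprocals are the standardized means
  data.frame(C_est = c_est, B_est = 1 / c_est)
}

mcv_sketch(iris[iris$Species == "setosa", 1:3])
```

For a one-column matrix all four rows coincide with the usual coefficient of variation, standard deviation divided by the absolute mean.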
In addition to the estimators, the respective confidence intervals [C_lwr, C_upr] for a given confidence level are calculated by the e_mcv() function. These confidence intervals are based on an asymptotic approximation by a normal distribution; see Ditzhaus and Smaga (2023) for the technical details. These approximations do not rely on any specific (semi-)parametric assumption on the distribution and are valid nonparametrically, even for tied data.
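As a generic illustration of an interval based on a normal approximation, the sketch below builds a two-sided interval from a point estimate and a standard error. The helper name `normal_ci` and the inputs `est` and `se` are hypothetical; the standard error actually used by the package is derived in Ditzhaus and Smaga (2023) and is not reproduced here.

```r
# Hypothetical helper (not part of GFDmcv): two-sided normal-approximation interval.
normal_ci <- function(est, se, conf_level = 0.95) {
  # standard normal quantile for the two-sided level
  z <- qnorm(1 - (1 - conf_level) / 2)
  c(lwr = est - z * se, est = est, upr = est + z * se)
}

# est and se are made-up numbers for illustration only
normal_ci(est = 0.12, se = 0.01, conf_level = 0.95)
```

Raising conf_level widens the interval, since the normal quantile grows with the level.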
When $d > 1$ (respectively $d = 1$), a data frame with four rows (one row) corresponding to the four MCVs (the univariate CV) and six columns containing the estimators C_est for the MCV (CV) and the estimators B_est for their reciprocals, as well as the lower and upper bounds of the corresponding confidence intervals [C_lwr, C_upr] and [B_lwr, B_upr].
Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda, Stockholm Contributions in Geology, vol 7.
Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.
Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.
# d > 1 (MCVs)
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
lapply(data_set, e_mcv)
# d = 1 (CV)
data_set <- lapply(list(iris[iris$Species == "setosa", 1],
                        iris[iris$Species == "versicolor", 1],
                        iris[iris$Species == "virginica", 1]),
                   as.matrix)
lapply(data_set, e_mcv)
The function GFDmcv() calculates the Wald-type statistic for global null hypotheses and max-type statistics for multiple local null hypotheses, both in terms of the four variants of the multivariate coefficient of variation. The respective $p$-values are obtained by a $\chi^2$-approximation, a pooled bootstrap strategy and a pooled permutation approach (only for the Wald-type statistic), respectively.
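The idea of pooled resampling can be sketched in base R as follows. The helper name `pooled_resample` is hypothetical and this is only the generic resampling step, not the package's implementation: all groups are pooled, and new groups of the original sizes are drawn without replacement (permutation) or with replacement (bootstrap). How the test statistic is recomputed on each resample is described in the cited papers.

```r
# Hypothetical helper (not part of GFDmcv): one pooled permutation/bootstrap draw.
pooled_resample <- function(x_list, replace = FALSE) {
  sizes <- vapply(x_list, nrow, integer(1))
  pooled <- do.call(rbind, x_list)               # stack all groups
  idx <- sample(nrow(pooled), size = nrow(pooled), replace = replace)
  group <- rep(seq_along(sizes), times = sizes)  # original group sizes are kept
  lapply(split(idx, group), function(i) pooled[i, , drop = FALSE])
}

set.seed(1)
groups <- lapply(split(iris[, 1:3], iris$Species), as.matrix)
perm <- pooled_resample(groups)                  # pooled permutation
boot <- pooled_resample(groups, replace = TRUE)  # pooled bootstrap
vapply(perm, nrow, integer(1))
```

Repeating such a draw n_perm or n_boot times and recomputing the statistic each time gives a resampling distribution to compare the observed statistic against.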
GFDmcv( x, h_mct, h_wald, alpha = 0.05, n_perm = 1000, n_boot = 1000, parallel = FALSE, n_cores = NULL )
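x |
a list of length $k$ of data matrices, one per group |
h_mct |
a contrast matrix for the multiple contrast tests (local null hypotheses) |
h_wald |
a contrast matrix for the Wald-type tests (global null hypothesis) |
alpha |
a significance level (then $1 - \alpha$ is the confidence level) |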
n_perm |
a number of permutation replicates. |
n_boot |
a number of bootstrap replicates. |
parallel |
a logical indicating whether to use parallelization. |
n_cores |
if parallel = TRUE, the number of cores used in the parallel computation |
The function GFDmcv() calculates the Wald-type statistic for global null hypotheses of the form
$$\mathcal{H}_0: \mathbf{H}\mathbf{C} = \mathbf{0} \quad \text{and} \quad \mathcal{H}_0: \mathbf{H}\mathbf{B} = \mathbf{0},$$
where $\mathbf{H}$ is a contrast matrix reflecting the research question of interest and $\mathbf{C} = (C_1, \ldots, C_k)^\top$ ($\mathbf{B} = (B_1, \ldots, B_k)^\top$) are the subgroup-specific MCVs (and their reciprocals) by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) or Albert and Zhang (2010, AZ), respectively. We refer to the function e_mcv() for the detailed definitions of the different variants. The $p$-value of the Wald-type statistic relies on a $\chi^2$-approximation, a (pooled) bootstrap or a permutation approach.
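A generic Wald-type quadratic form with a $\chi^2$-approximation can be sketched in base R as below. The helper name `wald_sketch` and the inputs `est` (group-wise estimates) and `v` (an estimate of their covariance matrix) are hypothetical; the covariance estimator and scaling actually used by GFDmcv() are given in the cited papers and are not reproduced here.

```r
# Hypothetical helper (not part of GFDmcv): generic Wald-type statistic.
wald_sketch <- function(h, est, v) {
  hc <- h %*% est                      # contrasts H * estimates
  m <- h %*% v %*% t(h)                # covariance of the contrasts
  s <- svd(m)
  pos <- s$d > sqrt(.Machine$double.eps) * max(s$d)
  # Moore-Penrose inverse, since H V H' is singular for non-full-rank H
  m_pinv <- s$v[, pos, drop = FALSE] %*%
    (t(s$u[, pos, drop = FALSE]) / s$d[pos])
  stat <- drop(t(hc) %*% m_pinv %*% hc)
  df <- sum(pos)                       # rank of H V H'
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

h <- diag(3) - matrix(1 / 3, 3, 3)     # centering matrix for k = 3 groups
est <- c(0.10, 0.15, 0.12)             # made-up estimates
v <- diag(0.01^2, 3)                   # made-up covariance of the estimates
wald_sketch(h, est, v)
```

With the centering matrix the statistic is zero exactly when all group-wise estimates coincide, which corresponds to the global hypothesis of equality.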
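Furthermore, the function GFDmcv() calculates a max-type test statistic for the multiple comparison of local null hypotheses:
$$\mathcal{H}_{0,\ell}: \mathbf{h}_\ell^\top \mathbf{C} = 0 \quad \text{and} \quad \mathcal{H}_{0,\ell}: \mathbf{h}_\ell^\top \mathbf{B} = 0,$$
where $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_r]^\top$ and $\ell = 1, \ldots, r$. The $p$-values are determined by a Gaussian approximation and a bootstrap approach, respectively. In addition to the local test decisions, multiple adjusted confidence intervals for the contrasts $\mathbf{h}_\ell^\top \mathbf{C}$ and $\mathbf{h}_\ell^\top \mathbf{B}$, respectively, are calculated.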
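How the rows of a contrast matrix define local contrasts can be illustrated with made-up numbers. The estimates `c_hat` and standard errors `se` below are hypothetical, and the last line shows only the generic form of a max-type statistic (the largest absolute studentized contrast); the studentization used by GFDmcv() is described in the cited papers.

```r
# Made-up group-wise estimates for k = 3 groups (illustration only)
c_hat <- c(g1 = 0.10, g2 = 0.15, g3 = 0.12)

# All pairwise contrasts, one local hypothesis per row
h <- rbind(c(-1, 1, 0),
           c(-1, 0, 1),
           c(0, -1, 1))
rownames(h) <- c("g2 - g1", "g3 - g1", "g3 - g2")

# Estimated contrasts, one value per row of h
contrasts_hat <- drop(h %*% c_hat)
contrasts_hat

# Generic max-type statistic with made-up standard errors of the contrasts
se <- c(0.02, 0.02, 0.02)
max_stat <- max(abs(contrasts_hat / se))
max_stat
```

Each local hypothesis states that the corresponding entry of the contrast vector is zero, and the simultaneous confidence intervals are built around these entries.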
Please have a look at the plot and summary functions designed for this package. They can be used to simplify the output of GFDmcv().
A list of class gfdmcv containing the following components:
overall_res |
a list of two elements representing the results for testing the global null hypothesis. The first one is a matrix |
mct_res |
all results of MCT tests for particular hypothesis in |
h_mct |
the argument h_mct |
h_wald |
the argument h_wald |
alpha |
the argument alpha |
Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.
Ditzhaus M., Smaga L. (2022) Permutation test for the multivariate coefficient of variation in factorial designs. Journal of Multivariate Analysis 187, 104848.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda, Stockholm Contributions in Geology, vol 7.
Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.
Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.
# Some of the examples may take some time to run.
# one-way analysis for MCV and CV
# d > 1 (MCV)
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
# testing with parallel computing
library(doParallel)
res <- GFDmcv(data_set, h_mct, h_wald, parallel = TRUE, n_cores = 2)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
# two-way analysis for CV (based on the example in Ditzhaus and Smaga, 2022)
library(HSAUR)
data_set <- lapply(list(BtheB$bdi.pre[BtheB$drug == "No" & BtheB$length == "<6m"],
                        BtheB$bdi.pre[BtheB$drug == "No" & BtheB$length == ">6m"],
                        BtheB$bdi.pre[BtheB$drug == "Yes" & BtheB$length == "<6m"],
                        BtheB$bdi.pre[BtheB$drug == "Yes" & BtheB$length == ">6m"]),
                   as.matrix)
# estimators and confidence intervals of CV and its reciprocal
lapply(data_set, e_mcv)
# interaction
h_mct <- contr_mat(4, type = "Tukey")
h_wald <- kronecker(contr_mat(2, type = "center"),
                    contr_mat(2, type = "center"))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
# main effect drug
h_mct <- matrix(c(1, 1, -1, -1), nrow = 1)
h_wald <- kronecker(contr_mat(2, type = "center"), 0.5 * matrix(1, 1, 2))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
# main effect length
h_mct <- matrix(c(1, -1, 1, -1), nrow = 1)
h_wald <- kronecker(0.5 * matrix(1, 1, 2), contr_mat(2, type = "center"))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
Simultaneous confidence intervals for contrasts for CV and MCVs and their reciprocals are plotted.
## S3 method for class 'gfdmcv'
plot(x, ...)
x |
a "gfdmcv" object. |
... |
additional arguments not used. |
No return value, called for side effects.
# Some of the examples may take some time to run.
# For more examples, see the documentation of the GFDmcv() function.
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
Prints the summary of the inference methods for CV and MCVs.
## S3 method for class 'gfdmcv'
summary(object, ...)
object |
a "gfdmcv" object. |
... |
an integer indicating the number of decimal places used to present the numerical results. It can be named digits (see the examples). |
The function prints out the information about the significance level, contrast matrices, test statistics, $p$-values, critical values and simultaneous confidence intervals for contrasts computed by the GFDmcv() function.
No return value, called for side effects.
# Some of the examples may take some time to run.
# For more examples, see the documentation of the GFDmcv() function.
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)