Distribution and correlation-free two-sample test of high-dimensional means

The Annals of Statistics

2020, Vol. 48, No. 3, 1304–1328

https://doi.org/10.1214/19-AOS1848

DISTRIBUTION AND CORRELATION-FREE TWO-SAMPLE TEST OF HIGH-DIMENSIONAL MEANS

BY KAIJIE XUE AND FANG YAO

School of Statistics and Data Science, Nankai University, [email protected]

Department of Probability and Statistics, School of Mathematical Sciences, Center for Statistical Science, Peking University,

[email protected]

We propose a two-sample test for high-dimensional means that requires neither distributional nor correlational assumptions, besides some weak conditions on the moments and tail properties of the elements in the random vectors. This two-sample test, based on a nontrivial extension of the one-sample central limit theorem (Ann. Probab. 45 (2017) 2309–2352), provides a practically useful procedure with rigorous theoretical guarantees on its size and power assessment. In particular, the proposed test is easy to compute and does not require the observations to be independently and identically distributed: they are allowed to have different distributions and arbitrary correlation structures. Further desired features include weaker moment and tail conditions than existing methods, allowance for highly unequal sample sizes, consistent power behavior under fairly general alternatives, and a data dimension that is allowed to be exponentially high under the umbrella of such general conditions. Simulated and real data examples demonstrate favorable numerical performance over existing methods.

1. Introduction. The two-sample test of high-dimensional means, as one of the key issues in high-dimensional statistics, has attracted a great deal of attention due to its importance in various applications, including [2–5, 10–12, 19, 24–26, 29] and [21], among others. In this article, we tackle this problem with the theoretical advance brought by a high-dimensional two-sample central limit theorem. Based on this, we propose a new type of testing procedure, called the distribution and correlation-free (DCF) two-sample mean test, which requires neither distributional nor correlational assumptions and thus greatly enhances its generality in practice.

We denote two samples by $X^n = \{X_1,\ldots,X_n\}$ and $Y^m = \{Y_1,\ldots,Y_m\}$, respectively, where $X^n$ is a collection of mutually independent (not necessarily identically distributed) random vectors in $\mathbb{R}^p$ with $X_i = (X_{i1},\ldots,X_{ip})'$ and $E(X_i) = \mu^X = (\mu^X_1,\ldots,\mu^X_p)'$, $i = 1,\ldots,n$, and $Y^m$ is defined in a similar fashion with $E(Y_i) = \mu^Y = (\mu^Y_1,\ldots,\mu^Y_p)'$ for all $i = 1,\ldots,m$. The normalized sums $S_n^X$ and $S_m^Y$ are denoted by $S_n^X = n^{-1/2}\sum_{i=1}^n X_i = (S^X_{n1},\ldots,S^X_{np})'$ and $S_m^Y = m^{-1/2}\sum_{i=1}^m Y_i = (S^Y_{m1},\ldots,S^Y_{mp})'$, respectively. Note that we only assume independent observations and that each sample has a common mean. The hypothesis of interest is

$$H_0: \mu^X = \mu^Y \quad \text{v.s.} \quad H_a: \mu^X \neq \mu^Y,$$

and the proposed two-sample DCF mean test is such that we reject $H_0: \mu^X = \mu^Y$ at significance level $\alpha \in (0, 1)$, provided that

$$T_n = \|S_n^X - n^{1/2} m^{-1/2} S_m^Y\|_\infty \ge c_B(\alpha),$$

where $T_n = \|S_n^X - n^{1/2} m^{-1/2} S_m^Y\|_\infty$ is the test statistic that only depends on the infinity norm of the sample mean difference, and $c_B(\alpha)$, which plays a central role in this test, is a data-driven critical value defined in (5) of Theorem 3. It is worth mentioning that $c_B(\alpha)$ is easy to compute via a multiplier bootstrap based on a set of independently and identically distributed (i.i.d.) standard normal random variables that are independent of the data, where the explicit calculation is described after (6). Note that the computation of the proposed test is of an order $O\{n(p + N)\}$, more efficient than the $O(Nnp)$ that is usually demanded by a general resampling method. In spite of the simple structure of $T_n$, we shall illustrate its desirable theoretical properties and superior numerical performance in the rest of the article.

Received October 2018; revised January 2019.

MSC2010 subject classifications. 62H05, 62F05.

Key words and phrases. High-dimensional central limit theorem, Kolmogorov distance, multiplier bootstrap, power function.

We emphasize that our main contributions reside in developing a practically useful test that is computationally efficient with rigorous theoretical guarantees given in Theorems 3–5. We begin by deriving nontrivial two-sample extensions of the one-sample central limit theorems and the corresponding bootstrap approximation theorems in high dimensions [9], where we do not require the ratio between sample sizes $n/(n + m)$ to converge but merely to reside within any open interval $(c_1, c_2)$, $0 < c_1 \le c_2 < 1$, as $n, m \to \infty$. Further, Theorem 3 lays down a foundation for conducting the two-sample DCF mean test uniformly over all $\alpha \in (0, 1)$. The power of the proposed test is assessed in Theorem 4, which establishes the asymptotic equivalence between the estimated and true versions. Moreover, the asymptotic power is shown to be consistent in Theorem 5 under some general alternatives with no sparsity or correlation constraints.

The proposed test sets itself apart from existing methods by allowing for non-i.i.d. random vectors in both samples. The distribution-free feature is in the sense that, under the umbrella of some mild assumptions on the moments and tail properties of the coordinates, there is no other restriction on the distributions of those random vectors. In contrast, the existing literature requires the random vectors within each sample to be i.i.d. [3–6], and some methods further restrict the coordinates to follow a certain type of distribution, such as Gaussian or sub-Gaussian [26, 29]. This feature frees the proposed test from assumptions such as i.i.d. sampling or sub-Gaussianity, which is desirable as distributions of real data are often confounded by numerous factors unknown to researchers. Another key feature is that the test is correlation-free, in the sense that individual random vectors may have different and arbitrary correlation structures. By contrast, most previous works assume not only a common within-sample correlation matrix, but also some structural conditions, such as those on the trace [5], mixing conditions [21] or eigenvalues bounded from below [3]. It is worth noting that our assumptions on the moments and tail properties of the coordinates in random vectors are also weaker than those adopted in the literature; for example, [3, 11] and [21] assumed a common fixed upper bound on those moments, while [5] and [19] allowed a portion of those moments to grow but paid a price on correlation assumptions.

We also stress that the proposed test possesses consistent power behavior under fairly general alternatives (a mild separation lower bound on $\mu^X - \mu^Y$ in Theorem 5) with neither sparsity nor correlation conditions, while previous work requires either sparsity [26], or structural assumptions on signal strength [5, 11] or correlation [21], or both [3]. Lastly, we point out that the data dimension $p$ can be exponentially high relative to the sample size under the umbrella of such mild assumptions. This is also favorable compared to previous work, as [3, 5] and [21] allowed such ultrahigh dimensions under nontrivial conditions on either the distribution type (e.g., sub-Gaussian) or the correlation structure (or both) as a tradeoff.

We conclude the Introduction by noting relevant work on the one-sample high-dimensional mean test, such as [14–18, 20, 23, 27, 28] and [1], among others. It is relatively easier to develop a one-sample DCF mean test with similar advantages based on results in [9], and thus this is not pursued here. The rest of the article is organized as follows. In Section 2, we present the two-sample high-dimensional central limit theorem and the result on the multiplier bootstrap for evaluating the Gaussian approximation. In Section 3, we establish the main result, Theorem 3, for conducting the proposed test, and Theorem 4 to approximate its power function, followed by Theorem 5 to analyze its asymptotic power under alternatives. A simulation study is carried out in Section 4 to compare with existing methods, and an application to a real data example is presented in Section 5. We collect the auxiliary lemmas and the proofs of the main results, Theorems 3–5, in the Appendix, and relegate the proofs of Theorems 1–2, Corollary 1 and the auxiliary lemmas to an online Supplementary Material [22] for space economy.

2. Two-sample central limit theorem and multiplier bootstrap in high dimensions. In this section, we first present an intelligible two-sample central limit theorem in high dimensions, which is derived from its more abstract version in Lemma 4 in the Appendix. Then the result on the asymptotic equivalence between the Gaussian approximation appearing in the two-sample central limit theorem and its multiplier bootstrap term is also elaborated; its abstract version can be found in Lemma 5.

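We first list some notation used throughout the paper. For two vectors $x = (x_1,\ldots,x_p)' \in \mathbb{R}^p$ and $y = (y_1,\ldots,y_p)' \in \mathbb{R}^p$, write $x \le y$ if $x_j \le y_j$ for all $j = 1,\ldots,p$. For any $x = (x_1,\ldots,x_p)' \in \mathbb{R}^p$ and $a \in \mathbb{R}$, denote $x + a = (x_1 + a,\ldots,x_p + a)'$. For any $a, b \in \mathbb{R}$, use the notation $a \vee b = \max\{a,b\}$ and $a \wedge b = \min\{a,b\}$. For any two sequences of constants $a_n$ and $b_n$, write $a_n \lesssim b_n$ if $a_n \le C b_n$ up to a universal constant $C > 0$, and $a_n \sim b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$. For any matrix $A = (a_{ij})$, define $\|A\|_\infty = \max_{i,j}|a_{ij}|$. For any function $f : \mathbb{R} \to \mathbb{R}$, write $\|f\|_\infty = \sup_{z\in\mathbb{R}}|f(z)|$. For a smooth function $g : \mathbb{R}^p \to \mathbb{R}$, we adopt indices to represent the partial derivatives for brevity, for example, $\partial_j\partial_k\partial_l g = g_{jkl}$. For any $\alpha > 0$, define the function $\psi_\alpha(x) = \exp(x^\alpha) - 1$ for $x \in [0, \infty)$; then for any random variable $X$, define

$$\|X\|_{\psi_\alpha} = \inf\bigl\{\lambda > 0 : E\{\psi_\alpha(|X|/\lambda)\} \le 1\bigr\}, \tag{1}$$

which is an Orlicz norm for $\alpha \in [1, \infty)$ and a quasi-norm for $\alpha \in (0, 1)$.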

Denote $F^n = \{F_1,\ldots,F_n\}$ as a set of mutually independent random vectors in $\mathbb{R}^p$ such that $F_i = (F_{i1},\ldots,F_{ip})'$ and $F_i \sim N_p(\mu^X, E\{(X_i - \mu^X)(X_i - \mu^X)'\})$ for all $i = 1,\ldots,n$, which denotes a Gaussian approximation to $X^n$. Likewise, define a set of mutually independent random vectors $G^m = \{G_1,\ldots,G_m\}$ in $\mathbb{R}^p$ such that $G_i = (G_{i1},\ldots,G_{ip})'$ and $G_i \sim N_p(\mu^Y, E\{(Y_i - \mu^Y)(Y_i - \mu^Y)'\})$ for all $i = 1,\ldots,m$ to approximate $Y^m$. The sets $X^n$, $Y^m$, $F^n$ and $G^m$ are assumed to be independent of each other. To this end, denote the normalized sums $S_n^X$, $S_n^F$, $S_m^Y$ and $S_m^G$ by

$$S_n^X = n^{-1/2}\sum_{i=1}^n X_i = (S^X_{n1},\ldots,S^X_{np})', \qquad S_n^F = n^{-1/2}\sum_{i=1}^n F_i = (S^F_{n1},\ldots,S^F_{np})',$$

$$S_m^Y = m^{-1/2}\sum_{i=1}^m Y_i = (S^Y_{m1},\ldots,S^Y_{mp})', \qquad S_m^G = m^{-1/2}\sum_{i=1}^m G_i = (S^G_{m1},\ldots,S^G_{mp})',$$

where $S_n^F$ and $S_m^G$ serve as the Gaussian approximations for $S_n^X$ and $S_m^Y$, respectively. Lastly, denote a set of independent standard normal random variables $e^{n+m} = \{e_1,\ldots,e_{n+m}\}$ that is independent of any of $X^n$, $F^n$, $Y^m$ and $G^m$.

2.1. Two-sample central limit theorem in high dimensions. To introduce Theorem 1, some useful notation is given as follows. Denote

$$L_n^X = \max_{1\le j\le p}\sum_{i=1}^n E\bigl(|X_{ij} - \mu^X_j|^3\bigr)/n, \qquad L_m^Y = \max_{1\le j\le p}\sum_{i=1}^m E\bigl(|Y_{ij} - \mu^Y_j|^3\bigr)/m.$$

We denote the key quantity $\rho^{**}_{n,m}$ by

$$\rho^{**}_{n,m} = \sup_{A\in\mathcal{A}^{\mathrm{re}}}\Bigl|P\bigl(S_n^X - n^{1/2}\mu^X + \delta_{n,m}S_m^Y - \delta_{n,m}m^{1/2}\mu^Y \in A\bigr) - P\bigl(S_n^F - n^{1/2}\mu^X + \delta_{n,m}S_m^G - \delta_{n,m}m^{1/2}\mu^Y \in A\bigr)\Bigr|, \tag{2}$$

where $P(S_n^X - n^{1/2}\mu^X + \delta_{n,m}S_m^Y - \delta_{n,m}m^{1/2}\mu^Y \in A)$ represents the unknown probability of interest, $P(S_n^F - n^{1/2}\mu^X + \delta_{n,m}S_m^G - \delta_{n,m}m^{1/2}\mu^Y \in A)$ serves as a Gaussian approximation to this probability of interest, and $\rho^{**}_{n,m}$ measures the error of approximation over all hyperrectangles $A \in \mathcal{A}^{\mathrm{re}}$. Note that $\mathcal{A}^{\mathrm{re}}$ is the class of all hyperrectangles in $\mathbb{R}^p$ of the form $\{w \in \mathbb{R}^p : a_j \le w_j \le b_j \text{ for all } j = 1,\ldots,p\}$ with $-\infty \le a_j \le b_j \le \infty$ for all $j = 1,\ldots,p$. By assuming more specific conditions, Theorem 1 gives a more explicit bound on $\rho^{**}_{n,m}$ compared to Lemma 4.

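THEOREM 1. For any sequence of constants $\delta_{n,m}$, assume we have the following conditions (a)–(e):

(a) There exist universal constants $\delta_1 > \delta_2 > 0$ such that $\delta_2 < |\delta_{n,m}| < \delta_1$.

(b) There exists a universal constant $b > 0$ such that
$$\min_{1\le j\le p} E\bigl\{\bigl(S^X_{nj} - n^{1/2}\mu^X_j + \delta_{n,m}S^Y_{mj} - \delta_{n,m}m^{1/2}\mu^Y_j\bigr)^2\bigr\} \ge b.$$

(c) There exists a sequence of constants $B_{n,m} \ge 1$ such that $L_n^X \le B_{n,m}$ and $L_m^Y \le B_{n,m}$.

(d) The sequence of constants $B_{n,m}$ defined in (c) also satisfies
$$\max_{1\le i\le n}\max_{1\le j\le p} E\bigl\{\exp\bigl(|X_{ij} - \mu^X_j|/B_{n,m}\bigr)\bigr\} \le 2, \qquad \max_{1\le i\le m}\max_{1\le j\le p} E\bigl\{\exp\bigl(|Y_{ij} - \mu^Y_j|/B_{n,m}\bigr)\bigr\} \le 2.$$

(e) There exists a universal constant $c_1 > 0$ such that
$$(B_{n,m})^2\{\log(pn)\}^7/n \le c_1, \qquad (B_{n,m})^2\{\log(pm)\}^7/m \le c_1.$$

Then we have the following property, where $\rho^{**}_{n,m}$ is defined in (2):
$$\rho^{**}_{n,m} \le K_1\Bigl(\bigl[(B_{n,m})^2\{\log(pn)\}^7/n\bigr]^{1/6} + \bigl[(B_{n,m})^2\{\log(pm)\}^7/m\bigr]^{1/6}\Bigr)$$
for a universal constant $K_1 > 0$.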

Conditions (a)–(c) correspond to the moment properties of the coordinates, and (d) concerns the tail properties. It follows from (a) and (b) that the moments on average are bounded below away from zero, hence allowing a certain proportion of these moments to converge to zero. This is weaker than previous work that usually requires a uniform lower bound on all moments [3, 11, 21]. Condition (c) implies that the moments on average have an upper bound $B_{n,m}$ that can diverge to infinity without restriction on correlation, and thus offers more flexibility than conditions in the literature that demand either a fixed upper bound or a certain correlation structure or both. To appreciate this, letting $B_{n,m} \sim n^{1/3}$, one notes that all the variances of the coordinates are allowed to be uniformly as large as $B_{n,m}^{2/3} \sim n^{2/9} \to \infty$ under condition (c), while no restriction on correlation is needed. As a comparison, if we assign a common covariance to the two samples, say $\Sigma = (\Sigma_{jk})_{1\le j,k\le p}$ with each $\Sigma_{jk} = n^{2/9}\rho^{1\{j\neq k\}}$ for some constant $\rho \in (0, 1)$, then the trace condition in [5] implies that $p = o(1)$. Compared with a fixed upper bound on the tails of the coordinates [3, 21], condition (d) allows for uniformly diverging tails as long as $B_{n,m} \to \infty$. Condition (e) indicates that the data dimension $p$ can grow exponentially in $n$, provided that $B_{n,m}$ is of some appropriate order. These conditions as a whole set the basis for the so-called "distribution and correlation-free" features.

2.2. Two-sample multiplier bootstrap in high dimensions. The Gaussian approximation in $\rho^{**}_{n,m}$ in (2) involves an unknown probability, which limits the applicability of the central limit theorem for inference. The idea is to adopt a multiplier bootstrap to approximate this Gaussian approximation and to quantify the approximation error bound. Denote

$$\Sigma^X = n^{-1}\sum_{i=1}^n E\bigl\{(X_i - \mu^X)(X_i - \mu^X)'\bigr\}, \qquad \hat\Sigma^X = n^{-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)',$$

where $\bar X = n^{-1}\sum_{i=1}^n X_i = (\bar X_1,\ldots,\bar X_p)'$. Analogously, denote $\Sigma^Y$, $\hat\Sigma^Y$ and $\bar Y$. Now we introduce the multiplier bootstrap approximation in this context. Let $e^{n+m} = \{e_1,\ldots,e_{n+m}\}$ be a set of i.i.d. standard normal random variables independent of the data; we further denote

$$S_n^{eX} = n^{-1/2}\sum_{i=1}^n e_i(X_i - \bar X), \qquad S_m^{eY} = m^{-1/2}\sum_{i=1}^m e_{i+n}(Y_i - \bar Y), \tag{3}$$

and it is obvious that $E_e(S_n^{eX}S_n^{eX\prime}) = \hat\Sigma^X$ and $E_e(S_m^{eY}S_m^{eY\prime}) = \hat\Sigma^Y$, where $E_e(\cdot)$ means the expectation with respect to $e^{n+m}$ only. Then, for any sequence of constants $\delta_{n,m}$ that depends on both $n$ and $m$, we denote the quantity of interest $\rho^{MB}_{n,m}$ by

$$\rho^{MB}_{n,m} = \sup_{A\in\mathcal{A}^{\mathrm{re}}}\Bigl|P_e\bigl(S_n^{eX} + \delta_{n,m}S_m^{eY} \in A\bigr) - P\bigl(S_n^F - n^{1/2}\mu^X + \delta_{n,m}S_m^G - \delta_{n,m}m^{1/2}\mu^Y \in A\bigr)\Bigr|, \tag{4}$$

where $P_e(\cdot)$ means the probability with respect to $e^{n+m}$ only, and $P_e(S_n^{eX} + \delta_{n,m}S_m^{eY} \in A)$ acts as the multiplier bootstrap approximation for the Gaussian approximation $P(S_n^F - n^{1/2}\mu^X + \delta_{n,m}S_m^G - \delta_{n,m}m^{1/2}\mu^Y \in A)$. In particular, $\rho^{MB}_{n,m}$ can be understood as a measure of error between the two approximations over all hyperrectangles $A \in \mathcal{A}^{\mathrm{re}}$. The following theorem provides a more explicit bound on $\rho^{MB}_{n,m}$, in contrast to its abstract version stated in Lemma 5 in the Appendix.

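THEOREM 2. For any sequence of constants $\delta_{n,m}$, assume we have the following conditions (a)–(e):

(a) There exists a universal constant $\delta_1 > 0$ such that $|\delta_{n,m}| < \delta_1$.

(b) There exists a universal constant $b > 0$ such that
$$\min_{1\le j\le p} E\bigl\{\bigl(S^X_{nj} - n^{1/2}\mu^X_j + \delta_{n,m}S^Y_{mj} - \delta_{n,m}m^{1/2}\mu^Y_j\bigr)^2\bigr\} \ge b.$$

(c) There exists a sequence of constants $B_{n,m} \ge 1$ such that
$$\max_{1\le j\le p}\sum_{i=1}^n E\bigl\{(X_{ij} - \mu^X_j)^4\bigr\}/n \le B^2_{n,m}, \qquad \max_{1\le j\le p}\sum_{i=1}^m E\bigl\{(Y_{ij} - \mu^Y_j)^4\bigr\}/m \le B^2_{n,m}.$$

(d) The sequence of constants $B_{n,m}$ defined in (c) also satisfies
$$\max_{1\le i\le n}\max_{1\le j\le p} E\bigl\{\exp\bigl(|X_{ij} - \mu^X_j|/B_{n,m}\bigr)\bigr\} \le 2, \qquad \max_{1\le i\le m}\max_{1\le j\le p} E\bigl\{\exp\bigl(|Y_{ij} - \mu^Y_j|/B_{n,m}\bigr)\bigr\} \le 2.$$

(e) There exists a sequence of constants $\alpha_{n,m} \in (0, e^{-1})$ such that
$$B^2_{n,m}\log^5(pn)\log^2(1/\alpha_{n,m})/n \le 1, \qquad B^2_{n,m}\log^5(pm)\log^2(1/\alpha_{n,m})/m \le 1.$$

Then there exists a universal constant $c_* > 0$ such that, with probability at least $1 - \gamma_{n,m}$, where
$$\gamma_{n,m} = (\alpha_{n,m})^{\log(pn)/3} + 3(\alpha_{n,m})^{\log^{1/2}(pn)/c_*} + (\alpha_{n,m})^{\log(pm)/3} + 3(\alpha_{n,m})^{\log^{1/2}(pm)/c_*} + (\alpha_{n,m})^{\log^3(pn)/6} + 3(\alpha_{n,m})^{\log^3(pn)/c_*} + (\alpha_{n,m})^{\log^3(pm)/6} + 3(\alpha_{n,m})^{\log^3(pm)/c_*},$$
we have the following property, where $\rho^{MB}_{n,m}$ is defined in (4):
$$\rho^{MB}_{n,m} \lesssim \bigl\{B^2_{n,m}\log^5(pn)\log^2(1/\alpha_{n,m})/n\bigr\}^{1/6} + \bigl\{B^2_{n,m}\log^5(pm)\log^2(1/\alpha_{n,m})/m\bigr\}^{1/6}.$$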

Conditions (a)–(c) pertain to the moment properties of the coordinates, condition (d) concerns the tail properties and condition (e) characterizes the order of $p$. These conditions share the desirable features of those in Theorem 1, such as allowing for uniformly diverging moments and tails. Moreover, by combining Theorem 2 with a two-sample Borel–Cantelli lemma (i.e., Lemma 6), where condition (f) is needed for Lemma 6, one can deduce Corollary 1 below, which facilitates the derivation of our main result in Theorem 3.

COROLLARY 1. For any sequence of constants $\delta_{n,m}$, assume the conditions (a)–(e) in Theorem 2 hold. Also suppose that the condition (f) holds as follows:

(f) The sequence of constants $\gamma_{n,m}$ defined in Theorem 2 also satisfies
$$\sum_{n}\sum_{m}\gamma_{n,m} < \infty.$$

Then with probability one, we have the following property, where $\rho^{MB}_{n,m}$ is defined in (4):
$$\rho^{MB}_{n,m} \lesssim \bigl\{B^2_{n,m}\log^5(pn)\log^2(1/\alpha_{n,m})/n\bigr\}^{1/6} + \bigl\{B^2_{n,m}\log^5(pm)\log^2(1/\alpha_{n,m})/m\bigr\}^{1/6}.$$

3. Two-sample mean test in high dimensions. In this section, based on the theoretical results from the preceding section, we first establish the main result, Theorem 3, which gives a confidence region for the mean difference $(\mu^X - \mu^Y)$ and, equivalently, the DCF test procedure. We note that the theoretical guarantee is uniform for all $\alpha \in (0, 1)$ with probability one.

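THEOREM 3. Assume we have the following conditions (a)–(e):

(a) $n/(n + m) \in (c_1, c_2)$, for some universal constants $0 < c_1 \le c_2 < 1$.

(b) There exists a universal constant $b > 0$ such that
$$\min_{1\le j\le p}\Bigl[E\bigl\{\bigl(S^X_{nj} - n^{1/2}\mu^X_j\bigr)^2\bigr\} + E\bigl\{\bigl(S^Y_{mj} - m^{1/2}\mu^Y_j\bigr)^2\bigr\}\Bigr] \ge b.$$

(c) There exists a sequence of constants $B_{n,m} \ge 1$ such that
$$\max_{1\le j\le p}\sum_{i=1}^n E\bigl(|X_{ij} - \mu^X_j|^{k+2}\bigr)/n \le B^k_{n,m}, \qquad \max_{1\le j\le p}\sum_{i=1}^m E\bigl(|Y_{ij} - \mu^Y_j|^{k+2}\bigr)/m \le B^k_{n,m},$$
for all $k = 1, 2$.

(d) The sequence of constants $B_{n,m}$ defined in (c) also satisfies
$$\max_{1\le i\le n}\max_{1\le j\le p} E\bigl\{\exp\bigl(|X_{ij} - \mu^X_j|/B_{n,m}\bigr)\bigr\} \le 2, \qquad \max_{1\le i\le m}\max_{1\le j\le p} E\bigl\{\exp\bigl(|Y_{ij} - \mu^Y_j|/B_{n,m}\bigr)\bigr\} \le 2.$$

(e) $B^2_{n,m}\log^7(pn)/n \to 0$ as $n \to \infty$.

Then with probability one, the Kolmogorov distance between the distributions of the quantity $\|S_n^X - n^{1/2}m^{-1/2}S_m^Y - n^{1/2}(\mu^X - \mu^Y)\|_\infty$ and the quantity $\|S_n^{eX} - n^{1/2}m^{-1/2}S_m^{eY}\|_\infty$ satisfies
$$\sup_{t\ge 0}\Bigl|P\bigl(\|S_n^X - n^{1/2}m^{-1/2}S_m^Y - n^{1/2}(\mu^X - \mu^Y)\|_\infty \le t\bigr) - P_e\bigl(\|S_n^{eX} - n^{1/2}m^{-1/2}S_m^{eY}\|_\infty \le t\bigr)\Bigr| \lesssim \bigl\{B^2_{n,m}\log^7(pn)/n\bigr\}^{1/6},$$
where $S_n^{eX}$ and $S_m^{eY}$ are as in (3), and $P_e(\cdot)$ means the probability with respect to $e^{n+m}$ only. Consequently,
$$\sup_{\alpha\in(0,1)}\Bigl|P\bigl(\|S_n^X - n^{1/2}m^{-1/2}S_m^Y - n^{1/2}(\mu^X - \mu^Y)\|_\infty \le c_B(\alpha)\bigr) - (1 - \alpha)\Bigr| \lesssim \bigl\{B^2_{n,m}\log^7(pn)/n\bigr\}^{1/6},$$
where
$$c_B(\alpha) = \inf\bigl\{t \in \mathbb{R} : P_e\bigl(\|S_n^{eX} - n^{1/2}m^{-1/2}S_m^{eY}\|_\infty \le t\bigr) \ge 1 - \alpha\bigr\} \tag{5}$$
for $\alpha \in (0, 1)$, where $S_n^{eX}$ and $S_m^{eY}$ are as in (3), and $P_e(\cdot)$ denotes the probability with respect to $e^{n+m}$ only.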

Note that condition (a) concerns the relative sample sizes and allows the ratio $n/(n + m)$ to vary without converging within any open interval $(c_1, c_2)$ for $0 < c_1 \le c_2 < 1$, rather than demanding convergence as in existing work. Conditions (b) and (c) concern the moment properties of the coordinates, while condition (d) is associated with the tail properties, and condition (e) quantifies the order of $p$. By inspection, these conditions are slightly stronger than those in Theorems 1 and 2, but still maintain all desired advantages. To appreciate such benefits, consider the following example:

$$n/(n + m) \in (0.1, 0.9), \quad B_{n,m} \sim n^{1/9}, \quad \log p \sim n^{\alpha}, \quad \alpha \in (0, 1/9),$$
$$X_1,\ldots,X_{\lfloor n/2\rfloor} \overset{\text{i.i.d.}}{\sim} N(0_p, \Sigma), \qquad X_{\lfloor n/2\rfloor+1},\ldots,X_n \overset{\text{i.i.d.}}{\sim} N(0_p, 2\Sigma),$$
$$Y_1,\ldots,Y_{\lfloor m/3\rfloor} \overset{\text{i.i.d.}}{\sim} N(1_p, 3\Sigma), \qquad Y_{\lfloor m/3\rfloor+1},\ldots,Y_m \overset{\text{i.i.d.}}{\sim} N(1_p, 4\Sigma),$$

where $1_p$ is the vector of ones, and the covariance matrix $\Sigma = (\Sigma_{jk}) \in \mathbb{R}^{p\times p}$ with each $\Sigma_{jk} = n^{2/27}\rho^{1\{j\neq k\}}$ for some constant $\rho \in (0, 1)$. Then one can verify that this example fulfills all conditions in Theorem 3, but violates the assumptions in most existing articles, which require i.i.d. samples or trace conditions [5].

From Theorem 3, the $100(1 - \alpha)\%$ confidence region for $(\mu^X - \mu^Y)$ can be expressed as

$$\mathrm{CR}_{1-\alpha} = \bigl\{\mu^X - \mu^Y : \|S_n^X - n^{1/2}m^{-1/2}S_m^Y - n^{1/2}(\mu^X - \mu^Y)\|_\infty \le c_B(\alpha)\bigr\}.$$

Equivalently, the proposed test procedure in (6) is such that we reject $H_0: \mu^X = \mu^Y$ at significance level $\alpha \in (0, 1)$ if

$$T_n = \|S_n^X - n^{1/2}m^{-1/2}S_m^Y\|_\infty \ge c_B(\alpha), \tag{6}$$

where the data-driven critical value $c_B(\alpha)$ in (5) admits fast computation via the multiplier bootstrap using an independent set of i.i.d. standard normal random variables, which is implemented as follows:

• Generate $N$ sets of standard normal random variables, each of size $(n + m)$, denoted by $e^{n+m}_1,\ldots,e^{n+m}_N$, as random copies of $e^{n+m} = \{e_1,\ldots,e_{n+m}\}$. Then calculate $N$ times the quantity $T_n^e = \|S_n^{eX} - n^{1/2}m^{-1/2}S_m^{eY}\|_\infty$ while keeping $X^n$ and $Y^m$ fixed, where $S_n^{eX}$ and $S_m^{eY}$ are as in (3). These values are denoted as $\{T^e_{n1},\ldots,T^e_{nN}\}$, whose $100(1 - \alpha)$th percentile is used to approximate $c_B(\alpha)$.

It is easy to see that the computation of the DCF test is of the order $O\{n(p + N)\}$, compared with the $O(Nnp)$ that is usually demanded by a general resampling method.
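The bootstrap steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `dcf_test` and its arguments are hypothetical, the rows of `X` and `Y` are taken to be the observations, and the $N$ bootstrap statistics are formed with plain matrix products, without attempting to reproduce the operation count quoted in the text.

```python
import numpy as np

def dcf_test(X, Y, alpha=0.05, N=1000, rng=None):
    """Hypothetical helper: statistic (6) and a bootstrap estimate of c_B(alpha) in (5)."""
    rng = np.random.default_rng(rng)
    n, m = X.shape[0], Y.shape[0]
    scale = np.sqrt(n / m)                      # n^{1/2} m^{-1/2}

    # Test statistic T_n = || S_n^X - n^{1/2} m^{-1/2} S_m^Y ||_inf
    S_X = X.sum(axis=0) / np.sqrt(n)
    S_Y = Y.sum(axis=0) / np.sqrt(m)
    T = np.max(np.abs(S_X - scale * S_Y))

    # Centered data used by the multiplier sums in (3)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Each row of e is one draw of e^{n+m}; the data stay fixed
    e = rng.standard_normal((N, n + m))
    SeX = e[:, :n] @ Xc / np.sqrt(n)            # N x p
    SeY = e[:, n:] @ Yc / np.sqrt(m)            # N x p
    T_boot = np.max(np.abs(SeX - scale * SeY), axis=1)

    # Empirical 100(1 - alpha)th percentile approximates c_B(alpha)
    c = np.quantile(T_boot, 1.0 - alpha)
    return T, c, bool(T >= c)
```

A call such as `T, c, reject = dcf_test(X, Y, alpha=0.05, N=1000, rng=0)` returns the statistic, the estimated critical value and the decision; fixing `rng` only makes the bootstrap draws reproducible.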

According to (6), the true power function for the test can be formulated as

$$\mathrm{Power}(\mu^X - \mu^Y) = P\bigl(\|S_n^X - n^{1/2}m^{-1/2}S_m^Y\|_\infty \ge c_B(\alpha) \mid \mu^X - \mu^Y\bigr). \tag{7}$$

To quantify the power of the DCF test, the expression (7) is not directly applicable since the distribution of $(S_n^X - n^{1/2}m^{-1/2}S_m^Y)$ is unknown. Motivated by Theorem 3, we propose another multiplier bootstrap approximation for $\mathrm{Power}(\mu^X - \mu^Y)$, based on a different set of standard normal random variables $e^{*n+m} = \{e^*_1,\ldots,e^*_{n+m}\}$ independent of the set $e^{n+m}$ that is used to calculate $c_B(\alpha)$,

$$\mathrm{Power}^*(\mu^X - \mu^Y) = P_{e^*}\bigl(\|S_n^{e^*X} - n^{1/2}m^{-1/2}S_m^{e^*Y} + n^{1/2}(\mu^X - \mu^Y)\|_\infty \ge c_B(\alpha)\bigr), \tag{8}$$

where $S_n^{e^*X}$ and $S_m^{e^*Y}$ are as defined in (3) with $e^{*n+m}$ instead of $e^{n+m}$, and $P_{e^*}(\cdot)$ means the probability with respect to $e^{*n+m}$ only. The following theorem is devoted to establishing the asymptotic equivalence between $\mathrm{Power}(\mu^X - \mu^Y)$ and $\mathrm{Power}^*(\mu^X - \mu^Y)$ under the same conditions as those in Theorem 3.
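The approximation in (8) can be evaluated by Monte Carlo in much the same way as the critical value. The sketch below is illustrative only: `power_star`, `delta` (a hypothesized value of the mean difference $\mu^X - \mu^Y$) and `c_alpha` (a previously computed critical value) are hypothetical names, and a fresh set of multipliers is drawn inside the function to play the role of $e^{*n+m}$.

```python
import numpy as np

def power_star(X, Y, delta, c_alpha, N=1000, rng=None):
    """Hypothetical helper: Monte Carlo estimate of Power* in (8)."""
    rng = np.random.default_rng(rng)
    n, m = X.shape[0], Y.shape[0]
    scale = np.sqrt(n / m)                      # n^{1/2} m^{-1/2}

    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Fresh multipliers e*, drawn independently of those used for c_alpha
    e_star = rng.standard_normal((N, n + m))
    SeX = e_star[:, :n] @ Xc / np.sqrt(n)
    SeY = e_star[:, n:] @ Yc / np.sqrt(m)

    # Shift by n^{1/2} (mu^X - mu^Y) and compare the sup-norm with c_alpha
    shifted = SeX - scale * SeY + np.sqrt(n) * np.asarray(delta, dtype=float)
    return float(np.mean(np.max(np.abs(shifted), axis=1) >= c_alpha))
```

Evaluating `power_star` over a grid of `delta` vectors gives an estimated power curve without generating new data sets.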

THEOREM 4. Assume the conditions (a)–(e) in Theorem 3 hold. Then for any $\mu^X - \mu^Y \in \mathbb{R}^p$, we have with probability one,
$$\bigl|\mathrm{Power}^*(\mu^X - \mu^Y) - \mathrm{Power}(\mu^X - \mu^Y)\bigr| \lesssim \bigl\{B^2_{n,m}\log^7(pn)/n\bigr\}^{1/6}.$$

By inspection of the conditions in Theorem 4, it is worth mentioning that neither sparsity nor correlation restrictions are required, as opposed to previous work requiring sparsity, for instance [3]. To appreciate this point, the asymptotic power under fairly general alternatives specified by condition (f) is analyzed in the theorem below.

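THEOREM 5. Assume the conditions (a)–(e) in Theorem 3 and that

(f) $\mathcal{F}_{n,m,p} = \bigl\{\mu^X \in \mathbb{R}^p, \mu^Y \in \mathbb{R}^p : \|\mu^X - \mu^Y\|_\infty \ge K_s\{B_{n,m}\log(pn)/n\}^{1/2}\bigr\}$, for a sufficiently large universal constant $K_s > 0$.

Then for any $\mu^X - \mu^Y \in \mathcal{F}_{n,m,p}$, we have with probability tending to one,
$$\mathrm{Power}^*(\mu^X - \mu^Y) \to 1 \quad \text{as } n \to \infty.$$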

The set $\mathcal{F}_{n,m,p}$ in (f) imposes a lower bound on the separation between $\mu^X$ and $\mu^Y$, which is comparable to the assumption $\max_i|\delta_i/\sigma^{1/2}_{i,i}| \ge \{2\beta\log(p)/n\}^{1/2}$ in Theorem 2 in [3]. The latter is in fact a special case of condition (f) when the sequence $B_{n,m}$ is constant. It is worth mentioning that the asymptotic power converges to 1 under neither sparsity nor correlation assumptions in the context of our theorem. In contrast, Theorem 2 in [3] requires not only sparse alternatives, but also restrictions on the correlation structure, for example, condition 1 in that theorem, which requires the eigenvalues of the correlation matrix $\operatorname{diag}(\Sigma)^{-1/2}\,\Sigma\,\operatorname{diag}(\Sigma)^{-1/2}$ to be lower bounded by a positive universal constant. These comparisons reveal that the proposed DCF test is powerful for a broader range of alternatives. We conclude this section by noting that the theory for a DCF-type test based on the $L_2$-norm can also be of interest but is not yet established, and needs further investigation.

4. Simulation studies. In the two-sample test for high-dimensional means, methods that are frequently used and/or recently proposed include those of [5] (abbreviated as CQ, an $L_2$ norm test), [3] (abbreviated as CL, an $L_\infty$ norm test) and [21] (abbreviated as XL, a test combining $L_2$ and $L_\infty$ norms). We conduct comprehensive simulation studies to compare our DCF test with these existing methods in terms of size and power under various settings. The two samples $X^n = \{X_i\}_{i=1}^n$ and $Y^m = \{Y_i\}_{i=1}^m$ have sizes $(n, m)$, while the data dimension is chosen to be $p = 1000$. Without loss of generality, we let $\mu^X = 0 \in \mathbb{R}^p$. The structure of $\mu^Y \in \mathbb{R}^p$ is controlled by a signal strength parameter $\delta > 0$ and a sparsity level parameter $\beta \in [0, 1]$. To construct $\mu^Y$, in each scenario, we first generate a sequence of i.i.d. random variables $\theta_k \sim U(-\delta, \delta)$ for $k = 1,\ldots,p$ and keep them fixed in the simulation under that scenario. We set $\delta(r) = \{2r\log(p)/(n \vee m)\}^{1/2}$, which gives an appropriate scale of signal strength [3, 5, 28]. We take $\mu^Y = (\theta_1,\ldots,\theta_{\lfloor\beta p\rfloor}, 0'_{p-\lfloor\beta p\rfloor})' \in \mathbb{R}^p$, where $\lfloor a\rfloor$ denotes the largest integer no more than $a$, and $0_q$ is the $q$-dimensional vector of 0's. Thus the signal becomes sparser for a smaller value of $\beta$, with $\beta = 0$ corresponding to the null hypothesis and $\beta = 1$ representing the fully dense alternative. The covariance matrices of the random vectors are denoted by $\operatorname{cov}(X_i) = \Sigma^{X_i}$ and $\operatorname{cov}(Y_{i'}) = \Sigma^{Y_{i'}}$ for all $i = 1,\ldots,n$, $i' = 1,\ldots,m$. The nominal significance level is $\alpha = 0.05$, and the DCF test is conducted based on the multiplier bootstrap of size $N = 10^4$.

To have a comprehensive comparison, we first consider the following six different settings. The first setting is standard with $(n, m, p) = (200, 300, 1000)$, where the elements in each sample are i.i.d. Gaussian, and the two samples share a common covariance matrix $\Sigma = (\Sigma_{jk})_{1\le j,k\le p}$. The matrix $\Sigma$ is specified by a dependence structure such that $\Sigma_{jk} = (1 + |j - k|)^{-1/4}$. Beginning with $\delta = 0.1$, where the implicitly chosen value $r = 0.217$ corresponds to a quite weak signal according to [3, 28], we calculate the rejection proportions of the four tests based on 1000 Monte Carlo runs over a full range of sparsity levels from $\beta = 0$ (corresponding to the null hypothesis) to $\beta = 1$ (corresponding to the fully dense alternative). Then the signals are gradually strengthened to $\delta = 0.15, 0.2, 0.25, 0.3$. The second setting is similar to the first, except for $\Sigma^{X_i} = 2\Sigma^{Y_{i'}} = 2\Sigma$ for all $i = 1,\ldots,n$, $i' = 1,\ldots,m$, where $\Sigma$ is defined in the first setting. These two settings are denoted by "i.i.d. equal (resp., unequal) covariance setting."
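The signal construction and the covariance of the first setting are simple enough to write out, which also makes it easy to check that $r = 0.217$ indeed gives $\delta \approx 0.1$ when $p = 1000$ and $n \vee m = 300$. The helper names below (`delta_of_r`, `make_mu_y`, `make_sigma`) are hypothetical and only illustrate the formulas stated in the text.

```python
import numpy as np

def delta_of_r(r, p, n, m):
    """Signal scale delta(r) = {2 r log(p) / (n v m)}^{1/2}."""
    return np.sqrt(2.0 * r * np.log(p) / max(n, m))

def make_mu_y(delta, beta, p, rng=None):
    """mu^Y: first floor(beta * p) entries are theta_k ~ U(-delta, delta), the rest are 0."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(-delta, delta, size=p)  # generated once, then kept fixed
    k = int(np.floor(beta * p))
    mu = np.zeros(p)
    mu[:k] = theta[:k]
    return mu

def make_sigma(p):
    """Covariance of the first setting: Sigma_jk = (1 + |j - k|)^(-1/4)."""
    idx = np.arange(p)
    return (1.0 + np.abs(idx[:, None] - idx[None, :])) ** (-0.25)

# r = 0.217 with (n, m, p) = (200, 300, 1000) gives a value close to 0.1
print(delta_of_r(0.217, p=1000, n=200, m=300))
```

Setting `beta = 0` returns the zero vector (the null hypothesis), while `beta = 1` makes every coordinate a nonzero draw (the fully dense alternative).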

In the third setting, the random vectors in each sample have completely different distributions and covariance matrices from one another. The procedure to generate the two samples is as follows. First, a set of parameters $\{\phi_{ij} : i = 1,\ldots,m, j = 1,\ldots,p\}$ are generated from the uniform distribution $U(1, 2)$ independently, and are kept fixed for all Monte Carlo runs. In a similar fashion, $\{\phi^*_{ij} : i = 1,\ldots,m, j = 1,\ldots,p\}$ are generated from $U(1, 3)$ independently. Then, for every $i = 1,\ldots,n$, we define a $p \times p$ matrix $\Omega_i = (\omega_{ijk})_{1\le j,k\le p}$ with each $\omega_{ijk} = (\phi_{ij}\phi_{ik})^{1/2}(1 + |j - k|)^{-1/4}$. Likewise, for every $i = 1,\ldots,m$, define a $p \times p$ matrix $\Omega^*_i = (\omega^*_{ijk})_{1\le j,k\le p}$ with each $\omega^*_{ijk} = (\phi^*_{ij}\phi^*_{ik})^{1/2}(1 + |j - k|)^{-1/4}$. Subsequently, we generate a set of i.i.d. random vectors $\tilde X^n = \{\tilde X_i\}_{i=1}^n$ with each $\tilde X_i = (\tilde X_{i1},\ldots,\tilde X_{ip})' \in \mathbb{R}^p$, such that $\{\tilde X_{i1},\ldots,\tilde X_{i,\lfloor 2p/5\rfloor}\}$ are i.i.d. standard normal random variables, $\{\tilde X_{i,\lfloor 2p/5\rfloor+1},\ldots,\tilde X_{i,p}\}$ are i.i.d. centered Gamma(16, 1/4) random variables, and they are independent of each other. Accordingly, we construct each $X_i$ by letting $X_i = \mu^X + \Omega_i^{1/2}\tilde X_i$ for all $i = 1,\ldots,n$. It is worth noting that $\Sigma^{X_i} = \Omega_i$ for all $i = 1,\ldots,n$, that is, the $X_i$'s have different covariance matrices and distributions. The other sample $Y^m = \{Y_i\}_{i=1}^m$ is constructed in the same way with $\Sigma^{Y_i} = \Omega^*_i$ for all $i = 1,\ldots,m$. Then we obtain the results for various signal strength levels of $\delta$ over a full range of sparsity levels of $\beta$, and we denote this setting as "completely relaxed."

The fourth setting is analogous to the third, except that we set $(n, m, p) = (100, 400, 1000)$, where the two sample sizes deviate substantially from each other. Since this setting is concerned with highly unequal sample sizes, it is denoted as "completely relaxed and highly unequal setting." The fifth setting is similar to the third, except that we replace the standard normal innovations in $\tilde X^n$ and $\tilde Y^m$ by independent and heavy-tailed innovations $(5/3)^{-1/2}\,t(5)$ with mean zero and unit variance, referred to as "completely relaxed and heavy-tailed setting." The sixth setting is also analogous to the third, while independent and skewed innovations $8^{-1/2}\{\chi^2(4) - 4\}$ with mean zero and unit variance are used, denoted by "completely relaxed and skewed setting."
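The scalings attached to the innovations in the last four settings are chosen so that each has mean zero and unit variance, which a quick numerical check makes concrete. The snippet below is only a sanity check, not part of the simulation design; it assumes that Gamma(16, 1/4) is parameterized by shape 16 and scale 1/4 (mean 4, variance 1), which is the reading consistent with the other unit-variance innovations.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 200_000

innovations = {
    # standard normal innovations
    "normal": rng.standard_normal(size),
    # centered Gamma, assuming shape 16 and scale 1/4 (mean 4, variance 1)
    "gamma": rng.gamma(shape=16.0, scale=0.25, size=size) - 4.0,
    # heavy-tailed: Var{t(5)} = 5/3, so divide by sqrt(5/3)
    "t5": rng.standard_t(df=5, size=size) / np.sqrt(5.0 / 3.0),
    # skewed: E{chi2(4)} = 4 and Var{chi2(4)} = 8
    "chi2": (rng.chisquare(df=4, size=size) - 4.0) / np.sqrt(8.0),
}

for name, z in innovations.items():
    # Sample mean and variance should be close to 0 and 1, respectively
    print(name, float(z.mean()), float(z.var()))
```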

We conduct the four tests and calculate the rejection proportions to assess the empirical power at different signal levels $\delta$ and sparsity levels $\beta$ in each setting as described above, based on 1000 Monte Carlo runs. The numerical results of these six settings are shown in Tables 1–2. For visualization, we depict the empirical power plots of all settings in Figure 1. We also display the multiplier bootstrap approximation based on another independent set of size $N = 10^4$, which agrees well with the empirical size/power of the DCF test and justifies the theoretical assessment in Theorem 4. We see that the empirical sizes of the proposed DCF test agree well with the nominal level 0.05 in all six settings. By comparison, the CQ test is not as stable, and the CL and XL tests show underestimation of type I error in all settings.

Regarding power performance under alternatives in these six settings, despite all tests suffering low power for the weak signals $\delta = 0.1$ and $\delta = 0.15$, the DCF test still dominates the other tests at all levels of $\beta$. When the signal strength rises to $\delta = 0.2$, the results in Setting I indicate that the DCF test outperforms the other tests, except for the CQ test when $\beta \ge 80\%$ (a very dense alternative). Although the power of the CQ test increases above that of the DCF test at $\beta = 80\%$, the gains are not substantial when both tests have high power. Similar patterns are observed in Settings II, III, V, VI with $\delta = 0.25$ for $\beta$ ranging between 80% and 83%, and Settings III, IV with $\delta = 0.3$ for $\beta$ at 80% and 90%, respectively. This phenomenon is visually shown in the power plot in Figure 1. It is also noted that the DCF test dominates the CL ($L_\infty$ type) and XL (combined type) tests uniformly in these settings over all levels of $\delta$ and $\beta$.

To summarize, except for the rapidly increased power of the CQ test in very dense alternatives, the DCF test outperforms the other tests over various signal levels of $\delta$ in a broad range of sparsity levels $\beta$, for alternatives with varied magnitudes and signs. Moreover, the gains are sustained in situations where the data structures get more complex, for example, highly unbalanced sizes, heavy-tailed or skewed distributions.

We further examine alternatives with a common/fixed signal upon a reviewer’s request
under the “completely relaxed setting,” denoted by Setting VII, where we let
$\mu^{Y} = \delta(\underbrace{1,\ldots,1}_{\beta p}, 0_{p-\beta p}^{\top})^{\top}$

. Note that the empirical sizes of the four tests in Setting VII are the
same as those in Setting III (thus not reported), while the power patterns appear to favor
the CQ test as the alternatives become denser (DCF still dominates in the range of less
dense levels). Here, numerical power values are not tabulated for conciseness, given that the
visualization in Figure 1 suffices. We conclude this section by pointing out that, compared
to Settings I–VI in which the nonzero signals follow U(−δ, δ), the alternatives in Setting VII with
a common/fixed signal are more restrictive and easily violated in practice.
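The two kinds of alternatives contrasted above (signals drawn from U(−δ, δ) versus a common signal δ) can be generated as follows. This is a minimal sketch: the helper name `make_mean_shift`, the placement of the nonzero entries in the leading coordinates and the rounding of βp down to an integer are assumptions made for illustration.

```python
import numpy as np

def make_mean_shift(p, beta, delta, common=False, rng=None):
    """Return a length-p mean vector whose first floor(beta * p) entries are nonzero."""
    rng = np.random.default_rng() if rng is None else rng
    # Number of nonzero coordinates; the small offset guards against float error.
    k = int(np.floor(beta * p + 1e-9))
    mu = np.zeros(p)
    if common:
        mu[:k] = delta  # common/fixed signal
    else:
        mu[:k] = rng.uniform(-delta, delta, size=k)  # varied magnitudes and signs
    return mu

mu_common = make_mean_shift(1000, 0.02, 0.2, common=True)
mu_random = make_mean_shift(1000, 0.4, 0.3, rng=np.random.default_rng(0))
```

Here β = 0 returns the zero vector (the null hypothesis) and β = 1 fills every coordinate (the fully dense alternative).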

5. Real data example. We analyze a dataset obtained from the UCI Machine Learning
Repository, https://archive.ics.uci.edu/ml/datasets/eeg+database. The data consist of 122
individuals, of whom n = 45 participants belong to the control group, while the remaining
m = 77 are in the alcoholic group. In the experiment, each subject was exposed to a single
stimulus (e.g., a picture of an object) selected from the 1980 Snodgrass and Vanderwart picture
set. Then, for each individual, the researchers recorded EEG measurements
sampled at 256 Hz (3.9-msec epoch) for one second from 64 electrodes on that person’s
scalp. As a common practice of data reduction, for each electrode, we pool the
256 records to form 64 measurements by taking the average of the original records on four
proximal grid points. Likewise, we also pool the 64 electrodes by taking the average on
every four proximal electrodes, resulting in 16 combined electrodes. For the control group, we let


TABLE 1
Rejection proportions (%) calculated for four testing methods at different signal strength levels of δ and sparsity levels of β based on 1000 Monte Carlo runs, where β = 0 corresponds to the null hypothesis and β = 1 to the fully dense alternative, and (n, m, p) = (200, 300, 1000)

Setting I: i.i.d. equal cov
          δ = 0.1             | δ = 0.15            | δ = 0.2             | δ = 0.25            | δ = 0.3
Test      DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ
β = 0     4.20 2.40 3.90 5.80 | 4.30 2.30 2.40 3.60 | 4.50 2.80 3.70 6.00 | 4.60 2.70 2.20 3.80 | 5.00 3.10 3.80 6.10
β = 0.02  5.00 3.20 2.50 3.40 | 7.50 4.80 3.70 3.50 | 15.4 10.5 6.50 3.90 | 31.7 23.3 14.6 4.40 | 59.0 47.9 32.6 4.90
β = 0.04  5.80 3.70 2.80 3.60 | 10.0 6.20 4.30 3.90 | 20.6 14.2 8.80 4.70 | 40.6 30.8 20.0 5.10 | 72.0 58.9 41.5 5.30
β = 0.2   9.90 6.50 3.90 4.50 | 22.7 15.9 9.10 5.30 | 48.7 37.3 23.7 7.40 | 84.5 72.4 52.0 11.6 | 99.3 97.1 87.2 23.4
β = 0.4   13.9 9.40 5.30 5.20 | 35.3 25.4 14.4 7.80 | 68.8 57.1 37.9 16.5 | 96.8 91.1 72.7 42.5 | 100  100  97.7 96.9
β = 0.6   17.8 11.8 6.70 5.60 | 45.8 33.7 20.3 12.8 | 82.7 71.8 51.1 39.9 | 99.6 97.2 86.8 99.1 | 100  100  100  100
β = 0.8   22.4 13.8 9.00 8.30 | 55.5 40.1 24.4 23.1 | 91.3 81.7 61.5 91.7 | 100  99.2 95.7 100  | 100  100  100  100
β = 1     26.5 17.9 10.9 10.7 | 64.5 48.1 30.6 39.5 | 95.0 88.5 70.1 100  | 100  99.6 100  100  | 100  100  100  100

Setting II: i.i.d. unequal cov
          δ = 0.1             | δ = 0.15            | δ = 0.2             | δ = 0.25            | δ = 0.3
Test      DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ
β = 0     4.90 1.80 3.70 6.10 | 5.20 1.30 2.20 3.80 | 5.00 1.60 3.60 6.00 | 4.80 1.20 3.50 6.30 | 5.00 1.90 3.90 6.20
β = 0.02  4.70 1.00 2.40 3.80 | 6.60 1.40 2.70 4.10 | 10.7 2.60 2.90 4.10 | 19.1 6.70 4.80 4.40 | 33.3 14.4 8.80 4.50
β = 0.04  5.80 1.30 2.50 4.10 | 7.90 1.80 2.80 4.30 | 12.5 3.50 3.40 4.50 | 24.7 9.30 6.00 4.60 | 42.5 20.3 12.2 5.00
β = 0.2   8.10 1.90 2.70 4.60 | 15.0 4.40 3.80 4.90 | 30.9 11.2 7.20 6.40 | 57.6 26.5 16.3 8.40 | 86.8 52.1 33.9 11.8
β = 0.4   10.6 2.80 3.10 5.70 | 22.4 7.20 5.70 6.50 | 47.3 19.6 11.6 10.0 | 78.7 43.2 26.6 19.1 | 97.5 74.1 53.2 45.7
β = 0.6   13.5 3.30 3.80 6.70 | 29.2 9.60 6.70 8.40 | 59.0 26.5 17.1 18.7 | 90.5 56.2 36.7 54.4 | 99.8 88.1 70.1 99.6
β = 0.8   16.4 4.60 4.50 7.40 | 37.4 11.9 8.60 12.6 | 70.9 32.9 21.4 39.6 | 95.6 67.0 47.0 98.9 | 100  94.2 90.5 100
β = 1     19.2 5.20 5.00 8.10 | 43.5 14.4 10.7 18.3 | 79.4 39.9 28.1 79.8 | 98.2 76.2 67.8 100  | 100  97.7 99.9 100

Setting III: completely relaxed
          δ = 0.1             | δ = 0.15            | δ = 0.2             | δ = 0.25            | δ = 0.3
Test      DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ
β = 0     4.70 2.00 3.90 6.30 | 4.50 1.70 2.30 3.50 | 4.80 1.90 3.70 6.10 | 4.60 2.20 2.80 3.90 | 5.10 2.10 3.80 6.20
β = 0.02  4.90 2.10 3.20 4.40 | 6.50 2.70 3.50 5.30 | 9.40 4.30 4.00 5.60 | 13.6 7.80 6.20 5.70 | 24.9 12.9 10.1 5.90
β = 0.04  5.60 2.40 3.50 4.70 | 7.60 3.40 4.20 5.40 | 12.1 6.00 5.00 5.80 | 19.1 10.8 8.80 6.00 | 32.8 19.1 13.8 6.50
β = 0.2   7.50 3.80 4.30 5.80 | 12.1 6.00 5.60 6.60 | 23.9 12.5 8.90 7.50 | 44.2 26.3 16.6 9.30 | 71.6 50.2 32.1 14.1
β = 0.4   9.40 3.90 4.50 6.30 | 18.4 9.00 8.00 7.60 | 35.8 19.9 12.7 11.7 | 62.3 40.8 26.4 18.5 | 89.3 69.9 48.6 31.5
β = 0.6   11.5 4.90 6.20 6.80 | 24.0 10.8 8.90 9.50 | 48.0 28.2 18.2 17.8 | 76.8 55.3 37.0 35.7 | 96.5 83.8 64.6 83.1
β = 0.8   13.6 6.40 6.60 7.00 | 30.3 13.5 11.7 12.7 | 57.3 36.4 23.4 28.5 | 86.7 65.0 45.1 81.2 | 98.5 91.6 77.4 100
β = 0.83  14.3 7.10 6.80 7.50 | 31.0 14.6 11.8 13.1 | 58.0 37.6 23.9 30.8 | 87.6 66.1 46.1 88.0 | 98.9 92.6 79.2 100
β = 1     16.6 8.50 7.40 8.00 | 35.0 17.2 13.9 17.3 | 65.6 42.8 28.3 48.2 | 90.8 75.7 56.0 99.9 | 99.2 95.5 95.7 100


TABLE 2
Rejection proportions (%) calculated for four testing methods at different signal strength levels of δ and sparsity levels of β based on 1000 Monte Carlo runs, where β = 0 corresponds to the null hypothesis and β = 1 to the fully dense alternative, (n, m, p) = (100, 400, 1000) for Setting IV, and (n, m, p) = (200, 300, 1000) for Settings V and VI

Setting IV: completely relaxed and highly unequal sample sizes
          δ = 0.1                 | δ = 0.15                | δ = 0.2                 | δ = 0.25                | δ = 0.3
Test      DCF   CL    XL    CQ    | DCF   CL    XL    CQ    | DCF   CL    XL    CQ    | DCF   CL    XL    CQ    | DCF   CL    XL    CQ
β = 0     4.70  0.800 3.90  6.80  | 4.90  0.900 3.80  6.30  | 5.20  0.700 3.90  6.10  | 4.50  0.600 3.50  6.00  | 4.90  0.500 3.40  6.10
β = 0.02  5.20  1.10  2.90  4.70  | 5.90  1.00  3.60  5.60  | 6.70  1.40  4.60  5.80  | 8.90  2.40  5.00  5.80  | 13.2  4.20  6.20  5.90
β = 0.04  5.40  1.20  3.00  4.80  | 6.30  1.30  4.50  5.70  | 7.80  1.90  5.00  6.00  | 11.2  3.30  5.60  6.10  | 17.6  5.70  7.10  6.20
β = 0.2   6.60  1.30  3.30  5.40  | 9.20  2.20  5.10  5.80  | 14.9  3.90  5.70  6.20  | 25.3  8.70  7.00  7.50  | 42.8  16.5  11.8  8.80
β = 0.4   7.80  2.00  4.30  5.50  | 12.4  3.40  5.20  6.10  | 22.3  6.60  7.10  8.60  | 38.2  13.0  9.70  10.7  | 61.3  24.8  17.0  15.8
β = 0.6   9.10  2.40  4.60  5.80  | 16.1  3.80  5.50  7.90  | 29.5  10.0  9.20  10.8  | 49.9  19.3  14.3  17.6  | 75.3  33.7  21.9  34.2
β = 0.8   10.5  2.50  4.70  6.10  | 19.9  5.20  6.70  9.20  | 36.9  12.7  10.9  14.5  | 60.1  24.0  19.3  32.2  | 84.9  46.6  33.6  78.2
β = 0.9   11.3  2.80  4.80  6.40  | 21.9  5.40  7.10  9.90  | 39.5  13.3  12.6  17.7  | 64.6  26.6  21.6  43.8  | 88.0  48.6  35.3  94.0
β = 1     12.1  2.90  5.30  7.30  | 23.4  5.90  7.30  11.0  | 42.0  14.6  12.8  21.7  | 68.6  29.6  24.5  59.0  | 90.9  53.1  41.9  99.4

Setting V: completely relaxed and heavy-tailed
          δ = 0.1             | δ = 0.15            | δ = 0.2             | δ = 0.25            | δ = 0.3
Test      DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ
β = 0     4.20 2.20 3.80 6.20 | 5.20 2.50 3.90 6.10 | 4.70 1.90 2.90 6.00 | 4.30 2.00 1.70 3.90 | 4.50 2.30 2.00 3.70
β = 0.02  5.50 2.10 3.70 5.40 | 6.40 2.50 3.90 5.50 | 9.50 4.40 4.60 6.10 | 15.3 7.40 6.30 6.10 | 25.5 15.0 10.3 6.20
β = 0.04  6.20 2.30 3.80 5.50 | 7.20 3.60 4.20 6.00 | 12.6 6.60 5.80 6.20 | 18.9 9.80 7.00 6.50 | 33.3 20.7 13.0 7.10
β = 0.2   7.50 3.60 4.00 5.80 | 12.4 6.80 6.50 7.30 | 23.5 13.0 9.60 8.90 | 45.6 27.6 17.9 11.3 | 71.7 52.6 33.8 14.1
β = 0.4   9.50 4.20 4.40 5.90 | 18.1 9.00 8.30 8.90 | 35.9 21.3 14.0 12.7 | 64.4 43.2 26.9 18.5 | 90.3 73.4 52.0 33.7
β = 0.6   11.5 5.10 4.50 6.00 | 23.8 12.6 10.1 11.7 | 46.7 29.2 19.4 17.8 | 77.5 55.9 37.4 38.9 | 97.4 86.5 65.6 88.2
β = 0.8   13.7 7.30 6.20 8.80 | 29.4 16.0 12.3 14.1 | 56.5 36.9 24.9 28.9 | 87.4 69.1 48.3 81.4 | 99.2 93.6 80.0 100
β = 0.83  14.1 7.50 6.30 9.20 | 30.6 17.3 13.0 15.2 | 58.1 38.1 26.0 32.0 | 88.1 70.1 49.5 87.5 | 99.3 94.1 82.1 100
β = 1     16.1 8.90 7.40 9.40 | 34.9 18.9 15.0 17.2 | 64.5 44.6 30.5 52.2 | 91.6 75.1 56.6 99.8 | 99.7 96.5 96.0 100

Setting VI: completely relaxed and skewed
          δ = 0.1             | δ = 0.15            | δ = 0.2             | δ = 0.25            | δ = 0.3
Test      DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ   | DCF  CL   XL   CQ
β = 0     4.20 2.10 2.40 3.60 | 4.90 1.40 2.70 3.80 | 5.00 1.60 2.50 3.90 | 4.90 2.40 3.70 5.80 | 4.70 1.90 2.70 3.90
β = 0.02  4.80 1.30 2.70 4.40 | 6.20 1.70 3.10 4.70 | 7.50 2.70 3.80 4.90 | 12.9 5.80 5.00 5.00 | 24.3 11.8 8.30 5.00
β = 0.04  5.30 1.40 3.00 4.60 | 7.00 2.30 3.30 4.90 | 11.3 5.20 4.50 5.10 | 17.1 8.70 7.00 5.10 | 32.2 17.3 12.0 5.30
β = 0.2   7.40 3.00 3.30 4.80 | 12.8 5.80 5.00 5.80 | 23.0 12.9 9.20 6.40 | 42.4 25.6 17.7 8.40 | 71.3 48.6 32.5 12.4
β = 0.4   9.40 4.50 4.00 5.10 | 18.7 9.30 6.80 7.20 | 37.3 21.9 13.4 10.6 | 62.9 43.3 28.6 17.3 | 89.4 70.9 51.8 30.7
β = 0.6   11.5 5.70 4.50 6.20 | 24.7 12.3 9.60 9.50 | 48.1 29.8 18.1 16.5 | 75.7 55.0 37.6 34.8 | 95.9 83.7 64.5 86.4
β = 0.8   14.2 6.30 5.80 6.60 | 30.5 14.9 10.5 12.5 | 58.0 37.6 23.4 27.1 | 86.7 65.4 44.9 80.2 | 98.7 92.0 77.5 100
β = 0.83  14.3 7.50 6.30 6.70 | 31.6 15.3 10.8 13.2 | 60.1 39.3 24.2 29.8 | 87.9 66.5 46.2 87.4 | 98.9 92.8 81.0 100
β = 1     16.3 8.90 6.70 7.40 | 35.9 19.3 14.6 16.4 | 67.0 44.7 29.4 49.3 | 91.0 74.6 57.2 99.9 | 99.3 96.1 97.2 100


FIG. 1. Shown are the bootstrap-approximated power curve of the DCF test (crosses) and the empirical power
curves of four methods: the DCF test (squares), the CL test (downward-pointing triangles), the XL test (circles) and the
CQ test (upward-pointing triangles), based on 1000 Monte Carlo runs under Settings I–VII across different signal levels
of δ and sparsity levels of β.

$\mu_{c,j} = (\mu_{c,j,1}, \ldots, \mu_{c,j,64})^{\top} \in \mathbb{R}^{64}$ be the common mean vector of the EEG measurements
on the j-th electrode for j = 1, ..., 16. For convenience, we write
$\mu^{X} = (\mu_{c,1}^{\top}, \ldots, \mu_{c,16}^{\top})^{\top} \in \mathbb{R}^{p}$
with p = 64 × 16 = 1024, which is much larger than n and m. Similarly, for the alcoholic group,
let $\mu_{a,j} = (\mu_{a,j,1}, \ldots, \mu_{a,j,64})^{\top} \in \mathbb{R}^{64}$ be the common mean vector of EEG measurements on
the j-th electrode for j = 1, ..., 16, and denote
$\mu^{Y} = (\mu_{a,1}^{\top}, \ldots, \mu_{a,16}^{\top})^{\top} \in \mathbb{R}^{p}$. We are interested
in the hypothesis test

$H_0 : \mu^{X} = \mu^{Y}$ versus $H_a : \mu^{X} \neq \mu^{Y}$

to determine whether there is any difference in the means of EEG between the two groups. We first
carry out the DCF, CL, XL and CQ tests, whose p-values are 0.006, 0.1708, 0.093
and 0.0955, respectively, as shown in Table 3. In the literature, [13] provided evidence for a mean difference
between the two groups; the proposed DCF test indeed detected the difference with statistical
significance, while the other tests failed to do so.
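The data reduction described in this section (256 time points to 64, then 64 electrodes to 16, giving p = 1024) can be sketched with two reshape-and-average steps. The function name `pool_eeg` is hypothetical, and treating "proximal" grid points and electrodes as consecutive indices is an assumption; the actual electrode grouping is not specified here.

```python
import numpy as np

def pool_eeg(trial):
    """Reduce one (64 electrodes, 256 time points) recording to a length-1024 vector."""
    trial = np.asarray(trial, dtype=float)
    if trial.shape != (64, 256):
        raise ValueError("expected an array of shape (64, 256)")
    # Average every 4 consecutive time points: 256 -> 64 measurements per electrode.
    pooled_time = trial.reshape(64, 64, 4).mean(axis=2)
    # Average every 4 consecutive electrodes: 64 -> 16 combined electrodes.
    pooled = pooled_time.reshape(16, 4, 64).mean(axis=1)
    # Stack the 16 blocks of 64 values into one vector of dimension 1024.
    return pooled.reshape(-1)

example = np.arange(64 * 256, dtype=float).reshape(64, 256)
vector = pool_eeg(example)  # shape (1024,)
```

Applying this to every subject yields a 45 × 1024 matrix for the control group and a 77 × 1024 matrix for the alcoholic group, ordered electrode by electrode as in the stacked mean vectors above.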

For further verification, we carry out a random bootstrap with replacement separately within
each sample, and repeat it 500 times. The rejection proportions for the four tests over the


TABLE 3
Shown are the results of four tests based on the original dataset, the bootstrapped samples and the random permutations

p-values of the four tests based on the dataset
Test                   DCF    CL      XL     CQ
p-value                0.006  0.1708  0.093  0.0955

Rejection proportions (%) of the four tests over 500 bootstrapped datasets
Test                   DCF    CL      XL     CQ
Rejection proportion   82     65.8    65     58

Rejection proportions (%) of the four tests over 500 random permutations
Test                   DCF    CL      XL     CQ
Rejection proportion   4.6    1.8     3.4    7.4

500 bootstrapped datasets are given in Table 3, which shows that the highest rejection
proportion among the four tests is achieved by DCF at 82%. This is in line with the smallest
and significant p-value given by the DCF test based on the dataset itself. We also perform
500 random permutations of the whole dataset (i.e., mixing up the two groups, which eliminates the
group difference) and conduct the four tests on each permuted dataset. From Table 3, we see
that the rejection proportion of the DCF test (0.046) is close to the nominal level α = 0.05,
while those of the other tests differ considerably.
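The two resampling checks described above can be sketched as follows. This is a minimal illustration: `resampling_rejection_rates` is a hypothetical helper that accepts any rejection rule, and `bonferroni_z_reject` is only a simple stand-in test used to make the example runnable; it is not one of the four tests compared in this section.

```python
from statistics import NormalDist

import numpy as np

def bonferroni_z_reject(X, Y, alpha=0.05):
    """Stand-in test: Bonferroni-corrected coordinate-wise two-sample z-tests."""
    n, m, p = X.shape[0], Y.shape[0], X.shape[1]
    se = np.sqrt(X.var(axis=0, ddof=1) / n + Y.var(axis=0, ddof=1) / m)
    z = np.abs(X.mean(axis=0) - Y.mean(axis=0)) / se
    return bool(z.max() > NormalDist().inv_cdf(1.0 - alpha / (2 * p)))

def resampling_rejection_rates(X, Y, reject_fn, n_rep=500, seed=0):
    """Return rejection percentages over bootstrapped and permuted datasets."""
    rng = np.random.default_rng(seed)
    n, m = len(X), len(Y)
    pooled = np.vstack([X, Y])
    boot_hits = 0
    perm_hits = 0
    for _ in range(n_rep):
        # Bootstrap: resample with replacement separately within each group.
        Xb = X[rng.integers(0, n, size=n)]
        Yb = Y[rng.integers(0, m, size=m)]
        boot_hits += bool(reject_fn(Xb, Yb))
        # Permutation: shuffle group membership, which removes any group difference.
        idx = rng.permutation(n + m)
        perm_hits += bool(reject_fn(pooled[idx[:n]], pooled[idx[n:]]))
    return 100.0 * boot_hits / n_rep, 100.0 * perm_hits / n_rep

rng = np.random.default_rng(1)
X = rng.standard_normal((45, 10))
Y = rng.standard_normal((77, 10)) + 3.0
boot_rate, perm_rate = resampling_rejection_rates(X, Y, bonferroni_z_reject, n_rep=50, seed=2)
```

A high bootstrap rejection rate supports the presence of a group difference, while a permutation rejection rate near the nominal level indicates that the test keeps its size once the difference is removed.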

APPENDIX

We first present some auxiliary lemmas that are key for deriving the main theorems. To
introduce Lemma 1, for any β > 0 and $y \in \mathbb{R}^{p}$, we define a function $F_{\beta}(w)$ as

$F_{\beta}(w) = \beta^{-1} \log\Big( \sum_{j=1}^{p} \exp\{\beta(w_j - y_j)\} \Big), \quad w \in \mathbb{R}^{p},$

which satisfies the property

$0 \le F_{\beta}(w) - \max_{1 \le j \le p}(w_j - y_j) \le \beta^{-1} \log p,$

for every $w \in \mathbb{R}^{p}$ by (1) in [8]. In addition, we let $\phi_0 : \mathbb{R} \to [0, 1]$ be a real valued function
such that $\phi_0$ is thrice continuously differentiable and $\phi_0(z) = 1$ for z ≤ 0 and $\phi_0(z) = 0$ for
z ≥ 1. For any φ ≥ 1, define a function $\phi(z) = \phi_0(\varphi z)$, z ∈ R, where $\varphi$ stands for the constant φ ≥ 1. Then, for any φ ≥ 1 and
$y \in \mathbb{R}^{p}$, denote β = φ log p and define a function $\kappa : \mathbb{R}^{p} \to [0, 1]$ as

(9) $\kappa(w) = \phi_0\big(\varphi F_{\varphi \log p}(w)\big) = \phi\big(F_{\beta}(w)\big), \quad w \in \mathbb{R}^{p}.$

Lemma 1 is devoted to characterizing the properties of the function κ defined in (9); see
also Lemmas A.5 and A.6 in [7].
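The smooth-max bound stated above, 0 ≤ F_β(w) − max_j (w_j − y_j) ≤ β^{-1} log p, is easy to verify numerically. The sketch below uses a numerically stable log-sum-exp; the function name `smooth_max` and the random inputs are illustrative assumptions.

```python
import numpy as np

def smooth_max(w, y, beta):
    """F_beta(w) = beta^{-1} log sum_j exp(beta * (w_j - y_j)), computed stably."""
    z = beta * (np.asarray(w, dtype=float) - np.asarray(y, dtype=float))
    z_max = z.max()
    # Subtracting z_max before exponentiating avoids overflow for large beta.
    return (z_max + np.log(np.exp(z - z_max).sum())) / beta

rng = np.random.default_rng(0)
p = 1000
w = rng.standard_normal(p)
y = rng.standard_normal(p)
for phi in (1.0, 5.0, 50.0):
    beta = phi * np.log(p)
    gap = smooth_max(w, y, beta) - np.max(w - y)
    # With beta = phi * log(p), the upper bound log(p) / beta equals 1 / phi.
    print(phi, gap, np.log(p) / beta)
```

With the choice β = φ log p used in the definition of κ, the approximation error of the smooth maximum is at most 1/φ, so larger φ gives a tighter surrogate for the maximum.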

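LEMMA 1. For any φ ≥ 1 and $y \in \mathbb{R}^{p}$, we denote β = φ log p; then the function κ defined
in (9) has the following properties, where $\kappa_{jkl}$ denotes $\partial_j \partial_k \partial_l \kappa$. For any j, k, l = 1, ..., p,
there exists a nonnegative function $Q_{jkl}$ such that:

(1) $|\kappa_{jkl}(w)| \le Q_{jkl}(w)$ for all $w \in \mathbb{R}^{p}$,

(2) $\sum_{j=1}^{p} \sum_{k=1}^{p} \sum_{l=1}^{p} Q_{jkl}(w) \lesssim (\varphi^{3} + \varphi^{2}\beta + \varphi\beta^{2}) \lesssim \varphi\beta^{2}$ for all $w \in \mathbb{R}^{p}$, where $\varphi$ stands for the constant φ,

(3) $Q_{jkl}(w) \lesssim Q_{jkl}(w + \tilde{w}) \lesssim Q_{jkl}(w)$ for all $w \in \mathbb{R}^{p}$ and $\tilde{w} \in \{w^{*} \in \mathbb{R}^{p} : \max_{1 \le j \le p} |w_j^{*}| \beta \le 1\}$.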

To state Lemma 2, a two-sample extension of Lemma 5.1 in [9], for any sequence of

constants δ

n,m

that depends on both n and m, we denote the quantity ρ

n,m

= sup

v∈[0,1]

sup

y∈R





1/2



− n

1/2

+ δ

n,m

− δ

n,m

1/2



+ (1 − v)

1/2



− n

1/2

+ δ

n,m

− δ

n,m

1/2



≤ y



− P



− n

1/2

+ δ

n,m

− δ

n,m

1/2

≤ y





(10)

Lemma 2 provides a bound on ρ

n,m

under some general conditions.

LEMMA 2. For any φ

,φ

≥ 1 and any sequence of constants δ

n,m

, assume the following

condition (a) holds,

(a) There exists a universal constant b>0 such that

min

1≤j≤p



− n

1/2

+ δ

n,m

− δ

n,m

1/2





≥ b.

Then we have

n,m

 n

−1/2

(log p)



n,m

+ L

(log p)

1/2

+ φ

(φ

)



+ m

−1/2

(log p)

|δ

n,m



n,m

+ L

(log p)

1/2

+ φ

∗

(φ

)





min{φ

,φ

}



−1

(log p)

1/2

up to a positive universal constant that depends only on b, where ρ

n,m

is deﬁned in (10).

To state Lemma 3 that is a two-sample version of Corollary 5.1 in [9], for any sequence of

constants δ

n,m

that depends on both n and m, we denote the quantity ρ

∗

n,m

∗

n,m

= sup

v∈[0,1]

sup

A∈A





1/2



− n

1/2

+ δ

n,m

− δ

n,m

1/2



+ (1 − v)

1/2



− n

1/2

+ δ

n,m

− δ

n,m

1/2



∈ A



− P



− n

1/2

+ δ

n,m

− δ

n,m

1/2

∈ A





(11)

which has a similar form to the key quantity ρ

∗∗

n,m

in Theorems 1 and 2. Lemma 3 gives a

bound on ρ

∗

n,m

under some general conditions, and it is important for deriving Lemma 4 and

Theorem 1.

LEMMA 3. For any φ

,φ

≥ 1 and any sequence of constants δ

n,m

, assume the following

condition (a) holds,

(a) There exists a universal constant b>0 such that

min

1≤j≤p



− n

1/2

+ δ

n,m

− δ

n,m

1/2





≥ b.


Then we have

∗

n,m

≤ K

∗



−1/2

(log p)



∗

n,m

+ L

(log p)

1/2

+ φ

(φ

)



+ m

−1/2

(log p)

|δ

n,m



∗

n,m

+ L

(log p)

1/2

+ φ

∗

(φ

)





min{φ

,φ

}



−1

(log p)

1/2



up to a universal constant K

∗

> 0 that depends only on b, where ρ

∗

n,m

is deﬁned in (11).

Before stating the next lemma, for any φ ≥ 1, we denote M

(φ) = M

(φ) + M

(φ),

where M

(φ) and M

(φ) are given as follows, respectively,

−1



i=1



max

1≤j≤p



− μ





max

1≤j≤p



− μ



1/2



(4φ log p)



−1



i=1



max

1≤j≤p



− μ





max

1≤j≤p



− μ



1/2



(4φ log p)



similar to those adopted in [9]. Likewise, for any φ ≥ 1 and any sequence of constants δ

n,m

that depends on both n and m, we denote M

∗

(φ) = M

(φ) + M

(φ) with M

(φ) and

(φ) as follows, respectively,

−1



i=1



max

1≤j≤p



− μ





max

1≤j≤p



− μ



1/2



4|δ

n,m

|φ log p





−1



i=1



max

1≤j≤p



− μ





max

1≤j≤p



− μ



1/2



4|δ

n,m

|φ log p





Recalling the deﬁnition of ρ

∗∗

n,m

in (2), Lemma 4 gives an abstract upper bound on ρ

∗∗

n,m

under

mild conditions as follows.

LEMMA 4. For any sequence of constants δ

n,m

, assume we have the following conditions

(a)–(b):

(a) There exists a universal constant b>0 such that

min

1≤j≤p



− n

1/2

+ δ

n,m

− δ

n,m

1/2





≥ b.

(b) There exist two sequences of constants

∗

and

∗∗

such that we have

∗

≥ L

and

∗∗

≥ L

, respectively. Moreover, we also have

∗

= K



∗



(log p)



−1/6

≥ 2,

∗∗

= K



∗∗



(log p)

|δ

n,m



−1/6

≥ 2,

for a universal constant K

∈ (0,(K

∗

∨ 2)

−1

], where the positive constant K

∗

that depends

on n as deﬁned in Lemma 3 in the Appendix.

Then we have the following property, where ρ

∗∗

n,m

is deﬁned in (2),

∗∗

n,m

≤ K



∗



(log p)



1/6





∗



∗





∗∗



(log p)

|δ

n,m



1/6



∗



∗∗



∗∗



for a universal constant K

> 0 that depends only on b.


To introduce Lemma 5, for any sequence of constants δ

n,m

that depends on both n and m,

denote a useful quantity



n,m

=



− 

+ δ

n,m

(



− 

)

∞

. Lemma 5 below gives an

abstract upper bound on ρ

n,m

deﬁned in (4).

LEMMA 5. For any sequence of constants δ

n,m

, assume we have the following condition

(a):

(a) There exists a universal constant b>0 such that

min

1≤j≤p



− n

1/2

+ δ

n,m

− δ

n,m

1/2





≥ b.

Then for any sequence of constants



n,m

> 0, on the event {



n,m

≤



n,m

}, we have the

following property, where ρ

n,m

is deﬁned in (4),

n,m

 (



n,m

)

1/3

(log p)

2/3

Lastly, we present a two-sample Borel–Cantelli lemma in Lemma 6.

LEMMA 6. Let {A

n,m

: n ≥ 1,m≥ 1,(n,m)∈ A} be a sequence of events in the sample

space , where A is the set of all possible combinations (n, m), which has the form A =

{(n, m) : n ≥ 1,m∈ σ(n)} where σ(n) is a set of positive integers determined by n, possibly

the empty set. Assume the following condition (a):

(a)



∞

n=1



m∈σ(n)

P(A

n,m

)<∞.

Then we have the following property:



∞



∞



∞



n=k



m∈(k

)∩σ(n)

n,m



= 0,

where (k

) ={k : k ∈ Z,k≥ k

Note that if m ∈ σ(n) = ∅, we just delete the roles of those A

n,m

and A

n,m

during any

operations such as union and intersection, and the same applies to P(A

n,m

) and P(A

n,m

)

during summation and deduction.

Before proceeding, we mention that the derivations of Theorems 1–2 essentially follow
those of their counterparts in [9], but need more technicality to employ the aforesaid Lemmas
4–5 to address the challenge arising from unequal sample sizes. The derivation of Corollary 1
is based on Theorem 1 as well as a two-sample Borel–Cantelli lemma (Lemma 6) that first
appears in this work as far as we know.

Theorems 3–5 regarding the DCF test are newly developed, and no comparable results
are present in the literature. Thus we present the proofs of Theorems 3–5 below, while the proofs
of Theorems 1–2, Corollary 1 and the auxiliary lemmas are relegated to the online
Supplementary Material to save space.

PROOF OF THEOREM 3. First of all, we define a sequence of constants $\delta_{n,m}$ by

(12) $\delta_{n,m} = -n^{1/2} m^{-1/2}.$

Together with condition (a), it can be deduced that

(13) $\delta_1 < |\delta_{n,m}| < \delta_2$

with $\delta_1 = \{c_1/(1 - c_1)\}^{1/2} > 0$ and $\delta_2 = \{c_2/(1 - c_2)\}^{1/2} > 0$. Moreover, by combining (12),
(13) with condition (b), we have

(14) min

1≤j≤p



− n

1/2

+ δ

n,m

− δ

n,m

1/2





≥ min



1,δ
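The passage from condition (a) to (13) is a short monotonicity argument. The sketch below assumes that condition (a) of Theorem 3 takes the form $c_1 < n/(n+m) < c_2$ for constants $0 < c_1 \le c_2 < 1$; that statement is not reproduced in this appendix, so this form is inferred from the constants appearing in (13) and should be read as an assumption.

```latex
\[
\frac{n}{m} = \frac{n/(n+m)}{1 - n/(n+m)},
\qquad
x \mapsto \frac{x}{1-x} \ \text{is increasing on } (0,1),
\]
\[
c_1 < \frac{n}{n+m} < c_2
\;\Longrightarrow\;
\frac{c_1}{1-c_1} < \frac{n}{m} < \frac{c_2}{1-c_2}
\;\Longrightarrow\;
\Big\{\frac{c_1}{1-c_1}\Big\}^{1/2} < |\delta_{n,m}| = \Big(\frac{n}{m}\Big)^{1/2} < \Big\{\frac{c_2}{1-c_2}\Big\}^{1/2}.
\]
```

For instance, with the sample sizes of the real data example, n = 45 and m = 77, one gets $|\delta_{n,m}| = (45/77)^{1/2} \approx 0.76$.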



In addition, based on condition (a) and condition (e), one has

(15) B

n,m

log

(pm)/m ∼ B

n,m

log

(pn)/n → 0.

To this end, by combining (12), (13), (14), (15), condition (c), condition (d) with Theorem 1,

it can be shown that

(16)

sup

t≥0







− n

1/2

−1/2

− n

1/2



− μ





∞

≤ t



− P





− n

1/2

−1/2

− n

1/2



− μ





∞

≤ t





≤ ρ

∗∗

n,m





n,m

log

(pn)/n



1/6

Next, we denote a sequence of constants $\alpha_{n,m}$ by

(17) $\alpha_{n,m} = (pn)^{-1},$

and it is obvious that

(18) $\alpha_{n,m} \in (0, e^{-1}).$



Moreover, by combining condition (a), condition (e) with (17), we conclude that

(19) B

n,m

log

(pm) log

(1/α

n,m

)/m ∼ B

n,m

log

(pn) log

(1/α

n,m

)/n → 0.

To this end, by combining (12), (13), (14), (17), (18), (19), condition (c), condition (d) with

Theorem 2, it follows that there exists a universal constant c

∗

> 0 such that with probabil-

ity at least 1 − γ

n,m

,wehaveρ

n,m

 {B

n,m

log

(pn)/n}

1/6

,whereγ

n,m

= (α

n,m

)

log(pn)/3

3(α

n,m

)

log

1/2

(pn)/c

∗

+ (α

n,m

)

log(pm)/3

+ 3(α

n,m

)

log

1/2

(pm)/c

∗

+ (α

n,m

)

log

(pn)/6

+ 3 ×

(α

n,m

)

log

(pn)/c

∗

+ (α

n,m

)

log

(pm)/6

+ 3(α

n,m

)

log

(pm)/c

∗

. Together with condition (a), (17) and (18),

it is not hard to prove that

(20)



n,m

< ∞.

Henceforth, by combining (12), (13), (14), (17), (18), (19), (20), condition (c), condition (d)

with Corollary 1, we reach a conclusion that with probability one,

(21)

sup

t≥0







− n

1/2

−1/2



∞

≤ t



− P





− n

1/2

−1/2

− n

1/2



− μ





∞

≤ t





≤ ρ

n,m





n,m

log

(pn)/n



1/6

Finally, according to (16) and (21), the assertion holds trivially.


PROOF OF THEOREM 4. Given any $(\mu^{X} - \mu^{Y})$, we have

(22)

Power

∗



− μ



= P

∗





∗

− n

1/2

−1/2

∗

+ n

1/2



− μ





∞

≥ c

(α)



= 1 − P

∗





∗

− n

1/2

−1/2

∗

+ n

1/2



− μ





∞

(α)



= 1 − P

∗



−n

1/2



− μ



− c

(α) < S

∗

− n

1/2

−1/2

∗

−n

1/2



− μ



+ c

(α)



= 1 − P

∗



−n

1/2



− μ



− c

(α) < S

∗

− n

1/2

−1/2

∗

−n

1/2



− μ



+ c

(α)



+ P



−n

1/2



− μ



− c

(α) < S

− n

1/2

−1/2

− n

1/2



− μ



< −n

1/2



− μ



+ c

(α)



− P



−n

1/2



− μ



− c

(α) < S

− n

1/2

−1/2

− n

1/2



− μ



< −n

1/2



− μ



+ c

(α)



≥ 1 − sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A





− P





− n

1/2

−1/2



∞

(α)



= Power



− μ



− sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A





Likewise, given any $(\mu^{X} - \mu^{Y})$, we have

Power



− μ



= P





− n

1/2

−1/2



∞

≥ c

(α)



= 1 − P





− n

1/2

−1/2



∞

(α)



= 1 − P



−c

(α) < S

− n

1/2

−1/2

(α)



= 1 + P

∗



−n

1/2



− μ



− c

(α) < S

∗

− n

1/2

−1/2

∗

−n

1/2



− μ



+ c

(α)



− P



−n

1/2



− μ



− c

(α)

− n

1/2

−1/2

− n

1/2



− μ



< −n

1/2



− μ



+ c

(α)



(23)

− P

∗



−n

1/2



− μ



− c

(α) < S

∗

− n

1/2

−1/2

∗

< −n

1/2



− μ



+ c

(α)



≥ 1 − sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A






− P

∗





∗

− n

1/2

−1/2

∗

+ n

1/2



− μ





∞

(α)



= Power

∗



− μ



− sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A





Putting (22) and (23) together indicates that

(24)



Power

∗



− μ



− Power



− μ





≤ sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A





Moreover, by a similar argument as in the proof of Theorem 3, one can show that with
probability one,

(25)

sup

A∈A







− n

1/2

−1/2

− n

1/2



− μ





∞

∈ A



− P

∗





∗

− n

1/2

−1/2

∗



∞

∈ A









n,m

log

(pn)/n



1/6

Finally, by combining (24) with (25), for any $\mu^{X} - \mu^{Y} \in \mathbb{R}^{p}$, we have that with probability
one,



Power

∗



− μ



− Power



− μ









n,m

log

(pn)/n



1/6

which completes the proof. 

PROOF OF THEOREM 5. First of all, on the basis of (8) and the triangle inequality, it is
clear that

(26)

Power

∗



− μ



≥ P

∗





∗

− n

1/2

−1/2

∗



∞

≤



1/2



− μ





∞

− c

(α)



At this point, with some abuse of notation, we denote by $\{e_j : j \le p\}$ the natural basis for $\mathbb{R}^{p}$.
Then it follows from the union bound and a concentration inequality that, for any t ≥ 0,

(27)

∗





∗

− n

1/2

−1/2

∗



∞

≥ t



≤



j=1

∗





∗

− n

1/2

−1/2

∗



≥ t



≤



j=1

2exp



−t









+ nm

−1







≤ 2p exp



−t





2max

j≤p









+ nm

−1









By plugging t = c

(α) into (27), it follows from the deﬁnition of c

(α) that

(28)

(α) ≤



2log(2p/α) max

j≤p









+ nm

−1









1/2

≤



4log(pn) max

j≤p









+ nm

−1









1/2


for sufﬁciently large n. To bound the quantity max

j≤p



(



+ nm

−1



}, ﬁrst notice

that

(29)

max

j≤p









+ nm

−1











+ nm

−1





∞

≤





− 

+ nm

−1





− 





∞





+ nm

−1





∞

For the term 



− 

+ nm

−1

(



− 

)

∞

, inequalities (53) and (54) from the Supplementary Material
together with (12), (17) and condition (a) entail that there exists a universal

constant c

> 0 such that

(30)





− 

+ nm

−1





− 





∞

≤ c



n,m

log

(pn)/n



1/2

with probability tending to one. Regarding the term 

+ nm

−1





∞

, one has





+ nm

−1





∞

≤







∞

+ nm

−1







∞

≤







∞

+ c







∞

= max

1≤j≤p



i=1



− μ





/n + c

max

1≤j≤p



i=1



− μ





≤ max

1≤j≤p



i=1





− μ





1/2

/n(31)

+ c

max

1≤j≤p



i=1





− μ





1/2

≤



max

1≤j≤p



i=1



− μ







1/2

+ c



max

1≤j≤p



i=1



− μ







1/2

≤ c

n,m

for some universal constants c

> 0, where the second inequality is by condition (a), the

third inequality is based on Jensen’s inequality, the fourth inequality holds from the Cauchy–

Schwarz inequality and the last inequality follows from condition (c). To this end, by com-

bining (30), (31), (e) with (29), it can be deduced that there exists a universal constant c

> 0

such that

(32) max

j≤p









+ nm

−1







≤ c

n,m

with probability tending to one. Together with (28), it can be veriﬁed that

(33) c

(α) ≤



n,m

log(pn)



1/2

with probability tending to one. Now, we set the constant K

in (f) as K

= 4c

1/2

, and it then

follows from (f) and (33) that

(34)



1/2



− μ





∞

− c

(α) ≥



n,m

log(pn)



1/2


with probability tending to one. Hence, it can be deduced that with probability tending to

one,

Power

∗



− μ



≥ P

∗





∗

− n

1/2

−1/2

∗



∞

≤



n,m

log(pn)



1/2



= 1 − P

∗





∗

− n

1/2

−1/2

∗



∞

≥



n,m

log(pn)



1/2



≥ 1 − 2p exp



−4c

n,m

log(pn)





2max

j≤p









+ nm

−1









$\ge 1 - 2n^{-2} \to 1$ as $n \to \infty$,

where the first inequality is based on (26) and (34), the second inequality holds from (27),
and the last inequality is by (32). This completes the proof.
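The union-bound step behind (27) and (28) can be illustrated numerically. Conditional on the data, each coordinate of the multiplier bootstrap statistic is Gaussian with variance equal to the j-th sample variance of the first sample plus n/m times that of the second, so the Gaussian tail bound gives a closed-form ceiling for the bootstrap critical value. The sketch below assumes standard normal multipliers and sample variances normalized by the sample size; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def critical_value_bound(X, Y, alpha):
    """Union-bound ceiling {2 log(2p/alpha) * max_j v_j}^{1/2} for the critical value."""
    n, m, p = X.shape[0], Y.shape[0], X.shape[1]
    # Conditional variance of coordinate j of the bootstrap statistic.
    v = X.var(axis=0) + (n / m) * Y.var(axis=0)
    return np.sqrt(2.0 * np.log(2.0 * p / alpha) * v.max())

def bootstrap_critical_value(X, Y, alpha, n_boot=2000, seed=0):
    """Empirical (1 - alpha) quantile of the max-type multiplier bootstrap statistic."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[0], Y.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    SeX = rng.standard_normal((n_boot, n)) @ Xc / np.sqrt(n)
    SeY = rng.standard_normal((n_boot, m)) @ Yc / np.sqrt(m)
    boot = np.max(np.abs(SeX - np.sqrt(n / m) * SeY), axis=1)
    return np.quantile(boot, 1.0 - alpha)

rng = np.random.default_rng(3)
X = rng.standard_normal((45, 50))
Y = 2.0 * rng.standard_normal((77, 50))
c_boot = bootstrap_critical_value(X, Y, 0.05)
c_bound = critical_value_bound(X, Y, 0.05)
```

The simulated quantile stays below the closed-form ceiling, which is the property the proof uses to show that a sufficiently large mean difference exceeds the critical value.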

Acknowledgements. The authors would like to thank the Associate Editor and the two

referees for their insightful comments.

Fang Yao’s research is partially supported by National Natural Science Foundation of

China Grant 11871080, a Discipline Construction Fund at Peking University and Key Lab-

oratory of Mathematical Economics and Quantitative Finance (Peking University), Ministry

of Education.

Kaijie Xue’s research is partially supported by National Natural Science Foundation of

China Grant 11871080, the Fundamental Research Funds for the Central Universities, Key

Laboratory for Medical Data Analysis and Statistical Research of Tianjin and the Key Labo-

ratory of Pure Mathematics and Combinatorics, Ministry of Education.

Fang Yao is the corresponding author.

SUPPLEMENTARY MATERIAL

Supplement to “Distribution and correlation-free two-sample test of high-dimen-

sional means” (DOI: 10.1214/19-AOS1848SUPP; .pdf). The supplementary material (link

TBA) contains the proofs of Theorems 1–2, Corollary 1, and the auxiliary lemmas.

REFERENCES

[1] AYYALA, D. N., PARK, J. and ROY, A. (2017). Mean vector testing for high-dimensional dependent observations. J. Multivariate Anal. 153 136–155. MR3578843 https://doi.org/10.1016/j.jmva.2016.09.012

[2] BAI, Z. and SARANADASA, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311–329. MR1399305

[3] CAI, T. T., LIU, W. and XIA, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349–372. MR3164870 https://doi.org/10.1111/rssb.12034

[4] CHANG, J., ZHENG, C., ZHOU, W.-X. and ZHOU, W. (2017). Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics 73 1300–1310. MR3744543 https://doi.org/10.1111/biom.12695

[5] CHEN, S. X. and QIN, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808–835. MR2604697 https://doi.org/10.1214/09-AOS716

[6] CHEN, X. (2018). Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Statist. 46 642–678. MR3782380 https://doi.org/10.1214/17-AOS1563

[7] CHERNOZHUKOV, V., CHETVERIKOV, D. and KATO, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist. 41 2786–2819. MR3161448 https://doi.org/10.1214/13-AOS1161

[8] CHERNOZHUKOV, V., CHETVERIKOV, D. and KATO, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Related Fields 162 47–70. MR3350040 https://doi.org/10.1007/s00440-014-0565-9

[9] CHERNOZHUKOV, V., CHETVERIKOV, D. and KATO, K. (2017). Central limit theorems and bootstrap in high dimensions. Ann. Probab. 45 2309–2352. MR3693963 https://doi.org/10.1214/16-AOP1113

[10] FENG, L., ZOU, C., WANG, Z. and ZHU, L. (2015). Two-sample Behrens–Fisher problem for high-dimensional data. Statist. Sinica 25 1297–1312. MR3409068

[11] GREGORY, K. B., CARROLL, R. J., BALADANDAYUTHAPANI, V. and LAHIRI, S. N. (2015). A two-sample test for equality of means in high dimension. J. Amer. Statist. Assoc. 110 837–849. MR3367268 https://doi.org/10.1080/01621459.2014.934826

[12] HU, J., BAI, Z., WANG, C. and WANG, W. (2017). On testing the equality of high dimensional mean vectors with unequal covariance matrices. Ann. Inst. Statist. Math. 69 365–387. MR3611524 https://doi.org/10.1007/s10463-015-0543-8

[13] HUSSAIN, L., AZIZ, W., NADEEM, S. A., SHAH, S. A. and MAJID, A. (2015). Electroencephalography (EEG) analysis of alcoholic and control subjects using multiscale permutation entropy. J. Multidiscip. Eng. Sci. Technol. 1 3159–0040.

[14] PARK, J. and AYYALA, D. N. (2013). A test for the mean vector in large dimension and small samples. J. Statist. Plann. Inference 143 929–943. MR3011304 https://doi.org/10.1016/j.jspi.2012.11.001

[15] SHEN, Y. and LIN, Z. (2015). An adaptive test for the mean vector in large-p-small-n problems. Comput. Statist. Data Anal. 89 25–38. MR3349665 https://doi.org/10.1016/j.csda.2015.03.004

[16] SRIVASTAVA, M. S. (2007). Multivariate theory for analyzing high dimensional data. J. Japan Statist. Soc. 37 53–86. MR2392485 https://doi.org/10.14490/jjss.37.53

[17] SRIVASTAVA, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivariate Anal. 100 518–532. MR2483435 https://doi.org/10.1016/j.jmva.2008.06.006

[18] SRIVASTAVA, M. S. and DU, M. (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 99 386–402. MR2396970 https://doi.org/10.1016/j.jmva.2006.11.002

[19] SRIVASTAVA, M. S. and KUBOKAWA, T. (2013). Tests for multivariate analysis of variance in high dimension under non-normality. J. Multivariate Anal. 115 204–216. MR3004555 https://doi.org/10.1016/j.jmva.2012.10.011

[20] WANG, L., PENG, B. and LI, R. (2015). A high-dimensional nonparametric multivariate test for mean vector. J. Amer. Statist. Assoc. 110 1658–1669. MR3449062 https://doi.org/10.1080/01621459.2014.988215

[21] XU, G., LIN, L., WEI, P. and PAN, W. (2016). An adaptive two-sample test for high-dimensional means. Biometrika 103 609–624. MR3551787 https://doi.org/10.1093/biomet/asw029

[22] XUE, K. and YAO, F. (2019). Supplement to “Distribution and correlation-free two-sample test of high-dimensional means.” https://doi.org/10.1214/19-AOS1848SUPP.

[23] YAGI, A. and SEO, T. (2014). A test for mean vector and simultaneous confidence intervals with three-step monotone missing data. Amer. J. Math. Management Sci. 33 161–175.

[24] YAMADA, T. and HIMENO, T. (2015). Testing homogeneity of mean vectors under heteroscedasticity in high-dimension. J. Multivariate Anal. 139 7–27. MR3349477 https://doi.org/10.1016/j.jmva.2015.02.005

[25] ZHANG, J. and PAN, M. (2016). A high-dimension two-sample test for the mean using cluster subspaces. Comput. Statist. Data Anal. 97 87–97. MR3447038 https://doi.org/10.1016/j.csda.2015.12.004

[26] ZHANG, X. (2015). Testing high dimensional mean under sparsity. Preprint. Available at arXiv:1509.08444v2.

[27] ZHAO, J. (2017). A new test for the mean vector in large dimension and small samples. Comm. Statist. Simulation Comput. 46 6115–6128. MR3740770 https://doi.org/10.1080/03610918.2016.1197244

[28] ZHONG, P.-S., CHEN, S. X. and XU, M. (2013). Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence. Ann. Statist. 41 2820–2851. MR3161449 https://doi.org/10.1214/13-AOS1168

[29] ZHU, Y. and BRADIC, J. (2016). Two-sample testing in non-sparse high-dimensional linear models. Preprint. Available at arXiv:1610.04580v1.