
Global Testing Against Sparse Alternatives in Time-Frequency Analysis

T. Tony Cai¹, Yonina C. Eldar², and Xiaodong Li¹



Abstract

In this paper, an over-sampled periodogram higher criticism (OPHC) test is proposed for
the global detection of sparse periodic effects in a complex-valued time series. An explicit
minimax detection boundary is established between the number and magnitude of the complex
sinusoids hidden in the series. The OPHC test is shown to be asymptotically powerful in the
detectable region. Numerical simulations illustrate and verify the effectiveness of the proposed
test. Furthermore, the periodogram over-sampled by O(log N) is proven to be universally optimal for
global testing for periodicities under a mild minimum separation condition. Connections to the
problem of detecting a stream of pulses from frequency measurements in signal processing are
also discussed.

Keywords: testing for periodicity, sparsity, over-sampled periodogram, higher criticism, de-
tection boundary, normal comparison, empirical processes.

1 Introduction

1.1 Model and methodology

Global detection of periodic patterns in time series due to various biological rhythms such as
cell division, circadian rhythms, life cycles of microorganisms, and many others is an important
problem in gene expression studies; See, for example, [32] and the references therein. Testing for
periodicity dates back to Fisher [27]. Suppose u_t, t = 1, . . . , N, is a real-valued time series observed
at equispaced time points that satisfies the model

u_t = ζ_t + ε_t,   (1.1)

where the noise terms ε_t ∼ N(0, σ²) are i.i.d. normal variables with variance σ², and ζ_t = E(u_t). For
simplicity, we assume N is odd and define n by N = 2n + 1. Simple harmonic analysis yields the
representation

ζ_t = a_0 + Σ_{j=1}^{n} [ a_j cos(2πjt/N) + b_j sin(2πjt/N) ].

¹ Department of Statistics, The Wharton School, University of Pennsylvania. The research was supported in part by NSF Grants DMS-1208982 and DMS-1403708, NIH Grant R01 CA127334, and the Wharton Dean's Fund for Post-Doctoral Research.
² Department of Electrical Engineering, Technion.

Define R_j² = a_j² + b_j², j = 1, . . . , n. In [27], the well-known Fisher's test, which is based on the
maximum value of the normalized standard periodogram of the observed series, was proposed to
test

H_0: R_1² = . . . = R_n² = 0   versus   H_1: R_j² > 0 for some j, R_i² = 0 for all other i,   (1.2)

in which there is a single periodicity under the alternative.


Since then, substantial extensions and improvements have been made in the literature, e.g.,
[45, 8, 20, 23]. In [45], an extension of Fisher's test, which is based on an adaptive set of the largest
normalized standard periodogram values, was proposed to test the hypotheses

H_0: R_1² = . . . = R_n² = 0   versus   H_1: R_{j_1}² > 0, . . . , R_{j_s}² > 0, R_i² = 0 for all other i,   (1.3)

where there are several distinct periodicities in the series under the alternative. The higher criticism
test proposed in [23] can also be applied to the standard periodogram, and certain asymptotic
minimaxity properties of this testing method against sparse alternatives have been established.
However, the efficacy of these standard periodogram-based tests is not justified when the frequencies
of the periodicities are off the grid (2π/N, 4π/N, . . . , 2nπ/N). For general frequencies, the tests based
on the standard periodogram are possibly suboptimal. The goal of the present paper is to construct
a new test based on an over-sampled periodogram that adapts to generic frequencies and to study
its optimality properties.
Our discussion throughout the paper is focused on complex-valued time series for the following
reasons. First, complex-valued or bivariate time series arise naturally in modern data analysis
such as functional MRI, blood-flow and oceanography; see, for example, [42, 46] and the references
therein. Second, as indicated in Section 4.1 of [10] and Section 1.5 of [28], complex-valued time series
are sometimes more convenient for analysis. We therefore begin our analysis with complex-valued
series. Periodicity detection in real-valued series is then discussed in Section 4.
Specifically, we consider the linear model

y = Xβ + z, (1.4)

where the design matrix X ∈ C^{N×p} with p ≫ N is complex-valued and is an extended discrete
Fourier transform (EDFT) matrix, i.e.,

X_{jk} := (1/√p) e^{−2πi(k−1)(j−1)/p},  j = 1, . . . , N, k = 1, . . . , p.   (1.5)

The vector z represents zero-mean i.i.d. complex white noise, i.e., z = z_1 + iz_2 with
(z_1^T, z_2^T)^T ∼ N(0, (σ²/2) I_{2N}). The noise level σ is assumed to be known, and we let σ = 1 throughout the paper.
A distinct feature of our model in (1.4) is that the value of p is assumed to be unknown
throughout this paper, which means that the design matrix X is actually unavailable. Moreover,
adjacent columns in X are nearly parallel, which is different from the common assumption in high-
dimensional regression models in which the columns of the design matrices are pairwise incoherent.
A broad class of combinations of periodicities can be represented by the mean Xβ. When β is
sparse, Xβ is a superposition of a sparse collection of complex sinusoids. This model represents
the case in which we measure N low-pass measurements on an over-sampled grid, where the grid
itself is unknown.
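To make the model concrete, the following is a minimal sketch (in Python/NumPy) of how the mean Xβ in (1.4) can be simulated as a sparse superposition of complex sinusoids observed in complex white noise. The sizes N, p, s, the amplitude 5.0, and the random phases are illustrative choices for this sketch only, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, s = 200, 10_000, 3          # illustrative sizes, not the paper's settings

# EDFT design (1.5): X[j, k] = exp(-2*pi*i*j*k/p) / sqrt(p) for 0-based j, k
rows = np.arange(N).reshape(-1, 1)
cols = np.arange(p).reshape(1, -1)
X = np.exp(-2j * np.pi * rows * cols / p) / np.sqrt(p)

# s-sparse coefficient vector with arbitrary (here random) phases
beta = np.zeros(p, dtype=complex)
support = rng.choice(p, size=s, replace=False)
beta[support] = 5.0 * np.exp(2j * np.pi * rng.random(s))

# complex white noise: real and imaginary parts are i.i.d. N(0, 1/2), i.e. sigma = 1
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

y = X @ beta + z                  # the observation (1.4)
```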

We consider a global test for periodicity modeled as

H_0: β = 0   versus   H_1: β ≠ 0 and β is sparse.   (1.6)

Note that under the null hypothesis β = 0, the observation y is a sequence of complex white noise,
while under the alternative, the mean of y can be sparsely represented by the design matrix X,
i.e., the mean of y consists of complex sinusoids. We denote by s the sparsity of β under the
alternative, and assume it is unknown.
Although the over-sampled periodogram of a real-valued time series can be complicated, it turns
out to be surprisingly simple in the complex case. Define

U = (1/√N) ( e^{2πi(m−1)(j−1)/q} )_{1≤m≤q, 1≤j≤N},   (1.7)

whose row vectors are normalized in 2-norm, and set

v = Uy,  and  I_m = |v_m|²,  m = 1, . . . , q.   (1.8)

When q > N, the sequence I_1, . . . , I_q is the over-sampled periodogram of interest.


An important question is how to choose the over-sampling rate q/N. A quick answer
is that the larger q is, the more powerful a test can be constructed from the over-sampled
periodogram. However, the joint distribution of an excessively over-sampled periodogram is not
easy to analyze, and, more importantly, testing methods based on an excessively over-sampled
periodogram may significantly increase the computational cost without improving the statistical
power. A careful analysis yields that there is a universally optimal choice for the over-sampling
rate: q/N = O(log N), i.e., q = O(N log N) in (1.7). Later we will show that as long as the
frequencies of the complex sinusoids in the mean of y obey a certain minimum separation condition,
the over-sampling rate O(log N) leads to an asymptotically optimal global test. We emphasize that
this over-sampling rate is independent of the grid parameter p and the sparsity s.
We now construct a test statistic based on this over-sampled periodogram. Define

HC(t) := ( Σ_{m=1}^{q} 1_{√I_m ≥ t} − q Ψ̄(t) ) / √( q Ψ̄(t)(1 − Ψ̄(t)) ),   (1.9)

where Ψ̄(t) = P(|z| ≥ t) = e^{−t²} is the tail probability of the standard complex normal variable
as calculated in Lemma 5.2, and I_m is defined as in (1.8). Notice that under the null, all v_m are
standard complex-valued normal variables as defined in Definition 5.1, and E HC(t) = 0 for any
fixed t. The proposed test statistic is then defined as

HC* = sup_{a ≤ t ≤ b} HC(t),   (1.10)

for which appropriate choices of the interval [a, b] are discussed in Sections 2 and 3. We will fix a
threshold level T , and reject H0 if and only if HC ∗ > T . This test is referred to as the over-sampled
periodogram higher criticism (OPHC) test.
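Computationally, v = Uy in (1.8) is a single zero-padded length-q DFT (with positive-sign exponent) of the observation, so the over-sampled periodogram and the statistic (1.9)–(1.10) can be formed with one FFT. The Python/NumPy sketch below shows one possible implementation; the default q and the grid of thresholds t are illustrative choices of this sketch, not prescriptions from the paper.

```python
import numpy as np

def ophc_statistic(y, q=None, t_grid=None):
    """Sketch of the OPHC statistic (1.7)-(1.10); defaults for q and t_grid are illustrative."""
    N = len(y)
    if q is None:
        q = N * int(np.log(N) + 1)             # q = N * floor(log N + 1)
    # v = U y in (1.8): zero-padded DFT with positive-sign exponent, via one inverse FFT
    v = q * np.fft.ifft(y, n=q) / np.sqrt(N)
    I = np.abs(v) ** 2                          # over-sampled periodogram I_1, ..., I_q
    if t_grid is None:
        t_grid = np.linspace(1.0, np.sqrt(np.log(q)), 200)
    psi_bar = np.exp(-t_grid ** 2)              # tail probability of |CN(0,1,0)|
    exceed = (np.sqrt(I)[:, None] >= t_grid[None, :]).sum(axis=0)
    hc = (exceed - q * psi_bar) / np.sqrt(q * psi_bar * (1.0 - psi_bar))
    return hc.max()                             # HC* = sup over the threshold grid
```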
The term "higher criticism" was originally coined by Tukey, and the method was introduced in Donoho and
Jin [23] for signal detection under a sparse homoscedastic Gaussian mixture model. This detection
problem was first studied in Ingster [36]. Cai, Jin and Low [14] investigated minimax estimation of
the non-null proportion under the same model. Hall and Jin [33] proposed a modified version of the
higher criticism for detection with correlated noise with known covariance matrices. Cai, Jeng and
Jin [13] considered a heteroscedastic Gaussian mixture model and showed that the optimal detection
boundary can be achieved by a double-sided version of the higher criticism test. The papers [4, 3]
studied a related problem of detecting a signal with a known geometric shape in Gaussian noise.
In the special case in which p = N , i.e., the frequencies are on the grid, the design matrix
becomes the orthogonal DFT matrix. Multiplying the measurement by the inverse DFT matrix,
the design matrix is reduced to the identity design. Therefore, the problem becomes equivalent to
the standard sparse detection model discussed in [36, 23], and the standard higher criticism test
proposed in [23] can be directly applied. Notice that in the OPHC test defined above, choosing
q = N in (1.7) is equivalent to multiplying the measurement by the inverse DFT. Therefore, when
p = N , if we choose q = N instead of q = O (N log N ), the method described above is reduced to
the standard HC method proposed in [23]. Similar to the real-valued case discussed in [23], the HC
method with q = N is asymptotically optimal when β is sufficiently sparse, so there is no need to
over-sample the periodogram.

1.2 Compressed detection of pulse streams using Fourier measurements

Besides periodicity detection in a complex-valued series, the hypothesis testing model (1.4) is also
closely connected to an interesting problem in signal processing: detection of a stream of pulses
from frequency measurements. Unlike testing for periodicity in time series, in which the goal is
to detect periodic effects based on time measurements, the objective here is to detect time pulses
based on frequency measurements. These two problems are important in their respective fields, one
in time series analysis and one in signal processing, and are seemingly very different at first sight,
but they can both be attacked through the hypothesis test model (1.4).
Suppose h(t) : (−∞, +∞) → R is a pulse function. Then a stream of pulses can be represented by

f(t) = Σ_{l=1}^{s} α_l h(t − t_l).   (1.11)

Without loss of generality, we assume that 0 < t_1 ≤ . . . ≤ t_s < 1, and the coefficients α_1, . . . , α_s ∈ C.
Streams of pulses appear in a variety of scientific and engineering applications, such as ultrasound
imaging [51], channel sounding [11, 29], and moving target detection in sonar and radar [6, 7]. In
these settings, the echoes of a transmitted pulse h(t) are analyzed to identify the positions and
reflectance coefficients of scatterers in the medium. The received signal is assumed to have the
form (1.11), where s is the number of scatterers, and the complex amplitudes {α_l} and time delays
{t_l} correspond to the reflectance and locations of the scatterers. Other settings in which pulse
streams play an important role are image super-resolution [5], signal compression [18], and
ultra-wideband communications [21].
The signal processing task of interest is to detect the existence of f(t) from its noisy linear
samples. In the context of radar, for example, the delay t_l represents the distance to the l-th
target, while the complex amplitude α_l conveys information about the Doppler shift, which is
related to the target's velocity, as well as other channel attenuation factors. Detecting whether or
not a target is present then reduces to detecting whether there exists at least one value t_l for which
|α_l| > 0.
One natural method is sampling the signal in the time domain; that is, detecting the existence
of f(t) based on f(0/p) + z(1), f(1/p) + z(2), . . . , f((p−1)/p) + z(p) for some integer p, where the z(t)'s are
measurement errors. Since the duration of the pulse function h(t) is usually very short, i.e., the
energy of h(t) is concentrated in a short interval, the integer p has to be chosen large enough so
that the information of the signal is captured by the sequence of measurements, although
most measurements are possibly pure noise. As the support of h(t) is inversely proportional to
its bandwidth, p is typically chosen to be on the order of the Nyquist rate, which for a real signal
is equal to its bandwidth. This implies that even if the strength of the signal is very large, one
cannot reduce the number of measurements by directly sampling in the time domain.
Instead, we suggest sampling the signal in the frequency domain. For a specific frequency
parameter ω, the noisy frequency measurement of f(t) can be written as

ỹ(ω) = ∫_{−∞}^{+∞} e^{−i2πωt} f(t) dt + z(ω) = Σ_{l=1}^{s} α_l e^{−i2πωt_l} ĥ(ω) + z(ω).

Sampling pulse streams in the frequency domain has been promoted recently in the context of finite-
rate-of-innovation (FRI) sampling [50, 49]. In that context it has been shown that a stream of L
pulses can be recovered entirely from only 2L Fourier coefficients. Furthermore, several concrete
sampling methods have been proposed that can be implemented conveniently in hardware and that
lead to the desired Fourier components [7, 30, 48].
The first question is how to choose the sequence of frequencies ω. Noticing that 0 ≤ t_1 < · · · < t_s < 1,
in this work we propose to choose the ω's as 0, . . . , N − 1, that is, low-frequency samples,
as opposed to random frequencies in a much larger bandwidth as suggested in compressed sensing
[17]. Later we will show the advantage of our scheme by numerical simulations. Under this choice
the measurements become

ỹ(j − 1) = Σ_{l=1}^{s} α_l e^{−i2π(j−1)t_l} ĥ(j − 1) + z(j − 1),  j = 1, . . . , N,

where z(0), z(1), . . . , z(N − 1) are i.i.d. standard circularly-symmetric complex normal variables;
that is, the real and imaginary parts (the latter being the coefficient of i = √−1) of z(j − 1) are
independent normal variables with variance 1/2.
We also assume that ĥ(0) = ĥ(1) = . . . = ĥ(N − 1) = 1, which can be guaranteed by choosing
h(t) as a normalized sinc function with a bandwidth parameter larger than N. Denoting
y_j := ỹ(j − 1) and z_j := z(j − 1), the measurements can be written as

y_j = Σ_{l=1}^{s} α_l e^{−i2π(j−1)t_l} + z_j,  j = 1, . . . , N.
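As a quick numerical illustration of these low-frequency samples, the following Python/NumPy sketch generates y_j directly from the delays and amplitudes; the values of N, s, the delays and the amplitudes are made up for the sketch. When the t_l lie on a grid of size p, these samples coincide with Xβ for the EDFT matrix in (1.5), as explained next.

```python
import numpy as np

rng = np.random.default_rng(1)
N, s = 128, 3                                  # illustrative sizes
t = np.sort(rng.random(s))                     # delays t_1 < ... < t_s in [0, 1)
alpha = rng.standard_normal(s) + 1j * rng.standard_normal(s)

# low-frequency samples: y_j = sum_l alpha_l * exp(-i*2*pi*(j-1)*t_l) + z_j
jgrid = np.arange(N).reshape(-1, 1)
y_clean = (alpha * np.exp(-2j * np.pi * jgrid * t)).sum(axis=1)
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = y_clean + z
```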

In order to explicitly demonstrate our method and results, we assume that t_1, . . . , t_s are located
on the grid {0, 1/p, . . . , (p−1)/p}. Here p is unknown and can be chosen arbitrarily large. By letting
τ_l = pt_l + 1, the measurement model becomes equivalent to (1.4) with β ∈ C^p an s-sparse signal
such that β_{τ_l} = √p α_l for 1 ≤ l ≤ s, while its other components are 0. The signal detection problem

H_0: f(t) = 0   versus   H_1: f(t) ≠ 0

is also transformed to (1.6).

1.3 Relation with global testing in linear models

If the dimension p in (1.4) were known, the hypothesis testing model (1.4) considered in the present
paper would also be strongly related to the global testing problem under a linear model with sparse
alternatives. It is helpful to review some well-known results for the real-valued case in this line of
research.
Consider the linear model

y = Xβ + ε,

where X ∈ R^{N×p} and β ∈ R^p are the design matrix and regression coefficients, respectively. The noise
vector ε ∈ R^N has i.i.d. Gaussian entries with variance 1, i.e., ε ∼ N(0, I_N). The
global detection of β is captured by the following hypothesis test:

H_0: β = 0   versus   H_1: β ≠ 0.   (1.12)

In the recently developed literature on high-dimensional statistics, p is comparable to or much greater
than N. Moreover, the parameter vector β is assumed to be sparse compared to the number
of samples; that is, ‖β‖_0 = s ≪ N. In this setting, the tradeoff between the strength of the
nonzero regression coefficients and the sparsity, which captures the detectability of the nonzero
regression coefficients, has been intensively studied in the literature. In order to simplify the
analysis, it is convenient to assume that the nonzero components of β have the same magnitude
A. The tradeoff between the signal strength and sparsity is then reduced to a quantitative relationship
between A and s for given N and p. This relationship also depends closely on the properties of the
design matrix X. There are two well-studied examples in the literature:

• Identity design matrix. The simplest form of X is perhaps the identity matrix I, which
  implies N = p. The detection boundary for this design matrix is given in [36, 23], and an
  implementable test is given in the latter work. Roughly speaking, when s < N^{1/2} = p^{1/2}, the
  detectability of a nonzero β is dictated by A = √(2r log p) with r = O(1). To be specific, if
  s = p^{1−α} with α ∈ [1/2, 1] and r and α are fixed, a higher criticism test is asymptotically
  powerful as long as r > ρ*(α), where the detection boundary function ρ* is defined as

  ρ*(α) = (1 − √(1 − α))²  for α ∈ [3/4, 1],   and   ρ*(α) = α − 1/2  for α ∈ (1/2, 3/4].   (1.13)

  On the other hand, if r < ρ*(α), all sequences of testing procedures are asymptotically
  powerless. The condition α > 1/2 is crucial; otherwise the detectability of a nonzero β is not
  characterized by the scaling A = √(2r log p).

• Gaussian design matrix. Another carefully studied class of design matrices is the Gaussian
  designs; that is, X ∈ R^{N×p} has i.i.d. symmetric normal entries with variance 1/p. This model
  appears in [2], although a more general family of fixed design matrices is also analyzed there¹.
  Under this model, assume A = √(2rp log p / N) and s = p^{1−α} with fixed α and r. The following
  results are proven in Corollary 1 of [2]: Assume (i) p^{1−α} log^5 p / N → 0; (ii) log³N / p² = O(p^{ε+5α−4})
  for all ε > 0; (iii) N^{−1/2} p^{1−α} log^{9/2} p → 0; and (iv) the support and signs of β are
  random. Then the condition r > ρ*(α) guarantees successful detection by a higher
  criticism type of test, while all testing methods are asymptotically powerless once r < ρ*(α).
  The detection boundary function ρ*(α) is the same as (1.13). It is noteworthy that condition
  (iii) is crucial in establishing the lower bound, and it requires N > s².

¹ In [2], the design matrix X is assumed to have normalized columns. For Gaussian designs this asymptotically amounts to assuming that X has i.i.d. symmetric normal entries with variance 1/N. By scaling the parameter vector β appropriately, the common variance can instead be assumed to be 1/p, with rows that are asymptotically normalized. This is consistent with our model (1.4).

For ease of presentation, we assume N = p^{1−γ} with 0 ≤ γ ≤ 1 throughout the paper. Then in
the case of Gaussian designs, the detection boundary r = ρ*(α) holds when

(1 + γ)/2 < α < 1,

which summarizes conditions (i), (ii) and (iii), together with the assumption that the support and
signs of β are random. However, there is an "unnatural" property of the detection boundary ρ* in this case:
when γ > 0, r = ρ*(α) → ρ*((1+γ)/2) > 0 as α → (1+γ)/2.

In Section 2, with a similar setup of γ, α, A and r, under the condition (1+γ)/2 < α < 1, a new
detection boundary is developed for the model (1.4) with EDFT designs, as long as the support
of β satisfies a mild minimum separation condition. It enjoys the property r = ρ*_γ(α) → ρ*_γ((1+γ)/2) = 0 as
α → (1+γ)/2. A detailed comparison between the detection boundary of EDFT designs and that of
Gaussian designs is also provided in Section 2.

1.4 Structure of the paper

The rest of the paper is organized as follows. In Section 2, we develop theoretical results for the
proposed method: an explicit detection boundary r = ρ*_γ(α) is derived under a mild minimum
separation assumption, and the asymptotic optimality of OPHC is established. In Section 3, numerical
simulations illustrate the efficacy of our approach. In the implementation of OPHC, we
compare the performance of the choices q = O(N log N), q = N and q = p. Moreover, in the context
of detecting pulse streams, the proposed low-frequency partial DFT sensing matrix is compared to
a random-frequency partial DFT matrix. A summary of our main contributions, as well as some
future research directions, is given in Section 4. All the proofs are deferred to Section 5.

2 Theoretical results

In this section, we aim to establish a sharp tradeoff between the magnitude and number of the
nonzero components of β such that the OPHC test can successfully reject the null hypothesis
when the alternative is true. Under the alternative, we assume

supp(β) = {τ_1, . . . , τ_s},

where 1 ≤ τ_1 < . . . < τ_s ≤ p. This implies that the nonzero components of β are β_{τ_1}, . . . , β_{τ_s}. If we
denote

β̃_1 := β_{τ_1}, . . . , β̃_s := β_{τ_s},   (2.1)

then under the alternative, the s-sparse signal β is uniquely parameterized by (τ, β̃), where
τ = (τ_1, . . . , τ_s)^T and β̃ = (β̃_1, . . . , β̃_s)^T.

The distribution of the measurement y under the alternative is therefore parameterized by τ and
β̃, so we denote it by P_{(τ,β̃)}. Under the null, the distribution of y is i.i.d. standard complex normal,
which we denote by P_0.
Since N ≪ p and s ≪ N, throughout the paper we assume N = p^{1−γ} with fixed γ ∈ (0, 1).
Under the alternative, we assume s = p^{1−α} with (1+γ)/2 < α < 1. This assumption implies that
s < N^{1/2}. In [2], both the signs of the nonzero regression coefficients and the support of β are
assumed to be uniformly random. In contrast, we allow the phases of the nonzero components of
β to be arbitrary and even adversarial, and impose a mild minimum separation assumption on the
support of β in (1.4), which automatically holds as long as the support is uniformly random.

2.1 Minimum separation and random support conditions

If the frequencies of the periodic effects contained in y are concentrated, then it is likely that
the cumulative periodogram test can successfully reject the null; See [25]. In contrast, we
assume the distances between the indices of the nonzero components of β, i.e., τ_1, . . . , τ_s, satisfy
the following minimum separation condition:

∆(τ) := (1/p) min{ |τ_{l+1} − τ_l| : l = 1, . . . , s, τ_{s+1} := τ_1 + p } ≥ log²N / N.   (2.2)
A similar minimum separation condition appears in the context of super-resolution; See [22, 16].
This spacing condition holds asymptotically if the support is assumed to be random. Assume
that τ_1 ≤ . . . ≤ τ_s are the order statistics of independent uniformly distributed random variables
a_1, . . . , a_s in {1, . . . , p}. For any a, b ∈ {1, . . . , p}, define the distance

d(a, b) = min( |a − b|/p, 1 − |a − b|/p ).   (2.3)

It is evident that

∆(τ) = min_{1≤l_1<l_2≤s} d(a_{l_1}, a_{l_2}).

For any fixed l_1 < l_2 and any p_0 ∈ [0, 1], it is easy to see that P(d(a_{l_1}, a_{l_2}) < p_0) ≤ 2p_0 + 1/p. Since there
are s(s−1)/2 pairs, we have

P( min_{1≤l_1<l_2≤s} d(a_{l_1}, a_{l_2}) < p_0 ) ≤ Σ_{1≤l_1<l_2≤s} P( d(a_{l_1}, a_{l_2}) < p_0 ) ≤ s(s−1)p_0 + s(s−1)/(2p).

By letting p_0 = log²N/N, we obtain

P( ∆(τ) < log²N/N ) ≤ s² log²N / N + s²/p.

Recall that we assume the sparsity satisfies (1+γ)/2 < α < 1, where N = p^{1−γ} and s = p^{1−α}. Then
s² log²N / N + s²/p → 0 as p → ∞. Therefore, (2.2) holds with probability tending to 1. A simple
corollary is that, with probability approaching 1, all the indices a_1, . . . , a_s are distinct.
In brief, under the setup (1+γ)/2 < α < 1, the minimum separation condition is less restrictive
than the uniformly random support condition, as the sketch below illustrates numerically. In the next
section, a new detection boundary is established under this minimum separation condition.
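The bound above is easy to check empirically. The following Python/NumPy sketch (with illustrative values of p, N, s and the number of trials) draws independent uniform support indices and compares the empirical frequency of the event {∆(τ) < log²N/N} with the union bound s² log²N/N + s²/p derived above.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, s, trials = 1_000_000, 100_000, 5, 2_000   # illustrative sizes
thresh = np.log(N) ** 2 / N                      # log^2(N) / N

fails = 0
for _ in range(trials):
    a = rng.integers(1, p + 1, size=s)           # independent uniform indices in {1, ..., p}
    tau = np.sort(a)
    # wrap-around spacings; Delta(tau) is their minimum
    gaps = np.diff(np.append(tau, tau[0] + p)) / p
    if gaps.min() < thresh:
        fails += 1

bound = s**2 * np.log(N)**2 / N + s**2 / p       # the union bound from the text
print(fails / trials, bound)                     # empirical failure rate vs. bound
```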

2.2 Detection boundary

Recall that under the alternative, the distribution of the observation y is parameterized by (τ, β̃).
We further assume that²

(τ, β̃) ∈ Γ(p, N, s, r) := { |β̃_1| = . . . = |β̃_s| = A := √(rp log p / N),  ∆(τ) ≥ log²N / N }.   (2.4)

This implies that all nonzero components of β have the same amplitude, which is a common
assumption in the literature on minimax detection boundaries; See, for example, [23, 33, 2].
We aim to derive a new minimax detection boundary r = ρ*_γ(α) such that the null can be
successfully rejected asymptotically if r > ρ*_γ(α), while all testing methods fail to reject the
null asymptotically if r < ρ*_γ(α). The detection boundary for (α, r) is established in this section
when the minimum separation condition (2.2) holds and the sparsity level satisfies (1+γ)/2 < α < 1.
Recall that the OPHC test defined by (1.7)–(1.10) is determined by the interval [a, b], the
specific choice of q = O(N log N), and the threshold T. In our theoretical analysis, we choose
q = N⌊log N + 1⌋ and [a, b] = [1, √(log(N/3))], by which the OPHC test statistic is defined as

HC* = sup_{1 ≤ t ≤ √(log(N/3))} HC(t).

Finally, we choose the threshold T = log²N and define the OPHC test as

Ψ = I(HC* > log²N).   (2.5)

That is, the null hypothesis is rejected if and only if HC* > log²N. This threshold is often too
conservative in practice, and a more reliable and useful threshold for finite samples can be chosen
by Monte Carlo simulations, as we discuss in Section 3 and sketch below.
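A minimal sketch of such a Monte Carlo calibration is given here; it reuses the `ophc_statistic` helper sketched in Section 1.1 (that helper, the number of trials, and the seed are assumptions of this illustration rather than specifications of the paper).

```python
import numpy as np

def null_threshold(N, level=0.05, trials=2_000, seed=0):
    """Monte Carlo (1 - level)-quantile of HC* under H0: y is complex white noise."""
    rng = np.random.default_rng(seed)
    stats = np.empty(trials)
    for b in range(trials):
        # standard complex white noise of length N
        y = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        stats[b] = ophc_statistic(y)      # helper sketched in Section 1.1
    return np.quantile(stats, 1.0 - level)
```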
Our first theorem is on the detectable region of (α, r), in which the null can be successfully
rejected asymptotically:

Theorem 2.1  In the measurement model (1.4), suppose N = p^{1−γ} with γ ∈ [0, 1). Under the
alternative we assume s = p^{1−α} with (1+γ)/2 < α < 1, and (τ, β̃) satisfies (2.4) with parameter r. If
r > ρ*_γ(α) with

ρ*_γ(α) = (√(1−γ) − √(1−α))²  for α ∈ [3/4 + γ/4, 1),   and   ρ*_γ(α) = α − 1/2 − γ/2  for α ∈ ((1+γ)/2, 3/4 + γ/4],

then the OPHC test defined by (1.7)–(1.10) with q = N⌊log N + 1⌋ and [a, b] = [1, √(log(N/3))] is
asymptotically powerful:

lim_{N→∞} ( P_0(H_0 is rejected) + max_{(τ,β̃)∈Γ(p,N,s,r)} P_{(τ,β̃)}(H_0 is accepted) ) = 0.

² As discussed in Section 1.3, it is assumed that A = √(2rp log p / N) in the literature on global detection boundaries under linear models. The difference of √2 stems from the difference between real-valued and complex-valued sequences.

The most significant technical novelty of our paper lies in the proof of Theorem 2.1. In the
analysis of HC* under the alternative, the mean and covariance structure of v, which is defined
in (1.8), requires more careful calculation than in existing work, e.g., [2, 33]. In particular, the
estimation of E(v_1), . . . , E(v_q) and the control of Cov(1_{|v_a|>t}, 1_{|v_b|>t}) require a careful case-by-case
treatment. In most calculations of the statistical properties of v, the special structure of
the extended DFT design matrix X needs to be employed. As for the null, the analysis of the
HC* statistic is related to that for the simple model X = I as in [23], but here a non-asymptotic
concentration inequality is required, which is stated in Lemma 5.4.
The following theorem establishes the lower bound for the testing problem.

Theorem 2.2  Under the same setting as Theorem 2.1, if r < ρ*_γ(α), then all sequences of hypothesis
tests are asymptotically powerless, i.e.,

lim_{N→∞} ( P_0(H_0 is rejected) + max_{(τ,β̃)∈Γ(p,N,s,r)} P_{(τ,β̃)}(H_0 is accepted) ) = 1.

Theorems 2.1 and 2.2 together show that the proposed test is asymptotically optimal.
The proof of Theorem 2.2 is relatively easy. In fact, by taking advantage of the specific structure
of the design matrix X, the argument can be reduced to the case in which the design matrix equals
the identity. The classic lower bound arguments in [36, 23, 33] can then be directly applied.

Figure 1: Detection boundary functions ρ*(α) (red) and ρ*_γ(α) (green) for γ = 0.3 and (1+γ)/2 < α < 1.

We now compare ρ*_γ with the detection boundary ρ* associated with the Gaussian designs
established in [2]. As indicated in Section 1.3, after normalizing the rows of the Gaussian design,
the magnitude parameter is A = √(2rp log p / N). Notice that in our model the magnitude
parameter is A = √(rp log p / N), and the difference of √2 is due to the distinction between real-valued
and complex-valued models. Therefore, it is fair to compare ρ* and ρ*_γ directly. It is obvious that
ρ*_γ(α) < ρ*(α) for all (1+γ)/2 < α < 1 as long as γ > 0. This implies that the detection boundary
associated with the extended DFT design matrix leads to a milder trade-off between the number and
the strength of the nonzero components of β than that of Gaussian designs, as long as (1+γ)/2 < α < 1.
To illustrate their differences, the two detection boundary functions are plotted together in Figure
1 for γ = 0.3.
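For a concrete comparison, both boundary functions can be evaluated directly. The Python/NumPy sketch below encodes (1.13) and the ρ*_γ of Theorem 2.1 and checks numerically that ρ*_γ(α) < ρ*(α) on a grid, with γ = 0.3 as in Figure 1; the grid resolution is an arbitrary choice of the sketch.

```python
import numpy as np

def rho_star(alpha):
    """Detection boundary (1.13) for the identity/Gaussian designs, 1/2 < alpha < 1."""
    return np.where(alpha >= 0.75, (1 - np.sqrt(1 - alpha)) ** 2, alpha - 0.5)

def rho_star_gamma(alpha, gamma):
    """Detection boundary of Theorem 2.1 for the EDFT design, (1+gamma)/2 < alpha < 1."""
    knee = 0.75 + gamma / 4.0
    return np.where(alpha >= knee,
                    (np.sqrt(1 - gamma) - np.sqrt(1 - alpha)) ** 2,
                    alpha - 0.5 - gamma / 2.0)

gamma = 0.3
alpha = np.linspace((1 + gamma) / 2 + 1e-3, 1 - 1e-3, 200)
assert np.all(rho_star_gamma(alpha, gamma) < rho_star(alpha))   # EDFT boundary is lower
```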

3 Numerical Simulations

In this section, we study the empirical behavior of the OPHC test by numerical simulations. Recall
that in Section 2 we chose [a, b] = [1, √(log(N/3))] in (1.10) in order to establish the minimax detection
boundary. However, to implement higher criticism tests in practice, it is more convenient to express
the statistic as a function of P_(1) ≤ P_(2) ≤ . . . ≤ P_(q), the P-values of the order statistics
of |v_1|, . . . , |v_q| defined in (1.8). In other words, suppose |v_(1)| ≥ . . . ≥ |v_(q)| are the order statistics
of |v_1|, . . . , |v_q| and let P_(m) := Ψ̄(|v_(m)|) = e^{−|v_(m)|²}. The HC* statistic in the following numerical
simulations is defined as

HC* = max_{m: 1/q ≤ P_(m) < 1/2}  ( m − q P_(m) ) / √( q P_(m)(1 − P_(m)) ),   (3.1)

which is equivalent to choosing [a, b] = [√(log 2), √(log q)] in (1.10).
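A compact sketch of (3.1) in Python/NumPy is given below; `v` stands for the vector of over-sampled periodogram coefficients from (1.8), and the function is an illustration rather than the exact code used for the experiments in this section.

```python
import numpy as np

def hc_star_from_pvalues(v):
    """HC* of (3.1) computed from the coefficients v = Uy in (1.8)."""
    q = len(v)
    pvals = np.sort(np.exp(-np.abs(v) ** 2))      # P_(1) <= ... <= P_(q)
    m = np.arange(1, q + 1)
    keep = (pvals >= 1.0 / q) & (pvals < 0.5)     # restrict the range of m as in (3.1)
    pk, mk = pvals[keep], m[keep]
    return ((mk - q * pk) / np.sqrt(q * pk * (1.0 - pk))).max()
```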

3.1 Comparison with other choices of over-sampling rates

In this section we compare the empirical testing powers of the OPHC test with various choices of
q.

Figure 2: Empirical distribution of the OPHC test statistic under the null is plotted in the upper panel,
where N = 1000 and q = 14000. Under the mixed alternative, we let p = 1, 000, 000, s = 30 and r = 0.3.
Empirical distribution of the OPHC test statistic is plotted in the middle panel with known σ = 1, and in
the lower panel with estimated variance of noise.

First, let N = 1000 and q = 2N⌊log N + 1⌋ = 14,000. The noise level is assumed to be σ = 1.
Then the empirical distribution of the OPHC test statistic HC ∗ under the null can be derived by
Monte Carlo simulation with 1, 000 independent trials, which is shown in the upper panel of Figure
2.

Under the alternative, we assume that p = 1,000,000 and the sparsity of β is s = 20. The
support of β is distributed uniformly at random, and the phases of the nonzero entries of β are
uniformly distributed on [0, 2π). All magnitudes of the nonzero components of β are fixed as
A = √(rp log p / N) with r = 0.3.
We first assume that the variance σ² = 1 is known. The resulting empirical distribution of HC*
under the mixed alternative, based on 1000 independent trials, is plotted in the middle panel of Figure 2.
In 956 of these trials the empirical P-values are smaller than 0.05, so the periodicities are
successfully detected.
We now discuss the case where the variance σ² = 1 is unknown. Since it is necessary to make sure
that the estimator of the variance is consistent under the null, we use the mean square of |y_t| as
the estimate. The resulting empirical distribution of HC* is plotted in the lower panel of Figure
2. Among the 1000 trials, there are 745 with empirical P-values smaller than 0.05.

Figure 3: Empirical distribution of the SPHC test statistic under the null is plotted in the upper panel,
where N = 1000 and q = 1000. Under the mixed alternative, we let p = 1, 000, 000, s = 30 and r = 0.3.
Empirical distribution of the SPHC test statistic is plotted in the middle panel with known σ = 1, and in
the lower panel with estimated variance of noise.

Next we consider the OPHC test with q = N = 1000. We refer to this test as the standard periodogram
higher criticism (SPHC) test, since it is constructed from the standard periodogram.
The empirical distribution of the SPHC test statistic under the null is plotted in the upper panel
of Figure 3. The setup of the mixed alternative is the same as in the experiments for the OPHC
test described above. Suppose the variance σ² = 1 is known. The distribution of the SPHC test
statistic under the alternative, based on 1000 trials, is plotted in the middle panel of Figure 3. In 867
of these trials the empirical P-values are smaller than 0.05. This is worse than the OPHC method with
q = 14000, where the periodicities are successfully detected in 956 trials. When the variance
of the noise is unknown, we still estimate it by the mean square of |y_t|, so that the estimator is
consistent under the null. The resulting empirical distribution of the SPHC test statistic is plotted
in the lower panel of Figure 3, based on 1000 independent trials. In only 478 of these trials are
the empirical P-values smaller than 0.05. This is also worse than OPHC, where the periodicities
are successfully detected in 745 independent trials.
Finally, we assume p is known and consider the OPHC test with q = p = 1,000,000. In this

Figure 4: Empirical distribution of the OPHC test statistic under the null is plotted in the upper panel,
where N = 1000 and q = 1, 000, 000. Under the mixed alternative, we let p = 1, 000, 000, s = 30 and r = 0.3.
Empirical distribution of this test statistic is plotted in the middle panel with known σ = 1, and in the lower
panel with estimated variance of noise.

case the OPHC test coincides with the method proposed in [2]. The empirical distribution of this
test statistic under the null, based on 1000 independent trials, is plotted in the upper panel of Figure 4.
The setup of the mixed alternative is the same as before. When the variance σ² = 1 is known, the
distribution of this test statistic under the alternative, based on 1000 trials, is plotted in the middle
panel of Figure 4. In 949 of these trials the empirical P-values are smaller than 0.05, which is slightly
worse than the OPHC method with q = 14000 mentioned above. When the variance of the
noise is unknown and is estimated by the mean square of |y_t|, the resulting empirical distribution
of this test statistic, based on 1000 independent trials, is plotted in the lower panel of Figure 4. In
741 of these trials the empirical P-values are smaller than 0.05, which is also slightly worse
than the OPHC method with q = 14000. Although the statistical efficiency for q = p is almost the
same as that for q = O(N log N), the computational complexity of the OPHC method with q = p
is much greater than with q = O(N log N).

3.2 Comparison between the low-frequency and random frequency DFT sensing
matrices

In the application of detection of pulse streams using Fourier measurements as discussed in Section
1.2, the model of observation (1.4) is derived by choosing the sampled frequencies 0, 1, . . . , N − 1.
The sensing matrix X amounts to the N ×p low-frequency DFT matrix. However, in the application
of compressed sensing where one needs to estimate the signal accurately, the frequencies of the
Fourier measurements are usually chosen randomly in a large range, by which the sensing matrix
X amounts to an N × p random partial DFT matrix [17]. In this section we compare the statistical
properties of these two different sensing matrices in terms of global testing.
It is not reasonable to apply the OPHC test directly when the sensing matrix X is a random
partial DFT matrix, since the mean Xβ does not consist of equally-spaced samples of a superposition
of several complex sinusoids. Instead, we use the method proposed in [2]. To be specific,
we suppose p is known and X is therefore available, and define v = Uy ∈ C^p with U = √(p/N) X*.
Figure 5: Under the random-frequency design, empirical distribution of the HC-ACP test statistic under
the null is plotted in the upper panel, where N = 1000. Under the mixed alternative, we let p = 1, 000, 000,
s = 30 and r = 0.3. Empirical distribution of this test statistic is plotted in the middle panel with known
σ = 1, and in the lower panel with estimated variance of noise.

Under the null, straightforward calculations imply that all components of v are standard complex
Gaussian, so we can calculate the P-values of |v_1|, . . . , |v_p|. We then define the higher
criticism test statistic as in (3.1) with q = p. This test is referred to as HC-ACP throughout this
section.
Under the null and with the knowledge of σ = 1, the empirical distribution of the HC-ACP test
statistic is plotted in the upper panel of Figure 5. The mixed alternative is defined in the same
way as in Section 3.1, where p = 1,000,000, N = 1000, s = 20, r = 0.3 and σ² = 1. When the
variance σ² = 1 is known, the empirical distribution of the HC-ACP test statistic, based on 1000 trials,
is plotted in the middle panel of Figure 5. In 851 of these trials the empirical P-values are less
than 0.05. In contrast, we have shown in the previous section that by using the OPHC test with
the low-frequency DFT sensing matrix, the pulse streams are successfully detected in 956 trials.
We also consider the case when σ² = 1 is unknown, and an estimate is given by the mean
square of |y_j|. This estimate is valid since it is consistent under the null hypothesis. The empirical
distribution of the HC-ACP test statistic based on this estimate and 1000 trials is plotted in the lower
panel of Figure 5. None of these trials yields a P-value less than 0.05. It turns out that under the
random-frequency design, the distributions of HC-ACP based on the true and estimated variances
behave very differently, as shown in the middle and lower panels of Figure 5.

4 Discussion

Motivated by periodicity detection in complex-valued time series analysis, we investigated the global
hypothesis testing problem (1.6) under the linear model (1.4), where the frequencies of the hidden
periodicities are not restricted to the Fourier frequencies located on the grid. The OPHC test,
a higher criticism test applied to the periodogram over-sampled by a factor of O(log N), is proposed to solve
this hypothesis testing problem.
We investigated the theoretical properties of this testing method. By assuming that the frequencies
of the periodicities hidden in the complex series satisfy a minimum separation condition,
a detection boundary between the magnitude and number of the complex sinusoids is explicitly established.
Perhaps surprisingly, the new detection boundary associated with the extended discrete
Fourier transform (EDFT) design matrix X in (1.4) can be much lower than that of Gaussian
design matrices, given that the frequencies satisfy the mild minimum separation condition. In fact,
the value of p is allowed to be infinite by slightly modifying our analytical argument; for
ease of exposition, however, we employ the assumption that p is finite but unknown.
Numerical simulations show that our method can successfully detect sparse and weak periodicities.
Our experiments also validate the choice q = O(N log N) by comparison with the choice
q = N, i.e., the standard periodogram higher criticism, and q = p, i.e., an excessively over-sampled
periodogram. In addition, motivated by the model of compressed detection of pulse streams
using Fourier measurements, the OPHC test associated with low-frequency measurements is
shown empirically to be more powerful than the testing method proposed in [2] applied to
random-frequency measurements.
The hypothesis testing problem considered in this paper is related to a number of other in-
teresting problems that merit further study. We briefly discuss here several directions for future
research.

• Statistical estimation of frequencies. A related important statistical problem is to estimate
  the frequencies of the periodicities under the observation model (1.4). This also amounts
  to estimating the positions of the moving targets in the model of pulse stream detection.
  Statistical estimation of frequencies under (1.4) has a long history with a rich literature. Parametric
  methods date back to Prony in 1795 [40] and were later developed in [44, 12, 41, 35], to
  name a few. Other methods for estimating these frequencies include sparse recovery methods,
  e.g., [19, 34, 26, 24], and total-variation minimization [16, 15] in the context of super-resolution.
  However, computationally feasible estimation with the minimax convergence rate has not been
  fully established. It is therefore interesting to develop estimators of the frequencies that are both
  computationally efficient and statistically optimal.

• Sinusoidal denoising. Compared to frequency estimation, denoising, i.e., estimation of the


mean Xβ in (1.4), is a conceptually easier statistical task. In the recent papers [9, 47], SDP
methods are proposed for this purpose, and some nearly-optimal statistical properties are
established. One open question is to establish the exact statistical optimality for denoising
with computationally fast methods.

• Testing for periodicity in real-valued series. In the literature on testing for periodicity
  in real-valued time series, the hidden frequencies are usually assumed to lie on the grid
  (2π/N, 4π/N, . . . , 2nπ/N), where N = 2n + 1; See the discussions in Section 1.1. As an analogy to
  (1.4), a more general periodicity detection model in real-valued time series analysis can be
  formulated as (1.1) with

  ζ_t = a_0 + Σ_{l=1}^{s} [ a_l cos(θ_l t) + b_l sin(θ_l t) ].

  Here 0 < θ_1 < . . . < θ_s ≤ π, and 0, θ_1, . . . , θ_s satisfy some minimum separation condition.
  Define R_l² = a_l² + b_l² for l = 1, . . . , s. The detection problem amounts to

  H_0: R_1² = . . . = R_s² = 0   versus   H_1: R_1² + . . . + R_s² > 0

  without knowledge of s. This hypothesis testing formulation can be viewed as an analogue of (1.6)
  for the real case. In fact, when the variance of the noise is known, the OPHC test
  defined by (1.7), (1.8), (1.9) and (1.10) can be applied to the real sequence u_t by the idea of
  complexification, i.e., transforming u_t into y_t = u_t + iu_{t+n} for t = 1, . . . , n (a short sketch of this
  step is given after this list). Consequently, the mean of y_t amounts to a superposition of complex
  sinusoids, and the noise part of y_t is a sequence of complex white noise. The hypothesis test is then
  reduced to (1.4), to which OPHC can be applied.
  However, the optimality of complexification is unknown, and it is more natural to over-sample
  the periodogram of the real series u_t directly and define higher criticism tests based on it. Unlike
  the complex case, the over-sampled periodogram of a real series is far more complicated, and it is
  usually defined in the form of the Lomb-Scargle periodogram; See [38, 43].
  The statistical behavior of such an over-sampled periodogram is more difficult to analyze than
  that of (1.8) defined for the complex-valued series. We leave the derivation and analysis of
  over-sampled Lomb-Scargle periodogram based tests for real-valued time series as future work.
• Statistical inference on the periodicity without knowledge of the noise level σ.
  In the OPHC test, the noise level σ needs to be known in order to calculate the P-values of
  the components of the over-sampled periodogram. However, in practice the variance of the noise
  is usually unknown. In Section 3, the variance of the noise is estimated by the mean square of
  the observed sequence, which is consistent when the null is true, and the resulting procedure
  works well. However, for other statistical problems, such as frequency estimation or sinusoidal
  denoising, the variance needs to be estimated accurately. Minimax rates of convergence for
  various statistical inference tasks without knowledge of σ constitute another interesting research topic.
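As mentioned in the bullet on real-valued series above, the complexification step is a one-line transformation. The following Python/NumPy sketch illustrates it on a synthetic real series with a single off-grid periodicity; the frequency, amplitudes, and length are illustrative values chosen for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
N = 2 * n + 1
t = np.arange(1, N + 1)

# real series (1.1) with one off-grid periodicity; parameters are illustrative
theta, a1, b1 = 0.3456, 0.4, 0.2
u = a1 * np.cos(theta * t) + b1 * np.sin(theta * t) + rng.standard_normal(N)

# complexification: y_t = u_t + i * u_{t+n}, t = 1, ..., n
y = u[:n] + 1j * u[n:2 * n]
# y is a complex series of length n whose mean is a superposition of complex
# sinusoids, so the OPHC machinery of Section 1.1 can be applied to it
```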

5 Proofs

This section is dedicated to the proofs of Theorems 2.1 and 2.2. We begin by collecting a few
technical tools that will be used in the proof of the main results.

5.1 Preliminaries

First, we formally introduce the concept of complex-valued multivariate normal distribution.

Definition 5.1  We say z = x + iy ∈ C^n is an n-dimensional complex-valued multivariate normal
vector with distribution CN(µ, Γ, Ω) if (x^T, y^T)^T is a 2n-dimensional real-valued normal vector and
z satisfies

E z = µ,  E((z − µ)(z − µ)*) = Γ,  E((z − µ)(z − µ)^T) = Ω.

Here X* denotes the complex conjugate transpose, while X^T denotes the ordinary transpose. Moreover,
we say a complex-valued multivariate normal vector z is standard if

z ∼ CN(0, I_n, 0).

The following lemma gives the tail distribution of the standard complex normal variable, which
turns out to be much neater than that of real normal variables.

Lemma 5.2  Suppose z ∼ CN(0, 1, 0) is the standard circularly-symmetric complex normal variable.
Then for any t ≥ 0,

Ψ̄(t) := P(|z| ≥ t) = e^{−t²}.

In addition, if µ ∈ C is fixed, we have

( C_0 / (1 + (t − |µ|)_+) ) e^{−(t−|µ|)_+²} ≤ P(|µ + z| > t) ≤ e^{−(t−|µ|)_+²}   (5.1)

for some positive numerical constant C_0. Here x_+ := max(x, 0) for x ∈ R, and x_+² is short for (x_+)².

Proof  Since the density of z is (1/π) e^{−|z|²}, we have

Ψ̄(t) = ∫_{|z|>t} (1/π) e^{−|z|²} dA,

where dA is the Lebesgue measure on the z-plane; in other words, if z = x + iy, then dA = dx dy.
By applying the polar coordinates z = re^{iθ}, we have

Ψ̄(t) = ∫_0^{2π} ∫_t^∞ (1/π) e^{−r²} r dr dθ = ∫_t^∞ 2r e^{−r²} dr = e^{−t²}.

To prove (5.1), define u ∈ C satisfying |u| = 1 and ūµ = |µ|. This unit complex scalar always exists,
since we can let u = µ/|µ| when µ ≠ 0, and take any unit scalar when µ = 0. Notice that

Re(ūz) > t − |µ|  ⟹  Re(ūz) + |µ| > t  ⟹  Re(ū(z + µ)) > t  ⟹  |z + µ| > t,

and hence

P(|z + µ| > t) ≥ P(Re(ūz) > t − |µ|).

Since

z ∼ CN(0, 1, 0)  ⟹  ūz ∼ CN(0, 1, 0)  ⟹  Re(ūz) ∼ N(0, 1/2),

the tail probability of the standard real-valued normal variable gives

P(Re(ūz) > t − |µ|) ≥ ( C_0 / (1 + (t − |µ|)_+) ) e^{−(t−|µ|)_+²}.

Moreover,

P(|µ + z| > t) ≤ P(|z| > t − |µ|) ≤ e^{−(t−|µ|)_+²}.

Under the alternative hypothesis, we need to derive an upper bound on the variance of HC(t) for
fixed t, for which the following lemma will be applied several times. The argument is standard
in the literature on normal comparison inequalities; See, for example, [39, 37].
 
Lemma 5.3  Suppose (w_1, w_2)^T ∼ CN(0, Γ, 0) is a 2-dimensional complex normal vector, where
Γ = [[1, ξ], [ξ̄, 1]].

1. For any fixed a_1, a_2 ∈ C and t > 0, there holds

Cov(1_{|w_1−a_1|>t}, 1_{|w_2−a_2|>t}) ≤ min( e^{−(t−|a_1|)_+²}, e^{−(t−|a_2|)_+²} ).

2. If |ξ| ≤ 1/2, we obtain

Cov(1_{|w_1−a_1|>t}, 1_{|w_2−a_2|>t}) ≤ C_0 |ξ| exp( −( (t − |a_1|)_+² + (t − |a_2|)_+² ) / (1 + |ξ|) ) (1 + (t − |a_1|)_+)(1 + (t − |a_2|)_+)

for some positive numerical constant C_0.

Proof  Simple calculation yields

Cov(1_{|w_1−a_1|>t}, 1_{|w_2−a_2|>t}) = P(|w_1−a_1|>t, |w_2−a_2|>t) − P(|w_1−a_1|>t) P(|w_2−a_2|>t).   (5.2)

This implies

Cov(1_{|w_1−a_1|>t}, 1_{|w_2−a_2|>t}) ≤ P(|w_1−a_1|>t, |w_2−a_2|>t)
  ≤ min( P(|w_1−a_1|>t), P(|w_2−a_2|>t) )
  ≤ min( P(|w_1|>(t−|a_1|)_+), P(|w_2|>(t−|a_2|)_+) )
  = min( e^{−(t−|a_1|)_+²}, e^{−(t−|a_2|)_+²} ).

When |ξ| ≤ 1/2, let Γ_h = [[1, hξ], [hξ̄, 1]], 0 ≤ h ≤ 1. Define a random vector (Z_1, Z_2)^T ∼ CN(0, Γ_h, 0) and a function of h,

F(h) = P(|Z_1 − a_1| > t, |Z_2 − a_2| > t).

Here t > 0 is a fixed parameter. By the Newton-Leibniz theorem, we have

Cov(1_{|w_1−a_1|>t}, 1_{|w_2−a_2|>t}) = F(1) − F(0) = ∫_0^1 F′(h) dh.

It suffices to give an upper bound on F′(h) for all 0 < h < 1. By the density function of z = (Z_1, Z_2)^T, we have the explicit formula

F(h) = ∫∫_{|z_1−a_1|>t, |z_2−a_2|>t} ( 1 / (π² det(Γ_h)) ) exp( −z* Γ_h^{−1} z ) dA_2 dA_1,   z = (z_1, z_2)^T.

Here dA_1 and dA_2 are the Lebesgue measures on the z_1-plane and z_2-plane, respectively; that is, if
z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2, then dA_1 = dx_1 dy_1 and dA_2 = dx_2 dy_2. Simple calculation in
linear algebra gives det(Γ_h) = 1 − h²|ξ|² and Γ_h^{−1} = (1/(1 − h²|ξ|²)) [[1, −hξ], [−hξ̄, 1]], which implies

F(h) = ∫∫_{|z_1−a_1|>t, |z_2−a_2|>t} ( 1 / (π²(1 − h²|ξ|²)) ) exp( −( |z_1|² + |z_2|² − 2h Re(ξ z̄_1 z_2) ) / (1 − h²|ξ|²) ) dA_2 dA_1.

By changing the order of differentiation and integration, we have

F′(h) = ∫∫_{|z_1−a_1|>t, |z_2−a_2|>t} exp( −( |z_1|² + |z_2|² − 2h Re(ξ z̄_1 z_2) ) / (1 − h²|ξ|²) )
        × [ 2h|ξ|² / (π²(1 − h²|ξ|²)²) + ( 2 Re(ξ z̄_1 z_2) + (2h² Re(ξ z̄_1 z_2) − 2(|z_1|² + |z_2|²)h)|ξ|² ) / (π²(1 − h²|ξ|²)³) ] dA_2 dA_1
      ≤ C |ξ| ∫∫_{|z_1−a_1|>t, |z_2−a_2|>t} (1 + |z_1||z_2|) exp( −( |z_1|² + |z_2|² − 2h Re(ξ z̄_1 z_2) ) / (1 − h²|ξ|²) ) dA_2 dA_1,

where C is a numerical constant. The last inequality is due to 0 ≤ h ≤ 1 and |ξ| ≤ 1/2. Notice that

( |z_1|² + |z_2|² − 2h Re(ξ z̄_1 z_2) ) / (1 − h²|ξ|²) ≥ ( |z_1|² + |z_2|² − 2h|ξ||z_1||z_2| ) / ( (1 − h|ξ|)(1 + h|ξ|) ) ≥ ( |z_1|² + |z_2|² ) / (1 + h|ξ|).

This is due to the fact that (|z_1|² + |z_2|² − 2ρ|z_1||z_2|)/(1 − ρ) attains its minimum at ρ = 0. Therefore

F′(h) ≤ C|ξ| ∫∫_{|z_1−a_1|>t, |z_2−a_2|>t} (1 + |z_1||z_2|) exp( −( |z_1|² + |z_2|² ) / (1 + h|ξ|) ) dA_2 dA_1
      ≤ C|ξ| ∫∫_{|z_1|>(t−|a_1|)_+, |z_2|>(t−|a_2|)_+} (1 + |z_1||z_2|) exp( −( |z_1|² + |z_2|² ) / (1 + |ξ|) ) dA_2 dA_1.

Using the polar coordinates z_1 = r_1 e^{iθ_1} and z_2 = r_2 e^{iθ_2}, we have dA_1 = r_1 dr_1 dθ_1 and dA_2 = r_2 dr_2 dθ_2. Then

F′(h) ≤ 4π² C|ξ| ∫_{(t−|a_1|)_+}^{∞} ∫_{(t−|a_2|)_+}^{∞} (1 + r_1 r_2) exp( −(r_1² + r_2²)/(1 + |ξ|) ) r_1 r_2 dr_1 dr_2.

For any fixed u > 0, simple integration by parts yields

∫_u^∞ exp( −r²/(1 + |ξ|) ) r dr = ( (1 + |ξ|)/2 ) exp( −u²/(1 + |ξ|) ),

and

∫_u^∞ exp( −r²/(1 + |ξ|) ) r² dr = ( (1 + |ξ|)u/2 ) exp( −u²/(1 + |ξ|) ) + ( (1 + |ξ|)/2 ) ∫_u^∞ exp( −r²/(1 + |ξ|) ) dr ≤ C(1 + u) exp( −u²/(1 + |ξ|) ),

where the last inequality is due to the real Gaussian tail bound. These equalities and inequalities give

F′(h) ≤ C_0 |ξ| exp( −( (t − |a_1|)_+² + (t − |a_2|)_+² ) / (1 + |ξ|) ) (1 + (t − |a_1|)_+)(1 + (t − |a_2|)_+).

Integrating over [0, 1] proves the second claim.

To give a desirable upper bound on the OPHC test statistic when the null hypothesis is true, we
need to control the maximum of an empirical process defined by a sequence of dependent Bernoulli
random variables. A key observation is that these dependent Bernoulli random variables can
be divided into ⌊log N + 1⌋ groups, each of which contains only independent Bernoulli variables.
Since we need to give a uniform upper bound over all groups, the following non-asymptotic upper
bound for the maximum of empirical processes turns out to be essential in the analysis of the null.

Lemma 5.4  Suppose X_1, . . . , X_n ∼ Ber(ρ) are n i.i.d. Bernoulli random variables. Then

P[ sup_{3 log n / n ≤ ρ ≤ 1 − 3 log n / n}  | Σ_{j=1}^{n} X_j − nρ | / √( nρ(1 − ρ) ) > C_1 log² n ] ≤ 1/n³

provided n > N_0. Here C_1 and N_0 are positive numerical constants.

Proof This is a weak version of existing concentration inequalities in the literature of ratio type
empirical processes; See, for example, Theorem 2.1 and Example 2.7 in [31]. Similar results are
stated in [52] and [1].

Finally, there is a simple and useful result which we will use several times in the proofs.

Lemma 5.5  For any a, b ∈ [0, 1], define

d(a, b) = min( |a − b|, 1 − |a − b| ).

We have

4 d(a, b) ≤ |1 − e^{2πi(a−b)}| ≤ 2π d(a, b).

Proof  These inequalities can be obtained by comparing the lengths of arcs and chords in the unit circle.

5.2 Proof of Theorem 2.1

Suppose ε > 0 is an undetermined positive parameter, which will be specified later in order to
establish the detectable region. Denote L = ⌊log N + 1⌋ and hence q = NL. By the definition of v
in (1.8) and the definition of β̃ in (2.1), we have

y_j = (1/√p) Σ_{l=1}^{s} e^{−2πi(j−1)(τ_l−1)/p} β̃_l + z_j,  j = 1, . . . , N,

and

v_m = (1/√N) Σ_{j=1}^{N} e^{2πi(m−1)(j−1)/q} y_j
    = (1/√(Np)) Σ_{l=1}^{s} [ Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_l−1)/p )} ] β̃_l + (1/√N) Σ_{j=1}^{N} e^{2πi(m−1)(j−1)/q} z_j
    := θ_m + w_m,  m = 1, . . . , q.   (5.3)
It is obvious that θ_m is deterministic and (w_1, . . . , w_q) is a q-dimensional complex multivariate
normal vector. First, since the z_j ∼ CN(0, 1, 0) are independent, we know E z_j² = 0 and E z_j z̄_j = 1.
This implies E w_m² = 0 and E w_m w̄_m = 1, and therefore w_m ∼ CN(0, 1, 0). For any 1 ≤ m_1, m_2 ≤ q
with m_1 ≠ m_2, straightforward calculation gives E w_{m_1} w_{m_2} = 0 and

E(w_{m_1} w̄_{m_2}) = (1/N) Σ_{j=1}^{N} e^{2πi(m_1−m_2)(j−1)/q} = ( 1 − e^{2πiN(m_1−m_2)/q} ) / ( N( 1 − e^{2πi(m_1−m_2)/q} ) ).

This implies (w_{m_1}, w_{m_2})^T ∼ CN(0, [[1, ξ],[ξ̄, 1]], 0), where ξ = ( 1 − e^{2πiN(m_1−m_2)/q} ) / ( N( 1 − e^{2πi(m_1−m_2)/q} ) ).
For all m_1 ≠ m_2, by Lemma 5.5,

|ξ| ≤ 2 / ( N |1 − e^{2πi(m_1−m_2)/q}| ) ≤ 1 / ( 2N d( (m_1−1)/q, (m_2−1)/q ) ).   (5.4)

Furthermore, when (m_1 − m_2)/L = N(m_1 − m_2)/q is an integer, we have

ξ = ( 1 − e^{2πiN(m_1−m_2)/q} ) / ( N( 1 − e^{2πi(m_1−m_2)/q} ) ) = 0.   (5.5)

This implies that w_{m_1} and w_{m_2} are independent.

5.2.1 Lower bound of E HC(t) under the alternative

Step 1:
We choose m_1, . . . , m_s ∈ {1, . . . , q} such that (m_1−1)/q, (m_2−1)/q, . . . , (m_s−1)/q are closest to
(τ_1−1)/p, . . . , (τ_s−1)/p under the metric d, respectively. This implies that for each 1 ≤ l ≤ s,

d( (m_l−1)/q, (τ_l−1)/p ) ≤ 1/(2q) ≤ 1/(2N log N).   (5.6)
Since τ satisfies the minimum separation condition in (2.4), m_1, . . . , m_s must be distinct.
Therefore, for 1 ≤ ν ≤ s,

θ_{m_ν} = Σ_{l≠ν} (β̃_l/√(Np)) Σ_{j=1}^{N} e^{2πi(j−1)( (m_ν−1)/q − (τ_l−1)/p )} + (β̃_ν/√(Np)) Σ_{j=1}^{N} e^{2πi(j−1)( (m_ν−1)/q − (τ_ν−1)/p )} := S_1 + S_2.

First, as to S_1, we have

|S_1| = (1/√(Np)) | Σ_{l≠ν} β̃_l Σ_{j=1}^{N} e^{2πi(j−1)( (m_ν−1)/q − (τ_l−1)/p )} |
     = (1/√(Np)) | Σ_{l≠ν} β̃_l ( 1 − e^{2πiN( (m_ν−1)/q − (τ_l−1)/p )} ) / ( 1 − e^{2πi( (m_ν−1)/q − (τ_l−1)/p )} ) |
     ≤ (‖β‖_∞/√(Np)) Σ_{l≠ν} 2 / | 1 − e^{2πi( (m_ν−1)/q − (τ_l−1)/p )} |
     ≤ (‖β‖_∞/√(Np)) Σ_{l≠ν} 2 / ( 4 d( (m_ν−1)/q, (τ_l−1)/p ) ).

The last inequality is due to Lemma 5.5. Let us now bound Σ_{l≠ν} 2/( 4 d( (m_ν−1)/q, (τ_l−1)/p ) ). Since

d( (m_ν−1)/q, (τ_l−1)/p ) ≥ d( (τ_l−1)/p, (τ_ν−1)/p ) − d( (m_ν−1)/q, (τ_ν−1)/p )
                         ≥ min(|ν−l|, s−|ν−l|) ∆(τ) − 1/(2q)
                         ≥ min(|ν−l|, s−|ν−l|) ( ∆(τ) − 1/(2q) ),

we have

Σ_{l≠ν} 2 / ( 4 d( (m_ν−1)/q, (τ_l−1)/p ) ) ≤ Σ_{a=1}^{⌊s/2⌋} (1/a) · 1/( ∆(τ) − 1/(2q) ) ≤ log s / ( ∆(τ) − 1/(2q) ).

As a result,

|S_1| ≤ (‖β‖_∞/√(Np)) · log s / ( ∆(τ) − 1/(2q) ).

As to S_2, we have

| S_2 − √(N/p) β̃_ν | = | (β̃_ν/√(Np)) ( Σ_{j=1}^{N} e^{2πi(j−1)( (m_ν−1)/q − (τ_ν−1)/p )} − N ) |
                      ≤ (‖β‖_∞/√(Np)) Σ_{j=1}^{N} | e^{2πi(j−1)( (m_ν−1)/q − (τ_ν−1)/p )} − 1 |
                      = (‖β‖_∞/√(Np)) Σ_{j=1}^{N} | e^{2πi(j−1) d( (m_ν−1)/q, (τ_ν−1)/p )} − 1 |.

The last equality is due to the definition of d in Lemma 5.5. By (5.6), we have d( (m_ν−1)/q, (τ_ν−1)/p ) ≤ 1/(2q).
By Lemma 5.5, there holds

| e^{2πi(j−1) d( (m_ν−1)/q, (τ_ν−1)/p )} − 1 | ≤ 2π(j−1) d( (m_ν−1)/q, (τ_ν−1)/p ) ≤ (j−1)π/q.

This implies that

| S_2 − √(N/p) β̃_ν | ≤ (‖β‖_∞/√(Np)) · (π/q) · N(N−1)/2.

In summary,

| θ_{m_ν} − √(N/p) β̃_ν | ≤ (‖β‖_∞/√(Np)) ( log s/( ∆(τ) − 1/(2q) ) + πN(N−1)/(2q) )
                         ≤ (√(r log p)/N) ( log s / ( log²N/N − 1/(2N log N) ) + πN(N−1)/(2N log N) ).

Noticing N = p^{1−γ} and s = p^{1−α}, for any fixed ε > 0 we have

| θ_{m_ν} − √(N/p) β̃_ν | ≤ ε² √(r log p),  ν = 1, . . . , s,   (5.7)

provided p > C(γ, α, r, ε), a constant depending only on γ, α, r and ε.

Step 2:
Define D_l ⊂ {1, . . . , q} for 1 ≤ l ≤ s as

D_l = { m : 1 ≤ m ≤ q, d( (m−1)/q, (τ_l−1)/p ) < √(log N)/N },

and D = ∪_{1≤l≤s} D_l. Since ∆(τ) ≥ log²N/N > 2√(log N)/N, the sets D_1, . . . , D_s are disjoint subsets of {1, . . . , q}. For
each index m ∈ D_ν, ν = 1, . . . , s, since

θ_m = Σ_{l≠ν} (β̃_l/√(Np)) Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_l−1)/p )} + (β̃_ν/√(Np)) Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_ν−1)/p )},

we have

|θ_m| ≤ (|β̃_ν|/√(Np)) | Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_ν−1)/p )} | + Σ_{l≠ν} (|β̃_l|/√(Np)) | Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_l−1)/p )} |
      ≤ √(N/p) ‖β‖_∞ + (‖β‖_∞/√(Np)) Σ_{l≠ν} | 1 − e^{2πiN( (m−1)/q − (τ_l−1)/p )} | / | 1 − e^{2πi( (m−1)/q − (τ_l−1)/p )} |
      ≤ √(N/p) ‖β‖_∞ + (‖β‖_∞/√(Np)) Σ_{l≠ν} 1 / ( 2 d( (m−1)/q, (τ_l−1)/p ) ).

Notice that

d( (m−1)/q, (τ_l−1)/p ) ≥ d( (τ_ν−1)/p, (τ_l−1)/p ) − d( (m−1)/q, (τ_ν−1)/p )
                        ≥ min(|ν−l|, s−|ν−l|) ∆(τ) − √(log N)/N
                        ≥ min(|ν−l|, s−|ν−l|) ( ∆(τ) − √(log N)/N ).

This implies that

Σ_{l≠ν} 1 / ( 2 d( (m−1)/q, (τ_l−1)/p ) ) ≤ Σ_{a=1}^{⌊s/2⌋} (1/a) · 1/( ∆(τ) − √(log N)/N ) ≤ log s / ( ∆(τ) − √(log N)/N ).

In summary,

|θ_m| ≤ √(N/p) ‖β‖_∞ + (‖β‖_∞/√(Np)) · log s/( ∆(τ) − √(log N)/N )
      = √(r log p) ( 1 + log s/( N∆(τ) − √(log N) ) )
      ≤ √(r log p) ( 1 + log s/( log²N − √(log N) ) ).

Similarly, when p > C(γ, α, r, ε), we have

|θ_m| ≤ (1 + ε²) √(r log p),  ∀m ∈ D.   (5.8)

Step 3:
In this step we aim to give a uniform upper bound on θ_m for all m ∈ D^c. Straightforward calculation yields

|θ_m| = | Σ_{1≤l≤s} (β̃_l/√(Np)) Σ_{j=1}^{N} e^{2πi(j−1)( (m−1)/q − (τ_l−1)/p )} |
      ≤ (‖β‖_∞/√(Np)) Σ_{1≤l≤s} | 1 − e^{2πiN( (m−1)/q − (τ_l−1)/p )} | / | 1 − e^{2πi( (m−1)/q − (τ_l−1)/p )} |
      ≤ (‖β‖_∞/√(Np)) Σ_{1≤l≤s} 1 / ( 2 d( (m−1)/q, (τ_l−1)/p ) ).

We now aim to bound Σ_{1≤l≤s} 1/( 2 d( (m−1)/q, (τ_l−1)/p ) ). We consider the position of (m−1)/q on
T = [0, 1]/{0 ∼ 1} relative to (τ_1−1)/p, . . . , (τ_s−1)/p. Suppose that on T, (m−1)/q is located between
(τ_j−1)/p and (τ_{j+1}−1)/p (recall that τ_{s+1} = τ_1). Since m ∈ D^c, we have
d( (m−1)/q, (τ_j−1)/p ) ≥ √(log N)/N and d( (m−1)/q, (τ_{j+1}−1)/p ) ≥ √(log N)/N. The next adjacent
location parameters τ_{j−1} and τ_{j+2} satisfy d( (m−1)/q, (τ_{j−1}−1)/p ) ≥ ∆(τ) + √(log N)/N and
d( (m−1)/q, (τ_{j+2}−1)/p ) ≥ ∆(τ) + √(log N)/N, and so on. Then we have

Σ_{1≤l≤s} 1/( 2 d( (m−1)/q, (τ_l−1)/p ) ) ≤ Σ_{a=0}^{⌊(s−1)/2⌋} 1/( a∆(τ) + √(log N)/N ) ≤ N/√(log N) + log s/∆(τ).

In summary,

|θ_m| ≤ (‖β‖_∞/√(Np)) ( N/√(log N) + log s/∆(τ) ) = √(r log p) ( 1/√(log N) + log s/(N∆(τ)) ) ≤ √(r log p) ( 1/√(log N) + log s/log²N ).

Similarly, when p > C(γ, α, r, ε), we have

|θ_m| ≤ ε² √(r log p),  ∀m ∈ D^c.   (5.9)

Step 4:
We are now ready to derive a lower bound of E HC(t). Recall that
q
P
1{|vm |>t} − q Ψ̄(t)
m=1
HC(t) = p . (5.10)
q Ψ̄(t)(1 − Ψ̄(t))

By Lemma 5.2, we have


Ps
l=1 P(|θ m + wml | > t) − sΨ̄(t)
E HC(t) ≥ p l
q Ψ̄(t)(1 − Ψ̄(t))
Ps C0 −(t−|θml |)2+ 2
l=1 1+t e − se−t
≥ p .
q Ψ̄(t)(1 − Ψ̄(t)

By (5.7), we have p
min |θml | ≥ (1 − 2 ) r log p,
1≤l≤s

which implies  √ 
C0 −(t−(1−2 ) r log p)2+ 2
s 1+t e − e−t
E HC(t) ≥ p .
q Ψ̄(t)(1 − Ψ̄(t)

Letting $t = \sqrt{\mu\log p}$, by $s = p^{1-\alpha}$, $N = p^{1-\gamma}$ and $q = N\lfloor \log N + 1\rfloor$, there holds
$$\mathbb{E}\, HC(\sqrt{\mu\log p}) \ge \frac{p^{\frac12 - \alpha + \frac{\gamma}{2} + \frac{\mu}{2} - \left(\sqrt{\mu} - (1-\epsilon^2)\sqrt{r}\right)_+^2}}{\mathrm{polylog}(p)}, \tag{5.11}$$
where $\mathrm{polylog}(p)$ denotes a polynomial in $\log p$.
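The exponent in (5.11) is pure power counting; spelled out, keeping only the dominant term and absorbing $C_0$, $1+t$ and $1-\bar\Psi(t)$ into $\mathrm{polylog}(p)$:
$$\frac{s\, e^{-\left(t-(1-\epsilon^2)\sqrt{r\log p}\right)_+^2}}{\sqrt{q\,\bar\Psi(t)}} \asymp \frac{p^{1-\alpha}\, p^{-\left(\sqrt{\mu}-(1-\epsilon^2)\sqrt{r}\right)_+^2}}{\sqrt{N\lfloor\log N + 1\rfloor\, p^{-\mu}}} = \frac{p^{\frac12-\alpha+\frac{\gamma}{2}+\frac{\mu}{2}-\left(\sqrt{\mu}-(1-\epsilon^2)\sqrt{r}\right)_+^2}}{\mathrm{polylog}(p)},$$
using $\bar\Psi(\sqrt{\mu\log p}) = p^{-\mu}$, $e^{-\left(\sqrt{\mu\log p}-c\sqrt{\log p}\right)_+^2} = p^{-(\sqrt{\mu}-c)_+^2}$, $s = p^{1-\alpha}$ and $N = p^{1-\gamma}$.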

5.2.2 Upper bound of Var(HC(t)) under the alternative

By (5.8) and (5.9), we have
$$\max_{1\le m\le q} |\theta_m| \le (1+\epsilon^2)\sqrt{r\log p},$$
and
$$\max_{m\in D^c} |\theta_m| \le \epsilon^2\sqrt{r\log p}.$$

By the definition of $HC(t)$ as in (5.10), a simple calculation yields
$$\mathrm{Var}\, HC(t) = \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{1\le a,b\le q} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right).$$
By equation (5.4), when $d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac{1}{N}$, we have
$$\big|\mathbb{E}(w_a \bar w_b)\big| \le \frac{1}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)} \le \frac12.$$

Then, Lemma 5.3 implies
$$\mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le \frac{C_0 \exp\!\left( -\frac{(t-|\theta_a|)_+^2 + (t-|\theta_b|)_+^2}{2} \right)(1+t)^2}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)}.$$
On the other hand, when $d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right) < \frac{1}{N}$, Lemma 5.3 implies
$$\mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le e^{-(t-|\theta_a|)_+^2}.$$


Now we bound $\mathrm{Var}\, HC(t)$ by controlling
$$S_1(t) = \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{a\in D}\ \sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) < \frac1N} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right),$$
$$S_2(t) = \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{a\in D}\ \sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right),$$
$$S_3(t) = \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{a\in D^c}\ \sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) < \frac1N}} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right),$$
and
$$S_4(t) = \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{a\in D^c}\ \sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N}} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right).$$
By the symmetry between $a$ and $b$, we have
$$\mathrm{Var}\, HC(t) \le 2\big(S_1(t) + S_2(t)\big) + S_3(t) + S_4(t).$$

Step 1: Upper bound for $S_1(t)$
For fixed $a \in D$,
$$\sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) < \frac1N} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le \frac{2q}{N}\, e^{-(t-|\theta_a|)_+^2} \le \frac{2q}{N}\, e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2}.$$
Noticing that $|D| \le \frac{2qs\sqrt{\log N}}{N}$, we get
$$S_1(t) \le \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \cdot \frac{2qs\sqrt{\log N}}{N} \cdot \frac{2q}{N}\, e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2},$$
which implies
$$S_1(\sqrt{\mu\log p}) \le \mathrm{polylog}(p)\, p^{-\alpha+\gamma+\mu-\left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2}.$$
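For the record, the power counting behind the last display, with constants and logarithmic factors swept into $\mathrm{polylog}(p)$:
$$S_1(\sqrt{\mu\log p}) \le \frac{4qs\sqrt{\log N}}{N^2\,\bar\Psi(t)\big(1-\bar\Psi(t)\big)}\, e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2} \asymp \mathrm{polylog}(p)\cdot\frac{s}{N}\cdot p^{\mu}\cdot p^{-\left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2} = \mathrm{polylog}(p)\, p^{\gamma-\alpha+\mu-\left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2},$$
since $q \le N(\log N + 1)$, $s/N = p^{\gamma-\alpha}$ and $\bar\Psi(\sqrt{\mu\log p}) = p^{-\mu}$.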

Step 2: Upper bound for $S_2(t)$
For fixed $a \in D$,
$$\sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le \sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N} \frac{C_0 \exp\!\left( -\frac{(t-|\theta_a|)_+^2 + (t-|\theta_b|)_+^2}{2} \right)(1+t)^2}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)}$$
$$\le C_0 (1+t)^2 e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2} \sum_{d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N} \frac{1}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)} \le \frac{C_0 (1+t)^2 e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2}}{2N} \sum_{l=1}^{q} \frac{2q}{l} \le \frac{C_0\, q\log q\, (1+t)^2 e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2}}{N}.$$
Noticing that $|D| \le \frac{2qs\sqrt{\log N}}{N}$, we get
$$S_2(t) \le \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \cdot \frac{2qs\sqrt{\log N}}{N} \cdot \frac{C_0\, q\log q\, (1+t)^2 e^{-\left(t-(1+\epsilon^2)\sqrt{r\log p}\right)_+^2}}{N},$$
which implies
$$S_2(\sqrt{\mu\log p}) \le \mathrm{polylog}(p)\, p^{-\alpha+\gamma+\mu-\left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2}.$$

Step 3: Upper bound for $S_3(t)$
For fixed $a \in D^c$ and any $b \in \{1, \ldots, q\}$,
$$\mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le e^{-(t-|\theta_a|)_+^2} \le e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2}.$$


Then
$$S_3(t) \le \frac{1}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \sum_{a\in D^c} \sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) < \frac1N}} e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2} \le \frac{q\cdot\frac{2q}{N}\, e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2}}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)},$$
which implies
$$S_3(\sqrt{\mu\log p}) \le \mathrm{polylog}(p)\, p^{\mu-\left(\sqrt{\mu}-\epsilon^2\sqrt{r}\right)_+^2}.$$

Step 4: Upper bound for $S_4(t)$
For fixed $a \in D^c$,
$$\sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N}} \mathrm{cov}\left( 1_{\{|\theta_a+w_a|>t\}},\, 1_{\{|\theta_b+w_b|>t\}} \right) \le \sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N}} \frac{C_0 \exp\!\left( -\frac{(t-|\theta_a|)_+^2 + (t-|\theta_b|)_+^2}{2} \right)(1+t)^2}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)}$$
$$\le C_0 (1+t)^2 e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2} \sum_{\substack{b\in D^c\\ d\left(\frac{a-1}{q}, \frac{b-1}{q}\right) \ge \frac1N}} \frac{1}{2N\, d\!\left(\frac{a-1}{q}, \frac{b-1}{q}\right)} \le \frac{C_0 (1+t)^2 e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2}}{2N} \sum_{l=1}^{q} \frac{2q}{l} \le \frac{C_0\, q\log q\, (1+t)^2 e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2}}{N}.$$
Therefore
$$S_4(t) \le \frac{q}{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)} \cdot \frac{C_0\, q\log q\, (1+t)^2 e^{-\left(t-\epsilon^2\sqrt{r\log p}\right)_+^2}}{N},$$
which implies
$$S_4(\sqrt{\mu\log p}) \le \mathrm{polylog}(p)\, p^{\mu-\left(\sqrt{\mu}-\epsilon^2\sqrt{r}\right)_+^2}.$$

Step 5: Summary
In summary, there holds
$$\mathrm{Var}\, HC(\sqrt{\mu\log p}) \le 2\big(S_1(\sqrt{\mu\log p}) + S_2(\sqrt{\mu\log p})\big) + S_3(\sqrt{\mu\log p}) + S_4(\sqrt{\mu\log p}) \le \mathrm{polylog}(p)\left( p^{-\alpha+\gamma+\mu-\left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2} + p^{\mu-\left(\sqrt{\mu}-\epsilon^2\sqrt{r}\right)_+^2} \right). \tag{5.12}$$

5.2.3 Detectable region under the alternative and the values of $\epsilon$ and $\mu$

By Chebyshev's inequality, we have
$$\mathbb{P}\left( HC(\sqrt{\mu\log p}) < \mathbb{E}\, HC(\sqrt{\mu\log p}) - (\log N)\sqrt{\mathrm{Var}\, HC(\sqrt{\mu\log p})} \right) \le \frac{1}{\log^2 N}.$$
By (5.11) and (5.12), to guarantee $\mathbb{P}\left( HC(\sqrt{\mu\log p}) \le \log^2 N \right) \le \frac{1}{\log^2 N}$, it suffices to require that $p > C(\gamma, \alpha, r, \epsilon, \mu)$ is sufficiently large, and


$$\begin{cases}
\frac12 - \alpha + \frac{\gamma}{2} + \frac{\mu}{2} - \left(\sqrt{\mu} - (1-\epsilon^2)\sqrt{r}\right)_+^2 > \frac12\left( -\alpha + \gamma + \mu - \left(\sqrt{\mu} - (1+\epsilon^2)\sqrt{r}\right)_+^2 \right),\\[6pt]
\frac12 - \alpha + \frac{\gamma}{2} + \frac{\mu}{2} - \left(\sqrt{\mu} - (1-\epsilon^2)\sqrt{r}\right)_+^2 > \frac12\left( \mu - \left(\sqrt{\mu} - \epsilon^2\sqrt{r}\right)_+^2 \right) \ge 0,
\end{cases}$$
which amounts to
$$\begin{cases}
1 - \alpha > 2\left(\sqrt{\mu} - (1-\epsilon^2)\sqrt{r}\right)_+^2 - \left(\sqrt{\mu} - (1+\epsilon^2)\sqrt{r}\right)_+^2,\\[6pt]
1 - 2\alpha + \gamma > 2\left(\sqrt{\mu} - (1-\epsilon^2)\sqrt{r}\right)_+^2 - \left(\sqrt{\mu} - \epsilon^2\sqrt{r}\right)_+^2.
\end{cases} \tag{5.13}$$
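The passage from the previous system to (5.13) is elementary: multiply both inequalities by 2 and cancel the common terms. For the first one,
$$1 - 2\alpha + \gamma + \mu - 2\left(\sqrt{\mu}-(1-\epsilon^2)\sqrt{r}\right)_+^2 > -\alpha + \gamma + \mu - \left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2 \iff 1 - \alpha > 2\left(\sqrt{\mu}-(1-\epsilon^2)\sqrt{r}\right)_+^2 - \left(\sqrt{\mu}-(1+\epsilon^2)\sqrt{r}\right)_+^2,$$
and the second line of (5.13) is obtained in the same way.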

In order to find appropriate $\mu \in (0, 1-\gamma)$ and $\epsilon > 0$ depending only on $\gamma, \alpha, r$ such that these two inequalities hold simultaneously, we discuss three cases separately.

Case 1: $\frac{1+\gamma}{2} \le \alpha < \frac{3+\gamma}{4}$ and $\alpha - \frac{1+\gamma}{2} < r \le \frac{1-\gamma}{4}$. In this case, let $\mu = 4r(1-\epsilon^2)^2$. Then both inequalities in (5.13) hold when we let $\epsilon = 0$. By the continuity of the functions with respect to $\epsilon$ and the properties of open sets, we can choose a sufficiently small positive constant $C_0(\gamma, \alpha, r)$ such that when $\epsilon = C_0(\gamma, \alpha, r) > 0$, both inequalities in (5.13) hold strictly.
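To see why $\epsilon = 0$ works here, a worked check of (5.13) under the Case 1 choices: with $\mu = 4r$ and $\epsilon = 0$, both of the terms $\left(\sqrt{\mu}-(1\mp\epsilon^2)\sqrt{r}\right)_+^2$ equal $(2\sqrt{r}-\sqrt{r})^2 = r$, while $\left(\sqrt{\mu}-\epsilon^2\sqrt{r}\right)_+^2 = 4r$, so (5.13) becomes
$$1 - \alpha > 2r - r = r \qquad\text{and}\qquad 1 - 2\alpha + \gamma > 2r - 4r = -2r.$$
The first holds because $r \le \frac{1-\gamma}{4} < 1-\alpha$ (as $\alpha < \frac{3+\gamma}{4}$), and the second is equivalent to $r > \alpha - \frac{1+\gamma}{2}$, which is the case assumption.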

Case 2: $\frac{1+\gamma}{2} \le \alpha < \frac{3+\gamma}{4}$ and $r > \frac{1-\gamma}{4}$. In this case, let $\mu = (1-\gamma)(1-\epsilon^2)^2$. Then both inequalities in (5.13) hold when we let $\epsilon = 0$. Similarly, they also hold when $\epsilon = C_0(\gamma, \alpha, r) > 0$.

Case 3: $\frac{3+\gamma}{4} \le \alpha < 1$ and $r > \left(\sqrt{1-\gamma} - \sqrt{1-\alpha}\right)^2$. In this case, let $\mu = (1-\gamma)(1-\epsilon^2)^2$. Then both inequalities in (5.13) hold when we let $\epsilon = 0$. Similarly, they also hold when $\epsilon = C_0(\gamma, \alpha, r) > 0$.
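A worked check for Case 3 (Case 2 is analogous): with $\mu = 1-\gamma$ and $\epsilon = 0$, the first inequality in (5.13) reads $1-\alpha > \left(\sqrt{1-\gamma}-\sqrt{r}\right)_+^2$ and the second reads $1-2\alpha+\gamma > 2\left(\sqrt{1-\gamma}-\sqrt{r}\right)_+^2 - (1-\gamma)$, which rearranges to the same condition. It holds because $r > \left(\sqrt{1-\gamma}-\sqrt{1-\alpha}\right)^2$ with $\sqrt{1-\gamma} \ge \sqrt{1-\alpha}$ gives
$$\left(\sqrt{1-\gamma}-\sqrt{r}\right)_+^2 < \left(\sqrt{1-\gamma} - \big(\sqrt{1-\gamma}-\sqrt{1-\alpha}\big)\right)^2 = 1-\alpha.$$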
In summary, for fixed $\frac{1+\gamma}{2} \le \alpha < 1$ and $r > \rho^*_\gamma(\alpha)$, we can choose $\epsilon > 0$ and $\mu \in \big(0, (1-\gamma)(1-\epsilon^2)^2\big]$ depending only on $\gamma, \alpha, r$ such that both inequalities in (5.13) hold. Notice that $t = \sqrt{\mu\log p}$ lies in the domain of $HC(t)$, i.e. $\left[1, \sqrt{\frac{\log N}{3}}\right]$. Then when $p > C(\alpha, \gamma, r)$, we have the inequality $\mathbb{P}\left( HC(\sqrt{\mu\log p}) \le \log^2 N \right) \le \frac{1}{\log^2 N}$, and hence $\mathbb{P}\left( HC^* \le \log^2 N \right) \le \frac{1}{\log^2 N}$. Since $C(\alpha, \gamma, r)$ is independent of the choice of $(\tau, \tilde\beta)$, we have
$$\lim_{p\to\infty}\ \max_{(\tau, \tilde\beta)\in\Gamma(p,N,s,r)} \mathbb{P}_{(\tau, \tilde\beta)}\big( H_0 \text{ is not rejected} \big) = 0.$$

5.2.4 Upper bound of HC ∗ under the null

Under the null, for $m = 1, \ldots, q$ we have $v_m = w_m$. By (5.5), for any $u = 1, \ldots, L$, the variables $v_u, v_{u+L}, v_{u+2L}, \ldots, v_{u+(N-1)L}$ are i.i.d. standard complex normal variables. This implies that $1_{\{|v_u|\ge t\}}, 1_{\{|v_{u+L}|\ge t\}}, \ldots, 1_{\{|v_{u+(N-1)L}|\ge t\}}$ are i.i.d. Bernoulli random variables with parameter $\bar\Psi(t) = e^{-t^2}$.

By Lemma 5.4, for any $u$, we have
$$\mathbb{P}\left( \sup_{1\le t\le\sqrt{\frac{\log N}{3}}} \frac{\left| \sum_{j=1}^{N} 1_{\{|v_{u+(j-1)L}|>t\}} - N\bar\Psi(t) \right|}{\sqrt{N\bar\Psi(t)\big(1-\bar\Psi(t)\big)}} > C_1\sqrt{\log N} \right) < \frac{1}{\log^2 N},$$

provided $N > N_0$. Therefore
$$HC^* = \sup_{1\le t\le\sqrt{\frac{\log N}{3}}} \frac{\sum_{m=1}^{q} 1_{\{|v_m|>t\}} - q\bar\Psi(t)}{\sqrt{q\bar\Psi(t)\big(1-\bar\Psi(t)\big)}} = \sup_{1\le t\le\sqrt{\frac{\log N}{3}}} \frac{1}{\sqrt{L}} \sum_{u=1}^{L} \frac{\sum_{j=1}^{N} 1_{\{|v_{u+(j-1)L}|>t\}} - N\bar\Psi(t)}{\sqrt{N\bar\Psi(t)\big(1-\bar\Psi(t)\big)}}$$
$$\le \frac{1}{\sqrt{L}} \sum_{u=1}^{L} \sup_{1\le t\le\sqrt{\frac{\log N}{3}}} \frac{\sum_{j=1}^{N} 1_{\{|v_{u+(j-1)L}|>t\}} - N\bar\Psi(t)}{\sqrt{N\bar\Psi(t)\big(1-\bar\Psi(t)\big)}} \le \sqrt{L}\, C_1\sqrt{\log N}$$
with probability at least $1 - \frac{L}{\log^2 N}$. Since $L = \lfloor \log N + 1\rfloor$ and $N = p^{1-\gamma}$, the bound $\sqrt{L}\, C_1\sqrt{\log N} = O(\log N)$ is eventually smaller than $\log^2 N$, and therefore
$$\lim_{p\to\infty} \mathbb{P}(H_0 \text{ is rejected}) = \lim_{p\to\infty} \mathbb{P}\big(HC^* > \log^2 N\big) = 0.$$

5.3 The proof of Theorem 2.2


Without loss of generality, assume $\frac{p}{N} = p^\gamma$ is an integer. We now consider a class of special alternatives. Let
$$\tilde\beta_1 = \ldots = \tilde\beta_s = \sqrt{\frac{rp\log p}{N}}$$
be real and positive. As to $\tau$, we first define a set of index vectors:
$$T_N = \left\{ \tilde\tau = (\tilde\tau_1, \ldots, \tilde\tau_s) : 1 \le \tilde\tau_1 < \ldots < \tilde\tau_s \le N,\ \tilde\tau_{l+1} - \tilde\tau_l \ge \log^2 N \text{ for } l = 1, \ldots, s-1,\ \tilde\tau_1 + N - \tilde\tau_s \ge \log^2 N \right\}.$$
For $\tilde\tau \in T_N$, define $\tau = \big(p^\gamma(\tilde\tau_1 - 1) + 1, \ldots, p^\gamma(\tilde\tau_s - 1) + 1\big)$, which implies $(\tau, \tilde\beta) \in \Gamma(p, N, s, r)$.
Then the measurements become
$$y_j = \frac{1}{\sqrt{p}} \sum_{l=1}^{s} e^{-\frac{2\pi i(\tau_l-1)(j-1)}{p}}\, \tilde\beta_l + z_j = \frac{1}{\sqrt{p}} \sum_{l=1}^{s} e^{-\frac{2\pi i\, p^\gamma(\tilde\tau_l-1)(j-1)}{p}} \sqrt{\frac{rp\log p}{N}} + z_j = \frac{1}{\sqrt{N}} \sum_{l=1}^{s} e^{-\frac{2\pi i(\tilde\tau_l-1)(j-1)}{N}} \sqrt{r\log p} + z_j$$
for $j = 1, \ldots, N$.
Denote by $F_N$ the $N \times N$ normalized DFT matrix:
$$F_N(j, k) = \frac{1}{\sqrt{N}}\, e^{-\frac{2\pi i(k-1)(j-1)}{N}}.$$
Define $\theta(\tilde\tau) \in \mathbb{R}^N$ such that the $\tilde\tau_l$-th component of $\theta(\tilde\tau)$ is $\sqrt{2r\log p} = \sqrt{\frac{2r}{1-\gamma}\log N}$ for $l = 1, \ldots, s$, while the other components are zero. Then $\theta(\tilde\tau)$ is a sparse vector whose sparsity is $s = p^{1-\alpha} = N^{1-\frac{\alpha-\gamma}{1-\gamma}}$, and all of its nonzero entries equal $\sqrt{2r\log p} = \sqrt{\frac{2r}{1-\gamma}\log N}$. Now the measurements can be written as
$$y = \frac{1}{\sqrt{2}}\, F_N\, \theta(\tilde\tau) + z,$$
which is equivalent to
$$\sqrt{2}\, F_N^* y = \theta(\tilde\tau) + \sqrt{2}\, F_N^* z.$$
Notice that $F_N^* z \sim \mathcal{CN}(0, I, 0)$. Since $\theta(\tilde\tau)$ is deterministic, real and positive, and the real and imaginary parts of $\sqrt{2}\, F_N^* y$ are independent, we have the following equivalent measurement:
$$v := \Re\big(\sqrt{2}\, F_N^* y\big) = \theta(\tilde\tau) + w,$$

where $w \sim N(0, I_N)$. Now the detection problem becomes nearly the standard sparse mean detection problem studied in [36]; also see [23, 33]. The only difference is that here $\tilde\tau \in T_N$ satisfies the separation condition
$$\tilde\tau_{l+1} - \tilde\tau_l \ge \log^2 N \ \text{ for } l = 1, \ldots, s-1, \qquad \tilde\tau_1 + N - \tilde\tau_s \ge \log^2 N.$$
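To make the reduction concrete, here is a small simulation sketch of the equivalent Gaussian sequence model derived above. The dimension, sparsity, amplitude and random support below are toy choices of ours, not the paper's; the point is only that $v = \Re(\sqrt{2}\,F_N^* y)$ recovers $\theta(\tilde\tau)$ plus real standard Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 1024, 8                            # toy dimension and sparsity
amp = np.sqrt(2.0 * np.log(N))            # stand-in for sqrt(2 r log p)
theta = np.zeros(N)
theta[rng.choice(N, size=s, replace=False)] = amp

# normalized DFT matrix F_N(j, k) = exp(-2*pi*i*(j-1)*(k-1)/N) / sqrt(N)
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)  # CN(0, 1)
y = F @ theta / np.sqrt(2.0) + z

# Equivalent real measurement: v = theta + w with w ~ N(0, I_N), because F_N is
# unitary and sqrt(2) * F_N^* z has i.i.d. N(0, 1) real and imaginary parts.
v = np.real(np.sqrt(2.0) * (F.conj().T @ y))
```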

We now prove that this difference is actually negligible. Suppose $\tilde\tau$ is uniformly distributed in $T_N$. It induces a mixed simple alternative
$$p_1(v) = \frac{1}{|T_N|} \sum_{\tilde\tau\in T_N} p_{\tilde\tau}(v).$$

To prove that
$$\mathbb{P}_0(H_0 \text{ is rejected}) + \max_{(\beta,\tau)\in\Gamma(p,N,s,r)} \mathbb{P}_{(\beta,\tau)}(H_0 \text{ is accepted}) \to 1,$$
by the standard Hellinger distance argument, it suffices to prove that
$$\mathbb{E}_0 \sqrt{\frac{p_1}{p_0}}(v) \ge 1 - o(1).$$

Suppose $S_N$ is the collection of all subsets of $\{1, \ldots, N\}$ with cardinality $s$. By Lemma A.8 in [33], we know $\frac{|T_N|}{|S_N|} = 1 - o(1)$. Define
$$\tilde p_1(v) = \frac{1}{|S_N|} \sum_{\tilde\tau\in S_N} p_{\tilde\tau}(v),$$
and define
$$\delta(v) = \frac{|S_N|}{|T_N|}\, \tilde p_1 - p_1 = \frac{1}{|T_N|} \sum_{\tilde\tau\in S_N\setminus T_N} p_{\tilde\tau}(v).$$

If $\tilde\tau$ is uniformly distributed in $S_N$, $\tilde p_1$ becomes the simple mixed alternative, and the detection problem becomes the standard sparse mean detection problem. Notice that
$$\mathbb{E}_0 \sqrt{\frac{p_1}{p_0}} = \mathbb{E}_0 \sqrt{\frac{\frac{|S_N|}{|T_N|}\tilde p_1 - \delta}{p_0}} \ge \mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} - \mathbb{E}_0 \sqrt{\frac{\delta}{p_0}} \ge \mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} - \sqrt{\mathbb{E}_0 \frac{\delta}{p_0}}$$
$$= \mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} - \sqrt{\frac{1}{|T_N|} \sum_{\tilde\tau\in S_N\setminus T_N} 1} = \mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} - \sqrt{\frac{|S_N\setminus T_N|}{|T_N|}} \ge \mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} - o(1).$$
Therefore, it suffices to prove $\mathbb{E}_0 \sqrt{\frac{\tilde p_1}{p_0}} \ge 1 - o(1)$. The problem now becomes the standard sparse mean vector detection problem studied in [36, 23, 33]. Since $s = N^{1-\frac{\alpha-\gamma}{1-\gamma}}$ and the common nonzero components of the mean vector equal $\sqrt{\frac{2r}{1-\gamma}\log N}$, it suffices to require



$$\begin{cases}
\dfrac{r}{1-\gamma} < \dfrac{\alpha-\gamma}{1-\gamma} - \dfrac12, & \text{if } \dfrac12 < \dfrac{\alpha-\gamma}{1-\gamma} \le \dfrac34,\\[10pt]
\dfrac{r}{1-\gamma} < \left( 1 - \sqrt{1 - \dfrac{\alpha-\gamma}{1-\gamma}} \right)^2, & \text{if } \dfrac34 < \dfrac{\alpha-\gamma}{1-\gamma} < 1.
\end{cases}$$
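Spelled out (multiplying both conditions by $1-\gamma$), the two requirements read
$$r < \alpha - \frac{1+\gamma}{2} \qquad\text{and}\qquad r < (1-\gamma)\left( 1 - \sqrt{1 - \tfrac{\alpha-\gamma}{1-\gamma}} \right)^2 = \left(\sqrt{1-\gamma} - \sqrt{1-\alpha}\right)^2,$$
respectively, matching the two branches of the boundary used in Section 5.2.3.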

This is exactly $r < \rho^*_\gamma(\alpha)$, and the proof is completed.

References

[1] K. S. Alexander. Rates of growth and sample moduli for weighted empirical processes indexed
by sets. Probability Theory and Related Fields, 75(3):379–423, 1987.

[2] E. Arias-Castro, E. Candès, and Y. Plan. Global testing under sparse alternatives: ANOVA,
multiple comparisons and the higher criticism. Ann. Statist., 39(5):2243–2794, 2011.

[3] E. Arias-Castro, E. J. Candès, H. Helgason, and O. Zeitouni. Searching for a trail of evidence in a maze. Ann. Statist., 36(4):1726–1757, 2008.

[4] E. Arias-Castro, D. L. Donoho, and X. Huo. Near-optimal detection of geometric objects by


fast multiscale methods. IEEE Trans. Inform. Theory, 51(7):2402–2425, Jul. 2005.

[5] L. Baboulaz and P. L. Dragotti. Exact feature extraction using finite rate of innovation prin-
ciples with an application to image super-resolution. IEEE Transaction on Image Processing,
18(2):281–298, 2009.

[6] W. U. Bajwa, K. Gedalyahu, and Y. C. Eldar. Identification of parametric underspread linear
systems and super-resolution radar. IEEE Transactions on Signal Processing, 59(6):2548–2561,
2011.

[7] E. Baransky, G. Itzhak, I. Shmuel, N. Wagner, E. Shoshan, and Y. C. Eldar. A Sub-Nyquist


Radar Prototype: Hardware and Algorithms. IEEE Transactions on Aerospace and Electronic
Systems, 50(2):809–822, 2014.

[8] E. Bølviken. New Tests of Significance in Periodogram Analysis. Scand. J. Statist., 10:1–9, 1983.

[9] B. N. Bhaskar, G. Tang, and B. Recht. Atomic norm denoising with applications to line
spectral estimation. Preprint, 2012.

[10] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods (2nd edition). Springer, 2009.

[11] A. Bruckstein, T. J. Shan, and T. Kailath. The resolution of overlapping echos. IEEE Trans.
on Acoustics, Speech and Signal Processing, 33(6):1357–1367, 1985.

[12] J. Cadzow. Signal enhancement-a composite property mapping algorithm. IEEE Trans. on
Acoustics, Speech and Signal Processing, 36(1):49 – 62, 1988.

[13] T. T. Cai, J. X. Jeng, and J. Jin. Optimal detection of heterogeneous and heteroscedastic
mixtures. Journal of the Royal Statistical Society. Series B (Methodological).

[14] T. T. Cai, J. Jin, and M. G. Low. Estimation and confidence sets for sparse normal mixtures.
Ann. Statist., 35(6):2421–2449, 2007.

[15] E. Candès and C. Fernandez-Granda. Super-resolution from noisy data. J. of Fourier Analysis
and its Applications, 19(6):1229–1254, 2013.

[16] E. Candès and C. Fernandez-Granda. Towards a mathematical theory of super-resolution.


Communications on Pure and Applied Mathematics, 67(6):906–956, 2014.

[17] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction
from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509,
2006.

[18] V. Chaisinthop and P. L. Dragotti. Centralized and Distributed Semi-Parametric Compression


of Piecewise Smooth Functions. IEEE Transaction on Signal Processing, 59(7):3071 – 3085,
2011.

[19] S. S. Chen and D. L. Donoho. Application of basis pursuit in spectrum estimation. Conference
on Acoustics, Speech and Signal Processing, 3, 1998.

[20] S. Chiu. Detecting Periodic Components in a White Gaussian Time Series. Journal of the
Royal Statistical Society. Series B (Methodological), 51(2):249–259, 1989.

[21] Kfir M. Cohen, Chen Attias, Barak Farbman, Igor Tselniker, and Yonina C. Eldar. Channel estimation in UWB channels using compressed sensing. In Proc. IEEE ICASSP-14, Florence, Italy, May 2014.

[22] D. L. Donoho. Superresolution via sparsity constraints. SIAM J. Math. Anal, 23(5):1309 –
1331, 1992.

[23] D. L. Donoho and J. Jin. Higher criticism for detecting sparse heterogeneous mixture. Ann.
Statist., 32(3):962–994, 2004.
[24] M. F. Duarte and R. G. Baraniuk. Spectral compressive sensing. Appl. Comput. Harmon.
Anal., 35(1):111–129, 2013.
[25] J. Durbin. Tests for Serial Correlation in Regression Analysis on the Periodogram of Least-
Squares Residuals. Biometrika, 56(1):1–15, 1969.
[26] A. Fannjiang and W. Liao. Coherence pattern-guided compressive sensing with unresolved
grids. Journal SIAM Journal on Imaging Sciences, 5(1):179 – 202, 2012.
[27] R. A. Fisher. Tests of Significance in Harmonic Analysis. Proceedings of Royal Society, Ser.
A, 125:54–59, 1929.
[28] W. A. Fuller. Introduction to Statistical Time Series (2nd edition). Wiley, 1995.
[29] K. Gedalyahu and Y. C. Eldar. Time delay estimation from low rate samples: A union of
subspaces approach. IEEE Transactions on Signal Processing, 58(6):3017–3031, June 2010.
[30] Kfir Gedalyahu, Ronen Tur, and Yonina C. Eldar. Multichannel Sampling of Pulse Streams
at the Rate of Innovation. IEEE Trans. Signal Process., 59(4):1491–1504, Apr. 2011.
[31] Evarist Giné and Vladimir Koltchinskii. Concentration inequalities and asymptotic results for
ratio type empirical processes. The Annals of Probability, 34(3):1143 – 1216, 2006.
[32] E. F. Glynn, J. Chen, and A. R. Mushegian. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3):310 – 316, 2006.
[33] P. Hall and J. Jin. Innovated higher criticism for detecting sparse signals in correlated noise.
Ann. Statist., 38(3):1686–1732, 2010.
[34] J. Hu, Z. Shi, J. Zhou, and Q. Fu. Compressed sensing of complex sinusoids: an approach
based on dictionary refinement. IEEE Trans. on Signal Processing, 60(7):3809 – 3821, 2012.
[35] Y. Hua and T. Sarkar. Matrix pencil method for estimating parameters of exponentially
damped/undamped sinusoids in noise. IEEE Trans. on Acoustics, Speech and Signal Process-
ing, 38(5):814 – 824, 1993.
[36] Y. I. Ingster. Minimax detection of a signal for $\ell^n_p$-balls. Math. Methods Statist., 7:401–428, 1999.
[37] W. Li and Q. Shao. A normal comparison inequality and its applications. Probability Theory
and Related Fields, 122(4):494–508, 2002.
[38] N. R. Lomb. Least-squares frequency analysis of unequally spaced data. Astrophys Space Sci,
39:447–462, 1976.
[39] M. R. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. Springer Series in Statistics, 1983.
[40] R. Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastique
et sur celles de la force expansive de la vapeur de l’alkool, à différentes températures. Journal
de l’Ecole Polytechnique, 1(2):24 – 76, 1795.

[41] R. Roy and T. Kailath. ESPRIT - estimation of signal parameters via rotational invariance techniques. IEEE Trans. on Acoustics, Speech and Signal Processing, 37(7):984 – 995, 1989.

[42] P. Rubin-Delanchy and A. T. Walden. Kinematics of Complex-Valued Time Series. IEEE


Trans. on Signal Processing, 56(9):4189–4198, 2008.

[43] J. D. Scargle. Studies in astronomical time series analysis. II. Statistical aspects of spectral
analysis of unevenly spaced data. Astrophys J, 263:835–853, 1982.

[44] R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Trans. on
Antennas and Propagation, 34(3):276 – 280, 1986.

[45] A. F. Siegel. Testing for Periodicity in a Time Series. Journal of the American Statistical
Association, 75(370):345–348, 1980.

[46] A. M. Sykulski, S. C. Olhede, J. M. Lilly, and J. J. Early. The Whittle Likelihood for Complex-
Valued Time Series. Preprint, 2013.

[47] G. Tang, B. N. Bhaskar, and B. Recht. Near minimax line spectral estimation. Preprint, 2013.

[48] R. Tur, Y. C. Eldar, and Z. Friedman. Innovation rate sampling of pulse streams with appli-
cation to ultrasound imaging. IEEE Trans. Signal Process., 59(4):1827–1842, Apr. 2011.

[49] Jose Antonio Urigüen, Yonina C. Eldar, Pier Luigi Dragotti, and Zvika Ben-Haim. Compressed Sensing: Theory and Applications, chapter Sampling at the rate of innovation: theory and applications, pages 148–209. Cambridge University Press, 2012.

[50] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE
Trans. Signal Processing, 50:1417–1428, June 2002.

[51] N. Wagner, Y. C. Eldar, and Z. Friedman. Compressed beamforming in ultrasound imaging.


IEEE Trans. Signal Process., 60(9):4643–4657, Sep. 2012.

[52] M. C. A. Van Zuijlen. Properties of the empirical distribution function for independent non-
identically distributed random variables. Ann. Prob., 6(2):250 – 266, 1978.
