ORIGINAL ARTICLE
We approach the problem of non-parametric estimation for autoregressive Markov switching processes. In this context, the Nadaraya-Watson-type regression function estimator is interpreted as the solution of a local weighted least-squares problem, which does not admit a closed-form solution in the case of hidden Markov switching. We introduce a non-parametric recursive algorithm to approximate the estimator. Our algorithm restores the missing data by means of a Monte Carlo step and estimates the regression function via a Robbins-Monro step. We prove that non-parametric autoregressive models with Markov switching are identifiable when the hidden Markov process has a finite state space. Consistency of the estimator is proved using the strong $\alpha$-mixing property of the model. Finally, we present some simulations illustrating the performance of our non-parametric estimation procedure.
Keywords: Autoregressive process; Markov switching; Robbins-Monro approximation; non-parametric kernel estimation
MSC subject classification: Primary: 60G17; Secondary: 62G07.
1. INTRODUCTION
Markov switching autoregressive processes can be viewed as a combination of hidden Markov models (HMMs) and threshold regression models. Switching autoregressive processes, introduced into an econometric context by Goldfeld and Quandt (1973), have become quite popular in the literature and were employed by Hamilton (1989) in the analysis of the gross internal product of the USA for both contraction and expansion regimes. In this family of models, which combines different autoregressive models to describe the time evolution of the process, the transition between these different autoregressive models is controlled by an HMM.
Switching linear autoregressive processes with Markov regime have been extensively studied, and several applications in economics and finance can be found in, for instance, Krolzig (1997), Kim and Nelson (1999) and Hamilton and Raj (2003). These models are also widely used in several electrical engineering areas, including tracking of manoeuvring targets, failure detection, wind power production and stochastic adaptive control; see, for instance, Tugnait (1982), Doucet et al. (2000), Cappé et al. (2005) and Ailliot and Monbet (2012).
Switching nonlinear autoregressive models with Markov regime are of considerable interest to the statistics community, especially for econometric series modelling. Among such models, considered by Francq and Roussignol (1997), there are those that admit an additive decomposition, with particular interest in the switching ARCH models (Francq et al., 2001). However, an even more general class of switching nonlinear autoregressive processes that do not necessarily admit an additive decomposition has also been studied by Krishnamurthy and Rydén (1998) and Douc et al. (2004).
Correspondence to: Luis Angel Rodriguez, Dpto. de Matemáticas, FACYT, Universidad de Carabobo, Valencia, Venezuela. E-mail: larodri@uc.edu.ve
We consider a particular type of switching nonlinear autoregressive model $Y = (Y_k)_{k\ge 0}$ with Markov regime, called a Markov switching nonlinear autoregressive (MS-NAR) process, which is defined for $k \ge 1$ by
$$Y_k = r_{X_k}(Y_{k-1}) + e_k, \tag{1}$$
where $(e_k)_{k\ge 1}$ are i.i.d. random variables, the sequence $(X_k)_{k\ge 1}$ is a homogeneous Markov chain with state space $\{1,\dots,m\}$, and $r_1(y),\dots,r_m(y)$ are the regression functions, assumed to be unknown.
We denote by $A$ the probability transition matrix of the Markov chain $X$, that is, $A = (a_{ij})$ with $a_{ij} = P(X_k = j \mid X_{k-1} = i)$. We assume that the variable $Y_0$, the Markov chain $(X_k)_{k\ge 1}$ and the sequence $(e_k)_{k\ge 1}$ are independent.
This model is a generalization of the switching linear autoregressive model with Markov regime, also known as the MS-AR model. When the regression functions $r_i$ are linear, the MS-NAR process is simply an MS-AR model.
In the parametric case, that is, when the regression functions depend on an unknown parameter, the maximum likelihood estimation method is commonly used. While the consistency of the maximum likelihood estimator for the MS-NAR model is given by Krishnamurthy and Rydén (1998), consistency and asymptotic normality are proved in a more general context by Douc et al. (2004). Several versions of the expectation-maximization (EM) algorithm and of its variants, for instance, stochastic EM, Monte Carlo EM and simulated annealing EM (SAEM), are implemented in Cappé et al. (2005) for the computation of the maximum likelihood estimator. A semi-parametric estimation for the MS-NAR model was studied by Ríos and Rodríguez (2008b), where the authors considered a conditional least-squares approach for the parameter estimation and a kernel density estimator for the innovation probability density.
Although in many situations a key problem is how to estimate the order $m$ of the model, in this work we assume it to be known. Nevertheless, when $m$ is not known, a possible approach is to consider the minimization of a penalized contrast function. In this case, the main problem to be solved is the choice of a good penalty term. For the case of an MS-AR model with Gaussian innovations, Ríos and Rodríguez (2008a) considered a penalized likelihood criterion. For finite mixture models, a penalized contrast defined from the Hankel matrices of the first algebraic moments was considered by Dacunha-Castelle and Gassiat (1997). In the non-parametric context, as far as we know, this is still an open problem.
Nevertheless, before studying the consistency of any estimation procedure for this model, one needs to answer the question of model identifiability. As far as we know, in the context of non-parametric estimation of MS-NAR models, this problem has not been previously addressed in the literature. We tackle it following the ideas of the recent work of Gassiat et al. (2016) on the non-parametric estimation of HMMs. The overall idea is to identify the Markov regime first and then to ensure that the estimation method provides a unique estimate. We consider non-parametric estimators obtained through the minimization of a quadratic contrast function. This function has a unique minimum, given by the Nadaraya-Watson estimator when the Markov chain is observed, ensuring in this case that the regression functions are identified with a non-parametric approach.
In this work, we consider a non-parametric regression model. That is, for $i = 1,\dots,m$, we define a Nadaraya-Watson-type kernel estimator, given by
$$\hat r_{i,n}(y) = \frac{\sum_{k=0}^{n-1} Y_{k+1}\, K\!\left(\frac{y - Y_k}{h}\right) \mathbb{1}_i(X_{k+1})}{\sum_{k=0}^{n-1} K\!\left(\frac{y - Y_k}{h}\right) \mathbb{1}_i(X_{k+1})}. \tag{2}$$
This Nadaraya-Watson-type estimator was introduced for HMMs in Harel and Puri (2001).
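As an illustration, estimator (2) for complete data can be sketched as follows; the function name `nw_switching_estimator` and the choice of a Gaussian kernel are ours, not part of the original text.

```python
import numpy as np

def nw_switching_estimator(y, Y, X, i, h):
    """Nadaraya-Watson-type estimate of r_i at point y from complete data, as in (2).

    Y : array of length n+1 holding Y_0..Y_n
    X : array of length n holding the regimes X_1..X_n, with values in {1,..,m}
    """
    K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)  # Gaussian kernel (our choice)
    w = K((y - Y[:-1]) / h) * (X == i)   # K((y - Y_k)/h) * 1_i(X_{k+1}), k = 0..n-1
    den = np.sum(w)
    return np.sum(Y[1:] * w) / den if den > 0 else np.nan
```

For instance, when every selected pair falls in regime $i$ and the responses $Y_{k+1}$ are constant, the estimate is exactly that constant, whatever the bandwidth.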
In the first part, we establish uniform consistency, assuming that a realization of the complete data $(Y_{0:n}, X_{1:n})$ is known, with $Y_{0:n} = (Y_0,\dots,Y_n)$ and $X_{1:n} = (X_1,\dots,X_n)$; that is, we prove the convergence over compact subsets $C \subset \mathbb{R}$,
wileyonlinelibrary.com/journal/jtsa Copyright 2017 John Wiley & Sons, Ltd J. Time Ser. Anal. (2017)
DOI: 10.1111/jtsa.12237
NON-PARAMETRIC ESTIMATION OF MS-NAR PROCESS
This is an interesting asymptotic result, but a key feature of the MS-NAR models is that, because the state sequence $(X_k)_{k\ge 1}$ is generally not observable, the statistical inference has to be carried out by means of the observations $(Y_k)_{k\ge 0}$ only.
In the non-parametric context, the estimators of the regression functions $r_i(y)$, for each $y$ and $i = 1,\dots,m$, can be interpreted as solutions $\theta = (\theta_1,\dots,\theta_m)$ of the local weighted least-squares problem
$$U(y, Y_{0:n}, X_{1:n}, \theta) = \frac{1}{nh} \sum_{k=0}^{n-1} \sum_{i=1}^{m} K\!\left(\frac{y - Y_k}{h}\right) \mathbb{1}_i(X_{k+1})\, (Y_{k+1} - \theta_i)^2,$$
where the weights are specified by the kernel $K$, so that the observations $Y_k$ near $y$ have the largest influence on the estimate of the regression function at $y$; that is, $\hat r_n(y) = \arg\min_\theta U(y, Y_{0:n}, X_{1:n}, \theta)$.
When a realization of the state sequence $(X_k)_{k\ge 1}$ is observed, the solutions of this problem are the Nadaraya-Watson kernel estimators $\hat r_{i,n}$ defined in (2). Nevertheless, when $(X_k)_{k\ge 1}$ is a hidden Markov chain, the solution must be approximated because it does not admit a closed form.
In the second part, we propose a recursive algorithm for the estimation of the regression functions $r_i$ with a Monte Carlo step, which restores the missing data $(X_k)_{k\ge 1}$ by $X_{1:n}^t$, and a Robbins-Monro procedure, which allows us to estimate the unknown value of $\theta$. This approximation minimizes the potential $U$ using the gradient algorithm, for each fixed $y$,
$$\theta^t = \theta^{t-1} - \gamma_t\, \nabla_\theta U\!\left(y, Y_{0:n}, X_{1:n}^t, \theta^{t-1}\right),$$
where $(\gamma_t)$ is any sequence of real positive numbers decreasing to 0, and $\nabla_\theta U$ is the gradient of $U$ with respect to the vector $\theta \in \mathbb{R}^m$.
In a general context, the Robbins-Monro approach is studied in Duflo (1996). Whereas EM-type algorithms with kernel estimation are used for finite mixtures of non-parametric multi-variate densities in Benaglia et al. (2009) and for non-parametric autoregression with independent regimes in Franke et al. (2011), we establish in the present work the consistency of the estimator obtained by our Robbins-Monro algorithm. This asymptotic property is obtained for each fixed point $y$.
The article is organized as follows. In Section 2, we present the general conditions on the model ensuring the existence of a stationary distribution and its stability. We prove that the model satisfies the strong mixing dependence condition and establish its identifiability. Furthermore, we prove the uniform consistency of the Nadaraya-Watson kernel estimator in the case of complete data. In Section 3, we prove the main result, namely, the consistency of the estimator related to our Robbins-Monro algorithm. Section 4 contains some numerical experiments on simulated data illustrating the performance of our non-parametric estimation procedure. Some of the proofs are deferred to Appendix A.
2. PRELIMINARY
We review the key properties of the MS-NAR model that we shall need later for proving our results. In addition, we prove the uniform consistency of the Nadaraya-Watson kernel estimator under the assumption that a realization of the complete data is available.
L. J. FERMIN, R. RIOS, AND L. A. RODRIGUEZ
E4 $\sum_{i=1}^{m} \mu_i \log \lambda_i < 0$.
E5 $E(|e_1|^s) < \infty$, for some $s \ge 1$.
E6 The sequence $(e_k)_{k\ge 1}$ of random variables has a common probability density function $\varphi(e)$ with respect to the Lebesgue measure.
E7 The probability density function $\varphi(e)$ is everywhere positive on $\mathbb{R}$.
The MS-NAR process $Y = (Y_k)_{k\ge 0}$ is not, in general, a Markov process. However, condition E1 implies that the extended process $Z = (Y_k, X_k)_{k\ge 1}$ with state space $E = \mathbb{R} \times \{1,\dots,m\}$ is a Markov chain.
We recall that a Markov chain is Fellerian if and only if, for all continuous bounded functions, the image under the operator defined by the transition kernel of $Z$ is also bounded and continuous. It is strongly Fellerian if, for any bounded function, the image is continuous. Under condition E2, $Z$ is a Feller chain, and it is a strong Feller chain if, in addition, condition E6 holds.
For the problem of the existence and uniqueness of a strictly stationary ergodic solution of model $Y$, we have, under condition E1, that a stationary solution exists if and only if the Markov chain $Z$ has an invariant probability measure. Furthermore, this stationary solution is unique if and only if the invariant probability measure of the extended process $Z$ is unique. Yao and Attali (1999) first studied the properties of the extended process and then derived the properties of the marginal process $Y$.
The MS-NAR model is called sublinear if conditions E2 and E3 hold. In Proposition 2.1, we summarize some results, given by Yao and Attali (1999), for the sublinear MS-NAR model.
Proposition 2.1 (Yao and Attali). Consider a sublinear MS-NAR process $Y = (Y_k)_{k\ge 0}$. Assuming E1-E7, we have the following:
Remark 2.1. Under conditions E1-E7, there exists a strictly stationary solution for model $Y$ if and only if the Markov chain $Z$ has an invariant probability measure. Furthermore, this stationary solution is unique if and only if the invariant probability measure of $Z$ is unique.
Condition E7 in Proposition 2.1 ensures that the transition kernel of the Markov chain $Z$ is $\psi$-irreducible, which implies the uniqueness of the invariant probability measure.
For the stability, the moment condition E5 with $s \ge 1$ is enough, but for the asymptotic properties of the kernel estimator, it will be necessary that $s \ge 2$.
(i) The random vector $(Y_{0:n}, X_{1:n})$ admits the probability density function
$$p(Y_{0:n} = y_{0:n}, X_{1:n} = x_{1:n}) = \varphi(y_n - r_{x_n}(y_{n-1})) \cdots \varphi(y_1 - r_{x_1}(y_0))\, a_{x_{n-1}x_n} \cdots a_{x_1 x_2}\, \mu_{x_1}\, p(Y_0 = y_0),$$
with respect to the product measure $\lambda^{\otimes n} \otimes \lambda_c^{\otimes n}$, where $\lambda$ and $\lambda_c$ denote the Lebesgue and counting measures respectively.
(ii) If $\varphi$ is a bounded density, then the joint density of $(Y_k, Y_{k'})$ satisfies
where $\mathcal{M}_a^b$, with $a, b \in \mathbb{Z}$, is the $\sigma$-algebra generated by $(Y_k)_{k=a:b}$. The process is called absolutely regular mixing if
$$\beta_n := E\left[\operatorname{ess\,sup}\left\{ \left|P(B \mid \mathcal{M}_{-\infty}^0) - P(B)\right| : B \in \mathcal{M}_n^{\infty} \right\}\right] \longrightarrow 0, \quad \text{as } n \to \infty. \tag{4}$$
The values $\alpha_n$ and $\beta_n$ are called strong mixing and absolutely regular mixing coefficients respectively. For properties and examples under several mixing assumptions, see Doukhan (1994). In general, we have the following inequality: $2\alpha_n \le \beta_n \le 1$. This implies that all $\beta$-mixing processes are also $\alpha$-mixing. Note that the $\alpha$-mixing coefficients can be rewritten as follows:
$$\alpha_n := \sup\left\{ |\operatorname{cov}(\xi, \zeta)| : \|\xi\|_\infty \le 1,\ \|\zeta\|_\infty \le 1,\ \xi \in \mathcal{M}_{-\infty}^0,\ \zeta \in \mathcal{M}_n^{\infty} \right\}. \tag{5}$$
The extended process $Z$ is geometrically ergodic. This implies a geometric rate for the $\beta$-mixing coefficients, so that $Z$ is an $\alpha$-mixing process. From (5), we prove in Proposition 2.2 that the $\alpha$-mixing property of $Z = (Y, X)$ is transferred to the component $Y$.

Proposition 2.2. The MS-NAR model under conditions E1-E7 is $\alpha$-mixing, and its coefficients $\alpha_n(Y)$ decrease geometrically.
2.4. Identifiability
We prove the identifiability of the MS-NAR model, following the recent work of Gassiat et al. (2016) on non-parametric estimation of HMMs. We consider the following assumptions:
I1 The probability transition matrix $A = (a_{ij})_{i,j=1:m}$ has full rank.
I2 The functions $r_1,\dots,r_m$ are distinct a.s.; that is, if $i \ne j$, then $r_i(y') \ne r_j(y')$ for almost all $y'$.
I3 The probability density function $\varphi$ is such that the functions $\varphi(y - r_1(y')),\dots,\varphi(y - r_m(y'))$ are linearly independent; that is,
$$\sum_{i=1}^{m} c_i\, \varphi(y - r_i(y')) = 0 \ \text{ for all } y, y' \implies c_1 = \dots = c_m = 0.$$
I4 The probability density function $\varphi$ is such that $\varphi(y - \tilde r_{\tilde k}(y')) = \varphi(y - r_k(y'))$ for all $y$ if and only if $\tilde r_{\tilde k}(y') = r_k(y')$.
We denote by $p^{(3)}_{A,r}$ the probability density function of $(Y_0, Y_1, Y_2, Y_3)$. Notice that if the Markov chain $X$ is irreducible, there exists a unique invariant distribution $\mu$ and (Lemma 2.1) $p^{(3)}_{A,r}$ is well defined by
$$p^{(3)}_{A,r}(y_{0:3}) = p(Y_0 = y_0) \sum_{i=1}^{m} \left( \sum_{j=1}^{m} \mu_j a_{ji}\, \varphi(y_1 - r_j(y_0)) \right) \varphi(y_2 - r_i(y_1)) \left( \sum_{j=1}^{m} a_{ij}\, \varphi(y_3 - r_j(y_2)) \right),$$
where $p(Y_0 = y_0)$ is the probability density of $Y_0$, and $\mu = (\mu_1,\dots,\mu_m)$ is a stationary distribution of $A$, which is the distribution of $X_1$. For the case of a non-irreducible Markov chain, the distribution of $X_1$ has to be specified, as there arise many invariant distributions.

Proposition 2.3. Assume that $m$ is known. Under conditions I1-I4, $A$ and $r$ are identifiable from $p^{(3)}_{A,r}$, up to label swapping of the hidden states.

The proof of this proposition follows the same ideas given in Gassiat et al. (2016).
Proof
We have to prove that if $\tilde A$ is an $m \times m$ probability transition matrix and $\tilde r = (\tilde r_1,\dots,\tilde r_m)$ are regression functions such that $p^{(3)}_{\tilde A, \tilde r} = p^{(3)}_{A,r}$, then there exists a permutation $\tau$ of the set $\{1,\dots,m\}$ such that, for all $i, j = 1,\dots,m$, $\tilde a_{ij} = a_{\tau(i)\tau(j)}$ and $\tilde r_i = r_{\tau(i)}$.
From conditions I1 and I3, the functions $\left(\sum_{j=1}^{m} \mu_j a_{ji}\, \varphi(y - r_j(y'))\right)_{i=1:m}$ are linearly independent; similarly, the functions $\left(\sum_{j=1}^{m} a_{ij}\, \varphi(y - r_j(y'))\right)_{i=1:m}$ are also linearly independent. Then, according to Allman et al. (2009, Theorem 8), there exists a permutation $\tau$ of the set $\{1,\dots,m\}$ such that, for all $i = 1,\dots,m$,
$$\sum_{j=1}^{m} \tilde\mu_j \tilde a_{ji}\, \varphi(y_1 - \tilde r_j(y_0)) = \sum_{j=1}^{m} \mu_j a_{j\tau(i)}\, \varphi(y_1 - r_j(y_0)),$$
$$\sum_{j=1}^{m} \tilde\mu_j \tilde a_{ji}\, \varphi(y_1 - r_{\tau(j)}(y_0)) = \sum_{j=1}^{m} \mu_{\tau(j)} a_{\tau(j)\tau(i)}\, \varphi(y_1 - r_{\tau(j)}(y_0)),$$
$$\sum_{j=1}^{m} \tilde a_{ij}\, \varphi(y_3 - r_{\tau(j)}(y_2)) = \sum_{j=1}^{m} a_{\tau(i)\tau(j)}\, \varphi(y_3 - r_{\tau(j)}(y_2)).$$
Remark 2.2. Condition I2 implies the identifiability of the regression functions $r_i$ for almost all $y'$. Nevertheless, the continuity given by condition E2 ensures the identifiability for all $y'$.
Corollary 2.1 applies in the case where the innovation $e$ is Gaussian white noise.
Corollary 2.1. Assume that $m$ is known and that $\varphi$ is the density of a Gaussian distribution with zero mean and variance $\sigma^2$. Under conditions I1-I2, $A$ and $r$ are identifiable from $p^{(3)}_{A,r}$ up to label swapping of the hidden states.
for $i = 1,\dots,m$ and $y \in \mathbb{R}$.
Let us introduce
with
$$\hat g_{i,n}(y) := \frac{1}{nh} \sum_{k=0}^{n-1} Y_{k+1}\, K_h(y - Y_k)\, \mathbb{1}_i(X_{k+1}), \qquad \hat f_{i,n}(y) := \frac{1}{nh} \sum_{k=0}^{n-1} K_h(y - Y_k)\, \mathbb{1}_i(X_{k+1}), \tag{8}$$

B1 $\|K\|_\infty < \infty$.
B2 $\|\varphi\|_\infty < \infty$.

Under condition B1, the kernel $K$ is of order 2; that is, $\int t K(t)\,dt = 0$ and $0 < \int t^2 K(t)\,dt < \infty$.
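The order-2 property can be checked numerically for the standard Gaussian kernel, which is also the kernel used later in the simulations; this small script is only an illustration of the moment conditions, not part of the estimation procedure.

```python
import numpy as np

# Numerical check that the standard Gaussian kernel is of order 2:
# it integrates to one, its first moment vanishes, and its second
# moment is finite and strictly positive.
t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
K = np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
m0 = np.sum(K) * dt          # integral of K(t) dt, close to 1
m1 = np.sum(t * K) * dt      # integral of t K(t) dt, close to 0 by symmetry
m2 = np.sum(t**2 * K) * dt   # integral of t^2 K(t) dt, close to 1
```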
Let $C$ be a compact subset of $\mathbb{R}$. We assume the following regularity conditions:
We define $g_{2,i}(y) := f_i(y)\, r_{i,0}(y, y) = f_i(y)\, E(Y_1^2 \mid Y_0 = y, X_1 = i)$, which is continuous owing to condition R3.
Finally, we impose one of the two following moment conditions:
Remark 2.3. Note that M1 implies M2 and M2 implies E5. The latter is a sufficient condition for the stability of
the MS-NAR model.
In view of the independence of Y0 and e1 , condition M1 implies:
where $c$ is a strictly positive constant. Moreover, E3 and M2 imply $E(|Y_1|^s) < \infty$; this condition is also implied by M1. Conditions M1 and M2 are assumed to hold, the former so as to obtain the a.s. uniform convergence over compact sets and the latter for the a.s. pointwise convergence.
We now establish the uniform convergence over compact sets of the Nadaraya-Watson kernel estimator $\hat r_{i,n}$ defined in (2). For this purpose, we introduce the following three technical lemmas, whose proofs are given in Appendix A.
The first lemma allows us to treat in a unified way the asymptotic behaviour of the variances and covariances of $\hat f_{i,n}$ and a truncated version of $\hat g_{i,n}$. The other two lemmas give an asymptotic bound for the bias and variance terms in the estimation of the regression functions $r_i$.
We denote by $A^{(k)}_{ij}$ the $(i,j)$th entry of the $k$th power of the matrix $A$. We write $B_{n,h} \simeq B_h$ to mean that $\lim_{h\to 0}\lim_{n\to\infty} B_{n,h} = \lim_{h\to 0} B_h$; that is, for large enough $n$ and small enough $h$, $B_{n,h}$ is approximately equal to $B_h$. Analogously, we write $B_{n,h} \lesssim B_h$ to mean that $\lim_{h\to 0}\lim_{n\to\infty} B_{n,h} \le \lim_{h\to 0} B_h$. In particular, we write $B_{n,h} \lesssim B$ to mean that $B$ is a bound for the sequence $B_{n,h}$, for large enough $n$ and small enough $h$.

Lemma 2.2. Assume that the MS-NAR model satisfies conditions E1-E7, D1, B1-B2, S1 and R2-R3 on a compact set $C$. Let $(M_n)_{n\ge 1}$ be a non-decreasing sequence of positive numbers tending to infinity. Let
(i) $\operatorname{var}(T_{0,n}) \simeq h\left(a^2 f_i(y) + 2ab\, g_i(y) + b^2 g_{2,i}(y)\right)\|K\|_2^2 + o(h^2)$.
(ii) $\operatorname{cov}(T_{0,n}, T_{k,n}) \lesssim h^2\left(a^2 + 2ab\,(|r_i(y)| + E(|e_1|)) + b^2 r_{i,k}(y, y)\right) A^{(k)}_{ii}\, \mu_i\, \|\varphi\|_\infty + o(h^3)$.
(iii) $\operatorname{cov}(T_{0,n}, T_{k,n}) \le (a^2 + 2ab M_n + b^2 M_n^2)\, 4\|K\|_\infty^2\, \alpha_k$, for any $n \ge 0$.
Lemma 2.3. Assume that the MS-NAR model satisfies conditions E1-E4, E6-E7, D1, B1-B2, S1, M2 and R2-R3 on a compact set $C$. Let $(M_n)_{n\ge 1}$ be a positive non-decreasing sequence tending to infinity, and $s \ge 1$. Then the following asymptotic inequalities hold true, for all $y \in C$:
(i) $P\left(|\hat g_{i,n}(y) - E\hat g_{i,n}(y)| > \epsilon\right) \lesssim 4\left(1 + \frac{\epsilon^2 nh}{16 c_1}\right)^{-u_n/2} + c_2\, \frac{16 M_n}{\epsilon h}\, \rho^{u_n} + c_3\, \frac{M_n^{(2-s)}}{\epsilon^2 h^2}$.
(ii) $P\left(|\hat f_{i,n}(y) - E\hat f_{i,n}(y)| > \epsilon\right) \lesssim 4\left(1 + \frac{\epsilon^2 nh}{16 \tilde c_1}\right)^{-u_n/2} + c_2\, \frac{16}{\epsilon h}\, \rho^{u_n}$,
where $c_1 = \sup_{y \in C} g_{2,i}(y)\|K\|_2^2$, $\tilde c_1 = \sup_{y \in C} f_i(y)\|K\|_2^2$, $c_2 > 0$ and $0 < \rho < 1$ are such that the mixing coefficient $\alpha_n(Y) \le c_2 \rho^n$, $u_n = (h \log n)^{-1}$, and $c_3 = \|K\|_1^2$.
Lemma 2.4. Assume that the MS-NAR model satisfies conditions E1-E7, D1 and R2 on a compact set $C$. Then the following statements hold true.
Remark 2.4. Lemma 2.2 is a preliminary result, which is necessary to prove Lemma 2.3.
Theorem 2.1. Assume that the MS-NAR model (1) satisfies conditions E1-E4, E6-E7, D1, B1-B2, S1 and R1-R3 on a compact set $C$. Then:
(i) If $nh_n/\log n \to \infty$ and condition M2 holds, then for all $y \in C$, $|\hat r_{i,n}(y) - r_i(y)| \to 0$ a.s.
(ii) If $nh_n/\log n \to \infty$ and condition M1 holds, then $\sup_{y \in C} |\hat r_{i,n}(y) - r_i(y)| \to 0$ a.s.
3. MAIN RESULTS
We present our Robbins-Monro-type algorithm for the non-parametric estimation of the MS-NAR model in the partially observed data case, and we prove the consistency of the estimator.
The Nadaraya-Watson estimator $\hat r_n(y) = (\hat r_{1,n}(y),\dots,\hat r_{m,n}(y))$, for each $y$, can be interpreted as the solution of a locally weighted least-squares problem; in our case, this amounts to finding the minimum of the potential $U$ defined by
$$U(y, Y_{0:n}, X_{1:n}, \theta) = \frac{1}{nh} \sum_{k=0}^{n-1} \sum_{i=1}^{m} K_h(y - Y_k)\, \mathbb{1}_i(X_{k+1})\, (Y_{k+1} - \theta_i)^2, \tag{9}$$
with respect to $\theta = (\theta_1,\dots,\theta_m)$ in a convex open set $\Theta$ of $\mathbb{R}^m$. Thus, the regression estimator $\hat r_n$ is given by
In the partially observed data case, that is, when we do not observe $(X_k)_{k\ge 1}$, we cannot obtain an explicit expression for the solution $\hat r_n(y)$. Thus, we must consider a recursive algorithm to approximate this solution. Our approach approximates the estimator $\hat r_n(y)$ by a stochastic recursive algorithm similar to that of Robbins-Monro (Cappé et al., 2005; Duflo, 1996; Yao, 2000). This involves two steps: first, a Monte Carlo step that restores the missing data $(X_k)_{k\ge 1}$ and, second, a Robbins-Monro approximation so as to minimize the potential $U$.
At this point, we introduce some further notation. For $1 \le i \le m$, $n_i(X_{1:n}) = \sum_{k=1}^{n} \mathbb{1}_i(X_k)$ is the number of visits of the Markov chain $(X_k)_{k\ge 1}$ to state $i$ in the first $n$ steps, and $n_{ij}(X_{1:n}) = \sum_{k=2}^{n} \mathbb{1}_{i,j}(X_{k-1}, X_k)$ is the number of transitions from $i$ to $j$ in the first $n$ steps. $\Psi^t = (\theta^t, A^t)$ is a vector containing the estimated values $\theta^t = (\theta^t_1,\dots,\theta^t_m)$ and the estimated probability transition matrix $A^t$ at the $t$th iteration of the Robbins-Monro algorithm.
Step E. Update the estimate $\Psi^t = (\theta^t, A^t)$ by
$$\theta^t = \theta^{t-1} - \gamma_t\, \nabla_\theta U\!\left(y, Y_{0:n}, X_{1:n}^t, \theta^{t-1}\right), \tag{10}$$
where $\nabla_\theta U(y, Y_{0:n}, X_{1:n}^t, \theta^{t-1}) = \nabla_\theta U(y, Y_{0:n}, X_{1:n}^t, \theta)\big|_{\theta = \theta^{t-1}}$, $A^t = (a^t_{ij})_{i,j=1:m}$ with $a^t_{ij} = n_{ij}(X_{1:n}^t)/n_i(X_{1:n}^t)$, and $\mu^t = (\mu^t_i)_{i=1:m}$ with $\mu^t_i = n_i(X_{1:n}^t)/n$.

Step A. Reduce the asymptotic variance of the algorithm by using the averages $\bar\theta^t = \frac{1}{t}\sum_{k=1}^{t} \theta^k$ instead of $\theta^t$, which can be computed recursively by $\bar\theta^0 = \theta^0$ and
$$\bar\theta^t = \bar\theta^{t-1} + \frac{\theta^t - \bar\theta^{t-1}}{t}. \tag{11}$$
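Steps E and A can be sketched together as follows. This is a simplified illustration, not the full algorithm: the restored labels are produced by a user-supplied sampler `sample_X`, the step sizes are taken as $\gamma_t = 1/t$, and the function names `grad_U` and `rm_estimate` are ours. The gradient follows the quadratic potential (9).

```python
import numpy as np

def grad_U(theta, y, Y, X, h):
    """Gradient of the quadratic potential U of (9) with respect to theta:
    dU/dtheta_i = -(2/(n h)) sum_k K((y - Y_k)/h) 1_i(X_{k+1}) (Y_{k+1} - theta_i)."""
    n = len(X)
    K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)  # Gaussian kernel (our choice)
    w = K((y - Y[:-1]) / h) / h
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        mask = (X == i + 1)                  # states labelled 1..m
        g[i] = -2.0 / n * np.sum(w * mask * (Y[1:] - theta[i]))
    return g

def rm_estimate(y, Y, sample_X, theta0, h, T=200):
    """Step E (gradient update with restored labels) and Step A (averaging)."""
    theta, theta_bar = theta0.copy(), theta0.copy()
    for t in range(1, T + 1):
        Xt = sample_X()                      # Monte Carlo restoration step
        gamma = 1.0 / t                      # gamma_t: sum infinite, sum of squares finite
        theta = theta - gamma * grad_U(theta, y, Y, Xt, h)
        theta_bar = theta_bar + (theta - theta_bar) / t   # averaging recursion (11)
    return theta_bar
```

With a single regime and constant responses, the iterates drift towards the local average, illustrating that the fixed points of the update are the zeros of the gradient.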
The following result enables us to write the algorithm as a stochastic gradient algorithm. Let $E^{\Psi^0}\left[U(y, Y_{0:n}, X_{1:n}^t, \theta) \mid \mathcal{F}_{t-1}\right] = u(y, Y_{0:n}, \theta)$, with $E^{\Psi^0}(\cdot) = E(\cdot \mid Y_{0:n}, \Psi^0)$, $\Psi^0 = (\theta^0, A^0)$, and $\mathcal{F}_{t-1}$ the $\sigma$-algebra generated by $(X_{1:n}^s)_{s=1:(t-1)}$. This conditional expectation is in fact the expectation with respect to the conditional distribution $p(X_{1:n}^t \mid Y_{0:n}, \Psi^0)$. The proof is given in Appendix A.
$$u(y, Y_{0:n}, \theta) = \frac{1}{nh} \sum_{k=0}^{n-1} \sum_{i=1}^{m} K_h(y - Y_k)\, P^{\Psi^0}(X_{k+1} = i \mid Y_{0:n})\, (Y_{k+1} - \theta_i)^2, \tag{12}$$
and $E^{\Psi^0}\left[\nabla_\theta U(y, Y_{0:n}, X_{1:n}^t, \theta) \mid \mathcal{F}_{t-1}\right] = \nabla_\theta u(y, Y_{0:n}, \theta)$.
Therefore, the restoration-estimation algorithm is a stochastic gradient algorithm that minimizes $u(y, Y_{0:n}, \theta)$ and can be written as
$$\theta^t = \theta^{t-1} - \gamma_t\left[\nabla_\theta u(y, Y_{0:n}, \theta^{t-1}) + \varsigma_t\right], \tag{13}$$
where
$$\varsigma_t = \nabla_\theta U\!\left(y, Y_{0:n}, X_{1:n}^t, \theta^{t-1}\right) - \nabla_\theta u(y, Y_{0:n}, \theta^{t-1}). \tag{14}$$
Thus, the stochastic gradient algorithm is obtained by perturbation of the corresponding gradient system. The Monte Carlo restoration step draws from the conditional distribution
$$p(X_{1:n} = x_{1:n} \mid Y_{0:n}, \Psi^{t-1}) = \frac{\mu^{t-1}_{x_1}\, p(Y_1 \mid Y_0, X_1 = x_1, \theta^{t-1}) \cdots a^{t-1}_{x_{n-1}x_n}\, p(Y_n \mid Y_{n-1}, X_n = x_n, \theta^{t-1})}{p(Y_{1:n} \mid Y_0, \Psi^{t-1})}.$$
Provided that $X_{k+1}$ is known, $p(X_k \mid X_{k+1}, Y_{0:k}, \Psi^{t-1})$ is a discrete distribution. The following sampling strategy is suggested: for $k = 2,\dots,n$ and $i = 1,\dots,m$, compute the optimal filter recursively by means of
$$p(X_k = i \mid Y_{0:k}, \Psi^{t-1}) \propto p(Y_k \mid Y_{k-1}, X_k = i, \theta^{t-1}) \sum_{j=1}^{m} a^{t-1}_{ji}\, p(X_{k-1} = j \mid Y_{0:k-1}, \Psi^{t-1}).$$
Then sample $X_n$ from $p(X_n \mid Y_{0:n}, \Psi^{t-1})$ and, for $k = n-1,\dots,1$, sample $X_k$ using
$$p(X_k = i \mid X_{k+1} = x_{k+1}, Y_{0:k}, \Psi^{t-1}) = \frac{a^{t-1}_{i x_{k+1}}\, p(X_k = i \mid Y_{0:k}, \Psi^{t-1})}{\sum_{j=1}^{m} a^{t-1}_{j x_{k+1}}\, p(X_k = j \mid Y_{0:k}, \Psi^{t-1})}.$$
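The forward-filtering, backward-sampling strategy just described can be sketched as follows. Gaussian innovations and a uniform initial law for $X_1$ are our own simplifying assumptions, as is the function name `ffbs_sample`.

```python
import numpy as np

def ffbs_sample(Y, A, r_funcs, sigma, rng):
    """Draw X_{1:n} from p(X_{1:n} | Y_{0:n}) by forward filtering, backward
    sampling; Gaussian innovations with standard deviation sigma assumed."""
    m, n = A.shape[0], len(Y) - 1
    phi = lambda e: np.exp(-0.5 * (e / sigma)**2) / (sigma * np.sqrt(2 * np.pi))
    # Forward pass: filt[k, i] = p(X_{k+1} = i+1 | Y_{0:k+1})
    filt = np.zeros((n, m))
    mu = np.ones(m) / m                      # uniform initial law of X_1 (assumption)
    lik = np.array([phi(Y[1] - r(Y[0])) for r in r_funcs])
    filt[0] = mu * lik
    filt[0] /= filt[0].sum()
    for k in range(1, n):
        lik = np.array([phi(Y[k + 1] - r(Y[k])) for r in r_funcs])
        filt[k] = lik * (filt[k - 1] @ A)    # optimal filter recursion
        filt[k] /= filt[k].sum()
    # Backward pass: sample X_n, then X_k given X_{k+1}
    X = np.zeros(n, dtype=int)
    X[-1] = rng.choice(m, p=filt[-1])
    for k in range(n - 2, -1, -1):
        p = filt[k] * A[:, X[k + 1]]         # a_{i,x_{k+1}} * filter weight
        X[k] = rng.choice(m, p=p / p.sum())
    return X + 1                             # states labelled 1..m
```

With two well-separated constant regimes, the sampled path pins down the regimes almost deterministically, which matches the positivity and ergodicity argument that follows.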
Following the proof reported in Rosales (2004), we have that the sequence $(X_{1:n}^t)_{t\in\mathbb{N}}$ is an ergodic Markov chain with invariant distribution $p(X_{1:n} = x_{1:n} \mid Y_{0:n}, \Psi)$. It is sufficient to note that the sequence $(X_{1:n}^t)_{t\in\mathbb{N}}$ is an irreducible and aperiodic Markov chain on the finite state space $\{1,\dots,m\}^n$. Irreducibility and aperiodicity follow directly from the positivity of the kernel
$$Q(X_{1:n}^t \mid X_{1:n}^{t-1}, Y_{0:n}, \Psi^{t-1}) = p(X_n^t \mid Y_{0:n}, \Psi^{t-1}) \prod_{k=1}^{n-1} p(X_k^t \mid X_{k+1}^t, Y_{0:k}, \Psi^{t-1}) > 0.$$
In this case, the standard ergodic result for finite Markov chains applies (Kemeny and Snell, 1960):
$$\left\| Q(X_{1:n}^t \mid X_{1:n}^{t-1}, Y_{0:n}, \Psi) - p(X_{1:n} \mid Y_{0:n}, \Psi) \right\| \le c\, \rho^t. \tag{15}$$
Moreover, (15) is satisfied with $c = \operatorname{card}(\{1,\dots,m\}^n)$, $\rho = 1 - 2\underline{Q}$ and $\underline{Q} = \inf_{x, x', \Psi} Q(x' \mid x, \Psi)$ for $x, x' \in \{1,\dots,m\}^n$.
Step E: Estimation
In each iteration of this algorithm, we evaluate $\nabla_\theta U(y, Y_{0:n}, X_{1:n}, \theta)$, the gradient of the potential. For each $1 \le i \le m$, we compute the components
$$\frac{\partial U}{\partial \theta_i}(y, Y_{0:n}, X_{1:n}, \theta) = -2\left(\hat g_{i,n}(y; Y_{0:n}, X_{1:n}) - \theta_i\, \hat f_{i,n}(y; Y_{0:n}, X_{1:n})\right).$$
This quantity is updated in each iteration. It has the advantage that the ratio $\hat r_{i,n}$ is not computed directly, avoiding the zeros of the function $\hat f_{i,n}$.
3.2. Consistency
The convergence analysis of Robbins-Monro approximations is well studied in Duflo (1996) in the general case. In this article, we use a framework for the convergence of the stochastic gradient algorithm similar to the one used by Cappé et al. (2005, p. 431) for the likelihood function in HMMs. A consideration, for our particular case, is that $u(\cdot)$ is a continuously differentiable function of $\theta$. The following convergence result is given for each fixed $y$.
Theorem 3.1. Assume condition B1, that $(\gamma_t)$ is a positive sequence such that $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$, and that the closure of the set $\{\bar\theta^t\}$ is a compact subset of $\Theta$. Then, almost surely, the sequence $\bar\theta^t$ satisfies $\lim_{t\to\infty} \nabla_\theta u(y, Y_{0:n}, \bar\theta^t) = 0$. Furthermore, $\lim_{t\to\infty} \bar\theta^t = \theta^*$ and $\nabla_\theta u(y, Y_{0:n}, \theta^*) = 0$, a.s.
Proof
Let $M^t = \sum_{s=0}^{t} \gamma_s \varsigma_s$. The sequence $M^t$ is an $\mathcal{F}_t$-martingale; in fact,
$$E(M^t \mid \mathcal{F}_{t-1}) = E(\gamma_t \varsigma_t + M^{t-1} \mid \mathcal{F}_{t-1}) = E(\gamma_t \varsigma_t \mid \mathcal{F}_{t-1}) + E(M^{t-1} \mid \mathcal{F}_{t-1}) = M^{t-1}.$$
Moreover, it satisfies $\sum_{t=1}^{\infty} E(\|M^t - M^{t-1}\|^2 \mid \mathcal{F}_{t-1}) < \infty$. Indeed, $M^t - M^{t-1} = \gamma_t \varsigma_t$ and
$$\|\varsigma_t\|^2 = \frac{4}{n^2 h^2} \sum_{i=1}^{m} \left( \sum_{k=0}^{n-1} (Y_{k+1} - \theta_i^{t-1})\, K_h(y - Y_k)\, B_i^t(k) \right)^2,$$
where $B_i^t(k) = \mathbb{1}_i(X_{k+1}^t) - E\left[\mathbb{1}_i(X_{k+1}^t) \mid \mathcal{F}_{t-1}\right]$ are centred Bernoulli random variables. Therefore,
$$E(\|\varsigma_t\|^2 \mid \mathcal{F}_{t-1}) = \frac{4}{n^2 h^2} \sum_{i=1}^{m} \sum_{k,k'=0}^{n-1} (Y_{k+1} - \theta_i^{t-1})(Y_{k'+1} - \theta_i^{t-1})\, K_h(y - Y_k)\, K_h(y - Y_{k'})\, \Sigma_i^t(k, k'),$$
where $\Sigma_i^t(k, k') = E\left[B_i^t(k) B_i^t(k') \mid \mathcal{F}_{t-1}\right]$, and
$$E(\|\varsigma_t\|^2 \mid \mathcal{F}_{t-1}) \le \frac{1}{n^2 h^2} \sum_{i=1}^{m} \left( \sum_{k=0}^{n-1} (Y_{k+1} - \theta_i^{t-1})\, K_h(y - Y_k) \right)^2 = \|\Phi(\theta^{t-1})\|^2, \tag{16}$$
where $\Phi(\cdot) = (\Phi_1(\cdot),\dots,\Phi_m(\cdot))$ and $\Phi_i(\theta) = \frac{1}{nh} \sum_{k=0}^{n-1} (Y_{k+1} - \theta_i)\, K_h(y - Y_k)$.
By compactness, $\|\Phi(\cdot)\|^2$ is bounded; therefore
$$\sum_{t=1}^{\infty} E(\|M^t - M^{t-1}\|^2 \mid \mathcal{F}_{t-1}) \le \sup_\theta \|\Phi(\theta)\|^2 \sum_{t=1}^{\infty} \gamma_t^2 < \infty.$$
Thus, by applying the conditional Borel-Cantelli lemma in Cappé et al. (2005, Lemma 11.2.9), we see that the sequence $M^t$ has a finite limit a.s., and according to Cappé et al. (2005, Theorem 11.3.2), the sequence $\theta^t$ satisfies
$$\lim_{t\to\infty} \nabla_\theta u(y, Y_{0:n}, \theta^t) = 0.$$
By continuity of the function $\nabla_\theta u$, we conclude that $\theta^* = \lim_{t\to\infty} \theta^t$ satisfies $\nabla_\theta u(y, Y_{0:n}, \theta^*) = 0$, and by the Cesàro theorem, $\lim_{t\to\infty} \bar\theta^t = \theta^*$.
We have shown in Section 3.1 that the sequence $(X_{1:n}^t)_{t\in\mathbb{N}}$ is an ergodic Markov chain with invariant distribution given by $p(X_{1:n} = x_{1:n} \mid Y_{0:n}, \Psi^*)$. The rate of convergence is given by equation (15). Moreover, for all $x_{1:n} \in \{1,\dots,m\}^n$, this invariant distribution satisfies
$$p(X_{1:n} = x_{1:n} \mid Y_{0:n}, \Psi^*) = \frac{\mu^*_{x_1}\, p(Y_1 \mid Y_0, X_1 = x_1, \theta^*) \cdots a^*_{x_{n-1}x_n}\, p(Y_n \mid Y_{n-1}, X_n = x_n, \theta^*)}{p(Y_{1:n} \mid Y_0, \theta^*)},$$
where $\Psi^* = (\theta^*, A^*)$, $\theta^* = \lim_{t\to\infty}\bar\theta^t$ is the limit obtained in Theorem 3.1, and $A^* = \lim_{t\to\infty} A^t$ is the probability transition matrix of the limit Markov chain $X$; that is, $A^* = (a^*_{ij}(Y_{0:n}))_{i,j=1:m}$, given by
$$a^*_{ij}(Y_{0:n}) = p(X_{k+1} = j \mid X_k = i, Y_{0:n}, \Psi^*),$$
Theorem 2.1 implies that, if $nh_n/\log n \to \infty$, then for all $y \in C$, $\hat g_{i,n}(y) \to g_i(y)$ and $\hat f_{i,n}(y) \to f_i(y)$ a.s. as $n \to \infty$. This implies that $E[\hat g_{i,n}(y) \mid Y_{0:n}, \Psi^*] \to g_i(y)$, $E[\hat f_{i,n}(y) \mid Y_{0:n}, \Psi^*] \to f_i(y)$, and $\theta^*_i(y, Y_{0:n}) \to r_i(y)$, a.s. as $n \to \infty$. As a consequence, we obtain

Remark 3.1. Note that $\int f_i(y)\,dy = \mu_i$. Thus, if the compact set $C$ is such that $P(Y_0 \in C) = 1$, then $\int_C \hat f_{i,n}(y)\,dy \to \mu_i$ when $n \to \infty$.
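The integral identity in Remark 3.1 can be checked numerically. The toy model below, with two linear regimes of our own choosing and a symmetric chain (so that each state has stationary weight 1/2), is only an illustrative sanity check, not one of the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.5], [0.5, 0.5]])   # stationary distribution (1/2, 1/2)
n, h = 20000, 0.25
K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

# Toy MS-NAR: r_1(y) = 0.3 y, r_2(y) = -0.3 y, Gaussian innovations (our choices).
X = np.zeros(n + 1, dtype=int)
Y = np.zeros(n + 1)
for k in range(1, n + 1):
    X[k] = rng.choice(2, p=A[X[k - 1]])
    Y[k] = (0.3 if X[k] == 0 else -0.3) * Y[k - 1] + 0.5 * rng.normal()

# Integrate the kernel estimate f_hat_1 over a grid covering the data:
# the integral equals n_1 / n exactly, up to grid and tail truncation.
grid = np.linspace(-4.0, 4.0, 401)
f1 = np.array([np.sum(K((y - Y[:-1]) / h) * (X[1:] == 0)) / (n * h) for y in grid])
integral = np.sum(f1) * (grid[1] - grid[0])
```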
4. NUMERICAL EXAMPLES
We illustrate the performance of the algorithm developed in the previous section by applying it to simulated data.
4.1. Example 1
In this first example, we use an MS-NAR model with $m = 2$ states and autoregressive functions
$$r_1(y) = 0.7y + 2e^{-(10y)^2}, \qquad r_2(y) = \frac{2}{1 + e^{10y}} - 1,$$
where $r_1$ is a bump function and $r_2$ is a decreasing logistic function. These functions were reported by Franke et al. (2011). Let $\varphi$ be the density of a Gaussian distribution with zero mean and variance $\sigma^2 = 0.4$. The transition probability matrix is given by
$$A = \begin{pmatrix} 0.98 & 0.02 \\ 0.02 & 0.98 \end{pmatrix}.$$
We used a straightforward implementation of the algorithms described earlier. We generate a sample of length $n = 1000$. For each $k$, we simulate $X_k$ and then use it to determine $Y_k$. The simulated data are plotted in Figure 1 (left).
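A minimal simulation of this example might look as follows; the functional forms are transcribed from the display above, while the initial state and the random seed are arbitrary choices of ours.

```python
import numpy as np

# Regression functions of Example 1: a bump and a decreasing logistic
# (forms as in the display above; innovation variance sigma^2 = 0.4).
r1 = lambda y: 0.7 * y + 2 * np.exp(-(10 * y)**2)
r2 = lambda y: 2 / (1 + np.exp(10 * y)) - 1
A = np.array([[0.98, 0.02], [0.02, 0.98]])
sigma = np.sqrt(0.4)

def simulate_msnar(n, rng):
    """Simulate (Y_0..Y_n, X_1..X_n); the chain is started in state 1 (our choice)."""
    X = np.zeros(n + 1, dtype=int)           # X[k] in {0, 1} codes states 1 and 2
    Y = np.zeros(n + 1)
    for k in range(1, n + 1):
        X[k] = rng.choice(2, p=A[X[k - 1]])
        Y[k] = (r1 if X[k] == 0 else r2)(Y[k - 1]) + sigma * rng.normal()
    return Y, X[1:] + 1                      # states relabelled 1..m

Y, X = simulate_msnar(1000, np.random.default_rng(1))
```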
For the estimation of the regression function $r_i$, we use the standard Gaussian density as the kernel function $K$, in spite of the fact that it is not compactly supported. As bandwidth parameter, we take $h = (n/\log(n))^{-1/5}$. Assuming that the complete data $(Y_{0:n}, X_{1:n})$ are available, we show in Figure 1 (right) the performance of $\hat r_1$ and $\hat r_2$.
We implemented the restoration-estimation algorithm for the data described earlier. The initial estimates for the Markov chain $X_{1:n}^0$ in Step 0 of our algorithm were obtained by using a SAEM algorithm for the MS-AR model. The estimated linear functions were of the form $\hat r_1(y) = 0.8239y - 0.0218$ and $\hat r_2(y) = -0.2943y + 0.6334$. Figure 2 (left) shows the scatter plot of $Y_k$ against $Y_{k-1}$ and the linear adjustment. In Figure 2 (right), we show the scatter plot of $Y_k$ against $Y_{k-1}$, $r_1$ and $r_2$ (solid lines) and the respective Robbins-Monro estimates (dashed lines) for the last iteration.
We have implemented our Robbins-Monro procedure with $t = 1,\dots,T$ iterations and with the smoothing step defined as
Figure 1. Simulated data Y0Wn (left). Estimated regression functions for complete data .Y0Wn ; X1Wn /. The real functions are
shown with solid lines and the estimates with dashed lines (right). [Colour figure can be viewed at wileyonlinelibrary.com]
$$\gamma_t = \begin{cases} 1, & t \le T_1, \\ (t - T_1)^{-1}, & t \ge T_1 + 1. \end{cases}$$
The estimated transition matrix is
$$\hat A = \begin{pmatrix} 0.983 & 0.017 \\ 0.017 & 0.983 \end{pmatrix}.$$
The squared estimation error $\|A_t - A\|_2^2$ is shown in Figure 3; we observe that convergence is reached in fewer than 100 iterations.
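The step-size schedule and a generic Robbins-Monro update of the transition-matrix estimate can be sketched as follows. This is a hedged illustration of such a scheme, not the authors' exact algorithm; `A_draw` stands for the estimate computed from the current Monte Carlo restoration of the hidden chain:

```python
import numpy as np

def gamma(t, T1):
    """Smoothing step: gamma_t = 1 for t <= T1 and (t - T1)^(-1) afterwards."""
    return 1.0 if t <= T1 else 1.0 / (t - T1)

def rm_update(A_est, A_draw, t, T1):
    """One Robbins-Monro step towards the current Monte Carlo estimate."""
    g = gamma(t, T1)
    A_new = A_est + g * (A_draw - A_est)
    return A_new / A_new.sum(axis=1, keepdims=True)  # keep rows stochastic
```

During the burn-in phase $t \le T_1$ the step equals 1, so the estimate is simply replaced; afterwards the decreasing step averages the draws.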
Figure 2. Parameter estimation for SAEM and scatter plot of simulated data (left). Non-parametric estimation by the Robbins-Monro procedure; the real functions are shown with solid lines and the estimates with dashed lines (right). The points are labelled with respect to the real state of $X_k$: dots for $X_k = 1$ and crosses for $X_k = 2$.

Figure 3. The squared estimation error $\|A_t - A\|_2^2$ for $t = 1, \dots, T$.
NON-PARAMETRIC ESTIMATION OF MS-NAR PROCESS
4.2. Example 2
In this example, we consider an MS-NAR model with $m = 3$. The autoregressive functions are
$$r_1(y) = 0.7y + 2e^{-(10y)^2}, \qquad r_2(y) = 1 - \frac{2}{1 + e^{-10y}}, \qquad r_3(y) = 2\cos(y) - 1.$$
The functions $r_1$ and $r_2$ are the same as those considered in Example 4.1. We take $\varphi$ to be a Gaussian density with zero mean and with variance $\sigma^2 = 0.4$. The transition matrix is given by
$$A = \begin{pmatrix} 0.98 & 0.01 & 0.01 \\ 0.01 & 0.98 & 0.01 \\ 0.01 & 0.01 & 0.98 \end{pmatrix}.$$
We simulate a sample path of $(Y, X)$ of size $n = 3000$. The data are shown in Figure 4.
We implemented our Robbins-Monro algorithm for the data $Y$, considering that $X$ is hidden. We take a standard Gaussian density as the kernel function $K$, the bandwidth parameter $h = (n/\log(n))^{-1/5}$ and the smoothing step $\gamma_t = t^{-0.6}$, and the initial estimates for the Markov chain $X^0_{1:n}$ in Step 0 are taken as uniform random variables. Owing to the complexity of the regression functions in this example, the estimate for an MS-AR model is not a good starting point.
For $T = 1000$, we obtain the following results. The estimate for the transition matrix is
$$\hat A = \begin{pmatrix} 0.9665 & 0.0244 & 0.0091 \\ 0.0161 & 0.9338 & 0.0500 \\ 0.0230 & 0.0312 & 0.9458 \end{pmatrix}.$$
Figure 5 shows the non-parametric regression functions obtained by the Robbins-Monro procedure, and the squared error for the estimate $A_t$.
4.3. Example 3
In this example, we take $m = 4$. The autoregressive functions are
$$r_1(y) = 0.7y + 2e^{-(10y)^2}, \qquad r_2(y) = 1 - \frac{2}{1 + e^{-10y}}.$$
Figure 4. Simulated data $Y_{0:n}$ (left). Scatter plot of $(Y_k, Y_{k-1})$, where the points are labelled with respect to the real state of $X_k$: dots for $X_k = 1$, crosses for $X_k = 2$ and circles for $X_k = 3$ (right).
Figure 5. Non-parametric estimation; the real functions are shown with solid lines and the estimates with dashed lines (left). The squared estimation error $\|A_t - A\|_2^2$ for $t = 1, \dots, T$ (right).

Figure 6. Simulated data $Y_{0:n}$ (left). Scatter plot of $(Y_k, Y_{k-1})$. The points are labelled with respect to the real state of $X_k$: dots for $X_k = 1$, crosses for $X_k = 2$, circles for $X_k = 3$ and diamonds for $X_k = 4$ (right).
We take $\varphi$ to be a Gaussian density with zero mean and variance $\sigma^2 = 0.25$. The transition matrix is given by
$$A = \begin{pmatrix} 0.90 & 0.10 & 0 & 0 \\ 0.05 & 0.90 & 0.05 & 0 \\ 0 & 0.05 & 0.90 & 0.05 \\ 0 & 0 & 0.10 & 0.90 \end{pmatrix}.$$
We simulate a sample path of $(Y, X)$ of size $n = 3000$. The simulated data are shown in Figure 6.
For $T = 1000$, we obtain the following results. The estimated transition matrix is
$$\hat A = \begin{pmatrix} 0.7655 & 0.0151 & 0.1439 & 0.0755 \\ 0.0406 & 0.7627 & 0.1128 & 0.0839 \\ 0.1449 & 0.0705 & 0.7558 & 0.0288 \\ 0.0656 & 0.0933 & 0.1109 & 0.7302 \end{pmatrix}.$$
The non-parametric regression functions obtained by the Robbins-Monro procedure and the squared error for the estimate $A_t$ are displayed in Figure 7.
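When a restored (or simulated) state path is available, the transition matrix can be estimated from empirical transition counts, and the error $\|\hat A - A\|_2^2$ computed as below. This is a sketch with names of our own; we compute the squared Frobenius norm:

```python
import numpy as np

def empirical_transition_matrix(X, m):
    """Estimate an m x m transition matrix from a state path X with values 0..m-1."""
    counts = np.zeros((m, m))
    for a, b in zip(X[:-1], X[1:]):
        counts[a, b] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0   # avoid division by zero for unvisited states
    return counts / rows

def squared_error(A_hat, A):
    """Squared (Frobenius) estimation error ||A_hat - A||_2^2."""
    return float(np.sum((A_hat - A) ** 2))
```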
Figure 7. Non-parametric estimation; the real functions are shown with solid lines and the estimates with dotted lines (left). The squared estimation error $\|A_t - A\|_2^2$ for $t = 1, \dots, T$ (right).
Notice that the algorithm performs well in the first two examples, that is, the cases $m = 2$ and $m = 3$ respectively. Moreover, convergence is quickly reached for the transition probability matrix $A_t$. However, when the number of states is increased to $m \ge 4$, several problems arise in the non-parametric estimation. The algorithm has difficulties in identifying the state of the Markov chain at intersection points of the regression functions; that is, a loss of numerical identifiability occurs at these points owing to the discretization step and the sample size. Furthermore, a misclassification of data arises if the variance is large with respect to the range of the regression functions. Finally, when $m$ is large, the algorithm is more sensitive to both the choice of the starting point and the selection of the window size $h$.
It seems that some useful properties of the algorithm cease to be effective when the number of states increases. This is probably because, as the number of parameters increases, so does the complexity of the model; this is, clearly, a consequence of what has been dubbed the curse of dimensionality.
ACKNOWLEDGEMENTS
We thank the editor, Robert Taylor, the co-editor and two anonymous referees for their insightful comments, which greatly contributed to the improvement of this article. L. J. Fermín acknowledges support from the DIUV REG No. 02/2011 project of the Universidad de Valparaíso. L. A. Rodríguez is thankful to the Universidad de Carabobo for a sabbatical grant. This work has been partially supported by the Anillo ACT1112 and the MathAmSud 16MATH03 SIDRE projects.
REFERENCES
Ailliot P, Monbet V. 2012. Markov-switching autoregressive models for wind time series. Environmental Modelling & Software 30(9): 2101.
Allman ES, Matias C, Rhodes JA. 2009. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics 37: 3099–3132.
Ango-Nze P, Bühlmann P, Doukhan P. 2002. Weak dependence beyond mixing and asymptotics for nonparametric regression. Annals of Statistics 30: 397–430.
Baum LE, Petrie T, Soules G, Weiss N. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41: 164–171.
Benaglia T, Chauveau D, Hunter DR. 2009. An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics 18(2): 505–526.
Cappé O, Moulines E, Rydén T. 2005. Inference in Hidden Markov Models. New York, USA: Springer.
Carter CK, Kohn R. 1994. On Gibbs sampling for state space models. Biometrika 81: 541–553.
Dacunha-Castelle D, Gassiat E. 1997. The estimation of the order of a mixture model. Bernoulli 3(3): 279–299.
Delyon B, Lavielle M, Moulines E. 1999. Convergence of a stochastic approximation version of the EM algorithm. The Annals of Statistics 27(1): 94–128.
Douc R, Moulines E, Rydén T. 2004. Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime. Annals of Statistics 32: 2254–2304.
Doucet A, Logothetis A, Krishnamurthy V. 2000. Stochastic sampling algorithms for state estimation of jump Markov linear systems. IEEE Transactions on Automatic Control 45(2): 188–202.
Doukhan P. 1994. Mixing: Properties and Examples, Lecture Notes in Statistics 85. New York: Springer.
Duflo M. 1996. Algorithmes Stochastiques. Berlin: Springer-Verlag.
Ferraty F, Antón N, Vieu P. 2001. Regresión No Paramétrica: Desde la Dimensión Uno hasta la Dimensión Infinita. Bizkaia, Spain: Servicio Editorial de la Universidad del País Vasco.
Francq C, Roussignol M. 1997. On white noises driven by hidden Markov chains. Journal of Time Series Analysis 18: 553–578.
Francq C, Roussignol M, Zakoian J-M. 2001. Conditional heteroskedasticity driven by hidden Markov chains. Journal of Time Series Analysis 22: 197–220.
Franke J, Stockis JP, Tadjuidje J, Li WK. 2011. Mixtures of nonparametric autoregressions. Journal of Nonparametric Statistics 23(2): 287–303.
Gassiat E, Cleynen A, Robin S. 2016. Inference in finite state space non-parametric hidden Markov models and applications. Statistics and Computing 26(1): 61–71.
Goldfeld SM, Quandt R. 1973. A Markov model for switching regressions. Journal of Econometrics 1: 3–16.
Hamilton JD. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57(2): 357–384.
Hamilton JD, Raj B. 2003. Advances in Markov-Switching Models: Applications in Business Cycle Research and Finance (Studies in Empirical Economics). Heidelberg, Germany: Springer.
Harel M, Puri M. 2001. U-statistiques conditionnelles universellement consistantes pour des modèles de Markov cachés. Comptes Rendus de l'Académie des Sciences, Série I, Mathématique 333: 953–956.
Kemeny JG, Snell JL. 1960. Finite Markov Chains. Princeton, New Jersey: Van Nostrand.
Kim C, Nelson C. 1999. State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. Cambridge, Massachusetts: MIT Press.
Krishnamurthy V, Rydén T. 1998. Consistent estimation of linear and non-linear autoregressive models with Markov regime. Journal of Time Series Analysis 19: 291–307.
Krolzig H-M. 1997. Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Berlin, Heidelberg: Springer-Verlag.
Polyak BT, Juditsky AB. 1992. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization 30: 838–855.
Rio E. 1993. Covariance inequalities for strongly mixing processes. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques 29: 587–597.
Rio E. 2000. Théorie Asymptotique des Processus Faiblement Dépendants, Vol. 31. Paris: Springer-SMAI.
Ríos R, Rodríguez LA. 2008a. Penalized estimate of the number of states in Gaussian linear AR with Markov regime. Electronic Journal of Statistics 2: 1111–1128.
Ríos R, Rodríguez LA. 2008b. Estimación semiparamétrica en procesos autorregresivos con régimen de Markov. Divulgaciones Matemáticas 16(1): 155–171.
Rosales R. 2004. MCMC for hidden Markov models incorporating aggregation of states and filtering. Bulletin of Mathematical Biology 66(5): 1173–1199.
Tugnait J. 1982. Adaptive estimation and identification for discrete systems with Markov jump parameters. IEEE Transactions on Automatic Control 27(5): 1054–1065.
Yao J. 2000. On recursive estimation in incomplete data models. Statistics 34: 27–51.
Yao J, Attali JG. 1999. On stability of nonlinear AR process with Markov switching. Advances in Applied Probability 32: 394–407.
Taking into account the independence of $Y_0$, $(X_k)_{k\ge1}$ and $(e_k)_{k\ge1}$, we have the following factorization:
Since $\varphi(y_k - r_{x_k}(y_{k-1}))\,\varphi(y_{k'} - r_{x_{k'}}(y_{k'-1})) \le \|\varphi\|_\infty^2$, and using this bound in the aforementioned expression, we have that the integral of the remaining terms is equal to 1. Thus, $p(Y_k = y_k, Y_{k'} = y_{k'}) \le \|\varphi\|_\infty^2$. In a similar way, we can prove that $p(Y_0 = y_0, Y_{k'} = y_{k'}) \le \|\varphi\|_\infty$.
Proof of Proposition 2.2
We consider the extended Markov chain $Z = (Z_n)_{n\ge1}$ defined by $Z_n = (Y_n, X_n)$. Under conditions E1-E7, from Yao and Attali (1999, Theorem 1), we have that $Z$ is a geometrically ergodic Markov chain. Denoting by $(E, \mathcal{B})$ the state space of $Z$, by $Q$ the transition probability kernel and by $\pi$ the invariant probability measure, it follows that the $\beta$-mixing coefficients of $Z$ take the following form (Doukhan, 1994, Section 2.4):
$$\beta_n(Z) := \mathbb{E}\left[\sup\{|Q^{(n)}(Z, B) - \pi(B)| : B \in \mathcal{B}\}\right]. \tag{A1}$$
By Theorem 1 in Doukhan (1994, Section 2.4), the geometric ergodicity implies the $\beta$-mixing property for $Z$. Moreover, there exist $0 < \rho < 1$ and $c > 0$ such that the $\beta$-mixing coefficients satisfy $\beta_n(Z) \le c\rho^n$. Thus, from the inequality $2\alpha_n(Z) \le \beta_n(Z)$, the process $Z$ is also $\alpha$-mixing.
On the other hand, the process $Y$ can be obtained from $Z$ as $Y_n = \Pi(Z_n)$, where $\Pi$ is the projection function. Since the projection is a continuous function, we have $\mathcal{M}_a^b(Y) \subset \mathcal{M}_a^b(Z)$ for all $a, b$. Then, from the expression given for the $\alpha$-mixing coefficients in formula (5), we obtain
$$\alpha_n(Y) \le \alpha_n(Z) \le \frac{1}{2}\beta_n(Z) \le \frac{c}{2}\rho^n.$$
$$\sum_{i=1}^m \lambda_i \int_{-\infty}^{+\infty} e^{ty}\,\varphi(y - r_i(y'))\,dy = \sum_{i=1}^m \lambda_i\, e^{r_i(y')t + \sigma^2 t^2/2} = 0.$$
Thus,
$$\sum_{i=1}^m \lambda_i\, e^{r_i(y')t} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\sum_{i=1}^m \lambda_i\, r_i^k(y') = 0 \;\Longrightarrow\; \sum_{i=1}^m \lambda_i\, r_i^k(y') = 0, \quad \forall k \ge 0.$$
From condition I2, $\det(V) \ne 0$ for almost all $y'$. Then the system of equations
$$\sum_{i=1}^m \lambda_i\, r_i^k(y') = 0, \quad \text{for } k = 0, \dots, m-1,$$
has only the trivial solution.
$$h\Big[a^2\!\int\! f_i(y-\epsilon h)K^2(\epsilon)\,d\epsilon + 2ab\!\int\!\mathbb{E}\big(Y_1 1_{\{|Y_1|\le M_n\}}\mid X_1=i,\, Y_0=y-\epsilon h\big)K^2(\epsilon)f_i(y-\epsilon h)\,d\epsilon + b^2\!\int\!\mathbb{E}\big(Y_1^2 1_{\{|Y_1|\le M_n\}}\mid X_1=i,\, Y_0=y-\epsilon h\big)K^2(\epsilon)f_i(y-\epsilon h)\,d\epsilon\Big].$$
From the dominated convergence theorem and conditions R2 and R3, we have, as $n \to \infty$ and $h \to 0$,
$$\operatorname{Var}(T_{0,n}) \sim h\big(a^2 f_i(y) + 2ab\, g_i(y) + b^2 g_{2,i}(y)\big)\|K\|_2^2 + o(h^{2}).$$
Now, for the covariance terms $\operatorname{cov}(T_{0,n}, T_{k,n})$, we define $U_k^s = (Y_k 1_{\{|Y_k|\le M_n\}})^s$, $s = 0, 1, 2$. As the process is stationary, it suffices to consider
$$\operatorname{cov}\big(U_1^s K_h(y - Y_0) 1_i(X_1),\; U_{k+1}^l K_h(y - Y_k) 1_i(X_{k+1})\big).$$
Owing to the $\alpha$-mixing property, using the covariance inequality of Rio (Rio, 1993), the covariance is bounded by
$$\big|\operatorname{cov}\big(U_1^s K_h(y - Y_0) 1_i(X_1),\; U_{k+1}^l K_h(y - Y_k) 1_i(X_{k+1})\big)\big| \le M_n^{s+l}\,\big|\operatorname{cov}\big(K_h(y - Y_0) 1_i(X_1),\; K_h(y - Y_k) 1_i(X_{k+1})\big)\big| \le M_n^{s+l}\, 4\|K\|_\infty^2\, \alpha_k.$$
Set $C_k(i, v) = \{X_{k+1} = i,\; Y_k = v\}$. We recall that $A^{(k)}_{ij}$ is the $(i,j)$th entry of the $k$th power of the matrix $A$. Then
$$\mathbb{E}\big[U_1^s U_{k+1}^l K_h(y - Y_0) 1_i(X_1) K_h(y - Y_k) 1_i(X_{k+1})\big] = \int\!\!\int K_h(y - u) K_h(y - v)\, \mathbb{E}\big[U_1^s U_{k+1}^l \mid C_0(i, u), C_k(i, v)\big]\, p(Y_0 = u, Y_k = v)\, P(X_1 = i, X_{k+1} = i)\, du\, dv$$
$$\le A^{(k)}_{ii}\, \pi_i\, h^2 \int\!\!\int K(u)K(v)\, \mathbb{E}\big[|U_1^s U_{k+1}^l| \mid C_0(i, y - uh), C_k(i, y - vh)\big]\, p(Y_0 = y - uh, Y_k = y - vh)\, du\, dv.$$
For $s + l = 1$, we only consider the case $s = 0$, $l = 1$, since the case $s = 1$, $l = 0$ is similar. Thus,
$$\mathbb{E}\big[|Y_1^s Y_{k+1}^l| \mid X_1 = i, Y_0 = u, X_{k+1} = i, Y_k = v\big] = \mathbb{E}\big(|Y_{k+1}| \mid X_{k+1} = i, Y_k = v\big) \le |r_i(v)| + \mathbb{E}(|e_1|),$$
so that, by Lemma 2.1, by the continuity of the function $r_i(v)$ and by the moment condition for $e_1$, when we take $h \to 0$, we obtain
$$\operatorname{cov}\big(U_1^s K_h(y-Y_0)1_i(X_1),\, U_{k+1}^l K_h(y-Y_k)1_i(X_{k+1})\big) \sim h^2\big(|r_i(y)| + \mathbb{E}(|e_1|)\big) A^{(k)}_{ii}\, \pi_i\, \|\varphi\|_\infty^2 + o(h^3).$$
Then, by the continuity of the function $r_{i,k}(u, v)$ and Lemma 2.1, taking $h \to 0$, we have
$$\operatorname{cov}\big(U_1^s K_h(y-Y_0)1_i(X_1),\, U_{k+1}^l K_h(y-Y_k)1_i(X_{k+1})\big) \sim h^2\, r_{i,k}(y, y)\, A^{(k)}_{ii}\, \pi_i\, \|\varphi\|_\infty^2 + o(h^3).$$
By collecting the bounds, we obtain the desired estimate for large enough $n$ and small enough $h$.
We define the truncated kernel estimator of $g_i$ by
$$\tilde g_{i,n}(y) = \frac{1}{nh}\sum_{k=0}^{n-1} \tilde\Delta_k,$$
where $\tilde\Delta_k = \Delta_k 1_{\{|Y_{k+1}|\le M_n\}}$ and $\Delta_k = Y_{k+1} K_h(y - Y_k) 1_i(X_{k+1})$.
Thus,
$$P\big(|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)| > 2\epsilon\big) \le P\big(|\tilde g_{i,n}(y) - \mathbb{E}\tilde g_{i,n}(y)| > \epsilon\big) + P\big(|\hat g_{i,n}(y) - \tilde g_{i,n}(y) - \mathbb{E}(\hat g_{i,n}(y) - \tilde g_{i,n}(y))| > \epsilon\big).$$
Conditions E3 and M2 imply $\mathbb{E}(|Y_k|^s) < \infty$ for some $s > 2$; then, by Chebyshev's inequality,
$$P\big(|\hat g_{i,n}(y) - \tilde g_{i,n}(y) - \mathbb{E}(\hat g_{i,n}(y) - \tilde g_{i,n}(y))| > \epsilon\big) \le \epsilon^{-2}\operatorname{Var}\big(|\hat g_{i,n}(y) - \tilde g_{i,n}(y)|\big).$$
We obtain a bound on the right-hand side of the aforementioned inequality using the Hölder inequality and the stationarity of the model:
$$\mathbb{E}\big(|\hat g_{i,n}(y) - \tilde g_{i,n}(y)|^2\big) \le h^{-2}\,\mathbb{E}\big(Y_1^2 K_h^2(y - Y_0) 1_i(X_1) 1_{\{|Y_1| > M_n\}}\big) \le h^{-2} M_n^{2-s}\,\mathbb{E}\big(|Y_1|^s K_h^2(y - Y_0)\big) \le \|K\|_\infty^2\, \mathbb{E}(|Y_1|^s)\, M_n^{2-s} h^{-2} \le c_3\, M_n^{2-s} h^{-2},$$
$$s_n^2 = n^2 h^2 \operatorname{Var}(\tilde g_{i,n}(y)) = n\operatorname{Var}(\tilde\Delta_0) + 2\sum_{k=1}^{n-1}(n - k)\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k).$$
Second, we split the sum of the covariances of $\tilde\Delta_0$ and $\tilde\Delta_k$ into two terms:
$$\sum_{k=1}^{n-1}(n-k)\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k) = \sum_{k=1}^{u_n-1}(n-k)\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k) + \sum_{k=u_n}^{n-1}(n-k)\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k).$$
In a way similar to the case of the first bound, we apply item (ii) of Lemma 2.2 with $a = 0$, $b = 1$ and condition R3, and we obtain, for $k \le u_n < n$,
$$\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k) \sim h^2 \sup_{k\in\mathbb{N}}\|r_{i,k}\|_\infty\, \|\varphi\|_\infty^2 + o(h^3), \tag{A.2}$$
$$\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k) \le 4\|K\|_\infty^2 M_n^2\, \alpha_k. \tag{A.3}$$
From Proposition 2.2, there exist $0 < \rho < 1$ and $c_2 > 0$ such that the $\alpha$-mixing coefficients satisfy $\alpha_n(Y) \le c_2\rho^n$.
From inequalities (A.2) and (A.3) and taking $u_n = (h\log n)^{-1}$, we obtain
$$\sum_{k=1}^{n-1}(n-k)\operatorname{cov}(\tilde\Delta_0, \tilde\Delta_k) \le \sup_{k\in\mathbb{N}}\|r_{i,k}\|_\infty\,\|\varphi\|_\infty^2\, h^2 n u_n + 4 c_2 \|K\|_\infty^2\, n M_n^2\, \frac{\rho^{u_n}}{1-\rho} = o(nh).$$
It follows that
$$P\big(|\tilde g_{i,n}(y) - \mathbb{E}\tilde g_{i,n}(y)| > \epsilon\big) \le 4\exp\left(-\frac{\epsilon^2 n h}{16 c_1}\right) + \frac{16 c_2 M_n}{\epsilon h}\,\rho^{u_n/2},$$
where $c_1 = \sup_{y\in C} g_{2,i}(y)\|K\|_2^2$. Thus, result (i) follows. We can prove (ii) in a similar way, taking $a = 1$, $b = 0$ and $\tilde c_1 = \sup_{y\in C} f_i(y)\|K\|_2^2$ in Lemma 2.2.
Since $\hat g_{i,n}(y) = \frac{1}{nh}\sum_k \Delta_k$, equation (A.4) implies that
$$\mathbb{E}\hat g_{i,n}(y) = \int K(u)\, g_i(y - uh)\, du. \tag{A.5}$$
A second-order Taylor expansion gives
$$g_i(y - uh) = g_i(y) - uh\, g_i'(y) + \frac{(uh)^2}{2}\, g_i''(\tilde y_u),$$
with $\tilde y_u = (y - uh)(1 - t) + ty$ for some $t \in [0, 1]$. As the kernel $K$ is assumed to be of order 2, substituting the Taylor approximation into (A.5) gives
$$\mathbb{E}\hat g_{i,n}(y) - g_i(y) = \frac{h^2}{2}\int g_i''(\tilde y_u)\, u^2 K(u)\, du.$$
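The $h^2$ order of the bias of a second-order kernel can be checked numerically: halving $h$ should divide the smoothing bias $\int K(u)\,g(y - uh)\,du - g(y)$ by roughly 4. The following self-contained check with a smooth test function is our own illustration, not part of the paper:

```python
import numpy as np

def smoothed(g, y, h):
    """Approximate int K(u) g(y - u h) du with a Gaussian kernel K."""
    u = np.linspace(-8.0, 8.0, 20001)
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(K * g(y - u * h)) * (u[1] - u[0]))

g, y = np.cos, 0.3
bias_h = smoothed(g, y, 0.10) - g(y)
bias_half = smoothed(g, y, 0.05) - g(y)
ratio = bias_h / bias_half   # close to 4, confirming O(h^2) bias
```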
Thus, the bias term is of order $h^2$.
According to the bias-variance decomposition, the proof of the theorem is achieved through Lemmas 2.3 and 2.4, which guarantee the strict positivity of $\inf_{y\in C}|\hat f_{i,n}(y)|$.
Thus, applying Lemma 2.3 with $\epsilon = \epsilon_0\sqrt{\frac{\log n}{n h_n}}$, $M_n = n^{\alpha}$ with $\alpha > 0$, $u_n = (h_n \log n)^{-1}$, $h_n = n^{-d}$ with $0 < d < 1$, and $n$ large enough, we have, for $\vartheta = (s-2)\alpha - d - 2 > 0$ and $\frac{\epsilon_0^2}{32 c_1} = (s-2)\alpha - d - 1$,
$$P\left(|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)| > \epsilon_0\sqrt{\frac{\log n}{n h_n}}\right) \le 4\exp\left(-\frac{\epsilon_0^2 \log n}{32 c_1}\right) + c_2\,\frac{16\, n\,\rho^{u_n}}{\epsilon_0 (\log n)^{1/2} h_n^{1/2}} + c_3\,\frac{n^{1+(2-s)\alpha}}{\epsilon_0^2 \log(n)\, h_n} \tag{A.10}$$
$$\le 4 n^{-\epsilon_0^2/(32 c_1)} + c_2\,\frac{16\, n\,\rho^{u_n}}{\epsilon_0 (\log n)^{1/2} h_n^{1/2}} + c_3\,\frac{n^{1+(2-s)\alpha}}{\epsilon_0^2 \log(n)\, h_n} \le c\, n^{-(1+\vartheta)}.$$
Applying the Borel-Cantelli lemma, the almost sure pointwise convergence of $|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)|$ to 0 is proved. We proceed analogously to obtain the almost sure pointwise convergence $|\hat f_{i,n}(y) - \mathbb{E}\hat f_{i,n}(y)| \to 0$.
According to Lemma 2.4, we have
$$\inf_{y\in C}|\hat f_{i,n}(y)| \ge \inf_{y\in C}|f_i(y)| - \sup_{y\in C}|\hat f_{i,n}(y) - \mathbb{E}\hat f_{i,n}(y)| - \sup_{y\in C}|\mathbb{E}\hat f_{i,n}(y) - f_i(y)| \ge \frac{1}{2}\inf_{y\in C}|f_i(y)| > 0.$$
Thus, the previous results obtained from Lemmas 2.3 and 2.4 and inequality (A.8) give the pointwise convergence of $|\hat r_{i,n}(y) - r_i(y)|$.
To obtain the uniform convergence on a compact set $C$, we only need to prove an asymptotic inequality of type (A.10) for the term $\sup_{y\in C}|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)|$, and analogously for $\sup_{y\in C}|\hat f_{i,n}(y) - \mathbb{E}\hat f_{i,n}(y)|$, in inequality (A.9). For this, we proceed by using a truncation device as in Ango-Nze et al. (2002), assuming the moment condition M1.
Let us set $\Delta_k = Y_{k+1} K_{h_n}(y - Y_k) 1_i(X_{k+1})$ and the truncated variable $\tilde\Delta_k = \Delta_k 1_{\{|Y_{k+1}|\le M_n\}}$. Then we define the truncated kernel estimator of $g_i$ by
$$\tilde g_{i,n}(y) = \frac{1}{n h_n}\sum_{k=0}^{n-1}\tilde\Delta_k.$$
As in the pointwise case, the truncation term satisfies the bound
$$\cdots \le c_4\, n^{-M_0/2} h_n^{-1/2}.$$
We cover the compact set $C$ by a finite grid of points $t_1, \dots, t_{\mu_n}$ and write
$$\sup_{y\in C}|\tilde g_{i,n}(y) - \mathbb{E}\tilde g_{i,n}(y)| \le \max_{k=1,\dots,\mu_n}|\tilde g_{i,n}(t_k) - \mathbb{E}\tilde g_{i,n}(t_k)| + \sup_{y\in C}|\tilde g_{i,n}(t_k) - \tilde g_{i,n}(y)| + \sup_{y\in C}|\mathbb{E}\tilde g_{i,n}(y) - \mathbb{E}\tilde g_{i,n}(t_k)|.$$
Let us examine each term on the right-hand side of the preceding inequality. First, we have from Lemma 2.3
$$P\left(\max_{k=1,\dots,\mu_n}|\tilde g_{i,n}(t_k) - \mathbb{E}\tilde g_{i,n}(t_k)| > \frac{\epsilon_0}{2}\sqrt{\frac{\log n}{n h_n}}\right) \le \sum_{k=1}^{\mu_n} P\left(|\tilde g_{i,n}(t_k) - \mathbb{E}\tilde g_{i,n}(t_k)| > \frac{\epsilon_0}{2}\sqrt{\frac{\log n}{n h_n}}\right) \le \mu_n\left[4 n^{-\epsilon_0^2/(128 c_1)} + c_2\,\frac{32\, n^{1/2} M_n\,\rho^{u_n}}{\epsilon_0 (\log n)^{1/2} h_n^{1/2}}\right].$$
For the second and third terms, we use the following inequality obtained from condition R1:
$$|\tilde g_{i,n}(t_k) - \tilde g_{i,n}(y)| \le \frac{M_n}{n h_n}\sum_{k=1}^{n}|K_{h_n}(t_k - Y_k) - K_{h_n}(y - Y_k)| \le c_5\,\frac{M_n}{h_n}\left(1 + \frac{L}{h_n}\right)|y - t_k| \le c_5\,\frac{M_n}{h_n}\left(1 + \frac{L}{h_n}\right) L_n.$$
Setting $L_n = n^{(1+d)/2}\, h_n^2\, M_n^{-1}$, $\mu_n = c_5/L_n$, $u_n = (h_n \log n)^{-1}$, $\vartheta = \frac{\epsilon_0^2}{128 c_1} - \alpha - \frac{1+d}{2} - 1 > 0$ and $M_0 = 2(\vartheta + 1) + d$, we obtain, for some constant $c > 0$,
$$P\left(\sup_{y\in C}|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)| > \epsilon_0\sqrt{\frac{\log n}{n h_n}}\right) \le \frac{c_5}{L_n}\left[4 n^{-\epsilon_0^2/(128 c_1)} + c_2\,\frac{32\, n^{1/2} M_n\,\rho^{u_n}}{\epsilon_0 (\log n)^{1/2} h_n^{1/2}}\right] + c_4\,\frac{n^{-M_0/2}}{h_n^{1/2}} \le c\, n^{-(1+\vartheta)}. \tag{A.11}$$
Hence, the Borel-Cantelli lemma implies the a.s. convergence of the term $\sup_{y\in C}|\hat g_{i,n}(y) - \mathbb{E}\hat g_{i,n}(y)|$.
The uniform convergence over a compact set of the regression function $\hat r_{i,n}$ follows in the same way as for the a.s. pointwise convergence.
Remark A1. Note that in the proof of the a.s. pointwise convergence, the probability term in (A.10) is summable if $\vartheta = (s-2)\alpha - d - 2 > 0$. This is only possible if $s > 2$, whence the restriction imposed in condition M2. From the asymptotic inequalities (A.10) and (A.11), we can notice that the convergence rate in Theorem 2.1 is $\sqrt{\frac{\log n}{n h_n}}$.
$$= \nabla u(y,\, Y_{0:n},\, \cdot\,).$$