Professional Documents
Culture Documents
1
2
3
F:jbes05m060.tex; (Diana) p. 1
4
5
6
12
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
65
66
67
Guodong L I
Department of Statistics and Actuarial Science, University of Hong Kong, P. R. China (ligd@hkusua.hku.hk )
68
69
70
Guohua J IANG
Guanghua School of Management, Peking University, Beijing, P. R. China 100871 (gjiang@gsm.pku.edu.cn)
13
14
62
64
Hansheng WANG
10
11
61
63
7
8
60
71
72
The least absolute deviation (LAD) regression is a useful method for robust regression, and the least absolute shrinkage and selection operator (lasso) is a popular choice for shrinkage estimation and variable
selection. In this article we combine these two classical ideas together to produce LAD-lasso. Compared
with the LAD regression, LAD-lasso can do parameter estimation and variable selection simultaneously.
Compared with the traditional lasso, LAD-lasso is resistant to heavy-tailed errors or outliers in the response. Furthermore, with easily estimated tuning parameters, the LAD-lasso estimator enjoys the same
asymptotic efficiency as the unpenalized LAD estimator obtained under the true model (i.e., the oracle
property). Extensive simulation studies demonstrate satisfactory finite-sample performance of LAD-lasso,
and a real example is analyzed for illustration purposes.
75
76
77
78
79
80
82
efficient model selection criteria, among which the most wellknown example is the AIC. The efficient criteria have the ability
to select the best model by an appropriately defined asymptotic
optimality criterion. Therefore, they are particularly useful if
the underlying model is too complicated to be well approximated by any finite-dimensional model. However, if an underlying model indeed has a finite dimension, then it is well known
that many efficient criteria (e.g., the AIC) suffer from a nonignorable overfitting effect regardless of sample size. In such
a situation, a consistent model selection criterion can be useful. Hence the second major class contains all of the consistent
model selection criteria, among which the most representative
example is the BIC. Compared with the efficient criteria, the
consistent criteria have the ability to identify the true model
consistently, if such a finite-dimensional true model does in fact
exist.
Unfortunately, theoretically there is no general agreement regarding which selection criteria type is preferable (Shi and Tsai
2002). As we demonstrate, the proposed LAD-lasso method belongs to the consistent category. In other words, we make the
assumption that the true model is of finite dimension and is contained in a set of candidate models under consideration. Then
the proposed LAD-lasso method has the ability to identify the
true model consistently. (For a better explanation of model selection efficiency and consistency, see Shao 1997; McQuarrie
and Tsai 1998; and Shi and Tsai 2002.)
Because most selection criteria (e.g., AIC, BIC) are developed based on OLS estimates, their finite-sample performance
under heavy-tailed errors is poor. Consequently, Hurvich and
Tsai (1990) derived a set of useful model selection criteria (e.g.,
AIC, AICc, BIC) based on the LAD estimates. But despite
Datasets subject to heavy-tailed errors or outliers are commonly encountered in applications. They may appear in the responses and/or the predictors. In this article we focus on the
situation in which the heavy-tailed errors or outliers are found
in the responses. In such a situation, it is well known that the
traditional ordinary least squares (OLS) may fail to produce a
reliable estimator, and the least absolute deviation
(LAD) esti
mator can be very useful. Specifically, the n-consistency and
asymptotic normality for the LAD estimator can be established
without assuming any moment condition of the residual. Due
to the fact that the objective function (i.e., the LAD criterion)
is a nonsmooth function, the usual Taylor expansion argument
cannot be used directly to study the LAD estimators asymptotic properties. Therefore, over the
past decade, much effort
has been devoted to establish the n-consistency and asymptotic normality of various LAD estimators (Bassett and Koenker
1978; Pollard 1991; Bloomfield and Steiger 1983; Knight 1998;
Peng and Yao 2003; Ling 2005), which left the important problem of robust model selection open for study.
In a regression setting, it is well known that omitting an
important explanatory variable may produce severely biased parameter estimates and prediction results. On the other hand, including unnecessary predictors can degrade the efficiency of the
resulting estimation and yield less accurate predictions. Hence
selecting the best model based on a finite sample is always a
problem of interest for both theory and application. Under appropriate moment conditions for the residual, the problem of
model selection has been extensively studied in the literature
(Shao 1997; Hurvich and Tsai 1989; Shi and Tsai 2002, 2004),
and two important selection criteria, the Akaike information criterion (AIC) (Akaike 1973) and the Bayes information criterion
(BIC) (Schwarz 1978), have been widely used in practice.
By Shaos (1997) definition, many selection criteria can be
classified into two major classes. One class contains all of the
74
81
1. INTRODUCTION
73
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 2
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
n
(yi xi )2
+n
i=1
p
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
59
62
63
64
j |j |,
66
67
j=1
68
which allows for different tuning parameters for different coefficients. As a result, lasso* is able to produce sparse solutions
more effectively than lasso.
Nonetheless, it is well known that the OLS criterion used in
lasso* is very sensitive to outliers. To obtain a robust lasso-type
estimator, we further modify the lasso* objective function into
the following LAD-lasso criterion:
LAD-lasso = Q() =
n
|yi xi | + n
i=1
p
As can be seen, the LAD-lasso criterion combines the LAD criterion and the lasso penalty, and hence the resulting estimator is
expected to be robust against outliers and also to enjoy a sparse
representation.
Computationally, it is very easy to find the LAD-lasso estimator. Specifically, we can consider an augmented dataset
{(yi , xi )} with i = 1, . . . , n + p, where (yi , xi ) = (yi , xi ) for
1 i n, (yn+j , xn+j ) = (0, nj ej ) for 1 j p, and ej is a
p-dimensional vector with the jth component equal to 1 and all
others equal to 0. It can be easily verified that
LAD-lasso = Q() =
70
71
72
73
74
75
76
j |j |.
j=1
n+p
69
|yi xi |.
i=1
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
+ i ,
i = 1, . . . , n,
(1)
where xi = (xi1 , . . . , xip ) is the p-dimensional regression covariate, = (1 , . . . , p ) are the associated regression coefficients, and i are iid random errors with median 0. Moreover,
assume that j = 0 for j p0 and j = 0 for j > p0 for some
p0 0. Thus the correct model has p0 significant and ( p p0 )
insignificant regression variables.
Usually, the unknown parameters of model
(1) can be esti
mated by minimizing the OLS criterion, ni=1 (yi xi )2 . Furthermore, to shrink unnecessary coefficients to 0, Tibshirani
(1996) proposed the following lasso criterion:
57
58
61
65
39
40
60
lasso =
n
i=1
(yi xi )2
+ n
p
j=1
|j |,
99
100
101
Assumption A. The error i has continuous and positive density at the origin.
111
113
Note that Assumptions A and B are both very typical technical assumptions used extensively in the literature (Pollard 1991;
Bloomfield and Steiger 1983; Knight 1998). They are needed
116
102
103
104
105
106
107
108
109
110
112
114
115
117
118
F:jbes05m060.tex; (Diana) p. 3
for establishing the n-consistency and the asymptotic normality of the unpenalized LAD estimator. Furthermore, define
an = max{j , 1 j p0 }
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
and
bn = min{j , p0 < j p},
where j is a function of n. Based on the foregoing notation, the
consistency of LAD-lasso estimator can first be established.
Lemma 1 (Root-n consistency). Suppose that (xi , yi ), i =
1, . . . , n, are iid and that the linear
regression model (1) satisfies Assumptions A and B. If nan 0, then the LAD-lasso
estimator is root-n consistent.
The proof is given in Appendix A. Lemma 1 implies that if
the tuning parameters associated with the significant variables
1/2 , then the correspondconverge to 0 at a speed faster than
n
ing LAD-lasso estimator can be n-consistent.
We next show that if the tuning parameters associated with
the insignificant variables shrink to 0 slower than n1/2 , then
those regression coefficients can be estimated exactly as 0 with
probability tending to 1.
Lemma 2 (Sparsity). Under the same
assumptions as Lemma 1 and the further assumption that nbn , with probability tending to 1, the LAD-lasso estimator = ( a , b ) must
satisfy b = 0.
The proof is given in Appendix B. Lemma 2 shows that
LAD-lasso has the ability to consistently produce sparse solutions for insignificant regression coefficients; hence variable
selection and parameter estimation can be accomplished simultaneously.
The foregoing two lemmas imply that the root-nconsistent
estimator must satisfy P(2 = 0) 1 when the tuning parameters fulfill appropriate conditions. As a result, the LAD-lasso
estimator performs as well as the oracle estimator by assuming
that the b = 0 is known in advance. Finally, we obtain the asymptotic distribution of the LAD-lasso estimator.
Theorem (Oracle property). Suppose that (xi , yi ), i = 1,
. . . , n, are iid and that the linear regression
model (1) satisfies
and Li 2001), AIC, AICc, and BIC (Hurvich and Tsai 1989;
Zou, Hastie, and Tibshirani 2004) methods can be used to select the optimal regularization parameter j after appropriate
modification, for example, replacing the least squares term by
the LAD criterion. However, using them in our situation is difficult for at least two reasons. First, there are a total of p tuning
parameters involved in the LAD-lasso criterion, and simultaneously tuning so many regularization parameters is much too
expensive computationally. Second, their statistical properties
are not clearly understood under heavy-tailed errors. Hence we
consider the following option.
Following an idea of Tibshirani (1996), we can view the
LAD-lasso estimator as a Bayesian estimator with each regression coefficient following a double-exponential prior with location parameter 0 and scale parameter nj . Then the optimal j
can be selected by minimizing the following negative posterior
log-likelihood function:
n
|yi xi | + n
p
i=1
j |j | log(.5nj ),
(2)
j=1
i=1
|yi xi | + n
p
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
n j for j > p0 , is not necessarily guaranteed. Hence directly using such a tuning parameter estimate tends to produce
overfitted results if the underlying true model is indeed of finite
dimension. This property is very similar to the AIC. In contrast, note that in (2), the complexity of the final model is actually controlled by log(), which can be viewed as simply the
number of significant variables in the AIC. Hence it motivates
us to consider the following BIC-type objective function:
n
60
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
j |j | log(.5nj ) log(n),
97
j=1
98
99
100
101
102
j =
log(n)
,
n|j |
(3)
103
104
105
106
107
108
109
3. SIMULATION RESULTS
110
111
112
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 4
4
1
2
3
4
5
6
7
yi = xi + i ,
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
prediction error (MAPE), evaluated based on another 1,000 independent testing samples for each iteration.
As can be seen from Tables 24, all of the methods (AIC,
AICc, BIC, and LAD-lasso) demonstrate comparable prediction accuracy in terms of mean and median MAPE. However,
the selection results differ significantly. In the case of = 1.0,
both the AIC and AICc demonstrate appreciable overfitting effects, whereas the BIC and LAD-lasso have very comparable
performance, with both demonstrating a clear pattern of finding
the true model consistently. Nevertheless, in the case of = .5,
LAD-lasso significantly outperforms the BIC with a margin that
can be as large as 30% (e.g., n = 50; see Table 3) whereas the
overfitting effect of the BIC is quite substantial.
Keep in mind that our simulation results should not be mistakenly interpreted as evidence that the AIC (AICc) is an inferior choice compared with the BIC or LAD-lasso. As pointed
out earlier, the AIC (AICc) is a well-known efficient but inconsistent model selection criterion that is asymptotically optimal
if the underlying model is of infinite dimension. On the other
hand, the BIC and LAD-lasso are useful consistent model selection methods if the underlying model is indeed of finite dimension. Consequently, all of the methods are useful for practical
data analysis. Our theory and numerical experience suggest that
25
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
26
27
85
86
No. of zeros
28
87
29
Method
Underfitted
Correctly fitted
Overfitted
Incorrect
Correct
Average MAPE
Median MAPE
88
30
1.0
50
AIC
AICc
BIC
LASSO
ORACLE
.079
.093
.160
.184
.347
.429
.591
.633
1.000
.574
.478
.249
.183
.079
.093
.161
.185
3.038
3.252
3.635
3.730
4.000
1.102
1.099
1.092
1.099
1.059
1.011
.988
1.110
1.045
1.084
89
.003
.005
.026
.035
1.045
1.043
1.035
1.040
1.027
1.013
1.087
1.037
1.059
1.073
94
3.117
3.232
3.778
3.815
4.000
0
0
0
0
0
3.295
3.343
3.907
3.922
4.000
1.019
1.018
1.014
1.016
1.012
.987
.946
.976
1.020
1.043
99
.666
.550
.314
.036
0
.001
.002
.017
0
2.945
3.228
3.619
3.961
4.000
.550
.546
.539
.549
.528
.560
.525
.615
.501
.537
104
.596
.539
.162
.013
0
0
0
0
0
3.131
3.249
3.819
3.987
4.000
.522
.521
.516
.520
.513
.469
.542
.496
.496
.513
109
0
0
0
0
0
3.224
3.268
3.868
3.992
4.000
.510
.510
.507
.507
.507
.526
.502
.502
.507
.516
31
32
33
34
35
100
36
37
38
39
200
40
41
42
43
44
45
.5
50
46
47
48
49
50
100
51
52
53
54
55
56
57
58
59
200
AIC
AICc
BIC
LASSO
ORACLE
.385
.435
.774
.798
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.475
.498
.914
.927
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
.001
.002
.017
0
.334
.449
.684
.947
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.404
.461
.838
.987
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.429
.455
.877
.992
1.000
.003
.005
.026
.035
.612
.560
.200
.167
0
.525
.502
.086
.073
0
0
.571
.545
.123
.008
0
90
91
92
93
95
96
97
98
100
101
102
103
105
106
107
108
110
111
112
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 5
1
2
60
61
No. of zeros
62
Method
Underfitted
Correctly fitted
Overfitted
Incorrect
Correct
Average MAPE
Median MAPE
63
1.0
50
AIC
AICc
BIC
LASSO
ORACLE
.082
.102
.177
.225
.244
.316
.492
.566
1.000
.674
.582
.331
.209
.082
.102
.179
.228
2.825
3.072
3.501
3.687
4.000
1.052
1.048
1.042
1.048
1.007
1.290
1.124
1.164
1.073
.990
64
.009
.011
.027
.042
.994
.994
.985
.991
.976
1.023
.948
.988
.965
.980
69
2.995
3.105
3.707
3.803
4.000
.677
.650
.174
.139
0
0
0
0
0
3.051
3.104
3.814
3.856
4.000
.971
.971
.966
.969
.963
1.161
.986
1.020
1.059
.907
74
.713
.594
.339
.025
0
.001
.001
.012
0
2.904
3.189
3.596
3.975
4.000
.522
.519
.512
.524
.501
.525
.494
.544
.489
.522
79
.699
.646
.273
.018
0
0
0
0
0
2.957
3.090
3.700
3.982
4.000
.499
.498
.493
.495
.489
.546
.475
.493
.524
.509
.675
.652
.200
.013
0
0
0
0
0
2.999
3.052
3.781
3.987
4.000
.487
.486
.483
.482
.482
.473
.464
.496
.514
.494
6
7
8
9
100
10
11
12
13
14
200
15
16
17
18
19
20
.5
50
21
22
23
24
25
100
26
27
28
29
30
31
32
33
200
AIC
AICc
BIC
LASSO
ORACLE
.316
.364
.721
.787
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.323
.350
.826
.861
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
.001
.001
.012
0
.287
.405
.660
.963
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.301
.354
.727
.982
1.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.325
.348
.800
.987
1.000
.009
.011
.027
.042
.675
.625
.252
.171
0
65
66
67
68
70
71
72
73
75
76
77
78
80
81
82
83
84
85
86
87
88
89
90
91
92
34
93
35
94
36
37
38
39
95
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Whereas the problem of earnings forecasting has been extensively studied in the literature on the North American and European markets, the same problem regarding one of the worlds
fastest growing capital markets, the Chinese stock market, remains not well understood. China resumed its stock market in
1991 after a hiatus of 4 decades. Since it reopened, the Chinese
stock market has grown rapidly, from a few stocks in 1991 to
more than 1,300 stocks today, with a total of more than 450 billion U.S. dollars in market capitalization. Many international
institutional investors are moving into China looking for better
returns, and earnings forecast research has become extremely
important to their success.
In addition, over the past decade, China has enjoyed a high
economic growth rate and rapid integration into the world market. Operating in such an environment, Chinese companies tend
to experience more turbulence and uncertainties in their operations. As a result, they tend to generate more extreme earnings.
For example, in our dataset, the kurtosis of the residuals differentiated from an OLS fit is as large as 90.95, much larger
than the value 3 of a normal distribution. Hence the reliability
of the usual OLS-based estimation and model selection methods (e.g., AIC, BIC, lasso) is severely challenged, whereas the
LAD-based methods (e.g., LAD, LAD-lasso) become more attractive.
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 6
1
2
60
61
No. of zeros
62
Method
Underfitted
Correct fitted
Overfitted
Incorrect
Correct
Average MAPE
Median MAPE
63
1.0
50
AIC
AICc
BIC
LASSO
ORACLE
.125
.149
.238
.248
.283
.373
.542
.536
1.000
.592
.478
.220
.216
.129
.153
.245
.255
2.961
3.206
3.640
3.671
4.000
1.217
1.212
1.202
1.212
1.164
1.199
1.258
1.347
1.248
1.366
64
.010
.011
.043
.046
3.108
3.216
3.751
3.761
4.000
1.151
1.151
1.144
1.148
1.133
1.104
1.152
1.339
1.218
1.169
69
.001
.001
.002
.002
3.144
3.191
3.856
3.850
4.000
1.123
1.128
1.120
1.123
1.114
1.148
1.090
1.070
1.122
1.110
74
.002
.003
.004
.030
.603
.602
.595
.607
.582
.594
.561
.674
.611
.614
79
6
7
8
9
100
10
11
12
13
14
200
15
16
17
18
19
20
50
.5
21
22
23
24
100
25
26
27
28
29
200
30
31
32
33
AIC
AICc
BIC
LASSO
ORACLE
AIC
AICc
BIC
LASSO
ORACLE
AIC
AICc
BIC
LASSO
ORACLE
0
.010
.011
.043
.046
0
.001
.001
.002
.002
0
.367
.410
.742
.748
1.000
.385
.407
.864
.856
1.000
.623
.579
.215
.206
0
.614
.592
.134
.142
0
.309
.412
.652
.920
1.000
2.982
3.219
3.600
3.946
4.000
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.340
.391
.790
.978
1.000
.660
.609
.210
.022
0
0
0
0
0
0
3.086
3.197
3.765
3.977
4.000
.575
.574
.569
.572
.566
.571
.581
.557
.573
.594
AIC
AICc
BIC
LASSO
ORACLE
0
0
0
0
0
.355
.386
.866
.984
1.000
.645
.614
.134
.016
0
0
0
0
0
0
3.085
3.153
3.857
3.984
4.000
.564
.563
.558
.560
.559
.560
.575
.568
.580
.550
.002
.003
.004
.030
.689
.585
.344
.050
65
66
67
68
70
71
72
73
75
76
77
78
80
81
82
83
84
85
86
87
88
89
90
91
92
34
93
35
94
36
37
38
39
40
41
42
95
Past studies on earnings forecasting show that these explanatory variables are among the most important accounting
variables in predicting future earnings. Asset turnover ratio
measures the efficiency of the company using its assets; profit
margin measures the profitability of the companys operation;
debt-to-asset ratio describes how much of the company is fi-
43
Variable
47
Int
ROEt
ATO
PM
LEV
GROWTH
PB
ARR
INV
ASSET
51
52
53
54
55
56
58
MAPE
STDE
59
NOTE:
57
99
100
101
104
46
50
98
103
45
49
97
102
44
48
96
AIC
AICc
BIC
.316809
.207533
.058824
.140276
.020943
.018806
.316809
.207533
.058824
.140276
.020943
.018806
.316809
.207533
.058824
.140276
.020943
.018806
LAD-lasso
LAD
OLS
105
2.103386
.181815
.165402
.209346
.245233
.033153
.018385
.009217
.354148
.101458
106
.233541
.021002
.014523
.014523
.014523
.002186
.363688
.190330
.058553
.134457
.023592
.019068
.001544
.002520
.012297
.016694
.121379
.021839
.121379
.021839
.121379
.021839
.120662
.022410
.120382
.021692
.067567
.006069
107
108
109
110
111
112
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 7
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
5. CONCLUDING REMARKS
60
61
n
(yi xi ) + n
i=1
p
62
63
64
65
66
67
j |j |,
68
69
j=1
70
71
72
73
74
75
76
77
78
79
ACKNOWLEDGMENTS
80
The authors are grateful to the editor, the associate editor, the
referees, Rong Chen, and W. K. Li for their valuable comments
and constructive suggestions, which led to substantial improvement of the manuscript. This research was supported in part by
the Natural Science Foundation of China (grant 70532002).
81
82
83
84
85
86
87
88
We want to show that for any given > 0, there exists a large
constant C such that
(A.1)
P inf Q + n1/2 u > Q() 1 ,
u =C
89
90
91
92
93
97
98
99
102
103
p
+n
j j + n1/2 uj |j |
104
105
j=1
106
n
yi x x n1/2 u |yi x |
107
108
i=1
nan
96
101
i=1
95
100
n
yi x + n1/2 u |yi x |
Dn (u) =
i
94
109
p0
110
|uj |.
(A.2)
j=1
112
Denote the first item at the last line of (A.2) by Ln (u). On the
other hand, according to Knight (1998), it holds that for x = 0,
|x y| |x|
111
113
114
115
116
117
118
F:jbes05m060.tex; (Diana) p. 8
8
1
because
2
3
4
n1/2 u
n
i=1
5
6
+2
n
n1/2 u xi
11
12
13
14
15
16
17
18
2
nE Zni
(u)I n1/2 |u xi |
2
n1/2 |u xi |
2 ds I n1/2 |u xi |
nE
20
21
22
28
29
30
31
32
R 2nE
n1/2 |u xi |
34
I n
35
2nE
36
37
38
n1/2 |u xi |
43
i=1
49
56
57
58
59
n
xi sgn i n1/2 xi .
Hence
E
n
Furthermore,
1/2
n
n1/2 u xi
79
[F(s) F(0)] ds
sf (0) ds + o(1) = .5f (0)u (xi xi )u + o(1),
83
84
85
86
87
88
89
90
91
92
93
94
95
101
102
1/2
105
106
108
n
104
107
M
i=1
=E
78
100
n = Mn1/2 ,
Zni (u)
77
103
n1/2 u xi
76
99
= nE[Zni (u)] = nE
75
98
i=1
74
97
n
73
96
i=1
n
n
2
var
Zni =
var(Zni ) nEZni
(u) 0.
48
55
47
54
s ds I n1/2 |u xi | <
72
82
i=1
46
53
where p represents convergence in probability. Therefore, the second item on the right side of (A.3) converges to
f (0)u u in probability.
By choosing a sufficiently large C, for (A.3), the second item
dominates the first item uniformly in u = C. Furthermore,
the second item on the last line of (A.2) converges to 0 in probability and hence is also dominated by the second item on the
right side of (A.3). This completes the proof of Lemma 1.
{ f (0) + }E|u xi | ,
42
52
n1/2 |u xi |
70
71
41
51
|u xi | <
69
1
Zni (u) p f (0)u u,
2
i=1
[F(s) F(0)] ds I n1/2 |u xi | <
2n{ f (0) + }E
40
50
39
45
1/2
66
68
n
1/2 Q()
1/2
ij + nj sgn(j ),
n
= n
sgn(yi xi )x
j
65
81
33
44
i=1
64
67
25
27
63
80
= 4E |u xi |2 I(|u xi | n) = o(1).
24
26
n
23
62
By the central limit theorem, the first item on the right side
of (A.3) converges in distribution to u W, where W is a p-dimensional normal random vector with mean 0 and variance matrix = cov(x1 ).
Next we show that the second item on the right side of
(A.3) converges to a real function of u in probability. Denote
the cumulative distribution function of i by F and the item
n1/2 u xi
[I(i s) I(i 0)] ds by Zni (u). Hence
0
19
61
1
2 E |u x1 |2 I |u x1 | > n1/2 0.
( )
(A.3)
9
10
P n1/2 max(|u x1 |, . . . , |u xn |) >
nP |u x1 | > n1/2
i=1
60
109
110
111
112
113
xi sgn(yi xi )
114
115
i=1
1/2
n
i=1
116
117
118
F:jbes05m060.tex; (Diana) p. 9
where =
n( ). Hence
n1/2
3
4
5
6
7
8
9
10
11
12
n
REFERENCES
= Op (1).
xi sgn(yi xi )
i=1
13
14
15
16
17
18
n
yi x a n1/2 v xai |yi x a |
sn (v) =
ai
ai
19
20
i=1
21
p0
+n
j j + n1/2 vj |j | ,
22
23
24
25
26
27
28
29
j=1
30
d v W0 + f (0)v v,
31
32
33
34
35
36
37
38
39
40
41
42
(A.1)
j=1
v W0
f (0)v v
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
43
102
44
103
45
104
46
105
47
106
48
107
49
108
50
109
51
110
52
111
53
112
54
113
55
114
56
115
57
116
58
117
59
118