Let $x_1, \ldots, x_n$ be a random sample from a $p$-dimensional normal population with mean $\mu$ and positive-definite variance-covariance matrix $\Sigma$. Consider the testing problem: $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$, where $\mu_0$ is a known vector.
This is a generalization of the one-sample t test of the univariate case. When $p = 1$, the test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \text{with } s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
The statistic follows a Student-t distribution with $n-1$ degrees of freedom. One rejects $H_0$ if the p-value of $t$ is less than the type-I error, denoted by $\alpha$. For generalization, we rewrite the test statistic as
$$t^2 = (\bar{x} - \mu_0)\left(\frac{s^2}{n}\right)^{-1}(\bar{x} - \mu_0).$$
One rejects $H_0$ if and only if $t^2 \geq t^2_{n-1}(\alpha/2)$, where $t_{n-1}(\alpha/2)$ is the upper $100(\alpha/2)$ percentile of the t-distribution with $n-1$ degrees of freedom. A natural generalization of this test statistic is
$$T^2 = (\bar{x} - \mu_0)'\left(\frac{S}{n}\right)^{-1}(\bar{x} - \mu_0), \qquad (1)$$
where $S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$ is the sample covariance matrix.
> Si=solve(S)
> T2=nrow(x)*t(xave)%*%Si%*%xave
> T2
[,1]
[1,] 2.107425
### Results of marginal tests
> t.test(x[,1])
One Sample t-test
data: x[, 1]
t = -0.2556, df = 119, p-value = 0.7987
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.300074 1.774182
sample estimates:
mean of x
-0.2629461
> t.test(x[,2])
One Sample t-test
data: x[, 2]
t = 0.7465, df = 119, p-value = 0.4568
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-1.009311 2.230812
sample estimates:
mean of x
0.6107505
> t.test(x[,3])
One Sample t-test
data: x[, 3]
t = 1.1524, df = 119, p-value = 0.2515
[Figure omitted: plot of the ordered squared distances against the quantiles of chi-square]
> source("Hotelling.R")
> Hotelling(z,c(4,50,10))
                    [,1]
Hotelling-T2  9.73877286
p.value       0.06492834   <== Cannot reject the null hypothesis at the 5% level.
Remark: Let $y = Cx + d$, where $C$ is a $p \times p$ non-singular matrix and $d$ is a $p$-dimensional vector. Then, testing $H_0: \mu = \mu_0$ of $x$ is equivalent to testing $H_0: \mu_y = C\mu_0 + d$ of $y$. It turns out that, as expected, the Hotelling $T^2$ statistic is identical.

Proof: $\bar{y} = C\bar{x} + d$ and $S_y = CSC'$. Therefore,
$$\begin{aligned}
T^2 &= n(\bar{y} - \mu_{y0})' S_y^{-1} (\bar{y} - \mu_{y0}) \\
&= n(C\bar{x} + d - C\mu_0 - d)'(CSC')^{-1}(C\bar{x} + d - C\mu_0 - d) \\
&= n[C(\bar{x} - \mu_0)]'(C')^{-1}S^{-1}C^{-1}[C(\bar{x} - \mu_0)] \\
&= n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0).
\end{aligned}$$
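The invariance property is easy to confirm numerically. The notes' computations are done in R; the following is a small self-contained sketch in Python instead (assuming NumPy is available), with made-up data and a made-up transformation $C$, $d$:

```python
# Numerical check of the affine-invariance of Hotelling's T^2.
# The data, C, and d below are illustrative, not from the notes.
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3
x = rng.normal(size=(n, p))          # sample of size n from N_3(0, I)
mu0 = np.zeros(p)                    # hypothesized mean

def hotelling_T2(x, mu0):
    """T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    n = x.shape[0]
    d = x.mean(axis=0) - mu0
    S = np.cov(x, rowvar=False)      # divisor n - 1, matching S in the notes
    return float(n * d @ np.linalg.solve(S, d))

# Non-singular affine transformation y = C x + d
C = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
d = np.array([1.0, -2.0, 0.5])
y = x @ C.T + d

T2_x = hotelling_T2(x, mu0)
T2_y = hotelling_T2(y, C @ mu0 + d)  # test the transformed hypothesis
assert np.isclose(T2_x, T2_y)        # identical up to rounding
```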
Key result: The Hotelling $T^2$ test statistic is equivalent to the likelihood ratio test statistic.

General theory: Let $\theta$ be the parameter vector of a likelihood function $L(\theta)$ with observations $x_1, \ldots, x_n$. Suppose that the parameter space is $\Theta$. Consider the null hypothesis $H_0: \theta \in \Theta_0$ versus $H_a: \theta \notin \Theta_0$, where $\Theta_0$ is a subspace of $\Theta$. The likelihood ratio statistic is
$$\Lambda = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}. \qquad (2)$$
One rejects $H_0$ if $\Lambda < c$, where $c$ is some critical value depending on the type-I error. Intuitively, one rejects $H_0$ if the maximized likelihood function over the subspace $\Theta_0$ is much smaller than that over the parameter space $\Theta$, indicating that it is unlikely for $\theta$ to be in $\Theta_0$. Specifically, under the null hypothesis $H_0$, the likelihood ratio statistic satisfies
$$-2\ln(\Lambda) \to \chi^2_{v - v_0}, \qquad \text{as } n \to \infty,$$
where $v$ and $v_0$ are the dimensions of $\Theta$ and $\Theta_0$, respectively.
For the multivariate normal likelihood, the maximum likelihood estimates over the full parameter space are $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$. Under $\Theta_0$, the likelihood is
$$L(\mu_0, \Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu_0)'\Sigma^{-1}(x_i - \mu_0)\right],$$
which is maximized by $\hat{\Sigma}_0 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)(x_i - \mu_0)'$. Consequently,
$$\Lambda = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}.$$
The statistic $\Lambda^{2/n} = |\hat{\Sigma}|/|\hat{\Sigma}_0|$ is called the Wilks lambda.
Note that
$$\sum_{i=1}^{n}(x_i - \mu_0)(x_i - \mu_0)' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu_0)(\bar{x} - \mu_0)',$$
so that
$$|n\hat{\Sigma}_0| = \left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' + n(\bar{x} - \mu_0)(\bar{x} - \mu_0)'\right| = \left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right| \times \left|1 + n(\bar{x} - \mu_0)'\Big(\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\Big)^{-1}(\bar{x} - \mu_0)\right|.$$
This follows directly from the identity $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$ with
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' & n(\bar{x} - \mu_0) \\ -(\bar{x} - \mu_0)' & 1 \end{bmatrix}.$$
Therefore,
$$|n\hat{\Sigma}_0| = \left|\sum_{i=1}^{n}(x_i - \mu_0)(x_i - \mu_0)'\right| = \left|\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'\right|\left(1 + \frac{T^2}{n-1}\right) = |n\hat{\Sigma}|\left(1 + \frac{T^2}{n-1}\right),$$
because $\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = (n-1)S$ and $n(\bar{x} - \mu_0)'[(n-1)S]^{-1}(\bar{x} - \mu_0) = T^2/(n-1)$.
Consequently,
$$\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \left(1 + \frac{T^2}{n-1}\right)^{-1}.$$
Remark: The prior result shows that there is no need to compute $S^{-1}$ when calculating Hotelling's $T^2$ statistic. Indeed,
$$T^2 = \frac{(n-1)|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - (n-1).$$
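The determinant shortcut can be verified numerically. Here is a minimal Python sketch (synthetic data for illustration; the notes themselves work in R) computing $T^2$ both directly and via the determinant ratio:

```python
# Verify T^2 = (n-1)|Sigma0_hat|/|Sigma_hat| - (n-1), avoiding S^{-1}.
# Data are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
x = rng.normal(size=(n, p))
mu0 = np.zeros(p)
xbar = x.mean(axis=0)

# Direct form: T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
S = np.cov(x, rowvar=False)                      # divisor n - 1
T2_direct = float(n * (xbar - mu0) @ np.linalg.solve(S, xbar - mu0))

# Determinant form via the two MLE covariance estimates
Sig_hat  = (x - xbar).T @ (x - xbar) / n         # centered at xbar
Sig0_hat = (x - mu0).T @ (x - mu0) / n           # centered at mu0
T2_det = (n - 1) * np.linalg.det(Sig0_hat) / np.linalg.det(Sig_hat) - (n - 1)

assert np.isclose(T2_direct, T2_det)
```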
Confidence region
Let $\theta$ be the parameter vector of interest and $\Theta$ be the parameter space. For a random sample $X_1, \ldots, X_n$, let $X$ denote the data. A $100(1-\alpha)\%$ confidence region $R(X)$ is defined by
$$Pr[\theta \in R(X)] = 1 - \alpha,$$
where the probability is evaluated at the unknown true parameter $\theta$. That is, the region $R(X)$ will cover $\theta$ with probability $1-\alpha$.
For the mean vector $\mu$ of a multivariate normal distribution, the confidence region (C.R.) can be obtained from the result
$$T^2 = n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \sim \frac{(n-1)p}{n-p}F_{p,n-p}.$$
That is,
$$Pr\left[n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \leq \frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)\right] = 1 - \alpha,$$
whatever the values of $\mu$ and $\Sigma$, where $S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'$. In other words, if we measure the distance using the variance-covariance matrix $\frac{1}{n}S$, then $\bar{X}$ will be within $[(n-1)pF_{p,n-p}(\alpha)/(n-p)]^{1/2}$ of $\mu$.
Remark: The quantity $(\bar{X} - \mu)'(S/n)^{-1}(\bar{X} - \mu)$ can be considered as the Mahalanobis distance of $\mu$ from $\bar{X}$, because the covariance matrix of $\bar{X}$ is $\frac{1}{n}\Sigma$, which is consistently estimated by $\frac{1}{n}S$. Compared with the Euclidean distance, the Mahalanobis distance takes into consideration the covariance structure. It is a distance measure based on correlations between variables.
For the normal distribution, the C.R. can be viewed as an ellipsoid centered at $\bar{x}$ with axes determined by the eigenvalues and eigenvectors of $S$. For the $N_p(\mu, \Sigma)$ distribution, the contours of constant density are ellipsoids defined by the $x$ such that
$$(x - \mu)'\Sigma^{-1}(x - \mu) = c^2.$$
The axes of the confidence ellipsoid are
$$\pm\sqrt{\lambda_i}\sqrt{\frac{(n-1)p}{n(n-p)}F_{p,n-p}(\alpha)}\, e_i, \qquad i = 1, \ldots, p,$$
where $(\lambda_i, e_i)$ are the eigenvalue-eigenvector pairs of $S$.
One-at-a-time intervals: For a fixed vector $a$, a $100(1-\alpha)\%$ confidence interval for $a'\mu$ is
$$a'\bar{x} - t_{n-1}(\alpha/2)\sqrt{\frac{a'Sa}{n}} \;\leq\; a'\mu \;\leq\; a'\bar{x} + t_{n-1}(\alpha/2)\sqrt{\frac{a'Sa}{n}}.$$
This interval consists of the $a'\mu$ values for which
$$\left|\frac{\sqrt{n}\,(a'\bar{x} - a'\mu)}{\sqrt{a'Sa}}\right| \leq t_{n-1}(\alpha/2),$$
or, equivalently,
$$\frac{n[a'(\bar{x} - \mu)]^2}{a'Sa} \leq t^2_{n-1}(\alpha/2).$$
If we consider all values of $a$ for which the prior inequality holds, then we are naturally led to the determination of
$$\max_a \frac{n[a'(\bar{x} - \mu)]^2}{a'Sa}.$$
Using the Maximization Lemma (see Eq. (2.50) of the text, p. 80) with $x = a$, $d = \bar{x} - \mu$, and $B = S$, we obtain
$$\max_a \frac{n[a'(\bar{x} - \mu)]^2}{a'Sa} = n \max_a \frac{[a'(\bar{x} - \mu)]^2}{a'Sa} = n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) = T^2.$$
Since $T^2 \leq \frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)$ with probability $1-\alpha$, simultaneously for all $a$, the interval
$$\left(a'\bar{X} - \sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)\,a'Sa},\;\; a'\bar{X} + \sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)\,a'Sa}\right)$$
contains $a'\mu$ with probability $1-\alpha$. These are referred to as the $T^2$-intervals.
In particular, choosing $a$ to be the $i$th unit vector yields the simultaneous component intervals
$$\bar{x}_i - \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii}}{n}} \;\leq\; \mu_i \;\leq\; \bar{x}_i + \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii}}{n}}.$$
For differences $\mu_i - \mu_k$, taking $a$ with $a_i = 1$, $a_k = -1$, and zeros elsewhere gives
$$\bar{x}_i - \bar{x}_k - \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}} \;\leq\; \mu_i - \mu_k \;\leq\; \bar{x}_i - \bar{x}_k + \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}}.$$
Table 1: Critical multipliers for one-at-a-time t-intervals and the $T^2$-intervals for selected sample size $n$ and dimension $p$, where $1 - \alpha = 0.95$. The $T^2$ multiplier is $\sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(0.05)}$.

      n   t_{n-1}(0.025)   p = 3   p = 5   p = 10
     15       2.145        3.495   4.825   11.51
     30       2.045        3.089   3.886   5.835
     50       2.010        2.961   3.631   5.044
    100       1.984        2.874   3.470   4.617
    200       1.972        2.834   3.396   4.438
    inf       1.960        2.795   3.327   4.279
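The multipliers in Table 1 are straightforward to recompute. A short Python sketch (assuming SciPy for the t and F quantiles; the notes' own computations are in R):

```python
# Recompute the critical multipliers of Table 1:
# one-at-a-time t_{n-1}(0.025) and T^2 multiplier sqrt(p(n-1)/(n-p) F_{p,n-p}(0.05)).
import numpy as np
from scipy import stats

def t2_multiplier(n, p, alpha=0.05):
    f = stats.f.ppf(1 - alpha, p, n - p)   # upper alpha percentile of F_{p,n-p}
    return np.sqrt(p * (n - 1) / (n - p) * f)

for n in [15, 30, 50, 100, 200]:
    t_mult = stats.t.ppf(1 - 0.025, n - 1)           # one-at-a-time multiplier
    row = [t2_multiplier(n, p) for p in (3, 5, 10)]  # T^2 multipliers
    print(n, round(float(t_mult), 3), [round(float(v), 3) for v in row])
```

The printed rows should match Table 1 to three decimals.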
Bonferroni method: Let $C_1, \ldots, C_m$ be $m$ confidence statements with $Pr[C_i \text{ true}] = 1 - \alpha_i$, $i = 1, \ldots, m$. Then
$$Pr[\text{all } C_i \text{ true}] \geq 1 - \sum_{i=1}^{m} Pr[C_i \text{ false}] = 1 - \sum_{i=1}^{m}[1 - Pr(C_i \text{ true})] = 1 - (\alpha_1 + \cdots + \alpha_m).$$
This is a special case of the Bonferroni inequality and allows us to control the overall type-I error rate $\alpha_1 + \cdots + \alpha_m$. The choices of $\alpha_i$ make the method flexible, depending on prior knowledge of the importance of each interval. If there is no prior information, then $\alpha_i = \alpha/m$ is often used.
Table 2: Ratio of the lengths between the Bonferroni interval and the $T^2$-interval for selected sample size $n$ and dimension $p$ when the probability of the interval is 0.95

      n   p = 2   p = 3   p = 4   p = 5   p = 10
     15   0.877   0.778   0.693   0.617   0.289
     30   0.899   0.823   0.761   0.709   0.521
     50   0.906   0.837   0.783   0.738   0.583
    100   0.911   0.847   0.798   0.757   0.622
    200   0.913   0.852   0.804   0.766   0.640
    inf   0.916   0.856   0.811   0.774   0.656
Let $z_i = a_i'\bar{x}$ and $s_{z_i} = a_i'Sa_i$. Then, each of the intervals
$$z_i - t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{z_i}}{n}} \;\leq\; a_i'\mu \;\leq\; z_i + t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{z_i}}{n}}, \qquad i = 1, \ldots, m,$$
contains $a_i'\mu$ with probability $1 - \alpha/m$. Jointly, all $a_i'\mu$ are in their respective intervals with probability at least $1 - m(\alpha/m) = 1 - \alpha$.
For instance, if $m = p$ and $a_i$ is the $i$th unit vector, then the intervals
$$\bar{x}_i - t_{n-1}\!\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{ii}}{n}} \;\leq\; \mu_i \;\leq\; \bar{x}_i + t_{n-1}\!\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{ii}}{n}}$$
hold jointly with probability at least $1 - \alpha$. We refer to these intervals as the Bonferroni intervals.
hold with probability at least 1 . We refer to these intervals as the Bonferroni intervals.
In general, if we use i = /p, then we can compare the length of Bonferroni intervals with
those of the T 2 intervals. The ratio is
Length of Bonferroni interval
tn1 (/(2p))
.
=r
2
Length of T interval
p(n1)
F
()
np
p,np
This ratio does not depend on the sample mean or covariance matrix. Table 2 gives the prior
ratio for some selected n and p. See also, Table 5.4 of the text( page 234).
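Because the ratio depends only on $n$, $p$, and $\alpha$, it can be tabulated directly. A minimal Python sketch (assuming SciPy; the notes work in R):

```python
# Length ratio of Bonferroni to T^2 intervals:
# t_{n-1}(alpha/(2p)) / sqrt(p(n-1)/(n-p) * F_{p,n-p}(alpha)).
import numpy as np
from scipy import stats

def length_ratio(n, p, alpha=0.05):
    bonf = stats.t.ppf(1 - alpha / (2 * p), n - 1)                 # Bonferroni multiplier
    t2 = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
    return float(bonf / t2)

# Compare a few entries with Table 2
for n in [15, 30, 50, 100, 200]:
    print(n, [round(length_ratio(n, p), 3) for p in (2, 3, 4, 5, 10)])
```

The ratio is always below 1: for fixed components, the Bonferroni intervals are shorter than the $T^2$-intervals.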
When the sample size is large, we can apply the central limit theorem, so the population need not be normal. Recall that, when $n$ is large, $n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \approx \chi^2_p$. Thus,
$$Pr[n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu) \leq \chi^2_p(\alpha)] \approx 1 - \alpha,$$
where $\chi^2_p(\alpha)$ is the upper $100\alpha$ percentile of a chi-square distribution with $p$ degrees of freedom. This property can be used (a) to construct asymptotic confidence intervals for the means $\mu_i$ and (b) to perform hypothesis testing about $\mu$. The corresponding large-sample simultaneous intervals for $a'\mu$ are
$$a'\bar{X} \pm \sqrt{\chi^2_p(\alpha)}\sqrt{\frac{a'Sa}{n}}.$$
We discuss two cases. In the first case, the goal is to identify unusual observations in a sample, including the possibility of drift over time. This is called the $T^2$-chart, which uses the distance $d_i^2$ discussed before in assessing the normality assumption. For the $i$th observation, the $d_i^2$ statistic is
$$d_i^2 = (x_i - \bar{x})'S^{-1}(x_i - \bar{x}).$$
The upper control limit is then set by the upper quantiles of 2p . Typically, 95th and 99th
percentiles are used to set the upper control limit and the lower control limit is zero. Once
a point is found to be outside the control limit, the individual confidence intervals for the
component means can be used to identify the source of the deviation.
As an illustration, consider the monthly log returns of the four Midwest companies from 1998 to 2007. The $T^2$-chart is given in Figure 2. About three data points fall outside the 99% limit, indicating a high-volatility period.
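The chart computations are simple to sketch. The following Python fragment uses synthetic data as a stand-in for the return series (only the formulas follow the notes; the course's own implementation is in R):

```python
# Minimal sketch of the T^2-chart: d_i^2 for each observation against
# chi-square control limits. Data are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 120, 4
x = rng.normal(size=(n, p))              # placeholder for the n x p return matrix

xbar = x.mean(axis=0)
S_inv = np.linalg.inv(np.cov(x, rowvar=False))
# Quadratic form d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar), row by row
d2 = np.einsum('ij,jk,ik->i', x - xbar, S_inv, x - xbar)

ucl95 = stats.chi2.ppf(0.95, p)          # 95th percentile control limit
ucl99 = stats.chi2.ppf(0.99, p)          # 99th percentile control limit
flagged = np.where(d2 > ucl99)[0]        # indices of points outside the 99% limit
```

Plotting `d2` against time order with horizontal lines at `ucl95` and `ucl99` reproduces the chart; the lower control limit is zero.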
In the second case, we consider a control chart for future observations. The theory behind this type of control chart is the following result.

Result 5.6. Let $X_1, \ldots, X_n$ be independently distributed as $N_p(\mu, \Sigma)$ and let $X$ be a future observation from the same population. Then,
$$T^2 = \frac{n}{n+1}(X - \bar{X})'S^{-1}(X - \bar{X}) \sim \frac{p(n-1)}{n-p}F_{p,n-p},$$
and a $100(1-\alpha)\%$ prediction ellipsoid for $X$ is given by all $x$ satisfying
$$(x - \bar{x})'S^{-1}(x - \bar{x}) \leq \frac{p(n^2-1)}{n(n-p)}F_{p,n-p}(\alpha).$$
Proof: First, $E(X - \bar{X}) = 0$. Since $X$ and $\bar{X}$ are independent,
$$Cov(X - \bar{X}) = \Sigma + \frac{1}{n}\Sigma = \frac{n+1}{n}\Sigma.$$
Thus, $\sqrt{n/(n+1)}\,(X - \bar{X})$ is distributed as $N_p(0, \Sigma)$.
Figure 2: $T^2$-chart for the monthly log returns of four Midwest companies: 1998-2007. [Plot omitted]
Figure 3: $T^2$ chart of future observations for the monthly log returns of four Midwest companies: 1998-2007. [Plot omitted]
Therefore,
$$T^2 = \left[\sqrt{\frac{n}{n+1}}(X - \bar{X})\right]'S^{-1}\left[\sqrt{\frac{n}{n+1}}(X - \bar{X})\right]$$
has a scaled F distribution as stated.
$T^2$-chart for future observations. For each new observation $x$, plot
$$T^2 = \frac{n}{n+1}(x - \bar{x})'S^{-1}(x - \bar{x})$$
against time order. Set the lower control limit to zero and the upper control limit to
$$UCL = \frac{(n-1)p}{n-p}F_{p,n-p}(0.01).$$
For illustration, consider the monthly log return series of the four Midwest companies. We
start with initial n = 40 observations. The T 2 -chart for future observations is given in
Figure 3.
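The recipe above can be sketched in a few lines of Python (synthetic data standing in for the return series; only the formulas follow the notes):

```python
# Future-observation T^2 chart: baseline of n observations, then each new
# point x gets T^2 = n/(n+1) (x - xbar)' S^{-1} (x - xbar), monitored
# against UCL = (n-1)p/(n-p) * F_{p,n-p}(0.01). Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
p, n = 4, 40                              # dimension and baseline sample size
base = rng.normal(size=(n, p))            # initial n observations
xbar = base.mean(axis=0)
S_inv = np.linalg.inv(np.cov(base, rowvar=False))

ucl = (n - 1) * p / (n - p) * stats.f.ppf(0.99, p, n - p)

def t2_future(x):
    """T^2 for a new observation x relative to the baseline sample."""
    d = x - xbar
    return n / (n + 1) * float(d @ S_inv @ d)

new_obs = rng.normal(size=(80, p))        # future points to monitor
t2 = np.array([t2_future(x) for x in new_obs])
out_of_control = np.where(t2 > ucl)[0]    # indices exceeding the UCL
```

In practice one would also update $\bar{x}$ and $S$ periodically as in-control observations accumulate; the sketch keeps the baseline fixed for simplicity.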
Remark: R functions for the two control charts are available on the course web.
Consider the case in which $X_t$ is serially correlated, such as when it follows a vector AR(1) model
$$X_t - \mu = \Phi(X_{t-1} - \mu) + a_t, \qquad a_t \overset{iid}{\sim} N_p(0, \Sigma_a),$$
where $E(X_t) = \mu$ and all eigenvalues of $\Phi$ are less than 1 in modulus. Let $X_1, \ldots, X_n$ be $n$ consecutive observations of the model. Define the lag-$\ell$ autocovariance matrix of $X_t$ as
$$\Gamma_\ell = E(X_t - \mu)(X_{t-\ell} - \mu)', \qquad \ell = 0, 1, 2, \ldots.$$
In particular, $\Gamma_0 = \sum_{i=0}^{\infty} \Phi^i \Sigma_a (\Phi^i)'$. For $\ell > 0$, post-multiplying the model by $(X_{t-\ell} - \mu)'$, taking expectation, and using the fact that $X_{t-\ell}$ is uncorrelated with $a_t$, we get
$$\Gamma_\ell = \Phi\Gamma_{\ell-1}, \qquad \ell = 1, 2, \ldots.$$
Taking transposes, $\Gamma_\ell' = \Gamma_{\ell-1}'\Phi'$ for $\ell \geq 1$, which is equivalent to
$$\Gamma_{-\ell} = \Gamma_{-\ell+1}\Phi', \qquad \ell = 1, 2, \ldots,$$
since $\Gamma_{-\ell} = \Gamma_\ell'$.
For the vector AR(1) model, we can show the following properties:
1. $E(\bar{X}) = \mu$.
2. $Cov(n^{-1/2}\sum_{t=1}^{n} X_t) \to \Omega$, where $\Omega = (I - \Phi)^{-1}\Gamma_0 + \Gamma_0\Phi'(I - \Phi')^{-1}$, as $n \to \infty$.
3. $S = \frac{1}{n-1}\sum_{t=1}^{n}(X_t - \bar{X})(X_t - \bar{X})' \to_p \Gamma_0$ as $n \to \infty$.
Using these properties, we can show that $\sqrt{n}(\bar{X} - \mu)$ is approximately normal with mean 0 and covariance $\Omega$. This implies that $n(\bar{X} - \mu)'\Omega^{-1}(\bar{X} - \mu) \sim \chi^2_p$, not the usual statistic $n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu)$. See Table 5.10 of the text (page 257) for the difference in coverage probability due to the effect of AR(1) serial correlations.
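The long-run covariance $\Omega$ can be computed directly once $\Gamma_0$ is known. A short Python sketch (illustrative $\Phi$ and $\Sigma_a$, not from the notes; assuming NumPy/SciPy), using the fact that $\Gamma_0$ solves the discrete Lyapunov equation $\Gamma_0 = \Phi\Gamma_0\Phi' + \Sigma_a$:

```python
# Long-run covariance of a stationary vector AR(1):
# Omega = (I - Phi)^{-1} Gamma_0 + Gamma_0 Phi' (I - Phi')^{-1},
# with Gamma_0 solving Gamma_0 = Phi Gamma_0 Phi' + Sigma_a.
# Phi and Sigma_a below are illustrative choices.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])             # eigenvalues 0.5 and 0.3: stationary
Sigma_a = np.eye(2)

Gamma0 = solve_discrete_lyapunov(Phi, Sigma_a)   # Gamma_0 = Phi Gamma_0 Phi' + Sigma_a
I = np.eye(2)
Omega = np.linalg.solve(I - Phi, Gamma0) + Gamma0 @ Phi.T @ np.linalg.inv(I - Phi.T)

# Omega equals the two-sided sum of autocovariances, hence is symmetric
assert np.allclose(Omega, Omega.T)
```

With $\Phi = 0$ the formula collapses to $\Omega = \Gamma_0 = \Sigma_a$, recovering the iid case.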