
A Well Conditioned and Sparse Estimate of Covariance and Inverse Covariance Matrix Using Joint Penalty
Ashwini Maurya
Department of Statistics and Probability, Michigan State University

Oct 29, 2014

Outline

Introduction and Scope
Proposed Estimator
An algorithm
A simulation study
Real life Data Application
Summary

Where is the covariance matrix used?

Principal Component Analysis (PCA) [Johnstone et al. (2004), Zou et al. (2006)].
Linear or Quadratic Discriminant Analysis (LDA/QDA) [Mardia et al. (1975)].
Gaussian Graphical Modeling (GGM) [Koller and Friedman (2009), Meinshausen et al. (2006)].
The covariance matrix itself is often not the end goal:
- PCA requires an estimate of the eigen-structure.
- LDA/QDA require the inverse covariance matrix.


Does the covariance matrix reflect the geometry of the data?

Data lying near a linear manifold? Yes, and PCA is the natural tool.
Otherwise, no! Let $X \sim U(-a, a)$ for some $a > 0$ and let $Y = X^2$; then $\mathrm{Cov}(X, Y) = 0$ even though $Y$ is a deterministic function of $X$.
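A quick numerical check of this in R (the sample size, seed, and the choice a = 1 are arbitrary and only for illustration):

```r
set.seed(1)
x <- runif(1e5, min = -1, max = 1)  # X ~ U(-a, a) with a = 1
y <- x^2                            # Y = X^2 lies on a parabola, not a line
cov(x, y)                           # approximately 0 despite perfect dependence
```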


Some real-life applications where the covariance matrix is useful

Analysis of gene expression data
Web search problems
Time series data analysis
Climate data analysis


Why not just use the sample covariance matrix?

Let $X_m = (X_{m,1}, X_{m,2}, \ldots, X_{m,p})$ be a $p$-dimensional random vector. The sample covariance matrix is $S = ((S_{ij}))$, where
$$S_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (X_{m,i} - \bar{X}_i)(X_{m,j} - \bar{X}_j), \qquad i, j = 1, 2, \ldots, p.$$
It is unbiased (and essentially the MLE) and well behaved for fixed $p$ as $n \to \infty$, but it is very noisy for large $p$.
The sample eigenvalues are overdispersed [Marcenko-Pastur (1967), Bai and Yin (1993), Johnstone (2001)].
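The overdispersion of the sample eigenvalues is easy to see numerically. A small R sketch (the dimensions and the identity-covariance setup are illustrative assumptions, not from the talk):

```r
set.seed(1)
n <- 100; p <- 80
X <- matrix(rnorm(n * p), n, p)        # true covariance is the identity
S <- cov(X)                            # sample covariance with denominator n - 1
range(eigen(S, symmetric = TRUE)$values)
# All true eigenvalues equal 1, but the sample eigenvalues spread far above
# and below 1, as predicted by the Marcenko-Pastur law.
```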


Why not just use the sample covariance matrix? (continued)

Sample eigenvectors are not consistent [Johnstone and Lu (2004)].
LDA based on $S$ breaks down when $p/n \to \infty$ [Bickel and Levina (2004)].
$S$ is singular whenever $p > n$.


Some desired properties of a covariance matrix estimator

Well conditioned: the eigenvalues are bounded below and above by finite positive constants. Estimation of the eigen-structure is also important.
Sparsity: in the high-dimensional setting, it does not make sense to estimate more parameters than the number of observations without a sparsity structure.
A fast and efficient algorithm: high computational complexity is a big problem for high-dimensional data.


Early work: Estimators based on shrinkage of eigenvalues

Steinian shrinkage of eigenvalues, first proposed by Stein [Rietz lecture (1975)]:
$$\hat{\Sigma} = U \, \Lambda(\hat{\lambda}) \, U^T,$$
where $\Lambda(\hat{\lambda})$ is a diagonal matrix whose entries are a transformed function of the sample eigenvalues and $U$ is the matrix of eigenvectors of the sample covariance matrix.
Empirical Bayes, L. R. Haff (1980): $\hat{\Sigma} = \rho_1 S + \rho_2 I$, where $\rho_1$ and $\rho_2$ depend upon $n$ and $p$ only.
Ledoit-Wolf (2003): $\hat{\Sigma} = \rho_1 S + \rho_2 I$, where the optimal $\rho_1$ and $\rho_2$ are estimated from the data.


Early work: Estimators based on shrinkage of eigenvalues

Maurya (2014): a joint convex penalty of the $\ell_1$ and trace norms to overcome the over-dispersion of the sample eigenvalues.
Estimators of the form $\rho_1 S + \rho_2 I$ have also been used in other contexts:
- the original formulation of ridge regression [Hoerl and Kennard, 1970];
- regularized discriminant analysis [Friedman, 1989].


What do we achieve by eigenvalue shrinkage?

The estimators are well-conditioned.
They are invariant under orthogonal transformations, since the eigenvectors remain unchanged.
- This means the basis in which the data are generated is not taken advantage of.
- It effectively assumes that one can find a good estimator in any basis.
- It is often more appropriate to believe that the basis is somewhat nice, i.e. that the covariance matrix has some structure that one should be able to take advantage of.


An estimator based on shrinkage of eigenvalues

Consider the following optimization problem:
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2, \qquad (1.1)$$
where $S$ is the sample covariance matrix, $\lambda_i(\Sigma)$ is the $i$-th largest eigenvalue of $\Sigma$, $t > 0$ is a suitably chosen positive constant, and the $a_i$'s are shrinkage weights.
The solution to (1.1) is
$$\hat{\Sigma} = (S + t\,\gamma\, U A U^T)(I + \gamma\, U A U^T)^{-1}, \qquad (1.2)$$
where $A = \mathrm{diag}(a_1, a_2, \ldots, a_p)$, $I$ is the identity matrix, and $U$ is a matrix of eigenvectors (see later for the choice of $U$).
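As a rough illustration of the closed form (1.2), the R sketch below computes the estimator for a given S. The function name, the equal weights a_i = 1, and the default choices of gamma and t are my own illustrative assumptions, not the author's code; U is taken from the eigendecomposition of S.

```r
shrink_eigen_estimator <- function(S, gamma = 0.5, t = mean(diag(S)),
                                   a = rep(1, nrow(S))) {
  p <- nrow(S)
  U <- eigen(S, symmetric = TRUE)$vectors   # matrix of eigenvectors (choice of U)
  M <- gamma * U %*% diag(a) %*% t(U)       # gamma * U A U^T
  # Closed form (1.2): (S + t*gamma*U A U^T) (I + gamma*U A U^T)^{-1}
  (S + t * M) %*% solve(diag(p) + M)
}

# Example usage on simulated data:
set.seed(1)
X <- matrix(rnorm(40 * 10), 40, 10)
Sigma_hat <- shrink_eigen_estimator(cov(X))
range(eigen(Sigma_hat, symmetric = TRUE)$values)  # eigenvalues pulled towards t
```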


An estimator based on shrinkage of eigenvalues

Well conditioned:
The estimator in (1.2) is well conditioned:
$$\lambda_{\min}(\hat{\Sigma}) = \lambda_{\min}\!\big(S U (I + \gamma A)^{-1} U^T + t\,\gamma\, U A (I + \gamma A)^{-1} U^T\big) \;>\; t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \;>\; 0$$
for $\min_{i \le p} A_{ii} > 0$, $\gamma > 0$, $t > 0$.
However, the estimator (1.2) is not sparse.


Early work: Estimators based on regularization

We focus on two classes of regularized estimators:
One class relies upon a natural ordering among the variables.
- Includes estimators based on banding and tapering [Bickel and Levina (2008), Cai et al. (2010)].
The other class assumes no natural ordering among the variables.
- In many examples a natural ordering does not make sense, e.g. gene expression data.
- Searching over all permutations is not feasible.
- $\ell_1$-penalized estimators become more appropriate; they yield permutation-invariant and sparse estimates [Rothman et al. (2008), Yuan (2009), Jacob and Tibshirani (2011)].


Early work: Estimators based on regularization

Consider the following optimization problem based on the $\ell_1$ penalty:
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1, \qquad (1.3)$$
where $\lambda$ is a positive constant and the $\ell_1$ penalty $\|\Sigma\|_1$ is applied only to the off-diagonal entries of $\Sigma$.
The minimizer of (1.3) is
$$\hat{\Sigma}_{ii} = S_{ii}, \qquad \hat{\Sigma}_{ij} = \mathrm{sign}(S_{ij}) \max\big(|S_{ij}| - \lambda/2, \, 0\big), \quad i \neq j. \qquad (1.4)$$
A sufficiently large value of $\lambda$ gives a sparse solution.
The solution (1.4) need not be positive definite [Xue et al. (2012)].
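A minimal R sketch of the soft-thresholding solution (1.4); the function name and the example value of lambda are illustrative assumptions.

```r
soft_threshold_cov <- function(S, lambda) {
  Sig <- sign(S) * pmax(abs(S) - lambda / 2, 0)  # element-wise soft threshold
  diag(Sig) <- diag(S)                           # diagonal entries are not penalized
  Sig
}
# Example: Sig_hat <- soft_threshold_cov(S, lambda = 0.1)
```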


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

Consider a joint penalty combining the $\ell_1$ penalty and an eigenvalue penalty. Let $\hat{\Sigma}_{\lambda,\gamma}$ be the solution to the following optimization problem:
$$\hat{\Sigma}_{\lambda,\gamma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2. \qquad (2.1)$$
The estimator in (2.1) shrinks the eigenvalues of the sample covariance matrix towards the constant $t$ with weights $a_i$, and a sufficiently large value of $\lambda$ yields a sparse estimate of the covariance matrix.
The estimator in (2.1) need not be positive definite.


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

The proposed JPEN estimator is given by
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma \,:\, (\lambda,\gamma) \in \hat{R}_1^{S,t,A}} \Big[ \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2 \Big], \qquad (2.2)$$
where
$$\hat{R}_1^{S,t,A} = \Big\{ (\lambda, \gamma) : \lambda \asymp \gamma \sqrt{\tfrac{\log p}{n}},\ t > 0,\ \min_{i \le p} A_{ii} > 0,\ \frac{\lambda}{1 + \gamma \max_{i \le p} A_{ii}} \le \frac{\lambda_{\min}(S)}{2} + t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \Big\}.$$


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

The estimator given by (2.2) is positive definite and sparse simultaneously.
Let $p/n \to c < 1$ and let $\hat{\Sigma} = \hat{U} \hat{D} \hat{U}^T$ be the eigenvalue decomposition of $\hat{\Sigma}$; then
$$\lambda_{\min}(\hat{D}) \;\ge\; \frac{t\,\gamma \min_{i \le p} A_{ii} - \lambda/2}{1 + \gamma \min_{i \le p} A_{ii}} \;+\; \frac{\lambda_{\min}(S)}{1 + \gamma \max_{i \le p} A_{ii}}.$$
For $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$, $\lambda_{\min}(S)$ stays bounded away from zero in probability [Bai and Yin (1993)]. Therefore $\lambda_{\min}(\hat{D}) > 0$ as $n = n(p) \to \infty$.


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

What can we say about the existence of the random set $\hat{R}_1^{S,t,A}$?
The set $\hat{R}_1^{S,t,A}$ is asymptotically nonempty.
For finite samples with $p > n$ we have $\lambda_{\min}(S) = 0$, but
$$\lambda_{\min}(\hat{D}) \;\ge\; \frac{t\,\gamma \min_{i \le p} A_{ii}}{1 + \gamma \min_{i \le p} A_{ii}} - \frac{\lambda}{2} \;>\; 0$$
for sufficiently large $\gamma$, $t$ and $\min_{i \le p} A_{ii} > 0$. This guarantees that the set $\hat{R}_1^{S,t,A}$ is nonempty for finite samples.


Theoretical consistency of the JPEN estimator

Assumptions about the true covariance matrix $\Sigma_0$:
A0. $X := (X_1, X_2, \ldots, X_p)$ is a mean-zero random vector with covariance matrix $\Sigma_0$.
A1. With $E = \{(i,j) : \Sigma_{0ij} \neq 0,\ i \neq j\}$, the cardinality $\mathrm{card}(E) \le s$ for some positive integer $s$.
A2. There exists a finite positive real number $\bar{k} > 0$ such that $1/\bar{k} \le \lambda_{\min}(\Sigma_0) \le \lambda_{\max}(\Sigma_0) \le \bar{k}$, where $\lambda_{\min}(\Sigma_0)$ and $\lambda_{\max}(\Sigma_0)$ are the minimum and maximum eigenvalues of $\Sigma_0$, respectively.


Theoretical consistency of the JPEN estimator

Theorem (Consistency in Frobenius norm)
Let $\hat{\Sigma}$ be the minimizer defined in (2.2). Under Assumptions A0, A1, A2, for $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$ and $\lambda_{\min}(\Sigma_0) \le t \le \lambda_{\max}(\Sigma_0)$, we have
$$\|\hat{\Sigma} - \Sigma_0\|_F = O_P\!\left( \sqrt{\frac{(p+s)\log p}{n}} \right). \qquad (2.3)$$
Here the worst part of the rate of convergence comes from estimating the diagonal entries. For correlation matrix estimation the rate improves to $O_P\big(\sqrt{s \log p / n}\big)$.


Theoretical consistency of the JPEN estimator

Let $\Sigma_0 = W \Gamma_0 W$ be the variance-correlation decomposition of $\Sigma_0$, where $\Gamma_0$ is the true correlation matrix and $W$ is the diagonal matrix of true standard deviations. Define the JPEN estimator of $\Gamma_0$:
$$\hat{K} = \operatorname*{argmin}_{K \,:\, (\lambda,\gamma) \in \hat{R}_{1a}} \Big\{ \|K - \hat{\Gamma}\|_F^2 + \lambda \|K\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(K) - t\}^2 \Big\}, \qquad (2.4)$$
where $\hat{\Gamma}$ is the sample counterpart of $\Gamma_0$ and $\hat{R}_{1a}$ is given by
$$\hat{R}_{1a} = \Big\{ (\lambda, \gamma) : \lambda \asymp \gamma \sqrt{\tfrac{\log p}{n}},\ t > 0,\ \frac{\lambda}{1 + \gamma \max_{i \le p} A_{ii}} \le \frac{\lambda_{\min}(\hat{\Gamma})}{2} + t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \Big\}. \qquad (2.5)$$


Theoretical consistency of the JPEN estimator

Let $\hat{\Sigma}_K = \hat{W} \hat{K} \hat{W}$ be the correlation-matrix-based estimate of the covariance matrix, where $\hat{W}$ is the diagonal matrix of sample standard deviation estimates. The following corollary establishes convergence of $\hat{\Sigma}_K$ in operator norm:

Theorem
Under Assumptions A0, A1, A2 and for $(\lambda, \gamma) \in \hat{R}_{1a}$,
$$\|\hat{\Sigma}_K - \Sigma_0\| = O_P\!\left( \sqrt{\frac{(s+1)\log p}{n}} \right). \qquad (2.6)$$
Note that $\|\hat{\Sigma}_K - \Sigma_0\|_F \le \sqrt{p}\,\|\hat{\Sigma}_K - \Sigma_0\|$, so the rate of convergence in Frobenius norm of the correlation-matrix-based estimator is the same as that of the estimator defined in (2.2).


Theoretical consistency of the JPEN estimator

Remarks:
i) For $s = O(\log p)$, $\|\hat{\Sigma}_K - \Sigma_0\| = O_P\big(\log p / \sqrt{n}\big)$. This rate of operator-norm convergence is the same as that of the banded estimator proposed by Bickel and Levina (2008).
ii) Rothman (2012) proposed an estimator of the covariance matrix based on a similar loss function. The choice of a different penalty function yields a very different estimate. Moreover, although his estimator yields a positive definite covariance matrix, it is hard to verify whether his algorithm attains the optimal solution of the underlying optimization problem.


An algorithm

Let $f(\Sigma)$ be the objective function of (2.2). Then
$$f(\Sigma) = \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2. \qquad (3.1)$$
Let $\Sigma = U D U^T$ be the eigenvalue decomposition of $\Sigma$, and let $f_1(\Sigma)$ be the same as $f(\Sigma)$ but excluding the terms not depending upon $\Sigma$. We have
$$f_1(\Sigma) = \mathrm{tr}\big\{ (\Sigma^2 - 2\,B C^{-1} \Sigma)\, C \big\} + \lambda \|\Sigma\|_1 \;\le\; \big(1 + \gamma \max_{i \le p} A_{ii}\big)\, \|\Sigma - B C^{-1}\|_F^2 + \lambda \|\Sigma\|_1 \;=:\; f_2(\Sigma),$$
where $I$ is the identity matrix, $C = I + \gamma\, U A U^T$, and $B = S + t\,\gamma\, U A U^T$.

An algorithm

Note that $f_2(\Sigma)$ is convex in $\Sigma$; therefore a unique minimizer exists.
An optimal solution to (3.1) is given by
$$\hat{\Sigma}_{ii} = (B C^{-1})_{ii}, \qquad \hat{\Sigma}_{ij} = \mathrm{sign}\big((B C^{-1})_{ij}\big)\, \max\!\Big( |(B C^{-1})_{ij}| - \frac{\lambda}{2\,(1 + \gamma \max_{i \le p} A_{ii})},\ 0 \Big), \quad i \ne j, \qquad (3.2)$$
where $\mathrm{sign}(x)$ is the sign of $x$ and $|x|$ is the absolute value of $x$.
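A rough R sketch of the closed-form update (3.2), using the choice of U described on the next slide (eigenvectors of S + eps*I) and equal weights a_i = 1. The function name, default values, and the symmetrization step are illustrative assumptions, not the author's implementation.

```r
jpen_update <- function(S, lambda, gamma, t = mean(diag(S)),
                        a = rep(1, nrow(S)), eps = 1e-3) {
  p <- nrow(S)
  U <- eigen(S + eps * diag(p), symmetric = TRUE)$vectors  # choice of U
  M <- gamma * U %*% diag(a) %*% t(U)      # gamma * U A U^T
  B <- S + t * M                           # B = S + t*gamma*U A U^T
  C <- diag(p) + M                         # C = I + gamma*U A U^T
  Theta <- B %*% solve(C)                  # B C^{-1}
  Theta <- (Theta + t(Theta)) / 2          # symmetrize against numerical error
  thr <- lambda / (2 * (1 + gamma * max(a)))
  Sig <- sign(Theta) * pmax(abs(Theta) - thr, 0)  # soft-threshold off-diagonals
  diag(Sig) <- diag(Theta)                        # diagonal kept as (B C^{-1})_ii
  Sig
}
```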


An algorithm: choices of U, λ, and γ

Choice of U:
Note that U is the matrix of eigenvectors of $\Sigma$, which is unknown. In practice one can choose U as the matrix of eigenvectors from the eigenvalue decomposition of $S + \epsilon I$ for some $\epsilon > 0$; i.e., let $S + \epsilon I = U_1 D_1 U_1^T$ and take $U = U_1$.
Choice of $\lambda$ and $\gamma$:
For a given value of $\gamma > 0$, we can choose $\lambda$ satisfying
$$\lambda \;<\; 2\,\big(1 + \gamma \min_{i \le p} A_{ii}\big) \Big\{ \frac{\lambda_{\min}(S)}{1 + \gamma \max_{i \le p} A_{ii}} + 2\,\gamma\, t \min_{i \le p} A_{ii} \Big\},$$
and such a choice of $\lambda$ guarantees that the minimum eigenvalue of the estimate is bounded below by a positive constant and that $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$.


Computational time

We compare the computational time of our algorithm with the Glasso and PDSCE [Rothman (2012)] algorithms.

Figure: Timing comparison of JPEN, graphical lasso (Glasso), and PDSCE on a log-log scale.


Computational time

We compare the computational timing of our algorithm to some other existing algorithms: graphical lasso [Friedman et al. (2008)] and PDSCE [Rothman (2011)].
The exact timing of these algorithms also depends upon the implementation, platform, etc. (we did our computations in R 3.1 on an AMD 2.8 GHz processor).
Unlike the Glasso algorithm, whose computational time depends upon how dense the true covariance matrix is, our algorithm takes roughly the same amount of time for both dense and sparse covariance matrices.
Although the proposed method requires optimization over a grid of values of $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$, our algorithm is computationally efficient and easily scalable to large-scale data analysis problems.


Performance comparison with some other methods

We generate random vectors from the multivariate t-distribution with 5 degrees of freedom for varying sample sizes and dimensions. We chose $n = 50, 100, 200$ and $p = 500, 1000$.
For each estimate of the covariance and inverse covariance matrix, we calculate the average relative error (ARE) based on 50 simulations using the formula
$$\mathrm{RE}(\Sigma_0, \hat{\Sigma}) = \big| \log f(S, \hat{\Sigma}) - \log f(S, \Sigma_0) \big| \,/\, \big| \log f(S, \Sigma_0) \big|,$$
where $f(S, \Sigma)$ is the density of the multivariate normal distribution, $S$ is the sample covariance matrix, $\Sigma_0$ is the true covariance matrix, and $\hat{\Sigma}$ is the estimate of $\Sigma_0$.
Another choice of performance criterion is the Kullback-Leibler divergence [Yuan and Lin (2007), Bickel and Levina (2008)].
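A minimal R sketch of this criterion using the multivariate normal log-likelihood (up to an additive constant) as log f; the function names and the n/2 scaling are illustrative assumptions.

```r
gauss_loglik <- function(S, Sigma, n) {
  # log f(S, Sigma) up to a constant: -(n/2) { log|Sigma| + tr(Sigma^{-1} S) }
  -(n / 2) * (as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
                sum(diag(solve(Sigma, S))))
}

relative_error <- function(Sigma0, Sigma_hat, S, n) {
  abs(gauss_loglik(S, Sigma_hat, n) - gauss_loglik(S, Sigma0, n)) /
    abs(gauss_loglik(S, Sigma0, n))
}
```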


Simulation

We generate random vectors from the multivariate t-distribution with mean vector zero, 5 degrees of freedom, and various structured covariance matrices.
(i) Hub Graph: the rows/columns of $\Sigma_0$ are partitioned into $J$ equally-sized disjoint groups $\{V_1 \cup V_2 \cup \cdots \cup V_J\} = \{1, 2, \ldots, p\}$, and each group is associated with a pivotal row $k$. Let the group size be $|V_1| = s$. We set $\Sigma_{0i,j} = \Sigma_{0j,i} = \rho$ for $i \in V_k$ and $\Sigma_{0i,j} = \Sigma_{0j,i} = 0$ otherwise. In our experiment $J = [p/s]$, $k = 1, s+1, 2s+1, \ldots$, and we always take $\rho = 1/(s+1)$ with $J = 20$. A generating sketch is given below.
(ii) Neighborhood Graph: we first uniformly sample $(y_1, y_2, \ldots, y_n)$ from the unit square. We then set $\Sigma_{0i,j} = \Sigma_{0j,i} = \rho$ with probability $(\sqrt{2\pi})^{-1} \exp(-4\|y_i - y_j\|^2)$. The remaining entries of $\Sigma_0$ are set to zero. We always take $\rho$ to be 0.245.
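A small R sketch that builds the hub-type $\Sigma_0$ described in (i); the unit diagonal, the function name, and the example dimensions are illustrative assumptions.

```r
hub_cov <- function(p, s) {
  rho <- 1 / (s + 1)
  Sigma <- diag(p)
  pivots <- seq(1, p, by = s)              # pivotal rows k = 1, s+1, 2s+1, ...
  for (k in pivots) {
    members <- k:min(k + s - 1, p)         # group V_k of size s
    Sigma[k, members] <- rho               # Sigma_{0 i,j} = rho for i in V_k
    Sigma[members, k] <- rho
  }
  diag(Sigma) <- 1
  Sigma
}
Sigma0 <- hub_cov(p = 100, s = 5)          # J = p/s = 20 groups
```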


Recovery of the eigen-spectrum of the true covariance matrix

Figure: Eigenvalue plot for p = 50 based on 50 realizations.


Tuning parameter selection

For each estimate, the optimal tuning parameters were obtained by minimizing the empirical loss function
$$\|\hat{\Sigma} - S_{\mathrm{robust}}\|_F, \qquad (4.1)$$
where $\hat{\Sigma}$ is an estimate of the covariance matrix and $S_{\mathrm{robust}}$ is an estimate of the sample covariance matrix based on 5000 sample observations (refer to Section 5 of the manuscript for a detailed discussion).
Simulations show that the optimal values of the tuning parameters $(\lambda, \gamma)$ selected by criterion (4.1) are similar if we replace $S_{\mathrm{robust}}$ by the true covariance matrix.
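A schematic R sketch of this grid search, reusing the jpen_update() sketch shown earlier; the grid values and the S_robust argument are illustrative assumptions.

```r
select_tuning <- function(S, S_robust, lambdas, gammas) {
  best <- list(loss = Inf)
  for (lam in lambdas) {
    for (gam in gammas) {
      Sig  <- jpen_update(S, lambda = lam, gamma = gam)
      loss <- norm(Sig - S_robust, type = "F")      # criterion (4.1)
      if (loss < best$loss) best <- list(lambda = lam, gamma = gam, loss = loss)
    }
  }
  best
}
```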


Recovery of the sparse structure of the true covariance matrix

Figure: Heatmap of zeros identified in the covariance matrix out of 50 realizations. White means 50/50 zeros identified; black means 0/50 zeros identified.


Average relative error and standard deviations

Table: Hub type of covariance matrix

p = 500
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    19.5 (7.083)   382 (23.08)    21.3 (2.845)   19.5 (7.072)
  100   36.5 (19.19)   422 (54.06)    22.9 (7.590)   18.9 (4.077)
  200   21.9 (3.558)   340 (48.35)    24.9 (5.753)   18.3 (4.663)

p = 1000
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    64.5 (6.19)    470 (49.8)     57.7 (10.93)   50.6 (4.8)
  100   92.9 (11.3)    559 (72.9)     89.4 (12.65)   57.8 (4.76)
  200   106 (13.5)     463 (59.7)     97.9 (17.34)   61.9 (4.93)


Average relative error and standard deviations

Table: Neighborhood type of covariance matrix

p = 500
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    0.47 (0.004)   13.4 (0.198)   0.48 (0.003)   0.47 (0.004)
  100   0.48 (0.002)   10.8 (0.090)   0.48 (0.002)   0.34 (0.032)
  200   0.34 (0.002)   7.97 (0.070)   0.34 (0.001)   0.28 (0.005)

p = 1000
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    0.29 (0.003)   18.2 (0.2)     0.30 (0.003)   0.28 (0.003)
  100   0.27 (0.002)   11.9 (0.131)   0.27 (0.002)   0.26 (0.002)
  200   0.26 (0.001)   9.21 (0.179)   0.27 (0.001)   0.25 (0.001)


Simulation

The average relative errors and their standard deviations are given in the tables above (please refer to the manuscript for detailed simulations). The numbers in brackets are the standard error estimates of the relative error.
The JPEN estimate of the covariance matrix outperforms the other methods for all values of p and n and for all types of covariance matrices considered.
Among all the methods, the PDSCE estimates are closest to the JPEN estimates in terms of ARE.
The Ledoit-Wolf estimate performs well in terms of ARE, but the estimated covariance matrix is not sparse.


Analysis of Colon Tumor Tissue Data

In this experiment, colon adenocarcinoma tissue samples were collected, 40 of which were tumor tissues and 22 non-tumor tissues. Tissue samples were analyzed using an Affymetrix oligonucleotide array.
The data were processed, filtered, and reduced to a subset of 2,000 gene expression values with the largest minimal intensity over the 62 tissue samples (source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html).
We obtain estimates of the covariance matrix for p = 50, 100, 200 and then use LDA to classify these tissues as either tumorous or non-tumorous (normal).


Analysis of Colon Tumor Tissue Data

We classify each test observation $x$ to either class $k = 0$ (tumorous) or $k = 1$ (normal) using the LDA rule
$$\hat{\delta}_k(x) = \operatorname*{arg\,max}_{k} \Big\{ x^T \hat{\Omega}\, \hat{\mu}_k - \frac{1}{2} \hat{\mu}_k^T \hat{\Omega}\, \hat{\mu}_k + \log(\hat{\pi}_k) \Big\},$$
where $\hat{\pi}_k$ is the proportion of class-$k$ observations in the training data, $\hat{\mu}_k$ is the sample mean for class $k$ on the training data, and $\hat{\Omega} := \hat{\Sigma}^{-1}$ is an estimator of the inverse of the common covariance matrix on the training data, computed by one of the methods under consideration.
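A minimal R sketch of this LDA rule with a plug-in inverse covariance estimate Omega (for example, the inverse of a JPEN estimate); the function and argument names are illustrative assumptions.

```r
lda_predict <- function(x, X_train, y_train, Omega) {
  classes <- sort(unique(y_train))
  scores <- sapply(classes, function(k) {
    mu_k <- colMeans(X_train[y_train == k, , drop = FALSE])  # class mean
    pi_k <- mean(y_train == k)                               # class proportion
    sum(x * (Omega %*% mu_k)) - 0.5 * sum(mu_k * (Omega %*% mu_k)) + log(pi_k)
  })
  classes[which.max(scores)]
}
```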


Analysis of Colon Tumor Tissue Data

Table: Averages and standard errors of classification errors (in %) over 100 replications.

  Method                p=50           p=100          p=200
  Logistic Regression   21.0 (0.84)    19.31 (0.89)   21.5 (0.85)
  SVM                   16.70 (0.85)   16.76 (0.97)   18.18 (0.96)
  Naive Bayes           13.3 (0.75)    14.33 (0.85)   14.63 (0.75)
  Graphical Lasso       10.9 (1.3)     9.4 (0.89)     9.8 (0.90)
  Joint Penalty         9.9 (0.98)     8.9 (0.93)     8.2 (0.81)

Among all the methods, the covariance-matrix-based LDA classifiers perform far better than the other methods.
When more genes are added to the data set, the classification performance of the JPEN-based LDA classifier improves, whereas for the other methods the classification performance deteriorates.


Summary

A joint penalty (JPEN) estimate of the covariance matrix is proposed. The estimator is both well-conditioned and sparse simultaneously. The proposed approach allows one to take advantage of any prior structure, if known, on the eigenvalues of the true covariance matrix.
The theoretical consistency of the JPEN estimator is established in both Frobenius and operator norm, which guarantees consistency for the principal components; hence we expect PCA to be one of the most important applications of the method.
The proposed algorithm is very fast, efficient, and easily scalable to large-scale optimization problems.


Estimation of the inverse covariance matrix

We also propose a JPEN estimator of the inverse covariance matrix and establish a similar rate of convergence for it.
The JPEN inverse covariance estimate performs better than some other methods for varying sample sizes and dimensions.
Please refer to the manuscript for a detailed discussion.


Acknowledgment
I would like to express my deep gratitude to Professor Hira L. Koul
for his valuable and constructive suggestions during the planning
and development of this research work.


Thank you for your attention!


For references and other details, I can be reached at
mauryaas@msu.edu.
