
A Well Conditioned and Sparse Estimate of Covariance and Inverse Covariance Matrix Using Joint Penalty
Ashwini Maurya
Department of Statistics and Probability, Michigan State University

Oct 29, 2014

Outline

Introduction and Scope
Proposed Estimator
An algorithm
A simulation study
Real life Data Application
Summary

Where is the covariance matrix used?

Principal Component Analysis (PCA) [Johnstone et al. (2004), Zou et al. (2006)].
Linear or Quadratic Discriminant Analysis (LDA/QDA) [Mardia et al. (1975)].
Gaussian Graphical Modeling (GGM) [Koller and Friedman (2009), Meinshausen et al. (2006)].
The covariance matrix itself is often not the end goal:
- PCA requires an estimate of the eigen-structure.
- LDA/QDA require the inverse covariance matrix.


Does the covariance matrix reflect the geometry of the data?

Data lying near a linear manifold? Yes, and PCA is the natural tool.
Otherwise, no! Let $X \sim U(-a, a)$ for some $a > 0$ and let $Y = X^2$; then $\mathrm{Cov}(X, Y) = 0$ even though $Y$ is a deterministic function of $X$.
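A quick numerical check of this in R (the sample size, seed, and the choice a = 1 are arbitrary and only for illustration):

```r
set.seed(1)
x <- runif(1e5, min = -1, max = 1)  # X ~ U(-a, a) with a = 1
y <- x^2                            # Y = X^2 lies on a parabola, not a line
cov(x, y)                           # approximately 0 despite perfect dependence
```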


Some real-life applications where the covariance matrix is useful

Analysis of gene expression data
Web search problems
Time series data analysis
Climate data analysis


Why not just use the sample covariance matrix?

Let $X_m = (X_{m,1}, X_{m,2}, \ldots, X_{m,p})$ be a $p$-dimensional random vector. The sample covariance matrix is $S = ((S_{ij}))$, where
$$S_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (X_{m,i} - \bar{X}_i)(X_{m,j} - \bar{X}_j), \qquad i, j = 1, 2, \ldots, p.$$
It is unbiased (and essentially the MLE) and well behaved for fixed $p$ as $n \to \infty$, but it is very noisy for large $p$.
The sample eigenvalues are overdispersed [Marcenko-Pastur (1967), Bai and Yin (1993), Johnstone (2001)].
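The overdispersion of the sample eigenvalues is easy to see numerically. A small R sketch (the dimensions and the identity-covariance setup are illustrative assumptions, not from the talk):

```r
set.seed(1)
n <- 100; p <- 80
X <- matrix(rnorm(n * p), n, p)        # true covariance is the identity
S <- cov(X)                            # sample covariance with denominator n - 1
range(eigen(S, symmetric = TRUE)$values)
# All true eigenvalues equal 1, but the sample eigenvalues spread far above
# and below 1, as predicted by the Marcenko-Pastur law.
```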


Why not just use the sample covariance matrix? (continued)

Sample eigenvectors are not consistent [Johnstone and Lu (2004)].
LDA based on $S$ breaks down when $p/n \to \infty$ [Bickel and Levina (2004)].
$S$ is singular whenever $p > n$.


Some desired properties of a covariance matrix estimator

Well conditioned: the eigenvalues are bounded below and above by finite positive constants. Estimation of the eigen-structure is also important.
Sparsity: in the high-dimensional setting, it does not make sense to estimate more parameters than the number of observations without a sparsity structure.
A fast and efficient algorithm: high computational complexity is a big problem for high-dimensional data.


Early work: Estimators based on shrinkage of eigenvalues

Steinian shrinkage of eigenvalues, first proposed by Stein [Rietz lecture (1975)]:
$$\hat{\Sigma} = U \, \Lambda(\hat{\lambda}) \, U^T,$$
where $\Lambda(\hat{\lambda})$ is a diagonal matrix whose entries are a transformed function of the sample eigenvalues and $U$ is the matrix of eigenvectors of the sample covariance matrix.
Empirical Bayes, L. R. Haff (1980): $\hat{\Sigma} = \rho_1 S + \rho_2 I$, where $\rho_1$ and $\rho_2$ depend upon $n$ and $p$ only.
Ledoit-Wolf (2003): $\hat{\Sigma} = \rho_1 S + \rho_2 I$, where the optimal $\rho_1$ and $\rho_2$ are estimated from the data.


Early work: Estimators based on shrinkage of eigenvalues

Maurya (2014): a joint convex penalty of the $\ell_1$ and trace norms to overcome the over-dispersion of the sample eigenvalues.
Estimators of the form $\rho_1 S + \rho_2 I$ have also been used in other contexts:
- the original formulation of ridge regression [Hoerl and Kennard, 1970];
- regularized discriminant analysis [Friedman, 1989].


What do we achieve by eigenvalue shrinkage?

The estimators are well-conditioned.
They are invariant under orthogonal transformations, since the eigenvectors remain unchanged.
- This means the basis in which the data are generated is not taken advantage of.
- It effectively assumes that one can find a good estimator in any basis.
- It is often more appropriate to believe that the basis is somewhat nice, i.e. that the covariance matrix has some structure that one should be able to take advantage of.


An estimator based on shrinkage of eigenvalues

Consider the following optimization problem:
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2, \qquad (1.1)$$
where $S$ is the sample covariance matrix, $\lambda_i(\Sigma)$ is the $i$-th largest eigenvalue of $\Sigma$, $t > 0$ is a suitably chosen positive constant, and the $a_i$'s are shrinkage weights.
The solution to (1.1) is
$$\hat{\Sigma} = (S + t\,\gamma\, U A U^T)(I + \gamma\, U A U^T)^{-1}, \qquad (1.2)$$
where $A = \mathrm{diag}(a_1, a_2, \ldots, a_p)$, $I$ is the identity matrix, and $U$ is a matrix of eigenvectors (see later for the choice of $U$).
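As a rough illustration of the closed form (1.2), the R sketch below computes the estimator for a given S. The function name, the equal weights a_i = 1, and the default choices of gamma and t are my own illustrative assumptions, not the author's code; U is taken from the eigendecomposition of S.

```r
shrink_eigen_estimator <- function(S, gamma = 0.5, t = mean(diag(S)),
                                   a = rep(1, nrow(S))) {
  p <- nrow(S)
  U <- eigen(S, symmetric = TRUE)$vectors   # matrix of eigenvectors (choice of U)
  M <- gamma * U %*% diag(a) %*% t(U)       # gamma * U A U^T
  # Closed form (1.2): (S + t*gamma*U A U^T) (I + gamma*U A U^T)^{-1}
  (S + t * M) %*% solve(diag(p) + M)
}

# Example usage on simulated data:
set.seed(1)
X <- matrix(rnorm(40 * 10), 40, 10)
Sigma_hat <- shrink_eigen_estimator(cov(X))
range(eigen(Sigma_hat, symmetric = TRUE)$values)  # eigenvalues pulled towards t
```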


An estimator based on shrinkage of eigenvalues

Well conditioned:
The estimator in (1.2) is well conditioned:
$$\lambda_{\min}(\hat{\Sigma}) = \lambda_{\min}\!\big(S U (I + \gamma A)^{-1} U^T + t\,\gamma\, U A (I + \gamma A)^{-1} U^T\big) \;>\; t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \;>\; 0$$
for $\min_{i \le p} A_{ii} > 0$, $\gamma > 0$, $t > 0$.
However, the estimator (1.2) is not sparse.


Early work: Estimators based on regularization

We focus on two classes of regularized estimators:
One class relies upon a natural ordering among the variables.
- Includes estimators based on banding and tapering [Bickel and Levina (2008), Cai et al. (2010)].
The other class assumes no natural ordering among the variables.
- In many examples a natural ordering does not make sense, e.g. gene expression data.
- Searching over all permutations is not feasible.
- $\ell_1$-penalized estimators become more appropriate; they yield permutation-invariant and sparse estimates [Rothman et al. (2008), Yuan (2009), Jacob and Tibshirani (2011)].


Early work: Estimators based on regularization

Consider the following optimization problem based on the $\ell_1$ penalty:
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1, \qquad (1.3)$$
where $\lambda$ is a positive constant and the $\ell_1$ penalty $\|\Sigma\|_1$ is applied only to the off-diagonal entries of $\Sigma$.
The minimizer of (1.3) is
$$\hat{\Sigma}_{ii} = S_{ii}, \qquad \hat{\Sigma}_{ij} = \mathrm{sign}(S_{ij}) \max\big(|S_{ij}| - \lambda/2, \, 0\big), \quad i \neq j. \qquad (1.4)$$
A sufficiently large value of $\lambda$ gives a sparse solution.
The solution (1.4) need not be positive definite [Xue et al. (2012)].
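A minimal R sketch of the soft-thresholding solution (1.4); the function name and the example value of lambda are illustrative assumptions.

```r
soft_threshold_cov <- function(S, lambda) {
  Sig <- sign(S) * pmax(abs(S) - lambda / 2, 0)  # element-wise soft threshold
  diag(Sig) <- diag(S)                           # diagonal entries are not penalized
  Sig
}
# Example: Sig_hat <- soft_threshold_cov(S, lambda = 0.1)
```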


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

Consider a joint penalty combining the $\ell_1$ penalty and an eigenvalue penalty. Let $\hat{\Sigma}_{\lambda,\gamma}$ be the solution to the following optimization problem:
$$\hat{\Sigma}_{\lambda,\gamma} = \operatorname*{argmin}_{\Sigma} \; \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2. \qquad (2.1)$$
The estimator in (2.1) shrinks the eigenvalues of the sample covariance matrix towards the constant $t$ with weights $a_i$, and a sufficiently large value of $\lambda$ yields a sparse estimate of the covariance matrix.
The estimator in (2.1) need not be positive definite.


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

The proposed JPEN estimator is given by
$$\hat{\Sigma} = \operatorname*{argmin}_{\Sigma \,:\, (\lambda,\gamma) \in \hat{R}_1^{S,t,A}} \Big[ \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2 \Big], \qquad (2.2)$$
where
$$\hat{R}_1^{S,t,A} = \Big\{ (\lambda, \gamma) : \lambda \asymp \gamma \sqrt{\tfrac{\log p}{n}},\ t > 0,\ \min_{i \le p} A_{ii} > 0,\ \frac{\lambda}{1 + \gamma \max_{i \le p} A_{ii}} \le \frac{\lambda_{\min}(S)}{2} + t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \Big\}.$$


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

The estimator given by (2.2) is positive definite and sparse simultaneously.
Let $p/n \to c < 1$ and let $\hat{\Sigma} = \hat{U} \hat{D} \hat{U}^T$ be the eigenvalue decomposition of $\hat{\Sigma}$; then
$$\lambda_{\min}(\hat{D}) \;\ge\; \frac{t\,\gamma \min_{i \le p} A_{ii} - \lambda/2}{1 + \gamma \min_{i \le p} A_{ii}} \;+\; \frac{\lambda_{\min}(S)}{1 + \gamma \max_{i \le p} A_{ii}}.$$
For $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$, $\lambda_{\min}(S)$ stays bounded away from zero in probability [Bai and Yin (1993)]. Therefore $\lambda_{\min}(\hat{D}) > 0$ as $n = n(p) \to \infty$.


How to estimate a sparse and well conditioned $\Sigma$ simultaneously? A Joint Penalty (JPEN) approach

What can we say about the existence of the random set $\hat{R}_1^{S,t,A}$?
The set $\hat{R}_1^{S,t,A}$ is asymptotically nonempty.
For finite samples with $p > n$ we have $\lambda_{\min}(S) = 0$, but
$$\lambda_{\min}(\hat{D}) \;\ge\; \frac{t\,\gamma \min_{i \le p} A_{ii}}{1 + \gamma \min_{i \le p} A_{ii}} - \frac{\lambda}{2} \;>\; 0$$
for sufficiently large $\gamma$, $t$ and $\min_{i \le p} A_{ii} > 0$. This guarantees that the set $\hat{R}_1^{S,t,A}$ is nonempty for finite samples.


Theoretical consistency of the JPEN estimator

Assumptions about the true covariance matrix $\Sigma_0$:
A0. $X := (X_1, X_2, \ldots, X_p)$ is a mean-zero random vector with covariance matrix $\Sigma_0$.
A1. With $E = \{(i,j) : \Sigma_{0ij} \neq 0,\ i \neq j\}$, the cardinality $\mathrm{card}(E) \le s$ for some positive integer $s$.
A2. There exists a finite positive real number $\bar{k} > 0$ such that $1/\bar{k} \le \lambda_{\min}(\Sigma_0) \le \lambda_{\max}(\Sigma_0) \le \bar{k}$, where $\lambda_{\min}(\Sigma_0)$ and $\lambda_{\max}(\Sigma_0)$ are the minimum and maximum eigenvalues of $\Sigma_0$, respectively.


Theoretical consistency of the JPEN estimator

Theorem (Consistency in Frobenius norm)
Let $\hat{\Sigma}$ be the minimizer defined in (2.2). Under Assumptions A0, A1, A2, for $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$ and $\lambda_{\min}(\Sigma_0) \le t \le \lambda_{\max}(\Sigma_0)$, we have
$$\|\hat{\Sigma} - \Sigma_0\|_F = O_P\!\left( \sqrt{\frac{(p+s)\log p}{n}} \right). \qquad (2.3)$$
Here the worst part of the rate of convergence comes from estimating the diagonal entries. For correlation matrix estimation the rate improves to $O_P\big(\sqrt{s \log p / n}\big)$.


Theoretical consistency of the JPEN estimator

Let $\Sigma_0 = W \Gamma_0 W$ be the variance-correlation decomposition of $\Sigma_0$, where $\Gamma_0$ is the true correlation matrix and $W$ is the diagonal matrix of true standard deviations. Define the JPEN estimator of $\Gamma_0$:
$$\hat{K} = \operatorname*{argmin}_{K \,:\, (\lambda,\gamma) \in \hat{R}_{1a}} \Big\{ \|K - \hat{\Gamma}\|_F^2 + \lambda \|K\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(K) - t\}^2 \Big\}, \qquad (2.4)$$
where $\hat{\Gamma}$ is the sample counterpart of $\Gamma_0$ and $\hat{R}_{1a}$ is given by
$$\hat{R}_{1a} = \Big\{ (\lambda, \gamma) : \lambda \asymp \gamma \sqrt{\tfrac{\log p}{n}},\ t > 0,\ \frac{\lambda}{1 + \gamma \max_{i \le p} A_{ii}} \le \frac{\lambda_{\min}(\hat{\Gamma})}{2} + t\,\gamma \min_{i \le p} \frac{A_{ii}}{1 + \gamma A_{ii}} \Big\}. \qquad (2.5)$$


Theoretical consistency of the JPEN estimator

Let $\hat{\Sigma}_K = \hat{W} \hat{K} \hat{W}$ be the correlation-matrix-based estimate of the covariance matrix, where $\hat{W}$ is the diagonal matrix of sample standard deviation estimates. The following corollary establishes convergence of $\hat{\Sigma}_K$ in operator norm:

Theorem
Under Assumptions A0, A1, A2 and for $(\lambda, \gamma) \in \hat{R}_{1a}$,
$$\|\hat{\Sigma}_K - \Sigma_0\| = O_P\!\left( \sqrt{\frac{(s+1)\log p}{n}} \right). \qquad (2.6)$$
Note that $\|\hat{\Sigma}_K - \Sigma_0\|_F \le \sqrt{p}\,\|\hat{\Sigma}_K - \Sigma_0\|$, so the rate of convergence in Frobenius norm of the correlation-matrix-based estimator is the same as that of the estimator defined in (2.2).


Theoretical consistency of the JPEN estimator

Remarks:
i) For $s = O(\log p)$, $\|\hat{\Sigma}_K - \Sigma_0\| = O_P\big(\log p / \sqrt{n}\big)$. This rate of operator-norm convergence is the same as that of the banded estimator proposed by Bickel and Levina (2008).
ii) Rothman (2012) proposed an estimator of the covariance matrix based on a similar loss function. The choice of a different penalty function yields a very different estimate. Moreover, although his estimator yields a positive definite covariance matrix, it is hard to verify whether his algorithm attains the optimal solution of the underlying optimization problem.


An algorithm

Let $f(\Sigma)$ be the objective function of (2.2). Then
$$f(\Sigma) = \|\Sigma - S\|_F^2 + \lambda \|\Sigma\|_1 + \gamma \sum_{i=1}^{p} a_i \{\lambda_i(\Sigma) - t\}^2. \qquad (3.1)$$
Let $\Sigma = U D U^T$ be the eigenvalue decomposition of $\Sigma$, and let $f_1(\Sigma)$ be the same as $f(\Sigma)$ but excluding the terms not depending upon $\Sigma$. We have
$$f_1(\Sigma) = \mathrm{tr}\big\{ (\Sigma^2 - 2\,B C^{-1} \Sigma)\, C \big\} + \lambda \|\Sigma\|_1 \;\le\; \big(1 + \gamma \max_{i \le p} A_{ii}\big)\, \|\Sigma - B C^{-1}\|_F^2 + \lambda \|\Sigma\|_1 \;=:\; f_2(\Sigma),$$
where $I$ is the identity matrix, $C = I + \gamma\, U A U^T$, and $B = S + t\,\gamma\, U A U^T$.

An algorithm

Note that $f_2(\Sigma)$ is convex in $\Sigma$; therefore a unique minimizer exists.
An optimal solution to (3.1) is given by
$$\hat{\Sigma}_{ii} = (B C^{-1})_{ii}, \qquad \hat{\Sigma}_{ij} = \mathrm{sign}\big((B C^{-1})_{ij}\big)\, \max\!\Big( |(B C^{-1})_{ij}| - \frac{\lambda}{2\,(1 + \gamma \max_{i \le p} A_{ii})},\ 0 \Big), \quad i \ne j, \qquad (3.2)$$
where $\mathrm{sign}(x)$ is the sign of $x$ and $|x|$ is the absolute value of $x$.
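A rough R sketch of the closed-form update (3.2), using the choice of U described on the next slide (eigenvectors of S + eps*I) and equal weights a_i = 1. The function name, default values, and the symmetrization step are illustrative assumptions, not the author's implementation.

```r
jpen_update <- function(S, lambda, gamma, t = mean(diag(S)),
                        a = rep(1, nrow(S)), eps = 1e-3) {
  p <- nrow(S)
  U <- eigen(S + eps * diag(p), symmetric = TRUE)$vectors  # choice of U
  M <- gamma * U %*% diag(a) %*% t(U)      # gamma * U A U^T
  B <- S + t * M                           # B = S + t*gamma*U A U^T
  C <- diag(p) + M                         # C = I + gamma*U A U^T
  Theta <- B %*% solve(C)                  # B C^{-1}
  Theta <- (Theta + t(Theta)) / 2          # symmetrize against numerical error
  thr <- lambda / (2 * (1 + gamma * max(a)))
  Sig <- sign(Theta) * pmax(abs(Theta) - thr, 0)  # soft-threshold off-diagonals
  diag(Sig) <- diag(Theta)                        # diagonal kept as (B C^{-1})_ii
  Sig
}
```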


An algorithm: choices of U, λ, and γ

Choice of U:
Note that U is the matrix of eigenvectors of $\Sigma$, which is unknown. In practice one can choose U as the matrix of eigenvectors from the eigenvalue decomposition of $S + \epsilon I$ for some $\epsilon > 0$; i.e., let $S + \epsilon I = U_1 D_1 U_1^T$ and take $U = U_1$.
Choice of $\lambda$ and $\gamma$:
For a given value of $\gamma > 0$, we can choose $\lambda$ satisfying
$$\lambda \;<\; 2\,\big(1 + \gamma \min_{i \le p} A_{ii}\big) \Big\{ \frac{\lambda_{\min}(S)}{1 + \gamma \max_{i \le p} A_{ii}} + 2\,\gamma\, t \min_{i \le p} A_{ii} \Big\},$$
and such a choice of $\lambda$ guarantees that the minimum eigenvalue of the estimate is bounded below by a positive constant and that $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$.


Computational time

We compare the computational time of our algorithm with the Glasso and PDSCE [Rothman (2012)] algorithms.

Figure: Timing comparison of JPEN, graphical lasso (Glasso), and PDSCE on a log-log scale.


Computational time

We compare the computational timing of our algorithm to some other existing algorithms: graphical lasso [Friedman et al. (2008)] and PDSCE [Rothman (2011)].
The exact timing of these algorithms also depends upon the implementation, platform, etc. (we did our computations in R 3.1 on an AMD 2.8 GHz processor).
Unlike the Glasso algorithm, whose computational time depends upon how dense the true covariance matrix is, our algorithm takes roughly the same amount of time for both dense and sparse covariance matrices.
Although the proposed method requires optimization over a grid of values of $(\lambda, \gamma) \in \hat{R}_1^{S,t,A}$, our algorithm is computationally efficient and easily scalable to large-scale data analysis problems.


Performance comparison with some other methods

We generate random vectors from the multivariate t-distribution with 5 degrees of freedom for varying sample sizes and dimensions. We chose $n = 50, 100, 200$ and $p = 500, 1000$.
For each estimate of the covariance and inverse covariance matrix, we calculate the average relative error (ARE) based on 50 simulations using the formula
$$\mathrm{RE}(\Sigma_0, \hat{\Sigma}) = \big| \log f(S, \hat{\Sigma}) - \log f(S, \Sigma_0) \big| \,/\, \big| \log f(S, \Sigma_0) \big|,$$
where $f(S, \Sigma)$ is the density of the multivariate normal distribution, $S$ is the sample covariance matrix, $\Sigma_0$ is the true covariance matrix, and $\hat{\Sigma}$ is the estimate of $\Sigma_0$.
Another choice of performance criterion is the Kullback-Leibler divergence [Yuan and Lin (2007), Bickel and Levina (2008)].
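A minimal R sketch of this criterion using the multivariate normal log-likelihood (up to an additive constant) as log f; the function names and the n/2 scaling are illustrative assumptions.

```r
gauss_loglik <- function(S, Sigma, n) {
  # log f(S, Sigma) up to a constant: -(n/2) { log|Sigma| + tr(Sigma^{-1} S) }
  -(n / 2) * (as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
                sum(diag(solve(Sigma, S))))
}

relative_error <- function(Sigma0, Sigma_hat, S, n) {
  abs(gauss_loglik(S, Sigma_hat, n) - gauss_loglik(S, Sigma0, n)) /
    abs(gauss_loglik(S, Sigma0, n))
}
```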


Simulation

We generate random vectors from the multivariate t-distribution with mean vector zero, 5 degrees of freedom, and various structured covariance matrices.
(i) Hub Graph: the rows/columns of $\Sigma_0$ are partitioned into $J$ equally-sized disjoint groups $\{V_1 \cup V_2 \cup \cdots \cup V_J\} = \{1, 2, \ldots, p\}$, and each group is associated with a pivotal row $k$. Let the group size be $|V_1| = s$. We set $\Sigma_{0i,j} = \Sigma_{0j,i} = \rho$ for $i \in V_k$ and $\Sigma_{0i,j} = \Sigma_{0j,i} = 0$ otherwise. In our experiment $J = [p/s]$, $k = 1, s+1, 2s+1, \ldots$, and we always take $\rho = 1/(s+1)$ with $J = 20$. A generating sketch is given below.
(ii) Neighborhood Graph: we first uniformly sample $(y_1, y_2, \ldots, y_n)$ from the unit square. We then set $\Sigma_{0i,j} = \Sigma_{0j,i} = \rho$ with probability $(\sqrt{2\pi})^{-1} \exp(-4\|y_i - y_j\|^2)$. The remaining entries of $\Sigma_0$ are set to zero. We always take $\rho$ to be 0.245.
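A small R sketch that builds the hub-type $\Sigma_0$ described in (i); the unit diagonal, the function name, and the example dimensions are illustrative assumptions.

```r
hub_cov <- function(p, s) {
  rho <- 1 / (s + 1)
  Sigma <- diag(p)
  pivots <- seq(1, p, by = s)              # pivotal rows k = 1, s+1, 2s+1, ...
  for (k in pivots) {
    members <- k:min(k + s - 1, p)         # group V_k of size s
    Sigma[k, members] <- rho               # Sigma_{0 i,j} = rho for i in V_k
    Sigma[members, k] <- rho
  }
  diag(Sigma) <- 1
  Sigma
}
Sigma0 <- hub_cov(p = 100, s = 5)          # J = p/s = 20 groups
```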


Recovery of the eigen-spectrum of the true covariance matrix

Figure: Eigenvalue plot for p = 50 based on 50 realizations.


Tuning parameter selection

For each estimate, the optimal tuning parameters were obtained by minimizing the empirical loss function
$$\|\hat{\Sigma} - S_{\mathrm{robust}}\|_F, \qquad (4.1)$$
where $\hat{\Sigma}$ is an estimate of the covariance matrix and $S_{\mathrm{robust}}$ is an estimate of the sample covariance matrix based on 5000 sample observations (refer to Section 5 of the manuscript for a detailed discussion).
Simulations show that the optimal values of the tuning parameters $(\lambda, \gamma)$ selected by criterion (4.1) are similar if we replace $S_{\mathrm{robust}}$ by the true covariance matrix.
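A schematic R sketch of this grid search, reusing the jpen_update() sketch shown earlier; the grid values and the S_robust argument are illustrative assumptions.

```r
select_tuning <- function(S, S_robust, lambdas, gammas) {
  best <- list(loss = Inf)
  for (lam in lambdas) {
    for (gam in gammas) {
      Sig  <- jpen_update(S, lambda = lam, gamma = gam)
      loss <- norm(Sig - S_robust, type = "F")      # criterion (4.1)
      if (loss < best$loss) best <- list(lambda = lam, gamma = gam, loss = loss)
    }
  }
  best
}
```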


Recovery of the sparse structure of the true covariance matrix

Figure: Heatmap of zeros identified in the covariance matrix out of 50 realizations. White means 50/50 zeros identified; black means 0/50 zeros identified.


Average relative error and standard deviations

Table: Hub type of covariance matrix

p = 500
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    19.5 (7.083)   382 (23.08)    21.3 (2.845)   19.5 (7.072)
  100   36.5 (19.19)   422 (54.06)    22.9 (7.590)   18.9 (4.077)
  200   21.9 (3.558)   340 (48.35)    24.9 (5.753)   18.3 (4.663)

p = 1000
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    64.5 (6.19)    470 (49.8)     57.7 (10.93)   50.6 (4.8)
  100   92.9 (11.3)    559 (72.9)     89.4 (12.65)   57.8 (4.76)
  200   106 (13.5)     463 (59.7)     97.9 (17.34)   61.9 (4.93)


Average relative error and standard deviations

Table: Neighborhood type of covariance matrix

p = 500
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    0.47 (0.004)   13.4 (0.198)   0.48 (0.003)   0.47 (0.004)
  100   0.48 (0.002)   10.8 (0.090)   0.48 (0.002)   0.34 (0.032)
  200   0.34 (0.002)   7.97 (0.070)   0.34 (0.001)   0.28 (0.005)

p = 1000
  n     Ledoit-Wolf    Glasso         PDSCE          JPEN
  50    0.29 (0.003)   18.2 (0.2)     0.30 (0.003)   0.28 (0.003)
  100   0.27 (0.002)   11.9 (0.131)   0.27 (0.002)   0.26 (0.002)
  200   0.26 (0.001)   9.21 (0.179)   0.27 (0.001)   0.25 (0.001)


Simulation

The average relative errors and their standard deviations are given in the tables above (please refer to the manuscript for detailed simulations). The numbers in brackets are the standard error estimates of the relative error.
The JPEN estimate of the covariance matrix outperforms the other methods for all values of p and n and for all types of covariance matrices considered.
Among all the methods, the PDSCE estimates are closest to the JPEN estimates in terms of ARE.
The Ledoit-Wolf estimate performs well in terms of ARE, but the estimated covariance matrix is not sparse.


Analysis of Colon Tumor Tissue Data

In this experiment, colon adenocarcinoma tissue samples were collected, 40 of which were tumor tissues and 22 non-tumor tissues. Tissue samples were analyzed using an Affymetrix oligonucleotide array.
The data were processed, filtered, and reduced to a subset of 2,000 gene expression values with the largest minimal intensity over the 62 tissue samples (source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html).
We obtain estimates of the covariance matrix for p = 50, 100, 200 and then use LDA to classify these tissues as either tumorous or non-tumorous (normal).


Analysis of Colon Tumor Tissue Data

We classify each test observation $x$ to either class $k = 0$ (tumorous) or $k = 1$ (normal) using the LDA rule
$$\hat{\delta}_k(x) = \operatorname*{arg\,max}_{k} \Big\{ x^T \hat{\Omega}\, \hat{\mu}_k - \frac{1}{2} \hat{\mu}_k^T \hat{\Omega}\, \hat{\mu}_k + \log(\hat{\pi}_k) \Big\},$$
where $\hat{\pi}_k$ is the proportion of class-$k$ observations in the training data, $\hat{\mu}_k$ is the sample mean for class $k$ on the training data, and $\hat{\Omega} := \hat{\Sigma}^{-1}$ is an estimator of the inverse of the common covariance matrix on the training data, computed by one of the methods under consideration.
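A minimal R sketch of this LDA rule with a plug-in inverse covariance estimate Omega (for example, the inverse of a JPEN estimate); the function and argument names are illustrative assumptions.

```r
lda_predict <- function(x, X_train, y_train, Omega) {
  classes <- sort(unique(y_train))
  scores <- sapply(classes, function(k) {
    mu_k <- colMeans(X_train[y_train == k, , drop = FALSE])  # class mean
    pi_k <- mean(y_train == k)                               # class proportion
    sum(x * (Omega %*% mu_k)) - 0.5 * sum(mu_k * (Omega %*% mu_k)) + log(pi_k)
  })
  classes[which.max(scores)]
}
```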


Analysis of Colon Tumor Tissue Data

Table: Averages and standard errors of classification errors (in %) over 100 replications.

  Method                p=50           p=100          p=200
  Logistic Regression   21.0 (0.84)    19.31 (0.89)   21.5 (0.85)
  SVM                   16.70 (0.85)   16.76 (0.97)   18.18 (0.96)
  Naive Bayes           13.3 (0.75)    14.33 (0.85)   14.63 (0.75)
  Graphical Lasso       10.9 (1.3)     9.4 (0.89)     9.8 (0.90)
  Joint Penalty         9.9 (0.98)     8.9 (0.93)     8.2 (0.81)

Among all the methods, the covariance-matrix-based LDA classifiers perform far better than the other methods.
When more genes are added to the data set, the classification performance of the JPEN-based LDA classifier improves, whereas for the other methods the classification performance deteriorates.


Summary

A joint penalty (JPEN) estimate of the covariance matrix is proposed. The estimator is both well-conditioned and sparse simultaneously. The proposed approach allows one to take advantage of any prior structure, if known, on the eigenvalues of the true covariance matrix.
The theoretical consistency of the JPEN estimator is established in both Frobenius and operator norm, which guarantees consistency for the principal components; hence we expect PCA to be one of the most important applications of the method.
The proposed algorithm is very fast, efficient, and easily scalable to large-scale optimization problems.


Estimation of the inverse covariance matrix

We also propose a JPEN estimator of the inverse covariance matrix and establish a similar rate of convergence for it.
The JPEN inverse covariance estimate performs better than some other methods for varying sample sizes and dimensions.
Please refer to the manuscript for a detailed discussion.


Acknowledgment
I would like to express my deep gratitude to Professor Hira L. Koul
for his valuable and constructive suggestions during the planning
and development of this research work.


Thank you for your attention!


For references and other details, I can be reached at
mauryaas@msu.edu.
