
A Simplified Exposition of Sparsity Inducing

Penalty Functions for Denoising


Shivkaran Singh and Sachin Kumar S and Soman K P

Abstract This paper attempts to provide a pedagogical view of the non-convex regularization approach to denoising developed by Ankit Parekh et al. The present paper proposes a simplified signal denoising approach that explicitly uses the sub-band matrices of the decimated wavelet transform matrix. The objective function involves convex and non-convex terms, in which the convexity of the overall function is preserved by a parameterized non-convex term. The solution to this convex optimization problem is obtained with the Majorization-Minimization (M-M) iterative algorithm. For experimentation, different wavelet filters such as Daubechies, Coiflets and reverse biorthogonal were used.

1 Introduction
Signal denoising is the task of estimating an original signal x ∈ R^n from its noisy observation z ∈ R^n, represented as

z = x + n,                                                        (1)

where n represents Additive White Gaussian Noise (AWGN) with standard deviation σ. Sparsity-based solutions have a clear advantage over other solutions because of their low space and time complexity. A successful method of convex denoising using tight-frame regularization was derived in [1]. This paper aims to provide a pedagogical explanation of the theory proposed in [1]. In this work, a denoising method using Non-Convex Penalty (NCP) functions and the decimated wavelet transform matrix W [2] is proposed. The wavelet transform matrix W contains sub-band wavelet
Shivkaran Singh Sachin Kumar S Soman K P
Center for Computational Engineering and Networking (CEN)
Amrita School of Engineering, Coimbatore
Amrita Vishwa Vidyapeetham, Amrita University, India 641112
e-mail: shvkrn.s@email.address

matrices corresponding to different levels of transformation. The problem formulation is derived using W because it is less complicated and easier to manipulate. The problem formulation for signal denoising using W is given as
argmin_x { F(x) := (1/2)||z - x||²₂ + λ₁ Σ_{i₁} φ([W₁ x]_{i₁}; c₁) + ... + λ_{L+1} Σ_{i_{L+1}} φ([W_{L+1} x]_{i_{L+1}}; c_{L+1}) }     (2)

where λ₁, λ₂, ..., λ_{L+1} are the regularization parameters corresponding to the different levels of transformation (λ > 0 always), W₁, W₂, ..., W_{L+1} are the sub-band matrices of W corresponding to the different levels of transformation, φ(·; c) : R → R is a non-convex sparsity-inducing function parameterized by c, and L represents the number of transformation levels of W. Section 3 provides more clarity about (2) and the different sub-band matrices of W. The parameter c is used to adjust the convexity of the overall objective function F(x). The formulation comprises convex and non-convex (non-smooth) terms. For sparse processing of the signal, the penalty function φ(x; c) should be chosen such that it improves the sparsity of x [3]. In this work, NCP functions such as the logarithmic, arctangent and rational penalty functions were used. The use of NCP functions leads to better sparsity. Among the mentioned NCP functions, the arctangent penalty function delivered the best results, as it converges most rapidly to the identity function; see Fig. 3 in [4].
To solve the denoising problem, the Majorization-Minimization (M-M) algorithm was used instead of the Alternating Direction Method of Multipliers (ADMM) [1], which provided a computational advantage and an improved Root Mean Squared Error (RMSE) value (see Fig. 3). Further, we tried filters associated with different wavelet families, namely Biorthogonal, Coiflets, Daubechies, Symlets and Reverse Biorthogonal wavelets. Reverse biorthogonal filters provided the most competitive results. The entire experimentation was performed using MATLAB. In Section 2, the properties of sparsity-inducing non-smooth penalty functions are accentuated, and the M-M class of algorithms [5] is described, highlighting majorization-minimization. In Section 3, the notation involved is first addressed and then the M-M algorithm used in our formulation is explained. In Section 4, a 1-D example and the corresponding experimental results are provided.

2 Methodology
2.1 Sparsity Inducing Functions
The convex proxy for sparsity, i.e. the ℓ₁ norm, has a special importance in sparse signal processing. Nevertheless, sparsity-inducing NCP functions provide better estimates in several signal estimation problems. In the formulation (2), the parameter c ensures that the addition of a non-smooth penalty function does not alter the overall convexity of the cost function F. The parameter c can be chosen to make the NCP function maximally non-convex [4]. In order to maintain convexity of the cost function F, the range derived in [3] is utilized, given by

0 < c_j < 1/λ_j                                                   (3)

where j = 0, 1, 2, ..., L corresponds to the different levels of transformation. If the parameter c fails to comply with (3), then a globally optimal solution (minimum) cannot be ensured; that is, the geometry of the NCP function affects the solution of the overall cost function. We will see in Section 4 that c_j = 1/λ_j provided competitive results. Along with the aforementioned condition, any non-convex penalty function φ(·; c) : R → R must adhere to the following conditions [1] to provide a global minimum:
1. φ is a continuous function on R
2. φ is twice differentiable on R \ {0}
3. The slope of φ(x) is unity at the immediate right of the origin and negative unity at the immediate left: φ′(0+) = 1 and φ′(0−) = −1
4. For x > 0, φ′(x) decreases from 1 to 0, and for x < 0, φ′(x) increases from −1 to 0 (refer to Fig. 1)
5. φ(x) is symmetric, i.e. the function is unchanged by a sign change of the variable x: φ(x; c) = φ(−x; c)
6. The ℓ₁ norm can be retrieved as a special case of φ(x; c), with c = 0
7. The greatest lower bound of φ″(x) is −c

Fig. 1: Logarithmic NCP function
Examples of several NCP functions used in the experimentation are listed in Table 1. An NCP function is not differentiable at the origin because of the discontinuity of φ′(x) at x = 0, which is illustrated in Fig. 1. In Fig. 1, the variation of the function with different values of the parameter c can be observed. It turns out that, mathematically, we can generalize the derivative of a non-differentiable function using the sub-gradient. In convex analysis, sub-gradients are used when a convex function is not differentiable [6].

Table 1: Examples of NCP Functions

NC Penalty Function   Expression
Rational              φ(x) = |x| / (1 + c|x|/2)
Logarithmic           φ(x) = (1/c) log(1 + c|x|)
Arctangent            φ(x) = (2/(c√3)) [tan⁻¹((1 + 2c|x|)/√3) − π/6]

2.2 M-M algorithm


The M-M algorithm is an iterative algorithm which operates by generating a proxy function that majorizes or minorizes the cost function. If the intention is to minimize the cost function, then M-M reads majorize-minimize; if the intention is to maximize it, M-M reads minorize-maximize [7]. Any algorithm based on this fashion of iteration is called an M-M algorithm. The M-M algorithm can gracefully simplify a difficult optimization problem by (1) linearizing the given optimization problem, (2) separating the variables in the optimization problem, or (3) turning a non-smooth problem into a smooth problem [5]. It turns out that sometimes (not always) we pay the price of simplifying the problem with a slower convergence rate due to the additional iterations. In statistics, Expectation-Maximization (E-M) is a special form of the M-M algorithm; it is a widely applicable approach for Maximum Likelihood Estimation (MLE) [8]. The M-M algorithm is usually easier to grasp than the E-M algorithm. In this paper, we used the M-M algorithm

for minimizing the cost function; hence it reads majorize-minimize. The explanation below considers the case of minimizing the cost function. The initial part of the M-M algorithm's implementation is to define the majorizer G_m(x), m = 0, 1, 2, ..., where m denotes the current iteration. The idea behind using a majorizer is that it is easier to minimize than F(x) (G_m(x) must be a convex function). A function G_m(x) is called a majorizer of another function F(x) at x = x_m iff
F(x) ≤ G_m(x) for all x                                           (4)

F(x) = G_m(x) at x = x_m                                          (5)

In other words, (4) signifies that the surface of G_m(x) always lies above the surface of F(x), and (5) signifies that G_m(x) is tangent to F(x) at the point x = x_m. The basic intuition behind equations (4) and (5) can be developed from Fig. 2. In the M-M algorithm, in each iteration we must first find a majorizer that is an upper bound and then minimize it, hence the name Majorization-Minimization. If x_{m+1} is the minimum of the current proxy function G_m(x), the M-M algorithm drives the cost function F(x) downwards. It is demonstrated in [5] that the numerical stability of the M-M approach depends on decreasing the proxy function rather than fully minimizing it. The M-M approach works in a similar way for multi-dimensional functions, where it is even more effective.
As a general practice, quadratic functions are preferred as majorizers because setting the derivative to zero yields a linear system. A polynomial of higher order could also be used; however, it would make the solution difficult (a non-linear system). To minimize F(x) using the M-M approach, we can majorize either (1/2)||y - x||² [9] or the NCP function [10], or both. In our experimentation, we majorized the NCP function.
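To make the recipe concrete before applying it to (2), consider a minimal scalar example (our own toy sketch in Python, not from the paper): minimizing F(x) = ½(z − x)² + λ|x| by majorizing |x| with the quadratic x²/(2|s|) + |s|/2, which touches |x| at x = ±s. Each surrogate minimization is then a closed-form division, and the iterates converge to the soft-threshold solution sign(z)·max(|z| − λ, 0):

```python
def mm_l1_denoise(z, lam, iters=100):
    """Scalar M-M: majorize |x| at s = x_m by x**2/(2|s|) + |s|/2,
    then minimize 0.5*(z - x)**2 + lam*x**2/(2|s|) in closed form."""
    x = z  # x_0 = z
    for _ in range(iters):
        if x == 0.0:
            break  # the majorizer is undefined at 0; already at a minimizer
        x = z * abs(x) / (abs(x) + lam)  # argmin of the quadratic surrogate
    return x
```

For z = 3 and λ = 1 the iterates 3, 2.25, 2.077, ... approach 2 = sign(z)(|z| − λ), illustrating the (linear-rate) convergence trade-off mentioned above.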

3 Problem Formulation
In this paper the importance of NCP functions for 1-D signal denoising is addressed, as NCP functions promote sparsity. The NCP function should be chosen to ensure

Fig. 2: A majorizing function for a piecewise-defined linear function

the convexity of the overall cost function. A benefit of this approach is that we can arrive at the solution using convex optimization methods.

3.1 Notations
The signal x to be estimated is represented by an N-point vector

x = [x₁, x₂, x₃, ..., x_N]ᵀ

The NCP function φ(x; c) used in our experimentation, parameterized by c, is the arctangent penalty [4]:

φ(x) = (2/(c√3)) [tan⁻¹((1 + 2c|x|)/√3) − π/6]
The wavelet transform matrix is represented by a square matrix W [2]. The MATLAB syntax to generate the wavelet matrix is W = wavmat(N, F1, F2, L), where N denotes the length of the signal, F1 and F2 are the decomposition/reconstruction filters, which can be obtained for different wavelets by the MATLAB function wfilters('wavelet_name'), and L denotes the transformation level (we used L = 4).

3.2 Sub-band matrices of wavelet transform matrix


The structure of a wavelet transform matrix W can be interpreted as

    [ W₁     ]
W = [ W₂     ]
    [ ⋮      ]
    [ W_{L+1} ]

where W₁, W₂, ..., W_{L+1} are the sub-band matrices of W with dimensions corresponding to the level of transformation used. For example, let the signal length be N = 64 and the level of transformation be 4 (L = 4); then the structure of the wavelet transform matrix is

[W1 ]464
[W2 ]

464

W = [W3 ]864

[W4 ]1664
[W5 ]3264 6464
where [W1 ]464 , [W2 ]464 ... are the sub-band matrices corresponding to different
levels of transformation.
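Since wavmat is a MATLAB function, the block structure above can be illustrated with a self-built multi-level wavelet transform matrix in NumPy (a sketch under simplifying assumptions: the Haar filter stands in for the paper's longer filters, and haar_matrix is our own helper):

```python
import numpy as np

def haar_matrix(n, levels):
    """L-level orthonormal Haar analysis matrix: approximation block W1 on top,
    then detail blocks from coarsest (W2) down to finest (W_{L+1})."""
    if levels == 0:
        return np.eye(n)
    H = np.zeros((n, n))                                 # one-level analysis matrix
    for i in range(n // 2):
        H[i, 2*i] = H[i, 2*i + 1] = 1 / np.sqrt(2)       # lowpass rows
        H[n//2 + i, 2*i] = 1 / np.sqrt(2)                # highpass rows
        H[n//2 + i, 2*i + 1] = -1 / np.sqrt(2)
    top = haar_matrix(n // 2, levels - 1) @ H[:n//2, :]  # recurse on the lowpass part
    return np.vstack([top, H[n//2:, :]])

W = haar_matrix(64, 4)
# split W into the sub-band matrices W1..W5 (4, 4, 8, 16 and 32 rows)
subbands = np.split(W, np.cumsum([4, 4, 8, 16]), axis=0)
```

The sub-band row counts 4, 4, 8, 16, 32 match the N = 64, L = 4 example above, and W is orthonormal.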

3.3 Algorithm
Consider the cost function in (2). The M-M algorithm generates a sequence of simpler optimization problems

x_{m+1} = argmin_x G_m(x)                                         (6)

i.e. in each iteration we solve a simpler convex optimization problem. The expression in (6) is used to update x_m in each iteration, with x₀ = z as the initialization. Each iteration of the M-M algorithm has a different majorizer, which must be an upper bound for the NCP function, i.e. it should satisfy (4) & (5). As mentioned in Section 2, we used a second-order polynomial as the majorizer, for a simpler solution to our optimization problem.
As mentioned in [11], the scalar-case majorizer for the NCP function is given by

g(x; s) = (φ′(s)/(2s)) x² + φ(s) − (s/2) φ′(s)                    (7)

It can be written as

g(x; s) = (φ′(s)/(2s)) x² + κ                                     (8)

where κ is

κ = φ(s) − (s/2) φ′(s)                                            (9)

The term κ in (7) is merely a constant, which can be ignored when solving the optimization problem.
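Conditions (4) and (5) for the scalar majorizer (7) can be checked numerically for the arctangent penalty (a Python sketch; phi, dphi and majorizer are our names, and the closed-form derivative φ′(x) = sign(x)/(1 + c|x| + c²x²) follows from differentiating the Table 1 expression):

```python
import numpy as np

def phi(x, c):
    # arctangent NCP function from Table 1
    return 2 / (c * np.sqrt(3)) * (np.arctan((1 + 2 * c * np.abs(x)) / np.sqrt(3)) - np.pi / 6)

def dphi(x, c):
    # its derivative away from the origin: phi'(x) = sign(x) / (1 + c|x| + c^2 x^2)
    return np.sign(x) / (1 + c * np.abs(x) + (c * np.abs(x)) ** 2)

def majorizer(x, s, c):
    # equation (7): g(x; s) = phi'(s)/(2s) x^2 + phi(s) - (s/2) phi'(s)
    return dphi(s, c) / (2 * s) * x ** 2 + phi(s, c) - s * dphi(s, c) / 2
```

On a dense grid, g(x; s) stays above φ(x) everywhere and touches it at x = s, exactly as (4) and (5) require.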
Equivalently, the corresponding vector case can be derived as

g([Wx], [Ws]) = [Wx]₁ᵀ (φ′([Ws]₁)/(2[Ws]₁)) [Wx]₁ + ... + [Wx]ₙᵀ (φ′([Ws]ₙ)/(2[Ws]ₙ)) [Wx]ₙ + κ

where W is the wavelet matrix, [Wx]ₙ is the n-th component of the vector Wx, and [Ws]ₙ is the n-th component of the vector Ws. The above equation can also be written as

g([Wx], [Ws]) = Σᵢ₌₁ⁿ [Wx]ᵢᵀ (φ′([Ws]ᵢ)/(2[Ws]ᵢ)) [Wx]ᵢ + κ       (10)

The more compact form of (10) is given by

g([Wx], [Ws]) = (1/2)(Wx)ᵀ Λ (Wx) + κ                             (11)

where

Λ = diag( φ′([Ws]₁)/[Ws]₁ , ... , φ′([Ws]ₙ)/[Ws]ₙ )

Therefore, using (4) we can say that

g([Wx], [Ws]) ≥ φ([Wx])                                           (12)

Hence g([Wx], [Ws]) is a majorizer of φ([Wx]). Therefore, using (11) & (12), we can directly give a majorizer for (2) by

G(x, s) = (1/2)||z - x||² + (1/2)(Wx)ᵀ Λ (Wx) + κ                 (13)

To avoid any further confusion, Λ will absorb all the different λs. Finally, it appears as

Λ = diag( λ₁ φ′([Ws]₁)/[Ws]₁ , ... , λ_{L+1} φ′([Ws]ₙ)/[Ws]ₙ )    (14)

and the expression in (13) becomes

G(x, s) = (1/2)||z - x||² + (1/2)(Wx)ᵀ Λ (Wx) + κ                 (15)

Using the M-M algorithm, x_{m+1} can be calculated by minimizing G_m as in (16):

x_{m+1} = argmin_x G_m(x, s)                                      (16)

x_{m+1} = argmin_x (1/2)||z - x||² + (1/2)(Wx)ᵀ Λ (Wx) + κ        (17)

x_{m+1} = argmin_x (1/2)||z - x||² + (1/2) xᵀ Wᵀ Λ W x + κ        (18)

Minimizing (18), we explicitly arrive at

x_{m+1} = (I + Wᵀ Λ W)⁻¹ z                                        (19)

The only problem with the update equation arises when a term [Ws]ₙ goes to zero: the corresponding entry of Λ becomes infinite, and expression (19) may therefore become unstable. To avoid this, the Woodbury matrix identity (more commonly called the matrix inversion lemma) [12] can be used, which is given in the form

(A + XBY)⁻¹ = A⁻¹ − A⁻¹X(B⁻¹ + YA⁻¹X)⁻¹YA⁻¹                       (20)

Using (20), our update equation (19) becomes

x_{m+1} = z − Wᵀ (Λ⁻¹ + WWᵀ)⁻¹ W z                                (21)

Clearly, (21) involves Λ⁻¹ instead of Λ. Therefore, the corresponding entries of Λ⁻¹ become zero instead of becoming infinite. The above algorithm can be easily implemented using MATLAB. An in-depth analysis of the M-M algorithm can be found in Chapter 12 of [5].
The M-M approach explained above is summarized as:
Step 1: Set m = 0. Initialize by setting x₀ = z
Step 2: Choose G_m(x) for F(x) such that
        F(x) ≤ G_m(x) for all x
        F(x) = G_m(x) at x = x_m
Step 3: Find x_{m+1} by minimizing G_m:
        x_{m+1} = argmin_x G_m(x, s)
Step 4: Increment m (m = m + 1) and jump to Step 2
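Steps 1-4 can be sketched end-to-end in Python/NumPy (the paper's experiments used MATLAB and a full wavmat matrix; as an illustrative assumption here, W holds only one-level orthonormal Haar highpass rows, a single λ covers the whole band, and haar_detail_rows / mm_denoise are our own helpers):

```python
import numpy as np

def haar_detail_rows(n):
    # one-level orthonormal Haar highpass rows (n/2 x n), so W @ W.T = I
    W = np.zeros((n // 2, n))
    for i in range(n // 2):
        W[i, 2*i], W[i, 2*i + 1] = 1 / np.sqrt(2), -1 / np.sqrt(2)
    return W

def mm_denoise(z, W, lam, c, iters=40):
    """M-M iteration with update (21): x_{m+1} = z - W^T (Lam^{-1} + W W^T)^{-1} W z,
    where Lam^{-1}_ii = [W x_m]_i / (lam * phi'([W x_m]_i)) for the arctangent penalty."""
    x = z.copy()                    # Step 1: x_0 = z
    Wz, WWt = W @ z, W @ W.T
    for _ in range(iters):          # Steps 2-4
        u = np.abs(W @ x)
        # phi'(u)/u = 1/(u*(1 + c*u + c^2 u^2)), so Lam^{-1} -> 0 as u -> 0 (no blow-up)
        lam_inv = u * (1 + c * u + (c * u) ** 2) / lam
        x = z - W.T @ np.linalg.solve(np.diag(lam_inv) + WWt, Wz)
    return x

# toy usage: piecewise-constant signal plus AWGN, with c = 1/lam as in Section 4
rng = np.random.default_rng(1)
clean = np.repeat([0.0, 4.0, -2.0, 1.0], 16)
z = clean + rng.standard_normal(64)
x = mm_denoise(z, haar_detail_rows(64), lam=2.0, c=0.5)
```

Because the update always shrinks the wavelet coefficients of z componentwise, the recovered x is sparser in the wavelet domain than the noisy input.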

4 Example
During experimentation, a 1-D signal denoising problem was considered. The synthetic noisy signal used for experimentation was generated using the MakeSignal() function from the Wavelab tool, with Additive White Gaussian Noise of σ = 4. The Wavelab tool is available at: http://statweb.stanford.edu/wavelab/
The value of the parameter c which maintains convexity is calculated using (3). The maximally sparse solution was obtained with c = 1/λ. To compute the values associated with λ_j, the expression given in [1] was modified as

λ_j = 2^{j/2} √( N / 2^{(log₂(N/2) − j)} ),   1 ≤ j ≤ L           (22)

where N denotes the length of the signal and L denotes the transformation level. Note that we used λ₀ = 1 for the first sub-band matrix W₁. Further, a different λ_j was used for each band of the wavelet transform matrix. As mentioned in Section 3, the arctangent NCP function was used. By trial and error, a 4-level transformation in the wavelet matrix (i.e. L = 4) gave the best results. Further, the reconstruction and decomposition high-pass filters of the reverse biorthogonal wavelet (rbio2.2) were employed, which

Table 2: RMSE values for different wavelet filters

Wavelet filter                   Root Mean Square Error
Biorthogonal (bior1.3)           1.6881
Coiflets (coif1)                 1.5902
Daubechies (db2)                 1.4844
Biorthogonal (bior2.2)           1.4811
Reverse Biorthogonal (rbio2.2)   1.4565

provided improved results. However, experiments with Biorthogonal, Daubechies, Coiflets and Symlets wavelets were also conducted. The Symlets family of wavelets is merely a modified version of Daubechies with similar properties and improved symmetry; Symlets provided results identical to Daubechies. Table 2 lists the RMSE values obtained using the different wavelet filters with β = 0.98. The outcomes obtained (Table 2) were compared with the outcomes obtained in [1] and with ℓ₁ regularization. It can be observed in Fig. 3 that non-convex regularization with the decimated wavelet transform gave better results and preserved the peaks of the given signal. Therefore, the same approach could be used for signals containing sharp peaks; one example of such a signal is the Electrocardiogram (ECG) signal.

Fig. 3: Example of 1-D Denoising

5 Conclusion
This paper attempts to provide a pedagogical approach to the ingenious methodology proposed by Ankit Parekh et al. [1]. An approach for 1-D signal denoising using the decimated wavelet transform is proposed. The sub-band nature of the wavelet transform matrix W is exploited to obtain an easier and better understanding of, and solution to, the given denoising problem. The problem formulation comprises a smooth and a non-smooth term, with a parameter c which controls the overall convexity. The solution to this formulation is obtained using the M-M iterative algorithm. The proposed approach offered better experimental results than non-convex regularization [1] and ℓ₁ regularization. The same procedure could be extended to denoising a noisy image.

References
1. Ankit Parekh and Ivan W. Selesnick. Convex denoising using non-convex tight frame regularization. IEEE Signal Processing Letters, 22(10):1786–1790, 2015.
2. Jie Yan. Wavelet matrix. Dept. of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada, 2009.
3. Ivan Selesnick. Penalty and shrinkage functions for sparse signal processing. Connexions, 2012.
4. Ivan W. Selesnick and Ilker Bayram. Sparse signal estimation by maximally sparse convex optimization. IEEE Transactions on Signal Processing, 62(5):1078–1092, 2014.
5. Kenneth Lange. Numerical Analysis for Statisticians. Springer Science & Business Media, 2010.
6. Jan van Tiel. Convex Analysis. John Wiley, 1984.
7. David R. Hunter and Kenneth Lange. A tutorial on MM algorithms. The American Statistician, 58(1):30–37, 2004.
8. Geoffrey McLachlan and Thriyambakam Krishnan. The EM Algorithm and Extensions, volume 382. John Wiley & Sons, 2007.
9. Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
10. Ivan Selesnick. Total variation denoising (an MM algorithm). NYU Polytechnic School of Engineering Lecture Notes, 2012.
11. Ivan W. Selesnick, Ankit Parekh, and Ilker Bayram. Convex 1-D total variation denoising with non-convex regularization. IEEE Signal Processing Letters, 22(2):141–144, 2015.
12. Mário A. T. Figueiredo, J. Bioucas Dias, João P. Oliveira, and Robert D. Nowak. On total variation denoising: A new majorization-minimization algorithm and an experimental comparison with wavelet denoising. In 2006 IEEE International Conference on Image Processing, pages 2633–2636. IEEE, 2006.
