Facial Feature Extraction Using Complex Dual-Tree Wavelet Transform

Facial feature extraction using complex dual-tree wavelet transform
Turgay C elik
*
, Hu seyin O
zkaramanl, Hasan Demirel

Department of Electrical and Electronic Engineering, Eastern Mediterranean University, Gazimagusa, TRNC, Mersin 10, Turkey
Received 9 August 2006; accepted 5 December 2007
Available online 24 January 2008
Abstract
In this paper, we propose a novel method for facial feature extraction using the directional multiresolution decomposition oered by
the complex wavelet transform. The dual-tree implementation of complex wavelet transform oered by Selesnick is used (DT-DWT(S))
[I.W., Selesnick, R.G. Baraniuk, N.C. Kingsbury, The dual-tree complex wavelet transform, IEEE Signal Processing Magazine, 6, s.l.,
IEEE, November 2005, vol. 22, pp. 123151.]. In the dual-tree implementation, two parallel discrete wavelet transform (DWT) with dif-
ferent lowpass and highpass lters in dierent scales are used. The linear combination of subbands generated by two parallel DWT is
used to generate 6 dierent directional subbands with complex coecients. A test statistic, which is derived with absolute value of com-
plex coecient, whose distribution matches very closely with the directional information in the 6 subbands of the DT-DWT(S) is derived
and used for detecting facial feature edges. The use of the complex wavelet transform is motivated by the fact that it helps eliminate the
eects of non-uniform illumination, and the directional information provided by the dierent subbands makes it possible to detect edge
features with dierent directionalities in the corresponding image. Edge information of facial area is enhanced using multiresolution
structure of DT-DWT(S). The proposed method also employs an adaptive skin colour model instead of a predened skin colour statistic.
The model is developed with a unimodal Gaussian distribution using the skin region which is extracted excluding the detected edge map
obtained from the DT-DWT(S). By combining the edge information obtained by using DT-DWT(S) and the non-skin areas obtained
from the pixel statistics, the facial features are extracted. The algorithm is tested over the well known Carnegie Mellon University
(CMU) and Marks Weber face databases. The average detection rate of the proposed method using DT-DWT(S) provides up to
9.6% improvement over the same method using discrete wavelet transform (DWT).
2007 Elsevier Inc. All rights reserved.
Keywords: Facial feature detection; Complex dual-tree wavelet transform; Skin color modelling
1. Introduction
Facial feature detection has been among one of the most
important research topics for the past several years [1,2]
and it still remains to be an open problem. It is not an easy
machine vision task due to the variability in gender, race,
pose, expression, and the image acquisition conditions.
This importance arises from the fact that facial feature
detection forms the basis for and determines the success
of many high level machine vision applications such as tele-
conferencing, 3D model based face animation, facial
expression analysis, face recognition, and lip reading.
There have been many techniques suggested in the liter-
ature for solving the problem. These techniques can be
classied into two broad categories based on shape and col-
our. The rst class of methods tries to characterize the face
and each facial feature with a certain combination of col-
ours [3]. The colour based approach is low-cost, however
not very robust. The shape-based approaches look for spe-
cic shapes in the image adopting either template matching
[4], deformable templates [5], graph matching [6,7], snakes
[8], the Hough transform [9], the Discrete wavelet trans-
form [10], and the Gabor wavelets [11]. The shape based
methods perform well under certain restricted assumptions
regarding the head position, the image scale, and the illu-
mination conditions. However, they are computationally
more expensive methods.
1077-3142/$ - see front matter 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.cviu.2007.12.001
*
Corresponding author. Fax: +90 392 3650240.
E-mail address: turgay.celik@emu.edu.tr (T. C elik).
www.elsevier.com/locate/cviu
Available online at www.sciencedirect.com
Computer Vision and Image Understanding 111 (2008) 229246
It has been shown that a Gabor wavelet approach for
facial feature extraction outperforms many other shape
based methods [12]. The success of Gabor wavelet based
approach is essentially due to their ability in removing
most of the variability in image due to variation in lighting
and contrast. Furthermore they are robust against small
shifts and deformations [13]. Another important attribute
of the Gabor wavelets is their resemblance to the sensitivity
proles of neurons found in visual cortex of higher verte-
brates [1416]. The disadvantage however of the method
is their high computational complexity due to the ltering
of the face image with a bank of Gabor lters at many
scales and orientations.
Recently wavelets under multiresolution framework
have been shown to help combat the detrimental eects
of both noise and non-uniform illumination [17] in face rec-
ognition and facial feature extraction. The dual tree imple-
mentation of complex wavelet transform provides a
directional multiresolution decomposition of a given image
much like the Gabor wavelets however unlike the Gabor
wavelets it has relatively less computational expense which
will be shown in Section 2 with details. Thus in this paper,
an automatic method for facial feature extraction in colour
images using complex dual-tree wavelet transform (DT-
DWT(S)) is proposed. The face images are assumed to be
approximately frontal and upright. The desirable charac-
teristics of the DT-DWT(S) such as spatial locality, orien-
tation selectivity and excellent noise cleaning performance
provides a framework which renders the extraction of
facial features almost invariant to such disturbances
[18,19].
The norm of complex directional wavelet subband coef-
cients is used to create a test statistics for enhancing the
facial feature edge points. The Rayleigh distribution of
the derived statistics matches very closely with the true
coecient distribution in the 6 directional subbands. The
use of the complex wavelet transform helps to detect more
facial feature edge points due to its improved directional-
ity. Additionally, it eliminates the eects of non-uniform
illumination very eectively. Another novelty of the pro-
posed algorithm is the automatic and accurate extraction
of the skin region due to the exclusion of the detected edge
map in developing the skin colour model. A unimodal
Gaussian distribution of the colour components in YCbCr
colour space is used to model the skin colour pixels present
in the face region. After modelling the skin pixel statistics,
the algorithm automatically detects skin colour pixels the
second time by using skin pixel estimation. The extracted
skin region makes the system adaptive for changing illumi-
nation conditions. By combining the edge information
obtained by using DT-DWT(S) and the non-skin areas
obtained from the skin colour statistics, the facial features
can be extracted. The proposed algorithm is used to locate
four points for eyes, eyebrows and two points for the
mouth. For evaluating the performance of the proposed
algorithm we compare the manually extracted points to
the automatically extracted ones with the proposed algo-
rithm using the well known CMU [20] and Markus Weber
face databases, [21]. A correct detection is declared if the
manually detected and automatically detected points are
within 3 pixels of each other. The worst case performance
occurs for the eye corners with a detection rate of 92%.
The average detection rate for all facial feature points is
95%.
The layout of this paper is organized as follows: Section
2 compares Gabor ltering and DT-DWT(S) in the manner
of computational complexity. Section 3 gives general infor-
mation about Viola and Jones face detector [22]. The pro-
posed algorithm is introduced in Section 4, which is
followed by the Section 5 which covers performance anal-
ysis of the proposed algorithm and it also covers compari-
son analysis of the algorithm if DWT is used rather than
DT-DWT(S). The paper is concluded with nal section of
conclusion.
2. Computational complexity comparison between Gabor
ltering and DT-DWT(S)
Gabor ltering and DT-DWT(S) are used to produce
directionalities in dierent scales. Since, DT-DWT(S) can
produce 6 directional subbands, that is 165, 135, 105,
75, 45, 15, then the computational complexity analysis
is based on 6 directional information and changing number
of scales.
Let each lter in DT-DWT(S) has N number of coe-
cients, even though number of coecients in 2D Gabor l-
ter is much higher than the number of coecients in the
lters of DT-DWT(S), let us assume that they are equal
now for the worst case analysis. So the number of coe-
cients in Gabor lters which will be used in the convolving
overall image is N
2
. Say, our images size is HxW, where H
is height of image and W width of image. We have 6 direc-
tionalities and J scales, in the operation of convolving over-
all image with single Gabor lter structure, for each pixel
N
2
multiplications/divisions and (N
2
1) additions/sub-
tractions are needed. Since, we are using complex coe-
cients, each complex operand in Gabor lter coecients
represented in two dierent spaces, i.e. real and imaginary.
So, for each pixel, twice the number of arithmetic opera-
tions are needed with respect to numbers represented in
real space, that is the overall arithmetic operations for a
single pixel is 2N
2
multiplications/divisions, 2(N
2
1)
additions/subtractions. The image has HxW total pixels,
so the total number of arithmetic operations is 2WHN
2
multiplications/divisions, and 2WH(N
2
1) additions/sub-
tractions. Since 6 directionalities are considered in each
scale, and the number of scales is J, in total, the overall
arithmetic operations are required is 12WHJN
2
multiplica-
tions/divisions and 12WHJ(N
2
1) additions/subtrac-
tions. Note that, in Gabor ltering, the image size is kept
constant but the Gabor lters parameters are changed in
order to provide directionalities in dierent scales. So total
number of multiplications/divisions operations for 6
230 T. C elik et al. / Computer Vision and Image Understanding 111 (2008) 229246
directionalities with J scales for Gabor ltering is formu-
lated as following;
12WHJN
2
1
and similarly the number of additions/subtractions opera-
tions is formulated as follows;
12WHJN
2
1 2
Now let us make analysis of the DT-DWT(S). For each
scale in DT-DWT(S), rst we need to nd the number of
arithmetic operations for single level DWT, which is used
to generate directionalities in DT-DWT(S) with dierent
type lowpass/highpass lters. The detailed explanation of
DT-DWT(S) implementation is given in [18]. So, for single
level analysis part of DWT, number of arithmetic opera-
tions for ltering with /highpass lter of size N along col-
umns of image, one needs to have W
H
2
j
_ _
multiplications/
divisions, W
H
2
j
_ _
N 1 additions/subtractions in scale j,
so total number of operations for total ltering for lowpass
and highpass is 2W
H
2
j
_ _
N multiplications/divisions, and
2W
H
2
j
_ _
N 1 additions/subtractions. Note that division
by 2
j
comes from down sampling by 2 operations. Follow-
ing to ltering along columns, one needs to nd number of
arithmetic operations in ltering along rows. This is equiv-
alent to ltering along rows with lowpass and highpass l-
ters of length N. So, for each ltering operation one needs
W
2
j
_ _
H
2
j
_ _
N multiplications/divisions, and
W
2
j
_ _
H
2
j
_ _
N 1
additions/subtractions. Total number of arithmetic opera-
tions is equivalent to 2
W
2
j
_ _
H
2
j
_ _
N multiplications/divisions,
and 2
W
2
j
_ _
H
2
j
_ _
N 1 additions/subtractions. So for single
level DWT, total number of arithmetic operations for mul-
tiplications/divisions is 2W
H
2
j
_ _
N 2
W
2
j
_ _
H
2
j
_ _
N
_ _
and
total number of arithmetic operations for additions/sub-
tractions is 2W
H
2
j
_ _
N 1 2
W
2
j
_ _
H
2
j
_ _
N 1
_ _
. With
the Selesnicks implementation, which uses linear combina-
tions of dierent DWT with dierent analysis lters to gen-
erate J scales with 6 directionalities with produced
coecients are complex, total arithmetic operations needed
for multiplications/divisions is formulated as following;
J
j1
4
2WHN
2
j
1
1
2
j
_ _
J
j1
12
WH
2
2j
_ _
12WH
8
12
N
J
j1
1
2
j

1
2
2j
_ _
_ _
J
j1
1
2
2j
_ _
3
and similarly, total arithmetic operations needed for addi-
tions/subtractions is formulated as follows;
J
j1
4
2WHN 1
2
j
1
1
2
j
_ _
J
j1
12
WH
2
2j
_ _
12WH
8
12
N 1
J
j1
1
2
j

1
2
2j
_ _
_ _
J
j1
1
2
2j
_ _
4
It is clear that, Eq. (3) is always smaller than the Eq. (1)
with any number of scales J; similarly Eq. (4) is always
smaller than the Eq. (2) with any number of scales J.
Fig. 1(a) shows the comparison of number of arithmetic
multiplications/divisions operations for 128 128 image
with N = 12, similarly Fig. 1(b) shows the comparison of
number of arithmetic additions/subtractions operations
for 128 128 image with N = 12. The graphics are shown
in gures show number of operations with respect to
change in scales for both Gabor ltering and DT-DWT(S).
In order to make overall gure comparable, the logarith-
Fig. 1. Arithmetic operation comparisons for Gabor ltering and DT-
DWT(S) with N = 12, 128 128 of image size, variable scale number (a)
Number of arithmetic operations for multiplications/divisions, (b) Num-
ber of arithmetic operations for additions/subtractions.
T. C elik et al. / Computer Vision and Image Understanding 111 (2008) 229246 231
mic scale in number of operations axis is selected. It is clear
that, for changing scales, the number of arithmetic opera-
tions required for Gabor ltering is much higher than the
number of arithmetic operations required for DT-DWT(S).
Fig. 2 shows the similar operation with J = 6, and variable
N. Similar to Fig. 1, the number of operations for DT-
DWT(S) is smaller than Gabor ltering.
3. Face detection using Viola and Jones face detector
The face detection problem has received a lot of atten-
tion in the image processing literature since it is the rst
step in many higher level image processing applications.
Some of the techniques proposed use of statistical color
model to segment skin like regions that are assumed to con-
tain the face region. Most of these techniques suer from
the non-uniform illumination eects. Eventhough they
are ecient in computational complexity, they do not sur-
vive in the outdoor environment under changing lighting
conditions. On the other hand, the methods which use
holistic approaches are much more ecient in correct
detection of faces. The main disadvantage of these methods
is the computational cost in detecting faces. They require
scanning of all possible feature locations in images in dier-
ent scales which brings high computational burden, and
makes the system complex enough for possible real-time
implementation.
Recently, Viola and Jones presented a face detector
which uses a holistic approach and is much faster than
any of their contemporaries [22]. The performance can be
attributed to the use of an attentional cascade, using low-
feature number detectors based on a natural extension of
Haar wavelets. Each detector in their cascade ts objects
to simple rectangular masks. In order to reduce the number
of computations, while moving through their cascade, they
introduced a new image representation called the integral
image. For each pixel in the original image, there is exactly
one pixel in the integral image, whose value is the sum of
the original image values above and to the left. The integral
image can be computed quickly which drastically improves
the computation costs of the rectangular feature models.
At the highest levels of the attentional cascade, where most
of the comparisons are made, the rectangular features are
very large. As the computation progresses down the cas-
cade, the features can get smaller and smaller, but fewer
locations are tested for faces. The attentional cascade clas-
siers are trained on a training set. Since the described
technique is faster in implementation, and its detection rate
is high enough with its low false alarm rate [22], we have
employed this technique in our proposed method to detect
faces. Fig. 3 shows some pictures from CMU database, and
detected faces marked with rectangular enclosing frame.
4. Facial feature extraction using DT-DWT(S)
The overall algorithm for facial feature extraction is
given as ow chart in Fig. 4 where the input to the algorithm
is the face image. DT-DWT(S) is applied and the subbands
are processed to nd edges of the feature points. This stage
is followed by the extraction of skin colour area and skin
colour modelling. After these steps are carried out, one
has a good seed image for locating the facial features
(mouth, eyes, and eyebrows). This is achieved in the follow-
ing manner: at the rst stage the region below the centre of
the overall face area is isolated for detecting the mouth fea-
ture. This is followed by detecting the left eye, left eyebrow,
right eye, and right eyebrow features. In some cases, the fea-
ture edge maps are too large. This is especially true for the
area between eye and eyebrow. As a consequence this region
may be lost. In order to overcome this deciency, the auto-
matically extracted skin colour map is used. The skin colour
map is applied if there is missing left eye or left eye brow
candidate. After all candidates are localized, the derived
skin colour map is again used for ne tuning. The ne
Fig. 2. Arithmetic operation comparisons for Gabor ltering and DT-
DWT(S) with J = 6, 128 128 of image size, variable number of
coecients (N) (a) Number of arithmetic operations for multiplications/
divisions, (b) Number of arithmetic operations for additions/subtractions.
tuning stage especially extracts the facial feature regions
roughly. These regions may be adjusted more using any
edge detector such as Sobel or Prewit. In the literature, there
are constant models which dene regions over which it is
more likely a skin pixel will occur, but these constant region
based denitions are not eective to overcome drawback of
changing illumination eects or other kind of non-idealities
caused from imaging system. In the following subsections, a
detailed explanation of edge information extraction and
skin colour detection is provided.
The logical AND operation is used between the edge
map and the skin colour map to detect overall facial feature
candidates. There are a lot of feature candidates detected in
the face region, including some undesired features. In order
to eliminate the undesired features we need to analyze the
detected feature candidates with respect to their orientation
and shapes. The facial features, such as mouth, eye, and eye-
brows can be represented by using simple ellipse. Hence an
ellipse is tted to each of the detected facial feature candi-
date. The orientation of major axis of the tted ellipse with
respect to the horizontal axis of the image plane is used to
determine the angle of orientation. Major facial features,
such as eyes, mouth and eyebrows should have almost the
same angle of orientation. The angle of orientation infor-
mation is not enough to describe a facial feature. For exam-
ple, if an undesired circle-like feature is detected, and if its
orientation is close to the orientation of the desired features,
then the angle of orientation is misleading. The remedy for
this problem is to use the ratio of minor axis and major axis.
The ratio should be smaller than 0.75, which is calculated
statistically using a database of 300 faces from CMU face
database [20] in which an ellipse t operation is carried
out to determine this ratio for the desired facial features.
Therefore, both angle of orientation and the ratio of the
minor and major axis, are used as the decision criteria for
determining a facial feature from the feature candidates.
4.1. DT-DWT(S) based edge extraction for facial feature
edge enhancement
The recently developed complex dual tree wavelet
transform overcomes some of the shortcomings of the
Fig. 3. Samples from CMU face database and the detected faces using Viola & Jones face detector [22].
traditional real discrete wavelet transform (DWT). It has
improved directionality and reduced shift sensitivity and
it is approximately orientation invariant [18]. Its perfor-
mance in denoising applications has been shown to be
superior to the methods which employ real DWT, so better
detect it has better edge detection performance. The DT-
DWT(S) employs two real wavelet transforms in parallel
where the wavelets of one branch are the Hilbert trans-
forms of the wavelets in the other. In this manner any input
image can be decomposed into its 6 directional subbands.
The DT-DWT(S) in particular generates 6 subbands with
complex coecients in dierent directionalities, 165,
135, 105, 75, 45, 15. The real (R
i
) and imaginary (C
i
)
parts of complex wavelets in the 6 directional subbands
are depicted in Fig. 5.
It should be noted that the DT-DWT(S) is similar to the
popular Gabor wavelet analysis. Unlike the Gabor wave-
lets which can be rotated to any direction and scaled to
any resolution, the DT-DWT(S) has limited directionality
however unlike the Gabor wavelets there exist fast algo-
rithms for evaluating the DT-DWT(S). We employ the Hil-
bert transform pairs of wavelets designed in [23] for
analysis part of DT-DWT(S).
The input face image is decomposed one level into its
corresponding directional subbands. In order to aid the
facial feature extraction process we propose to enhance
the facial feature edge information in the DT-DWT(S)
domain by assembling a test statistic for the noise cor-
rupted wavelet coecients in the 6 subbands.
Dening a test statistics using the square root of the
squared sum of 6 real and complex subbands as
G
Ri Ci

R
2
i
C
2
i
_
i 1; 2; 3; . . . ; 6 where R
i
andC
i
are real
and complex subbands with the same directionality, and the
distribution of the test statistic follows the Rayleigh distribu-
tion [24]. Fig. 6(a) shows an example face image from the
CMU face database. The histograms of the test statistic
G
Ri Ci
after one level of the DT-DWT(S) are also depicted in
Fig. 6(b). The distributions at dierent resolutions also fol-
low similar statistics. Note that, the visualization of magni-
tude information in 6 directional bands for one level
complex wavelet transform is illustrated in Fig. 7.
G
Ri Ci
consists of twoclasses; w
edge
andw
noise
where the rst
class represents the edges and the second class refers to the
non-edges which can be considered to be noise. In the case
of facial feature extraction class refers to the non-edges can
be thought as the homogenous regions in the face region,
i.e., skin areas, etc. The distribution of the corresponding
edge class of G
Ri Ci
can be approximated as;
Fig. 4. Overall facial feature and face skin colour extraction algorithm.
Fig. 5. Complex dual-tree 2D wavelets and corresponding labels.
Fig. 6. (a) Segmented face region, (b) Histogram of magnitude of complex wavelet coecients G
R
1
C
1
, G
R
2
C
2
, G
R
3
C
3
, G
R
4
C
4
, G
R5C5
, and G
R
6
C
6
respectively.
p
Ri Ci
r=w
edge

r
r
2
edge
e
r
2
=2r
2
edge
5
Similar to (5), the distribution of the corresponding non-
edge class (noise) of G
R
i
C
i
can be approximated as;
p
Ri Ci
r=w
noise

r
r
2
noise
e
r
2
=2r
2
noise
6
where r is magnitude of complex coecient of G
Ri Ci
located
at spatial position of (x, y).
It should be noted that as the power of the additive
white noise contaminating the face image is increased the
above approximate probability distributions becomes more
accurate. The overall distribution of G
R
i
C
i
can then be
approximated as:
p
Ri Ci
r p
edge
p
Ri Ci
r=w
edge
p
noise
p
Ri Ci
r=w
noise
7
where p
edge
is the a priori probability for the edge related
coecients and p
noise
= 1 p
edge
is the a priori probability
for the noise-related coecients. In order to estimate
p
Ri Ci
r we need to estimate p
edge
, r
edge
, and r
noise
. The
parameters can be estimated by maximizing the following
likelihood function [24];
lnL
allx;y
lnp
Ri Ci
G
Ri Ci
x; y 8
Using (8), one can derive the iterative equations dened in
(9)(11) to obtain approximations for approximated esti-
mations of p
edge
, r
edge
, and r
noise
, note that there is no need
for estimate of p
edge
which is automatically derived from
afore mentioned equality with p
noise
.
s
i
n
t
p
t
i
rn
r
2
i
e
r
2
n
=2r
2
i
_ _
j
p
t
i
rn
r
2
j
e
r
2
n
=2r
2
j
_ _ 9
r
i
t 1
1
2
n
s
i
n
tr
2
n
n
s
i
n
t
10
p
t1
i

1
N
n
s
i
n
t; i fw
edge
; w
noise
g; n f1; 2; . . . ; Ng 11
where t is current iteration, (t + 1) is the next iteration, and
N is the total number of coecients in G
Ri Ci
.The correct
estimation of these parameters is crucial for the success
of the facial edge map detection. Fig. 8 depicts the histo-
grams of the coecient distributions as well as the esti-
mated distributions with the above derived parameters
for the 6 directional subbands. The close agreement be-
tween the true and estimated distributions is obvious. Once
the parameters are reliably and accurately estimated, one
can estimate using Bayes theorem the conditional distribu-
tion function as,
p
Ri Ci
w
edge
=r
p
edge
p
Ri Ci
r=w
edge
p
edge
p
Ri Ci
r=w
edge
1 p
edge
p
Ri Ci
r=w
noise
12
Eq. (12) gives the conditional probability that the observed
coecient in the DT-DWT(S) corresponds to an edge. In
order to enhance edge map, one can use scale and spatial
consistency of the edge class. Spatial consistency makes
use of the idea that edge pixels are most likely surrounded
by neighbouring edge pixels. Scale consistency implies that
each distinctive edge survive in consecutive scales with a
high probability. Then, for each spatial location
x; y; p
Ri Ci
w
edge
=r is smoothed using its 8-neighborhood
by using the following formula,
Fig. 7. Visualization of magnitude of complex wavelet coecients G
R1C1
,
G
R
2
C
2
, G
R
3
C
3
, G
R
4
C
4
, G
R5C5
, and G
R
6
C
6
(a) L channel, (b) a channel, (c) b
channel.
p
Ri Ci
x; yw
edge
=r
1
m1
1
n1
wm; np
Ri Ci
x m; y medge=r 13
where w(m, n) are scaling coecients from the Gaussian
Kernel. The formula dened in (13) provides spatial consis-
tency, which assumes that edge pixels are surrounded by
edge pixels, and suppress noise and spurious points which
are misclassied as edge pixels by (12).
The process is followed by enhancing each scale edge
map through scales using the idea dened in [24] which is
formulated by;
p
s
Ri Ci
w
edge
=r
3
1
p
s
R
i
C
i
w
edge
=r

1
p
s1
R
i
C
i
w
edge
=r

1
p
s2
R
i
C
i
w
edge
=r
14
where s denotes scale (large s means lower resolution). The
formula in (14) means that, every edge pixel in higher resolu-
tions should survive through lower resolutions if it is a strong
Fig. 8. Distribution of coecients in G
Ri Ci
(dashed lines) and approximation to that distribution using Eq. (7) (continuous lines) for G
R
1
C
1
, G
R
2
C
2
, G
R
3
C
3
,
G
R4C4
, G
R5C5
, and G
R6C6
, respectively.
edge pixel. Edge enhancement is depicted in Fig. 9, where the
edge information in the rst scale for all directional bands
shown in the rst row for the image given in Fig. 6(a), and
in the second rowtheir enhanced versions are demonstrated.
It is easily noticeable in Fig. 9 how edge information is pre-
served, and cleaned using inter and intra scale dependency.
The rst row represents the edge only information found
using DT-DWT(S) and its interscale information. Further
edge enhancements are applied using intra scale informa-
tion. It is noticeable that, the edges get more smooth struc-
ture with a trade of loosing details, which may be caused
from noise eects. The drawback may be overcome using a
combination of six dierent bands information which may
enhance overall edge map of desired object.
After the edge maps in the 6 directional subbands are
enhanced using the space and scale consistency, it remains
to combine the 6 directional subbands into an overall edge
map with all the directional edge information. Since the
CIE Lab colour space is used for better edge preservation,
a similar optimization is applied to both the luminance
component L and chrominance components a and b to
detect overall edge map as follows.
EMx; y arg max
i1;...;6
p
Ri Ci
x; yw
edge
=r 15
Once the three edge maps EM
L
(x, y), EM
a
(x, y), and
EM
b
(x, y) are calculated, they can be combined to give
the overall edge map E(x, y);
Ex; y arg maxfEM
L
x; y; EM
a
x; y; EM
b
x; yg 16
Fig. 10 depicts the step by step generation of nal edge map
E(x, y), which is the combination of EM
L
(x, y), EM
a
(x, y)
and EM
b
(x, y) with respect to Eq. (15) using a picture from
a face data set collected by Markus Weber [21] in DT-
DWT(S) and DWT domains.
Even though our focus is on DT-DWT(S), there will be
a comparison for the corresponding facial feature extrac-
tion algorithm with DWT, so here we are giving denition
of edge map for DWT. In order to extract edge map E(x,
y), which is the combination of EM
L
(x, y), EM
a
(x, y),
and EM
b
(x, y) for DWT, we only used horizontal and ver-
tical detail subbands which are lowhigh (LH) and high
low (HL) subbands of DWT. The square root of sums of
squares of LH and HL produces similar statistics dened
in (5) and (6) for edge and noise respectively. Rather than
producing 6 dierent edge maps as in DT-DWT(S), it pro-
duces only one edge map. It is apparent that, the DT-
DWT(S) extracts more robust edges with respect to
DWT. For edge enhancement, DWT uses two orthogonal
bands [24] and generates overall edge map. Since diagonal
information in DWT is noisy, it is not used in overall
consideration.
After E(x, y) has been extracted, we proceed to obtain a
binary map. The binary map together with the ellipse t-
ting procedure to be described next will help in obtaining
a more accurate face region from which skin colour model
can be extracted. In order to choose the threshold value
automatically for binarization of E(x, y), Otsus method
[17] is employed. The threshold is determined by choosing
the value that maximizes the discrimination criterion r
2
B
=r
2
w
where r
2
B
is the between-class variance and r
2
w
is the within-
class variance. Fig. 11 depicts E(x, y), and its binarized
version EB(x, y) for DT-DWT(S) and DWT. Again superi-
ority of DT-DWT(S) on DWT is shown in Fig. 11(c) and
(e). It is clear that, DT-DWT(S) extracts more features with
respect to DWT. It is clear in Fig. 11(c) that, facial features,
i.e., mouth, nostrils, eyes, eyebrows are extracted roughly
but it is not the case for Fig. 11(e).
4.2. Automatic skin colour extraction using facial feature
edge map
In order to extract the facial features more precisely we
propose an automatic skin colour model. To model the
skin area, one needs to extract skin pixels, pixels which
do not belong to the facial features. These are the pixels
that are mostly black in nal binary edge map. In order
to better model the skin colour, we t an ellipsoid to the
face region in the binary edge map and exclude the non-
face areas which expected as background region around
detected face, or the hairy part of the face. The detected
face region is square in size, which is a characteristic of
Viola and Jones face detector. It generally detects face
Fig. 9. Edge enhancement and noise removal using inter, and intra edge information; rst row is the raw edge information in dierent directional bands;
second row is the enhanced edge information.
region from forehead to upper part of chin. Since the width
of human face is smaller than the height of face, the size
from forehead to chin covers the background objects from
around. In order to eliminate this eect, an ellipsoid is t
on to the face detected face region.
It should be noted that the skin colour modelling does
not take into account the potential facial feature areas as
well. The idea behind this method is the assumption that
we have already detected signicant features of face, which
are mainly parts of mouth, eyes, nostrils, and some arte-
facts in skin area. The ellipse tting is used in order to dis-
criminate between the facial skin colour region from the
background and facial features which are distinctly non-
skin colour regions. By dening a tolerance measure D
Fig. 10. Step by step illustration of extraction of edge maps; rst row is extraction of EM
L
(x, y), second row is extraction of EM
a
(x, y), third row is
extraction of EM
b
(x, y), and fourth row is original image and overall edge map E(x, y) (a) is DT-DWT(S) (b) is DWT.
which is taken to be 5% of overall height of the cropped
face region, a large portion of the background region is
rejected. Thus a more accurate skin colour model can be
obtained.
An illustration of simple face with its ellipse tting is
shown in Fig. 12, and the minor and major axis of the
ellipse is dened by;
b
H 2D
2
a
2
3
b 17
The part of the face under the ellipse region and the
binary edge map are anded for the exclusion of non-skin
colour regions. Then the maximum connected component
of this image is selected for skin colour modelling. Fig. 13
summarizes this process. Fig. 13(a) is binary edge map,
and (b) is the ellipse parameterized with respect to Eq.
(17), and negative of edge map found in (a) is shown in
(c), and (d) shows the resultant binary map which is found
using binary AND operation of (b) and (c). The nal skin
colour map, SCM(x, y), is depicted in Fig. 13(e) in which
it is clear that the small parts inside eye area are removed
by selecting maximum connected component [25]. This
operation provides the data which is highly belongs to
the skin region. It is clear that the binary map shown in
Fig. 13(e) consists of skin regions only, which makes our
algorithm more robust with respect to constant skin color
model.
Fig. 11. Extraction of binary edge map EB(x, y) (a) Original face image
(b) Edge map E(x, y) using DT-DWT(S) (c) Binary edge map EB (x, y)
using DT-DWT(S) (d) Edge map E(x, y) using DWT (e) Binary edge map
EB(x, y) using DWT.
Fig. 12. Application of binary ellipsoid over face region.
Fig. 13. Extraction of skin colour map; (a) Overall binary edge map, (b)
Ellipse map (c) Negation of overall binary edge map, (d) Combination of (b)
and (c), (e) Overall skin map which is biggest connected component in (d).
Due to non-uniform lighting conditions and/or other
artefacts sometimes binary edge map merges eye regions
with eyebrows, nding the skin colour distribution can help
overcome these deciencies. We have experimented on
many colour spaces, and we have found that YCbCr colour
space is best representative of skin colour. Skin colour
region is modelled with a single three dimensional Gauss-
ian distribution with the following parameters:
x Y x; yCbx; yCrx; y
l
x

1
K
allx;y;SCMx;y1
x
1
K 1
allx;y;SCMx;y1
x
T
x
18
where x is input vector composed of colour space values at
spatial location (x,y) , l
x
is average vector computed using
ensemble averages of vectors x, and

x
is covariance ma-
trix. Using these parameters, one can easily derive a distri-
bution of skin pixels. Our assumption is that, the skin color
pixels in the corresponding face area is distributed with 3-
dimensional Gaussian distribution with the parameters gi-
ven in Eq. (18). Using the derived model one can dene a
binary map which is used to represent the skin color region
in corresponding color face image as follows;
SPx; y
1; exp
1
2
xl
x
1
x
xl
x
T
_ _ _ _
PT
h
0; otherwise
_
_
_
19
xY x; yCbx; yCrx; y
where SP(x, y) is a binary skin colour map which depicts
skin colour pixels as 1, and other pixels as 0. T
h
is a global
threshold values and it is set to 0.1 after many experiments.
Fig. 14, shows extraction of skin colour map of the face im-
age, where original images are shown in column (a), and
extracted skin colour regions are shown in column (b). It
is clear in Fig. 14 that the algorithm achieves eective skin
color model estimation.
The extracted skin color model can be used as an ini-
tial color model for face tracking applications. In the fol-
lowing gure, we have demonstrated this with the well
known video sequence Akiyo. The Fig. 15 shows the steps
described in this section for the rst frame of Akiyo
sequence, where the detected face region is shown in (a),
(b) shows the ellipsoid used in operations, (c) shows the
negative of segmented edge map, (d) shows the combina-
tion of (b) and (c), i.e., binary anding operation, and (e)
shows the nal skin color map detected for the corre-
sponding video sequence, and nally (f) shows the
detected face region with respect to Eq. (19) with the pix-
els in (f) are input to the model. It is clear in Fig. 15 that
corresponding skin color model eectively segmenting
detected face into skin regions and non-skin regions.
Fig. 16 shows the binary skin color region tracking using
the skin color model for the rst frame of Akiyo sequence
for the rst few frames.
It is very clear from Fig. 16, the proposed automatic
skin color extraction model eectively segments skin
regions. The Akiyo sequence from frames 5 to 40 is shown,
and its skin color segmented version is displayed also. Note
that, the rst frame of the sequence is used to train the
model parameters once.
4.3. Merging facial edge map with skin colour map to detect
facial feature landmarks
The extracted maps, facial edge map and skin colour
map are used to detect areas for facial feature components.
It is expected that the skin color map has no response in
facial feature areas. This information can be used in order
to enhance facial feature regions. The binary negative ver-
sion skin color map is called as negative skin color map,
NSP, which is formulated as follows,
NSP 1 SP 20
It is expected that, NSP has response in facial feature areas,
but no response in other areas. Fig. 17 shows, SP and NSP
for a picture from CMU face database.
As shown in Fig. 17 NSP gives high responses for facial
feature regions. Sometimes it is possible to have holes in
these regions because of skin color model parameter
Fig. 14. Automatic extraction of skin colour regions; (a) Original images
(b) Skin colour map.
Fig. 15. Demonstration of skin color extraction using edge map. (a) Detected face, (b) Corresponding ellipse, (c) Negative of edge map, (d) Binary anding
of (c) and (d), (e) Binary skin color map (f) Segmented face image using Eq. (19).
Fig. 16. Application of automatic skin color model extraction for Akiyo sequence, frames from 5 to 40 with increment of 5 frames and its corresponding
binarized skin map with respect to Eq. (19) is shown.
estimations. So that, each connected component of NSP is
processed with binary hole lling operation [25]. After NSP
is post-processed with binary hole lling operation, it is
binary anded with binary edge map, EB. Note that, after
this operation, the connected components which are
adjusted with EB are selected as candidate regions for
facial features. In selection part of facial features candi-
dates, the following criterions are used:
(a) Since the face image is assumed to be in upright ori-
entation, then the facial features are expected to be in
horizontal orientation, then ellipse tted over candi-
date region should have orientation in between 85
and 105.
(b) The minor axis versus major axis ratio of each tted
region should be smaller than the 0.75.
(c) Each facial feature should have almost symmetry
with respect to its major and minor axis that is mea-
sured ratio of covered area separated by the major
and minor axis. This ratio should be smaller than 0.1.
(d) If (a), (b), and (c) does not provide a unique selection
criterion for facial feature selection, and still there left
candidates with number of them higher than 5, i.e.,
mouth, left eye, right eye, left eyebrow, right eye-
brow, then use the rst ve largest connected compo-
nents for (f) selection.
(e) After ve candidates are selected, than upper left one
is assumed to be left eyebrow, below is left eye, and
upper right is assumed to be right eyebrow, and
below it right eye, and bottom one is the mouth.
In Fig. 18, we have shown the resultant candidate
regions, and selected regions with respect to steps listed
above.
After candidate regions are found, the next stage is nd-
ing the required facial feature landmarks; left and right
mouth corners, inner and outer corners of the left eyebrow,
inner and outer corners of the right eyebrow, inner and
outer corners of the left eye, and inner and outer corners
of the right eye. The following algorithm is used:
(a) For each candidate region nd a rectangular area
enclosing the original candidate.
(b) Find left and right sides of each candidate region, and
mark points as facial feature landmarks.
(c) Make rectangular areas 5 pixels wider and apply Har-
ris corner detector in each candidate rectangular area.
(d) If Harris corner detector nds two interest points
which are near to the region boundaries then select
Fig. 17. SP and NSP for a face image (a) Original image (b)
Corresponding SP (c) Corresponding NSP.
Fig. 18. (a) Original image, (b) Corresponding SP, (c) Corresponding
NSP (d) Corresponding EB (d) Binary anding of (c) and (d), (f) Selected
facial feature regions.
them as facial feature landmarks, if this process can
not be achieved, than select landmark positions as
the positions detected in (b).
5. System performance and discussion
In order to evaluate the performance of the proposed
system the CMU face database [20] is used. Approximately
frontal face images with varying lighting conditions are
selected. A sample set of our test images with varying illu-
mination conditions is shown in Fig. 19. The performance
of the facial feature detection system is evaluated using the
following methodology. When, all facial features, such as
eyes, eyebrows, and mouth, are correctly detected, and
then the detection is regarded as a correct detection, and
whenever one of them is not correctly detected, it is
assumed to be a misdetection. A correct detection is tested
based on the alignment of the facial feature corners of the
automatically segmented and the manually segmented fea-
tures with a tolerance of 3 pixels. Face images used in the
facial feature detection process are the images with
256 256 pixels which are segmented and normalized from
the face databases. Fig. 20 shows proposed detection mech-
anism and its response with detected facial features with
the algorithm mentioned above.
Fig. 19. A sample set from test images before face segmentation.
Fig. 20. Facial feature extraction; (a) Segmented face image, (b) Facial feature map, (c) Skin Colour Map, (d) Merging information in (b) and (c), (e)
Overall facial feature map enhanced using skin color map (f) Labelled facial features.
The proposed algorithm is tested over two sets selected
from Markus Weber and CMU face databases. Selected
sets show diversities in illumination conditions. The similar
algorithm is applied for DWT where only horizontal and
vertical bands are used to generate edge information.
The rst comparison is carried out for new proposed
algorithm in two dierent face databases. Table 1 shows
detection rates of ten facial feature points for the CMU
face database. The proposed facial feature detector is tested
on 250 sample face images to obtain the detection statistics.
The input images show a variation on illumination and it is
observed that there is diversity in texture of faces on the
overall faces. The proposed facial feature detector provides
92.5% average detection rate for all of the facial features if
the located feature points are within the window of 3
pixel distance from the manually pre-located correct fea-
ture locations. The feature detector reaches up to 93.7%
average detection rate for the detected features within the
window of 5 pixel distance. This detection rate can be
improved if we use dierent type of model for skin colour
segmentation, such as mixture of Gaussians, instead of
using unimodal Gaussian, which may better model the
overall skin color distribution.
Table 2 shows the same performance over 250 faces
from Markus Weber face database, in which there is very
small illumination variations with respect to samples
selected from CMU face database. Thats why we get
97.5% average detection rate within the window of 3 pix-
els distance and 98.5% average detection rate within a win-
dow of 5 pixels distance.
The proposed DT-DWT(S) based facial feature algo-
rithm is compared with a Discrete Wavelet Transform
(DWT) based approach which uses the same algorithm
with orthogonal and symmetric wavelets [26]. The detec-
tion performances of the proposed algorithm using DT-
DWT(S) and DWT are given in Table 3.
It is clear in Table 3 that the performance of the DT-
DWT(S) based approach is higher than the performance
of DWT based approach. DT-DWT(S) based algorithm
achieves on the average 92.5% detection rate for 3 pixel
distance, and 93.7% detection rate for 5 pixel distance,
on the other hand DWT based algorithm achieves on the
average 82.9% detection rate for 3 pixel distance, and
86.5% detection rate for 5 pixel distance. It is clear from
these statistics that the precision for DT-DWT(S) based
algorithm is on the average 9.6% better than DWT based
algorithm for 3 pixel distance.
6. Conclusion
In this paper, we introduce a novel method for facial fea-
ture extraction using DT-DWT(S) to extract edge informa-
tion using the 6 complex bands with dierent
directionalities. A test statistics whose distribution matches
very closely with the directional information in the six direc-
tional subbands of the DT-DWT(S) is derived and used for
Table 1
Facial feature detection results for 250 samples from the CMU face
database
Facial feature name Detection rate
3 pixel distance
Detection rate
5 pixel distance
Left mouth corner 0.91 0.95
Right mouth corner 0.94 0.94
Inner corner of the left eyebrow 0.93 0.93
Outer corner of the left eyebrow 0.88 0.90
Inner corner of the right eyebrow 0.92 0.93
Outer corner of the right eyebrow 0.89 0.92
Inner corner of the left eye 0.94 0.94
Outer corner of the left eye 0.93 0.94
Inner corner of the right eye 0.96 0.97
Outer corner of the right eye 0.95 0.95
Table 2
Facial feature detection results for 250 samples from the Markus Weber
face database
3 pixel distance
Detection rate
5 pixel distance
Table 3
Comparison of the proposed algorithm with two dierent wavelet
transforms (DT-DWT(S), DWT) in 250 samples collected from the
CMU face database
3 pixel distance
Detection rate
5 pixel distance
DT-DWT(S)
DWT
detecting facial feature edges. The model is developed with a
unimodal Gaussian distribution using the skin region which
is extracted excluding the detected edge map obtained from
the DT-DWT(S). The proposed method also employs an
adaptive skin colour model using YCbCr colour space.
The extracted skin region makes the system adaptive for
changing conditions, which is not the case for the prede-
ned colour region based skin colour classiers. Facial fea-
ture extraction is then performed by combining the edge
information obtained by using DT-DWT(S) and the non-
skin skin areas obtained from the pixel statistics. The pro-
posed algorithm is tested on face images taken from
CMU face database where we have selected images with
severe changes in illumination conditions. The system per-
forms 92.5% detection rate on the average for eyes, eye-
brows, and mouth features for the tolerance of 3 pixel
distance. Meanwhile the same algorithm with DWT per-
forms 82.9% detection rate on the average for the same
facial features. The proposed algorithms performance gets
better if the diversity in the illumination conditions is less,
which is approved by the detection statistics taken on
Marks Weber face database, where the algorithm achieves
97.5% detection rate on the average for 3 pixel distance.
The future work will be about analysis of multiscale
structure of DT-DWT(S) to train a facial landmark data-
base using a robust classier. The classiers outputs from
dierent scales will be embedded into ner scales to make
the classier stronger with respect to noise eects caused
from imaging hardware which can be combated easily
using multiscale structure of DT-DWT(S).
References
[1] Ming-Hsuan Yang, David Kriegman, Narendra Ahuja, Detecting
faces in images: a survey, IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI) 24 (2002) 3458.
[2] R. Lanzarotti, P. Campadelli, N.A. Borghese, Automatic features
detection for overlapping face images on their 3D range modeICIAP
01: Proceedings of the 11th International Conference on Image
Analysis and Processing, IEEE Computer Society, Washington, DC,
USA, 2001, p. 316.
[3] Rein-Lien Hsu, A.K. Jain, Face modeling for recognitionProceedings
of International Conference on Image Processing, vol. 2, IEEE,
Thessaloniki, Greece, 2001, pp. 693696.
[4] R. Brunelli, T. Poggio, Face recognition: features versus templates,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 15, 10, s.l., IEEE, October 1993, pp. 10421052.
[5] A.L. Yuille, P.W. Hallinan, D.S. CohenFeature Extraction From
Faces Using Deformable Templates, vol. 8, Kluwer Academic
Publishers, 1992, pp. 99111.
[6] R. Herpers, G. Sommer, An attentive processing strategy for the
analysis of facial features, in: H. Wechslcr et al. (Eds.), Face
Recognition: From Theory to Applications, Springer Verlag, Lon-
don, 1998, pp. 457468.
[7] L. Wiskott et al., Face recognition by elastic bunch graph matching,
in: L.C. Jain (Ed.), Intelligent Biometric Techniques in Fingerprints
and Face Recognition, CRC Press, Boca Raton, 1999, pp. 355396.
[8] M. Pardas, M. Losada, Facial parameter extraction system based on
active contours, in: Proceedings of International Conference on
Image Processing, s.l., IEEE, 2001, vol. 1, pp. 10581061.
[9] T. Kawaguchi, D. Hidaka, M, Rizon, Detection of eyes from human
faces by Hough transform and separability lter, in: Proceedings of
International Conference on Image Processing, s.l., IEEE, 2000, vol.
1, pp. 4952.
[10] P. Campadelli, R. Lanzarotti, C. Savazzi, A feature-based face
recognition system, Proceedings of 12th International Conference on
Image Analysis and Processing 1 (2003) 6873.
[11] M.A. Arbib, T. Uchiyama, Color image segmentation using compet-
itive learning, IEEE Transactions on Pattern Analysis and Machine
Intelligence 16 (1994) 11971206.
[12] G. Donato, et al., Classifying facial actions, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 10, s.l., IEEE, October
1998, vol. 21, pp. 974989.
[13] V. Kyrki, J.-K. Kamarainen, H. Kalviainen, Simple Gabor feature
space for invariant object recognition, Pattern Recognition Letters, 3,
s.l., Elsevier Verlag, February 2004, vol. 25, pp. 311318.
[14] J. Jones, L. Palmer, An evaluation of the twodimensional Gabor lter
model of simple receptive elds in cat striate cortex, Journal of
Neurophysiology (1987) 12331258.
[15] D. Field, Relations between the statistics of natural images and the
response properties of cortical cells, Journal of the Optical Society of
America A 4 (1978) 23792394.
[16] D. Burr, M. Morrone, D. Spinelli, Evidence for edge and bar
detectors in human vision, Vision Research 29 (1989) 419431.
[17] H.K. Ekenel, B. Sankur, Multiresolution Face Recognition, Image
and Vision Computing, vol. 23, 5, s.l., Elsevier Verlang, 2005, pp.
469477.
[18] I.W. Selesnick, R.G. Baraniuk, N.C Kingsbury, The dual-tree
complex wavelet transform, IEEE Signal Processing Magazine, vol.
22, 6, s.l., IEEE, November 2005, pp. 123151.
[19] N.G. Kingsbury, Image processing with complex wavelets, Philo-
sophical Transactions of the Royal Society (1999).
[20] T. Sim, S. Baker, M. Bsat, The CMU Pose, Illumination, and
Expression (PIE) database, in: Proceedings of Fifth IEEE Interna-
tional Conference on Automatic Face and Gesture Recognition, s.l.,
IEEE, 2002. pp. 4651.
[21] Weber, Markus, Face database collection of Markus Weber, Face
Database Collection of Markus Weber. [Online] 02 February 2006.
http://www.vision.caltech.edu/Image_Datasets/faces/.
[22] P. Viola, M.J. Jones, Robust real-time face detection, International
Journal of Computer Vision 57 (2004) 137154.
[23] R. Yu, H. Ozkaramanli, Hilbert transform pairs of orthogonal
wavelet bases: necessary and sucient conditions, IEEE Transactions
on Signal Processing, vol. 53, 12, s.l., IEEE, December 2005, pp.
4723-4725.
[24] J. Scharcanski, C.R. Jung, R.T. Clarke, Adaptive image denoising
using scale and space consistency, IEEE Transactions on Image
Processing, vol. 11, 9, s.l., IEEE, September 2002, pp. 10921101.
[25] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing,
Prentice Hall, 2002.
[26] A.F. Abdelnour, I.W. Selesnick, Design of 2-band orthogonal near-
symmetric CQF, Proceedings of IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP01) 6 (2001) 3693
3696.

Facial Feature Extraction Using Complex Dual-Tree Wavelet Transform

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Facial Feature Extraction Using Complex Dual-Tree Wavelet Transform

Uploaded by

Copyright:

Available Formats

Facial feature extraction using complex dual-tree wavelet transform

zkaramanl, Hasan Demirel

You might also like