
An Automated Method for Predicting Iris Segmentation Failures

Nathan Kalka, Nick Bartlow, and Bojan Cukic


Abstract: Arguably the most important task in iris recogni-
tion systems involves localization of the iris region of interest,
a process known as iris segmentation. Research has found that
segmentation results are a dominant factor that drives iris
recognition matching performance. This work proposes tech-
niques based on probabilistic intensity features and geometric
features to arrive at scores indicating the success of both pupil
and iris segmentation. The technique is fully automated and
therefore requires no human supervision or manual evaluation.
This work also presents a machine learning approach which
utilizes the pupil and iris scores to arrive at an overall iris
segmentation result prediction. We test the techniques using
two iris segmentation algorithms of varying performance on
two publicly available iris datasets. Our analysis shows that
the approach is capable of arriving at segmentation scores
suitable for predicting both the success and failure of pupil
or iris segmentation. The proposed machine learning approach
achieves an average classification accuracy of 98.45% across
the four combinations of algorithms and datasets tested when
predicting overall segmentation results. Finally, we present one
potential application of the technique specific to iris match
score performance and outline many other potential uses for
the algorithm.
I. INTRODUCTION
The performance of iris recognition systems is driven in
part by application scenario requirements. Standoff distance,
subject cooperation, underlying optics, and illumination are
just a few examples of factors associated with these re-
quirements. These factors subsequently dictate the nature
of images an iris recognition system will deal with. At the
image level, iris segmentation is arguably one of the most
important factors driving recognition performance [1]. That
is, if the iris regions are successfully localized for pairs
of images to be matched, the correct classification will
almost always be made. Iris image segmentation typically
consists of two problems. First, one must define the boundary
between the pupil, the black region in the center of the eye,
and the iris, the textured region surrounding the pupil, as
shown by the inner green ring in Figure 1(a). Second, one
must dene the boundary between the iris and the sclera, or
the lighter region surrounding the iris as shown by the outer
blue ring in Figure 1(a). Many methods exist for detecting
these boundaries.
Manuscript received June 7, 2009. This work was supported in part by
the National Science Foundation (CNS-0325333) and by the affiliates of the
Center for Identification Technology Research.
Nathan Kalka is with the Lane Department of Computer Science and
Electrical Engineering, West Virginia University, Morgantown, WV 26506-
6109 USA, phone: 304-293-4918 (e-mail: nathan.kalka@mail.wvu.edu).
Nick Bartlow is with Booz Allen Hamilton, Herndon, VA 20171-3025 USA,
phone: 703-984-7084 (e-mail: bartlow_nicholas@bah.com). Bojan Cukic is
with the Lane Department of Computer Science and Electrical Engineering,
West Virginia University, Morgantown, WV 26506-6109 USA, phone: 304-
293-9686 (e-mail: bojan.cukic@mail.wvu.edu).
Whether due to limitations of algorithms or poor image
quality, failed segmentation often accounts for misclassifi-
cation errors in iris recognition systems. As a result, the
ability to automatically determine whether the segmentation
block of an iris recognition system has succeeded or failed
is of paramount importance when attempting to predict the
outcome of matching. Despite the obvious utility of such
an automated tool, the authors are unaware of any work
that addresses this issue. Whether a binary success / failure
ag or a measure with higher granularity, currently existing
algorithms do not explicitly evaluate segmentation result. As
(a) Correctly segmented image. (b) Failed pupil segmentation.
(c) Failed iris segmentation. (d) Failed pupil and iris seg-
mentation
Fig. 1. Four types of segmentation results (a) Correctly segmented image
(b) Failed pupil segmentation (c) Failed iris segmentation (d) Failed pupil
and iris segmentation.
a result, without human inspection, the success of segmen-
tation blocks is largely unknown in most iris recognition
systems. Having a tool which provides such information is
useful in an operational sense in that it can serve as an
indicator to reacquire a better image if feasible. Otherwise,
when reacquisition is not an option, such a measure could
serve to flag entrance into a computationally more expensive
automatic segmentation block or, if that fails, to perform
manual segmentation. Related to this idea, many iris quality
algorithms utilize local analysis, which requires at least a
rough iris segmentation [2], [3]. If the segmentation fails, the
quality estimate will be inaccurate.
This paper presents a technique which automatically mea-
sures the success of iris segmentation. Using techniques
based on probabilistic intensity features and geometric fea-
tures we arrive at scores indicating the success of both pupil
and iris segmentation. Besides looking at the correctness
of the pupil and iris segmentation independently, we also
provide the ability to arrive at a global binary segmentation
evaluation result by way of a decision tree based machine
learning approach. We test the accuracy of the approach on
two databases using two different segmentation algorithms.
Additionally, to demonstrate one application of the tool, we
investigate the effect that the varying success of iris segmen-
tation has on iris matching performance. Finally, we compare
our metric's ability to detect erroneous segmentation to that
of Zuo et al. [4].
The remainder of the work is broken down as follows.
Section II provides a summary of related work. Section III
describes the experimental design including the data sets
and segmentation algorithms. Section IV describes in detail
the approach used to automatically measure segmentation
results as well as the decision tree approach to arriving at a
binary global segmentation result. Section V investigates the
experimental results in terms of the segmentation measures
and the ability of these measures to predict recognition
performance. Section VI provides a discussion of consider-
ations and limitations of the technique. Finally, Section VII
concludes the work and discusses potential areas for future
work.
II. RELATED WORK
To the best of our knowledge, the only prior work focusing
on automatic iris segmentation evaluation is that of Zuo et al.
[4]. This work analyzes the gradient along the pupil and iris
boundaries to discern the success or failure of segmentation.
Other research has focused on manual segmentation. In [1],
the authors develop an iris segmentation algorithm specifi-
cally for non-ideal images. Experimentation is done on four
data sets, which have been manually ground truthed. They
define failed segmentation to include boundaries, either pupil
or iris, that do not fall along their respective borders. Bonney
[5] proposed an algorithm for non-orthogonal (off-angle)
iris segmentation. Ground truthing of data was performed
manually to generate a mask of all iris pixels. More recently,
phase I of the Noisy Iris Challenge Evaluation (NICE) has
set out to evaluate the success of iris segmentation algorithms
on images known to contain iris obstructions, reflections, off-
angle views, motion blur, and other factors that make segmentation
more challenging [6]. In this challenge, segmentation results
of the submitted algorithms will be compared to ground truth
segmentation determined by manual inspection.
III. EXPERIMENTAL DESIGN
To test the proposed technique, we selected two publicly
available iris data sets and two iris recognition algorithms (in-
cluding segmentation, encoding, and matching blocks). For
data sets, we chose the WVU non-ideal iris set [7] and the
NIST Iris Challenge Evaluation (ICE) set [8]. At the time of
the experiment, the WVU data contained 2,412 images and
the ICE dataset contained 2,953 images. For each data set, all
images were segmented with two segmentation algorithms.
The first algorithm is a WVU in-house algorithm authored
by Zuo et al. [1]. The second is Masek's publicly available
implementation [9].

TABLE I
SEGMENTATION GROUND TRUTH RESULTS FOR BOTH ALGORITHMS
ACROSS THE WVU AND ICE DATASETS.

                            Segmentation Failures
                     Zuo et al.                   Masek
Segmentation
Category         WVU          ICE          WVU            ICE
Pupil            23 (0.95%)   19 (0.64%)   315 (13.06%)   415 (14.05%)
Iris             31 (1.29%)   23 (0.70%)   697 (28.90%)   193 (6.54%)
Either           37 (1.53%)   24 (0.81%)   812 (33.67%)   560 (18.96%)
Both             17 (0.70%)   18 (0.61%)   200 (8.29%)    48 (1.63%)

We chose to use Masek's algorithm because we found it
produces a greater number of errors than Zuo's in both the
pupil and iris segmentation. It is also worth noting that we
did not optimize the segmentation parameters of either
algorithm to the individual data sets. Additionally, Zuo's
segmentation algorithm outputs segmentation precision
scores, which we compare against in Section V. After having
segmented both data sets with both algorithms, the segmen-
tation results were ground truthed. Figure 1 shows example
segmentations and the caption lists associated ground truth
results. Table I summarizes the segmentation ground truth
results from both algorithms on the two data sets. After
the ground truth results were tabulated, we processed the
segmented images for the four combinations of algorithms
and data sets by scoring the pupil and iris segmentation
results with the technique described in the following section.
After processing the images, we ran the results, including
pupil and iris scoring, through a machine learner to arrive
at a simple binary segmentation result prediction model.
That is, the model predicts that either both the pupil and
iris boundaries were correctly estimated (good segmentation)
or at least one of the boundaries was incorrectly estimated
(failed segmentation). Finally, we look into the effect that
varying iris segmentation results has on iris match scores
by filtering the match score results according to the model
predictions.
IV. APPROACH
Our approach to automatic iris segmentation evaluation
utilizes probabilistic models to estimate the validity of the
pupil segmentation. Distance measures based on the con-
centricity and eccentricity (only used in elliptical based
segmentation models) of the iris and the pupil are used as
an estimate (score) of the validity of the iris segmentation.
Finally, the scores are used as features to build a model
through machine learning to predict segmentation success
or failure. The following sections describe these methods in
detail.
A. Pupil Segmentation Measure
Given a pupil segmentation, we are interested in whether
the pixels, x, that fall within the pupillary boundary are
indeed pupil pixels. By fitting probabilistic models for x, we
can formulate a likelihood ratio test, Λ(x), to decide between
pupil and non-pupil pixels. In other words,

$$\Lambda(x) = \frac{P(x \mid H_1)}{P(x \mid H_0)}, \qquad (1)$$

where H_1: x corresponds to a pupil pixel, and H_0: x
corresponds to a non-pupil pixel.

To do so, we need to choose models for P(x|H_1) and
P(x|H_0). We assume that the pupil area of an iris image
is a relatively flat, homogeneous region of dark intensities
(with respect to the iris), with discontinuities arising only
in the presence of eyelashes/eyelids or specular reflection.
Fig. 2. (1) An iris image and its overlaid segmentation boundaries. (2) Mask
out all regions outside of the segmentation. (3) Compute the intensity
histogram of the unmasked region and estimate parameters for each model;
for this specific image, Bin(P_t) = 24, Bin(I_t) = 123, θ = 4.75, μ = 73.93,
and σ = 19.39. (4) Every pixel within the estimated pupil segmentation is
classified as belonging to the pupil or not; red pixels were classified as
non-pupil pixels, and the blue circle corresponds to the estimated pupil
segmentation boundary from Masek's algorithm.

Additionally, we observed from spatial intensity histograms
of correctly segmented pupil regions, under ideal lighting
conditions, that most of the intensity frequencies fall close
to zero with a small step-ladder effect trailing off towards
higher intensities as a result of specular reflection / eyelashes.
On the other hand, under ideal lighting conditions, we
assume the iris area is much more heterogeneous than
the pupil area, particularly around the collarette region,
while the iris region closer to the sclera is much flatter.
The spatial histograms of correctly segmented iris regions
(without occlusion masks) show that the frequency spread is
much wider than that of the pupil and curiously bell-shaped.
Given these observations and based on empirical evaluation,
we fit a Gamma distribution, P(x|H_1) ~ Γ(k, θ), which
is characterized by two parameters, shape k and scale θ. We
employ a Gaussian distribution for P(x|H_0) ~ N(μ, σ²).
The last point of concern is estimating the parameters of each
model, which are computed online (we assume the shape
parameter k = 1 for all experiments, so only the scale
parameter θ is estimated for the Gamma model). That is,
for each image, a new set of parameters is estimated for
each model. This is accomplished as follows. First, the image
region of interest (ROI) is localized (i.e., everything outside
the segmentation result is removed). The spatial histogram
of the image intensities for the ROI is computed. Once the
histogram has been computed, the scale parameter is estimated as:

$$\hat{\theta} = \sum_{i=0}^{Bin(P_t)} x_i w_i, \qquad (2)$$
where P_t is a threshold used to constrain the size of the
pupil region, x_i is a gray level bin from the histogram of
the ROI, and w_i is the weight associated with bin x_i, where
the weights sum to one (the normalizing term 1/k is omitted
since we assume k = 1). In other words, θ̂ is derived by
summing the product of the gray level bins and the associated
weights until the bin corresponding to P_t is reached. Similarly, the
parameters for the Gaussian are estimated as:
$$\hat{\mu} = \sum_{i=Bin(P_t)+1}^{Bin(I_t)} x_i w_i, \qquad \hat{\sigma}^2 = \sum_{i=Bin(P_t)+1}^{Bin(I_t)} w_i (x_i - \hat{\mu})^2, \qquad (3)$$
where I_t is a threshold used to constrain the size of the
iris region, x_i is a gray level bin from the histogram of
the ROI, and w_i is the weight associated with bin x_i, where
the weights sum to one. The thresholds P_t and I_t were
determined through experimental evaluation (all results are
presented with P_t = 10000 and I_t = 25000). Finally, every
pixel within the pupil boundary is assigned 0 or 1 based on
equation (1), and the ratio of these values is used as the pupil
over-segmentation score. Fig. 2 is a block diagram of this process.
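To make the scoring concrete, the following Python sketch implements equations (1)-(3) under the stated assumptions (k = 1, 8-bit grayscale images). Our reading of Bin(P_t) and Bin(I_t) as the first histogram bins at which the cumulative ROI pixel count reaches P_t and I_t is an interpretation, and all function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.stats import gamma, norm

def over_segmentation_score(img, iris_mask, pupil_mask, p_t=10000, i_t=25000):
    """Sketch of the pupil over-segmentation score P_over (eqs. 1-3).

    img: 8-bit grayscale iris image; iris_mask / pupil_mask: boolean arrays
    marking pixels inside the estimated iris and pupil boundaries.
    """
    roi = img[iris_mask].ravel()
    counts, _ = np.histogram(roi, bins=256, range=(0, 256))
    weights = counts / counts.sum()          # w_i, summing to one

    # Bin(P_t) / Bin(I_t): first bins where the cumulative pixel count
    # reaches the size thresholds P_t and I_t (our interpretation).
    cum = np.cumsum(counts)
    bin_pt = int(np.searchsorted(cum, p_t))
    bin_it = int(np.searchsorted(cum, i_t))

    bins = np.arange(256)
    # Eq. (2): Gamma scale estimate, with shape k = 1 as in the paper.
    theta = float(np.sum(bins[: bin_pt + 1] * weights[: bin_pt + 1]))
    # Eq. (3): Gaussian mean and variance over the iris bins.
    sl = slice(bin_pt + 1, bin_it + 1)
    mu = float(np.sum(bins[sl] * weights[sl]))
    var = float(np.sum(weights[sl] * (bins[sl] - mu) ** 2))

    # Eq. (1): likelihood ratio test on every pixel inside the pupil boundary.
    x = img[pupil_mask].astype(float)
    lam = gamma.pdf(x, a=1.0, scale=max(theta, 1e-6)) / \
          norm.pdf(x, loc=mu, scale=max(np.sqrt(var), 1e-6))
    return float(np.mean(lam > 1.0))         # fraction classified as pupil
```

A score near 1 indicates that almost all pixels inside the estimated pupil boundary are consistent with the pupil model, while a low score flags over-segmentation.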
The described metric is designed to measure pupil over-
segmentation, P_over, that is, when the estimated pupil
boundary is larger than the actual pupil boundary. Pupil
under-segmentation, when the estimated pupil boundary is
smaller than the actual boundary, would remain undetected.
To accommodate under-segmentation, we employ an iterative
approach that increases the estimated pupil radius (or, in
the case of an ellipse, the estimated major and minor axes)
and determines whether the pixels inside the expanded
pupil radius are pupil pixels using equation (1). This
process continues until the pupil radius has reached the size
of the iris radius or the ratio of pupil to non-pupil pixels
falls below 20%, a threshold chosen based on experimental
evaluation. The rationale behind this threshold is to prevent
the influence of heterogeneous factors such as dark
eyelashes/eyelids and to reduce unnecessary computation
when the estimated pupil boundary is not under-segmented.
The final under-segmentation score is calculated as:

$$P_{under} = \frac{P_{over}}{P_{over} + P_{est\ under}}. \qquad (4)$$
The over-segmentation score is utilized here because the
pupil boundary may contain non-pupil pixels, whereas P_over
is an estimate of just the pupil pixels within the pupil
boundary. P_est_under is the total number of estimated pupil
pixels over all iterations (displayed in green in Fig. 3(b)).
Figure 3 illustrates this process. Figure 3(a) shows a failed
pupil and iris segmentation: the blue ellipse is the estimated
pupil boundary, while the red ellipse is the estimated iris
boundary. Our pupil over-segmentation measure produces a
score of P_over = 0.74, indicating that there is no over-
segmentation. Figure 3(b) shows the masked region overlaid
with the estimated pupil pixels (green) and the increasing
elliptical bands (red). Our pupil under-segmentation score
for this image is P_under = 0.44.

Fig. 3. Pupil under-segmentation estimation: (a) under-segmented pupil,
(b) estimated under-segmentation.
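A minimal sketch of the iterative expansion is given below. We read P_over in equation (4) as the count of estimated pupil pixels inside the original boundary; the per-iteration band step and all names are assumptions made for illustration. The is_pupil_pixel callable stands in for the likelihood ratio test of equation (1), e.g., built from the models fitted above.

```python
import numpy as np

def under_segmentation_score(img, pupil_ellipse, iris_radius, is_pupil_pixel):
    """Sketch of the pupil under-segmentation estimate P_under (eq. 4).

    pupil_ellipse: (cx, cy, a, b) center and semi-axes of the estimated
    pupil boundary; is_pupil_pixel: callable applying the likelihood ratio
    test of eq. (1) to an array of intensities.
    """
    cx, cy, a, b = pupil_ellipse
    ys, xs = np.indices(img.shape)
    inside = ((xs - cx) / a) ** 2 + ((ys - cy) / b) ** 2 <= 1.0
    p_over = int(np.count_nonzero(is_pupil_pixel(img[inside])))

    p_est_under, step = 0, 2                  # band step in pixels (assumed)
    while max(a, b) < iris_radius:
        a, b = a + step, b + step             # expand the estimated ellipse
        grown = ((xs - cx) / a) ** 2 + ((ys - cy) / b) ** 2 <= 1.0
        band = img[grown & ~inside]           # pixels in the new band only
        labels = is_pupil_pixel(band)
        if band.size == 0 or labels.mean() < 0.20:
            break                             # 20% stopping rule from the text
        p_est_under += int(np.sum(labels))
        inside = grown

    return p_over / max(p_over + p_est_under, 1)   # eq. (4)
```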
B. Geometric Iris Measure
Often, the pupil and iris boundaries are not concentric,
but the distance between their centers is typically
small [10] (with the exception of extreme off-angle images).
We observed that when iris and/or pupil segmentation fails,
the distance between the pupil and iris centers increases. We
noticed a similar phenomenon for failures with an elliptical
segmentation model, in addition to an increase in
eccentricity. Based on these observations, we make use of
a measure for iris evaluation based on the eccentricity
and concentricity of the pupil and iris boundaries. We utilize
the following expressions as measures for circular (5) and
elliptical (6) models:
$$I_C = \sqrt{(p_x - i_x)^2 + (p_y - i_y)^2}, \qquad (5)$$

$$I_E = \sqrt{(p_x - i_x)^2 + (p_y - i_y)^2} + \arccos\!\left(\frac{b_i}{a_i}\right) \cdot 100 + \arccos\!\left(\frac{b_p}{a_p}\right) \cdot 100, \qquad (6)$$

where (p_x, p_y) are the pupil center coordinates, (i_x, i_y) are
the iris center coordinates, b_i and a_i are the semi-minor and
semi-major axes of the iris ellipse, and b_p and a_p are the
semi-minor and semi-major axes of the pupil ellipse.
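The two measures translate directly into code. The sketch below is a literal transcription of equations (5) and (6); argument names are illustrative. Note that arccos(b/a) is zero for a perfect circle and grows with eccentricity, so both penalty terms vanish for concentric, circular boundaries.

```python
import math

def iris_measure_circular(pupil_center, iris_center):
    """Eq. (5): distance between the pupil and iris centers."""
    (px, py), (ix, iy) = pupil_center, iris_center
    return math.hypot(px - ix, py - iy)

def iris_measure_elliptical(pupil_center, iris_center, iris_axes, pupil_axes):
    """Eq. (6): concentricity plus eccentricity penalties.

    iris_axes and pupil_axes are (semi-major a, semi-minor b) pairs.
    """
    a_i, b_i = iris_axes
    a_p, b_p = pupil_axes
    return (iris_measure_circular(pupil_center, iris_center)
            + math.acos(b_i / a_i) * 100
            + math.acos(b_p / a_p) * 100)
```

In the experiments that follow, these scores are min-max normalized to [0, 1] before being compared across images.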
V. EXPERIMENTAL RESULTS
This section is broken down into four subsections. First,
we establish the ability of the described scoring approach
to accurately predict segmentation results for both pupil and
iris segmentation. Next, we present the ability of the machine
learning NBTree model to predict overall segmentation re-
sults. Then, to demonstrate one potential application of the
technique, we present the effect of filtering iris recognition
match scores by the predicted segmentation results. Finally,
we compare our approach to the precision metric described
by Zuo et al.
A. Pupil and Iris Scoring Results
The primary goal of this paper is to arrive at an automatic
technique for measuring segmentation results of both pupil
and iris segmentation. With that in mind, we tested the scor-
ing techniques on two datasets for two different segmentation
algorithms.
1) Masek Segmentation: First we look at the pupil and
iris scoring with Masek's algorithm on the WVU dataset.
Figure 4 shows the distributions of scores for images that
have both good segmentation and failed segmentation. Figure
4 (a) depicts the pupil over-segmentation scoring results
with the black curve representing the distribution for images
that failed segmentation and the green curve corresponding
to images that were correctly segmented. As can be seen
by the plot, the distributions are separated fairly well with
the correctly segmented images having a mean score of
0.91 and the images that failed pupil segmentation having
a mean score of 0.28. Figure 4 (b) illustrates the pupil
under-segmentation scores. The distributions for both failed
and correct segmentation are much closer than that of their
over-segmentation counterparts, having a mean of 0.77 and
0.91, respectively.

Fig. 4. Score distributions for Masek segmentation on WVU data: (a) pupil-
over distributions, (b) pupil-under distributions, (c) iris distributions,
(d) pupil / iris ROC curve.

A similar trend can be seen with the iris
scores (min-max normalized between 0 and 1) as shown
in plot 4(c). Here the scores corresponding to images with
correctly segmented iris regions have a mean of 0.94 while
the incorrect images scored 0.72 on average. We see, however,
that there is a wider spread in the scores for the failed
iris segmentations vs. the failed pupil segmentations, with
the standard deviations falling at 0.18 and 0.14, respectively.
Plot 4(d) displays GAR/FAR ROC curves for both the pupil
and iris segmentation scoring. Here we define a genuine
accept to be a correctly segmented image classified as
correctly segmented, and a false accept to be an incorrectly
segmented image classified as correctly segmented. Based
on the distributions seen in plots (a), (b), and (c), it is
not surprising to see that the pupil over-segmentation ROC
indicates almost perfect performance, while the pupil under-
segmentation and iris curves show the existence of errors,
with EERs of approximately 34% and 11%, respectively.
In Figure 5 we turn to the ICE data. Plot (a) shows
the distributions for the pupil over-segmentation scores,
which appear very similar to the pupil over-segmentation
scores of the WVU data. Here the mean scores for correct and
failed segmentation are 0.91 and 0.37, respectively. On the
other hand, we see a preponderance of overlapping scores
in the pupil under-segmentation and iris scores (min-max
normalized between 0 and 1) shown in plots (b) and (c).
Here the mean under-segmentation score for images that
failed segmentation is approximately the same, 0.78, as with
the WVU scores, while the mean for correct segmentations,
0.86, decreased. The mean iris score for images that failed
segmentation drops to 0.62, down from 0.72 in the WVU
images. As a result, we see that the performance of the iris
segmentation classification increases in plot (d). While the
EER remains fairly constant, a GAR of 80% can be achieved
at an FAR of 2.6%, whereas in the WVU data a GAR of
80% can only be reached at an FAR over 7%.
Fig. 5. Score distributions for Masek segmentation on ICE data: (a) pupil-
over distributions, (b) pupil-under distributions, (c) iris distributions,
(d) pupil / iris ROC curve.
2) Zuo et al. Segmentation: As noted in the experimental
design, Zuo's algorithm makes far fewer errors on both the
WVU and ICE datasets. As a result, the distributions for
the failed pupil and iris scores are not suitable for graphical
representation. However, Table II characterizes the data in
terms of mean and standard deviation. Here the mean pupil
scores for under-segmentation and over-segmentation on the
WVU data fall at 0.95 and 0.85, which are slightly better than
those from Masek's algorithm.

TABLE II
PUPIL/IRIS SCORE DISTRIBUTION STATISTICS FOR ZUO'S SEGMENTATION.

                            WVU                    ICE
Segmentation Category   Mean      Std         Mean      Std
Failed Pupil Over       0.66412   0.20991     0.3462    0.2934
Correct Pupil Over      0.85326   0.055811    0.86312   0.066401
Failed Pupil Under      0.6007    0.30259     0.86535   0.20652
Correct Pupil Under     0.95026   0.10379     0.97827   0.06806
Failed Iris             0.62306   0.22929     0.58317   0.32226
Correct Iris            0.89204   0.066236    0.87858   0.047397

The iris scores for correctly
segmented images fall at 0.89, which is more consistent
with the results from Masek's segmentation. As a result,
although noticeably of a more stepwise shape (due to a few
incorrect segmentations), the performance of the iris scoring
as a prediction mechanism is similar to what we see in the
Masek/WVU results. In Figure 6(a) we do, however, notice
that the pupil over-segmentation ROC curve drops below
the iris ROC curve, a phenomenon noticed only in this
combination of algorithm and data set. We provide a short
explanation for this phenomenon in the discussion section.
Looking into the pupil and iris scoring on the ICE data
with Zuo's segmentation algorithm, we see similar trends
as in Figures 4 and 5. Here the mean scores for correctly
segmented images fall at 0.86, 0.98, and 0.88 for pupil
over-segmentation, pupil under-segmentation, and iris
segmentation, respectively.
B. Predicting the Overall Segmentation Result
While the previous results are useful for measuring and
subsequently predicting pupil and iris segmentation
outcomes independently, we have yet to explore the
notion of predicting overall segmentation success. That
is, given the score(s) for the pupil segmentation and the score
for the iris segmentation, can we predict whether both bound-
aries have been successfully segmented, as in Figure 1(a)?
Conversely, can we predict failed overall segmentation when
the pupil, iris, or both boundaries are incorrectly estimated?
As mentioned in the experimental design section, we chose
to use the NBTree (Naive Bayes tree) approach described
in [11]. Due to space limitations and the observation that
other decision tree approaches performed similarly well,
we omit any discussion of this specific machine learning
technique. What is of interest to us is the performance of
the two-class decision problem. Specifically, given the pupil
and iris score(s), how well does the NBTree model (or any
other model of interest) predict the overall segmentation
result? Table III shows the confusion matrices for the two
segmentation algorithms across both data sets utilizing just
the pupil over-segmentation measure. It should be noted
that the default parameters were used for the WEKA im-
plementation of the NBTree algorithm. Additionally, 10 x
10 cross-validation was used to train / test the models. We
see that with pupil over-segmentation alone the model
accurately predicts segmentation performance, achieving an
overall correct classification accuracy of 90.52%, 94.82%,
98.88%, and 99.53% for Masek - WVU, Masek - ICE, Zuo
- WVU, and Zuo - ICE, respectively.

Fig. 6. Pupil / iris ROC curves for Zuo et al. segmentation: (a) WVU,
(b) ICE.

Table IV illustrates
classification performance when we train the NBTree model
with pupil over-segmentation and iris segmentation scores.
We notice a modest increase in performance, with overall
correct classification accuracies of 94.71%, 99.19%, 99.54%,
and 99.93% for Masek - WVU, Masek - ICE, Zuo - WVU,
and Zuo - ICE, respectively. Finally, Table V provides clas-
sification performance when the model is trained with the
minimum of the pupil over/under-segmentation scores and
the iris segmentation score. More specifically, we take
the smaller of the two pupil measures in combination with
the iris measure. We tested combining all three features
and noticed a much more drastic see-saw effect across
the combinations of segmentation algorithms and datasets as
opposed to using just the minimum of the pupil measures.
As with the previous tables, we see a slight increase in
performance for Masek - WVU (95.38%) and Zuo - WVU
(99.59%) in terms of overall correct classification accuracy.
On the other hand, there is no change for Masek - ICE
(99.19%) and a slight decrease in performance for Zuo - ICE.
In the case of Zuo - ICE, an additional instance of a failed
segmentation was classified as correct.
TABLE III
PUPIL OVER-SEGMENTATION: NBTREE MISCLASSIFICATIONS. A G =
ACTUAL GOOD SEGMENTATION, A F = ACTUAL FAILED SEGMENTATION,
P G = PREDICTED GOOD SEGMENTATION, P F = PREDICTED FAILED
SEGMENTATION.

(a) Masek - WVU               (b) Masek - ICE
        P G       P F                 P G       P F
A G     98.81%    0.19%       A G     99.96%    0.04%
A F     27.46%    72.54%      A F     27.14%    72.86%

(c) Zuo et al. - WVU          (d) Zuo et al. - ICE
        P G       P F                 P G       P F
A G     99.92%    0.08%       A G     100%      0.00%
A F     67.57%    32.43%      A F     50.00%    50.00%
TABLE IV
PUPIL OVER-SEGMENTATION + IRIS MEASURE: NBTREE
MISCLASSIFICATIONS.

(a) Masek - WVU               (b) Masek - ICE
        P G       P F                 P G       P F
A G     98.85%    1.15%       A G     99.71%    0.29%
A F     13.30%    86.70%      A F     3.04%     96.96%

(c) Zuo et al. - WVU          (d) Zuo et al. - ICE
        P G       P F                 P G       P F
A G     99.96%    0.04%       A G     100%      0.00%
A F     27.03%    72.97%      A F     8.33%     91.67%
TABLE V
PUPIL min{OVER, UNDER} SEGMENTATION + IRIS MEASURE: NBTREE
MISCLASSIFICATIONS.

(a) Masek - WVU               (b) Masek - ICE
        P G       P F                 P G       P F
A G     98.85%    2.22%       A G     99.71%    0.29%
A F     11.33%    88.67%      A F     3.04%     96.96%

(c) Zuo et al. - WVU          (d) Zuo et al. - ICE
        P G       P F                 P G       P F
A G     99.83%    0.17%       A G     100%      0.00%
A F     16.22%    83.78%      A F     12.50%    87.50%
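For readers who want to reproduce this evaluation protocol, the sketch below shows the feature construction and repeated cross-validation. scikit-learn has no NBTree implementation, so a plain decision tree stands in for the WEKA NBTree used here; as noted above, other decision tree approaches performed similarly well. All names are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def overall_prediction_accuracy(p_over, p_under, iris_score, y, runs=10):
    """Average accuracy over repeated 10-fold cross-validation.

    p_over, p_under, iris_score: per-image segmentation scores;
    y: ground-truth overall result (1 = both boundaries correct,
    0 = at least one boundary failed).
    """
    # Feature set of Table V: min of the two pupil measures plus iris score.
    X = np.column_stack([np.minimum(p_over, p_under), iris_score])
    accs = []
    for seed in range(runs):                 # 10 x 10 cross-validation
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        clf = DecisionTreeClassifier(random_state=seed)
        accs.append(cross_val_score(clf, X, y, cv=cv).mean())
    return float(np.mean(accs))
```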
C. Filtering Iris Match Scores Using Segmentation Prediction
The last result we present demonstrates one potential
application of the technique. We use the overall segmentation
result prediction to selectively filter match scores from the
two different data sets. Due to the small number of incor-
rectly segmented images in both the WVU and ICE datasets,
we do not present results for Zuo's segmentation algorithm.
Instead we focus on Masek's segmentation,
encoding, and matching algorithms. As mentioned in the in-
troduction, iris segmentation is a main factor in determining
an iris recognition system's ability to successfully classify
pairs of iris images as genuine or impostor. Along those lines,
we would expect to see performance drop as the number of
incorrectly segmented iris images increases. Figure 7 shows
match score ROC curves based on segmentation results
across both the WVU and ICE datasets.
Fig. 7. Iris recognition performance ROCs based on segmentation result
predictions (Masek): (a) WVU, (b) ICE.
In the figure, a total of five curves are shown. The blue curves
show the matching performance when all match scores are
included and serve as a baseline. The highest performing
solid (green) curves represent the matching performance
from scores corresponding to pairs where both images were
predicted by the algorithm to have successfully segmented
irises. The highest performing dotted (green) curves
represent the match scores of image pairs where the iris
images were segmented correctly (ground truth). Conversely,
the lowest performing solid (red) curves represent the match-
ing performance for pairs of images that were predicted to
fail segmentation. Finally, the dotted red curves show the
matching performance for pairs of images that failed seg-
mentation (ground truth). These two graphs are useful in that
they confirm two concepts. First, they confirm the previously
accepted premise that matching performance is significantly
affected by segmentation results. Perhaps the more useful
conclusion is that the proposed technique accurately predicts
overall segmentation results, as the matching performance of
the filtered results corresponding to the predicted data closely
resembles the matching performance of the filtered results for
the ground truth data.
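The filtering step itself is straightforward; a minimal sketch follows, assuming per-image overall predictions are available for both images of each match pair (array and function names are illustrative):

```python
import numpy as np

def split_scores_by_prediction(scores, pred_good_a, pred_good_b):
    """Partition match scores by the segmentation prediction of each pair.

    scores: match scores, one per image pair; pred_good_a / pred_good_b:
    boolean arrays, True where the corresponding image of the pair was
    predicted to be correctly segmented.
    """
    both_good = pred_good_a & pred_good_b
    return scores[both_good], scores[~both_good]  # predicted good vs. failed

# Usage: plot separate ROC curves for the two partitions and compare them
# against the unfiltered baseline, as in Fig. 7.
```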
D. Segmentation Prediction Comparison
Recall that Zuo et al.'s [4] segmentation precision
metric measures the gradient of the pupil and iris boundaries
on the normalized iris region. A static threshold is then
applied to determine whether segmentation failed. That is,
if 1 - ε_p > 0.8 and 1 - ε_i > 0.7, the segmentation
is said to be correct; otherwise the segmentation is said to
be incorrect. Here, ε_p and ε_i are the precision scores for
the pupil and iris boundaries, respectively. When utilizing
these thresholds we obtain a correct segmentation classifica-
tion accuracy of 0.95 when evaluating the WVU non-ideal
dataset. For the ICE data we obtain an accuracy of 0.93.
Both scores are comparable to the results in [4]. Although
these numbers indicate the utility of the metric in classifying
segmentation performance, our proposed approach utilizing
the NBTree model provides better accuracy. This may be
because a simple threshold is not sufficient, suggesting the
need for a more complex decision boundary. To test this we decided
TABLE VI
ZUO ET AL.'S SEGMENTATION PRECISION: NBTREE MISCLASSIFICATIONS.

(a) WVU                       (b) ICE
        P G       P F                 P G       P F
A G     99.83%    0.17%       A G     99.86%    0.14%
A F     43.24%    56.75%      A F     25.00%    75.00%
to utilize the Zuo et al. precision scores with our NBTree
model. Table VI provides the classification accuracies for
this experiment. We see a substantial increase in performance
across both datasets: the overall classification accuracies are
99.17% and 99.66% for WVU non-ideal and ICE, respectively.
Both the proposed approach and Zuo et al.'s precision metric
reliably predict segmentation performance under the NBTree
model, again suggesting that a simple threshold decision
boundary is not enough.
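For reference, the static decision rule just described reduces to a two-line predicate; a sketch using our reconstructed symbol names eps_p and eps_i for the pupil and iris precision scores:

```python
def zuo_threshold_decision(eps_p: float, eps_i: float) -> bool:
    """Static threshold rule attributed to Zuo et al. [4]: segmentation is
    called correct iff 1 - eps_p > 0.8 and 1 - eps_i > 0.7."""
    return (1 - eps_p > 0.8) and (1 - eps_i > 0.7)
```

Feeding the same two precision scores to the NBTree model instead of this fixed rule is what yields the accuracy gains reported in Table VI.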
VI. DISCUSSION
When evaluating the experimental results, readers need
to be aware of the underpinnings of this study. One con-
sideration of interest relates to the pupil scoring for Zuo's
segmentation algorithm. As seen in plot (a) of Figure 6, the
pupil segmentation scoring performs poorly compared to the
iris segmentation scoring. This is due mainly to the fact that
the occlusion masks were not considered when calculating
pupil segmentation scores (i.e., Zuo's algorithm correctly
segments occluded pupils while Masek's does not). We made
this choice because we wanted to be consistent in the
application of segmentation masks across both algorithms.
Had we used the masks with Zuo's algorithm, the scoring
would have performed better. However, because the masks
are often inaccurate for Masek's segmentation with the
chosen parameters on the two datasets, we would have seen a
large drop in performance on that end. Therefore, we decided
not to include the masks in the score generation process.
Related to the NBTree approach to overall segmentation
prediction, there are a number of other ways to arrive at an
overall iris segmentation prediction. In particular, informa-
tion fusion approaches may be applicable. We investigated a
number of such approaches, including the simple sum, min
score, and weighted sum rules, to fuse the pupil and iris
scores and arrive at a single overall segmentation score. We
observed that the fused results performed significantly better
for both data sets when Zuo's segmentation was used, but
no fusion rule allowed for such a performance improvement
with Masek's segmentation algorithm. After further analysis,
we determined this was because, while the pupil and iris
scores were uncorrelated for Zuo's segmentation, the scores
were negatively correlated for Masek's segmentation.
Therefore, the ability to perform such fusion appears
dependent on the characteristics of the chosen segmentation
algorithm.
VII. CONCLUSIONS AND FUTURE WORK
We presented an approach to automatically measure the
results of iris segmentation algorithms. Scores are provided
for the two boundaries relevant to the task: pupil segmen-
tation and iris segmentation. We evaluated the approach
using two algorithms across two publicly available data sets.
The results indicate the approach is capable of arriving at
segmentation scores suitable for predicting both the success
and failure of pupil or iris segmentation. Additionally, we
presented a machine learning approach to arrive at an overall
segmentation result prediction, which achieves an average
classification accuracy of 98.45% across the four combina-
tions of algorithms and datasets tested. Finally, we presented
one application of the proposed technique, where the overall
iris segmentation prediction is used to filter iris recognition
matching scores into correctly segmented and incorrectly
segmented scoring bins. Here we confirmed that iris match
scores hailing from image pairs that were predicted to have
good segmentation perform more accurately than pairs that
were predicted to have failed segmentation. Beyond this
application, the technique should prove useful in many other
arenas, such as iris quality metrics involving local analysis,
and as a means to signal the need for more intensive
segmentation processing or image reacquisition.
There are a number of outstanding issues that will spawn
future work. In particular, instead of utilizing static size-
constraining thresholds for the pupil (P_t) and iris (I_t), as
seen in step 3 of Figure 2, we could search for local minima
in the histogram to derive the pupil and iris thresholds
dynamically. Additionally, there are very specific cases of
incorrect segmentation that may result in highly concentric
pupil / iris boundaries that still fail segmentation. These cases
would result in inaccurate iris segmentation scores, and
radial analysis of the annular iris width could prove to be a
promising technique to arrive at an iris segmentation score,
or at least to complement the current measure in dealing with
this issue. Naturally, testing the algorithm on other data
sets and additional segmentation algorithms would shed more
light on the degree to which the technique generalizes.
REFERENCES
[1] J. Zuo, N.D. Kalka, and N.A. Schmid, "A robust iris segmentation
procedure for unconstrained subject presentation," Biometric Consortium
Conference, 2006 Biometrics Symposium: Special Session on Research,
pp. 1-6, Sept. 2006.
[2] N.D. Kalka, J. Zuo, N.A. Schmid, and B. Cukic, "Image quality
assessment for iris biometric," in Proc. SPIE, vol. 6202, p. 62020D, 2006.
[3] Y. Chen, S.C. Dass, and A.K. Jain, "Localized iris image quality using
2-D wavelets," in ICB, 2006, pp. 373-381.
[4] J. Zuo and N.A. Schmid, "An automatic algorithm for evaluating
the precision of iris segmentation," in Biometrics: Theory, Applications
and Systems (BTAS), Sept. 29-Oct. 1, 2008, pp. 1-6.
[5] B. Bonney, "Non-orthogonal iris recognition," U.S.N.A. Trident
Scholar project report, vol. 331, 2005.
[6] H. Proença and L.A. Alexandre, "Noisy iris challenge evaluation -
part 1," http://nice1.di.ubi.pt/, 2008.
[7] S. Crihalmeanu, "Database design for biomdata: Oracle database
technical documentation," West Virginia University, Lane Department
of CSEE, 2006.
[8] P.J. Phillips, W.T. Scruggs, A.J. O'Toole, P.J. Flynn, K.W. Bowyer,
C.L. Schott, and M. Sharpe, "FRVT 2006 and ICE 2006 large-scale
results," NISTIR 7408, 2007.
[9] L. Masek, "Recognition of human iris patterns for biometric identi-
fication," Bachelor's thesis, The University of Western Australia, 2003.
[10] J. Daugman, "How iris recognition works," IEEE Trans. Circuits Syst.
Video Technol., vol. 14, no. 1, pp. 21-30, 2004.
[11] R. Kohavi, "Scaling up the accuracy of naive-Bayes classifiers:
a decision tree hybrid," Proceedings of the Second International
Conference on Knowledge Discovery and Data Mining, pp. 202-207,
1996.
