Abstract
Quality evaluation of image segmentation algorithms is still a subject of
debate and research. Currently, there is no generic metric that can be
applied reliably to any algorithm. This article evaluates the PSNR (Peak
Signal-to-Noise Ratio) as a metric, which has been used to evaluate
threshold level selection as well as the number of thresholds in the
case of multi-level segmentation. The results obtained in this study suggest
that the PSNR is not an adequate quality measurement for segmentation
algorithms.
Keywords: Segmentation, threshold, PSNR
1. Introduction
In image processing, segmentation is a set of techniques that separate
regions of a scene based on similarity. There are several techniques available for this process [10, 4]. Segmentation is usually based on attributes such
as color, brightness contrast or continuity of pixel regions. In the particular
case of threshold-based techniques, one or more threshold values are determined. Pixels of similar brightness levels are then grouped as below or above
such threshold levels [6].
Fig. 1 shows an example of a scene containing a simple foreground and a
background. Fig. 2 shows its corresponding 256 gray level histogram with
an obtained threshold level t at 118. The resulting image of a threshold-based
segmentation algorithm is shown in Fig. 3, where pixels below t are set
to (0). Conversely, pixels of brightness level above t are set to (255). In this
case, pixels labeled as (0) and (255) can be treated as the background and
foreground, respectively.
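The single-threshold labeling described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `threshold_segment`, the tiny example image, and the choice to label pixels exactly equal to t as foreground are assumptions, since the text does not say how that boundary case is handled.

```python
def threshold_segment(image, t, low=0, high=255):
    """Binarize a grayscale image given as a list of rows of 0..255 integers.

    Pixels below t become `low`; all other pixels become `high`.
    Labeling pixels equal to t as `high` is an assumption of this sketch.
    """
    return [[low if pixel < t else high for pixel in row] for row in image]


# Hypothetical 3x4 image: darker background on the left, brighter object on the right
image = [
    [30, 40, 200, 210],
    [35, 117, 118, 220],
    [20, 50, 119, 230],
]
mask = threshold_segment(image, t=118)
for row in mask:
    print(row)
```

With t = 118, the two left columns are labeled (0) and the two right columns are labeled (255).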
Such techniques are often used as a pre-processing step in high-level computer vision systems, as they reduce the amount of irrelevant information
by grouping similar pixels into the same region. The objective of
threshold algorithms is to detect the threshold level that most accurately
separates an image into regions of interest. The main problem is that the quality
Figure 4: Example of an image from the database (a) and its respective ground truth (b)
Figure 5: Automatically filled ground truth image (a) and obtained binary mask (b)
Fig. 6 shows an example of a binary mask B (a) and its corresponding bad
segmentation B' (b).
When used as an analytic method, the PSNR is computed between the resulting
image and the original. Therefore, the PSNR must be calculated between
each original image I and both the corresponding segmentation mask B and
the bad segmentation mask B'.
For each image in the database, the PSNR is calculated between B and I
and between B' and I, and the results are stored for
later analysis.
Figure 6: Binary mask B (a) and bad segmentation mask B' after salt and pepper noise (b)
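The procedure above (PSNR between I and B, and between I and a noise-corrupted B') can be sketched as follows. This is a rough illustration under stated assumptions: it uses the standard definition PSNR = 10 log10(MAX^2 / MSE) with MAX = 255, and the helper names `psnr` and `salt_and_pepper`, the noise amount of 0.2, and the synthetic 8x8 image are all hypothetical, not taken from the paper.

```python
import math
import random


def psnr(reference, test, max_value=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (r - s) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, s in zip(ref_row, test_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def salt_and_pepper(mask, amount, rng):
    """Copy of mask where each pixel is forced to 0 or 255 with probability `amount`."""
    return [
        [rng.choice((0, 255)) if rng.random() < amount else pixel for pixel in row]
        for row in mask
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical 8x8 image I: dark left half, bright right half
image = [[40] * 4 + [200] * 4 for _ in range(8)]
mask = [[0 if pixel < 118 else 255 for pixel in row] for row in image]  # B
bad_mask = salt_and_pepper(mask, amount=0.2, rng=rng)                   # B'

print("PSNR(I, B)  =", round(psnr(image, mask), 2))
print("PSNR(I, B') =", round(psnr(image, bad_mask), 2))
```

Repeating this for every image in a database would yield the two sets of PSNR values that are compared in the following sections.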
4.1. Proof
Let P be the set of PSNR results calculated between each binary mask B
and its corresponding image I. Let P' be the set of PSNR results calculated
between each bad segmentation mask B' and its corresponding source image
I. If the PSNR is an adequate analytic method, the average of the PSNR
values in P should be significantly higher than the average of those in P'. For this
paper, this condition is adopted as our main hypothesis.
Figure 7: Probability density for the set P of PSNR results for good segmentation masks
Figure 8: Probability density for the set P' of PSNR results for bad segmentation masks
F                      0.4618
df (numerator)         299
df (denominator)       299
p value                4.265 × 10^-11
Confidence interval    0.3679506 to 0.5795227
Ratio of variances     0.4617745
and P' are homogeneous and Student's t-test cannot be used reliably.
Welch's t-test is then used to determine whether the difference between P and
P' is statistically significant.
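The F statistic reported above equals the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. A minimal sketch of that computation is given below; the function name `variance_ratio` and the sample values are hypothetical and are not the paper's data. The p value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
import statistics


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) for two samples."""
    f_stat = statistics.variance(sample_a) / statistics.variance(sample_b)
    return f_stat, len(sample_a) - 1, len(sample_b) - 1


# Hypothetical PSNR values, not taken from the paper
p_good = [5.1, 5.6, 5.9, 6.2, 5.4]
p_bad = [5.0, 6.9, 7.8, 6.1, 8.2]

f_stat, df1, df2 = variance_ratio(p_good, p_bad)
print(f"F = {f_stat:.4f}, df = ({df1}, {df2})")
```

A ratio far from 1 indicates unequal variances, which is the situation in which Welch's t-test is preferred over Student's t-test.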
5.2. Welch's t-test
As the null hypothesis, we assume that the means of P and P' are equal, i.e. the difference
between the means of both sets is zero (0). As the alternative hypothesis, we
assume that the mean of P' is higher than the mean of P. Should the alternative
hypothesis be accepted, it would suggest that the bad segmentation masks
were considered better than the ideal segmentation according to the PSNR
metric.
Welch's t-test is then applied at a 95% confidence level to both
sets P and P'. Table 2 shows the results of Welch's t-test.
t statistic            -7.6524
df                     526.607
p value                4.735 × 10^-14
Confidence interval    0.8641351
Mean of P              5.638749
Mean of P'             6.740013
The p value for Welch's t-test is 4.735 × 10^-14, which lies in the
rejection region of the null hypothesis. We therefore accept
the alternative hypothesis, which indicates that the PSNR values calculated
from the bad segmentation masks B' are higher than the ones calculated from
the human-made masks B.
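For readers who want to reproduce this kind of comparison, the t statistic and degrees of freedom of Welch's t-test can be sketched with the standard library alone. The function name `welch_t` and the sample values are hypothetical and are not the paper's data. The p value is omitted because it needs the Student t distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = statistics.variance(sample_a) / n_a  # squared standard error of mean a
    se_b = statistics.variance(sample_b) / n_b  # squared standard error of mean b
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = mean_diff / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical PSNR values, not taken from the paper
p_good = [5.1, 5.6, 5.9, 6.2, 5.4]   # plays the role of P
p_bad = [5.0, 6.9, 7.8, 6.1, 8.2]    # plays the role of P'

t_stat, df = welch_t(p_good, p_bad)
print(f"t = {t_stat:.4f}, df = {df:.3f}")
```

A negative t statistic means the first sample has the lower mean, which matches the sign reported in Table 2. If SciPy is available, `scipy.stats.ttest_ind(p_good, p_bad, equal_var=False)` performs the same test and also returns a p value.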
6. Final considerations
We investigated the efficacy of the PSNR as an analytic method for segmentation algorithms, applying it the same way it is usually adopted. We used human-created
segmentation masks as an ideal reference for a segmentation algorithm and
compared the PSNR values calculated from these masks to those calculated
from artificially inferior segmentation masks.
To verify whether the PSNR is a good evaluation method, we compared two sets of PSNR values calculated from good and bad segmentation
masks. The mask generation procedure can produce masks that would not
be obtainable from threshold algorithms, as the values for labels are usually
determined by the values of the calculated thresholds. For example, a foreground object on a brighter background would have its pixels set to (0) in
the binary mask while the background would be set to (255). However, there
is no rule for what levels each label should be set to, and this could influence
the PSNR as well. Some graph-based algorithms even separate regions using
random colors [8]. Results from such algorithms could not be verified
with the PSNR as it stands, since they would change greatly from one execution to
another.
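The dependence on label values mentioned above can be illustrated with a small sketch: the same partition of the image, stored with swapped labels, gives a very different PSNR. The synthetic image and the helper `psnr` (standard definition, MAX = 255) are assumptions made for illustration only.

```python
import math


def psnr(reference, test, max_value=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    pairs = [
        (r, s)
        for ref_row, test_row in zip(reference, test)
        for r, s in zip(ref_row, test_row)
    ]
    mse = sum((r - s) ** 2 for r, s in pairs) / len(pairs)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8x8 image: dark left half, bright right half
image = [[40] * 4 + [200] * 4 for _ in range(8)]
mask = [[0 if pixel < 118 else 255 for pixel in row] for row in image]
inverted = [[255 - label for label in row] for row in mask]  # same regions, labels swapped

print("PSNR with labels (0, 255):", round(psnr(image, mask), 2))
print("PSNR with labels (255, 0):", round(psnr(image, inverted), 2))
```

Both masks describe exactly the same two regions, yet the PSNR differs considerably, because the metric compares pixel intensities rather than region membership.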
We proposed the use of Welch's t-test to verify whether the difference between the sets of PSNR values from good and bad segmentation is significant.
Higher PSNR values for good segmentation masks would suggest that the PSNR
is in fact a good analytic method. However, the results of Welch's t-test
suggest exactly the opposite. The PSNR values for the bad segmentation masks are significantly higher than the ones for good segmentation
masks. Therefore, the PSNR should not be considered an adequate method
for the evaluation of segmentation algorithms. However, the PSNR is still a good
method to evaluate discrepancies between images and could be used to evaluate edge detection algorithms by comparing their output with ground truth images such
as the ones present in the BSR300 database.
Future work could include the verification of multi-threshold algorithms
and the determination of the number of thresholds, as well as the impact of
the label values.
7. Acknowledgment
The authors would like to thank Berkeley University for the creation
and availability of the BSR300 database.
References
[1] Siddharth Arora, Jayadev Acharya, Amit Verma, and Prasanta K Panigrahi. Multilevel thresholding for image segmentation through a fast
statistical recursive algorithm. Pattern Recognition Letters, 29(2):119-125, 2008.
[2] Jaime S Cardoso and Luís Corte-Real. Toward a generic evaluation
of image segmentation. Image Processing, IEEE Transactions on,
14(11):1773-1782, 2005.
[3] Yu-Kumg Chen, Fan-Chieh Cheng, and Pohsiang Tsai. A gray-level
clustering reduction algorithm with the least PSNR. Expert Systems
with Applications, 38(8):10183-10187, 2011.
[4] H. Erdmann, G. Wachs-Lopes, C. Gallao, P. M. Ribeiro, and S. P. Rodrigues. Developments in Medical Image Processing and Computational
Vision, chapter A Study of a Firefly Meta-Heuristics for Multithreshold
Image Segmentation, pages 279-295. Springer International Publishing,
Cham, 2015.
[5] Ronald Aylmer Fisher. The asymptotic approach to Behrens's integral,
with further tables for the d test of significance. Annals of Eugenics,
11(1):141-172, 1941.
[6] Rafael C Gonzalez and Richard E Woods. Digital image processing,
2002.
[7] Ming-Huwi Horng and Ren-Jean Liou. Multilevel minimum cross entropy threshold selection based on the firefly algorithm. Expert Systems
with Applications, 38(12):14805-14811, 2011.
[8] Qing-Hua Huang, Su-Ying Lee, Long-Zhong Liu, Min-Hua Lu, Lian-Wen
Jin, and An-Hua Li. A robust graph-based segmentation method for
breast tumors in ultrasound images. Ultrasonics, 52(2):266-275, 2012.
[9] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human
segmented natural images and its application to evaluating segmentation
algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf.
Computer Vision, volume 2, pages 416-423, July 2001.