Screening For Objectionable Images: A Review of Skin Detection Techniques

International Machine Vision and Image Processing Conference
Wayne Kelly (1), Andrew Donnellan (1), Derek Molloy (2)
(1) Department of Electronic Engineering, ITT Dublin, Tallaght, Dublin 24, Ireland.
(2) School of Electronic Engineering, Dublin City University.
WayneKelly@itnet.ie, Andrew.Donnellan@it-tallaght.ie, Derek.Molloy@dcu.ie
Abstract
In recent times advances in data communication
technologies, in particular high speed Internet
connections and 3G mobile phones, have introduced
major concern about the relative ease of access to
unsuitable material. This academic and commercial
problem of real time detection of unsuitable images
communicated by phone and Internet has grown
steadily over the last number of years. This review
paper is presented in three parts. The first part
compares and contrasts the most significant skin
detection techniques, feature extraction techniques and
classification methods. The second gives an analysis of
the significant test results. This review paper examines
thirty-three of the most recent techniques along with
their specific conditions. Finally, this paper concludes
by identifying future challenges and briefly summarizes
the proposed features of an optimal system for future
implementation with a suggested solution to the effects
of lighting variations on the colour of skin pixels.
Figure 1: General Objectionable Image Detection System
(Input Image -> Skin Detection -> Feature Extraction -> Image Classification -> Benign Image / Objectionable Image)
1. Introduction
In early 2004 the Irish government demanded that
its mobile phone networks take responsibility for
material transmitted across their systems and
implement security precautions to prevent the
distribution of objectionable material to minors. This
followed two cases concerning the transmission of
pornographic images to teenagers. In the first incident,
sexually explicit images showing a 14-year-old girl
were found to be circulating amongst school
students [43]. In the second, a teenage girl received
pornographic images from an unidentified phone
number [42].
The development of objectionable image detection
systems has been instigated throughout the world by
events such as the incidents which took place in Ireland
in 2004. The identification processes generally follow
the format outlined in Figure 1, but each paper varies
in its implementation technique.
The structure of this report is as follows: Section 2
gives a comparison of the most significant skin
detection techniques, while Sections 3 and 4 give
comparisons of the feature extraction techniques and
classification methods respectively. Section 5 analyses
the most significant test results. A discussion on future
challenges and a summarized proposal for an
implementation technique is given in Section 6.
2. Skin Detection
The detection of skin is an indication of the
presence of a human limb or torso within a digital
image. In recent times various methods of identifying
skin within images have been developed. This section
gives an overview of the main skin detection methods
implemented for the detection of objectionable images.
978-0-7695-3332-2/08 $25.00 2008 IEEE

DOI 10.1109/IMVIP.2008.21
2.1. Colour Spaces for Skin Detection

A colour space can be described as a way to
mathematically represent, or store, colours. Choosing a
colour space for skin detection has become a
contentious issue within the image processing world.
Albiol [36] declares that if an optimum skin detector is
designed for every colour space, then their
performance will be the same. Gomez [34] states that
for pixel based skin detection there is seldom an
separately. Luminance is a representation of brightness
in an image, and chrominance defines the two remaining
attributes of a colour: hue and saturation. Y represents
brightness and is obtained as a
weighted sum of RGB, whereas Cb and Cr are
obtained by subtracting the luminance (Y) from the
Blue and Red components of RGB [29]. Because
the luminance and chrominance components
are stored separately, YCbCr is well suited to skin
detection, and Shin [32] found that YCbCr gives the
best skin detection results compared to seven other
colour space transformations.
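The RGB to YCbCr relationship described above can be sketched as follows. This is only an illustration: the weights are the ITU-R BT.601 full-range values, which are an assumption since the reviewed papers do not state which variant they use, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (BT.601 full-range weights, assumed)."""
    # Luminance: a weighted sum of the three RGB components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chrominance: scaled differences between Blue/Red and the luminance,
    # offset by 128 so that neutral greys sit in the middle of the range.
    cb = 128.0 + 0.564 * (b - y)
    cr = 128.0 + 0.713 * (r - y)
    # Values are returned unclamped and unrounded.
    return y, cb, cr
```

Because brightness is carried by Y alone, a skin test can be written on the (Cb, Cr) pair only; any threshold values for such a test would have to come from the individual papers.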
YUV and YIQ are colour spaces normally
associated with television broadcasts but have been
used in digital image processing. Similar to YCbCr the
three components are in the form of one luminance (Y)
and two chrominance (UV or IQ), where IQ and UV
represent different coordinate systems on the same
plane. Although both colour spaces have been used
independently for skin detection [31], a combination of
both YUV and YIQ together is used in objectionable
image detection [4][8], giving poor results compared to
the original RGB colour space.
appropriate colour model for indoor and outdoor

images, but shows that a combination of colour spaces
can improve the performance (E of YES, red/green and
H of HSV).
2.1.1 Basic Colour Space (RGB). The most
commonly used method for representing pixel
information of a digital image is the RGB (Red, Green,
Blue) colour space. In this colour space levels of red,
green and blue light are combined to produce various
colours. Jones and Rehg [5] identified 88% of pixels
correctly while using RGB for simplicity and speed, as
most web images use RGB colour space. It is also
stated here that the accuracy could be increased if
another colour space was used. RGB colour space has
been used extensively in the detection of objectionable
images [15][16][21]. A recent US patent [41] proposed
that a pixel is not skin colour if B>G, G<B, G>R,
B<1/4R or B>200.
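As a rough sketch, the exclusion rules quoted above can be written as a single predicate. The function name is hypothetical, the rules are transcribed exactly as listed in this review rather than from the patent text itself, and a pixel that passes is only "not excluded", not confirmed as skin.

```python
def is_candidate_skin_rgb(r, g, b):
    """Return False when any of the quoted RGB exclusion rules fires."""
    excluded = (
        b > g          # blue exceeds green (the listed "G<B" is the same test)
        or g > r       # green exceeds red
        or b < r / 4   # blue is below a quarter of red
        or b > 200     # blue channel is very bright
    )
    return not excluded
```

For example, the pixel (200, 150, 120) passes all the rules, while (100, 150, 200) is rejected because its blue component exceeds its green component.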
2.1.2 Perception Colour Space (HSV, HSI). The
HSV (Hue, Saturation, Value), also referred to as HSB
(Hue, Saturation, Brightness), colour space is a
nonlinear transform of RGB and can be referred to as
being a perceptual colour space due to its similarity to
the human perception of colour. Hue is a component
that describes pure colour (e.g. pure yellow, orange or
red), whereas saturation gives a measure of the degree
to which a pure colour is diluted by white light [33].
Value attempts to represent brightness along the grey
axis (e.g. white to black) but as brightness is subjective
it is therefore difficult to measure [33]. HSV is one of
the most commonly used colour spaces for skin
detection in adult images, second only to RGB, but
sometimes said to give better results [3][6][25]. Q. Zhu
[20] also notes that dropping the Value component and
using only the Hue and Saturation components can
still allow for the detection of 96.83% of the skin pixels.
HSI (Hue, Saturation, Intensity), also referred to as
HSL (Hue, Saturation, Luminance), is another
perceptual colour space that gives good skin detection
results. Like Value in HSV, Intensity is another
representative of grey level, but decoupled from the
colour components (Hue and Saturation). The HSI
colour space was used by Wang [24] as part of a
content-based approach, stating that the skin and
background pixels can be better differentiated using
HSI rather than RGB.
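The idea of keeping only the Hue and Saturation components can be illustrated with Python's standard `colorsys` module. This is a minimal sketch: the helper name `hue_saturation` is hypothetical, and no skin thresholds are given because the reviewed papers each use their own.

```python
import colorsys

def hue_saturation(r, g, b):
    """Return (hue in degrees, saturation in [0, 1]) for an 8-bit RGB pixel.

    The Value component is computed by colorsys but deliberately dropped.
    """
    h, s, _value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s
```

Two pixels of the same colour under different brightness then map to similar (Hue, Saturation) pairs, which is the property the perceptual colour spaces are chosen for.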
2.2 Skin Detection by Colour

Pixel colour classification can be complicated and
there have been many suggested methods for
classifying pixels as skin or non-skin colour in an
attempt to achieve the optimum performance. Fleck [1]
says that skin colours lie within a small region (red,
yellow and brown) of the colour spectrum regardless of
the ethnicity of the person within an image. Although
this is a small region within the colour spectrum, it also
incorporates other, easily identifiable, non-skin objects
such as wood. Furthermore, human skin under a
significant amount of light can appear as a different
colour. Colour detection methods can be classed as
physical based, parametric or non-parametric. The
choice of colour space can greatly affect the
performance of both the physical based and parametric
approaches, but the influence of the colour space
choice is said to reduce greatly in the non-parametric
approaches [36][37].
2.2.1 Physical Based Approaches. Using explicit
threshold values in a colour space is one
of the simplest ways of detecting skin pixels. A
physical based approach using thresholds is often
referred to as a colour model: a set of
parameters that stipulate the values a pixel can take if it is
to be considered as skin. Example:
Jiao [4] found that 94.4% of adult images could be
detected using only the YUV and YIQ colour spaces,
in which a pixel can be considered to be skin if
2.1.3 Orthogonal Colour Space (YCbCr, YIQ, YUV).
Often associated with digital video, the YCbCr
colour space is the third most used colour space for
skin detection in this area [17][18][29]. YCbCr is a
colour space where the luminance (Y) component and
the two chrominance components (Cb, Cr) are stored
$$(20 \le I \le 90) \wedge (100 \le \theta \le 150), \tag{1}$$
where
$$\theta = \tan^{-1}\left(\frac{|V|}{|U|}\right). \tag{2}$$
This method of skin detection can be used with a
single colour space [1][17] for simplicity or multiple
colour spaces [7][24] to increase accuracy.
Related to the explicit threshold is the skin
probability ratio (also known as skin likelihood). This
is where a pixel is classified as skin using various
probability theories to create a skin likelihood map. Ye
[9] uses Bayes' theorem to reduce the effect of
variations in light while detecting skin.
although completely unrelated, can have very similar

histograms. A solution to this issue is the colour
coherence vectors (CCV). CCV establishes the
relevance (coherence) or irrelevance (incoherence) of a
pixel to the region in which the pixel is situated, where
a pixel's colour coherence is the degree to which pixels
of that colour are members of large similarly-coloured
regions [39]. Jiao [4] found that using CCV along with
a colour histogram improved specificity (87.7% to
90.4%) but decreased sensitivity (91.3% to 89.3%).
2.3 Skin Detection by Texture

Although the texture of skin is quite distinct from a
close range, skin texture appears smooth within most
images. One of the biggest problems with skin colour
modelling is falsely detecting non-skin regions as skin
(false positives) due to similar colour. Skin texture
methods are principally used to boost the results of the
skin colour modelling by reducing this false positive
rate.
2.2.2 Parametric Approaches. As previously
discussed, skin colours lie within a small region of the
colour spectrum; within this skin colour cluster, skin is
normally distributed, i.e. it follows a Gaussian distribution.
The Gaussian mixture model is a combination of
Gaussian functions and is defined [31] as
$$p(c) = \sum_{i} w_i \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(c - \mu_i)^T \Sigma_i^{-1} (c - \mu_i)\right), \tag{3}$$
where c is the colour vector of dimension d, w_i is the
weight of the i-th Gaussian, \mu_i is its mean vector and
\Sigma_i is its diagonal covariance matrix. The number of
Gaussian functions used is critical and the choice of
colour space is also of great importance [31]. It is
widely regarded [5][37][38] that the Gaussian mixture
model gives inferior results to systems such as
the colour histogram, yet it has been extensively used
for skin colour segmentation in objectionable image
detection systems [27][28], showing surprisingly high
sensitivity (92.2%) and specificity (97.9%) [12].
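To make the mixture density concrete, the following sketch evaluates p(c) for diagonal covariance matrices. The function name and argument layout are assumptions made for illustration; the weights, means and variances would in practice be fitted to training pixels, which is not shown here.

```python
import math

def gaussian_mixture_density(c, weights, means, variances):
    """Evaluate a Gaussian mixture density at colour vector c.

    weights   : one mixing weight per Gaussian
    means     : one mean vector per Gaussian
    variances : the diagonal of each covariance matrix
    """
    d = len(c)
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # With a diagonal covariance the determinant is the product of the
        # variances and the quadratic form is a sum of scaled squares.
        det = math.prod(var)
        quad = sum((x - m) ** 2 / v for x, m, v in zip(c, mu, var))
        norm = (2.0 * math.pi) ** (d / 2.0) * math.sqrt(det)
        total += w * math.exp(-0.5 * quad) / norm
    return total
```

A pixel would then be labelled as skin when its density under a skin-trained mixture exceeds a chosen threshold, or exceeds its density under a non-skin mixture.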
2.3.1 Gabor Filter. Gabor filters are band-pass filters
that select a certain wavelength range around a centre
wavelength using the Gaussian function. Gabor filters
measure texture by analysing the image in the
space/wavenumber domain. Jiao [4] used a Gabor
filter along with a Sobel edge operator to boost
the performance of the skin colour detection, finding
that specificity was improved (63.3% to 87.7%) but
sensitivity was decreased (94.4% to 91.3%), whereas
Wang [24] and Xu [27] use a Gabor filter to train a
Gaussian mixture model to recognise skin and non-skin texture features.
2.3.2 Co-Occurrence Matrix. The two-dimensional
co-occurrence matrix measures the repetitive changes
in the grey level (brightness) to measure texture. The
matrix records the simultaneous occurrence of two
values in a certain relative position. After the co-occurrence
matrix has been constructed, the entropy,
energy, contrast, correlation and homogeneity features
of the image can be calculated. The co-occurrence
matrix is used as a good trade-off between accuracy
and computation time [7][13].
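A minimal sketch of the co-occurrence computation is given below, assuming a small grey-level image stored as a list of rows and a single pixel offset (dx, dy). The function name is hypothetical, correlation is omitted for brevity, and homogeneity uses the 1 + |i - j| weighting, which is one of several common definitions.

```python
import math

def cooccurrence_features(image, levels, dx=1, dy=0):
    """Build a normalised co-occurrence matrix and derive texture features.

    image  : list of rows of integer grey levels in range(levels)
    dx, dy : relative position of the second pixel of each pair
    Assumes the image is large enough to contain at least one pixel pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    pairs = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                # Record the simultaneous occurrence of the two grey levels.
                counts[image[y][x]][image[ny][nx]] += 1
                pairs += 1
    p = [[n / pairs for n in row] for row in counts]
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
```

A perfectly smooth patch gives zero contrast and zero entropy, whereas a checkerboard patch gives high contrast, which is why these features help separate smooth skin from textured look-alike regions.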
2.2.3 Non-Parametric Approaches. Colour
histograms are a statistical method for representing the
distribution of colour in an image and are constructed
by counting the number of pixels of each colour.
Jayaram [35] shows that the number of bins used in the
histogram is a large factor in the performance of the
skin detection.
Another use of the colour histogram is the
likelihood histogram [6][22], created with the skin
colour likelihood algorithm which establishes the
probability of a pixel being a skin pixel. Jones and
Rehg [5] used a set of training images to create two
colour histograms of skin and non-skin pixels;
maximum entropy modelling was then used to train a
Bayes classifier with 88% accuracy. This model has
been repeatedly used as part of other objectionable
image detection systems [15][19].
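The two-histogram idea can be sketched as a likelihood ratio test. This is a simplified illustration rather than the exact model of [5]: the bin count of 32 per channel, the add-one smoothing and all function names are assumptions.

```python
from collections import Counter

def quantise(pixel, bins_per_channel=32):
    """Map an 8-bit RGB pixel to its histogram bin (bin count assumed to divide 256)."""
    step = 256 // bins_per_channel
    return tuple(channel // step for channel in pixel)

def build_histogram(pixels, bins_per_channel=32):
    """Count how many training pixels fall into each colour bin."""
    return Counter(quantise(p, bins_per_channel) for p in pixels)

def skin_likelihood_ratio(pixel, skin_hist, non_skin_hist, bins_per_channel=32):
    """Return P(colour | skin) / P(colour | non-skin) with add-one smoothing."""
    key = quantise(pixel, bins_per_channel)
    n_bins = bins_per_channel ** 3
    p_skin = (skin_hist[key] + 1) / (sum(skin_hist.values()) + n_bins)
    p_non = (non_skin_hist[key] + 1) / (sum(non_skin_hist.values()) + n_bins)
    return p_skin / p_non
```

A pixel is labelled as skin when the ratio exceeds a threshold; raising the threshold trades sensitivity for specificity.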
A major issue with colour histograms is that they only
measure colour density; this means that two images,
2.3.3 Neighbourhood Gray Tone Difference Matrix.

The neighbourhood grey tone difference matrix
(NGTDM) is another texture feature analysis method
very similar to the co-occurrence matrix as it measures
the changes in intensity and dynamic range per unit
area. NGTDM extracts the visual texture features such
as Coarseness, Contrast, Busyness, Complexity and
Strength. Cusano [10] used NGTDM with Daubechies'
improve accuracy [29]. The ability to extract these skin
features depends on the method used in skin detection:
if colour histograms are used then only the skin
area/image ratio can be used, whereas using a skin
likelihood map could allow the use of skin features
such as orientation, height and width of skin regions.
Shih [44] extracts features such as scalable colour
and edge histogram descriptors from a test image. Then
a labelled image database is searched and 100 images
with the closest correlation of features, to the test
image, are returned. Depending on the number of adult
images returned from the dataset, the test image is
classified as objectionable or benign.
wavelets to extract the texture features of skin regions

to boost the classification of skin.
3. Feature Extraction
The classification of digital images is a memory
hungry and computationally complex process. The
solution for this is a process called feature extraction.
Feature extraction is a form of dimension reduction,
where resources used to describe large sets of data are
simplified with as little loss to accuracy as possible.
The colour and texture methods discussed previously
are forms of feature extraction, but they are used solely
in the classification of skin. This section discusses the
features used in the classification of the objectionable
image, predominantly geometric and dimensional.
4. Classifiers
A classifier is a mathematical method of grouping
the images based on the results from the feature
extraction and skin detection. Most of the systems
classify the images as benign or objectionable, but some
use several levels such as topless, nude or sex image
[25].
3.1 Face Detection

If it was assumed that all images with large areas of
skin are objectionable, then a perfectly acceptable
portrait image would be classed as objectionable. Face
detection algorithms are used to filter any images
whose skin pixels are mainly occupied by a face or
faces. The face detection algorithms proposed by Viola
and Jones [40] give good trade-offs between accuracy
and computational speed; for this reason they have
become popular methods of face detection in
objectionable image detection systems [22][28]. Shen
[30] uses the detected faces as a reference point to help
detect the torso from which breasts and pubic regions
are identified.
4.1 Supervised Machine Learning

Machine Learning is a field in artificial intelligence
that develops algorithms to allow a computer to use
past experience to improve performance. Supervised
learning is when the algorithm learns from training
data that shows desired outputs for various possible
inputs and is the most used form of classification in the
objectionable image detection field, with 22 of the
reviewed publications using at least one of four
methods: Support Vector Machine (SVM),
Neural Networks (NN), Decision Tree (DT) and k-Nearest Neighbour (k-NN).
3.2 Skin Features

After skin has been detected various features can be
extracted. The skin area/image ratio is the percentage
of the image area that is covered by skin. As most
objectionable images would be predominantly skin, the
skin area/image ratio is used by most, if not all, of the
reviewed systems. This ratio does not depend on the
method of skin classification and can be used as an
input to the classifier [15][16] or as an early filtering
system [2].
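The skin area/image ratio is simple to compute once a binary skin mask is available. The sketch below assumes the mask is a list of rows of 0/1 values produced by any of the skin detectors above; the function name is hypothetical.

```python
def skin_area_ratio(mask):
    """Return the percentage of pixels marked as skin in a binary mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    skin = sum(1 for row in mask for value in row if value)
    return 100.0 * skin / total
```

Used as an early filter, images whose ratio falls below a chosen cut-off can be passed as benign without running the more expensive feature extraction and classification stages.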
The amount [10], position [14], orientation [28],
height and width [13], shape [17][20], eccentricity
[21], solidity [21], compactness [19], rectangularity
[19] and location [27][29] of skin regions are features
used as input components to the machine learning
classifiers. Liang [13] found that the height feature was
the most important feature for the detection of
objectionable images. The choice and implementation
of classifier would stipulate the influence of the skin
features, but it has been shown that skin features can
4.1.1 Support Vector Machine. The SVM is a kernel-based
classifier that is relatively easy to train
(compared to neural networks). Given a training set of
benign and objectionable images, the SVM finds the
hyperplane that best separates the two sets, keeping
benign images on one side and objectionable images on
the other, while maximising the distance between
the hyperplane and the nearest images of both sets.
The SVM has been shown to be able to
give high performance when used with the Gaussian
mixture model (92.2% sensitivity and 97.9%
specificity) [12], skin probability map (97.6%
sensitivity and 91.5% specificity) [23] and colour
histogram (89.3% sensitivity and 90.6% specificity)
[4]. Cusano [10] found that the SVM gave better
results than multiple decision trees.
calls weak classifiers, learning from each correct and
incorrect classification, but this process can be
vulnerable to noise. Bootstrapping is where one is
given a small set of labelled data and a large set of
unlabelled data, and the task is to induce a classifier.
Lee [29] shows that the addition of a boosting
algorithm increases sensitivity from 81.74% to
86.29%.
4.1.2 Neural Networks. NNs are machine learning
algorithms based on how a biological brain learns by
example. Classification is performed by a large number
of interconnected neurons working simultaneously to
process the image features and decide if the image is
benign or objectionable. NNs can implicitly detect
complex nonlinear relationships between independent
and dependent variables, but can be computationally
complex compared to SVM and can be difficult to
train. Bosson [6] found that neural networks (83.9%
sensitivity and 89.1% specificity) gave slightly better
results than k-NN and SVM. Kim [26] attained
94.7% sensitivity and 95.1% specificity using NN with
MPEG-7 Descriptors.
5. Results
The test results given are in the form of sensitivity
and specificity, where sensitivity is defined as the ratio
of the number of objectionable images identified to the
total number of objectionable images tested and
specificity is defined as the ratio of the number of
benign images passed to the total number of benign
images tested [2]. Due to space constraints Table 1
shows only the top 5 results of the reviewed
publications whose sensitivity and specificity are both
above 90%. As can be seen from this table, the results
of the detection systems appear to give extremely high
sensitivity and specificity.
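The two measures defined above can be computed directly from ground-truth labels and classifier decisions. In this sketch True denotes an objectionable image and False a benign one; the function name is hypothetical and the test set is assumed to contain at least one image of each class.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) as fractions in [0, 1].

    labels      : True for objectionable images, False for benign ones
    predictions : the classifier's decision for each image, same encoding
    """
    objectionable = sum(1 for y in labels if y)
    benign = len(labels) - objectionable
    # Objectionable images correctly identified.
    identified = sum(1 for y, p in zip(labels, predictions) if y and p)
    # Benign images correctly passed.
    passed = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return identified / objectionable, passed / benign
```

A system that blocks every image scores 100% sensitivity and 0% specificity, which is why the two figures must always be reported together.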
4.1.3 Decision Tree (DT). A DT is a classifier in the

form of a tree structure, where each leaf node indicates
the value of a target class and each internal node
specifies a test to be carried out on a single attribute,
with one branch and sub-tree for each possible
outcome of the test. The classification of an instance is
performed by starting at the root of the tree and
moving through it until a leaf node is reached, which
provides the classification of the instance. Zheng [18]
shows that a DT can give 91.35% sensitivity and
92.3% specificity in detecting objectionable images.
Zheng [19] also found that the DT (C4.5 method) gave
higher accuracy than NN and SVM.
4.1.4 k-Nearest Neighbour. The k-NN is based on
finding the closest examples from the training data to
classify an image as objectionable or benign. The
training of the k-NN is very fast and Xu [27] found that
the k-NN (81% sensitivity and 94% specificity)
outperformed the NN (79% sensitivity and 91%
specificity).
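As an illustration of the k-NN principle, the sketch below classifies a feature vector by majority vote among its k closest training examples. The choice of Euclidean distance, the value k = 3 and the function name are assumptions; the reviewed systems use their own features and distance measures.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify a feature vector by majority vote of its k nearest neighbours.

    training : list of (feature_vector, label) pairs
    """
    # "Training" is just storing the examples; all work happens at query time.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because no model is fitted, training is very fast, but every classification requires a distance computation against the whole training set.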
Publication        Sensitivity   Specificity
Wang 1997 [2]      91%           96%
Yoo 2003 [11]      93.47%        91.61%
Jeong 2004 [12]    92.2%         97.9%
Belem 2005 [23]    97.6%         91.5%
Kim 2005 [26]      94.7%         95.1%
Table 1: Top results from reviewed publications
4.2 Geometric Classifier
Fleck [1] used an Affine Imaging Model to identify
limbs and torsos from detected skin regions, and then
established if the limb and torso arrangement matches
a geometric skeletal structure. Affine geometry is the
geometry of vectors, which do not involve length or
angle. Fleck achieved 52.2% sensitivity and 96.6%
specificity using the Affine Imaging Model, which is
poor compared to the machine learning classifiers.

Publication        Sensitivity   Specificity   Dataset
Fleck 1996 [1]     52.2%         96.6%         Source: Internet, CDs, Magazines
                                               Ethnicity: Caucasians
                                               Illumination Conditions: Various
Jiao 2003 [4]      89.3%         90.6%         Source: Internet, Corel Library
                                               Ethnicity: Caucasians, Asian
                                               Illumination Conditions: Not Provided
Duan 2002 [8]      80.7%         90%           Source: Internet, Corel Library
                                               Ethnicity: Caucasians, Asian, European
Cusano 2004 [10]   90.4%         88.4%         Source: Not Provided
                                               Ethnicity: Caucasians, African, Indian
Lee 2004 [29]      86.4%         94.8%         Source: Not Provided
                                               Ethnicity: Caucasians, African, Asian
                                               Illumination Conditions: Controlled
Table 2: Publications that give adequate dataset information
But the publications presented in Table 1 give very
little information about the training and testing datasets
used, such as the source and illumination conditions of the
dataset as well as the ethnicity of the people within the
images. If little information on the testing methods or
images used is given, it is hard to accept the results
4.3 Boosting Classification
Boosting is the use of an algorithm to increase the
accuracy of the learning classifiers and has been
performed in two ways in the reviewed publications:
Adaboost and Bootstrapping. Adaboost repeatedly
used if computational complexity is a problem.
2. An adaptive skin colour technique should be
used to compensate for variations in image quality
and lighting. None of the reviewed
publications give adequate solutions. A
proposed solution to the variations in light is
to extend the work by Jones [5]. A luminance
factor (i) would be used to divide the existing
colour histogram into 4 new colour
histograms. Depending on its luminance
value, each pixel is placed into one of the
histograms, which represent different
percentage bands of light: i<25, 25<=i<50,
50<=i<75 and 75<=i. Adopting this approach
would establish a histogram for skin pixels for
each light band, showing the effect light has
on skin pixel colour.
3. Gabor filters have the greatest effect on
increasing the specificity as a texture analysis
method.
4. The analysis of skin features such as location
and orientation should be utilised along with a
face detector to reduce false positives.
5. NN and SVM consistently give high levels of
accuracy (but need large datasets to train, which
may be an issue for some).
6. A Boosting algorithm such as Adaboost [29]
should be used to boost the classification
process.
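The luminance-banded histogram proposed in item 2 of the list above could be sketched as follows. This is only one possible reading of the proposal: the luminance formula (BT.601 luma weights expressed as a percentage), the 32 bins per channel and all function names are assumptions, since the proposal does not fix them.

```python
from collections import Counter

def luminance_percent(r, g, b):
    """Luminance factor i as a percentage of full brightness (BT.601 weights, assumed)."""
    return 100.0 * (0.299 * r + 0.587 * g + 0.114 * b) / 255.0

def band_index(i):
    """Map luminance i to one of the bands i<25, 25<=i<50, 50<=i<75, 75<=i."""
    if i < 25:
        return 0
    if i < 50:
        return 1
    if i < 75:
        return 2
    return 3

def build_banded_histograms(skin_pixels, bins_per_channel=32):
    """Split one skin colour histogram into four, one per luminance band."""
    step = 256 // bins_per_channel
    histograms = [Counter() for _ in range(4)]
    for r, g, b in skin_pixels:
        band = band_index(luminance_percent(r, g, b))
        histograms[band][(r // step, g // step, b // step)] += 1
    return histograms
```

At classification time a pixel would be looked up only in the histogram of its own luminance band, so that dark and bright skin tones are modelled separately.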
This paper has reviewed the best performing
techniques used in skin detection for objectionable
images. It has evaluated the best of the current
techniques used in skin classification and feature
extraction. Future challenges have been identified, and
the proposed features of an optimal implementation
technique are provided.
presented. This is a frequent problem throughout most
of the publications here: as there are no standard
objectionable image datasets, there is no sure way to
adequately compare all systems. Not all the
publications omit the details of their datasets; Table 2
shows 5 publications which do give reasonable
amounts of information on the datasets used to test
their respective systems. This table illustrates that as the
sensitivity increases from system to system the
specificity decreases; this would suggest a more
realistic set of results.
6. Conclusion
To reduce false-positives some papers have added
various steps such as face detection and swimsuit
detection. Generally the techniques have implemented
a skin detection method, as large amounts of skin are
generally a sign of the presence of naked people,
followed by a feature extraction method, to identify the
features such as shape and location, and finally
classification from the results of the two previous
steps.
The right choice of method to perform colour
analysis in the skin identification process directly
stipulates the features that can be extracted from the
image. The use of colour histograms to find the colour
density of an image may identify if large skin areas are
present, but they do not allow for features such as
shape and location to be found. However, using colour
histograms to train a Bayes probability algorithm has
been proven to give good results [5]; note this is an old
method and newer adaptive methods of skin detection
have since been developed [12].
Many of the datasets used are described as being
gathered randomly from the Internet (some papers
count logos as images from the Internet, thus boosting
their results), but the papers do not state from what
domain (Asian, American, etc.) the images come or what
the images depict (indoor, outdoor, professional,
amateur, etc.). Both of
these issues can affect the accuracy, as the ethnicity of
the persons within the images changes with the domain
and the variations in quality and lighting could reduce
the skin identification performance. An
academically available dataset is essential, but due to
the nature of the images needed this may be a problem.
There are legal and ethical issues surrounding the
distribution of such images which prevent the creation
of a dataset, as no academic institute wishes to be
perceived as a distributor of pornographic material.
After careful examination of the published papers it
was decided that an optimal system would consist of:
1. HSV/HSI or YCbCr should be the choice of
colour space for accuracy; RGB should be
References
[1] M. Fleck, D.A. Forsyth, C. Bregler. Finding naked
people, ECCV 1996, vol. 2, pp 593-602.
[2] J.Z. Wang, J. Li, G. Wiederhold, O. Firschein, System
for
screening
objectionable
images,
Computer
Communications 1998, Vol.21 (15), pg 1355-1360, Elsevier.
[3] Y. Chan, R. Harvey, D. Smith, Building systems to
block pornography, eWiC 1999, pp 34-40.
[4] F. Jiao, W. Gao, L. Duan, G. Cui, Detecting Adult
Image using Multiple Features, ICII 2001, vol.3, pg 378383.
[5] M. Jones, J. M. Rehg, Statistical colour models with
application to skin detection, IJCV 2002, 46(1), pp 81-96.
156
[6] A. Bosson, G.C. Cawley, Y. Chan, R. Harvey, NonRetrieval: blocking pornographic images CIVR02, pp 50-59.
[22] Y. Wang, W. Wang, W. Gao Research on the

Discrimination of Pornographic and Bikini Images ISM,
2005, pp 558-564.
[7] L.L. Cao, X.L. Li, N.H. Yu, Z.K. Liu, Naked People
Retrieval Based on Adaboost Learning, ICMLC 2002, Vol.
2, pp 1133 - 1138.
[23] R. Belem, J. Cavalcanti, E. Moura, M. Nascimento,

SNIF: A Simple Nude Image Finder, LA-Web, 2005, pp
252-258.
[8] L. Duan, G. Cui, W. Gao, H. Zhang, Adult image

detection method base-on skin colour model and support
vector machine, ACCV 2002, pp 797-800.
[24] S-L. Wang, H. Hu, S-H. Li, H. Zhang, Exploring

Content-Based and Image-Based Features for Nude Image
Detection FSKD (2), 2005, pp 324-328.
[9] Q. Ye, W. Gao, W. Zeng, T. Zhang, W. Wang, Y. Liu,

Objectionable Image Recognition System in Compression
Domain, IDEAL 2003, pp 1131-1135.
[25] W. Kim, S.J. Yoo, J-s. Kim, T.Y. Nam, K. Yoon,

Detecting Adult Images Using Seven MPEG-7 Visual
Descriptors, Human.Society@Internet, 2005, pp 336-339.
[10] C. Cusano, C. Brambilla, R. Schettini, G. Ciocca, On

the Detection of pornographic digital images, VCIP, 2003,
pp 2105-2113.
[26] W. Kim, H-K. Lee, S-J. Yoo, S.W. Baik, Neural

Network Based Adult Image Classification ICANN (1),
2005, pp 481-486
[11] S-J. Yoo, M-H. Jung, H-B. Kang, C-S. Won, S-M. Choi,
"Composition of MPEG-7 Visual Descriptors for Detecting
Adult Images on the Internet", LNCS 2713, Springer-Verlag,
pp 682-687, 2003.
[27] Y. Xu, B. Li, X. Xue, H, Lu, Region-based

Pornographic Image Detection, MMSP, 2005.
[28] H.A. Rowley, Y. Jing, S. Baluja, Large scale imagebased adult-content filtering, VISAPP, 2006, pp 290-296.
[12] C. Jeong, J. Kim, K. Hong, Appearance-based nude

image detection, ICPR 2004, pp IV: 467470.
[13] K.M. Liang, S.D. Scott, M. Waqas, Detecting
pornographic images, ACCV 2004, pp 497-502.
[29] J.-S. Lee, Y.-M. Kuo, P.-C. Chung, E.-L. Chen, Naked
image detection based on adaptive and extensible skin colour
model, PR(40),No. 8, August 2007, pp. 2261-2270.
[14] H. Zheng, M. Daoudi, B. Jedynak, Blocking Adult

Images Based on Statistical Skin Detection, ELCVIA 2004
Vol. 4 (2), pp 1-14.
[30] X. Shen, W. Wei, Q. Qian, The filtering of internet

images based on detecting erotogenic-part, ICNC 2007, pp
732-736.
[15] W. Zeng, W. Gao, T. Zhang, Y. Liu, Image Guarder:

An Intelligent Detector for Adult Images, ACCV 2004, pp
198-203.
[31] P. Kakumanu, S. Makrogiannis, N. Bourbakis, A

survey of skin-colour modelling and detection methods,
PR(40), No. 3, March 2007, pp. 1106-1122.
[16] Y. Liu, W. Zeng, H. Yao, Online Learning

Objectionable Image Filter Based on SVM, PCM, 2004, pp
304-311.
[32] M.C. Shin, K.I. Chang, L.V. Tsap, Does Colour space
Transformation Make Any Difference on Skin Detection?
IEEE WACV, Dec 2002, pp 275-279.
[17] W.A. Arentz, B. Olstad, Classifying offensive sites

based on image content CVIU, No. 1-3, 2004, pp 295-310.
[33] R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image

Processing Using MATLAB, Prentice Hall, 2004.
[18] Q-F. Zheng, M-J. Zhang, W-Q. Wang, A Hybrid

Approach to Detect Adult Web Images, PCM, 2004, pp
609-616.
[34] G. Gomez, M. Sanchez, L.E. Sucar, On Selecting an

Appropriate Colour Space for Skin Detection, MICAI, 2002,
pp 69-78.
[19] Q-F. Zheng, M-J. Zhang, W-Q. Wang Shape-based

Adult Image Detection, ICIG, 2004, pp 150-153.
[35] S. Jayaram, S. Schmugge, M.C. Shin, L.V. Tsap, Effect

of Colour space Transformation, the Illuminance Component,
and Colour Modelling on Skin Detection, CVPR, 2004, pp
813-818.
[20] Q. Zhu, C-T. Wu, K-T. Cheng, Y-L. Wu, An adaptive

skin model and its application to objectionable image
filtering, ACM Multimedia, 2004, pp 56-63.
[36] A. Albiol, L. Torres, E.J. Delp, Optimum colour spaces

for skin detection, ICIP, 2001, pp 122 124.
[21] J. Ruiz-del-Solar, V. Cataneda, R. Verschae, R. BaezaYates, F. Ortiz, Characterizing Objectionable Image

Content (Pornography and Nude Images) of Specific Web
Segments: Chile as a Case Study, LA-WEB, 2005, pp 269278.
[37] S.L. Phung, A. Bouzerdoum, D. Chai, Skin

segmentation using colour pixel classification: analysis and
comparison, IEEE TPAMI, 2005, pp 148-154.
157
[38] V. Vezhnevets, V. Sazonov, A. Andreeva, A Survey on
Pixel-Based Skin Colour Detection Techniques, Proc.
Graphicon, 2003, pp 85-92.
[39] G. Pass, R. Zabih, J. Miller, Comparing images using
colour coherence vectors, ACM Multimedia 1996, pp 65-73.
[40] P.A. Viola, M.J. Jones, Rapid Object Detection using a
Boosted Cascade of Simple Features, CVPR 2001, pp 511-518.
[41] D.B. Swift, Evaluating graphic image files for
objectionable content, patent: US 7027645 B2, Apr 2006.
[42] A. Healy, Call for mobile phone security, The Irish
Times, 17th February, 2004.
[43] A. Healy, Gardai seek distributor of explicit image of
girl on phone, The Irish Times, 23rd January, 2004.
[44] J-L. Shih, C-H. Lee, C-H Yang, An adult image
identification system employing image retrieval technique,
JVCIR(18), No. 6, December 2007, pp. 453-463.