International Journal of Computer Science & Engineering Survey (IJCSES) Vol.2, No.3, August 2011
ABSTRACT
Face and eye detection is among the most challenging problems in computer vision. The goal of this paper is to present a survey of the existing literature on face and eye detection and gaze estimation. With the recent growth of systems based on face and eye detection in many areas of life, the subject has attracted increasing attention from both academia and industry, and many different studies have been performed on it. Despite challenging conditions such as varying illumination, the presence of glasses, facial hair or a mustache, and differences in orientation, pose and occlusion of the face, face and eye detection methods have made great progress. In this paper we first categorize face detection models and examine the basic algorithms for face detection. We then present methods for eye detection and gaze estimation.
KEYWORDS
Image Processing, Face Detection, Eye Detection, Gaze Estimation
1. INTRODUCTION
Face detection is one of the most challenging problems in disciplines such as image processing, pattern recognition and computer vision. With improvements in information technology, face detection/recognition has found wide usage in applications such as personal identification, video surveillance, witness face reconstruction, computerized aging, control systems (e.g. tracking driver fatigue to avoid traffic accidents), human-computer interaction (HCI, which allows a computer-based system to be controlled without a mouse, keyboard, etc.) and video game controllers. Implementations of face and gaze detection have therefore received a great deal of attention in the recent literature [1-7]. For the last 15 years, face detection has been a popular subject for researchers in psychophysics, the neural sciences (especially on the uniqueness of faces and on whether face recognition is performed holistically or by local feature analysis), engineering, image processing, image analysis and computer vision.

A formal method of classifying faces was first proposed by Francis Galton in 1888. He proposed a method that collects facial profiles as curves and then finds their norms; other profiles could then be classified by their deviations from those norms. The classification was multi-modal, resulting in a vector of independent measures that could be compared with other vectors in a database.

Face recognition is a very challenging task because of the variability of features in the captured image. Variations in the scale and location of faces in the image, in image orientation (e.g. rotated or not) and in face pose (frontal, angled or profile) make it difficult to build a high-performance system that meets the practical needs mentioned above. As noted in Yang's survey [8], the main challenges associated with face detection are:
DOI: 10.5121/ijcses.2011.2303
- Pose: The image of a face varies with the relative camera-face pose (frontal, 45 degrees, profile, upside-down), and some facial features (such as the eyes or mouth) may be partially or wholly occluded.
- Presence or absence of structural components: Facial features such as beards, mustaches and glasses may or may not be present, and there is a great deal of variability among these components, including shape, color and size.
- Facial expression: The appearance of a face is directly affected by the person's facial expression.
- Occlusion: Faces may be partially occluded by other objects. In an image of a group of people, some faces may partially occlude others.
- Image orientation: Face images vary directly with rotation about the camera's optical axis.
- Imaging conditions: When the image is formed, factors such as lighting (spectra, source distribution and intensity) and camera characteristics (sensor response, lenses) affect the appearance of a face.
2. FACE RECOGNITION
There are more than 100 different techniques for face detection in images, and most of them use similar techniques for feature extraction. To enable a better understanding of these techniques, they can be grouped into four categories according to their analysis methods.
Figure 1: Original and corresponding low resolution images [8]

Figure 2 [8] shows examples of the coded rules, based on the characteristics of the human face, used to locate face candidates at the lowest resolution: the center part of the face (Region-1) has four cells with basically uniform intensity, the upper round part of the face (Region-2) has basically uniform intensity, and the difference between the average gray values of the center part and the upper round part is significant. The lowest resolution (Level 1) image is searched for face candidates, which are then processed further at finer resolutions. At Level 2, local histogram equalization is performed on the face candidates received from Level 1, followed by edge detection. Surviving candidate regions are then examined at Level 3 with another set of rules that respond to facial features such as the eyes and the mouth.
Figure 2: A typical face used in knowledge-based top-down methods [8]

Yang's method does not achieve a high detection rate, but later work [21] reused its ideas of multi-resolution images and rule-based searching for face detection in frontal views.
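The coded rules above can be sketched in a few lines of Python. The following is an illustrative sketch, not the implementation from [8]; the 4x4 mosaic window layout, the uniformity tolerance and the contrast threshold are all assumptions chosen for the example:

```python
import numpy as np

def to_mosaic(img, cell=4):
    """Average-pool a grayscale image into cell x cell blocks,
    producing the 'low resolution' mosaic searched at Level 1."""
    h, w = img.shape
    h, w = h - h % cell, w - w % cell
    return img[:h, :w].reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

def is_face_candidate(window, uniform_tol=20.0, min_contrast=30.0):
    """Rule check on a 4x4 mosaic window: the upper part and the central
    cells should each be roughly uniform, and the average gray values of
    the two regions should differ significantly. Thresholds are
    illustrative assumptions, not values from the cited work."""
    upper = window[0, :]        # upper round part of the face (Region-2)
    center = window[1:3, 1:3]   # central cells of the face (Region-1)
    uniform_upper = upper.ptp() <= uniform_tol
    uniform_center = center.ptp() <= uniform_tol
    contrast = abs(center.mean() - upper.mean()) >= min_contrast
    return bool(uniform_upper and uniform_center and contrast)
```

In a full rule-based detector, every 4x4 window of the mosaic would be scanned with `is_face_candidate`, and surviving windows would be passed to the finer-resolution rules.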
2.3.2. Eigenfaces
The eigenfaces approach is a principal component analysis (PCA) based approach to detecting faces. In this approach, faces are represented as points in a space called the eigenspace, and face detection is performed by computing the distance of image windows from the space of faces. Treating raw images as points in a high-dimensional space is impractical for two reasons: first, working in very high-dimensional spaces is computationally too expensive; second, raw images contain statistically irrelevant data that degrades the performance of the system. PCA reduces the dimensionality of a feature space by restricting attention to those directions along which the scatter is greatest [37]. Thus, PCA is used for projecting the raw face
images onto the eigenspace of a representative set of normalized face images, eliminating redundant and irrelevant information. Many works on face detection, recognition and feature extraction have adopted the ideas of eigenvector decomposition and clustering.
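The projection and distance-from-face-space computation described above can be sketched with NumPy. This is a minimal sketch of the general eigenfaces idea, not the cited implementation; the function names and the choice of SVD for the decomposition are the author's assumptions:

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: array of shape (n_samples, n_pixels), one normalized face
    image per row. Returns the mean face and the top-k eigenfaces
    (the principal directions of greatest scatter)."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are eigenvectors of the sample covariance matrix,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def distance_from_face_space(x, mean, eigenfaces):
    """Reconstruction error of an image window x after projecting it
    onto the eigenspace; small values indicate a face-like window."""
    coeffs = eigenfaces @ (x - mean)
    reconstruction = mean + eigenfaces.T @ coeffs
    return float(np.linalg.norm(x - reconstruction))
```

A detector would slide a window over the image and flag locations whose distance from the face space falls below a threshold.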
patterns. However, one drawback is that the network architecture has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance [8].
Eye trackers are used in research on the visual system, in psychology, in cognitive linguistics and in product design. There are a number of methods for measuring eye movement; the most popular variant uses video images from which the eye position is extracted. Studies on the eye can be classified as follows:

- Detection of the eye: Given an arbitrary face image, the goal of eye detection is to determine the location of the eyes. Either the area containing both eyes is found, or the two eyes are localized individually; as a result of the process, eye areas are usually indicated by a rectangle.
- Detailed feature extraction: The goal of this category is to give detailed information such as the contour of the visible eyeball region, the circular area formed by the iris and pupil, the location of the pupil in the visible eye area, and the state of the eye (blinking or not). This type of work is more difficult in computer vision, as the detection or real-time tracking of small details is highly affected by varying ambient conditions and may easily fail.

A great deal of work has been done in the eye detection area, such as eye pupil movement detection, eye feature extraction, eye state detection and eye gaze detection, using different techniques both in still images and in video sequences for real-time applications. There are many different approaches to eye/gaze detection: some methods need extra hardware support to accomplish the task, while for others a simple webcam is enough. Various methods are described below.
3.1. Electrooculography
Electrooculography (EOG) is a technique for measuring the resting potential of the retina. There is a permanent potential difference of approximately 1 mV between the cornea and the fundus, so small voltages that vary with eye position can be recorded from the region around the eyes. Pairs of electrodes are placed either above and below the eye or to the left and right of the eye. By carefully placing the electrodes, it is possible to record horizontal and vertical movements separately. If the eye moves from the center position towards one electrode, that electrode "sees" the positive side of the retina and the opposite electrode "sees" the negative side, so a potential difference occurs between the electrodes. Assuming that the resting potential is constant, the recorded potential is a measure of the eye position.

However, the signal can change even when there is no eye movement. It depends on the state of dark adaptation (used clinically to calculate the Arden ratio as a measure of retinal health) and is affected by metabolic changes in the eye. It is prone to drift and spurious signals, and the state of the contact between the electrodes and the skin produces another source of variability. There have been reports that the velocity of the eye as it moves may itself contribute an extra component to the EOG. It is therefore not a reliable method for quantitative measurement, particularly of medium and large saccades. It is, however, a cheap, easy and non-invasive method of recording large eye movements, and is still frequently used by clinicians.

The eye acts as a dipole in which the anterior pole is positive and the posterior pole is negative:
1. Left gaze: the cornea approaches the electrode near the outer canthus, resulting in a positive-going change in the potential difference recorded from it.
2. Right gaze: the cornea approaches the electrode near the inner canthus, resulting in a positive-going change in the potential difference recorded from it (A, an AC/DC amplifier).
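Since the recorded potential is a measure of eye position (under the constant-resting-potential assumption above), an EOG system can be calibrated by recording the voltage at two known fixation targets and fitting a linear voltage-to-angle map. The following sketch illustrates this; the microvolt readings and target angles are hypothetical, and the assumed linearity holds only approximately in practice (drift forces periodic recalibration):

```python
def calibrate_eog(v_left, v_right, angle_left=-30.0, angle_right=30.0):
    """Fit a linear voltage -> gaze-angle mapping from two calibration
    fixations at known horizontal angles (in degrees). Assumes the EOG
    response is approximately linear over this range."""
    gain = (angle_right - angle_left) / (v_right - v_left)
    offset = angle_left - gain * v_left
    return lambda v: gain * v + offset

# Hypothetical readings (in microvolts) recorded while the subject
# fixates targets at -30 and +30 degrees.
angle_of = calibrate_eog(v_left=-120.0, v_right=120.0)
```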
as a distraction to the subject. As infra-red detectors are not influenced to any great extent by other light sources, the ambient lighting level does not affect measurements. Spatial resolution (the size of the smallest movement that can reliably be detected) is good for this technique, of the order of 0.1 degrees, and temporal resolutions of 1 ms can be achieved. It is better for measuring horizontal than vertical eye movements. Blinks can be a problem: not only do the lids cover the surface of the eye, but the eye also retracts slightly, altering the amount of light reflected for a short time after the blink.
The eye detection used in [31] is an algorithm of low computational cost, which is divided into two parts.
Figure 3: Identification of objects inside the eye area [31]

Based on parameters of facial geometry, an area is defined that possibly contains the eye and eyebrow locations in addition to extra facial information. An algorithm [32] is then employed to detect the objects; it allows a separation between the eye and other objects that occasionally belong to this geometrically defined area.
Figure 4: Detection of the eye's area [31]

Considering the two largest objects found in Figure 3 [31], it is observed that object 1 is the eyebrow, whereas object 2 is the eye. Figure 4 [31] shows a photograph that accurately represents the eye detection. Finally, after the eye is found, its center of mass is computed and used to align the eye position in a box of previously defined size. The information contained in the box is used later by the neural network, both for the training database and for pattern classification.
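The last step above, finding the center of mass of the detected eye object and cropping a fixed-size box around it, can be sketched as follows. This is an illustrative sketch under the assumption that the detected object is available as a binary mask; it is not the code from [31]:

```python
import numpy as np

def center_of_mass(mask):
    """Center of mass (row, col) of a binary object mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def crop_box(image, center, size):
    """Fixed-size box centered on the detected eye, clamped so the
    box stays inside the image; its contents would feed the network."""
    r, c = int(round(center[0])), int(round(center[1]))
    half = size // 2
    r0 = min(max(r - half, 0), image.shape[0] - size)
    c0 = min(max(c - half, 0), image.shape[1] - size)
    return image[r0:r0 + size, c0:c0 + size]
```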
This solution is defined in terms of the subset of training samples (the support vectors) whose Lagrange multipliers α_i are non-zero [36]. SVMs are commonly used in eye detection systems. The SVM determines the presence or absence of eyes using an input vector consisting of the gray values in a moving window. In the case of rotated face images, the method often fails to detect eyes because such images are inconsistent with the training set. This is not a problem for authentication applications constrained to an upright face, but in order to be widely applicable to photo albums [13] and automatic video management systems [14], eye detection methods must be able to detect human eyes even in rotated faces.
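The moving-window scheme above can be sketched with a pluggable scoring function. In a real system, `classify` would be the trained SVM decision function (w · x + b evaluated via the support vectors); here it is left abstract, and the window size and step are illustrative assumptions:

```python
import numpy as np

def sliding_window_detect(image, classify, win=(11, 21), step=2):
    """Slide a win-sized window over a grayscale image, score the gray
    values of each window with classify(vector) -> float, and return
    (row, col, score) of the best-scoring window."""
    h, w = win
    best = None
    for r in range(0, image.shape[0] - h + 1, step):
        for c in range(0, image.shape[1] - w + 1, step):
            vec = image[r:r + h, c:c + w].ravel()
            score = classify(vec)
            if best is None or score > best[2]:
                best = (r, c, score)
    return best
```

A usage example with a toy scorer: on a dark image containing a single bright 11x21 patch, scoring windows by their mean gray value locates the patch exactly.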
4. CONCLUSIONS
Many methods for eye detection rely on first locating the face in the image, so a review of face detection methods is needed before studying eye detection. In this paper, various face detection methods have been categorized by their analysis approach. They have advantages and disadvantages relative to one another, but the main problems remain for all face detection methods: different lighting conditions; orientation, pose and partial occlusion of the face; facial expressions and make-up; and the presence of glasses, facial hair or a mustache. Many methods can overcome one or more of these difficulties, but there is no silver bullet that overcomes all of them while performing fast detection without false positives. In our survey, we observe that the haar-like feature approach has been the most used method in face detection projects (especially for video detection of faces and eyes) in recent years, because it can work in real-time systems with great performance.

Eye detection and gaze estimation generally depend on face detection: with algorithms that find the locations of faces in images, eye detection methods show better performance. There are also new approaches that try to find eyes without first detecting faces, and these too show great performance. The methods for face and eye detection are generally similar; the same algorithms can be applied to detect both faces and eyes (and can even be applied to other objects, such as cars). As face recognition models spread into many areas, new intelligent systems that will bring great comfort and ease to our lives benefit from the results of these studies. In future work, we plan to develop an efficient gaze estimation technique for low-resolution images that can determine how often, and which part of, a billboard is being looked at in a public area.
REFERENCES
[1] S.C. Kuo, C.J. Lin, J.R. Liao, (2011). 3D reconstruction and face recognition using kernel-based ICA and neural networks, Expert Systems with Applications, Vol. 38, pp. 5406-5415.
[2] J. Yang, X. Ling, Y. Zhu, Z. Zheng, (2008). A face detection and recognition system in color image series, Mathematics and Computers in Simulation, Vol. 77, pp. 531-539.
[3] H. K. Ekenel, J. Stallkamp, R. Stiefelhagen, (2010). A video-based door monitoring system using local appearance-based face models, Computer Vision and Image Understanding, Vol. 114, pp. 596-608.
[4] A. Mian, (2011). Online learning from local features for video-based face recognition, Pattern Recognition, Vol. 44, pp. 1068-1075.
[5] C. Nitschke, A. Nakazawa, H. Takemura, (2011). Display-camera calibration using eye reflections and geometry constraints, Computer Vision and Image Understanding, Vol. 115, pp. 835-853.
[6] A. D. Santis, D. Iacoviello, (2009). Robust real time eye tracking for computer interface for disabled people, Computer Methods and Programs in Biomedicine, Vol. 96.
[7] D. W. Hansen, Q. Ji, (2010). In the Eye of the Beholder: A Survey of Models for Eyes and Gaze, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] M. H. Yang, N. Ahuja, (2002). Detecting Faces in Images: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1.
[9] G. Yang, T. S. Huang, (1994). Human Face Detection in Complex Background, Pattern Recognition, Vol. 27, No. 1, pp. 53-63.
[10] Y. Amit, D. Geman, B. Jedynak, (1998). Efficient Focusing and Face Detection, in Face Recognition: From Theory to Applications, H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang, eds., Vol. 163, pp. 124-156.
[11] L. Breiman, J. Friedman, R. Olshen, C. Stone, (1984). Classification and Regression Trees, Wadsworth.
[12] A. Lanitis, C.J. Taylor, T.F. Cootes, (1995). An Automatic Face Identification System Using Flexible Appearance Models, Image and Vision Computing, Vol. 13, No. 5, pp. 393-401.
[13] G.J. Edwards, C.J. Taylor, T. Cootes, (1998). Learning to Identify and Track Faces in Image Sequences, Proc. Sixth IEEE Int'l Conf. Computer Vision, pp. 317-322.
[14] M. Turk, A. Pentland, (1991). Eigenfaces for Recognition, J. Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86.
[15] E. Osuna, R. Freund, F. Girosi, (1997). Training Support Vector Machines: An Application to Face Detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 130-136.
[16] F. Samaria, S. Young, (1994). HMM Based Architecture for Face Identification, Image and Vision Computing, Vol. 12, pp. 537-583.
[17] H. Rowley, S. Baluja, T. Kanade, (1996). Neural Network-Based Face Detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 203-208.
[18] H. Rowley, S. Baluja, T. Kanade, (1998). Rotation Invariant Neural Network-Based Face Detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 38-44.
[19] R. E. Schapire, (1999). A Brief Introduction to Boosting, Proc. Sixteenth International Joint Conference on Artificial Intelligence, pp. 246-252.
[20] R. Brunelli, T. Poggio, (1993). Face Recognition: Features Versus Templates, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042-1052.
[21] A. Pentland, B. Moghaddam, T. Starner, (1994). View-based and Modular Eigenspaces for Face Recognition, IEEE Conf. Computer Vision and Pattern Recognition, pp. 84-91.
[22] Z. H. Zhou, X. Geng, (2004). Projection Functions for Eye Detection, Pattern Recognition, Vol. 37, No. 5, pp. 1049-1056.
[23] R. L. Hsu, M. Abdel-Mottaleb, A. K. Jain, (2002). Face Detection in Color Images, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706.
[24] J. Miao, B. Yin, K. Wang, L. Shen, X. Chen, (1999). A Hierarchical Multiscale and Multiangle System for Human Face Detection in a Complex Background Using Gravity-Center Template, Pattern Recognition, Vol. 32, No. 7, pp. 1237-1248.
[25] P. Viola, M. Jones, (2001). Robust Real Time Object Detection, Proc. 2nd International Workshop on Statistical and Computational Theories of Vision, Vancouver, Canada.
[26] K. Grauman, M. Betke, J. Gips, G. R. Bradski, (2001). Communication via Eye Blinks - Detection and Duration Analysis in Real Time, Proc. IEEE Conf. Computer Vision and Pattern Recognition, Lihue, HI, Vol. 1, pp. I-1010 - I-1017.
[27] S. Haykin, (2005). Neural Networks: A Comprehensive Foundation, Second Edition.
[28] L. Huchuan, Y. Z. Wei, Y. Del., (2007). Eye Detection Based on Rectangle Features and Pixel-Pattern-Based Texture Features, Department of Electronic Engineering, Dalian University of Technology.
[29] G. Yepeng, (2007). Robust Eye Detection from Facial Image Based on Multi-cue Facial Information, School of Communication and Information Engineering, Shanghai University.
[30] N. A. Jalal, K. Sara, P. R. Hamid, (2008). Eye Detection Algorithm on Facial Color Images, Department of Computer Engineering, Ferdowsi University of Mashhad.
[31] H. M. Peixoto, A. M. G. Guerreiro, A. D. D. Neto, (2009). Image Processing for Eye Detection and Classification of the Gaze Direction, Proc. International Joint Conference on Neural Networks, Atlanta, Georgia, USA.
[32] R. M. Haralick, L. G. Shapiro, (1992). Computer and Robot Vision, Volume I, Addison-Wesley, pp. 28-48.
[33] E. Osuna, R. Freund, F. Girosi, (1997). Training Support Vector Machines: An Application to Face Detection, Proc. CVPR, pp. 130-136.
[34] G. Pan, W. Lin, Z. Wu, Y. Yang, (2002). An Eye Detection System Based on SVM Filter, Proc. SPIE, Vol. 4925, pp. 326-331.
[35] H.-J. Kim, W.-Y. Kim, (2008). Eye Detection in Facial Images Using Zernike Moments with SVM, ETRI Journal, Vol. 30, No. 2.
[36] H. Jee, K. Lee, S. Pan, (2004). Eye and Face Detection Using SVM, IEEE.
[37] P. Viola, M. Jones, (2004). Robust Real-Time Object Detection, International Journal of Computer Vision, Vol. 57, No. 2, pp. 137-154.
[38] S. Shan, W. Gao, Y. Chang, B. Cao, P. Yang, (2004). Review the Strength of Gabor Features for Face Recognition from the Angle of Its Robustness to Mis-alignment, International Conference on Pattern Recognition, pp. 338-341.
[39] P. Wang, M. B. Green, Q. Ji, J. Wayman, (2005). Automatic Eye Detection and Its Validation, IEEE.
[40] P. Viola, M. Jones, (2004). Robust Real-Time Object Detection, International Journal of Computer Vision, Vol. 57, No. 2, pp. 137-154.
[41] B. Xiang, X. Cheng, (2010). Eye Detection Based on Improved AD AdaBoost Algorithm, 2nd International Conference on Signal Processing Systems, IEEE.
[42] C. Li, X. Ding, Y. Wu, (2007). A Revised AdaBoost Algorithm - AD AdaBoost, Chinese Journal of Computers, Vol. 30, pp. 103-109.
Zeynep Orman received the B.Sc., M.Sc. and Ph.D. degrees from Istanbul University, Istanbul, Turkey, in 2001, 2003 and 2007, respectively. She has studied as a postdoctoral research fellow in the Department of Information Systems and Computing, Brunel University, London, UK in 2009. She is currently working as an Assistant Professor in the Department of Computer Engineering, Istanbul University. Her research interests are artificial neural networks, nonlinear systems, image processing applications and intelligent systems.
Erdem KEMER is currently studying for his Master of Engineering degree in Computer Science & Engineering at Istanbul University. His major research interests are Artificial Intelligence and Software Development.
Abdulkadir BATTAL is currently studying for his Master of Engineering degree in Computer Science & Engineering at Istanbul University. His major research interests are Artificial Intelligence and Software Development.