Depth Detection of Facial Feature
Sushma H. R., M.Tech student, Department of ISE, P.E.S.I.T., India

Abstract — A central problem in computer vision is calculating depth from two images. This paper calculates the depth of a facial feature (the nose) using left and right images. The left and right images are captured by left and right cameras mounted horizontally, separated by a small distance, such that both cameras capture the same scene. The corresponding feature point (the nose) is extracted from both the left and right images, and the triangulation method is then used to calculate the 3D distance of the nose. The triangulation method requires a disparity map along with the correspondence points.
Keywords — Stereo calibration, stereo rectification.

I. INTRODUCTION
Depth detection of a facial feature is about finding the feature's distance from the camera using two images. Depth detection is applied in various fields such as robotics and 3D model generation.
In robotics, depth gives information about the position of an object. By calculating the distances of similar objects, a robot can more easily distinguish between them.
By calculating the 3D values of multiple points, we can generate a 3D model.
Depth information for a 2D image can be calculated in several ways. One way to detect depth is with a laser range camera; another is to use a pair of images in combination and then apply triangulation. The most common method is called stereo vision, stereo matching, or stereo correspondence. Stereo correspondence is about finding the same corresponding point in the given left and right input images; these images are captured by two slightly separated cameras, stored in a database, and then given as input for depth detection.

II. RELATED WORK

One of the vital problems in computer vision is calculating the depth to an object using left and right images taken by two cameras. One method makes use of the focal-length-to-pixel ratio, which relates the image formed in the lens to the scene in the outside world [1]. A Markov Random Field (MRF) algorithm can be used to capture monocular cues and incorporate them into a stereo system; monocular cues combined with stereo (triangulation) cues give more accurate depth estimates than either kind of cue alone [2].
Stereo camera depth is applied in human-centered applications, which motivates studying how stereo camera depth resolution compares with human depth resolution. Stereo vision is what provides depth to the human eye, through the disparity between the two images seen by the left and right eyes [3].
Another work shows the importance of camera calibration for the performance of depth reconstruction using stereo imaging, and provides formulae that relate different parameter errors to the 3D reconstruction measurements [4].

III. PROPOSED METHOD

In this paper we use two images to calculate the depth of a facial feature; the methodology used is stereo imaging. Stereo imaging is an ability our eyes give us; the question is how far it can be achieved in computational systems. Computers achieve it by finding the correspondence of the same point in both the left and right images. Once the corresponding point is found in both images, the triangulation method is applied to calculate its 3D distance.
Stereo imaging involves the following steps when using two images to calculate depth.
Step 1: Remove the errors introduced by the lens, such as manufacturing defects and the bulging effect; this process is called undistortion.
Step 2: Align the left and right images so that they lie on the same plane.
Step 3: Find the corresponding points in the left and right images; the output of this step is a disparity map.
Step 4: With the geometric arrangement of the cameras known, use the disparity map in triangulation to find the 3D distance.
The first phase is image acquisition, where images of chessboard patterns and of the user from the left and right cameras are captured and saved as left and right images respectively in the database.
International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 7, July 2013
The second phase is stereo calibration, where
the cameras are calibrated and the output matrices of the calibrated cameras are obtained; these are used for rectification, and the matrices obtained after rectification are further required for depth detection of the facial feature. The third phase is face detection using the built-in library of OpenCV, followed by feature extraction on the detected face. In the last phase, the corresponding nose points are extracted and triangulation is applied to them to detect the depth of the facial feature.
A. Image Acquisition
This is a prerequisite stage in which images are captured and stored in the database. To capture the images, the two cameras are mounted horizontally, separated by a distance such that both cameras can capture the same scene. The OpenCV functions that allow us to interact with hardware such as cameras are collected into a library called HighGUI (which stands for high-level graphical user interface).
Figure 1: The System Architecture

B. Stereo Calibration
Stereo calibration is the process of finding the geometry between the two cameras [5]: the rotation matrix and translation vector that relate them. Both rotation and translation are calculated by the function cvStereoCalibrate(), which produces a single rotation matrix and translation vector relating the right camera to the left camera. The output of stereo calibration is used for stereo rectification, the process of correcting the individual images so that they appear coplanar.
The result of aligning the two image planes is eight terms, four each for the left and right cameras. For each camera we require a distortion vector (the distortion coefficients), a rotation matrix (to apply to the image), and the rectified and unrectified camera matrices. To perform the rectification, Bouguet's algorithm is used, which takes the rotation and translation parameters of the two calibrated cameras.
The captured left and right chessboard patterns are given as input to stereo calibration, which is performed in OpenCV using the function cvStereoCalibrate(). The output of stereo calibration is the rotation, translation, and undistortion vectors. Stereo calibration is followed by stereo rectification, which uses the cvStereoRectify() function.
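The quantity cvStereoCalibrate() recovers is the relative pose between the two cameras. As a minimal pure-Python sketch (not the OpenCV call itself, and with hypothetical camera poses), the rotation and translation relating the right camera to the left can be composed from each camera's world rotation and translation:

```python
# Sketch of the relative pose that stereo calibration recovers: given each
# camera's pose in world coordinates (R, T), the rotation and translation
# taking left-camera coordinates to right-camera coordinates are
# R_rel = R_r * R_l^T and T_rel = T_r - R_rel * T_l.
# Pure-Python 3x3 helpers; the poses below are hypothetical values.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def relative_pose(R_l, T_l, R_r, T_r):
    R_rel = mat_mul(R_r, transpose(R_l))
    RT_l = mat_vec(R_rel, T_l)
    T_rel = [T_r[i] - RT_l[i] for i in range(3)]
    return R_rel, T_rel

# Hypothetical example: both cameras axis-aligned, the right camera
# shifted 6 cm along x (a typical small stereo baseline).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_rel, T_rel = relative_pose(I3, [0.0, 0.0, 0.0], I3, [6.0, 0.0, 0.0])
print(R_rel)  # identity: the cameras are not rotated relative to each other
print(T_rel)  # [6.0, 0.0, 0.0]: a pure horizontal baseline
```

In the actual pipeline, these quantities come out of the chessboard-based calibration rather than from known poses; the sketch only illustrates what the single rotation matrix and translation vector mean.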
C. Face Detection
The main purpose of face detection is to find the corresponding point in both the left and right images. OpenCV implements the face-detection technique first developed by Paul Viola and Michael Jones, known as the Viola-Jones detector. The detector uses a Haar feature-based cascade classifier. OpenCV detects the face using a set of pre-trained cascade files, which are in XML format.
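The speed of the Viola-Jones detector comes from evaluating Haar-like rectangle features on an integral image, where any rectangle sum costs only four lookups. A minimal pure-Python sketch of that trick (not the OpenCV implementation, and on a tiny made-up image):

```python
# Minimal sketch of the integral image used by the Viola-Jones detector.
# ii[y][x] holds the sum of all pixels above and to the left of (x, y),
# so any rectangle sum needs only four table lookups.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of pixels in the w-by-h rectangle with top-left corner (x, y).
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))  # 45, the whole image
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

A Haar feature is then just a difference of such rectangle sums, which is why the cascade can scan many window positions and scales quickly.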
D. Facial Feature Extraction
The facial feature, i.e. the nose, is extracted in OpenCV using a pre-trained nose cascade file, which is in XML format. Once the nose is extracted, a midpoint is marked at the centre of the detected nose. This nose point is taken as the corresponding feature point in both the left and right images.
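The cascade detector reports the nose as a bounding box, and the correspondence point is its centre. A minimal sketch with hypothetical detections for the two views (the box values are made up for illustration):

```python
# The cascade detector returns the nose as a bounding box (x, y, w, h);
# the correspondence point used for triangulation is the box centre.

def nose_midpoint(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

# Hypothetical nose boxes from the left and right images: the x
# coordinates differ because of disparity, while the y coordinates
# agree once the images are rectified (row-aligned).
left_box = (150, 120, 40, 30)
right_box = (138, 120, 40, 30)

nl = nose_midpoint(left_box)    # (170.0, 135.0)
nr = nose_midpoint(right_box)   # (158.0, 135.0)
disparity = nl[0] - nr[0]
print(disparity)  # 12.0 pixels
```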
E. Depth Detection of the Facial Feature
Once stereo calibration and stereo rectification have been performed, we obtain the disparity map in the form of a vector. This disparity map is used to calculate depth by triangulation; the process is called reprojection, and its output is a depth map.
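The disparity map rests on stereo correspondence: for a patch in the left image, find how far it has shifted in the right image. A much-simplified pure-Python sketch of this idea on a single rectified scanline, using sum-of-absolute-differences matching (real matchers such as OpenCV's block matcher work on 2D windows with many refinements; the synthetic rows below are made up):

```python
# Simplified stereo correspondence on one rectified scanline: for a patch
# in the left row, find the offset in the right row that minimises the
# sum of absolute differences (SAD). This is only the core idea behind
# block-matching disparity computation.

def match_disparity(left_row, right_row, x, half, max_disp):
    patch = left_row[x - half: x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(0, max_disp + 1):
        xr = x - d                       # the feature shifts left in the right image
        if xr - half < 0:
            break
        cand = right_row[xr - half: xr + half + 1]
        cost = sum(abs(a - b) for a, b in zip(patch, cand))
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Synthetic rows: the bright bump centred at x = 7 in the left row
# appears at x = 4 in the right row, i.e. a true disparity of 3.
left  = [0, 0, 0, 0, 0, 0, 9, 18, 9, 0, 0, 0]
right = [0, 0, 0, 9, 18, 9, 0, 0, 0, 0, 0, 0]
print(match_disparity(left, right, x=7, half=1, max_disp=5))  # 3
```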
Triangulation
The stereo calibration process eliminates radial distortion (the bulging of the image) and tangential distortion (due to manufacturing defects). Stereo rectification makes the left and right images row-aligned and coplanar. We now have undistorted, aligned left and right images that are coplanar, with exactly parallel optical axes a known distance apart, and with equal focal lengths fl = fr.
Taking Nl and Nr to be the positions of the nose point in the left and right images, the depth is inversely proportional to the disparity between the left and right nose points. The disparity is defined as the difference between the left and right nose points and is denoted d = Nl − Nr. This is shown in Figure 2, and the depth Z is derived using the similarity of triangles.
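The similar-triangles relation can be sketched directly in code. The function below computes Z = fT/(Nl − Nr); the focal length, baseline, and nose coordinates used in the example are hypothetical values, not measurements from this paper's setup:

```python
# Depth from disparity by similar triangles: Z = f * T / (Nl - Nr),
# with focal length f in pixels, baseline T in the unit desired for the
# depth, and Nl, Nr the x coordinates of the nose point in the left and
# right rectified images. All numbers below are hypothetical.

def depth_from_disparity(f_px, baseline, nl_x, nr_x):
    disparity = nl_x - nr_x
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or mismatched")
    return f_px * baseline / disparity

# Hypothetical values: f = 700 px, baseline T = 6 cm, disparity 7 px.
z = depth_from_disparity(700.0, 6.0, 170.0, 163.0)
print(z)  # 600.0 (same unit as the baseline, here cm)
```

Note that depth falls as disparity grows: nearby points shift a lot between the two views, distant points hardly at all.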
Figure 2: Triangulation (similarity of triangles). The figure shows the nose points Nl and Nr, the focal length f, the baseline T between the camera centres Ol and Or, the depth Z, and the disparity d = Nl − Nr.
The equation for the depth Z follows from the similar triangles:
(T − (Nl − Nr)) / (Z − f) = T / Z
Z = f T / (Nl − Nr)

IV. RESULTS
Table 1: Results for the image set

Left/right image number | Face and nose detected | X-axis | Y-axis | Z-axis
Image 001 | Yes | -1.23 | 13.33 | 64.90
Image 002 | Yes | -3.66 | 12.30 | 66.56
Image 003 | Yes | -3.48 | 13.01 | 63.32
Image 004 | Yes | -2.73 | 11.49 | 64.90
Image 005 | Yes | -5.31 | 12.22 | 61.81
Image 006 | Yes | -4.95 | 8.43 | 63.31
Image 007 | Yes | -3.40 | 11.90 | 61.80
Image 008 | Yes | -6.43 | 11.78 | 60.37
Image 009 | Yes | -5.31 | 9.81 | 61.81
Image 010 | Yes | 0.09 | 10.69 | 60.37
Image 011 | Yes | -1.68 | 9.24 | 45.54
Image 012 | Yes | -3.18 | 10.49 | 55.23
Image 013 | Nose not detected | Garbage value | Garbage value | Garbage value
Image 014 | Nose not detected | Garbage value | Garbage value | Garbage value
Image 015 | Nose not detected | Garbage value | Garbage value | Garbage value
The table above gives the X-, Y-, and Z-axis distances for images captured at varying positions and distances. Face detection is performed on both the left and right images; after face detection, the nose midpoint is used as the corresponding point in the two images, and triangulation is applied to detect the depth.
V. CONCLUSION AND FUTURE WORK

Depth detection of a facial feature (the nose) is achieved using the stereo imaging concept. This gives us calculated values on the Z-axis; thus depth can be acquired from left and right images by extracting the corresponding feature point from both images and applying the triangulation method to the acquired points. Depth detection can be extended to more facial features, such as the eyes, nose, mouth, eyebrows, and jawline points, which can then be used in 3D face model generation. If depth can be estimated for the entire image rather than a single feature, it can be applied in Free Viewpoint Television (FTV) and Multi-View Coding (MVC).

REFERENCES
[1] Luis Copertari, Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, "Stereoscopic vision for depth perception," Investigación Científica, August 2007, Volume 3.
[2] Ashutosh Saxena, Jamie Schulte, and Andrew Y. Ng, "Depth Estimation using Monocular and Stereo Cues," Stanford University, Stanford, CA 94305.
[3] Mikko Kytö, Mikko Nuutinen, and Pirkko Oittinen, "Method for measuring stereo camera depth accuracy based on stereoscopic vision," Aalto University School of Science and Technology, Department of Media Technology, Otaniementie 17, Espoo, Finland.
[4] Wenzi Zhao and N. Nandhana Kumar, "Effects of Camera Alignment Errors on Stereoscopic Depth Estimates," University of Virginia, VA 22903.
[5] Gary Bradski and Adrian Kaehler, Learning OpenCV, First Edition, September 2008.
[6] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[7] Hua Gu, Guangda Su, and Cheng Du, "Feature Points Extraction from Faces," Image and Vision Computing NZ, Palmerston North, November 2003.