
2008 International Conference on Computer Science and Software Engineering

A Vision-Based Algorithm for Landing Unmanned Aerial Vehicles


Yang Fan, Shi Haiqing, Wang Hong
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
yang-f04@mails.tsinghua.edu.cn

Abstract
Autonomous landing is an important part of the autonomous control of Unmanned Aerial Vehicles (UAVs). In this article we present the design and implementation of a vision algorithm for autonomous landing. An onboard camera obtains an image of the ground where the landing target (landmark) is laid, and a method based on Zernike moments is responsible for target detection and recognition. The pose and position of the UAV are then estimated from the image of the landmark. We also present the results of experiments on an OpenGL-based testbed, as well as trials on simulated flight video sequences, which demonstrate the accuracy and efficiency of our algorithm.

Fig 1. Rotary-wing (helicopter) UAV

1. Introduction
UAV is the abbreviation of Unmanned Aerial Vehicle. UAVs are widely used for reconnaissance and surveillance [1][3] because of their low price and compact size and, most importantly, their ability to carry out dangerous missions without human pilots aboard. There are two kinds of UAVs: fixed-wing aircraft and rotary-wing aircraft (helicopters). Helicopters can hover and take off and land vertically, which allows them to accomplish tasks that fixed-wing aircraft cannot perform; this article focuses on the autonomous landing of unmanned helicopters (Fig 1). Autonomous flight control is a basic requirement for UAV applications, and autonomous landing is an important part of it. Autonomous landing means that a UAV should locate the designated location and finally land on it without ground control. Since vision is well suited to object detection and recognition, combining computer vision with landing control has become a focus of UAV research [1][2]. There are two major subjects in vision-based landing control: the detection and recognition of the landing target (in most circumstances, a landmark), and the estimation of the pose and position of the UAV. In this article we present a vision algorithm to solve these two problems effectively.

2. System Design
The landmark is designed as a white letter H (Fig 2), in compliance with the Convention on International Civil Aviation. The ground image is obtained by the onboard camera and sent to the vision module for landmark recognition and state estimation. The calculated pose and position are then transmitted to the control module of the UAV for further navigation. Before describing the vision algorithm, we need to make some assumptions: the camera is perpendicular to the ground, and the vertical axis of the camera coincides with the principal axis of the UAV. These assumptions ensure that the landmark in the image obtained by the onboard camera changes only in rotation, translation and scaling, not in perspective projection [1].

3. Landmark Recognition
In this section we describe the vision algorithm that extracts and identifies the landmark in the aerial image. It consists of three parts: preprocessing, landmark detection and extraction, and landmark recognition.

3.1 Preprocessing



In this stage we reduce the redundant information in the image to facilitate detection and recognition (Fig 2a, b).
1) RGB to gray. First we convert the color image obtained from the camera to grayscale to reduce the computational cost and focus on the intensity of the image. The conversion is performed with the following equation [1][7]:
Y = 0.299R + 0.587G + 0.114B    (1)
where Y, R, G and B represent the intensity and the red, green and blue values of the image respectively.
2) Image enhancement. In order to remove white noise, we apply a 3x3 median filter to the grayscale image [1][7]. Median filters also effectively eliminate the salt-and-pepper noise caused by wireless transmission.
3) Thresholding. We then produce a binary image in which the landmark is preserved and as much of the remaining image data as possible is removed (Fig 2b). The fixed threshold is chosen after various tests on simulation data.
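The three steps above can be summarized in a short sketch. The following Python/OpenCV code is only a minimal illustration of the chain described in this section, not the paper's implementation; the function name and the threshold value are placeholders (the paper tunes a fixed threshold on simulation data but does not report it).

```python
import cv2

def preprocess(frame_bgr, threshold=200):
    """Grayscale conversion, 3x3 median filtering and fixed thresholding.

    `threshold` is a placeholder value, not the one used in the paper.
    """
    # Eq. (1): OpenCV's BGR->GRAY conversion uses the same 0.299/0.587/0.114 weights.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # 3x3 median filter to suppress salt-and-pepper noise from wireless transmission.
    denoised = cv2.medianBlur(gray, 3)
    # Fixed threshold: the white landmark becomes foreground (255), the rest background (0).
    _, binary = cv2.threshold(denoised, threshold, 255, cv2.THRESH_BINARY)
    return binary
```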

Fig 2. Landmark Recognition. (a) Initial image. (b) Binary image. (c) Segmented image. (d) Region of landmark.

3.2 Landmark Detection and Extraction


The goal of this stage is to find and extract the image regions that may contain the landmark (Fig 2c). This is accomplished in the following two steps:
1) Segmentation. After thresholding we obtain a binary image containing the landmark. However, there are usually many other objects in the image as well. In order to distinguish them from the landing target, we first partition the image into connected regions. The segmentation method we choose is based on region growing [7]. After segmentation the image has been partitioned into a set of regions, with the landmark region among them.
2) Extraction. The number of pixels in each region varies widely, and if we can estimate the probable pixel count of the landmark we can exclude regions that are too large or too small. One way to decide the area thresholds is to use the height of the UAV given by GPS, since the area of the landmark in the image is inversely proportional to the square of the height. However, we currently use fixed thresholds instead: regions with more than 10000 pixels or fewer than 100 pixels are discarded.
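As an illustration of these two steps, the sketch below uses OpenCV connected-component labeling as a stand-in for the region-growing segmentation described in the paper (whose code is not available); only the 100- and 10000-pixel area bounds come from the text.

```python
import cv2
import numpy as np

def extract_candidate_regions(binary, min_area=100, max_area=10000):
    """Return masks of connected regions whose area lies within the fixed bounds.

    Connected-component labeling is used here in place of the paper's
    region-growing segmentation.
    """
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    candidates = []
    for label in range(1, num_labels):          # label 0 is the background
        area = stats[label, cv2.CC_STAT_AREA]
        if min_area <= area <= max_area:        # discard regions too large or too small
            candidates.append((labels == label).astype(np.uint8))
    return candidates
```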

3.3 Landmark Recognition

As shown in Fig 2c, we may still have more than one region after the extraction. In this stage we finally decide which region is indeed the landmark we need (Fig 2d). Since we have assumed that the camera is perpendicular to the ground, we only need to handle the rotation, translation and scaling of the landmark. Of the many methods devised for pattern recognition, invariant moments are ideal for this task since they are based on geometric shape and yield a set of descriptors that are invariant to those transforms [1]. The vision algorithm developed by USC uses one such family of moments, the Hu moments [4], for landmark recognition [1]. However, in our simulation experiments the Hu moments do not perform well, especially when the landmark is rotated. In our algorithm we therefore choose Zernike moments [5] instead. For a discrete two-dimensional image intensity function I(x, y), the Zernike moment A_nm is defined as [6]:

$$A_{nm} = \frac{n+1}{\lambda_N} \sum_{x} \sum_{y} R_{nm}(\rho)\, e^{-jm\theta}\, I(x, y) \qquad (2)$$

where $j = \sqrt{-1}$, $\lambda_N$ is a normalization constant equal to the number of pixels in the unit disk, n is the order of the moment (a non-negative integer), and m is the repetition, a positive or negative integer subject to the conditions that n - |m| is even and |m| <= n. $\rho$ and $\theta$ are the polar coordinates of the pixel:

$$\rho = \sqrt{x^2 + y^2}, \qquad \theta = \arctan(y/x) \qquad (3)$$

and $R_{nm}(\rho)$ is the orthogonal radial polynomial given by:

3.3 Landmark Recognition

Rnm ( ) =

n |m| 2 k =0

nmk

n 2 k

(4)

where $B_{nmk}$ is a coefficient calculated as:


$$B_{nmk} = (-1)^k\, \frac{(n-k)!}{k!\left(\frac{n+|m|}{2}-k\right)!\left(\frac{n-|m|}{2}-k\right)!} \qquad (5)$$

The most important characteristic of Zernike moments is that, by definition, the magnitude of $A_{nm}$ is itself invariant to rotation [5][6], while Hu moments obtain such invariance only through additional computation [4]. Since Zernike moments are defined on the unit disk, to compute the moments of an image region we first find the gravity center of the region and map the region into the unit disk. In our algorithm we choose A20 and A31 to distinguish the landmark from other regions. The corresponding radial polynomials can be calculated beforehand as:

$$R_{20}(\rho) = 2\rho^2 - 1 \qquad (6)$$

$$R_{31}(\rho) = 3\rho^3 - 2\rho \qquad (7)$$

If the A20 and A31 values calculated for a region lie within tolerances of 20% and 35%, respectively, of the values obtained from a standard landmark image, we consider the region a possible landmark. If more than one region meets this requirement, we choose the one with the smallest error.
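A minimal numerical sketch of this descriptor computation is given below. It follows Eqs. (2)-(7) but is our own illustration, not the paper's code: the way the region is mapped onto the unit disk and the use of the moment magnitude for rotation invariance are assumptions based on the description above.

```python
import numpy as np

# Radial polynomials used by the algorithm, Eqs. (6) and (7).
R20 = lambda p: 2 * p**2 - 1
R31 = lambda p: 3 * p**3 - 2 * p

def zernike_descriptor(mask, n, m, radial):
    """|A_nm| of a binary region mask, following Eq. (2).

    The region is centered on its gravity center and scaled so that it fits
    inside the unit disk; lam is the number of pixels inside the disk.
    The magnitude is returned because |A_nm| is rotation invariant.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                       # gravity center of the region
    scale = np.hypot(ys - cy, xs - cx).max() or 1.0     # radius mapping the region into the disk
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    rho = np.hypot(xx - cx, yy - cy) / scale
    theta = np.arctan2(yy - cy, xx - cx)
    disk = rho <= 1.0
    lam = disk.sum()                                    # normalization constant of Eq. (2)
    integrand = mask.astype(float) * radial(rho) * np.exp(-1j * m * theta)
    return abs((n + 1) / lam * integrand[disk].sum())

# A candidate region is accepted as the landmark if its |A20| and |A31| lie within
# 20% and 35%, respectively, of the values of a reference landmark image; if several
# regions qualify, the one with the smallest error is kept.
# a20 = zernike_descriptor(region_mask, 2, 0, R20)
# a31 = zernike_descriptor(region_mask, 3, 1, R31)
```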

4. Pose and Position Estimation

In this section we use the image region of the landmark to estimate the pose and position of the UAV.

Fig 3. Definition of (a) Position and (b) Pose

The position of the UAV refers to the distances between the center of the landmark on the ground and the UAV, both along and perpendicular to the principal axis of the UAV (i.e., the vertical axis of the camera) (Fig 3a). These two values are needed to adjust the heading and speed of the UAV. The pose of the UAV refers to the angle between the principal axis of the UAV and the vertical axis of the landmark (Fig 3b). We calculate this value because the UAV must finally land on the landmark with its principal axis aligned with that of the landmark.

4.1 Position Estimation

Since the camera is perpendicular to the ground, we simply take the distance between the UAV and the landmark as corresponding to the pixel distance between the image center and the landmark region [1]. The height of the UAV obtained from GPS is needed to convert this to actual distances. If the number of pixels between the image center and the gravity center of the landmark region along the vertical axis is y, and the focal length of the camera is f, then the distance along the UAV's principal axis can be calculated as

$$dist_Y = \frac{y \cdot height}{f} \qquad (8)$$

Similarly, the distance perpendicular to the principal axis is given by

$$dist_X = \frac{x \cdot height}{f} \qquad (9)$$

From the equations above we can see that the accuracy of the calculation depends mainly on how accurately the gravity center of the landmark region is located.

4.2 Pose Estimation

Fig 4. Pose Estimation

In the pose estimation we also need to locate the gravity center of the landmark in the image. In addition, we find the pixel in the region farthest from the gravity center. With this pair of pixels we can draw a diagonal of the landmark. However, for the letter H one diagonal may correspond to two poses (Fig 4). To determine which of the two is the pose of the landmark, we use a standard landmark image to calculate the angle between two corner points of the H (the angle marked in Fig 4a). Then, according to the diagonal we obtained, we calculate the coordinates of the corresponding two points of the image region (points A and B in Fig 4a) and check which of the two lies farther from the region. If point A in Fig 4a is the closer one, the pose is that of Fig 4a; otherwise it is that of Fig 4b. After determining the pose, we estimate the pose angle using this angle and the slope of the diagonal.
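The position computation of Eqs. (8) and (9), together with the diagonal used as the starting point of the pose estimation, can be sketched as follows. This is an illustrative sketch under assumed conventions (focal length expressed in pixels, offsets measured from the image center); the two-fold H ambiguity resolution of Fig 4 is not shown.

```python
import numpy as np

def estimate_position(landmark_center_px, image_size_px, height_m, focal_px):
    """Distances between the UAV and the landmark center, Eqs. (8) and (9).

    landmark_center_px: (cx, cy) gravity center of the landmark region, in pixels.
    image_size_px:      (width, height) of the image.
    height_m:           UAV altitude obtained from GPS, in meters.
    focal_px:           camera focal length expressed in pixels (assumed convention).
    """
    cx, cy = landmark_center_px
    w, h = image_size_px
    x = cx - w / 2.0                       # pixel offset perpendicular to the principal axis
    y = cy - h / 2.0                       # pixel offset along the principal axis
    dist_x = x * height_m / focal_px       # Eq. (9)
    dist_y = y * height_m / focal_px       # Eq. (8)
    return dist_x, dist_y

def landmark_diagonal_angle(mask):
    """Orientation of the landmark diagonal: the line from the gravity center to
    the farthest pixel of the region (the ambiguity check of Fig 4 is omitted)."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    far = np.argmax((ys - cy) ** 2 + (xs - cx) ** 2)
    return np.degrees(np.arctan2(ys[far] - cy, xs[far] - cx))
```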

5. Experiment Results


The experiments were done in two stages: first we used a testbed to check the accuracy of the pose and position estimation algorithm, and then we applied the algorithm to flight simulation video sequences to verify the recognition efficiency.
1) Results on the testbed. The testbed was developed using OpenGL; it allows us to freely change the position and pose of the landmark in the image from a control panel and to observe the computational error. The resolution of the image is 640x480 and the diameter of the landmark is 2 meters. We chose eight different flying heights ranging from 6 meters to 20 meters, and at each altitude we chose five different poses and five different positions to gather the results. Results obtained on a Pentium IV 1.73 GHz CPU are shown in Table 1. From the table we can see that the average position error is smaller than 5 cm and the pose error is within 3 degrees. Moreover, the temporal cost per frame is under 50 ms. These results demonstrate the accuracy of our pose and position estimation and the real-time capability of the whole algorithm under limited noise.

Table 1. Results on the testbed

                      RMS            Maximum
Position Error X      4.21253 cm     8.149196 cm
Position Error Y      1.20984 cm     1.646628 cm
Pose Error            0.563808 deg   1.416945 deg
Temporal Cost         39.9841 ms     44.1866 ms

2) Results on the flight simulation video sequences. These video sequences were captured on the ground by keeping the camera perpendicular to the landmark and manually changing the pose of the camera to simulate the rotation, translation and scaling of the landmark (Fig 5). We applied our algorithm to these sequences to verify the accuracy of landmark recognition and to test the real-time computational capability. Results obtained on a Pentium IV 1.73 GHz CPU are shown in Table 2. Our algorithm succeeds in recognizing 96.1% of the frames containing the landmark, and the processing speed is about 10 frames per second. In conclusion, these experiments show that our vision algorithm is accurate and efficient, especially when interference is limited.

Fig 5. Sample frames of the video sequences

Table 2. Results on video sequences

Number of Frames            3877
Correctly Recognized        3725
Recognition Rate            96.1%
Average Temporal Cost       78.5543 ms

6. Conclusion and Future Work


In this paper we present a vision algorithm for the autonomous landing problem of UAVs. The experimental results show that our algorithm is accurate and efficient: the pose and position errors are within 3 degrees and 10 cm, and the processing speed is about 10 frames per second, which is fast enough for UAV navigation. The most important future work is to test the robustness of our algorithm on a real UAV. We also plan to use adaptive thresholds in the image preprocessing and landmark extraction, which should improve the recognition accuracy and reduce the computational cost.


References
[1] S. Saripalli, J. F. Montgomery, and G. S. Sukhatme, "Visually Guided Landing of an Unmanned Aerial Vehicle," IEEE Transactions on Robotics and Automation, vol. 19, no. 3, 2003, pp. 371-380.
[2] C. S. Sharp, O. Shakernia, and S. S. Sastry, "A Vision System for Landing an Unmanned Aerial Vehicle," IEEE International Conference on Robotics and Automation, 2001, pp. 1720-1727.
[3] I. F. Mondragon, P. Campoy, J. F. Correa, and L. Mejias, "Visual Model Feature Tracking for UAV Control," IEEE International Symposium on Intelligent Signal Processing, 2007, pp. 1-6.
[4] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inform. Theory, vol. IT-8, pp. 179-187, 1962.
[5] M. R. Teague, "Image analysis via the general theory of moments," J. Opt. Soc. Am., vol. 70, no. 8, pp. 920-930, 1980.
[6] M. Al-Rawi, "Fast Zernike Moments," J. Real-Time Image Processing, vol. 3, 2008, pp. 89-96.
[7] R. Gonzalez and R. Woods, Digital Image Processing, 2nd ed., Addison-Wesley Longman Publishing Co., Inc., 2003.


