
Vision Based Robot Navigation-Techniques & Algorithm Development

A SEMINAR REPORT Submitted by

Surya Kant
In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
in ELECTRONICS & COMMUNICATION ENGINEERING

IIT

DELHI
September 2011

Acknowledgement

I thank my guide Shri Sunil Agrawal for guiding me through my project, Vision Based Vehicle Navigation - Techniques & Algorithm Development. I thank him for giving me the opportunity to work on this topic for my seminar and for constantly encouraging me to pursue a creative and innovative topic. I am deeply thankful for the guidance and the useful direction that I received while working on the report and the presentation, for the cooperation and the trust that he placed in me and my seminar, and for the support I received throughout the seminar duration, which helped me complete the seminar presentation on time.

Sincerely,
Sachmanik Singh Cheema
IV Year Student, Electronics & Communication
University Institute of Engineering & Technology, Panjab University

TABLE OF CONTENTS

List of Figures
Figure 1: Mars Rover
Figure 2: Related Areas to Computer Vision
Figure 3: Example of Motion Analysis
Figure 4: Computer Vision in Autonomous Vehicle Navigation
Figure 5: Example of Path Planning
Figure 6: Vision Task Sequencer
Figure 7: A Vision Based Robot for Unstructured Environment
Figure 8: Path Approximation
Figure 9: Pixels in a Binary Image
Figure 10: Cropped Image
Figure 11: The Final Working Image
Figure 12: The Computed Deviation
Figure 13: Image Enhancement
Figure 14: Graphical User Interface
Figure 15: Cars Used By Google Street View
Figure 16: Example of Map Building
Figure 17: A View of the Hardware of the Google Driverless Vehicle
Figure 18: Front View of the Google Driverless Vehicle
Figure 19: A Similar Vehicle Developed By Carnegie Mellon University for the DARPA Challenge

Contents
1. Introduction
 1.1 Computer Vision
 1.2 Computer Vision: A Continuously Emerging Field
 1.3 Typical Tasks of Computer Vision
  1.3.1 Recognition
  1.3.2 Motion Analysis
  1.3.3 Scene Reconstruction
  1.3.4 Image Restoration
  1.3.5 3D Volume Recognition
 1.4 Computer Vision Systems
 1.5 Related Areas
 1.6 Applications for Computer Vision
2. Computer Vision in Autonomous Vehicle Navigation
 2.1 Indoor Navigation
  2.1.1 Map Based Navigation
  2.1.2 Map Building Based Navigation
  2.1.3 Mapless Navigation
   2.1.3.1 Absolute Localization
   2.1.3.2 Incremental Localization
 2.2 Navigation Using Occupancy Grids
 2.3 Navigation Using Optic Flow
 2.4 Navigation Using Object Recognition
 2.5 Outdoor Navigation
  2.5.1 Navigation in Structured Environment
   2.5.1.1 NAVLAB - The Navigation Laboratory
   2.5.1.2 VITS - Vision Task Sequencer
  2.5.2 Navigation in Unstructured Environments
3. Algorithm Development
 3.1 An Improved Technique for Vision Based Path Tracking Robots
  3.1.1 Introduction
  3.1.2 Proposed Algorithm
  3.1.3 Methodology
  3.1.4 Embedded System Control
4. Google Driverless Vehicle
 4.1 Introduction
 4.2 Commercialization
5. Bibliography

Abstract
This report deals with real-time vision based vehicle navigation, which in simpler terms means the ability of a robot or vehicle to find its path with the sole help of a vision sensor or camera. The camera provides the necessary information about the environment, and the relevant features are extracted with the help of image processing algorithms and techniques. The report surveys various indoor and outdoor navigation techniques that can be used for the navigation of autonomous vehicles; for outdoor environments, the techniques are further divided into structured and unstructured environments. It explains the role of software such as MATLAB in the development of the path tracking and environment detection algorithms that are the crux of vehicle navigation. The report also presents a self-developed algorithm for a simple line following robot that works on the image pixels themselves rather than converting the image into an equivalent model, which makes the algorithm computationally simpler than other algorithms. The report further extends the concept of real-time autonomous vehicle navigation to unstructured environments, which is much more closely related to present-day requirements and the development of automated vehicles. On the same track, it also describes Google's concept of driverless vehicles, a revolutionary concept that aims to change the present driving experience and shift it to completely autonomous, driverless cars.

1. Introduction
1.1 Computer Vision

Computer vision is the science and technology of machines that see. Here "see" means that the machine is able to extract information from an image in order to solve some task, or perhaps "understand" the scene in either a broad or a limited sense. Applications range from relatively simple tasks, such as industrial machine vision systems that count bottles speeding by on a production line, to research into artificial intelligence and computers or robots that can comprehend the world around them. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multidimensional data from a medical scanner.

Today, in 2011, machine vision is a rapidly developing field, in both science and industry, and now even in the video game universe, with the first machine vision computer game systems appearing as worldwide commercial products.

Some strands of computer vision research are closely related to the study of biological vision, just as many strands of AI research are closely tied to research into human consciousness. The field of biological vision studies and models the physiological processes behind visual perception in humans and other animals. Computer vision, on the other hand, studies and describes the processes implemented in software and hardware behind artificial vision systems. Interdisciplinary exchange between biological and computer vision has proven fruitful for both fields.

Computer vision is, in some ways, the inverse of computer graphics. While computer graphics produces image data from 3D models, computer vision often produces 3D models from image data. There is also a trend towards a combination of the two disciplines, e.g., as explored in augmented reality.

Figure 1: Mars Rover

Sub-domains of computer vision include:
1. Scene Reconstruction
2. Event Detection
3. Video Tracking
4. Object Recognition
5. Machine Learning
6. Motion Estimation
7. Image Restoration

1.2 Computer Vision: A Continuously Emerging Field

Computer vision is a diverse and relatively new field of study. In the early days of computing, it was difficult to process even moderately large sets of image data. It was not until the late 1970s that a more focused study of the field emerged. Computer vision covers a wide range of topics which are often related to other disciplines.

Figure 2: Related Areas to Computer Vision

Consequently, there is no standard formulation of "the computer vision problem". Moreover, there is no standard formulation of how computer vision problems should be solved. Instead, there exists an abundance of methods for solving various well-defined computer vision tasks, where the methods are often very task specific and can seldom be generalized over a wide range of applications. Many of the methods and applications are still in the state of basic research, but more and more methods have found their way into commercial products, where they often constitute a part of a larger system which can solve complex tasks (e.g., in the area of medical images, or quality control and measurements in industrial processes). In most practical computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common.

1.3 Typical Tasks of Computer Vision


Different application areas of computer vision employ a range of computer vision tasks, more or less well-defined measurement problems or processing problems, which can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below.

1.3.1. Recognition

The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the

general case: arbitrary objects in arbitrary situations. The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedra), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera. Different varieties of the recognition problem are described in the literature:

Object Recognition: One or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene.

Identification: An individual instance of an object is recognized.

Examples: Identification of a specific person's face or fingerprint, or identification of a specific vehicle.

Detection: The image data is scanned for a specific condition.

Examples: Detection of possible abnormal cells or tissues in medical images, or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used to find smaller regions of interest in the image, which can then be further analyzed by more computationally demanding techniques to produce a correct interpretation. Several specialized tasks based on recognition exist, such as:

Content Based Image Retrieval: Finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative to a target image, or in terms of high-level search criteria given as text input.

Pose Estimation: Estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation.

Optical Character Recognition (OCR): Identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).

1.3.2. Motion Analysis

Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:

Figure 3: Example of Motion Analysis

Egomotion: Determining the 3D rigid motion (rotation and translation) of the camera from an image sequence produced by the camera.

Tracking: Following the movements of a (usually) smaller set of interest points or objects (e.g., vehicles or humans) in the image sequence.

Optical Flow: Determining, for each point in the image, how that point is moving relative to the image plane, i.e., its apparent motion. This motion is a result both of how the corresponding 3D point is moving in the scene and of how the camera is moving relative to the scene.

1.3.3. Scene Reconstruction

Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D Model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model.

1.3.4. Image Restoration

The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest possible approach to noise removal is to use various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of what the local image structures look like, a model which distinguishes them from the noise. By first analyzing the image data in terms of the local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.

1.3.5. 3D Volume Recognition

Parallax depth recognition generates a 2.5D plane, which is a 2D plane with depth information, similar to a geographic contour map. For 3D volume recognition, the 2D pre-processing steps may be applied first (noise reduction, contrast enhancement, etc.), because initially the data may be in a 2D format from multiple cameras or images, and the 3D data is extracted from the 2D images using parallax depth perception.

Generally, a 3D volume cannot be easily reconstructed from a single 2D image, because 2D depth analysis of a single image requires a great deal of secondary knowledge about shapes and shadows, and about what constitutes separate objects or simply different parts of the same object. But a single 2D camera can be used for 3D depth analysis if several images can be captured from different positions in space and parallax analysis is done between the different images. 3D volume recognition from only two 2D images in parallax results in a distorted 2.5D plane with depth information, with incomplete back sides to the object field. Transitions between near and far objects are sudden but continuous. Areas of shadow or washout, or where a shape does not appear in both images, result in a null space with no depth information.

The rear profile of shape impressions in this 2.5D plane can sometimes be inferred from the incomplete front sides, using a database of known 3D shape profiles. The shape recognition process is faster if certain assumptions can be made about the images to be recognized; for example, a warship radar which generates a 2.5D depth map of friend/foe objects in the surrounding ocean only needs to identify shapes known to represent other ships in an upright position. It does not need to consider shapes where the ships are lying sideways, upside down, or at an unusually high pitch angle out of the water. The volumetric detail of the back sides of objects can be increased by using many more parallax image sets of the scene from different angles and locations, and comparing each 2.5D volume against previously analyzed 2.5D volumes to find shape/angle overlap between the volumes. 3D parallax recognition is more accurate and faster if the different 2D images are captured from known source positions and angles.
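The parallax principle described above can be made concrete with the standard rectified two-camera relation, in which depth is inversely proportional to disparity. The following MATLAB sketch is only an illustration under assumed values for the focal length and the camera baseline; it is not part of any specific system discussed in this report.

% Depth from parallax (rectified stereo pair); all numbers below are assumed.
f = 700;                 % focal length in pixels (assumed)
B = 0.12;                % baseline between the two cameras in metres (assumed)
colLeft  = 320;          % column of the same feature in the left image
colRight = 294;          % column of the same feature in the right image
disparity = colLeft - colRight;   % parallax in pixels
Z = f * B / disparity;            % depth of the feature in metres
fprintf('Estimated depth: %.2f m\n', Z);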

1.4. Computer Vision Systems

The organization of a computer vision system is highly application dependent. Some systems are stand-alone applications which solve a specific measurement or detection problem, while others constitute a sub-system of a larger design which, for example, also contains sub-systems for control of mechanical actuators, planning, information databases, man-machine interfaces, etc. The specific implementation of a computer vision system also depends on whether its functionality is pre-specified or whether some part of it can be learned or modified during operation. There are, however, typical functions which are found in many computer vision systems.

Image acquisition: A digital image is produced by one or several image sensors, which, besides various types of light-sensitive cameras, include range sensors, tomography devices, radar, ultra-sonic cameras, etc. Depending on the type of sensor, the resulting image data is an ordinary 2D image, a 3D volume, or an image sequence. The pixel values typically correspond to light intensity in one or several spectral bands (gray images or color images), but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.

Pre-processing: Before a computer vision method can be applied to image data in order to extract some specific piece of information, it is usually necessary to process the data in order to assure that it satisfies certain assumptions implied by the method. Examples are:
o Re-sampling, to assure that the image coordinate system is correct.
o Noise reduction, to assure that sensor noise does not introduce false information.
o Contrast enhancement, to assure that relevant information can be detected.
o Scale space representation, to enhance image structures at locally appropriate scales.

Feature extraction: Image features at various levels of complexity are extracted from the image data. Typical examples of such features are:
o Lines, edges and ridges.
o Localized interest points such as corners, blobs or points.
o More complex features related to texture, shape or motion.

Detection/segmentation: At some point in the processing, a decision is made about which image points or regions of the image are relevant for further processing. Examples are:
o Selection of a specific set of interest points.
o Segmentation of one or multiple image regions which contain a specific object of interest.

High-level processing: At this step the input is typically a small set of data, for example a set of points or an image region which is assumed to contain a specific object. The remaining processing deals with, for example:
o Verification that the data satisfy model-based and application-specific assumptions.
o Estimation of application-specific parameters, such as object pose or object size.
o Image recognition: classifying a detected object into different categories.
o Image registration: comparing and combining two different views of the same object.
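As a rough illustration of how these typical functions fit together, the MATLAB sketch below chains acquisition (here simply reading a file), pre-processing, feature extraction and a simple segmentation step using Image Processing Toolbox functions. The file name and filter size are assumptions; a real system would tailor every stage to its task.

% Minimal computer vision pipeline sketch (input file name is assumed).
img   = imread('scene.jpg');            % image acquisition: here, a stored image
gray  = rgb2gray(img);                  % work on intensity values only
gray  = medfilt2(gray, [3 3]);          % pre-processing: median filter for noise reduction
edges = edge(gray, 'canny');            % feature extraction: edge map
bw    = im2bw(gray, graythresh(gray));  % detection/segmentation: global Otsu threshold
stats = regionprops(bw, 'Area', 'Centroid');   % high-level step: properties of each region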

1.5. Related Areas


Much of artificial intelligence deals with autonomous planning or deliberation for robotic systems to navigate through an environment. A detailed understanding of these environments is required to navigate through them. Information about the environment could be provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Artificial intelligence and computer vision share other topics such as pattern recognition and learning techniques. Consequently, computer vision is sometimes seen as a part of the artificial intelligence field or of computer science in general.

Physics is another field that is closely related to computer vision. Computer vision systems rely on image sensors, which detect electromagnetic radiation, typically in the form of either visible or infra-red light. The sensors are designed using solid state physics. The process by which light propagates and reflects off surfaces is explained using optics. Sophisticated image sensors even require quantum mechanics to provide a complete understanding of the image formation process. Also, various measurement problems in physics can be addressed using computer vision, for example motion in fluids.

A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to the processing of visual stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems operate in order to solve certain vision related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behaviour of biological systems, at different levels of complexity. Also, some of the learning-based methods developed within computer vision have their background in biology.

Yet another field related to computer vision is signal processing. Many methods for processing one-variable signals, typically temporal signals, can be extended in a natural way to the processing of two-variable or multi-variable signals in computer vision. However, because of the specific nature of images, there are many methods developed within computer vision which have no counterpart in the processing of one-variable signals. A distinct character of these methods is the fact that they are non-linear, which, together with the multi-dimensionality of the signal, defines a subfield of signal processing as a part of computer vision.

Beside the above mentioned views on computer vision, many of the related research topics can also be studied from a purely mathematical point of view. For example, many methods in computer vision are based on statistics, optimization or geometry. Finally, a significant part of the field is devoted to the implementation aspect of computer vision: how existing methods can be realized in various combinations of software and hardware, or how these methods can be modified in order to gain processing speed without losing too much performance.

The fields most closely related to computer vision are image processing, image analysis and machine vision. There is a significant overlap in the range of techniques and applications that these cover. This implies that the basic techniques used and developed in these fields are more or less identical, something which can be interpreted as saying that there is only one field with different names. On the other hand, it appears to be necessary for research groups, scientific journals, conferences and companies to present or market themselves as belonging specifically to one of these fields and, hence, various characterizations which distinguish each of the fields from the others have been presented. The following characterizations appear relevant but should not be taken as universally accepted:

Image processing and image analysis tend to focus on 2D images and on how to transform one image into another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither requires assumptions about nor produces interpretations of the image content.

Computer vision tends to focus on the 3D scene projected onto one or several images, e.g., on how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an image.

Machine vision tends to focus on applications, mainly in manufacturing, e.g., vision based autonomous robots and systems for vision based inspection or measurement. This implies that image sensor technologies and control theory are often integrated with the processing of image data to control a robot, and that real-time processing is emphasized by means of efficient implementations in hardware and software. It also implies that external conditions such as lighting can be, and often are, more controlled in machine vision than in general computer vision, which can enable the use of different algorithms.

There is also a field called imaging, which primarily focuses on the process of producing images but sometimes also deals with the processing and analysis of images. For example, medical imaging contains a great deal of work on the analysis of image data in medical applications. Finally, pattern recognition is a field which uses various methods to extract information from signals in general, mainly based on statistical approaches. A significant part of this field is devoted to applying these methods to image data.

1.6. Applications for Computer Vision

One of the most prominent application fields is medical computer vision or medical image processing. This area is characterized by the extraction of information from image data for the purpose of making a medical diagnosis of a patient. Generally, image data is in the form of microscopy images, X-ray images, angiography images, ultrasonic images, and tomography images. An example of information which can be extracted from such image data is the detection of tumors, arteriosclerosis or other malign changes. It can also be measurements of organ dimensions, blood flow, etc. This application area also supports medical research by providing new information, e.g., about the structure of the brain, or about the quality of medical treatments.

A second application area of computer vision is in industry, sometimes called machine vision, where information is extracted for the purpose of supporting a manufacturing process. One example is quality control, where details or final products are automatically inspected in order to find defects. Another example is measurement of the position and orientation of details to be picked up by a robot arm. Machine vision is also heavily used in agricultural processes to remove undesirable foodstuff from bulk material, a process called optical sorting.

Military applications are probably one of the largest areas for computer vision. The obvious examples are detection of enemy soldiers or vehicles and missile guidance. More advanced systems for missile guidance send the missile to an area rather than a specific target, and target selection is made when the missile reaches the area, based on locally acquired image data. Modern military concepts, such as "battlefield awareness", imply that various sensors, including image sensors, provide a rich set of information about a combat scene which can be used to support strategic decisions. In this case, automatic processing of the data is used to reduce complexity and to fuse information from multiple sensors to increase reliability.

One of the newer application areas is autonomous vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars or trucks), aerial vehicles, and unmanned aerial vehicles (UAVs). Figure 1 shows an artist's concept of a rover on Mars, an example of an unmanned land-based vehicle; notice the stereo cameras mounted on top of the rover. The level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where computer vision based systems support a driver or a pilot in various situations. Fully autonomous vehicles typically use computer vision for navigation, i.e. for knowing where the vehicle is, or for producing a map of its environment (SLAM), and for detecting obstacles. It can also be used for detecting certain task-specific events, e.g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars, and systems for autonomous landing of aircraft. Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology has still not reached a level where it can be put on the market. There are ample examples of military autonomous vehicles, ranging from advanced missiles to UAVs for reconnaissance missions or missile guidance. Space exploration is already being made with autonomous vehicles using computer vision, e.g., NASA's Mars Exploration Rovers and ESA's ExoMars Rover.

Other application areas include:
o Support of visual effects creation for cinema and broadcast, e.g., camera tracking (match moving).
o Surveillance.

2. Computer Vision in Autonomous Vehicle Navigation

To achieve goal-directed autonomous behaviour, some sort of sensory information about the environment is essential. The first solution that comes to mind is to observe how this is done in biological systems and to try to replicate the same. Though living beings use several senses to learn about their surroundings, perhaps the most important of these for navigation is vision. Taking inspiration from this, scientists have tried to model a vision system for a mobile robot that enables it to locate and grasp the relevant aspects of the world, so that an intelligent navigation system can plan appropriate actions using the visual information.

Figure 4: Computer Vision in Autonomous Vehicle Navigation

There are broadly two types of navigation that a vehicle may require, depending upon the environment in which the robot is present:
1. Indoor Navigation:
 i) Map Based.
 ii) Map Building.
 iii) Mapless.
2. Outdoor Navigation:
 i) Structured Environment.
 ii) Unstructured Environment.

2.1. Indoor Navigation

As the name suggests, this kind of navigation is restricted to indoor arenas. Here the environment is generally well structured, and a map of the path from the robot to the target is often known in advance. The most prominent examples that fall into this category are guiding a robot through a hallway, reaching specified locations on user commands, performing scheduled tasks, and operating in hazardous situations or inside nuclear power plants, among many others. Some of the first vision systems developed for mobile robot navigation relied heavily on the geometry of the space and other metrical information for driving the vision process and performing self-localization. In particular, the interior space was represented by CAD models of varying complexity. With time, these were replaced by simpler models like occupancy maps, topological maps and even sequences of images.

All these and subsequent efforts fall into three broad categories.

2.1.1. Map-Based Navigation: These are systems relying on user-created geometric models or topological models of the environment.

2.1.2. Map-Building Based Navigation: These are systems that use sensors of various kinds to construct their own geometric or topological models and use the same for navigation.

2.1.3. Mapless Navigation: These are systems that do not use any explicit representation of the environment but rather recognize objects found in the surroundings, or resort to tracking them by generating motions based on visual observations.

Map-based navigation techniques may further be classified as:

2.1.3.1. Absolute Localization: In these strategies, the current position of the robot is determined with respect to a fixed origin, which often is the starting position of the robot. Such approaches are referred to as absolute localization techniques because at every instant the position of the robot is known absolutely, i.e. relative to a global reference frame.

2.1.3.2. Incremental Localization: In a large number of practical situations, the initial position of a robot is known at least approximately. In such cases, the localization algorithm must simply keep track of the uncertainties in the robot's position as it executes motion commands and, when the uncertainties exceed a bound, use its sensors for a new fix on its position.

The main advantage of absolute localization is that, since the position estimates are always computed from a fixed and accurately known origin, the errors in estimation at each position are independent of each other. Therefore these errors do not pile up indefinitely as in the case of incremental localization techniques. Thus, knowing the possible sources of error, an upper bound on the maximum possible error can be computed in the case of absolute localization, and care can be taken to minimize it. The main approaches for tackling the indoor navigation problem are described in the sections below.
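The difference between the two localization strategies can be seen in a toy simulation: with incremental localization each new estimate inherits the error of the previous one, so the uncertainty grows like a random walk, whereas absolute fixes keep every error independent. The MATLAB sketch below is purely illustrative, with an assumed per-measurement error of 1 cm.

% Toy comparison of incremental vs. absolute localization error (assumed noise level).
steps = 100;
sigma = 0.01;                              % 1 cm error per measurement (assumed)
incErr = cumsum(sigma * randn(steps, 1));  % odometry errors accumulate step by step
absErr = sigma * randn(steps, 1);          % each absolute fix carries an independent error
fprintf('Final drift: incremental %.3f m, absolute %.3f m\n', ...
        abs(incErr(end)), abs(absErr(end)));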

2.2. Navigation Using Occupancy Grids

Sonar measurements are used to provide maps of the robot's environment, with regions classified as empty, occupied or unknown; matches of new maps with old ones are used for landmark classification and also to obtain or correct global position and orientation information. The method starts with a number of range measurements obtained from sonar units whose positions with respect to one another are known. Each measurement provides information about empty and possibly occupied volumes in the space subtended by the beam. This occupancy information is projected onto a two-dimensional horizontal map. Sets of readings taken both from different sensors and from different positions of the robot are progressively incorporated into the sonar map. As more readings are added, the area deduced to be empty expands, and the expanding empty area encroaches on and sharpens the possibly occupied region. The map becomes gradually more detailed. In other words, the 3-D environment is divided into a grid of cells, and each cell is assigned a probability value that is a measure of the belief that the cell is occupied.

While occupancy-grid based approaches are capable of generating maps that are rich in geometric detail, the extent to which these details can be relied upon depends on the accuracy of the robot's odometry and on sensor uncertainties during map construction. Additionally, for large and complex environments this kind of representation is not computationally efficient for path planning.
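A common way of realising the probability value attached to each cell is a log-odds update, in which every range reading nudges the cells it covers towards occupied or empty. The sketch below is a simplified, assumed formulation for a single reading on a small 2-D grid; it is not the exact update rule of the system described above.

% Simplified occupancy grid update in log-odds form (all parameters assumed).
grid = zeros(50, 50);        % log-odds of occupancy; 0 means unknown
lOcc  =  0.85;               % increment for a cell believed occupied (assumed)
lFree = -0.40;               % increment for a cell believed empty (assumed)
row = 25; startCol = 5; hitCol = 30;   % one reading along a row of cells
grid(row, startCol:hitCol-1) = grid(row, startCol:hitCol-1) + lFree;  % beam passes through: empty
grid(row, hitCol) = grid(row, hitCol) + lOcc;                         % beam ends here: occupied
prob = 1 ./ (1 + exp(-grid));          % convert log-odds back to occupancy probabilities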

2.3. Navigation Using Optic Flow

This approach takes its inspiration from flying insects. The first systems, built by Santos-Victor et al., mimicked the visual behaviour of insects. Optic flow is technically defined as the perceived visual motion of objects as the observer moves relative to them. It is generally useful for navigation because it contains information about the self-motion and the 3-D structure of the environment. In layman's terms, for a robot with two cameras, one on either side, navigating a corridor, the difference between the velocity of the image seen with the left eye and the velocity of the image seen with the right eye is computed. If it is approximately zero, then the robot stays in the middle of the corridor. However, if the velocities are different, then the robot moves towards the side whose image changes with the smaller velocity until the speeds of the images match. This method is very commonly used in aerial systems, as optic flow data can be used not only to guide planar motion but also to estimate, and thus control, the altitude.
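The balancing strategy can be sketched in a few lines: compare the average flow magnitude in the left and right halves of the image and steer towards the side whose image moves more slowly. The snippet below assumes the Computer Vision Toolbox optical flow objects and two consecutive grayscale frames named frame1 and frame2; it is a sketch of the idea, not the implementation used by Santos-Victor et al.

% Corridor-centring sketch based on the left/right optic flow balance (frames assumed given).
flowEstimator = opticalFlowLK;                 % Lucas-Kanade flow estimator
estimateFlow(flowEstimator, frame1);           % initialise with the first frame
flow = estimateFlow(flowEstimator, frame2);    % flow between frame1 and frame2
mag  = flow.Magnitude;                         % per-pixel apparent motion
half = floor(size(mag, 2) / 2);
leftFlow  = mean2(mag(:, 1:half));
rightFlow = mean2(mag(:, half+1:end));
if leftFlow > rightFlow
    steer = 'right';      % the left wall appears to move faster, so it is closer
elseif rightFlow > leftFlow
    steer = 'left';
else
    steer = 'straight';
end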

2.4. Navigation Using Object Recognition

Symbolic navigation means that the robot is not commanded to go to locations given by specific coordinates but by symbolic commands such as "go to the desk in front of you", "go to the open door on the left of the desk", and "go to the next room through the open door". Such symbolic commands are very convenient for a human to use, since he or she uses the same kind of commands with other people in everyday life. The recognition method used here exploits the functionality of objects. An object is described by its significant surfaces and functional evidence. Significant surfaces are chosen based on their functional role. For instance, the primary role of a desk is characterized by a work surface and some surfaces that correspond to the support structures, like the legs. Functional evidence can be generated when the object is used for the function for which it is intended. For example, we may observe a desktop supporting some other objects. Such other objects constitute functional evidence.

Figure 5: Example of Path Planning

Functional evidence provides additional evidence for the recognition of an object even if some significant surfaces are not visible. Functional evidence also gives state information about an object. Because the functional evidence of a door consists of objects seen through the door, detecting functional evidence of a door means that the door is open. The s-map represents the locations of the visible 3D surfaces of obstacles in 2D space, and can be built efficiently. The name s-map comes from the idea of squeezing 3D space into 2D space. The efficiency of the method comes from making good use of the characteristics of indoor environments, which have flat floors, vertical walls, and man-made objects. The general idea of the whole process is summarized in the schematic below. The most common problems with indoor settings arise because the objects and their arrangements are constantly changing. Also, it is very difficult and tedious to describe the environment completely using geometric details. This is where the above method proves very advantageous: since generic models are used to characterize the environment, these shortcomings are handled in an easy and efficient manner.

The approaches described in the first section fall under the category of map-building navigation techniques, while the approaches in the latter two sections can be categorized as mapless navigation techniques. Map-based navigation approaches generally involve some kind of symbolic recognition or landmark tracking.

2.5. Outdoor Navigation


Outdoor navigation is the most common, and probably the more difficult, kind of terrestrial navigation, and it is involved in tracking and surveillance procedures. As with indoor navigation, outdoor navigation also involves obstacle avoidance, landmark detection, map building and position estimation. However, a complete map of the environment is hardly ever known a priori, and the system has to cope with objects as they appear in the scene, without prior information about their expected positions. Hence it is a relatively more difficult problem when compared with indoor navigation, and it generally involves the application of high-end sensors, real-time processing techniques and efficient algorithms to control the trajectory of the robot. Outdoor navigation problems can often be characterized as road following. Representatives of this type of navigation are the NAVLAB systems for autonomous navigation developed by researchers at CMU, vision guided road following for Autobahns in Germany, and the Prometheus system.

2.5.1. Navigation in Structured Environment

In general, outdoor navigation in structured environments involves some kind of road following. Road following can be defined as the ability to recognize the lines that separate the lanes or separate the road from the non-road area, or the road surface from the adjoining surfaces. Road following for outdoor robots is similar to hallway following for indoor robots, except for the problems caused by shadows, changing illumination, colours, etc. This survey discusses two of the most prominent pieces of work in the field of outdoor navigation: the NAVLAB and the VITS.

Figure 6: Map Plotting Operations

2.5.1.1. NAVLAB - The Navigation Laboratory

NAVLAB is one of the first attempts at autonomous outdoor navigation, built at the Robotics Institute, CMU. It uses two kinds of vision algorithms: colour vision for road following and 3-D vision for obstacle detection and avoidance.

Overview of the Road Following Algorithm

As the figure above indicates, the algorithm involves three stages:
1. Classification of each pixel.
2. Use of the classification results to arrive at the best-fit road position.
3. Collection of new colour statistics based on the detected road and non-road regions.

The approach uses a multi-class adaptive colour classification to classify the image points into road and non-road on the basis of colour. Each class is represented by the mean and covariance matrix of its red, green, and blue values, and by its a priori likelihood, based on the expected fraction of pixels in that class. Assuming a Gaussian distribution, the confidence that a pixel of colour X belongs to a class is computed, and each pixel is assigned the class of highest probability. The road geometry can be described by the following two parameters:

The intercept (P), which is the image column of the road's vanishing point. This is where the road centre line intercepts the vanishing line of the road. The intercept gives the road's direction relative to the vehicle; the row position of the vanishing line in the image thus becomes a characteristic.

The orientation (e) of the road in the image, which is an indication of how far the vehicle is to the right or left of the centre line.

In a two-dimensional parameter space, with intercept as one dimension and orientation as the other, each point classified as road votes for all road (P, e) combinations to which it could belong, while non-road points cast negative votes. The (P, e) pair that receives the most votes is the one that contains the most road points, and it is reported as the road. Once the road has been found in an image, the colour statistics of the road and off-road models are modified for each class by re-sampling the detected regions and updating the colour models. The road is picked out by hand in the first image; thereafter the process is automatic, using the segmentation from each image to calculate the colour statistics for the next.

3-D Perception Algorithm

Colour vision is not sufficient for navigation. Information such as the location of obstacles is essential, which requires processing of 3-D data. This approach therefore simultaneously achieves two objectives: obstacle avoidance, to locally steer the vehicle, and terrain analysis, to provide a more detailed description of the environment. In order to obtain 3-D data, a scanning laser range finder was used. The scanner measures the phase difference between the laser signal and its reflection from the target object, which in turn provides the distance of the target from the scanner. The complete obstacle detection algorithm can be summarized as extracting the surface discontinuities (pixels with high jumps in x-y-z), finding clusters in the space of surface normals, identifying the corresponding regions in the original image, and finally expanding each region until the fitting error is larger than a given threshold. The expansion proceeds iteratively, adding the point on the region boundary that adds the minimum fitting error.

Highlights of this Approach

It is worth noting that since this road following method does not rely on the exact local geometry of the road, it is very robust. The road may actually curve or not have parallel edges, or the segmentation may not be completely correct, but the method still outputs an approximate road position and orientation. The key feature here is the use of two different approaches to tackle the problem of navigation, which overcomes the drawbacks of both line tracking and region analysis. Line tracking relies on the successful extraction of road edges, but in practice the strongest edges are often weak and do not necessarily fit the straight line model. Region analysis employs colour for the classification of pixels into road and non-road; its drawbacks are overcome by using a multi-class classification technique, reducing the problem of dominant features like dark shadows and bright sunlight.
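The per-pixel colour classification can be written compactly: each class keeps a mean vector, a covariance matrix and a prior, and a pixel is assigned to the class with the highest Gaussian likelihood weighted by that prior. The MATLAB sketch below is a generic illustration of this rule with assumed class statistics; it is not the actual NAVLAB code.

% Gaussian colour classification of one pixel (class statistics below are assumed).
x = [120; 115; 90];                              % RGB value of the pixel to classify
classMean = {[110; 110; 100], [60; 130; 40]};    % means of the road and non-road classes (assumed)
classCov  = {400 * eye(3),    600 * eye(3)};     % covariance matrices (assumed)
priors    = [0.6, 0.4];                          % a priori likelihood of each class (assumed)
score = zeros(1, 2);
for c = 1:2
    d = x - classMean{c};
    score(c) = priors(c) * exp(-0.5 * d' / classCov{c} * d) / sqrt(det(2 * pi * classCov{c}));
end
[~, label] = max(score);                         % label 1 = road, label 2 = non-road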

2.5.1.2. VITS - Vision Task Sequencer

The task of the vision system in a road following scenario is to provide a description of the road for navigation. Roads may be described in a variety of ways, e.g., by sets of road edges, by a centre line with an associated road width, or by planar patches. In this approach, a road is represented by its edges, or more precisely, by points in 3-D space that, when connected, form a polygonal approximation of the road edge. Often, however, the dominant linear features in road images are the shoulder/vegetation boundaries rather than the road/shoulder boundaries.

Figure 6: Vision Task Sequencer

The difficulties in extracting the real road boundary from the image led to the adoption of a segmentation algorithm that first extracts the road in the image, tracks the road/non-road boundary, and then calculates three-dimensional road edge points. The video data processing unit (VIVD) uses a clustering algorithm to segment the image into road and non-road regions. After producing a binary road image, the road boundaries are traced and selected image points are transformed into three-dimensional road boundary points. The algorithm is summarized step by step in the accompanying diagram.
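The segment-then-trace step can be approximated with standard Image Processing Toolbox calls, as in the hedged sketch below. The clustering stage is replaced here by a simple global threshold, the input file name is assumed, and the back-projection to three-dimensional road points is only indicated by a comment.

% Sketch of producing a binary road image and tracing its boundary (threshold replaces clustering).
gray   = rgb2gray(imread('road.jpg'));         % assumed input image
roadBW = im2bw(gray, graythresh(gray));        % binary road / non-road image
roadBW = ~roadBW;                              % assume the road is the darker region
bounds  = bwboundaries(roadBW, 8, 'noholes');  % trace the boundaries of the road regions
edgePts = bounds{1};                           % [row, col] points of the first traced boundary
% Selected points in edgePts would then be back-projected through the camera model
% to obtain three-dimensional road boundary points.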

2.5.2. Navigation in Unstructured Environments

An outdoor environment with no regular properties that could be perceived and tracked for navigation may be referred to as an unstructured environment. In such cases, the vision system can make use of at most a generic characterization of the possible obstacles in the environment. Unstructured environments arise in cross-country navigation, for example in planetary (lunar or Martian-like) terrain navigation. Perhaps many people's idea of the state of the art in outdoor robot navigation is the rover Sojourner, placed on Mars by the NASA Pathfinder mission on 4th July 1997.

Figure 7: A Vision Based Robot for Unstructured Environment

As the rover drove around, it was watched by high-definition stereo cameras at the lander. The robot also transmitted continuous telemetry information back to the lander about the states of all of its sensors and systems. Twice per Martian day, information and pictures were forwarded from the lander to Earth. At the control station on Earth, the images from the lander were used to construct a 3D view of the rover and its surroundings, from which a human operator was able to obtain a realistic impression of the situation. Guidance of the rover was achieved in two steps: at the start of a day, the human operator examined the 3D view and planned a route for the rover in terms of a sequence of waypoints through which it would be required to pass. While heading for a particular waypoint, frequent stops were made to check for hazards. Obstacles in the path were searched for using two forward-facing CCD cameras and five lasers projecting vertical stripes in a diverging pattern. From stereo views of the scene with each of the lasers turned on individually, a triangulation calculation allowed the location of any obstacles to be found. If an obstacle sufficiently obstructed the rover, a local avoiding manoeuvre was planned and executed. If for some reason the rover was unable to return to its original path within a specified time after avoiding an obstacle in this way, the robot stopped to wait for further commands from Earth. Since the human operator carried out most of the goal selection, obstacle avoidance and path planning for the robot during the manual waypoint input stage, the degree of autonomy in the navigation performed by the rover was actually quite small. Nevertheless, Sojourner is still an icon for navigation systems in unknown, unstructured environments.

3. Algorithm Development

3.1. An Improved Technique for Vision Based Path Tracking Robots

This section describes a simple algorithm for tracking the path for a line following robot (a black line on a white background), with the information related to the steering decision obtained from images of the path processed using the MATLAB Image Processing Toolbox. The images are captured using the MATLAB Image Acquisition Toolbox by triggering frames from a video in real time and applying the algorithm to these frames. This approach provides an improvement over infrared sensor based robots, which tend to give random values when adequate light is unavailable, are not independent of the shape of the path because they have no information about the path ahead, and for which the shape and width of the path is also a deciding factor for sensor separation. The approach is inspired by human vision, which determines the deviation of the path by having knowledge of the path ahead and comparing the orientation of the path. It is a simple computational technique working on the pixel information of the image, in comparison to other, more complex mathematical techniques available. The algorithm has been verified using a recorded video, and correct deviation of the path has been observed.

3.1.1. Introduction

In recent years, the concept of visual path following has been applied to autonomous robot navigation using sophisticated techniques such as Kalman filters and Voronoi diagrams for feature extraction from the image and accurate determination of the kinematics of the robot. Some of the advanced approaches are the EYEBOT robots at the University of Western Australia and the ALVINN project at Carnegie Mellon University, which make use of advanced image processing in coordination with neural networks. Our approach is to use a simple image processing technique to extract basic information from the images of the path and change the steering of the robot accordingly. This approach does not require approximating the curvature of the path; it only needs comparisons between pixels and can be implemented on much simpler hardware.

In this section, an algorithm for a path tracking robot has been implemented using knowledge of the path ahead. The steering decision is taken by comparing the position of the current pixel with the position of the pixel a predetermined number of pixels ahead of the current pixel in a binary image. The images required for the algorithm are extracted from a video generated using a USB webcam, with parameters like the frame rate and the number of frames to be captured controlled by the user. Some existing algorithms for path tracking depend on the robot motion, like the funnel lane algorithm, but this approach is much simpler and does not need any camera adjustment according to the path. The coloured images captured are first converted into gray scale images and then converted into binary images by applying a threshold value. Hence, the information about the path is contained in a high contrast image in which the path can easily be differentiated from the surroundings. The result is that, except for the path, all the background is white and is represented by a value of 1 in the binary image, whereas the black path is represented by a value of 0. The video is stored in the working directory as a .mat file, which can be replayed for reference.

3.1.2. Proposed Algorithm

This algorithm mimics human behaviour while driving a vehicle: left and right turns are taken by comparing the orientation of the path and the distance to the change in orientation of the path. In order to get a close approximation of the deviation of the path, a rather smaller area of the image is processed instead of the complete image.

Figure 8: Path Approximation

The binary image is obtained from image conversion operations. The basic steps involved in the algorithm are:
Step I: Capture of coloured images using a camera positioned vertically above the path.
Step II: Conversion of the coloured image to a gray scale image.
Step III: Conversion of the gray scale image to a binary image.
Step IV: Cropping of the image to obtain a smaller area to be processed.
Step V: Determination of the first black pixel in the first row of the area being processed.
Step VI: Determination of the first black pixel in the last row of the area being processed.
Step VII: Comparison of the column values of the pixel locations from Step V and Step VI to obtain a steering decision.
Step IX: Repetition of all the above steps for the next frame being captured, and for the total number of frames.
Step X: End.

Since the deviation can also be measured by approximating the average position of the black pixels representing the path, the changed steps in that variant are:
Step V: Determination of the mean column value of the black pixels in the first row of the area being processed.
Step VI: Determination of the mean column value of the black pixels in the last row of the area being processed.

Figure 9: Pixels in a Binary Image

Figure 10: Cropped Image

Figure 11: The Final Working Image

The image is cropped so that the deviation is measured only for 50 pixels ahead of the current pixel position; hence the image is cropped from pixel 70 to pixel 120 along the y axis, taking all values from 1 to 160 pixels along the x axis, giving the area to be processed. This cropped image is used to obtain two slices: the upper row of the binary image and the lower row, which is the lower slice. All the black pixels, having a value of 0, are found in the upper and lower slices. For finding the current position, the first black pixel of the lower slice is taken, which is the minimum column value of the black pixels present in the last row, provided all the black pixels have been found. Similarly, for finding the deviation of the path, the first black pixel in the upper slice is found, provided all the black pixels have been found in the first row. The values of the first black pixel in the first row and the last row are then compared in order to take the steering decision. This enables the robot to compare its current position on the path with the position 50 pixels ahead of the current position, and to decide to go left or right according to the deviation.

Figure 12: The Computed Deviation

Considering Pixel_1 to be the column value of the first black pixel in the last row, and Pixel_2 to be the column value of the first black pixel in the first row, the following cases are possible:
Case 1: Pixel_2 > Pixel_1 : turn Right;
Case 2: Pixel_2 < Pixel_1 : turn Left;
Case 3: Pixel_2 = Pixel_1 : go Straight;
In a real-time application on an autonomous robot, the movements (Right, Left, and Straight) can be represented by different characters, which can be communicated to the microcontroller.
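A compact MATLAB rendering of the complete decision rule for one frame is given below. The crop rows (70 to 120), the first-black-pixel rule and the comparison cases are taken from this section, while the variable names and the input file are assumptions used only for illustration.

% Sketch of the proposed steering rule applied to a single 160x120 frame (input file assumed).
rgb  = imread('frame.jpg');                   % Step I: captured colour frame
gray = rgb2gray(rgb);                         % Step II: gray scale conversion
bw   = im2bw(gray, graythresh(gray));         % Step III: binary image, path pixels = 0
roi  = bw(70:120, 1:160);                     % Step IV: crop the 50 rows ahead of the robot
pixel1 = find(roi(end, :) == 0, 1, 'first');  % Step VI: first black pixel in the last row
pixel2 = find(roi(1,   :) == 0, 1, 'first');  % Step V: first black pixel in the first row
if isempty(pixel1) || isempty(pixel2)         % path not visible in this frame
    decision = 'No path';
elseif pixel2 > pixel1                        % Step VII: compare the column positions
    decision = 'Right';
elseif pixel2 < pixel1
    decision = 'Left';
else
    decision = 'Straight';
end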

3.1.3. Methodology

The algorithm in this section was applied to frames triggered from a previously recorded video of the path, not in real time. The images were triggered from the video using the Image Acquisition Toolbox of MATLAB and were stored as a .mat file in the MATLAB working directory. In a real-time application on an autonomous robot, the algorithm is applied to a particular triggered frame and image triggering is paused until the algorithm has been applied to that frame and the desired steering action has been taken; only then is the next frame triggered from the video. One of the advantages of the algorithm is that even if the steering decision taken was incorrect due to a noise-disrupted frame, the direction of movement would be corrected with the triggering of the next frame, provided the path is still in the frame of the camera. In order to give a user friendly representation of the algorithm, the successive frames and their deviations were displayed in a GUI.

A. Image Acquisition

For image acquisition, the camera was configured to capture images of resolution 160x120 at a frame rate of 6 frames per second and to trigger a predetermined number of frames. Since pixel information must be extracted from the triggered image in order to find the deviation, a high contrast between the path and the surroundings is needed to differentiate the path from the surroundings, which was not possible in the coloured image. Moreover, there was a red tint in the image, which made it difficult to locate the path. Therefore, image enhancement was done.

B. Image Enhancement

The coloured image was converted to a gray scale image and, using a global threshold determined from the image itself, the gray scale image was converted to a black and white image with a high contrast between the surroundings and the path, as shown in Figure 13.
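The acquisition set-up described above might look roughly like the following with the Image Acquisition Toolbox. The adaptor name, device number and format string are assumptions that depend on the actual webcam, and the frame rate obtained in practice depends on what the camera driver exposes; FrameGrabInterval is used here only as an approximation.

% Hedged acquisition and enhancement sketch (adaptor and format strings are assumed).
vid = videoinput('winvideo', 1, 'RGB24_160x120');  % 160x120 colour frames
vid.FramesPerTrigger  = 60;                        % predetermined number of frames
vid.FrameGrabInterval = 5;                         % keep every fifth frame to approximate ~6 fps
triggerconfig(vid, 'immediate');                   % start grabbing as soon as acquisition starts
start(vid);
frames = getdata(vid, 60);                         % 4-D array: height x width x 3 x frames
stop(vid); delete(vid);
% Enhancement of one frame: gray scale conversion, then a global (Otsu) threshold.
gray = rgb2gray(frames(:, :, :, 1));
bw   = im2bw(gray, graythresh(gray));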

Figure 13: Image Enhancement

The image is also called a binary image, since it represents a white pixel with a value of 1 and a black pixel with a value of 0. This reduces the problem to finding the path, represented by pixel values of 0, in the binary image.

C. Graphical User Interface

In order to represent each frame and the deviation of the path in that frame, a graphical user interface was created, which gives a clear representation of the deviation in the path to be followed. A target window has been provided so that the user can closely monitor the path and its corresponding deviation. The tool takes frames one by one from the .mat file and applies the algorithm to each of them. A line is plotted to represent the deviation and is displayed next to the original image. In real time, this GUI can be interfaced with the image acquisition code so that the deviation of the path in a particular frame can be found as the frame is triggered from the video.

Figure 14: Graphical User Interface

3.1.4. Embedded System Control

In the future, such a technique can be tested in real time on an embedded system, and the algorithm could be altered keeping in mind the mounting of the camera and the time taken by the robot to move left or right. Since the algorithm runs in MATLAB, which carries the computational load, only a simple microcontroller is required. Certain special conditions can be incorporated in the algorithm, such as noise elimination and handling path duplicity in the images. In order to eliminate wires, wireless serial communication can be used, and the USB camera can be replaced with a wireless camera.
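One way of forwarding the steering character to the microcontroller over a serial link is sketched below. The COM port, baud rate and single-character protocol are assumptions; newer MATLAB releases would use serialport instead of the serial interface shown here.

% Hedged sketch: send the steering decision ('L', 'R' or 'S') to a microcontroller.
s = serial('COM3', 'BaudRate', 9600);   % assumed port and baud rate
fopen(s);
decision = 'R';                         % e.g. the output of the path tracking algorithm
fprintf(s, '%c', decision);             % one character per processed frame
fclose(s);
delete(s);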

4. Google Driverless Vehicle

The Google driverless car is a project by Google that involves developing technology for driverless vehicles. The project is currently being led by Google engineer Sebastian Thrun, director of the Stanford Artificial Intelligence Laboratory and co-inventor of Google Street View, whose team at Stanford created the robotic vehicle Stanley, which won the 2005 DARPA Grand Challenge and its US$2 million prize from the United States Department of Defense.

Figure 15: Cars used by Google Street View.

The team developing the system consisted of 15 engineers working for Google, including Chris Urmson, Mike Montemerlo, and Anthony Levandowski, who had worked on the DARPA Grand and Urban Challenges. Lobbied by Google, the Nevada Legislature passed a law in June 2011 to authorize the use of autonomous vehicles, making Nevada the first state where driverless vehicles can be legally operated on public roads. The Nevada Department of Transportation (NDOT) is now responsible for setting safety and performance standards and for designating areas where driverless cars may be tested.

4.1. Introduction
The system combines information gathered from Google Street View with artificial intelligence software that fuses input from video cameras inside the car, a LIDAR sensor on top of the vehicle, radar sensors on the front of the vehicle and a position sensor attached to one of the rear wheels that helps locate the car's position on the map. As of 2010, Google had tested several vehicles equipped with the system, driving 1,000 miles (1,600 km) without any human intervention, in addition to 140,000 miles (230,000 km) with occasional human intervention, the only accident occurring when a car crashed into the rear end of a test vehicle

while stopped at a red light. Google anticipates that the increased accuracy of its automated driving system could help reduce the number of traffic-related injuries and deaths, while using energy and space on roadways more efficiently.

Figure 16: Example of Map Building Using Sensors

The project team has equipped a test fleet of seven vehicles, consisting of six Toyota Prii and an Audi TT, each accompanied in the driver's seat by one of a dozen drivers with unblemished driving records and in the passenger seat by one of Google's engineers. The cars have traversed San Francisco's Lombard Street, famed for its steep hairpin turns, and have driven through city traffic. The vehicles have driven over the Golden Gate Bridge and on the Pacific Coast Highway, and have circled Lake Tahoe. The system drives at the speed limit it has stored on its maps and maintains its distance from other vehicles using its system of sensors. The system provides an override that allows a human driver to take control of the car by stepping on the brake or turning the wheel, similar to cruise control systems already in cars.

4.2. Commercialization
While Google had no immediate plans to commercially develop the system, the company hopes to develop a business that would market the system and the data behind it to automobile manufacturers. An attorney for the California Department of Motor Vehicles raised concerns that "the technology is ahead of the law in many areas", citing state laws that "all presumed to have a human being operating the vehicle". According to the New York Times, policy makers and regulators have argued that new laws will be required if driverless vehicles are to become a reality, because "the technology is now advancing so quickly that it is in danger of outstripping existing law, some of which dates back to the era of horse-drawn carriages".

Figure 17: A View of the Hardware of the Google Driverless Vehicle

Figure 18: Front View of the Google Driverless Vehicle

Google lobbied for two bills that made Nevada the first state where driverless vehicles can be legally operated on public roads. The first bill is an amendment to an electric vehicle bill that provides for the licensing and testing of autonomous vehicles. The second bill provides an exemption from the ban on distracted driving to permit occupants to send text messages while sitting behind the wheel. The two bills came to a vote before the Legislature's session ended in June 2011. Google executives have refused to explain why they want Nevada to be the maiden state for their driverless car.

Figure 19: A Similar Vehicle Developed by Carnegie Mellon University for the DARPA Challenge.

5. Bibliography
N. Nilsson: Shakey the Robot, Technical Report, SRI International, 1984.
H. Moravec: Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover, PhD thesis, Stanford University, 1981.
C. Thorpe: Vision and Navigation for a Mobile Robot, PhD thesis, Carnegie-Mellon University, 1984.
C. Thorpe, M. Hebert, T. Kanade: Vision and Navigation for the Carnegie-Mellon Navlab, IEEE Transactions on Robotics and Automation, 1988.
A. Pomerleau: Vision and Navigation for a Mobile Robot, PhD thesis, Carnegie-Mellon University, 1984.
T. M. Jochem, D. A. Pomerleau, C. Thorpe: MANIAC: A Next Generation Neurally Based Autonomous Road Follower, IEEE Transactions on Robotics and Automation, 1988.
N. Ayache, O. D. Faugeras: Maintaining Representations of the Environment in a Mobile Robot, IEEE Transactions on Robotics and Automation, 1989.
D. Kim, R. Nevatia: A Method for Recognition and Localization of Generic Objects for Indoor Navigation, IEEE Workshop on Applications of Computer Vision, 1994.
R. Inigo, E. McVey, B. Berger, M. Mirtz: Machine Vision for Vehicle Guidance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984.
R. Pagnot, P. Grandjean: Fast Cross-Country Navigation on Fair Terrains, IEEE Transactions on Robotics and Automation, 1995.
H. Moravec, A. Elfes: High Resolution Maps from Wide-Angle Sonar, IEEE Transactions on Robotics and Automation, 1985.
K. Sugihara: Some Location Problems for Robot Navigation Using a Single Camera, Computer Vision, Graphics and Image Processing, 1988.
A. Kosaka, A. Kak: Fast Vision-Guided Mobile Robot Navigation Using Model-Based Reasoning and Prediction of Uncertainties, Computer Vision, Graphics and Image Processing, 1992.
D. Kim, R. Nevatia: Symbolic Navigation with Generic Maps, IEEE Workshop on Applications of Computer Vision, 1994.
J. Santos-Victor, G. Sandini, F. Curotto, S. Garibaldi: Divergent Stereo in Autonomous Navigation: From Bees to Robots, International Journal of Computer Vision, 1995.
J. Zufferey: Bio-Inspired Vision-Based Flying Robots, PhD thesis, Ecole Polytechnique Federale de Lausanne, 2005.
M. Turk, M. Marra, D. Morgenthaler, D. Gremban: Machine Vision for Vehicle Guidance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988.
B. Wilcox, L. Matthies, D. Gennery, B. Cooper, T. Nguyen, T. Litwin, A. Mishkin, H. Stone: Robotic Vehicles for Planetary Exploration, IEEE Transactions on Robotics and Automation, 1992.
