
Autonomous Robots 3, 355-374 (1996). © 1996 Kluwer Academic Publishers. Manufactured in The Netherlands.

Sonar-Based Mobile Robot Navigation Through Supervised Learning on a Neural Net


PRABIR K. PAL AND ASIM KAR

Division of Remote Handling and Robotics, Bhabha Atomic Research Centre, Mumbai 400085, India


pkpal@magnum.barc1.ernet.in

Abstract. For mobile robot navigation in an unknown and changing environment, a reactive approach is both simple to implement and fast in response. A neural net can be trained to exhibit such a behaviour. The advantage is that it relates the desired motion directly to the sensor inputs, obviating the need for modeling and planning. In this work, a feedforward neural net is trained to output reactive motion in response to ultrasonic range inputs, with data generated artificially on the computer screen. We develop input and output representations appropriate to this problem. A purely reactive robot, being totally insensitive to context, often gets trapped in oscillations in front of a wide object. To overcome this problem, we introduce a notion of memory into the net by including context units at the input layer. We describe the mode of training for such a net and present simulated runs of a point robot under the guidance of the trained net in various situations. We also train a neural net for the navigation of a mobile robot with a finite turning radius. The results of the numerous test runs of the mobile robot under the control of the trained neural net, in simulation as well as in experiments carried out in the laboratory, are reported in this paper.

Keywords: mobile robot, navigation, supervised learning, neural net, context units

1. Introduction

To send a robot to operate in a remote, inaccessible and possibly hazardous location, we must impart to it the power to navigate to its destination through an unknown environment. Sometimes, the environment is partly known in terms of corridors and turns. However, the detailed arrangements of objects on the way may not be known in advance. Even if they are known, they may change with time. So the robot must have some means of finding its way in an uncertain environment. For that, it needs some sensors through which it can detect the presence of obstacles. Once an obstacle is detected, it has to suitably modify its course of motion to avoid a collision. In this paper, we consider the navigation of a mobile robot in a completely unknown and changing environment. The robot has to reach a goal point without hitting any object on the way.

This goal could be one in a sequence of intermediate goals on the way to the actual destination. In other words, a broad path to the destination may be known, and only the details need to be worked out by the navigator. Thus, the focus is on local obstacle avoidance, although the same algorithm may navigate successfully on even a long trip. The navigation algorithm works irrespective of the means of mobility. The robot could move on its legs, on wheels or on tracks. In any case, it has to follow the course of motion determined by the navigator. For a vehicle that can turn at will and can move in any direction, this does not pose any problem. However, if the vehicle has to obey certain constraints of motion (e.g., a finite turning radius), then sometimes it may not be in a position to follow the navigator's command. In such cases, the navigator must keep in mind the robot's limitations and issue its commands accordingly.


The robot may use several sensors to feel its environment. With stereo vision, it may develop a depth map of its surroundings. Similar information can be obtained on a single plane by using a laser range scanner. An ultrasonic range sensor can also provide range information, although its angular resolution is poor. However, because it is cheap and easy to use, it has been used extensively for navigation purposes. In our laboratory experiments, we have used an ultrasonic range sensor to know about the environment. The sensor is turned around to record the ranges in all directions about the robot. One way for the robot to navigate would be to try to develop a map or a geometric model of its environment (Elfes, 1987; Moravec, 1988; Thrun, 1996). On the basis of such a model it can then decide its course of motion. This is difficult on two grounds. Sonar data has poor angular resolution and reliability. Forming a useful geometric model of the environment out of such data is well-nigh impossible. Also, given such a model, geometric path planning (Latombe, 1990) is time consuming. As a result, model-based navigation is slow and unsure. For real-time response in navigation, one often resorts to a reactive approach (Arkin, 1989; Brooks, 1986; Borenstein and Koren, 1991) in which, at any instant, the robot is sensitive only to its immediate surroundings and the goal point. It does not see or remember the global arrangement of objects, the connectivity of free space or its motions in previous time-steps to decide its current direction of motion. As a result, for some arrangements of objects, a reactive robot may keep wandering indefinitely (cycling or oscillating) without reaching the goal. Nevertheless, in the presence of a global plan, a reactive behaviour results in a quick local modification of a trajectory for the purpose of obstacle avoidance. The simplest way of building a reactive behaviour is through a potential field (Khatib, 1985). However, this approach is too obviously plagued with the problems of local minima. Instead, Borenstein and Koren proposed a polar histogram method to generate reactive motions for the robot (Borenstein and Koren, 1991). They also suggested a method to avoid cyclic or oscillatory motions characteristic of the reactive approach. This was later formalized by Slack in the form of navigational templates (Slack, 1993). A reactive behaviour can also be learnt by a neural net (Pomerleau, 1992; Millan and Torras, 1992; Dubrawski and Crowley, 1994; Meng and Kak, 1993). The advantage is that the network learns to relate the desired reactive motion directly to the sensor data.

No modeling or planning is required. Also, as we show later, by the use of context units one can introduce a short-term memory into the net so as to get rid of oscillations characteristic of a purely reactive robot. The power of the neural approach lies in its ability to learn and exhibit such behaviour directly in response to the sonar data, as also in its ability to perform reasonably well even under noisy input. Moreover, a neural implementation fits in well with the reinforcement learning (RL) scheme (Lin, 1992) for continued learning and adaptation. Neural nets have also been used in mobile robot navigation for sensor interpretation and fusion (Davis, 1995; Thrun and Bucken, 1996). In this paper, we describe our ongoing efforts to develop reactive navigation capability for a mobile robot through supervised learning on a back-propagation type neural net. Pomerleau's attempt to steer a car from a video image of the road was based on a similar approach, although the situations our mobile robot is expected to handle are quite different. For mobile robot navigation, a connectionist Reinforcement Learning (RL) approach has also been tried out by several researchers (Millan and Torras, 1992; Dubrawski and Crowley, 1994; Lin, 1992). However, it turns out to be extremely slow, because it learns stochastically from very weak feedbacks (yes/no) from the environment. It appears that RL is useful only when the situation-action rules are difficult to decipher, as is the case with many control applications, e.g., in Gullapalli et al. (1994). When the desired behaviour or reaction is at least intuitively predictable, supervised learning is far more efficient and effective. That is why it is felt that, whenever possible, supervised learning should precede an eventual reinforcement learning (Lin, 1992). For supervised learning, a sufficient number of training data representing all possible situations have to be acquired and provided. In the case of a mobile robot, its desired reaction to a situation is not at all difficult to guess. However, to provide such reactions in the field in response to the sonar inputs, for the generation of training data, is not at all easy. We circumvent this problem by generating data artificially on the computer screen, where the sonar inputs are simulated. The most important drawback of the reactive approach is its lack of sensitivity to context. If navigation is done purely on the basis of the current range data, every now and then the robot may decide to reverse its motion, because it does not remember that it had been coming from the same direction and that it does not make much sense in going back to the same place.


Often, this results in indefinite oscillations in certain situations. What is needed in this case is a resolve not to vacillate and to pursue one direction doggedly to go round the obstacle. In the behaviour-based approach (Brooks, 1986; Mahadevan and Connell, 1992; Dorigo and Colombetti, 1994) this problem is handled by including several behavioural modes and by switching from one to the other at appropriate instants. For example, the robot may switch from obstacle avoidance to wall following as the dominant behaviour, in order to go round a wide object. While learning the basic behaviours may not be so difficult, what remains for the robot to learn now is to coordinate these behaviours (Maes and Brooks, 1990; Dorigo and Colombetti, 1994). In this work, however, we have aimed at obtaining an integrated behaviour of the robot by using context units at the input layer of a feedforward neural net. This is simple to implement because the individual behaviours do not have to be identified and prioritized. Also, in terms of performance it is satisfactory so long as the environment is not too complex. The context units seem to play the role of the coordinator of behaviours.

2. Input and Output Representations

2.1. Input Representation

A feedforward neural net learns to map from the inputs to the outputs of the training data. The quality of learning depends on the smoothness of this mapping function. Wherever the function undergoes sharp variations, the neural net finds it difficult to learn. This results in a large error and a long learning time. The situation can often be improved considerably by representing the inputs and outputs in such a way that the new input output relation becomes smoother. Finding an appropriate representation for a problem requires a good idea about the role played by each of its inputs in deciding its output. Here we have a goal, which has to be reached avoiding obstacles with the help of the ultrasonic range data acquired periodically from all directions. The only means of control is through the steering angle of the mobile robot. Thus the obvious inputs are the goal position with respect to the current robot position and orientation, and the ultrasonic range data at some discrete angles over 360 deg. The output is of course the steering angle.

Information that is crucial in determining the output may be encoded directly or indirectly into all the inputs, if possible. For example, in the navigation problem, the goal direction is the most important input, because it sets the direction of uninhibited motion. We arrange the range data for input to the network always starting from the goal direction. Thus this information is implicitly conveyed through the arrangement of the entire input data. Although the goal direction keeps changing in world coordinates as the robot moves on, the neural net always sees the situations with respect to the goal direction and tries to learn the underlying motion strategies with respect to the same direction. The range data also form important inputs from the environment. However, their interpretation depends largely on the direction and the distance to the goal point. If the goal point is behind an object, represented by a set of range data, then the robot must move away from the object so as to go round it. On the other hand, if the goal point is in front of the same object, the robot is expected to approach the object and reach the goal point. Thus the same range information should lead to entirely different kinds of motion of the robot depending on the distance to the goal point. This calls for a transformation of the range data such that the network is able to distinguish between these two situations clearly. The perceptive range of the robot is the distance up to which it can sense the presence and absence of objects. It depends on the sensitivity of the range sensors as well as on the dimensions, characteristics and orientations of the reflecting surfaces of the objects around. The reactive range R_r of the robot is the distance to which the robot takes notice of the presence of any object before deciding its course of motion. In other words, a range reading outside the reactive range is ignored and treated as free space. The reactive range must be less than the perceptive range of the robot. We dynamically vary the reactive range along with the goal distance d_g, subject to a maximum, called the maximum reactive range R_x. Thus,

    R_r = d_g,   if d_g < R_x
        = R_x,   if d_g >= R_x                                            (1)

The range data r are then transformed into r_T as

    r_T = r - R_x,   if r <= R_r
        = R_f,       if r > R_r                                           (2)


where R_f is the range coding for free space. To distinguish it clearly from the occupied ranges, we set R_f to a positive constant value, viz., R_x. When the goal is far off² (d_g > R_x), the reactive range is R_x. So all range data higher than R_x are coded as free space R_f, and the range data r less than R_x are subtracted by R_x to form the corresponding transformed range r_T. A negative value of r_T gives an indication that the object is closer than the goal and, depending on the goal direction, the robot may have to make a corrective motion for going round it. The more negative the value of r_T, the closer is the object posing obstruction to the robot. Figure 1(a) shows such a situation.
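For concreteness, the following is a minimal sketch of this transformation as we read Eqs. (1) and (2); the function name and the plain-list representation of the readings are ours, not the paper's.

```python
def transform_ranges(ranges, goal_distance, R_x):
    """Transform raw sonar ranges following Eqs. (1) and (2)."""
    # Eq. (1): the reactive range follows the goal distance, capped at R_x.
    R_r = min(goal_distance, R_x)
    R_f = R_x  # free-space code: a positive constant (the paper uses R_x)
    transformed = []
    for r in ranges:
        if r <= R_r:
            # Eq. (2): occupied sector, offset by R_x; negative values flag
            # objects that obstruct the way to the goal.
            transformed.append(r - R_x)
        else:
            transformed.append(R_f)  # beyond the reactive range: free space
    return transformed
```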

The circle represents the reactive range, which is at its maximum value R_x in this case. Only the occupied sectors have been shown; the free sectors are left blank. When the goal is nearby, i.e., within R_x, all ranges higher than the goal distance are treated as free space. So objects behind the goal (as well as in other directions beyond d_g), even within R_x, are completely ignored. They are no longer relevant for reaching the goal. Figure 1(b) shows such a situation. The inner circle shows the reactive range R_r, whereas the outer circle shows R_x. If the shortest range recorded by its range sensors exceeds the reactive range R_r, the robot sees an empty environment. For training reactive navigation, we selected a layered feedforward neural net with a single hidden layer. The input layer has 24 neurons. They receive the transformed range data r_T at 15-degree intervals starting from the goal direction.

2.2. Output Representation

Figure 1. Occupied sectors at S when the goal position G is (a) outside, (b) inside the maximum reactive range R_x. R_r is the reactive range.

We choose a distributed representation for the steering angle at the output. There are 12 neurons at the output. They correspond to 12 directions of motion of the mobile robot at 30-degree intervals starting from the goal direction. The level of activation of any of these units represents the desirability of the robot moving in that direction. The unit with the highest activation decides the actual direction of motion. For training, we set the target output as a Gaussian distribution with its peak on the neuron representing the desired direction of motion. In effect, even the neighbouring neurons around the peak receive some activation. This conveys the fact that the neighbouring neurons indeed represent nearby angles in the physical space; a fact which, though obvious to us, is not apparent to the neural net. As we shall see below, such a distributed representation also enables us to change the output smoothly and continuously even when the direction of motion undergoes an abrupt change. A typical problem of navigation is to decide whether to take a left or a right turn when confronted with an object on the way to the goal. Figure 2 shows the desired reactive motions at two points P1 and P2 for a goal point G and an object O. A neural net must learn to switch its motion from left to right as we shift the robot across the line OG joining the object to the goal (in an approximate sense). This requires learning a discontinuous decision surface, which a single neuron at the output cannot do. With a distributed output, although the pattern of activations of the output neurons changes continuously, the resulting decision may change discontinuously, because it is only the position of the output neuron with maximum activation that forms the decision. Figures 3(a) and (b) show the target activations of the output neurons at the time of training for points P1 and P2 respectively of Fig. 2. Figures 3(c) and (d) show the patterns output at the same points by the trained net. They show how the net reconciles to the different output patterns at P1 and P2. In fact, the activation pattern of the output neurons of the trained net now recognizes both the reactive options. While varying its output pattern in a continuous fashion, the neural net can suddenly effect a change in the position of maximum activation, causing a switch from one option to the other.
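A minimal sketch of how such a Gaussian target could be generated is given below; the 30-degree sector spacing and the 15-degree standard deviation follow the text, while the function name and the wrap-around handling of angles are our assumptions.

```python
import math

def target_activations(desired_angle_deg, n_outputs=12, sigma_deg=15.0):
    """Gaussian training target over output sectors spaced 30 degrees apart,
    with sector 0 lying along the goal direction."""
    targets = []
    for k in range(n_outputs):
        sector_angle = k * (360.0 / n_outputs)
        # shortest angular difference, handling wrap-around (our assumption)
        diff = (desired_angle_deg - sector_angle + 180.0) % 360.0 - 180.0
        targets.append(math.exp(-0.5 * (diff / sigma_deg) ** 2))
    return targets

# A desired motion 25 degrees from the goal direction peaks at sector 1,
# with some activation spilling into the neighbouring sectors.
print([round(t, 3) for t in target_activations(25.0)])
```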
Figure 2. Desired reactive motion at points P1 and P2 for goal position G. The motion is discontinuous across OG.

Figure 3. Target activations of the 12 output neurons (for training) at positions (a) P1 and (b) P2 of Fig. 2. Activations actually achieved at the output by the trained net at (c) P1 and (d) P2 show two peaks, indicating competition between the two reactive options.

3. Training Data for Navigation

Acquiring sufficiently diverse data in the field in terms of the sonar ranges and the desired directions of motion for different goal points is a difficult task. So, we decided to generate a similar kind of data interactively on the computer screen. While doing so, we did not try to simulate the unreliability (mainly because of specular reflection on inclined surfaces) of sonar sensors. In other words, we do not try to guess the presence of undetected objects or surfaces, and after training we feel satisfied if the robot reacts in a sensible way to whatever it perceives. On the computer screen, we create an environment of simple polygonal objects, set a goal position and show the desired reactive motion at various points. This is repeated with many other goal positions. The 24 range inputs are formed by the minimum range computed in each sector of 15 degrees, starting from the goal direction, at the training point. The target output is formed on the 12 output sectors of 30 degrees each, again starting from the goal direction, through a Gaussian with its peak in the direction of desired motion. The standard deviation is set to 15 degrees. In this representation, the outputs change smoothly and continuously with the direction of motion. While generating data interactively on the screen for supervised learning, it is important to show the desired reactive motion of the mobile robot in relation to only what it sees at that moment through its ultrasonic range sensors and ignore other aspects like the global arrangement of objects or even the overall shape of the object that is currently obstructing the robot. It is important to follow this principle to avoid contradictions in the training data, because two situations may appear very similar from their range data, although given a global view they may look very different to the trainer, inviting very different reactions. In order to be consistent about this, it is necessary to establish a procedure following which these reactions may be determined.
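One way the 24 sector inputs could be assembled from the simulated readings is sketched below; the (bearing, range) representation of a reading and the helper name are ours, not the paper's.

```python
def sector_ranges(readings, goal_bearing_deg, n_sectors=24, no_echo=float("inf")):
    """Minimum range in each 15-degree sector, with sector 0 starting at the
    goal direction; `readings` is a list of (bearing_deg, range_m) pairs."""
    width = 360.0 / n_sectors
    mins = [no_echo] * n_sectors
    for bearing, r in readings:
        k = int(((bearing - goal_bearing_deg) % 360.0) // width)
        mins[k] = min(mins[k], r)
    return mins

# The 24 minima would then go through the range transformation of
# Eqs. (1)-(2) before being presented to the input layer.
```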


For registering the desired direction of reactive motion at a point, the objects are first occluded. The shortest distance within each sector is indicated as the range (r). An arc shows the span of the recorded range. Free sectors (r > R_r) are left blank. The occupied sectors are shown in color. There is a fuzzy rule about the color of a sector. An occupied sector with a range from R_r down to R_x/2 is shown in blue, from R_x/2 down to R_x/4 in green, and from R_x/4 downward in yellow. These color codes help in pointing the directions of reactive motions consistently. If it appears that the reactive motion should point to the free space next to a blue sector, then its direction is set one sector away from the blue sector. For a green sector, the reactive motion is set two sectors away, and for a yellow sector it is set three sectors away. This ensures that the robot does not approach too close to an object and, even if it finds itself quite close to an object, it tries to get away from that object while moving toward the goal point. Quite often, however, there are not enough free sectors available to adhere strictly to the above rules. In such situations a direction is set that compromises equally with the requirements of the occupied sectors on either side of the free sector. Figure 4 shows the direction of reactive motion in a simple situation. If there are a number of free sectors of various widths in different directions at a point, it becomes difficult to select one for onward motion. The goal direction, the previous direction of motion and the width of the free sector together decide the desirability of moving into a particular free sector.

The main difficulty lies in being consistent about this choice. Otherwise, it leads to contradictions among some of the training data, which the neural net finds difficult to reconcile. This shows up in a high worst case error. Here we choose the sector closest to the goal direction as the desired direction of motion, irrespective of the ranges of the neighboring occupied sectors. This is simple to follow at the time of registration of training data and results in consistent, though sometimes counter-intuitive, trajectories.
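Read literally, the color rule maps the range of the obstructing sector to a sector offset; a minimal sketch of that reading follows (the thresholds are from the text, the function name and the behaviour exactly at the boundaries are our choices).

```python
def sector_offset(obstacle_range, R_x):
    """Sectors to keep away from an occupied sector, per the color rule."""
    if obstacle_range > R_x / 2:     # blue: relatively distant obstacle
        return 1
    elif obstacle_range > R_x / 4:   # green: intermediate distance
        return 2
    else:                            # yellow: very close obstacle
        return 3

# The desired direction is then set this many sectors away from the occupied
# sector, into the adjoining free space, before the target Gaussian is formed.
```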

4. Results of Reactive Navigation

We trained a neural net with a single hidden layer of 6 neurons with data for 280 situations. R_x was set at 1.0 m. For the sake of comparison, the training was provided in three ways: Backpropagation with epoch learning (BPEPOCH), in which weights are updated after seeing all the training data; Backpropagation with stochastic learning (BPSTOCH), in which weights are updated after seeing each training datum; and Resilient backpropagation (RPROP) (Riedmiller and Braun, 1993), which employs an adaptive technique for updating its weights after every epoch. Table 1 shows the run-times and convergence characteristics of these learning algorithms. Although initially BPSTOCH seems to learn faster, that is only because it gets a chance to update its weights each time it sees a training pattern.
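RPROP adapts a separate step size for every weight from the sign of the batch gradient. The following is a generic sketch of the basic variant (without weight backtracking), with the commonly quoted default constants; it is an illustration of the technique, not the authors' implementation.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One epoch-wise RPROP update; all arguments are arrays of equal shape,
    and `prev_grad` / `step` carry state from the previous epoch."""
    sign_change = grad * prev_grad
    # grow the step where the gradient kept its sign, shrink it where it flipped
    step = np.where(sign_change > 0.0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0.0, np.maximum(step * eta_minus, step_min), step)
    # where the sign flipped, suppress the update for this epoch
    grad = np.where(sign_change < 0.0, 0.0, grad)
    w = w - np.sign(grad) * step
    return w, grad, step
```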
Table 1. Run-time and convergence characteristics of the three approaches to supervised learning: Backpropagation with epoch learning (BPEPOCH), Backpropagation with stochastic learning (BPSTOCH) and Resilient backpropagation (RPROP). A 24-6-12 neural net configuration was used for learning 280 patterns of purely reactive navigation.

  Supervised         Average and worst case percentage errors after          Execution time
  learning method    100 iterations    1000 iterations    4000 iterations    per iteration (s)#
  BPEPOCH*           31.78  57.11      17.42  54.05       14.47  51.23       0.97
  BPSTOCH*           13.64  46.71      12.78  45.69       12.70  43.49       1.29
  RPROP              17.20  54.83      12.96  40.13       11.87  39.55       0.95

  # On a 486DX2 PC.
  * Learning and momentum parameters were set at 0.5 and 0.75 respectively.

Figure 4. Reactive motion at S; the direction is determined following a color rule. b and g stand for a blue and a green range.

In the long run it is RPROP which attains the most impressive figures in the shortest time. BPEPOCH seems to trail behind in average and worst case errors. The figures in Table 1 do not necessarily reflect the quality of learning achieved through the respective methods of training. We assess that by allowing the trained net to navigate in randomly created environments. Such environments are created by randomly positioning and orienting a collection of polygonal objects and then removing or separating out some of them, to get rid of non-convex formations of objects and to ensure that the gaps between objects are at least R_x/2. Start and goal positions are also selected randomly. However, positions for which the solutions are trivial (viz., minor modifications of the line joining the start to the goal position) are not included in the test runs. 100 such runs are taken in 10 environments, i.e., for 10 start and goal pairs in each environment. Figure 5 shows a few runs with the RPROP-trained net in some of these environments. The quality of learning achieved by a trained net is indicated through several parameters: the collision rate, i.e., the percentage of trajectories in which collision takes place, and the success rate, i.e., the percentage of trajectories in which the goal is reached without any collision. For the successful trajectories we also compute the number of undulations in every 1000 steps. Undulations in a trajectory are an indication of the lack of smoothness in the mapping function learnt; the fewer the undulations, the smoother is the function and the better is the generalisation. At each step, the robot is said to have moved straight if its direction of motion falls within the -15 deg to +15 deg range with respect to its previous direction of motion. Otherwise it is said to have taken a left or a right turn. If the robot swings from a left to a right turn or vice versa in two successive steps, we count it as an undulation. The number of undulations is sensitive to the step-length. In this paper, they are always quoted for a step-length of 0.06 m. Table 2 gives the results of the test runs for purely reactive navigation with the three methods of training: BPEPOCH, BPSTOCH and RPROP. Collision avoidance is learnt perfectly by all the methods, although the success rate is generally poor. This is because in front of a wide object a reactive robot keeps oscillating indefinitely without reaching the goal. So far as the success rate is concerned, BPEPOCH and RPROP fare almost equally well. RPROP, however, produces smoother trajectories, as indicated by fewer undulations.
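A sketch of how this undulation count could be extracted from a logged trajectory is given below; the ±15-degree band and the left/right swing rule follow the text, while the heading representation and the function name are ours.

```python
def count_undulations(headings_deg):
    """Count left/right swings in successive steps of a trajectory, given the
    direction of motion (in degrees) at every step."""
    turns = []
    for prev, cur in zip(headings_deg, headings_deg[1:]):
        delta = (cur - prev + 180.0) % 360.0 - 180.0   # signed turn angle
        turns.append(1 if delta > 15.0 else -1 if delta < -15.0 else 0)
    # a left turn immediately followed by a right turn (or vice versa)
    return sum(1 for a, b in zip(turns, turns[1:]) if a != 0 and b == -a)
```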

Table 2. Performance measures for the reactive navigation of a point robot in 100 trial runs in 10 randomly created environments for three different learning algorithms.

  Learning method   Collision rate %   Success rate %   Undulations per 1000 steps
  BPEPOCH           0                  51               51
  BPSTOCH           0                  50               63
  RPROP             0                  56               44

Table 3. Effect of the number of neurons in the hidden layer on the quality of learning.

  No. of neurons in the hidden layer   Collision rate %   Success rate %   Undulations per 1000 steps
  2                                    46                 54               57
  6                                    0                  56               44
  12                                   0                  56               41

On the other hand, in spite of having impressive figures of average and worst case errors, BPSTOCH has a poorer success rate and high undulations. All these point to the fact that, for a fixed set of training data, epoch learning results in better generalisation than the stochastic learning followed in Pal and Kar (1995), possibly because it gets more reliable information regarding the shape of the error function through the summed gradient over the whole pattern set.

Having chosen RPROP as the most suitable method for training, we proceed to see the effect of the number of neurons in the hidden layer on the quality of learning (see Table 3). With only 2 neurons in the hidden layer, there is a severe constriction in the flow of information from the input to the output. Nevertheless, the success rate of 54% is quite impressive on a relative scale. The difference that this narrow hidden layer makes is seen in the high collision rate of 46% as well as in its relatively high undulations. With 12 neurons in the hidden layer, the success rate remains unchanged, although the undulations get a little worse. It seems that for reactive navigation 6 neurons in the hidden layer is adequate and appropriate.

A reactive robot can make its way only through small and compact objects. Figure 5(d) shows two trajectories (S1-T1 and S2-T2) which oscillate in front of wide objects and do not make any further progress. Figure 6 shows the lines of motion in that environment, although for a different goal position.


Figure 5. Reactive trajectories of the point robot under the control of the trained net in random environments (a) to (f). The start and goal points are shown as S and G. Wherever a trajectory terminates in oscillations, that point is marked as T.

The flows are smooth and continuous away from the objects. However, close to the objects, there are locations at which the flow directions oppose each other at neighboring points. These are places where the neural net fails to merge smoothly the diverse tendencies of flow in adjoining regions.

This is inherent in the reactive approach. We try to alleviate this problem through the introduction of context units in the input layer.


Figure 6. Lines of motion for the environment shown in Fig. 5(d) for the goal position G. The encircled regions have lines of motion facing one another. As the robot enters such a region, it gets trapped into indefinite oscillations. The locations and extents of these regions depend on the location of the goal point.

5. Context Units

The reason behind the failure of the reactive approach is its total insensitivity to context! It does not remember what it had seen and how it had chosen to act just the previous moment. To introduce a sense of memory in navigation we use context units (Hertz et al., 1991; Jordan, 1989; Yeung and Bekey, 1993). The context units are additional neurons in the input layer. They receive their inputs from the outputs of the previous time step (see Fig. 7(a)). Thus the network learns to relate its current outputs to its past experience and decisions³. With context units, the training has to take place trajectory-wise. At the beginning of a trajectory the robot does not have any context. So, the context units are all initialized to zero. At this stage, the robot may choose to follow its reactive instinct. However, once an output direction is chosen, the context is set for the next step. The inputs c_i to the context units at the ith time step are given by

    c_i = α c_{i-1} + O_{i-1},   0 < α < 1                                (3)

where c_{i-1} and O_{i-1} are the context inputs and the neural net outputs at the previous time step. The constant α controls the influence of previous motions on the current direction of motion. As α approaches unity, the net exhibits more and more memory, at the cost of sensitivity to detail. The activation patterns at the outputs of the neural net usually have several peaks, each corresponding to a reactive option. This pattern, in a way, is a reflection of the environment as well. Accumulated over a few steps in the context units, following Eq. (3), they hold some information about what the robot has seen in the last few steps. Trajectories obtained by this method were presented in Pal and Kar (1995). They were jerky and erratic at points. It appears that the contents of the context units have poor contextual information. They do not convey in clear terms which way the robot chose to move. This information is much more relevant in deciding the current direction of motion than a hazy recollection of what the robot had seen in the last few steps. In view of this, we decided to filter the outputs of the net before they are fed to the context units. The filter allows the highest peak in the activation pattern, corresponding to the actual direction of motion, to pass through, but suppresses other minor peaks which only indicate alternate reactive options (see Fig. 7(b)). This forms a clean and unambiguous context signal ideally suited to navigation. However, other forms of context signals can also possibly be contemplated. For example, the previous inputs can also be fed to additional context units. That would of course increase the size of the neural net. A neural implementation of the filter can be envisaged⁴. However, for the time being, we have programmed the filter simply as a function which detects the position of maximum activation in its input and outputs a Gaussian around that. Equation (3) can now be rewritten as

    c_i = α c_{i-1} + F(O_{i-1}),   0 < α < 1                             (4)

where F is a suitable function of the activations of the output units. Its job is to extract the contextual information from the outputs of the net. Although in this case the function happens to be only a filter, in principle it can be quite complex. In fact, as we shall see later, this function assumes a nontrivial form in the case of navigation of a mobile robot with a finite turning radius. Since the contextual information is now output by a function, it is no longer necessary that the number of context units be the same as the number of output units. In fact the context information can now be compressed and fed to fewer context units, the extreme case being a situation in which there is a single context unit. However, since the context information has significant influence over the output, it is desirable that it is presented through a number of units comparable to the number of regular input units, so that there are a sufficient number of weights available to influence the output. To generate training data, the context-sensitive motions are now chosen following the same principle as is implied in (Borenstein and Koren, 1991; Slack, 1993), i.e., once a decision is taken to keep a nearby object to the left, motions in subsequent time steps should not contradict that. So the robot would not hesitate to go round an object on its way to the goal. For training, the back-propagation technique can still be used. However, the data for a trajectory must be presented to the net in the same sequence in which they were obtained. At each step, as the processed ranges are fed to the input units of the net, the context units are set following Eq. (4). We trained a net with 12 context units apart from the 24 regular units at the input. There were 18 units in the hidden layer and 12 units at the output. The step-length of each motion was set to 0.2 m. The value of α was set to 0.3. Only 30 trajectories were used for training. Each trajectory had about 10 to 20 steps of motion. Figure 8 shows trajectories generated by the context sensitive net in the same situations as shown in Fig. 5. The effect of the context units is obvious. All those trajectories which terminated in oscillations in Fig. 5 now reach their respective destinations in Fig. 8. Table 4 makes a comparison of the performance of a point robot under the control of the reactive and the context sensitive neural net.

Figure 7. (a) A context sensitive neural net. A thick arrow implies a full connection through a weight matrix; (b) the output is passed through a filter, which suppresses the minor peaks, before being fed to the context units at the input.
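A minimal sketch of the filtered context update of Eq. (4) for the point robot follows, with F implemented as the argmax-and-Gaussian filter described above; the NumPy layout and the 15-degree width of the re-emitted Gaussian are our assumptions.

```python
import numpy as np

def filter_outputs(outputs, sigma_deg=15.0):
    """F in Eq. (4): keep only the highest peak, re-emitted as a Gaussian."""
    outputs = np.asarray(outputs)
    n = len(outputs)
    spacing = 360.0 / n
    best_angle = np.argmax(outputs) * spacing
    angles = np.arange(n) * spacing
    diff = (angles - best_angle + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (diff / sigma_deg) ** 2)

def update_context(context, prev_outputs, alpha=0.3):
    """Eq. (4): c_i = alpha * c_{i-1} + F(O_{i-1}); the context starts at zero."""
    return alpha * np.asarray(context) + filter_outputs(prev_outputs)
```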


Figure 8. Context sensitive trajectories for a point robot in the same environments and for the same start-goal pairs as in Fig. 5. Only (f) has an additional trajectory (S5-G5) showing a navigational failure.

Apart from a higher success rate of 86%, the trajectories under the context sensitive net are smoother (i.e., they have fewer undulations). However, they are not as sure of collision avoidance (14%), because unlike the reactive trajectories they always terminate at the goal, even if they have to hit objects on the way (see S5-G5 in Fig. 8(f)).

The navigation problem has an inherent left-right symmetry. With the goal direction as the reference, if we substitute an environment with its mirror image, the reactive response in this environment should also be a mirror image of the response in the original environment.


Table 4. A comparison of the performance of a point robot under the control of the reactive and the context sensitive neural net.

  Method                      No. of colliding   No. of failed   No. of successful   Collision   Success   Undulations per
                              trajectories       trajectories    trajectories        rate %      rate %    1000 steps
  Reactive (RPROP)            0                  44              56                  0           56        44
  Context sensitive (RPROP)   14                 14              86                  14          86        30

Figure 9. For a robot trajectory in a certain environment (a), there is a mirror trajectory in its mirror environment (b).

Although such a symmetry is implicitly conveyed to the net through the training data representing all possible situations, in order to avoid any bias toward clockwise or counterclockwise motion, as was observed in Pal and Kar (1995), we decided to include, along with each training data generated on the screen, its mirror image data. This symmetry holds for context sensitive navigation as well. For a context sensitive trajectory, we create a mirror trajectory, each data of which is a mirror image of the corresponding data of the original trajectory (see Fig. 9). The inclusion of symmetric pairs in the training data ensures that the net recognizes the left-right symmetry as a basic feature of navigation.
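Since both the input and output sectors are indexed from the goal direction, mirroring a training pattern amounts to reversing the sector order about that direction; a sketch under that reading, with our own list layout, is shown below.

```python
def mirror_pattern(ranges, targets):
    """Mirror a training pattern about the goal direction: sector 0 lies along
    the goal direction and maps to itself, sector k maps to sector n-k."""
    mirrored_ranges = [ranges[0]] + ranges[1:][::-1]
    mirrored_targets = [targets[0]] + targets[1:][::-1]
    return mirrored_ranges, mirrored_targets

# Both the screen-generated pattern and its mirror image are added to the
# training set, so the net sees every situation and its left-right reflection.
```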

6. Laboratory Experiments in Navigation

So far, our neural net has been trained to guide a point robot, which is essentially omnidirectional. The output of such a net cannot be used directly to steer a mobile robot with a finite turning radius.

A kinematic model of the wheel arrangements must be used to decide the steering angle that would result in the desired motion. Alternatively, the neural net may be trained to output the steering angle directly. In that case, to generate the training data, we have to indicate the desired steering angle rather than the desired direction of motion in a given situation. It feels like driving a three-wheeled vehicle on the computer screen. We did not follow any color rule for this, but instinctively drove the vehicle around the objects, keeping in mind its finite turning radius. We generated 47 such trajectories and their symmetric pairs for training. Each such trajectory had at most 10 segments and quite often formed only part of a complete trajectory to the goal. This was done to avoid repetitive data and to ensure sufficient variety within a reasonably small volume of training data. The mobile robot under consideration has three wheels, of which the front wheel is steered and the hind wheels are separately driven. The steering angle for a mobile platform is the angle made by the steering wheel with respect to its axis (see Fig. 10).


Table 5. Performance of the mobile robot in simulation and experiment.

  Method       No. of colliding   No. of failed   No. of successful   Collision   Success
               trajectories       trajectories    trajectories        rate %      rate %
  Simulation   9                  9               71                  11          89
  Experiment   6                  6               32                  16          84

Figure 10. The configuration parameters of the mobile robot: the coordinates (x, y), the heading angle ψ and the steering angle θ.

The context for onward motion is set by the orientation of this axis, called the heading angle, whereas the output of the neural net holds only the steering angle. Thus the output cannot be copied directly to the context units. Instead, the steering angle is used to compute the heading angle at the end of the motion step, and that, with respect to the goal direction, is fed as a Gaussian to the context units. The function F in Eq. (4) refers to this kind of mapping from the outputs to the context units in a general sense. A 36-18-7 neural net was trained by RPROP through 10,000 iterations with the data comprising 94 part trajectories. A reduction in the number of output neurons from 12 to 7 was possible because the mobile robot could turn only between -45 and +45 degrees. The average and the worst case percentage errors for the training data after these iterations were 22 and 72 respectively, indicating that there are situations in which the neural net could not reconcile itself to the training data.
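A rough sketch of what this output-to-context mapping could look like is given below; the bicycle-model kinematics for the heading update, the wheelbase and step-length parameters, and the assumption that the context units span the full circle are ours, not details given in the paper.

```python
import math
import numpy as np

def heading_context(psi, theta, goal_bearing, step_len, wheelbase,
                    n_context=12, sigma_deg=15.0):
    """Map the chosen steering angle to a context vector: predict the heading
    at the end of the motion step, then encode it, relative to the goal
    direction, as a Gaussian over the context units. Angles are in radians."""
    # bicycle-model kinematics: heading change over one step of length step_len
    psi_new = psi + (step_len / wheelbase) * math.tan(theta)
    # heading relative to the goal direction, in degrees
    rel = math.degrees(psi_new - goal_bearing)
    angles = np.arange(n_context) * (360.0 / n_context)
    diff = (angles - rel + 180.0) % 360.0 - 180.0
    return psi_new, np.exp(-0.5 * (diff / sigma_deg) ** 2)
```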

More than any deficiency in its ability to learn, it is the occasional contradictions in the training data that are responsible for this high worst case error. Such contradictions arise when apparently similar situations evoke different reactions from the trainer. In spite of that, the neural net manages to learn the essential features shared by most of the data. The behaviour of the trained net in simulated environments (these are different from the ones in which the training data were generated) is quite good, as is evident from Table 5. Out of 80 trajectories in 9 environments, 71 were successful and 9 terminated in collision, indicating 89% success. The fact that the robot fails sometimes even in simulation shows that the navigation is not yet perfect. We used this net to guide a battery-driven experimental mobile robot over short distances inside our laboratory. The runs were taken over an 8 m x 6 m flat open area on which cartons and cardboard cylinders posed as obstacles (see Fig. 11). Altogether 38 runs were taken in 7 different environments. Each environment had a certain arrangement of the objects. These arrangements were sometimes random, as in Figs. 12(a) to (d) and (f). However, the situations that a mobile robot is likely to confront in reality are often more orderly, like a corridor or the chairs and tables in an office. So, some environments were created (see Figs. 12(e), (g) to (i)) to test the ability of the mobile robot to negotiate long walls and turns. The experimental runs were kept short (5 m to 10 m) because there is no way for our mobile robot to know its actual position and orientation during a run. It in fact computes these values at every step assuming that there is no slippage of its wheels. This results in a gradual drift between the computed and the actual coordinates, and eventually shows up in the robot reaching a goal point away from what was originally specified. In the runs of Fig. 12, we show the robot at its computed coordinates. So in this case there is a drift between the positions and orientations of the actual objects and the ones perceived through the sonar ranges (shown as arcs).


Figure 11. Photographs showing the mobile robot at intermediate positions during the run of Fig. 12(d).

As we started our experiments, we observed that some of the cartons were difficult to detect at an angle because of specular reflection of the ultrasonic wave from their smooth surfaces. The cylinders could of course be sensed from all angles. To ensure uniformity in the reflectivity of all the objects in the environment irrespective of their orientations, we wrapped each of them with corrugated paper. Because of its wavy surface, corrugated paper scatters ultrasonic waves in all directions. So the objects could now be sensed from all angles; in fact they appear much too wide, so much so that often the mobile robot fails to recognise the gap between two objects. This results in a trajectory of the mobile robot different from that obtained in simulation.

However, as long as the robot manages to reach its destination without hitting any object on the way, we consider the run to be successful. The mobile robot is 0.46 m wide and 0.53 m long. The maximum reactive range R_x is kept at 1.5 m. In our arrangements of the objects, we have tried to keep gaps ranging from 1 m to 1.6 m for navigation. The narrowest gap the mobile robot ever passed through during the experiments was 1 m. In a dozen cases, it passed through gaps measuring up to 1.25 m. The fact that the mobile robot requires a wide enough gap (more than double its width) to pass through safely is because of the strong scattering of ultrasound by the corrugated paper. A reduction in the sensitivity of the ultrasonic receiver or the use of a lesser scatterer would have possibly allowed it to pass through narrower openings.


Figure 12. (a) to (i): A sample of simulated (left) and actual runs (right) for different object arrangements and start and goal positions. The rectangles and circles shown in simulation correspond to cartons and cardboard cylinders in the real environment. In the actual runs with the mobile robot, the recorded sonar ranges are shown as arcs. Excepting in (f) and (h), the mobile robot reaches its destination, although often in ways different from that indicated in simulation, as in (a) to (c) and (e).



Even then, it appears that the context or the direction from which a gap is approached has a strong bearing on whether the robot would move through the gap or avoid it.

Table 5 shows the results of our experiments. Out of 38 runs, 32 were successful in reaching the goal without collision, indicating 84% success. There are situations in which the simulation succeeds but the experiment fails, and vice versa. The difference comes mainly from the apparent widths of the objects; in our experiments the objects looked much wider than they actually are.


Out of the successful runs, in only 14 cases were the trajectories similar in both simulation and experiment. In the remaining 18 cases either the gaps were not sensed well enough or the apparent increase in the widths of the objects encouraged the robot to take a course different from that in simulation.

Figure 12 shows some of these runs, of which in (d), (g) and (i) the actual runs are similar to those in simulation, whereas in (a) to (c) and (e) they are different. Figures 12(f) and (h) show two instances of navigational failures where the robot meets collision in spite of detecting the objects.


The failures take place in situations where the perceived gap is narrow and detected late (see Fig. 12(f)). Besides, the context units seem to discourage the robot from taking sharp turns; this also makes it difficult sometimes for the robot to steer clear of obstacles (see Fig. 12(h)).

7. Conclusion

A neural implementation of reactive navigation has the advantage that the navigational decision is directly related to the sensor inputs and does not require any modeling. Also, it is more tolerant of noisy inputs. By carefully representing the inputs and outputs, we have been able to train a feedforward neural net to exhibit the essential features of reactive navigation. Following are the principal components of our representation:

(i) The goal direction serves as the reference direction for both the input and the output.
(ii) The range data are so transformed before presentation as input to the neural net that it becomes amply clear as to which object is an obstacle and which is not on the way to the goal.
(iii) The output indicating the steering direction has a distributed representation that allows a discontinuous change in the steering direction through a continuous change in the activations of the output neurons.

We have also successfully explored the use of context units to avoid oscillations characteristic of a pure reactive behaviour. For the mobile robot the output gives the steering angle, whereas the context is formed by the heading angle. So, instead of copying directly from the output, the context inputs are computed from the output using the kinematic equations of the mobile robot. In principle of course, a recurrent neural net may be left to discover its context by backpropagation through time (Werbos, 1990). The trained net guides a mobile robot in our laboratory through gaps of 1 m and above to its destination, with about 84% success measured over a small sample of only 38 runs. The occasional navigational failures of the mobile robot can be traced to the following reasons:

(i) The training data set may not have included any situation resembling the one encountered.
(ii) The use of context units seems to prevent the mobile robot from taking sharp turns. This makes it difficult for the robot to make use of openings that show up late.
(iii) The mobile robot does not know to stop in the absence of a clear verdict about the direction of motion.

The generation of training data on the computer screen completely ignores the variations in the ability of objects to scatter ultrasound. Whereas our navigation algorithm operates using range data, ultrasonic sensors alone are not adequate to provide reliable range information in varied circumstances. In our experiments we wrapped each object with corrugated paper to avoid the surface, shape and orientation dependent uncertainties in the recorded ranges. In an actual application this is not possible. As a result, with natural objects (not wrapped with corrugated paper), sometimes the robot fails to see a wide gap, and sometimes it moves embarrassingly close to an object. However, in conjunction with other simple sensors like infrared proximity sensors, one can form more reliable estimates of the ranges (Flynn, 1988), which can then be used effectively by our navigator.

The generation of training data is laborious. It is also subjective in nature. Such expertise is difficult to transfer or document. Moreover, there is no way to assert that a training set is complete or adequate. This uncertainty about the adequacy of training is the way of life with all learning algorithms. In its formative period, the training set is continually upgraded with data for situations in which the trained net tends to fail. Since the number of possible situations is potentially infinite, this process may continue indefinitely. On the other hand, if we use a mathematical model directly for navigation, it may have certain limitations, but there will be no uncertainty about its capabilities. At the same time such a model is inflexible, and so it may not be possible to perfect it over a length of time. A good alternative is to supplement supervised learning with a scheme like reinforcement learning (RL) for continued learning and adaptation during operation. The neural net trained on artificially generated data may form the policy network in an Adaptive Heuristic Critic (AHC) learning architecture (Sutton, 1984), while learning continues with actual sensor data and reinforcements obtained in the field.


Acknowledgments

Part of this material was presented at the 1995 IEEE International Conference on Robotics and Automation. This paper has benefitted immensely from the changes suggested by the editor and three anonymous reviewers. We are grateful to M. Jayandranath of Central Workshop for providing us space for conducting the navigation experiments. We also deeply appreciate the interest shown by M.S. Ramakumar all along in our efforts to improve mobile robot navigation.

Notes
1. Another way of asserting the importance of an information is to present it to the neural net through a number of input neurons, so that there are a sufficient number of weights at its disposal to influence the output significantly. We adopt this technique to represent the context information in Section 5.
2. In our representation for the inputs, the goal direction is set as the reference direction. However, if the goal point lies outside the maximum reactive range, its distance does not play any role in deciding the motion of the robot. This only points to the fact that a reactive robot is only concerned about its immediate neighborhood.
3. Lin (1992) also recognises the importance of history information in deciding the course of navigation. Along with current sensor readings, the previous action is also included in his input representation.
4. It may be noted that the filter between the outputs and the context units broadens the scope of these units. With a neural implementation of the filter, the net in fact becomes a recurrent net, and algorithms such as backpropagation through time (Werbos, 1990) become applicable for training the net.

References
Arkin, R.C. 1989. Motor schema-based mobile robot navigation. Int. J. Robotics Res., pp. 92-112.
Borenstein, J. and Koren, Y. 1991. The vector field histogram: Fast obstacle avoidance for mobile robots. IEEE Trans. Robotics Automat., pp. 278-288.
Brooks, R.A. 1986. A robust layered control system for a mobile robot. IEEE J. Robotics Automat., RA-2:14-23.
Davis, I.L. 1995. Sensor fusion for autonomous outdoor navigation using neural networks. Proc. of 1995 Intelligent Robots and Systems (IROS) Conference, IEEE Press, pp. 338-343.
Dorigo, M. and Colombetti, M. 1994. Robot shaping: Developing autonomous agents through learning. Artificial Intelligence, 71:321-370.
Dubrawski, A. and Crowley, J.L. 1994. Learning locomotion reflexes: A self-supervised neural system for a mobile robot. Robotics and Autonomous Systems, pp. 133-142.
Elfes, A. 1987. Sonar-based real world mapping and navigation. IEEE J. Robotics Automat., RA-3:249-265.

Flynn, A.M. 1988. Combining sonar and infrared sensors for mobile robot navigation. Int. J. Robotics Research, 7(6):5-14.
Gullapalli, V., Franklin, J.A., and Benbrahim, H. 1994. Acquiring robot skills via reinforcement learning. IEEE Control Systems, pp. 13-24.
Hertz, J., Krogh, A., and Palmer, R.G. 1991. Introduction to the Theory of Neural Computation, Addison-Wesley, pp. 179-182.
Jordan, M.I. 1989. Serial order: A parallel distributed processing approach. Advances in Connectionist Theory: Speech, edited by J.L. Elman and D.E. Rumelhart, Hillsdale: Erlbaum.
Khatib, O. 1985. Real-time obstacle avoidance for manipulators and mobile robots. Proc. IEEE Int. Conf. Robotics Automat., pp. 500-505.
Latombe, J.C. 1990. Robot Motion Planning, Kluwer Academic, Boston.
Lin, L.J. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293-321.
Maes, P. and Brooks, R.A. 1990. Learning to coordinate behaviours. Proceedings AAAI-90, pp. 796-802.
Mahadevan, S. and Connell, J. 1992. Automatic programming of behaviour-based robots using reinforcement learning. Artificial Intelligence, 55:311-365.
Meng, M. and Kak, A.C. 1993. Mobile robot navigation using neural networks and nonmetrical environment models. IEEE Control Systems, pp. 30-39.
Millan, J.R. and Torras, C. 1992. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Machine Learning, 8:363-395.
Moravec, H.P. 1988. Sensor fusion in certainty grids for mobile robots. AI Magazine, pp. 61-74.
Pal, P.K. and Kar, A. 1995. Mobile robot navigation using a neural net. Proceedings of the IEEE International Conference on Robotics and Automation.
Pomerleau, D.A. 1992. Neural network perception for mobile robot guidance. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh.
Riedmiller, M. and Braun, H. 1993. A direct adaptive method for faster back-propagation learning: The RPROP algorithm. Proc. IEEE Int. Conf. Neural Networks, San Francisco, pp. 586-591.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. 1986. Learning internal representation by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart and McClelland, MIT Press, vol. 1, pp. 318-362.
Slack, M.G. 1993. Navigational templates: Mediating qualitative guidance and quantitative control in mobile robots. IEEE Trans. Syst. Man Cybern., pp. 452-466.
Sutton, R.S. 1984. Temporal credit assignment in reinforcement learning. Ph.D. Thesis, Dept. of Computer and Information Science, University of Massachusetts.
Thrun, S. and Bucken, A. 1996. Learning maps for indoor mobile robot navigation. CMU-CS-96-121, School of Computer Science, Carnegie Mellon University.
Werbos, P.J. 1990. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, vol. 78, no. 10.
Yeung, D.Y. and Bekey, G.A. 1993. On reducing learning time in context-dependent mappings. IEEE Trans. on Neural Networks, 4:31-42.

374

Pal and Kar

Prabir K. Pal is a Scientific Officer at the Division of Remote Handling and Robotics of Bhabha Atomic Research Centre, Mumbai, India. His research interests include robot motion planning, mobile robot navigation, gait generation of walking machines, visual simulation of robots and neural networks. He did his MSc. in Physics from the University of Calcutta, Calcutta, India in 1978 and joined the Training School of Bhabha Atomic Research Centre in the same year. Over the years his interests shifted from physics to computing and finally to robotics. In the last 12 years he has led a number of important research projects at the Division of Remote Handling and Robotics.

Asim Kar is a Scientific Officer at the Division of Remote Handling and Robotics of Bhabha Atomic Research Centre, Mumbai, India. His research interests include robot motion planning and sonar based mobile robot navigation. He did his graduation (B.E.) in Electrical Engineering from the University of Calcutta, Calcutta, India in 1989 and post graduation (M.E.) from Jadavpur University, Calcutta, India in 1991. He joined the Training School of Bhabha Atomic Research Centre in 1990. Since 1991, he is associated with the Division of Remote Handling and Robotics. He developed an ultrasonic range sensor and did the entire system design for an autonomous mobile robot.
