J.M. Martinez (*), C. Barret, M. Houkari, P. Meyne, M. Dominguez (*)
Laboratoire de Robotique d'Evry, Allée Jean Rostand, 91025 Evry, France
(*) Centre d'Etudes de Saclay, DMT, 91190 Gif sur Yvette, France
e-mail: jmm@soleil.saclay.cea.fr

Abstract: This paper shows the capabilities of neural networks in optimal control applications for nonlinear dynamic systems. Our method is based on a classical method for the direct search of the optimal control using gradient techniques. We show that the neural approach and the backpropagation paradigm can efficiently solve the equations given by the necessary conditions for an optimizing solution. We have taken into account the known capabilities of neural networks for function approximation. For dynamic systems, we have generalized the adaptive architecture based on indirect learning of the inverse model, so that it can define an optimal control with respect to a temporal criterion.

Keywords: neural control theory, adaptive identification, adaptive control, optimal control.

1. Introduction

Neural techniques have already shown their ability in identification and control processes [1, 2, 3, 4, 5, 6]. These techniques were first introduced for static processes, using direct or indirect learning architectures of the inverse model. They were then extended to dynamic processes. These approaches rely only on local optimization, i.e. the effects of the control over time are not really taken into account. An alternative that removes this limitation is to elaborate the control by minimizing a criterion related to the temporal evolution of the process states. This approach is based on recurrent networks, where the behaviour of the process is analyzed from a sequence of neural network models. To fit dynamical behaviour, one can use Backpropagation Through Time (BTT) [12]. Our approach is similar, but here we use BTT to learn an optimal control.
We use a sequence of neural networks to estimate the temporal evolution of the process states in order to define the best control.

Today, classical methods in control theory are mature and well formalized. Nevertheless, these methods are not well suited to real applications of optimal control. They may be implemented by coupling them with neural methods. In [8], emphasis is placed on the "continuity of the research on artificial neural networks with more traditional research", in order to take advantage of background knowledge of process control.

Defining an optimal control by classical methods requires good knowledge of the process: an analytical model in the form of algebraic-differential or nonlinear recurrence equations is necessary. In real processes, models are not sufficiently known, and when they are, it is not possible to use them on-line because of time constraints. So, in real applications, one usually handles the identification problem with adaptive linear models, which provide the characteristic features of the process to classical controllers. But these models are not able to estimate the process behaviour over a long time horizon, so optimal control is not possible.

Compared with classical methods, neural techniques can be distinguished by two characteristics. The first lies in the structure of neural models, i.e. nonlinear models that generalize the classical linear approach. The second concerns adaptation algorithms such as backpropagation, which suit real applications in adaptive and optimal control. We detail these two points below.

Most real processes are nonlinear, so the neural approach, with its nonlinear features, is better suited than the linear approach. Neural models are to be seen as a generalization of linear models: if the activation functions of the neural units are linear, we are back to linear identification models. In an adaptive scheme, the parameters of the identification models are the synaptic weights.
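The remark that linear activation functions bring us back to linear identification models can be checked numerically. The Python sketch below is purely illustrative. The weights are arbitrary hypothetical values, and the helper names (`matmul`, `matvec`, `two_layer`) are ours, not the paper's. With the identity activation, the two-layer network returns the same output as the single linear map `W2 @ W1`.

```python
# Illustrative sketch: a two-layer network with identity activation
# collapses to one linear map, i.e. a classical linear identification model.
# All weights are arbitrary hypothetical values.

def matmul(a, b):
    """Multiply matrix a (list of rows) by matrix b (list of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def two_layer(w1, w2, x, act=lambda s: s):
    """Hidden layer with activation `act`, followed by a linear output layer."""
    hidden = [act(s) for s in matvec(w1, x)]
    return matvec(w2, hidden)

W1 = [[0.5, -1.0], [2.0, 0.25], [1.0, 1.0]]   # 3 hidden units x 2 inputs
W2 = [[1.0, 0.0, -0.5], [0.5, 2.0, 1.0]]      # 2 outputs x 3 hidden units
x = [0.3, -0.7]

linear_net = two_layer(W1, W2, x)             # identity activation
collapsed = matvec(matmul(W2, W1), x)         # equivalent single linear map
print(linear_net, collapsed)
```

Replacing the identity activation by a nonlinearity such as `math.tanh` (after `import math`) breaks this equivalence. That extra freedom is what lets the neural model go beyond linear identification.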
On the other hand, it is known from [9] that a two-layer network with an arbitrarily large number of units in the hidden layer can approximate any continuous function $f \in C(\mathbb{R}^n, \mathbb{R}^p)$ to any desired accuracy over a compact subset of $\mathbb{R}^n$. In addition, the high degree of parallelism of neural computation makes it possible to handle complex applications using dedicated hardware.

The second point, concerning backpropagation, is more technical. It has been shown in [2, 6, 9] that backpropagation easily provides the Jacobian of the neural model, so we can use it as if it were the Jacobian of the process. The first idea is to help process operators, in direct or indirect mode, define the best actions with respect to a given goal, i.e. to answer the requests "What if?" and "What for?". "What if?" helps process operators estimate perturbations of the process before any decision on the control, and "What for?" proposes variations of the control to reach a desired goal on the process state. The other idea, which we present here, is to use this property to extend control assistance towards the definition of a true optimal control.

0-7803-2129-4/94 $3.00 © 1994 IEEE

Section 2 describes the classical direct method for finding an optimizing solution. This method gives necessary conditions; it is a well-known method which can be found in [10]. Section 3 presents the solution of these necessary conditions using neural techniques. This approach is based on indirect learning of the inverse model, with the process identified by a multilayer network. Section 4 gives our views on this approach, which seems very attractive for real applications. We conclude in Section 5.

2. Optimization Method

We consider nonlinear dynamic systems which can be described by (1):

$$X_{t+1} = F(X_t, U_t), \qquad X \in \mathbb{R}^p,\; U \in \mathbb{R}^q \qquad (1)$$

where $X_t$ is the state vector and $U_t$ is the control vector at discrete time $t$.
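To illustrate how backpropagation supplies Jacobian products for a neural model of the process (1), the sketch below assumes a one-hidden-layer tanh network $\Phi(X, U)$ with hypothetical, untrained weights (`W1`, `B1`, `W2`, `B2` are illustrative). Backpropagating an output-space vector $y$ returns $\Phi_X^{\top} y$ and $\Phi_U^{\top} y$ without ever forming the Jacobian explicitly. A central finite-difference check is included to confirm the result.

```python
import math

# Hypothetical, untrained weights of a model Phi: (X in R^2, U in R^1) -> R^2.
W1 = [[0.4, -0.3, 0.8], [0.1, 0.9, -0.5], [-0.7, 0.2, 0.3]]  # 3 hidden x 3 inputs
B1 = [0.05, -0.1, 0.2]
W2 = [[1.0, -0.4, 0.6], [0.3, 0.8, -1.2]]                    # 2 outputs x 3 hidden
B2 = [0.0, 0.1]

def forward(x, u):
    """Forward pass of Phi; also returns hidden activations for backprop."""
    z = list(x) + list(u)
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(W1, B1)]
    out = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, B2)]
    return out, h

def backward(x, u, y):
    """Backpropagate an output-space vector y; returns (Phi_X^T y, Phi_U^T y)."""
    _, h = forward(x, u)
    # Error at the hidden pre-activations: (W2^T y) * tanh', with tanh' = 1 - h^2.
    delta = [sum(W2[k][j] * y[k] for k in range(len(y))) * (1.0 - h[j] ** 2)
             for j in range(len(h))]
    # Error at the inputs: W1^T delta, then split into state and control parts.
    g = [sum(W1[j][i] * delta[j] for j in range(len(delta))) for i in range(len(W1[0]))]
    return g[:len(x)], g[len(x):]

def finite_difference(x, u, y, eps=1e-6):
    """Central differences of y . Phi(X, U) with respect to each input."""
    z = list(x) + list(u)
    grads = []
    for i in range(len(z)):
        zp, zm = z[:], z[:]
        zp[i] += eps
        zm[i] -= eps
        fp, _ = forward(zp[:len(x)], zp[len(x):])
        fm, _ = forward(zm[:len(x)], zm[len(x):])
        grads.append(sum(yk * (a - b) for yk, a, b in zip(y, fp, fm)) / (2 * eps))
    return grads[:len(x)], grads[len(x):]

x, u, y = [0.2, -0.5], [0.7], [1.0, -2.0]
gx, gu = backward(x, u, y)
fx, fu = finite_difference(x, u, y)
print(gx, fx)
print(gu, fu)
```

Because only the transposed Jacobian applied to a vector is needed in the adjoint equations, one backward pass per time step is enough. There is no need to build the full $p \times (p+q)$ Jacobian.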
From a given initial state $X_0$, the problem is to find a sequence of optimal controls $U_0, U_1, \ldots, U_T$ that minimizes a given cost function (2):

$$C(X_1, X_2, \ldots, U_0, U_1, \ldots) = \sum_{t=1}^{T+1} c_t(X_t, U_t) \qquad (2)$$

When no analytical solution is available, this N-stage optimal control problem can be solved numerically. This is done by computing the gradient of the cost with respect to the control sequence and updating each control vector (3):

$$\delta U_t = -\alpha \, \frac{\partial C}{\partial U_t}, \qquad t \in [0, T] \qquad (3)$$

To compute the sensitivities of the cost with respect to variations in control space, the classical direct method solves the associated recurrence equations backwards from the final condition (4):

$$Y_{t-1} = F_{X_t}^{\top} Y_t + c_{X_t}, \qquad Y_{T+1} = 0 \qquad (4)$$

The gradients $C_{U_t} = \partial C / \partial U_t$ are then given by (5):

$$C_{U_t} = F_{U_t}^{\top} Y_t + c_{U_t} \qquad (5)$$

Here $F_{X_t}$ and $F_{U_t}$ are the Jacobians of $F$ with respect to $X$ and $U$ at $(X_t, U_t)$, and $c_{X_t}$, $c_{U_t}$ are the partial derivatives of the local cost $c_t$. Details of the calculations are given in the Appendix. This scheme needs the process model $F$ (Equation 1) and its Jacobians $F_X$ (Equation 4) and $F_U$ (Equation 5). We handle this with neural identification techniques, which provide a neural model, and with backpropagation, which computes the gradients of the cost function with respect to the control inputs.

3. Neural Method

A feedforward network can be seen as a parameterized mapping from inputs to outputs. The vector of parameters (i.e. the synaptic weights) is calculated using backpropagation to minimize a cost function based on the discrepancy between target outputs and network outputs. We can therefore use this adaptive scheme for process identification. To build a model of the process, we propose for example the classical series-parallel method shown in Figure 3.1. Other identification methods can be used (e.g. feedforward or recurrent networks trained with stochastic or batch gradients) [11].

Here, the method enables us to identify a process described by equation (1), so we assume that the identification problem is solved by a feedforward neural network. The neural model that identifies the process is a good model for control as long as it gives a good enough mapping from inputs (state and control) to outputs (state). Moreover, this kind of learning can adapt to possible process drifts if it is kept on-line.

[Figure 3.1: Series-parallel identification]

The notation $z^{-1}$ denotes the unit time delay.

Backpropagation also gives the derivatives of the outputs with respect to the inputs. We therefore use backpropagation to solve equations (4) and (5), in which we substitute the Jacobian of the neural model $\Phi$ for the Jacobian of the real process model $F$ (6 and 7):

$$Y_{t-1} = \Phi_{X_t}^{\top} Y_t + c_{X_t} \qquad (6)$$

$$C_{U_t} = \Phi_{U_t}^{\top} Y_t + c_{U_t} \qquad (7)$$

These equations can be solved using backpropagation through the neural model. We now describe our method. The building blocks of the propagation and backpropagation steps are described in Figures 3.2 and 3.3 respectively; the thick arrows represent the result of each step.

[Figure 3.2: Propagation step]

The PROPAGATION step is the classical forward step for multilayer networks, to which we have added the calculation of the gradients of the local cost function $c_t(X_t, U_t)$. This step defines each state $X_{t+1}$ from the previous state $X_t$ and the control value $U_t$. During this step, the network $\Phi_t$ relates to the process state at discrete time $t$. So this step must be repeated for $t = 0$ to $T$ to get the estimated temporal behaviour of the process at discrete times $t = 1, 2, \ldots, T+1$. The partial derivatives of the local cost function with respect to state and control, i.e. $c_{X_t} = \partial c_t(X_t, U_t)/\partial X_t$ and $c_{U_t} = \partial c_t(X_t, U_t)/\partial U_t$, are also calculated at discrete times $t = 0, 1, \ldots, T$.

[Figure 3.3: Backpropagation step]

The BACKPROPAGATION step, shown in Figure 3.3, performs a classical backpropagation of the adjoint vectors $Y_t$ through the internal state of the neural models $\Phi_t$. This step provides the terms $\Phi_{X_t}^{\top} Y_t$ and $\Phi_{U_t}^{\top} Y_t$.
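The series-parallel identification of Figure 3.1 can be sketched in a few lines. As a simplifying assumption, the "process" is a hypothetical scalar linear system and the model is a single linear unit, for which backpropagation reduces to the LMS rule. The paper itself uses a multilayer network. The essential feature is kept: the model is always fed the measured state $x_t$, never its own previous prediction, and its error is measured against the real next state.

```python
import random

# Hypothetical "real" process, unknown to the learner: x_{t+1} = 0.9 x_t + 0.5 u_t
def process(x, u):
    return 0.9 * x + 0.5 * u

def identify_series_parallel(steps=5000, lr=0.05, seed=0):
    """Fit the model x_{t+1} ~ a * x_t + b * u_t with the LMS rule.

    LMS is backpropagation for a single linear unit. Series-parallel means
    the model input is the *measured* state x_t, not the model's own output.
    """
    rng = random.Random(seed)
    a, b = 0.0, 0.0                  # model parameters ("synaptic weights")
    x = 0.0                          # measured process state
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)   # exciting control signal
        x_next = process(x, u)       # measured next state
        err = x_next - (a * x + b * u)
        a += lr * err * x            # gradient step on 0.5 * err**2
        b += lr * err * u
        x = x_next                   # series-parallel: feed back the measurement
    return a, b

a_hat, b_hat = identify_series_parallel()
print(a_hat, b_hat)
```

Once trained, the model's Jacobians are simply `a_hat` (with respect to the state) and `b_hat` (with respect to the control). With a multilayer model they would instead be obtained by backpropagation, as in the BACKPROPAGATION step.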
To obtain each adjoint vector $Y_{t-1}$ and the sensitivity $C_{U_t}$ of the global cost with respect to the control vector $U_t$, one must add $c_{X_t}$ and $c_{U_t}$. These are the sensitivities of the local cost $c_t$ with respect to variations in $X_t$ and $U_t$ respectively. Each of these terms has been calculated during the PROPAGATION step.

With this interpretation, one can find a minimum of the global cost function with respect to the sequence of control vectors $U_t$ for $t = 0, 1, \ldots, T$. Figure 3.4 shows the general architecture that provides the sensitivities $C_{U_t}$, i.e. that solves the adjoint system. To simplify the figure, each block also includes the calculation of the corresponding local gradients $c_{U_t}$ and $c_{X_t}$.

In this figure, two data streams go through each elementary block $(\Phi_t, c_t)$. The first is the propagation of the pairs $(X_t, U_t)$, which initializes the internal state of each unit. The second is the backpropagation, seen as an echo originating at the terminal time $T+1$. No assumption is made about the length of the horizon $T$. This echo propagates the adjoint vectors $Y_t$ along the horizontal axis. Along the vertical axis it propagates the values $C_{U_t}$, which give the variations to be applied to the control vectors. The figure also shows the cost function distributed sequentially along the chain of blocks $\Phi_t$; this distribution matches the definition of the global cost function as the sum of local cost functions.

[Figure 3.4: Adjoint system resolution]

This method requires initial values of the control vectors $U_t$. To deal with this problem, we use the learning capabilities of neural networks. To define the initial conditions, the idea is to build a neural controller that estimates the optimal control from the initial state $X_0$ and each local cost function $c_t$.

An iterative way to perform the learning step of the neural controller is to channel the sensitivities $C_{U_t}$ to the neural controller. These sensitivities are treated as errors on the last layer of the neural controller. From these errors, backpropagation computes the modifications of the synaptic weights of the neural controller that reduce these errors. Little by little, after several iterations, the neural controller learns the optimal control with respect to the initial states $X_0$ and the desired states $X_t^d$ included in the local costs $c_t$.

Figure 3.5 shows the general principle of this method. In general, the number of inputs of the neural controller depends on the number of desired states and desired control vectors, i.e. on the depth of the temporal window of the cost function. Often the main objective is to control a path in the state space. The cost function then depends only on the desired states, and the neural controller inputs are $X_0, X_1^d, \ldots, X_{T+1}^d$. Similarly, the main goal is sometimes to reach a desired state at discrete time $T+1$. In that case the neural controller inputs are $X_0$ and $X_{T+1}^d$, and for the particular value $T = 0$ we recognize the neural architecture proposed in [1, 6, 7].

[Figure 3.5: Optimal control learning architecture]

4. Discussion

This approach can easily be generalized to processes described by nonlinear recurrence relations such as $X_{t+1} = F(X_t, X_{t-1}, \ldots, U_t, U_{t-1}, \ldots)$. This representation is better suited to processes in which delay lines link the state and control vectors. On the other hand, if the state vector is not accessible, estimation techniques such as Kalman filters or other neural techniques can be used.

The usual problems of gradient methods must be solved: the value of the step, the criterion for stopping the iterative procedure, and convergence to a local minimum. We must also deal with all the problems concerning the numerical stability of the solution of the adjoint system.
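The adjoint sweep of Figure 3.4 and the controller learning of Figure 3.5 can be sketched on the second-order system of Section 4 ($d^2x/dt^2 = u$, 10 ms sampling). Several pieces are our assumptions, not the paper's:

- forward Euler, as our reading of the paper's "Euler approximation";
- a quadratic terminal error plus a small control-effort term as the cost;
- a 50-step horizon and the training tasks;
- a linear controller standing in for the neural controller.

The sensitivities $C_{U_t}$ are computed backwards from $Y_{T+1} = 0$, as in (4) and (5), and then used as output-layer errors for the controller weights. The step `lr` plays the role of $\alpha$ in (3), so its choice is one of the gradient issues raised above.

```python
# Sketch of the adjoint sweep (Fig. 3.4) and controller learning (Fig. 3.5)
# for d^2x/dt^2 = u. Assumptions: forward Euler, the cost below, R, the
# horizon, the tasks, and a *linear* controller instead of a neural one.

DT = 0.01    # sampling period (10 ms)
R = 1e-4     # hypothetical weight of the control-effort term

def step(x, u):
    """Forward-Euler model F: x = (position, velocity)."""
    return (x[0] + DT * x[1], x[1] + DT * u)

def rollout(x0, us):
    """PROPAGATION: returns X_0, ..., X_{T+1} for controls U_0, ..., U_T."""
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs

def cost(x0, xd, us):
    """Hypothetical cost: effort at t = 0..T plus terminal error at T+1."""
    x_end = rollout(x0, us)[-1]
    terminal = 0.5 * ((x_end[0] - xd[0]) ** 2 + (x_end[1] - xd[1]) ** 2)
    return terminal + 0.5 * R * sum(u * u for u in us)

def adjoint_gradients(x0, xd, us):
    """BACKPROPAGATION: C_U = F_U^T Y_t + c_U, then Y_{t-1} = F_X^T Y_t + c_X."""
    x_end = rollout(x0, us)[-1]
    # Y_T = c_X at T+1, since Y_{T+1} = 0.
    y = (x_end[0] - xd[0], x_end[1] - xd[1])
    grads = [0.0] * len(us)
    for t in reversed(range(len(us))):
        grads[t] = DT * y[1] + R * us[t]   # F_U = (0, DT)^T
        y = (y[0], DT * y[0] + y[1])       # F_X^T = [[1, 0], [DT, 1]]; c_X = 0 here
    return grads

def train_controller(tasks, horizon=50, lr=100.0, iters=200):
    """Linear controller U_t = W[t] . z with z = (p0, v0, pd, vd).

    The sensitivities C_{U_t} act as errors on the controller outputs; the
    weight gradient is C_{U_t} * z, averaged over the training tasks.
    """
    W = [[0.0] * 4 for _ in range(horizon)]
    history = []
    for _ in range(iters):
        grad_w = [[0.0] * 4 for _ in range(horizon)]
        mean_cost = 0.0
        for x0, xd in tasks:
            z = (x0[0], x0[1], xd[0], xd[1])
            us = [sum(w * zj for w, zj in zip(row, z)) for row in W]
            mean_cost += cost(x0, xd, us) / len(tasks)
            for t, g in enumerate(adjoint_gradients(x0, xd, us)):
                for j in range(4):
                    grad_w[t][j] += g * z[j] / len(tasks)
        history.append(mean_cost)
        for t in range(horizon):
            for j in range(4):
                W[t][j] -= lr * grad_w[t][j]
    return W, history

# Hypothetical tasks: (initial state X_0, desired final state X^d_{T+1}).
TASKS = [((0.0, 0.0), (1.0, 0.0)),
         ((0.5, -1.0), (0.0, 0.0)),
         ((-1.0, 0.5), (0.5, 0.5))]
W, history = train_controller(TASKS)
print(history[0], history[-1])
```

Checking one component of `adjoint_gradients` against a central finite difference of `cost` is a cheap way to catch sign or transposition errors in the backward sweep before using it for learning.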
Nevertheless, we have applied our method to the optimal control problem for a second-order system described by $d^2x/dt^2 = u$. For the states of the process, we used the position and its rate of change, with an Euler approximation (sampling period of 10 ms). The gradient step $\alpha$ (Equation 3) was varied between 5 and 200 according to the variations of the cost function. After about 1000 iterations we found the optimal solution, i.e. the bang-bang control law.

5. Conclusions

Today it is known that supervised learning does not depend completely on a teacher [1]. To solve control problems, this kind of learning is used to build a model of the "world" and to rely on this model to give directives to a controller in order to reach a goal. Our work applies this approach to optimal process control, i.e. to the case where a trajectory in state space is desired. Our approach is a generalization of the neural architectures proposed by Jordan and Barto [1, 8]. Indeed, with only one neural model to estimate the state, i.e. with $T = 0$, we recognize their architectures. When the goal is specified over a long time ($T > 0$), our method is reminiscent of Widrow's work in [2]. The difference lies in the formalization of the optimal control using classical background methods. We hope to have shown that a sequence of feedforward networks fitted to the process can provide the optimal control. We have shown that backpropagation through this sequence of neural models solves the adjoint system of necessary conditions for an optimizing solution.

6. References

[1] M. I. Jordan, D. E. Rumelhart, "Forward Models: Supervised Learning with a Distal Teacher," Cognitive Science, 16, pp. 307-354.
[2] D. Nguyen, B. Widrow, "The Truck Backer-Upper," International Neural Network Conference, July 9-13, 1990, Paris, France.
[3] K.S. Narendra, K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. on Neural Networks, Vol. 1, No. 1, March 1990.
[4] D. Psaltis, A. Sideris, A. Yamamura, "Neural Controllers," IC, San Diego, 1987.
[5] M. Kawato, "Computational Scheme and Neural Network Models for Formation and Control of Multijoint Arm Trajectory," in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P.J. Werbos, Bradford Book, 1990.
[6] J.M. Martinez, Ch. Parey, M. Houkari, "La rétropropagation sous l'angle de la théorie du contrôle" [Backpropagation from the viewpoint of control theory], NEURO-NÎMES '91, 4-8 November 1991, Nîmes, France.
[7] A. G. Barto, "Connectionist Learning for Control," in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P.J. Werbos, Bradford Book, 1990.
[8] K.M. Hornik, M. Stinchcombe, H. White, "Multi-layer Feedforward Networks are Universal Approximators," UCSD Department of Economics Discussion Paper, June 1988.
[9] Y. LeCun, "A Theoretical Framework for Back-Propagation," Connectionist Models Summer School, Morgan Kaufmann Publishers.
[10] R. Boudarel, J. Delmas, P. Guichet, Commande Optimale des Processus, Techniques de l'Automatisme, Dunod, Paris, 1968.
[11] S.-Z. Qin, H.-T. Su, T.J. McAvoy, "Comparison of Four Neural Net Learning Methods for Dynamic System Identification," IEEE Trans. on Neural Networks, Vol. 3, No. 1, Jan. 1992.
[12] P.J. Werbos, "Backpropagation Through Time: What It Does and How to Do It," Proc. IEEE, vol. 78, no. 10, Oct. 1990, pp. 1550-1560.

Appendix

We deal with systems and cost functions defined by the following equations:

$$X_{t+1} = F(X_t, U_t), \qquad X \in \mathbb{R}^p,\; U \in \mathbb{R}^q$$

$$C(U_0, \ldots, U_T) = \sum_{t=1}^{T} c_t(X_t, U_t) + c_{T+1}(X_{T+1})$$

Considering the cost function $C$ as a function of the control vectors $U_t$, we have:

$$\frac{\partial C}{\partial U_t} = \frac{\partial c_t}{\partial U_t} + \sum_{k=t+1}^{T+1} \left(\frac{\partial X_k}{\partial U_t}\right)^{\top} \frac{\partial c_k}{\partial X_k}, \qquad \frac{\partial X_k}{\partial U_t} = \frac{\partial X_k}{\partial X_{t+1}} \, \frac{\partial F}{\partial U}(X_t, U_t)$$

Let us define the adjoint vector by:

$$Y_t = \sum_{k=t+1}^{T+1} \left(\frac{\partial X_k}{\partial X_{t+1}}\right)^{\top} \frac{\partial c_k}{\partial X_k}$$

The adjoint vectors $Y_t$ are linked by the following recurrence equations. These are obtained by isolating the term $k = t+1$ and using $\partial X_k / \partial X_{t+1} = (\partial X_k / \partial X_{t+2})(\partial X_{t+2} / \partial X_{t+1})$ for $k \ge t+2$:

$$Y_t = \frac{\partial c_{t+1}}{\partial X_{t+1}} + \left(\frac{\partial X_{t+2}}{\partial X_{t+1}}\right)^{\top} \sum_{k=t+2}^{T+1} \left(\frac{\partial X_k}{\partial X_{t+2}}\right)^{\top} \frac{\partial c_k}{\partial X_k} = c_{X_{t+1}} + F_{X_{t+1}}^{\top} Y_{t+1}$$

Using the following notations:

$$F_{X_t} = \frac{\partial F}{\partial X}(X_t, U_t), \qquad F_{U_t} = \frac{\partial F}{\partial U}(X_t, U_t), \qquad c_{X_t} = \frac{\partial c_t}{\partial X_t}, \qquad c_{U_t} = \frac{\partial c_t}{\partial U_t}$$

we obtain the sensitivities of the global cost $C$ with respect to the control vectors $U_t$:

$$Y_{t-1} = c_{X_t} + F_{X_t}^{\top} Y_t, \qquad Y_{T+1} = 0$$

$$C_{U_t} = c_{U_t} + F_{U_t}^{\top} Y_t$$
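For the smallest nontrivial horizon, $T = 1$, the recurrences can be unrolled by hand and compared with direct differentiation. This worked instance is ours, not the paper's. It uses a cost of the Appendix form, $C = c_1(X_1, U_1) + c_2(X_2)$ with $X_1 = F(X_0, U_0)$ and $X_2 = F(X_1, U_1)$. Since the sum starts at $t = 1$, there is no $c_{U_0}$ term. The backward sweep gives:

$$
\begin{aligned}
&Y_2 = 0, \qquad Y_1 = c_{X_2} + F_{X_2}^{\top} Y_2 = c_{X_2}, \qquad Y_0 = c_{X_1} + F_{X_1}^{\top} c_{X_2},\\
&C_{U_1} = c_{U_1} + F_{U_1}^{\top} c_{X_2}, \qquad C_{U_0} = F_{U_0}^{\top}\left(c_{X_1} + F_{X_1}^{\top} c_{X_2}\right).
\end{aligned}
$$

Direct differentiation agrees. $U_1$ acts on the cost directly through $c_1$ and on $X_2$ only, which gives $C_{U_1}$. $U_0$ acts only through $X_1$, which enters $c_1$ directly and $c_2$ through $X_2 = F(X_1, U_1)$, which gives $C_{U_0}$.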