
Steepest-Descent Failure Analysis

M. McDermott, Jr.,¹ and W. T. Fowler²

Communicated by A. Miele

Abstract. An extensive failure analysis of the steepest-descent optimization algorithm has been made. Each of the ways in which the algorithm can fail is discussed in terms of both the mathematical and numerical manifestations of a failure and the information which each type of failure provides about the formulation of the physical problem. Numerical tests for each of the various types of failure are described; several faulty problem formulations are presented, each of which illustrates a particular type of failure. A table is presented in which all failure modes are summarized and the corresponding numerical tests are exhibited.

Key Words. Gradient methods, ill-posed problems, computing methods, numerical methods, failure analysis.

1. Introduction

The steepest-descent algorithm is widely used to obtain numerical solutions to optimization problems. One of the main drawbacks of this algorithm is that the user is required to arbitrarily choose several problem-dependent parameters and time functions in order to implement the algorithm, and the convergence characteristics for each particular problem are highly dependent on these choices. It is often difficult for a user who is not intimately familiar with the algorithm to make appropriate choices of these parameters. In fact, experienced users of the algorithm are often accused of being practitioners of some type of black magic. The authors' experience with this algorithm at first tended to validate this notion. However, after considerable analysis, most of the seemingly mysterious

¹ Assistant Professor, Department of Industrial Engineering, Texas A&M University, College Station, Texas.
² Associate Professor, Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, Austin, Texas.

This journal is copyrighted by Plenum. Each article is available for $7.50 from Plenum Publishing Corporation, 227 West 17th Street, New York, N.Y. 10011.


JOTA: VOL. 23, NO. 2, OCTOBER 1977

phenomena which occurred could be readily related to problem formulation or numerical limitations. This paper is a result of this analysis. All of the various ways in which the steepest-descent algorithm can fail and the relation of these failures to the physical problem are considered. The authors feel that the analysis presented here will be very useful to all users of the steepest-descent algorithm in determining the exact causes of failures or erratic convergence characteristics of the algorithm. The insight provided by the analysis should also be helpful to users of the algorithm in choosing the parameters required to obtain a numerical solution to a particular problem.

2. Steepest-Descent Algorithm

The steepest-descent algorithm (Refs. 1-6) is an iterative algorithm for computing a time history for an m-vector of control variables $u(t)$ to minimize the scalar performance index³

$$\phi = \phi[x(t_f), t_f], \tag{1}$$

where the n-vector of state variables $x$ is governed by the differential equation

$$\dot{x} = f(x, u, t), \tag{2}$$

the initial states are fixed, and the terminal states must satisfy the $p + 1$ constraints

$$0 = \psi[x(t_f), t_f]. \tag{3}$$

The algorithm requires that the user choose one component of the vector function $\psi$ as a stopping function. This function is denoted by $\Omega$. On each iteration, the final time is determined as the time when the stopping condition

$$0 = \Omega[x(t_f), t_f] \tag{4}$$

is satisfied. In what follows, the remaining p-vector of terminal constraint functions will still be denoted by $\psi$.

To begin the computation, the user must first choose a nominal control history $u(t)$. The differential equations (2) are numerically integrated until the stopping condition (4) is satisfied. The trajectory which is generated is referred to as the nominal trajectory, and quantities evaluated along this trajectory are denoted by a superscript asterisk; i.e., $(\;)^*$.

³ This paper uses the notation of Denham and Bryson (Refs. 1-4).
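The forward integration to the stopping condition described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `integrate_to_stop`, the fixed-step Euler scheme, the step size, and the linear interpolation used to locate $t_f$ are all assumptions made for the example.

```python
def integrate_to_stop(f, omega, x0, u, t0=0.0, dt=1e-3, t_max=100.0):
    """Euler-integrate x' = f(x, u(t), t) until omega(x, t) reaches zero.

    Returns (t_f, x_f). The crossing inside the last step is located by
    linear interpolation of the stopping function (an assumed refinement).
    """
    t, x = t0, list(x0)
    w = omega(x, t)
    while t < t_max:
        dx = f(x, u(t), t)
        x_new = [xi + dt * di for xi, di in zip(x, dx)]
        t_new = t + dt
        w_new = omega(x_new, t_new)
        if w_new == 0.0:
            return t_new, x_new
        if w * w_new < 0.0:
            # The stopping function changed sign during this step.
            s = w / (w - w_new)
            x_f = [xi + s * (xn - xi) for xi, xn in zip(x, x_new)]
            return t + s * dt, x_f
        t, x, w = t_new, x_new, w_new
    raise RuntimeError("stopping condition was not reached before t_max")


# Hypothetical one-state problem: x' = u with u = 1, stop when x = 0.5.
t_f, x_f = integrate_to_stop(
    f=lambda x, u, t: [u],
    omega=lambda x, t: x[0] - 0.5,
    x0=[0.0],
    u=lambda t: 1.0,
)
```

The sign-change test only works if the stopping function actually crosses zero along the trajectory, which is the property examined in the discussion of Type (A) failures.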


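The algorithm makes use of properties of the system of differential equations adjoint to Eq. (2):

$$\dot{\lambda} = -A^T \lambda, \tag{5}$$

where $A = (\partial f/\partial x)^*$ and $\lambda$ is the vector of adjoint variables. Let $\Phi(t, t_f)$ denote the state transition matrix for this system, and let $\lambda_{\phi\Omega}$ and $\lambda_{\psi\Omega}$ denote the particular solutions of the system which satisfy, respectively, the boundary conditions

$$\lambda_{\phi\Omega}(t_f) = [\partial\phi/\partial x + (1/\dot{\Omega})\,\dot{\phi}\,\partial\Omega/\partial x]^*_{t=t_f},$$

$$\lambda_{\psi\Omega}(t_f) = [\partial\psi/\partial x + (1/\dot{\Omega})\,\dot{\psi}\,\partial\Omega/\partial x]^*_{t=t_f}, \tag{6}$$

where $\lambda_{\phi\Omega}$ is an n-vector and $\lambda_{\psi\Omega}$ is an $n \times p$ matrix. Then, at any time $t$,

$$\lambda_{\phi\Omega}(t) = \Phi(t, t_f)\,\lambda_{\phi\Omega}(t_f), \qquad \lambda_{\psi\Omega}(t) = \Phi(t, t_f)\,\lambda_{\psi\Omega}(t_f). \tag{7}$$

The optimal variation in the control is given in terms of the particular solutions of the adjoint system $\lambda_{\phi\Omega}$ and $\lambda_{\psi\Omega}$. Denham shows that the optimal variation in the control is

$$\delta u = -(1/2\mu)\, W_u^{-1} B^T [\lambda_{\phi\Omega} - \lambda_{\psi\Omega}\,\nu], \tag{8}$$

where $W_u = W_u(t)$ is a time-varying, positive-definite weighting matrix for control variations which must be chosen by the user of the algorithm, $B = [\partial f/\partial u]^*$, and

$$\nu = I_{\psi\psi}^{-1}[I_{\psi\phi} + 2\mu\,\Delta\psi], \tag{9}$$

$$\mu = \tfrac{1}{2}\left[(I_{\phi\phi} - I_{\psi\phi}^T I_{\psi\psi}^{-1} I_{\psi\phi})/(R^2 - \Delta\psi^T I_{\psi\psi}^{-1} \Delta\psi)\right]^{1/2}. \tag{10}$$

$R$ is the stepsize in control space which must be chosen by the user of the algorithm. To ensure that the control variation is small, so that the linearity assumptions are not violated, the control variation is required to satisfy the constraint

$$R^2 = \int_{t_0}^{t_f} \delta u^T(\tau)\, W_u(\tau)\, \delta u(\tau)\, d\tau. \tag{11}$$

$\Delta\psi$ is the requested change in the value of the terminal constraint functions. For computational convenience, usually $\Delta\psi = -\alpha\psi^*$, where $0 \le \alpha \le 1$.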

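Also,

$$I_{\phi\phi} = \int_{t_0}^{t_f} \lambda_{\phi\Omega}^T B W_u^{-1} B^T \lambda_{\phi\Omega}\, d\tau, \tag{12}$$

$$I_{\psi\phi} = \int_{t_0}^{t_f} \lambda_{\psi\Omega}^T B W_u^{-1} B^T \lambda_{\phi\Omega}\, d\tau, \tag{13}$$

$$I_{\psi\psi} = \int_{t_0}^{t_f} \lambda_{\psi\Omega}^T B W_u^{-1} B^T \lambda_{\psi\Omega}\, d\tau. \tag{14}$$

Note that, if a weighted controllability matrix $C_u$ is defined as

$$C_u = \int_{t_0}^{t_f} \Phi^T(\tau, t_f)\, B(\tau)\, W_u^{-1}(\tau)\, B^T(\tau)\, \Phi(\tau, t_f)\, d\tau, \tag{15}$$

then

$$I_{\phi\phi} = \lambda_{\phi\Omega}^T(t_f)\, C_u\, \lambda_{\phi\Omega}(t_f), \tag{16}$$

$$I_{\psi\phi} = \lambda_{\psi\Omega}^T(t_f)\, C_u\, \lambda_{\phi\Omega}(t_f), \tag{17}$$

$$I_{\psi\psi} = \lambda_{\psi\Omega}^T(t_f)\, C_u\, \lambda_{\psi\Omega}(t_f). \tag{18}$$

The weighted controllability matrix $C_u$ provides information about the linearized problem (linearized about the nominal trajectory) and will be discussed in the failure analysis which follows.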
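Once the three integrals are available, the multipliers of Eqs. (9) and (10) follow from a pair of linear solves. The sketch below is one possible arrangement under stated assumptions: the helper names `solve`, `dot`, and `step_multipliers`, the pivot tolerance, and the choice to raise an exception on a failed test are illustrative and are not taken from the paper.

```python
import math


def solve(M, b, tol=1e-12):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if abs(A[piv][k]) < tol:
            raise ZeroDivisionError("I_psipsi is numerically singular")
        A[k], A[piv] = A[piv], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][n] - tail) / A[i][i]
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def step_multipliers(I_phiphi, I_psiphi, I_psipsi, dpsi, R):
    """Return (N, D, mu, nu) from Eqs. (9)-(10); raise if N or D is not positive."""
    y = solve(I_psipsi, I_psiphi)   # I_psipsi^{-1} I_psiphi
    z = solve(I_psipsi, dpsi)       # I_psipsi^{-1} dpsi
    N = I_phiphi - dot(I_psiphi, y)
    D = R * R - dot(dpsi, z)
    if N <= 0.0 or D <= 0.0:
        raise ValueError("mu is not real and positive: N=%g, D=%g" % (N, D))
    mu = 0.5 * math.sqrt(N / D)
    nu = [yi + 2.0 * mu * zi for yi, zi in zip(y, z)]
    return N, D, mu, nu


# Hypothetical data for a single terminal constraint (p = 1).
N, D, mu, nu = step_multipliers(5.0, [2.0], [[4.0]], [1.0], 1.0)
```

The four quantities that can break this computation (the stopping-function derivative, N, D, and the inverse of the constraint integral matrix) are exactly the ones examined in the failure analysis.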

3. Types of Failures

The computations indicated in Eqs. (6) and (8)-(10) are not properly defined if any of the following conditions occur:

(A) $\dot{\Omega} = 0$,
(B) $N = I_{\phi\phi} - I_{\psi\phi}^T I_{\psi\psi}^{-1} I_{\psi\phi} \le 0$,
(C) $D = R^2 - \Delta\psi^T I_{\psi\psi}^{-1} \Delta\psi \le 0$,
(D) $I_{\psi\psi}$ = singular.

For the above failure types, the following comments are pertinent.

(A) Division by $\dot{\Omega}$ is indicated in Eq. (6).
(B) If $N$, the numerator in the quotient indicated in Eq. (10), is zero, it follows that $\mu = 0$, and the division by $\mu$ indicated in Eq. (8) is not defined. If $N < 0$, $\mu$ is imaginary, which is physically meaningless.


(C) Division by $D$, the denominator in the quotient indicated by Eq. (10), is not defined for $D = 0$. If $D < 0$, $\mu$ is imaginary, which is physically meaningless.
(D) The computations indicated by Eqs. (9) and (10) require that $I_{\psi\psi}^{-1}$ exist.

Failures caused by $\dot{\Omega} = 0$ and by $D = 0$ have been fully discussed in the literature and are easily avoided. Also, Denham discussed one possible cause of failure due to $N = 0$. These results, previously presented by other authors, are reviewed here for completeness. In addition, two more possible causes of failure due to $N = 0$ and two possible causes of failure due to $I_{\psi\psi}$ being singular are presented.

3.1. Type (A) Failure. Consider first the case in which the time derivative of the stopping function is zero at the final time,

$$\dot{\Omega}[x(t_f), t_f] = 0.$$

The final time is determined as the time at which the stopping function is zero; i.e.,

$$\Omega[x(t_f), t_f] = 0.$$

For the first-order analysis used in the steepest-descent algorithm, the simultaneous satisfaction of the conditions $\Omega = 0$ and $\dot{\Omega} = 0$ implies that the stopping function $\Omega[x(t), t]$ is zero over some finite time interval. Note that, for the nonlinear system, an extremum of $\Omega$ is possible. Thus, the increment in the final time $\Delta t_f$ cannot be uniquely determined. This type of failure is caused by a poor choice of the stopping function and can be avoided by choosing a stopping function which is monotonic along any physically realizable trajectory. Methods to construct such a stopping function are well documented in the literature, the most direct method being to make a change in independent variable, so that the final value of the new independent variable is fixed. The function

$$\Omega = t - t_f$$

is used as the stopping function and

$$\dot{\Omega} = 1, \qquad \partial\Omega/\partial x = 0.$$

This transformation was proposed by Long (Ref. 7) for quasilinearization algorithms, and its application to steepest-descent algorithms is described by Miele et al. (Ref. 8) in the sequential gradient-restoration algorithm (SGRA).
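A numerical check for this condition can be sketched as follows. The total derivative of the stopping function is the sum of its state gradient applied to the state derivative and its explicit time derivative, and it is compared with the size of its own terms. The function names, the relative tolerance `eps`, and the five-state sample data are assumptions made for illustration only.

```python
def omega_dot(dOmega_dx, f, dOmega_dt):
    """Total time derivative of the stopping function along the trajectory."""
    return sum(g * fi for g, fi in zip(dOmega_dx, f)) + dOmega_dt


def is_type_a(dOmega_dx, f, dOmega_dt, eps=1e-8):
    """True if omega_dot is numerically zero relative to the size of its terms."""
    wd = omega_dot(dOmega_dx, f, dOmega_dt)
    scale = sum(abs(g * fi) for g, fi in zip(dOmega_dx, f)) + abs(dOmega_dt)
    return abs(wd) <= eps * (1.0 + scale)


# Hypothetical state derivative at the final time; the fifth state is constant.
f_final = [0.1, 0.5, 0.2, -0.3, 0.0]

# Omega = t - t_f: no state dependence, explicit time derivative of 1.
time_stop_fails = is_type_a([0.0] * 5, f_final, 1.0)

# Omega = x5 - x5f with a constant fifth state: the derivative vanishes.
state_stop_fails = is_type_a([0.0, 0.0, 0.0, 0.0, 1.0], f_final, 0.0)
```

With the time-based stopping function the test passes for any state derivative, which is the practical appeal of the change of independent variable.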


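3.2. Type (B) Failure. Second, consider the case where

$$N \equiv I_{\phi\phi} - I_{\psi\phi}^T I_{\psi\psi}^{-1} I_{\psi\phi}$$

is zero.

Convergence. Denham has shown that, for a properly formulated problem, this expression is positive semidefinite and will be zero only on the optimal trajectory. Thus, for a properly formulated problem, $N = 0$ implies that convergence, rather than a gradient failure, has occurred.

Performance Index Constrained. The quantity $N$ can be zero if certain types of improper problem formulation occur. Consider the consequences of the vector $\lambda_{\phi\Omega}$ and the columns of $\lambda_{\psi\Omega}$ being linearly dependent, so that there exist constants $\alpha_i$, not all zero, such that

$$0 = \sum_{i=1}^{p} \lambda_{\psi\Omega,i}\,\alpha_i + \lambda_{\phi\Omega}\,\alpha_{p+1}, \tag{19}$$

where $\lambda_{\psi\Omega,i}$ denotes the ith column of $\lambda_{\psi\Omega}$. If $\alpha_{p+1} \ne 0$, this relation can be solved for $\lambda_{\phi\Omega}$ to yield

$$\lambda_{\phi\Omega} = \lambda_{\psi\Omega}\, a,$$

where

$$a = -(1/\alpha_{p+1})[\alpha_1, \alpha_2, \ldots, \alpha_p]^T.$$

The case in which

$$\alpha_{p+1} = 0$$

results in $I_{\psi\psi}$ being singular and will be discussed below. Equation (19) is a mathematical statement of the fact that the performance index is constrained in the physical problem and cannot be minimized while simultaneously satisfying the constraints. That is, the Jacobian matrix $\partial(\psi, \phi)/\partial(x, t)$ is not of full rank. It will now be shown that linear dependence of $\lambda_{\phi\Omega}$ and the columns of $\lambda_{\psi\Omega}$ results in $N = 0$. Using Eqs. (16)-(19) yields

$$N = a^T(\lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega})a - a^T(\lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega})\, I_{\psi\psi}^{-1}\, (\lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega})a.$$

Noting that

$$\lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega} = I_{\psi\psi},$$

the above equation reduces to

$$N = a^T I_{\psi\psi}\, a - a^T I_{\psi\psi}\, I_{\psi\psi}^{-1}\, I_{\psi\psi}\, a,$$

or

$$N = 0.$$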
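The cancellation can be reproduced numerically. The sketch below uses made-up data (a 3 x 3 positive-definite weighted controllability matrix, two constraint adjoint columns, and a coefficient vector `a`), builds the performance adjoint as a linear combination of the constraint columns, and evaluates N from Eqs. (16)-(18). All values and helper names are illustrative assumptions.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical positive-definite weighted controllability matrix C_u (n = 3).
C_u = [[2.0, 1.0, 0.0],
       [1.0, 3.0, 1.0],
       [0.0, 1.0, 4.0]]

# Columns of lambda_psiOmega(t_f) for p = 2 constraints.
lam_psi_cols = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]

# Performance adjoint chosen to be linearly dependent on those columns.
a = [0.5, -2.0]
lam_phi = [a[0] * c0 + a[1] * c1 for c0, c1 in zip(*lam_psi_cols)]

C_lam_phi = matvec(C_u, lam_phi)
I_phiphi = dot(lam_phi, C_lam_phi)
I_psiphi = [dot(col, C_lam_phi) for col in lam_psi_cols]
I_psipsi = [[dot(ci, matvec(C_u, cj)) for cj in lam_psi_cols]
            for ci in lam_psi_cols]

# Explicit 2 x 2 inverse of I_psipsi.
det = I_psipsi[0][0] * I_psipsi[1][1] - I_psipsi[0][1] * I_psipsi[1][0]
inv = [[I_psipsi[1][1] / det, -I_psipsi[0][1] / det],
       [-I_psipsi[1][0] / det, I_psipsi[0][0] / det]]

cross = dot(I_psiphi, matvec(inv, I_psiphi))
N = I_phiphi - cross
```

Both terms of the difference are far from zero here while their difference vanishes to rounding error, which is the numerical signature that separates a constrained performance index from an uncontrollable one.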


Thus, if the performance index is constrained, the numerical manifestation is that $N = 0$. Further, in this case, $N = 0$ is computed as the difference between two nonzero terms.

Performance Index Uncontrollable. Another improper problem formulation which will cause a failure due to $N = 0$ occurs when the vector $\lambda_{\phi\Omega}$ is in the null space of the weighted controllability matrix $C_u$. This is a mathematical statement of the fact that the performance index is uncontrollable in the physical problem. If $\lambda_{\phi\Omega}$ is in the null space of $C_u$, then by definition

$$I_{\phi\phi} = \lambda_{\phi\Omega}^T C_u \lambda_{\phi\Omega} = 0, \qquad I_{\psi\phi} = \lambda_{\psi\Omega}^T C_u \lambda_{\phi\Omega} = 0,$$

and it follows that $N = 0$. Thus, a gradient failure due to $N = 0$ will occur if the performance index is uncontrollable. In this case, $N$ is computed as the difference of two terms which both approach zero.

Numerical Errors. In numerical computations, $N$ is sometimes negative, which Denham has proved to be theoretically impossible. Negative values of $N$ result from numerical errors and indicate that the true value of $N$ is near zero (due to convergence, constrained performance, or uncontrollable performance), but roundoff or truncation errors or both have caused the computed value to be negative. When numerical computations are performed, $N$ should be tested; if it is less than some small number $\epsilon$, a warning should be given and the computation should be terminated or other appropriate action taken.

3.3. Type (C) Failure. In the above discussion, it was shown that the quantity $N$ must be positive. It follows that

$$D = R^2 - \Delta\psi^T I_{\psi\psi}^{-1} \Delta\psi$$

must also be positive if $\mu$ is to be properly defined and real. Negative values of $D$ mean that the requested changes in the terminal constraints $\Delta\psi$ are larger than can be achieved with the given stepsize in control space $R$; recall that

$$R^2 = \int_{t_0}^{t_f} \delta u^T W_u\, \delta u\, d\tau.$$


Any computer program which implements the steepest-descent algorithm must include logic to choose $R$ and $\Delta\psi$ in such a way that $D$ is always positive. This logic will preclude the possibility of failures due to $D \le 0$. In practice, since no a priori knowledge is available to choose $R$, one usually chooses $\Delta\psi$ (or $\delta u$) to achieve the greatest possible performance increment on each iteration and then, if desired, computes the corresponding value for $R$. Theoretical and computational aspects of various strategies for choosing the stepsize are presented by Miele et al. (Ref. 8).

3.4. Type (D) Failure. The final cause of failure is for the matrix $I_{\psi\psi}$ to be singular. Before discussing the specific causes of failure, certain properties of the matrix $I_{\psi\psi}$ will be investigated. By definition,

$$I_{\psi\psi} = \int_{t_0}^{t_f} \lambda_{\psi\Omega}^T(t_f)\, \Phi^T(\tau, t_f)\, B(\tau)\, W_u^{-1}(\tau)\, B^T(\tau)\, \Phi(\tau, t_f)\, \lambda_{\psi\Omega}(t_f)\, d\tau.$$

The following facts will be used in the analysis which follows. First note that the matrix $\lambda_{\psi\Omega}(t_f)$ is constant, so the first and last terms in the integrand can be taken outside of the integral. Since $W_u$ is positive definite, it follows that $W_u^{-1}$ is also positive definite; consequently, there exists a nonsingular matrix $H_u$ such that

$$W_u^{-1} = H_u^T H_u.$$

Also, since the integrand in the expression for $I_{\psi\psi}$ is a quadratic form, it is at least positive semidefinite and is positive definite iff it is nonsingular. By taking the constant matrix $\lambda_{\psi\Omega}$ outside of the integral, $I_{\psi\psi}$ can be written in the form

$$I_{\psi\psi} = \lambda_{\psi\Omega}^T \left[\int_{t_0}^{t_f} \Phi^T B W_u^{-1} B^T \Phi\, d\tau\right] \lambda_{\psi\Omega} = \lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega},$$

where the integral term $C_u$ is referred to as a weighted controllability matrix, due to the similarity between $C_u$ and the classic controllability matrix (Ref. 10)

$$C = \int_{t_0}^{t_f} \Phi^T B B^T \Phi\, d\tau$$

for the linear system

$$\delta\dot{x} = A\,\delta x + B\,\delta u.$$

The linear system is controllable on the interval $[t_0, t_f]$ iff the matrix $C$ is


positive definite. An equivalent criterion for controllability is that the matrix

$$Y(\tau) = B^T(\tau)\, \Phi(\tau, t_f)$$

has n independent columns on $[t_0, t_f]$. Define a matrix $Y_u(\tau)$ as

$$Y_u(\tau) = H_u(\tau)\, B^T(\tau)\, \Phi(\tau, t_f).$$

The relationships between the matrices $C_u$ and $Y_u$ of the steepest-descent algorithm and the matrices $C$ and $Y$ of the controllability problem will now be considered.

Constraints Dependent. Consider the consequences of the columns of the matrix $\lambda_{\psi\Omega}$ being linearly dependent. That is, there exist constants $\alpha_i$, not all zero, such that

$$0 = \lambda_{\psi\Omega}\,\alpha.$$

Recall that this is the case referred to in the discussion of failures due to $N = 0$. This is a mathematical statement of the fact that the terminal constraints (including the stopping condition) are not independent and, consequently, cannot be satisfied simultaneously. That is, the Jacobian matrix $\partial\psi/\partial(x, t)$ is not of full rank. If the rank of $\lambda_{\psi\Omega}$ is less than $p$, it follows that the $p \times p$ matrix

$$I_{\psi\psi} = \lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega}$$

must be of rank less than $p$; therefore, it is singular.

Constraints Uncontrollable. The matrix $I_{\psi\psi}$ will be singular if another improper problem formulation occurs. Assume that the matrix $C_u$ does not span the column space of the matrix $\lambda_{\psi\Omega}$. That is, there exists a nonzero vector $\lambda$ which is a linear combination of the columns of $\lambda_{\psi\Omega}$ (that is, $\lambda = \lambda_{\psi\Omega}\beta$) and $0 = C_u\lambda$. It follows that

$$\beta^T I_{\psi\psi}\, \beta = \beta^T \lambda_{\psi\Omega}^T C_u \lambda_{\psi\Omega}\, \beta = \lambda^T C_u \lambda = 0.$$

Since $\beta \ne 0$, it follows that the matrix $I_{\psi\psi}$ is not positive definite; hence, it is singular. This is a mathematical statement of the fact that the terminal constraints cannot be controlled in the physical problem. It is emphasized that the nonsingularity of $I_{\psi\psi}$ only requires that $C_u$ span the column space of $\lambda_{\psi\Omega}$; it does not require that the linearized problem be completely controllable. The problem may contain uncontrollable functions, as long as the constraint functions $\psi$ are controllable.
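The distinction between dependent and uncontrollable constraints can be illustrated with a small rank computation. In the made-up example below, the third state direction is uncontrollable and one of two independent constraint columns points along it, so the constraint columns have full rank while the constraint integral matrix does not. The elimination-based `rank` helper and its absolute tolerance are assumptions for the sketch; a production code would more likely use a singular-value decomposition.

```python
def rank(M, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        piv = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[piv][c]) <= tol:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, rows):
            m = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= m * A[r][j]
        r += 1
        if r == rows:
            break
    return r


n, p = 3, 2

# Hypothetical weighted controllability matrix: third direction uncontrollable.
C_u = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]

# lambda_psiOmega(t_f) as an n x p matrix with independent columns.
lam_psi = [[1.0, 0.0],
           [0.0, 0.0],
           [0.0, 1.0]]

# I_psipsi = lam_psi^T C_u lam_psi, written out with explicit sums.
I_psipsi = [[sum(lam_psi[i][a] * C_u[i][j] * lam_psi[j][b]
                 for i in range(n) for j in range(n))
             for b in range(p)]
            for a in range(p)]

rank_lam_psi = rank(lam_psi)     # columns are independent
rank_I_psipsi = rank(I_psipsi)   # but the integral matrix is rank deficient
```

Here the second constraint column lies in the null space of the weighted controllability matrix, so the quadratic form vanishes for the coefficient vector (0, 1) even though the constraints themselves are independent.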


4. Numerical Tests for Failure

A series of numerical tests to determine the existence of and to identify the failures discussed in the previous section are given in Table 1. The sequence in which the tests are to be performed is indicated in the notes at the bottom of the table. The only computation required to perform the tests, but not required by the steepest-descent algorithm, is the determination of the rank of the matrices $\lambda_{\psi\Omega}$ and $[\lambda_{\psi\Omega} \,\vdots\, \lambda_{\phi\Omega}]$. It is imperative that the numerical routine used to compute $I_{\psi\psi}^{-1}$ have a reliable test to determine if $I_{\psi\psi}$ is singular or ill-conditioned; otherwise, improperly formulated problems will not be detected.

Table 1. Failure tests and interpretation.

| Numerical failures | Causes of failure | Numerical tests |
|---|---|---|
| (A) $\dot{\Omega} = 0$ | (A1) Poor choice of stopping condition. | $\lvert\dot{\Omega}\rvert \ll$ [comparison quantity illegible in the source] |
| (B) $N = I_{\phi\phi} - I_{\psi\phi}^T I_{\psi\psi}^{-1} I_{\psi\phi} = 0$ | (B1) Convergence. | $\lVert\psi\rVert$ and relative change in $\phi$ small. |
| | (B2) Performance index is constrained. | Rank $[\lambda_{\psi\Omega} \,\vdots\, \lambda_{\phi\Omega}] \ne p + 1$. $N$ is computed as the difference of two nonzero nearly equal terms.* |
| | (B3) Performance index is uncontrollable. | $\lvert I_{\phi\phi}\rvert < \epsilon$ and $\lvert I_{\psi\phi}^T I_{\psi\psi}^{-1} I_{\psi\phi}\rvert < \epsilon$, where $\epsilon$ is a criterion for numerical zero. $N$ is computed as the difference of two terms which both approach zero.‡ |
| | (B4) One of the causes of error listed above plus numerical errors will cause $N < 0$. | |
| (C) $R^2 - \Delta\psi^T I_{\psi\psi}^{-1} \Delta\psi \le 0$ | (C1) Incompatible choice of $R$ and $\Delta\psi$. | Logic should be included in the computer program to choose $R$ and $\Delta\psi$ so that this error does not occur. |
| (D) $I_{\psi\psi}$ = singular | (D1) Terminal constraints (including the stopping condition) are not independent. | Rank $[\lambda_{\psi\Omega}] \ne p$.* |
| | (D2) Terminal constraints are uncontrollable. | Rank $[I_{\psi\psi}] \ne p$.† |

* Tests D1 and B2 must be performed sequentially.
† Failure on Test D2 indicates uncontrollability only after Test D1 is passed.
‡ Failure in Test B3 indicates uncontrollability only after Test B2 is passed.
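The test sequence can be arranged as a single diagnostic routine. The sketch below is one possible ordering consistent with the table notes (D1 before D2 and B2, B2 before B3); the function name, the argument list, the absolute tolerance `eps`, the simple absolute test used for A1, and the placement of the convergence check are assumptions, since the paper does not prescribe a program layout. The argument `cross` stands for the second term of N and is only meaningful once Test D2 has passed.

```python
def diagnose(omega_dot, rank_lam_psi, rank_I_psipsi, rank_lam_aug, p,
             I_phiphi, cross, D, converged=False, eps=1e-8):
    """Return a list of Table 1 cause codes; an empty list means no failure."""
    if abs(omega_dot) < eps:
        return ["A1"]
    if rank_lam_psi != p:
        return ["D1"]          # constraints are not independent
    if rank_I_psipsi != p:
        return ["D2"]          # constraints are uncontrollable
    N = I_phiphi - cross
    if N < eps:
        if converged:
            causes = ["B1"]
        elif rank_lam_aug != p + 1:
            causes = ["B2"]    # performance index is constrained
        elif abs(I_phiphi) < eps and abs(cross) < eps:
            causes = ["B3"]    # performance index is uncontrollable
        else:
            causes = ["B"]     # N is near zero but no cause was identified
        if N < 0.0:
            causes.append("B4")
        return causes
    if D <= 0.0:
        return ["C1"]
    return []


# Hypothetical inputs: a healthy iteration and one with nearly equal terms in N.
healthy = diagnose(1.0, 1, 1, 2, 1, 2.0, 0.5, 1.0)
constrained = diagnose(1.0, 1, 1, 1, 1, 2.0, 2.0 + 1e-12, 1.0)
```

Returning a list rather than a single code lets the routine report B4 together with the underlying cause, which mirrors the way the table describes negative values of N.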

5. Examples
For all of the examples of steepest-descent algorithm failure, several optimization problems based on a single physical system will be used. The failures shown will be rather obvious in order that the reader can easily see what is happening. However, in a complex optimization problem, the causes for algorithm failure are anything but obvious. Therefore, the computer code should always include checks for the various types of algorithm failure to alert the user and inform him of the probable cause of failure.

Consider the two-dimensional equations of motion for a constantly thrusting point mass in a constant gravity field. The mass of the point decreases at a uniform rate. The single control variable $u$ is the angle between the thrust vector and the coordinate $x_1$. The situation is as shown in Fig. 1. The equations of motion for the system are

$$\dot{x}_1 = x_2,$$
$$\dot{x}_2 = (T/x_5)\cos u,$$
$$\dot{x}_3 = x_4,$$
$$\dot{x}_4 = (T/x_5)\sin u - g,$$
$$\dot{x}_5 = -C,$$

where $T$ and $C$ are constants such that $T \ge 0$ and $C \ge 0$.

Fig. 1. Thrusting point mass in a constant gravity field.

5.1. Type (A) Failure. It is not difficult to construct an example of this type of failure for the given system; however, the example which one can construct is transparent. Let $C = 0$, so that

$$\dot{x}_5 = 0,$$

and let

$$\Omega = x_5 - x_{5f}.$$

Thus,

$$\dot{\Omega} = 0,$$

and a Type (A) failure occurs. The reader should note that, in more complex problems, it is easily possible to inadvertently induce a failure of this type, especially if $\Omega$ involves several state variables.

5.2. Type (B) Failure. For an example of this type of failure, at $t_0 = 0$ let

$$x_1 = x_2 = x_3 = x_4 = 0 \quad \text{and} \quad x_5 = x_{50}.$$

Let the stopping condition be $\Omega = t - 1$; and, at the terminal time $t_f = 1$, let $x_1$ through $x_4$ be free. Let $C > 0$ and $T > 0$. Now, let us maximize $\phi = x_5(t_f)$. In this case, $\lambda_{\phi\Omega}(t_f) = [0, 0, 0, 0, 1]^T$. Since all other variables are free at $t_f$, $\lambda_{\psi\Omega}$ does not exist. After carrying out the appropriate matrix operations, we arrive at the conclusion that

$$N = I_{\phi\phi} = 0.$$

In the case demonstrated, the performance index could not be controlled. This fact could be detected by Test B3 (see Table 1). With slight variations, an example in which the performance index is constrained can be constructed.
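The matrix operations of this example can be reproduced with a short backward integration of the adjoint equations. The sketch below assumes the mass equation has the form of a constant negative rate C (consistent with the statement that the mass decreases uniformly), unit control weighting, a constant nominal control, and arbitrary numerical values for the thrust, mass rate, and initial mass; gravity does not appear because it drops out of both partial-derivative matrices. For contrast, the same routine is also run with the terminal adjoint for a final-altitude performance index.

```python
import math

# Hypothetical constants: thrust, mass rate, initial mass, final time.
T, C, X50, TF = 1.0, 0.1, 1.0, 1.0


def u_nominal(t):
    """Hypothetical nominal control history (constant thrust angle)."""
    return 0.3


def gradient_integral(lam_tf, steps=1000):
    """Integrate the adjoint backward from t_f and accumulate (B^T lambda)^2."""
    dt = TF / steps
    lam = list(lam_tf)
    total = 0.0
    for k in range(steps, 0, -1):
        t = k * dt
        x5 = X50 - C * t                  # mass along the nominal trajectory
        u = u_nominal(t)
        # B^T lambda, with B = [0, -(T/x5) sin u, 0, (T/x5) cos u, 0]^T.
        b_lam = (T / x5) * (-math.sin(u) * lam[1] + math.cos(u) * lam[3])
        total += b_lam * b_lam * dt       # unit weighting W_u = 1 assumed
        # Adjoint equations lambda' = -A^T lambda for this system.
        lam_dot = [0.0,
                   -lam[0],
                   0.0,
                   -lam[2],
                   (T / x5 ** 2) * (lam[1] * math.cos(u) + lam[3] * math.sin(u))]
        lam = [l - dt * ld for l, ld in zip(lam, lam_dot)]   # Euler step backward
    return total, lam


# Performance index x5(t_f): terminal adjoint is the fifth unit vector.
I_mass, lam_mass = gradient_integral([0.0, 0.0, 0.0, 0.0, 1.0])

# Contrast: performance index x3(t_f), which the control can influence.
I_alt, lam_alt = gradient_integral([0.0, 0.0, 1.0, 0.0, 0.0])
```

The first integral stays at zero because the adjoint never leaves the fifth unit vector, while the second grows as soon as the backward integration builds up the fourth adjoint component.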


5.3. Type (C) Failure. This type of failure cannot be exhibited easily without using the actual numerical results from a computer program. The reader is referred to Miele et al. (Ref. 8), where logic to avoid this type of error is discussed in detail.

5.4. Type (D) Failure. To demonstrate this type of failure, let

$$x_1 = x_2 = x_3 = x_4 = 0 \quad \text{and} \quad x_5 = x_{50}$$

at $t = t_0$. Let the stopping condition be

$$\Omega = t - 1$$

and the performance index be $x_3(t_f)$. The terminal constraint is

$$\psi = x_5 - x_{5f}.$$

For this case,

$$\lambda_{\psi\Omega}(t_f) = [0, 0, 0, 0, 1]^T.$$

Also,

$$\lambda_{\psi\Omega}(t) = [0, 0, 0, 0, 1]^T$$

and

$$\dot{I}_{\psi\psi} \equiv 0.$$

Since $I_{\psi\psi}$ started out with a value 0, $I_{\psi\psi} \equiv 0$. Thus, $I_{\psi\psi}$ is singular as was required in this failure mode. In this case, the terminal constraint was uncontrollable, since Test D2 was failed after successful completion of Test D1.
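The constancy of the adjoint solution in this example can be checked directly from the linearized system. The worked step below is a sketch that assumes the mass equation has the form $\dot{x}_5 = -C$ (inferred from the statement that the mass decreases at a uniform rate) and uses the fact that, with $\Omega = t - 1$, the boundary condition of Eq. (6) reduces to the gradient of $\psi$.

```latex
A = \left(\frac{\partial f}{\partial x}\right)^{*} =
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -(T/x_5^2)\cos u \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & -(T/x_5^2)\sin u \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad
B = \left(\frac{\partial f}{\partial u}\right)^{*} =
\begin{bmatrix}
0 \\ -(T/x_5)\sin u \\ 0 \\ (T/x_5)\cos u \\ 0
\end{bmatrix}.

% Adjoint equations \dot{\lambda} = -A^T \lambda, written by component:
\dot{\lambda}_1 = 0, \quad
\dot{\lambda}_2 = -\lambda_1, \quad
\dot{\lambda}_3 = 0, \quad
\dot{\lambda}_4 = -\lambda_3, \quad
\dot{\lambda}_5 = (T/x_5^2)\,(\lambda_2 \cos u + \lambda_4 \sin u).

% With \lambda(t_f) = [0, 0, 0, 0, 1]^T the first four components remain zero,
% so \dot{\lambda}_5 = 0 and \lambda(t) = [0, 0, 0, 0, 1]^T for all t. Then
B^T \lambda_{\psi\Omega}(t) = -(T/x_5)\sin u \cdot 0 + (T/x_5)\cos u \cdot 0 = 0
\quad\Longrightarrow\quad
I_{\psi\psi} = \int_{t_0}^{t_f} (B^T \lambda_{\psi\Omega})^T\, W_u^{-1}\, (B^T \lambda_{\psi\Omega})\, d\tau = 0.
```

The fifth row of $B$ is zero because the control does not enter the mass equation, so no choice of nominal control or weighting matrix can make the mass constraint controllable.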

6. Conclusions

All of the possible ways in which the steepest-descent numerical optimization algorithm can fail have been enumerated and analyzed. Failures of Types (A) and (C) are well documented and can easily be avoided by careful implementation of the steepest-descent algorithm. Failures of Types (B) and (D) are attributed to improper problem formulation, not to the steepest-descent algorithm. However, the numerical tests described will detect these formulation errors, and the user is thus informed which type of error has occurred. Numerical results of these tests


are interpreted in terms of the formulation of the physical problem being considered. The authors have found that the implementation of these tests (as summarized in Table 1) in a steepest-descent computer program is highly desirable. The user is thus provided with critical information at the time when he most needs it. Despite the fact that the indicated tests are implemented, the user of the steepest-descent algorithm must constantly be aware that the algorithm is based on a local, linear approximation of a nonlinear system about some nominal trajectory. Therefore, information available from the linear approximation may indicate a condition which is not true for the nonlinear system. In almost every case, the failures discussed here can be caused by either inherent problems in the nonlinear system or by local conditions that result from a particular nominal trajectory. It is impossible to distinguish between these two cases numerically. Thus, statements that certain functions are constrained, uncontrollable, or linearly dependent mean that the local, linear approximations of the functions exhibit the indicated characteristic. As a practical matter, it has been the authors' experience that, when the algorithm fails, the failure is most often inherent in the problem formulation rather than the particular nominal trajectory.

References
1. DENHAM, W. F., Steepest-Ascent Solution of Optimal Programming Problems, The Raytheon Company, Report No. BR-2393, 1963.
2. BRYSON, A. E., JR., and DENHAM, W. F., A Steepest-Ascent Method for Solving Optimum Programming Problems, Journal of Applied Mechanics, Vol. 84, No. 2, 1962.
3. BRYSON, A. E., JR., DENHAM, W. F., and DREYFUS, S. E., Optimal Programming Problems with Inequality Constraints, I: Necessary Conditions for Extremal Solutions, AIAA Journal, Vol. 1, No. 11, 1963.
4. DENHAM, W. F., and BRYSON, A. E., JR., Optimal Programming Problems with Inequality Constraints, II: Solution by Steepest-Ascent, AIAA Journal, Vol. 2, No. 1, 1964.
5. KELLEY, H. J., Gradient Theory of Optimal Flight Paths, ARS Journal, Vol. 30, No. 10, 1960.
6. KELLEY, H. J., Method of Gradients, Optimization Techniques, Edited by G. Leitmann, Academic Press, New York, New York, 1962.
7. LONG, R. S., Newton-Raphson Operator; Problems with Undetermined End Points, AIAA Journal, Vol. 3, No. 5, 1969.
8. MIELE, A., PRITCHARD, R. E., and DAMOULAKIS, J. N., Sequential Gradient-Restoration Algorithm for Optimal Control Problems, Journal of Optimization Theory and Applications, Vol. 5, No. 4, 1970.
9. MIELE, A., Recent Advances in Gradient Algorithms for Optimal Control Problems (Survey Paper), Journal of Optimization Theory and Applications, Vol. 17, Nos. 5/6, 1975.
10. ZADEH, L. A., and DESOER, C. A., Linear System Theory, McGraw-Hill Book Company, New York, New York, 1963.
11. KALMAN, R. E., HO, Y. C., and NARENDRA, K. S., Controllability of Linear Dynamical Systems, Contributions to Differential Equations, Vol. 1, No. 2, 1962.