You are on page 1of 25

Chen CL

Analytical vs. Numerical ?

Numerical Methods for


Unconstrained Optimization

Analytical Methods: write necessary conditions and


solve them (analytical or numerical ?) for candidate
local minimum designs
Some Diculties:
Number of design variables in constraints can be large
Functions for the design problem can be highly nonlinear
In many applications, cost and/or constraint functions can be
implicit in terms of design variables

Cheng-Liang Chen

PSE

Numerical Methods: estimate an initial design and


improve it until optimality conditions are satised

LABORATORY

Department of Chemical Engineering


National TAIWAN University

Chen CL

General Concepts Related to


Numerical Algorithms
A General Algorithm
Current estimate:

x(k)

Subproblem 1:

(k)

feasible search direction

(positive scalar) step size

Subproblem 2:

k = 0, 1,

New estimate: x(k+1) = x(k) + k d(k)


= x(k) + x(k)

Chen CL

General Concepts Related to


Numerical Algorithms

Chen CL

Chen CL

General Concepts Related to


Numerical Algorithms

General Concepts Related to


Numerical Algorithms

Descent Step Idea


current
estimate

Example: check the descent condition


f (x) = x21 x1x2 + 2x22 2x1 + e(x1+x2)

new
estimate

  
  
f (x(k)) > f (x(k+1))
= f (x(k) + k d(k))
Taylor

(k)

(k)

f (x ) + f (x )



(k)

(k)

+ k d

(k)

(k) (k)
= f (x(k)) +
 k c d 
<0

f (x(k))d(k) = c(k)d(k) < 0 : descent condition


T

Angle between c(k) and d(k) must be between 90o and 270o
Chen CL

Unconstrained Optimization

Verify d1 = (1, 2), d2 = (1, 0) at (0, 0) are descent directions or not





1
2x1 x2 2 + e(x1+x2) 


=
c =

1
x1 + 4x2 + e(x1+x2) 
(0,0)


1
c d1 = 1 1 = 1 + 2 = 1 > 0 (not a descent dir.)
2


1
c d2 = 1 1 = 1 + 0 = 1 < 0 (a descent dir.)
0

Chen CL

One-Dimensional Minimization:
Reduction to A Function of Single Variable

Assume: a descent direction has been found


f (x)

f (x(k) + d(k))

f ()

<

T
f (x(k)) + f
(x(k))d(k) = f(|x(k))

=cd<0
f (0) = f (x(k)) (small move reducing f )

f (0)

c(k) d(k) < 0

Taylor

d should be a descent direction

Chen CL

Example: analytical step size determination

Analytical Method to Compute Step Size

f (x) = 3x21 + 2x1x2 + 2x22 + 7

d(k) is a descent direction > 0


df 2(k )
df (k )
= 0,

>0
d
d2
0=

Chen CL

df (x(k+1)) df (x(k+1)) dx(k+1)


(k+1)
=
= f T (x
) d(k)


d
dx
d
c(k+1)T
T

c(k+1) d(k) = c(k+1) d(k) = 0


Gradient of the cost function at NEW point, c(k+1),
is orthogonal to the current search direction, d(k)

d(k) = (1, 1) at x(k) = (1, 2)


6x1 + 2x2 
10

c(k) = f (x(k)) =
=

2x1 + 4x2  (k)
10
x


1
(k)
(k)
= 10 10 = 20 < 0
c d
1

1
1
1

x(k+1) = + =
2
1
2
f (x(k+1)) = 3(1 )2 + 2(1 )(2 ) + 2(2 )2 + 7
= 72 20 + 22 f ()

Chen CL

10

df
10
= 14k 20 = 0 k =
d
7
2
df
= 14 > 0
d2

1
1
3/7
=

x(k+1) = + ( 10
)
7
1
2
4/7

NC:

54
< 22 = f (x(k))
7
10
7
(k+1)

f (x
) =
10

7
1

10
= 0 (check)
f T (x(k+1))d(k) = 10
7
7
1
f (x(k+1)) =

Chen CL

11

Numerical Methods to Compute Step Size


Most one-dimensional search methods work for only
unimodal functions
(work for  = 0
= u,)
(u  interval of uncertainty)

Chen CL

12

Chen CL

Unimodal Function

Unimodal Function

Unimodal function: f (x) is one unimodal function if

Outcome of two experiments


x [0, 1], 0 < x1 < x2 < 1

x1 < x2 < x implies f (x1) > f (x2), and


x > x3 > x4 implies f (x3) < f (x4)

Chen CL

f1 < f2 x [0, x2]


f1 > f2 x [x1, 1]
f1 = f2 x [x1, x2]

14

Equal Interval Search


To reduce successively the interval of uncertainty, I,
to a small acceptable value
I = u ,

( = 0)

Evaluate the function at  = 0, , 2, 3, , u


If f ((q + 1)) < f (q)

then continue

If f ((q + 1)) > f (q)

then  = (q 1),

new pt

current pt

u = (q + 1)
[ , u ]

13

Chen CL

15

Chen CL

16

Chen CL

Equal Interval Search: Example


Equal Interval Search:
Example
f () = 2 4 + e
= 0.5
 = 0.001
Note:
f (x) = x(x1.5), x [0, 1] x [x7, x8] = [0.7, 0.8]
i

xi

.1

.2

.3

.4

.5

.6

.7

.8

.9

use 99 points

eliminate 98%
eliminate 1%
per function evaluation

f (xi) .14 .26 .36 .44 .50 .54 .56 .56 .54

Chen CL

18

Equal Interval Search: 3 Interior Points

x [a, b] three tests x1, x0, x2 three possibilities

No.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37


u


u


u


u

Trial step
0.000000
0.500000
1.000000
1.500000
2.000000
1.050000
1.100000
1.150000
1.200000
1.250000
1.300000
1.350000
1.400000
1.450000
1.355000
1.360000
1.365000
1.370000
1.375000
1.380000
1.385000
1.390000
1.380500
1.381000
1.381500
1.382000
1.382500
1.383000
1.383500
1.384000
1.384500
1.385000
1.385500
1.386000
1.386500
1.387000
1.386500

Function
3.000000
1.648721
0.718282
0.481689
1.389056
0.657651
0.604166
0.558193
0.520117
0.490343
0.469297
0.457426
0.455200
0.463115
0.456761
0.456193
0.455723
0.455351
0.455077
0.454902
0.454826
0.454850
0.454890
0.454879
0.454868
0.454859
0.454851
0.454844
0.454838
0.454833
0.454829
0.454826
0.454824
0.454823
0.454823
0.454824
0.454823

17

= 0.5

start
from
 = 1.0
= 0.05

start
from
 = 1.35
= 0.005

start
from
 = 1.38
= 0.0005

Chen CL

19

Equal Interval Search: 2 Interior Points

a =  + 13 (u ) = 13 (u + 2)
b =  + 23 (u ) = 13 (2u + )

Case 1: f (a) < f (b)  < < b

Case 2: f (a) > f (b) a < < u


I  = 23 I

reduced interval of uncertainty

2 points eliminate 16.7% per function evaluation ?!


try 3 points (2 are new) eliminate 25% per function evaluation !!

Why ?

old point is NOT used

Chen CL

20

Chen CL

21

Golden Section Search

Golden Section Search


Reduction of Interval of Uncertainty

Question of Equal Interval Search (n = 2):


known midpoint is NOT used in next iteration
Solution: Golden Section Search
Fibonacci Sequence:
F0 = 1;

F1 = 1;

Fn = Fn1 + Fn2, n = 2, 3,

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,


Fn
1.618,
Fn1

Given u, 
I
Select a, b s.t. u a
b 
Suppose f (b) > f (a)


b
 = ,


b 

Fn1
0.618 as n
Fn

Chen CL

22

=
=
=

=
=

u 
I, a
I, u
[b, u],

a, u


I , u

 = (1 )I
b = (1 )I
delete [b, u]



= b, I = u ,


b = (1 )I

Chen CL

23

Golden Section Search

Golden Section Search

Reduction of Interval of Uncertainty

Initial Bracketing of Minimum

I = I,

(1 )I = I = ( I)
2 + 1 = 0

= 1+2 5 = 0.618 =
1 = 0.382

1
1.618

Q : ne initial three points ?

q1

q = 0; 0 =

1 0

  
q = 1; 1 = + 1.618
 

= 2.618

1.618(0 0)

q = 2; 2

1
2 

2
= 2.618 + 1.618
  

= 5.236

2
3 

3
= 5.236 + 1.618

  

= 9.472

1.618(1 0 )

q1 q2

u a
q q1
0.618I
= 1.618

=
=
a 
q1 q2
0.382I
ratio of increased trial step size is 1.618

q = 0, 1, 2,

j=0

q1
q2


a  = 0.382I,
u 
a = 0.618I = (1.618) 0.382I
 


Starting at = 0,
q

evaluate q = (1.618)j = q1 + (1.618)q ,

q = 3; 3
..

..

1.618(2 1 )

Chen CL

24

25

Golden Section Search

Golden Section Search

Initial Bracketing of Minimum

Algorithm
Step 1: choose q,  = q2, u = q , I

f (q2) > f (q1) and


f (q1) < f (q )

If

Chen CL

Step 2: a =  + 0.382I, b =  + 0.618I, f (a), f (b)

Then q2 < < q


u = q

q


Step 3: compare f (a), f (b), go to Step 4, 5, or 6

(1.618)j

Step 4: if f (a) < f (b)  < < b


 = , u = b, b = a, a =  + 0.382(u  ), go to Step 7

(1.618)j

Step 5: if f (a) > f (b) a < < u


 = a, u = u, a = b, b =  + 0.618(u  ), go to Step 7

j=0

q2


 = q2 =

j=0

I = u

q
q1
 = (1.618)

  + (1.618)



q q1

= 2.618(1.618)

q1

Step 6: if f (a) = f (b) a < < b


 = a, u = b, return to Step 2

q1 q2

Step 7: if I  = u  < 
=

Chen CL

26

Golden Section Search: Example

u +
2

and Stop; otherwise return to Step 3

Chen CL

27

Golden Section Search: Example


f () = 2 4 + e,

f () = 2 4 + e

= 0.5,

 = 0.001

= 0.5
 = 0.001

Reducing Interval of Uncertainty


No.

l ; [f (l )]

a ; [f (a )]

b ; [f (b )]

u ; [f (u )]

0.500 000

1.309 017

1.809 017

2.618 034

2.118 034

[1.648 721]

[0.466 464]

[0.868 376]

[5.236 610]

0.500 000

1.000 000

1.309 017

1.809 017

[1.648 721]

[0.718 282]

[0.466 464]

[0.868 376]

Table 5.2 Golden Section Search for


f () = 2 4 + e

3
4

Initial Bracketing of Minimum


Trial step
1
2
3
4

0.000
1 0.500
q 1.309
u 2.618

000
000
071
034

Function value
3.000
1.648
0.466
5.236

000
721
464
610

5
6
7
8

1.000 000

1.309 017

1.500 000

1.809 017

[0.718 282]

[0.466 464]

[0.481 689]

[0.868 376]

1.000 000

1.190 983

1.309 017

1.500 000

[0.718 282]

[0.526 382]

[0.466 464]

[0.481 689]

1.190 983

1.309 017

1.381 966

1.500 000

[0.526 382]

[0.466 464]

[0.454 860]

[0.481 689]

1.309 017

1.381 966

1.427 051

1.500 000

[0.466 464]

[0.454 860]

[0.458 190]

[0.481 689]

1.309 017

1.354 102

1.381 966

1.427 051

[0.466 464]

[0.456 873]

[0.454 860]

[0.458 190]

1.354 102

1.381 966

1.399 187

1.427 051

[0.456 873]

[0.454 860]

[0.455 156]

[0.458 190]

1.309 017
0.809 017
0.500 000
0.309 017
0.190 983
0.118 304
0.072 949

Chen CL

28

Chen CL

29

Golden Section Search: Example


f () = 2 4 + e,

= 0.5,

Golden Section Search: Example

 = 0.001

f (x) = x(x 1.5);

Reducing Interval of Uncertainty (cont)

Initial Bracketing of Minimum

No.

l ; [f (l )]

a ; [f (a )]

b ; [f (b )]

u ; [f (u )]

1.354 102

1.371 323

1.381 966

1.399 187

0.045 085

[0.456 873]

[0.455 269]

[0.454 860]

[0.455 156]

1.371 323

1.381 966

1.388 544

1.399 187

[0.455 269]

[0.454 860]

[0.454 833]

[0.455 156]

10
11
12
13
14
15

1.381 966

1.388 544

1.392 609

1.399 187

[0.454 860]

[0.454 833]

[0.454 902

[0.455 156]

1.381 966

1.386 031

1.388 544

1.392 609

[0.454 860]

[0.454 823]

[0.454 833]

[0.454 902]

1.381 966

1.384 479

1.386 031

1.388 544

[0.454860]

[0.454 829]

[0.454 823]

[0.454 833]

1.384 479

1.386 031

1.386 991

1.388 544

[0.454 829]

[0.454 823]

[0.454 824]

[0.454 833]

1.384 471

1.385 438

1.386 031

1.386 991

[0.454 829]

[0.454 824]

[0.454 823]

[0.454 823]

0.027 864
0.017 221
0.010 643

2
3
4
5
6
7
8

Xa

0
0.25
0.6545
1.308981

0
0.3125
0.55338
0.25004

Xl
fmin
Xu

31

Golden Section Search: Example

= 0.25, = 0.001
Xb

1
2
3
4

Chen CL

f (x) = x(x 1.5);

Reducing Interval of Uncertainty

Xl

Fcn value

0.002 512

Golden Section Search: Example

Trial xs

0.004 065

30

No.

No.

0.006 570

Chen CL

f (x) = x(x 1.5);

= 0.25, = 0.001

= 0.25, = 0.001

Reducing Interval of Uncertainty (cont)

Xu

0.250000000

0.654530742

0.904450258

1.308981000

0.312500000

0.553385621

0.538645118

0.250040242

0.250000000

0.499999999

0.654450259

0.904450258

0.312500000

0.499999999

0.553370247

0.538645118

0.499999999

0.654499998

0.749950259

0.904450258

0.499999999

0.553379750

0.562499998

0.538645118

0.654499998

0.749980997

0.808969259

0.904450258

0.553379750

0.562500000

0.559022627

0.538645118

0.654499998

0.713507255

0.749962001

0.808969259

0.553379750

0.561168280

0.562499999

0.559022627

0.713507255

0.749973741

0.772502773

0.808969259

0.561168280

0.562499999

0.561993625

0.559022627

0.713507255

0.736043543

0.749966485

0.772502773

0.561168280

0.562305217

0.562499999

0.561993625

0.736043543

0.749970969

0.758575347

0.772502773

0.562305217

0.562499999

0.562426463

0.561993625

I
1.058981
0.654450258
0.404450259
0.24995026
0.154469261
0.095462003
0.058995518
0.03645923

No.

9
10
11
12
13
14
15
16

Xl

Xa

Xb

Xu

0.736043543

0.744650692

0.749968198

0.758575347

0.562305217

0.562471385

0.562499999

0.562426463

0.744650692

0.749969911

0.753256129

0.758575347

0.562471385

0.562499999

0.562489398

0.562426463

0.744650692

0.747937969

0.749968852

0.753256129

0.562471385

0.562495748

0.562499999

0.562489398

0.747937969

0.749969506

0.751224592

0.753256129

0.562495748

0.562499999

0.562498500

0.562489398

0.747937969

0.749193459

0.749969102

0.751224592

0.562495748

0.562499349

0.562499999

0.562498500

0.749193459

0.749969352

0.750448699

0.751224592

0.562499349

0.562499999

0.562499799

0.562498500

0.749193459

0.749672961

0.749969198

0.750448699

0.562499349

0.562499893

0.562499999

0.562499799

0.749672961

0.749969293

0.750152367

0.750448699

0.562499893

0.562499999

0.562499977

0.562499799

I
0.022531804
0.013924655
0.008605437
0.00531816
0.003286623
0.002031133
0.00125524
0.000775738

Chen CL

32

Quadratic Curve Fitting


(approximated quadratic function)

f () = q() = a0 + a1 + a22


f (i) = q(i) = a0 + a1i + a2i2
f (u) = q(u) = a0 + a1u + a2u2


1
f (u) f () f (i) f ()
a2 =

u i
u 
i 
f (i) f ()
a1 =
a2(i + )
i 
a0 = f () a1 a22

1: locate initial interval of uncertainty (, u)


2: select  < i < u f (i)
3: compute a0, a1, a2,
, f (
)
4:

< i

f (i) < f (
)

f (i) > f (
)

[
, u]

[, i]

, i,

i,
, u

[,
]

[i, u]


, i, u

,
, i

Step 5: Stop if two successive estimates of minimum point of


f () are suciently close. Otherwise delete primes on  , i , u
and return to Step 2

34

Example:
f () = 2 4 + e

Step
Step
Step
Step

i <


dq() 
= a1 + 2a2
= 0
d 
a1
d2 q

=
if d
2 = 2a2 > 0
2a2

Chen CL

33

Computational Algorithm:

Polynomial Interpolation
q() = a0 + a1 + a22

Chen CL

= 0.5

Chen CL

35

Multi-Dimensional Minimization:
Powells Conjugate Directions Method

 = 0.5 i = 1.309017 u = 2.618034


f () = 1.648721 f (i) = 0.466464 f (u) = 5.236610
 3.5879 1.1823 
1
a2 = 1.30902
2.1180 0.80902 = 2.410
a1 =

1.1823
0.80902

(2.41)(1.80902) = 5.821

a0 = 1.648271 (5.821)(0.50) 2.41(0.25) = 3.957

= 1.2077 < i

f (
) = 0.5149 > f (i)

 =
= 1.2077 u = u = 2.618034, i = i = 1.309017
 = 1.2077 i = 1.309017 u = 2.618034
f () = 0.5149 f (i) = 0.466464 f (u) = 5.236610
a2 = 5.3807 a1 = 7.30547 a0 = 2.713

= 1.3464 f (
) = 0.4579

Conjugate Directions
Let A be an n n symmetric matrix.
A set of n vectors (directions) {S i} is said to be
A-conjugate if
S Ti AS j = 0 for i, j = 1, , n;

i = j

Note: orthogonal directions are a special case of


conjugate directions (A = I)

Chen CL

36

Multi-Dimensional Minimization:
Powells Conjugate Directions Method

37

Multi-Dimensional Minimization:
Powells Conjugate Directions Method

Quadratically Convergent Method


If a minimization method, using exact arithmetic, can
nd the minimum point in n steps while minimizing a
quadratic function in n variables, the method is called
a quadratically convergent method
Theorem: Given a quadratic function of n variables
and two parallel hyperplanes 1 and 2 of dimensions
k < n. Let the constrained stationary points of the
quadratic function in the hyperplanes be X 1 and X 2,
respectively. Then the line joining X 1 and X 2 is
conjugate to any line parallel to the hyperplanes.
Chen CL

Chen CL

38

Multi-Dimensional Minimization:
Powells Conjugate Directions Method
Meaning: If X 1 and X 2 are the minima of Q obtained
by searching along the direction S from two dierent
starting points X a and X b, respectively,
the line (X 1 X 2) will be conjugate to S

Proof:
1 T
X AX + B T X + C
2
Q(X) = AX + B
(n 1)
Q(X) =

search from a along S X 1

(stationary pt)

search from b along S X 2


S orthogonal to Q(X 1) and Q(X 2)
S T Q(X 1) = S T AX 1 + S T B = 0
S T Q(X 2) = S T AX 2 + S T B = 0
S T [Q(X 1) Q(X 2)] = S T A(X 1 X 2) = 0

Chen CL

39

Multi-Dimensional Minimization:
Powells Conjugate Directions Method
Proof:

Theorem:

If a quadratic function
1
Q(X) = X T AX + B T X + C
2

is minimized sequentially,
once along each direction
of a set of n mutually
conjugate directions,

Q(X ) = B + AX = 0
n

j S j
Let X = X 1 +
j=1

Sj

conjugate directions
to A
n

j S j
0 = B + AX 1 + A
j=1

0 =

the

minimum of the function Q

the starting point

S Ti (B+

AX 1)+
n

S Ti A
j S j
j=1

will be found at or before


the nth step irrespective of

= (B + AX 1)T S i + iS Ti AS i
(B + AX 1)T S i
=
S Ti AS i

Chen CL

40

Multi-Dimensional Minimization:
Powells Conjugate Directions Method
Note:

X i+1
i
0
Q(X i+1)
0
i
Xi

Chen CL

Powells Conjugate Directions: Example


f (x1, x2) = 6x21 + 2x22
 6x1x2 x1 2x2 
 

x
12 6 x
1
1
1
= 1 2
+ x1 x2
2
x
x
6
4
2
2
 
 
1
0
if S 1 =
X1 =
2

  0

12 6 s
1
S T1 AS 2 = 1 2
6
4
s
2
 
 

s
1
1
= 0
S2 =
= 0 2
s2
0



1
1 2
2
5


1 =
=


4
12
6
1



1 2
6 4  2  


0
5/4
5 1

+
X 2 = X 1 + 1 S 1 =
=
4 2
0
5/2

i S i,

=
is
=
=
=
=

Xi +
i = 1, , n
found by minimizing Q(iS i) so that
S Ti Q(X i+1)
B + AX i+1 = B + A(X i + i S i)
S Ti Q(X i+1) = S Ti {B + A(X i + i S i)}
(B + AX i)T S i + i S Ti S i
(B + AX i)T S i
=
S Ti AS i
i1

= X1 +
j S j
j=1

X iT AS i = X 1T AS i +

i1


j S j T AS i = X 1T AS i

j=1

Si
S Ti AS i
Si
= (B + AX 1)T T
S i AS i

i = (B + AX i)T

= i

Chen CL

42

Powells Conjugate Directions: Example




1 2

0
1

=
12
12
6
1


1 0
6 4 0
 
5/4
1
1
= X 2 + 2 S 2 =
+
12 0
5/2

2 =

X3


=

4/3
5/2

= X (?)

41

Chen CL

43

Powells Algorithm

Chen CL

44

Progress of Powells Method

un ; u1 ,

u2,

, un1,

S (1); u2,

, un1,

S (2); u3,

un,

S (n1); un, S (1),

S (2),

Chen CL

45

Powells Conjugate Directions: Example

un;

un,

S (1);

S (1),

S (2);

, S (n1)

Min: f (x1, x2) = x1 x2 + 2x21 + 2x1x2 + x22


X 1 = [0 0]T

(un, S (1)), (S (1), S (2)),


un, S (1), S (2),
are A-conjugate

Chen CL

46

Cycle 1: Univariate Search

= 0 =

1
2

X 2 = X 1 + u2 =


0.5
2

along u1: f (X 2 u1) = f (, 0.5) = 2 2 0.25


df
d

= 0 =

1
2

X 3 = X 2 u1 =

0.5

0.5

along u2: f (X 3 + u2) = f (0.5, 0.5 + ) = 2 0.75


df
d

= 0 =

47

Cycle 2: Pattern Search

along u2: f (X 1 + u2) = f (0, ) = 2


df
d

Chen CL

1
2

X 4 = X 1 + u2 =

0.5

S (1) = X 4 X 2

0.5
0
0.5
=

=
1
0.5
0.5

f (X 4 + S (1)) = f (0.5 0.5, 1 + 0.5)


= 0.252 0.5 1
df
d

= 0 = 1.0

X 5 = X 4 + S (1) =

1.0

1.5

Chen CL

48

Chen CL

Simplex Method

Chen CL

Simplex Method

50

Simplex Method

49

Chen CL

51

Simplex Method

Chen CL

52

Chen CL

Properties of Gradient Vector

53

Properties of Gradient Vector


Property 1: The gradient vector c of a function
f (x1, , xn) at point x = (x1, , xn) is orthogonal
(normal) to the tangent plane for the surface
f (x1, , xn) = constant.

f (x) = f (x1, x2, . . . , xn)

f (x)
x1
.
f (x) =
. = c
f (x)
xn

c(k) = c(x(k)) = f (x(k)) =

f (x(k))
xi

C is any curve on the surface through x


T is a vector tangent to curve C at x c T = 0

Chen CL

54

Chen CL

Properties of Gradient Vector


Proof:
s : any parameter along C

x1
s

T =

xn
s

x=x
(a unit tangent vector along C at x)
df
= 0
f (x) = constant
ds
df
f x1
f xn
0 =
=
+ +
ds
x1 s
xn s
T
= c T = cT

55

Properties of Gradient Vector


Property 2: Gradient represents a direction of
maximum rate of increase for f (x) at x
Proof:
u

a unit vector in any direction not tangent to C

t : a parameter along u
df
f (x + u) f (x)
= lim
0
dt

 
f
f
2
f (x + u) = f (x) +  u1 x
+

+
u
n
xn + O( )
1
n

1
f
ui
+ O(2)
( )
f (x + u) f (x) = 
xi

i=1
n

df
f (x + u)
f
ui
= c u = cT u
= lim
=
0
dt

x
i
i=1
= ||c|| ||u|| cos

(max rate of increase when = 0)

Chen CL

56

Chen CL

Properties of Gradient Vector

57

Verify Properties of Gradient Vector

Property 3: The maximum rate of change of f (x) at


any point x is the magnitude of the gradient vector
 
 
(max  df
dt  = ||c||)
u is in the direction of gradient vector for = 0

f (x) = 25x21 + x22,


f (x(0)) = 25

x(0) = (0.6, 4)


c = f (0.6, 4) =

C T =0

Slope of tangent: m1 =
Slope of gradient: m2 =

dx2
dx1

c1
c2

5x1 2
1x1
50x1
2x2

1
= 3.75

30
8

= 3.75

Property 2: choose arbitrary direction


D = (0.501034, 0.865430), = 0.1


x(1)
C
x(1)
D
f (x(1) )
C
f (x(1) )
D




0.6
0.966235
0.6966235
(0)
= x + C =
+ 0.1
=
4.0
0.257663
4.0257663
0.6
0.501034
0.6501034
= x(0) + D =
+ 0.1
=
4.0
0.865430
4.0854300
= 28.3389
= 27.2566

Property 3:

<

f (x(1) )
C

C C = 1.00 > C D = 0.7071059

50x1
2x2


=

30
8

15
t
=
=
(4)2 +152
||t||

.257663
0.966235

Chen CL

Verify Properties of Gradient Vector


Property 1:

58

30



8
0.966235
c
C =
= 2 2 =
30 +8
||c||
0.257663

 
(25x21 +x22 =25)
1
= 4
t = (25x2s
2
+x
=25)
1
2
15
s
2

Chen CL

f
x1
f
x2

59

Steepest Descent Algorithm


Steepest Descent Direction
Let f (x) be a dierentiable function w.r.t. x. The
direction of steepest descent for f (x) at any point
is d = c
Steepest Descent Algorithm:

Step
Step
Step
Step
Step

1:
2:
3:
4:
5:

a starting design x(0), k = 0, 


c(k) = f (x(k)); stop if ||c(k)|| < 
d(k) = c(k)
calculate k to minimize f (x(k) + d(k))
x(k+1) = x(k) + k d(k), k = k + 1 Step 2

Chen CL

60

Chen CL

61

Steepest Descent Algorithm

Notes:

Steepest Descent: Example


f (x1, x2) = x21 + x22 2x1x2

d = c c d = ||c||2 < 0
The successive directions of steepest
descent are normal to each other

Step 1: x(0) = (1, 0), k = 0, ()


Step 2: c(0) = f (x(0)) = (2x1 2x
2, 2x2 2x1)
= (2, 2); ||c(0)|| = 2 2 = 0

d = c
d(k) d(k+1) = c(k) c(k+1) = 0

Step 3: d(0) = c(0) = (2, 2)

proof:

Step 4: to minimize f (x(0) + d(0)) = f (1 2, 2)

df (x(k+1) )
 d (k+1)

0 =

f (1 2, 2) = (1 2)2 + (2)2 2(1 2)(2)


= 162 8 = f ()
df ()
= 32 8 = 0 0 = 0.25
d

T
)
x(k+1)
f (x
=
x



   (k)
T
(k) +d
(k+1)
(
x
)
c

d2 f ()
d2

(k+1) T

= c
d(k) = c(k+1) c(k)
(k+1)
= d
d(k)

Chen CL

Chen CL

63

Steepest Descent: Example


f (x1, x2, x3) = x21 + 2x22 + 2x23 + 2x1x2 + 2x2x3
x(0) = (2, 4, 10)
x = (0, 0, 0)
Step 1: k = 0,  = 0.005, ( = 0.05, = 0.0001 for Golden)
4x2 + 2x1 + 2x3, 4x3 + 2x2)
Step 2: c(0) = f (x(0)) = (2x1 + 2x2,
= (12, 40, 48); ||c(0)|| = 4048 = 63.6 > 
(0)

Step 3: d

= c

(0)

= (12, 40, 48)

Step 4: to minimize f (x(0) + d(0)) by Golden 0 = 0.158718


(1)

(0)

(0)

x = x + 0d = (0.0954, 2.348, 2.381)


c(1) = (4.5, 4.438, 4.828); ||c(1)|| = 7.952 > 
Note: c(1) d(0) = 0 (perfect line search)

= 32 > 0

Step 5:
x(1) = x(0) + 0d(0) = (1 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
c(1) = (0, 0) stop
62

Step 5:

x(0) = (1, 0)

Steepest Descent: Example


Optimum solution for Example 5.10 with steepest descent program
NO.

x1
2.00000E + 00

x2
4.00000E + 00

x3

f (x)

c

1.00000E + 01 3.32000E + 02 1.58718E 01 6.36239E + 01

9.53870E 02

2.34871E + 00

2.38155E + 00 1.07503E + 01 3.05872E 01 7.95922E + 00

1.47384E + 00

9.90342E 01

9.04562E 01 1.05936E + 00 1.81571E 01 2.06142E + 00

1.29826E + 00

1.13477E + 00

6.07228E 01 6.73757E 01 6.49989E 01 8.13910E 01

1.08573E + 00

6.61514E 01

5.03640E 01 4.58534E 01 1.90588E 01 1.21729E + 00

9.24028E 01

7.63036E 01

3.71842E 01 3.17218E 01 5.88053E 01 5.63154E 01

7.34684E 01

4.92294E 01

3.94601E 01 2.24007E 01 1.93877E 01 8.19377E 01

6.40697E 01

5.48401E 01

2.79474E 01 1.58946E 01 5.71554E 01 3.99141E 01

5.35008E 01

3.46139E 01

2.67396E 01 1.13373E 01 1.94660E 01 5.77545E 01

10

4.61478E 01

3.89014E 01

1.93950E 01 8.09174E 02 5.69767E 01 2.84837E 01

11

3.78902E 01

2.49307E 01

1.95219E 01 5.78310E 02 1.94601E 01 4.11895E 01

12

3.28464E 01

2.78695E 01

1.40291E 01 4.13141E 02 5.72088E 01 2.03339E 01

13

2.71519E 01

1.77281E 01

1.38132E 01 2.94940E 02 1.94439E 01 2.94710E 01

14

2.34872E 01

1.98704E 01

9.96396E 02 2.10499E 02 5.72650E 01 1.45112E 01

15

1.93449E 01

1.26669E 01

9.89806E 02 1.50233E 02 1.94457E 01 2.10430E 01

16

1.67477E 01

1.41872E 01

7.12541E 02 1.07197E 02 5.71777E 01 1.03580E 01

17

1.38196E 01

9.03973E 02

7.05267E 02 7.65362E 03 1.94534E 01 1.50077E 01

18

1.19599E 01

1.01263E 01

5.08180E 02 5.46341E 03 5.70832E 01 7.39559E 02

Chen CL

64

Chen CL

Steepest Descent: Example


x1

NO.

x2

x3

f (x)

65

Steepest Descent: Disadvantages


c

19

9.86659E 02

6.46051E 02

5.03927E 02 3.90156E 03 1.94571E 01 1.07016E 01

20

8.54114E 02

7.23289E 02

3.63134E 02 2.78693E 03 5.72147E 01 5.28078E 02

21

7.04412E 02

4.60867E 02

3.59726E 02 1.98946E 03 1.94320E 01 7.65397E 02

22

6.09761E 02

5.16211E 02

2.59230E 02 1.41991E 03 5.74372E 01 3.76650E 02

23

5.02296E 02

3.28470E 02

2.56647E 02 1.01241E 03 1.94224E 01 5.46915E 02

24

4.34774E 02

3.68093E 02

1.84853E 02 7.21938E 04 5.74409E 01 2.68589E 02

25

3.58170E 02

2.34188E 02

1.83000E 02 5.14807E 04 1.94379E 01 3.90103E 02

26

3.09971E 02

2.62487E 02

1.31757E 02 3.67049E 04 5.71430E 01 1.91685E 02

27

2.55704E 02

1.67348E 02

1.30584E 02 2.62104E 04 1.94475E 01 2.77637E 02

28

2.21338E 02

1.87414E 02

9.40925E 03 1.87131E 04 5.72554E 01 1.36815E 02

29

1.82492E 02

1.19396E 02

9.32100E 03 1.33549E 04 1.94475E 01 1.98348E 02

30

1.57951E 02

1.33752E 02

6.71412E 03 9.53054E 05 5.71873E 01 9.76661E 03

31

1.30274E 02

8.52432E 03

6.65348E 03 6.80463E 05 1.94439E 01 1.41534E 02

32

1.12762E 02

9.54793E 03

4.79362E 03 4.85693E 05 5.72520E 01 6.97001E 03

33

9.29691E 03

6.08244E 03

4.74860E 03 3.46611E 05 1.94283E 01 1.01055E 02

Slow to converge, especially when approaching the optimum


a large number of iterations
Information calculated at previous iterations is NOT used,
each iteration is started independent of others

Optimum design variables : 8.04787E 03 , 6.81319E 03 , 3.42174E 03


Optimum cost function value : 2.47347E 05
Norm of gradient at optimum : 4.97071E 03
Total no. of function evaluations : 753

Chen CL

66

Chen CL

67

Scaling of Design Variables


Example:

The steepest descent method converges in only one iteration for a


positive denite quadratic function with a unit condition number
of the Hessian matrix

To accelerate the rate of convergence


scale design variables such that
condition number of new Hessian

matrix

is

unity

Min: f (x1, x2) = 


25x21 + x22
50 0
H =
0 2
let x = Dy

D=



Min: f (y1, y2) = 12 y12 + y22

y 0 = ( 50, 2)

x0 = (1, 1)


1
50

0
1
2

Chen CL

68

Chen CL

69

Example:
Min: f (x1, x2) = 6x21 6x1x2 + 2x22 5x1 + 4x2 + 2


H =

12 6
6 4

1,2 = 0.7889, 15.211

(eigenvalues)

v 1,2 = (0.4718, 0.8817), (0.8817, 0.4718)





0.4718 0.8817
let x = Qy
Q = v1 v2 =
0.8817

0.4718

Min: f (y1, y2) = 0.5(0.7889y12 + 15.211y22) + 1.1678y1 + 6.2957y2 + 2

1
0

let y = Dz
D = 0.7889
1
0
15.211
Min: f (y1, y2) = 0.5(z12 + z22) + 1.3148z1 + 1.6142z2
x0 = (1, 2) z = (1.3158, 1.6142)
x = QDz = ( 31 , 23 )

Chen CL

70

Conjugate Gradient Method


Fletcher and Reeves (1964)
Steepest Descent: orthogonal at consecutive steps
converge but slow
Conjugate Gradient Method:
modify current steepest descent direction by adding a
scaled previous direction
cut diagonally through orthogonal steepest descent directions
Conjugate Gradient Directions: d(i), d(j)
orthogonal w.r.t. a symmetric and positive denite
matrix A
T
d(i) Ad(j) = 0

Chen CL

71

Conjugate Gradient Method: algorithm


Step 1: k = 0, x(0) d(0) = c(0) = f (x(0))
Stop if ||c(0)|| < , otherwise go to Step 4
Step 2: c(k) = f (x(k)), Stop if ||c(k)|| < 

2
Step 3: d(k) = c(k)+k d(k1), k = ||c(k)||/||c(k1)||
Step 4: compute k = to minimize f (x(k) + d(k))
Step 5: x(k+1) = x(k) + k d(k), k = k + 1, go to Step 2

Note:
Find the minimum in n iterations for positive denite quadratic
forms having n design variables
Inexact line search, non-quadratic forms
re-started every n + 1 iterations for computational stability
(x(0) = x(n+1))

Chen CL

72

Chen CL

73

Example:

0.0956

4.31241

2.348 + 3.81268

5.57838
2.381

x(2) = x(1) + d(1) =

Min: f (x(1) + d(1)) = 0.3156


Min: f (x) = x21 + 2x22 + 2x23 + 2x1x2 + 2x2x3
c(0) = (12, 40, 48);

c(2) = (0.6238, 0.4246, 0.1926),

f (x(0)) = 332.0

Note:

(1)

= (0.0956, 2.348, 2.381)

(1)

= (4.5, 4.438, 4.828); ||c(1)|| = 7.952; f (x(1)) = 10.75



2
= ||c(1)||/||c(0)||
= [7.952/63.3]2 = 0.015633

x
c

||c(0)|| = 63.6;

x(2) = (1.4566, 1.1447, 0.6205)

x(0) = (2, 4, 10)

1
(1)

||c(2)|| = 0.7788

c(2) d(1) = 0

= c(1) + 1d(0)

4.31241
12
4.500

+ (0.015633) 40 = 3.81268
=
4.438

5.57838
48
4.828

Chen CL

74

Newton Method
A Second-order Method
x : current estimate of x
x x + x (desired)
f (x + x) = f (x) + cT x + 12 xT Hx
NC:

f
x

= c + Hx = 0

x = H 1c
x = H 1c

(modied)

Chen CL

75

Steps: (modied)
Step 1: k = 0; c(0); 
(k)
x(k)) , i = 1 n; Stop if ||c(k)|| < 
Step 2: ci = f (x
i

2
f
Step 3: H(x(k)) = xix
j

Step 4: d(k) = H 1c(k)

or

Hd(k) = c(k)

Note:for computational eciency, a system of linear simultaneous eqns


is solved instead of evaluating the inverse of Hessian

Step 5: compute k = to minimize f (x(k) + d(k))


Step 6: x(k+1) = x(k) + d(k), k = k + 1, go to Step 2

Note: unless H is positive denite,


d(k) will not be that of descent for f
T

1 (k)
H > 0 c(k) d(k) = k c(k) H
 c  < 0
> 0 for positive H

Chen CL

76

Chen CL

Example:

f (x) = 3x21 + 2x1x2 + 2x22 + 7


x(0) = (5, 10);

 = 0.0001

c(0) = (6x1 + 2x2, 2x1 + 4x2) = (50, 50);




6 2
,
2 4

H (0)

(1)

=


||c(0)|| = 50 2

c(1) =

4 2

2 6 

4 2 50
5
(0)
1 (0)
1
d
= H c = 20
=
2 6 50
 

 10

5
5

5
5
(0)
x(1) = x(0) + d =
+
=
10
10
10 10

H (0) =

df
d

77

5 5
10 10
50 50
50 50

 
0
=
0

 
0
=
0

1
= 20

= 0 or f (x(1)) d(0) = 0





6(5

5)
+
2(10

10)
50

50
f (x(1)) =
=
2(5 5) + 4(10 10)

 50 50

5
f (x(1)) d(0) = 50 50 50 50
10

= 5(50 50) 10(50 50) = 0 = 1


Chen CL

78

Example:
f (x) = 10x41 20x21x2 + 10x22 + x21 2x1 + 5,

x(0) = (1, 3)

c = f (x) = (40x31 40x1x2 + 2x1 2, 20x21 + 20x2)




H = f (x) =

120x21 40x2 + 2 40x1


40x1
20

Chen CL

79

Chen CL

80

Comparison of Steepest Descent,


Conjugate Gradient Methods
f (x) = 50(x2 x21)2 + (2 x1)2

Chen CL

Chen CL

81

Chen CL

83

Newton,

x(0) = (5, 5) x = (2, 4)

82

Newton Method
Advantage: quadratic convergent rate
Disadvantages:
Calculation of second-order derivatives at each iteration
A system of simultaneous linear equations needs to be solved
Hessian of the function may be singular at some iterations
Memoryless method: each iteration is started afresh
Not convergent unless Hessian remains positive denite and a
step size determination scheme is used

Chen CL

84

Chen CL

Quasi-Newton Methods

Marquardt Modication (1963)


(k)

Steepest Descent:

1 (k)

= (H + I) c

Use only 1st-order information poor rate of convergence


Each iteration is started with new design variables without using
any information from previous iterations

Far away solution point use Steepest Descent


Near the solution point use Newton Method
Step 1: k = 0; c
Step 2:
Step 3:

(0)

Newton Method:

; ; (= 10000) (large)

x(k)) , i = 1 n; Stop
= f (x
i 2
f
(k)
)
H(x = xix
j
(k)
d = (H + k I)1c(k)
(k)
(k)
(k)

(k)
ci

Use 2nd-order derivatives quadratic convergence rate

if ||c(k)|| < 

2nd-order derivatives !
Requires calculation of n(n+1)
2
DIculties if Hessian is singular
Not learning processes

Step 4:
Step 5: if f (x + d ) < f (x ), go to Step 6
Otherwise, let k = 2k and go to Step 4
Step 6: Set k+1 = 0.5k , k = k + 1 and go to Step 2

Chen CL

85

86

Quasi-Newton Methods
Quasi Newton Methods, Update Methods:
Use rst-order derivatives to generate approximations for Hessian
combine desirable features of both steepest descent and Newtons
methods
Use information from previous iterations to speed up convergence
(learning processes)
Several ways to approximate (updated) Hessian or its inverse
Preserve properties of symmetry and positive deniteness

Chen CL

87

Davidon-Fletcher-Powell (DFP) Method


Davidon (1959), Fletcher and Powell (1963)
To approximate Hessian inverse using only rst
derivatives
x = H 1c
Ac
A :

nd A by using only 1st-order information

Chen CL

88

Step
Step
Step
Step
Step
Step

1:
2:
3:
4:
5:
6:

Matrix A(k) is always positive denite


always converge to a local minimum if > 0


T
d
(k) 
(k)
= c(k) A(k)c(k) < 0
f (x + d )
d
=0

k = 0; c(0), ; A(0)(= I, H 1)
c(k) = f (x(k)), Stop if ||c(k)|| < 
d(k) = A(k)c(k)
compute k = to minimize f (x(k) + d(k))
x(k+1) = x(k) + k d(k)
update A(k)

When applied to a positive denite quadratic form,


A(k) converges to inverse of Hessian of the quadratic form

A(k+1) = A(k) + B (k) + C (k)


(k) (k) T
(k) (k) T
B (k) = ss(k)sy (k)
C (k) = yz(k)zz (k)
s(k) = k d(k)
(change in design)
y (k) = c(k+1) c(k)
c

(k+1)

= f (x

(k+1)

89

DFP Properties:

A H 1

DFP Procedures:

Chen CL

(change in gradient)
z (k) = A(k)y (k)

Step 7: set k = k + 1 and go to Step 2

Chen CL

90

Chen CL

91

DFP Example:
f (x) = 5x21 + 2x1x2 + x22 + 7
1-1.

x(0) = (1, 2);


c

1-2.

||c

A(0) = I;

x(0) = (1, 2)

k = 0,  = 0.001

(0)

(0)

= (10x1 + 2x2, 2x1 + 2x2) = (14, 6),

|| =
142 + 62 = 15.232 > 

1-3.

d(0) = c(0) = (14, 6)

1-4.

x(1) = x(0) + d(0) = (1 14, 2 6)


f (x(1)) = f () = 5(1 14)2 + 2(1 14)(2 6) + 2(2 6)2 + 7
df
d

= 5(2)(14)(1 14) + 2(14)(2 6) + 2(6)(1 14)


+2(6)(2 6) = 0

= 0.0988,
1-5.

(1)

= x

(0)

d2f
d2
(0)

+ 0d

= 2348 > 0

= (1 14, 2 6) = (0.386, 1.407)

1-6.

s(0) = 0d(0) = (1.386, 0.593),

c(1) = (1.406, 2.042)

y (0) = c(1) c(0) = (15.046, 3.958),


s(0) y (0) = 23.20,

1.921
T
s(0)s(0) =
0.822

0.0828
B (0) =
0.0354
A(1)

z (0) = y (0)

y (0) z (0) = 242.05

0.822
226.40 59.55
T

z (0)z (0) =
0.352
59.55 15.67

0.935 0.246
0.0354

C (0) =
0.246 0.065
0.0152

0.148 0.211

= A(0) + B (0) + C (0) =


0.211 0.950

Chen CL

92

Chen CL

93

s(1) = 1d(1) = (0.455, 1.334),

2-6.

c(2) = (0.836, 0.284)

y (1) = c(2) c(1) = (1.882, 1.758)


z (1) = A(1)y (1) = (0.649, 2.067)

2-2. ||c(1)|| =

s(1) y (1) = 3.201,


y (1) z (1) = 4.855

0.207 0.607
0.421 1.341
T
z (1)z (1)T =

s(1)s(1) =
0.607 1.780
1.341 4.272

.0647
0.19
.0867
0.276
C (1) =

B (1) =
0.19 0.556
0.276 0.880

0.126 0.125

A(2) = A(1) + B (1) + C (1) =


0.125 0.626

1.0462 + 2.0422 = 2.29 > 

2-3.

d(1) = A(1)c(1) = (0.586, 1.719)

2-4.

x(2) = x(1) + d(1)


1 = 0.776 (minimize f (x(1) + d(1)))

2-5.

x(2) = x(1) + d(1) = (0.386, 1.407) + (0.455, 1.334)


= (0.069, 0.073)

Chen CL

94

Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Method
Direct update Hessian using only rst derivatives
x = H 1c
Hx = c
Ax c
A :

nd A by using only 1st-order information

Chen CL

95

BFGS Procedures:

Step
Step
Step
Step
Step
Step

1:
2:
3:
4:
5:
6:

k = 0; c(0), ; H (0)(= I, H)
c(k) = f (x(k)), Stop if ||c(k)|| < 
solve H (k)d(k) = c(k) to obtain d(k)
compute k = to minimize f (x(k) + d(k))
x(k+1) = x(k) + k d(k)
update H (k)

H (k+1) = H (k) + D (k) + E (k)


T
(k) (k) T
y (k)y (k)
D (k) = y (k)s(k)
E (k) = c (k)c (k)
c d
s(k) = k d(k)
(change in design)
y (k) = c(k+1) c(k)

(change in gradient)

c(k+1) = f (x(k+1))
Step 7: set k = k + 1 and go to Step 2

Chen CL

96

Chen CL

97

BFGS Example:
f (x) = 5x21 + 2x1x2 + x22 + 7
H (0) = I;

x(0) = (1, 2);

1-1.

x(0) = (1, 2)
k = 0,  = 0.001

= (10x1 + 2x2, 2x1 + 2x2) = (14, 6),

||c(0)|| =
142 + 62 = 15.232 > 
c

1-2.

(0)

1-4.

x(1) = x(0) + d(0) = (1 14, 2 6)


= 5(2)(14)(1 14) + 2(14)(2 6) + 2(6)(1 14)
+2(6)(2 6) = 0

= 0.0988,

d2f
d2
(0)

x(1) = x(0) + 0d

1-5.

= 2348 > 0

H (1)

= (1 14, 2 6) = (0.386, 1.407)

Chen CL

98

c(0) d(0) = 232.0

59.55
196
84
T

c(0)c(0) =
15.67
84 36

2.567
0.845 0.362

E (0) =
0.675
0.362 0.155

9.915
2.205

= H (0) + D (0) + E (0) =


2.205 0.520

y (0) s(0) = 23.20,

226.40
T
y (0)y (0) =
59.55

9.760
D (0) =
2.567

f (x(1)) = f () = 5(1 14)2 + 2(1 14)(2 6) + 2(2 6)2 + 7


df
d

c(1) = (1.406, 2.042)

y (0) = c(1) c(0) = (15.046, 3.958)

= c(0) = (14, 6)

1-3.

s(0) = 0d(0) = (1.386, 0.593),

1-6.

(0)

Chen CL

2-6.

99

s(1) = 1d(1) = (0.317, 1.417),

c(2) = (0.706, 0.157)

y (1) = c(2) c(1) = (0.340, 2.199)

2-2. ||c(1)|| =
2-3.
2-4.

1.0462 + 2.0422 = 2.29 > 

c(1) = H (1)d(1) d(1) = (17.20, 76.77)


x(2) = x(1) + d(1)
1 = 0.018455 (minimize f (x(1) + d(1)))

2-5.

x(2) = x(1) + 1d(1) = (0.0686, 0.0098)

y (1) s(1) = 3.224,


c(1) d(1) = 174.76

0.1156 0.748
1.094 2.136
T
c(1)c(1)T =

y (1)y (1) =
0.748 4.836
2.136 4.170

.0063 .0122
0.036 0.232

E (1) =
D (1) =
.0122 .0239
0.232 1.500

9.945
1.985

H (2) = H (1) + D (1) + E (1) =


1.985 1.996