
JNM: Journal of Natural Sciences and Mathematics
Vol. 2, No. 1, pp. 17-40 (June 2008 / Jumada Al-Akher 1429 H)
© Qassim University Publications
Reduced Gradient Method and its Generalization
via Stochastic Perturbation
Abdelkrim El Mouatasim
Department of Mathematics,
Faculty of Sciences, Jazan University,
B.P. 114, Jazan, Saudi Arabia.
E-mail: a.mouatasim@gmail.com
Abstract: In this paper, the global optimization of a nonconvex objective function under linear and nonlinear differentiable constraints is studied. Reduced gradient and GRG descent methods with random perturbation are proposed, and the global convergence of the algorithm is established. Numerical examples are also given for statistical, octagon, mixture, alkylation and pooling problems.
Keywords: Stochastic perturbation, constrained optimization, reduced gradient method and its generalization, nonconvex optimization.
Received for JNM on November 19, 2007.
1 Introduction
The general constrained optimization problem is to minimize a nonlinear function
subject to linear or nonlinear constraints. Two equivalent formulations of this problem
are useful for describing algorithms. They are
Minimize: f(x)
subject to: g_i(x) = 0, i = 1, ..., m_l
            g_i(x) ≤ 0, i = m_l + 1, ..., m
            l ≤ x ≤ u        (1.1)

where f and each g_i: IR^n → IR are twice continuously differentiable functions and the lower- and upper-bound vectors, l and u, may contain some infinite components; and

Minimize: f(x)
Subject to: h(x) = 0
            x ≥ 0        (1.2)
where h maps IR^n to IR^m, the objective function f: IR^n → IR and the restrictions h_j: IR^n → IR, j = 1, ..., m (the components of h) are twice continuously differentiable functions. For the case of multi-objective optimization see [10].
The main technique proposed for solving constrained optimization problems in this work is the reduced gradient method and its generalization. We are mainly interested in the situation where, on the one hand, f is not convex and, on the other hand, the constraints are in general not linear; to get equality constraints one might think about introducing slack variables and incorporating them into the functions g_i.
The problem (1.2) can be numerically approached by using the reduced gradient method or its generalizations, which generate a sequence {x^k}_{k≥0}, where x^0 is an initial feasible point and, for each k ≥ 0, a new feasible point x^{k+1} is generated from x^k by using an operator Q_k (see Section 3). Thus the iterations are given by:

∀k ≥ 0: x^{k+1} = Q_k(x^k).        (1.3)
In order to prevent convergence to a local minimum, various modifications of these basic methods have been introduced in the literature. For instance, the reduced gradient method has been considered by [1]; the random perturbation of the projected gradient method in [2]; sequential convex programming in [24]; the generalized reduced gradient in [5, 7, 25, 26, 32, 34]; sequential quadratic programming in [15, 30]; nonlinear programming in [3, 20, 27]; stochastic perturbation of the active set in [11]. Moreover, stochastic or evolutionary search can be introduced, but these methods are usually considered as expensive and not accurate: on the one hand, the flexibility introduced by randomization often implies a large number of evaluations of the objective function and, on the other hand, pure random search generates many points which do not improve the value of the objective function. This remark has led some authors to introduce a controlled random search (see for instance [2, 4, 6, 9, 23, 28, 33]).
In such a method, the sequence {x^k}_{k≥0} is replaced by a sequence of random vectors {X^k}_{k≥0} and the iterations are modified as follows:

∀k ≥ 0: X^{k+1} = Q_k(X^k) + T_k        (1.4)

where T_k is a suitable random variable, called the stochastic perturbation. The sequence {T_k}_{k≥0} goes to zero slowly enough in order to prevent convergence to a local minimum (see Section 4). The reduced gradient method and its generalization are recalled in Section 3, while the notations are introduced in Section 2. The results of numerical experiments are given in Section 5.
2 Notations and assumptions
We denote by IR the set of real numbers (−∞, +∞), by IR_+ the set of nonnegative real numbers [0, +∞), and by E = IR^n the n-dimensional real Euclidean space. For x = (x_1, x_2, ..., x_n)^t ∈ E, x^t denotes the transpose of x. We denote by ‖x‖ = √(x^t x) = (x_1² + ... + x_n²)^{1/2} the Euclidean norm of x.
The general optimization problem (1.1) is equivalent to the global optimization problem (1.2) with equality constraints and nonnegative variables.

Remark 2.1 The assumed conditions can be satisfied by using the two-phase method, since we can add artificial variables to the constraints.

Linear case: h(x) = Ax − b, where A is an m × n matrix with rank m and b is an m-vector. Any m columns of A are linearly independent, and every extreme point of the feasible region has m strictly positive variables.
We introduce basic and nonbasic variables according to

A = [B, N],    x = (x_B, x_N)^t,    x_N ≥ 0,    x_B > 0        (2.1)

by the nondegeneracy assumption. Furthermore, the gradient of f may be conformally partitioned as

∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t),

where ∇_B f and ∇_N f are the basic gradient and nonbasic gradient of f, respectively.
Nonlinear case: for simplicity we also assume that the gradients of the constraint functions h_j are linearly independent at every point x ≥ 0. This assumption, like the assumption of full rank of the matrix A in the linear case, ensures that we can choose a basis, express the basic variables as a linear combination of the nonbasic variables, and then reduce the problem. Hence a similar generalized reduced gradient procedure applies.

Let a feasible solution x^k ≥ 0 with h_j(x^k) = 0 for all j be given. By assumption the Jacobian matrix of the constraints h(x) = (h_1(x), ..., h_m(x))^t at each x ≥ 0 has full rank and, for simplicity, at the point x^k it will be denoted by

A_k = Jh(x^k) and b_k = A_k x^k.

Then a construction similar to the linear case applies.
Let

C = {x ∈ E | h(x) = 0, x ≥ 0}.

The objective function is f: E → IR; its lower bound on C is denoted by l*: l* = min_C f. Let us introduce

C_γ = S_γ ∩ C;    S_γ = {x ∈ E | f(x) ≤ γ}.
We assume that

f is twice continuously differentiable on E,        (2.2)

∀γ > l*: C_γ is not empty, closed and bounded,        (2.3)

∀γ > l*: meas(C_γ) > 0,        (2.4)

where meas(C_γ) is the measure of C_γ.

Since E is a finite dimensional space, the assumption (2.3) is verified when C is bounded or f is coercive, i.e., lim_{‖x‖→+∞} f(x) = +∞. Assumption (2.4) is verified when C contains a sequence of neighborhoods of an optimum point x* having strictly positive measure, i.e., when x* can be approximated by a sequence of points of the interior of C. We observe that the assumptions (2.2)-(2.3) yield that

C = ∪_{γ>l*} C_γ, i.e., ∀x ∈ C: ∃γ > l* such that x ∈ C_γ.

From (2.2) and (2.3) we get

α_1 = sup{‖∇f(x)‖ : x ∈ C_γ} < +∞.

Then

α_2 = sup{‖d‖ : x ∈ C_γ} < +∞,

where d is the direction of the reduced gradient method (see Section 3). Thus,

c(β, ν) = sup{‖y − (x + ηd)‖ : (x, y) ∈ C_β², 0 ≤ η ≤ ν} < +∞,        (2.5)

where β, ν are positive real numbers.
3 Reduced Gradient and its Generalization
3.1 Reduced Gradient Method
The reduced gradient method is closely related to the simplex method of linear programming in that the problem variables are partitioned into basic and nonbasic groups. From a theoretical viewpoint, the method can be shown to behave very much like the gradient projection method [1]. Some variants of the method applied to convex nonlinear problems with linear constraints are also referred to as the convex simplex method.

Again we are dealing with the same basic descent algorithm structure and are interested in still other means for computing descent directions. We use the following form of the linearly constrained problem with f(·) continuously differentiable on IR^n:

Minimize: f(x)
subject to: Ax = b
            x ≥ 0        (3.1)
where A is an m × n matrix with full row rank and b ∈ IR^m.

The reduced gradient method begins with a basis B and a feasible solution x^k = (x_B^k, x_N^k) such that x_B^k > 0. The solution x is not necessarily a basic solution, i.e. x_N does not have to be identically zero. Such a solution can be obtained e.g. by the usual first phase procedure of linear optimization. Using the basis B, from Bx_B + Nx_N = b we have

x_B = B^{-1}b − B^{-1}N x_N,

hence the basic variables x_B can be eliminated from the problem (3.1):

Minimize: f_N(x_N)
subject to: B^{-1}b − B^{-1}N x_N ≥ 0
            x_N ≥ 0        (3.2)
where f_N(x_N) = f(x) = f(B^{-1}b − B^{-1}N x_N, x_N). Using the notation

∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t),

the gradient of f_N, which is the so-called reduced gradient, can be expressed as

∇f_N(x)^t = −∇_B f(x)^t B^{-1}N + ∇_N f(x)^t.
Now let us assume that the basis is nondegenerate, i.e. only the nonnegativity constraints x_N ≥ 0 might be active at the current iterate x^k. Let the search direction be a vector d^t = (d_B^t, d_N^t) in the null space of the matrix A, defined as d_B = −B^{-1}N d_N with d_N ≠ 0. If we define it so, then the feasibility of x^k + ηd is guaranteed as long as x_B^k + ηd_B ≥ 0, i.e. as long as

η ≤ η_max = min_{i∈B, d_i<0} (−x_i^k / d_i).

We still need to define d_N such that it is a descent direction of f_N projected to the coordinate hyperplane active at the current point x_N^k. So we have, for j ∈ N,

d_j^k = 0                        if x_j^k = 0 and ∂f_N(x_N^k)/∂x_j ≥ 0,
d_j^k = −∂f_N(x_N^k)/∂x_j        otherwise.

To complete the description of the algorithm we make a line search to obtain the new point:

x^{k+1} = x^k + η_k d^k,    η_k ∈ arg min{f(x^k + ηd^k) : 0 ≤ η ≤ η_max}.

If all the coordinates x_B^{k+1} stay strictly positive we keep the basis; otherwise a pivot is made to eliminate the zero variable from the basis and replace it by a positive but currently nonbasic coordinate. For examples of this type see the statistical problem in Section 5.
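The iteration just described can be sketched in NumPy (a minimal illustration, not the paper's Fortran code; the fixed-grid line search imitates the exhaustive unidimensional search of Section 5, and `f`, `grad_f` and the toy problem below are hypothetical):

```python
import numpy as np

def reduced_gradient_step(f, grad_f, A, x, basic, nonbasic, n_line=200):
    """One reduced gradient iteration for min f(x) s.t. Ax = b, x >= 0.

    `basic`/`nonbasic` are index lists with A[:, basic] nonsingular.
    """
    B, N = A[:, basic], A[:, nonbasic]
    g = grad_f(x)
    # Reduced gradient: grad f_N = -grad_B f^t B^{-1} N + grad_N f^t
    r = g[nonbasic] - N.T @ np.linalg.solve(B.T, g[basic])
    # Descent direction: zero where a bound is active and r pushes outward
    d_N = np.where((x[nonbasic] <= 0.0) & (r >= 0.0), 0.0, -r)
    d = np.zeros_like(x)
    d[nonbasic] = d_N
    d[basic] = -np.linalg.solve(B, N @ d_N)   # keeps A d = 0
    # Largest feasible step: x + eta*d >= 0
    neg = d < 0
    eta_max = np.min(-x[neg] / d[neg]) if neg.any() else 1.0
    # Fixed-step exhaustive line search on [0, eta_max]
    etas = np.linspace(0.0, eta_max, n_line)
    best = np.argmin([f(x + e * d) for e in etas])
    return x + etas[best] * d

# Toy example: min (x1-1)^2 + (x2-2)^2  s.t.  x1 + x2 = 2, x >= 0
f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])
A = np.array([[1.0, 1.0]])
x = np.array([2.0, 0.0])
for _ in range(20):
    x = reduced_gradient_step(f, grad_f, A, x, basic=[0], nonbasic=[1])
print(x)  # approaches the constrained minimizer (0.5, 1.5)
```

Because the basis stays nondegenerate in this toy problem, no pivoting step is needed.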
3.2 Generalized reduced gradient (GRG)
The reduced gradient method can be generalized to nonlinearly constrained optimization problems. As in the linearly constrained case, we consider the problem with equality constraints and nonnegative variables.

The reduced gradient method and its generalization try to extend the methods of linear optimization to the nonlinear case. These methods are close, or equivalent, to projected gradient methods; only the presentation of the methods is frequently quite different.
We generate a reduced gradient search direction by virtually keeping the linearized constraints valid. This direction, by construction, will be in the null space of A_k. More specifically, for the linearized constraints we have

h(x^k) + Jh(x^k)(x − x^k) = 0 + A_k(x − x^k) = 0.

From this, one has

B_k x_{B_k} + N_k x_{N_k} = A_k x^k,

and by introducing the notation b_k = A_k x^k, we have

x_{B_k} = B_k^{-1} b_k − B_k^{-1} N_k x_{N_k}.

Hence the basic variables x_{B_k} can be eliminated from the linearization of the problem (1.2), resulting in the problem (3.2).

From this point the generation of the search direction d goes in exactly the same way as in the linearly constrained case. Due to the nonlinearity of the constraints, h(x^{k+1}) = h(x^k + ηd) = 0 will not hold. Hence something more has to be done to restore feasibility.

In older versions of the GRG, Newton's method is applied to the nonlinear equality system h(x) = 0 from the initial point x^{k+1} to produce a direction, which is combined with a direction from the orthogonal subspace (the range space of A_k^t), and then a modified (nonlinear, discrete) line search is performed. These schemes are quite complicated and are not discussed here in detail.
Let us recall briefly the essential points of the reduced gradient method: an initial feasible guess x^0 ∈ C is given and a sequence {x^k}_{k≥0} ⊂ C is generated by using iterations of the general form:

∀k ≥ 0: x^{k+1} = Q_k(x^k) = x^k + η_k d^k.        (3.3)

The optimal choice for η_k is

η_k ∈ arg min{f(x^k + ηd^k) : 0 ≤ η ≤ η_max},        (3.4)

where

η_max = min_{1≤j≤n} {−x_j^k / d_j^k : d_j^k < 0}   if d^k has a negative component,
η_max = +∞                                          if d^k ≥ 0.

We have f(x^k + η_k d^k) ≤ f(x^k).
4 Stochastic perturbation of the reduced gradient
The main difficulty remains the lack of convexity: if f is not convex, the Kuhn-Tucker points may not correspond to a global minimum. In the following we shall improve this point by using an appropriate random perturbation.

The sequence {x^k}_{k≥0} is replaced by a sequence of random variables {X^k}_{k≥0} involving a random perturbation T_k of the deterministic iteration (3.3). Then we have X^0 = x^0 and

∀k ≥ 0: X^{k+1} = Q_k(X^k) + T_k = X^k + η_k d^k + T_k,        (4.1)

where

∀k ≥ 1: T_k is independent of (X^{k−1}, ..., X^0),        (4.2)

∀x ∈ C: Q_k(x) + T_k ∈ C.        (4.3)

Equation (4.1) can be viewed as a perturbation of the descent direction d^k, which is replaced by a new direction

D^k = d^k + T_k / η_k,

and the iterations (4.1) become

X^{k+1} = X^k + η_k D^k.

General properties defining convenient sequences of perturbations {T_k}_{k≥0} can be found in the literature: usually, a sequence of Gaussian laws may be used in order to satisfy these properties.
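A minimal sketch of one perturbed iteration (assuming a Gaussian perturbation with rejection of infeasible draws, which mimics the indicator 1_C of (4.7); the accept-only-improvements step in the usage loop mirrors the monotonicity of U_k = f(X^k) in (4.8); all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_iterate(Q, x, k, feasible, a=1.0, d=2.0, sigma=1.0, max_tries=100):
    """One stochastically perturbed iteration X^{k+1} = Q(X^k) + T_k.

    T_k = xi_k * Z with Z Gaussian; draws are rejected until Q(x) + T_k is
    feasible.  The schedule xi_k = sqrt(a / log(k + d)) follows (4.13).
    """
    xi = np.sqrt(a / np.log(k + d))
    y = Q(x)
    for _ in range(max_tries):
        t = xi * rng.normal(0.0, sigma, size=x.shape)
        if feasible(y + t):
            return y + t
    return y  # fall back to the unperturbed step

# Toy usage: projected gradient step Q on a nonconvex f over the box [0, 3]
f = lambda x: np.cos(3 * x[0]) + 0.1 * (x[0] - 2) ** 2
Q = lambda x: np.clip(x - 0.05 * (-3 * np.sin(3 * x[0]) + 0.2 * (x[0] - 2)), 0, 3)
feasible = lambda x: np.all((x >= 0) & (x <= 3))
x = np.array([0.5])
for k in range(200):
    y = perturbed_iterate(Q, x, k, feasible, a=0.5)
    if f(y) <= f(x):      # keep U_k = f(X^k) decreasing, as in (4.8)
        x = y
```

The rejection loop is only one possible way to realize condition (4.3); projecting the perturbed point back onto C would be another.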
We introduce a random vector Z_k; we denote by Φ_k and φ_k the cumulative distribution function and the probability density of Z_k, respectively.

We denote by F_{k+1}(y | X^k = x) the conditional cumulative distribution function

F_{k+1}(y | X^k = x) = P(X^{k+1} < y | X^k = x);

the conditional probability density of X^{k+1} is denoted by f_{k+1}.

Let us introduce a sequence of n-dimensional random vectors {Z_k}_{k≥0} ⊂ C. We consider also {ξ_k}_{k≥0}, a suitable decreasing sequence of strictly positive real numbers converging to 0 and such that ξ_0 ≤ 1. The optimal choice for η_k is

η_k ∈ arg min{f(X^k + ηD^k) : 0 ≤ η ≤ η_max},        (4.4)

where

η_max = min_{1≤j≤n} {−X_j^k / D_j^k : D_j^k < 0}   if D^k has a negative component,
η_max = +∞                                          if D^k ≥ 0.

Then

F_{k+1}(y | X^k = x) = P(Z_k < (y − Q_k(x)) / ξ_k)

and we have

F_{k+1}(y | X^k = x) = Φ_k((y − Q_k(x)) / ξ_k),

f_{k+1}(y | X^k = x) = (1/ξ_k^n) φ_k((y − Q_k(x)) / ξ_k), ∀y ∈ C.        (4.5)

Note that (2.5) shows that

‖y − Q_k(x)‖ ≤ c(β, ν) for (x, y) ∈ C_β².
We assume that there exists a decreasing function t ↦ g_k(t), g_k(t) > 0 on IR_+, such that

∀y ∈ C_β: φ_k((y − Q_k(x)) / ξ_k) ≥ g_k(c(β, ν) / ξ_k).        (4.6)

For simplicity, let

Z_k = 1_C(Z_k) Z_k.        (4.7)

The procedure generates a sequence U_k = f(X^k). By construction this sequence is decreasing and lower bounded by l*:

∀k ≥ 0: l* ≤ U_{k+1} ≤ U_k.        (4.8)

Thus there exists U ≥ l* such that U_k → U for k → +∞.
Lemma 4.1 Let T_k = ξ_k Z_k and β = f(x^0), where Z_k is given by (4.7). Then there exists ε > 0 such that

P(U_{k+1} < θ | U_k ≥ θ) ≥ (meas(C̃_θ) / ξ_k^n) g_k(c(β, ν) / ξ_k) > 0, ∀θ ∈ (l*, l* + ε],

where C̃_θ = {x ∈ C | f(x) < θ} and n = dim(E).
Proof. Let C̃_θ = {x ∈ C | f(x) < θ} for θ ∈ (l*, l* + ε]. Since C_γ ⊂ C̃_θ for l* < γ < θ, it follows from (2.4) that C̃_θ is non empty and has a strictly positive measure.

If meas(C_β − C̃_θ) = 0 for any θ ∈ (l*, l* + ε], the result is immediate, since then f(x) = l* almost everywhere on C_β.

Let us assume that there exists ε > 0 such that meas(C_β − C̃_θ) > 0. For θ ∈ (l*, l* + ε], we have C̃_θ ⊂ C_β and meas(C_β − C̃_θ) > 0. Since the sequence {U_i}_{i≥0} is decreasing, we have also

{X^i}_{i≥0} ⊂ C_β.        (4.9)

Therefore

P(X^k ∉ C̃_θ) = P(X^k ∈ C_β − C̃_θ) = ∫_{C_β − C̃_θ} P(X^k ∈ dx) > 0 for any θ ∈ (l*, l* + ε].
Let θ ∈ (l*, l* + ε]; we have from (4.8)

P(U_{k+1} < θ | U_k ≥ θ) = P(X^{k+1} ∈ C̃_θ | X^i ∉ C̃_θ, i = 0, ..., k).

But (4.4) and (4.5) yield that

P(X^{k+1} ∈ C̃_θ | X^i ∉ C̃_θ, i = 0, ..., k) = P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ).

Thus

P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ) = P(X^{k+1} ∈ C̃_θ, X^k ∉ C̃_θ) / P(X^k ∉ C̃_θ).
Moreover

P(X^{k+1} ∈ C̃_θ, X^k ∉ C̃_θ) = ∫_{C_β − C̃_θ} P(X^k ∈ dx) ∫_{C̃_θ} f_{k+1}(y | X^k = x) dy.

Using (4.9) we get

P(X^{k+1} ∈ C̃_θ, X^k ∉ C̃_θ) ≥ inf_{x ∈ C_β − C̃_θ} ( ∫_{C̃_θ} f_{k+1}(y | X^k = x) dy ) ∫_{C_β − C̃_θ} P(X^k ∈ dx).

Thus

P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ) ≥ inf_{x ∈ C_β − C̃_θ} ∫_{C̃_θ} f_{k+1}(y | X^k = x) dy.
Taking (4.5) into account, we have

P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ) ≥ (1/ξ_k^n) inf_{x ∈ C_β − C̃_θ} ∫_{C̃_θ} φ_k((y − Q_k(x)) / ξ_k) dy.

Note that (2.5) shows that

‖y − Q_k(x)‖ ≤ c(β, ν)

and (4.6) yields that

φ_k((y − Q_k(x)) / ξ_k) ≥ g_k(c(β, ν) / ξ_k).

Hence

P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ) ≥ (1/ξ_k^n) ∫_{C̃_θ} g_k(c(β, ν) / ξ_k) dy.

It follows that

P(X^{k+1} ∈ C̃_θ | X^k ∉ C̃_θ) ≥ (meas(C̃_θ) / ξ_k^n) g_k(c(β, ν) / ξ_k). □
The global convergence is a consequence of the following result, which follows from the Borel–Cantelli lemma (see, for instance, [28]).
Lemma 4.2 Let {U_k}_{k≥0} be a decreasing sequence, lower bounded by l*. Then there exists U such that U_k → U for k → +∞. Assume that there exists ε > 0 such that, for any θ ∈ (l*, l* + ε], there is a sequence of strictly positive real numbers {c_k(θ)}_{k≥0} such that

∀k ≥ 0: P(U_{k+1} < θ | U_k ≥ θ) ≥ c_k(θ) > 0;    Σ_{k=0}^{+∞} c_k(θ) = +∞.        (4.10)

Then U = l* almost surely.
For the proof we refer the reader to [21] or [28].
Theorem 1 Let β = f(x^0) and η_k satisfy (4.4). Assume that x^0 ∈ C, the sequence {ξ_k}_{k≥0} is non increasing and

Σ_{k=0}^{+∞} g_k(c(β, ν) / ξ_k) = +∞.        (4.11)

Then U = l* almost surely.
Proof. Let

c_k(θ) = (meas(C̃_θ) / ξ_k^n) g_k(c(β, ν) / ξ_k) > 0.        (4.12)

Since the sequence {ξ_k}_{k≥0} is non increasing,

c_k(θ) ≥ (meas(C̃_θ) / ξ_0^n) g_k(c(β, ν) / ξ_k) > 0.

Thus, equation (4.11) shows that

Σ_{k=0}^{+∞} c_k(θ) ≥ (meas(C̃_θ) / ξ_0^n) Σ_{k=0}^{+∞} g_k(c(β, ν) / ξ_k) = +∞.

Using Lemma 4.1 and Lemma 4.2 we have U = l* almost surely. □
Theorem 2 Let Z_k = 1_C(Z)Z where Z is a random variable following N(0, σ² Id) (σ > 0), and let

ξ_k = √(a / log(k + d)),        (4.13)

where a > 0, d > 0 and k is the iteration number. If x^0 ∈ C then, for a large enough, U = l* almost surely.
Proof. We have

φ_k(z) = (1/(σ√(2π))^n) exp(−(1/2)‖z/σ‖²) = g_k(‖z‖) > 0.

So

g_k(c(β, ν) / ξ_k) = (1/(σ√(2π))^n) (k + d)^{−c(β,ν)²/(2σ²a)}.

For a such that

0 < c(β, ν)² / (2σ² a) < 1,

we have

Σ_{k=0}^{+∞} g_k(c(β, ν) / ξ_k) = +∞,

and, from the preceding theorem, we have U = l* almost surely. □
5 Numerical results
In order to apply the method presented in Section 4, we start at the initial value X^0 = x^0 ∈ C. At step k ≥ 0, X^k is known and X^{k+1} is determined.

We denote by k_sto the number of perturbations; the case k_sto = 0 corresponds to the unperturbed descent method. We add slack variables z_1, z_2, ... to the inequality constraints in order to obtain equality constraints.

We introduce a maximum iteration number k_max such that the iterations are stopped when k = k_max or x^k is a Kuhn-Tucker point. For each descent direction, the optimal η_k is calculated by unidimensional exhaustive search with a fixed step on the interval defined by (3.4). This procedure could imply a large number of calls of the objective function. The global number of function evaluations is n_eval and includes the evaluations for the unidimensional search. We denote by k_end the value of k when the iterations are stopped (it corresponds to the number of evaluations of the gradient of f). The optimal value and optimal point are f_opt and x_opt respectively. The perturbation is normally distributed and samples are generated by using the log-trigonometric generator and the standard random number generator of the FORTRAN library. We use ξ_k = √(a / log(k + 2)), where a > 0.

The maximum number of iterations has been fixed at k_max = 100. The results for k_sto = 0, 250, 500, 1000 and 2000 are given in the tables below for each problem. The experiments were performed on a workstation with an HP Intel(R) Celeron(R) M processor at 1.30 GHz and 224 MB RAM. The row cpu gives the mean CPU time in seconds for one run.
5.1 Statistical problem
The statistical problem of estimating a bounded mean with a minimax procedure is nonconvex and nonlinear; we restrict ourselves to this problem (see for instance [16] and [17]). Thus, our problem is reduced to the following: for a suitable m_2 > m_1 ≈ 1.27 and for each m ∈ (m_1, m_2], find a_1, a_2. In this case, the problem reduces to a global optimization problem with 5 unknowns (λ, a_1, a_2, a_3, b) and 3 equality constraints. This
problem is equivalent to the maximization of a convex combination of the several g_i functions without the constraint on λ (see for instance [14] and [22]). Using the equality a_1 + a_2 + a_3 = 1, one variable, a_1 for example, can be eliminated. The problem we eventually studied is the following:
Minimize: −[(1 − a_2 − a_3)g_1 + a_2 g_2 + a_3 g_3]
Subject to:
    a_2 + a_3 + z_1 = 1
    a_i + z_i = 1, i = 2, 3
    b + z_4 = 1
    0 ≤ a_i, i = 2, 3
    0 ≤ b, 0 ≤ z_i, i = 1, ..., 4,
where

g_1(a_2, a_3, b) : ρ(1)² = λ,

g_2(a_2, a_3, b) : bm exp(−bm) + Σ_{x=1}^{∞} (bm − ρ(x))² ((bm)^{x−1}/x!) exp(−bm) = λ,

g_3(a_2, a_3, b) : m exp(−m) + Σ_{x=1}^{∞} (m − ρ(x))² (m^{x−1}/x!) exp(−m) = λ, and

ρ(x) = m [a_2 b^x exp((1 − b)m) + a_3] / [a_2 b^{x−1} exp((1 − b)m) + a_3]   if x ≠ 0, 1,

ρ(1) = m [a_2 b exp(−bm) + a_3 exp(−m)] / [1 − a_2 − a_3 + a_2 exp(−bm) + a_3 exp(−m)].
There are 7 variables and 10 constraints. We use a = 1 and m = 1.5; the Fortran code furnishes the following optimal solution (see also Table 1):

a*_1 = 0.0958, a*_2 = 0.9042, a*_3 = 0.00000, b* = 0.907159
Table 1: Results for Statistics

k_sto    0        250      500      1000     2000
f_opt    −0.3401  −0.4022  −0.4030  −0.4044  −0.4044
n_eval   1        1183     4773     10743    17635
k_end    2        7        12       13       11
cpu      0.00     37.16    134.72   288.91   491.86
5.2 Octagon problem
Consider polygons in the plane with n sides (n-gons for short) and unit diameter. Which one of them has maximum area? (see for instance [18, 31]). Using QP and geometric reasoning, the optimal octagon is determined to have an area about 2.8% larger than the regular octagon.

Graham's conjecture states that the optimal octagon can be illustrated as in Figure 1, in which a solid line between two vertices indicates that the distance between these points is one.

In 1996 Hansen formulated this question as the quadratically constrained quadratic optimization problem defining this configuration, which appears below (see for instance [8]). By symmetry, and without loss of generality, the constraint x_2 ≥ x_3 is added to reduce the size of the feasible region.
A_0 = (0, 0)
A_1 = (x_1 − x_2, y_1 − y_2)
A_2 = (x_3 − x_1 − x_5, y_1 − y_3 + y_5)
A_3 = (−x_1, y_1)
A_4 = (0, 1)
A_5 = (x_1, y_1)
A_6 = (x_1 − x_2 + x_4, y_1 − y_2 + y_4)
A_7 = (x_3 − x_1, y_1 − y_3)

Figure 1: Definition of variables for the configuration conjectured by Graham to have maximum area
Minimize: −(1/2)[(x_2 + x_3 − 4x_1)y_1 + (3x_1 − 2x_3 + x_5)y_2 + (3x_1 − 2x_2 + x_4)y_3 + (x_3 − 2x_1)y_4 + (x_2 − 2x_1)y_5] − x_1
Subject to:
    |A_0A_1| ≤ 1 : (x_1 − x_2)² + (y_1 − y_2)² + z_1 = 1
    |A_0A_2| ≤ 1 : (x_3 − x_1 − x_5)² + (y_1 − y_3 + y_5)² + z_2 = 1
    |A_0A_6| ≤ 1 : (x_1 − x_2 + x_4)² + (y_1 − y_2 + y_4)² + z_3 = 1
    |A_0A_7| ≤ 1 : (x_3 − x_1)² + (y_1 − y_3)² + z_4 = 1
    |A_1A_2| ≤ 1 : (2x_1 − x_2 − x_3 + x_5)² + (y_2 − y_3 + y_5)² + z_5 = 1
    |A_1A_3| ≤ 1 : (2x_1 − x_2)² + y_2² + z_6 = 1
    |A_1A_4| ≤ 1 : (x_1 − x_2)² + (y_1 − y_2 − 1)² + z_7 = 1
    |A_1A_7| ≤ 1 : (2x_1 − x_2 − x_3)² + (y_2 − y_3)² + z_8 = 1
    |A_2A_3| ≤ 1 : (x_3 − x_5)² + (y_3 − y_5)² + z_9 = 1
    |A_2A_4| ≤ 1 : (x_3 − x_1 − x_5)² + (y_1 − y_3 + y_5 − 1)² + z_10 = 1
    |A_2A_5| ≤ 1 : (x_3 − 2x_1 − x_5)² + (y_3 − y_5)² + z_11 = 1
    |A_2A_6| = 1 : (2x_1 − x_2 − x_3 + x_4 + x_5)² + (y_2 − y_3 − y_4 + y_5)² = 1
    |A_3A_6| ≤ 1 : (2x_1 − x_2 + x_4)² + (y_2 − y_4)² + z_12 = 1
    |A_4A_6| ≤ 1 : (x_1 − x_2 + x_4)² + (y_1 − y_2 + y_4 − 1)² + z_13 = 1
    |A_4A_7| ≤ 1 : (x_1 − x_3)² + (1 − y_1 + y_3)² + z_14 = 1
    |A_5A_6| ≤ 1 : (x_2 − x_4)² + (y_2 − y_4)² + z_15 = 1
    |A_5A_7| ≤ 1 : (2x_1 − x_3)² + y_3² + z_16 = 1
    |A_6A_7| ≤ 1 : (2x_1 − x_2 − x_3 + x_4)² + (y_2 − y_3 − y_4)² + z_17 = 1
    x_2 − x_3 − z_18 = 0
    x_i² + y_i² = 1, i = 1, 2, 3, 4, 5
    x_1 + z_19 = 0.5
    x_i + z_{18+i} = 1, i = 2, 3, 4, 5
    0 ≤ x_i, y_i, i = 1, ..., 5;    0 ≤ z_i, i = 1, ..., 23.

There are 33 variables and 62 constraints.
We use a = 5; the Fortran code furnishes the following optimal solutions (see also Table 2):

x_opt = (0.25682, 0.67357, 0.67060, 0.91187, 0.91342),
y_opt = (0.96619, 0.73978, 0.74245, 0.41172, 0.40820)
Table 2: Results for Octagon

k_sto    0          250       500        1000      2000
f_opt    −0.726637  −0.72712  −0.727377  −0.72738  −0.72739
n_eval   2          526       1654       2634      4585
k_end    3          11        12         10        15
cpu      0.00       1.48      6.70       10.67     12.53
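As a plausibility check (a small sketch, assuming the sign conventions used in the objective above, with the area taken as the absolute value of f_opt), the regular octagon of unit diameter can be compared with the reported optimal configuration; the ratio roughly recovers the quoted ~2.8% improvement:

```python
import math

# Area of the regular octagon with unit diameter (circumradius R = 1/2):
# regular n-gon area = (n/2) * R^2 * sin(2*pi/n)
regular = 0.5 * 8 * 0.25 * math.sin(2 * math.pi / 8)

# Area of the reported optimal configuration, evaluated from the
# objective of Section 5.2 at the reported x_opt, y_opt.
x1, x2, x3, x4, x5 = 0.25682, 0.67357, 0.67060, 0.91187, 0.91342
y1, y2, y3, y4, y5 = 0.96619, 0.73978, 0.74245, 0.41172, 0.40820
area = 0.5 * ((x2 + x3 - 4 * x1) * y1 + (3 * x1 - 2 * x3 + x5) * y2
              + (3 * x1 - 2 * x2 + x4) * y3 + (x3 - 2 * x1) * y4
              + (x2 - 2 * x1) * y5) + x1
print(regular, area, area / regular - 1)
```

With the rounded coordinates printed above, the computed improvement comes out slightly below 2.8%, consistent with the rounding of x_opt and y_opt.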
5.3 Mixture problem
In this example of petrochemical mixture we have four reservoirs: R_1, R_2, R_3 and R_4. The first two receive three distinct source products. Then their content is combined in the other two reservoirs to create the desired blends. The question is to determine the quantity of each product to buy in order to maximize the profit.

The R_1 reservoir receives two products of quality (e.g., sulfur content) 3 and 1 in quantities q_0 and q_1 (see for instance [12, 19]). The quality of the mixture contained in R_1 is s = (3q_0 + q_1)/(q_0 + q_1) ∈ (1, 3). The R_2 reservoir contains a product of the third source, of quality 2, in quantity q_2. One wants to obtain in the reservoirs R_3 and R_4, of capacity 10 and 20, products of quality at most 2.5 and 1.5, respectively. Figure 2, where the variables x_1 to x_4 represent quantities, illustrates this situation. The unit prices of the bought products are respectively 60, 160 and 100; those of the finished products are 90 and 150. The difference between the costs of purchase and sale is therefore 60q_0 + 160q_1 + 100q_2 − 90(x_1 + x_3) − 150(x_2 + x_4). The qualities of the final blends are (sx_1 + 2x_3)/(x_1 + x_3) and (sx_2 + 2x_4)/(x_2 + x_4). The addition of the volume conservation constraints q_0 + q_1 = x_1 + x_2 and q_2 = x_3 + x_4 permits the elimination of the variables q_0, q_1 and q_2:

q_0 = (s − 1)(x_1 + x_2)/2,    q_1 = (3 − s)(x_1 + x_2)/2,    q_2 = x_3 + x_4.
[Figure 2 shows the four reservoirs: R_1 receives q_0 (quality 3) and q_1 (quality 1); R_2 receives q_2 (quality 2); R_1 sends x_1 and x_2 (quality s) and R_2 sends x_3 and x_4 (quality 2) to R_3 (capacity 10, quality at most 2.5) and R_4 (capacity 20, quality at most 1.5).]

Figure 2: A problem of mixture
The transformed mathematical model (the variable s is denoted x_5) is as follows:

Minimize: 120x_1 + 60x_2 + 10x_3 − 50x_4 − 50x_1x_5 − 50x_2x_5
Subject to:
    2.5x_1 + 0.5x_3 − x_1x_5 − z_1 = 0
    1.5x_2 − 0.5x_4 − x_2x_5 − z_2 = 0
    x_1 + x_3 + z_3 = 10
    x_2 + x_4 + z_4 = 20
    x_5 + z_5 = 3
    x_5 − z_6 = 1
    0 ≤ x_i, i = 1, ..., 5;    0 ≤ z_i, i = 1, ..., 6.

There are 11 variables and 17 constraints. We use a = 0.05; the Fortran code furnishes the following optimal solution (see also Table 3):

x_opt = (0, 10, 0, 10, 1)
Table 3: Results for Mixture

k_sto    0      250    500    1000   2000
f_opt    −306   −353   −367   −374   −400
n_eval   2      28     51     105    28667
k_end    3      7      7      9      101
cpu      0.00   0.02   0.36   1.27   38.53
We reach the optimum value of the objective function (f_opt = −400, i.e. a profit of 400) for the mixture problem with k_sto = 2000.
5.4 Alkylation problem
The alkylation process is common in the petroleum industry: it is an important unit that is used in refineries to upgrade light olefins and isobutane into a much more highly valued gasoline component. The alkylate is usually mixed into the gasoline in order to improve its performance; see problem 5.3. A simplified process flow diagram of an alkylation process is shown in Figure 3 below. The process model seeks to determine the optimum set of operating conditions for the process, based on a mathematical model, which allows the maximization of profit. The problem is formulated as a direct nonlinear programming model with mixed nonlinear inequality and equality constraints and a nonlinear profit function to be maximized (see for instance [3]). A typical reaction [29] is

(CH_3)_2C=CH_2 (olefin feed) + (CH_3)_3CH (isobutane make-up) → (fresh acid) → (CH_3)_2CHCH_2C(CH_3)_3 (alkylate product)

As shown in Figure 3, an olefin feed (100% butene), a pure isobutane recycle and a 100% isobutane make-up stream are introduced in a reactor together with an acid
catalyst. The reactor product stream is then passed through a fractionator where the isobutane and the alkylate product are separated. The spent acid is also removed from the reactor.

[Figure 3 shows the reactor and fractionator: olefin feed, isobutane make-up, fresh acid and the isobutane recycle enter the reactor; the hydrocarbon product goes to the fractionator; the alkylate product and spent acid leave the process.]

Figure 3: Simplified Alkylation Process Flow Sheet
The notation used is shown in Table 4 along with the upper and lower bounds on each variable. The bounds represent economic, physical and performance constraints.

Table 4: Variables and their bounds

Symbol   Variable                                  Lower Bound   Upper Bound
x_1      Olefin feed (barrels/day)                 0             2000
x_2      Isobutane recycle (barrels/day)           0             16000
x_3      Acid addition rate (×1000 pounds/day)     0             120
x_4      Alkylate yield (barrels/day)              0             5000
x_5      Isobutane make-up (barrels/day)           0             2000
x_6      Acid strength (wt.%)                      85            93
x_7      Motor octane number                       90            95
x_8      External isobutane-to-olefin ratio        3             12
x_9      Acid dilution factor                      1.2           4
x_10     F-4 performance number                    145           162
The external isobutane-to-olefin ratio, x_8, is equal to the sum of the isobutane recycle, x_2, and the isobutane make-up, x_5, divided by the olefin feed, x_1:

x_8 = (x_2 + x_5) / x_1.
The motor octane number, x_7, is a function of the external isobutane-to-olefin ratio, x_8, and the acid strength by weight percent, x_6 (for the same reactor temperatures and acid strengths as for the alkylate yield x_4):

x_7 = 86.35 + 1.098x_8 − 0.038x_8² + 0.325(x_6 − 89).

The acid strength by weight percent, x_6, can be derived from an equation that expresses the acid addition rate, x_3, as a function of the alkylate yield, x_4, the acid dilution factor, x_9, and the acid strength by weight percent, x_6 (the added acid is assumed to have a strength of 98%):

1000x_3 = x_4 x_6 x_9 / (98 − x_6).

This equation can be rewritten with the help of quadratic constraints involving an artificial variable x_11:

x_3 = x_6 x_11 and 98000x_11 − 1000x_3 = x_4 x_9.
The alkylate yield, x_4, is a function of the olefin feed, x_1, and the external isobutane-to-olefin ratio, x_8. The relationship, determined by holding the reactor temperature between 80°F and 90°F and the reactor acid strength by weight percent between 85 and 93, is:

x_4 = x_1(1.12 + 0.13167x_8 − 0.00667x_8²).

By adding an artificial variable x_12, one gets the equivalent quadratic constraints:

x_4 = x_1 x_12 and x_12 = 1.12 + 0.13167x_8 − 0.00667x_8².
The objective function is linear in the cost of the source products x_1, x_3 and x_5, and in the cost of the recycled isobutane x_2, and depends linearly on the product of the alkylate quantity and the octane number, x_4x_7.

The other constraints are physical bounds on the variables and linear relations. After elimination of variables, scaling and renumbering of the indices, one gets the following optimization problem:

Minimize: 614.88x_1 − 171.5x_2 − 6.3x_1x_4 + 4.27x_1x_5 − 3.5x_2x_5 + 10x_3x_6
Subject to:
    −0.325x_3 + x_4 − 1.098x_5 + 0.038x_5² + z_1 = 57.425
    −0.13167x_5 + x_7 + 0.00667x_5² + z_2 = 1.12
    12.2x_1 − 10x_2 + z_3 = 200,    12.2x_1 − 10x_2 − z_4 = 0.1
    −10x_2 + 12.2x_1x_5 − 10x_2x_5 + z_5 = 1600,    −10x_2 + 12.2x_1x_5 − 10x_2x_5 − z_6 = 0.1
    65.346x_1 − 980x_6 − 0.666x_1x_4 + 10x_2x_5 = 0
    x_1 − 1.22x_1x_7 + x_2x_7 = 0
    x_3x_6 + z_7 = 120
    x_1 + z_8 = 32.786885,    x_1 − z_9 = 0.01
    x_5 + z_10 = 12,    x_5 − z_11 = 3
    x_2 + z_12 = 20,    x_6 + z_13 = 1.411765
    x_3 + z_14 = 95,    x_3 − z_15 = 85
    x_4 + z_16 = 95,    x_4 − z_17 = 92.66666
    0.8196722 + z_18 = x_7
    x_i ≥ 0, i = 1, ..., 7;    z_j ≥ 0, j = 1, ..., 18.
There are 25 variables and 38 restrictions; we use a = 10 and k_max = 100. The SPRGM algorithm furnishes the following optimal solution (see also Table 5):

x_opt = (30.4713, 20.0000, 90.9999, 94.1900, 10.4100, 1.0697, 1.7743)
Table 5: Results for Alkylation

k_sto    0           250         500         1000        2000
f_opt    −1161.626   −1176.192   −1176.191   −1176.186   −1176.193
n_eval   2           274         531         1019        2063
k_end    3           14          14          14          14
cpu      0.00        0.97        1.95        3.86        7.86
5.5 Pooling problem
We present in Figure 4 (and Table 6) an example in which the intermediate pools are
allowed to be intermediate, see for instance [12]. The pooling is thus extended to the
case where exiting blends of some intermediate pools are feeds of others.
The proportion model of the generalized pooling problem is not a bilinear program,
since the variables are not partitioned into two sets. Therefore, this formulation belongs
to the class of quadratically constrained quadratic programs; cf. for instance [19].
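The nonconvexity of such programs comes from bilinear terms like q_21 * v_12 in the constraints. A quick generic check (not tied to this specific model) confirms that a bilinear term is an indefinite quadratic form:

```python
import numpy as np

# The bilinear term f(q, v) = q * v has the constant Hessian [[0, 1], [1, 0]].
# Its eigenvalues are -1 and +1, so the quadratic form is indefinite:
# a constraint containing such a term is neither convex nor concave,
# which is why the generalized pooling model is a nonconvex QCQP.
H = np.array([[0.0, 1.0], [1.0, 0.0]])
eigenvalues = np.linalg.eigvalsh(H)  # -> [-1., 1.]
```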
Figure 4: A Generalized Pooling Problem (feeds F_1, F_2, F_3; intermediate pools P_1, P_2; blends B_1, B_2, B_3; direct flows x_11 and x_32, pool-to-blend flows y_12, y_21 and y_23, and an inter-pool flow from P_1 to P_2)
Table 6: Characteristics of GP

Feed  Price  Attribute  Max     | Pool  Max      | Blend  Price  Max     Attribute
      $/bbl  Quality    Supply  |       capacity |        $/bbl  Demand  Max
F_1   6      3          18      | P_1   20       | B_1    9      10      2.5
F_2   16     1          18      | P_2   20       | B_2    13     15      1.75
F_3   10     2          18      |                | B_3    14     20      1.5
A hybrid model for GP is given below, where v_12 denotes the flow from P_1 to P_2 (all other variables are defined as before).
max_{q,t,v,w,x,y} : -6(x_11 + y_12 + v_12 - q_21(y_12 + v_12)) - 16q_21(y_12 + v_12)
                    - 10(x_32 + y_21 + y_23 - v_12) + 9(x_11 + y_21) + 13(y_12 + x_32) + 14y_23
Subject to :
supply:
  x_11 + y_12 + v_12 - q_21(y_12 + v_12) + z_1 = 18
  q_21(y_12 + v_12) + z_2 = 18
  x_32 + y_21 + y_23 - v_12 + z_3 = 18
demand:
  x_11 + y_21 + z_4 = 10
  y_12 + x_32 + z_5 = 15
  y_23 + z_6 = 20
capacity:
  y_12 + v_12 + z_7 = 20
  y_21 + y_23 + z_8 = 20
attribute:
  (3 - 2q_21)v_12 + 2(y_21 + y_23 - v_12) - t_2(y_21 + y_23) = 0
  3x_11 + t_2 y_21 - 2.5(x_11 + y_21) + z_9 = 0
  2x_32 + (3 - 2q_21)y_12 - 1.75(x_32 + y_12) + z_10 = 0
  t_2 + z_11 = 1.5
pos. flow:
  y_21 + y_23 - v_12 - z_12 = 0
  q_21 + z_13 = 1
q >= 0, t >= 0, v >= 0, w >= 0, x >= 0, y >= 0, z_i >= 0, i = 1,..,13.
There are 21 variables and 35 restrictions, and we use a = 50.
The SPRGM algorithm furnishes the following optimal solution, see also Table 7:
x_opt = (x_11, x_32) = (0.1506, 2.6564);   y_opt = (y_12, y_21, y_23) = (0.7425, 0.0005, 19.9988);
v_opt = v_12 = 10.0013;   q_opt = q_21 = 0.9949;   t_opt = t_2 = 1.5000.
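As a sanity check, plugging the reported solution into the profit function of the hybrid model reproduces the tabulated optimum. The sign pattern below is a reconstruction (an assumption) from the prices in Table 6: feeds at 6, 16 and 10 $/bbl, blends sold at 9, 13 and 14 $/bbl.

```python
# Reported SPRGM pooling solution
x11, x32 = 0.1506, 2.6564
y12, y21, y23 = 0.7425, 0.0005, 19.9988
v12, q21 = 10.0013, 0.9949

s = y12 + v12                                  # total inflow to pool P1
feed_cost = (6 * (x11 + s - q21 * s)           # F1: direct + (1 - q21) share of P1
             + 16 * q21 * s                    # F2: q21 share of P1
             + 10 * (x32 + y21 + y23 - v12))   # F3: direct + net inflow to P2
revenue = 9 * (x11 + y21) + 13 * (y12 + x32) + 14 * y23
profit = revenue - feed_cost                   # about 26.73, matching Table 7
```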
Table 7: Results for Pooling

n_sto    0      250    500    1000   2000
f_opt    26.15  26.63  26.71  26.72  26.73
n_eval   2      181    441    813    1114
k_end    3      5      5      5      5
cpu      0.00   0.03   0.03   0.11   0.17
6 Concluding remarks
We have presented a stochastic modification of the reduced gradient method for linear and nonlinear constraints, involving the adjunction of a stochastic perturbation. This approach leads to a stochastic descent method in which the deterministic sequence generated by the reduced gradient is replaced by a sequence of random variables. We have established a global convergence result for this stochastic descent method.

The numerical experiments show that our method is effective for nonconvex optimization problems verifying a nondegeneracy assumption. The use of stochastic perturbations generally improves the results furnished by the reduced gradient.

Here again, we observe that the adjunction of the stochastic perturbation improves the result, at the price of a larger number of evaluations of the objective function. The final points X_{k_end} are very close or practically identical for n_sto >= 250.

The main difficulty in the practical use of the stochastic perturbation is connected to the tuning of the parameters. Even if the combination with the deterministic reduced gradient increases the robustness, the values of a and n_sto have an important influence on the quality of the result. The random number generator may also influence aspects such as the number of evaluations. All the sequences (ξ_k)_{k>=0} tested have furnished analogous results, but for different sets of parameters.
References
[1] Beck, P., Lasdon, L. and Engquist, M. A reduced gradient algorithm for nonlinear network problems. ACM Trans. Math. Softw., 9 (1983), 57–70.
[2] Bouhadi, M., Ellaia, R. and Souza de Cursi, J. E. Random perturbations of the projected gradient for linearly constrained problems. Nonconvex Optim. Appl., 54 (2001), 487–499.
[3] Bracken, J. and McCormick, G. P. Selected applications of nonlinear programming, New York-London-Sydney: John Wiley and Sons, Inc. (1968), XII, 110 p.
[4] Carson, Y. and Maria, A. Simulation optimization: methods and applications, Proceedings of the Winter Simulation Conference, USA, 1997.
[5] Conn, A. R., Gould, N. and Toint, P. L. Methods for nonlinear constraints in optimization calculations, Inst. Math. Appl. Conf. Ser., New Ser., 63 (1997), 363–390.
[6] Dorea, C. C. Y. Stopping rules for a random optimization method. SIAM Journal on Control and Optimization, Vol. 28, No. 4 (1990), 841–850.
[7] Drud, A. S. CONOPT – a large-scale GRG code. ORSA J. Comput., Vol. 6, No. 2 (1994), 207–216.
[8] El Mouatasim, A. Stochastic perturbation in global optimization: analyses and application, PhD Thesis, Morocco: Mohammadia School of Engineers, 2007.
[9] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J. E. Random perturbation of variable metric method for unconstrained nonsmooth global optimization. International Journal of Applied Mathematics and Computer Science, Vol. 16, No. 4 (2006), 463–474.
[10] Ellaia, R., El Mouatasim, A., Banga, J. R. and Sendin, O. H. NBI-RPRGM for multi-objective optimization design of bioprocesses, ESAIM: Proceedings, 20 (2007), 118–128.
[11] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J. E. Projected variable metric method for linearly constrained nonsmooth global optimization via stochastic perturbation. Submitted to RAIRO Journal (2008).
[12] Foulds, L. R., Haugland, D. and Jornsten, K. A bilinear approach to the pooling problem. Optimization, 24 (1992), 165–180.
[13] Floudas, C. A. and Pardalos, P. M. A collection of test problems for constrained global optimization algorithms, Lecture Notes in Computer Science, 455, Berlin: Springer-Verlag, XIV, 180 p., 1990.
[14] Ghosh, M. N. Uniform approximation of minimax point estimates. Ann. Math. Statist., 35 (1964), 1031–1047.
[15] Gould, N. I. M. and Toint, P. L. SQP methods for large-scale nonlinear programming, in: System modelling and optimization. Methods, theory and applications. 19th IFIP TC7 conference, Cambridge, GB, July 12-16, 1999, Boston: Kluwer Academic Publishers, IFIP 46, 149–178, 2000.
[16] Gourdin, E. Global optimization algorithms for the construction of least favorable priors and minimax estimators, Engineering Undergraduate Thesis, Evry, France: Institut d'Informatique d'Entreprise, 1989.
[17] Gourdin, E., Jaumard, B. and MacGibbon, B. Global optimization decomposition methods for bounded parameter minimax evaluation, SIAM Journal on Sci. Comput., Vol. 15, No. 1 (1994), 16–35.
[18] Graham, R. L. The largest small hexagon. Journal of Combinatorial Theory, Series A, 18 (1975), 165–170.
[19] Haverly, C. A. Studies of the behaviour of recursion for the pooling problem, ACM SIGMAP Bulletin, 25 (1978), 19–28.
[20] Hock, W. and Schittkowski, K. Test examples for nonlinear programming codes, Lecture Notes in Economics and Mathematical Systems, 187, Berlin-Heidelberg-New York: Springer-Verlag, V, 178 p., 1981.
[21] L'Ecuyer, P. and Touzin, R. On the Deng-Lin random number generators and related methods. Statistics and Computing, Vol. 14, No. 1 (2003), 5–9.
Johnstone, I. M. and MacGibbon, K. B. Minimax estimation of a constrained Poisson vector, Ann. Stat., Vol. 20, No. 2 (1992), 807–831.
[22] Kall, P. Solution methods in stochastic programming, Lect. Notes Control Inf. Sci., 197 (1994), 3–22.
[23] Kiwiel, K. C. Free-steering relaxation methods for problems with strictly convex costs and linear constraints, Math. Oper. Res., Vol. 22, No. 2 (1997), 326–349.
[24] Lasdon, L. S., Waren, A. D. and Ratner, M. Design and testing of a generalized reduced gradient code for nonlinear programming, ACM Trans. Math. Softw., 4 (1978), 34–50.
[25] Luenberger, D. G. Introduction to linear and nonlinear programming, Addison-Wesley Publishing Company, XII, 356 p., 1973.
[26] Martinez, J. M. A direct search method for nonlinear programming. ZAMM, Z. Angew. Math. Mech., Vol. 79, No. 4 (1999), 267–276.
Pogu, M. and Souza de Cursi, J. E. Global optimization by random perturbation of the gradient method with a fixed parameter. J. Glob. Optim., Vol. 5, No. 2 (1994), 159–180.
[27] Pine, S. H. Organic chemistry, New York-London: McGraw-Hill, 1987.
[28] Raber, U. Nonconvex all-quadratic global optimization problems: solution methods, application and related topics, Dissertation, Universität Trier, 1999.
[29] Reinhardt, K. Extremale Polygone gegebenen Durchmessers, Jber. Dtsch. Math.-Ver., 31 (1922), 251–270.
[30] Smeers, Y. Generalized reduced gradient method as an extension of feasible direction methods, J. Optimization Theory Appl., 22 (1977), 209–226.
[31] Sadegh, P. and Spall, J. C. Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Trans. Autom. Control, Vol. 44, No. 1 (1999), 231–232.
[32] Stolpe, M. On models and methods for global optimization of structural topology, Doctoral Thesis, Stockholm, 2003.