You are on page 1of 6

Adaptive Optimal Control for an Air-breathing Hypersonic Vehicle

Chao Guo, Huai-Ning Wu, Biao Luo


Science and Technology on Aircraft Control Laboratory
School of Automation Science and Electrical Engineering,
Beihang University (Beijing University of Aeronautics and Astronautics) Beijing 100191, P. R. China
whn@buaa.edu.cn

Abstract—This paper considers the problem of optimal control the step commands in altitude and velocity while requiring
for the longitudinal model of a generic air-breathing hypersonic limited state information. In [10], the optimal reference
vehicle (AHV). Existing works about the optimal control of AHV command tracking problem of the AHV was dealt with using
usually require the complete knowledge of the system dynamics. the LQR control design method based on a linearized model.
However, for realistic control systems, the dynamics may often be However, these works usually need the complete knowledge of
unknown or partially unknown, which makes it more difficult to AHV’s system dynamic or identification of the system
handle the optimal controller design of the AHV. To overcome dynamic. It is well known that the modeling and identification
this difficulty, we propose an adaptive optimal controller procedures of the AHV are often a time consuming procedure
synthesis algorithm, where the algebraic Riccati equation (ARE)
which requires model design, parameter identification and so
is solved online without requiring the internal dynamics of AHV.
Control inputs for the AHV include the elevator deflection and
on. So in this paper, we solve the optimal altitude command
the throttle setting. The simulation results on the AHV tracking problem of the AHV using the adaptive optimal
demonstrate the effectiveness of the proposed design method. control algorithm which does not require the internal dynamics
knowledge of the system. In [11-13], Abu-Khalaf, Vrabie and
Keywords-Air-breathing hypersonic vehicle; Adaptive optimal Lewis have given some research results of the adaptive optimal
control; Online control; LQR control.
The organization of this paper is as follows: Section II
I. INTRODUCTION presents the aerodynamic model of the AHV. Section III
Air-breathing hypersonic vehicles (AHVs) offer a introduces the adaptive optimal control algorithm based on
promising technology for cost-efficient access to space, policy iteration. Section IV gives the online implementation of
because scramjet propulsion may provide significant the adaptive optimal control algorithm. Section V provides the
advantages over the traditional expendable rockets. With this simulation results and a brief conclusion is drawn in VI.
type of aircraft, quick response and global strike capabilities
for Air Force missions will be more practical. The NASA X- II. THE AHV MODEL
43A set new world speed records in 2004, reaching Mach 6.8 The schematic of the geometry for this generic AHV is
and Mach 9.6 on two separate occasions with a scramjet engine. shown in Fig. 1.
These flights were the culmination of NASA’s Hyper-X
program, with the objective being to explore alternatives to τ2
rocket power for space access vehicles. δe
τ1,U Xb
Control of AHV is a significant challenge because of the
τ1,L
strong interaction between the aerodynamic and propulsive Zb
effects. Due to the enormous complexity of the dynamics, only
the models of the longitudinal dynamics of AHVs have been
utilized for the control design. Until recently, some works have α
done for the modeling of AHVs. Chavez and Schmidt used the
Newtonian Impact Theory to get the pressure distribution on
the hypersonic vehicle [1, 2]. As opposite of the model
Fig. 1 Geometry for AHV
developed by Chavez and Schmidt, Bolender and Doman
employed the Oblique-shock theory and Prandtl-Meyer flow to The longitudinal dynamics for the AHV is described by the
determine the pressure and shock angles, resulting in a possibly following nonlinear equations˖
more accurate model [3], and Oppenheimer and Doman made
use of the liner piston theory to include the unsteady T cos α − D μ sin γ
aerodynamic effects [4]. In this paper, the AHV model for V = − (1)
m r2
controller design is based on Bolender and Doman’s work [3].
L + T sin α ( μ − V 2 r ) cos γ
At present, many control methods have been studied for γ = − (2)
AHV, such as robust control [5-7], adaptive control [8, 9]. In mV Vr 2
[8], an adaptive sliding mode controller was designed to track 
h = V sin γ (3)

The work is supported by the National Natural Science Foundation of


China under Grants 61074057 and 91016004, and the Fundamental Research
Funds for the Central Universities (YWF-10-01-A19), China.

978-1-61284-200-4/11/$26.00 
c 2011 IEEE 7
α = q − γ (4) The solution of this optimal control problem, determined by
M yy the Bellman’s optimality principle, is given by
q = (5)
I yy u = − Kx (t ) (16)
where with
L = 0.5ρV 2 SCL (6)
K = R −1 BT P (17)
D = 0.5ρV 2 SCD (7) where the matrix P is the unique positive definite solution of
T = 0.5ρV 2 SCT (8) the ARE˖
M yy = 0.5ρV ScCM 2
(9) AT P + PA − PBR −1 BT P + Q = 0 (18)
r = h + RE (10) In the traditional methods of solving the ARE (18), it needs
In the equations, the aerodynamic coefficients CL , CD , CM the complete knowledge of the system model, both the system
matrix A and control input matrix B . So the system
and the thrust coefficient CT are functions of the Mach number
identification procedure is required prior to solve the optimal
M , the angle of attack α and the control inputs. The control control problem. In the following section, a policy iteration
inputs are the elevator deflection δ e and the throttle setting β . algorithm will be presented to solve online the optimal control
The elevator deflection mainly affects the pitching moment as problem without knowing the internal dynamics knowledge of
well as the lift and drag of the vehicle, and the throttle setting the system. The algorithm is based on an action network and a
will mostly affect the thrust. critic network. The update of the critic network results in
calculating the infinite horizon cost associated with the use of a
The nonlinear dynamic equations can be linearized at given initial stabilizing controller. The action network
different points within the flight envelop as defined by the parameters are then updated in the sense of reducing the cost
vehicle’s velocity and altitude. By trimming the vehicle at the compared to the present control policy.
specified velocity and altitude, the longitudinal dynamics of
this AHV can be described as follows: Note that K is a stabilizing state-feedback gain for the
system (13). Under the assumption that ( A, B ) is stabilizable,
x = Ax + Bu (11)
where x = [V , γ , h, α , q ] denotes the system state vector, x = ( A − BK ) x is a stable closed-loop system. Then the
u = [ β , δ e ] stands for the control input vector. We make the corresponding infinite horizon quadratic cost is given by

assumption that the pair ( A, B ) is stabilizable, and design the



V ( x (t )) = ³ xT (τ )(Q + K T RK ) x (τ )dτ = x T (t ) Px (t ) (19)
controller based on this linear system. t
where P is the real symmetric positive definite solution of the
Let x* and u * be the steady values of the system (11), Lyapunov matrix equation
i.e., Ax* + Bu * = 0 , and denote the deviations from the
steady-state values as: ( A − BK )T P + P ( A − BK ) = −(Q + K T RK ) (20)
and V ( x (t )) serves as a Lyapunov function for the system (13)
x = x − x* , u = u − u * (12)
Then the system (11) can be rewritten as: with the controller gain K . The cost function (19) can be
written as
x = Ax + Bu (13) t +T
V ( x (t )) = ³ x T (τ )(Q + K T RK ) x (τ )dτ + V ( x (t + T )) (21)
III. CONTINUOUS-TIME ADAPTIVE CRITIC SOLUTION FOR t
THE INFINITE HORIZON OPTIMAL CONTROL PROBLEM
Based on (21), denoting x (t ) with xt , with the
In this section, we use an adaptive optimal control
parameterization V ( xt ) = xt T Pxt and an initial stabilizing
algorithm to solve online the LQR problem for continuous-time
system. control gain K1 , the following policy iteration scheme can be
implemented online:
Consider the linear time-invariant system (13). The infinite
horizon quadratic cost function associated with the system is t +T

expressed as xt T Pi xt = ³ xt T (Q + Ki T RK i ) xτ dτ + xt +T T Pi xt + T (22)


t

V ( x (t0 ), t0 ) = ³ ( x T (τ )Qx (τ ) + u T (τ ) Ru (τ ))dτ (14) K i +1 = R −1 BT Pi (23)
t0 Equations (22) and (23) formulate a policy iteration process
of the adaptive optimal control algorithm which does not
where Q ≥ 0 , R > 0 and (Q1/ 2 , A) is detectable. The aim of
involve the system matrix A .
optimal control problem is to find the control policy:
The convergence proof of the adaptive optimal algorithm
u* (t ) = arg min V (t0 , x (t0 ), u (t )) (15) was given in the literature [13].
u ( t )
t0 ≤ t ≤∞

8 2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS)
IV. ONLINE IMPLEMENTATION OF THE ADAPTIVE OPTIMAL The implementation flow chart of the online adaptive
CONTROL ALGORITHM optimal control algorithm is given in Fig. 2.
For the implementation of the adaptive optimal control
algorithm given by (22) and (23), we only need the knowledge
of the control input matrix B which appears in the control
policy update. The information regarding the system matrix A
P0 , i = 1, K1 , σ ( A − BK1 ) < 0
is embedded in the state x (t ) and x (t + T ) which are observed
online, and the matrix A is never required for the computation.
To compute the P matrix parameters of the cost function,
X = ª¬ xΔ1 xΔ2 ... xΔN º¼ , xΔi = x i (t ) − x i (t + T )
the term xT (t ) Pi x (t ) is written as
Y = [d ( x 1 , K i ) d ( x 2 , K i ) ... d ( x N , K i )]T
xT (t ) Pi x (t ) = pi T x (t ) (24) pi = ( XX T )−1 XY

where x (t ) is the Kronecker product quadratic polynomial


basis vector with the elements {xi (t ) x j (t )}i =1:n; j =1:n and
pi − pi −1 < ε
p = v( P) with v (.) a vector valued matrix function that acts Ki = R −1 BT Pi −1 i ← i +1

on symmetric matrix and returns a column vector by stacking


the elements of the diagonal and upper triangular part of the
symmetric matrix into a vector where the off-diagonal elements
are taken as 2 Pij , see [14]. Using (24), equation (22) can be
Fig. 2 Flow-chart of the adaptive optimal control algorithm
rewritten as
t +T
piT ( x (t ) − x (t + T )) = ³ xT (τ )(Q + Ki T RK i ) x (τ )dτ (25) x
t
u x
where pi is the vector of the unknown parameters and
−K x = Ax + Bu; x0
x (t ) − x (t + T ) acts as a regression vector. The right hand side
target function, denoted d ( x (t ), K i ) , can be written as
t +T
V = x T Qx + u T Ru
d ( x (t ), Ki ) ≡ ³ x T (τ )(Q + Ki T RK i ) x (τ )dτ (26)
t

Considering V (t ) = xT (t )Qx (t ) + u T (t ) Ru (t ) as a definition for


a new state V (t ) , the value of d ( x (t ), K i ) can be measured by
taking two measurements of the newly introduced system state
since d ( x (t ), Ki ) = V (t ) − V (t + T ) .
After a sufficient number of data points are collected using
the same control policy Ki , a least-squares (LS) method can be Fig. 3 Structure of the system with the adaptive optimal controller

employed to solve for the parameters pi of the cost function P0 ≥ P1 ≥ Pi −1 ≥ Pi ≥ Pi +1 ≥ P**

Vi ( xt ) , which will then yield the matrix Pi . Online Online Online Online
Learning Learning Learning Learning
ĂĂ ĂĂ
update

update

update

update

update

The basic training steps of the control system based on the


adaptive optimal control algorithm are as follows:
Step 1: Choose a small positive constant ε and a
K0 K1 K2 Ki K i +1 Ki+ 2 K **

stabilizing (though not optimal) state feedback controller. The T0 T1 T2 Ti Ti +1 Ti + 2 T **

initial controller can be usually chosen as zero if the system to


Control Gain

be controlled is stable.
ĂĂ
Step 2: Calculate the parameters pi (the matrix Pi ) by ĂĂ

using the least-square method.


T0 T1 T2 Ti Ti +1 Ti + 2 T **
Step 3: If pi − pi −1 < ε , stop the iteration process;
otherwise, return to Step 2 to calculate the next solution. Fig. 4 Representation of the online adaptive optimal control algorithm

Step 4: Obtain the optimal control law after completing the The structure of the system with the adaptive optimal
iteration process. controller is presented in Fig. 3. We can observe that the

2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS) 9
adaptive optimal controller has a hybrid structure with a beginning and the end of each time interval. The simulation
continuous time internal state followed by a sampler and was conducted using the system measurement data at every
discrete time control gain update rule. 0.2s, and a LS problem was solved after 40 sample data were
acquired. Fig. 5 presents the convergence result for the P
Fig. 4 shows that the system is controlled using a state matrix parameters. The cost function parameters converged to
feedback control policy which has a constant control gain K i , the optimal ones at time t = 40s after five updates of the
over the time intervals [Ti , Ti +1 ] . During this time internal a controller parameters. The P matrix obtained using the online
reinforcement learning procedure, which uses the system adaptive optimal control algorithm, without knowing the
measurement data, is employed to determine the value system internal dynamics, is
associated with this controller. The value is described by the
parametric structure Pi . Once the learning procedure results in ª 68.5644 40.9759 0.6001 −1.6779 −0.3708º
« 40.9759 117.4726 2.0495 3.6025 0.8161 »
convergence to the value Pi , this result is used for calculating a « »
new control gain K i for the state-feedback controller. At every P = « 0.6001 2.0495 0.1576 0.0656 0.0145 » (28)
« »
step in the iterative procedure, it is guaranteed that the new « −1.6779 3.6025 0.0656 1.5491 0.4077 »
controller will provide a better performance than the previous «¬ −0.3708 0.8161 0.0145 0.4077 0.4001 »¼
controller. Therefore, a monotonically decreasing sequence of In order to give a comparison, the P matrix calculated by
cost functions, {Pi } will be obtained, which will converge to directly solving the ARE (18), is
the smaller possible value P** , associated with the optimal
ª 68.5640 40.9766 0.6001 −1.6775 −0.3706 º
control gain K ** . « 40.9766 117.4726 2.0495 3.6026
« 0.8161 »»
V. SIMULATION RESULTS OF THE AHV P = « 0.6001 2.0495 0.1576 0.0656 0.0145 » (29)
« »
To test the effective of the adaptive optimal controller « −1.6775 3.6026 0.0656 1.5491 0.4077 »
proposed in the previous section, we consider the following «¬ −0.3706 0.8161 0.0145 0.4077 0.4001 »¼
simulation result in this section. At the trim condition It can be seen that the error difference between the
( V = 15060 ft / s , h = 1100000 ft ), a linear model of the AHV parameter of the two matrices is in the range of 10 −4 .
is described by the following system:
100 300
x = Ax + Bu
P(1,1)

P(2,2)
A = [ A1 A2 ] 80 200

ª 3.4700e − 05 −0.0315º 60 100


« 2.5930e − 04 0 »» 0 50 0 50
« T ime(sec) T ime(sec)
A1 = « 0 1.5060 »
« » 0.4 2.5
« −2.5930e − 04 0 »
P(3,3)

«¬ 0.0055 0 »¼ 0.2 P(4,4) 2

ª −7.3910e − 5 −0.0497 0 º
« −5.7282e − 04 0.0440 » 0 1.5
0 0 50 0 50
« » T ime(sec) T ime(sec)
A2 = « 0 0 0 »
« »
« 5.7282e − 04 −0.0440 1 » Fig. 5 Evolution of the parameters of the P matrix
¬« −0.0014 0.5923 −0.0682 »¼
In fact, when the difference between the measured cost and
ª 0.0273 0 º (27) the expected cost crosses below a designer specified threshold,
« 5.7113e − 05 0 »»
« we can think that the convergence of the algorithm has been
B=« 0 0 » achieved. After the convergence to the optimal controller was
« » attained, the algorithm need not continue to be run and the
« −5.7113e − 05 0 » subsequent updates of the control gain will stop.
«¬ 0 3.3168»¼
In Figs. 6-10, the solid lines are system state trajectories for
The weight matrices R and Q are chosen as the AHV based on the adaptive optimal control algorithm, and
R = diag ([1,1]) and Q = diag ([5,5, 0.005,1,1]) . the dot lines are the results by directly solving the ARE. It is
shown from the figures that the adaptive optimal control
Since there are 15 independent elements in the symmetric method can basically achieve the same results as the standard
matrix P , the setup of the LS problem needs at least 15 LQR design method (by directly solving the ARE). The
measurements of the cost function associated with the given advantage of the adaptive optimal control algorithm is that the
control policy and the measurements of the system states at the knowledge of the system internal dynamics is not required.

10 2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS)
4
x 10 0.059
1.5061
ADP
ADP
0.058 LQR
LQR
1.5061

Angle of attack(rad)
Flight velocity(ft/s)

0.057
1.506
0.056

1.5059
0.055

1.5059 0.054
0 50 100 150 200 0 50 100 150 200
T ime(sec) T ime(sec)

Fig.6 The flight velocity trajectory Fig.9 The attack angle trajectory

-4 -3
x 10 x 10
20 4
ADP ADP
15 LQR 3 LQR
Flight path angle(rad)

Pitch rate(rad/s)
10 2

5 1

0 0

-5 -1
0 50 100 150 200 0 50 100 150 200
T ime(sec) T ime(sec)

Fig.7 The flight path angle trajectory Fig.10 The pitch rate trajectory

5
x 10 0.12
1.112

1.11 0.118
ADP
Flight altitude(ft)

1.108
Throttle setting

LQR 0.116
1.106
0.114
1.104
0.112
1.102

1.1 0.11
0 50 100 150 200 0 50 100 150 200
T ime(sec) T ime(sec)

Fig.8 The flight altitude trajectory Fig.11 The throttle setting trajectory

But the standard LQR design method needs the complete law, we assume that there exist some uncertainties in matrix
knowledge of the system model (the matrices A and B ). Figs. A . Applying the LQR control law to the uncertain system
11-12 show the control input trajectories by using the adaptive where the (2, 2)-th entry of A is changed to be 1.0060, we
optimal control algorithm. The quadratic costs by directly can obtain that the quadratic cost is 0.002281. However, using
solving the ARE and the adaptive optimal control algorithm are the adaptive optimal control law for the uncertain system, the
0.001599 and 0.001740, respectively. resulting quadratic cost is 0.002034. It is clear that the
To illustrate the advantage of the adaptive optimal control

2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS) 11
adaptive optimal control algorithm is better than the LQR Journal of Guidance, Control, and Dynamics, vol. 22, no. 1, pp. 87-95,
design method for the uncertain system. 1999
[3] M. A. Bolender and D. B. Doman, “Nonlinear longitudinal dynamic
model of an air-breathing hypersonic vehicle,” Journal of Spacecraft
-0.015 and Rockets, vol. 44, no. 2, pp. 374-387, 2007
[4] M. W. Oppenheimer and D. B. Doman, “A hypersonic vehicle model
developed with piston theory,” AIAA Atmospheric Flight Mechanics
Elevattor deflection(rad)

-0.02 Conference and Exhibit, Keystone, Colorado, Aug. 21-24. AIAA-2006-


6637, 2006
[5] Q. Wang and R. F. Stengel, “Robust nonlinear control of a hypersonic
-0.025 aircraft,” Journal of Guidance, Control, and Dynamics, vol. 23, no. 4,
pp. 577-585, 2000
[6] L. Fiorentini, A. Serrani, M. Bolender and D. Doman, “Nonlinear robust
-0.03 adaptive control of flexible air-breathing hypersonic vehicles,” Journal
of Guidance, Control, and Dynamics, vol. 32, no. 2, pp. 402-417, 2009
[7] D. Sigthorsson, P. Jankovsky, A. Serrani, S. Yurkovich, M. Bolender
and D. Doman, “Robust linear output feedback control of an airbreathing
-0.035 hypersonic vehicle,” Journal of Guidance, Control, and Dynamics, vol.
0 50 100 150 200 31, no. 4, pp. 1052-1066, 2008
T ime(sec)
[8] H. Xu, M. D. Mirmirani and P. Ioannou, “Adaptive sliding mode control
design for a hypersonic flight vehicle,” Journal of Guidance, Control,
Fig.12 The elevator deflection trajectory and Dynamics, vol. 27, no. 5, pp. 829-838, 2004
[9] L. Fiorentini, A. Serrani, M. A. Bolender and D. B. Doman, “Nonlinear
robust/adaptive controller design for an air-breathing hypersonic vehicle
VI. CONCLUSION model,” AIAA Guidance, Navigation and Control Conference and
In the past, optimal control has been an off-line design Exhibit, Hilton Head, South Carolina, Aug. 20-23, AIAA-2007-6329,
2007
method, and the online adaptive controller has not been optimal.
[10] K. P. Groves, D. O. Sigthorssony, A. Serraniz, S. Yurkovichx, M. A.
In this paper we use the online adaptive optimal control Bolender and D. B. Doman. “Reference command tracking for a
algorithm to solve the altitude input command tracking linearized model of an air-breathing hypersonic vehicle,” AIAA
problem of the AHV. The online adaptive optimal control Guidance, Navigation, and Control Conference and Exhibit, San
algorithm, which uses the reinforcement learning principles to Francisco, California, Aug. 15-18, AIAA-2005-6144, 2007
solve the continuous-time LQR problem, is a data-based [11] M. Abu-Khalaf, F. L. Lewis and J. Huang, “Policy iterations on the
approach to the solution of the ARE without using the Hamilton-Jacobi-Isaacs equation for H-infinity state feedback control
with input saturation,” IEEE Trans. on Automat. Control, AC-51, no. 12,
knowledge of the system internal dynamic. The simulation pp.1989-1995, 2006
results show that the method proposed in this paper can achieve [12] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for
a good control objective for the AHV. nonlinear systems with saturating actuators using a neural network HJB
approach,” Automatica, vol. 41, no. 5, pp. 779-791, 2005
REFERENCES [13] D. Vrabie, O. Pastravanu, M. Abu-Khalaf and F. L. Lewis, “Adaptive
optimal control for continuous-time linear systems based on policy
[1] F. R. Chavez and D. K. Schmidt, “Analytical aeropropulsive/aeroelastic iteration,” Automatica, vol.45, no. 2, pp. 477-484, 2004
hypersonic-vehicle model with dynamic analysis,” Journal of Guidance,
Control, and Dynamics, vol. 17, no. 6, pp. 1308-1319, 1994 [14] J. W. Brewer, “Kronecker products and matrix calculus in system
theory,” IEEE Trans. on Circuit and System, vol. 25, no. 9, pp. 772-781,
[2] F. R. Chavez and D. K. Schmidt, “Uncertainty modeling for 1978
multivariable-control robustness analysis of elastic high-speed vehicles,”

12 2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS)

You might also like