Abstract—This paper considers the problem of optimal control for the longitudinal model of a generic air-breathing hypersonic vehicle (AHV). Existing works on the optimal control of the AHV usually require complete knowledge of the system dynamics. However, for realistic control systems the dynamics may be unknown or only partially known, which makes the design of an optimal controller for the AHV more difficult. To overcome this difficulty, we propose an adaptive optimal controller synthesis algorithm in which the algebraic Riccati equation (ARE) is solved online without requiring the internal dynamics of the AHV. The control inputs of the AHV are the elevator deflection and the throttle setting. Simulation results on the AHV demonstrate the effectiveness of the proposed design method.

Keywords—Air-breathing hypersonic vehicle; adaptive optimal control; online control; LQR control.

I. INTRODUCTION

Air-breathing hypersonic vehicles (AHVs) offer a promising technology for cost-efficient access to space, because scramjet propulsion may provide significant advantages over traditional expendable rockets. With this type of aircraft, quick response and global strike capabilities for Air Force missions become more practical. The NASA X-43A set new world speed records in 2004, reaching Mach 6.8 and Mach 9.6 on two separate occasions with a scramjet engine. These flights were the culmination of NASA's Hyper-X program, whose objective was to explore alternatives to rocket power for space access vehicles.

Control of the AHV is a significant challenge because of the strong interaction between the aerodynamic and propulsive effects. Due to the enormous complexity of the dynamics, only models of the longitudinal dynamics of AHVs have been used for control design. Until recently, several works have been done on the modeling of AHVs. Chavez and Schmidt used Newtonian impact theory to obtain the pressure distribution on the hypersonic vehicle [1, 2]. In contrast to the model developed by Chavez and Schmidt, Bolender and Doman employed oblique-shock theory and Prandtl-Meyer flow to determine the pressures and shock angles, resulting in a possibly more accurate model [3], and Oppenheimer and Doman made use of linear piston theory to include the unsteady aerodynamic effects [4]. In this paper, the AHV model used for controller design is based on Bolender and Doman's work [3].

At present, many control methods have been studied for the AHV, such as robust control [5-7] and adaptive control [8, 9]. In [8], an adaptive sliding mode controller was designed to track step commands in altitude and velocity while requiring limited state information. In [10], the optimal reference command tracking problem of the AHV was dealt with using the LQR control design method based on a linearized model. However, these works usually need complete knowledge of the AHV's system dynamics or identification of the system dynamics. It is well known that modeling and identification of the AHV is often a time-consuming procedure which requires model design, parameter identification and so on. So in this paper, we solve the optimal altitude command tracking problem of the AHV using an adaptive optimal control algorithm which does not require knowledge of the internal dynamics of the system. In [11-13], Abu-Khalaf, Vrabie and Lewis have given some research results on adaptive optimal control.

The organization of this paper is as follows: Section II presents the aerodynamic model of the AHV. Section III introduces the adaptive optimal control algorithm based on policy iteration. Section IV gives the online implementation of the adaptive optimal control algorithm. Section V provides the simulation results, and a brief conclusion is drawn in Section VI.

II. THE AHV MODEL

The schematic of the geometry for this generic AHV is shown in Fig. 1.

Fig. 1 Geometry for the AHV (showing the elevator deflection $\delta_e$, the body axes $X_b$, $Z_b$, the angle of attack $\alpha$, and the ramp angles $\tau_{1,U}$, $\tau_{1,L}$, $\tau_2$)

The longitudinal dynamics of the AHV are described by the following nonlinear equations:

$\dot{V} = \dfrac{T\cos\alpha - D}{m} - \dfrac{\mu\sin\gamma}{r^2}$ \qquad (1)

$\dot{\gamma} = \dfrac{L + T\sin\alpha}{mV} - \dfrac{(\mu - V^2 r)\cos\gamma}{V r^2}$ \qquad (2)

$\dot{h} = V\sin\gamma$ \qquad (3)
978-1-61284-200-4/11/$26.00 © 2011 IEEE
$\dot{\alpha} = q - \dot{\gamma}$ \qquad (4)

$\dot{q} = \dfrac{M_{yy}}{I_{yy}}$ \qquad (5)

where

$L = 0.5\rho V^2 S C_L$ \qquad (6)

$D = 0.5\rho V^2 S C_D$ \qquad (7)

$T = 0.5\rho V^2 S C_T$ \qquad (8)

$M_{yy} = 0.5\rho V^2 S \bar{c} C_M$ \qquad (9)

$r = h + R_E$ \qquad (10)

In these equations, the aerodynamic coefficients $C_L$, $C_D$, $C_M$ and the thrust coefficient $C_T$ are functions of the Mach number $M$, the angle of attack $\alpha$ and the control inputs. The control inputs are the elevator deflection $\delta_e$ and the throttle setting $\beta$. The elevator deflection mainly affects the pitching moment as well as the lift and drag of the vehicle, while the throttle setting mostly affects the thrust.

The nonlinear dynamic equations can be linearized at different points within the flight envelope, as defined by the vehicle's velocity and altitude. By trimming the vehicle at the specified velocity and altitude, the longitudinal dynamics of this AHV can be described as follows:

$\dot{x} = Ax + Bu$ \qquad (11)

where $x = [V, \gamma, h, \alpha, q]^T$ denotes the system state vector and $u = [\beta, \delta_e]^T$ stands for the control input vector. We make the

The solution of this optimal control problem, determined by Bellman's optimality principle, is given by

$u = -Kx(t)$ \qquad (16)

with

$K = R^{-1}B^T P$ \qquad (17)

where the matrix $P$ is the unique positive definite solution of the ARE:

$A^T P + PA - PBR^{-1}B^T P + Q = 0$ \qquad (18)

The traditional methods of solving the ARE (18) need complete knowledge of the system model, i.e., both the system matrix $A$ and the control input matrix $B$, so a system identification procedure is required prior to solving the optimal control problem. In the following section, a policy iteration algorithm will be presented that solves the optimal control problem online without knowledge of the internal dynamics of the system. The algorithm is based on an action network and a critic network. The update of the critic network amounts to calculating the infinite-horizon cost associated with the use of a given initial stabilizing controller. The action network parameters are then updated so as to reduce the cost compared to the present control policy.

Note that $K$ is a stabilizing state-feedback gain for the system (13). Under the assumption that $(A, B)$ is stabilizable, $\dot{x} = (A - BK)x$ is a stable closed-loop system. Then the corresponding infinite-horizon quadratic cost is given by

2011 IEEE 5th International Conference on Cybernetics and Intelligent Systems (CIS)
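As a concrete illustration of the state-feedback design (16)-(18), the ARE can be solved with standard numerical tools when $A$ and $B$ are known. The sketch below uses a hypothetical two-state system (the matrices are illustrative, not the AHV model):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative system matrices (NOT the AHV model).
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # control weighting

# Solve the ARE  A'P + PA - P B R^{-1} B' P + Q = 0  for P > 0, cf. eq. (18).
P = solve_continuous_are(A, B, Q, R)

# Optimal gain K = R^{-1} B' P, giving u = -K x, cf. eqs. (16)-(17).
K = np.linalg.inv(R) @ B.T @ P

# Sanity checks: ARE residual is (numerically) zero, closed loop is stable.
residual = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
assert np.allclose(residual, 0, atol=1e-8)
assert np.all(np.linalg.eigvals(A - B @ K).real < 0)
```

This is the model-based baseline that the online algorithm of the following sections reproduces without using $A$.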
IV. ONLINE IMPLEMENTATION OF THE ADAPTIVE OPTIMAL CONTROL ALGORITHM

For the implementation of the adaptive optimal control algorithm given by (22) and (23), we only need knowledge of the control input matrix $B$, which appears in the control policy update. The information regarding the system matrix $A$ is embedded in the states $x(t)$ and $x(t+T)$, which are observed online, and the matrix $A$ is never required for the computation.

To compute the parameters of the matrix $P$ of the cost function, the term $x^T(t) P_i x(t)$ is written as

$x^T(t) P_i x(t) = \bar{p}_i^T \bar{x}(t)$ \qquad (24)

The parameter vector $\bar{p}_i$ is then obtained by least squares from measurements of the cost $V_i(x_t)$, which will then yield the matrix $P_i$.

The implementation flow chart of the online adaptive optimal control algorithm is given in Fig. 2.

Fig. 2 Flow chart of the online adaptive optimal control algorithm (initialization: $P_0$, $i = 1$, a stabilizing gain $K_1$ with $\sigma(A - BK_1) < 0$; data matrices $X = [\bar{x}_\Delta^1\ \bar{x}_\Delta^2\ \ldots\ \bar{x}_\Delta^N]$ with $\bar{x}_\Delta^i = \bar{x}^i(t) - \bar{x}^i(t+T)$ and $Y = [d(x^1, K_i)\ d(x^2, K_i)\ \ldots\ d(x^N, K_i)]^T$; least-squares solution $\bar{p}_i = (XX^T)^{-1}XY$; repeated online learning and controller updates)

Step 1: Select an initial stabilizing control gain $K_1$ such that the closed-loop system to be controlled is stable.

Step 2: Calculate the parameters $\bar{p}_i$ (the matrix $P_i$) by solving the least-squares problem $\bar{p}_i = (XX^T)^{-1}XY$.

Step 3: Update the control gain by $K_{i+1} = R^{-1}B^T P_i$ and set $i = i + 1$.

Step 4: Obtain the optimal control law after completing the iteration process.

The structure of the system with the adaptive optimal controller is presented in Fig. 3. We can observe that the
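The critic/actor updates above can be sketched in code. The example below is a minimal, illustrative implementation on a hypothetical second-order system (not the AHV model): the matrix $A$ appears only inside the simulator that generates the measured data, while the learning updates themselves use only the state samples and $B$, mirroring eq. (24) and the least-squares step.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Illustrative system (NOT the AHV model). A is used only to simulate
# measurements; the updates below never touch A.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def quad_basis(x):
    """Quadratic basis x_bar so that x'Px = p_bar' x_bar, cf. eq. (24)."""
    x1, x2 = x
    return np.array([x1 * x1, 2.0 * x1 * x2, x2 * x2])

def unpack(p):
    """Rebuild the symmetric 2x2 matrix P from its 3 independent parameters."""
    return np.array([[p[0], p[1]],
                     [p[1], p[2]]])

K = np.array([[1.0, 1.0]])  # initial stabilizing gain K_1
T = 0.1                     # length of each reinforcement interval
for _ in range(8):          # policy iterations
    # Fixed policy u = -Kx while data are collected over this iteration.
    def f(t, z):
        xs = z[:2]
        u = -K @ xs
        dx = A @ xs + B @ u
        dc = xs @ Q @ xs + u @ R @ u   # running cost integrand
        return np.concatenate([dx, [dc]])

    X_rows, Y = [], []
    x = np.array([1.0, 0.0])
    for _ in range(20):  # N samples along one trajectory
        sol = solve_ivp(f, [0.0, T], np.concatenate([x, [0.0]]),
                        rtol=1e-10, atol=1e-12)
        x_next = sol.y[:2, -1]
        d = sol.y[2, -1]                             # measured cost d(x, K_i)
        X_rows.append(quad_basis(x) - quad_basis(x_next))
        Y.append(d)
        x = x_next

    # Critic update: least squares for p_bar_i; actor update uses only B.
    p, *_ = np.linalg.lstsq(np.array(X_rows), np.array(Y), rcond=None)
    P = unpack(p)
    K = np.linalg.inv(R) @ B.T @ P

# Compare with the direct ARE solution (for verification only).
P_are = solve_continuous_are(A, B, Q, R)
print(np.max(np.abs(P - P_are)))
```

After a few iterations the learned $P$ matches the ARE solution closely, which is the behavior reported for the AHV simulation below.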
adaptive optimal controller has a hybrid structure, with a continuous-time internal state followed by a sampler and a discrete-time control gain update rule.

Fig. 4 shows that the system is controlled using a state-feedback control policy with a constant control gain $K_i$ over each time interval $[T_i, T_{i+1}]$. During this time interval, a reinforcement learning procedure, which uses the system measurement data, is employed to determine the value associated with this controller. The value is described by the parametric structure $P_i$. Once the learning procedure converges to the value $P_i$, this result is used to calculate a new control gain for the state-feedback controller. At every step of the iterative procedure, it is guaranteed that the new controller provides better performance than the previous one. Therefore, a monotonically decreasing sequence of cost functions $\{P_i\}$ is obtained, which converges to the smallest possible value $P^*$, associated with the optimal control gain $K^*$.

V. SIMULATION RESULTS OF THE AHV

To test the effectiveness of the adaptive optimal controller proposed in the previous section, we consider the following simulation. At the trim condition ($V = 15060$ ft/s, $h = 110000$ ft), a linear model of the AHV is described by the following system:

$\dot{x} = Ax + Bu, \qquad A = [A_1\ A_2]$

$A_2 = \begin{bmatrix} -7.3910\mathrm{e}{-5} & -0.0497 & 0 \\ -5.7282\mathrm{e}{-4} & 0.0440 & 0 \\ 0 & 0 & 0 \\ 5.7282\mathrm{e}{-4} & -0.0440 & 1 \\ -0.0014 & 0.5923 & -0.0682 \end{bmatrix}$

$B = \begin{bmatrix} 0.0273 & 0 \\ 5.7113\mathrm{e}{-5} & 0 \\ 0 & 0 \\ -5.7113\mathrm{e}{-5} & 0 \\ 0 & 3.3168 \end{bmatrix}$ \qquad (27)

The weight matrices $R$ and $Q$ are chosen as $R = \mathrm{diag}([1, 1])$ and $Q = \mathrm{diag}([5, 5, 0.005, 1, 1])$.

Since there are 15 independent elements in the symmetric matrix $P$, the setup of the LS problem needs at least 15 measurements of the cost function associated with the given control policy, together with the measurements of the system states at the beginning and the end of each time interval. The simulation was conducted using the system measurement data sampled every 0.2 s, and a LS problem was solved after 40 samples were acquired. Fig. 5 presents the convergence result for the $P$ matrix parameters. The cost function parameters converged to the optimal ones at time $t = 40$ s, after five updates of the controller parameters. The $P$ matrix obtained using the online adaptive optimal control algorithm, without knowledge of the system internal dynamics, is

$P = \begin{bmatrix} 68.5644 & 40.9759 & 0.6001 & -1.6779 & -0.3708 \\ 40.9759 & 117.4726 & 2.0495 & 3.6025 & 0.8161 \\ 0.6001 & 2.0495 & 0.1576 & 0.0656 & 0.0145 \\ -1.6779 & 3.6025 & 0.0656 & 1.5491 & 0.4077 \\ -0.3708 & 0.8161 & 0.0145 & 0.4077 & 0.4001 \end{bmatrix}$ \qquad (28)

In order to give a comparison, the $P$ matrix calculated by directly solving the ARE (18) is

$P = \begin{bmatrix} 68.5640 & 40.9766 & 0.6001 & -1.6775 & -0.3706 \\ 40.9766 & 117.4726 & 2.0495 & 3.6026 & 0.8161 \\ 0.6001 & 2.0495 & 0.1576 & 0.0656 & 0.0145 \\ -1.6775 & 3.6026 & 0.0656 & 1.5491 & 0.4077 \\ -0.3706 & 0.8161 & 0.0145 & 0.4077 & 0.4001 \end{bmatrix}$ \qquad (29)

It can be seen that the difference between the corresponding entries of the two matrices is on the order of $10^{-4}$.

Fig. 5 Evolution of the parameters of the P matrix

In fact, when the difference between the measured cost and the expected cost crosses below a designer-specified threshold, we can consider that convergence of the algorithm has been achieved. After convergence to the optimal controller is attained, the algorithm need not continue to run, and the subsequent updates of the control gain stop.

In Figs. 6-10, the solid lines are the system state trajectories of the AHV under the adaptive optimal control algorithm, and the dotted lines are the results obtained by directly solving the ARE. The figures show that the adaptive optimal control method achieves essentially the same results as the standard LQR design method (i.e., directly solving the ARE). The advantage of the adaptive optimal control algorithm is that knowledge of the system internal dynamics is not required.
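The stopping criterion described above, a designer-specified threshold on the gap between measured and expected cost, can be expressed as a simple check. The snippet below is an illustrative sketch with made-up numbers, not values from the AHV simulation:

```python
import numpy as np

def cost_gap(P, x_t, x_tT, measured_cost):
    """Gap between the cost measured over [t, t+T] and the cost the
    current critic P predicts, x(t)'Px(t) - x(t+T)'Px(t+T)."""
    expected = x_t @ P @ x_t - x_tT @ P @ x_tT
    return abs(measured_cost - expected)

# Illustrative critic, state samples, and measured interval cost.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x_t = np.array([1.0, -1.0])
x_tT = np.array([0.6, -0.5])
measured = 1.332

threshold = 1e-2  # designer-specified convergence threshold
print(cost_gap(P, x_t, x_tT, measured) < threshold)
```

Once this test passes, the gain updates can be switched off, as described in the text.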
Fig. 6 The flight velocity trajectory
Fig. 7 The flight path angle trajectory
Fig. 8 The flight altitude trajectory
Fig. 9 The attack angle trajectory
Fig. 10 The pitch rate trajectory
Fig. 11 The throttle setting trajectory
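The quadratic costs compared in this section can be evaluated for any fixed stabilizing gain by solving a Lyapunov equation: for $u = -Kx$, the cost is $J = x_0^T P_{cl} x_0$ with $(A-BK)^T P_{cl} + P_{cl}(A-BK) = -(Q + K^T R K)$. A sketch on illustrative matrices (not the AHV model) follows, including a perturbed $A$ analogous to the robustness comparison below:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Illustrative matrices (NOT the AHV model).
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
x0 = np.array([1.0, 0.0])

def quadratic_cost(A, B, K, Q, R, x0):
    """Infinite-horizon cost of u = -Kx: solve the Lyapunov equation
    (A-BK)' P + P (A-BK) = -(Q + K'RK), then return J = x0' P x0."""
    Acl = A - B @ K
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    return float(x0 @ P @ x0)

# Gain designed for the nominal model.
K = np.linalg.inv(R) @ B.T @ solve_continuous_are(A, B, Q, R)

# Cost on the nominal model vs. on a model with one entry of A perturbed.
A_pert = A.copy()
A_pert[1, 1] = -1.7
print(quadratic_cost(A, B, K, Q, R, x0))
print(quadratic_cost(A_pert, B, K, Q, R, x0))
```

On the nominal model this Lyapunov-based cost coincides with $x_0^T P^* x_0$ from the ARE, which makes it a convenient way to compare a fixed gain on nominal and uncertain systems.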
But the standard LQR design method needs complete knowledge of the system model (the matrices $A$ and $B$). Figs. 11-12 show the control input trajectories obtained using the adaptive optimal control algorithm. The quadratic costs obtained by directly solving the ARE and by the adaptive optimal control algorithm are 0.001599 and 0.001740, respectively.

To illustrate the advantage of the adaptive optimal control law, we assume that there exist some uncertainties in the matrix $A$. Applying the LQR control law to the uncertain system, in which the (2,2) entry of $A$ is changed to 1.0060, we obtain a quadratic cost of 0.002281. However, using the adaptive optimal control law on the uncertain system, the resulting quadratic cost is 0.002034. It is clear that the adaptive optimal control algorithm outperforms the LQR design method for the uncertain system.

Fig. 12 The elevator deflection trajectory

Journal of Guidance, Control, and Dynamics, vol. 22, no. 1, pp. 87-95, 1999.
[3] M. A. Bolender and D. B. Doman, "Nonlinear longitudinal dynamic model of an air-breathing hypersonic vehicle," Journal of Spacecraft and Rockets, vol. 44, no. 2, pp. 374-387, 2007.
[4] M. W. Oppenheimer and D. B. Doman, "A hypersonic vehicle model developed with piston theory," AIAA Atmospheric Flight Mechanics