
SIAM J. MATRIX ANAL. APPL.
Vol. 12, No. 2, pp. 273-291, April 1991
(C) 1991 Society for Industrial and Applied Mathematics

RATIONAL ITERATIVE METHODS FOR THE MATRIX SIGN FUNCTION*

CHARLES KENNEY† AND ALAN J. LAUB†

Downloaded 11/09/14 to 198.11.31.160. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php


Abstract. In this paper an analysis of rational iterations for the matrix sign function is presented. This
analysis is based on Padé approximations of a certain hypergeometric function and it is shown that local
convergence results for "multiplication-rich" polynomial iterations also apply to these rational methods.
Multiplication-rich methods are of particular interest for many parallel and vector computing environments. The
main diagonal Padé recursions, which include Newton's and Halley's methods as special cases, are globally
convergent and can be implemented in a multiplication-rich fashion which is computationally competitive with
the polynomial recursions (which are not globally convergent). Other rational iteration schemes are also discussed,
including Laurent approximations, Cayley power methods, and globally convergent eigenvalue assignment
methods.

Key words. Padé approximation, matrix sign function, Riccati equations, rational iterations

AMS(MOS) subject classifications. 15A24, 65D99, 65F99

1. Introduction. It is a classical result that the algebraic Riccati equation can be
solved by using an invariant subspace of an associated Hamiltonian matrix. This motivated
the introduction, by Roberts [21] in 1971, of the matrix sign function as a means of
finding the positive and negative invariant subspaces of any matrix X which does not
have eigenvalues on the imaginary axis. This and subsequent work [9] showed that the
matrix sign function could be used to solve many problems in control theory.

The sign of X can be defined constructively as the limit of the Newton sequence

(1.1)  X_{n+1} = (1/2)(X_n + X_n^{-1}),    X_0 = X,

(1.2)  sgn (X) = lim_{n→∞} X_n.
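As an illustration (not part of the original paper), the Newton sequence (1.1)-(1.2) can be sketched in a few lines of Python; the function name, tolerances, and test matrix are our own illustrative choices:

```python
import numpy as np

def sign_newton(X, tol=1e-12, max_iter=100):
    """Iterate (1.1), X_{n+1} = (X_n + X_n^{-1})/2, until the change is small."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_next - X, 1) < tol:
            return X_next
        X = X_next
    return X

# Eigenvalues 2 and -3 lie off the imaginary axis, so the limit (1.2) exists.
A = np.array([[2.0, 1.0],
              [0.0, -3.0]])
S = sign_newton(A)
```

By (1.6) below, the computed S should square to the identity and commute with A.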
Newton's method has the pleasant feature that it is globally convergent; if X has no
eigenvalues on the imaginary axis then the limit in (1.2) exists. As a definition, however,
(1.2) does not reveal many of the important properties of the sign function. Because of
this, it is useful to have an equivalent definition based on the Jordan canonical form of
X (see [4], [7]). For a complex scalar z with Re z ≠ 0, define the sign of z by

(1.3)  sgn z = {  1 if Re z > 0,
                 −1 if Re z < 0.

For a complex matrix X such that Λ(X) ⊂ C⁺ ∪ C⁻ (i.e., X has no eigenvalues on the
imaginary axis) let T take X to Jordan form:

(1.4)  X = T^{-1} [ P  0 ] T,
                  [ 0  N ]

* Received by the editors September 28, 1989; accepted for publication (in revised form) November 15,
1989. This research was supported by National Science Foundation (and Air Force Office of Scientific Research)
grant ECS87-18897, National Science Foundation grant DMS88-00817, and Air Force Office of Scientific Research
contract AFOSR-89-0167.
† Department of Electrical and Computer Engineering, University of California, Santa Barbara, California
93106 (laub%lanczos@hub.ucsb.edu).

where P and N are in block diagonal Jordan form with, respectively, positive and negative
real part eigenvalues. Then the sign of X is given by

(1.5)  sgn (X) = T^{-1} [ I   0 ] T,
                        [ 0  −I ]

where I and −I in (1.5) have the same dimensions as P and N in (1.4). This shows
immediately that the sign of X is a square root of the identity which commutes with X:

(1.6)  S² = I,    XS = SX,

where S = sgn (X).
Using (1.4) in (1.1) shows that the eigenvalues λ_j^{(n)} of X_n are decoupled from each
other and obey the scalar recursions

(1.7)  λ_j^{(n+1)} = (1/2)(λ_j^{(n)} + 1/λ_j^{(n)}),

with lim_{n→+∞} λ_j^{(n)} = sgn (λ_j). This decoupling greatly simplifies the analysis of methods
like (1.1).
Because of the need for pivoting, matrix inversions are sometimes not as amenable
to parallel or vector implementation as matrix multiplications. Thus, a current trend
in evaluating sgn (X) and related functions such as the polar decomposition [5],
[11], [12] is to favor algorithms which are "multiplication-rich," such as the Newton-Schulz
iteration

(1.8)  X_{n+1} = (1/2) X_n (3I − X_n²).

(The recursion (1.8) is obtained from (1.1) by using Schulz's approximation X_n^{-1} ≈
X_n + (I − X_n²)X_n, as suggested in [12].) This method avoids the matrix inversion in (1.1)
and is quadratically convergent provided

(1.9)  ‖I − X_0²‖ < 1,

where ‖·‖ is any reasonable matrix norm (see Theorem 5.2). If (1.9) is not satisfied
then a starter method such as (1.1) must be used until ‖I − X_n²‖ < 1.
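The inverse-free step (1.8) with the check (1.9) can be sketched as follows; the function name, tolerances, and example matrix are illustrative assumptions of ours:

```python
import numpy as np

def sign_newton_schulz(X, tol=1e-12, max_iter=200):
    """Newton-Schulz recursion (1.8): X_{n+1} = X_n (3I - X_n^2)/2.
    Locally convergent when ||I - X_0^2|| < 1, condition (1.9)."""
    X = np.asarray(X, dtype=float)
    I = np.eye(X.shape[0])
    if np.linalg.norm(I - X @ X, 2) >= 1.0:
        raise ValueError("||I - X0^2|| >= 1: run a starter method such as (1.1) first")
    for _ in range(max_iter):
        X_next = 0.5 * X @ (3.0 * I - X @ X)
        if np.linalg.norm(X_next - X, 1) < tol:
            return X_next
        X = X_next
    return X

# This matrix satisfies (1.9), so no starter iteration is needed.
A = np.array([[1.1, 0.2],
              [0.0, -0.9]])
S = sign_newton_schulz(A)
```

Each step uses only matrix multiplications, which is the "multiplication-rich" property discussed in the text.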
Higher-order polynomial recursions for the polar decomposition of a nonsingular
matrix were developed independently by Kovarik [17] and Leipnik [18] and are applicable
to the matrix sign function. These methods are based on polynomial approximations of
the hypergeometric function

(1.10)  f(ξ) = (1 − ξ)^{-1/2},

and generate convergent matrix sequences provided that (1.9) is satisfied. The motivation
for studying this function is that for nonzero real x, sgn x = x/|x| = x/(x²)^{1/2} = x(1 − ξ)^{-1/2}, where
ξ = 1 − x². In §3, we show that the sufficient condition (1.9) actually provides a rather
good approximation to the true region of convergence for these methods. Consequently,
we might feel that loss of global convergence is the price that must be paid in order to
use multiplication-rich algorithms. Rather surprisingly, this is not the case.

For example, recursions based on rational (Padé) approximations of (1 − ξ)^{-1/2}
have much larger regions of convergence. In fact, the main diagonal approximations
(those for which the degree m of the denominator is equal to or one greater than the

degree k of the numerator) lead to globally convergent iterations that satisfy an elegant
error formula:

(1.11)  (S − X_n)(S + X_n)^{-1} = [(S − X_0)(S + X_0)^{-1}]^{γⁿ},

where γ = k + m + 1 is the order of the approximation. (For Newton's method, a similar
result was proved by Balzer [3, eq. (39)] and by Roberts [21, §1.3].) These methods
are easily modified to allow exact one-step convergence of specified eigenvalues (much
like the eigenvalue assignment schemes of Balzer in [3]) while still remaining globally
convergent. An analysis of the Halley family of algorithms of Gander [10] for the polar
decomposition shows that these methods belong to this class of assignment procedures.
The work in [10] can also be adapted to give a local convergence theory for general sign
function iterations of the form X_{n+1} = F(X_n).
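In the scalar case the error formula (1.11) can be checked directly; the following sketch (our own, with an arbitrary starting point) uses Halley's method, the k = m = 1 member with γ = 3:

```python
x0 = 2.0 + 1.0j      # any point off the imaginary axis; here sgn(x0) = 1
s = 1.0
x = x0
for n in range(1, 4):
    x = x * (3 + x**2) / (1 + 3 * x**2)          # one Halley step
    lhs = (s - x) / (s + x)
    rhs = ((s - x0) / (s + x0)) ** (3 ** n)      # (1.11) with gamma^n = 3^n
    assert abs(lhs - rhs) < 1e-10
```

The quantity (s − x_n)/(s + x_n) is thus driven to zero at a cubic rate, forcing x_n toward s.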
A second family of globally convergent multiplication-rich methods is based on the
Cayley transform

(1.12)  Y = (I − X)(I + X)^{-1},

which takes the positive real part eigenvalues of X inside the unit circle and the negative
real part eigenvalues of X outside the unit circle. If Y is multiplied by itself repeatedly,
then these eigenvalues move toward zero and infinity, respectively. Transforming back
to get

(1.13)  X̂_ν = (I − Y^ν)(I + Y^ν)^{-1}

moves these eigenvalues very near one and minus one, respectively. (If X has −1 as an
eigenvalue, then I + X is singular and a modified version of (1.12), (1.13) must be used.)
A fascinating correspondence between the Cayley power method and the Padé approximation
method is that if the power ν in (1.13) is equal to γⁿ in (1.11), then X_n is equal
to X̂_ν! This does not mean, however, that these two methods should be viewed as identical
because in this case the Padé method requires n matrix inversions while the Cayley
method requires only two. Similar equivalency results for different members of the Padé
method can also be proved (see Theorem 3.4). An interesting sidelight on the Cayley
power method is that (1.12) can be replaced by any transformation which is a rational
or analytic function of X that takes the right- and left-half complex planes inside and
outside the unit disk, respectively. For example, if Y = e^{-X} then Y^ν is just the fundamental
solution matrix to Ẏ = −XY at time ν: Y^ν = e^{-νX}, and X̂_ν = (I − e^{-νX})(I + e^{-νX})^{-1}.
Note in this case that I + e^{-νX} is never singular, since the eigenvalues of X are not on
the imaginary axis.
In the next section we present the theory of the Padé approximants of (1 − ξ)^{-1/2}
for k ≥ m − 1, which is based on well-known results for hypergeometric functions. This
theory is then used to analyze scalar sign function recursions in §3, where we also show
how it can be adapted to give globally convergent eigenvalue assignment iterations. In
§4 we consider other rational iterations including Laurent methods. These scalar results
are useful because matrix convergence is predicated on the scalar convergence of the
eigenvalues of X (§5). This leads to local convergence results for k ≥ m − 1, and global
convergence for the main diagonal approximants k = m and k = m − 1.

2. Padé approximations to (1 − ξ)^{-1/2}. Let (α)_n = α(α + 1)···(α + n − 1)
with (α)_0 = 1, and define the family of hypergeometric functions

(2.1)  ₂F₁(α, β; γ; ξ) = Σ_{n=0}^{∞} [(α)_n (β)_n / (γ)_n] ξⁿ/n!.

From [1],

(2.2)  (1 − ξ)^{-1/2} = ₂F₁(1/2, 1; 1; ξ) ≡ f(ξ).

In general, the [k/m] Padé approximant to f is a rational function P_km/Q_km where
deg (P_km) = k, deg (Q_km) = m, and

(2.3)  f(ξ) − P_km(ξ)/Q_km(ξ) = O(ξ^{k+m+1}).

Because f is a hypergeometric function, a great deal is known about P_km and Q_km [1].
First of all [13], Q_km is related to the set of orthogonal polynomials over [0, 1] defined
with respect to the weight function w(ξ) = (ξ^{-1/2}/π)(1 − ξ)^{-1/2} ξ^{k+1-m} for k ≥ m − 1.
If ψ_m is the mth such polynomial with ψ_m(0) = 1, then

(2.4)  Q_km(ξ) = ξ^m ψ_m(1/ξ),

and Q_km(0) = 1. From (2.4), the zeros of Q_km are just the reciprocals of the zeros of ψ_m.
Since the zeros of ψ_m are simple [22] and lie in (0, 1), the zeros of Q_km are also simple
and lie in (1, ∞). (This result could have been anticipated from another point of view
since (1 − ξ)^{-1/2} has a natural branch cut along [1, ∞) and, as noted in [2, pp. 51-57],
the zeros and poles of a Padé approximant tend to fall along the branch cuts of the
functions they approximate.) Denoting the zeros of Q_km by 1 < z₁ < z₂ < ··· < z_m, we
may write

(2.5)  Q_km(ξ) = ∏_{i=1}^{m} (z_i − ξ)/z_i.

This identity is useful for convergence analysis, but a more convenient form is

(2.6)  Q_km(ξ) = ₂F₁(−m, −1/2 − k; −k − m; ξ)
             = Σ_{n=0}^{m} [(−m)_n (−1/2 − k)_n / (n! (−k − m)_n)] ξⁿ
             ≡ Σ_{n=0}^{m} q_n^{km} ξⁿ.

From [13], P_km is given by

(2.7)  P_km(ξ) = Σ_{n=0}^{k} [(1/2)_n (1/2 − m)_m (n − k − m)_m / (n! (−k − m)_m (n + 1/2 − m)_m)] ξⁿ
             ≡ Σ_{n=0}^{k} p_n^{km} ξⁿ.
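The series (2.6) and (2.7) are easy to evaluate in exact arithmetic; this sketch (function names are ours) reproduces the [1/1] pair P = 1 − ξ/4, Q = 1 − 3ξ/4, which yields Halley's method after the substitution ξ = 1 − x²:

```python
from fractions import Fraction

def poch(a, n):
    """Pochhammer symbol (a)_n = a(a+1)...(a+n-1), with (a)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= a + i
    return out

def pade_sign(k, m):
    """Coefficients (increasing powers of xi) of the [k/m] Pade approximant
    P_km/Q_km of (1 - xi)^(-1/2), from the series (2.6) and (2.7)."""
    half = Fraction(1, 2)
    q = [poch(Fraction(-m), n) * poch(-half - k, n)
         / (poch(Fraction(1), n) * poch(Fraction(-k - m), n))   # (1)_n = n!
         for n in range(m + 1)]
    p = [poch(half, n) * poch(half - m, m) * poch(Fraction(n - k - m), m)
         / (poch(Fraction(1), n) * poch(Fraction(-k - m), m) * poch(n + half - m, m))
         for n in range(k + 1)]
    return p, q

p, q = pade_sign(1, 1)   # expect P = 1 - xi/4, Q = 1 - 3 xi/4
```

The same routine reproduces, e.g., the [2/1] pair P = 1 − ξ/3 − ξ²/24, Q = 1 − 5ξ/6 used in Table 1.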

The key to the local error analysis of Padé recursions is the following theorem, which
was proved by Leipnik [18, Thm. 1] and stated by Kovarik [17, lemma following Thm. 2]
for the polynomial case m = 0.

THEOREM 2.1. For k ≥ m − 1,

(2.8)  Q_km²(ξ) − (1 − ξ)P_km²(ξ) = ξ^{k+m+1} Σ_{i=0}^{l} c_i ξ^i,

where c_i = c_i(k, m) > 0 for 0 ≤ i ≤ l ≡ max (2k + 1, 2m) − (k + m + 1), and

(2.9)  Σ_{i=0}^{l} c_i = Q_km²(1).

Proof. From (2.3), P_km(ξ)/Q_km(ξ) = (1 − ξ)^{-1/2} + O(ξ^{k+m+1}), so that
Q_km²(ξ) − (1 − ξ)P_km²(ξ) has a zero of order k + m + 1 at ξ = 0; since it is a polynomial
of degree k + m + 1 + l,

  Q_km²(ξ) − (1 − ξ)P_km²(ξ) = ξ^{k+m+1} ( Σ_{i=0}^{l} c_i ξ^i ),

for some constants c_0, c_1, ..., c_l. Setting ξ = 1 gives (2.9). It remains to show that the
coefficients c_i are positive. The idea of the proof is best illustrated by considering the
diagonals, m = k − t, for t = −1, 0, 1, ..., k in the Padé table. (For example, see Table 1.)
For the first main diagonal, t = −1, l = 0, and multiplying out the left side of (2.8)
gives c_0 = (q_{k+1}^{k,k+1})² > 0. For the second main diagonal, t = 0, l = 0, and c_0 = (p_k^{kk})² > 0.
For the first superdiagonal, t = 1, l = 1, and c_0 > 0, c_1 = (p_k^{k,k-1})² > 0.
In general, for t ≥ 0, l = t, and the coefficients c_i can be written as the sum of terms of
the form

(2.10)  p_{k-r}(p_{k+r-s} + p_{k+r-s+1})

and

(2.11)  p_{k-r} p_{k+r-s},

where

(2.12)  0 ≤ r ≤ s ≤ t ≤ k.

We complete the proof of the theorem by showing that each term of the type (2.10) or
(2.11) is positive. From (2.7),

(2.13)  p_{k+r-s} = (1/2)_{k+r-s} (1/2 + t − k)_{k-t} (t + r − s − k)_{k-t}
                    / ((k + r − s)! (t − 2k)_{k-t} (1/2 + t + r − s)_{k-t}).

Since both p_{k-r} and p_{k+r-s} have sign (−1)^{k-t},

(2.14)  p_{k-r} p_{k+r-s} > 0.

Using (2.13),

  p_{k-r}(p_{k+r-s} + p_{k+r-s+1}) = p_{k-r} p_{k+r-s} ( 1 − (s − r)(t − s + r + 1/2)
                                                           / ((s − r + k − t)(k + r − s + 1)) ) > 0

by (2.14) and (2.12), because (s − r)/(s − r + k − t) ≤ 1 and

  (t − s + r + 1/2)/(k + r − s + 1) < 1.

(Note that the degenerate case k = s − r does not cause a problem because (2.10)
then reduces to p_{k-r} p_{k+r-s}, which is positive by (2.14).) □
3. Scalar Padé recursions. As we show in §5, the convergence of the matrix sequence
{X_n} is determined by the convergence of the scalar sequences for the eigenvalues of X_0.
The scalar Padé recursions have the form

(3.1)  x_{n+1} = x_n P_km(1 − x_n²)/Q_km(1 − x_n²),

where P_km/Q_km is the [k/m] Padé approximant to (1 − ξ)^{-1/2}. Table 1 gives the expressions
for the right-hand side of (3.1) for k and m between zero and three. For example,
the case k = 0, m = 1 gives

(3.2)  x_{n+1} = 2x_n/(1 + x_n²),

which might be called the "inverse" Newton method for solving the equation x² − 1 = 0,
since the values x₁, x₂, ... generated by (3.2) are the inverses of those generated by
the "regular" Newton method

(3.3)  x_{n+1} = (1/2)(x_n + 1/x_n).

The case k = 1, m = 1 gives Halley's method (see [10] for a related application). The
next theorem generalizes the local convergence results of Leipnik [18] and Kovarik [17].
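Before proceeding, the reciprocal relationship between (3.2) and (3.3) noted above is easy to confirm numerically (an illustrative sketch, not from the original):

```python
# The "inverse Newton" recursion (3.2) produces exactly the reciprocals of
# the iterates of the regular Newton recursion (3.3).
x_inv, x_reg = 3.0, 3.0
for _ in range(5):
    x_inv = 2 * x_inv / (1 + x_inv**2)       # (3.2), the [0/1] Pade recursion
    x_reg = 0.5 * (x_reg + 1 / x_reg)        # (3.3), regular Newton
    assert abs(x_inv - 1 / x_reg) < 1e-12
```

Both sequences converge to sgn (3.0) = 1, approaching it from opposite sides.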
THEOREM 3.1. Let |1 − x_0²| < 1 for x_0 ∈ C and define {x_n} by (3.1) for
k ≥ m − 1. Then

(3.4)  |1 − x_n²| ≤ |1 − x_0²|^{(k+m+1)ⁿ},

and

(3.5)  lim_{n→+∞} x_n = sgn (x_0).

Proof. By (3.1),

(3.6)  1 − x₁² = (Q_km²(ξ) − (1 − ξ)P_km²(ξ))/Q_km²(ξ),

where ξ = 1 − x_0². But Q_km has zeros z₁, ..., z_m in (1, +∞), so by (2.5)

(3.7)  |Q_km(ξ)| = ∏_{i=1}^{m} |z_i − ξ|/z_i ≥ ∏_{i=1}^{m} (z_i − |ξ|)/z_i > ∏_{i=1}^{m} (z_i − 1)/z_i = Q_km(1).

TABLE 1
Padé recursions for the matrix sign function.

m = 0:  [0/0]  x
        [1/0]  x(3 − x²)/2
        [2/0]  x(15 − 10x² + 3x⁴)/8
        [3/0]  x(35 − 35x² + 21x⁴ − 5x⁶)/16
m = 1:  [0/1]  2x/(1 + x²)
        [1/1]  x(3 + x²)/(1 + 3x²)
        [2/1]  x(15 + 10x² − x⁴)/(4(1 + 5x²))
        [3/1]  x(35 + 35x² − 7x⁴ + x⁶)/(8(1 + 7x²))
m = 2:  [0/2]  8x/(3 + 6x² − x⁴)
        [1/2]  4x(1 + x²)/(1 + 6x² + x⁴)
        [2/2]  x(5 + 10x² + x⁴)/(1 + 10x² + 5x⁴)
        [3/2]  x(35 + 105x² + 21x⁴ − x⁶)/(2(3 + 42x² + 35x⁴))
m = 3:  [0/3]  16x/(5 + 15x² − 5x⁴ + x⁶)
        [1/3]  8x(3 + 5x²)/(5 + 45x² + 15x⁴ − x⁶)
        [2/3]  2x(3 + 10x² + 3x⁴)/(1 + 15x² + 15x⁴ + x⁶)
        [3/3]  x(7 + 35x² + 21x⁴ + x⁶)/(1 + 21x² + 35x⁴ + 7x⁶)
Using Theorem 2.1 in (3.6) gives

  |1 − x₁²| ≤ |ξ|^{k+m+1} ( Σ_{i=0}^{l} c_i |ξ|^i ) / |Q_km(ξ)|²
           ≤ |1 − x_0²|^{k+m+1} ( Σ_{i=0}^{l} c_i ) / |Q_km(ξ)|²
           ≤ |1 − x_0²|^{k+m+1} Q_km²(1)/|Q_km(ξ)|²
           ≤ |1 − x_0²|^{k+m+1}

by (2.9) and (3.7). Repeating this argument gives (3.4). From (3.4), x_n² → 1. To see
that x_n → sgn (x_0), let h(x) = xP_km(1 − x²)/Q_km(1 − x²). Since the only poles of h lie
on the imaginary axis, h is continuous on the set

(3.8)  S = {x : |1 − x²| < 1} ≡ S₊ ∪ S₋,

where S₊ = {x ∈ S : Re x > 0}, S₋ = {x ∈ S : Re x < 0}. By (3.4), h takes S into S.
Since S₊ ∩ S₋ = ∅ and each is a connected set, h(S₊) must lie entirely in S₊ or S₋,
because the continuous image of a connected set is connected. But 1 ∈ S₊ and h(1) = 1 ∈
S₊, so h(S₊) ⊂ S₊. Similarly, h(S₋) ⊂ S₋. Thus if x_0 ∈ S₊ then x_n ∈ S₊ for all n, and
by (3.4), lim_{n→+∞} x_n = 1. □
In order to assess how well the set S in (3.8) approximates the region of convergence
for the recursions in (3.1), we define the basins of attraction for the fixed points
±1 of h:

(3.9)  B₊ = {x : lim_{n→+∞} x_n = 1},    B₋ = {x : lim_{n→+∞} x_n = −1}.

The Julia set [6], [19] for the recursion (3.1) is the boundary of the basin of attraction
of +1:

(3.10)  J_km = ∂B₊.

Because of the unusual properties associated with Julia sets, J_km is also the boundary of
the basin of attraction for −1:

(3.11)  J_km = ∂B₋.

(See [19] for a very readable introduction to Julia sets and the properties of rational
recursions such as (3.1); for a deeper study, see [6].)

Computationally, J_km can be approximated by starting with (almost) any point
z₀ ∈ C and then reversing (3.1) to solve for the predecessors of z₀:

(3.12)  z_n = z_{n+1} P_km(1 − z_{n+1}²)/Q_km(1 − z_{n+1}²),

where z_n = x_{-n} in (3.1). Since (3.12) can be written as a polynomial in z_{n+1} of order
l = max (2k + 1, 2m), there are l solutions z_{n+1}^{(i)} to (3.12), one of which is selected
at random to continue the iteration. This scheme takes advantage of the fact that for the
forward recursion (3.1), the Julia set is repulsive; points near J_km move to ±1. In reverse,
under (3.12), the Julia set becomes attractive and nearly all orbits of points are dense
in J_km (see [6, Thm. 2.5]). Thus by plotting {z_n^{(i)}} for n > 30 (to allow the initial points
time to approach the Julia set) we obtain a good graphical approximation of J_km and

thus can assess easily the real region of convergence of (3.1) as compared to the set
|1 − x²| < 1. This was done for each of the recursions given in Table 1 (excluding the
globally convergent main diagonal recursions), and the results are displayed in Figs. 1-9,
along with the set |1 − x²| = 1 for comparison (this set looks like an "infinity" symbol
centered at zero). In each of these figures, the principal domains of attraction of ±1 are
the largest connected regions, inside the Julia set, which contain ±1, respectively. The
other connected regions nested within the Julia set map onto these principal domains
after a finite number of steps in (3.1). For the multiplication-rich polynomial recursions
(m = 0), the set |1 − x²| < 1 provides a rather good approximation to the actual region
of convergence. However, as m increases toward k, that is, as we move toward the main
diagonals k = m or k = m − 1, the region of convergence becomes much larger than
|1 − x²| < 1.
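The backward sampling of (3.12) is straightforward to implement; this sketch (ours, with arbitrary seed and orbit length) uses the k = 1, m = 0 map x(3 − x²)/2 from Table 1, whose predecessors are roots of a cubic:

```python
import numpy as np

def predecessors(w):
    """Solutions z of the reverse recursion (3.12) for the k = 1, m = 0 map:
    z(3 - z^2)/2 = w, i.e., roots of -z^3 + 3z - 2w = 0
    (l = max(2k+1, 2m) = 3 of them)."""
    return np.roots([-1.0, 0.0, 3.0, -2.0 * w])

rng = np.random.default_rng(0)
z = 0.5 + 0.1j
orbit = []
for n in range(200):
    z = rng.choice(predecessors(z))   # pick one of the l predecessors at random
    if n > 30:                        # discard the transient, as suggested in the text
        orbit.append(z)               # points accumulate near the Julia set J_10
```

Plotting `orbit` in the complex plane gives the kind of graphical approximation of J_km shown in the figures.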
We now show that along the main diagonals, the regions of convergence are as large
as possible and we have, in fact, global convergence. That is, if x_0 is not on the imaginary
axis then lim_{n→+∞} x_n = sgn (x_0).

First note a rather remarkable property of (3.1) for k = m and k = m − 1: the
polynomials −xP_km(1 − x²) and Q_km(1 − x²) are, respectively, the odd and even parts
of (1 − x)^{k+m+1}. This makes it very easy to write down the appropriate recursion. For
example, if k = m = 2, then

  (1 − x)^{k+m+1} = 1 − 5x + 10x² − 10x³ + 5x⁴ − x⁵,

so xP₂₂(1 − x²) = 5x + 10x³ + x⁵, and Q₂₂(1 − x²) = 1 + 10x² + 5x⁴. Thus the
[2/2] recursion is x_{n+1} = x_n(5 + 10x_n² + x_n⁴)/(1 + 10x_n² + 5x_n⁴). This property can be
proved either by manipulating the series (2.6), (2.7) or by starting with −xP_km(1 − x²)
and Q_km(1 − x²) as the odd and even parts of (1 − x)^{k+m+1} and then showing that
(2.3) is satisfied.
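The odd/even splitting makes the main diagonal recursions one-liners to generate; a sketch (names ours), using the equivalent splitting of (1 + x)^{k+m+1}, whose odd part is +xP_km(1 − x²):

```python
from math import comb

def main_diagonal_recursion(k, m):
    """Main diagonal Pade recursion (k = m or k = m - 1) built from the
    odd/even split of (1 + x)^(k+m+1): returns functions num, den with
    x_{n+1} = num(x_n)/den(x_n)."""
    assert k == m or k == m - 1
    g = k + m + 1
    def num(x):   # odd part of (1 + x)^g, equals x * P_km(1 - x^2)
        return sum(comb(g, j) * x**j for j in range(1, g + 1, 2))
    def den(x):   # even part of (1 + x)^g, equals Q_km(1 - x^2)
        return sum(comb(g, j) * x**j for j in range(0, g + 1, 2))
    return num, den

# k = m = 2 gives x(5 + 10x^2 + x^4)/(1 + 10x^2 + 5x^4), as in the text.
num, den = main_diagonal_recursion(2, 2)
x = 5.0
for _ in range(4):
    x = num(x) / den(x)
```

After four fifth-order steps from x_0 = 5, the iterate is at sgn (5) = 1 to machine precision.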
THEOREM 3.2. Let x_0 ∈ C⁺ ∪ C⁻ and let {x_n} be defined by (3.1) for k = m or
k = m − 1. Then

(3.13)  (1 − x_n)/(1 + x_n) = [(1 − x_0)/(1 + x_0)]^{(k+m+1)ⁿ}    for x_0 ∈ C⁺,

and

(3.14)  (1 + x_n)/(1 − x_n) = [(1 + x_0)/(1 − x_0)]^{(k+m+1)ⁿ}    for x_0 ∈ C⁻.

In either case, for s = sgn (x_0),

(3.15)  (s − x_n)/(s + x_n) = [(s − x_0)/(s + x_0)]^{(k+m+1)ⁿ}.

Proof. Equations (3.13) and (3.14) are identical except for being inverses of each
other, to avoid division by zero when x_0 = ∓1. Let x_0 ∈ C⁺ for convenience and set x₁ =
x_0 P_km(1 − x_0²)/Q_km(1 − x_0²). By the preceding remarks, for any x, and k = m or k = m − 1,

(3.16)  Q_km(1 − x²) − xP_km(1 − x²) = (1 − x)^{k+m+1}.

Replacing x by −x gives

(3.17)  Q_km(1 − x²) + xP_km(1 − x²) = (1 + x)^{k+m+1}.
FIG. 1. Padé convergence region for k = 1, m = 0.

FIG. 2. Padé convergence region for k = 2, m = 0.

FIG. 3. Padé convergence region for k = 3, m = 0.

FIG. 4. Padé convergence region for k = 0, m = 2.

FIG. 5. Padé convergence region for k = 2, m = 1.

FIG. 6. Padé convergence region for k = 3, m = 1.

FIG. 7. Padé convergence region for k = 0, m = 3.

FIG. 8. Padé convergence region for k = 1, m = 3.

FIG. 9. Padé convergence region for k = 3, m = 2.


Thus,

  1 − x₁ = (Q_km(1 − x_0²) − x_0 P_km(1 − x_0²))/Q_km(1 − x_0²) = (1 − x_0)^{k+m+1}/Q_km(1 − x_0²),

and

  1 + x₁ = (1 + x_0)^{k+m+1}/Q_km(1 − x_0²).

Dividing, we obtain (3.13) for n = 1. Repeat to get the general statement. □
From Theorem 3.2, we immediately get Theorem 3.3.

THEOREM 3.3 (Global Convergence). If x_0 ∈ C⁺ ∪ C⁻, then for k = m or k =
m − 1, with m ≥ 1, lim_{n→+∞} x_n = sgn (x_0).

Proof. By Theorem 3.2, we need only show that |(1 − x_0)/(1 + x_0)| < 1 for x_0 ∈
C⁺ and |(1 + x_0)/(1 − x_0)| < 1 for x_0 ∈ C⁻. Let x_0 = ρe^{iθ} ∈ C⁺ with −π/2 < θ < π/2.
Then |(1 − x_0)/(1 + x_0)|² = (1 − 2ρ cos θ + ρ²)/(1 + 2ρ cos θ + ρ²) < 1, and similarly
for x_0 ∈ C⁻. □

From Theorem 3.2, we see that the distance measures from x_0 to 1 given by
d₊(x_0) ≡ |(1 − x_0)/(1 + x_0)| and its counterpart for −1, d₋(x_0) ≡ |(1 + x_0)/(1 − x_0)|,
are more natural than |1 − x_0| and |1 + x_0|, respectively. For example, x_0 = 10⁻⁶ and
1/x_0 = 10⁶ are equidistant from 1 under the regular Newton method (since (1.1) is
symmetric with respect to x_0 and 1/x_0) but |1 − x_0| ≈ 1 while |1 − 1/x_0| ≈ 10⁶. (See
[3] and [15].)
Theorem 3.2 is also useful in establishing the equivalence of certain methods in
the Padé table. For example, if x_0 ∈ C⁺, then two steps of the inverse Newton method
(k = 0, m = 1) give

(3.18)  (1 − x₂)/(1 + x₂) = [(1 − x_0)/(1 + x_0)]⁴.

However, if x̂₁ denotes the result of taking one step from x_0 with the recursion (k = 1,
m = 2), then

(3.19)  (1 − x̂₁)/(1 + x̂₁) = [(1 − x_0)/(1 + x_0)]⁴.

Solving for x₂ and x̂₁, we find x₂ = x̂₁. Similarly, if we take one step with (k = 0,
m = 1) followed by a step with (k = 3, m = 3) the result would be the same as one step
with (k = 6, m = 7).
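The (3.18)/(3.19) example can be confirmed numerically (illustrative sketch; the [1/2] formula is the Table 1 entry):

```python
# Two inverse Newton steps (k = 0, m = 1) versus one [1/2] step:
# both have order (0 + 1 + 1)^2 = 1 + 2 + 1 = 4, so the results coincide.
x0 = 0.7 + 2.0j

x = x0
for _ in range(2):
    x = 2 * x / (1 + x**2)                            # (3.2), twice

x_hat = 4 * x0 * (1 + x0**2) / (1 + 6 * x0**2 + x0**4)  # [1/2] entry of Table 1

assert abs(x - x_hat) < 1e-10
```

This is exactly the Equivalency Theorem 3.4 below, specialized to r = 2, p = 1.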
THEOREM 3.4 (Equivalency). Let x_0 ∈ C⁺ ∪ C⁻ and let x_r be the result of applying
r steps of the (possibly different) main diagonal Padé recursions [k₁/m₁], ..., [k_r/m_r].
Then x_r = x̂_p, where x̂_p is obtained by p main diagonal steps [k̂₁/m̂₁], ..., [k̂_p/m̂_p],
provided that both are of the same order, i.e.,

(3.20)  ∏_{i=1}^{r} (k_i + m_i + 1) = ∏_{i=1}^{p} (k̂_i + m̂_i + 1).

Proof. Applying Theorem 3.2 for each individual step, if x_0 ∈ C⁺,

  (1 − x_r)/(1 + x_r) = [(1 − x_0)/(1 + x_0)]^{∏(k_i+m_i+1)}
                      = [(1 − x_0)/(1 + x_0)]^{∏(k̂_i+m̂_i+1)} = (1 − x̂_p)/(1 + x̂_p).

Solving for x_r and x̂_p gives x_r = x̂_p. If x_0 ∈ C⁻, use (3.14). □
4. Other rational methods. In this section we consider other rational iterations,
including eigenvalue assignment methods, Cayley transform methods, and Laurent series
methods. Eigenvalue assignment methods were introduced by Balzer [3], in the form of
scaled Newton methods which move specified real eigenvalues to x = 1 in one step.
These methods were shown to be globally but not quadratically convergent. By using the
methods of Theorem 3.2, it is easy to construct globally convergent methods of arbitrarily
high order that will move any selected set {λ_k} of real or complex conjugate eigenvalues
to x = 1 in one step.

For example, if we want a fourth-order method which assigns λ₁ = 2, λ₂ = 1 + i,
and λ₃ = 1 − i to x = 1, then we let −xp(x²) and q(x²) be, respectively, the odd and
even terms in the expansion of (1 − x)⁴(2 − x)(1 + i − x)(1 − i − x):

  (1 − x)⁴(2 − x)(1 + i − x)(1 − i − x)
      = 4 − 22x + 52x² − 69x³ + 56x⁴ − 28x⁵ + 8x⁶ − x⁷.

Then

  xp(x²) = 22x + 69x³ + 28x⁵ + x⁷ = x(22 + 69x² + 28x⁴ + x⁶),
  q(x²) = 4 + 52x² + 56x⁴ + 8x⁶ = 4(1 + 13x² + 14x⁴ + 2x⁶),

and the desired iteration is

  x_{n+1} = x_n(22 + 69x_n² + 28x_n⁴ + x_n⁶) / (4(1 + 13x_n² + 14x_n⁴ + 2x_n⁶)).
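The construction above is mechanical, so it can be automated; a sketch (function names ours) that rebuilds the fourth-order example and checks the one-step assignment:

```python
import numpy as np

def assignment_step(gamma, lambdas):
    """One step of the assignment iteration (4.3): expand
    g(x) = (1 - x)^gamma * prod (lambda_i - x) in increasing powers of x;
    the even part of g is q(x^2), the odd part is -x p(x^2), so the
    step is x -> x p(x^2)/q(x^2) = -odd(x)/even(x)."""
    g = np.array([1.0 + 0.0j])
    for _ in range(gamma):
        g = np.convolve(g, [1.0, -1.0])      # multiply by (1 - x)
    for lam in lambdas:
        g = np.convolve(g, [lam, -1.0])      # multiply by (lambda - x)
    def step(x):
        odd = sum(g[j] * x**j for j in range(1, len(g), 2))
        even = sum(g[j] * x**j for j in range(0, len(g), 2))
        return -odd / even
    return step

# Fourth-order method assigning 2, 1 + i, 1 - i to +1, as in the text.
step = assignment_step(4, [2.0, 1.0 + 1.0j, 1.0 - 1.0j])
```

Because the set {λ_i} is conjugate symmetric, the expanded coefficients are real, matching the expansion displayed above.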
In order to prove global convergence for these assignment methods, we need the
following lemma.

LEMMA 4.1. Let Re z > 0, Re λ > 0, and r > 0. Then

(4.1)  |(r − z)/(r + z)| < 1,

and

(4.2)  |(λ̄ − z)(λ − z)| < |(λ̄ + z)(λ + z)|.

Proof. If we set x = z/r, then Re x > 0 and

  |(r − z)/(r + z)| = |(1 − x)/(1 + x)| < 1,

as in the proof of Theorem 3.3. Now say λ = re^{iφ}, z = ρe^{iθ}, where φ, θ ∈ (−π/2, π/2).
Then

  |(λ̄ − z)(λ − z)|² = (r² − 2ρr cos θ cos φ + ρ² cos 2θ)²
                         + sin² θ (2ρ² cos θ − 2ρr cos φ)²
                    < (r² + 2ρr cos θ cos φ + ρ² cos 2θ)²
                         + sin² θ (2ρ² cos θ + 2ρr cos φ)²
                    = |(λ̄ + z)(λ + z)|². □

THEOREM 4.2. Let {λ₁, λ₂, ..., λ_μ} be a conjugate symmetric set in the open
right-half plane and −xp(x²) and q(x²) be, respectively, the odd and even parts of
(1 − x)^γ(λ₁ − x)···(λ_μ − x). Then the iterative method

(4.3)  x_{n+1} = x_n p(x_n²)/q(x_n²)

is globally convergent of order γ for s = sgn (x_0), and takes {λ₁, λ₂, ..., λ_μ} to x = 1
in one step. Moreover,

(4.4)  (s − x_{n+1})/(s + x_{n+1}) = [(s − x_n)/(s + x_n)]^γ ∏_{i=1}^{μ} (λ_i − sx_n)/(λ_i + sx_n),

and

(4.5)  |(s − x_{n+1})/(s + x_{n+1})| < |(s − x_n)/(s + x_n)|^γ.

Proof. We shall prove (4.4) and (4.5) for the case s = 1, since the case s = −1
follows immediately. From (4.3),

  (1 − x_{n+1})/(1 + x_{n+1}) = (q(x_n²) − x_n p(x_n²))/(q(x_n²) + x_n p(x_n²))
      = [(1 − x_n)^γ(λ₁ − x_n)···(λ_μ − x_n)] / [(1 + x_n)^γ(λ₁ + x_n)···(λ_μ + x_n)],

which proves (4.4). Inequality (4.5) then follows from (4.4) and Lemma 4.1. □
Remark 1. Since xp(x²)/q(x²) in (4.3) is an odd function, it also moves
{−λ₁, −λ₂, ..., −λ_μ} to −1 in one step.

Remark 2. In [10], Gander gives a family of quadratically convergent methods
which depend on a parameter f:

(4.6)  x_{n+1} = x_n(2f − 3 + x_n²)/(f − 2 + f x_n²).

In Theorem 2 of [10], it is shown that (4.6) is globally convergent for f > 2 and for
f = 3 gives Halley's method, which is cubically convergent. For f < 2, prescaling must
be done to ensure convergence. We can interpret Gander's method as a second-order
method which makes one real eigenvalue assignment. Expand

  (1 − x)²(λ − x) = λ − (2λ + 1)x + (2 + λ)x² − x³,

and use the method of Theorem 4.2 to obtain the iteration

(4.7)  x_{n+1} = x_n(2λ + 1 + x_n²)/(λ + (2 + λ)x_n²).

This is the same as (4.6) for λ = f − 2. Thus the condition f > 2 for global convergence
in (4.6) is just the requirement that the real eigenvalue λ, which gets mapped to x = 1,
must be in the right-half plane as in Theorem 4.2. Moreover, f = 3 corresponds to λ = 1
being triply assigned to x = 1, so that the iteration is cubically convergent (Halley's
method).
Remark 3. Allowing some of the eigenvalues λ in Theorem 4.2 to be multiple
results in methods in which λ is mapped to x = 1 and points near λ are taken at least
quadratically to x = 1. For example, expanding (1 − x)²(2 − x)² gives the second-order
method in which λ = 2 is doubly assigned to one:

  x_{n+1} = x_n(12 + 6x_n²)/(4 + 13x_n² + x_n⁴).

If x_0 = 2.1, then x₁ = .99985....
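A quick check of Remark 3 (illustrative sketch; the expansion (1 − x)²(2 − x)² = 4 − 12x + 13x² − 6x³ + x⁴ supplies the odd and even parts):

```python
def step(x):
    """Doubly assigned lambda = 2: x -> x(12 + 6x^2)/(4 + 13x^2 + x^4)."""
    return x * (12 + 6 * x**2) / (4 + 13 * x**2 + x**4)

x = step(2.1)   # starting near the assigned eigenvalue
```

The assigned point itself maps to 1 exactly, and x_0 = 2.1 lands at about 0.99985, as quoted.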
As indicated in the Introduction, another family of methods can be based on the
Cayley transform. For x ≠ −1, let

(4.8)  y = (1 − x)/(1 + x).

Let x̂_ν denote the result of multiplying y by itself ν times and then transforming back:

(4.9)  x̂_ν = (1 − y^ν)/(1 + y^ν).

From this we see that

(4.10)  (1 − x̂_ν)/(1 + x̂_ν) = y^ν = [(1 − x)/(1 + x)]^ν.

Now suppose that x_n is defined by (3.1) for one of the main diagonal (k = m or k =
m − 1) Padé recursions, where ν = (k + m + 1)ⁿ. By the Equivalency Theorem 3.4, we
must have

(4.11)  x̂_ν = x_n.

Thus the Cayley transform method and the Padé recursions produce exactly the same
results, except that the arithmetic operations of inversion and multiplication have been
rearranged. It was pointed out earlier that this can have a significant effect in the matrix
case, since the Cayley transform approach is multiplication-rich compared to the Padé
methods. We now extend the Cayley transform method to the case where x = −1 or
where −1 is an eigenvalue of X in the matrix case.

From (4.8) and (4.9),

(4.12)  x̂_ν = (1 − y^ν)/(1 + y^ν) = [(1 + x)^ν − (1 − x)^ν]/[(1 + x)^ν + (1 − x)^ν].

The next lemma shows that the right-hand side of (4.12) is well defined for any x which
is not on the imaginary axis.

LEMMA 4.3. Let x ∈ C⁺ ∪ C⁻. Then (1 + x)^ν + (1 − x)^ν ≠ 0 for any positive
integer ν.

Proof. Suppose to the contrary that (1 + x)^ν + (1 − x)^ν = 0. Then x ≠ 1, so
(1 + x)^ν/(1 − x)^ν = −1. This means that (1 + x)/(1 − x) is a νth root of −1: (1 + x)/
(1 − x) = e^{iθ}, where θ is not an odd multiple of π (else x = ∞). Solving for x we find
x = (sin θ/(1 + cos θ))i ∉ C⁺ ∪ C⁻, which is a contradiction. □
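The form (4.12) carries over directly to matrices; a sketch (function name and test matrix are ours), where Lemma 4.3 guarantees the inverse exists whenever X has no purely imaginary eigenvalues:

```python
import numpy as np

def sign_cayley(X, nu):
    """Cayley power method in the form (4.12): with A = I + X and B = I - X,
    the nu-th iterate is (A^nu - B^nu)(A^nu + B^nu)^{-1}."""
    I = np.eye(X.shape[0])
    A = np.linalg.matrix_power(I + X, nu)
    B = np.linalg.matrix_power(I - X, nu)
    return (A - B) @ np.linalg.inv(A + B)

X = np.array([[0.5, 1.0],
              [0.0, -2.0]])
S = sign_cayley(X, 64)   # e.g., nu = gamma^n for some main diagonal recursion
```

Whatever the size of ν, only two inversion-free power computations and one inverse are needed, which is the multiplication-rich advantage noted above.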
We end this section with a short discussion of Laurent methods, which are polynomial
iterations in x and x^{-1} of the form

  x_{n+1} = Σ_{j=-ν}^{ν} b_{jν} x_n^j.

These methods are motivated by a desire to generate a "multiplication-rich" iteration
once X^{-1} has been computed. For example, Newton's method is of this form with ν =
1, b_{-1,1} = 1/2 = b_{1,1}. If we let

  L(x) = Σ_{j=-ν}^{ν} b_{jν} x^j,

then the coefficients b_{jν} can be determined from L(1) = 1, L′(1) = 0, ···, L^{(2ν-1)}(1) = 0.
(Other conditions which assign specified eigenvalues to x = 1 can be used as well.)

FIG. 10. Laurent convergence region for ν = 3.

FIG. 11. Laurent convergence region for ν = 5.

Because of symmetry reasons we generally want L to be an odd function, L(−x) =
−L(x), so that ν should be odd and b_{jν} = 0 whenever j is even. After Newton's method
(ν = 1), the next two methods (ν = 3 and ν = 5) are of order four and six, respectively,
and take the form

  X_{n+1} = (1/16)(−X_n^{-3} + 9X_n^{-1} + 9X_n − X_n³)    for ν = 3,

  X_{n+1} = (1/7552)(73X_n^{-5} − 660X_n^{-3} + 4270X_n^{-1} + 4580X_n − 815X_n³ + 104X_n⁵)    for ν = 5.

These methods are multiplication-rich in the sense that they require one matrix inversion
and ν + 1 multiplies per step. However, they are not globally convergent and, in
fact, the region of convergence for these two methods does not even include the set
|x² − 1| < 1, as do the Padé methods. This is illustrated in Figs. 10 and 11, where the
set |x² − 1| = 1 is included for comparison.
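A sketch of the ν = 3 Laurent step (our reading of the displayed coefficients, (−x^{-3} + 9x^{-1} + 9x − x³)/16, which does satisfy L(1) = 1 and L′(1) = L″(1) = L‴(1) = 0):

```python
import numpy as np

def laurent3(X):
    """One step of the nu = 3 Laurent method:
    X_{n+1} = (-X^{-3} + 9 X^{-1} + 9 X - X^3)/16."""
    Xi = np.linalg.inv(X)
    return (-np.linalg.matrix_power(Xi, 3) + 9 * Xi + 9 * X
            - np.linalg.matrix_power(X, 3)) / 16.0

# Started close enough to the sign (the method is not globally convergent),
# a few fourth-order steps reach sgn(X) = diag(1, -1).
X = np.diag([1.2, -1.1])
for _ in range(3):
    X = laurent3(X)
```

Only one inversion is needed per step; the remaining work is matrix multiplication.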
5. Matrix convergence. In this section we show that convergence in the matrix case
is determined by the scalar convergence of the eigenvalues. This allows us to apply the
scalar convergence results of the previous sections to the matrix case.

The following general result is the key to this process.

LEMMA 5.1. Let R = R(x) be an odd rational function such that R(1) = 1 and
R′(1) = 0. Let x_0 ∈ C⁺ ∪ C⁻ be such that lim_{n→+∞} x_n = sgn (x_0), where x_{n+1} = R(x_n).
Let X_0 be a Jordan block of the form

  X_0 = [ x_0  1            ]
        [      x_0  ⋱       ]
        [           ⋱   1   ]
        [               x_0 ].

Then the matrix sequence defined by X_{n+1} = R(X_n) satisfies lim_{n→+∞} X_n = sgn (X_0).

Proof. Let R₁(x) = R(x), R₂(x) = R(R(x)), and in general R_{n+1}(x) = R(R_n(x)).
Because X_0 is a Jordan block,

(5.1)  X_n = R_n(X_0) = [ a₁  a₂  ⋯  a_ν ]
                        [     a₁  ⋱  ⋮  ]
                        [         ⋱  a₂ ]
                        [             a₁ ],

where ν is the order of X_0 and

  a_j = a_j(n) = (1/(j − 1)!) (d^{j-1} R_n/dx^{j-1})|_{x_0}.

Thus a₁(n) = R_n(x_0) = x_n → sgn (x_0) by assumption. For j = 2,

  a₂(n) = (d/dx) R_n(x)|_{x_0} = (dR/dx)|_{x_{n-1}} · (d/dx) R_{n-1}(x)|_{x_0} = (dR/dx)|_{x_{n-1}} a₂(n − 1)

by the chain rule. But lim_{n→+∞} (dR/dx)|_{x_{n-1}} = (dR/dx)|_{sgn (x_0)} = 0 since sgn (x_0) = ±1
and dR/dx(±1) = 0. Thus a₂(n) → 0.

As an induction hypothesis suppose that a_j(n) → 0 for 2 ≤ j ≤ i − 1. Then by the
chain rule

  a_i(n) = a_i(n − 1) (dR/dx)|_{x_{n-1}} + r_n,

where r_n has a fixed form, independent of n, involving sums and products of a_j(n − 1) for 2 ≤
j ≤ i − 1. Thus r_n → 0 by the induction hypothesis. Since (dR/dx)|_{x_{n-1}} also tends to zero,
we have lim_{n→+∞} a_i(n) = 0. This means that lim_{n→+∞} X_n = sgn (x_0)I = sgn (X_0). □
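Lemma 5.1 is easy to watch in action; an illustrative sketch (our own choice of block and of Halley's method, the matrix form of the k = m = 1 recursion):

```python
import numpy as np

# A 3-by-3 Jordan block with eigenvalue x0 = 2, so sgn(X0) = I, iterated
# under the matrix Halley map X -> X (3I + X^2)(I + 3X^2)^{-1}.
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
I = np.eye(3)
for _ in range(6):
    X = X @ (3 * I + X @ X) @ np.linalg.inv(I + 3 * X @ X)
```

The diagonal entries follow the scalar iterate x_n, while the superdiagonal entries a₂(n), a₃(n) decay to zero as in the proof, leaving the identity.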
Using Lemma 5.1, we obtain the matrix analogues of Theorems 3.1-3.4 and the
Cayley power method.

THEOREM 5.2. Let k ≥ m − 1 and assume that the eigenvalues of X_0 lie in C⁺ ∪ C⁻.
Assume that ‖I − X_0²‖ < 1 and define

  X_{n+1} = X_n P_km(I − X_n²) Q_km^{-1}(I − X_n²).

Then

(5.2)  ‖I − X_n²‖ ≤ ‖I − X_0²‖^{(k+m+1)ⁿ},

and

(5.3)  lim_{n→+∞} X_n = sgn (X_0).

Proof. The condition ‖I − X_0²‖ < 1 ensures that |1 − λ²| < 1 for any eigenvalue
λ of X_0. Hence by Theorem 3.1, the eigenvalues λ_n^{(i)} of X_n converge to sgn (λ_0^{(i)}). By
Lemma 5.1 and the definition of sgn (X_0) in terms of its Jordan form, (5.3) is true. The
matrix inequality (5.2) can be obtained by using the matrix analogue of the arguments
in the proof of Theorem 3.1. □
THEOREM 5.3. Let $\Lambda(X_0) \subset \mathbf{C}^+ \cup \mathbf{C}^-$ and assume that $k = m - 1$ or $k = m$ in (4.3). Then for $\gamma = k + m + 1$,

(5.4)  $\lim_{n \to \infty} X_n = \operatorname{sgn}(X_0) \equiv S,$

(5.5)  $(S - X_n)(S + X_n)^{-1} = [(S - X_0)(S + X_0)^{-1}]^{\gamma^n},$

and

(5.6)  $X_n = (A^{\gamma^n} - B^{\gamma^n})(A^{\gamma^n} + B^{\gamma^n})^{-1},$

where

(5.7)  $A = I + X_0 \quad \text{and} \quad B = I - X_0.$

Proof. By Theorem 3.3 the eigenvalues of $X_0$ converge under (3.1) to the appropriate value of $\pm 1$. By Lemma 5.1, this means that $\lim_{n \to \infty} X_n = \operatorname{sgn}(X_0)$. Equation (5.5) is obtained by considering the individual Jordan blocks and using (3.15). Similarly, use Lemma 4.3 to see that (5.6) is true for each Jordan block and hence for $X_n$ itself. □
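The closed form (5.6)-(5.7) can be checked numerically against the recursion itself. The sketch below is illustrative only, with a hypothetical test matrix: it uses the $(k, m) = (0, 1)$ member, $X_{n+1} = 2X_n(I + X_n^2)^{-1}$, for which $\gamma = 2$, and compares four steps of the recursion with the Cayley power formula $(A^{\gamma^n} - B^{\gamma^n})(A^{\gamma^n} + B^{\gamma^n})^{-1}$.

```python
import numpy as np
from numpy.linalg import inv, matrix_power

I = np.eye(2)
# Hypothetical example with no eigenvalues on the imaginary axis.
X0 = np.array([[2.0, 1.0],
               [0.0, -3.0]])

# Pade recursion with (k, m) = (0, 1): X_{n+1} = 2 X_n (I + X_n^2)^{-1}, gamma = 2.
X = X0.copy()
n = 4
for _ in range(n):
    X = 2 * X @ inv(I + X @ X)

# Closed form (5.6)-(5.7): X_n = (A^{gamma^n} - B^{gamma^n})(A^{gamma^n} + B^{gamma^n})^{-1}.
A, B = I + X0, I - X0
g = 2 ** n  # gamma^n = 16
Ag, Bg = matrix_power(A, g), matrix_power(B, g)
X_closed = (Ag - Bg) @ inv(Ag + Bg)

print(np.max(np.abs(X - X_closed)))  # agreement to roundoff
```

The agreement holds because both sides are the same rational function of $X_0$; the closed form trades $n$ matrix inversions for repeated squaring of $A$ and $B$, which is the multiplication-rich Cayley power method.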

6. Conclusion. In this paper, we have presented a theory of rational recursions for the matrix sign function, including Padé, Laurent, Cayley transform, and eigenvalue assignment methods. Of particular interest are the globally convergent main diagonal Padé iterations and their multiplication-rich Cayley transform equivalents.

Several important aspects concerning the numerical evaluation of sign function iterations have been treated elsewhere and so have not been discussed here. For example, scaling can significantly increase the speed of convergence of $X_n$ to $\operatorname{sgn}(X)$ as noted in [3] and [4]; for scaling related to the polar decomposition, see [11]. The choice of optimal and nearly optimal scaling constants for Newton's method is discussed at length in [15] and it is not hard to adapt these results to the main diagonal Padé recursions. Similarly, the problem of estimating the sensitivity of the sign of a matrix is considered in [16], based on the work in [8], [14], and [20].
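As an illustration of the scaling remark above, the following sketch compares plain Newton iteration with determinantal scaling $c_n = |\det X_n|^{-1/N}$, one standard choice; the optimal and nearly optimal constants of [15] are not reproduced here, and the test matrix is hypothetical.

```python
import numpy as np

def newton_sign(X0, scale=False, tol=1e-12, max_iter=100):
    """Newton recursion (1.1), with optional determinantal scaling."""
    X = X0.astype(float).copy()
    N = X.shape[0]
    for it in range(1, max_iter + 1):
        if scale:
            # Determinantal scaling: push |det(cX)| to 1 before each step.
            c = abs(np.linalg.det(X)) ** (-1.0 / N)
            X = c * X
        X_new = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new, it
        X = X_new
    return X, max_iter

# Hypothetical example: widely spread eigenvalues slow the unscaled iteration.
X0 = np.diag([1000.0, 0.001, -10.0])
S_plain, it_plain = newton_sign(X0, scale=False)
S_scaled, it_scaled = newton_sign(X0, scale=True)
print(it_plain, it_scaled)
```

Both variants return $\operatorname{sgn}(X_0) = \operatorname{diag}(1, 1, -1)$, but the scaled iteration needs markedly fewer steps on this example.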

REFERENCES

[1] G. BAKER, Essentials of Padé Approximants, Academic Press, New York, 1975.
[2] G. BAKER AND P. GRAVES-MORRIS, Padé Approximants, Vol. 13, Encyclopedia of Mathematics and Its Applications, Gian-Carlo Rota, ed., Addison-Wesley, London, 1981.
[3] L. A. BALZER, Accelerated convergence of the matrix sign function method of solving Lyapunov, Riccati and other matrix equations, Internat. J. Control, 32 (1980), pp. 1057-1078.
[4] G. BIERMAN, Computational aspects of the matrix sign function solution to the ARE, in Proc. 23rd Conference on Decision and Control, Las Vegas, NV, 1984, pp. 514-519.
[5] Å. BJÖRCK AND C. BOWIE, An iterative algorithm for computing the best estimate of an orthogonal matrix, SIAM J. Numer. Anal., 8 (1971), pp. 358-364.
[6] H. BROLIN, Invariant sets under iteration of rational functions, Ark. Mat., 6 (1967), pp. 103-144.
[7] R. BYERS, Solving the algebraic Riccati equation with the matrix sign function, Linear Algebra Appl., 85 (1987), pp. 267-279.
[8] A. CLINE, C. B. MOLER, G. W. STEWART, AND J. H. WILKINSON, An estimate for the condition number of a matrix, SIAM J. Numer. Anal., 16 (1979), pp. 368-375.
[9] E. D. DENMAN AND A. N. BEAVERS, The matrix sign function and computation in systems, Appl. Math. Comput., 2 (1976), pp. 63-94.
[10] W. GANDER, Algorithms for the polar decomposition, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 1102-1115.
[11] N. J. HIGHAM, Computing the polar decomposition—with applications, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 1160-1174.
[12] N. J. HIGHAM AND R. S. SCHREIBER, Fast polar decomposition of an arbitrary matrix, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 648-655.
[13] C. KENNEY AND A. J. LAUB, Padé error estimates for the logarithm of a matrix, Internat. J. Control, 50 (1989), pp. 707-730.
[14] ——, Condition estimates for matrix functions, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 191-209.
[15] ——, On scaling Newton's method for polar decomposition and the matrix sign function, SIAM J. Matrix Anal. Appl., 13 (1992), to appear.
[16] ——, Polar decomposition and matrix sign function condition estimates, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 488-504.
[17] Z. KOVARIK, Some iterative methods for improving orthonormality, SIAM J. Numer. Anal., 7 (1970), pp. 386-389.
[18] R. B. LEIPNIK, Rapidly convergent recursive solution of quadratic operator equations, Numer. Math., 17 (1971), pp. 1-16.
[19] H. O. PEITGEN, D. SAUPE, AND F. V. HAESELER, Cayley's problem and Julia sets, Math. Intelligencer, 6 (1984), pp. 11-20.
[20] J. R. RICE, A theory of condition, SIAM J. Numer. Anal., 3 (1966), pp. 287-310.
[21] J. D. ROBERTS, Linear model reduction and solution of the algebraic Riccati equation by use of the sign function, Internat. J. Control, 32 (1980), pp. 677-687.
[22] G. SZEGŐ, Orthogonal Polynomials, American Mathematical Society, Providence, RI, 1939.
