Yun-Chian Cheng, David J. Houck, Jun-Min Liu, Marc S. Meketon, Lev Slutsman, and Pyng Wang are in the Advanced Decision Support Systems Department at AT&T Bell Laboratories in Holmdel, New Jersey. Robert J. Vanderbei is a member of technical staff in the Mathematics of Networks and Systems Department at AT&T Bell Laboratories in Murray Hill, New Jersey. Mr. Cheng joined the company in 1985 and has a Ph.D. in computer science from the University of Wisconsin at Madison. Mr. Houck joined the company in 1979 and has both a B.A. in mathematics and a Ph.D. in operations research from The Johns Hopkins University. Mr. Meketon joined the company in 1982 and has a B.S. in mathematics from Villanova University and both an M.S. and Ph.D. in operations research from Cornell University. (continued on page 19)

This paper describes the AT&T KORBX system, which implements certain variants of the Karmarkar algorithm on a computer that has multiple processors, each capable of performing vector arithmetic. We provide an overview of the KORBX system that covers its optimization algorithms, the hardware architecture, and software implementation techniques. Performance is characterized for a variety of applications found in industry, government, and academia.

Introduction

This paper describes the AT&T KORBX system, which comprises several variants of the Karmarkar linear programming algorithm implemented on a parallel/vector machine called the KORBX system processor. Within AT&T, the KORBX system is being used to solve difficult planning problems, such as the Pacific Basin network-facilities-planning problem. The KORBX system is also being used by commercial and government customers for many types of applications. For example, it is being used by the U.S. Air Force Military Airlift Command (MAC) to solve critical logistics problems and by commercial airlines to solve scheduling problems, such as crew planning. (Panel 1 defines terms and acronyms used in this paper.)

Before getting into details, we should address the question: What is a linear program? It is an optimization problem in which one wants to minimize or maximize a linear function (of a large number of variables) subject to a large number of linear equality and inequality constraints. For example, linear programming may be used in manufacturing-production planning to develop a production schedule that meets demand and workstation capacity constraints, while minimizing production costs.

Since 1947, when the conventional simplex method was proposed by George Dantzig (Reference 1), linear programming has been effectively employed to solve decision-making problems in nearly all sectors of industry. Linear programming is also commonly used in engineering and scientific analyses. The KORBX system implements new technol-
KORBX system does indeed handle problems with finite upper bounds.

The constraints Ax = b form an (n - m)-dimensional hyperplane in an n-dimensional space. Each nonnegativity constraint xᵢ ≥ 0 cuts off a halfspace from the hyperplane, resulting in a multidimensional polyhedron (see Figure 1) that is called the feasible region. Associated with each point in the feasible region is a cost cᵀx. The objective is to find a feasible point that minimizes this cost. This is called an optimal solution. If the problem has a single optimal solution, it must be at a vertex of the polyhedron. However, most real-world problems have multiple optima. The set of multiple optima always consists of some face of the polyhedron. In either case, there are optimal solutions that are at vertices.

The simplex method starts at a vertex of the polyhedron and, by jumping to an adjacent vertex, reduces the objective function value in each iteration. In contrast, the Karmarkar algorithm and its variants start at an interior point and then jump in a direction of descent from one interior point to another, eventually converging to an optimal solution.

Corresponding to any linear program is another linear program, called its dual. The dual linear program that corresponds to program (2), assuming the upper bounds are infinite, is:

    maximize   bᵀw
    subject to Aᵀw ≤ c                                   (3)

The dual linear program is important for several reasons.
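To make the primal-dual pair concrete, here is a small pure-Python sketch. The matrix A and the vectors b, c, x, and w are invented purely for illustration, and the helper names are our own; the fragment simply checks primal feasibility (Ax = b, x ≥ 0), dual feasibility (Aᵀw ≤ c), and the sign of the duality gap cᵀx - bᵀw.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def duality_gap(A, b, c, x, w, tol=1e-9):
    """Verify the feasibility hypotheses and return c^T x - b^T w."""
    assert all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b)), "x not primal feasible"
    assert all(xi >= -tol for xi in x), "x violates nonnegativity"
    At = list(map(list, zip(*A)))                      # A transpose
    assert all(s <= ci + tol for s, ci in zip(matvec(At, w), c)), "w not dual feasible"
    return dot(c, x) - dot(b, w)

# Tiny example: minimize x1 + 2*x2  subject to  x1 + x2 = 2, x >= 0.
A, b, c = [[1.0, 1.0]], [2.0], [1.0, 2.0]
w = [1.0]                                   # dual feasible: A^T w = (1, 1) <= (1, 2)
print(duality_gap(A, b, c, [0.0, 2.0], w))  # suboptimal primal point: gap = 2.0
print(duality_gap(A, b, c, [2.0, 0.0], w))  # optimal primal point: gap = 0.0
```

Consistent with the duality theorem of linear programming, the gap is nonnegative for every feasible pair and shrinks to zero exactly at optimality.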
AT&T TECHNICAL JOURNAL, MAY/JUNE 1989
For example, it is used to check for optimality. Indeed, the duality theorem of linear programming says:

THEOREM: If x′ is primal feasible (i.e., Ax′ = b, x′ ≥ 0) and w′ is dual feasible (i.e., Aᵀw′ ≤ c), then the duality gap cᵀx′ - bᵀw′ is nonnegative. Furthermore, the duality gap is zero if and only if x′ is optimal for the primal and w′ is optimal for the dual.

The optimizer in the KORBX system implements several variants of the Karmarkar algorithm. Each variant is an iterative algorithm. In every iteration, regardless of the method selected, a primal solution vector x and a dual solution vector w are calculated. If (within a tolerance) x is primal feasible, w is dual feasible, and the duality gap is zero, then the system terminates and the current solution is declared optimal.

Algorithms

The variants of the Karmarkar algorithm implemented in the KORBX system can be divided into three basic categories:
- Primal-affine (References 3 through 6). The primal-affine algorithm first solves an augmented linear program in order to find a primal feasible solution. Then, it generates a sequence of primal feasible solutions with decreasing objective value, until eventually the corresponding dual solution is feasible and the duality gap is zero.
- Dual-affine (References 7 and 8). The dual-affine algorithm works the other way around. It starts by finding a dual feasible solution and iterates, while maintaining dual feasibility, until eventually the corresponding primal solution is feasible and the duality gap is zero.
- Primal-dual (References 9 and 10). The primal-dual algorithm first achieves and then maintains both primal and dual feasibility, while all along reducing the duality gap until it is zero.
The affine algorithms (both primal and dual) are often called affine-scaling algorithms.

There are several algorithmic options that may be used with the three basic methods outlined above. We mention here the two most important examples: power-series enhancement and type of equation solver.

Power-Series Enhancement. The idea in power-series enhancement is as follows. For all three methods, at each iteration, the current solution is used to compute a direction, and a large step is then taken along this computed direction. If these long steps are replaced by shorter and shorter steps, then, in the limit, we arrive at a smooth curve that traces a path from any given feasible starting point to an optimal solution. This smooth curve is called a continuous trajectory. The step directions in the basic methods are simply first derivatives of a parametrized representation of the curve (i.e., the tangent direction).

By computing higher-order derivatives to generate a truncated power-series expansion, it is possible to follow the continuous trajectories more closely than by using only the first derivative (see Figure 2). Doing this produces an algorithm that requires fewer iterations at the expense of more work per iteration. Furthermore, it yields a more robust algorithm.

Currently, the power-series enhancement is implemented only for the dual-affine and primal-dual methods. Details of the power-series enhancement for these two methods are given in Reference 11.

Equation Solver. The second important option regards the method for solving systems of equations of the form:

    AD²Aᵀy = d                                           (4)

where d is a known vector and D is a known diagonal matrix (i.e., a square matrix with zeros off the diagonal). This is the most computationally intensive part of any variant of the Karmarkar algorithm. By default, all the methods use Cholesky factorization (see, for example, Reference 12). The idea here is to first factor AD²Aᵀ into the product of a lower triangular matrix L (called the Cholesky factor) and its transpose Lᵀ:

    AD²Aᵀ = LLᵀ
Figure 2. Power-series trajectories. Starting from a solution near the center of the polytope, the green line is the order-1 trajectory, the blue line is the order-2 trajectory, the red line is the order-3 trajectory, and the gold line is the continuous trajectory. The optimal point is labeled x*. Note that the higher-order trajectories track the continuous trajectory better than the lower-order, power-series trajectories.

To solve system (4), one exploits the fact that L is lower triangular to first solve for z:

    Lz = d

by a simple process called forward substitution. Then, y is the solution to:

    Lᵀy = z

obtained by a similar process called backward substitution. This method for solving systems of equations is well understood (it is essentially the same as Gaussian elimination) and yields robust codes. Reference 13 gives some details of how Cholesky factorization and forward and backward substitution are implemented in the KORBX system.

The amount of work to compute L depends greatly on the nonzero structure of AD²Aᵀ. Typically, constraint matrices have only a few nonzeros in each column. The nonzero structure of AD²Aᵀ, which is the same for any diagonal matrix D, usually is also sparse. However, the Cholesky factor L turns out to be denser than the original AD²Aᵀ. The extra nonzeros in L are referred to as fill-in. By carefully ordering the rows of A, it is possible to prevent substantial fill-in, thereby approximately minimizing both memory requirements and computational effort (see Figure 3). The most common reordering heuristic (and the one used in the KORBX system) is called minimum-degree ordering (see, for example, Reference 14).

Another method for solving system (4) is the conjugate-gradient method (see, for example, Reference 12). This is an iterative method that, at each iteration, generates an improved estimate of y. If AD²Aᵀ is approximately the identity matrix, then this method converges quickly. The preconditioned conjugate-gradient method uses an approximate Cholesky factorization to transform AD²Aᵀ into such an approximate identity matrix.

Because there are many implementation details (including the choice of preconditioner), developing a robust and quick algorithm is difficult. Some details of our implementation, including new data structures, have been described in Reference 15. Currently, the preconditioned conjugate-gradient option is available only with the dual-affine method and precludes using the power-series enhancement. The advantage of using the preconditioned conjugate-gradient option is that, when properly tuned, it uses less memory than Cholesky factorization. Hence, extremely large problems can be solved.
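The factor-then-substitute pipeline for system (4) can be sketched in a few lines of pure Python. This is only a dense toy model, not the KORBX implementation (which handles large sparse systems); the small matrix M is a made-up stand-in for AD²Aᵀ.

```python
import math

def cholesky(M):
    """Factor a symmetric positive-definite M into L * L^T, L lower triangular."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, d):
    """Solve L z = d by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((d[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def backward_sub(L, z):
    """Solve L^T y = z by backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (z[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))) / L[i][i]
    return y

# Solve M y = d, with M standing in for AD^2 A^T.
M = [[4.0, 2.0], [2.0, 3.0]]
d = [6.0, 5.0]
L = cholesky(M)
y = backward_sub(L, forward_sub(L, d))
print(y)   # approximately [1.0, 1.0], since M * (1, 1) = (6, 5)
```

Both substitutions cost only O(n²) once the O(n³) factorization is done, which is why the factorization dominates each Karmarkar iteration.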
Figure 3. Reordering the matrix for less fill-in. (a) A matrix; shows the nonzero structure of the A matrix. (b) AD²Aᵀ matrix; shows the nonzero structure below the diagonal of the AD²Aᵀ matrix. (c) Cholesky factor L with original ordering; shows the nonzero structure below the diagonal of the Cholesky factor. After the matrix has been reordered using the minimum-degree heuristic, there are fewer nonzeros in the Cholesky factor. (d) Cholesky factor after row reordering; shows the nonzeros below the diagonal, reordered by the minimum-degree heuristic.
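The effect of elimination order on fill-in can be demonstrated with a small symbolic-factorization sketch (our own simplified graph model, not the KORBX code). For an "arrow" sparsity pattern, eliminating the one dense row first fills in the entire matrix, whereas eliminating it last, as a minimum-degree ordering would, produces no fill-in at all.

```python
def fill_in(n, edges, order):
    """Count fill-in produced by symbolic elimination of a symmetric
    sparsity pattern.  edges: set of frozenset({i, j}) off-diagonal nonzeros."""
    adj = {v: set() for v in range(n)}
    for e in edges:
        i, j = tuple(e)
        adj[i].add(j)
        adj[j].add(i)
    pos = {v: k for k, v in enumerate(order)}
    fill = 0
    for v in order:
        # Eliminating v connects all of its not-yet-eliminated neighbors.
        later = [u for u in adj[v] if pos[u] > pos[v]]
        for a in range(len(later)):
            for b in range(a + 1, len(later)):
                u, w = later[a], later[b]
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
    return fill

# "Arrow" pattern: variable 0 is coupled to every other variable.
n = 6
arrow = {frozenset({0, j}) for j in range(1, n)}
print(fill_in(n, arrow, order=list(range(n))))      # dense row first: 10 fill entries
print(fill_in(n, arrow, order=[1, 2, 3, 4, 5, 0]))  # dense row last: 0 fill entries
```

The minimum-degree heuristic generalizes the second ordering: at each step it eliminates a vertex of smallest current degree, which tends to keep the Cholesky factor nearly as sparse as AD²Aᵀ itself.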
cally intensive calculation, and six microprocessor-based interactive processors (IPs) that are used for other types of work (for example, editing a file). The standard configuration includes 256 megabytes (Mbytes) of main memory, 512 kilobytes (kbytes) of high-speed cache memory, four 1.1-gigabyte (Gbyte) disk drives, a tape drive, a printer, and communications hardware. Each CE can process instructions separately and has full access to memory. Also, there are hardware-level memory and process locks. Efficient use of the eight parallel processors can lead to an eightfold speedup over single-processor times. Each CE is a vector processor. In fact, each CE includes eight vector registers, and each register can handle 32 double-precision numbers at one time. Efficient use of the vector hardware can lead to a fourfold speedup over scalar processing. However, on some sparse matrix computations, the vectorizing benefits are generally modest.

In scalar mode, each processor operates at about 1.0 million floating-point operations per second (MFLOPS) for double-precision arithmetic written in Fortran. (The floating-point hardware follows the IEEE standard.) If a Fortran routine reaps full benefit from vectorization and parallelization, then one should expect to see about 32 MFLOPS. By carefully managing main memory and cache memory, it is possible to do even better. For example, we have a dense matrix-multiplication routine (written in assembly language and optimized to the greatest extent possible) that achieves 53 MFLOPS. For linear programs, a reasonable rule of thumb is to expect about 37 MFLOPS on dense problems and about 2 MFLOPS on sparse ones.

KORBX System Software. The KORBX operating system, an extension of the UNIX operating system, has been specially designed to accommodate parallel processing. It supports virtual memory and, hence, handles processes that require up to about 1 Gbyte of memory. Included with the system are Fortran and C language compilers. The Fortran compiler has a loop analyzer that tries to vectorize and parallelize a user's program automatically. In addition, there are compiler directives (written as comment statements) that allow the programmer to control the loop analyzer. But for complicated subroutines such as some of those in opt, assembly language programming may be needed to achieve the greatest possible parallelization and vectorization. For C programmers, a library provides many routines for generating parallel algorithms.
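A back-of-envelope model ties these figures together: about 1.0 MFLOPS scalar per CE, up to eightfold from the eight CEs, and up to fourfold from vectorization, which multiplies out to the quoted 32 MFLOPS for fully vectorized, parallelized code. The Amdahl-style sketch below makes the arithmetic explicit; the fractions in the second call are invented, purely to illustrate why sparse problems, which vectorize poorly, fall so far short of the peak.

```python
def expected_mflops(scalar_mflops, n_processors, vector_speedup,
                    parallel_fraction=1.0, vector_fraction=1.0):
    """Amdahl-style estimate: only the parallelizable and vectorizable
    fractions of the work enjoy the corresponding speedup."""
    par = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)
    vec = 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)
    return scalar_mflops * par * vec

# Fully vectorized, fully parallelized Fortran: 1.0 * 8 * 4 = 32 MFLOPS.
print(expected_mflops(1.0, 8, 4))
# Hypothetical sparse workload with low parallel/vector fractions falls
# to roughly the ~2 MFLOPS the text reports (fractions are illustrative).
print(expected_mflops(1.0, 8, 4, parallel_fraction=0.5, vector_fraction=0.25))
```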
Problem     Rows     Columns   Nonzeros in         Comparative environment          Comparative   KORBX
name                           constraint matrix                                    time          system time
mopp2        3,685    12,701      83,253           IBM 3090/MPS III                   11             5
airline1     4,420     6,711     101,377           IBM 3090/MPSX                     500            23
financial      846    44,974     136,970           IBM 3090/MPSX                       *            12
pbfp10      15,425    34,553     137,866           IBM 3090/MPSX                      76            18
mopp3        5,316    13,236     174,231           IBM 3090/MPS III                   47            15
airline2     6,642    10,707     203,597           IBM 3090/MPSX                       †            78
pbfp19      27,971    76,815     249,455           IBM 3090/MPSX                       ¶            53
mac30       46,863   152,058     326,252           Cyber 273 and IBM 3081/MCNF         ‡           234
mac40       63,027   209,739     449,209           Cyber 273 and IBM 3081/MCNF         ‡           360
mac50       79,019   267,357     571,792           Cyber 273 and IBM 3081/MCNF         ‡           612
mopp1       24,902    43,018     583,732           IBM 3090/MPS III                    †           450
mlp         20,420   519,660   2,017,380           Honeywell DPS-6 and IBM 3081/       #            74
                                                   Benders decomposition

NOTE: Run times are in minutes. The comparative environment and run times were reported to us by the client, except for financial, pbfp10, and pbfp19, which were run at AT&T.
* Did not solve after 120 minutes and 100,000 iterations.
† Did not solve after 1440 minutes.
¶ Run times using a prerelease version of KLP Release 3.1.
‡ Problem believed unsolvable by these systems.
# Method failed.
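For the rows of the table where both environments produced a run time, the speedup of the KORBX system over the comparative environment is simply the ratio of the two times; a short script recovers the figures:

```python
# (problem, comparative minutes, KORBX minutes) for the table rows
# in which both run times were reported.
runs = [("mopp2", 11, 5), ("airline1", 500, 23),
        ("pbfp10", 76, 18), ("mopp3", 47, 15)]

for name, other, korbx in runs:
    print(f"{name}: {other / korbx:.1f}x faster")   # e.g. airline1: 21.7x faster
```

The ratios range from roughly 2x (mopp2) to roughly 22x (airline1), with no obvious correlation to problem size.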
76 minutes to solve the same problem on an IBM 3090 computer using MPSX.

The ability to solve these problems quickly gives planners the opportunity to develop a more efficient network design. Currently, this algorithm is being used to solve the 19-year planning problem (pbfp19), which MPSX cannot handle because of size limitations.

Military-officer personnel planning. Another linear program, called mopp1 in Table II, plans U.S. Army officer promotions (to Lieutenant, Captain, Major, Lieutenant Colonel, and Colonel), accessions (new hires), and training requirements by skill category. The objective is to develop a plan that meets the Army's force structure requirements as closely as possible.

This linear program (which has 21,000 constraints and 43,000 variables) was solved on the KORBX system in 7.5 hours (using the primal-dual, power-series, order-3 method). This problem has not been solved using commercial simplex packages because of their size and speed limitations. In an attempt to obtain a reasonable solution using these simplex packages, the linear program was decomposed into five smaller models according to rank. Interestingly, solving the larger model produced a plan that was closer to the force structure requirements by 32 percent when compared to the aggregate of the solutions to the five smaller models.

The performance of the KORBX system on the smaller models is also interesting. The KORBX system solved the Lieutenant model (problem mopp2) the fastest, i.e., in 5 minutes, while, according to the Army, MPS III (a commercial implementation of the simplex method) on an IBM 3090 Model 200 computer took
We compared the KORBX system against a specialized, simplex-based, multicommodity network-flow (MCNF) code developed at Southern Methodist University by Kennington. The problems we used were created with a generator developed by Ali and Kennington. We ran the KORBX system on one CE and then on eight CEs. Kennington's solver was compiled by the Fortran compiler with the loop analyzer turned on. However, his code was not designed for a parallel machine and, hence, benefited little from vectorization and parallelization. Table III shows the results.

Results for two other randomly generated multicommodity flow problems, ken7 and ken9, are part of the benchmarks presented earlier in the simplex-method comparison.

Summary

Karmarkar-based optimization algorithms, parallel/vector processing hardware, and sophisticated software implementation techniques have been combined, yielding the AT&T KORBX system. The system is now tackling linear-programming problems that were heretofore unsolvable by other methods.

Acknowledgments

The AT&T KORBX system is the result of long hours of hard work by many individuals. We especially owe a great deal to our colleagues on AT&T's Advanced Decision Support Systems team. In particular, we would like to acknowledge some of the people whose contributions have had significant impact on the final product: Efthymios Housos and Chih Chung Huang for work on parallelization; Bob Murray, Adrian Kester, and Srinivas Sataluri for developing and enhancing the preprocessor and postprocessor; and Karen Medhi for improvements to the dual conjugate-gradient method. We would also like to thank Narendra Karmarkar and Ram Ramakrishnan for developing prototype code for both the primal-affine and dual conjugate-gradient methods and for other help they have given us.

References

1. G. Dantzig, "Maximization of a linear function of variables subject to linear inequalities," Activity Analysis of Production and Allocation, Tj. C. Koopmans (ed.), John Wiley & Sons, New York, 1951, pp. 339-347.
2. N. K. Karmarkar, "A new polynomial time algorithm for linear programming," Combinatorica, Vol. 4, No. 4, 1984, pp. 373-395.
3. I. I. Dikin, "Iterative solution of problems of linear and quadratic programming," Soviet Mathematics Doklady, Vol. 8, No. 3, 1967, pp. 674-675.
4. I. I. Dikin, "On the speed of an iterative process," Upravlyaemye Sistemi, Vol. 12, No. 1, 1974, pp. 54-60.
5. E. Barnes, "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming, Vol. 36, 1986, pp. 174-182.
6. R. J. Vanderbei, M. S. Meketon, and B. A. Freedman, "A modi-