
First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Numerical Solution Methods For Markov Chains

Udo R. Krieger
Otto-Friedrich-University Bamberg
Faculty Information Systems and Applied Computer Science
Feldkirchenstr. 21, D-96052 Bamberg
udo.krieger@ieee.org

Outline

1. Introduction
2. Modeling and analysis by Markov reward techniques
3. The algebraic background of computational methods
4. Direct solution methods
5. Iterative solution methods
   5.1 Properties of iteration procedures
   5.2 Block iteration methods
   5.3 The Algebraic Multigrid (AMG) method
   5.4 The multiplicative Schwarz iteration method
6. Summary

Software packages

The following list is only a limited selection of available packages applying numerical solution methods for Markov chains. The packages are grouped by the modeling techniques they support:

Queueing Networks

MARCA
W. Stewart, Introduction to the Numerical Solution of Markov Chains, Princeton University Press, ISBN 0-691-03699-3, 1994.
Dept. of Computer Science, North Carolina State University, Raleigh, USA,
http://www.csc.ncsu.edu/faculty/stewart/

MOSES, PEPSY, SPNP
G. Bolch, K. S. Trivedi et al., Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, John Wiley, 1998, 726 pages, ISBN 0-471-19366-6.
Department of Computer Science IV, University of Erlangen, Germany,
http://www4.informatik.uni-erlangen.de/Projects/PEPSY/en/pepsy.html
Center for Advanced Computing and Communication, Duke University,
http://www.ee.duke.edu/~kst/

MACOM
U. Krieger, Otto-Friedrich-University Bamberg, and H. Beilner, M. Sczittnick, Informatik IV, University of Dortmund,
http://ls4-www.informatik.uni-dortmund.de/QM/werkzeuge.html

TELPACK
N. Akar, K. Sohraby et al., Computer Science Telecommunications, University of Missouri-Kansas City,
http://www.sice.umkc.edu/telpack/

Matrix-Geometric Solvers and Cyclic Reduction Methods
D. Bini, B. Meini, Department of Mathematics, Group of Numerical Analysis / Computational Mathematics, University of Pisa,
http://www.dm.unipi.it/gauss-pages/meini/public_html/ric.html
http://www.dm.unipi.it/pages/bini/public_html/ric.html

Matrix-Analytic Methods and Algorithms
G. Latouche and V. Ramaswami, Introduction to Matrix Analytic Methods in Stochastic Modeling, SIAM, Order Code SA05, February 1999, 334 pages, ISBN 0-89871-425-7.

10

Stochastic Petri Nets, Stochastic Reward Nets and Stochastic Activity Networks

(X)MGMtool, SPN2MGM (matrix-geometric models, SPNs)
Boudewijn R. Haverkort, Performance of Computer Communication Systems: A Model-Based Approach, John Wiley, ISBN 0-471-97228-2, 1998.
http://www-i4.informatik.rwth-aachen.de/lufg/lvs/tools/manual/manual.html

DSPNexpress
C. Lindemann, Performance Modelling with Deterministic and Stochastic Petri Nets, 405 pages, bundled with DSPNexpress software on CD-ROM, John Wiley, 1998.
Informatik IV, University of Dortmund,
http://mobicom.cs.uni-dortmund.de/~Lindemann/

GreatSPN
M. A. Marsan, G. Balbo, G. Chiola et al., Modelling with Generalized Stochastic Petri Nets, J. Wiley, 1995.
Dipartimento di Informatica, University of Torino.

UltraSan
W. H. Sanders et al., Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign,
http://www.crhc.uiuc.edu/PERFORM/research.html

11

HiQPN
F. Bause, P. Buchholz, P. Kemper, Informatik IV, University of Dortmund,
http://ls4-www.informatik.uni-dortmund.de/MuS/werkzeuge.html

Reliability Models, Fault-tolerant and Dependable Systems

Harp, Sharpe
K. Trivedi et al., CACC, ECE, Duke University, Durham,
http://www.ee.duke.edu/~kst/

Parallel Programs and Stochastic Automata

PEPS
B. Plateau et al., LMC-IMAG, Grenoble,
http://www-apache.imag.fr/software/peps/

Stochastic Process Algebras

TIPPtool
U. Herzog, M. Rettelbach, M. Siegle et al., Department of Computer Science VII, University of Erlangen,
http://www7.informatik.uni-erlangen.de/tipp/tool.html

12

Software packages offering basic numerical solvers of linear algebra include:

LAPACK, ScaLAPACK, ITPACK, SPARSE 1.3
Dongarra et al., Oak Ridge National Laboratory,
http://www.netlib.org

IML++ and SparseLib++
Pozo et al., University of Tennessee,
http://gams.cam.nist.gov/acmd/Staff/RPozo/sparselib++.html

MGNet: code repository for multigrid codes
Craig Douglas, Yale and others,
http://casper.cs.yale.edu/mgnet/www/mgnet.html

PVM (Parallel Virtual Machine)
Al Geist et al., PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing, MIT Press, 1994.
http://www.epm.ornl.gov/pvm/pvm_home.html
http://www.netlib.org/pvm3/index.html

13

Introduction to numerical solution methods for MCs

Which models can be analyzed by numerical solution methods?

What is the basic mathematical problem?

Which solution methods are employed?

Which algorithms are recommended?

What are the limitations of the approach?

What are the perspectives?

14

Modeling Modern Telecommunication Networks

Features of the networks:
- bursty or peaked arrival processes, e.g. overflow or VBR traffic streams
- simultaneous arrivals of calls or packets, multi-slot traffic
- generally distributed holding times of resources
- congestion phenomena, e.g. blocking and losses
- circuit switching
- load balancing and adaptive routing
- flow and congestion control mechanisms

Modeling requirements:
- versatile arrival streams (M, GI[PH], MAP, BMAP)
- batch arrivals, group service
- phase-type distributions of service times (PH)
- limited capacities of service stations
- simultaneous allocation and occupation of resources
- state-dependent routing
- synchronization concepts, changes of customer classes, triggered arrival streams

15

Tool-supported Network Modeling and Analysis

Objective:

investigation of a telecommunication system or network at an application-oriented level, not at a mathematically oriented level of abstraction
Features:

- convenient interactive performance evaluation tools for modern telecommunication networks available
- system modeling by finite, homogeneous, discrete or continuous-time Markov chains (DTMCs/CTMCs)
- model specification by means of a graphical user interface
- automatic generation of the underlying DTMCs/CTMCs
- steady-state and transient analysis of the DTMCs/CTMCs by advanced numerical methods
- automatic evaluation of user-defined model characteristics derived from assigned state and transition rewards
- graphical representation of the results of the performance evaluation
Implementation:

in C/C++ under UNIX/LINUX running on workstations and using graphical user interfaces

16

Examples of Queueing Networks - Buffer modeling of a packet-switched network

17

- Modeling and analysis of packet streams

- Path modeling by a closed QNW with Coxian service

18

19

The Structure of MACOM

20

The algebraic background of computational methods

Basic notions:

A = (Aij) ∈ IR^(n×n) Z-matrix: Aij ≤ 0 for i ≠ j
D = Diag(Dii) ∈ IR^(n×n) diagonal positive: Dii > 0 for 1 ≤ i ≤ n
A = (Aij) ∈ IR^(n×n) M-matrix: A = sI − B with s > 0, B ≥ 0, s ≥ ρ(B)
A Metzler-Leontief (ML-) matrix: −A is a Z-matrix, i.e. Aij ≥ 0 for i ≠ j
Q = (Qij) ∈ IR^(n×n) Q-matrix: Qij ≥ 0 for i ≠ j and Σ_{i=1}^n Qij = 0 for 1 ≤ j ≤ n
T = (Tij) ∈ IR^(n×n) column stochastic matrix: T ≥ 0 and Σ_{i=1}^n Tij = 1 for 1 ≤ j ≤ n

Notation (for x = (xj) ∈ IR^n):
x ≥ 0 : xj ≥ 0 for 1 ≤ j ≤ n
x ⪈ 0 : xj ≥ 0 for 1 ≤ j ≤ n and xj0 > 0 for some j0
x > 0 : xj > 0 for 1 ≤ j ≤ n
e = (1, . . . , 1)^t

21

Formulation of the mathematical problem

construct automatically:
- Q ∈ IR^(n×n), the irreducible nonsymmetric generator matrix of a CTMC X(t) with n states

compute:
- actual distribution p(t) = (IP(X(t) = i))_{i=1,...,n} for t > 0
- stationary distribution π = lim_{t→∞} p(t)

transient solution:
solve        x'(τ) = Q^t x(τ),  x(0) = x0
subject to   x(τ) > 0,  Σ_{i=1}^n xi(τ) = 1
- a 1st order linear ODE with homogeneous constant coefficients

stationary solution:
input:       A = Q^t, an irreducible nonsymmetric singular M-matrix with zero sums along columns
problem:     solve A x = 0
subject to   x > 0,  Σ_{i=1}^n xi = 1
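As a concrete instance of the stationary problem, the following NumPy sketch (the 3-state generator rates are an arbitrary example) solves A x = 0, e^t x = 1 by replacing one equation of the singular system with the normalization condition:

```python
import numpy as np

# Transposed generator A = Q^t of a small 3-state CTMC (arbitrary example
# rates); the columns of A sum to zero, so A is singular with rank n - 1.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 4.0,  0.0, -4.0]])
A = Q.T

# Replace the last (redundant) equation of A x = 0 by e^t x = 1.
B = A.copy()
B[-1, :] = 1.0
b = np.zeros(3)
b[-1] = 1.0
pi = np.linalg.solve(B, b)

print(pi)       # stationary distribution: positive, sums to 1
print(A @ pi)   # ~ 0
```

Replacing a row keeps the system square and nonsingular; for the very large sparse chains described below, the direct and iterative methods of the following sections are used instead.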

22

Solution theory

Theorem 1 Let A ∈ IR^(n×n) be an irreducible singular M-matrix. Then:
(1) Rank(A) = n − 1.
(2) Dim(N(A)) = 1, where N(A) = {z | Az = 0} is the null space of A; λ = 0 is a simple eigenvalue of A.
(3) There exists π > 0 with Aπ = 0; it is the unique solution subject to e^t π = 1. Any solution x of Ax = 0 is a multiple of π.
(4) There exists y > 0 with y^t A = 0, normalized by y^t π = 1.
(5) All proper principal submatrices of A are regular M-matrices.
(6) A is almost monotone: Ax ≥ 0 ⟹ Ax = 0.
(7) The group inverse A# exists and is unique, i.e. A A# A = A, A# A A# = A#, A A# = A# A. A# A = I − π y^t is a projector onto the range R(A) along the null space N(A) = span{π}.

23

Characterization of singular irreducible M-matrices

Theorem 2 (Wen Li, 1995) Let A ∈ IR^(n×n) be a singular Z-matrix. Then the following statements are equivalent:
1. A is an irreducible singular M-matrix.
2. Each principal submatrix of A other than A itself is a nonsingular M-matrix.
3. λ = 0 is a simple eigenvalue of A with corresponding positive right and left eigenvectors x, y, respectively.
4. A is an almost monotone matrix of rank n − 1, i.e., for any x, y ∈ IR^n such that Ax ≥ 0 and y^t A ≥ 0 it follows that Ax = 0 and y^t A = 0.
5. For A there exists a positive right eigenvector x to λ = 0, and A is an almost right monotone matrix of rank n − 1 (i.e., for any z ∈ IR^n such that Az ≥ 0 it follows that Az = 0).
6. For each Z-matrix B such that B ≥ A, B ≠ A, it follows that B is a nonsingular M-matrix.
7. For each nonzero nonnegative diagonal matrix D, A + D is a nonsingular M-matrix.

24

8. For any regular splitting A = M − N, i.e. M^(−1) ≥ 0, N ≥ 0, let V = {1, . . . , n}, W1 = V \ W2 and W2 = {j ∈ V | the j-th column of N is nonzero}. Then M^(−1)A = I − M^(−1)N is an M-matrix, W1 is its unique singular class and W2 its unique final class. This means that for T = M^(−1)N ≥ 0 and the submatrices Tij = T(Wi, Wj), i, j ∈ {1, 2}, there exists a permutation matrix P such that

P T P^t = [ 0   T12 ]
          [ 0   T22 ]

holds with the properties:
(i) T11 = 0 and T21 = 0 (for any splitting with perhaps empty W1).
(ii) Each row of T12 is nonzero.
(iii) T22 is a (nonempty) nonzero irreducible matrix with a simple eigenvalue ρ(T) = ρ(T22) = 1.

25

Classification of numerical methods

Problem formulation: let
Q - irreducible generator matrix of a CTMC with n states, Q = B − D, D = Diag(−Qii) > 0
P - irreducible t.p.m. of a DTMC with n states

a linear system (LS):
0 = A x,  A ∈ IR^(n×n), x ∈ IR^n,  A = Q^t or A = I − P^t

an eigenvalue problem (EVP):
x = T x
T = P^t  or  T = I + Q^t D^(−1) = B^t D^(−1) ≥ 0
T = I + Q^t/d ≥ 0,  d ≥ max_{i=1,...,n} {−Qii} > 0  (uniformization procedure)

Properties of matrix A:
- singular M-matrix with zero sums along columns
- irreducible
- (very) large (1 000 ≤ n ≤ 10 000 000)
- sparse (Σ_{i=1}^n #{Aij ≠ 0, 1 ≤ j ≤ n} = O(n))
- nonsymmetric
- banded, block or tensor structured
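The uniformization matrix T = I + Q^t/d is column stochastic, so for the EVP formulation the power method applies directly. A minimal sketch (arbitrary generator; the factor 1.1 is just a slack choice enforcing d > max_i |Qii|, so that T has a positive diagonal and is aperiodic):

```python
import numpy as np

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 4.0,  0.0, -4.0]])
d = 1.1 * (-np.diag(Q)).max()   # d >= max_i |Q_ii|, with strict slack
T = np.eye(3) + Q.T / d         # column stochastic, T >= 0

x = np.full(3, 1.0 / 3.0)       # probability vector; e^t x stays 1
for _ in range(500):
    x = T @ x                   # power method for the eigenvalue 1

print(x)                        # approximates the stationary distribution
```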
26

Category: Direct methods
Methods: Gaussian elimination techniques, QR method, rank reduction techniques, deflation techniques
Algorithms: LU decomposition for singular M-matrices, GTH algorithm, Crout algorithm, block LU algorithms, blockwise bordering, Harrod's algorithms

Category: Iteration methods (for a LS A x = 0 and an EVP T x = x)
Methods: iteration procedures derived from matrix splittings of singular M-matrices and their accelerated variants, including semi-iterative methods, multi-splitting methods, preconditioning techniques, conjugate gradient techniques, Krylov subspace methods, algebraic multigrid, advanced iterative methods exploiting structure
Algorithms: Jacobi procedure, Gauss-Seidel procedure, JOR, SOR, SSOR, block SOR, AOR, A/D algorithms, Chebyshev algorithm, parallel SOR algorithms, ILU algorithms, CG, ICCG, BiCG, power method, simultaneous iteration, Arnoldi algorithm, nonsymmetric Lanczos, GMRES, QMR, TFQMR, AMG algorithms, IAD algorithms, Kronecker algorithms (Buchholz et al.)

27

Direct solution methods for Markov chains

LU factorization of an irreducible singular M-matrix:

Given an irreducible M-matrix A ∈ IR^(n×n), there exists a unique LU factorization

A = L D UD = L U

with
- a regular lower triangular M-matrix L with unit diagonal,
- an upper triangular M-matrix UD with unit diagonal,
- a diagonal matrix D = Diag(di) with di ≥ 0, i = 1, . . . , n; di = 0 if and only if A is singular and i = n.

In block form (with Bn−1 = Ln−1 Un−1 the factorization of the leading principal submatrix, and vanishing last diagonal entry for singular A):

A = [ Bn−1    yn  ]  =  [ Ln−1                 0 ] [ Un−1   Ln−1^(−1) yn ]
    [ zn^t   Ann  ]     [ zn^t Un−1^(−1)       1 ] [ 0            0      ]

28

Block LU variant of Gaussian elimination

let

A = [ A11  A12 ]  ∈ IR^(n×n)
    [ A21  A22 ]

be an irreducible singular M-matrix with zero sums along columns (i.e. Q-matrix) and with block matrices Aij ∈ IR^(ki×kj), i, j ∈ {1, 2}, and k2 ≥ 2.

One step of (block) Gaussian elimination without pivoting yields A = A(1) = LU with

L = [ I              0 ]   - a regular M-matrix
    [ A21 A11^(−1)   I ]

A(2) = U = [ A11   A12     ]   - a singular M-matrix
           [ 0     A22^(2) ]

A22^(2) = A22 − A21 A11^(−1) A12   - the Schur complement of A11 in A
                                   - an irreducible Q-matrix, hence:

R = −A21 A11^(−1) ≥ 0  with  e^t R = e^t,   e^t A22^(2) = 0

λ = 0 is a simple eigenvalue of A22^(2); there exists y ⪈ 0 with A22^(2) y = 0,
and A11 x1 = −A12 y determines the remaining part x1 of the stationary vector.

Implemented in MACOM:
- GTH variant of Gaussian elimination without pivoting
- data structure: row-oriented sparse matrix scheme (linked lists)
29

GTH algorithm

Given an irreducible Markov chain with a finite state space S = {1, . . . , n}, set A = Q^t ∈ IR^(n×n) in the case of a CTMC and A = P^t − I in the case of a DTMC.

1. Modified Gaussian elimination:
   For k = 1 to n − 1 do
     Diag = Σ_{i=k+1}^n Aik
     For j = k + 1 to n do
       Akj = Akj / Diag
       For i = k + 1 to n do
         Aij = Aij + Aik Akj
       Endfor
     Endfor
   Endfor

2. Norm = 1
   xn = 1

3. Back substitution:
   For i = n − 1 to 1 do
     xi = Σ_{k=i+1}^n Aik xk
     Norm = Norm + xi
   Endfor

4. Normalization:
   For k = 1 to n do
     xk = xk / Norm
   Endfor
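The GTH listing translates directly into NumPy. A dense sketch for clarity (MACOM's sparse row-oriented scheme is not reproduced; the function name is ours):

```python
import numpy as np

def gth(Q):
    """Stationary vector of an irreducible CTMC generator Q via the GTH
    variant of Gaussian elimination (no subtractions, hence no pivoting)."""
    A = Q.T.copy().astype(float)     # the slides work on A = Q^t
    n = A.shape[0]
    for k in range(n - 1):
        diag = A[k + 1:, k].sum()    # = -A[k, k] up to rounding
        A[k, k + 1:] /= diag
        A[k + 1:, k + 1:] += np.outer(A[k + 1:, k], A[k, k + 1:])
    x = np.zeros(n)
    x[n - 1] = 1.0
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = A[i, i + 1:] @ x[i + 1:]
    return x / x.sum()               # normalization

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 4.0,  0.0, -4.0]])
pi = gth(Q)
print(pi)   # stationary distribution, pi Q = 0
```

Because the elimination accumulates only nonnegative quantities, GTH is numerically stable even for nearly decomposable chains.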

30

Comparison of solution methods

31

Iterative solution methods for Markov chains

Properties of matrix splittings

let A ∈ IR^(n×n) and (M, N) ∈ IR^(n×n) × IR^(n×n):

(M, N) is a splitting if A = M − N and M is regular.

A splitting is
  nontrivial    if N ≠ 0
  regular       if M^(−1) ≥ 0, N ≥ 0
  weak regular  if M^(−1) ≥ 0, M^(−1)N ≥ 0
  M-splitting   if M is an M-matrix and N ≥ 0

Iteration matrix:     T = M^(−1)N = I − M^(−1)A
Iteration procedure:  x(k+1) = T x(k) = (I − M^(−1)A) x(k)

Spectral properties:
  spectrum of A:     σ(A)
  spectral radius:   ρ(A) = max{|λ| : λ ∈ σ(A)}
  subdominant EV:    δ(A) = max{|λ| : λ ∈ σ(A), |λ| ≠ ρ(A)}
  index w.r.t. ζ ∈ C: index_ζ(A) = min{k ≥ 0 | N((A − ζI)^k) = N((A − ζI)^(k+1))}
  with N(B) = {z | Bz = 0} = null space of B

32

Solution theory

Theorem 3 Let A ∈ IR^(n×n) be an irreducible singular M-matrix and A = M − N a weak regular splitting. Then A x = 0 is equivalent to x = T x, and ρ(T) = 1 is a simple eigenvalue of T = M^(−1)N ≥ 0.

Theorem 4 Let A ∈ IR^(n×n) be an irreducible singular M-matrix and A = M − N a weak regular splitting. Then there exists a nonsingular matrix S that transforms T = M^(−1)N to the following Jordan canonical form:

T = M^(−1)N = S [ 1  0  0 ] S^(−1)
                [ 0  Σ  0 ]
                [ 0  0  K ]

Here, ρ(K) < 1 and Σ is a (perhaps not existing) diagonal matrix with distinct diagonal elements σi ≠ 1 and |σi| = 1.

Theorem 5 Let A ∈ IR^(n×n) be a singular M-matrix and A = M − N an M-splitting. Then the iteration matrix T = M^(−1)N satisfies the following conditions:
(1) ρ(T) = 1 is a maximal eigenvalue of T.
(2) The algebraic multiplicity of 1 ∈ σ(T) coincides with the algebraic multiplicity of 0 ∈ σ(A).
(3) index1(T) = index0(A)
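Theorem 3 can be checked numerically with the (point) Gauss-Seidel M-splitting of a birth-death generator (arbitrary example rates; the sign A = −Q^t is chosen so that A is an M-matrix with positive diagonal):

```python
import numpy as np

# Birth-death CTMC (arrival rate 1, service rate 2, 4 states).
Q = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 2.0, -3.0,  1.0,  0.0],
              [ 0.0,  2.0, -3.0,  1.0],
              [ 0.0,  0.0,  2.0, -2.0]])
A = -Q.T                  # irreducible singular M-matrix with A pi = 0

# Gauss-Seidel M-splitting A = M - N: M is an M-matrix, N >= 0.
M = np.tril(A)
N = M - A                 # strict upper triangle of -A

x = np.full(4, 0.25)
for _ in range(200):
    x = np.linalg.solve(M, N @ x)   # x(k+1) = M^{-1} N x(k) = T x(k)
    x /= x.sum()                    # fixed points of T are multiples of pi

print(x)    # ~ (8, 4, 2, 1)/15, the stationary distribution
```

The fixed point of T is exactly the null vector of A, and the iteration converges because the splitting is semiconvergent for this tridiagonal example.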

33

Semiconvergent iteration procedures

Objective: necessary and sufficient conditions for the convergence of x(k+1) = T x(k) to the normalized eigenvector π ⪈ 0 of the eigenvalue λ(T) = 1 of T ≥ 0.

Definition 1 T ∈ IR^(n×n) is called
  (zero-)convergent  if  lim_{j→∞} T^j = 0,
  semiconvergent     if  lim_{j→∞} T^j ∈ IR^(n×n) exists.

Theorem 6 T ∈ IR^(n×n) is semiconvergent if and only if there exist a regular S ∈ IR^(n×n) and a K ∈ IR^(n×n) with ρ(K) < 1 such that

T = S [ I  0 ] S^(−1)
      [ 0  K ]

holds, where I is missing if 1 ∉ σ(T), and I ∈ IR^(m×m) if 1 ∈ σ(T) is an eigenvalue with multiplicity m.

34

Theorem 7 Let T ∈ IR^(n×n). T is semiconvergent if and only if each of the following conditions holds:
(1) ρ(T) ≤ 1
(2) if ρ(T) = 1 and λ ∈ σ(T) with |λ| = 1, then λ = 1 (i.e. δ(T) < 1)
(3) if ρ(T) = 1 and 1 ∈ σ(T) hold, then all elementary divisors associated with 1 are linear, i.e.:
    Rank((I − T)^2) = Rank(I − T),  that is, index0(I − T) = index1(T) = 1

35

Block iteration methods

A ∈ IR^(n×n) a singular M-matrix, A = (Aij), 1 ≤ i, j ≤ p, a block partition with p > 1.

A block splitting

A = M − N = (D − D(N)) − (L + L(N)) − U(N)

with the properties (1) to (6) is called an R-regular (block) splitting:

(1) D = Diag(Dii)_{1≤i≤p} and D(N) ≥ 0 block diagonal,
    0 ≤ L and 0 ≤ L(N) strictly lower block triangular,
    0 ≤ U(N) strictly upper block triangular
(2) Dii^(−1) ≥ 0 for all i ∈ {1, . . . , p}
(3) M = D − L lower block triangular
(4) N = L(N) + U(N) + D(N) ≥ 0
(5) A0 = D − L − U(N) is irreducible.
(6) the block matrix graph Γ(A0) = (V, E) has a monotone decreasing cycle, i.e. a sequence c = [i1, i2, . . . , il, i1] of adjacent nodes with l ≥ 2 and ij > ij+1, 1 ≤ j ≤ l − 1.

The block matrix graph Γ(A0) = (V, E) is a directed graph with nodes V = {Vi, 1 ≤ i ≤ p} and directed edges (Vi, Vj) ∈ E. Vi results from the partition of the index set {1, . . . , n} according to the block partition. (Vi, Vj) ∈ E iff (A0)ij ≠ 0, i.e. there are indices l ∈ Vi, m ∈ Vj such that (l, m) ∈ E(A0) is an edge in the matrix graph of A0.

36

Theorem 8 Each R-regular splitting A = M − N of an irreducible singular M-matrix A ∈ IR^(n×n) is semiconvergent.

Theorem 9 Let A ∈ IR^(n×n) be an irreducible singular M-matrix and A = M − N = (D − L) − U be a block Gauss-Seidel splitting with irreducible block matrices Dii along its diagonal (i.e. D(N) = L(N) = 0 in an R-regular splitting). Suppose there exist block matrices Aij ≠ 0 and Aji ≠ 0 for some i, j ∈ {1, . . . , p}, i ≠ j. Then the block Gauss-Seidel method converges.

Corollary 1 The block Gauss-Seidel procedure converges for any irreducible singular M-matrix with block tridiagonal structure and irreducible diagonal blocks.

Remark 1 Consider an R-regular block splitting A = (D − D(N)) − (L + L(N)) − U(N) of a singular irreducible M-matrix A with the additional property Dii^(−1) > 0 for 1 ≤ i ≤ p. Then the underrelaxation variant derived from a block SOR scheme,

Tω = Mω^(−1) Nω,  0 < ω < 1,
Mω = (1/ω)(D − ωL),
Nω = (1/ω)((1 − ω)D + ω(L(N) + U(N) + D(N))),

is semiconvergent and the splitting A = Mω − Nω is weak regular.

37

Block SOR algorithm for an R-regular splitting

1. Initialization:
   Select x(0) ⪈ 0, e.g. x(0) = e/n, ε ∈ (0, 1), ω ∈ (0, 2) and mT ∈ [1, 10] ∩ IN.
   Set k = 0, d(0) = 1, r(0) = 1, e(0) = 1.

2. Iteration step: (Let Σ_{j=1}^{0} ≡ 0.)
   For m = 1 to mT do
     For i = 1 to p do
       Solve  Dii x̃i(k+m) = Σ_{j=1}^{i−1} Lij xj(k+m) + Σ_{j=1}^{p} Nij xj(k+m−1)
       xi(k+m) = xi(k+m−1) + ω (x̃i(k+m) − xi(k+m−1))
     endfor
   endfor

3. Convergence test:
   d(k+mT) = ‖x(k+mT) − x(k+mT−1)‖
   r(k+mT) = (d(k+mT)/d(k))^(1/mT)   (≈ ρ(T))
   If r(k+mT) ≥ 1 then k = k + mT; goto step 2 endif
   e(k+mT) = d(k+mT) · max(1, r(k+mT)/(1 − r(k+mT)))
   If e(k+mT) ≤ ε then goto step 4
   else k = k + mT; goto step 2
   endif

4. Normalization:
   If x(k+mT) = (x1(k+mT), . . . , xp(k+mT))^t > 0, set π = x(k+mT) / ‖x(k+mT)‖1.
38

Stochastic Interpretation of Block SOR

39

Traffic model of fixed alternative routing

Parameters:
λi : intensity of Poisson stream i ∈ {0, 1, 2}
1/σi : mean on-time of stream i ∈ {1, 2}
1/δi : mean off-time of stream i ∈ {1, 2}
1/μ : mean call holding time
n : number of trunks of the overflow link

40

Generator matrix of the M + MMPP + MMPP/M/n/n (states (k, phase), k = 0, . . . , n busy trunks, 4 phases):

A = [ Q^t − Λ     μI4                                   0          ]
    [ Λ           Q^t − Λ − μI4    2μI4                            ]
    [             Λ                ...        ...                  ]
    [                              ...        ...       nμI4       ]
    [ 0                                       Λ         Q^t − nμI4 ]

Λ = Diag(λ1 + λ2, λ1, λ2, 0) + λ0 I4

Q = [ −(σ1 + σ2)    σ2            σ1            0           ]
    [ δ2            −(σ1 + δ2)    0             σ1          ]
    [ δ1            0             −(δ1 + σ2)    σ2          ]
    [ 0             δ1            δ2            −(δ1 + δ2)  ]

Q irreducible
A has symmetric zero structure  ⟹  point Gauss-Seidel convergent
Aii irreducible, A12 ≠ 0, A21 ≠ 0  ⟹  block Gauss-Seidel convergent:
M x(k+1) = N x(k)

41

R-regular splittings (written for the associated singular M-matrix −A, so that L ≥ 0 and N ≥ 0):

M = D − L =
    [ Λ + nμI4 − Q^t                                        0               ]
    [ −Λ               Λ + nμI4 − Q^t                                       ]
    [                  ...              ...                                 ]
    [ 0                                 −Λ          Λ + nμI4 − Q^t          ]

N = L(N) + U(N) + D(N) =
    [ nμI4     μI4                                     0     ]
    [ 0        (n−1)μI4    2μI4                              ]
    [          ...         ...         ...                   ]
    [                      ...         μI4      nμI4         ]
    [ 0                                0        Λ            ]

with

D = Diag(Dii)_{i=0,...,n} = I_{n+1} ⊗ (Λ + nμI4 − Q^t)
- blocks are irreducible, diagonally dominant regular M-matrices

D(N) = Diag(nμI4, (n − 1)μI4, . . . , μI4, Λ) ≥ 0,   L(N) = 0

L  = subdiagonal blocks Λ ≥ 0
U(N) = superdiagonal blocks μI4, 2μI4, . . . , nμI4 ≥ 0
42


Iterative A/D and Algebraic Multigrid Methods

Iterative aggregation-disaggregation (A/D) method

43

A Two-Level AMG Algorithm

Select a partition {J1, . . . , Jm} of {1, . . . , n}, m ≥ 2, into disjoint sets Ji with ni elements, x(0) ⪈ 0, e^t x(0) = 1, ν ∈ [1, 10] ∩ IN.

Set k := 0.
Do
  For j = 1 to m do
    zj(x(k)) := (xj(k) / (e^t xj(k))) 1_(0,∞)(e^t xj(k)) + (e/nj) 1_{0}(e^t xj(k))
    For i = 1 to m do
      Bij(x(k)) := e^t T_{Ji Jj} zj(x(k))
    endfor
  endfor
  d(k) = (I − T) x(k)
  Solve (I − B(x(k))) y(k) = R d(k) subject to e^t y(k) = 0
  g(x(k)) := x(k) − P(x(k)) y(k),  where (P(x(k)) y(k))_{Jj} = zj(x(k)) yj(k)
  x(k+1) := T^ν g(x(k))
until convergence
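A simplified two-level A/D cycle in the spirit of the scheme above can be sketched as follows (it solves the aggregated stationary problem B w = w directly instead of the correction equation with right-hand side R d, a common variant; the helper name, the partition and the example rates are ours):

```python
import numpy as np

def iad(T, parts, sweeps=3, iters=80):
    """Simplified two-level aggregation/disaggregation cycle for a
    column-stochastic matrix T (a sketch, not the exact slide algorithm)."""
    n, m = T.shape[0], len(parts)
    R = np.zeros((m, n))
    for j, J in enumerate(parts):
        R[j, J] = 1.0                       # aggregation (membership) matrix
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        # prolongation P(x): column j carries x restricted to J_j, normalized
        P = np.zeros((n, m))
        for j, J in enumerate(parts):
            s = x[J].sum()
            P[J, j] = x[J] / s if s > 0 else 1.0 / len(J)
        B = R @ T @ P                       # aggregated iteration matrix
        # coarse stationary vector: B w = w, e^t w = 1
        w = np.linalg.solve(np.eye(m) - B + np.ones((m, m)), np.ones(m))
        x = P @ w                           # disaggregation
        for _ in range(sweeps):             # smoothing with T
            x = T @ x
        x /= x.sum()
    return x

# Birth-death CTMC, uniformized:
Q = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 2.0, -3.0,  1.0,  0.0],
              [ 0.0,  2.0, -3.0,  1.0],
              [ 0.0,  0.0,  2.0, -2.0]])
d = 1.1 * (-np.diag(Q)).max()
T = np.eye(4) + Q.T / d
pi = iad(T, [[0, 1], [2, 3]])
print(pi)   # ~ (8, 4, 2, 1)/15
```

The stationary vector is an exact fixed point of the cycle: aggregating π, solving the coarse problem and disaggregating reproduces π, so the smoothing steps only have to damp the error inside the aggregates.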

44

Definitions:

aggregation (membership) matrix R ∈ IR^(m×n):

R = [ e^t   0   ...   0  ]
    [ 0    e^t        .  ]   ≥ 0,   e^t R = e^t
    [ 0    ...   0   e^t ]

nonnegative matrices:
Φ(x) = Diag(x) ∈ IR^(n×n)
Ψ(x) = Diag(1_{0}(e^t x_{Ji}))_{i=1,...,m} ∈ IR^(m×m)
V(x) = Φ(x) R^t + R^t Ψ(x) ∈ IR^(n×m)

prolongation matrix P(x) ∈ IR^(n×m):

P(x) = V(x) (R V(x))^(−1) > 0,   e^t P(x) = e^t

P(x) = [ z1(x)   0     ...     0   ]
       [ 0      z2(x)          .   ]
       [ 0      ...     0    zm(x) ]

zj(x) = (xj / (e^t xj)) 1_(0,∞)(e^t xj) + (1/nj) e 1_{0}(e^t xj)

aggregated iteration matrix B(x) ∈ IR^(m×m):

B(x) = R T P(x)

45

projection matrix Π(x) ∈ IR^(n×n):

Π(x) = P(x) R,   R P(x) = I

Π(x) = [ Π1(x)   0     ...     0   ]
       [ 0      Π2(x)          .   ]
       [ 0      ...     0    Πm(x) ],    Πj(x) = zj(x) e^t

projection matrix I − Π(x) = U W(x) ∈ IR^(n×n):

Ul = [  1    1   ...    1 ]
     [ −1    0   ...    0 ]
     [  0   −1          . ]
     [  0   ...   0    −1 ]   ∈ IR^(nl×(nl−1))

U = [ U1   0   ...   0  ]
    [ 0    U2        .  ]   ∈ IR^(n×(n−m))
    [ 0    ...  0    Um ]

W(x) = (U^t U)^(−1) U^t (I − Π(x)) ∈ IR^((n−m)×n)
e^t U = 0,   W(x) x = 0

S = (P(x), U),   S^(−1) = [ R    ]
                          [ W(x) ]

S^(−1) Π(x) S = [ Im  0 ]
                [ 0   0 ]

46

Error analysis of the two-level AMG scheme

Definitions:

P = { u ∈ IR^n : u ⪈ 0, e^t u = 1 }
g(x) = x − P(x) y(x)     (error-correction step)
x(m) = T^m g(x)          (m smoothing iterations)

errors:
ε = π − x,   ε̃ = π − g(x),   ε(m) = π − x(m)

Error after an error-correction step x → g(x):

ε̃ = (I − Π(x) T + u e^t)^(−1) (I − Π(x)) ε

Error after m iterations x(m) = T^m g(x):

ε(m) = T^m ε̃ = [T(I − g(x) e^t)]^(m−1) H(1)(x) ε

H(1)(x) = (I − T Π(x))^# T (I − Π(x)) = (I − T Π(x) + g(x) e^t)^(−1) T U W(x)

‖ε(m)‖ ≤ ‖T(I − g(x) e^t)‖^(m−1) · ‖(I − T Π(x) + g(x) e^t)^(−1) T U W(x)‖ · ‖ε‖

contraction factor:
γ(x) = ‖T(I − g(x) e^t)‖ ≤ δ(T) + η < 1   for x in a neighborhood of π, where
δ(T) = max{ |λ| : λ ∈ σ(T) \ {1} } = ρ( T(I − π e^t) ) < 1
δ(B(x)) ≤ δ(T) < 1

47

Local convergence of the two-level AMG scheme

Assumptions:
T ∈ IR^(n×n) semiconvergent, stochastic, T π = π with R π > 0
rank(I − T) = n − 1,   J1 ∪ . . . ∪ Jm = {1, . . . , n}

⟹ rank(I − Π(x) T) = n − 1 for all x ∈ U(π) ∩ P (a neighborhood of π)

Non-stationary iteration scheme:
x(k+1) = J(mk)(x(k)) = T^(mk) g(x(k)) converges locally to π for mk ≥ M and x(0) ∈ V(π) ∩ P.

48

Domain Decomposition Algorithms

49

The multiplicative Schwarz iteration

T ∈ IR^(n×n) semiconvergent, stochastic, λ(T) = 1 simple
J1 ∪ . . . ∪ JL = {1, . . . , n},   |Jl| = nl

for 1 ≤ l ≤ L:
Pl ∈ IR^(n×nl): Pl ≥ 0, e^t Pl = e^t
Rl ∈ IR^(nl×n): Rl ≥ 0, e^t Rl = e^t
Σ_{l=1}^{L} Im(Pl) = IR^n,   Rl Pl = I_{nl}

Πl = Pl Rl ∈ IR^(n×n),   λ(Πl T) = 1 simple

Error correction of a V-cycle scheme:

x0(k) := x(k)
For l = 1 to L do
  dl(k) = (I − T) x_{l−1}(k)
  Solve Rl (I − T) Pl yl = Rl dl(k) subject to e^t yl = 0
  xl(k) := x_{l−1}(k) − Pl yl
endfor
If xL(k) > 0 then g(x(k)) := xL(k)
else g(x(k)) := x(k)
endif

50

AMG variant with a replacement approach

T ∈ IR^(n×n) semiconvergent, stochastic, λ(T) = 1 simple
Π(0): J1 ∪ . . . ∪ Jm = S = {1, . . . , n},   |Jl| = nl

Error correction of a V-cycle: For l = 1 to m do

Step 1: Rearrangement
Π(0) → Π(1) = {S \ Jl, Jl} → . . . → {S \ Jm, Jm}

Al = Sl^t (I − T) Sl = [ I − T_l̄l̄    −T_l̄l  ]   (blocks indexed by l̄ = S \ Jl and Jl)
                       [ −T_ll̄       I − T_ll ]

Sl - permutation matrix

51

Step 2: Partial aggregation
Π(1) → Π(2) = {J̃m, Jm},   J̃m = {[1], . . . , [m − 1]},   ñ = m − 1 + nm

Rm(1) = [ e^t   0   ...   0  ]
        [ 0    e^t        .  ]   ∈ IR^((m−1)×(n−nm))
        [ 0    ...   0   e^t ]

R(1) = [ Rm(1)  0 ] ∈ IR^(ñ×n),    P(1) = [ Pm(1)  0 ] ∈ IR^(n×ñ)
       [ 0      I ]                       [ 0      I ]

Pm(1) = [ z1   0    ...    0    ]
        [ 0    z2          .    ]   ,   zl = xl / (e^t xl),  xl > 0
        [ 0    ...   0   zm−1   ]

Ã = R(1) Al P(1) = [ Bm  Cm ] ∈ IR^(ñ×ñ)   (blocks indexed by S̃ \ Jm and Jm)
                   [ Dm  Em ]

Ã = LU = [ I             0 ] [ Bm   Cm                  ]
         [ Dm Bm^(−1)    I ] [ 0    Em − Dm Bm^(−1) Cm  ]

52

Step 3: Schur complement aggregation on Jm
Π(2) → Π(3) = {Jm}

P(2) = [ 0 ] ∈ IR^(ñ×nm),   R(2) = ( −Dm Bm^(−1)   I ) ∈ IR^(nm×ñ)
       [ I ]

SJm = R(2) Ã P(2) = Em − Dm Bm^(−1) Cm ∈ IR^(nm×nm)

Result: Three-grid multiplicative Schwarz iteration Π(0) → Π(1) → Π(2) → Π(3)

Pm = P(1) P(2) = [ 0 ] ∈ IR^(n×nm)
                 [ I ]

Rm = R(2) R(1) = ( −Dm Bm^(−1) Rm(1)   I ) ∈ IR^(nm×n)

SJm = Rm Sm^t (I − T) Sm Pm

Πm(x) = Pm Rm = [ 0                      0 ]
                [ −Dm Bm^(−1) Rm(1)      I ]

53

Error correction of the Schwarz iteration:

For l = 1 to m do
  xl := Θl(x_{l−1}) = Sl (I − Πl T)^# (I − Πl) Sl^t x_{l−1} + Pl ξl
endfor

where ξl ⪈ 0 and e^t ξl = 1.

Extensions:
- parallel algorithms
- additive Schwarz iteration (domain decomposition)
- multi-splitting algorithms
- asynchronous parallel iteration schemes

54

References

[1] U. R. Krieger. Analysis of a loss system with mutual overflow in a Markovian environment. In W. Stewart, ed., Numerical Solution of Markov Chains, Marcel Dekker, New York, 1991, pp. 303-328.
[2] U. R. Krieger, B. Müller-Clostermann, and M. Sczittnick. Modeling and analysis of communication systems based on computational methods for Markov chains. IEEE Journal on Selected Areas in Communications, vol. 8, pp. 1630-1648, 1990.
[3] U. R. Krieger and M. Sczittnick. A Markovian approach for modelling and analysis of advanced telecommunication networks. In A. Jensen, V. B. Iversen, eds., Proceedings ITC 13, 1991.
[4] U. R. Krieger and M. Sczittnick. Application of numerical solution methods for singular systems in the field of computational probability theory. In R. Beauwens, P. de Groen, eds., Iterative Methods in Linear Algebra, North-Holland, Amsterdam, 1992, pp. 613-626.
[5] U. R. Krieger. Numerical solution of large finite Markov chains by algebraic multigrid techniques. In W. Stewart, ed., Numerical Solution of Markov Chains, Kluwer Publishers, 1995.
[6] U. R. Krieger. On a two-level multigrid solution method for finite Markov chains. Linear Algebra and its Applications, 223/224, 415-438, 1995.
[7] M. Sczittnick and B. Müller-Clostermann. MACOM - a tool for the Markovian analysis of communication systems. Proceedings Fourth International Conference on Data Communication Systems and their Performance, 1990.
[8] V. A. Barker. Numerical solution of sparse singular systems of equations arising from ergodic Markov chains.
55

Commun. Statist.-Stochastic Models, 5(3), 335-381, 1989.
[9] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, Philadelphia, 1994.
[10] P. Buchholz et al. Quantitative Systemanalyse mit Markovschen Ketten. Teubner-Texte zur Informatik, Vol. 8, Teubner, Stuttgart, 1994.
[11] C. D. Meyer and R. J. Plemmons, eds. Linear Algebra, Markov Chains, and Queueing Models. Springer, New York, 1993.

[12] W. Stewart, ed. Numerical Solution of Markov Chains. Marcel Dekker, New York, 1991. [13] W. Stewart. An Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994. [14] S. Chakravarthy and A. Alfa. Matrix-Analytic Methods in Stochastic Models. Marcel Dekker, New York, 1996.

56

First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Brief Introduction to an Algorithmic Approach for Strong Stochastic Bounds

Jean-Michel Fourneau and Nihal Pekergin
PRISM, Université de Versailles-St-Quentin, 78000 Versailles, France

1. Introduction

Despite considerable work, the numerical analysis of Markov chains is still a very difficult problem when the state space is too large or the eigenvalues are badly distributed [13]. Fortunately enough, while modeling high speed networks, it is often sufficient to satisfy the requirements for the Quality of Service (QoS) we expect. Exact values of the performance indices are not necessary in this case, and bounding some reward functions is often sufficient. So, we advocate the use of stochastic bounds to prove that the QoS requirements are satisfied. Our approach differs from sample path techniques and coupling theorems, which are generally applied to transform models, since we only consider Markov chains and algorithmic operations on stochastic matrices. Assume that we have to model a problem using a very large Markov chain. We need to compute its steady-state distribution in order to obtain reward functions (for instance, the cell loss rates for a finite capacity buffer). The key idea of the methodology is to design a new chain such that its reward functions will be upper or lower bounds of the exact reward functions. This new chain is an aggregated or simplified model of the former one. These bounds and the simplification criteria are based on some stochastic orderings applied to Markov chains (see Stoyan [14, 12] and other references therein). As we have drastically reduced the state space or the complexity of the analysis, we may now use numerical methods to efficiently compute a bound on the rewards. We present here some results based on stochastic orderings and structure-based algorithms combined with usual numerical techniques. Thus the algorithms we present can be easily implemented inside software tools based on Markov chains [7]. Unlike former approaches, which are either analytical or not really

constructive, this new approach is based only on simple algorithms. These algorithms can always be applied, even if the resulting bounds are sometimes not accurate enough. We survey the results in two steps: first, how to obtain a bounding matrix and thus a bounding steady-state distribution; second, how to simplify the numerical computations. We present several algorithms based on stochastic bounds and structural properties of the chains. In section 2, we define the st stochastic ordering and we give the fundamental theorem on stochastic matrices. We also present Vincent's algorithm [1], which is the starting point of all the algorithms for the st ordering presented here. Then we present, in section 3, several algorithms for st bounds based on structures: upper-Hessenberg, lumpability [8], stochastic complement [11], Class C matrices [3].

2. Strong Stochastic Bounds

For the sake of simplicity, we restrict ourselves to Discrete Time Markov Chains (DTMC) with finite state space E = {1, . . . , n}, but continuous-time models can be considered after uniformization. Here we restrict ourselves to the st stochastic ordering. In the following, n will denote the size of matrix P and Pi,∗ will refer to row i of P. First, we give a brief overview of stochastic ordering for Markov chains and we obtain a set of inequalities that imply bounds. Then we present the basic algorithm proposed by Vincent and Abu-Amsha [1] and we explain some of its properties.
2.1. A brief overview

Following [14], we define the strong stochastic ordering by the set of non-decreasing functions or by the matrix Kst:

    Kst = | 1 0 0 ... 0 |
          | 1 1 0 ... 0 |
          | 1 1 1 ... 0 |
          | . . .  .  . |
          | 1 1 1 ... 1 |

Definition 1 Let X and Y be random variables taking values on a totally ordered space. Then X is said to be less than Y in the strong stochastic sense, that is, X <st Y, if and only if E[f(X)] ≤ E[f(Y)] for all non decreasing functions f whenever the expectations exist. If X and Y take values on the finite state space {1, 2, . . . , n} with p and q as probability distribution vectors, then X is said to be less than Y in the strong stochastic sense, that is, X <st Y, if and only if

    ∑_{j=k}^n p_j ≤ ∑_{j=k}^n q_j    for k = 1, 2, . . . , n,

or briefly: p Kst ≤ q Kst.

Important performance indices such as average population, loss rates or tail probabilities are non decreasing functions. Therefore, bounds on the distribution imply bounds on these performance indices as well. It is important to know that st-bounds are valid for the transient distributions as well. We do not use this property as we are mainly interested in performance measures on the steady-state. To the best of our knowledge, such a work has still to be done to link st-bounds and numerical analysis for the computation of transient distributions. It has been known for a long time that monotonicity and comparability of the one step transition probability matrices of time-homogeneous MCs yield sufficient conditions for their stochastic comparison. This is the fundamental result we use in our algorithms [14]. First let us define the st-comparability of the matrices and the st-monotonicity.

Definition 2 Let P and Q be two stochastic matrices. P <st Q if and only if P Kst ≤ Q Kst. This can also be characterized as P_{i,·} <st Q_{i,·} for all i.

Definition 3 Let P be a stochastic matrix. P is st-monotone if and only if for all u and v, if u <st v then uP <st vP.

Hopefully, st-monotone matrices are completely characterized (this is not the case for other orderings, see [3]).

Definition 4 Let P be a stochastic matrix. P is <st-monotone if and only if Kst^{-1} P Kst ≥ 0 component-wise. Thus we get:

Property 1 Let P be a stochastic matrix. P is st-monotone if and only if for all i and j > i, we have P_{i,·} <st P_{j,·}.

Theorem 1 Let X(t) and Y(t) be two DTMC and P and Q be their respective stochastic matrices. Then X(t) <st Y(t) for all t > 0, if X(0) <st Y(0), st-monotonicity of at least one of the matrices holds, and st-comparability of the matrices holds, that is, P_{i,·} <st Q_{i,·} for all i.
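Both the comparability test of Definition 2 and the monotonicity test of Property 1 reduce to comparing partial tail sums, so they can be checked mechanically. A minimal sketch in Python; the two 3 × 3 matrices are hypothetical examples, not taken from the text:

```python
def tail_sums(row):
    """Suffix sums sum_{j>=l} row[j] for l = 0..n-1."""
    out, acc = [], 0.0
    for x in reversed(row):
        acc += x
        out.append(acc)
    return out[::-1]

def st_less_rows(u, v, eps=1e-12):
    """u <st v for probability vectors (Definition 1)."""
    return all(a <= b + eps for a, b in zip(tail_sums(u), tail_sums(v)))

def st_less_matrix(P, Q):
    """Definition 2: P <st Q, checked row by row."""
    return all(st_less_rows(pi, qi) for pi, qi in zip(P, Q))

def is_st_monotone(P):
    """Property 1: P_{i,.} <st P_{i+1,.} for all i."""
    return all(st_less_rows(P[i], P[i + 1]) for i in range(len(P) - 1))

# hypothetical example matrices
P = [[0.5, 0.3, 0.2],
     [0.4, 0.3, 0.3],
     [0.1, 0.4, 0.5]]
Q = [[0.5, 0.3, 0.2],
     [0.4, 0.2, 0.4],
     [0.1, 0.3, 0.6]]
```

Here both matrices are st-monotone and P <st Q holds.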


Thus, assuming that P is not monotone, we obtain a set of inequalities on the elements of Q:

    ∑_{k=j}^n P_{i,k} ≤ ∑_{k=j}^n Q_{i,k}      ∀ i, j
    ∑_{k=j}^n Q_{i,k} ≤ ∑_{k=j}^n Q_{i+1,k}    ∀ i, j        (1)

2.2. Algorithms

It is possible to derive a set of equalities, instead of inequalities. These equalities provide, once they have been ordered, a constructive way to design a stochastic matrix which yields a stochastic bound. The following equations are increasing in i and decreasing in j.

    ∑_{k=j}^n Q_{1,k} = ∑_{k=j}^n P_{1,k}                                    ∀ j
    ∑_{k=j}^n Q_{i+1,k} = max( ∑_{k=j}^n Q_{i,k} , ∑_{k=j}^n P_{i+1,k} )     ∀ i, j        (2)

The following algorithm [1] constructs an st-monotone upper bounding DTMC Q for a given DTMC P. For the sake of simplicity, we use a full matrix representation for P and Q. Stochastic matrices associated to real performance evaluation problems are usually sparse, and the sparse matrix version of all the algorithms we present here is straightforward. Note that due to the ordering of the indices, the summations ∑_{j=l+1}^n q_{i,j} and ∑_{j=l}^n q_{i-1,j} are already computed when we need them, and they can be stored to avoid computations. However, we let them appear as summations to show the relations with inequalities 1.
Algorithm MSUB
  For l = 1, 2, . . . , n Do q_{1,l} = p_{1,l} EndFor
  For i = 2, 3, . . . , n Do q_{i,n} = max(q_{i-1,n}, p_{i,n}) EndFor
  For l = n-1, n-2, . . . , 1 Do
    For i = 2, 3, . . . , n Do
      q_{i,l} = max( ∑_{j=l}^n q_{i-1,j} , ∑_{j=l}^n p_{i,j} ) - ∑_{j=l+1}^n q_{i,j}
    EndFor
  EndFor

Fig. 1: Algorithm MSUB: The basic algorithm to construct an st-monotone upper bound

Definition 5 We denote by v(P) the matrix obtained after application of Algorithm MSUB to a stochastic matrix P.
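Algorithm MSUB translates directly into code. The sketch below is a straightforward Python transcription of figure 1 (full-matrix version, 0-indexed); for clarity the summations are recomputed instead of stored, as the text suggests:

```python
def msub(P):
    """Construct the st-monotone upper bound v(P) of Fig. 1 (Algorithm MSUB)."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    Q[0] = list(P[0])                        # first row is copied: q_{1,l} = p_{1,l}
    for i in range(1, n):                    # last column: running maximum
        Q[i][n - 1] = max(Q[i - 1][n - 1], P[i][n - 1])
    for l in range(n - 2, -1, -1):           # remaining columns, right to left
        for i in range(1, n):
            tq = sum(Q[i - 1][l:])           # sum_{j>=l} q_{i-1,j}
            tp = sum(P[i][l:])               # sum_{j>=l} p_{i,j}
            Q[i][l] = max(tq, tp) - sum(Q[i][l + 1:])
    return Q

# the 5x5 example matrix P1 of this section
P1 = [[0.5, 0.2, 0.1, 0.2, 0.0],
      [0.1, 0.7, 0.1, 0.0, 0.1],
      [0.2, 0.1, 0.5, 0.2, 0.0],
      [0.1, 0.0, 0.1, 0.7, 0.1],
      [0.0, 0.2, 0.2, 0.1, 0.5]]
Q = msub(P1)
```

Applied to P1, the function reproduces the matrix v(P1) of the example below row by row.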

First let us illustrate Algorithm MSUB on a small matrix. We consider a 5 × 5 matrix P1 and compute the matrix Q:

    P1 = | 0.5 0.2 0.1 0.2 0.0 |        Q = v(P1) = | 0.5 0.2 0.1 0.2 0.0 |
         | 0.1 0.7 0.1 0.0 0.1 |                    | 0.1 0.6 0.1 0.1 0.1 |
         | 0.2 0.1 0.5 0.2 0.0 |                    | 0.1 0.2 0.5 0.1 0.1 |
         | 0.1 0.0 0.1 0.7 0.1 |                    | 0.1 0.0 0.1 0.7 0.1 |
         | 0.0 0.2 0.2 0.1 0.5 |                    | 0.0 0.1 0.1 0.3 0.5 |

Their steady-state distributions are respectively π_{P1} = (0.180, 0.252, 0.184, 0.278, 0.106) and π_Q = (0.143, 0.190, 0.167, 0.357, 0.143). Their expectations are respectively 1.87 and 2.16 (we assume that the first state has index 0 to compute the reward f(i) = i associated to the expectation). Remember that the strong stochastic ordering implies that the expectation of f on distribution π_{P1} is smaller than the expectation of f on distribution π_Q for all non decreasing functions f.

It may happen that the matrix v(P) computed by this algorithm is not irreducible, even if P is irreducible. Indeed, due to the subtraction operation in the inner loops, some elements of v(P) may be zero even if the elements with the same indices in P are positive. However, if matrix v(P) is reducible, it has one essential class of states and the last state belongs to that class. So it is still possible to compute the steady-state distribution for this class. We do not prove the theorem but we present an example of a matrix P2 such that v(P2) is reducible (i.e. states 0, 1 and 2 are transient in matrix v(P2)):

    P2 = | 0.5 0.2 0.1 0.2 0.0 |        Q = v(P2) = | 0.5 0.2 0.1 0.2 0.0 |
         | 0.1 0.7 0.1 0.0 0.1 |                    | 0.1 0.6 0.1 0.1 0.1 |
         | 0.2 0.1 0.5 0.2 0.0 |                    | 0.1 0.2 0.5 0.1 0.1 |
         | 0.0 0.0 0.0 0.7 0.3 |                    | 0.0 0.0 0.0 0.7 0.3 |
         | 0.0 0.2 0.2 0.1 0.5 |                    | 0.0 0.0 0.0 0.5 0.5 |

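The steady-state vectors and expectations of the first example can be reproduced numerically; a sketch using plain power iteration, which is adequate for these small aperiodic chains:

```python
def stationary(P, iters=10000):
    """Steady-state of an irreducible aperiodic DTMC by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def expectation(pi):
    """Reward f(i) = i, first state indexed 0, as in the text."""
    return sum(i * p for i, p in enumerate(pi))

P1 = [[0.5, 0.2, 0.1, 0.2, 0.0],
      [0.1, 0.7, 0.1, 0.0, 0.1],
      [0.2, 0.1, 0.5, 0.2, 0.0],
      [0.1, 0.0, 0.1, 0.7, 0.1],
      [0.0, 0.2, 0.2, 0.1, 0.5]]
Q = [[0.5, 0.2, 0.1, 0.2, 0.0],
     [0.1, 0.6, 0.1, 0.1, 0.1],
     [0.1, 0.2, 0.5, 0.1, 0.1],
     [0.1, 0.0, 0.1, 0.7, 0.1],
     [0.0, 0.1, 0.1, 0.3, 0.5]]
pi_p, pi_q = stationary(P1), stationary(Q)
```

The tail sums of π_Q dominate those of π_{P1}, so any non decreasing reward evaluated on the bound is indeed an upper bound.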
We have derived a new algorithm which tries to keep almost all transitions of P in matrix v(P) and we have proved a necessary and sufficient condition on P to obtain an irreducible matrix (the proof of the theorem is omitted for the sake of readability [2]):

Theorem 2 Let P be an irreducible finite stochastic matrix. The matrix Q computed from P by Algorithm IMSUB is irreducible iff P_{1,1} > 0 and every row of the lower triangle of matrix P contains at least one positive element.

In the following, ε is an arbitrary positive value, and we assume that a summation with a lower index larger than the upper index is 0.

Algorithm IMSUB
  For l = 1, 2, . . . , n Do q_{1,l} = p_{1,l} EndFor
  For i = 2, 3, . . . , n Do q_{i,n} = max(q_{i-1,n}, p_{i,n}) EndFor
  For l = n-1, n-2, . . . , 1 Do
    For i = 2, 3, . . . , n Do
      q_{i,l} = max( 0 , max( ∑_{j=l}^n q_{i-1,j} , ∑_{j=l}^n p_{i,j} ) - ∑_{j=l+1}^n q_{i,j} )
      If (q_{i,l} = 0) and ( (p_{i,l} > 0) or (i = l + 1) ) and ( ∑_{j=l+1}^n q_{i,j} < 1 ) Then
        q_{i,l} = ε (1 - ∑_{j=l+1}^n q_{i,j})
      EndIf
    EndFor
  EndFor

Fig. 2: Algorithm IMSUB: Construction of an irreducible st-monotone upper bound, 0 < ε < 1
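A direct Python transcription of figure 2 (0-indexed, so the sub-diagonal condition i = l + 1 keeps the same form). On matrices where the ε-repair never triggers, the result coincides with plain MSUB, which is the case for the example matrix P1:

```python
def imsub(P, eps=0.1):
    """Irreducible st-monotone upper bound of Fig. 2, with 0 < eps < 1."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    Q[0] = list(P[0])
    for i in range(1, n):
        Q[i][n - 1] = max(Q[i - 1][n - 1], P[i][n - 1])
    for l in range(n - 2, -1, -1):
        for i in range(1, n):
            tail = sum(Q[i][l + 1:])
            Q[i][l] = max(0.0, max(sum(Q[i - 1][l:]), sum(P[i][l:])) - tail)
            # keep a transition of P (or the sub-diagonal) alive for irreducibility
            if Q[i][l] == 0.0 and (P[i][l] > 0.0 or i == l + 1) and tail < 1.0:
                Q[i][l] = eps * (1.0 - tail)
    return Q

P1 = [[0.5, 0.2, 0.1, 0.2, 0.0],
      [0.1, 0.7, 0.1, 0.0, 0.1],
      [0.2, 0.1, 0.5, 0.2, 0.0],
      [0.1, 0.0, 0.1, 0.7, 0.1],
      [0.0, 0.2, 0.2, 0.1, 0.5]]
Q = imsub(P1)
```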
2.3. Properties

The basic algorithm (Algorithm MSUB, figure 1) has several interesting properties which can be proved using a max-plus formulation [4] which appears clearly in equation 2.

Theorem 3 Algorithm MSUB provides the smallest st-monotone upper bound for a matrix P: i.e. if we consider another st-monotone upper bounding DTMC U for P, then Q <st U [1].

However, bounds on the probability distributions may still be improved. The former theorem only states that this algorithm provides the smallest matrix. We have developed new techniques to improve the accuracy of the bounds on the steady-state which are based on some transformations on P [4]. We have studied a linear transformation for stochastic matrices θ(P, α) = (1 - α)I + αP, for α ∈ (0, 1). This transformation has no effect on the steady-state distribution but it has a large influence on the effect of Algorithm MSUB. We have proved in [4] that if the given stochastic matrix is not row diagonally dominant, then the steady-state probability distribution of the optimal st-monotone upper bounding matrix corresponding to the row diagonally dominant transformed matrix is better in the strong stochastic sense than the one corresponding to the original matrix. And we have established that the transformation P/2 + I/2 provides the best bound for the family of linear transformations we have considered.

More precisely:

Theorem 4 Let P be a DTMC of order n, and let α1, α2 ∈ (0, 1) be two different values such that α1 < α2. Then π_{v(θ(P,α1))} <st π_{v(θ(P,α2))} <st π_{v(P)}.

One may ask if there is an optimal value of α. When the matrix is row diagonally dominant (RDD), its diagonal serves as a barrier for the perturbation moving from the upper-triangular part to the strictly lower-triangular part in forming v(P).

Definition 6 A stochastic matrix is said to be row diagonally dominant (RDD) if all of its diagonal elements are greater than or equal to 0.5.

Corollary 1 Let P be a DTMC of order n that is RDD. Then v(P) and v(θ(P, α)) have the same steady-state probability distribution.

Corollary 1 implies that one cannot improve the steady-state probability bounds by choosing a smaller value of α to transform an already RDD DTMC. And α = 1/2 is sufficient to transform an arbitrary stochastic matrix into an RDD one. This first approach was then generalized to transformations based on a set of polynomials which gives better (i.e. more accurate) bounds [5]. Let us first introduce these transformations and their basic properties.

Definition 7 Let D be the set of polynomials φ such that φ(1) = 1, different from the identity, and all the coefficients of φ are non negative.

Proposition 1 Let φ be an arbitrary polynomial in D. Then φ(P) has the same steady-state distribution as P.

Theorem 5 Let φ be an arbitrary polynomial in D. Algorithm MSUB applied on φ(P) provides a more accurate bound than the steady-state distribution of Q, i.e.: π_P <st π_{v(φ(P))} <st π_{v(P)}.

Corollary 1 basically states that the optimal transformation is φ(X) = X/2 + 1/2, if we restrict ourselves to polynomials of degree 1. Such a result is still unknown for arbitrary degree polynomials, even if it is clear that the larger the degree of φ, the more accurate the bound π_{v(φ(P))}. This is illustrated in the example below. Let us consider the stochastic matrix P3 and let us study the polynomials φ1(X) = X/2 + 1/2 and φ2(X) = X²/2 + 1/2.

    P3 = | 0.1 0.2 0.4 0.3 |
         | 0.2 0.3 0.2 0.3 |
         | 0.1 0.5 0.4 0.0 |
         | 0.2 0.1 0.3 0.4 |

First, let us compute φ1(P3) and φ2(P3):

    φ1(P3) = | 0.55 0.10 0.20 0.15 |        φ2(P3) = | 0.575 0.155 0.165 0.105 |
             | 0.10 0.65 0.10 0.15 |                 | 0.080 0.630 0.155 0.135 |
             | 0.05 0.25 0.70 0.00 |                 | 0.075 0.185 0.650 0.090 |
             | 0.10 0.05 0.15 0.70 |                 | 0.075 0.130 0.170 0.625 |

Then, we apply the operator v to obtain the bounds on these matrices:

    v(φ1(P3)) = | 0.55 0.10 0.20 0.15 |        v(φ2(P3)) = | 0.575 0.155 0.165 0.105 |
                | 0.10 0.55 0.20 0.15 |                    | 0.080 0.630 0.155 0.135 |
                | 0.05 0.25 0.55 0.15 |                    | 0.075 0.185 0.605 0.135 |
                | 0.05 0.10 0.15 0.70 |                    | 0.075 0.130 0.170 0.625 |

And,

    v(P3) = | 0.1 0.2 0.4 0.3 |
            | 0.1 0.2 0.4 0.3 |
            | 0.1 0.2 0.4 0.3 |
            | 0.1 0.2 0.3 0.4 |

Finally, we compute the steady-state distributions for all the matrices:

    π_{v(P3)}     = (0.1000, 0.2000, 0.3667, 0.3333)
    π_{v(φ1(P3))} = (0.1259, 0.2587, 0.2821, 0.3333)
    π_{v(φ2(P3))} = (0.1530, 0.2997, 0.2916, 0.2557)
    π_{P3}        = (0.1530, 0.3025, 0.3167, 0.2278)

Clearly, the bounds obtained by the polynomial φ2 are more accurate than the other bounds.
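Proposition 1, the basis of this example, can be checked numerically. The sketch below builds φ1(P3) and φ2(P3) (entries of P3 as in the example) and verifies that their steady-state distributions coincide with that of P3, assuming power iteration suffices for these small chains:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_half(P, square=False):
    """phi1(P) = P/2 + I/2, or phi2(P) = P^2/2 + I/2 when square=True."""
    M = mat_mul(P, P) if square else P
    n = len(P)
    return [[M[i][j] / 2.0 + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def stationary(P, iters=10000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P3 = [[0.1, 0.2, 0.4, 0.3],
      [0.2, 0.3, 0.2, 0.3],
      [0.1, 0.5, 0.4, 0.0],
      [0.2, 0.1, 0.3, 0.4]]
pi = stationary(P3)
pi1 = stationary(poly_half(P3))
pi2 = stationary(poly_half(P3, square=True))
```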
2.4. Time and Space complexity

It must be clear at this point that Algorithm MSUB builds a matrix Q which is, in general, as difficult as P to analyze. This algorithm is only presented here to show that inequalities 1 have algorithmic implications. Concerning its complexity on a sparse matrix, we do not have positive results. Indeed, it may be possible that matrix Q has many more positive elements than matrix P and it may be even completely filled, as in the following example:

    P4 = | 0.5 0.2 0.1 0.1 0.1 |        Q = v(P4) = | 0.5 0.2 0.1 0.1 0.1 |
         | 1.0 0.0 0.0 0.0 0.0 |                    | 0.5 0.2 0.1 0.1 0.1 |
         | 1.0 0.0 0.0 0.0 0.0 |                    | 0.5 0.2 0.1 0.1 0.1 |
         | 1.0 0.0 0.0 0.0 0.0 |                    | 0.5 0.2 0.1 0.1 0.1 |
         | 1.0 0.0 0.0 0.0 0.0 |                    | 0.5 0.2 0.1 0.1 0.1 |

More generally, it is easy to build a matrix P with 3n positive elements resulting in a completely filled matrix v(P). However, the algorithms we survey in the next section provide matrices with structural or numerical properties. Most of them do not suffer from the same complexity problem.

3. Structure based bounding algorithms for st comparison

We can also use the two sets of constraints of equation 1 and add some structural properties to simplify the resolution of the bounding matrix. For instance, the next algorithm provides an upper bounding matrix which is upper-Hessenberg (i.e. the lower triangle except the main sub-diagonal is zero). Therefore the resolution by direct elimination is quite simple. In the following we illustrate this principle with several structures associated to simple resolution methods and present algorithms to build structure based st-monotone bounding stochastic matrices. Most of these algorithms do not assume any particular property or structure for the initial stochastic matrix.
3.1. Upper-Hessenberg structure

Definition 8 A matrix H is said to be upper-Hessenberg if and only if H_{i,j} = 0 for i > j + 1.

The paradigm for the upper-Hessenberg case is the M/G/1 queue. The resolution by recursion for these matrices requires O(m) operations [13].

Property 2 Let P be an irreducible finite stochastic matrix such that P(1,1) > 0 and every row of the lower triangle of P contains at least one positive element. Let Q be computed from P by Algorithm UHMSUB. Then Q is irreducible, st-monotone, upper-Hessenberg and an upper bound for P.

The proof is omitted [2]. This algorithm is slightly different from Algorithm IMSUB. The last two instructions create the upper-Hessenberg structure. Note that the generalization to block upper-Hessenberg matrices is straightforward.
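The recursion mentioned above is worth spelling out: in πQ = π, column j of an upper-Hessenberg Q involves only rows i ≤ j + 1, so π_{j+1} = (π_j - ∑_{i≤j} π_i q_{i,j}) / q_{j+1,j}, and the whole vector follows from π_1 up to normalization. A sketch, using a hypothetical 5 × 5 upper-Hessenberg stochastic matrix:

```python
def uh_stationary(Q):
    """Steady-state of an irreducible upper-Hessenberg stochastic matrix
    by forward recursion on the balance equations, then normalization."""
    n = len(Q)
    x = [0.0] * n
    x[0] = 1.0                     # fix x_1 = 1, solve up to a multiplicative constant
    for j in range(n - 1):
        x[j + 1] = (x[j] - sum(x[i] * Q[i][j] for i in range(j + 1))) / Q[j + 1][j]
    s = sum(x)
    return [v / s for v in x]

H = [[0.5, 0.2, 0.1, 0.2, 0.0],
     [0.1, 0.6, 0.1, 0.1, 0.1],
     [0.0, 0.3, 0.5, 0.1, 0.1],
     [0.0, 0.0, 0.2, 0.7, 0.1],
     [0.0, 0.0, 0.0, 0.5, 0.5]]
pi = uh_stationary(H)
```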

Algorithm UHMSUB
  q_{1,n} = p_{1,n}
  For i = 1, 2, . . . , n-1 Do
    q_{1,i} = p_{1,i}
    q_{i+1,n} = max(q_{i,n}, p_{i+1,n})
  EndFor
  For i = 2, 3, . . . , n Do
    For l = n-1, n-2, . . . , i Do
      q_{i,l} = max( ∑_{j=l}^n q_{i-1,j} , ∑_{j=l}^n p_{i,j} ) - ∑_{j=l+1}^n q_{i,j}
      If (q_{i,l} = 0) and (p_{i,l} > 0) and ( ∑_{j=l+1}^n q_{i,j} < 1 ) Then
        q_{i,l} = ε (1 - ∑_{j=l+1}^n q_{i,j})
      EndIf
    EndFor
    q_{i,i-1} = 1 - ∑_{j=i}^n q_{i,j}
    For l = i-2, i-3, . . . , 1 Do q_{i,l} = 0 EndFor
  EndFor

Fig. 3: Algorithm UHMSUB: Construction of an irreducible upper-Hessenberg st-monotone upper bound, 0 < ε < 1

The application of this algorithm on the matrix P1 already defined leads to the following upper bounding matrix:

    Q = | 0.5 0.2 0.1 0.2 0.0 |
        | 0.1 0.6 0.1 0.1 0.1 |
        | 0.0 0.3 0.5 0.1 0.1 |
        | 0.0 0.0 0.2 0.7 0.1 |
        | 0.0 0.0 0.0 0.5 0.5 |

3.2. Lumpability

Ordinary lumpability is another efficient technique to combine with stochastic bounds [15]. Unlike the former algorithms, lumpability implies a state space reduction. The following algorithm is based on the decomposition of the chain into macro-states. Again we assume that the states are ordered according to the macro-state partition. Let r be the number of macro-states. Let b(k) and e(k) be the indices of the first state and the last state, respectively, of macro-state A_k. First, let us recall the definition of ordinary lumpability.

Definition 9 (ordinary lumpability) Let Q be the matrix of an irreducible finite DTMC, and let (A_k) be a partition of the states of the chain. The chain is ordinary lumpable according to the partition (A_k), if and only if for all states e and f in the same arbitrary macro-state A_i, we have:

    ∑_{j ∈ A_k} q_{e,j} = ∑_{j ∈ A_k} q_{f,j}    for every macro-state A_k.

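Definition 9 is easy to test mechanically, and when it holds the aggregated chain is obtained by summing each block of a row. A small sketch; the 3 × 3 matrix and the partitions are hypothetical examples, not taken from the text:

```python
def is_ordinary_lumpable(Q, partition, tol=1e-12):
    """Definition 9: inside each macro-state, every row has the same
    aggregated transition probability toward every macro-state."""
    for block in partition:
        for target in partition:
            sums = [sum(Q[i][j] for j in target) for i in block]
            if max(sums) - min(sums) > tol:
                return False
    return True

def lump(Q, partition):
    """Aggregated chain on the macro-states (valid when Q is lumpable)."""
    return [[sum(Q[block[0]][j] for j in target) for target in partition]
            for block in partition]

Q = [[0.4, 0.3, 0.3],
     [0.2, 0.5, 0.3],
     [0.2, 0.1, 0.7]]
part = [[0], [1, 2]]
lumped = lump(Q, part)
```

Here Q is ordinary lumpable for the partition {1}, {2, 3} and the lumped 2 × 2 chain is ((0.4, 0.6), (0.2, 0.8)); it is not lumpable for the partition {1, 3}, {2}.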
Ordinary lumpability constraints are consistent with the st-monotonicity and they provide a simple characterization for matrix Q.

Theorem 6 Let Q be an st-monotone matrix which is an upper bound for P. Assume that Q is ordinary lumpable for the partition A_I, 1 ≤ I ≤ r, and let Q^{m,l} and P^{m,l} be the blocks of transitions from set A_m to set A_l for Q and P respectively. Then for all m and l, block Q^{m,l} is st-monotone.

Indeed, since Q is st-monotone we have:

    ∑_{j=a}^n Q(i,j) ≤ ∑_{j=a}^n Q(i+1,j)    ∀ i        (3)

But as Q is ordinary lumpable, if i and i+1 are in the same macro-state, we have:

    ∑_{j ∈ A_k} Q(i,j) = ∑_{j ∈ A_k} Q(i+1,j)    ∀ k

So we can subtract in both terms of relation 3 the partial sums on the macro-states, which are all equal due to ordinary lumpability. Therefore, assuming that a, i and i+1 are in the same macro-state A_k, we get:

    ∑_{j ≥ a, j ∈ A_k} Q(i,j) ≤ ∑_{j ≥ a, j ∈ A_k} Q(i+1,j)

The algorithm computes the matrix column by column. Each block needs two steps. The first step is based on the basic algorithm while the second step modifies the first column of the block to satisfy the ordinary lumpability constraint. More precisely, the first step uses the same relations as Algorithm IMSUB, but it has to take into account that the first rows of P and Q may now be different due to the second step. The lumpability constraint is only known at the end of the first step. Recall that ordinary lumpability requires a constant row sum for the block. Thus after the first step, we know how to modify the first column of the block to obtain a constant row sum. Furthermore, due to st-monotonicity, we know that the maximal row sum is reached for the last row of the block. In step 2, we modify the first column of the block taking into account the last row sum. Once a block has been computed, it is possible to compute the block on its left.

Algorithm LIMSUB
  r: number of macro-states
  For m = r, r-1, . . . , 1 Do
    For l = e(m), . . . , b(m) Do refreshSum(l) EndFor
    normalize(m)
  EndFor

Fig. 4: Algorithm LIMSUB: Construction of an irreducible lumpable st-monotone upper bound

Procedure normalize
  k: index of macro-state
  For y = 1, . . . , r Do
    tmp = ∑_{j=b(k)}^{e(k)} q_{e(y),j}
    For i = b(y), . . . , e(y)-1 Do
      q_{i,b(k)} = tmp - ∑_{j=b(k)+1}^{e(k)} q_{i,j}
    EndFor
  EndFor

Fig. 5: Algorithm normalize: lumpability constraints

Procedure refreshSum
  l: index of column
  For i = 1, 2, . . . , n Do
    q_{i,l} = max( 0 , max( ∑_{j=l}^n q_{i-1,j} , ∑_{j=l}^n p_{i,j} ) - ∑_{j=l+1}^n q_{i,j} )
    If (q_{i,l} = 0) and ( (p_{i,l} > 0) or (i = l + 1) ) and ( ∑_{j=l+1}^n q_{i,j} < 1 ) Then
      q_{i,l} = ε (1 - ∑_{j=l+1}^n q_{i,j})
    EndIf
  EndFor

Fig. 6: Algorithm refreshSum: monotonicity constraints (0 < ε < 1)

Let us illustrate the two steps on a simple example using the matrix P1 formerly defined. Assume that we divide the state-space into two macro-states: A1 = {1, 2} and A2 = {3, 4, 5}. We give on the left the matrix after the first step (application of procedure refreshSum for macro-state A2) and on the right after the second step (application of procedure normalize).

    | · · 0.1 0.2 0.0 |        | · · 0.1 0.2 0.0 |
    | · · 0.1 0.1 0.1 |        | · · 0.1 0.1 0.1 |
    | · · 0.5 0.1 0.1 |        | · · 0.7 0.1 0.1 |
    | · · 0.1 0.7 0.1 |        | · · 0.1 0.7 0.1 |
    | · · 0.1 0.3 0.5 |        | · · 0.1 0.3 0.5 |

(only the three columns of macro-state A2 are displayed)

3.3. Class C Stochastic Matrices

Some stochastic matrices also have a closed form steady-state solution, for instance the class C matrices defined in [3].

Definition 10 A stochastic matrix Q = (q_{i,j})_{1≤i,j≤n} belongs to class C if for each column j there exists a real constant c_j satisfying the following conditions: q_{i+1,j} = q_{i,j} + c_j, 1 ≤ i ≤ n-1.

Since Q is a stochastic matrix, the sum of the elements in each row must be equal to 1, thus ∑_{j=1}^n c_j = 0. For instance, the following matrix is in class C with c1 = -0.1, c2 = 0.05 and c3 = 0.05:

    P = | 0.45 0.15 0.40 |
        | 0.35 0.20 0.45 |
        | 0.25 0.25 0.50 |

These matrices have several interesting properties for the st ordering and for other orderings as well. First, the steady-state distribution π of Q can be computed in linear time:

    π_j = q_{1,j} + c_j ( ∑_{k=1}^n k q_{1,k} - 1 ) / ( 1 - ∑_{k=1}^n k c_k )        (4)

The st-monotonicity characterization is also quite simple in this class:

Proposition 2 Let P be a stochastic matrix belonging to class C. P is st-monotone if and only if ∑_{k=j}^n c_k ≥ 0, ∀ j ∈ {1, . . . , n}.

The algorithm to obtain a monotone upper bound Q of class C for an arbitrary matrix P has been presented in [3]. First remark that since the upper bounding matrix Q belongs to class C, we must determine its first row q_{1,j}, 1 ≤ j ≤ n, and the column coefficients c_j, 1 ≤ j ≤ n, rather than all the elements of Q. Within the former algorithms the elements of Q are linked by inequalities, but now we add the linear relations which define the class C. For instance, we have q_{n,n} = q_{1,n} + (n-1) c_n. Therefore we must choose carefully q_{1,n} and c_n to ensure that 0 ≤ q_{n,n} ≤ 1. Note that x⁺ denotes as usual max(x, 0).
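Equation (4) needs only the first row and the column increments, so the steady-state of a class C matrix really is a linear-time computation. A sketch using the 3 × 3 class C example above:

```python
def class_c_stationary(q1, c):
    """Closed form (4): pi_j = q_{1,j} + c_j * alpha, states numbered 1..n,
    with alpha = (sum_k k*q_{1,k} - 1) / (1 - sum_k k*c_k)."""
    n = len(q1)
    alpha = ((sum((k + 1) * q1[k] for k in range(n)) - 1.0)
             / (1.0 - sum((k + 1) * c[k] for k in range(n))))
    return [q1[j] + c[j] * alpha for j in range(n)]

q1 = [0.45, 0.15, 0.40]    # first row of the class C example matrix
c = [-0.10, 0.05, 0.05]    # its column constants c_j
pi = class_c_stationary(q1, c)
```

The result can be checked against the balance equations of the full matrix, whose row i is simply q1 + (i - 1) c.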

Algorithm CCMSUB first computes q_{1,n} and the column coefficient c_n from the last column of P; then, for j = n-1, n-2, . . . , 2, it computes q_{1,j} and c_j from the partial sums ∑_{k=j}^n p_{i,k} of P and from the already computed columns; finally it sets q_{1,1} = 1 - ∑_{j=2}^n q_{1,j}.

Fig. 7: Algorithm CCMSUB: Construction of a class C, st-monotone upper bound

Again consider an example: let P5 be a matrix which does not belong to class C, and Q its upper bounding matrix computed through Algorithm CCMSUB:

    P5 = | 0.5 0.1 0.4 |        Q = | 0.5 0.10 0.40 |
         | 0.7 0.1 0.2 |            | 0.4 0.15 0.45 |
         | 0.3 0.2 0.5 |            | 0.3 0.20 0.50 |

Since c1 = -0.1, c2 = 0.05 and c3 = 0.05, Q belongs to class C. The steady-state distributions are:

    π_{P5} = (0.4456, 0.1413, 0.4130),    π_Q = (0.3941, 0.1529, 0.4529),    and π_{P5} <st π_Q.
3.4. Partition and stochastic complement

The stochastic complement was initially proposed by Meyer in [11] to uncouple Markov chains and to provide a simple approximation for the steady-state. Here we propose a completely different idea based on an easy resolution of the stochastic complement. Let us consider a block decomposition of Q:

    Q = | A B |
        | C D |

where A, B, C, and D are matrices of size n0 × n0, n0 × n1, n1 × n0 and n1 × n1 (with n0 + n1 = n). We know that I - D is not singular if P is not reducible [11]. We decompose π into two components π0 and π1 to obtain the stochastic complement formulation for the steady-state equation:

    π0 R = 0,    π0 r = 1,    π1 = π0 H        (5)

where H = B (I - D)^{-1}, R = I - A - HC and r = e0 + H e1. Following Quessette [9], we chose to partition the states such that the matrix D is upper triangular with positive diagonal elements. It should be clear that this partition is not mandatory for the theory of the stochastic complement. However, it simplifies the computation of H. Such a partition is always possible, even if in some cases it implies that n1 is very small [9]. It is quite simple to derive from the basic algorithm an algorithm which builds a matrix of this form once the partition has been fixed. The algorithm has two steps. The first step is the construction of a monotone stochastic upper bounding matrix. In the second step, we remove the transitions in the lower triangle of D and sum up their probabilities in the corresponding diagonal elements of D.
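Equations (5) can be verified numerically: π0 R = 0 means that π0 is stationary for the stochastic complement S = A + HC, and the censored solution must agree with the stationary vector of the full chain. A sketch on a hypothetical 4-state DTMC; no special structure for D is assumed here, so each row of H is obtained by solving a small linear system:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for k in range(i, n + 1):
                M[r][k] -= f * M[i][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def stationary(P, iters=10000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def censored_stationary(P, n0):
    """Steady-state via equations (5): pi0 from the stochastic complement
    S = A + H C with H = B (I - D)^{-1}, then pi0 r = 1 and pi1 = pi0 H."""
    n = len(P); n1 = n - n0
    A = [row[:n0] for row in P[:n0]]; B = [row[n0:] for row in P[:n0]]
    C = [row[:n0] for row in P[n0:]]; D = [row[n0:] for row in P[n0:]]
    # (I - D)^T, so that solving (I-D)^T h = B_i^T gives row i of H
    ImDT = [[(1.0 if i == j else 0.0) - D[j][i] for j in range(n1)] for i in range(n1)]
    H = [solve(ImDT, B[i]) for i in range(n0)]
    S = [[A[i][j] + sum(H[i][k] * C[k][j] for k in range(n1)) for j in range(n0)]
         for i in range(n0)]
    pi0 = stationary(S)                          # pi0 R = 0  <=>  pi0 S = pi0
    r = [1.0 + sum(Hi) for Hi in H]              # r = e0 + H e1
    norm = sum(p * ri for p, ri in zip(pi0, r))  # enforce pi0 r = 1
    pi0 = [p / norm for p in pi0]
    pi1 = [sum(pi0[i] * H[i][j] for i in range(n0)) for j in range(n1)]
    return pi0 + pi1

P = [[0.5, 0.2, 0.2, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.2, 0.2, 0.4, 0.2],
     [0.1, 0.4, 0.3, 0.2]]
full = stationary(P)
cens = censored_stationary(P, 2)
```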
3.5. Single Input Macro State Markov Chain


Feinberg and Chiu [6] have studied chains divided into macro-states where the transitions entering a macro-state must go through exactly one node. This node is denoted as the input node of the macro-state. They have developed an algorithm to efficiently compute the steady-state distribution by decomposition. It consists of the resolution of the macro-states in isolation and the analysis of the chain reduced to the input nodes. Unlike ordinary lumpability, the assumptions of the theorem are based on the graph of the transitions and do not take into account the real transition rates. It is very easy to modify the basic algorithm to create a Single Input Macro State Markov chain. We assume that for every macro-state, the input state is the last state of the macro-state. Thus the matrix Q has its diagonal blocks A, B, C, . . . full, while in every off-diagonal block only the last column (the one leading to the input state of the destination macro-state) may contain positive elements.

Let us apply this algorithm on the matrix P1 with the following partition A1 = {1, 2} and A2 = {3, 4, 5} to obtain the matrix Q:

    Q = | 0.5 0.2 0.0 0.0 0.3 |
        | 0.1 0.6 0.0 0.0 0.3 |
        | 0.0 0.3 0.4 0.0 0.3 |
        | 0.0 0.1 0.1 0.5 0.3 |
        | 0.0 0.1 0.1 0.3 0.5 |

This structure has been used by several authors even if their proofs of comparison are usually based on sample-path theorems [10].

4. Conclusions

Strong stochastic bounds are not limited to sample-path proofs. It is now possible to compute bounds of the steady-state distribution directly from the chain. This approach may be especially useful for high speed networks modeling where the performance requirements are thresholds. Using the algorithmic approach we survey in this paper, a sample-path proof is not necessary anymore and these algorithms may be integrated into software performance tools based on Markov chains. Generalizations to other orderings or to the computation of transient measures are still important problems for performance analysis. Moreover, this approach can also be applied to probabilistic model checking.

References

[1] O. Abuamsha and J.M. Vincent. An algorithm to bound functionals of Markov chains with large state space. In 4th Informs Conference on Telecommunications, 1998.
[2] M. Ben Mamoun, J.M. Fourneau, N. Pekergin, and A. Troubnikoff. An algorithmic and numerical approach to bound the performance of high speed networks. In MASCOTS 2002, pages 375-383. IEEE Computer Society, 2002.
[3] M. Ben Mamoun and N. Pekergin. Closed-form stochastic bounds on the stationary distribution of Markov chains. Probability in the Engineering and Informational Sciences, (16):403-426, 2002.
[4] T. Dayar, J.M. Fourneau, and N. Pekergin. Transforming stochastic matrices for stochastic comparison with the st-order. RAIRO-RO, (37):85-97, 2003.
[5] T. Dayar, J.M. Fourneau, N. Pekergin, and J.M. Vincent. A new proof of st-comparison for polynomials of a stochastic matrix. 2004. Submitted.

[6] B.N. Feinberg and S.S. Chiu. A method to calculate steady-state distributions of large Markov chains by aggregating states. Operations Research, (35):282-290, 1987.
[7] J.M. Fourneau, M. Lecoz, N. Pekergin, and F. Quessette. An open tool to compute stochastic bounds on steady-state distributions and rewards. In MASCOTS 2003, pages 219-225. IEEE Computer Society, 2003.
[8] J.M. Fourneau, M. Lecoz, and F. Quessette. Algorithms for an irreducible and lumpable strong stochastic bound. In Numerical Solution of Markov Chains, 2003. To appear in Linear Algebra and its Applications.
[9] J.M. Fourneau and F. Quessette. Graphs and stochastic automata networks. In Numerical Solution of Markov Chains, 1995.
[10] J.C. Lui, R.R. Muntz, and D. Towsley. Bounding the mean response time of the minimum expected delay routing policy: an algorithmic approach. IEEE Transactions on Computers, 44(12):1371-1382, 1995.
[11] C.D. Meyer. Stochastic complementation, uncoupling Markov chains and the theory of nearly reducible systems. SIAM Review, 31(2), 1989.
[12] M. Shaked and J.G. Shanthikumar. Stochastic Orders and Their Applications. Academic Press, 1994.
[13] W.J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
[14] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. John Wiley and Sons, 1983.
[15] L. Truffet. Reduction technique for discrete time Markov chains on totally ordered state space using stochastic comparisons. Journal of Applied Probability, 37(3), 2000.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

A review of numerical methods for solving large Markov chains

Beata Bylina^a, Jarosław Bylina^a

^a Department of Computer Science, Maria Curie-Skłodowska University, Pl. M. Curie-Skłodowskiej 1, 20-031 Lublin, Poland, e-mail: {beatas,jmbylina}@hektor.umcs.lublin.pl

1. Introduction

In this paper we would like to make a review of chosen numerical algorithms used for the numerical solving of Markov chains. We are interested in stationary solutions of homogeneous, irreducible Markov chains. Such a chain can be described with an infinitesimal generator matrix Q defined for continuous-time Markov chains (CTMCs) as follows:

    Q = (q_ij)_{1≤i,j≤n},    q_ij = lim_{t→0} p_ij(t)/t  for i ≠ j,    q_ii = -∑_{j≠i} q_ij,

where p_ij(t) is the probability that if the chain is in the state i it will be in the state j after the time t. For a discrete-time Markov chain (DTMC) we can define the matrix Q as Q = P - I, where P is a stochastic matrix of the DTMC and it is defined:

    P = (p_ij)_{1≤i,j≤n},

where p_ij is the probability that if the chain is in the state i it will be in the state j in the next time moment. Stationary solutions can be obtained (both for CTMCs and for DTMCs) from a linear system:

    π Q = 0,        (1)

where π = (π_1, . . . , π_n) is a vector of probabilities of the particular states of the Markov chain (so π ≥ 0 and ∑_{i=1}^n π_i = 1) which are to be found. The same probability vector can be obtained as an eigenvector of the stochastic matrix P from: π P = π. For CTMCs we can define P as: P = I + αQ, where 0 < α < 1/(max_{i=1,...,n} |q_ii|). For a convenient notation we assume π = x^T and our problem is to solve:

    Q^T x = 0        (2)

(or, as an eigenvector problem: P^T x = x) with the constraints:

    x ≥ 0,    e^T x = 1.        (3)
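The CTMC-to-DTMC conversion P = I + αQ can be sketched as follows; the 3 × 3 generator is a hypothetical example and α is taken as half the admissible bound 1/max|q_ii|:

```python
def ctmc_to_dtmc(Q, fraction=0.5):
    """Build P = I + alpha*Q with alpha = fraction / max|q_ii| (0 < fraction < 1)."""
    n = len(Q)
    alpha = fraction / max(abs(Q[i][i]) for i in range(n))
    return [[(1.0 if i == j else 0.0) + alpha * Q[i][j] for j in range(n)]
            for i in range(n)]

def stationary(P, iters=10000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

Q = [[-0.5, 0.3, 0.2],
     [0.4, -0.9, 0.5],
     [0.1, 0.2, -0.3]]
P = ctmc_to_dtmc(Q)
pi = stationary(P)
```

Since π P = π implies α π Q = 0, the stationary vector of P is also the stationary vector of the generator Q.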

Despite its familiar form the equation is rather special. The matrix Q is singular, so the equation (2) has solutions, and it can be proven [13] that if rank Q = n - 1 (which is true in the cases interesting for us) there exists exactly one solution satisfying (3). Moreover, the matrix Q is huge (sometimes millions of states or even more), very sparse and ill-conditioned. We have to choose a suitable algorithm for solving our problem (depending on our aims: accuracy, time or size). There are the following approaches to solve the equation (2) [13]: direct methods (variations of the Gaussian elimination, see section 2.); iterative methods (section 3.); projection methods (section 4.); decompositional methods (not covered in this paper; they are material for a separate article of their own).

2. Direct Methods

Methods which would give us an exact solution in a finite number of steps if the machine accuracy were infinite are called direct methods (or sometimes: exact methods). The traits of the direct methods are:
- a constant execution time (or rather a constant number of computation steps for a given matrix size) known in advance;
- modification (or rather complete reconstruction) of the given matrix;
- the fill-in;
- rather good accuracy.
The fill-in is a very troublesome phenomenon. It consists in the appearance of nonzero elements in the output matrices in places of zero elements in the input matrix. It is very inconvenient if we want to store matrices in a compact manner (i.e. without zero entries), which is very efficient and indeed necessary for such huge and sparse matrices. In compact storage schemes we must implement some routines to insert new nonzero elements or provide some space for such entries. However, the amount of this space must be estimated in advance, which is not a trivial problem. Sometimes we simply do not have the space needed for the fill-in.
2.1. The Gaussian Elimination and the LU Factorization

In the Gaussian elimination the equation Ax = b is transformed into:

    Ux = c

where the matrix U = (u_ij) is an upper triangular matrix. From such an equation we can easily obtain the vector x with the back-substitution:

    x_n = c_n / u_nn;    x_i = ( c_i - ∑_{k=i+1}^n u_ik x_k ) / u_ii    for i = n-1, . . . , 2, 1.
The Gaussian elimination transforming the matrix A into the matrix U (and the vector b into the vector c) consists of (n - 1) steps. In the step number i all the elements of the column i lying below the row i are replaced with zeroes by adding the scaled row i to every row below the row i (with analogous changes in the vector b). The scaling factors are the elements of a lower triangular matrix L (which has a unit diagonal) and the LU factorization is given by the equality A = LU (so U = L^{-1} A and c = L^{-1} b). However, our equation (2) is a homogeneous equation with a singular matrix. To solve such an equation with the LU factorization we present four approaches after [13].

Replacing an equation. It is the most intuitive approach. Instead of solving (2) we can solve an equation:

    Q_n^T x = e_n

where e_n = (0, . . . , 0, 1)^T and Q_n is the matrix Q with the column n replaced with the vector e = (1, . . . , 1)^T. In other words, we solve the linear system (2) with the last of its n equations replaced with the normalization equation

    e^T x = 1.        (4)

This approach is the least accurate of those presented in this section. Moreover, it is the slowest, because the matrix Q_n^T is not diagonally dominant any longer and for its LU factorization the pivoting is necessary.

The zero pivot. In this approach (and in the next two approaches) we apply the LU factorization directly to the matrix Q^T (or to its submatrix, as in Removing an equation) so there is no need for the pivoting (the matrix is diagonally dominant). The matrix is both diagonally dominant and singular, therefore in the last step of the Gaussian elimination we get a zero as a diagonal entry (u_nn = 0). So in the back-substitution we start with the equation 0 · x_n = 0, where x_n can be an arbitrary real number α. The remaining equations are solved normally and eventually we get x_i = α ξ_i with ξ_n = 1. The last step is getting rid of α; it is done with the normalization equation (4).

Removing an equation. In this approach the matrix Q^T is divided into blocks:

Q^T = [ B    d ]
      [ c^T  f ].

Here the matrix B is a nonsingular square matrix of the size (n-1) × (n-1), c and d are vectors of the size (n-1) and f is a real number. We also assume x^T = (x̃^T, 1). Now, our equation (2) has the form:

B x̃ + d = 0,
c^T x̃ + f = 0.

Only the first equation, B x̃ = -d, is solved by the LU factorization (the second one is ignored, removed) and after that the vector x^T = (x̃^T, 1) is normalized.

The inverse iteration. We solve the equation Q^T x^(1) = x^(0), which arises when we assume σ = 0, A = Q^T and k = 1 in the iterative formula serving for finding an eigenvector x associated with an approximated eigenvalue σ:

x^(k) = (A - σI)^{-1} x^(k-1).

As a starting vector we choose x^(0) = e_n. The consecutive steps are performed analogously to The zero pivot.
2.2. The GTH Algorithm

For solving the equation (2) the GTH algorithm (given by Grassmann, Taskar and Heyman in 1985 [8]) is recommended. It is an LU factorization variant that appears to be more stable because it exploits the following property of the matrix Q:

q_ii < 0,   q_ij ≥ 0 (i ≠ j),   Σ_{j=1}^n q_ij = 0.

Its disadvantage is more complicated requirements for accessing the matrix items and therefore a more complicated storage scheme and more space for storing the matrix. We present the GTH version of the Gaussian elimination for the matrix Q^T on the figure 1.
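For illustration, the GTH idea can be sketched in its classical state-reduction form (the figure 1 below presents the paper's forward-elimination variant for Q^T; the 3-state generator matrix here is our own example):

```python
import numpy as np

# An illustrative 3-state generator matrix (rows sum to zero).
Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -3.0, 2.0],
              [ 0.0, 1.0, -1.0]])

def gth(Q):
    """Stationary vector by GTH state reduction: the elimination uses only
    additions and multiplications of nonnegative numbers (no cancellation)."""
    A = Q.astype(float).copy()
    n = A.shape[0]
    s = np.zeros(n)
    for k in range(n - 1, 0, -1):
        s[k] = A[k, :k].sum()                  # total rate from state k downwards
        A[:k, :k] += np.outer(A[:k, k], A[k, :k]) / s[k]
    pi = np.zeros(n)
    pi[0] = 1.0
    for k in range(1, n):
        pi[k] = (pi[:k] @ A[:k, k]) / s[k]
    return pi / pi.sum()
```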

for i = 1, 2, …, n-1:
   1. s ← 0.0
   2. for j = i+1, i+2, …, n:
      (a) r ← -q_ji/q_ii
      (b) for k = i+1, i+2, …, n:
         i. q_jk ← q_jk + r·q_ik
      (c) if j > i+1 then s ← s + q_{j,i+1}
   3. q_{i+1,i+1} ← -s

Fig. 1: The GTH version of the Gaussian elimination for the transposed infinitesimal generator matrix

3. Iterative Methods

All the iterative methods have a similar scheme. They start with a starting vector x^(0) and then they generate a sequence (x^(0), x^(1), …) which hopefully converges to the solution vector x. A very general scheme of an iterative method applied to the equation Ax = b is shown on the figure 2.

The advantages of the iterative methods:
- they need no modification of the given matrix (so no fill-in is generated, we do not need any additional space for new elements and we spend no additional time on inserting these elements into a complicated storage structure);
- they need very little additional memory;
- they are usually faster than direct methods, especially when we do not need the very good accuracy offered by the direct methods;
- they are easy to implement efficiently and easy to vectorize and to parallelize.

1. choose x^(0)
2. i ← 0
3. while Q^T x^(i) is too far from 0 do:
   (a) x^(i+1) ← F(x^(i))
   (b) i ← i + 1

Fig. 2. A general scheme of an iterative method

However, the iterative methods have some disadvantages too. We do not know the time needed to achieve the required accuracy. Moreover, sometimes we can even have troubles with convergency and we can achieve a solution not satisfying us, especially when the required accuracy is high (which can be an issue in our applications).

In classical iterative methods (covered in this section), if we are to solve an equation of the form Ax = b, the function F has the form:

F(v) = Hv + c,    (5)

where the matrix H depends on the matrix A and the vector c depends on the matrix A and the vector b. We can obtain some general formulas from such an (informal) deduction. Let A = M - N, where M is not singular. Now we have (M - N)x = b and after some conversions:

x = M^{-1}Nx + M^{-1}b.

Assigning F(v) = M^{-1}Nv + M^{-1}b we get

H = M^{-1}N and c = M^{-1}b    (6)

(in our applications c = 0, because b = 0). For certain matrices M and N the function F from (5) applied to the scheme from the figure 2 gives a convergent method.
3.1. The Power Method

The power method is the simplest iterative approach. It exploits the fact that the unknown vector is an eigenvector of the matrix P^T associated with the eigenvalue 1, so it is a fixed point of the function F(v) = P^T v. Such fixed points sometimes can be obtained from iterations of F, that is, by applying the scheme from the figure 2, and so it is in our case. The step 3a from the figure 2 takes the form:

x^(k+1) ← P^T x^(k)

for a DTMC or the form:

x^(k+1) ← (I + Δt·Q^T) x^(k)

for a CTMC (where 0 < Δt < 1/(max_{i=1,…,n} |q_ii|), see (1)). The power method is convergent but its convergency is very slow.
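A minimal sketch of the power method for a CTMC (the generator matrix is our own illustration; dt stands for the uniformisation constant from (1)):

```python
import numpy as np

# An illustrative 3-state generator matrix (rows sum to zero).
Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -3.0, 2.0],
              [ 0.0, 1.0, -1.0]])

dt = 0.9 / np.abs(np.diag(Q)).max()   # 0 < dt < 1/max_i |q_ii|, see (1)
P = np.eye(3) + dt * Q                # a stochastic matrix

x = np.full(3, 1.0 / 3.0)             # starting vector
for _ in range(500):
    x = P.T @ x                       # step 3a: x <- P^T x
    x = x / x.sum()                   # keep the normalization e^T x = 1
```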
3.2. The Method of Jacobi

The method of Jacobi is a classical iterative method with the coefficient matrix Q^T split as follows:

Q^T = D - (L + U),

what corresponds to assigning

M = D,  N = L + U

in (6). The matrix D = (d_ij) is a diagonal matrix, the matrix L = (l_ij) is a strictly lower triangular matrix (with zeroes on its diagonal) and the matrix U = (u_ij) is a strictly upper triangular matrix (with zeroes on its diagonal). So in this method the step 3a from the figure 2 looks as follows:

x^(k+1) ← D^{-1}(L + U)x^(k)

and in scalar form:

x_i^(k+1) ← (1/d_ii) Σ_{j≠i} (l_ij + u_ij)·x_j^(k),  for i = 1, …, n.

Of course, there is no need for computing D^{-1}, because the matrix D is a diagonal one. The method of Jacobi is much better than the power method and it is very convenient to vectorize and to parallelize [14].

3.3. The Method of Gauss-Seidel

In this method we have the same splitting of the matrix Q^T but with a different grouping of components:

Q^T = (D - L) - U

(the matrices D, L, U are defined as in the previous section). Here we have M = (D - L) and N = U, so the step 3a from the figure 2 is:

x^(k+1) ← (D - L)^{-1}Ux^(k).    (7)

In this case there is no need for computing (D - L)^{-1} either, because the matrix (D - L) is a lower triangular one and a version of the back-substitution (known from the LU factorization) can be performed. The method of Gauss-Seidel can be interpreted as the method of Jacobi in which we use the just computed items of the vector x^(k+1) for computing the next items of the same vector in the same iteration. In scalar form (7) looks as follows:
x_i^(k+1) ← (1/d_ii) ( Σ_{j=1}^{i-1} l_ij·x_j^(k+1) + Σ_{j=i+1}^{n} u_ij·x_j^(k) ),  for i = 1, …, n.

However, this is the cause for which the method of Gauss-Seidel is less convenient for vectorization and parallelization. On the other hand, the convergency of this method is usually better than the one of the method of Jacobi.
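Both steps can be sketched for Q^T x = 0 as follows (the generator matrix is our own illustration; we renormalize after every step, and a generic triangular solve stands in for the back-substitution):

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],       # illustrative generator (rows sum to zero)
              [ 1.0, -3.0, 2.0],
              [ 0.0, 1.0, -1.0]])
A = Q.T                               # we solve A x = 0 with A = Q^T

D = np.diag(np.diag(A))               # diagonal part
L = -np.tril(A, -1)                   # strictly lower part, so that A = D - L - U
U = -np.triu(A, 1)                    # strictly upper part

def iterate(step, x, iters=500):
    for _ in range(iters):
        x = step(x)
        x = x / x.sum()               # re-impose the normalization e^T x = 1
    return x

x0 = np.full(3, 1.0 / 3.0)
# Jacobi: x <- D^{-1} (L + U) x  (elementwise division, no inverse needed)
x_jacobi = iterate(lambda x: (L + U) @ x / np.diag(D), x0)
# Gauss-Seidel: x <- (D - L)^{-1} U x  (a lower triangular solve)
x_gs = iterate(lambda x: np.linalg.solve(D - L, U @ x), x0)
```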

3.4. The Successive Overrelaxation (SOR) Method

The SOR method is a modification of the method of Gauss-Seidel. Here we split the matrix Q^T as follows:

Q^T = (D - ωL)/ω - ((1 - ω)D + ωU)/ω,

where 0 < ω < 2 [15] (for ω = 1 we get the method of Gauss-Seidel). The vector form of the iteration step 3a is:

x^(k+1) ← (D - ωL)^{-1}((1 - ω)D + ωU)x^(k)

and the scalar form is:

x_i^(k+1) ← (1 - ω)·x_i^(k) + (ω/d_ii) ( Σ_{j=1}^{i-1} l_ij·x_j^(k+1) + Σ_{j=i+1}^{n} u_ij·x_j^(k) ),  for i = 1, …, n.

The parameter ω controls how much of the previous approximation vector will be taken as a part of the new approximation vector. A good value of the parameter ω can speed up the convergency very much, but its choice is in general still an open problem.
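The SOR step can be sketched in the same setting (illustrative generator of our own; ω = 1.2 is an arbitrary choice, not a tuned value):

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],       # illustrative generator (rows sum to zero)
              [ 1.0, -3.0, 2.0],
              [ 0.0, 1.0, -1.0]])
A = Q.T
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                   # A = D - L - U
U = -np.triu(A, 1)

omega = 1.2                           # relaxation parameter, 0 < omega < 2
x = np.full(3, 1.0 / 3.0)
for _ in range(500):
    # SOR step: x <- (D - omega L)^{-1} ((1 - omega) D + omega U) x
    x = np.linalg.solve(D - omega * L, ((1.0 - omega) * D + omega * U) @ x)
    x = x / x.sum()                   # keep e^T x = 1
```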
3.5. The Preconditioned Power Method

The original power method (section 3.1.) can be very slow for some matrices P. However, its convergency can be enhanced by preconditioning. The idea behind the preconditioning is to change the given linear system to get better convergency, but without changing its solutions. The original system Ax = b is replaced by M^{-1}Ax = M^{-1}b, where M^{-1} is called a preconditioning matrix. The matrix M should be chosen so that the matrix M^{-1} is easy to compute (for example by the Gaussian elimination) and approximates A^{-1}. Of course, our equation is Q^T x = 0 with the singular coefficient matrix Q^T, so the matrix (Q^T)^{-1} cannot be approximated, because it does not exist. Instead, we can

choose the matrix M^{-1} to approximate the matrix (Q^T)^#, the group inverse of the matrix Q^T. Now we can solve a new system:

M^{-1}Q^T x = 0

with the power method (which now has a much better convergency) and our step 3a from the figure 2 is:

x^(k+1) ← (I + M^{-1}Q^T)x^(k).

How to find a suitable matrix M? The most effective methods are incomplete LU factorizations. Such an incomplete LU factorization consists in a usual factorization of Δt·Q^T (0 < Δt < 1/(max_{i=1,…,n} |q_ii|) as in (1)), but some entries in the output matrices are omitted and replaced by zeroes. In such an approach we get an invertible matrix M which differs from Δt·Q^T by a remainder matrix E, being small in some sense:

M = LU = Δt·Q^T + E,

and the fill-in is very little (or controlled at least) or even none. All the methods of incomplete factorizations described below can give good results, but they are not sufficiently investigated, especially in the applications to Markov chains.

ILU(0). This is the most straightforward manner of the incomplete factorization. It generates no fill-in, because all the nonzero elements arising on the places of zero elements in the input matrix during the factorization are discarded. In other words, L + U has the same zero structure as the matrix Q^T.

ILUTH. The incomplete LU factorization with a threshold is a method where the LU

factorization is performed in a usual row-by-row manner, but only the result items with their absolute values greater than (or equal to) the given threshold (and all the diagonal elements) are remembered; values less than the threshold are replaced with zeroes.

ILUK. In this type of the incomplete LU factorization we determine the amount of memory available for the output matrices in advance. We do this by defining a number K. After transforming each row, only the K largest elements from this row (and the diagonal element) are remembered.
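A dense sketch of ILU(0) (our own illustration; we assume no pivoting is needed):

```python
import numpy as np

def ilu0(A):
    """Incomplete LU with zero fill-in: eliminate as in Gaussian elimination,
    but keep only entries lying on the nonzero pattern of the input matrix."""
    n = A.shape[0]
    pattern = A != 0
    LU = A.astype(float).copy()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if not pattern[j, i]:
                continue
            LU[j, i] = LU[j, i] / LU[i, i]        # multiplier, stored as L[j, i]
            for k in range(i + 1, n):
                if pattern[j, k]:                 # fill-in outside the pattern is discarded
                    LU[j, k] -= LU[j, i] * LU[i, k]
    Lf = np.tril(LU, -1) + np.eye(n)
    Uf = np.triu(LU)
    return Lf, Uf
```

The product of the computed factors agrees with the input matrix on its nonzero pattern; the difference outside the pattern is the remainder matrix E.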

4. Projection Methods

The projection methods consist in approximating the solution vector with a vector from a small-dimension subspace. Such approximations are repeated until our approximation is sufficiently close to the solution in some sense, so the projection methods are iterative methods. The projection methods need more space than classical iterative methods (because they have to store huge basis vectors of subspaces), but they can converge faster than classical iterative methods, although the convergence rate is much better for matrices more beautiful in their structure than the ones arising in solving Markov chains.
4.1. The Projection Step

To solve a linear system Ax = b by a projection method, first we have to choose two subspaces of dimension m from the n-dimensional space:
- K, which is a subspace containing the approximation;
- L, which is a subspace defining constraints for the selection of the approximation from K.

Let the subspace K be spanned by V = (v_1, …, v_m). The approximated solution is in K, so it can be written x = Vy, where y is an m-dimensional unknown vector. To find y we require that the residual vector b - Ax = b - AVy be orthogonal to the subspace L spanned by W = (w_1, …, w_m), that is:

W^T(b - AVy) = 0

and then (if the matrix W^T AV is nonsingular):

y = (W^T AV)^{-1} W^T b.

If we know an initial approximation x^(0), we will rather seek the difference d between the exact solution x and x^(0): x = x^(0) + d. Setting r^(0) = b - Ax^(0), we are to solve the equation

Ad = r^(0)

1. choose:
   - an initial approximation x^(0)
   - a subspace K spanned by V = [v_1, …, v_m]
   - a subspace L spanned by W = [w_1, …, w_m]
2. r^(0) ← -Q^T x^(0)
3. y ← (W^T AV)^{-1} W^T r^(0)
4. x^(0) ← x^(0) + Vy

Fig. 3. A basic projection step for the equation Q^T x = 0

what can be done with the projection step described above. A basic projection step for our equation (2) (where A = Q^T and b = 0) is shown on the figure 3. The most efficient methods for general, non-symmetric coefficient matrices (as Q^T) are methods based on Krylov subspaces. A Krylov subspace is defined by its dimension m, a matrix A and a vector v:

K_m(A, v) = span{v, Av, A^2v, …, A^{m-1}v}.

Many of such methods require that an orthonormal basis be found for the Krylov subspace. Unfortunately, classical Gram-Schmidt orthogonalization is numerically poor. To deal with it there are two main kinds of methods: the Arnoldi process (which is a modified Gram-Schmidt orthogonalization) and Lanczos methods (originally for symmetric coefficient matrices but generalized in some ways).
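The basic projection step can be sketched on a generic system Ax = b (we use a random nonsingular test matrix instead of Q^T and simply take L = K, i.e., W = V):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.normal(size=(n, n)) + 10.0 * np.eye(n)   # a generic nonsingular test matrix
b = rng.normal(size=n)

x0 = np.zeros(n)
r0 = b - A @ x0                                  # residual of the initial guess
V, _ = np.linalg.qr(rng.normal(size=(n, m)))     # orthonormal basis of K
W = V                                            # here we simply take L = K

y = np.linalg.solve(W.T @ A @ V, W.T @ r0)       # the projection step
x1 = x0 + V @ y                                  # improved approximation
```

By construction, the new residual b - Ax1 is orthogonal to the subspace spanned by W.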
4.2. The Arnoldi Process

The Arnoldi process [1] on its own (see the figure 4) generates the orthonormal basis V = (v_1, …, v_m) for the subspace K_m(A, v) and an upper Hessenberg matrix H =

1. v_1 ← v/||v||_2
2. for j = 1, 2, …, m:
   (a) w ← Av_j
   (b) for i = 1, 2, …, j:
      i. h_ij ← v_i^T w
      ii. w ← w - h_ij·v_i
   (c) h_{j+1,j} ← ||w||_2
   (d) v_{j+1} ← w/h_{j+1,j}

Fig. 4. The basic Arnoldi process for a subspace K_m(A, v)

(h_ij):

H = [ h_11   h_12   h_13   ...   h_1,m-1   h_1,m  ]
    [ h_21   h_22   h_23   ...   h_2,m-1   h_2,m  ]
    [ 0      h_32   h_33   ...   h_3,m-1   h_3,m  ]
    [ 0      0      h_43   ...   h_4,m-1   h_4,m  ]
    [ ...                  ...   ...       ...    ]
    [ 0      0      0      ...   h_m,m-1   h_m,m  ]

which represents the linear transformation A restricted to K_m(A, v) with respect to the basis V, that is H = V^T AV. The original Arnoldi process applied to a linear system Ax = b is called the full orthogonalization method (FOM) [10], but a better approach is the generalized minimum residual algorithm (GMRES) [11]. Both the methods are shown on the figure 5. They differ only in one step: how to find the vector y (but both the procedures are projections [11]). The GMRES algorithm is very popular in its iterative form. In the iterative GMRES,

1. choose x^(0) and m
2. r^(0) ← -Q^T x^(0)
3. β ← ||r^(0)||_2
4. v_1 ← r^(0)/β
5. for j = 1, …, m:
   (a) w ← Q^T v_j
   (b) for i = 1, …, j:
      i. h_ij ← v_i^T w
      ii. w ← w - h_ij·v_i
   (c) h_{j+1,j} ← ||w||_2
   (d) v_{j+1} ← w/h_{j+1,j}
6. FOM only:
   find y = (y_1, …, y_m) from the m × m Hessenberg system Hy = βe_1
7. GMRES only:
   find y = (y_1, …, y_m) minimizing ||βe_1 - H̄y||_2,
   where H̄ = (h_ij) is an (m + 1) × m upper Hessenberg matrix
8. x^(0) ← x^(0) + Σ_{i=1}^m v_i·y_i

Fig. 5. The FOM and GMRES methods for the equation Q^T x = 0

after computing the new vector x^(0), the new residual Q^T x^(0) is checked whether it is sufficiently close to 0. If not, the whole algorithm is repeated with the new x^(0) as an initial guess.
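The Arnoldi process and one (non-restarted) GMRES correction can be sketched as follows (random test matrix, x^(0) = 0; the breakdown case h_{j+1,j} = 0 is not handled):

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi process: orthonormal vectors v_1..v_{m+1} and the (m+1) x m
    upper Hessenberg matrix H satisfying A V_m = V_{m+1} H."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# One (non-restarted) GMRES correction for A x = b, starting from x0 = 0:
rng = np.random.default_rng(1)
n, m = 8, 4
A = rng.normal(size=(n, n)) + 5.0 * np.eye(n)    # generic test matrix
b = rng.normal(size=n)
V, H = arnoldi(A, b, m)                          # r0 = b, since x0 = 0
beta_e1 = np.zeros(m + 1)
beta_e1[0] = np.linalg.norm(b)
y, *_ = np.linalg.lstsq(H, beta_e1, rcond=None)  # minimize ||beta e1 - H y||
x = V[:, :m] @ y                                 # approximation from K_m(A, b)
```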
4.3. The Methods Related to Conjugate Gradients

The original symmetric Lanczos algorithm is used for finding approximations of eigenvalues of a symmetric matrix A. For such a matrix a tridiagonal, symmetric matrix T is generated, the eigenvalues of which are approximations of a subset of the eigenvalues of the given matrix. When we perform the Arnoldi process on a symmetric matrix we get H = T, that is, Arnoldi's matrix H becomes:

T = [ α_1  β_2                           ]
    [ β_2  α_2  β_3                      ]
    [      β_3  α_3  β_4                 ]
    [           ...  ...   ...           ]
    [           β_{m-1}  α_{m-1}  β_m    ]
    [                    β_m      α_m    ]

For m = n the eigenvalues of T are the same as the eigenvalues of A (if the arithmetic is exact). There are some methods for solving Ax = b that are connected to the Lanczos method: first the steepest descent method and the conjugate gradient (CG) method (the latter is shown on the figure 6). Unfortunately, these methods are usable only for symmetric, positive-definite matrices. However, there exist some modifications of the CG method suitable for us.

CGNR. The conjugate gradient method for the normal equations consists in solving the normal equations:

A^T Ax = A^T b

or:

AA^T z = b and then x = A^T z

1. choose x^(0)
2. r^(0) ← b - Ax^(0)
3. v^(0) ← 0
4. β ← 0
5. for j = 1, 2, …:
   (a) v^(j) ← r^(j-1) + β·v^(j-1)
   (b) α ← (r^(j-1))^T r^(j-1) / (v^(j))^T Av^(j)
   (c) x^(j) ← x^(j-1) + α·v^(j)
   (d) r^(j) ← r^(j-1) - α·Av^(j)
   (e) β ← (r^(j))^T r^(j) / (r^(j-1))^T r^(j-1)

Fig. 6. The conjugate gradient algorithm

with the CG method, instead of Ax = b. The matrix A^T A (and also AA^T) is a symmetric, positive-definite matrix, so the CG method can be used. Unfortunately, such an approach requires more computations and has a worse condition.

BCG and CGS. The more suitable algorithms are the biconjugate gradient algorithm (BCG) [7] and the conjugate gradient squared algorithm (CGS) [12]. The latter (shown on the figure 7) converges faster than the former and requires less computation per iteration. Moreover, both the algorithms require less work and memory per iteration than the GMRES algorithm. However, the approximations from BCG and CGS do not satisfy an optimality property (as the GMRES approximations do), which implies that GMRES will need no more iterations than the algorithms described here. Moreover, the GMRES approximation errors decrease monotonically, while in BCG and CGS they do not, so we have no guarantee that the algorithms described in this section give us a solution at all.
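Figure 6 translates almost line by line into code (a sketch for a symmetric positive-definite A; the early-exit guard on a tiny residual is our addition):

```python
import numpy as np

def conjugate_gradient(A, b, iters=100):
    """The CG algorithm of figure 6 for a symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    v = np.zeros_like(b)
    beta = 0.0
    for _ in range(iters):
        if np.linalg.norm(r) < 1e-12:
            break                             # our guard: residual already tiny
        v = r + beta * v                      # (a) new search direction
        alpha = (r @ r) / (v @ (A @ v))       # (b) step length
        x = x + alpha * v                     # (c) update the iterate
        r_new = r - alpha * (A @ v)           # (d) update the residual
        beta = (r_new @ r_new) / (r @ r)      # (e)
        r = r_new
    return x
```

For CGNR one would apply the same routine to the normal equations, e.g. `conjugate_gradient(A.T @ A, A.T @ b)`.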
4.4. The Preconditioned Iterative Methods

For an ill-conditioned matrix (as Q^T is) the projection methods can converge slowly. To make the condition better we can use a preconditioner in a similar manner as it was described for the power method in the section 3.5.

5. Conclusion

In this paper we described traditional methods for solving

Ax = b

that can be used to solve the equation

Q^T x = 0

arising during modelling with Markov chains. The methods are still being developed. Some traditional methods are adapted to Markov chains, like the WZ factorization [4]. The vectorized, parallelized and distributed implementations of various of the methods described here (and many more) are worked on

1. choose x^(0)
2. r^(0) ← -Q^T x^(0)
3. r̃^(0) ← r^(0)
4. p ← 0
5. q ← 0
6. ρ_0 ← 1
7. k ← 0
8. do until r^(k) is sufficiently near to zero:
   (a) k ← k + 1
   (b) ρ_k ← (r̃^(0))^T r^(k-1)
   (c) β ← ρ_k/ρ_{k-1}
   (d) u ← r^(k-1) + β·q
   (e) p ← u + β·(q + β·p)
   (f) v ← Q^T p
   (g) α ← ρ_k/(v^T r̃^(0))
   (h) q ← u - α·v
   (i) u ← u + q
   (j) x^(k) ← x^(k-1) + α·u
   (k) r^(k) ← -Q^T x^(k)
9. normalize x^(k)

Fig. 7. The conjugate gradient squared algorithm applied to Q^T x = 0


[5, 6, 9]. Thus the applications of supercomputers are investigated [14]. There is much hope in some new methods, such as decompositional methods and methods based on Kronecker algebra [3]. The combinations of the various traditional methods are investigated as well.

Selection of a suitable solution method is by no means easy. There are papers on the automatic proper choice of the solving method for a given model [2]. The choice depends on many questions:
- the matrix structure and its degree of decomposability (e.g. is it NCD?);
- the matrix closeness to a suitable structure (and the possibility to convert it);
- the matrix sparseness;
- the matrix size (and our storage possibilities);
- time to find the solution;
- desired accuracy;
- the matrix conditioning.

References

[1] W.E. Arnoldi: The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics 9, 1951, pp. 17-29.

[2] W. Barge, W. Stewart: Autonomous Solution Methods for Large-Scale Markov Chains (to be printed).

[3] P. Buchholz, M. Fischer, P. Kemper: Distributed Steady State Analysis Using Kronecker Algebra, Numerical Solution of Markov Chains, Zaragoza, Spain, 1999, pp. 76-95.

[4] B. Bylina, J. Bylina: Solving Markov chains with the WZ factorization for modelling networks, Proceedings of 3rd International Conference Aplimat 2004, Bratislava 2004, pp. 307-312.

[5] J. Bylina: Distributed solving of Markov chains for computer network models, Annales UMCS Informatica 1 (2003), Lublin 2003, pp. 15-20.

[6] J. Bylina, B. Bylina: GMRES dla rozwiązywania łańcuchów Markowa na komputerze wektorowym CRAY SV1 [GMRES for solving Markov chains on the CRAY SV1 vector computer], Algorytmy, metody i programy naukowe, Polskie Towarzystwo Informatyczne, Lublin 2004, pp. 19-24 (in Polish).

[7] R. Fletcher: Conjugate gradient methods for indefinite systems, Lecture Notes in Mathematics 506, Springer-Verlag, Berlin Heidelberg 1976, pp. 73-89.

[8] W.K. Grassmann, M.I. Taskar, D.P. Heyman: Regenerative analysis and steady state distributions for Markov chains, Operations Research 33 (5), 1985, pp. 1107-1116.

[9] W. Knottenbelt, P. Harrison: Distributed disk-based solution techniques for large Markov models, Numerical Solution of Markov Chains, Zaragoza, Spain, 1999, pp. 58-75.

[10] Y. Saad: Krylov subspace methods for solving large unsymmetric linear systems, Mathematics of Computation 37, 1981, pp. 105-126.

[11] Y. Saad, M.H. Schultz: GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing 7, 1986, pp. 856-869.

[12] P. Sonneveld: CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing 10 (1), 1989, pp. 36-52.

[13] W. Stewart: Introduction to the Numerical Solution of Markov Chains, Princeton University Press, Princeton, NJ, 1994.

[14] Z. Szczerbiński: Parallel computing applied to solving large Markov chains. A feasibility study, Studia Informatica, Vol. 24, Number 4 (56), 2003, pp. 7-28.

[15] D.M. Young: Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Analysis of GI^X/M/s/c systems via uniformisation and stochastic ordering

FÁTIMA FERREIRA (UTAD, Department of Mathematics and CEMAT, Vila Real, Portugal)
ANTÓNIO PACHECO (IST UTL, Department of Mathematics, CEMAT and CLC, Lisboa, Portugal)

Abstract: We study multi-server queueing systems with general renewal customer batch arrival process, exponentially distributed service times and finite or infinite buffer size, called GI^X/M/s/c systems, with s denoting the number of servers and c the buffer size. We address the time-dependent and steady-state analysis of the number of customers in the system at prearrivals of batches and seen by customers at their arrival to the system, as well as the waiting times in queue and blocking probabilities of the latter. These results are then used to derive the continuous time steady-state distribution of the number of customers in the system. For the analysis we combine Markov chain embedding with uniformisation, and use stochastic ordering as a way to bound the errors of the computed performance measures of the system.

Keywords: Embedding, GI^X/M/s/c queues, Markov chains, mixed-Poisson probabilities, stochastically monotone matrices, stochastic ordering, uniformisation.

1. Introduction

In this paper we consider GI^X/M/s/c systems, i.e., queueing systems with s servers working in parallel and finite or infinite buffer size c, at which customers arrive in batches, with independent and identically distributed (i.i.d.) sizes, according to a general renewal process, and whose services have i.i.d. exponentially distributed durations. Moreover, the batch sizes, the interarrival times and the service times are independent. We address the model with partial blocking, i.e., whenever upon arrival a batch does not find enough space in the buffer to fully accommodate the customers of the batch, the buffer is filled up with the customers in the front part of the batch and the remaining customers of the batch, for whom there is no space available in the buffer, are blocked. Our analysis combines the Markov chain embedding technique with uniformisation and stochastic ordering, and we derive time-dependent as well as steady-state results. We address: the number of customers in the system in continuous time and at prearrivals (i.e., immediately before the arrival of batches) and the number of customers seen in the system by customers at their arrival to the system, along with their waiting times in queue

and blocking probabilities. The Markov chain embedding is used in the analysis of the number of customers in the system at prearrivals of batches, and the characterization of its one-step transition probabilities is based on the uniformisation of the pure-death process associated to the number of customers in the system in-between two consecutive batch arrival epochs. As, in general, this approach does not lead to an exact computation of the transition probability matrix of the number of customers in the system at prearrivals, we bound it in the Kalmykov ordering sense between two other stochastic matrices which can be computed exactly. Our approach allows, in particular, for the derivation of the customer waiting time in queue distribution and blocking probability, as well as the steady-state distribution of the number of customers in the system in continuous time and at prearrivals of batches. This fact may then be used to derive lower and upper bounds for performance measures of the system, such as the effective traffic load and the throughput of the system, the steady-state mean and variance of the number of customers seen in the system by a customer at his arrival to the system, the steady-state customer blocking probability and the steady-state mean and variance of the waiting time in queue of non-blocked customers.

We note that the results derived could be used in an easy way to obtain results for other processes of interest associated to GI^X/M/s/c systems, like the number of customers seen in the system by the last customer of a batch at his arrival to the system, and the number of customers in the system at postarrivals (i.e., immediately after the arrival of a batch of customers). Similarly, results for other blocking rules, aside from partial blocking, could also be derived in a similar way.

We next give a brief outline of the paper. In Section 2. we present some notes on related literature. In Section 3.
we give the time-dependent analysis of the number of customers in the system at prearrivals and seen by a customer at his arrival to the system, along with his waiting time in queue. In Section 4. we characterize the steady-state distributions of the number of customers in the system in continuous time and seen by a customer at his arrival to the system, along with his waiting time in queue, starting from the steady-state distribution of the number of customers in the system at batch prearrivals. The computation of the latter distribution is addressed in Section 5. and is based on the combination of uniformisation and stochastic ordering. In Section 6. we derive bounds for system related distributions and performance measures. Finally, in Section 7. we illustrate the results derived in the paper.

2. Some notes on related literature

In this section we briefly review some of the literature on queueing systems with batch arrivals, with emphasis on GI^X/M/s/c systems, and on the main techniques used in

the paper, namely: Markov chain embedding, uniformisation and stochastic ordering.

The use of embedded discrete time Markov chains, introduced by D. Kendall [29, 30], is one of the most common procedures to study non-Markovian queues with exponential inter-arrival or service times and takes advantage of the Markovian structure that these queues have at certain special instants [33, 51]. The Markov chain embedding approach plays an important role in the study of semi-Markov and Markov regenerative systems and can be used to investigate continuous time processes that are hard to study directly [33]. Additionally, this technique usually provides direct insights into the underlying process; e.g., in queues with exponential services the chain embedded immediately before arrival epochs to the system characterizes the limit distribution of the number of customers in the system in continuous time, along with the steady-state customer blocking probability.

The uniformisation technique of A. Jensen [28], also known as Jensen's method or randomization, is a powerful numerical technique to analyze continuous time Markov chains (CTMCs) with bounded transition rates out of states. Essentially, the method uniformises the rates at which transitions out from each state of the CTMC take place by introducing self-transitions. Making this, uniformisation shifts the problem of computing the transition probabilities of the CTMC to the computation of the transition probabilities of the discrete time Markov chain (DTMC) embedded at the transition epochs of the uniformised CTMC, along with the computation of Poisson probabilities with associated rate equal to the uniformisation rate, cf. [23, 26, 33].
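As a small illustration of uniformisation (the 2-state generator matrix is our own example; the uniformisation rate is taken as max_i |q_ii| and the Poisson series is truncated at K terms):

```python
import numpy as np

# An illustrative 2-state generator matrix (rows sum to zero).
Q = np.array([[-2.0, 2.0],
              [ 1.0, -1.0]])

def transient(Q, p0, t, K=200):
    """Transient distribution p(t) = p0 exp(Qt) by uniformisation:
    p(t) = sum_k e^{-Lt} (Lt)^k / k! * p0 P_u^k, with P_u = I + Q/L stochastic."""
    L = np.abs(np.diag(Q)).max()          # uniformisation rate
    Pu = np.eye(Q.shape[0]) + Q / L       # uniformised DTMC (with self-transitions)
    w = np.exp(-L * t)                    # Poisson weight for k = 0
    v = p0.astype(float)
    out = w * v
    for k in range(1, K + 1):
        v = v @ Pu                        # p0 P_u^k: additions and multiplications only
        w *= L * t / k                    # next Poisson probability
        out += w * v
    return out
```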
The rapid dissemination of the uniformisation method to the applications is mainly due to its simplicity, its excellent numerical stability, as it leads to recurrence relations involving only additions and multiplications of nonnegative numbers, allowing one to obtain results with an arbitrary pre-established accuracy, and to its having a probabilistic interpretation. The most frequent use of uniformisation is in the computation of time-dependent measures of CTMCs, e.g., transition probabilities [33, 42], accumulated costs/rewards on finite time intervals [5, 43, 47], and first passage time distributions [35]. Moreover, it is also used in probabilistic model checking, cf. [3, 34, 35], and, furthermore, if a CTMC is ergodic, then its stationary distribution is equal to the stationary distribution of the corresponding embedded uniformised DTMC ([1], p. 55).

Stochastic ordering is an area with a rich history of applications, as well illustrated, e.g., in the monographs of Mosler and Scarsini [39] and Shaked and Shanthikumar [44]. It has broad applications in queueing, cf. Müller and Stoyan [40] and Szekli [49], and it is used to study internal changes of performance due to parameter variations, to compare distinct systems, to approximate a system by a simpler one, and to obtain upper and lower bounds for the main performance measures of systems.

In this work we will consider GI^X/M/s/c queueing systems, which are batch arrival queues with exponential services. The work of Bailey [4] on batch service queues

is pointed out as the pioneering work on batch queues, with explicit references to batch arrival queues appearing a few years later [21]. Brockmeyer et al. [7] argue that, taking into account the equivalence relation that exists between the distribution of the number of customers in M^k/M/s queues and the number of phases in M/E_k/s queues at the same instants, one may relate the first solutions of batch arrival queues to Erlang's solution of the M/E_k/1 system, since this gives implicitly the solution of the M/M/1 system with deterministic size batch arrivals. Nowadays, there exists an extensive literature concerned with batch arrival queues, with special emphasis on single server systems and queues with exponential or Erlang inter-arrival and/or service times (see, e.g., [2, 10, 13, 14, 15, 18, 19, 36]). The case of individual arrivals has been treated in a fairly exhaustive way for queues with general renewal arrival process, i.e., GI/M/s/c systems. Several results for this case can be found both in monographs [25, 51] and in papers, e.g., [11, 12, 45, 46, 50]; however, fewer works treat the case of batch arrivals. Since the most relevant papers on batch arrivals prior to 1983 are discussed by Chaudhry and Templeton [10], we will review in the next paragraphs only the following selected works [6, 9, 16, 17, 18, 19, 24, 27, 31, 36, 37, 38, 41, 52, 53], posterior to 1982, that analyze GI^X/M/s/c systems with batch arrivals and finite or infinite number of servers and buffer size, starting with infinite buffer size systems.

Easton et al. [16, 17] established relations between the steady-state probabilities of the number of customers in the GI^X/M/1 system at random, prearrival and postdeparture instants, developed some expected value formulas for the system and analyzed them computationally. Brière and Chaudhry [6] studied the steady-state distribution of the number of customers in the GI^X/M/1 system with bounded batch size, l, through a method that involves the evaluation of the l roots of the characteristic equation of the model inside the unit circle, and provided numerical results, including plots of the roots inside the unit circle and tables of prearrival and post-departure probability distributions of the number of customers in the system. Later, Chaudhry et al. [9] developed a computational method to find the steady-state distribution of the number of customers in the GI^X/M/1 system seen by the first or a randomly selected customer of a batch, and the distribution of the virtual waiting time. More recently, Economou [18] proposed geometric and modified geometric distributions as upper and lower stochastic bounds for the steady-state distribution of the number of customers in GI^X/M/1 systems. These results were later generalized to GI^X/M^Y/1 systems by Economou and Fakinos [19], who obtained, using complex function theory, an asymptotic result for the above-mentioned distribution. These authors combined the embedding Markov chain approach with a technique that consists in modifying slightly a non-tractable system to obtain a system with a product form distribution, and then utilized that distribution to obtain stochastic bounds for quantities of

interest of the original system. The GI^X/M/1 system has also been used to study other systems. In particular, van Ommeren and Nobel [41] showed how to study the GI/C_2/1 system, in which the service times have a Coxian-2 distribution, as a special batch arrival GI^X/M/1 system and, by analyzing the GI^X/M/1 system via the embedded Markov chain approach, they obtained the stationary waiting-time distribution for the GI/C_2/1 queue. Kijima [31] used the Markov chain embedded at prearrival epochs of the GI^X/M/1 system to study the relaxation time of a stable GI/G/1 system.

Systems with an infinite number of servers have been addressed in [27, 37, 38]. The exponential customer service case, i.e., the GI^X/M/∞ system, was studied by Holman et al. [27], who addressed the time-dependent and steady-state distributions of the number of customers in the system at prearrivals, relating the latter distribution with the steady-state distribution of the number of customers in the system in continuous time. The general customer service case, i.e., the GI^X/G/∞ system, was addressed by Liu et al. [37], who used a shot noise process to derive an implicit expression for the probability-generating function of the number of busy servers, and obtained the first two moments of the number of customers in the system at prearrivals of batches. Liu et al. [38] addressed the system with exponential constant batch size service, i.e., the GI^X/M^k/∞ system, using the Markov chain embedding approach at prearrivals of batches to obtain a recursive relation expressing the r-th binomial moment of the joint distribution of the number of busy servers and the number of waiting customers in the system in terms of lower binomial moments and system parameters.
As concerns the finite multiple server case, Grassmann and Chaudhry [24] developed a stable numerical method to solve for steady-state probabilities and performance measures of GI^X/M/s, M^X/M^Y/s and discrete GI/G/1 queues that avoids the use of complex arithmetic and matrix manipulations. Later, Yao et al. [52] derived closed-form relations for the probabilities and performance measures observed at random arrival/departure epochs in GI^X/M/s systems. More recently, Zhao [53] proposed, based on generating functions and the embedded Markov chain approach, a numerical procedure to compute the steady-state probabilities at prearrivals in GI^X/M/s systems with finite batch sizes, expressed as a linear combination of specific geometric terms.

For the finite buffer case, Laxmi and Gupta [36] used the embedded Markov chain technique to obtain the steady-state distribution of the number of customers in GI^X/M/s/c systems, with partial and total batch rejections, at prearrivals of batches. Furthermore, using the supplementary variable technique, they obtained the corresponding distribution at arbitrary epochs by relating it to the prearrival distribution, and derived blocking probabilities and waiting time measures associated with the first, an arbitrary, and the last customer of a batch. For the same system, Kim and Choi [32] expressed the asymptotic behaviour of the loss probability of the GI^X/M/s/c queue, as c tends to infinity, in terms of the roots of the characteristic equation and the boundary probabilities of the corresponding queue with infinite buffer. Numerical examples showed that the proposed asymptotic loss probabilities are quite accurate approximations of the exact loss probabilities, even for systems with moderate capacity.

3. Time-dependent analysis

We start the section with some notation and conventions that are used in the paper to address GI^X/M/s/c systems. We let N denote the set of nonnegative integer numbers and, for a subset E of N, we let E_+ = E \ {0} so that, in particular, N_+ denotes the set of natural numbers. Moreover, we let R_0 = [0, ∞) denote the set of nonnegative real numbers and, by a stochastic process W, we will mean (W_k)_{k∈N_+} if W has parameter space N_+, and (W(t))_{t∈R_0} if W has parameter space R_0. Furthermore, given a discrete random variable W taking values in E_W, we let p_W(·) denote the probability function of W and F_W(·) its distribution function, i.e., p_W(x) = P(W = x) and F_W(x) = P(W ≤ x), for x ∈ R. Moreover, given a probability function {c_n, n ∈ E} on a countable ordered set E, we let C_n = Σ_{k∈E: k≤n} c_k and C̄_n = Σ_{k∈E: k≥n} c_k, for n ∈ E.

To simplify the wording, unless further specified, by state we will always mean the number of customers in the GI^X/M/s/c system. We note that we consider the GI^X/M/s/c model with partial blocking, i.e., whenever a batch of size n arrives that finds the system in state i (i.e., finds i customers in the system), the first min(n, c − i) customers of the batch enter the system and the remaining max(0, i + n − c) customers, if any, are blocked. In other words, if the buffer can accommodate all customers (which is necessarily the case if the buffer has infinite size) no customers are blocked, whereas if it cannot accommodate all customers of a batch then the buffer is filled up at the arrival of the batch.
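The partial-blocking rule just described can be sketched as a short Python helper (our own illustration, not code from the paper; the function name and parameters are ours):

```python
def admit_batch(i, n, c):
    """Partial blocking in a GI^X/M/s/c queue: a batch of size n arrives
    while i customers are present and at most c customers fit in the system.
    Returns (customers admitted, customers blocked)."""
    admitted = min(n, c - i)      # the first min(n, c - i) customers enter
    blocked = max(0, i + n - c)   # the rest, if any, are lost
    return admitted, blocked

# A batch of 5 arriving at a system with 8 of 10 places occupied:
print(admit_batch(8, 5, 10))  # (2, 3)
```

Note that admitted + blocked always equals the batch size n, and that with an infinite buffer (c = ∞) the blocked count is always zero.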
We let {b_n, n ∈ N_+} denote the probability function of the batch size, S_X ⊆ N_+ its support, and b̄ its mean; A(·) denote the batch interarrival time distribution function and 1/λ its mean; and μ denote the customer service rate. Note that, in particular, b_n > 0 if and only if n ∈ S_X. Moreover, as the s servers work in parallel, whenever there are i customers in the system the aggregated service rate is μ_i = min(i, s)μ, and the offered traffic intensity of the system is ρ = λb̄/(sμ).

In addition, we let X denote the batch size process, Y denote the (continuous-time) state process, Ỹ denote the state process as seen by customers (at their arrival to the system), W denote the customer waiting time (in queue) process, and Ŷ denote the batch prearrival state process (i.e., the state process at the prearrival of customer batches). That is, X_k is the size of the k-th batch, Y(t) is the state at time t, Ỹ_l is the state seen by the l-th customer at his arrival to the system, W_l is the waiting time in queue of the l-th customer, with this being zero if the customer is blocked, and Ŷ_k is the state at the

prearrival of the k-th batch. Thus, if we let T_k denote the arrival epoch of the k-th batch, then Ŷ_k = Y(T_k^−).

In this section we will concentrate on the time-dependent analysis of the state of the system as seen by customers at their arrivals to the system and at the prearrival of customer batches. We start by stating some facts concerning the state process Y which are useful to state afterwards the main result of the section, relative to the process Ŷ, the batch prearrival state process.

The process Y is a Markov regenerative process (MRGP, see [33]) associated to the renewal sequence (T_k)_{k∈N_+} of customer batch arrival epochs, and it is a CTMC if and only if the batch arrival process is Poisson, that is, if the arrival process of customers is a compound Poisson process. Thus, information on the state process Y may be obtained from the analysis of the DTMC Ŷ, the batch prearrival state process, associated to the Markov renewal sequence (Ŷ_k, T_k)_{k∈N_+}. As the service times of the different customers are i.i.d. exponentially distributed random variables, in-between two consecutive batch arrivals the state process Y evolves as a pure-death process with state space S = {n ∈ N : n ≤ c} and death rates {μ_i = min(s, i)μ, i ∈ S}. Thus,

{Y(T_k + t), 0 ≤ t < T_{k+1} − T_k | Y(T_k) = i} =_st {D(t), 0 ≤ t < T_{k+1} − T_k | D(0) = i}   (1)

for i ∈ S, where =_st denotes equality in distribution and D = (D(t))_{t∈R_0} is a pure-death process with state space N and death rates {μ_i}, independent of the batch renewal arrival process. Let π_ij(t), i, j ∈ N, denote the transition probability of D from state i to state j in t units of time and, with a slight abuse of notation, let

π_ij(A) = ∫_{R_0} π_ij(t) A(dt) = ∫_{R_0} P(D(t) = j | D(0) = i) A(dt)   (2)

denote the transition probability of D from state i to state j in the random batch interarrival time. Then, we have the following result concerning the batch prearrival state process.

Theorem 1 The batch prearrival state process Ŷ is a DTMC with state space S and one-step transition probabilities

p̂_ij = Σ_{n∈S_X} b_n π_{min(c,i+n) j}(A).   (3)

Moreover, Ŷ is irreducible and aperiodic, being ergodic if c < ∞ or ρ < 1.

Proof The sequences of batch sizes and batch interarrival times are independent sequences of i.i.d. random variables and, moreover, X_k and T_{k+1} − T_k are independent of Ŷ_k, for all k. Thus, using the fact that the batch size distribution has support S_X and probability function {b_n, n ∈ N_+}, it follows that the one-step transition probabilities of Ŷ are given by (3).

The DTMC Ŷ is irreducible and aperiodic since, as the probabilities π_kl(A), k ≥ l, and the probability b_{n_0}, with n_0 = inf S_X, are strictly positive, the one-step transition probability matrix of Ŷ is of the form P̂ = [p̂_ij]_{i,j∈S} with p̂_ij > 0 for j ≤ min(c, i + n_0). Therefore, Ŷ is ergodic if c < ∞.

It remains to show that Ŷ is ergodic if c = ∞ and ρ < 1; in this respect see, e.g., [51] for the case of single arrivals. To prove the result for the case of batch arrivals, suppose that c = ∞ and ρ = λb̄/(sμ) < 1 and, for i ∈ N, let

Â_i(sμ) = ∫_{R_0} e^{−sμt} (sμt)^i / i! A(dt)   (4)

which is the i-th mixed Poisson probability (with structural distribution A(·)). As π_kl(A) = Â_{k−l}(sμ) for k ≥ l ≥ s and, from the properties of the mixed Poisson probabilities,

Σ_{n∈N_+} n Â_n(sμ) = sμ/λ = b̄/ρ,

we have, taking i ≥ s,

lim sup_{i→∞} E[Ŷ_{k+1} − i | Ŷ_k = i] ≤ lim sup_{i→∞} ( Σ_{m=1}^{∞} m b_m − Σ_{n=1}^{i−s} n Â_n(sμ) ) = b̄(1 − ρ^{−1}) < 0.

Moreover, E[Ŷ_{k+1} − i | Ŷ_k = i] ≤ b̄, for all i ∈ S. Thus, using Pakes's lemma [see, e.g., Kulkarni ([33], Corollary 3.3)], we conclude that Ŷ is ergodic.

The results of the previous theorem may be used as support for the derivation of results relative to the analysis of the GI^X/M/s/c system from the customer perspective as well as in continuous time, as illustrated in the paper. By a reasoning similar to the one used in the proof of Theorem 1, and the fact that the batch size distribution has support S_X and probability function {b_n, n ∈ N_+}, we have the following result relative to the state at the prearrival of batches along with the batch sizes.

Corollary 2 The process (Ŷ, X) is an irreducible and aperiodic DTMC with state space S × S_X and transition probabilities

p̂_{(i,n)(j,m)} = π_{min(c,i+n) j}(A) b_m.   (5)
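As an informal check of the structure behind Theorem 1, the batch prearrival chain can be simulated directly. The following Python sketch is our own illustration, with arbitrary hypothetical parameters (Erlang-2 interarrival times, batches of size 1 or 2, s = 2, c = 5, μ = 1); it estimates the empirical distribution of the state found by arriving batches.

```python
import random

def simulate_prearrival(num_batches=20000, s=2, c=5, mu=1.0, seed=1):
    """Simulate a GI^X/M/s/c queue at batch prearrival epochs.

    Between batch arrivals the state evolves as a pure-death process with
    rates min(i, s)*mu; an arriving batch of size n moves state i to
    min(c, i + n) (partial blocking).  Returns the empirical distribution
    of the state seen by arriving batches."""
    rng = random.Random(seed)
    counts = [0] * (c + 1)
    state = 0
    for _ in range(num_batches):
        # Erlang(2) interarrival time: sum of two exponentials of rate 2
        gap = rng.expovariate(2.0) + rng.expovariate(2.0)
        # run the pure-death process over the interarrival gap
        t = 0.0
        while state > 0:
            t += rng.expovariate(min(state, s) * mu)
            if t > gap:
                break   # next death would occur after the batch arrival
            state -= 1
        counts[state] += 1            # record the state at batch prearrival
        n = rng.choice([1, 2])        # batch size: 1 or 2, equally likely
        state = min(c, state + n)     # partial blocking
    total = sum(counts)
    return [k / total for k in counts]

pi_hat = simulate_prearrival()
print([round(x, 3) for x in pi_hat])
```

Such a simulation can be used as a sanity check against the exact computation of the prearrival distribution developed in Section 5.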

We will end the section with the derivation of a result that is useful to characterize the state seen by customers at their arrival to the system, Ỹ = (Ỹ_l)_{l∈N_+}, where Ỹ_l equals the number of customers that the l-th customer sees in the system at his arrival, after removal of the blocked customers of the batch he belongs to that arrive in front of him in the batch. For that, it is convenient to let G denote the customer batch-index process (i.e., G_l is the index of the batch the l-th customer belongs to) and I denote the customer position process (i.e., I_l denotes the position of the l-th customer in his batch), with the positions of the customers in a batch taking successively the values from one up to the size of the batch, starting from the customer at the front of the batch and ending at the one at the rear. Thus, Ŷ_{G_l} is the state of the system at the prearrival of the l-th customer's batch. Using the convention that empty sums take the value zero, it follows that l = Σ_{k=1}^{G_l − 1} X_k + I_l and, moreover,

Ỹ_l = min(Ŷ_{G_l} + I_l − 1, c)   (6)

for l ∈ N_+. Note that I_l − 1 is the number of customers from the batch of the l-th customer, G_l, that are in front of him and, thus, Ŷ_{G_l} + I_l − 1 would be the number of customers seen in the system by the l-th customer at his arrival if no customers in front of him in his batch had been blocked; otherwise, the l-th customer sees the buffer full at his arrival to the system.

In view of (1) and the fact that the batch sizes are i.i.d. random variables with support S_X and probability function {b_n, n ∈ N_+}, we have the following result relative to the process (Ŷ_G, I) = (Ŷ_{G_l}, I_l)_{l∈N_+}.

Theorem 3 The process (Ŷ_G, I) is an irreducible DTMC with state space E = {(i, k) ∈ S × N_+ : k ≤ sup S_X} and one-step transition probabilities

p_{(i,j)(k,l)} = 1 − q_j, if i = k and l = j + 1; q_j π_{min(c,i+j) k}(A), if l = 1; and 0, otherwise,   (7)

where q = (q_j) denotes the batch size hazard rate function, i.e.,

q_j = P(X_1 = j) / P(X_1 ≥ j) = b_j / Σ_{m≥j} b_m = 1 − B̄_{j+1}/B̄_j,   1 ≤ j ≤ sup S_X.

The previous result establishes that (Ŷ_G, I) is a DTMC; however, except for the case of single customer arrivals, the state process at customer arrivals, Ỹ, is not a DTMC. Nevertheless, in view of (6), results for the processes Ỹ and W, the state seen by customers at their arrival to the system and the customer waiting time process, may be derived from the bivariate process (Ŷ_G, I), as is the case for their univariate marginal distributions.

Corollary 4 The number of customers seen in the system by the l-th customer at his arrival to the system, Ỹ_l, l ∈ N_+, has probability function

p_{Ỹ_l}(j) = Σ_{i=[j+1−sup S_X]^+}^{j} p_{(Ŷ_{G_l}, I_l)}(i, j − i + 1), for j < c, and
p_{Ỹ_l}(c) = Σ_{i∈S} Σ_{k≥c−i+1} p_{(Ŷ_{G_l}, I_l)}(i, k), for j = c < ∞,   (8)

where p_{(Ŷ_{G_l}, I_l)}(k, m), (k, m) ∈ E, is the marginal probability function of the DTMC (Ŷ_G, I), characterized in Theorem 3, at time l. Moreover, the waiting time in queue of the l-th customer, W_l, has survival function

P(W_l > t) = Σ_{m=0}^{c−s−1} e^{−sμt} ((sμt)^m / m!) Σ_{i=s+m}^{c−1} p_{Ỹ_l}(i)   (9)

for t > 0. Thus, the conditional survival function of the waiting time of the l-th customer, given that the customer is not blocked, is

P(W_l > t | Ỹ_l < c) = Σ_{m=0}^{c−s−1} e^{−sμt} ((sμt)^m / m!) (1 − F_{Ỹ_l}(s + m − 1) / F_{Ỹ_l}(c − 1)).   (10)

Proof The relation (8) follows directly from (6) and the total probability law.

Let l ∈ N_+, t > 0 and j ∈ S. The l-th customer does not wait in queue in case he finds in the system fewer than s customers at his arrival, i.e., Ỹ_l < s, or if he is blocked, i.e., Ỹ_l = c. Moreover, if the l-th customer finds in the system j customers at his arrival, with s ≤ j < c, then his waiting time in queue has Erlang(j − s + 1, sμ) distribution, i.e.,

P(W_l > t | Ỹ_l = j) = Σ_{m=0}^{j−s} e^{−sμt} (sμt)^m / m!.

Then, the relation (9) follows from the previous equality and the total probability law. Finally, since P(W_l > t | Ỹ_l < c) = P(W_l > t) / P(Ỹ_l < c), (10) follows directly from (9).

We note that formulae similar to (9)-(10) could be obtained for the k-th batch prearrival virtual waiting time in queue (i.e., the waiting time in queue of the first customer of the k-th batch) by substituting the probability masses p_{Ỹ_l}(i) by p_{Ŷ_k}(i).
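Formulas (9)-(10) are straightforward to evaluate numerically. The sketch below is our own illustration (the names p_state, s, mu and c are ours): it computes the waiting-time survival function from a given distribution of the state seen at arrival.

```python
import math

def waiting_time_survival(p_state, s, mu, c, t):
    """P(W > t) via formula (9): a customer who finds j customers,
    s <= j < c, waits an Erlang(j - s + 1, s*mu) time; customers finding
    fewer than s customers, or a full buffer, do not wait."""
    total = 0.0
    for m in range(c - s):  # m = 0, ..., c - s - 1
        poisson_term = math.exp(-s * mu * t) * (s * mu * t) ** m / math.factorial(m)
        total += poisson_term * sum(p_state[i] for i in range(s + m, c))
    return total

def waiting_time_survival_unblocked(p_state, s, mu, c, t):
    """Conditional version (10): divide by the probability of not being blocked."""
    return waiting_time_survival(p_state, s, mu, c, t) / sum(p_state[:c])

# Example with hypothetical numbers: s = 2, c = 5, mu = 1;
# p[j] = P(state seen at arrival = j), j = 0, ..., c.
p = [0.3, 0.25, 0.2, 0.15, 0.07, 0.03]
print(round(waiting_time_survival(p, 2, 1.0, 5, 0.5), 4))
```

At t = 0 the first function returns P(s ≤ Ỹ < c), the probability of a strictly positive wait, which provides a quick consistency check of an implementation.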

4. Steady-state, long-run and limit analysis

In this section we will use the relation of the studied processes to the batch prearrival state process, and the characterization of the latter process obtained in Section 3 (namely, Theorem 1), to derive steady-state, long-run and limit results for the former processes. With this in mind, and in view of Theorem 1, in the case c < ∞ or ρ < 1 we let π̂ = (π̂_i)_{i∈S} denote the steady-state distribution of Ŷ. We note that π̂_i, aside from being the steady-state probability of state i of Ŷ, also denotes the limit probability of state i and the long-run and expected long-run fraction of time the process Ŷ spends in state i. The following result is useful to obtain steady-state, long-run and limit results for the state seen by customers at their arrival to the system.

Theorem 5 If c < ∞ or ρ < 1, then:

(i) The DTMC (Ŷ, X) is ergodic and has steady-state distribution

π_{(i,n)} = π̂_i b_n,   (i, n) ∈ S × S_X.   (11)

(ii) The DTMC (Ŷ_G, I) is positive recurrent and has steady-state distribution

π_{(i,k)} = π̂_i B̄_k / b̄ = (π̂_i / b̄) Π_{j=1}^{k−1} (1 − q_j),   (i, k) ∈ E.   (12)

Furthermore, if the batch size distribution is aperiodic, then the DTMC is ergodic.

Proof Suppose that c < ∞ or ρ < 1 and let π_{(i,n)} and π_{(i,k)} be given by (11)-(12).

(i). Recall that, in view of Corollary 2, (Ŷ, X) is an irreducible and aperiodic DTMC with state space S × S_X and transition probabilities (5). Thus, it suffices to show that π_{(i,n)} = π̂_i b_n, (i, n) ∈ S × S_X, is a stationary distribution of (Ŷ, X). This fact follows readily as, in view of equation (3) of Theorem 1,

Σ_{(j,m)∈S×S_X} π_{(j,m)} p̂_{(j,m)(i,n)} = Σ_{j∈S} π̂_j ( Σ_{m∈S_X} b_m π_{min(c,j+m) i}(A) ) b_n = Σ_{j∈S} π̂_j p̂_{ji} b_n = π̂_i b_n = π_{(i,n)}

for (i, n) ∈ S × S_X, and Σ_{(i,n)∈S×S_X} π_{(i,n)} = Σ_{i∈S} π̂_i Σ_{n∈S_X} b_n = 1.

(ii). We first show that π_{(i,k)} = π̂_i B̄_k / b̄, (i, k) ∈ E, is a stationary distribution of (Ŷ_G, I). This follows as, in view of equation (3) of Theorem 1 and equation (7) of Theorem 3,

π_{(i,k)} = (π̂_i / b̄) Π_{j=1}^{k−1} (1 − q_j) = π_{(i,k−1)} p_{(i,k−1)(i,k)} = Σ_{(l,m)∈E} π_{(l,m)} p_{(l,m)(i,k)}

for 1 < k ≤ sup S_X, while

Σ_{(l,m)∈E} π_{(l,m)} p_{(l,m)(i,1)} = Σ_{(l,m)∈E} (π̂_l / b̄) ( Π_{n=1}^{m−1} (1 − q_n) ) q_m π_{min(c,l+m) i}(A) = (1/b̄) Σ_{(l,m)∈E} π̂_l b_m π_{min(c,l+m) i}(A) = π̂_i / b̄ = π_{(i,1)}

since, from (3),

Σ_{(l,m)∈E} π̂_l b_m π_{min(c,l+m) i}(A) = Σ_{l∈S} π̂_l Σ_{m∈S_X} b_m π_{min(c,l+m) i}(A) = Σ_{l∈S} π̂_l p̂_{li} = π̂_i.

Thus, as, in view of Theorem 3, the DTMC (Ŷ_G, I) is irreducible, it follows that it is positive recurrent.

Since, for i ∈ S, it is possible to return to the state (i, 1) in m steps, for all m ∈ S_X, we conclude that if, in addition, the distribution of the batch size is aperiodic, the DTMC (Ŷ_G, I) is aperiodic and therefore ergodic.

In order to derive results for the long-run and limit behaviors of the state seen by customers at their arrival to the system, we recall that the long-run fraction and long-run expected fraction of customers that see j customers in the system at their arrival are, respectively,

lim_{l→+∞} (1/l) Σ_{m=1}^{l} U_{mj}   and   lim_{l→∞} E[ (1/l) Σ_{m=1}^{l} U_{mj} ] = lim_{l→+∞} (1/l) Σ_{m=1}^{l} p_{Ỹ_m}(j),

where U_{mj} = 1_{{Ỹ_m = j}}.

Corollary 6 If c < ∞ or ρ < 1, then:

(i) The long-run and long-run expected fraction of customers that see at their arrival j customers in the system are equal and given by

θ_j = Σ_{i=0}^{j} π_{(i, j−i+1)} = Σ_{i=0}^{j} π̂_i B̄_{j−i+1} / b̄   (13)

for 0 ≤ j < c, and θ_c = 1 − Σ_{i=0}^{c−1} θ_i.

(ii) If the batch size distribution is aperiodic, then θ_j coincides with the limit probability that a customer sees j customers in the system at his arrival, i.e.,

lim_{l→+∞} P(Ỹ_l = j) = θ_j,   j ∈ S.   (14)

(iii) If the batch size distribution is aperiodic, then the waiting time in queue has limit survival function

lim_{l→∞} P(W_l > t) = Σ_{m=0}^{c−s−1} e^{−sμt} ((sμt)^m / m!) Σ_{i=s+m}^{c−1} θ_i   (15)

for t > 0, and for the finite buffer size case (c < ∞) the waiting time in queue of non-blocked customers has limit survival function

lim_{l→∞} P(W_l > t | Ỹ_l < c) = Σ_{m=0}^{c−s−1} e^{−sμt} ((sμt)^m / m!) (1 − Θ_{s+m−1} / Θ_{c−1})   (16)

where Θ_j = Σ_{i=0}^{j} θ_i, for j ∈ S.

(iv) In any case (even if the batch size is periodic), (15) is the steady-state survival function (with respect to (Ŷ_G, I)) of the customer waiting time in queue and (16) is the steady-state survival function of the waiting time in queue of a non-blocked customer.

Proof Suppose that c < ∞ or ρ < 1. Then, in view of Theorem 5(ii), (Ŷ_G, I) is positive recurrent and has stationary distribution {π_{(i,k)}, (i, k) ∈ E} given by (12).

(i). The random variables U_{lj} are a bounded function of the positive recurrent DTMC (Ŷ_G, I), namely

U_{lj} = 1_{{Ŷ_{G_l} + I_l − 1 = j}} for 0 ≤ j < c, and U_{lc} = 1_{{Ŷ_{G_l} + I_l − 1 ≥ c}}.

Thus, in view of the ergodic theorems for DTMCs (see, e.g., Proposition 2.12.4 and Corollary 2.12.5 of Resnick [42]), it follows that

lim_{l→∞} (1/l) Σ_{m=1}^{l} U_{mj} = lim_{l→∞} E[ (1/l) Σ_{m=1}^{l} U_{mj} ] = Σ_{{(i,k)∈E: i+k−1=j}} π_{(i,k)} for j < c, and = Σ_{{(i,k)∈E: i+k−1≥c}} π_{(i,k)} for j = c,

as stated.

(ii). If the batch size is aperiodic, then it follows from Theorem 5(ii) that the DTMC (Ŷ_G, I) is ergodic and consequently has limit distribution coinciding with its stationary distribution {π_{(i,k)}, (i, k) ∈ E}, which implies the result in view of (i) and the relation (8) of Corollary 4.

(iii). The statement is a direct consequence of (ii) and (9)-(10).

(iv). The result is a direct consequence of Corollary 4 and Theorem 5(ii).

We end the section with the characterization of the limit distribution of the (continuous-time) state process and its relation to the limit distribution of the batch prearrival state process. Note that, for c < ∞ or ρ < 1, we may write the probability vector θ = (θ_i)_{i∈S} characterized in Corollary 6 as a function of the steady-state probability vector π̂ of the batch prearrival state process; namely, θ = π̂ H, where

h_ij = 1_{{i≤j}} B̄_{j−i+1}/b̄ for j < c, and h_ic = 1 − Σ_{l=1}^{c−i} B̄_l / b̄ for j = c < ∞.   (17)

Moreover, it is useful to let η_jk denote the expected sojourn time of Y in state k in-between two consecutive batch arrivals, conditional on the state of the system at the prearrival of the first batch being j, i.e.,

η_jk = E[ ∫_{T_1}^{T_2} 1_{{Y(t)=k}} dt | Ŷ_1 = j ]   (18)

for j, k ∈ S, and to let η = [η_lm]_{l,m∈S}.

Theorem 7 Suppose that c < ∞ or ρ < 1. Then, Y has limit distribution with probability vector p = λ π̂ η, i.e.,

p_k = lim_{t→+∞} P(Y(t) = k) = λ Σ_{j∈S} π̂_j η_jk   (19)

for k ∈ S. Moreover,

p_k = (λ / (min(s, k)μ)) Σ_{j=0}^{k−1} π̂_j B̄_{k−j}   (20)

for all k ∈ S_+, and p_0 = 1 − Σ_{j=1}^{c} p_j.

Proof The process Y is a Markov regenerative process with state space S, right-continuous sample paths with left-hand limits, and associated embedded Markov renewal sequence (Ŷ_k, T_k)_{k∈N_+}. The process is irreducible, aperiodic and positive recurrent. As the mean batch interarrival time is 1/λ, independently of the initial state of the process, the limit probabilities (19) follow directly from Kulkarni ([33], Theorem 9.30).

In order to prove (20), let k ∈ S_+ and A_k = {0, 1, ..., k−1}. The balance equations for the long-run transition rates guarantee that, in the long run, the exit rate from A_k equals the corresponding entrance rate. The entrances into A_k occur exclusively when a customer leaves the system at a time at which the process Y is in state k. Thus, the long-run entrance rate into A_k is given by p_k γ_k, where

γ_k = lim_{t→+∞} ( ∫_0^t 1_{{Y(s^−)=k, Y(s)=k−1}} dM(s) ) / ( ∫_0^t 1_{{Y(s)=k}} ds )

denotes the long-run customer departure rate when the process Y is in state k and M denotes the counting process of customer departures from the system. When there are k customers in the system, the service rate is min(s, k)μ. From the memoryless property of the exponential distribution it follows that γ_k = min(s, k)μ and, therefore, the long-run entrance rate into A_k is p_k min(s, k)μ.

On the other side, the exits from A_k occur when an arrival of a customer batch with size greater than k − j − 1 takes place with the process Y in state j, j ∈ A_k, i.e., the long-run exit rate from A_k is λ Σ_{j=0}^{k−1} π̂_j B̄_{k−j}. By equating the long-run entrance and exit rates from A_k, we obtain

p_k = (λ / (min(s, k)μ)) Σ_{j=0}^{k−1} π̂_j B̄_{k−j}

for k ∈ S_+.

As equation (20) shows, the computation of p does not necessarily rely on the computation of the coefficients η_ij. In the next section we will look at numerical aspects related to the computation of the steady-state distribution of the batch prearrival state process.

5. Computations associated to the batch prearrival state process

In this section we start by addressing the computation of P̂ from (3) by uniformising the pure-death process D. Then, we will bound P̂ and, finally, we will derive implications in terms of bounds for π̂, the steady-state distribution of the batch prearrival state process, in the ergodic case (c < ∞ or ρ < 1).

5.1. Computation of P̂ based on the uniformisation of the pure-death process D
In practice, the computation of the probabilities p̂_ij relies on the computation of the probabilities π_ij(A) defined by (2). As the exit rates from the states of the pure-death process D are bounded by the departure rate when all servers are busy, sμ, these probabilities can be computed using the uniformisation method [23, 28].

Let Q = (q_ij)_{i,j∈S} denote the infinitesimal generator matrix of D, so that q_{i,i−1} = −q_{ii} = μ_i = min(s, i)μ, i ∈ S_+, with all other entries null. For the computation of the transition probabilities of Ŷ we consider the embedded uniformised DTMC associated to D, with uniformisation rate sμ, which has one-step transition probabilities P̃ = I + Q/(sμ). Thus, p̃_{i,i−1} = min(s, i)/s = 1 − p̃_{ii}, i ∈ S_+, p̃_{00} = 1, and p̃_ij = 0 otherwise. By conditioning on the number of renewals in the interval [T_1, T_2) of the uniformising Poisson process with rate sμ (independent of D) we obtain

π_ij(A) = Σ_{m∈N} Â_m(sμ) [P̃^m]_ij.   (21)

Here Â_m(sμ), defined in (4) and which to simplify the notation we will simply denote by α_m, is the m-th mixed Poisson probability with structural distribution A(·) and rate sμ, and represents the probability that exactly m renewals take place in the uniformising Poisson process, with rate sμ, between two consecutive batch arrivals to the system. Accordingly, taking into account (3),

P̂ = Σ_{n∈S_X} b_n Φ_n( Σ_{m∈N} α_m P̃^m )   (22)

where [Φ_n(V)]_ij = [V]_{min(c,i+n) j}, for V = [v_ij]_{i,j∈S}, and where, in the summations involving the probabilities b_n, it is understood that the respective sums are computed with the restriction of the index n to the set S_X. Note that the linear operators Φ_n, n ∈ N_+, transform stochastic matrices into stochastic matrices.

In the single server system, s = 1, the matrix P̃ is associated to a deterministic monotone decreasing DTMC, namely p̃_ij = δ_{j (i−1)^+}, where x^+ = max(x, 0) for x ∈ R and δ is the Kronecker delta function, i.e., δ_kl = 1 if k = l and δ_kl = 0 if k ≠ l. In this case, the one-step transition probabilities of the batch prearrival state process, Ŷ, are given by

p̂_ij = Σ_{n≥j−i} b_n α_{min(c,i+n)−j} for j > 0, and p̂_i0 = Σ_{n∈S_X} b_n (1 − Σ_{m<min(c,i+n)} α_m).   (23)

If, moreover, the system has finite capacity, c < +∞, the probabilities (23) can be computed exactly using only the first c mixed Poisson probabilities since, as

Σ_{m≥k} α_m = 1 − Σ_{m<k} α_m   and   Σ_{n≥k} b_n = 1 − Σ_{n<k} b_n

for k ∈ N_+, (23) is equivalent to

p̂_ij = Σ_{j−i≤n<c−i} b_n α_{i+n−j} + (1 − Σ_{n<c−i} b_n) α_{c−j} for j > 0, and
p̂_i0 = Σ_{n<c−i} b_n (1 − Σ_{m<i+n} α_m) + (1 − Σ_{n<c−i} b_n)(1 − Σ_{m<c} α_m).   (24)

Note that the mixed Poisson probabilities can usually be computed in a fast way [35].

In the general multiserver case, s > 1, the transition probability matrix P̃ already has a stochastic component and is not associated to a deterministic monotone decreasing DTMC as in the single server case. Effectively, in general the computation of the transition probability matrix P̂ from (22) can only be done by approximation, since the infinite sum Σ_m α_m P̃^m has to be truncated. Note that if we choose natural numbers N_1 and N_2 such that Σ_{n>N_1} b_n < ε_1 and Σ_{m>N_2} α_m < ε_2, then necessarily

‖ P̂ − Σ_{n≤N_1} b_n Φ_n( Σ_{m≤N_2} α_m P̃^m ) ‖ ≤ ε_1 + ε_2

for ‖·‖ = ‖·‖_1, ‖·‖_∞, with ‖·‖_1 and ‖·‖_∞ denoting the norms L_1 and L_∞, respectively. Thus, P̂ can be computed with fixed a priori arbitrary precision. However, this result does not necessarily lead to a satisfactory solution for the computation of the stationary distribution of Ŷ, since we do not have control over the way the P̂ approximation error propagates to the computation of the stationary distribution of Ŷ. To achieve some degree of control on the approximation error of the stationary distribution of Ŷ, we will use in the next subsection stochastically monotone properties of the matrix P̂.
5.2. Kalmykov bounds for P̂

To ease the understanding of the results derived in this subsection we recall some useful definitions from stochastic ordering; see, e.g., Kulkarni [33] for more details. If I is a countable ordered set and p^(r) = (p_i^(r))_{i∈I}, r = 1, 2, are two probability vectors, we say that p^(1) is stochastically smaller than p^(2) in the usual (in distribution) sense, and write p^(1) ≤_st p^(2), if and only if Σ_{j≥k} p_j^(1) ≤ Σ_{j≥k} p_j^(2), for all k ∈ I. Furthermore, if X^(1) and X^(2) are discrete random variables with support in the same ordered set, with respective probability vectors p^(1) and p^(2), we write X^(1) ≤_st X^(2) if and only if p^(1) ≤_st p^(2).

Similarly, if, for k = 1, 2, I_k is a countable ordered set and V^(k) = [v_ij^(k)]_{i∈I_1, j∈I_2} is a stochastic matrix, then the matrix V^(1) is said to be smaller than the matrix V^(2) in the Kalmykov ordering sense, V^(1) ≤_K V^(2), if and only if

Σ_{j≥k} v_ij^(1) ≤ Σ_{j≥k} v_mj^(2)

for all i ≤ m and all k, with i, m ∈ I_1 and j, k ∈ I_2. Moreover, if, for some countable ordered set I, V = [v_ij]_{i,j∈I} is a stochastic matrix, then V is said to be stochastically monotone if V ≤_K V.

The matrix P̂ possesses stochastically monotone properties that allow us to compute approximating bounds both for the transition probability matrix of the embedded DTMC at batch prearrivals, P̂, as well as for the corresponding steady-state distribution, generalizing the results presented by Ferreira and Pacheco ([20], Theorem 1) for single arrivals. For that, let us consider the matrices

M^(N) = Σ_{n∈S_X} b_n Φ_n( Σ_{m≤N} α_m P̃^m + (Σ_{m>N} α_m) U )   (25)

and

M̂^(N) = Σ_{n∈S_X} b_n Φ_n( Σ_{m≤N} α_m P̃^m + (Σ_{m>N} α_m) P̃^N )   (26)

for N ∈ N, with U = (u_ij) = (δ_{j0}). Moreover, let Y^(N) and Ŷ^(N) denote DTMCs in S with respective transition probability matrices M^(N) and M̂^(N).

Theorem 8 The matrices M^(N) and M̂^(N) are stochastic,

M^(N) ≤_K P̂ ≤_K M̂^(N)   (27)

for all N ∈ N, and M^(N) and M̂^(N) converge to P̂ when N tends to infinity. In addition, if we let n_0 = inf S_X, then Y^(N), N ≥ min(c − 1, n_0), and Ŷ^(N), N ≥ min(c, n_0 + 1), are irreducible and aperiodic DTMCs. Moreover, the DTMCs Y^(N), N ≥ min(c − 1, n_0), are ergodic if c < ∞ or ρ < 1, and the DTMCs Ŷ^(N), N ≥ min(c, n_0 + 1), are ergodic if c < ∞ or, for sufficiently large N, if c = ∞ and ρ < 1.

Proof Let N ∈ N. The mixed Poisson probabilities {α_m, m ∈ N} are a probability function and are all strictly positive. This implies that the matrices

C^(N) = Σ_{m≤N} α_m P̃^m + (Σ_{m>N} α_m) U   and   Ĉ^(N) = Σ_{m≤N} α_m P̃^m + (Σ_{m>N} α_m) P̃^N

are stochastic because they are convex linear combinations of the stochastic matrices P̃^n, n ∈ N, and U. As the operators Φ_n, n ∈ N_+, transform stochastic matrices into stochastic matrices, it thus follows that

M^(N) = Σ_{n∈S_X} b_n Φ_n(C^(N))   and   M̂^(N) = Σ_{n∈S_X} b_n Φ_n(Ĉ^(N))   (28)

are stochastic matrices, since they are convex linear combinations of the stochastic matrices Φ_n(C^(N)) and Φ_n(Ĉ^(N)), n ∈ N_+. In addition, from (22) and (25)-(26), we conclude that

P̂ − M^(N) = Σ_{m>N} α_m ( Σ_{n∈S_X} b_n Φ_n(P̃^m) − Σ_{n∈S_X} b_n Φ_n(U) )

and

P̂ − M̂^(N) = Σ_{m>N} α_m ( Σ_{n∈S_X} b_n Φ_n(P̃^m) − Σ_{n∈S_X} b_n Φ_n(P̃^N) ).

Thus, taking into account that Σ_{n∈S_X} b_n Φ_n(P̃^m) is a stochastic matrix, for all m ∈ N_+, the convergence of M^(N) and M̂^(N) to P̂, as N tends to infinity, follows from the fact that Σ_{m>N} α_m tends to zero as N tends to infinity.

The stochastic ordering relation (27) follows from the following properties: (i) the powers of the transition probability matrix P̃ are stochastically monotone and non-increasing in the Kalmykov ordering sense, i.e., P̃^{n_2} ≤_K P̃^{n_1} for all 0 ≤ n_1 ≤ n_2; (ii) U ≤_K V for any stochastic matrix V with indices in S, so that, in particular, U ≤_K P̃^n for all n ∈ N; (iii) for any two stochastic matrices V_1 and V_2 with indices in S, if V_1 ≤_K V_2, then Φ_n(V_1) ≤_K Φ_n(V_2) for all n ∈ N_+; and (iv) the Kalmykov ordering is closed for convex linear combinations. More precisely, from (i)+(ii) we conclude that C^(N) ≤_K C ≤_K Ĉ^(N), with C = Σ_{m∈N} α_m P̃^m. Taking into account (22) and (28) and using (iii)+(iv), we get

M^(N) = Σ_{n∈S_X} b_n Φ_n(C^(N)) ≤_K Σ_{n∈S_X} b_n Φ_n(C) = P̂ ≤_K Σ_{n∈S_X} b_n Φ_n(Ĉ^(N)) = M̂^(N)

from which (27) follows.

The DTMCs Y^(N), N ≥ min(c − 1, n_0), and Ŷ^(N), N ≥ min(c, n_0 + 1), are irreducible and aperiodic since, as the mixed Poisson probabilities and the probability b_{n_0} are strictly positive, the corresponding transition probability matrices have the form of a matrix C = [c_ij]_{i,j∈S} such that c_ij > 0 for 0 ≤ j ≤ min(c, i + n_0), for i ∈ S. Consequently, if c < ∞ they are ergodic.

Conversely, if c = ∞, ρ < 1 and N ≥ min(c − 1, n_0) (in which case we know that, from Lemma 1, Ŷ is ergodic and Y^(N) is irreducible and aperiodic), as M^(N) ≤_K P̂,

we conclude, using Lemma 9, which is stated after this proof, that Y^(N) is ergodic.

Suppose now that c = ∞ and ρ = λb̄/(sμ) < 1, fix ε > 0 such that b̄(1 + ε) < sμ/λ, which then implies that δ = b̄ − sμ/(λ(1 + ε)) < 0, and take N(ε) such that Σ_{n≤N} n α_n ≥ sμ/(λ(1 + ε)) > b̄ for all N ≥ N(ε). This implies that, for N ≥ N(ε) and i ≥ s + N,

E[ Ŷ^(N)_{k+1} − i | Ŷ^(N)_k = i ] = Σ_{m∈S_X} b_m ( Σ_{n≤N} (m − n) α_n + (m − N) Σ_{n>N} α_n ) ≤ b̄ − Σ_{n≤N} n α_n ≤ b̄ − sμ/(λ(1 + ε)) = δ < 0.

Moreover, E[ Ŷ^(N)_{k+1} − i | Ŷ^(N)_k = i ] ≤ b̄, for all i ∈ S. Therefore, using Pakes's lemma, we conclude that Ŷ^(N) is ergodic for N ≥ N(ε).

The next stated lemma was invoked in the proof of the previous theorem. In the way provided in the Appendix, its proof is similar to the proof of Theorem 3.12 of Cabral Morais [8].

Lemma 9 Let W^(1) and W^(2) be irreducible DTMCs with common state space contained in N and having transition probability matrices P^(1) and P^(2), respectively, such that P^(1) ≤_K P^(2). Then, W^(1) is positive recurrent in case W^(2) is positive recurrent.
5.3. Bounds for the steady-state distribution of the batch prearrival state process

We will end the section with a result, based on Theorem 8, that establishes bounds for the steady-state distribution of the batch prearrival state process. Its proof uses a lemma that is stated afterwards and which is proved in the Appendix.

Theorem 10 Suppose that $c < \infty$ or $\rho < 1$, $N$ is sufficiently large so that the DTMC $\bar Y^{(N)}$ is ergodic, and let $\pi^{(N)}$ ($\bar\pi^{(N)}$) be the unique stationary probability vector associated to $Y^{(N)}$ ($\bar Y^{(N)}$). Then
$$ \pi^{(N)} \le_{st} \pi \le_{st} \bar\pi^{(N)}. \tag{29} $$

Proof If the DTMC $\bar Y^{(N)}$ is ergodic, then $Y$ and $Y^{(N)}$ are also ergodic, in view of Theorem 8 and Lemma 9. Moreover, the unique stationary probability vectors $\pi^{(N)}$, $\pi$ and $\bar\pi^{(N)}$ of $Y^{(N)}$, $Y$ and $\bar Y^{(N)}$, respectively, are also the limit probability vectors of the corresponding DTMCs. From (27) we conclude, using the relation (30) of Lemma 11, that
$$ a\,\big(M^{(N)}\big)^n \ \le_{st}\ a\,P^n \ \le_{st}\ a\,\big(\bar M^{(N)}\big)^n $$
for $n, N \in \mathbb{N}_+$ and an arbitrary probability vector $a$ in $S$. By letting $n$ tend to infinity in the previous double inequality we obtain (29), since the usual stochastic ordering is closed for limits ([44], Theorem 1.A.3 (c)).

The following lemma involves stochastically monotone matrices and can be seen alternatively as a result on the closure of the usual stochastic ordering for mixtures, as illustrated in its proof given in the Appendix.

Lemma 11 Suppose that, for $s = 1, 2$, $S_s$ is an ordered set isomorphic to an interval of $\mathbb{Z}$, $u^{(s)} = [u_j^{(s)}]_{j\in S_1}$ is a probability vector in $S_1$ and $R^{(s)} = [r_{jk}^{(s)}]_{j\in S_1, k\in S_2}$ is a stochastic matrix. Then,
$$ u^{(1)} \le_{st} u^{(2)} \ \wedge\ R^{(1)} \le_K R^{(2)} \ \Longrightarrow\ u^{(1)}R^{(1)} \le_{st} u^{(2)}R^{(2)}. \tag{30} $$
If, furthermore, $S_1$ is finite and $R = [r_{jk}]_{j,k\in S_1}$ is a stochastically monotone matrix, then
$$ \big\| u^{(2)}R - u^{(1)}R \big\| \le |S_1|\,\big\| u^{(2)} - u^{(1)} \big\| \tag{31} $$
for $\|\cdot\| = \|\cdot\|_1, \|\cdot\|_\infty$, with $|S_1|$ denoting the cardinality of $S_1$.
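The mixture-closure implication (30) can be checked numerically on small examples. The following is a minimal sketch, assuming the standard tail-sum characterization of the usual stochastic ordering and the row-wise characterization of the Kalmykov ordering of stochastic matrices (row $i$ of $R^{(1)}$ is stochastically dominated by row $j$ of $R^{(2)}$ whenever $i \le j$); the vectors and matrices below are made-up data, not taken from the paper.

```python
def survival(v):
    """Tail sums of a probability vector: S[j] = sum_{k >= j} v[k]."""
    out, acc = [0.0] * len(v), 0.0
    for j in range(len(v) - 1, -1, -1):
        acc += v[j]
        out[j] = acc
    return out

def st_le(u, v, tol=1e-12):
    """u <=_st v: every tail sum of u is dominated by that of v."""
    return all(a <= b + tol for a, b in zip(survival(u), survival(v)))

def kalmykov_le(R1, R2, tol=1e-12):
    """R1 <=_K R2: row i of R1 <=_st row j of R2 whenever i <= j."""
    return all(st_le(R1[i], R2[j], tol)
               for i in range(len(R1)) for j in range(len(R2)) if i <= j)

def vec_mat(u, R):
    """Probability vector of the mixture u R."""
    return [sum(u[i] * R[i][j] for i in range(len(u)))
            for j in range(len(R[0]))]

# made-up example: u1 <=_st u2 and R1 <=_K R2 ...
u1, u2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
R1 = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.3, 0.3, 0.4]]
R2 = [[0.3, 0.3, 0.4], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]
assert st_le(u1, u2) and kalmykov_le(R1, R2)
# ... hence u1 R1 <=_st u2 R2, as (30) asserts
assert st_le(vec_mat(u1, R1), vec_mat(u2, R2))
```

This is exactly the mechanism used in the proofs above: ordered initial vectors propagated through Kalmykov-ordered transition matrices stay ordered.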

6. Bounds for system related distributions and performance measures

In this section, departing from the results of Theorem 10 relative to the batch prearrival state process, we will derive corollaries for the state seen by customers at their arrival to the system and their waiting times in queue, along with the continuous time state process and performance measures of the $GI^X/M/s/c$ system.

We will assume in the section that the conditions of Theorem 10 hold, i.e., $c < \infty$ or $\rho < 1$ and $N$ is sufficiently large so that the DTMC $\bar Y^{(N)}$ is ergodic. Moreover, we let $\pi^{(N)}$ ($\bar\pi^{(N)}$) be the unique stationary probability vector associated to $Y^{(N)}$ ($\bar Y^{(N)}$); this implies, in particular, that (29) holds.

We next state the main result of the section, which establishes bounds for the probability vectors $\alpha = \pi H$ and $p = \alpha\,\Theta$ associated to the state seen by customers at their arrival to the system and the continuous time state process, respectively, where $H$ is given by (17) and $\Theta$ by (18).

Theorem 12 Suppose that the conditions of Theorem 10 hold. Then
$$ \alpha^{(N)} \le_{st} \alpha \le_{st} \bar\alpha^{(N)} \tag{32} $$
$$ p^{(N)} \le_{st} p \le_{st} \bar p^{(N)} \tag{33} $$
with $\alpha^{(N)} = \pi^{(N)} H$, $p^{(N)} = \alpha^{(N)}\Theta$, $\bar\alpha^{(N)} = \bar\pi^{(N)} H$, and $\bar p^{(N)} = \bar\alpha^{(N)}\Theta$.

Moreover, if $c < \infty$, then
$$ \big\| \bar q^{(N)} - q^{(N)} \big\| \le (c+1)\,\big\| \bar\pi^{(N)} - \pi^{(N)} \big\| \tag{34} $$
for $q = \alpha, p$ and $\|\cdot\| = \|\cdot\|_1, \|\cdot\|_\infty$.

Proof In view of Theorem 10 and Lemma 11, to prove the result it suffices to show that $H$ and $\Theta$ are stochastically monotone matrices. We first note that, in view of (18),
$$ \theta_{jk} = \lambda \sum_{n\in S_X} b_n\, E\Big( \int_0^{T_2-T_1} 1_{\{D(t)=k\}}\,dt \,\Big|\, D(0)=\min(c,j+n) \Big) = \lambda \sum_{n\in S_X} b_n \int_0^{\infty} \int_0^s P\big(D(t)=k \mid D(0)=\min(c,j+n)\big)\,dt\,dA(s) = \lambda \sum_{n\in S_X} b_n \int_0^{\infty} p_{\min(c,j+n)k}(t)\,\bar A(t)\,dt, \tag{35} $$
where $p_{ij}(t) = P(D(t)=j \mid D(0)=i)$ and $\bar A$ denotes the survival function of the interarrival time distribution. As the coefficients $\theta_{ij}$, $i, j \in S$, are nonnegative and, taking into account (35),
$$ \sum_{k=0}^{c} \theta_{jk} = \lambda \sum_{n\in S_X} b_n \int_0^{\infty} \int_0^s \sum_{k=0}^{c} P\big(D(t)=k \mid D(0)=\min(c,j+n)\big)\,dt\,dA(s) = \lambda \sum_{n\in S_X} b_n \int_0^{\infty} \int_0^s 1\,dt\,dA(s) = \lambda \int_0^{\infty} s\,dA(s) = 1 $$
for $j \in S$, we conclude that $\Theta$ is a stochastic matrix. Again from (35), we have that
$$ \sum_{l \ge j} \theta_{il} = \lambda \sum_{n\in S_X} b_n \int_0^{\infty} \int_0^s P\big(D(t) \ge j \mid D(0)=\min(c,i+n)\big)\,dt\,dA(s) $$
for $i \in S$, and $D$ is a CTMC with non-increasing sample paths and unitary jumps with negative sign. As, moreover, for $n \in \mathbb{N}_+$, $\min(c, i+n)$ is a non-decreasing function of $i$, it follows that $\int_0^s P(D(t)\ge j \mid D(0)=\min(c,i+n))\,dt$, the mean time that, departing from state $\min(c,i+n)$, the process $D$ stays in states greater than or equal to $j$ in the time interval $[0,s]$, is a non-decreasing function of $i$. Consequently, $\sum_{l\ge j} \theta_{il}$ is a non-decreasing function of $i$, i.e., $\Theta$ is stochastically monotone.

In turn, in view of (17), the matrix $H$ is stochastic since
$$ \sum_{j=0}^{c} h_{ij} = \sum_{j<i} h_{ij} + \sum_{j=i}^{c-1} h_{ij} + h_{ic} = 0 + \sum_{j=i}^{c-1} \frac{\bar B_{j-i+1}}{\bar b} + \Bigg( 1 - \sum_{l=1}^{c-i} \frac{\bar B_l}{\bar b} \Bigg) = 1 $$

for all $i \in S$. In addition, since we have
$$ \sum_{j<k} h_{ij} = \begin{cases} 0, & k \le i, \\ \sum_{l=1}^{k-i} \bar B_l/\bar b, & i < k < c, \\ \sum_{l=1}^{c-i} \bar B_l/\bar b, & k = c < \infty, \end{cases} $$
and $\bar b$ and the coefficients $\bar B_l$, $l \in S$, are nonnegative, we conclude that $\sum_{j<k} h_{ij}$ is a non-increasing function of $i$, for all $k \in S$. That is, $H$ is stochastically monotone.

We are now able to use the results of the previous theorem relative to the limit distribution of the state seen by customers at their arrival to the system, along with the results of Corollary 6 concerning the customer waiting time in queue process, to derive bounds for the limit distribution of the customer waiting time in queue process.

Corollary 13 Suppose that the conditions of Theorem 12 hold and the batch size distribution is aperiodic. Then, for any $t > 0$, we have the following:

(i) If $c = \infty$, then
$$ \sum_{m\in\mathbb{N}} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!} \sum_{i\ge s+m} \alpha_i^{(N)} \ \le\ \lim_{l\to\infty} P(W_l > t) \ \le\ \sum_{m\in\mathbb{N}} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!} \sum_{i\ge s+m} \bar\alpha_i^{(N)}. \tag{36} $$

(ii) If $c < \infty$, then
$$ \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\, a_m^{(N)} \ \le\ \lim_{l\to\infty} P(W_l > t) \ \le\ \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\, \bar a_m^{(N)} \tag{37} $$
and, for $C(t) = \lim_{l\to\infty} P(W_l > t \mid \text{customer } l \text{ is not blocked})$,
$$ \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\, \frac{a_m^{(N)}}{1 - \bar\alpha_c^{(N)}} \ \le\ C(t) \ \le\ \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\, \frac{\bar a_m^{(N)}}{1 - \alpha_c^{(N)}} \tag{38} $$
where
$$ a_m^{(N)} = \sum_{i\ge s+m} \alpha_i^{(N)} - \bar\alpha_c^{(N)} \qquad\text{and}\qquad \bar a_m^{(N)} = \sum_{i\ge s+m} \bar\alpha_i^{(N)} - \alpha_c^{(N)} \tag{39} $$
for $m = 0, 1, \ldots, c-s-1$.

(iii) The lower and upper bounds given in (36)-(37) hold for the steady-state survival function of the customer waiting time in queue, and those given in (38) for the steady-state survival function of the customer waiting time in queue of non-blocked customers, even for the case where the batch size is periodic.
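The finite-buffer bounds above all share the same computational form: a Poisson-weighted sum of tail coefficients. The following minimal sketch evaluates a generic bound of this form; the coefficient vectors are made-up stand-ins for the quantities of (39), not values computed by the paper's method.

```python
import math

def poisson_weighted_tail(coeffs, s, mu, t):
    """Evaluate sum_m exp(-s*mu*t) * (s*mu*t)**m / m! * coeffs[m],
    the common form of the waiting-time tail bounds (36)-(38)."""
    x = s * mu * t
    return sum(math.exp(-x) * x**m / math.factorial(m) * c
               for m, c in enumerate(coeffs))

# made-up lower/upper coefficient vectors with a_low[m] <= a_up[m]
a_low = [0.30, 0.20, 0.10, 0.05]
a_up = [0.45, 0.35, 0.25, 0.15]
lo = poisson_weighted_tail(a_low, s=3, mu=1.0, t=0.5)
hi = poisson_weighted_tail(a_up, s=3, mu=1.0, t=0.5)
```

Since each Poisson weight is nonnegative, coefficientwise ordering of the two vectors immediately yields `lo <= hi`, mirroring how (37) and (38) sandwich the limit survival function.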

Proof Suppose that the conditions of Theorem 12 hold and the batch size distribution is aperiodic. Then, in view of (32), we have, for all $j \in S$,
$$ \sum_{i\ge j} \alpha_i^{(N)} \ \le\ \sum_{i\ge j} \alpha_i \ \le\ \sum_{i\ge j} \bar\alpha_i^{(N)}. \tag{40} $$

(i) Suppose that $c = \infty$; then (36) follows directly from (40) since, in view of (15),
$$ \lim_{l\to\infty} P(W_l > t) = \sum_{m\in\mathbb{N}} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!} \sum_{i\ge s+m} \alpha_i. $$

(ii) Suppose now that $c < \infty$. From (40), it follows that
$$ \sum_{i\ge j} \alpha_i^{(N)} \le \sum_{i\ge j} \alpha_i \le \sum_{i\ge j} \bar\alpha_i^{(N)} \qquad\text{and}\qquad \alpha_c^{(N)} \le \alpha_c \le \bar\alpha_c^{(N)} $$
for all $j \in S$. This, in turn, implies that
$$ a_m^{(N)} \le \sum_{i\ge s+m} \alpha_i - \alpha_c \le \bar a_m^{(N)} \qquad\text{and}\qquad \frac{a_m^{(N)}}{1-\bar\alpha_c^{(N)}} \le \frac{1 - \sum_{i<s+m}\alpha_i - \alpha_c}{1-\alpha_c} \le \frac{\bar a_m^{(N)}}{1-\alpha_c^{(N)}} \tag{41} $$
for $m = 0, 1, \ldots, c-s-1$. Then, (37)-(38) follow directly from (41) since, in view of (15),
$$ \lim_{l\to\infty} P(W_l > t) = \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!} \Big( \sum_{i\ge s+m} \alpha_i - \alpha_c \Big) $$
and, in view of (16),
$$ \lim_{l\to\infty} P(W_l > t \mid \text{customer } l \text{ is not blocked}) = \sum_{m=0}^{c-s-1} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\, \frac{1 - \sum_{i<s+m}\alpha_i - \alpha_c}{1 - \alpha_c}. $$

(iii) The result follows directly from (i), (ii) and Corollary 6 (iv).

We next establish, based on Theorem 10, Theorem 12 and Corollary 13, bounds for some important measures of $GI^X/M/s/c$ systems. Namely, we address the steady-state blocking probability, the effective traffic intensity, and the steady-state mean and variance of: the number of customers in the system at batch prearrivals, the number of customers seen in the system by a customer at his arrival to the system, the number of customers in the system in continuous time, and the waiting time in queue of non-blocked customers. In the following we assume that the conditions of Theorem 10 hold.

For finite buffer systems, we have the following bounds for the steady-state customer blocking probability, $P_B = \alpha_c$:
$$ \alpha_c^{(N)} \ \le\ P_B \ \le\ \bar\alpha_c^{(N)}. \tag{42} $$
If the batch size distribution is aperiodic, then the bounds hold for the limit customer blocking probability. Conversely, $P_B = 0$ in the case of infinite buffer systems.

For finite buffer systems, the effective traffic intensity is $\rho_{\mathrm{eff}} = \frac{\lambda\bar b}{s\mu}(1-\alpha_c) = \rho\,(1-\alpha_c)$. Thus, in view of (42), we have
$$ \rho\,\big(1-\bar\alpha_c^{(N)}\big) \ \le\ \rho_{\mathrm{eff}} \ \le\ \rho\,\big(1-\alpha_c^{(N)}\big). \tag{43} $$

Let the steady-state number of customers in the system at batch prearrivals, seen by a customer at his arrival to the system, and in continuous time be denoted, respectively, by $L$, $L^\star$, and $\tilde L$. Then we have the following result, where for a random variable $V$ we let $E(V)$ and $\operatorname{Var}(V)$ denote respectively the mean and variance of $V$.

Corollary 14 Suppose that the conditions of Theorem 10 hold. Then the steady-state mean and variance of the number of customers in the system at batch prearrivals, seen by a customer at his arrival to the system, and in continuous time satisfy, respectively,
$$ \sum_{k\in S} k\,\pi_k^{(N)} \ \le\ E(L) \ \le\ \sum_{k\in S} k\,\bar\pi_k^{(N)} \tag{44} $$
$$ \sum_{k\in S} k\,\alpha_k^{(N)} \ \le\ E(L^\star) \ \le\ \sum_{k\in S} k\,\bar\alpha_k^{(N)} \tag{45} $$
$$ \sum_{k\in S} k\,p_k^{(N)} \ \le\ E(\tilde L) \ \le\ \sum_{k\in S} k\,\bar p_k^{(N)} \tag{46} $$
and
$$ \sum_{k\in S} k^2\pi_k^{(N)} - \Big(\sum_{k\in S} k\,\bar\pi_k^{(N)}\Big)^2 \ \le\ \operatorname{Var}(L) \ \le\ \sum_{k\in S} k^2\bar\pi_k^{(N)} - \Big(\sum_{k\in S} k\,\pi_k^{(N)}\Big)^2 \tag{47} $$
$$ \sum_{k\in S} k^2\alpha_k^{(N)} - \Big(\sum_{k\in S} k\,\bar\alpha_k^{(N)}\Big)^2 \ \le\ \operatorname{Var}(L^\star) \ \le\ \sum_{k\in S} k^2\bar\alpha_k^{(N)} - \Big(\sum_{k\in S} k\,\alpha_k^{(N)}\Big)^2 \tag{48} $$
$$ \sum_{k\in S} k^2 p_k^{(N)} - \Big(\sum_{k\in S} k\,\bar p_k^{(N)}\Big)^2 \ \le\ \operatorname{Var}(\tilde L) \ \le\ \sum_{k\in S} k^2\bar p_k^{(N)} - \Big(\sum_{k\in S} k\,p_k^{(N)}\Big)^2 \tag{49} $$

Proof Note that the function $f(x) = x^k$ is a non-decreasing function of $x$, for $x > 0$ and any positive exponent $k$. Thus, if $V_1$ and $V_2$ are nonnegative random variables then (see, e.g., [44, 48])
$$ V_1 \le_{st} V_2 \ \Longrightarrow\ E\big(V_1^k\big) \le E\big(V_2^k\big) \tag{50} $$
for any positive exponent $k$. All six stated equations now follow from the previous implication and theorems 10 and 12.

Let $W$ denote the steady-state waiting time in queue of a non-blocked customer. We will next use the results derived in Corollary 13 to get lower and upper bounds for the mean and variance of $W$.

Corollary 15 Suppose that the conditions of Theorem 10 hold and the buffer is finite. Then the steady-state mean and variance of the waiting time in queue of a non-blocked customer satisfy, respectively,
$$ \frac{1}{s\mu} \sum_{m=0}^{c-s-1} \frac{a_m^{(N)}}{1-\bar\alpha_c^{(N)}} \ \le\ E(W) \ \le\ \frac{1}{s\mu} \sum_{m=0}^{c-s-1} \frac{\bar a_m^{(N)}}{1-\alpha_c^{(N)}} \tag{51} $$
and
$$ \frac{1}{(s\mu)^2} \sum_{m=0}^{c-s-1} \frac{2(m+1)\,a_m^{(N)}}{1-\bar\alpha_c^{(N)}} - \Bigg( \frac{1}{s\mu} \sum_{m=0}^{c-s-1} \frac{\bar a_m^{(N)}}{1-\alpha_c^{(N)}} \Bigg)^2 \ \le\ \operatorname{Var}(W) \ \le\ \frac{1}{(s\mu)^2} \sum_{m=0}^{c-s-1} \frac{2(m+1)\,\bar a_m^{(N)}}{1-\alpha_c^{(N)}} - \Bigg( \frac{1}{s\mu} \sum_{m=0}^{c-s-1} \frac{a_m^{(N)}}{1-\bar\alpha_c^{(N)}} \Bigg)^2 \tag{52} $$
with $a_m^{(N)}$ and $\bar a_m^{(N)}$ as given in (39).

Proof The result follows directly from (38) by noting that
$$ \int_0^{\infty} e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\,dt = \frac{1}{s\mu} \qquad\text{and}\qquad \int_0^{\infty} 2t\,e^{-s\mu t}\,\frac{(s\mu t)^m}{m!}\,dt = \frac{2(m+1)}{(s\mu)^2} $$
for all $m \in \mathbb{N}$, and
$$ E(W) = \int_0^{\infty} P(W>t)\,dt \qquad\text{and}\qquad E(W^2) = \int_0^{\infty} 2t\,P(W>t)\,dt. $$

Note that results similar to the ones derived in corollaries 13 and 15 for the steady-state customer waiting time in queue could be derived for the steady-state waiting time in queue of the first customer of a batch, using the results derived for the steady-state number of customers in the system at batch prearrivals.
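The moment-ordering implication (50) underlying corollaries 14 and 15 is easy to check numerically. In the following sketch the two binomial distributions are arbitrary examples of stochastically ordered nonnegative random variables, chosen only for illustration.

```python
from math import comb

def binom_pmf(n, p):
    """Probability vector of a Binomial(n, p) random variable."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def tail(v, j):          # P(V >= j)
    return sum(v[j:])

def moment(v, k):        # E(V^k)
    return sum((x ** k) * px for x, px in enumerate(v))

v1, v2 = binom_pmf(4, 0.4), binom_pmf(4, 0.6)
# V1 <=_st V2: every tail probability of V1 is dominated by that of V2
assert all(tail(v1, j) <= tail(v2, j) + 1e-12 for j in range(5))
# hence E(V1^k) <= E(V2^k) for every positive exponent k, as in (50)
assert all(moment(v1, k) <= moment(v2, k) for k in (1, 2, 3))
```

Here the means are 1.6 and 2.4, so the first-moment inequality is visible directly; the point of (50) is that the same ordering holds for all positive moments at once.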

7. Numerical Results

To show the power of the proposed approach we analyze $GI^X/M/s/c$ systems with the following types of renewal arrival processes, with the same mean interarrival time and service rate: Pareto ($P$), exponential ($M$), deterministic ($D$), hyperexponential ($H_2$), and Erlang with 2 phases ($E_2$). In addition, to evaluate the influence of the batch size distribution, we use five different batch size distributions with common mean: shifted binomial, a binomial with 4 trials and success probability 0.5 added of one unit (1+B(4,0.5)); deterministic, the constant 3 (D(3)); uniform discrete on the set {1, 2, 3, 4, 5} (U(1,5)); shifted Poisson, a Poisson random variable with mean 2 added of one unit (1+Po(2)); and geometric with success probability 1/3 (Geo(1/3)). Moreover, we also consider families of distributions such as $\{1+B(4,p),\ p\in[0,1]\}$ and $\{\mathrm{Geo}(1-p),\ p\in[0,0.8]\}$.

For the Pareto distribution we use the parametrization $(\alpha, b)$, with $b > 0$ and $\alpha > 1$, with probability density function $\alpha b^\alpha / t^{\alpha+1}$, for $t \ge b$, whose mean is $\alpha b/(\alpha-1)$. For the hyperexponential, we consider the specific case with probability density of the form $0.2\lambda e^{-\lambda t} + 1.6\lambda e^{-2\lambda t}$, for $t > 0$, with $\lambda > 0$. Moreover, we consider the uniform distribution with the left endpoint of the interval being zero.

The results have been computed with MATLAB algorithms, and the steady-state probabilities of the number of customers in the system seen by customers at their arrival to the system have been obtained with an accuracy of $\varepsilon = 10^{-10}$. We have checked the correctness of the implementation of the proposed algorithm by comparing its output with the results obtained from dedicated simulations. Moreover, the mixed-Poisson probabilities (4) have been computed using the recursions proposed in [35]. These recursions are stable, easy to implement, and their computation time is linear in the number of computed coefficients.
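A brute-force cross-check of such mixed-Poisson computations (not the recursions of [35]) is direct numerical integration against a structural density with a known closed form. The sketch below uses a Gamma(2, 1) structural distribution, a made-up test case for which the mixed Poisson law is negative binomial with probabilities $(n+1)/2^{n+2}$.

```python
import math

def mixed_poisson_pmf(n, density, h=1e-3, npts=60001):
    """alpha_n = integral of exp(-x) x^n / n! * density(x) dx,
    by the composite trapezoidal rule on a fixed grid (a slow but
    simple check, not the stable recursions used in the paper)."""
    xs = [k * h for k in range(npts)]
    f = [math.exp(-x) * x**n / math.factorial(n) * density(x) for x in xs]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

# Gamma(shape 2, scale 1) structural density: x * exp(-x)
gamma2 = lambda x: x * math.exp(-x)

# closed form of the resulting mixed Poisson: (n + 1) / 2^(n + 2)
for n in range(5):
    exact = (n + 1) / 2 ** (n + 2)
    assert abs(mixed_poisson_pmf(n, gamma2) - exact) < 1e-4
```

The same integration scheme, with a Pareto density substituted for `gamma2` and a larger truncation point, reproduces the heavy-tailed case discussed next, although far less efficiently than the recursions.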
In particular, the computation of the mixed-Poisson probabilities with Pareto structural distribution, whose associated distribution function converges slowly to 1, avoids the numerical problems reported until recently [22].

To accurately forecast performance measures of $GI^X/M/s/c$ systems, a good fitting of both the interarrival time distribution and the batch size distribution is needed, as illustrated in figures 1-4. We will provide results that illustrate mainly the sensitivity of performance measures of $GI^X/M/s/c$ systems with respect to the batch size distribution, not merely its mean. For results illustrating the sensitivity of performance measures of $GI^X/M/s/c$ systems with respect to the interarrival time distribution see, e.g., [20].

We now introduce results for the steady-state blocking probability and steady-state mean waiting time in queue as a function of the traffic intensity (Figure 1), the system capacity (Figure 2) and the number of servers (Figure 3), for Pareto with $\alpha = 1.1$, exponential and deterministic batch interarrival time distributions. These figures show that the sensitivity of the steady-state blocking probability and steady-state mean waiting

Fig. 1: Steady-state blocking probability and steady-state mean waiting time in queue in $GI^X/M/3/10$ systems as a function of the traffic intensity for the following three types of batch interarrival time distributions: Pareto with $\alpha = 1.1$ (top), exponential (middle), and deterministic (bottom). Curves in each panel correspond to the batch size distributions 1+B(4,0.5), U(1,5), 1+Po(2), D(3), and Geo(1/3).

Fig. 2: Steady-state blocking probability and steady-state mean waiting time in queue in $GI^X/M/3/c$ systems (with traffic intensity 0.9) as a function of the system capacity for the following three types of batch interarrival time distributions: Pareto with $\alpha = 1.1$ (top), exponential (middle), and deterministic (bottom). Curves in each panel correspond to the batch size distributions 1+B(4,0.5), U(1,5), 1+Po(2), D(3), and Geo(1/3).

Fig. 3: Steady-state blocking probability and steady-state mean waiting time in queue in $GI^X/M/s/10$ systems (with traffic intensity 0.9) as a function of the number of servers for the following three types of batch interarrival time distributions: Pareto with $\alpha = 1.1$ (top), exponential (middle), and deterministic (bottom). Curves in each panel correspond to the batch size distributions 1+B(4,0.5), U(1,5), 1+Po(2), D(3), and Geo(1/3).

Fig. 4: Steady-state blocking probability, steady-state mean waiting time in queue, and steady-state mean number of customers in the system in $GI^X/M/3/10$ systems (with traffic intensity 0.9) for batch size distributions $1+B(4,p)$ (left) and $\mathrm{Geo}(1-p)$ (right) as a function of $p$. Curves in each panel correspond to the interarrival time distributions $M$, $E_2$, $D$, $H_2$, and $P(1.1,\cdot)$.

time in queue to the batch size distribution varies with the interarrival time distribution, the traffic intensity, the system capacity, and the number of servers (for fixed traffic intensity). Note, in particular, that $\mathrm{Pareto}^X/M/s/c$ systems with $\alpha = 1.1$, or with other small values of $\alpha$, are less sensitive to the batch size distribution than $M^X/M/s/c$ and $D^X/M/s/c$ systems, as well as other $GI^X/M/s/c$ systems with interarrival time distribution having finite variance.

In Figure 4 we present results for the steady-state blocking probability, steady-state mean waiting time in queue, and steady-state mean number of customers in $GI^X/M/3/10$ systems (with traffic intensity 0.9) for batch size distributions $1+B(4,p)$ (left) and $\mathrm{Geo}(1-p)$ (right) as a function of $p$. Large differences in the values of these performance measures are observed across the interarrival time distributions considered, along with internal variation with respect to the parameters of the batch size distribution. The figure shows that the performance measures may vary strongly as the mean batch size changes. The steady-state blocking probability shows a tendency to increase as the mean batch size increases, with this tendency being stronger if the second moment of the batch size distribution increases faster than the mean, as is the case with the geometric distribution in the parametric form considered.

Thus, the batch size distribution influences the performance measures of $GI^X/M/s/c$ systems not only through its mean but also through other characteristics like its variability. This can be explained jointly by the inspection paradox (which says that the mean size of the batch of a randomly chosen customer is equal to the ratio of the second moment to the mean of a regular batch size) and the fact that a customer that is part of a large batch has a higher probability of being blocked.

8. Appendix

We prove lemmas 9 and 11.
We start with the proof of Lemma 9, which follows a procedure similar to the one used to prove Theorem 3.12 of Cabral Morais [8].

Proof of Lemma 9. Let $W^{(1)}$ and $W^{(2)}$ be DTMCs having as common state space a subset of $\mathbb{N}$ and possessing one-step transition probability matrices $P^{(1)}$ and $P^{(2)}$, respectively, such that $P^{(1)} \le_K P^{(2)}$, and, without loss of generality, assume that 0 is a state of $W^{(1)}$ and $W^{(2)}$ and $W_0^{(1)} = W_0^{(2)} = 0$.

For $k = 1, 2$, let $T_0^{(k)}$ be the first return time of the process $W^{(k)}$ to state 0, i.e., $T_0^{(k)} = \inf\{n \in \mathbb{N}_+ : W_n^{(k)} = 0\}$. We want to show that if $W^{(2)}$ is ergodic then so is $W^{(1)}$, for which it suffices to prove that
$$ E\big(T_0^{(2)}\big) < \infty \ \Longrightarrow\ E\big(T_0^{(1)}\big) < \infty, \tag{53} $$
taking into account that $W^{(1)}$ and $W^{(2)}$ are irreducible.

The survival function of $T_0^{(k)}$ at $m$, $m \in \mathbb{N}_+$, can be written as a function of the number of visits to the state 0 made by the process $W^{(k)}$ until time $m$, i.e.,
$$ P\big(T_0^{(k)} > m\big) = P\Big( \sum_{n=1}^m 1_{\{W_n^{(k)}=0\}} = 0 \Big) = 1 - E\Big( \min\Big\{1, \sum_{n=1}^m 1_{\{W_n^{(k)}=0\}} \Big\}\Big) \tag{54} $$
since $\min\big\{1, \sum_{n=1}^m 1_{\{W_n^{(k)}=0\}}\big\} \sim \mathrm{Ber}\big(P(T_0^{(k)} \le m)\big)$.

As $W_0^{(1)} = W_0^{(2)}$ and $P^{(1)} \le_K P^{(2)}$, then, in view of ([33], Theorem 3.13), $W^{(1)} \le_{st} W^{(2)}$. Therefore, from ([44], relation (4.B.20)), and since $f(w_0, w_1, \ldots, w_m) = \min\{1, \sum_{n=1}^m 1_{\{w_n=0\}}\}$ is a non-increasing functional in $\mathbb{N}^{m+1}$, $m \in \mathbb{N}$, we have that
$$ E\Big( \min\Big\{1, \sum_{n=1}^m 1_{\{W_n^{(1)}=0\}} \Big\}\Big) \ \ge\ E\Big( \min\Big\{1, \sum_{n=1}^m 1_{\{W_n^{(2)}=0\}} \Big\}\Big) $$
so that, in view of (54), we conclude that $P(T_0^{(1)} > m) \le P(T_0^{(2)} > m)$, for all $m \in \mathbb{N}$. This implies that
$$ E\big(T_0^{(1)}\big) = \sum_{m=0}^{\infty} P\big(T_0^{(1)} > m\big) \ \le\ \sum_{m=0}^{\infty} P\big(T_0^{(2)} > m\big) = E\big(T_0^{(2)}\big), $$
from which (53) follows.

Proof of Lemma 11. For $s = 1, 2$, let $X^{(s)}$ denote a discrete random variable in $S_1$ with probability function $u^{(s)}$ and let $Y^{(s)}$ be a random variable in $S_2$, defined on the same probability space as $X^{(s)}$, such that, conditional on $X^{(s)} = i$, the random variable $Y^{(s)}$ has probability function $r_i^{(s)} = [r_{ij}^{(s)}]_{j\in S_2}$. Since
$$ P\big(Y^{(s)} = j\big) = \sum_{i\in S_1} P\big(X^{(s)}=i\big)\,P\big(Y^{(s)}=j \mid X^{(s)}=i\big) = \sum_{i\in S_1} u_i^{(s)}\, r_{ij}^{(s)} $$
for $j \in S_2$, we conclude that $Y^{(s)}$ has probability function $u^{(s)} R^{(s)}$.

As, for each $i \in S_1$, $r_i^{(s)}$ constitutes a probability function, $Y^{(s)}$ can be seen as a mixture of the random variables $Y^{(s)} \mid (X^{(s)} = i)$, $i \in S_1$, with mixing random variable $X^{(s)}$. Hence, (30) follows directly from Theorem 1.A.6 of Shaked and Shanthikumar [44] since $u^{(1)} \le_{st} u^{(2)}$ and $R^{(1)} \le_K R^{(2)} \Longrightarrow r_i^{(1)} \le_{st} r_m^{(2)}$, $i \le m$.

In order to prove (31), let us assume that $S_1$ is finite and $R = [r_{jk}]_{j,k\in S_1}$ is a stochastically monotone matrix. Then,
$$ \Big| \big[u^{(2)}R\big]_j - \big[u^{(1)}R\big]_j \Big| = \Big| \sum_{n\in S_1} \big(u_n^{(2)} - u_n^{(1)}\big)\, r_{nj} \Big| \ \le\ \sum_{n\in S_1} \big| u_n^{(2)} - u_n^{(1)} \big| \tag{55} $$
for $j \in S_1$. Thus, $\|u^{(2)}R - u^{(1)}R\|_1 = \sum_{j\in S_1} |[u^{(2)}R]_j - [u^{(1)}R]_j| \le |S_1|\,\|u^{(2)} - u^{(1)}\|_1$. The analogous inequality for the $\|\cdot\|_\infty$ norm follows similarly starting from the relation (55).
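The norm inequality (31) can likewise be exercised on made-up data. The sketch below checks only the 1-norm case, using the tail-sum characterization of stochastic monotonicity (rows non-decreasing in the usual stochastic order); all vectors and the matrix are arbitrary illustrative choices.

```python
def norm1(v):
    return sum(abs(x) for x in v)

def vec_mat(u, R):
    return [sum(u[i] * R[i][j] for i in range(len(u)))
            for j in range(len(R[0]))]

# a stochastically monotone stochastic matrix (rows increase in <=_st)
R = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
u1, u2 = [0.5, 0.3, 0.2], [0.1, 0.4, 0.5]
lhs = norm1([a - b for a, b in zip(vec_mat(u2, R), vec_mat(u1, R))])
rhs = len(R) * norm1([a - b for a, b in zip(u2, u1)])
assert lhs <= rhs + 1e-12   # the 1-norm case of (31)
```

In this example the inequality is far from tight; the role of (31) in the paper is only to guarantee that the bounding vectors converge together as the truncation level grows.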

ACKNOWLEDGEMENTS

This research was supported in part by Fundação para a Ciência e a Tecnologia, and the projects POSI/42069/CPS/2001, POSI/EIA/60061/2004, and POCTI/MAT/55796/2004.

References
[1] S. Asmussen. Applied Probability and Queues. Springer-Verlag, New York, second edition, 2003.
[2] Y. Baba. The M^X/G/1 queue with finite waiting room. J. Oper. Res. Soc. Japan, 27(3):260-273, 1984.
[3] C. Baier, B. R. Haverkort, H. Hermanns, and J.-P. Katoen. Model-checking algorithms for continuous-time Markov chains. IEEE Trans. Software Eng., 29(6):524-541, 2003.
[4] N. T. J. Bailey. A continuous time treatment of a simple queue using generating functions. J. Roy. Statist. Soc. Ser. B., 16:288-291, 1954.
[5] M. Bladt, B. Meini, M. F. Neuts, and B. Sericola. Distributions of reward functions on continuous-time Markov chains. In G. Latouche and P. Taylor, editors, Matrix-Analytic Methods: Theory and Applications, pages 39-62. World Scientific, River Edge, NJ, 2002.
[6] G. Brière and M. L. Chaudhry. Computational analysis of single-server bulk-arrival queues: GI^X/M/1. Queueing Systems Theory Appl., 2(2):173-185, 1987.
[7] E. Brockmeyer, H. L. Halstrøm, and A. Jensen. The life and works of A. K. Erlang. Trans. Danish Acad. Tech. Sci., 1948(2):277, 1948.
[8] M. J. Cabral Morais. Stochastic Ordering in the Performance Analysis of Quality Control Schemes. PhD thesis, Instituto Superior Técnico, Technical University of Lisbon, Portugal, 2001.
[9] M. L. Chaudhry, M. Agarwal, and J. G. C. Templeton. Computing steady-state queueing-time distributions of single-server queues: GI^X/M/1. Z. Oper. Res., 37(1):13-29, 1993.


[10] M. L. Chaudhry and J. G. C. Templeton. A First Course in Bulk Queues. Wiley, New York, 1983.
[11] B. D. Choi and B. Kim. Sharp results on convergence rates for the distribution of GI/M/1/K queues as K tends to infinity. J. Appl. Probab., 37(4):1010-1019, 2000.
[12] B. D. Choi, B. Kim, and I.-S. Wee. Asymptotic behavior of loss probability in GI/M/1/K queue as K tends to infinity. Queueing Syst. Theory Appl., 36(4):437-442, 2000.
[13] B. D. Choi, Y. C. Kim, Y. W. Shin, and C. E. M. Pearce. The M^X/G/1 queue with queue length dependent service times. J. Appl. Math. Stochastic Anal., 14(4):399-419, 2001.
[14] A. Dukhovny. GI^X/M^Y/1 systems with resident server and generally distributed arrival and service groups. J. Appl. Math. Stochastic Anal., 9(2):159-170, 1996.
[15] A. Dukhovny. Vacations in GI^X/M^Y/1 systems and Riemann boundary value problems. Queueing Systems Theory Appl., 27(3-4):351-366, 1997.
[16] G. Easton, M. L. Chaudhry, and M. J. M. Posner. Some corrected results for the queue GI^X/M/1. European J. Oper. Res., 18(1):131-132, 1984.
[17] G. Easton, M. L. Chaudhry, and M. J. M. Posner. Some numerical results for the queuing system GI^X/M/1. European J. Oper. Res., 18(1):133-135, 1984.
[18] A. Economou. Geometric-form bounds for the GI^X/M/1 queueing system. Probab. Engrg. Inform. Sci., 13(4):509-520, 1999.
[19] A. Economou and D. Fakinos. On the stationary distribution of the GI^X/M^Y/1 queueing system. Stochastic Anal. Appl., 21(3):559-565, 2003.
[20] F. Ferreira and A. Pacheco. Analysis of Pareto/M/s/c queues using uniformization. In T. Dohi, N. Limnios, and S. Osaki, editors, Procs. Second Euro-Japanese Workshop on Stochastic Risk Modelling for Finance, Insurance, Production and Reliability, Chamonix, France, September 18-20, 2002, pages 101-110, 2002.
[21] D. P. Gaver, Jr. Imbedded Markov chain analysis of a waiting-line process in continuous time. Ann. Math. Statist., 30:698-720, 1959.
[22] R. German. Performance Analysis of Communication Systems: Modeling with Non-Markovian Stochastic Petri Nets. Wiley, Chichester, 2000.
[23] W. K. Grassmann. Finding transient solutions in Markovian event systems through randomization. In W. J. Stewart, editor, Numerical Solution of Markov Chains, pages 357-371. Dekker, New York, 1991.
[24] W. K. Grassmann and M. L. Chaudhry. A new method to solve steady state queueing equations. Naval Res. Logist. Quart., 29(3):461-473, 1982.
[25] D. Gross and C. M. Harris. Fundamentals of Queueing Theory. Wiley, New York, third edition, 1998.
[26] D. Gross and D. R. Miller. The randomization technique as a modeling tool and solution procedure for transient Markov processes. Oper. Res., 32(2):343-361, 1984.


[27] D. F. Holman, M. L. Chaudhry, and B. R. K. Kashyap. On the number in the system GI^X/M/∞. Sankhyā Ser. A, 44(2):294-297, 1982.
[28] A. Jensen. Markoff chains as an aid in the study of Markoff processes. Skand. Aktuarietidskr., 36:87-91, 1953.
[29] D. G. Kendall. Some problems in the theory of queues. J. Roy. Statist. Soc. Ser. B., 13:151-173; discussion: 173-185, 1951.
[30] D. G. Kendall. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain. Ann. Math. Statistics, 24:338-354, 1953.
[31] M. Kijima. On the relaxation time for single server queues. J. Oper. Res. Soc. Japan, 32(1):103-111, 1989.
[32] B. Kim and B. D. Choi. Asymptotic analysis and simple approximation of loss probability of the GI^X/M/c/K queue. Performance Evaluation, 54(4):331-356, 2003.
[33] V. G. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman and Hall, London, 1995.
[34] M. Kwiatkowska. Model checking for probability and time: From theory to practice. Invited paper, in Proc. 18th IEEE Symposium on Logic in Computer Science (LICS'03), pages 351-360. IEEE Computer Society Press, 2003.
[35] M. Kwiatkowska, G. Norman, and A. Pacheco. Model checking CSL until formulae with random time bounds. Lecture Notes in Computer Science, 2399:152-168, 2002.
[36] P. V. Laxmi and U. C. Gupta. Analysis of finite-buffer multi-server queues with group arrivals: GI^X/M/c/N. Queueing Systems Theory Appl., 36(1-3):125-140, 2000.
[37] L. Liu, B. R. K. Kashyap, and J. G. C. Templeton. On the GI^X/G/∞ system. J. Appl. Probab., 27(3):671-683, 1990.
[38] L. Liu, B. R. K. Kashyap, and J. G. C. Templeton. Queue lengths in the GI^X/M^R/∞ service system. Queueing Systems Theory Appl., 22(1-2):129-144, 1996.
[39] K. Mosler and M. Scarsini. Stochastic Orders and Applications. Springer-Verlag, Berlin, 1993.
[40] A. Müller and D. Stoyan. Comparison Methods for Stochastic Models and Risks. Wiley, Chichester, 2002.
[41] J. C. W. van Ommeren and R. D. Nobel. On the waiting time distribution in a GI/G/1 queue with a Coxian-2 service time distribution. Statist. Neerlandica, 43(2):85-90, 1989.
[42] S. Resnick. Adventures in Stochastic Processes. Birkhäuser, Boston, MA, 1992.
[43] B. Sericola. Occupation times in Markov processes. Comm. Statist. Stochastic Models, 16(5):479-510, 2000.
[44] M. Shaked and J. G. Shanthikumar. Stochastic Orders and Their Applications. Academic Press, San Diego, 1994.


[45] F. Simonot. Convergence rate for the distributions of GI/M/1/n and M/GI/1/n as n tends to infinity. J. Appl. Probab., 34(4):1049-1060, 1997.
[46] F. Simonot. A comparison of the stationary distributions of GI/M/c/n and GI/M/c. J. Appl. Probab., 35(2):510-515, 1998.
[47] E. de Souza e Silva and R. H. Gail. Calculating cumulative operational time distributions of repairable computer systems. IEEE Trans. Computers, 35(4):302-332, 1986.
[48] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. Wiley, Chichester, 1983.
[49] R. Szekli. Stochastic Ordering and Dependence in Applied Probability. Springer-Verlag, New York, 1995.
[50] N. Tian, D. Q. Zhang, and C. X. Cao. The GI/M/1 queue with exponential vacations. Queueing Systems Theory Appl., 5(4):331-344, 1989.
[51] R. W. Wolff. Stochastic Modeling and the Theory of Queues. Prentice Hall, Englewood Cliffs, NJ, 1989.
[52] D. D. W. Yao, M. L. Chaudhry, and J. G. C. Templeton. A note on some relations in the queue GI^X/M/c. Oper. Res. Lett., 3(1):53-56, 1984.
[53] Y. Q. Zhao. Analysis of the GI^X/M/c model. Queueing Systems Theory Appl., 15(1-4):347-364, 1994.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

The WZ factorization and the use of the BLAS library in the numerical solution of Markov chains
Beata Bylina

Department of Computer Science, Maria Curie-Skłodowska University, Pl. M. Curie-Skłodowskiej 1, 20-031 Lublin, Poland, e-mail: beatas@golem.umcs.lublin.pl

1. Introduction

A system modelled with a Markov process assumes exactly one state (from the whole set) at each time moment. The system evolution is described by the probabilities of transitions from one state to another. In such a model we are usually interested in the probabilities of particular states at a given time moment $t$. In this paper we are interested in continuous-time Markov chains (CTMCs). Given a transition rate matrix $Q$ (or an infinitesimal generator matrix), we get a linear equation system with some constraints:
$$ \pi Q = 0, \qquad \pi \ge 0, \qquad \pi e = 1 \tag{1} $$
(where $\pi$ is a stationary probability vector and $e = (1, \ldots, 1)^T$), which we must solve. The matrix $Q$ is a square $n \times n$ matrix (usually a huge one, but we consider only finite state spaces), of rank $(n-1)$ (in the cases of interest to us), with a dominant diagonal. It is singular and sparse, so we need appropriate approaches to the system (1). There are widely used indirect methods, like iterative or projection methods. However, sometimes we need a direct method (when we need the most accurate results, for example). The most popular of the direct methods are the LU factorization method and its relatives. In this article we want to present another direct method for solving (1), namely the WZ factorization method. We show how to speed up the computations (without changing accuracy) with the use of vector and parallel algorithms of the WZ factorization.

2. The WZ Factorization

In this section we briefly describe the WZ factorization and the method for solving a linear system

    Ax = b,    where A in R^{n x n}, b, x in R^n,    (2)

with the WZ factorization. The WZ factorization is described in [6]: A = WZ, where the matrices W and Z consist of the following columns wi and rows zi^T, respectively (the entry 1 of wi stands in position i):

    wi = (0, ..., 0, 1, w_{i+1,i}, ..., w_{n-i,i}, 0, ..., 0)^T      for i = 1, ..., m,
    wi = (0, ..., 0, 1, 0, ..., 0)^T                                 for i = p, q,
    wi = (0, ..., 0, w_{n-i+2,i}, ..., w_{i-1,i}, 1, 0, ..., 0)^T    for i = q+1, ..., n,
                                                                                     (3)
    zi^T = (0, ..., 0, z_{ii}, ..., z_{i,n-i+1}, 0, ..., 0)          for i = 1, ..., p,
    zi^T = (0, ..., 0, z_{i,n-i+1}, ..., z_{ii}, 0, ..., 0)          for i = p+1, ..., n,

where

    m = floor((n-1)/2),    p = floor((n+1)/2),    q = ceil((n+1)/2).    (4)

After the factorization we can solve two linear systems:

    Wc = b,    Zx = c    (5)

(where c is an auxiliary vector) instead of the one system (2).

3. The Four Approaches

As we noted above (section 1.), we are interested in finding the vector pi from (1). In our case we need to solve a homogeneous linear system with n equations and n unknowns. Such a system has a non-zero solution if and only if the coefficient matrix is singular. The matrix Q has a zero eigenvalue ([11]), so it is singular. Let us assign pi = x^T and transpose (1) to get:

    Q^T x = 0,    x >= 0,    e^T x = 1.    (6)

We want to present four approaches to solving such a linear system (as Stewart did in [11] for the LU factorization). For all of them we assume that the Markov chain represented by Q is irreducible and that Q is of rank (n - 1).
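Before going through the four approaches, here is a minimal Python/NumPy sketch of the WZ factorization itself and of solving Ax = b through the two systems Wc = b, Zx = c. It is an illustration only (not the authors' C implementation): it handles even n, assumes the 2 x 2 pivots stay nonsingular (no pivoting), and uses a generic solver in place of the specialized bow-tie/butterfly substitutions.

```python
import numpy as np

def wz_factor(a):
    """WZ factorization A = W Z (even n, no pivoting): a sketch.

    At step k the rows k and n-1-k act as a pivot pair; the inner rows
    are reduced in columns k and n-1-k simultaneously, so W collects the
    multipliers (a 'bow-tie' matrix) and what is left of A is Z (the
    'butterfly' matrix)."""
    a = np.array(a, dtype=float)          # work on a copy
    n = a.shape[0]
    assert n % 2 == 0, "this sketch handles even n only"
    w = np.eye(n)
    for k in range(n // 2 - 1):
        k2 = n - 1 - k
        # 2x2 system giving the two multipliers of each inner row
        piv = np.array([[a[k, k],  a[k2, k]],
                        [a[k, k2], a[k2, k2]]])
        for i in range(k + 1, k2):
            w[i, [k, k2]] = np.linalg.solve(piv, a[i, [k, k2]])
            a[i, :] -= w[i, k] * a[k, :] + w[i, k2] * a[k2, :]
    return w, a                           # a has been reduced to Z

# Solve A x = b via W c = b, Z x = c (a generic solver stands in for
# the specialized substitutions of the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 10.0 * np.eye(6)   # diagonally dominant
b = rng.standard_normal(6)
W, Z = wz_factor(A)
c = np.linalg.solve(W, b)
x = np.linalg.solve(Z, c)
print(np.allclose(W @ Z, A), np.allclose(A @ x, b))
```

The BLAS-based variants of the paper (BWZ, SWZ) would replace the inner update loop with axpy-style vector operations.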

3.1. Replacing an Equation

The most intuitive approach to the problem is to replace an arbitrary equation of the system with the normalization equation

    e^T x = 1.    (7)

Let Qp be the matrix Q with the pth column replaced with the vector e. Our modified system can be written as

    Qp^T x = ep,    where ep = (delta_ip)_{i=1,...,n}.    (8)

Let Qp^T = WZ. Setting Zx = y in the system WZx = ep we get Wy = ep, from which it is obvious that y = ep. So now we are to solve the system Zx = ep. This approach is likely to yield a result less accurate than the ones presented in sections 3.2. (as was explained in [2]), 3.3. and 3.4..
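A tiny numerical illustration of this approach (a sketch: a generic dense solver stands in for the WZ factorization, and the three-state birth-death chain is an assumed example, not the paper's test model):

```python
import numpy as np

# CTMC: birth-death chain on states {0,1,2}, birth rate 1, death rate 2;
# its stationary distribution is known in closed form: (4/7, 2/7, 1/7).
Q = np.array([[-1.0,  1.0,  0.0],
              [ 2.0, -3.0,  1.0],
              [ 0.0,  2.0, -2.0]])

p = 0                        # index of the equation to replace
A = Q.T.copy()
A[p, :] = 1.0                # the p-th equation becomes e^T x = 1
e_p = np.zeros(3)
e_p[p] = 1.0
x = np.linalg.solve(A, e_p)  # solve the modified system (8)
print(x)                     # ~ (4/7, 2/7, 1/7)
```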
3.2. Removing an Equation

Another approach is to remove an equation. We know that the rank of Q is (n - 1), that is, one of the equations can be written as a linear combination of the other equations. If we drop an equation we get a linear system of (n - 1) equations with n unknowns (plus the normalization equation). In this approach we divide the matrix Q^T into blocks ([11]):

    Q^T = [ B    d ]
          [ c^T  f ]    (9)

where B is a nonsingular matrix of size (n - 1), c and d are (n - 1)-element vectors and f is a real number. Let us assign x_n = 1; now x^T = [x'^T, 1] and our equation (1) becomes

    [ B    d ] [ x' ]
    [ c^T  f ] [ 1  ] = 0    (10)

which we can write as a linear system:

    Bx' + d = 0,
    c^T x' + f = 0.    (11)

Now we can solve the linear system without the last equation, that is, only Bx' = -d. We solve it using the WZ factorization: the matrix B is factorized, B = WZ, and the system

    Wy = -d,
    Zx' = y    (12)

is solved. Now we must normalize the solution vector x^T = [x'^T, 1]. Of course, any equation can be dropped, not only the last one.
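A tiny numerical illustration (a sketch: a generic dense solver stands in for the WZ factorization of B; the three-state birth-death chain is an assumed example, not the paper's model):

```python
import numpy as np

# CTMC generator of a birth-death chain (birth rate 1, death rate 2);
# its stationary distribution is (4/7, 2/7, 1/7).
Q = np.array([[-1.0,  1.0,  0.0],
              [ 2.0, -3.0,  1.0],
              [ 0.0,  2.0, -2.0]])

QT = Q.T
B = QT[:-1, :-1]                   # nonsingular (n-1) x (n-1) block
d = QT[:-1, -1]                    # last column without its last entry
x_prime = np.linalg.solve(B, -d)   # B x' = -d, i.e. system (12) combined
x = np.append(x_prime, 1.0)        # x_n = 1
x /= x.sum()                       # normalize to e^T x = 1
print(x)                           # ~ (4/7, 2/7, 1/7)
```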
3.3. The Zero Determinant

The matrix Q is irreducible, so there exist matrices W and Z as described in Section 2., satisfying the equality Q^T = WZ, and W is not singular (see [6]). We are to solve an equation WZx = 0 with e^T x = 1. To solve it we can solve two linear systems: Wy = 0 and Zx = y. The matrix W is not singular, so the only solution of Wy = 0 is y = 0. We have only one system, Zx = 0, to solve. To solve this system we first solve a two-equation linear system (this is the case for an even n; see [2] for the general case):

    z_pp x_p + z_pq x_q = 0,
    z_qp x_p + z_qq x_q = 0.    (13)

But z_pp z_qq - z_pq z_qp = 0 (it is the 2 x 2 determinant of the last step of the elimination for a matrix of rank (n - 1) with a dominant diagonal). Moreover, the determinants obtained by replacing either column with the (zero) right-hand side are of course also zero, so by Cramer's rule our system (13) has an infinite number of solutions. We choose an arbitrary non-zero real number alpha and set x_p = alpha. Now we can compute the other elements of the vector x. In the case of an odd number n (when p = q) the only difference is that in the system Zx = 0 the pth row of Z has z_pp as its only entry, so x_p appears alone in the pth equation z_pp x_p = 0. There holds z_pp = 0 (it is the 1 x 1 determinant of the last step of the elimination for a matrix of rank (n - 1) with a dominant diagonal), so the equation z_pp x_p = 0 has an infinite number of solutions. We can set x_p = alpha (alpha != 0) and find the other elements of the vector x. After computing the whole vector x we normalize it to satisfy the equality e^T x = 1.
3.4. The Inverse Iteration

The inverse iteration is a method for finding eigenvectors of linear operators. Here it is used to find an eigenvector x of a matrix A:

    Ax = lambda x,

where A is a real or complex square matrix and lambda is an eigenvalue. The inverse iteration is a method to compute an eigenvector associated with a known eigenvalue. Given an approximation lambda' of the eigenvalue and a starting vector x0, the inverse iteration generates a sequence of vectors (xk) being the solutions of the systems

    (A - lambda' I) xk = sk xk-1    for k >= 1,

where sk is responsible for the normalization of xk; usually it is chosen to make ||xk|| = 1 with respect to a suitable norm. If everything goes well, the sequence (xk) converges to the eigenvector associated with the eigenvalue lambda, especially when lambda is the eigenvalue closest to lambda' and when the vector x0 is a linear combination of the eigenvectors (which is almost sure). Let A = Q^T. Now lambda = 0 is an eigenvalue of A, so we can write (with sk = 1):

    (Q^T - 0 I) xk = xk-1

and then:

    Q^T xk = xk-1.

For k = 1 we get:

    Q^T x1 = x0.

As a starting vector we can choose x0 = ep, and then we are to solve the system Q^T x1 = ep, which we can do by the WZ factorization. Let Q^T = WZ, so we get WZx1 = ep, and then we are to solve Wy = ep, Zx1 = y. It is obvious that the solution of the equation Wy = ep is y = ep, so we are only to solve

    Zx1 = ep.    (14)

First, let us consider an even n. Written out, the system (14) has the butterfly matrix Z on the left and the vector ep (with 1 in position p) on the right; its central rows reduce to the 2 x 2 system with the matrix

    [ z_pp  z_pq ]
    [ z_qp  z_qq ]    (15)

and the right-hand side (1, 0)^T. The determinant z_pp z_qq - z_pq z_qp = 0 (because Q^T was singular and diagonally dominant), so our linear system is contradictory. But we can try to solve a system very similar to (15), with z_pp z_qq - z_pq z_qp equal to the minimal positive real number on the given machine. Such an approach gives very accurate results, which was explained in [12] and which is shown in the numerical experiments described in this paper. For an odd n the pth equation reduces to z_pp x_p = 1, and z_pp = 0; here we solve the system with z_pp replaced with the minimal positive real number. In both cases we compute only x1, because the next approximations do not give better results [11, 12]. In the end we normalize the solution to fulfill the equation e^T x = 1.

4. Numerical Experiments

The algorithms were implemented for single precision numbers with the use of the language C. The programs were compiled with the icc compiler [9] with the maximal optimization option (-O3). The programs were tested under the Linux operating system on a computer with two Pentium III 733 MHz processors (SMP). The algorithms were tested on matrices generated for a leaky bucket model with tokens (with various parameters) [7]. In a direct method of solving linear systems, when we eliminate non-zero elements of the matrix in the reduction phase, the overall number of non-zero elements can grow, especially when the matrix is sparse. That is why we must consider various methods of storing sparse matrices. In our experiments the transition rate matrix Q was stored either in a traditional two-dimensional array with a lot of zeroes [3] (this variant is called WZ) or in the Harwell-Boeing sparse matrix storage scheme [4] (called SWZ). In the Harwell-Boeing storage scheme a sparse matrix is stored in three one-dimensional arrays. The arrays consist of:

the values of the non-zero elements; the column indices; the beginnings of the rows.

To speed up the computations we used modern programming techniques: vectorization and parallelism.
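A minimal sketch of this three-array scheme and of a matrix-vector product computed directly on it (the function names are illustrative, not the paper's code):

```python
import numpy as np

def to_compact(dense):
    """Pack a matrix into three arrays: values, column indices, row starts."""
    vals, cols, starts = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                vals.append(v)
                cols.append(j)
        starts.append(len(vals))        # beginning of the next row
    return np.array(vals), np.array(cols), np.array(starts)

def compact_matvec(vals, cols, starts, x):
    """y = A x using only the three arrays: zeros are never touched."""
    y = np.zeros(len(starts) - 1)
    for i in range(len(y)):
        for k in range(starts[i], starts[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

A = np.array([[-2.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 2.0,  0.0, -2.0]])
vals, cols, starts = to_compact(A)
x = np.array([1.0, 2.0, 3.0])
print(compact_matvec(vals, cols, starts, x))   # same result as A @ x
```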
4.1. The Vectorized Implementation

For the dense representation, in our implementation of the WZ factorization we use the BLAS library (Basic Linear Algebra Subroutines, [8]). BLAS is a standard of prototypes of functions (in C) or procedures (in Fortran) which implement basic operations of linear algebra. The mkl (Intel's Math Kernel Library) was used for both BLAS1 and BLAS2. We use the following BLAS functions:

    axpy (y <- y + alpha x; from BLAS1),
    scal (y <- alpha y; from BLAS1),
    syr (A <- A + alpha x y^T; from BLAS2).

This algorithm with BLAS, called BWZ, is described in [5], [1]. For the compact representation we use the SBLAS (Sparse BLAS). SBLAS is a library written in Fortran 77 as a BLAS1 extension for sparse vectors. When the vector b is updated, four auxiliary sparse vectors are created which consist of non-zero elements of the matrix Q. Next, the vector b is updated with the daxpyi (y <- y + alpha x, where x is a sparse vector, y a dense vector and alpha a real number) subroutine. This algorithm with SBLAS, called BSWZ, is described in [4].
4.2. The Parallelized Implementation

There are various kinds of parallel machines. We used an SMP (symmetric multiprocessor) computer with two processors. For both our representations we parallelized finding the columns of the matrix W and finding the vector b. The algorithms were parallelized with the use of the OpenMP standard [10] (the parallelized versions of the algorithms are denoted with an additional letter P in front of the respective abbreviation). These algorithms with OpenMP, called respectively PWZ and PSWZ, are described in [4]. Our results are presented in table 1 (computation times in seconds) and in table 2 (residual norms ||Q^T x||2 of the computed solution vectors).


    method              size   WZ      BWZ    PWZ     SWZ    BSWZ   PSWZ
    replacing           1166   10.61   3.26   10.56   3249   3257   2908
                        2101   107.77  16.28  106.79  25976  25892  24037
                        2850   610.04  38.1   598.33  79322  79316  73985
    removing            1166   10.78   1.02   10.7    511    507    400
                        2101   102.22  5.3    103.5   4301   4273   3809
                        2850   625.2   12.98  604.31  13128  13254  11881
    zero determinant    1166   10.6    0.86   10.48   506    494    399
                        2101   104.77  5.61   105.11  4334   4313   3825
                        2850   628.76  12.85  628.76  13251  13121  11970
    inverse iteration   1166   10.39   1.92   10.29   512    499    395
                        2101   303.6   10.35  104.32  4355   4320   3817
                        2850   962.27  23.82  590.91  13206  13001  11948

Table 1. The computation times (in seconds) of the algorithms

    method              size   WZ        BWZ       PWZ       SWZ       BSWZ      PSWZ
    replacing           1166   3.19e-07  4.50e-07  3.20e-07  1.38e-05  1.44e-05  1.38e-05
                        2101   1.57e-06  1.67e-05  1.57e-06  1.18e-05  1.10e-05  1.18e-05
                        2850   1.37e-06  1.35e-06  1.37e-06  9.44e-06  1.-5e-05  9.45e-06
    removing            1166   2.92e-08  3.50e-08  2.92e-08  2.92e-08  2.96e-08  2.92e-08
                        2101   3.87e-08  4.54e-08  3.88e-08  3.70e-08  3.70e-08  3.70e-08
                        2850   4.91e-08  5.48e-08  4.91e-08  4.69e-08  4.70e-08  4.70e-08
    zero determinant    1166   2.97e-08  3.58e-08  2.96e-08  3.06e-08  3.07e-08  3.06e-08
                        2101   3.52e-08  4.23e-08  3.52e-08  3.31e-08  3.31e-08  3.31e-08
                        2850   4.81e-08  5.42e-08  4.81e-08  4.64e-08  4.67e-08  4.64e-08
    inverse iteration   1166   2.89e-08  2.19e-08  2.89e-08  2.84e-08  2.84e-08  2.84e-08
                        2101   2.46e-08  1.62e-08  2.46e-08  3.47e-08  3.44e-08  3.45e-08
                        2850   4.82e-08  2.36e-08  4.82e-08  4.77e-08  4.77e-08  4.77e-08

Table 2. The residual norms ||Q^T x||2 of the algorithms


5. Brief Summary

The approaches presented above are an alternative to the traditional LU factorization. The numerical experiments showed that: the use of the SBLAS library speeds the performance up a little without changing the accuracy (which goes down as the matrix size grows); the parallelization for the sparse representation speeds the performance up by about 20% without changing the accuracy, but the execution time for the compact storage remains very long and the implementation requires improvement; for the dense representation the use of vectorization (with the BLAS) speeds the performance up 20 times (with constant accuracy), while the parallelism changes neither the performance nor the accuracy. For all implementations and representations, replacing an equation was the slowest and the least accurate method. The other three methods are similar in performance and accuracy.

References

[1] Bylina, B.: Wykorzystanie biblioteki BLAS do numerycznego rozwiązywania modeli sieci optycznych. Algorytmy, metody i programy naukowe, Lublin 2004, pp. 11-19 (in Polish).
[2] Bylina, B., Bylina, J.: Solving Markov chains with the WZ factorization for modelling networks. Proceedings of Aplimat 2004, Bratislava, pp. 307-312.
[3] Bylina, B., Bylina, J.: The Vectorized and Parallelized Solving of Markovian Models for Optical Networks. Lecture Notes in Computer Science 3037, Springer-Verlag, Berlin Heidelberg 2004, pp. 578-581.
[4] Bylina, B., Bylina, J., Czachórski, T., Domańska, J.: Wykorzystanie równoległego i wektorowego rozkładu WZ w markowowskich modelach sieci komputerowych. Współczesne problemy sieci komputerowych: nowe technologie, WNT 2004, pp. 277-286 (in Polish).
[5] Bylina, B., Bylina, J., Stpiczyński, P.: Blokowo-punktowy rozkład WZ macierzy. Obliczenia naukowe: wybrane problemy, Lublin 2003, pp. 123-130 (in Polish).
[6] Chandra Sekhara Rao, S.: Existence and uniqueness of WZ factorization. Parallel Computing 23 (1997), pp. 1129-1139.
[7] Domańska, J., Czachórski, T.: Wpływ samopodobnej natury ruchu na zachowanie mechanizmu cieknącego wiadra. Studia Informatica 2A(53) (2003), pp. 199-212 (in Polish).
[8] http://www.netlib.org/blas/
[9] Intel C/C++ Compiler for Linux Release Notes, v. 6.01.
[10] OpenMP C and C++ Application Program Interface, v. 2.0 (2002).
[11] Stewart, W.: Introduction to the Numerical Solution of Markov Chains. Princeton University Press, Princeton, NJ (1994).
[12] Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, England, 1965.


A distributed approach to solve large Markov chains


Jaroslaw Bylina
Department of Computer Science, Marie Curie-Sklodowska University, Pl. M. Curie-Sklodowskiej 1, 20-031 Lublin, Poland, e-mail: jmbylina@hektor.umcs.lublin.pl

1. Introduction

Queueing networks have been widely used for the modelling and analysis of real computer networks and many other kinds of networks (communication networks, for example). In general it is necessary to adopt a strictly numerical approach to queueing networks. It is always possible to obtain a Markov chain for any queueing network: we can approximate arbitrarily closely any probability distribution with a phase-type representation [5], like the Cox or Erlang distributions. Additionally, we can include features such as priority queueing, blocking etc. in the Markov chain representation. However, in general, when a complicated network behaviour is to be represented by a Markov chain, the size of the state space becomes huge very fast, and there are problems with the space and time complexity of the algorithms for solving such huge linear systems.

A distributed algorithm for solving the huge and sparse linear systems that appear in the Markov chain analysis of queueing network models was presented in [2] (and in this paper, in sections 3.-4.). That algorithm requires the coefficient matrix Q of the linear system to be distributed (nearly) evenly among the machines before the start of the computations. One solution to that problem is to generate the matrix Q on one computer and then distribute it among the others. However, this approach is rather time-consuming, because we use only one machine of the whole cluster and the distribution after the matrix generation is expensive from the communicational point of view; it is also space-consuming, because we try to store the whole huge matrix on one machine, which can be expensive or even impossible (one of the advantages of the algorithm described in [2] was the starting distribution of the matrix Q, to use the space on the machines in the best manner). We achieve better results when we design a distributed algorithm for generating the matrix Q, after which there will be a respective part of the matrix on each of the machines. In our algorithm each computer is responsible for generating its own part of the matrix. Of course, there is some communication, but it is reduced to the minimum. Such an algorithm is presented in [3] (and in this paper, in sections 5.-6.).

Fig. 1. A scheme of a queueing network example

2. Queueing Networks and Markov Chains

To start investigating the behaviour of a queueing network as a Markov chain we have to choose a representation for the set of states. The most common manner is to represent each system state as a vector whose components completely describe the states of all elements of the queueing network. For the queueing network from figure 1, with a constant number N of customers and with an exponential (that is, memoryless) distribution of the service times, we can describe the system state with a four-element vector (a, b, c, d), where a, b, c and d are the numbers of customers waiting and being served in the service stations A, B, C and D, respectively, and where a + b + c + d = N. But, for example, for elements with a Cox or Erlang distribution of service time we need two numbers: one for the number of customers and a second for the state (phase) of the server. Further, for example, we may require more parameters if there may be more classes of customers in the queue, and so on.

Next we have to enumerate all potential transitions among states and define for them the transition rates q_ij (independent of time for a homogeneous Markov chain, the case most interesting for us):

    q_ij = lim_{dt -> 0} p_ij(dt)/dt    for i != j,
    q_ii = - sum_{j != i} q_ij,

where p_ij(dt) is the probability that, if the system is in the state i, a transition to the state j occurs within the time dt. This is how the transition rate matrix Q = (q_ij) is created. These steps are realised by the algorithm from [3]. The last step in the analysis (realised by the algorithm from [2]) is to compute the probability vector pi(t), whose components pi_i(t) are the probabilities that the system is in the state i at the time moment t, or to compute the long-run (independent of time) probability vector pi = (pi_1, . . . , pi_n) from:

    pi Q = 0^T,    pi >= 0^T,    sum_i pi_i = 1.

We are interested in the latter one. For convenience we assign x = pi^T and our problem is to find x = (x_1, . . . , x_n) from:

    Q^T x = 0,    x >= 0,    sum_i x_i = 1.

3. The Iterative GMRES Algorithm

For solving such an equation we chose one of the projection methods, namely the iterative GMRES algorithm (see figure 2). One of the advantages of this method is that it generates no fill-in (because the matrix Q is used only in the matrix-vector multiplication); another is the fast convergence rate. The iterative GMRES algorithm is also convenient to vectorize [4] and parallelize (see the next section and [2]).

1. choose x0 and m
2. r0 <- Q^T x0
3. beta <- ||r0||2
4. v1 <- r0 / beta
5. for j <- 1, . . . , m:
   (a) w <- Q^T vj
   (b) for i <- 1, . . . , j:
       i. h_ij <- vi^T w
       ii. w <- w - h_ij vi
   (c) h_{j+1,j} <- ||w||2
   (d) v_{j+1} <- w / h_{j+1,j}
6. find y = (y1, . . . , ym) minimizing ||beta e1 - Hy||2
7. x0 <- x0 + sum_{i=1..m} vi yi
8. if the result is insufficiently accurate then go to 2

Fig. 2. The iterative GMRES algorithm

If we want to use the iterative GMRES algorithm to solve Markov chains representing the behaviour of a queueing network, we should expect a matrix (and, of course, vectors) of a huge size. To store such a huge and sparse matrix in an efficient manner we use three one-dimensional arrays (the Harwell-Boeing storage scheme). The first of them (the only array with real items) stores only the nonzero elements of the matrix. These elements are stored by rows of Q^T, but the elements do not need to be in any particular order within a row; it is only necessary that all elements of the row i come before all elements of the row (i + 1). The second array contains, for each nonzero element, its column index. The third array holds only one number for each row of the matrix Q^T: its starting index in the first and second arrays.

4. The Distributed Implementation

In our distributed implementation of the iterative GMRES we used the MPI (Message-Passing Interface, [1, 6]) standard that allows writing distributed programs relatively easily. The program was written in the language C and compiled with the gcc compiler under Linux. To achieve the best size performance we decided to divide the matrix Q^T (n/k consecutive rows for each node, where n is the size of the matrix and k is the number of nodes) and the vectors r0, x0, w and vi, i = 1, . . . , m + 1 (n/k consecutive items of each vector for each node) evenly among the nodes. All operations on the vectors (such as scaling, adding, multiplying, computing their norm) are computed locally by each node on that node's part of the vector(s). Then the partial result is exchanged with the other nodes (only if it is necessary, as for computing the norm of a vector or the scalar product of two vectors). To multiply the matrix Q^T by a vector we have to gather all components of the vector at the node (so we need one additional auxiliary full-sized vector in each node) before multiplying, but we do not need to exchange the elements of the product, because each node holds its own part of the vector after multiplying.
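For reference, here is a compact single-machine sketch of the restarted GMRES(m) loop of figure 2, written for a generic system Ax = b, with dense NumPy operations standing in for the distributed, Harwell-Boeing-based kernels (an illustration, not the paper's MPI/C program):

```python
import numpy as np

def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=100):
    """Restarted GMRES(m): Arnoldi loop plus a small least-squares
    problem min ||beta e1 - H y||, as in figure 2."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        j_max = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):            # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:           # happy breakdown
                j_max = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j_max + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j_max + 1, :j_max], e1, rcond=None)
        x = x + V[:, :j_max] @ y
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30)) + 30.0 * np.eye(30)  # well-conditioned
b = rng.standard_normal(30)
x = gmres_restarted(A, b)
print(np.linalg.norm(A @ x - b))
```

In the distributed version, each matrix-vector product and each norm or scalar product in this loop becomes a local computation followed by the exchanges described above.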
To see how much memory each node needs for the algorithm, let us assume that nz is the number of nonzero elements of the matrix Q^T. Each of the machines stores:

for the matrix Q: nz/k real numbers and (nz + n)/k integer numbers;
for each of the vectors r0, x0, vi, w: n/k real numbers;
for the auxiliary vector: n real numbers.

So each node has to store (nz + (3 + m)n)/k + n real numbers and (nz + n)/k integer numbers.

5. The Generation of the Transition Rate Matrix

However, to exploit well the distributed GMRES algorithm (or any other distributed algorithm in which the matrix Q is used only for the matrix-vector multiplication) we should efficiently distribute the matrix Q before the computations. The best way to do that is not to distribute it at all, but to generate every part of the matrix on its own machine. We are interested in one of the most general methods: the Breadth-First Search (BFS) algorithm, which allows us to enumerate all states, to number them, to enumerate all potential transitions among states and to compute the transition rates. The BFS algorithm is an algorithm to traverse all the edges of a directed graph. We start with a list containing a single (arbitrary) vertex (that is, a state vector) and we investigate all edges with the given vertex as a starting point. For each of these edges we add its ending point to the list, but only if it is not in the list yet. Next we go to the next vertex in the list, and so on, while there are vertices in the list not yet traversed. The algorithm described above can be used to generate the transition rate matrix for a queueing model. In this case it does not search a graph but generates the transition graph (such a graph, for the queueing network from figure 1 with N = 2, is presented in figure 3) and the transition rates. Such a version of the BFS algorithm is presented in figure 4.

6. The Distributed Generation

To adapt the algorithm presented above to work on a network of computers we have to pick one of the machines as a master. The others will be slaves. The slaves are the machines which generate their own parts of the matrix Q (and they will participate in computing the probability vector with the use of the algorithm from sections 3.-4. [2, 3]; the master will not participate in this second part). The master controls the actions of the slaves, receives some data from them, maintains the global data and sends the slaves some information. The idea of the algorithm is as follows.

[Figure 3 shows the transition graph of the queueing network of figure 1 for N = 2; its vertices are the 10 states (2,0,0,0), (1,1,0,0), (1,0,1,0), (1,0,0,1), (0,2,0,0), (0,1,1,0), (0,1,0,1), (0,0,2,0), (0,0,1,1), (0,0,0,2).]

Fig. 3. A transition graph of a queueing network

1. Initialize L as a numbered list that contains only one, arbitrary state vector w (a graph vertex).
2. For each vector v representing a state that is allowed as a next state after the state w:
   (a) if v is not a member of the list L, then attach v to L;
   (b) compute the transition rate from w to v and remember it as Q[ind(w, L), ind(v, L)] (where ind(a, B) is the index of a in the list B).
3. If w is not the last element of L, then assign the next element of L to w and go to 2.

Fig. 4. The BFS generation algorithm
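The generation step of figure 4 can be sketched in Python for a closed network of M stations and N customers, where a customer moves from station i to station j at an illustrative rate rate[i][j] whenever station i is non-empty (the routing rates here are hypothetical, not taken from the paper); for 4 stations and N = 2 it enumerates exactly the 10 states of figure 3:

```python
import numpy as np
from collections import deque

def generate_Q(n_stations, N, rate):
    """BFS over the transition graph: enumerate the states, number them,
    and fill the transition rate matrix Q."""
    start = (N,) + (0,) * (n_stations - 1)
    index = {start: 0}                 # the numbered list L
    todo = deque([start])
    trans = {}                         # (from state no., to state no.) -> rate
    while todo:
        s = todo.popleft()
        for i in range(n_stations):
            if s[i] == 0:
                continue               # station i has no customer to move
            for j in range(n_stations):
                if i == j or rate[i][j] == 0.0:
                    continue
                t = list(s)
                t[i] -= 1
                t[j] += 1
                t = tuple(t)
                if t not in index:     # attach the new state to the list
                    index[t] = len(index)
                    todo.append(t)
                trans[(index[s], index[t])] = rate[i][j]
    n = len(index)
    Q = np.zeros((n, n))
    for (a, b), r in trans.items():
        Q[a, b] = r
    np.fill_diagonal(Q, -Q.sum(axis=1))    # q_ii = - sum of the row
    return index, Q

# illustrative routing: every station feeds every other at rate 1
rate = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
index, Q = generate_Q(4, 2, rate)
print(len(index))                      # 10 states, as in figure 3
```

In the distributed variant of figure 5, the `if t not in index` test is replaced by the pool-membership check, and foreign states go to the auxiliary list X instead.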


1. Initialize L as a numbered list that contains only one state vector w (a graph vertex, received from the master).
2. Initialize X as an empty list.
3. For each vector v representing a state that is allowed as a next state after the state w:
   (a) if v is not a member of your vector pool and is not a member of the list X, then attach v to X; else, if v is a member of your vector pool and is not a member of the list L, then attach v to L;
   (b) compute the transition rate from w to v and remember it as QL[ind(w, L), ind(v, L)] or as QX[ind(w, L), ind(v, X)], depending on the list in which v is (where ind(a, B) is the index of a in the list B).
4. If w is not the last element of L, then assign the next element of L to w and go to 3.
5. Send the elements of X (that have not been sent yet) to the master.
6. Wait for the answer from the master.
7. If the answer is not the ending signal, then attach the received elements (state vectors generated by other slaves and received by the master) to L and go to 4.
8. Together with the ending signal the master sends the data (mainly sizes and indices) that the slave uses to compose its own part of the matrix Q from QL and QX.

Fig. 5. The distributed BFS generation algorithm (slave)

152

First, the master sends a state vector to each slave, but different vectors to different slaves. Moreover, each slave has to be sent a vector belonging to a different vector pool (see below). Next, each slave generates the transition rates as described in section 5., but only for the transitions from the states of the slave's own vector pool. If the slave meets a state not from its vector pool, then the state is attached to an auxiliary list, which is sent to the master after the whole process. The master tracks the entire generated state space, and while there exists a state not yet processed by the slave owning the state's vector pool, the master sends it to that slave and orders it to start over with this state as a starting state. The slave algorithm is presented in figure 5.

The vector pool is a key idea of this algorithm. The vector pools are subsets of the state space; they are pairwise disjoint and their union is the whole state space. Moreover, the pools should be of almost the same size, because they are assigned to slaves one-to-one and the algorithm described in [2] works best under this condition. The problem is that we have to know the division into the pools before our distributed algorithm starts. The method to solve this problem is to describe each of the vector pools with simple conditions, which can be distributed among the slaves before the actual algorithm starts. Exemplary conditions for our graph from figure 3 could be "the first element of the vector is zero" / "the first element of the vector is not zero" (one pool of 6 and the other of 4 elements), or "one of the vector elements is 2" / "no vector element is 2 and the first one is zero" / "no vector element is 2 and the first one is not zero" (4/3/3). The slaves can easily check with such conditions whether the generated state vectors are in their pools, but on the other hand we should choose these conditions very carefully to minimize the differences between the sizes of the vector pools.

7. Conclusion

We presented two algorithms which together can be used to solve large Markov chains (namely, to generate the Markov chain's infinitesimal generator Q and to solve the equation pi Q = 0^T), exploiting the potential of distributed systems. The algorithms are partially implemented and tested. There is an open (and not simple) question of how the vector pools can be automatically chosen by the system when only the queueing model (and not the matrix Q) is known.

References

[1] B. Bylina: Komunikacja w MPI. Informatyka Stosowana S2/2001, V Lubelskie Akademickie Forum Informatyczne, Kazimierz Dolny, 17-18 maja 2001, pp. 31-40 (in Polish).
[2] J. Bylina: Distributed solving of Markov chains for computer network models. Annales UMCS Informatica 1 (2003), Lublin 2003, pp. 15-20.
[3] J. Bylina: Distributed generation of Markov chains infinitesimal generator matrices for modelling networks (submitted to Annales UMCS Informatica).
[4] J. Bylina, B. Bylina: GMRES dla rozwiązywania łańcuchów Markowa na komputerze wektorowym CRAY SV1. Algorytmy, metody i programy naukowe, Polskie Towarzystwo Informatyczne, Lublin 2004, pp. 19-24 (in Polish).
[5] M.F. Neuts: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore, 1981.
[6] P.S. Pacheco: A User's Guide to MPI. University of San Francisco, 1998.



A tool to solve Markov chain models

PIOTR PECKA

Institute of Theoretical and Applied Informatics, ul. Baltycka 5, 44-100 Gliwice, Poland, e-mail: piotr@iitis.gliwice.pl

Abstract. The article presents a class library written in C++ which provides objects for building and solving queuing networks for complex Markovian queuing models of computer and communication systems. Continuous time Markov chains are considered. The algorithms applied in it are based on quick searching structures, which allows us to solve Markovian models consisting of about 10 million states on one single workstation. Some examples illustrate the application of the package. Transient states may also be modelled: in this case a system of differential equations is solved. In both cases approximate numerical techniques are based on projection methods. Keywords: Markov chains, Breadth First Searching (BFS) algorithm, Arnoldi algorithm, transition matrix, state vector, matrix generation technique.

1. Introduction
Markov chains are commonly used in computer performance analysis. In the past decade, considerable advances have been made in numerical solution techniques, methods of automated state-space generation, and the availability of software packages. Markov chains represent an analytic approach to obtaining performance statistics for a system. A Markov chain consists of a set of states and a set of labeled transitions between the states. Markov chains are flexible enough to model complex concurrent systems and can consider such phenomena as blocking, synchronizations, state-dependent routing etc. Markov models may be formulated using high-level modeling formalisms such as Queuing Networks (QN) or Generalized Stochastic Petri Nets (GSPN) and may be automatically derived in the form of a transition matrix.

The main problem associated with Markov modeling is the very large size of the transition matrix (the so-called state explosion problem) and the numerical complexity of solving the system of linear equations defined by this matrix. This paper presents the OLYMP (Object Oriented LibrarY for modeling Markovian Processes [13]) class library that provides objects to build and to solve queuing networks for complex Markovian queuing models of computer and communication systems. The algorithms applied in the library are based on quick searching structures and can partially break the limitations mentioned above. They lead to the generation of the transition matrix for a given model. The matrix defines linear equations for steady-state analysis or differential equations for transient analysis. As a numerical method for computing the probability vector we propose the modified Arnoldi algorithm (it represents one of the projection methods) with Pade approximation, which is well suited for transient states [1, 14, 15].

There are very few tools for Markov analysis. We know three of them: USENUM [2], developed at the University of Dortmund, MARCA [8], developed at the University of Carolina, and DNAmaca, developed at the Cape Town University [9]. The first two solvers are sufficient for models up to about 100 000 states. The third one is very powerful: it is able to solve a system of millions of states in a short period of time on a single workstation with 128 MB RAM. It uses the GSPN (Generalized Stochastic Petri Nets) modeling formalism to describe a model and a probabilistic algorithm based on two hash functions to determine the reachable states. This article presents a different approach: a formalism for describing a model based on QN (Queuing Networks), and a deterministic algorithm based on different types of binary trees used to generate the state space.

2. Description of the state space exploration algorithm


The following problems appear in the transition matrix generation algorithm:
1. description of the given model;
2. definition of the scalable state vector;
3. definition of the functions which operate on network elements;
4. searching for the next states, for each reachable state, and deriving the transition rates;
5. determining all reachable states and automatic state enumeration.

Ad. 1 The model is described by objects defined in the class library, written in C++. A user may extend this library by defining new objects for other network elements.
Ad. 2 A state descriptor vector consists of discrete components which describe a state of the system. For example, a simple closed queuing network consisting of three queues, each a simple station with exponential service, is described by a three-component state vector (n1, n2, n3). The first component gives the number of tasks in the first queue, the second one the number of tasks in the second queue, etc. The total number of tasks in this network is the sum of all components. The definition of the state vector becomes more complex when we add other objects to the model: appending another simple queue with exponential service extends the vector by one item, (n1, n2, n3, n4); a station with Erlang phases adds a more complex component (a sub-vector), (n1, n2, n3, (n4, f)). Here the fourth component consists of two items: the first is the number of tasks (clients), the second is the number of the Erlang phase. The state vector is most complex when we consider a system with FIFO queues which recognize several classes of clients, since in this case we have to keep the order of tasks in the queue. For our example (five tasks inside the network: 3 tasks of class one and 2 of class two), the vector ((1,2,1), (1,2), 0) corresponds to the following situation: in the first queue a task of class one is being served (read the sub-vectors of the state vector from right to left), the next task in the queue is of class two, and the last one (at the end of the queue) is of class one; in the second queue a class-two task is served and a class-one task is next; the third queue is empty. The form of the state vector in this case causes an explosion of states in the Markov chain: the permutations of tasks of different classes in the queues increase the number of states.

Ad. 3 Each object in the class library representing a network element inherits from the abstract class MarkNetItem, which provides two functions: SndTransition and RcvTransition. These two functions change the sub-vectors (components of the global state vector) of an object during matrix generation, as tasks migrate inside a network.
SndTransition: this function is invoked by the main procedure of the matrix generation solver; it tries to modify (if possible) the sub-vector of the object (network element) that invokes it, due to an internal or external transition of a task; when the task is sent to another element of the network (external transition), it modifies the remote object using its RcvTransition function.
RcvTransition: this function is invoked by SndTransition; it returns a copy of its sub-vector, modified by the arriving task. When the object cannot accept the task, it refuses by returning a NULL tag.
When a user of the library wants to add a new object, he has to redefine the two functions mentioned above. More details can be found in [13].

Ad. 4 The following states appear when clients move between elements of a network. Let us consider a ring connection of the stations from the example in Ad. 2: station 1 is connected to 2 and 3, station 2 is connected to 1 and 3, and station 3 is connected to 1 and 2. Assume that the total number of tasks inside the network equals 2; the following states are reachable from the beginning state (1, 1, 0):
(0, 2, 0): the task moves from station 1 to station 2;
(0, 1, 1): the task moves from station 1 to station 3;
(2, 0, 0): the task moves from station 2 to station 1;
(1, 0, 1): the task moves from station 2 to station 3.
All the reachable states of the model form the rows of the transition matrix; the following states of each reachable state form the columns. The transition matrix for this model:
            2;0;0       1;1;0       0;1;1       1;0;1       0;2;0       0;0;2
2;0;0      -μ0         0.5μ0       0           0.5μ0       0           0
1;1;0       0.5μ1     -(μ0+μ1)     0.5μ0       0.5μ1       0.5μ0       0
0;1;1       0           0.5μ2     -(μ1+μ2)     0.5μ1       0.5μ2       0.5μ1
1;0;1       0.5μ2       0.5μ2      0.5μ0      -(μ0+μ2)     0           0.5μ0
0;2;0       0           0.5μ1      0.5μ1       0          -μ1          0
0;0;2       0           0          0.5μ2       0.5μ2       0          -μ2
The elements of the matrix (transition rates) are derived from the mean service rates μ0, μ1, μ2 of the stations. When there is no transition from the current reachable state to a given following state, we put zero in the column that corresponds to the disabled state. The diagonal of the matrix is formed by the negated sum of the remaining elements in each row.
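The successor ("following state") generation described in Ad. 4 can be sketched in C++ in the spirit of the library; the names State and successors and the uniform 0.5 routing probabilities are our illustrative assumptions, not the actual OLYMP API:

```cpp
#include <array>
#include <utility>
#include <vector>

// Successor generation for the 3-station ring of Ad. 4: a station i that
// holds at least one task routes a completed task to either of the other
// two stations with probability 0.5, giving a transition at rate 0.5*mu[i].
using State = std::array<int, 3>;

std::vector<std::pair<State, double>> successors(const State& s,
                                                 const double mu[3]) {
    std::vector<std::pair<State, double>> out;
    for (int i = 0; i < 3; ++i) {
        if (s[i] == 0) continue;           // station i is idle
        for (int j = 0; j < 3; ++j) {
            if (j == i) continue;          // the task moves to another station
            State t = s;
            --t[i];
            ++t[j];
            out.push_back({t, 0.5 * mu[i]});
        }
    }
    return out;
}
```

For the beginning state (1, 1, 0) this yields exactly the four following states listed above; the returned rates are the off-diagonal entries of the corresponding matrix row.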

The Markov solver looks for the following states by invoking the SndTransition function in the objects representing the stations.

Ad. 5 The main problem in the process of generating the transition matrix is to determine all reachable states. The BFS method for generating the transition matrix of a Markov model was first implemented in [8]. We propose a modification of this algorithm, replacing the linear list used for searching states by different types of binary trees and a hash function: BT (addressed by vector), DBT (disk implementation of BT), MBT (addressed by bytes), OBT (addressed by bits), DBM (hash function implemented on disk [16]). The new BFS algorithm in object code (C++):
 1  State s(2,0,0), x;
 2  Queue q;
 3  BTree bt;
 4  q.Append(s);
 5  bt.Append(s);
 6  while(1) {
 7    while((x = s.Successor()) != NULL) {
 8      if(!bt.In(x)) {
 9        bt.Append(x);
10        q.Append(x);
        }
11      unsigned int state_number = x.GetStNum();
      }
12    if((s = q.Get()) == NULL) break;
    }

Object s (line 1) defines the state vector with initial setting (2,0,0); object q (line 2) is a FIFO queue for storing newly generated reachable states; object bt (line 3) defines a binary tree. Before the main loop (line 6), the beginning state is stored in the queue and in the tree (lines 4, 5). In the inner loop (line 7) the new following states of state s are generated and stored in x. If state x is already in the tree (line 8), its state number is simply retrieved; otherwise (lines 8, 9, 10) the new state x is appended to the queue and to the tree, and a state number is assigned from the counter inside the bt object. In line 11 the state number is accessible via the GetStNum function. When the queue is empty (line 12), the generating process terminates; if it is not, the next reachable state is taken from the queue (function Get) and the next row of the matrix is generated. This method requires more memory and processor time than the previous one, but it is more efficient for complex models.
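The BFS loop above can be mirrored with standard containers; in this sketch std::map plays the role of the BTree and std::queue the role of the Queue object (illustrative only, not the library code):

```cpp
#include <array>
#include <map>
#include <queue>

// BFS enumeration of all reachable states of the 3-station ring:
// every state maps to a consecutive state number, exactly as the
// counter inside the bt object assigns numbers in the listing above.
using RingState = std::array<int, 3>;

int countReachableStates(RingState start) {
    std::map<RingState, int> numbered;   // plays the role of the binary tree
    std::queue<RingState> frontier;      // FIFO queue of unexplored states
    numbered[start] = 0;
    frontier.push(start);
    while (!frontier.empty()) {
        RingState s = frontier.front();
        frontier.pop();
        for (int i = 0; i < 3; ++i) {    // generate the following states
            if (s[i] == 0) continue;
            for (int j = 0; j < 3; ++j) {
                if (j == i) continue;
                RingState t = s;
                --t[i];
                ++t[j];
                if (numbered.count(t) == 0) {
                    int id = static_cast<int>(numbered.size());
                    numbered[t] = id;    // a new reachable state gets a number
                    frontier.push(t);
                }
            }
        }
    }
    return static_cast<int>(numbered.size());
}
```

With 2 tasks the ring has 6 reachable states (the rows of the matrix in Ad. 4); with 3 tasks it has 10.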

3. Numerical solution of Markov chains in transient state

The transition matrix Q defines a system of differential equations

    dπ(t)/dt = Q^T π(t),

whose solution is the probability vector π(t):

    π(t) = e^{Q^T t} π(0).

The matrix Q is too large to compute this expression directly. The Arnoldi method is an orthogonal projection process onto the Krylov subspace [15]. The Arnoldi process generates an n × m matrix V_m whose columns form an orthonormal basis of the Krylov subspace span{v, Av, ..., A^{m−1}v}, where m is the dimension of the Krylov subspace, n is the dimension of the matrix A, A = Q^T, and v is the initial vector for t = 0, together with an m × m upper Hessenberg matrix H_m. There is an important relationship between the matrices A and H_m:

    H_m = V_m^T A V_m.

Using this relationship, the vector π(t) is computed according to the following approximation:

    π(t) ≈ β V_m e^{H_m t} e_1,   where β = ||v||_2

and e_1 is the first unit vector.
Because the dimension of H_m is small (usually m = 15) compared to the dimension of Q, the expression e^{H_m t} is easy to compute. We use Pade approximation to evaluate it [1]. Using the probability vector π(t) at time t, it is possible to compute the remaining parameters, like boundary probabilities, loss probabilities, mean queue lengths etc., for transient states.
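The Arnoldi process itself is short; below is a dense-matrix sketch (the real solver applies it to the large sparse A = Q^T; the function names and toy dimensions are ours):

```cpp
#include <cmath>
#include <vector>

// Minimal Arnoldi process on a small dense matrix (illustrative sketch).
// Produces orthonormal vectors V[0..m-1] and the upper Hessenberg H with
// H = V^T A V.  No breakdown handling: assumes H[j+1][j] != 0.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

void arnoldi(const Mat& A, Vec v, int m, Mat& V, Mat& H) {
    const int n = static_cast<int>(v.size());
    const double beta = std::sqrt(dot(v, v));
    for (double& x : v) x /= beta;                  // v_1 = v / ||v||_2
    V.assign(m, Vec(n, 0.0));
    H.assign(m, Vec(m, 0.0));
    V[0] = v;
    for (int j = 0; j < m; ++j) {
        Vec w(n, 0.0);                              // w = A * v_j
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c) w[r] += A[r][c] * V[j][c];
        for (int i = 0; i <= j; ++i) {              // Gram-Schmidt step
            H[i][j] = dot(w, V[i]);
            for (int r = 0; r < n; ++r) w[r] -= H[i][j] * V[i][r];
        }
        if (j + 1 < m) {
            H[j + 1][j] = std::sqrt(dot(w, w));
            for (int r = 0; r < n; ++r) V[j + 1][r] = w[r] / H[j + 1][j];
        }
    }
}
```

The check on the method is that the computed vectors stay orthonormal and that H reproduces V^T A V.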

4. Conclusion
The generation time of the presented tool is longer compared to that of DNAmaca, but our tool has some benefits:
- The state space exploration algorithm is not probabilistic like the one used in DNAmaca, so the generated transition matrix is always correct.
- The queuing model of a given network is a composition of objects corresponding to the elements of the network; these objects can be reused in another model. The model in DNAmaca is described using the low-level GSPN (Generalized Stochastic Petri Nets) modelling formalism, so the description of a model becomes very complex and each new model has to be described from the beginning.
- Transient state analysis is available.
The tool was successfully applied to modelling the queuing systems described in [4, 5, 6, 7, 11] and is able to generate and solve matrices of about 10,000,000 states with 300,000,000 non-zero elements on a single workstation with 500 MB of memory in reasonable time.

References
[1] Baker G. A.: Essentials of Pade Approximation. Academic Press, New York, 1975.
[2] Bolch G., Greiner S., de Meer H., Trivedi K. S.: Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, New York, 1998.
[3] Domanska J., Niemiec M., Pecka P., Tomasik J.: Numeryczne problemy wykorzystania Markowowskich modeli sieci komputerowych. Zeszyty Naukowe Politechniki Śląskiej, Seria: Informatyka, z. 30, 1996.
[4] Hamma S., Pecka P., Czachorski T., Atmaca T.: Markovian analysis of a threshold based priority mechanism for frame-relay networks. Management in High Speed Networks, Dallas, Texas, USA, 2-5 November 1997.
[5] Hamma S., Pecka P.: Threshold Based Mechanism with Two-State Interrupted Poisson Arrival Process. ICT'99, Cheju, Korea, 1999.
[6] Hamma S., Domanska J., Atmaca T.: Analytical Model of the Push-Out Mechanism for a Frame Relay Switch: Comparative Study with the Threshold Based Mechanism and Robustness. ICT'98, Porto Carras, Greece, June 1998.
[7] Jouaber B., Pastuszka M., Czachorski T.: Modelling the Sliding Window Mechanism. ICC'98, Atlanta, Georgia, USA, June 1998.
[8] Stewart W.: MARCA: Markov Chain Analyzer, A Software Package for Markov Modelling, Ver. 2.0. NCSU, 1990.
[9] Knottenbelt W., Kritzinger P.: A Performance Analyzer for the Numerical Solution of General Markov Chains. Technical report, Data Network Architectures Laboratory, Computer Science Department, University of Cape Town, 1996.
[10] Knottenbelt W.: Parallel Performance Analysis of Large Markov Models. PhD thesis, University of London, 1999.
[11] Laalaoua R., Pecka P.: HOL Versus Preemptive Discipline: A Comparative Study. ATS'2000, Washington, USA, 16-20 April 2000.
[12] Niemiec M., Pecka P.: Metoda odwzorowywania stanów w przestrzeń liczb naturalnych w markowowskich modelach systemów komputerowych. Archiwum Informatyki Teoretycznej i Stosowanej, vol. 9, z. 1-4, pp. 131-153, 1997.
[13] Pecka P.: Object oriented library for generating and computing the stochastic matrix in Markov queuing models. Archiwum Informatyki Teoretycznej i Stosowanej, 2000.
[14] Sidje R. B.: Parallel Algorithms for Large Sparse Matrix Exponentials: application to numerical transient analysis of Markov processes. PhD thesis, University of Rennes 1, July 1994.
[15] Stewart W. J.: Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
[16] SunOS/BSD Compatibility Library Functions: dbm(3B). SunOS 5.3, 1994.

First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Two queueing models for cable access networks

Jacques Resing
Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands

1. Introduction

In this paper we will discuss two queueing models with a two-dimensional state space. The first model is an M/M/1 queueing model with gated random order of service. In this service discipline customers are first gathered in an unordered waiting room before they are put in random order in an ordered service queue at the moment that this latter queue becomes empty. The second model is a tandem queue with coupled processors. The system consists of two stations with Poisson arrivals at the first station and exponential service requirements at both stations. The output of the first station feeds into the second one. The total service capacity of the two stations together is constant. When both stations are nonempty, a given proportion of the capacity is allocated to the first station, and the remaining proportion is allocated to the second station. However, if one of the stations is empty, the total service capacity of the stations is allocated to the other station. The two models are motivated by a situation encountered in multi-access communication in cable networks. Cable networks are currently being upgraded to support bidirectional data transport. The system is thus extended with an upstream channel to complement the downstream channel that is already present. This upstream channel is shared among many users, so that contention resolution is essential for data transport. An efficient way to carry out the upstream data transport is via a request-grant mechanism. Users first request data

The talk is based on joint work with Ronald Rietman (Philips Research), Lerzan Ormeci (Koc University) and Johan van Leeuwaarden (EURANDOM).

slots in contention with other stations via contention trees. After a successful request, data transfer follows in reserved slots, not in contention with other stations. In our second model, service at the first station represents the process of sending the requests, whereas service at the second station represents the transmission of the actual data corresponding to the successfully received requests. There are two versions of the contention resolution mechanism via contention trees: the free access variant and the blocked access variant. Essential features of the blocked access variant are that requests competing in the same tree leave the tree in random order, and that new requests arriving when a tree is in progress have to wait until the current tree is resolved before they can be part of a tree. Exactly these two features lead us to the study of the first model, the queueing model with the gated random order of service discipline. Here, customers in the service queue represent the requests currently competing in the tree. Customers in the waiting room represent the requests waiting until the current tree is resolved. For the M/M/1 queue with gated random order of service, we are particularly interested in the two-dimensional Markov process describing the numbers of customers in the waiting room and in the service queue. The stationary distribution of this two-dimensional Markov process is found using a compensation method for queueing problems developed by Adan [1]. For the tandem queue with coupled processors, we are interested in the two-dimensional Markov process describing the numbers of customers in both stations. The generating function of the stationary joint distribution of this two-dimensional Markov process will be obtained using the theory of boundary value problems. The rest of this paper is organized as follows.
Section 2 is devoted to the joint stationary distribution of the numbers of customers in the waiting room and in the service queue in the M/M/1 queue with gated random order of service. After that, in Section 3 we discuss the joint stationary distribution of the numbers of customers in both stations in the tandem queue with coupled processors.

2. M/M/1 queue with gated random order of service
2.1. Model description

Customers arrive according to a Poisson process with rate λ at a single server system. The service times of the customers are exponentially distributed with parameter μ. We assume that ρ := λ/μ < 1. The waiting area in front of the server consists of two parts: an unordered waiting room and an ordered service queue, both of infinite capacity. Upon arrival a customer first enters the waiting

room. Each time the service queue becomes empty, all customers present in the waiting room at that moment are transferred instantaneously from the waiting room to the service queue. They are put in random order into this queue, and the order in which they are put into the queue determines the order in which they are taken into service later on. If there are no customers present in the waiting room at the moment that the service queue becomes empty, the server waits for the next customer to arrive, transfers this customer immediately to the service queue and starts serving this customer. It follows that the service queue cannot be empty unless the waiting room is empty. The service discipline that we obtain in this way, customers first waiting behind a gate in the waiting room and after that being put in random order into the service queue, will be called gated random order of service. The number of customers in the waiting room at time t will be denoted by X1(t) and the number of customers in the service queue (including the customer in service) at time t will be denoted by X2(t). Clearly, the stochastic process {(X1(t), X2(t)) : t ≥ 0} is a two-dimensional Markov process. The next subsection is devoted to the determination of the stationary probabilities

    π(k, n) = lim_{t→∞} P(X1(t) = k, X2(t) = n)

of this Markov process.


2.2. The stationary probabilities π(k, n)

The balance equations for the stationary probabilities π(k, n) are given by

    ρ π(0, 0) = π(0, 1),                                                (1)
    (ρ + 1) π(0, 1) = ρ π(0, 0) + π(1, 1) + π(0, 2),                    (2)
    (ρ + 1) π(0, n) = π(n, 1) + π(0, n + 1),          n ≥ 2,            (3)
    (ρ + 1) π(k, n) = ρ π(k − 1, n) + π(k, n + 1),    k ≥ 1, n ≥ 1.     (4)
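The balance equations can be checked numerically: truncate the state space at hypothetical bounds k ≤ K, n ≤ N (with K ≤ N so that the gate transition (k, 1) → (0, k) stays inside the grid), uniformize the generator and power-iterate. A sketch with the service rate normalized to 1 and arrival rate ρ; the function name and truncation parameters are ours:

```cpp
#include <cmath>
#include <vector>

// Stationary distribution of the gated-ROS chain on a truncated grid,
// obtained by power iteration of the uniformized transition matrix.
// Transitions: arrivals at rate rho, service at rate 1; when the service
// queue empties with k > 0 waiting customers, the gate opens: (k,1) -> (0,k).
std::vector<std::vector<double>> gatedRosStationary(double rho, int K, int N,
                                                    int iters) {
    const double L = rho + 2.0;        // uniformization rate (with slack)
    std::vector<std::vector<double>> pi(K + 1, std::vector<double>(N + 1, 0.0));
    pi[0][0] = 1.0;
    for (int it = 0; it < iters; ++it) {
        std::vector<std::vector<double>> nx(K + 1,
                                            std::vector<double>(N + 1, 0.0));
        for (int k = 0; k <= K; ++k) {
            for (int n = 0; n <= N; ++n) {
                const double m = pi[k][n];
                if (m == 0.0) continue;
                double stay = m;
                if (k == 0 && n == 0) {
                    nx[0][1] += m * rho / L;          // arrival starts service
                    stay -= m * rho / L;
                } else if (n >= 1) {
                    if (k < K) {                      // arrival joins waiting room
                        nx[k + 1][n] += m * rho / L;
                        stay -= m * rho / L;
                    }
                    if (n >= 2)      nx[k][n - 1] += m / L;  // service completion
                    else if (k >= 1) nx[0][k]     += m / L;  // gate opens
                    else             nx[0][0]     += m / L;  // system empties
                    stay -= m / L;
                }
                nx[k][n] += stay;
            }
        }
        pi = nx;
    }
    return pi;
}
```

For ρ = 0.3 the computed distribution satisfies balance equation (1), ρπ(0,0) = π(0,1), and π(0,0) agrees with 1 − ρ, since the total number of customers behaves exactly as in an ordinary M/M/1 queue.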

The next theorem states that the probabilities π(k, n) are given by an infinite sum of product forms.

Theorem 1 The unique probability distribution which solves the balance equations (1)-(4) is given by

    π(0, 0) = 1 − ρ,                                        (5)

    π(k, n) = Σ_{m=1}^{∞} c_m α_m^k β_m^{n−1},   k ≥ 0, n ≥ 1,   (6)

for some constants c_m, α_m and β_m.

To prove the theorem, we use a compensation method for queueing problems developed by Adan [1] (see also Adan, Wessels and Zijm [3]). The method attempts to solve the balance equations by a linear combination of product forms. This is achieved by first determining a basis of product-form solutions satisfying (4). Subsequently this basis is used to construct a linear combination that also satisfies (3). The basis of product-form solutions contains uncountably many elements. Therefore a procedure is needed to select the appropriate elements. This procedure is based on a compensation argument: after introducing the first term, countably many terms are added to compensate for the error made in the equations (3). For details we refer to Resing and Rietman [7]. We end this section by mentioning some extensions of the model. Rietman and Resing [8] have looked at the model with general service times. A machine-repair model with gated random order of service has been studied by Boxma, Denteneer and Resing [4].

3. The tandem queue with coupled processors
3.1. Model description

We consider a tandem queueing model consisting of two stations. Customers arrive at station 1 according to a Poisson process with rate λ, and they demand service from both stations before leaving the system. Each customer requires an exponential amount of work with parameter μj at station j, j = 1, 2. The total service capacity of the two service stations together is fixed. Without loss of generality we assume that this total service capacity equals one unit of work per time unit. Whenever both stations are nonempty, a proportion p of the capacity is allocated to station 1, and the remaining part (1 − p) is allocated to station 2. Thus, when there is at least one customer at each station, the departure rate of customers at station 1 is μ1 p and the departure rate of customers at station 2 is μ2 (1 − p). However, when one of the stations becomes empty, the total service capacity is allocated to the other station. Hence, the departure rate at that station, say station j, is temporarily increased to μj. In the sequel we will denote by ρj = λ/μj the average amount of work per time unit required at station j, j = 1, 2. Clearly, the two-dimensional process X(t) = (X1(t), X2(t)), where Xj(t), j = 1, 2, is the number of customers at station j at time t, is a Markov process. Under the ergodicity condition ρ1 + ρ2 < 1 the process X(t) has a unique stationary distribution. The ergodicity condition can be understood from the fact that, independent of p, the two service stations together always work at rate 1 as long as there is work in the system. Let us denote by π(k, n) the stationary probability of having k customers in station 1 and n customers in station 2. In the sequel we are interested in determining the joint probability generating function

    P(x, y) := Σ_{k≥0} Σ_{n≥0} π(k, n) x^k y^n.

3.2. The probability generating function P(x, y)

From the balance equations it can be shown that P(x, y) satisfies the following functional equation:

    ( (λ + p μ1 + (1 − p) μ2) xy − λ x²y − p μ1 y² − (1 − p) μ2 x ) P(x, y)
        = (1 − p) [ μ1 y (y − x) + μ2 x (y − 1) ] P(x, 0)
        + p [ μ2 x (1 − y) + μ1 y (x − y) ] P(0, y)
        + ( p μ2 x (y − 1) + (1 − p) μ1 y (x − y) ) P(0, 0).        (7)

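The normalization constant P(0, 0) = 1 − ρ1 − ρ2, which follows from the work-conservation argument (the two stations together always work at rate 1 when the system is nonempty), can be sanity-checked numerically by truncating the chain at a hypothetical bound Kmax, uniformizing and power-iterating. An illustrative sketch, not part of the analysis:

```cpp
#include <cmath>
#include <vector>

// Truncated-CTMC check of P(0,0) for the tandem queue with coupled
// processors: uniformize the generator on {0..Kmax}^2 and power-iterate.
// Kmax and the iteration count are hypothetical truncation parameters.
double coupledTandemEmptyProb(double lam, double mu1, double mu2, double p,
                              int Kmax, int iters) {
    const double L = lam + mu1 + mu2;              // uniformization rate
    std::vector<std::vector<double>> pi(Kmax + 1,
                                        std::vector<double>(Kmax + 1, 0.0));
    pi[0][0] = 1.0;
    for (int it = 0; it < iters; ++it) {
        std::vector<std::vector<double>> nx(Kmax + 1,
                                            std::vector<double>(Kmax + 1, 0.0));
        for (int a = 0; a <= Kmax; ++a) {
            for (int b = 0; b <= Kmax; ++b) {
                const double m = pi[a][b];
                if (m == 0.0) continue;
                double stay = m;
                if (a < Kmax) {                    // Poisson arrival at station 1
                    nx[a + 1][b] += m * lam / L;
                    stay -= m * lam / L;
                }
                double r1 = 0.0, r2 = 0.0;         // current service rates
                if (a > 0 && b > 0) { r1 = p * mu1; r2 = (1.0 - p) * mu2; }
                else if (a > 0)     { r1 = mu1; }  // full capacity to station 1
                else if (b > 0)     { r2 = mu2; }  // full capacity to station 2
                if (r1 > 0.0 && b < Kmax) {        // transfer station 1 -> 2
                    nx[a - 1][b + 1] += m * r1 / L;
                    stay -= m * r1 / L;
                }
                if (r2 > 0.0) {                    // departure from station 2
                    nx[a][b - 1] += m * r2 / L;
                    stay -= m * r2 / L;
                }
                nx[a][b] += stay;
            }
        }
        pi = nx;
    }
    return pi[0][0];
}
```

For example, with λ = 0.3, μ1 = μ2 = 1 and p = 0.4 we have ρ1 + ρ2 = 0.6, so the empty probability should be close to 0.4 regardless of p.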
In the case p = 0, respectively p = 1, the model that we consider can be alternatively viewed as a tandem queueing model with one single server for both stations together, in which the server gives preemptive priority to station 2, respectively to station 1. It turns out that for these cases the functional equation (7) can be solved relatively easily. This is mainly due to the fact that either the factor in front of P(0, y) in equation (7), in case p = 0, or the factor in front of P(x, 0), in case p = 1, is equal to zero. In the sequel we sketch the approach by which we can find the solution of the functional equation (7) for 0 < p < 1. A key role in the analysis is played by the kernel

    K(x, y) := (λ + p μ1 + (1 − p) μ2) xy − λ x²y − p μ1 y² − (1 − p) μ2 x,

because zeropairs (x, y) of the kernel for which P(x, y) is finite are also zeropairs for the right-hand side of (7). Using these zeropairs, we are able to formulate a Riemann-Hilbert boundary value problem for the function P(0, y). The solution of that Riemann-Hilbert problem is known. In a similar way, it is also possible to obtain P(x, 0) and, once both P(0, y) and P(x, 0) are determined, P(x, y) follows from (7), because the fact that the two service stations together always work at rate 1 implies that P(0, 0) = 1 − ρ1 − ρ2. Hence, the generating function of the joint stationary distribution of the number of customers in the two stations has been obtained. Details can be found in Resing and Ormeci [6]. The computational issues that arise when calculating performance measures like the mean number of customers in the stations and the fraction of time that the stations are empty are discussed in van Leeuwaarden and Resing [5]. Finally, for an overview paper

on analytic solution methods (like the compensation method and the boundary value method) for queueing models with multiple waiting lines we refer to Adan, Boxma and Resing [2].

References
[1] Adan, I.J.B.F. (1991). A compensation approach for queueing problems. Ph.D. thesis, Eindhoven University of Technology.
[2] Adan, I.J.B.F., Boxma, O.J. and Resing, J.A.C. (2001). Queueing models with multiple waiting lines. Queueing Systems, 37, pp. 65-98.
[3] Adan, I.J.B.F., Wessels, J. and Zijm, W.H.M. (1993). A compensation approach for two-dimensional Markov processes. Advances in Applied Probability, 25, pp. 783-817.
[4] Boxma, O.J., Denteneer, D. and Resing, J.A.C. (2003). Delay models for contention trees in closed populations. Performance Evaluation, 53, pp. 169-185.
[5] Leeuwaarden, J. van, and Resing, J.A.C. (2004). A tandem queue with coupled processors: Computational issues. Submitted for publication.
[6] Resing, J.A.C. and Ormeci, L. (2003). A tandem queueing model with coupled processors. Operations Research Letters, 31, pp. 383-389.
[7] Resing, J.A.C. and Rietman, R. (2004). The M/M/1 queue with gated random order of service. Statistica Neerlandica, 58, pp. 97-110.
[8] Rietman, R. and Resing, J.A.C. (2004). An M/G/1 queueing model with gated random order of service. To appear in Queueing Systems, 48.



Analysis of an optical buffer with an offset-time based scheduling mechanism


Veronique Inghelbrecht, Bart Steyaert and Herwig Bruneel
Department of Telecommunications and Information Processing, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium

Abstract: In this paper, we study the loss performance due to data burst collision in an optical buffer. We assume that this buffer consists of a number of fiber delay lines (FDL). In order to guarantee Quality-of-Service (QoS) differentiation in such a buffer, we investigate analytically an offset-time based scheduling mechanism. We will consider a system with C QoS classes, where the high QoS classes have a larger offset-time than the low QoS classes. For this system, we will calculate the total loss probability and the loss probability within each QoS class. Furthermore, we will calculate the delay of an arbitrary arriving burst, as well as the delay of an arriving burst of a certain QoS class.

1. Introduction

The bandwidth demand on the Internet is growing rapidly. In response to this exponential demand, Dense Wavelength Division Multiplexing (DWDM) has been developed. With DWDM, it is possible to transmit different wavelengths of light over the same fiber. In this way DWDM can support data at speeds up to Terabit/s, so the electronic routers are becoming the bottleneck in the backbone network. To solve these problems, Optical Burst Switching (OBS) (proposed in [13]) seems a promising technology. An OBS network consists of OBS edge routers and OBS core routers. In the OBS ingress edge routers, IP packets with the same egress edge router address and QoS requirements are assembled into larger units, data bursts. Each time the ingress router has assembled a data burst DB, it also sends out a burst header packet BHP carrying the required control information. These header packets and data bursts are carried on separate channels, and the header packet is sent some offset time prior to the data burst. Core OBS routers (see Fig. 1) forward these bursts through the OBS network. In these routers, the header packets are processed electronically, whereas the data

Fig. 1. Block diagram of an OBS core router

bursts are forwarded transparently through the network, based on the information contained in the header packets. Therefore, the basic offset has to be long enough. The egress edge routers disassemble bursts into IP packets. One of the problems of processing in the optical domain is the lack of optical RAM. Optical buffers, consisting of a number of Fiber Delay Lines (FDL), partially fill that gap. But such a system differs significantly from a conventional buffer. Indeed, only a discrete set of delays can be realized for contention resolution, which leads to a less than optimal utilization of the channel capacity and reduces the overall performance. In future communication networks, Quality of Service (QoS) will also be an important issue. But because buffering is a problem in the optical domain, some authors investigate the possibilities to provide QoS by assigning an extra offset time (see [4, 5]) to the high priority QoS classes. By setting some extra offset time, the OBS edge router has some control over the probability that a burst will be dropped inside the OBS network. As we will analyze later in this paper, an extra offset time which is large relative to the basic offset time will increase the probability that a transparent optical path can be reserved. The advantage of QoS differentiation by means of offset-time management is the simplicity of its implementation. In this paper, we investigate an FDL buffer with offset management to avoid data burst collision. Among other things, we will calculate the data burst loss probability in a system with C QoS classes and the delay of an arriving burst. An outline of the paper is as follows. First we will give a detailed model for the system under study.

In Section 3 we will calculate the loss probability and the delay of an arriving data burst. We end with some numerical examples and some conclusions.

2. Description of the model

We focus on a single WDM channel. We assume that time is slotted, and we will use the duration of a single slot as the unit of time. We assume that the OBS core router has FDL lines which can delay an incoming burst by D, 2D, 3D, ..., ND, where D is the unit FDL length and N the number of FDLs. We furthermore assume that we have C QoS classes, namely classes 1, 2, ..., C, where class 1 is the one with the lowest priority and class C the one with the highest priority. All the bursts within each class have the same total offset-time, and we denote by off_i the fixed offset-time of class i. The offset-time is the time between the arrival of the burst header announcing the DB and the arrival of the DB. The decision whether a DB will be scheduled or (if this is impossible) dropped is made at the moment that the burst header packet arrives. But to make the decision, we observe the system at off slots after the arrival of the burst header packet. So, if the offset is long, the loss probability decreases. We furthermore assume that low QoS classes have a lower offset-time than high QoS classes, and thus off_i < off_j if i < j. We denote by the random variable Cl_k the class to which the k-th DB belongs and by off_k (with probability generating function (pgf) Off_k(z)) the total offset-time of the k-th DB. Remark that this pgf can be written as

    Off_k(z) = Σ_{i=1}^{C} Prob(Cl_k = i) z^{off_i}.

We furthermore investigate the system without void-filling; thus neither voids created by the granularity D nor voids created by the offset-time off_k can be filled, meaning that we cannot use the whole capacity of the channel. We call B_k the burst size of the k-th DB, which may depend on the QoS class the k-th DB belongs to, but within a service class we assume that the B_k's are i.i.d. (independent and identically distributed) random variables. We denote by τ_k the interarrival time between two burst header packets. Also this set of random variables is assumed to be i.i.d. Finally, we assume an FCFS scheduling mechanism, where the bursts corresponding to the header packets which arrive first will be served first.

3. Analysis

We denote by H_k the horizon as seen by the k-th BHP. It is defined as the time still necessary to serve all the DBs corresponding to BHPs arriving before the k-th BHP. To decide whether a DB will be scheduled or dropped, we observe the system off slots after the arrival of the BHP, and at this time the horizon is (H_k − off_k)^+. If (H_k − off_k)^+ is larger than the system capacity N·D, the DB will be dropped,

Fig. 2. Evolution of the horizon H from one arrival to the next

otherwise it will be scheduled at least off time units after the arrival of the BHP. In Figure 2 we have plotted the relation between H_k and H_{k+1}. Using this figure, we obtain, for the case (H_k − off_k)^+ ≤ N·D,

    H_{k+1} = ( off_k + ⌈(H_k − off_k)^+ / D⌉ · D + B_k − τ_k )^+,        (1)

where ⌈x⌉ denotes the smallest integer not smaller than x and (...)^+ = max(0, ...). For the case that (H_k − off_k)^+ > N·D, meaning that the k-th DB will be dropped, we have that

    H_{k+1} = (H_k − τ_k)^+.        (2)

When the system is stable, it will reach, for k → ∞, a stochastic equilibrium, independent of the initial conditions of the system. This implies that, independent of the initial values of the system, for k → ∞ we have that Prob(H_k = i) = Prob(H_{k+1} = i). We can derive a set of balance equations for the steady-state

distribution of the horizon by using equations (1) and (2):

    H(0) = Σ_{i=0}^{off_C} Prob(off = i) [ Σ_{j=0}^{i} H(j) Prob(τ ≥ B + i)
           + Σ_{l=1}^{N} Σ_{j=i+1+(l−1)D}^{i+lD} H(j) Prob(τ ≥ B + i + lD)
           + Σ_{j=i+ND+1}^{off_C+ND+Bmax−τmin} H(j) Prob(τ ≥ j) ]

    H(k) = Σ_{i=0}^{off_C} Prob(off = i) [ Σ_{j=0}^{i} H(j) Prob(τ = B + i − k)
           + Σ_{l=1}^{N} Σ_{j=i+1+(l−1)D}^{i+lD} H(j) Prob(τ = B + i + lD − k)
           + Σ_{j=i+ND+1}^{off_C+ND+Bmax−τmin} H(j) Prob(τ = j − k) ],
           1 ≤ k ≤ off_C + ND + Bmax − τmin

    1 = Σ_{j=0}^{off_C+ND+Bmax−τmin} H(j)

The number of equations we obtain in this way equals off_C + ND + Bmax − τmin + 1,
with B_max the maximum value of the random variable B and τ_min the minimum value of the random variable τ. Using this distribution we can derive performance measures, such as the burst loss probability, i.e., the probability that a DB will be lost. We call P_loss,tot the total loss probability and P_loss,i the loss probability of the i-th QoS class. We obtain for the loss probability of class i:

P_loss,i = Σ_{h=N·D+off_i+1}^{N·D+off_C+B_max−τ_min} Prob(H = h),   (3)

whereas the total loss probability equals

P_loss,tot = Σ_{i=1}^{C} Prob(Cl = i)·P_loss,i.   (4)

Indeed, a DB with offset time off_i will be lost when its BHP observes a horizon larger than N·D + off_i at its arrival instant.

We define by d_k the burst-delay of the k-th burst, i.e., the number of slots between the beginning of the slot where the BHP arrives and the end of the slot where the service of the DB is completed. The burst-delay equals

d_k = B_k + off_k + ⌈(H_k − off_k)^+ / D⌉·D.
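The steady-state results above can be cross-checked by simulating the horizon recursion directly. The following Monte Carlo sketch estimates per-class loss probabilities; all parameter values are illustrative assumptions (two classes with offsets 0 and D, geometric interarrivals and burst lengths), not results from the paper:

```python
import math, random

def simulate_losses(n, D=30, N=10, EB=30, rho=0.5, offs=(0, 30), probs=(0.7, 0.3), seed=1):
    """Monte Carlo estimate of per-class loss probabilities via the horizon recursion."""
    rng = random.Random(seed)
    p = 1 - rho / EB                 # geometric interarrival parameter: load rho = (1-p)*E[B]
    H = 0
    lost, seen = [0, 0], [0, 0]
    for _ in range(n):
        cl = 0 if rng.random() < probs[0] else 1
        off = offs[cl]
        B = 1 + int(math.log(1 - rng.random()) / math.log(1 - 1 / EB))  # geometric, mean E[B]
        tau = 1 + int(math.log(1 - rng.random()) / math.log(p))         # geometric, mean 1/(1-p)
        seen[cl] += 1
        w = max(H - off, 0)
        if w > N * D:                # horizon exceeds FDL capacity: burst lost
            lost[cl] += 1
            H = max(H - tau, 0)
        else:
            H = max(off + math.ceil(w / D) * D + B - tau, 0)
    return [l / max(s, 1) for l, s in zip(lost, seen)]

loss = simulate_losses(100000)
```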

4. Numerical results

In this section, we will illustrate the influence of the various design parameters by some numerical examples. In all the figures, we assume that the successive interarrival times, represented by the random variables τ_k, are geometrically distributed with parameter p, so the pgf of the k-th interarrival time is given by

E[z^{τ_k}] = (1 − p)z / (1 − pz),   (5)

and the mean interarrival time is 1/(1 − p). Defining E[B] as the mean packet length, we represent the load of the system as

ρ = (1 − p)·E[B].   (6)
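A quick numerical check of (5)-(6), with illustrative parameter values:

```python
# load of the system, eq. (6): rho = (1 - p) * E[B]
def load(p, EB):
    return (1 - p) * EB

# mean of the geometric interarrival time with pgf (1-p)z/(1-pz) is 1/(1-p);
# here checked via the truncated series sum_{k>=1} k * (1-p) * p**(k-1)
def mean_interarrival(p, terms=10000):
    return sum(k * (1 - p) * p ** (k - 1) for k in range(1, terms))

rho = load(0.95, 30)   # rho = 1.5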

Fig. 3. The loss probability as a function of D (class 1, class 2 and total loss probability, for the cases B = D and off_2 = D; B = D/2 and off_2 = D; B = D and off_2 = 2D)

In all the figures, we consider a system with two QoS classes where the class 1 packets have offset zero. If not mentioned otherwise, the random variable Cl that represents the QoS class of an arriving packet has the following distribution: Prob(Cl = 1) = 0.7, Prob(Cl = 2) = 0.3. In Fig. 3, we assume that the packet lengths are geometrically distributed with mean E[B], that the load equals ρ = 0.5, and that the number of FDLs equals N = 10. We plot, versus the granularity D, the loss probability of both service classes and the total loss probability, in three different cases. First, we suppose that the mean packet length and the offset time of the Class 2 packets are equal to D. Secondly, the offset time of the Class 2 packets is still equal to D, but this time the mean packet length is D/2. In the last case, the mean packet length equals D, but the offset-time of the Class 2 packets is twice the granularity D. From the figure, it is clear that we have some control over the loss probability of the different QoS classes. The packet loss probability of the class 2 packets is lower than the total packet loss probability, and that of the class 1 packets is larger than the total loss probability. Indeed, when a class 2 BHP arrives, the decision to serve the data packet is made by observing the horizon off_2 slots after the arrival of the BHP. When a class 1 packet arrives, this decision is made earlier, namely off_1 slots after the arrival of its BHP. We observe that from a certain value of D on (around 30), all the curves are nearly horizontal lines. This means that the loss probability mainly depends on the ratio of the mean packet length and the mean offset times to the granularity D, but not on the value of the granularity D itself. Therefore, without loss of generality, we assume in the rest of this report that D = 30.
Fig. 4. The loss probability as a function of N (curves for Cl = 1, Cl = 2 and the total loss probability)

In Fig. 4, we again assume that the load equals 0.5 and that the packet lengths are geometrically distributed with mean E[B] = 30. This time, we plot the loss probability in terms of the number of FDLs N. We observe that the curves are almost straight lines, meaning that the loss probability decreases exponentially with the number of FDLs. To reduce analytic complexity, however, for the rest of the paper we assume N = 10.

Fig. 5. The loss probability as a function of off_2 (geometric burst lengths; class 1, class 2 and total loss probability, for ρ = 0.5 and ρ = 0.8)

Fig. 6. The loss probability as a function of off_2 (deterministic burst lengths; class 1, class 2 and total loss probability, for ρ = 0.5 and ρ = 0.8)

In Figure 5, we again consider two classes of service with arrival scenarios identical to those of Figure 3. As explained before, we consider a geometric packet length with mean E[B] = 30. The loss probabilities of class 1, class 2 and the total loss probability are plotted in terms of the offset-time of the class 2 traffic.

As could be expected, we observe that when the offset of class 2 bursts increases, the loss probability of the class 2 traffic decreases at the expense of the class 1 burst loss probability. However, especially for large offset-times, the total loss probability increases as well. Indeed, for the class 2 packets, the system is observed off_2 slots after the arrival of the BHP, which means that the probability that the horizon at this moment is larger than N·D decreases when the offset-time off_2 increases. Figure 6 shows similar curves as Figure 5 when considering deterministic burst lengths equal to D. In this case, we cannot have extremely long packet lengths, meaning that the probability that the horizon is large will decrease. We observe that the loss probability is smaller than in the previous case. Also this time, the loss probability of the class 1 traffic increases and that of the class 2 traffic decreases when the offset time increases, but now not gradually.

Fig. 7. The loss probability as a function of E[B] (constant and geometric service times, for D = 30 and D = 60)

In Figure 7, we have set the offset of the class 2 packets constant, off_2 = 30, and have plotted, for a constant load ρ = 0.5, the loss probability versus the mean packet length for a granularity D = 30 and D = 60. We observe that when the packet lengths have a constant distribution, the curves reach a minimum when the packet lengths equal D. In case of geometric packet lengths, the curves reach a minimum for a mean packet length equal to 20 (D = 30) and equal to 50 (D = 60). If the mean packet length is much smaller than the granularity D, we create a lot of voids; when the packet length (compared with the fixed granularity D) increases, the number of voids decreases, but the FDL capacity N·D, relative to the packet length, decreases. In Figure 8, we assume a deterministic service time with E[B] = D = 30; we furthermore assume 2 QoS classes with the offset-time of the Class 1 traffic

zero, while the class 2 offset-time equals D. We plot the loss probability of the different classes versus the percentage of class 2 traffic in the overall traffic mix. First, we observe that the total loss probability does not change much. We also note that if the percentage of Class 2 traffic is small, the service differentiation obtained by offset-time differentiation becomes higher. If the load of the Class 2 traffic is low, almost all these packets can be scheduled. However, when the load of the Class 2 traffic increases, many packets have the same offset-time and the impact of offset-time differentiation decreases. When the Class 2 load is 100%, we have the same system as without offset-time differentiation.

Fig. 8. The loss probability as a function of the percentage of Class 2 traffic (curves for Cl = 1, Cl = 2 and total, for ρ = 0.5 and ρ = 0.8)

Finally, in Figure 9, we have plotted the negative cumulative distribution of the burst-delay for D = off_2 = 30 and E[B] = 30, for each of the service classes, as well as the delay of an arbitrary burst. If the service time is constant, the delays are smaller than in the case of geometric service times, because in the first case all the bursts have the same service time 30. We also observe that the delay of a high priority burst is larger than the delay of a low priority burst, but the differences are small. This is due to the fact that high priority bursts have a delay that is at least their offset-time.

5. Conclusions

In this paper we have studied offset-time management as a method to introduce QoS. By solving a set of balance equations for the steady-state distribution of the horizon, we have calculated the loss probability of an arriving DB and the delay of an arriving burst. We observe that, especially for large offset-times, the loss probability for the high priority packets is less than for the low priority packets.

Fig. 9. The negative cumulative distribution Prob(d ≥ X) of the delay (constant and geometric service times; curves for class 1, class 2 and an arbitrary class)

On the downside, we concluded that this type of buffer management increases the total loss probability. We also observe that the gain in loss probability of the high priority packets enlarges when the percentage of high priority packets becomes smaller.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Insensitive load balancing for single-class systems

T. BONALD, M. JONCKHEERE AND A. PROUTIÈRE
FRANCE TELECOM R&D
{Thomas.Bonald,Matthieu.Jonckheere,Alexandre.Proutiere}@francetelecom.com

1. Introduction

Load balancing is a key component of computer systems and communication networks. It consists in routing new demands (e.g., jobs, database requests, telephone calls, data transfers) to service entities (e.g., processors, servers, routes in telephone and data networks) so as to ensure the efficient utilization of available resources. One may distinguish different types of load balancing, depending on the level of information used in the routing decision:

Static load balancing. In this case, the routing decision is blind in the sense that it does not depend on the system state. This is a basic scheme which is widely used in practice but whose performance suffers from the lack of information on the current distribution of system load.

Semi-static load balancing. The routing decision is still blind but may depend on the period of the day [19]. Such a scheme may be useful when traffic has a well-known daily profile, for instance. Like the static scheme, it is unable to react to sudden traffic surges at some service entities.

Dynamic load balancing. In this case, routing depends on the system state. Such a scheme is much more efficient as it instantaneously adapts the routing decisions to the current load distribution among service entities.

This article reviews the results presented in [4] for a single-class system and extends them to topologies with internal routing.

Designing optimal load balancing schemes is a key issue. While this reduces to an optimization problem for static schemes [7, 11, 12, 25], the issue is much more complex for dynamic schemes [24]. The basic model consists of a set of parallel servers fed by a single stream of customers. Most existing results are based on the assumption of i.i.d. exponential service requirements and equal service rates. For infinite buffer sizes, Winston proved that joining the shortest queue is optimal in terms of the transient number of completed jobs, provided customers arrive as a Poisson process [28]. This result was extended by Ephremides et al. [8] to non-Poisson arrivals. For finite buffer sizes, Hordijk and Koole [10] and Towsley et al. [26] proved that joining the shortest (non-full) queue minimizes the blocking rate. Extending these results to non-exponential service requirements or unequal service rates proves extremely difficult. Whitt gave a number of counter-examples showing that joining the shortest queue is not optimal in certain cases [27]. Harchol-Balter et al. studied the impact of the job size distribution for those load balancing schemes where the routing decision may be based on the amount of service required by each customer [9]. Thus it seems extremely difficult, if not impossible, to characterize optimal load balancing schemes. In addition, the resulting performance cannot be explicitly evaluated in general. Instead of restricting the analysis to a specific distribution of service requirements (e.g., exponential), we here consider the class of insensitive load balancings, whose performance depends on this distribution through its mean only. Specifically, we consider the class of Whittle queueing networks, which can represent a large variety of computer systems and communication networks and whose stationary distribution is known to be insensitive under the usual assumption of static routing [22].
We identify the class of load balancing policies which preserve insensitivity and characterize optimal strategies, for networks with a single arrival process, with or without internal Markovian routing. The resulting performance can be explicitly evaluated. The model is described in the next section. In the following two sections, optimal load balancing schemes are characterized. Section 4 is devoted to examples.

2. Model

We consider a network of N processor sharing nodes. The service rate of node i is a function μ_i of the network state x = (x_1, …, x_N), where x_i denotes the number of customers present at node i. Required services at each node are i.i.d. exponential of unit mean. As shown in §2.5 below, this queueing system can represent a rich class of communication networks. We first present the notion of load balancing in this context, the key relation between the balance property and the insensitivity property, and the performance objectives.

Notation. For i = 1, …, N, we denote by e_i ∈ N^N the unit vector with 1 in component i and 0 elsewhere. For x, y ∈ N^N, we write x ≤ y if x_i ≤ y_i for all i. We use the notation:

|x| ≡ Σ_{i=1}^{N} x_i  and  (|x| choose x) ≡ |x|! / (x_1! ⋯ x_N!).

We denote by F the set of R_+-valued functions on N^N.
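The multinomial coefficient (|x| choose x) used throughout can be computed with a small helper (our own illustration, not from the paper):

```python
from math import factorial

def multinomial(x):
    """|x|! / (x_1! ... x_N!): the number of direct paths from state x to state 0."""
    out = factorial(sum(x))
    for xi in x:
        out //= factorial(xi)
    return out
```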

2.1. Load balancing

In this article, we consider K = 1 customer class. Customers arrive as a Poisson process of intensity λ and are first routed to node i ∈ I = {1, …, N} with probability p_i(x) = λ_i(x)/λ in state x, where λ_i(x) denotes the external arrival rate at node i in state x. We assume that a customer may be blocked only when it enters the network, so that the blocking probability in state x is 1 − Σ_{i∈I} p_i(x). We have:

Σ_{i∈I} λ_i(x) ≤ λ.   (1)

After service completion at node i, a customer is routed to node j ∈ I with probability p_ij or leaves the network with probability p_i0 = 1 − Σ_{j∈I} p_ij. We assume that for any state x, the following traffic equations have a unique solution λ̄_1(x), …, λ̄_N(x):

λ̄_i(x) = λ_i(x) + Σ_{j∈I} λ̄_j(x) p_ji,  i ∈ I.   (2)

We refer to λ̄_i(x) as the arrival rate at node i in state x. We further assume that the network capacity is finite, in the sense that there exists a finite set Y ⊂ N^N such that if x ∈ Y then y ∈ Y for all y ≤ x. This implies that:

λ_i(x − e_i) = 0,  ∀x ∉ Y : x_i > 0.   (3)

Any state-dependent arrival rates that satisfy (1), (2) and (3) determine an admissible load balancing.
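For a given state, the traffic equations (2) can be solved numerically, for instance by fixed-point iteration (which converges when the routing matrix is strictly substochastic). An illustrative sketch with a hypothetical two-node example:

```python
def effective_rates(lam, P, iters=200):
    """Solve lam_bar_i = lam_i + sum_j lam_bar_j * P[j][i] by fixed-point iteration."""
    n = len(lam)
    bar = list(lam)
    for _ in range(iters):
        bar = [lam[i] + sum(bar[j] * P[j][i] for j in range(n)) for i in range(n)]
    return bar

# two nodes: external arrivals only at node 1, half of them then visit node 2
P = [[0.0, 0.5],
     [0.0, 0.0]]
rates = effective_rates([1.0, 0.0], P)
```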
2.2. Balance property

Service rates. We say that the service rates are balanced if for all i, j:

μ_i(x) μ_j(x − e_i) = μ_j(x) μ_i(x − e_j),  ∀x : x_i > 0, x_j > 0.

This property defines the class of so-called Whittle networks, an extension of Jackson networks where the service rate of a node may depend on the overall network state [22]. For Poisson arrivals at each node, the balance property is equivalent to the reversibility of the underlying Markov process. We assume that μ_i(x) > 0 in all x such that x_i > 0. Let Φ be the function recursively defined by Φ(0) = 1 and:

Φ(x) = Φ(x − e_i)/μ_i(x),  x_i > 0.   (4)

The balance property implies that Φ is uniquely defined. We refer to Φ as the balance function of the service rates. Note that if there is a function Φ such that the service rates satisfy (4), these service rates are necessarily balanced. For any x ∈ N^N, Φ(x) may be viewed as the weight of any direct path from state x to state 0, where a direct path is a set of consecutive states x(0) ≡ x, x(1), x(2), …, x(n) ≡ 0 such that x(m) = x(m−1) − e_i(m) for some i(m), m = 1, …, n, with n = |x|, and the weight of such a path is the inverse of the product of μ_{i(m)}(x(m−1)) for m = 1, …, n (refer to Figure 1). As will become clear in §2.3 below, the balance function plays a key role in the study of Whittle networks.

Fig. 1: The balance function Φ(x) is equal to the weight of any direct path from state x to state 0.
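The recursion (4) can be computed along any single direct path; path-independence is exactly the balance property. A memoized sketch, with the service rates supplied as a function mu(i, x) (our own helper, checked on a single node with constant rate 2, where Φ(n) = 2^{-n}):

```python
from functools import lru_cache

def balance_function(mu, x):
    """Phi(x) = Phi(x - e_i) / mu_i(x) for any i with x_i > 0, with Phi(0) = 1."""
    @lru_cache(maxsize=None)
    def phi(state):
        if sum(state) == 0:
            return 1.0
        i = next(k for k, v in enumerate(state) if v > 0)   # any direct path will do
        prev = tuple(v - (1 if k == i else 0) for k, v in enumerate(state))
        return phi(prev) / mu(i, state)
    return phi(tuple(x))

val = balance_function(lambda i, s: 2.0, (3,))   # 1/8
```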

Arrival rates. A similar property may be defined for the arrival rates. We say that the arrival rates are balanced if for all i, j:

λ̄_i(x) λ̄_j(x + e_i) = λ̄_j(x) λ̄_i(x + e_j),  ∀x ∈ N^N.

Let Λ be the function recursively defined by Λ(0) = 1 and:

Λ(x + e_i) = λ̄_i(x) Λ(x).   (5)

The balance property implies that Λ is uniquely defined. We refer to Λ as the balance function of the arrival rates. Again, if there is a function Λ such that the arrival rates satisfy (5), these arrival rates are necessarily balanced. For any x ∈ N^N, Λ(x) may be viewed as the weight of any direct path x(n), x(n−1), …, x(0) from state 0 to state x, defined as the product of λ̄_{i(m)}(x(m−1)) for m = 1, …, n. In particular, the fact that Λ(x) > 0 implies that Λ(y) > 0 for all y ≤ x. We define:

X = {x ∈ N^N : Λ(x) > 0}.   (6)

In view of (1), (2), (3) and (5), we deduce that:

X ⊂ Y,   (7)

Λ(x + e_i) − Σ_{j∈I} Λ(x + e_j) p_ji ≥ 0,  ∀x ∈ N^N, i ∈ I,   (8)

Σ_{i∈I} Λ(x + e_i) p_i0 ≤ λ Λ(x),  ∀x ∈ N^N.   (9)

We refer to A as the set of admissible functions Λ ∈ F for which properties (7), (8) and (9) are satisfied.
2.3. Insensitivity property

Static load balancing. Consider static load balancing where the arrival rates λ_i(x) ≡ λ_i do not depend on the network state x (within the network capacity region defined by Y). If the service rates are balanced, the stochastic process that describes the evolution of the network state x is an irreducible Markov process on the state space Y, with stationary distribution:

π(x) = π(0) Φ(x) ∏_{i=1}^{N} λ̄_i^{x_i},  x ∈ Y,   (10)

where π(0) is given by the usual normalizing condition, Φ is the balance function defined by (4) and the effective arrival rates λ̄_i are solutions of:

λ̄_i = λ_i + Σ_{j∈I} p_ji λ̄_j,  i ∈ I.

In addition, the system is insensitive, in the sense that the stationary distribution depends on the distribution of required services at any node through the mean only [22]. It has recently been shown that the balance property is in fact necessary for the system to be insensitive [2]. In the rest of the paper, we always assume that the service rates are balanced.

Dynamic load balancing. We now consider dynamic load balancing where the arrival rates λ_i(x) depend on the network state x. A sufficient condition for the insensitivity property to be retained is that the arrival rates are balanced, in which case the stochastic process that describes the evolution of the network state x is an irreducible Markov process on the state space X defined by (6), with stationary distribution:

π(x) = π(0) Φ(x) Λ(x),  x ∈ X,   (11)

where π(0) is given by the usual normalizing condition. Again, the balance property is in fact necessary for the system to be insensitive [2]. The class of insensitive load balancings is thus simply characterized by the set of admissible balance functions A defined above.
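For a toy example, (11) can be evaluated directly. Taking a single node with constant service rate μ and the balance function Λ(x) = λ^x on X = {0, …, K} recovers the truncated geometric distribution of the M/M/1/K queue, which gives a convenient sanity check (all values below are illustrative):

```python
def stationary(phi, lam_fun, states):
    """pi(x) = pi(0) * Phi(x) * Lambda(x), normalized over the state space."""
    weights = {x: phi(x) * lam_fun(x) for x in states}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

lam, mu, K = 1.0, 2.0, 4
pi = stationary(lambda x: mu ** (-x), lambda x: lam ** x, range(K + 1))
rho = lam / mu
expected_pi0 = (1 - rho) / (1 - rho ** (K + 1))   # M/M/1/K formula
```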
2.4. Performance objectives

Our aim is to characterize insensitive load balancings that are optimal in terms of blocking probability. The blocking probability is given by:

p = Σ_{x∈X} π(x) (1 − (1/λ) Σ_{i∈I} λ_i(x)).

It is worth noting that the balance function Λ, which gives the state-dependent external arrival rates λ_i(x) as well as the routing probabilities p_i(x) in each state x, also determines the state space X. In general, the set of actually attainable states X associated with the optimal solution is strictly included in the set of potentially attainable states Y defining the network capacity region.
2.5. Application to communication networks

The considered model is sufficiently generic to represent a rich class of computer systems and communication networks. The basic example is a set of parallel servers, as mentioned in Section 1. We use this toy example as a reference system in Section 4 to assess the performance of insensitive load balancing strategies. The model can be used to represent much more complex systems, however.

Circuit switched networks. Consider for instance a circuit switched network composed of L links with respective capacities C_1, …, C_L. Users arrive at rate λ and require a circuit of capacity a for a random duration of mean 1/μ through one of the routes r_i, i ∈ I, where each route r_i consists of a subset of the links {1, …, L}. This defines N types of users. Such a circuit switched network can be represented by the above queueing system, where each node i corresponds to type-i users. Specifically, the service rate of node i is given by:

μ_i(x) = x_i a μ,  i ∈ I.

Thus the service rates are balanced, with corresponding balance function:

Φ(x) = ∏_{i∈I} 1 / (x_i! (aμ)^{x_i}).

The network capacity is determined by the link capacities:

Y = {x ∈ N^N : ∀l, Σ_{i∈I: l∈r_i} x_i a ≤ C_l}.

Data networks. We now consider a data network composed of L links with respective capacities C_1, …, C_L and shared by K user classes. Class-k users arrive at rate λ_k and require the transfer of a document of random size of mean 1/μ_k through one of the routes r_i, i ∈ I_k. Again, this defines N types of users, with I_1 ∪ … ∪ I_K = {1, …, N}. The duration of a data transfer depends on its bit rate. We assume that the overall bit rate φ_i(x) of type-i users depends on the network state x only and is equally shared between these users. Such a data network can be represented by the above queueing system, where each node i corresponds to type-i users, with service rate μ_i(x) = φ_i(x) μ_i, i ∈ I. The allocation must satisfy the capacity constraints:

∀l, Σ_{i: l∈r_i} φ_i(x) ≤ C_l.

The balanced allocation for which at least one capacity constraint is reached in any state is known as balanced fairness [3]. We have:

φ_i(x) = Δ(x − e_i)/Δ(x),  ∀x : x_i > 0,

where Δ is the positive function recursively defined by Δ(0) = 1 and:

Δ(x) = max_l (1/C_l) Σ_{i: l∈r_i} Δ(x − e_i).

The balance function which characterizes the service rates of the corresponding queueing network is then given by:

Φ(x) = Δ(x) ∏_{i∈I} 1/μ_i^{x_i}.

The network capacity can be determined so as to guarantee a minimum data rate γ, for instance, in which case:

Y = {x ∈ N^N : ∀i, φ_i(x)/x_i ≥ γ}.
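The balanced-fairness recursion for Δ and the resulting rates φ_i can be sketched as follows; the example (two flow types sharing one link of unit capacity, where balanced fairness splits the link equally) is a hypothetical toy case:

```python
from functools import lru_cache

def make_delta(routes, C):
    """Delta(x) = max_l (1/C_l) * sum_{i: l in r_i} Delta(x - e_i), with Delta(0) = 1."""
    @lru_cache(maxsize=None)
    def delta(x):
        if sum(x) == 0:
            return 1.0
        best = 0.0
        for l, cap in enumerate(C):
            s = sum(
                delta(tuple(v - (1 if k == i else 0) for k, v in enumerate(x)))
                for i in range(len(x))
                if l in routes[i] and x[i] > 0
            )
            best = max(best, s / cap)
        return best
    return delta

routes = [{0}, {0}]                 # two flow types, both using link 0
delta = make_delta(routes, [1.0])
phi1 = delta((0, 1)) / delta((1, 1))   # rate of type 1 in state (1,1): half the link
```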

3. The reference case: a single class without internal routing

We first consider the case without internal routing. There is a single stream of incoming customers, which can be routed to any of the N network nodes. We first characterize the class of admissible load balancings and then use this characterization to identify optimal strategies in terms of blocking probability.
3.1. Characterization

Denote by S ⊂ A the set of balance functions that correspond to simple load balancings, in the sense that customers can be blocked in a single state. We denote by Λ_y the element of S such that customers are blocked in state y ∈ Y only.

Proposition 1. A balance function Λ_y ∈ S is uniquely defined by the following recursive equations:

Λ_y(x) = (1/λ) Σ_{i=1}^{N} Λ_y(x + e_i),  ∀x ≤ y, x ≠ y.

Proof. By definition, Λ_y(x) = 0 for x ≰ y. For any state x ≤ y, x ≠ y, we have Σ_{i=1}^{N} λ_i(x) = λ (no blocking in state x). Using the insensitivity characterization (5), we get:

Λ_y(x) = (1/λ) Σ_{i=1}^{N} Λ_y(x + e_i). □

Proposition 2. We have:

Λ_y(x) = Λ_y(y) (|y − x| choose y − x) λ^{−|y−x|}  if x ≤ y,

and Λ_y(x) = 0 otherwise. The constant Λ_y(y) is determined by the normalizing condition Λ_y(0) = 1.

Proof. For any state x ≤ y, x ≠ y, we have:

Λ_y(x) = (1/λ) Σ_{i=1}^{N} Λ_y(x + e_i).

In particular, Λ_y(x) is equal to the product of Λ_y(y)/λ^{|y−x|} by the number of direct paths from x to y. □

The blocking probability associated with a simple load balancing can be easily evaluated using the following recursive formula:

Proposition 3. Let 1/Γ(y) be the blocking probability associated with the balance function Λ_y ∈ S. We have:

Γ(y) = 1 + (1/λ) Σ_{i=1}^{N} μ_i(y) Γ(y − e_i),

with Γ(0) = 1 and Γ(y) = 0 for any y ∉ N^N.

Proof. cf [4]. □

The following result characterizes the set of admissible balance functions A as linear combinations of elements of S:

Theorem 1. For any balance function Λ ∈ A, we have:

Λ = Σ_{y∈Y} α(y) Λ_y,   (12)

where for all y ∈ Y,

α(y) = β(y)/Λ_y(y),  with  β(y) = Λ(y) − (1/λ) Σ_{i=1}^{N} Λ(y + e_i).

Conversely, for any α ∈ F such that Σ_{y∈Y} α(y) = 1, the balance function defined by (12) lies in A.

Proof. We have for any state x ∈ Y:

Λ(x) = β(x) + (1/λ) Σ_{i=1}^{N} Λ(x + e_i).   (13)

As Λ(x) = 0 for any state x ∉ Y, we deduce that Λ is entirely determined by the function β through (13). The proof of equality (12) then follows from the fact that the function Σ_{y∈Y} α(y)Λ_y satisfies (13) in any state x ∈ Y:

Σ_{y∈Y} α(y)Λ_y(x) = α(x)Λ_x(x) + Σ_{y∈Y: y≥x, y≠x} α(y)Λ_y(x)
 = β(x) + Σ_{y∈Y: y≥x, y≠x} α(y) (1/λ) Σ_{i=1}^{N} Λ_y(x + e_i)
 = β(x) + (1/λ) Σ_{i=1}^{N} Σ_{y∈Y} α(y)Λ_y(x + e_i).

Conversely, any linear combination Λ = Σ_{y∈Y} α(y)Λ_y of elements of S with Σ_{y∈Y} α(y) = 1 satisfies Λ(0) = 1, X ⊂ Y, as well as inequalities (9). □
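Proposition 3 turns the evaluation of a simple policy into a cheap recursion. A memoized sketch, checked against the single-node case with λ = μ = 1, where Γ(n) = n + 1 and the blocking probability 1/(n+1) matches the M/M/1/n queue at unit load (the test case is our own):

```python
from functools import lru_cache

def make_gamma(mu, lam, N):
    """Gamma(y) = 1 + (1/lam) * sum_i mu_i(y) * Gamma(y - e_i); blocking of Lambda_y is 1/Gamma(y)."""
    @lru_cache(maxsize=None)
    def gamma(y):
        if any(v < 0 for v in y):    # Gamma(y) = 0 outside N^N
            return 0.0
        if sum(y) == 0:
            return 1.0
        return 1.0 + sum(
            mu(i, y) * gamma(tuple(v - (1 if k == i else 0) for k, v in enumerate(y)))
            for i in range(N)
        ) / lam
    return gamma

gamma = make_gamma(lambda i, y: 1.0, 1.0, 1)   # single node, mu = lam = 1
blocking = 1.0 / gamma((4,))                   # 1/5
```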
3.2. Optimal load balancing

We deduce from Theorem 1 that there exists an optimal admissible load balancing which is simple. In particular, the set of actually attainable states X associated with this optimal solution is of the form {x ∈ N^N : x ≤ y} and therefore generally smaller than the set of potentially attainable states Y.

Corollary 1. There is a balance function Λ_{y*} ∈ S which minimizes the blocking probability over the set A.

Proof. The blocking probability is given by:

p = Σ_{x∈X} π(x) (1 − (1/λ) Σ_{i=1}^{N} λ_i(x)).

In view of (5) and (11), we deduce:

p = Σ_{x∈Y} (1 − (1/λ) Σ_{i=1}^{N} λ_i(x)) Φ(x)Λ(x) / Σ_{x∈Y} Φ(x)Λ(x)
  = Σ_{x∈Y} (Λ(x) − (1/λ) Σ_{i=1}^{N} Λ(x + e_i)) Φ(x) / Σ_{x∈Y} Φ(x)Λ(x).

It then follows from Theorem 1 that:

p = Σ_{y∈Y} β(y)Φ(y) / Σ_{y∈Y} β(y)Φ(y)Γ(y),

where Γ(y) = Σ_{x≤y} Λ_y(x)Φ(x) / (Λ_y(y)Φ(y)) is the quantity of Proposition 3. In particular:

p ≥ min_{y∈Y} 1/Γ(y).

We conclude that the blocking probability p is minimal if β(y) = 0 for all y ∈ Y except for one state y* where the function 1/Γ(y) is minimal. The corresponding balance function is Λ_{y*} ∈ S. □

Remark 1. In view of Corollary 1 and Proposition 3, finding the optimal admissible load balancing requires O(|Y|) operations only, where |Y| denotes the number of elements in the set Y.

It is possible to further characterize the optimal load balancing when the network is monotonic, in the sense that:

μ_i(x) ≤ μ_i(x − e_j),  ∀i, j, ∀x : x_j > 0.   (14)

We say that a state y ∈ Y is extremal if y + e_i ∉ Y for all i = 1, …, N. The following result is a consequence of Proposition 3:

Proposition 4. Let Λ_{y*} ∈ S be a balance function which minimizes the blocking probability over the set A. If the network is monotonic, y* is an extremal state of Y.

Proof. cf [4]. □
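Combining Corollary 1, Proposition 3 and Remark 1, the optimal simple policy is found by one sweep over Y, maximizing Γ. An illustrative sketch for two parallel servers with constant rates on a small box-shaped region (all numbers hypothetical); since Γ is coordinatewise increasing here, the extremal corner wins, in line with Proposition 4:

```python
from functools import lru_cache
from itertools import product

def best_state(mu, lam, Y, N):
    """Scan Y for the state y maximizing Gamma(y), i.e. minimizing the blocking 1/Gamma(y)."""
    @lru_cache(maxsize=None)
    def gamma(y):
        if any(v < 0 for v in y):
            return 0.0
        if sum(y) == 0:
            return 1.0
        return 1.0 + sum(
            mu(i, y) * gamma(tuple(v - (1 if k == i else 0) for k, v in enumerate(y)))
            for i in range(N)
        ) / lam
    return max(Y, key=gamma)

# two servers with capacities (1.0, 0.5), states 0..2 per server
Y = list(product(range(3), range(3)))
y_star = best_state(lambda i, y: (1.0, 0.5)[i], 1.0, Y, 2)
```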
3.3. Single class with static internal routing

Fig. 2. Parallel servers with a two-node session structure (arrival nodes x_1, …, x_N with arrival rates λ_1(x), …, λ_N(x), followed by exit nodes z_1, …, z_N)

We now generalize the results of the previous section to the case of static internal routing. For the sake of tractability, we assume that the set of network nodes may be split into two subsets, J_a and J_r, such that p_ij = 0 for any j ∈ J_a, and λ_i(x) = 0 for any i ∈ J_r and any state x. In other words, J_a contains the nodes where customers arrive in the network; once customers leave these nodes, they never come back to any node of J_a. We denote by x_a and x_r the vectors representing the numbers of customers in the nodes of the subsets J_a and J_r respectively. We further define Y_a = {z_a ∈ N^{|J_a|} : ∃x ∈ Y, x_a = z_a}.
3.4. Characterization

Equations (5) become:

Λ(x + e_i) = λ_i(x)Λ(x),  i ∈ J_a,   (15)

Λ(x + e_i) = Σ_{j∈J_a} p_ji Λ(x + e_j) + Σ_{j∈J_r} p_ji Λ(x + e_j),  i ∈ J_r.   (16)

From (16), we deduce that there exist positive coefficients q_ji, i ∈ J_r and j ∈ J_a, such that:

Λ(x + e_i) = Σ_{j∈J_a} q_ji Λ(x + e_j),  i ∈ J_r.   (17)

Proposition 5. There exist positive coefficients c(z, x_r) and d(z, x) such that, for any admissible balance function Λ and any state x:

Λ(x) = Σ_{z∈Y: |z|=|x_r|, z_r=0} c(z, x_r) Λ(x_a + z),   (18)

and:

Λ(x) = Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) Λ(z).   (19)

Proof. This property results directly from (17). Actually, the coefficient c(z, x_r) is the sum of the weights of the paths from x to x_a + z, with weight equal to q_ji when moving from x to x + e_j − e_i. From this construction, one also obtains that for all i ∈ J_a:

Λ(x + e_i) = Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) Λ(z + e_i).   (20) □

3.5. Optimal load balancing

Denote by A_a and S_a the sets of admissible and simple balance functions on Y_a respectively. The analog of Corollary 1 characterizing the optimal load balancing is then the following:

Corollary 2. There is a balance function Λ_{y_a*} ∈ S_a such that the balance function minimizing the blocking probability over the set A writes:

Λ(x) = Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) Λ_{y_a*}(z_a).

Proof. In view of (5), (18) and (20), the blocking probability writes:

p = Σ_{x∈X} π(x) (1 − (1/λ) Σ_{i∈J_a} Λ(x + e_i)/Λ(x))
  = Σ_{x∈X} Φ(x) (Λ(x) − (1/λ) Σ_{i∈J_a} Λ(x + e_i)) / Σ_{x∈X} Φ(x)Λ(x)
  = Σ_{x∈X} Φ(x) Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) (Λ(z) − (1/λ) Σ_{i∈J_a} Λ(z + e_i)) / Σ_{x∈X} Φ(x) Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x)Λ(z).

From Theorem 1 applied on Y_a, we know that if z_r = 0, Λ(z) = Σ_{y∈Y_a} α(y)Λ_y(z_a). Then:

p = Σ_{y∈Y_a} β(y) Σ_{x∈X} Φ(x) 1_{|x|=|y|} c(y, x_r) / Σ_{y∈Y_a} β(y) Σ_{x∈X} Φ(x) Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) Λ_y(z_a)/Λ_y(y).

As in the proof of Corollary 1, the blocking rate is minimized when β(y) = 0 for all y ∈ Y_a except for the state y_a* where the function

Σ_{x∈X} Φ(x) 1_{|x|=|y|} c(y, x_r) / Σ_{x∈X} Φ(x) Σ_{z∈Y: |z|=|x|, z_r=0} d(z, x) Λ_y(z_a)/Λ_y(y)

is minimized. □

4. Examples

We apply the previous theoretical results to a reference system consisting of a set of parallel servers, to a slightly more complex topology modelling a tree network, and to a topology with cross traffic.

4.1. Set of parallel servers

We first consider N parallel servers fed by a single stream of customers, as illustrated in Figure 3. Such a system may represent a supercomputer center, for instance, or any other distributed server system. For communication networks, it might correspond to a logical link split over several physical links. In this case, we consider data traffic only since, for telephone traffic, any policy which blocks a call only when all circuits are occupied is obviously optimal.

Fig. 3. The reference system: a set of parallel servers.

The model is the same as that considered in [2]. Denote by x = (x_1, …, x_N) the network state, C_1, …, C_N the server capacities and 1/μ the mean service requirement. We have:

μ_i(x) = C_i μ,

corresponding to the balance function:

Φ(x) = 1 / ∏_{i=1}^{N} (C_i μ)^{x_i}.

The network capacity region, defined so as to guarantee a minimum service rate γ, may be written:

Y = {x : ∀i, x_i ≤ C_i/γ}.

The overall system load is given by:

ρ = λ / (μ Σ_{i=1}^{N} C_i).

The monotonicity property (14) trivially holds, and it follows from Corollary 1 and Proposition 4 that the optimal insensitive load balancing is characterized by the balance function:

Λ(x) = [(|y − x| choose y − x) / (|y| choose y)] λ^{|x|}  if x ≤ y,

and Λ(x) = 0 otherwise, where y is the vector such that y_i is the largest integer smaller than C_i/γ. In view of (5), the corresponding arrival rates are:

λ_i(x) = λ (y_i − x_i)/|y − x|,  x ≤ y.

Viewing each server i as a resource of y_i potential circuits of rate γ, this boils down to routing new demands in proportion to the number of available circuits at each server. We compare the resulting blocking probability with that obtained for the best static load balancing and the greedy load balancing, respectively. We refer to the greedy strategy as that where users are routed at time t to the server i(t) with the highest potential service rate:

i(t) = arg max_{i=1,…,N} C_i / (x_i(t−) + 1).   (21)

Note that for symmetric capacities, this is equivalent to joining the shortest queue. The greedy strategy is sensitive. Results are derived by simulation for i.i.d. exponential services. Figure 5 gives the blocking probability for two servers of symmetric capacities (C_1 = C_2 = 1) and asymmetric capacities (C_1 = 1, C_2 = 0.5), with γ = 0.1. We observe that the best insensitive load balancing provides a tight approximation for the greedy strategy, which is known to be optimal for i.i.d. exponential services with the same mean (cf. Section 1). We verified by simulation that, for both symmetric and asymmetric scenarios, the performance of the greedy strategy is in fact only slightly sensitive to the service requirement distribution, and therefore that the above approximation remains accurate under general traffic assumptions.
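The optimal insensitive routing has a simple operational form: route in proportion to the free circuits at each server. A minimal sketch (the example state and circuit counts are hypothetical):

```python
def routing_probs(x, y):
    """p_i(x) = (y_i - x_i) / |y - x|: route in proportion to available circuits."""
    free = [yi - xi for xi, yi in zip(x, y)]
    total = sum(free)
    if total == 0:
        return [0.0] * len(x)    # all circuits busy: the arrival is blocked
    return [f / total for f in free]

probs = routing_probs((1, 0), (3, 1))   # free circuits (2, 1) -> probabilities (2/3, 1/3)
```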

Fig. 4. Servers with cross traffic (internal routing probabilities p_11, p_12, p_21, p_22).


Fig. 5. Two parallel links with symmetric capacities (left graph) and asymmetric capacities (right graph): blocking probability versus load for the greedy, insensitive and static policies.

4.2. Routing on a tree network

Consider here a telecommunication network constituted of N routes sharing one link of capacity C. Each route i has a maximum capacity C_i. We consider two types of traffic:

Telephone traffic, where each user requires a circuit of unit capacity. If 1/μ is the mean call duration, the service rates are φ_i(x) = x_i, and the corresponding balance function is:

Φ(x) = 1 / ∏_{i=1,...,N} x_i!

The network capacity region is then defined as:

Y = {x ∈ ℕ^N : ∑_{i=1,...,N} x_i ≤ C, x_i ≤ C_i}

Data traffic, where the state space can be defined as for telephone traffic so as to guarantee a minimum data rate γ. On this topology, the usual utility-based allocations are sensitive to the service distribution. However, a good approximation of utility-based allocations for a number of practical cases is given by balanced fairness (cf. [3]). At least for the blocking probability criterion, there is no real restriction in supposing that the service rates are balanced. For an explicit expression of the balance function, see [3]. The network capacity region is then defined as:

Y = {x ∈ ℕ^N : ∑_{i=1,...,N} x_i ≤ C/γ, x_i ≤ C_i/γ}
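In the telephone-traffic case, if the split of traffic over the routes is held fixed (a static policy), the model becomes a classical loss network whose stationary distribution has the product form π(x) ∝ ∏_i a_i^{x_i} / x_i! on Y, so blocking probabilities can be computed exactly by summation over the capacity region. Below is a small sketch of this standard Erlang/Kelly loss-network computation (not the paper's insensitive policy; the offered loads a_i are illustrative):

```python
from itertools import product
from math import factorial

def static_blocking(a, C_i, C):
    """Per-route blocking in a loss network: N routes with offered loads a[i],
    route capacities C_i[i], sharing one link of capacity C."""
    states = [x for x in product(*(range(c + 1) for c in C_i)) if sum(x) <= C]
    def weight(x):
        w = 1.0
        for ai, xi in zip(a, x):
            w *= ai ** xi / factorial(xi)
        return w
    w = {x: weight(x) for x in states}
    Z = sum(w.values())
    # A call arriving on route i is blocked iff x + e_i leaves the region Y.
    def blocked(i):
        return sum(w[x] for x in states
                   if x[i] + 1 > C_i[i] or sum(x) + 1 > C) / Z
    return [blocked(i) for i in range(len(a))]

print(static_blocking([2.0, 3.0], [4, 7], 10))
```

The capacities match the tree-network example simulated around Figure 6 (C_1 = 4, C_2 = 7, C = 10); as a sanity check, a single route of capacity 1 with unit offered load gives the Erlang-B value 1/2.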


The traffic intensity is α = λ/μ and the load parameter is ρ = α / ∑_{i=1,...,N} C_i. For telephone traffic, the blocking probability is compared with that of a greedy policy that routes the users at time t to the server with the highest potential service rate:

i(t) = arg max_{i=1,...,N} min( C_i − x_i(t), C − ∑_{j=1,...,N} x_j(t) )

The simulations for the greedy policy are made with i.i.d. exponential service times. As shown in Figure 6, for N = 2 routes, C_1 = 4, C_2 = 7 and C = 10, the performances of both policies are virtually identical.

For data traffic, we compare to a greedy policy which routes towards the server with the highest potential rate (greedy HP) and to a shortest-queue policy which routes to the server with the greatest number of remaining places (greedy SQ):

i_HP(t) = arg max_{i=1,...,N} φ_i(x(t) + e_i) / (x_i(t) + 1)

i_SQ(t) = arg max_{i=1,...,N} min( N_i − x_i(t), N − ∑_{j=1,...,N} x_j(t) )

As shown in Figure 6, for N = 2 routes, C_1 = 0.4, C_2 = 0.7, C = 1 and γ = 0.1, the performance of greedy SQ is very close to that of the insensitive policy, while greedy HP is a less efficient policy.

Fig. 6. Blocking probability versus load on a tree network for greedy and insensitive policies, for telephone traffic (left graph) and data traffic (right graph: greedy SQ, greedy HP and insensitive).


4.3. Servers with cross traffic

In [2], a generic flow-level model of data networks using balanced PS queueing networks is introduced. The authors underline that flows do not arrive as independent Poisson processes but are typically generated within sessions. A session is a succession of flows, possibly separated by intervals of inactivity. Contrary to flows, sessions naturally arrive as Poisson processes. The example described by Figure 4 illustrates such models. The routing rates are given by:

λ_i(x) = ν_i(x),   i = 1, 2,
λ_j(x) = p_{1,j} φ_1(x) + p_{2,j} φ_2(x),   j = 3, 4.

Using Corollary 1, the optimal balance function writes:
Λ(w, z) = ∑_{u_1=0}^{z_1} ∑_{u_2=0}^{z_2} C(z_1, u_1) p_11^{u_1} p_12^{z_1−u_1} C(z_2, u_2) p_21^{u_2} p_22^{z_2−u_2} Λ_y(w_1 + u_1 + u_2, w_2 + (z_1 − u_1) + (z_2 − u_2), 0, 0)

where C(z, u) denotes the binomial coefficient.

Remark that when p_12 = p_21 = 0, the system reduces to a set of parallel servers with a session structure in which each customer is served twice before leaving (cf. Figure 4). Note w + z the state (u, 0), where u_i = w_i + z_i. A simple balance function then writes:

Λ_y(w, z) = Λ_y(w + z),   if w + z ≤ y,   (22)

and Λ_y(x) = 0 otherwise. As previously, we compare the blocking probability of the insensitive policy for two different internal routing settings (p_11 = p_21 = 0.5, and p_12 = p_21 = 0), for data traffic with C_i = 1 for all i and γ = 0.2. In the first case, the greedy policy is defined as joining the server with the smallest number of customers. Contrary to the insensitive one, this policy leads to internal blocking. The simulations are made with i.i.d. exponential service times. Of course, this policy is not optimal and even naive, but it gives an idea of the good performance of the insensitive policy. In the case of parallel servers (p_12 = p_21 = 0), the greedy policy is defined as joining the queue such that:

i(t) = arg max_{i=1,...,N} ( (N_i − x_i(t)) + (N_i − z_i(t)) )

Here, the greedy policy slightly outperforms the insensitive one; the performances however remain very close.

5. Conclusion

While load balancing is a key component of computer systems and communication networks, most existing optimality and performance results are derived for specific topologies and traffic characteristics. In this paper, we have focused on those strategies that are insensitive to the distribution of service requirements, in the general setting of Whittle networks. We have characterized any insensitive load balancing as a linear combination of so-called simple strategies, both for a single customer class without internal routing and in the presence of internal routing under specific routing assumptions. This characterization leads to simple optimality and performance results that were illustrated on toy examples.

Fig. 7. Greedy routing against insensitive routing for p_11 = p_21 = 0.5: blocking probability versus load.


Fig. 8. Greedy routing against insensitive routing for p_12 = p_21 = 0: blocking probability versus load.


References

[1] M. Alanyali, B. Hajek, Analysis of simple algorithms for dynamic load balancing, Mathematics of Operations Research 22-4 (1997) 840-871.
[2] T. Bonald, A. Proutière, Insensitivity in processor-sharing networks, Performance Evaluation 49 (2002) 193-209.
[3] T. Bonald, A. Proutière, Insensitive bandwidth sharing in data networks, Queueing Systems 44-1 (2003) 69-100.
[4] T. Bonald, M. Jonckheere, A. Proutière, Insensitive load balancing, in: Proc. of ACM Sigmetrics/Performance (2004) 367-378.
[5] X. Chao, M. Miyazawa, R.F. Serfozo, H. Takada, Markov network processes with product form stationary distributions, Queueing Systems 28 (1998) 377-401.
[6] J.W. Cohen, The multiple phase service network with generalized processor sharing, Acta Informatica 12 (1979) 245-284.
[7] M.B. Combe, O.J. Boxma, Optimization of static traffic allocation policies, Theoretical Computer Science 125 (1994) 17-43.
[8] A. Ephremides, P. Varaiya, J. Walrand, A simple dynamic routing problem, IEEE Transactions on Automatic Control 25 (1980) 690-693.
[9] M. Harchol-Balter, M. Crovella, C. Murta, On choosing a task assignment policy for a distributed server system, Journal of Parallel and Distributed Computing 59 (1999) 204-228.
[10] A. Hordijk, G. Koole, On the assignment of customers to parallel queues, Probability in the Engineering and Informational Sciences 6 (1992) 495-511.
[11] F.P. Kelly, Blocking probabilities in large circuit-switched networks, Advances in Applied Probability 18 (1986) 473-505.
[12] F.P. Kelly, Routing and capacity allocation in networks with trunk reservations, Mathematics of Operations Research 15 (1990) 771-793.
[13] F.P. Kelly, Network routing, Philosophical Transactions of the Royal Society A 337 (1991) 343-367.
[14] F.P. Kelly, Loss networks, Annals of Applied Probability 1-3 (1991) 319-378.
[15] F.P. Kelly, Bounds on the performance of dynamic routing schemes for highly connected networks, Mathematics of Operations Research 19 (1994) 1-20.
[16] F.P. Kelly, Mathematical modelling of the Internet, in: Mathematics Unlimited - 2001 and Beyond (B. Engquist, W. Schmid, eds.), Springer-Verlag, Berlin (2001) 685-702.

[17] J. van Leeuwaarden, S. Aalto, J. Virtamo, Load balancing in cellular networks using first policy iteration, Technical Report, Networking Laboratory, Helsinki University of Technology, 2001.
[18] D. Mitra, R.J. Gibbens, B.D. Huang, State dependent routing on symmetric loss networks with trunk reservations, IEEE Transactions on Communications 41-2 (1993) 400-411.
[19] S. Nelakuditi, Adaptive proportional routing: a localized QoS routing approach, in: Proc. of IEEE Infocom, 2000.
[20] R. Nelson, T. Philips, An approximation for the mean response time for shortest queue routing with general interarrival and service times, Performance Evaluation 17 (1993) 123-139.
[21] S. Oueslati-Boulahia, E. Oubagha, An approach to routing elastic flows, in: Proc. of ITC 16, 1999.
[22] R. Serfozo, Introduction to Stochastic Networks, Springer, 1999.
[23] P. Sparaggis, C. Cassandras, D. Towsley, Optimal control of multiclass parallel service systems with and without state information, in: Proc. of the 32nd Conference on Decision and Control, San Antonio, 1993.
[24] S. Stidham, Optimal control of admission to a queueing system, IEEE Transactions on Automatic Control 30-8 (1985) 705-713.
[25] A.N. Tantawi, D. Towsley, Optimal static load balancing in distributed computer systems, Journal of the ACM 32-2 (1985) 445-465.
[26] D. Towsley, P. Sparaggis, C. Cassandras, Optimal routing and buffer allocation for a class of finite capacity queueing systems, IEEE Transactions on Automatic Control 37-9 (1992) 1446-1451.
[27] W. Whitt, Deciding which queue to join: some counterexamples, Operations Research 34-1 (1986) 226-244.
[28] W. Winston, Optimality of the shortest line discipline, Journal of Applied Probability 14 (1977) 181-189.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

A Markov Chain Model of Optical Packet Formation
Solution Process Acceleration with Parallel Computing

Zdzislaw SZCZERBINSKI

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences
Baltycka 5, 44-100 Gliwice, Poland
email: zdzich@iitis.gliwice.pl

1. Introduction

When modelling analytically the formation process of an optical packet from electronic blocks of messages, one usually builds a queueing model (Fig. 1), consisting of a FIFO queue and a number of Poisson sources. In our approach we limit the length of the queue to 250 blocks, and the number of sources to 20. This corresponds to the conventional length of an optical packet, equal to 250 blocks. The rationale for 20 sources is the following. Since electronic packets have variable length, from 1 to 20 blocks, it is straightforward to model this as a collection of sources, with each source generating packets of a definite length only: source 1 generates single-block electronic packets, source 2 packets of 2 blocks, source 3 packets consisting of 3 blocks, etc. All such sources form the input to the queue in a symmetric way, and we assume that the arrival rates are equal.

The formation process itself is as follows. An optical packet is ready to be sent (fired) when either 250 blocks have arrived, or fewer than 250 blocks are in the queue but the packet which has just arrived is of a length which would cause the queue to exceed 250 blocks. Besides, it is assumed that the process may not be exceedingly long; a predefined limit is imposed on the process duration.

The above phenomena are modelled as a Markov chain, depicted in Fig. 2. The nodes of the graph are the Markov states of the system. Horizontally, the edges between nodes correspond to arrivals of new electronic packets; since packets are of variable length (in blocks), there are a number of edges originating from the same node, i.e. there are possible transitions from a state to one of a number of states, depending on how many blocks have just arrived. For the states where the number of blocks in the queue approaches 250, transitions may occur back to the states with small numbers of blocks: an attempt to insert the arriving blocks would have caused the queue to overfill; hence, the formation of the optical packet is terminated, the packet is fired, and the blocks initiate a new queue. Each vertical column of nodes corresponds to a period where no new blocks are arriving. Blocks are awaited for a predefined time which may not be exceeded. This is modelled by an Erlang distribution of order 10. When the time limit is reached, the system returns to the original state: an optical packet has been formed of what was in the queue and then fired, and a new iteration of packet formation is about to begin.

Fig. 1. Queueing model of optical packet formation: 20 Poisson sources feeding a FIFO queue.

Fig. 2. Markov chain model of the packet formation process.

2. Solution of the Markov chain model

In order to obtain vital characteristics of the system, the Markov chain model must be solved, i.e. probabilities of all its states must be computed. Based on this, probabilities of respective numbers of blocks in the queue may be easily calculated as sums of probabilities in the model's vertical subchains. We limit our attention to homogeneous, continuous-time Markov chains, i.e. a state may change at any real-valued time instant and probabilities of transitions between states are stationary with respect to time. In this case, denoting the states of the chain as x_1, x_2, ..., x_n, the probability of transition from state x_i to state x_j in a very small time interval Δt is linear:

p_ij(Δt) = q_ij Δt

where q_ij represents the transition rate between states x_i and x_j. A homogeneous continuous-time Markov chain is represented by a set of states and an infinitesimal generator matrix Q = [q_ij] whose entries are the transition rates, except for the diagonal elements, whose values are such that the following holds: q_ii = −∑_{j, j≠i} q_ij. It may be shown that the stationary probability vector π, a vector of length n whose k-th element denotes the stationary probability of state x_k, can be obtained by solving the system of equations (we assume columnwise orientation of vectors, e.g. π is a column vector and r^T is a row vector):

π^T Q = 0^T

The above system has an infinite number of solutions since the matrix Q is singular. We are interested in finding the unique solution for which the sum of the probabilities (elements of vector π) is equal to 1. This condition (conservation of probability) may be directly introduced into the system by replacing one equation (e.g. the last one) with π^T e = 1, where e is a vector whose elements are all equal to 1. Q is now non-singular and the system of equations takes the form

π^T Q = b^T     (1)


where b is a vector whose elements are all equal to 0 except one element, which is equal to 1. For such a system, the unique solution which satisfies conservation of probability may be analytically computed. Let us note that (1) may be written as Q^T π = b, which allows for the application of general numerical methods for solving linear systems of the form Ax = b. The solution process is well defined in the literature; however, technical aspects make it non-trivial due to the very large numbers of states in chains modelling real-world electronic-to-optical switches.

3. Solution methods for linear systems

Solution methods for

Ax = b     (2)

are classified into two categories: direct and iterative. Direct methods give exact solutions and the number of operations they involve is fixed for a given system size. A common feature of these methods is the factorization of the matrix A. The obstacle to using them for large sparse systems, as in Markov chain models, is the presence of numerous non-zero entries in the factors, known as fill-in. There are methods for reordering matrices so as to reduce fill-in; however, they inevitably require additional computer memory which, for many problems, exceeds allowable limits. Iterative methods generate a sequence of approximate solutions until a desired accuracy is reached. Their advantage over direct methods is thus that the system can be solved to a predetermined accuracy, and so the number of operations required can be far less than that in direct solvers. Iterative methods are also preferred for other reasons. Usually, the only operation in which the matrix A (or a matrix easily derived from A) is involved is multiplication by vector(s); such operations do not alter the matrix, which is important for large and sparse matrices because compact storage schemes may then be implemented. Additionally, since the matrix is never altered, there is no rounding error propagation of the kind which characterizes direct methods. A major disadvantage of iterative methods is their frequently bad convergence (a prohibitively long time is required to reach a solution with accepted accuracy), or even divergence, for an ill-conditioned matrix A. By contrast, in direct methods an upper bound on the time required to obtain the solution is easily determined before the actual calculation; besides, the solution is (theoretically) always accurate. Nevertheless, for a whole spectrum of

applications including, among others, Markov chains, iterative solvers are more effective than direct solvers.

The oldest, classical iterative methods (Jacobi, Gauss-Seidel, SOR) converge rather slowly. A newer class of solvers, known as projection methods or Krylov subspace techniques, is currently very popular for large systems of equations. They feature fairly good convergence and are competitive with classical iterative methods in terms of memory utilization. They are also well suited to implementation on parallel computers. Here, their most important feature is that they are solely composed of simple operations on vectors, and of matrix-vector multiplications, which are easily parallelizable. By far the most widely known projection method is the conjugate gradient algorithm [2], used to solve linear systems where the matrix A is symmetric positive definite. For nonsymmetric systems of equations, generalizations of this algorithm have been developed and are in broad use; they include the methods of biconjugate gradients, conjugate gradient squared [5], and generalized minimum residual [4]. For our research we have chosen the first of the above algorithms. It is well described in the literature, has good convergence, and features two levels of possible parallelization, described in detail in the next section.

4. The biconjugate gradient method

The biconjugate gradient method was developed by Fletcher [1] from an algorithm for tridiagonalization of nonsymmetric matrices (due to Lanczos), applied to solve nonsymmetric systems of equations. Like its historical predecessor, the conjugate gradient method, it is based on the idea of minimizing the function

f(x) = (1/2) x^T A x − x^T b

which is minimized when the gradient ∇f = Ax − b = −r is zero, which is equivalent to (2).
The minimization consists in generating a succession of improved solutions x^(k), obtained while searching in two directions p^(k) and p̃^(k), which results in two residual vectors r^(k) and r̃^(k) (the original conjugate gradient method used single vectors of directions and residuals, p^(k) and r^(k)). These vectors satisfy the following conditions:

biorthogonality: r̃^(i)T r^(j) = r^(i)T r̃^(j) = 0, j < i

biconjugacy (with respect to A): p̃^(i)T A p^(j) = p^(i)T A^T p̃^(j) = 0, j < i

mutual orthogonality: r̃^(i)T p^(j) = r^(i)T p̃^(j) = 0, j < i

The algorithm may be written as follows.

1. Choose an initial approximate solution x^(0).
2. Compute the residual r^(0) = b − A x^(0).
3. Set p^(0) = r^(0), p̃^(0) = r̃^(0) = r^(0), and k = 0.
4. Perform iteratively the following sequence of computations:

   α^(k) = ( r̃^(k)T r^(k) ) / ( p̃^(k)T A p^(k) )
   x^(k+1) = x^(k) + α^(k) p^(k)
   r^(k+1) = r^(k) − α^(k) A p^(k)
   r̃^(k+1) = r̃^(k) − α^(k) A^T p̃^(k)
   β^(k) = ( r̃^(k+1)T r^(k+1) ) / ( r̃^(k)T r^(k) )
   p^(k+1) = r^(k+1) + β^(k) p^(k)
   p̃^(k+1) = r̃^(k+1) + β^(k) p̃^(k)
   increment k

The coefficients α^(k) and β^(k) are chosen to ensure the biorthogonality and the biconjugacy conditions. The method converges after m ≤ n iterations (i.e. step 4 is performed at most n times) with r^(m+1) = r̃^(m+1) = 0 and x^(m+1) being the solution. The above convergence rule is theoretical only, since it assumes exact arithmetic. In practical applications, where numerous roundoff errors occur, the algorithm proceeds beyond n iterations, until a convergence test determines that it should terminate, e.g. r^(k)T r^(k) ≤ ε, where ε is a tolerance criterion. It may be seen from the above that the most costly computations in the biconjugate gradient procedure are the two matrix-vector multiplications. To speed up computations, they can be performed in parallel, for example on a dual-processor system. On the other hand, there are a number of other easily parallelizable operations in the algorithm: vector additions and subtractions, scalar-vector multiplications and vector-vector scalar multiplications. Some of them are connected with one another, forming the so-called linked triads, where a vector is multiplied by a scalar and then added to another vector; such collective vector operations are also parallelizable.
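The iteration above can be sketched compactly; here is an illustrative Python/NumPy version (the paper's implementation is in Fortran), applied to the normalized stationary system of Section 2, Q̄^T π = b, for a toy 3-state chain with made-up rates:

```python
import numpy as np

def bicg(A, b, tol=1e-12, max_iter=100):
    """Biconjugate gradient method for a nonsymmetric system Ax = b."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    rt = r.copy()                     # shadow residual r~
    p, pt = r.copy(), rt.copy()       # search directions p, p~
    for _ in range(max_iter):
        rho = rt @ r
        Ap = A @ p
        alpha = rho / (pt @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rt = rt - alpha * (A.T @ pt)
        if r @ r <= tol:              # convergence test from the text
            break
        beta = (rt @ r) / rho
        p = r + beta * p
        pt = rt + beta * pt
    return x

# Toy 3-state generator matrix (rows sum to zero; rates illustrative).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 3.0, -5.0,  2.0],
              [ 0.0,  3.0, -3.0]])
Qbar = Q.copy()
Qbar[:, -1] = 1.0                     # replace the last equation by pi^T e = 1
b = np.array([0.0, 0.0, 1.0])
pi = bicg(Qbar.T, b)                  # stationary probabilities
print(pi)                             # sums to 1 and satisfies pi^T Q = 0
```

For this 3x3 system the iteration terminates after three steps, in line with the m ≤ n convergence rule above.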

5. Practical implementation and timing results

The biconjugate gradient method has been implemented as a Fortran program. In order to achieve acceleration, versions of the program have been developed for the available parallel computing environment. The computing platform for timing experiments was the Sun Enterprise 6500 system, featuring 10 UltraSPARC processors accessing 6 GB of shared memory and working under the Solaris 7 operating system (maximum computation rate exceeding 6 Gflops). While parallelizing the (originally) sequential program, we took advantage of Sun Microsystems' native system of program parallelization, which consists in augmenting the program with parallelizing directives for the Fortran compiler. For example, the process of computing new search directions p^(k+1) and p̃^(k+1) (see previous section), programmed as the loop

   do i = 1,n
     p(i) = bk*p(i) + r(i)
     pp(i) = bk*pp(i) + rr(i)
   end do

has been parallelized by preceding the loop with the directive

   !MIC$ DOALL PRIVATE(i) SHARED(p,r,pp,rr,bk)

which causes the loop's iterations to be performed in parallel on multiple processors. Unfortunately, Sun's native system is poor in that it only allows loops to be parallelized. This was exactly what was needed for all parallelizable operations in our algorithm, except one: where the two time-consuming matrix-vector multiplications must be performed simultaneously. Here Sun's software lacks the capability of some other systems which allow for the parallelization of independent sections of non-iterative (sequential) code. Nevertheless, we have managed to have the two operations proceed simultaneously by constructing, around them, an artificial parallel loop consisting of two iterations only: in the former iteration A is multiplied by p^(k), and in the latter A^T by p̃^(k). Obviously, the loop's body contains a test of which iteration is currently being carried out, since the two iterations perform different operations.
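The same trick can be mimicked outside Fortran; the following is a hypothetical Python illustration (not the paper's code) of running the two independent matrix-vector products side by side with a two-worker thread pool. NumPy's matrix products release the global interpreter lock, so the two operations can genuinely overlap:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
n = 400
A = rng.standard_normal((n, n))
p = rng.standard_normal(n)    # search direction p(k)
pt = rng.standard_normal(n)   # shadow direction p~(k)

def task(k):
    # "Artificial parallel loop" of two iterations: iteration 0 computes
    # A @ p, iteration 1 computes A^T @ p~, as in the Fortran trick.
    return A @ p if k == 0 else A.T @ pt

with ThreadPoolExecutor(max_workers=2) as pool:
    Ap, Atpt = pool.map(task, (0, 1))
```

The branch inside `task` plays the role of the loop body's test distinguishing the two iterations.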
We used the above program to solve the model described in the introduction. The model included 2491 states (249 block counts multiplied by 10 stages of the Erlang distribution, plus the initial state). In our experiments we varied the number of processors used for computations from 1 to 10. In Table 1, timing results are shown for the following versions of the program: sequential; parallelized: the two matrix-vector multiplications (with respect to each other) only; parallelized: vector operations only; parallelized: both the two matrix-vector multiplications and vector operations.

Table 1. Execution times, in seconds, for solving the Markov chain model of packet formation.

  Parallelization variant                        Number of processors
                                                  1    2    4    8   10
  None (sequential)                              80   80   80   80   80
  Simultaneous matrix-vector                     84   64   65   67   68
  Parallel loops                                 95   74   68   65   64
  Parallel loops and simultaneous matrix-vector 102   58   51   47   45

The results confirm the benefits of parallelizing the biconjugate gradient method. There is a considerable reduction in execution time, due to both the true parallelization (matrix-vector multiplications performed in parallel) and vectorization (parallel loops). It is seen from the second row of the table that there is no point in using more than two processors if the user concentrates on parallelizing the matrix-vector multiplications only (which is obvious, since there are only two such operations in each iteration of the algorithm). As regards parallel loops, the execution time decreases steadily as more processors are used; however, the changes are not significant between results obtained for larger numbers of processors. This is due to the fact that the one-dimensional array operations are not nearly as time-consuming as the matrix-vector multiplications. The above is confirmed if rows 2 and 3 are compared: for 2 and 4 processors, execution times in row 3 (slightly) exceed those in row 2, and for 8 and 10 processors, although the program runs faster with only vector operations parallelized, the time differences are only a few seconds. Significant differences are visible in the last row; applying the two levels of parallelization resulted in a considerable time reduction, both in comparison with the single-level parallelizations and, especially, with the sequential execution of the program.
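The diminishing returns seen for the parallel-loop variant follow the usual Amdahl-type argument: if only a fraction f of the run time is parallelized, the speedup on n processors is 1/((1 − f) + f/n), bounded by 1/(1 − f) however many processors are added. A back-of-the-envelope check (f = 0.7 is illustrative, not a measured value):

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: speedup when a fraction of the work parallelizes ideally."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / n_procs)

# Going from 1 to 2 processors helps far more than going from 8 to 10.
for n in (1, 2, 4, 8, 10):
    print(n, round(amdahl_speedup(0.7, n), 2))
```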

6. Conclusion

We have discussed a solution method for large Markov chains which model the formation process of an optical packet from electronic packets. It has been shown how to speed up the ensuing time-consuming computations with parallel methods. We are currently in the process of testing the parallelized method on Markov chains which are at least an order of magnitude larger than the one presented in this paper. Besides, we are planning to port the software to a multiprocessor Linux machine, with program parallelization in OpenMP, now the de facto standard parallel programming environment.

References

[1] R. Fletcher, Conjugate gradient methods for indefinite systems (1976), Springer-Verlag, Berlin.
[2] M.R. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Natl. Bur. Standards 49 (1952), 409-436.
[3] W.J. Knottenbelt, Parallel Performance Analysis of Large Markov Chains, Ph.D. Thesis (1999), University of London, Imperial College of Science, Technology and Medicine.
[4] C.C. Paige, M.A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM Journal on Numerical Analysis 12 (1975), 617-624.
[5] P. Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric systems, SIAM Journal on Scientific and Statistical Computing 10 (1989), 36-52.
[6] W.J. Stewart, Introduction to the Numerical Solution of Markov Chains (1994), Princeton University Press, Princeton.


[Figure: probability (log scale) versus queue length; legend: Priority Queue - Self-similar Source, Non-priority Queue - Self-similar Source, Priority Queue - Geometric Source, Non-priority Queue - Geometric Source.]


[Figure: probability (log scale) versus number of packets, for self-similar traffic and Poisson traffic.]

[Figure: probability (log scale) versus time [slot], for self-similar traffic and Poisson traffic.]


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Introduction to the accelerated simulation method RESTART

MANUEL VILLÉN-ALTAMIRANO (a) and JOSÉ VILLÉN-ALTAMIRANO (b)

(a) Telefónica I+D, Emilio Vargas 6, 28043 Madrid, Spain. manolo@tid.es

(b) Departamento de Matemática Aplicada (E.U. Informática), Univ. Politécnica de Madrid, Arboleda s/n, Madrid, Spain. jvillen@eui.upm.es

Abstract. RESTART (Repetitive Simulation Trials After Reaching Thresholds) is a widely applicable accelerated simulation technique that allows the evaluation of extremely low probabilities. This paper revisits the theoretical basis of RESTART in a more general and rigorous way. The unbiasedness of the estimator is proved and its variance is derived without the need to make any assumption on the simulated system. From this analysis, the gain obtained with RESTART, as well as the optimal parameter values that maximise that gain, are derived.

1. Introduction
Performance requirements of broadband communication networks and ultra-reliable systems are often expressed in terms of events with very low probability. Probabilities of the order of 10^{-10} are often used to specify packet losses due to traffic congestion or system failures. For the evaluation of such systems, simulation is an effective means, but acceleration methods are necessary because crude simulation requires prohibitive execution times for the accurate estimation of very low probabilities. One such method is importance sampling; see [1] for an overview. The basic idea behind this approach is to alter the probability measure governing events so that the formerly rare event occurs more often. One drawback of this technique is the difficulty of selecting an appropriate change of measure, since it depends on the system being

simulated. Researchers have, therefore, focused on finding good heuristics for particular types of models. Another method is RESTART (Repetitive Simulation Trials After Reaching Thresholds). In this method a more frequent occurrence of the formerly rare event is achieved by performing a number of simulation retrials when the process enters regions of the state space where the chance of occurrence of the rare event is higher. These regions, called importance regions, are defined by comparing the value taken by a function of the system state, the importance function, with certain thresholds. The application of this method to particular models only requires the choice of a suitable importance function, and this choice may not need a complex analysis [2]. The method was introduced by Bayes in [3] and has a precedent, with a much more limited scope of application [2], in the splitting method described in [4]. Villén-Altamirano, M. and J. coined in [5] the name RESTART and made a theoretical analysis that yields the variance of the estimator and the gain obtained with one threshold. The analysis was extended to multiple thresholds in [6]. It has allowed efficient applications of RESTART; a list of papers reporting them is provided in [2]. In the present paper the unbiasedness of the estimator is proved and its variance is derived in a more general and rigorous way. Simplifying assumptions on the independence of the system states at the entrances of the system into the importance regions, made in [5] and [6], are not used in this paper. Therefore the formula of the variance derived here is general. Optimal values for the thresholds and the number of retrials that maximise the gain obtained with RESTART are also derived in this paper. As the optimal parameters are difficult to evaluate, the paper proposes a simpler approach providing quasi-optimal parameters that lead to a gain close to the optimal one.

Optimal parameters for RESTART with multiple thresholds were also derived in [6], but while an approximate formula of the gain was used there for deriving them, the exact formula of the gain is used here. This slightly affects the results: setting the thresholds at a distance e^{-2} was recommended in [6], whereas thresholds set as close to each other as possible are now recommended. The paper is organised as follows: section 2 presents a review of the method and section 3 proves the unbiasedness of the estimator. Section 4 derives the formula of the variance, section 5 derives the formula for the simulation cost and section 6 provides the expression for the gain of efficiency. Section 7 derives optimal parameters and, finally, conclusions are stated in section 8.

2. Description of RESTART
RESTART has been described in several papers, e.g., [2,5,6]. Nevertheless, it is described here in order to have a self-contained paper. Consider the simulation of a stochastic process Z = (Z(t), t ≥ 0), with discrete state space and either discrete or continuous parameter. The process may be Markovian or non-Markovian. For the application of RESTART, we will consider the process X(t), associated with the original one, where the system state at time t does not only include the variables describing Z at time t but also:
- The variables describing Z(t1) for t1 < t that may have an impact on Z(t2) for t2 > t (e.g., the starting time of a service).
- The variables describing Z(t1) for t1 > t that are already known at time t. This knowledge of the future is inherent to the simulation technique, given that the variables of Z(t1) related to scheduled events (e.g., the time scheduled for a packet arrival) are known in advance.

Let Ω denote the state space of X(t). A nested sequence of sets of states C_i (C_1 ⊃ C_2 ⊃ ... ⊃ C_M ⊃ A) is defined, which determines a partition of the state space Ω into regions C_i − C_{i+1}; the higher the value of i, the higher the importance of the region C_i − C_{i+1}. These sets are defined by means of a function Φ: Ω → ℝ, called the importance function. Thresholds T_i (1 ≤ i ≤ M) of Φ are defined such that each set C_i is associated with Φ ≥ T_i.

As with crude simulation, RESTART can be applied for many definitions of the probability Pr{A} of the rare set A. This probability is often defined either as the probability of the system being in a state of the set A at a random instant or at the instant of occurrence of certain events¹, denoted reference events. An example of a reference event is a packet arrival. If the rare set is a buffer being full, we are not usually interested in the probability of the buffer being full at a random instant but at a packet arrival. For simplicity, the notation will only refer to the last definition. Analogously, the probability Pr{C_i} of set C_i is defined as the probability of the system being in a state of the set C_i at a reference event.

¹ From now on the term event refers to a simulation event, i.e., an instantaneous occurrence that may change the state Z. The system state resulting from the change will be called the system state at the event.

A reference event at which the system is in a state of the set A or of the set C_i is referred to as an event A or an event C_i, respectively. Two additional events, B_i and D_i, are defined as follows: B_i: event at which Φ ≥ T_i, Φ having been < T_i at the previous event; D_i: event at which Φ < T_i, Φ having been ≥ T_i at the previous event.

RESTART works as follows. A simulation path, called the main trial, is performed in the same way as if it were a crude simulation. It lasts until it reaches a predefined "end of simulation" condition. Each time an event B_1 occurs in the main trial, the system state is saved, the main trial is interrupted, and R_1 − 1 retrials of level 1 are performed. Each retrial of level 1 is a simulation path that starts with the state saved at B_1 and finishes when an event D_1 occurs. After the R_1 − 1 retrials of level 1 have been performed, the main trial continues. Note that the total number of simulated paths [B_1, D_1), including the portion [B_1, D_1) of the main trial, is R_1. Each of these R_1 paths is called a trial [B_1, D_1). The main trial, which continues after D_1, leads to new sets of retrials of level 1 if new events B_1 occur. Events B_2 may occur during any trial [B_1, D_1). Each time an event B_2 occurs, an analogous process is performed: R_2 − 1 retrials of level 2, starting at B_2 and finishing at D_2, are performed, leading to a total number of R_2 trials [B_2, D_2). The trial [B_1, D_1), which continues after D_2, may lead to new sets of retrials of level 2 if new events B_2 occur. In general, R_i trials [B_i, D_i) (1 ≤ i ≤ M) are performed each time an event B_i occurs in a trial [B_{i-1}, D_{i-1}), each starting with the state saved at B_i. The number R_i is constant for each value of i. A retrial of level i also finishes if it reaches the "end of simulation" condition before the occurrence of event D_i. The term trial [B_i, D_i), often used in the rest of the paper, refers indistinctly to a complete or to a prematurely finished trial [B_i, D_i).

In the case of a transient simulation, the main trial is made by means of successive replicas of the transient period. Each replica lasts until it reaches a predefined end-of-replica condition. Retrials are made as described above, with the particularity that a retrial also finishes if it reaches the end-of-replica condition. Figure 1 illustrates a RESTART simulation with M = 3, R_1 = R_2 = 4, R_3 = 3, in which the chosen importance function Φ also defines the set A, as Φ ≥ T_L. Bold, thin, dashed and dotted lines are used to distinguish the main trial and the retrials of levels 1, 2 and 3, respectively.
Figure 1: Simulation with RESTART.

Note that RESTART performs oversampling in high-importance regions. The oversampling made in the region C_i − C_{i+1} (C_M if i = M) is given by the accumulated number of retrials:

r_i = \prod_{j=1}^{i} R_j \qquad (1 \le i \le M)

Thus, for statistics taken on all the trials, the weight assigned to a trial when it is in the region C_i − C_{i+1} (C_M if i = M) must be 1/r_i. The end-of-simulation condition, or the condition for the start or the end of a simulation portion (e.g., the initial transient phase, a batch of a batch-means simulation or a replica of a transient simulation), may be defined in the same way as in crude simulation. For example, the condition may be that a predefined value of the simulated time or of the number of simulated reference events is reached. These conditions hold for a trial when the sum of the time (or of the number of reference events) simulated in the trial and in all its predecessors reaches the predefined value. Some more notation:

- r_0 = R_0 = 1; C_0 = Ω; C_{M+1} = A;
- P_{h/i} (0 ≤ i ≤ h ≤ M+1): probability of the set C_h at a reference event, knowing that the system is in a state of the set C_i at that reference event. As C_h ⊂ C_i, P_{h/i} = Pr{C_h}/Pr{C_i}. Note that P_{h/i} P_{i/j} = P_{h/j};

- P = P_{M+1/0} = P_{A/0} = Pr{A};
- P_{A/i} = P_{M+1/i};
- N_A: total number of events A that occur in the simulation (in the main trial or in any retrial);
- N_A^0: number of events A that occur in the main trial;
- N_i^0 (1 ≤ i ≤ M): number of events B_i that occur in the main trial;
- N: number of reference events simulated in the main trial;
- a_i (1 ≤ i ≤ M): expected number of events C_i in a trial [B_i, D_i);
- X_i (1 ≤ i ≤ M): random variable describing the state of the system at an event B_i (of the main trial) randomly taken;
- Ω_i (1 ≤ i ≤ M): set of possible system states at an event B_i;
- P*_{A/X_i} (1 ≤ i ≤ M): importance of state X_i, defined as the expected number of events A in a trial [B_i, D_i) (without counting upper-threshold retrials) when the system state at B_i is X_i. Note that P*_{A/X_i} is a random variable which takes the value P*_{A/x_i} when X_i = x_i;
- P*_{A/i} (1 ≤ i ≤ M): expected importance of an event B_i:

P^*_{A/i} = E[P^*_{A/X_i}] = \int_{\Omega_i} P^*_{A/x_i} \, dF(x_i)

where F(x_i) is the distribution function of X_i. Note that P*_{A/i} = a_i P_{A/i} and that P*_{A/i} = E[N_A^0] / E[N_i^0];
- V(P*_{A/X_i}) (1 ≤ i ≤ M): variance of the importance of an event B_i:

V(P^*_{A/X_i}) = E[(P^*_{A/X_i})^2] - (P^*_{A/i})^2        (1)

3. Unbiasedness of the estimator

The estimator of the probability of the rare set A in a RESTART simulation depends on how this probability has been defined. For the definition adopted in this paper, the estimator for P is:

\hat{P} = \frac{N_A}{N \, r_M}        (2)

where N takes a fixed value, which controls the end-of-simulation condition. Note that the weight assigned to N_A is 1/r_M, given that N_A includes the events A occurred in all the trials, while the weight given to N is 1, since N only includes the reference events occurred in the main trial. The unbiasedness of the estimator is proved by induction: the estimator of P in a crude simulation is \hat{P} = N_A^0 / N, which is an unbiased estimator. As the crude simulation is equivalent to a RESTART simulation with M = 0, and formula (2) becomes \hat{P} = N_A^0 / N for M = 0, the estimator of P in a RESTART simulation is unbiased for 0 thresholds. Thus, it is enough to prove that if it is unbiased for M − 1 thresholds, it is also unbiased for M thresholds.

Consider a simulation with M thresholds (T_1 to T_M). If the retrials of level 1 (and their corresponding upper-level retrials) are not taken into account, we have a simulation with M − 1 thresholds (T_2 to T_M). Let N_A and \hat{P} denote the number of events A and the estimator of P, respectively, in the simulation with M thresholds, and N_A^{M-1} and \hat{P}^{M-1} the number of events A and the estimator of P in the simulation with M − 1 thresholds. Define χ_m as the random variable indicating the sum of the number of events A occurring in the m-th trial [B_1, D_1) performed from each event B_1 of the simulation, counting all the events A occurring in the corresponding upper-level retrials. Note that, among the R_1 trials [B_1, D_1) performed from each event B_1 in the M-threshold simulation, only the one being a portion of the main trial belongs to the (M − 1)-threshold simulation. Assigning m = 1 to this trial:

N_A^{M-1} = \chi_1 ; \qquad N_A = \sum_{m=1}^{R_1} \chi_m

As the R_1 trials [B_1, D_1) made from each event B_1 start with an identical system state, E[χ_1] = E[χ_2] = ... = E[χ_{R_1}]. Thus, as R_1 is constant:

E[N_A] = R_1 \, E[N_A^{M-1}]

and consequently:

E[\hat{P}] = \frac{E[N_A]}{N \prod_{i=1}^{M} R_i} = \frac{E[N_A^{M-1}]}{N \prod_{i=2}^{M} R_i} = E[\hat{P}^{M-1}] = P        (3)

This proves that \hat{P} is also unbiased in a RESTART simulation with M thresholds.

4. Variance of the estimator

The variance of the estimator is also derived by induction: first a formula is derived for 0 thresholds (crude simulation) and generalised for M thresholds; then it is proved that if the generalised formula holds for M − 1 thresholds, it also holds for M thresholds.

4.1. Variance for 0 thresholds (crude simulation)
In a crude simulation the variance of the estimator is given by:

V(\hat{P}) = \frac{V(N_A^0)}{N^2} = \frac{P}{N} \, \frac{V(N_A^0)}{E[N_A^0]} = K_A \frac{P}{N}

where K_A = V(N_A^0) / E[N_A^0]. In simulations defined with a constant time duration t, K_A is the index of dispersion for counts, IDC(t), of the process of occurrence of events A for the time t simulated. In any case, K_A is a measure of the autocorrelation of the process of occurrence of events A. If the process is uncorrelated, K_A is close to 1 (exactly, K_A = 1 − P).

The definition of K_A also applies to a RESTART simulation, where N_A^0 is the number of events A in the main trial.
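As a quick numerical sanity check (with arbitrary illustrative numbers, not values from the paper): for an uncorrelated occurrence process the reference events are i.i.d. Bernoulli, so K_A = 1 − P and the formula above reduces to the binomial variance P(1 − P)/N:

```python
import random

rng = random.Random(7)
P_true, N, runs = 0.05, 500, 4000

estimates = []
for _ in range(runs):
    n_A0 = sum(1 for _ in range(N) if rng.random() < P_true)  # events A in one crude run
    estimates.append(n_A0 / N)                                # crude estimator N_A^0 / N

mean_est = sum(estimates) / runs
var_est = sum((e - mean_est) ** 2 for e in estimates) / (runs - 1)

K_A = 1 - P_true                 # uncorrelated process: K_A = 1 - P exactly
var_theory = K_A * P_true / N    # K_A * P / N  =  P(1 - P)/N
```

The empirical variance of the 4000 crude estimates should match K_A P/N closely.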

4.2. Variance for M thresholds
The variance of \hat{P} in a RESTART simulation with M thresholds is given by:

V(\hat{P}) = K_A \frac{P}{N} \left[ \frac{1}{r_M} + \sum_{i=1}^{M} \frac{s_i \, P_{A/i} \, (R_i - 1)}{r_i} \right]        (4)

with:

s_i = \frac{V(E[N_A^0 \mid \Xi_i])}{K_A \, P_{A/i} \, E[N_A^0]} \qquad (1 \le i \le M)        (5)

where Ξ_i = (N_i^0, (X_i^1, X_i^2, ..., X_i^{N_i^0})), N_i^0 being the random variable indicating the number of events B_i occurred in the main trial of a simulation randomly taken and (X_i^1, X_i^2, ..., X_i^{N_i^0}) being the vector of random variables describing the system states at those events B_i. Formula (5) is further developed in section 4.3 to gain insight on s_i. The formula provided in section 4.1 for 0 thresholds is an application of formula (4) to the case M = 0 (where r_M = r_0 = 1). Let us now prove that if formula (4) holds for M − 1 thresholds it also holds for M thresholds. Consider the two related (M − 1)- and M-threshold simulations described in section 3.
Assume that the variance of \hat{P}^{M-1} is given by adapting formula (4) to the case of M − 1 thresholds, numbered from 2 to M:

V(\hat{P}^{M-1}) = K_A \frac{P}{N} \left[ \frac{1}{\prod_{k=2}^{M} R_k} + \sum_{i=2}^{M} \frac{s_i \, P_{A/i} \, (R_i - 1)}{\prod_{k=2}^{i} R_k} \right]        (6)

V(\hat{P}^{M-1}) and V(\hat{P}) can be written as:

V(\hat{P}^{M-1}) = V(E[\hat{P}^{M-1} \mid \Xi_1]) + E[V(\hat{P}^{M-1} \mid \Xi_1)]        (7)

V(\hat{P}) = V(E[\hat{P} \mid \Xi_1]) + E[V(\hat{P} \mid \Xi_1)]        (8)

As E[χ_1 | Ξ_1] = E[χ_2 | Ξ_1] = ... = E[χ_{R_1} | Ξ_1], a reasoning analogous to that used to derive formula (3) leads to:

E[\hat{P} \mid \Xi_1] = E[\hat{P}^{M-1} \mid \Xi_1]        (9)

and consequently to:

V(E[\hat{P} \mid \Xi_1]) = V(E[\hat{P}^{M-1} \mid \Xi_1])        (10)

Formula (10) relates the first terms of formulas (7) and (8). Let us now relate their second terms. As

\hat{P} = \frac{N_A}{N \prod_{i=1}^{M} R_i} = \frac{1}{R_1} \sum_{m=1}^{R_1} \frac{\chi_m}{N \prod_{i=2}^{M} R_i}

we can write:

V(\hat{P} \mid \Xi_1) = \frac{1}{R_1^2} \, V\!\left( \sum_{m=1}^{R_1} \frac{\chi_m}{N \prod_{i=2}^{M} R_i} \;\middle|\; \Xi_1 \right)
The R_1 random variables χ_m are identically distributed but dependent, given that the vector state Ξ_1 of the events B_1 at which the trials start is the same for any value of m. However, conditional on the value of Ξ_1, they are i.i.d. Thus V(\hat{P} | Ξ_1) becomes:

V(\hat{P} \mid \Xi_1) = \frac{1}{R_1} \, V\!\left( \frac{\chi_1}{N \prod_{i=2}^{M} R_i} \;\middle|\; \Xi_1 \right) = \frac{1}{R_1} \, V(\hat{P}^{M-1} \mid \Xi_1)

thus:

E[V(\hat{P} \mid \Xi_1)] = \frac{1}{R_1} \, E[V(\hat{P}^{M-1} \mid \Xi_1)]        (11)

Substituting (10) and (11) in (8) and subtracting equation (7) divided by R_1 from (8), the following relation is obtained:

V(\hat{P}) = \frac{V(\hat{P}^{M-1})}{R_1} + \frac{R_1 - 1}{R_1} \, V(E[\hat{P}^{M-1} \mid \Xi_1])        (12)

Applying successively formula (9) for M − 1, M − 2, ..., 2 and 1 thresholds:

E[\hat{P}^{M-1} \mid \Xi_1] = E[\hat{P}^0 \mid \Xi_1] = \frac{E[N_A^0 \mid \Xi_1]}{N}        (13)

Substituting (6) and (13) in (12), taking into account that E[N_A^0] = N P and considering the definition of s_i given in formula (5), formula (4) is obtained.
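As a worked special case (obtained by direct substitution, using r_1 = R_1), formula (4) with a single threshold reads:

```latex
V(\hat{P}) \;=\; K_A \frac{P}{N}\left[\frac{1}{R_1} + s_1\,P_{A/1}\,\frac{R_1-1}{R_1}\right]
\;\xrightarrow[R_1\to\infty]{}\; K_A \frac{P}{N}\, s_1\, P_{A/1}
```

Increasing R_1 thus drives the variance down only to the floor K_A (P/N) s_1 P_{A/1}, while the cost (17) grows linearly with R_1; this trade-off is what the quasi-optimal parameters derived below balance.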

4.3. Analysis of the factors s_i
In order to gain insight on factor s_i, the following formula for V(E[N_A^0 | Ξ_i]) is derived in the appendix:

V(E[N_A^0 \mid \Xi_i]) = (P^*_{A/i})^2 \, V(N_i^0) + E[N_i^0] \, V(P^*_{A/X_i}) + 2 \sum_{m=1}^{\infty} E[\mathrm{Max}(0, N_i^0 - m)] \, \mathrm{ACV}_m(P^*_{A/X_i})        (14)

where ACV_m(P*_{A/X_i}) is the autocovariance of P*_{A/X_i} at lag m. For 1 ≤ i ≤ M, if we define:

K'_i = \frac{V(N_i^0)}{E[N_i^0]}

and

\eta_i = 1 + \frac{2 \sum_{m=1}^{\infty} E[\mathrm{Max}(0, N_i^0 - m)] \, \mathrm{ACV}_m(P^*_{A/X_i})}{E[N_i^0] \, V(P^*_{A/X_i})}

then formula (14) becomes:

V(E[N_A^0 \mid \Xi_i]) = E[N_i^0] \, (P^*_{A/i})^2 \left[ K'_i + \eta_i \, \frac{V(P^*_{A/X_i})}{(P^*_{A/i})^2} \right]        (15)

Substituting (15) in (5) and taking into account that P*_{A/i} = a_i P_{A/i} and that E[N_A^0] = P*_{A/i} E[N_i^0], the following expression of s_i is obtained:

s_i = \frac{a_i}{K_A} \left[ K'_i + \eta_i \, \frac{V(P^*_{A/X_i})}{(P^*_{A/i})^2} \right], \qquad 1 \le i \le M        (16)

Let us analyse formula (16):

- Factor K'_i: this factor is a measure of the autocorrelation of the process of occurrence of events B_i in the main trial. If the process is uncorrelated, K'_i is close to 1 (exactly, K'_i = 1 − P_{i/0} a_i). In most applications, the process has a weak positive autocorrelation and K'_i is slightly greater than 1.
- Factor η_i: if the random variables X_i were independent (as assumed in [5] and [6]), all the covariances ACV_m(P*_{A/X_i}) would be zero and thus η_i = 1. In general, η_i is a measure of the dependence of the importance of the system states X_i of events B_i occurring in the main trial. In most practical applications, there may be some dependence between system states of close events B_i, but this dependence is negligible for distant events B_i. Thus η_i is usually close to 1, or at least of the same order of magnitude as 1.
- Ratio V(P*_{A/X_i}) / (P*_{A/i})²: it greatly depends on the chosen importance function and may have an important impact on the efficiency of RESTART.

5. Simulation cost

Let us define the cost C of a simulation as the computer time required for the simulation, taking as time unit the average computer time per reference event in a crude simulation of the system. With this definition of time unit, the cost of a crude simulation with N reference events is C = N. In a RESTART simulation, the average cost of a reference event is always greater, as overheads are involved in the implementation of RESTART: (1) for each event, an overhead mainly due to the need to evaluate the importance function and to compare it with the threshold values, and (2) for each retrial, an overhead mainly due to the restoration of event B_i (which includes restoring the system state at B_i and re-scheduling

the scheduled events). To account for these overheads, the average cost of a reference event in a RESTART simulation is inflated (1) by a factor y_e > 1 in any case, and (2) by an additional factor y_{ri} > 1 if the reference event occurs in a retrial of level i. Using the above definition of time unit, the average cost per reference event is y_0 = y_e in the main trial and y_i = y_e y_{ri} (1 ≤ i ≤ M) in a retrial of level i. As the expected number of reference events in the retrials of level i of a RESTART simulation (with N reference events in the main trial) is N P_{i/0} r_{i-1} (R_i − 1), the expected cost of the simulation is:

C = N \left[ y_0 + \sum_{i=1}^{M} y_i \, P_{i/0} \, r_{i-1} (R_i - 1) \right]        (17)

Remark: the factors y_i affect the simulation cost when it is measured in terms of required computer time, but not when it is measured in terms of the number of events to be simulated. In that case y_i = 1 (0 ≤ i ≤ M).

6. Simulation gain with RESTART

A measure of the efficiency for computing \hat{P} is given by the relative confidence-normalized cost, RCNC, which is defined as C V(\hat{P}) / P². To compare the RCNC of several estimators is equivalent to comparing the computer costs for a fixed relative width of the confidence interval. The gain G obtained with RESTART can be defined as the ratio of the RCNC with crude simulation to the RCNC with RESTART. From the formulas of V(\hat{P}) and C (V(\hat{P}) = K_A P / N and C = N in a crude simulation, formulas (4) and (17) in a RESTART simulation) and defining s_0 = 0, s_{M+1} = 1 and y_{M+1} = 0, the following expression of the gain is obtained:

G = \frac{1}{P \left[ \sum_{i=0}^{M} \frac{s_{i+1} \, u_i \, (1 - P_{i+1/i})}{P_{i+1/0} \, r_i} \right] \left[ \sum_{i=0}^{M} y_i \, v_{i+1} \, P_{i/0} \, r_i \, (1 - P_{i+1/i}) \right]}        (18)

where

u_i = \frac{1 - \dfrac{s_i}{s_{i+1}} P_{i+1/i}}{1 - P_{i+1/i}} ; \qquad v_{i+1} = \frac{1 - \dfrac{y_{i+1}}{y_i} P_{i+1/i}}{1 - P_{i+1/i}} \qquad (0 \le i \le M)

7. Quasi-optimal parameters

To maximise the gain G in formula (18), the factors s_i and y_i must be minimised, and optimal values for P_{i/0} (or, equivalently, P_{i/i-1}) and r_i need to be derived. Let us focus in this section on the optimal values of P_{i/i-1} and r_i. These optimal values, which are functions of s_i and y_i, have been derived in [6]. However, in a practical application, the values of s_i and y_i are difficult to evaluate. Therefore, approximations of the optimal values of P_{i/i-1} and r_i that are independent of s_i and y_i and given by simple expressions are recommended. As these approximations of the optimal parameters provide a gain close to that obtained with the optimal ones, they are called quasi-optimal parameters. These parameters are derived assuming that the product s_{i+1} u_i takes the same value for every i (0 ≤ i ≤ M) and that the same occurs for the product y_i v_{i+1}. With these assumptions, quasi-optimal parameters maximising the gain are easily derived from (18) in three steps. For fixed values of P_{i/i-1}, quasi-optimal values of r_i are derived:

r_i = \frac{1}{P_{i/0} \sqrt{P_{i+1/i}}} \qquad (1 \le i \le M)        (19)

In practice, as the number of retrials R_i must be an integer, a value close to that given by (19) that satisfies this restriction must be chosen for r_i. After substituting (19) in (18), quasi-optimal values of P_{i/i-1} for a fixed number of thresholds are derived:

P_{i/i-1} = P^{\frac{1}{M+1}}        (20)

This equation means that, for a fixed number of thresholds, the quasi-optimal gain is obtained when all the probabilities P_{i/i-1} have the same value. After substituting (19) and (20) in (18), the quasi-optimal value of M is derived. In [6] the factor (R_i − 1) appearing in equations (4) and (17) was approximated by R_i. With this approximation, the quasi-optimal value of M obtained leads to P_{i/i-1} = e^{-2}. However, if R_i − 1 is not approximated by R_i, then the larger the value of M, the greater the gain. Thus P_{i/i-1} must be as close as possible to 1, i.e., the thresholds must be set as close as possible. In practice, there are two limitations on how close the thresholds can be set: one is due to the values that Φ can take when it is a discrete function; the other is due to the restrictions on the value of R_i derived from the chosen thresholds. This value must be an integer greater than one, given that R_i = 1 means that T_i is not really a threshold. The quasi-optimal gain, obtained when r_i and P_{i/i-1} are given by (19) and (20), respectively, and M tends to infinity, is given by:

G = \frac{1}{P \left( \mathrm{AVG}(s) \ln \frac{1}{P} + 1 \right) \left( \mathrm{AVG}(y) \ln \frac{1}{P} + y_0 \right)}        (21)

where AVG(s) and AVG(y) are the arithmetic means of s_i and y_i (1 ≤ i ≤ M), respectively.
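A numerical illustration of these recipes, with every value assumed for the example (target P = 10⁻¹⁰, equal steps P_{i/i-1} = 1/2 so that the integer restriction on R_i is satisfied exactly, idealised P_{i/0} = (1/2)^i, and unit factors AVG(s) = AVG(y) = y_0 = 1):

```python
import math

P = 1e-10                 # assumed target probability of the rare set
p = 0.5                   # P_{i/i-1}: thresholds as close as integer R_i allow

# Formula (20): P_{i/i-1} = P**(1/(M+1))  =>  M + 1 = ln P / ln p
M = round(math.log(P) / math.log(p)) - 1

# Formula (19): r_i = 1/(P_{i/0} * sqrt(P_{i+1/i})), so R_i = 1/sqrt(p*p) = 2
R_i = round(1 / math.sqrt(p * p))

# Relative cost C/N from formula (17) with y_i = 1, P_{i/0} = p**i (idealised
# equal steps; exactly, P**(1/(M+1)) = 0.4977) and quasi-optimal r_{i-1}
r = [1.0]                                    # r_0 = 1
for i in range(1, M + 1):
    r.append(1 / (p ** i * math.sqrt(p)))
C_over_N = 1 + sum(p ** i * r[i - 1] * (R_i - 1) for i in range(1, M + 1))

# Formula (21): quasi-optimal gain with AVG(s) = AVG(y) = y0 = 1
G = 1 / (P * (math.log(1 / P) + 1) ** 2)
```

With these numbers: M = 32 thresholds, R_i = 2 retrials per threshold, a relative cost C/N of about 23 (close to the ln(1/P) + y_0 ≈ 24 appearing in the denominator of (21)) and a gain of roughly 1.7 × 10⁷ over crude simulation.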

8. Conclusions

The paper has revisited the theoretical basis of RESTART in a general and rigorous way. The unbiasedness of the estimator has been proved and its variance has been derived without the need to make any assumption on the simulated system. Thus the formula obtained for the variance generalises the one obtained in previous papers under some assumptions. Quasi-optimal values for the thresholds and the number of retrials, which are easy to use in practical applications and lead to a gain close to the optimal one, have been derived.

Appendix: Development of V(E[N_A^0 | Ξ_i])

From the definition of X_i, related to an event B_i randomly taken (see section 2), and of Ξ_i = (N_i^0, (X_i^1, X_i^2, ..., X_i^{N_i^0})), related to a simulation randomly taken (see section 4.2), the following formula is derived:

E[f(X_i)] = \frac{E\left[ \sum_{l=1}^{N_i^0} f(X_i^l) \right]}{E[N_i^0]}        (22)

where f(X_i) is any real function of X_i. For f(X_i) = P*_{A/X_i}, as E[P*_{A/X_i}] = P*_{A/i} and E[\sum_{l=1}^{N_i^0} P*_{A/X_i^l}] = E[N_A^0], formula (22) leads to P*_{A/i} = E[N_A^0] / E[N_i^0], a formula already given in section 2.
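Formula (22) states that an average over a randomly taken event is a ratio of expectations, not the mean of per-simulation averages; the two differ whenever N_i^0 is correlated with the states. A toy numerical contrast (every distribution here is invented purely for the illustration):

```python
import random

rng = random.Random(3)
f = lambda x: x * x

num = den = 0.0
naive_sum, naive_cnt = 0.0, 0
for _ in range(20000):
    # One "simulation": a common factor makes N and the states correlated.
    scale = rng.choice([1.0, 2.0])
    n = rng.randint(0, int(4 * scale))             # N_i^0 for this simulation
    xs = [scale * rng.random() for _ in range(n)]  # states X_i^l
    num += sum(f(x) for x in xs)                   # estimates E[sum_l f(X_i^l)]
    den += n                                       # estimates E[N_i^0]
    if n > 0:                                      # biased alternative: mean of
        naive_sum += sum(f(x) for x in xs) / n     # per-simulation averages
        naive_cnt += 1

ratio = num / den               # right-hand side of (22); exact value here is 1.0
naive = naive_sum / naive_cnt   # underweights the long (high-scale) simulations
```

Here E[f(X_i)] over a randomly taken event equals E[N s²/3]/E[N] = 1 exactly, while the naive average of averages settles around 0.86, because it underweights the simulations containing many events.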


For f(X_i) = (P*_{A/X_i})², formula (22) leads to:

V(P^*_{A/X_i}) = \frac{E\left[ \sum_{l=1}^{N_i^0} (P^*_{A/X_i^l})^2 \right]}{E[N_i^0]} - (P^*_{A/i})^2        (23)

Formula (22) can be extended to evaluate the expectation of a function of the system states at two events B_i. It allows one to define the autocovariance of P*_{A/X_i} at lag m, ACV_m(P*_{A/X_i}), as follows:

\mathrm{ACV}_m(P^*_{A/X_i}) = \frac{E\left[ \sum_{l=1}^{N_i^0 - m} P^*_{A/X_i^l} \, P^*_{A/X_i^{l+m}} \right]}{E[\mathrm{Max}(0, N_i^0 - m)]} - (P^*_{A/i})^2        (24)

Note that E[Max(0, N_i^0 − m)] is the expected number of terms of the numerator, given that the number of terms is zero when N_i^0 − m < 1.
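The development that follows relies on the combinatorial identity 2 Σ_{m≥1} E[Max(0, N − m)] = E[N²] − E[N], which holds because Σ_{m≥1} max(0, n − m) = n(n − 1)/2 for every integer n. A quick numerical confirmation, using an arbitrary distribution for N:

```python
import random

rng = random.Random(5)
samples = [rng.randint(0, 20) for _ in range(50000)]  # arbitrary distribution for N
n_runs = len(samples)

# 2 * sum_m E[Max(0, N - m)]  (m up to 20 suffices, since N <= 20 here)
lhs = 2 * sum(sum(max(0, n - m) for m in range(1, 21)) for n in samples) / n_runs
# E[N^2] - E[N]
rhs = (sum(n * n for n in samples) - sum(samples)) / n_runs
```

The equality holds sample by sample, so the two empirical averages coincide exactly.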
Based on the previous relations, let us develop an expression for V(E[N_A^0 | Ξ_i]):

V(E[N_A^0 \mid \Xi_i]) = E[(E[N_A^0 \mid \Xi_i])^2] - (E[N_A^0])^2

Define Z_i^l as the random variable indicating the number of events A in the trial [B_i, D_i) starting at the l-th event B_i of the main trial. As N_A^0 = \sum_{l=1}^{N_i^0} Z_i^l and

E\left[ \sum_{l=1}^{N_i^0} Z_i^l \;\middle|\; \Xi_i \right] = \sum_{l=1}^{N_i^0} E[Z_i^l \mid \Xi_i] = \sum_{l=1}^{N_i^0} P^*_{A/X_i^l}

then:

E[(E[N_A^0 \mid \Xi_i])^2] = E\left[ \left( \sum_{l=1}^{N_i^0} P^*_{A/X_i^l} \right)^2 \right] = E\left[ \sum_{l=1}^{N_i^0} (P^*_{A/X_i^l})^2 \right] + 2 \sum_{m=1}^{\infty} E\left[ \sum_{l=1}^{N_i^0 - m} P^*_{A/X_i^l} \, P^*_{A/X_i^{l+m}} \right]

Taking into account formulas (23) and (24), that E[N_A^0] = P*_{A/i} E[N_i^0] and that

2 \sum_{m=1}^{\infty} E[\mathrm{Max}(0, N_i^0 - m)] = E[(N_i^0)^2] - E[N_i^0]

formula (14) of section 4.3 is obtained.

References

[1] P. Heidelberger. Fast Simulation of Rare Events in Queueing and Reliability Models. ACM Transactions on Modeling and Computer Simulation, Vol. 5, pages 43-85, 1995.
[2] M. Villén-Altamirano and J. Villén-Altamirano. On the Efficiency of RESTART for Multidimensional Systems. To appear in ACM Transactions on Modeling and Computer Simulation.
[3] A.J. Bayes. Statistical Techniques for Simulation Models. Australian Computer Journal, Vol. 2, pages 180-184, 1970.
[4] H. Kahn and T.E. Harris. Estimation of Particle Transmission by Random Sampling. National Bureau of Standards Applied Mathematics Series, Vol. 12, pages 27-30, 1951.
[5] M. Villén-Altamirano and J. Villén-Altamirano. RESTART: A Method for Accelerating Rare Event Simulations. In 13th International Teletraffic Congress, pages 71-76, 1991.
[6] M. Villén-Altamirano, A. Martínez-Marrón, J. L. Gamo and F. Fernández-Cuesta. Enhancement of the Accelerated Simulation Method RESTART by Considering Multiple Thresholds. In 14th International Teletraffic Congress, pages 787-810, 1994.



Entropy Maximisation and Queueing Network Models with Blocking

DEMETRES KOUVATSOS
Networks and Performance Engineering Research Group, Department of Computing, School of Informatics, University of Bradford, Bradford BD7 1DP, West Yorkshire, England, UK

Abstract. This tutorial presents an overview of an analytic framework for a unified exposition of entropy maximisation and complex queueing systems and networks with blocking. In this context, a universal maximum entropy (ME) solution is characterised, subject to appropriate mean value constraints, for the joint state probability distribution of a complex single-server queueing system with finite capacity, N, distinct either priority or non-priority classes of jobs, R, general (G-type) class inter-arrival and service time processes and either a complete or a partial buffer sharing scheme. The ME solution leads to the establishment of closed-form expressions for the aggregate and marginal state probabilities and, moreover, it can be stochastically implemented by making use of the generalised exponential (GE) distribution towards the least biased approximation of G-type distributions with known first two moments. Subsequently, explicit analytic formulae are presented for the estimation of the Lagrangian coefficients via asymptotic connections to the corresponding infinite capacity queue and GE-type formulae for the blocking probabilities per class. Furthermore, it is shown that the ME solution can be utilised, in conjunction with GE-type flow approximation formulae, as a cost-effective building block towards the determination of a universal ME product-form approximation and a queue-by-queue decomposition algorithm for the performance analysis of complex open queueing network models (QNMs) with arbitrary configuration and repetitive service (RS) blocking.

1. Introduction
Queueing network modelling is widely recognised as a powerful tool for representing discrete flow systems, such as computer, communication and flexible manufacturing systems, as complex networks of queues and servers and for analysing their performance. Within this framework, the servers represent the active or passive resources of the system, such as processors, memory and communication devices, and the jobs circulating through the servers stand for the programs, messages or components being processed by and competing for these resources. The overall action of the system is described in terms of the assembly of jobs carried out by the individual resources and the available branches representing paths of information flows. Jobs at each resource concerned may be injected at service completion towards one or more of its output branches, subject to some routing criteria, leading to processing requests at other resources. Such transactions take varying times to be performed and jobs may arrive at resources during random time intervals. Thus, the queueing network model (QNM) leads to the concepts of resources being either busy or idle and of branches containing queues of jobs. Hence the performance analysis and evaluation of discrete flow systems requires the study of general queueing systems and networks. Classical queueing theory provides a conventional framework for formulating and solving the QNM. The variability of inter-arrival and service times of jobs can be modelled by probability distributions. Exact and approximate analytical methods have been proposed in the literature for solving equations describing system performance (e.g., [1-9]). These techniques lead to efficient computational algorithms for analysing QNMs and over the years a vast amount of progress has been made worldwide.
Since the mid-60s, however, it became increasingly evident that, despite persistent attempts for generalisation, classical queueing theory cannot easily handle, by itself, complex queueing systems and networks with many interacting elements. In particular, exact closed-form solutions for QNMs with finite capacity, and, thus, blocking, are not generally attainable except for special cases such as two-station cyclic queues and reversible networks (c.f., Kelly [8]). As a consequence, cost-effective numerical techniques and analytic approximations are needed for the study of complex queueing systems and arbitrary QNMs with multiple classes of jobs, finite capacity and general inter-arrival and service times. To this end, alternative ideas and mathematical tools, analogous to those applied in the field of Statistical Mechanics, have been proposed in the literature (e.g., Benes

[10]). It can be argued that one of the most fundamental requirements in the analysis of complex queueing systems is the provision of a convincing interpretation for a probability assignment free from arbitrary assumptions. In a more general context, this was the motivation behind the principle of Maximum Entropy (ME), originally developed and thoroughly discussed by Jaynes [11-12] in Statistical Physics. The principle provides a self-consistent method of inference for estimating an unknown but true probability distribution, based on information expressed in terms of known true mean value constraints. It is based on the concept of the entropy functional introduced earlier in Information Theory by Shannon [13]. Tribus [14] used the principle to derive a number of probability distributions. The mathematical foundations of the method and its generalisation to the principle of Minimum Relative Entropy (MRE) can be found in Shore and Johnson [15,16]. Authoritative expositions of the ME principle and its applications can be found in Kapur [17] and in Kapur and Kesavan [18]. Moreover, the principles of ME and MRE have inspired the establishment of a new and powerful framework for the approximate analysis of queueing systems and networks (e.g., [19-36]). This tutorial presents an overview of an analytic framework for a unified exposition of earlier works on entropy maximisation and complex queueing systems and networks.
In this context, the principle of ME is applied, subject to suitable mean value constraints, to characterise a universal joint state probability distribution of a complex single server queueing system at equilibrium with finite capacity, N (N > 0), R (R ≥ 2) distinct either priority or non-priority classes of jobs, general (G-type) class inter-arrival and service times and mixed service disciplines drawn from First-Come-First-Served (FCFS), Last-Come-First-Served (LCFS) with (LCFS-PR) or without (LCFS-NPR) Preemption and Processor Share (PS) rules under a Complete Buffer Sharing (CBS) scheme, and Preemptive Resume (PR) and Non-Preemptive Head-of-Line (HOL) rules under either CBS or Partial Buffer Sharing (PBS) schemes. The ME solution leads to the establishment of closed-form expressions for the aggregate and marginal state probabilities and, moreover, it can be stochastically implemented by making use of the generalised exponential (GE) distribution towards the least biased approximation of G-type distributions with known first two moments. Consequently, explicit analytic formulae are presented for the estimation of the Lagrangian coefficients via asymptotic connections to the corresponding infinite capacity queue and GE-type formulae for the blocking probabilities per class. Furthermore, it is shown that the ME solution can be used as a cost-effective building block, in conjunction with GE-type flow approximation formulae (c.f., [24-36]), towards the determination

of a universal ME product-form approximation and a queue-by-queue decomposition algorithm for the performance analysis of arbitrary QNMs with repetitive-service (RS) blocking. Note that in the context of a GE-type queueing network, the traffic entering and flowing through each queueing station of the network is bursty and it is approximated by a Compound Poisson Process (CPP) with geometrically distributed bulk sizes (e.g., [26]). This particular process corresponds to a GE inter-arrival time distribution and it is most appropriate (under renewality assumptions) to model simultaneous job arrivals at queueing stations generated by different bursty sources (e.g., voice or high resolution video). In this context, the burstiness of the arrival process is characterised by the squared coefficient of variation (SCV) of the inter-arrival time and, subsequently, the size of the incoming bulk. The principle of ME is introduced in Section 2. The GE distribution is presented in Section 3. A universal ME solution for a complex single server queue is characterised in Section 4. A universal ME product-form approximation and an outline of a queue-by-queue decomposition algorithm for arbitrary QNMs with RS blocking are described in Section 5. Conclusions and further comments, including references to ME applications in the performance modelling and evaluation of ATM networks and mobile systems, follow in Section 6.

Remarks:

(i) The RS Blocking Mechanism

In QNMs with finite capacity, blocking arises because the flow of jobs through one queue may be momentarily halted if the destination queue has reached its capacity. Various types of blocking mechanisms born out of different studies have been considered in the literature (e.g., [26-32],[37-40]). Comprehensive reviews on open and closed QNMs with blocking have been compiled, respectively, by Perros [41] and Onvural [42]. An authoritative exposition of the subject appears in Perros [43].
One of the most important blocking mechanisms applicable to telecommunication, production and flexible manufacturing systems is that of repetitive service (RS) blocking with either fixed (RS-FD) or random (RS-RD) destination (e.g., [26,41-43]). This kind of blocking occurs when a job upon service completion at queue i attempts to join a destination queue j whose capacity is full. Consequently, the job is rejected by queue j and immediately receives another service at queue i. In the case of RS-FD blocking, this is repeated until the job completes service at queue i at a moment where the destination queue j is not full. In the RS-RD case, each time the job completes

service at queue i, a downstream queue is selected independently of the previously chosen destination queue j. Due to the nature of the RS mechanism, deadlock can only arise under the RS-FD mechanism.

(ii) The CBS and PBS Schemes

The CBS and PBS buffer management schemes are applicable to both priority and non-priority service disciplines (c.f., Kouvatsos et al [29-32]). Note that priority classes are indexed from 1 to R, with class 1 having the highest priority. Under the CBS scheme, jobs of any class can join a finite capacity queue as long as there is space. This buffering scheme is applicable to the analysis of both priority and non-priority queues in computer and data communication systems (e.g., [27-31,44]). Under the PBS scheme, a sequence of thresholds (N_1, N_2, ..., N_R : N_1 = N, N_i < N, i = 2,3,...,R) is set on the finite capacity, N, of the queue such that the highest priority jobs (i = 1) can join the queue simply if there is space. However, lower priority jobs can join the queue only if the total number of jobs of the same class in the queue is less than their threshold value. Once the number of lower priority jobs waiting for service reaches their threshold value, all arriving jobs of the same class will be lost. The PBS scheme belongs to the space priority mechanisms of high speed networks, such as Asynchronous Transfer Mode (ATM) networks, which are used to control the allocation of buffer space to arriving jobs (cells) at an input or output port queue of a switch. Implicitly, they provide several grades of service through the selective discarding of low priority jobs.
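The two admission rules can be sketched as a simple acceptance test; the function names and the list-based state below are illustrative only, not part of the original model.

```python
def admit_cbs(n, N):
    """CBS: a job of any class may join as long as the total occupancy is below N."""
    return sum(n) < N

def admit_pbs(n, i, N, thresholds):
    """PBS: a class i job (class 1 = highest priority, N_1 = N) may join only if
    there is space AND the class i occupancy is below its threshold N_i."""
    return sum(n) < N and n[i - 1] < thresholds[i - 1]

# Example: N = 10, R = 3 classes with thresholds (N_1, N_2, N_3) = (10, 6, 3).
print(admit_pbs([2, 4, 2], 3, 10, [10, 6, 3]))  # True: space left, class 3 below threshold
print(admit_pbs([2, 4, 3], 3, 10, [10, 6, 3]))  # False: class 3 has reached its threshold
print(admit_cbs([3, 4, 3], 10))                 # False: buffer is full
```

Note that under PBS a low-priority arrival can be lost even when the buffer still has free space, which is precisely the selective-discarding behaviour described above.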

2. The Principle of ME
2.1 Formalism

Consider a system Q that has a set of possible discrete states S = (S_0, S_1, S_2, ...), which may be finite or countably infinite, and whose state S_n, n = 0,1,2,..., may be specified arbitrarily. Suppose the available information about Q places a number of constraints on P(S_n), the probability distribution that Q is in state S_n. Without loss of generality, it is assumed that these take the form of mean values of several suitable functions {f_1(S_n), f_2(S_n), ..., f_m(S_n)}, where m is less than the number of possible states. The principle of maximum entropy (ME) [11,14-18] states that, of all distributions satisfying the constraints supplied by the given information, the minimally prejudiced distribution P(S_n) is the one that maximises the system's entropy function

    H(P) = - Σ_{S_n ∈ S} P(S_n) log P(S_n),                                (2.1)

subject to the constraints

    Σ_{S_n ∈ S} P(S_n) = 1,                                                (2.2)

    Σ_{S_n ∈ S} f_k(S_n) P(S_n) = F_k,  k = 1,2,...,m,                     (2.3)

where {F_k} are the prescribed mean values defined on the set of functions {f_k(S_n)}, k = 1,2,...,m. Note that in a stochastic context, for example, these functions may be defined on the state space S of a Markov process with states {S_n}, n ≥ 0, and P(S_n) can be interpreted as the asymptotic probability distribution of state S_n at equilibrium.

The maximisation of H(P), subject to constraints (2.2)-(2.3), can be carried out using Lagrange's method of undetermined multipliers, leading to the solution

    P(S_n) = (1/Z) exp{ - Σ_{k=1}^{m} β_k f_k(S_n) },  S_n ∈ S,            (2.4)

where exp{-β_k}, k = 1,2,...,m, are the Lagrangian coefficients determined from the set of constraints {F_k}, and Z, known in statistical physics as the partition function (or normalising constant), is given by

    Z = exp{β_0} = Σ_{S_n ∈ S} exp{ - Σ_{k=1}^{m} β_k f_k(S_n) },          (2.5)

where β_0 is the Lagrangian multiplier that corresponds to the normalisation constraint. It can be verified that the Lagrangian multipliers β_k, k = 1,2,...,m, satisfy the relations

    - ∂β_0 / ∂β_k = F_k,  k = 1,2,...,m,                                   (2.6)

while the maximal value of the ME functional can be expressed by

    max_P H(P) = β_0 + Σ_{k=1}^{m} β_k F_k.                                (2.7)

Although it is not generally possible to solve (2.6) for {β_k} explicitly in terms of {F_k}, numerical methods for obtaining approximate solutions are available. When system Q has a countably infinite set of states, S, the entropy function H(P) is an infinite series having no upper limit, even under the normalisation constraint. However, the added expected values {F_k} of (2.3) introduce the upper bound (2.7) and the ME solution P(S_n) exists. The characterisation of a closed-form ME solution requires a priori estimates of the above multipliers in terms of the constraints {F_k}. Note that these constraints may not all be known a priori, but it may be known that they exist. This information, therefore, can be incorporated into the ME formalism in order to characterise the form of the state probability (2.4). As a result, the mean value constraints may become explicit parameters of the ME solution. The analytic implementation of this solution, however, clearly requires the a priori calculation of these constraints via exact or approximate queueing-theoretic formulae expressed in terms of known basic system parameters.
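Relation (2.6) rarely yields the multipliers in closed form, but for a single mean-value constraint the search is one-dimensional. The following sketch (a toy example of ours, not from the tutorial) recovers the multiplier by bisection on a finite state space, exploiting the fact that the ME mean is monotone in the multiplier.

```python
import math

def me_distribution(beta, K):
    """ME solution (2.4) on states {0, 1, ..., K} with the single constraint
    f(n) = n: P(n) = exp(-beta * n) / Z, a truncated geometric distribution."""
    w = [math.exp(-beta * n) for n in range(K + 1)]
    Z = sum(w)                        # partition function, eq. (2.5)
    return [v / Z for v in w]

def solve_beta(F, K):
    """Bisection on the multiplier: the ME mean is strictly decreasing in beta,
    so we search for the beta whose distribution reproduces the constraint F."""
    def mean(beta):
        return sum(n * p for n, p in enumerate(me_distribution(beta, K)))
    lo, hi = -50.0, 50.0              # mean(lo) is near K, mean(hi) is near 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > F:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = solve_beta(F=1.5, K=10)
p = me_distribution(beta, 10)
# the recovered distribution reproduces the prescribed mean F = 1.5
```

The same monotone structure is what makes the fixed-point determination of Lagrangian coefficients tractable in the queueing applications that follow.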

2.2 Justification of ME
The principle of ME has its roots in the principle of insufficient reason given by Bernoulli in 1713, which implies that "a probability assignment is a state of knowledge" and that "the outcomes of an event should be considered initially equally probable unless there is evidence to make us think otherwise". Jaynes [11-12] used the concept of entropy in order to extend Bernoulli's principle to the constrained problem, where prior information about the system is available. The entropy functional may be described as the expected amount of uncertainty that exists prior to the system occupying one of its states. In the absence of prior information, the entropy attains its maximum when all outcomes of an event are equally probable. Thus, one should initially start with a distribution of ME (i.e., a uniform-type distribution) and then adjust this distribution to maximise the entropy subject to what is known. In this context, the principle of ME may be stated as follows: given the propositions of an event and any information relating to them, the best estimate for the

corresponding probabilities is the distribution that maximises the entropy subject to the available information. In an information theoretic context [11], the ME solution corresponds to the maximum disorder of system states, and thus is considered to be the least biased distribution estimate of all solutions that satisfy the system's constraints. In sampling terms, Jaynes [12] has shown that, given the imposed constraints, the ME solution can be experimentally realised in overwhelmingly more ways than any other distribution. Major discrepancies between the ME distribution and the experimentally observed distribution indicate that important physical constraints have been overlooked. Conversely, experimental agreement with the ME solution represents evidence that the constraints of the system have been properly identified. The maximisation of H(P), subject to constraints (2.2)-(2.3), uniquely characterises the form of the ME solution P(S_n), S_n ∈ S, satisfying the consistency inference criteria proposed by Shore and Johnson [15]. It can be shown that the maximisation of any other functional, subject to constraints (2.2)-(2.3), either produces the same distribution as that of ME or is in conflict with the consistency criteria.

2.3 ME Analysis in Systems Modelling


In the field of systems modelling, expected values of various performance distributions of interest, such as the number of jobs and the idle state probabilities in each resource queue concerned, are often known, or may be explicitly derived, in terms of moments of the inter-arrival and service time distributions (e.g., [33-36]). Note that the determination of the distributions themselves, via classical queueing theory, may prove an infeasible task even for a system of queues with moderate complexity. Hence, the method of entropy maximisation may be applied, as appropriate, to characterise useful information-theoretic exact and approximate performance distributions of queueing systems and networks.

Focusing on a general open QNM, the ME solution (2.4) may be interpreted as a product-form approximation, subject to the mean values {F_k}, k = 1,2,...,m, viewed as marginal-type constraints per queue. Thus, for an open QNM, entropy maximisation suggests a decomposition of the network into individual queues with revised inter-arrival and service times. Consequently, each queue of the network can be solved in isolation. Note that the marginal ME queue length distributions, in conjunction with suitable formulae for the first two moments of the effective flow, play the role of cost-effective building blocks towards the computation of the performance metrics (c.f., [26]).

3. The GE Distribution
The GE distribution is of the form

    F(t) = P(W ≤ t) = 1 - τ e^{-σt},  t ≥ 0,                               (3.1)

where

    τ = 2 / (C² + 1),                                                      (3.2)

    σ = τν,                                                                (3.3)

W is a mixed-time random variable of the inter-event time, while 1/ν is the mean and C² is the squared coefficient of variation (SCV) of W. Note that measurements of actual traffic or service times in complex queueing systems are generally limited and only a few parameters, such as the mean and variance, can be computed reliably. In this case, the choice of a GE distribution - which is completely determined in terms of its first two moments - implies least bias (i.e., no introduction of arbitrary and, therefore, false assumptions). For C² > 1, the GE model is a mixed-time probability distribution and it can be interpreted as either an extremal case of the family of two-phase exponential distributions (e.g., Hyperexponential-2 (H_2)) having the same ν and C², where one of the two phases has zero service time; or a bulk-type distribution with an underlying counting process equivalent to a Compound Poisson Process (CPP) with parameter 2ν/(C² + 1) and a geometrically distributed bulk size with mean (C² + 1)/2 and SCV (C² - 1)/(C² + 1), given by

    P(N_cp = n) = Σ_{i=1}^{n} (σ^i e^{-σ} / i!) C(n-1, i-1) τ^i (1-τ)^{n-i},  if n ≥ 1,
    P(N_cp = 0) = e^{-σ},                                                  (3.4)

where C(·,·) denotes the binomial coefficient and N_cp is a CPP random variable of the number of events per unit time corresponding to a stationary GE-type inter-event random variable.
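The GE model of (3.1)-(3.3) is straightforward to simulate: with probability 1 - τ an inter-event time is zero (a within-bulk arrival), otherwise it is exponential with rate σ. The sketch below (illustrative code of ours, not from the tutorial) confirms the first two moments empirically.

```python
import random

def ge_sample(mean, scv, rng):
    """Draw one GE inter-event time with the given mean 1/nu and SCV C^2:
    zero with probability 1 - tau, else exponential with rate sigma = tau * nu."""
    nu = 1.0 / mean
    tau = 2.0 / (scv + 1.0)           # (3.2)
    sigma = tau * nu                  # (3.3)
    if rng.random() >= tau:
        return 0.0
    return rng.expovariate(sigma)

rng = random.Random(42)
xs = [ge_sample(mean=2.0, scv=9.0, rng=rng) for _ in range(200_000)]
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
print(m, var / m ** 2)                # approximately 2.0 and 9.0
```

The zero-time atom is what produces the geometric bulks of the equivalent CPP: a run of zero inter-event times corresponds to jobs arriving in the same batch.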

The GE distribution is versatile, possessing pseudo-memoryless properties which make the solution of many GE-type queueing systems and networks analytically tractable (e.g., [24-36]). Moreover, it has been experimentally established that the GE model, due to its extremal characteristics, defines performance bounds over corresponding solutions based on two-phase distributions with the same two moments as the GE (c.f., [26]). The GE distribution can be interpreted as an ME solution, subject to the constraints of normalisation, discrete-time zero probability and expected value. In this sense, it can be viewed as the least biased distribution estimate, given the available information in terms of the constraints. For C² < 1, the GE distributional model (with F(0) < 0) cannot be physically interpreted as a stochastic model. However, it can be meaningfully considered as a pseudo-distribution function of a flow model approximation of an underlying stochastic model (with C² < 1), in which negative branching pseudo-probabilities (or weights) are permitted. In this sense, all analytical GE-type exact and approximate results obtained for queueing networks with C² > 1 can also be used - by analogy - as useful heuristic approximations when C² < 1 (e.g., [23-26]). Note that the utility of other improper two-phase type distributions with C² < 1 has been proposed in the field of systems modelling by various authors (e.g., Sauer [45], Nojo and Watanabe [46]). In the context of entropy maximisation and queueing systems, the relative accuracy and cost-effectiveness of ME solutions for GE-type queueing systems largely depend on the following two requirements:

- the identification of all relevant mean value constraints, which can be stochastically determined via exact (or approximate) formulae involving known basic system parameters, and
- the approximation of the corresponding Lagrangian coefficients explicitly in terms of these constraints.
Moreover, both for computational efficiency and tractability purposes, it is appropriate to use GE-type approximation formulae for the first two moments of the effective flow (departure, splitting, merging) streams of individual queues within an arbitrary queueing network. To this end, the ME solution, in conjunction with GE-type flow approximation formulae, can be used as a cost-effective building block within an iterative queue-by-queue decomposition algorithm for complex QNMs with arbitrary configuration and blocking.


4. Maximum Entropy Analysis of a Complex G/G/1/N Queue


Consider a complex single server queue at equilibrium with R (R ≥ 2) distinct classes of jobs, denoted by G/G/1/N, such that the total buffer capacity is N and the vector N is specified either by N (N > 0) for a CBS scheme or by a sequence of thresholds {(N_1, ..., N_R), N_1 = N, N_i < N, i = 2,3,...,R} for a PBS scheme. The inter-arrival and service times are generally (G) distributed and the service disciplines, in conjunction with the buffer management schemes, are classified for application purposes into three cases, namely

    Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS},
    Case 2: {(PR, HOL) with CBS},
    Case 3: {(HOL) with PBS}.

Note that the rules and mechanisms of Cases 1 and 2 are broadly applicable to computer systems, conventional communication networks and flexible manufacturing systems, whilst Case 3 relates to ATM networks with space and service priorities. Moreover, for each class i, i = 1,2,...,R, the arrival process is assumed to be censored (i.e., a job will be lost if on arrival it finds a full buffer) with mean arrival rate λ_i and inter-arrival time SCV C_i². Jobs are serviced with mean service rate μ_i and service time SCV C_si², i = 1,2,...,R.

Notation

Let at any given time

- δ_j, j = 1,2,...,J (J ≤ N), δ_j ∈ [1, R], be the class of the jth ordered job in the G/G/1/N queueing system, where J is the total number of jobs present,
- n_i, i = 1,2,...,R, be the number of class i jobs in the G/G/1/N queueing system (waiting or receiving service),
- S = (δ_1, ..., δ_J) for Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS}, or S = (n_1, n_2, ..., n_R, δ) for Cases 2 and 3: {(PR, HOL) with CBS} and {(HOL) with PBS}, be a joint system state, where δ_1 or δ (1 ≤ δ ≤ R) denotes the class of the current job in service and Σ_{i=1}^{R} n_i ≤ N (n.b., for an idle queue S ≡ 0 with δ_1 = δ = 0),
- Q be the set of all feasible states S,
- n = (n_1, n_2, ..., n_R), 0 ≤ n_i ≤ N_i, Σ_{i=1}^{R} n_i ≤ N (n.b., 0 = (0,...,0)), be a joint system state,
- Ω be the set of all feasible states n.

For a G/G/1/N queue with PR scheduling, a job in service, if any, always belongs to the highest priority class present. In this case the index δ is clearly redundant and the vector state S may be symbolised directly by the vector n. Finally, for notational purposes, N ≡ N_i, i = 1,2,...,R, under a CBS scheme.


4.1 Prior Information

For each state S, S ∈ Q, and class i, i = 1,2,...,R, the following auxiliary functions are defined:

    n_i(S) = the number of class i jobs present in state S,

    s_i(S) = 1, if the job in service is of class i; 0, otherwise,

    h_i(S) = 1, if n_i(S) > 0 (Cases 2 and 3); 0, if n_i(S) = 0 (Cases 2 and 3) and for all n_i(S) ≥ 0 in Case 1,

    f_i(S) = 1, if Σ_{j=1}^{R} n_j(S) = N_i and s_i(S) = 1; 0, otherwise.

Suppose all that is known about the state probabilities {P(S), S ∈ Q} is that the following mean value constraints exist:

(i) Normalisation,

    Σ_{S ∈ Q} P(S) = 1.                                                    (4.1)

(ii) Server utilisation per class, U_i (0 < U_i < 1),

    Σ_{S ∈ Q} s_i(S) P(S) = U_i,  i = 1,2,...,R.                           (4.2)

(iii) Busy state probability per class, θ_i (0 < θ_i < 1),

    Σ_{S ∈ Q} h_i(S) P(S) = θ_i,  i = 1,2,...,R                            (4.3)

(n.b., this constraint is only applicable to Cases 2 and 3).

(iv) Mean queue length per class, L_i (U_i < L_i < N_i),

    Σ_{S ∈ Q} n_i(S) P(S) = L_i,  i = 1,2,...,R.                           (4.4)

(v) Full buffer state probability per class, φ_i (0 < φ_i < 1),

    Σ_{S ∈ Q} f_i(S) P(S) = φ_i,  i = 1,2,...,R,                           (4.5)

satisfying the flow balance equations, namely

    λ_i (1 - π_i) = μ_i U_i,  i = 1,2,...,R,                               (4.6)

where π_i is the blocking probability that a job of class i on arrival finds the queue full. The choice of mean values (4.1)-(4.5) is based on the generalisation of the type of constraints used for the ME analysis of a stable single class FCFS G/G/1/N queue (e.g., Kouvatsos [35]). Note that if additional constraints are used, it is no longer feasible to

capture, as a cost-effective building block, an ME solution P(S), S ∈ Q, in closed form. As a consequence, this will have adverse implications towards the creation of a computationally efficient queue-by-queue decomposition algorithm for arbitrary QNMs. Conversely, the removal of one or more constraints from the set (4.1)-(4.5) will result in an ME solution with reduced accuracy (c.f., Jaynes [12]).

4.2 A Universal Maximum Entropy Solution

A universal form of the state probability distribution, P(S), S ∈ Q, can be characterised by maximising the entropy functional

    H(P) = - Σ_{S ∈ Q} P(S) log P(S),                                      (4.7)

subject to the constraints (4.1)-(4.5). By employing Lagrange's method of undetermined multipliers (e.g., [11,15,17]), a universal ME solution is expressed by

    P(S) = 1/Z,                                                            S = 0,
    P(S) = (1/Z) Π_{i=1}^{R} g_i^{s_i(S)} β_i^{h_i(S)} x_i^{n_i(S)} y_i^{f_i(S)},  S ∈ Q - {0},      (4.8)

where Z is the normalising constant and {g_i, β_i, x_i, y_i} are the Lagrangian coefficients corresponding to constraints (4.2)-(4.5), respectively. Note that recent ME solutions proposed in [27-32] are all special cases of (4.8). Furthermore, aggregating (4.8) over all feasible states S ∈ Q and after some manipulations, the joint ME queue length distribution P(n), n ∈ Ω, is given by:

    P(0) = 1/Z,  for all Cases 1-3,                                        (4.9a)

Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS}

    P(n) = (1/Z) [ (Σ_{j=1}^{R} n_j - 1)! / Π_{j=1}^{R} n_j! ] ( Π_{j=1}^{R} x_j^{n_j} ) Σ_{i=1}^{R} n_i g_i y_i^{f_i(n)},  n ∈ Ω - {0},      (4.9b)

Case 2: {(PR) with CBS}

    P(n) = (1/Z) g_i β_i x_i^{n_i} y_i^{f_i(n)} Π_{j=i+1}^{R} x_j^{n_j} β_j^{h_j(n)},  n ∈ Ω - {0},      (4.9c)

where i denotes the highest priority class present in state n (i.e., i = min{j : n_j > 0}),

Cases 2 and 3: {(HOL) with either CBS or PBS}

    P(n) = (1/Z) ( Π_{i=1}^{R} x_i^{n_i} β_i^{h_i(n)} ) ( Σ_{j=1, n_j>0}^{R} g_j y_j^{f_j(n)} ),  n ∈ Ω - {0},      (4.9d)

where h_i(n) = 1, if n_i > 0, or 0, otherwise, and f_i(n) = 1, if Σ_{j=1}^{R} n_j = N_i, or 0, otherwise.
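The joint solution (4.9b) can be exercised by brute-force enumeration for a small buffer. In the sketch below (all coefficient values are made up for illustration), the normalisation is recomputed directly from the enumerated state space rather than taken from a closed form.

```python
from itertools import product
from math import factorial

def case1_joint(g, x, y, N):
    """Joint ME queue length distribution of (4.9b), Case 1 under CBS
    (so N_i = N for all classes), normalised over all feasible n."""
    R = len(g)
    states = [n for n in product(range(N + 1), repeat=R) if 0 < sum(n) <= N]
    def weight(n):
        J = sum(n)
        mult = factorial(J - 1)
        for nj in n:
            mult /= factorial(nj)                 # (J-1)! / prod n_j!
        xs = 1.0
        for j in range(R):
            xs *= x[j] ** n[j]                    # prod x_j^{n_j}
        f = 1 if J == N else 0                    # f_i(n): full buffer indicator
        return mult * xs * sum(n[i] * g[i] * (y[i] ** f) for i in range(R))
    Z = 1.0 + sum(weight(n) for n in states)      # "+1" for the empty state (4.9a)
    P = {n: weight(n) / Z for n in states}
    P[(0,) * R] = 1.0 / Z
    return P

P = case1_joint(g=[0.6, 0.9], x=[0.3, 0.4], y=[1.2, 0.8], N=4)
assert abs(sum(P.values()) - 1.0) < 1e-12
```

Summing this joint distribution over all states with a fixed total population reproduces the aggregate expressions (4.10)-(4.12) that follow.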

4.3 Marginal and Aggregate Performance Distributions

The joint ME solution (4.9a)-(4.9d) can be used, as appropriate, to establish closed-form expressions for the marginal utilisations {U_i, i = 1,2,...,R}, the aggregate state probabilities {P(n), n = 0,1,...,N} and the marginal state probabilities {P_i(n_i), n_i = 0,1,...,N_i}, respectively.

Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS}

Aggregating over all relevant states, the aggregate queue length distribution {P(n), n = 0,1,...,N} is given by

    P(n) = 1/Z,                                      n = 0,
    P(n) = (1/Z) ( Σ_{i=1}^{R} g_i x_i ) X^{n-1},    n = 1,2,...,N-1,      (4.10)
    P(n) = (1/Z) ( Σ_{i=1}^{R} g_i x_i y_i ) X^{N-1},  n = N,

where

    Z = 1 + Σ_{i=1}^{R} g_i x_i [ (1 - X^{N-1}) / (1 - X) + X^{N-1} y_i ]  (4.11)

and

    X = Σ_{i=1}^{R} x_i.                                                   (4.12)
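Expressions (4.10)-(4.12) can be checked numerically. The sketch below (coefficient values are made up; it assumes X ≠ 1) builds the aggregate distribution and confirms that the normalising constant (4.11) makes it sum to one.

```python
def case1_aggregate(g, x, y, N):
    """Aggregate ME queue length distribution (4.10)-(4.12) for Case 1.
    g, x, y are per-class Lagrangian coefficients; N is the buffer capacity."""
    R = len(g)
    X = sum(x)                                           # (4.12), assumed != 1
    gx = sum(g[i] * x[i] for i in range(R))
    gxy = sum(g[i] * x[i] * y[i] for i in range(R))
    Z = 1.0 + sum(g[i] * x[i] * ((1 - X ** (N - 1)) / (1 - X) + X ** (N - 1) * y[i])
                  for i in range(R))                     # (4.11)
    P = [1.0 / Z]                                        # n = 0
    P += [gx * X ** (n - 1) / Z for n in range(1, N)]    # n = 1..N-1
    P.append(gxy * X ** (N - 1) / Z)                     # n = N
    return P

P = case1_aggregate(g=[0.6, 0.9], x=[0.3, 0.4], y=[1.2, 0.8], N=5)
assert abs(sum(P) - 1.0) < 1e-12   # (4.11) guarantees normalisation
```

The geometric factor X^{n-1} shows that, below the full-buffer boundary, the aggregate queue length behaves like a single-class GE-type queue with composite parameter X.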

Moreover, summing up over all appropriate state probabilities, the marginal probabilities {P_i(n_i), n_i = 0,1,...,N} and the marginal utilisations {U_i, i = 1,2,...,R} are expressed by

    P_i(0) = (1/Z) [ 1 + Σ_{j≠i} x_j g_j ( (1 - x̂_i^{N-1}) / (1 - x̂_i) + y_j x̂_i^{N-1} ) ],      (4.13)

    P_i(n_i) = (1/Z) [ g_i x_i^{n_i} Σ_{k=0}^{N-n_i-1} C(n_i+k-1, n_i-1) x̂_i^k
               + Σ_{j≠i} x_j g_j x_i^{n_i} Σ_{k=0}^{N-n_i-2} C(n_i+k, n_i) x̂_i^k
               + y_i g_i C(N-1, n_i-1) x_i^{n_i} x̂_i^{N-n_i}
               + Σ_{j≠i} x_j g_j y_j C(N-1, n_i) x_i^{n_i} x̂_i^{N-n_i-1} ],      (4.14)

for all n_i = 1,2,...,N-1, where x̂_i = X - x_i and C(·,·) denotes the binomial coefficient, and

    P_i(N) = (1/Z) g_i x_i^N y_i,                                          (4.15)

    U_i = (1/Z) g_i x_i [ (1 - X^{N-1}) / (1 - X) + X^{N-1} y_i ].         (4.16)

Note that by using Pascal's triangle equality and carrying out laborious but not complex operations, cost-effective recursive expressions for the marginal probabilities (4.13)-(4.15) can be obtained (c.f., Denazis [28]).

Cases 2 and 3: {(PR,HOL) with CBS} & {(HOL) with PBS}

By taking advantage of the "product" form of the ME solution (4.9c)-(4.9d) and applying the generating function approach [47], recursive expressions for the marginal utilisations, aggregate and marginal probabilities can be obtained. It can be observed that the marginal utilisation U_i, i = 1,...,R, is clearly defined by U_i = Σ_{S ∈ Q} s_i(S) P(S) and, after some manipulations (c.f., [30-32]), it is determined by a universal form, namely

    U_i = (1/Z) g_i β_i x_i Σ_{u=0}^{N_i - 1} y_i^{Δ(u)} C^{(i)}(u),       (4.17)

where Δ(u) = 1, if u = N_i - 1, or 0, otherwise, and C^{(i)}(u) can be calculated recursively according to the particular priority discipline and buffer management scheme. For example, focusing on Case 3: {(HOL) with PBS}, C^{(i)}(u) is expressed by

    C^{(i)}(u) = (1 - β_i) x_i C^{(i)}(u-1) + C(u) - x_i^{N_i} C(u - N_i) + β_i x_i^{N_i + 1} C^{(i)}(u - N_i - 1),      (4.18)

for u = 0,1,..., and i = 1,...,R, with initial conditions C^{(i)}(u) = 0, if u < 0, or 1, if u = 0, and C(u) = C_R(u), where C_R(u) can be calculated recursively by

    C_r(u) = x_r C_r(u-1) + C_{r-1}(u) - (1 - β_r) x_r C_{r-1}(u-1) + β_r x_r^{N_r + 1} C_{r-1}(u - N_r - 1),      (4.19)

with initial conditions C_r(u) = 0, if u < 0, or 1, if u = 0, and C_1(u) = β_1 x_1^u, if u > 0 (c.f., [32]). Similar recursive relations for C^{(i)}(u) can be established in Case 2: {(PR,HOL) with CBS} (c.f., [30-31]). From eq. (4.17), the following expression for the normalising constant Z is derived

    Z = 1 + Σ_{i=1}^{R} g_i β_i x_i Σ_{u=0}^{N_i - 1} y_i^{Δ(u)} C^{(i)}(u).      (4.20)
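The recursions (4.18)-(4.19) translate directly into memoised code. The sketch below follows the recursions as printed here for Case 3 (HOL with PBS); all coefficient values and thresholds are invented for illustration, and the code shows the computational pattern rather than the tutorial's own implementation.

```python
from functools import lru_cache

def make_coefficients(x, beta, thresholds):
    """Build C(u) = C_R(u) via (4.19) and the per-class C^{(i)}(u) via (4.18).
    x[i], beta[i] are Lagrangian coefficients; thresholds[i] is N_{i+1}."""
    R = len(x)

    @lru_cache(maxsize=None)
    def C_r(r, u):
        if u < 0:
            return 0.0
        if u == 0:
            return 1.0
        if r == 1:
            return beta[0] * x[0] ** u            # initial condition C_1(u)
        Nr = thresholds[r - 1]
        return (x[r - 1] * C_r(r, u - 1) + C_r(r - 1, u)
                - (1 - beta[r - 1]) * x[r - 1] * C_r(r - 1, u - 1)
                + beta[r - 1] * x[r - 1] ** (Nr + 1) * C_r(r - 1, u - Nr - 1))

    def C(u):
        return C_r(R, u) if u >= 0 else 0.0       # C(u) = C_R(u)

    @lru_cache(maxsize=None)
    def C_i(i, u):
        if u < 0:
            return 0.0
        if u == 0:
            return 1.0
        Ni = thresholds[i]
        return ((1 - beta[i]) * x[i] * C_i(i, u - 1) + C(u)
                - x[i] ** Ni * C(u - Ni)
                + beta[i] * x[i] ** (Ni + 1) * C_i(i, u - Ni - 1))

    return C, C_i

C, C_i = make_coefficients(x=[0.5, 0.3], beta=[1.0, 0.7], thresholds=[6, 4])
assert C(0) == 1.0 and C_i(0, 0) == 1.0
```

Because each value depends only on smaller arguments, the memoised recursion runs in time linear in the number of (class, level) pairs, which is what makes the queue-by-queue decomposition cost-effective.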

Unconditioning expressions (4.17) over all classes, a relationship for the aggregate utilisation, U, can be obtained in terms of the aggregate arrival rate λ. Consequently, the aggregate probability distribution, {P(n), n = 0,1,...,N}, is given by

    P(n) = 1/Z,                                                        n = 0,
    P(n) = (1/Z) Σ_{i=1}^{R} g_i β_i x_i y_i^{Δ(n-1)} C^{(i)}(n-1),    n = 1,2,...,N.      (4.21)

Moreover, using the ME solution P(S) (c.f., (4.9)) and aggregating, as appropriate, expressions for the marginal probabilities can be obtained. For example, in Case 3: {(HOL) with PBS}, {P_i(n_i), n_i = 0,1,...,N_i} are determined by

    P_i(0) = (1/Z) [ 1 + Σ_{j≠i} g_j β_j x_j Σ_{u=0}^{N_j - 1} y_j^{Δ(u)} C_i^{(j)}(u) ],      (4.22)


    P_i(n_i) = (1/Z) β_i x_i^{n_i} [ Σ_{j=1, j≠i}^{R} g_j β_j x_j Σ_{u=0}^{min(N_j-1, N_1-n_i-1)} y_j^{Δ(u)} C_i^{(j)}(u)
               + g_i Σ_{u=0}^{N_1-n_i} y_i^{Δ(u)} C_i^{(i)}(u) ],  n_i = 1,...,N_i,      (4.23)

where the coefficients {C_i^{(j)}(u), u = 0,1,...,N_j - 1, (i,j) ∈ [1,R]} can be determined via the following recursive formulae:

    C_i^{(j)}(u) = C^{(j)}(u) - x_i C^{(j)}(u-1) + (1 - β_i) x_i C_i^{(j)}(u-1)
                   + β_i x_i^{N_i + 1} C_i^{(j)}(u - N_i - 1),                  i ≠ j,
    C_i^{(j)}(u) = C^{(j)}(u) - x_i C^{(j)}(u-1) + x_i^{N_i} C^{(j)}(u - N_i),  i = j,      (4.24)

with initial conditions C_i^{(j)}(u) = 0, if u < 0, or 1, if u = 0 (c.f., [32]). Similar recursive expressions for {P_i(n_i), n_i = 0,1,...,N_i} can be obtained for Case 2: {(PR,HOL) with CBS} (c.f., [30-31]).

4.4 GE-Type Blocking Probabilities and Lagrangian Coefficients

Focusing on the GE/GE/1/N queue with multiple job classes, an approximate but universal closed-form expression for the blocking probability, π_i, i = 1,2,...,R, can

be determined in all Cases 1-3 by using GE-type probabilistic arguments and is given by

    π_i = Σ_{n=0}^{N_i} δ_i(n) (1 - σ_i)^{N_i - n} P(n),                   (4.25)

where δ_i(n) = τ_i, if n = 0, and 1, otherwise, σ_i = r_i / (r_i (1 - τ_i) + τ_i), τ_i = 2/(C_i² + 1) and r_i = 2/(C_si² + 1) (c.f., [27-32]).
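A small sketch of the GE-type blocking formula (4.25) together with the flow balance (4.6); the distribution P(n) and all parameter values below are made up purely for illustration.

```python
def blocking_probability(P, Ni, ca2, cs2):
    """GE-type blocking probability of eq. (4.25) for one class, given an
    aggregate queue length distribution P(n), n = 0..Ni, and the class's SCVs."""
    tau = 2.0 / (ca2 + 1.0)
    r = 2.0 / (cs2 + 1.0)
    sigma = r / (r * (1.0 - tau) + tau)
    pi = 0.0
    for n, pn in enumerate(P):
        delta = tau if n == 0 else 1.0        # delta_i(n)
        pi += delta * (1.0 - sigma) ** (Ni - n) * pn
    return pi

def utilisation(lam, mu, pi):
    """Flow balance (4.6): lambda_i (1 - pi_i) = mu_i U_i."""
    return lam * (1.0 - pi) / mu

# Illustrative values only: a 3-place buffer with a made-up distribution P(n).
pi = blocking_probability(P=[0.4, 0.3, 0.2, 0.1], Ni=3, ca2=4.0, cs2=2.0)
U = utilisation(lam=0.8, mu=2.0, pi=pi)
```

In the full algorithm these two relations close the loop: the blocking probabilities feed the flow balance, which in turn constrains the Lagrangian coefficients below.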

Moreover, the Lagrangian coefficients g_i, x_i and β_i, i = 1,2,...,R, can be approximately determined via closed-form asymptotic expressions based on the ME solution of the corresponding infinite capacity GE/GE/1 queue at equilibrium (c.f.,

Kouvatsos [26,35]). Assuming x_i, g_i and β_i are invariant to the buffer sizes, appropriate formulae can be established, namely

Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS} (c.f., [29])

    x_i = (L_i - ρ_i) / L,                                                 (4.26)

    g_i = (1 - X) ρ_i / ((1 - ρ) x_i),                                     (4.27)

where ρ_i = λ_i/μ_i, ρ = Σ_{i=1}^{R} ρ_i, X = Σ_{i=1}^{R} x_i, L = Σ_{i=1}^{R} L_i and L_i is the asymptotic marginal mean queue length of a multi-class GE/GE/1 queue. Note that the mean queue lengths {L_i, i = 1,2,...,R} of a non-priority multi-class GE/GE/1 queue have been determined in Kouvatsos et al [36] and are given by

    L_i = ρ_i (C_i² + 1)/2 + (ρ_i / (2(1-ρ))) Σ_{j=1}^{R} ρ_j (C_j² + C_sj²),                 for the FCFS rule,
    L_i = ρ_i (C_i² + 1) / (2(1-ρ)),                                                          for the LCFS-PR rule,
    L_i = ρ_i (C_i² + 1 - 2ρ_i) / (2(1-ρ)) + (ρ_i / (2(1-ρ))) Σ_{j=1}^{R} ρ_j (1 + C_sj²),    for the LCFS-NPR rule,
    L_i = (ρ_i / (1-ρ)) [ 1 + Σ_{j=1}^{R} (h_j / h_i) ρ_j (C_j² - C_sj²)/2 ],                 for the PS rule,      (4.28)

where {h_i, i = 1,2,...,R} is a set of discriminatory weights that impose different treatment on the different classes of jobs under the PS rule.

Cases 2 and 3: {(PR,HOL) with CBS} & {(HOL) with PBS} (c.f., [30-32])

    x_i = (L_i - ρ_i) / L,  for i = 1,2,...,R,                             (4.29)

    g_i = ρ_1 (1 - Σ_{j=2}^{R} ρ_j) / ((1-ρ)(L_1 - ρ_1)),     for i = 1,        PR with CBS,
    g_i = ρ_i (1 - Σ_{j=i+1}^{R} ρ_j) / ((1-ρ)(θ_i - ρ_i)),   for i > 1,        PR with CBS,      (4.30)
    g_i = ρ_i (1 - Σ_{j=1}^{R} ρ_j) / ((1-ρ)(θ_i - ρ_i)),     for i = 1,...,R,  HOL with CBS/PBS,

    β_i = 1,                                                  for i = 1,        PR with CBS,
    β_i = (θ_i - ρ_i)(1 - x_i) / ((L_i - θ_i) x_i),           for i > 1,        PR with CBS,      (4.31)
    β_i = (θ_i - ρ_i)(1 - x_i) / ((1 - θ_i) x_i),             for i = 1,...,R,  HOL with CBS/PBS.
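As a sanity check on the asymptotic formulae for Case 1, the sketch below computes the FCFS branch of (4.28), as read here, together with the coefficients (4.26)-(4.27); for a single class with C² = 1 it reduces to the familiar M/M/1 mean queue length ρ/(1 - ρ). Function names are ours.

```python
def fcfs_mean_queue_length(i, rho, ca2, cs2):
    """Asymptotic mean queue length L_i of a multi-class GE/GE/1 FCFS queue
    (the FCFS branch of (4.28), as read here)."""
    r = sum(rho)
    burst = sum(rho[j] * (ca2[j] + cs2[j]) for j in range(len(rho)))
    return rho[i] * (ca2[i] + 1.0) / 2.0 + rho[i] * burst / (2.0 * (1.0 - r))

def case1_coefficients(rho, L):
    """Lagrangian coefficients x_i (4.26) and g_i (4.27) for Case 1."""
    r, Ltot = sum(rho), sum(L)
    x = [(L[i] - rho[i]) / Ltot for i in range(len(rho))]
    X = sum(x)
    g = [(1.0 - X) * rho[i] / ((1.0 - r) * x[i]) for i in range(len(rho))]
    return x, g

# Single-class M/M/1 check: with Ca^2 = Cs^2 = 1, L reduces to rho/(1 - rho).
rho = [0.5]
L = [fcfs_mean_queue_length(0, rho, ca2=[1.0], cs2=[1.0])]
assert abs(L[0] - 0.5 / (1.0 - 0.5)) < 1e-12
x, g = case1_coefficients(rho, L)
```

In the single-class exponential limit the sketch yields x = ρ and g = 1, i.e., the geometric M/M/1 queue length distribution, which is the expected degenerate case of the ME solution.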

Note that stochastic formulae for the mean queue lengths, L_i, and the idle state probabilities, (1 - θ_i), can be found in Kouvatsos and Tabet-Aouel [33-34].

Moreover, focusing on the flow balance condition (4.6) and by using the appropriate expressions for the normalising constant, Z, the aggregate probabilities {P(n), n = 0,1,...,N} and the blocking probabilities {π_i, i = 1,...,R}, it follows that the Lagrangian coefficients {y_i, i = 1,2,...,R} can be recursively determined, after some manipulation, namely

Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS}

    y_i(n) = Φ_{1i} / (1 - Φ_{2i} X),         n = 1,
    y_i(n) = (1/X) y_i(n-1) - Φ_{1i} / X,     n ≥ 2,                       (4.32)

where

    Φ_{νi} = 1/(1-X) + (1-σ_i)/(σ_i X),                 ν = 1,
    Φ_{νi} = δ_i(0) [ 1/(1-X) + (1-σ_i)/(σ_i X) ],      ν = 2,             (4.33)

(c.f., [28,29]).

Case 2: {(PR, HOL) with CBS}

PR Discipline:

    y_i = { π_i [ 1 + Σ_{j=1}^{R} g_j β_j x_j Σ_{u=0}^{N-2} C^{(j)}(u) ]
            - τ_i (1-σ_i)^N - Σ_{j=1}^{R} g_j β_j x_j Σ_{u=0}^{N-2} C^{(j)}(u) (1-σ_i)^{N-1-u} }
          / { g_i β_i x_i C^{(i)}(N-1) (1 - π_i) }                         (4.34)

HOL Discipline:

    y_i = { π_i [ 1 + Σ_{j=1}^{R} g_j Σ_{u=1}^{N-1} C^{(j)}(u) ]
            - τ_i (1-σ_i)^N - Σ_{j=1}^{R} g_j Σ_{u=1}^{N-1} C^{(j)}(u) (1-σ_i)^{N-u} }
          / { g_i C^{(i)}(N) (1 - π_i) }                                   (4.35)

(c.f., [30,31]).

Case 3: {(HOL) with PBS}

yi

= g i i x i

i
N1 1

g j j x j

N1 2 u =0

( u ) ( j) y C (u ) + 1 i (1 i ) Ni j

C (i ) (u )
u = N i 1 Ni 2 u =0

j=1

(4.36)
R j=1

1 g i i x i i

C ( i ) (u )

g j j x j

N i 1 u =1

(1 i ) Ni u C ( j) (u 1) ,

(c.f., [32]). 268

5. Maximum Entropy Analysis of Complex Open Queueing Networks

Consider a complex open QNM with arbitrary configuration, R (R ≥ 2) distinct priority and/or non-priority classes of jobs, M (M ≥ 2) single-server queueing stations with GE-type class inter-arrival and service time processes, (FCFS, PS, LCFS-PR, LCFS-NPR, PR, HOL) scheduling disciplines, (CBS, PBS) buffer management schemes and RS-FD and/or RS-RD blocking mechanisms.

Notation

Let

a_{kimj} be the transition probability (first-order Markov chain) that a class i job, after finishing service at station k, attempts to join station m as class j,

â_{kimj} be the effective transition probability,

a_{ki0} be the transition probability that a job of class i leaves the network upon finishing service at station k,

λ_{0ki}, C²_{0ki} be the mean rate and SCV of the external inter-arrival process of class i jobs at station k, respectively,

λ_{ki}, C²_{ki} be the mean rate and SCV of the overall actual inter-arrival process of class i jobs at station k, respectively,

λ̂_{0ki}, Ĉ²_{0ki} be the mean rate and SCV of the external effective inter-arrival process of class i jobs at station k, respectively,

λ̂_{ki}, Ĉ²_{ki} be the mean rate and SCV of the overall effective inter-arrival process of class i jobs at station k, respectively,

μ_{ki}, C²_{ski} be the mean rate and SCV of the actual service process of class i jobs at station k, respectively,

μ̂_{ki}, Ĉ²_{ski} be the mean rate and SCV of the effective (or total) service process of class i jobs at station k, respectively,

π_{kimj} be the blocking probability that a job of class i, upon its service completion (call it a "completer") from station k, will be blocked by station m as class j,

π_{ki} be the blocking probability that a completer of class i from any station m, m ≠ k, is blocked by station k,

π_{0ki} be the blocking probability that an external arrival of class i is blocked by station k,

π_{cki} be the blocking probability that a completer of class i at station k will be blocked by any of the downstream stations,

N(k) be the total capacity of queue k, k = 1,2,…,M.

5.1 A Universal ME Product-Form Approximation

Focusing on queueing station k, k = 1,2,…,M, let at any given time

s_{kj}, j = 1,2,…,J_k, J_k ≤ N(k), s_{kj} ∈ [1, R], be the class of the jth ordered job in the queue, where J_k is the total number of jobs present,

n_{ki}, i = 1,2,…,R, be the number of class i jobs waiting or receiving service in the queue,

S_k = (s_{k1}, s_{k2}, …, s_{kJ_k}), for Case 1: {(FCFS, PS, LCFS-PR, LCFS-NPR) with CBS}, or
S_k = (n_{k1}, n_{k2}, …, n_{kR}, c_k), for Cases 2 & 3: {(PR, HOL) with CBS} & {(HOL) with PBS},

be a joint system state, where s_{k1} or c_k (1 ≤ c_k ≤ R) denotes the class of the current job in service (n.b., for an idle queue, S_k ≡ 0 with s_{k1} = c_k = 0),

Q_k be the set of all feasible states S_k, k = 1,2,…,M.

Moreover, let

S = (S_1, S_2, …, S_M) be the state of the network,

Q = Q_1 × Q_2 × ⋯ × Q_M be the set of all possible states S.

The form of the ME solution P(S), S ∈ Q, subject to normalisation and marginal constraints of the type (4.1)-(4.5), can be clearly established in terms of the product-form approximation

P(S) = (1/Z) ∏_{k=1}^{M} [ ∏_{i=1}^{R} x_{ki}^{n_{ki}(S_k)} h_{ki}^{δ_{ki}(S_k)} ∏_{j=1, n_{kj}>0}^{R} g_{kj}^{σ_{kj}(S_k)} y_{kj}^{f_{kj}(S_k)} ],   S ∈ Q,   (5.1)

where Z is the normalising constant, {g_{ki}, h_{ki}, x_{ki}, y_{ki}, i = 1,2,…,R} are the Lagrangian coefficients that correspond to constraints of the type (4.2)-(4.5) at queueing station k, k = 1,2,…,M, respectively, and n_{ki}(S_k), δ_{ki}(S_k), σ_{ki}(S_k), f_{ki}(S_k) are the per-station auxiliary functions of (4.8). Defining Z_k, k = 1,2,…,M, as

Z_k = Σ_{S_k ∈ Q_k} ∏_{i=1}^{R} x_{ki}^{n_{ki}(S_k)} h_{ki}^{δ_{ki}(S_k)} ∏_{j=1, n_{kj}>0}^{R} g_{kj}^{σ_{kj}(S_k)} y_{kj}^{f_{kj}(S_k)},   (5.2)

expression (5.1) can be written as the product of the marginal probabilities P_k(S_k), S_k ∈ Q_k, i.e.,

P(S) = ∏_{k=1}^{M} P_k(S_k),   (5.3)

where P_k(S_k), S_k ∈ Q_k, is the ME solution of queue k (c.f., (4.8)), namely

P_k(S_k) = (1/Z_k) ∏_{i=1}^{R} x_{ki}^{n_{ki}(S_k)} h_{ki}^{δ_{ki}(S_k)} ∏_{j=1, n_{kj}>0}^{R} g_{kj}^{σ_{kj}(S_k)} y_{kj}^{f_{kj}(S_k)}.   (5.4)
The ME product-form solution (5.3) clearly implies a decomposition of the complex open QNM into individual queueing stations, each of which may be analysed in isolation.

5.2 ME Queue-by-Queue Decomposition Algorithm

This section presents an outline of an ME queue-by-queue decomposition algorithm for the approximate analysis of complex open QNMs with arbitrary configuration and RS blocking. The universal ME joint state probability (4.8) of a censored GE/GE/1/N queue, in conjunction with GE-type flow approximation formulae for the first two moments of the effective flow (departure, splitting, merging) streams (e.g., [26]), can be used as a cost-effective building block in the solution process.
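As an aside on the GE building block used above: a GE distribution with mean 1/λ and SCV C² ≥ 1 has the standard two-phase interpretation in which, with probability 1 − τ, an inter-event interval has zero length (a batch) and, with probability τ, it is exponential with parameter τλ, where τ = 2/(C² + 1). A minimal sampler along these lines, in Python (the function name and the Monte Carlo check are illustrative assumptions, not part of the ME algorithm):

```python
import random

def sample_ge(lam, scv, rng=random):
    # One inter-event time from GE(lam, scv): mean 1/lam, SCV scv (scv >= 1).
    tau = 2.0 / (scv + 1.0)
    if rng.random() >= tau:            # probability 1 - tau: zero gap (batch)
        return 0.0
    return rng.expovariate(tau * lam)  # probability tau: Exp(tau * lam) phase

# Monte Carlo check of the first two moments (illustrative parameters).
random.seed(42)
n = 200_000
xs = [sample_ge(1.0, 4.0) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / (n - 1)
scv_hat = var / mean ** 2              # should be close to the requested SCV
```

The zero-length phase is what produces the bulk (batch) arrivals that make GE a least-biased two-moment model for bursty traffic.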


Note that the blocking probabilities {π_{0ki}, π_{kimj}} can be clearly determined by the same type of expressions (4.25) used in Section 4.4 to characterise the blocking probabilities of a censored GE/GE/1/N queueing system. The RS-FD and RS-RD blocking mechanisms influence the particular stochastic closed-form expressions for the calculation of the effective service time parameters {μ̂_{ki}, Ĉ²_{ski}}. The effective transition probabilities {â_{kimj}} are only applicable to the RS-RD mechanism, which imposes a dependence on the routing of jobs (e.g., [27-28]). The decomposition algorithm incorporates a feedback correction of the transition probabilities in order to mitigate the strong underlying assumption that arrival streams per class are GE-type renewal processes. The determination of the probabilities {π_{0ki}, π_{kimj}} depends on the effective flow balance equations for {λ̂_{0ki}, λ̂_{ki}} and the SCVs {Ĉ²_{ki}}, the effective service parameters {μ̂_{ki}, Ĉ²_{ski}} and the overall inter-arrival parameters {λ_{ki}, C²_{ki}} (c.f., [29-32]).

The ME algorithm below describes the computational process of solving iteratively the non-linear equations for the blocking probabilities {π_{0ki}, π_{kimj}} under GE-type inter-arrival and service time processes at each queueing station.

BEGIN

INPUT DATA: M, R, {N(k), λ_{0ki}, C²_{0ki}, μ_{ki}, C²_{ski}, a_{kimj}}, k = 1,2,…,M, m = 0,1,…,M and i, j = 1,2,…,R.

Step 1 Feedback correction. For each queue k, k = 1,2,…,M, and class i, i = 1,2,…,R, with a_{kiki} > 0, substitute

μ_{ki} ← μ_{ki} (1 − a_{kiki}),
C²_{ski} ← a_{kiki} + (1 − a_{kiki}) C²_{ski},

where

a_{kimj} ← 0, if k = m,
a_{kimj} ← a_{kimj} / (1 − a_{kiki}), if m ≠ k, m = 1,2,…,M;

Step 2 Initialise π_{0ki} and π_{kimj} to any value in (0,1), ∀ k, m = 1,2,…,M and i, j = 1,2,…,R;

Step 3 Solve the system of non-linear equations for the blocking probabilities {π_{0ki}, π_{kimj}}:

Step 3.1 For each censored GE/GE/1/N queueing station k (k = 1,2,…,M) under RS-RD blocking, calculate the effective flow transition probabilities {â_{kimj}}:

â_{kimj} = a_{kimj} (1 − π_{kimj}) / (1 − π_{cki}),
â_{ki0} = a_{ki0} / (1 − π_{cki}),

where π_{cki} = Σ_{m=1}^{M} Σ_{j=1}^{R} a_{kimj} π_{kimj}, ∀ k, m, i;

Step 3.2 Compute the effective job flow balance equations for {λ̂_{0ki}, λ̂_{ki}}:

λ̂_{0ki} = λ_{0ki} (1 − π_{0ki}),
λ̂_{ki} = λ̂_{0ki} + Σ_{m=1}^{M} Σ_{j=1}^{R} λ̂_{mj} â_{mjki}, ∀ k, m, i;

Step 3.3 Calculate the effective GE-type service parameters {μ̂_{ki}, Ĉ²_{ski}} (c.f., [28-30,32]):

Step 3.3.1 For an RS-RD blocking mechanism:

μ̂_{ki} = μ_{ki} (1 − π_{cki}), ∀ k, i,
Ĉ²_{ski} = π_{cki} + (1 − π_{cki}) C²_{ski}, ∀ k, i;

Step 3.3.2 For an RS-FD blocking mechanism:

μ̂_{ki} = μ_{ki} [ Σ_{m=1}^{M} a_{kim} / (1 − π_{kim}) ]^{−1},

with Ĉ²_{ski} given by the corresponding RS-FD closed-form expression in terms of C²_{ski} and {a_{kim}, π_{kim}} (c.f., [28-30,32]);

Step 3.4 Calculate the overall GE-type inter-arrival parameters {λ_{ki}, C²_{ki}}:

λ_{ki} = λ̂_{ki} / (1 − π_{ki}), ∀ k, i,
C²_{ki} = (Ĉ²_{ki} − π_{ki}) / (1 − π_{ki}), ∀ k, i,

where Ĉ²_{ki} = G_{ki}(λ̂_{0ki}, Ĉ²_{0ki}, {λ̂_{mi}, Ĉ²_{mi}}, {μ̂_{mi}, Ĉ²_{smi}}, {â_{mjki}}) and

π_{ki} = [ λ̂_{0ki} π_{0ki} + Σ_{m=1}^{M} Σ_{j=1}^{R} â_{mjki} λ̂_{mj} π_{mjki} / (1 − π_{mjki}) ] / [ λ̂_{0ki} + Σ_{m=1}^{M} Σ_{j=1}^{R} â_{mjki} λ̂_{mj} / (1 − π_{mjki}) ]

(n.b., G_{ki} is the GE-type flow superposition function, c.f., [29,30,32]);

Step 3.5 Obtain new values for {π_{0ki}, π_{kimj}} by applying the Newton-Raphson method;

Step 4 Calculate C²_{dki}, the SCV of the inter-departure times:

C²_{dki} = F_{ki}(λ_{ki}, C²_{ki}, μ̂_{ki}, Ĉ²_{ski}), ∀ k, i

(n.b., F_{ki} is the GE-type flow inter-departure function, c.f., [29,30,32]);

Step 5 Calculate a new value for C²_{ki}, ∀ k, i;

Step 6 Return to Step 3 until convergence of {C²_{ki}}, ∀ k, i;

Step 7 Apply the universal ME solution (4.8) of the censored GE(λ_{ki}, C²_{ki})/GE(μ̂_{ki}, Ĉ²_{ski})/1/N(k) queueing station k (k = 1,2,…,M) and obtain the performance metrics of interest via expressions (4.9)-(4.36), as appropriate;

END

More detailed descriptions and theoretical proofs of the ME decomposition algorithm, with particular applications to Cases 1-3, respectively, can be found in [29,30,32]. However, the ME algorithm is also applicable to the analysis of open QNMs where queueing stations may comply with any combination of service rules and buffer management schemes across Cases 1-3. Note that the main computational cost of the proposed algorithm is of O{cR³M³}, where c is the number of iterations in Step 3 and R³M³ is the number of operations for inverting the associated Jacobian matrix of the system of non-linear equations in {π_{0ki}, π_{kimj}}. However, if a quasi-Newton numerical method is employed, this cost can be reduced to O{cR²M²}. Moreover, the existence and uniqueness of the solution of the non-linear system of Step 3 cannot be shown analytically, due to the complexity of the expressions for the blocking probabilities {π_{0ki}, π_{kimj}}; nevertheless, numerical instabilities were never observed during extensive experimentation, which has verified the credibility of the ME algorithm against simulation for any feasible set of initial values of complex QNMs (e.g., [27-30,32]). In the special cases of open exponential and "reversible" queueing networks (e.g., Baskett et al [1], Kelly [8]), it has been shown that the ME product-form approximation reduces, as appropriate, to the exact solution (c.f., [24-25,28-29]).
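For intuition only, the control flow of Steps 2-6 (initialise blocking probabilities, propagate effective flows, recompute blocking, iterate until convergence) can be caricatured on a single-class, two-station tandem in which the GE-type formulae and the Newton-Raphson step are replaced by the M/M/1/N full-buffer probability and plain successive substitution. All names and parameter values below are illustrative assumptions; this is not the ME algorithm of [29,30,32], only its fixed-point skeleton:

```python
def mm1n_full(lam, mu, n_cap):
    # Probability that an M/M/1/N queue is full (arriving jobs are blocked).
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (n_cap + 1)
    return (1.0 - rho) * rho ** n_cap / (1.0 - rho ** (n_cap + 1))

def tandem_blocking(lam0, mu1, mu2, n1, n2, tol=1e-12, max_iter=500):
    pb1, pb2 = 0.5, 0.5                       # "Step 2": initial guesses in (0, 1)
    for _ in range(max_iter):
        lam1_eff = lam0 * (1.0 - pb1)         # effective flow offered to queue 2
        new_pb1 = mm1n_full(lam0, mu1, n1)    # blocking of external arrivals
        new_pb2 = mm1n_full(lam1_eff, mu2, n2)   # blocking of queue-1 completers
        if abs(new_pb1 - pb1) + abs(new_pb2 - pb2) < tol:  # "Step 6" convergence
            break
        pb1, pb2 = new_pb1, new_pb2
    return pb1, pb2

pb1, pb2 = tandem_blocking(0.8, 1.0, 1.0, 5, 5)   # arbitrary example values
```

The real algorithm sweeps over all M stations and R classes, uses the GE-type superposition, splitting and inter-departure formulae, and updates the blocking probabilities with (quasi-)Newton steps, but the feedback structure of the fixed point is the same.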

6. CONCLUSIONS AND FURTHER COMMENTS


Since the mid-60s it has become increasingly evident that classical queueing theory cannot easily handle, by itself, complex queueing systems and networks with many interacting elements. In this context, the principle of ME has inspired a new and powerful framework for the establishment of universal closed-form solutions and queue-by-queue decomposition algorithms for the approximate analysis of complex queueing systems and networks (e.g., [19,22-36]). This tutorial presents an overview of an analytic framework for a unified exposition of earlier works on entropy maximisation and complex queueing systems and networks. In this context, a universal ME solution is characterised, subject to appropriate mean value constraints, for the joint state probability distribution of a complex G/G/1/N queue with distinct either priority or non-priority classes of jobs and either CBS or PBS schemes. Closed-form expressions for the aggregate and marginal state probabilities are established. The stochastic implementation of the ME solution is achieved by making use of the GE distributional model as a least biased approximation of G-type distributions with known first two moments. Subsequently, asymptotic connections to the corresponding infinite capacity queue are made and GE-type formulae for the mean value constraints and the blocking probability per class are determined. Furthermore, it is shown that the ME solution can be utilised as a cost-effective building block, in conjunction with GE-type flow approximation formulae, towards the derivation of a universal ME product-form approximation and a queue-by-queue decomposition algorithm for complex QNMs with arbitrary configuration, multiple job classes and RS blocking. More recent publications on the analysis of QNMs with RS blocking, based on the principles of ME and minimum relative entropy (MRE), a generalisation of ME [48], have appeared in [48-51].

Moreover, applications of GE-type ME solutions to the performance modelling and congestion control of mobile networks can be seen in [52-54]. The methodology of ME and its generalisations (c.f., [16,26,55]) can also be applied to study complex discrete-time QNMs, such as those based on the generalised geometric (GGeo) [26] and shifted GGeo (sGGeo) [56-59] distributions and other more complex traffic models. ME applications to multi-buffer, shared-buffer and shared-medium ATM switch architectures under various blocking mechanisms and buffer management schemes can be seen in [57]. In particular, in the context of arbitrary multi-buffered networks, a credible approximation of a more complex sGGeo traffic pattern by a simpler and more tractable GGeo-type model appeared in [58].

References

[1] F. Baskett, K.M. Chandy, R.R. Muntz and F.G. Palacios, "Open, closed and mixed networks of queues with different classes of customers", J. ACM 22 (1975) 248-260.
[2] R. Marie, "An approximate analytical method for general queueing networks", IEEE Trans. Software Eng. SE-5 (1979) 530-538.
[3] M. Reiser and H. Kobayashi, "Accuracy of the diffusion approximation for some queueing systems", IBM J. Res. Dev. 18 (1974) 110-124.
[4] E. Gelenbe and G. Pujolle, "The behaviour of a single queue in a general queueing network", Acta Info. 7 (1976) 123-160.
[5] P.J. Courtois, "Decomposability: Queueing and Computer Systems Applications", Academic Press, New York (1977).
[6] K.M. Chandy, U. Herzog and L. Woo, "Approximate analysis of general queueing networks", IBM J. Res. Dev. 19 (1975) 43-49.
[7] K.C. Sevcik, A.I. Levy, S.K. Tripathi and J.L. Zahorjan, "Improving approximations of aggregated queueing network subsystems", in: Computer Performance, eds. K.M. Chandy and M. Reiser, North-Holland (1977) 1-22.
[8] F.P. Kelly, "Reversibility and Stochastic Networks", Wiley, New York (1979).
[9] R.M. Bryant, A.E. Krzesinski, M.S. Laksmi and K.M. Chandy, "The MVA priority approximation", T.O.C.S. 2 (1984) 335-359.
[10] V.E. Benes, "Mathematical Theory of Connecting Networks and Telephone Traffic", Academic Press, New York (1965).
[11] E.T. Jaynes, "Information theory and statistical mechanics", Phys. Rev. 106 (1957) 620-630.
[12] E.T. Jaynes, "Information theory and statistical mechanics II", Phys. Rev. 108 (1957) 171-190.
[13] C.E. Shannon, "A mathematical theory of communication", Bell Syst. Tech. J. 27 (1948) 379-423, 623-656.
[14] M. Tribus, "Rational Descriptions, Decisions and Designs", Pergamon, New York (1969).
[15] J.E. Shore and R.W. Johnson, "Axiomatic derivation of the principle of ME and the principle of minimum cross-entropy", IEEE Trans. Info. Theory IT-26 (1980) 26-37.
[16] J.E. Shore and R.W. Johnson, "Properties of cross-entropy minimisation", IEEE Trans. Info. Theory IT-27 (1981) 472-482.
[17] J.N. Kapur, "Maximum Entropy Models in Science and Engineering", John Wiley (1989).
[18] J.N. Kapur and H.K. Kesavan, "Entropy Optimisation Principles with Applications", Academic Press, New York (1992).
[19] A.E. Ferdinand, "A statistical mechanical approach to systems analysis", IBM J. Res. Dev. 14 (1970) 539-547.
[20] E. Pinsky and Y. Yemini, "A statistical mechanics of some interconnection networks", in: Performance '84, ed. E. Gelenbe, North-Holland (1984) 147-158.
[21] E. Pinsky and Y. Yemini, "The canonical approximation in performance analysis", in: Computer Networking and Performance Evaluation, eds. T. Hasegawa et al., North-Holland (1986) 125-137.
[22] J.E. Shore, "Information theoretic approximations for M/G/1 and G/G/1 queueing systems", Acta Info. 17 (1982) 43-61.
[23] M.A. El-Affendi and D.D. Kouvatsos, "A maximum entropy analysis of the M/G/1 and G/M/1 queueing systems at equilibrium", Acta Info. 19 (1983) 339-355.
[24] D.D. Kouvatsos, "Maximum entropy methods for general queueing networks", in: Modelling Techniques and Tools for Performance Analysis, ed. D. Potier, North-Holland (1985) 589-609.
[25] D.D. Kouvatsos, "A universal maximum entropy algorithm for the analysis of general closed queueing networks", in: Computing Networking and Performance Evaluation, eds. T. Hasegawa et al., North-Holland (1986) 113-124.
[26] D.D. Kouvatsos, "Entropy maximisation and queueing network models", Annals of Oper. Res. 48 (1994) 63-126.
[27] D.D. Kouvatsos and S.G. Denazis, "Entropy maximised queueing networks with blocking and multiple job classes", Performance Evaluation 17 (1993) 189-205.
[28] S.G. Denazis, "Queueing network models with blocking and multiple job classes", Ph.D. Thesis, Bradford University (1993).
[29] D.D. Kouvatsos and I.U. Awan, "MEM for arbitrary closed queueing networks with RS-blocking and multiple job classes", Annals of Oper. Res. 79 (1998) 231-269.
[30] I.U. Awan and D.D. Kouvatsos, "Arbitrary queueing network models with service priorities and blocking", Proc. of the 13th UK Workshop on Performance Engineering of Computer and Telecommunication Systems, ed. D.D. Kouvatsos, Ilkley, UK (July 1997) 12/1-12/14.
[31] D.D. Kouvatsos, I.U. Awan and S.G. Denazis, "A priority G/G/1/N censored queue with complete buffer sharing", Proc. of the 12th UK Workshop on Performance Engineering of Computer and Telecommunication Systems, eds. J. Hillston and R. Pooley, Edinburgh (1996) 33-48.
[32] D.D. Kouvatsos and I.U. Awan, "Approximate analysis of arbitrary QNMs with space and service priorities", in: ATM Networks: Performance Modelling and Analysis, Kluwer Academic Publishers, IFIP Publication, ISBN 0-412-83640-8, Chapter 25 (1999) 497-521.
[33] D.D. Kouvatsos and N.M. Tabet-Aouel, "A maximum entropy priority approximation for a stable G/G/1 queue", Acta Info. 27 (1989) 247-286.
[34] D.D. Kouvatsos and N.M. Tabet-Aouel, "Product-form approximations for an extended class of general closed queueing networks", in: Performance '90, eds. P.J.B. King et al., North-Holland (1990) 301-315.
[35] D.D. Kouvatsos, "Maximum entropy and the G/G/1/N queue", Acta Info. 23 (1986) 545-565.
[36] D.D. Kouvatsos, P.H. Georgatsos and N.M. Tabet-Aouel, "A universal maximum entropy algorithm for general multiple class open networks with mixed service disciplines", in: Modelling Techniques and Tools for Computer Performance Evaluation, eds. R. Puigjaner and D. Potier, Plenum (1989) 397-419.
[37] I.F. Akyildiz and C.C. Huang, "Exact analysis of multi-job class networks of queues with blocking-after-service", Proc. 2nd International Workshop on Queueing Networks with Finite Capacity, eds. R.O. Onvural and I.F. Akyildiz, Research Triangle Park, USA (1992) 258-271.
[38] T. Altiok and H.G. Perros, "Approximate analysis of arbitrary configurations of queueing networks with blocking", Annals of Oper. Res. 9 (1987) 481-509.
[39] Y. Takahashi, H. Miyahara and T. Hasegawa, "An approximation method for open restricted queueing networks", Oper. Res. 28 (1980) 594-602.
[40] D. Yao and J.A. Buzacott, "Modelling a class of state dependent routing in flexible manufacturing systems", Annals of Oper. Res. 3 (1985) 153-167.
[41] H.G. Perros, "Approximation algorithms for open queueing networks with blocking", in: Stochastic Analysis of Computer and Communication Systems, ed. H. Takagi, North-Holland (1990) 451-494.
[42] R.O. Onvural, "Survey of closed queueing networks with blocking", ACM Comput. Surveys 22(2) (1990) 83-121.
[43] H.G. Perros, "Queueing Networks with Blocking", Oxford University Press (1994).
[44] L.F. de Moraes, "Priority scheduling in multiaccess communication", in: Stochastic Analysis of Computer and Communication Systems, ed. H. Takagi, Elsevier Science Publishers B.V., North-Holland, Amsterdam (1990) 699-732.
[45] C. Sauer, "Configuration of computing systems: an approach using queueing network models", Ph.D. Thesis, University of Texas (1975).
[46] S. Nojo and H. Watanabe, "A new stage method getting arbitrary coefficient of variation by two stages", Trans. IEICE 70 (1987) 33-36.
[47] A.C. Williams and R.A. Bhandiwad, "A generating function approach to queueing network analysis of multiprogrammed computers", Networks 6 (1976) 1-22.
[48] C.A. Skianis and D.D. Kouvatsos, "Arbitrary open queueing networks with server vacation periods and blocking", Annals of Operations Research, Special Issue on Queueing Networks with Blocking, Baltzer Science Publishers, ISSN 0254-5330, Vol. 79 (1998) 143-180.
[49] Y. Dallery and D.D. Kouvatsos, "Queueing networks with blocking", Annals of Operations Research, Special Issue on Queueing Networks with Blocking, Baltzer Science Publishers, ISSN 0254-5330, Vol. 79 (March 1998) xiii-xiv.
[50] D.D. Kouvatsos and I. Awan, "Entropy maximisation and open queueing networks with priorities and blocking", Performance Evaluation, Special Issue on Queueing Networks with Blocking, ISSN 0166-5316, Vol. 51, Nos. 2-4 (2003) 191-227.
[51] D.D. Kouvatsos and S. Balsamo, "Queueing networks with blocking", Performance Evaluation, Special Issue on Queueing Networks with Blocking, ISSN 0166-5316, Vol. 51, Nos. 2-4 (2003) 79-81.
[52] D.D. Kouvatsos, I. Awan, K. Al-Begain and S. Tantos, "Performance analysis and capacity assignment optimisation of wireless GSM cells with re-use partitioning", International Journal of Simulation, Special Issue on Analytical and Stochastic Modelling Techniques and Applications, ISSN 1473-804x (online), 1473-8031 (print), Vol. 3, No. 1-2 (2002) 56-67.
[53] D.D. Kouvatsos, I. Awan and K. Al-Begain, "Performance modelling of GPRS with bursty multi-class traffic", IEE Proceedings - Computers and Digital Techniques, Special Issue on Performance Engineering, ISSN 1350-2387, Vol. 150, No. 2 (2003) 75-85.
[54] D.D. Kouvatsos, Y. Li and I. Awan, "Performance modelling of a wireless GSM/GPRS cell under partial sharing scheme", IFIP Networking 2004, Lecture Notes in Computer Science, Vol. 3042 (2004) 1252-1262.
[55] J.N. Kapur and H.K. Kesavan, "Generalised Maximum Entropy Principle with Applications", Sandford Press, University of Waterloo (1987).
[56] D.D. Kouvatsos and R.J. Fretwell, "Closed form performance distributions of a discrete time GIG/D/1/N queue with correlated traffic", in: Data Communications and their Performance, eds. S. Fdida and R.O. Onvural, Chapman & Hall (1995) 141-163.
[57] D.D. Kouvatsos, "Performance modelling and cost-effective analysis of multiservice integrated networks", Electronics & Communication Engineering Journal (1997) 127-135.
[58] D.D. Kouvatsos, I.U. Awan, R.J. Fretwell and G. Dimakopoulos, "A cost-effective approximation for SRD traffic in arbitrary multi-buffered networks", Computer Networks, Special Issue on Performance Modelling and Evaluation of ATM Networks, North-Holland/Elsevier, ISSN 1374-1286, Vol. 34 (2000) 97-113.
[59] R.J. Fretwell and D.D. Kouvatsos, "LRD and SRD traffic: review of results and open issues for the batch renewal process", Performance Evaluation, Special Issue on ATM & IP Networks: Performance Modelling and Analysis, Vol. 48 (2002) 267-284.

First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Diffusion approximation as a modelling tool in congestion control and performance evaluation

T. Czachórski (a), F. Pekergin (b)

(a) Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Baltycka 5, 44-100 Gliwice, Poland, e-mail: tadek@iitis.gliwice.pl

(b) LIPN, Université Paris-Nord, Av. J. B. Clément, 93430 Villetaneuse, France, e-mail: ferhan.pekergin@lipn.univ-paris13.fr

Abstract: Diffusion theory is already a vast domain of knowledge. This tutorial lecture does not cover all its results; it presents in a coherent way an approach we have adopted and used in the analysis of a series of models concerning the evaluation of some traffic control mechanisms in computer networks, especially ATM networks. Diffusion approximation is presented from an engineer's point of view, stressing its utility and commenting on the numerical problems of its implementation. Diffusion approximation is a method to model the behaviour of a single queueing station or a network of stations. It allows one to include in the model general service times, general (also correlated) input streams, and to investigate transient states which, in the presence of bursty streams (e.g. of multimedia transfers) in modern networks, are of interest.

1. Single G/G/1 station


1.1. Preliminaries

Let A(x), B(x) denote the interarrival and service time distributions at a service station. The distributions are general and not further specified; the method requires only their first two moments. The means are E[A] = 1/λ, E[B] = 1/μ, and the variances are Var[A] = σ_A², Var[B] = σ_B². Denote also the squared coefficients of variation C_A² = σ_A² λ², C_B² = σ_B² μ². N(t) represents the number of customers present in the system at time t.

Define θ_k = Σ_{i=1}^{k} a_i, where the a_i are the time intervals between arrivals. We assume that they are independent and identically distributed random variables; hence, according to the central limit theorem, the distribution of the variable

T_k = (θ_k − k/λ) / (σ_A √k)

tends to the standard normal distribution as k → ∞:

lim_{k→∞} P[T_k ≤ x] = Φ(x),   where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−ξ²/2} dξ;

hence for a large k: P[θ_k ≤ x σ_A √k + k/λ] ≈ Φ(x). Denote t = x σ_A √k + k/λ, or k = λt − x λ σ_A √k, and for large values of k, k ≈ λt or √k ≈ √(λt). Denote by H(t) the number of customers arriving at the station during a time t; note that P[H(t) ≥ k] = P[θ_k ≤ t], hence

Φ(x) ≈ P[θ_k ≤ x σ_A √k + k/λ] = P[H(t) ≥ k] = P[H(t) ≥ λt − x σ_A λ √(λt)] = P[ (H(t) − λt) / (σ_A λ √(λt)) ≥ −x ].

As for the normal distribution Φ(x) = 1 − Φ(−x), then P[ξ ≥ −x] = P[ξ ≤ x], and

P[ (H(t) − λt) / (σ_A λ √(λt)) ≤ x ] ≈ Φ(x),

which means that the number of customers arriving in an interval of length t (sufficiently long to assure a large k) may be approximated by the normal distribution with mean λt and variance σ_A² λ³ t. Similarly, the number of customers served in this time is approximately normally distributed with mean μt and variance σ_B² μ³ t, provided that the server is busy all the time. Consequently, the changes of N(t) within the interval [0, t], N(t) − N(0), have approximately normal distribution with mean (λ − μ)t and variance (σ_A² λ³ + σ_B² μ³)t.

Diffusion approximation [51, 52] replaces the process N(t) by a continuous diffusion process X(t), the incremental changes of which, dX(t) = X(t + dt) − X(t), are normally distributed with mean β dt and variance α dt, where β, α are coefficients of the diffusion equation

∂f(x, t; x₀)/∂t = (α/2) ∂²f(x, t; x₀)/∂x² − β ∂f(x, t; x₀)/∂x.   (1)

which defines the conditional pdf f(x, t; x₀) dx = P[x ≤ X(t) < x + dx | X(0) = x₀] of X(t). Both processes X(t) and N(t) have normally distributed changes; the choice

β = λ − μ,   α = σ_A² λ³ + σ_B² μ³ = C_A² λ + C_B² μ

ensures the same ratio of the time-growth of the mean and variance of these distributions.

A more formal justification of the diffusion approximation lies in limit theorems for the G/G/1 system given by Iglehart and Whitt [38, 39]. If N̂_n is a series of random variables derived from N(t),

N̂_n = [ N(nt) − (λ − μ) n t ] / √( (σ_A² λ³ + σ_B² μ³) n ),

then the series is weakly convergent (in the sense of distribution) to ξ(t), where ξ(t) is a standard Wiener process (i.e. a diffusion process with β = 0 and α = 1), provided that ρ > 1, that is, if the system is overloaded and has no equilibrium state. In the case of ρ = 1 the series N̂_n is convergent to ξ_R(t). The ξ_R(t) process is ξ(t) limited to the half-axis x > 0:

ξ_R(t) = ξ(t) − inf[ ξ(u), 0 ≤ u ≤ t ].

A service station with ρ ≥ 1 does not attain steady state: the number of customers grows linearly, with fluctuations around this deterministic trend. For service stations in equilibrium, with ρ < 1, there are no similar theorems and we should rely on heuristic confirmation of the utility of this approximation.

The process N(t) is never negative, hence X(t) should also be restrained to x ≥ 0. A simple solution is to put a reflecting barrier at x = 0, [44, 45]. In this case ∫₀^∞ f(x, t; x₀) dx = 1, hence (∂/∂t) ∫₀^∞ f(x, t; x₀) dx = 0, and replacing the integrated function with the right side of the diffusion equation we get the boundary condition corresponding to the reflecting barrier at zero:

lim_{x→0} [ (α/2) ∂f(x, t; x₀)/∂x − β f(x, t; x₀) ] = 0.   (2)

The solution of Eq. (1) with condition (2) is [44]

f(x, t; x₀) = (1/√(αt)) [ φ( (x − x₀ − βt)/√(αt) ) − e^{2βx/α} φ( (x + x₀ + βt)/√(αt) ) ],   (3)

where φ denotes the standard normal density.

If the station is not overloaded, ρ < 1 (β < 0), then a steady state exists. The density function does not depend on time, lim_{t→∞} f(x, t; x₀) = f(x), and the partial differential equation (1) becomes an ordinary one,

0 = (α/2) d²f(x)/dx² − β df(x)/dx,

with solution

f(x) = (−2β/α) e^{2βx/α}.   (4)

This solution approximates the queue of the G/G/1 system: p(n, t; n₀) ≈ f(n, t; n₀), and at steady state p(n) ≈ f(n); one can also choose e.g. p(0) ≈ ∫₀^{0.5} f(x) dx and p(n) ≈ ∫_{n−0.5}^{n+0.5} f(x) dx, n = 1, 2, …, [44]. The reflecting barrier excludes the stay at zero: the process is immediately reflected; Eqs. (3), (4) hold for x > 0 and f(0) = 0. Therefore this version of diffusion with a reflecting barrier is a heavy-load approximation: it gives reasonable results if the utilisation of the investigated station is close to 1, i.e. if the probability p(0) of the empty system is negligible. The errors are larger for small values of x, as the mechanism of the reflecting barrier does not correspond to the behaviour of a service station; some improvement may be achieved by renormalisation [44] or by shifting the barrier to x = −C_B² (for C_B² ≤ 1), [25].

This inconvenience may be removed by the introduction of another limit condition at x = 0: a barrier with instantaneous (elementary) jumps [28]. When the diffusion process comes to x = 0, it remains there for a time exponentially distributed with a parameter λ₀ and then it returns to x = 1. The time when the process is at x = 0 corresponds to the idle time of the system. The choice λ₀ = λ is exact for a Poisson input stream and approximate otherwise. The diffusion equation becomes

∂f(x, t; x₀)/∂t = (α/2) ∂²f(x, t; x₀)/∂x² − β ∂f(x, t; x₀)/∂x + λ₀ p₀(t) δ(x − 1),
dp₀(t)/dt = lim_{x→0} [ (α/2) ∂f(x, t; x₀)/∂x − β f(x, t; x₀) ] − λ₀ p₀(t),   (5)

where p₀(t) = P[X(t) = 0]. The term λ₀ p₀(t) δ(x − 1) gives the probability density that the process is started at the point x = 1 at the moment t because of a jump from the barrier. The second equation makes the balance of p₀(t): the term lim_{x→0} [ (α/2) ∂f/∂x − β f ] gives the probability flow into the barrier, and the term λ₀ p₀(t) represents the probability flow out of the barrier.
1.2. Steady state solution

In the stationary state, when lim_{t→∞} p₀(t) = p₀ and lim_{t→∞} f(x, t; x₀) = f(x), Eq. (5) becomes an ordinary differential equation and its solution (taking λ₀ = λ), if ρ = λ/μ ≠ 1, may be expressed as

f(x) = (λ p₀ / (−β)) (1 − e^{zx}),   for 0 < x ≤ 1,
f(x) = (λ p₀ / (−β)) (e^{−z} − 1) e^{zx},   for x ≥ 1,
z = 2β/α.   (6)

We get p₀ from normalisation: p₀ = 1 − ρ, i.e. the exact result. The mean queue length is

E[N] = ∫₀^∞ x f(x) dx = (1 − p₀)(0.5 − 1/z) = ρ [ 0.5 + (C_A² ρ + C_B²) / (2(1 − ρ)) ].   (7)

If we take p(n) = ∫_{n−1}^{n} f(x) dx, n = 1, 2, 3, …, then

E[N] = ρ [ 1 + (C_A² ρ + C_B²) / (2(1 − ρ)) ].   (8)

The solution (8) gives better results than (7) for small values of C_A², C_B² and small ρ. The first discussion of the errors, which grow with C_A², C_B², was presented in [54].
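The steady-state solution (6) can be sanity-checked numerically: its total probability mass must equal 1 − p₀ = ρ, and for C_A² = C_B² = 1 the resulting mean queue length should reproduce the exact M/M/1 value ρ/(1 − ρ). A small Python sketch (the trapezoidal rule, the truncation at x = 200 and the values λ = 0.7, μ = 1 are arbitrary choices):

```python
import math

def diffusion_pdf(lam, mu, ca2, cb2):
    # Steady-state density (6) of the diffusion model with instantaneous returns.
    beta = lam - mu                      # drift (negative for a stable queue)
    alpha = ca2 * lam + cb2 * mu         # diffusion coefficient
    z = 2.0 * beta / alpha               # z < 0 for rho < 1
    p0 = 1.0 - lam / mu                  # exact idle probability
    c = lam * p0 / (-beta)               # normalising factor of f(x)
    def f(x):
        if x <= 1.0:
            return c * (1.0 - math.exp(z * x))
        return c * (math.exp(-z) - 1.0) * math.exp(z * x)
    return p0, f

def trapez(g, a, b, n):
    # Plain trapezoidal rule, accurate enough for this smooth, decaying density.
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return s * h

lam, mu = 0.7, 1.0
p0, f = diffusion_pdf(lam, mu, 1.0, 1.0)                 # M/M/1 case
mass = trapez(f, 0.0, 200.0, 200_000)                    # should be rho = 0.7
mean = trapez(lambda x: x * f(x), 0.0, 200.0, 200_000)   # should be rho/(1-rho)
```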

1.3. Transient solution

Consider a diffusion process with an absorbing barrier at x = 0 (an absorbing barrier means that the process is finished when it attains the barrier), started at t = 0 from x = x₀. Its probability density function φ(x, t; x₀) has the following form, see e.g. [3]:

φ(x, t; x₀) = (1/√(2παt)) [ e^{−(x − x₀ − βt)² / (2αt)} − e^{−2βx₀/α} e^{−(x + x₀ − βt)² / (2αt)} ].   (9)

The density function of the first passage time from x = x₀ to x = 0 is

γ_{x₀,0}(t) = lim_{x→0} [ (α/2) ∂φ(x, t; x₀)/∂x − β φ(x, t; x₀) ] = (x₀ / √(2παt³)) e^{−(x₀ + βt)² / (2αt)}.   (10)
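Since for β < 0 the process drifts towards the barrier and reaches it with probability one, the first-passage density (10) must integrate to unity over (0, ∞). A quick numerical check in Python (the values x₀ = 1, β = −0.5, α = 1 and the integration grid are arbitrary choices):

```python
import math

def first_passage_density(t, x0, beta, alpha):
    # Eq. (10): density of the first passage time from x0 to the barrier at 0.
    return (x0 / math.sqrt(2.0 * math.pi * alpha * t ** 3)
            * math.exp(-(x0 + beta * t) ** 2 / (2.0 * alpha * t)))

x0, beta, alpha = 1.0, -0.5, 1.0
h, n = 1e-3, 100_000                        # rectangle rule on (0, 100]
total = sum(first_passage_density(i * h, x0, beta, alpha)
            for i in range(1, n + 1)) * h   # should be close to 1
```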

Suppose that the process starts at t = 0 at a point x with density (x) and evry time when it comes to the barrier it stays there for a time given by a density function l0 (x) and then reappears at x = 1. The total stream 0 (t) of mass probability that enters the barrier is
γ_0(t) = p_0(0) δ(t) + [1 − p_0(0)] γ_{ψ,0}(t) + ∫_0^t g_1(τ) γ_{1,0}(t − τ) dτ ,   (11)

where

γ_{ψ,0}(t) = ∫_0^∞ γ_{ξ,0}(t) ψ(ξ) dξ ,      g_1(τ) = ∫_0^τ γ_0(t) l_0(τ − t) dt .   (12)

The density function of the diffusion process with instantaneous returns is

f(x, t; ψ) = φ(x, t; ψ) + ∫_0^t g_1(τ) φ(x, t − τ; 1) dτ .   (13)

When Laplace transforms of these equations are considered, we have

γ̄_0(s) = p_0(0) + [1 − p_0(0)] γ̄_{ψ,0}(s) + ḡ_1(s) γ̄_{1,0}(s) ,      ḡ_1(s) = γ̄_0(s) l̄_0(s) ,   (14)

where

γ̄_{x_0,0}(s) = exp[ −x_0 (β + √A(s))/α ] ,      γ̄_{ψ,0}(s) = ∫_0^∞ γ̄_{ξ,0}(s) ψ(ξ) dξ ,

and then

ḡ_1(s) = { p_0(0) + [1 − p_0(0)] γ̄_{ψ,0}(s) } l̄_0(s) / [ 1 − l̄_0(s) γ̄_{1,0}(s) ] .   (15)

Equation (13) in terms of Laplace transforms becomes

f̄(x, s; ψ) = φ̄(x, s; ψ) + ḡ_1(s) φ̄(x, s; 1) ,   (16)

where

φ̄(x, s; x_0) = ( exp[ β(x − x_0)/α ] / √A(s) ) [ e^{−|x−x_0| √A(s)/α} − e^{−(x+x_0) √A(s)/α} ] ,
φ̄(x, s; ψ) = ∫_0^∞ φ̄(x, s; ξ) ψ(ξ) dξ ,      A(s) = β^2 + 2αs .   (17)

This approach was proposed in [7]. The inverse transforms of these functions can only be obtained numerically, with the use of an inversion algorithm; we have used for this purpose Stehfest's algorithm [56]. In this algorithm a function f(t) is obtained from its transform f̄(s) for any fixed argument t as

f(t) = (ln 2 / t) Σ_{i=1}^{N} V_i f̄( i ln 2 / t ) ,   (18)

where

V_i = (−1)^{N/2+i} Σ_{k=⌊(i+1)/2⌋}^{min(i, N/2)} k^{N/2+1} (2k)! / [ (N/2 − k)! k! (k − 1)! (i − k)! (2k − i)! ] .   (19)

N is an even integer and depends on the computer precision; we used N = 12–40.

Fig. 1 presents, for a certain t, a comparison of diffusion and exact results, known in the case of the M/M/1 station and expressed by a series of Bessel functions, e.g. [43]. If the diffusion results fall below a certain level, the values of the diffusion density are automatically set to zero because of numerical errors of the applied Laplace inversion. The above transient solution of the G/G/1 model assumes that the parameters of the model are constant. If they are changing, we should define the time periods where they are constant and solve the diffusion equation within these intervals separately; the transient solution obtained at the end of one interval serves as the initial condition for the next, see Figs. 2, 3. The curves diffusion1 and diffusion2 correspond to mean queues computed with the use of p(n, t) = f(n, t) and with formula (8), respectively.

Fig. 1: Exact distribution of the M/M/1 queue, expressed by Bessel functions, and its diffusion approximation; t = 70, ρ = 0.75, μ = 1, the queue is empty at the beginning.

1.4. Drawbacks of the Stehfest algorithm

The main drawback of the Stehfest algorithm is the very large range of the values taken by the coefficients V_i in Eqs. (18) and (19), even for relatively small N. For instance, they span approximately the range 10^0–10^15 for N = 20. Fig. 6 presents the error functions, shown in log-log scale, for the shifted Heaviside (unit step) function f(t) = 1(t ≥ 1). Its Laplace transform e^{−s}/s is inverted


Fig. 2: The mean queue length following the changes of traffic intensity λ; diffusion approximation and simulation results; service time constant and equal to 1 time unit; Poisson input traffic: M/D/1



Fig. 3: The mean queue length following the changes of traffic intensity; diffusion approximation and simulation results; service time constant and equal to 1 time unit; Poisson input traffic: M/D/1



Fig. 4: Comparison of diffusion approximations of the queue distribution at an M/M/1 station with parameters ρ = 0.75 and μ = 1; the queue is empty at the beginning; results are obtained at t = 10, 30, 150 (a) with the initial condition fixed at t = 0 for all t and (b) computed separately for each interval of unit length with initial conditions determined at the end of the previous interval.

with the Stehfest algorithm and compared with the original function. Because of the discontinuity at t = 1, the error at this point must equal 0.5. Fig. 6a shows the absolute error with respect to the summation limit N of Eq. (18). It is visible that for small values of N the error is relatively small and that it reaches the theoretical value of 0.5 at t = 1. We can also notice that for t < 1 the absolute error decreases as N increases. The most important fact is that large values of both N and t cause totally erroneous output of the algorithm. Fig. 6b shows the influence of the precision used on the error function for the same example. A relatively high value of N = 30 is chosen to show the advantage of using a larger mantissa. The increase of N narrows the interval of accurate calculations in t. The use of greater precision practically does not influence the error function in the usable range of t. The errors of the inversion algorithm are especially visible when the time axis is composed of small intervals and at each interval the density function is obtained from the results of the previous one (hence the errors in all former intervals influence the current results), see Fig. 4. The transient queue distributions of the same service station but with an input stream of on-off type, given by diffusion and simulation, are compared in Fig. 5. The on and off phases have


Fig. 5: Comparison of diffusion and simulation estimations of the queue distribution for t = 25, t = 50; the source has two phases with intensities λ_1 = 1.6, λ_2 = 0.4; the queue is empty at the beginning, computations are done within unit intervals, simulation has 150 000 repetitions.


Fig. 6: Dependence of the error function on the value of N in the Stehfest formula and on the precision of the numbers used for calculations


exponential distributions and their mean values equal 100 time units. Although the simulation results were obtained by averaging 150 000 realizations of the random process, the tail of the distribution, where the probabilities have small values, was not accessible to simulation.

2. G/G/1/N station

In the case of a queue limited to N positions, a second barrier of the same type as at x = 0 is placed at x = N. The model equations become [28]

∂f(x, t; x_0)/∂t = (α/2) ∂^2 f(x, t; x_0)/∂x^2 − β ∂f(x, t; x_0)/∂x
                   + λ_0 p_0(t) δ(x − 1) + λ_N p_N(t) δ(x − N + 1) ,
dp_0(t)/dt = lim_{x→0} [ (α/2) ∂f(x, t; x_0)/∂x − β f(x, t; x_0) ] − λ_0 p_0(t) ,
dp_N(t)/dt = lim_{x→N} [ −(α/2) ∂f(x, t; x_0)/∂x + β f(x, t; x_0) ] − λ_N p_N(t) ,

(20)

where δ(x) is the Dirac delta function.


2.1. Steady state

In the stationary state, when lim_{t→∞} p_0(t) = p_0, lim_{t→∞} p_N(t) = p_N, lim_{t→∞} f(x, t; x_0) = f(x), Eqs. (20) become ordinary differential equations and their solution, if ρ = λ/μ ≠ 1, may be expressed as

f(x) = (λ_0 p_0/(−β)) (1 − e^{zx})              for 0 < x ≤ 1 ,
f(x) = (λ_0 p_0/(−β)) (e^{−z} − 1) e^{zx}       for 1 ≤ x ≤ N − 1 ,
f(x) = (λ_N p_N/(−β)) (e^{z(x−N)} − 1)          for N − 1 ≤ x < N ,   (21)

where z = 2β/α, and p_0, p_N are determined through normalisation (taking λ_0 = λ_N = λ):

p_0 = lim_{t→∞} p_0(t) = { 1 + e^{z(N−1)} + (λ/(μ − λ)) [1 − e^{z(N−1)}] }^{−1} ,   (22)
p_N = lim_{t→∞} p_N(t) = p_0 e^{z(N−1)} .   (23)

The steady-state solution does not depend on the distributions of the sojourn times at the boundaries but only on their first moments.
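A small numerical sketch of Eqs. (22)–(23) (ours, with a hypothetical name); for a large buffer the model degenerates to the G/G/1 case, so p_0 → 1 − ρ and p_N → 0:

```python
import math

def ggln_boundary_probs(lam, mu, ca2, cb2, N):
    """Eqs. (22)-(23): boundary probabilities p_0, p_N of the
    G/G/1/N diffusion model (requires rho = lam/mu != 1)."""
    rho = lam / mu
    z = 2.0 * (lam - mu) / (lam * ca2 + mu * cb2)
    e = math.exp(z * (N - 1))
    p0 = 1.0 / (1.0 + e + rho / (1.0 - rho) * (1.0 - e))
    return p0, p0 * e
```

With λ = 0.5, μ = 1 and N = 200, p_0 is numerically 0.5 = 1 − ρ and p_N is negligible.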

2.1.1. Classes

We follow [29]. When the input stream is composed of K classes of customers (all parameters concerning class k have an upper index (k)) and λ = Σ_{k=1}^{K} λ^{(k)}, then the joint service time pdf is defined as

b(x) = Σ_{k=1}^{K} (λ^{(k)}/λ) b^{(k)}(x) ,

hence

1/μ = Σ_{k=1}^{K} (λ^{(k)}/λ) (1/μ^{(k)})      and      C_B^2 = μ^2 Σ_{k=1}^{K} (λ^{(k)}/λ) (C_B^{(k)2} + 1)/μ^{(k)2} − 1 .   (24)

We assume that the input streams of different class customers are mutually independent and that the number of class k customers arriving within a sufficiently long time period is normally distributed, with variance determined by C_A^{(k)2}; the sum of independent normally distributed variables also has a normal distribution, with variance equal to the sum of the component variances, hence

C_A^2 = Σ_{k=1}^{K} (λ^{(k)}/λ) C_A^{(k)2} .   (25)

The above parameters yield α, β of the diffusion equation; the function f(x) approximates the distribution p(n) of customers of all classes present in the queue, p(n) ≈ f(n), and the probability that there are n^{(k)} customers of class k is

p^{(k)}(n^{(k)}) = Σ_{n=n^{(k)}}^{N} p(n) (n choose n^{(k)}) (λ^{(k)}/λ)^{n^{(k)}} (1 − λ^{(k)}/λ)^{n−n^{(k)}} ,      k = 1, . . . , K .   (26)
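The aggregation rules (24)–(26) translate directly into code. The following sketch is our illustration (names are hypothetical); it computes the joint service parameters and the binomial class decomposition, and degenerates correctly for a single class:

```python
from math import comb

def aggregate_service(lams, mus, cb2s):
    """Eq. (24): aggregate service rate mu and squared variation C_B^2
    from per-class arrival rates, service rates and C_B^(k)2."""
    lam = sum(lams)
    mu = 1.0 / sum(l / lam / m for l, m in zip(lams, mus))
    cb2 = mu**2 * sum(l / lam * (c + 1.0) / m**2
                      for l, m, c in zip(lams, mus, cb2s)) - 1.0
    return mu, cb2

def aggregate_arrival_c2(lams, ca2s):
    """Eq. (25): C_A^2 of the superposed input stream."""
    lam = sum(lams)
    return sum(l / lam * c for l, c in zip(lams, ca2s))

def class_marginal(p, lam_k, lam, n_k):
    """Eq. (26): P[n_k customers of class k] from the total-queue
    distribution p(0..N), by binomial thinning with q = lam_k/lam."""
    q = lam_k / lam
    N = len(p) - 1
    return sum(p[n] * comb(n, n_k) * q**n_k * (1.0 - q)**(n - n_k)
               for n in range(n_k, N + 1))
```

For K = 1 the aggregated μ and C_B^2 equal the class values, and with q = 1 the marginal (26) returns the total distribution itself.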

2.2. G/G/1/N, transient solution

The approach presented for the G/G/1 station may also be used in the case of two barriers with instantaneous returns [7]. Consider a diffusion process with two absorbing barriers, at x = 0 and x = N, started at t = 0 from x = x_0. Its probability density function φ(x, t; x_0) has the following form, cf. [3]:

φ(x, t; x_0) = δ(x − x_0)      for t = 0 ,

and for t > 0

φ(x, t; x_0) = (1/√(2παt)) Σ_{n=−∞}^{∞} { exp[ β x_n′/α − (x − x_0 − x_n′ − βt)^2/(2αt) ]
               − exp[ β x_n″/α − (x − x_0 − x_n″ − βt)^2/(2αt) ] } ,   (27)

where x_n′ = 2nN, x_n″ = −2x_0 − x_n′. If the initial condition is defined by a function ψ(x), x ∈ (0, N), lim_{x→0} ψ(x) = lim_{x→N} ψ(x) = 0, then the pdf of the process has the form φ(x, t; ψ) = ∫_0^N φ(x, t; ξ) ψ(ξ) dξ. The Laplace transform of φ(x, t; x_0) can be expressed as

φ̄(x, s; x_0) = ( exp[ β(x − x_0)/α ] / √A(s) ) Σ_{n=−∞}^{∞} [ exp( −|x − x_0 − x_n′| √A(s)/α ) − exp( −|x − x_0 − x_n″| √A(s)/α ) ] ,   (28)

where A(s) = β^2 + 2αs. For computational efficiency we rearranged Eq. (28) to the closed form

φ̄(x, s; x_0) = ( 2 exp[ β(x − x_0)/α ] / ( √A(s) sinh( N√A(s)/α ) ) ) sinh( x_< √A(s)/α ) sinh( (N − x_>) √A(s)/α ) ,   (29)

where x_< = min(x, x_0), x_> = max(x, x_0).

Similarly, φ̄(x, s; ψ) = ∫_0^N φ̄(x, s; ξ) ψ(ξ) dξ. The probability density function f(x, t; ψ) of the diffusion process with elementary returns is composed of the function φ(x, t; ψ), which represents the influence of the initial conditions, and of a spectrum of functions φ(x, t − τ; 1), φ(x, t − τ; N − 1), which are pdfs of diffusion processes with absorbing barriers at x = 0 and x = N, started at time τ < t at the points x = 1 and x = N − 1 with densities g_1(τ) and g_{N−1}(τ):

f(x, t; ψ) = φ(x, t; ψ) + ∫_0^t g_1(τ) φ(x, t − τ; 1) dτ + ∫_0^t g_{N−1}(τ) φ(x, t − τ; N − 1) dτ .   (30)


The densities γ_0(t), γ_N(t) of the probability that at time t the process enters x = 0 or x = N are

γ_0(t) = p_0(0) δ(t) + [1 − p_0(0) − p_N(0)] γ_{ψ,0}(t) + ∫_0^t g_1(τ) γ_{1,0}(t − τ) dτ
         + ∫_0^t g_{N−1}(τ) γ_{N−1,0}(t − τ) dτ ,
γ_N(t) = p_N(0) δ(t) + [1 − p_0(0) − p_N(0)] γ_{ψ,N}(t) + ∫_0^t g_1(τ) γ_{1,N}(t − τ) dτ
         + ∫_0^t g_{N−1}(τ) γ_{N−1,N}(t − τ) dτ ,   (31)

where γ_{1,0}(t), γ_{1,N}(t), γ_{N−1,0}(t), γ_{N−1,N}(t) are the densities of the first passage times between the corresponding points, e.g.

γ_{1,0}(t) = lim_{x→0} [ (α/2) ∂φ(x, t; 1)/∂x − β φ(x, t; 1) ] .   (32)

For absorbing barriers

lim_{x→0} φ(x, t; x_0) = lim_{x→N} φ(x, t; x_0) = 0 ,

hence γ_{1,0}(t) = lim_{x→0} (α/2) ∂φ(x, t; 1)/∂x. The functions γ_{ψ,0}(t), γ_{ψ,N}(t) denote the densities of the probability that the initial process, started at t = 0 at a point with density ψ(ξ), ends at time t by entering x = 0 or x = N, respectively. Finally, we may express g_1(t) and g_{N−1}(t) with the use of the functions γ_0(t) and γ_N(t):

g_1(τ) = ∫_0^τ γ_0(t) l_0(τ − t) dt ,      g_{N−1}(τ) = ∫_0^τ γ_N(t) l_N(τ − t) dt ,   (33)

where l_0(x), l_N(x) are the densities of the sojourn times at x = 0 and x = N; the distributions of these times are not restricted to exponential ones as they are in Eq. (20). The Laplace transforms of Eqs. (31), (33) give ḡ_1(s) and ḡ_{N−1}(s) as the solution of the linear system

ḡ_1(s) = l̄_0(s) { p_0(0) + [1 − p_0(0) − p_N(0)] γ̄_{ψ,0}(s) + ḡ_1(s) γ̄_{1,0}(s) + ḡ_{N−1}(s) γ̄_{N−1,0}(s) } ,
ḡ_{N−1}(s) = l̄_N(s) { p_N(0) + [1 − p_0(0) − p_N(0)] γ̄_{ψ,N}(s) + ḡ_1(s) γ̄_{1,N}(s) + ḡ_{N−1}(s) γ̄_{N−1,N}(s) } ,

whence, e.g.,

ḡ_{N−1}(s) = l̄_N(s) { p_N(0) + [1 − p_0(0) − p_N(0)] γ̄_{ψ,N}(s) + ḡ_1(s) γ̄_{1,N}(s) } / [ 1 − l̄_N(s) γ̄_{N−1,N}(s) ] ,

and the Laplace transform of the density function f(x, t; ψ) is obtained as

f̄(x, s; ψ) = φ̄(x, s; ψ) + ḡ_1(s) φ̄(x, s; 1) + ḡ_{N−1}(s) φ̄(x, s; N − 1) .   (34)

The probabilities that at the moment t the process has the value x = 0 or x = N are

p̄_0(s) = [ γ̄_0(s) − ḡ_1(s) ] / s ,      p̄_N(s) = [ γ̄_N(s) − ḡ_{N−1}(s) ] / s .   (35)

3. State-dependent diffusion parameters, G/G/N/N transient model

In the case of the G/G/N/N model, the value of the diffusion process corresponds to the number of active channels. The output stream depends on the number of occupied channels, hence the diffusion parameters also depend on the value of the diffusion process: α = α(x, t), β = β(x, t). The diffusion interval x ∈ [0, N] is divided into subintervals of unitary length and the parameters are kept constant within these subintervals. For each time- and space-subinterval with constant parameters, a transient diffusion solution is obtained. The equations for the space-intervals are solved together with balance equations for the probability flows between neighbouring intervals. The results are obtained in the form of Laplace transforms of the density functions of the investigated diffusion process and then inverted numerically. If n − 1 < x < n, it is supposed that n channels are busy, hence we choose

α(x, t) = λ(t) C_A^2(t) + n μ C_B^2 ,      β(x, t) = λ(t) − n μ      for n − 1 < x < n .   (36)
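Per Eq. (36) the parameters are piecewise constant in x; a tiny helper (our own illustration, hypothetical name) makes the lookup explicit:

```python
def interval_parameters(x, lam, ca2, mu, cb2, N):
    """Eq. (36): diffusion parameters on the subinterval (n-1, n),
    where n is the number of busy channels implied by x."""
    n = min(int(x) + 1, N)          # x in (n-1, n)  ->  n busy channels
    alpha = lam * ca2 + n * mu * cb2
    beta = lam - n * mu
    return alpha, beta
```

For example, x = 2.5 lies in (2, 3), so three channels are busy and β = λ − 3μ.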

Jumps from x = N to x = N − 1 are performed with density l_N(x). In the transient state we should balance the probability flows between neighbouring intervals with different diffusion parameters. We put imaginary barriers at the borders of these intervals and suppose that a diffusion process entering the barrier at n, n = 1, 2, . . . , N − 1, from its left side (the process is growing) is absorbed and immediately reappears at x = n + ε. Similarly, a process which is diminishing and enters the barrier from its right side reappears on its other side at x = n − ε, see Fig. 7. The density functions for the intervals are as follows:
f_1(x, t; ψ_1) = φ_1(x, t; ψ_1) + ∫_0^t g_1(τ) φ_1(x, t − τ; 1) dτ
                 + ∫_0^t g_{1−ε}(τ) φ_1(x, t − τ; 1 − ε) dτ ,


Fig. 7. Flow balance for the barrier at x = n
f_n(x, t; ψ_n) = φ_n(x, t; ψ_n) + ∫_0^t g_{n−1+ε}(τ) φ_n(x, t − τ; n − 1 + ε) dτ
                 + ∫_0^t g_{n−ε}(τ) φ_n(x, t − τ; n − ε) dτ ,      n = 2, . . . , N − 1 ,

f_N(x, t; ψ_N) = φ_N(x, t; ψ_N) + ∫_0^t g_{N−1+ε}(τ) φ_N(x, t − τ; N − 1 + ε) dτ
                 + ∫_0^t g_{N−1}(τ) φ_N(x, t − τ; N − 1) dτ ,   (37)

and the probability mass flows entering the barriers are

γ_n^R(t) = g_{n−ε}(t) ,      γ_n^L(t) = g_{n+ε}(t) ,      n = 1, . . . , N − 1 ,   (38)

and g_1(t), g_{N−1}(t) are the same as in the G/G/1/N model, Eq. (33). The densities γ_n^R(t), γ_n^L(t) are obtained in a similar way as in the G/G/1/N model, see Eq. (31):

γ_0(t) = p_0(0) δ(t) + γ_{ψ_1,0}(t) + ∫_0^t g_1(τ) γ_{1,0}(t − τ) dτ + ∫_0^t g_{1−ε}(τ) γ_{1−ε,0}(t − τ) dτ ,

γ_1^L(t) = γ_{ψ_1,1}(t) + ∫_0^t g_1(τ) γ_{1,1}(t − τ) dτ + ∫_0^t g_{1−ε}(τ) γ_{1−ε,1}(t − τ) dτ ,

γ_1^R(t) = γ_{ψ_2,1}(t) + ∫_0^t g_{1+ε}(τ) γ_{1+ε,1}(t − τ) dτ + ∫_0^t g_{2−ε}(τ) γ_{2−ε,1}(t − τ) dτ ,

γ_n^L(t) = γ_{ψ_n,n}(t) + ∫_0^t g_{n−1+ε}(τ) γ_{n−1+ε,n}(t − τ) dτ + ∫_0^t g_{n−ε}(τ) γ_{n−ε,n}(t − τ) dτ ,

γ_n^R(t) = γ_{ψ_{n+1},n}(t) + ∫_0^t g_{n+ε}(τ) γ_{n+ε,n}(t − τ) dτ + ∫_0^t g_{n+1−ε}(τ) γ_{n+1−ε,n}(t − τ) dτ ,
      n = 2, . . . , N − 1 ,

γ_N(t) = p_N(0) δ(t) + γ_{ψ_N,N}(t) + ∫_0^t g_{N−1+ε}(τ) γ_{N−1+ε,N}(t − τ) dτ + ∫_0^t g_{N−1}(τ) γ_{N−1,N}(t − τ) dτ ,   (39)

where γ_{i,j}(t) are the densities of the first passage time between points i and j, obtained as in the G/G/1/N model, Eq. (32). This system of equations is subject to Laplace transformation and once again the Laplace transforms f̄_n(x, s; ψ_n) are obtained numerically, for the series of values of s needed by the inversion algorithm for a specified t.

4. Open network of G/G/1, G/G/1/N queues, steady state

Open networks of G/G/1 queues were studied in [29]. Let M be the number of stations and suppose at the beginning that there is one class of customers. The throughput of station i is, as usual, obtained from the traffic equations

λ_i = λ_{0i} + Σ_{j=1}^{M} λ_j r_{ji} ,      i = 1, . . . , M ,   (40)

where r_{ji} is the routing probability between station j and station i, and λ_{0i} is the external flow of customers coming from outside the network. The second moment of the interarrival time distribution is obtained from two systems of equations: the first defines C_{Di}^2 as a function of C_{Ai}^2 and C_{Bi}^2; the second defines C_{Aj}^2 as another function of C_{D1}^2, . . . , C_{DM}^2.

1) The formula (41) is exact for M/G/1, M/G/1/N stations and is approximate in the case of non-Poisson input [2]:

d_i(x) = ρ_i b_i(x) + (1 − ρ_i) a_i(x) ∗ b_i(x) ,      i = 1, . . . , M ,   (41)

where ∗ denotes the convolution operation. From (41) we get

C_{Di}^2 = ρ_i^2 C_{Bi}^2 + C_{Ai}^2 (1 − ρ_i) + ρ_i (1 − ρ_i) .   (42)

2) Customers leaving station i according to the distribution D_i(x) choose station j with probability r_{ij}; the intervals between customers passing this way have pdf d_{ij}(x):

d_{ij}(x) = d_i(x) r_{ij} + d_i ∗ d_i(x) (1 − r_{ij}) r_{ij} + d_i ∗ d_i ∗ d_i(x) (1 − r_{ij})^2 r_{ij} + · · ·   (43)

or, after Laplace transformation,

d̄_{ij}(s) = r_{ij} d̄_i(s) / [ 1 − (1 − r_{ij}) d̄_i(s) ] ,   (44)

hence

E[D_{ij}] = 1/(λ_i r_{ij}) ,      C_{Dij}^2 = r_{ij} (C_{Di}^2 − 1) + 1 .   (45)

E[D_{ij}], C_{Dij}^2 refer to interdeparture times; the number of customers passing from station i to j in a time interval t has approximately a normal distribution with mean λ_i r_{ij} t and variance C_{Dij}^2 λ_i r_{ij} t. The sum of the streams entering station j then has a normal distribution with mean

λ_j t = [ Σ_{i=1}^{M} λ_i r_{ij} + λ_{0j} ] t

and variance

σ_{Aj}^2 t = { Σ_{i=1}^{M} C_{Dij}^2 λ_i r_{ij} + C_{0j}^2 λ_{0j} } t ,

hence

C_{Aj}^2 = (1/λ_j) Σ_{i=1}^{M} r_{ij} λ_i [ (C_{Di}^2 − 1) r_{ij} + 1 ] + C_{0j}^2 λ_{0j}/λ_j .   (46)

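Eqs. (40), (42) and (46) can be solved by a simple fixed-point iteration; the sketch below is our own illustration (the names and the iteration scheme are assumptions, since the text solves a linear system instead). For two M/M/1 stations in tandem it reproduces the classical property that the departure stream of an M/M/1 queue is again Poisson (C^2 = 1):

```python
def network_parameters(lam0, c0, r, mu, cb2, iters=100):
    """One-class open network: traffic equations (40) by iteration,
    then C_A^2 propagation with Eqs. (42) and (46).
    r[j][i] is the routing probability from station j to station i."""
    M = len(lam0)
    lam = list(lam0)
    for _ in range(iters):                          # Eq. (40)
        lam = [lam0[i] + sum(lam[j] * r[j][i] for j in range(M))
               for i in range(M)]
    ca2 = [1.0] * M
    for _ in range(iters):
        rho = [lam[i] / mu[i] for i in range(M)]
        cd2 = [rho[i]**2 * cb2[i] + ca2[i] * (1 - rho[i])
               + rho[i] * (1 - rho[i]) for i in range(M)]   # Eq. (42)
        ca2 = [(sum(r[j][i] * lam[j] * ((cd2[j] - 1.0) * r[j][i] + 1.0)
                    for j in range(M)) + c0[i] * lam0[i]) / lam[i]
               for i in range(M)]                           # Eq. (46)
    return lam, ca2
```

In the tandem test the second station receives λ = 1 and C_A^2 = 1, exactly as expected.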
The parameters λ_{0j}, C_{0j}^2 represent the external stream of customers. For K classes of customers with routing probabilities r_{ij}^{(k)} (let us assume for simplicity that the customers do not change their classes) we have

λ_i^{(k)} = λ_{0i}^{(k)} + Σ_{j=1}^{M} λ_j^{(k)} r_{ji}^{(k)} ,      i = 1, . . . , M ;  k = 1, . . . , K ,   (47)

and

C_{Di}^2 = λ_i Σ_{k=1}^{K} ( λ_i^{(k)} / μ_i^{(k)2} ) [ C_{Bi}^{(k)2} + 1 ] + 2 ρ_i (1 − ρ_i) + (C_{Ai}^2 + 1)(1 − ρ_i) − 1 .   (48)

A customer in the stream leaving station i belongs to class k with probability λ_i^{(k)}/λ_i and we can determine C_{Di}^{(k)2} in a similar way as was done in Eqs. (43)–(45), replacing r_{ij} by λ_i^{(k)}/λ_i:

C_{Di}^{(k)2} = ( λ_i^{(k)}/λ_i ) ( C_{Di}^2 − 1 ) + 1 ;   (49)

then

C_{Aj}^2 = (1/λ_j) Σ_{l=1}^{M} Σ_{k=1}^{K} r_{lj}^{(k)} λ_l^{(k)} [ ( C_{Dl}^{(k)2} − 1 ) r_{lj}^{(k)} + 1 ] + Σ_{k=1}^{K} C_{0j}^{(k)2} λ_{0j}^{(k)} / λ_j .   (50)

Eqs. (42), (46) or (48), (50) form a linear system of equations and allow us to determine C_{Ai}^2 and, in consequence, the parameters α_i, β_i for each station.

5. Open network of G/G/1, G/G/1/N queues, one class, transient solution

In the case of one class of customers, the time axis is divided into small intervals (equal e.g. to the smallest mean service time) and at the beginning of each interval the equations (40), (42), (46) are used to determine the input parameters of each station based on the values of λ_i(t) obtained at the end of the preceding interval [22]. A software tool was prepared [15, 16] and the examples below, see Fig. 8, are computed with its use.

Model 1. The network is composed of a source and three stations in tandem. The source parameters are: λ = 0.1 for t ∈ [0, 10], λ = 4.0 for t ∈ [10, 20]. The parameters of all stations are the same: N_i = 10, μ_i = 2, C_{Bi}^2 = 1, i = 1, 2, 3. Fig. 9a presents the mean queue lengths of the stations in Model 1 as a function of time. Diffusion approximation is compared with simulation.

Model 2; its topology is shown in Fig. 8. The characteristics of the three sources and of one station change with time in the following pattern:

source A: λ_A = 0.1 for t ∈ [0, 10], λ_A = 4.0 for t ∈ [10, 21], λ_A = 0.1 for t ∈ [21, 40];
source B: λ_B = 0.1 for t ∈ [0, 11], λ_B = 4.0 for t ∈ [11, 20], λ_B = 0.1 for t ∈ [20, 40];
source C: λ_C = 0.1 for t ∈ [0, 15], λ_C = 2.0 for t ∈ [15, 22], λ_C = 4.0 for t ∈ [22, 30], λ_C = 2.0 for t ∈ [30, 31], λ_C = 0.1 for t ∈ [31, 40];
station 6: μ_6 = 2 for t ∈ [10, 15] and t ∈ [31, 40]; μ_6 = 4 for t ∈ [15, 31].

The other parameters are constant: N_1 = N_4 = 10, N_3 = 5, N_2 = N_6 = 20, μ_1 = · · · = μ_5 = 2. The routing probabilities are r_{12} = r_{13} = r_{14} = 1/3, r_{64} = 0.8. Initial state: N_1(0) = 5, N_2(0) = 10, N_3(0) = 10, N_4(0) = 5, N_5(0) = 5, N_6(0) = 10. The results, in the form of mean queue lengths, are presented and compared with simulation in Figs. 9, 10.


Fig. 8. Models 1 and 2


Fig. 9: (a) Model 1: mean queue lengths of station 1, station 2 and station 3 as a function of time; diffusion and simulation (100 000 repetitions) results; the source intensity λ(t) is indicated. (b) Model 2: mean queue lengths of station 1 and station 2 as a function of time; diffusion and simulation (100 000 repetitions) results; the source intensities λ_A(t), λ_B(t) are indicated.



Fig. 10: Model 2: mean queue lengths of station 3 and station 4 (a) and of station 5 and station 6 (b)

6. Leaky bucket model

In the leaky bucket scheme, cells, before entering the network, must obtain a token. Tokens are generated at a constant rate and stored in a token buffer of finite capacity M. If a token is available, an arriving cell consumes it and leaves the bucket. If not, it waits for a token in the cell buffer, whose capacity is limited to B, Fig. 11. Tokens and cells arriving at full buffers are lost. The diffusion process X(t) is defined on the interval x ∈ [0, N = B + M], where B is the capacity of the cell buffer and M is the capacity of the token buffer [12]. The current value of the process is defined as x = b − m + M, b and m being the current contents of the two buffers.

Fig. 11. Leaky bucket scheme Let us suppose that the cell interarrival time distribution has the mean 1/c and squared coecient of variation CA 2 c . The tokens are generated with constant 304

rate t , hence CA 2 t = 0. Arrival of a cell increases the value of the process and arrival of a token decreases it, therefore we choose the parameters of the diusion process as: = c t , = c CA 2 c . The process evolves between two barriers placed at x = 0 and at x = M + B ; x = 0 represents the state where the whole token buer is occupied and arriving tokens are lost; x = M + B represents the state where the token buer is empty and the cell buer is full: arriving cells are lost. The sojourn time at x = M + B corresponds to the residual token interarrival time, i.e. the time between the moment when the cell buer becomes full and the moment of the next token arrival. In [33] the density of holding time at the upper barrier of M/D/1/N diusion model was obtained; we follow this approach and assume that the density function lM +B (x) is lM +B (x) = c e c x . ec /t 1 (51)

The cell loss ratio L(t) may be bounded by expression [33] L(t) pN (t)P r[Arr(t, t + 1/t ) 1] where Arr(t, t + 1/t ) is the number of cell arrivals during interval [t, t + 1/t ]. If the cell stream is Poisson, the pdf l0 (x) of the sojourn time at x = 0 is dened by the density of cell interarrival time; otherwise we take this density as an approximation of l0 (x). Note that the sojourn times in boundaries are dened here by the densities l0 (t), lN (t) and are not restricted to exponential distributions. The values x > M of the process correspond to states where cells are waiting for tokens, the value x M determines in this case the number of cells in the buer; x < M means that there are tokens waiting for cells and the value M x corresponds to the number of tokens in the buer. Probability of b cells in the buer at time t is dened by f (M + b, t); probability of the empty cell buer M is given by pt (t) = p0 (t) + 0 f (x, t)dx. Probability of m tokens in the buer is given by f (M m, t) and probability of empty token buer is determined by M +B f (x, t)dx + pN (t) where p0 (t) = P r[X (t) = 0], pN (t) = P r[X (t) = N ]. M The service time is constant, hence the density function of the cell waiting time for tokens (response time of leaky bucket) may be estimated as r(x, t) = t f (t x + M, t). Hence, using G/G/1/N model we obtain transient f (x, t; ) and steady-state f (x) distributions of the diusion process for 0 x M + B . This gives us the 305

distribution of the number of tokens and cells in the leaky bucket, the response time distribution, the loss probabilities, the properties of the output stream, etc. The capacities of cell and token buers may be null, so we are able to consider a number of leaky bucket variants. The output process of the leaky bucket is the same as the cell input process provided, with probability pt (t), that there are tokens available and it is the same as token input process with probability [1 pT 9t)] that tokens are not available; at the time moment t the pdf d(x) of interdeparture times in the output stream is 1 d(x, t) = pt (t)a(x, t) + [1 pt (x, t)] (x ) , (52) t where a(x, t) is the time-dependent pdf of cell interarrival times distribution. Eq. (52) gives us the mean value and squared coecient of interdeparture times distribution, i.e. whole information needed to incorporate one or multiple leakybucket stations (for example a cascade of leaky-buckets with dierent parameters) in the diusion queueing network model of G/G/1 or G/G/1/N stations. The principles of the latter model were given in [29]. Numerical example At t = 0 the cell buer is empty and the token buer contains M (0) tokens. The tokens are generated regularly each time-unit. The cell arrival stream is Poisson; the mean interarrival time is 0.5 time-unit for 0 t < 100 and 1.5 time units for t 100, i.e. there is a trac wave exceeding the accorded level during the rst 100 units and then the trac goes down below this level. The buer capacities are B = M = 100. Figure 12 displays the diusion and simulation results concerning the output stream of leaky bucket for the initial number of tokens M (0) = 0, 50 and 100. The output dynamics given by simulation and by difusion model are very similar. Simulation results are obtained as a mean of 100 000 independent runs. 
If there are no tokens at the beginning, the cell stream is immediately cut to the level of the token intensity (one cell per time unit), the excess cells are stored in the cell buffer and transmitted later, when t > 100 and the input rate becomes smaller. If there are tokens in the token buffer, a part (for M(0) = 50) or almost the totality (for M(0) = 100) of the traffic wave may enter the network. Figure 13 presents the comparison of the mean number of cells in the cell buffer as a function of time, for different initial contents of the token buffer, M(0) = 0, 50 and 100, obtained by the diffusion and simulation models. In Figure 14 the distributions of the cell buffer contents obtained by simulation and by diffusion are presented for several moments, including t = 100, i.e. the end of the high traffic period, when the congestion is the biggest. We see that although the mean queue length is below the buffer capacity, the probability that the buffer is full

is important (≈ 0.4). Note that we could not obtain this result with the use of fluid approximation, even if the mean numbers of cells in the buffer predicted by the diffusion and fluid approximations were similar.
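The holding-time density (51) integrates to one over the residual token interarrival interval [0, 1/λ_t]; the snippet below (our sketch, hypothetical name) checks this numerically with a midpoint rule:

```python
from math import exp

def upper_barrier_sojourn_pdf(x, lam_c, lam_t):
    """Eq. (51): density of the sojourn time at x = M + B (full cell
    buffer), supported on 0 <= x <= 1/lam_t."""
    return lam_c * exp(lam_c * x) / (exp(lam_c / lam_t) - 1.0)
```

The normalisation check below uses λ_c = 2, λ_t = 1 so the support is [0, 1].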

Fig. 12: The input and output of the leaky bucket as a function of time: the stream intensities for the initial number of tokens M(0) = 0, 50 and 100; diffusion and simulation results
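Because Eq. (52) is a two-point mixture, the first two moments of the interdeparture time follow immediately; a small helper (ours, purely illustrative) computes them and hence the squared coefficient of variation passed to a downstream network model:

```python
def output_stream_moments(p_t, mean_a, c2_a, lam_t):
    """Moments of the leaky-bucket interdeparture time implied by Eq. (52):
    with probability p_t the cell-stream gap a(x), otherwise the
    deterministic token gap 1/lam_t."""
    m = p_t * mean_a + (1.0 - p_t) / lam_t
    ex2 = p_t * (c2_a + 1.0) * mean_a**2 + (1.0 - p_t) / lam_t**2
    return m, ex2 / m**2 - 1.0
```

For p_t = 1 the cell-stream moments are returned unchanged, and for p_t = 0 the output is the deterministic token stream (C^2 = 0).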

7. Sliding window model

The mechanism checks whether, at each time t, the number of cells that arrived during [t − T_c, t] is not higher than a fixed threshold B_c. These cells may pass as conforming traffic with higher priority. If the threshold B_c is reached, an additional number B_e of cells may enter the network with lower priority. Any further cells are deleted. The performance of this mechanism is modelled by two service stations of G/D/N/N type with N = B_c and N = B_e, and constant service time equal to T_c, see Fig. 15. If the cell stream has interarrival pdf a(x), then the input stream to the upper station has interarrival density function

a_1(x) = a(x)(1 − p_{Bc}) + a ∗ a(x) p_{Bc}(1 − p_{Bc}) + a ∗ a ∗ a(x) p_{Bc}^2 (1 − p_{Bc}) + · · · .
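The geometric compounding that defines a_1(x) is the same mechanism as Bernoulli thinning of a renewal stream, for which the reasoning behind Eq. (45) predicts C^2 = p(C_A^2 − 1) + 1, with p the keep probability. A Monte Carlo sketch (entirely ours, an illustration rather than the authors' method) confirms this for gamma-distributed interarrivals:

```python
import random
from statistics import mean, pvariance

def thinned_c2(c2_a, keep_prob, n=200_000, seed=7):
    """Estimate the squared coefficient of variation of a renewal stream
    (gamma gaps with C^2 = c2_a, unit mean) thinned with prob. keep_prob:
    each kept gap is a geometric sum of original gaps."""
    rng = random.Random(seed)
    shape = 1.0 / c2_a               # gamma(shape, scale) with mean 1
    gaps, acc = [], 0.0
    while len(gaps) < n:
        acc += rng.gammavariate(shape, 1.0 / shape)
        if rng.random() < keep_prob:
            gaps.append(acc)
            acc = 0.0
    return pvariance(gaps) / mean(gaps) ** 2
```

With c2_a = 1 and keep probability 0.5 the estimate stays close to the predicted value 1; with c2_a = 0.5 it approaches 0.75.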


Fig. 13: Mean number of cells in the cell buffer as a function of time, M(0) = 0, 50 and 100; diffusion and simulation results


Fig. 14: Density of the number of cells during the high source activity period, t = 25, 50, 75 (a), t = 100 (b); M(0) = 0; diffusion and simulation results



Fig. 15. Sliding window queueing model

This stream enters the network. The stream of cells rejected by the first station and directed to the second one has pdf d_1(x):

d_1(x) = a(x) p_{Bc} + a ∗ a(x) (1 − p_{Bc}) p_{Bc} + a ∗ a ∗ a(x) (1 − p_{Bc})^2 p_{Bc} + · · · ,

and the stream of cells with lowered priority entering the network has the pdf

a_2(x) = a_1(x)(1 − p_{Be}) + a_1 ∗ a_1(x) p_{Be}(1 − p_{Be}) + a_1 ∗ a_1 ∗ a_1(x) p_{Be}^2 (1 − p_{Be}) + · · · .

Numerical examples can be found in [40, 53].

8. Multiclass FIFO queues, output streams dynamics

Consider a queueing network model representing a computer network. Customers (cells in an ATM network) are grouped into classes. Each class represents a connection between two points of the network and its description includes the features of the source and the itinerary across the network. Customer queues at servers correspond to the queues of cells waiting at a node to be sent further. At the queue exit the class of a customer should be known in order to determine its routing.

In steady-state queueing models this probability, for a certain class k, is given by the ratio of the throughput λ^{(k)} of this class passing the node to the total throughput λ of this node, λ^{(k)}/λ, where λ = Σ_{k=1}^{K} λ^{(k)} and K is the number of classes passing across the node. In the transient state, the throughputs, and hence the probability λ^{(k)}(t)/λ(t), are functions of time. Moreover, the flows at the entrance and at the exit are not the same: λ_{i,in}^{(k)} ≠ λ_{i,out}^{(k)}. The composition of the output flow reflects the previous compositions at the entrance, delayed by the response time of the queue. To solve this problem, we choose the constant service time as the time unit, divide the time axis into intervals of unitary length and assume that the flow parameters are constant within each interval; e.g. λ^{(k)}(τ) denotes the class k flow at station i during an interval τ, τ = 1, 2, . . . The input stream λ_{in}^{(k)}(τ) reaches the output of the queue with a delay corresponding to the queue length in the buffer at the arrival time. The part p(n, τ) λ_{i,in}^{(k)}(τ) of this submitted load cannot be served immediately and the corresponding flow cannot appear at the output before the time t = τ + n + 1. So, taking into account these delays, the unfinished work ready to be processed at time t in a station which is initially empty can be expressed as

U^{(k)}(t) = Σ_{n=1}^{N} λ_{in}^{(k)}(t − n) p(n − 1, t − n)      for k = 1, . . . , K .

A similar formulation may easily be derived for an initially nonempty queue, knowing its composition at t = 0. Remark that some accumulation periods (of high or quickly increasing load) may mean that the work ready at t exceeds the server capacity: Σ_k U^{(k)}(t) > 1. Such a phenomenon introduces an additional delay in the transfer of the input stream to the output. To compute the output throughput λ_{out}^{(k)}(t), we first find the ready time τ_1 of the cells at the head of the queue. This is equal to the smallest value of τ such that

Σ_{τ=0}^{τ_1} Σ_k U^{(k)}(τ) ≥ Σ_{τ=0}^{t−1} Σ_k λ_{out}^{(k)}(τ) .   (53)

Then, we determine the smallest τ_2 for which

Σ_{τ=0}^{τ_2} Σ_k U^{(k)}(τ) − Σ_{τ=0}^{t−1} Σ_k λ_{out}^{(k)}(τ) ≥ 1 − p(0, t) .
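The formula for U^{(k)}(t) is a direct convolution of the input history with the delayed queue-length distribution. The sketch below is our own (the data structures lam_in[k][tau] and p_hist[tau][n] are hypothetical); for a permanently empty queue it reduces to U^{(k)}(t) = λ_{in}^{(k)}(t − 1), as expected:

```python
def unfinished_work(lam_in, p_hist, t, N):
    """U^(k)(t) for an initially empty station: input submitted at t-n
    that found n-1 customers ahead becomes ready to serve at t.
    lam_in[k][tau]: class-k input rate in interval tau;
    p_hist[tau][n]: queue-length distribution p(n, tau)."""
    K = len(lam_in)
    return [sum(lam_in[k][t - n] * p_hist[t - n][n - 1]
                for n in range(1, N + 1) if t - n >= 0)
            for k in range(K)]
```

In the test below the queue is always empty (p(0, τ) = 1), so the ready work at t = 2 equals the single-class input of interval 1.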

Fig. 16. A model of a switch The output throughput out (t) is obtained as out (t) =
=1 (k) 2 (k)

w U (k) ()

(54)

where w1 represents the percentage of U (k) (1 ) that has not been sent yet, w = 2 1 for = 1 and 2 , and w2 is chosen such that =1 w U ( ) = 1 p(0, t). Numerical example. Consider an ATM switch presented in Fig. 16. Each of the output queues has the capacity of 100 cells and is analysed with the use of G/D/1/100 multiclass diusion model and M/D/1 multiclass uid approximation model. The service time is constant, = 1. The input streams are Poisson with parameters (k) (t) chosen as: time t (1) in (t) (2) in (t) (3) in (t) 0 10 0.1 0.2 0 10 20 0.8 0.2 0 20 30 0.8 0.2 0.5 30 40 0.1 0.2 0.5 40 80 0.1 0.2 0 > 80 0 0 0
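A minimal sketch of the search for τ₁ and τ₂ used in Eqs. (53)–(54); the arrays `U` (ready work summed over classes), `served` (amounts already output) and the probability `free_prob` = p(0, t) are toy placeholders, not outputs of the diffusion model.

```python
# Find tau1, the smallest index where the accumulated ready work reaches the
# accumulated output, and tau2, the smallest index where it additionally
# covers the server capacity 1 - p(0, t) of the current slot.

def output_window(U, served, free_prob):
    target = sum(served)
    cum = 0.0
    tau1 = tau2 = None
    for tau, u in enumerate(U):
        cum += u
        if tau1 is None and cum >= target:
            tau1 = tau
        if cum >= target + 1.0 - free_prob:
            tau2 = tau
            break
    return tau1, tau2

U = [0.4, 0.4, 0.4, 0.4]     # ready work per unitary interval
served = [0.3]               # 0.3 units already left the queue
tau1, tau2 = output_window(U, served, free_prob=0.0)
```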

The streams are switched to a chosen output port as indicated in the figure. Figures 17–19 present the results: the output streams at the output port (the queue is empty at the beginning). These results, obtained with the use of diffusion approximation and fluid flow approximation, are compared with simulation results which represent the mean of 400 000 independent runs and may practically be considered as exact. The differences between input and output streams are clearly visible for the whole stream as well as for each class separately. They justify the need of the presented approach in the transient analysis of networks: the impact of the switch queue on the flow dynamics is important and cannot be neglected. In all considered cases the results of diffusion approximation are very close to simulation and clearly better than those obtained with the fluid flow approximation.
Fig. 17: The mean queue length, global and per class, as a function of time: simulation and fluid approximation results (a), simulation and diffusion approximation results (b)

Fig. 18: The intensities λ_in(t), λ_out(t) of global input and output flows (a) and of class 1 traffic (b); the output flow is obtained by simulation, diffusion and fluid flow approximations


Fig. 19: The intensities λ_in(t), λ_out(t) of input and output flows for class 2 traffic (a) and for class 3 traffic (b); the output flow is obtained by simulation, diffusion and fluid flow approximations

9. Switch with threshold (partial buffer sharing) algorithm

In a node with partial buffer sharing policy the diffusion process represents the content of the cell buffer. The process is determined on the interval x ∈ [0, N], where N is the buffer capacity [12]. When the number of cells is equal to or greater than the threshold N1 (N1 < N), only priority cells are admitted and ordinary ones are lost. The diffusion process represents the number of cells of both classes, hence its parameters depend on the input and service parameters, which are different for x ≤ N1 and x > N1:

β(x) = β1 = λ^(1) + λ^(2) − μ   for 0 < x ≤ N1,
       β2 = λ^(1) − μ           for N1 < x < N,   (55)

α(x) = α1 = λ^(1) C_A^(1)2 + λ^(2) C_A^(2)2 + μ C_B^2   for 0 < x ≤ N1,
       α2 = λ^(1) C_A^(1)2 + μ C_B^2                     for N1 < x < N.   (56)
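The threshold-dependent parameters of Eqs. (55)–(56) can be sketched as a small helper; all argument names are illustrative, and C_B² = 0 corresponds to the constant service time assumed below.

```python
# Piecewise diffusion parameters beta(x), alpha(x) of the partial buffer
# sharing model: both classes are admitted below the threshold N1, only the
# priority class above it.

def diffusion_params(x, N1, lam1, lam2, ca1, ca2, mu, cb2=0.0):
    if x <= N1:                       # both classes admitted
        beta = lam1 + lam2 - mu
        alpha = lam1 * ca1 + lam2 * ca2 + mu * cb2
    else:                             # only priority (class 1) cells
        beta = lam1 - mu
        alpha = lam1 * ca1 + mu * cb2
    return beta, alpha

beta, alpha = diffusion_params(60, N1=50, lam1=0.4, lam2=0.3,
                               ca1=1.0, ca2=1.0, mu=1.0)
```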

We assume constant service time, hence C_B^2 = 0. Once again we use Eq. (51) to represent the sojourn time in the barrier at x = N and to determine, as the inverse of the mean sojourn time, the intensity of jumps from this barrier.

Steady state solution. Let f1(x) and f2(x) denote the pdf of the diffusion process in the intervals x ∈ (0, N1] and x ∈ [N1, N). We suppose that lim_{x→0} f1(x, t; x0) = lim_{x→N} f2(x, t; x0) = 0, that the functions f1(x) and f2(x) have the same value at the point N1, f1(N1) = f2(N1), and that there is no probability mass flow within the interval x ∈ (1, N − 1):

(α_n/2) d f_n(x)/dx − β_n f_n(x) = 0,   x ∈ (1, N1) for n = 1 and x ∈ (N1, N − 1) for n = 2,

and we obtain the solution of the diffusion equations:

f1(x) = { [λ^(1) + λ^(2)] p0 (1 − e^{z1 x}) / β1              for 0 < x ≤ 1,
          [λ^(1) + λ^(2)] p0 (1 − e^{z1}) e^{z1(x−1)} / β1     for 1 ≤ x ≤ N1,
                                                                              (57)
f2(x) = { f1(N1) e^{z2(x−N1)}                                  for N1 ≤ x ≤ N − 1,
          μ pN (1 − e^{z2(x−N)}) / β2                          for N − 1 ≤ x < N,

where z_n = 2β_n/α_n, n = 1, 2. The probabilities p0, pN are obtained with the use of the normalization condition. The loss ratio L^(1) is expressed by the probability pN; the loss ratio L^(2) is determined by the probability P[x > N1] = ∫_{N1}^{N} f2(x) dx + pN.

Numerical example. Fig. 20 presents, in linear and logarithmic scale, the steady-state distribution given by Eqs. (57) of the number of cells present in a station. The buffer length is N = 100, the threshold value is N1 = 50. Some of the values are compared with simulation histograms, which we were able to obtain only for relatively large values of probabilities.
Fig. 20: Steady-state distribution of the number of cells for traffic densities λ = 0.8, 0.9 in linear scale (a) and λ = 0.1, ..., 0.9 in logarithmic scale (b); diffusion and simulation results

Transient solution. The transient solution is obtained with the technique presented earlier for the G/G/N/N model. It makes use of the balance equations for probability flows crossing the barrier situated at the boundary between the intervals with different diffusion coefficients, i.e. at x = N1. Let us consider two separate diffusion processes X1(t), X2(t):

X1(t) is defined on the interval x ∈ (0, N1). At x = 0 there is a barrier with sojourn times defined by a pdf l0(t) and instantaneous returns to the point x = 1. At x = N1 an absorbing barrier is placed. Denote by γ_{N1}^L(t) the pdf of the process entering the absorbing barrier at x = N1. The process is reinitiated at x = N1 − ε with a density g_{N1−ε}(t).

X2(t) is defined on the interval x ∈ (N1, N). It is limited by an absorbing barrier at x = N1 and by a barrier with instantaneous returns at x = N. The sojourn time at this barrier is defined by a pdf lN(t) and the returns are performed to x = N − 1. The process is reinitiated at x = N1 + ε with a density g_{N1+ε}(t). Denote by γ_{N1}^R(t) the pdf of the process X2(t) entering the absorbing barrier at x = N1.

The interaction between the two processes is given by the equations

g_{N1+ε}(t) = γ_{N1}^L(t)   and   g_{N1−ε}(t) = γ_{N1}^R(t),

i.e. the probability density that one process enters its absorbing barrier is equal to the density of reinitialization of the other process in the vicinity of the barrier. Equations (31) and (33) form a set of eight equations with eight unknown functions. When we transform these equations with the use of the Laplace transform, the convolutions of density functions become products of transforms and we obtain a set of linear equations whose unknown variables are the transforms g1(s), gN1−ε(s), gN1+ε(s), gN−1(s), γ0(s), γN(s), γN1−(s), γN1+(s). They may be expressed through the first-passage-time densities from the starting points x0 = 1, N1 − ε, N1 + ε, N − 1 to the barriers x = 0, N1, N, which are already determined by equations of type (32). This way we obtain the functions g1(s), gN1−ε(s), gN1+ε(s), gN−1(s) and use them in the pdfs (37). The time-domain originals f1(x, t; ψ1), f2(x, t; ψ2) are obtained numerically [56] from their transforms. The density of the whole process is

f(x, t; ψ) = { f1(x, t; ψ1)  for 0 < x < N1,
               f2(x, t; ψ2)  for N1 < x < N.

To see the evolution of the number of cells belonging to a class, we have to consider the composition of input and output streams. Let us denote by p^(i)(t) the probability that a cell arriving at time t belongs to class i:

p^(i)(t) = λ_e^(i)(t) / [λ_e^(1)(t) + λ_e^(2)(t)],   (58)

where

λ_e^(1)(t) = λ^(1)(t)[1 − pN(t)],   λ_e^(2)(t) = λ^(2)(t)[1 − p_{n≥N1}(t)]   (59)

and p_{n≥N1}(t) is the probability that the buffer space accessible for class 2 cells is full and these cells are rejected. We try to reflect the mutual influence of both classes in effective parameters of their service and then analyze the class behaviour independently. We know the distribution of the total number n(t) of cells in the buffer at time t. Among those cells there are n^(2)(t) class 2 cells. Let us denote by ν(t), 0 ≤ ν(t) ≤ n, the number of class 1 cells gathered at the end of the buffer behind the last class 2 cell, seen at time t by an arriving new class 2 cell. As the service time is equal to one time unit, the effective service time for the arriving class 2 cell is 1 + ν. If n^(2)(t) > 0 then

P[ν = i | n(t) = n, n^(2)(t) = n^(2)] = C_{n−i−1}^{n^(2)−1} / C_n^{n^(2)},   0 ≤ i ≤ n − n^(2),   (60)

where

C_m^l = { m! / [l! (m − l)!]  for m ≥ l ≥ 0,
          0                   otherwise;

if n^(2) = 0 then P(ν = i | n) = δ_{i,n} where δ_{i,n} is the Kronecker symbol. We determine the probability of ν = i when n(t) = n, for all possible n^(2):

P[ν = i | n(t) = n] = (1 − p^(2))^i δ_{i,n} + Σ_{n^(2)=1}^{min{n−i, N1−1}} P[n^(2) | n] P[ν = i | n, n^(2)]
                    = (1 − p^(2))^i δ_{i,n} + Σ_{n^(2)=1}^{min{n−i, N1−1}} C_{n−i−1}^{n^(2)−1} (p^(2))^{n^(2)} (1 − p^(2))^{n−n^(2)}   (61)

and the distribution of ν is

P[ν = i] = (1 − p^(2))^i P[n(t) = i] + Σ_{l=0}^{N} P[n(t) = l] Σ_{m=1}^{min{l−i, N1−1}} C_{l−i−1}^{m−1} (p^(2))^m (1 − p^(2))^{l−m}.   (62)

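Eq. (60) is a simple combinatorial ratio and can be sketched with `math.comb`; the check that the conditional probabilities sum to one follows from the hockey-stick identity.

```python
import math

# Sketch of Eq. (60): probability that i class-1 cells sit at the end of the
# buffer behind the last class-2 cell, given n cells in total, n2 of them of
# class 2; the Kronecker-delta case n2 = 0 is handled separately.

def p_nu(i, n, n2):
    if n2 == 0:
        return 1.0 if i == n else 0.0
    if not 0 <= i <= n - n2:
        return 0.0
    return math.comb(n - i - 1, n2 - 1) / math.comb(n, n2)

# the conditional probabilities sum to one over i = 0 .. n - n2
total = sum(p_nu(i, 10, 3) for i in range(0, 8))
```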

Now we are able to determine the mean and the squared coefficient of variation of the random variable B^(2) representing the effective service time for class 2 cells, B^(2)(t) = 1 + ν(t):

E[B^(2)(t)] = 1 + E[ν(t)],
C_B^(2)2(t) = E[B^(2)2(t)] / E[B^(2)(t)]^2 − 1 = (E[ν^2(t)] − E[ν(t)]^2) / (E[ν(t)] + 1)^2.   (63)

The coefficient C_B^(2)2 is thus given by Eq. (63); C_B^(1)2 is deduced on a similar principle, and C_A^(1)2, C_A^(2)2 are deduced from the input streams. The changes in the intensities of the input rates at the instant t influence the output with a delay of n(t). The service times of the ν(t) class 1 cells which are at the end of the queue are considered as a part of the service of the arriving class 2 cell. The change of the input at t is taken into account in the service time B^(2)(t + n(t) − ν(t)). On the other hand, the composition of the queue does not depend only on the input composition p(t) but on its evolution since the last class 2 cell arrival moment. It is not easy to determine in transient analysis the delay with which the input changes act on ν(t). As a rough approximation, we considered a delay equal to n(t)/2. This choice permits us to deal with sudden falls of the class 2 input rate. Although this method captures the dynamics of the second class cell number, further efforts seem to be necessary to obtain a more general characterization of the time-dependent queue composition.

Numerical example. Let us suppose that at the beginning the buffer is empty and that during the interval t ∈ [0, 100] the input stream of priority cells has rate λ^(1) = 2 cells per time unit and the one of low priority cells λ^(2) = 1; for t > 100 the rate of high priority cells is λ^(1) = 0.6667, the rate of low priority cells does not change. The service time is constant and equal to one time unit. The buffer length is N = 100, the value of the threshold varies between N1 = 50 and N1 = 90. The value of ε in Eqs. (37)–(39) was chosen as ε = 0.1. In Fig. 21 the distributions of the number of cells in the buffer obtained by simulation and by the diffusion model for chosen time moments are compared. Diffusion and simulation results are placed in separate figures to preserve their legibility. The shape of the curves given by the two models is very similar. At the end of the second period (t = 400, 500, 600) the steady state distribution is attained. Fig. 22 displays the mean number of cells in the buffer as a function of time. During the first 100 time units the congestion is clearly visible, the buffer quickly becomes saturated; during the second period the queue is also overcrowded, the probability that the threshold is exceeded is near 0.7, but owing to the buffer sharing policy the probability that the buffer is inaccessible for priority cells remains negligible, Fig. 23. The threshold value N1 is a parameter of the displayed curves.

If N1 increases, the mean number of low priority cells increases (they have more space in the buffer, hence fewer of them are rejected) and the number of priority cells increases too (as there are more class 2 cells in the queue, class 1 cells wait longer). Fig. 24 displays the mean number of high and low priority cells given by the approach we have described above using Eqs. (58)–(63) and compared with simulation results. We see that the steady state mean value of class 2 cells is underestimated (because of the overestimation of class 2 losses by the diffusion approximation seen in Fig. 23) but the dynamics of class 2 cells vanishing from the queue during heavy saturation periods is well captured. Some numerical problems were encountered when computing the expressions φ(x, t; ψ) and γ0(t) for very small values of λ_e^(2)(t), λ_e^(1)(t), and they required very careful programming.
Fig. 21: Distribution of the number of all cells in the buffer for several time moments t = 25, ..., 500; buffer size N = 100, threshold N1 = 50; simulation (a) and diffusion (b) results

10. Switch with push-out algorithm

The push-out policy is defined as follows. While the number n of customers in such a station is less than N, it acts as a conventional G/G/1/N station serving two classes of customers. During the saturation periods, i.e. when n = N, the class-2 customers (lower priority) are replaced by their privileged partners (class-1 customers). An iterative algorithm to calculate the effective arrival rates under the replacement policy is proposed in [6]. Let λ^(k,j) be the arrival rate of class k customers in the j-th iteration; set λ^(k,0) = λ^(k).

Fig. 22: Mean number of cells as a function of time, parametrized by the value N1 = 50, 60, 70, 80, 90 of the threshold; simulation (a) and diffusion (b) results

Fig. 23: Probability that the buffer of length N = 100 is full (priority cells are lost) and that the threshold is exceeded (ordinary cells are lost) as a function of time, parametrized by the threshold value N1 = 50, 60, 70, 80, 90; simulation (a) and diffusion (b) results


Fig. 24: Mean value of class 1 and class 2 cells as a function of time; N = 100, N1 = 50

During non-saturation periods the customers are not deleted and the parameters of the model are calculated with the use of Eqs. (24)–(25). We obtain via the G/G/1/N model the function f(x), p0, pN as specified by Eqs. (21)–(23) and the distribution p(n) of n customers of both classes taken together. The conditional distribution p(n | n < N), which corresponds to the non-saturation period, can easily be obtained from Eq. (26). The process enters the saturation period with probability p(N). The conditional distribution of the number of class-k customers calculated without replacements is as follows:

p^(k)(n^(k) | N) = C_N^{n^(k)} [λ^(k)/λ]^{n^(k)} [1 − λ^(k)/λ]^{N−n^(k)},   k = 1, 2.   (64)

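For illustration, the binomial conditional distribution of Eq. (64) can be evaluated directly, assuming the per-position class probability is the arrival share λ^(k)/λ (the parameter `q` below, with illustrative values).

```python
import math

# Sketch of Eq. (64): conditional distribution of the number of class-k
# customers in a full buffer of size N, each of the N positions holding a
# class-k customer independently with probability q.

def p_cond(nk, N, q):
    return math.comb(N, nk) * q ** nk * (1.0 - q) ** (N - nk)

dist = [p_cond(nk, 10, 0.3) for nk in range(11)]
mean = sum(i * p for i, p in enumerate(dist))   # binomial mean N * q
```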
The sojourn time in the barrier x = N has pdf

b(x) = [λ^(1,j) / (λ^(1,j) + λ^(2,j))] b^(1)(x) + [λ^(2,j) / (λ^(1,j) + λ^(2,j))] b^(2)(x).

Now the policy of replacements proceeds. If we approximate the stream of class-1 customers during a saturation period by a Poisson process with parameter λ^(1), the probability of n arrivals of class-1 customers within a single saturation period is

p_arriv(n) = ∫_0^∞ [(λ^(1,j) x)^n / n!] exp(−λ^(1,j) x) b(x) dx,   n = 0, 1, ....

If the service time is constant, b(x) = δ(x − 1/μ), then

p_arriv(n) = [(λ^(1,j)/μ)^n / n!] exp(−λ^(1,j)/μ),   n = 0, 1, ...

and the probability of n replacements is

p_rep(n) = p_arriv(n) Σ_{n^(2)=n+1}^{N} p^(2)(n^(2) | N) + p^(2)(n | N) Σ_{i=n}^{∞} p_arriv(i).

The first sum corresponds to situations where there are n arrivals and at least n + 1 class-2 customers which could be replaced. The second sum corresponds to situations where there are n class-2 customers and at least n class-1 arrivals. In the case of a non-Poisson input stream, the pdf a(x) or the distribution function A(x) of interarrival times is required to determine the probability of n arrivals:

p_arriv(n) = ∫_0^∞ b(x) ∫_0^x a_n(t)[1 − A(x − t)] dt dx,

where a_n denotes the n-fold convolution of a(x) with itself. In a G/G/1/N system, class-1 and class-2 customers arriving during a saturation period are lost and the effective throughputs are

λ_e^(1) = λ^(1)(1 − pN),   λ_e^(2) = λ^(2)(1 − pN).   (65)

Due to the policy of replacements, λ_e^(1) increases and λ_e^(2) decreases:

λ_e^(1) = λ^(1)(1 − pN) + pN γ λ^(1),   λ_e^(2) = λ^(2)(1 − pN) − pN γ λ^(1),   (66)

where γ is the probability that a class-1 customer arriving during a saturation period may replace a class-2 customer, that is the ratio of the mean number of replacements to the mean number of arrivals in this period:

γ = Σ_{k=1}^{N} p^(2)(k | N) [ Σ_{i=0}^{k−1} i p_arriv(i) + k Σ_{i=k}^{∞} p_arriv(i) ] / Σ_{i=1}^{∞} i p_arriv(i).   (67)


In the case of constant service times the denominator of the above fraction is Σ_{i=1}^{∞} i p_arriv(i) = λ^(1)/μ.

We calculate the new values of the arrival rates

λ^(1,j+1) = λ^(1) + pN γ λ^(1) / (1 − pN),   λ^(2,j+1) = λ^(2) − pN γ λ^(1) / (1 − pN),

projecting this way the influence of replacements into the non-saturation period. We recalculate the G/G/1/N model, obtaining new f(x), p0, pN, and we iterate until a fixed point is achieved, i.e. until Σ_{k=1}^{2} |λ^(k,j+1) − λ^(k,j)| becomes smaller than an arbitrarily chosen constant. We do not prove the convergence of this algorithm; we only note that few iterations are needed in practice. After a sojourn time at the barrier x = N, the distribution of class-1 and class-2 customers is modified:

p_after^(1)(n) = Σ_{i=0}^{n} p^(1)(i | N) p_rep(n − i),       n = 0, 1, ..., N,
p_after^(2)(n) = Σ_{i=0}^{N−n} p^(2)(n + i | N) p_rep(i),     n = 0, 1, ..., N.

The fraction of time when the system is overloaded and the replacements take place is determined by pN, hence we take these modifications of the distributions into account with the same weight pN:

p^(k)(n^(k)) = (1 − pN) p^(k)(n^(k) | n < N) + pN p_after^(k)(n^(k)),   k = 1, 2.   (68)

The percentages of losses of class-1 and class-2 customers are

L^(1) = [λ^(1) − λ_e^(1)] / λ^(1) = pN (1 − γ),
L^(2) = [λ^(2) − λ_e^(2)] / λ^(2) = pN [1 + γ λ^(1)/λ^(2)].   (69)
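The fixed-point iteration on the arrival rates can be sketched as below; `model` is a hypothetical stand-in returning (pN, γ), which in the actual method come from recomputing the G/G/1/N diffusion model at each step.

```python
# Sketch of the push-out fixed-point iteration on the class arrival rates.

def iterate_rates(lam1, lam2, model, eps=1e-8, max_iter=100):
    l1, l2 = lam1, lam2
    for _ in range(max_iter):
        pN, gamma = model(l1, l2)
        corr = pN * gamma * lam1 / (1.0 - pN)
        n1, n2 = lam1 + corr, lam2 - corr
        if abs(n1 - l1) + abs(n2 - l2) < eps:   # fixed point reached
            return n1, n2
        l1, l2 = n1, n2
    return l1, l2

# toy model with constant pN and gamma: the fixed point is reached at once
l1, l2 = iterate_rates(0.8, 0.4, lambda a, b: (0.1, 0.5))
```

With a constant stand-in model the iteration converges in one correction step; in the real method the recomputed pN changes between iterations, and the source notes that a few iterations are typically enough.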

In the case of time-varying input the above steady-state model uses the transient solution of the G/G/1/N station presented in the next section. In order to correct the values of λ_e^(1) and λ_e^(2), the algorithm reflecting the push-out mechanism should be restarted every fixed time interval, chosen sufficiently small with respect to the time scale of changes of the input parameters. In practice, the rate of loss is very small and we may neglect it in the analysis of flow dynamics. Therefore, to simplify the numerical side of the problem, we replace the network of G/G/1/N stations with push-out or another mechanism by the same network of plain G/G/1/N stations in order to predict the propagation of the time-variable flow. Once the time-dependent input parameters for each station in the network are obtained, the stations are studied separately with the use of the G/G/1/N/Push-Out transient model to determine the loss probabilities as a function of time. Numerical results may be found in [6].

11. Diffusion model of RED

This section summarizes the results of [48]. Consider a G/G/1/N queue with time-dependent input. In the diffusion model we cannot distinguish the moments of arrivals. We simply consider the queue size in intervals corresponding to mean interarrival times, Δt = 1/λ. The traffic intensity remains fixed within an interval. When λ changes, the length 1/λ of the interval changes as well. We know the queue distribution f(x, t; x0) at the beginning of an interval. This distribution also gives us the distribution r(x, t) of the response time of this queue. Let us suppose that T is the total delay between the output of the RED queue and the arrival of the flow with new intensity, following the changes introduced by the control mechanism based on RED. At the beginning of an interval i we compute the mean value of the queue, E[Ni] = 1 + Σ_{n=1}^{N} p_i(n) n, and the weighted mean

x̄_i = (1 − w) x̄_{i−1} p_i(0) + [1 − p_i(0)] [(1 − w) x̄_{i−1} + w E[Ni]],

and then we determine on this basis, using the drop function d(x̄) of RED as in Fig. ??, the probability of tagging the packet to announce the congestion. Following it, we determine the changes of traffic intensity:

λ_new = [1 − d(x̄)] (λ_old + δ) + d(x̄) λ_old / a,

but only in the case when the last change was done earlier than a predefined silent period. The increase of the flow is additive with δ and the decrease of the flow is multiplicative with constant 1/a. This new λ will reach the queue after the round trip time E[Ni] · 1/μ + T.
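A compact sketch of one RED control step as described above (weighted mean queue, linear drop function between the two thresholds, additive increase / multiplicative decrease); all parameter values are illustrative, not taken from the original experiments.

```python
# One RED control step: update the moving average, evaluate the drop
# function d(x_avg), and apply the AIMD rate change.

def red_step(x_avg, q_mean, p0, lam, w=0.002, thr_min=25, thr_max=35,
             pmax=0.02, delta=0.05, a=2.0):
    # weighted moving average, with the empty-queue term weighted by p0
    x_avg = (1 - w) * x_avg * p0 + (1 - p0) * ((1 - w) * x_avg + w * q_mean)
    # linear drop function between the thresholds
    if x_avg < thr_min:
        d = 0.0
    elif x_avg < thr_max:
        d = pmax * (x_avg - thr_min) / (thr_max - thr_min)
    else:
        d = 1.0
    lam_new = (1 - d) * (lam + delta) + d * lam / a
    return x_avg, lam_new

x_avg, lam = red_step(x_avg=0.0, q_mean=10.0, p0=0.0, lam=1.0)
```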

Fig. 25: Investigated system with RED mechanism: a source feeding a queue with thresholds thr_min, thr_max before the receiver, with a delayed congestion notification returned to the source (in case of the FECN/BECN mechanism, the queue has only one threshold)
11.1. Diffusion model of FECN/BECN scheme

Consider a slightly different control scheme: the traffic intensity remains fixed within a control interval D. For each Δt = 1/λ (the supposed moments of new arrivals) we compute E[Ni] and, if E[Ni] ≥ threshold, a counter is increased. At the end of the interval the new value of λ is obtained according to the ratio of marked packets, that is the ratio of the counter content to the value D (the supposed number of packets that arrived during the interval D):

λ_new = [1 − p] (λ_old + δ) + p λ_old / a

where

p = { 0  if the marked packet ratio ≤ a predefined value, e.g. 0.5,
      1  otherwise.

This mechanism corresponds to the forward (or backward) explicit congestion notification (FECN/BECN) scheme.
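The FECN/BECN decision can be sketched in the same style; `mark_ratio` stands for the predefined value mentioned above, and the other parameter names are illustrative.

```python
# End-of-interval FECN/BECN rate update: additive increase when few packets
# were marked, multiplicative decrease otherwise.

def fecn_update(marked, D, lam, delta=0.05, a=2.0, mark_ratio=0.5):
    p = 0 if marked / D <= mark_ratio else 1
    return (1 - p) * (lam + delta) + p * lam / a

lam_up = fecn_update(marked=10, D=100, lam=1.0)     # few marks: increase
lam_down = fecn_update(marked=80, D=100, lam=1.0)   # many marks: decrease
```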
11.2. Numerical results

Fig. 25 presents the studied model and Figs. 26–31 display some typical results. We choose the constant service time 1/μ = 1 as a time unit (t.u.). The buffer capacity is N = 40. The values of the other model parameters, i.e. the delay T, the thresholds thr_min, thr_max, as well as the control interval D and the threshold thr for the FECN/BECN scheme, are given in the captions of the figures. Fig. 26 displays, in logarithmic and linear scales, examples of RED queue distributions for two different times: when the queue is relatively lightly loaded (t = 30 t.u.) and when it is overcrowded (t = 60 t.u.). The curves in logarithmic scale show that, in spite of 100 000 repetitions of the experiment, the simulation model still has problems with the determination of small values of probabilities.

The same remark may be made for Fig. 29, presenting loss as a function of time: the simulation cannot give small loss probabilities, while diffusion is able to furnish very small values (which are naturally approximate). Figs. 27, 28 give diffusion and simulation results for the mean queue length as a function of time and for the time-dependent traffic throughput resulting from the RED mechanism. A Poisson input stream (Fig. 27) and a constant (deterministic) stream (Fig. 28) are considered. Fig. 30 presents the long-range performance of the RED mechanism. In the left figure the overall loss ratio, taken in a time interval of length T = 10 000 t.u., is presented as a function of delay for two sets of parameters: (1) w = 0.002, pmax = 0.02 and (2) w = 0.2, pmax = 0.5; silent period = 0. The constant w is the weight with which the current value of the queue is taken into consideration in x̄. For the first set of parameters the loss is high (the mean queue value is nearly 35, see the right figure) and practically does not depend on the delay. For the second set of parameters, the loss is lower and its growth with the value of the delay is distinctly visible. Simulation results are obtained in two ways: they represent either the real loss, i.e. the ratio of lost packets to the whole number of arrived packets, or the overall probability of a full queue, Σ_{t=1}^{T} p(N, t)/T, where t = 1, 2, 3, ... denotes consecutive slots of unitary length. Although the input stream is Poisson, the two results are not the same: as there is a permanent transient state, the PASTA property does not hold. Diffusion approximation naturally represents the second approach. For the first set of parameters, the diffusion curve lies between both simulation results; for the second set of parameters the probabilities of a full queue obtained by diffusion and simulation are very close. In the right figure the mean queue length and the corresponding moving average x̄, which is the argument of the RED function d(x̄), are displayed for the same two sets of parameters (1), (2) as in the left figure. Only simulation results are given. It is visible that the oscillations observed in the previous figures, which are due to initial conditions, especially the ones of the moving average x̄, attenuate with time. Fig. 31 relates to the evaluation of the FECN/BECN mechanism. In the left figure the throughput given by the diffusion model is compared with simulation results; in the right figure we see the evolution of the simulated FECN mean queue for two different thresholds compared to the RED mean queue (one of those displayed in Fig. 30).

12. Time-varying and correlated input traffic

The correlated input is frequently modelled by a Markov modulated Poisson process: a Markov chain of M states defines the parameter of the Poisson input process. If the chain is in state i, the parameter is λ_i. If M = 2 we have an on-off source. Using the diffusion approximation we may say more generally that

Fig. 26: Queue distribution at t = 30 t.u. and t = 60 t.u., model parameters: thr_min = 25, thr_max = 35, buffer capacity N = 40, initial condition N(0) = 0, Poisson input, silent period = 0, delay = 5 t.u.; logarithmic and linear scale, diffusion and simulation results
Fig. 27: Mean queue length and changes of flow, diffusion and simulation results, Poisson input, parameters as in Fig. 26
Fig. 28: Mean queue and changes of flow in the simulation and diffusion models, deterministic input flow, other parameters as in Fig. 26

Fig. 29: Loss as a function of time, parameters as in Fig. 26, diffusion and simulation results; logarithmic and linear scale
Fig. 30: Long-scale performance of RED with Poisson input; left: overall loss ratio as a function of delay for two sets of parameters: (1) w = 0.002, pmax = 0.02 and (2) w = 0.2, pmax = 0.5, silent period = 0, simulation and diffusion results; right: mean queues and moving averages as a function of time for the same sets of parameters (1) and (2), silent period = 5 t.u., delay = 5 t.u.; simulation results
Fig. 31: FECN/BECN performance; left: throughput as a function of time, simulation and diffusion results; right: comparison of mean queues for FECN (D = 5, thr = 30 or thr = 35, delay = 0) and RED (w = 0.2, pmax = 0.5, silent period = delay = 5 t.u.)

the Markov chain determines the phases during which the input, not necessarily Poisson, is characterized by the two first moments of the interarrival times: during phase i they are E[A_i] = 1/λ_i and E[A_i^2]. Hence, if π_i(t) is the probability that the Markov process is in state i at the moment t, the interarrival time distribution has at time t the first two moments

E[A(t)] = Σ_{i=1}^{M} π_i(t) λ_i E[A_i] / Σ_{i=1}^{M} π_i(t) λ_i = Σ_{i=1}^{M} π_i(t) / Σ_{i=1}^{M} π_i(t) λ_i = 1 / Σ_{i=1}^{M} π_i(t) λ_i = 1/λ(t),   (70)

E[A(t)^2] = Σ_{i=1}^{M} π_i(t) λ_i E[A_i^2] / Σ_{i=1}^{M} π_i(t) λ_i.   (71)

If λ_i = 0, i.e. the phase i is an off period, the term E[A_i^2] in Eq. (71) is replaced by the second moment of the phase i duration distribution and λ_i = 0 is replaced by the reciprocal of the mean length of this phase. The solution of the equation system

dπ_i(t)/dt = Σ_{all k} π_k(t) q_{ki},

where q_{ki} are the transition rates from a state k to a state i, gives the transient state probabilities of the Markov chain underlying the input evolution; we use π_i(t) to determine the time-dependent diffusion parameters:

β(t) = λ(t) − μ,   α(t) = λ(t) C_A^2(t) + μ C_B^2,
where λ(t) = 1/E[A(t)],   C_A^2(t) = E[A(t)^2]/E[A(t)]^2 − 1.   (72)

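Eqs. (70)–(72) amount to a few weighted sums; below is a sketch with illustrative phase data (a single Poisson phase, for which C_A² should come out as 1).

```python
# Time-dependent interarrival moments and diffusion parameters from the
# phase probabilities pi[i], phase rates lam[i] and second moments
# ea2[i] = E[A_i^2] of an MMPP-like input.

def diffusion_params_mmpp(pi, lam, ea2, mu, cb2=0.0):
    lam_t = sum(p * l for p, l in zip(pi, lam))              # lambda(t) = 1/E[A(t)]
    ea = 1.0 / lam_t                                         # E[A(t)]
    ea2_t = sum(p * l * m for p, l, m in zip(pi, lam, ea2)) / lam_t
    ca2 = ea2_t / ea ** 2 - 1.0                              # C_A^2(t)
    beta = lam_t - mu
    alpha = lam_t * ca2 + mu * cb2
    return lam_t, ca2, beta, alpha

# one Poisson phase with rate 2: E[A^2] = 2/lam^2 = 0.5, so C_A^2 = 1
lam_t, ca2, beta, alpha = diffusion_params_mmpp([1.0], [2.0], [0.5], mu=1.0)
```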
In the simplest case of a two-state Markov chain (an on-off source), if the rate of leaving state j is r_j, j = 1, 2, the transient probabilities of the state j = 1 are

π_1(t | π_1(0) = 1) = r_2/(r_1 + r_2) + [r_1/(r_1 + r_2)] exp[−(r_1 + r_2) t],
π_1(t | π_1(0) = 0) = [r_2/(r_1 + r_2)] [1 − exp[−(r_1 + r_2) t]],   (73)

where π_i(t | π_i(0) = 1) is the probability of state i at time t if at t = 0 the chain was in this state, and π_2(t | π_2(0) = 1) = 1 − π_1(t | π_1(0) = 0), π_2(t | π_2(0) = 0) = 1 − π_1(t | π_1(0) = 1). The correlation should be regarded within the time scale defined by the function R(t, τ) = E[A(t)A(t + τ)] − E[A(t)]^2, where the product moment expands as

E[A(t)A(t + τ)] = Σ_{i=1}^{M} π_i(t) [ π_i(τ | π_i(0) = 1) E[A_i]^2 + Σ_{j=1, j≠i}^{M} π_j(τ | π_i(0) = 1) E[A_i] E[A_j] ].   (74)

We may also consider a superposition of K independent on-off sources (or sources modulated by various Markov chains). Denote by λ^(k)(t), C_A^(k)2(t) the parameters of the k-th source. The resulting input stream λ(t) is obviously the sum of all λ^(k)(t). Since the number of source k customers arriving within a considered period (which is short compared to the rate of input stream changes but sufficiently long to receive a certain number of customers) is normally distributed with variance λ^(k)(t) C_A^(k)2(t), and the sum of independent normally distributed variables also has a normal distribution with variance equal to the sum of the component variances, we have

C_A^2(t) = Σ_{k=1}^{K} [λ^(k)(t)/λ(t)] C_A^(k)2(t).   (75)

If the steady state is considered, the probabilities π_i(t) become π_i.
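The convergence of the transient probabilities of Eq. (73) to their steady-state values can be checked in a few lines; the rates below are the ones used in the numerical example that follows (r_1 = r_2 = 0.01), and the function name is ours:

```python
import math

def pi1(t, r1, r2, started_in_1):
    """Eq. (73): transient probability of phase 1 for a two-state chain."""
    s = r1 + r2
    if started_in_1:
        return r2 / s + (r1 / s) * math.exp(-s * t)
    return (r2 / s) * (1.0 - math.exp(-s * t))

r1 = r2 = 0.01   # mean sojourn time of 100 time units in each phase
```

Both conditional probabilities tend to r_2/(r_1 + r_2) = 0.5, so after a few on-off cycles the initial phase is forgotten, consistent with the steady-state remark above.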



Fig. 32. Markov chain used to represent correlated input

Numerical examples. The results are taken from [14]. Let us suppose that the time unit corresponds to the time needed to send a cell, e.g. if the channel throughput is 155 Mbps, then the time unit is 2.7 \cdot 10^{-6} s; the multiplexer is seen as a single FIFO queue limited to 100 customers. The service time is constant.

On-off source and multiplexer. The periods of the source are exponentially distributed, and the mean sojourn times in each phase are equal to 100 time units: r_1 = r_2 = 0.01, see Eq. (73). The Poisson traffic has intensity λ_1 = 1.6 cells per time unit when the source is in phase one and intensity λ_2 = 0.4 cells per time unit when the source is in phase two. At t = 0 the source is in phase one. Fig. 33 presents the mean queue length in the multiplexer given by the diffusion model and compared with simulation results. The observed differences

are probably due to the accumulation of numerical errors (as the solution of the model on a certain interval depends on the solution on the preceding interval) and may probably be reduced with the use of smaller intervals on which the diffusion parameters are kept constant. Simulation results are obtained as a mean of 10 000 independent runs. It is visible that an exponential on-off source maintains the traffic correlation practically over a horizon no wider than one on-off cycle.

Multiphase source and multiplexer. Let us consider a well-known model of video traffic presented in [50] and used since then, e.g. in [49], to represent the traffic of video sources statistically multiplexed into a channel. The aggregate bit rate process is modelled by a continuous-time birth-and-death Markov chain of 31 states, Fig. 32; the rate of leaving state j is given by

r_j = 1.03077\,(31 - j) + 2.5923\, j

and the transition probabilities are

p_{j,j+1} = \frac{1.03077\,(31 - j)}{1.03077\,(31 - j) + 2.5923\, j}, \qquad p_{j,j-1} = 1 - p_{j,j+1}, \qquad 2 \le j \le 30;

p_{1,2} = p_{31,30} = 1.

The traffic intensity at state j is λ_j = 2.391(j - 1). In the above formulas, repeated after [49], the time unit is a millisecond. We scale them to maintain the chosen unit of time. Fig. 33 presents the evolution of the multiplexer mean queue as a function of time and of the initial state j_0. The results for constant service time are compared with the ones for exponential service time and they confirm a remark of [49] that the influence of the service time distribution is minor compared with the influence introduced by the variable source characteristics.

13. Jitter

If the service time is constant (we suppose it is equal to 1), the queue distribution f_i(x) for a station i approximates also its response time R_i, [10, 13]. The pdf r_i(y) of this time is

r_i(y, t; x_0) = f_i(y - 1, t; x_0).   (76)

The argument is shifted by 1 to include the service time of the considered customer. We may express the response time of m consecutive stations of the virtual connection as the m-fold convolution of the r_i(y, t; x_0):

r_{1 \ldots m}(y) = r_1(y) * \cdots * r_m(y), \qquad m = 2, \ldots, M.   (77)
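Eq. (77) is straightforward to evaluate numerically once the single-station densities are sampled on a common grid; the exponential shapes below are purely illustrative stand-ins for the r_i obtained from the diffusion model:

```python
import numpy as np
from functools import reduce

dx = 0.1
x = np.arange(0.0, 60.0, dx)
r1 = np.exp(-x);             r1 /= r1.sum() * dx   # illustrative station densities,
r2 = 0.5 * np.exp(-0.5 * x); r2 /= r2.sum() * dx   # normalised to integrate to 1

def path_density(densities):
    """m-fold convolution r_1 * ... * r_m of Eq. (77) on the common grid."""
    return reduce(lambda a, b: np.convolve(a, b) * dx, densities)

r12 = path_density([r1, r2])   # response-time density of two stations in series
```

The discrete convolution preserves total probability and adds the per-station mean response times, as expected for a sum of independent station delays.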


Fig. 33: Mean queue length at the multiplexer: (a) the source has two exponentially distributed phases of the same mean 1/r = 100; λ_1 = 1.6, λ_2 = 0.4 (average λ = 1.0, case 1) or λ_2 = 0 (average λ = 0.8, case 2); constant service time; diffusion and simulation results; (b) the source has 31 phases, initial phase j_0 = 21, 23, 25; constant service time; diffusion and simulation results

Numerical example. There are M = 5 stations in a path. The buffer size at each station is N = 80, the fixed service time is 1/μ = 1. The symbols with upper index (l) refer to local streams which pass through one station together with the considered stream of the virtual connection. Parameters with upper index (v) describe the virtual path traffic which passes all 5 stations in the connection: λ_1^{(v)} = 0.5, C_{A1}^{(v)2} = 1.0, λ_1^{(l)} = 0.3, C_{A1}^{(l)2} = 1.5, λ_2^{(l)} = 0.3, C_{A2}^{(l)2} = 1.0, λ_3^{(l)} = 0.4, C_{A3}^{(l)2} = 0.5, λ_4^{(l)} = 0.45, C_{A4}^{(l)2} = 1.5, λ_5^{(l)} = 0.2, C_{A5}^{(l)2} = 2.0. Results in Figs. 34 and 35 compare the density of the total response time of the connection obtained by the diffusion model and by simulation and present the influence of the input C_{A1}^{(v)2} on the density r(x).

14. Reactive control in ABR service

It is supposed that each connection has a closed-loop feedback control mechanism which allows the network to control the cell emission process at each source. When a defined congestion level is reached, ATM switches notify the traffic sources contributing to the congestion. Upon receiving a congestion notification, the source reduces the rate at which it transmits cells through the considered connection. If the source does not receive congestion messages during a certain time interval, it can start to increase its traffic rate. The model [1] allows us to study the impact of the closed-loop control on the performance of the virtual connections through the network. The propagation time, the delay with which


Fig. 34: Density r(y) of the response time of the path of 5 stations (virtual path), diffusion and simulation results


Fig. 35: The influence of C_{A1}^{(v)2} on the density r(x) of the response time

the control decision is taken, and the inertia of the source contribute to the control delay. Let us suppose that this delay is comparatively small, τ = 100 time units, and that the bursty period is prolonged, so the effects of the control are visible inside the bursty period. The quantity of information which should be sent during the bursty period corresponds to N_B cells. Outside the burst periods, which are started every 1000 time units, the source is silent. The rate of local traffic is constant: λ^{(l)} = 0.5 cells per time unit. We assume that the buffer can contain 150 cells. At the beginning of the bursty period the source starts to emit cells at the maximum rate λ^{(v)} = 1 cell per time unit. When the queue length is greater than the level S_o = 20, a message calling the source to decrease its maximum activity (λ^{(v)} → 0.667 λ^{(v)}) is sent, and after the delay of τ time units the source follows this order. After a time period τ_1 = 140 time units (i.e. 40 time units after the last change of the traffic intensity) the queue is examined. If its magnitude is still over the threshold S_o, a new message to decrease the activity is sent to the source. If the queue length is below S_i = 10, a message allowing the source to increase the throughput of emitted cells (λ^{(v)} → 1.5 λ^{(v)}) is sent. We are not analysing the quality of this protocol but only demonstrating the performance of the model. In Fig. 37 the simulation results are compared with the results of the diffusion model (diffusion1 and diffusion2 as defined on page 289). In the diffusion models the algorithm decisions are taken on the basis of the mean queue length predicted by the model. In the same model we may use the probability p_N(t) as the congestion indicator.
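The decision logic just described can be sketched as a simple discrete-time fluid recursion. This is only a schematic of the control loop (periodic queue checks stand in for the event-driven protocol messages, and the diffusion machinery is omitted); all names and simplifications are ours:

```python
def simulate(T=1000, mu=1.0, lam_l=0.5, lam_v0=1.0, burst_len=300,
             S_o=20.0, S_i=10.0, tau=100, tau1=140, cap=150.0):
    q, lam_v = 0.0, lam_v0
    pending = []                  # (apply_time, factor): control orders in flight
    next_check = tau1             # the queue is examined every tau1 time units
    history = []
    for t in range(T):
        while pending and pending[0][0] <= t:
            lam_v *= pending.pop(0)[1]             # source obeys after delay tau
        if t == next_check:
            if q > S_o:
                pending.append((t + tau, 0.667))   # ask the source to slow down
            elif q < S_i:
                pending.append((t + tau, 1.5))     # allow it to speed up again
            next_check = t + tau1
        lam = lam_l + (lam_v if t < burst_len else 0.0)
        q = min(cap, max(0.0, q + lam - mu))       # fluid balance, finite buffer
        history.append(q)
    return history
```

During the burst the queue grows at rate λ^{(l)} + λ^{(v)} - μ until the first throttling order takes effect; after the burst it drains back to zero.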


Fig. 36: Queueing model with forward (FN) and backward (BN) congestion notification



Fig. 37: Traffic control for a bursty period of 300 cells: mean buffer queue length as a function of time; control decisions visible in changes of throughput; simulation and diffusion results


15. G/G/N/N model and cellular networks

A cellular mobile communication system is usually modelled by a network of multichannel stations, each station corresponding to a cell and the number of channels corresponding to the maximal number of connections served by the cell. Traditional modelling techniques such as the Markov chain approach and simulation suffer here from their usual drawbacks: state number explosion for Markovian models, and, in case of simulation models, the long run times necessary to obtain credible probabilities of rare events (loss of a connection when a user passes from one cell to another, the so-called handover, or refusal of a new connection). The considered model is presented in Fig. 38. A part of a cellular mobile communication system is represented by an open queueing network of seven G/G/N/N stations corresponding to the central cell and its six neighbours in hexagonal topology. The goal is to model the transient behaviour of the system. Input streams λ_{0i} at each station represent streams of new connection demands generated within the cell; the routing probabilities represent the possibility of handovers.


Fig. 38. Considered configuration (r_{ij}: routing probabilities, i.e. handovers; λ_{0i}: input traffic)

Consider the network in Fig. 38. Each station has 20 parallel service channels,


Fig. 39: Station 1 (central): transient distribution of the number of occupied channels, diffusion (a) and simulation (b) results


Fig. 40: Mean number of occupied channels for each cell, (a) diffusion (curves for stations 2 and 4 are superimposed) and (b) simulation results


Fig. 41: Loss probabilities for each station. Diffusion results: losses for stations 2 and 4 are superimposed; losses for stations 3, 5-7 are obtained but they are below 10^{-8}. Simulation results (mean of 1 million runs) for the most charged stations 1, 2, 4; zero losses observed for the other stations

N = 20. The input streams are Poisson: λ_{01} = 2, λ_{02} = 3, λ_{04} = 2, λ_{03} = λ_{05} = λ_{06} = λ_{07} = 1. Service times at each channel and each station are exponentially distributed, μ = 1. The routing probabilities are:

R=

0 0.4 0.025 0.4 0.025 0.025 0.025

0.4 0 0.05 0 0 0 0.05

0.025 0.05 0 0.05 0 0 0

0.4 0 0.05 0 0.05 0 0

0.025 0 0 0.05 0 0.05 0

0.025 0 0 0 0.05 0 0.05

0.025 0.05 0 0 0 0.025 0
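Neglecting blocking, the total stream offered to each cell solves the usual open-network traffic equations λ_i = λ_{0i} + Σ_j λ_j r_{ji} (cf. the approach of [29]). A sketch using the matrix above; the external-rate vector is our reading of the text and should be treated as an assumption:

```python
import numpy as np

# Routing matrix R copied from the text (row = origin cell, column = destination)
R = np.array([
    [0.0,   0.4,  0.025, 0.4,  0.025, 0.025, 0.025],
    [0.4,   0.0,  0.05,  0.0,  0.0,   0.0,   0.05 ],
    [0.025, 0.05, 0.0,   0.05, 0.0,   0.0,   0.0  ],
    [0.4,   0.0,  0.05,  0.0,  0.05,  0.0,   0.0  ],
    [0.025, 0.0,  0.0,   0.05, 0.0,   0.05,  0.0  ],
    [0.025, 0.0,  0.0,   0.0,  0.05,  0.0,   0.05 ],
    [0.025, 0.05, 0.0,   0.0,  0.0,   0.025, 0.0  ],
])
lam0 = np.array([2.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0])   # assumed external rates

# lambda = lambda0 + R^T lambda  =>  (I - R^T) lambda = lambda0
lam = np.linalg.solve(np.eye(7) - R.T, lam0)
```

The solution makes the central cell (station 1) the most loaded one, in line with the highway scenario described next.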

Suppose that stations 2, 1, 4 correspond to cells through which a highway is passing and the traffic there is much more intense than in other cells. They have stronger input streams (new connections); also the routing probabilities (probabilities of handovers) between stations 2 and 1, as well as between 1 and 4, are considerably greater than the others. At the beginning (t = 0) all queues are empty. Fig. 39 compares the distributions of occupied channels in cell 1 (central station), given by the diffusion model and simulation (mean of 10 million trajectories), Fig. 40 presents mean queue lengths, and Fig. 41 displays loss probabilities, [20]. For more complex networks of G/G/N/N stations with larger N, e.g. N = 100, we observed accidental loss of numerical stability in the numerical inversion of Laplace transforms; also the computation time became prohibitive. Therefore, we tried the direct numerical integration of the model differential equations. In the direct integration approach a method of lines [58] has been adapted to fit the case of the diffusion equation. The basis of this method, which is sometimes called the generalized method of Kantorovich, is the substitution of finite differences for the derivatives with respect to one independent variable, and the retention of the derivatives with respect to the remaining variables. This approach changes a given partial differential equation into a system of differential equations with one fewer independent variable. The stability of the integration scheme was evaluated using the Von Neumann test and the presented method is not stable if Δt > (Δx)^2/4. However, the estimation of the stability region is complex since the parameters of the equation are state-dependent. In a network model the input stream parameters for each station are obtained following [29], using a system of equations matching the input and output stream parameters for each station within the network. Consider the network in Fig. 42, [21]. There are 37 cells, hence there are 37 G/G/N/N stations in the model.
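Before turning to this larger topology, the method-of-lines substitution just described can be illustrated on the one-dimensional diffusion equation df/dt = (α/2) d²f/dx² - β df/dx with absorbing ends. This is a minimal sketch with illustrative constant parameters (the actual model has state-dependent parameters and instantaneous-return boundary conditions):

```python
import numpy as np

def step(f, alpha, beta, dx, dt):
    d2 = (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2   # second difference in x
    d1 = (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)        # first difference in x
    g = f + dt * (0.5 * alpha * d2 - beta * d1)               # explicit Euler in t
    g[0] = g[-1] = 0.0                                        # absorbing barriers
    return g

dx = 0.1
x = np.arange(0.0, 20.0 + dx, dx)
alpha, beta = 1.0, -0.2
dt = dx**2 / 4.0                     # inside the stability bound quoted above
f = np.exp(-(x - 10.0)**2 / 0.5)
f /= f.sum() * dx                    # initial density centred at x = 10
for _ in range(200):                 # integrate up to t = 0.5
    f = step(f, alpha, beta, dx, dt)
```

After t = 0.5 the probability mass is still (numerically) conserved away from the barriers, and the centre of the density has drifted to about x = 10 + βt = 9.9.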
Each station has 20 parallel service channels, N = 20. Roads pass through certain cells, as marked in the figure. Routing


Fig. 42. Considered cell topology

probabilities between neighbouring stations (probabilities of handovers) are r_{ij} = 0.05 if there is no road passing between these cells, r_{ij} = 0.4 if there is a single road between i and j, e.g. r_{30,24} = r_{24,30} = 0.4, r_{ij} = 0.26 if there are three roads leaving cell i, e.g. r_{20,13} = r_{13,20} = 0.26, and r_{ij} = 0.2 if there are four roads leaving cell i, e.g. r_{18,11} = r_{11,18} = 0.2. At the beginning (t = 0) all queues are empty. The input streams are Poisson. Initially, λ_{0i} = 0.5 if there is no highway in cell i, λ_{0i} = 1.5 if there is one road in station i, and λ_{0i} = 3 if there is a crossroad (station 18) or a bifurcation (cell 20). At t = 10 the traffic on the roads passing by cells 21 and 28 starts growing (λ_{0,21} = λ_{0,28} = 3.5). At t = 11 also λ_{0,20} = λ_{0,13} = 3.5, and this wave of increased traffic advances with the speed of one cell per time unit. After 20 time units the traffic comes back to its former level: at t = 30, λ_{0,21} = λ_{0,28} = 1.5, at t = 31 also λ_{0,20} = λ_{0,13} = 1.5, etc. Service is exponential with μ = 1 for each channel. The model was solved using direct integration. Two types of state-dependence of the diffusion parameters were considered: the stepwise function of Eq. (36) and linear continuous changes. Both versions gave similar results. The comparison with simulation indicated some superiority of the continuous parameter model, and some samples of results obtained with this approach are presented in Figs. 43-46. Figures


Fig. 43: Transient mean values of the number of occupied channels in cells no. 20-29, diffusion results

Fig. 44: Transient mean values of the number of occupied channels in cells no. 20-29, simulation results


Fig. 45. Loss probabilities for stations 20-29, diffusion results



Fig. 46: Loss probabilities for stations 20-29, simulation results (mean of 1 million runs)

43 and 44 compare the mean value of occupied channels at certain cells as a function of time, obtained by the diffusion model and by simulation (mean of 10 million trajectories). Figs. 45 and 46 display loss probabilities (probabilities of a full system) given by the diffusion approximation and simulation.

16. Current research

We think that the most important problem of the presented approach is now to master the transient model of a general multiclass network. There are still difficulties in the proper description of the output stream characteristics for each class: λ_{i,out}(t), C_{Di}(t). Burke's formula, Eq. (41), holds only for the steady state, and Eqs. (48), (50) cannot be generalised to transient analysis. The solution presented in Section 8 holds only for constant service times. Yet, without deciding the question of the time-dependent output per class it is not possible to determine the routing in a network.

References

[1] T. Atmaca, T. Czachórski, F. Pekergin, A Diffusion Model of the Dynamic Effects of Closed-Loop Feedback Control Mechanisms in ATM Networks, 3rd IFIP Workshop on Performance Modelling and Evaluation of ATM Networks, Ilkley, UK, 4-7th July 1995; extended version in Archiwum Informatyki Teoretycznej i Stosowanej, z. 1, pp. 41-56, 1999.

[2] P. J. Burke, The Output of a Queueing System, Operations Research, vol. 4, no. 6, pp. 699-704, 1956.

[3] D. R. Cox, H. D. Miller, The Theory of Stochastic Processes, Chapman and Hall, London 1965.

[4] T. Czachórski, A Multiqueue Approximate Computer Performance Model with Priority Scheduling and System Overhead, Podstawy Sterowania, t. 10, z. 3, pp. 223-240, 1980.

[5] T. Czachórski, A diffusion process with instantaneous jumps back and some of its applications, Archiwum Informatyki Teoretycznej i Stosowanej, tom 20, z. 1-2, pp. 27-46.

[6] T. Czachórski, J. M. Fourneau, F. Pekergin, Diffusion Model of the Push-Out Buffer Management Policy, IEEE INFOCOM '92, The Conference on Computer Communications, Florence, 1992.

[7] T. Czachórski, A method to solve diffusion equation with instantaneous return processes acting as boundary conditions, Bulletin of Polish Academy of Sciences, Technical Sciences 41 (1993), no. 4.

[8] T. Czachórski, J. M. Fourneau, F. Pekergin, Diffusion model of an ATM Network Node, Bulletin of Polish Academy of Sciences, Technical Sciences 41 (1993), no. 4.

[9] T. Czachórski, J. M. Fourneau, F. Pekergin, Diffusion Models to Study Nonstationary Traffic and Cell Loss in ATM Networks, ACM 2nd Workshop on ATM Networks, Bradford, July 1994.

[10] T. Czachórski, J. M. Fourneau, L. Kloul, Diffusion Approximation to Study the Flow Synchronization in ATM Networks, ACM 3rd Workshop on ATM Networks, Bradford, July 1995.

[11] T. Czachórski, J. M. Fourneau, F. Pekergin, The dynamics of cell flow and cell losses in ATM networks, Bulletin of Polish Academy of Sciences, Technical Sciences vol. 43, no. 4, 1995.

[12] T. Czachórski, F. Pekergin, Diffusion Models of Leaky Bucket and Partial Buffer Sharing Policy: A Transient Analysis, 4th IFIP Workshop on Performance Modelling and Evaluation of ATM Networks, Ilkley, 1996; also in: D. Kouvatsos (Editor), ATM Networks, Performance Modelling and Analysis, Chapman and Hall, London 1997.

[13] T. Czachórski, T. Atmaca, J.-M. Fourneau, L. Kloul, F. Pekergin, Switch queues diffusion models (in Polish), Zeszyty Naukowe Politechniki Śląskiej, seria Informatyka, z. 32, 1997.

[14] T. Czachórski, F. Pekergin, Transient diffusion analysis of cell losses and ATM multiplexer behaviour under correlated traffic, 5th IFIP Workshop on Performance Modelling and Evaluation of ATM Networks, Ilkley, UK, 21-23 July 1997.

[15] T. Czachórski, M. Pastuszka, F. Pekergin, A tool to model network transient states with the use of diffusion approximation, Performance Tools '98, Palma de Mallorca, Spain, September 1998.

[16] T. Czachórski, S. Jędruś, M. Pastuszka, F. Pekergin, Diffusion approximation and its numerical problems in implementation of computer network models, Archiwum Informatyki Teoretycznej i Stosowanej, z. 1, pp. 41-56, 1999.

[17] T. Czachórski, J.-M. Fourneau, L. Kloul, Diffusion Method Applied to a Handoff Queueing Scheme, Archiwum Informatyki Teoretycznej i Stosowanej, z. 1, pp. 41-56, 1999.

[18] T. Czachórski, F. Pekergin, Probabilistic Routing for Time-dependent Traffic: Analysis with the Diffusion and Fluid Approximations, IFIP ATM Workshop, Antwerp, June 1999.

[19] T. Czachórski, F. Pekergin, Modelling the time-dependent flows of virtual connections in ATM networks, Bulletin of PAS, in print.

[20] T. Czachórski, J.-M. Fourneau, S. Jędruś, F. Pekergin, Transient State Analysis in Cellular Networks: the use of Diffusion Approximation, QNETS 2000, Ilkley 2000.

[21] T. Czachórski, J.-M. Fourneau, S. Jędruś, Cellular Networks: Transient State Models, Proc. of First Polish-German Teletraffic Symposium PGTS 2000, 24-26 September 2000, Dresden.

[22] A. Duda, Diffusion Approximations for Time-Dependent Queueing Systems, IEEE J. on Selected Areas in Communications, vol. SAC-4, no. 6, September 1986.

[23] W. Feller, The parabolic differential equations and the associated semi-groups of transformations, Annals of Mathematics, vol. 55, pp. 468-519, 1952.

[24] W. Feller, Diffusion processes in one dimension, Transactions of the American Mathematical Society, vol. 77, pp. 1-31, 1954.

[25] J. Filipiak, A. R. Pach, Selection of Coefficients for a Diffusion-Equation Model of Multi-Server Queue, PERFORMANCE '84, Proc. of the 10th International Symposium on Computer Performance, North Holland, Amsterdam 1984.

[26] D. P. Gaver, Observing stochastic processes, and approximate transform inversion, Operations Research, vol. 14, no. 3, pp. 444-459, 1966.

[27] D. P. Gaver, Diffusion Approximations and Models for Certain Congestion Problems, Journal of Applied Probability, vol. 5, pp. 607-623, 1968.

[28] E. Gelenbe, On Approximate Computer Systems Models, J. ACM, vol. 22, no. 2, 1975.

[29] E. Gelenbe, G. Pujolle, The Behaviour of a Single Queue in a General Queueing Network, Acta Informatica, vol. 7, fasc. 2, pp. 123-136, 1976.

[30] E. Gelenbe, A non-Markovian diffusion model and its application to the approximation of queueing system behaviour, IRIA Rapport de Recherche no. 158, 1976.

[31] E. Gelenbe, Probabilistic models of computer systems. Part II, Acta Informatica, vol. 12, pp. 285-303, 1979.

[32] E. Gelenbe, J. Labetoulle, R. Marie, M. Metivier, G. Pujolle, W. Stewart, Réseaux de files d'attente: modélisation et traitement numérique, Editions Hommes et Techniques, Paris 1980.

[33] E. Gelenbe, X. Mang, Y. Feng, A diffusion cell loss estimate for ATM with multiclass bursty traffic, in: D. D. Kouvatsos (Editor), Performance Modelling and Evaluation of ATM Networks, Vol. 2, Chapman and Hall, London 1996.

[34] E. Gelenbe, X. Mang, R. Onvural, Diffusion based statistical call admission control in ATM, Performance Evaluation, vol. 27-28, pp. 411-436, 1996.

[35] B. Halachmi, W. R. Franta, A Diffusion Approximation to the Multi-Server Queue, Management Science, vol. 24, no. 5, pp. 522-529, 1978.

[36] H. Heffes, D. M. Lucantoni, A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance, IEEE J. SAC, SAC-4(6), pp. 856-867, 1986.

[37] D. P. Heyman, An Approximation for the Busy Period of the M/G/1 Queue Using a Diffusion Model, J. of Applied Probability, vol. 11, pp. 159-169, 1974.

[38] D. Iglehart, W. Whitt, Multiple Channel Queues in Heavy Traffic, Part I-III, Advances in Applied Probability, vol. 2, pp. 150-177, 355-369, 1970.

[39] D. Iglehart, Weak Convergence in Queueing Theory, Advances in Applied Probability, vol. 5, pp. 570-594, 1973.

[40] B. Jouaber, T. Atmaca, M. Pastuszka, T. Czachórski, Modelling the Sliding Window Mechanism, The IEEE International Conference on Communications, ICC'98, pp. 1749-1753, Atlanta, Georgia, USA, 7-11 June 1998.

[41] B. Jouaber, T. Atmaca, M. Pastuszka, T. Czachórski, A multi-barrier diffusion model to study performances of a packet-to-cell interface, art. S48.5, Session: Special applications in ATM Network Management, International Conference on Telecommunications ICT'98, Porto Carras, Greece, 22-25 June 1998.

[42] T. Kimura, Diffusion Approximation for an M/G/m Queue, Operations Research, vol. 31, no. 2, pp. 304-321, 1983.

[43] L. Kleinrock, Queueing Systems, vol. I: Theory, vol. II: Computer Applications, Wiley, New York 1975, 1976.

[44] H. Kobayashi, Application of the diffusion approximation to queueing networks, Part 1: Equilibrium queue distributions, J. ACM, vol. 21, no. 2, pp. 316-328; Part 2: Nonequilibrium distributions and applications to queueing modeling, J. ACM, vol. 21, no. 3, pp. 459-469, 1974.

[45] H. Kobayashi, Modeling and Analysis: An Introduction to System Performance Evaluation Methodology, Addison-Wesley, Reading, Mass. 1978.

[46] H. Kobayashi, Q. Ren, A Diffusion Approximation Analysis of an ATM Statistical Multiplexer with Multiple Types of Traffic, Part I: Equilibrium State Solutions, Proc. of IEEE International Conf. on Communications, ICC '93, pp. 1047-1053, May 23-26, 1993, Geneva, Switzerland.

[47] L. A. Kulkarni, Transient behaviour of queueing systems with correlated traffic, Performance Evaluation, vol. 27-28, pp. 117-146, 1996.

[48] R. Laalaoua, T. Czachórski, S. Jędruś, T. Atmaca, Diffusion Model of RED Control Mechanism, International Conference on Networking ICN'01, July 9-13, 2001, CREF, Université de Haute Alsace, Colmar, France.

[49] D.-S. Lee, S.-Q. Li, Transient analysis of multi-server queues with Markov-modulated Poisson arrivals and overload control, Performance Evaluation, vol. 16, pp. 49-66, 1992.

[50] B. Maglaris, D. Anastassiou, P. Sen, G. Karlsson, J. Robbins, Performance models of statistical multiplexing in packet video communications, IEEE Trans. on Communications, vol. 36, no. 7, pp. 834-844, 1988.

[51] G. F. Newell, Queues with time-dependent rates, Part I: The transition through saturation; Part II: The maximum queue and return to equilibrium; Part III: A mild rush hour, J. Appl. Prob., vol. 5, pp. 436-451, 579-590, 591-606, 1968.

[52] G. F. Newell, Applications of Queueing Theory, Chapman and Hall, London 1971.

[53] M. Pastuszka, Modelling transient states in computer networks with the use of diffusion approximation (in Polish), Ph.D. Thesis, Silesian Technical University (Politechnika Śląska), Gliwice 1999.

[54] M. Reiser, H. Kobayashi, Accuracy of the Diffusion Approximation for Some Queueing Systems, IBM J. of Res. Develop., vol. 18, pp. 110-124, 1974.

[55] S. Sharma, D. Tipper, Approximate models for the Study of Nonstationary Queues and Their Applications to Communication Networks, Proc. of IEEE International Conf. on Communications, ICC '93, pp. 352-358, May 23-26, 1993, Geneva, Switzerland.

[56] H. Stehfest, Algorithm 368: Numeric inversion of Laplace transform, Comm. of ACM, vol. 13, no. 1, pp. 47-49, 1970.

[57] F. Veillon, Algorithm 486: Numerical Inversion of Laplace Transform, Comm. of ACM, vol. 17, no. 10, pp. 587-589, 1974; also: F. Veillon, Quelques méthodes nouvelles pour le calcul numérique de la transformée inverse de Laplace, Thèse, Univ. de Grenoble, 1972.

[58] D. Zwillinger, Handbook of Differential Equations, Academic Press, Boston 1989, pp. 623-627.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

On Classical and Chaotic prediction of broadband network traffic


R. G. Garroppo, S. Giordano, F. Laschi, M. Pagano, G. Procissi
Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Via Caruso, I-56122 Pisa

Abstract: The paper presents the analysis and performance evaluation of classical and chaotic techniques for network traffic prediction. The rationale of this work is to develop analytic techniques to be used in the context of resource allocation strategies. In this sense, the research has focused on the investigation of three specific metrics: the accuracy of predictors in capturing the actual behavior of traffic; the computational complexity for a realistic integration of such predictors into an experimental testbed; and the adaptivity with respect to traffic pattern variations. We compare three prediction algorithms which belong to the classes of statistical, geometric and chaotic predictors. We show that the classical Normalized Linear Mean Square predictor achieves a satisfactory trade-off among the above-mentioned metrics, as it presents a medium level of complexity while achieving high performance in terms of prediction accuracy and adaptivity to network traffic changes.

1. Introduction

Accurate information on the future behavior of random processes is very relevant in many networking contexts (such as traffic control, data compression, resource allocation). Papers like [1, 2], for instance, present the application of predictive techniques to broadband satellite networks (Predictive Demand Assignment Multiple Access Protocol, PRDAMA) and in Active Queue Management (AQM) schemes, respectively. This paper addresses the topic of forecasting, or prediction, by presenting the application of some of the most common techniques based on classical and chaotic theories to network traffic. The general problem of prediction can be stated as follows: given a set of observations of a stochastic process x(n), estimate the value x(n + k) that the process x will assume k steps ahead. Such a problem is known as optimal k-step prediction, according to some optimality criterion. The predictor is called a k-step optimal predictor. As an example, if the optimality criterion is to minimize the mean square error, the predictor will be denoted as the MMSE (Minimum Mean Square Error) predictor. Naturally, for the predictor to be practically useful, the optimality criteria should obey a trade-off between estimation accuracy and a reasonable complexity. Many approaches have been proposed in the literature to address the prediction problem. In particular, the self-similar and chaotic nature of network traffic (such as FTP, Web, E-mail and so on) has led to the application of chaotic predictors [3]. In this paper, we will present a selection of classical and chaotic prediction algorithms together with their application to network traffic forecasting. In the framework of classical methods, in Section II we will focus on prediction techniques that do not require a prior detailed modeling of the data to be forecasted. Chaotic predictors are presented in Section III in order to show a different philosophy to the prediction problem that, somewhat, takes into account the self-similarity of network traffic. Section IV will be devoted to the comparison of the performance of some of the presented techniques, in particular as far as their accuracy, computational complexity and adaptivity to traffic changes are concerned. Finally, Section V concludes the paper.

2. Classical Predictors

This section presents some of the most widely adopted classical predictors in the context of network traffic forecasting. In particular, we focus on two main classes, namely statistical predictors and geometric predictors. Neither of them, though, requires a specific model of the underlying time series to be predicted. Statistical predictors only require the underlying stochastic sequence to be wide-sense stationary, while geometric predictors are derived by simple algebraic reasoning.
2.1. LMMSE Predictor

One of the most widely used predictors is the so-called Linear Minimum Mean Square Error (LMMSE) predictor [4, 5]. Such a predictor is optimal in the sense that, within the class of linear filters, it is the one that minimizes the mean square error of prediction. Let us consider the stochastic sequence x(n) (representing, as an example, the volume of traffic measured over the n-th time unit) and suppose we are interested in the estimation of the value x(n + k), given a set of p (possibly infinite) observations x(n), x(n - 1), ..., x(n - p + 1). The problem is to determine the impulse response h(n) of the linear filter h such that:

\hat{x}(n + k) = x(n) * h(n) = \sum_{i=0}^{p-1} h(i) x(n - i)    (1)

According to the terminology of FIR filters, the predictor is called a p-order LMMSE predictor. The prediction error at the n-th step is defined as:

\epsilon(n) = x(n + k) - \hat{x}(n + k) = x(n + k) - \sum_{i=0}^{p-1} h(i) x(n - i)

The values h(n) are derived by minimizing the mean square error. According to the principle of orthogonality, the mean square error is minimum as long as the prediction error \epsilon(n) has zero mean and is orthogonal to each of the collected data. In other words:

E[\epsilon(n)] = 0    (2)

E[\epsilon(n) x(n - m)] = 0,   m = 0, 1, \ldots, p - 1    (3)

Notice that the last equation can be rewritten as:

E\left[ \left( x(n + k) - \sum_{i=0}^{p-1} h(i) x(n - i) \right) x(n - m) \right] = 0    (4)

that is,

E[x(n + k) x(n - m)] = \sum_{i=0}^{p-1} h(i) \, E[x(n - i) x(n - m)]

and finally, denoting by R_x(m) = E[x(n) x(n + m)] the autocorrelation function of the sequence x(n), the coefficients h(n) can be derived by solving the linear system (Wiener-Hopf equations):

\mathbf{R}_x \mathbf{h} = \mathbf{p}(k)    (5)

which, with obvious meaning of the symbols, can be explicitly written as

\begin{bmatrix} R_x(0) & R_x(1) & \cdots & R_x(p-1) \\ R_x(1) & R_x(0) & \cdots & R_x(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_x(p-1) & R_x(p-2) & \cdots & R_x(0) \end{bmatrix} \begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(p-1) \end{bmatrix} = \begin{bmatrix} R_x(k) \\ R_x(k+1) \\ \vdots \\ R_x(k+p-1) \end{bmatrix}

The LMMSE predictor is then obtained by inverting the previous relation:

\begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(p-1) \end{bmatrix} = \begin{bmatrix} R_x(0) & R_x(1) & \cdots & R_x(p-1) \\ R_x(1) & R_x(0) & \cdots & R_x(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_x(p-1) & R_x(p-2) & \cdots & R_x(0) \end{bmatrix}^{-1} \begin{bmatrix} R_x(k) \\ R_x(k+1) \\ \vdots \\ R_x(k+p-1) \end{bmatrix}

The impulse response of the 1-step LMMSE predictor (k = 1) is then:

\begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(p-1) \end{bmatrix} = \begin{bmatrix} R_x(0) & R_x(1) & \cdots & R_x(p-1) \\ R_x(1) & R_x(0) & \cdots & R_x(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_x(p-1) & R_x(p-2) & \cdots & R_x(0) \end{bmatrix}^{-1} \begin{bmatrix} R_x(1) \\ R_x(2) \\ \vdots \\ R_x(p) \end{bmatrix}
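As a concrete illustration, the Wiener-Hopf system above can be solved numerically once the autocorrelation function is estimated from data. The following sketch (our own illustration, not the authors' implementation; function names are ours) estimates R_x from a sample path and solves for the p-order, k-step LMMSE coefficients:

```python
import numpy as np

def lmmse_coefficients(x, p, k=1):
    """Solve the Wiener-Hopf equations for a p-order, k-step LMMSE
    predictor, estimating the autocorrelation R_x from the data."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # biased sample autocorrelation R_x(m) for m = 0 .. p+k-1
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(p + k)])
    # p x p Toeplitz autocorrelation matrix
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    rhs = r[k:k + p]                       # [R_x(k), ..., R_x(k+p-1)]
    return np.linalg.solve(R, rhs)

def lmmse_predict(x, h):
    """One prediction from the last p observations."""
    p = len(h)
    recent = np.asarray(x[-p:], dtype=float)[::-1]  # x(n), x(n-1), ...
    return float(np.dot(h, recent))

# usage on a synthetic AR(1) sequence x(n) = 0.9 x(n-1) + w(n), for which
# the optimal 1-step predictor has h(0) = 0.9 and h(i) = 0 otherwise
rng = np.random.default_rng(0)
x = [0.0]
for _ in range(20000):
    x.append(0.9 * x[-1] + rng.normal())
h = lmmse_coefficients(x, p=4, k=1)
xhat = lmmse_predict(x, h)
```

For larger p, the Toeplitz structure of the matrix allows faster dedicated solvers (Levinson recursion) instead of a generic linear solve.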

As should be clear from its derivation, the implementation of such a filter first requires the wide-sense stationarity of the sequence x(n). Moreover, it requires the knowledge of at least p values of the autocorrelation function of the stochastic process x, as well as the inversion of a p x p matrix. To sort out these last issues, an adaptive version of the LMMSE predictor (the so-called LMS predictor) has been proposed; it is elaborated upon in the next paragraph.
2.2. LMS Predictor

The LMS (Least Mean Square) algorithm is an adaptive approach which does not require prior knowledge of the autocorrelation structure of the stochastic sequence [4, 5]. Thus, it can be used as an on-line technique for predicting bandwidth. The algorithm scheme is shown in Fig. 1. The filter coefficients h(n) are time varying and are tuned on the basis of the feedback information carried by the error \epsilon(n). The values of h(n) adapt dynamically in order to decrease the mean square error. Notice that \epsilon(n), x(n) and h(n) are defined as in the previous section. Indeed, the LMS algorithm is the adaptive version of the optimal LMMSE predictor. The LMS algorithm operates as follows:

1. initialize the coefficients h_0;
2. for each new datum, update h(n) according to the recursive equation

h(n + 1) = h(n) + \mu \, \epsilon(n) x(n)    (6)

where \mu is a constant called the step size.

Fig. 1. LMS algorithm scheme

If x(n) is stationary, h(n) converges in the mean to the optimal solution given by equation (5) [4, 5]. In [5] it has been shown that the LMS algorithm converges in the mean if

0 < \mu < \frac{2}{\lambda_{max}}

where \lambda_{max} is the maximum eigenvalue of R_x. The speed of convergence of the algorithm depends on the value of \mu. Large values of \mu result in faster convergence of the algorithm and in a quicker response to signal changes, while small values of \mu result in slower convergence but smaller fluctuations once convergence is attained. The selection of \mu should thus be made as a trade-off between the above phenomena. Furthermore, notice that at time n the value x(n + k) is not known, and hence neither is \epsilon(n). Thus, the value \epsilon(n - k) is used instead.
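A minimal LMS implementation makes the role of the recursion concrete. The code below is our own illustrative sketch, not the authors' implementation: it runs update (6) for one-step prediction, applying the coefficient update only once the error becomes available one step later, as discussed above:

```python
import numpy as np

def lms_predict(x, p=4, mu=0.05):
    """One-step LMS prediction over the whole sequence; returns the
    predictions and the final filter coefficients."""
    x = np.asarray(x, dtype=float)
    h = np.zeros(p)                       # initialise the coefficients
    xhat = np.zeros(len(x))
    for n in range(p - 1, len(x) - 1):
        xv = x[n - p + 1:n + 1][::-1]     # x(n), x(n-1), ..., x(n-p+1)
        xhat[n + 1] = h @ xv              # prediction of x(n+1)
        # eps(n) only becomes known once x(n+1) is observed, so the
        # coefficient update effectively lags by one step
        eps = x[n + 1] - xhat[n + 1]
        h += mu * eps * xv                # recursion (6)
    return xhat, h

# usage: a sinusoid is exactly linearly predictable, so the late-time
# prediction error should become small once the filter has converged
x = np.sin(0.05 * np.arange(4000))
xh, h = lms_predict(x, p=4, mu=0.05)
err = np.mean((x[3000:] - xh[3000:]) ** 2)
```

Increasing mu in this sketch speeds up the initial convergence at the cost of larger residual fluctuations, matching the trade-off described above.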
2.3. Normalized LMS (NLMS) Predictor

The normalized LMS algorithm is a modification of the LMS algorithm in which the update equation of the filter coefficients is:

h(n + 1) = h(n) + \mu \, \frac{\epsilon(n) x(n)}{\| x(n) \|^2}    (7)

where \| x(n) \|^2 = x(n) x^t(n). The main advantage of NLMS is that it is less sensitive to the step size \mu. According to [5], NLMS converges in the mean as long as 0 < \mu < 2. Again, at time n the value x(n + k), and thus \epsilon(n), is not known; in the update equation the value \epsilon(n - k) is used instead. Therefore, the one-step NLMS predictor update equation becomes:

h(n + 1) = h(n) + \mu \, \frac{\epsilon(n - 1) x(n - 1)}{\| x(n - 1) \|^2}    (8)

The computational complexity of a p-order NLMS algorithm is that of 2p + 1 multiplications and p + 1 additions.
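The normalized update (8) differs from plain LMS only in the division by the input energy. A sketch follows, again our own illustration rather than the authors' code; the small regularization constant guarding against division by zero is our addition and is not discussed in the paper:

```python
import numpy as np

def nlms_predict(x, p=20, mu=0.5, eps_reg=1e-12):
    """One-step NLMS prediction; converges in the mean for 0 < mu < 2."""
    x = np.asarray(x, dtype=float)
    h = np.zeros(p)
    xhat = np.zeros(len(x))
    for n in range(p - 1, len(x) - 1):
        xv = x[n - p + 1:n + 1][::-1]     # x(n), x(n-1), ..., x(n-p+1)
        xhat[n + 1] = h @ xv
        err = x[n + 1] - xhat[n + 1]      # error, available one step late
        h += mu * err * xv / (xv @ xv + eps_reg)   # normalized update (8)
    return xhat

# usage: a slightly noisy sinusoid, order 20 as in the comparison below
x = np.sin(0.05 * np.arange(4000)) \
    + 0.01 * np.random.default_rng(1).normal(size=4000)
xh = nlms_predict(x, p=20, mu=0.5)
late_mse = np.mean((x[3000:] - xh[3000:]) ** 2)
```

Because the step is normalized by the instantaneous input energy, the same mu works across inputs of very different power, which is what makes NLMS attractive for on-line traffic prediction.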
2.4. Geometric Predictors

This section deals with prediction algorithms based on the approximation of the behavior of the sequence x(n) through elementary functions. These techniques do not involve any stochastic analysis of the time series; they rely on purely geometric considerations. The basic idea is to determine the (p - 1)-order polynomial interpolating the set of p available observations x(n), x(n - 1), ..., x(n - p + 1). Such a polynomial is of the form:

P_{p-1}(z) = a_{p-1} z^{p-1} + a_{p-2} z^{p-2} + \cdots + a_1 z + a_0    (9)

and the coefficients a_n are determined by the condition P_{p-1}(k) = x(k), n - p + 1 \leq k \leq n, that is, by solving the linear system (with Vandermonde matrix)

\begin{bmatrix} 1 & n & \cdots & n^{p-2} & n^{p-1} \\ 1 & n-1 & \cdots & (n-1)^{p-2} & (n-1)^{p-1} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & n-p+1 & \cdots & (n-p+1)^{p-2} & (n-p+1)^{p-1} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{p-1} \end{bmatrix} = \begin{bmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n-p+1) \end{bmatrix}

A simple example of this predictor class is the Linear Predictor, obtained by assuming p = 2. From the above relation, this assumption implies that the prediction algorithm considers the observations x(n) and x(n - 1). In particular, the predicted value \hat{x}(n + 1) is obtained by considering the slope of the straight line passing through the two points x(n) and x(n - 1), i.e.

\hat{x}(n + 1) = 2 x(n) - x(n - 1)

3. Chaotic Predictors

This section presents the application of chaos theory to time series prediction. For the sake of conciseness, details on the theory of chaos are omitted here; we refer the interested reader to [6] for an excellent overview of the topic. Hence, in this section we only recall the description of a particular type of chaotic predictor,

the so-called Radial Basis Function Predictor (RBFP), which has been studied and deeply analyzed in [3]. To make predictions about chaotic time series, the first step is to consider the time series as obtained by sampling a continuous-time scalar variable x(t) representing the dynamics of a system whose underlying dynamics is that of a strange attractor lying on a D-dimensional invariant manifold. Assume then a sequence of observations x_n = x(n\tau); the sequence is referred to as a chaotic time series and the sampling period \tau is called the delay time. The second step is to embed the sequence in an m-dimensional state space; the value m is called the embedding dimension. If the strange attractor lies on a D-dimensional invariant manifold then, necessarily, m \geq D. On the other hand, Takens' theorem assures that m = 2D + 1 is sufficient [7]. The state vectors are of the form:

x_n = \begin{bmatrix} x(n\tau) \\ x(n\tau - \tau) \\ \vdots \\ x(n\tau - (m-1)\tau) \end{bmatrix}    (10)

The third step is to select a proper interpolation function f_N such that x_{n+1} = f_N(x_n). Notice that, with our choice of state (10), only the first component of the future state x_{n+1} is unknown. Once the prediction function is chosen and a trial value of m is selected, the prediction performance is assessed by computing the prediction error. If the results are reasonably accurate (prediction error close to 0), then the prediction problem is solved. Otherwise, a new value of m is selected and the procedure starts over, until convergence.
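The embedding step can be written compactly. The helper below is our own sketch (with unit delay time, tau = 1 in samples, for simplicity); it builds the state vectors of (10) from a scalar series:

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Build m-dimensional delay vectors (x(n), x(n - tau), ...,
    x(n - (m-1)tau)) for every n at which all components exist."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    return np.array([x[n - np.arange(m) * tau] for n in range(start, len(x))])

# usage: embedding a short series with m = 3
states = delay_embed([0., 1., 2., 3., 4.], m=3)
# rows: [2, 1, 0], [3, 2, 1], [4, 3, 2]
```

Each row is one state vector; predicting the series then amounts to predicting the first component of the next row.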
3.1. Approximation techniques

So far we have been discussing interpolation functions, namely maps f_N such that

x_{n+1} = f_N(x_n),   1 \leq n \leq N - 1.

In many practical contexts, including teletraffic studies, this constraint may be significantly relaxed by allowing f_N to be an approximant, that is:

x_{n+1} \approx f_N(x_n),   1 \leq n \leq N - 1.

This, of course, widens the number of techniques that can be used. Typically, they belong to two different categories: global techniques and local techniques. In the former, the approximant (or interpolant) is applied to the whole series of

vectors that can be constructed from the sequence. In the latter case, instead, the prediction is made on the basis of the past states that are near to the current one. The rationale of this technique is simple and intuitive: look for pieces of the trajectory in the past that resemble the current one and infer the future behavior according to the way the system evolved in that analogous past condition. Incidentally, this method looks well tailored to the case of network traffic, which has been widely proven to be self-similar. The selection of the number k of neighbors in the past is another critical issue. Typically, k = m + 1 is assumed [8, 9], and we adopt this strategy. Global techniques are computationally too expensive and are not considered in our study. We use the local approach and we select the radial basis functions as approximant functions.
3.2. Radial Basis Function Predictor (RBFP)

According to the previous discussion, given the set of k neighbors x_{n_j}, j = 1, 2, \ldots, k, of x_n, we choose a prediction scheme of the form:

\hat{x}_{n+1} = \sum_{j=1}^{k} \lambda_j \, \Phi(\| x_n - x_{n_j} \|)    (11)

Notice that, in its most general expression, the constant term is replaced by a polynomial term (usually not included). The functions \Phi : R^+ \to R, defined as:

\Phi(r) = (r^2 + c^2)^{\beta},   \beta > -1, \; \beta \neq 0    (12)

are called radial basis functions. In our analysis we always used \beta = -0.5. In other words, the prediction (11) can be interpreted as a weighted sum of terms in which the contribution of each neighbor depends inversely on its distance to the current state (closer states give a bigger contribution). The values \lambda_j, j = 1, 2, \ldots, k, are determined through the knowledge of the past evolution of the state as:
x_{n_i + 1} = \sum_{j=1}^{k} \lambda_j \, \Phi(\| x_{n_i} - x_{n_j} \|),   i = 1, 2, \ldots, k    (13)

together with

\sum_{j=1}^{k} \lambda_j = 0

where \Phi_{ij} = \Phi(\| x_{n_i} - x_{n_j} \|). Using matrix notation, the above system of k + 1 equations is equivalent to the following system:

\begin{bmatrix} \Phi_{11} & \cdots & \Phi_{1k} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \Phi_{k1} & \cdots & \Phi_{kk} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_k \\ \lambda_0 \end{bmatrix} = \begin{bmatrix} x_{n_1 + 1} \\ \vdots \\ x_{n_k + 1} \\ 0 \end{bmatrix}    (14)
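Putting the pieces together, one local RBF prediction step can be sketched as follows. This is our own illustration, not the authors' code; the parameter values c = 1 and beta = -0.5 are our choices, and a unit delay time is assumed:

```python
import numpy as np

def rbf_predict_next(x, m=3, c=1.0, beta=-0.5):
    """One-step local RBF prediction: embed the series, find the
    k = m + 1 nearest past neighbours of the current state, solve
    system (14) and apply prediction (11)."""
    x = np.asarray(x, dtype=float)
    k = m + 1
    # delay embedding: states[j] = (x[j+m-1], x[j+m-2], ..., x[j])
    states = np.array([x[j:j + m][::-1] for j in range(len(x) - m + 1)])
    current, past = states[-1], states[:-1]
    phi = lambda r: (r ** 2 + c ** 2) ** beta       # radial basis, eq. (12)
    d = np.linalg.norm(past - current, axis=1)
    nb = np.argsort(d)[:k]                          # k nearest neighbours
    # assemble and solve the (k+1) x (k+1) system (14)
    D = np.linalg.norm(past[nb][:, None, :] - past[nb][None, :, :], axis=2)
    A = np.ones((k + 1, k + 1))
    A[:k, :k] = phi(D)
    A[k, k] = 0.0
    b = np.append(x[nb + m], 0.0)                   # known successors x_{n_j + 1}
    lam = np.linalg.solve(A, b)
    return float(lam[:k] @ phi(d[nb]) + lam[k])     # prediction, eq. (11)

# usage: one-step-ahead prediction of a smooth quasi-periodic series
t = np.arange(400)
x = np.sin(0.3 * t)
pred = rbf_predict_next(x[:-1], m=3)
# pred should be close to the withheld true value x[-1]
```

The extra unknown solved for here is the constant term associated with the ones-bordered matrix in (14); dropping it reduces the system to a plain k x k RBF interpolation.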

4. Performance Parameters for Prediction Algorithm Comparison

In order to define the performance parameters to be considered in the comparison among different prediction strategies, we first establish the scenario where these predictors will be used. To this aim, Fig. 2 shows the system analyzed for dynamic resource allocation. In the system, the measurement system measures the variable X(nT_p), representing the traffic volume (number of bytes or packets) arriving at the system in the time interval ((n-1)T_p, nT_p]. The quantity T_p is the sampling period used to evaluate the arrived traffic volume; this is a design parameter and a procedure for its dimensioning is under study. The sequence of observations X(iT_p), for i = 0, 1, \ldots, n, is the time series considered by the predictor, which uses these observations to extrapolate \hat{X}[(n+1)T_p]. The value \hat{X}[(n+1)T_p] represents the traffic volume that is predicted to arrive at the system in the time interval (nT_p, (n+1)T_p]. Using this information, the resource allocator can deduce the network resources C[(n+1)T_p] that will be necessary to the system during the time interval (nT_p, (n+1)T_p].

Analyzing this scheme, we can deduce some requirements for the prediction algorithm. Firstly, we can note that within a time T_c < T_p the system should predict \hat{X}[(n+1)T_p] and calculate C[(n+1)T_p]. The parameter T_c must be negligible with respect to T_p, since the network resources calculated in T_c are valid for a time T_p. Hence, the prediction and the prediction-based resource allocation algorithm must have a low complexity. Secondly, the allocated resources depend on the traffic prediction \hat{X}[(n+1)T_p]; hence, it is very important for the predictor to have a high accuracy. A third requirement can be deduced by considering that the system works on line and that the traffic characteristics can vary in time. Hence, the algorithm should be able to readily react to variations of the traffic features; in other words, the predictor should have a high adaptivity. Summarizing, the performance parameters considered in the comparison will be:

Fig. 2. Prediction-Based Resource Allocation System

1. the complexity,
2. the accuracy,
3. the adaptivity.
4.1. Metrics for quality of prediction

In order to quantify the accuracy of a predictor f_N, we introduce the predictor error \epsilon^2(f_N) as:

\epsilon^2(f_N) = \frac{1}{V} \lim_{M \to \infty} \frac{1}{M} \sum_{n=N}^{N+M-1} \left[ x_{n+1} - f_N(x_n) \right]^2    (15)

with the normalizing factor:

V = \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} \left[ x_m - \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} x_m \right]^2    (16)

Values of \epsilon^2(f_N) close to 0 indicate good performance of the predictor; performance gets worse as \epsilon^2(f_N) tends to 1. Indeed, \epsilon^2(f_N) close to zero implies predicted values nearly identical to the actual ones [9]. This condition is excessively restrictive: it is often enough to require a reasonably low value of \epsilon^2(f_N) together with a good match of the statistics of actual and predicted data, such as autocovariance function, distribution, variance, and so forth.
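Over a finite record, the limits in (15)-(16) reduce to sample averages, so the metric is simply the prediction MSE normalized by the sample variance of the data; a minimal sketch:

```python
import numpy as np

def predictor_error(actual, predicted):
    """Finite-sample version of eq. (15): mean square prediction error
    divided by the sample variance of the actual data, eq. (16)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2) / np.var(actual)

# sanity checks: a perfect predictor scores 0, and a predictor that
# always outputs the sample mean scores exactly 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
e_perfect = predictor_error(x, x)
e_mean = predictor_error(x, np.full(5, x.mean()))
```

The normalization makes scores comparable across traces with different variances, which is what allows the tables below to compare predictors across sampling times.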

5. Predictor Performance Analysis

In this section we present numerical results of the comparison among some of the predictors described in the previous sections. The study is carried out using traffic data acquired during an emulation of an actual service. In particular, we consider a time series named Conf Cell, representing the traffic offered by a video-conference session with three people in front of a 3-CCD camera. Each sample of the time series represents the number of ATM cells transmitted during a frame period, equal to 1/25 s (40 ms). The available number of samples is 48496, with mean value and standard deviation equal to 174.064 and 99.21 cells/frame, respectively. In the comparison, we have considered three different types of prediction algorithms:

1. an NLMS predictor of order 20,
2. the Linear Predictor,
3. the Radial Basis Function Predictor.
5.1. Complexity

A first analysis is devoted to evaluating the complexity of the compared prediction algorithms. As a performance measure for this analysis, we have considered the number of clock ticks necessary on a PC to obtain a single predicted value. The values obtained with the three considered predictors are summarized in Table 1.

    NLMS           Linear Pred.   RBFP
    0.66 x 10^6    213            1.26 x 10^6

Table 1. Clock ticks necessary to obtain a single predicted value

We can clearly observe that the RBFP has a considerably higher complexity than both the Linear and the NLMS predictor. Indeed, the number of clock ticks necessary to obtain a single predicted value with the RBFP is roughly double that of the NLMS and three orders of magnitude higher than that of the Linear Predictor. To quantify the time necessary to obtain a single predicted value, we can refer to a PC equipped with a 1 GHz CPU. With this equipment, the computation times of the different algorithms are summarized in Table 2. As we can observe, the RBFP can require a computation time higher than 1 ms to calculate a single predicted value; this can be inadequate when T_p is in the order of tens of ms and for network links with high bandwidth.

    NLMS      Linear Pred.   RBFP
    660 μs    0.2 μs         1.26 ms

Table 2. Computation time to obtain a single predicted value

5.2. Accuracy

The accuracy has been evaluated considering the predictor error defined in Section 4.1. The results are summarized in Table 3 for different sampling times T_p.

    T_p       NLMS      Linear Pred.   RBFP
    40 ms     0.0379    0.0483         0.0413
    80 ms     0.0734    0.0903         0.0687
    120 ms    0.1125    0.1493         0.0985

Table 3. Predictor error for different sampling times T_p

The table points out the NLMS predictor as the algorithm producing the best overall performance, while the Linear Predictor, despite its good computational speed, returns the largest predictor error. The RBFP achieves high accuracy, but the setting of its parameters is a long, off-line procedure that strongly depends on the characteristics of the input traffic. Furthermore, as highlighted in the previous paragraph, the RBFP presents a high complexity. For these reasons, the best choice for a prediction technique to be implemented in an on-line environment seems to be the NLMS predictor, which offers a good trade-off between accuracy and computational speed. Finally, we can note that the accuracy decreases as the sampling time T_p increases. To observe the relation between predicted and actual values, we show Fig. 3, Fig. 4 and Fig. 5, which highlight the accordance in terms of temporal behavior between the predicted traffic and the time evolution of the acquired traffic data. From the comparison of the figures, we can deduce that the degradation of the accuracy of the Linear Predictor is due to changes of the traffic pattern slope. This behavior can be easily explained by observing that the Linear Predictor calculates the observation at time (n+1)T_p considering the slope of the traffic pattern observed at time nT_p.

Fig. 3. Comparison of predicted and actual traffic data: RBFP for T_p = 120 ms

Fig. 4. Comparison of predicted and actual traffic data: Linear Predictor for T_p = 120 ms

Fig. 5. Comparison of predicted and actual traffic data: NLMS for T_p = 120 ms

5.3. Adaptivity

The last analysis is related to the ability of the prediction algorithms to capture variations of the traffic characteristics. In this study, we have artificially varied the mean of the measured traffic by adding a constant term to the time series x(n), starting from a fixed value of n. The behaviors obtained with the NLMS and the Linear predictors are shown in Fig. 6 and Fig. 7, respectively, while for the sake of simplicity we do not show the results obtained with the RBFP; indeed, this algorithm has shown a behavior similar to that of the NLMS predictor. From the figures, we can note the high prediction error of the Linear Predictor at time 5000 T_p, due to the high variation of the slope presented by the measured traffic pattern. On the contrary, the NLMS produces an almost negligible error at the same instant, as can be observed in Fig. 6.

Fig. 6. Adaptability test: NLMS for T_p = 120 ms

Fig. 7. Adaptability test: Linear Predictor for T_p = 120 ms

6. Conclusion

The paper presents the analysis of different prediction techniques and their application to network traffic. The underlying goal of the research is the development of a sound theoretical basis for the realization of Admission Control and Resource Allocation algorithms to be integrated into the control plane of a QoS-oriented network architecture. Three main prediction techniques have been analyzed in the paper: the NLMS predictor, the Linear Predictor and the Radial Basis Function predictor. The performance of the three algorithms has been compared in terms of their accuracy in capturing the actual traffic behavior, their implementation complexity and their adaptivity. The results show the high complexity of the RBFP, while a very low computation time is required by the Linear Predictor. On the other hand, the RBFP presents higher accuracy and adaptivity with respect to the Linear Predictor. Differently from these two prediction schemes, the NLMS achieves an adequate trade-off between complexity and accuracy-adaptivity. Indeed, the analysis has shown that the NLMS has a medium complexity, but achieves the same performance as the RBFP in terms of accuracy and adaptivity.

References

[1] Z. Jiang, Y. Li, and V. C. M. Leung. A predictive demand assignment multiple access protocol for broadband satellite networks supporting internet applications. In Proc. IEEE ICC 2002, 2002.
[2] S. S. Oruganti and M. Devetsikiotis. Analyzing robust Active Queue Management schemes: A comparative study of predictors and controllers. In Proc. IEEE ICC 2003, 2003.
[3] R. G. Garroppo, S. Giordano, S. Lucetti, and G. Procissi. Chaotic prediction and application to resource allocation strategies. In Proc. IEEE ICC 2004, 2004.
[4] M. Hayes. Statistical Digital Signal Processing and Modeling. Wiley & Sons, 1996.
[5] S. Haykin. Adaptive Filter Theory. Prentice-Hall, 1991.
[6] T. S. Parker and L. O. Chua. Chaos: A tutorial for engineers. Proceedings of the IEEE, 75:982-1008, 1987.
[7] F. Takens. Detecting strange attractors in turbulence. In D. Rand and L. Young, editors, Dynamical Systems and Turbulence, volume 898 of Lecture Notes in Mathematics, pages 366-381. Springer, 1981.
[8] J. Farmer and J. Sidorowich. Predicting chaotic time series. Phys. Rev. Lett., 59:845-848, 1987.
[9] M. Casdagli. Nonlinear prediction of chaotic time series. Physica D, 35:335-356, 1989.


First EuroNGI Workshop: New Trends in Modelling, Quantitative Methods and Measurements

Performance analysis of an FDL based optical packet buffer


Dieter Fiems and Herwig Bruneel
SMACS Research Group, Vakgroep TELIN (TW07), Ghent University
St-Pietersnieuwstraat 41, B-9000 Gent, Belgium

Abstract: We evaluate the performance of optical fibre delay line based packet buffers. In particular, we focus on the packet loss in FDL structures without feedback. Two different FDL selection procedures are investigated analytically using a probability generating functions approach. Further, we assess the impact of the lengths of the FDL lines on the performance of the buffer and illustrate our analysis with some numerical results.

1. Introduction

Future communication networks will have to cope with the ever increasing bandwidth demand related to all kinds of multimedia services over the internet. Optical packet switching (OPS) networks combine the flexibility of packet switched networks with the bandwidth capacity of optical fibres (see a.o. [1, 2, 3] and the references therein). An OPS network consists of optical packet switches, interconnected with optical fibres. User data is transmitted in fixed length optical packets consisting of payload and header information. The payload is switched entirely in the optical domain whereas the header information is converted to and processed in the electrical domain. Conversion to the electrical domain is required as there is no optical logic available. As different packets may contend for the same output, optical buffers are required to mitigate packet loss. Buffering in the optical domain relies on the use of fibre delay lines (FDLs). That is, the optical packet is buffered by sending it through an optical fibre. There are two main classes of FDL structures for buffering: FDL buffers with and without feedback. In the former class, the output of the FDL structure is fed back to the input if there is contention, whereas in the latter class packets are dropped if there is contention at the output of the FDL structure.

In the present contribution, we analytically investigate the performance of FDL structures without feedback for synchronously arriving fixed length packets. We assume that at most one packet can leave the system at a time; that is, we consider a single server system. Performance assessment of this type of FDL structure has been considered before. In [4], the performance of an FDL structure consisting of FDLs of different lengths fed by uncorrelated traffic is investigated. Packets are routed to the shortest delay line such that there is no contention at the output. The authors note that an exact solution is not feasible and provide an approximate analysis. An FDL structure fed by uncorrelated traffic, consisting of FDLs with consecutive (single packet length) delays, is considered in [5]. Packets are routed to the first available delay line and may cause contention at the output. Finally, in [6], the performance of an FDL structure fed by correlated traffic is assessed. The FDL structure under investigation consists of FDLs with consecutive multiple packet length delays and, as in [4], packets are routed to the shortest available delay line. In this contribution, we compare the routing scheme of [4, 6] with the scheme of [5] for FDL structures with consecutive multiple packet length delays, fed by Markov-correlated traffic.

The rest of this contribution is organised as follows. In the next section, we provide a more detailed description of the FDL structures and the arrival process under consideration. Sections 3 and 4 then concern the analyses for the different routing schemes. Finally, we illustrate our results by some numerical examples in section 5 and draw conclusions in section 6.

2. The FDL Buffer Structure

We consider an FDL structure fed by synchronously arriving traffic. That is, we assume that time is divided into fixed length intervals or slots and that packets only arrive at (just before) slot boundaries. The packets have a fixed length and we assume that the slot length corresponds to the transmission time of a packet. Packets arrive in the FDL structure just before slot boundaries. Furthermore, we assume that at most one packet may leave the FDL structure at the end of a slot. That is, we consider a single server queueing system. The structure consists of N delay lines with lengths n g \Delta (n = 1 \ldots N). Here \Delta denotes the slot length and g denotes the granularity, an integer constant. Further, a zero length delay is also possible; in that case, the packet is directly routed to the output. Arriving packets are routed to available delay lines according to one of the following two routing schemes:

Scheme NC (no contention): packets are routed to the shortest available

delay line such that there is neither contention at the input nor at the output of the FDL structure. That is, packets are routed such that during a slot at most a single packet enters a delay line and such that at the end of a slot at most a single packet leaves the complete delay line structure. The packet is dropped if such a delay line cannot be found.

Scheme S (simple): packets are routed to the shortest available delay line. That is, packets are routed such that during a slot at most a single packet enters a delay line. The packet is dropped if such a delay line cannot be found. Note that this scheme allows contention at the output of the structure, as only one packet can leave the FDL structure during a slot; excess packets are dropped.

Note that the S scheme does not require information regarding the state of the FDL structure, as opposed to the NC scheme. Therefore, the S scheme is easier to implement. One easily shows that a single counter captures the state of the structure with consecutive delays (g = 1). For g > 1, g counters can capture the state of the system as it decomposes into g virtual FIFO queues (see further). The state of a more general set of FDLs is more complex to describe.

The numbers of packet arrivals at the ends of consecutive slots are modeled by means of a discrete-time finite-state Markov modulated batch arrival process. At slot boundaries, the arrival process is in one out of M possible states. Given this state, say state i (i \in \{1 \ldots M\}), there are k packet arrivals just before the next slot boundary and the arrival process is in state j at the next slot boundary with a fixed probability a_{ij}(k). Therefore, the arrival process is completely characterised by the matrix A(z) = [A_{ij}(z)]_{i,j = 1 \ldots M} of (partial) probability generating functions

A_{ij}(z) = \sum_{k=0}^{\infty} a_{ij}(k) z^k.    (1)
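The matrix A(z) of eq. (1) is easy to build for a concrete source. The sketch below (our own illustration, not code from the paper; the parameter values are assumptions) does so for a single two-state on/off source, where state 0 (off) emits no packet and state 1 (on) emits exactly one, so a_{ij}(k) = T[i][j] · 1(k = i) with T the modulating transition matrix.

```python
# Minimal numeric sketch of the arrival model of eqs. (1)-(2) for a single
# two-state on/off source (illustrative parameter values, an assumption).
alpha = 0.955                      # P(on stays on)
beta = 0.995                       # P(off stays off)
T = [[beta, 1 - beta],             # row 0: from off
     [1 - alpha, alpha]]           # row 1: from on

# steady-state row vector pi, solving pi = pi A(1) = pi T (2x2 closed form)
pi_on = (1 - beta) / (2 - alpha - beta)
pi = [1 - pi_on, pi_on]

# matrix A(0) = [a_ij(0)]: only the off state produces a zero-size batch
A0 = [[T[0][0], T[0][1]],
      [0.0, 0.0]]

# scalar quantities: A(0) = pi A(0) e^T, and the mean arrival rate A'(1)
A_at_0 = sum(pi[i] * (A0[i][0] + A0[i][1]) for i in range(2))
mean_rate = pi[1]                  # one packet per slot spent in the on state
```

With these (assumed) values the source has rate 0.1; the same construction extends directly to M states and larger batches.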

Also, for further use, let A(z) denote the probability generating function of the number of arrivals at the end of a random slot. That is,

A(z) = \pi A(z) e^T.    (2)

Here e^T denotes the M × 1 column vector with all elements equal to 1 and \pi denotes the steady-state probability row vector of the states of the Markovian arrival process. That is, \pi is the non-negative normalised solution of \pi = \pi A(1).

3. The packet loss for the NC scheme

We first consider an FDL structure with consecutive single packet length delays (g = 1) and with routing according to the NC scheme. One easily verifies that the

optical buffer with the NC routing scheme operates as an ordinary FIFO buffer with size (not including the packet in service) equal to the number of fibre delay lines N. For simplicity, we first focus on an infinite size queueing system and then approximate the packet loss ratio by means of tail approximations. Kim and Shroff [7] note that plots of the packet loss ratio in finite capacity queues and of tail probabilities in infinite capacity queues look very similar. Therefore, they propose the following heuristic for the packet loss ratio PLR(N) of a queue with N buffer spaces:

PLR(N) = PLR(0) \frac{\Pr[U > N]}{\Pr[U > 0]}.    (3)

Here, Pr[U > k] denotes the probability that the queue content of the infinite capacity queueing system exceeds k. The infinite capacity FIFO queueing system with Markov-correlated arrivals has been studied extensively in the past and we may therefore rely on previous results (e.g., [8, 9]). In particular, the tail probabilities Pr[U > k] are investigated in [9]; the interested reader is referred to that contribution. As such, we only need to determine the packet loss ratio PLR(0) of the bufferless system. For such a system, all packets but one that arrive at the end of a random slot are lost. The mean numbers of packets that enter the system and leave the system are given by A'(1) and (1 − A(0)) respectively. Therefore, we obtain the following expression for the packet loss ratio:

PLR(0) = 1 - \frac{1 - A(0)}{A'(1)}.    (4)
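Expressions (3) and (4) can be illustrated numerically. The sketch below is our own construction, not code from the paper: it assumes uncorrelated arrivals from 8 Bernoulli(0.1) sources (load 0.8, a single modulating state), computes PLR(0) from the batch distribution, estimates the tail probabilities Pr[U > k] of the infinite-capacity queue by simulation, and rescales them with the heuristic.

```python
import random

# PLR(0) of the bufferless system, eq. (4), for assumed i.i.d. arrivals:
# the superposition of n Bernoulli(p) sources per slot.
n, p = 8, 0.1
plr0 = 1 - (1 - (1 - p) ** n) / (n * p)      # 1 - (1 - A(0)) / A'(1)

# Tail probabilities Pr[U > k] of the infinite-capacity FIFO queue,
# estimated by a slot-by-slot simulation (one departure per non-empty slot).
random.seed(4)
slots, KMAX = 200_000, 64
u, tail = 0, [0] * KMAX
for _ in range(slots):
    a = sum(random.random() < p for _ in range(n))
    u = max(u + a - 1, 0)                    # queue content after the slot
    for k in range(min(u, KMAX)):
        tail[k] += 1                         # this slot has U > k
pr = [c / slots for c in tail]

# Kim-Shroff heuristic, eq. (3), evaluated for two buffer sizes
plr = {N: plr0 * pr[N] / pr[0] for N in (5, 10)}
```

As expected, the rescaled loss estimates decrease with the buffer size N and stay below the bufferless loss PLR(0).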

Let us now shift focus to the FDL system with consecutive multiple packet length delays (g > 1). Consider a random slot, say slot n. Arrivals at the end of this slot can only leave the FDL structure at the end of slots n + 1 + kg for some non-negative integer k, due to the assumptions regarding the lengths of the available delay lines (it takes kg slots to exit the k-th delay line). Therefore, these customers may only contend with customers that arrive(d) at the end of slot n + lg for some integer value l. This observation leads to the conclusion that we may regard the FDL structure under consideration as a set of g (virtual) buffers, as depicted in figure 1. During the consecutive slots, arrivals are cyclically routed to the different virtual buffers and the buffers are cyclically served. Since an arriving packet is routed to the shortest delay line such that there is no contention, each of the virtual buffers operates as a FIFO queueing system. One should however note that due to the non-unity granularity, the complete

[Fig. 1. The system with g virtual buffers (virtual buffer 1, virtual buffer 2, ..., virtual buffer g).]

FDL structure does not follow a FIFO queueing discipline. Packets may leave the FDL structure out of order, and the discipline is not work conserving: during some slots no packets leave the FDL structure while there are packets in it. This is due to the fact that once a packet is put in an FDL, it cannot be transmitted before it exits the FDL. In other words, FDLs cannot be accessed randomly. Now, consider the arrival process into a single virtual buffer. Every g slots, packets are routed to the virtual buffer under consideration and, if there are packets present at the beginning of the slot, a single packet leaves at the end of the slot. The system's behaviour over time is visualised in figure 2. One now easily verifies that the virtual buffer observed every g slots can be modelled as a FIFO queueing system with Markov-modulated batch arrivals as considered before (i.e., as considered for the case g = 1). The transition matrix of the arrival process of this queueing system is given by

\tilde{A}(z) = A(z) A^{g-1}(1).    (5)
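Expression (5) appends g − 1 extra transitions of the modulating chain to each slot of a virtual queue. For a two-state chain this is exactly what weakens correlation: the dependence of the modulating state g slots ahead on the current state decays geometrically with g. A minimal pure-Python check (our own sketch; the stay-on/stay-off probabilities alpha and beta and their values are illustrative assumptions):

```python
# 2x2 matrix helpers, enough for a two-state modulating chain
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(X, g):
    R = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 identity
    for _ in range(g):
        R = matmul(R, X)
    return R

alpha, beta = 0.955, 0.995         # P(stay on), P(stay off) -- assumed values
T = [[beta, 1 - beta], [1 - alpha, alpha]]

# P(on in g slots | on now) - P(on in g slots | off now) = (alpha+beta-1)^g,
# so the extra A^{g-1}(1) factor of eq. (5) shrinks state dependence fast
for g in (1, 2, 4, 8):
    Tg = matpow(T, g)
    assert abs((Tg[1][1] - Tg[0][1]) - (alpha + beta - 1) ** g) < 1e-12
```

This geometric decay is the mechanism behind the later observation that granularity breaks down correlation in the arrival process of a virtual queue.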

Expression (5) follows from the fact that after slot n (see figure 2), the arrival process goes through g − 1 transitions until slot n + g. There are no arrivals in the virtual buffer under consideration during these g − 1 slots. As all virtual queues are fed by the same arrival process, one finds that the packet loss ratio of the system equals the packet loss ratio of a single virtual queue. Summarising, we find that adapting the arrival process by means of expression

[Fig. 2. Arrivals in and departures from a virtual queue: between slot n (queue size u_n, arrival state i) and slot n + g, the arrival process makes g − 1 transitions, and one packet departs if u_n > 0.]

(5) translates the case of general g into the case g = 1. That is, we can find the packet loss ratio for general g by means of expressions (3) to (5) and by means of the results of reference [9].

4. The packet loss for the S scheme

As for the NC scheme, we first focus on an FDL structure with consecutive delays (g = 1). Consider a random slot, say slot n, and assume that there are i arrivals at the end of this slot. The simple routing scheme under consideration then routes the packets such that these arrivals leave the FDL structure at the end of slots n + 1 to n + min(i, N + 1). Note that packets are lost at the entrance of the FDL structure if there are more than N + 1 arrivals. We can now consider the number of packets that leave the FDL structure at the end of slot n + N + 1. Note that slot n + N + 1 is a random slot, as slot n was chosen randomly. The simple routing scheme implies that not more than one packet that arrived at the end of slot n + j (0 ≤ j < N + 1) can leave the system at the end of slot n + N + 1. A packet that arrived at the end of slot n + j leaves the queue at the end of slot n + N + 1 if there are more than N − j arrivals at the end of slot n + j. Let a^{(n+j)} denote the number of customers that arrive at the end of slot n + j; then we find that

\sum_{j=0}^{N} 1\left(a^{(n+j)} > N - j\right)    (6)

customers leave the system at the end of slot n + N + 1. Here 1(x) denotes the indicator function. As slot n is chosen arbitrarily, the state of the arrival process at the beginning of slot n is described by the row vector \pi. Recall that the latter is the non-negative normalised solution of \pi = \pi A(1). Further, let B^{(k)}(z) = [B^{(k)}_{ij}(z)]_{i,j=1...M} denote the matrix with the following (partial) probability generating functions as elements:

B^{(k)}_{ij}(z) = z A_{ij}(1) + (1 - z) \sum_{l=0}^{k} a_{ij}(l).    (7)

From equations (6) and (7), and taking the Markovian nature of the arrival process into account, we find that the probability generating function B(z) of the number of departures at the end of slot n + N + 1 is given by

B(z) = \pi \prod_{k=1}^{N+1} B^{(N-k+1)}(z) \, e^T.    (8)

Here e^T denotes the column vector with all elements equal to 1. Recall that slot n + N + 1 is a random slot, as slot n was chosen randomly. Therefore B(z) also denotes the probability generating function of the number of packets that try to leave the FDL structure at the end of a random slot. As at most one packet can leave the system at the end of a slot, one easily verifies that (1 − B(0)) equals the mean number of departures from the system at the end of a random slot. Therefore, we find the following expression for the packet loss ratio:

PLR(N) = 1 - \frac{1 - B(0)}{A'(1)}.    (9)
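In the uncorrelated case (a single modulating state, M = 1), formulas (7)-(9) simplify: B^{(k)}(0) reduces to the batch-size cdf F(k) and eq. (8) collapses to B(0) = F(0) F(1) ... F(N). The sketch below (our own illustration; the parameter values are assumptions) evaluates this closed form and checks it by a direct simulation of the departure attempts of eq. (6).

```python
import random
from math import comb

n, p, N = 6, 0.1, 5                  # 6 Bernoulli(0.1) sources, N = 5 lines

# batch-size pmf and cdf of the (assumed) binomial batches
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
cdf = [sum(pmf[:k + 1]) for k in range(n + 1)]

B0 = 1.0
for m in range(N + 1):
    B0 *= cdf[min(m, n)]             # F(m), capped at F(n) = 1
plr_formula = 1 - (1 - B0) / (n * p)          # eq. (9)

# Simulation: a batch of i packets tries to leave during the next
# min(i, N + 1) slots; at most one packet leaves per slot, excess is lost.
random.seed(2)
slots = 200_000
batches = [sum(random.random() < p for _ in range(n)) for _ in range(slots)]
departed = 0
for t in range(N + 1, slots):
    # the batch of slot s attempts a departure at slot t iff it contains
    # at least t - s packets (here batch sizes never exceed N + 1)
    attempts = sum(1 for s in range(t - N - 1, t) if batches[s] >= t - s)
    if attempts > 0:
        departed += 1
plr_sim = 1 - departed / sum(batches[N + 1:])
```

Both estimates agree closely, which supports the product form of eq. (8) in this simplified setting.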

We can now again focus on an FDL structure with longer delays (g > 1). As for the NC scheme, there is no contention between packets that arrive in slots n and l for (n − l) mod g ≠ 0. Therefore, the system again operates as one with g virtual queues (see section 3). We can focus on the packet loss ratio in a single virtual queue and note that the latter also equals the packet loss ratio of the complete system. The arrival process in a single virtual queue is again described by the matrix \tilde{A}(z) as given in expression (5), and the S scheme is applied in each virtual queue. Therefore, adaptation of the arrival process again translates the case of general granularity g into the case g = 1 as investigated before (see expressions (7) to (9)).
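Before turning to the numerical examples, the qualitative difference between the two schemes can be reproduced by a direct slot-by-slot simulation. The sketch below is our own construction (not the paper's analysis): for g = 1 it feeds the same assumed arrival stream (load 0.8, 7 delay lines) to an NC system, modelled as a FIFO buffer with N + 1 places in total, and to an S system, where a batch of i packets books the next min(i, N + 1) departure slots and only one packet can actually leave per slot.

```python
import random

random.seed(3)
n, p, N, slots = 8, 0.1, 7, 200_000          # illustrative parameters
batches = [sum(random.random() < p for _ in range(n)) for _ in range(slots)]
total = sum(batches)

# --- NC scheme: FIFO queue with capacity N + 1, one departure per slot ---
u = lost_nc = 0
for a in batches:
    admitted = min(a, N + 1 - u)             # drop only on buffer overflow
    lost_nc += a - admitted
    u += admitted
    if u > 0:
        u -= 1
plr_nc = lost_nc / total

# --- S scheme: input loss beyond N + 1 lines, then output contention ---
attempts = [0] * (slots + N + 2)
lost_s = 0
for t, a in enumerate(batches):
    lost_s += max(a - (N + 1), 0)            # more than N + 1 arrivals
    for d in range(1, min(a, N + 1) + 1):
        attempts[t + d] += 1                 # booked departure slots
lost_s += sum(c - 1 for c in attempts if c > 1)   # only one leaves per slot
plr_s = lost_s / total
```

On the same arrival stream, the NC loss comes out well below the S loss, in line with the comparison reported in the numerical examples below.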

5. Numerical examples

To evaluate the FDL structure numerically, we assume that the arriving packets come from the compound traffic of 2 types of 2-state Markovian sources. During the consecutive slots, each source alternates between an off state and an on state. A source generates a single packet if it is in the on state and no packets if it is in the off state. There are n_i sources of type i (i = 1, 2). The transition probabilities from the on state to the off state and from the off state to the on state for a source of type i (i = 1, 2) are given by (1 − α_i) and (1 − β_i) respectively. The compound source is therefore completely characterised by the tuple (α_1, β_1, n_1, α_2, β_2, n_2). Alternatively, we may characterise the source by the tuple (λ_1, K_1, n_1, λ_2, K_2, n_2). Here λ_i and K_i (i = 1, 2) denote the arrival rate and burstiness factor of a single source:

λ_i = \frac{1 - β_i}{2 - α_i - β_i},    (10)

K_i = \frac{1}{2 - α_i - β_i},    (11)
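The parametrisation of eqs. (10)-(11) is easy to invert, which is how concrete (rate, burstiness) pairs such as those used below map to transition probabilities. A small sketch (our own; the naming of alpha as the stay-on probability and beta as the stay-off probability matches the reconstruction above):

```python
def to_transition_probs(lam, K):
    """Map a (rate, burstiness) pair to (stay-on, stay-off) probabilities,
    inverting lam = (1-beta)/(2-alpha-beta) and K = 1/(2-alpha-beta)."""
    assert K >= max(lam, 1 - lam)    # feasible range of the burstiness factor
    alpha = 1 - (1 - lam) / K        # P(on -> on)
    beta = 1 - lam / K               # P(off -> off)
    return alpha, beta

# the correlated sources of the examples below: rate 0.1, burstiness 20
alpha, beta = to_transition_probs(0.1, 20)
lam_back = (1 - beta) / (2 - alpha - beta)   # eq. (10), round trip
K_back = 1 / (2 - alpha - beta)              # eq. (11), round trip
```

For K = 1 the mapping gives alpha + beta = 1, so the state after a slot no longer depends on the current state: the source degenerates into a Bernoulli source, consistent with the remark below.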

for i = 1, 2. The burstiness factor K_i is a measure for the absolute lengths of the on- and off-periods and takes values between max(λ_i, 1 − λ_i) and infinity. For K_i = 1, the source turns into a Bernoulli source and the generated traffic therefore becomes uncorrelated. Figure 3 depicts the packet loss ratio versus the number of fibre delay lines for both the NC routing scheme (a) and the S routing scheme (b). For each of these schemes, different granularities g = {1, 2, 4} are assumed as depicted. In the case of the NC scheme, we also depict some simulation results, as our results are only approximate. Arrivals come from a superposition of n_1 = 3 correlated sources (λ_1 = 0.1, K_1 = 20) and n_2 = 3 uncorrelated sources (λ_2 = 0.1, K_2 = 1) for all plots. As expected, the NC scheme clearly outperforms the S scheme. Further, an increase of the number of fibre delay lines implies a decrease of the packet loss ratio for both schemes. For the NC scheme this decrease is exponential. This is not unexpected, as the FDL structure with the NC scheme corresponds to a FIFO queue for g = 1 and to a set of FIFO queues for g > 1. Also for the S scheme we observe that an increase of the number of FDLs implies a decrease of the packet loss ratio. However, beyond relatively low values of the number of FDLs the packet loss ratio does not decrease further. This comes from the fact that packets are always routed to the shortest delay lines; the (additional) longer delay lines are only used if a large batch arrives at the end of a slot. We can further note that increasing the granularity decreases the packet loss ratio. For higher granularity, the arrival process to a virtual queue

[Fig. 3. Packet loss ratio vs. the number of fibre delay lines N, for granularities g = 1, 2, 4: (a) NC routing scheme; (b) S routing scheme.]

becomes less and less correlated, which implies a decrease of the packet loss ratio in that virtual queue. Figure 4 depicts the packet loss ratio versus the arrival load for both the NC routing scheme (a) and the S routing scheme (b). For each of these schemes, different granularities g = {1, 2, 4, 8} are assumed as depicted. The arrival load comes from the superposition of 4 correlated (K_1 = 20) and 4 uncorrelated (K_2 = 1) sources. 20% of the traffic comes from the uncorrelated sources and there are 7 fibre delay lines. As expected, for both schemes, an increase of the arrival load implies an increase of the packet loss ratio. Again, we can observe that the NC scheme always outperforms the S scheme. However, for higher load, the packet loss ratio of the S scheme is not that much worse than the packet loss ratio of the NC scheme. In both figures 5 and 6 we investigate the influence of correlation in the arrival process on the packet loss ratio for the NC scheme (a) and the S scheme (b). In figure 5 we assume that arrivals come from the superposition of 10 sources. Sources are either correlated (λ_1 = 0.08, K_1 = 20) or uncorrelated (λ_2 = 0.08, K_2 = 1) and we vary the number of sources n_1 that are correlated. Different values of the granularity g = {1, 2, 4} are assumed as depicted. Clearly, more correlation in the arrival process implies more loss. Also, we see that the NC scheme outperforms

[Fig. 4. Packet loss ratio vs. the arrival load, for granularities g = 1, 2, 4, 8: (a) NC routing scheme; (b) S routing scheme.]

[Fig. 5. Packet loss ratio vs. the number of correlated sources, for granularities g = 1, 2, 4: (a) NC routing scheme; (b) S routing scheme.]

[Fig. 6. Packet loss ratio vs. the burstiness factor K, for granularities g = 1, 2, 4: (a) NC routing scheme; (b) S routing scheme.]

the S scheme. Further, the packet loss ratio does not depend on the granularity if there is no correlation in the arrival process (n_1 = 0). In this particular case, larger g does not reduce the correlation in the arrival process of a virtual queue (and therefore does not reduce the packet loss ratio), as there is no such correlation. For higher n_1, additional granularity reduces the packet loss. Finally, in figure 6 the packet loss ratio is depicted versus the burstiness factor K of the arrival process. Here we assumed that there are 8 homogeneous sources and that the total arrival load equals 80%. Again, there are 7 FDLs available to mitigate packet loss. Increasing correlation again implies more loss. As in figure 5, we observe that the packet loss ratio does not depend on g for uncorrelated traffic (K = 1). For larger K, an increase of the granularity implies less loss.

6. Conclusions

We considered the performance analysis of an FDL structure with granularity. Two different routing schemes were investigated. The NC scheme clearly outperformed the simpler S scheme, as expected, at the cost of more complexity. The gain of the more complex NC scheme is however limited at high load. Further, we showed that granularity can reduce packet loss considerably in comparison with a structure with the same number of FDLs but without granularity. This results from the fact that granularity breaks down correlation in

the arrival process.

References

[1] R. Tucker and W. Zhong. Photonic packet switching. IEICE Transactions on Communications, E82-B(2):254-264, February 1999.
[2] S. Yao, B. Mukherjee, and S. Dixit. Advances in photonic packet switching: An overview. IEEE Communications Magazine, 38(2):84-94, 2000.
[3] L. Xu, H. Perros, and G. Rouskas. Techniques for optical packet switching and optical burst switching. IEEE Communications Magazine, pages 136-142, 2001.
[4] P. Cadro, A. Gravey, and C. Guillemot. Performance evaluation of an optical transparent packet switch. In Proceedings of the COST 257 First Management Committee Meeting, page COST257TD(97)12, Leidschendam, 22-23 January 1997.
[5] J. Walraevens, S. Wittevrongel, and H. Bruneel. Calculation of the packet loss in optical packet switches: an analytic technique. International Journal of Electronics and Communications (AEU), 57(4):270-276, 2003.
[6] D. Fiems, J. Walraevens, and H. Bruneel. Discrete-time queueing analysis of a fibre delay line structure with granularity. In The 8th IFIP Working Conference on Optical Network Design & Modelling (ONDM 2004), pages 57-69, Ghent, Belgium, February 2004.
[7] H. Kim and N. Shroff. Loss probability calculations and asymptotic analysis for finite buffer multiplexers. IEEE/ACM Transactions on Networking, 9(6):755-768, 2001.
[8] S. Li. A general solution technique for discrete queueing analysis of multimedia traffic on ATM. IEEE Transactions on Communications, 39(7):1115-1132, 1991.
[9] B. Steyaert, Y. Xiong, and H. Bruneel. An efficient solution technique for discrete-time queues fed by heterogeneous traffic. International Journal of Communication Systems, 10:73-86, 1997.

