All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00270ED1V01Y201008CNT006
Lecture #6
Series Editor: Jean Walrand, University of California, Berkeley
Series ISSN
Synthesis Lectures on Communication Networks
Print 1935-4185 Electronic 1935-4193
Scheduling and Congestion Control for Wireless and Processing Networks

Morgan & Claypool Publishers
ABSTRACT
In this book, we consider the problem of achieving the maximum throughput and utility in a class
of networks with resource-sharing constraints. This is a classical problem of great importance.
In the context of wireless networks, we first propose a fully distributed scheduling algorithm
that achieves the maximum throughput. Inspired by CSMA (Carrier Sense Multiple Access), which
is widely deployed in today’s wireless networks, our algorithm is simple, asynchronous, and easy to
implement. Second, using a novel maximal-entropy technique, we combine the CSMA schedul-
ing algorithm with congestion control to approach the maximum utility. Also, we further show
that CSMA scheduling is a modular MAC-layer algorithm that can work with other protocols
in the transport layer and network layer. Third, for wireless networks where packet collisions are
unavoidable, we establish a general analytical model and extend the above algorithms to that case.
Stochastic Processing Networks (SPNs) model manufacturing, communication, and service
systems. In manufacturing networks, for example, tasks require parts and resources to produce other
parts. SPNs are more general than queueing networks and pose novel challenges to throughput-
optimum scheduling. We propose a “deficit maximum weight” (DMW) algorithm to achieve
throughput optimality and maximize the net utility of the production in SPNs.
KEYWORDS
scheduling, congestion control, wireless networks, stochastic processing networks, car-
rier sense multiple access, convex optimization, Markov chain, stochastic approximation
Contents
Preface
1 Introduction
2 Overview
2.1 A Small Wireless Network
2.1.1 Feasible Rates
2.1.2 Maximum Weighted Matching
2.1.3 CSMA
2.1.4 Entropy Maximization
2.1.5 Discussion
2.2 Admission Control
2.3 Randomized Backpressure
2.4 Appendix
2.5 Summary
Bibliography
Index
Preface
This book explains recent results on distributed algorithms for networks. The book is based on
Libin’s Ph.D. thesis where he introduced the design of a CSMA algorithm based on a primal-dual
optimization problem, extended the work to networks with collisions, and developed the scheduling
of processing networks based on virtual queues.
To make the book self-contained, we added the necessary background on stochastic approx-
imations and on optimization. We also added an overview chapter and comments to make the
arguments easier to follow. The material should be suitable for graduate students in electrical engi-
neering, computer science, or operations research.
The main theme of this book is the allocation of resources among competing tasks. Such
problems are typically hard because of the large number of possible allocations. Instead of searching
for the optimal allocation at each instant, the approach is to design a randomized allocation whose
distribution converges to one with desirable properties. The randomized allocation is implemented
by a scheme where tasks request the resources after a random delay. Each task adjusts the mean value
of its delay based on local information.
One application is wireless ad hoc networks where links share radio channels. Another appli-
cation is processing networks where tasks share resources such as tools or workers. These problems
have received a lot of attention in the last few years. The book explains the main ideas on simple
examples, then studies the general formulation and recent developments.
We are thankful to Prof. Devavrat Shah for suggesting adjusting the update intervals in one
of the gradient algorithms, to Prof. Venkat Anantharam and Pravin Varaiya for their constructive
comments on the thesis, to Prof. Michael Neely and R. Srikant for detailed constructive reviews of
the book, and to Prof. Vivek Borkar, P.R. Kumar, Bruce Hajek, Eytan Modiano, and Dr. Alexandre
Proutiere for their encouragement and useful feedback. We are grateful to NSF and ARO for their
support of our research during the writing of this book.
CHAPTER 1
Introduction
In a wireless network, nodes share one or more radio channels. The nodes get packets to transmit
from the application and transmit them hop by hop to their destination. For instance, one user may
be downloading a file from another node; two other users might be engaged in a Skype call.
The nodes cannot all transmit together, for their transmissions would then interfere with one
another. Consequently, at any given time, only a subset of nodes should transmit. The scheduling
problem is to design an algorithm for selecting the set of nodes that transmit and a protocol for
implementing the algorithm. Moreover, the nodes should decide which packet to send and to what
neighboring node.
This problem admits a number of formulations. In this book, we adopt a simple model of
interference: two links either conflict or they do not. Thus, conflicts are represented by a conflict
graph whose vertices are the links and whose edges connect pairs of links that conflict and should
not transmit together. Equivalently, there are subsets of links that can transmit together because they
do not share an edge. Such sets are called independent sets.
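To make the notion concrete, here is a small sketch (ours, not from the book) that enumerates the independent sets of a conflict graph by brute force; the function name and graph encoding are illustrative choices.

```python
from itertools import combinations

def independent_sets(links, conflicts):
    """Return all subsets of `links` that contain no conflicting pair.

    `conflicts` is a set of frozensets, one per edge of the conflict graph.
    """
    sets = []
    for r in range(len(links) + 1):
        for subset in combinations(links, r):
            if not any(frozenset(pair) in conflicts
                       for pair in combinations(subset, 2)):
                sets.append(set(subset))
    return sets

# Three links where 1 and 2 conflict, and 2 and 3 conflict.
conflicts = {frozenset({1, 2}), frozenset({2, 3})}
sets = independent_sets([1, 2, 3], conflicts)
print(sets)  # [set(), {1}, {2}, {3}, {1, 3}]
```

Brute-force enumeration is exponential in the number of links, which is precisely why scheduling over independent sets is hard in large networks.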
Intuitively, the set of links that should transmit depends on the backlog of the nodes. For
instance, we explain that choosing the independent set with the maximum sum of backlogs is a
good policy when the nodes need to transmit each packet only once. This policy is called Maximum
Weighted Matching (MWM). Another good policy is to first select the link with the largest backlog,
then the link with the largest backlog among those that do not conflict with the first one, and so
on. This policy is called Longest Queue First (LQF). These two policies are not easy to implement
because the information about the backlog of the nodes is not available to all the nodes. Moreover,
even if all nodes knew all the backlogs, implementing MWM would still be computationally hard
because of the huge number of independent sets even in a small graph.
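Although MWM is impractical at scale, it is simple to state: among all independent sets, pick one maximizing the total backlog. A brute-force sketch (ours; the backlog values are illustrative) makes the rule, and its exponential cost, explicit:

```python
from itertools import combinations

def mwm_schedule(links, conflicts, backlog):
    """Brute-force MWM: the independent set with the largest total backlog.

    Exponential in the number of links, which is exactly why MWM is
    impractical in large networks.
    """
    best, best_weight = set(), 0
    for r in range(1, len(links) + 1):
        for subset in combinations(links, r):
            if any(frozenset(p) in conflicts for p in combinations(subset, 2)):
                continue  # not an independent set
            weight = sum(backlog[i] for i in subset)
            if weight > best_weight:
                best, best_weight = set(subset), weight
    return best

conflicts = {frozenset({1, 2}), frozenset({2, 3})}
# Backlogs: {1, 3} has total weight 3 + 4 = 7, beating {2} with weight 5.
print(mwm_schedule([1, 2, 3], conflicts, {1: 3, 2: 5, 3: 4}))  # {1, 3}
```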
One key idea in this book is that, instead of looking for the independent set with the maximum
sum of backlogs, one designs a randomized scheduling algorithm. To implement this algorithm, the
nodes choose random waiting times. The mean of the waiting time of each node decreases with
the backlog of that node. After that waiting time, the node listens to the channel. If it does not
hear any transmission, it starts transmitting a packet. Otherwise, it chooses a new waiting time and
repeats the procedure. Note that this algorithm is distributed since each node needs only know its
own backlog and whether any conflicting node is transmitting. Moreover, the algorithm does not
require any complex calculation. One can show that this algorithm, called A-CSMA for adaptive
carrier sense multiple access, selects an independent set with a probability that increases with the sum
of the backlogs in that set. Thus, this randomized algorithm automatically approximates the NP-
hard selection that MWM requires. As you might suspect, the probability distribution of the active
independent sets may take a long time to converge. However, in practice, this convergence appears
to be fast enough for the mechanism to have good properties.
When the nodes must relay the packets across multiple hops, a good algorithm is to choose
an independent set such that the sum of the differences of backlogs between the transmitters and
the receivers is maximized. Again, this problem is NP-hard and a randomized algorithm is an
approximation with good properties. In this algorithm, the nodes pick a random waiting time whose
mean decreases with the back-pressure of the packet being transmitted. Here, the back-pressure of a
packet is the difference in queue lengths between the transmitter and the receiver, multiplied by the
link rate. We should call this protocol B-CSMA, but we still call it A-CSMA to avoid multiplying
terminology.
When we say that the randomized algorithms have good properties, we mean more than they
are good heuristics that work well in simulations. We mean that they are in fact throughput-optimal
or utility-maximizing. That is, these algorithms maximize the rates of flows through the network, in
a sense that we make precise later. One may wonder how simple distributed randomized algorithms
can have the same throughput optimality as an NP-hard algorithm such as MWM. The reason is
that achieving long-term properties of throughput does not require making the best decision at
each instant. It only requires making good decisions on average. Accordingly, an algorithm that
continuously improves the random selection of the independent set can take a long time to converge
without affecting the long-term throughput. The important practical questions concern the ability
of these algorithms to adapt to changing conditions and also the delays that packets incur with an
algorithm that is only good on average. As we explain, the theory provides some answers to these
questions in the form of upper bounds on the average delays.
Processing networks are models of communication, manufacturing, or service networks. For
instance, a processing network can model a multicast network, a car assembly plant, or a hospital. In
a processing network, tasks use parts and resources to produce new parts that may be used by other
tasks. In a car assembly plant, a rim and a tire are assembled into a wheel; four wheels and a chassis
are put together, and so on. The tasks may share workers and machine tools or robots. In a hospital,
a doctor and nurses examine a patient that may then be dispatched to a surgical theater where other
nurses and doctors are engaged in the surgery, and so on.
The scheduling problem in a processing network is to decide which tasks should be performed
at any one time. The goal may be to maximize the rate of production of some parts, such as completed
cars, minus the cost of producing these parts. Such a problem is again typically NP-hard since it
is more general than the allocation of radio channels in a wireless network. We explain scheduling
algorithms with provable optimality properties.
The book is organized as follows. Chapter 2 provides an illustration of the main results on
simple examples. Chapter 3 explains the scheduling in wireless networks. Chapter 4 studies the
combined admission control, routing, and scheduling problem for network utility maximization.
Chapter 5 studies collisions in wireless networks. Chapter 6 is devoted to processing networks.
Appendix A explains the main ideas of the stochastic approximations that we use.
CHAPTER 2
Overview
This chapter explains the main ideas of this book on a few simple examples. In Section 2.1, we consider
the scheduling of three wireless nodes and review the maximum weighted matching (MWM) and
the A-CSMA scheduling. Section 2.2 explains how to combine admission control with scheduling.
Section 2.3 discusses the randomized backpressure algorithms. Section 2.4 reviews the Lagrangian
method to solve convex optimization problems. We conclude the chapter with a summary of the
main observations.
2.1 A SMALL WIRELESS NETWORK
Figure 2.1: A network with three links (left) and its conflict relationships (right).
Figure 2.1 shows a network with three links 1, 2, and 3 where each link is a pair of radio transmitter and receiver. Packets arrive at the links (or
more specifically, the transmitters of the links) with the rates indicated in the right figure. A simple
situation is one where at each time t = 0, 1, 2, . . . a random number of packets with mean λi and
a finite variance arrive at link i, independently of the other times and of the arrivals at other links
and with the same distribution at each time. Thus, the arrivals are i.i.d. (independent and identically
distributed) at each link, and they are independent across links. Say that the packet transmissions
take exactly one time unit.
The links 1 and 2 conflict: if their transmitters transmit together, the signals interfere and
the receivers cannot recover the packets. The situation is the same for links 2 and 3. Links 1 and
3, however, are far enough apart not to interfere with one another. If they both transmit at the
same time, their receivers can get the packets correctly. The figure on the right side has omitted the
receivers (such that the three circles there correspond to the three links in the left figure), and it
represents the above conflict relationships by a solid line between links 1 and 2 and another between
links 2 and 3. (In Section 2.1 and 2.2, since only one-hop flows are considered, we omit the receivers
and use the terms node and link interchangeably.)
Thus, at any given time, if all the nodes have packets to transmit, the sets of nodes that can
transmit together without conflicting are ∅, {1}, {2}, {3}, and {1, 3}, where ∅ designates the empty
set.These sets are called the independent sets of the network. An independent set is said to be maximal
if one cannot add another node to it and get another independent set. Thus, {2} and {1, 3} are the
maximal independent sets.
Condition (2.1) means that, outside of a finite set A of states, the function V tends to decrease.
Since the function is nonnegative, it cannot decrease all the time. Consequently, X(n) must spend
a positive fraction of time inside A. By (a), this implies that the Markov chain is positive recurrent
since A is finite.
(a) If there is a schedule such that X(n) is positive recurrent, then

λ1 + λ2 ≤ 1 and λ2 + λ3 ≤ 1. (2.2)

(b) Moreover, if
λ1 + λ2 < 1 and λ2 + λ3 < 1, (2.3)
then there is a schedule such that X(n) is positive recurrent, where X(n) = (X1 (n), X2 (n), X3 (n)) denotes
the vector of queue lengths at time n.
We say that the arrival rates are feasible if they satisfy (2.2) and that they are strictly feasible if
they satisfy (2.3).
Proof:
(a) Assume that λ1 + λ2 > 1. At any given time, at most one of the two nodes 1 and 2 can
transmit. Consequently, the rate at which transmissions remove packets from the two nodes {1, 2} is
at most 1. Thus, packets arrive faster at the nodes {1, 2} than they leave. Consequently, the number
of packets in these nodes must grow without bound.
To be a bit more precise, let Qn be the total number of packets in the nodes {1, 2} at time
n ∈ {0, 1, 2, . . . }. Note that
Qn ≥ An − n
where An is the number of arrivals in the nodes {1, 2} up to time n. Indeed, at most n packets have
left between time 0 and time n − 1. Also, by the strong law of large numbers, An /n → λ1 + λ2
almost surely as n → ∞. Thus, dividing the above inequality by n, we find that

lim inf_{n→∞} Qn/n ≥ λ1 + λ2 − 1 > 0.
This implies that Qn → ∞ almost surely as n → ∞. Thus, no schedule can prevent the backlog in
the network from growing without bound if λ1 + λ2 > 1, and similarly if λ2 + λ3 > 1.
(b) Assume that (2.3) holds. Then there is some p ∈ [0, 1] such that

λ1 < p, λ2 < 1 − p, and λ3 < p. (2.4)

Consider the randomized schedule that, at each time, serves the set {1, 3} with probability p and the set {2} with probability 1 − p. We claim that

V(X(n)) = (1/2)[X1^2(n) + X2^2(n) + X3^2(n)]
is a Lyapunov function. To check the property (2.1), recall that Ai(n) is the number of arrivals in queue
i at time n. Let also Si(n) take the value 1 if queue i is served at time n, and the value 0 otherwise.
Then Xi(n + 1) = Xi(n) − Zi(n) + Ai(n) where Zi(n) := Si(n)1{Xi(n) > 0}. Note that Xi(n) is
a non-negative integer (since both Ai(n) and Si(n) are integers). Therefore,

Xi^2(n + 1) − Xi^2(n) ≤ 2Xi(n)[Ai(n) − Si(n)] + Ai^2(n) + Si^2(n). (2.5)

Hence,

(1/2) E[Xi^2(n + 1) − Xi^2(n) | X(n)] ≤ βi + (λi − pi)Xi(n),

where βi = E[Ai^2(n) + Si^2(n)]/2 and pi = E[Si(n)|X(n)], so that p1 = p3 = p, p2 = 1 − p.
Consequently, summing these inequalities for i = 1, 2, 3, one finds that

E[V(X(n + 1)) − V(X(n)) | X(n)] ≤ β + Σ_{i=1}^{3} (λi − pi)Xi(n)
with β = β1 + β2 + β3. Now, λi − pi < −γ < 0 for i = 1, 2, 3, for some γ > 0 because the arrival
rates are strictly feasible. This expression is less than −ε if β − γ(X1(n) + X2(n) + X3(n)) < −ε,
which occurs if

X1(n) + X2(n) + X3(n) > (β + ε)/γ,

and this is the case when X(n) is outside the finite set defined by the opposite inequality. Therefore,
X(n) is positive recurrent by Theorem 2.2 (b).
□
The theorem does not clarify what happens when λ1 + λ2 = 1 or λ2 + λ3 = 1. The answer
is a bit tricky. To understand the situation, assume λ1 = 1, λ2 = λ3 = 0. In this case, one may serve
node 1 all the time. Does queue 1 grow without bound? Not if the arrivals are deterministic: if
exactly one packet arrives at each time at node 1, then the queue does not grow. However, if the
arrivals are random with mean 1, then the queue is not bounded. For instance, if two packets arrive
with probability 0.5 and no packet arrives with probability 0.5, then the queue length is not positive
recurrent. This means that the queue spends a zero fraction of time below any fixed level and its
mean value goes to infinity.
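The growth argument in part (a) of the proof is easy to see in simulation. The sketch below (ours; the rates are illustrative) feeds nodes 1 and 2 Bernoulli arrivals with λ1 + λ2 = 1.3 > 1 and serves at most one of the two conflicting queues per slot; the total backlog grows roughly like 0.3n, as the strong law of large numbers predicts.

```python
import random

random.seed(0)
lam1, lam2 = 0.7, 0.6          # infeasible: lam1 + lam2 > 1
x1 = x2 = 0                    # queue lengths at nodes 1 and 2
arrivals = 0
n = 10_000
for _ in range(n):
    a1 = 1 if random.random() < lam1 else 0
    a2 = 1 if random.random() < lam2 else 0
    arrivals += a1 + a2
    x1 += a1
    x2 += a2
    # Serve the longer of the two conflicting queues (at most one per slot).
    if x1 >= x2 and x1 > 0:
        x1 -= 1
    elif x2 > 0:
        x2 -= 1

print(arrivals / n)   # close to lam1 + lam2 = 1.3 (strong law of large numbers)
print(x1 + x2)        # backlog near 0.3 * n: no schedule can keep it bounded
```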
2.1.2 MAXIMUM WEIGHTED MATCHING
The following result, due to Tassiulas and Ephremides (66), states that MWM makes the queue lengths positive recurrent whenever the arrival rates are strictly feasible.
Proof:
Let Xi(n) be the queue length in node
i at time n (i = 1, 2, 3; n = 0, 1, 2, . . .). Let also X(n) = (X1 (n), X2 (n), X3 (n)) be the vector of
queue lengths. Define
V(X(n)) = (1/2)[X1^2(n) + X2^2(n) + X3^2(n)]

as half the sum of the squares of the queue lengths.
The claim is that, under MWM, V (X(n)) is a Lyapunov function for the Markov chain X(n).
Proceeding as in the proof of the previous theorem and with the same notation, one finds (2.5).
Taking expectation given X(n) and noting that Si (n) is now a function of X(n) determined by the
MWM algorithm, one finds
E[V(X(n + 1)) − V(X(n)) | X(n)] ≤ β + Σ_{i=1}^{3} (λi − Si(n))Xi(n).
To prove (2.1), it now suffices to show that this expression is less than −ε for X(n) outside a finite
set.
To do this, note that MWM chooses the value of {Si(n), i = 1, 2, 3} that maximizes
Σi Si(n)Xi(n). The maximum value must then be larger than pX1(n) + (1 − p)X2(n) + pX3(n),
where p is the probability defined before such that (2.4) holds. Indeed, the maximum is either
X1 (n) + X3 (n) or X2 (n), and this maximum is larger than any convex combination of these two
values. Hence,
E[V (X(n + 1)) − V (X(n))|X(n)]
≤ β + (λ1 − p)X1 (n) + (λ2 − (1 − p))X2 (n) + (λ3 − p)X3 (n).
In the proof of the previous theorem, we showed that the right-hand side is less than −ε when X(n)
is outside of a finite set.
□
You will note that the crux of the argument is that MWM makes the sum of the squares of
the queue lengths decrease faster than any randomized schedule does, and that a randomized schedule
exists that makes that sum decrease fast enough when the arrival rates are strictly feasible.
2.1.3 CSMA
Although the MWM algorithm makes the queues positive recurrent when the arrival rates are
strictly feasible, this algorithm is not implementable in a large network for two reasons. First, to
decide whether it can transmit, a node must know if it belongs to the independent set with the
maximum weight. To determine that independent set, some node must know the queue lengths.
Getting that information requires a substantial amount of control messages. Second, identifying the
maximum weight independent set, even when knowing all the queue lengths, is a computationally
hard problem. Indeed, the number of independent sets in a large network is enormous and comparing
their weights requires an excessive number of computations.
In this section, we describe a different approach based on a Carrier Sense Multiple Access
(CSMA) protocol. When using this protocol, node i waits a random amount of time that is expo-
nentially distributed with rate Ri , i.e., with mean 1/Ri . The waiting times of the different nodes
are independent. At the end of its waiting time, a node listens to the radio channel. If it hears some
transmission, then it calculates a new waiting time and repeats the procedure. Otherwise, it transmits
a packet. For simplicity, we assume for now that the carrier sensing is perfect. That is, if one node i
starts transmitting at time t and a conflicting node j listens to the channel at time t + ε, then we
assume that node j hears the transmission of node i, for any arbitrarily small ε > 0. Therefore, there
is no collision because under the above assumption a collision can only occur when two conflicting
nodes start transmitting at exactly the same time, which has probability 0 with the exponentially
distributed backoff times. In practice, this assumption is not valid: it takes some time for a node to
sense the transmission of another node. Moreover, we assume that there is no hidden node. This
means that if node i does not hear any conflicting transmission and starts sending a packet to its
intended receiver k, then there is no other node j that is transmitting and can be heard by k and
not by i. This is another approximation. In Chapter 5, we explain how to analyze the network with
collisions.
It turns out that this protocol is easier to analyze in continuous time than in discrete time.
For ease of analysis, we also assume that the packet transmission times are all independent and
exponentially distributed with mean 1. The arrival processes at the three nodes are independent with
rate λi .
Let us pretend that the nodes always have packets to transmit and that, when they run out,
they construct a dummy packet whose transmission time is distributed as that of a real packet. With
these assumptions, the set St of nodes that transmit at time t is modeled by a continuous-time
Markov chain that has the state transition diagram shown in Figure 2.2.
For instance, a transition from ∅ to {1} occurs when node 1 starts to transmit, which happens
with rate R1 when the waiting time of that node expires. Similarly, a transition from {1} to ∅ occurs
when the transmission of node 1 terminates, which happens with rate 1. Note that a transition from
{2} to {1, 2} cannot happen because node 1 senses that node 2 is already transmitting when its
waiting time expires. The other transitions can be explained in a similar way. We call this Markov
chain the CSMA Markov chain because it models the behavior of the CSMA protocol.
Figure 2.2: The state transition diagram of the CSMA Markov chain. The states are the independent sets ∅, {1}, {2}, {3}, and {1, 3}; a transition that adds node i occurs with rate Ri when that node starts transmitting, and a transition that removes node i occurs with rate 1 when its transmission ends.
The invariant distribution of this Markov chain is

π(∅) = K and π(S) = K ∏_{i∈S} Ri for S ∈ {{1}, {2}, {3}, {1, 3}}, (2.6)

where K is such that the probabilities of the independent sets add up to one.
Proof:
Recall that a continuous-time Markov chain with rate matrix Q has invariant distribution π
and is time-reversible if and only if

π(i)q(i, j) = π(j)q(j, i) for all i ≠ j.
A stochastic process is time-reversible if it has the same statistical properties when reversed in time.
The conditions above, called detailed balance equations, mean that when the Markov chain is stationary,
the rate of transitions from i to j is the same as the rate of transitions from j to i. If that were not
the case, one could distinguish between forward time and reverse time and the Markov chain would
not be time-reversible. Note also that by summing these identities over i, one finds that

Σ_i π(i)q(i, j) = π(j) Σ_i q(j, i) = 0
where the last equality follows from the fact that the rows of a rate matrix sum to zero. Thus, π Q = 0
and π is therefore the stationary distribution of the Markov chain. See (38) for a discussion of this
method and its applications.
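The detailed balance argument can also be sanity-checked numerically. The sketch below (ours; the rates R1, R2, R3 are arbitrary) builds the transition rates of the CSMA Markov chain, forms π from the product formula (2.6), and verifies both detailed balance and πQ = 0.

```python
from math import prod

R = {1: 2.0, 2: 3.0, 3: 0.5}                       # illustrative CSMA rates
states = [frozenset(), frozenset({1}), frozenset({2}),
          frozenset({3}), frozenset({1, 3})]

def rate(s, t):
    """Transition rate of the CSMA chain from independent set s to t."""
    if len(t) == len(s) + 1 and s < t:
        (i,) = t - s                               # node i starts transmitting
        return R[i]
    if len(t) == len(s) - 1 and t < s:
        return 1.0                                 # a transmission ends
    return 0.0

# Product form (2.6): pi(S) proportional to the product of R_i over i in S.
w = {s: prod(R[i] for i in s) for s in states}     # empty product = 1
K = 1.0 / sum(w.values())
pi = {s: K * w[s] for s in states}

# Detailed balance: pi(s) q(s, t) == pi(t) q(t, s) for every pair of states.
for s in states:
    for t in states:
        assert abs(pi[s] * rate(s, t) - pi[t] * rate(t, s)) < 1e-12

# Global balance (pi Q = 0): flow into each state equals flow out of it.
for t in states:
    net = sum(pi[s] * rate(s, t) for s in states if s != t) \
          - pi[t] * sum(rate(t, s) for s in states if s != t)
    assert abs(net) < 1e-12
print("detailed balance and pi Q = 0 verified")
```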
For the CSMA Markov chain, it is immediate to verify the detailed balance equations. For
instance, let i = {1} and j = {1, 3}. Then q(i, j) = R3 and q(j, i) = 1, so that one has

π(i)q(i, j) = K R1 R3 = π(j)q(j, i).
This expression shows that, when using the CSMA protocol, the independent set with the largest
weight transmits the largest fraction of time. Thus, the CSMA protocol automatically approximates
the solution of the hard problem of finding the independent set with the maximum weight. However,
it may take a long time for the Markov chain distribution to approach its stationary distribution. In
the meantime, the queue lengths change. Thus, it is not quite clear that this scheme should make
the queues positive recurrent. We prove that this is indeed the case in Chapter 3.
2.1.4 ENTROPY MAXIMIZATION
Consider the following optimization problem:

Maximize H(π) := −Σ_S π(S) log π(S)
Subject to sj(π) := Σ_{S: j∈S} π(S) ≥ λj, j = 1, 2, 3, and Σ_S π(S) = 1. (2.7)
In these expressions, the sums are over all independent sets S and H (π ) is the entropy of the
distribution π (see (65)). Also, sj (π ) is the service rate of link j under the distribution π since it is
the sum of the probabilities that link j is served.
To solve the problem (2.7), we associate a Lagrange multiplier with each inequality constraint
and with the equality constraint. (See Section 2.4 for a review of that method.) That is, we form the
Lagrangian
L(π, r) = −Σ_S π(S) log π(S) − Σ_j rj (λj − Σ_{S: j∈S} π(S)) − r0 (1 − Σ_S π(S)). (2.8)
We know that if the rates are strictly feasible, then there is a distribution π that satisfies the
constraints of the problem (2.7). Consequently, to solve (2.7), we can proceed as follows. We find
π that maximizes L(π, r) while r ≥ 0 minimizes that function. More precisely, we look for a saddle
point (π ∗ , r ∗ ) of L(π, r), such that π ∗ maximizes L(π, r ∗ ) over π , and r ∗ minimizes L(π ∗ , r) over
r ≥ 0. Then π ∗ is an optimal solution of (2.7). (In Section 2.4, we give an example to illustrate this
generic method for solving constrained convex optimization problems.)
To maximize L(π, r) over π , we express that the partial derivative of L with respect to π(S0 )
is equal to zero, for every independent set S0 . From (2.8), we find
∂L(π, r)/∂π(S0) = −1 − log(π(S0)) + Σ_{j∈S0} rj + r0 = 0.

Hence, π(S0) = exp{−1 + r0 + Σ_{j∈S0} rj}. That is,

π(S) = C exp{Σ_{j∈S} rj}, (2.9)

where C is such that the probabilities add up to one. Also, the partial derivative of L with respect to rj is

∂L(π, r)/∂rj = −(λj − Σ_{S: j∈S} π(S)) = −(λj − sj(π)).
Consequently, to minimize L(π, r), the gradient algorithm should update rj in the direction opposite
to this derivative, according to some small step size. Also, we know that rj ≥ 0, so that the gradient
algorithm should project the update into [0, ∞). That is, the gradient algorithm updates rj as follows:

rj(n + 1) = {rj(n) + α(n)[λj − sj(π(n))]}+. (2.10)

Here, for any real number x, one defines {x}+ := max{x, 0}, which is the value in [0, ∞) that is the
closest to x. Also, n corresponds to the n-th step of the algorithm. At that step, the parameters r(n)
are used, and they correspond to the invariant distribution π(n) given by (2.9) with those parameters.
In this expression, α(n) is the step size of the algorithm.
This update rule has the remarkable property that link j should update its parameter rj
(which corresponds to Rj = exp{rj } in the CSMA protocol) based only on the difference between
the average arrival and service rates at that link. Thus, the update does not depend explicitly on what
the other links are doing. The service rate at link j certainly depends in a complicated way on the
parameters of the other links. However, the average arrival and service rates at link j are the only
information that link j requires to update its parameter.
Note also that if the average arrival rate λj at link j is larger than the average service rate
sj of that link, then that link should increase its parameter rj , thus becoming more aggressive in
attempting to transmit, and conversely.
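The update (2.10) can be prototyped exactly as derived, using the exact service rates sj(π(n)) computed from (2.9) rather than noisy observations. A sketch (ours; the arrival rates, step size, and iteration count are illustrative):

```python
from math import exp

# Independent sets of the three-link network and strictly feasible rates.
sets = [frozenset(), frozenset({1}), frozenset({2}),
        frozenset({3}), frozenset({1, 3})]
lam = {1: 0.25, 2: 0.4, 3: 0.25}     # lam1 + lam2 < 1 and lam2 + lam3 < 1

def service_rates(r):
    """Service rates s_j(pi) under the product-form distribution (2.9)."""
    w = {S: exp(sum(r[j] for j in S)) for S in sets}
    Z = sum(w.values())
    return {j: sum(w[S] for S in sets if j in S) / Z for j in (1, 2, 3)}

r = {1: 0.0, 2: 0.0, 3: 0.0}
alpha = 0.1                          # fixed step size
for _ in range(5000):
    s = service_rates(r)
    # Projected gradient step (2.10): raise r_j when link j is underserved.
    r = {j: max(0.0, r[j] + alpha * (lam[j] - s[j])) for j in (1, 2, 3)}

s = service_rates(r)
print({j: round(s[j], 3) for j in (1, 2, 3)})   # each s_j ends up at least lam_j
```

Note that only the local quantity λj − sj drives each update, mirroring the distributed character of the algorithm in the text.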
Unfortunately, the link observes actual arrivals and transmissions, not their average rates. In
other words, link j observes a “noisy” version of the gradient λj − sj (π(n)) that it needs to adjust
its parameter rj . This noise in the estimate of the gradient is handled by choosing the step sizes
α(n) and also the update intervals carefully.
Let us ignore this difficulty for now to have a rough sense of how the link should update its
parameter. That is, let us pretend that link j updates its parameter rj every second, that the total
number of arrivals Aj (n) at the link during that second is exactly λj , and that the total number of
transmissions Dj (n) by the link is exactly equal to the average value sj . To simplify the situation even
further, let us choose a fixed step size in the algorithm, so that α(n) = α. With these assumptions,
the gradient algorithm (2.10) is

rj(n + 1) = {rj(n) + α[Aj(n) − Dj(n)]}+.

Now observe that the queue length Xj(n) at time n satisfies a very similar relation. Indeed, one has

Xj(n + 1) = {Xj(n) + Aj(n) − Dj(n)}+.

Thus, rj(n) evolves exactly like αXj(n), which suggests choosing rj = αXj, i.e.,

Rj = exp{αXj}, j = 1, 2, 3. (2.11)
In other words, node j should select a waiting time that is exponentially distributed with rate
exp{αXj }. This algorithm is fully distributed and is very easy to implement.
However, the actual algorithms we will propose, although still simple and distributed, are
a little different from (2.11). This is because we derived the algorithm (2.11) by making a key
approximation, that the arrivals and transmissions follow exactly their average rate. To be correct, we
have to adjust r slow enough so that the CSMA Markov chain approaches its invariant distribution
before r changes significantly. There are at least two ways to achieve this.
One way is to modify algorithm (2.11) by using a small enough constant step size α, a large
enough constant update interval T , and imposing an upper bound on r so that the mixing time of
the CSMA Markov chain is bounded. Specifically, node i uses a waiting time that is exponentially
distributed with rate Ri. Every T seconds, all Ri's are updated as a function of the queue lengths,
with the exponents ri kept within a bounded range determined by constants rmax, ε > 0; the Xi(n)'s
are the queue lengths at the time of the n-th update.
Then, we explain in Section 3.4.3 that this algorithm is almost throughput-optimal. (That
is, it can stabilize the queues if the vector of arrival rates is in a region parametrized by rmax . The
region is slightly smaller than the maximal region.)
Another way is to use diminishing step sizes and increasing update intervals so that eventually
the arrival rates and service rates get close to their average values between two updates. This is a time-
varying algorithm since the step sizes and update intervals change with time. Detailed discussions
are provided in Section 3.4.1 and 3.4.2.
Recapping, the main point of this discussion is that solving the problem (2.7) shows that
there are parameters Rj of the CSMA protocol that serve the links fast enough. Moreover, these
parameters are roughly exponential in the queue lengths. Finally, with a suitable choice of the step
sizes and of the update intervals, one can make the algorithm support the arrival rates.
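To make the product form concrete, the sketch below computes π(S) ∝ Π_{j∈S} R_j with R_j = exp{αX_j} exactly for a small conflict graph. The three-link chain and the queue lengths used here are illustrative assumptions, not an example from the text:

```python
import itertools
import math

def independent_sets(num_links, conflicts):
    """Enumerate all subsets of links that contain no conflicting pair."""
    return [s for s in itertools.product([0, 1], repeat=num_links)
            if all(not (s[a] and s[b]) for a, b in conflicts)]

def csma_stationary(queue_lengths, conflicts, alpha=1.0):
    """Product-form distribution pi(S) proportional to exp(alpha * sum_{j in S} X_j)."""
    iss = independent_sets(len(queue_lengths), conflicts)
    weights = [math.exp(alpha * sum(x * q for x, q in zip(s, queue_lengths)))
               for s in iss]
    Z = sum(weights)  # normalizing constant
    return {s: w / Z for s, w in zip(iss, weights)}

# Three-link chain: link 2 conflicts with links 1 and 3 (0-indexed below).
pi = csma_stationary([5.0, 1.0, 5.0], conflicts=[(0, 1), (1, 2)])
best = max(pi, key=pi.get)
print(best)  # the maximum-weight independent set {1, 3}, i.e., (1, 0, 1)
```

With α = 1 and queue lengths (5, 1, 5), the set {1, 3} receives more than 98% of the stationary probability, which is the sense in which independent sets with large weight are favored.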
2.1.5 DISCUSSION
Before moving on to the next topic, it may be useful to comment on the key ideas of the current
section.
The first point that we want to address is the two different justifications we gave for why
Rj = exp{αXj } are suitable parameters. Recall that the first justification was that if the queue
lengths do not change much while the Markov chain approaches its stationary distribution, then
choosing these values leads to a product form π(S) = C exp{α Σ_{j∈S} X_j} that favors independent
sets with a large weight. Thus, in some sense, this choice is an approximation of MWM, which we
know is stable. One flaw in this argument is that the approximation of MWM is better if α is large.
However, in that case, the parameters Rj change very fast as the queue lengths change. This is not
consistent with the assumption that the queue lengths do not change much while the Markov chain
approaches its stationary distribution. The second justification is that it corresponds to a gradient
algorithm with a fixed step size. For this algorithm to be good, the step size has to be fairly small.
However, in that case, we know that the algorithm takes a long time to converge. Thus, we find the
usual tradeoff between speed of convergence and accuracy of the limit.
The second point is related to the first and concerns the convergence time of the algorithm.
The number of states in the Markov chain is the number of independent sets. This number grows
exponentially with the number of links. Thus, one should expect the algorithm to converge slowly
and to result in very poor performance in any practical system. In practice, this is not the case. In fact,
the algorithm appears to perform well. The reason may have to do with the locality of the conflicts
so that good decisions may depend mostly on a local neighborhood and not on a very large number
of links.
The approach is to combine the A-CSMA protocol as before with admission control. As we
explain below, the network controls the arrivals as follows. When the backlog of node i is Xi , the
arrival rate λi is chosen to maximize ui (λi ) − γ Xi λi where γ is some positive constant. Note that
the choice of λi depends only on the backlog in link i, so that the algorithm is local.
Thus, the arrival rates decrease when the backlogs in the nodes increase. This is a form of
congestion control. Since the mechanism maximizes the sum of the user utilities, it implements a
fair congestion control combined with the appropriate scheduling. In the networking terminology,
one would say that this mechanism combines the transport and the MAC layer. This mechanism is
illustrated in Figure 2.3.
Figure 2.3: Combined admission control and scheduling. Note that the node decisions are based on
local information.
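As a concrete instance of this rule, take u_i(λ_i) = log λ_i (an assumption; the text leaves u_i general). Maximizing u_i(λ_i) − γ X_i λ_i then gives the closed form λ_i = 1/(γ X_i), capped at a maximum admissible rate, so the accepted rate indeed decreases with the backlog:

```python
def admitted_rate(backlog, gamma=0.01, lam_max=1.0):
    """argmax over (0, lam_max] of log(lam) - gamma * backlog * lam:
    setting the derivative 1/lam - gamma*backlog to zero gives lam = 1/(gamma*backlog)."""
    if backlog <= 0:
        return lam_max  # empty queue: admit at the maximum rate
    return min(lam_max, 1.0 / (gamma * backlog))

rates = [admitted_rate(x) for x in (0, 50, 200, 1000)]
print([round(r, 3) for r in rates])  # [1.0, 1.0, 0.5, 0.1], decreasing in the backlog
```

The constant γ and the cap lam_max are hypothetical parameters chosen for illustration.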
The main idea in deriving this combined admission control and scheduling algorithm is to
replace problem (2.13) with the following one:
Maximize H(π) + β Σ_i u_i(λ_i)
s.t. s_j(π) ≥ λ_j, ∀j. (2.14)
In this problem, β is some positive constant. If this constant is large, then a solution of (2.14)
approximates the solution of (2.13). Indeed, H (π ) is bounded and has a negligible effect on the
objective function of (2.14) when β is large.
The Lagrangian for problem (2.14) (see Section 2.4) is
L(π, λ, r) = H(π) + β Σ_i u_i(λ_i) + Σ_j r_j [ s_j(π) − λ_j ] − r_0 [ 1 − Σ_S π(S) ].
As before, the maximization over π results in the CSMA protocol with rates Rj = exp{rj }. Also,
the minimization over r using a gradient algorithm is as before, which yields r_j ≈ αX_j. The maximization over λ amounts to choosing each λ_j to maximize β u_j(λ_j) − r_j λ_j, i.e., to maximize u_j(λ_j) − γ X_j λ_j with γ = αβ^{−1}. This analysis justifies the admission control algorithm we described earlier.
links instead of nodes.) One obvious conflict is that a link can send only one packet at a time, so that
it must choose whether to send a packet of flow 1 or one of flow 2 when it has the choice. We assume
also that the transmissions have different success probabilities and possibly different physical-layer
transmission rates. For instance, when link e transmits packets, these packets reach the next node
with average rate r(e).
The goal is to design the admission control, the routing, and the scheduling to maximize the
utility
u1 (λ1 ) + u2 (λ2 )
of the two flows of packets.
We explain in Chapter 4 that the following algorithm, again called A-CSMA combined with
admission control and routing, essentially solves the problem:
1) Queuing: Each node maintains one queue per flow of packets that goes through it.
2) Admission Control: λ1 is selected to maximize u1 (λ1 ) − γ X1 λ1 where γ is some constant and X1
is the backlog of packets of flow 1 in the ingress node for these packets. Similarly, λ2 maximizes
u2 (λ2 ) − γ X2 λ2 where X2 is the backlog of packets of flow 2 in the ingress node for these packets.
3) Priority: Each link selects which packet to send as follows. Link d chooses to serve flow 1 since
(8 − 4)r(d) > (3 − 5)r(d) and (8 − 4)r(d) > 0. Here, (8 − 4) is the difference between the backlogs
of packets of flow 1 in nodes t(d) and w(d), and (3 − 5) is the difference between the backlogs of
packets of flow 2 in nodes t(d) and w(d). That is, the link chooses the flow with the maximum
backpressure if it is positive. If the maximal backpressure is non-positive, then the link does not
serve any flow. The backpressure of a flow on a link is defined as the rate of the link multiplied by the
difference in the backlogs of that flow between the transmitter and receiver of the link. (One could
think of the backlog as a potential, the rate of the link as its conductance, and the backpressure as
the current across the link when it is activated; the link chooses the flow with the largest current.)
4) Scheduling and routing: The links use the CSMA protocol. Each link has an independent backoff
timer (which is maintained by the transmitter of the link). The rate of the exponentially distributed
backoff delay of a link is exponential in the positive part of backpressure of the flow it has selected.
Since link d selects flow 1 and its backpressure is γ := (8 − 4)r(d), the backoff delay of that link
is then exponentially distributed with rate exp{αγ } where α is a constant. Note that link b and g,
both serving flow 2, have independent backoff timers. The link with the larger backpressure has a
smaller mean backoff delay. This is a (randomized) routing decision. The intuition is that the packets
should be sent where they flow better.
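Steps 3) and 4) above can be sketched as follows; the backlogs 8, 4, 3, 5 reproduce the example, while the unit link rate and α = 1 are assumptions:

```python
import math

def select_flow(link_rate, backlogs_tx, backlogs_rx, alpha=1.0):
    """Pick the flow with maximal backpressure b(f) = rate * (X_tx - X_rx);
    return it together with the exponential backoff rate exp(alpha * [B]+)."""
    bp = [link_rate * (t - r) for t, r in zip(backlogs_tx, backlogs_rx)]
    B = max(bp)
    flow = bp.index(B) if B > 0 else None  # serve no flow if B <= 0
    return flow, math.exp(alpha * max(B, 0.0))

# Link d of the example: flow 1 has backlogs 8 at t(d) and 4 at w(d),
# flow 2 has backlogs 3 and 5; the link rate r(d) is taken to be 1.
flow, backoff_rate = select_flow(1.0, [8, 3], [4, 5])
print(flow)  # 0, i.e., flow 1 of the text, with backoff rate exp(4)
```

When the maximal backpressure is non-positive, the function returns no flow and a unit backoff rate, matching the rule that the link then serves nothing.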
We justify this algorithm when all the links have the same unit rate, to simplify the notation.
Packets of flow f arrive into the network with rate λf , for f = 1, . . . , F . The utility of that flow is
uf (λf ). Each node i maintains a separate queue for each flow f of packets that go through it. The
backlog in that queue is X_{i,f}. Let s_{j,f} be the service rate of packets of flow f by link j. Let δ_f and Δ_f denote the source and destination nodes of flow f. Consider the following problem:

Maximize H(π) + β Σ_f u_f(λ_f)

Subject to Σ_{j: w(j)=i} s_{j,f} ≤ Σ_{j: t(j)=i} s_{j,f}, ∀f, ∀i ≠ δ_f, i ≠ Δ_f,
λ_f ≤ Σ_{j: t(j)=δ_f} s_{j,f}, ∀f,
Σ_f s_{j,f} ≤ s_j(π), ∀j.
In this problem, sj (π) is the average service rate of link j under the distribution π .
With dual variables r_{i,f}, one forms a partial Lagrangian:

L = H(π) + β Σ_f u_f(λ_f) − Σ_f Σ_{i ≠ δ_f, i ≠ Δ_f} r_{i,f} [ Σ_{j: w(j)=i} s_{j,f} − Σ_{j: t(j)=i} s_{j,f} ]
    − Σ_f r_{δ_f, f} [ λ_f − Σ_{j: t(j)=δ_f} s_{j,f} ] − r_0 [ 1 − Σ_S π(S) ].
We need to maximize L over π, s_{j,f}, λ_f subject to the constraint Σ_f s_{j,f} ≤ s_j(π), and
minimize L over r_{i,f} ≥ 0.
The minimization over {ri,f } with a gradient algorithm shows that ri,f ≈ αXi,f . For any pos-
itive constants {ri,f }, we maximize L over {sj,f } and π as follows. First fix π . For a given (j, f ), note
that the term s_{j,f} appears in L at most twice: once in the total departure rate of flow f from node t(j), and once in the total arrival rate of flow f to node w(j), if w(j) ≠ Δ_f. Accordingly, s_{j,f} appears in L with the factor b(j, f) := r_{t(j),f} − r_{w(j),f} ≈ α(X_{t(j),f} − X_{w(j),f}), with the convention that r_{Δ_f, f} = 0. Denote the maximal backpressure on link j as B(j) := max_f b(j, f). Then, subject to the constraint Σ_f s_{j,f} ≤ s_j(π) (where s_j(π) is fixed for the moment), the Lagrangian
is maximized by choosing sj,f = sj (π ) for an f satisfying b(j, f ) = B(j ) (i.e., choosing a flow
with the maximal backpressure) if B(j ) > 0, and choosing sj,f = 0, ∀f if B(j ) ≤ 0. Plugging the
solution of {s_{j,f}} back into L, we get

L = H(π) + Σ_f [ β u_f(λ_f) − r_{δ_f, f} λ_f ] + Σ_j [B(j)]^+ s_j(π) − r_0 [ 1 − Σ_S π(S) ].
Then, we maximize L over π. As in the last section, this gives the CSMA algorithm with
Rj = exp{[B(j )]+ }. Finally, the maximization of L over λf yields the same admission control
algorithm as before. By now, we have derived all components of the algorithm described earlier in
this section.
2.4 APPENDIX
In this section, we illustrate an important method to solve a constrained convex optimization problem
by finding the saddle point of the Lagrangian. Consider the following problem.
max_x  −x_1² − x_2²
s.t.   x_1 + x_2 ≥ 4,
       x_1 ≤ 6, x_2 ≤ 5. (2.15)
With dual variables μ ≥ 0, form a Lagrangian
L(x; μ) = −x_1² − x_2² + μ_1 (x_1 + x_2 − 4) + μ_2 (6 − x_1) + μ_3 (5 − x_2).
We aim to find the saddle point (x ∗ , μ∗ ) such that x ∗ maximizes L(x; μ∗ ) over x, and μ∗
minimizes L(x ∗ ; μ) over μ ≥ 0.
One can verify that x ∗ = (2, 2)T and μ∗ = (4, 0, 0)T satisfy the requirement. Indeed, we have
∂L(x; μ)/∂x1 = −2x1 + μ1 − μ2
∂L(x; μ)/∂x2 = −2x2 + μ1 − μ3
∂L(x; μ)/∂μ1 = x1 + x2 − 4
∂L(x; μ)/∂μ2 = 6 − x1
∂L(x; μ)/∂μ3 = 5 − x2 .
So given μ∗ , ∂L(x ∗ ; μ∗ )/∂x1 = 0 and ∂L(x ∗ ; μ∗ )/∂x2 = 0. Given x ∗ , ∂L(x ∗ ; μ∗ )/∂μ1 =
0, μ∗1 > 0; ∂L(x ∗ ; μ∗ )/∂μ2 > 0, μ∗2 = 0 and ∂L(x ∗ ; μ∗ )/∂μ3 > 0, μ∗3 = 0.
It is also easy to verify that x ∗ = (2, 2)T is indeed the optimal solution of (2.15).
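This verification can be done mechanically by evaluating the gradients of L at x* = (2, 2) and μ* = (4, 0, 0):

```python
def grad_L(x, mu):
    """Gradients of L(x; mu) = -x1^2 - x2^2 + mu1(x1+x2-4) + mu2(6-x1) + mu3(5-x2)."""
    x1, x2 = x
    m1, m2, m3 = mu
    dx = (-2 * x1 + m1 - m2, -2 * x2 + m1 - m3)
    dmu = (x1 + x2 - 4, 6 - x1, 5 - x2)
    return dx, dmu

dx, dmu = grad_L((2, 2), (4, 0, 0))
print(dx)   # (0, 0): stationarity of L in x at x*
print(dmu)  # (0, 4, 3): constraint 1 tight (mu1* > 0), 2 and 3 slack (mu2* = mu3* = 0)
```

The pattern in dmu is exactly complementary slackness: a positive multiplier pairs with a tight constraint, and a slack constraint forces its multiplier to zero.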
For an in-depth explanation of this Lagrangian method, please refer to (8).
2.5 SUMMARY
This chapter introduced the problem of scheduling links that interfere. We use a simplified model
of interference captured by a conflict graph: either two links conflict or they do not. Accordingly, at
any given time, only links in an independent set can transmit. The first problem is to decide which
independent set should transmit to keep up with arrivals.
We explained that the problem has a solution if the arrival rates are small enough (strictly
feasible). In that case, a simple randomized schedule makes the queue lengths positive recurrent.
The technique of proof was based on a Lyapunov function. However, this schedule requires knowing
the arrival rates.
MWM selects the independent set with maximum sum of backlogs. We proved it makes the
queues positive recurrent, again by using a Lyapunov function. Unfortunately, this algorithm is not
implementable in a large network.
We then described the A-CSMA protocol where the exponentially distributed waiting time of
a node has a rate exponential in its backlog. By exploring the CSMA Markov chain, we showed that
this protocol tends to select an independent set with a large sum of backlogs. We stated a theorem
that claims that this protocol makes the queues positive recurrent.
We then showed how to combine this protocol with admission control to maximize the sum
of the utilities of the flows of packets through the network. The network accepts packets at a rate
that decreases with the backlog in their ingress node.
Finally, we described a multi-hop network where nodes can decide which packet to send
and to which neighbor. We explained that each link selects the flow with the largest backpressure.
Moreover, the links use a CSMA protocol where the mean waiting times are exponentially decreasing
in that backpressure.
CHAPTER 3
Scheduling in Wireless
Networks
In this chapter, we consider the scheduling of wireless nodes, assuming perfect CSMA and no hidden
nodes, as we did in Chapter 2. The arrival rates are fixed and each packet reaches its intended receiver
in one hop. We model the interference between links by a conflict graph. The objective is to design
a distributed scheduling protocol to keep up with the arrivals.
In Section 3.1, we formulate the scheduling problem. Section 3.2 defines the CSMA algo-
rithm and studies the CSMA Markov chain with fixed parameters. In Section 3.3, we show that
there exist suitable parameters in the CSMA algorithm to support any vector of strictly feasible ar-
rival rates, and these parameters can be obtained by maximizing a concave function whose gradient
is the difference between the average arrival rates and the average service rates at the nodes. This
observation suggests an idealized algorithm to adjust the CSMA parameters. However, the nodes
observe the actual service rates and arrival rates, not their average values. Consequently, the proposed
algorithm, described in Section 3.4.1, is a stochastic approximation algorithm called Algorithm 1.
Different from Algorithm 1, Section 3.4.3 proposes another algorithm where the CSMA param-
eters are directly related to the queue lengths. Section 3.5 provides an alternative interpretation of
the algorithms. It shows that the suitable invariant distribution of the independent sets has the
maximal entropy consistent with the average service rates being at least equal to the arrival rates.
This maximum entropy distribution is precisely that of a CSMA Markov chain with the appro-
priate parameters. This interpretation is important because it enables us to generalize the algorithms
to solve utility maximization problems with admission control and routing, as we do in Chapter
4. Section 3.6 explains a variation of Algorithm 1, called Algorithm 1(b), to reduce delays in the
network. Section 3.7 provides simulation results that confirm the properties of Algorithms 1 and
1(b). Sections 3.8, 3.9 and 3.10 are devoted to the proof of the optimality of the proposed algorithm.
In Section 3.12, we explain how the result extends to the case when the packet transmission times
have general distributions that may depend on the link. Finally, Section 3.13 collects a few technical
proofs.
it is possible to serve the arriving traffic with some transmission schedule. Denote the set of feasible
λ by C¯.
(ii) λ is said to be strictly feasible iff it is in the set C which denotes the interior of C¯.
Recall that the interior of a set is the collection of points surrounded by a ball of points in
that set. That is, the interior of C̄ is defined as int C̄ := { λ ∈ C̄ | B(λ, d) ⊆ C̄ for some d > 0 }, where
B(λ, d) = { λ′ : ||λ′ − λ||_2 ≤ d }.
We show the following relationship in Section 3.13.1.
For example, the vector λ = (0.4, 0.6, 0.4) of arrival rates is feasible since λ = 0.4 ∗ (1, 0, 1) +
0.6 ∗ (0, 1, 0). However, it is not strictly feasible because the IS (0, 0, 0) has zero probability. On
the other hand, λ = (0.4, 0.5, 0.4) is strictly feasible.
Now we define what is a scheduling algorithm and when it is called “throughput-optimum”.
In the definition above, stabilizing the queues admits two definitions. When the network is
modeled by a time-homogeneous Markov process (e.g., if the algorithm uses a constant step size),
we define stability by the positive (Harris) recurrence1 of the Markov process. When the network
Markov process is not time-homogeneous (e.g., if the algorithm uses a decreasing step size), we
say that the queues are stable if their long-term departure rate is equal to their average arrival rate
(which is also called rate-stability).
For simplicity, assume that the packet sizes upon transmission can be different from the sizes
of the arrived packets (by re-packetizing the bits in the queue), in order to give the exponentially
distributed transmission times. We discuss how to relax the assumption on the transmission times
in Section 3.12 (which not only provides a more general result but can also make re-packetization
unnecessary).
Assuming that the sensing time is negligible, given the continuous distribution of the backoff
times, the probability for two conflicting links to start transmission at the same time is zero. So
collisions do not occur in idealized-CSMA.
1 Positive recurrence is defined for Markov processes with a countable state space. Positive Harris recurrence applies to Markov
processes with an uncountable state space and can be viewed as a natural extension of positive recurrence. However, the
precise definition of positive Harris recurrence is not given here since the concept is not used in this book. Interested readers
can refer to (29) for an exact definition and a proof that our CSMA algorithm with a constant step size ensures positive Harris
recurrence.
2 If more than one backlogged link shares the same transmitter, the transmitter maintains independent backoff timers for these
links.
It is not difficult to see that the transmission states form a continuous time Markov chain,
which is called the CSMA Markov chain. The state space of the Markov chain is X . Denote link k’s
neighboring set by N(k) := {m : (k, m) ∈ E}. If in state x^i ∈ X link k is not active (x_k^i = 0) and all of its conflicting links are not active (i.e., x_m^i = 0, ∀m ∈ N(k)), then state x^i transits to state x^i + e_k with rate R_k, where e_k is the K-dimensional vector whose k-th element is 1 and all other elements are 0. Similarly, state x^i + e_k transits to state x^i with rate 1. However, if in state x^i any link in the neighboring set N(k) is active, then state x^i + e_k does not exist (i.e., x^i + e_k ∉ X).
Let rk = log(Rk ). We call rk the transmission aggressiveness (TA) of link k. For a given positive
vector r = {rk , k = 1, . . . , K}, the CSMA Markov chain is irreducible. Designate the stationary
distribution of its feasible states x^i by p(x^i; r). We have the following result (see (5; 71; 45)):
Proof: As in the proof of Theorem 2.6, we verify that the distribution (3.1)–(3.2) satisfies the
detailed balance equations. Consider states x^i and x^i + e_k, where x_k^i = 0 and x_m^i = 0, ∀m ∈ N(k).
From (3.1), we have

p(x^i + e_k; r) / p(x^i; r) = exp(r_k) = R_k,

which is exactly the detailed balance equation between states x^i and x^i + e_k. Such relations hold for
any two states that differ in only one element, which are the only pairs that correspond to nonzero
transition rates. It follows that the distribution is invariant. □
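The detailed balance relation p(x^i; r) R_k = p(x^i + e_k; r) · 1 can also be checked numerically for a small conflict graph; the three-link chain and the rates below are assumed for illustration:

```python
import itertools

def detailed_balance_holds(R, conflicts, tol=1e-12):
    """Verify p(x) * R_k = p(x + e_k) for the product-form distribution
    p(x) proportional to prod_k R_k^{x_k} over the independent sets."""
    n = len(R)
    states = [s for s in itertools.product([0, 1], repeat=n)
              if all(not (s[a] and s[b]) for a, b in conflicts)]
    weight = {}
    for s in states:
        w = 1.0
        for k in range(n):
            if s[k]:
                w *= R[k]
        weight[s] = w
    Z = sum(weight.values())
    p = {s: w / Z for s, w in weight.items()}
    for s in states:
        for k in range(n):
            if s[k]:
                continue
            up = tuple(s[i] + (1 if i == k else 0) for i in range(n))
            if up in p and abs(p[s] * R[k] - p[up]) > tol:  # activation of link k
                return False
    return True

print(detailed_balance_holds([2.0, 5.0, 0.5], [(0, 1), (1, 2)]))  # True
```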
Note that the CSMA Markov chain is time-reversible since the detailed balance equations
hold. In fact, the Markov chain is a reversible “spatial process” and its stationary distribution (3.1)
is a Markov Random Field ((38), page 189; (17)). (This means that the state of every link k is
conditionally independent of all other links, given the transmission states of its conflicting links.)
Later, we also write p(x^i; r) as p_i(r) for simplicity; the two notations are interchangeable
throughout the chapter. Let p(r) ∈ R_+^N be the vector of all the p_i(r). It follows from Lemma 1
that s_k(r), the probability that link k transmits, is given by

s_k(r) = Σ_i [ x_k^i · p(x^i; r) ]. (3.3)
Without loss of generality, assume that each link k has a capacity of 1. That is, if link k
transmits data all the time (without contention from other links), then its service rate is 1 (unit of
data per unit time). Then, sk (r) is also the normalized throughput (or service rate) with respect to
the link capacity.
Even if the transmission time is not exponentially distributed but has a mean of 1, references (5;
45) show that the stationary distribution (3.1) still holds. That is, the stationary distribution is
insensitive to the distributions of the transmission time. For completeness, we present a simple proof
of that insensitivity as Theorem 3.22 in Section 3.12.
Proof. Let d ≥ 0 be a vector of dual variables associated with the constraints r ≥ 0 in problem (3.5),
then the Lagrangian is L(r; d) = F (r; λ) + dT r. At the optimal solution r ∗ , we have
∂L(r*; d*)/∂r_k = λ_k − Σ_j [ x_k^j exp( Σ_{k′=1}^{K} x_{k′}^j r*_{k′} ) ] / C(r*) + d_k*
              = λ_k − s_k(r*) + d_k* = 0, (3.6)

where s_k(r), according to (3.3), is the service rate (at the stationary distribution) given r. Since d_k* ≥ 0,
λ_k ≤ s_k(r*). □
Equivalently, problem (3.5) is the same as minimizing the Kullback–Leibler (KL) divergence
between the two distributions p̄ and p(r):

min_{r≥0} D_KL( p̄ || p(r) ) = min_{r≥0} Σ_i p̄_i log( p̄_i / p_i(r) ).

That is, we choose r ≥ 0 such that p(r) is the "closest" to p̄ in terms of the KL divergence.
The above result is related to the theory of Markov Random Fields (68): when we minimize
the KL divergence between a given joint distribution pI and a product-form joint distribution pI I ,
then depending on the structure of pI I , certain marginal distributions induced by the two joint
distributions are equal (i.e., a moment-matching condition). In our case, the time-reversible CSMA
Markov chain gives the product-form distribution. Also, the arrival rate and service rate on link
k are viewed as two marginal probabilities. They are not always equal, but they satisfy the desired
inequality in Proposition 3.6, due to the constraint r ≥ 0, which is important in our design.
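The resulting idealized update — ascend the gradient λ − s(r) of F(r; λ), projected onto r ≥ 0 — can be run exactly on a small conflict graph, since s_k(r) is computable from the product form. The three-link chain, the arrival rates, and the step size below are assumptions for illustration:

```python
import itertools
import math

CONFLICTS = [(0, 1), (1, 2)]  # assumed three-link chain conflict graph

def service_rates(r):
    """s_k(r) = sum_i x_k^i p(x^i; r) under the product-form distribution (3.1)."""
    n = len(r)
    states = [s for s in itertools.product([0, 1], repeat=n)
              if all(not (s[a] and s[b]) for a, b in CONFLICTS)]
    w = [math.exp(sum(x * rk for x, rk in zip(s, r))) for s in states]
    Z = sum(w)
    return [sum(s[k] * wi for s, wi in zip(states, w)) / Z for k in range(n)]

def idealized_update(lam, steps=20000, alpha=0.05):
    """Projected gradient ascent on F(r; lam): r <- [r + alpha * (lam - s(r))]+."""
    r = [0.0] * len(lam)
    for _ in range(steps):
        s = service_rates(r)
        r = [max(0.0, rk + alpha * (lk - sk)) for rk, lk, sk in zip(r, lam, s)]
    return r

lam = (0.4, 0.5, 0.4)  # strictly feasible for this conflict graph
r_star = idealized_update(lam)
s = service_rates(r_star)
print(all(sk >= lk - 1e-3 for sk, lk in zip(s, lam)))  # True: every link is served fast enough
```

For this instance one can also solve the fixed point in closed form, r* = (log 4, log 25, log 4), and the iteration converges to it.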
The following condition, proved in Section 3.13.2, ensures that supr≥0 F (r; λ) is attainable.
To see why strict feasibility is necessary, note that the links are all idle some positive fraction
of time with any parameters of the CSMA algorithm.
is the state of link k at time instance τ . Note that λk (j ) and sk (j ) are generally random variables.
We design the following distributed algorithm.
where α(j ) > 0 is the step size, and [·]D means the projection to the set D := [0, rmax ] where
rmax > 0. Thus, [r]D = max{0, min{r, rmax }}. We allow rmax = +∞, in which case the projection
is the same as [·]+ .
3 We would like to thank D. Shah for suggesting the use of increasing update intervals.
Observe that each link k only uses its local information in the algorithm.
Remark: If in period j + 1 (for any j ), the queue of link k becomes empty, then link k still transmits
dummy packets with TA rk (j ) until tj +1 . This ensures that the (ideal) average service rate is still
sk (r(j )) for all k. (The transmitted dummy packets are counted in the computation of sk (j ).)
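One update of Algorithm 1 for a single link can be sketched as follows, using the projection [·]_D defined above and the step sizes α(j) = 1/((j + 2) log(j + 2)) of Theorem 3.10; in a real implementation the empirical rates λ′_k(j) and s′_k(j) would be measured over the update interval T_j:

```python
import math

def project(r, r_max=float("inf")):
    """[r]_D with D = [0, r_max]; with r_max = +inf this is just [r]+."""
    return max(0.0, min(r, r_max))

def step_size(j):
    """alpha(j) = 1 / ((j + 2) log(j + 2)), as in Theorem 3.10."""
    return 1.0 / ((j + 2) * math.log(j + 2))

def update_ta(r, j, empirical_arrival, empirical_service, r_max=float("inf")):
    """One update of a link's transmission aggressiveness:
    r(j + 1) = [ r(j) + alpha(j) * (lambda'(j) - s'(j)) ]_D."""
    return project(r + step_size(j) * (empirical_arrival - empirical_service), r_max)

r = update_ta(1.0, j=0, empirical_arrival=0.5, empirical_service=0.3)
print(round(r, 4))  # 1.1443, since alpha(0) = 1/(2 log 2) is about 0.7213
```

The numeric inputs are hypothetical; the point is that each link updates using only its own measured arrivals and services.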
The following result establishes the optimality property of Algorithm 1.
α(n) = 1 / [ (n + 2) log(n + 2) ] and T_n = n + 2 for n ≥ 0.
Then, under Algorithm 1 with D = [0, ∞), we have
(i) r(n) → r ∗ as n → ∞;
(ii) The algorithm stabilizes the queues in the sense of rate-stability. That is,

lim_{t→∞} Q_k(t)/t = 0, ∀k,

where Q_k(t) is the queue length of link k at time t. In particular, the use of dummy packets does not affect
the rate-stability.
We explain the key steps of the proof in Section 3.8, and we provide further details in Section
3.9.
Discussion
(1) In a related work (48), Liu et al. carried out a convergence analysis, using a differential-equation
method, of a utility maximization algorithm extended from (30) (see Section 4.1 for the algorithm).
However, queueing stability was not established in (48).
(2) It has been believed that optimal scheduling is NP-complete in general. This complexity is
reflected in the mixing time of the CSMA Markov chain (i.e., the time for the Markov chain to
approach its stationary distribution). In (33) (and also in inequality (3.33)), the upper-bound used
to quantify the mixing time is exponential in K. However, the bound may not be tight in typical
wireless networks. For example, in a network where all links conflict, the CSMA Markov chain
mixes much faster than the bound.
(3) There is some resemblance between the above algorithm (in particular the CSMA Markov chain)
and simulated annealing (SA) (22). SA is an optimization technique that utilizes time-reversible
Markov chains to find a maximum of a function. SA can be used, for example, to find the Maximal-
Weighted IS (MWIS) which is needed in Maximal-Weight Scheduling. However, note that our
algorithm does not try to find the MWIS via SA. Instead, the stationary distribution of the CSMA
Markov chain with a properly-chosen r ∗ is sufficient to support any vector of strictly feasible arrival
rates (Theorem 3.8). Also, the time-reversible Markov chain we use is inherent in the CSMA
protocol, which is amenable to distributed implementation. This is not always the case in SA.
Note that there is no projection in (3.10). Instead, h̄(rk (j )) is used to bound r(j ) in a “softer” way:
h̄(y) = r_min − y,  if y < r_min;
       0,          if y ∈ [r_min, r_max];    (3.11)
       r_max − y,  if y > r_max.
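The soft bound (3.11) translates directly into code; the numeric bounds below are assumptions:

```python
def h_bar(y, r_min=-4.0, r_max=8.0):
    """Restoring force of (3.11): instead of projecting y onto [r_min, r_max],
    push it back toward the interval at a rate equal to the overshoot."""
    if y < r_min:
        return r_min - y  # positive push upward
    if y > r_max:
        return r_max - y  # negative push downward
    return 0.0

print(h_bar(-6.0), h_bar(3.0), h_bar(10.0))  # 2.0 0.0 -2.0
```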
λ ∈ C(r_min, r_max, ε) := { λ | arg max_r F(r; λ + ε·1) ∈ (r_min, r_max)^K }.
Also assume the same arrival process as in Theorem 3.10, such that the empirical arrival rates are bounded,
i.e., λk (j ) ≤ λ̄, ∀k, j for some λ̄ < ∞.
Then, if α(j) > 0 is non-increasing and satisfies Σ_j α(j) = ∞, Σ_j α(j)² < ∞ and α(0) ≤ 1,
then r(j) converges to r* as j → ∞ with probability 1, where r* satisfies s_k(r*) = λ_k + ε > λ_k, ∀k.
Also, the queues are rate-stable and return to 0 infinitely often.
Remark: Clearly, as r_min → −∞, r_max → ∞ and ε → 0, we have C(r_min, r_max, ε) → C. So the maximal
throughput can be approached arbitrarily closely by setting r_max, r_min and ε accordingly.
The proof of the theorem is similar to that of Theorem 5.4 to be presented later, and it is
therefore omitted here.
3.4.3 TIME-INVARIANT A-CSMA
Although Algorithm 1 is throughput-optimal, r is not a direct function of the queue lengths. In this
section, we consider algorithm (3.12) where r is a function of the queue lengths. It can achieve a
capacity region arbitrarily close to C .
Clearly, C2 (rmax ) ⊂ C .
Then, with a small enough α and a large enough T , Algorithm 2 makes the queues stable.
Remark: Note that C_2(r_max) → C as r_max → +∞. Therefore, the algorithm can arbitrarily closely
approach throughput-optimality by properly choosing r_max, α and T.
The proof is given in Section 3.11.
gradient algorithm to update the dual variable r in order to solve problem (3.15).
Proof. Given some finite dual variables r, the Lagrangian of problem (3.15) is
L(u; r) = − Σ_i u_i log(u_i) + Σ_k r_k ( Σ_i u_i · x_k^i − λ_k ). (3.16)
4 In fact, there is a more general relationship between ML estimation problems such as (3.5) and maximal-entropy problems such
as (3.14); see (68; 74). In (31), on the other hand, problem (3.14) was motivated by the "statistical entropy" of the CSMA Markov
chain.
Denote u*(r) = arg max_{u ∈ D_0} L(u; r). Since Σ_i u_i = 1, if we can find some w and u*(r) > 0
such that

∂L(u*(r); r)/∂u_i = − log(u_i*(r)) − 1 + Σ_k r_k x_k^i = w, ∀i,

then u*(r) is the desired distribution. The above conditions are

u_i*(r) = exp( Σ_k r_k x_k^i − w − 1 ), ∀i, and Σ_i u_i*(r) = 1.

By solving the two equations, we find that w = log[ Σ_j exp( Σ_k r_k x_k^j ) ] − 1 and

u_i*(r) = exp( Σ_k r_k x_k^i ) / Σ_j exp( Σ_k r_k x_k^j ) > 0, ∀i, (3.17)
for all k, where α(j ) is the step size, and D = [0, rmax ] where rmax can be +∞. As in Algorithm
1, even when link k has no backlog (i.e., zero queue length), we let it send dummy packets with its
current aggressiveness rk . This ensures that the (ideal) average service rate of link k is sk (r(j )) for
all k.
Since Algorithm 1(b) "pretends" to serve arrival rates higher than the actual ones
(due to the positive term min{c/r_k(j), w̄}), Q_k is not only stable but also tends to be small. Regarding
the convergence and stability properties, Theorem 3.10 also holds for Algorithm 1(b).
3.7 SIMULATIONS
In our C++ simulations, the transmission time of all links is exponentially distributed with mean 1 ms,
and the backoff time of link k is exponentially distributed with mean 1/exp(r_k) ms. The capacity
of each link is 1 (data unit)/ms. There are 6 links in the network, whose conflict graph (CG) is shown in Fig. 3.1.
[Figure 3.1: Conflict graph of the six links.]
Define 0 ≤ ρ < 1 as the "load factor", and let ρ = 0.98 in this simulation. The arrival rate
vector is set to λ = ρ·[0.2·(1,0,1,0,0,0) + 0.3·(1,0,0,1,0,1) + 0.2·(0,1,0,0,1,0) + 0.3·(0,0,1,0,1,0)] =
ρ·(0.5, 0.2, 0.5, 0.3, 0.5, 0.3) (data units/ms). We have multiplied a convex combination of
some maximal ISs by ρ < 1 to ensure that λ ∈ C.
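A quick check of this arithmetic (the weights must sum to one for λ to be ρ times a convex combination of independent sets):

```python
rho = 0.98
combo = [(0.2, (1, 0, 1, 0, 0, 0)),
         (0.3, (1, 0, 0, 1, 0, 1)),
         (0.2, (0, 1, 0, 0, 1, 0)),
         (0.3, (0, 0, 1, 0, 1, 0))]
assert abs(sum(w for w, _ in combo) - 1.0) < 1e-12  # a convex combination
lam = [rho * sum(w * s[k] for w, s in combo) for k in range(6)]
print([round(x, 3) for x in lam])  # [0.49, 0.196, 0.49, 0.294, 0.49, 0.294]
```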
[Figure: Queue lengths of the six links (data units) versus time (ms).]
To prove the optimality of Algorithm 1, however, there are extra challenges. First, r in
Algorithm 1 is unbounded in general, unlike what is assumed in Chapter A. Second, the error in
the gradient is determined by the mixing property of the CSMA Markov chain, which needs to
be carefully quantified and controlled. As a result, we not only need to choose suitable step sizes α(j)
(as in normal SA algorithms), but we also need to choose the update interval Tj carefully to control
the error in the gradient. We will show that with suitable choices, r converges to r ∗ , and, therefore,
the queues are stabilized.
The main steps of the proof are as follows. We choose α(j ) = 1/[(j + 2) log(j + 2)] and
T_j = j + 2 for j ≥ 0. (More general choices are possible.) In the following, the notations T_j and
T(j) are used interchangeably.
Steps of the Proof:
Step 1: The first step is to decompose the error into a bias and a zero-mean random error.
Let x(j) ∈ {0, 1}^K be the state of the CSMA Markov chain at time t_j = Σ_{n=1}^{j} T_n (with
t_0 = 0). Recall that r(j) is the value of r set at time t_j. Define the random vector U(0) = (r(0), x(0))
[Figure: Transmission aggressiveness r of the six links versus time (ms).]
and U(j) = (λ′(j − 1), s′(j − 1), r(j), x(j)) for j ≥ 1. Let F_j, j = 0, 1, 2, . . ., be the σ-field
generated by {U(n)}_{n=0,1,...,j}. In the following, we write the conditional expectation E(·|F_j)
simply as E_j(·).
Write Algorithm 1 as follows:

r_k(j + 1) = [ r_k(j) + α(j) ( λ_k − s_k(r(j)) + B_k(j) + η_k(j) ) ]_D,

where

B_k(j) = E_j[λ′_k(j) − s′_k(j)] − (λ_k − s_k(r(j))),
η_k(j) = (λ′_k(j) − s′_k(j)) − E_j[λ′_k(j) − s′_k(j)].

Thus, B(j) is the bias of the error at step j, and η(j) is the zero-mean random error.
Step 2: The second step is to consider the change of a Lyapunov function.
[Figure: Queue lengths of the six links (data units) versus time (ms).]
Let D(j) = (1/2) ||r(j) − r*||².
Using the expression for r(j + 1) we find that
Σ_{j=0}^{J} E(j) converges to a finite value as J → ∞. (3.22)
Proof. Pick an arbitrary δ > 0. We first claim that these two properties imply that D(j) < δ for
infinitely many values of j. Indeed, assume otherwise, so that D(j) ≥ δ for all j ≥ j_0. Then, by (3.21),
G(j) ≤ −ε for all j ≥ j_0 and some ε > 0. Since D(j + 1) ≤ D(j) + α(j)G(j) + E(j), we have, for n ≥ 1,

D(j_0 + n) ≤ D(j_0) + Σ_{j=j_0}^{j_0+n−1} [ α(j)G(j) + E(j) ]
          ≤ D(j_0) − ε Σ_{j=j_0}^{j_0+n−1} α(j) + Σ_{j=j_0}^{j_0+n−1} E(j).

Since Σ_j α(j) = ∞ and (3.22) holds, we have that Σ_{j=j_0}^{∞} α(j) = ∞ and that Σ_{j=j_0}^{j_0+n−1} E(j)
converges to a finite value as n → ∞. Therefore, D(j_0 + n) goes to −∞ as n → ∞, which is not
possible since D(j_0 + n) ≥ 0. This proves the above claim.
□
By property (3.22), one can pick j_1 large enough so that |Σ_{j=m_1}^{m_2} E(j)| ≤ δ for any m_2 ≥ m_1 ≥
j_1. Since D(j) < δ for infinitely many values of j, we can choose j_2 > j_1 such that D(j_2) < δ. It
follows that, for any j > j_2,

D(j) ≤ D(j_2) + Σ_{n=j_2}^{j−1} [ α(n)G(n) + E(n) ]
     ≤ D(j_2) + Σ_{n=j_2}^{j−1} E(n)
     ≤ D(j_2) + δ < 2δ.

But the choice of δ > 0 is arbitrary (i.e., δ can be arbitrarily small). It follows that D(j)
converges to zero, and therefore that r(j) converges to r*, as claimed. □
The details of the proof, given in Section 3.9, are to show the properties (3.21)–(3.22). Prop-
erty (3.21) holds because the function F (r; λ), defined in (3.4), is strictly concave in r. Essentially,
when r is away from r ∗ , a step in the direction of the gradient brings r strictly closer to r ∗ . Proving
property (3.22) has two parts: bounding B(j ) and showing that the zero-mean noise has a finite
sum. The first part is based on estimates of the convergence rate of the Markov chain (its mixing
time). The second part uses a martingale convergence theorem.
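The interplay of diminishing step sizes and noise can be seen in a toy one-link instance of the stochastic-approximation update. The sketch below is our illustration, not the book's algorithm: s(r) = e^r/(1 + e^r) stands in for the single-link CSMA service rate, the step sizes α(j) = 1/(j + 1) satisfy Σ α(j) = ∞ and Σ α(j)² < ∞, and the noise is i.i.d. rather than generated by a Markov chain.

```python
import math
import random

random.seed(0)

lam = 0.7                                        # target arrival rate
s = lambda r: math.exp(r) / (1 + math.exp(r))    # stand-in "service rate" s(r)
r_star = math.log(lam / (1 - lam))               # unique fixed point: s(r*) = lam

r = 0.0
for j in range(200000):
    alpha = 1.0 / (j + 1)                        # sum(alpha) = inf, sum(alpha^2) < inf
    s_hat = 1.0 if random.random() < s(r) else 0.0   # noisy one-sample estimate of s(r)
    r += alpha * (lam - s_hat)                   # stochastic gradient step

print(abs(r - r_star))                           # close to zero
```

Despite each step using a single noisy sample, the averaging effect of the decreasing steps drives r to the fixed point, mirroring the role of properties (3.21)–(3.22).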
Step 5: Rate-stability.
Since r(j ) converges to r ∗ , it can be shown that the long-term average service rate of each
link k converges to sk (r ∗ ) ≥ λk , which implies rate-stability. Proof of this step is in Section 3.10.
Σ_{j=0}^{n} α(j )[r(j ) − r ∗ ]T η(j ) converges to a finite random variable.
Let
Y (j ) = Σ_{n=0}^{j−1} α(n)[r(n) − r ∗ ]T η(n).
E[Y ²(j )] = Σ_{n=0}^{j−1} E{ [α(n)(r(n) − r ∗ )T η(n)]² }
≤ Σ_{n=0}^{j−1} α²(n) E{ ||r(n) − r ∗ ||² ||η(n)||² }.
3.9. FURTHER PROOF DETAILS OF THEOREM 3.10-(I) 41
Now we use the well-known martingale convergence theorem (see, for example, (16)) stated
below.
If {Z(j )} is a martingale with sup_j E[Z²(j )] < +∞, then there exists a random variable Z such that Z(j ) → Z almost surely as j → ∞, and E(Z²) < +∞.
As a result,
lim_{j→∞} sup_{m2 ≥ m1 ≥ j} | Σ_{n=m1}^{m2} α(n)[r(n) − r ∗ ]T B(n) |
≤ lim_{j→∞} sup_{m2 ≥ m1 ≥ j} Σ_{n=m1}^{m2} |α(n)[r(n) − r ∗ ]T B(n)| = 0,
so
Σ_{n=0}^{J} α(n)[r(n) − r ∗ ]T B(n) converges to a finite value as J → ∞.
||P (Xt = ·) − π|| ≤ b · e^{−t/Tmix(n)}, (3.28)
where b is a constant.
Assuming (3.28) for the time being, we obtain (3.26) as follows. Recall that B(n) =
En [λ(n) − s(n)] − [λ − s(r(n))] = {En [λ(n)] − λ} − {En [s(n)] − s(r(n))}. With the arrival
process assumed in Theorem 3.10, it is easy to see that ||En [λ(n)] − λ|| = O(1/T (n)). Also,
||En [s(n)] − s(r(n))||
= O( (1/Tn) ∫_0^{Tn} ||P (Xt = ·) − π|| dt )
≤ O( (1/Tn) ∫_0^{Tn} b · e^{−t/Tmix(n)} dt )
≤ O( (b/Tn) ∫_0^{∞} e^{−t/Tmix(n)} dt ) = O( Tmix(n)/Tn ). (3.29)
Combining the above results yields (3.26). The inequality (3.27) is then straightforward.
The mixing time of a CSMA Markov chain may increase with the parameters r. To see this,
consider the network of Figure 3.5 and assume that R1 = R2 = R3 = R4 = R. When R is large,
the corresponding Markov chain spends a long time in the states {{2}, {4}, {2, 4}} before visiting
the states {{1}, {3}, {1, 3}} and vice versa. Indeed, assume that the initial state is {2} or {4}. It is very
likely that the Markov chain jumps to {2, 4} before it jumps to ∅. Consequently, it takes a long time
for the probability of the other states to approach their stationary value.
Figure 3.5: A ring conflict graph with four links: links 1 and 3 may be active together, as may links 2 and 4.
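This slow switching between the two modes can be quantified by the spectral gap of the generator: the mixing time is of the order of the inverse gap. The sketch below is our illustration under assumed dynamics (each link activates at rate R when its neighbors are silent and deactivates at rate 1); it builds the 7-state generator for the ring conflict graph and shows the gap shrinking as R grows.

```python
import itertools
import numpy as np

conflicts = {(1, 2), (2, 3), (3, 4), (1, 4)}     # ring conflict graph of Figure 3.5
links = [1, 2, 3, 4]

def independent(s):
    return not any((min(a, b), max(a, b)) in conflicts
                   for a in s for b in s if a != b)

def generator(R):
    """Generator of the CSMA chain: a link turns on at rate R when allowed,
    and turns off at rate 1."""
    states = [frozenset(c) for n in range(5) for c in itertools.combinations(links, n)]
    states = [s for s in states if independent(s)]   # the 7 independent sets
    Q = np.zeros((len(states), len(states)))
    for i, s in enumerate(states):
        for j, t in enumerate(states):
            if len(t) == len(s) + 1 and s < t:
                Q[i, j] = R          # activation of one extra link
            elif len(t) == len(s) - 1 and t < s:
                Q[i, j] = 1.0        # deactivation of one link
        Q[i, i] = -Q[i].sum()
    return Q

def spectral_gap(R):
    ev = np.sort(np.linalg.eigvals(generator(R)).real)
    return -ev[-2]                   # minus the second-largest eigenvalue

print(spectral_gap(1.0), spectral_gap(10.0))     # the gap shrinks as R grows
```

A smaller gap at R = 10 than at R = 1 confirms numerically that large activation rates slow the chain's convergence to stationarity.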
To derive (3.28) and (3.25), we use a coupling argument. (This is a different, and in our
opinion, more intuitive approach than the method used in (33) based on the conductance of the
CSMA Markov chain.)
Let us start a stationary version {X̃t , t ≥ 0} with invariant distribution π and an independent
version {Xt , t ≥ 0} with some arbitrary initial distribution. These two Markov chains have the rate matrix
Q that corresponds to the CSMA Markov chain with parameters r(n). Let τ be the first time t that
Xt = X̃t .
Lemma 3.17
||P (Xt = ·) − π|| ≤ P (τ > t). (3.30)
Proof. After time τ , we glue the two Markov chains together, so that Xt = X̃t for all t ≥ τ . Now, for any state x,
P (Xt = x) − P (X̃t = x) = P (Xt = x, τ > t) − P (X̃t = x, τ > t),
so that
||P (Xt = ·) − P (X̃t = ·)|| ≤ P (Xt ≠ X̃t ) = P (τ > t),
which proves (3.30). 2
Now we are ready to prove (3.28) and (3.25) by estimating P (τ > t).
Proof. Let t0 > 1 be a fixed time interval. The two Markov chains Xt and X̃t both have the
transition rate matrix Q = (q(i, j ))1≤i,j ≤N , where N is the number of states. Also, assume that 1 ≤
q(i, j ) ≤ R, ∀i ≠ j such that q(i, j ) > 0. Choose a state i0 which corresponds to a maximal independent
set. We will show that P (Xt0 = X̃t0 = i0 |X0 = i1 , X̃0 = i2 ) is larger than a specific constant for any
initial states i1 and i2 .
First, consider P (Xt0 = i0 |X0 = i1 ). We construct a path Pi1 ,i0 from state i1 to i0 , i.e., i1 =
j0 → j1 → j2 · · · → jM−1 → jM = i0 . Let O1 , O0 be the sets of links which are transmitting (or
“on”) in state i1 and i0 , respectively. First, for all links in O1 \O0 , we change them from “on” to “off ”
one by one (in an arbitrary order). Then, for all links in O0 \O1 , we change them from “off ” to “on”
one by one (in an arbitrary order). So, the path has M = |O1 \O0 | + |O0 \O1 | ≤ K jumps, i.e.,
M ≤ K. (3.31)
It is well known that the Markov chain can be interpreted as follows. From state i, one
picks an exponentially distributed r.v. Wi with rate −q(i, i) as the time to stay in state i before the
jump. Upon the jump, one chooses to jump to state j = i with probability pi,j := −q(i, j )/q(i, i),
independently of Wi . Now we have
P (Xt0 = i0 |X0 = i1 ) ≥ P (Xt reaches i0 along the path Pi1 ,i0 at a time t ≤ t0 ,
and then stays at i0 for at least an interval of t0 | X0 = i1 )
= P ( Σ_{m=0}^{M−1} Wjm ≤ t0 ) · Π_{m=0}^{M−1} p_{jm ,jm+1} · exp[q(i0 , i0 )t0 ]. (3.32)
Note that each Wjm has a rate of at least 1, since −q(i, i) ≥ 1, ∀i. Let Zm , m = 0, . . . , M − 1,
be M i.i.d. exponentially distributed r.v.s with rate 1. Then we have P ( Σ_{m=0}^{M−1} Wjm ≤ t0 ) ≥
P ( Σ_{m=0}^{M−1} Zm ≤ t0 ) ≥ P (Y = M) = (t0^M /M!) exp(−t0 ), where Y has a Poisson distribution with parameter t0 . Also, since −q(i, i) ≤ K · R, ∀i, we have p_{jm ,jm+1} ≥ 1/(K · R). Finally, since state i0 is a
maximal independent set, it can only jump to another state by turning a link off. So −q(i0 , i0 ) ≤ K.
Using these facts and (3.31) in (3.32), we obtain
P (Xt0 = i0 |X0 = i1 ) ≥ (t0^M /M!) exp(−t0 ) · (K · R)^{−M} exp[−K · t0 ]
≥ (1/K!) exp(−t0 )(K · R)^{−K} exp[−K · t0 ]
:= c̄ · R^{−K} ,
where c̄ is a constant that does not depend on R.
The same bound holds for P (X̃t0 = i0 |X̃0 = i2 ). Therefore, for any i1 , i2 ,
P (Xt , X̃t meet before t0 | X0 = i1 , X̃0 = i2 ) ≥ P (Xt0 = X̃t0 = i0 | X0 = i1 , X̃0 = i2 ) ≥ c̄² R^{−2K} .
It follows that
P (τ > n · t0 ) ≤ (1 − c̄² R^{−2K} )^n = exp{n · log(1 − c̄² R^{−2K} )} ≤ exp{−n c̄² R^{−2K} }.
So
P (τ > t) ≤ P (τ > ⌊t/t0 ⌋ · t0 ) ≤ exp{−⌊t/t0 ⌋ · c̄² R^{−2K} }
≤ exp{−(t/t0 − 1)c̄² R^{−2K} } ≤ b · exp{−t/(t0 c̄^{−2} R^{2K} )} (3.33)
3.10. PROOF OF THEOREM 3.10-(II) 45
where b := exp(1).
Comparing with (3.28), we know that
so (3.25) holds. 2
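One elementary inequality used in the proof above, P(Σ_{m=0}^{M−1} Zm ≤ t0) ≥ P(Y = M) = (t0^M/M!) e^{−t0}, can be checked numerically: a sum of M unit-rate exponentials is at most t0 exactly when a Poisson(t0) variable is at least M. A quick sketch (ours, for illustration):

```python
import math

def erlang_cdf(M, t0):
    # P(Z_0 + ... + Z_{M-1} <= t0) for unit-rate exponentials
    # equals P(Poisson(t0) >= M) = 1 - sum_{i<M} e^{-t0} t0^i / i!
    return 1.0 - sum(math.exp(-t0) * t0 ** i / math.factorial(i) for i in range(M))

def poisson_pmf(M, t0):
    return t0 ** M / math.factorial(M) * math.exp(-t0)

t0 = 3.0
for M in range(1, 8):
    # the Erlang CDF dominates the single Poisson term P(Y = M)
    assert erlang_cdf(M, t0) >= poisson_pmf(M, t0)
print("bound verified for t0 =", t0)
```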
where I (·) is the indicator function. So Sk (t) ≥ Dk (t). Assume that the initial queue lengths are
zero; then it is clear that Dk (t) ≤ Ak (t), and link k's queue length is Qk (t) = Ak (t) − Dk (t).
We complete the proof by stating and proving two lemmas. The first lemma is intuitively
clear, although non-trivial to prove.
Lemma: limt→∞ Sk (t)/t = sk (r ∗ ), ∀k, almost surely.
Proof. This is quite an intuitive result since r → r ∗ a.s. In the following, we first give an outline of
the proof and then present the details.
Recall that r is adjusted at time ti , i = 1, 2, . . . , and Ti = ti − ti−1 . Note that during each
update interval Ti , the TA r is fixed. Since Ti → ∞ as i → ∞, for a given constant T > 0, we can
divide the time into blocks, such that during each block r is fixed, and all the blocks after some initial
time have similar lengths (between T and 2T ). Then we consider the average service rate ŝj in each
block j . We decompose ŝj into an ideal rate sk (rj ) where rj (temporarily) denotes the TA during
block j , an error bias, and a zero-mean martingale noise. Now, in order to compute limt→∞ Sk (t)/t,
we need to average ŝj over all blocks. We show that the average of the martingale noise is 0, the
average of the error bias is arbitrarily close to 0 by choosing large-enough T , and the average of the
ideal rates is sk (r ∗ ) since rj converges to r ∗ . This implies the desired result.
Now we present the proof details. First, we divide the time into blocks. Fix T > 0; we
construct a sequence of times {τj } as follows. Let τ0 = t0 = 0. Denote t(j ) := min{ti |ti > τj }, i.e.,
t(j ) is the nearest time in the sequence {ti , i = 1, 2, . . . } that is larger than τj . The following
defines τj , j = 1, 2, . . . , recursively. If t(j ) − τj < 2T , then let τj +1 = t(j ) . If t(j ) − τj ≥ 2T , then
let τj +1 = τj + T . Also, define Uj := τj − τj −1 , j = 1, 2, . . . .
Denote i ∗ (T ) = min{i|Ti+1 ≥ T }, and j ∗ (T ) = min{j |τj = ti ∗ (T ) }. From the above con-
struction, we have
T ≤ Uj ≤ 2T , ∀j > j ∗ (T ). (3.36)
Now, we consider the average service rate in each block j , i.e., ŝj := [Sk (τj +1 ) −
Sk (τj )]/Uj +1 . Write ŝj = sk (r(τj )) + bj + mj , where the “error bias” bj = Ej (ŝj ) − sk (r(τj ))
(Ej (·) is the conditional expectation given r(τj ) and the transmission state at time τj ), and the
martingale noise mj = ŝj − Ej (ŝj ) (note that Ej (mj ) = 0). For convenience, we have dropped the
subscript k in ŝj , bj , mj . But all discussion below is for link k.
First, we show that limN→∞ [ Σ_{j=0}^{N} (mj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ] = 0 a.s. Since mj is bounded,
E(mj²) ≤ c1 for some c1 > 0. Clearly, MN := Σ_{j=0}^{N} (mj · Uj+1 ), N = 0, 1, . . . , is a martingale (define M−1 = 0). We have E(MN²) = Σ_{j=0}^{N} (E(mj²) · Uj+1²) ≤ c1 Σ_{j=0}^{N} Uj+1². Therefore,
Σ_{N=0}^{∞} [E(MN²) − E(MN−1²)] / ( Σ_{j=0}^{N} Uj+1 )²
= Σ_{N=0}^{∞} E(mN²) · UN+1² / ( Σ_{j=0}^{N} Uj+1 )²
≤ c1 Σ_{N=0}^{∞} UN+1² / ( Σ_{j=0}^{N} Uj+1 )²
= c1 Σ_{N=0}^{j∗(T)−1} UN+1² / ( Σ_{j=0}^{N} Uj+1 )² + c1 Σ_{N=j∗(T)}^{∞} UN+1² / ( Σ_{j=0}^{N} Uj+1 )².
Since
Σ_{N=j∗(T)}^{∞} UN+1² / ( Σ_{j=0}^{N} Uj+1 )² ≤ Σ_{N=j∗(T)}^{∞} 4T² / ( Σ_{j=0}^{N} Uj+1 )² ≤ Σ_{N=j∗(T)}^{∞} 4T² / ( Σ_{j=j∗(T)}^{N} Uj+1 )²
≤ Σ_{N=j∗(T)}^{∞} 4T² / [(N − j∗(T) + 1)² T²] = Σ_{N=j∗(T)}^{∞} 4 / (N − j∗(T) + 1)² < ∞,
we have Σ_{N=0}^{∞} [E(MN²) − E(MN−1²)] / ( Σ_{j=0}^{N} Uj+1 )² < ∞. Using Theorem 2.1 in (27), we conclude that
lim_{N→∞} [ Σ_{j=0}^{N} (mj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ] = 0, a.s. (3.37)
Also,
| Σ_{j:τj >t0} (bj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 | ≤ ( Σ_{j:τj >t0} c2 (ε) ) / ( Σ_{j:τj >t0} Uj+1 ) ≤ c2 (ε)/T .
Therefore, lim supN→∞ Σ_{j=0}^{N} (bj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ≤ c2 (ε)/T and similarly
lim inf N→∞ Σ_{j=0}^{N} (bj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ≥ −c2 (ε)/T .
Also, since r → r ∗ in the realization, it is easy to show that
lim_{N→∞} [ Σ_{j=0}^{N} (sk (r(τj )) · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ] = sk (r ∗ ).
Combining the above facts, we know that with probability 1, lim supt→∞ Sk (t)/t =
lim supN→∞ [ Σ_{j=0}^{N} (ŝj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ] ≤ sk (r ∗ ) + c2 (ε)/T and lim inf t→∞ Sk (t)/t =
lim inf N→∞ [ Σ_{j=0}^{N} (ŝj · Uj+1 ) / Σ_{j=0}^{N} Uj+1 ] ≥ sk (r ∗ ) − c2 (ε)/T .
Since the above argument holds for any T > 0, letting T → ∞, we have limt→∞ Sk (t)/t =
sk (r ∗ ) with probability 1. 2
Lemma 3.20 If sk (r ∗ ) ≥ λk , ∀k, and (3.35) holds a.s., then limt→∞ Dk (t)/t = λk , ∀k, a.s. That is,
each queue is "rate stable".
Proof. Again, we first give an outline of the proof, and then we present the details. The proof is
composed of two parts. Part (a) shows that lim inf t→∞ [Ak (t) − Dk (t)]/t = 0 a.s., and part (b) shows
that lim supt→∞ [Ak (t) − Dk (t)]/t = 0 a.s.. Combining the two parts gives the desired results.
To show the result in part (a), suppose to the contrary that lim inf t→∞ [Ak (t) − Dk (t)]/t >
ε for some ε > 0. This implies that there is some finite time T0 > 0, such that
Qk (t) = Ak (t) − Dk (t) ≥ ε · t, ∀t ≥ T0 , (3.38)
as shown in Fig. 3.6 (a). So, no dummy packet is transmitted after T0 due to the non-empty queue,
which implies that the average departure rate limt→∞ Dk (t)/t is equal to the average service rate
limt→∞ Sk (t)/t. However, (3.38) implies that the average service rate (which equals the average
departure rate) is strictly smaller than the average arrival rate, leading to a contradiction.
To show the result in part (b), suppose to the contrary that lim supt→∞ [Ak (t) − Dk (t)]/t >
2a for some constant a > 0. This means that Ak (t) − Dk (t) ≥ 2a · t infinitely often. By part (a),
we also know that Ak (t) − Dk (t) ≤ a · t infinitely often. Therefore, for any T1 > 0, there exist
t2 > t1 ≥ T1 such that in the interval t ∈ [t1 , t2 ], Qk (t) = Ak (t) − Dk (t) grows from below a · t1 to
above 2a · t2 , and there is no dummy packet transmitted in between (see Fig. 3.6 (b)). We show that
Figure 3.6: Queue length Qk (t) in Scenario (a) and Scenario (b).
this indicates a large fluctuation of [Ak (t) − Sk (t)]/t, contradicting the fact that [Ak (t) − Sk (t)]/t
converges to a limit.
Next, we present the proof details.
(a) We first show that lim inf t→∞ [Ak (t) − Dk (t)]/t = 0 a.s. For this purpose, we show that
∀ε > 0, P (lim inf t→∞ [Ak (t) − Dk (t)]/t > ε) = 0. If in a realization,
lim inf t→∞ [Ak (t) − Dk (t)]/t > ε, (3.39)
then ∃T0 > 1/ε, s.t. ∀t ≥ T0 , [Ak (t) − Dk (t)]/t ≥ ε, i.e., Qk (t) ≥ ε · t. Since T0 > 1/ε, we have
Qk (t) > 1, ∀t ≥ T0 , i.e., the queue is not empty after T0 . Therefore, for any t ≥ T0 , Sk (t) = Sk (T0 ) +
[Sk (t) − Sk (T0 )] = Sk (T0 ) + [Dk (t) − Dk (T0 )] ≤ T0 + Dk (t). So
lim supt→∞ Sk (t)/t ≤ lim supt→∞ (T0 + Dk (t))/t = lim supt→∞ Dk (t)/t.
By the assumption (3.39), lim supt→∞ Dk (t)/t < lim inf t→∞ Ak (t)/t − ε. So
lim supt→∞ Sk (t)/t < lim inf t→∞ Ak (t)/t − ε. Therefore, the intersection of events
{ limt→∞ Sk (t)/t ≥ limt→∞ Ak (t)/t } ∩ { lim inf t→∞ [Ak (t) − Dk (t)]/t > ε } = ∅. (3.40)
On the other hand, with probability 1, limt→∞ Ak (t)/t = λk and limt→∞ Sk (t)/t =
sk (r ∗ ). Since sk (r ∗ ) ≥ λk , P (limt→∞ Sk (t)/t ≥ limt→∞ Ak (t)/t) = 1. In view of (3.40), we have
P (lim inf t→∞ [Ak (t) − Dk (t)]/t > ε) = 0. Since this holds for any ε > 0, we conclude that
lim inf t→∞ [Ak (t) − Dk (t)]/t = 0 a.s.
(b) Second, we show that lim supt→∞ [Ak (t) − Dk (t)]/t = 0 a.s..
From (a), we know that for an arbitrary a > 0, with probability 1 [Ak (t) − Dk (t)]/t ≤ a
infinitely often (“i.o.”), and limt→∞ [Ak (t) − Sk (t)]/t ≤ 0. Consider a realization in which the
above two events occur, and lim supt→∞ [Ak (t) − Dk (t)]/t > 2a. Then, [Ak (t) − Dk (t)]/t ≥ 2a
i.o..
3.11. PROOF OF THEOREM 3.13 49
By the above assumptions, Qk (t) = Ak (t) − Dk (t) ≤ a · t and Qk (t) = Ak (t) − Dk (t) ≥
2a · t i.o.. Also note that in any time interval of 1, Qk (t) can increase by at most C̄ (since the
number of arrivals in each time slot is bounded by C̄). So, for any T1 (satisfying a · T1 ≥ 4C̄), there
exist t2 > t1 ≥ T1 such that Qk (t1 ) ≤ a · t1 , Qk (t2 ) ≥ 2a · t2 , and Qk (t) ≥ 2C̄ for any t1 < t < t2 .
Since the queue is not empty from time t1 to t2 , we have (writing Bk (t) := Ak (t) − Sk (t))
Bk (t2 )
= Bk (t1 ) + [Bk (t2 ) − Bk (t1 )]
= Bk (t1 ) + {[Ak (t2 ) − Ak (t1 )] − [Sk (t2 ) − Sk (t1 )]}
= Bk (t1 ) + {[Ak (t2 ) − Ak (t1 )] − [Dk (t2 ) − Dk (t1 )]}
= Bk (t1 ) + Qk (t2 ) − Qk (t1 )
≥ Bk (t1 ) + 2a · t2 − a · t1
Therefore,
Bk (t2 )/t2 ≥ Bk (t1 )/t2 + 2a − a · t1 /t2
Then,
Bk (t2 )/t2 − Bk (t1 )/t1 ≥ (Bk (t1 )/t1 )(t1 /t2 − 1) + 2a − a · t1 /t2 .
Since limt→∞ Bk (t)/t := b ≤ 0, we choose T1 large enough such that ∀t ≥ T1 , |Bk (t)/t −
b| ≤ a/3. Then,
|Bk (t1 )/t1 − Bk (t2 )/t2 | ≤ (2/3) · a. (3.41)
Also, since t1 ≥ T1 , we have Bk (t1 )/t1 ≤ b + a/3 ≤ a. Since t1 /t2 − 1 < 0, it follows that
Bk (t2 )/t2 − Bk (t1 )/t1 ≥ a · (t1 /t2 − 1) + 2a − a · t1 /t2 = a,
which contradicts (3.41). Therefore, P (lim supt→∞ [Ak (t) − Dk (t)]/t > 2a) = 0. Since this holds
for any a > 0, we conclude that lim supt→∞ [Ak (t) − Dk (t)]/t = 0 a.s..
Combining (a) and (b) gives limt→∞ [Ak (t) − Dk (t)]/t = limt→∞ Qk (t)/t = 0 a.s.. 2
Combining the above two lemmas, we conclude that the queues are rate stable under Algo-
rithm 1.
Figure: The truncated parameter r = [r̄]D : each component rk = min{r̄k , D}.
Define
r̄k := (α/T ) · Xk , ∀k. (3.42)
According to algorithm (3.12), the CSMA parameter rk (j ) = min{r̄k (j ), D}, ∀k, or equivalently,
r(j ) := [r̄(j )]D .
In view of the queue dynamics, the vector r̄ = (r̄k )k=1,...,K is also updated every T time units.
In particular, at time (j + 1)T , r̄ is updated for the (j + 1)-th time as follows:
r̄k (j + 1) = {r̄k (j ) + α · [λk (j ) − sk (j )]}+ , ∀k, (3.43)
where we choose
α = δ/K. (3.44)
Note that the average service rate sk (j ) is achieved with CSMA parameter r(j ) (instead of
r̄(j )).
Since r(j ) ∈ [0, D]K , ∀j , the mixing time of the CSMA Markov chain (in each update in-
terval T ) is Tmix := O(exp(2K · D)) by (3.34). Therefore, by (3.29), we choose T = T (δ, K, D) =
O(Tmix · (4K · D)/δ) = O(4K · D) exp(2K · D)/δ such that
|Ej [sk (j )] − sk (r(j ))| ≤ δ/(4K · D), ∀k, j. (3.45)
Δ(j ) := Ej [L(r̄(j + 1)) − L(r̄(j ))]
≤ α Σk { [λk − Ej (sk (j ))] · ∂L(r̄(j ))/∂ r̄k (j ) } + (1/2)Kα²
≤ α Σk { [λk − sk (r(j ))] · ∂L(r̄(j ))/∂ r̄k (j ) } + α Σk { [sk (r(j )) − Ej (sk (j ))] · ∂L(r̄(j ))/∂ r̄k (j ) } + (1/2)Kα²
≤ α Σk ∂F (r(j ))/∂rk (j ) + α Σk ( ∂L(r̄(j ))/∂ r̄k (j ) ) · δ/(4K · D) + (1/2)Kα²
≤ α( −δ + δ/4 + (1/2)Kα )
= −αδ/4,
which establishes the negative drift of L(r̄(j )). Therefore, r̄(j ) is stable, and by (3.42), X(j ) is also
stable.
π(S) = C Π_{k∈S} Rk μk
Figure 3.8: Activity of node i: state 0 means idle.
Designate by Q′i the rate matrix of this Markov chain reversed in time defined by
where B is the constant such that these probabilities add up to one over all the possible states.
Moreover, the Markov chain reversed in time corresponds to the same CSMA network except that
the activity of each node i is described by the Markov chain with rate matrix Q′i .
Proof:
We prove this result by verifying the equations
Bπ1 (x1 ) · · · πi (xi ) · · · πN (xN )q(x, y) = Bπ1 (x1 ) · · · πi (yi ) · · · πN (xN )q ′ (y, x),
where C is the constant such that these probabilities add up to one over all the independent sets. That is,
the probability of each independent set being active does not depend on the distribution of the transmission
times.
Proof:
This result follows directly from (3.49). Indeed, with D = {i1 , i2 , . . . , in }, one has
π(A(i1 , . . . , in )) = Σ_{x∈A(i1 ,...,in )} π(x) = B Π_{i∈D} πi (Ai ) Π_{i∉D} πi (0).
Now,
πi (0) = 1/(1 + μi Ri ) and πi (Ai ) = 1 − πi (0) = μi Ri /(1 + μi Ri ).
Consequently,
π(A(i1 , . . . , in )) = B Π_{i=1}^{K} (1 + μi Ri )^{−1} Π_{i∈D} μi Ri = C Π_{i∈D} μi Ri ,
with C = B Π_{i=1}^{K} (1 + μi Ri )^{−1} . The last expression above is precisely (3.51).
2
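In the exponential case, the product form can be sanity-checked directly: for a reversible CSMA chain in which link i activates at rate a_i (when allowed) and deactivates at rate b_i, detailed balance gives π(S) ∝ Π_{i∈S} a_i/b_i, with a_i/b_i playing the role of μ_i R_i in (3.51). The sketch below (our illustration, with assumed rates on the ring conflict graph) verifies πQ = 0 numerically.

```python
import itertools
import numpy as np

conflicts = {(1, 2), (2, 3), (3, 4), (1, 4)}     # ring conflict graph again
links = [1, 2, 3, 4]
a = {1: 2.0, 2: 0.5, 3: 1.5, 4: 3.0}             # activation rates (assumed values)
b = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.0}             # deactivation rates (assumed values)

def independent(s):
    return not any((min(x, y), max(x, y)) in conflicts
                   for x in s for y in s if x != y)

states = [frozenset(c) for n in range(5) for c in itertools.combinations(links, n)]
states = [s for s in states if independent(s)]

Q = np.zeros((len(states), len(states)))
for i, s in enumerate(states):
    for j, t in enumerate(states):
        if len(t) == len(s) + 1 and s < t:
            Q[i, j] = a[next(iter(t - s))]       # one link joins the active set
        elif len(t) == len(s) - 1 and t < s:
            Q[i, j] = b[next(iter(s - t))]       # one link leaves the active set
    Q[i, i] = -Q[i].sum()

# Product-form guess: pi(S) proportional to prod_{i in S} (a_i / b_i).
w = np.array([float(np.prod([a[i] / b[i] for i in s])) if s else 1.0 for s in states])
pi = w / w.sum()
print(np.abs(pi @ Q).max())                      # numerically zero: pi Q = 0
```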
Using this product-form result and similar techniques as before, it is not difficult to show that
Algorithm (3.10) (with λk (j ) defined as the amount of data that arrives at link k in period j + 1
divided by T (j + 1)) is still near-throughput-optimal and stabilizes the queues.
3.13 APPENDICES
3.13.1 PROOF OF THE FACT THAT C IS THE INTERIOR OF C̄
Theorem 3.23 λ is strictly feasible if and only if λ ∈ int C̄. (In other words, C = int C̄.)
Proof. (i) If λ is strictly feasible, then it can be written as λ = Σi p̄i x^i where p̄i > 0, ∀i, and
Σi p̄i = 1. Let p̄0 be the probability corresponding to the all-0 IS, and p̄k be the probability of the
IS ek , k = 1, 2, . . . , K. Let d0 = min{p̄0 /K, mink p̄k } > 0. We claim that for any λ′ that satisfies
|λ′k − λk | ≤ d0 , ∀k, (3.52)
we have λ′ ∈ C̄. Indeed, if λ′ satisfies (3.52), we can find a probability distribution p̄′ such
that Σi p̄′i x^i_k = λ′k , ∀k. p̄′ can be constructed as follows: let p̄′0 = p̄0 − Σk (λ′k − λk ), p̄′k = p̄k +
(λ′k − λk ), and let the probabilities of all other ISs be the same as those in p̄. By condition (3.52), we
have p̄′ ≥ 0. Also, Σi p̄′i x^i_k = λ′k , ∀k.
Therefore, B(λ, d0 ) ⊆ C̄ where d0 > 0. So λ ∈ int C̄.
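The construction in part (i) can be traced on a minimal two-link conflict graph (our illustrative numbers, not from the book): perturb λ within d0 and rebuild a nonnegative distribution p̄′ that still sums to one and realizes the perturbed rates.

```python
# Independent sets of a two-link conflict graph: both off, only link 1, only link 2.
X = [(0, 0), (1, 0), (0, 1)]                  # x^0 (all-0 IS), x^1 = e1, x^2 = e2
p_bar = [0.4, 0.3, 0.3]                       # strictly positive distribution over ISs
K = 2
lam = tuple(sum(p * x[k] for p, x in zip(p_bar, X)) for k in range(K))   # (0.3, 0.3)

d0 = min(p_bar[0] / K, min(p_bar[1:]))        # d0 = 0.2 here
lam_p = (0.45, 0.2)                           # perturbed rates with |lam_p - lam| <= d0
delta = [lam_p[k] - lam[k] for k in range(K)]

# The proof's construction: shift mass between the all-0 IS and the e_k's.
p_new = [p_bar[0] - sum(delta)] + [p_bar[k + 1] + delta[k] for k in range(K)]

print(p_new, sum(p_new))                      # nonnegative, sums to one
```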
(ii) Assume that λ ∈ int C̄. We now construct a p > 0 such that λ = Σi pi x^i . First, choose
an arbitrary pI > 0 (such that Σi pI,i = 1) and let λI := Σi pI,i x^i . If it happens that
λI = λ, then λ is strictly feasible. In the following, we assume that λI ≠ λ. Since λ ∈ int C̄,
there exists a small-enough d > 0 such that λII := λ + d · (λ − λI ) ∈ C̄. So λII can be written
as λII = Σi pII,i x^i where pII ≥ 0 and Σi pII,i = 1.
Notice that λ = α · λI + (1 − α) · λII where α := d/(1 + d) ∈ (0, 1). So λ = Σi pi x^i
where pi := α · pI,i + (1 − α) · pII,i , ∀i. Since α > 0, 1 − α > 0 and pI,i > 0, pII,i ≥ 0, ∀i, we
have pi > 0, ∀i. Therefore, λ is strictly feasible. 2
Therefore,
g(r) = L(u∗ (r); r) = −F (r; λ).
We now check whether the Slater condition (8) (pages 226–227) is satisfied. Since all the
constraints in (3.15) are linear, we only need to check whether there exists a feasible u which is
in the relative interior (8) of the domain D0 of the objective function −Σi ui log(ui ), which is
D0 = {u | ui ≥ 0, ∀i, Σi ui = 1}. Since λ = Σi p̄i · x^i where p̄i > 0, ∀i, and Σi p̄i = 1, letting
u = p̄ satisfies the requirement. Therefore, the Slater condition is satisfied. As a result, there exist
(finite) optimal dual variables r ∗ ≥ 0 which attain the minimum of g(r), that is,
Remark 1: The above proof also shows that (3.5) is the dual problem of (3.15).
Remark 2: Another way to show Theorem 3.8 is as follows. With the optimal (finite) dual
variables r ∗ , we know that u∗i (r ∗ ), ∀i, solves problem (3.15). Therefore, u∗i (r ∗ ), ∀i, are feasible for
problem (3.15). As a result, Σi (u∗i (r ∗ ) · x^i_k ) = sk (r ∗ ) ≥ λk , ∀k.
Remark 3: To see that the Slater condition is useful, consider the following example.
max_{u∈D0} − Σ_{i=1}^{2} ui log(ui )
s.t. u1 ≥ 1, (3.54)
where D0 = {u|u1 , u2 ≥ 0, u1 + u2 = 1}. Here, the Slater condition is not satisfied because the
only feasible u in D0 is u = (1, 0)T , which is not in the relative interior of D0 .
The dual function in this case is g(r) = log(e^r + 1) − r > 0, which approaches 0 as r → +∞
but cannot attain that infimum. Therefore, there exists no finite optimal dual variable.
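The absence of a finite dual optimum is easy to see numerically: g(r) is positive and strictly decreasing, approaching its infimum 0 only as r → +∞. A one-line check (illustrative):

```python
import math

# Dual function of example (3.54): g(r) = log(e^r + 1) - r.
g = lambda r: math.log(math.exp(r) + 1.0) - r

vals = [g(r) for r in (0.0, 5.0, 20.0)]
print(vals)    # positive and strictly decreasing toward 0
```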
3.14 SUMMARY
This chapter starts with a description of the basic wireless scheduling problem in Section 3.1. The
problem is to schedule transmissions of interfering links to keep up with arrivals. The model of
interference is a conflict graph. We derive necessary and sufficient conditions for the existence of a
suitable schedule. In Section 3.2, we discuss a model of the CSMA protocol. This model assumes
no hidden nodes and also that the carrier sensing is instantaneous. The model results in a Markov
chain of the active independent set. The invariant distribution of that Markov chain is derived
in Lemma 3.5. This distribution has a product form. Section 3.3 introduces an idealized CSMA
algorithm that assumes that each link can estimate its arrival and service rate exactly. The key idea is
to minimize the KL-divergence between two distributions by using a gradient algorithm. Section 3.4
explains Algorithm 1 that uses the actual observations of the links (arrivals and transmissions). This
algorithm is a stochastic approximation version of the idealized algorithm. Section 3.5 elaborates on
the entropy-maximization property of the CSMA Markov chain. Section 3.6 explains Algorithm
1(b), a modification of Algorithm 1 to reduce delays. In that algorithm, the links inflate their arrival
rate. Simulation results are presented in Section 3.7. Section 3.8 sketches the proof of the convergence
of Algorithm 1. The details are in Section 3.9. In particular, the section derives a new bound on
the mixing time of CSMA Markov chains using a coupling argument. Then, Section 3.10 proves
the rate-stability of Algorithm 1. Section 3.12 explains the case when the transmission times have a
general distribution. That section provides a simple proof of the insensitivity of the CSMA Markov
chain. Finally, Section 3.13 collects a few technical proofs.
5 In this model, a transmission over a link from node m to node n is successful iff none of the one-hop neighbors of m and n is in
any conversation at the time.
6 In this model, a transmission over a link from node m to node n is successful iff neither m nor n is in another conversation at the
time.
above. Of particular interest is the CSMA/CA algorithm (Carrier Sense Multiple Access / Collision
Avoidance) widely deployed in the current IEEE 802.11 wireless networks.
In (18), Durvy and Thiran showed that asynchronous CSMA can achieve a high level of
spatial reuse, via the study of an idealized CSMA model without collisions. In (51), Marbach et al.
considered a model of CSMA with collisions. It was shown that under a restrictive “node-exclusive”
interference model, CSMA can be made asymptotically throughput-optimal in the limiting regime
of large networks with a small sensing delay. (Note that when the sensing delay goes to 0, collisions
asymptotically disappear.) In (61), Proutiere et al. developed asynchronous random-access-based
algorithms whose throughput performance, although not optimal, is no worse than that of some
maximal scheduling algorithms, e.g., Maximum Size scheduling algorithms.
However, none of these works have established the throughput optimality of CSMA under a
general interference model, nor have they designed specific algorithms to achieve the optimality.
CHAPTER 4
UTILITY MAXIMIZATION IN WIRELESS NETWORKS
L(u, s, f ; q) = − Σi ui log(ui ) + β Σ_{m=1}^{M} vm (fm )
+ Σ_{m,k: amk =1, k≠δ(m)} qkm (skm − s_{up(k,m),m} )
+ Σ_{m,k: k=δ(m)} qkm (skm − fm ) (4.2)
= − Σi ui log(ui )
+ β Σ_{m=1}^{M} vm (fm ) − Σ_{m,k: k=δ(m)} qkm fm
+ Σ_{k,m: amk =1} [skm · (qkm − q_{down(k,m),m} )],
where down(k, m) means flow m's downstream link of link k (note that down(up(k, m), m) = k).
If k is the last link of flow m, then define q_{down(k,m),m} = 0.
Fixing the vectors u and q first, we solve for skm in the sub-problem
max_s Σ_{k,m: amk =1} [skm · (qkm − q_{down(k,m),m} )]
s.t. skm ≥ 0, ∀k, m : amk = 1 (4.3)
Σ_{m: amk =1} skm ≤ Σi (ui · x^i_k ), ∀k.
The solution is easy to find (similar to (47) and related references therein) and is as follows.
At link k, denote zk := max_{m: amk =1} (qkm − q_{down(k,m),m} ). Then:
(i) If zk > 0, then for an m′ ∈ arg max_{m: amk =1} (qkm − q_{down(k,m),m} ), let skm′ = Σi (ui · x^i_k ) and let
skm = 0, ∀m ≠ m′ . In other words, link k serves a flow with the maximal back-pressure qkm −
q_{down(k,m),m} .
(ii) If zk ≤ 0, then let skm (j ) = 0, ∀m, i.e., link k does not serve any flow.
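Cases (i) and (ii) amount to a few lines of code at each link. The helper below is our sketch (function names and data layout are assumptions, not the book's notation): given dual variables and downstream links, it returns the per-flow service rates for the sub-problem (4.3) at one link.

```python
def schedule_link(k, flows, q, down, total_rate):
    """Solve sub-problem (4.3) at link k: serve the flow with the largest
    back-pressure q_km - q_{down(k,m),m} if that back-pressure is positive.
    q[(k, m)] holds the dual variables; down[(k, m)] is the downstream link of
    flow m (None if k is its last link). Illustrative helper, not the book's code."""
    def bp(m):
        d = down[(k, m)]
        return q[(k, m)] - (q[(d, m)] if d is not None else 0.0)
    if max(bp(m) for m in flows) <= 0:
        return {m: 0.0 for m in flows}            # case (ii): serve no flow
    best = max(flows, key=bp)                     # case (i): max back-pressure flow
    return {m: (total_rate if m == best else 0.0) for m in flows}

# Example: two flows share link 1; flow 'b' has the larger back-pressure.
q = {(1, 'a'): 3.0, (1, 'b'): 5.0, (2, 'a'): 2.5, (2, 'b'): 1.0}
down = {(1, 'a'): 2, (1, 'b'): 2}
print(schedule_link(1, ['a', 'b'], q, down, total_rate=1.0))
```

Since only the downstream neighbor's dual variable is needed, the decision is purely local, matching the distributedness remark below.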
Since the value of qdown(k,m),m can be obtained from a one-hop neighbor, this algorithm is
distributed. (In practice, the value of qdown(k,m),m can be piggybacked in the ACK packet in IEEE
802.11.)
62 4. UTILITY MAXIMIZATION IN WIRELESS NETWORKS
Plugging the solution of (4.3) back into (4.2), we get
L(u, f ; q) = [ − Σi ui log(ui ) + Σk (zk )+ ( Σi ui · x^i_k ) ]
+ [ β Σ_{m=1}^{M} vm (fm ) − Σ_{m,k: k=δ(m)} qkm fm ].
• Scheduling: In period j + 1, link k lets its TA be rk (j ) = [zk (j )]+ in the CSMA operation,
where zk (j ) = max_{m: amk =1} (qkm (j ) − q_{down(k,m),m} (j )). (The rationale is that, given z(j ), the
u∗ that maximizes L(u, f ; q(j )) over u is the stationary distribution of the CSMA Markov
chain with rk (j ) = [zk (j )]+ , similar to the proof of Theorem 3.14.) Choose a flow m′ ∈
arg max_{m: amk =1} (qkm (j ) − q_{down(k,m),m} (j )). When link k gets the opportunity to transmit, (i)
if zk (j ) > 0, it serves flow m′ (similar to Algorithm 1, the dummy packets transmitted by
link k, if any, are counted in skm′ (j )); (ii) if zk (j ) ≤ 0, it transmits dummy packets. These
dummy packets are not counted, i.e., let skm (j ) = 0, ∀m. Also, they are not put into any actual
queue at the receiver of link k. (A simpler alternative is that link k keeps silent if zk (j ) ≤ 0.
That case can be similarly analyzed following the method in Section 4.4.)
• Congestion control: For each flow m, if link k is its source link, the transmitter of link k lets
the flow rate in period j + 1 be fm (j ) = arg maxfˆm ∈[0,1] {β · vm (fˆm ) − qkm (j ) · fˆm }. (This
maximizes L(u, f ; q(j )) over f .)
• The dual variables qkm (maintained by the transmitter of each link) are updated (similar
to a subgradient algorithm). At time tj +1 , let qkm (j + 1) = [qkm (j ) − α · skm (j )]+ +
α · s_{up(k,m),m} (j ) if k ≠ δ(m); and qkm (j + 1) = [qkm (j ) − α · skm (j )]+ + α · fm (j ) if k =
δ(m). (By doing this, approximately qkm ∝ Qkm .)
Remark 1: As T → ∞ and α → 0, Algorithm 3 approximates the “ideal” algorithm that solves (4.1),
due to the convergence of the CSMA Markov chain in each period. A bound of the achievable utility
of Algorithm 3, compared to the optimal total utility W̄ defined in (4.4) is given in Section 4.4. The
bound, however, is not very tight since our simulations show good performance without a very large
T or a very small α.
Remark 2: In Section 4.2, we show that by using similar techniques, the adaptive CSMA algorithm
can be combined with optimal routing, anycast or multicast with network coding. So it is a modular
MAC-layer protocol which can work with other protocols in the transport layer and the network
layer.
Remark 3: Coincidentally, the authors of (72) implemented a protocol similar to Algorithm 3 using
802.11e hardware, and it shows superior performance compared to normal 802.11. There, according
to the backpressure, a flow chooses from a discrete set of contention windows, or “CW’s” (where a
smaller CW corresponds to a larger TA). We note, however, that unlike our work, (72) focuses only
on an implementation study, without theoretical analysis; therefore, the potential optimality
of CSMA is not shown in (72). Also, the CWs there are set in a more heuristic way.
subject to the same constraints as in (4.1). Assume that u = ū when (4.4) is solved. Also, assume
that in the optimal solution of (4.1), f = f̂ and u = û. We have the following bound.
Proof. Notice that H (u) = − Σi ui log(ui ), the entropy of the distribution u, is bounded. Indeed,
since there are N ≤ 2^K possible states, one has 0 ≤ H (u) ≤ log N ≤ log 2^K = K log 2.
Since in the optimal solution of problem (4.1), f = f̂ and u = û, we have H (û) +
β Σm vm (fˆm ) ≥ H (ū) + β W̄ . So β[ Σm vm (fˆm ) − W̄ ] ≥ H (ū) − H (û) ≥ −H (û) ≥ −K · log 2.
Also, clearly W̄ ≥ Σ_{m=1}^{M} vm (fˆm ), so (4.5) follows. 2
4.2 EXTENSIONS
Using derivations similar to Section 4.1, our CSMA algorithm can serve as a modular “MAC-
layer scheduling component” in cross-layer optimization, combined with other components in the
transport layer and network layer, usually with queue lengths as the shared information. For example,
in addition to its combination with congestion control (at the transport layer), we demonstrate in
this section its combination with optimal multipath routing, anycast and multicast (at the network
layer). Therefore, this is a joint optimization of the transport layer, network layer and the MAC layer.
4.2.1 ANYCAST
To make the formulation more general, let us consider anycast with multipath routing. (This includes
unicast with multipath routing as a special case.) Assume that there are M flows. Each flow m has
a source δ(m) (with some abuse of notation) which generates data and a set of destinations D(m)
which receive the data. “Anycast” means that it is sufficient for the data to reach any node in the
set D(m). However, there is no specific “path” for each flow. The data that the source generates is
allowed to split and traverse any link before reaching the destinations (i.e., multipath routing). This
allows for better utilization of the network resources by routing the data through less congested parts
of the network. For simplicity, we don’t consider the possibility of physical-layer multicast here, i.e.,
the effect that a node’s transmission can be received by multiple nodes simultaneously. That is, the
transmitter indicates the intended next node in the packet header and the other nodes discard that
packet.
In this case, it is more convenient to use a “node-based” formulation (47; 77). Denote the
number of nodes by J . For each node j , let I (j ) := {k|(k, j ) ∈ L}, where L is the set of links (it is
also the set V in the conflict graph), and let O(j ) := {k|(j, k) ∈ L}. Denote the rate of flow m on
link (j, l) by sjml . Then the (approximate) utility maximization problem, similar to (4.1), is
max_{u,s,f} − Σi ui log(ui ) + β · Σ_{m=1}^{M} vm (fm )
s.t. s^m_{jl} ≥ 0, ∀(j, l) ∈ L, ∀m
fm + Σ_{l∈I (j )} s^m_{lj} ≤ Σ_{l∈O(j )} s^m_{jl} , ∀m, j = δ(m)
Σ_{l∈I (j )} s^m_{lj} ≤ Σ_{l∈O(j )} s^m_{jl} , ∀m, j ≠ δ(m), j ∉ D(m)
Σi ui · x^i_{(j,l)} ≥ Σm s^m_{jl} , ∀(j, l) ∈ L
ui ≥ 0, Σi ui = 1.
Associate a dual variable q^m_j ≥ 0 to the 2nd and 3rd lines of constraints (for each m and
j ∉ D(m)), and define q^m_j = 0 if j ∈ D(m). (Note that there is no flow-conservation constraint for
flow m at each node in D(m).) Then, similar to Section 4.1, a partial Lagrangian is
L(u, s, f ; q) = − Σi ui log(ui )
+ β · Σm vm (fm ) − Σm q^m_{δ(m)} fm (4.6)
+ Σ_{(j,l)∈L, m} [s^m_{jl} · (q^m_j − q^m_l )].
First, fix u and q, and consider maximizing L(u, s, f ; q) over s, subject to s^m_{jl} ≥ 0 and
Σi ui · x^i_{(j,l)} ≥ Σm s^m_{jl} . For each link (j, l), let the maximal back-pressure be z(j,l) := maxm (q^m_j − q^m_l ). Then
clearly, if z(j,l) > 0, a flow m′ with q^{m′}_j − q^{m′}_l = z(j,l) should be served (with the whole rate
Σi ui · x^i_{(j,l)} ). If z(j,l) ≤ 0, then no flow is served. After we plug this solution of s back into (4.6), the rest of
the derivation is the same as in Section 4.1. Therefore, the distributed algorithm is as follows. We
again assume v_m′(0) ≤ V < +∞, ∀m.
Initially, assume that all queues are empty, and set q_j^m = 0, ∀j, m. Then iterate as follows. (Similar to Algorithm 3, the step size is α and the update interval is T. For simplicity, we omit the time index here.)
• CSMA scheduling and routing: If z_{(j,l)} > 0, link (j, l) lets r_{(j,l)} = z_{(j,l)} in the CSMA operation. Choose a flow m* with q_j^{m*} − q_l^{m*} = z_{(j,l)}. When the link gets the opportunity to transmit, serve flow m*. If z_{(j,l)} ≤ 0, then link (j, l) keeps silent. (Note that there is no replication of packets.)
• Congestion control: For each flow m, if node j is its source, then it sets f_m = arg max_{f_m∈[0,1]} {β·v_m(f_m) − q_j^m·f_m}.

• The dual variables q_j^m are updated as follows: q_j^m ← [q_j^m − α·∑_{l∈O(j)} s_{jl}^m]^+ + α·∑_{l∈I(j)} s_{lj}^m if j ≠ δ(m) and j ∉ D(m); and q_j^m ← [q_j^m − α·∑_{l∈O(j)} s_{jl}^m]^+ + α·(f_m + ∑_{l∈I(j)} s_{lj}^m) if j = δ(m). (By doing this, roughly q_j^m ∝ Q_j^m, where Q_j^m is the corresponding queue length.) Always let q_j^m = 0 if j ∈ D(m).
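As a concrete (and purely illustrative) sketch, the scheduling and dual-update steps above can be written as follows. The 3-node line network, the single flow, and the service/injection rates passed in are hypothetical placeholders, and the CSMA operation itself is abstracted into given average service rates:

```python
# Sketch of one update interval of the node-based algorithm.
# Hypothetical toy network: nodes 0 -> 1 -> 2, one flow from node 0 to node 2.
ALPHA = 0.1                              # step size alpha
LINKS = [(0, 1), (1, 2)]                 # the set L of directed links (j, l)
FLOWS = {0: {"src": 0, "dst": {2}}}      # flow m: source delta(m), destinations D(m)

# dual variables q[j][m] (proportional to queue lengths); q = 0 at destinations
q = {j: {m: 0.0 for m in FLOWS} for j in range(3)}

def backpressure(j, l):
    """Maximal back-pressure z_(j,l) = max_m (q_j^m - q_l^m) and its argmax flow."""
    z, best = 0.0, None
    for m in FLOWS:
        d = q[j][m] - q[l][m]
        if d > z:
            z, best = d, m
    return z, best                       # z <= 0 means the link keeps silent

def dual_update(s, f):
    """s[(j,l)][m]: average service rates over the interval; f[m]: injection rates."""
    for j in range(3):
        for m, flow in FLOWS.items():
            if j in flow["dst"]:
                q[j][m] = 0.0            # always zero at the destinations
                continue
            out = sum(s.get((a, b), {}).get(m, 0.0) for a, b in LINKS if a == j)
            inn = sum(s.get((a, b), {}).get(m, 0.0) for a, b in LINKS if b == j)
            if j == flow["src"]:
                inn += f[m]              # the source also counts injected traffic
            q[j][m] = max(q[j][m] - ALPHA * out, 0.0) + ALPHA * inn
```

For instance, with q_0^0 = 2 and q_1^0 = 0.5, link (0, 1) gets back-pressure 1.5 and would serve flow 0.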
Furthermore, the above algorithm can be readily extended to incorporate channel selection in multi-
channel wireless networks, with each “link” defined by a triplet (j, l; c), which refers to the logical
link from node j to l on channel c. In this scenario, the conflict graph is defined on the set of links
(j, l; c).
For anycast/multicast with network coding, each session m has a set of destinations D(m), and a dual variable q_j^{mp} is associated with each destination p ∈ D(m). The partial Lagrangian is then

L(u, s, f; q) = H(u) + β·∑_m v_m(f_m) − ∑_m ( ∑_{p∈D(m)} q_{δ(m)}^{mp} )·f_m + ∑_{(j,l)∈L, m, p∈D(m)} s_{jl}^{mp}·(q_j^{mp} − q_l^{mp}).   (4.7)
We first optimize L(u, s, f; q) over {s_{jl}^{mp}}, subject to 0 ≤ s_{jl}^{mp} ≤ s_{jl}^m. A solution is as follows: s_{jl}^{mp} = 0 for every p satisfying q_j^{mp} − q_l^{mp} ≤ 0, and s_{jl}^{mp} = s_{jl}^m for every p satisfying q_j^{mp} − q_l^{mp} > 0. Define the "back-pressure" of session m on link (j, l) as W_{jl}^m := ∑_{p∈D(m)} (q_j^{mp} − q_l^{mp})^+. By plugging the above solution into (4.7), we have

L(u, s, f; q) = H(u) + β·∑_m v_m(f_m) − ∑_m ( ∑_{p∈D(m)} q_{δ(m)}^{mp} )·f_m + ∑_{(j,l)∈L, m} s_{jl}^m·W_{jl}^m.   (4.8)
Now we optimize this expression over {s_{jl}^m}, subject to ∑_i u_i·x_{(j,l)}^i ≥ ∑_m s_{jl}^m. One can find that the rest is similar to the previous derivations. To avoid repetition, we directly write down the algorithm. Assume v_m′(0) ≤ V < +∞, ∀m.
Initially, assume that all queues are empty, and set q_j^{mp} = 0, ∀j, m, p. Then iterate:
• CSMA scheduling, routing, and network coding: Link (j, l) computes the maximal back-pressure z_{(j,l)} := max_m W_{jl}^m. If z_{(j,l)} > 0, then let r_{(j,l)} = z_{(j,l)} in the CSMA operation. Choose a session m* with W_{jl}^{m*} = z_{(j,l)}. When the link gets the opportunity to transmit, serve session m*. To do so, node j performs a random linear combination¹ of the head-of-line packets from the queues of session m* whose destinations p ∈ D(m*) satisfy q_j^{m*p} − q_l^{m*p} > 0, and transmits the coded packet (similar to (26)). The coded packet, after being received by node l, is replicated and put into the corresponding queues of session m* at node l (those with destinations p ∈ D(m*) such that q_j^{m*p} − q_l^{m*p} > 0). The destinations can eventually decode the source packets (26). If z_{(j,l)} = 0, then link (j, l) keeps silent.

¹We briefly explain how to perform a "random linear combination" of these packets; for more details, please refer to (26). (Note that our main focus here is to show how to combine CSMA scheduling with other network protocols, rather than network coding itself.) Initially, each packet generated by the source in each session is associated with an ID. Assume that each packet is composed of many "blocks", where each block has γ bits, so each block can be viewed as a number in the finite field F_{2^γ}, which has 2^γ elements. For each packet P to be combined, randomly choose a coefficient a_P ∈ F_{2^γ}. Denote the i'th block of packet P
• Congestion control: For each flow m, if node j is its source, then it sets f_m = arg max_{f_m∈[0,1]} {β·v_m(f_m) − (∑_{p∈D(m)} q_{δ(m)}^{mp})·f_m}.

• The dual variables q_j^{mp} are updated as follows: q_j^{mp} ← [q_j^{mp} − α·∑_{l∈O(j)} s_{jl}^{mp}]^+ + α·∑_{l∈I(j)} s_{lj}^{mp} if j ≠ δ(m) and j ≠ p, where p ∈ D(m); and q_j^{mp} ← [q_j^{mp} − α·∑_{l∈O(j)} s_{jl}^{mp}]^+ + α·(f_m + ∑_{l∈I(j)} s_{lj}^{mp}) if j = δ(m). (Note that each packet generated by the source j = δ(m) is replicated and enters the queues at the source for all destinations of session m.) By doing this, roughly q_j^{mp} ∝ Q_j^{mp}, where Q_j^{mp} is the corresponding queue length. Always let q_j^{mp} = 0 if j = p, where p ∈ D(m).
Note that both algorithms in Section 4.2 can be analyzed using the approach in Section 4.4 for
Algorithm 2.
4.3 SIMULATIONS
Figure 4.1 shows the network topology, where each circle represents a node. The nodes are arranged
in a grid for convenience, and the distance between two adjacent nodes (horizontally or vertically) is
1. Assume that the transmission range is 1, so that a link can only be formed by two adjacent nodes.
Assume that two links cannot transmit simultaneously if any node of one link is within a distance of 1.1 of a node of the other. (In IEEE 802.11, for example, DATA and ACK packets are transmitted in opposite directions. This model accounts for the interference between the two links in both directions, and is equivalent to the "two-hop interference model" in this network.) The paths of 3 multi-hop flows are plotted. The utility function of each flow is vm(fm) = log(fm + 0.01).
The weighting factor is β = 3. (Note that the input rates are adjusted by the congestion control
algorithm instead of being specified as in the last subsection.)
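Since each v_m(f_m) = log(f_m + 0.01) is concave, the congestion-control step f_m = arg max_{f∈[0,1]} {β·v_m(f) − q·f} has a closed form, obtained by setting the derivative β/(f + 0.01) − q to zero and clipping to [0, 1]. A small sketch (the specific q values in the usage note are illustrative, not taken from the simulation):

```python
def source_rate(beta, qm, eps=0.01):
    """argmax over f in [0, 1] of beta*log(f + eps) - qm*f (a concave objective)."""
    if qm <= 0:
        return 1.0                  # no queue pressure: objective increases on [0, 1]
    f = beta / qm - eps             # stationary point: beta / (f + eps) = qm
    return min(max(f, 0.0), 1.0)    # clip to the feasible interval [0, 1]
```

With β = 3, a moderate dual value q = 30 gives f = 0.09, while a very large q (here q ≥ β/eps = 300) shuts the source off, matching the argument in Section 4.4 that the source stops sending once q_{δ(m),m} ≥ β·V.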
Figure 4.2 shows the evolution of the flow rates, using Algorithm 3 with T = 5 ms and α = 0.23. We see that they become relatively constant after an initial convergence period. By directly solving (4.4) centrally, we find that the theoretical optimal rates for the three flows are 0.11, 0.134, and 0.134 (data units/ms), very close to the simulation results. The queue lengths are also stable (in fact, uniformly bounded, as proved in Section 4.4).
(Footnote 1, continued:) …as P(i). Then the corresponding block in the coded packet Z is computed as Z(i) = ∑_P a_P·P(i), where the multiplication and summation are in the field F_{2^γ}, and the summation is over all the packets being combined. Clearly, each packet in the network is a linear combination of some source packets. The IDs of these source packets and the corresponding coefficients are included in the packet header, and are updated after each linear combination along the path (so that the destinations can decode the source packets).
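The block-wise combination in footnote 1 can be sketched concretely for γ = 8 (so each block is one byte, an element of F_{2^8}). The field multiplication below reduces modulo x⁸ + x⁴ + x³ + x + 1 (the AES polynomial), which is our own arbitrary choice of irreducible polynomial, not one prescribed by the text:

```python
def gf256_mul(a, b):
    """Multiply two elements of F_{2^8}, reducing by x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a                  # addition in F_{2^gamma} is bitwise XOR
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B               # reduce modulo the chosen polynomial
    return r

def combine(packets, coeffs):
    """Coded packet Z with blocks Z(i) = sum_P a_P * P(i), computed in F_{2^8}."""
    z = bytearray(len(packets[0]))
    for pkt, a_p in zip(packets, coeffs):
        for i, block in enumerate(pkt):
            z[i] ^= gf256_mul(a_p, block)
    return bytes(z)
```

In practice the coefficients a_P are drawn uniformly at random and carried in the packet header together with the source-packet IDs, exactly as the footnote describes.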
[Figure 4.1: The network topology (a grid of nodes) and the paths of the three multi-hop flows, labeled Flow 1, Flow 2, and Flow 3.]
4.4 PROPERTIES OF ALGORITHM 3

Proof. According to Algorithm 3, the source of flow m solves f_m(j) = arg max_{f_m∈[0,1]} {β·v_m(f_m) − q_{δ(m),m}(j)·f_m}. It is easy to see that if q_{δ(m),m}(j) ≥ β·V, then f_m(j) = 0, i.e., the source stops sending data; thus, q_{δ(m),m}(j+1) ≤ q_{δ(m),m}(j). If q_{δ(m),m}(j) < β·V, then q_{δ(m),m}(j+1) ≤ q_{δ(m),m}(j) + α < β·V + α. Since initially q_{km}(0) = 0, ∀k, m, by induction, we have

q_{δ(m),m}(j) ≤ β·V + α, ∀j, m.   (4.9)
In Algorithm 3, no matter whether flow m has the maximal back-pressure at link k, the
actual average service rate skm (j ) = 0 if bkm (j ) ≤ 0. That is, skm (j ) > 0 only if bkm (j ) > 0. Since
Figure 4.2: Flow rates with joint scheduling and congestion control (the flow rates of Flows 1–3, in data units/ms, versus time in ms, ×10⁴).
s_{km}(j) ≤ 1, by item 3 of Algorithm 3, q_{down(k,m),m}(j+1) ≤ q_{down(k,m),m}(j) + α and q_{km}(j+1) ≥ q_{km}(j) − α. Then, if b_{km}(j) > 0, we have b_{km}(j+1) ≥ b_{km}(j) − 2α > −2α. If b_{km}(j) ≤ 0, then b_{km}(j+1) ≥ b_{km}(j). Since b_{km}(0) = 0, by induction, we have
where C̄ is the set of feasible service rates (including C and its boundary).
By this inequality and (4.11),

∑_k {r_k(j)·E_j[s_k(j)]} ≥ max_{μ∈C̄} ∑_k [r_k(j)·μ_k] − K·log(2) − K·C·C₁/T.   (4.12)
Define q̃_{km}(j) := q_{km}(j)/α and r̃_k(j) := r_k(j)/α. Then, according to Algorithm 3, q̃_{km}(j) evolves as the "backlog" in (57), i.e.,

q̃_{km}(j+1) = [q̃_{km}(j) − s_{km}(j)]^+ + f_m(j).

Also, r̃_k(j) is equivalent to the maximal back-pressure in (57), defined as [max_m {q̃_{km}(j) − q̃_{down(k,m),m}(j)}]^+. Finally,
(that is, if for each j the policy achieves a "weight" within D of the maximal weight), and it chooses f_m(j) such that

f_m(j) = arg max_{f̂_m∈[0,1]} { (V/2)·v_m(f̂_m) − q̃_{δ(m),m}(j)·f̂_m },   (4.16)

then

lim inf_{J→∞} ∑_m v_m(f̄_m(J)) ≥ W̄ − (2D + BK)/V
where f̄_m(J) := (1/J)·∑_{j=0}^{J−1} E[f_m(j)] is the expected average rate of flow m up to the J'th period, W̄ is the maximal total utility that can be achieved, and

B = (1/K)·∑_{k=1}^{K} [ (R_k^max + μ_{max,k}^in)² + (μ_{max,k}^out)² ],

where R_k^max is the maximal flow input rate at link k, and μ_{max,k}^in, μ_{max,k}^out are the maximal rates at which link k can receive and transmit.
With Algorithm 3, we have R_k^max = μ_{max,k}^in = μ_{max,k}^out = 1, so B = 5. Also, by comparing (4.13)–(4.16), we have V = 2β/α and D = [K·log(2) + K·C·C₁/T]/α. Using the above corollary, we have

lim inf_{J→∞} ∑_m v_m(f̄_m(J)) ≥ W̄ − ( 2[K·log(2) + K·C·C₁/T]/α + 5K ) / (2β/α)
 = W̄ − ( K·log(2) + K·C·C₁/T + 5αK/2 ) / β.   (4.17)

As expected, when T → ∞ and α → 0, this bound matches the bound in Proposition 4.2. Also, as β → ∞, α → 0, and T → ∞ in a proper way (since C and C₁ depend on β), lim inf_{J→∞} ∑_m v_m(f̄_m(J)) → W̄.
Also, in view of the dynamics of q_{km}(j) in Algorithm 3, the actual queue lengths satisfy Q_{km}(j) ≤ (T/α)·q_{km}(j), ∀k, m, j. Therefore,

Q_{km}(j) ≤ (T/α)·[β·V + (2L − 1)·α].   (4.18)

So all queue lengths are uniformly bounded. The bound increases with T and β, and decreases with α.
The bounds (4.17) and (4.18), however, are not very tight. Our simulations show near-optimal total utility without a very large β or T, or a very small α, which leads to moderate queue lengths.
4.5 SUMMARY
In this chapter, we have developed fully distributed cross-layer algorithms for utility maximization in
wireless networks. First, we combined admission control (at the transport layer) with the A-CSMA
scheduling algorithm (at the MAC layer) to approach the maximal utility (Sections 4.1, 4.3, and 4.4).
Since the flows can traverse multiple hops, the transmission aggressiveness of each link is based
on the maximal back-pressure instead of the queue length as in the last chapter (which focused on
one-hop flows).
Then we further showed that A-CSMA is a modular MAC-layer component that can work
seamlessly with other protocols in the network layer and transport layer (Section 4.2). For example,
in addition to admission control, it was further combined with optimal routing, anycast and multicast
with network coding.
A key to the design of these algorithms is a modification of the usual utility maximization
problem. In particular, instead of maximizing the utility, we maximize the sum of an entropy and
a weighted utility. By doing this, we can not only obtain the suitable CSMA parameters given the
flow rates (as in the last chapter), but we can also control the flow rates to arbitrarily approximate
the maximal utility (by using a large weight on the utility).
4.6 RELATED WORKS
CHAPTER 5

Distributed CSMA Scheduling with Collisions
We have shown in Chapter 4 that an adaptive CSMA (Carrier Sense Multiple Access) distributed
algorithm (Algorithm 1) can achieve the maximal throughput in a general class of wireless networks.
However, that algorithm needs an idealized assumption that the sensing time is negligible, so that
there are no collisions. In this chapter, we study more practical CSMA-based scheduling algorithms
with collisions. First, in Section 5.2, we provide a discrete-time model of this CSMA protocol and
give an explicit throughput formula, which has a simple product-form due to the quasi-reversibility
structure of the model. Second, in Section 5.3, we show that Algorithm 1 in Chapter 3 can be
extended to approach throughput optimality in this case. Finally, sufficient conditions are given to
ensure the convergence and stability of the proposed algorithm.
To combine the scheduling algorithm (with collisions) with congestion control, we follow an
approach similar to the one we used in Chapter 4. The details of the combination are given in (32).
To achieve throughput-optimality even with collisions, we need to limit the impact of colli-
sions. Our basic idea is to use a protocol similar to the RTS/CTS mode of IEEE 802.11 where we
let each link fix its transmission probability but adjust its transmission time (or length) to meet the
demand. In the absence of hidden nodes, collisions only occur among the small RTS packets but
not the data packets. Also, the collision probability is limited since we fix the transmission probabil-
ities. These two key factors combined ensure a limited impact of collisions. When the transmission
lengths are large enough, the protocol intuitively approximates the idealized-CSMA.
However, to precisely model and compute the service rates of the CSMA protocol with collisions, and to prove the throughput-optimality of our algorithms, we need to handle two difficulties. First, the Markov chain used to model the CSMA protocol is no longer time-reversible. Second, the resulting stationary distribution, although in a product form, is no longer a Markov random field.
Finally, it is worth noting that an interesting by-product of our general CSMA model devel-
oped in this chapter is the unification of several known models for slotted-ALOHA, wireless LAN
(as in Bianchi (4)) and the idealized-CSMA model. Indeed, we believe that the general CSMA
model captures some essence of random access algorithms.
5.2.1 MODEL
We now describe the basic CSMA/CA protocol with fixed transmission probabilities, which suffices
for our later development. Let σ̃ be the duration of each minislot. (In IEEE 802.11a, for example,
σ̃ = 9μs.) In the following, we will simply use slot to refer to the minislot.
The conflicting relationships among the links are represented by a conflict graph, as defined in
Chapter 1. In particular, it assumes that the conflict relationship among any two links is symmetric.
Assume that all links are saturated, i.e., always have packets to transmit. In each slot, if the transmitter
of link i is not already transmitting and if the medium is idle, the transmitter of link i starts
transmitting with probability pi (also denote qi := 1 − pi ). If at a certain slot, link i did not choose
to transmit but a conflicting link starts transmitting, then link i keeps silent until that transmission
ends. If conflicting links start transmitting at the same slot, then a collision happens, and assume
that all the involved links lose their packets.
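A slot-level sketch of this protocol follows (hypothetical toy parameters; for simplicity, transmission lengths are fixed, and "success time" here counts the whole successful transmission, overhead included):

```python
import random

def simulate(edges, p_tx, payload, gamma, K, slots, seed=0):
    """Simulate the slotted CSMA/CA model: an idle, unblocked link starts
    transmitting w.p. p_k; conflicting links that start in the same slot
    collide for gamma slots; a lone starter succeeds for `payload` slots.
    Returns, per link, the fraction of slots spent in successful transmissions."""
    rng = random.Random(seed)
    conflict = {k: {j for a, b in edges for j in (a, b)
                    if k in (a, b) and j != k} for k in range(K)}
    busy = [0] * K            # remaining slots of the current activity (0 = idle)
    good = [False] * K        # whether the current activity is a success
    success = [0] * K
    for _ in range(slots):
        # links with an ongoing conflicting transmission are blocked, keep silent
        starters = [k for k in range(K)
                    if busy[k] == 0
                    and all(busy[j] == 0 for j in conflict[k])
                    and rng.random() < p_tx[k]]
        for k in starters:
            collided = any(j in conflict[k] for j in starters)
            busy[k] = gamma if collided else payload
            good[k] = not collided
        for k in range(K):
            if busy[k] > 0:
                if good[k]:
                    success[k] += 1
                busy[k] -= 1
    return [s / slots for s in success]
```

Two always-transmitting conflicting links collide forever, while an isolated always-transmitting link succeeds all the time, as the model predicts.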
There are some limitations in the above model. First, we have assumed that the conflicting relationships between links are symmetric, which does not always hold. Consider the example in Fig. 5.1: if link 1 and link 2 start transmitting in the same slot, link 1's packet is corrupted, but link 2's packet could be successfully received (since the interference from link 1's transmission is weak).¹ Second, we have implicitly assumed that the networks do not have hidden nodes, so that all conflicting links can hear each other. (For more discussion of the hidden-node problem and possible ways to address it, please refer to (28) and its references.) Considering asymmetry and hidden nodes would significantly complicate the analysis and is an interesting direction for future research.
[Figure 5.1: Two links, Link 1 and Link 2, with an asymmetric conflict relationship.]
Each link transmits a short probe packet with length γ (similar to the RTS packet in 802.11)
before the data is transmitted. (All “lengths” here are measured in number of slots and are assumed
to be integers.) Using such a probe increases the overhead of successful transmissions, but it can
avoid collisions of long data packets. When a collision happens, only the probe packets collide, so
each collision lasts precisely γ slots. Assume that a successful transmission of link i lasts τi , which
1 Note that this kind of asymmetry does not occur in the idealized CSMA model since there is no collision there.
and assume that the p.m.f. has finite support, i.e., P_i(b) = 0, ∀b > b_max > 0. Then the mean of τ_i is

T_i := E(τ_i) = ∑_{b=1}^{b_max} b·P_i(b).   (5.2)
Fig. 5.2 illustrates the timeline of the 3-link network in Fig. 2.1, where link 1 and 2 conflict,
and link 2 and 3 conflict.
We note a subtle point in our modeling. In IEEE 802.11, a link can attempt to start a
transmission only after it has sensed the medium as idle for a constant time (which is called DIFS,
or “DCF Inter Frame Space”). To take this into account, DIFS is included in the packet transmission
length τi and the collision length γ . In particular, for a successful transmission of link i, DIFS is
included in the constant overhead τ . Although DIFS, as part of τ , is actually after the payload, in
Fig. 5.2, we plot τ before the payload. This is for convenience and does not affect our results. So,
under this model, a link can attempt to start a transmission immediately after the transmissions of
its conflicting links end.
The above model possesses a quasi-reversibility property that will lead to a simple throughput
formula. Our model, in Fig. 5.2, reversed in time, follows the same protocol as described above, except
for the order of the overhead and the payload, which are reversed. A key reason for this property
is that the collisions start and finish at the same time. (This point will be made more precise in
Section 5.6.)
Figure 5.2: Timeline in the basic model (in this figure, τ_i = T_i, i = 1, 2, 3, are constants).
5.2.2 NOTATION
Let the “on-off state” be x ∈ {0, 1}K where xk , the k-th element of x, is such that xk = 1 if link k is
active (transmitting) in state x, and xk = 0 otherwise. Thus, x is a vector indicating which links are
active in a given slot. Let G(x) be the subgraph of G after removing all vertices (each representing a
link) with state 0 (i.e., any link j with xj = 0) and their associated edges. In general, G(x) is composed
of a number of connected components (simply called “components”) Cm (x), m = 1, 2, . . . , M(x)
(where each component is a set of links, and M(x) is the total number of components in G(x)). If a
component Cm (x) has only one active link (i.e., |Cm (x)| = 1), then this link is having a successful
transmission; if |Cm (x)| > 1, then all the links in the component are experiencing a collision. Let
the set of “successful” links in state x be S(x) := {k|k ∈ Cm (x) with |Cm (x)| = 1}, and the set
of links that are experiencing collisions be φ(x). Also, define the “collision number” h(x) as the
number of components in G(x) with size larger than 1. Fig. 5.3 shows an example. Note that the
transmissions in a collision component Cm (x) are “synchronized”, i.e., the links in Cm (x) must have
started transmitting in the same slot, and they will end transmitting in the same slot after γ slots
(the length of the probe packets).
Figure 5.3: An example conflict graph (each square represents a link). In this on-off state x, links 1, 2,
5 are active. So S(x) = {5}, φ(x) = {1, 2}, h(x) = 1.
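S(x), φ(x), and h(x) can be computed by a connected-components pass over the conflict graph restricted to the active links. The edge list in the usage below is a hypothetical graph chosen to be consistent with the caption of Fig. 5.3 (links 1 and 2 in conflict, link 5 isolated among the active links); it is not the book's exact figure:

```python
def classify(active, edges):
    """Given the active links (x_k = 1) and conflict-graph edges, return
    (S, phi, h): successful links, colliding links, and the collision number."""
    adj = {k: set() for k in active}
    for a, b in edges:
        if a in active and b in active:
            adj[a].add(b)
            adj[b].add(a)
    seen, S, phi, h = set(), set(), set(), 0
    for k in active:
        if k in seen:
            continue
        comp, stack = set(), [k]         # DFS over one component C_m(x) of G(x)
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        if len(comp) == 1:
            S |= comp                    # a lone active link transmits successfully
        else:
            phi |= comp                  # |C_m(x)| > 1: all its links collide
            h += 1                       # one more collision component
    return S, phi, h
```

For example, `classify({1, 2, 5}, [(1, 2), (2, 3), (3, 4), (4, 5)])` reproduces the caption's S(x) = {5}, φ(x) = {1, 2}, h(x) = 1.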
where b_k is the total length of the current packet that link k is transmitting, and a_k is the duration to-date of the transmission in progress.
For example, in Fig. 5.4, the states w and w′ are
For convenience we also use the following notation. Let x(w) be the on-off state in w. In
state w, if link k is off, denote wk = 0; if link k is on, let wk = 1 and denote by bk (w), ak (w) the
corresponding bk , ak .
[Figure 5.4: The detailed states w and w′ of three links along the time axis; b marks a packet length, measured in minislots.]
p(x) = (1/E)·γ^{h(x)}·( ∏_{k∈S(x)} T_k )·∏_{i: x_i=0} (1 − p_i)·∏_{j: x_j=1} p_j
 = (1/E)·γ^{h(x)}·( ∏_{k∈S(x)} T_k )·∏_{i=1}^{K} p_i^{x_i}·q_i^{1−x_i}   (5.6)

where q_i := 1 − p_i, T_i is the mean transmission length of link i (as defined in (5.2)), and E is a normalizing term such that ∑_{x∈{0,1}^K} p(x) = 1.²
where g(x) := γ^{h(x)}·∏_{i=1}^{K} p_i^{x_i}·q_i^{1−x_i} does not depend on r, and the normalizing term is
²In this chapter, several kinds of "states" are defined. With a slight abuse of notation, we often use p(·) to denote the probability of some "state" under the stationary distribution of the CSMA/CA Markov chain. This does not cause confusion, since the meaning of p(·) is clear from its argument.
³In (6), a similar model for a CSMA/CA network is formulated by analogy to a loss network (39). However, since (6) studied the case where the links are unsaturated, an explicit expression for the stationary distribution was difficult to obtain.
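For a small conflict graph, the product form (5.6) can be evaluated by brute-force enumeration of the on-off states x ∈ {0,1}^K. A minimal sketch, with hypothetical parameters:

```python
from itertools import product

def stationary(K, edges, p, T, gamma):
    """Stationary probabilities p(x) of (5.6), normalized over x in {0,1}^K."""
    def split(active):
        # successful links and collision number of G(x) restricted to `active`
        adj = {k: set() for k in active}
        for a, b in edges:
            if a in active and b in active:
                adj[a].add(b)
                adj[b].add(a)
        seen, succ, h = set(), [], 0
        for k in active:
            if k in seen:
                continue
            comp, stack = set(), [k]
            while stack:
                v = stack.pop()
                if v not in comp:
                    comp.add(v)
                    stack.extend(adj[v] - comp)
            seen |= comp
            if len(comp) == 1:
                succ.append(k)
            else:
                h += 1
        return succ, h

    w = {}
    for x in product([0, 1], repeat=K):
        active = {k for k in range(K) if x[k]}
        succ, h = split(active)
        weight = gamma ** h                       # gamma^{h(x)}
        for k in succ:
            weight *= T[k]                        # product of T_k, k in S(x)
        for k in range(K):
            weight *= p[k] if x[k] else 1 - p[k]  # p_i^{x_i} * q_i^{1-x_i}
        w[x] = weight
    E = sum(w.values())                           # normalizing term
    return {x: v / E for x, v in w.items()}
```

For two conflicting links with p = 0.5, T = 2, and γ = 1, the unnormalized weights are 0.25, 0.5, 0.5, and 0.25, so, e.g., the lone-success states each get probability 1/3.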
Then, the probability that link k is transmitting a payload in a given slot is

s_k(r) = [ T₀·exp(r_k) / (τ + T₀·exp(r_k)) ] · ∑_{x: k∈S(x)} p(x; r).   (5.9)

Recall that the capacity of each link is 1. Also, it is easy to show that the CSMA/CA Markov chain is ergodic. As a result, if r is fixed, the long-term average throughput of link k converges to the stationary probability s_k(r). So we say that s_k(r) ∈ [0, 1] is the service rate of link k.
s_k(r*) = λ_k, ∀k.   (5.10)

where

L(r; λ) = ∑_k λ_k·r_k − log(E(r)),   (5.12)

with E(r) defined in (5.8). This is because ∂L(r; λ)/∂r_k = λ_k − s_k(r), ∀k.
where α(i) > 0 is the step size in period i, and λ_k(i), s_k(i) are the empirical average arrival rate and service rate in period i (i.e., the actual amounts of arrived and served traffic in period i, divided by M). Note that λ_k(i), s_k(i) are random variables which are generally not equal to λ_k and s_k(r(i−1)). Also, h(·) is a "penalty function", defined below, that keeps r(i) in a bounded region. (This is a "softer" approach than directly projecting r_k(i) onto the interval [r_min, r_max]; its purpose is only to simplify the proof of Theorem 5.4 later.) One defines

h(y) = { r_min − y, if y < r_min;  0, if y ∈ [r_min, r_max];  r_max − y, if y > r_max. }   (5.14)
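A per-link sketch of the resulting iteration follows; the update mirrors (5.17) with ϵ = 0, and the numeric bounds and rates are placeholders:

```python
RMIN, RMAX = 0.0, 5.0       # the fixed bounds r_min, r_max

def penalty(y):
    """The penalty function h(y) of (5.14): pulls y back toward [RMIN, RMAX]."""
    if y < RMIN:
        return RMIN - y
    if y > RMAX:
        return RMAX - y
    return 0.0

def update_r(r_prev, alpha, lam_emp, s_emp):
    """One period of the per-link update:
    r(i) = r(i-1) + alpha(i) * [lambda_k(i) - s_k(i) + h(r(i-1))]."""
    return r_prev + alpha * (lam_emp - s_emp + penalty(r_prev))
```

When the empirical arrivals exceed the empirical service, r_k (and hence the mean transmission length T₀·exp(r_k)) increases, matching the intuition that a backlogged link should transmit more aggressively.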
Remark: An important point here is that, as in the previous chapters, we let link k send dummy packets
when its queue is empty. So, each link is saturated. This ensures that the CSMA/CA Markov chain
has the desired stationary distribution in (5.6). The transmitted dummy packets are also included in
the computation of sk (i). (Although the use of dummy packets consumes bandwidth, it simplifies
our analysis, and does not prevent us from achieving the primary goal, i.e., approaching throughput-
optimality.)
In period i+1, given r(i), we need to choose τ_k^p(i), the payload length of each link k, so that E(τ_k^p(i)) = T_k^p(i) := T₀·exp(r_k(i)). If T_k^p(i) is an integer, then we let τ_k^p(i) = T_k^p(i); otherwise, we randomize τ_k^p(i) as follows:

τ_k^p(i) = ⌈T_k^p(i)⌉ with probability T_k^p(i) − ⌊T_k^p(i)⌋, and τ_k^p(i) = ⌊T_k^p(i)⌋ with probability ⌈T_k^p(i)⌉ − T_k^p(i).   (5.15)
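The randomization in (5.15) is ordinary randomized rounding: the payload length is one of the two integers bracketing T_k^p(i), with probabilities chosen so that the mean is exactly T_k^p(i). A sketch:

```python
import math
import random

def payload_length(T, rng=random):
    """Integer payload length tau with E[tau] = T, per (5.15)."""
    lo, hi = math.floor(T), math.ceil(T)
    if lo == hi:
        return lo                              # T is already an integer
    # ceil(T) with probability T - floor(T), floor(T) otherwise
    return hi if rng.random() < T - lo else lo
```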
Intuitively speaking, Algorithm 4 says that when rk ∈ [rmin , rmax ], if the empirical arrival
rate of link k is larger than the service rate, then link k should transmit more aggressively by using
a larger mean transmission length, and vice versa.
Algorithm 4 is parametrized by rmin , rmax , which are fixed during the execution of the algo-
rithm. Note that the choice of rmax affects the maximal possible payload length. Also, as discussed
below, the choices of rmax and rmin also determine the “capacity region” of Algorithm 4.
We define the region of arrival rates C(r_min, r_max) as the set of λ whose corresponding r*(λ) lies within the bounds, where r*(λ) denotes the unique solution of max_r L(r; λ) (such that s_k(r*) = λ_k, ∀k, by Theorem 5.2). Later, we show that the algorithm can "support" any λ ∈ C(r_min, r_max) in a certain sense, under certain conditions on the step sizes.
Clearly, C(r_min, r_max) → C as r_min → −∞ and r_max → ∞, where C is the set of all strictly feasible λ (by Theorem 5.2). Therefore, although for given (r_min, r_max) the region C(r_min, r_max) is smaller than C, one can choose (r_min, r_max) to approach the maximal capacity region C arbitrarily closely. Also, there is a tradeoff between the capacity region and the maximal packet length, which is unavoidable given the fixed overhead per packet and the collisions.
The complete proof of Theorem 5.4 is given in Section 5.6.3, but the result can be understood intuitively as follows. If the step size is small (in part (i), α(i) becomes small when i is large), r is "quasi-static", so that, roughly, the service rate averages (over multiple periods) to s_k(r) and the arrival rate averages to λ_k. Thus, the algorithm solves the optimization problem (5.11) by a stochastic-approximation (7) argument, such that r(i) converges to r* in part (i), and r(i) is near r* with high probability in part (ii).
Definition 5.5 Algorithm 4(b). Algorithm 4(b) is defined by the following update equation for each link k:

r_k(i) = r_k(i−1) + α(i)·[λ_k(i) + ϵ − s_k(i) + h(r_k(i−1))]   (5.17)

where ϵ > 0 is a small constant. That is, the algorithm "pretends" to serve the arrival rates λ + ϵ·1, which are slightly larger than the actual rates λ.

λ ∈ C(r_min, r_max, ϵ) := {λ | λ + ϵ·1 ∈ C(r_min, r_max)}.
Proof. The proof is similar to that of Theorem 5.4, and the details are given in (34). A sketch is as follows. Part (i) is similar to part (i) of Theorem 5.4. The extra fact that s_k(r*) > λ_k, ∀k, reduces the queue sizes compared to Algorithm 4 (since when a queue is large, it tends to decrease). Part (ii) holds because if we choose δ = ϵ/2, then by Theorem 5.4, lim inf_{J→∞} [∑_{i=1}^{J} s_k(i)/J] ≥ λ_k + ϵ − δ > λ_k, ∀k, almost surely if α is small enough. Then the result follows by showing that the queue sizes have negative drift. □
length” T0 = 15. The collision length (e.g., RTS length) is γ = η · 10, and the overhead of success-
ful transmission is τ = η · 20, where η is a “relative size” of the overhead for simulation purpose.
Later we will let η ∈ {1, 0.5, 0.2} to illustrate the effects of overhead size.
Now we vary ρ and η. In each case, we solve problem (5.11) to obtain the required mean payload lengths T_k^p := T₀·exp(r_k*), k = 1, 2, …, 7. Fig. 5.6(a) shows how the T_k^p's change as the load ρ changes, with η = 1. Clearly, as ρ increases, the T_k^p's tend to increase, and the rate of increase becomes faster as ρ approaches 1. Therefore, as mentioned before, there is a tradeoff between the throughput and the transmission lengths (long transmission lengths introduce larger delays for conflicting links). Fig. 5.6(b) shows how the T_k^p's depend on the relative size η of the overhead (with fixed ρ = 0.8 and η ∈ {1, 0.5, 0.2}). As expected, the smaller the overhead, the smaller the required T_k^p's.
Next, we evaluate Algorithm 4(b) in our C++ simulator. The update in (5.17) is performed every M = 500 slots. Let the step size be α(i) = 0.23/(2 + i/100), the upper bound r_max = 5, the lower bound r_min = 0, and the "gap" ϵ = 0.005. Assume the initial value of each r_k is 0.
Let the "load" of the arrival rates be ρ = 0.8 (i.e., λ = 0.8·λ̄), and the relative size of the overhead η = 0.5 (i.e., γ = 5, τ = 10). To show the negative drift of the queue lengths, assume that initially all queue lengths are 300 data units (where each data unit takes 100 slots to transmit). As expected, Fig. 5.7(a) shows the convergence of the mean payload lengths, and Fig. 5.7(b) shows that all queues are stable.
Figure 5.6: The required mean payload lengths T_k^p for links 1–7. (a) Relation with the load ρ (given η = 1); (b) relation with the overhead's relative size η (given ρ = 0.8).
transmission time. In the second step, we prove the theorem by summing the distribution of the CSMA/CA Markov chain over all the states with the same on-off links.
where

f_j(w) = { 1, if j ∈ φ(x(w));  P_j(b_j(w)), if j ∈ S(x(w)) },   (5.19)

where P_j(b_j(w)) is the p.m.f. of link j's transmission length, as defined in (5.1). Also, K₀ is a normalizing term such that ∑_w π(w) = 1, i.e., all probabilities sum up to 1. Note that π(w) does not depend on the a_k's.
Figure 5.7: Simulation of Algorithm (5.17) (with the conflict graph in Fig. 5.5). (a) Convergence of the mean payload lengths for links 1–7; (b) stability of the queues.
Proof. Consider a transition from a valid state w to a valid state w′. Define the sets

A = {i | w_i = 0, a_i(w′) = 1}
B = {i | a_i(w) = b_i(w), a_i(w′) = 1}
C₀ = {i | w_i = w′_i = 0 and i is blocked}
C₁ = {i | w_i = w′_i = 0 and i is not blocked}
D = {i | a_i(w) < b_i(w), a_i(w′) = a_i(w) + 1, b_i(w′) = b_i(w)}
E = {i | a_i(w) = b_i(w), w′_i = 0}

By "i is blocked," we mean that in state w, link i has a neighbor that is transmitting a packet, and that transmission is not in its last time slot. As a result, link i cannot start a transmission in the next slot. In other words, link i has a neighbor which is in the same transmission in states w and w′.
A transition from w to w′ is possible if and only if every i belongs to A ∪ B ∪ ⋯ ∪ E. Then, the probability of a transition from w to w′ is
We now define a similar system. The only difference is that if a node is transmitting, its state
is (b, a) if the transmission will last b slots and the number of slots to go is a (including the current
one).
Consider a transition from state w′ to state w in the new system. This transition is possible if and only if every i belongs to A′ ∪ B′ ∪ ⋯ ∪ E′, where A′, …, E′ are defined similarly to A, …, E:

A′ = {i | w′_i = 0, a_i(w) = b_i(w)}
B′ = {i | a_i(w′) = 1, a_i(w) = b_i(w)}
C₀′ = {i | w_i = w′_i = 0 and i is blocked}
C₁′ = {i | w_i = w′_i = 0 and i is not blocked}
D′ = {i | a_i(w′) > 1, a_i(w) = a_i(w′) − 1, b_i(w) = b_i(w′)}
E′ = {i | a_i(w′) = 1, w_i = 0}
Claim 1: If the transition from state w to w′ is possible in the original system, then the transition from w′ to w is possible in the new system, and vice versa.
To prove this claim, note that A′ = E, B′ = B, D′ = D, and E′ = A. Also, C₀′ = C₀ and C₁′ = C₁. This is because if link i is in C₀, then there is a neighbor j which is in the same transmission in states w and w′, so link i is also in the set C₀′, and vice versa. As a result, C₀′ = C₀. Similarly, one can show that C₁′ = C₁.
If the transition from state w to w′ is possible in the original system, every i belongs to A ∪ B ∪ ⋯ ∪ E. By the above identities, every i also belongs to A′ ∪ B′ ∪ ⋯ ∪ E′, so the transition from w′ to w is possible in the new system. This completes the proof of Claim 1.
The probability of transition in the new system is
Claim 2:

π(w)·Q(w, w′) = π(w′)·Q̃(w′, w),   (5.20)

where

π(w) = K₀·∏_{i: w_i=0} q_i·∏_{i: w_i≠0} [p_i·f_i(w)].

In this expression, K₀ is a normalizing constant.
To prove this identity, consider a pair (w, w′) such that Q(w, w′) > 0, i.e., such that Q̃(w′, w) > 0. Then

{i | w_i = 0} = A ∪ C₀ ∪ C₁ and {i | w_i ≠ 0} = B ∪ D ∪ E.

Consequently,

π(w) = K₀·∏_{i∈A∪C₀∪C₁} q_i·∏_{i∈B∪D∪E} [p_i·f_i(w)].

Hence,

π(w)/Q̃(w′, w) = K₀·∏_{i∈C₀} q_i·∏_{i∈D} [p_i·f_i(w)].   (5.21)
Similarly,

{i | w′_i = 0} = C₀ ∪ C₁ ∪ E and {i | w′_i ≠ 0} = A ∪ B ∪ D.

Consequently,

π(w′) = K₀·∏_{i∈C₀∪C₁∪E} q_i·∏_{i∈A∪B∪D} [p_i·f_i(w′)].

Hence,

π(w′)/Q(w, w′) = K₀·∏_{i∈C₀} q_i·∏_{i∈D} [p_i·f_i(w′)].   (5.22)
For i ∈ D, one has b_i(w′) = b_i(w), so that the expressions in (5.21) and (5.22) agree. Therefore, Claim 2 holds.
Finally, we sum up equation (5.20) over all states w that can transit to w′ in the original system. By Claim 1, this is the same as summing over all states w that w′ can transit to in the new system. Therefore,

∑_w π(w)·Q(w, w′) = ∑_w π(w′)·Q̃(w′, w) = π(w′)·∑_w Q̃(w′, w) = π(w′).
Using Lemma 5.7, the probability of any on-off state x, as in Theorem 5.1, can be computed by summing the probabilities of all states w with the same on-off state x, using (5.18). Define the set of valid states B(x) := {w | the on-off state in w is x}. By Lemma 5.7, we have

p(x) = ∑_{w∈B(x)} π(w)
 = (1/E)·∑_{w∈B(x)} { ∏_{i: x_i=0} q_i·∏_{j: x_j=1} [p_j·f_j(w)] }
 = (1/E)·( ∏_{i: x_i=0} q_i·∏_{j: x_j=1} p_j )·∑_{w∈B(x)} ∏_{j: x_j=1} f_j(w)
 = (1/E)·( ∏_{i: x_i=0} q_i·∏_{j: x_j=1} p_j )·∑_{w∈B(x)} [ ∏_{j∈S(x)} P_j(b_j) ]   (5.23)
Now we compute the term ∑_{w∈B(x)} [∏_{j∈S(x)} P_j(b_j)]. Consider a state w = {x, ((b_k, a_k), ∀k: x_k = 1)} ∈ B(x). For k ∈ S(x), b_k can take any value in Z₊₊. For each fixed b_k, a_k can be any integer from 1 to b_k. For a collision component C_m(x) (i.e., with |C_m(x)| > 1), the remaining time of each link in the component, a^(m), can be any integer from 1 to γ. Then we have

∑_{w∈B(x)} [ ∏_{j∈S(x)} P_j(b_j) ]
 = ∏_{j∈S(x)} [ ∑_{b_j} P_j(b_j)·∑_{1≤a_j≤b_j} 1 ] · ∏_{m: |C_m(x)|>1} ( ∑_{1≤a^(m)≤γ} 1 )
 = ∏_{j∈S(x)} [ ∑_{b_j} b_j·P_j(b_j) ] · γ^{h(x)}
 = ( ∏_{j∈S(x)} T_j )·γ^{h(x)}   (5.24)
p((x, z); r) = (1/E(r)) · g(x, z) · exp( ∑ₖ zₖ rₖ )   (5.25)
where
g(x, z) = g(x) · τ^{|S(x)| − 1ᵀz} · T₀^{1ᵀz},   (5.26)
and 1ᵀz is the number of links that are transmitting the payload in state (x, z).
Clearly, this provides another expression of the service rate sₖ(r):
sₖ(r) = ∑_{(x,z)∈S: zₖ=1} p((x, z); r).   (5.27)
Step 2: Alternative Characterization of Feasible Rates
Now, we give alternative definitions of feasible and strictly feasible arrival rates to facilitate
our proof. We will show that these definitions are equivalent to Definition 3.1.
Let C¯CO be the set of feasible λ, where “CO” stands for “collision”.
The rationale of the definition is that if λ can be scheduled by the network, the fractions of time that the network spends in the detailed states must be non-negative and sum up to 1. (Note that (5.28) is the probability that link k is sending its payload under the given distribution of the detailed states.)
For example, in the network in Fig. 2.1, λ = (0.5, 0.5, 0.5) is feasible because (5.28) holds if
we let the probability of the detailed state (x = (1, 0, 1), z = (1, 0, 1)) be 0.5, the probability of the
detailed state (x = (0, 1, 0), z = (0, 1, 0)) be 0.5, and all other detailed states have probability 0.
C¯CO = C¯ (5.29)
CCO = C. (5.30)
Proof. We first prove (5.29). By definition, any λ ∈ C̄ can be written as λ = ∑_{σ∈X} p̄_σ σ, where X is the set of independent sets and p̄ = (p̄_σ)_{σ∈X} is a probability distribution, i.e., p̄_σ ≥ 0, ∑_{σ∈X} p̄_σ = 1. Now, we construct a distribution p over the states (x, z) ∈ S as follows. Let p((σ, σ)) = p̄_σ, ∀σ ∈ X, and let p((x, z)) = 0 for all other states (x, z) ∈ S. Then, clearly ∑_{(x,z)∈S} p((x, z)) · z = ∑_{σ∈X} p((σ, σ)) · σ = ∑_{σ∈X} p̄_σ σ = λ, which implies that λ ∈ C̄_CO. So,
C̄ ⊆ C̄_CO.   (5.31)
On the other hand, if λ ∈ C̄_CO, then λ = ∑_{(x,z)∈S} p((x, z)) · z for some distribution p over S. We define another distribution p̄ over X as follows. Let p̄_σ = ∑_{(x,z)∈S: z=σ} p((x, z)), ∀σ ∈ X. Then, λ = ∑_{(x,z)∈S} p((x, z)) · z = ∑_{σ∈X} ∑_{(x,z)∈S: z=σ} p((x, z)) σ = ∑_{σ∈X} p̄_σ σ, which implies that λ ∈ C̄. Therefore,
C̄_CO ⊆ C̄.   (5.32)
Step 3: Existence of r ∗
Assume that λ is strictly feasible. Consider the following convex optimization problem, where
the vector u can be viewed as a probability distribution over the detailed states (x, z):
max_u { H(u) + ∑_{(x,z)∈S} u_{(x,z)} · log(g(x, z)) }
s.t. ∑_{(x,z)∈S: zₖ=1} u_{(x,z)} = λₖ, ∀k
     u_{(x,z)} ≥ 0, ∑_{(x,z)} u_{(x,z)} = 1   (5.33)
where H(u) := ∑_{(x,z)∈S} [−u_{(x,z)} log(u_{(x,z)})] is the "entropy" of the distribution u.
Let rₖ be the dual variable associated with the constraint ∑_{(x,z)∈S: zₖ=1} u_{(x,z)} = λₖ, and let the vector r := (rₖ). We will show the following.
The Lagrangian of (5.33) is
L(u; r) = H(u) + ∑_{(x,z)∈S} u_{(x,z)} log(g(x, z)) + ∑ₖ rₖ [ ∑_{(x,z)∈S: zₖ=1} u_{(x,z)} − λₖ ].   (5.34)
So
∂L(u; r)/∂u_{(x,z)} = −log(u_{(x,z)}) − 1 + log(g(x, z)) + ∑_{k: zₖ=1} rₖ.
We claim that
u_{(x,z)}(r) := p((x, z); r), ∀(x, z) ∈ S   (5.35)
(cf. equation (5.25)) maximizes L(u; r) over u subject to u_{(x,z)} ≥ 0, ∑_{(x,z)} u_{(x,z)} = 1. Indeed, the partial derivative at the point u(r) is
∂L(u(r); r)/∂u_{(x,z)} = log(E(r)) − 1,
which is the same for all (x, z) ∈ S (since, given the dual variables r, log(E(r)) is a constant). Also, u_{(x,z)}(r) = p((x, z); r) > 0 and ∑_{(x,z)} u_{(x,z)}(r) = 1. Therefore, it is impossible to increase L(u; r) by slightly perturbing u around u(r) (subject to 1ᵀu = 1). Since L(u; r) is concave in u, the claim follows.
Denote l(y) := max_u L(u; y); then the dual problem of (5.33) is inf_y l(y). Plugging the expression of u_{(x,z)}(y) into L(u; y), it is not difficult to see that inf_r l(r) is equivalent to sup_r L(r; λ), where L(r; λ) is defined in (5.12).
Since λ is strictly feasible, it can be written as (5.28), where ∑_{(x,z)∈S} p̄((x, z)) = 1 and p̄((x, z)) > 0. Therefore, there exists u ≻ 0 (by choosing u = p̄) that satisfies the constraints in (5.33) and lies in the interior of the domain of the objective function. So, problem (5.33) satisfies the Slater condition (8). As a result, there exists a vector of (finite) optimal dual variables r* when problem (5.33) is solved. Also, r* solves the dual problem sup_r L(r; λ). Therefore, sup_r L(r; λ) is attained and can be written as max_r L(r; λ), as in (5.11).
Finally, the optimal solution u* of problem (5.33) is such that u*_{(x,z)} = u_{(x,z)}(r*), ∀(x, z) ∈ S. Also, u* is clearly feasible for problem (5.33). Therefore,
∑_{(x,z)∈S: zₖ=1} u*_{(x,z)} = sₖ(r*) = λₖ, ∀k.   (5.36)
94 5. DISTRIBUTED CSMA SCHEDULING WITH COLLISIONS
2
Remark: From (5.34) and (5.35), we see that a subgradient (in fact, the gradient) of the dual objective function L(r; λ) is
∂L(r; λ)/∂rₖ = λₖ − ∑_{(x,z)∈S: zₖ=1} u_{(x,z)}(r) = λₖ − sₖ(r).
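This gradient expression suggests a simple iterative scheme: move each rₖ in the direction λₖ − sₖ(r). The sketch below is only an illustration of that idea, not Algorithm 3 itself (which uses noisy online estimates of sₖ); it assumes a hypothetical single-link model whose exact service rate s(r) = eʳ/(1 + eʳ) is available in closed form:

```python
import math

def s(r):
    # Exact service rate of a toy single-link model: the link is
    # transmitting a fraction e^r / (1 + e^r) of the time.
    return math.exp(r) / (1.0 + math.exp(r))

def dual_subgradient(lam, steps=5000, alpha=0.01):
    """Deterministic subgradient iteration on the dual variable r:
    the update direction is lambda - s(r), as in the remark above."""
    r = 0.0
    for _ in range(steps):
        r += alpha * (lam - s(r))
    return r

r_star = dual_subgradient(0.7)
# At the fixed point, s(r*) = lambda, i.e. r* = log(lambda / (1 - lambda)).
```

In this toy model the fixed point can be checked analytically: s(r*) = λ gives r* = log(λ/(1 − λ)), which the iteration approaches monotonically for small step sizes.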
Suppose that r* is not unique; that is, there exist r*_I ≠ r*_II that are both optimal. Then r*_{I,k} ≠ r*_{II,k} for some k. This contradicts (5.36) and the uniqueness of u*. Therefore, r* is unique. This also implies that max_r L(r; λ) has a unique solution r*.
where f (rk (i − 1), Y (i − 1)) := λk − sk (i) + h(rk (i − 1)), and M(i) = λk (i) − λk is a martingale
noise.
To use Corollary 8 on page 74 of (7) to show Algorithm 3's almost-sure convergence to r*, the following conditions are sufficient:
(i) f(·, ·) is Lipschitz in the first argument, uniformly in the second argument. This holds by the construction of h(·).
(ii) The transition kernel of Y(i) is continuous in r(i). This is true due to the way we randomize the transmission lengths in (5.15).
(iii) (5.38) has a unique convergent point r*, which has been shown above.
(iv) With Algorithm 4, rₖ(i) is bounded ∀k, i almost surely. This is proved in Lemma 5.12 below.
(v) Tightness condition ((†) in (7), page 71): this is satisfied since Y(i) has a bounded state space (cf. conditions (6.4.1) and (6.4.2) in (7), page 76). The state space of Y(i) is bounded because sₖ(i) ∈ [0, 1] and w0(i) lies in a finite set (shown in Lemma 5.13 below).
Proof. We first prove the upper bound rmax + 2λ̄ by induction: (a) rk (0) ≤ rmax ≤ rmax + 2λ̄; (b)
For i ≥ 1, if rk (i − 1) ∈ [rmax + λ̄, rmax + 2λ̄], then h(rk (i − 1)) ≤ −λ̄. Since λk (i) − sk (i) ≤ λ̄,
we have rk (i) ≤ rk (i − 1) ≤ rmax + 2λ̄. If rk (i − 1) ∈ (rmin , rmax + λ̄), then h(rk (i − 1)) ≤ 0.
96 5. DISTRIBUTED CSMA SCHEDULING WITH COLLISIONS
Also since λk (i) − sk (i) ≤ λ̄ and α(i) ≤ 1, ∀i, we have rk (i) ≤ rk (i − 1) + λ̄ · α(i) ≤ rmax + 2λ̄.
If rk (i − 1) ≤ rmin , then
Proof. By Lemma 5.12, we know that rₖ(i) ≤ r_max + 2λ̄, ∀k, i, so Tₖ(i) ≤ T₀ exp(r_max + 2λ̄), ∀k, i. By (5.15), we have τₖ(i) ≤ T₀ exp(r_max + 2λ̄) + 1, ∀k, i. Therefore, in state w0(i) = {x, ((bₖ, aₖ), ∀k : xₖ = 1)}, we have bₖ ≤ b_max for a constant b_max and aₖ ≤ bₖ for any k such that xₖ = 1. So, w0(i) lies in a finite set. ∎
Part (ii): Proof of Theorem 5.4 with Constant Step Size
The intuition is the same as in part (i). That is, if the constant step size is small enough, then
the algorithm approximately solves problem maxr G(r; λ). Please refer to (34) for the full proof.
5.7 SUMMARY
The goal of this chapter was to define a CSMA algorithm, Algorithm 4, that achieves maximum
throughput in a network with collisions. The main idea is to let stations with a big backlog transmit
longer packets. In this protocol, the attempt probability is the same for all the stations and is constant.
The stations transmit a short request (similar to an RTS/CTS exchange in WiFi). Stations collide
if they start their requests during the same mini-slot. However, the collisions have a short (fixed) duration. Thus, as the packet transmissions become longer, the fraction of time that collisions waste becomes negligible.
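This last effect is easy to observe numerically. The following toy simulation (a single collision domain where every idle link attempts with the same probability in each mini-slot; the parameter values p = 0.2 and γ = 1 are illustrative, not taken from the protocol of this chapter) measures the fraction of time wasted by collisions as the payload length T grows:

```python
import random

def collision_fraction(T, p=0.2, gamma=1, n_links=2, horizon=200000, seed=1):
    """Fraction of time lost to collisions in a toy slotted model:
    exactly one attempt -> success of length T; two or more attempts ->
    collision of fixed length gamma; no attempt -> one idle mini-slot."""
    rng = random.Random(seed)
    t = wasted = 0
    while t < horizon:
        attempts = sum(rng.random() < p for _ in range(n_links))
        if attempts == 1:
            t += T          # successful payload transmission
        elif attempts > 1:
            t += gamma      # short, fixed-duration collision
            wasted += gamma
        else:
            t += 1          # idle mini-slot
    return wasted / t

frac_short = collision_fraction(T=2)
frac_long = collision_fraction(T=100)
# Longer payloads amortize the fixed collision cost over more useful time.
```

With two links and p = 0.2, a collision happens with probability 0.04 per decision point versus a success of length T with probability 0.32, so the wasted fraction shrinks roughly like γ/(0.32·T) as T grows.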
Section 5.2 describes the protocol with collisions and its model. Theorem 5.1 provides the
expression of link service rates. The main idea behind that result is the quasi-reversibility of the
CSMA/CA Markov chain with collisions. That property enables the development of all the results of the
chapter. Theorem 5.2 establishes the existence of transmission duration parameters that stabilize the
queues. Section 5.3 specifies Algorithm 4 in Definition 5.3, and its throughput-optimality is stated
in Theorem 5.4. Section 5.4 specifies Algorithm 4(b) in Definition 5.5. This algorithm is a version
of Algorithm 4 designed to reduce the delays. The capacity region of that algorithm is given by
Theorem 5.6. Section 5.5 discusses numerical examples that confirm the analytical results. Finally,
Section 5.6 gives the technical proofs.
CHAPTER 6
Stochastic Processing Networks
6.2 EXAMPLES
This section illustrates critical aspects of the scheduling of SPNs on simple examples. Figure 6.1
shows a SPN with one input activity (IA) represented by the shaded circle and four service activities
(SAs) represented by white circles. SA2 needs one part from queue 2 and produces one part that
leaves the network, similarly for SA4. SA3 needs one part from each of the queues 2, 3 and 4 and
produces one part that leaves the network. SA1 needs one part from queue 1 and produces one part
which is added to queue 4. Each SA takes one unit of time. There is a dashed line between two SAs
if they cannot be performed simultaneously. These conflicts may be due to common resources that
the SAs require. The parts arrive at the queues as follows: at even times, IA1 generates one part for
each of the queues 1, 2 and 3; at odd times, no part arrives.
One simple scheduling algorithm for this network is as follows. At time 0, buffer the parts
that arrive at queues 1, 2 and 3. At time 1, perform SA1 which removes one part from queue 1 and
adds one part to queue 4. At time 2, use the three parts in queue 2, 3, 4 to perform SA3 and buffer
the new arrivals. Repeat this schedule forever, i.e., perform SA1 and SA3 alternately. This schedule
makes the system stable.
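This hand-crafted schedule is easy to check in code. The sketch below (queues are 0-indexed; by convention, the arrivals of an even slot are buffered after that slot's SA runs) replays the alternating SA1/SA3 schedule and verifies that no activity ever consumes a missing part and that the queues stay bounded:

```python
def simulate_alternating(T=100):
    """Replay the schedule of the text: IA1 arrives at even times,
    SA1 runs at odd times, SA3 runs at even times t >= 2."""
    Q = [0, 0, 0, 0]                 # queues 1..4 (0-indexed)
    for t in range(T):
        if t % 2 == 0:
            if t >= 2:               # SA3: one part from each of queues 2, 3, 4
                Q[1] -= 1; Q[2] -= 1; Q[3] -= 1
            Q[0] += 1; Q[1] += 1; Q[2] += 1   # IA1 arrivals at even times
        else:                        # SA1: move one part from queue 1 to queue 4
            Q[0] -= 1; Q[3] += 1
        assert all(q >= 0 for q in Q)  # never consumes a missing part
    return Q
```

Running `simulate_alternating()` shows the queue vector cycling between (1, 1, 1, 0) and (0, 1, 1, 1), so every queue stays bounded by 1.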
Interestingly, the maximum weight algorithm (MWM) makes this system unstable (in a way similar to a counterexample in (12)). By definition, at each time, MWM schedules a feasible set of SAs with the maximum total weight, where the weights are computed from the queue lengths.
Figure 6.3 shows another SPN. IA1 produces one part for queue 1. IA2 produces one part
for queue 2 and one part for queue 3. The synchronized arrivals generated by IA2 correspond to
the ordering of a pair of parts, as one knows that such a pair is needed for SA2. This mechanism
eliminates the difficulty encountered in the example of Figure 6.2. In Figure 6.3, we say that each IA is the "source" of a "flow" of parts (as a generalization of a "flow" in data networks). SA1 and SA2
in this network conflict, as indicated by the dashed line between the SAs. Similarly, SA2 and SA3
conflict. One may consider the problem of scheduling both the IAs (ordering parts) and the SAs
to maximize some measure of performance. Our model assumes the appropriate ordering of sets of
parts to match the requirements of the SAs.
We explain the deficit maximum weight (DMW) scheduling algorithm on the example of
Figure 6.1. In that example, we saw that MWM is unstable because it starves SA3. Specifically,
MWM schedules SA2 and SA4 before the three queues can accumulate parts for SA3. The idea
of DMW is to pretend that certain queues are empty even when they have parts, so that the parts
can wait for the activation of SA3. The algorithm is similar to MWM, but the weight of each SA is
computed from the virtual queue lengths qₖ = Qₖ − Dₖ, ∀k. Here, Qₖ is the actual length of queue k, and Dₖ ≥ 0 is called the deficit.
DMW automatically finds the suitable values of the deficits Dk . To do this, DMW uses the
maximum-weighted schedule without considering whether there are enough input parts available.
When the algorithm activates a SA that does not have enough input parts in queue k, the SA produces
fictitious parts, decreases qk (which is allowed to be negative) and increases the deficit of queue k. This
algorithm produces the results in Table 6.1 where each column gives the values of q and D after the
activities in a slot. For deficits, only D4 is shown since the deficits of all other queues are 0. In the table,
SA0 means that no SA is scheduled because all the weights of the activities are non-positive. Note
that when SA3 is activated for the first time, queue 4 is empty: Q4 = 0. Therefore, q4 is decreased to
-1, D4 is increased to 1 and a fictitious part is produced. But since SA1 is activated simultaneously,
q4 becomes 0 after this slot. After that, the sequence (SA0+IA1, SA3+SA1) repeats forever and no
more fictitious parts are produced. The key observation is that, although the virtual queue lengths
are allowed to become negative, they remain bounded in this example. Consequently, with proper
D, the actual queue lengths Q = q + D are always non-negative, and thus the starvation problem
is avoided.
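The per-queue bookkeeping in this example can be sketched in a few lines. The function below is a sketch, assuming unit part counts and the updates qₖ(t+1) = qₖ(t) − μ_out + μ_in, Dₖ(t+1) = Dₖ(t) + [μ_out − Qₖ(t)]⁺, with the actual queue recovered from Qₖ = qₖ + Dₖ:

```python
def dmw_queue_update(q, Q, D, mu_out, mu_in):
    """One DMW slot for a single queue: the virtual queue may go negative;
    a scheduled departure from an empty actual queue produces a fictitious
    part and is absorbed by the deficit."""
    q_new = q - mu_out + mu_in
    D_new = D + max(mu_out - Q, 0)
    Q_new = q_new + D_new          # maintains D = Q - q
    return q_new, Q_new, D_new

# The moment described above: SA3 drains queue 4 while it is empty
# (Q4 = 0), but SA1 simultaneously adds one part to it.
q4, Q4, D4 = dmw_queue_update(q=0, Q=0, D=0, mu_out=1, mu_in=1)
# The departure hits an empty queue, so D4 = 1 (one fictitious part),
# while the simultaneous SA1 output leaves q4 = 0 after the slot.
```

This reproduces the slot described above: the deficit absorbs the one fictitious part, and afterwards the real part from SA1 keeps the actual queue non-negative.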
(i) Each IA is the “source” of a flow of parts, like in Figure 6.3. (There are M IAs and M
flows.) In other words, the parts generated by one activation of IA m can be exactly served
by activating some SAs and eventually produce a number of products that leave the network,
without leaving any part unused in the network. (This will be made more formal later.) This
is a reasonable setup since the manufacturer knows how many input parts of each type are needed in order to produce a set of products and will order the input parts accordingly. Otherwise, there would be unconsumed parts, which clearly need not be ordered.
(iii) A SA n is associated with a set of input queues In and a set of output queues On . Due to
the way we define the queues in (ii), different flows are served by disjoint sets of SAs. (Even if
two SAs in different flows essentially perform the same task, we still label them differently.)
Also, a SA is defined only if it is used by some flow.
Each activation of IA m adds a_{k,m} parts to queue k. Define the input matrix A ∈ R^{K×M}, where A_{k,m} = −a_{k,m}, ∀m, k. Each activation of SA n consumes b_{k,n} parts from each queue k ∈ Iₙ (the "input set" of SA n), produces b_{k,n} parts that are added to each queue k ∈ Oₙ (the "output set" of SA n), and possibly produces a number of final products that leave the network. Assume that Iₙ ∩ Oₙ = ∅. Accordingly, define the service matrix B ∈ R^{K×N}, where B_{k,n} = b_{k,n} if k ∈ Iₙ, B_{k,n} = −b_{k,n} if k ∈ Oₙ, and B_{k,n} = 0 otherwise. Assume that all elements of A and B are integers. Also assume that the directed graph that represents the network has no cycle (see, for example, Fig. 6.1 and Fig. 6.3).
Let a(t) ∈ {0, 1}^M, t = 0, 1, . . . , be the "arrival vector" in slot t, where aₘ(t) = 1 if IA m is activated and aₘ(t) = 0 otherwise. Let λ ∈ R^M be the vector of average arrival rates. Let x(t) ∈ {0, 1}^N be the "service vector" in slot t, where xₙ(t) = 1 if SA n is activated and xₙ(t) = 0 otherwise. Let s ∈ R^N be a vector of (average) service rates.
Point (i) above means that there exists sₘ ∈ R^N such that
Aₘ + B · sₘ = 0.   (6.1)
Therefore, for any activation rate λₘ > 0 of flow m, there exists sₘ ∈ R^N such that
Aₘ · λₘ + B · sₘ = 0.   (6.2)
The vector sₘ is the service rate vector for flow m that can exactly serve λₘ. This is a reasonable assumption, as discussed in point (i). We also assume that sₘ is unique given λₘ, i.e., there is only one way to serve the arrivals. We expect that this assumption usually holds in practice. Summing up (6.2) over m gives
A · λ + B · s = 0,   (6.3)
where s = ∑ₘ sₘ ≻ 0.² Note that s ≻ 0 because a SA is defined only if it is used by some flow, and λ ≻ 0. Also, since each flow is associated with a separate set of queues and SAs, equation (6.3) implies (6.2) for all m as well.
By assumption, given any λ, there exists a unique s satisfying (6.3), so we also write s in (6.3)
as s(λ).
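The flow conservation condition (6.3) can be checked directly on the network of Fig. 6.1. The sketch below assumes, for illustration, unit part counts b_{k,n} = 1 (the sign conventions follow the definitions above), and verifies that arrivals at rate λ = 0.5 are exactly served by alternating SA1 and SA3, i.e. s = (0.5, 0, 0.5, 0):

```python
# Input matrix A (one IA) and service matrix B for Fig. 6.1 with unit
# part counts: consuming is positive in B, producing is negative;
# arrivals are negative in A. Rows = queues 1..4, columns = SA1..SA4.
A = [-1, -1, -1, 0]                 # IA1 feeds queues 1, 2, 3
B = [[ 1, 0, 0, 0],                 # SA1 consumes from queue 1
     [ 0, 1, 1, 0],                 # SA2 and SA3 consume from queue 2
     [ 0, 0, 1, 0],                 # SA3 consumes from queue 3
     [-1, 0, 1, 1]]                 # SA1 feeds queue 4; SA3, SA4 drain it

def balance(lam, s):
    """Entries of A*lambda + B*s; an all-zero result is (6.3)."""
    return [A[k] * lam + sum(B[k][n] * s[n] for n in range(4))
            for k in range(4)]

rates = balance(0.5, [0.5, 0.0, 0.5, 0.0])
```

Note that in this unit-count illustration SA2 and SA4 end up unused (their rates are 0), which is why the alternating SA1/SA3 schedule of Section 6.2 suffices; the text's assumption s ≻ 0 corresponds to networks where every defined SA is actually used by some flow.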
Due to resource-sharing constraints among the SAs, not all SAs can be performed simultaneously at a given time. Assuming that all queues have enough parts so that any SA can be performed, let x̃ ∈ {0, 1}^N be a feasible service vector, and let X be the set of such x̃'s. (We also call x̃ an independent set, since the active SAs in x̃ can be performed without conflicts.) Denote by Conv(X) the convex hull of X, i.e.,
Conv(X) := { s | ∃p ⪰ 0 : ∑_{x̃∈X} p_{x̃} = 1, s = ∑_{x̃∈X} p_{x̃} · x̃ },
and let Conv(X)° be its interior. (That is, for any s ∈ Conv(X)°, there is a ball B̃ centered at s with radius r > 0 such that B̃ ⊂ Conv(X).)
Remarks
• If λ is strictly feasible, then by definition s(λ) ≻ 0 and, therefore, λ ≻ 0. This does not cause any loss of generality: if some λₘ = 0, then our DMW algorithm never activates any SA in this flow, so the flow can be disregarded from the model.
• In a more general setting, the output parts of a certain SA can split and go to more than one output set. The split can be random or deterministic. For example, in a hospital, after
a patient is diagnosed, he goes to a certain room based on the result. A probabilistic model
for this is that the patients go to different rooms with certain probabilities after the SA (i.e.,
² In this chapter, the relationship a ≻ b, where a, b ∈ R^K, means that aᵢ > bᵢ for i = 1, 2, . . . , K. Similarly, a ⪰ b means that aᵢ ≥ bᵢ for i = 1, 2, . . . , K.
the diagnosis). The split can also be deterministic. For example, in manufacturing, the output
parts of a SA may be put into two different queues alternately.
In both cases, we can define the element Bk,n in the matrix B to be the average rate that
SA n consumes (or adds) parts from (to) queue k. However, note that in the random case, it
may not be feasible to stabilize all queues by any algorithm even if there exist average rates
satisfying (6.2). Fig. 6.2 described earlier is such an example. For simplicity, here we mainly
consider networks without splitting.
where, as defined earlier, a(t) is the vector of actual arrivals in slot t (where the m’th element
am (t) corresponds to IA m). In this chapter, x ∗ (t) and x ∗ (q(t)) are interchangeable.
Expression (6.6) can also be written as
where μ_{out,k}(t) and μ_{in,k}(t) are the numbers of parts coming out of and into virtual queue k in slot t, expressed below. (We use v⁺ and v⁻ to denote the positive and negative parts of v. That is, v⁺ = max{0, v} and v⁻ = max{0, −v}, so that v = v⁺ − v⁻.)
μ_{out,k}(t) = ∑_{n=1}^{N} [B_{k,n}]⁺ x*ₙ(t)
μ_{in,k}(t) = ∑_{m=1}^{M} [A_{k,m}]⁻ aₘ(t) + ∑_{n=1}^{N} [B_{k,n}]⁻ x*ₙ(t).
(iv) Update of actual queues Q (t) and deficits D(t): If SA n is scheduled in slot t but there
are not enough parts in some of its input queues (or some input parts are fictitious, further
explained below), SA n is activated as a null activity. Although the null activity n does not
actually consume or produce parts, parts are removed from the input queues and fictitious parts
are added to the output queues as if SA n was activated normally. So the actual queue length
Dk (t + 1) = Qk (t + 1) − qk (t + 1). (6.8)
The proof that DMW achieves maximum throughput consists of the following steps. First,
Lemma 6.3 shows how the deficits get updated. Second, Lemma 6.4 shows that the algorithm
is optimal if q(t) is bounded. Theorems 6.5 and 6.6 provide sufficient conditions for q(t) to be
bounded.
We first derive a useful property of Dk (t).
Lemma 6.3 (Deficit Update) Dₖ(t) is non-decreasing in t and satisfies
Dₖ(t + 1) = Qₖ(t + 1) − qₖ(t + 1)
  = [Qₖ(t) − μ_{out,k}(t)]⁺ − [qₖ(t) − μ_{out,k}(t)]
  = Qₖ(t) − μ_{out,k}(t) + [μ_{out,k}(t) − Qₖ(t)]⁺ − [qₖ(t) − μ_{out,k}(t)]
  = Dₖ(t) + [μ_{out,k}(t) − Qₖ(t)]⁺,   (6.9)
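The one-slot identity (6.9) can also be checked mechanically. The sketch below assumes the updates Qₖ(t+1) = [Qₖ(t) − μ_out]⁺ + μ_in and qₖ(t+1) = qₖ(t) − μ_out + μ_in (consistent with the derivation above) and brute-forces the identity over random instances:

```python
import random

def one_slot(Q, q, mu_out, mu_in):
    # Actual queue is truncated at zero; virtual queue is not.
    return max(Q - mu_out, 0) + mu_in, q - mu_out + mu_in

rng = random.Random(0)
for _ in range(1000):
    Q, D = rng.randint(0, 5), rng.randint(0, 5)
    q = Q - D                                   # definition (6.8)
    mu_out, mu_in = rng.randint(0, 5), rng.randint(0, 5)
    Q1, q1 = one_slot(Q, q, mu_out, mu_in)
    assert Q1 - q1 == D + max(mu_out - Q, 0)    # identity (6.9)
    assert Q1 - q1 >= D                         # deficits never decrease
```

The μ_in terms cancel in Qₖ(t+1) − qₖ(t+1), which is why they do not appear in (6.9).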
Proof.
Part (i): Note that since ||q(t)||² ≤ G, we have −G′ ≤ qₖ(t) ≤ G′, ∀k, t, where G′ := √G. We claim that Dₖ(t) ≤ G′ + μ_out, ∀k, t, where μ_out is the maximum number of parts that could leave a queue in one slot. By the definition of the DMW algorithm, Dₖ(t) is non-decreasing in t, and initially Dₖ(0) = 0.
Suppose, to the contrary, that Dₖ(t) exceeds G′ + μ_out for some k and t. Then there exists t′, the first time that Dₖ(t′) exceeds G′ + μ_out; in other words, Dₖ(t′) > G′ + μ_out and Dₖ(t′ − 1) ≤ G′ + μ_out.
By (6.9) and (6.8), we have
In Section 6.4.1, we will show that q(t) is bounded under certain conditions on the arrivals.
By Proposition 6.4, Q (t) is bounded and the maximal throughput is achieved.
is the vector of average arrival rates in the l'th time window of length T. In other words, there exists a large enough time window T such that ãₗ is "uniformly" strictly feasible.
Remark: Note that ãl can be very different for different l’s. That is, ãl , l = 0, 1, . . . do not
need to be all close to a certain strictly feasible λ.
Theorem 6.5 Under Condition 1, q(t) is bounded for all t. Therefore, (i) and (ii) in Proposition 6.4
hold.
Theorem 6.6 With the “almost constant” arrivals, q(t) is bounded for all t.
Proof. Since ∑_{τ=0}^{t−1} aₘ(τ) = ⌈λₘ · t⌉, we have |∑_{τ=0}^{t−1} aₘ(τ) − λₘ · t| ≤ 1, ∀t.
So
|∑_{τ=l·T}^{(l+1)·T−1} aₘ(τ)/T − λₘ| = (1/T) · |[∑_{τ=0}^{(l+1)·T−1} aₘ(τ) − λₘ · (l + 1)T] − [∑_{τ=0}^{l·T−1} aₘ(τ) − λₘ · lT]| ≤ 2/T.
Since λ is strictly feasible, there exists σ > 0 such that λ + 2σ · 1 and λ − 2σ · 1 are feasible vectors of arrival rates. Choose T ≥ 2/σ; then |∑_{τ=l·T}^{(l+1)·T−1} aₘ(τ)/T − λₘ| ≤ σ, ∀m. Therefore, ãₗ + σ · 1 and ãₗ − σ · 1 are feasible. ∎
For simplicity, also assume that the random variables {am (t), m = 1, 2, . . . , M, t = 0, 1, 2, . . . } are
independent. (This assumption, however, can be easily relaxed.) Suppose that the vector λ is strictly
feasible.
In general, this arrival process does not satisfy the smoothness condition (although, when T is large, ∑_{τ=t}^{t+T−1} a(τ)/T is close to λ with high probability). With such arrivals, it is not difficult to show that q(t) is stable, but it may not be bounded. As a result, the deficits D(t) may increase
to show that q(t) is stable, but it may not be bounded. As a result, the deficits D(t) may increase
without bound. In this case, we show that the system is still “rate stable”, in the sense that in the
long term, the average output rates of the final products converge to the optimum output rates (with
probability 1). The intuitive reason is that as D(t) becomes very large, the probability of generating
fictitious parts approaches 0.
Theorem 6.7 With the arrival process defined above, the system is “rate stable”.
the utility function as um (fm ) := vm (fm ) − cm fm . Let f ∈ RM be the vector of input activation
rates. Assume that vm (·) is increasing and concave. Then um (·) is concave. The joint scheduling and
congestion control algorithm (or “utility maximization algorithm”) works as follows.
where V > 0 is a constant, and Am is the m’th column of A. Then, update the virtual queues as
Since fₘ(q(t)) in general is not an integer, we let aₘ(t) = Fₘ(t + 1) − Fₘ(t), where Fₘ(t) := ⌊∑_{τ=0}^{t−1} fₘ(q(τ))⌋. And we update the actual queues in the same way as (6.7).
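This conversion from fractional admission rates to integer arrivals can be sketched as follows (the cumulative sum is floored, so the running number of admitted parts never drifts more than one part away from the running sum of rates):

```python
import math

def integer_arrivals(rates):
    """a(t) = F(t+1) - F(t), with F(t) the floor of the cumulative
    fractional rates up to slot t-1; each a(t) is a non-negative integer."""
    total, F_prev, arrivals = 0.0, 0, []
    for f in rates:
        total += f
        F = math.floor(total)
        arrivals.append(F - F_prev)
        F_prev = F
    return arrivals

a = integer_arrivals([0.75] * 4)   # cumulative sums 0.75, 1.5, 2.25, 3.0
```

For a constant rate 0.75 the cumulative floors are 0, 1, 2, 3, so the integer arrivals are 0, 1, 1, 1: three parts admitted over four slots, matching the total rate.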
Theorem 6.8 With the above algorithm, q(t) and Q (t) are bounded. Also, there are at most a finite
number of null activities which do not affect the long term throughput.
6.6 EXTENSIONS
In the above, we have assumed that each activity lasts one slot, for ease of exposition. Our algorithms can be extended to the case where different activities have different durations, under a particular assumption: each activity can be suspended in the middle and resumed later. If so, we can still use the above algorithm, which re-computes the maximum weight schedule in each time slot. The only difference is that the activities performed in one time slot may not be completed at the end of the slot; instead, they are suspended and continued in later slots.
(The above assumption was also made in the “preempted” networks in (12). There, whenever a new
schedule is computed, the ongoing activities are suspended, or “preempted”.)
In this case, the algorithms are adapted in the following way. The basic idea is the same as
before. That is, we run the system according to the virtual queues q(t). Let the elements in matrices
A and B be the average rates of consuming (or producing) parts per slot from (or to) different
queues. Even if an activity is not completed in one slot, we still update the virtual queues q(t)
according to the above average rates. That is, we view the parts in different queues as fluid, and q(t)
reflects the amount of fluid at each queue. However, only when an activity is completed are the actual parts removed from the input queues or added to the output queues. Note that when an activity is suspended, all
parts involved in the activity are frozen and are not available to other activities. When there are not
enough parts in the queues to perform a scheduled activity, fictitious parts are used instead (and the
corresponding deficits are increased).
On the other hand, if each activity cannot be suspended in the middle once it is started,
then one possible scheme is to use long time slots in our algorithms. In slot t, each SA n with x*ₙ(t) = 1 is activated as many times as possible. When each slot is very long, the wasted time during the slot becomes negligible, so the algorithm approximates the maximal throughput (at the cost of longer delay). Without using long slots, the non-preemptive version of the maximal-pressure algorithm proposed in (12) is not throughput-optimal in general, but it is throughput-optimal under certain resource-sharing constraints (12).
6.7 SIMULATIONS
6.7.1 DMW SCHEDULING
We simulate a network similar to Fig. 6.1 but with a different input matrix A and service matrix B
below.
A = [ −3  −2  −1  0 ]ᵀ,   B = ⎡  1  0  0  0 ⎤
                              ⎢  0  1  1  0 ⎥
                              ⎢  0  0  1  0 ⎥
                              ⎣ −1  0  2  1 ⎦
[Two figures: simulated trajectories over the first 100 time slots.]
6.8 SUMMARY
In this chapter, we have discussed the problem of achieving the maximal throughput and utility
in SPNs. First, we explained through examples (in Section 6.2) that scheduling in SPNs is more challenging than in wireless networks because performing a service activity not only requires resources, as in wireless networks, but also requires the availability of all necessary input parts. As a result, the well-known Maximum Weight Scheduling may not stabilize the queues.
[Figure: actual queue lengths Qₖ(t), k = 1, . . . , 5, over 2000 time slots.]
Proof. Let zm (t) and wn (t) be, respectively, the number of times that IA m and SA n have been
performed until time t. Write z(t) and w(t) as the corresponding vectors. Using q(0) = 0 and
equation (6.6), we have
q(t) = −A · z(t) − B · w(t). (6.15)
[Figure: deficits Dₖ(t), k = 1, . . . , 5, over 2000 time slots.]
d(t) = Bᵀq(t) = BᵀB · v ≠ 0.
A · λ + B · y = 0. (6.16)
where
x*(q) ∈ arg max_{x∈X} qᵀB · x.   (6.18)
Proof. Since y ∈ Conv(X)°, ∃σ > 0 such that y′ ∈ Conv(X) for any y′ satisfying ||y′ − y|| ≤ σ.
For any q̂ satisfying ||q̂|| = 1, q̂ ∈ B, by Lemma 6.10 we have d̂ := Bᵀq̂ ≠ 0. Also, ||Bᵀq̂|| ≥ σ̂ := min_{||q′||=1, q′∈B} ||Bᵀq′|| > 0. Choose ε̂ > 0 (which may depend on q̂) so that ||ε̂ · Bᵀq̂|| = σ. Then y + ε̂ · Bᵀq̂ ∈ Conv(X). Also, (6.18) implies that x*(q) ∈ arg max_{x∈Conv(X)} qᵀB · x. So, q̂ᵀB · x*(q̂) ≥ q̂ᵀB · [y + ε̂ · Bᵀq̂] = q̂ᵀB · y + ε̂ · ||Bᵀq̂||² ≥ q̂ᵀB · y + σ̂ · σ. Let δ := σ̂ · σ. Then
Consider any q ≠ 0. Let q̂ := q/||q||; then ||q̂|| = 1. Note that if x*(q̂) ∈ arg max_{x∈X} q̂ᵀB · x, then x*(q̂) ∈ arg max_{x∈X} qᵀB · x by linearity, so qᵀB · x*(q̂) = qᵀB · x*(q). Therefore, qᵀB · [x*(q) − y] = qᵀB · [x*(q̂) − y] = ||q|| · q̂ᵀB · [x*(q̂) − y] ≥ δ||q||, proving (6.17). If q = 0, then (6.17) holds trivially. ∎
Next, to analyze the queue dynamics, consider the Lyapunov function L(q(t)) = ||q(t)||². We have
Δ(q(t)) := L(q(t + 1)) − L(q(t))
  = ||q(t) − A · a(t) − B · x*(q(t))||² − ||q(t)||²
  = −q(t)ᵀA · a(t) − q(t)ᵀB · x*(q(t)) + ||A · a(t) + B · x*(q(t))||²
  ≤ −q(t)ᵀA · a(t) − q(t)ᵀB · x*(q(t)) + c   (6.20)
where μ_{k,in}, μ_{k,out} are, respectively, the maximum numbers of parts that can enter or leave queue k in one time slot.
where ãl is defined in (6.10). Then there exists δ̄ > 0 such that
Proof. By Condition 1, ∃σ > 0 such that for all l, ãₗ + σ · 1 and ãₗ − σ · 1 are feasible. Therefore, y(l · T) + s(σ · 1_M) ∈ Conv(X) and y(l · T) − s(σ · 1_M) ∈ Conv(X). Define σ′ > 0 to be the minimum element of s(σ · 1_M) ≻ 0; then y′ ∈ Conv(X) for any y′ satisfying ||y′ − y(l · T)|| ≤ σ′. (This is because the set Conv(X) is "comprehensive": if s ∈ Conv(X), then s′ ∈ Conv(X) for any 0 ⪯ s′ ⪯ s.) Then, following the proof of Lemma 6.11, letting δ̄ := σ̂ · σ′ (which does not depend on l or q) completes the proof. ∎
Lemma 6.14 Assume that the maximum change of any queue in one time slot is bounded by α, and that the absolute value of every element of A and B is bounded by b̄. Then
c2 := T · c + KT 2 α · (M + K)b̄.
L(q((l + 1)T)) − L(q(l · T)) ≤ − ∑_{τ=l·T}^{(l+1)T−1} q(τ)ᵀA · a(τ) − ∑_{τ=l·T}^{(l+1)T−1} q(τ)ᵀB · x*(q(τ)) + T · c.
where the last two steps have used (6.23) and condition (6.24). 2
Now Theorem 6.5 can be proved as follows.
Proof. Lemma 6.14 and Lemma 6.12 imply that q(l · T) is bounded for all l. Because each queue has bounded increments per slot, q(t) is bounded for all t. ∎
P(E) = πⱼ(E)   (6.27)
where πⱼ(·) is the stationary distribution on Rⱼ, and Rⱼ is the closed set of communicating states that q(t) eventually enters.
To show rate stability, consider two kinds of queues. Without loss of generality, let U be the set of queues whose deficits grow without bound. According to Proposition 6.4, the queues outside this set induce only a finite number of null activities.
Consider a queue k ∈ U. For any C > 0, since Dₖ(t) → ∞, there exists a finite time tₖ such that Dₖ(t) ≥ C, ∀t ≥ tₖ. For t ≥ tₖ, queue k induces null activities at slot t − 1 only when qₖ(t) < −Dₖ(t) ≤ −C. So the total number of null activities induced by queue k is not more than N · [tₖ + ∑_{t=tₖ}^{∞} I(qₖ(t) < −C)] ≤ N · [tₖ + ∑_{t=0}^{∞} I(qₖ(t) < −C)], since queue k induces at most N null activities in one time slot. Therefore, the average rate at which queue k induces null activities is
rₖ ≤ N · lim_{T→∞} (1/T) · [tₖ + ∑_{t=0}^{T−1} I(qₖ(t) < −C)] = N · Pr(qₖ < −C)   (6.28)
where the marginal probability on the RHS is induced by the stationary distribution πj (·) on the
set Rj which q(t) eventually enters. So limC→+∞ P r(qk < −C) = 0. Since (6.28) holds for any
C > 0, letting C → +∞ yields rk = 0.
Therefore, the average rate of null activities is 0 in the long term w. p. 1. Also, if we imagine
that the null activities produce real parts, then the output rates of the final products would be the
maximum since the virtual queues q(t) are stable. Combining the two facts concludes the proof.
Proof. Choose any f ∈ R^M and y ≻ 0 in Conv(X)° such that the flow conservation constraint is satisfied: A · f + B · y = 0, and |∑ₘ uₘ(fₘ)| < ∞, ∀m. The latter is feasible by letting fₘ = ε > 0, ∀m, where ε is small enough.
By Lemma 6.11, we have for any q ∈ B,
Therefore,
V · ∑_{m=1}^{M} uₘ(fₘ) + q(t)ᵀA · f ≤ V · ∑_{m=1}^{M} uₘ(fₘ(q(t))) + q(t)ᵀA · f(q(t)).
Since |∑ₘ uₘ(fₘ)| < ∞, we have
∑_{m=1}^{M} uₘ(fₘ(q(t))) − ∑_{m=1}^{M} uₘ(fₘ) ≤ ∑_{m=1}^{M} vₘ(1) − ∑_{m=1}^{M} uₘ(fₘ) ≤ C₁
for some positive constant C₁. So
−q(t)ᵀA · f(q(t)) ≤ −q(t)ᵀA · f + V · C₁.   (6.30)
Similar to (6.20), the Lyapunov drift in the algorithm is
Δ(q(t)) ≤ −q(t)ᵀA · f(q(t)) − q(t)ᵀB · x*(q(t)) + c.   (6.31)
Plugging (6.29) and (6.30) into (6.31) yields
Δ(q(t)) ≤ −q(t)ᵀA · f + V · C₁ − q(t)ᵀB · y − δ||q(t)|| + c
  = −q(t)ᵀ[A · f + B · y] − δ||q(t)|| + V · C₁ + c
  = −δ||q(t)|| + V · C₁ + c.
Using Lemma 6.12, the above implies that for all t,
L(q(t)) ≤ [(V · C₁ + c)/δ]² + V · C₁ + c.
So q(t) is bounded. ∎
Define q̃(0) = 0, and for t = 0, 1, . . . , define
q̃(t + 1) = q̃(t) − A · a(t) − B · x ∗ (t). (6.32)
Lemma 6.16 For all t, ||q̃(t) − q(t)|| ≤ Z for some constant Z > 0.
Dividing both sides by T · V and using L(q(T)) − L(q(0)) = L(q(T)) ≥ 0, one gets
∑_{t=0}^{T−1} ∑ₘ uₘ(fₘ(q(t)))/T ≥ U* − c/V.   (6.33)
Since uₘ(·) is concave, uₘ( ∑_{t=0}^{T−1} fₘ(q(t))/T ) ≥ ∑_{t=0}^{T−1} uₘ(fₘ(q(t)))/T. Using this and (6.33), and letting T → ∞, we obtain (6.14). ∎
APPENDIX A
Stochastic Approximation
Algorithm 1 and Algorithm 1(b) that we develop in this book belong to a family of stochastic
approximation algorithms. These algorithms are essentially gradient algorithms to minimize some
function, except that they use a noisy estimate of the gradient.
This chapter provides some background on stochastic approximation. In Section A.1, we
review the standard gradient algorithm. Section A.2 explains the stochastic approximation algorithm
and its convergence properties.
where {·}_D denotes the projection onto the set D. The projection of a vector x onto a closed set D is the closest point to x in that set, in the metric being considered. In our example with f(x) = x²/2, one has ∇f(x[m]) = f′(x[m]) = x[m], so the algorithm is

x[m + 1] = {x[m] − α_m x[m]}_D.
To illustrate the theorem, in our example with f (x) = x 2 /2, we use algorithm (A.4) with
αm = 1/(m + 1) in the case of decreasing step sizes, and αm = α = 0.1 in the case of constant step
size, both with the initial value x[0] = −2. The trajectories of {x[m]} are plotted in Fig. A.1 and
Fig. A.2.
[Figure A.1: trajectory of x[m] versus m (m = 0, …, 50), rising from x[0] = −2 toward x* = 0.]
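As a concrete illustration, the projected gradient iteration for this example can be sketched in a few lines of Python. The feasible interval D = [−2, 2] is an assumption made for illustration; the text only fixes the initial value x[0] = −2.

```python
# Projected gradient descent for the running example f(x) = x^2 / 2,
# where grad f(x) = x. The interval D = [-2, 2] is an assumed choice.

def project(x, lo=-2.0, hi=2.0):
    """Euclidean projection of x onto the interval D = [lo, hi]."""
    return max(lo, min(hi, x))

def gradient_descent(step, iters=50, x0=-2.0):
    """Iterate x[m+1] = { x[m] - alpha_m * grad f(x[m]) }_D."""
    x = x0
    trajectory = [x]
    for m in range(iters):
        x = project(x - step(m) * x)  # grad f(x) = x here
        trajectory.append(x)
    return trajectory

# Decreasing step sizes alpha_m = 1/(m+1), as in Fig. A.1/A.2.
decreasing = gradient_descent(lambda m: 1.0 / (m + 1))
# Constant step size alpha = 0.1.
constant = gradient_descent(lambda m: 0.1)
```

In this particular example the decreasing schedule happens to land on the minimizer x* = 0 in one step (α_0 = 1 cancels the whole gradient), while the constant step size contracts x geometrically toward 0, consistent with convergence to a neighborhood of x*.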
Proof. We give the proof of this result because it illustrates arguments that are typically used to
derive such results.
Denote g(m) := ∇f (x[m]). In the following, we use x[m] and x(m) interchangeably.
Proof of part (i)
Consider the Lyapunov function d(m) := (1/2)||x(m) − x*||², where || · || denotes the L2 norm.
By (A.3), we have

d(m + 1) ≤ (1/2)||[x[m] − α_m g(m)] − x*||²
         ≤ d(m) + α_m · [x* − x(m)]^T g(m) + α_m² C_g/2,    (A.5)

where the first inequality holds because the projection onto a convex set is “non-expansive” (8), that is, ||{y}_D − {z}_D|| ≤ ||y − z||, and the second inequality follows from (A.2).

[Figure A.2: trajectory of x[m] versus m (m = 0, …, 50), rising from x[0] = −2 toward x* = 0.]
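The non-expansiveness property can also be checked numerically. The sketch below uses the interval D = [−1, 1] as an illustrative convex set; this choice, and the sampling range, are assumptions for the demonstration, not from the text.

```python
import random

# Non-expansiveness of projection onto a convex set: for the interval
# D = [-1, 1] (a hypothetical convex set chosen for illustration),
# ||{y}_D - {z}_D|| <= ||y - z|| for all y, z.

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

random.seed(0)
for _ in range(1000):
    y = random.uniform(-5.0, 5.0)
    z = random.uniform(-5.0, 5.0)
    # Projection never increases the distance between two points.
    assert abs(project(y) - project(z)) <= abs(y - z) + 1e-12
```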
Step 1: Recurrence to a neighborhood of x*
Given a constant μ > 0, define the set H_μ := {x ∈ D : f(x) ≤ μ + f(x*)}. Clearly, x* ∈ H_μ, so H_μ is a neighborhood of x*. For example, in Fig. A.3 with f(x) = x²/2, the set H_μ when μ = 0.5 is the interval [a, b] = [−1, 1].
We claim that for any M0 , there exists m ≥ M0 such that x(m) ∈ Hμ . That is, Hμ is recurrent
for {x(m)}.
This claim can be proved by contradiction. Suppose that x(m) ∉ H_μ, ∀m ≥ M_0. Then ∀m ≥ M_0, using the fact that f(x) is convex in x, we have

d(m + 1) ≤ d(m) − α_m μ + α_m² C_g/2.    (A.6)

Since lim_{m→∞} α_m = 0, there exists M_1 such that α_m ≤ μ/C_g, ∀m ≥ M_1. Therefore, for all m ≥ M_2 := max{M_0, M_1}, we have d(m + 1) ≤ d(m) − α_m μ/2. Summing over m and using Σ_m α_m = ∞ gives d(m) → −∞, which contradicts d(m) ≥ 0. So the claim holds.
[Figure A.3: the function f(x) = x²/2 with the level μ = 0.5 marked; the set H_μ is the interval [a, b] = [−1, 1] on the x-axis.]
Step 2: Convergence
Fix μ > 0 and ε > 0. Since lim_{m→∞} α_m = 0, we can choose M_3 such that ∀m ≥ M_3,

α_m² ≤ 2ε/C_g    (A.7)
α_m ≤ μ/C_g.    (A.8)

By the result of Step 1, there exists M_4 ≥ M_3 such that x(M_4) ∈ H_μ. In the following, we show that ∀m ≥ M_4, d(m) ≤ μ̄ + ε, where μ̄ := max_{x∈H_μ} ||x − x*||²/2. The proof is by induction. First, it is clear that d(M_4) ≤ μ̄ < μ̄ + ε. Now suppose that d(m) ≤ μ̄ + ε where m ≥ M_4. We need to show that d(m + 1) ≤ μ̄ + ε as well. This is done by considering two cases. (i) If x(m) ∈ H_μ, then d(m) ≤ μ̄, and by (A.5) and (A.7), d(m + 1) ≤ d(m) + α_m² C_g/2 ≤ d(m) + ε ≤ μ̄ + ε. (ii) If x(m) ∉ H_μ, then by (A.5) and (A.8), d(m + 1) ≤ d(m) ≤ μ̄ + ε. Therefore, d(m) ≤ μ̄ + ε, ∀m ≥ M_4. This argument is illustrated in Figure A.4.
Since x* is unique, μ̄ → 0 as μ → 0. Therefore, the above result holds for arbitrarily small μ̄ + ε by choosing small enough ε and μ. This implies that lim_{m→∞} d(m) = 0, completing the proof.
Proof of part (ii)
Given μ > 0 and ε > 0, choose the step size α to satisfy (A.7) and (A.8), i.e., α² ≤ 2ε/C_g and α ≤ μ/C_g. Using Step 1 of the proof of part (i), it is easy to see that there exists M_5 such that x(M_5) ∈ H_μ. Then using Step 2 of that proof, we know that d(m) ≤ μ̄ + ε, ∀m ≥ M_5. This implies that x(m) converges to a neighborhood of x*. ∎
where E_m(·) is the conditional expectation given F_m, the σ-field generated by x[0], x[1], …, x[m]. Also, define the zero-mean noise η(m) := g̃(m) − E_m[g̃(m)]. Then we have

g̃(m) = g(m) + B(m) + η(m).    (A.10)
With algorithm (A.9), we have the following known result.
To illustrate the theorem, we apply algorithm (A.9) to our example f (x) = x 2 /2, using
αm = 1/(m + 1) in the case of decreasing step sizes, and αm = α = 0.1 in the case of constant
step size, both with the initial value x[0] = −2. In both cases, the error bias B(m) = 0, ∀m, and the
zero-mean noise η(m)’s are independent and uniformly distributed in [−1, 1]. The trajectories of
{x[m]} are plotted in Fig. A.5 and Fig. A.6.
[Figure A.5: trajectory of x[m] versus m (m = 0, …, 100) under noisy gradients, from x[0] = −2 toward x* = 0.]
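The noisy-gradient experiment can be reproduced with a small sketch. As before, the feasible interval D = [−2, 2] is an assumption for illustration; the noise model matches the text: zero bias B(m) = 0 and η(m) i.i.d. uniform on [−1, 1].

```python
import random

def project(x, lo=-2.0, hi=2.0):
    """Euclidean projection onto the assumed interval D = [-2, 2]."""
    return max(lo, min(hi, x))

def stochastic_approximation(step, iters=100, x0=-2.0, seed=1):
    """Iterate x[m+1] = { x[m] - alpha_m * g~(m) }_D, where the noisy
    gradient is g~(m) = grad f(x[m]) + eta(m), with grad f(x) = x,
    zero bias B(m) = 0, and eta(m) uniform on [-1, 1]."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for m in range(iters):
        noisy_grad = x + rng.uniform(-1.0, 1.0)
        x = project(x - step(m) * noisy_grad)
        trajectory.append(x)
    return trajectory

decreasing = stochastic_approximation(lambda m: 1.0 / (m + 1))  # Fig. A.5 setting
constant = stochastic_approximation(lambda m: 0.1)              # Fig. A.6 setting
```

With α_m = 1/(m + 1) the iterate in this example equals minus the running average of the noise samples after the first step, so it drifts to x* = 0 almost surely; with constant α = 0.1 it keeps fluctuating in a neighborhood of x* = 0.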
d(m + 1) ≤ (1/2)||[x[m] − α_m g̃(m)] − x*||²
         = d(m) + α_m · [x* − x(m)]^T g(m) + α_m · [x* − x(m)]^T [B(m) + η(m)] + α_m² ||g̃(m)||²/2,    (A.11)

where the inequality holds because the projection onto a convex set is “non-expansive” (8), that is, ||{y}_D − {z}_D|| ≤ ||y − z||.
[Figure A.6: trajectory of x[m] versus m (m = 0, …, 100) under noisy gradients, from x[0] = −2 toward x* = 0.]
Since the gradient g(m) is bounded (by (A.2)), ||B(m)|| ≤ C_B < ∞, and E_m||η(m)||² ≤ c_3 < ∞, it is easy to show that E_m||g̃(m)||²/2 ≤ C < ∞ for some constant C. Therefore,

d(n) ≤ d(m) + Σ_{i=m}^{n−1} α_i · [x* − x(i)]^T g(i)
     + Σ_{i=m}^{n−1} α_i · [x* − x(i)]^T [B(i) + η(i)]
     + Σ_{i=m}^{n−1} α_i² ||g̃(i)||²/2.    (A.15)

Since E_i||g̃(i)||²/2 ≤ C < ∞, ∀i, one has E(Σ_{i=0}^{∞} α_i² ||g̃(i)||²/2) ≤ C · Σ_{i=0}^{∞} α_i² < +∞. Therefore, Σ_{i=0}^{∞} α_i² ||g̃(i)||²/2 < +∞ w.p. 1, which implies that w.p. 1,

lim_{m→∞} Σ_{i=m}^{∞} α_i² ||g̃(i)||²/2 = 0.    (A.16)

Also, Σ_{i=0}^{∞} |α_i · [x* − x(i)]^T B(i)| ≤ Σ_{i=0}^{∞} α_i · c_2 ||B(i)|| < ∞. So

lim_{m→∞} Σ_{i=m}^{∞} |α_i · [x* − x(i)]^T B(i)| = 0.    (A.17)
Finally, W(n) := Σ_{i=0}^{n−1} α_i · [x* − x(i)]^T η(i) is a martingale (16). To see this, note that (a) W(n) ∈ F_n; (b) E|W(n)| < ∞, ∀n; and (c) E(W(n + 1)|F_n) − W(n) = α_n · [x* − x(n)]^T E[η(n)|F_n] = 0. Also, E_m||η(m)||² ≤ c_3 < ∞, ∀m implies that E||η(m)||² ≤ c_3, ∀m. So

sup_n E(W(n)²) = sup_n Σ_{i=0}^{n−1} E{[α_i · (x* − x(i))^T η(i)]²}
              ≤ Σ_{i=0}^{∞} E{[α_i · (x* − x(i))^T η(i)]²}    (A.18)
              ≤ Σ_{i=0}^{∞} α_i² c_2² E||η(i)||² < ∞.
By the L2 Martingale Convergence Theorem (16), W(n) converges with probability 1. So, w.p. 1,

sup_{n≥m≥N_0} |Σ_{i=m}^{n−1} α_i · [x* − x(i)]^T η(i)| = sup_{n≥m≥N_0} |W(n) − W(m)| → 0    (A.19)
as N0 → ∞.
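The second-moment bound (A.18) on sup_n E(W(n)²) can be checked by simulation. The sketch below fixes α_i = 1/(i + 1), a bound c_2 = 2 on ||x* − x(i)||, and i.i.d. uniform noise on [−1, 1]; these specific values are illustrative assumptions, not from the text.

```python
import random

# Monte Carlo check of the second-moment bound on the martingale
# W(n) = sum_i alpha_i * (x* - x(i))^T eta(i). The scalars
# d_i = x* - x(i) are fixed at c2/2 (any |d_i| <= c2 works),
# alpha_i = 1/(i+1), and eta(i) is uniform on [-1, 1] with E eta^2 = 1/3.

n = 50
c2 = 2.0
alphas = [1.0 / (i + 1) for i in range(n)]
d = [c2 / 2.0] * n                       # assumed |x* - x(i)| <= c2

# The deterministic bound from (A.18): sum_i alpha_i^2 * c2^2 * E||eta||^2.
bound = sum(a * a for a in alphas) * c2 * c2 * (1.0 / 3.0)

rng = random.Random(0)
trials = 5000
acc = 0.0
for _ in range(trials):
    w = sum(alphas[i] * d[i] * rng.uniform(-1.0, 1.0) for i in range(n))
    acc += w * w
estimate = acc / trials                  # Monte Carlo estimate of E(W(n)^2)

# The true value is bound/4 here (since d_i = c2/2), so the estimate
# sits well below the deterministic bound.
assert estimate <= bound
```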
Combining (A.16), (A.17) and (A.19), we know that with probability 1, for any ε > 0, after x(m) returns to H_μ for some large enough m (due to the recurrence of H_μ),

Σ_{i=m}^{n−1} α_i · [x* − x(i)]^T [B(i) + η(i)] + Σ_{i=m}^{n−1} α_i² ||g̃(i)||²/2 ≤ ε

for any n > m. In (A.15), since [x* − x(i)]^T g(i) ≤ 0, we have d(n) ≤ d(m) + ε, ∀n > m. In other words, x(n) cannot move far away from H_μ after step m. Since the above argument holds for H_μ with arbitrarily small μ and any ε > 0, x(m) converges to x* with probability 1.
Proof of part (ii)
In (A.14), choose α_m = α ≤ μ/(2C); then −α_m μ + α_m² · C = α(−μ + αC) ≤ −αμ/2. It follows that

E_m[d(m + 1)] ≤ d(m) − αμ/2 + α · c_2 ||B(m)||.

Since Σ_m ||B(m)|| < ∞ w.p. 1, by Lemma 1, we conclude that x(m) returns to H_μ infinitely often w.p. 1. ∎
A.3 SUMMARY
This chapter has explained gradient algorithms to minimize an objective function f (x), with accurate
or noisy gradients. For simplicity, we have assumed that the objective function is convex and the
minimization is over a bounded convex region.
We first discussed the case when accurate gradients are available (Section A.1). In this case, with decreasing step sizes that converge to 0 but sum to infinity, the gradient algorithm makes x converge to the minimizer x* of f(x). With a constant step size that is small enough, x converges to a neighborhood of x*.
When only inaccurate gradients are available, we have a stochastic approximation algorithm (Section A.2). We explained that, under certain conditions on the error in the gradient, the algorithm makes x converge to x* almost surely with properly chosen decreasing step sizes, and makes x return to a neighborhood of x* infinitely often with a small enough constant step size.
This chapter has provided important background for the development of our throughput-
optimal scheduling algorithms in Chapter 3, which are in the family of stochastic approximation
algorithms. In those algorithms, we need to deal with extra challenges such as quantifying the error
in the gradient and optimizing over unbounded sets.
A.4 REFERENCES
Stochastic approximation was first introduced in (63) as the Robbins-Monro algorithm. Over the
years, the theory has been developed extensively concerning the convergence conditions, rates of
convergence, noise models, etc., with applications in many areas such as control, communications
and signal processing. See, for example, (42; 7) for a comprehensive development.
Bibliography
[1] R. Ahlswede, N. Cai, S. Li, and R.W. Yeung, “Network Information Flow,” IEEE Transactions
on Information Theory, vol. 46, no. 4, pp. 1204-1216, Jul. 2000. DOI: 10.1109/18.850663 65
[4] G. Bianchi, “Performance Analysis of the IEEE 802.11 Distributed Coordination Func-
tion,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 3, pp. 535-547, 2000.
DOI: 10.1109/49.840210 75, 97
[7] V. Borkar, “Stochastic Approximation: A Dynamical Systems Viewpoint,” Cambridge University Press, 2008. 83, 94, 95, 132
[8] S. Boyd and L. Vandenberghe, “Convex Optimization”, Cambridge University Press, 2004. 18,
26, 55, 93, 125, 128
[9] Loc Bui, R. Srikant, and Alexander Stolyar, “Novel Architectures and Algorithms for De-
lay Reduction in Back-Pressure Scheduling and Routing,” in IEEE INFOCOM 2009 Mini-
Conference.
[10] P. Chaporkar, K. Kar, and S. Sarkar, “Throughput guarantees in maximal scheduling in wireless
networks,” in the 43rd Annual Allerton Conference on Communication, Control and Computing,
Sep. 2005. 57
[13] J. G. Dai and W. Lin, “Asymptotic Optimality of Maximum Pressure Policies in Stochas-
tic Processing Networks,” Annals of Applied Probability, vol. 18, no. 6, pp. 2239-2299, 2008.
DOI: 10.1214/08-AAP522 99
[14] P. Diaconis and D. Stroock, “Geometric bounds for eigenvalues of Markov chains,” Annals of
Applied Probability, vol. 1, no. 1, pp. 36-61, Feb. 1991. DOI: 10.1214/aoap/1177005980
[16] R. Durrett, Probability: Theory and Examples. Duxbury Press, 3rd edition, March 16, 2004. 41,
130
[17] M. Durvy, O. Dousse, and P. Thiran, “Border Effects, Fairness, and Phase Transition in Large
Wireless Networks”, in IEEE INFOCOM 2008, Phoenix, Arizona, Apr. 2008. 24
[18] M. Durvy and P. Thiran, “Packing Approach to Compare Slotted and Non-Slotted
Medium Access Control,” in IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006.
DOI: 10.1109/INFOCOM.2006.251 58
[19] A. Eryilmaz, A. Ozdaglar and E. Modiano, “Polynomial Complexity Algorithms for Full
Utilization of Multi-hop Wireless Networks,” in IEEE INFOCOM 2007, Anchorage, Alaska,
May 2007. DOI: 10.1109/INFCOM.2007.65 57
[20] A. Eryilmaz and R. Srikant, “Fair Resource Allocation in Wireless Networks Using Queue-
Length-Based Scheduling and Congestion Control,” in IEEE INFOCOM, Mar. 2005.
DOI: 10.1109/INFCOM.2005.1498459 73
[21] P. Gupta and A. L. Stolyar, “Optimal Throughput Allocation in General Random Access
Networks,” in Conference on Information Sciences and Systems, Princeton, NJ, Mar. 2006.
DOI: 10.1109/CISS.2006.286657 73
[22] B. Hajek, “Cooling Schedules for Optimal Annealing,” Mathematics of Operations Research, vol.
13, no. 2, pp. 311–329, 1988. DOI: 10.1287/moor.13.2.311 29
[25] J. M. Harrison and R. J. Williams, “Workload Interpretation for Brownian Models of Stochas-
tic Processing Networks,” Mathematics of Operations Research, vol. 32, pp. 808-820, 2007.
DOI: 10.1287/moor.1070.0271
[27] S. Hu, G. Chen, and X. Wang, “On extending the Brunk-Prokhorov strong law of large
numbers for martingale differences,” Statistics and Probability Letters, 2008, Elsevier.
DOI: 10.1016/j.spl.2008.06.017 46
[28] L. Jiang and S. C. Liew, “Improving Throughput and Fairness by Reducing Exposed and
Hidden Nodes in 802.11 Networks,” IEEE Transactions on Mobile Computing, vol. 7, no. 1, pp.
34-49, Jan. 2008. DOI: 10.1109/TMC.2007.1070 76
[29] L. Jiang, D. Shah, J. Shin, and J. Walrand, “Distributed Random Access Algorithm: Scheduling
and Congestion Control,” accepted to IEEE Transactions on Information Theory. 23
[30] L. Jiang and J. Walrand, “A Distributed CSMA Algorithm for Throughput and Utility Max-
imization in Wireless Networks,” in the 46th Annual Allerton Conference on Communication,
Control, and Computing, Sep. 23-26, 2008. DOI: 10.1109/ALLERTON.2008.4797741 29,
94, 97
[31] L. Jiang and J. Walrand, “A Distributed Algorithm for Maximal Throughput and Optimal
Fairness in Wireless Networks with a General Interference Model,” EECS Technical Re-
port, UC Berkeley, Apr. 2008. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/
EECS-2008-38.html 32
[32] L. Jiang and J. Walrand, “A Novel Approach to Model and Control the Throughput of
CSMA/CA Wireless Networks”, Technical Report, UC Berkeley, Jan 2009. http://www.
eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-8.html 75, 97
[33] L. Jiang and J. Walrand, “Convergence and Stability of a Distributed CSMA Al-
gorithm for Maximal Network Throughput,” Technical Report, UC Berkeley, Mar.
2009. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-43.html
DOI: 10.1109/CDC.2009.5400349 29, 42, 70
[36] L. Jiang and J. Walrand, “Stable and Utility-Maximizing Scheduling for Stochastic Processing
Networks,” in the 47th Annual Allerton Conference on Communication, Control, and Computing,
2009. DOI: 10.1109/ALLERTON.2009.5394870 99
[37] C. Joo, X. Lin, and N. Shroff, “Understanding the Capacity Region of the Greedy Maximal
Scheduling Algorithm in Multi-Hop Wireless Networks,” in IEEE INFOCOM 2008, Phoenix,
Arizona, Apr. 2008. DOI: 10.1109/INFOCOM.2008.165 57
[39] F. P. Kelly, “Loss networks,” Ann. Appl. Prob., vol. 1, no. 3, 1991.
DOI: 10.1214/aoap/1177005872 80
[40] F. P. Kelly, “Charging and Rate Control for Elastic Traffic,” European Transactions on Telecom-
munications, vol. 8, pp. 33-37, 1997. DOI: 10.1002/ett.4460080106
[41] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, “Rate Control for Communication Networks:
Shadow Prices, Proportional Fairness and Stability,” Journal of the Operational Research Society,
vol. 49, no. 3, pp. 237-252, 1998. DOI: 10.2307/3010473 73
[42] H. Kushner and G. Yin, “Stochastic approximation and recursive algorithms and applications,”
Springer-Verlag, New York, 2003. 132
[43] M. Leconte, J. Ni, and R. Srikant, “Improved Bounds on the Throughput Efficiency of
Greedy Maximal Scheduling in Wireless Networks,” in ACM MOBIHOC, May 2009.
DOI: 10.1145/1530748.1530771 57
[46] X. Lin and N. Shroff, “The Impact of Imperfect Scheduling on Cross-Layer Rate Control in
Multihop Wireless Networks,” in IEEE INFOCOM 2005, Miami, Florida, Mar. 2005. 73
BIBLIOGRAPHY 137
[47] X. Lin, N. B. Shroff, and R. Srikant, “A Tutorial on Cross-Layer Optimization in Wireless
Networks,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp.1452-1463,
Aug. 2006. DOI: 10.1109/JSAC.2006.879351 61, 64
[48] J. Liu, Y. Yi, A. Proutiere, M. Chiang, and H. V. Poor, “Convergence and Tradeoff of Utility-
Optimal CSMA,” http://arxiv.org/abs/0902.1996 29, 94, 97
[49] S. H. Low and D. E. Lapsley, “Optimization Flow Control, I: Basic Algorithm and Con-
vergence,” IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861-874, Dec. 1999.
DOI: 10.1109/90.811451 73
[51] P. Marbach, A. Eryilmaz, and A. Ozdaglar, “Achievable Rate Region of CSMA Schedulers
in Wireless Networks with Primary Interference Constraints,” in IEEE Conference on Decision
and Control, 2007. 58
[53] S. Meyn, “Stability and asymptotic optimality of generalized MaxWeight policies,” SIAM
Journal on Control and Optimization, vol. 47, no. 6, 2009. DOI: 10.1137/06067746X
[55] E. Modiano, D. Shah, and G. Zussman, “Maximizing Throughput in Wireless Networks via
Gossiping,” ACM SIGMETRICS Performance Evaluation Review, vol. 34 , no. 1, Jun. 2006.
DOI: 10.1145/1140103.1140283 57
[56] M. J. Neely, E. Modiano, and C-P. Li, “Fairness and Optimal Stochastic Control for Hetero-
geneous Networks,” in IEEE INFOCOM, Mar. 2005. DOI: 10.1109/TNET.2007.900405 73,
110, 114
[57] M. J. Neely, E. Modiano, and C. P. Li, “Fairness and Optimal Stochastic Control for Hetero-
geneous Networks,” IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396-409, Apr.
2008. DOI: 10.1109/TNET.2007.900405 61, 70, 71
[58] M. J. Neely and R. Urgaonkar, “Cross Layer Adaptive Control for Wireless Mesh
Networks,” Ad Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, Aug. 2007.
DOI: 10.1016/j.adhoc.2007.01.004
138 BIBLIOGRAPHY
[59] J. Ni and R. Srikant, “Distributed CSMA/CA Algorithms for Achieving Maximum Through-
put in Wireless Networks,” in Information Theory and Applications Workshop, Feb. 2009.
DOI: 10.1109/ITA.2009.5044953 80, 97
[60] J. Ni, B. Tan, and R. Srikant, “Q-CSMA: Queue-Length Based CSMA/CA Algorithms for
Achieving Maximum Throughput and Low Delay in Wireless Networks,” http://arxiv.
org/pdf/0901.2333
[61] A. Proutiere, Y. Yi, and M. Chiang, “Throughput of Random Access without Message
Passing,” in Conference on Information Sciences and Systems, Princeton, NJ, USA, Mar. 2008.
DOI: 10.1109/CISS.2008.4558579 58
[63] H. Robbins and S. Monro, “A Stochastic Approximation Method,” The Annals of Mathematical
Statistics, vol. 22, no. 3, pp. 400-407, Sep. 1951. 131
[64] S. Sanghavi, L. Bui, and R. Srikant, “Distributed Link Scheduling with Constant Overhead,”
in ACM SIGMETRICS, Jun. 2007. DOI: 10.1145/1269899.1254920 57
[66] L. Tassiulas and A. Ephremides, “Stability Properties of Constrained Queueing Systems and
Scheduling Policies for Maximum Throughput in Multihop Radio Networks,” IEEE Transac-
tions on Automatic Control, vol. 37, no. 12, pp. 1936-1948, Dec. 1992. DOI: 10.1109/9.182479
7, 56, 114, 118
[67] L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks
and input queued switches,” in IEEE INFOCOM, volume 2, pages 533–539, 1998.
DOI: 10.1109/INFCOM.1998.665071 56
[68] M. J. Wainwright and M. I. Jordan, “Graphical Models, Exponential Families, and Variational
Inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1-305, 2008.
DOI: 10.1561/2200000001 25, 26, 32
[69] J. Walrand, “Entropy in Communication and Chemical Systems,” in the first International
Symposium on Applied Sciences in Biomedical and Communication Technologies (Isabel’08), Oct.
2008. DOI: 10.1109/ISABEL.2008.4712620 10
[72] A. Warrier, S. Ha, P. Wason and I. Rhee, “DiffQ: Differential Backlog Congestion Control for
Wireless Multi-hop Networks,” Technical Report, Dept. Computer Science, North Carolina
State University, 2008. 63
[74] P. Whittle, “Systems in Stochastic Equilibrium,” John Wiley & Sons, Inc., New York, NY, USA,
1986. 10, 32
[75] R. J. Williams, “On Stochastic Processing Networks,” Lecture Notes, 2006. http://math.
ucsd.edu/˜williams/talks/belz/belznotes06.pdf 99
[77] Y. Xi and E. M. Yeh, “Throughput Optimal Distributed Control of Stochastic Wireless Net-
works,” in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless
Networks (WiOpt), 2006. 64
[78] J. Zhang, D. Zheng, and M. Chiang, “The Impact of Stochastic Noisy Feedback on Distributed
Network Utility Maximization,” IEEE Transactions on Information Theory, vol. 54, no. 2, pp.
645-665, Feb. 2008. DOI: 10.1109/TIT.2007.913572
[79] G. Zussman, A. Brzezinski, and E. Modiano, “Multihop Local Pooling for Distributed
Throughput Maximization in Wireless Networks,” in IEEE INFOCOM 2008, Phoenix, Ari-
zona, Apr. 2008. 57
Authors’ Biographies
LIBIN JIANG
Libin Jiang received the bachelor of engineering degree in electronic engineering and information
science from the University of Science and Technology of China, Hefei, China, in 2003, the master of
philosophy degree in information engineering from the Chinese University of Hong Kong, Shatin,
Hong Kong, in 2005, and the Ph.D. degree in electrical engineering and computer sciences from
the University of California, Berkeley, in 2009. His research interests include wireless networks,
communications, and game theory.
He received the David Sakrison Memorial Prize for outstanding doctoral research at UC
Berkeley, and the Best Presentation Award at the ACM Mobihoc’09 S3 Workshop.
JEAN WALRAND
Jean Walrand (S’71-M’80-SM’90-F’93) received the Ph.D. degree in electrical engineering and
computer science from the University of California, Berkeley.
He has been a professor at UC Berkeley since 1982. He is the author of An Introduction to
Queueing Networks (Englewood Cliffs, NJ: Prentice Hall, 1988) and Communication Networks: A First
Course (2nd ed., New York: McGraw-Hill, 1998) and coauthor of High Performance Communication
Networks (2nd ed., San Mateo, CA: Morgan Kaufmann, 2000).
Prof. Walrand is a fellow of the Belgian American Education Foundation and a recipient of
the Lanchester Prize and the Stephen O. Rice Prize.
Index
Back-Pressure, 2
Backpressure, 17
KL Divergence, 26
Kullback-Leibler Divergence, 26