
Scheduling and Congestion Control for Wireless and Processing Networks

Synthesis Lectures on Communication Networks

Editor
Jean Walrand, University of California, Berkeley
Synthesis Lectures on Communication Networks is an ongoing series of 50- to 100-page publications
on topics on the design, implementation, and management of communication networks. Each lecture is
a self-contained presentation of one topic by a leading expert. The topics range from algorithms to
hardware implementations and cover a broad spectrum of issues from security to multiple-access
protocols. The series addresses technologies from sensor networks to reconfigurable optical networks.
The series is designed to:
• Provide the best available presentations of important aspects of communication networks.
• Help engineers and advanced students keep up with recent developments in a rapidly evolving
technology.
• Facilitate the development of courses in this field.

Scheduling and Congestion Control for Wireless and Processing Networks
Libin Jiang and Jean Walrand
2010
Performance Modeling of Communication Networks with Markov Chains
Jeonghoon Mo
2010
Communication Networks: A Concise Introduction
Jean Walrand and Shyam Parekh
2010
Path Problems in Networks
John S. Baras and George Theodorakopoulos
2010
Performance Modeling, Loss Networks, and Statistical Multiplexing
Ravi R. Mazumdar
2009
Network Simulation
Richard M. Fujimoto, Kalyan S. Perumalla, and George F. Riley
2006
Copyright © 2010 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.

Scheduling and Congestion Control for Wireless and Processing Networks


Libin Jiang and Jean Walrand
www.morganclaypool.com

ISBN: 9781608454617 paperback


ISBN: 9781608454624 ebook

DOI 10.2200/S00270ED1V01Y201008CNT006

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON COMMUNICATION NETWORKS

Lecture #6
Series Editor: Jean Walrand, University of California, Berkeley
Series ISSN
Synthesis Lectures on Communication Networks
Print 1935-4185 Electronic 1935-4193
Scheduling and Congestion Control for Wireless and Processing Networks

Libin Jiang and Jean Walrand


University of California, Berkeley

SYNTHESIS LECTURES ON COMMUNICATION NETWORKS #6

Morgan & Claypool Publishers
ABSTRACT
In this book, we consider the problem of achieving the maximum throughput and utility in a class
of networks with resource-sharing constraints. This is a classical problem of great importance.
In the context of wireless networks, we first propose a fully distributed scheduling algorithm
that achieves the maximum throughput. Inspired by CSMA (Carrier Sense Multiple Access), which
is widely deployed in today’s wireless networks, our algorithm is simple, asynchronous, and easy to
implement. Second, using a novel maximal-entropy technique, we combine the CSMA schedul-
ing algorithm with congestion control to approach the maximum utility. Also, we further show
that CSMA scheduling is a modular MAC-layer algorithm that can work with other protocols
in the transport layer and network layer. Third, for wireless networks where packet collisions are
unavoidable, we establish a general analytical model and extend the above algorithms to that case.
Stochastic Processing Networks (SPNs) model manufacturing, communication, and service
systems. In manufacturing networks, for example, tasks require parts and resources to produce other
parts. SPNs are more general than queueing networks and pose novel challenges to throughput-
optimum scheduling. We proposes a “deficit maximum weight” (DMW) algorithm to achieve
throughput optimality and maximize the net utility of the production in SPNs.

KEYWORDS
scheduling, congestion control, wireless networks, stochastic processing networks, car-
rier sense multiple access, convex optimization, Markov chain, stochastic approximation

Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 A Small Wireless Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Feasible Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Maximum Weighted Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3 CSMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Entropy Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Admission Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Randomized Backpressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Scheduling in Wireless Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


3.1 Model and Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 CSMA Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Idealized Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 CSMA Can Achieve Maximal Throughput . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.2 An idealized distributed algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Distributed Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.1 Throughput-Optimal Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4.2 Variation: Constant Update Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.3 Time-invariant A-CSMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Maximal-Entropy Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6 Reducing Delays: Algorithm 1(b) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7.1 Time-invariant A-CSMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7.2 Time-varying A-CSMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.8 Proof Sketch of Theorem 3.10-(i) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.9 Further Proof Details of Theorem 3.10-(i) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.9.1 Property 3.21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.9.2 Property 3.22: Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.9.3 Property 3.22: Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.10 Proof of Theorem 3.10-(ii) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.11 Proof of Theorem 3.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.12 General Transmission Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.13 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.13.1 Proof of the fact that C is the interior of C̄ . . . . . . . . . . . . . . . . . . . . . . . 54
3.13.2 Proof of Proposition 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.14 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.15 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.15.1 Maximal-weight scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.15.2 Low-complexity but sub-optimal algorithms . . . . . . . . . . . . . . . . . . . . . . . . 57
3.15.3 Throughput-optimum algorithms for restrictive interference models . . . . 57
3.15.4 Random Access algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4 Utility Maximization in Wireless Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


4.1 Joint Scheduling and Congestion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1.1 Formulation of Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1.2 Derivation of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.1.3 Approaching the Maximal Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2.1 Anycast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.2 Multicast with Network Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Properties of Algorithm 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.1 Bound on Backpressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.2 Total Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.3 Queue Lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5 Distributed CSMA Scheduling with Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 CSMA/CA-Based Scheduling with Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2.3 Computation of the Service Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 A Distributed Algorithm to Approach Throughput-Optimality . . . . . . . . . . . . . . 81
5.3.1 CSMA Scheduling with Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Reducing Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.6 Proofs of Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6.1 Proof of Theorem 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6.2 Proof of Theorem 5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.6.3 Proof of Theorem 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.8 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

6 Stochastic Processing Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.3 Basic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.4 DMW scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4.1 Arrivals that are smooth enough . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.4.2 More random arrivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.5 Utility maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.7 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.7.1 DMW scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.7.2 Utility maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.9 Skipped proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.9.1 Proof of Theorem 6.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.9.2 Proof of the Rate-Stability Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.9.3 Proof of Theorem 6.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.9.4 Proof of Theorem 6.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

A Stochastic Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123


A.1 Gradient algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
A.2 Stochastic approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
A.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Preface
This book explains recent results on distributed algorithms for networks. The book is based on
Libin’s Ph.D. thesis where he introduced the design of a CSMA algorithm based on a primal-dual
optimization problem, extended the work to networks with collisions, and developed the scheduling
of processing networks based on virtual queues.
To make the book self-contained, we added the necessary background on stochastic approx-
imations and on optimization. We also added an overview chapter and comments to make the
arguments easier to follow. The material should be suitable for graduate students in electrical engi-
neering, computer science, or operations research.
The main theme of this book is the allocation of resources among competing tasks. Such
problems are typically hard because of the large number of possible allocations. Instead of searching
for the optimal allocation at each instant, the approach is to design a randomized allocation whose
distribution converges to one with desirable properties. The randomized allocation is implemented
by a scheme where tasks request the resources after a random delay. Each task adjusts the mean value
of its delay based on local information.
One application is wireless ad hoc networks where links share radio channels. Another appli-
cation is processing networks where tasks share resources such as tools or workers. These problems
have received a lot of attention in the last few years. The book explains the main ideas on simple
examples, then studies the general formulation and recent developments.
We are thankful to Prof. Devavrat Shah for suggesting adjusting the update intervals in one
of the gradient algorithms, to Prof. Venkat Anantharam and Pravin Varaiya for their constructive
comments on the thesis, to Prof. Michael Neely and R. Srikant for detailed constructive reviews of
the book, and to Prof. Vivek Borkar, P.R. Kumar, Bruce Hajek, Eytan Modiano, and Dr. Alexandre
Proutiere for their encouragement and useful feedback. We are grateful to NSF and ARO for their
support of our research during the writing of this book.

Libin Jiang and Jean Walrand


August 2010

CHAPTER 1

Introduction
In a wireless network, nodes share one or more radio channels. The nodes get packets to transmit
from the application and transmit them hop by hop to their destination. For instance, one user may
be downloading a file from another node; two other users might be engaged in a Skype call.
The nodes cannot all transmit together, for their transmissions would then interfere with one
another. Consequently, at any given time, only a subset of nodes should transmit. The scheduling
problem is to design an algorithm for selecting the set of nodes that transmit and a protocol for
implementing the algorithm. Moreover, the nodes should decide which packet to send and to what
neighboring node.
This problem admits a number of formulations. In this book, we adopt a simple model of
interference: two links either conflict or they do not. Thus, conflicts are represented by a conflict
graph whose vertices are the links and whose edges join the pairs of links that conflict and should
not transmit together. Equivalently, a subset of links can transmit together when no two of its links
share an edge. Such sets are called independent sets.
Intuitively, the set of links that should transmit depends on the backlog of the nodes. For
instance, we explain that choosing the independent set with the maximum sum of backlogs is a
good policy when the nodes need to transmit each packet only once. This policy is called Maximum
Weighted Matching (MWM). Another good policy is to first select the link with the largest backlog,
then the link with the largest backlog among those that do not conflict with the first one, and so
on. This policy is called Longest Queue First (LQF). These two policies are not easy to implement
because the information about the backlog of the nodes is not available to all the nodes. Moreover,
even if all nodes knew all the backlogs, implementing MWM would still be computationally hard
because of the huge number of independent sets even in a small graph.
One key idea in this book is that, instead of looking for the independent set with the maximum
sum of backlogs, one designs a randomized scheduling algorithm. To implement this algorithm, the
nodes choose random waiting times. The mean of the waiting time of each node decreases with
the backlog of that node. After that waiting time, the node listens to the channel. If it does not
hear any transmission, it starts transmitting a packet. Otherwise, it chooses a new waiting time and
repeats the procedure. Note that this algorithm is distributed since each node needs only know its
own backlog and whether any conflicting node is transmitting. Moreover, the algorithm does not
require any complex calculation. One can show that this algorithm, called A-CSMA for adaptive
carrier sense multiple access, selects an independent set with a probability that increases with the sum
of the backlogs in that set. Thus, this randomized algorithm automatically approximates the NP-
hard selection that MWM requires. As you might suspect, the probability distribution of the active
independent sets may take a long time to converge. However, in practice, this convergence appears
to be fast enough for the mechanism to have good properties.
When the nodes must relay the packets across multiple hops, a good algorithm is to choose
an independent set such that the sum of the differences of backlogs between the transmitters and
the receivers is maximized. Again, this problem is NP-hard and a randomized algorithm is an
approximation with good properties. In this algorithm, the nodes pick a random waiting time whose
mean decreases with the back-pressure of the packet being transmitted. Here, the back-pressure of a
packet is the difference in queue lengths between the transmitter and the receiver, multiplied by the
link rate. We should call this protocol B-CSMA, but we still call it A-CSMA to avoid multiplying
terminology.
When we say that the randomized algorithms have good properties, we mean more than that they
are good heuristics that work well in simulations. We mean that they are in fact throughput-optimal
or utility-maximizing. That is, these algorithms maximize the rates of flows through the network, in
a sense that we make precise later. One may wonder how simple distributed randomized algorithms
can have the same throughput optimality as an NP-hard algorithm such as MWM. The reason is
that achieving long-term properties of throughput does not require making the best decision at
each instant. It only requires making good decisions on average. Accordingly, an algorithm that
continuously improves the random selection of the independent set can take a long time to converge
without affecting the long-term throughput. The important practical questions concern the ability
of these algorithms to adapt to changing conditions and also the delays that packets incur with an
algorithm that is only good on average. As we explain, the theory provides some answers to these
questions in the form of upper bounds on the average delays.
Processing networks are models of communication, manufacturing, or service networks. For
instance, a processing network can model a multicast network, a car assembly plant, or a hospital. In
a processing network, tasks use parts and resources to produce new parts that may be used by other
tasks. In a car assembly plant, a rim and a tire are assembled into a wheel; four wheels and a chassis
are put together, and so on. The tasks may share workers and machine tools or robots. In a hospital,
a doctor and nurses examine a patient that may then be dispatched to a surgical theater where other
nurses and doctors are engaged in the surgery, and so on.
The scheduling problem in a processing network is to decide which tasks should be performed
at any one time.The goal may be to maximize the rate of production of some parts, such as completed
cars, minus the cost of producing these parts. Such a problem is again typically NP-hard since it
is more general than the allocation of radio channels in a wireless network. We explain scheduling
algorithms with provable optimality properties.
The book is organized as follows. Chapter 2 provides an illustration of the main results on
simple examples. Chapter 3 explains the scheduling in wireless networks. Chapter 4 studies the
combined admission control, routing, and scheduling problem for network utility maximization.
Chapter 5 studies collisions in wireless networks. Chapter 6 is devoted to processing networks.
Appendix A explains the main ideas of the stochastic approximations that we use.

CHAPTER 2

Overview
This chapter explains the main ideas of this book on a few simple examples. In Section 2.1, we consider
the scheduling of three wireless nodes and review the maximum weighted matching (MWM) and
the A-CSMA scheduling. Section 2.2 explains how to combine admission control with scheduling.
Section 2.3 discusses the randomized backpressure algorithms. Section 2.4 reviews the Lagrangian
method to solve convex optimization problems. We conclude the chapter with a summary of the
main observations.

2.1 A SMALL WIRELESS NETWORK


Consider the network shown on the left side of Figure 2.1. There are three wireless links numbered 1, 2, and 3, where each link is a pair of radio transmitter and receiver.

Figure 2.1: A network with three links. (Left: the three transmitter–receiver pairs; right: the conflict graph, with arrival rates λ1, λ2, λ3 and conflict edges between links 1–2 and 2–3.)

Packets arrive at the links (or
more specifically, the transmitters of the links) with the rates indicated in the right figure. A simple
situation is one where at each time t = 0, 1, 2, . . . a random number of packets with mean λi and
a finite variance arrive at link i, independently of the other times and of the arrivals at other links
and with the same distribution at each time. Thus, the arrivals are i.i.d. (independent and identically
distributed) at each link, and they are independent across links. Say that the packet transmissions
take exactly one time unit.
The links 1 and 2 conflict: if their transmitters transmit together, the signals interfere and
the receivers cannot recover the packets. The situation is the same for links 2 and 3. Links 1 and
3, however, are far enough apart not to interfere with one another. If they both transmit at the
same time, their receivers can get the packets correctly. The right side of the figure omits the
receivers (so that each of the three circles corresponds to one of the three links in the left figure), and it
represents the above conflict relationships by a solid line between links 1 and 2 and another between
links 2 and 3. (In Sections 2.1 and 2.2, since only one-hop flows are considered, we omit the receivers
and use the terms node and link interchangeably.)
Thus, at any given time, if all the nodes have packets to transmit, the sets of nodes that can
transmit together without conflicting are ∅, {1}, {2}, {3}, and {1, 3}, where ∅ designates the empty
set. These sets are called the independent sets of the network. An independent set is said to be maximal
if one cannot add another node to it and get another independent set. Thus, {2} and {1, 3} are the
maximal independent sets.
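
As a concrete illustration (a sketch of ours, not from the book), these sets can be enumerated by brute force from the conflict graph:

```python
from itertools import combinations

links = [1, 2, 3]
conflicts = {(1, 2), (2, 3)}  # conflict edges of Figure 2.1

def is_independent(s):
    # A set of links is independent if it contains no conflicting pair.
    return all((a, b) not in conflicts and (b, a) not in conflicts
               for a, b in combinations(s, 2))

independent_sets = [set(s) for r in range(len(links) + 1)
                    for s in combinations(links, r) if is_independent(s)]
print(independent_sets)  # [set(), {1}, {2}, {3}, {1, 3}]
```

Such enumeration is exponential in the number of links, which is exactly why the algorithms studied later avoid computing the independent sets explicitly.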

2.1.1 FEASIBLE RATES


The scheduling problem is to find which independent set should transmit at any given time to keep
up with the arriving packets. The first question is whether this is feasible at all. The answer depends
on how large the arrival rates are and is given in Theorem 2.3 below. However, before stating the
theorem, we should review the following notions about Markov chains.

Definition 2.1 Irreducibility and Positive Recurrence.


Consider a discrete time Markov chain {X(n), n = 0, 1, . . .} with a countable state space (i.e.,
a finite or countably infinite number of states). The Markov chain is irreducible if it can go from
every state to any other state (not necessarily in one step). An irreducible Markov chain is positive
recurrent if it spends a positive fraction of time in every state.

The following result is well known (see, for example, (2)).

Theorem 2.2 [Lyapunov Function and Positive Recurrence]


Consider a discrete time Markov chain {X(n), n = 0, 1, . . .} with a countable state space.
(a) If the Markov chain is irreducible, then either it is positive recurrent or it spends a zero fraction
of time in every state.
(b) If the Markov chain is irreducible and such that there is a nonnegative function V (X(n)) such
that
$$E[V(X(n+1)) - V(X(n)) \mid X(n)] \le \alpha \mathbf{1}\{X(n) \in A\} - \epsilon \qquad (2.1)$$
for some α > 0, ε > 0 and some finite set A, then the Markov chain is positive recurrent. In that case, we
say that V is a Lyapunov function for the Markov chain.

Condition (2.1) means that, outside of a finite set A of states, the function V tends to decrease.
Since the function is nonnegative, it cannot decrease all the time. Consequently, X(n) must spend
a positive fraction of time inside A. By (a), this implies that the Markov chain is positive recurrent
since A is finite.
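
To make the drift condition concrete, here is a small numerical check (our own toy example, not from the book) for a single queue X(n + 1) = max(X(n) + A(n) − 1, 0) with arrival rate 0.8 and V(x) = x²/2:

```python
# Toy check of the drift condition (2.1): X(n+1) = max(X(n) + A(n) - 1, 0),
# where A(n) = 2 with probability 0.4 and 0 otherwise (arrival rate 0.8 < 1).
def drift(x, V=lambda y: y * y / 2):
    """E[V(X(n+1)) - V(X(n)) | X(n) = x]."""
    return 0.4 * (V(x + 1) - V(x)) + 0.6 * (V(max(x - 1, 0)) - V(x))

for x in range(6):
    print(x, drift(x))
# The drift is 0.2, 0.3, 0.1 for x = 0, 1, 2 and then -0.1, -0.3, -0.5:
# negative and decreasing outside the finite set A = {0, 1, 2}, so
# Theorem 2.2(b) gives positive recurrence.
```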

Theorem 2.3 Feasibility and Positive Recurrence.


For simplicity, consider a time-slotted system. In each time slot, a node can either serve one packet or not serve any packet. (Also, recall that it cannot serve a packet if any of its conflicting nodes serves a packet in the same slot.) In time slot n = 0, 1, 2, . . . , Ai(n) packets arrive at queue i, where Ai(n) is a nonnegative integer. Assume that the Ai(n), ∀i, n, are independent of each other. Also, assume that E(Ai(n)) = λi, ∀n (where λi is the arrival rate to queue i), and E(Ai(n)²) ≤ C < ∞, ∀i, n.
(a) There is a schedule such that the queue lengths do not grow to infinity only if

λ1 + λ2 ≤ 1 and λ2 + λ3 ≤ 1. (2.2)

(b) Moreover, if
λ1 + λ2 < 1 and λ2 + λ3 < 1, (2.3)
then there is a schedule such that X(n) is positive recurrent, where X(n) = (X1 (n), X2 (n), X3 (n)) denotes
the vector of queue lengths at time n.
We say that the arrival rates are feasible if they satisfy (2.2) and that they are strictly feasible if
they satisfy (2.3).

Proof:
(a) Assume that λ1 + λ2 > 1. At any given time, at most one of the two nodes 1 and 2 can
transmit. Consequently, the rate at which transmissions remove packets from the two nodes {1, 2} is
at most 1. Thus, packets arrive faster at the nodes {1, 2} than they leave. Consequently, the number
of packets in these nodes must grow without bound.
To be a bit more precise, let Qn be the total number of packets in the nodes {1, 2} at time
n ∈ {0, 1, 2, . . . }. Note that
Qn ≥ An − n
where An is the number of arrivals in the nodes {1, 2} up to time n. Indeed, at most n packets have
left between time 0 and time n − 1. Also, by the strong law of large numbers, An /n → λ1 + λ2
almost surely as n → ∞. Thus, dividing the above inequality by n, we find that

$$\liminf_{n \to \infty} \frac{Q_n}{n} \ge \lambda_1 + \lambda_2 - 1 > 0.$$
This implies that Qn → ∞ almost surely as n → ∞. Thus, no schedule can prevent the backlog in
the network from growing without bound if λ1 + λ2 > 1, and similarly if λ2 + λ3 > 1.
(b) Assume that (2.3) holds. Then there is some p ∈ [0, 1] such that

λ2 < 1 − p; λ1 < p; λ3 < p. (2.4)


We claim that a schedule that at each step chooses to serve nodes 1 and 3 with probability
p and node 2 with probability 1 − p, independently from step to step, makes the queue lengths
positive recurrent. To see this, we show that when this schedule is used the function

$$V(X(n)) = \frac{1}{2}\left[X_1^2(n) + X_2^2(n) + X_3^2(n)\right]$$
is a Lyapunov function. To check the property (2.1), recall that Ai(n) is the number of arrivals in queue
i at time n. Let also Si (n) take the value 1 if queue i is served at time n, and the value 0 otherwise.
Then Xi (n + 1) = Xi (n) − Zi (n) + Ai (n) where Zi (n) := Si (n)1{Xi (n) > 0}. Note that Xi (n) is
a non-negative integer (since both Ai (n) and Si (n) are integers). Therefore,

$$\begin{aligned} X_i^2(n+1) - X_i^2(n) &= A_i^2(n) + Z_i^2(n) + 2X_i(n)A_i(n) - 2X_i(n)Z_i(n) - 2A_i(n)Z_i(n) \\ &= A_i^2(n) + Z_i^2(n) + 2X_i(n)A_i(n) - 2X_i(n)S_i(n) - 2A_i(n)Z_i(n). \end{aligned} \qquad (2.5)$$

Hence,
$$\frac{1}{2} E[X_i^2(n+1) - X_i^2(n) \mid X(n)] \le \beta_i + (\lambda_i - p_i) X_i(n),$$
where βi = E(Ai(n)² + Si(n)²)/2 and pi = E[Si(n)|X(n)], so that p1 = p3 = p and p2 = 1 − p.
Consequently, summing these inequalities for i = 1, 2, 3, one finds that


$$E[V(X(n+1)) - V(X(n)) \mid X(n)] \le \beta + \sum_{i=1}^{3} (\lambda_i - p_i) X_i(n)$$

with β = β1 + β2 + β3. Now, λi − pi < −γ < 0 for i = 1, 2, 3, for some γ > 0, because the arrival
rates are strictly feasible. This expression is less than −ε if β − γ(X1(n) + X2(n) + X3(n)) < −ε,
which occurs if
$$X_1(n) + X_2(n) + X_3(n) > \frac{\beta + \epsilon}{\gamma},$$
and this is the case when X(n) is outside the finite set defined by the opposite inequality. Therefore,
X(n) is positive recurrent by Theorem 2.2 (b).
□
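
The randomized schedule in part (b) is easy to simulate. The following sketch (ours, with illustrative rates) uses λ = (0.4, 0.3, 0.4) and p = 0.6, which satisfies (2.4), and shows the queues staying bounded:

```python
import random

random.seed(0)
lam = [0.4, 0.3, 0.4]  # strictly feasible: 0.4 + 0.3 < 1 and 0.3 + 0.4 < 1
p = 0.6                # satisfies (2.4): lam[1] < 1 - p, lam[0] < p, lam[2] < p
X = [0, 0, 0]

for n in range(100_000):
    A = [int(random.random() < l) for l in lam]          # Bernoulli(lam_i) arrivals
    S = [1, 0, 1] if random.random() < p else [0, 1, 0]  # serve {1,3} w.p. p, else {2}
    X = [max(x + a - s, 0) for x, a, s in zip(X, A, S)]

print(X)  # remains small over the whole run, consistent with positive recurrence
```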
The theorem does not clarify what happens when λ1 + λ2 = 1 or λ2 + λ3 = 1. The answer
is a bit tricky. To understand the situation, assume λ1 = 1, λ2 = λ3 = 0. In this case, one may serve
node 1 all the time. Does queue 1 grow without bound? Not if the arrivals are deterministic: if
exactly one packet arrives at each time at node 1, then the queue does not grow. However, if the
arrivals are random with mean 1, then the queue is not bounded. For instance, if two packets arrive
with probability 0.5 and no packet arrives with probability 0.5, then the queue length is not positive
recurrent. This means that the queue spends a zero fraction of time below any fixed level and its
mean value goes to infinity.

2.1.2 MAXIMUM WEIGHTED MATCHING


We explained that when the arrival rates are strictly feasible, there is a schedule that makes the
queues positive recurrent. However, the randomized schedule we proposed required knowing the
arrival rates. We describe an algorithm that does not require that information. This algorithm is
called Maximum Weighted Matching (MWM).
Definition 2.4 Maximum Weighted Matching.
The MWM algorithm serves queue 2 if the backlog of that queue is larger than the sum of
the backlogs of queues 1 and 3; otherwise, it serves queues 1 and 3. That is, the algorithm serves the
independent set with the largest sum of backlogs.
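
In this three-node network, MWM reduces to a comparison of the two maximal independent sets; a minimal sketch (ours):

```python
def mwm_schedule(X1, X2, X3):
    """Serve the independent set with the largest total backlog."""
    return {2} if X2 > X1 + X3 else {1, 3}

print(mwm_schedule(8, 5, 1))  # {1, 3}, since 8 + 1 > 5
print(mwm_schedule(2, 9, 3))  # {2}, since 9 > 2 + 3
```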

The following result gives the property of MWM.


Theorem 2.5 MWM and Positive Recurrence.
Assume that the arrival rates are strictly feasible and that the arrivals have a finite variance. Then MWM makes the
queues positive recurrent.

Proof:
This result is due to Tassiulas and Ephremides (66). Let Xi (n) be the queue length in node
i at time n (i = 1, 2, 3; n = 0, 1, 2, . . .). Let also X(n) = (X1 (n), X2 (n), X3 (n)) be the vector of
queue lengths. Define
$$V(X(n)) = \frac{1}{2}\left[X_1^2(n) + X_2^2(n) + X_3^2(n)\right]$$
as half the sum of the squares of the queue lengths.
The claim is that, under MWM, V (X(n)) is a Lyapunov function for the Markov chain X(n).
Proceeding as in the proof of the previous theorem and with the same notation, one finds (2.5).
Taking expectation given X(n) and noting that Si (n) is now a function of X(n) determined by the
MWM algorithm, one finds

$$E[V(X(n+1)) - V(X(n)) \mid X(n)] \le \beta + \sum_{i=1}^{3} (\lambda_i - S_i(n)) X_i(n).$$
To prove (2.1), it now suffices to show that this expression is less than −ε for X(n) outside a finite
set.
To do this, note that MWM chooses the value of {Si(n), i = 1, 2, 3} that maximizes
$$\sum_i S_i(n) X_i(n).$$
The maximum value must then be larger than pX1(n) + (1 − p)X2(n) + pX3(n), where p is the probability defined before such that (2.4) holds. Indeed, the maximum is either X1(n) + X3(n) or X2(n), and this maximum is larger than any convex combination of these two values. Hence,
$$E[V(X(n+1)) - V(X(n)) \mid X(n)] \le \beta + (\lambda_1 - p)X_1(n) + (\lambda_2 - (1-p))X_2(n) + (\lambda_3 - p)X_3(n).$$
In the proof of the previous theorem, we showed that the right-hand side is less than −ε when X(n) is outside of a finite set.
□
You will note that the crux of the argument is that MWM makes the sum of the squares of
the queue lengths decrease at least as fast as any randomized schedule, and that a randomized schedule exists
that makes that sum decrease fast enough when the arrival rates are strictly feasible.
2.1.3 CSMA
Although the MWM algorithm makes the queues positive recurrent when the arrival rates are
strictly feasible, this algorithm is not implementable in a large network for two reasons. First, to
decide whether it can transmit, a node must know if it belongs to the independent set with the
maximum weight. To determine that independent set, some node must know the queue lengths.
Getting that information requires a substantial amount of control messages. Second, identifying the
maximum weight independent set, even when knowing all the queue lengths, is a computationally
hard problem. Indeed, the number of independent sets in a large network is enormous and comparing
their weights requires an excessive number of computations.
In this section, we describe a different approach based on a Carrier Sense Multiple Access
(CSMA) protocol. When using this protocol, node i waits a random amount of time that is expo-
nentially distributed with rate Ri , i.e., with mean 1/Ri . The waiting times of the different nodes
are independent. At the end of its waiting time, a node listens to the radio channel. If it hears some
transmission, then it calculates a new waiting time and repeats the procedure. Otherwise, it transmits
a packet. For simplicity, we assume for now that the carrier sensing is perfect. That is, if one node i
starts transmitting at time t and a conflicting node j listens to the channel at time t + ε, then we
assume that node j hears the transmission of node i, for any arbitrarily small ε. Therefore, there
is no collision because under the above assumption a collision can only occur when two conflicting
nodes start transmitting at exactly the same time, which has probability 0 with the exponentially
distributed backoff times. In practice, this assumption is not valid: it takes some time for a node to
sense the transmission of another node. Moreover, we assume that there is no hidden node. This
means that if node i does not hear any conflicting transmission and starts sending a packet to its
intended receiver k, then there is no other node j that is transmitting and can be heard by k and
not by i. This is another approximation. In Chapter 5, we explain how to analyze the network with
collisions.
It turns out that this protocol is easier to analyze in continuous time than in discrete time.
For ease of analysis, we also assume that the packet transmission times are all independent and
exponentially distributed with mean 1. The arrival processes at the three nodes are independent with
rate λi .
Let us pretend that the nodes always have packets to transmit and that, when they run out,
they construct a dummy packet whose transmission time is distributed as that of a real packet. With
these assumptions, the set St of nodes that transmit at time t is modeled by a continuous-time
Markov chain that has the state transition diagram shown in Figure 2.2.
For instance, a transition from ∅ to {1} occurs when node 1 starts to transmit, which happens
with rate R1 when the waiting time of that node expires. Similarly, a transition from {1} to ∅ occurs
when the transmission of node 1 terminates, which happens with rate 1. Note that a transition from
{2} to {1, 2} cannot happen because node 1 senses that node 2 is already transmitting when its
waiting time expires. The other transitions can be explained in a similar way. We call this Markov
chain the CSMA Markov chain because it models the behavior of the CSMA protocol.

Figure 2.2: The CSMA Markov chain. (The states are the independent sets ∅, {1}, {2}, {3}, and {1, 3}; link i becomes active at rate Ri when no conflicting link is active, and an active link completes its transmission at rate 1.)

One has the following theorem.

Theorem 2.6 Invariant Distribution of CSMA Markov Chain.


The CSMA Markov chain is time-reversible and its invariant distribution π is given by

$$\pi(\emptyset) = K \quad \text{and} \quad \pi(S) = K \prod_{i \in S} R_i \quad \text{for } S \in \{\{1\}, \{2\}, \{3\}, \{1, 3\}\} \qquad (2.6)$$

where K is such that the probabilities of the independent sets add up to one.

Proof:
Recall that a continuous-time Markov chain with rate matrix Q has invariant distribution π
and is time-reversible if and only if

π(i)q(i, j ) = π(j )q(j, i), ∀i, j.

A stochastic process is time-reversible if it has the same statistical properties when reversed in time.
The conditions above, called detailed balance equations, mean that when the Markov chain is stationary,
the rate of transitions from i to j is the same as the rate of transitions from j to i. If that were not
the case, one could distinguish between forward time and reverse time and the Markov chain would
not be time-reversible. Note also that by summing these identities over i, one finds that
$$\sum_i \pi(i) q(i, j) = \pi(j) \sum_i q(j, i) = 0$$

where the last equality follows from the fact that the rows of a rate matrix sum to zero. Thus, π Q = 0
and π is therefore the stationary distribution of the Markov chain. See (38) for a discussion of this
method and its applications.
For the CSMA Markov chain, it is immediate to verify the detailed balance equations. For
instance, let i = {1} and j = {1, 3}. Then q(i, j ) = R3 and q(j, i) = 1, so that one has

π(i)q(i, j ) = (KR1 ) × R3 and π(j )q(j, i) = (KR1 R3 ) × 1,


so that π(i)q(i, j ) = π(j )q(j, i). The identities for the other pairs of states can be verified in a
similar way.
□
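
One can also check (2.6) numerically. The sketch below (ours, with arbitrary illustrative rates Ri) builds the rate matrix Q of the CSMA Markov chain of Figure 2.2 and verifies that the product-form π satisfies πQ = 0:

```python
import numpy as np

R = {1: 2.0, 2: 5.0, 3: 3.0}  # illustrative backoff rates
states = [frozenset(), frozenset({1}), frozenset({2}),
          frozenset({3}), frozenset({1, 3})]  # the independent sets

n = len(states)
Q = np.zeros((n, n))
for i, s in enumerate(states):
    for j, t in enumerate(states):
        if s < t and len(t - s) == 1:    # one more link activates
            (k,) = t - s
            Q[i, j] = R[k]
        elif t < s and len(s - t) == 1:  # one transmission completes (rate 1)
            Q[i, j] = 1.0
    Q[i, i] = -Q[i].sum()

# Product form (2.6): pi(S) proportional to the product of R_i over i in S.
w = np.array([np.prod([R[k] for k in s]) if s else 1.0 for s in states])
pi = w / w.sum()
print(np.allclose(pi @ Q, 0))  # True: pi is the invariant distribution
```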
Assume that Ri = exp{αXi} for i = 1, 2, 3, where Xi is the queue length in node i. The
algorithm is then called A-CSMA, for adaptive CSMA. Assume that α is small, so that Ri = exp{αXi}
changes very slowly (quasi-statically) compared to the dynamics of the CSMA Markov chain; then,
approximately, for any independent set S,
$$\pi(S) = K \exp\Big\{\alpha \sum_{i \in S} X_i\Big\}.$$

This expression shows that, when using the CSMA protocol, the independent set with the largest
weight transmits the largest fraction of time. Thus, the CSMA protocol automatically approximates
the solution of the hard problem of finding the independent set with the maximum weight. However,
it may take a long time for the Markov chain distribution to approach its stationary distribution. In
the meantime, the queue lengths change. Thus, it is not quite clear that this scheme should make
the queues positive recurrent. We prove that this is indeed the case in Chapter 3.
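
To see this approximation of MWM at work, the following sketch (ours, with illustrative queue lengths) computes π(S) = K exp{α Σ_{i∈S} Xi} for growing α; the probability of the maximum-weight independent set {1, 3} tends to 1:

```python
import math

X = {1: 8, 2: 5, 3: 1}  # illustrative queue lengths
ind_sets = [(), (1,), (2,), (3,), (1, 3)]

for alpha in (0.1, 1.0, 5.0):
    w = [math.exp(alpha * sum(X[i] for i in s)) for s in ind_sets]
    pi = [wi / sum(w) for wi in w]
    print(alpha, round(pi[-1], 3))  # probability of {1, 3}, the max-weight set
# As alpha grows, pi({1, 3}) -> 1, mimicking MWM; but a larger alpha also means
# slower convergence of the Markov chain to this distribution.
```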

2.1.4 ENTROPY MAXIMIZATION


We have observed in Theorem 2.6 that the CSMA protocol results in a product-form distribution.
Moreover, we know from statistical mechanics that such distributions maximize the entropy
subject to constraints on some mean values. This property was discovered in the study of Gibbs distributions
that occur in statistical mechanics (see e.g., (74)). For instance, the distribution of the states of the
molecules of an ideal gas with a given average velocity (temperature) is such a distribution. The
maximum entropy property of queuing networks has been explored in (38; 74). See also (69) for a
discussion.
We are looking for a distribution that could be implemented by a CSMA protocol and that
would serve the links fast enough. It is then logical to look for a maximal-entropy distribution that
serves the links fast enough. Such a distribution is likely to be implementable by a CSMA protocol.
Moreover, solving the problem should tell us how to select the parameters of the distribution, i.e.,
the parameters of the CSMA protocol.
Accordingly, we formulate the following problem:


Maximize H (π ) := − π(S) log π(S)
S 
Subject to sj (π ) := π(S) ≥ λj , j = 1, 2, 3 and π(S) = 1. (2.7)
{S|j ∈S} S

In these expressions, the sums are over all independent sets S and H (π ) is the entropy of the
distribution π (see (65)). Also, sj(π) is the service rate of link j under the distribution π since it is
the sum of the probabilities of the independent sets that contain link j.
To solve the problem (2.7), we associate a Lagrange multiplier with each inequality constraint
and with the equality constraint. (See Section 2.4 for a review of that method.) That is, we form the
Lagrangian
$$L(\pi, r) = -\sum_S \pi(S) \log \pi(S) - \sum_j r_j \Big(\lambda_j - \sum_{\{S \mid j \in S\}} \pi(S)\Big) - r_0 \Big(1 - \sum_S \pi(S)\Big). \qquad (2.8)$$

We know that if the rates are strictly feasible, then there is a distribution π that satisfies the
constraints of the problem (2.7). Consequently, to solve (2.7), we can proceed as follows. We find
π that maximizes L(π, r) while r ≥ 0 minimizes that function. More precisely, we look for a saddle
point (π ∗ , r ∗ ) of L(π, r), such that π ∗ maximizes L(π, r ∗ ) over π , and r ∗ minimizes L(π ∗ , r) over
r ≥ 0. Then π ∗ is an optimal solution of (2.7). (In Section 2.4, we give an example to illustrate this
generic method for solving constrained convex optimization problems.)
To maximize L(π, r) over π , we express that the partial derivative of L with respect to π(S0 )
is equal to zero, for every independent set S0 . From (2.8), we find

$$\frac{\partial}{\partial \pi(S_0)} L(\pi, r) = -1 - \log(\pi(S_0)) + \sum_{j \in S_0} r_j + r_0 = 0.$$

This implies that



$$\pi(S) = C \exp\Big\{\sum_{j \in S} r_j\Big\}, \qquad (2.9)$$
where C is a constant such that the probabilities π(S) add up to one. This distribution corresponds to a CSMA algorithm
with parameters Rj = exp{rj }. We conclude that there must exist some parameters Rj such that
the CSMA protocol serves the links fast enough.
Next, we look for the parameters r ∗ . To do this, we use a gradient algorithm to minimize
L(π, r). Note that the derivative of L(π, r) with respect to rj is given as follows:

∂ 
L(π, r) = −(λj − π(S)) = −(λj − sj (π )).
∂rj
{S|j ∈S}

Consequently, to minimize L(π, r), the gradient algorithm should update rj in the direction opposite
to this derivative, with some small step size. Also, we know that rj ≥ 0, so that the gradient
algorithm should project the update onto [0, ∞). That is, the gradient algorithm updates rj as follows:

rj (n + 1) = {rj (n) + α(n)[λj − sj (π(n))]}+ . (2.10)

Here, for any real number x, one defines {x}+ := max{x, 0}, which is the value in [0, ∞) that is the
closest to x. Also, n corresponds to the n-th step of the algorithm. At that step, the parameters r(n)
are used, and they correspond to the invariant distribution π(n) given by (2.9) with those parameters.
In this expression, α(n) is the step size of the algorithm.
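
In the idealized setting where the exact service rates sj(π(n)) are available, the update (2.10) can be run directly. Here is a sketch (ours, with illustrative arrival rates and a fixed step size), computing sj(π) exactly from the product form (2.9):

```python
import math

ind_sets = [(), (1,), (2,), (3,), (1, 3)]
lam = {1: 0.4, 2: 0.3, 3: 0.4}  # strictly feasible arrival rates (illustrative)
r = {1: 0.0, 2: 0.0, 3: 0.0}
alpha = 0.1                     # fixed step size

def service_rates(r):
    """s_j(pi) for the product-form distribution (2.9)."""
    w = {s: math.exp(sum(r[j] for j in s)) for s in ind_sets}
    Z = sum(w.values())
    return {j: sum(w[s] for s in ind_sets if j in s) / Z for j in (1, 2, 3)}

for n in range(5000):
    s = service_rates(r)
    r = {j: max(r[j] + alpha * (lam[j] - s[j]), 0.0) for j in (1, 2, 3)}  # (2.10)

print({j: round(sj, 3) for j, sj in service_rates(r).items()})
# The service rates approach (at least) the corresponding lam_j: the CSMA
# parameters R_j = exp(r_j) serve the links fast enough.
```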
This update rule has the remarkable property that link j should update its parameter rj
(which corresponds to Rj = exp{rj } in the CSMA protocol) based only on the difference between
the average arrival and service rates at that link. Thus, the update does not depend explicitly on what
the other links are doing. The service rate at link j certainly depends in a complicated way on the
parameters of the other links. However, the average arrival and service rates at link j are the only
information that link j requires to update its parameter.
Note also that if the average arrival rate λj at link j is larger than the average service rate
sj of that link, then that link should increase its parameter rj , thus becoming more aggressive in
attempting to transmit, and conversely.
Unfortunately, the link observes actual arrivals and transmissions, not their average rates. In
other words, link j observes a “noisy” version of the gradient λj − sj (π(n)) that it needs to adjust
its parameter rj . This noise in the estimate of the gradient is handled by choosing the step sizes
α(n) and also the update intervals carefully.
Let us ignore this difficulty for now to have a rough sense of how the link should update its
parameter. That is, let us pretend that link j updates its parameter rj every second, that the total
number of arrivals Aj (n) at the link during that second is exactly λj , and that the total number of
transmissions Dj (n) by the link is exactly equal to the average value sj . To simplify the situation even
further, let us choose a fixed step size in the algorithm, so that α(n) = α. With these assumptions,
the gradient algorithm (2.10) is

rj (n + 1) = {rj (n) + α[Aj (n) − Dj (n)]}+ .

Now observe that the queue length Xj (n) at time n satisfies a very similar relation. Indeed, one has

Xj (n + 1) ≈ {Xj (n) + Aj (n) − Dj (n)}+ .

Comparing these update equations, we find that

rj (n) ≈ αXj (n), n ≥ 1,

if we choose rj (0) = αXj (0).


Thus, with this algorithm, we find that the parameters of the CSMA protocol should be

Rj = exp{αXj }, j = 1, 2, 3. (2.11)

In other words, node j should select a waiting time that is exponentially distributed with rate
exp{αXj }. This algorithm is fully distributed and is very easy to implement.
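
A minimal discrete-event sketch (ours) of this A-CSMA, under the idealized assumptions of Section 2.1.3 (perfect sensing, no hidden nodes, Exp(1) transmission times, Poisson arrivals); the rates and α are illustrative, and instead of the dummy packets used in the analysis, idle links here simply do not attempt:

```python
import math, random

random.seed(1)
conflicts = {1: {2}, 2: {1, 3}, 3: {2}}
lam = {1: 0.3, 2: 0.25, 3: 0.3}   # Poisson arrival rates (strictly feasible)
alpha = 0.3
X = {1: 0, 2: 0, 3: 0}            # queue lengths
active = set()                    # links currently transmitting
t, horizon = 0.0, 50_000.0

while t < horizon:
    rates = {('arr', j): lam[j] for j in X}                # arrivals
    for j in active:
        rates[('done', j)] = 1.0                           # Exp(1) packets
    for j in X:
        if j not in active and X[j] > 0 and not (conflicts[j] & active):
            rates[('try', j)] = math.exp(alpha * X[j])     # backoff expires
    total = sum(rates.values())
    t += random.expovariate(total)
    u, acc = random.random() * total, 0.0
    for (kind, j), rate in rates.items():
        acc += rate
        if u <= acc:
            if kind == 'arr':
                X[j] += 1
            elif kind == 'done':
                active.discard(j); X[j] -= 1
            else:
                active.add(j)                              # channel sensed idle
            break

print(X)  # stays bounded: the adaptive rates exp(alpha * X_j) track the backlogs
```

By memorylessness of the exponential backoff, a link that hears a conflicting transmission and redraws its waiting time is equivalent, in this sketch, to simply having no attempt rate while a conflicting link is active.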
However, the actual algorithms we will propose, although still simple and distributed, are
a little different from (2.11). This is because we derived the algorithm (2.11) by making a key
approximation, that the arrivals and transmissions follow exactly their average rate. To be correct, we
have to adjust r slowly enough so that the CSMA Markov chain approaches its invariant distribution
before r changes significantly. There are at least two ways to achieve this.
One way is to modify algorithm (2.11) by using a small enough constant step size α, a large
enough constant update interval T , and imposing an upper bound on r so that the mixing time of
the CSMA Markov chain is bounded. Specifically, node i uses a waiting time that is exponentially
distributed with rate Ri . Every T seconds, all Ri ’s are updated as follows:

$$R_i(n) = \exp\{\min\{(\alpha/T) X_i(n), \; r_{\max} + \epsilon\}\} \qquad (2.12)$$
where rmax, ε > 0 are constants, and the Xi(n)'s are the queue lengths at the time of the n-th update.
Then, we explain in Section 3.4.3 that this algorithm is almost throughput-optimal. (That
is, it can stabilize the queues if the vector of arrival rates is in a region parametrized by rmax . The
region is slightly smaller than the maximal region.)
Another way is to use diminishing step sizes and increasing update intervals so that eventually
the arrival rates and service rates get close to their average values between two updates. This is a time-varying
algorithm since the step sizes and update intervals change with time. Detailed discussions
are provided in Sections 3.4.1 and 3.4.2.
Recapping, the main point of this discussion is that solving the problem (2.7) shows that
there are parameters Rj of the CSMA protocol that serve the links fast enough. Moreover, these
parameters are roughly exponential in the queue lengths. Finally, with a suitable choice of the step
sizes and of the update intervals, one can make the algorithm support the arrival rates.

2.1.5 DISCUSSION
Before moving on to the next topic, it may be useful to comment on the key ideas of the current
section.
The first point that we want to address is the two different justifications we gave for why
Rj = exp{αXj } are suitable parameters. Recall that the first justification was that if the queue
lengths do not change much while the Markov chain approaches its stationary distribution, then

choosing these values leads to a product form π(S) = C exp{α Σ_{j∈S} Xj} that favors independent
sets with a large weight. Thus, in some sense, this choice is an approximation of MWM, which we
know is stable. One flaw in this argument is that the approximation of MWM is better if α is large.
However, in that case, the parameters Rj change very fast as the queue lengths change. This is not
consistent with the assumption that the queue lengths do not change much while the Markov chain
approaches its stationary distribution. The second justification is that it corresponds to a gradient
algorithm with a fixed step size. For this algorithm to be good, the step size has to be fairly small.
However, in that case, we know that the algorithm takes a long time to converge. Thus, we find the
usual tradeoff between speed of convergence and accuracy of the limit.
The second point is related to the first and concerns the convergence time of the algorithm.
The number of states in the Markov chain is the number of independent sets. This number grows
exponentially with the number of links. Thus, one should expect the algorithm to converge slowly
and to result in very poor performance in any practical system. In practice, this is not the case. In fact,
the algorithm appears to perform well. The reason may have to do with the locality of the conflicts
so that good decisions may depend mostly on a local neighborhood and not on a very large number
of links.

2.2 ADMISSION CONTROL


In the previous sections, we assumed that the arrival rates λi are given and that the scheduling
algorithm tries to keep up with these arrivals. In this section, we consider the case where the network
can control the arrivals by exercising some admission control. The goal is to admit packets and
schedule the transmissions to maximize the sum of the utilities of the flows of packets. That is, we
assume that the packets that arrive at rate λi are for some user whose utility is ui (λi ) where ui (·)
is a positive, increasing, and concave function. This function expresses the diminishing return for
a higher rate. Thus, if λi increases by ε, the user perceives a smaller improvement when λi is large
than when it is small.
The problem is then

$$\text{Maximize} \;\; \sum_i u_i(\lambda_i) \qquad \text{s.t. } \lambda \text{ is feasible.} \qquad (2.13)$$

The approach is to combine the A-CSMA protocol as before with admission control. As we
explain below, the network controls the arrivals as follows. When the backlog of node i is Xi , the
arrival rate λi is chosen to maximize ui (λi ) − γ Xi λi where γ is some positive constant. Note that
the choice of λi depends only on the backlog in link i, so that the algorithm is local.
Thus, the arrival rates decrease when the backlogs in the nodes increase. This is a form of
congestion control. Since the mechanism maximizes the sum of the user utilities, it implements a
fair congestion control combined with the appropriate scheduling. In the networking terminology,
one would say that this mechanism combines the transport and the MAC layer. This mechanism is
illustrated in Figure 2.3.

Figure 2.3: Combined admission control and scheduling. (Each source i selects λi to maximize ui(λi) − γXiλi; node i uses a CSMA backoff delay with mean exp{−αXi}. Note that the node decisions are based on local information.)
The main idea used to derive this combined admission control and scheduling algorithm is to
replace the problem (2.13) by the following one:

$$\text{Maximize} \;\; H(\pi) + \beta \sum_i u_i(\lambda_i) \qquad \text{s.t. } s_j(\pi) \ge \lambda_j, \; \forall j. \qquad (2.14)$$

In this problem, β is some positive constant. If this constant is large, then a solution of (2.14)
approximates the solution of (2.13). Indeed, H (π ) is bounded and has a negligible effect on the
objective function of (2.14) when β is large.
The Lagrangian for problem (2.14) (see Section 2.4) is
$$L(\pi, \lambda, r) = H(\pi) + \beta \sum_i u_i(\lambda_i) + \sum_j r_j [s_j(\pi) - \lambda_j] - r_0 \Big[1 - \sum_S \pi(S)\Big].$$

As before, the maximization over π results in the CSMA protocol with rates Rj = exp{rj }. Also,
the minimization over r using a gradient algorithm is as before, which yields rj ≈ αXj . The maxi-
mization over λ amounts to choosing each λj to solve the following problem:

Maximize βuj (λj ) − λj rj .

Since rj ≈ αXj, this problem is essentially the following:
$$\text{Maximize} \;\; u_j(\lambda_j) - \lambda_j X_j \alpha \beta^{-1} = u_j(\lambda_j) - \lambda_j \gamma X_j$$
with γ = αβ⁻¹. This analysis justifies the admission control algorithm we described earlier.
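
For instance (our example, not the book's), with the utility ui(λ) = log λ, the maximizer of ui(λi) − γXiλi over (0, λmax] has the closed form λi = min{1/(γXi), λmax}:

```python
def admitted_rate(X_i, gamma=0.05, lam_max=1.0):
    """argmax of log(lam) - gamma * X_i * lam over (0, lam_max]."""
    # Setting the derivative 1/lam - gamma * X_i to zero gives lam = 1/(gamma * X_i).
    return lam_max if X_i == 0 else min(1.0 / (gamma * X_i), lam_max)

for X_i in (0, 10, 50, 200):
    print(X_i, admitted_rate(X_i))
# The admitted rate decreases as the backlog grows: a congestion-control effect.
```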

2.3 RANDOMIZED BACKPRESSURE


So far, the nodes had to transmit each packet once. We now consider the case of multi-hop networks.
In the network shown in Figure 2.4, the circles represent nodes. There are two flows of packets
(flow 1, indicated by solid lines, and flow 2, indicated by dashed lines). Packets of flow 1 arrive at the
top node and must be transmitted down and eventually to the bottom right node. Packets of flow 2
arrive at the left node and must make their way to the right middle node. The possible paths of the
flows are shown in the figure. Note that the packets of flow 2 can follow two possible paths. That is,
some nodes make a routing decision. Each node maintains one queue per flow that goes through it.
For instance, node i has one queue for flow 1 and one for flow 2. In the figure, the backlog of packets
of flow 1 is indicated by an integer that is underlined whereas that of flow 2 is not underlined. Thus,
the backlog of packets of flow 1 in node i is 8 and that of flow 2 is 3.
Define a link as an ordered pair of two nodes (the transmitter and receiver). In Fig. 2.4,
a, b, . . . , h are all links. Denote by t (·) and w(·) the transmitter and receiver of a link, respectively.
The links are subject to conflicts that are not indicated explicitly in the figure. In general,
any given pair of links may or may not be allowed to transmit at the same time. (Note that the conflicts here are among

Figure 2.4: A multi-hop network.

links instead of nodes.) One obvious conflict is that a link can send only one packet at a time, so that
it must choose whether to send a packet of flow 1 or one of flow 2 when it has the choice. We assume
also that the transmissions have different success probabilities and possibly different physical-layer
transmission rates. For instance, when link e transmits packets, these packets reach the next node
with average rate r(e).
The goal is to design the admission control, the routing, and the scheduling to maximize the
utility
u1 (λ1 ) + u2 (λ2 )
of the two flows of packets.
We explain in Chapter 4 that the following algorithm, again called A-CSMA combined with
admission control and routing, essentially solves the problem:
1) Queuing: Each node maintains one queue per flow of packets that goes through it.
2) Admission Control: λ1 is selected to maximize u1 (λ1 ) − γ X1 λ1 where γ is some constant and X1
is the backlog of packets of flow 1 in the ingress node for these packets. Similarly, λ2 maximizes
u2 (λ2 ) − γ X2 λ2 where X2 is the backlog of packets of flow 2 in the ingress node for these packets.
3) Priority: Each link selects which packet to send as follows. Link d chooses to serve flow 1 since
(8 − 4)r(d) > (3 − 5)r(d) and (8 − 4)r(d) > 0. Here, (8 - 4) is the difference between the backlogs
of packets of flow 1 in node t (d) and w(d), and (3 - 5) is the difference between the backlogs of
packets of flow 2 in node t (d) and w(d). That is, the link chooses the flow with the maximum
backpressure if it is positive. If the maximal backpressure is non-positive, then the link does not
serve any flow. The backpressure of a flow on a link is defined as the rate of the link multiplied by the
difference in the backlogs of that flow between the transmitter and receiver of the link. (One could
think of the backlog as a potential, the rate of the link as its conductance, and the backpressure as
the current across the link when it is activated; the link chooses the flow with the largest current.)
4) Scheduling and routing: The links use the CSMA protocol. Each link has an independent backoff
timer (which is maintained by the transmitter of the link). The rate of the exponentially distributed
backoff delay of a link is exponential in the positive part of backpressure of the flow it has selected.
Since link d selects flow 1 and its backpressure is B := (8 − 4)r(d), the backoff delay of that link
is then exponentially distributed with rate exp{αB} where α is a constant. Note that links b and g,
both serving flow 2, have independent backoff timers. The link with the larger backpressure has a
smaller mean backoff delay. This is a (randomized) routing decision. The intuition is that the packets
should be sent where they flow better.
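In code, steps 3) and 4) at a single link might look as follows. This is a sketch under assumptions of ours: the data layout X[n][f] for the backlog of flow f at node n, and the names t_j, w_j, rate_j for the link's transmitter, receiver and rate r(j); none of these names come from the text.

    # Backpressure flow selection (step 3) and CSMA backoff sampling (step 4)
    # at one link j. X[n][f] is the backlog of flow f at node n.
    import math, random

    def backpressure_and_backoff(X, t_j, w_j, rate_j, flows, alpha=1.0):
        # backpressure of each flow: link rate times the backlog difference
        b = {f: rate_j * (X[t_j][f] - X[w_j][f]) for f in flows}
        f_star = max(b, key=b.get)            # flow with maximal backpressure
        if b[f_star] <= 0:
            return None, None                 # non-positive backpressure: serve no flow
        # backoff is exponential with rate exp(alpha * B), i.e. mean exp(-alpha * B)
        return f_star, random.expovariate(math.exp(alpha * b[f_star]))

With the backlogs of Figure 2.4, X[t_d] = {1: 8, 2: 3} and X[w_d] = {1: 4, 2: 5}, the function selects flow 1 with backpressure (8 − 4)r(d), matching the computation in step 3) above.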
We justify this algorithm when all the links have the same unit rate, to simplify the notation.
Packets of flow f arrive into the network with rate λf , for f = 1, . . . , F . The utility of that flow is
uf (λf ). Each node i maintains a separate queue for each flow f of packets that go through it. The
backlog in that queue is X_{i,f}. Let s_{j,f} be the service rate of packets of flow f by link j. Let δf and
Δf be the source and destination node of flow f. Consider the following problem:


Maximize  H(π) + β ∑f uf(λf)

Subject to  ∑_{j: w(j)=i} s_{j,f} ≤ ∑_{j: t(j)=i} s_{j,f},  ∀f and ∀i ≠ δf, i ≠ Δf,
            λf ≤ ∑_{j: t(j)=δf} s_{j,f},  ∀f,
            ∑f s_{j,f} ≤ sj(π), ∀j,  and  ∑S π(S) = 1.

In this problem, sj (π) is the average service rate of link j under the distribution π .
With dual variables ri,f ’s, one forms a partial Lagrangian:
   
L = H(π) + β ∑f uf(λf) − ∑_{f, i≠δf, i≠Δf} r_{i,f} [ ∑_{j: w(j)=i} s_{j,f} − ∑_{j: t(j)=i} s_{j,f} ]
    − ∑f r_{δf,f} [ λf − ∑_{j: t(j)=δf} s_{j,f} ] − r0 [1 − ∑S π(S)].

We need to maximize L over π, s_{j,f}, λf subject to the constraint ∑f s_{j,f} ≤ sj(π), and
minimize L over r_{i,f} ≥ 0.
The minimization over {ri,f } with a gradient algorithm shows that ri,f ≈ αXi,f . For any pos-
itive constants {ri,f }, we maximize L over {sj,f } and π as follows. First fix π . For a given (j, f ), note
that the term s_{j,f} appears in L at most twice: once in the total departure rate of flow f from node
t(j), and once in the total arrival rate of flow f to node w(j), if w(j) ≠ Δf. Accordingly, s_{j,f}
appears in L with the factor b(j, f) := r_{t(j),f} − r_{w(j),f} ≈ α(X_{t(j),f} − X_{w(j),f}), with the
convention that r_{Δf,f} = 0. Denote the maximal backpressure on link j as B(j) := max_f b(j, f). Then,

subject to the constraint ∑f s_{j,f} ≤ sj(π) (where sj(π) is fixed for the moment), the Lagrangian
is maximized by choosing s_{j,f} = sj(π) for an f satisfying b(j, f) = B(j) (i.e., choosing a flow
with the maximal backpressure) if B(j) > 0, and choosing s_{j,f} = 0, ∀f if B(j) ≤ 0. Plugging the
solution for {s_{j,f}} back into L, we get

L = H(π) + β ∑f [uf(λf) − r_{δf,f} λf] + ∑j [B(j)]⁺ sj(π) − r0 [1 − ∑S π(S)].

Then, we maximize L over π. Similar to the last section, this gives the CSMA algorithm with
Rj = exp{[B(j )]+ }. Finally, the maximization of L over λf yields the same admission control
algorithm as before. By now, we have derived all components of the algorithm described earlier in
this section.

2.4 APPENDIX
In this section, we illustrate an important method to solve a constrained convex optimization problem
by finding the saddle point of the Lagrangian. Consider the following problem.
max_x  −x1² − x2²
s.t.  x1 + x2 ≥ 4
      x1 ≤ 6,  x2 ≤ 5.      (2.15)
With dual variables μ ≥ 0, form the Lagrangian
L(x; μ) = −x1² − x2² + μ1(x1 + x2 − 4) + μ2(6 − x1) + μ3(5 − x2).
We aim to find the saddle point (x ∗ , μ∗ ) such that x ∗ maximizes L(x; μ∗ ) over x, and μ∗
minimizes L(x ∗ ; μ) over μ ≥ 0.
One can verify that x ∗ = (2, 2)T and μ∗ = (4, 0, 0)T satisfy the requirement. Indeed, we have
∂L(x; μ)/∂x1 = −2x1 + μ1 − μ2
∂L(x; μ)/∂x2 = −2x2 + μ1 − μ3
∂L(x; μ)/∂μ1 = x1 + x2 − 4
∂L(x; μ)/∂μ2 = 6 − x1
∂L(x; μ)/∂μ3 = 5 − x2 .
So given μ∗, ∂L(x∗; μ∗)/∂x1 = 0 and ∂L(x∗; μ∗)/∂x2 = 0. Given x∗, ∂L(x∗; μ∗)/∂μ1 = 0 with
μ∗1 > 0; ∂L(x∗; μ∗)/∂μ2 > 0 with μ∗2 = 0; and ∂L(x∗; μ∗)/∂μ3 > 0 with μ∗3 = 0.
It is also easy to verify that x ∗ = (2, 2)T is indeed the optimal solution of (2.15).
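The saddle point can also be found numerically with an Arrow–Hurwicz-type iteration: gradient ascent in x and projected gradient descent in μ, using the partial derivatives listed above. The following sketch is ours; the step size and iteration count are arbitrary illustrative choices.

    # Saddle-point iteration for problem (2.15): ascent in x, projected descent in mu.
    def grad_x(x, mu):
        # partial derivatives of L(x; mu) with respect to x1 and x2
        return [-2 * x[0] + mu[0] - mu[1], -2 * x[1] + mu[0] - mu[2]]

    def grad_mu(x, mu):
        # partial derivatives of L(x; mu) with respect to mu1, mu2, mu3
        return [x[0] + x[1] - 4, 6 - x[0], 5 - x[1]]

    x, mu, step = [0.0, 0.0], [1.0, 1.0, 1.0], 0.01
    for _ in range(100000):
        gx, gm = grad_x(x, mu), grad_mu(x, mu)
        x = [x[i] + step * gx[i] for i in range(2)]               # ascent in x
        mu = [max(0.0, mu[i] - step * gm[i]) for i in range(3)]   # descent, projected to mu >= 0
    print(x, mu)   # approaches x* = (2, 2) and mu* = (4, 0, 0)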
For an in-depth explanation of this Lagrangian method, please refer to (8).

2.5 SUMMARY
This chapter introduced the problem of scheduling links that interfere. We use a simplified model
of interference captured by a conflict graph: either two links conflict or they do not. Accordingly, at
any given time, only links in an independent set can transmit. The first problem is to decide which
independent set should transmit to keep up with arrivals.
We explained that the problem has a solution if the arrival rates are small enough (strictly
feasible). In that case, a simple randomized schedule makes the queue lengths positive recurrent.
The technique of proof was based on a Lyapunov function. However, this schedule requires knowing
the arrival rates.
MWM selects the independent set with maximum sum of backlogs. We proved it makes the
queues positive recurrent, again by using a Lyapunov function. Unfortunately, this algorithm is not
implementable in a large network.
We then described the A-CSMA protocol where the exponentially distributed waiting time of
a node has a rate exponential in its backlog. By exploring the CSMA Markov chain, we showed that
this protocol tends to select an independent set with a large sum of backlogs. We stated a theorem
that claims that this protocol makes the queues positive recurrent.
We then showed how to combine this protocol with admission control to maximize the sum
of the utilities of the flows of packets through the network. The network accepts packets at a rate
that decreases with the backlog in their ingress node.
Finally, we described a multi-hop network where nodes can decide which packet to send
and to which neighbor. We explained that each link selects the flow with the largest backpressure.
Moreover, the links use a CSMA protocol where the mean waiting times are exponentially decreasing
in that backpressure.

CHAPTER 3

Scheduling in Wireless Networks
In this chapter, we consider the scheduling of wireless nodes, assuming perfect CSMA and no hidden
nodes, as we did in Chapter 2. The arrival rates are fixed and each packet reaches its intended receiver
in one hop. We model the interference between links by a conflict graph. The objective is to design
a distributed scheduling protocol to keep up with the arrivals.
In Section 3.1, we formulate the scheduling problem. Section 3.2 defines the CSMA algo-
rithm and studies the CSMA Markov chain with fixed parameters. In Section 3.3, we show that
there exist suitable parameters in the CSMA algorithm to support any vector of strictly feasible ar-
rival rates, and these parameters can be obtained by maximizing a concave function whose gradient
is the difference between the average arrival rates and the average service rates at the nodes. This
observation suggests an idealized algorithm to adjust the CSMA parameters. However, the nodes
observe the actual service rates and arrival rates, not their average values. Consequently, the proposed
algorithm, described in Section 3.4.1, is a stochastic approximation algorithm called Algorithm 1.
Different from Algorithm 1, Section 3.4.3 proposes another algorithm where the CSMA param-
eters are directly related to the queue lengths. Section 3.5 provides an alternative interpretation of
the algorithms. It shows that the suitable invariant distribution of the independent sets has the
maximal entropy consistent with the average service rates being at least equal to the arrival rates.
This maximum entropy distribution is precisely that of a CSMA Markov chain with the appro-
priate parameters. This interpretation is important because it enables to generalize the algorithms
to solve utility maximization problems with admission control and routing, as we do in Chapter
4. Section 3.6 explains a variation of Algorithm 1, called Algorithm 1(b), to reduce delays in the
network. Section 3.7 provides simulation results that confirm the properties of Algorithms 1 and
1(b). Sections 3.8, 3.9 and 3.10 are devoted to the proof of the optimality of the proposed algorithm.
In Section 3.12, we explain how the result extends to the case when the packet transmission times
have general distributions that may depend on the link. Finally, Section 3.13 collects a few technical
proofs.

3.1 MODEL AND SCHEDULING PROBLEM


As in Chapter 2, we assume a simple model of interference captured by a conflict graph, or equiva-
lently by independent sets. Assume there are K links in the network, where each link is an (ordered)
transmitter-receiver pair. The network is associated with a conflict graph (or “CG”) G = {V , E },
where V is the set of vertices (each of them represents a link) and E is the set of edges. Two links
cannot transmit at the same time (i.e., “conflict”) if and only if (iff ) there is an edge between them.
An independent set (IS) in G is a set of links that can transmit at the same time without any
interference. For example, in the network of Figure 2.1, the ISs are ∅, {1}, {2}, {3}, {1, 3}.
Let X be the set of all ISs of G (not confined to maximal independent sets), and let N = |X |
be the number of ISs. Denote the i’th IS as x i ∈ {0, 1}K , a 0-1 vector that indicates which links are
transmitting in this IS. The k’th element of x i , xki = 1 if link k is transmitting, and xki = 0 otherwise.
We also refer to x i as a transmission state, and xki as the transmission state of link k.
Packets arrive at the links as processes, with rate λk at link k. These arrival processes can be
fairly general, as long as their long-term rates are well-defined. For instance, the arrival processes
can be stationary and ergodic. They do not have to be independent. For simplicity, assume that each
arrived packet has a unit size of 1.
We define the feasibility and strict feasibility of arrivals.

Definition 3.1 Feasibility and Strict Feasibility of Arrivals.



(i) λ is said to be feasible if and only if λ = ∑_{i=1}^{N} p̄_i · x^i for some probability distribution
p̄ ∈ R^N_+ satisfying p̄_i ≥ 0 and ∑_{i=1}^{N} p̄_i = 1. That is, λ is a convex combination of the ISs, such that
it is possible to serve the arriving traffic with some transmission schedule. Denote the set of feasible
λ by C̄.
(ii) λ is said to be strictly feasible iff it is in the set C which denotes the interior of C̄.

Recall that the interior of a set is the collection of points surrounded by a ball of points in
that set. That is, the interior of C¯ is defined as int C¯ := {λ ∈ C¯|B (λ, d) ⊆ C¯ for some d > 0}, where
B(λ, d) = {λ′ : ‖λ′ − λ‖₂ ≤ d}.
We show the following relationship in Section 3.13.1.

Theorem 3.2 Characterization of Strictly Feasible Rates.


λ is strictly feasible if and only if it can be written as λ = ∑_{i=1}^{N} p̄_i · x^i where p̄_i > 0 and
∑_{i=1}^{N} p̄_i = 1.

For example, the vector λ = (0.4, 0.6, 0.4) of arrival rates is feasible since λ = 0.4 ∗ (1, 0, 1) +
0.6 ∗ (0, 1, 0). However, it is not strictly feasible because the IS (0, 0, 0) has zero probability. On
the other hand, λ = (0.4, 0.5, 0.4) is strictly feasible.
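By Theorem 3.2, strict feasibility can also be checked mechanically with a small linear program that maximizes the smallest weight in a representation λ = ∑_i p̄_i x^i. The sketch below is our own construction (it assumes scipy is available and uses the IS list of Figure 2.1); it recovers both conclusions of the example.

    # Strict-feasibility check: maximize t s.t. sum_i p_i x^i = lam,
    # sum_i p_i = 1, p_i >= t. By Theorem 3.2, lam is strictly feasible
    # iff the optimum satisfies t > 0.
    import numpy as np
    from scipy.optimize import linprog

    ISs = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1],[1,0,1]])   # ISs of Figure 2.1

    def max_slack(lam):
        n = len(ISs)
        c = np.zeros(n + 1); c[-1] = -1.0                 # minimize -t
        A_eq = np.zeros((4, n + 1))
        A_eq[:3, :n] = ISs.T; A_eq[3, :n] = 1.0           # sum_i p_i x^i = lam; sum p_i = 1
        b_eq = np.append(lam, 1.0)
        A_ub = np.hstack([-np.eye(n), np.ones((n, 1))])   # t - p_i <= 0
        b_ub = np.zeros(n)
        bounds = [(0, 1)] * n + [(None, None)]
        res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds)
        return res.x[-1] if res.success else None

    print(max_slack([0.4, 0.6, 0.4]))   # ~0: feasible but not strictly feasible
    print(max_slack([0.4, 0.5, 0.4]))   # > 0: strictly feasible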
Now we define what a scheduling algorithm is and when it is called “throughput-optimal”.

Definition 3.3 Scheduling Algorithm, Throughput-Optimal, Distributed.


A scheduling algorithm decides which links should transmit at any time instance t, given the
history of the system (possibly including the history of queue lengths, arrival processes, etc.) up to
time t.
A scheduling algorithm is throughput optimal if it can support any strictly feasible arrival rates
λ ∈ C (in other words, it can stabilize the queues whenever possible). Equivalently, we also say that
such an algorithm achieves the maximal throughput.
We say that a scheduling algorithm is distributed if each link only uses information within its
one-hop neighborhood. We are primarily interested in designing a distributed scheduling algorithm
that is throughput optimum.

In the definition above, stabilizing the queues admits two definitions. When the network is
modeled by a time-homogeneous Markov process (e.g., if the algorithm uses a constant step size),
we define stability by the positive (Harris) recurrence1 of the Markov process. When the network
Markov process is not time-homogeneous (e.g., if the algorithm uses a decreasing step size), we
say that the queues are stable if their long-term departure rate is equal to their average arrival rate
(which is also called rate-stability).

3.2 CSMA ALGORITHM


The idealized CSMA Algorithm works as follows.

Definition 3.4 CSMA Algorithm.


If the transmitter of link k senses the transmission of any conflicting link (i.e., any link m
such that (k, m) ∈ E ), then it keeps silent. If none of its conflicting links are transmitting, then the
transmitter of link k waits (or backs-off ) for a random period of time that is exponentially distributed
with mean 1/Rk and then starts its transmission2 . If some conflicting link starts transmitting during
the backoff, then link k suspends its backoff and resumes it after the conflicting transmission is over.
The transmission time of link k is exponentially distributed with mean 1.

For simplicity, assume that the packet sizes upon transmission can be different from the sizes
of the arrived packets (by re-packetizing the bits in the queue), in order to give the exponentially
distributed transmission times. We discuss how to relax the assumption on the transmission times
in Section 3.12 (which not only provides a more general result but can also make re-packetization
unnecessary).
Assuming that the sensing time is negligible, given the continuous distribution of the backoff
times, the probability for two conflicting links to start transmission at the same time is zero. So
collisions do not occur in idealized-CSMA.
1 Positive recurrence is defined for Markov processes with a countable state space. The concept of positive Harris recurrence is for
Markov processes with an uncountable state space, and it can be viewed as a natural extension of positive recurrence. However, the
precise definition of positive Harris recurrence is not given here since the concept is not used in this book. Interested readers
can refer to (29) for an exact definition and a proof that our CSMA algorithm with a constant step size ensures positive Harris
recurrence.
2 If more than one backlogged link shares the same transmitter, the transmitter maintains independent backoff timers for these
links.
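Before analyzing this algorithm, it may help to see it run. Because the backoff and transmission times are memoryless, suspending and resuming a backoff is equivalent to redrawing it, so the link states can be simulated as a continuous-time Markov chain with a standard Gillespie loop. The sketch below is ours, not the book's simulator; the conflict graph is that of Figure 2.1 (links 1–2 and 2–3 conflict) and the TA values are toy choices.

    # Gillespie simulation of idealized CSMA: an idle link k with all its
    # conflicting links idle turns on at rate R_k = exp(r_k); an active
    # link turns off at rate 1 (unit-mean transmission times).
    import math, random

    conflicts = {1: {2}, 2: {1, 3}, 3: {2}}      # conflict graph of Figure 2.1
    r = {1: 1.0, 2: 1.5, 3: 1.0}                 # transmission aggressiveness

    def simulate(T):
        active, t = set(), 0.0
        on_time = {k: 0.0 for k in conflicts}
        while t < T:
            rates = {}
            for k in conflicts:
                if k in active:
                    rates[k] = 1.0                     # turn-off rate
                elif not (conflicts[k] & active):
                    rates[k] = math.exp(r[k])          # backoff completion rate
            total = sum(rates.values())
            dt = random.expovariate(total)
            for k in active:
                on_time[k] += min(dt, T - t)
            t += dt
            if t >= T:
                break
            u, acc = random.uniform(0, total), 0.0     # pick the event that fires
            for k, rate in rates.items():
                acc += rate
                if u <= acc:
                    active ^= {k}                      # toggle link k
                    break
        return {k: on_time[k] / T for k in conflicts}  # empirical service rates

    print(simulate(200000.0))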
It is not difficult to see that the transmission states form a continuous time Markov chain,
which is called the CSMA Markov chain. The state space of the Markov chain is X . Denote link k’s
neighboring set by N(k) := {m : (k, m) ∈ E}. If in state x^i ∈ X, link k is not active (x^i_k = 0) and all
of its conflicting links are not active (i.e., x^i_m = 0, ∀m ∈ N(k)), then state x^i transits to state x^i + e_k
with rate R_k, where e_k is the K-dimensional vector whose k'th element is 1 and all other elements
are 0's. Similarly, state x^i + e_k transits to state x^i with rate 1. However, if in state x^i any link in the
neighboring set N(k) is active, then state x^i + e_k does not exist (i.e., x^i + e_k ∉ X).
Let rk = log(Rk ). We call rk the transmission aggressiveness (TA) of link k. For a given positive
vector r = {rk , k = 1, . . . , K}, the CSMA Markov chain is irreducible. Designate the stationary
distribution of its feasible states x^i by p(x^i; r). We have the following result (see (5; 71; 45)):

Lemma 3.5 Invariant Distribution of the CSMA Markov Chain


The stationary distribution of the CSMA Markov chain has the following product-form:

p(x^i; r) = exp(∑_{k=1}^{K} x^i_k r_k) / C(r)      (3.1)

where

C(r) = ∑_j exp(∑_{k=1}^{K} x^j_k r_k)      (3.2)

where the summation over j is over all feasible states x^j.

Proof: As in the proof of Theorem 2.6, we verify that the distribution (3.1)–(3.2) satisfies the
detailed balance equations. Consider states x^i and x^i + e_k where x^i_k = 0 and x^i_m = 0, ∀m ∈ N(k).
From (3.1), we have

p(x^i + e_k; r) / p(x^i; r) = exp(r_k) = R_k

which is exactly the detailed balance equation between state x^i and x^i + e_k. Such relations hold for
any two states that differ in only one element, which are the only pairs that correspond to nonzero
transition rates. It follows that the distribution is invariant. □
Note that the CSMA Markov chain is time-reversible since the detailed balance equations
hold. In fact, the Markov chain is a reversible “spatial process” and its stationary distribution (3.1)
is a Markov Random Field ((38), page 189; (17)). (This means that the state of every link k is
conditionally independent of all other links, given the transmission states of its conflicting links.)
Later, we also write p(x^i; r) as p_i(r) for simplicity. These notations are interchangeable
throughout the chapter. Let p(r) ∈ R^N_+ be the vector of all p_i(r)'s. It follows from Lemma 3.5
that s_k(r), the probability that link k transmits, is given by

s_k(r) = ∑_i [x^i_k · p(x^i; r)].      (3.3)

Without loss of generality, assume that each link k has a capacity of 1. That is, if link k
transmits data all the time (without contention from other links), then its service rate is 1 (unit of
data per unit time). Then, sk (r) is also the normalized throughput (or service rate) with respect to
the link capacity.
Even if the transmission times are not exponentially distributed but have a mean of 1, references (5;
45) show that the stationary distribution (3.1) still holds. That is, the stationary distribution is
insensitive to the distribution of the transmission times.
of that insensitivity as Theorem 3.22 in Section 3.12.
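For a small conflict graph, the product form (3.1)–(3.2) and the service rates (3.3) can be evaluated by brute-force enumeration of the ISs. The sketch below is ours, on the same toy graph as the simulation sketch above, and can be used to check the empirical rates produced there.

    # Brute-force evaluation of (3.1)-(3.3) by enumerating all independent sets.
    import math
    from itertools import combinations

    links = [1, 2, 3]
    edges = {(1, 2), (2, 3)}                     # conflict graph of Figure 2.1
    r = {1: 1.0, 2: 1.5, 3: 1.0}

    def independent_sets():
        for size in range(len(links) + 1):
            for subset in combinations(links, size):
                if all((a, b) not in edges and (b, a) not in edges
                       for a in subset for b in subset if a < b):
                    yield frozenset(subset)

    ISs = list(independent_sets())
    weights = {x: math.exp(sum(r[k] for k in x)) for x in ISs}
    C = sum(weights.values())                    # normalizing constant C(r), eq. (3.2)
    p = {x: w / C for x, w in weights.items()}   # stationary distribution, eq. (3.1)
    s = {k: sum(p[x] for x in ISs if k in x) for k in links}   # service rates, eq. (3.3)
    print(s)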

3.3 IDEALIZED ALGORITHM


In this section, we show that there is a choice r ∗ of the parameters r for which the CSMA protocol
achieves the maximal throughput. We show in Theorem 3.8 of Section 3.3.1 that one can choose the
parameters r ∗ that maximize some concave function F (r; λ). Moreover, we show that the gradient of
F (r; λ) with respect to r is the difference between the arrival rates and the average service rates when
the parameters r are used. This observation leads to an idealized algorithm to adjust the parameters
r described in Section 3.3.2. The proposed algorithm based on this idealized version is discussed in
Section 3.4.

3.3.1 CSMA CAN ACHIEVE MAXIMAL THROUGHPUT


The goal of this section is to prove Theorem 3.8. This is done in two steps. First, we show in Lemma
3.6 that suitable rates exist for the CSMA algorithm, provided that a specific function F (r; λ) attains
its maximum over r ≥ 0. Second, one shows that this maximum is attained if λ is strictly feasible.
We show that maximizing F (r; λ) is equivalent to minimizing the Kullback-Leibler diver-
gence between p̄ and p(r) where p̄ characterizes λ. The interpretation of the theorem is then that
the parameters of the CSMA algorithm should be chosen so that the invariant distribution of the
CSMA Markov chain is as close as possible (in the Kullback-Leibler divergence) to the distribution
of the independent sets that corresponds to the arrival rates.

For a λ ∈ C, let p̄ be a probability distribution such that λ = ∑_{i=1}^{N} p̄_i x^i. (Note that p̄ may not
be unique, in which case we arbitrarily choose one such distribution.) Define the following function
(the “log-likelihood function” (68) if we estimate the parameter r assuming that we observe p̄_i).
Note that p̄ only shows up in the derivation of our algorithm; the information of p̄ is not needed
in the algorithm itself.

F(r; λ) := ∑_i p̄_i log(p_i(r))
        = ∑_i p̄_i [∑_{k=1}^{K} x^i_k r_k − log(C(r))]      (3.4)
        = ∑_k λ_k r_k − log(∑_j exp(∑_{k=1}^{K} x^j_k r_k))

where λ_k = ∑_i p̄_i x^i_k is the arrival rate at link k. (Note that the function F(r; λ) depends on λ, but
it does not involve p̄ anymore.)
Consider the following optimization problem:

supr≥0 F (r; λ). (3.5)


Since log(pi (r)) ≤ 0, we have F (r; λ) ≤ 0. Therefore, supr≥0 F (r; λ) exists. Also, F (r; λ) is
concave in r (8). We show that the following lemma holds.

Lemma 3.6 CSMA Can Serve λ if maxr≥0 F (r; λ) Exists


If supr≥0 F (r; λ) is attainable (i.e., there exists finite r ∗ ≥ 0 such that F (r ∗ ; λ) =
supr≥0 F (r; λ)), then sk (r ∗ ) ≥ λk , ∀k. That is, the service rate is not less than the arrival rate when
r = r∗.

Proof. Let d ≥ 0 be a vector of dual variables associated with the constraints r ≥ 0 in problem (3.5);
then the Lagrangian is L(r; d) = F(r; λ) + dᵀr. At the optimal solution r∗, we have

∂L(r∗; d∗)/∂r_k = λ_k − ∑_j x^j_k exp(∑_{k'=1}^{K} x^j_{k'} r∗_{k'}) / C(r∗) + d∗_k
              = λ_k − s_k(r∗) + d∗_k = 0      (3.6)

where s_k(r), according to (3.3), is the service rate (at the stationary distribution) given r. Since d∗_k ≥ 0,
λ_k ≤ s_k(r∗). □
Equivalently, problem (3.5) is the same as minimizing the Kullback–Leibler divergence (KL
divergence) between the two distributions p̄ and p(r):

inf_{r≥0} D_KL(p̄ ‖ p(r))      (3.7)

where the KL divergence

D_KL(p̄ ‖ p(r)) := ∑_i [p̄_i log(p̄_i / p_i(r))] = ∑_i [p̄_i log(p̄_i)] − F(r; λ).

That is, we choose r ≥ 0 such that p(r) is the “closest” to p̄ in terms of the KL divergence.
The above result is related to the theory of Markov Random Fields (68): when we minimize
the KL divergence between a given joint distribution p_I and a product-form joint distribution p_II,
then, depending on the structure of p_II, certain marginal distributions induced by the two joint
distributions are equal (i.e., a moment-matching condition). In our case, the time-reversible CSMA
Markov chain gives the product-form distribution. Also, the arrival rate and service rate on link
k are viewed as two marginal probabilities. They are not always equal, but they satisfy the desired
inequality in Lemma 3.6, due to the constraint r ≥ 0, which is important in our design.
The following condition, proved in Section 3.13.2, ensures that supr≥0 F (r; λ) is attainable.

Lemma 3.7 If λ is Strictly Feasible, then maxr≥0 F (r; λ) Exists


If the arrival rate λ is strictly feasible, then supr≥0 F (r; λ) is attainable.
Combining Lemmas 3.6 and 3.7, we have the following desirable result.

Theorem 3.8 Throughput-Optimality of CSMA.


For any strictly feasible λ there exists a finite r ∗ such that sk (r ∗ ) ≥ λk , ∀k.

To see why strict feasibility is necessary, note that, with any finite parameters of the CSMA
algorithm, the links are all simultaneously idle a positive fraction of the time.

3.3.2 AN IDEALIZED DISTRIBUTED ALGORITHM


Since ∂F (r; λ)/∂rk = λk − sk (r), a simple gradient algorithm to solve (3.5) is

rk (j + 1) = [rk (j ) + α(j ) · (λk − sk (r(j )))]+ , ∀k (3.8)

where j = 0, 1, 2, . . ., and the α(j) are some (small) step sizes.


Since this is an algorithm to maximize a concave function, we know from Theorem A.1 how
to choose step sizes to either converge to the solution or to approach it.
The most important property of algorithm (3.8) is that it admits an easy distributed implemen-
tation in wireless networks because link k can adjust rk based on its local information: arrival rate λk
and service rate sk (r(j )). (If the arrival rate is larger than the service rate, then rk should be increased,
and vice versa.) No information about the arrival rates and service rates of other links is needed.
One important observation is that the nodes observe actual arrival and service rates that are
random and are not equal to their mean values, unlike in (3.8). Therefore, (3.8) is only an idealized
algorithm which cannot be used directly.
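Nevertheless, the idealized iteration (3.8) can be mimicked on a toy example where s_k(r) is computed exactly from the product form, sidestepping the issue of random observations. A sketch of ours; the arrival rates and step size are arbitrary illustrative values.

    # Idealized gradient algorithm (3.8) on the 3-link toy conflict graph
    # (conflicts 1-2 and 2-3): r_k <- [r_k + alpha * (lambda_k - s_k(r))]^+,
    # with exact s_k(r) from the product form (3.1)-(3.3).
    import math

    ISs = [(0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,0,1)]   # independent sets

    def service_rates(r):
        w = [math.exp(sum(x[k] * r[k] for k in range(3))) for x in ISs]
        C = sum(w)
        return [sum(w[i] for i, x in enumerate(ISs) if x[k]) / C for k in range(3)]

    lam = [0.4, 0.3, 0.4]          # strictly feasible toy arrival rates
    r, alpha = [0.0, 0.0, 0.0], 0.1
    for _ in range(5000):
        s = service_rates(r)
        r = [max(0.0, r[k] + alpha * (lam[k] - s[k])) for k in range(3)]
    print(r, service_rates(r))     # the service rates approach lam componentwise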

3.4 DISTRIBUTED ALGORITHMS


In this section, we construct three algorithms (Algorithm 1, a variation, and Algorithm 2) based
on the above results, and we establish their throughput-optimality (or near-throughput-optimality)
properties.
The main idea of Algorithm 1 and its variation is that the nodes observe a noisy version of
the gradient instead of the actual gradient. Accordingly, we use stochastic approximation algorithms
that adjust the parameters slowly enough so that the observed empirical arrival and service rates
approach their mean value. The two algorithms differ in how they adjust the parameters.
There are two sources of error between the observed service rates under CSMA with param-
eters r and their mean value s(r) under the stationary distribution of the CSMA Markov chain with
these parameters. The first one is that the services are random. The second is that the Markov chain
takes time to converge to its stationary distribution after one changes the parameters. This second
effect results in a bias: a difference between the mean value of the observations and their mean
value under the stationary distribution. Thus, the error has two components: a bias and a zero-mean
random error. To make the effect of the bias more and more negligible, we use the same values of
the parameters over intervals that increase over time. The complication is that the Markov chain
might take longer and longer to converge as we change the parameters. The precise proof requires
good estimates of the convergence time of the Markov chain (i.e., of its mixing time).
Section 3.4.1 explains Algorithm 1 and proves that it is throughput-optimal. This algorithm
uses decreasing step sizes and increasing update intervals3 . Section 3.4.2 shows that a variation of
Algorithm 1 with decreasing step sizes and constant update intervals stabilizes the queues when the
arrival rates are in a smaller set (although the set can be made arbitrarily close to C ). Both of the
algorithms are time-varying since the step sizes change with time. A time-invariant algorithm, called
Algorithm 2, is given in Section 3.4.3 with an arbitrarily small loss of throughput. In Algorithm 2,
the CSMA parameters are direct functions of the queue lengths.

3.4.1 THROUGHPUT-OPTIMAL ALGORITHM 1


In Algorithm 1 defined below, the links modify their aggressiveness parameter at times
0, T (1), T (1) + T (2), T (1) + T (2) + T (3), and so on. Here, the durations T (n) increase with n to
give more and more time for the CSMA Markov chain to approach its invariant distribution under
the updated parameters. The rate of convergence of the Markov chain to its stationary distribution
is bounded by the mixing time of the chain, which depends on its parameters. Moreover, the ad-
justments follow a noisy estimate of the gradient, with diminishing step
sizes.
The tricky aspect of the algorithm is that T (n) depends on the parameters r(n) of the Markov
chain. These parameters depend on the step sizes up to step n. We want to choose the step sizes and
the T (n) so that T (n) gets large compared to the mixing time, and yet the step sizes sum to infinity.
This balancing act of finding step sizes that decrease just slowly enough is the technical core of the
proof.
Let link k adjust r_k at times t_j, j = 1, 2, . . ., with t_0 = 0. Define the update interval T(j) :=
t_j − t_{j−1}, j = 1, 2, . . .. Define “period j” as the time between t_{j−1} and t_j, and r(j) as the value of r
set at time t_j. Let λ′_k(j) and s′_k(j) be, respectively, the empirical average arrival rate and service rate
at link k between times t_j and t_{j+1}. That is, s′_k(j) := ∫_{t_j}^{t_{j+1}} x_k(τ) dτ / T(j + 1), where x_k(τ) ∈ {0, 1}
is the state of link k at time instance τ. Note that λ′_k(j) and s′_k(j) are generally random variables.
We design the following distributed algorithm.

Definition 3.9 Algorithm 1.


At time tj +1 where j = 0, 1, 2, . . . , let

r_k(j + 1) = [r_k(j) + α(j) · (λ′_k(j) − s′_k(j))]_D, ∀k      (3.9)

where α(j) > 0 is the step size, and [·]_D denotes the projection onto the set D := [0, rmax] where
rmax > 0. Thus, [r]D = max{0, min{r, rmax }}. We allow rmax = +∞, in which case the projection
is the same as [·]+ .
3 We would like to thank D. Shah for suggesting the use of increasing update intervals.
Observe that each link k only uses its local information in the algorithm.
Remark: If in period j + 1 (for any j ), the queue of link k becomes empty, then link k still transmits
dummy packets with TA rk (j ) until tj +1 . This ensures that the (ideal) average service rate is still
s_k(r(j)) for all k. (The transmitted dummy packets are counted in the computation of s′_k(j).)
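A schematic rendering of one update of Algorithm 1 follows (a sketch of ours; lam_hat and s_hat stand for the locally measured empirical rates λ′_k(j) and s′_k(j), which a real implementation would obtain by counting arrivals and transmissions over period j + 1):

    # One Algorithm 1 update at a link, with the step sizes and update
    # intervals used in Theorem 3.10 below.
    import math

    def alpha(j):                      # alpha(j) = 1 / [(j+2) log(j+2)]
        return 1.0 / ((j + 2) * math.log(j + 2))

    def T(j):                          # update interval T(j) = j + 2
        return j + 2

    def algorithm1_update(r_k, j, lam_hat, s_hat, r_max=float("inf")):
        # r_k(j+1) = [ r_k(j) + alpha(j) * (lam'_k(j) - s'_k(j)) ]_D, D = [0, r_max]
        return min(max(0.0, r_k + alpha(j) * (lam_hat - s_hat)), r_max)

    # example: one update with measured rates 0.5 (arrivals) and 0.4 (service)
    print(algorithm1_update(1.0, j=0, lam_hat=0.5, s_hat=0.4))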
The following result establishes the optimality property of Algorithm 1.

Theorem 3.10 Algorithm 1 is Throughput-Optimal.


For simplicity, assume that at time instances t ∈ {0, 1, 2, . . . }, Ak (t) units of data arrive at link
k. Assume that Ak (t), ∀k, t ∈ {0, 1, . . . } are independent of each other. Also, assume that E(Ak (t)) =
λk, ∀t ∈ {0, 1, . . .} and Ak(t) ≤ C̄. Therefore, the empirical arrival rates are bounded, i.e., λ′_k(j) ≤
λ̄, ∀k, j, for some λ̄ < ∞. Let

α(n) = 1 / [(n + 2) log(n + 2)]  and  T_n = n + 2  for n ≥ 0.
Then, under Algorithm 1 with D = [0, ∞), we have
(i) r(n) → r ∗ as n → ∞;
(ii) The algorithm stabilizes the queues in the sense of rate-stability. That is,

lim Qk (t)/t = 0, ∀k,


t→∞

where Qk (t) is the queue length of link k at time t. In particular, the use of dummy packets do not affect
the rate-stability.

We explain the key steps of the proof in Section 3.8, and we provide further details in Section
3.9.
Discussion
(1) In a related work (48), Liu et al. carried out a convergence analysis, using a differential-equation
method, of a utility maximization algorithm extended from (30) (see Section 4.1 for the algorithm).
However, queueing stability was not established in (48).
(2) It has been believed that optimal scheduling is NP-complete in general. This complexity is
reflected in the mixing time of the CSMA Markov chain (i.e., the time for the Markov chain to
approach its stationary distribution). In (33) (and also in inequality (3.33)), the upper-bound used
to quantify the mixing time is exponential in K. However, the bound may not be tight in typical
wireless networks. For example, in a network where all links conflict, the CSMA Markov chain
mixes much faster than the bound.
(3) There is some resemblance between the above algorithm (in particular the CSMA Markov chain)
and simulated annealing (SA) (22). SA is an optimization technique that utilizes time-reversible
Markov chains to find a maximum of a function. SA can be used, for example, to find the Maximal-
Weighted IS (MWIS) which is needed in Maximal-Weight Scheduling. However, note that our
algorithm does not try to find the MWIS via SA. Instead, the stationary distribution of the CSMA
Markov chain with a properly-chosen r ∗ is sufficient to support any vector of strictly feasible arrival
rates (Theorem 3.8). Also, the time-reversible Markov chain we use is inherent in the CSMA
protocol, which is amenable to distributed implementation. This is not always the case in SA.

3.4.2 VARIATION: CONSTANT UPDATE INTERVALS


In a variant of the algorithm, one can use decreasing α(j ) and constant update intervals T (j )
(instead of increasing T (j )). However, this variant requires that r(j ) be bounded. Therefore, it can
only approximate, but not achieve, the maximal throughput. The variant is

r_k(j + 1) = r_k(j) + α(j) · [λ′_k(j) + ε − s′_k(j) + h̄(r_k(j))].      (3.10)

Note that there is no projection in (3.10). Instead, h̄(r_k(j)) is used to bound r(j) in a “softer” way:

h̄(y) = r_min − y   if y < r_min,
       0            if y ∈ [r_min, r_max],      (3.11)
       r_max − y    if y > r_max.

Then, the following can be shown.

Theorem 3.11 Feasible Rates with Constant Update Intervals.


Assume that

λ ∈ C(r_min, r_max, ε) := {λ | arg max_r F(r; λ + ε · 1) ∈ (r_min, r_max)^K}.

Also assume the same arrival process as in Theorem 3.10, such that the empirical arrival rates are bounded,
i.e., λ′_k(j) ≤ λ̄, ∀k, j for some λ̄ < ∞.
Then, if α(j) > 0 is non-increasing and satisfies ∑_j α(j) = ∞, ∑_j α(j)² < ∞ and α(0) ≤ 1,
then r(j) converges to r∗ as j → ∞ with probability 1, where r∗ satisfies s_k(r∗) = λ_k + ε > λ_k, ∀k.
Also, the queues are rate-stable and return to 0 infinitely often.

Remark: Clearly, as r_min → −∞, r_max → ∞ and ε → 0, C(r_min, r_max, ε) → C. So the maximal
throughput can be approached arbitrarily closely by setting r_max, r_min and ε.
The proof of the theorem is similar to that of Theorem 5.4 to be presented later, and it is
therefore omitted here.
3.4.3 TIME-INVARIANT A-CSMA
Although Algorithm 1 is throughput-optimal, r is not a direct function of the queue lengths. In this
section, we consider algorithm (3.12) where r is a function of the queue lengths. It can achieve a
capacity region arbitrarily close to C .

Definition 3.12 Algorithm 2.


Let Q_k(j) be the queue length of node k at time j · T. For simplicity, assume that the dynamics
of Q_k(j) are

Q_k(j + 1) = [Q_k(j) + T(λ_k(j) − s_k(j))]⁺.

The nodes update r at times j · T, j = 1, 2, . . .. (That is, T(j) = T, ∀j.) Specifically, at time
j · T, node k sets

r_k(j) = min{(α/T) Q_k(j), r_max + Δ}, ∀k,      (3.12)

where r_max, Δ > 0 are two constants.
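In code, Algorithm 2 reduces to a two-line update. This is a sketch of ours; the constants are those used later in the simulations of Section 3.7.1.

    # Algorithm 2: the TA is a direct function of the queue length,
    # r_k(j) = min{(alpha/T) Q_k(j), r_max + Delta}.
    ALPHA, T, R_MAX, DELTA = 0.23, 5.0, 8.0, 0.5    # values from Section 3.7.1

    def algorithm2_step(Q_k, lam_hat, s_hat):
        # queue update over one interval of length T, then the TA for the next one
        Q_next = max(0.0, Q_k + T * (lam_hat - s_hat))
        r_k = min((ALPHA / T) * Q_next, R_MAX + DELTA)
        return Q_next, r_k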

We have the following result about Algorithm 2.

Theorem 3.13 Time-invariant A-CSMA and Queueing Stability.


Assume that the vector of arrival rates

λ ∈ C₂(r_max) := {λ′ | arg max_{r≥0} F(r; λ′) ∈ [0, r_max]^K}.

Clearly, C₂(r_max) ⊂ C.
Then, with a small enough α and a large enough T , Algorithm 2 makes the queues stable.

Remark: Note that C₂(r_max) → C as r_max → +∞. Therefore, the algorithm can approach
throughput-optimality arbitrarily closely by properly choosing r_max, α and T.
The proof is given in Section 3.11.

3.5 MAXIMAL-ENTROPY INTERPRETATION


This section provides a different view of the above scheduling problem, which will help us later
develop a number of other algorithms. It shows that the desired distribution of the independent
sets has maximum entropy subject to serving the links fast enough. Accordingly, to generalize the
algorithm of this chapter, one will add an entropy term to the objective function of more general
problems.
Rewrite (3.5) as

max_{r,h}  { ∑_k λ_k r_k − log(∑_j exp(h_j)) }
s.t.  h_j = ∑_{k=1}^{K} x^j_k r_k, ∀j      (3.13)
      r_k ≥ 0, ∀k.
For each j = 1, 2, . . . , N, associate a dual variable u_j with the constraint h_j = ∑_{k=1}^{K} x^j_k r_k.
Write the vector of dual variables as u ∈ R^N_+. Then it is not difficult to find that the dual problem
of (3.13) is the following maximal-entropy problem. (The computation is given in (31).)

max_u  − ∑_i u_i log(u_i)
s.t.  ∑_i (u_i · x^i_k) ≥ λ_k, ∀k      (3.14)
      u_i ≥ 0, ∑_i u_i = 1,

where the objective function is the entropy of the distribution u, H(u) := − ∑_i u_i log(u_i). 4

Let us define the domain of the objective function H(u) as D0 = {u|ui ≥ 0, ∀i, i ui = 1}.
Then, problem (3.14) is the same as

maxu∈D0 − u log(ui )
 i i i (3.15)
s.t. i (ui · xk ) ≥ λk , ∀k.

Also, if for each k we associate a dual variable r_k with the constraint ∑_i (u_i · x^i_k) ≥ λ_k in
problem (3.15), then one can compute that the dual problem of (3.15) is the original problem
max_{r≥0} F(r; λ). (This is shown in Section 3.13.2 as a by-product of the proof of Lemma 3.7.)
This is not surprising since, in convex optimization, the dual of the dual problem is often the
original problem.
What is interesting is that both r and u have concrete physical meanings. We have seen that r_k is
the TA of link k. Also, u_i can be regarded as the probability of state x^i. This observation will be useful
in later sections. A convenient way to guess this is by observing the constraint ∑_i (u_i · x^i_k) ≥ λ_k.
If u_i is the probability of state x^i, then the constraint simply means that the service rate of link k,
∑_i (u_i · x^i_k), is larger than the arrival rate.

Theorem 3.14 Maximal Entropy Property of CSMA Markov Chain.


Given some (finite) TA's of the links (that is, given the dual variable r of problem (3.15)), the stationary
distribution of the CSMA Markov chain maximizes the Lagrangian L(u; r) = − ∑_i u_i log(u_i) +
∑_k r_k (∑_i u_i · x^i_k − λ_k) over all possible distributions u. Also, algorithm (3.8) can be viewed as a
subgradient algorithm to update the dual variable r in order to solve problem (3.15).

Proof. Given some finite dual variables r, the Lagrangian of problem (3.15) is
  
L(u; r) = − ∑_i u_i log(u_i) + ∑_k r_k (∑_i u_i · x^i_k − λ_k).      (3.16)

4 In fact, there is a more general relationship between ML estimation problems such as (3.5) and maximal-entropy problems such
as (3.14) (68) (74). In (31), on the other hand, problem (3.14) was motivated by the “statistical entropy” of the CSMA Markov
chain.

Denote u∗(r) = arg max_{u∈D_0} L(u; r). Since ∑_i u_i = 1, if we can find some w and u∗(r) > 0
such that

∂L(u∗(r); r)/∂u_i = − log(u∗_i(r)) − 1 + ∑_k r_k x^i_k = w, ∀i,

then u∗(r) is the desired distribution. The above conditions are

u∗_i(r) = exp(∑_k r_k x^i_k − w − 1), ∀i,  and  ∑_i u∗_i(r) = 1.

By solving the two equations, we find that w = log[∑_j exp(∑_k r_k x^j_k)] − 1 and

u∗_i(r) = exp(∑_k r_k x^i_k) / ∑_j exp(∑_k r_k x^j_k) > 0, ∀i      (3.17)

satisfy the conditions.


Note that in (3.17), u∗i (r) is exactly the stationary probability of state x i in the CSMA
Markov chain given the TA r of all links. That is, u∗i (r) = p(x i ; r), ∀i (cf. (3.1)). So Algorithm (3.8)
is a subgradient algorithm to search for the optimal dual variable. Indeed, given r, u∗i (r) maximizes
L(u; r); then, r can be updated by the subgradient algorithm (3.8), which is the deterministic version
of Algorithm 1. The whole system is trying to solve problem (3.15) or (3.5). 2
Remark: Section 3.13.2 shows that if λ ∈ C , then there exists a (finite) optimal vector of dual
variables r ∗ for problem (3.15). Therefore, u∗ (r ∗ ) = p(r ∗ ) is the optimal solution of (3.15). Note
that p(r ∗ ) can support the arrival rates λ because it is feasible to (3.15). This offers another look at
Theorem 3.8.
Therefore, the distribution that has the maximal entropy subject to the capacity constraints
in (3.15) is a desirable distribution that can support λ and also be realized by distributed CSMA.

3.6 REDUCING DELAYS: ALGORITHM 1(B)


This section develops a simple scheme to reduce the delays in the network. The idea is to add some
slack to the arrival rates. The amount of slack is controlled by a penalty in the objective function of
the optimization problem. The solution of that problem leads to Algorithm 1(b).
Consider a strictly feasible arrival rate vector λ in the scheduling problem in Section 3.1. With
Algorithm 1, the long-term average service rates are in general not strictly higher than the arrival
rates, so traffic suffers from queueing delay when traversing the links. To reduce the delay, consider
a modified version of problem (3.14):
 
max_{u,w}  − ∑_i u_i log(u_i) + c ∑_k log(w_k)
s.t.  ∑_i (u_i · x^i_k) ≥ λ_k + w_k, ∀k      (3.18)
      u_i ≥ 0, ∑_i u_i = 1
      0 ≤ w_k ≤ w̄, ∀k
where 0 < c < 1 is a small constant. Note that we have added the new variables w_k ∈ [0, w̄] (where
w̄ is a constant upper bound), and we require ∑_i u_i · x^i_k ≥ λ_k + w_k. In the objective function, the
term c · log(w_k) is a penalty function to avoid w_k being too close to 0.
Since λ is in the interior of the capacity region, there is a vector λ′ also in the interior
satisfying λ′ > λ component-wise. So there exist w′ > 0 and u′ (such that ∑_i u′_i x^i_k = λ′_k :=
λ_k + w′_k, ∀k) satisfying the constraints. Therefore, in the optimal solution, we have w∗_k > 0, ∀k
(otherwise, the objective function is −∞, smaller than the objective value that can be achieved by u′
and w′). Thus, ∑_i u∗_i · x^i_k ≥ λ_k + w∗_k > λ_k. This means that the service rate is strictly larger than
the arrival rate, bringing the extra benefit that the queue lengths tend to decrease to 0.
Similar to Section 3.5, we form a partial Lagrangian (with y ≥ 0 as dual variables)

L(u, w; y) = − ∑_i u_i log(u_i) + c ∑_k log(w_k) + ∑_k [y_k (∑_i u_i · x^i_k − λ_k − w_k)]
          = [− ∑_i u_i log(u_i) + ∑_k (y_k ∑_i u_i · x^i_k)] + ∑_k [c · log(w_k) − y_k w_k] − ∑_k (y_k λ_k).      (3.19)

Note that the only difference from (3.16) is the extra term ∑_k [c · log(w_k) − y_k w_k]. Given
y, the optimal w is w_k = min{c/y_k, w̄}, ∀k, and the optimal u is the stationary distribution of
the CSMA Markov chain with r = y. Therefore, the (sub)gradient algorithm to update y is y_k ←
y_k + α(λ_k + w_k − s_k(y)).
Since r = y, we have the following localized algorithm at link k to update rk . Notice its
similarity to Algorithm 1.

Definition 3.15 Algorithm 1(b).


At time t_{j+1} where j = 0, 1, 2, . . ., let

r_k(j + 1) = [r_k(j) + α(j) · (λ′_k(j) + min{c/r_k(j), w̄} − s′_k(j))]_D      (3.20)

for all k, where α(j) is the step size, and D = [0, r_max] where r_max can be +∞. As in Algorithm
1, even when link k has no backlog (i.e., zero queue length), we let it send dummy packets with its
current aggressiveness r_k. This ensures that the (ideal) average service rate of link k is s_k(r(j)) for
all k.
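Compared with Algorithm 1, the only change in the update is the slack term. A sketch of ours (the constants are those used in the simulations of Section 3.7.2; the r_k = 0 case, where c/r_k is infinite so the min equals w̄, is made explicit):

    # Algorithm 1(b): Algorithm 1 plus the slack term min{c / r_k, w_bar},
    # which makes the target service rate strictly exceed the arrival rate.
    C_SLACK, W_BAR = 0.01, 0.02          # values from Section 3.7.2

    def algorithm1b_update(r_k, alpha_j, lam_hat, s_hat, r_max=float("inf")):
        slack = W_BAR if r_k == 0 else min(C_SLACK / r_k, W_BAR)
        return min(max(0.0, r_k + alpha_j * (lam_hat + slack - s_hat)), r_max)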

Since Algorithm 1(b) “pretends” to serve arrival rates higher than the actual arrival rates
(due to the positive term min{c/r_k(j), w̄}), Q_k is not only stable, but it also tends to be small.
Regarding the convergence and stability properties, Theorem 3.10 also holds for Algorithm 1(b).

3.7 SIMULATIONS
In our C++ simulations, the transmission time of every link is exponentially distributed with mean 1 ms,
and the backoff time of link k is exponentially distributed with mean 1/exp(r_k) ms. The capacity
of each link is 1 (data unit)/ms. There are 6 links in the network, whose CG is shown in Fig. 3.1.


Figure 3.1: Conflict graph.

Define 0 ≤ ρ < 1 as the “load factor”, and let ρ = 0.98 in this simulation. The arrival rate
vector is set to λ = ρ · [0.2·(1,0,1,0,0,0) + 0.3·(1,0,0,1,0,1) + 0.2·(0,1,0,0,1,0) + 0.3·(0,0,1,0,1,0)] =
ρ · (0.5, 0.2, 0.5, 0.3, 0.5, 0.3) (data units/ms). We have multiplied a convex combination of
some maximal ISs by ρ < 1 to ensure that λ ∈ C.

3.7.1 TIME-INVARIANT A-CSMA


Initially, all queues are empty, and the initial value of rk is 0 for all k. rk is then adjusted using
Algorithm 2 once every T = 5 ms, with the “step size” α = 0.23, r_max = 8, and Δ = 0.5.
Fig. 3.2 shows the evolution of the queue lengths. They are stable despite some oscillations.
The vector r is not shown since in this simulation, it is α/T times the queue lengths.

3.7.2 TIME-VARYING A-CSMA


We now show the results of Algorithm 1(b) with α(j ) = 0.46/[(2 + j/1000) log(2 + j/1000)] and
T (j ) = (2 + j/1000) ms. We choose the constants c = 0.01, w̄ = 0.02, and rmax = ∞. To show
the negative drift of queues, we assume that initially all queue lengths are 300 data units.
Figure 3.3 shows that the TA vector r converges, and Figure 3.4 shows that the queues tend
to decrease and are stable. However, there are more oscillations in the queue lengths than the case
with constant step size. This is because when α(j ) becomes smaller as j increases, r(j ) becomes less
responsive to the variations of queue lengths.

3.8 PROOF SKETCH OF THEOREM 3.10-(I)


As mentioned before, Algorithm 1 can be viewed as a stochastic approximation (SA) algorithm to
find the suitable CSMA parameters r = r ∗ , which happens to maximize a concave function F (r; λ).
Therefore, our following proof is an extension of the conventional proof of the convergence of SA
algorithms. In Appendix A at the end of the book, we give more details on the background of SA
algorithms and the conventional proof of their convergence.


Figure 3.2: Queue lengths with Algorithm 2.

To prove the optimality of Algorithm 1, however, there exist extra challenges. First, r in
Algorithm 1 is unbounded in general, unlike what is assumed in Appendix A. Second, the error in
the gradient is determined by the mixing property of the CSMA Markov chain, which needs to be
carefully quantified and controlled. As a result, we not only need to choose suitable step sizes α(j)
(as in normal SA algorithms), but we also need to choose the update interval Tj carefully to control
the error in the gradient. We will show that with suitable choices, r converges to r ∗ , and, therefore,
the queues are stabilized.
The main steps of the proof are as follows. We choose α(j ) = 1/[(j + 2) log(j + 2)] and
Tj = j + 2 for j ≥ 0. (More general choices are possible.) In the following, the notation Tj and
T (j ) are interchangeable.
Steps of the Proof:
Step 1: The first step is to decompose the error into a bias and a zero-mean random error.
Let x(j) ∈ {0, 1}^K be the state of the CSMA Markov chain at time t_j = ∑_{m=1}^{j} T_m (with
t_0 = 0). Recall that r(j) is the value of r set at time t_j. Define the random vector U(0) = (r(0), x(0))
Figure 3.3: The vector r (transmission aggressiveness) converges.


and U(j) = (λ′(j − 1), s′(j − 1), r(j), x(j)) for j ≥ 1. Let F_j, j = 0, 1, 2, . . . be the σ-field
generated by {U(m)}_{m=0,1,...,j}. In the following, we write the conditional expectation E(·|F_j)
simply as E_j(·).
Write Algorithm 1 as follows:

r(j + 1) = {r(j ) + α(j )[λ − s(r(j )) + B(j ) + η(j )]}D

where the k-th element of B(j) is

B_k(j) = E_j[λ′_k(j) − s′_k(j)] − [λ_k − s_k(r(j))]

and the k-th element of η(j) is

η_k(j) = (λ′_k(j) − s′_k(j)) − E_j[λ′_k(j) − s′_k(j)].

Thus, B(j) is the bias of the error at step j and η(j) is the zero-mean random error.
Step 2: The second step is to consider the change of a Lyapunov function.


Figure 3.4: The queue lengths.

Let

D(j) = (1/2) ‖r(j) − r∗‖².
Using the expression for r(j + 1) we find that

D(j + 1) ≤ D(j) + α(j)[r(j) − r∗]ᵀ[λ − s(r(j))]
         + α(j)[r(j) − r∗]ᵀB(j) + α(j)[r(j) − r∗]ᵀη(j) + α(j)² C
       = D(j) + α(j)G(j) + E(j)

where C > 0 is a constant,

G(j ) = [r(j ) − r ∗ ]T [λ − s(r(j ))],

and E(j ) consists of the other terms.


Step 3: One shows two properties. For any δ > 0, there exists some ε > 0 such that

G(j) ≤ −ε  whenever D(j) ≥ δ.      (3.21)


Moreover, with probability 1,

∑_{j=0}^{J} E(j) converges to a finite value as J → ∞.      (3.22)

Step 4: The two properties imply that r(j ) converges to r ∗ .

Proof. Pick an arbitrary δ > 0. We first claim that these two properties imply that D(j) < δ for
infinitely many values of j. Indeed, assume otherwise, so that D(j) ≥ δ for all j ≥ j_0. Then by (3.21),
G(j) ≤ −ε, ∀j ≥ j_0. Since D(j + 1) ≤ D(j) + α(j)G(j) + E(j), we have that for n ≥ 1,
D(j_0 + n) ≤ D(j_0) + ∑_{j=j_0}^{j_0+n−1} [α(j)G(j) + E(j)]
          ≤ D(j_0) − ε ∑_{j=j_0}^{j_0+n−1} α(j) + ∑_{j=j_0}^{j_0+n−1} E(j).

Since ∑_j α(j) = ∞ and (3.22) holds, we have that ∑_{j=j_0}^{∞} α(j) = ∞ and that ∑_{j=j_0}^{j_0+n−1} E(j)
converges to a finite value as n → ∞. Therefore, D(j_0 + n) goes to −∞ as n → ∞, which is not
possible since D(j_0 + n) ≥ 0. This proves the above claim.
By property (3.22), one can pick j_1 large enough so that |∑_{j=m_1}^{m_2} E(j)| ≤ δ for any m_1, m_2 ≥
j_1. Since D(j) < δ for infinitely many values of j, we can choose j_2 > j_1 such that D(j_2) < δ. It
follows that for any j > j_2,

D(j) ≤ D(j_2) + ∑_{m=j_2}^{j−1} [α(m)G(m) + E(m)]
     ≤ D(j_2) + ∑_{m=j_2}^{j−1} E(m)
     ≤ D(j_2) + δ < 2δ.

But the choice of δ > 0 is arbitrary (i.e., δ can be arbitrarily small). It follows that D(j )
converges to zero, and therefore that r(j) converges to r∗, as claimed. □
The details of the proof, given in Section 3.9, are to show the properties (3.21)–(3.22). Prop-
erty (3.21) holds because the function F (r; λ), defined in (3.4), is strictly concave in r. Essentially,
when r is away from r ∗ , a step in the direction of the gradient brings r strictly closer to r ∗ . Proving
property (3.22) has two parts: bounding B(j ) and showing that the zero-mean noise has a finite
sum. The first part is based on estimates of the convergence rate of the Markov chain (its mixing
time). The second part uses a martingale convergence theorem.
Step 5: Rate-stability.
Since r(j ) converges to r ∗ , it can be shown that the long-term average service rate of each
link k converges to sk (r ∗ ) ≥ λk , which implies rate-stability. Proof of this step is in Section 3.10.

3.9 FURTHER PROOF DETAILS OF THEOREM 3.10-(I)


This section provides the details for the proof of properties (3.21)–(3.22).

3.9.1 PROPERTY 3.21


Since F(r; λ) defined in (3.4) (which is simply written as F(r) below) is strictly concave, we know that
max_{r≥0} F(r) has a unique solution r∗. So, for any r satisfying (1/2)‖r − r∗‖² = δ > 0, we have F(r) <
F(r∗). Let ε := min_{r: (1/2)‖r−r∗‖²=δ} {F(r∗) − F(r)} > 0. Then, for any r satisfying (1/2)‖r − r∗‖² ≥ δ, we
have F(r) ≤ F(r∗) − ε, due to the concavity of F(·).
Therefore, if D(j) ≥ δ, by the concavity of F(·), one has G(j) = −[r∗ − r(j)]ᵀ∇F(r(j)) ≤
−[F(r∗) − F(r(j))] ≤ −ε.

3.9.2 PROPERTY 3.22: NOISE


We show that

∑_{j=0}^{n} α(j)[r(j) − r∗]ᵀη(j) converges to a finite random variable as n → ∞.

First note that

r(n) = O(∑_{m=0}^{n−1} α(m)) = O(log log(n)).      (3.23)

Let

Y(j) = ∑_{n=0}^{j−1} α(n)[r(n) − r∗]ᵀη(n).

Observe that Y (j ) ∈ Fj , and E[Y (j + 1)|Fj ] − Y (j ) = 0. So Y (j ) is a martingale. Therefore,

E[Y²(j)] = ∑_{n=0}^{j−1} E{[α(n)(r(n) − r∗)ᵀη(n)]²}
         ≤ ∑_{n=0}^{j−1} α²(n) E{‖r(n) − r∗‖² ‖η(n)‖²}.

The first equality above implies that E[Y²(j)] is non-decreasing with j. Also, using (3.23)
and the fact that ‖η(n)‖ is bounded (i.e., ‖η(n)‖ ≤ c < ∞, ∀n), we have

sup_j E[Y²(j)] ≤ lim_{j→∞} c² ∑_{n=0}^{j−1} α²(n) E{‖r(n) − r∗‖²} < ∞.

Now we use the well-known martingale convergence theorem (see, for example, (16)) stated
below.

Theorem 3.16 L2 Martingale Convergence Theorem.


Let {Z(j)}, j = 0, 1, 2, . . . be a martingale. If

sup_j E[Z²(j)] < +∞,

then there exists a random variable Z such that Z(j) → Z almost surely as j → ∞, and E(Z²) < +∞.

Applying the theorem to {Y (j )} completes the proof.

3.9.3 PROPERTY 3.22: BIAS


We show that

∑_{n=0}^{∞} |α(n)[r(n) − r∗]ᵀB(n)| < ∞.      (3.24)

As a result,

lim_{j→∞} sup_{m_2≥m_1≥j} |∑_{n=m_1}^{m_2} α(n)[r(n) − r∗]ᵀB(n)|
≤ lim_{j→∞} sup_{m_2≥m_1≥j} ∑_{n=m_1}^{m_2} |α(n)[r(n) − r∗]ᵀB(n)| = 0,

so

∑_{n=0}^{J} α(n)[r(n) − r∗]ᵀB(n) converges to a finite value as J → ∞.

The main steps to prove (3.24) are as follows:

T_mix(n) = O(exp(c · max_k r_k(n))) = O((log(n))^c)      (3.25)

‖B(n)‖ = O(T_mix(n)/T(n)) = O((log(n))^c / n)      (3.26)

⇒  ∑_n α(n) ‖r(n) − r∗‖ · ‖B(n)‖ < ∞      (3.27)
which implies (3.24).
In (3.25)–(3.26), Tmix (n) is the mixing time of the CSMA Markov chain with parameters
r(n). This means that the distribution πt converges to the invariant distribution π exponentially fast
with parameter Tmix (n) in that
‖π_t − π‖ := (1/2) ∑_i |π_t(i) − π(i)| ≤ b · exp{−t/T_mix(n)},      (3.28)

where b is a constant.
Assuming (3.28) for the time being, we obtain (3.26) as follows. Recall that B(n) =
E_n[λ′(n) − s′(n)] − [λ − s(r(n))] = {E_n[λ′(n)] − λ} − {E_n[s′(n)] − s(r(n))}. With the arrival
process assumed in Theorem 3.10, it is easy to see that ‖E_n[λ′(n)] − λ‖ = O(1/T(n)). Also,

‖E_n[s′(n)] − s(r(n))‖ = O( (1/T_n) ∫_0^{T_n} ‖P(X_t = ·) − π‖ dt )
                      ≤ O( (1/T_n) ∫_0^{T_n} b · e^{−t/T_mix(n)} dt )
                      ≤ O( (b/T_n) ∫_0^{∞} e^{−t/T_mix(n)} dt ) = O(T_mix(n)/T_n).      (3.29)

Combining the above results yields (3.26). The inequality (3.27) is then straightforward.
The mixing time of a CSMA Markov chain may increase with the parameters r. To see this
consider the network of Figure 3.5 and assume that R1 = R2 = R3 = R4 = R. When R is large,
the corresponding Markov chain spends a long time in the states {{2}, {4}, {2, 4}} before visiting
the states {{1}, {3}, {1, 3}} and vice versa. Indeed, assume that the initial state is {2} or {4}. It is very
likely that the Markov chain jumps to {2, 4} before it jumps to ∅. Consequently, it takes a long time
for the probability of the other states to approach their stationary value.

[Figure: four links at the corners of a square (1 and 2 on top, 4 and 3 below); the conflict edges form the ring 1–2–3–4–1, so links 1 and 3 may transmit simultaneously, as may links 2 and 4.]

Figure 3.5: The CSMA Markov chain.

To derive (3.28) and (3.25), we use a coupling argument. (This is a different, and in our
opinion, more intuitive approach than the method used in (33) based on the conductance of the
CSMA Markov chain.)
Let us start a stationary version {X̃t , t ≥ 0} with invariant distribution π and an independent
version {Xt , t ≥ 0} with some arbitrary distribution. These two Markov chains have the rate matrix
Q that corresponds to the CSMA Markov chain with parameters r(n). Let τ be the first time t that
Xt = X̃t .

Lemma 3.17
‖P(X_t = ·) − π‖ ≤ P(τ > t).      (3.30)

Proof. After time τ, we glue the two Markov chains together, so that

P(τ > t) = P(X_t ≠ X̃_t).

Now,

P(X_t = x) − P(X̃_t = x) = P(X_t = x, X_t = X̃_t) + P(X_t = x, X_t ≠ X̃_t)
                         − P(X̃_t = x, X_t = X̃_t) − P(X̃_t = x, X_t ≠ X̃_t)
                         = P(X_t = x, X_t ≠ X̃_t) − P(X̃_t = x, X_t ≠ X̃_t),

since P(X_t = x, X_t = X̃_t) = P(X̃_t = x, X_t = X̃_t). So

|P(X_t = x) − P(X̃_t = x)| ≤ P(X_t = x, X_t ≠ X̃_t) + P(X̃_t = x, X_t ≠ X̃_t).

Summing over x, we find

∑_x |P(X_t = x) − P(X̃_t = x)| ≤ 2P(X_t ≠ X̃_t),

so that

‖P(X_t = ·) − P(X̃_t = ·)‖ ≤ P(X_t ≠ X̃_t) = P(τ > t),

which proves (3.30). □
Now we are ready to prove (3.28) and (3.25) by estimating P (τ > t).

Theorem 3.18 Bound on Mixing Time.


(3.28) and (3.25) hold.

Proof. Let t_0 > 1 be a fixed time interval. The two Markov chains X_t and X̃_t both have the
transition rate matrix Q = (q(i, j))_{1≤i,j≤N}, where N is the number of states. Also, assume that 1 ≤
q(i, j) ≤ R for all i ≠ j with q(i, j) > 0. Choose a state i_0 which corresponds to a maximal independent
set. We will show that P(X_{t_0} = X̃_{t_0} = i_0 | X_0 = i_1, X̃_0 = i_2) is larger than a specific constant for any
initial states i_1 and i_2.
First, consider P(X_{t_0} = i_0 | X_0 = i_1). We construct a path P_{i_1,i_0} from state i_1 to i_0, i.e., i_1 =
j_0 → j_1 → j_2 → · · · → j_{M−1} → j_M = i_0. Let O_1, O_0 be the sets of links which are transmitting (or
“on”) in states i_1 and i_0, respectively. First, for all links in O_1\O_0, we change them from “on” to “off”
one by one (in an arbitrary order). Then, for all links in O_0\O_1, we change them from “off” to “on”
one by one (in an arbitrary order). So, the path has M = |O_1\O_0| + |O_0\O_1| ≤ K jumps, i.e.,

M ≤ K.      (3.31)
It is well known that the Markov chain can be interpreted as follows. From state i, one
picks an exponentially distributed r.v. Wi with rate −q(i, i) as the time to stay in state i before the
jump. Upon the jump, one chooses to jump to state j ≠ i with probability p_{i,j} := −q(i, j)/q(i, i),
independently of Wi. Now we have

P(Xt0 = i0 | X0 = i1)
  ≥ P(Xt reaches i0 along the path P_{i1,i0} at some time t ≤ t0, and then stays at i0 for an interval of at least t0 | X0 = i1)
  = P( Σ_{m=0}^{M−1} W_{jm} ≤ t0 ) · Π_{m=0}^{M−1} p_{jm,jm+1} · exp[q(i0, i0) t0].    (3.32)

Note that each W_{jm} has rate at least 1, since −q(i, i) ≥ 1, ∀i. Let Zm, m = 0, . . . , M − 1,
be M i.i.d. exponentially distributed r.v.s with rate 1. Then we have P( Σ_{m=0}^{M−1} W_{jm} ≤ t0 ) ≥
P( Σ_{m=0}^{M−1} Zm ≤ t0 ) ≥ P(Y = M) = (t0^M / M!) exp(−t0), where Y has a Poisson distribution with
parameter t0. Also, since −q(i, i) ≤ K · R, ∀i, we have p_{jm,jm+1} ≥ 1/(K · R). Finally, since state i0 is a
maximal independent set, it can only jump to another state by turning a link off, so −q(i0, i0) ≤ K.
Using these facts and (3.31) in (3.32), we obtain

P(Xt0 = i0 | X0 = i1) ≥ (t0^M / M!) exp(−t0) · (K · R)^{−M} exp[−K · t0]
                      ≥ (1/K!) exp(−t0) (K · R)^{−K} exp[−K · t0]
                      =: c̄ · R^{−K},

where c̄ is a constant not related to R.
The same bound holds for P(X̃t0 = i0 | X̃0 = i2). Therefore, for any i1, i2,

P(Xt, X̃t meet before t0 | X0 = i1, X̃0 = i2) ≥ P(Xt0 = X̃t0 = i0 | X0 = i1, X̃0 = i2) ≥ c̄² R^{−2K}.

It follows that

P(τ > n · t0) ≤ (1 − c̄² R^{−2K})^n = exp{n · log(1 − c̄² R^{−2K})} ≤ exp{−n c̄² R^{−2K}}.

So

P(τ > t) ≤ P(τ > ⌊t/t0⌋ · t0) ≤ exp{−⌊t/t0⌋ c̄² R^{−2K}}
         ≤ exp{−(t/t0 − 1) c̄² R^{−2K}} ≤ b · exp{−t/(t0 c̄^{−2} R^{2K})},    (3.33)
where b := exp(1).
Comparing with (3.28), we know that

Tmix(n) = O(R(n)^{2K}) = O(exp(2K · r(n))),    (3.34)

so (3.25) holds. 2

3.10 PROOF OF THEOREM 3.10-(II)


Although r → r ∗ as shown in Theorem 3.10-(i), it is still not clear whether the queues are rate-
stable. One reason is that a link transmits dummy packets when its queue is empty, and the dummy
packets do not contribute to the throughput of the link. In this section (in particular in the proof of
Lemma 3.20), we show that the dummy packets do not affect the stability of the queues.
Let Ak(t) be the cumulative amount of traffic that has arrived at link k by time t. By assumption,
limt→∞ Ak(t)/t = λk a.s. Let Sk(t) be the cumulative amount of service provided to link k up to
time t, i.e., Sk(t) := ∫_0^t xk(τ) dτ, where xk(τ) ∈ {0, 1} denotes the transmission state of link k at
time τ. Let Dk(t) be the cumulative amount of traffic that has departed from link k by time t. Note
that when the queue is empty, there is no departure but there can be service, that is,

Dk(t) = ∫_0^t xk(τ) I(Qk(τ) > 0) dτ,

where I(·) is the indicator function. So Sk(t) ≥ Dk(t). Assume that the initial queue lengths are
zero; then it is clear that Dk(t) ≤ Ak(t), and link k's queue length is Qk(t) = Ak(t) − Dk(t).
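These bookkeeping relations can be checked with a short slotted-queue sketch (Python; the names and rates are ours). Service is counted whether or not a real packet is present, so Sk(t) ≥ Dk(t) always, while Qk(t) = Ak(t) − Dk(t):

```python
import random

def run_queue(horizon=100_000, arrival_prob=0.3, service_prob=0.5, seed=1):
    """Slotted single queue: A = cumulative arrivals, S = cumulative service
    opportunities (dummy packets included when the queue is empty), D =
    cumulative real departures. Invariants: S >= D and Q = A - D."""
    rng = random.Random(seed)
    A = S = D = Q = 0
    for _ in range(horizon):
        if rng.random() < arrival_prob:   # one arrival this slot
            A += 1
            Q += 1
        if rng.random() < service_prob:   # the link transmits this slot
            S += 1                        # counted even for a dummy packet
            if Q > 0:
                D += 1                    # a real packet departs
                Q -= 1
    assert S >= D and Q == A - D
    return A / horizon, S / horizon, D / horizon

print("A/t, S/t, D/t =", run_queue())
# With service rate above the arrival rate, D/t approaches A/t, which is the
# rate-stability property established in Lemma 3.20 below.
```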
We complete the proof by stating and proving two lemmas. The first lemma is intuitively
clear, although non-trivial to prove.

Lemma 3.19 Under Algorithm 1,

lim_{t→∞} Sk(t)/t = sk(r*), ∀k,    (3.35)

almost surely.

Proof. This is a quite intuitive result since r → r* a.s. In the following, we first give an outline of
the proof and then present the details.
Recall that r is adjusted at time ti , i = 1, 2, . . . , and Ti = ti − ti−1 . Note that during each
update interval Ti , the TA r is fixed. Since Ti → ∞ as i → ∞, for a given constant T > 0, we can
divide the time into blocks, such that during each block r is fixed, and all the blocks after some initial
time have similar lengths (between T and 2T ). Then we consider the average service rate ŝj in each
block j . We decompose ŝj into an ideal rate sk (rj ) where rj (temporarily) denotes the TA during
block j, an error bias, and a zero-mean martingale noise. Now, in order to compute limt→∞ Sk(t)/t,
we need to average ŝj over all blocks. We show that the average of the martingale noise is 0, the
average of the error bias is arbitrarily close to 0 by choosing large-enough T , and the average of the
ideal rates is sk (r ∗ ) since rj converges to r ∗ . This implies the desired result.
Now we present the proof details. First, we divide the time into blocks. Fix T > 0; we
construct a sequence of times {τj} as follows. Let τ0 = t0 = 0. Denote t(j) := min{ti | ti > τj}, i.e.,
t(j ) is the nearest time in the sequence {ti , i = 1, 2, . . . } that is larger than τj . The following
defines τj , j = 1, 2, . . . , recursively. If t(j ) − τj < 2T , then let τj +1 = t(j ) . If t(j ) − τj ≥ 2T , then
let τj +1 = τj + T . Also, define Uj := τj − τj −1 , j = 1, 2, . . . .
Denote i ∗ (T ) = min{i|Ti+1 ≥ T }, and j ∗ (T ) = min{j |τj = ti ∗ (T ) }. From the above con-
struction, we have
T ≤ Uj ≤ 2T , ∀j > j ∗ (T ). (3.36)
Now, we consider the average service rate in each block j , i.e., ŝj := [Sk (τj +1 ) −
Sk (τj )]/Uj +1 . Write ŝj = sk (r(τj )) + bj + mj , where the “error bias” bj = Ej (ŝj ) − sk (r(τj ))
(Ej (·) is the conditional expectation given r(τj ) and the transmission state at time τj ), and the
martingale noise mj = ŝj − Ej (ŝj ) (note that Ej (mj ) = 0). For convenience, we have dropped the
subscript k in ŝj , bj , mj . But all discussion below is for link k.
First, we show that lim_{N→∞} [ Σ_{j=0}^{N} (mj · Uj+1) / Σ_{j=0}^{N} Uj+1 ] = 0 a.s. Since mj is bounded,
E(mj²) ≤ c1 for some c1 > 0. Clearly, MN := Σ_{j=0}^{N} (mj · Uj+1), N = 0, 1, . . . , is a martingale (define
M−1 = 0). We have E(MN²) = Σ_{j=0}^{N} (E(mj²) · Uj+1²) ≤ c1 Σ_{j=0}^{N} Uj+1². Therefore,

Σ_{N=0}^{∞} [E(MN²) − E(M(N−1)²)] / (Σ_{j=0}^{N} Uj+1)²
  = Σ_{N=0}^{∞} E(mN²) · UN+1² / (Σ_{j=0}^{N} Uj+1)²
  ≤ c1 Σ_{N=0}^{∞} UN+1² / (Σ_{j=0}^{N} Uj+1)²
  = c1 Σ_{N=0}^{j*(T)−1} UN+1² / (Σ_{j=0}^{N} Uj+1)² + c1 Σ_{N=j*(T)}^{∞} UN+1² / (Σ_{j=0}^{N} Uj+1)².

Since

Σ_{N=j*(T)}^{∞} UN+1² / (Σ_{j=0}^{N} Uj+1)²
  ≤ Σ_{N=j*(T)}^{∞} 4T² / (Σ_{j=0}^{N} Uj+1)²
  ≤ Σ_{N=j*(T)}^{∞} 4T² / (Σ_{j=j*(T)}^{N} Uj+1)²
  ≤ Σ_{N=j*(T)}^{∞} 4T² / [(N − j*(T) + 1)² T²]
  = Σ_{N=j*(T)}^{∞} 4 / (N − j*(T) + 1)² < ∞,

we have Σ_{N=0}^{∞} [E(MN²) − E(M(N−1)²)] / (Σ_{j=0}^{N} Uj+1)² < ∞. Using Theorem 2.1 in (27), we
conclude that

lim_{N→∞} [ Σ_{j=0}^{N} (mj · Uj+1) / Σ_{j=0}^{N} Uj+1 ] = 0, a.s.    (3.37)

We know that with probability 1, r → r*. Fix any ε > 0, and consider a realization where r → r*
and (3.37) holds. Choose t0 > τ_{j*(T)} large enough such that ∀t ≥ t0, ||r(t) − r*|| < ε. That is,
after t0, r(t) is near r* and is thus bounded. Similar to (3.26), we have |bj| ≤ c2(ε)/Uj+1 for some
constant c2(ε), for any j satisfying τj > t0. Then, for any large-enough N,

| Σ_{j≤N: τj>t0} (bj · Uj+1) / Σ_{j=0}^{N} Uj+1 | ≤ ( Σ_{j≤N: τj>t0} c2(ε) ) / ( Σ_{j≤N: τj>t0} Uj+1 ) ≤ c2(ε)/T.

Therefore, lim sup_{N→∞} Σ_{j=0}^{N} (bj · Uj+1) / Σ_{j=0}^{N} Uj+1 ≤ c2(ε)/T, and similarly
lim inf_{N→∞} Σ_{j=0}^{N} (bj · Uj+1) / Σ_{j=0}^{N} Uj+1 ≥ −c2(ε)/T.
Also, since r → r* in the realization, it is easy to show that

lim_{N→∞} [ Σ_{j=0}^{N} (sk(r(τj)) · Uj+1) / Σ_{j=0}^{N} Uj+1 ] = sk(r*).

Combining the above facts, we know that with probability 1,
lim sup_{t→∞} Sk(t)/t = lim sup_{N→∞} [ Σ_{j=0}^{N} (ŝj · Uj+1) / Σ_{j=0}^{N} Uj+1 ] ≤ sk(r*) + c2(ε)/T, and
lim inf_{t→∞} Sk(t)/t = lim inf_{N→∞} [ Σ_{j=0}^{N} (ŝj · Uj+1) / Σ_{j=0}^{N} Uj+1 ] ≥ sk(r*) − c2(ε)/T.
Since the above argument holds for any T > 0, letting T → ∞ we have lim_{t→∞} Sk(t)/t =
sk(r*) with probability 1. 2

Lemma 3.20 If sk(r*) ≥ λk, ∀k, and (3.35) holds a.s., then lim_{t→∞} Dk(t)/t = λk, ∀k, a.s. That is,
the queues are “rate-stable”.

Proof. Again, we first give an outline of the proof, and then we present the details. The proof is
composed of two parts. Part (a) shows that lim inf_{t→∞} [Ak(t) − Dk(t)]/t = 0 a.s., and part (b) shows
that lim sup_{t→∞} [Ak(t) − Dk(t)]/t = 0 a.s. Combining the two parts gives the desired result.
To show the result in part (a), suppose to the contrary that lim inf_{t→∞} [Ak(t) − Dk(t)]/t >
ε > 0. This implies that there is some finite time T0 > 0 such that

Qk(t) = Ak(t) − Dk(t) ≥ ε · t, ∀t ≥ T0,    (3.38)

as shown in Fig. 3.6(a). So, no dummy packet is transmitted after T0 due to the non-empty queue,
which implies that the average departure rate lim_{t→∞} Dk(t)/t is equal to the average service rate
lim_{t→∞} Sk(t)/t. However, (3.38) implies that the average service rate (which equals the average
departure rate) is strictly smaller than the average arrival rate, leading to a contradiction.
To show the result in part (b), suppose to the contrary that lim sup_{t→∞} [Ak(t) − Dk(t)]/t >
2a for some constant a > 0. This means that Ak(t) − Dk(t) ≥ 2a · t infinitely often. By part (a),
we also know that Ak(t) − Dk(t) ≤ a · t infinitely often. Therefore, for any T1 > 0, there exist
t2 > t1 ≥ T1 such that in the interval t ∈ [t1, t2], Qk(t) = Ak(t) − Dk(t) grows from below a · t1 to
above 2a · t2, and no dummy packet is transmitted in between (see Fig. 3.6(b)). We show that

[Figure 3.6: Proof of Lemma 3.20. Panel (a), Scenario 1: Qk(t) stays above the line ε·t for all t ≥ T0. Panel (b), Scenario 2: Qk(t) rises from below a·t1 at time t1 to above 2a·t2 at time t2.]

this indicates a large fluctuation of [Ak (t) − Sk (t)]/t, contradicting the fact that [Ak (t) − Sk (t)]/t
converges to a limit.
Next, we present the proof details.
(a) We first show that lim inf_{t→∞} [Ak(t) − Dk(t)]/t = 0 a.s. For this purpose, we show that
∀ε > 0, P(lim inf_{t→∞} [Ak(t) − Dk(t)]/t > ε) = 0. If in a realization,

lim inf_{t→∞} [Ak(t) − Dk(t)]/t > ε,    (3.39)

then ∃T0 > 1/ε s.t. ∀t ≥ T0, [Ak(t) − Dk(t)]/t ≥ ε, i.e., Qk(t) ≥ ε · t. Since T0 > 1/ε, we have
Qk(t) > 1, ∀t ≥ T0, i.e., the queue is not empty after T0. Therefore, for any t ≥ T0, Sk(t) = Sk(T0) +
[Sk(t) − Sk(T0)] = Sk(T0) + [Dk(t) − Dk(T0)] ≤ T0 + Dk(t). So

lim sup_{t→∞} Sk(t)/t ≤ lim sup_{t→∞} [T0 + Dk(t)]/t = lim sup_{t→∞} Dk(t)/t.

By the assumption (3.39), lim sup_{t→∞} Dk(t)/t < lim inf_{t→∞} Ak(t)/t − ε. So
lim sup_{t→∞} Sk(t)/t < lim inf_{t→∞} Ak(t)/t − ε. Therefore, the intersection of events

{ lim_{t→∞} Sk(t)/t ≥ lim_{t→∞} Ak(t)/t } ∩ { lim inf_{t→∞} [Ak(t) − Dk(t)]/t > ε } = ∅.    (3.40)

On the other hand, with probability 1, lim_{t→∞} Ak(t)/t = λk and lim_{t→∞} Sk(t)/t =
sk(r*). Since sk(r*) ≥ λk, P(lim_{t→∞} Sk(t)/t ≥ lim_{t→∞} Ak(t)/t) = 1. In view of (3.40), we have
P(lim inf_{t→∞} [Ak(t) − Dk(t)]/t > ε) = 0. Since this holds for any ε > 0, we conclude that
lim inf_{t→∞} [Ak(t) − Dk(t)]/t = 0 a.s.
(b) Second, we show that lim supt→∞ [Ak (t) − Dk (t)]/t = 0 a.s..
From (a), we know that for an arbitrary a > 0, with probability 1, [Ak(t) − Dk(t)]/t ≤ a
infinitely often (“i.o.”), and lim_{t→∞} [Ak(t) − Sk(t)]/t ≤ 0. Consider a realization in which the
above two events occur, and suppose that lim sup_{t→∞} [Ak(t) − Dk(t)]/t > 2a. Then [Ak(t) − Dk(t)]/t ≥ 2a
i.o.
By the above assumptions, Qk(t) = Ak(t) − Dk(t) ≤ a · t i.o. and Qk(t) = Ak(t) − Dk(t) ≥
2a · t i.o. Also note that in any time interval of length 1, Qk(t) can increase by at most C̄ (since the
number of arrivals in each time slot is bounded by C̄). So, for any T1 (satisfying a · T1 ≥ 4C̄), there
exist t2 > t1 ≥ T1 such that Qk(t1) ≤ a · t1, Qk(t2) ≥ 2a · t2, and Qk(t) ≥ 2C̄ for any t1 < t < t2.
Since the queue is not empty from time t1 to t2, we have

Sk(t2) − Sk(t1) = Dk(t2) − Dk(t1).

Denote Bk(t) := Ak(t) − Sk(t); then

Bk(t2) = Bk(t1) + [Bk(t2) − Bk(t1)]
       = Bk(t1) + {[Ak(t2) − Ak(t1)] − [Sk(t2) − Sk(t1)]}
       = Bk(t1) + {[Ak(t2) − Ak(t1)] − [Dk(t2) − Dk(t1)]}
       = Bk(t1) + Qk(t2) − Qk(t1)
       ≥ Bk(t1) + 2a · t2 − a · t1.

Therefore,
Bk(t2)/t2 ≥ Bk(t1)/t2 + 2a − a · t1/t2.
Then,
Bk(t2)/t2 − Bk(t1)/t1 ≥ (Bk(t1)/t1) · (t1/t2 − 1) + 2a − a · t1/t2.

Since lim_{t→∞} Bk(t)/t := b ≤ 0, we choose T1 large enough such that ∀t ≥ T1, |Bk(t)/t −
b| ≤ a/3. Then,
|Bk(t1)/t1 − Bk(t2)/t2| ≤ (2/3) · a.    (3.41)

Also, since t1 ≥ T1, we have Bk(t1)/t1 ≤ b + a/3 ≤ a. Since t1/t2 − 1 < 0, it follows that

Bk(t2)/t2 − Bk(t1)/t1 ≥ a · (t1/t2 − 1) + 2a − a · t1/t2 = a,

which contradicts (3.41). Therefore, P(lim sup_{t→∞} [Ak(t) − Dk(t)]/t > 2a) = 0. Since this holds
for any a > 0, we conclude that lim sup_{t→∞} [Ak(t) − Dk(t)]/t = 0 a.s.
Combining (a) and (b) gives lim_{t→∞} [Ak(t) − Dk(t)]/t = lim_{t→∞} Qk(t)/t = 0 a.s. 2

Combining the above two lemmas, we conclude that the queues are rate stable under Algo-
rithm 1.

3.11 PROOF OF THEOREM 3.13


Denote r* = arg max_{r≥0} F(r; λ). Since λ ∈ C(rmax), we have r* ∈ [0, rmax]^K ⊂ 𝒟 := [0, D]^K
for some constant D > rmax. Clearly, there is a constant δ > 0 such that F(r; λ) ≤ F(r*; λ) − δ,
∀r ∉ 𝒟, r ≥ 0.
[Figure 3.7: An example with K = 2. The square 𝒟 = [0, D]² in the (r̄1, r̄2) plane contains r*; a point r̄ outside 𝒟 is projected to r on the boundary.]

Define
r̄k := (α/T) · Xk, ∀k.    (3.42)

According to algorithm (3.12), the CSMA parameter is rk(j) = min{r̄k(j), D}, ∀k, or equivalently,
r(j) := [r̄(j)]_𝒟.
In view of the queue dynamics, the vector r̄ = (r̄k)_{k=1,...,K} is also updated every T time units.
In particular, at time (j + 1)T, r̄ is updated for the (j + 1)-th time as follows:

r̄k(j + 1) = [r̄k(j) + α · (λ'k(j) − s'k(j))]+, ∀k,    (3.43)

where we choose
α = δ/K.    (3.44)

Note that the average service rate s'k(j) is achieved with CSMA parameter r(j) (instead of r̄(j)).
Since r(j) ∈ [0, D]^K, ∀j, the mixing time of the CSMA Markov chain (in each update interval T)
is Tmix := O(exp(2K · D)) by (3.34). Therefore, by (3.29), we choose T = T(δ, K, D) =
O(Tmix · (4K · D)/δ) = O(4K · D) exp(2K · D)/δ such that

|Ej[s'k(j)] − sk(r(j))| ≤ δ/(4K · D), ∀k, j.    (3.45)
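Before proceeding to the stability argument, here is a small numerical sketch of the truncated update (3.42)–(3.43) (Python; the toy network and all names are ours). A fully conflicting set of K links stands in for the CSMA chain, so the ideal service rate sk(r) = e^{rk}/(1 + Σ_j e^{rj}) is available in closed form, and the empirical rate is that value plus noise that shrinks with T:

```python
import numpy as np

K, D, T = 3, 5.0, 50.0
delta = 0.3                  # a hypothetical gap delta; alpha = delta/K as in (3.44)
alpha = delta / K
lam = np.array([0.30, 0.20, 0.25])   # strictly feasible arrival rates (sum < 1)

def s_ideal(r):
    """Ideal CSMA service rates when all K links conflict pairwise."""
    e = np.exp(r)
    return e / (1.0 + e.sum())

rng = np.random.default_rng(0)
r_bar = np.zeros(K)
for j in range(5000):
    r = np.minimum(r_bar, D)                                   # r(j) = [r_bar(j)]_D
    s_emp = s_ideal(r) + rng.normal(0.0, 1.0 / np.sqrt(T), K)  # noisy estimate
    r_bar = np.maximum(r_bar + alpha * (lam - s_emp), 0.0)     # update (3.43)

r = np.minimum(r_bar, D)
print("r =", np.round(r, 2), " s(r) =", np.round(s_ideal(r), 3), " lam =", lam)
```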

Next, we show that r̄(j) is stable; therefore, X(j) is stable.
Define the Lyapunov function

L(r̄) := Σ_k Lk(r̄k),
where

Lk(r̄k) := (D − rk*)(r̄k − rk*) · I(r̄k ≥ D) + (1/2)[(r̄k − rk*)² + (D − rk*)²] · I(r̄k < D).
We have

∂L(r̄)/∂r̄k = (D − rk*) · I(r̄k ≥ D) + (r̄k − rk*) · I(r̄k < D) = rk − rk*,    (3.46)

where rk = min{r̄k, D} as defined.


We also write F(r; λ) as F(r) for convenience. For r̄ ∉ 𝒟, recall that r is its projection on
𝒟. Then

Σ_k [∂F(r)/∂rk] · [∂L(r̄)/∂r̄k] = Σ_k [∂F(r)/∂rk] · (rk − rk*) ≤ F(r) − F(r*) ≤ −δ.    (3.47)

So, if r(j) ∉ 𝒟, using (3.47) and (3.45), we have

Δ(j) := Ej[L(r̄(j + 1)) − L(r̄(j))]
  ≤ α Σ_k { [λk − Ej(s'k(j))] · ∂L(r̄(j))/∂r̄k(j) } + (1/2)Kα²
  = α Σ_k { [λk − sk(r(j))] · ∂L(r̄(j))/∂r̄k(j) }
    + α Σ_k { [sk(r(j)) − Ej(s'k(j))] · ∂L(r̄(j))/∂r̄k(j) } + (1/2)Kα²
  ≤ α Σ_k [∂F(r(j))/∂rk(j)] · [∂L(r̄(j))/∂r̄k(j)] + α Σ_k [δ/(4K · D)] · D + (1/2)Kα²
  ≤ α(−δ + δ/4 + (1/2)Kα)
  = −αδ/4,

which establishes the negative drift of L(r̄(j)). Therefore, r̄(j) is stable, and by (3.42), X(j) is also
stable.

3.12 GENERAL TRANSMISSION TIMES


So far we have assumed that the packet transmission times are exponentially distributed with rate 1.
In this section, we explain how the result extends to the case where the packet transmission times are
independent at all the links and are identically distributed at each link k with a general distribution
that has mean μk and finite variance. In particular, we show that the CSMA Markov chain still has
a simple product-form distribution, and Algorithm (3.10) still applies.
Assume as before that link k chooses a waiting time that is exponentially distributed with rate
Rk . We will show that the CSMA Markov chain has the following stationary distribution:

π(S) = C · Π_{k∈S} (Rk μk),

where C is the normalizing constant.


To establish this result, we need a general model of transmission time. Such a model is shown
in Figure 3.8. The figure indicates the activity of node i if that node were alone in the network. This
activity is modeled by an irreducible Markov chain with rate matrix Qi on a finite state space {0} ∪ Ai .
The state 0 corresponds to node i being idle and the states Ai to the node transmitting a packet.
By choosing this Markov chain suitably, one can approximate any transmission time distribution
as closely as desired. The invariant distribution of this Markov chain is πi , and it is such that
πi (0) = 1/(1 + μi Ri ). That assumption is consistent with the idea that the average transmission
time is equal to μi units of time and the rate out of state 0 is equal to Ri . Indeed, on average
the transmitter is idle for 1/Ri time units, then transmits for μi units of time on average, so that
πi (0) = (1/Ri )/(μi + 1/Ri ) = 1/(1 + μi Ri ).

[Figure 3.8: Activity of node i: state 0 means idle; the states in Ai correspond to transmitting a packet.]

Designate by Q'i the rate matrix of this Markov chain reversed in time, defined by

πi(xi) q'i(xi, yi) = πi(yi) qi(yi, xi)    (3.48)

for any two possible states xi, yi.


Now consider the wireless network formed by K such nodes, with their interference con-
straints. The states of activity of these nodes evolve as independent Markov chains with their respec-
tive rate matrices Qi, except that transitions out of the idle state of a node are allowed only when
no neighbor is transmitting. This corresponds to a Markov chain xt described by a vector whose
component i is the activity state of node i. We have the following result.
Lemma 3.21 CSMA Reversed in Time
The Markov chain xt admits the invariant distribution π given as follows:

π(x1 , . . . , xN ) = Bπ1 (x1 ) · · · πN (xN ) (3.49)

where B is the constant such that these probabilities add up to one over all the possible states.
Moreover, the Markov chain reversed in time corresponds to the same CSMA network except that
the activity of each node i is described by the Markov chain with rate matrix Q'i.

Proof:
We prove this result by verifying the equations

π(x) q(x, y) = π(y) q'(y, x), ∀x, y.    (3.50)

Summing these equations over x then proves that πQ = 0, so that π is stationary.
To verify (3.50), consider a pair (x, y) of states such that the transition from x to y corresponds
to a transition from xi to yi by node i. Then q(x, y) = qi(xi, yi) and q'(y, x) = q'i(yi, xi).
Consequently, by (3.48),

B π1(x1) · · · πi(xi) · · · πN(xN) q(x, y) = B π1(x1) · · · πi(yi) · · · πN(xN) q'(y, x),

which shows that the equations (3.50) hold.
2
This result allows us to prove the insensitivity below.

Theorem 3.22 Insensitivity of CSMA Markov Chain.


Let A(i1 , . . . , in ) indicate the event that the nodes i1 , . . . , in are transmitting and the others are
idle. The invariant distribution of the CSMA Markov chain is such that

π(A(i1, . . . , in)) = C · Π_{m=1}^{n} (R_{im} μ_{im}),    (3.51)

where C is the constant such that these probabilities add up to one over all the independent sets. That is,
the probability of each independent set being active does not depend on the distribution of the transmission
times.

Proof:
This result follows directly from (3.49). Indeed, with D = {i1, i2, . . . , in}, one has

π(A(i1, . . . , in)) = Σ_{x∈A(i1,...,in)} π(x) = B · Π_{i∈D} πi(Ai) · Π_{i∉D} πi(0).

Now,

πi(0) = 1/(1 + μi Ri)  and  πi(Ai) = 1 − πi(0) = μi Ri/(1 + μi Ri).
Consequently,

π(A(i1, . . . , in)) = B · Π_{i=1}^{K} (1 + μi Ri)^{−1} · Π_{i∈D} (μi Ri) = C · Π_{i∈D} (μi Ri),

with C = B · Π_{i=1}^{K} (1 + μi Ri)^{−1}. The last expression above is precisely (3.51).
2
Using this product-form result and similar techniques as before, it is not difficult to show that
Algorithm (3.10) (with λ'k(j) defined as the amount of data that arrives at link k in period j + 1,
divided by T(j + 1)) is still near-throughput-optimal and stabilizes the queues.
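The insensitivity statement is easy to check numerically. The sketch below (ours) evaluates (3.51) on an arbitrary conflict graph; note that only the means μk of the transmission times enter:

```python
from itertools import combinations
from math import prod

def stationary_distribution(K, conflicts, R, mu):
    """Product-form distribution (3.51): pi(S) proportional to the product
    of R_k * mu_k over the links in the independent set S."""
    bad = {frozenset(p) for p in conflicts}
    def independent(S):
        return all(frozenset(p) not in bad for p in combinations(S, 2))
    sets = [S for r in range(K + 1) for S in combinations(range(K), r)
            if independent(S)]
    w = {S: prod(R[k] * mu[k] for k in S) for S in sets}  # empty product = 1
    Z = sum(w.values())                                   # normalization 1/C
    return {S: w[S] / Z for S in sets}

# The ring conflict graph of Figure 3.5 (links 0..3), with unequal means mu:
pi = stationary_distribution(K=4, conflicts=[(0, 1), (1, 2), (2, 3), (3, 0)],
                             R=[5.0] * 4, mu=[1.0, 2.0, 1.0, 2.0])
service = [sum(p for S, p in pi.items() if k in S) for k in range(4)]
print("pi({0,2}) =", round(pi[(0, 2)], 4),
      " service rates:", [round(s, 4) for s in service])
```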

3.13 APPENDICES
3.13.1 PROOF OF THE FACT THAT C IS THE INTERIOR OF C̄

Theorem 3.23 λ is strictly feasible if and only if λ ∈ int C̄. (In other words, C = int C̄.)


Proof. (i) If λ is strictly feasible, then it can be written as λ = Σ_i p̄i x^i where p̄i > 0, ∀i, and
Σ_i p̄i = 1. Let p̄0 be the probability corresponding to the all-0 IS, and p̄k be the probability of the
IS e_k, k = 1, 2, . . . , K. Let d0 = min{p̄0/K, min_k p̄k} > 0. We claim that for any λ' that satisfies

|λ'k − λk| ≤ d0, ∀k,    (3.52)

we have λ' ∈ C̄. Indeed, if λ' satisfies (3.52), we can find another probability distribution p̄' such
that Σ_i p̄'i x^i_k = λ'k, ∀k. p̄' can be constructed as follows: let p̄'0 = p̄0 − Σ_k (λ'k − λk), p̄'k = p̄k +
(λ'k − λk), and let the probabilities of all other ISs be the same as those in p̄. By condition (3.52), we
have p̄' ≥ 0. Also, Σ_i p̄'i x^i_k = λ'k, ∀k.
Therefore, B(λ, d0) ⊆ C̄ where d0 > 0. So λ ∈ int C̄.
(ii) Assume that λ ∈ int C̄. We now construct a p > 0 such that λ = Σ_i pi x^i. First, choose
an arbitrary p_I > 0 (such that Σ_i p_{I,i} = 1) and let λ_I := Σ_i p_{I,i} x^i. If it happens that
λ_I = λ, then λ is strictly feasible. In the following, we assume that λ_I ≠ λ. Since λ ∈ int C̄,
there exists a small-enough d > 0 such that λ_{II} := λ + d · (λ − λ_I) ∈ C̄. So λ_{II} can be written
as λ_{II} = Σ_i p_{II,i} x^i where p_{II} ≥ 0 and Σ_i p_{II,i} = 1.
Notice that λ = α · λ_I + (1 − α) · λ_{II} where α := d/(1 + d) ∈ (0, 1). So λ = Σ_i pi x^i
where pi := α · p_{I,i} + (1 − α) · p_{II,i}, ∀i. Since α > 0, 1 − α > 0, and p_{I,i} > 0, p_{II,i} ≥ 0, ∀i, we
have pi > 0, ∀i. Therefore, λ is strictly feasible. 2

3.13.2 PROOF OF PROPOSITION 3.7
Consider the convex optimization problem (3.15), where λ is strictly feasible (i.e., λ = Σ_i p̄i · x^i
for some p̄i > 0, ∀x^i, and Σ_i p̄i = 1).
With dual variables r ≥ 0, the Lagrangian is

L(u; r) = − Σ_i ui log(ui) + Σ_k rk ( Σ_i ui · x^i_k − λk ).    (3.53)

The dual function g(r) is defined as

g(r) := max_{u∈D0} L(u; r).

The proof of Proposition 3.14 has shown that

u*(r) := arg max_{u∈D0} L(u; r) is given by u*_i(r) = exp( Σ_k rk x^i_k ) / Σ_j exp( Σ_k rk x^j_k ).

Therefore,
g(r) = L(u*(r); r) = −F(r; λ).

We now check whether the Slater condition (8) (pages 226–227) is satisfied. Since all the
constraints in (3.15) are linear, we only need to check whether there exists a feasible u which is
in the relative interior (8) of the domain D0 of the objective function − Σ_i ui log(ui), which is
D0 = {u | ui ≥ 0, ∀i, Σ_i ui = 1}. Since λ = Σ_i p̄i · x^i where p̄i > 0, ∀i, and Σ_i p̄i = 1, letting
u = p̄ satisfies the requirement. Therefore, the Slater condition is satisfied. As a result, there exist
(finite) optimal dual variables r* ≥ 0 which attain the minimum of g(r), that is,

g(r*) = min_{r≥0} g(r).

This completes the proof.

Remark 1: The above proof also shows that (3.5) is the dual problem of (3.15).
Remark 2: Another way to show Theorem 3.8 is as follows. With the optimal (finite) dual
variables r*, we know that u*_i(r*), ∀i, solves problem (3.15). Therefore, u*_i(r*), ∀i, are feasible for
problem (3.15). As a result, Σ_i (u*_i(r*) · x^i_k) = sk(r*) ≥ λk, ∀k.
Remark 3: To see that the Slater condition is useful, consider the following example:

max_{u∈D0}  − Σ_{i=1}^{2} ui log(ui)
s.t.  u1 ≥ 1,                                    (3.54)

where D0 = {u|u1 , u2 ≥ 0, u1 + u2 = 1}. Here, the Slater condition is not satisfied because the
only feasible u in D0 is u = (1, 0)T , which is not in the relative interior of D0 .
The dual function in this case is g(r) = log(e^r + 1) − r > 0, which approaches 0 as r → +∞
but cannot attain that infimum. Therefore, there exists no finite optimal dual variable.
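A quick numerical check of this example (a throwaway sketch, not from the text) shows g(r) decreasing toward 0 without attaining it:

```python
import math

# g(r) = log(e^r + 1) - r for problem (3.54): positive, decreasing, infimum 0.
for r in (0, 1, 5, 10, 20):
    print(f"r = {r:2d}: g(r) = {math.log(math.exp(r) + 1) - r:.3e}")
```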

3.14 SUMMARY
This chapter starts with a description of the basic wireless scheduling problem in Section 3.1. The
problem is to schedule transmissions of interfering links to keep up with arrivals. The model of
interference is a conflict graph. We derive necessary and sufficient conditions for the existence of a
suitable schedule. In Section 3.2, we discuss a model of the CSMA protocol. This model assumes
no hidden nodes and also that the carrier sensing is instantaneous. The model results in a Markov
chain of the active independent set. The invariant distribution of that Markov chain is derived
in Lemma 3.5. This distribution has a product form. Section 3.3 introduces an idealized CSMA
algorithm that assumes that each link can estimate its arrival and service rate exactly. The key idea is
to minimize the KL-divergence between two distributions by using a gradient algorithm. Section 3.4
explains Algorithm 1 that uses the actual observations of the links (arrivals and transmissions). This
algorithm is a stochastic approximation version of the idealized algorithm. Section 3.5 elaborates on
the entropy-maximization property of the CSMA Markov chain. Section 3.6 explains Algorithm
1(b), a modification of Algorithm 1 to reduce delays. In that algorithm, the links inflate their arrival
rate. Simulation results are presented in Section 3.7. Section 3.8 sketches the proof of the convergence
of Algorithm 1. The details are in Section 3.9. In particular, the section derives a new bound on
the mixing time of CSMA Markov chains using a coupling argument. Then, Section 3.10 proves
the rate-stability of Algorithm 1. Section 3.12 explains the case when the transmission times have a
general distribution. That section provides a simple proof of the insensitivity of the CSMA Markov
chain. Finally, Section 3.13 collects a few technical proofs.

3.15 RELATED WORKS


There have been a large number of scheduling algorithms proposed in the literature. We review some
of them in this section. Many of the existing algorithms divide the time into equal-length “slots”. In
each slot, the algorithm chooses an IS to be active, based on the queue lengths of different links in
that slot (with higher priority usually given to longer queues).

3.15.1 MAXIMAL-WEIGHT SCHEDULING


A classical throughput-optimum algorithm is maximal-weight scheduling (MWS) (66). (This al-
gorithm has also been applied to achieve 100% throughput in input-queued switches (52).) With
MWS, in each slot, an IS with the maximal “weight” is scheduled, where the “weight” of an IS is
the summation of the queue lengths of the active links in the IS.
However, implementing MWS in general conflict graphs is quite difficult for two reasons.
(i) MWS is inherently a centralized algorithm and is not amenable to distributed implementation;
(ii) finding a maximal-weighted IS (in each slot) is NP-complete in general and is hard even for
centralized algorithms. Therefore, MWS is not suitable for distributed wireless networks.
A randomized version of MWS by Tassiulas (67) provides a simpler (centralized) implemen-
tation. In this algorithm, the maximal-weighted IS is not found in each slot. Instead, in each slot
the algorithm compares the IS used in the previous slot and a randomly generated IS, and it chooses
the one with the larger weight. This algorithm retains the throughput-optimality, and it achieves
linear complexity in each slot. As a tradeoff, the queue lengths are increased (since it takes time to
generate an IS with a near-maximal weight).

3.15.2 LOW-COMPLEXITY BUT SUB-OPTIMAL ALGORITHMS


Due to the above disadvantages of MWS, a number of low-complexity, but sub-optimal scheduling
algorithms have been proposed. The Maximal Scheduling algorithm (MS) was proposed in (10) and
was also studied in the context of 802.11-like protocol (76). In each slot, MS chooses links with
non-empty queues until no further link can be chosen without interference.
Different from MS which does not consider the queue lengths of non-empty queues, the
Longest-Queue-First algorithm (LQF) (15; 37; 79; 43) constructs the schedule in each slot by
iteratively choosing the longest queue. (Therefore, LQF can be viewed as a greedy algorithm.)
Although the above algorithms have low computational complexity, they can only achieve
a fraction of the capacity region C , in general. The size of the fraction depends on the network
topology and interference relationships. Since LQF uses more queue-length information, it can
usually achieve higher throughput than MS and also has good delay performance. In fact, it has
been shown that LQF is throughput optimum if the network topology satisfies a “local pooling”
condition (15), or if the network is small (43). In general topologies, however, LQF is not throughput
optimum, and the fraction of C achievable can be computed as in (37).

3.15.3 THROUGHPUT-OPTIMUM ALGORITHMS FOR RESTRICTIVE INTERFERENCE MODELS
A few recent works proposed throughput-optimal algorithms for certain interference models. For
example, Eryilmaz et al. (19) proposed a polynomial-complexity algorithm for the “two-hop inter-
ference model”5 . Modiano et al. (55) introduced a gossip algorithm for the “node-exclusive model”6 .
The extensions to more general interference models, as discussed in (19) and (55), usually involve
extra challenges. Sanghavi et al. (64) introduced an algorithm that can approach the throughput
capacity (with increasing overhead) for the node-exclusive model.

3.15.4 RANDOM ACCESS ALGORITHMS


Recently, a number of researchers realized that random access algorithms, despite their simplicity,
can achieve high throughput in wireless networks. Random access algorithms differ significantly
from the synchronous time-slotted model adopted in many existing scheduling algorithms described

5 In this model, a transmission over a link from node m to node n is successful iff none of the one-hop neighbors of m and n is in
any conversation at the time.
6 In this model, a transmission over a link from node m to node n is successful iff neither m nor n is in another conversation at the
time.
above. Of particular interest is the CSMA/CA algorithm (Carrier Sense Multiple Access / Collision
Avoidance) widely deployed in the current IEEE 802.11 wireless networks.
In (18), Durvy and Thiran showed that asynchronous CSMA can achieve a high level of
spatial reuse, via the study of an idealized CSMA model without collisions. In (51), Marbach et al.
considered a model of CSMA with collisions. It was shown that under a restrictive “node-exclusive”
interference model, CSMA can be made asymptotically throughput-optimal in the limiting regime
of large networks with a small sensing delay. (Note that when the sensing delay goes to 0, collisions
asymptotically disappear.) In (61), Proutiere et al. developed asynchronous random-access-based
algorithms whose throughput performance, although not optimum, is no less than some maximal
scheduling algorithms, e.g., Maximum Size scheduling algorithms.
However, none of these works have established the throughput optimality of CSMA under a
general interference model, nor have they designed specific algorithms to achieve the optimality.

CHAPTER 4

Utility Maximization in Wireless Networks
In Chapter 3, the problem was to design a distributed scheduling algorithm to keep up with fixed
arrival rates when the transmissions are single hop. In this chapter, we study the combined admission
control, routing, and scheduling problem in a multi-hop network. That is, the arrival rates are not
given ahead of time. Instead, the nodes exercise some admission control. Moreover, packets may
have to go across a number of hops from their source to their destination. Finally, the routing is not
fixed. The nodes choose where to send their packets. The objective is to maximize the total utility
of the flows of packets across the network.
Section 4.1 explains the primal/dual decomposition of the utility maximization problem,
which suggests a distributed algorithm that combines MAC-layer CSMA scheduling and transport-
layer congestion control. Section 4.2 further shows that CSMA scheduling is a modular MAC-layer
component in cross-layer optimization algorithms. Specifically, we demonstrate its combination
with routing, anycast, and multicast with network coding. Section 4.3 provides simulation results
that confirm the properties of the algorithm.

4.1 JOINT SCHEDULING AND CONGESTION CONTROL


In Section 4.1.1, we formulate the optimization problem. Section 4.1.2 derives an algorithm for
that problem. In Section 4.1.3, we show that the algorithm approaches the solution of the utility
maximization problem.

4.1.1 FORMULATION OF OPTIMIZATION PROBLEM


We explained in the previous chapter that, given any feasible rates, a CSMA algorithm can serve
them by adjusting the transmission aggressiveness parameters of the links based on the backlog in
the nodes. These parameters are those that maximize the entropy of the distribution of the CSMA
Markov chain subject to the service rates being larger than the arrival rates. Also, each node can
adjust its parameter by using a gradient algorithm, and it turns out that the adjustment is determined
by the observed increase in the node’s queue length.
The key differences in the problem of this chapter are that the packets may have to go across
multiple hops and that the network adjusts the arrival rates. Thus, the problem is to select the
scheduling and the rates of the flows that the network admits to maximize the utility of the flows.
In the optimization problem, the objective function is the sum of two terms: 1) the entropy of the
distribution of the CSMA Markov chain and 2) a multiple of the utility of the admitted rates. For
any choice of the admitted rates, maximizing the first term results in parameters of the CSMA
protocol that can serve those rates. By maximizing the second term, one chooses rates that have
a large utility. By choosing a large multiplier of the utility, one can approximate the maximum of
that utility. As in the previous chapter, the gradient algorithm enables to compute the appropriate
parameters of the CSMA protocol. However, in this formulation, the gradient depends on the
maximum backpressure for a given link, instead of its backlog as was the case in the previous chapter.
The maximum backpressure for a link is over the flows that the link can serve. We explain that
this maximum determines which flow to serve. In subsequent sections, we add the flexibility in the
choice of the paths that the packets follow, thus including the routing decisions in the formulation.
Assume there are M flows indexed by m = 1, 2, . . . , M. Each flow has a determined path
through the network, from its source to its destination. Define amk = 1 if flow m uses link k, and
amk = 0 otherwise. Let fm be the rate of flow m, and vm (fm ) be the utility function of this flow,
which is assumed to be increasing and strictly concave. The concavity reflects the diminishing value
of extra bandwidth.
Assume all links have the same physical-layer transmission rate 1 (it is easy to extend the
algorithm to different rates). Assume also that each link k maintains a separate queue for each flow
that traverses it. Then, the service rate of flow m by link k, denoted by skm , should be no less than
the incoming rate of flow m to link k. For flow m, if link k is its first link (i.e., the source link), we
say δ(m) = k. In this case, the constraint is skm ≥ fm . If k = δ(m), denote flow m’s upstream link of
link k by up(k, m), then the constraint is skm ≥ sup(k,m),m , where sup(k,m),m is equal to the incoming
service rate of flow m to link k by the previous link.
It turns out that requiring that skm ≥ sup(k,m),m instead of skm ≥ fm results in a local algorithm
instead of an end-to-end one like TCP. Thus, the choice of formulation of the constraints has
a significant impact on the resulting algorithm. Equivalent formulations produce algorithms that
solve the same problem. However, one algorithm may be local whereas another may require more
global exchange of information.
Let ui be the fraction of time that the links in the independent set x^i transmit. With this
notation, the service rate of link k is Σ_i ui · x^i_k. This rate must be at least equal to the sum of the
service rates of all the flows m that use link k. That is, Σ_i ui · x^i_k ≥ Σ_{m: amk=1} skm, ∀k.
Then, consider the following optimization problem:

max_{u,s,f}  − Σ_i ui log(ui) + β Σ_{m=1}^{M} vm(fm)
s.t.  skm ≥ 0, ∀k, m : amk = 1
      skm ≥ s_{up(k,m),m}, ∀m, k : amk = 1, k ≠ δ(m)
      skm ≥ fm, ∀m, k : k = δ(m)                          (4.1)
      Σ_i ui · x^i_k ≥ Σ_{m: amk=1} skm, ∀k
      ui ≥ 0, Σ_i ui = 1,

where β > 0 is a constant weighting factor.
As we explained earlier, the objective function is not exactly the total utility, but it has an extra

term − i ui log(ui ). As will be further explained in Section 4.1.3, when β is large, the “importance”
of the total utility dominates the objective function of (4.1). (This is similar in spirit to the weighting
factor used in (57).) As a result, the solution of (4.1) approximately achieves the maximal utility.

4.1.2 DERIVATION OF ALGORITHM


The main idea to derive the solution is to consider a “partial Lagrangian” that includes Lagrange
multipliers of some selected constraints. One then maximizes that partial Lagrangian over the decision
variables, subject to the constraints not included in the partial Lagrangian. The maximization de-
pends on the Lagrange multipliers. The appropriate multipliers minimize the optimized Lagrangian.
Thus, some selected constraints are relaxed with multipliers, and the others are used directly in the
maximization.
Associate dual variables qkm ≥ 0 to the 2nd and 3rd lines of constraints of (4.1). Then a
partial Lagrangian (subject to skm ≥ 0, Σ_i ui · x^i_k ≥ Σ_{m: amk=1} skm, and ui ≥ 0, Σ_i ui = 1) is

L(u, s, f; q) = − Σ_i ui log(ui) + β Σ_{m=1}^{M} vm(fm)
              + Σ_{m,k: amk=1, k≠δ(m)} qkm (skm − s_{up(k,m),m})
              + Σ_{m,k: k=δ(m)} qkm (skm − fm)
             = − Σ_i ui log(ui)
              + β Σ_{m=1}^{M} vm(fm) − Σ_{m,k: k=δ(m)} qkm fm
              + Σ_{k,m: amk=1} [skm · (qkm − q_{down(k,m),m})],        (4.2)

where down(k, m) denotes flow m's downstream link of link k (note that down(up(k, m), m) = k).
If k is the last link of flow m, then define q_{down(k,m),m} = 0.
Fixing the vectors u and q first, we solve for skm in the sub-problem

max_s  Σ_{k,m: amk=1} [skm · (qkm − q_{down(k,m),m})]
s.t.   skm ≥ 0, ∀k, m : amk = 1                           (4.3)
       Σ_{m: amk=1} skm ≤ Σ_i (ui · x^i_k), ∀k.

The solution is easy to find (similar to (47) and related references therein) and is as follows.
At link k, denote zk := max_{m: amk=1} (qkm − q_{down(k,m),m}). Then:
(i) If zk > 0, then for an m' ∈ arg max_{m: amk=1} (qkm − q_{down(k,m),m}), let s_{km'} = Σ_i (ui · x^i_k) and let
skm = 0, ∀m ≠ m'. In other words, link k serves a flow with the maximal back-pressure qkm −
q_{down(k,m),m}.
(ii) If zk ≤ 0, then let skm = 0, ∀m, i.e., link k does not serve any flow.
Since the value of qdown(k,m),m can be obtained from a one-hop neighbor, this algorithm is
distributed. (In practice, the value of qdown(k,m),m can be piggybacked in the ACK packet in IEEE
802.11.)
Plugging the solution of (4.3) back into (4.2), we get

L(u, f; q) = [− Σ_i ui log(ui) + Σ_k (zk)+ · (Σ_i ui · x^i_k)]
           + [β Σ_{m=1}^{M} vm(fm) − Σ_{m,k: k=δ(m)} qkm fm],

where zk is the maximal back-pressure at link k. So a distributed algorithm to solve (4.1) is as
follows. Denote by Qkm the actual queue length of flow m at link k. For simplicity, assume that
v'm(0) ≤ V < ∞, ∀m, i.e., the derivative of all utility functions at 0 is bounded by some V < ∞.
Algorithm 3, joint scheduling and congestion control, is defined below.

Definition 4.1 Algorithm 3.


Initially, assume that all queues are empty (i.e., Qkm(0) = 0, ∀k, m), and let qkm(0) =
0, ∀k, m. As before, define the update interval T(j) = tj − tj−1 and t0 = 0. Here we use constant
step sizes and update intervals: α(j) = α, T(j) = T, ∀j. The variables q, f, r are iteratively
updated at times tj, j = 1, 2, . . . . Let q(j), f(j), r(j) be their values set at time tj. Denote by s'km(j)
the empirical average service rate of flow m at link k in period j + 1 (i.e., the time between tj and
tj+1).

• Scheduling: In period j + 1, link k lets its TA be rk(j) = [zk(j)]+ in the CSMA operation,
where zk(j) = max_{m: amk=1} (qkm(j) − q_{down(k,m),m}(j)). (The rationale is that, given z(j), the
u* that maximizes L(u, f; q(j)) over u is the stationary distribution of the CSMA Markov
chain with rk(j) = [zk(j)]+, similar to the proof of Theorem 3.14.) Choose a flow m' ∈
arg max_{m: amk=1} (qkm(j) − q_{down(k,m),m}(j)). When link k gets the opportunity to transmit, (i)
if zk(j) > 0, it serves flow m' (similar to Algorithm 1, the dummy packets transmitted by
link k, if any, are counted in s'_{km'}(j)); (ii) if zk(j) ≤ 0, then it transmits dummy packets. These
dummy packets are not counted, i.e., let s'km(j) = 0, ∀m. Also, they are not put into any actual
queue at the receiver of link k. (A simpler alternative is that link k keeps silent if zk(j) ≤ 0.
That case can be similarly analyzed following the method in Section 4.4.)

• Congestion control: For each flow m, if link k is its source link, the transmitter of link k lets
the flow rate in period j + 1 be fm(j) = arg max_{f̂m∈[0,1]} {β · vm(f̂m) − qkm(j) · f̂m}. (This
maximizes L(u, f; q(j)) over f.)

• The dual variables qkm (maintained by the transmitter of each link) are updated (similar
to a subgradient algorithm). At time tj+1, let qkm(j + 1) = [qkm(j) − α · s'km(j)]+ +
α · s'_{up(k,m),m}(j) if k ≠ δ(m); and qkm(j + 1) = [qkm(j) − α · s'km(j)]+ + α · fm(j) if k =
δ(m). (By doing this, approximately qkm ∝ Qkm.)
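The sketch below (Python; the data layout and names are ours) walks through update rounds of Algorithm 3 for a single flow on a three-link line, with the utility vm(f) = log(f + 0.01) used in Section 4.3, for which the congestion-control step has the closed form fm = β/q − 0.01 clipped to [0, 1]. The CSMA chain itself is not simulated; the observed service rates are supplied as inputs:

```python
import numpy as np

alpha, beta = 0.23, 3.0
links = [0, 1, 2]            # one flow traversing links 0 -> 1 -> 2
q = np.zeros(3)              # q[k]: dual variable of the flow at link k

def downstream_q(k):
    return q[k + 1] if k + 1 < len(links) else 0.0  # q := 0 past the last hop

def one_round(s_emp):
    """s_emp[k]: empirical service rate s'_km(j) observed over one period,
    produced by CSMA run with TA r_k = [q_k - q_down]^+ (not modeled here)."""
    r = np.array([max(q[k] - downstream_q(k), 0.0) for k in links])   # TAs
    f = 1.0 if q[0] <= 0 else min(max(beta / q[0] - 0.01, 0.0), 1.0)  # source rate
    for k in reversed(links):                 # dual updates of Algorithm 3
        inflow = f if k == 0 else s_emp[k - 1]
        q[k] = max(q[k] - alpha * s_emp[k], 0.0) + alpha * inflow
    return r, f

for _ in range(5):
    r, f = one_round(s_emp=np.array([0.4, 0.4, 0.4]))
print("TAs r =", r, " admitted rate f =", round(f, 3), " duals q =", q)
```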

Remark 1: As T → ∞ and α → 0, Algorithm 3 approximates the “ideal” algorithm that solves (4.1),
due to the convergence of the CSMA Markov chain in each period. A bound of the achievable utility
of Algorithm 3, compared to the optimal total utility W̄ defined in (4.4), is given in Section 4.4. The
bound, however, is not very tight since our simulations show good performance without a very large
T or a very small α.
Remark 2: In Section 4.2, we show that by using similar techniques, the adaptive CSMA algorithm
can be combined with optimal routing, anycast or multicast with network coding. So it is a modular
MAC-layer protocol which can work with other protocols in the transport layer and the network
layer.
Remark 3: Coincidentally, the authors of (72) implemented a protocol similar to Algorithm 3 using
802.11e hardware, and it shows superior performance compared to normal 802.11. There, according
to the backpressure, a flow chooses from a discrete set of contention windows, or “CW’s” (where a
smaller CW corresponds to a larger TA). We note, however, that unlike our work, (72) focuses only
on an implementation study, without theoretical analysis; therefore, the potential optimality
of CSMA is not shown in (72). Also, the CW's there are set in a more heuristic way.

4.1.3 APPROACHING THE MAXIMAL UTILITY


We now show that the solution of (4.1) approximately achieves the maximal utility when β is large.
Denote the maximal total utility achievable by W̄ , i.e.,

W̄ := max_{u,s,f} Σ_m vm(fm)    (4.4)

subject to the same constraints as in (4.1). Assume that u = ū when (4.4) is solved. Also, assume
that in the optimal solution of (4.1), f = f̂ and u = û. We have the following bound.

Theorem 4.2 (1/β)-Optimality.
The difference between the total utility Σ_{m=1}^{M} vm(f̂m) resulting from solving (4.1) and the
maximal total utility W̄ is bounded. The bound on the difference decreases as β increases. In particular,

W̄ − (K · log 2)/β ≤ Σ_m vm(f̂m) ≤ W̄.    (4.5)

Proof. Notice that H(u) = − Σ_i ui log(ui), the entropy of the distribution u, is bounded. Indeed,
since there are N ≤ 2^K possible states, one has 0 ≤ H(u) ≤ log N ≤ log 2^K = K log 2.
Since in the optimal solution of problem (4.1), f = f̂ and u = û, we have H(û) +
β Σ_m vm(f̂m) ≥ H(ū) + β W̄. So β[Σ_m vm(f̂m) − W̄] ≥ H(ū) − H(û) ≥ −H(û) ≥ −K · log 2.
Also, clearly W̄ ≥ Σ_{m=1}^{M} vm(f̂m), so (4.5) follows. 2

4.2 EXTENSIONS
Using derivations similar to Section 4.1, our CSMA algorithm can serve as a modular “MAC-
layer scheduling component” in cross-layer optimization, combined with other components in the
transport layer and network layer, usually with queue lengths as the shared information. For example,
in addition to its combination with congestion control (at the transport layer), we demonstrate in
this section its combination with optimal multipath routing, anycast and multicast (at the network
layer). Therefore, this is a joint optimization of the transport layer, network layer and the MAC layer.

4.2.1 ANYCAST
To make the formulation more general, let us consider anycast with multipath routing. (This includes
unicast with multipath routing as a special case.) Assume that there are M flows. Each flow m has
a source δ(m) (with some abuse of notation) which generates data and a set of destinations D(m)
which receive the data. “Anycast” means that it is sufficient for the data to reach any node in the
set D(m). However, there is no specific “path” for each flow. The data that the source generates is
allowed to split and traverse any link before reaching the destinations (i.e., multipath routing). This
allows for better utilization of the network resources by routing the data through less congested parts
of the network. For simplicity, we don’t consider the possibility of physical-layer multicast here, i.e.,
the effect that a node’s transmission can be received by multiple nodes simultaneously. That is, the
transmitter indicates the intended next node in the packet header and the other nodes discard that
packet.
In this case, it is more convenient to use a “node-based” formulation (47; 77). Denote the
number of nodes by J . For each node j , let I (j ) := {k|(k, j ) ∈ L}, where L is the set of links (it is
also the set V in the conflict graph), and let O(j ) := {k|(j, k) ∈ L}. Denote the rate of flow m on
link (j, l) by s^m_{jl}. Then the (approximate) utility maximization problem, similar to (4.1), is

max_{u,s,f}  − Σ_i ui log(ui) + β · Σ_{m=1}^{M} vm(fm)
s.t.  s^m_{jl} ≥ 0, ∀(j, l) ∈ L, ∀m
      fm + Σ_{l∈I(j)} s^m_{lj} ≤ Σ_{l∈O(j)} s^m_{jl}, ∀m, j = δ(m)
      Σ_{l∈I(j)} s^m_{lj} ≤ Σ_{l∈O(j)} s^m_{jl}, ∀m, j ≠ δ(m), j ∉ D(m)
      Σ_i ui · x^i_{(j,l)} ≥ Σ_m s^m_{jl}, ∀(j, l) ∈ L
      ui ≥ 0, Σ_i ui = 1.

Associate a dual variable q^m_j ≥ 0 to the 2nd and 3rd lines of constraints (for each m and
j ∉ D(m)), and define q^m_j = 0 if j ∈ D(m). (Note that there is no flow-conservation constraint for
flow m at each node in D(m).) Then, similar to Section 4.1, a partial Lagrangian is

L(u, s, f; q) = − Σ_i ui log(ui)
              + β · Σ_m vm(fm) − Σ_m q^m_{δ(m)} fm                  (4.6)
              + Σ_{(j,l)∈L, m} [s^m_{jl} · (q^m_j − q^m_l)].

First, fixing u and q, consider maximizing L(u, s, f; q) over s, subject to s^m_{jl} ≥ 0 and Σ_i ui ·
x^i_{(j,l)} ≥ Σ_m s^m_{jl}. For each link (j, l), let the maximal back-pressure be z_{(j,l)} := max_m (q^m_j − q^m_l).
Then clearly, if z_{(j,l)} > 0, a flow m' with q^{m'}_j − q^{m'}_l = z_{(j,l)} should be served (with the whole rate
Σ_i ui · x^i_{(j,l)}). If z_{(j,l)} ≤ 0, then no flow is served. After we plug this solution of s back into (4.6), the rest of
the derivation is the same as in Section 4.1. Therefore, the distributed algorithm is as follows. We
again assume v'm(0) ≤ V < +∞, ∀m.
Initially, assume that all queues are empty, and set q^m_j = 0, ∀j, m. Then iterate as follows.
(Similar to Algorithm 3, the step size is α and the update interval is T. For simplicity, we omit the
time index here.)

• CSMA scheduling and routing: If z_{(j,l)} > 0, link (j, l) lets r_{(j,l)} = z_{(j,l)} in the CSMA
operation. Choose a flow m' with q^{m'}_j − q^{m'}_l = z_{(j,l)}. When it gets the opportunity to
transmit, it serves flow m'. If z_{(j,l)} ≤ 0, then link (j, l) keeps silent. (Note that there is no
replication of packets. A sketch of this per-link decision follows this list.)

• Congestion control: For each flow m, if node j is its source, then it sets fm =
arg max_{f̂m∈[0,1]} {β · vm(f̂m) − q^m_j f̂m}.

• The dual variables q^m_j are updated as follows: q^m_j ← [q^m_j − α Σ_{l∈O(j)} s^m_{jl}]+ + α Σ_{l∈I(j)} s^m_{lj}
if j ≠ δ(m) and j ∉ D(m); and q^m_j ← [q^m_j − α Σ_{l∈O(j)} s^m_{jl}]+ + α(fm + Σ_{l∈I(j)} s^m_{lj}) if
j = δ(m). (By doing this, roughly q^m_j ∝ Q^m_j, where Q^m_j is the corresponding queue length.)
Always let q^m_j = 0 if j ∈ D(m).
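As referenced in the first bullet above, here is a minimal sketch (names ours) of the per-link decision: serve the flow with the largest differential backlog and use that backpressure as the link's TA:

```python
def link_decision(q, j, l, flows, dest):
    """q[(node, m)]: dual variable of flow m at a node; q = 0 at destinations.
    Returns the TA r_(j,l) and the flow to serve on link (j, l), if any."""
    def qv(node, m):
        return 0.0 if node in dest[m] else q[(node, m)]
    bp = {m: qv(j, m) - qv(l, m) for m in flows}      # backpressure per flow
    m_star = max(bp, key=bp.get)
    return (bp[m_star], m_star) if bp[m_star] > 0 else (0.0, None)

q = {("a", 1): 2.0, ("b", 1): 0.5, ("a", 2): 1.0, ("b", 2): 1.4}
dest = {1: {"c"}, 2: {"c"}}
print(link_decision(q, "a", "b", flows=[1, 2], dest=dest))   # -> (1.5, 1)
```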

Furthermore, the above algorithm can be readily extended to incorporate channel selection in multi-
channel wireless networks, with each “link” defined by a triplet (j, l; c), which refers to the logical
link from node j to l on channel c. In this scenario, the conflict graph is defined on the set of links
(j, l; c).

4.2.2 MULTICAST WITH NETWORK CODING


Assume that there are M multicast sessions. Each session m has a source δ(m) which generates data
and a set of destinations D(m) which receive the data. Different from “anycast,” here the data must
reach all nodes in the set D(m). There are two possible designs for multicast. (1) Fixed multicast tree,
where the routes of each multicast session are fixed. (2) Multicast combined with multipath routing
and network coding. Case (1) is straightforward, but the routing may not be optimal. In case (2),
(26) demonstrates an algorithm that achieves the optimal utility, which however, requires centralized
Maximal-Weight scheduling at the MAC layer. In this section, we show that CSMA scheduling
can be combined with it, leading to a fully distributed algorithm. To facilitate network coding, we
let all the packets have the same size (Note that the product-form distribution is insensitive to the
distribution of the transmission time, i.e., packet size).
According to the theory of network coding (1), a certain flow rate for a multicast session can
be supported if and only if it can be supported separately for each destination node. Let s^{mp}_{jl} be the
“information flow rate” on link (j, l) in multicast session m destined for node p ∈ D(m), and s^m_{jl}
be the “capacity” for session m on link (j, l). The above condition is that s^{mp}_{jl} ≤ s^m_{jl}, ∀p ∈ D(m).
Then, the approximate utility maximization problem is

max_{u,s,f}  H(u) + β · Σ_{m=1}^{M} vm(fm)
s.t.  s^{mp}_{jl} ≥ 0, ∀(j, l) ∈ L, ∀m, ∀p ∈ D(m)
      fm + Σ_{l∈I(j)} s^{mp}_{lj} ≤ Σ_{l∈O(j)} s^{mp}_{jl}, ∀m, j = δ(m), p ∈ D(m)
      Σ_{l∈I(j)} s^{mp}_{lj} ≤ Σ_{l∈O(j)} s^{mp}_{jl}, ∀m, p ∈ D(m), j ≠ δ(m), j ≠ p
      s^{mp}_{jl} ≤ s^m_{jl}, ∀p ∈ D(m), ∀(j, l) ∈ L
      Σ_i ui · x^i_{(j,l)} ≥ Σ_m s^m_{jl}, ∀(j, l) ∈ L
      ui ≥ 0, Σ_i ui = 1.
Associate a dual variable q^{mp}_j ≥ 0 to the 2nd and 3rd lines of constraints (for each m, p ∈
D(m), and j ≠ p), and define q^{mp}_j = 0 if j = p. Then a partial Lagrangian is

L(u, s, f; q) = H(u)
              + β · Σ_m vm(fm) − Σ_m ( Σ_{p∈D(m)} q^{mp}_{δ(m)} ) fm        (4.7)
              + Σ_{(j,l)∈L, m, p∈D(m)} s^{mp}_{jl} · (q^{mp}_j − q^{mp}_l).
We first optimize L(u, s, f; q) over {s^{mp}_{jl}}, subject to 0 ≤ s^{mp}_{jl} ≤ s^m_{jl}. A solution is as follows:
s^{mp}_{jl} = 0 for all p satisfying q^{mp}_j − q^{mp}_l ≤ 0, and s^{mp}_{jl} = s^m_{jl} for all p satisfying q^{mp}_j − q^{mp}_l > 0.
Define the “back-pressure” of session m on link (j, l) as W^m_{jl} := Σ_{p∈D(m)} (q^{mp}_j − q^{mp}_l)+. By plugging
the above solution into (4.7), we have

L(u, s, f; q) = H(u)
              + β · Σ_m vm(fm) − Σ_m ( Σ_{p∈D(m)} q^{mp}_{δ(m)} ) fm        (4.8)
              + Σ_{(j,l)∈L, m} s^m_{jl} W^m_{jl}.
 
Now we optimize this expression over {s^m_{jl}}, subject to Σ_i ui · x^i_{(j,l)} ≥ Σ_m s^m_{jl}. One can find
that the rest is similar to previous derivations. To avoid repetition, we directly write down the
algorithm. Assume v'm(0) ≤ V < +∞, ∀m.
Initially, assume that all queues are empty, and set q^{mp}_j = 0, ∀j, m, p. Then iterate:

• CSMA scheduling, routing, and network coding: Link (j, l) computes the maximal back-
pressure z_{(j,l)} := max_m W^m_{jl}. If z_{(j,l)} > 0, then let r_{(j,l)} = z_{(j,l)} in the CSMA operation.
Choose a session m' with W^{m'}_{jl} = z_{(j,l)}. When the link gets the opportunity to transmit, it
serves session m'. To do so, node j performs a random linear combination¹ of the head-of-line
packets from the queues of session m' with destinations p ∈ D(m') which satisfy q^{m'p}_j − q^{m'p}_l > 0,
and transmits the coded packet (similar to (26)). The coded packet, after being received by node l,
is replicated and put into the corresponding queues of session m' at node l (those with destination
p ∈ D(m') such that q^{m'p}_j − q^{m'p}_l > 0). The destinations can eventually decode the source
packets (26). If z_{(j,l)} = 0, then link (j, l) keeps silent.

¹ We briefly explain how to perform a “random linear combination” of these packets; for more details, please refer to (26). (Note
that our main focus here is to show how to combine CSMA scheduling with other network protocols, rather than network coding
itself.) Initially, each packet generated by the source in each session is associated with an ID. Assume that each packet is composed
of many “blocks”, where each block has γ bits, so each block can be viewed as a number in the finite field F_{2^γ}, which has 2^γ
elements. For each packet P to be combined, randomly choose a coefficient aP ∈ F_{2^γ}. Denote the i-th block of packet P by P(i).
Then the corresponding block of the coded packet Z is computed as Z(i) = Σ_P aP · P(i), where the multiplication and summation
are in the field F_{2^γ}, and the summation is over all the packets being combined. Clearly, each packet in the network is a linear
combination of some source packets. The IDs of these source packets and the corresponding coefficients are included in the packet
header and are updated after each linear combination along the path (so that the destinations can decode the source packets).
(A toy sketch of such a combination appears after the algorithm below.)

• Congestion control: For each flow m, if node j is its source, then it sets fm =
arg max_{f̂m∈[0,1]} {β · vm(f̂m) − (Σ_{p∈D(m)} q^{mp}_{δ(m)}) f̂m}.

• The dual variables q^{mp}_j are updated as follows: q^{mp}_j ← [q^{mp}_j − α Σ_{l∈O(j)} s^{mp}_{jl}]+ +
α Σ_{l∈I(j)} s^{mp}_{lj} if j ≠ δ(m) and j ≠ p, where p ∈ D(m); and q^{mp}_j ← [q^{mp}_j −
α Σ_{l∈O(j)} s^{mp}_{jl}]+ + α(fm + Σ_{l∈I(j)} s^{mp}_{lj}) if j = δ(m). (Note that each packet generated
by the source j = δ(m) is replicated and enters the queues at the source for all destinations
of session m.) By doing this, roughly q^{mp}_j ∝ Q^{mp}_j, where Q^{mp}_j is the corresponding queue
length. Always let q^{mp}_j = 0 if j = p, where p ∈ D(m).

Note that both algorithms in Section 4.2 can be analyzed using the approach in Section 4.4 for
Algorithm 3.

4.3 SIMULATIONS
Figure 4.1 shows the network topology, where each circle represents a node. The nodes are arranged
in a grid for convenience, and the distance between two adjacent nodes (horizontally or vertically) is
1. Assume that the transmission range is 1, so that a link can only be formed by two adjacent nodes.
Assume that two links cannot transmit simultaneously if there are two nodes, one in each
link, within a distance of 1.1 of each other. (In IEEE 802.11, for example, DATA and ACK packets are
transmitted in opposite directions. This model accounts for the interference between the two links in
both directions and is equivalent to the “two-hop interference model” in this network.) The paths
of 3 multi-hop flows are plotted. The utility function of each flow is vm (fm ) = log(fm + 0.01).
The weighting factor is β = 3. (Note that the input rates are adjusted by the congestion control
algorithm instead of being specified as in the last subsection.)
Figure 4.2 shows the evolution of the flow rates, using Algorithm 3 with T = 5ms and
α = 0.23. We see that they become relatively constant after an initial convergence. By directly
solving (4.4) centrally, we find that the theoretical optimal flow rates for the three flows are 0.11,
0.134 and 0.134 (data unit/ms), very close to the simulation results. The queue lengths are also stable
(in fact, uniformly bounded as proved in Section 4.4).

as P (i). Then the corresponding block in the code packet Z is computed as Z(i) = P aP P (i), where the multiplication and
summation is on the field F2γ , and the summation is over all the packets to be combined.
Clearly, each packet in the network is a linear combination of some source packets. The ID’s of these source packets and the
corresponding coefficients are included in the packet header, and are updated after each linear combination along the path (such
that the destinations can decode the source packets).

[Figure 4.1: Network and flow directions. A grid of nodes with the paths of Flow 1, Flow 2, and Flow 3 marked.]

4.4 PROPERTIES OF ALGORITHM 3


Section 4.4.1 derives an upper bound on the backpressure in the network. Section 4.4.2 uses that
technical result to prove that Algorithm 3 is (1/β)-optimal. That is, the utility of the flows that
it achieves differs from the maximum possible utility at most by a constant divided by β. Finally,
Section 4.4.3 proves bounds on the queue lengths.

4.4.1 BOUND ON BACKPRESSURE


Define bkm(j) := qkm(j) − q_{down(k,m),m}(j), the backpressure at time step j for link k and flow m.

Lemma 4.3 Bound on Backpressures


Assume that the utility function vm(fm) (strictly concave) satisfies v'm(0) ≤ V < ∞, ∀m. Denote
by L the largest number of hops of a flow in the network. Then, in Algorithm 3, bkm(j) ≤ β · V + α +
2α · (L − 1), ∀k, m, at every time step j.

Proof. According to Algorithm 3, the source of flow m solves fm(j) = arg max_{f̂m∈[0,1]} {β ·
vm(f̂m) − qδ(m),m(j) · f̂m}. It is easy to see that if qδ(m),m(j) ≥ β · V, then fm(j) = 0, i.e.,
the source stops sending data. Thus, qδ(m),m (j + 1) ≤ qδ(m),m (j ). If qδ(m),m (j ) < β · V , then
qδ(m),m (j + 1) ≤ qδ(m),m (j ) + α < β · V + α. Since initially qkm (0) = 0, ∀k, m, by induction, we
have
qδ(m),m (j ) ≤ β · V + α, ∀j, m. (4.9)
In Algorithm 3, no matter whether flow m has the maximal back-pressure at link k, the
actual average service rate s'km(j) = 0 if bkm(j) ≤ 0. That is, s'km(j) > 0 only if bkm(j) > 0. Since
[Figure: flow rates (data units/ms) of Flows 1–3 versus time (ms).]
Figure 4.2: Flow rates with joint scheduling and congestion control.


Since s′_{km}(j) ≤ 1, by item 3 of Algorithm 3, q_{down(k,m),m}(j+1) ≤ q_{down(k,m),m}(j) + α and q_{km}(j+1) ≥ q_{km}(j) − α. Then, if b_{km}(j) > 0, we have b_{km}(j+1) ≥ b_{km}(j) − 2α > −2α. If b_{km}(j) ≤ 0, then b_{km}(j+1) ≥ b_{km}(j). Since b_{km}(0) = 0, by induction we have

b_{km}(j) ≥ −2α, ∀j, k, m.   (4.10)

Since the backpressures of flow m telescope along its route, Σ_{k: a_{mk}=1} b_{km}(j) = q_{δ(m),m}(j). Combined with (4.9) and (4.10), this gives b_{km}(j) ≤ β·V + α + 2α·(L − 1). ∎

4.4.2 TOTAL UTILITY


In this section, we show that the difference between the optimal utility and that achieved by Algorithm 3 is bounded. The result is in the following theorem.

Theorem 4.4 Algorithm 3 is (1/β)-Optimal. One has

lim inf_{J→∞} Σ_m v_m(f̄_m(J)) ≥ W̄ − {[K·log(2) + K·C·C_1/T] + 5α·K/2} / β.
Proof. Regard each period (with length T) as a "time slot" in (57). By Lemma 4.3, b_{km}(j) ≤ β·V + α + 2α·(L−1), ∀k, m, j. Since r_k(j) = [max_m b_{km}(j)]^+, we have 0 ≤ r_k(j) ≤ C := β·V + α + 2α·(L−1). Thus, the mixing time of the CSMA Markov chain in any period is bounded (33). So

|E_j[s′_k(j)] − s_k(r(j))| ≤ C_1/T   (4.11)

where the constant C_1 depends on C and K (33), and E_j(·) denotes the expectation conditioned on the values of all random variables up to time t_j.
Since u*_i := p_i(r(j)), ∀i, maximizes H(u) + Σ_k [r_k(j)·Σ_i (x^i_k·u_i)] (see Proposition 3.14), similarly to the proof of Proposition 4.2 we have

Σ_k [r_k(j)·Σ_i (x^i_k·u*_i)] = Σ_k [r_k(j)·s_k(r(j))] ≥ max_{μ∈C̄} Σ_k [r_k(j)·μ_k] − K·log(2)

where C̄ is the set of feasible service rates (including C and its boundary).
By this inequality and (4.11),

Σ_k {r_k(j)·E_j[s′_k(j)]} ≥ max_{μ∈C̄} Σ_k [r_k(j)·μ_k] − K·log(2) − K·C·C_1/T.   (4.12)

Define q̃_{km}(j) := q_{km}(j)/α and r̃_k(j) := r_k(j)/α. Then, according to Algorithm 3, q̃_{km}(j) evolves as the "backlog" in (57), i.e.,

q̃_{km}(j+1) = [q̃_{km}(j) − s′_{km}(j)]^+ + f_m(j)

if link k is the source link of flow m; and otherwise,

q̃_{km}(j+1) = [q̃_{km}(j) − s′_{km}(j)]^+ + s′_{up(k,m),m}(j).

Also, r̃_k(j) is equivalent to the maximal backpressure in (57), defined as [max_m {q̃_{km}(j) − q̃_{down(k,m),m}(j)}]^+. Finally,

f_m(j) = arg max_{f̂_m∈[0,1]} {β·v_m(f̂_m) − q_{δ(m),m}(j)·f̂_m}
       = arg max_{f̂_m∈[0,1]} {(β/α)·v_m(f̂_m) − q̃_{δ(m),m}(j)·f̂_m}.   (4.13)

Using (4.12), clearly

Σ_k {r̃_k(j)·E_j[s′_k(j)]} ≥ max_{μ∈C̄} Σ_k [r̃_k(j)·μ_k] − [K·log(2) + K·C·C_1/T]/α.   (4.14)
Next, we need to use Corollary 1 in (57), which is rephrased below for completeness.
Corollary 1 in (57). If a resource allocation policy chooses s_k(j) such that

Σ_k {r̃_k(j)·E_j[s_k(j)]} ≥ max_{μ∈C̄} Σ_k [r̃_k(j)·μ_k] − D,   (4.15)

(that is, if for each j the policy achieves a "weight" within D of the maximal weight), and it chooses f_m(j) such that

f_m(j) = arg max_{f̂_m∈[0,1]} {(V/2)·v_m(f̂_m) − q̃_{δ(m),m}(j)·f̂_m},   (4.16)

then

lim inf_{J→∞} Σ_m v_m(f̄_m(J)) ≥ W̄ − (2D + BK)/V

where f̄_m(J) := Σ_{j=0}^{J−1} E[f_m(j)]/J is the expected average rate of flow m up to the J'th period, W̄ is the maximal total utility that can be achieved, and

B = (1/K) Σ_{k=1}^{K} [(R^max_k + μ^in_{max,k})² + (μ^out_{max,k})²]

where R^max_k is the maximal flow input rate at link k, and μ^in_{max,k} and μ^out_{max,k} are the maximal rates at which link k can receive and transmit.
With Algorithm 3, we have R^max_k = μ^in_{max,k} = μ^out_{max,k} = 1, so B = 5. Also, by comparing (4.13)–(4.16), we have V = 2β/α and D = [K·log(2) + K·C·C_1/T]/α. Using the above corollary, we have

lim inf_{J→∞} Σ_m v_m(f̄_m(J)) ≥ W̄ − {2[K·log(2) + K·C·C_1/T]/α + 5K} / (2β/α)
                              = W̄ − [K·log(2) + K·C·C_1/T + 5αK/2] / β.   (4.17)
β

As expected, when T → ∞ and α → 0, this bound matches the bound in Proposition 4.2. Also, as β → ∞, α → 0, and T → ∞ in a proper way (since C and C_1 depend on β), lim inf_{J→∞} Σ_m v_m(f̄_m(J)) → W̄.

4.4.3 QUEUE LENGTHS


This section provides an upper bound on the queue lengths. One has the following result.
Theorem 4.5 Bound on Queue Lengths.
One has

Q_{km}(j) ≤ (T/α)·[β·V + (2L − 1)α].

Proof. By (4.9) and (4.10), we have

q_{km}(j) ≤ β·V + α + 2(L−1)α = β·V + (2L−1)α, ∀k, m, j.

Also, in view of the dynamics of q_{km}(j) in Algorithm 3, the actual queue lengths satisfy Q_{km}(j) ≤ (T/α)·q_{km}(j), ∀k, m, j. Therefore,

Q_{km}(j) ≤ (T/α)·[β·V + (2L−1)α].   (4.18)

So all queue lengths are uniformly bounded. The bound increases with T and β, and decreases with α. ∎

The above bounds (4.17) and (4.18), however, are not very tight. Our simulations show near-optimal total utility without a very large β or T, or a very small α, which leads to moderate queue lengths.
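To give a sense of the magnitudes involved, the short computation below plugs the simulation parameters of Section 4.3 into (4.18). Here V = v′_m(0) = 1/0.01 = 100 follows from the utility log(f_m + 0.01); the hop count L of the longest flow is not stated explicitly for Fig. 4.1, so L = 4 is a hypothetical value used only for illustration:

```python
# Bound (4.18) with the Section 4.3 parameters (L = 4 is hypothetical).
beta, alpha, T, V, L = 3.0, 0.23, 5.0, 100.0, 4

q_bound = beta * V + (2 * L - 1) * alpha   # bound on the virtual queues q_km
Q_bound = (T / alpha) * q_bound            # bound (4.18) on the real queues Q_km
print(q_bound, Q_bound)                    # ~301.6 and ~6557: loose, as noted
```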

4.5 SUMMARY
In this chapter, we have developed fully distributed cross-layer algorithms for utility maximization in
wireless networks. First, we combined admission control (at the transport layer) with the A-CSMA
scheduling algorithm (at the MAC layer) to approach the maximal utility (Sections 4.1, 4.3, and 4.4).
Since the flows can traverse multiple hops, the transmission aggressiveness of each link is based
on the maximal back-pressure instead of the queue length as in the last chapter (which focused on
one-hop flows).
Then we further showed that A-CSMA is a modular MAC-layer component that can work
seamlessly with other protocols in the network layer and transport layer (Section 4.2). For example,
in addition to admission control, it was further combined with optimal routing, anycast and multicast
with network coding.
A key to the design of these algorithms is a modification of the usual utility maximization
problem. In particular, instead of maximizing the utility, we maximize the sum of an entropy and
a weighted utility. By doing this, we can not only obtain the suitable CSMA parameters given the
flow rates (as in the last chapter), but we can also control the flow rates to arbitrarily approximate
the maximal utility (by using a large weight on the utility).

4.6 RELATED WORKS


The central idea of maximizing the sum of the user utilities as an interpretation of TCP congestion control is due to (41); see also (49; 54). Combining this objective with scheduling appears in (56; 20), which showed that solving a utility maximization problem naturally leads to a simple congestion control algorithm at the transport layer and maximal-weight scheduling (MWS) at the MAC layer.
Unfortunately, as mentioned in Section 3.15.1, implementing MWS is sometimes not prac-
tical in distributed networks. This motivated the study of combining imperfect scheduling with
congestion control: Reference (46) investigated the impact of imperfect scheduling on network
utility maximization.
Related to this area, there is research on utility maximization given a certain MAC-layer protocol, for example (44) and (21), which considered the slotted-ALOHA random access protocol at the MAC layer. Due to the inherent inefficiency of slotted ALOHA, however, these proposals cannot achieve the maximum utility that is achievable with perfect scheduling.

CHAPTER 5

Distributed CSMA Scheduling


with Collisions
5.1 INTRODUCTION

We have shown in Chapter 3 that an adaptive CSMA (Carrier Sense Multiple Access) distributed algorithm (Algorithm 1) can achieve the maximal throughput in a general class of wireless networks. However, that algorithm relies on the idealized assumption that the sensing time is negligible, so that there are no collisions. In this chapter, we study more practical CSMA-based scheduling algorithms with collisions. First, in Section 5.2, we provide a discrete-time model of this CSMA protocol and give an explicit throughput formula, which has a simple product form due to the quasi-reversibility structure of the model. Second, in Section 5.3, we show that Algorithm 1 in Chapter 3 can be extended to approach throughput optimality in this case. Finally, sufficient conditions are given to ensure the convergence and stability of the proposed algorithm.
To combine the scheduling algorithm (with collisions) with congestion control, we follow an
approach similar to the one we used in Chapter 4. The details of the combination are given in (32).
To achieve throughput-optimality even with collisions, we need to limit the impact of colli-
sions. Our basic idea is to use a protocol similar to the RTS/CTS mode of IEEE 802.11 where we
let each link fix its transmission probability but adjust its transmission time (or length) to meet the
demand. In the absence of hidden nodes, collisions only occur among the small RTS packets but
not the data packets. Also, the collision probability is limited since we fix the transmission probabil-
ities. These two key factors combined ensure a limited impact of collisions. When the transmission
lengths are large enough, the protocol intuitively approximates the idealized-CSMA.
However, to precisely model and compute the service rates in the CSMA protocol with colli-
sions and to prove the throughput-optimality of our algorithms we need to handle two difficulties.
First, the Markov chain used to model the CSMA protocol is no longer time-reversible. Second, the
resulting stationary distribution, although in a product-form, is no longer a Markov Random Field.
Finally, it is worth noting that an interesting by-product of the general CSMA model developed in this chapter is the unification of several known models for slotted ALOHA, wireless LANs (as in Bianchi (4)), and the idealized-CSMA model. Indeed, we believe that the general CSMA model captures some essence of random access algorithms.

5.2 CSMA/CA-BASED SCHEDULING WITH COLLISIONS


In this section, we introduce the model and we derive its invariant distribution. The model is
essentially a discrete-time CSMA protocol with RTS/CTS. The Markov model turns out to be
quasi-reversible. That is, it almost looks the same in reversed time, in a precise sense.

5.2.1 MODEL
We now describe the basic CSMA/CA protocol with fixed transmission probabilities, which suffices
for our later development. Let σ̃ be the duration of each minislot. (In IEEE 802.11a, for example,
σ̃ = 9μs.) In the following, we will simply use slot to refer to the minislot.
The conflicting relationships among the links are represented by a conflict graph, as defined in
Chapter 1. In particular, it assumes that the conflict relationship among any two links is symmetric.
Assume that all links are saturated, i.e., always have packets to transmit. In each slot, if the transmitter
of link i is not already transmitting and if the medium is idle, the transmitter of link i starts
transmitting with probability pi (also denote qi := 1 − pi ). If at a certain slot, link i did not choose
to transmit but a conflicting link starts transmitting, then link i keeps silent until that transmission
ends. If conflicting links start transmitting at the same slot, then a collision happens, and assume
that all the involved links lose their packets.
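The model just described is easy to simulate directly at the minislot level. The following Python sketch (our illustration, not the authors' simulator) implements exactly these rules: an idle link whose conflicting links are all idle attempts with probability p_i, simultaneous attempts by conflicting links collide, and a successful transmission occupies the medium for its full length. The values p_i = 1/16 and a collision length of 10 slots mirror Section 5.5; the transmission length 200 is hypothetical:

```python
import random

def simulate_csma_ca(conflict, p, gamma, tx_len, n_slots, seed=0):
    """Minislot simulation of the basic CSMA/CA protocol above.
    conflict[i]: set of links conflicting with link i; p[i]: per-slot
    attempt probability; a collision lasts gamma slots; a successful
    transmission of link i lasts tx_len[i] slots (overhead + payload)."""
    rng = random.Random(seed)
    K = len(p)
    remaining = [0] * K        # slots left in the current transmission
    success = [0] * K          # slots spent in successful transmissions
    for _ in range(n_slots):
        busy = [remaining[i] > 0 for i in range(K)]
        # idle links with an idle medium attempt with probability p[i]
        starters = {i for i in range(K)
                    if not busy[i]
                    and not any(busy[j] for j in conflict[i])
                    and rng.random() < p[i]}
        for i in starters:
            if starters & conflict[i]:
                remaining[i] = gamma          # probe packets collide
            else:
                remaining[i] = tx_len[i]      # successful transmission
                success[i] += tx_len[i]       # counted in full, approximate
        remaining = [max(0, r - 1) for r in remaining]
    return [s / n_slots for s in success]    # fraction of successful air time

# 3-link chain (as in Fig. 2.1, 0-indexed): the middle link conflicts with both.
conflict = {0: {1}, 1: {0, 2}, 2: {1}}
print(simulate_csma_ca(conflict, [1/16] * 3, gamma=10,
                       tx_len=[200] * 3, n_slots=500_000))
```

As expected, the two outer links, which do not conflict with each other, obtain a larger fraction of successful air time than the middle link.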
There are some limitations in the above model. First, we have assumed that the conflicting relationships between links are symmetric, which does not always hold. Consider the example in Fig. 5.1: if link 1 and link 2 start transmitting in the same slot, link 1's packet is corrupted, but link 2's packet could be successfully received (since the interference from link 1's transmission is weak). Note that this kind of asymmetry does not occur in the idealized-CSMA model, since there are no collisions there. Second, we have implicitly assumed that the network does not have hidden nodes, so that all conflicting links can hear each other. (For more discussion of the hidden-node problem and possible ways to address it, please refer to (28) and its references.) The consideration of asymmetry and hidden nodes would significantly complicate the analysis and is an interesting direction for future research.


Figure 5.1: An example of asymmetric interference.

Each link transmits a short probe packet of length γ (similar to the RTS packet in 802.11) before the data is transmitted. (All "lengths" here are measured in numbers of slots and are assumed to be integers.) Using such a probe increases the overhead of successful transmissions, but it can avoid collisions of long data packets. When a collision happens, only the probe packets collide, so each collision lasts precisely γ slots. Assume that a successful transmission of link i lasts τ_i slots, which includes a constant overhead τ′ (composed of RTS, CTS, ACK, etc.) and the data payload τ^p_i, which is a random variable. Clearly, τ_i ≥ τ′. Let the p.m.f. (probability mass function) of τ_i be

Pr{τ_i = b} = P_i(b), for b = 1, 2, 3, …   (5.1)

and assume that the p.m.f. has a finite support, i.e., P_i(b) = 0, ∀b > b_max > 0. Then the mean of τ_i is

T_i := E(τ_i) = Σ_{b=1}^{b_max} b·P_i(b).   (5.2)
Fig. 5.2 illustrates the timeline of the 3-link network in Fig. 2.1, where links 1 and 2 conflict, and links 2 and 3 conflict.
We note a subtle point in our modeling. In IEEE 802.11, a link can attempt to start a transmission only after it has sensed the medium as idle for a constant time (called DIFS, or "DCF Inter-Frame Space"). To take this into account, DIFS is included in the packet transmission length τ_i and in the collision length γ. In particular, for a successful transmission of link i, DIFS is included in the constant overhead τ′. Although DIFS, as part of τ′, actually comes after the payload, in Fig. 5.2 we plot τ′ before the payload. This is for convenience and does not affect our results. So, under this model, a link can attempt to start a transmission immediately after the transmissions of its conflicting links end.
The above model possesses a quasi-reversibility property that will lead to a simple throughput
formula. Our model, in Fig. 5.2, reversed in time, follows the same protocol as described above, except
for the order of the overhead and the payload, which are reversed. A key reason for this property
is that the collisions start and finish at the same time. (This point will be made more precise in
Section 5.6.)

[Figure: minislot timelines of Links 1, 2, and 3, showing a collision of length γ between conflicting links, and successful transmissions of lengths T_1, T_2, T_3, each beginning with the overhead τ′.]

Figure 5.2: Timeline in the basic model (in this figure, τ_i = T_i, i = 1, 2, 3, are constants).

5.2.2 NOTATION
Let the “on-off state” be x ∈ {0, 1}K where xk , the k-th element of x, is such that xk = 1 if link k is
active (transmitting) in state x, and xk = 0 otherwise. Thus, x is a vector indicating which links are
active in a given slot. Let G(x) be the subgraph of G after removing all vertices (each representing a
link) with state 0 (i.e., any link j with xj = 0) and their associated edges. In general, G(x) is composed
of a number of connected components (simply called “components”) Cm (x), m = 1, 2, . . . , M(x)
(where each component is a set of links, and M(x) is the total number of components in G(x)). If a
component Cm (x) has only one active link (i.e., |Cm (x)| = 1), then this link is having a successful
transmission; if |Cm (x)| > 1, then all the links in the component are experiencing a collision. Let
the set of “successful” links in state x be S(x) := {k|k ∈ Cm (x) with |Cm (x)| = 1}, and the set
of links that are experiencing collisions be φ(x). Also, define the “collision number” h(x) as the
number of components in G(x) with size larger than 1. Fig. 5.3 shows an example. Note that the
transmissions in a collision component Cm (x) are “synchronized”, i.e., the links in Cm (x) must have
started transmitting in the same slot, and they will end transmitting in the same slot after γ slots
(the length of the probe packets).

Figure 5.3: An example conflict graph (each square represents a link). In this on-off state x, links 1, 2,
5 are active. So S(x) = {5}, φ(x) = {1, 2}, h(x) = 1.
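These definitions translate directly into code. The Python sketch below decomposes G(x) into its connected components by a graph search and returns S(x), φ(x), and h(x). The conflict graph used in the example is a hypothetical reading of Fig. 5.3, chosen so that the active links reproduce the caption:

```python
def on_state_decomposition(conflict, x):
    """Decompose G(x) into components and classify links as in Section 5.2.2.
    conflict[i]: set of links conflicting with link i; x[i]: on-off state.
    Returns (S, phi, h): successful links, colliding links, collision number."""
    active = {i for i in x if x[i] == 1}
    seen, comps = set(), []
    for i in active:
        if i in seen:
            continue
        comp, stack = set(), [i]          # search over active conflicting links
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(v for v in conflict[u] if v in active)
        seen |= comp
        comps.append(comp)
    S = {i for c in comps if len(c) == 1 for i in c}
    phi = {i for c in comps if len(c) > 1 for i in c}
    h = sum(1 for c in comps if len(c) > 1)
    return S, phi, h

# Hypothetical edge set consistent with Fig. 5.3: links 1 and 2 are adjacent,
# and link 5 conflicts only with the inactive links 4 and 6.
conflict = {1: {2, 4}, 2: {1, 3}, 3: {2}, 4: {1, 5}, 5: {4, 6}, 6: {5}}
x = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0}
print(on_state_decomposition(conflict, x))   # ({5}, {1, 2}, 1), as in the caption
```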

5.2.3 COMPUTATION OF THE SERVICE RATES


In order to compute the service rates of all the links under the above CSMA protocol when all the
links are saturated, we first define the underlying discrete-time Markov chain, which we call the
CSMA/CA Markov chain.
The state of the Markov chain includes the total lengths and the durations to-date of the transmissions in progress. Specifically, define the state

w := {x, ((b_k, a_k), ∀k : x_k = 1)}   (5.3)

where b_k is the total length of the current packet that link k is transmitting, and a_k is the duration to-date of the transmission in progress.
For example, in Fig. 5.4, the states w and w′ are

w = {x = (1, 0, 1)ᵀ, (b_1 = 11, a_1 = 11), (b_3 = 10, a_3 = 7)}   (5.4)


and

w′ = {x = (0, 0, 1)ᵀ, (b_3 = 10, a_3 = 8)}.   (5.5)

For convenience we also use the following notation. Let x(w) be the on-off state in w. In
state w, if link k is off, denote wk = 0; if link k is on, let wk = 1 and denote by bk (w), ak (w) the
corresponding bk , ak .

[Figure: minislot timelines of Links 1–3, with b_k marking the total transmission lengths, the states w and w′ marked on the time axis, and the time axis of the reversed process running in the opposite direction.]
Figure 5.4: Example of the CSMA/CA Markov chain

Note that in any state w as defined in (5.3), we have:

(I) 1 ≤ a_k ≤ b_k, ∀k : x_k = 1.
(II) P_k(b_k) > 0, ∀k ∈ S(x).
(III) If k ∈ φ(x), then b_k = γ and a_k ∈ {1, 2, …, γ}.

(Recall that S(x) is the set of successful links that transmit without collision, and φ(x) is the set of links involved in a collision.)
An important observation here is that the transmissions in a collision component C_m(x) are "synchronized", i.e., the links in C_m(x) must have started transmitting in the same slot, and they will end transmitting in the same slot, so all links in the component have the same duration to-date. Indeed, any two links i and j in this component with an edge between them must have started transmitting at the same time: otherwise, if i started earlier, j would not transmit, since it already hears i's transmission, and vice versa. By induction, all links in the component must have started transmitting at the same time.
Accordingly, a_k = a^(m) for any k ∈ C_m(x) with |C_m(x)| > 1, where a^(m) denotes the common duration to-date in the component C_m(x).
We say that a state w is valid iff it satisfies (I), (II), and (III) above.
Since the transmission lengths are always bounded by b_max by assumption, we have b_k ≤ b_max; therefore, the Markov chain has a finite number of states. The Markov chain is irreducible and ergodic.
As we will show in Section 5.6.1, a nice property of this Markov chain is its quasi-reversibility, and its stationary distribution has a simple product form. From that invariant distribution, one can derive the probability p(x) of any on-off state x. The result is given in the next theorem.

Theorem 5.1 Invariant Distribution of CSMA/CA On-Off States.

Under the stationary distribution, the probability p(x) of x ∈ {0, 1}^K is

p(x) = (1/E) γ^{h(x)} (∏_{k∈S(x)} T_k) ∏_{i:x_i=0} (1 − p_i) ∏_{j:x_j=1} p_j
     = (1/E) γ^{h(x)} (∏_{k∈S(x)} T_k) ∏_{i=1}^{K} p_i^{x_i} q_i^{1−x_i}   (5.6)

where q_i := 1 − p_i, T_i is the mean transmission length of link i (as defined in (5.2)), and E is a normalizing term such that Σ_{x∈{0,1}^K} p(x) = 1.²

The proof is given in Section 5.6.1.³


Remark: Note that in x, some links can be in a collision state, just as in IEEE 802.11. This is reflected in the γ^{h(x)} term in (5.6). Expression (5.6) differs from the idealized-CSMA case in (3.1) and from the stationary distribution in the data phase of the protocol proposed in (59).
Now we re-parametrize T_k by a variable r_k. Let T_k := τ′ + T_0·exp(r_k), where τ′, as we defined, is the overhead of a successful transmission (e.g., the RTS, CTS, and ACK packets), and T^p_k := T_0·exp(r_k) is the mean length of the payload. Here, T_0 > 0 is a constant "reference payload length". Let r be the vector of the r_k's. By Theorem 5.1, the probability of x (with a given r) is

p(x; r) = (1/E(r)) g(x) · ∏_{k∈S(x)} (τ′ + T_0·exp(r_k))   (5.7)

where g(x) = γ^{h(x)} ∏_{i=1}^{K} p_i^{x_i} q_i^{1−x_i} does not depend on r, and the normalizing term is

E(r) = Σ_{x′∈{0,1}^K} [g(x′) · ∏_{k∈S(x′)} (τ′ + T_0·exp(r_k))].   (5.8)

²In this chapter, several kinds of "states" are defined. With a slight abuse of notation, we often use p(·) to denote the probability of some "state" under the stationary distribution of the CSMA/CA Markov chain. This does not cause confusion, since the meaning of p(·) is clear from its argument.
³In (6), a similar model of a CSMA/CA network is formulated by analogy to a loss network (39). However, since (6) studied the case where the links are unsaturated, an explicit expression for the stationary distribution was difficult to obtain.
Then, the probability that link k is transmitting a payload in a given slot is

s_k(r) = [T_0·exp(r_k) / (τ′ + T_0·exp(r_k))] · Σ_{x: k∈S(x)} p(x; r).   (5.9)

Recall that the capacity of each link is 1. Also, it is easy to show that the CSMA/CA Markov chain is ergodic. As a result, if r is fixed, the long-term average throughput of link k converges to the stationary probability s_k(r). So we say that s_k(r) ∈ [0, 1] is the service rate of link k.

5.3 A DISTRIBUTED ALGORITHM TO APPROACH THROUGHPUT-OPTIMALITY
In this section, we focus on the scheduling problem where all the packets traverse only one link (i.e., they are single-hop) before they leave the network. The objective is to support any vector of strictly feasible arrival rates λ ∈ C. However, the results here can be extended to multi-hop networks and be combined with congestion control as in Chapter 4. Throughout the rest of the chapter, we assume that the maximal instantaneous arrival rate is λ̄, so that λ′_k(i) ≤ λ̄, ∀k, i.

5.3.1 CSMA SCHEDULING WITH COLLISIONS

The following theorem states that any vector λ ∈ C of average rates can be achieved by properly choosing the mean payload lengths T^p_k := T_0·exp(r_k), ∀k.

Theorem 5.2 CSMA/CA is Throughput-Optimal.

Assume that γ, τ′ > 0 and that the transmission probabilities p_k ∈ (0, 1), ∀k, are fixed. Given any λ ∈ C, there exists a unique r* ∈ R^K such that the service rate of link k is equal to the arrival rate for all k:

s_k(r*) = λ_k, ∀k.   (5.10)

Moreover, r* is the solution of the convex optimization problem

max_r L(r; λ)   (5.11)

where

L(r; λ) = Σ_k (λ_k·r_k) − log(E(r)),   (5.12)

with E(r) defined in (5.8). This is because ∂L(r; λ)/∂r_k = λ_k − s_k(r), ∀k.

The proof is in Section 5.6.2.
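For a small network, Theorem 5.2 can be checked numerically: the service rates (5.9) can be computed exactly by enumerating the 2^K on-off states with (5.7)–(5.8), and r* can then be found by gradient ascent on (5.11), since ∂L/∂r_k = λ_k − s_k(r). The Python sketch below does this for the 3-link chain of Fig. 2.1 with the η = 1 parameters of Section 5.5 (γ = 10, τ′ = 20, T_0 = 15, p_k = 1/16) and a hypothetical target λ = (0.3, 0.3, 0.3):

```python
import itertools, math

def decompose(conflict, x):
    """Components of G(x); returns (successful links, collision number)."""
    active = {i for i, xi in enumerate(x) if xi}
    seen, S, h = set(), set(), 0
    for i in active:
        if i in seen:
            continue
        comp, stack = set(), [i]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(v for v in conflict[u] if v in active)
        seen |= comp
        if len(comp) == 1:
            S |= comp
        else:
            h += 1
    return S, h

def service_rates(r, conflict, p, gamma, tau, T0):
    """Exact s_k(r) of (5.9), enumerating all on-off states x per (5.7)-(5.8)."""
    K = len(p)
    states = []
    for x in itertools.product([0, 1], repeat=K):
        S, h = decompose(conflict, x)
        w = gamma ** h                       # gamma^h(x) factor of g(x)
        for i in range(K):
            w *= p[i] if x[i] else 1 - p[i]  # p_i^{x_i} q_i^{1-x_i}
        for k in S:
            w *= tau + T0 * math.exp(r[k])   # (tau' + T0 e^{r_k}) per success
        states.append((w, S))
    E = sum(w for w, _ in states)            # normalizing term E(r) of (5.8)
    s = [0.0] * K
    for w, S in states:
        for k in S:
            s[k] += (w / E) * T0 * math.exp(r[k]) / (tau + T0 * math.exp(r[k]))
    return s

conflict = {0: {1}, 1: {0, 2}, 2: {1}}       # 3-link chain of Fig. 2.1
lam, r = [0.3, 0.3, 0.3], [0.0, 0.0, 0.0]
for _ in range(5000):                        # gradient ascent on L(r; lam)
    s = service_rates(r, conflict, [1/16] * 3, gamma=10, tau=20, T0=15)
    r = [rk + 0.5 * (lk - sk) for rk, lk, sk in zip(r, lam, s)]
print([round(v, 3) for v in r], [round(v, 3) for v in s])  # s_k -> lambda_k
```

Algorithm 4 below is the distributed, stochastic counterpart of this centralized iteration: it replaces the exact λ_k and s_k(r) with empirical measurements.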


Theorem 5.2 motivates us to design a gradient algorithm to solve problem (5.11). However,
due to the randomness of the system, λk and sk (r) cannot be obtained directly and need to be
estimated. We design the following distributed algorithm, where each link k dynamically adjusts its mean payload length T^p_k based on local information.

Definition 5.3 Algorithm 4: Transmission length control algorithm.


The vector r is updated every M slots. Specifically, it is updated at the beginning of slot M·i, i = 1, 2, …. Let t_i = M·i for i ≥ 0. For i ≥ 1, let "period i" be the time between t_{i−1} and t_i, and r(i) be the value of r at the end of period i, i.e., at time t_i. Initially, link k sets r_k(0) ∈ [r_min, r_max], where r_min, r_max are two parameters (to be further discussed). Then, at time t_i, i = 1, 2, …, each link k updates its parameter r_k according to the following identity:

r_k(i) = r_k(i−1) + α(i)·[λ′_k(i) − s′_k(i) + h(r_k(i−1))]   (5.13)

where α(i) > 0 is the step size in period i, and λ′_k(i), s′_k(i) are the empirical average arrival rate and service rate in period i (i.e., the actual amounts of arrived traffic and served traffic in period i, divided by M). Note that λ′_k(i), s′_k(i) are random variables which are generally not equal to λ_k and s_k(r(i−1)). Also, h(·) is a "penalty function", defined below, that keeps r(i) in a bounded region. (This is a "softer" approach than directly projecting r_k(i) onto the set [r_min, r_max]; its purpose is only to simplify the proof of Theorem 5.4 later.) One defines

h(y) = r_min − y   if y < r_min
       0           if y ∈ [r_min, r_max]   (5.14)
       r_max − y   if y > r_max.

Remark: An important point here is that, as in the previous chapters, we let link k send dummy packets when its queue is empty, so that each link is saturated. This ensures that the CSMA/CA Markov chain has the desired stationary distribution in (5.6). The transmitted dummy packets are also included in the computation of s′_k(i). (Although the use of dummy packets consumes bandwidth, it simplifies our analysis and does not prevent us from achieving the primary goal, i.e., approaching throughput-optimality.)
In period i+1, given r(i), we need to choose τ^p_k(i), the payload length of each link k, so that E(τ^p_k(i)) = T^p_k(i) = T_0·exp(r_k(i)). If T^p_k(i) is an integer, then we let τ^p_k(i) = T^p_k(i); otherwise, we randomize τ^p_k(i) as follows:

τ^p_k(i) = ⌈T^p_k(i)⌉ with probability T^p_k(i) − ⌊T^p_k(i)⌋;
           ⌊T^p_k(i)⌋ with probability ⌈T^p_k(i)⌉ − T^p_k(i).   (5.15)

Here, for simplicity, we have assumed that the arrived packets can be fragmented and reassembled to obtain the desired lengths ⌈T^p_k(i)⌉ or ⌊T^p_k(i)⌋. However, one can avoid the fragmentation by randomizing the number of transmitted packets (each with a length of M slots) in a similar way. When there are not enough packets in the queue, "dummy packets" are generated (as mentioned before) to achieve the desired E(τ^p_k(i)) = T_0·exp(r_k(i)), so that the links are always saturated.

Intuitively speaking, Algorithm 4 says that when r_k ∈ [r_min, r_max], if the empirical arrival rate of link k is larger than its service rate, then link k should transmit more aggressively by using a larger mean transmission length, and vice versa.
Algorithm 4 is parametrized by r_min and r_max, which are fixed during the execution of the algorithm. Note that the choice of r_max affects the maximal possible payload length. Also, as discussed below, the choices of r_max and r_min determine the "capacity region" of Algorithm 4.
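The update (5.13), the penalty (5.14), and the randomized rounding (5.15) are all local and elementary, as the following Python sketch shows (the numerical values r_min = 0, r_max = 5, T_0 = 15, and α = 0.23 mirror Section 5.5; this is an illustration, not the authors' implementation):

```python
import math, random

def h_penalty(y, r_min=0.0, r_max=5.0):
    """Penalty function (5.14) keeping r near the interval [r_min, r_max]."""
    if y < r_min:
        return r_min - y
    if y > r_max:
        return r_max - y
    return 0.0

def update_r(r_k, lam_emp, s_emp, alpha):
    """One update (5.13) at link k: lam_emp and s_emp are the empirical
    arrival and service rates measured over the last M slots."""
    return r_k + alpha * (lam_emp - s_emp + h_penalty(r_k))

def draw_payload_length(r_k, T0=15.0, rng=random):
    """Randomized rounding (5.15): an integer payload length whose mean
    is exactly T0 * exp(r_k)."""
    T = T0 * math.exp(r_k)
    lo = math.floor(T)
    return lo + 1 if rng.random() < T - lo else lo

# A link whose queue builds up (lam_emp > s_emp) raises r_k, and hence
# its mean payload length, period after period.
r_k = 0.0
for i in range(1, 6):
    r_k = update_r(r_k, lam_emp=0.4, s_emp=0.3, alpha=0.23)
    print(i, round(r_k, 4), draw_payload_length(r_k))
```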
We define the region of arrival rates

C(r_min, r_max) := {λ ∈ C | r*(λ) ∈ (r_min, r_max)^K}   (5.16)

where r*(λ) denotes the unique solution of max_r L(r; λ) (such that s_k(r*) = λ_k, ∀k, by Theorem 5.2). Later, we show that the algorithm can "support" any λ ∈ C(r_min, r_max) in some sense under certain conditions on the step sizes.
Clearly, C(r_min, r_max) → C as r_min → −∞ and r_max → ∞, where C is the set of all strictly feasible λ (by Theorem 5.2). Therefore, although given (r_min, r_max) the region C(r_min, r_max) is smaller than C, one can choose (r_min, r_max) to arbitrarily approach the maximal capacity region C. Also, there is a tradeoff between the capacity region and the maximal packet length, which is unavoidable given the fixed overhead per packet and the collisions.

Theorem 5.4 Algorithm 4 is Throughput-Optimal.

Assume that the vector of arrival rates λ ∈ C(r_min, r_max). Then, with Algorithm 4:
(i) If α(i) > 0 is non-increasing and satisfies Σ_i α(i) = ∞, Σ_i α(i)² < ∞, and α(1) ≤ 1 (for example, α(i) = 1/i), then r(i) → r* as i → ∞ with probability 1, where r* satisfies s_k(r*) = λ_k, ∀k.
(ii) If α(i) = α, ∀i, then for any δ > 0 there exists a small enough α > 0 such that lim inf_{J→∞} [Σ_{i=1}^{J} s′_k(i)/J] ≥ λ_k − δ, ∀k, with probability 1. In other words, one can achieve average service rates arbitrarily close to the arrival rates by choosing α small enough.

The complete proof of Theorem 5.4 is in Section 5.6.3, but the result can be intuitively understood as follows. If the step size is small (in (i), α(i) becomes small when i is large), r_k is "quasi-static", so that, roughly, the service rate averages (over multiple periods) to s_k(r), and the arrival rate averages to λ_k. Thus, the algorithm solves the optimization problem (5.11) by a stochastic approximation argument (7), such that r(i) converges to r* in part (i), and r(i) is near r* with high probability in part (ii).

5.4 REDUCING DELAYS


Consider the following variant of Algorithm 4.

Definition 5.5 Algorithm 4(b). Algorithm 4(b) is defined by the following update equation for each link k:

r_k(i) = r_k(i−1) + α(i)·[λ′_k(i) + ε − s′_k(i) + h(r_k(i−1))]   (5.17)

where ε > 0 is a small constant. That is, the algorithm "pretends" to serve the arrival rates λ + ε·1, which are slightly larger than the actual rates λ.

Theorem 5.6 Reducing Delays.

Assume that

λ ∈ C(r_min, r_max, ε) := {λ | λ + ε·1 ∈ C(r_min, r_max)}.

For algorithm (5.17), one has the following results:
(i) If α(i) > 0 is non-increasing and satisfies Σ_i α(i) = ∞, Σ_i α(i)² < ∞, and α(1) ≤ 1 (for example, α(i) = 1/i), then r(i) → r* as i → ∞ with probability 1, where r* satisfies s_k(r*) = λ_k + ε > λ_k, ∀k;
(ii) If α(i) = α (i.e., a constant step size) where α is small enough, then all queues are positive recurrent (and therefore stable).

Algorithm (5.17) is parametrized by r_min, r_max, and ε. Clearly, as r_min → −∞, r_max → ∞, and ε → 0, C(r_min, r_max, ε) → C, the maximal capacity region.
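Concretely, the only change relative to (5.13) is the added constant inside the bracket. A self-contained sketch of the update (5.17), with ε = 0.005 as in Section 5.5 (again an illustration, not the authors' code), is:

```python
def update_r_4b(r_k, lam_emp, s_emp, alpha, eps=0.005,
                r_min=0.0, r_max=5.0):
    """Update (5.17): Algorithm 4 with the measured arrival rate inflated
    by eps, so that the fixed point satisfies s_k(r*) = lambda_k + eps."""
    h = (r_min - r_k) if r_k < r_min else (r_max - r_k) if r_k > r_max else 0.0
    return r_k + alpha * (lam_emp + eps - s_emp + h)
```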

Proof. The proof is similar to that of Theorem 5.4, and the details are given in (34). A sketch is as follows. Part (i) is similar to (i) in Theorem 5.4. The extra fact that s_k(r*) > λ_k, ∀k, reduces the queue sizes compared to Algorithm 4 (since, when a queue is large, it tends to decrease). Part (ii) holds because, if we choose δ = ε/2, then by Theorem 5.4, lim inf_{J→∞} [Σ_{i=1}^{J} s′_k(i)/J] ≥ λ_k + ε − δ > λ_k, ∀k, almost surely if α is small enough. Then the result follows by showing that the queue sizes have negative drift. ∎

5.5 NUMERICAL EXAMPLES


Consider the conflict graph in Fig. 5.5. Let the vector of arrival rates be λ = ρ·λ̄, where ρ ∈ (0, 1) is the "load" and λ̄ is a convex combination of several maximal independent sets: λ̄ = 0.2·[1,0,1,0,1,0,0] + 0.2·[0,1,0,0,1,0,1] + 0.2·[0,0,0,1,0,1,0] + 0.2·[0,1,0,0,0,1,0] + 0.2·[1,0,1,0,0,1,0] = [0.4, 0.4, 0.4, 0.2, 0.4, 0.6, 0.2]. Since ρ ∈ (0, 1), λ is strictly feasible. Fix the transmission probabilities as p_k = 1/16, ∀k.

Figure 5.5: The conflict graph in simulations

The "reference payload length" is T_0 = 15. The collision length (e.g., the RTS length) is γ = η·10, and the overhead of a successful transmission is τ′ = η·20, where η is a "relative size" of the overhead for simulation purposes. Later, we will let η ∈ {1, 0.5, 0.2} to illustrate the effects of the overhead size.
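As a quick check of the arithmetic, λ̄ is indeed the stated convex combination (a few lines of Python):

```python
import numpy as np

IS = np.array([[1, 0, 1, 0, 1, 0, 0],   # the five maximal independent
               [0, 1, 0, 0, 1, 0, 1],   # sets listed above, each with
               [0, 0, 0, 1, 0, 1, 0],   # weight 0.2
               [0, 1, 0, 0, 0, 1, 0],
               [1, 0, 1, 0, 0, 1, 0]])
lam_bar = 0.2 * IS.sum(axis=0)
print(lam_bar)              # [0.4 0.4 0.4 0.2 0.4 0.6 0.2]
print(0.8 * lam_bar)        # the simulated arrival rates at load rho = 0.8
```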

Now we vary ρ and η, and in each case we solve problem (5.11) to obtain the required mean payload lengths T^p_k := T_0·exp(r*_k), k = 1, 2, …, 7. Fig. 5.6 (a) shows how the T^p_k's change as the load ρ changes, with η = 1. Clearly, as ρ increases, the T^p_k's tend to increase. Also, the rate of increase becomes faster as ρ approaches 1. Therefore, as mentioned before, there is a tradeoff between the throughput and the transmission lengths (long transmission lengths introduce larger delays for conflicting links). Fig. 5.6 (b) shows how the T^p_k's depend on the relative size η of the overhead (with fixed ρ = 0.8 and η ∈ {1, 0.5, 0.2}). As expected, the smaller the overhead, the smaller the required T^p_k's.

Next, we evaluate Algorithm 4(b) in our C++ simulator. The update in (5.17) is performed every M = 500 slots. Let the step size be α(i) = 0.23/(2 + i/100), the upper bound r_max = 5, the lower bound r_min = 0, and the "gap" ε = 0.005. Assume the initial value of each r_k is 0.
Let the "load" of the arrival rates be ρ = 0.8 (i.e., λ = 0.8·λ̄) and the relative size of the overhead η = 0.5 (i.e., γ = 5, τ′ = 10). To show the negative drift of the queue lengths, assume that initially all queue lengths are 300 data units (where each data unit takes 100 slots to transmit). As expected, Fig. 5.7 (a) shows the convergence of the mean payload lengths, and Fig. 5.7 (b) shows that all queues are stable.

5.6 PROOFS OF THEOREMS


5.6.1 PROOF OF THEOREM 5.1
The proof of this theorem is composed of two steps. The first step is to derive the invariant distribution of the CSMA/CA Markov chain in Lemma 5.7. The key idea of the derivation of that lemma is to show that the CSMA/CA Markov chain is quasi-reversible, meaning that the Markov chain reversed in time admits the same model, except that the duration to-date is replaced by the residual transmission time. In the second step, we prove the theorem by summing the distribution of the CSMA/CA Markov chain over all the states with the same on-off links.

[Figure: (a) required payload lengths (in slots) for Links 1–7 versus the load ρ, given η = 1; (b) required payload lengths (in slots) for Links 1–7 versus the relative overhead size η ∈ {0.2, 0.5, 1}, given ρ = 0.8.]

Figure 5.6: Required mean payload lengths


Lemma 5.7 Invariant Distribution of the CSMA/CA Markov Chain

In the stationary distribution, the probability of a valid state w as defined by (5.3) is

π(w) = (1/K_0) ∏_{i:x_i=0} q_i ∏_{j:x_j=1} [p_j · f_j(w)]   (5.18)

where

f_j(w) = 1 if j ∈ φ(x(w));  P_j(b_j(w)) if j ∈ S(x(w)),   (5.19)

where P_j(b_j(w)) is the p.m.f. of link j's transmission length, as defined in (5.1). Also, K_0 is a normalizing term such that Σ_w π(w) = 1, i.e., all probabilities sum up to 1. Note that π(w) does not depend on the a_k's.

[Figure: (a) average payload lengths (in slots) of Links 1–7 versus time (ms), showing convergence; (b) queue lengths (in data units) of Links 1–7 versus time (ms), showing stability.]

Figure 5.7: Simulation of Algorithm (5.17) (with the conflict graph in Fig. 5.5)

Proof. Consider a transition from a valid state w to a valid state w′. Define the sets

A = {i | w_i = 0, a_i(w′) = 1}
B = {i | a_i(w) = b_i(w), a_i(w′) = 1}
C_0 = {i | w_i = w′_i = 0 and i is blocked}
C_1 = {i | w_i = w′_i = 0 and i is not blocked}
D = {i | a_i(w) < b_i(w), a_i(w′) = a_i(w) + 1, b_i(w′) = b_i(w)}
E = {i | a_i(w) = b_i(w), w′_i = 0}

By "i is blocked", we mean that in state w, link i has a neighbor that is transmitting a packet and that transmission is not in its last time slot. As a result, link i cannot start a transmission in the next slot. In other words, link i has a neighbor which is in the same transmission in states w and w′.
A transition from w to w′ is possible if and only if every i belongs to A ∪ B ∪ ⋯ ∪ E. Then, the probability of a transition from w to w′ is

Q(w, w′) = ∏_{i∈A∪B} [p_i·f_i(w′)] ∏_{i∈C_1∪E} q_i.

We now define a similar system. The only difference is that if a node is transmitting, its state is (b, a) if the transmission will last b slots and the number of slots to go is a (including the current one).
Consider a transition from state w′ to state w. This transition is possible if and only if every i belongs to A′ ∪ B′ ∪ ⋯ ∪ E′, where A′, …, E′ are defined similarly to A, …, E:

A′ = {i | w′_i = 0, a_i(w) = b_i(w)}
B′ = {i | a_i(w′) = 1, a_i(w) = b_i(w)}
C′_0 = {i | w_i = w′_i = 0 and i is blocked}
C′_1 = {i | w_i = w′_i = 0 and i is not blocked}
D′ = {i | a_i(w′) > 1, a_i(w) = a_i(w′) − 1, b_i(w) = b_i(w′)}
E′ = {i | a_i(w′) = 1, w_i = 0}

Claim 1: If the transition from state w to w′ is possible in the original system, then the transition from w′ to w is possible in the new system, and vice versa.
To prove this claim, note that A′ = E, B′ = B, D′ = D, E′ = A. Also, C′_0 = C_0 and C′_1 = C_1. This is because, if link i is in C_0, then there is a neighbor j which is in the same transmission in states w and w′, so link i is also in the set C′_0, and vice versa. As a result, C′_0 = C_0. Similarly, one can show that C′_1 = C_1.
If the transition from state w to w′ is possible in the original system, every i belongs to A ∪ B ∪ ⋯ ∪ E. By the above identities, every i also belongs to A′ ∪ B′ ∪ ⋯ ∪ E′, so the transition from w′ to w is possible in the new system. This completes the proof of Claim 1.
The probability of the transition in the new system is

Q̃(w′, w) = ∏_{i∈E′∪C′_1} q_i ∏_{i∈B′∪A′} [p_i·f_i(w)] = ∏_{i∈A∪C_1} q_i ∏_{i∈B∪E} [p_i·f_i(w)].

Claim 2:

π(w)·Q(w, w′) = π(w′)·Q̃(w′, w),   (5.20)

where

π(w) = K_0 ∏_{i:w_i=0} q_i ∏_{i:w_i≠0} [p_i·f_i(w)].

In this expression, K_0 is a normalizing constant.
To prove this identity, consider a pair (w, w′) such that Q(w, w′) > 0, i.e., such that Q̃(w′, w) > 0. Then

{i | w_i = 0} = A ∪ C_0 ∪ C_1 and {i | w_i ≠ 0} = B ∪ D ∪ E.

Consequently,

π(w) = K_0 ∏_{i∈A∪C_0∪C_1} q_i ∏_{i∈B∪D∪E} [p_i·f_i(w)].

Hence,

π(w)/Q̃(w′, w) = K_0 ∏_{i∈C_0} q_i ∏_{i∈D} [p_i·f_i(w)].   (5.21)

Similarly,

{i | w′_i = 0} = C_0 ∪ C_1 ∪ E and {i | w′_i ≠ 0} = A ∪ B ∪ D.

Consequently,

π(w′) = K_0 ∏_{i∈C_0∪C_1∪E} q_i ∏_{i∈A∪B∪D} [p_i·f_i(w′)].

Hence,

π(w′)/Q(w, w′) = K_0 ∏_{i∈C_0} q_i ∏_{i∈D} [p_i·f_i(w′)].   (5.22)

For i ∈ D, one has b_i(w′) = b_i(w), so that f_i(w′) = f_i(w) and the expressions in (5.21) and (5.22) agree. Therefore, Claim 2 holds.
Finally, we sum up equation (5.20) over all states w that can transition to w′ in the original system. By Claim 1, this is the same as summing over all states w that w′ can transition to in the new system. Therefore,

Σ_w π(w)·Q(w, w′) = Σ_w π(w′)·Q̃(w′, w) = π(w′)·Σ_w Q̃(w′, w) = π(w′). ∎

Using Lemma 5.7, the probability of any on-off state x, as in Theorem 5.1, can be computed by summing up the probabilities, given by (5.18), of all states w with the same on-off state x.
Define the set of valid states B(x) := {w | the on-off state in w is x}. By Lemma 5.7 (writing E for the normalizing constant K_0), we have

p(x) = Σ_{w∈B(x)} π(w)
     = (1/E) Σ_{w∈B(x)} {∏_{i:x_i=0} q_i ∏_{j:x_j=1} [p_j·f_j(w)]}
     = (1/E) (∏_{i:x_i=0} q_i ∏_{j:x_j=1} p_j) Σ_{w∈B(x)} ∏_{j:x_j=1} f_j(w)
     = (1/E) (∏_{i:x_i=0} q_i ∏_{j:x_j=1} p_j) Σ_{w∈B(x)} [∏_{j∈S(x)} P_j(b_j)]   (5.23)

Now we compute the term Σ_{w∈B(x)} [∏_{j∈S(x)} P_j(b_j)]. Consider a state w = {x, ((b_k, a_k), ∀k : x_k = 1)} ∈ B(x). For k ∈ S(x), b_k can take different values in Z_{++}. For each fixed b_k, a_k can be any integer from 1 to b_k. For a collision component C_m(x) (i.e., with |C_m(x)| > 1), the common duration to-date of the links in the component, a^(m), can be any integer from 1 to γ. Then we have

Σ_{w∈B(x)} [∏_{j∈S(x)} P_j(b_j)]
  = ∏_{j∈S(x)} [Σ_{b_j} P_j(b_j) Σ_{1≤a_j≤b_j} 1] · ∏_{m:|C_m(x)|>1} [Σ_{1≤a^(m)≤γ} 1]
  = ∏_{j∈S(x)} [Σ_{b_j} b_j·P_j(b_j)] · γ^{h(x)}
  = (∏_{j∈S(x)} T_j) · γ^{h(x)}   (5.24)

Combining (5.23) and (5.24) completes the proof.

5.6.2 PROOF OF THEOREM 5.2

To prove the theorem, we first define the "detailed state" in Step 1. We need an alternative characterization of feasible rates, which we derive in Step 2. Using this characterization, we prove the existence of r* in Step 3. In Step 4, we prove that r* is in fact unique.
Step 1: "Detailed State" and an Alternative Expression of the Service Rates
We calculate the service rates by considering all the states of the CSMA/CA Markov chain that correspond to a link transmitting. We start with a bit of notation.
If, at an on-off state x, k ∈ S(x) (i.e., k is transmitting successfully), link k may be transmitting either the overhead or the payload. So we define the "detailed state" (x, z), where z ∈ {0, 1}^K. Let z_k = 1 if k ∈ S(x) and link k is transmitting its payload (instead of the overhead); let z_k = 0 otherwise. Denote the set of all possible detailed states (x, z) by S.
Then, similarly to the proof of Theorem 5.1 and using equation (5.7), we have the following product-form stationary distribution:

p((x, z); r) = (1/E(r)) g(x, z)·exp(Σ_k z_k·r_k)   (5.25)

where

g(x, z) = g(x)·(τ′)^{|S(x)| − 1ᵀz}·T_0^{1ᵀz},   (5.26)

where 1ᵀz is the number of links that are transmitting the payload in state (x, z).
Clearly, this provides another expression for the service rate s_k(r):

s_k(r) = Σ_{(x,z)∈S: z_k=1} p((x, z); r).   (5.27)
Step 2: Alternative Characterization of Feasible Rates
Now, we give alternative definitions of feasible and strictly feasible arrival rates to facilitate our proof. We will show that these definitions are equivalent to Definition 3.1.

Definition 5.8 Feasible Rates.

A vector of arrival rates λ ∈ R^K_+ (where K is the number of links) is feasible if there exists a probability distribution p̄ over S (i.e., Σ_{(x,z)∈S} p̄((x, z)) = 1 and p̄((x, z)) ≥ 0) such that

λ_k = Σ_{(x,z)∈S} p̄((x, z))·z_k.   (5.28)

Let C̄_CO be the set of feasible λ, where "CO" stands for "collision".

The rationale for the definition is that, if λ can be scheduled by the network, the fractions of time that the network spends in the detailed states must be non-negative and sum up to 1. (Note that the right-hand side of (5.28) is the probability that link k is sending its payload under the distribution p̄ of the detailed states.)
For example, in the network in Fig. 2.1, λ = (0.5, 0.5, 0.5) is feasible, because (5.28) holds if we let the probability of the detailed state (x = (1, 0, 1), z = (1, 0, 1)) be 0.5, the probability of the detailed state (x = (0, 1, 0), z = (0, 1, 0)) be 0.5, and all other detailed states have probability 0.

Definition 5.9 Strictly Feasible Rates.

A vector of arrival rates λ ∈ R^K_+ is strictly feasible if it can be written as (5.28), where Σ_{(x,z)∈S} p̄((x, z)) = 1 and p̄((x, z)) > 0. Let C_CO be the set of strictly feasible λ.
In the previous example, λ = (0.5, 0.5, 0.5) is not strictly feasible, since it cannot be written as (5.28) with all p̄((x, z)) > 0. But λ = (0.49, 0.49, 0.49) is strictly feasible.

Lemma 5.10 Equivalence of the Feasibility Definitions

The above definitions are equivalent to Definition 3.1. That is,

C̄_CO = C̄   (5.29)
C_CO = C.   (5.30)

Proof. We first prove (5.29). By definition, any λ ∈ C̄ can be written as λ = Σ_{σ∈X} p̄_σ·σ, where X is the set of independent sets, and p̄ = (p̄_σ)_{σ∈X} is a probability distribution, i.e., p̄_σ ≥ 0, Σ_{σ∈X} p̄_σ = 1. Now, we construct a distribution p over the states (x, z) ∈ S as follows. Let p((σ, σ)) = p̄_σ, ∀σ ∈ X, and let p((x, z)) = 0 for all other states (x, z) ∈ S. Then, clearly, Σ_{(x,z)∈S} p((x, z))·z = Σ_{σ∈X} p((σ, σ))·σ = Σ_{σ∈X} p̄_σ·σ = λ, which implies that λ ∈ C̄_CO. So,

C̄ ⊆ C̄_CO.   (5.31)

On the other hand, if λ ∈ C̄_CO, then λ = Σ_{(x,z)∈S} p((x, z))·z for some distribution p over S. We define another distribution p̄ over X as follows. Let p̄_σ = Σ_{(x,z)∈S: z=σ} p((x, z)), ∀σ ∈ X. Then, λ = Σ_{(x,z)∈S} p((x, z))·z = Σ_{σ∈X} Σ_{(x,z)∈S: z=σ} p((x, z))·σ = Σ_{σ∈X} p̄_σ·σ, which implies that λ ∈ C̄. Therefore,

C̄_CO ⊆ C̄.   (5.32)

Combining (5.31) and (5.32) yields (5.29).
We defined C as the interior of C̄. To prove (5.30), we only need to show that C_CO is also the interior of C̄. The proof is similar to that in Section 3.13.1, and is thus omitted. ∎

Step 3: Existence of r*
Assume that λ is strictly feasible. Consider the following convex optimization problem, where the vector u can be viewed as a probability distribution over the detailed states (x, z):

max_u  H(u) + Σ_{(x,z)∈S} [u_{(x,z)}·log(g(x, z))]
s.t.   Σ_{(x,z)∈S: z_k=1} u_{(x,z)} = λ_k, ∀k
       u_{(x,z)} ≥ 0, Σ_{(x,z)} u_{(x,z)} = 1   (5.33)

where H(u) := Σ_{(x,z)∈S} [−u_{(x,z)}·log(u_{(x,z)})] is the "entropy" of the distribution u.
Let r_k be the dual variable associated with the constraint Σ_{(x,z)∈S: z_k=1} u_{(x,z)} = λ_k, and let the vector r := (r_k). We will show the following.

Lemma 5.11 The Optimal Dual Variables are Suitable

The optimal dual variables r* (when problem (5.33) is solved) exist and satisfy (5.10), i.e., s_k(r*) = λ_k, ∀k. Also, the dual problem of (5.33) is (5.11).
Proof. With the above definition of r, a partial Lagrangian of problem (5.33) (subject to u_{(x,z)} ≥ 0, Σ_{(x,z)} u_{(x,z)} = 1) is

L(u; r) = Σ_{(x,z)∈S} [−u_{(x,z)}·log(u_{(x,z)})] + Σ_{(x,z)∈S} [u_{(x,z)}·log(g(x, z))]
          + Σ_k r_k·[Σ_{(x,z)∈S: z_k=1} u_{(x,z)} − λ_k]
        = Σ_{(x,z)∈S} {u_{(x,z)}·[−log(u_{(x,z)}) + log(g(x, z)) + Σ_{k: z_k=1} r_k]} − Σ_k (r_k·λ_k).   (5.34)

So

∂L(u; r)/∂u_{(x,z)} = −log(u_{(x,z)}) − 1 + log(g(x, z)) + Σ_{k: z_k=1} r_k.

We claim that

u_{(x,z)}(r) := p((x, z); r), ∀(x, z) ∈ S   (5.35)

(cf. equation (5.25)) maximizes L(u; r) over u subject to u_{(x,z)} ≥ 0, Σ_{(x,z)} u_{(x,z)} = 1. Indeed, the partial derivative at the point u(r) is

∂L(u(r); r)/∂u_{(x,z)} = log(E(r)) − 1,

which is the same for all (x, z) ∈ S (since, given the dual variables r, log(E(r)) is a constant). Also, u_{(x,z)}(r) = p((x, z); r) > 0 and Σ_{(x,z)} u_{(x,z)}(r) = 1. Therefore, it is impossible to increase L(u; r) by slightly perturbing u around u(r) (subject to 1ᵀu = 1). Since L(u; r) is concave in u, the claim follows.
Denote l(y) = max_u L(u; y); then the dual problem of (5.33) is inf_y l(y). Plugging the expression of u_{(x,z)}(y) into L(u; y), it is not difficult to find that inf_r l(r) is equivalent to sup_r L(r; λ), where L(r; λ) is defined in (5.12).
Since λ is strictly feasible, it can be written as (5.28), where Σ_{(x,z)∈S} p̄((x, z)) = 1 and p̄((x, z)) > 0. Therefore, there exists u ≻ 0 (by choosing u = p̄) that satisfies the constraints in (5.33) and also lies in the interior of the domain of the objective function. So, problem (5.33) satisfies the Slater condition (8). As a result, there exists a vector of (finite) optimal dual variables r* when problem (5.33) is solved. Also, r* solves the dual problem sup_r L(r; λ). Therefore, sup_r L(r; λ) is attainable and can be written as max_r L(r; λ), as in (5.11).
Finally, the optimal solution u* of problem (5.33) is such that u*_{(x,z)} = u_{(x,z)}(r*), ∀(x, z) ∈ S. Also, u* is clearly feasible for problem (5.33). Therefore,

Σ_{(x,z)∈S: z_k=1} u*_{(x,z)} = s_k(r*) = λ_k, ∀k. ∎
Remark: From (5.34) and (5.35), we see that a subgradient (or gradient) of the dual objective function L(r; λ) is

∂L(r; λ)/∂r_k = λ_k − Σ_{(x,z)∈S: z_k=1} u_{(x,z)}(r) = λ_k − s_k(r).

This can also be obtained by direct differentiation of L(r; λ).


Step 4: Uniqueness of r*
Now, we show the uniqueness of r*. Note that the objective function of (5.33) is strictly concave; therefore u*, the optimal solution of (5.33), is unique. Consider the two detailed states (e_k, e_k) and (e_k, 0), where e_k is the K-dimensional vector whose k-th element is 1 and all other elements are 0. We have u*_{(e_k,e_k)} = p((e_k, e_k); r*) and u*_{(e_k,0)} = p((e_k, 0); r*). Then, by (5.25),

u_{(e_k,e_k)}(r*)/u_{(e_k,0)}(r*) = exp(r*_k)·(T_0/τ′).   (5.36)

Suppose that r* is not unique; that is, there exist r*_I ≠ r*_II that are both optimal. Then, r*_{I,k} ≠ r*_{II,k} for some k. This contradicts (5.36) and the uniqueness of u*. Therefore, r* is unique. This also implies that max_r L(r; λ) has a unique solution r*.

5.6.3 PROOF OF THEOREM 5.4

We will use results in (7) to prove Theorem 5.4. Similar techniques have been used in (48) to analyze the convergence of an algorithm in (30).
Part (i): Proof of Theorem 5.4 with Decreasing Step Size
Define the concave function

H(y) := −(r_min − y)²/2   if y < r_min
        0                 if y ∈ [r_min, r_max]   (5.37)
        −(r_max − y)²/2   if y > r_max.

Note that dH(y)/dy = h(y), where h(y) is defined in (5.14). Let G(r; λ) = L(r; λ) + Σ_k H(r_k). Since λ is strictly feasible, max_r L(r; λ) has a unique solution r*. That is, L(r*; λ) > L(r; λ), ∀r ≠ r*. Since r* ∈ (r_min, r_max)^K by assumption, it follows that, ∀r, Σ_k H(r*_k) = 0 ≥ Σ_k H(r_k). Therefore, G(r*; λ) > G(r; λ), ∀r ≠ r*. So r* is the unique solution of max_r G(r; λ). Because ∂G(r; λ)/∂r_k = λ_k − s_k(r) + h(r_k), Algorithm 4 tries to solve max_r G(r; λ) with inaccurate gradients.
Let v^s(t) ∈ R^K be the solution of the following differential equation (for t ≥ s)

dv_k(t)/dt = ∂G(v(t); λ)/∂v_k = λ_k − s_k(v(t)) + h(v_k(t)), ∀k   (5.38)

with the initial condition v^s(s) = r̄(s). So, (5.38) can be viewed as a continuous-time gradient algorithm to solve max_r G(r; λ), and v^s(t) can be viewed as the "ideal" trajectory of Algorithm 4 with accurate gradients. We have shown above that r* is the unique solution of the convex optimization problem max_r G(r; λ), so v^s(t) converges to the unique r* from any initial condition v^s(s).
Recall that in Algorithm 4, r(i) is always updated at the beginning of a minislot. Define Y(i−1) := (s′_k(i), w_0(i)), where w_0(i) is the state w at time t_i. Then {Y(i)} is a non-homogeneous Markov process whose transition kernel from time t_{i−1} to t_i depends on r(i−1). The update in Algorithm 4 can be written as

r_k(i) = r_k(i−1) + α(i)·[f(r_k(i−1), Y(i−1)) + M(i)]

where f(r_k(i−1), Y(i−1)) := λ_k − s′_k(i) + h(r_k(i−1)), and M(i) = λ′_k(i) − λ_k is a martingale noise.
To use Corollary 8 on page 74 of (7) to show Algorithm 4's almost-sure convergence to r*, the following conditions are sufficient:

(i) f(·, ·) is Lipschitz in the first argument, uniformly in the second argument. This holds by the construction of h(·).

(ii) The transition kernel of Y(i) is continuous in r(i). This is true due to the way we randomize the transmission lengths in (5.15).

(iii) (5.38) has a unique convergent point r*, which has been shown above.

(iv) With Algorithm 4, r_k(i) is bounded ∀k, i, almost surely. This is proved in Lemma 5.12 below.

(v) Tightness condition ((†) in (7), page 71): this is satisfied since Y(i) has a bounded state space (cf. conditions (6.4.1) and (6.4.2) in (7), page 76). The state space of Y(i) is bounded because s′_k(i) ∈ [0, 1] and w_0(i) is in a finite set (which is shown in Lemma 5.13 below).

So, by (7), r(i) converges to r*, almost surely.

Lemma 5.12 With Algorithm 4, r(i) is Bounded

With Algorithm 4, r(i) is always bounded. Specifically, r_k(i) ∈ [r_min − 2, r_max + 2λ̄], ∀k, i, where λ̄, as defined before, is the maximal instantaneous arrival rate, so that λ′_k(i) ≤ λ̄, ∀k, i.

Proof. We first prove the upper bound r_max + 2λ̄ by induction: (a) r_k(0) ≤ r_max ≤ r_max + 2λ̄. (b) For i ≥ 1, if r_k(i−1) ∈ [r_max + λ̄, r_max + 2λ̄], then h(r_k(i−1)) ≤ −λ̄. Since λ′_k(i) − s′_k(i) ≤ λ̄, we have r_k(i) ≤ r_k(i−1) ≤ r_max + 2λ̄. If r_k(i−1) ∈ (r_min, r_max + λ̄), then h(r_k(i−1)) ≤ 0. Also, since λ′_k(i) − s′_k(i) ≤ λ̄ and α(i) ≤ 1, ∀i, we have r_k(i) ≤ r_k(i−1) + λ̄·α(i) ≤ r_max + 2λ̄. If r_k(i−1) ≤ r_min, then

r_k(i) = r_k(i−1) + α(i)·[λ′_k(i) − s′_k(i) + h(r_k(i−1))]
       ≤ r_k(i−1) + α(i)·{λ̄ + [r_min − r_k(i−1)]}
       = [1 − α(i)]·r_k(i−1) + α(i)·{λ̄ + r_min}
       ≤ [1 − α(i)]·r_min + α(i)·{λ̄ + r_min}
       = r_min + α(i)·λ̄
       ≤ λ̄ + r_min ≤ r_max + 2λ̄.

The lower bound r_min − 2 can be proved similarly. ∎

Lemma 5.13 With Algorithm 4, w_0(i) is Bounded

In Algorithm 4, w_0(i) is in a finite set.

Proof. By Lemma 5.12, we know that r_k(i) ≤ r_max + 2λ̄, ∀k, i, so T^p_k(i) ≤ T_0·exp(r_max + 2λ̄), ∀k, i. By (5.15), we have τ^p_k(i) ≤ T_0·exp(r_max + 2λ̄) + 1, ∀k, i. Therefore, in state w_0(i) = {x, ((b_k, a_k), ∀k : x_k = 1)}, we have b_k ≤ b_max for a constant b_max, and a_k ≤ b_k for any k such that x_k = 1. So, w_0(i) is in a finite set. ∎
Part (ii): Proof of Theorem 5.4 with Constant Step Size
The intuition is the same as in part (i): if the constant step size is small enough, then the algorithm approximately solves the problem max_r G(r; λ). Please refer to (34) for the full proof.

5.7 SUMMARY
The goal of this chapter was to define a CSMA algorithm, Algorithm 4, that achieves maximum throughput in a network with collisions. The main idea is to let stations with a big backlog transmit longer packets. In this protocol, the attempt probability is the same for all the stations and is constant. The stations transmit a short request (similar to an RTS/CTS exchange in WiFi). Stations collide if they start their requests during the same minislot. However, the collisions have a short (fixed) duration. Thus, as the packet transmission lengths increase, the fraction of time that collisions waste becomes negligible.
Section 5.2 describes the protocol with collisions and its model. Theorem 5.1 provides the expression of the link service rates. The main idea behind that result is the quasi-reversibility of the CSMA/CA Markov chain with collisions. That property enables the development of all the results of the chapter. Theorem 5.2 establishes the existence of transmission duration parameters that stabilize the queues. Section 5.3 specifies Algorithm 4 in Definition 5.3, and its throughput-optimality is stated in Theorem 5.4. Section 5.4 specifies Algorithm 4(b) in Definition 5.5; this algorithm is a version of Algorithm 4 designed to reduce the delays, and its capacity region is given by Theorem 5.6. Section 5.5 discusses numerical examples that confirm the analytical results. Finally, Section 5.6 gives the technical proofs.

5.8 RELATED WORKS


In (59), Ni and Srikant proposed a CSMA-like algorithm to achieve near-optimal throughput with collisions taken into account. The algorithm in (59) uses synchronized, alternating control phases and data phases. Collisions only occur in the control phase, not in the data phase. In the data phase, the algorithm realizes a discrete-time CSMA with the same product-form stationary distribution as its continuous counterpart described in Chapter 3, which is then used to achieve the maximal throughput.
To also consider the effect of collisions, the authors of (48) used a perturbation analysis of the
idealized CSMA. In particular, in CSMA with discrete backoff counters, they increase the backoff
times and transmission times proportionally, which makes the model asymptotically approach the
idealized CSMA since the probability of collisions becomes negligible. The same intuition was
discussed in (30). Reference (48) also discussed the tradeoff between the throughput and short-term
fairness when the transmission times are increased.
As mentioned before, a by-product of our study is the development of a quite general model of CSMA with discrete backoff counters. Previously, the throughput expression was known for networks where all links conflict with each other (e.g., a wireless LAN) (4) and for an idealized CSMA model without collisions (5). It turns out that these existing models are special cases of our model under certain topologies or in an asymptotic regime (see (32) for a more detailed discussion).

CHAPTER 6

Stochastic Processing Networks


6.1 INTRODUCTION
Stochastic Processing Networks (SPNs) are models of service, processing, communication, or man-
ufacturing systems (75). In such a network, service activities require parts and resources to produce
new parts. Thus, parts flow through a network of buffers served by activities that consume parts
and produce new ones. Typically, service activities compete for resources, which yields a scheduling
problem. The goal of the scheduling is to maximize some measure of performance of the network,
such as the net utility of parts being produced.
As SPNs are more general than queuing networks, one may expect the scheduling that min-
imizes an average cost such as total waiting time to be complex. Indeed, the optimal scheduling of
queuing networks is known only for simple cases, such as serving the longest queue or the Klimov
network (70). For SPNs, one approach has been to consider these networks under the heavy-traffic
regime (23). In such a regime, a suitable scheduling may collapse the state space. For instance, when
serving the longest queue, under heavy traffic the queue lengths become equal. It is then sometimes
possible to analyze the SPN under heavy traffic as in (24). Using this approach, in (13), the authors
prove the asymptotic optimality under heavy traffic of maximum back-pressure policies for a class
of SPNs. It may also happen that the control of the heavy traffic diffusion model is tractable while
the original problem is not (73).
Another line of investigation explores a less ambitious formulation of the problem. Instead
of considering the Markov decision problem of minimizing an average cost, this approach searches
for controls that stabilize the queues in the network or that maximize the utility of its flows. This
approach has been followed successfully for communication networks as we reviewed in Chapters 4
and 5.
This chapter follows a similar approach. The objective is to achieve throughput optimality
and maximize the total net utility of flows of parts that the network produces. However, the scheme
proposed in the chapter differs from previous work1 . For instance, simple examples show that MWM
is not stable for some SPNs and that a new approach is needed. The basic difficulty is that MWM
and related algorithms are too greedy and may lead some tasks to starve other tasks of parts. Dai and
Lin (12) show that MWM is stable in SPNs if the network structure satisfies a certain assumption
(for example, in a limited class of SPNs where each task consumes parts from a single queue). We
propose a deficit maximum weight (DMW) algorithm (36) that automatically makes certain tasks
¹Also, it is not based on the randomized CSMA scheduling described in previous chapters, although combining the idea in this chapter and CSMA scheduling is possible.
wait instead of always grabbing the parts they can use, thereby achieving throughput optimality
without the assumption in (12).
The chapter is organized as follows. Section 6.2 illustrates through examples the basic difficul-
ties of scheduling SPNs and the operations of the DMW scheduling algorithm. Section 6.3 defines
the basic model. Section 6.4 describes the DMW algorithm formally and proves that it stabilizes the
network. Section 6.5 explains that the algorithm, combined with the control of the input activities,
maximizes the sum of the utilities of the network. Section 6.6 discusses the extension of the results
to the case when tasks have variable durations. Section 6.7 provides a number of simulation results
to confirm the results of the chapter.

6.2 EXAMPLES
This section illustrates critical aspects of the scheduling of SPNs on simple examples. Figure 6.1
shows a SPN with one input activity (IA) represented by the shaded circle and four service activities
(SAs) represented by white circles. SA2 needs one part from queue 2 and produces one part that
leaves the network, similarly for SA4. SA3 needs one part from each of the queues 2, 3 and 4 and
produces one part that leaves the network. SA1 needs one part from queue 1 and produces one part
which is added to queue 4. Each SA takes one unit of time. There is a dashed line between two SAs
if they cannot be performed simultaneously. These conflicts may be due to common resources that
the SAs require. The parts arrive at the queues as follows: at even times, IA1 generates one part for
each of the queues 1, 2 and 3; at odd times, no part arrives.
One simple scheduling algorithm for this network is as follows. At time 0, buffer the parts
that arrive at queues 1, 2 and 3. At time 1, perform SA1 which removes one part from queue 1 and
adds one part to queue 4. At time 2, use the three parts in queue 2, 3, 4 to perform SA3 and buffer
the new arrivals. Repeat this schedule forever, i.e., perform SA1 and SA3 alternately. This schedule
makes the system stable.
Interestingly, the maximum weight algorithm (MWM) makes this system unstable (in a way
similar to a counterexample in (12)). By definition, at each time, MWM schedules the SAs that








Figure 6.1: A network unstable under MWS


6.2. EXAMPLES 101
maximize the sum of the back-pressures. Accordingly, at time 1, one part has arrived in each of
queues 1, 2 and 3 (at time 0). Since queue 4 is empty, SA3 and SA4 cannot be scheduled, so this algorithm
schedules SA1 and SA2, after which one part remains in queue 3 and queue 4. At time 2, the
algorithm schedules SA4, and buffers new arrivals, after which two parts remain in queue 3, and
one part in queue 1 and queue 2. Continuing in this way, the number of parts in queue 3 increases
without bound since the algorithm never schedules SA3 and never serves queue 3. (In fact, any
work-conserving algorithm leads to the same result in this example.) The deficit maximum weight
algorithm that we propose in this chapter addresses this instability.
Fig. 6.2 provides another example of instability, this time due to randomness. There, SA1
processes each part in queue 1 and then produces one part for queue 2 or queue 3, each with
probability 0.5. Each activation of SA2 assembles one part from queue 2 and one part from queue
3. Each SA takes one unit of time. If the parts arrive at queue 1 at rate λ1 < 1, then one would
expect the SPN to be able to process these parts. However, the difference between the number of
parts that enter the queues 2 and 3 is null recurrent. Thus, no scheduling algorithm can keep the
backlogs in queues 2 and 3 bounded at the same time. In this chapter, we are only interested in
networks that can be stabilized.



Figure 6.2: An infeasible example

Figure 6.3 shows another SPN. IA1 produces one part for queue 1. IA2 produces one part
for queue 2 and one part for queue 3. The synchronized arrivals generated by IA2 correspond to
the ordering of a pair of parts, as one knows that such a pair is needed for SA2. This mechanism
eliminates the difficulty encountered in the example of Figure 6.2. In Figure 6.3, we say that each
IA is the “source” of a “flow” of parts (as a generalization of a “flow” in data networks). SA1 and SA2
in this network conflict, as indicated by the dashed line between the SAs. Similarly, SA2 and SA3
conflict. One may consider the problem of scheduling both the IAs (ordering parts) and the SAs
to maximize some measure of performance. Our model assumes the appropriate ordering of sets of
parts to match the requirements of the SAs.
We explain the deficit maximum weight (DMW) scheduling algorithm on the example of
Figure 6.1. In that example, we saw that MWM is unstable because it starves SA3. Specifically,
MWM schedules SA2 and SA4 before the three queues can accumulate parts for SA3. The idea
of DMW is to pretend that certain queues are empty even when they have parts, so that the parts
can wait for the activation of SA3. The algorithm is similar to MWM, but the weight of each SA is


Figure 6.3: Arrivals and conflicting service activities

computed from the virtual queue lengths q_k = Q_k − D_k, ∀k. Here, Q_k is the actual length of queue
k and D_k ≥ 0 is called the deficit.
DMW automatically finds the suitable values of the deficits Dk . To do this, DMW uses the
maximum-weighted schedule without considering whether there are enough input parts available.
When the algorithm activates a SA that does not have enough input parts in queue k, the SA produces
fictitious parts, decreases qk (which is allowed to be negative) and increases the deficit of queue k. This
algorithm produces the results in Table 6.1 where each column gives the values of q and D after the
activities in a slot. For deficits, only D4 is shown since the deficits of all other queues are 0. In the table,
SA0 means that no SA is scheduled because all the weights of the activities are non-positive. Note
that when SA3 is activated for the first time, queue 4 is empty: Q4 = 0. Therefore, q4 is decreased to
-1, D4 is increased to 1 and a fictitious part is produced. But since SA1 is activated simultaneously,
q4 becomes 0 after this slot. After that, the sequence (SA0+IA1, SA3+SA1) repeats forever and no
more fictitious parts are produced. The key observation is that, although the virtual queue lengths
are allowed to become negative, they remain bounded in this example. Consequently, with proper
D, the actual queue lengths Q = q + D are always non-negative, and thus the starvation problem
is avoided.

Table 6.1: Deficit Maximum Weight scheduling

    Activity →   (init)   SA0+IA1   SA3+SA1   SA0+IA1   SA3+SA1   ...
    q1              0         1         0         1         0     ...
    q2              0         1         0         1         0     ...
    q3              0         1         0         1         0     ...
    q4              0         0         0         0         0     ...
    D4              0         0         1         1         1     ...
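The dynamics in Table 6.1 are easy to replay mechanically. Below is a minimal sketch (Python with numpy; the matrices encode our reading of the example of Figure 6.1, and all names are ours) that applies the stated activity sequence together with the virtual-queue, actual-queue, and deficit updates (6.6)-(6.8) formalized in Section 6.4:

    # Minimal sketch of the DMW bookkeeping for the example of Figure 6.1.
    # IA1 feeds queues 1-3; SA1 moves a part from queue 1 to queue 4;
    # SA2 serves queue 2; SA3 consumes one part from each of queues 2, 3, 4;
    # SA4 serves queue 4. (Our encoding of the example described in the text.)
    import numpy as np

    A = np.array([[-1.], [-1.], [-1.], [0.]])      # input matrix: A[k,0] = -a_{k,1}
    B = np.array([[ 1., 0., 0., 0.],               # SA1 consumes from queue 1
                  [ 0., 1., 1., 0.],               # SA2, SA3 consume from queue 2
                  [ 0., 0., 1., 0.],               # SA3 consumes from queue 3
                  [-1., 0., 1., 1.]])              # SA1 produces into queue 4; SA3, SA4 consume

    q = np.zeros(4)                                # virtual queue lengths
    Q = np.zeros(4)                                # actual queue lengths
    a_IA1, a_none = np.array([1.]), np.array([0.])
    x_none = np.zeros(4)
    x_SA3_SA1 = np.array([1., 0., 1., 0.])         # SA1 and SA3 activated together

    schedule = [(a_IA1, x_none), (a_none, x_SA3_SA1)] * 3   # SA0+IA1, SA3+SA1, ...
    for t, (a, x) in enumerate(schedule):
        mu_out = np.maximum(B, 0) @ x                           # parts leaving queues
        mu_in = np.maximum(-A, 0) @ a + np.maximum(-B, 0) @ x   # parts entering queues
        q = q - A @ a - B @ x                                   # virtual update (6.6)
        Q = np.maximum(Q - mu_out, 0.0) + mu_in                 # actual update (6.7)
        print(f"slot {t}: q = {q}, D = {Q - q}")                # deficit (6.8)

Running the sketch reproduces the table: D4 jumps to 1 in the first SA3+SA1 slot (the one fictitious part) and never grows again.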

6.3 BASIC MODEL


For simplicity, assume a time-slotted system. In each slot, a set of input activities (IAs) and service
activities (SAs) are activated. Assume that each activity lasts one slot for the ease of exposition. (In
Section 6.6, we discuss the case where different activities have different durations.) There are M IAs,
N SAs, and K queues in the network. Each IA, when activated, produces a deterministic number
of parts for each of a fixed set of queues. Each SA, when activated, consumes parts from a set of
queues, and produces parts for another set of queues and/or some “products” that leave the network.
The sets of IAs, SAs, and queues are defined to be consistent with the following:

(i) Each IA is the “source” of a flow of parts, like in Figure 6.3. (There are M IAs and M
flows.) In other words, the parts generated by one activation of IA m can be exactly served
by activating some SAs and eventually produce a number of products that leave the network,
without leaving any part unused in the network. (This will be made more formal later.) This
is a reasonable setup since the manufacturer knows how many input parts of each type are
needed in order to produce a set of products, and he will order the input parts accordingly.
Otherwise, some parts would never be consumed, and there would be no need to order them.

(ii) Parts in different flows are buffered in separate queues.

(iii) A SA n is associated with a set of input queues In and a set of output queues On . Due to
the way we define the queues in (ii), different flows are served by disjoint sets of SAs. (Even if
two SAs in different flows essentially perform the same task, we still label them differently.)
Also, a SA is defined only if it is used by some flow.

Each activation of IA m adds a_{k,m} parts to queue k. Define the input matrix A ∈ R^{K×M} where
A_{k,m} = −a_{k,m}, ∀m, k. Each activation of SA n consumes b_{k,n} parts from each queue k ∈ I_n (the
“input set” of SA n), produces b_{k,n} parts that are added to each queue k ∈ O_n (the “output set”
of SA n), and possibly produces a number of final products that leave the network. Assume that I_n ∩ O_n = ∅.
Accordingly, define the service matrix B ∈ R^{K×N}, where B_{k,n} = b_{k,n} if k ∈ I_n, B_{k,n} = −b_{k,n} if
k ∈ O_n, and B_{k,n} = 0 otherwise. Assume that all elements of A and B are integers. Also assume
that the directed graph that represents the network has no cycle (see, for example, Fig. 6.1 and
Fig. 6.3).
Let a(t) ∈ {0, 1}^M, t = 0, 1, . . . be the “arrival vector” in slot t, where a_m(t) = 1 if IA m is
activated and a_m(t) = 0 otherwise. Let λ ∈ R^M be the vector of average arrival rates. Let x(t) ∈
{0, 1}^N be the “service vector” in slot t, where x_n(t) = 1 if SA n is activated and x_n(t) = 0 otherwise.
Let s ∈ R^N be a vector of (average) service rates.
Point (i) above means that there exists s^m ∈ R^N such that

    A_m + B · s^m = 0        (6.1)

where A_m is the m’th column of A.



Therefore, for any activation rate λ_m > 0 of flow m, there exists s^m ∈ R^N such that

    A_m · λ_m + B · s^m = 0        (6.2)

The vector s^m is the service rate vector for flow m that can exactly serve λ_m. This is a reasonable
assumption as discussed in point (i). We also assume that s^m is unique given λ_m, i.e., there is only
one way to serve the arrivals. We expect that this assumption usually holds in practice. Summing
up (6.2) over m gives

    A · λ + B · s = 0.        (6.3)

where s = Σ_m s^m ≻ 0.² Note that s ≻ 0 because a SA is defined only if it is used by some flow,
and λ ≻ 0. Also, since each flow is associated with a separate set of queues and SAs, equation (6.3)
implies (6.2) for all m as well.
By assumption, given any λ, there exists a unique s satisfying (6.3), so we also write s in (6.3)
as s(λ).
Due to resource sharing constraints among the SAs, not all SAs can be performed simultaneously
at a given time. Assuming that all queues have enough parts such that any SA can be
performed, let x̃ ∈ {0, 1}^N be a feasible service vector, and X be the set of such x̃’s. (We also call x̃
an independent set since the active SAs in x̃ can be performed without conflicts.) Denote by Λ
the convex hull of X, i.e.,

    Λ := {s | ∃p ⪰ 0 : Σ_{x̃∈X} p_x̃ = 1, s = Σ_{x̃∈X} p_x̃ · x̃}

and let Λ^o be the interior of Λ. (That is, for any s ∈ Λ^o, there is a ball B̃ centered at s with radius
r > 0 such that B̃ ⊂ Λ.)

Definition 6.1 Feasible and Strictly Feasible Rates.


We say that λ is feasible iff s(λ) ∈ Λ and that λ is strictly feasible iff s(λ) ∈ Λ^o.

Remarks

• If λ is strictly feasible, then by definition s(λ) ≻ 0, and, therefore, λ ≻ 0. This does not cause
any loss of generality: if some λ_m = 0, then our DMW algorithm never activates any SA in
this flow. So the flow can be disregarded from the model.

• In a more general setting, the output parts of a certain SA can split and go to more than
one output set. The split can be random or deterministic. For example, in a hospital, after
a patient is diagnosed, he goes to a certain room based on the result. A probabilistic model
for this is that the patients go to different rooms with certain probabilities after the SA (i.e.,
²In this chapter, the relationship a ≻ b where a, b ∈ R^K means that a_i > b_i for i = 1, 2, . . . , K. Similarly, a ⪰ b means that a_i ≥ b_i for i = 1, 2, . . . , K.
the diagnosis). The split can also be deterministic. For example, in manufacturing, the output
parts of a SA may be put into two different queues alternately.
In both cases, we can define the element Bk,n in the matrix B to be the average rate that
SA n consumes (or adds) parts from (to) queue k. However, note that in the random case, it
may not be feasible to stabilize all queues by any algorithm even if there exist average rates
satisfying (6.2). Fig. 6.2 described earlier is such an example. For simplicity, here we mainly
consider networks without splitting.

6.4 DMW SCHEDULING


In this section, we consider the scheduling problem with strictly feasible arrival rates λ. We first
describe the DMW algorithm and then show its throughput optimality.
The key idea, as we explained in Section 6.2, is to keep track of virtual queues that can become
negative and to use MWM based on these virtual queues even if that implies choosing tasks that
correspond to empty real queues. When this happens, the scheduler produces ‘virtual parts’ and
wastes time. Using a Lyapunov function approach, one shows that – under reasonable assumptions
on the arrivals – the virtual backlogs are bounded. This then implies that the scheduler wastes a
finite amount of time.
Another interpretation is as follows. Let the virtual queue length at time t be qk (t), k =
1, 2, · · · , K. DMW uses qk (t) to compute the schedule in each slot. If qk (t) is bounded for all k
at all time in the algorithm, then there exist some deficit D̄k ≥ 0, ∀k such that by letting the actual
queue length Qk (t) = qk (t) + D̄k , Qk (t) is always non-negative, i.e., there are always parts in the
queues to process, thus avoiding the instability problem caused by starvation. DMW finds the proper
deficits automatically and achieves throughput optimality.
The DMW algorithm defined below is fairly simple. At each time, the algorithm uses MWM
to schedule the tasks with the maximum back-pressure, as calculated using the virtual queues. When
a task is scheduled, the corresponding input virtual queues are decremented and the corresponding
output virtual queues are incremented as if the required parts were always available. The changes of
the actual queues, however, depend on the availability of parts.

Definition 6.2 DMW (Deficit Maximum Weight) Scheduling.


DMW is the scheduling algorithm defined as follows:
(i) Initially (at time 0), set q(0) = Q (0) = D(0) = 0.
(ii) Schedule the SAs: In each time slot t = 0, 1, 2 . . . , the set of SAs with the maximal
back-pressure is scheduled:

    x*(t) ∈ arg max_{x∈X} d^T(t) · x        (6.4)

where d(t) ∈ R^N is the vector of back-pressures, defined as

    d(t) = B^T q(t),        (6.5)


and X is the set of independent sets including non-maximal ones. Also, for any SA n, we
require that xn∗ (t) = 0 if dn (t) ≤ 0.
Recall that an independent set is a set of SAs that can be performed simultaneously assuming
that all input queues have enough parts. So, it is possible that SA n is scheduled (i.e., xn∗ (t) = 1)
even if there are not enough parts in some input queues of SA n. In this case, SA n is activated
as a “null activity” (to be further explained in step (iv)).

(iii) Update the virtual queues q(t): Update q as

    q(t + 1) = q(t) − A · a(t) − B · x*(t)        (6.6)

where, as defined earlier, a(t) is the vector of actual arrivals in slot t (where the m’th element
a_m(t) corresponds to IA m). In this chapter, x*(t) and x*(q(t)) are interchangeable.
Expression (6.6) can also be written as

    q_k(t + 1) = q_k(t) − μ_{out,k}(t) + μ_{in,k}(t), ∀k

where μ_{out,k}(t) and μ_{in,k}(t) are the numbers of parts coming out of or into virtual queue k in
slot t, expressed below. (We use v⁺ and v⁻ to denote the positive and negative parts of v. That
is, v⁺ = max{0, v} and v⁻ = max{0, −v}, so that v = v⁺ − v⁻.)

    μ_{out,k}(t) = Σ_{n=1}^{N} [B_{k,n}]⁺ x*_n(t)
    μ_{in,k}(t) = Σ_{m=1}^{M} [A_{k,m}]⁻ a_m(t) + Σ_{n=1}^{N} [B_{k,n}]⁻ x*_n(t).

(iv) Update of actual queues Q (t) and deficits D(t): If SA n is scheduled in slot t but there
are not enough parts in some of its input queues (or some input parts are fictitious, further
explained below), SA n is activated as a null activity. Although the null activity n does not
actually consume or produce parts, parts are removed from the input queues and fictitious parts
are added to the output queues as if SA n were activated normally. So the actual queue length satisfies

    Q_k(t + 1) = [Q_k(t) − μ_{out,k}(t)]⁺ + μ_{in,k}(t).        (6.7)

Then the deficit is computed as

    D_k(t + 1) = Q_k(t + 1) − q_k(t + 1).        (6.8)
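As a concrete illustration, the following is a minimal sketch of one slot of this algorithm, assuming the set X of feasible service vectors is small enough to enumerate explicitly; the function name and interface are ours.

    # Minimal sketch of steps (ii)-(iv) of Definition 6.2 for one time slot.
    import numpy as np

    def dmw_slot(q, Q, a, A, B, X):
        """q, Q: virtual/actual queue lengths; a: arrival vector;
        A, B: input/service matrices; X: list of feasible 0-1 service vectors."""
        d = B.T @ q                                          # back-pressures (6.5)
        x_star = max(X, key=lambda x: float(d @ x)).copy()   # max-weight schedule (6.4)
        x_star[d <= 0] = 0          # require x*_n = 0 when d_n <= 0; the truncated
                                    # vector is still in X since X contains
                                    # non-maximal independent sets
        mu_out = np.maximum(B, 0) @ x_star
        mu_in = np.maximum(-A, 0) @ a + np.maximum(-B, 0) @ x_star
        q = q - A @ a - B @ x_star                 # virtual queues (6.6), may go negative
        Q = np.maximum(Q - mu_out, 0.0) + mu_in    # actual queues (6.7)
        return q, Q, Q - q, x_star                 # deficits D = Q - q by (6.8)

The brute-force argmax over X is for illustration only; like MWM itself, it is expensive when the number of SAs is large.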

The proof that DMW achieves maximum throughput consists of the following steps. First,
Lemma 6.3 shows how the deficits get updated. Second, Lemma 6.4 shows that the algorithm
is optimal if q(t) is bounded. Theorems 6.5 and 6.6 provide sufficient conditions for q(t) to be
bounded.
We first derive a useful property of Dk (t).
Lemma 6.3 Deficit Update
D_k(t) is non-decreasing with t, and satisfies

    D_k(t + 1) = D_k(t) + [μ_{out,k}(t) − Q_k(t)]⁺.

Proof. By (6.8), (6.6) and (6.7), we have

    D_k(t + 1) = Q_k(t + 1) − q_k(t + 1)
               = [Q_k(t) − μ_{out,k}(t)]⁺ − [q_k(t) − μ_{out,k}(t)]
               = Q_k(t) − μ_{out,k}(t) + [μ_{out,k}(t) − Q_k(t)]⁺ − [q_k(t) − μ_{out,k}(t)]
               = D_k(t) + [μ_{out,k}(t) − Q_k(t)]⁺,        (6.9)

which also implies that D_k(t) is non-decreasing with t. □

Lemma 6.4 Sufficient Condition for Bounded Deficits and Queues

If ‖q(t)‖² ≤ G at all times t for some constant G > 0, then
(i) D(t) is bounded. Also, only a finite number of null activities occur. So in the long term the null
activities do not affect the average throughput.
(ii) Q(t) is bounded.

Proof.
Part (i):
Note that since ‖q(t)‖² ≤ G, we have −G′ ≤ q_k(t) ≤ G′, ∀k, t, where G′ := √G. We
claim that D_k(t) ≤ G′ + μ_out, ∀k, t, where μ_out is the maximum number of parts that could leave
a queue in one slot. By the definition of the DMW algorithm, D_k(t) is non-decreasing with t and
initially D_k(0) = 0.
Suppose to the contrary that D_k(t) exceeds G′ + μ_out for some k and t. Then there exists
t′, the first time that D_k(·) exceeds G′ + μ_out. In other words, D_k(t′) > G′ + μ_out and
D_k(t′ − 1) ≤ G′ + μ_out.
By (6.9) and (6.8), we have

    D_k(t + 1) = D_k(t) + [μ_{out,k}(t) − Q_k(t)]⁺
               = D_k(t) + max{0, μ_{out,k}(t) − Q_k(t)}
               = max{D_k(t), D_k(t) + μ_{out,k}(t) − Q_k(t)}
               = max{D_k(t), −q_k(t) + μ_{out,k}(t)}.

So D_k(t′) = max{D_k(t′ − 1), −q_k(t′ − 1) + μ_{out,k}(t′ − 1)}. Since q_k(t′ − 1) ≥ −G′ and μ_{out,k}(t′ − 1) ≤
μ_out, we have D_k(t′) ≤ G′ + μ_out. This leads to a contradiction. Therefore, D_k(t) ≤ G′ +
μ_out, ∀t, k.
Note that when a queue underflow occurs (i.e., when μ_{out,k}(t) > Q_k(t) for some k, t), D_k is
increased, and the increase of D_k is a positive integer. Since D(0) = 0, D(t) is non-decreasing and
remains bounded for all t, so the number of queue underflows must be finite. Since we have assumed
that the directed graph which represents the network has no cycle, it is clear that each underflow
only “pollutes” a finite number of final outputs (i.e., the products). Therefore, in the long term the
queue underflows (and the resulting null activities) do not affect the average throughput.
Part (ii): Observe that
Q_k(t) = q_k(t) + D_k(t) ≤ 2G′ + μ_out, ∀k, t. □

In Section 6.4.1, we will show that q(t) is bounded under certain conditions on the arrivals.
By Lemma 6.4, Q(t) is then bounded and the maximal throughput is achieved.

6.4.1 ARRIVALS THAT ARE SMOOTH ENOUGH


Recall that λ is strictly feasible. First, consider a simple case where the arrival rates are “almost constant”
at λ. Specifically, assume that a_m(t) = ⌊λ_m · (t + 1)⌋ − ⌊λ_m · t⌋, ∀m, t. Then Σ_{τ=0}^{t−1} a_m(τ) =
⌊λ_m · t⌋ ≈ λ_m · t, ∀t, so that the arrival rates are almost constant. Later, we will show that q(t) is
bounded under such arrivals.
However, since the “almost constant” assumption is quite strong in practice, it is useful to relax
it and consider more general arrival processes. In particular, consider the following (mild) smoothness
condition.
Condition 1: There exist σ > 0 and a positive integer T such that for all l = 0, 1, 2, . . . ,
ã^l + σ · 1 and ã^l − σ · 1 are feasible vectors of arrival rates, where

    ã^l := Σ_{τ=l·T}^{(l+1)·T−1} a(τ)/T        (6.10)

is the vector of average arrival rates in the l’th time window of length T . In other words, there exists
a large enough time window T such that the ã^l are “uniformly” strictly feasible.
Remark: Note that ã^l can be very different for different l’s. That is, ã^l, l = 0, 1, . . . do not
need to be all close to a certain strictly feasible λ.

Theorem 6.5 Under Condition 1, q(t) is bounded for all t. Therefore, (i) and (ii) in Lemma 6.4
hold.

The proof is in Section 6.9.1.

Theorem 6.6 With the “almost constant” arrivals, q(t) is bounded for all t.
Proof. Since Σ_{τ=0}^{t−1} a_m(τ) = ⌊λ_m · t⌋, we have |Σ_{τ=0}^{t−1} a_m(τ) − λ_m · t| ≤ 1, ∀t.
So

    |Σ_{τ=l·T}^{(l+1)·T−1} a_m(τ)/T − λ_m| = (1/T) · |[Σ_{τ=0}^{(l+1)T−1} a_m(τ) − λ_m · (l + 1)T] − [Σ_{τ=0}^{lT−1} a_m(τ) − λ_m · lT]| ≤ 2/T.

Since λ is strictly feasible, there exists σ > 0 such that λ + 2σ · 1 and λ − 2σ · 1 are feasible
vectors of arrival rates. Choose T ≥ 2/σ; then |Σ_{τ=l·T}^{(l+1)·T−1} a_m(τ)/T − λ_m| ≤ σ, ∀m. Therefore,
ã^l + σ · 1 and ã^l − σ · 1 are feasible. □

6.4.2 MORE RANDOM ARRIVALS


Assume that am (t) ∈ Z+ is a random variable with bounded support, and it satisfies

E(am (t)) = λm , ∀t. (6.11)

For simplicity, also assume that the random variables {am (t), m = 1, 2, . . . , M, t = 0, 1, 2, . . . } are
independent. (This assumption, however, can be easily relaxed.) Suppose that the vector λ is strictly
feasible.
In general, this arrival process does not satisfy the smoothness condition (although when T
is large, Σ_{τ=t}^{t+T−1} a(τ)/T is close to λ with high probability). With such arrivals, it is not difficult
to show that q(t) is stable, but it may not be bounded. As a result, the deficits D(t) may increase
without bound. In this case, we show that the system is still “rate stable”, in the sense that in the
long term, the average output rates of the final products converge to the optimum output rates (with
probability 1). The intuitive reason is that as D(t) becomes very large, the probability of generating
fictitious parts approaches 0.

Theorem 6.7 With the arrival process defined above, the system is “rate stable”.

The formal proof is given in Section 6.9.2.


Although the system is throughput-optimal, with D(t) unbounded the actual queue lengths
Q(t) = q(t) + D(t) may become large when D(t) is large. An alternative that avoids large Q(t) is to
set an upper bound on D_k(t), denoted by D̄. In this alternative, we do not increase D_k(t) once it hits
D̄, but q(t) still evolves according to steps (ii)-(iii) of the DMW algorithm. Let the actual queue length
be Q_k(t) = [q_k(t) + D_k(t)]⁺. Fictitious parts are generated in slot t as long as q_k(t) − μ_{out,k}(t) <
−D_k(t) (or, equivalently, Q_k(t) − μ_{out,k}(t) < 0). Given a D̄, one expects the output rates to be lower than the
optimum in general, since fictitious parts are generated with a certain probability after D_k(t) first
hits D̄. But one can make this probability arbitrarily close to 0 by choosing a large enough D̄. The
proof is similar to that of Theorem 6.7 and is not included here.

6.5 UTILITY MAXIMIZATION


Assume that for each IA m, there is a “reward function” vm (fm ) (where fm is the activation rate),
and a cost function cm fm , where cm is the cost of the input materials of IA m per unit rate. Define

the utility function as u_m(f_m) := v_m(f_m) − c_m f_m. Let f ∈ R^M be the vector of input activation
rates. Assume that v_m(·) is increasing and concave. Then u_m(·) is concave. The joint scheduling and
congestion control algorithm (or “utility maximization algorithm”) works as follows.

Utility Maximization Algorithm


Initially let q(0) = Q(0) = 0. In each time slot t = 0, 1, 2, . . . , besides DMW Scheduling (6.4), i.e.,

    x*(t) ∈ arg max_{x∈X} d^T(t) · x,

IA m chooses the input rate

    f_m(q(t)) := arg max_{0≤f≤1} {V · u_m(f) + q(t)^T A_m f}        (6.12)

where V > 0 is a constant, and A_m is the m’th column of A. Then, update the virtual queues as

    q(t + 1) = q(t) − A · f(q(t)) − B · x*(t)        (6.13)

Since f_m(q(t)) in general is not an integer, we let a_m(t) = F_m(t + 1) − F_m(t), where
F_m(t) := ⌊Σ_{τ=0}^{t−1} f_m(q(τ))⌋. And update the actual queues in the same way as (6.7).
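For the logarithmic utilities u_m(·) = log(·) used in the simulations of Section 6.7.2, the scalar maximization in (6.12) has a closed form; the sketch below is our own derivation from the first-order condition and is valid only for this choice of u_m.

    # Minimal sketch of the congestion-control step (6.12) with u_m(f) = log(f).
    import numpy as np

    def input_rate(q, A_m, V, eps=1e-9):
        """Maximize V*log(f) + (q^T A_m) f over f in (0, 1]."""
        slope = float(q @ A_m)       # = -sum_k q_k a_{k,m}, negative when q is positive
        if slope >= 0:               # objective is increasing on (0, 1]
            return 1.0
        # First-order condition: V/f + slope = 0  =>  f = -V/slope, clipped to (0, 1].
        return float(np.clip(-V / slope, eps, 1.0))

    # Sanity check against Section 6.7.2: with q1 + q2 = 100 and V = 50, f = 0.5.
    print(input_rate(np.array([60.0, 40.0]), np.array([-1.0, -1.0]), V=50.0))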

Theorem 6.8 With the above algorithm, q(t) and Q(t) are bounded. Also, there are at most a finite
number of null activities, which do not affect the long-term throughput.

The proof is in Section 6.9.3.


The following is a performance bound for the utility maximization algorithm. The proof is
similar to that in (56), and is given in Section 6.9.4.

Theorem 6.9 We have

    Σ_m u_m(f̃_m) ≥ U* − c/V        (6.14)

where f̃_m := lim inf_{T→∞} Σ_{t=0}^{T−1} f_m(q(t))/T, U* is the optimal total utility, and c > 0 is a constant defined
in (6.21). That is, a larger V leads to a better lower bound on the achieved utility (at the cost of larger queue
lengths).

6.6 EXTENSIONS
In the above, we have assumed that each activity lasts one slot for the ease of exposition. Our
algorithms can be extended to the case where different activities have different durations under a
particular assumption. The assumption is that each activity can be suspended in the middle and
resumed later. If so, we can still use the above algorithm which re-computes the maximum weight
schedule in each time slot. The only difference is that the activities performed in one time slot may
not be completed at the end of the slot, but they are suspended and to be continued in later slots.
(The above assumption was also made in the “preempted” networks in (12). There, whenever a new
schedule is computed, the ongoing activities are suspended, or “preempted”.)
In this case, the algorithms are adapted in the following way. The basic idea is the same as
before. That is, we run the system according to the virtual queues q(t). Let the elements in matrices
A and B be the average rates of consuming (or producing) parts per slot from (or to) different
queues. Even if an activity is not completed in one slot, we still update the virtual queues q(t)
according to the above average rates. That is, we view the parts in different queues as fluid, and q(t)
reflects the amount of fluid at each queue. However, only when an activity is completed, the actual
parts are removed from or added to the output queues. Note that when an activity is suspended, all
parts involved in the activity are frozen and are not available to other activities. When there are not
enough parts in the queues to perform a scheduled activity, fictitious parts are used instead (and the
corresponding deficits are increased).
On the other hand, if an activity cannot be suspended once it is started,
then one possible scheme is to use long time slots in our algorithms. In slot t, each SA n with
x*_n(t) = 1 is activated as many times as possible. When each slot is very long, the wasted time during
the slot becomes negligible, so the algorithm approximates the maximal throughput (at the cost
of longer delay). Without using long slots, the non-preemptive version of the maximal-pressure
algorithm proposed in (12) is not throughput-optimal in general, but it is throughput-optimal
under certain resource-sharing constraints (12).

6.7 SIMULATIONS
6.7.1 DMW SCHEDULING
We simulate a network similar to Fig. 6.1 but with a different input matrix A and service matrix B
below.
    A = [−3, −2, −1, 0]^T,

    B = [  1  0  0  0
           0  1  1  0
           0  0  1  0
          −1  0  2  1 ]

It is easy to check that if λ_1 = 1/3, we have A · λ_1 + B · s = 0 where s :=
[1, 1/3, 1/3, 1/3]^T ∈ Λ (and s is unique). So, any λ_1 ∈ (0, 1/3) is strictly feasible.
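This identity is easy to verify numerically; a quick sketch (numpy assumed):

    import numpy as np

    A = np.array([[-3.], [-2.], [-1.], [0.]])
    B = np.array([[ 1., 0., 0., 0.],
                  [ 0., 1., 1., 0.],
                  [ 0., 0., 1., 0.],
                  [-1., 0., 2., 1.]])
    s = np.array([1., 1/3, 1/3, 1/3])
    print(A @ np.array([1/3]) + B @ s)   # -> [0. 0. 0. 0.]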
In the simulation, IA1 is activated in slots 5k, k = 0, 1, 2, . . . , so the input rate λ_1 = 1/5
is strictly feasible. Since SA3 requires enough parts from several queues to perform, it is not
difficult to see that normal MWS fails to stabilize queue 3. Fig. 6.4 shows that DMW stabilizes all
queues and keeps the deficits bounded.
Now we make a change to the arrival process. In time slot 4k, k = 0, 1, 2 . . . , IA 1 is inde-
pendently activated with probability 0.8. As a result, the expected arrival rate is strictly feasible and
also satisfies the smoothness condition (Condition 1) with T = 4. Fig. 6.5 shows that our algorithm
stabilizes all queues. As expected, Dk (t) stops increasing after some time since q(t) is bounded.

Figure 6.4: Queue lengths with DMW Scheduling (deterministic arrivals)

6.7.2 UTILITY MAXIMIZATION


Consider the network in Fig. 6.6. The utility functions of both flows are um (·) = log(·). We simulate
the network with the utility maximization algorithm with V = 50. Fig. 6.7 and Fig. 6.8 show Qk (t)
and Dk (t). As expected, Dk (t) stops increasing after some time due to the boundedness of qk (t).
The average throughputs of flows 1 and 2 are both 0.4998, very close to the theoretical
optimal throughput computed by solving the utility maximization problem numerically. (To double-
check the correctness of the algorithm, note that for example q1 (t) + q2 (t) = [Q1 (t) − D1 (t)] +
[Q2 (t) − D2 (t)] is about 100 after initial convergence. So by (6.12), f1 (q(t)) ≈ V /[q1 (t) + q2 (t)] =
0.5.)

Figure 6.5: Queue lengths with DMW Scheduling (random arrivals)

Figure 6.6: A SPN

6.8 SUMMARY
In this chapter, we have discussed the problem of achieving the maximal throughput and utility
in SPNs. First, we explained through examples (in Section 6.2) that scheduling in SPNs is more
challenging than in wireless networks because performing a service activity not only requires
resources, as in wireless networks, but also requires the availability of all necessary input parts. As a
result, the well-known Maximum Weight Scheduling may not stabilize the queues.

Figure 6.7: Actual queue lengths Q(t).

To achieve throughput-optimality, we have proposed a “Deficit Maximum Weight” (DMW)


scheduling algorithm (Section 6.4). We showed that if the arrival process is smooth enough, then
all queues are bounded under DMW. If the arrival process is random, then we can either achieve
“rate stability” (a weaker form of stability) or ensure bounded queues with a loss of throughput that
can be made arbitrarily small. (Achieving positive recurrence of the queues with random arrivals is
an interesting topic for future research.)
To maximize the utility of SPNs, we have combined DMW scheduling with admission control
(Section 6.5). In the joint algorithm, all queues are bounded.

6.9 SKIPPED PROOFS


6.9.1 PROOF OF THEOREM 6.5
First, we need to identify some useful properties of the system. Our analysis differs from existing
analysis of MWM-like algorithms (e.g., in (66; 56)) since we allow qk (t) to be negative, and an
activity generally involves multiple input and output queues.

Lemma 6.10 In DMW, if d(t) = 0 at some time t, then q(t) = 0.

Proof. Let zm (t) and wn (t) be, respectively, the number of times that IA m and SA n have been
performed until time t. Write z(t) and w(t) as the corresponding vectors. Using q(0) = 0 and
equation (6.6), we have
q(t) = −A · z(t) − B · w(t). (6.15)

Figure 6.8: Deficits D(t).

By (6.1), for each m there exists s^m such that A_m = −B · s^m. Using this and (6.15),

    q(t) = Σ_m [B · s^m z_m(t)] − B · w(t) = B · v

where v := Σ_m [s^m z_m(t)] − w(t). By assumption,

    d(t) = B^T q(t) = B^T B · v = 0.

Thus, v^T B^T B · v = ‖B · v‖² = 0. So, B · v = q(t) = 0. □

Remark: We have used the fact that for any t, q(t) is always in the subspace ℬ := {u | u =
B · v for some v}.

Lemma 6.11 Assume that λ is strictly feasible, i.e., ∃y ∈ Λ^o such that

    A · λ + B · y = 0.        (6.16)

Then there exists δ > 0 such that for any q satisfying q ∈ ℬ,

    q^T B · [x*(q) − y] ≥ δ‖q‖        (6.17)

where

    x*(q) ∈ arg max_{x∈X} q^T B · x.        (6.18)
Proof. Since y ∈ Λ^o, ∃σ′ > 0 such that y′ ∈ Λ for any y′ satisfying ‖y′ − y‖ ≤ σ′.
For any q̂ satisfying ‖q̂‖ = 1, q̂ ∈ ℬ, by Lemma 6.10, we have d̂ := B^T q̂ ≠ 0. Also, ‖B^T q̂‖ ≥
σ̂ := min_{‖q′‖=1, q′∈ℬ} ‖B^T q′‖ > 0. Choose ε̂ > 0 (which may depend on q̂) so that ‖ε̂ · B^T q̂‖ = σ′.
Then, y + ε̂ · B^T q̂ ∈ Λ. Also, (6.18) implies that x*(q) ∈ arg max_{x∈Λ} q^T B · x. So, q̂^T B · x*(q̂) ≥
q̂^T B · [y + ε̂ · B^T q̂] = q̂^T B · y + ε̂ · ‖B^T q̂‖² ≥ q̂^T B · y + σ̂ · σ′. Let δ := σ̂ · σ′. Then

    min_{‖q̂‖=1, q̂∈ℬ} q̂^T B · [x*(q̂) − y] ≥ δ.        (6.19)

Consider any q ≠ 0. Let q̂ := q/‖q‖; then ‖q̂‖ = 1. Note that if x*(q̂) ∈ arg max_{x∈X} q̂^T B · x, then
x*(q̂) ∈ arg max_{x∈X} q^T B · x by linearity, so q^T B · x*(q̂) = q^T B · x*(q). Therefore, q^T B · [x*(q) −
y] = q^T B · [x*(q̂) − y] = ‖q‖ · q̂^T B · [x*(q̂) − y] ≥ δ‖q‖, proving (6.17). If q = 0, then (6.17)
holds trivially. □
Next, to analyze the queue dynamics, consider the Lyapunov function L(q(t)) = ½‖q(t)‖².
We have

    Δ(q(t)) := L(q(t + 1)) − L(q(t))
             = ½‖q(t) − A · a(t) − B · x*(q(t))‖² − ½‖q(t)‖²
             = −q(t)^T A · a(t) − q(t)^T B · x*(q(t)) + ½‖A · a(t) + B · x*(q(t))‖²
             ≤ −q(t)^T A · a(t) − q(t)^T B · x*(q(t)) + c        (6.20)

where c > 0 is a constant, defined as

    c := Σ_k (μ²_{k,in} + μ²_{k,out})        (6.21)

where μ_{k,in} and μ_{k,out} are, respectively, the maximum numbers of parts that can enter or leave queue k
in one time slot.

Lemma 6.12 Assume that q(0) = 0. If for any t,

    L(q(t + 1)) − L(q(t)) ≤ −δ‖q(t)‖ + c        (6.22)

where δ > 0 is a constant, then q(t) is always bounded. In particular, L(q(t)) ≤ c²/δ² + c.

Proof. We prove this using induction. First, L(q(0)) = 0 ≤ c²/δ² + c.
Next, as the induction hypothesis, assume that L(q(t)) ≤ c²/δ² + c. Consider two cases. (i)
If L(q(t)) ∈ [c²/δ², c²/δ² + c], then ‖q(t)‖ ≥ c/δ. By (6.22), we have L(q(t + 1)) ≤ L(q(t)) ≤
c²/δ² + c. (ii) If L(q(t)) < c²/δ², since L(q(t + 1)) − L(q(t)) ≤ −δ‖q(t)‖ + c ≤ c, we also have
L(q(t + 1)) ≤ c²/δ² + c. This completes the proof. □
Lemma 6.13 Assume that Condition 1 holds. Let y(l · T) be the (unique) vector that satisfies

    A · ã^l + B · y(l · T) = 0        (6.23)

where ã^l is defined in (6.10). Then there exists δ̄ > 0 such that

    q^T B · [x*(q) − y(l · T)] ≥ δ̄‖q‖, ∀l, ∀q ∈ ℬ        (6.24)

where x*(q) is defined in (6.18).

Proof. By Condition 1, ∃σ > 0 such that for all l, ã^l + σ · 1 and ã^l − σ · 1 are feasible. Therefore,
y(l · T) + s(σ · 1_M) ∈ Λ and y(l · T) − s(σ · 1_M) ∈ Λ. Define σ′ > 0 to be the minimum element
of s(σ · 1_M) ≻ 0; then y′ ∈ Λ for any y′ satisfying ‖y′ − y(l · T)‖ ≤ σ′. (This is because the set Λ is
“comprehensive”: if s ∈ Λ, then s′ ∈ Λ for any 0 ⪯ s′ ⪯ s.) Then, following the proof of Lemma 6.11
and letting δ̄ := σ̂ · σ′ (which does not depend on l or q) completes the proof. □

Lemma 6.14 Assume that the maximum change of any queue in one time slot is bounded by α, and that the
absolute value of every element of A and B is bounded by b̄. Then

    L(q((l + 1)T)) − L(q(l · T)) ≤ −T · δ̄‖q(l · T)‖ + c₂

where c₂ > 0 is a constant, defined as

    c₂ := T · c + KT²α · (M + N)b̄.

Proof. From (6.20), we have

    L(q((l + 1)T)) − L(q(l · T)) ≤ −Σ_{τ=l·T}^{(l+1)T−1} q(τ)^T A · a(τ) − Σ_{τ=l·T}^{(l+1)T−1} q(τ)^T B · x*(q(τ)) + T · c.

For any τ ∈ {l · T, . . . , (l + 1)T − 1},

    q(τ)^T B · x*(q(τ)) ≥ q(τ)^T B · x*(q(l · T))
                        = q(l · T)^T B · x*(q(l · T)) + [q(τ) − q(l · T)]^T B · x*(q(l · T)).

Since |q_k(τ) − q_k(l · T)| ≤ T · α, and each element of x*(q(l · T)) is bounded by 1, we have

    |[q(τ) − q(l · T)]^T B · x*(q(l · T))| ≤ KN b̄ T α.

Therefore,

    q(τ)^T B · x*(q(τ)) ≥ q(l · T)^T B · x*(q(l · T)) − KN b̄ T α.        (6.25)

Also, q(τ)^T A · a(τ) ≥ q(l · T)^T A · a(τ) − KM b̄ T α. Then

    L(q((l + 1)T)) − L(q(l · T)) ≤ T · {−q(l · T)^T A · ã^l + KM b̄Tα − q(l · T)^T B · x*(q(l · T)) + KN b̄Tα} + T · c
                                = −T · q(l · T)^T B · [x*(q(l · T)) − y(l · T)] + c₂
                                ≤ −T · δ̄‖q(l · T)‖ + c₂

where the last two steps have used (6.23) and condition (6.24). □
Now Theorem 6.5 can be proved as follows.

Proof. Lemma 6.14 and Lemma 6.12 imply that q(l · T) is bounded for all l. Because each queue
has bounded increments per slot, q(t) is bounded for all t. □

6.9.2 PROOF OF THEOREM 6.7


By (6.20), L(q(t + 1)) − L(q(t)) ≤ −q(t)^T A · a(t) − q(t)^T B · x*(q(t)) + c. So

    E[L(q(t + 1)) − L(q(t)) | q(t)] ≤ −q(t)^T A · E[a(t)] − q(t)^T B · x*(q(t)) + c
                                    = −q(t)^T A · λ − q(t)^T B · x*(q(t)) + c
                                    = q(t)^T B · y − q(t)^T B · x*(q(t)) + c
                                    ≤ −δ‖q(t)‖ + c.        (6.26)

Let E₀ := {q : ‖q‖ ≤ (c + 1)/δ}. Then if q(t) ∉ E₀, E[L(q(t + 1)) − L(q(t)) | q(t)] ≤
−1; if q(t) ∈ E₀, E[L(q(t + 1)) − L(q(t)) | q(t)] < ∞ due to the bounded change of queue lengths
in each slot. Therefore, by Foster’s criterion as used in (66), q(t) is stable.
Also, we claim that given a set E, with probability 1, the time average P(E) :=
lim_{T→∞} Σ_{t=0}^{T−1} I(q(t) ∈ E)/T exists. To see this, partition the state space of q(t) into sets
T, R₁, R₂, . . . , where R_j, j = 1, 2, . . . are closed sets of communicating states and T is the set of
states not in ∪_j R_j. If q(0) = 0 ∈ R_j for some j, then q(t) will not leave the set and all states in R_j
are positive recurrent. Therefore, there is a well-defined stationary distribution on R_j, so P(E) exists
w.p. 1. If q(0) = 0 ∈ T, by Foster’s criterion as used in (66) (Theorem 3.1), the negative drift implies
that w.p. 1, q(t) enters some R_j in finite time. After that, there is a well-defined time average of
I(q(t) ∈ E) w.p. 1. Therefore, the overall time average P(E) exists. In both cases,

    P(E) = π_j(E)        (6.27)

where π_j(·) is the stationary distribution on R_j, the closed set of communicating
states that q(t) eventually enters.
To show rate stability, consider two kinds of queues. WLOG, let U be the set of queues
whose deficits grow unbounded. According to Lemma 6.4, the queues outside this set only induce
a finite number of null activities.
Consider queue k ∈ U. For any C > 0, since D_k(t) → ∞, there exists a finite time t_k such
that D_k(t) ≥ C, ∀t ≥ t_k. For t ≥ t_k, queue k induces null activities at slot t − 1 only when q_k(t) <
−D_k(t) ≤ −C. So the total number of null activities induced by queue k is not more than N · [t_k +
Σ_{t=t_k}^{∞} I(q_k(t) < −C)] ≤ N · [t_k + Σ_{t=0}^{∞} I(q_k(t) < −C)], since queue k induces at most N null
activities in one time slot. Therefore, the average rate at which queue k induces null activities is

    r_k ≤ N · lim_{T→∞} (1/T)[t_k + Σ_{t=0}^{T−1} I(q_k(t) < −C)] = N · Pr(q_k < −C)        (6.28)

where the marginal probability on the RHS is induced by the stationary distribution π_j(·) on the
set R_j which q(t) eventually enters. So lim_{C→+∞} Pr(q_k < −C) = 0. Since (6.28) holds for any
C > 0, letting C → +∞ yields r_k = 0.
Therefore, the average rate of null activities is 0 in the long term w.p. 1. Also, if we imagine
that the null activities produce real parts, then the output rates of the final products would be the
maximum since the virtual queues q(t) are stable. Combining the two facts concludes the proof.

6.9.3 PROOF OF THEOREM 6.8

Lemma 6.15 q(t) is bounded.

Proof. Choose any f′ ∈ R^M and y′ ≻ 0 in Λ^o such that the flow conservation constraint is satisfied:
A · f′ + B · y′ = 0, and |Σ_m u_m(f′_m)| < ∞. The latter is feasible by letting f′_m = ε > 0, ∀m,
where ε is small enough.
By Lemma 6.11, we have for any q ∈ ℬ,

    q^T B · [x*(q) − y′] ≥ δ′‖q‖        (6.29)

for some constant δ′ > 0.
Also, since f′_m ∈ [0, 1], by (6.12),

    V · u_m(f′_m) + q(t)^T A_m · f′_m ≤ V · u_m(f_m(q(t))) + q(t)^T A_m f_m(q(t)), ∀m.

Therefore,

    Σ_{m=1}^{M} V · u_m(f′_m) + q(t)^T A · f′ ≤ Σ_{m=1}^{M} V · u_m(f_m(q(t))) + q(t)^T A · f(q(t)).

Since |Σ_m u_m(f′_m)| < ∞, we have

    Σ_{m=1}^{M} u_m(f_m(q(t))) − Σ_{m=1}^{M} u_m(f′_m) ≤ Σ_{m=1}^{M} v_m(1) − Σ_{m=1}^{M} u_m(f′_m) ≤ C₁

for some positive constant C₁. So

    −q(t)^T A · f(q(t)) ≤ −q(t)^T A · f′ + V · C₁.        (6.30)

Similar to (6.20), the Lyapunov drift in the algorithm is

    Δ(q(t)) ≤ −q(t)^T A · f(q(t)) − q(t)^T B · x*(q(t)) + c.        (6.31)

Plugging (6.29) and (6.30) into (6.31) yields

    Δ(q(t)) ≤ −q(t)^T A · f′ + V · C₁ − q(t)^T B · y′ − δ′‖q(t)‖ + c
            = −q(t)^T [A · f′ + B · y′] − δ′‖q(t)‖ + V · C₁ + c
            = −δ′‖q(t)‖ + V · C₁ + c.

Using Lemma 6.12, the above implies that for all t,

    L(q(t)) ≤ [(V · C₁ + c)/δ′]² + V · C₁ + c.

So q(t) is bounded. □
Define q̃(0) = 0, and for t = 0, 1, . . . , define

    q̃(t + 1) = q̃(t) − A · a(t) − B · x*(t).        (6.32)

Lemma 6.16 For all t, ‖q̃(t) − q(t)‖ ≤ Z for some constant Z > 0.

Proof. By (6.13) and q(0) = 0, we have

    q(t) = Σ_{τ=0}^{t−1} [−A · f(q(τ)) − B · x*(τ)] = −A Σ_{τ=0}^{t−1} f(q(τ)) − B · Σ_{τ=0}^{t−1} x*(τ).

By (6.32), q̃(0) = 0, and Σ_{τ=0}^{t−1} a(τ) = ⌊Σ_{τ=0}^{t−1} f(q(τ))⌋ (componentwise), we have

    q̃(t) = Σ_{τ=0}^{t−1} [−A · a(τ) − B · x*(τ)] = −A ⌊Σ_{τ=0}^{t−1} f(q(τ))⌋ − B · Σ_{τ=0}^{t−1} x*(τ).

So, ‖q̃(t) − q(t)‖ = ‖A · {Σ_{τ=0}^{t−1} f(q(τ)) − ⌊Σ_{τ=0}^{t−1} f(q(τ))⌋}‖. Since each element of
Σ_{τ=0}^{t−1} f(q(τ)) − ⌊Σ_{τ=0}^{t−1} f(q(τ))⌋ is between 0 and 1, and each element of A is bounded, we conclude
that ‖q̃(t) − q(t)‖ ≤ Z for some constant Z > 0. □

Now we are ready to complete the proof.
Since ‖q̃(t)‖ ≤ ‖q(t)‖ + ‖q̃(t) − q(t)‖, combining the previous two lemmas, we know that
‖q̃(t)‖ ≤ G, ∀t, for some G > 0. Define D(t) = Q(t) − q̃(t). Comparing the dynamics of Q(t)
and q̃(t), it is clear that we can apply Lemma 6.4 to q̃(t), Q(t) and D(t) to complete the proof.
6.9.4 PROOF OF THEOREM 6.9
Proof. Assume that f* ∈ R^M and y* ≻ 0 achieve the optimal utility U*. So A · f* + B · y* = 0
and U* = Σ_m u_m(f*_m).
We also have q^T B · [x*(q) − y*] ≥ 0; this is (6.29) with δ′ = 0. Then, following
the proof of Theorem 6.8 (but without using the upper bound C₁), we have

    Δ(q(t)) ≤ −q(t)^T [A · f* + B · y*] + V · [Σ_m u_m(f_m(q(t))) − Σ_m u_m(f*_m)] + c
            = V · [Σ_m u_m(f_m(q(t))) − U*] + c.

Summing over t from 0 to T − 1 yields

    L(q(T)) − L(q(0)) ≤ V · Σ_{t=0}^{T−1} Σ_m u_m(f_m(q(t))) − V T U* + T · c.

Dividing both sides by T · V, and using L(q(T)) − L(q(0)) = L(q(T)) ≥ 0, one gets

    Σ_{t=0}^{T−1} Σ_m u_m(f_m(q(t)))/T ≥ U* − c/V.        (6.33)

Since u_m(·) is concave, u_m(Σ_{t=0}^{T−1} f_m(q(t))/T) ≥ Σ_{t=0}^{T−1} u_m(f_m(q(t)))/T. Using this, (6.33) and
letting T → ∞, we have (6.14). □

APPENDIX A

Stochastic Approximation
Algorithm 1 and Algorithm 1(b) that we develop in this book belong to a family of stochastic
approximation algorithms. These algorithms are essentially gradient algorithms to minimize some
function, except that they use a noisy estimate of the gradient.
This chapter provides some background on stochastic approximation. In Section A.1, we
review the standard gradient algorithm. Section A.2 explains the stochastic approximation algorithm
and its convergence properties.

A.1 GRADIENT ALGORITHM


Consider the problem of minimizing a differentiable, convex function f(x) over a bounded convex
set D ⊂ R^K. That is, the problem is:

    min_{x∈D} f(x).        (A.1)

For simplicity, further assume that (A.1) has a unique solution x*, and that the gradient of
f(x) is bounded over D, i.e., there exists a C_g < ∞ such that

    ‖∇f(x)‖² ≤ C_g, ∀x ∈ D.        (A.2)


For example, assume that f(x) = x²/2 where x ∈ D = [−2, 2]. Clearly, the minimum is
achieved at x = x* := 0. However, when f(x) is more complex, analytically solving (A.1) is
generally not feasible. A common numerical algorithm to solve (A.1) is the gradient algorithm.
The gradient algorithm starts with an initial point x[0] ∈ D and generates a sequence of values
x[m], m = 1, 2, . . . , with the objective of making x[m] converge to, or converge to a neighborhood
of, x*.
To achieve the objective, the algorithm updates x[m] in the opposite direction of the gradient.
Specifically, the update equation is, for m = 0, 1, 2, . . . ,

    x[m + 1] = {x[m] − α_m ∇f(x[m])}_D        (A.3)

where {·}_D means the projection onto the set D. The projection of a vector x onto a closed set D is
the closest point to x in that set, in the metric being considered. In our example with f(x) = x²/2,
one has ∇f(x[m]) = f′(x[m]) = x[m], so the algorithm is

    x[m + 1] = {x[m] − α_m x[m]}_D        (A.4)


We have the following well-known results about the convergence of algorithm (A.3).

Theorem A.1 Convergence of Gradient Algorithm.



(i) Decreasing step sizes: If α_m > 0, lim_{m→∞} α_m = 0, and Σ_m α_m = ∞ (for example, α_m =
1/(m + 1)), then x[m] → x* as m → ∞.
(ii) Constant step size: If α_m = α, ∀m, and α is small enough, then x[m] converges to a neighborhood
of x*. More precisely, for any δ > 0 and x[0], there is some α₀ so that if α ≤ α₀, then ‖x[n] − x*‖ ≤
δ, ∀n ≥ n₀, for some n₀.

To illustrate the theorem, in our example with f(x) = x²/2, we use algorithm (A.4) with
α_m = 1/(m + 1) in the case of decreasing step sizes, and α_m = α = 0.1 in the case of constant step
size, both with the initial value x[0] = −2. The trajectories of {x[m]} are plotted in Fig. A.1 and
Fig. A.2.
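Both runs are straightforward to reproduce; the following minimal sketch (our own illustration; numpy assumed) implements (A.4) under the two step-size rules:

    # Projected gradient descent (A.3)-(A.4) on f(x) = x^2/2 over D = [-2, 2].
    import numpy as np

    def gradient_descent(alpha, M=50, x0=-2.0, lo=-2.0, hi=2.0):
        x, traj = x0, [x0]
        for m in range(M):
            x = float(np.clip(x - alpha(m) * x, lo, hi))   # f'(x) = x, then project onto D
            traj.append(x)
        return traj

    decreasing = gradient_descent(lambda m: 1.0 / (m + 1))   # as in Fig. A.1
    constant   = gradient_descent(lambda m: 0.1)             # as in Fig. A.2
    print(decreasing[-1], constant[-1])                      # both near x* = 0

For this particular f, the first step α₀ = 1 already sends x[1] to x* = 0; with α = 0.1 the iterates decay geometrically, x[m] = −2(0.9)^m.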

Figure A.1: Decreasing step sizes αm = 1/(m + 1).

Proof. We give the proof of this result because it illustrates arguments that are typically used to
derive such results.
Denote g(m) := ∇f (x[m]). In the following, we use x[m] and x(m) interchangeably.
Proof of part (i)
Consider the Lyapunov function d(m) := ½‖x(m) − x*‖², where ‖·‖ denotes the L2 norm.
By (A.3), we have

    d(m + 1) ≤ ½‖[x[m] − α_m g(m)] − x*‖²
             ≤ d(m) + α_m · [x* − x(m)]^T g(m) + α_m² C_g/2.        (A.5)

Figure A.2: Constant step sizes αm = α = 0.1.

where the first inequality holds because the projection onto a convex set is “non-expansive” (8), that is,
‖{y}_D − {z}_D‖ ≤ ‖y − z‖, and the second inequality follows from (A.2).
Step 1: Recurrence to a neighborhood of x*
Given a constant μ > 0, define the set H_μ := {x ∈ D | f(x) ≤ μ + f(x*)}. Clearly, x* ∈ H_μ,
so H_μ is a neighborhood of x*. For example, in Fig. A.3 with f(x) = x²/2, the set H_μ when μ = 0.5
is the set [a, b] = [−1, 1].
We claim that for any M₀, there exists m ≥ M₀ such that x(m) ∈ H_μ. That is, H_μ is recurrent
for {x(m)}.
This claim can be proved by contradiction. Suppose that x(m) ∉ H_μ, ∀m ≥ M₀. Then ∀m ≥
M₀, using the fact that f(x) is convex in x, we have

    [x* − x(m)]^T g(m) ≤ f(x*) − f(x(m)) ≤ −μ.

Combined with (A.5), one has

    d(m + 1) ≤ d(m) − α_m μ + α_m² C_g/2.        (A.6)

Since lim_{m→∞} α_m = 0, there exists M₁ such that α_m ≤ μ/C_g, ∀m ≥ M₁. Therefore, for all
m ≥ M₂ := max{M₀, M₁}, we have

    d(m + 1) − d(m) ≤ −α_m μ/2.

Since Σ_m α_m = ∞, we have d(n) − d(M₂) ≤ −(μ/2) Σ_{m=M₂}^{n−1} α_m → −∞ as n → ∞. Since
x(M₂) ∈ D, d(M₂) is finite. This means that d(n) < 0 for large enough n, which is impossible.

Figure A.3: A neighborhood of x*.

Step 2: Convergence
Fix μ > 0 and ε > 0. Since lim_{m→∞} α_m = 0, we can choose M₃ such that ∀m ≥ M₃,

    α_m² ≤ 2ε/C_g        (A.7)
    α_m ≤ μ/C_g.        (A.8)

By the result of Step 1, there exists M₄ ≥ M₃ such that x(M₄) ∈ H_μ. In the following, we show
that ∀m ≥ M₄, d(m) ≤ d_μ + ε, where d_μ := max_{x∈H_μ} ‖x − x*‖²/2. The proof is by induction. First,
it is clear that d(M₄) ≤ d_μ < d_μ + ε. Now suppose that d(m) ≤ d_μ + ε where m ≥ M₄. We need
to show that d(m + 1) ≤ d_μ + ε as well. This is done by considering two cases. (i) If x(m) ∈ H_μ,
then by (A.5) and (A.7), d(m + 1) ≤ d(m) + α_m² C_g/2 ≤ d_μ + ε; (ii) if x(m) ∉ H_μ,
then by (A.5) and (A.8), d(m + 1) ≤ d(m) ≤ d_μ + ε. Therefore, d(m) ≤ d_μ + ε, ∀m ≥ M₄. This
argument is illustrated in Figure A.4.
Since x* is unique, d_μ → 0 as μ → 0. Therefore, the above result holds for arbitrarily small
d_μ + ε by choosing small enough ε and μ. This implies that lim_{m→∞} d(m) = 0, completing the
proof.
Proof of part (ii)
Given μ > 0 and ε > 0, choose the step size α to satisfy (A.7) and (A.8), i.e., α² ≤ 2ε/C_g
and α ≤ μ/C_g. Using Step 1 of the proof of part (i), it is easy to see that there exists M₅ such that
x(M₅) ∈ H_μ. Then using Step 2 of that proof, we know that d(m) ≤ d_μ + ε, ∀m ≥ M₅. This implies
that x(m) converges to a neighborhood of x*. □
that x(m) converges to a neighborhood of x ∗ . 2

Figure A.4: The key argument in the proof of part (i).

A.2 STOCHASTIC APPROXIMATION


Let g(x) := ∇f(x) be the gradient of f(x) at the point x. (In the last example, g(x) = f′(x) = x.)
In many scenarios, only a noisy estimate of g(x) is available, denoted by g̃(x). For convenience, we
also write g(m) := g(x[m]) and g̃(m) := g̃(x[m]). The gradient algorithm with noisy gradients (or
a stochastic approximation algorithm) is

    x[m + 1] = {x[m] − α_m g̃(m)}_D.        (A.9)

Define the “error bias” in the m-th step as

    B(m) := E_m[g̃(m)] − g(m)

where E_m(·) is the conditional expectation given F_m, the σ-field generated by x[0], x[1], . . . , x[m].
Also, define the zero-mean noise

    η(m) := g̃(m) − E_m[g̃(m)].

Then we have

    g̃(m) = g(m) + B(m) + η(m).        (A.10)

With algorithm (A.9), we have the following known result.

Theorem A.2 Key Stochastic Approximation Theorem.

(i) Decreasing step sizes: Assume that ‖B(m)‖ ≤ C_B < ∞, ∀m, that Σ_m α_m‖B(m)‖ < ∞ w.p.
1, and that E_m‖η(m)‖² ≤ c₃ < ∞, ∀m. Also assume that α_m > 0, Σ_m α_m = ∞, and Σ_m α_m² < ∞ (for
example, α_m = 1/(m + 1)). Then w.p. 1, x[m] → x* as m → ∞.
(ii) Constant step size: Assume that ‖B(m)‖ ≤ C_B < ∞, ∀m, that Σ_m ‖B(m)‖ < ∞ w.p. 1, and
that E_m‖η(m)‖² ≤ c₃ < ∞, ∀m. Then if α_m = α, ∀m, where α is small enough, then w.p. 1, x[m] returns
to a neighborhood of x* infinitely often.

To illustrate the theorem, we apply algorithm (A.9) to our example f(x) = x²/2, using
α_m = 1/(m + 1) in the case of decreasing step sizes, and α_m = α = 0.1 in the case of constant
step size, both with the initial value x[0] = −2. In both cases, the error bias B(m) = 0, ∀m, and the
zero-mean noises η(m) are independent and uniformly distributed on [−1, 1]. The trajectories of
{x[m]} are plotted in Fig. A.5 and Fig. A.6.
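A minimal sketch of the noisy counterpart (again our own illustration) replaces the exact gradient in the update with g̃(m) = x[m] + η(m):

    # Stochastic approximation (A.9) on f(x) = x^2/2 with uniform noise on [-1, 1].
    import numpy as np

    rng = np.random.default_rng(0)

    def stoch_approx(alpha, M=100, x0=-2.0, lo=-2.0, hi=2.0):
        x = x0
        for m in range(M):
            g_noisy = x + rng.uniform(-1.0, 1.0)             # g~(m) = g(m) + eta(m)
            x = float(np.clip(x - alpha(m) * g_noisy, lo, hi))
        return x

    print(stoch_approx(lambda m: 1.0 / (m + 1)))   # converges to x* = 0 w.p. 1
    print(stoch_approx(lambda m: 0.1))             # keeps hovering near x* = 0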


Figure A.5: Decreasing step sizes αm = 1/(m + 1).

Proof. In the following, we use x[m] and x(m) interchangeably.


Proof of part (i)
Consider the same Lyapunov function d(m) = ½‖x(m) − x*‖². By (A.9) and (A.10), we
have

    d(m + 1) ≤ ½‖[x[m] − α_m g̃(m)] − x*‖²
             = d(m) + α_m · [x* − x(m)]^T g(m) + α_m · [x* − x(m)]^T [B(m) + η(m)] + α_m² ‖g̃(m)‖²/2        (A.11)

where the first inequality holds because the projection onto a convex set is “non-expansive” (8), that is,
‖{y}_D − {z}_D‖ ≤ ‖y − z‖.

Figure A.6: Constant step sizes αm = α = 0.1.

Since the gradient g(m) is bounded (by (A.2)), ‖B(m)‖ ≤ C_B < ∞, and E_m‖η(m)‖² ≤
c₃ < ∞, it is easy to show that E_m‖g̃(m)‖²/2 ≤ C < ∞ for some constant C. Therefore,

    E_m[d(m + 1)] ≤ d(m) + α_m · [x* − x(m)]^T g(m) + α_m · [x* − x(m)]^T B(m) + α_m² · C.        (A.12)

Step 1: Recurrence to a neighborhood of x*
Given a constant μ > 0, we have shown before that if x(m) ∉ H_μ,

    [x* − x(m)]^T g(m) ≤ f(x*) − f(x(m)) ≤ −μ.

Since x(m), x* ∈ D, which is a bounded region, one has

    ‖x* − x(m)‖ ≤ c₂, ∀m        (A.13)

for a constant c₂. Therefore, if x(m) ∉ H_μ, then by (A.12) and (A.13), one has

    E_m[d(m + 1)] ≤ d(m) − α_m μ + α_m · c₂‖B(m)‖ + α_m² · C.        (A.14)

Now we need the following lemma from (3):


Lemma 1: (A Supermartingale Lemma) Let {X_n} be an R^K-valued stochastic process, and
V(·) a real-valued non-negative function on R^K. Suppose that {Y_n} is a sequence of random
variables satisfying Σ_n |Y_n| < ∞ with probability one. Let {F_n} be a sequence of σ-algebras
generated by {X_i, Y_i, i ≤ n}. Suppose that there exists a compact set A ⊂ R^K such that for all n,

    E_n[V(X_{n+1})] − V(X_n) ≤ −α_n μ + Y_n, for X_n ∉ A

where α_n > 0 satisfies Σ_n α_n = ∞ and μ is a positive constant. Then the set A is recurrent for
{X_n}, i.e., X_n ∈ A for infinitely many n with probability one.
Since Σ_m α_m‖B(m)‖ < ∞ w.p. 1, Σ_m α_m² < ∞, and Σ_m α_m = ∞, by Lemma 1, we know
that w.p. 1, x(m) returns to the set H_μ infinitely often. In other words, H_μ is recurrent for {x(m)}.
Step 2: Convergence
Next, by (A.11) we have, for n > m,

    d(n) ≤ d(m) + Σ_{i=m}^{n−1} {α_i · [x* − x(i)]^T g(i)}
         + Σ_{i=m}^{n−1} {α_i · [x* − x(i)]^T [B(i) + η(i)]}
         + Σ_{i=m}^{n−1} α_i² ‖g̃(i)‖²/2.        (A.15)

Since E_i‖g̃(i)‖²/2 ≤ C < ∞, ∀i, one has E(Σ_{i=0}^{∞} α_i² ‖g̃(i)‖²/2) ≤ C Σ_{i=0}^{∞} α_i² < +∞.
Therefore, Σ_{i=0}^{∞} α_i² ‖g̃(i)‖²/2 < +∞ w.p. 1, which implies that w.p. 1,

    lim_{m→∞} Σ_{i=m}^{∞} α_i² ‖g̃(i)‖²/2 = 0.        (A.16)

Also, Σ_{i=0}^{∞} |α_i · [x* − x(i)]^T B(i)| ≤ Σ_{i=0}^{∞} α_i · c₂‖B(i)‖ < ∞. So

    lim_{m→∞} Σ_{i=m}^{∞} |α_i · [x* − x(i)]^T B(i)| = 0.        (A.17)

Finally, W(n) := Σ_{i=0}^{n−1} {α_i · [x* − x(i)]^T η(i)} is a martingale (16). To see this, note that (a) W(n) ∈
F_n; (b) E|W(n)| < ∞, ∀n; and (c) E(W(n + 1) | F_n) − W(n) = α_n · [x* − x(n)]^T E[η(n) | F_n] =
0. Also, E_m‖η(m)‖² ≤ c₃ < ∞, ∀m, implies that E‖η(m)‖² ≤ c₃, ∀m. So

    sup_n E(W(n)²) = sup_n Σ_{i=0}^{n−1} E{[α_i · (x* − x(i))^T η(i)]²}
                   ≤ Σ_{i=0}^{∞} E{[α_i · (x* − x(i))^T η(i)]²}        (A.18)
                   ≤ Σ_{i=0}^{∞} {α_i² c₂² E‖η(i)‖²} < ∞.

By the L2 Martingale Convergence Theorem (16), W(n) converges with probability 1. So, w.p. 1,

    sup_{n≥m≥N₀} |Σ_{i=m}^{n−1} {α_i · [x* − x(i)]^T η(i)}| = sup_{n≥m≥N₀} |W(n) − W(m)| → 0        (A.19)

as N₀ → ∞.
Combining (A.16), (A.17) and (A.19), we know that with probability 1, for any ε > 0, after
x(m) returns to H_μ for some large enough m (due to the recurrence of H_μ),

    Σ_{i=m}^{n−1} {α_i · [x* − x(i)]^T [B(i) + η(i)]} + Σ_{i=m}^{n−1} α_i² ‖g̃(i)‖²/2 ≤ ε

for any n > m. In (A.15), since [x* − x(i)]^T g(i) ≤ 0, we have d(n) ≤ d(m) + ε, ∀n > m. In other
words, x(n) cannot move far away from H_μ after step m. Since the above argument holds for H_μ with
arbitrarily small μ and any ε > 0, x(m) converges to x* with probability 1.
Proof of part (ii)
In (A.14), choose α_m = α ≤ μ/(2C); then −α_m μ + α_m² · C = α(−μ + αC) ≤ −αμ/2. It
follows that

    E_m[d(m + 1)] ≤ d(m) − αμ/2 + α · c₂‖B(m)‖.

Since Σ_m ‖B(m)‖ < ∞ w.p. 1, by Lemma 1, we conclude that x(m) returns to H_μ infinitely
often w.p. 1. □

A.3 SUMMARY
This chapter has explained gradient algorithms to minimize an objective function f (x), with accurate
or noisy gradients. For simplicity, we have assumed that the objective function is convex and the
minimization is over a bounded convex region.
We first discussed the case when accurate gradients are available (Section A.1). In this case,
with decreasing step sizes that converge to 0 but sum up to infinity, the gradient algorithm makes x
converge to the minimizer x* of f(x). With a constant step size that is small enough, x converges
to a neighborhood of x*.
When only inaccurate gradients are available, we have a stochastic approximation algorithm
(Section A.2). We explained that, under certain conditions on the error in the gradient, the algorithm
makes x converge to x* almost surely with properly chosen decreasing step sizes, and it makes x
return to a neighborhood of x* infinitely often with a small enough constant step size.
This chapter has provided important background for the development of our throughput-
optimal scheduling algorithms in Chapter 3, which are in the family of stochastic approximation
algorithms. In those algorithms, we need to deal with extra challenges such as quantifying the error
in the gradient and optimizing over unbounded sets.

A.4 REFERENCES
Stochastic approximation was first introduced in (63) as the Robbins-Monro algorithm. Over the
years, the theory has been developed extensively concerning the convergence conditions, rates of
convergence, noise models, etc., with applications in many areas such as control, communications
and signal processing. See, for example, (42; 7) for a comprehensive development.
Bibliography

[1] R. Ahlswede, N. Cai, S. Li, and R. W. Yeung, "Network Information Flow," IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204-1216, Jul. 2000. DOI: 10.1109/18.850663 65

[2] S. Asmussen, Applied Probability and Queues, Springer-Verlag, 2003. 4

[3] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Massachusetts: Athena Scientific, 1996. 129

[4] G. Bianchi, "Performance Analysis of the IEEE 802.11 Distributed Coordination Function," IEEE Journal on Selected Areas in Communications, vol. 18, no. 3, pp. 535-547, 2000. DOI: 10.1109/49.840210 75, 97

[5] R. R. Boorstyn, A. Kershenbaum, B. Maglaris, and V. Sahin, "Throughput Analysis in Multihop CSMA Packet Radio Networks," IEEE Transactions on Communications, vol. 35, no. 3, pp. 267-274, Mar. 1987. DOI: 10.1109/TCOM.1987.1096769 24, 25, 97

[6] C. Bordenave, D. McDonald, and A. Proutiere, "Performance of random medium access control, an asymptotic approach," in Proceedings of the ACM Sigmetrics 2008, pp. 1-12. DOI: 10.1145/1375457.1375459 80

[7] V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008. 83, 94, 95, 132

[8] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004. 18, 26, 55, 93, 125, 128

[9] L. Bui, R. Srikant, and A. Stolyar, "Novel Architectures and Algorithms for Delay Reduction in Back-Pressure Scheduling and Routing," in IEEE INFOCOM 2009 Mini-Conference.

[10] P. Chaporkar, K. Kar, and S. Sarkar, "Throughput guarantees in maximal scheduling in wireless networks," in the 43rd Annual Allerton Conference on Communication, Control and Computing, Sep. 2005. 57

[11] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as Optimization Decomposition: A Mathematical Theory of Network Architectures," in Proceedings of the IEEE, vol. 95, no. 1, pp. 255-312, 2007. DOI: 10.1109/JPROC.2006.887322
[12] J. G. Dai and W. Lin, "Maximum Pressure Policies in Stochastic Processing Networks," Operations Research, vol. 53, no. 2, pp. 197-218, Mar.-Apr. 2005. DOI: 10.1287/opre.1040.0170 99, 100, 111

[13] J. G. Dai and W. Lin, "Asymptotic Optimality of Maximum Pressure Policies in Stochastic Processing Networks," Annals of Applied Probability, vol. 18, no. 6, pp. 2239-2299, 2008. DOI: 10.1214/08-AAP522 99

[14] P. Diaconis and D. Stroock, "Geometric bounds for eigenvalues of Markov chains," Annals of Applied Probability, vol. 1, no. 1, pp. 36-61, Feb. 1991. DOI: 10.1214/aoap/1177005980

[15] A. Dimakis and J. Walrand, "Sufficient Conditions for Stability of Longest-Queue-First Scheduling: Second-Order Properties Using Fluid Limits," Advances in Applied Probability, vol. 38, no. 2, pp. 505-521, 2006. DOI: 10.1239/aap/1151337082 57

[16] R. Durrett, Probability: Theory and Examples, 3rd ed., Duxbury Press, 2004. 41, 130

[17] M. Durvy, O. Dousse, and P. Thiran, "Border Effects, Fairness, and Phase Transition in Large Wireless Networks," in IEEE INFOCOM 2008, Phoenix, Arizona, Apr. 2008. 24

[18] M. Durvy and P. Thiran, "Packing Approach to Compare Slotted and Non-Slotted Medium Access Control," in IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006. DOI: 10.1109/INFOCOM.2006.251 58

[19] A. Eryilmaz, A. Ozdaglar, and E. Modiano, "Polynomial Complexity Algorithms for Full Utilization of Multi-hop Wireless Networks," in IEEE INFOCOM 2007, Anchorage, Alaska, May 2007. DOI: 10.1109/INFCOM.2007.65 57

[20] A. Eryilmaz and R. Srikant, "Fair Resource Allocation in Wireless Networks Using Queue-Length-Based Scheduling and Congestion Control," in IEEE INFOCOM, Mar. 2005. DOI: 10.1109/INFCOM.2005.1498459 73

[21] P. Gupta and A. L. Stolyar, "Optimal Throughput Allocation in General Random Access Networks," in Conference on Information Sciences and Systems, Princeton, NJ, Mar. 2006. DOI: 10.1109/CISS.2006.286657 73

[22] B. Hajek, "Cooling Schedules for Optimal Annealing," Mathematics of Operations Research, vol. 13, no. 2, pp. 311-329, 1988. DOI: 10.1287/moor.13.2.311 29

[23] J. M. Harrison, "Brownian Models of Open Processing Networks: Canonical Representation of Workload," Annals of Applied Probability, vol. 10, no. 1, pp. 75-103, 2000. DOI: 10.1214/aoap/1019737665 99
[24] J. M. Harrison and R. J. Williams, "Workload Reduction of a Generalized Brownian Network," Annals of Applied Probability, vol. 15, no. 4, pp. 2255-2295, 2005. DOI: 10.1214/105051605000000458 99

[25] J. M. Harrison and R. J. Williams, "Workload Interpretation for Brownian Models of Stochastic Processing Networks," Mathematics of Operations Research, vol. 32, pp. 808-820, 2007. DOI: 10.1287/moor.1070.0271

[26] T. Ho and H. Viswanathan, "Dynamic Algorithms for Multicast with Intra-Session Network Coding," submitted to IEEE Transactions on Information Theory. DOI: 10.1109/TIT.2008.2009809 65, 66, 67

[27] S. Hu, G. Chen, and X. Wang, "On extending the Brunk-Prokhorov strong law of large numbers for martingale differences," Statistics and Probability Letters, Elsevier, 2008. DOI: 10.1016/j.spl.2008.06.017 46

[28] L. Jiang and S. C. Liew, "Improving Throughput and Fairness by Reducing Exposed and Hidden Nodes in 802.11 Networks," IEEE Transactions on Mobile Computing, vol. 7, no. 1, pp. 34-49, Jan. 2008. DOI: 10.1109/TMC.2007.1070 76

[29] L. Jiang, D. Shah, J. Shin, and J. Walrand, "Distributed Random Access Algorithm: Scheduling and Congestion Control," accepted to IEEE Transactions on Information Theory. 23

[30] L. Jiang and J. Walrand, "A Distributed CSMA Algorithm for Throughput and Utility Maximization in Wireless Networks," in the 46th Annual Allerton Conference on Communication, Control, and Computing, Sep. 23-26, 2008. DOI: 10.1109/ALLERTON.2008.4797741 29, 94, 97

[31] L. Jiang and J. Walrand, "A Distributed Algorithm for Maximal Throughput and Optimal Fairness in Wireless Networks with a General Interference Model," EECS Technical Report, UC Berkeley, Apr. 2008. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-38.html 32

[32] L. Jiang and J. Walrand, "A Novel Approach to Model and Control the Throughput of CSMA/CA Wireless Networks," Technical Report, UC Berkeley, Jan. 2009. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-8.html 75, 97

[33] L. Jiang and J. Walrand, "Convergence and Stability of a Distributed CSMA Algorithm for Maximal Network Throughput," Technical Report, UC Berkeley, Mar. 2009. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-43.html DOI: 10.1109/CDC.2009.5400349 29, 42, 70

[34] L. Jiang and J. Walrand, "Approaching Throughput-Optimality in a Distributed CSMA Algorithm with Contention Resolution," Technical Report, UC Berkeley, Mar. 2009. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-37.html DOI: 10.1145/1540358.1540361 84, 96

[35] L. Jiang and J. Walrand, "Approaching Throughput-Optimality in a Distributed CSMA Algorithm: Collisions and Stability" (invited), in ACM Mobihoc'09 S3 Workshop, May 2009. DOI: 10.1145/1540358.1540361

[36] L. Jiang and J. Walrand, "Stable and Utility-Maximizing Scheduling for Stochastic Processing Networks," in the 47th Annual Allerton Conference on Communication, Control, and Computing, 2009. DOI: 10.1109/ALLERTON.2009.5394870 99

[37] C. Joo, X. Lin, and N. Shroff, "Understanding the Capacity Region of the Greedy Maximal Scheduling Algorithm in Multi-Hop Wireless Networks," in IEEE INFOCOM 2008, Phoenix, Arizona, Apr. 2008. DOI: 10.1109/INFOCOM.2008.165 57

[38] F. P. Kelly, Reversibility and Stochastic Networks, Wiley, 1979. 9, 10, 24

[39] F. P. Kelly, "Loss networks," Annals of Applied Probability, vol. 1, no. 3, 1991. DOI: 10.1214/aoap/1177005872 80

[40] F. P. Kelly, "Charging and Rate Control for Elastic Traffic," European Transactions on Telecommunications, vol. 8, pp. 33-37, 1997. DOI: 10.1002/ett.4460080106

[41] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, "Rate Control for Communication Networks: Shadow Prices, Proportional Fairness and Stability," Journal of the Operational Research Society, vol. 49, no. 3, pp. 237-252, 1998. DOI: 10.2307/3010473 73

[42] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Springer-Verlag, New York, 2003. 132

[43] M. Leconte, J. Ni, and R. Srikant, "Improved Bounds on the Throughput Efficiency of Greedy Maximal Scheduling in Wireless Networks," in ACM MOBIHOC, May 2009. DOI: 10.1145/1530748.1530771 57

[44] J. W. Lee, M. Chiang, and R. A. Calderbank, "Utility-Optimal Random-Access Control," IEEE Transactions on Wireless Communications, vol. 6, no. 7, pp. 2741-2751, Jul. 2007. DOI: 10.1109/TWC.2007.05991 73

[45] S. C. Liew, C. Kai, J. Leung, and B. Wong, "Back-of-the-Envelope Computation of Throughput Distributions in CSMA Wireless Networks," in IEEE ICC, 2009. DOI: 10.1109/TMC.2010.89 24, 25

[46] X. Lin and N. Shroff, "The Impact of Imperfect Scheduling on Cross-Layer Rate Control in Multihop Wireless Networks," in IEEE INFOCOM 2005, Miami, Florida, Mar. 2005. 73
[47] X. Lin, N. B. Shroff, and R. Srikant, "A Tutorial on Cross-Layer Optimization in Wireless Networks," IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1452-1463, Aug. 2006. DOI: 10.1109/JSAC.2006.879351 61, 64

[48] J. Liu, Y. Yi, A. Proutiere, M. Chiang, and H. V. Poor, "Convergence and Tradeoff of Utility-Optimal CSMA," http://arxiv.org/abs/0902.1996 29, 94, 97

[49] S. H. Low and D. E. Lapsley, "Optimization Flow Control, I: Basic Algorithm and Convergence," IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861-874, Dec. 1999. DOI: 10.1109/90.811451 73

[50] S. H. Low and P. P. Varaiya, "A New Approach to Service Provisioning in ATM Networks," IEEE Transactions on Networking, vol. 1, no. 5, pp. 549-553, Oct. 1993. DOI: 10.1109/90.251913

[51] P. Marbach, A. Eryilmaz, and A. Ozdaglar, "Achievable Rate Region of CSMA Schedulers in Wireless Networks with Primary Interference Constraints," in IEEE Conference on Decision and Control, 2007. 58

[52] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, "Achieving 100% Throughput in an Input-Queued Switch," IEEE Transactions on Communications, vol. 47, no. 8, pp. 1260-1267, Aug. 1999. DOI: 10.1109/26.780463 56

[53] S. Meyn, "Stability and asymptotic optimality of generalized MaxWeight policies," SIAM Journal on Control and Optimization, vol. 47, no. 6, 2009. DOI: 10.1137/06067746X

[54] J. Mo and J. Walrand, "Fair End-to-End Window-Based Congestion Control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556-567, 2000. DOI: 10.1109/90.879343 73

[55] E. Modiano, D. Shah, and G. Zussman, "Maximizing Throughput in Wireless Networks via Gossiping," ACM SIGMETRICS Performance Evaluation Review, vol. 34, no. 1, Jun. 2006. DOI: 10.1145/1140103.1140283 57

[56] M. J. Neely, E. Modiano, and C.-P. Li, "Fairness and Optimal Stochastic Control for Heterogeneous Networks," in IEEE INFOCOM, Mar. 2005. DOI: 10.1109/TNET.2007.900405 73, 110, 114

[57] M. J. Neely, E. Modiano, and C.-P. Li, "Fairness and Optimal Stochastic Control for Heterogeneous Networks," IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396-409, Apr. 2008. DOI: 10.1109/TNET.2007.900405 61, 70, 71

[58] M. J. Neely and R. Urgaonkar, "Cross Layer Adaptive Control for Wireless Mesh Networks," Ad Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, Aug. 2007. DOI: 10.1016/j.adhoc.2007.01.004
[59] J. Ni and R. Srikant, "Distributed CSMA/CA Algorithms for Achieving Maximum Throughput in Wireless Networks," in Information Theory and Applications Workshop, Feb. 2009. DOI: 10.1109/ITA.2009.5044953 80, 97

[60] J. Ni, B. Tan, and R. Srikant, "Q-CSMA: Queue-Length Based CSMA/CA Algorithms for Achieving Maximum Throughput and Low Delay in Wireless Networks," http://arxiv.org/pdf/0901.2333

[61] A. Proutiere, Y. Yi, and M. Chiang, "Throughput of Random Access without Message Passing," in Conference on Information Sciences and Systems, Princeton, NJ, USA, Mar. 2008. DOI: 10.1109/CISS.2008.4558579 58

[62] S. Rajagopalan and D. Shah, "Distributed Algorithm and Reversible Network," in Conference on Information Sciences and Systems, Princeton, NJ, USA, Mar. 2008. DOI: 10.1109/CISS.2008.4558577

[63] H. Robbins and S. Monro, "A Stochastic Approximation Method," The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400-407, Sep. 1951. 131

[64] S. Sanghavi, L. Bui, and R. Srikant, "Distributed Link Scheduling with Constant Overhead," in ACM SIGMETRICS, Jun. 2007. DOI: 10.1145/1269899.1254920 57

[65] C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, Jul. and Oct. 1948. DOI: 10.1063/1.3067010 10

[66] L. Tassiulas and A. Ephremides, "Stability Properties of Constrained Queueing Systems and Scheduling Policies for Maximum Throughput in Multihop Radio Networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1948, Dec. 1992. DOI: 10.1109/9.182479 7, 56, 114, 118

[67] L. Tassiulas, "Linear complexity algorithms for maximum throughput in radio networks and input queued switches," in IEEE INFOCOM, vol. 2, pp. 533-539, 1998. DOI: 10.1109/INFCOM.1998.665071 56

[68] M. J. Wainwright and M. I. Jordan, "Graphical Models, Exponential Families, and Variational Inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1-305, 2008. DOI: 10.1561/2200000001 25, 26, 32

[69] J. Walrand, "Entropy in Communication and Chemical Systems," in the First International Symposium on Applied Sciences in Biomedical and Communication Technologies (Isabel'08), Oct. 2008. DOI: 10.1109/ISABEL.2008.4712620 10

[70] J. Walrand, An Introduction to Queueing Networks, Prentice Hall, 1988. 99
[71] X. Wang and K. Kar, "Throughput Modelling and Fairness Issues in CSMA/CA Based Ad-Hoc Networks," in IEEE INFOCOM 2005, Miami, Florida, Mar. 2005. 24

[72] A. Warrier, S. Ha, P. Wason, and I. Rhee, "DiffQ: Differential Backlog Congestion Control for Wireless Multi-hop Networks," Technical Report, Dept. of Computer Science, North Carolina State University, 2008. 63

[73] L. M. Wein, "Optimal Control of a Two-Station Brownian Network," Mathematics of Operations Research, vol. 15, no. 2, pp. 215-242, May 1990. DOI: 10.1287/moor.15.2.215 99

[74] P. Whittle, Systems in Stochastic Equilibrium, John Wiley & Sons, Inc., New York, NY, USA, 1986. 10, 32

[75] R. J. Williams, "On Stochastic Processing Networks," Lecture Notes, 2006. http://math.ucsd.edu/~williams/talks/belz/belznotes06.pdf 99

[76] X. Wu and R. Srikant, "Scheduling Efficiency of Distributed Greedy Scheduling Algorithms in Wireless Networks," in IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006. DOI: 10.1109/INFOCOM.2006.176 57

[77] Y. Xi and E. M. Yeh, "Throughput Optimal Distributed Control of Stochastic Wireless Networks," in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2006. 64

[78] J. Zhang, D. Zheng, and M. Chiang, "The Impact of Stochastic Noisy Feedback on Distributed Network Utility Maximization," IEEE Transactions on Information Theory, vol. 54, no. 2, pp. 645-665, Feb. 2008. DOI: 10.1109/TIT.2007.913572

[79] G. Zussman, A. Brzezinski, and E. Modiano, "Multihop Local Pooling for Distributed Throughput Maximization in Wireless Networks," in IEEE INFOCOM 2008, Phoenix, Arizona, Apr. 2008. 57

Authors’ Biographies

LIBIN JIANG
Libin Jiang received the bachelor of engineering degree in electronic engineering and information
science from the University of Science and Technology of China, Hefei, China, in 2003, the master of
philosophy degree in information engineering from the Chinese University of Hong Kong, Shatin,
Hong Kong, in 2005, and the Ph.D. degree in electrical engineering and computer sciences from
the University of California, Berkeley, in 2009. His research interests include wireless networks,
communications, and game theory.
He received the David Sakrison Memorial Prize for outstanding doctoral research at UC
Berkeley and the best presentation award at the ACM Mobihoc'09 S3 Workshop.

JEAN WALRAND
Jean Walrand (S’71-M’80-SM’90-F’93) received the Ph.D. degree in electrical engineering and
computer science from the University of California, Berkeley.
He has been a professor at UC Berkeley since 1982. He is the author of An Introduction to
Queueing Networks (Englewood Cliffs, NJ: Prentice Hall, 1988) and Communication Networks: A First
Course (2nd ed., New York: McGraw-Hill, 1998) and coauthor of High Performance Communication
Networks (2nd ed., San Mateo, CA: Morgan Kaufmann, 2000).
Prof. Walrand is a fellow of the Belgian American Education Foundation and a recipient of
the Lanchester Prize and the Stephen O. Rice Prize.
Index

A-CSMA, 1, 10
Admission Control, 14
Algorithm 1
    Throughput Optimality, 29
Algorithm 1(b): Reducing Delays, 34
Algorithm 1: Stabilization, 28
Algorithm 2: Stabilization, 31
Algorithm 3: Utility Maximization, 62
Algorithm 4(b): Reduce Delays, 84
Algorithm 4: Collisions, 82
Back-Pressure, 2
Backpressure, 17
Conflict Graph, 1
Conflict Graph (CG), 22
Coupling Argument, 42
CSMA Algorithm, 23
CSMA Markov Chain, 8, 24
    Invariant Distribution, 24
CSMA Protocol, 8
CSMA/CA Markov Chain, 78
Deficit, 102, 105
Deficit Maximum Weight (DMW), 101, 105
Detailed Balance Equations, 9
Distributed Scheduling Algorithm, 23
Dummy Packet, 8
Feasible Rates, 5, 22
Fictitious Parts, 102
Flow of Parts, 103
Gradient Algorithm, 123
Independent Set (IS), 22
Independent Sets, 1, 4
    Maximal, 4
Input Activity (IA), 103
Input Matrix, 103
Insensitivity Theorem, 53
Interior of a Set, 22
Irreducible, 4
KL Divergence, 26
Kullback-Leibler Divergence, 26
Longest Queue First (LQF), 1
Lyapunov Function, 4
Markov Random Field, 24
Maximal Entropy, 31
Maximal Throughput, 23
Maximum Weighted Matching (MWM), 1, 6
Minislot, 76
Mixing Time Bound, 43
Positive Recurrent, 4
Processing Networks, 2
Randomized Scheduling Algorithm, 1
Rate Stability, 23
Scheduling Algorithm, 22
Scheduling Problem, 1
Service Activity (SA), 103
Service Matrix, 103
Slot, 76
Stabilizing Queues, 23
Stochastic Approximation, 123, 127
Strictly Feasible Rates, 5, 22
Throughput-Optimal, 2, 23
Time-Reversible, 9
Transmission Aggressiveness (TA), 24
Transmission State, 22
Utility Function, 60
Utility-Maximizing Algorithms, 2
Virtual Queue Length, 105
Virtual Queue Lengths, 102
