
Telecommun Syst (2006) 31: 141–161

DOI 10.1007/s11235-006-6517-7

On SCTP multi-homing performance

Andreas Jungmaier · Erwin P. Rathgeb

Received: February 14, 2006 / Accepted: February 15, 2006



A. Jungmaier · E. P. Rathgeb
Computer Networking Technology Group, IEM, University of Duisburg-Essen, Ellernstr. 29, D-45326 Essen
e-mail: {ajung,rathgeb}@iem.uni-due.de

© Springer Science + Business Media, LLC 2006

Abstract The Stream Control Transmission Protocol (SCTP) is a general purpose trans-
port protocol featuring multi-homing support, message oriented and more flexible data de-
livery mechanisms than TCP, and an increased protection against well-known attacks. Orig-
inally developed for the transport of Signaling System No. 7 messages, e.g. MTP level 3
user primitives, over IP networks, SCTP has evolved into a general purpose transport proto-
col with a wide field of applications. With respect to multi-homing, the current SCTP stan-
dard uses this feature for network level redundancy only. Therefore we propose and evaluate
in this contribution mechanisms for the application-specific optimisation of the SCTP pro-
tocol behaviour with respect to its multi-homing capabilities. To satisfy the extremely strict
performance requirements for signalling transport, efficient load sharing among all active
links is also highly desirable in SCTP scenarios. To this end, we propose a novel, improved
load sharing algorithm for SCTP with path based selective acknowledgements which avoids
some of the drawbacks of the existing algorithms and achieves an increase in throughput.
Results of a comparative simulation study are presented to demonstrate the benefits of our
algorithm.

Keywords SCTP . Transport protocol . Multi-homing . Load sharing . Signalling transport

1. Introduction

The Stream Control Transmission Protocol (SCTP) [14] is the third IP-based transport protocol
defined by the IETF besides TCP and UDP, and became a full standards track RFC (i.e. a valid
Internet standard) in October 2000.
The reason for introducing SCTP was the expected migration of public voice services from
circuit switched ISDN platforms onto enhanced IP-based next generation networks (NGN)
providing Voice-over-IP (VoIP) services. In such a migration scenario, where classical and
new networks will coexist for an extended period of time, seamless interworking of the two
concepts is a crucial issue. More important and demanding than transcoding of user data is
the aspect of signalling interworking. Assuming situations where ISDN islands are intercon-
nected via IP-based networks or where upper layer servers or data bases, e.g. for Intelligent
Network (IN) functions or mobility support for cellular networks, are located in the IP do-
main, fully functional and highly reliable and performing signalling transport via IP is required
[10].
Designed by the Signalling Transport working group of the IETF in particular for the trans-
port of signalling data, SCTP is – nevertheless – a general purpose transport protocol provid-
ing a more flexible data delivery service than TCP by using a specific SCTP stream layer,
and increased fault tolerance by allowing network level redundancy (SCTP multi-homing).
Therefore, SCTP is also a candidate transport protocol for applications beyond signalling
transport. One example is the ‘Reliable Server Pooling’ [17] concept of the IETF RSer-
Pool working group which requires the use of SCTP. Other working groups also included
SCTP as an option in their documents, e.g. for the new AAA protocol Diameter. Two ma-
jor SCTP extensions have been defined so far, providing a partially reliable delivery option
(PR-SCTP [13]) on one hand and the ability to dynamically add and drop IP addresses from
a multi-homed SCTP association (Dynamic Address Reconfiguration [12]) on the other. With
these extensions, there are even more areas in which the use of SCTP is promising, including
e.g. novel concepts for IP mobility support [2].
Many of the defined – as well as potential – SCTP applications, and in particular signalling
transport, have both strict performance and reliability requirements which must be specifically
supported by SCTP. In this respect, SCTP multi-homing scenarios providing multiple, redundant
paths through the IP network are of particular interest. However, the current SCTP standard only
uses redundant network paths as backup for retransmission of lost packets and to provide fast
switchover in case of network failures. Under normal operating conditions, one path carries
the full traffic load while the others remain unused except for heartbeat messages probing their
availability. Considering potential performance improvements, an obvious suggestion is to also
use the redundant links for load sharing to achieve a more balanced network load and to increase
application throughput in case of bandwidth limited links. However, simply distributing the load
evenly to all redundant links is not sufficient, since the systematic violation of packet sequence
integrity resulting from different end-to-end delays on the redundant paths will severely interfere
with the window based SCTP flow control and the acknowledgment based error control and
recovery mechanisms. Therefore, corresponding adaptations have to be introduced in these areas
in order to actually benefit from load sharing.
After an overview of the relevant SCTP features and mechanisms and a review of the
tools and scenarios used to investigate SCTP multi-homing and SCTP-based load sharing,
we will look into possible optimisations to SCTP multi-homing algorithms and especially dis-
cuss SCTP-based load sharing in some detail. Based on a review of the already existing SCTP
load sharing proposals, we will propose a modified and improved algorithm based on path
specific acknowledgements and confirm its benefits by providing a quantitative performance
comparison.

2. An overview of the stream control transmission protocol

SCTP connections (named associations in SCTP parlance) are established after a 4-way handshake
between two SCTP endpoints (i.e. protocol instances), usually a client and a server.


The association setup request contains a list of valid transport addresses1.
combinations of n transport addresses of the server with m transport addresses of the client
defines all possible n × m path identifiers of one association. Thus, SCTP explicitly supports
multi-homed endpoints, allowing the use of multiple paths through the IP network to provide end-
to-end redundancy and fault tolerance.
SCTP is a message-oriented, reliable transport protocol which – unlike TCP – preserves
message boundaries. The protocol may multiplex several short messages into one SCTP packet
(subsequently transmitted as IP payload) to reduce transmission overhead for small (signalling)
messages. By using MTU discovery, SCTP avoids IP fragmentation.

2.1. The SCTP packet format

SCTP packets consist of a common header, followed by a variable number of information units,
which are named chunks. There are two types of chunks: control and data chunks. Data chunks
contain the actual user messages, while control chunks are used to support the peer-to-peer
protocol. Control chunks are provided, e.g., for selective acknowledgements, monitoring of
peer reachability with heartbeats, setup and termination of associations, error messages and,
optionally, protocol extensions.
The SCTP common header contains source and destination port numbers, similarly to TCP
and UDP, and a 32 bit checksum. Moreover, it carries a 32 bit value named tag which is a
randomly chosen value exchanged with the peer endpoint at association start up. The tag protects
associations from ‘blind attacks’, i.e. where the attacker tries to blindly insert forged SCTP
packets into an association. As SCTP is a transport layer protocol (much like TCP), it does not
protect communication from man-in-the-middle attacks.
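As an illustration of this layout, the following C sketch shows the common header and the generic chunk header as plain structs (the field names are ours; a real implementation additionally has to handle network byte order and the 4-byte chunk padding rules of RFC 2960):

```c
#include <stdint.h>

/* SCTP common header, present once per SCTP packet (cf. RFC 2960, Sec. 3.1). */
struct sctp_common_header {
    uint16_t source_port;
    uint16_t destination_port;
    uint32_t verification_tag;  /* the randomly chosen 'tag' exchanged at setup */
    uint32_t checksum;          /* 32 bit checksum over the whole packet        */
};

/* Generic chunk header shared by data and control chunks. */
struct sctp_chunk_header {
    uint8_t  type;              /* e.g. DATA, SACK, INIT, HEARTBEAT, ...        */
    uint8_t  flags;             /* chunk type specific flags                    */
    uint16_t length;            /* chunk length including this header           */
};
```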

2.2. SCTP data transmission

Reliable data transmission involves two chunk types: the data chunk and the selective acknowl-
edgement chunk (SACK). Data chunks carry higher layer user data and a 32 bit transmission
sequence number (TSN). Each data chunk must be acknowledged by the receiver. If two pack-
ets with data chunks arrive within tsack (usually, tsack = 200 ms), the receiver returns a SACK
immediately after the second packet.
SCTP uses multiple selective repeat mechanisms for error recovery. The SACK contains a
cumulative TSN to be acknowledged (CTSNA), indicating the highest TSN that has been received
in sequence without interruption. Additionally, the SACK acknowledges all other data chunks
with higher TSNs that have been successfully received in a so-called gap report structure. The
sender interprets the CTSNA and the gap reports and when a TSN has been reported missing for
the fourth time, a fast retransmission is triggered.
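As a minimal sketch of this rule (our own simplification, not taken from any particular implementation), a sender could keep a per-TSN miss counter driven by the gap reports of incoming SACKs; schedule_fast_retransmission() stands for a hypothetical routine that marks the chunk for immediate retransmission:

```c
#include <stdint.h>

#define MISS_REPORT_THRESHOLD 4   /* fourth miss report triggers fast retransmit */

struct outstanding_chunk {
    uint32_t tsn;
    int      miss_count;     /* times this TSN was reported missing so far */
};

void schedule_fast_retransmission(struct outstanding_chunk *c);  /* assumed helper */

/* Called for each chunk still in the retransmission queue after the CTSNA
 * and the gap reports of an incoming SACK have been evaluated.             */
void process_miss_report(struct outstanding_chunk *c, int reported_missing)
{
    if (!reported_missing)
        return;
    if (++c->miss_count == MISS_REPORT_THRESHOLD)
        schedule_fast_retransmission(c);
}
```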
The amount of data that may be outstanding (i.e. sent but not yet acknowledged) is limited by
the receiver window. The current value of the receiver window is also contained in the SACK
chunk. Flow control parameters are separately computed for each path (specific pair of sender
and receiver transport addresses) to the peer. These parameters are the congestion window cwnd,
and the slow start threshold ssthresh. Similar to TCP, when the cwnd for a path is less than the
current ssthresh, the path is said to be in ‘slow start’, else in ‘congestion avoidance’ mode.

1 A transport address is defined as the combination of one of the endpoint's (multiple) host IP addresses with a port
number that is common for all transport addresses belonging to an endpoint.

The cwnd for a path is additively increased (by at most one MTU) when new data has
been acknowledged and the CTSNA value has advanced. It is decreased (halved) when fast
retransmissions are triggered, or a timeout has occurred (in which case the cwnd is reset to
one MTU). Thus, SCTP uses an Additive Increase, Multiplicative Decrease (AIMD) algorithm,
similarly to TCP, and its flow and congestion control is therefore TCP-compatible.
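The per-path adjustment described above can be summarised in the following sketch (our simplification of RFC 2960, Section 7.2; the partial_bytes_acked bookkeeping used for congestion avoidance is omitted):

```c
#include <stdint.h>

struct sctp_path_cc {
    uint32_t cwnd;       /* congestion window in bytes */
    uint32_t ssthresh;   /* slow start threshold       */
    uint32_t pmtu;       /* path MTU                   */
};

/* Additive increase: a SACK advanced the CTSNA and newly acknowledged
 * 'acked_bytes' of data sent on this path.                             */
void cwnd_increase(struct sctp_path_cc *p, uint32_t acked_bytes)
{
    if (p->cwnd < p->ssthresh)                        /* slow start           */
        p->cwnd += (acked_bytes < p->pmtu) ? acked_bytes : p->pmtu;
    else                                              /* congestion avoidance */
        p->cwnd += p->pmtu;   /* at most once per window in the full rules    */
}

/* Multiplicative decrease on a fast retransmission ... */
void cwnd_fast_retransmit(struct sctp_path_cc *p)
{
    p->ssthresh = p->cwnd / 2;
    if (p->ssthresh < 2 * p->pmtu)
        p->ssthresh = 2 * p->pmtu;
    p->cwnd = p->ssthresh;
}

/* ... and reset to one MTU after a retransmission timeout. */
void cwnd_timeout(struct sctp_path_cc *p)
{
    p->ssthresh = p->cwnd / 2;
    if (p->ssthresh < 2 * p->pmtu)
        p->ssthresh = 2 * p->pmtu;
    p->cwnd = p->pmtu;
}
```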

2.3. Message streams

SCTP provides its user with flexible methods of data delivery by separating the reliable transfer
of messages between endpoints (see Section 2.2) from the actual delivery to the application.
This is achieved at the cost of introducing an internal multiplexing layer for so-called streams
identified by 16 bit stream identifiers. To be able to perform resequencing and delivery on a
per-stream basis, 16 bit stream sequence numbers and stream identifiers are provided in addition
to TSNs, and are transported within data chunks.
SCTP streams are effectively unidirectional channels, within which messages are usually
transported in sequence. The application may also request a message to be delivered by an
unordered service, which can reduce blocking effects in case of message loss, since the reordering
mechanism of one stream is not affected by that of another stream that may have to wait for a
retransmission of a previously lost data chunk.
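A minimal sketch of the receiver-side delivery decision (our own simplification; reassembly of fragmented messages and the buffering of out-of-order messages are left out) could look as follows, where 'unordered' corresponds to the U flag of the data chunk:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per-stream receiver state: ordered messages are released strictly in
 * stream sequence number (SSN) order, unordered messages immediately.  */
struct sctp_stream_rx {
    uint16_t next_ssn;   /* next expected stream sequence number */
};

bool can_deliver(struct sctp_stream_rx *s, uint16_t ssn, bool unordered)
{
    if (unordered)
        return true;               /* bypasses per-stream resequencing    */
    if (ssn == s->next_ssn) {
        s->next_ssn++;             /* deliver and advance the expectation */
        return true;
    }
    return false;                  /* buffer until the SSN gap is filled  */
}
```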

2.4. Multi-homing

Support for multi-homing refers to the capability of SCTP to establish communication between
hosts that have one or more IP addresses, and to make use of multiple paths between these
hosts. To ensure compatibility with established transport protocols, such as TCP, only one path is
chosen as primary path and subsequently carries the main traffic load. The multi-homing concept
is explained in some more detail in Section 4.

2.5. Extensions to SCTP

SCTP is extensible through the use of new control chunks (cf. Section 2.1). In the following we
present two important protocol extensions one of which has already been standardised within
the IETF, while the other is still being actively developed in standardisation.

2.5.1. Partially reliable message transfer (PR-SCTP)

The extension for partial reliability [13] specifies a mechanism an endpoint can use to indicate
to its associated endpoint that some lost data chunks will not be retransmitted. A new parameter,
called fwtsn, is defined for that purpose together with a new control chunk type, the Forward TSN
chunk, to transport it. The sender of this chunk does not need to retransmit any data chunk with
a TSN less than the fwtsn value. The receiver of the Forward TSN chunk advances its CTSNA
(see Section 2.2) to fwtsn, and further if possible, and stops indicating the skipped data chunks
as missing.
This mechanism is advantageous, e.g., in the following scenarios:

- In times of congestion, retransmitted data may add to the congestion. By skipping the retransmission, the network is relieved of additional traffic caused by retransmissions.
- User data may have a limited time of validity (e.g. packetized voice samples, or sensor data when a more recent sensor reading is available). After that time, there is no point in retransmitting this obsolete data.

SCTP implementations conveniently allow for specifying a lifetime for data that is to be sent.
After expiry of this lifetime, the data is no longer sent or retransmitted.
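A minimal sketch of the receiver side of this mechanism, i.e. the handling of an incoming Forward TSN chunk, is given below (our own illustration; real code must use serial number arithmetic modulo 2^32 for the TSN comparison):

```c
#include <stdint.h>

struct sctp_rx_state {
    uint32_t ctsna;   /* cumulative TSN ack point reported in SACKs */
};

/* Forward TSN chunk received: move the cumulative ack point up to fwtsn,
 * so the skipped TSNs are no longer reported as missing in gap reports.  */
void handle_forward_tsn(struct sctp_rx_state *rx, uint32_t fwtsn)
{
    if (fwtsn > rx->ctsna)     /* simplified comparison, see note above */
        rx->ctsna = fwtsn;
    /* ...then try to advance ctsna further over TSNs already received,
       and deliver or discard the affected messages as appropriate.     */
}
```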

2.5.2. Dynamic address reconfiguration

The “SCTP Dynamic Address Reconfiguration” extension proposed in [12] may be used
to dynamically add or remove addresses from an established SCTP association. Moreover,
this extension allows an endpoint to signal to its peer which IP address is preferable as
the primary address. Thus, in a handover situation where connectivity to two networks may
be given, a mobile device can signal to its peer which network is the preferred destination to
send data to and thus improve data throughput.
This is achieved by the use of the new Address Configuration Change (ASCONF) control
chunks, which contain a variable number of request parameters for the peer. These signal requests for:
- the addition of an address,
- the removal of an address, or
- setting a primary address.
This mechanism can also be used in cases where the originating source IP address of the AS-
CONF request does not match any known SCTP association (when addresses have changed
before this could be signaled to the peer endpoint): usually, the association to which a packet
belongs is determined by the combination of source and destination IP addresses and ports of a
received SCTP packet. Thus, for an ASCONF request it may not be possible to find the proper
association. To this end, the ASCONF contains an additional address parameter which allows
the receiver of the ASCONF to determine the association. This parameter must contain an
address that was known to belong to the concerned association beforehand.
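The lookup fallback described above can be sketched as follows (our own illustration, using a hypothetical lookup_association() helper and IPv4-only addresses for brevity):

```c
#include <stdint.h>
#include <stddef.h>

struct sctp_association;   /* opaque association handle */

/* Hypothetical helper: normal lookup by peer address and the port pair. */
struct sctp_association *lookup_association(uint32_t peer_addr,
                                            uint16_t src_port,
                                            uint16_t dst_port);

/* For an ASCONF chunk, first try the usual source-address based lookup;
 * if that fails (the source address may just have changed), retry with
 * the address carried in the ASCONF address parameter.                  */
struct sctp_association *find_assoc_for_asconf(uint32_t packet_src_addr,
                                               uint32_t asconf_addr_param,
                                               uint16_t src_port,
                                               uint16_t dst_port)
{
    struct sctp_association *a =
        lookup_association(packet_src_addr, src_port, dst_port);
    if (a == NULL)
        a = lookup_association(asconf_addr_param, src_port, dst_port);
    return a;
}
```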

3. Implementation and deployment

This section briefly discusses some SCTP implementations available to date (May 2005) and
looks into some tools we developed specifically for protocol testing purposes. In order to be able
to evaluate SCTP in a wider setting, we also created an OPNET based discrete event simulation
model discussed below.

3.1. SCTP kernel implementations

Since SCTP is a transport protocol based on IP, the usual place for a working implementation is
the kernel of an operating system (much like it is the case for TCP). Within the SCTP community,
a number of kernel implementations have been developed in the past. As we cannot discuss all
implementations here, we will focus on two kernel implementations that were developed for
open source operating systems, namely FreeBSD and Linux.
The FreeBSD implementation comes with the KAME network stack readily available for
most BSD based unices (e.g. FreeBSD, OpenBSD and NetBSD). It was mainly developed by
Randall Stewart, one of the main authors of the SCTP RFC [14], and Peter Lei (both of Cisco),
and is the most feature-complete, stable and up-to-date implementation available to date. It
supports both the SCTP extension for partially reliable transfer of messages and the dynamic
address reconfiguration (cf. Section 2.5).
The Linux kernel SCTP implementation has been sponsored by IBM, Motorola, Nokia and
Intel, and is currently being maintained by Sridhar Samudrala (of IBM). It is part of the latest
2.6 Linux kernel, quite stable, and features around 25,000 lines of C code.
Both implementations offer a standardized, TCP- or UDP-compatible socket-based interface
for the use of most SCTP functions (cf. [15]), and come with a library that can be linked to
applications to use the more advanced SCTP features.

3.2. SCTPLIB – An open source SCTP implementation

As part of a cooperation with an industrial partner, SCTPLIB, a fully standards-compliant SCTP
implementation – available for free as open source [9] – together with a suite of test applications,
was created and further developed by our group and tested at several international interoperability
meetings. SCTPLIB is written in C, runs on Linux, FreeBSD, MacOS X, Solaris, and Windows,
supports the PR-SCTP and the dynamic address reconfiguration extensions (cf. Section 2.5), and
has around 12,000 lines of code.
The portability of SCTPLIB comes at the price that it is not a kernel but a userland imple-
mentation that relies on a privileged server process for handling SCTP network events (SCTP
server) and non-privileged applications that link to the SCTPLIB library. The data and primitive
exchange between SCTP server and SCTP applications is realized by a local interprocess commu-
nication mechanism also implemented in the SCTPLIB. The privileged server handles network
events (incoming/outgoing SCTP packets, ICMP packets), local IPC events for communication
with user processes, and timers, and distributes events and data to the proper user application
processes. The user programs need to register with the SCTP server, and implement the base
SCTP protocol (cf. Figure 1). All data that is exchanged between applications and server uses the
local IPC mechanism, which may adversely affect performance in certain high load scenarios.

3.3. Simulation environment

In order to be able to evaluate SCTP in a greater parameter space and within more elaborate
network topologies, a discrete event simulation model was developed based on the simulation
tool OPNET modeler [16].
To evaluate the SCTP performance for multi-homed endpoints in the case of path failures,
and for investigating different load sharing algorithms, we decided to create our own SCTP
simulation model, since at the time only an NS-2 SCTP model was available as open source
which implemented SCTP as an extension to TCP and did not readily allow for modeling native
multi-homing. Our OPNET [16] based SCTP simulation model is loosely based on the SCTPLIB
implementation mentioned above and interfaces with the native OPNET network layer models
of IP. Therefore, our SCTP model can be used with all IP-based OPNET node models, and
extensions of this SCTP model for use with more than two IP addresses are trivial.

Fig. 1 SCTPLIB implementation structure

4. Investigation of SCTP multi-homing

The SCTP support for multi-homing, which is enforced end-to-end by the endpoints, is one of
the key features of SCTP (besides the support for independent message streams) that distinguish
this transport protocol from other reliable transport protocols, e.g. TCP. Multi-homing refers to
the capability of SCTP to establish communication
between hosts that have one or more IP addresses, and to make use of multiple paths between
these hosts. For an endpoint with an established association, the notion of a path is equal to that
of the transmission route towards one destination transport address of its peer endpoint. Thus, a
multi-homed endpoint may reach the peer via a number of different paths. One of the paths to
the peer is chosen as primary path and subsequently carries the main traffic load. The user or
application may, however, explicitly request to use a path other than the primary for transmission.
When the primary path carries the main load, growth of the congestion window cwnd only
occurs for this path, which is desirable for achieving fairness to TCP. Other paths are then only
used for data retransmissions and heartbeat control chunks. Compared to protocols that do not
support multi-homing, sending retransmissions on paths that are not congested will have an
advantageous effect on recovery from packet loss [8].
As shown in the following sections, SCTP multi-homing can be used to provide:
- network level redundancy, by ensuring the use of physically separate network paths,
- higher performance compared to, e.g., TCP, especially for demanding applications such as signalling transport, and
- with certain modifications to standard SCTP, a distribution of the traffic load which ensures an optimal use of existing network resources.

4.1. Path and peer monitoring

By default, SCTP endpoints monitor peer reachability and path states by regularly sending
heartbeat control chunks to all of their destination addresses. These are immediately answered
by the peer with heartbeat acknowledgement control chunks. For each path, the endpoint keeps
an error counter that is incremented should the endpoint not receive an acknowledgement
before a timer elapses. If the error counter exceeds a threshold (which is a configurable parameter),
the state of the path will be set to unreachable. The endpoint will then continue to send heartbeats
to this address, so that the path status can later be reinstated to reachable.
Since endpoints should send their acknowledgements of data and heartbeat control chunks
back to the originating peer destination address [14], paths that are actively used for data trans-
mission need not be monitored by heartbeat chunks.
An SCTP endpoint also keeps track of the number of consecutive retransmissions of data
or heartbeat chunks sent to the peer endpoint at the association level (as opposed to the path
level). Each time a chunk is acknowledged in time, the corresponding association error counter
is cleared. Once the counter exceeds the association error limit, the peer endpoint is considered
unreachable, and the association is closed.
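A compact sketch of this bookkeeping (our simplification; timer management and the continued sending of heartbeats to unreachable addresses are not shown):

```c
#include <stdbool.h>

struct path_state {
    int  error_count;
    int  error_limit;    /* configurable path retransmission limit */
    bool reachable;
};

struct assoc_state {
    int  error_count;
    int  error_limit;    /* association retransmission limit       */
    bool closed;
};

/* A heartbeat or data chunk sent on this path was not acknowledged
 * before its timer elapsed.                                         */
void on_missing_ack(struct path_state *p, struct assoc_state *a)
{
    if (++p->error_count > p->error_limit)
        p->reachable = false;    /* path marked unreachable           */
    if (++a->error_count > a->error_limit)
        a->closed = true;        /* peer considered unreachable,
                                    association is closed             */
}

/* A data or heartbeat chunk was acknowledged in time. */
void on_timely_ack(struct path_state *p, struct assoc_state *a)
{
    p->error_count = 0;
    a->error_count = 0;
}
```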

4.2. SCTP behaviour in case of path failure

Assuming a signalling session between two dual-homed hosts, A and B, we investigated the
behaviour of SCTP in case of a failure of the primary path. The relevant SCTP parameters in this
case are the following protocol parameters:

- RTOmax, the maximum retransmission timeout; defines the maximum time after which a retransmission occurs, if no SACK has been received after a data chunk was sent.
- RTOmin, the minimum retransmission timeout; defines the minimum time after which a retransmission occurs.
- PRL, the path retransmission limit; an integer value that indicates the threshold for the number of retransmissions that must be exceeded before a path is considered out of service.
- ARL, the association retransmission limit; an integer value that indicates the threshold for the number of retransmissions that must be exceeded before an association is considered out of service, and subsequently closed.
The current RTO and path error counters are computed separately for each path from an asso-
ciation endpoint to its peer and the association error counter is computed once per association.
Error counters are reset whenever a data or heartbeat chunk has been acknowledged (for the
association and for the path concerned).
For ensuring TCP compatibility, the recommended parameter settings for standard SCTP are
an RTOmin of 1 s, an RTOmax of 60 s, a PRL of 5 and an ARL of 10 [14]. With these values,
a path failure is only recognized and indicated to the upper layer (i.e. the application) after
1 + 2 + 4 + 8 + 16 + 32 = 63 s. On the other hand, requirements for transport of signalling
data in the Message Transfer Part (MTP) of the Common Channel Signalling System No. 7 [4]
in case of an MTP 2 link failure are such that a change-over process to a backup MTP 2 link must
take no longer than 800 ms. The change-over procedure is the process of reporting the MTP 2
link failure to the upper layer (MTP 3), retrieving all messages not yet sent, and re-sending these
messages on a secondary (active) MTP 2 link. Therefore, the SCTP parameters obviously need
some tuning in order for SCTP to be applicable to signalling transport of MTP 2 messages over
IP-based networks, as, e.g., for the MTP2-User Peer-to-Peer Adaptation Layer [3].
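For reference, the 63 s figure quoted above follows directly from the exponential back-off of the retransmission timeout; the small program below (our own illustration) reproduces it and also evaluates one example of a tuned parameter set, assuming that the RTO has already converged to RTOmin:

```c
#include <stdio.h>

/* Worst-case failure detection time: each unanswered retransmission
 * doubles the RTO (capped at rto_max) until the path retransmission
 * limit 'prl' is exceeded, i.e. after prl + 1 timer expirations.     */
static double failure_detection_time(double rto_min, double rto_max, int prl)
{
    double rto = rto_min, total = 0.0;
    for (int i = 0; i <= prl; i++) {
        total += rto;
        rto = (2.0 * rto > rto_max) ? rto_max : 2.0 * rto;
    }
    return total;
}

int main(void)
{
    /* Recommended defaults: RTOmin = 1 s, RTOmax = 60 s, PRL = 5 -> 63 s */
    printf("default parameters: %.0f s\n", failure_detection_time(1.0, 60.0, 5));
    /* Example of tuned values: RTOmin = 40 ms, RTOmax = 200 ms, PRL = 2   */
    printf("tuned parameters:   %.2f s\n", failure_detection_time(0.04, 0.2, 2));
    return 0;
}
```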
Possible suitable parameter settings were evaluated using the OPNET-based SCTP simulation
model (cf. Section 3.3) based on a simple topology with two dual-homed hosts, A and B, that were
connected by two distinct transmission links. These had a fixed delay of approximately 10 ms,
and a link bandwidth of 2.048 MBit/s. The host application mimics the relevant behaviour of an
MTP 3 instance with an underlying M2PA stack relying on SCTP for the message transport. The
application performs a unidirectional data transfer of signalling messages from A to B featuring
an exponentially distributed traffic pattern with a mean message arrival rate of 100 messages
per second with 500 bytes per message. The parameters were chosen to model a lightly loaded
broadband signalling relation where an IP/SCTP/M2PA-based signalling endpoint is connected
to a signalling gateway located less than 100 km away. For a simple IP network with few hops,
the chosen link delay time is then appropriate. Two possible scenarios are being investigated:
1. The MTP 3 has only one link, and the underlying M2PA relies on the dual-homed SCTP
association to provide redundancy. This scenario will be named change-over scenario 1, in
the following.
2. The MTP 3 has two links, and the underlying M2PA relies on MTP 3 to handle link layer fail-
ures. Therefore, two single-homed SCTP associations are sufficient for providing redundancy.
In the following, this is change-over scenario 2.
Figure 2 visualizes the behaviour for the two different scenarios by showing the message
delay as perceived by the receiver as a transient function, as well as a moving average of the
message delay as a function of time. After the link failure and when native SCTP dual-homing
is used as in scenario 1, the receiver gets the first messages earlier, whenever data is re-sent
over the secondary backup path. Also, the situation is back to normal fairly quickly, as the first
successful retransmissions reduce the send queue sizes already. When the failure recognition is
handled by the upper layer, and each association is single-homed only, the transmission queue
size of the first association starts to build up after the first path fails. Subsequent retransmissions
are unsuccessful, as they are also sent over the failed path. Subsequently, failure recognition
may happen slightly faster compared to scenario 1 (the retransmission timer for path 1 is started
earlier, and therefore elapses earlier). Once the failure has been recognized, the second SCTP
association must first go through slow start, and can then quickly send all queued messages over
path 2.

Fig. 2 Typical change-over behaviour in scenarios 1 (top) and 2 (bottom): message delay [ms] vs. time [ms], shown as individual messages and as a moving average over 10 values
The parameters that were investigated and varied throughout the following simulation runs
are the PRL values for scenario 1 and ARL values for scenario 2 (between 1 and 5). The results
are plotted depending on the configurable RTOmax value (between 100 ms and 500 ms), and all
Figures show 99% confidence intervals over 20 simulation runs. RTOmin is assumed to be 40 ms,
i.e. twice the RTT. It should be noted that a low RTOmin setting is a requirement for achieving
a low change-over time. Figure 3 shows the values for the maximum message delay during
the change-over process in both scenarios. Interestingly, both scenarios achieve comparable
values for the maximum message delay over the simulated parameter space, although scenario
2 performs slightly worse (by 50–100 ms). For staying safely below a 400 ms delay threshold,
only small values of ARL/PRL can be used (i.e. ARL/PRL = 1), or, for PRL = 2 in scenario 1,
RTOmax has to stay below 200 ms.

Fig. 3 Message delay during the change-over process (scenario 1 top and 2 bottom): maximum message delay [ms] vs. RTOmax [s], for PRL (scenario 1) resp. ARL (scenario 2) values from 1 to 5, with the 400 ms threshold marked

Fig. 4 Duration of the change-over process (scenario 1 top and 2 bottom): failover duration vs. RTOmax [s], for PRL resp. ARL values from 1 to 5, with the 800 ms limit marked
Figure 4 shows the duration of the change-over process in both scenarios. This process is
assumed to have terminated after the size of the sending queue of the remaining active SCTP
association has gone back to a normal state after the path failure was recognized and after the
change-over procedure has started. Both figures also show the 800 ms threshold that corresponds
to the limit imposed by [4] on the duration of the MTP change-over procedure.
From the results presented in Fig. 4 it becomes clear that for both scenarios, the change-over
procedure can successfully be finished within the 800 ms limit, provided either the RTOmax
parameter is set sufficiently low (i.e. well below 150 ms) or the ARL/PRL parameter is set to
a low value (i.e. ARL/PRL ≤ 2). Also, while scenario 1 achieves slightly shorter maximum
message delay, the overall duration of the change-over is slightly shorter for scenario 2.

Fig. 5 Dual-homed simulation scenario with satellite and backup link

4.3. An optimisation for SCTP multi-homing

In the following section we present a simulation study of a modification of the SCTP retrans-
mission algorithm, comparing it to standard SCTP. This modification can lead to a vastly
improved protocol behaviour by exploiting the fact that a communication endpoint can choose
an optimal path depending on the situation. This leads to significant improvements when path
delay characteristics differ by an order of magnitude. We first proposed this optimisation
in [8] and presented some initial investigations of its behaviour within a testbed environment;
here, more detailed simulation studies have been performed and are presented below.
We assume a unidirectional communication between two dual-homed hosts A and B as shown
in Fig. 5. The two hosts are connected by a primary path which is a broadband T1 satellite link
with a bandwidth of W1 = 1.544 MBit/s, and a secondary backup link based on a dedicated ISDN
channel with a bandwidth of W2 = 64 kBit/s. The satellite link features a long transmission delay
of D1 = 250 ms, while the secondary ISDN link has a short delay of D2 = 10 ms. Furthermore, we
assume a fixed message loss probability on the primary path over a period of several seconds,
e.g. due to bad weather conditions. Host A sends messages with an arrival rate of approximately
136 messages per second, with a negative exponential distribution of interarrival times (i.e.
a mean interarrival time of approximately 7.36 ms). The message length has a triangular
distribution ranging from 20 to 1400 bytes, with an average of 710 bytes. This results in the
source at host A creating an average load of 96.467 KByte/s, which amounts to an average link
load of approximately 50% on the primary link. We investigated the SCTP behaviour for different
bit error rates (BER) on the primary link, ranging from 0 to 10^-5 (the latter results in one out of
16 messages being dropped due to transmission errors). For clarity, we present results for a BER
range from 0 to 2 × 10^-6 in the following figures only, which corresponds to an average of 1 message out of 82 being dropped
in this scenario. All Figures show a 99% confidence interval over 20 simulation runs.
The behaviour of standard SCTP in the face of message loss is that the receiver will notice
a gap in the sequence of received data chunks, and subsequently reports the missing TSN in all
returned SACK chunks. Furthermore, the receiver will start returning one SACK chunk for each
incoming packet that contains a data chunk, until the gap is closed again (as opposed to one
SACK chunk for every second incoming packet containing data chunks in the normal case). As
per RFC 2960, the
receiver returns any packet with a SACK chunk (including those indicating a gap) to the source
address of the incoming data packet that triggered the SACK. Once the sender has received four
SACK chunks with gap reports reporting the same TSN missing, it will immediately re-schedule
the missing data chunk, and retransmit it as soon as possible (i.e. before any new data chunk) using
an alternative path. Assuming that the satellite link is the primary SCTP path, the receiver would
send the SACK chunks back via the satellite path as well, so that the sender can only react to the
message loss after a full RTT over the long delay path (and after having received four SACKs).
The proposed modification, named Fast-SACK, uses the link with the shortest link delay (or
with the shortest RTT when the link delay is unknown) not only for heartbeat messages and for
the actual retransmission of previously lost data packets, but also for returning SACK chunks.
Thereby it speeds up the growth of the congestion window on the primary path, and also
speeds up the recovery process from lost packets. As shown in Fig. 6, the throughput of the mod-
ified SCTP is constant even for BER values of up to 3 × 10^-7, which in our simulation scenario
approximately corresponds to an average of 1 out of 318 packets being dropped due to bit errors.
This is also due to the fact that after a loss event, recovery and growth of the congestion window
is much faster than for standard SCTP. The throughput of standard SCTP, on the other hand,
decreases sharply with increasing BER since any lost packet halves the congestion window and
reduces the slow start threshold of the primary path, and it takes several (long) RTTs to reach a sim-
ilar state again. As can be seen in Fig. 7, the optimisation of the acknowledgement process in error
cases also greatly reduces the maximum message delay in the presence of transmission errors.
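A minimal sketch of the Fast-SACK return-path selection (our own illustration; the mechanism itself was proposed in [8]) simply picks the reachable destination address with the smallest known delay or RTT:

```c
#include <stddef.h>

struct sctp_dest {
    double rtt;          /* smoothed RTT, or the configured link delay */
    int    reachable;
};

/* Return the reachable destination with the smallest delay; SACK chunks
 * (and retransmissions) are then sent towards this address instead of
 * simply echoing them to the source address of the incoming packet.    */
struct sctp_dest *select_fast_sack_dest(struct sctp_dest *dests, size_t n)
{
    struct sctp_dest *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!dests[i].reachable)
            continue;
        if (best == NULL || dests[i].rtt < best->rtt)
            best = &dests[i];
    }
    return best;
}
```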

Fig. 6 Dual-homed simulation scenario: throughput vs. BER (average SCTP throughput [bytes/s] as a function of the bit error rate, for standard and optimized SCTP)

Fig. 7 Dual-homed simulation scenario: max. message delay vs. BER (maximum message delay [s] as a function of the bit error rate, for standard and optimized SCTP)

5. SCTP-based load sharing

SCTP load sharing potentially provides significant increases in transport protocol performance
(i.e. higher application level throughput) and network efficiency. Since a number of proposals
for load sharing extensions to SCTP exist already, we will give a short review first. Based on
this discussion, we will propose a novel algorithm which enhances load sharing performance by
using path specific acknowledgements.

5.1. Existing load sharing proposals

In the following, we will not address network or link layer load sharing algorithms, such as the use
of multiple links between routers with a round-robin distribution of packets over these multiple
links (ECMP), or the PPP multilink protocol [11], since these algorithms operate under different
assumptions, and not typically in an end-to-end fashion.

5.1.1. SCTP with loadsharing extensions (LS-SCTP)

In their internet draft [1], Abd El Al et al. suggest the introduction of new SCTP chunk types and
additional association setup parameters for their load sharing extension to SCTP. They propose to
introduce additional, path related sequence numbers and time stamps in new SCTP data chunks
and acknowledgments. While this is likely to simplify the handling of path related congestion
control parameters when load sharing is used, it also introduces extensions to SCTP that are
not wire compatible with existing SCTP implementations. Moreover, the additional meta-data
carried in the proposed LS-DATA and LS-SACK chunks is not necessary, as this information
can be derived from sender information and corresponding interpretation of SCTP selective
acknowledgements received by the sender.

5.1.2. Concurrent multipath transfer (CMT)

In [7], Iyengar et al. propose an algorithm for avoiding the unfair overgrowth of the congestion
control window for an SCTP path that occurs when a change-over is triggered by the application
layer. SCTP load sharing may be thought of as a cyclic change-over that is triggered (by the
application layer, or by a protocol implementation itself) whenever the congestion window for a
given path does not allow sending any more new data. At that time, alternate paths may still allow
sending if the sender is not limited by the receiver window of the peer. In this case, the sender
performs change-overs to the alternate paths, and continues to send data. Therefore, sending
continues until the congestion windows of all available paths or the receiver window are fully
exploited.
When change-overs are periodically triggered, significant reordering can be observed by the
receiver [6], which is reported to the sender in SCTP SACK chunks containing gap reports. In this
case, standard SCTP reacts with unnecessary fast retransmissions, and needlessly reduces the
congestion window which limits the overall throughput. Also, since standard SCTP only increases
the congestion window for a path when an incoming SACK advances the highest cumulative
TSN acknowledged so far (CTSNA), the congestion window grows too slowly. When chunks are
delivered to the receiver out of sequence over multiple paths, a standard SCTP implementation
will send back too many SACK chunks (one SACK for every new incoming data chunk) even
if packet loss is not occurring. Therefore, the rate of returned SACK chunks can and should be
reduced when load sharing is applied. In [6], Iyengar et al. propose the concurrent multipath
transfer (CMT) which aims at

1. avoiding unnecessary fast retransmissions by proposing an algorithm that only increases the
gap counter for data chunks in the retransmission queue when (i) they were reported missing
by an incoming SACK chunk, and (ii) when higher TSNs were already acknowledged for the
path to which the data chunk had been sent.
2. allowing more (and fairer) updates of the congestion window, since the congestion window for
a path should not be increased only when the CTSNA value for an association is increased by
a new incoming SACK. If load sharing was applied, this would lead to a stronger growth of the
congestion window for the slower path only. Therefore, a path CTSNA variable is introduced
which stores the highest TSN that was acknowledged for this path without discontinuity. Now
the congestion window is advanced for paths on which new data chunks were acknowledged,
and for which the path CTSNA has advanced.
3. delaying selective acknowledgements appropriately so that unnecessary SACKs need not be
sent. With CMT, flags are used to ensure that retransmissions are still triggered in time (even
by fewer than four SACKs).

5.2. Path based selective acknowledgements

Although the algorithms discussed in Section 5.1.2 adapt the behaviour of the SCTP flow control
and error recovery mechanisms fairly well to the specifics of a load sharing scenario – in particular
in homogeneous environments with respect to link capacity and path delay – there is still room for
improvement. We therefore propose path based selective acknowledgement, in short PB-SACK.
The basic idea of our proposal is that the load sharing receiver maintains a SACK counter
d(i).path_sack_count for each path d(i) – and not per association as in [6] – and increases this
counter by one whenever a packet containing data chunks is received on the corresponding path.
Whenever d(i).path_sack_count = 2, a SACK chunk is immediately sent to d(i) and the counter
is reset to d(i).path_sack_count = 0. As a result, other than with the combination of algorithms
discussed in Section 5.1.2, two successive SCTP packets with data chunks arriving on different
paths can trigger two SACKs on these paths. It should be emphasized that, nevertheless, the
PB-SACK algorithm still sends one SACK chunk for every two data packets on average, which
is in accordance with the requirement in [14] (cf. also Section 2.2).
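On the receiver side this amounts to very little code; the sketch below is our own illustration (send_sack_to() is a hypothetical helper, and the delayed-SACK timer that still covers the case of a single outstanding packet is omitted):

```c
#define MAX_PATHS 8

/* PB-SACK receiver: one SACK counter per path d(i) instead of one
 * counter per association.                                        */
struct pb_sack_rx {
    int path_sack_count[MAX_PATHS];
};

void send_sack_to(int path);   /* hypothetical helper: SACK returned on d(i) */

/* Called whenever a packet containing data chunks arrives on path i. */
void pb_sack_on_data_packet(struct pb_sack_rx *rx, int i)
{
    if (++rx->path_sack_count[i] >= 2) {
        send_sack_to(i);                /* acknowledge on the same path */
        rx->path_sack_count[i] = 0;
    }
}
```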

Upon receipt of a SACK chunk, the sender performs the following actions (see the sketch after this list):

- For all paths d(i), set the flag d(i).saw_new_path_sack = FALSE.
- For all paths d(i), set the flag d(i).new_pCTSNA = FALSE.
- For any path d(i) for which a data chunk has been newly acknowledged, set the flag d(i).saw_new_path_sack = TRUE.
- For any d(i) for which d(i).saw_new_path_sack = TRUE, find the highest TSN newly acknowledged. Store this value in d(i).highest_path_tsn_acked.
- For any d(i) for which d(i).saw_new_path_sack = TRUE, store the number of bytes newly acknowledged in d(i).newly_acked_bytes.
- For any d(i) for which d(i).saw_new_path_sack = TRUE, find the corresponding d(i).pCTSNA. If d(i).pCTSNA was advanced by the SACK that is being processed, set the flag d(i).new_pCTSNA = TRUE.
- For any d(i) for which d(i).new_pCTSNA = TRUE, and for which the number of outstanding bytes is higher than the congestion window d(i).cwnd, increase the congestion window as required in Sections 7.2.1 and 7.2.2 of RFC 2960 [14]; e.g., in slow start, if the number of outstanding bytes on d(i) exceeds d(i).cwnd, the congestion window is increased by d(i).cwnd += min(d(i).newly_acked_bytes, d(i).pMTU).
- If the SACK chunk contains gap reports, check for any data chunk t remaining in the retransmission queue that is reported missing and was sent to path d(t), whether d(t).saw_new_path_sack = TRUE and t < d(t).highest_path_tsn_acked. If so, increase the gap counter for t. If this counter reaches the threshold (e.g., 4), perform a fast retransmission as per Section 7.2.4 of RFC 2960.
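The sketch below illustrates the last two rules in C (our own simplification: the per-path bookkeeping of the preceding rules is assumed to have been done already, congestion avoidance is omitted, and fast_retransmit() is a hypothetical helper):

```c
#include <stdint.h>
#include <stdbool.h>

#define MISS_REPORT_THRESHOLD 4

struct pb_path {
    bool     saw_new_path_sack;
    bool     new_pCTSNA;
    uint32_t highest_path_tsn_acked;
    uint32_t newly_acked_bytes;
    uint32_t cwnd, ssthresh, pmtu;
    uint32_t outstanding_bytes;
};

struct out_chunk {
    uint32_t tsn;
    int      path;               /* index of the path the chunk was sent on */
    int      miss_count;
    bool     reported_missing;   /* by the SACK currently being processed   */
};

void fast_retransmit(struct out_chunk *c);   /* hypothetical helper */

void pb_sack_sender_update(struct pb_path *paths, int npaths,
                           struct out_chunk *queue, int nchunks)
{
    /* congestion window growth only for paths whose path CTSNA advanced */
    for (int i = 0; i < npaths; i++) {
        struct pb_path *p = &paths[i];
        if (p->new_pCTSNA && p->outstanding_bytes > p->cwnd &&
            p->cwnd < p->ssthresh)
            p->cwnd += (p->newly_acked_bytes < p->pmtu) ? p->newly_acked_bytes
                                                        : p->pmtu;
    }

    /* per-path miss counting: a chunk only counts as missing when a higher
       TSN was newly acknowledged on the path it was sent on                */
    for (int c = 0; c < nchunks; c++) {
        struct pb_path *p = &paths[queue[c].path];
        if (queue[c].reported_missing && p->saw_new_path_sack &&
            queue[c].tsn < p->highest_path_tsn_acked &&
            ++queue[c].miss_count >= MISS_REPORT_THRESHOLD)
            fast_retransmit(&queue[c]);
    }
}
```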

By using this algorithm, the sender receives one SACK chunk on a path for any two packets
with data chunks sent over this path. This strict allocation of SACK chunks to their paths is
used for the so-called SACK-clocking, where each incoming SACK triggers an update of the
outstanding bytes counter, and the receiver and congestion windows.

5.3. Simulation environment

For evaluating the algorithms discussed above, we added the CMT algorithm as well as the
PB-SACK algorithm to our OPNET SCTP simulation model. The simulation scenario
reflects a typical case in which two dual-homed IP-based signaling end points are connected to
routers via a fast LAN technology (gigabit ethernet). The routers in turn are interconnected
by broadband WAN links, as shown in Fig. 8. Typically, these WAN links use transmission
systems found in public networks, e.g. PDH (E3/T3) or SDH/Sonet (STM-x) providing data
rates ranging from 34 MBit/s (E3) to 155 MBit/s (STM-1) and beyond. In order to estimate the
maximum throughput that can be achieved by SCTP with different load sharing implementations,
we assumed a unidirectional data transmission initiated by a saturated traffic source sending
data chunks of constant length towards the sink. Without loss of generality, we assume long
SIP signalling messages, or SS7 broadband MTP3 messages [5] with 1000 bytes payload, and
one data chunk per SCTP packet. The results presented in the following section are averaged
throughput values as perceived by the application and averaged values for the congestion window
as perceived by the SCTP protocol entity. To isolate the effects due to the load sharing algorithm
from those induced by competing traffic in the network, a scenario without interfering background
traffic has been used. Due to the fairly deterministic scenario, the confidence intervals calculated
from the repeated simulation runs are insignificantly small and have been omitted.

Fig. 8 Simulation scenario

5.4. Simulation results

In order to evaluate scenarios that are relevant to the purpose of signalling transport, we assumed
that the two bottleneck links are typical E3 broadband links with a bandwidth of W = 34.368
MBit/s. The delay of Path 1, d1, was configured to be 10 ms, and the delay of Path 2, d2, was
varied between 10 ms and 200 ms (cf. Figure 8). The properties of SS7 signalling links are well
within this range, with delays typically below 100 ms.
For comparison, the throughput of a single-homed SCTP endpoint – the same as for a multi-
homed SCTP endpoint using standard SCTP without load sharing – is also given in the figure
(in one case, we assume the slower path 2 is used; the corresponding graph is labelled "only path 2".
When path 1 is used, the throughput remains constant, since it does not depend on d2). As indicated by
the standard SCTP curve for path 2 only, an SCTP association could fully exploit the bandwidth
of the bottleneck link until the bandwidth delay product limits the achievable throughput – as is
commonly known for all window based transport protocols. Thus, up to a delay d2 of 50 ms, the
throughput is limited by the link bandwidth, and for d2 > 50 ms, the throughput is limited by
the receiver window, and the link delay. Note that the throughput in Fig. 9 is that as perceived
by the application layer and takes into account the overhead of IP and SCTP headers.
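The limit referred to here is the usual window bound for window-based protocols; in our notation (the receiver window value configured in the simulations is not restated here, and the RTT approximation ignores processing delays) the achievable throughput is roughly

```latex
T \;\lesssim\; \min\!\left( B_{\text{bottleneck}},\; \frac{rwnd}{RTT} \right),
\qquad RTT \approx 2\, d_2 .
```

For small d2 the bottleneck bandwidth dominates; beyond roughly 50 ms the rwnd/RTT term takes over, which is the crossover visible in Fig. 9.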
Moreover, Fig. 9 shows the throughput of both load sharing algorithms, CMT and PB-SACK.
The throughput for both algorithms is limited by the bandwidth of the bottleneck links and
fully exploits the link capacity for values of d2 ≤ 20 ms, and in the case of PB-SACK, even for
d2 ≤ 30 ms. Due to the interdependence between transmissions on both links in the loadsharing
scenario, the throughput for higher delays of link 2 decreases, even though d1 remains constant
at 10 ms.
Let rd be the ratio of d2 /d1 . For values of rd > 3, i.e. where the difference of link delays is
substantial, the higher delay of Path 2 affects the overall association throughput, which becomes
limited by the receiver window. Path 1 is blocked in this case. The receiver cannot free its buffers
at that time, as it needs to wait for earlier messages arriving on Path 2. Were we to use multiple
independent message streams, the throughput would be higher, as messages on different streams
could be delivered independently without blocking, thus freeing up the receiver window.
For high values of rd , e.g. rd ≥ 9, an association does not benefit from load sharing in the
case of two equal bandwidth links, as the throughput approaches the limit for a standard SCTP
association without loadsharing. Therefore, SCTP implementations should generally refrain from
using load sharing in this case. Still, over the whole parameter range, our path specific SACK
algorithm yields a higher throughput than the CMT algorithm.

Fig. 9 Application layer throughput of two load sharing algorithms: throughput [KByte/s] vs. delay of path 2 [ms], for PB-SACK, CMT, SCTP using only path 1, SCTP using only path 2, and the reference curve "Limit (2 Associations)"

Fig. 10 Maximum message delay of two load sharing algorithms: maximum message delay [s] vs. delay of path 2 [ms], for PB-SACK and CMT, with reference lines f(d2) = d2, f(d2) = 2d2 and f(d2) = 3d2
Figure 10 shows the values of the maximum message delay of the simulation for both load
sharing algorithms. These were also determined after eliminating the initial transient effects of
the simulation. It is obvious that the increase in throughput of the PB-SACK algorithm has not
been achieved at the cost of a substantially higher maximum message delay. Indeed, for values
of d2 < 50 ms, the PB-SACK performs somewhat better than the CMT algorithm. This is due to
the fact that for the PB-SACK algorithm, the growth of the congestion windows for both paths is
more in line with the delay exhibited by each path. The CMT algorithm can cause transmissions
of successive SACK chunks over path 2, which has the higher delay. In this case, the growth of the
congestion windows of both paths is likely to happen somewhat more slowly.
Fig. 11 Development of the aggregate CWND for two load sharing algorithms: aggregate congestion window vs. delay of path 2 [ms], for PB-SACK and CMT

For rd = 10, i.e. d2 = 100 ms, both algorithms reach a state where effective traffic load
distribution cannot be guaranteed any more (for this the difference in link characteristics has
become too significant). At that point, both algorithms allocate traffic almost exclusively to Path
1, and the maximum message delay equals the link delay of the slower path, i.e. it is 100 ms. For
higher values of d2 , outstanding data on the slower path blocks the sender from sending more data
(i.e. the receiver window is fully used). Therefore for increasing values of d2 , the throughput of
both variants approaches that of a single-homed association using only path 2, and the maximum
message delay increases strongly.
Finally, Fig. 11 shows the development of the aggregate congestion window for both algo-
rithms. As expected (throughput of PB-SACK is higher, its delay lower), the value of the aggregate
congestion window is higher for PB-SACK than for CMT for d2 < 70 ms. However, this is not the
case beyond d2 = 100 ms and the value is substantially smaller than that of the CMT algorithm
around d2 = 150 ms. This seems counterintuitive, as in this region the absolute throughput for
the PB-SACK algorithm exceeds that of the CMT algorithm by almost 500 KByte/s. This clearly
indicates that the growth of the aggregate congestion window is not the only major criterion that
should be optimized to achieve efficient load sharing as the results in [6] suggest. Moreover, for
high values of rd , i.e. rd > 10, the blocking effects of the limited receiver window become more
significant, whereas the sizes of the congestion windows are of lesser importance.
The more inhomogeneous the scenarios become, the more important it is that the load
sharing algorithm adapts well to the respective characteristics of the individual links in terms
of bandwidth and delay. This has been achieved by introducing path specific selective acknowl-
edgements. The argument also holds for heterogeneity caused by links with different capacity
within the network. Simulations have shown that also in these cases the path specific selective
acknowledgments result in a higher throughput compared to just optimizing the value for the
aggregate congestion window.

6. Conclusion and outlook

With a more widespread usage of SCTP and a growing variety of SCTP applications, the issue
of optimizing SCTP multi-homing performance and protocol variants that allow for effective
end-to-end load sharing will become increasingly interesting. In this respect, a wide range of
optimisations can lead to significant improvement of the transport protocol performance, but
also requires application specific tuning. Simulation results were presented for a scenario with
diverse path delay characteristics, in which the default behaviour of standard SCTP leads to
suboptimal results. By introducing a modification of the acknowledgement algorithms, a sub-
stantial improvement both in terms of throughput and message delay characteristics could be
achieved.
Assuming network scenarios with a certain degree of homogeneity with respect to link ca-
pacity and delay characteristics on redundant SCTP paths, load sharing mechanisms can yield
significant benefits. Therefore, such mechanisms have already been proposed for IETF stan-
dardization. While LS-SCTP introduces incompatible protocol extensions, CMT tries to adapt
SCTP flow control mechanisms to the specific requirements of a load sharing scenario with
the goal of optimizing the value of the aggregate congestion window of the association. How-
ever, in less homogeneous load sharing scenarios, it is advantageous to adapt the congestion
windows on a per path basis instead. This reasoning led to the definition of the novel load
sharing variant using path based selective acknowledgements (PB-SACK) presented in this pa-
per. The simulation results presented confirm that this algorithm provides better throughput and
end-to-end delay characteristics than CMT. Furthermore, with respect to standardization require-
ments and implementation complexity, it is quite similar to CMT. Therefore, when load sharing
extensions to SCTP are further discussed in the IETF working group, PB-SACK is one of the
candidate algorithms that should be considered.
While the simulations performed so far have confirmed the benefits of load sharing and
the superior performance of our PB-SACK algorithm with respect to the maximum achievable
throughput for given delay bandwidth combinations, additional simulations quantifying the av-
erage gain in more dynamic scenarios (bursty, non-saturated senders and interfering background
traffic in the network) should be performed. In addition, a study on how the extension of the
scenario to more than two network paths influences the results could provide some additional
insight.

References

1. A. Abd El Al, T. Saadawi and M. Lee. Load Sharing in Stream Control Transmission Protocol, May 2003.
draft-ahmed-lssctp-00.txt, Internet Draft, Work in Progress.
2. T. Dreibholz, A. Jungmaier and M. Tüxen. A new scheme for IP-based Internet mobility. In Proceedings of
the IEEE Conference on Local Computer Networks (LCN 2003), Bonn, October 2003.
3. T. George, B. Bidulock et al. SS7 MTP2-User Peer-to-Peer Adaptation Layer. IETF, Network Working Group,
September 2005. RFC 4165.
4. International Telecommunication Union. Signalling System No. 7 – Message Transfer Part Signaling Perfor-
mance, March 1993. ITU-T Recommendation Q.706.
5. International Telecommunication Union. Message Transfer Part Level 3 functions and messages using the
services of ITU Recommendation Q.2140, July 1996. ITU-T Recommendation Q.2210 (07/96).
6. J.R. Iyengar et al. Concurrent multipath transfer using SCTP multihoming. In SPECTS 2004, San Jose, July
2004.
7. J.R. Iyengar et al. Preventing SCTP Congestion Window Overgrowth During Changeover, February 2004.
draft-iyengar-sctp-cacc-02.txt, Internet Draft, Work in Progress.
8. A. Jungmaier, E.P. Rathgeb, M. Schopp and M. Tüxen. SCTP – a multi-link end-to-end protocol for IP-based
networks. AEU – International Journal of Electronics and Communications, 55(1) (2001) 46–54.
9. Andreas Jungmaier et al. SCTPLIB – an SCTP implementation, April 2005. For reference, see
http://freshmeat.net/projects/sctplib.
10. L. Ong, I. Rytina et al. Framework Architecture for Signaling Transport. IETF, Signaling Transport Working
Group, October 1999. RFC 2719.
11. K. Sklower et al. The PPP Multilink Protocol (MP). IETF, Network Working Group, August 1996. RFC 1990.

12. R. Stewart et al. SCTP Dynamic Address Reconfiguration. IETF, Network Working Group, November 2005.
draft-ietf-tsvwg-addip-sctp-13.txt, work in progress.
13. R. Stewart, M. Ramalho, Q. Xie, M. Tüxen and P. Conrad. Stream Control Transmission Protocol (SCTP)
Partial Reliability Extension. IETF, Network Working Group, May 2004. RFC 3758.
14. R. Stewart, Q. Xie et al. Stream Control Transmission Protocol. IETF, Signaling Transport Working Group,
October 2000. RFC 2960.
15. R. Stewart, Q. Xie, L. Yarroll, J. Wood, K. Poon, K. Fujita and M. Tüxen. Sockets API Extensions for Stream
Control Transmission Protocol. IETF, Network Working Group, September 2005. draft-ietf-tsvwg-sctpsocket-
11.txt, work in progress.
16. OPNET Technologies. OPNET Modeler, April 2005. Commercial simulation tool. See http://www.opnet.com/
products/modeler/home.html for further reference.
17. M. Tüxen et al. Requirements for Reliable Server Pooling. IETF, Network Working Group, January 2002. RFC
3237.
