Computer Networks
Transport Layer
Jamali@iust.ac.ir

Chapter 3 Outline

3. Introduction
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
    - segment structure
    - reliable data transfer
    - flow control
    - connection management
3.6 Principles of congestion control
3.7 TCP congestion control
3.8 Multimedia Stream & TCP
3.9 TCP fairness
3.10 TCP modeling
3.11 HTTP modeling

Protocols/Services

[Figure: the protocol stack (application, transport, network, link, physical), annotated with Application Program Services, Data Transport Services, hop-to-hop protocols, and end-to-end protocols.]

Data Transport Services 1

[Figure: two hosts, each with the stack application, transport, network, link, physical. The application software sits above an API; the part above the API is controlled by the application software, the part below it is controlled by the OS. The Data Transport Services are offered to the application.]

Data Transport Services 2

[Figure: application processes exchange applications' messages, objects, and files across the computer network through the Data Transport Services.]

Data Transport Services are provided to the application process. The main services are:
- Breaking down messages at the source and reassembling them at the destination.
- Source-to-destination routing: finding the path through the links and routers (switches) of the network.
- Source-to-destination (end-to-end) flow control, which lets a slow-running process communicate well with a fast-running process.
- Error detection and correction.
- ...

Services & Layer Protocols

Transport-layer protocols:
- They take care of application processes. The processes are distinguished by means of port numbers.
- They control flow intensity between the communicating processes.
- They apply an acknowledgment scheme to make inter-process communication reliable.

Network-layer protocols:
- They pass packets router by router, from the source host to the destination host. Hosts are distinguished by means of IP addresses.
- They do accounting for inter-host traffic.

Link-layer and physical-layer protocols:
- They move frames across links, repeaters, hubs, switches, and routers on the way from the source host to the destination host.
- They take care of channel coding and the error-correction system.
- They regulate flow intensity between adjacent intermediate systems.

[Figure: the Data Transport Services span the transport, network, link, and physical layers.]

Transport Layer vs. Network Layer

- network layer: logical communication between host and router, router and router, and router and host.
- transport layer: logical communication between processes.
    - relies on, and enhances, network-layer services
    - extends host-to-host communication to process-to-process communication

Computer network / university analogy: IUST students send letters to TU students.
- processes = students
- port number = student ID number
- application messages = letters in envelopes
- hosts = universities
- IP address = university's address
- transport protocol = post office of each university
- network-layer protocol = postal service of the state

3.1 Transport-layer services

Transport Services and Protocols

- provide logical communication between application processes running on different hosts
- transport protocols run in end systems
    - sending side: breaks app messages into segments, passes them to the network layer
    - receiving side: reassembles segments into messages, passes them to the application layer
- more than one transport protocol is available to applications
    - Internet: TCP and UDP

[Figure: logical end-to-end transport between two end systems running the full stack (application, transport, network, data link, physical); the intermediate node runs only network, data link, and physical.]

Protocol layering and data

[Figure: two application processes; the message gains a header Ha at the application layer and each part gains a header Ht at the transport layer.]

1. The application process decides to send a message to its counterpart.
2. The application layer adds its header and sends the message to the transport layer.
3. The transport layer breaks the message into several parts, adds its header to each part, and so makes segments.
4. It sends the segments one by one to the network layer.

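The segmentation and reassembly steps described above can be sketched as follows. This is only an illustration: the `Segment` class, its `seq` header field, and the `max_payload` parameter are assumptions made for the example, not the format of any real transport protocol.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # hypothetical transport header field: position of this part
    payload: bytes  # one part of the application message


def segment_message(message: bytes, max_payload: int) -> list[Segment]:
    """Sending side: break the message into parts and add a header to each."""
    return [
        Segment(seq=i, payload=message[off:off + max_payload])
        for i, off in enumerate(range(0, len(message), max_payload))
    ]


def reassemble(segments: list[Segment]) -> bytes:
    """Receiving side: order the parts by the header field and join them."""
    return b"".join(s.payload for s in sorted(segments, key=lambda s: s.seq))


message = b"Ha|hello from the application process"
segments = segment_message(message, max_payload=8)
# Hand the segments over in reverse order to show that the header restores order.
restored = reassemble(list(reversed(segments)))
```

The header carries just enough information for the receiving side to rebuild the message, which is the role the slide assigns to the transport-layer header.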
Internet Transport-Layer Protocols

- reliable, in-order delivery (TCP)
    - congestion control
    - flow control
    - connection setup
- unreliable, unordered delivery (UDP)
    - no-frills extension of best-effort IP
- services not available:
    - delay guarantees
    - bandwidth guarantees

[Figure: logical end-to-end transport between two end systems (application, transport, network, data link, physical) across intermediate nodes (network, data link, physical).]

3.2 Multiplexing and demultiplexing

Multiplexing/Demultiplexing: multiplexing

[Figure: host 1, host 2, and host 3, each with the stack application, transport, network, link, physical; processes P1 to P4 attach to the transport layer through sockets.]

Multiplexing at the sending host: gathering data from multiple sockets and enveloping the data with a header (later used for demultiplexing).

Multiplexing/Demultiplexing: demultiplexing

[Figure: the same three hosts and processes P1 to P4 as on the multiplexing slide.]

Demultiplexing at the receiving host: delivering received segments to the correct socket.

How Demultiplexing Works

- host receives IP datagrams:
    - each datagram has a source IP address and a destination IP address
    - each datagram carries 1 transport-layer segment
    - each segment has a source and a destination port number (recall: well-known port numbers for specific applications)
- host uses IP addresses and port numbers to direct the segment to the appropriate socket

TCP/UDP segment format (each row is 32 bits wide):
    source port # | dest port #
    other header fields
    application data (message)

Connectionless Demultiplexing 1

- Apps create sockets at the destination:
    sd1 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    bind(sd1, socket_address, socket_address_length);
- A UDP socket is identified by a 2-tuple:
    - dest. IP address
    - dest. port number
  That is: Socket = (dest IP address, dest port number)
- When a host receives a UDP segment, it:
    - checks the destination port number in the segment
    - directs the UDP segment to the socket (process) with that port number
- IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket (process).

Connectionless Demultiplexing 2

Server (running on IP address C):
1. create socket, port=x, for incoming request: serverSocket = DatagramSocket()
2. read request from serverSocket
3. write reply to serverSocket, specifying client host address and port number

Client (running on IP address A):
1. create socket: clientSocket = DatagramSocket()
2. create datagram, address it to (hostid, port=x), and send the datagram request using clientSocket
3. read reply from clientSocket
4. close clientSocket

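The client/server exchange above can be tried on a single machine. The sketch below uses Python's standard `socket` module rather than the Java-style `DatagramSocket` named on the slide, runs both ends in one process over the loopback address, and lets the OS choose the port; all of these are assumptions made to keep the example short.

```python
import socket

# Server: create a datagram socket bound to a port (port=x on the slide).
server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_socket.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server_socket.settimeout(2.0)
server_addr = server_socket.getsockname()

# Client: create a datagram socket; no connection setup is needed.
client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_socket.settimeout(2.0)

# Client addresses the datagram explicitly with (host, port) and sends it.
client_socket.sendto(b"request", server_addr)

# Server reads the request and learns the client's (IP, port) from it.
request, client_addr = server_socket.recvfrom(2048)

# Server writes the reply, specifying the client host address and port number.
server_socket.sendto(request.upper(), client_addr)

# Client reads the reply and closes its socket.
reply, _ = client_socket.recvfrom(2048)
client_socket.close()
server_socket.close()
```

Note that the server never calls anything like `accept()`: every datagram arriving at its port lands in the same socket, whoever sent it.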
Connectionless Demultiplexing 3

[Figure: client (IP: A), client (IP: B), and server (IP: C), with processes P1, P2, and P3.]
- server (datagram) socket: (C, 5193)
- client socket: (A, 4012)
- client socket: (B, 801)

Segment   SP     DP
A to C    4012   5193
C to A    5193   4012
B to C    801    5193
C to B    5193   801

Two arriving UDP segments with different source IP addresses or source port numbers, but the same destination port number, will be directed to the same socket.

Connection-Oriented Demultiplexing 1

- A TCP socket is identified by a 4-tuple:
    - source IP address
    - source port number
    - dest IP address
    - dest port number
- The receiving host uses all four values to direct the segment to the appropriate socket.
- A server host may support many simultaneous TCP sockets:
    - each socket is identified by its own 4-tuple
- Example: Web servers have a different socket for each connecting client
    - non-persistent HTTP will have a different socket for each request

Client/Server Socket Interaction: TCP

Server (running on IP address C):
1. create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2. wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket

Client (running on IP address A):
1. create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup with the server)
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket

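The TCP interaction above can also be reproduced in one process over the loopback address. This sketch uses Python's `socket` module instead of the Java-style `ServerSocket`/`Socket` classes named on the slide; the variable names mirror the slide, and the OS-chosen port is an assumption of the example.

```python
import socket

# Server: the welcoming socket listens on port x (the OS picks one here).
welcome_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome_socket.bind(("127.0.0.1", 0))
welcome_socket.listen(1)
welcome_socket.settimeout(2.0)
server_addr = welcome_socket.getsockname()

# Client: connect() triggers TCP connection setup (the three-way handshake).
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.settimeout(2.0)
client_socket.connect(server_addr)
client_socket.sendall(b"request")

# Server: accept() returns a new connection socket dedicated to this client.
connection_socket, client_addr = welcome_socket.accept()
connection_socket.settimeout(2.0)
request = connection_socket.recv(2048)
connection_socket.sendall(request.upper())

# Client reads the reply.
reply = client_socket.recv(2048)

# The connection socket's peer is exactly the client's (IP, port) pair.
same_peer = connection_socket.getpeername() == client_socket.getsockname()

for s in (client_socket, connection_socket, welcome_socket):
    s.close()
```

Unlike the UDP case, the server ends up with two sockets: the welcoming socket keeps waiting for new clients while the connection socket serves this one.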
Sockets in Connection-Oriented

[Figure: the client process (running on IP address A) has a client socket; the server process (running on IP address C) has a welcoming socket and a connection socket. A three-way handshake goes from the client socket to the welcoming socket, and bytes then flow between the client socket and the connection socket. Labels in the figure: client IP address & port number; server IP address & port number 1; server IP address & port number 2; client IP address & port number + 4-tuple identifier.]

Connection-Oriented Demultiplexing 2

[Figure: client (IP: A) and server (IP: C), with processes P1 and P2.]
- connection socket: (A, C, 1324, 2549)
- client socket: (C, A, 2549, 1324)

Segment   SP     DP
A to C    1324   80
C to A    2549   1324
A to C    1324   2549

Connection-Oriented Demultiplexing 3

[Figure: client (IP: A), client (IP: B), and server (IP: C), with processes P1 to P6.]

Segment   SP     DP
A to C    9157   2053
B to C    5775   2053
A to C    1807   2053

- A server host may support many simultaneous TCP sockets, with each socket attached to a process.
- Each socket is identified by its own 4-tuple.
- All 4 fields are used to direct (demultiplex) the segment to the appropriate socket.

In contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will be directed to two different sockets.

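The contrast between UDP and TCP demultiplexing can be modelled with two lookup tables. This is a toy model, not how an operating system stores its sockets; the host names, port numbers, and socket labels are illustrative values borrowed from the figures.

```python
# UDP sockets are keyed by (dest IP, dest port) only.
udp_sockets = {("C", 5193): "udp-socket"}

# TCP sockets are keyed by the full 4-tuple.
tcp_sockets = {
    ("A", 9157, "C", 2053): "tcp-socket-1",
    ("B", 5775, "C", 2053): "tcp-socket-2",
    ("A", 1807, "C", 2053): "tcp-socket-3",
}


def demux_udp(src_ip, src_port, dst_ip, dst_port):
    # The source fields are ignored when choosing the socket.
    return udp_sockets.get((dst_ip, dst_port))


def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    # All four fields take part in the lookup.
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


udp_from_a = demux_udp("A", 4012, "C", 5193)
udp_from_b = demux_udp("B", 801, "C", 5193)
tcp_from_a = demux_tcp("A", 9157, "C", 2053)
tcp_from_b = demux_tcp("B", 5775, "C", 2053)
```

Segments from A and B reach the same UDP socket, while the two TCP segments, which share a destination port, are delivered to different sockets.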
Connection-Oriented Demultiplexing 4

[Figure: threaded server. Client (IP: A), client (IP: B), and server (IP: C), with processes P1 to P4.]

Segment   SP     DP
A to C    9157   2053
B to C    5775   2053
A to C    1807   2053

3.3 Connectionless transport: UDP

UDP: User Datagram Protocol [RFC 768]

- simple Internet transport protocol
- best-effort service; UDP segments may be:
    - lost
    - delivered out of order to the app
- connectionless:
    - no handshaking between UDP sender and receiver
    - each UDP segment is handled independently of the others

Why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender, receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired

UDP Header

- often used for streaming multimedia apps
    - loss tolerant
    - rate sensitive
- other UDP uses
    - DNS
    - SNMP
- reliable transfer over UDP: add reliability at the application layer
    - application-specific error recovery!

UDP segment format (each row is 32 bits wide):
    source port # | dest port #
    length        | checksum
    application data (message)

Length: in bytes, of the UDP segment, including the header.

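The four 16-bit header fields can be packed and unpacked with Python's `struct` module. The helper names below are hypothetical, the port numbers are arbitrary, and the checksum is left at 0 as a placeholder rather than computed.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port: int, dst_port: int, data: bytes, checksum: int = 0) -> bytes:
    length = UDP_HEADER.size + len(data)  # length covers the header plus the data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + data


def parse_udp_segment(segment: bytes):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return src_port, dst_port, length, checksum, segment[UDP_HEADER.size:length]


segment = build_udp_segment(4012, 5193, b"hello")
fields = parse_udp_segment(segment)
```

The header is only 8 bytes long, which is the "small segment header" advantage mentioned for UDP.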
UDP Checksum

Goal: detect errors (e.g., flipped bits) in the transmitted segment.

Sender:
- treats the segment contents as a sequence of 16-bit integers
- checksum: the 1's complement of the 1's complement sum of the segment contents
- puts the checksum value into the UDP checksum field

Receiver:
- computes the checksum of the received segment
- checks whether the computed checksum equals the checksum field value:
    - NO: error detected
    - YES: no error detected

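The sender and receiver rules can be sketched over a plain list of 16-bit words. This is a simplification: the function names are hypothetical, and a real UDP checksum is computed over the whole segment together with a pseudo-header, which is not modelled here. The three sample words are arbitrary 16-bit values.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words):
    """Sender: 1's complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def receiver_ok(words, received_checksum):
    """Receiver: all words plus the checksum add up to all ones if no error is detected."""
    return ones_complement_sum(list(words) + [received_checksum]) == 0xFFFF


words = [0b0110011001100110, 0b0101010101010101, 0b0000111100001111]
c = checksum(words)
corrupted = [words[0] ^ 0b100] + words[1:]  # flip one bit in the first word
```

Checking that the sum equals all ones is equivalent to recomputing the checksum and comparing it with the received field.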
Checksum Example

Source Port #:                0110011001100110
Destin. Port #:               0101010101010101
Sum:                          1011101110111011
Length:                       0000111100001111
Sum:                          1100101011001010
Checksum (1's complement):    0011010100110101

Note that: Source Port # + Dest. Port # + Length + Checksum = 1111111111111111

Checksum Example: when the carry out of the msb is not zero

Note: when adding numbers, a carry out of the most significant bit needs to be added to the result.

Example: add two 16-bit integers

                  1110011001100110
                  1101010101010101
                1 1011101110111011
wraparound:                      1
Sum:              1011101110111100
Checksum:         0100010001000011

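The wraparound step can be checked with a few lines of integer arithmetic. The variable names are only for illustration; the two operands are 1110011001100110 and 1101010101010101.

```python
a = 0b1110011001100110
b = 0b1101010101010101

raw = a + b                             # 17-bit result: there is a carry out of the msb
wrapped = (raw & 0xFFFF) + (raw >> 16)  # add the carry back into the low 16 bits
check = ~wrapped & 0xFFFF               # 1's complement of the wrapped sum

print(format(raw, "017b"))
print(format(wrapped, "016b"))
print(format(check, "016b"))
```

Masking with `0xFFFF` keeps the low 16 bits and `raw >> 16` isolates the carry, so one addition performs the wraparound.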
3.4 Principles of reliable data transfer

Principles of Reliable Data Transfer

- Important in: app., transport, link layers.
- Characteristics of the unreliable channel will determine the complexity of the reliable data transfer (rdt) protocol.

[Figure: (a) service model and (b) service implementation, each drawn below the application layer.]

Reliable Data Transfer: getting started

Send side:
- rdt_send(): called from above (e.g., by the app.). Passed data to deliver to the receiver's upper layer.
- udt_send(): called by rdt to transfer a packet over the unreliable channel to the receiver.

Receive side:
- rdt_rcv(): called when a packet arrives on the rcv side of the channel.
- deliver_data(): called by rdt to deliver data to the upper layer.

Reliable Data Transfer: getting started (continued)

We will:
- incrementally develop the sender and receiver sides of a reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
    - but control info will flow in both directions!
- use finite state machines (FSM) to specify sender and receiver

[Figure: FSM notation. An arrow from state 1 to state 2 is labeled with the event causing the state transition above the line and the actions taken on the state transition below it. State: when in this state, the next state is uniquely determined by the next event.]

Reliable Data Transfer over Unreliable Channel

- rdt1.0: underlying channel perfectly reliable.
- rdt2.0: underlying channel may flip bits in a packet. ACK/NAK + Stop&Wait.
- rdt2.1: What happens if the ACK/NAK is corrupted?
    - Sender handles corrupted ACK/NAKs.
- rdt2.2: a NAK-free protocol.
    - Instead of a NAK, the receiver sends an ACK for the last pkt received OK.
    - A duplicate ACK at the sender results in the same action as a NAK: retransmit the current pkt.
- rdt3.0: channels with errors and loss (timer).
- Stop&Wait: performance is low.
- Pipelining increases the performance: Go-Back-N, Selective Repeat.

rdt1.0: a protocol for a completely reliable channel

- underlying channel perfectly reliable
    - no bit errors
    - no loss of packets
- separate FSMs for sender, receiver:
    - sender sends data into the underlying channel
    - receiver reads data from the underlying channel

Sender (one state, "Wait for call from above"):
    event:   rdt_send(data)
    actions: sndpkt = make_pkt(data)
             udt_send(sndpkt)
On the rdt_send(data) event, the sender creates a packet containing the data via the action make_pkt(data) and sends the packet via the action udt_send(sndpkt).

Receiver (one state, "Wait for call from below"):
    event:   rdt_rcv(rcvpkt)
    actions: extract(rcvpkt, data)
             deliver_data(data)
On the rdt_rcv(rcvpkt) event, the receiver removes the data from the packet via the action extract(rcvpkt, data) and passes the data up to the upper layer via the action deliver_data(data).

rdt2.0: channel with bit errors

- underlying channel may flip bits in a packet
    - recall: UDP checksum to detect bit errors
- the question: how to recover from errors:
    - acknowledgements (ACKs): receiver explicitly tells sender that the pkt was received OK
    - negative acknowledgements (NAKs): receiver explicitly tells sender that the pkt had errors ("Please repeat that.")
    - sender retransmits the pkt on receipt of a NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
    - error detection
    - receiver feedback: the receiver sends control messages (ACK, NAK) to the sender
- reliable data transfer based on retransmission is known as ARQ (Automatic Repeat reQuest)

rdt2.0: FSM specification (Stop&Wait)

Sender (two states):
  State "Wait for call from above":
    rdt_send(data)
      -> sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  State "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt)    (a NAK packet is received)
      -> udt_send(sndpkt); stay in "Wait for ACK or NAK"
    rdt_rcv(rcvpkt) && isACK(rcvpkt)    (an ACK packet is received)
      -> no action; go to "Wait for call from above"

Receiver (one state, "Wait for call from below"):
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
      -> extract(rcvpkt, data); deliver_data(data); udt_send(ACK)
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      -> udt_send(NAK)

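The stop-and-wait behaviour of rdt2.0 can be simulated in a few lines. This is a rough sketch under several assumptions: the toy checksum (byte sum modulo 256), the `flip_on_attempts` parameter that scripts which transmissions get a flipped bit, and the fact that ACK/NAK packets are never corrupted (the rdt2.0 assumption) are all choices made for the illustration.

```python
def make_pkt(data: bytes):
    return {"data": data, "checksum": sum(data) % 256}  # toy checksum, an assumption


def corrupt(pkt) -> bool:
    return sum(pkt["data"]) % 256 != pkt["checksum"]


def rdt20_transfer(messages, flip_on_attempts):
    """Stop-and-wait over a channel that may flip bits in data packets only."""
    delivered, attempts = [], 0
    for data in messages:                      # event: rdt_send(data)
        sndpkt = make_pkt(data)
        while True:                            # sender state: wait for ACK or NAK
            attempts += 1
            rcvpkt = dict(sndpkt)              # udt_send(sndpkt)
            if attempts in flip_on_attempts:   # the channel flips one bit
                first = rcvpkt["data"][0] ^ 1
                rcvpkt["data"] = bytes([first]) + rcvpkt["data"][1:]
            if corrupt(rcvpkt):
                continue                       # receiver sends NAK; sender retransmits
            delivered.append(rcvpkt["data"])   # extract and deliver_data, then ACK
            break                              # sender goes back to wait for call from above
    return delivered, attempts


delivered, attempts = rdt20_transfer([b"pkt-a", b"pkt-b"], flip_on_attempts={2})
```

Here the second transmission is corrupted, so the second packet is sent twice and three transmissions carry two packets.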
rdt2.0: operation with no errors

[Figure: the rdt2.0 FSM on its error-free path.]
1. Sender, on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); then waits for ACK or NAK.
2. Receiver, on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt, data); deliver_data(data); udt_send(ACK).
3. Sender, on rdt_rcv(rcvpkt) && isACK(rcvpkt): returns to "Wait for call from above".

rdt2.0: error scenario

[Figure: the rdt2.0 FSM on its error path.]
1. Sender, on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); then waits for ACK or NAK.
2. Receiver, on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK).
3. Sender, on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt), i.e. retransmits.
4. Receiver, on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt, data); deliver_data(data); udt_send(ACK).
5. Sender, on rdt_rcv(rcvpkt) && isACK(rcvpkt): returns to "Wait for call from above".

rdt2.0 has a fatal defect: ACK/NAK corruption

What happens if the ACK/NAK is corrupted?
- sender doesn't know what happened at the receiver!
- can't just retransmit: possible duplicate

What to do?
- sender ACKs/NAKs the receiver's ACK/NAK? What if the sender's ACK/NAK is lost?
- retransmit, but this might cause retransmission of a correctly received pkt!

Handling duplicates:
- sender adds a sequence number to each pkt
- sender retransmits the current pkt if the ACK/NAK is corrupted
- receiver discards (doesn't deliver up) a duplicate pkt

Stop and wait: the sender sends one packet, then waits for the receiver's response.

rdt2.1: Sender, handles corrupted ACK/NAKs

State "Wait for call 0 from above":
    rdt_send(data)
      -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 0"    (seq. no = 0)
State "Wait for ACK or NAK 0":
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt))
      -> udt_send(sndpkt); stay
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      -> no action; go to "Wait for call 1 from above"
State "Wait for call 1 from above":
    rdt_send(data)
      -> sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 1"    (seq. no = 1)
State "Wait for ACK or NAK 1":
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt))
      -> udt_send(sndpkt); stay
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      -> no action; go to "Wait for call 0 from above"

rdt2.1: Receiver, handles corrupted ACK/NAKs

State "Wait for 0 from below":
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
      -> extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 1 from below"
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      -> sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)    (duplicate)
      -> sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay
State "Wait for 1 from below":
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
      -> extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 0 from below"
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      -> sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)    (duplicate)
      -> sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay

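The interplay of the rdt2.1 sender and receiver can be simulated with a one-bit sequence number. The sketch below is an assumption-laden model: corruption is scripted through the hypothetical `corrupt_data_on` and `corrupt_ack_on` sets (numbered by transmission), and packets are represented only by their sequence bit and payload.

```python
def rdt21_transfer(messages, corrupt_data_on, corrupt_ack_on):
    """Alternating-bit stop-and-wait; data packets and ACK/NAKs may be corrupted."""
    delivered = []
    expected = 0      # receiver state: wait for 0 or wait for 1 from below
    seq = 0           # sender state: sequence number of the current packet
    transmission = 0
    for data in messages:
        while True:
            transmission += 1
            if transmission in corrupt_data_on:
                feedback = "NAK"               # receiver saw a corrupted packet
            else:
                if seq == expected:            # new packet: deliver it once
                    delivered.append(data)
                    expected ^= 1
                # a duplicate is not delivered again, but it is still ACKed
                feedback = "ACK"
            feedback_corrupted = transmission in corrupt_ack_on
            if feedback == "ACK" and not feedback_corrupted:
                seq ^= 1                       # move on to the next sequence number
                break
            # NAK or corrupted feedback: retransmit the current packet
    return delivered, transmission


delivered, sent = rdt21_transfer(
    [b"a", b"b", b"c"], corrupt_data_on={2}, corrupt_ack_on={3}
)
```

In this run the second packet is first corrupted, then delivered but its ACK is corrupted, so the fourth transmission is a duplicate that the receiver acknowledges without delivering it again.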
rdt2.1: discussion

Sender:
- seq # added to pkt
- two seq. #s (0, 1) will suffice. Why?
- must check whether the received ACK/NAK is corrupted
- twice as many states
    - state must remember whether the current pkt has seq. # 0 or 1

Receiver:
- must check whether the received packet is a duplicate
    - state indicates whether 0 or 1 is the expected pkt seq. #
- note: the receiver cannot know whether its last ACK/NAK was received OK at the sender

Jamali@iust.ac.ir Jamali@iust.ac.ir ITransport Layer ITransport Layer
3- 3-45 45

rdt2.2: a NAK-free protocol

- same functionality as rdt2.1, using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
- receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt
rdt2.2: sender, receiver fragments

Sender FSM fragment
State: Wait for call 0 from above
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    (next state: Wait for ACK 0)
State: Wait for ACK 0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    (no action; the sender moves on to seq # 1)

Receiver FSM fragment
State: Wait for 0 from below
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) )
    udt_send(sndpkt)
State: Wait for 1 from below
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK1, chksum)
    udt_send(sndpkt)
    (next state: Wait for 0 from below)
rdt3.0: Channels with errors and loss (Timer)

New assumption: underlying channel can also lose packets (data or ACKs)
- checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
- sender waits until it is certain that data or ACK was lost, then retransmits
- Timer drawbacks?

Approach: sender waits a "reasonable" amount of time for ACK
- retransmits if no ACK received in this time
- if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #s already handles this
  - receiver must specify seq # of pkt being ACKed
- requires countdown timer
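The timeout-and-retransmit approach above can be sketched as a small simulation. This is only an illustrative model, not the slide's FSM: the function name `stop_and_wait` and the `drop_data` / `drop_ack` parameters (indices of transmissions that the channel loses) are assumptions introduced here, and a "timeout" is modelled simply as the sender noticing that no ACK came back.

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Alternating-bit transfer over a lossy channel (hypothetical model).

    drop_data / drop_ack hold 0-based indices of data / ACK transmissions
    that the channel loses. Returns (delivered messages, data transmissions).
    """
    delivered = []
    expected = 0   # receiver: seq # it is waiting for
    seq = 0        # sender: seq # of the current packet
    data_tx = 0    # number of data packets put on the channel so far
    ack_tx = 0     # number of ACKs put on the channel so far

    for msg in messages:
        while True:
            data_lost = data_tx in drop_data
            data_tx += 1
            if data_lost:
                continue            # no ACK arrives: timeout, retransmit

            if seq == expected:     # new packet: deliver it once
                delivered.append(msg)
                expected ^= 1
            # otherwise it is a duplicate: do not deliver, just re-ACK

            ack_lost = ack_tx in drop_ack
            ack_tx += 1
            if ack_lost:
                continue            # ACK lost: timeout, retransmit
            break                   # ACK received: stop the timer
        seq ^= 1                    # move on to the other seq #
    return delivered, data_tx


delivered, transmissions = stop_and_wait(["a", "b", "c"],
                                         drop_data={1}, drop_ack={0})
print(delivered, transmissions)
```

In this run the first ACK and the first retransmission are lost, so packet "a" is sent three times but delivered only once, because the receiver uses the sequence bit to recognise the duplicate.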
rdt3.0: Sender

State: Wait for call 0 from above
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    (next state: Wait for ACK0)
  rdt_rcv(rcvpkt)
    (no action)

State: Wait for ACK0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    (next state: Wait for call 1 from above)

State: Wait for call 1 from above
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    (next state: Wait for ACK1)
  rdt_rcv(rcvpkt)
    (no action)

State: Wait for ACK1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    (next state: Wait for call 0 from above)
rdt3.0 in action
[Figure: four sender/receiver timelines: (a) operation with no loss, (b) lost packet, (c) lost ACK, (d) premature timeout]

Performance of Stop & Wait (rdt3.0)

- rdt3.0 works, but performance stinks
- example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

  $T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \mu\text{sec}$

- U_sender: utilization, the fraction of time the sender is busy sending

  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{15\ \text{ms} + 15\ \text{ms} + 0.008\ \text{ms}} = 0.00027$

- 1KB pkt every 30 msec = 33kB/sec throughput over 1 Gbps link
- network protocol limits use of physical resources!
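The utilization arithmetic for stop-and-wait can be checked with a few lines of Python. The function name `stop_and_wait_utilization` is an assumption made for this sketch; the numbers are the ones from the example (8000-bit packet, 1 Gbps link, 30 ms RTT).

```python
def stop_and_wait_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    t_transmit = packet_bits / rate_bps      # L / R, in seconds
    return t_transmit / (rtt_s + t_transmit)


u = stop_and_wait_utilization(packet_bits=8000, rate_bps=1e9, rtt_s=0.030)
print(f"U_sender = {u:.5f}")
# One 1 KB packet per (RTT + L/R) gives the effective throughput in bytes/sec.
print(f"throughput = {1000 / (0.030 + 8000 / 1e9):.0f} bytes/sec")
```

The second line of output is roughly 33 kB/sec, which is the figure quoted for a 1 Gbps link.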
rdt3.0: stop-and-wait operation

[Figure: sender/receiver timeline]
- first packet bit transmitted, t = 0
- last packet bit transmitted, t = L / R
- first packet bit arrives
- last packet bit arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver
- Two generic forms of pipelined protocols: go-Back-N, selective repeat

[Figure: data packets and ACK packets for (a) a stop-and-wait protocol in operation and (b) a pipelined protocol in operation]
Pipelining: Increasing Utilization

[Figure: sender/receiver timeline with three packets in flight]
- first packet bit transmitted, t = 0
- last bit transmitted, t = L / R
- first packet bit arrives
- last packet bit arrives, send ACK
- last bit of 2nd packet arrives, send ACK
- last bit of 3rd packet arrives, send ACK
- ACK arrives, send next packet, t = RTT + L / R

$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$

Increase utilization by a factor of 3!
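To see how utilization scales with the number of packets in flight, the formula can be generalised from 3 packets to n. The sketch below is an assumption-laden illustration: both function names are invented here, and times are given as integer microseconds (8 for L/R, 30000 for the RTT) so the ceiling division is exact.

```python
def pipelined_utilization(n_pkts, tx_us, rtt_us):
    """Utilization with n_pkts in flight per round, capped at 1.0."""
    return min(1.0, n_pkts * tx_us / (rtt_us + tx_us))


def window_to_fill_pipe(tx_us, rtt_us):
    """Smallest number of in-flight packets that keeps the sender always busy."""
    return -(-(rtt_us + tx_us) // tx_us)     # integer ceiling division


print(round(pipelined_utilization(3, 8, 30000), 4))
print(window_to_fill_pipe(8, 30000))
```

Under these assumptions, three packets in flight triple the stop-and-wait utilization, but several thousand would be needed to keep this particular link fully busy.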
Go-Back-N

Sender:
- k-bit seq # in pkt header
- "window" of up to N, consecutive unacked pkts allowed
- ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  - may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in window

[Figure: sequence-number line marked with send_base, nextseqnum and window size N. Regions: already ACKed; sent, not yet ACKed; usable, not yet sent; not usable. An ACK labelled 6 illustrates the cumulative ACK.]
GBN: Sender extended FSM

start
  base=1
  nextseqnum=1

State: Wait
  rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum) start_timer
        nextseqnum++
    }
    else
        refuse_data(data)
  timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    ...
    udt_send(sndpkt[nextseqnum-1])
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    (no action)

Notes:
- The window spans base to base+N; nextseqnum lies inside it.
- If a timeout occurs, the sender resends all packets that have been previously sent but that have not yet been acknowledged.
- If an ACK is received but there are still additional transmitted but yet to be acknowledged packets, the timer is restarted.
- The timer can be thought of as a timer for the oldest transmitted but not yet acknowledged packet.
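The sender FSM above translates fairly directly into code. The following is a minimal sketch, not a full protocol implementation: the class name `GBNSender`, the `send` callback standing in for `udt_send`, and the boolean `timer_running` standing in for a real countdown timer are all assumptions made for illustration, and sequence numbers are not wrapped to k bits.

```python
class GBNSender:
    """Go-Back-N sender state, following the extended FSM (sketch only)."""

    def __init__(self, window, send):
        self.N = window
        self.send = send            # hypothetical stand-in for udt_send(seq, data)
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt = {}            # seq # -> data kept for retransmission
        self.timer_running = False  # stand-in for start_timer / stop_timer

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse_data
        self.sndpkt[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        if self.base == self.nextseqnum:
            self.timer_running = True   # timer covers the oldest unACKed pkt
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum < self.base:
            return                  # old or duplicate ACK: nothing new is ACKed
        for seq in range(self.base, acknum + 1):
            self.sndpkt.pop(seq, None)  # cumulative ACK frees these packets
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.sndpkt[seq])  # go back N: resend all unACKed


sent = []
s = GBNSender(window=3, send=lambda seq, data: sent.append(seq))
for d in "abc":
    s.rdt_send(d)
s.on_ack(1)        # pkt 1 ACKed, window slides
s.rdt_send("d")
s.on_timeout()     # pkts 2, 3, 4 are resent
print(sent)
```

Ignoring ACKs below `base` is a choice made in this sketch; the slide's FSM simply recomputes `base` from every uncorrupted ACK.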
GBN: Receiver extended FSM

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
- may generate duplicate ACKs
- need only remember expectedseqnum

out-of-order pkt:
- discard (don't buffer) -> no receiver buffering!
- Re-ACK pkt with highest in-order seq #

start
  expectedseqnum=1
  sndpkt = make_pkt(expectedseqnum,ACK,chksum)

State: Wait
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++
  default
    udt_send(sndpkt)
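The receiver side is small enough to sketch in a few lines. The class name `GBNReceiver` and the convention of returning the ACK number from `rdt_rcv` are assumptions of this sketch; before any packet has been accepted it returns 0, meaning "nothing received in order yet".

```python
class GBNReceiver:
    """Go-Back-N receiver: accept only the expected packet, ACK cumulatively."""

    def __init__(self):
        self.expectedseqnum = 1
        self.delivered = []

    def rdt_rcv(self, seq, data, corrupt=False):
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)      # deliver_data
            self.expectedseqnum += 1
        # Otherwise the packet is discarded (no buffering).
        # Either way, ACK the highest in-order seq # received so far.
        return self.expectedseqnum - 1


r = GBNReceiver()
print(r.rdt_rcv(1, "a"), r.rdt_rcv(3, "c"), r.rdt_rcv(2, "b"))
print(r.delivered)
```

Packet 3 arrives out of order here, so it is dropped and ACK 1 is repeated; the sender would have to retransmit it after a timeout.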
GBN in action
[Figure: sender/receiver timeline over time showing go-back-N retransmission after a loss]

Selective Repeat

- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #s
  - again limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence-number space]
Selective repeat

sender
data from above:
- if next available seq # in window, send pkt
timeout(n):
- resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
- mark pkt n as received
- if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
- send ACK(n)
- out-of-order: buffer
- in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
- ACK(n)
otherwise:
- ignore
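The receiver rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `SRReceiver` is invented here, sequence numbers are plain increasing integers (no wraparound), and `on_packet` returns the ACK number to send or `None` when the packet is ignored.

```python
class SRReceiver:
    """Selective-repeat receiver: buffer out-of-order pkts, ACK individually."""

    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}        # seq # -> data waiting for in-order delivery
        self.delivered = []

    def on_packet(self, n, data):
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver every buffered packet that is now in order.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n            # send ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n            # already delivered: re-ACK so the sender advances
        return None             # otherwise: ignore


r = SRReceiver(window=4)
print(r.on_packet(0, "a"), r.on_packet(2, "c"), r.on_packet(1, "b"))
print(r.delivered, r.rcvbase)
```

Packet 2 is buffered until packet 1 arrives, at which point both are delivered and the window base moves to 3.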
Selective repeat in action
[Figure: sender/receiver timeline for selective repeat]
Selective repeat: dilemma

Example:
- seq #s: 0, 1, 2, 3
- window size=3
- receiver sees no difference in two scenarios!
- incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?
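One way to explore the question is to list which sequence numbers appear both in the receiver's old window and in the window it moves to after accepting a full window of packets. The helper name `ambiguous_seq_numbers` is an assumption of this sketch. Any overlap means a retransmitted old packet is indistinguishable from a new one.

```python
def ambiguous_seq_numbers(seq_space, window):
    """Seq #s shared by two consecutive receiver windows (mod seq_space)."""
    old = {n % seq_space for n in range(0, window)}
    new = {n % seq_space for n in range(window, 2 * window)}
    return sorted(old & new)


print(ambiguous_seq_numbers(seq_space=4, window=3))  # the slide's example
print(ambiguous_seq_numbers(seq_space=4, window=2))
```

With 4 sequence numbers, a window of 3 leaves 0 and 1 ambiguous, while a window of 2 leaves none. This matches the usual textbook answer that the window size should be at most half the sequence-number space.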

TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver
- no delay or bandwidth guarantee
- point-to-point:
  - one sender, one receiver
- Reliable:
  - guaranteed arrival
  - no error
  - in-order delivery
- in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- send & receive buffers

[Figure: the sending process writes data through its socket into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving process reads data through its socket.]
TCP Reliable Data Transfer

- TCP provides a reliable data transfer service on top of IP's unreliable service.
- Cumulative ACKs.
- Single retransmission timer.
- When the receiver receives out-of-order segments, it buffers them and re-ACKs the last in-order data.
- The sender retransmits at timeout or on receiving duplicate ACKs.
- Somewhere between Go-back-N and Selective Repeat.
TCP Segment Structure

Header layout (each row is 32 bits):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)

Field notes:
- sequence number: seq # is byte stream "number" of first data byte in segment
- head len: Header Length [in units of 4 Bytes]
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- Receive window: # bytes rcvr willing to accept
- checksum: TCP checksum (as in UDP)
- Options: Maximum Segment Size, window scaling factor, Timestamping, maximum segment length, RFCs: 854, 1323
TCP Segment Structure (con.)

(a) The application passes 6000 Byte data to the transport layer (TCP) with rdt_send(data).
(b) Data is broken into 6 1000-Byte segments, each carrying a TCP header. With data bytes numbered 1, 2, ..., the segments have Seq=001, Seq=1001, Seq=2001, Seq=3001, Seq=4001 and Seq=5001.
TCP seq#s and Ack#s

Seq. #s:
- byte stream "number" of first byte in segment's data
ACKs:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say, it is up to implementor.

Simple telnet scenario (Host A and Host B, time running downward):
1. User types 'C'. Host A sends Seq=42, Ack=79, data='C'.
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, Ack=43, data='C'.
3. Host A ACKs receipt of echoed 'C': Seq=43, Ack=80.
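The numbers in the telnet scenario follow mechanically from two rules: a segment's Seq is the number of its first data byte, and its Ack is the next byte expected from the other side. The sketch below reproduces them; the function name `echo_exchange` and the dictionary layout are assumptions introduced only for this illustration.

```python
def echo_exchange(a_seq, b_seq, payload_len=1):
    """Seq/Ack numbers for: A sends data, B echoes it, A ACKs the echo."""
    seg1 = {"from": "A", "seq": a_seq, "ack": b_seq, "len": payload_len}
    # B has received payload_len bytes from A, so it expects a_seq + payload_len.
    seg2 = {"from": "B", "seq": b_seq, "ack": a_seq + payload_len,
            "len": payload_len}
    # A's next byte starts after what it sent; it now expects B's next byte.
    seg3 = {"from": "A", "seq": a_seq + payload_len,
            "ack": b_seq + payload_len, "len": 0}
    return [seg1, seg2, seg3]


for seg in echo_exchange(a_seq=42, b_seq=79):
    print(seg["from"], "Seq =", seg["seq"], "Ack =", seg["ack"])
```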
TCP Round Trip Time and Timeout

TCP uses a timeout/retransmit mechanism to recover from lost segments.

Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss

Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
How to estimate max RTT?

- SampleRTT = propagation + queuing delay
- Queuing delay is highly variable,
- so different samples of RTT will give different random values of queuing delay.
- Chebyshev's Theorem:
  - MaxRTT = AverageRTT + k*DevRTT
  - Error probability is less than 1/(k**2)
  - Result true for ANY distribution of samples.
- In TCP:
  - RetransmissionTimeOut = AverageRTT + 4*DevRTT
Request for Comments: 2988, Nov 2000

- Until a round-trip time (RTT) measurement has been made for a segment sent between the sender and receiver, the sender should set
    RTO ← 3 secs.
- When the first RTT measurement SampleRTT is made, the host must set
    1. EstimatedRTT ← SampleRTT
    2. DevRTT ← SampleRTT/2
    3. RTO ← EstimatedRTT + max (G, 4*DevRTT).
- Experience has shown that finer clock granularities (G ≤ 100 msec) perform somewhat better than more coarse granularities.
Request for Comments: 2988, Nov 2000 (cont.)

- When a subsequent RTT measurement SampleRTT' is made, a host must set
    1. DevRTT ← (1 - β)*DevRTT + β*|EstimatedRTT - SampleRTT'|
    2. EstimatedRTT ← (1 - α)*EstimatedRTT + α*SampleRTT'
    3. RTO ← EstimatedRTT + max (G, 4*DevRTT)
- The value of EstimatedRTT used in the update to DevRTT is its value before updating EstimatedRTT itself using the second assignment.
- Whenever RTO is computed, if it is less than 1 second then the RTO should be rounded up to 1 second.
- The above should be computed using α=1/8 and β=1/4.
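The RFC 2988 update rules can be written as a small estimator. This is a sketch under assumptions: the class name `RtoEstimator`, the attribute names, and the default clock granularity of 0.1 s are choices made here for illustration; times are in seconds.

```python
class RtoEstimator:
    """Retransmission timeout computed from RTT samples (RFC 2988 rules)."""

    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self, granularity=0.1):
        self.G = granularity        # clock granularity (assumed value)
        self.estimated_rtt = None
        self.dev_rtt = None
        self.rto = 3.0              # before any measurement: RTO = 3 secs

    def update(self, sample_rtt):
        if self.estimated_rtt is None:
            # First measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT is updated first, using the old EstimatedRTT.
            self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                            + self.BETA * abs(self.estimated_rtt - sample_rtt))
            self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                                  + self.ALPHA * sample_rtt)
        rto = self.estimated_rtt + max(self.G, 4 * self.dev_rtt)
        self.rto = max(1.0, rto)    # round up to 1 second if smaller
        return self.rto


est = RtoEstimator()
for sample in (0.5, 0.3):
    print(round(est.update(sample), 4))
```

Note the order of the two assignments: swapping them would feed the already-updated EstimatedRTT into DevRTT, which the RFC text explicitly rules out.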
Example RTT estimation
[Figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds, axis from 100 to 350) against time (seconds, axis from 1 to 106)]

TCP reliable data transfer

- TCP creates rdt service on top of IP's unreliable service
- Pipelined segments
- Cumulative acks
- TCP uses single retransmission timer
- TCP is a GBN style protocol. RFC 2018 proposes a Selective Repeat style for TCP.
- Retransmissions are triggered by:
  - timeout events
  - duplicate acks
- Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
TCP sender events:

data rcvd from app:
- Create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unacked segment)
- expiration interval: RTO [TimeOutInterval]

timeout:
- retransmit segment that caused timeout
- restart timer

Ack rcvd:
- If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
TCP Sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
    switch(event)
    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running) start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
        retransmit not-yet-acknowledged segment with
            smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
- SendBase-1: last cumulatively ACKed byte
- Example: SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is ACKed
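The simplified sender can be sketched in Python as an object with one method per event. This is an illustration only: the class name `SimpleTcpSender`, the `send` callback standing in for "pass segment to IP", and the boolean `timer_running` are assumptions of the sketch, and duplicate ACKs, flow control and congestion control are ignored, as in the pseudocode.

```python
class SimpleTcpSender:
    """Simplified TCP sender: three events, one retransmission timer."""

    def __init__(self, initial_seq, send):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.send = send            # hypothetical stand-in for the IP layer
        self.unacked = {}           # seq # of first byte -> segment data
        self.timer_running = False

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True          # start timer
        self.send(self.next_seq, data)
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)            # smallest unACKed seq #
            self.send(seq, self.unacked[seq])
            self.timer_running = True          # restart timer

    def on_ack(self, y):
        if y > self.send_base:                 # ACK covers new data
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items()
                           if s + len(d) <= y]
            for s in fully_acked:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)


log = []
tcp = SimpleTcpSender(92, lambda seq, data: log.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)      # bytes 92..99
tcp.on_app_data(b"y" * 20)     # bytes 100..119
tcp.on_timeout()               # resends the segment starting at 92
tcp.on_ack(100)                # first segment ACKed
print(log, tcp.send_base, sorted(tcp.unacked))
```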
TCP: retransmission schemes
[Figure: Host A sends 1000 B data segments (bytes 1, 2, 3, ..., 1000 in the first, Seq=1) to Host B; each ACK from B carries Win=RcvWindow. Labels in the figure include Win=4000 [B]; Ack=4001, Win=10000; Ack=12001, Win=5000; a TimeOut for segment 12001; Ack=18001, Win=7000; Ack=20001, Win=5000; and Ack=25001, Win=000. ACKs from B are not detailed.]
TCP ACK generation [RFC 1122, RFC 2581]*

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at Receiver: Arrival of out-of-order segment higher than expected seq. #. Gap detected.
TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte.

Event at Receiver: Arrival of segment that partially or completely fills gap.
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
TCP retransmission (Normal ACK)*
[Figure: timeline between Host A and Host B. Host A sends 1000-byte segments (Seq=1, Seq=1001, Seq=2001, Seq=3001, Seq=4001, Seq=5001) in successive windows (Win1, Win2); Host B returns delayed ACKs within T<500ms (labels ACK=2001, ACK=4001, ACK=6001). The RTO timer for Seq=1 is shown, and SendBase moves from 1 to 4001.]
TCP retransmission (Lost ACK)*
[Figure: timeline between Host A and Host B. Host A sends Seq=1 (1000 bytes data), Seq=1001, Seq=2001 and Seq=3001 in windows Win1 to Win3; Host B ACKs within T<500ms (label ACK=4001) but an ACK is lost. The RTO timer for Seq=1 expires while SendBase=1, segments are retransmitted, and SendBase then moves to 4001.]
TCP retransmission (Premature ACK)*
[Figure: timeline between Host A and Host B. Host A sends Seq=1 (1000 bytes data), Seq=1001, Seq=2001 and Seq=3001, later Seq=4001 and Seq=5001, in windows Win1 to Win3; Host B ACKs within T<500ms (labels ACK=2001, ACK=4001). The RTO timer for Seq=1 is shown, and SendBase moves from 1 to 4001.]
TCP ACK Generation (Illustrated)*
[Figure: timeline between Host A and Host B, with segments Seq=2001, Seq=3001, Seq=4001, Seq=8001, Seq=9001 and Seq=12001 and ACKs Ack=3001, Ack=5001, 2 Acks=8001 (twice) and Acks=9001, each within T<500ms.]
- Sender keeps transmission based on Win & SendBase.
- All data up to seq=3001 are ACKed.
- All data up to seq=5001 are ACKed.
- Seq=8001 is expected; gap is detected (twice), producing duplicate ACKs.
- Received segment starts at lower end of the gap: immediately send ACK.
Duplicate ACKs

- TCP receiver sends an immediate ACK if it receives an out-of-order segment.
- This is a duplicate ACK.
- This dupe ACK informs the sender and tells it what sequence number the receiver expected.
- It's unclear whether dupe ACKs indicate loss or simply packet re-ordering on the network.
- But, multiple duplicate ACKs probably indicate loss.

Fast Retransmit

- Time-out period often relatively long:
  - long delay before resending lost packet
- Detect lost segments via duplicate ACKs.
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs.
- If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
Fast Retransmit (Illustrated)
[Figure: timeline between Host A and Host B. Host A sends Seq=1, Seq=1001, Seq=2001 and Seq=3001; Host B answers ACK=1001 and then repeats ACKs=1001 (numbered 1, 2, 3). Host A resends Seq=1001 while the RTO timer for Seq=1001 is still running.]
- Three ACKs=1001 means that the segment Seq=1001 is lost.
- Resend segment before timer expires.

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y is 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
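The duplicate-ACK counting in the algorithm above can be sketched as a tiny tracker. The class name `DupAckTracker` and the convention of returning the sequence number to resend (or `None`) are assumptions of this sketch. It follows the pseudocode literally, so the first ACK that advances SendBase is not counted as a duplicate.

```python
class DupAckTracker:
    """Counts duplicate ACKs and signals a fast retransmit on the third."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_count = 0

    def on_ack(self, y):
        """Return the seq # to fast-retransmit, or None."""
        if y > self.send_base:
            self.send_base = y      # new data ACKed
            self.dup_count = 0      # duplicates are counted per ACK value
            return None
        self.dup_count += 1         # duplicate ACK for already ACKed data
        if self.dup_count == 3:
            return y                # resend segment with sequence number y
        return None


tracker = DupAckTracker(send_base=1)
print([tracker.on_ack(1001) for _ in range(4)])
```

Here the first ACK=1001 is new, the next two are duplicates, and the third duplicate triggers a resend of the segment starting at byte 1001.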

TCP Flow Control

- receive side of TCP connection has a receive buffer: RcvBuffer
- Sender won't overflow receiver's buffer by transmitting too much, too fast.
- speed-matching service: matching the send rate to the receiving app's drain rate.
- App process may be slow at reading from buffer.

[Figure: data from IP enters RcvBuffer, which holds TCP data in buffer plus spare buffer (RcvWindow); data leaves to the app process.]
TCP Flow Control: how it works

(Suppose TCP receiver discards out-of-order segments)
- spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow

[Figure: data from IP enters RcvBuffer, which holds TCP data in buffer plus spare buffer (RcvWindow); data leaves to the app process.]
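The window arithmetic is easy to check numerically. In the sketch below the function names `rcv_window` and `can_send`, and the sender-side variables `last_byte_sent` and `last_byte_acked`, are assumptions introduced for illustration; the buffer size and byte counts are made-up example values.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)
print(can_send(last_byte_sent=5000, last_byte_acked=4000, nbytes=1000, window=win))
print(can_send(last_byte_sent=5000, last_byte_acked=4000, nbytes=1100, window=win))
```

With 2000 bytes still unread in a 4096-byte buffer, the advertised window is 2096 bytes, so a sender with 1000 unACKed bytes may send 1000 more but not 1100.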
TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP Connection Management (cont.)

Three way handshake (client on the left, server on the right, time running downward):
1. connection request: client sends SYN=1, Client ISN (no payload)
2. connection accepted: server sends ACK=1, SYN=1, Ser. ISN, Win=X (no payload)
3. connection ack.: client sends ACK=1, Win=Y, SN=ISN (payload optional)

Win = RcvWindow
ISN = Initial Sequence Number
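The three segments of the handshake can be written out with concrete numbers. This sketch is a model, not socket code: the function name `three_way_handshake`, the dictionary keys and the example ISN and window values are assumptions. It relies on the standard TCP rule that a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one.

```python
def three_way_handshake(client_isn, server_isn, server_window, client_window):
    """Return the SYN, SYNACK and ACK segments as dictionaries."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "ACK": 1, "seq": server_isn,
              "ack": client_isn + 1, "win": server_window}
    ack = {"ACK": 1, "seq": client_isn + 1,
           "ack": server_isn + 1, "win": client_window}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000,
                                   server_window=4000, client_window=8000):
    print(segment)
```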
TCP Connection Management (cont.)

Closing a connection:
Either of the two processes participating in a TCP connection can end the connection.
Example: client closes socket: ClientSocket.close();
Step 1: client end system sends TCP FIN control segment to server.

[Figure: client sends FIN=1 (close); server replies ACK=1 and then sends its own FIN=1 (close); client replies ACK=1, stays in timed wait (30, 60 or 120 sec.), then the connection is closed.]
FIN=1 means: no more data from sender.
Final ACK loss

- If the client's final ACK is lost, the server resends ACK and FIN.
- If the resent ACK and FIN reach the client before the timed wait expires, the client resends its final ACK and waits again.
- After the timed wait expires, all resources on the client side are released (including port numbers).
- The timed wait lasts 30, 60 or 120 sec. The time is implementation dependent.

[Figure: client sends FIN=1 (close); server replies ACK=1 and FIN=1 (close); the client's ACK=1 is lost during the timed wait, so the server resends FIN=1 with ACK=1 and the client answers with ACK=1 again.]
TCP Connection Management (cont.)

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
TCP Connection Management (cont.)
[Figure: TCP client lifecycle and TCP server lifecycle]
State Transition Diagram
[Figure: TCP state transition diagram. Edges are labelled EVENT/ACTION. Legend: normal path for a client; normal path for a server; unusual event.]
- States: CLOSED (start), LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK, CLOSED (back to start)
- Edge labels in the figure: Open/-, Connect/SYN, Send/SYN, SYN/SYN + ACK, SYN + ACK/ACK, ACK/-, RST/-, Close/-, Close/FIN, FIN/ACK, ACK + FIN/ACK, Timeout/-
- Annotations: passive open; active open; passive close; active close; steps 1, 2 and 3 of the 3-way handshake; timed wait of 30, 60 or 120 sec
TCP State diagram
[Figure: TCP state diagram with the states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, and LAST_ACK. Transition labels: "active open, create TCB, send SYN"; "passive open, create TCB"; "send SYN"; "receive SYN, send SYN, ACK"; "receive RST"; "receive ACK"; "receive SYN, ACK, send ACK"; "receive SYN, send ACK"; "applic. close, send FIN"; "receive FIN, send ACK"; "receive FIN, ACK, send ACK"; "applic. close or timeout, delete TCB"; "2MSL timeout, delete TCB"; "applic. close".]
There are several things that must be remembered about a connection. To store this information we imagine that there is a data structure called a Transmission Control Block (TCB).

Congestion: too many sources sending too much data too fast for the network to handle, all competing for bottleneck bandwidth.
Two common approaches:
rate-based: control the rate of traffic (e.g., token bucket)
window-based: limit the number of unacknowledged packets; the window size controls the rate
Flow control = prevents end-system buffer overflow
Window-based control can be used for both.
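The two approaches can be contrasted in a few lines. The sketch below is illustrative only: the class and function names, the token unit (one token per packet), and the parameter values are assumptions made for the example.

```python
class TokenBucket:
    """Rate-based control: a packet may leave only if enough tokens are available."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second (assumed: 1 token = 1 packet)
        self.capacity = capacity    # largest burst, in tokens
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # time of the previous refill, in seconds

    def allow(self, now, size=1):
        """Refill for the elapsed time, then try to spend `size` tokens."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


def window_allows(next_seq, oldest_unacked, window):
    """Window-based control: at most `window` packets may be unacknowledged."""
    return next_seq - oldest_unacked < window


bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]   # the fourth packet must wait
later = [bucket.allow(now=1.0) for _ in range(3)]   # one second refills two tokens
print(burst, later)
```

The token bucket bounds the long-term rate directly, while the window check bounds the data in flight, so its rate depends on how quickly acknowledgments return.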


Principles of Congestion Control
[Figure: three sources and three sinks connected through 100 Mbps and 10 Mbps links that share a 1.5 Mbps bottleneck link.]
Congestion: A Close-up View
knee: point after which
throughput increases very slowly
delay increases fast
cliff: point after which
throughput starts to decrease very fast to zero (congestion collapse)
delay approaches infinity
Note (in an M/M/1 queue): delay = 1/(1 - utilization), normalized to the mean service time
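A quick numeric look at the M/M/1 relation shows why the delay curve bends so sharply. The function name is hypothetical, and the result is expressed in multiples of the mean service time.

```python
def normalized_delay(utilization):
    """M/M/1 delay in units of the mean service time: 1 / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


for rho in (0.10, 0.50, 0.90, 0.99):
    print(f"utilization {rho:.2f}: delay factor {normalized_delay(rho):.1f}")
```

Going from 50% to 90% utilization multiplies the delay by about five, and the last few percent before saturation multiply it again by about ten.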


[Figure: delay versus offered load, and throughput versus offered load, with the knee, the cliff, packet loss, and congestion collapse marked.]

Congestion Control vs. Congestion Avoidance
Congestion control goal: stay left of the cliff
Keeps the network operating at full capacity while minimizing packet loss, which maximizes goodput.
Congestion avoidance goal: stay left of the knee
Right of the cliff: congestion collapse
[Figure: throughput versus offered load, with the knee, the cliff, packet loss, and congestion collapse marked.]
Congestion Collapse
Definition: an increase in network load results in a decrease of useful work done.
Many possible causes:
Spurious retransmissions of packets still in flight
Undelivered packets: packets consume resources and are dropped elsewhere in the network
Fragments: mismatch of transmission and retransmission units
Control traffic: a large percentage of traffic is for control
Stale or unwanted packets: packets that are delayed on long queues

Causes/Costs of Congestion: scenario 1
Applications in A and B send into the connection at an average rate of λ_in bytes/sec.
"Original" data: sent into the socket only once.
Simple transport protocol: no error recovery (retransmission), flow control, or congestion control.
[Figure: Host A and Host B send at λ_in (original data rate, B/s) through a router with unlimited shared output link buffers onto a shared link of capacity R B/s; λ_out is the delivered rate.]
Causes/Costs of Congestion: scenario 1 (cont)
[Figure (a): per-connection throughput λ_out [Byte/s] versus offered load λ_in [Byte/s]. λ_out = λ_in up to R/2, then λ_out = R/2.]
[Figure (b): per-connection delay (ms) versus offered load λ_in [Byte/s], with R/2 marked on the load axis.]

Scenario 2: Two Senders, a Router with Finite Buffers
one router, finite buffers
sender retransmission of lost packets
[Figure: Host A and Host B send through a router with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data (offered load to the network); λ_out: delivered rate.]

Scenario 2: (cont)
Each connection is reliable. If a packet containing a transport-level segment is dropped at the router, it will eventually be retransmitted by the sender.
λ_in [Bytes/sec] = rate at which the application sends original data into the socket.
λ'_in [Bytes/sec] = offered load to the network (containing original data or retransmitted data).
[Figure: throughput λ_out [Byte/s] versus offered load λ'_in [Byte/s]; axis marks at R/4, R/3, and R/2.]
Example: at λ'_in = R/2, λ_out = R/3.
λ'_in = R/2 = 0.333R Bytes/sec (on average) of original data + 0.167R Bytes/sec (on average) of retransmitted data.
Cost of a congested network: the sender must perform retransmissions in order to compensate for packets dropped (lost) due to buffer overflow.
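The split of the offered load in this example is plain arithmetic, shown here with exact fractions. Rates are expressed in units of R, and the function name is an assumption made for the sketch.

```python
from fractions import Fraction


def retransmission_split(offered, delivered):
    """Return (retransmitted rate, fraction of the offered load that is retransmission)."""
    retransmitted = offered - delivered
    return retransmitted, retransmitted / offered


offered = Fraction(1, 2)     # offered load R/2, in units of R
delivered = Fraction(1, 3)   # throughput R/3, in units of R
retx, share = retransmission_split(offered, delivered)
print(f"retransmitted: {retx} R (about {float(retx):.3f} R)")
print(f"share of offered load spent on retransmissions: {share}")
```

So at this operating point one third of everything the senders push into the network is repeated data.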
Scenario 2: retransmission due to lost packet (perfect retransmission)
Scenario 2: retransmission due to delayed (not lost) packet
Each packet is assumed to be forwarded (on average) twice by the router.
[Figure: throughput λ_out [Byte/s] versus offered load λ'_in [Byte/s]; axis marks at R/4, R/3, and R/2.]

Scenario 2: retransmission due to delayed (not lost) packet
The sender times out and retransmits a packet that has been delayed in the queue but not yet lost.
Both the original data packet and the retransmission may reach the receiver. The receiver will discard the retransmission.
The "work" done by the router in forwarding the retransmitted copy of the original packet was "wasted", as the receiver will have already received the original copy of this packet.
Cost of a congested network: unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.

Causes/Costs of Congestion: scenario 3
Suppose:
Each host uses a timeout/retransmission mechanism.
All hosts have the same value of λ_in.
All router links have capacity R Bytes/sec.
Paths: A-C: R1, R2; C-A: R3, R4; D-B: R3, R2; B-D: R1, R4.
[Figure: Hosts A, B, C, and D connected through routers R1 to R4 with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate.]

Causes/Costs of Congestion: scenario 3 (cont)
Extremely small λ_in: buffer overflows are rare, and λ_out = λ_in.
Larger λ_in: the overflows are still rare. Thus, an increase in λ_in results in an increase in λ_out.
Causes/Costs of Congestion: scenario 3 (cont)
Extremely large λ_in (and hence λ'_in): A-C traffic arriving at R2 is at most R.
If λ'_in is extremely large for all connections, then the arrival rate of B-D traffic at R2 can be much larger than that of the A-C traffic.
As the offered load approaches infinity, an empty buffer at R2 is immediately filled by a B-D packet, and the throughput of the A-C connection at R2 goes to zero.
When a packet is dropped (in R2), any upstream (R1) transmission capacity used for that packet was wasted!
As the offered load approaches infinity, the throughput goes to zero.
Scenario 3 (cont)
[Figure: throughput λ_out [Byte/s] versus offered load λ'_in [Byte/s], with R/2 marked on the axes; topology inset with Hosts A, B, C, and D and routers R1, R2, and R4.]
A-C and B-D traffic compete at router R2 for the buffer. The A-C traffic that successfully gets through R2 becomes smaller and smaller as the offered load from B-D gets larger and larger.
Congestive collapse: although the network links are being heavily utilized, very little useful work is being done.
Approaches towards congestion control
Two broad approaches towards congestion control:
end-to-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECnet, TCP/IP ECN, ATM)
explicit rate sender should send at
Congestion Control
Lack of congestion control
[Figure: throughput versus offered load for a controlled and an uncontrolled network.]
Load, delay and power
Typical behavior of queueing systems:
[Figure: average packet delay versus offered load.]
A simple metric of how well the network is performing: Power = Load / Delay
[Figure: power versus load, with the optimal load marked.]
Congestion Avoidance
Drops are the only widely used indicator of congestion
TCP: drops and retransmissions
[Figure, panel "Congestion Avoidance": load and goodput (Kbytes/sec) versus time under TCP's congestion avoidance (Jacobson).]
[Figure, panel "Congestion Collapse": load and goodput (Kbytes/sec) versus time during congestion collapse.]

TCP in action
[Figure: animation frames of TCP sources sending through an inbound link, a router queue (AQM), and an outbound link to their sinks. ACKs return to the sources; when the queue overflows a packet is dropped ("Drop!!!") and a congestion notification reaches the source.]
FLAVORS OF TCP
As far as congestion control is concerned, there are several TCP flavors:
TCP Tahoe
TCP Reno
TCP New Reno
TCP SACK
TCP Vegas
Transaction TCP (T/TCP), RFC 1644
T/TCP
Problem: the TCP three-way handshake is expensive for very short connections (like RPC or web requests)
Approach: Transaction TCP
send SYN+ACK+data in first packet
reply with SYN+ACK+FIN+data
then ACK+FIN
Limitations:
has to cache ISN (Initial Sequence Number) information, and may have to fall back to the three-way handshake sometimes
experimental only, not deployed, not clearly bug free

TCP Congestion Control
end-end control (no network assistance)
CongWin is dynamic, a function of perceived network congestion
manifestations:
lost packets (buffer overflow at routers)
long delays (queuing in router buffers)
How does the sender perceive congestion?
loss event = timeout or 3 duplicate ACKs
TCP sender reduces rate (CongWin) after a loss event
three mechanisms:
slow start
AIMD
conservative after timeout events

TCP Congestion Control and Slow Start
[Figure: congestion window size [MSS] (0 to 14) versus number of transmission (1 to 15) for TCP Tahoe and TCP Reno (RFC 2581), with the threshold marked and the slow start and congestion avoidance phases labelled; 3 duplicate ACKs arrive after the eighth transmission.]
Congestion window size:
Initial threshold = 8 MSS
Slow start (exponential increase): hits the threshold at the fourth transmission
Retransmission timeout:
New CongWin = 1
New threshold = 12/2
Window then grows exponentially
3 duplicate ACKs after the eighth transmission (TCP Tahoe):
New CongWin = 1
New threshold = 12/2
Window then grows exponentially
3 duplicate ACKs after the eighth transmission (TCP Reno):
New CongWin = 12/2
Window then grows linearly
3 duplicate ACKs indicate that the network is capable of delivering some segments; a timeout before 3 duplicate ACKs is more alarming.
Slow Start
[Figure: rate versus time t. Exponential slow start until a timeout; the rate is halved; slow start is then in operation until it reaches half of the previous cwnd. Figure credit: Nick McKeown.]
Why is it called slow start? Because TCP originally had no congestion control mechanism. The source would just start by sending a whole window's worth of data.
TCP Congestion Control - 2
Slow-start and Congestion Avoidance
[Figure: cwnd versus time. Slow start: cwnd goes 1, 2, 4, ... once per RTT. Congestion avoidance: cwnd goes from W to W+1 in one RTT.]
TCP-Reno Behavior
[Figure: cwnd versus time for TCP Reno: slow start, congestion avoidance, a packet loss, a timeout, and slow start again.]
Slow Start Sequence Plot: Window Doubles every Round
[Figure: sequence plot of cwnd versus time during slow start.]
Congestion Avoidance Sequence Plot
Window grows by 1 every round
[Figure: sequence plot of cwnd versus time during congestion avoidance.]
Fast Retransmit
[Figure: cwnd versus time; a loss (X) produces duplicate ACKs, followed by a retransmission.]
Cwnd of TCP
[Figure: cwnd over time with the slow start, fast recovery, and congestion avoidance phases.]
Queue Size
[Figure: queue size over time, with "queue empty", "queue not full", and "queue full" regions.]
TCP-Reno AIMD (Additive-Increase, Multiplicative-Decrease)
TCP sources change the sending rate by modifying the window size:
Window = min {Advertised window, Congestion window}
In other words, send at the rate of the slowest component: network or receiver.
cwnd follows additive increase/multiplicative decrease (AIMD)
[Figure: receive window and cwnd.]
TCP-Reno AIMD
multiplicative decrease: cut CongWin in half after a loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: CongWin versus time for a long-lived TCP connection, with axis marks at 8, 16, and 24 Kbytes; the receive window, the limitation of the network, and the optimal (average) window size are marked.]

TCP-Reno Slow Start (more)
When connection begins, CongWin = 1 MSS
Example: MSS = 500 B (4000 b), RTT = 200 msec
initial rate = 4000 b / 200 msec = 20 kbps
When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart, as ACKs return; rate = 20 kbps, 40 kbps, 80 kbps.]
Rate(t) = MSS * CongWin(t) / RTT [Bytes/sec]
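The slide's numbers can be reproduced directly from the rate formula. In this sketch CongWin is counted in segments and the result is converted to bits per second to match the kbps figures; the function names are illustrative.

```python
def rate_bps(mss_bytes, congwin_segments, rtt_s):
    """Rate = MSS * CongWin / RTT, converted from bytes to bits per second."""
    return mss_bytes * 8 * congwin_segments / rtt_s


def slow_start_rates(mss_bytes, rtt_s, rounds):
    """CongWin doubles every RTT during slow start: 1, 2, 4, ... segments."""
    return [rate_bps(mss_bytes, 2 ** i, rtt_s) for i in range(rounds)]


rates = slow_start_rates(mss_bytes=500, rtt_s=0.2, rounds=3)
print([f"{r / 1000:.0f} kbps" for r in rates])
```

With MSS = 500 B and RTT = 200 msec, the first three rounds give the 20, 40, and 80 kbps shown in the figure.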
TCP sender congestion control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Loss event detected by triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
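The event table maps naturally onto a small class with one method per event. This is a minimal sketch of the table only, not a TCP implementation: the class name, method names, and default values (MSS = 1000 bytes, Threshold = 4000 bytes) are assumptions chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    mss: float = 1000.0          # bytes; illustrative value
    cong_win: float = 1000.0     # CongWin in bytes, starts at 1 MSS
    threshold: float = 4000.0    # Threshold in bytes; illustrative value
    state: str = "SS"            # "SS" (slow start) or "CA" (congestion avoidance)
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve the threshold and restart from 1 MSS in slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender()
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

Four new ACKs take the window from 1000 to 5000 bytes, which crosses the threshold and switches the sender to congestion avoidance.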
TCP throughput
What is the average throughput of TCP as a function of window size and RTT?
Ignore slow start.
Let W = MSS * CongWin be the window size when loss occurs.
When the window is W, throughput is W/RTT.
Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
Average throughput: 0.75 W/RTT
Example: MSS = 1500-byte segments, RTT = 100 ms, want 7.5 Gbps throughput
Requires window size CongWin = 83,333 in-flight segments
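The 83,333-segment figure follows from inverting the average-throughput relation. The sketch below assumes the 0.75 W/RTT model from this slide; the function name is hypothetical.

```python
def required_window_segments(target_bps, rtt_s, mss_bytes, avg_factor=0.75):
    """Window at loss time (in segments) needed for a given average throughput.

    Average throughput = avg_factor * W / RTT, with W = MSS * CongWin.
    """
    peak_bps = target_bps / avg_factor          # throughput when the window is W
    return peak_bps * rtt_s / (mss_bytes * 8)   # bits in flight / bits per segment


segments = required_window_segments(target_bps=7.5e9, rtt_s=0.1, mss_bytes=1500)
print(f"{segments:,.0f} in-flight segments")
```

An average of 7.5 Gbps corresponds to a peak of 10 Gbps, i.e. 1 Gbit in flight per 100 ms round trip, or about 83,333 segments of 1500 bytes.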


TCP Futures
Throughput in terms of loss rate e:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{e}}$$
For 1 Gbps throughput, RTT = 100 ms, and MSS = 1500 Byte: $e = 2.14 \times 10^{-8}$
New versions of TCP for high-speed needed!
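Solving the loss-rate relation for e reproduces the value on the slide. This is a small sketch under the stated formula; the function names are illustrative, and MSS is converted to bits so that the throughput is in bits per second.

```python
import math


def throughput_bps(loss_rate, rtt_s, mss_bytes):
    """Throughput = 1.22 * MSS / (RTT * sqrt(e)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


def loss_rate_for(target_bps, rtt_s, mss_bytes):
    """Invert the relation above: the loss rate e that yields the target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


e = loss_rate_for(target_bps=1e9, rtt_s=0.1, mss_bytes=1500)
print(f"e = {e:.3g}")
```

A loss rate near 2 in 100 million segments is far below what most paths can offer, which is the motivation for high-speed TCP variants.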
Problems with TCP-Reno
TCP Reno uses two mechanisms to detect packet losses:
Triple duplicate ACKs,
Timeout.
Triple duplicate ACKs often fail to be triggered due to either:
Losses in bursts,
Small window.
Timeout needs an unnecessarily long delay.
Congestion control in Reno:
Needs to create packet losses to find the available bandwidth of the connection,
Continually congests the network,
Creates losses for other connections sharing the link,
Oscillations.
Performance Evaluation of Vegas
1 MByte Transfer Over the Internet
Metric | Reno | Vegas-1,3 | Vegas-2,4
Throughput (KB/s) | 53.00 | 72.50 | 75.30
Throughput Ratio | 1.00 | 1.37 | 1.42
Retransmission (KB) | 47.80 | 24.50 | 29.30
Retransmission Ratio | 1.00 | 0.51 | 0.61
Coarse Timeouts | 3.30 | 0.80 | 0.90
Effect of Transfer Size Over the Internet
Metric | 1024 KB Reno | 1024 KB Vegas | 512 KB Reno | 512 KB Vegas | 128 KB Reno | 128 KB Vegas
Throughput (KB/s) | 53.00 | 72.50 | 52.00 | 72.00 | 31.10 | 53.10
Throughput Ratio | 1.00 | 1.37 | 1.00 | 1.38 | 1.00 | 1.71
Retransmission (KB) | 47.80 | 24.50 | 27.90 | 10.50 | 22.90 | 4.00
Retransmission Ratio | 1.00 | 0.51 | 1.00 | 0.38 | 1.00 | 0.17
Coarse Timeouts | 3.30 | 0.80 | 1.70 | 0.20 | 1.10 | 0.20
New Retransmission Mechanism: Vegas
Upon receiving a duplicate ACK or an ACK for a retransmitted packet, Vegas checks how much time has passed since the packet after the just-ACKed packet was sent.
If this interval is greater than the timeout value, the packet is retransmitted without waiting for triple duplicate ACKs.
CWND is decreased only if the retransmitted packet was sent after the last decrease.
Example:
Received ACK for packet 10 (packets 11 and 12 are in transit)
Send packet 13 (which is lost)
Received ACK for packet 11
Send packet 14
Received ACK for packet 12
Send packet 15 (which is also lost)
Should have gotten ACK for packet 13
Received duplicate ACK for packet 12 (due to packet 14)
Vegas checks the timestamp of packet 13 and decides to retransmit it (Reno would need to wait for the 3rd duplicate ACK)
Received ACK for packets 13 and 14
Since it is the 1st or 2nd ACK after a retransmission, Vegas checks the timestamp of packet 15 and decides to retransmit it (Reno would need to wait for 3 new duplicate ACKs)
[Figure: sender timeline of the events above, with two intervals of one RTT each marked.]
Multimedia Stream & TCP
Window control is not really appropriate for multimedia applications:
the time-scale is too short (~ RTT): the sender would constantly switch codecs, causing visible or audible transitions
TCP may start at or drop below the minimum codec rate.
Flow control is not needed, since the receiver will need to process data at the nominal (codec) rate.
The TCP reliability mechanism may impose additional delay (> 500 ms) on packet loss.
Thus, only want to maintain the same long-term rate as TCP:
no encouragement to mask file transfer as video
react to congestion and bandwidth bottlenecks.
TCP Friendliness
The Internet will soon begin to require applications either to use TCP or to perform congestion control themselves.
Applications that do not perform congestion control will be penalized, probably by having their packets preferentially dropped during times of congestion.
Applications that do perform congestion control are capable of running over a much wider range of bandwidths and are hence more useful in the Internet.
Any new congestion control must compete with TCP flows:
It should not clobber TCP flows and grab the bulk of the link.
It should also be able to hold its own, i.e. grab its fair share, or it will never become popular.
TCP Friendly Rate Control (TFRC)
[Figure: TCP and non-TCP flows sharing the Internet.]
Non-TCP applications mimic AIMD behavior, possibly with longer time scales
can also change the A and D parameters (GAIMD)
They send with rate B (TCP throughput equation, TCP Reno):
$$B = \frac{MSS}{RTT\sqrt{\frac{2be}{3}} + RTO\left(3\sqrt{\frac{3be}{8}}\right) e \left(1 + 32e^{2}\right)} \quad \text{[Byte/sec]}$$
Round-trip delay RTT
Packet size MSS [byte]
Loss event rate e (receiver feedback every RTT)
Retransmission timeout RTO, b = 2
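The throughput equation is easy to evaluate numerically. The sketch below implements the formula with the parameters listed on this slide; the function name and the sample values (MSS = 1500 bytes, RTT = 100 ms, RTO = 400 ms) are assumptions for illustration.

```python
import math


def tfrc_rate(mss_bytes, rtt_s, rto_s, e, b=2):
    """TCP-friendly sending rate B in bytes per second for loss event rate e."""
    denominator = (
        rtt_s * math.sqrt(2 * b * e / 3)
        + rto_s * (3 * math.sqrt(3 * b * e / 8)) * e * (1 + 32 * e ** 2)
    )
    return mss_bytes / denominator


for loss in (0.001, 0.01, 0.05):
    rate = tfrc_rate(mss_bytes=1500, rtt_s=0.1, rto_s=0.4, e=loss)
    print(f"e = {loss}: B = {rate:,.0f} Byte/sec")
```

The first term in the denominator dominates at low loss rates, while the RTO term takes over as e grows and timeouts become frequent.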


Internet Measurements
3 TCP connections and 1 TFRC connection
London (UCL) to Berkeley (ACIRI).
Throughput measured over 1 sec intervals
TFRC is much more stable than TCP.
Datagram Congestion Control Protocol
Delay-sensitive applications, such as streaming media, typically prefer timeliness to reliability.
These applications use UDP for transport and either implement their own congestion control mechanisms (a difficult task) or use no congestion control at all.
DCCP is a new transport protocol, currently being standardized by the IETF, that provides a congestion-controlled flow of unreliable datagrams.
DCCP is an unreliable transport protocol like UDP, but it has congestion control like TCP.
The protocol can be extended by adding new congestion control algorithms, such as TCP-Friendly Rate Control (TFRC), as profiles, in order to customize the congestion control for applications with different characteristics.
TCP Fairness
If N TCP sessions share a bottleneck link, then each session should get 1/N of the link capacity.
[Figure: four sources and four sinks sharing a 1.5 Mbps bottleneck link; the link delays shown range from 5 ms to 200 ms.]
Two competing sessions:
Additive increase gives a slope of 1 as throughput increases
Multiplicative decrease decreases throughput proportionally
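The convergence argument can be checked with a short simulation of two AIMD flows. This is a sketch under a simplifying assumption that is not stated on the slide: both flows detect loss in the same round whenever their combined rate exceeds the capacity. The function name and the numeric values are illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0, decrease=0.5):
    """Return the per-round throughputs of two AIMD flows on one bottleneck.

    Assumption: losses are synchronized, so both flows cut their rate in the
    same round whenever x1 + x2 exceeds `capacity`.
    """
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 * decrease, x2 * decrease     # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase     # additive increase
        history.append((x1, x2))
    return history


history = aimd_two_flows(x1=1.0, x2=80.0, capacity=100.0, rounds=500)
start_gap = abs(history[0][0] - history[0][1])
end_gap = abs(history[-1][0] - history[-1][1])
print(f"gap between the flows: {start_gap:.1f} -> {end_gap:.4f}")
```

Additive increase leaves the gap between the two rates unchanged and every multiplicative decrease halves it, so the pair drifts toward the equal-share line.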


[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line and the "full bandwidth utilization" line. Starting from point A, the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
Why is TCP fair?
Fairness (more)
Fairness and UDP
Multimedia apps often do not use TCP
do not want rate throttled by congestion control
Instead use UDP:
pump audio/video at constant rate, tolerate packet loss
Research area: TCP friendly
Fairness and parallel TCP connections
nothing prevents an app from opening parallel connections between 2 hosts.
Web browsers do this
Example: link of rate R supporting 9 connections;
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)!
TCP modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
TCP connection establishment
data transmission delay
slow start
Notation, assumptions:
Assume one link between client and server of rate R
S: segment size (bits)
O: object (file) size (bits)
no retransmissions (no loss, no corruption)
Window size:
First assume: fixed congestion window, W segments
Then dynamic window, modeling slow start
Fixed Congestion Window (1)
First case: WS/R > RTT + S/R
The ACK for the first segment in the window returns before a window's worth of data has been sent.
Delay = 2RTT + O/R
[Figure: client/server timing diagram for the first case.]
Fixed Congestion Window (2)
Second case: WS/R < RTT + S/R
The server waits for an ACK after sending a window's worth of data.
Delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K = O/(WS), rounded up to an integer, is the number of windows that cover the object.
[Figure: client/server timing diagram for the second case.]
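Both fixed-window cases fit in one function. This is a sketch of the two formulas above, using the slide's notation for the parameters; the function name and the sample numbers are assumptions, and the boundary case WS/R = RTT + S/R is treated as the first case because the stall time is zero there.

```python
import math


def fixed_window_delay(O, S, R, RTT, W):
    """Latency to fetch an object of O bits with a fixed window of W segments.

    O: object size (bits), S: segment size (bits), R: link rate (bits/sec),
    RTT: round-trip time (sec), W: window size (segments).
    """
    base = 2 * RTT + O / R
    if W * S / R >= RTT + S / R:
        return base                       # first case: the server never stalls
    K = math.ceil(O / (W * S))            # number of windows that cover the object
    stall = S / R + RTT - W * S / R       # idle time after each window but the last
    return base + (K - 1) * stall


small = fixed_window_delay(O=16000, S=1000, R=1_000_000, RTT=0.1, W=4)
large = fixed_window_delay(O=16000, S=1000, R=1_000_000, RTT=0.1, W=200)
print(f"W=4: {small:.3f} s, W=200: {large:.3f} s")
```

With a 4-segment window the server stalls three times waiting for ACKs; with a window large enough to keep the link busy, only the 2 RTT setup and the O/R transmission time remain.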
Suppose window grows according to slow start:
It Will shown that the delay for one object is:
wherePisthenumberoftimesTCPidlesatserver:
- whereQisthenumberoftimestheserveridles
iftheobjectwereofinfinitesize.
- andKisthenumberofwindowsthatcovertheobject.
R
S
R
S
RTT P
R
O
RTT Delay
P
) 1 2 ( 2

1
]
1

+ + +
} 1 , { min

K Q
P
TCP Delay Modeling: Slow Start
TCP Delay Modeling: Slow Start
TCP Delay Modeling: Slow Start (cont)

Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- Server idles P = 2 times

Delay components:
- 2RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start

Server idles: P = min{K-1, Q} times

[Figure: timing diagram, time at client versus time at server. The client initiates the TCP connection and requests the object (one RTT each). The server sends the first window (= S/R), second window (= 2S/R), third window (= 4S/R) and fourth window (= 8S/R), idling after the first and second windows; then transmission completes and the object is delivered.]

TCP Delay Modeling: Slow Start (cont)

- $\frac{S}{R} + RTT$ = the time from when the server begins to transmit the 1st segment of a window until the server receives an acknowledgment for that segment.
- $2^{k-1}\frac{S}{R}$ = total transmission time for the k-th window, which contains $m = 2^{k-1}$ segments.
- $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$ = idle time after the k-th window, where $[x]^+ = \max(x, 0)$.

$$\begin{aligned}
\text{delay} &= \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{idleTime}_p \\
&= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] \\
&= \frac{O}{R} + 2\,RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
\end{aligned}$$

[Figure: timeline of the k-th window of $m = 2^{k-1}$ segments. The server starts to send the window, the 1st through m-th segments are sent over $2^{k-1}S/R$, and the server receives the 1st ACK at $S/R + RTT$.]

TCP Delay Modeling: Slow Start (cont)

Recall K = number of windows that cover the object. How is K calculated?

$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \ge O\} \\
&= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
&= \min\{k : 2^k - 1 \ge O/S\} \\
&= \min\{k : k \ge \log_2(O/S + 1)\} \\
&= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$

TCP Delay Modeling: Slow Start (cont)

Calculation of Q, the number of idles for an infinite-size object. How is Q calculated?

$$\begin{aligned}
Q &= \max\left\{k : RTT + \frac{S}{R} - \frac{S}{R}\,2^{k-1} \ge 0\right\} \\
&= \max\left\{k : 2^{k-1} \le 1 + \frac{RTT}{S/R}\right\} \\
&= \max\left\{k : k \le \log_2\left(1 + \frac{RTT}{S/R}\right) + 1\right\} \\
&= \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1
\end{aligned}$$

Here $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$ is the idle time after the k-th window.

[Figure: timeline of the k-th window of m segments. The server starts to send the window, the 1st through m-th segments are sent over $2^{k-1}S/R$, and the server receives the 1st ACK at $S/R + RTT$.]

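The closed forms for K, Q, P and the slow-start delay translate directly into a few lines of Python. This is a sketch under the model's assumptions (single link of rate R, no loss); the function name and return shape are illustrative choices, not taken from the slides.

```python
import math

def slow_start_delay(O, S, R, RTT):
    """Return (K, Q, P, delay) for the slow-start delay model.

    O, S are in bits, R in bits/sec, RTT in seconds.
    """
    # K: number of windows (1, 2, 4, ... segments) needed to cover the object.
    K = math.ceil(math.log2(O / S + 1))
    # Q: number of times the server would idle for an infinitely large object.
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    # P: number of times the server actually idles.
    P = min(Q, K - 1)
    delay = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, delay

# The 15-segment example: S/R = 1 sec and RTT = 2 sec give K = 4, Q = 2, P = 2.
print(slow_start_delay(O=15000, S=1000, R=1000, RTT=2))
```

With S = 536 bytes, RTT = 100 msec, O = 5 kB and R = 1 Mbps, the same function gives K = 4, P = 3 and a delay of about 0.52 sec.
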
Examples (1)

Assumptions: S = 536 B, RTT = 100 msec, O = 100 kB, K = 8

R          O/R        P   Fixed Window   Slow Start
28 kbps    28.6 sec   1   28.8 sec       28.9 sec
100 kbps   8 sec      2   8.2 sec        8.4 sec
1 Mbps     800 msec   5   1 sec          1.5 sec
10 Mbps    80 msec    7   0.28 sec       0.98 sec

Examples (2)

Assumptions: S = 536 B, RTT = 100 msec, O = 5 kB, K = 4

R          O/R        P   Fixed Window   Slow Start
28 kbps    1.43 sec   1   1.63 sec       1.73 sec
100 kbps   0.4 sec    2   0.6 sec        0.76 sec
1 Mbps     40 msec    3   0.24 sec       0.52 sec
10 Mbps    4 msec     3   0.2 sec        0.5 sec

Examples (3)

Assumptions: S = 536 B, RTT = 1000 msec, O = 5 kB, K = 4

R          O/R        P   Fixed Window   Slow Start
28 kbps    1.43 sec   3   3.4 sec        5.8 sec
100 kbps   0.4 sec    3   2.4 sec        5.2 sec
1 Mbps     40 msec    3   2.0 sec        5.0 sec
10 Mbps    4 msec     3   2.0 sec        5.0 sec

Delay: Examples (1, 2, 3)

[Figure: two bar charts of delay in seconds over the groups F1, S1, F2, S2, F3, S3, with one series per link rate (28 kbps, 100 kbps, 1 Mbps, 10 Mbps). F: Fixed Window, S: Slow Start. Example 1: S = 536 B, RTT = 100 msec, O = 100 kB, K = 8. Example 2: S = 536 B, RTT = 100 msec, O = 5 kB, K = 4. Example 3: S = 536 B, RTT = 1000 msec, O = 5 kB, K = 4.]

TCP Send Rate (Throughput) 1

- We want to characterize the send rate of a bulk-transfer TCP flow as a function of packet loss and round-trip delay (RTT).
- Bulk transfer means a flow with a large amount of data to send, such as an FTP transfer.
- Given a model of the TCP send rate, we can define a TCP-friendly send rate for non-TCP flows, such as multimedia, that interact with TCP connections.

TCP Send Rate (Throughput) 2

- The model captures not only the behavior of the fast retransmit mechanism but also the effect of the time-out mechanism.
- The model is based on the Reno flavor of TCP, as it is one of the more popular implementations in the Internet today.
- We model the congestion avoidance behavior of TCP in terms of rounds.

Steady-State Model of TCP Throughput

Send rate of a bulk transfer TCP Reno flow:

$$B(e) \approx \min\left(\frac{W_{max}}{RTT},\ \frac{1}{RTT\sqrt{\frac{2be}{3}} + RTO\,\min\left(1,\ 3\sqrt{\frac{3be}{8}}\right) e\,(1 + 32e^2)}\right) \quad \text{[segments/sec]}$$

- Let b be the number of packets that are acknowledged by a received ACK. Many TCP receiver implementations send one cumulative ACK for two consecutive packets received, so b is typically 2.
- $W_{max}$ is the maximum window allowed by receiver and sender (typically 8 KB, 16 KB, or 32 KB).
- We define e to be the probability that a packet is lost. Assumption: e > 0.

Approximations

$$B(e) \approx \min\left(\frac{W_{max}}{RTT},\ \frac{1}{RTT\sqrt{\frac{2be}{3}} + RTO\,\min\left(1,\ 3\sqrt{\frac{3be}{8}}\right) e\,(1 + 32e^2)}\right)$$

For e < 0.148 and large $W_{max}$:

$$B(e) \approx \frac{1}{RTT\sqrt{\frac{2be}{3}} + 3\,RTO\sqrt{\frac{3be}{8}}\; e\,(1 + 32e^2)}$$

For e < 0.05:

$$B(e) = \frac{1}{RTT}\sqrt{\frac{3}{2be}} + o\left(\frac{1}{\sqrt{e}}\right) \approx \frac{1.22}{RTT\sqrt{e}} \quad (b = 1)$$

The notation f = o(g) means that (g > 0 and) (f/g) → 0. The notation o(g) indicates that the term is of a smaller order of magnitude than g.

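To make the send-rate formula and its small-loss approximation concrete, here is a minimal Python sketch. The function and parameter names are hypothetical, W_max is assumed to be given in segments so that the result is in segments/sec, and the simplified form takes b as a parameter (b = 1 reproduces the 1.22 constant).

```python
import math

def tcp_send_rate(e, rtt, rto, w_max, b=2):
    """Steady-state send rate B(e) in segments/sec (full formula).

    e: loss probability (0 < e <= 1), rtt/rto in seconds, w_max in segments.
    """
    if not 0 < e <= 1:
        raise ValueError("loss probability e must satisfy 0 < e <= 1")
    # Term from fast-retransmit (triple-duplicate-ACK) losses.
    td_term = rtt * math.sqrt(2 * b * e / 3)
    # Term from retransmission timeouts.
    to_term = rto * min(1.0, 3 * math.sqrt(3 * b * e / 8)) * e * (1 + 32 * e * e)
    return min(w_max / rtt, 1.0 / (td_term + to_term))

def tcp_send_rate_simple(e, rtt, b=1):
    """Small-loss approximation: sqrt(3 / (2 b e)) / RTT."""
    return math.sqrt(3 / (2 * b * e)) / rtt

# As the loss rate shrinks, the full formula saturates at W_max / RTT.
print(tcp_send_rate(1e-9, rtt=0.47, rto=3.2, w_max=12))
```
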
Total Time To Transfer

To transfer a file of O [bytes], the time is calculated by:

$$\text{delay} = 2\,RTT + \frac{O}{MSS \cdot B(e)}$$

In which:
- 2RTT is the connection setup time
- MSS [bytes] is the segment size

The throughput (achieved bandwidth) between source and destination:

$$R = \text{Throughput} = MSS \cdot B(e)$$

Comparison of network throughput and TCP throughput (send rate)

[Figure: log-log plot of throughput (segments per 100 sec, 1 to 10000) versus loss rate (0.001 to 1), with two curves: Network Throughput and TCP Throughput (Send Rate). Parameters: RTT = 0.470, RTO = 3.2, W_max = 12.]

Chapter 3 outline

3. Introduction
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
- segment structure
- reliable data transfer
- flow control
- connection management
3.6 Principles of congestion control
3.7 TCP congestion control
3.8 Multimedia Stream & TCP
3.9 TCP fairness
3.10 TCP modeling
3.11 http modeling

HTTP Modeling

Non-persistent HTTP issues:
- requires 2 RTTs per object for the TCP connection.
- Response time per object = O/R + 2RTT + sum of idle times

Persistent HTTP:
- server leaves the connection open after sending the response
- subsequent HTTP requests/responses between the same client/server are sent over the connection.

Persistent without pipelining:
- client issues a new request only when the previous response has been received
- one RTT for each referenced object

Persistent with pipelining:
- default in HTTP/1.1
- client sends requests as soon as it encounters a referenced object
- one RTT for all the referenced objects

HTTP Modeling

Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)

What is the response time (delay)?

Non-persistent HTTP:
- M+1 TCP connections in series

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 2(M+1)\,RTT$$

HTTP Modeling-Persistent

Persistent with pipelining HTTP:
- 2 RTT to request and receive the base HTML file
- 1 RTT to request all images (if all objects reside in the same server)

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 3\,RTT$$

HTTP Modeling-Nonpersistent

Non-persistent HTTP with X parallel connections:
- Suppose M/X is an integer.
- 1 TCP connection for the base file (2RTT).
- M/X sets of parallel connections for images: (M/X)(2RTT).

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 2\left(\frac{M}{X} + 1\right)RTT$$

Summary

Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)

What is the response time (delay)?

Non-persistent HTTP:

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 2(M+1)\,RTT$$

Persistent with pipelining HTTP:

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 3\,RTT$$

Non-persistent HTTP with X parallel connections:

$$\text{delay} = (M+1)\frac{O}{MSS \cdot B(e)} + 2\left(\frac{M}{X} + 1\right)RTT$$

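The three response-time formulas differ only in their RTT term, which a short Python sketch makes easy to compare. The function names are hypothetical, and `throughput` stands for MSS · B(e) expressed in the same units per second as O; that unit convention is an assumption made for the example.

```python
def nonpersistent_delay(M, O, rtt, throughput):
    """M+1 serial connections, each costing 2 RTT plus transfer time."""
    return (M + 1) * O / throughput + 2 * (M + 1) * rtt

def persistent_pipelined_delay(M, O, rtt, throughput):
    """2 RTT for the base file plus 1 RTT for all pipelined image requests."""
    return (M + 1) * O / throughput + 3 * rtt

def parallel_nonpersistent_delay(M, X, O, rtt, throughput):
    """Base file, then M/X rounds of X parallel connections (M/X assumed integer)."""
    return (M + 1) * O / throughput + 2 * (M / X + 1) * rtt

# Example: M = 10 images, X = 5, O = 40,000 bits, RTT = 0.1 sec, 1 Mbps throughput.
args = dict(M=10, O=40_000, rtt=0.1, throughput=1e6)
print(nonpersistent_delay(**args))
print(persistent_pipelined_delay(**args))
print(parallel_nonpersistent_delay(X=5, **args))
```

All three share the same transfer time, so the ordering is decided entirely by how many round trips each scheme spends.
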
Example-HTTP Response time (in seconds)

RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5, fixed congestion window.

- For low bandwidth, connection and response time are dominated by transmission time.
- Persistent connections only give a minor improvement over parallel connections.

[Figure: bar chart of response time (0 to 20 sec) at 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps for non-persistent, persistent-pipeline and parallel non-persistent HTTP.]

HTTP Response time (in seconds)

RTT = 1000 msec, O = 5 Kbytes, M = 10 and X = 5, fixed congestion window.

- For larger RTT, response time is dominated by TCP establishment and slow start delays.
- Persistent connections now give an important improvement, particularly in networks with a high delay-bandwidth product.

[Figure: bar chart of response time (0 to 70 sec) at 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps for non-persistent, persistent-pipeline and parallel non-persistent HTTP.]

Chapter 3: Summary

- principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet
  - UDP
  - TCP

Next:
- leaving the network edge (application, transport layers)
- into the network core

Evolution of TCP

- 1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
- 1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
- 1982: TCP & IP, RFC 793 & 791
- 1983: BSD Unix 4.2 supports TCP/IP
- 1984: Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
- 1986: Congestion collapse observed
- 1987: Karn's algorithm to better estimate round trip time
- 1988: Van Jacobson's algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
- 1990: 4.3BSD Reno: fast retransmit, delayed ACKs

BSD: Berkeley Software Distribution; BSD Socket Layer

TCP Through the 1990s

- 1993: TCP Vegas (Brakmo et al): real congestion avoidance
- 1994: ECN (Floyd): Explicit Congestion Notification
- 1994: T/TCP, RFC 1644 (Braden): Transaction TCP
- 1996: SACK TCP (Floyd et al): Selective Acknowledgement
- 1996: Hoe: improving TCP startup
- 1996: FACK TCP (Mathis et al): extension to SACK

Recommended Links

- Sally Floyd's Homepage: http://www.icir.org/floyd
- Sally Floyd maintains several excellent "pointers to literature" pages, including pointers to papers in the following research areas:
  - Changes proposed to TCP
  - Global Optimization with End-to-End Congestion Control
  - Measurement Studies of End-to-End Congestion Control in the Internet
  - Research Questions for the Internet
  - The Evolvability of the Internet Infrastructure
  - Layering and the Internet Architecture

Recommended Links

- TCP Friendly Page: http://www.psc.edu/networking/tcp_friendly.html
  - This Web site summarizes some of the recent work on congestion control algorithms for non-TCP based applications. It focuses on congestion control schemes that use the "TCP-friendly" equation (that is, maintaining the arrival rate to at most some constant over the square root of the packet loss rate).
- Research on TCP over Wireless Links: http://bbcr.uwaterloo.ca/~jpan/tcpair/
  - Wireless links, which often have high bit error rates, can wreak havoc when carrying TCP traffic. This page points to the numerous research papers that have tried to address the problems.

TCP REFERENCES:

[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September 1981.
[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of Defense, August 1984. This specification as amended by RFC-964 is intended to describe the same protocol as RFC-793 [TCP:1]. If there is a conflict, RFC-793 takes precedence, and the present document is authoritative over both.
[TCP:3] "Some Problems with the Specification of the Military Standard Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, November 1985.
[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, RFC-879, November 1983.
[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, July 1982.
[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM SIGCOMM-87, August 1987.
[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, August 1988.

Request for Comments: 1122; R. Braden, Editor; October 1989

References

A note on Internet Request for Comments (RFCs): Copies of Internet RFCs are maintained at multiple sites. The RFC URLs below all point into the RFC archive at the Information Sciences Institute (ISI), maintained by the RFC Editor of the Internet Society (the body that oversees the RFCs). Other RFC sites include http://www.faqs.org/rfc, http://www.pasteur.fr/other/computer/RFC (located in France), and http://www.csl.sony.co.jp/rfc/ (located in Japan). Internet RFCs can be updated or obsoleted by later RFCs. We encourage you to check the sites listed above for the most up-to-date information. The RFC search facility at ISI, http://www.rfc-editor.org/rfcsearch.html, will allow you to search for an RFC and show updates to that RFC.

[Ahn 1995] J. S. Ahn, P. B. Danzig, Z. Liu, and Y. Yan, "Experience with TCP Vegas: Emulation and Experiment," Proceedings of ACM SIGCOMM '95 (Boston, MA, Aug. 1995), pp. 185-195. http://www.acm.org/sigcomm/sigcomm95/papers/ahn.html
[Bertsekas 1991] D. Bertsekas and R. Gallager, Data Networks, 2nd Ed., Prentice Hall, Englewood Cliffs, NJ, 1991.
[Bochman 84] G. V. Bochmann and C. A. Sunshine, "Formal methods in communication protocol design," IEEE Transactions on Communications, Vol. COM-28, No. 4 (Apr. 1980), pp. 624-631.
[Brakmo 1995] L. Brakmo and L. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet," IEEE Journal of Selected Areas in Communications, Vol. 13, No. 8, pp. 1465-1480, Oct. 1995. ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps
[Cela 2000] F. Cela, "A quick Tour around TCP," http://www.ce.chalmers.se/%7Efcela/tcp-tour.html
[Chiu 1989] D. Chiu and R. Jain, "Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks," Computer Networks and ISDN Systems, Vol. 17, No. 1, pp. 1-14. ftp://netlab.ohio-state.edu/pub/jain/papers/cong_av.pdf
[Fall 1996] K. Fall, S. Floyd, "Simulation-based Comparisons of Tahoe, Reno and SACK TCP," ACM Computer Communication Review, Vol. 26, No. 3, pp. 5-21, July 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z
[Floyd TCP 1994] S. Floyd, "TCP and Explicit Congestion Notification," ACM Computer Communication Review, Vol. 24, No. 5, pp. 10-23, Oct. 1994. http://www.aciri.org/floyd/papers/tcp_ecn.4.ps.Z
[Floyd 1999] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp. 458-472.
[Floyd 2000] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based Congestion Control for Unicast Applications," Proceedings of ACM SIGCOMM '00 (Stockholm, Sweden, Aug. 2000).
[Heidemann 1997] J. Heidemann, K. Obraczka, and J. Touch, "Modeling the Performance of HTTP over Several Transport Protocols," IEEE/ACM Transactions on Networking, Vol. 5, No. 5 (Oct. 1997), pp. 616-630.
[Jacobson 1988] V. Jacobson, "Congestion Avoidance and Control," Proceedings of ACM SIGCOMM '88 (Stanford, CA, Aug. 1988), pp. 314-329. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
[Jain 1989] R. Jain, "A Delay-Based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks," ACM Computer Communications Review, Vol. 19, No. 5 (1989), pp. 56-71.
[Jain 1996] R. Jain, S. Kalyanaraman, S. Fahmy, R. Goyal, and S. Kim, "Tutorial Paper on ABR Source Behavior," ATM Forum/96-1270, Oct. 1996. http://www.cis.ohio-state.edu/~jain/atmf/a96-1270.htm
[Mathis 1996] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining TCP Congestion Control," Proceedings of ACM SIGCOMM '96 (Stanford, CA, Aug. 1996). http://www.acm.org/sigcomm/sigcomm96/papers/mathis.html
[Mahdavi 1997] J. Mahdavi and S. Floyd, "TCP-Friendly Unicast Rate-Based Flow Control," unpublished note, Jan. 1997. http://www.psc.edu/networking/papers/tcp_friendly.html
[Ramakrishnan 1990] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks," ACM Transactions on Computer Systems, Vol. 8, No. 2 (May 1990), pp. 158-181.
[RFC 793] J. Postel, "Transmission Control Protocol," RFC 793, Sept. 1981. http://www.rfc-editor.org/rfc/rfc793.txt
[RFC 1122] R. Braden, "Requirements for Internet Hosts--Communication Layers," RFC 1122, Oct. 1989. http://www.rfc-editor.org/rfc/rfc1122.txt
[RFC 1323] V. Jacobson, S. Braden, and D. Borman, "TCP Extensions for High Performance," RFC 1323, May 1992. http://www.rfc-editor.org/rfc/rfc1323.txt
[RFC 2018] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP Selective Acknowledgment Options," RFC 2018, Oct. 1996. http://www.rfc-editor.org/rfc/rfc2018.txt
[RFC 2481] K. K. Ramakrishnan and S. Floyd, "A Proposal to Add Explicit Congestion Notification (ECN) to IP," RFC 2481, Jan. 1999. http://www.rfc-editor.org/rfc/rfc2481.txt
[RFC 2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control," RFC 2581, Apr. 1999. http://www.rfc-editor.org/rfc/rfc2581.txt
[Rhee 1998] I. Rhee, "Error Control Techniques for Interactive Low-bit Rate Video Transmission over the Internet," Proceedings of ACM SIGCOMM '98, Vancouver BC (Aug. 31 - Sept. 4, 1998). http://www.acm.org/sigcomm/sigcomm98/tp/abs_24.html
[Schwartz 1982] M. Schwartz, "Performance Analysis of the SNA Virtual Route Pacing Control," IEEE Transactions on Communications, Vol. COM-30, No. 1 (Jan. 1982), pp. 172-184.
[Stevens 1994] W. R. Stevens, TCP/IP Illustrated, Vol. 1: The Protocols, Addison-Wesley, Reading, MA, 1994.
[Stone 1998] J. Stone, M. Greenwald, C. Partridge, and J. Hughes, "Performance of checksums and CRC's over real data," IEEE/ACM Transactions on Networking, Vol. 6, No. 5 (Oct. 1998), pp. 529-543.
[Stone 2000] J. Stone, C. Partridge, "When Reality and the Checksum Disagree," Proceedings of ACM SIGCOMM '00 (Stockholm, Sweden, Aug. 2000).
[Sunshine 1978] C. Sunshine and Y. K. Dalal, "Connection Management in Transport Protocols," Computer Networks, North-Holland, Amsterdam, 1978.
[Varghese 1997] G. Varghese and A. Lauck, "Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility," IEEE/ACM Transactions on Networking, Vol. 5, No. 6 (Dec. 1997), pp. 824-834.

Home Work 3

- Computer Networking, 3rd edition: 1-7-19-20-21-34-36-39
- jamali@iust.ac.ir, Subject: HW3 Student ID Number
- 2 .
- power point .

Control System Model [CJ89]

- Simple, yet powerful model
- Explicit binary signal of congestion

[Figure: users 1 to n send at rates $x_1, x_2, \ldots, x_n$; the network tests $\sum x_i > X_{goal}$ and returns the binary signal y to the users.]

D. Chiu and R. Jain, "Analysis of the increase and decrease algorithms for congestion avoidance in computer networks," Computer Networks and ISDN Systems, Volume 17, Issue 1 (June 1989), pages 1-14.

D. M. Chiu and R. Jain, "Analysis of Increase and Decrease Algorithms, Part III of Congestion Avoidance in Computer Networks with a Connectionless Network Layer," DEC Technical Report 509, August 1987.

Possible Choices

$$x_i(t+1) = \begin{cases} a_I + b_I\,x_i(t) & \text{increase} \\ a_D + b_D\,x_i(t) & \text{decrease} \end{cases}$$

- Multiplicative increase, additive decrease: $a_I = 0,\ b_I > 1,\ a_D < 0,\ b_D = 1$
- Additive increase, additive decrease: $a_I > 0,\ b_I = 1,\ a_D < 0,\ b_D = 1$
- Multiplicative increase, multiplicative decrease: $a_I = 0,\ b_I > 1,\ a_D = 0,\ 0 < b_D < 1$
- Additive increase, multiplicative decrease: $a_I > 0,\ b_I = 1,\ a_D = 0,\ 0 < b_D < 1$
- Which one?

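The linear control rule is a one-line update once the binary congestion signal is known. The sketch below is illustrative only: the function name and the particular coefficient values in the dictionary are hypothetical picks that satisfy the constraints for each of the four policies.

```python
def next_rate(x, congested, a_i, b_i, a_d, b_d):
    """One step of the linear control x(t+1) = a + b * x(t).

    Uses the decrease coefficients when the network signals congestion,
    and the increase coefficients otherwise.
    """
    if congested:
        return a_d + b_d * x
    return a_i + b_i * x

# Example coefficient choices (assumed values) for the four policies.
POLICIES = {
    "MIAD": dict(a_i=0, b_i=2, a_d=-1, b_d=1),
    "AIAD": dict(a_i=1, b_i=1, a_d=-1, b_d=1),
    "MIMD": dict(a_i=0, b_i=2, a_d=0, b_d=0.5),
    "AIMD": dict(a_i=1, b_i=1, a_d=0, b_d=0.5),
}

for name, coeffs in POLICIES.items():
    up = next_rate(10, False, **coeffs)
    down = next_rate(10, True, **coeffs)
    print(name, "increase ->", up, "decrease ->", down)
```
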
Multiplicative Increase, Additive Decrease

[Figure: phase plot of User 1's rate $x_1$ against User 2's rate $x_2$, with the fairness line and the efficiency line. Trajectory: $(x_{1h}, x_{2h})$, then $(x_{1h} + a_D,\ x_{2h} + a_D)$, then $(b_I(x_{1h} + a_D),\ b_I(x_{2h} + a_D))$.]

- Fixed point at $x_{1h} = x_{2h} = \dfrac{b_I a_D}{1 - b_I}$
- Fixed point is unstable!

Additive Increase, Additive Decrease

[Figure: same phase plot. Trajectory: $(x_{1h}, x_{2h})$, then $(x_{1h} + a_D,\ x_{2h} + a_D)$, then $(x_{1h} + a_D + a_I,\ x_{2h} + a_D + a_I)$.]

- Reaches stable cycle, but does not converge to fairness

Multiplicative Increase, Multiplicative Decrease

[Figure: same phase plot. Trajectory: $(x_{1h}, x_{2h})$, then $(b_D x_{1h},\ b_D x_{2h})$, then $(b_I b_D x_{1h},\ b_I b_D x_{2h})$.]

- Converges to stable cycle, but is not fair

Additive Increase, Multiplicative Decrease

[Figure: same phase plot. Trajectory: $(x_{1h}, x_{2h})$, then $(b_D x_{1h},\ b_D x_{2h})$, then $(b_D x_{1h} + a_I,\ b_D x_{2h} + a_I)$.]

- Converges to stable and fair cycle

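A two-user simulation illustrates why only additive increase with multiplicative decrease closes the gap between the users. This is a toy sketch: the capacity, starting rates, coefficient values and the synchronous feedback (both users see the same signal each step) are assumptions made for the example.

```python
def simulate(x1, x2, capacity, a_i, b_i, a_d, b_d, steps):
    """Run two users under the same linear control and binary feedback."""
    for _ in range(steps):
        congested = (x1 + x2) > capacity  # binary congestion signal y
        if congested:
            x1, x2 = a_d + b_d * x1, a_d + b_d * x2
        else:
            x1, x2 = a_i + b_i * x1, a_i + b_i * x2
    return x1, x2

# AIMD: every multiplicative decrease halves the gap between the users.
aimd = simulate(1.0, 9.0, capacity=10, a_i=1, b_i=1, a_d=0, b_d=0.5, steps=200)
# AIAD: both users move by the same amount, so the gap never changes.
aiad = simulate(1.0, 9.0, capacity=10, a_i=1, b_i=1, a_d=-1, b_d=1, steps=200)

print("AIMD gap:", abs(aimd[0] - aimd[1]))
print("AIAD gap:", abs(aiad[0] - aiad[1]))
```

Under AIMD the additive step leaves the difference x2 - x1 unchanged while each decrease scales it by b_D < 1, so the trajectory drifts toward the fairness line; under AIAD nothing ever shrinks the difference.
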
Modeling

- Critical to understanding complex systems
  - [CJ89] model still relevant after 15 years, a $10^6$ increase in bandwidth, and a 1000x increase in number of users
- Criteria for good models
  - Two conflicting goals: reality and simplicity
  - Realistic, complex model: too hard to understand, too limited in applicability
  - Unrealistic, simple model: can be misleading

TCP Congestion Control

- [CJ89] provides theoretical basis for basic congestion avoidance mechanism
- Must turn this into real protocol

TCP Congestion Control

- Maintains three variables:
  - cwnd: congestion window
  - flow_win: flow window; receiver advertised window
  - ssthresh: threshold size (used to update cwnd)
- For sending, use: win = min(flow_win, cwnd)

TCP: Slow Start

- Goal: reach knee quickly
- Upon starting (or restarting):
  - Set cwnd = 1
  - Each time a segment is acknowledged, increment cwnd by one (cwnd++).
- Slow Start is not actually slow
  - cwnd increases exponentially

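The exponential growth follows from the per-ACK increment: a full window of cwnd segments produces cwnd ACKs, so the window doubles every round trip. A minimal sketch, assuming one ACK per segment, no loss and no ssthresh limit (the function name is hypothetical):

```python
def slow_start_windows(rounds):
    """Return cwnd at the start of each round trip during slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd  # every segment in the window is acknowledged
        for _ in range(acks):
            cwnd += 1  # cwnd++ per ACK
        history.append(cwnd)
    return history

print(slow_start_windows(3))  # [1, 2, 4, 8]
```
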


Slow Start Example

- The congestion window size grows very rapidly
- TCP slows down the increase of cwnd when cwnd >= ssthresh

[Figure: sender/receiver exchange. cwnd = 1: segment 1, ACK 2. cwnd = 2: segments 2 and 3, ACK 4. cwnd = 4: segments 4 to 7, ACK 8. cwnd = 8.]

Congestion Avoidance

- Slow down Slow Start
- ssthresh is a lower-bound guess about the location of the knee
- If cwnd >= ssthresh then each time a segment is acknowledged, increment cwnd by 1/cwnd (cwnd += 1/cwnd).
- So cwnd is increased by one only after a full window of segments has been acknowledged.

Slow Start/Congestion Avoidance Example

Assume that ssthresh = 8

[Figure: cwnd (in segments, 0 to 14) versus round-trip times (t = 0 to t = 6). cwnd grows 1, 2, 4, 8 during slow start, reaches ssthresh, then grows 9, 10 in congestion avoidance.]

Putting Everything Together: TCP Pseudocode

    Initially:
        cwnd = 1;
        ssthresh = infinite;
    New ack received:
        if (cwnd < ssthresh)
            /* Slow Start */
            cwnd = cwnd + 1;
        else
            /* Congestion Avoidance */
            cwnd = cwnd + 1/cwnd;
    Timeout:
        /* Multiplicative decrease */
        ssthresh = cwnd/2;
        cwnd = 1;

    while (next < unack + win)
        transmit next packet;
    where win = min(cwnd, flow_win);

[Figure: sequence-number line (seq #) showing the window win starting at unack, with next inside it.]

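The pseudocode maps almost line for line onto a small Python class. This is a sketch of the window bookkeeping only (no timers, sequence numbers or packets); the class and method names are hypothetical.

```python
import math

class TcpWindow:
    """Tracks cwnd and ssthresh as in the slide's pseudocode."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = math.inf

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1  # slow start: +1 per ACK
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2  # multiplicative decrease
        self.cwnd = 1.0

    def send_window(self, flow_win):
        # The sender may have at most min(cwnd, flow_win) unacknowledged segments.
        return min(self.cwnd, flow_win)

w = TcpWindow()
for _ in range(7):
    w.on_ack()          # slow start: cwnd grows from 1 to 8
w.on_timeout()          # ssthresh = 4, cwnd = 1
for _ in range(4):
    w.on_ack()          # 1 -> 2 -> 3 -> 4 (slow start), then 4 -> 4.25
print(w.cwnd, w.ssthresh)
```
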
The big picture

[Figure: cwnd versus time, showing slow start, then congestion avoidance, then a timeout that resets cwnd.]

Fast Retransmit

- Don't wait for window to drain
- Resend a segment after 3 duplicate ACKs

[Figure: sender/receiver exchange. Segments 1 to 7 are sent as cwnd grows 1, 2, 4, with ACK 2, ACK 3 and ACK 4 returned; then 3 duplicate ACKs arrive (ACK 4, ACK 4, ACK 4).]

Fast Recovery

- After a fast retransmit, halve the window: set ssthresh to cwnd/2 and cwnd to ssthresh
  - i.e., don't reset cwnd to 1
- But when RTO expires still do cwnd = 1
- Fast Retransmit and Fast Recovery
  - Implemented by TCP Reno
  - Most widely used version of TCP today
- Lesson: avoid RTOs at all costs!

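The difference between the two loss signals can be shown side by side. The sketch below assumes the simplified Reno rule in which a triple duplicate ACK sets ssthresh to cwnd/2 and restarts cwnd at that value (the temporary window inflation described in RFC 2581 is left out); the function names are hypothetical.

```python
def on_triple_duplicate_ack(cwnd):
    """Fast retransmit + fast recovery (simplified): halve, don't restart.

    Returns the new (cwnd, ssthresh).
    """
    ssthresh = cwnd / 2
    return ssthresh, ssthresh

def on_rto(cwnd):
    """Retransmission timeout: fall back to slow start from cwnd = 1.

    Returns the new (cwnd, ssthresh).
    """
    return 1, cwnd / 2

print(on_triple_duplicate_ack(16))  # (8.0, 8.0)
print(on_rto(16))                   # (1, 8.0)
```

After a timeout the sender has to climb back from one segment, which is why the slide's lesson is to avoid RTOs.
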


Fast Retransmit and Fast Recovery

- Retransmit after 3 duplicate ACKs
  - prevents expensive timeouts
- No need to slow start again
- At steady state, cwnd oscillates around the optimal window size.

[Figure: cwnd versus time, showing slow start followed by a congestion-avoidance sawtooth.]

Engineering vs Science in CC

- Great engineering built useful protocol:
  - TCP Reno, etc.
- Good science by CJ and others
  - Basis for understanding why it works so well

Behavior of TCP

- Are packets smoothly paced?
  - NO! Ack-compression
- Are long-lived flows nicely interleaved?
  - NO!
- How does throughput depend on drop rate d?
  - Tput ~ 1/sqrt(d)

Extensions to TCP

- Selective acknowledgements: TCP SACK
- Explicit congestion notification: ECN
- Delay-based congestion avoidance: TCP Vegas
- Discriminating between congestion losses and other losses: cross-layer signaling and guesses
- Randomized drops (RED) and other router mechanisms

Issues with TCP

- Fairness:
  - Throughput depends on RTT
- High speeds:
  - to reach 10 Gbps, packet losses may occur only about once every 90 minutes!
- Short flows:
  - How to set initial cwnd properly
- What about flows that want congestion control, but don't want reliable delivery?

TCP: Cooperation and Compatibility

- TCP assumes all flows employ TCP-like congestion control
  - TCP-friendly or TCP-compatible
- Selfish flows: can get all the bandwidth they like
- If new congestion control algorithms are developed, they must be TCP-friendly