Professional Documents
Culture Documents
Center TCP
Hakim Weatherspoon
Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and
Networking
September 5, 2014
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
l
ca
gi
lo
d
en
den
tr
or
sp
an
t
provide logical
communication between
app processes running on
different hosts
transport protocols run in
end systems
send side: breaks app
messages into segments,
passes to network layer
rcv side: reassembles
segments into messages,
passes to app layer
more than one transport
protocol available to apps
Internet: TCP and UDP
applicatio
n
transport
network
data link
physical
applicatio
n
transport
network
data link
physical
siblings
network-layer
protocol = postal
service
network
data link
physical
network
data link
physical
network
data link
physical
delay guarantees
bandwidth guarantees
or
sp
an
services not
available:
network
data link
physical
tr
no-frills extension of
best-effort IP
network
data link
physical
d
en
den
unreliable, unordered
delivery: UDP
network
data link
physical
l
ca
gi
lo
congestion control
flow control
connection setup
applicatio
n
transport
network
data link
physical
network
data link
physical
applicatio
n
transport
network
data link
physical
reliable transport
between sending and
receiving process
flow control: sender
wont overwhelm
receiver
congestion control:
throttle sender when
network overloaded
does not provide:
timing, minimum
throughput guarantee,
security
connection-oriented:
setup required
between client and
UDP service:
unreliable data transfer
between sending and
receiving process
does not provide:
reliability, flow control,
congestion control,
timing, throughput
guarantee, security, or
connection setup,
Q: why bother? Why is
there a UDP?
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
Transport Layer
Sockets: Multiplexing/Demultiplexing
multiplexing at sender:
handle data from
multiple
sockets, add transport
header (later used for
demultiplexing)
demultiplexing at receiver:
use header info to deliver
received segments to correct
socket
application
application
P3
P1
P2
application
P4
transport
transport
network
transport
network
link
network
link
physical
link
physical
physical
socket
process
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
dest port #
length
checksum
length, in bytes of
UDP segment,
including header
no connection
establishment (which can
add delay)
simple: no connection
state at sender, receiver
small header size
no congestion control:
UDP can blast away as
fast as desired
receiver:
compute checksum of
received segment
check if computed checksum
equals checksum field value:
NO - error detected
YES - no error detected. But
maybe errors nonetheless?
More later .
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
send
side
deliver_data(): called
by rdt to deliver data to
upper
receive
side
reliable, in-order
byte steam:
no message
boundaries
pipelined:
TCP congestion and
flow control set
window size
connection-oriented:
handshaking (exchange
of control msgs) inits
sender, receiver state
before data exchange
flow controlled:
sender will not
overwhelm receiver
source port #
dest port #
sequence number
acknowledgement number
head not
UAP R S F
len used
checksum
receive window
Urg data pointer
application
data
(variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
sequence numbers:
byte stream number
of first byte in
segments data
acknowledgements:
seq # of next byte
expected from other
side
cumulative ACK
Q: how receiver handles
out-of-order segments
A: TCP spec doesnt say,
- up to implementor
source port #
dest port #
sequence number
acknowledgement number
rwnd
checksum
urg pointer
window size
N
dest port #
sequence number
acknowledgement number
rwnd
A
checksum
urg pointer
Host A
User
types
C Seq=42, ACK=79, data = C
host ACKs
receipt
of echoed
C
host ACKs
receipt of
C, echoes
Seq=79, ACK=43, data = C back C
Seq=43, ACK=80
reliable, in-order
byte steam:
no message
boundaries
pipelined:
TCP congestion and
flow control set
window size
connection-oriented:
handshaking (exchange
of control msgs) inits
sender, receiver state
before data exchange
flow controlled:
sender will not
overwhelm receiver
connection state:
ESTAB
connection Variables:
seq # client-toserver
server-to-client
rcvBuffer size
network
at server,client
Socket clientSocket =
newSocket("hostname","port
number");
Socket connectionSocket =
welcomeSocket.accept();
server state
LISTEN
LISTEN
choose init seq num, x
send TCP SYN msg
SYNSENT
SYNbit=1, Seq=x
choose init seq num, y
send TCP SYNACK
SYN RCVD
msg, acking SYN
SYNbit=1, Seq=y
ACKbit=1; ACKnum=x+1
received SYNACK(x)
ESTAB indicates server is live;
send ACK for SYNACK;
this segment may contain ACKbit=1, ACKnum=y+1
client-to-server data
received ACK(y)
indicates client is live
ESTAB
SYN(x)
SYNACK(seq=y,ACKnum=x+1)
create new socket for
communication back to client
listen
SYN(seq=x)
SYN
sent
SYN
rcvd
ACK(ACKnum=y+1)
Socket clientSocket =
newSocket("hostname","port
number");
ESTAB
SYNACK(seq=y,ACKnum=x+1)
ACK(ACKnum=y+1)
server state
ESTAB
ESTAB
clientSocket.close()
FIN_WAIT_1
FIN_WAIT_2
can no longer
send but can
receive data
FINbit=1, seq=x
CLOSE_WAIT
ACKbit=1; ACKnum=x+1
FINbit=1, seq=y
TIMED_WAIT
timed wait
for 2*max
segment lifetime
CLOSED
can still
send data
LAST_ACK
can no longer
send data
ACKbit=1; ACKnum=y+1
CLOSED
reliable, in-order
byte steam:
no message
boundaries
pipelined:
TCP congestion and
flow control set
window size
connection-oriented:
handshaking (exchange
of control msgs) inits
sender, receiver state
before data exchange
flow controlled:
sender will not
overwhelm receiver
timeout:
retransmit segment
that caused timeout
restart timer
ack rcvd:
if ack acknowledges
previously unacked
segments
update what is known
to be ACKed
start timer if there are
still unacked segments
Host A
Host B
Host A
SendBase=92
ACK=100
timeout
Seq=92, 8
bytes of data
SendBase=120
ACK=120
SendBase=120
premature timeout
Host A
timeout
ACK=100
ACK=120
cumulative ACK
Reliable Transport
TCP ACK generation [RFC 1122, 2581]
event at receiver
if sender receives
3 ACKs for same
data duplicate
(triple
ACKs),
(triple duplicate
ACKs), resend
unacked segment
with smallest seq
#
likely that unacked
segment lost, so
dont wait for
timeout
Host B
timeout
X
ACK=100
ACK=100
ACK=100
ACK=100
Seq=100, 20 bytes of data
Q: how to set
TCP timeout
value?
longer than RTT
but RTT varies
too short:
premature
timeout,
unnecessary
retransmissions
too long: slow
reaction to
RTT?
SampleRTT: measured
time from segment
transmission until ACK
receipt
ignore
retransmissions
SampleRTT will vary,
want estimated RTT
smoother
average several
recent
measurements, not
just current
SampleRTT
RTT (milliseconds)
sampleRTT
time
(seconds)
safety margin
reliable, in-order
byte steam:
no message
boundaries
pipelined:
TCP congestion and
flow control set
window size
connection-oriented:
handshaking (exchange
of control msgs) inits
sender, receiver state
before data exchange
flow controlled:
sender will not
overwhelm receiver
application
process
application
TCP socket
receiver buffers
TCP
code
IP
code
flow control
OS
from sender
to application process
RcvBuffer
rwnd
buffered data
free buffer space
receiver-side buffering
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
Data Center TCP
Incast Problem
feedback to end
systems
single bit indicating
congestion (SNA,
DECbit, TCP/IP ECN,
ATM)
explicit rate for
sender to send at
TCP connection 2
bottleneck
router
capacity R
time
last byte
ACKed
sent, notyet
ACKed
(inflight)
last byte
sent
RTT
Connection 2 throughput
Connection 1 throughput
Host B
Host A
RTT
Slow Start
one segm
ent
two segm
ents
four segm
ents
time
Implementation:
variable ssthresh
on loss event, ssthresh
is set to 1/2 of cwnd just
before loss event
cwnd = 1 MSS
ssthresh = 64 KB
dupACKcount = 0
slow
start
timeout
ssthresh = cwnd/2
cwnd = 1 MSS
dupACKcount = 0
retransmit missing segment
dupACKcount == 3
ssthresh= cwnd/2
cwnd = ssthresh + 3
retransmit missing segment
New
New
ACK!
new ACK
cwnd = cwnd+MSS
dupACKcount = 0
transmit new segment(s), as allowed
cwnd > ssthresh
timeout
ssthresh = cwnd/2
cwnd = 1 MSS
dupACKcount = 0
retransmit missing segment
timeout
ssthresh = cwnd/2
cwnd = 1
dupACKcount = 0
retransmit missing segment
ACK!
new ACK
cwnd = cwnd + MSS (MSS/cwnd)
dupACKcount = 0
transmit new segment(s), as allowed
congestion
avoidance
duplicate ACK
dupACKcount++
New
ACK!
New ACK
cwnd = ssthresh
dupACKcount = 0
fast
recovery
dupACKcount == 3
ssthresh= cwnd/2
cwnd = ssthresh + 3
retransmit missing segment
duplicate ACK
cwnd = cwnd + MSS
transmit new segment(s), as allowed
TCP Throughput
avg. TCP thruput as function of window
size, RTT?
ignore slow start, assume always data to send
W: window size
occurs
(measured in bytes)
3 W
bytes/sec
4 RTT
in-flight
bytes)
W/2
where loss
is W
Abstraction / services
Multiplexing/Demultiplexing
UDP: Connectionless Transport
TCP: Reliable Transport
Abstraction, Connection Management, Reliable Transport,
Flow Control, timeouts
Congestion control
Data Center TCP
Incast Problem
R
R
R
R
Client
1
Switch
2
4
4
Server
Request Unit
(SRU)
R
R
R
R
2
4
Client
1
Switch
3
4
4
Server
Request Unit
(SRU)
Collapse!
Ack 1
Ack 1
Ack 1
Ack 1
Sender
Ack 5
Receiver
Timeouts are
expensive
(msecs to recover
after loss)
5
Retransmission
Timeout
(RTO)
1
Ack 1
Sender
Receiver
Seq #
1
Seq #
1
2
3
4
5
2
3
4
5
Retransmit
2
Sender
Retransmission
Timeout
(RTO)
1
Sender
Ack 1
Receiver
Ack 1
Ack 1
Ack 1
Ack 1
Ack 5
Receiver
Perspective
principles behind transport
layer services:
multiplexing,
demultiplexing
reliable data transfer
flow control
congestion control
instantiation, implementation
in the Internet
UDP
TCP
Next time:
Network Layer
leaving the network
edge (application,
transport layers)
into the network
core
Lab1
Single threaded TCP proxy
Due in one week, next Friday