San Jose State University

College of Engineering
Network Programming And Application
CmpE 207
by
Rod Fatoohi, Ph.D.

Note: These course notes are provided exclusively for the
students' convenience to follow the presentations. They cannot
be reproduced without written permission from the author.
Some of these materials are from the following textbooks:
•  Internetworking with TCP/IP Vol. 3, Client-Server programming &
applications, Comer & Stevens, Linux/POSIX Sockets version, 2001.
•  UNIX Network Programming Vol. 1, 3/e: The Sockets Networking API,
Stevens, Fenner & Rudoff, 2004
Copyright 2011
CmpE207 by Fatoohi, All Rights Reserved 1
OSI Reference Model
Layer   Functionality
7       Application
6       Presentation
5       Session
4       Transport
3       Network
2       Data Link (Hardware Interface)
1       Physical Hardware Connection

•  Protocol hierarchies (layering):
Networks are generally organized as a series
of layers, each one built upon the one below it
–  Each layer provides services to higher layers.
–  Between adjacent layers there is an interface.

•  Interface specifies operations & services
that lower layer offers to upper layer
–  How the layer above accesses that layer

TCP/IP Reference Model

Conceptual Layer        Objects Passed Between Layers
Application
                        Messages or Streams
Transport
                        Transport Protocol Packets
Internet
                        IP Datagrams
Network Interface
                        Network-Specific Frames
Hardware

TCP/IP (internet) Reference Model
•  Link layer (network access layer).
–  Not well defined.
–  Handles data exchange between 2 adjacent hosts.
–  Includes device driver (in OS) & NIC.
•  Network layer (internet layer).
–  Handles movement of packets around network.
•  Transport layer.
–  Provides flow of data between 2 hosts.
•  Application layer.
–  Handles details of various user applications.

Protocol Layering
[Figure: Host A and Host B each run Application, Transport, Internet & Network Interface layers; Router R runs only Internet & Network Interface layers and connects Physical Net 1 to Physical Net 2. Peer layers see identical objects: the message (Application) and packet (Transport) between Host A & Host B, the datagram (Internet) and frame (Network Interface) on each hop through Router R.]

Conceptual Layers vs. Software Organization
[Figure: (a) conceptual layers: High-Level Protocol Layer, Internet Protocol Layer, Network Interface Layer; (b) software organization: Protocol 1, Protocol 2 & Protocol 3 above a single IP Module, above Interface 1, Interface 2 & Interface 3.]

Frame De-multiplexing
[Figure: a frame arrives and is demultiplexed, based on frame type, to the IP Module, ARP Module, or RARP Module.]

Datagram De-multiplexing
[Figure: a datagram arrives at the IP Module and is passed up to the ICMP, UDP, or TCP protocol.]
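
The two demultiplexing steps can be sketched in C as below. This is only an illustration: the function names are hypothetical, and the numeric values (frame types 0x0800, 0x0806, 0x8035 and IP protocol numbers 1, 6, 17) are the standard assigned numbers, which the slides themselves do not list.

```c
#include <stdint.h>

/* Frame type values carried in an Ethernet frame header. */
enum { FRAME_TYPE_IP = 0x0800, FRAME_TYPE_ARP = 0x0806, FRAME_TYPE_RARP = 0x8035 };

/* Values of the 8-bit protocol field in the IP header. */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17 };

/* Hypothetical helper: name the module that receives an arriving frame. */
const char *demux_frame(uint16_t frame_type)
{
    switch (frame_type) {
    case FRAME_TYPE_IP:   return "IP";
    case FRAME_TYPE_ARP:  return "ARP";
    case FRAME_TYPE_RARP: return "RARP";
    default:              return "drop";   /* unknown type: no module */
    }
}

/* Hypothetical helper: name the protocol that receives an arriving datagram. */
const char *demux_datagram(uint8_t protocol)
{
    switch (protocol) {
    case PROTO_ICMP: return "ICMP";
    case PROTO_TCP:  return "TCP";
    case PROTO_UDP:  return "UDP";
    default:         return "drop";        /* unknown protocol */
    }
}
```

A real stack would dispatch to handler functions rather than return names; strings keep the sketch easy to check.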

API
•  Application Programming Interface
–  set of procedures user can call to access service
–  an interface available to user
–  availability depends on OS & programming
language
–  could be specific for protocol suite and/or OS.
–  implementation: system calls or library calls.
–  Ex: Berkeley sockets, X/Open Transport
Interface (XTI), Windows sockets interface
(Winsock)
API
•  Sockets, XTI & Winsock:
–  interfaces from application layer in TCP/IP (three
upper layers in OSI) into transport layer
–  used to write applications that use TCP, UDP, IP,
ICMP in TCP/IP
–  There is separation between transport & higher layers
•  upper layers deal w/ applications while lower layers deal
w/ communication details.
•  upper layers usually form user process while lower layers
are part of OS.
•  Ex: Unix separates between user process & kernel

Types of network services:
•  Connection-oriented service:
–  Has three phases: connection establishment,
data transfer & connection termination.
–  Modeled after telephone system.
–  Usually sending host sends stream of data.
–  Communication doesn't have to be continuous.
–  Data ordering preserved.
–  Provides information in case of failure.
–  Uses some resources.

•  Connectionless service:
–  Modeled after postal system.
–  Messages may get lost.
–  No reordering (messages may arrive out of order).
–  Requires less initial overhead.
–  Can run faster (no ACK).
–  Uses fewer resources.

Client-Server Model
•  The model of interaction in distributed system
where communication takes form of request
message from client to server (asking for
some defined service) & reply message from
server to client (providing service).

Client-Server Model (Cont.)
In general:
•  Clients & servers are implemented as
application programs (user processes).
•  Clients are easier to implement than servers.
•  There are more clients than servers.
•  A server waits to be contacted by clients.
•  Clients terminate while servers do not.

Server Classes:
•  Iterative Servers:
–  Only one request is processed at a time while
others wait.
–  Servers wait for requests, process a request,
send a reply, then go back to waiting.
–  Usually used for short services.
–  Ex. UDP servers.

Server Classes (Cont.):
•  Concurrent servers:
–  Many requests can be processed at a time.
–  A server waits for requests, starts new server to
handle each request (creating new process, task, or
thread).
–  New server will terminate when it completes its task.
–  Main server is free to accept other requests.
–  Ex. TCP servers.

TCP/IP protocol Suite
•  set of protocols used in Internet
•  Internet Protocol (IP)
•  Internet Control Message Protocol (ICMP)
•  Internet Group Management Protocol (IGMP)
•  Address Resolution Protocol (ARP)
•  Reverse Address Resolution Protocol (RARP)
•  User Datagram Protocol (UDP)
•  Transmission Control Protocol (TCP)
•  Application protocols (telnet, rlogin, FTP,
SNMP, SMTP, DNS, NFS, RPC,…)
IP (Internet Protocol)
•  standard TCP/IP protocol that provides
unreliable, best-effort, connectionless
delivery service of datagrams across internet.
•  unreliable since delivery is not guaranteed
(lost, delivered out of order)
•  best-effort since IP makes serious attempt to
deliver datagrams (Ex. sending error msgs)
•  Connectionless since each datagram is
handled independently from others w/o
connection establishment & termination
IP (cont).
•  Delivers datagrams regardless of whether source &
destination are on same network or on
different networks.
•  It defines datagram as basic unit of transfer.
•  It performs routing function (choosing path).
•  It includes rules for datagram delivery (ex.
error handling, packet processing).
•  IP is the glue that holds internet together.

IP datagram format:
•  Divided into header & data fields.
•  Header: 20-byte fixed part & optional
variable-length part
•  4-bit version (currently 4) - sender & receiver
must use same version.
•  4-bit header length - # of 32-bit words
in header
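
Both 4-bit fields share the first byte of the header, so they can be pulled apart with a shift and a mask. A small sketch in C, with hypothetical function names:

```c
#include <stdint.h>

/* Version is the high-order 4 bits of the first header byte. */
unsigned ip_version(uint8_t first_byte)
{
    return first_byte >> 4;
}

/* Header length is the low-order 4 bits, counted in 32-bit words;
 * multiply by 4 to get the header size in bytes. */
unsigned ip_header_bytes(uint8_t first_byte)
{
    return (first_byte & 0x0Fu) * 4u;
}
```

For the common first byte 0x45 this gives version 4 and a 20-byte header (no options); the largest header length value, 15, corresponds to 60 bytes.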

IP datagram format (Cont.)
•  8-bit Type of Service (TOS):
–  3-bit precedence field (mostly ignored)
–  4-bit flags
•  0000 default
•  0001 Min cost
•  0010 Max reliability
•  0100 Max throughput
•  1000 Min Delay
–  1-bit unused
•  16-bit Total length of datagram in bytes
(max 65535 bytes)
IP datagram Format (Cont.)
•  16-bit identification
– uniquely identifies each datagram
– same value for fragments of a datagram
•  3-bit flag for fragmentation
– 2nd bit: Don't Fragment (DF)
– 3rd bit: More Fragments (MF)
•  13-bit fragment offset
–  location of fragment within datagram
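
The 3-bit flags and the 13-bit offset share one 16-bit header word. The sketch below decodes that word in C; the names are hypothetical, the word is assumed to be already converted to host byte order, and the offset is scaled by 8 because the field counts 8-byte units (a detail the slide does not state).

```c
#include <stdint.h>

#define FRAG_DF          0x4000u   /* 2nd flag bit: Don't Fragment */
#define FRAG_MF          0x2000u   /* 3rd flag bit: More Fragments */
#define FRAG_OFFSET_MASK 0x1FFFu   /* low 13 bits: fragment offset */

int ip_dont_fragment(uint16_t flags_and_offset)
{
    return (flags_and_offset & FRAG_DF) != 0;
}

int ip_more_fragments(uint16_t flags_and_offset)
{
    return (flags_and_offset & FRAG_MF) != 0;
}

/* Offset of this fragment's data within the original datagram, in bytes. */
unsigned ip_fragment_offset_bytes(uint16_t flags_and_offset)
{
    return (flags_and_offset & FRAG_OFFSET_MASK) * 8u;
}
```

As an example, the value 0x20B9 has MF set and an offset field of 185, which places the fragment 1480 bytes into the datagram.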

IP datagram format (Cont.)
•  8-bit Time to Live (TTL)
– counter to limit packet life time
– mainly counts hops
– sender sets it to value & every router
decrements it.
– router throws it away if it is zero.
•  8-bit protocol to identify higher level
protocol to send to.

IP datagram format (cont.)
•  16-bit header checksum
– Error detection for header only
– Uses 16-bit 1's complement sum
•  32-bit source IP addr.
•  32-bit destination IP addr.
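
The header checksum can be sketched as follows. This is one straightforward way to compute a 16-bit 1's complement sum, not necessarily how any particular kernel does it; the function name and the sample header bytes are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit 1's complement checksum over len bytes, read as big-endian words.
 * The sender computes it with the checksum field set to zero; a receiver
 * running it over an intact header (checksum included) gets 0. */
uint16_t inet_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                          /* odd length: pad last byte with zero */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                     /* fold carries back into low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Illustrative 20-byte header: UDP datagram from 192.168.0.1 to 192.168.0.199,
 * TTL 64, DF set, checksum bytes 0xb8 0x61 already filled in. */
const unsigned char sample_header[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
    0x40, 0x11, 0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01,
    0xc0, 0xa8, 0x00, 0xc7
};
```

Running inet_checksum() over sample_header verifies it; zeroing bytes 10 and 11 first reproduces the value the sender would have stored.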

IP datagram Format (Cont.)
•  Options
–  security: specifies how secret datagram is
–  record route: makes each router append its IP addr.
–  Timestamp: makes each router append its IP addr &
time.
–  loose source routing: gives list of IP addrs to be
followed.
–  strict source routing: gives complete path for
datagram.

Direct route: source & destination are directly
connected to single physical network (ex.
Ethernet).
–  Sender extracts network addr of destination
IP addr & compares it to its own network
addr. A match means direct route.
–  Sender encapsulates datagram in frame
–  Sender maps destination IP addr into physical
addr (ARP, ARP cache)
–  Sender sends frame directly to destination.
–  No router is involved.
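
The direct-route test in the first step can be sketched in C. The slide does not say how the network part of an address is identified; this sketch assumes a subnet mask, and the function name is hypothetical. All values are in host byte order.

```c
#include <stdint.h>

/* Returns 1 when destination and sender share a network address,
 * meaning the datagram can be sent directly without a router. */
int is_direct_route(uint32_t my_addr, uint32_t dst_addr, uint32_t netmask)
{
    return (my_addr & netmask) == (dst_addr & netmask);
}
```

With sender 192.168.0.1 and mask 255.255.255.0, destination 192.168.0.199 is a direct route while 10.0.0.1 is not.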

Indirect route:
– Source & destination are not on directly
connected network.
– Routers are used.
– Sender encapsulates datagram & sends it to
router as in direct route.
– Router extracts encapsulated datagram & selects
next router along path toward destination
– Hosts & routers use routing tables to find route
for given destination
– Routers use dynamic routing protocols to
communicate w/ other routers & update their
routing tables
ICMP (Internet Control Message Protocol)
•  Required standard TCP/IP protocol that
handles error & control messages.
•  Often considered part of IP.
•  Encapsulated in IP datagram
–  need to travel across physical networks.
•  Routed like other datagrams
–  no extra reliability or priority
•  Provides communication between IP
software on two machines.

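ICMP Encapsulation
[Figure: the ICMP message (ICMP header + ICMP data) is carried in the data area of an IP datagram (datagram header + datagram data area), which is carried in the data area of a frame (frame header + frame data area).]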

ICMP messages.
•  Two kinds of ICMP messages:
– Error messages: routers & hosts use to
send reports of problems about datagrams
back to original source of datagrams.
• Error reporting (not error correction).
• Instead of discarding packets.
• Informing sender only: not
intermediate routers.
– Query messages: hosts use to test status
of a destination or get info about it.
ICMP Echo Request/Reply Message Format

Bits 0-7         Bits 8-15     Bits 16-31
TYPE (8 or 0)    CODE (0)      CHECKSUM
IDENTIFIER (bits 0-15)         SEQUENCE NUMBER (bits 16-31)
OPTIONAL DATA
...

•  ICMP Message Format:
– Not fixed size.
– First four bytes are fixed (ICMP header)
•  1-byte type field to identify messages (15
different types)
•  1-byte code field to further specify condition
•  2-byte checksum field covers whole ICMP
msg - using 1's complement sum as in IP.
–  Remainder depends on type & code.
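
As an illustration of this layout, the sketch below builds an echo request (type 8, code 0) in a byte buffer and fills in the checksum over the whole message. Function names are hypothetical, and actually sending the result would need a raw socket, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-bit 1's complement checksum over len bytes (big-endian words). */
uint16_t icmp_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                          /* odd length: pad last byte with zero */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                     /* fold carries back into low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: writes an echo request into out and returns its
 * length in bytes, or 0 if out is too small. */
size_t build_echo_request(unsigned char *out, size_t out_size,
                          uint16_t id, uint16_t seq,
                          const void *data, size_t data_len)
{
    size_t total = 8 + data_len;          /* 4 fixed bytes + id + seq + data */
    uint16_t sum;

    if (total > out_size)
        return 0;
    out[0] = 8;                           /* TYPE: echo request */
    out[1] = 0;                           /* CODE */
    out[2] = 0;                           /* CHECKSUM: zero while summing */
    out[3] = 0;
    out[4] = (unsigned char)(id >> 8);    /* IDENTIFIER */
    out[5] = (unsigned char)(id & 0xFF);
    out[6] = (unsigned char)(seq >> 8);   /* SEQUENCE NUMBER */
    out[7] = (unsigned char)(seq & 0xFF);
    if (data_len > 0)
        memcpy(out + 8, data, data_len);  /* OPTIONAL DATA */
    sum = icmp_checksum(out, total);
    out[2] = (unsigned char)(sum >> 8);
    out[3] = (unsigned char)(sum & 0xFF);
    return total;
}
```

A receiver can validate the message by running icmp_checksum() over all of it and checking for 0.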

ICMP Destination Unreachable Message Format

Bits 0-7    Bits 8-15      Bits 16-31
TYPE (3)    CODE (0-15)    CHECKSUM
UNUSED (MUST BE ZERO) (bits 0-31)
INTERNET HEADER + FIRST 64 BITS OF DATAGRAM
...

•  Ports: used to distinguish between many
destinations (processes) within host.
•  Port numbers: used to identify TCP/IP
applications (16-bit positive integers).
•  Well-known (reserved) port numbers:
– Reserved for standard services (servers)
– Between 1 & 1023.
– Managed by Internet Assigned Numbers
Authority (IANA).

Sample of assigned UDP ports
Decimal  Keyword     UNIX Keyword  Description
0        -           -             Reserved
7        ECHO        echo          Echo
9        DISCARD     discard       Discard
11       USERS       systat        Active Users
13       DAYTIME     daytime       Daytime
15       -           netstat       Network Status Program
17       QUOTE       qotd          Quote of the Day
19       CHARGEN     chargen       Character Generator
37       TIME        time          Time
42       NAMESERVER  name          Host Name Server
43       NICNAME     whois         Who Is
53       DOMAIN      nameserver    Domain Name Server
67       BOOTPS      bootps        BOOTP or DHCP Server
68       BOOTPC      bootpc        BOOTP or DHCP Client
69       TFTP        tftp          Trivial File Transfer
88       KERBEROS    kerberos      Kerberos Security Service
111      SUNRPC      sunrpc        Sun Remote Procedure Call
123      NTP         ntp           Network Time Protocol
161      -           snmp          Simple Network Management Protocol
162      -           snmp-trap     SNMP traps
512      -           biff          UNIX comsat
513      -           who           UNIX rwho Daemon
514      -           syslog        System Log
525      -           timed         Time Daemon
Sample of assigned TCP ports
Decimal  Keyword   Description
0        -         Reserved
7        ECHO      Echo
9        DISCARD   Discard
11       USERS     Active Users
13       DAYTIME   Daytime
15       netstat   Network Status program
17       QUOTE     Quote of the day
19       CHARGEN   Character Generator
20       FTP-DATA  File Transfer Protocol (data)
21       FTP       File Transfer Protocol
22       SSH       Secure Shell
23       TELNET    Terminal connection
25       SMTP      Simple Mail Transfer Protocol
37       TIME      Time
53       DOMAIN    Domain name server
67       BOOTPS    BOOTP or DHCP Server
79       FINGER    Finger
80       WWW       World Wide Web server
88       KERBEROS  Kerberos security service
110      POP3      Post Office Protocol vers. 3
111      SUNRPC    SUN Remote Procedure Call
119      NNTP      USENET News Transfer Protocol
123      NTP       Network Time Protocol
143      IMAP      Internet Message Access Protocol
161      SNMP      Simple Network Management Protocol
443      HTTPS     Secure HTTP
465      SMTPS     SMTP over SSL (TLS)
515      SPOOLER   LPR spooler
873      RSYNC     Rsync protocol
993      IMAPS     Secure IMAP
995      POP3S     Secure POP3
1080     SOCKS     Proxy server protocol
User Datagram Protocol (UDP)
•  standard TCP/IP transport layer protocol
•  allows application program (process) on one
host to send datagram to process on another
•  provides unreliable connectionless service
– no ACK nor connection establishment
•  uses IP to deliver datagrams
•  Services provided to application: only port
number & optional checksum above IP
•  used mainly by simple applications (single
requests & replies as in NFS)
UDP Header
•  fixed-size (8 bytes)
•  2-byte source port #: specify sending proc
•  2-byte destination port #: specify recv proc
•  2-byte length: length (in bytes) of UDP
datagram (UDP header + data)
•  2-byte checksum (optional): covers UDP
datagram
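
The four 2-byte fields can be laid out in network byte order as sketched below. The function name is hypothetical; the data limit of 65507 bytes is 65535 (maximum IP datagram) minus 20 (IP header) minus 8 (UDP header). Computing a real checksum is left out, so the caller passes it in (0 means no checksum).

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HEADER_LEN 8
#define UDP_MAX_DATA   65507u   /* 65535 - 20 (IP header) - 8 (UDP header) */

/* Hypothetical helper: fills the 8-byte UDP header for data_len bytes of
 * data. Returns 0 on success, -1 if the data cannot fit in one datagram. */
int pack_udp_header(unsigned char out[UDP_HEADER_LEN],
                    uint16_t src_port, uint16_t dst_port,
                    size_t data_len, uint16_t checksum)
{
    uint16_t length;

    if (data_len > UDP_MAX_DATA)
        return -1;
    length = (uint16_t)(UDP_HEADER_LEN + data_len);   /* header + data */

    out[0] = (unsigned char)(src_port >> 8);          /* source port */
    out[1] = (unsigned char)(src_port & 0xFF);
    out[2] = (unsigned char)(dst_port >> 8);          /* destination port */
    out[3] = (unsigned char)(dst_port & 0xFF);
    out[4] = (unsigned char)(length >> 8);            /* length */
    out[5] = (unsigned char)(length & 0xFF);
    out[6] = (unsigned char)(checksum >> 8);          /* checksum */
    out[7] = (unsigned char)(checksum & 0xFF);
    return 0;
}
```

Applications normally never build this header themselves; the kernel does it when a datagram socket sends data.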

UDP Header (cont)
•  Data
– host must be able to receive datagrams of
576 bytes (w/ or w/o fragmentation)
– max size of 65507 bytes
– limited by send & receive buffers &
kernel implementation
– mostly either within 8192 bytes (NFS) or
576 bytes (host requirement)

Transmission Control Protocol (TCP)
•  standard TCP/IP transport layer protocol
•  provides reliable, connection-oriented, full-
duplex, byte stream service
•  allows process on one host to send stream of
data to process on another
•  processes must establish connection before
transmitting data
•  provides full duplex connection of 2
independent streams flowing in opposite
directions.
TCP (cont)
•  is byte stream
– msg boundaries are not preserved
– data are sent as stream of bytes
•  provides point-to-point connection since
each connection has only 2 end points
•  uses IP to deliver data

TCP (cont)
•  provides reliable delivery service:
– positive acknowledgment (ACK)
– retransmission using timers & buffers
– reordering if msgs received out of order
– breaking up application data into best-sized
chunks, segments, for transmission
– Discarding duplicate segments
– flow control using sliding window protocol
– mandatory error detection (checksum)

TCP Sliding Window
[Figure: the byte stream 1 2 3 4 5 6 7 8 9 10 11 ... with a box marking the current window.]

TCP Header
•  20 byte fixed format + possible options
•  16-bit source port: identify sending appl
•  16-bit dest port: identify receiving appl
•  32-bit sequence number: identify 1st data
byte in segment
•  32-bit acknowledgment number: identify
next seq # sender expects to receive
–  uses cumulative ACK since it reports how
much of data stream has accumulated
•  4-bit header length - in 32-bit words
TCP Header (cont)
•  6-bit unused (reserved)
•  6-bit flags:
– URG: Urgent pointer field is valid
– ACK: Acknowledgment # is valid
– PSH: segment requests push
– RST: used to reset connection
– SYN: used to establish connection
– FIN: used to release connection
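
The six flags are single bits, so segments are classified with bit masks. A sketch in C; the TCPF_ names and the helper are hypothetical, while the bit values follow the standard TCP header layout (FIN is the lowest-order bit).

```c
#include <stdint.h>

enum {
    TCPF_FIN = 0x01,   /* release connection */
    TCPF_SYN = 0x02,   /* establish connection */
    TCPF_RST = 0x04,   /* reset connection */
    TCPF_PSH = 0x08,   /* segment requests push */
    TCPF_ACK = 0x10,   /* acknowledgment # is valid */
    TCPF_URG = 0x20    /* urgent pointer field is valid */
};

/* Hypothetical helper: returns 1 for a connection request (SYN on, ACK off),
 * 2 for the reply to one (SYN on, ACK on), 0 for any other segment. */
int syn_kind(uint8_t flags)
{
    if (!(flags & TCPF_SYN))
        return 0;
    return (flags & TCPF_ACK) ? 2 : 1;
}
```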

TCP Header (cont)
•  16-bit window size: specify # of bytes
(buffer size) host is willing to receive
starting at byte acknowledged
•  16-bit checksum: covers TCP segment
•  16-bit Urgent pointer: used to transmit
urgent data
•  Options:
–  Maximum Segment Size (MSS)
–  others: window scale, timestamp, …

TCP Maximum Segment Size (MSS)
•  Largest segment sender is willing to receive
•  Each end may announce its MSS at
connection establishment
– appears only in SYN segment
•  Defaults to 536 bytes if not announced (576-byte
IP datagram minus 40 bytes of IP & TCP headers)
•  usually derived from MTU for local traffic (ex.
1460 bytes for Ethernet)
•  Defaults to 536 bytes for nonlocal traffic
•  Small MSS: low network utilization
•  Large MSS: may cause fragmentation
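
The two figures above (536 and 1460) both come from subtracting the minimum IP and TCP headers from the datagram size. A tiny sketch of that arithmetic, with a hypothetical function name and assuming no IP or TCP options:

```c
#define IP_HEADER_MIN  20u   /* IP header without options */
#define TCP_HEADER_MIN 20u   /* TCP header without options */

/* Largest TCP data size that fits in one datagram of mtu bytes;
 * returns 0 if the MTU cannot hold even the two headers. */
unsigned mss_from_mtu(unsigned mtu)
{
    if (mtu <= IP_HEADER_MIN + TCP_HEADER_MIN)
        return 0;
    return mtu - IP_HEADER_MIN - TCP_HEADER_MIN;
}
```

An Ethernet MTU of 1500 gives 1460, and the 576-byte minimum datagram gives 536.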
TCP connection establishment protocol
•  Using 3-way handshake (takes 3 segments)
•  Server performs passive open by starting
process & waiting for requests.
•  Client performs active open by sending
TCP segment w/: SYN on, ACK off,
server's port #, & MSS (optional)
•  Server responds w/ TCP segment w/: SYN
on, ACK on (for client's segment), client's
port #, & MSS (optional)
•  Client ACKs server's segment
TCP connection establishment protocol

Events At Site 1             Network Messages    Events At Site 2

Send SYN seq=x               ------->
                                                 Receive SYN segment
                             <-------            Send SYN seq=y, ACK x+1
Receive SYN + ACK segment
Send ACK y+1                 ------->
                                                 Receive ACK segment

TCP connection termination protocol
•  Takes 4 segments
•  TCP full duplex is viewed as 2 independent,
simplex connections for termination
•  Client performs active close by sending FIN
•  Server performs passive close after
receiving FIN by notifying application &
ACKing client's FIN - 1 connection closed
•  Server sends its FIN to client
•  Client ACKs server's FIN
•  Client waits for 2 Max Segment Lifetimes (2 MSL).

TCP connection termination protocol

Events At Site 1                  Network Messages    Events At Site 2

(application closes connection)
Send FIN seq=x                    ------->
                                                      Receive FIN segment
                                  <-------            Send ACK x+1
                                                      (inform application)
Receive ACK segment
                                                      (application closes connection)
                                  <-------            Send FIN seq=y, ACK x+1
Receive FIN + ACK segment
Send ACK y+1                      ------->
                                                      Receive ACK segment

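TCP State Machine
Each transition is labeled input / output.
–  begin: CLOSED
–  any state: anything / reset -> CLOSED
–  CLOSED: passive open -> LISTEN
–  CLOSED: active open / syn -> SYN SENT
–  LISTEN: syn / syn + ack -> SYN RECVD
–  LISTEN: send / syn -> SYN SENT
–  LISTEN: close -> CLOSED
–  SYN RECVD: reset -> LISTEN
–  SYN RECVD: ack -> ESTABLISHED
–  SYN RECVD: close / fin -> FIN WAIT-1
–  SYN SENT: syn / syn + ack -> SYN RECVD
–  SYN SENT: syn + ack / ack -> ESTABLISHED
–  SYN SENT: close / timeout / reset -> CLOSED
–  ESTABLISHED: fin / ack -> CLOSE WAIT
–  ESTABLISHED: close / fin -> FIN WAIT-1
–  CLOSE WAIT: close / fin -> LAST ACK
–  LAST ACK: ack / -> CLOSED
–  FIN WAIT-1: fin / ack -> CLOSING
–  FIN WAIT-1: ack / -> FIN WAIT-2
–  FIN WAIT-1: fin-ack / ack -> TIMED WAIT
–  CLOSING: ack / -> TIMED WAIT
–  FIN WAIT-2: fin / ack -> TIMED WAIT
–  TIMED WAIT: timeout after 2 segment lifetimes -> CLOSED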

Domain Name System (DNS)
•  Networks use binary addrs (IP addrs)
•  Users use machine names (ASCII strings)
•  DNS is a distributed database that provides
mapping between human-readable machine
names (hostnames) & IP addrs.
•  Standard library functions are provided to
provide mapping (hostnames to IP addrs) &
reverse mapping.
•  Most applications accept names & IP addrs.
API to Communication Protocols:
•  Desirable: general interface to
communication protocols
– independent of OS
– independent of protocol software

API to TCP/IP
•  loosely specified
–  required functionality suggested only
–  Adv: more flexible
–  Disadv: less portable; may generate different APIs
•  Specifies conceptual interface based on
procedures & functions
•  development approaches:
–  new interface: a new set of procedures
–  extension of existing interface (normally I/O)

Implementation approaches
•  applies to protocol & its interface
•  2 approaches: system calls & library calls
•  system calls provide direct access to kernel services
–  causes context switching from user mode to kernel mode
•  library calls reside in library archive
–  may provide more functionality & better interface
•  Standard I/O library (fopen, fclose, …) built on top
of system calls (open, close, …)

Implementation examples:
•  4.4BSD: both TCP/IP protocol & socket API
are in kernel
•  SVR4: TCP/IP protocol in kernel; socket &
TLI APIs are library calls
•  Windows 95: both TCP/IP code & socket API
are in Dynamic Linked Libraries (DLLs)
•  Windows 2000 (& later): TCP/IP code in OS;
socket API in DLL.

Basic I/O calls:
•  open(): open a device or file & return a file
descriptor to be used in I/O operations
•  close(): close a device/file
•  read(): read data from device/file & transfer
it to program's memory
•  write(): transfer data from program's
memory to device/file
•  lseek(): move to a specific position in file/device
•  ioctl(): change behavior of device/file
Important process calls:
•  fork(): create a process (child) that executes
same code as calling process (parent)
–  returns new process ID (pid) to parent & 0 to child
•  execve(): replace current program w/ a new
program
–  used w/ fork()
•  exit(): terminate process & close its file
descriptors.
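
A minimal sketch of fork() in C, showing how the return value separates parent from child. The function name is hypothetical, and waitpid() is an addition not listed on the slide: the parent uses it to collect the child's exit status. The child here simply exits; a server would typically call execve() or handle a request at that point.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a child that exits with the given code; the parent waits for it
 * and returns the child's exit status, or -1 on error. */
int run_child(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                 /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);               /* child: fork() returned 0 */

    /* parent: fork() returned the child's pid */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```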

BSD Sockets:
•  extended I/O system calls
•  added new system calls
•  originated w/ 4.2BSD in 1983 as part of
Unix code
•  networking code (TCP/IP stack, socket API,
& many applications) are publicly available
(from Berkeley) starting in 1989.

Configuration files:
•  might be needed for socket programming:
•  /etc/hosts: lists hosts known to the system
–  format: (host name, IP address)
•  /etc/services: lists TCP/IP services available on the
system
–  format: (service name, port #, protocol name)
•  /etc/protocols: lists available protocols
–  format: (protocol name, protocol #)
•  /etc/networks: lists networks known to the system
–  format: (network name, network address)
–  may not be available on all systems
Socket interface
•  Adds new call socket() to create socket
•  Similar to open() call
•  Creates socket descriptor
–  Small integer similar to file descriptor
–  Allocated in the same descriptor table as file
descriptor for each process
–  Contains pointer to socket data structure (created
by OS) that has info about socket
•  Partial info in data structure filled by socket()
•  Passive socket: used by server
•  Active socket: used by client
Socket domain
•  Socket supports many protocol families (domains):
–  AF_INET (or PF_INET): IPv4 protocols
–  AF_INET6: IPv6 protocols
–  AF_LOCAL (or AF_UNIX): Unix domain protocols
on same host
–  AF_ROUTE: routing socket (interface to kernel's
routing table)
–  AF_KEY: key socket for cryptographic security
–  AF_NS: Xerox NS protocols
–  AF_ISO: OSI protocols
•  Not all families are supported by every system
Socket types
•  SOCK_STREAM: reliable, connection-
oriented, byte stream (ex: TCP)
•  SOCK_DGRAM: unreliable, connection-less,
datagrams (ex: UDP)
•  SOCK_RAW: raw sockets (ex: ICMP)
•  SOCK_SEQPACKET: sequenced packet
socket (not in TCP/IP)
•  SOCK_PACKET: provides access to the
datalink (only on Linux)
Not all are supported by every protocol family
Socket address structure
•  provides generic data structure
(address family, endpoint address)
•  uses predefined C structure:
struct sockaddr {
    u_char  sa_len;        /* total length */
    u_short sa_family;     /* address family */
    char    sa_data[14];   /* protocol-specific address */
};
•  Some domains need more than 14 bytes.
Internet socket address structure
•  For TCP/IP use only
struct sockaddr_in {
    u_char  sin_len;          /* length of structure */
    u_short sin_family;       /* AF_INET */
    u_short sin_port;         /* 16-bit port # */
    struct in_addr sin_addr;  /* IPv4 address */
    char    sin_zero[8];      /* unused */
};
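
Filling in this structure for a given address and port might look like the sketch below. The function name is hypothetical. The sin_len member is not set because it does not exist on every system (Linux has none), and port and address are stored in network byte order.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fills *sa for the given dotted decimal address and port (host byte order).
 * Returns 0 on success, -1 if the address string is not valid. */
int make_inet_addr(struct sockaddr_in *sa, const char *dotted, unsigned short port)
{
    in_addr_t ip = inet_addr(dotted);     /* already in network byte order */

    if (ip == INADDR_NONE)                /* note: also rejects 255.255.255.255 */
        return -1;
    memset(sa, 0, sizeof *sa);            /* clears sin_zero as well */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = ip;
    return 0;
}
```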

Host information
•  structure to support IP address & host name
struct hostent {
    char  *h_name;        /* host name */
    char **h_aliases;     /* other names */
    int    h_addrtype;    /* AF_INET */
    int    h_length;      /* address length */
    char **h_addr_list;   /* list of addresses */
};
#define h_addr h_addr_list[0]
•  defined in /usr/include/netdb.h
Obtaining host information
•  to convert between domain name, dotted
decimal notation, & 32-bit address
•  gethostbyname(): search /etc/hosts or DNS
to find its IP address given its host name
–  returns pointer to a hostent structure
•  inet_addr(): takes dotted decimal address &
returns its IP address (in binary)
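
A common pattern combines the two calls so that an application accepts either a dotted decimal address or a host name. A sketch, with a hypothetical function name; gethostbyname() may consult DNS, so the name path depends on the local resolver setup.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Stores the IPv4 address (network byte order) for host in *out.
 * host may be dotted decimal or a name. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct hostent *he;
    in_addr_t ip = inet_addr(host);       /* try dotted decimal first */

    if (ip != INADDR_NONE) {
        out->s_addr = ip;
        return 0;
    }
    he = gethostbyname(host);             /* fall back to /etc/hosts or DNS */
    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;
    memcpy(out, he->h_addr_list[0], sizeof *out);   /* first address in list */
    return 0;
}
```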

Service information
•  structure to associate port number w/
specific service & protocol name:
struct servent {
    char  *s_name;      /* service name */
    char **s_aliases;   /* other names */
    int    s_port;      /* port # (network byte order) */
    char  *s_proto;     /* protocol to use */
};
•  defined in /usr/include/netdb.h
Obtaining service information
•  To convert between port #, service name, &
protocol name
•  getservbyname(): search network services
database (/etc/services) to find port # given
its service name & protocol name
–  Returns pointer to servent
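
A sketch of a lookup that accepts either a service name or a port number given as text. The function name and the numeric fallback are assumptions added for illustration; only the getservbyname() call comes from the slide.

```c
#include <stdint.h>
#include <stdlib.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Returns the port (host byte order) for a service such as "echo" with
 * protocol "udp" or "tcp", or for a decimal string such as "8080".
 * Returns -1 if the service is unknown and not a valid port number. */
int lookup_port(const char *service, const char *proto)
{
    char *end;
    long n;
    struct servent *se = getservbyname(service, proto);

    if (se != NULL)
        return ntohs((uint16_t)se->s_port);   /* s_port is in network order */

    n = strtol(service, &end, 10);            /* fallback: numeric port */
    if (*service != '\0' && *end == '\0' && n > 0 && n <= 65535)
        return (int)n;
    return -1;
}
```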

Protocol information
•  structure to associate official protocol name
w/ official protocol number (defined in RFC
# 1700 Assigned Numbers)
struct protoent {
    char  *p_name;      /* protocol name */
    char **p_aliases;   /* other names */
    int    p_proto;     /* protocol # */
};
•  defined in /usr/include/netdb.h

Obtaining protocol information
•  To convert between protocol name &
protocol number
•  getprotobyname(): search /etc/protocols for
specified protocol name
– Returns pointer to protoent
– Less useful than gethostbyname() &
getservbyname() in TCP/IP.

Byte ordering
2 formats to store multibyte data:
•  big endian:
–  high-order byte at starting address
     address N          address N+1
     high-order byte    low-order byte
–  Ex: Sun SPARC, IBM mainframes, Motorola
68000.
–  used in TCP/IP, called network byte order, for
integers in protocol header: IP addr, port #, …
Byte ordering (cont)
•  Little endian:
–  Low-order byte at starting address
     address N+1        address N
     high-order byte    low-order byte

–  Ex: Intel 80x86, DEC VAX.

Byte ordering functions
•  Routines to convert between network byte
order & host byte order
– htons(): host to network short.
– htonl(): host to network long.
– ntohs(): network to host short.
– ntohl(): network to host long.
•  Safe to call even if host & network
have same byte ordering
– Should always be called for portability.
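
A short C sketch of these routines in use. The two helper names are hypothetical: one checks the host's byte order at run time, the other stores a port number exactly as it would appear in a protocol header.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Returns 1 on a little endian host, 0 on a big endian host. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);    /* byte stored at the starting address */
    return first == 0x02;         /* low-order byte first: little endian */
}

/* Writes port into out[] in network byte order (high-order byte first). */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);   /* no-op on big endian hosts */

    memcpy(out, &net, sizeof net);
}
```

Whatever the host's byte order, port_to_wire(0x1234, out) leaves 0x12 in out[0] and 0x34 in out[1].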
Include files
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
•  contain predefined constants & data structure
declarations for socket programming
Also,
#include <arpa/inet.h>
#include <netinet/tcp.h>
and others.
Client software design
•  Using black box approach for server
•  Client chooses type of service:
– Connection-oriented (TCP) or
connectionless (UDP)
•  Client needs to know server's:
– Host name / IP address
– Service name / port #
•  Network connection requires 5-tuple:
(Protocol, local addr, local port #, remote
addr, remote port #)

Connection-oriented client algorithm
•  obtain IP addr & port # of server
–  arguments given by user
–  call gethostbyname() & getservbyname()
•  allocate socket
–  call socket()
•  establish connection
–  call connect()
•  communicate w/ server
–  call write() / send() & read() / recv()
•  close connection
–  call close() or shutdown().
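
The allocate and connect steps can be sketched as one C function. To keep it short, this sketch assumes the server is given as a dotted decimal address and a numeric port, so the gethostbyname() & getservbyname() lookups are not shown; the function name is hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Connects to a TCP server. Returns a connected socket descriptor,
 * or -1 on error. The caller communicates with read()/write() and
 * finishes with close(). */
int tcp_connect(const char *dotted_ip, unsigned short port)
{
    struct sockaddr_in srv;
    int s;

    /* obtain IP addr & port # of server */
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    srv.sin_addr.s_addr = inet_addr(dotted_ip);
    if (srv.sin_addr.s_addr == INADDR_NONE)
        return -1;

    /* allocate socket */
    s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* establish connection (blocks during the 3-way handshake) */
    if (connect(s, (struct sockaddr *)&srv, sizeof srv) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A caller might use it as `int s = tcp_connect("192.0.2.10", 7);` where the address is a placeholder from the documentation range, then write a request, read the reply, and call close(s).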
Connection-oriented client socket calls

[Figure: client call sequence socket() -> connect() -> write() -> read() -> close(); connect(), write() & read() exchange traffic with the server.]
Socket call
int socket(int family, int type, int protocol);
•  used to create a new socket
•  family: protocol family (ex: PF_INET)
•  type: socket type (ex: SOCK_STREAM,
SOCK_DGRAM)
•  protocol: protocol # or 0 for given family &
type.
•  returns socket descriptor or -1 for error.
•  specifies one element of 5-tuple: protocol.
Connect call
int connect(int socket, struct sockaddr *addr, int addrlen);
•  used to establish connection w/ server
•  uses 3-way handshake in connection-oriented
•  does not return until connection is established or
error occurs (ex. ETIMEDOUT after 3 SYNs)
•  socket: socket descriptor
•  addr: remote host addr structure (IP addr & port #)
•  addrlen: size of addr (in bytes)
•  returns 0 on success, -1 on failure
Connect call (cont)
•  performs four tasks in connection-oriented:
–  checks validity of socket
–  fills remote endpoint in socket data structure
–  chooses unused local port # & local IP addr
–  establishes TCP connection
•  four elements of 5-tuple filled by connect()
•  In connectionless, it only records remote
endpoint in socket data structure for sending
requests

Sending data calls
int write(int fd, char *buf, int buflen);
int send(int socket, char *buf, int buflen, int flags);
•  2 calls to transfer data to remote machine
•  copy outgoing data into kernel buffers
•  no blocking except if kernel buffers are full
•  fd or socket: socket descriptor
•  buf: buffer containing data
•  buflen: size of buf (in bytes)
•  return # of bytes written/sent or -1 for error
•  flags: 0 or MSG_OOB (out-of-band), others
Receiving data calls
int read(int fd, char *buf, int buflen);
int recv(int socket, char *buf, int buflen, int flags);
•  2 calls to receive data from remote machine
•  Copy incoming data into user's buffer area
•  Arguments as in write()/send()
•  Return # of bytes read/received (could be less
than buflen) or -1 for error
•  Block if no data received
•  Receive only what their buffer can handle if
more data arrived than fits into buffer
•  Multiple calls for stream-oriented protocols
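Because read() on a stream socket may return fewer bytes than requested, callers usually loop. A minimal sketch follows; the helper name read_n is hypothetical, and retrying on EINTR is an added assumption, not something the slide specifies.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes unless EOF or an error occurs.
 * Returns the number of bytes read (less than n only at EOF), or -1. */
ssize_t read_n(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            break;              /* peer closed: end of stream */
        p += got;
        left -= (size_t)got;
    }
    return (ssize_t)(n - left);
}
```

A return value smaller than n tells the caller that the peer closed the stream before all the expected data arrived.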
Closing connection calls
int close(int fd);
int shutdown(int socket, int how);
•  Both calls terminate communication; only close()
releases socket descriptor
•  Return 0 on success, -1 on failure
•  close() decrements reference count if multiple
processes share a socket
•  shutdown() provides partial close mechanism
–  More control over full-duplex connection
•  how: 0 (SHUT_RD): no more data can be received
1 (SHUT_WR): no more output allowed
2 (SHUT_RDWR): connection closed in both directions
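A common use of the partial close is to tell the server "request complete" while still reading its reply. The sketch below assumes a connected stream socket; the helper name request_then_drain is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send a request, half-close the sending direction,
 * then read the reply until the peer closes.
 * Returns the number of reply bytes stored, or -1 on error. */
ssize_t request_then_drain(int s, const char *req, size_t reqlen,
                           char *reply, size_t cap)
{
    if (write(s, req, reqlen) != (ssize_t)reqlen)
        return -1;
    if (shutdown(s, SHUT_WR) < 0)   /* how = 1: no more output */
        return -1;

    size_t used = 0;
    while (used < cap) {
        ssize_t got = read(s, reply + used, cap - used);
        if (got < 0)
            return -1;
        if (got == 0)
            break;                  /* peer finished sending */
        used += (size_t)got;
    }
    return (ssize_t)used;
}
```

The descriptor is still open after this returns; the caller releases it with close().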
Connectionless client algorithm
•  2 modes: connected & unconnected
•  connected mode as in connection-oriented
•  unconnected mode:
– obtain IP addr & port # of server
•  call gethostbyname() & getservbyname()
– allocate socket
•  call socket()
– communicate w/ server
•  call sendto() & recvfrom()
– close socket
•  call close()
Connectionless client socket calls
(unconnected mode)

[Figure: call sequence on the client: socket(), sendto(), recvfrom(), close(), drawn alongside the server]
Sendto call
int sendto(int socket, char *buf, int buflen, int
flags, struct sockaddr *to, int tolen);
•  used in connectionless, unconnected mode
•  specifies both datagram & remote endpoint
•  to: remote host address structure
•  tolen: size of to (in bytes)
•  returns # of bytes sent, -1 for error

Recvfrom call
int recvfrom(int socket, char *buf, int buflen, int
flags, struct sockaddr *from, int *fromlen);
•  used in connectionless, unconnected mode
•  extracts arriving datagram & records
sender's address.
•  from: address of structure to hold sender's
address
•  fromlen: on call, size of from; on return, length of
address stored (value-result argument).
•  used to generate a reply
•  returns # of bytes received, -1 for error.
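The way recvfrom() records the sender so that sendto() can reply to it can be sketched as follows. The helper name udp_echo_once and the 1500-byte buffer are assumptions made for illustration.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one datagram on s and send it back to
 * whoever sent it. Returns bytes echoed, or -1 on error. */
ssize_t udp_echo_once(int s)
{
    char buf[1500];
    struct sockaddr_in from;
    socklen_t fromlen = sizeof(from);  /* in: buffer size, out: addr size */

    ssize_t n = recvfrom(s, buf, sizeof(buf), 0,
                         (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;

    /* the reply goes to the endpoint that recvfrom() recorded */
    return sendto(s, buf, (size_t)n, 0,
                  (struct sockaddr *)&from, fromlen);
}
```

Calling this helper in a loop on a bound SOCK_DGRAM socket gives the core of an iterative connectionless echo server.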
Server software design
•  connectionless vs. connection-oriented
– UDP vs. TCP
•  iterative vs. concurrent
•  stateless vs. stateful

Stateless vs. stateful servers
•  Stateless server does not maintain any state
info about clients it is serving or which data
have been passed to them
–  Every client request must be self-contained
–  More reliable since state info might be
incorrect due to system crash (client or server)
–  Less efficient since each request should have
complete info
–  Ex: NFS server, http server.

Stateless vs. stateful servers (cont)
•  stateful server maintains some info about
ongoing interactions w/ clients
–  it remembers previous client requests
–  Ex: file server has table about files currently
being accessed (file name, current position,
client name)
–  more efficient since requests have less info
–  undesirable w/ UDP
•  idempotent operation produces the same
results every time it runs w/ no harm done
–  Ex: read()

Connectionless vs. connection-oriented
Feature                  IP     UDP    TCP
connection-oriented?     no     no     yes
message boundaries?      yes    yes    no
data checksum?           no     opt.   yes
positive ACK?            no     no     yes
timeout & retransmit?    no     no     yes
duplicate detection?     no     no     yes
sequencing?              no     no     yes
flow control?            no     no     yes
Types of Servers

•  iterative connectionless
•  iterative connection-oriented
•  concurrent connectionless
•  concurrent connection-oriented

•  Iterative connectionless server:
– commonly used
– mainly for short processing time per
request
– used for trivial applications
– easy to program, build, modify & debug
– uses single server process
– often stateless.

•  Concurrent connection-oriented server
– commonly used
– Mainly for requests that require significant
and/or variable processing time
– offers reliable communication
– commonly uses multiple processes, but
single-process implementation is possible
– may execute on multiprocessor machine

•  Iterative connection-oriented server
– Infrequently used
– Uses single process
– Mainly for reliability

•  Concurrent connectionless server
– Infrequently used
– Uses multiple processes
– Mainly for concurrency

•  Server concurrency is hidden from client


Iterative connectionless server algorithm
•  create passive socket of type SOCK_DGRAM
•  bind socket to local endpoint
•  read request from client, process it & send a
reply
•  repeat previous step

Iterative connectionless server socket calls

[Figure: client calls socket(), sendto(), recvfrom(), close(); server calls socket(), bind(), recvfrom() (blocks), sendto()]
Bind call
int bind(int socket, struct sockaddr *addr, int
addrlen);
•  used to assign a local endpoint to socket
•  mainly used by servers, but client can use it
(older implementation)
•  may use INADDR_ANY for local IP address
–  wildcard address matches any of host's IP addr
•  may use reserved port # (privileged),
unreserved port #, or 0 to let kernel choose unreserved port
•  returns 0 for success, -1 for error
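As an illustration of the wildcard address and port 0, the sketch below binds a socket and then asks the kernel which port it chose. The helper name bind_any is hypothetical; getsockname() is a standard call that the slide does not mention.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of the given type bound to the
 * wildcard address. port 0 lets the kernel choose; the port actually
 * bound is stored in *bound. Returns the descriptor, or -1 on error. */
int bind_any(int type, unsigned short port, unsigned short *bound)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int s = socket(PF_INET, type, 0);

    if (s < 0)
        return -1;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local IP address */
    a.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        getsockname(s, (struct sockaddr *)&a, &len) < 0) {
        close(s);
        return -1;
    }
    *bound = ntohs(a.sin_port);
    return s;
}
```

A server would normally pass its well-known port; passing 0 is mostly useful for clients and for tests.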
Concurrent connection-oriented server
algorithm
•  create socket of type SOCK_STREAM
•  bind socket to local endpoint
•  place socket in passive mode & specify # of
queued connections
•  accept next incoming client request, create
new socket descriptor for it, create child
process to handle it
•  parent process repeats previous step
•  child process communicates w/ client &
closes that connection
Concurrent connection-oriented server
socket calls
[Figure: server calls socket(), bind(), listen(), accept() (blocks), read(), write(), close(new); client calls socket(), connect(), write(), read(), close()]
Listen call
int listen(int socket, int queuelen);
•  makes socket ready to accept client requests
(passive mode)
•  called only by TCP server
•  queuelen (backlog): max # of connection
requests that can be queued
–  historically 5, but inadequate for today's applications
–  0 may cause unpredictable results
•  returns 0 for success, -1 for error

backlog
Kernel uses 2 queues for a listening socket:
•  incomplete connection queue: entry for each
client's SYN received
–  server sends its SYN (w/ ACK of client's SYN)
•  completed connection queue: entry for each
client's ACK received
–  TCP 3-way handshake completed
•  Entry from 1st queue either
–  moves to end of 2nd queue once client's ACK received
–  times out
•  backlog specifies max of sum of the 2 queues
(historical BSD definition; implementations differ)
Accept call
int accept(int socket, struct sockaddr *from, int
*fromlen);
•  removes next connection request from completed
connection queue, creates a new socket & returns its
descriptor
•  socket: descriptor created by socket() (listening socket)
•  returns nonnegative integer, or -1 for error
•  from is remote endpoint
•  blocks until request arrives
•  new socket (connected socket) has all elements
of 5-tuple; listening socket has 3 elements only.
Concurrent server
•  consists of single parent process & ≥ 1 child processes
•  parent process
–  creates new socket (connected socket)
–  creates child by calling fork()
–  closes the connected socket
–  goes back waiting at accept() for new connection
–  does not communicate w/ clients
•  child process
–  closes parent socket (listening socket)
–  communicates w/ client
–  closes its descriptor (connected socket)
Cleaning up Zombie
•  Zombie: child process that has exited but its
parent has not waited for it
–  child remains in process table
•  child sends SIGCHLD signal to its parent
whenever it exits
•  parent needs a signal handler to terminate
its child completely.
signal(SIGCHLD, handler);
•  parent executes handler when child exits
•  handler calls wait(), wait3(), wait4() (4.4BSD),
or waitpid() (POSIX).
Wait3 system call
int wait3(int *status, int options, struct rusage *rusage);
•  used by parent to complete child termination
•  options: mainly WNOHANG for non-blocking (not
available w/ wait())
•  status: status of child if not NULL
•  rusage: CPU usage info
•  returns
–  child pid for success
–  0 if no child has exited & WNOHANG is set
–  -1 for error (no child running)
Thread

•  Single stream of control in flow of program


•  Exists within process & uses its resources
•  May share process resources w/ other threads
–  Process instructions, most data, open files (file
descriptors), signal handlers, current working
directory, user & group ID
•  Each thread has its own resources
–  Thread ID, stack (for local variables), set of
registers (program counter, stack pointer), priority
•  Its creation is much faster than process creation
Thread: Advantages & disadvantages
•  Advantages:
–  Lower context switching overhead
–  Memory sharing – easy to access shared data
•  Disadvantages:
–  Memory sharing – action by one thread can affect others
–  Lack of robustness – one misbehaved thread can cause
OS to terminate the entire process
–  Non-reentrant property of many functions
•  can't be called safely when another instance is suspended
•  return pointers to static data items
•  Ex. gethostbyname(), gethostbyaddr()
•  reentrant versions w/ _r suffix are not standard
Pthreads API
•  Part of ANSI/IEEE POSIX 1003.1c standard of 1995
–  Latest version: IEEE Std 1003.1, 2004 Ed.
–  Can be purchased from IEEE or downloaded for free
•  Supported by most vendors
•  There are other vendor-specific threads
•  ~ 100 functions specified
•  Names of all functions begin w/ pthread_
•  Header file pthread.h must be included in each
source file
•  Defined for C language
Pthreads (cont)

•  Thread management: creation & termination


•  Concurrent server using threads
•  Thread synchronization
– Basic constructs:
•  Mutual exclusion (mutex)
•  Condition variables
– Composite constructs:
•  Semaphore
•  Read-write locks
•  Barrier
Pthreads: creation & termination
int pthread_create (pthread_t *thread, const
pthread_attr_t *attr, void *(*func) (void *), void *arg);
•  Creates new thread & makes it runnable (executable)
•  Initially, main comprises single thread (called initial, default
or main thread), other threads must be explicitly created
–  Created threads are peers & may create other threads
•  Returns 0 on success, ≠ 0 on error
•  Max # of threads created by process: implementation
dependent

Pthreads: creation & termination (cont)

•  thread: opaque, unique identifier (thread ID)
returned on success
–  Often unsigned int
•  attr: specifies thread attribute object
–  If NULL, thread w/ default attributes created
•  func: function for thread to execute once created
•  arg: single argument passed to func
–  Multiple arguments must be packaged into struct

Pthreads: creation & termination (cont)
int pthread_join (pthread_t thread, void **status);
•  Blocks the calling thread until thread terminates
•  status: contains thread's termination status
–  The same value passed to pthread_exit
•  Returns 0 on success, ≠ 0 on error
•  Returns immediately if thread already terminated
•  Only threads created as joinable can be joined
–  Threads created as detached can not be joined

Pthreads: creation & termination (cont)
void pthread_exit (void *status);
•  Terminates the calling thread safely
•  Retains thread ID & exit status (status) for later
pthread_join()

int pthread_equal(pthread_t thread1, pthread_t thread2);
•  Determines if 2 thread IDs refer to the same thread
•  Returns 0 if different, ≠ 0 if equal
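The create/exit/join calls fit together roughly as in the following sketch. The struct range bundle and the function names are hypothetical; they show how several arguments are packaged into one struct and how the value passed to pthread_exit() reaches pthread_join().

```c
#include <pthread.h>
#include <stdlib.h>

struct range { int lo, hi; };     /* hypothetical argument bundle */

/* Thread start routine: sums lo..hi and hands a heap-allocated result
 * to whoever joins this thread. */
static void *sum_range(void *arg)
{
    struct range *r = arg;
    long *total = malloc(sizeof(*total));

    if (total == NULL)
        pthread_exit(NULL);
    *total = 0;
    for (int i = r->lo; i <= r->hi; i++)
        *total += i;
    pthread_exit(total);          /* same effect as: return total; */
}

/* Runs sum_range in a new thread and waits for it.
 * Returns 0 on success, -1 on failure. */
int sum_in_thread(int lo, int hi, long *out)
{
    struct range r = { lo, hi };  /* stays valid until the join below */
    pthread_t tid;
    void *status;

    if (pthread_create(&tid, NULL, sum_range, &r) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0 || status == NULL)
        return -1;
    *out = *(long *)status;
    free(status);
    return 0;
}
```

The result is allocated on the heap because the thread's own stack is gone by the time the joiner reads the status.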
Pthreads: creation & termination (cont)
int pthread_detach(pthread_t thread);
•  Changes status of thread so it is detached
•  Resources of detached thread released when thread
terminated
–  Cannot wait for it
•  Can be called by thread that wants to detach itself
–  Passing pthread_self() as thread

pthread_t pthread_self (void);


•  Returns thread ID of the calling thread
Pthreads: creation & termination (cont)
int pthread_once (pthread_once_t *once_control, void
(*once_routine)(void));
•  Executes once_routine exactly once in process
•  Only the first call by any thread in the process
executes it
–  Any subsequent calls (by other threads) have no effect
•  Used typically for initialization
•  once_control: variable used to determine whether
once_routine has been called
–  Global variable, must be initialized w/ PTHREAD_ONCE_INIT
Pthreads: creation & termination (cont)
int pthread_cancel (pthread_t thread);
•  Requests cancellation of another thread, thread
–  Sends cancellation request (message) to thread
•  Uses same termination procedure as pthread_exit
•  Threads can protect themselves against
cancellation
•  Returns after cancellation has been sent
•  Returns 0 on success
–  It means thread is valid thread
–  It does not mean that thread has been cancelled
Pthreads: creation & termination (cont)
•  Thread is terminated if:
–  It calls pthread_exit
–  Another thread calls pthread_cancel on it
–  Function that started it (3rd argument in
pthread_create) returns
•  Return value is the exit status of the thread
–  Entire process is terminated due to calling
exec or exit

Pthreads: attributes object
•  Data structure that describes entity (thread,
mutex, condition variable) properties
•  By default thread is created w/ certain attributes
•  Most attributes can not be changed once created
–  Exception: priority
•  Functions specified to initialize, destroy, query,
set specific attributes in attributes object
–  Starts w/ pthread_attr_
•  Ex. priority, initial stack size, joinable/detached
Pthreads: attributes object (cont)
int pthread_attr_init(pthread_attr_t *attr);
•  Creates thread attributes object & initializes it w/
default values

int pthread_attr_destroy(pthread_attr_t *attr);


•  Destroys thread attributes object & reclaims its
storage space

Pthreads: attributes object (cont)
int pthread_attr_setdetachstate (pthread_attr_t *attr,
int state);
•  Specifies if thread created w/ attr is detached
•  state: either
–  PTHREAD_CREATE_DETACHED (detached)
–  PTHREAD_CREATE_JOINABLE (joinable)
int pthread_attr_getdetachstate (pthread_attr_t *attr,
int *state);
•  Gets detach state stored in attr
Concurrent server using threads
•  consists of single parent thread & ≥ 1 child threads
•  parent thread
–  initializes thread & attributes
–  creates new socket (connected socket)
–  creates child using pthread_create()
–  goes back waiting at accept() for new connection
–  does not communicate w/ clients
•  child thread
–  communicates w/ client
–  closes its descriptor (connected socket)

Concurrent connection-oriented server
algorithm using threads
•  create socket of type SOCK_STREAM
•  bind socket to local endpoint
•  place socket in passive mode & specify # of
queued connections
•  accept next incoming client request, create
new socket descriptor for it, create new
thread to handle it
•  parent thread repeats previous step
•  new thread communicates w/ client &
closes that connection
Pthread: thread synchronization

•  By using pthread_create & pthread_join, threads
can perform concurrent tasks
•  However, if these tasks try to manipulate shared
variables, results are non-deterministic
–  Order of execution by multiple threads cannot be
determined
–  Operation could be broken into multiple instructions
–  If one thread is in the middle of updating shared
variable, gets suspended, another thread executes &
updates the same variable ⇒ results unknown

Pthread: thread synchronization (cont)

•  Mutex (mutual exclusion): mechanism to
protect shared variables using critical section
–  Section can be executed by only one thread at a time
•  Only the thread that holds mutex can access the
shared variables, others have to wait (block)
•  Mutex has 2 states: locked & unlocked
•  Mutex ensures that when several threads update
same variable, final value is same as what it
would be if only one thread performed update.

Pthread: Mutex

•  How is mutex used?
–  mutex is created & initialized to unlocked state
–  Several threads attempt to lock mutex
–  Only one succeeds & that thread owns mutex
–  The owner thread performs some operations
–  The owner unlocks mutex
–  Another thread acquires mutex & repeats process
–  Finally mutex is destroyed

Pthread: Mutex (cont)

int pthread_mutex_lock (pthread_mutex_t *mutex);
•  Used by calling thread to lock mutex
•  If mutex is already locked (by another thread),
calling thread blocks; otherwise it acquires mutex &
returns immediately
•  Calling thread needs to unlock once it is done
–  Otherwise, threads waiting for mutex block
forever (deadlock)

Pthread: Mutex (cont)

int pthread_mutex_unlock (pthread_mutex_t *mutex);
•  Unlocks mutex if called by the owning thread
–  After completing use of critical section
•  One of the blocked threads (if any) will acquire
mutex (depending on scheduling policy)
•  Error returned (by error-checking mutex types) if:
–  mutex is already unlocked
–  mutex is owned by another thread

Pthread: Mutex (cont)
int pthread_mutex_init (pthread_mutex_t *mutex,
pthread_mutexattr_t *attr);
•  Initializes mutex to unlocked state
–  mutex must be declared & initialized before it can be
used
•  attr: attributes object of mutex
–  If set to NULL, default attributes are used
•  mutex can also be initialized when it is declared
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

Pthread: Mutex (cont)

int pthread_mutex_destroy (pthread_mutex_t *mutex);
•  Deletes mutex object when it is no longer
needed
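A minimal sketch of the lock/update/unlock pattern: several threads increment one shared counter. The thread count, iteration count, and function names are illustrative assumptions.

```c
#include <pthread.h>

#define NTHREADS 4          /* illustrative values */
#define NITER    100000

static long counter;        /* shared variable */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&counter_mutex);     /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_mutex);   /* leave critical section */
    }
    return NULL;
}

/* Runs NTHREADS incrementing threads and returns the final counter. */
long run_counter(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    counter = 0;
    while (started < NTHREADS &&
           pthread_create(&tid[started], NULL, add_many, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Without the lock/unlock pair, counter++ is a read-modify-write sequence that threads can interleave, so the final value would usually fall short of NTHREADS * NITER.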

Pthread: Condition Variables
•  Data objects used for synchronization, as mutex
–  Mutex may cause idling of other threads (blocked)
–  Mutex implements synchronization by controlling
access to shared data
•  Provide synchronization based on actual value of
data
•  Use interrupt driven mechanism
–  Mutex uses polling mechanism
•  Used in conjunction w/ mutex

Pthread: Condition variables (cont)

•  Scenario to use condition variable w/ 2 threads
(created by main thread)
•  Main thread:
–  Declare & initialize global data/variables which require
synchronization
–  Declare & initialize condition variable object
–  Declare & initialize mutex
–  Create threads A & B to do work
–  Join threads A & B
Pthread: Condition variables (cont)
•  Thread A:
–  Do work up to point where certain condition occurs
–  Lock mutex & check value of global variable
•  Call pthread_mutex_lock()
–  Block while condition is not met, waiting for signal
from Thread-B
•  Call pthread_cond_wait() – it automatically & atomically
unlocks mutex so that it can be used by Thread-B
–  When signaled, wake up
•  mutex is automatically & atomically locked
–  Explicitly unlock mutex
•  Call pthread_mutex_unlock()
–  Continue
Pthread: Condition variables (cont)
•  Thread B:
–  Do work
–  Lock mutex
•  Call pthread_mutex_lock()
–  Change value of global variable that Thread-A is
waiting upon & then check if it meets condition
–  Once the condition is met, signal to Thread-A
•  Call pthread_cond_signal()
–  Unlock mutex
•  Call pthread_mutex_unlock()
–  Continue
Condition variables: Example

Thread A:
pthread_mutex_lock(&var_mutex);
while (var < limit)    /* re-check condition after every wakeup */
    pthread_cond_wait(&var_cv, &var_mutex);
pthread_mutex_unlock(&var_mutex);
pthread_exit(NULL);

Thread B:
for (i = 0; i < max; i++) {
    pthread_mutex_lock(&var_mutex);
    var++;
    if (var == limit)
        pthread_cond_signal(&var_cv);
    pthread_mutex_unlock(&var_mutex);
}
pthread_exit(NULL);

Pthread: Condition variables (cont)

int pthread_cond_wait (pthread_cond_t *cond,
pthread_mutex_t *mutex);
•  Blocks execution of thread until specific
cond is signaled
•  Called when mutex is locked & automatically
releases mutex while it waits
•  Once signal is received, mutex will be
automatically locked.
Pthread: Condition variables (cont)
int pthread_cond_signal (pthread_cond_t *cond);
•  Unblocks another thread that is waiting on the
condition variable cond
–  signals (or wakes up) at least one thread that is
waiting on the condition variable

int pthread_cond_broadcast (pthread_cond_t *cond);


•  Signals to all threads that are waiting on the
condition variable

Pthread: Condition variables (cont)

int pthread_cond_timedwait (pthread_cond_t *cond,
pthread_mutex_t *mutex, const struct timespec *abstime);
•  Blocks on condition variable until specified time,
abstime, expires
•  When time-out occurs, thread wakes up by itself if
it does not receive signal or broadcast
–  It also reacquires mutex when it becomes available

Pthread: Condition variables (cont)
int pthread_cond_init (pthread_cond_t *cond,
pthread_condattr_t *attr);
•  Initializes condition variable, cond
•  Condition variable must be declared & initialized
before it can be used
•  attr: attributes object of cond
–  If set to NULL, default attributes are used
–  Only one attribute is defined, process-shared
•  cond can also be initialized when it is declared
pthread_cond_t myconvar = PTHREAD_COND_INITIALIZER;
Pthreads: composite synchronization constructs

•  Built using basic synchronization constructs
(mutex, condition variables)
–  Basic constructs provide minimal functionality
•  Examples:
– Semaphore
– Read-write locks
– Barrier

Pthreads: Semaphore
•  Also called counting semaphore
•  Similar to mutex but it allows multiple threads to
proceed through critical section
•  sem_init(): initialize semaphore (s) to initial count
•  sem_wait(): wait (block calling thread) until s > 0,
then decrement s by 1 & proceed
–  Block others if s = 0
•  sem_post(): increment s by 1, to allow another
thread to proceed (if any).
Pthreads: Read-write locks
•  Allow concurrent reads & exclusive writes to critical
section
•  Multiple reads can proceed while writes must be
serialized
•  Used when there are many more reads than writes or
if value needs to be examined before it is updated
–  Ex. Database access
•  Slower than mutex but it can improve performance
if write is infrequent while read is frequent

Pthreads: Barrier

•  Hold thread until all other threads have
reached the barrier
–  No work is performed while waiting
•  Provide synchronization point
•  Many possible implementations
–  Counter (int), mutex & condition variable
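One possible implementation along the lines of the last bullet is sketched below. The struct layout and names are assumptions; the round counter is an added detail that lets the barrier be reused and protects against spurious wakeups.

```c
#include <pthread.h>

/* Hypothetical reusable barrier built from a counter, a mutex and a
 * condition variable. */
struct barrier {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int needed;        /* threads that must arrive */
    int waiting;       /* threads arrived in the current round */
    unsigned round;    /* incremented each time the barrier opens */
};

void barrier_init(struct barrier *b, int needed)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->needed = needed;
    b->waiting = 0;
    b->round = 0;
}

void barrier_wait(struct barrier *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_round = b->round;

    if (++b->waiting == b->needed) {
        b->waiting = 0;                  /* last arrival opens the barrier */
        b->round++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (b->round == my_round)     /* sleep until the round changes */
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```

Each of the needed threads calls barrier_wait() at the synchronization point; none returns until the last one has arrived.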

I/O Models

•  Blocking I/O
•  Non-blocking I/O
•  I/O Multiplexing
•  Signal Driven I/O
•  Asynchronous I/O

Blocking I/O
Application              Kernel
call read data           data not ready
process blocks
                         data arrive
data recv’d              data in recv buf

•  All sockets are blocking by default


•  Read data call does not return until data arrive
Non-blocking I/O
Application              Kernel
call read data           data not ready
recv error
call read data (2nd)
recv error
…                        …
call read data (nth)     data arrive
data recv’d              data in recv buf

•  Call returns error when data is not ready (no blocking)


•  Application keeps trying until data is in recv buf (polling)
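A single polling attempt on a non-blocking descriptor might look like the sketch below. The helper names and the -2 "not ready" sentinel are hypothetical; fcntl() with O_NONBLOCK is the usual POSIX way to switch a descriptor to non-blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put descriptor fd into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* One polling attempt: returns bytes read, 0 at EOF, -1 on a real
 * error, or -2 (hypothetical sentinel) when no data is ready yet. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;          /* the "recv error" case: try again later */
    return n;
}
```

An application that calls try_read() in a tight loop burns CPU time while nothing arrives, which is the main cost of polling.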
I/O Multiplexing
Application              Kernel
call select              data not ready
process blocks
socket readable          data arrive
call read data
data recv’d              data in recv buf

•  select blocks until socket(s) is readable


•  Once socket is readable, application calls read data
Signal Driven I/O
Application                  Kernel
establish signal handler     data not ready
signal recv’d                data arrive
call read data
data recv’d                  data in recv buf

•  Enable socket for signal driven I/O & establish signal handler
(no blocking)
•  Once signal recv’d, application calls read data
Asynchronous I/O (POSIX)
Application                  Kernel
call aio_read data           data not ready
                             data arrive
signal & data recv’d         data in recv buf

•  Call asynchronous I/O read, aio_read (no blocking)


•  Application receives signal once data recv’d (in appl buf)

Single-Process Concurrent Server
•  single server process manages multiple
connections
•  concurrency through I/O multiplexing
– using select()
•  used if application needs data sharing and/or
for short processing time
•  less switching between process contexts

Single-process concurrent server algorithm
•  create socket of type SOCK_STREAM
•  bind socket to local endpoint
•  place socket in passive mode
•  add socket to possible I/O list (file descriptors)
•  wait for activity on file descriptors - select()
•  If original socket is ready, accept new
connection, create new socket, add it to
possible I/O list & go back waiting
•  If another socket is ready, communicate w/
client, close socket, remove it from possible I/O
list when it finishes & go back waiting.
Select system call
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
•  instructs kernel that we want to be notified when I/O
condition(s) are ready
–  process waits for any of file descriptors to become ready
•  provides I/O multiplexing
•  nfds: # of file descriptors to be tested (highest
descriptor # in the sets + 1)
•  readfds, writefds, exceptfds: file descriptors for
reading, writing, exceptions
•  timeout: max time to wait; zero timeval means no
waiting, NULL means wait indefinitely
Select system call (cont)
•  returns
– Total # of ready file descriptors for success
– 0 means time expired
– -1 for error
•  On return, non-ready descriptor will have its
corresponding bit cleared
•  timeval: struct defined in <sys/time.h>
•  fd_set:
– struct defined in <sys/types.h>
– set of possible descriptors using array of integers
•  Ex: 32-bit integer for descriptors 0 to 31
FD_XXX macros
void FD_SET(int fd, fd_set *fdset);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
void FD_ZERO(fd_set *fdset);
•  FD_SET: set the bit for fd in fdset
•  FD_CLR: clear the bit for fd in fdset
•  FD_ISSET: test the bit for fd in fdset
–  returns non-zero if bit for fd is set, 0 otherwise
•  FD_ZERO: clear all bits in fdset
Obtaining process file descriptors
•  int getdtablesize()
– system call to get # of entries in file
descriptor table for process
•  NOFILE
– max # of open files per process
– constant defined in <sys/param.h>
•  FD_SETSIZE
– # of descriptors in fd_set
– constant defined in <sys/select.h>
Multiprotocol Servers
•  using single server to handle multiple
transport protocols (TCP & UDP)
•  using same service for 2 protocols
•  used for both iterative & concurrent servers
•  uses I/O multiplexing, select()
•  Adv: save system resources
•  Adv: code sharing - easier maintenance &
versioning
•  Disadv: less control
Multiprotocol server algorithm
•  create 2 sockets: TCP & UDP
•  bind both sockets
•  place TCP socket in listen mode
•  add both sockets to possible I/O list
•  use I/O multiplexing to wait for activity
•  If TCP socket is ready, accept connection &
communicate w/ client
•  If UDP socket is ready, communicate w/ client
•  Go back waiting on the two sockets.

Multiservice servers
•  using single server for multiple services
•  may use multiple protocols
•  Adv: reduce # of active processes, daemons
•  Adv: code sharing (less code & easy maintenance)
•  uses I/O multiplexing
•  single server may not be able to offer all services
–  limit # of sockets per process
•  Ex: Unix Superserver, inetd or xinetd

Multiservice server algorithm
•  create sockets for each service it offers
•  add sockets to list of possible I/O
•  wait for socket to be active
•  If TCP or UDP socket is ready, invoke
procedure to handle
•  may use iterative, concurrent, or mixture of
servers
•  may use single process

Server configurations
•  changing set of services that server handles w/o
recompiling source code
•  using configuration file that contains set of
services & programs to be used
•  2 types: static & dynamic configurations
•  static configuration:
– server reads configuration file when it starts
– changing configuration file requires
restarting server

Dynamic configuration
•  changing configuration file dynamically
–  w/o restarting server
•  server reads initial configuration file
•  sys adm informs server if changes occur
–  sends signal or message (using TCP/IP)
•  server creates new sockets for new services
•  server removes sockets for deleted services
–  continues ongoing service if it has been deleted
until completion
•  Ex: inetd
Inetd
•  available for most Unix systems (starting
4.3BSD)
•  offers many well-known services (ex: ftp,
telnet, tftp, echo)
•  uses single process, inetd, waiting to service
multiple requests (both TCP & UDP)
•  uses /etc/inetd.conf (or /etc/xinetd.conf ) as
configuration file

Inetd configuration file (/etc/inetd.conf)
•  specifies services that inetd listens for
•  each entry has 6 or more fields:
–  service name: name of service (in /etc/services)
–  socket type: type of socket (stream or dgram)
–  protocol: protocol name (TCP or UDP)
–  wait status: wait or nowait
•  nowait: run multiple copies of service
concurrently
•  wait: run iteratively but create new process

Inetd configuration file (cont)
–  user id: login id or root
–  server program: full pathname of service
program or internal to use internal version
–  arguments: 0 or more arguments to be passed
to service program (commonly server name)
•  Ex: ftp stream tcp nowait root /etc/ftpd ftpd

Server concurrency

•  Choosing between concurrent & iterative server
designs depends on user demands, service time,
communication & processing speeds, …
–  These are not fixed factors (constantly changing)
–  Designers decide based on recent trends

Cost of concurrency
Let p: processing time, c: time required to
create process, assume burst of n requests
arrives at time 0, then:
•  Iterative server completes processing
requests at times p, 2p, …, np
–  Average per request: ( (n+1) / 2 ) p
•  Concurrent server completes processing
requests at times c + p, 2c + p, …, nc + p
–  Average per request: ( (n+1) / 2 ) c + p
•  If p < c, iterative server performs better
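The two averages can be checked numerically with a small sketch; the function names avg_iterative() and avg_concurrent() are hypothetical, and the model is the simplified one on this slide (process creation is serialized, processing overlaps fully).

```c
/* Iterative server: completions at p, 2p, ..., np
 * => mean completion time = ((n + 1) / 2) * p */
double avg_iterative(int n, double p)
{
    return (n + 1) / 2.0 * p;
}

/* Concurrent server: completions at c+p, 2c+p, ..., nc+p
 * => mean completion time = ((n + 1) / 2) * c + p */
double avg_concurrent(int n, double p, double c)
{
    return (n + 1) / 2.0 * c + p;
}
```

For n = 9, p = 2 and c = 5 the averages are 10 (iterative) and 27 (concurrent), so the iterative design wins; for n = 9, p = 50 and c = 5 they are 250 and 75, so the concurrent design wins.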
Level of concurrency
•  Total # of tasks that server has executing at given
time
–  Task: process or thread
–  Varies over time
–  Also called degree of concurrency
•  More importantly: Max level of concurrency
–  Limited by OS & network SW
–  Requests denied once limits are reached
–  Designers might ignore it for flexibility

Concurrent server design options
•  Demand-driven (on demand)
–  based on demands (client requests)
–  creates processes or threads as needed
–  does not consume resources unnecessarily
–  causes delay since it creates child for each request
–  handles new requests w/o waiting for existing requests to
finish
•  Child (slave) pre-allocation
–  Pre-forking or pre-threading
•  Delayed child (slave) allocation
•  Combination of above
Child pre-allocation concurrent server
•  Parent creates number of children at startup
•  Children handle requests as they arrive
–  One of the waiting children serves a client while others
keep waiting
•  Children persist – they don't exit
•  Advantages: requests can be handled immediately
–  w/o waiting to create child
–  Cost of child creation paid in advance
•  Disadvantage: # of needed children must be known in
advance
•  Disadvantage: resource allocation & maintenance
Child pre-allocation concurrent server (cont)
•  Parent may monitor # of available children
–  May create new children if # is below threshold
–  May terminate children if # is above another threshold
•  Parent may do nothing
–  May exit once children have been started
–  May become the last child
•  Used for both connection-oriented &
connectionless servers
•  On multiprocessor: # of children ≡ # of processors
•  2 options: pre-forking & pre-threading
Preforked connection-oriented server algorithm
•  Parent opens a socket
•  Parent binds socket & places it in listen mode
•  Parent forks # of child processes
–  # might be provided as command-line argument
–  Each child inherits socket descriptors from parent
•  Each child calls accept() on the same listening
socket - blocks awaiting requests
–  One child unblocked, handles connection, closes new
socket, then goes back waiting for new connection
–  How does OS handle concurrent invocations of accept()?
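The preforked algorithm can be sketched as below. This is a sketch under stated assumptions: the helper name prefork_children() is hypothetical, the service is a one-message echo, and each child serves only max_conns connections before exiting so that the example terminates (a real preforked child loops forever).

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fork nchildren processes that all block in
 * accept() on the same (already listening) socket inherited from the
 * parent. Returns 0 in the parent, -1 if fork() fails. */
int prefork_children(int listen_sock, int nchildren, int max_conns)
{
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                          /* child */
            for (int served = 0; served < max_conns; served++) {
                char buf[256];
                int conn = accept(listen_sock, NULL, NULL);
                if (conn < 0)
                    _exit(1);
                ssize_t n = read(conn, buf, sizeof(buf));
                if (n > 0 && write(conn, buf, (size_t)n) < 0)
                    _exit(1);
                close(conn);                     /* close the new socket */
            }
            _exit(0);
        }
    }
    return 0;
}
```

The parent calls socket(), bind() and listen() first and then prefork_children(); the number of children could come from a command-line argument.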
Concurrent invocations of accept()

•  In 4.4BSD implementation:
–  once new connection arrives, all children are
awakened – called thundering herd problem
–  1st child to run obtains connection while others return
to blocked state
•  In Linux: only one child is awakened & handles
connection – efficient
•  In SVR4: accept() may return an error
–  Solution: place a lock around accept()
Locking accept()

•  To ensure that only 1 child calls accept() at any time
•  Either using file locking w/ fcntl()
•  Or using thread locking between processes
–  mutex must be stored in memory shared between
processes (using mmap() for example)
–  Thread library must be told that the mutex is shared
between processes (using mutex attributes)
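The file-locking variant can be sketched as follows. The function names and the lock-file path template are hypothetical; only fcntl() with F_SETLKW and struct flock are standard.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int lock_fd = -1;

/* Create the lock file in the parent before forking, so that every
 * child inherits lock_fd. Returns 0 on success, -1 on error. */
int accept_lock_init(void)
{
    char path[] = "/tmp/accept_lock.XXXXXX";   /* hypothetical template */
    lock_fd = mkstemp(path);
    if (lock_fd < 0)
        return -1;
    unlink(path);       /* file disappears when last descriptor closes */
    return 0;
}

static int lock_op(int type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = (short)type;   /* F_WRLCK to acquire, F_UNLCK to release */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 = lock the whole file */
    return fcntl(lock_fd, F_SETLKW, &fl);      /* blocks until granted */
}

int accept_lock_wait(void)    { return lock_op(F_WRLCK); }
int accept_lock_release(void) { return lock_op(F_UNLCK); }
```

Each child would then bracket its call: accept_lock_wait(); conn = accept(...); accept_lock_release();. A production version would also retry when fcntl() fails with EINTR.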

Prethreaded connection-oriented server algorithm
•  Parent opens a socket
•  Parent creates pool of threads
–  Each child shares socket descriptors w/ parent
•  Each child calls accept() on the same listening
socket - blocks awaiting requests
–  One child unblocked, handles connection, closes new
socket, then goes back waiting for new connection
–  May use mutex to allow only 1 thread to call accept()
–  In Linux & 4.4BSD, mutex is not needed

Delayed child allocation

•  Server measures processing time & chooses between
iterative & concurrent handling dynamically
•  Server starts serving request iteratively but creates
child to handle it if processing time exceeds threshold
–  Child continues processing at the point where parent was
executing when timer expired
•  Can be combined w/ child pre-allocation
–  Starting w/o pre-allocation but newly created children can
persist

Concurrency in clients

•  Clients can benefit from concurrency:
–  Modularity
–  Contacting multiple servers at the same time
–  Handling multiple tasks dynamically
•  Implementations:
–  Multithreading
–  I/O Multiplexing
•  Ex: telnet
Socket options
•  changing or reading options associated w/
socket
•  2 system calls apply to sockets only:
– getsockopt()
– setsockopt()
•  2 system calls apply to file descriptors:
– fcntl()
– ioctl()

Sockopt system calls.
int getsockopt(int socket, int level, int
optname, void *optval, socklen_t *optlen);
int setsockopt(int socket, int level, int
optname, const void *optval, socklen_t optlen);
•  to obtain or set parameter related to socket.
•  both return 0 for success, -1 for error.
•  socket: open socket (TCP: options set on listening
socket are carried over to connected socket).
•  level: system module to interpret options
–  Ex: SOL_SOCKET (socket), IPPROTO_IP
(IP), IPPROTO_TCP (TCP).
Sockopt system call options
•  optname: integer identifies option
•  optval: pointer to variable (set/returned)
•  optlen: length of optval
•  2 option types: binary & passing value
•  binary (flag) option: enable/disable feature
– optval: 0 (disable), nonzero (enable)
•  passing value option: fetch/return value
– optval: passing value
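The two option types can be illustrated with a short sketch; the wrapper names enable_reuseaddr() and get_rcvbuf() are hypothetical, while the setsockopt()/getsockopt() calls and option names are standard.

```c
#include <sys/socket.h>

/* Binary (flag) option: enable SO_REUSEADDR. Returns 0 or -1. */
int enable_reuseaddr(int sock)
{
    int on = 1;                      /* nonzero enables the feature */
    return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}

/* Value option: fetch receive buffer size. Returns size or -1. */
int get_rcvbuf(int sock)
{
    int size = 0;
    socklen_t len = sizeof(size);    /* in: buffer size, out: bytes stored */
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}
```

Note that optlen is passed by value to setsockopt() but by pointer to getsockopt(), since the kernel reports how many bytes it stored.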

SOL_SOCKET options
optname        get  set  flag  data type
SO_BROADCAST    •    •    •    int
SO_DEBUG        •    •    •    int
SO_DONTROUTE    •    •    •    int
SO_ERROR        •              int
SO_KEEPALIVE    •    •    •    int
SO_LINGER       •    •         struct linger
SO_OOBINLINE    •    •    •    int
SO_RCVBUF       •    •         int
SO_SNDBUF       •    •         int
SO_REUSEADDR    •    •    •    int
SO_REUSEPORT    •    •    •    int
Generic socket options
•  protocol independent
–  some apply to specific socket type.
•  SO_BROADCAST: allow/disallow process
to send broadcast messages
–  datagram sockets only
–  only on networks that support broadcasting
–  default: disallowed
•  SO_DEBUG: enable/disable low-level
debugging within kernel
–  stream sockets only
–  default: disabled

Generic socket options (cont)
•  SO_DONTROUTE: to bypass normal
routing mechanism
–  uses netid for routing
–  may be used by routing daemons for debugging
–  default: disabled
–  similar to option MSG_DONTROUTE in send()
•  SO_ERROR: get error status & clear error
–  get so_error contents for socket: reset by kernel
•  SO_KEEPALIVE: enable periodic
transmission to keep connection alive
–  stream sockets only; default: disabled
Generic socket options (cont)
•  SO_LINGER: control whether process waits if there are
unsent data & socket closed
–  stream sockets only
–  default: disabled - system attempts to send all unsent data
–  uses struct linger {
       int l_onoff;    /* = 0: OFF, ≠ 0: ON */
       int l_linger;   /* linger time (seconds) */
   };
–  l_onoff = 0, option disabled
–  l_onoff ≠ 0, l_linger = 0, discard unsent data
–  l_onoff ≠ 0, l_linger ≠ 0, linger for l_linger time
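The three cases above map onto one setsockopt() call. The wrapper name set_linger() is hypothetical.

```c
#include <sys/socket.h>

/* onoff == 0:               default close() behavior
 * onoff != 0, seconds == 0: unsent data is discarded on close()
 * onoff != 0, seconds  > 0: close() lingers up to `seconds`
 * Returns 0 on success, -1 on error. */
int set_linger(int sock, int onoff, int seconds)
{
    struct linger lg;
    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```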
Generic socket options (cont)
•  SO_OOBINLINE: out-of-band (TCP urgent)
data to be left in input queue (in-line)
–  otherwise, out-of-band data pulled out of data
stream (used w/ MSG_OOB flag in recv())
–  stream sockets only; default: disabled
•  SO_RCVBUF & SO_SNDBUF: set receive
& send queue buffer sizes
–  to improve performance
•  SO_REUSEADDR & SO_REUSEPORT: to
reuse local endpoints for subsequent calls
IPPROTO_IP options

optname      get  set  flag  data type
IP_OPTIONS    •    •
IP_HDRINCL    •    •    •    int
IP_TOS        •    •         int
IP_TTL        •    •         int

IP socket options:
•  IP_OPTIONS: set options in IP header
•  IP_HDRINCL: to build complete IP header
for all datagrams using raw socket
•  IP_TOS: to set type-of-service field in IP
header
•  IP_TTL: to set time-to-live field in IP
header

IPPROTO_TCP

optname       get  set  flag  data type
TCP_MAXSEG     •    •         int
TCP_NODELAY    •    •    •    int

TCP socket options:
•  TCP_MAXSEG: to get/set maximum
segment size for TCP connection
–  can't increase received value
–  set allowed only in 4.4BSD
•  TCP_NODELAY: don't delay send
–  disable Nagle algorithm
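Setting TCP_NODELAY is a flag option at level IPPROTO_TCP, as in this sketch (the wrapper name disable_nagle() is hypothetical).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn off the Nagle algorithm so small writes are sent without delay.
 * Returns 0 on success, -1 on error. */
int disable_nagle(int sock)
{
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```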

fcntl system call

int fcntl(int fd, int cmd, ... /* arg */);
•  Stands for file control
•  Provides control over open file descriptor, fd
•  arg: (if present) its data type, value & use
depend on command cmd
•  Return value depends on cmd if OK, -1 on error
•  Available values of cmd defined in fcntl.h
•  fd: has set of flags that can be fetched & set
fcntl (cont)
•  3 commands used w/ socket:
– F_GETFL: fetch file status flags
– F_SETFL: set file status flags
– F_SETOWN: set socket owner (process ID or
process group ID) to receive SIGIO & SIGURG signals
•  Flag is set by fetching current flags, logically
OR return value w/ new flag, then set new flags
•  2 status flags affect socket:
– O_NONBLOCK: non-blocking I/O
– O_ASYNC: signal-driven I/O
•  If flag is not supported, ioctl() used instead
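The fetch, OR, set sequence for a status flag looks like this in C; the helper name set_nonblocking() is hypothetical.

```c
#include <fcntl.h>

/* Turn on O_NONBLOCK without disturbing the other file status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);          /* fetch current flags */
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)   /* OR in, set */
        return -1;
    return 0;
}
```

Setting the flags directly with F_SETFL, without fetching them first, would clear any other status flag that was already on.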
Signal-driven I/O
•  Kernel sends signal when status of descriptor
changes
–  Using SIGIO signal in Berkeley-derived implementations
•  Requires process to perform following steps:
–  Must have signal handler for SIGIO
–  Socket owner must be set (using F_SETOWN of fcntl())
–  Signal-driven I/O must be enabled for socket (using
F_SETFL of fcntl() to turn on O_ASYNC)
•  Easy to set but hard to determine cause of SIGIO
Non-blocking I/O

•  By default, sockets are blocking
•  In non-blocking mode, call returns immediately
instead of putting process to sleep
•  Socket calls that may block:
– input operations: ex. read(), recv(), recvfrom()
– output operations: ex. write(), send(), sendto()
– accepting incoming connections: accept()
– initiating outgoing connections: connect()

Non-blocking I/O: Input operations
•  Normally (by default) process blocks if there is
no data available in socket receive buffer
–  TCP: process awakened when some data arrives (1
byte or full segment)
–  UDP: process awakened when datagram arrives
•  W/ non-blocking socket, process reads as much
as it can immediately & returns w/o waiting
–  At least 1 byte of data in TCP or datagram in UDP
•  If no data to read, function immediately returns
-1 & sets errno to EWOULDBLOCK
–  Or sets errno to EAGAIN (in System V)
Non-blocking I/O: Output operations
•  Normally process blocks if there is no room in
socket send buffer (in TCP)
–  In UDP, there is no socket send buffer but process
may block for other reasons
•  W/ non-blocking socket:
–  if there is no room in socket send buffer, function
immediately returns -1 & sets errno to
EWOULDBLOCK
–  If there is some room in socket send buffer,
function returns # of bytes kernel was able to copy

Non-blocking I/O: Accepting incoming connections

•  Normally process blocks if new connection is
not available
•  W/ non-blocking socket, if new connection is
not available, accept() immediately returns -1
& sets errno to EWOULDBLOCK
–  Listening socket should be set to non-blocking

Non-blocking I/O: Initiating outgoing connections
•  Normally connect() doesn't return until client
receives ACK of its SYN
•  W/ non-blocking socket, if connection can't be
established immediately, connection establishment
is initiated & connect() immediately returns -1 &
sets errno to EINPROGRESS
–  Client can check on connection status using select()
•  Successful connection ⇒ descriptor is writable
•  Case of error ⇒ descriptor is readable & writable (fetch
pending error w/ getsockopt() SO_ERROR)
–  If server & client are on same host, connection is
established immediately: connect() returns 0
Raw sockets
•  allows processes direct access to protocol
other than ones used for transport
–  Ex: access network layer protocols (IP, ICMP)
–  used by knowledgeable processes to use some
protocol features or developing new protocols
–  allows read/write ICMP & IGMP packets (Ex:
ping)
–  process can read/write IP datagrams w/ IP
protocol field unprocessed by kernel
–  process can build its own IP header

Raw socket design
•  create socket of type SOCK_RAW &
protocol IPPROTO_XXX (ex: XXX: ICMP)
•  may use IP_HDRINCL socket option
•  rarely use bind() or connect()
– only IP address - NO port #.
•  use sendto() for sending data
– may use send() or write() after connect()
•  If IP_HDRINCL is not set:
–  kernel builds IP header
–  data start after IP header
Raw socket design (cont)
•  If IP_HDRINCL is set:
–  process builds IP header except IP
identification field (if user sets it to 0, kernel sets
it) & IP header checksum (set by kernel)
•  process computes checksum for what
follows IP header (ex: ICMP)
•  use recvfrom() to read data
–  always returns IP header
–  can use recv() & read() after connect()

Receiving raw socket messages
•  TCP & UDP packets are not passed
•  Most ICMP packets are passed
–  exception: echo request, timestamp request &
address mask request (processed by kernel)
•  All IGMP packets are passed
•  IP datagram w/ unknown protocol field are passed
–  known: 1(ICMP), 2(IGMP), 6(TCP), 17(UDP)
–  raw socket w/ nonzero protocol receives
datagrams w/ matching protocol #
–  If bind() or connect() used, only matched IP addr
–  otherwise, all datagrams will be received

ping
•  uses ICMP echo request/reply: types 8/0
•  used to check if remote host is reachable
•  msg format: struct defined in <netinet/ip_icmp.h>
struct icmp {
    u_char  icmp_type;     /* message type */
    u_char  icmp_code;     /* code type */
    u_short icmp_cksum;    /* struct checksum */
    u_short icmp_id;       /* identifier */
    u_short icmp_seq;      /* seq # */
    char    icmp_data[1];  /* optional data */
};

ICMP Echo Request/Reply Message Format

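 0            8            16                      31
+------------+------------+-------------------------+
| TYPE (8/0) |  CODE (0)  |        CHECKSUM         |
+------------+------------+-------------------------+
|       IDENTIFIER        |     SEQUENCE NUMBER     |
+-------------------------+-------------------------+
|                 OPTIONAL DATA ...                 |
+---------------------------------------------------+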

ping design
•  fetch input arguments: SO_DEBUG,
SO_DONTROUTE, data size, # of packets
•  create socket: type SOCK_RAW, protocol:
icmp (using getprotobyname())
•  set data area size (if data size specified &
enable timing if there is space)
•  fetch Unix process id & use it as identifier
•  use signal handler to send 1st packet &
schedule SIGALRM for 1 second
–  fill in the ICMP header

Ping design (cont)
–  add time stamp to data sent
–  compute & store ICMP checksum
–  use sendto() to send datagram
•  use infinite loop to read all ICMP echo
reply messages
–  use recvfrom() to receive data
–  record time of incoming message
–  get pointer to ICMP header (deduct IP header)
–  check for ICMP reply message
–  check for identifier field
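The "compute & store ICMP checksum" step uses the standard Internet checksum (RFC 1071). The sketch below is one portable way to write it; the function name in_cksum() follows common usage but the code is not taken from the course's ping source.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum: 16-bit one's complement of the one's complement
 * sum of the data, taken as big-endian 16-bit words. The checksum
 * field must be zero while computing. The result is a numeric value;
 * store its high byte first in the packet. */
uint16_t in_cksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;          /* pad odd byte w/ zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}
```

For an 8-byte echo request with type 8, code 0, identifier 0x1234 and sequence number 1, the checksum is 0xE5CA; recomputing over the packet with that value stored yields 0, which is how the receiver verifies it.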
Windows Sockets Interface (Winsock)
•  socket interface for use w/ MS Windows
95, NT & subsequent Windows releases
•  multivendor specification
•  based on BSD sockets
–  Here we emphasize similarities & differences
•  mostly implemented in Dynamic Linked
Libraries (DLLs)
•  currently 2 versions: winsock 1.1 &
winsock 2

Winsock 1.1
•  released in Jan 1993
•  supports TCP/IP protocol suite only
•  provides single API
–  either WINSOCK.DLL (16-bit) or
WSOCK32.DLL (32-bit)
•  allows only one Winsock implementation
on a machine at one time

Winsock 2
•  released in June 1997
•  supports multiple protocol stacks (TCP/IP,
IPX/SPX, OSI, DecNet) simultaneously
•  adopts Windows Open Systems Architecture
(WOSA) model
–  separates Winsock 2.0 API (WS2_32.DLL)
from Service Provider Interface (SPI)
•  Winsock 2 API:
–  supports multiplexing between multiple SPIs
–  provided & maintained by Microsoft & Intel
Winsock 2 (cont)
•  added new functionality
•  backward compatible w/ 1.1
–  Source & binary level compatibility
•  no 16-bit or NT 3.5 (or earlier) support

Similarities between Winsock & BSD Socket
•  similar client & server design algorithms
–  both connectionless & connection-oriented
–  both iterative & concurrent
•  most data structures & calls are the same
–  sockaddr_in, hostent, servent, protoent
–  gethostbyname(), getservbyname(),
getprotobyname()
–  htons(), htonl(), ntohs(), ntohl()
–  socket(), accept(), bind(), connect(), listen(),
recv(), recvfrom(), send(), sendto(), shutdown(),
select(), getsockopt(), setsockopt()

Differences between winsock & BSD socket
•  BSD sockets allow mixing I/O w/ socket
functions, winsock doesn't. In winsock:
–  no read() or write()
–  close() replaced by closesocket()
–  select() works w/ socket descriptors only
–  ioctl() replaced by ioctlsocket(), to provide
functionality for sockets only
•  different include file: #include <winsock.h>
–  winsock2.h (for winsock 2)
•  low port # (< 1024) can be used in winsock
Differences (cont)
•  Concurrency through threading only in winsock
–  POSIX Threads vs. Windows Threads
•  Error handling is different:
–  BSD socket calls return -1 for error
–  Winsock calls return explicit value of
INVALID_SOCKET or SOCKET_ERROR
–  application uses WSAGetLastError(void) to retrieve
error code
•  differences in handling files, directories,
processes between Unix & Windows
•  2 calls added to startup & cleanup:
WSAStartup( rvers, wsimpl);
•  must be called before using socket
•  needed since system uses DLLs
•  rvers: 2-byte integer identifies winsock
version requested
•  wsimpl: pointer to struct WSADATA which
returns info about winsock version used
•  returns 0 for success, error code otherwise
WSACleanup(void);
•  called at the end to deallocate data
structures & socket bindings
Iterative connectionless winsock
[Figure: call sequence on each side]
Client: WSAStartup → socket() → sendto() → recvfrom() → closesocket() → WSACleanup
Server: WSAStartup → socket() → bind() → recvfrom() (blocks) → sendto()
Concurrent connection-oriented winsock
[Figure: call sequence on each side]
Client: WSAStartup → socket() → connect() → send() → recv() → closesocket() → WSACleanup
Server: WSAStartup → socket() → bind() → listen() → accept() (blocks) → recv() → send() → closesocket(new)
eXternal Data Representation (XDR)
•  standard developed by Sun
•  specified in RFC 1832, Aug 95
•  used to represent common forms of data for
network communication
•  includes library procedures for conversion
•  used in RPC, NFS
•  computers & network represent data in
different forms (size & format)
–  big endian, little endian

XDR (cont)
•  TCP/IP uses big endian byte order for
protocol header
–  Socket uses it for port #
–  Only for simple integer data
–  Byte ordering functions used for conversion
•  Uses symmetric data conversion
–  Conversion at both ends
–  Uses machine-independent representation
–  Adv: flexibility (avoid n × m problem)
–  Disadv: performance (a layer of SW)
–  Disadv: sometimes unnecessary conversion
XDR (cont)
•  uses implicit typing
–  Only the value of variable transmitted, not its
type & size
–  Alternative: explicit typing: extra data
transmitted specifying type & size (Ex: ASN.1)
•  defines numerous data types & how to
convert them
–  uses big endian byte ordering
–  min size of any field: 32 bits
–  data types: int, bool, enum, float, double, string,
structure, fixed array
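To make the encoding rules concrete, the sketch below hand-encodes an int and a string the way XDR lays them out (big endian, every item padded to a multiple of 4 bytes). It deliberately does not use the XDR library; the function names put_int() and put_string() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit integer in big-endian order. Returns bytes written. */
size_t put_int(unsigned char *buf, int32_t v)
{
    uint32_t u = (uint32_t)v;
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
    return 4;
}

/* Store a string: 4-byte length, then the bytes, zero-padded so the
 * total is a multiple of 4. Returns bytes written. */
size_t put_string(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;

    put_int(buf, (int32_t)len);
    memcpy(buf + 4, s, len);
    memset(buf + 4 + len, 0, padded - len);
    return 4 + padded;
}
```

Encoding "hello" takes 12 bytes: 4 for the length, 5 for the characters and 3 of padding. Because typing is implicit, the receiver must already know that an int or a string comes next.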
XDR (cont)
•  other standards:
–  Network Data Representation (NDR)
•  Distributed Computing Environment (DCE)
•  supports multiple data format representations
•  client chooses format, tag data w/ format,
server transforms to native format
–  Common Data Representation (CDR)
•  communication between ORBs (CORBA)
•  supports both big endian & little endian
•  transformation as in NDR

XDR paradigms:
•  Memory Buffer (Stream) Paradigm:
– use buffer to hold external representation
of message & to add items
void xdrmem_create(XDR *xdrs, char *buf, u_int
buflen, enum xdr_op op);
•  xdrs: pointer to XDR stream
•  buf: buffer area for XDR data
•  buflen: buffer size (multiple of 4)
•  op: direction of stream (XDR_ENCODE,
XDR_DECODE, XDR_FREE)
XDR memory buffer procedure:
•  Sender:
–  Create XDR input stream using
XDR_ENCODE option in xdrmem_create
–  Encode each data item using XDR data
conversion routine & add it to input stream
•  Routine doesn't specify conversion direction
•  Routine for each data type, for example:
bool_t xdr_int(XDR *xdrs, int *ptr);
•  Converts 32-bit integer from native representation
to XDR representation (or vice versa) & appends
to XDR stream
•  Returns TRUE for success, FALSE for error
XDR memory buffer procedure (cont):
•  Receiver:
–  Create XDR output stream using
XDR_DECODE option in xdrmem_create
–  Call XDR conversion routine to extract item
from output stream & convert it to native mode

XDR paradigms (cont):
•  Standard I/O stream:
–  Uses UNIX standard I/O stream
–  Used to send data across TCP connection
directly
–  Conversion routine performs send & recv
automatically

XDR standard I/O stream procedure
•  Create TCP socket
•  Call fdopen to attach stream w/ socket
•  Call xdrstdio_create to create XDR stream
& attach it to I/O descriptor
•  Call conversion routine to convert & send/receive.

Distributed application design paradigms:
•  Communication-oriented design:
– Starting w/ communication protocol
– Closer to protocol stack
– Low level
– As in sockets
•  Application-oriented design:
– Starting w/ application
– Problem solving approach
– Higher overhead
– As in RPC
Remote Procedure Call (RPC)
•  extension to local procedure call
–  dividing program into procedures
–  single thread of control passes through
procedures
–  moving procedures to remote machine (server)
•  differences from local procedure call
–  remote procedure is active & waiting
–  operate in different address space
–  different data representations
–  slower in performance

RPC
•  Using client/server paradigm to invoke remote
procedures.
•  Easier to code & maintain than low-level APIs
(sockets, XTI)
•  Program's thread of execution passes across
network to remote procedure & back
•  Synchronous
– Client process that issues request waits until it
gets response
•  Similar parameter passing as in local procedures

RPC products
•  2 popular packages: Sun RPC & DCE RPC
•  Others: Novell Netware RPC, IBM TCP/IP
RPC, …
•  Available in Unix, Windows, MVS, …
•  DCE RPC:
–  Also called OSF RPC
–  Adaptation of Apollo RPC
–  Uses network data representation
–  Integrated w/ DCE security & naming service
–  Provides interface definition language

Sun RPC
•  also called Open Network Computing (ONC)
RPC
•  specified in RFC 1831, Aug 95, version 2
•  used in many applications, ex: NFS
•  uses XDR for data representation
•  includes run-time library
•  includes compiler system, rpcgen.

Remote Procedure Identification
•  Each program assigned unique ID (32-bit integer)
    Range (hex)                  Assignment
    00000000 - 1fffffff          defined by rpc@sun.com
    20000000 - 3fffffff          defined by user
    40000000 - 5fffffff          transient
    60000000 - ffffffff          reserved
•  Each program assigned a version number
•  Each procedure within program assigned a number
(1 - N)
•  Procedure identified by triple: prog#, vers#, proc#
•  single remote procedure per program at a time
ONC RPC Communication Semantics
•  can use TCP or UDP as transport protocol
•  3 forms of RPC semantics:
–  Exactly once: operation performed only once
•  desirable but hard to achieve
–  At least once: operation performed at least once
•  call returns after multiple requests
–  Zero or more: operation performed zero or more
•  call doesn't return
•  caller assumes the worst case
–  may make operation idempotent
RPC Port Mapper
•  RPC procedure identified by triple (prog#,
vers#, proc#) in addition to IP address
•  remote process requires port # also
•  RPC program uses ephemeral port
•  dynamic mapping needed between RPC
program & port #
•  uses port mapper
–  RPC server program
–  uses well-known port (#111) for TCP & UDP

RPC Port Mapper Algorithm
•  port mapper starts first by creating passive socket
bound to port #111 & start listening
•  RPC program (on same machine as port mapper)
starts by obtaining ephemeral port # &
sending registration request to port mapper giving
its prog #, vers # & port # (1)
•  port mapper adds triple to its database
•  client contacts port mapper first to obtain port # of
RPC program giving prog # & vers # (2)
•  port mapper provides program's port # to client (3)
•  client then contacts specific program (4)

Port Mapper

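[Figure: client on client host; port mapper & server on server host.
Arrows: (1) server → port mapper, (2) client → port mapper,
(3) port mapper → client, (4) client → server]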

RPC message format
•  Uses 32-bit integers & big-endian byte order
•  For request, it consists of:
–  4-byte message ID (transaction ID, XID): set
by client & returned by server for matching
–  4-byte message type: 0 for call, 1 for reply
–  4-byte RPC version # (currently 2)
–  4-byte remote program #
–  4-byte remote program version #
–  4-byte remote procedure #
–  Credentials (authentication)
–  Procedure arguments
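The fixed part of a call message is six 32-bit big-endian words, which can be laid out as in this sketch. The function name build_call_header() is hypothetical, and the credentials, verifier and arguments that follow the header are omitted.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Fill buf (at least 24 bytes) with XID, message type, RPC version,
 * program #, version # & procedure #. Returns bytes written. */
size_t build_call_header(unsigned char *buf, uint32_t xid,
                         uint32_t prog, uint32_t vers, uint32_t proc)
{
    uint32_t words[6];

    words[0] = htonl(xid);     /* chosen by client, echoed in reply */
    words[1] = htonl(0);       /* message type: 0 = call */
    words[2] = htonl(2);       /* RPC version # */
    words[3] = htonl(prog);
    words[4] = htonl(vers);
    words[5] = htonl(proc);
    memcpy(buf, words, sizeof(words));
    return sizeof(words);
}
```

As a usage example, build_call_header(buf, xid, 100000, 2, 3) starts a call to procedure 3 of version 2 of program 100000, which is the program number assigned to the port mapper.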
RPC message Format (cont)
•  For reply, it consists of:
–  4-byte XID
–  4-byte message type (1)
–  4-byte reply status: 0 for accept, 1 for reject
–  For accept:
•  server verifier
•  accept status
•  encoded results
–  For reject:
•  reject status
•  reject data
RPC Programming
•  uses XDR language
–  similar to C
•  many structs are defined
•  many calls are used
–  both client & server
•  iterative server by default
– concurrent server through multithreading
•  different levels of programming

High-Level RPC Programming
[Figure: call sequence on each side]
Client: callrpc()
Server: registerrpc() → svc_run()

int registerrpc(u_long prognum, u_long versnum,
u_long procnum, char *(*procname)(),
xdrproc_t inproc, xdrproc_t outproc);
•  used to register high-level RPC server program
w/ port mapper
•  prognum, versnum, procnum: triple
•  procname: procedure name
•  inproc, outproc: XDR conversion functions
void svc_run();
•  used by RPC server to wait for requests &
calls appropriate procedure
•  returns in case of error only
int callrpc(char *hostname, u_long prognum,
u_long versnum, u_long procnum, xdrproc_t
inproc, char *in, xdrproc_t outproc, char *out);
•  used by high-level client program to make
RPC request
•  hostname: name of remote host
•  in: input argument
•  out: output argument
callrpc() & registerrpc() use UDP
–  lower-level programming for TCP/UDP

Low-Level RPC Programming

[Figure: call sequence on each side]
Client: clnt_create() → clnt_call()
Server: svc_create() → svc_run()
Server (within dispatcher): svc_getargs() → svc_sendreply() → svc_freeargs()

int svc_create(void (*dispatch)(), u_long
prognum, u_long versnum, char *proto);
•  used to create dispatcher to handle multiple
procedures per program
•  used by low-level server program
•  dispatch: dispatcher program
•  proto: protocol (TCP or UDP)
•  it also registers w/ port mapper

Dispatcher
•  called whenever rpc request received
•  passed 2 params: struct svc_req * & SVCXPRT *
struct svc_req {
    u_long rq_prog;               /* program # */
    u_long rq_vers;               /* version # */
    u_long rq_proc;               /* procedure # */
    struct opaque_auth rq_cred;   /* raw credentials from request */
    caddr_t rq_clntcred;          /* decoded (read-only) credentials */
    SVCXPRT *rq_xprt;             /* transport handle */
};
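A dispatcher typically switches on the procedure # in the request. The sketch below shows only that control flow. To compile on its own it uses stand-in types (svc_req_sketch, SVCXPRT_SKETCH) instead of the real ones from <rpc/rpc.h>, and the procedure numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for SVCXPRT and struct svc_req so the sketch compiles alone. */
typedef struct {
    int replies_sent;   /* counts simulated replies */
    int errors_sent;    /* counts simulated error replies */
} SVCXPRT_SKETCH;

struct svc_req_sketch {
    unsigned long rq_prog;
    unsigned long rq_vers;
    unsigned long rq_proc;
    SVCXPRT_SKETCH *rq_xprt;
};

enum { PROC_NULL = 0, PROC_ADD = 1 };   /* hypothetical procedure numbers */

/* Dispatcher: selects the procedure to run from the request's procedure #. */
void dispatch(struct svc_req_sketch *rqstp, SVCXPRT_SKETCH *transp)
{
    assert(rqstp != NULL && transp != NULL);
    switch (rqstp->rq_proc) {
    case PROC_NULL:
        /* procedure 0 is by convention a no-op used to test reachability */
        transp->replies_sent++;
        break;
    case PROC_ADD:
        /* real code: svc_getargs(), call the procedure,
           svc_sendreply(), then svc_freeargs() */
        transp->replies_sent++;
        break;
    default:
        /* real code would report an unknown procedure (e.g. svcerr_noproc()) */
        transp->errors_sent++;
        break;
    }
}
```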
CLIENT *clnt_create(char *hostname, u_long
prognum, u_long versnum, char *proto);
•  Used by client to create client handle (CLIENT *)
for RPC calls

enum clnt_stat clnt_call(CLIENT *clntp, u_long procnum,
xdrproc_t inproc, char *in, xdrproc_t outproc,
char *out, struct timeval timeout);
•  Used by client to make RPC call
•  timeout: time out interval (rpcgen stubs default to 25 sec)

Rpcgen
•  Tool used to simplify development of RPC
applications
•  Developed by Sun (part of ONC RPC)
•  Compiler that takes specification file &
generates client stubs & server stubs
•  Specification file (written by user) contains
declaration of
–  Remote program & procedures
–  Global data types
–  Constants
Client              Server
client main         server procedures
client stub         server stub
network software    network software

Rpcgen (cont)
•  stubs (proxies, skeletons)
– generated by compiler
– contain most of RPC communication code
– represent (hide) the other side from caller/
receiver
– perform marshalling & unmarshalling of
RPC parameters
•  converting parameter from its programming
language representation to transmission
representation & vice versa
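Marshalling can be illustrated w/ a two-integer argument struct converted to and from big-endian bytes, which is how XDR represents 4-byte integers. This is a hand-written sketch, not rpcgen output; add_args and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument struct (two arguments combined into one). */
struct add_args {
    int32_t a;
    int32_t b;
};

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Marshal: host representation -> 8 bytes of transmission representation. */
size_t marshal_add_args(const struct add_args *in, unsigned char out[8])
{
    assert(in != NULL && out != NULL);
    put_be32(out, (uint32_t)in->a);
    put_be32(out + 4, (uint32_t)in->b);
    return 8;
}

/* Unmarshal: transmission representation -> host representation.
   The unsigned-to-signed casts assume a two's-complement platform. */
void unmarshal_add_args(const unsigned char in[8], struct add_args *out)
{
    assert(in != NULL && out != NULL);
    out->a = (int32_t)get_be32(in);
    out->b = (int32_t)get_be32(in + 4);
}
```

Because the byte order is fixed on the wire, the same 8 bytes decode correctly on both little-endian and big-endian hosts.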

Rpcgen (cont)
•  process is automatic but incomplete & user's
intervention is needed
– user writes interfaces to stubs, client main &
server procedures, as well as specification
file

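[Figure: rpcgen file flow. rpcgen compiles X.x into X_clnt.c, X_svc.c, X_xdr.c & X.h. Client main, client interface, X_clnt.c & X_xdr.c are compiled (w/ X.h) into client; server procedure, server interface, X_svc.c & X_xdr.c are compiled (w/ X.h) into server.]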

RPC Application Development
•  Start from application
•  Identify procedures to be remote & separate
them (by dividing program into two parts)
–  Sun RPC procedures use one argument & one
result (multiple arguments combined into struct)
•  Create specification file (X.x)
–  Using RPC language (similar to C)
–  Specify data types & constants shared between
client & server or needed to specify arguments
–  Specify procedures & their passing arguments
–  Assign program #, program version, procedure #

RPC Application Development (cont)
•  Run rpcgen on X.x, files generated (in C):
– Client stub (X_clnt.c)
•  Contains RPC calls to server
– Server stub (X_svc.c)
•  Contains server main & RPC calls to start &
communicate w/ client
– XDR conversion file (X_xdr.c) for both
client & server
– Header file (X.h) for client & server
•  Procedure name in generated code → name_version # (ex. name_1)
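The naming convention above can be sketched as a small helper: rpcgen lowercases the procedure name from X.x and appends an underscore and the version #. The helper stub_name is hypothetical and only mimics the convention. Some rpcgen versions also append _svc to the server-side name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes e.g. "add_1" for ("ADD", 1).
   Returns 0 on success, -1 if dst is too small. */
int stub_name(const char *proc, unsigned long vers, char *dst, size_t dstlen)
{
    size_t n;
    int w;

    assert(proc != NULL && dst != NULL);
    n = strlen(proc);
    if (n >= dstlen)
        return -1;
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)tolower((unsigned char)proc[i]);
    /* append "_<version>" after the lowercased name */
    w = snprintf(dst + n, dstlen - n, "_%lu", vers);
    if (w < 0 || (size_t)w >= dstlen - n)
        return -1;
    return 0;
}
```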
RPC Application Development (cont)
•  Write client & server stub interfaces
–  using rpcgen convention for procedure calls
–  may use optimization
•  Modify client main for RPC initialization &
declarations
–  may call clnt_create()
•  Compile & run client & server programs

RPCGEN Programming
Client            Server
                  svcxxx_create()
clnt_create()     svc_register()
                  svc_getargs()
                  svc_sendreply()
                  svc_freeargs()
clnt_call()       svc_run()

xxx: tcp or udp

Multimedia Applications over Internet

•  Streaming stored multimedia


•  Streaming live multimedia
•  Real-time interactive multimedia

Streaming Multimedia Applications
•  Streaming multimedia: client starts playing out
multimedia application before receiving all data
•  Streaming stored multimedia:
–  Data stored at server typically compressed (ex.
YouTube, CNN)
–  Client starts streaming data (ex. RealPlayer, MS
Media)
–  Continuous playing out based on original timing of
recording w/ short delay (few seconds)
–  User can pause, rewind, fast-forward, index, …
Streaming Multimedia Applications (2)
•  Streaming live multimedia:
–  User receives live data (ex. Internet Radio, IPTV)
–  Data distribution to multiple users could be done
using IP multicasting
–  Continuous playing out w/ short delay (few seconds)
–  User cannot fast-forward but may be able to pause
& rewind w/ local buffering

Real-Time Interactive Multimedia
•  Users can interact w/ each other using
multimedia in real time
•  Ex. Internet telephony & video conferencing
•  Requires low delay (at most a few hundred milliseconds)
•  Use standard real-time transport protocol

Real-Time Transport Protocol (RTP)
•  Standard IETF protocol (RFC 3550, July 2003)
used to transmit digitized multimedia
applications (audio/video) over IP network
•  Runs on top of UDP
–  TCP is generally unsuitable because of retransmission delay, …
–  Transport protocol implemented in application layer
•  Multiplex several real-time data streams into
UDP packets
•  Defines packet structure for multimedia applications
RTP (2)
•  Provides no retransmission, ACK, or QoS
–  Retransmitted data may arrive too late to be useful
•  Receiver may use forward error correction to
regenerate data from missing packet
•  It supports one-to-one, many-to-one,
one-to-many, many-to-many communication
•  Receives no special treatment from routers
•  Supports variety of applications
RTP Multiplexing
•  RTP session: all traffic from one or more
senders to destination pair (IP addr, port #)
–  Destination IP could be unicast or multicast
–  Session may consist of multiple RTP streams
•  RTP stream: sequence of packets that carry
related data from single synchronization source
–  Source may send one or more streams (different
data types)
–  Source may bundle multiple data streams into
single stream through encoding (ex. MPEG)
RTP De-multiplexing
•  2 levels: session & stream de-multiplexing
•  Session de-multiplexing: at transport layer
–  Application specifies UDP port # (an even-numbered ephemeral
port) & binds to unicast or multicast IP address
•  Stream de-multiplexing: at application layer
–  RTP application uses synchronization source ID &
packet type to group packets in the same stream

RTP Header
•  12-byte fixed header + possible extensions
•  Ver (2-bit): version # (currently 2)
•  P (1-bit): set if padding to 32-bit word is needed
–  Last byte of padding contains # of bytes padded
•  X (1-bit): set if extension header is present
•  CC (4-bit): # of contributing sources if any
•  M (1-bit): application-specific marker
–  Ex. Start of video frame or word in audio channel
RTP Header (2)
•  PTYPE (7-bit): payload type - type of encoding
–  It affects other fields (timestamp, marker)
•  SEQUENCE NUM (16-bit): counter
incremented in each packet to detect missing
packet
–  1st seq # chosen at random
•  TIMESTAMP (32-bit): time 1st byte of data
was sampled, used for playback
–  It is incremented continuously even w/o data sent
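Because the sequence # is only 16 bits, a receiver has to compare values modulo 65536 when looking for missing packets. A minimal sketch of that check follows. The function name and the convention of returning -1 for duplicates or late packets are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compares the newly arrived sequence # with the last one seen.
   Returns 0 if the packet is the next in order, n > 0 if n packets
   are missing in between, and -1 for a duplicate or older packet. */
int seq_gap(uint16_t last, uint16_t now)
{
    uint16_t delta = (uint16_t)(now - last);   /* distance modulo 65536 */

    if (delta == 0 || delta >= 0x8000)
        return -1;                             /* duplicate or reordered */
    return (int)delta - 1;
}
```

The unsigned subtraction makes wraparound from 65535 to 0 look like an ordinary step of 1.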
RTP Header (3)
•  SYNCHRONIZATION SOURCE IDENTIFIER
(32-bit): (SSRC) - identify source of stream
–  Picked randomly by source
–  used to multiplex & de-multiplex sources
•  CONTRIBUTING SOURCE ID: (CSRC) – used
if mixer is present to identify original SSRCs
–  SSRC field contains mixer ID
–  CC field contains # of CSRCs
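The fixed header fields described on these slides can be extracted from a received UDP payload as sketched below. The struct and function names are hypothetical. The bit positions follow RFC 3550 (Ver, P, X, CC in byte 0; M, PTYPE in byte 1), and extension headers are not parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded 12-byte fixed header. */
struct rtp_header {
    unsigned version, padding, extension, csrc_count;
    unsigned marker, payload_type;
    uint16_t seq;
    uint32_t timestamp;
    uint32_t ssrc;
};

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns header length in bytes (12 + 4 * CC), or -1 if the packet
   is too short or the version is not 2. */
int rtp_parse(const unsigned char *p, size_t len, struct rtp_header *h)
{
    size_t hdr_len;

    assert(p != NULL && h != NULL);
    if (len < 12)
        return -1;
    h->version      = p[0] >> 6;
    h->padding      = (p[0] >> 5) & 1;
    h->extension    = (p[0] >> 4) & 1;
    h->csrc_count   = p[0] & 0x0F;
    h->marker       = p[1] >> 7;
    h->payload_type = p[1] & 0x7F;
    h->seq          = (uint16_t)((p[2] << 8) | p[3]);
    h->timestamp    = get_be32(p + 4);
    h->ssrc         = get_be32(p + 8);
    if (h->version != 2)
        return -1;
    hdr_len = 12 + 4 * (size_t)h->csrc_count;   /* CSRC list follows SSRC */
    if (len < hdr_len)
        return -1;
    return (int)hdr_len;
}
```

The returned length tells the caller where the CSRC list ends, which is where the payload starts if no extension header is present.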

Delayed Playback & Jitter Buffer
•  Playback does not start when 1st packet arrives
–  To avoid gap in output since packets may arrive late
•  Therefore, receiver uses delayed playback
•  Incoming packets placed in a buffer, jitter buffer
–  First-in-first-out
•  Receiver sets threshold (k) > maximum jitter
expected & waits until enough data has arrived
•  Playback starts when amount of buffered data reaches threshold
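The delayed-playback rule can be sketched as a FIFO ring buffer that refuses to release data until k items have accumulated. This is a simplified illustration: the threshold is counted in frames rather than time, frames are plain ints, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JB_CAPACITY 64

struct jitter_buffer {
    int    slots[JB_CAPACITY];   /* stand-in for media frames */
    size_t head;                 /* index of oldest frame */
    size_t count;                /* frames currently buffered */
    size_t threshold;            /* k: frames needed before playback starts */
    bool   playing;
};

void jb_init(struct jitter_buffer *jb, size_t k)
{
    assert(jb != NULL && k > 0 && k <= JB_CAPACITY);
    jb->head = 0;
    jb->count = 0;
    jb->threshold = k;
    jb->playing = false;
}

/* Enqueue an arriving frame; returns false if the buffer is full. */
bool jb_push(struct jitter_buffer *jb, int frame)
{
    if (jb->count == JB_CAPACITY)
        return false;
    jb->slots[(jb->head + jb->count) % JB_CAPACITY] = frame;
    jb->count++;
    if (jb->count >= jb->threshold)
        jb->playing = true;      /* enough data buffered: start playback */
    return true;
}

/* Called once per playout tick; returns true and stores the oldest
   frame only after playback has started and data is available. */
bool jb_pop(struct jitter_buffer *jb, int *frame)
{
    if (!jb->playing || jb->count == 0)
        return false;
    *frame = jb->slots[jb->head];
    jb->head = (jb->head + 1) % JB_CAPACITY;
    jb->count--;
    return true;
}
```

Once playback has started, an empty buffer shows up as a failed jb_pop(), which corresponds to a gap in the output.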
Real-Time Transport Control Protocol (RTCP)
•  Companion protocol to RTP specified in RFC 3550
•  Provides feedback about performance of network &
application: delay, jitter, congestion, bandwidth
•  Provides information about each session
•  Synchronizes & correlates different media streams
of the same sender
•  Carries control information only, no media data
•  Runs on top of UDP
–  Its port # is one higher than RTP port #
RTCP Header
•  4-byte common header (fields below), followed by packet-type-specific data
•  Ver (2-bit), P (1-bit): as in RTP
•  RC (5-bit): # of reports following header
•  PTYPE (8-bit): specify packet type
–  Sender Report, Receiver Report, Source Description,
Goodbye, Application
•  LENGTH (16-bit): total length of packet
including header, counted in 32-bit words minus one
•  DATA Area: sequence of report records
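The header fields above can be decoded as sketched below. The sketch follows RFC 3550, where Ver, P, RC, PTYPE & LENGTH occupy the first 4 bytes and LENGTH counts 32-bit words minus one. The packet type values (200 to 204) are also from RFC 3550; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Packet type values assigned in RFC 3550. */
enum { RTCP_SR = 200, RTCP_RR = 201, RTCP_SDES = 202,
       RTCP_BYE = 203, RTCP_APP = 204 };

/* Hypothetical holder for the decoded header. */
struct rtcp_header {
    unsigned version, padding, count, packet_type;
    size_t   length_bytes;      /* whole packet, header included */
};

/* Returns 0 on success, -1 if the buffer is too short, the version
   is not 2, or the packet claims to be longer than the buffer. */
int rtcp_parse(const unsigned char *p, size_t len, struct rtcp_header *h)
{
    unsigned words_minus_one;

    assert(p != NULL && h != NULL);
    if (len < 4)
        return -1;
    h->version     = p[0] >> 6;
    h->padding     = (p[0] >> 5) & 1;
    h->count       = p[0] & 0x1F;
    h->packet_type = p[1];
    words_minus_one = ((unsigned)p[2] << 8) | p[3];
    h->length_bytes = ((size_t)words_minus_one + 1) * 4;
    if (h->version != 2 || h->length_bytes > len)
        return -1;
    return 0;
}
```

When several reports are packed into one UDP packet, length_bytes gives the offset of the next RTCP packet in the buffer.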
RTCP Reports
•  For each stream, sender creates & transmits
sender report periodically:
–  SSRC of RTP stream
–  Wall-clock time in NTP timestamp format
–  Timestamp generated by the same clock used in RTP
packets
–  # of packets sent in stream
–  # of bytes sent in stream
•  Sender reports used to synchronize different
media streams within RTP session
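Synchronization works because each sender report pairs a wall-clock time with an RTP timestamp from the same instant, so any later RTP timestamp in that stream can be mapped to wall-clock time. A minimal sketch, assuming the NTP time has already been converted to seconds and the stream's clock rate is known (names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference pair taken from the most recent sender report. */
struct sender_report_ref {
    double   wallclock_sec;   /* NTP time from the report, in seconds */
    uint32_t rtp_ts;          /* RTP timestamp from the same report */
};

/* Maps an RTP timestamp to wall-clock seconds. The difference is taken
   as a signed 32-bit value so timestamps just before the report, and
   timestamps that wrapped past 2^32, are both handled. */
double rtp_to_wallclock(const struct sender_report_ref *sr,
                        uint32_t rtp_ts, double clock_rate_hz)
{
    int32_t delta;

    assert(sr != NULL && clock_rate_hz > 0.0);
    delta = (int32_t)(rtp_ts - sr->rtp_ts);
    return sr->wallclock_sec + (double)delta / clock_rate_hz;
}
```

An audio packet (ex. 8000 Hz clock) and a video packet (ex. 90000 Hz clock) that map to the same wall-clock time should be played together.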
RTCP Reports (2)
•  For each stream, sender creates & transmits
source description:
–  SSRC of RTP stream
–  Sender's canonical name (CNAME) – uniquely
identifies sender (ex. user@host)
•  Streams from the same sender may have
different SSRCs but the same CNAME so they
can be synchronized

RTCP Reports (3)
•  For each stream received, receiver creates &
transmits receiver report:
–  SSRC of the received RTP stream
–  Fraction of packets lost
–  Highest sequence # received (extended w/ wraparound count)
–  Inter-arrival jitter
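The inter-arrival jitter in a receiver report is a running estimate. RFC 3550 defines it as J = J + (|D| - J)/16, where D is the change in transit time (arrival time minus RTP timestamp) between consecutive packets. The sketch below assumes arrival time is already expressed in RTP timestamp units; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct jitter_state {
    uint32_t prev_transit;   /* transit time of the previous packet */
    double   jitter;         /* running estimate, in timestamp units */
    int      have_prev;      /* 0 until the first packet is seen */
};

/* Updates the estimate for one received packet. */
void jitter_update(struct jitter_state *s, uint32_t arrival_ts,
                   uint32_t rtp_ts)
{
    uint32_t transit;

    assert(s != NULL);
    transit = arrival_ts - rtp_ts;
    if (s->have_prev) {
        /* D: difference in transit time between the two packets */
        int32_t d = (int32_t)(transit - s->prev_transit);
        double abs_d = d < 0 ? -(double)d : (double)d;
        s->jitter += (abs_d - s->jitter) / 16.0;
    }
    s->prev_transit = transit;
    s->have_prev = 1;
}
```

The 1/16 gain smooths out single late packets while still tracking sustained changes in delay variation.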

RTCP Traffic
•  RTCP traffic could be significant especially if
there are many receivers (multicast)
–  Since every receiver needs to submit receiver report
•  RTCP limits its traffic to 5% of RTP data traffic
–  Portion of it allocated to sender
–  Remaining is divided between multiple receivers
–  Receivers dynamically calculate their transmission
periods based on # of participants, average RTCP packet size & allocated rate
•  RTCP reports can be packed into a single UDP
packet (up to UDP packet size limit)
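The effect of the 5% limit on report frequency can be worked out numerically. The sketch below is a simplification of the RFC 3550 interval computation. It assumes 25% of the RTCP share is reserved for senders and a 5-second minimum interval (both RFC 3550 recommendations), and it omits the randomization the RFC applies; the function name is hypothetical.

```c
#include <assert.h>

/* Returns the interval, in seconds, between receiver reports from one
   receiver. session_bw_bps is the session bandwidth in bits/sec. */
double rtcp_receiver_interval(double session_bw_bps, unsigned n_receivers,
                              double avg_rtcp_bytes)
{
    double rtcp_bw, recv_bw, interval;

    assert(session_bw_bps > 0.0 && avg_rtcp_bytes > 0.0);
    rtcp_bw = 0.05 * session_bw_bps;   /* RTCP limited to 5% */
    recv_bw = 0.75 * rtcp_bw;          /* assumed: 25% reserved for senders */
    /* all receivers together must fit in recv_bw */
    interval = (double)n_receivers * avg_rtcp_bytes * 8.0 / recv_bw;
    return interval < 5.0 ? 5.0 : interval;   /* assumed 5 sec minimum */
}
```

For a 640 kbps session w/ 300 receivers & 100-byte reports, each receiver reports about every 10 seconds; doubling the number of receivers doubles the interval, so total RTCP traffic stays bounded.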
RTP Implementation
•  2 approaches: library and integrated code
•  Library approach: using real-time support library
–  Using C library or Java class
–  More generic & supports multiple functionalities
–  Interacts w/ UDP (lower layer) & user application
(upper layer)
–  Larger code overall
•  Integrated approach: application has RTP & RTCP
support
–  Relatively small code