
VoIP and Telephony

34350 Broadband Networks Pachito Marco Calabrese s101383@student.dtu.dk December 1, 2010


Abstract This is a comparison document describing VoIP and telephony. It goes through the VoIP architecture and protocols and through classic telephony, describing the differences between the two services. At the end, some of the features used in the revolutionary Skype VoIP service are explored.

Contents
1 Introduction
2 VoIP
  2.1 Architecture
    2.1.1 Registration
    2.1.2 Dialogs
    2.1.3 SIP network and other networks
  2.2 Protocols
    2.2.1 Protocol stack
    2.2.2 SIP
    2.2.3 SDP
    2.2.4 RTP
  2.3 Security
3 Telephony
  3.1 Architecture
  3.2 Protocols
    3.2.1 STS-1 frame
  3.3 Reliability
  3.4 Security
4 Differences between VoIP and Telephony
5 Skype

1 Introduction

VoIP stands for Voice over Internet Protocol. The term describes a set of technologies for providing telephony over the Internet. The main differences from classic telephony are the type of network and the way the voice is transported in it. Classic telephony uses the PSTN (Public Switched Telephone Network), which is a circuit-switched network; VoIP uses the Internet, which is a packet-switched network. This different way of transporting the audio flow forces the architecture, the protocols and all the mechanisms for authentication, handling, compression and so on to be implemented in a completely different fashion. From a high-level point of view, however, the user still gets a telephony service.

2 VoIP

Figure 1: Logic VoIP network

The purpose of VoIP technology is to provide telephony service over the Internet. It takes an audio stream (the voice) and sends it, through some mechanism, over the Internet Protocol using a broadband connection (one of the most common VoIP services is Skype). A VoIP call can be established between computers, between a computer and a fixed phone, or between a computer and a mobile phone. Some smartphones are also VoIP enabled and, thanks to their WiFi connection, can establish calls from any WiFi hotspot. VoIP is also known as Internet Telephony or Voice over Broadband. Because the calls go through the Internet they are not charged by the operator, so VoIP might be a more convenient way to call. But before exploring pros and cons, let's see what the architecture of a VoIP service looks like and how the protocols work, and let's also dig into security.

In this document we will focus on the Session Initiation Protocol (IETF RFC 3261), which is one of the most commonly used open standard protocols for VoIP communication. We will also see some concepts of the Skype protocol. There are other protocols out there, but it is difficult to cover all of them. Let's define SIP from the standard: SIP is an application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences. Basically, SIP is the signalling protocol that manages a session (a call between users) and is used to route calls and multimedia sessions.

2.1 Architecture

The SIP network is described in the figure. There are the following logical entities:
- User Agent (UA)
- Registrars
- Location server
- Redirect server
- Proxy server


From a logical point of view a UA can be a UAC (UA client) or a UAS (UA server), and it keeps that role for the duration of a transaction. So the SIP protocol establishes a communication (in a client-server fashion) between UAC and UAS. The UAC sends requests and the UAS answers back with responses. The main function of the SIP servers is to provide name resolution and user location, since the caller is unlikely to know the IP address or host name of the called party, and to pass on messages to other servers using next-hop routing protocols. Let's see how SIP accomplishes these tasks and how those logical entities are involved in the entire process.

2.1.1 Registration

Figure 2: Registration procedure [7]

Registration is the procedure that makes it possible to discover how to reach a user. Basically, before a call can be established, the called user must be found in the network. The logical entity involved in the registration procedure is the registrar. Referring to the figure, the user Carol sends a REGISTER request to the registrar. The registrar then stores Carol's address in the database, so now Carol can be reached. Now that Carol is registered, Bob wants to call her (INVITE request), so he enters Carol's SIP address-of-record carol@chicago.com on his VoIP-enabled smartphone. The domain chicago.com is resolved by DNS (in the same fashion as for an email address) and then the INVITE request is sent to the SIP proxy server sip.chicago.com. The proxy server receives Carol's address-of-record, looks it up in the table of the location server (Query) and gets back (Resp) the SIP URI that identifies a particular end point, for example carol@cube2214a.chicago.com. Now that the proxy server has the SIP URI of the user, it can complete the INVITE message and deliver it to Carol. The REGISTER procedure must be repeated periodically to keep the address bindings in the registrar alive. Each REGISTER request received by the registrar is followed by a response from the registrar to the terminal, listing all the current bindings and their expiration timers.
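The registration and lookup steps can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `Registrar`, its methods and the in-memory dictionary are hypothetical, and a real registrar would also authenticate the REGISTER request. It shows a binding being stored with an expiration time and disappearing if it is not refreshed.

```python
import time


class Registrar:
    """Toy registrar plus location service (hypothetical, in-memory only)."""

    def __init__(self):
        # address-of-record -> {contact URI: absolute expiry time in seconds}
        self.bindings = {}

    def register(self, aor, contact, expires=3600, now=None):
        """Store or refresh a binding, then return the current bindings."""
        now = time.time() if now is None else now
        self.bindings.setdefault(aor, {})[contact] = now + expires
        return self.lookup(aor, now)

    def lookup(self, aor, now=None):
        """Return {contact: seconds left} for bindings that have not expired."""
        now = time.time() if now is None else now
        live = {c: t for c, t in self.bindings.get(aor, {}).items() if t > now}
        self.bindings[aor] = live  # forget expired bindings
        return {c: t - now for c, t in live.items()}


registrar = Registrar()
registrar.register("sip:carol@chicago.com",
                   "sip:carol@cube2214a.chicago.com", expires=60, now=0)
print(registrar.lookup("sip:carol@chicago.com", now=30))  # binding still valid
print(registrar.lookup("sip:carol@chicago.com", now=61))  # binding has expired
```

The `now` parameter exists only to make the example deterministic; without it the wall clock is used.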


2.1.2 Dialogs

Figure 3: Example of an INVITE transaction

When users have a URI and a valid binding with the registrar, they can be reached. A dialog represents a peer-to-peer SIP relationship between two user agents that persists for some time. The dialog is a flow of messages properly routed between the UAs involved. A dialog is identified by the dialog ID, which is composed of:
- Call-ID
- Local tag
- Remote tag
The Call-ID identifies a unique group of messages between UAs. A Call-ID has the following format: f81d4fae-7dec-11d0-a765-00a0c91e6bf6@foo.bar.com. The local tag at one UA is identical to the remote tag at the peer UA. Dialogs are created by specific valid requests and responses (the positive responses are 101-199 and 2XX). The messages are transmitted in a client-server fashion. The request that establishes a dialog is the INVITE request, generated by the UA that wants to call another UA. The INVITE request starts an INVITE transaction, in which the three-way handshake is performed.

Let's refer to the figure and make an example. The INVITE request is sent by the UA with the SIP URI sip:marco@bari.it to the UA with the SIP URI sip:luigi@roma.it. The INVITE message contains some information for controlling the call (such as where the message comes from, to whom it is sent, and the Call-ID) and also a listing of the media types and associated encodings that the calling party is willing to use. Some of this information is used for routing the INVITE message and some will be useful later for setting up the dialog (the list of audio codecs). All the messages are sent through proxies, which forward and route them to the correct UA. If the negotiation ends with positive responses, the dialog is set up and the users are connected peer-to-peer. Once the dialog is established between two UAs, the UA that sends a request acts as UAC and the UA that responds acts as UAS (note that these may be different roles from the ones the UAs held during the transaction that established the dialog). Some of the responses received by the UAC (and sent by the UAS) are:
- 100 Trying - This response indicates that the request has been received by the next-hop server and that some unspecified action is being taken on behalf of this call (for example, a database is being consulted). This response, like all other provisional responses, stops retransmissions of an INVITE by a UAC.
- 180 Ringing - The UA receiving the INVITE is trying to alert the user. This response MAY be used to initiate local ringback.
- 200 OK - The request has succeeded. The information returned with the response depends on the method used in the request.
- 481 Call/Transaction Does Not Exist - This status indicates that the UAS received a request that does not match any existing dialog or transaction.
- 484 Address Incomplete - The server received a request with a Request-URI that was incomplete. Additional information SHOULD be provided in the reason phrase.

The packet-switched nature of the network makes the communication follow a client-server architecture. It is clear that this architecture and protocol are inspired by other protocols like HTTP, IMAP and POP3. Let's remember that SIP handles only the signalling and the message exchange between the UAs, and does not care about the audio flow itself. The audio flow is exchanged directly between the peers involved in the conversation.

2.1.3 SIP network and other networks

Figure 4: Media Gateway interconnects VoIP network to TDM network

Besides calling from one VoIP phone to another (VoIP phones are also called softphones when they are software running on a smartphone or PC), it is possible to call between a classic phone (connected to the PSTN) and a VoIP phone, thanks to the gateways that interconnect the SIP network with other kinds of networks. Basically, the gateway is able to convert the SS7 protocols (signalling on the TDM network) to VoIP protocols like SIP or H.323. For example, a typical TDM to VoIP media gateway will support the following signalling protocols.


Figure 5: Protocols supported by a typical TDM to VoIP media gateway

2.2 Protocols

2.2.1 Protocol stack

Figure 6: SIP protocol stack

The SIP protocol stack is shown in figure 6. SIP works on top of the UDP and TCP transport layers. SIP itself is a signalling protocol and nothing more, which means that to have a call we need other protocols that take care of the real-time audio stream, and codecs to compress the audio. In this section we will see the protocols SIP, SDP and RTP and the codecs used.


2.2.2 SIP

[7] As we have already seen, the SIP protocol works in a client-server fashion. For registration, the INVITE transaction and call control, UAs send and receive requests and responses. Let's list the SIP methods:
- INVITE
- ACK
- OPTIONS
- CANCEL
- BYE
- REGISTER
These are the methods sent by the UAC (UA client). The UAS instead sends the responses. The responses are grouped into six classes; let's list them, each with one example:
- 1XX Provisional: 100 Trying
- 2XX Successful: 200 OK
- 3XX Redirection: 302 Moved Temporarily
- 4XX Client Error: 404 Not Found
- 5XX Server Error: 504 Server Time-out
- 6XX Global Failure: 603 Decline
These methods and responses are used in text-encoded messages exchanged between UAs. To see how those messages are built, let's take an example similar to the previous one and analyze step by step how an INVITE session works from a protocol point of view.

Figure 7:


[7, 8] Alice has the SIP URI sip:alice@atlanta.com. She wants to call Bob, so she types Bob's SIP URI, sip:bob@biloxi.com, on her softphone, where biloxi.com is the domain of Bob's SIP service provider. When Alice has typed Bob's SIP URI she places the call, which means that her softphone sends an INVITE message over the network. This text-encoded message looks like this:

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
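Because the message is plain text, it can be taken apart with ordinary string handling. The following is a minimal sketch, not a complete SIP parser: the function name `parse_sip_message` is hypothetical, and it assumes CRLF line endings and no folded (multi-line) headers.

```python
def parse_sip_message(text):
    """Split a SIP message into (start_line, headers, body).

    headers is a list of (name, value) pairs so that repeated headers
    such as Via keep their order.
    """
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Split on the first colon only: values such as SIP URIs contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


INVITE = "\r\n".join([
    "INVITE sip:bob@biloxi.com SIP/2.0",
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "To: Bob <sip:bob@biloxi.com>",
    "From: Alice <sip:alice@atlanta.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710@pc33.atlanta.com",
    "CSeq: 314159 INVITE",
    "Contact: <sip:alice@pc33.atlanta.com>",
    "Content-Type: application/sdp",
    "Content-Length: 142",
]) + "\r\n\r\n"  # the SDP body is omitted here, as in the text

start_line, headers, body = parse_sip_message(INVITE)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, version)
print(dict(headers)["Call-ID"])
```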
In the first line of the INVITE message we find INVITE, which is the method used, then the SIP URI of the user we want to call and finally the protocol with its version. The other lines are the headers of the message. Let's go through each of them.
- Via contains the protocol and its version (2.0) and the transport protocol used (UDP in this case); after that comes the address (pc33.atlanta.com) at which Alice expects to receive the responses to this request, and finally the branch parameter, which identifies this transaction.
- Max-Forwards defines how many hops the request can traverse; in this case the request cannot travel further than 70 hops. Each hop decrements this value by 1.
- To contains the display name (Bob) and, in angle brackets, the SIP URI towards which the request was originally directed.
- From identifies the originator of the request. It has the same format as the To field, but it also has a tag parameter containing a random string (1928301774) that was added to the URI by the softphone and is used for identification purposes.
- Call-ID contains a globally unique identifier for this call, generated by combining a random string with the softphone's host name or IP address. This field is unique for each call. The combination of the To tag, the From tag and the Call-ID completely defines a peer-to-peer SIP logical link between Alice and Bob.
- CSeq (Command Sequence) contains an integer and a method name. The CSeq number is incremented for each new request within a dialog and is a traditional sequence number; it helps to track the command history of a dialog.
- Contact contains a SIP or SIPS URI that represents a direct route to Alice, usually composed of a username at a fully qualified domain name (FQDN). While an FQDN is preferred, many end systems do not have registered domain names, so IP addresses are permitted. This field may look similar to Via, but the roles differ: the Via header field tells other elements where to send the response, while the Contact header field tells other elements where to send future requests, so that later requests can be routed directly to Alice.
- Content-Type contains a description of the message body, in this case an SDP packet.
- Content-Length contains the length of the message body in octets, in this case 142 bytes.

The message body is not shown, but we can see from the Content-Type and Content-Length headers that it is an SDP packet 142 bytes long. The SIP message body contains a Session Description Protocol packet. The SDP packet carries details about the session, such as the type of media, the codec and the sampling rate, which are not described using SIP. SDP will be discussed later in the document.

Referring to the figure, we can see that the message is forwarded to the SIP proxy server that serves Alice's domain. The address of the atlanta.com proxy can be resolved by DNS or can be an IP address saved in Alice's softphone. The atlanta.com proxy server receives the INVITE message and responds with a 100 Trying, which means that the server has received the request and is processing the message in order to route it to the destination. The atlanta.com proxy server obtains the IP address of biloxi.com with a DNS request and forwards the INVITE request there. But before forwarding the request, the atlanta.com proxy server adds an additional Via header field value that contains its own address. So the text-encoded message sent from atlanta.com to biloxi.com will be:

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com;branch=z9hG4bK77ef4c2312983
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 69
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
The biloxi.com proxy server receives the INVITE and responds with a 100 (Trying), so the atlanta.com proxy knows that the INVITE request is being processed by biloxi.com. The biloxi.com proxy server asks a database (the location server) for the IP address of Bob. After discovering Bob's IP address, the biloxi.com proxy server (like the atlanta.com proxy server) adds another Via header field value with its own address to the INVITE message and proxies it to Bob's SIP phone.

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP server10.biloxi.com;branch=z9hG4bKnashds8
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com;branch=z9hG4bK77ef4c2312983
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 68
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142
Bob's SIP phone receives the INVITE and alerts Bob to the incoming call from Alice (by popping up a message on the display of a computer, or by ringing a smartphone) so that Bob can decide whether to answer the call. Bob's SIP phone wants to indicate to Alice that it is ringing, so it sends a 180 (Ringing) response, which is routed back through the two proxies in the reverse direction. Each proxy uses the Via header field to determine where to send the response and removes its own address from the top. So Alice's softphone receives the ringing response and waits until Bob answers. Alice is informed, perhaps by hearing an audio ringback tone or by a message displayed on her screen. After some seconds Bob decides to answer the call. When he picks up the handset, his SIP phone sends a 200 (OK) response to indicate that the call has been answered. The 200 (OK) carries, in its body, an SDP packet with a media description of the type of session that Bob is willing to establish with Alice. As a result, there is a two-phase exchange of SDP messages: the first has been sent from Alice to Bob and the second is sent from Bob back to Alice. This exchange of SDP packets provides basic negotiation capabilities.
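The way the Via headers grow on the way to Bob and shrink again on the way back can be sketched as follows. This is an illustration under assumptions, not proxy code from the standard: the helper names `forward_request` and `route_response` are hypothetical, headers are modelled as a list of (name, value) pairs, and details such as the received parameter, loop detection and DNS lookup are ignored.

```python
def forward_request(headers, proxy_host, branch):
    """Headers a proxy would send on: its own Via on top, Max-Forwards minus 1."""
    out = []
    for name, value in headers:
        if name == "Max-Forwards":
            hops = int(value)
            if hops <= 0:
                raise ValueError("Max-Forwards exhausted; request is not forwarded")
            value = str(hops - 1)
        out.append((name, value))
    via = ("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")
    return [via] + out


def route_response(headers):
    """Remove the topmost Via (the proxy's own); return (next_hop, headers)."""
    idx = next(i for i, (name, _) in enumerate(headers) if name == "Via")
    remaining = headers[:idx] + headers[idx + 1:]
    next_via = next(value for name, value in remaining if name == "Via")
    # "SIP/2.0/UDP host;params" -> "host"
    next_hop = next_via.split()[1].split(";")[0]
    return next_hop, remaining


invite = [
    ("Via", "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
]
at_atlanta = forward_request(invite, "bigbox3.site3.atlanta.com",
                             "z9hG4bK77ef4c2312983")
at_biloxi = forward_request(at_atlanta, "server10.biloxi.com", "z9hG4bKnashds8")
for name, value in at_biloxi:
    print(f"{name}: {value}")

# A response copies the Via stack of the request; each proxy pops its own entry.
hop, rest = route_response(at_biloxi)   # executed at server10.biloxi.com
print("biloxi proxy sends the response to", hop)
hop, rest = route_response(rest)        # executed at bigbox3.site3.atlanta.com
print("atlanta proxy sends the response to", hop)
```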


SIP/2.0 200 OK
Via: SIP/2.0/UDP server10.biloxi.com;branch=z9hG4bKnashds8;received=192.0.2.3
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com;branch=z9hG4bK77ef4c2312983;received=192.0.2.2
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds;received=192.0.2.1
To: Bob <sip:bob@biloxi.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:bob@192.0.2.4>
Content-Type: application/sdp
Content-Length: 131
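Once the 200 OK has added a tag to the To header, both tags are known and each side can form its dialog ID (Call-ID, local tag, remote tag). The sketch below illustrates this with the values of the message above; the function names `header_tag` and `dialog_id` are hypothetical and the tag extraction is deliberately simplistic.

```python
import re


def header_tag(value):
    """Return the tag parameter of a To/From header value, or None if absent."""
    match = re.search(r";tag=([^;\s]+)", value)
    return match.group(1) if match else None


def dialog_id(call_id, from_value, to_value, role):
    """Build (Call-ID, local tag, remote tag) as seen by the caller or the callee."""
    from_tag, to_tag = header_tag(from_value), header_tag(to_value)
    if role == "caller":   # the caller put its tag in From
        return (call_id, from_tag, to_tag)
    if role == "callee":   # the callee added its tag to To
        return (call_id, to_tag, from_tag)
    raise ValueError("role must be 'caller' or 'callee'")


CALL_ID = "a84b4c76e66710@pc33.atlanta.com"
FROM = "Alice <sip:alice@atlanta.com>;tag=1928301774"
TO = "Bob <sip:bob@biloxi.com>;tag=a6c85cf"

alice = dialog_id(CALL_ID, FROM, TO, "caller")
bob = dialog_id(CALL_ID, FROM, TO, "callee")
print(alice)
print(bob)
# The local tag at one UA is the remote tag at the peer UA.
print(alice[1] == bob[2] and alice[2] == bob[1])
```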
The 200 OK response shown above has the same header fields as the INVITE message. Bob's SIP phone has added a tag parameter to the To header field. This tag will be included in all future messages exchanged during this call. The 200 OK message is routed to Alice's softphone, which stops the ringback tone and indicates that the call has been answered. The Contact header of this message carries Bob's direct SIP URI, so from now on he can be reached directly. In fact, Alice sends an ACK (the acknowledgement of the 200 OK) back to Bob directly, bypassing the two proxies (from now on, because both end points have learned each other's direct SIP URI, they can exchange messages without going through the proxies). This completes the INVITE/200/ACK three-way handshake used to establish SIP sessions. The SIP session between Alice and Bob is up. Now they can send media packets using the format they agreed on during the three-way handshake. In general, the end-to-end media packets take a different path from the SIP signalling messages, so now we have two logical point-to-point flows: one for the signalling, handled by SIP, and another for the (audio) media, handled by some other protocol. A protocol that is able to perform real-time streaming is the Real-time Transport Protocol (RTP). But before looking deeper into RTP, let's see how SDP works.

2.2.3 SDP

[9] The Session Description Protocol (SDP) is a format for describing streaming media initialization parameters. As we have seen, voice-over-IP calls have to convey a media stream (audio); SDP describes how this media is transported. The streaming itself is handled by codecs (see figure 6). The set of properties and parameters is often called a session profile. SDP supports many media types and formats. The following parameters are described in an SDP session:
- The type of media (video, audio, etc.)
- The transport protocol (RTP/UDP/IP, H.320, etc.)
- The format of the media (H.261 video, MPEG video, etc.)
For unicast IP sessions (like VoIP) the following parameters are also conveyed:
- The remote address for media
- The remote transport port for media
An SDP packet has the following fields (the fields marked with an O are optional):


Session description
  v      protocol version
  o      originator and session identifier
  s      session name
  i   O  session information
  u   O  URI of description
  e   O  email address
  p   O  phone number
  c   O  connection information (not required if included in all media)
  b   O  zero or more bandwidth information lines
  z   O  time zone adjustments
  k   O  encryption key
  a   O  zero or more session attribute lines

Time description
  t      time the session is active
  r   O  zero or more repeat times

Media description
  m      media name and transport address
  i   O  media title
  c   O  connection information (optional if included at session level)
  b   O  zero or more bandwidth information lines
  k   O  encryption key
  a   O  zero or more media attribute lines

An example of an SDP offer for a VoIP call is:

v=0
o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com
s= 
c=IN IP4 host.atlanta.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 51372 RTP/AVP 31 32
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000

and an example of an answer is:

v=0
o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
s= 
c=IN IP4 host.biloxi.example.com
t=0 0
m=audio 49174 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49170 RTP/AVP 32
a=rtpmap:32 MPV/90000
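The negotiation in this offer and answer comes down to comparing the format lists of the "m=" lines. The sketch below is an illustration only: the function names `media_formats` and `agreed_formats` are hypothetical, and it assumes RTP/AVP media lines whose formats are numeric payload types.

```python
def media_formats(sdp):
    """Map each media type to (port, payload type numbers) from the m= lines."""
    media = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind, port, _proto, *fmts = line[2:].split()
            media[kind] = (int(port), [int(f) for f in fmts])
    return media


def agreed_formats(offer_sdp, answer_sdp):
    """For each answered media type, keep the formats that were also offered."""
    offer, answer = media_formats(offer_sdp), media_formats(answer_sdp)
    agreed = {}
    for kind, (_port, fmts) in answer.items():
        offered = offer.get(kind, (0, []))[1]
        agreed[kind] = [f for f in fmts if f in offered]
    return agreed


OFFER = "\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com",
    "s= ",
    "c=IN IP4 host.atlanta.example.com",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
    "m=video 51372 RTP/AVP 31 32",
    "a=rtpmap:31 H261/90000",
    "a=rtpmap:32 MPV/90000",
])
ANSWER = "\n".join([
    "v=0",
    "o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com",
    "s= ",
    "c=IN IP4 host.biloxi.example.com",
    "t=0 0",
    "m=audio 49174 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "m=video 49170 RTP/AVP 32",
    "a=rtpmap:32 MPV/90000",
])

print(media_formats(OFFER))
print(agreed_formats(OFFER, ANSWER))
```

With these two descriptions, Alice offers three audio and two video formats and Bob's answer keeps payload type 0 for audio and 32 for video.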
Let's describe in more detail some of the most relevant fields.

Protocol Version ("v=") - gives the version of SDP. The version in the previous examples is 0.

Origin ("o=") - is a complex string containing several sub-fields in the following format: o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>.
- <username> in the last example is bob; it is the ID of the user that originates the session.
- <sess-id> is the unique identifier of the session. It is a number that, together with the other sub-fields of the origin field, forms a unique identifier; often a Network Time Protocol timestamp is used to ensure uniqueness.
- <sess-version> is a version number for this session description; technically it can be a copy of the sess-id, as in the examples above.
- <nettype> is a text string giving the type of network; in our examples it is IN, which stands for Internet.
- <addrtype> is a text string giving the type of the address that follows. Basically it is "IP4" or "IP6", but other values can be defined as well.
- <unicast-address> is the address of the machine from which the session was created. For an address type of IP4, this is either the fully qualified domain name (FQDN) of the machine (as in our example) or the dotted-decimal representation of the IP version 4 address of the machine (format xxx.xxx.xxx.xxx). For an address type of IP6, this is either the fully qualified domain name of the machine or the compressed textual representation of the IP version 6 address of the machine.

Session Name ("s=") - is the name of the session, usually a human-intelligible name (it could be "VoIP call"). This field cannot be empty; in our examples it contains a single space, so it actually is "s= ".

Session Information ("i=") - provides textual information about the session. This field is optional and is not present in our examples.

URI ("u=") - provides more information about the session. This field is optional as well.

Connection Data ("c=") - contains connection data in the format <nettype> <addrtype> <connection-address>. The first two sub-fields are IN and IP4 and have the same meaning as the corresponding sub-fields of the origin field. The last one, which in our case is host.biloxi.example.com, is the connection address; in the unicast case it identifies the machine to which the media is sent.

Timing ("t=") - defines the start and stop time of the session. The format of this field is <start-time> <stop-time>. Additional "t=" lines can be given for a session that is active at several separate times.

Media Descriptions ("m=") - has four sub-fields in the following format: <media> <port> <proto> <fmt>. Let's describe them:
- <media> is the media type. Currently defined media are "audio", "video", "text", "application", and "message". In our last example it is audio in the first m= field and video in the second one.
- <port> is the transport port to which the media stream is sent. This value must be coherent with the transport protocol specified in the <proto> sub-field.
- <proto> is the transport protocol. The meaning of the transport protocol depends on the address type in the relevant "c=" field. Basically, if IP4 is defined in the c= field, we must use a transport protocol that runs over IP4.
- <fmt> is a media format description. The interpretation of the media format depends on the value of the <proto> sub-field. If the <proto> sub-field is "RTP/AVP" or "RTP/SAVP" (as in our examples), the <fmt> sub-fields contain RTP payload type numbers. For other values of <proto> the meaning of <fmt> can be completely different.

SDP Attributes ("a=") - there are many SDP attributes, but we go through only the one used in the examples, a=rtpmap. This attribute maps an RTP payload type number (as used in an "m=" line) to an encoding name denoting the payload format to be used. It also provides information on the clock rate and encoding parameters. The payload type number is the value that refers back to the format list of the m= field. In our specific case, the a=rtpmap attribute belonging to the m=audio ... field says that the payload type is 0, the codec is PCMU (pulse code modulation, also known as G.711, used mainly in telephony [1]) and the clock rate is 8000 Hz, that is, 8000 samples per second. For the video a= field, the payload type is 32, the codec is MPV and the clock rate is 90000 Hz. A complete list of the RTP/AVP audio and video payload types is on http://en.wikipedia.org/wiki/RTP_aud
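The structure of an rtpmap attribute can be made concrete with a short parsing sketch. The function name `parse_rtpmap` is hypothetical; the format it assumes is a=rtpmap:<payload type> <encoding name>/<clock rate>, optionally followed by /<encoding parameters>.

```python
def parse_rtpmap(line):
    """Return (payload type, encoding name, clock rate, extra parameters)."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, encoding = line[len(prefix):].split(None, 1)
    name, clock_rate, *params = encoding.strip().split("/")
    return int(payload_type), name, int(clock_rate), params


print(parse_rtpmap("a=rtpmap:0 PCMU/8000"))
print(parse_rtpmap("a=rtpmap:32 MPV/90000"))
```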


2.2.4 RTP

[6] The last protocol we need is the one that sends the audio and video. The Real-time Transport Protocol (RTP) defines a standardized packet format for delivering audio and video over IP networks. The definition from the standard: RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. [6] Basically it is the protocol used to transmit the voice (which is real-time data). The RTP services include payload type identification, sequence numbering, timestamping and delivery monitoring, but RTP does not guarantee delivery, does not prevent out-of-order delivery and does not provide QoS (only monitoring). RTP is supported by RTCP (RTP Control Protocol), which monitors the quality of service and conveys information about the participants in an ongoing session. The audio data are preceded by an RTP header. The RTP header indicates which codec is used and also contains timing information and a sequence number that allow the receiver to reconstruct the stream.

Figure 8:

Because of the unpredictable delay of the IP network, the order of the packets at the receiver may differ from the order at the sender, but fortunately the right order can be reconstructed thanks to the sequence number. Often (as in our case) the RTP header and the RTP data are carried in a UDP packet (figure 6). RTP has a fixed binary header (while SDP and SIP messages are text encoded). The RTP fixed header is the following.

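 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+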

Let's see the meaning of each field:
- version (V): 2 bits - This field identifies the version of RTP.
- padding (P): 1 bit - If the padding bit is set, the packet contains one or more additional padding octets at the end which are not part of the payload. The last octet of the padding contains the number of octets that must be considered as padding, including itself.
- extension (X): 1 bit - If the extension bit is set, the fixed header must be followed by exactly one header extension.
- CSRC count (CC): 4 bits - contains the number of CSRC identifiers that follow the fixed header.
- marker (M): 1 bit - indicates that this packet marks a significant event. This could be, for example, a frame boundary marking the beginning or the end of a part of the packet stream.
- payload type (PT): 7 bits - This field identifies the format of the RTP payload and determines its interpretation by the application. For example, an RTP packet carrying PCMU audio has PT=0. RTCP packets carry a packet type code of their own; for example, a sender report packet has packet type 200. Each well-known RTCP packet has its own code and packet format (see table 1).


- sequence number: 16 bits - The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore the packet sequence. The initial value of the sequence number should be random for security reasons.
- timestamp: 32 bits - The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations. The role of the timestamp is important because without synchronization at the receiver end it would be impossible to reconstruct the audio flow.
- SSRC: 32 bits - The SSRC field identifies the synchronization source. This identifier should be chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. Basically, the SSRC must be unique within each session.
- CSRC list: 0 to 15 items, 32 bits each - The CSRC list identifies the contributing sources for the payload contained in this packet. The number of identifiers is given by the CC field.
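The bit layout described by these fields can be checked with a short sketch that packs and unpacks the fixed header using Python's struct module. The function names `pack_rtp_header` and `parse_rtp_header` are hypothetical, and the sketch assumes version 2 with no padding and no header extension.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Build an RTP fixed header (V=2, P=0, X=0) followed by the CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("CC is 4 bits, so at most 15 CSRC identifiers fit")
    byte0 = (2 << 6) | len(csrcs)                       # V, P, X, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(data):
    """Decode the fixed header fields and the CSRC list from raw bytes."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = struct.unpack(f"!{cc}I", data[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "cc": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(csrcs),
    }


header = pack_rtp_header(payload_type=0, seq=1000, timestamp=160,
                         ssrc=20, csrcs=(10, 11))
print(len(header), "bytes")
print(parse_rtp_header(header))
```

The fixed part is always 12 bytes; each CSRC identifier adds 4 more, so the example header is 20 bytes long.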
Packet type  Acronym  Description
192          FIR      Full INTRA-frame request
193          NACK     Negative acknowledgment
200          SR       Sender report for transmission and reception statistics from active senders (periodically transmitted)
201          RR       Receiver report for reception statistics from participants that are not active senders (periodically transmitted)
202          SDES     Source description items (including CNAME canonical name)
203          BYE      Goodbye: indicates end of participation
204          APP      Application specific functions
207          XR       RTCP extension

Table 1: Packet Type

Let us make an example of the packetization of the audio, find the packet rate and see how the sequence number and the timestamp are used. An audio format for digitizing the voice is PCM (pulse code modulation). It works by sampling the voice at 8000 Hz, so each sample represents 125 us of voice. Let us say that each RTP packet carries 160 samples of 125 us each, so 20 ms of voice per packet, which gives a packet rate of 50 packets per second. The first packet is sent with a random timestamp (we call this random value x) and a random sequence number (we call it y). The second packet has the timestamp incremented by 160, so x+160, and the sequence number y+1. Each RTP packet carries 160 samples of 8 bits each, so 1280 bits of payload. Figure 9 gives another example, which explains the SSRC and CSRC fields better.
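The arithmetic of this packetization example can be reproduced with a few lines of Python. The constant and function names are hypothetical; the numbers are the ones used in the text, and the wrap-around follows from the 16-bit and 32-bit field sizes.

```python
SAMPLE_RATE_HZ = 8000      # PCM sampling rate from the text
SAMPLES_PER_PACKET = 160   # samples carried by one RTP packet
BITS_PER_SAMPLE = 8


def packet_schedule(first_seq, first_timestamp, count):
    """Yield (sequence number, timestamp) pairs for consecutive packets."""
    for i in range(count):
        seq = (first_seq + i) % 2**16                            # 16-bit field
        ts = (first_timestamp + i * SAMPLES_PER_PACKET) % 2**32  # 32-bit field
        yield seq, ts


packet_duration_ms = 1000 * SAMPLES_PER_PACKET / SAMPLE_RATE_HZ
packets_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_PACKET
payload_bits = SAMPLES_PER_PACKET * BITS_PER_SAMPLE
payload_bitrate = payload_bits * packets_per_second

print(packet_duration_ms, "ms per packet")
print(packets_per_second, "packets per second")
print(payload_bits, "payload bits per packet")
print(payload_bitrate, "bit/s of payload")
print(list(packet_schedule(first_seq=65535, first_timestamp=1000, count=3)))
```

The last line starts from y = 65535 to show the sequence number wrapping to 0 while the timestamp keeps growing by 160 per packet.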

Figure 9: Scenario with different media flows [10]

What we can see from figure 9 is that the SSRC identifies a unique source; for instance, the microphones are labeled with different SSRC values. The CSRC list, instead, lists the different sources that have been mixed into the same media stream (in this case audio). So SSRC=20 with CSRC (10,11) tells us that the source SSRC=20 is a mix of the two audio sources 10 and 11. What we can also see in this scenario is the Translator. The Translator is an element of the RTP network that can transcode between codecs for better utilization of the bandwidth. For example, the bandwidth available to a mobile device is limited, so before delivering the audio data to the mobile access network the translator re-encodes it with a codec that requires less bandwidth (with some loss of audio quality).

2.3 Security

Each layer of the protocol stack can include a security mechanism. RTP carries the audio data, and this data is confidential. To ensure that the data cannot be read in transit, RTP can provide a secure layer that encrypts it with AES. The lower layers can implement extra encryption and other mechanisms like certification server logging, key distribution, and authentication and integrity services (these mechanisms are complex and outside the scope of this document). It is important to remember that increasing the security also increases the complexity and the processing penalty.

3 Telephony

3.1 Architecture

[11] The telephone network is a circuit-switched network. In a circuit-switched network, a circuit (or connection) must be established by the network before the parties can communicate. There are three phases: circuit establishment, data transfer and circuit disconnect. The first phase finds the path, alerts the other party of the incoming call (by ringing the phone, for example) and allocates the channel (dedicated resources). The data transfer phase is the conversation itself, with voice data carried over the circuit that was created. The last phase releases the resources when they are no longer needed (when the phone is hung up). The main drawback of this kind of network is that an allocated resource is wasted whenever it is not used, because a circuit is a dedicated channel between the users and is not shared with anyone else. Let us look at an example of a basic telephone network.

Figure 10: A simple telephone network

In figure 10 we can see the links used in a basic telephone network. The local loop is a dedicated line that connects central office 1 to the phone (or phones) of a subscriber. The same happens at the other end: central office 2 is connected to the other user's phone by another local loop. In short, every user has a local loop connected to the (closest) central office. Central offices are interconnected by T1 lines. T1 lines are digital trunks that use time-division multiplexing (TDM) to carry several calls at once, and they are in turn transported over the SDH/SONET network. A T1 line is organized into frames. Each frame contains 24 time slots. Each time slot is 8 bits long and carries a single voice call as uncompressed 64 kbps PCM. Before exploring the architecture, let us see the entities of an SDH/SONET network.
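Before that, the T1 frame arithmetic above can be checked with a short sketch. The constant names are hypothetical. The frame rate of 8000 frames per second is an assumption taken from the PCM sampling rate (one 8-bit sample per call per frame), and the single framing bit per frame is the one listed in Table 2.

```python
SLOTS_PER_FRAME = 24      # one time slot per voice call
BITS_PER_SLOT = 8         # one PCM sample
FRAMING_BITS = 1          # assumption: one framing bit per frame, as in Table 2
FRAMES_PER_SECOND = 8000  # assumption: one frame per 125 us PCM sampling interval

payload_bits_per_frame = SLOTS_PER_FRAME * BITS_PER_SLOT      # bits of voice per frame
frame_bits = payload_bits_per_frame + FRAMING_BITS            # total bits per frame
per_call_bps = BITS_PER_SLOT * FRAMES_PER_SECOND              # rate of one time slot
line_rate_bps = frame_bits * FRAMES_PER_SECOND                # rate of the whole T1 line
```

Under these assumptions each time slot carries 64 kbps and the whole line runs at 1.544 Mbps.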


Figure 11:
SDXC (synchronous digital cross connect) - The element that connects two rings. It can switch between lines with different speeds and can also add and drop lower-order signals.
ADM (add/drop multiplexer) - Used as a node of a ring. It can add and drop lower-order signals.
MUX (multiplexer) - Multiplexes lower-order SDH signals into higher-order SDH signals. It links PDH and SDH.
DEMUX (de-multiplexer) - Demultiplexes higher-order SDH signals into lower-order SDH signals.
Reg - Regenerates the signal. It also has some supervision functionality for network administration.

From a macroscopic point of view, a generic wide-area telephone network can be drawn as in figure 12. PDH (Plesiochronous Digital Hierarchy) networks are networks in which large quantities of data are transported over fiber. The SDH-island network provides low-to-high and high-to-low speed conversion to interlink the customers. Generally, the SDH islands are made of rings interconnected with each other.

Figure 12: PDH and SDH interconnected networks. Macroscopic view of a wide telephone network


Figure 13: Real case of an SDH-island network of interconnected rings

3.2 Protocols

SDH/SONET is a digital transport technology. PDH is the first generation of digital transport technology. SONET was proposed by Bellcore (now Telcordia) and is the second generation of digital transport technology. SONET stands for synchronous optical network: synchronous because the whole network works synchronously, and optical because the medium that carries the digital data is optical fiber. The third generation of digital transport network is G.709 (also known as digital wrapper). This newer standard uses optical multiplexing (WDM, wavelength division multiplexing) and can carry IP packets, ATM cells, Ethernet frames and SONET/SDH traffic. In this section we focus on the SDH/SONET frames.

As we have already seen, T1 links are used to interconnect SONET/SDH equipment. Because multiplexing and demultiplexing happen continuously across the network, a standard has been defined that specifies how to multiplex several voice calls onto a single link. This is called the DS standard. Because of PCM, each voice call is 64 kbps (DS0). DS1 is 24 DS0, so 24 multiplexed calls. DS1C is the concatenation of two DS1, so 48 calls.

Information over SONET is transmitted in frames. Because the network is synchronous, these frames are transmitted continuously, one after the other. All the data carried over the optical link is converted to the electrical domain. The electrical side of SONET is called STS and the electrical side of SDH is called STM. The two electrical sides are equivalent and interwork flawlessly. In the table we can see the data rate relation between STS and STM. For example, STM-1 is three times STS-1, so STM-1 has the same data rate as STS-3, which is 155.520 Mbps. Once the STS-1 frame is explained, it is easy to deduce how the other STS and STM frames are structured.
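The multiplexing relations above can be sketched in a few lines. The function names are hypothetical. The STS-1 rate is not stated in the text and is derived here from the 155.520 Mbps figure given for STS-3; the helpers assume that rates scale linearly with the level, as the STM-1 = 3 x STS-1 relation suggests.

```python
from fractions import Fraction

DS0_KBPS = 64                                    # one PCM voice call
CALLS_PER_SIGNAL = {"DS0": 1, "DS1": 24, "DS1C": 48}

# Derived from the text: STS-3 runs at 155.520 Mbps, so STS-1 is one third of it.
STS1_MBPS = Fraction(15552, 100) / 3


def sts_rate_mbps(n: int) -> float:
    """Data rate of STS-n, assuming it is n times the STS-1 rate."""
    return float(n * STS1_MBPS)


def stm_rate_mbps(m: int) -> float:
    """Data rate of STM-m, using the relation STM-1 = STS-3."""
    return sts_rate_mbps(3 * m)


def voice_payload_kbps(signal: str) -> int:
    """Aggregate voice payload of a DS signal, ignoring any framing overhead."""
    return CALLS_PER_SIGNAL[signal] * DS0_KBPS
```

For instance, `stm_rate_mbps(1)` and `sts_rate_mbps(3)` give the same value, and `voice_payload_kbps("DS1")` gives the payload of 24 calls.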


3.2.1 STS-1 frame

Figure 14: SONET frame

Each frame contains 810 bytes, which can be represented as a 9x90 matrix: 9 rows of 90 bytes each. The first three bytes of every row are dedicated to the TOH (transport overhead). The rest is the payload. So 3.33% of the entire frame is used for transport overhead and the rest for payload. The payload is called the SPE (synchronous payload envelope), and it carries some overhead of its own (it does not carry only raw data) called POH (path overhead). The TOH is made of the SOH (section overhead) and the LOH (line overhead).

Section, line and path are entities of the SONET stack. Let us define them. Referring to the figure, the path links user to user and represents the circuit created by the network to set up the conversation. The line is the logical link that interconnects multiplexing/demultiplexing nodes and transports multiple STS-1 frames in a bigger frame such as STS-12. The section is the logical link between any two adjacent pieces of SONET equipment, such as a regenerator and a multiplexer. Devices that can terminate a path, a line or a section are called PTE, LTE and STE respectively (TE stands for terminating equipment).

In an STS-1 frame the SOH contains the following fields:
A1 and A2 - Framing bytes used for alignment. They uniquely identify the beginning of an STS frame.
J0 - Section trace, used to trace the STS-1 frame back to its originating equipment.
B1 - Bit interleaved parity: the BIP-8 code used to perform an even parity check on the previous STS-1 frame.
E1 - A 64 kbps channel that provides a voice channel for field engineers.
F1 - This byte is used by the network operator.
D1, D2, D3 - These bytes are used for network management operations.

The LOH contains:
H1 and H2 - Pointer bytes. They specify the offset between the H1 and H2 bytes and the beginning of the SPE.
H3 - Pointer action byte, used to compensate for slight timing differences between SONET devices.
B2 - Carries a BIP-8 parity check.
K1 and K2 - These two bytes are used in automatic protection switching.
D4 to D12 - These bytes form a 576 kbps channel used for network management.


Z1 and Z2 - These two bytes have been partially defined.
E2 - This byte is similar to the E1 byte in the section overhead.

The POH is embedded in the SPE and has the following fields:
J1 - This byte is similar to J0 in the section overhead.
B3 - Like B1 and B2, this byte carries a BIP-8 parity check, here on the payload section.
C2 - Defines which kind of user information is carried in the SPE: VT, asynchronous DS3, ATM cells, HDLC-over-SONET, or PPP over SONET.
G1 - Path status byte; carries diagnostic signals.
F2 - This byte is reserved for future use.
H4 - This byte is used to identify payloads carried within the same frame.
Z3 and Z4 - These are reserved for future use.
Z5 - This byte is used for tandem monitoring. A tandem is a telephone switch used in the backbone telephone network. It switches traffic between other telephone switches and does not serve subscribers directly.

STS frames can carry three types of payload.

Virtual tributaries
A VTG (virtual tributary group) payload is 108 bytes which, referring to the figure, occupies 12 columns of 9 rows. An SPE can hold at most 7 VTGs. Three columns then remain: one for the POH, and the other two are reserved for future use. All the different VT formats are described in the table below.

VT     Carried by the VT                             VTs per VTG  Voice channels per VT  Voice channels per VTG  Voice channels per SPE
VT1.5  DS1                                           4            24                     96                      672
VT2    E1                                            3            30                     90                      630
VT3    DS1 unchannelized (192 bits + 1 framing bit)  2            192 bit (data)         384 bit                 2688 bit
VT6    DS2                                           1            96                     96                      672

Table 2: Virtual tributaries

ATM cells
As we can see in figure 14, the SPE has 774 bytes available for payload. An ATM cell consists of 53 bytes, so an SPE can contain 774/53 = 14.6 ATM cells. Because this is not an integer, the last cell can straddle two successive SPEs. Because ATM cells are not transmitted continuously, idle cells are inserted to maintain a continuous bit stream. Idle cells can be identified uniquely since their header is marked with VPI=0, VCI=0, PTI=0 and CLP=0.

Packet over SONET (PoS)
This is a scheme for carrying IP packets directly over SONET frames. IP packets are encapsulated in HDLC and then mapped into the SPE payload. As with ATM cells, packets can straddle two successive SPEs. Also as with ATM cells, an idle frame (7E) is sent to keep the bit stream continuous when there are no IP packets to transmit.
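The STS-1 frame arithmetic used in this section can be reproduced with a short sketch. The variable names are hypothetical, and the frame rate of 8000 frames per second is an assumption (one frame per 125 us PCM sampling interval) that the text does not state explicitly.

```python
ROWS, COLUMNS = 9, 90     # STS-1 frame seen as a 9x90 byte matrix
TOH_COLUMNS = 3           # first three bytes of every row: transport overhead
POH_COLUMNS = 1           # one column of the SPE: path overhead
ATM_CELL_BYTES = 53
FRAMES_PER_SECOND = 8000  # assumption: one frame every 125 us

frame_bytes = ROWS * COLUMNS                          # whole frame
toh_bytes = ROWS * TOH_COLUMNS                        # transport overhead
toh_percent = 100 * toh_bytes / frame_bytes           # share of the frame used by TOH
spe_bytes = frame_bytes - toh_bytes                   # SPE including its POH column
payload_bytes = spe_bytes - ROWS * POH_COLUMNS        # bytes left for user data
line_rate_mbps = frame_bytes * 8 * FRAMES_PER_SECOND / 1e6

whole_cells = payload_bytes // ATM_CELL_BYTES         # complete ATM cells per SPE
leftover_bytes = payload_bytes % ATM_CELL_BYTES       # start of a straddling cell
```

The non-zero `leftover_bytes` is the reason a cell has to straddle two successive SPEs: the remaining bytes hold the first part of a cell that is completed in the next frame.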


3.3 Reliability

Figure 15: Failure on the working ring link

The way SONET rings are architected makes SONET infrastructures highly reliable. For this reason they are described as "five nines", which means they are working 99.999% of the time (roughly five minutes of downtime per year). In any infrastructure a cable can get cut or a piece of equipment can stop working. To prevent outages caused by such accidents, SONET has several mechanisms that keep it highly reliable. The high degree of redundancy ensures that there is always some link connecting two nodes, or that alternative paths are always available in case of a failure. For example, for the ring fiber links there are different redundancy schemes: 1+1, 1:1 and 1:N. The 1+1 scheme has two fibers (usually called the working and the protection fiber) that carry the traffic simultaneously; if one breaks, the traffic continues on the surviving fiber while the engineers fix the broken one. In the 1:1 scheme there are two fibers but only one is used; if it breaks, the traffic is switched to the protection fiber. The 1:N scheme is similar to 1:1, but one protection fiber is shared by N working fibers. It is important to keep in mind that the two links should follow different routes, otherwise the redundancy is not really effective. The system is called self-healing and is managed by the automatic protection switching (APS) protocol. The restoration time of the service has to be less than 50 ms.

Redundancy is also applied to other parts of the infrastructure. All the equipment is powered by uninterruptible power supplies (UPS), because in a blackout the equipment must be fed by a different power source to stay on and ensure continuity of service. A UPS also provides a high-quality power source by filtering spikes, preventing small voltage drops on the electricity line, and offering other kinds of power protection. Data integrity is ensured by the BIP-8 algorithm.
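To make the BIP-8 check concrete, here is a minimal sketch of a bit-interleaved even parity over a block of bytes. The function name `bip8` is hypothetical, and the sketch ignores which bytes of the frame each of B1, B2 and B3 actually covers; it only shows the parity computation itself.

```python
from functools import reduce
from operator import xor


def bip8(data: bytes) -> int:
    """Bit-interleaved parity over 8 bit positions.

    Bit i of the result is set so that, together with bit i of every byte
    in data, the number of ones is even. This is the XOR of all the bytes.
    """
    return reduce(xor, data, 0)


# Hypothetical block of data and the parity byte the sender would transmit.
block = bytes([0b10101010, 0b11110000, 0b00001111])
sent_parity = bip8(block)

# A single flipped bit in transit changes the parity seen by the receiver.
corrupted = bytes([0b10101011, 0b11110000, 0b00001111])
error_detected = bip8(corrupted) != sent_parity
```

The receiver recomputes the parity over the data it received and compares it with the transmitted byte; a mismatch signals that at least one bit was corrupted.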


3.4 Security

Figure 16: SONET/SDH network with Encryptor

Data over SONET/SDH is confidential, so it must be protected by some mechanism. Security at the SONET/SDH level is ensured by dedicated equipment. These devices are called SONET encryptors, and they encrypt data that would otherwise be sent over the SONET network in plain text. In the figure we can see where the SONET/SDH data is encrypted. There is a small section of the network where the data traffic travels in plain text.

Differences between VoIP and Telephony

Figure 17: Three protocols involved in a VoIP call

In this section we compare the main differences between VoIP and telephony.

Network category - VoIP runs on IP/UDP, which is a connectionless packet-switched network. Classic voice calls run on the SONET/SDH network, which is a circuit-switched network. The two kinds of network differ mainly in the way they allocate resources and transmit data. In a packet-switched network the packets are transferred separately and may follow different paths over the network. Because they follow different paths, packets can arrive at the destination in a different order from the one in which they were sent. This means that VoIP needs protocols for reordering the data. SONET/SDH allocates a real circuit that links the users end to end. The circuit is a dedicated link for a given period; when the phone call is released, that path becomes available again.
Architecture - The equipment of an IP network is definitely more complex and involves more protocols and more management to work (but it is definitely more flexible). SONET/SDH is a simpler network and uses simpler protocols, based on a protocol stack embedded over the physical layer.


Flexibility - The IP/UDP stack (on which VoIP is built) makes better use of the network capacity. SONET/SDH is a continuous synchronized flow carrying data; when there is no data, that capacity is wasted because idle frames are sent. VoIP can also provide different media services, and it can evolve to follow future implementations with easy updates.
Security - Both networks provide security. Because of its public nature the IP network is less secure than a SONET/SDH network, but both provide a high standard of security.
Protocol architecture - The protocol architecture of a VoIP network is much more complex and involves many protocols and mechanisms working together. A VoIP call involves three protocols, which are layered on an IP/UDP or an IP/TCP stack. VoIP must also cope with the absence of a QoS system, because the delay of the IP network is unpredictable.
Cost - From a user's point of view, calling between two continents for free is very attractive, so VoIP definitely has some advantages. But classic telephony has become cheap, and operators usually offer all-inclusive subscriptions with free calls within the operator's country. For example, in Italy for 20 euros per month you can have a broadband connection (typically 4 Mbit) and free calls to all of Italy. So telephony is not an expensive service either. VoIP is an interesting solution for internal communication in big companies and offices.
Reliability - The telephone network is an extremely reliable network; it is almost never down. VoIP is highly dependent on the IP network and on the access network you are attached to. In 30 calls (from 3 minutes up to over 1 hour) with a VoIP service I have personally experienced 4 extremely bad quality calls, 7 acceptable ones, and the rest were good (in other words, more than 10% of the calls were unsuccessful). In my entire life I have never experienced a bad call on the classic telephone network.

Skype

Skype is a VoIP service that does not use the protocol stack described in this document. The Skype protocol is closed source and there is no documentation about how it works. Skype is a relatively new piece of software and protocol (the first public beta version was released in August 2003 [5]) and it has reached some impressive milestones, such as 25,000,000 online users.


Figure 18: Skype users online and new users per day [4]

The key feature of Skype is the way it uses the network, because it is based on a P2P (peer-to-peer) paradigm. The Skype network is based on the same technology [5] "widely deployed and popularized by file-sharing applications such as Napster and KaZaA". The definition of P2P is: "A true P2P system, in our opinion, is one where all nodes in a network join together dynamically to participate in traffic routing-, processing- and bandwidth intensive tasks that would otherwise be handled by central servers." In a P2P system a user can be client and server simultaneously: providing services (like uploading) to a set of users (being a server) and requesting services (like downloading) from a set of users (being a client). Networks of this kind are called decentralized P2P networks and have several advantages over classic client-server networks such as the VoIP stack discussed in this document.

A traditional client-server network does not scale linearly, because adding clients means that the services are shared among more users, and more users means each user gets less. A P2P network scales indefinitely without increasing search time and without the need for costly centralized resources (servers). This means that the network resources grow as the number of users of the P2P network grows. More users do not mean more servers (which are expensive).

Another inspiring feature of the Skype P2P network is its very fast search and all-aware paradigm. The Global Index technology is a multi-tiered network where supernodes communicate in such a way that every node in the network has full knowledge of all available users and resources with minimal latency. Decentralizing information means having to find it in many small boxes; in a small P2P network this is easy enough, but in a fast-growing and extremely wide network it can take time. Latency is harmful in a real-time application like VoIP, which is why Skype engineers developed the Global Index technology to keep it extremely low.

P2P network performance depends on routing. Skype uses a technique that keeps a set of paths open and dynamically uses the best-performing ones. Finally, Skype encrypts all calls and instant messages end-to-end, and it runs on both TCP/IP and UDP/IP stacks.

References

[1] http://en.wikipedia.org/wiki/G.711
[2] http://whitepapers.hackerjournals.com/wp-content/uploads/2010/08/Multimedia-over-IP.pdf
[3] http://en.wikipedia.org/wiki/Skype
[4] http://skypejournal.com/blog/2010/11/25/skypes-25mm-dialtone-raises-questions-for-investors/
[5] http://www.skype.com/intl/en-us/support/user-guides/p2pexplained/
[6] RFC 3550
[7] RFC 3261
[8] RFC 4317
[9] RFC 4566
[10] Multimedia-over-IP by Dennis Baron, McGraw-Hill
[11] Connection-oriented Networks (first and second chapters)
