
Web server

Web servers are computers that deliver (serve up) Web pages.

Every Web server has an IP address and possibly a domain name. For example, if you enter the URL http://www.pcwebopedia.com/index.html in your browser, this sends a request to the Web server whose domain name is pcwebopedia.com. The server then fetches the page named index.html and sends it to your browser.

Any computer can be turned into a Web server by installing server software and connecting the machine to the Internet. There are many Web server software applications, including public domain software from NCSA and Apache, and commercial packages from Microsoft, Netscape and others.

How Do Web Servers Work

Have you ever wondered just exactly how this Web page you are reading found its way into your browser and onto your computer screen? The process largely depends on Web servers. Read on as Webopedia briefly explains the mechanisms that bring Web pages to your home, your office or your mobile computers.

Typically, users visit a Web site by either clicking on a hyperlink that brings them to that site or keying the site's URL directly into the address bar of a browser. But how does the same site appear on anyone's computer anywhere in the world and often on many computers at the same time?

Let's use Webopedia as an example. You decide to visit Webopedia by typing its URL -- http://www.webopedia.com -- into your Web browser. Through an Internet connection, your browser initiates a connection to the Web server that is storing the Webopedia files by first converting the domain name into an IP address (through a domain name service) and then locating the server that is storing the information for that IP address (also see Understanding IP Addressing). The Web server stores all of the files necessary to display Webopedia's pages on your computer -- typically all the individual pages that comprise the entirety of a Web site, any images/graphic files and any scripts that make dynamic elements of the site function.
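As a rough sketch of the first step described above, the snippet below splits a URL into the host name the browser has to resolve and the page it will ask for, then builds the text of a basic HTTP request. It uses Python's standard urllib.parse module. The function name build_request is a hypothetical helper chosen for illustration, the request headers shown are only a minimal assumed set, and no network connection is made.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, path, request_text) for a plain HTTP URL.

    build_request is an illustrative helper, not part of any library.
    """
    parts = urlsplit(url)
    host = parts.hostname        # the name that must be converted to an IP address
    path = parts.path or "/"     # an empty path means the site's root page
    request_text = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, path, request_text


host, path, request_text = build_request("http://www.pcwebopedia.com/index.html")
print(host)
print(path)
```

A real browser does considerably more (name lookup, connection handling, extra headers); the point here is only which parts of the URL identify the server and which identify the page.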
Once contact has been made, the browser requests the data from the Web server, and using HTTP, the server delivers the data back to your browser. The browser in turn converts, or formats, the computer languages that the files are made up of into what you see displayed in your browser. In the same way, the server can send the files to many client computers at the same time, allowing multiple clients to view the same page simultaneously.

IP address

An identifier for a computer or device on a TCP/IP network. Networks using the TCP/IP protocol route messages based on the IP address of the destination. The format of an IP address is a 32-bit numeric address written as four numbers separated by periods. Each number can be zero to 255. For example, 1.160.10.240 could be an IP address.

Within an isolated network, you can assign IP addresses at random as long as each one is unique. However, connecting a private network to the Internet requires using registered IP addresses (called Internet addresses) to avoid duplicates.

The four numbers in an IP address are used in different ways to identify a particular network and a host on that network. Four regional Internet registries -- ARIN, RIPE NCC, LACNIC and APNIC -- assign Internet addresses from the following three classes.

Class A - supports 16 million hosts on each of 126 networks
Class B - supports 65,000 hosts on each of 16,000 networks
Class C - supports 254 hosts on each of 2 million networks

The number of unassigned Internet addresses is running out, so a new classless scheme called CIDR is gradually replacing the system based on classes A, B, and C and is tied to adoption of IPv6.

Understanding IP Addressing

Every computer that communicates over the Internet is assigned an IP address that uniquely identifies the device and distinguishes it from other computers on the Internet.
An IP address consists of 32 bits, often shown as 4 octets of numbers from 0-255 represented in decimal form instead of binary form. For example, the IP address 168.212.226.204 in binary form is 10101000.11010100.11100010.11001100. But it is easier for us to remember decimals than it is to remember binary numbers, so we use decimals to represent the IP addresses when describing them. However, the binary number is important because that will determine which class of network the IP address belongs to.

An IP address consists of two parts, one identifying the network and one identifying the node, or host. The Class of the address determines which part belongs to the network address and which part belongs to the node address. All nodes on a given network share the same network prefix but must have a unique host number.

Class A Network -- binary addresses start with 0, therefore the decimal number can be anywhere from 1 to 126. The first 8 bits (the first octet) identify the network and the remaining 24 bits indicate the host within the network. An example of a Class A IP address is 102.168.212.226, where "102" identifies the network and "168.212.226" identifies the host on that network.

Class B Network -- binary addresses start with 10, therefore the decimal number can be anywhere from 128 to 191. (The number 127 is reserved for loopback and is used for internal testing on the local machine.) The first 16 bits (the first two octets) identify the network and the remaining 16 bits indicate the host within the network. An example of a Class B IP address is 168.212.226.204, where "168.212" identifies the network and "226.204" identifies the host on that network.

Class C Network -- binary addresses start with 110, therefore the decimal number can be anywhere from 192 to 223. The first 24 bits (the first three octets) identify the network and the remaining 8 bits indicate the host within the network. An example of a Class C IP address is 200.168.212.226, where "200.168.212" identifies the network and "226" identifies the host on that network.

Class D Network -- binary addresses start with 1110, therefore the decimal number can be anywhere from 224 to 239. Class D networks are used to support multicasting.

Class E Network -- binary addresses start with 1111, therefore the decimal number can be anywhere from 240 to 255. Class E networks are used for experimentation. They have never been documented or utilized in a standard way.

IP Addressing Fundamentals

IP uses an anarchic and highly-distributed model, with every device an equal peer to every other device on the global Internet. This structure was one of IP's original design goals, as it proved to be useful with a variety of systems, did not require a centralized management system (which would never have scaled well), and provided for fault-tolerance on the network (no central management means no single point of failure).

In order for systems to locate each other in this distributed environment, nodes are given explicit addresses that uniquely identify the particular network the system is on, and uniquely identify the system to that particular network. When these two identifiers are combined, the result is a globally-unique address.

This concept is illustrated in Figure B-1. In this example, the network is numbered 192.168.10, and the two nodes are numbered 10 and 20. Taken together, the fully-qualified IP addresses for these systems would be 192.168.10.10 and 192.168.10.20.

Figure B-1. The two parts of an IP address

Subnet Masks and CIDR Networks

IP addresses are actually 32-bit binary numbers (for example, 11000000101010000000000100010100). Each 32-bit IP address consists of two subaddresses, one identifying the network and the other identifying the host to the network, with an imaginary boundary separating the two.
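The relationship between the dotted-decimal form and the 32-bit binary form used in this section can be checked with a few lines of Python. This is a minimal sketch; to_binary and to_dotted are hypothetical helper names, and the two sample values are the ones quoted in the text above.

```python
def to_binary(dotted):
    """Render a dotted-decimal IPv4 address as four 8-bit binary octets."""
    return ".".join(f"{int(octet):08b}" for octet in dotted.split("."))


def to_dotted(bits):
    """Render a 32-character string of 0s and 1s as dotted decimal."""
    octets = [bits[i:i + 8] for i in range(0, 32, 8)]
    return ".".join(str(int(octet, 2)) for octet in octets)


print(to_binary("168.212.226.204"))                    # 10101000.11010100.11100010.11001100
print(to_dotted("11000000101010000000000100010100"))   # 192.168.1.20
```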
The location of the boundary between the network and host portions of an IP address is determined through the use of a subnet mask. A subnet mask is another 32-bit binary number, which acts like a filter when it is applied to the 32-bit IP address. By comparing a subnet mask with an IP address, systems can determine which portion of the IP address relates to the network, and which portion relates to the host. Anywhere the subnet mask has a bit set to "1", the underlying bit in the IP address is part of the network address. Anywhere the subnet mask is set to "0", the related bit in the IP address is part of the host address.

For example, assume that the IP address 11000000101010000000000100010100 has a subnet mask of 11111111111111111111111100000000. In this example, the first 24 bits of the 32-bit IP address are used to identify the network, while the last 8 bits are used to identify the host on that network.

The size of a network (i.e., the number of host addresses available for use on it) is a function of the number of bits used to identify the host portion of the address. If a subnet mask shows that 8 bits are used for the host portion of the address block, a maximum of 256 possible host addresses are available for that specific network. Similarly, if a subnet mask shows that 16 bits are used for the host portion of the address block, a maximum of 65,536 possible host addresses are available for use on that network.

If a network administrator needs to split a single network into multiple virtual networks, the bit-pattern in use with the subnet mask can be changed to allow as many networks as necessary. For example, assume that we want to split the 24-bit 192.168.10.0 network (which allows for 8 bits of host addressing, or a maximum of 256 host addresses) into two smaller networks.
All we have to do in this situation is change the subnet mask of the devices on the network so that they use 25 bits for the network instead of 24 bits, resulting in two distinct networks with 128 possible host addresses on each network. In this case, the first network would have a range of network addresses between 192.168.10.0-192.168.10.127, while the second network would have a range of addresses between 192.168.10.128-192.168.10.255.

Networks can also be enlarged through the use of a technique known as "supernetting," which works by extending the host portion of a subnet mask to the left, into the network portion of the address. Using this technique, a pair of networks with 24-bit subnet masks can be turned into a single large network with a 23-bit subnet mask. However, this works only if you have two neighboring 24-bit network blocks, with the lower network having an even value (when the network portion of the address is shrunk, the trailing bit from the original network portion of the subnet mask will fall into the host portion of the new subnet mask, so the new network mask will consume both networks).

For example, it is possible to combine the 24-bit 192.168.10.0 and 192.168.11.0 networks together since the loss of the trailing bit from each network (00001010 vs. 00001011) produces the same 23-bit subnet mask (0000101x), resulting in a consolidated 192.168.10.0 network. However, it is not possible to combine the 24-bit 192.168.11.0 and 192.168.12.0 networks, since the binary values in the seventh bit position (00001011 vs. 00001100) do not match when the trailing bit is removed.

In the modern networking environment defined by RFC 1519 [Classless Inter-Domain Routing (CIDR)], the subnet mask of a network is typically annotated in written form as a "slash prefix" that trails the network number.
In the subnetting example in the previous paragraph, the original 24-bit network would be written as 192.168.10.0/24, while the two new networks would be written as 192.168.10.0/25 and 192.168.10.128/25. Likewise, when the 192.168.10.0/24 and 192.168.11.0/24 networks were joined together as a single supernet, the resulting network would be written as 192.168.10.0/23. Note that the slash prefix annotation is generally used for human benefit; infrastructure devices still use the 32-bit binary subnet mask internally to identify networks and their routes. All networks must reserve any host addresses that are made up entirely of either ones or zeros, to be used by the networks themselves. This is so that each subnet will have a network-specific address (the all-zeroes address) and a broadcast address (the all-ones address). For example, a /24 network allows for 8 bits of host addresses, but only 254 of the 256 possible addresses are available for use. Similarly, /25 networks have a maximum of 7 bits for host addresses, with 126 of the 128 possible addresses available (the all-ones and all-zeroes addresses from each subnet must be set aside for the subnets themselves).
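The mask, subnetting, and supernetting examples in this section can be reproduced with Python's standard ipaddress module. The sketch below is illustrative only: it applies a /24 mask to 192.168.1.20 (the dotted form of the 32-bit example address), splits 192.168.10.0/24 into two /25 networks, merges the 192.168.10.0/24 and 192.168.11.0/24 pair, and shows that the 192.168.11.0/24 and 192.168.12.0/24 pair does not merge. The variable names are assumptions made for the example.

```python
import ipaddress

# Applying a mask: a bitwise AND keeps the network bits, and an AND with
# the inverted mask keeps the host bits.
address = int(ipaddress.IPv4Address("192.168.1.20"))
mask = int(ipaddress.IPv4Address("255.255.255.0"))
network_part = ipaddress.IPv4Address(address & mask)
host_part = address & ~mask & 0xFFFFFFFF

# Subnetting: split the /24 into two /25 networks.
original = ipaddress.ip_network("192.168.10.0/24")
halves = [str(net) for net in original.subnets(prefixlen_diff=1)]

# Supernetting: merge two neighboring /24 networks into one /23.
pair = [ipaddress.ip_network("192.168.10.0/24"),
        ipaddress.ip_network("192.168.11.0/24")]
merged = [str(net) for net in ipaddress.collapse_addresses(pair)]

# 192.168.11.0/24 and 192.168.12.0/24 do not share a 23-bit prefix,
# so they stay as two separate networks.
mismatch = [ipaddress.ip_network("192.168.11.0/24"),
            ipaddress.ip_network("192.168.12.0/24")]
unmerged = [str(net) for net in ipaddress.collapse_addresses(mismatch)]

# Usable hosts: all addresses minus the all-zeroes and all-ones addresses.
usable_24 = original.num_addresses - 2
usable_25 = ipaddress.ip_network("192.168.10.0/25").num_addresses - 2

print(network_part, host_part)   # 192.168.1.0 20
print(halves)                    # ['192.168.10.0/25', '192.168.10.128/25']
print(merged)                    # ['192.168.10.0/23']
print(unmerged)                  # ['192.168.11.0/24', '192.168.12.0/24']
print(usable_24, usable_25)      # 254 126
```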

Table B-1 shows some of the most common subnet masks, and the number of hosts available on them after subtracting the all-zeroes and all-ones addresses.

Table B-1: Common Subnet Masks and Their Host Counts

Mask (Slash Prefix)   Mask (Dotted Decimal)   Network Bits   Host Bits   Hosts per Net
/16                   255.255.0.0             16             16          65,534
/17                   255.255.128.0           17             15          32,766
/18                   255.255.192.0           18             14          16,382
/19                   255.255.224.0           19             13          8,190
/20                   255.255.240.0           20             12          4,094
/21                   255.255.248.0           21             11          2,046
/22                   255.255.252.0           22             10          1,022
/23                   255.255.254.0           23             9           510
/24                   255.255.255.0           24             8           254
/25                   255.255.255.128         25             7           126
/26                   255.255.255.192         26             6           62
/27                   255.255.255.224         27             5           30
/28                   255.255.255.240         28             4           14
/29                   255.255.255.248         29             3           6
/30                   255.255.255.252         30             2           2
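Every row of Table B-1 follows from the prefix length alone: the host bits are 32 minus the prefix, and the usable host count is two less than 2 raised to the number of host bits. The following sketch regenerates the rows using Python's standard ipaddress module; dotted_mask and usable_hosts are hypothetical helper names.

```python
import ipaddress


def dotted_mask(prefix):
    """Return the dotted-decimal subnet mask for a slash prefix."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix}").netmask)


def usable_hosts(prefix):
    """Host addresses left after reserving the all-zeroes and all-ones addresses."""
    host_bits = 32 - prefix
    return 2 ** host_bits - 2


for prefix in range(16, 31):
    print(f"/{prefix:<3} {dotted_mask(prefix):<16} "
          f"{prefix:>2} {32 - prefix:>2} {usable_hosts(prefix):>6,}")
```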

All the systems on the same subnet must use the same subnet mask in order to communicate with each other directly. If they use different subnet masks they will think they are on different networks, and will not be able to communicate with each other without going through a router first. Hosts on different networks can use different subnet masks, although the routers will have to be aware of the subnet masks in use on each of the segments.

Subnet masks are used only by systems that need to communicate with the network directly. For example, external systems do not need to be aware of the subnet masks in use on your internal networks, since those systems will route data to your networks by way of your parent network's address block. As such, remote routers need to know only about your provider's subnet mask. For example, if you have a small network that uses only a /28 prefix that is a subset of your ISP's /20 network, remote routers need to know only about your upstream provider's /20 network, while your upstream provider needs to know your subnet mask in order to get the data to your local /28 network.

The Legacy of Network Classes

The use of variable-length subnet masks as described in the preceding section was introduced to the Internet community in 1993 by RFC 1519 as a methodology for maximizing the utilization of limited IPv4 addresses. Even though this specification is nearly a decade old--and even though it is the addressing and routing architecture required by the modern Internet--many legacy systems and documents still refer to the "class-based" addressing architecture that preceded CIDR.

Under the old class-based architecture, network addresses are assigned according to fixed subnet mask values, called "classes." These classes are listed in Table B-2. Note that classes A, B, and C are used for end-user network assignments, while classes D and E do not contain end-user addresses.
Table B-2: Class-Based Subnet Masks

Class   Mask (Slash Prefix)   Mask (Dotted Decimal)   Usage Description
A       /8                    255.0.0.0               Very large networks, always subnetted
B       /16                   255.255.0.0             Large networks, typically subnetted
C       /24                   255.255.255.0           Small networks, the most common class
D       /32                   255.255.255.255         Multicasting group addresses (no hosts)
E       Undefined             Undefined               Reserved for experimental purposes

The primary benefit of the class-based model is that routers do not have to be made explicitly aware of the subnet mask in use on a particular network. Whereas CIDR requires that each network route be accompanied by a subnet mask for the advertised network, under the class-based model routers only have to examine the destination IP address to determine the subnet mask associated with that address.

By looking at the four leading bits from the destination address, a device can determine which class the destination address falls into, and use that information to determine its subnet mask. Once this information is gleaned, the device can determine which portion of the address refers to the network, and then look up the router for that network. This concept is illustrated in Table B-3.

Table B-3: The First Four Bits from the Major Network Classes

Class   Lead Bits   Slash Prefix   Possible Address Values[1]   Nets per Class   Hosts per Net
A       0xxx        /8             0.0.0.0-127.255.255.255      128              16,777,214
B       10xx        /16            128.0.0.0-191.255.255.255    16,384           65,534
C       110x        /24            192.0.0.0-223.255.255.255    2,097,152        254
D       1110        /32            224.0.0.0-239.255.255.255    268,435,456      0[2]
E       1111        Undefined      240.0.0.0-255.255.255.255    Undefined        Undefined
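The leading-bit test summarized in Table B-3 can be written as a short function. This is a sketch of the legacy classful lookup only, not of how CIDR-aware routers make forwarding decisions; address_class and CLASS_PREFIX are hypothetical names.

```python
def address_class(dotted):
    """Return the legacy class (A-E) implied by the address's leading bits."""
    first_octet = int(dotted.split(".")[0])
    if (first_octet & 0b10000000) == 0:             # 0xxx
        return "A"
    if (first_octet & 0b11000000) == 0b10000000:    # 10xx
        return "B"
    if (first_octet & 0b11100000) == 0b11000000:    # 110x
        return "C"
    if (first_octet & 0b11110000) == 0b11100000:    # 1110
        return "D"
    return "E"                                      # 1111


# Slash prefix implied by each class; Class E has no defined mask.
CLASS_PREFIX = {"A": 8, "B": 16, "C": 24, "D": 32}

for sample in ("102.168.212.226", "168.212.226.204", "200.168.212.226"):
    cls = address_class(sample)
    print(sample, cls, f"/{CLASS_PREFIX[cls]}")
```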

The address-to-class mapping methodology is described in more detail here:

Class A networks

Class A addresses always have the first bit of their IP addresses set to "0". Since Class A networks have an 8-bit network mask, the use of a leading zero leaves only 7 bits for the network portion of the address, allowing for a maximum of 128 possible network numbers, ranging from 0.0.0.0-127.0.0.0. However, many of the address blocks from this range have been set aside for other uses over time. In short, any IP address in the range of 0.x.x.x-127.x.x.x is considered a Class A address with a subnet mask of 255.0.0.0, although many of these addresses are considered invalid by Internet routers.

Class B networks

Class B addresses have their first bit set to "1" and their second bit set to "0". Since Class B addresses have a 16-bit network mask, the use of a leading "10" bit-pattern leaves 14 bits for the network portion of the address, allowing for a maximum of 16,384 networks, ranging from 128.0.0.0-191.255.0.0. Several network addresses from this range have also been reserved over time. Any IP address in the range of 128.0.x.x-191.255.x.x is considered a Class B address with a subnet mask of 255.255.0.0, but many of these addresses are considered invalid by Internet routers.

Class C networks

Class C addresses have their first two bits set to "1" and their third bit set to "0". Since Class C addresses have a 24-bit network mask, this leaves 21 bits for the network portion of the address, allowing for a maximum of 2,097,152 network addresses, ranging from 192.0.0.0-223.255.255.0. Many network address reservations have been made from the Class C pool, substantially reducing the number of Internet-legal Class C networks. Any IP address in the range of 192.0.0.x-223.255.255.x is considered a Class C address with a subnet mask of 255.255.255.0, but again many of these addresses are considered invalid by Internet routers.
Class D networks

Class D addresses are used for multicasting applications, as described in Chapter 4, Multicasting and the Internet Group Management Protocol. Class D addresses have their first three bits set to "1" and their fourth bit set to "0". Class D addresses are 32-bit network addresses, meaning that all the values within the range of 224.0.0.0-239.255.255.255 are used to uniquely identify multicast groups. There are no host addresses within the Class D address space, since all the hosts within a group share the group's IP address for receiver purposes. None of the network addresses were reserved in the original allocations, although a variety of subsequent multicast addressing schemes have resulted in reservations. Refer to the IANA pages at http://www.isi.edu/innotes/iana/assignments/multicast-addresses for information on these reservation schemes. In short, each multicast address exists as a 32-bit network address, so any address within the range of 224.0.0.0-239.255.255.255 is a Class D multicast address.

Class E networks

Class E addresses are defined as experimental and reserved for future testing purposes. They have never been documented or utilized in a standard way.

The number of networks available to each of the subnet classes--and the number of hosts possible on each of those networks--varies widely between the classes. As we saw in Table B-3, there are only a few Class A networks available, although each of them can have millions of possible hosts. Conversely, there are a couple of million possible Class C networks, but they can serve only 254 devices each (after subtracting for the all-ones and all-zeroes host addresses).

All told, there are around 4.3 billion IP addresses (less, if you don't consider Class D and E addresses, which cannot be used as host addresses). Unfortunately, the class-based, legacy addressing scheme places heavy restrictions on the distribution of these addresses.
Every time a Class A address is assigned to an organization, almost 17 million host addresses go with it. If all 126 Class A networks were assigned, two billion of the possible addresses would be gone. If all the available Class B networks were assigned, another billion host addresses would be gone as well. This is true regardless of whether the host addresses within those network blocks are used or not; the network address is published along with its routing information, so all host addresses within the network are reachable only through that route.

Class C addresses represent the biggest problem, however, for two reasons. First of all, there are fewer IP addresses available in all the Class C networks than there are in the other classes (only about 536 million possible host addresses from all the Class C networks combined). Second, Class C networks are the most popular, since they reflect the size of the majority of LANs in use.

Every time a Class C address is assigned, 256 addresses go with it. Organizations that have 3 segments but only 60 devices are wasting over 700 possible addresses (3 segments x 256 maximum IP addresses = 768 addresses; 768 addresses - 60 active nodes = 708 inactive addresses). Whether all the addresses are actually put to use or not is irrelevant; they are assigned to a specific network and cannot be used by anybody else. This problem is even worse with Class B addresses, since an organization with a few hundred nodes might be given a Class B address, in which case it is wasting tens of thousands of IP addresses.

Remember, however, that TCP/IP networks are inherently router-based, and it takes much less overhead to keep track of a few networks than millions of them. If all the addresses were assigned using Class C networks, routers would have to remember and process 16 million network routes; this would quickly overwhelm even the fastest of routers, and network traffic would either slow to a crawl or fail completely.
Having larger network classes allows routers to work with smaller routing tables. Remember also that the original architecture of the Internet consisted mostly of large networks connecting to each other directly, and didn't look much like the hierarchical design used today. It was easy to give one huge address block to the military and another to Stanford University. In that model, routers had to remember only one IP address for each network, and could reach millions of hosts through each of those routes.
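The figures quoted in this discussion of address waste can be checked with simple arithmetic. The sketch below recomputes them from the class sizes given earlier; the constant and variable names are assumptions made for the example.

```python
# Networks per class and addresses per network, as given in the text.
CLASS_A_NETS, CLASS_A_SIZE = 126, 2 ** 24
CLASS_B_NETS, CLASS_B_SIZE = 2 ** 14, 2 ** 16
CLASS_C_NETS, CLASS_C_SIZE = 2 ** 21, 2 ** 8

class_a_total = CLASS_A_NETS * CLASS_A_SIZE   # about two billion
class_b_total = CLASS_B_NETS * CLASS_B_SIZE   # about one billion
class_c_total = CLASS_C_NETS * CLASS_C_SIZE   # about 536 million

# Three Class C segments holding only 60 devices.
assigned = 3 * CLASS_C_SIZE
wasted = assigned - 60

# Routes needed if the whole 32-bit space were handed out as Class C blocks.
routes_if_all_class_c = 2 ** 32 // CLASS_C_SIZE

print(f"{class_a_total:,} {class_b_total:,} {class_c_total:,}")
print(assigned, wasted)
print(f"{routes_if_all_class_c:,}")
```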

Today, however, things are considerably different, and organizations of all sizes are connecting to the Internet. Some networks are still quite large, requiring many thousands of network numbers, while some are quite small, consisting of only a handful of PCs. In this environment class-based routing does not scale well, although there still exists the need for bundled networks so that routers do not have to remember millions of separate routes and network paths.

This problem has been resolved through the use of variable-length subnet masks, as described in the earlier section "Subnet Masks and CIDR Networks." When variable-length subnet masks are used instead of predefined subnet masks, blocks of addresses can be assigned to organizations using a subnet mask that is appropriate for the number of devices on that network. If a network has only 8 PCs, it only needs a network block with a 28-bit subnet mask, which provides it with 16 addresses (14 of which are usable by the hosts).

In this context, CIDR-based addressing rules do not care about the "class" to which a network address appears to belong. Instead, CIDR-aware systems rely on the explicit presence of a subnet mask to make packet-forwarding decisions, and use the class only as a last-ditch effort in the event that the subnet mask is not explicitly defined in the network's routing tables.

This results in substantially less wasted address space, although it also results in more routing entries that must be managed. However, another key part of the CIDR architecture is that network blocks are assigned hierarchically, with top-level service providers getting big network numbers (a large ISP may get a network with a /13 prefix, allowing a maximum of 524,288 host addresses for that network assignment), and those organizations can subnet their large networks into multiple smaller networks for their downstream customers.
This allows a single routing entry for the top-level ISP to be used for all the networks underneath it. Rather than the top-level routers having to store routing information for the 32,000+ individual /28 networks beneath the ISP, they have to remember only the routes for the /13 network.

Since the modern Internet now predominantly uses CIDR addressing and routing, the most important thing to remember about the historical class-based addressing scheme is that it is a legacy design. Just about all of the modern operating systems and network infrastructure devices today fully support variable-length subnet masks. However, much of the older equipment still enforces the use of class-based addressing, and many training courses still teach this historical architecture as if it were current technology. For the most part, network administrators should not be concerned with network classes unless they are suffering problems with legacy equipment.

Internet-Legal Versus Private Addressing

Although the pool of IP addresses is somewhat limited, most companies have no problems obtaining them. However, many organizations have already installed TCP/IP products on their internal networks without obtaining "legal" addresses from the proper sources. Sometimes these addresses come from example books or are simply picked at random (several firms use networks numbered 1.2.3.0, for example). Unfortunately, since they are not legal, these addresses will not be usable when these organizations attempt to connect to the Internet. These firms will eventually have to reassign Internet-legal IP addresses to all the devices on their networks, or invest in address-translation gateways that rewrite outbound IP packets so they appear to be coming from an Internet-accessible host.

Even if an address-translation gateway is installed on the network, these firms will never be able to communicate with any site that is a registered owner of the IP addresses in use on the local network.
For example, if you choose to use the 36.0.0.0/8 address block on your internal network, your users will never be able to access the computers at Stanford University, the registered owner of that address block. Any attempt to connect to a host at 36.x.x.x will be interpreted by the local routers as a request for a local system, so those packets will never leave your local network.

Not all firms have the luxury of using Internet-legal addresses on their hosts, for any number of reasons. For example, there may be legacy applications that use hardcoded addresses, or there may be too many systems across the organization for a clean upgrade to be successful. If you are unable to use Internet-legal addresses, you should at least be aware that there are groups of "private" Internet addresses that can be used on internal networks by anyone. These address pools were set aside in RFC 1918, and therefore cannot be "assigned" to any organization. The Internet's backbone routers are configured explicitly not to route packets with these addresses, so they are completely useless outside of an organization's internal network. The address blocks available are listed in Table B-4.

Table B-4: Private Addresses Provided in RFC 1918

Class   Range of Addresses
A       Any addresses in 10.x.x.x
B       Addresses in the range of 172.16.x.x-172.31.x.x
C       Addresses in the range of 192.168.0.x-192.168.255.x
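A quick way to tell whether an address falls inside one of the private pools in Table B-4 is to test it against the three blocks. The sketch below uses Python's standard ipaddress module and writes the ranges in their equivalent slash-prefix form (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); RFC1918_BLOCKS and is_rfc1918 are hypothetical names.

```python
import ipaddress

# The three private pools from Table B-4, written as CIDR blocks.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),        # 10.x.x.x
    ipaddress.ip_network("172.16.0.0/12"),     # 172.16.x.x-172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),    # 192.168.0.x-192.168.255.x
]


def is_rfc1918(dotted):
    """Return True if the address lies in one of the RFC 1918 private pools."""
    address = ipaddress.ip_address(dotted)
    return any(address in block for block in RFC1918_BLOCKS)


for sample in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "36.1.2.3"):
    print(sample, is_rfc1918(sample))
```

Note that a self-assigned address such as 36.1.2.3 or 1.2.3.4 is not private in this sense; it simply collides with someone else's registered block.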

Since these addresses cannot be routed across the Internet, you must use an address-translation gateway or a proxy server in conjunction with them. Otherwise, you will not be able to communicate with any hosts on the Internet.

An important note here is that since nobody can use these addresses on the Internet, it is safe to assume that anybody who is using these addresses is also utilizing an address-translation gateway of some sort. Therefore, while you will never see these addresses used as destinations on the Internet, if your organization establishes a private connection to a partner organization that is using the same block of addresses you are, your firms will not be able to communicate. The packets destined for your partner's network will appear to be local to your network, and will never be forwarded to the remote network.

There are many other problems that arise from using these addresses, making their general usage difficult for normal operations. For example, many application-layer protocols embed addressing information directly into the protocol stream, and in order for these protocols to work properly, the address-translation gateway has to be aware of their mechanics. In the preceding scenario, the gateway has to rewrite the private addresses (which are stored as application data inside the application protocol), rewrite the UDP/TCP and IP checksums, and possibly rewrite TCP sequence numbers as well. This is difficult to do even with simple and open protocols such as FTP, and extremely difficult with proprietary, encrypted, or dynamic applications (these are problems for many database protocols, network games, and voice-over-IP services, in particular). These gateways almost never work for all the applications in use at a specific location.

It is always best to use formally-assigned, Internet-legal addresses whenever possible, even if the hosts on your network do not necessarily require direct Internet access. In those cases in which your hosts are going through a firewall or application proxy of some sort, the use of Internet-legal addresses causes the least amount of maintenance trouble over time. If for some reason this is not possible, use one of the private address pools described in Table B-4. Do not use random, self-assigned addresses if you can possibly avoid it, as this will only cause connectivity problems for you and your users.

port

(n.) (1) An interface on a computer to which you can connect a device. Personal computers have various types of ports. Internally, there are several ports for connecting disk drives, display screens, and keyboards. Externally, personal computers have ports for connecting modems, printers, mice, and other peripheral devices. Almost all personal computers come with a serial RS-232C port or RS-422 port for connecting a modem or mouse and a parallel port for connecting a printer. On PCs, the parallel port is a Centronics interface that uses a 25-pin connector. SCSI (Small Computer System Interface) ports support higher transmission speeds than do conventional ports and enable you to attach up to seven devices to the same port.

(2) In TCP/IP and UDP networks, an endpoint to a logical connection. The port number identifies what type of port it is. For example, port 80 is used for HTTP traffic. Also see Well-Known TCP Port Numbers in the Quick Reference section of Webopedia.

(v.) To move a program from one type of computer to another. To port an application, you need to rewrite sections that are machine dependent, and then recompile the program on the new computer. Programs that can be ported easily are said to be portable.
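To illustrate the networking sense of "port" (definition 2 above), the sketch below works out which port number a URL refers to: the one written explicitly after the host name, or port 80 when an HTTP URL does not name one. It uses Python's standard urllib.parse module; DEFAULT_PORTS and effective_port are hypothetical names, and only the HTTP default mentioned in the text is included.

```python
from urllib.parse import urlsplit

# Port 80 is the HTTP port named in the definition above.
DEFAULT_PORTS = {"http": 80}


def effective_port(url):
    """Return the port a URL refers to, falling back to the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:        # an explicit port such as :8080
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.webopedia.com/"))                  # 80
print(effective_port("http://www.webopedia.com:8080/index.html"))   # 8080
```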
1) On computer and telecommunication devices, a port (noun) is generally a specific place for being physically connected to some other device, usually with a socket and plug of some kind. Typically, a personal computer is provided with one or more serial ports and usually one parallel port. The serial port supports sequential, one-bit-at-a-time transmission to peripheral devices such as scanners, and the parallel port supports multiple-bit-at-a-time transmission to devices such as printers.

2) In programming, a port (noun) is a "logical connection place" and specifically, using the Internet's protocol, TCP/IP, the way a client program specifies a particular server program on a computer in a network. Higher-level applications that use TCP/IP, such as the Web protocol, Hypertext Transfer Protocol, have ports with preassigned numbers. These are known as "well-known ports" that have been assigned by the Internet Assigned Numbers Authority (IANA). Other application processes are given port numbers dynamically for each connection. When a service (server program) is initially started, it is said to bind to its designated port number. When a client program wants to use that server, it must address its request to the designated port number. Port numbers are from 0 to 65535. Ports 0 to 1023 are reserved for use by certain privileged services. For the HTTP service, port 80 is defined as a default and it does not have to be specified in the Uniform Resource Locator (URL).

3) In programming, to port (verb) is to move an application program from an operating system environment in which it was developed to another operating system environment so it can be run there. Porting implies some work, but not nearly as much as redeveloping the program in the new environment. Open standard programming interfaces (such as those specified in X/Open's 1170 C language specification and Sun Microsystems' Java programming language) minimize or eliminate the work required to port a program.
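The default-port rule for URLs can be illustrated with a short Python sketch. It relies only on the standard urllib.parse module; the DEFAULT_PORTS table and the helper names effective_port and port_range are assumptions made for this example, and the range names follow the IANA convention of well-known (0-1023), registered (1024-49151), and dynamic (49152-65535) ports.

```python
from urllib.parse import urlsplit

# Illustrative table of scheme defaults (an assumption for this sketch; not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the explicit port in a URL, or the scheme's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


def port_range(port):
    """Classify a port number using the IANA range names."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers run from 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


print(effective_port("http://www.webopedia.com/"))       # 80
print(effective_port("http://www.webopedia.com:8080/"))  # 8080
print(port_range(80), port_range(8080), port_range(50000))
```

A URL without an explicit port falls back to the scheme's default, which is why port 80 rarely appears in Web addresses.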
Also see portability.

7.7. Name Resolution

Name resolution tries to convert some of the numerical address values into a human readable format. There are two possible ways to do these conversions, depending on the resolution to be done: calling system/network services (like the gethostbyaddr() function) and/or resolving from Wireshark-specific configuration files. For details about the configuration files Wireshark uses for name resolution and the like, see Appendix A, Files and Folders. The name resolution feature can be enabled individually for the protocol layers listed in the following sections.

7.7.1. Name Resolution drawbacks

Name resolution can be invaluable while working with Wireshark and may even save you hours of work. Unfortunately, it also has its drawbacks.

Name resolution will often fail. The name to be resolved might simply be unknown by the name servers asked, or the servers are just not available and the name is also not found in Wireshark's configuration files.

The resolved names are not stored in the capture file or anywhere else. So the resolved names might not be available if you open the capture file later or on a different machine. Each time you open a capture file it may look "slightly different", simply because you can't connect to the name server (which you could connect to before).

DNS may add additional packets to your capture file. You may see packets to/from your machine in your capture file, which are caused by name resolution network services of the machine Wireshark captures from. XXX - are there any other such packets than DNS ones?

Resolved DNS names are cached by Wireshark. This is required for acceptable performance. However, once a name is cached, Wireshark won't notice if the name resolution information changes while it is running (e.g. when a new DHCP lease takes effect).
XXX - is this true for all or only for DNS info?

Tip! The name resolution in the packet list is done while the list is filled. If a name could be resolved after a packet was added to the list, that former entry won't be changed. As the name resolution results are cached, you can use "View/Reload" to rebuild the packet list, this time with the correctly resolved names. However, this isn't possible while a capture is in progress.

7.7.2. Ethernet name resolution (MAC layer)

Try to resolve an Ethernet MAC address (e.g. 00:09:5b:01:02:03) to something more "human readable".

ARP name resolution (system service): Wireshark will ask the operating system to convert an Ethernet address to the corresponding IP address (e.g. 00:09:5b:01:02:03 -> 192.168.0.1).

Ethernet codes (ethers file): If the ARP name resolution failed, Wireshark tries to convert the Ethernet address to a known device name, which has been assigned by the user using an ethers file (e.g. 00:09:5b:01:02:03 -> homerouter).

Ethernet manufacturer codes (manuf file): If neither ARP nor ethers returns a result, Wireshark tries to convert the first 3 bytes of an Ethernet address to an abbreviated manufacturer name, which has been assigned by the IEEE (e.g. 00:09:5b:01:02:03 -> Netgear_01:02:03).

7.7.3. IP name resolution (network layer)

Try to resolve an IP address (e.g. 216.239.37.99) to something more "human readable".

DNS/concurrent DNS name resolution (system/library service): Wireshark will ask the operating system (or the concurrent DNS library) to convert an IP address to the hostname associated with it (e.g. 216.239.37.99 -> www.1.google.com). The DNS service uses synchronous calls to the DNS server, so Wireshark will stop responding until a response to a DNS request is returned. If possible, you might consider using the concurrent DNS library (which won't wait for a name server response).

Warning! Enabling network name resolution when your name server is unavailable may significantly slow down Wireshark while it waits for all of the name server requests to time out. Use concurrent DNS in that case.

DNS vs. concurrent DNS: here's a short comparison. Both mechanisms are used to convert an IP address to some human readable (domain) name. The usual DNS call gethostbyaddr() will try to convert the address to a name. To do this, it will first check the system's hosts file (e.g. /etc/hosts) for a matching entry. If that fails, it will ask the configured DNS server(s) about the name. So the real difference between DNS and concurrent DNS comes when the system has to wait for the DNS server to answer a name resolution request. The system call gethostbyaddr() will wait until a name is resolved or an error occurs. If the DNS server is unavailable, this might take quite a while (several seconds). The concurrent DNS service works a bit differently. It will also ask the DNS server, but it won't wait for the answer. It will just return to Wireshark in a very short amount of time. The actual (and the following) address fields won't show the resolved name until the DNS server returns an answer. As mentioned above, the values get cached, so you can use View/Reload to "update" these fields to show the resolved values.

hosts name resolution (hosts file): If DNS name resolution failed, Wireshark will try to convert an IP address to the hostname associated with it, using a hosts file provided by the user (e.g. 216.239.37.99 -> www.google.com).

7.7.4. IPX name resolution (network layer)

ipxnet name resolution (ipxnets file): XXX - add ipxnets name resolution explanation.

7.7.5. TCP/UDP port name resolution (transport layer)

Try to resolve a TCP/UDP port (e.g. 80) to something more "human readable".

TCP/UDP port conversion (system service): Wireshark will ask the operating system to convert a TCP or UDP port to its well known name (e.g. 80 -> http). XXX - mention the role of the /etc/services file (but don't forget the files and folders section)!
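To make the port-to-name step concrete, here is a minimal Python sketch of a lookup table built from text in the conventional services-file layout (name, port/protocol, optional aliases, "#" comments). It is not Wireshark's implementation: the sample data is held in memory rather than read from /etc/services, and the names SAMPLE_SERVICES, parse_services, and port_name are invented for this example.

```python
SAMPLE_SERVICES = """\
# name   port/protocol   aliases
ftp      21/tcp
ssh      22/tcp
domain   53/tcp
domain   53/udp
http     80/tcp          www     # WorldWideWeb HTTP
"""


def parse_services(text):
    """Build a {(port, protocol): name} table from services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, port_proto = line.split()[:2]
        port, proto = port_proto.split("/")
        table.setdefault((int(port), proto), name)  # first entry wins
    return table


def port_name(table, port, proto):
    """Return the service name, or the number as text when it is unknown."""
    return table.get((port, proto), str(port))


services = parse_services(SAMPLE_SERVICES)
print(port_name(services, 80, "tcp"))     # http
print(port_name(services, 53, "udp"))     # domain
print(port_name(services, 12345, "tcp"))  # 12345
```

Falling back to the bare number mirrors the behaviour described above: when no name is known, the numerical value is simply shown as it is.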
In computer languages

Expressions in computer languages can contain identifiers. The semantics of such expressions depend on the entities that the identifiers refer to. The algorithm that determines what an identifier in a given context refers to is part of the language definition.

The complexity of these algorithms is influenced by the sophistication of the language. For example, name resolution in assembly language usually involves only a single simple table lookup, while name resolution in C++ is extremely complicated as it involves:

namespaces, which make it possible for an identifier to have different meanings depending on its associated namespace;
scopes, which make it possible for an identifier to have different meanings at different scope levels, and which involve various scope overriding and hiding rules. At the most basic level name resolution usually attempts to find the binding in the smallest enclosing scope, so that for example local variables supersede global variables; this is called shadowing;
visibility rules, which determine whether identifiers from specific namespaces or scopes are visible from the current context;
overloading, which makes it possible for an identifier to have different meanings depending on how it is used, even in a single namespace or scope;
accessibility, which determines whether identifiers from an otherwise visible scope are actually accessible and participate in the name resolution process.
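The "smallest enclosing scope" rule can be seen in a few lines of Python. This is only a sketch of shadowing in one language; the variable and function names are arbitrary, and C++ layers the additional rules listed above on top of this basic behaviour.

```python
count = "global"          # binding in the module (global) scope


def outer():
    count = "enclosing"   # shadows the global binding inside outer()

    def inner():
        count = "local"   # shadows the enclosing binding inside inner()
        return count

    return inner(), count


def reads_global():
    return count          # no local binding, so the global one is found


print(outer())            # ('local', 'enclosing')
print(reads_global())     # global
```

Each use of count resolves to the nearest binding, and the outer bindings are left untouched by the inner ones.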
Static versus dynamic

In programming languages, name resolution can be performed either at compile time or at runtime. The former is called static name resolution, the latter is called dynamic name resolution.

A somewhat common misconception is that dynamic typing implies dynamic name resolution. However, static typing does imply static name resolution. For example, Erlang is dynamically typed but has static name resolution.

Static name resolution catches, at compile time, use of variables that are not in scope, preventing programmer errors. Languages with dynamic scope resolution sacrifice this safety for more flexibility; they can typically set and get variables in the same scope at runtime. For example, in Python:

    locals()['num'] = 999  # at module level, equivalent to: num = 999
    noun = "troubles"
    noun2 = "hound"

    # which variables to use are decided at runtime
    print("{num} {noun} and a {noun2} ain't one".format(**locals()))
    # outputs: 999 troubles and a hound ain't one

However, relying on dynamic name resolution in code is discouraged by the Python community.[1][2] The feature also may be removed in a later version of Python.[3]

Examples of languages that use static name resolution include C, C++, E, Erlang, Haskell, Java, Pascal, Scheme, and Smalltalk. Examples of languages that use dynamic name resolution include some Lisp dialects, Perl, PHP, Python, REBOL, and Tcl.

In computer networking, an Internet socket or network socket is an endpoint of a bidirectional interprocess communication flow across an Internet Protocol-based computer network, such as the Internet.

The term Internet sockets is also used as a name for an application programming interface (API) for the TCP/IP protocol stack, usually provided by the operating system. Internet sockets constitute a mechanism for delivering incoming data packets to the appropriate application process or thread, based on a combination of local and remote IP addresses and port numbers. Each socket is mapped by the operating system to a communicating application process or thread.

A socket address is the combination of an IP address (the location of the computer) and a port (which is mapped to the application program process) into a single identity, much like one end of a telephone connection is the combination of a phone number and a particular extension.
Overview

An Internet socket is characterized by a unique combination of the following:

Local socket address: Local IP address and port number.
Remote socket address: Only for established TCP sockets. As discussed in the Client-Server section below, this is necessary since a TCP server may serve several clients concurrently. The server creates one socket for each client, and these sockets share the same local socket address.
Protocol: A transport protocol (e.g., TCP, UDP), raw IP, or others. TCP port 53 and UDP port 53 are consequently different, distinct sockets.

Within the operating system and the application that created a socket, the socket is referred to by a unique integer number called socket identifier or socket number. The operating system forwards the payload of incoming IP packets to the corresponding application by extracting the socket address information from the IP and transport protocol headers and stripping the headers from the application data.

In IETF Request for Comments, Internet Standards, in many textbooks, as well as in this article, the term socket refers to an entity that is uniquely identified by the socket number. In other textbooks[1], the socket term refers to a local socket address, i.e. a "combination of an IP address and a port number". In the original definition of socket given in RFC 147, as it was related to the ARPA network in 1971, "the socket is specified as a 32 bit number with even sockets identifying receiving sockets and odd sockets identifying sending sockets." Today, however, socket communications are bidirectional.

On Unix-like and Microsoft Windows based operating systems the netstat command line tool may be used to list all currently established sockets and related information.
Socket types

There are several Internet socket types available:

Datagram sockets, also known as connectionless sockets, which use User Datagram Protocol (UDP).
Stream sockets, also known as connection-oriented sockets, which use Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP).
Raw sockets (or Raw IP sockets), typically available in routers and other network equipment. Here the transport layer is bypassed, and the packet headers are not stripped off, but are accessible to the application. Application examples are Internet Control Message Protocol (ICMP, best known for the Ping suboperation), Internet Group Management Protocol (IGMP), and Open Shortest Path First (OSPF).[2]

There are also non-Internet sockets, implemented over other transport protocols, such as Systems Network Architecture (SNA).[3] See also Unix domain sockets (UDS), for internal inter-process communication.
Socket states and the client-server model

Computer processes that provide application services are called servers, and create sockets on start up that are in listening state. These sockets are waiting for initiatives from client programs. For a listening TCP socket, the remote address presented by netstat may be denoted 0.0.0.0 and the remote port number 0.

A TCP server may serve several clients concurrently, by creating a child process for each client and establishing a TCP connection between the child process and the client. Unique dedicated sockets are created for each connection. These are in established state when a socket-to-socket virtual connection or virtual circuit (VC), also known as a TCP session, is established with the remote socket, providing a duplex byte stream. Other possible TCP socket states presented by the netstat command are Syn-sent, Syn-Recv, Fin-wait1, Fin-wait2, Time-wait, Close-wait and Closed, which relate to various start up and shutdown steps.[4]

A server may create several concurrently established TCP sockets with the same local port number and local IP address, each mapped to its own server-child process, serving its own client process. They are treated as different sockets by the operating system, since the remote socket addresses (the client IP address and/or port number) are different; i.e. since they have different socket pair tuples (see below).

A UDP socket cannot be in an established state, since UDP is connectionless. Therefore, netstat does not show the state of a UDP socket. A UDP server does not create new child processes for every concurrently served client, but the same process handles incoming data packets from all remote clients sequentially through the same socket. This implies that UDP sockets are not identified by the remote address, but only by the local address, although each message has an associated remote address.
Socket pairs

Communicating local and remote sockets are called socket pairs. Each socket pair is described by a unique 4-tuple consisting of source and destination IP addresses and port numbers, i.e. of local and remote socket addresses.[5][6] As seen in the discussion above, in the TCP case, each unique socket pair 4-tuple is assigned a socket number, while in the UDP case, each unique local socket address is assigned a socket number.
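The difference between the TCP and UDP cases can be sketched as a lookup table in Python. This is a toy model, not how any real operating system stores its sockets; the SocketTable class, its method names, and the starting socket number are assumptions for illustration, and the IP addresses come from the ranges reserved for documentation.

```python
class SocketTable:
    """Toy demultiplexer; not how any real operating system stores sockets."""

    def __init__(self):
        self._next_id = 3          # arbitrary starting socket number
        self._table = {}

    def _add(self, key):
        if key in self._table:
            raise ValueError("address already in use")
        self._table[key] = self._next_id
        self._next_id += 1
        return self._table[key]

    def open_tcp(self, local, remote):
        # Established TCP sockets are keyed by the full socket pair.
        return self._add(("tcp", local, remote))

    def open_udp(self, local):
        # UDP sockets are keyed by the local socket address only.
        return self._add(("udp", local))

    def lookup(self, proto, local, remote):
        key = (proto, local, remote) if proto == "tcp" else (proto, local)
        return self._table.get(key)


table = SocketTable()
server = ("192.0.2.10", 53)
a = table.open_tcp(server, ("198.51.100.7", 40001))
b = table.open_tcp(server, ("198.51.100.8", 40001))
u = table.open_udp(server)

print(a, b, u)  # 3 4 5
print(table.lookup("udp", server, ("198.51.100.7", 40001)) ==
      table.lookup("udp", server, ("203.0.113.9", 5353)))  # True
```

Two TCP clients of the same server port map to two socket numbers, while every UDP sender reaches the single socket bound to the local address. TCP port 53 and UDP port 53 coexist because the protocol is part of the key.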
Implementation issues
[Figure: TCP socket flow diagram.]

Sockets are usually implemented by an API library such as Berkeley sockets, first introduced in 1983. Most implementations are based on Berkeley sockets, for example Winsock introduced in 1991. Other socket API implementations exist, such as the STREAMS-based Transport Layer Interface (TLI).

Development of application programs that utilize this API is called socket programming or network programming.

These are examples of functions or methods typically provided by the API library[7]:

socket() creates a new socket of a certain socket type, identified by an integer number, and allocates system resources to it.
bind() is typically used on the server side, and associates a socket with a socket address structure, i.e. a specified local port number and IP address.
listen() is used on the server side, and causes a bound TCP socket to enter listening state.
connect() is used on the client side, and assigns a free local port number to a socket. In case of a TCP socket, it causes an attempt to establish a new TCP connection.
accept() is used on the server side. It accepts a received incoming attempt to create a new TCP connection from the remote client, and creates a new socket associated with the socket address pair of this connection.
send() and recv(), or write() and read(), or recvfrom() and sendto(), are used for sending and receiving data to/from a remote socket.
close() causes the system to release resources allocated to a socket. In case of TCP, the connection is terminated.
gethostbyname() and gethostbyaddr() are used to resolve host names and addresses.
select() is used to prune a provided list of sockets for those that are ready to read, ready to write or have errors.
poll() is used to check on the state of a socket. The socket can be tested to see if it can be written to, read from or has errors.
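The calls listed above can be followed in a minimal Python sketch that runs a one-shot TCP echo exchange over the loopback interface. It uses Python's standard socket module, which wraps the Berkeley-style calls; the helper name serve_once, the 5-second timeouts, and the use of a thread instead of a child process are choices made for this example only.

```python
import socket
import threading


def serve_once(listener, result):
    """Accept one client, record the new socket's addresses, echo one message."""
    conn, peer = listener.accept()           # accept(): new socket for this connection
    with conn:
        result["conn_local"] = conn.getsockname()
        result["peer"] = peer
        conn.sendall(conn.recv(1024))        # echo back what was received


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
listener.settimeout(5)                       # avoid waiting forever in this demo
listener.bind(("127.0.0.1", 0))              # bind(); port 0 lets the OS choose
listener.listen(1)                           # listen()
server_addr = listener.getsockname()

result = {}
worker = threading.Thread(target=serve_once, args=(listener, result))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5)
client.connect(server_addr)                  # connect(); OS assigns a local port
client.sendall(b"hello")                     # send()
reply = client.recv(1024)                    # recv()
client_addr = client.getsockname()
client.close()                               # close()

worker.join()
listener.close()

print(reply)
print(result["conn_local"] == server_addr)   # accepted socket shares the local address
print(result["peer"] == client_addr)         # and its remote address is the client
```

The socket returned by accept() has the same local socket address as the listening socket and differs only in its remote address, which is what lets one server port carry many connections.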
Sockets in network equipment

The socket is primarily a concept used in the Transport Layer of the Internet model. Networking equipment such as routers and switches do not require implementations of the Transport Layer, as they operate on the Link Layer level (switches) or at the Internet Layer (routers). However, stateful network firewalls, network address translators, and proxy servers keep track of active socket pairs. Also, in fair queuing, layer 3 switching and quality of service (QoS) support in routers, packet flows may be identified by extracting information about the socket pairs. Raw sockets are typically available in network equipment, and used for routing protocols such as IGMP and OSPF, and in Internet Control Message Protocol (ICMP).

A communications protocol is a formal description of digital message formats and the rules for exchanging those messages in or between computing systems and in telecommunications. Protocols may include signaling, authentication and error detection and correction capabilities. A protocol describes the syntax, semantics, and synchronization of communication and may be implemented in hardware or software, or both.
Introduction

While there is no generally accepted formal definition of "protocol" in computer science, an informal definition, based on the previous, could be "a description of a set of procedures to be followed when communicating". In computer science the word algorithm is a synonym for the word procedure, so a protocol is to communications what an algorithm is to mathematics.

Communicating systems use well-defined formats for exchanging messages. Each message has an exact meaning intended to provoke a defined response of the receiver. A protocol therefore describes the syntax, semantics, and synchronization of communication. A programming language describes the same for computations, so there is a close analogy between protocols and programming languages: protocols are to communications what programming languages are to computations.[1] (A less technical reader might appreciate this similar analogy: protocols are to communications what grammar is to writing.)

The communications protocols in use on the Internet are designed to function in very complex and diverse settings. To ease design, communications protocols are structured using a layering scheme as a basis. Instead of using a single universal protocol to handle all transmission tasks, a set of cooperating protocols fitting the layering scheme is used.[2]

Figure 2. The TCP/IP model or Internet layering scheme and its relation to some common protocols.

The layering scheme in use on the Internet is called the TCP/IP model. The actual protocols are collectively called the Internet protocol suite. The group responsible for this design is called the Internet Engineering Task Force (IETF). The number of layers of a layering scheme and the way the layers are defined can have a drastic impact on the protocols involved.
This is where the analogies come into play for the TCP/IP model, because the designers of TCP/IP employed the same techniques used to conquer the complexity of programming language compilers (design by analogy) in the implementation of its protocols and its layering scheme.[3] Communications protocols have to be agreed upon by the parties involved. To reach agreement a protocol is developed into a technical standard.
Communicating systems

The information exchanged between devices on a network or other communications medium is governed by rules or conventions that can be set out in a technical specification called a communication protocol standard. The nature of the communication, the actual data exchanged and any state-dependent behaviors are defined by the specification.

In digital computing systems, the rules can be expressed by algorithms and data structures. Expressing the algorithms in a portable programming language, makes the protocol software operating system independent. Operating systems are usually conceived of as consisting of a set of cooperating processes that manipulate a shared store (on the system itself) to communicate with each other. This communication is governed by well-understood protocols. These protocols can be embedded in the process code itself as small additional code fragments.[4][5] In contrast, communicating systems have to communicate with each other using shared transmission media, because there is no common memory. Transmission is not necessarily reliable and can involve different hardware and operating systems on different systems. To implement a networking protocol, the protocol software modules are interfaced with a framework implemented on the machine's operating system. This framework implements the networking functionality of the operating system.[6] The best known frameworks are the TCP/IP model and the OSI model. At the time the Internet was developed, layering had proven to be a successful design approach for both compiler and operating system design and, given the similarities between programming languages and communication protocols, layering was applied to the protocols as well.[7] This gave rise to the concept of layered protocols which nowadays forms the basis of protocol design.[8] Systems typically do not use a single protocol to handle a transmission. Instead they use a set of cooperating protocols, sometimes called a protocol family or protocol suite.[9] Some of the best known protocol suites include: IPX/SPX, X.25, AX.25, AppleTalk and TCP/IP. The protocols can be arranged based on functionality in groups, for instance there is a group of transport protocols. 
The functionalities are mapped onto the layers, each layer solving a distinct class of problems relating to, for instance: application-, transport-, internet- and network interface-functions.[10] To transmit a message, a protocol has to be selected from each layer, so some sort of multiplexing/demultiplexing takes place. The selection of the next protocol is accomplished by extending the message with a protocol selector for each layer.[11]
Basic requirements of protocols

Messages are sent and received on communicating systems to establish communications. Protocols should therefore specify rules governing the transmission. In general, much of the following should be addressed:[12]

Data formats for data exchange. Digital message bitstrings are exchanged. The bitstrings are divided in fields and each field carries information relevant to the protocol. Conceptually the bitstring is divided into two parts called the header area and the data area. The actual message is stored in the data area, so the header area contains the fields with more relevance to the protocol. The transmissions are limited in size, because the number of transmission errors is proportional to the size of the bitstrings being sent. Bitstrings longer than the maximum transmission unit (MTU) are divided in pieces of appropriate size. Each piece has almost the same header area contents, because only some fields are dependent on the contents of the data area (notably CRC fields, containing checksums that are calculated from the data area contents).[13]

Address formats for data exchange. Addresses are used to identify both the sender and the intended receiver(s). The addresses are stored in the header area of the bitstrings, allowing the receivers to determine whether the bitstrings are intended for themselves and should be processed or should be discarded. A connection between a sender and a receiver can be identified using an address pair (sender address, receiver address). Usually some address values have special meanings. An all-1s address could be taken to mean all stations on the network, so sending to this address would result in a broadcast on the local network. Likewise, an all-0s address could be taken to mean the sending station itself (as a synonym of the actual address). The rules describing the meanings of the address value are collectively called an addressing scheme.[14]

Address mapping. Sometimes protocols need to map addresses of one scheme on addresses of another scheme, for instance to translate a logical IP address specified by the application to a hardware address. This is referred to as address mapping.[15]

Routing. When systems are not directly connected, intermediary systems along the route to the intended receiver(s) need to forward messages on behalf of the sender. Determining the route the message should take is called routing. On the Internet, the networks are connected using routers. This way of connecting networks is called internetworking.

Detection of transmission errors is necessary on networks which cannot guarantee error-free operation. In a common approach, CRCs of the data area are added to the end of packets, making it possible for the receiver to detect differences caused by errors. The receiver rejects the packets on CRC differences and arranges somehow for retransmission.[16]

Acknowledgements of correct reception of packets by the receiver may be used to prevent the sender from retransmitting the packets. Some protocols, notably datagram protocols like the Internet Protocol (IP), do not acknowledge.[17]

Loss of information - timeouts and retries. Sometimes packets are lost on the network or suffer from long delays. To cope with this, a sender expects an acknowledgement of correct reception from the receiver within a certain amount of time. On timeouts, the packet is retransmitted. In case of a broken link the retransmission has no effect, so the number of retransmissions is limited. Exceeding the retry limit is considered an error.[18]

Direction of information flow needs to be addressed if transmissions can only occur in one direction at a time, as on half-duplex links. This is known as Media Access Control. Arrangements have to be made to accommodate the case when two parties want to gain control at the same time.[19]

Sequence control. We have seen that long bitstrings are divided in pieces, which are sent on the network individually. The pieces may get 'lost' on the network or arrive out of sequence, because the pieces can take different routes to their destination on some types of networks. Pieces may be needlessly retransmitted, resulting in duplicate pieces. By sequencing the pieces at the sender, the receiver can determine what was lost or duplicated, ask for retransmissions if necessary, and reassemble the pieces.[20]

Flow control is needed when the sender transmits faster than the receiver or intermediate network equipment can process the transmissions. Flow control can be implemented by messaging from receiver to sender.[21]
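Several of these requirements (header and data areas, division into pieces, error detection, and sequence control) can be combined in one small Python sketch. The header layout (a 16-bit sequence number followed by a CRC-32 of the data area) and the function names are hypothetical and do not correspond to any real protocol; zlib.crc32 stands in for whatever checksum a real protocol would define.

```python
import random
import struct
import zlib

HEADER = struct.Struct("!HI")   # hypothetical header: sequence number, CRC-32


def fragment(message, size):
    """Split message into pieces of at most `size` bytes, each with a header."""
    pieces = []
    for seq, start in enumerate(range(0, len(message), size)):
        data = message[start:start + size]
        pieces.append(HEADER.pack(seq, zlib.crc32(data)) + data)
    return pieces


def reassemble(pieces):
    """Drop corrupted and duplicate pieces, then join the rest in sequence order."""
    good = {}
    for piece in pieces:
        seq, crc = HEADER.unpack_from(piece)
        data = piece[HEADER.size:]
        if zlib.crc32(data) != crc:
            continue                # error detected: reject the piece
        good.setdefault(seq, data)  # ignore duplicates
    if not good:
        return b"", []
    missing = [s for s in range(max(good) + 1) if s not in good]
    return b"".join(good[s] for s in sorted(good)), missing


message = b"protocols are to communications what algorithms are to computations"
pieces = fragment(message, 16)
pieces.append(pieces[1])                      # a needless retransmission
random.Random(0).shuffle(pieces)              # out-of-order arrival
data, missing = reassemble(pieces)
print(data == message, missing)               # True []
```

The receiver here only reports which sequence numbers are missing; a real protocol would also need a way to learn the total number of pieces and to request retransmission, as described above.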

Getting the data across a network is only part of the problem for a protocol. The data received has to be evaluated in the context of the progress of the conversation, so a protocol has to specify rules describing the context and explaining whether the (form of the) data fits this context or not. These kinds of rules are said to express the syntax of the communications. Other rules determine whether the data is meaningful for the context in which the exchange takes place. These kinds of rules are said to express the semantics of the communications. Both intuitive descriptions and more formal specifications in the form of finite state machine models are used to describe the expected interactions of the protocol.[22] Formal ways for describing the syntax of the communications are Abstract Syntax Notation One (an ISO standard) or Augmented Backus-Naur form (an IETF standard).
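A finite state machine model of this kind can be written down as a transition table. The Python sketch below uses an invented four-message protocol (OPEN, ACK, DATA, CLOSE); the state names, message names, and the ProtocolError class are assumptions for illustration and are not taken from any standard.

```python
# Hypothetical protocol: (current state, received message) -> next state.
TRANSITIONS = {
    ("CLOSED", "OPEN"): "OPENING",
    ("OPENING", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "DATA"): "ESTABLISHED",
    ("ESTABLISHED", "CLOSE"): "CLOSED",
}


class ProtocolError(Exception):
    """Raised when a message does not fit the current state of the conversation."""


def run(messages, state="CLOSED"):
    """Feed a sequence of messages through the state machine; return the final state."""
    for message in messages:
        try:
            state = TRANSITIONS[(state, message)]
        except KeyError:
            raise ProtocolError(f"{message!r} is not valid in state {state}") from None
    return state


print(run(["OPEN", "ACK", "DATA", "DATA", "CLOSE"]))  # CLOSED

try:
    run(["OPEN", "DATA"])
except ProtocolError as exc:
    print("rejected:", exc)   # rejected: 'DATA' is not valid in state OPENING
```

A message that is well formed on its own (DATA) is still rejected when it arrives at the wrong point in the conversation, which is the context check the paragraph above describes.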
An example protocol

To get a feel for protocols and what needs to be described, the Ethernet protocol is explained in some detail. To connect to a WiFi LAN, a computer has to be equipped with a wireless network interface controller. The combination of computer and interface controller is called a station. All stations share a single radio frequency communication channel. Transmissions on this channel are received by all stations within range. The hardware provides no indication to the sender about whether the transmission was delivered and is therefore called a best-effort delivery mechanism. A carrier wave is used to transmit the data in packets, referred to as Ethernet frames. Each station is constantly tuned in on the channel, so each transmission is noticed. In order to determine whether the channel is free, the carrier wave can be sensed by the hardware; if it is not present, the channel is free for transmission. WiFi and Ethernet interfaces are assigned unique 48-bit numbers, known as MAC addresses, which are used to address the stations. Ethernet establishes link level connections, which can be defined using both the destination and source addresses. On reception of a transmission, the receiver uses the destination address to determine whether the transmission is relevant to the station or should be ignored. An Ethertype field in each frame is used by the operating system on the receiving station to select the appropriate protocol module (e.g. the Internet Protocol module). Ethernet frames are said to be self-identifying, because of the frame type. Self-identifying frames make it possible to intermix multiple protocols on the same physical network and allow a single computer to use multiple protocols together.[23] The details given are from a networking point of view.
Data transmission details like the electrical properties of the cable and the electrical circuits used for signal modulation are outside the scope of this article, but should obviously be described by a communication protocol standard as well.
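The addressing and self-identification rules described above can be made concrete with a minimal sketch. It assumes a simplified frame consisting only of a 14-byte header (destination address, source address, EtherType) followed by the payload; the preamble and frame check sequence of real frames are omitted, and the function names are hypothetical. The EtherType values 0x0800 (IPv4) and 0x0806 (ARP) are the standard assignments.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
BROADCAST = bytes([0xFF] * 6)  # all-ones address: every station accepts it


def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Prefix the payload with destination, source and EtherType (14 bytes)."""
    return struct.pack("!6s6sH", dst, src, ethertype) + payload


def receive(frame: bytes, my_mac: bytes):
    """Return (protocol name, payload), or None if the frame is not for us."""
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if dst != my_mac and dst != BROADCAST:
        return None  # every station hears the frame, but this one ignores it
    # The EtherType makes the frame self-identifying: it selects the module.
    modules = {ETHERTYPE_IPV4: "ipv4", ETHERTYPE_ARP: "arp"}
    return modules.get(ethertype, "unknown"), frame[14:]
```

A station would call receive() on every frame it senses on the channel; only frames carrying its own address or the broadcast address are handed to a protocol module.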
Protocols and programming languages

The following two analogies between communications and computations are used throughout this text: Protocols are to communications what algorithms are to computations.[24] Protocols are to communications what programming languages are to computations.[1] The analogies have important consequences for both the design and the development of protocols. The word protocol has different meanings in the two analogies: 1. a variant of (or something like) an algorithm. 2. a variant of (or something like) a programming language. Arguably, it would make more sense to speak of a protocolling language (compare algorithmic language) when using the word protocol in its second meaning, but this is not standard practice in a networking context. To clarify this further, one has to consider that algorithms, programs and protocols are just different ways of describing the expected behaviour of interacting objects. Assuming the use of pseudocode in the case of both algorithms and protocols, all of the representations can be referred to as applications of programming languages. In other words, protocols are applications of protocolling languages. A familiar example of a protocolling language is the HTML language used to describe web pages, which are the actual web protocols. In programming languages the association of an identifier with a value is termed a definition. Program text is structured using block constructs, and definitions can be local to a block. The localized association of an identifier with a value established by a definition is termed a binding, and the region of program text in which a binding is effective is known as its scope.[25] The computational state is kept using two components: the environment, used as a record of identifier bindings, and the store, which is used as a record of the effects of assignments.[26] In communications, message values are transferred using transmission media.
By analogy, the equivalent of a store would be a collection of transmission media, instead of a collection of memory locations. A valid assignment in a protocol (as an analog of a programming language) could be ethernet:='message', meaning a message is to be broadcast on the local Ethernet. On a transmission medium there can be many receivers. For instance, a MAC address identifies an Ethernet network card on the transmission medium (the 'ether'). In our imaginary protocol, the assignment ethernet[mac-address]:=message value could therefore make sense.[27] By extending the assignment statement of an existing programming language with the semantics described, a protocolling language could easily be imagined. Operating systems provide reliable communication and synchronization facilities for communicating objects confined to the same system by means of system libraries. A programmer using a general-purpose programming language (like C or Ada) can use the routines in the libraries to implement a protocol, instead of using a dedicated protocolling language.
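The imagined assignment semantics can be mimicked in an ordinary programming language. The following is only an illustrative sketch: the class Medium and its methods are hypothetical names, and the "medium" is simulated with in-memory inboxes rather than real network hardware.

```python
class Medium:
    """A simulated shared transmission medium acting as a 'store'."""

    def __init__(self):
        self.stations = {}  # address -> list of messages received so far

    def attach(self, address):
        """Connect a station (identified by its address) to the medium."""
        self.stations[address] = []

    def broadcast(self, message):
        """Analog of  ethernet := message  (every station receives it)."""
        for inbox in self.stations.values():
            inbox.append(message)

    def __setitem__(self, address, message):
        """Analog of  ethernet[mac-address] := message  (one receiver)."""
        self.stations[address].append(message)
```

With ethernet = Medium() and two attached stations, the statement ethernet["02:00:00:00:00:02"] = "hello" reads much like the assignment in the text, while ethernet.broadcast("hello") plays the role of the unindexed assignment.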
Language layering

Early compilers translated high-level language sources to machine code. Instead of translating directly into machine code, some compilers translate to a machine-independent intermediate code in order to enhance portability of the compiler and minimize design efforts, often at the expense of execution speed. The intermediate language defines a virtual machine that can execute all programs written in the intermediate language (a machine is defined by its language and vice versa).[28] The intermediate code instructions are translated into equivalent machine code sequences by a code generator to create executable code. It is also possible to skip the generation of machine code by actually implementing the virtual machine in machine code. This virtual machine implementation is called an interpreter, because it reads in the intermediate code instructions one by one and, after each read, directly executes the equivalent machine code sequences (the interpretation) of the instruction just read.[29]
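As a rough illustration of an interpreter in this sense, the sketch below executes a tiny, made-up intermediate language for a stack machine. The opcodes PUSH, ADD and MUL are assumptions chosen for the example and do not correspond to any particular real intermediate code.

```python
def interpret(program):
    """Read intermediate instructions one by one and execute each directly."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Intermediate code a compiler might emit for the expression (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = interpret(program)  # 20
```

The function interpret() is itself the virtual machine: the intermediate language defines what the machine can do, and the machine defines what programs in that language mean.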

Translation of a programming language into intermediate code and of intermediate code into machine code to generate an executable is an example of layering of languages. Because languages define machines, this can be viewed as a layering of virtual machines as well. Using machines (or mechanisms) to build more complex machines, which in their turn are used for even more complex machines, and so on, results in what is referred to as a multi-level machine. A modern computer system is usually a six-level machine with the following levels (also called layers) present: the digital logic level, the microprogramming level, the conventional machine level, the operating system level, the assembly language level and the problem-oriented programming language level.[30] The Internet can be viewed as a layered machine as well: a (best-effort) hardware delivery mechanism is used to build a connectionless packet delivery system, on top of which a reliable transport system is built, which in turn is used to build an application. The delivery system is defined by the IP protocol and the transport system by the TCP protocol.[31]
Universal protocols

Despite their numbers, networking protocols show little variety, because all networking protocols use the same underlying principles and concepts in the same way. So, the use of a general-purpose programming language would yield a large number of applications differing only in the details.[30] A suitably defined (dedicated) protocolling language would therefore have little syntax, perhaps just enough to specify some parameters or optional modes of operation, because its virtual machine would have incorporated all possible principles and concepts, making the virtual machine itself a universal protocol. The protocolling language would have some syntax and a lot of semantics describing this universal protocol and would therefore in effect be a protocol, hardly differing from this universal networking protocol. In this (networking) context a protocol is a language. The notion of a universal networking protocol provides a rationale for standardization of networking protocols; assuming the existence of a universal networking protocol, development of protocol standards using a consensus model (the agreement of a group of experts) might be a viable way to coordinate protocol design efforts. Networking protocols operate in very heterogeneous environments consisting of very different network technologies and a (possibly) very rich set of applications, so a single universal protocol would be very hard to design and implement correctly. Instead, the IETF decided to reduce complexity by assuming a relatively simple network architecture allowing decomposition of the single universal networking protocol into two generic protocols, TCP and IP, and two classes of specific protocols, one dealing with the low-level network details and one dealing with the high-level details of common network applications (remote login, file transfer, email and web browsing). ISO chose a similar but more general path, allowing other network architectures, to standardize protocols.
Protocol design

Communicating systems operate in parallel. The programming tools and techniques for dealing with parallel processes are collectively called concurrent programming. Concurrent programming only deals with the synchronization of communication. The syntax and semantics of the communication governed by a low-level protocol usually have modest complexity, so they can be coded with relative ease. High-level protocols with relatively large complexity could, however, merit the implementation of language interpreters. An example of the latter case is the HTML language. Concurrent programming has traditionally been a topic in operating systems theory texts.[32] Formal verification seems indispensable, because concurrent programs are notorious for the hidden and sophisticated bugs they contain.[33] A mathematical approach to the study of concurrency and communication is referred to as Communicating Sequential Processes (CSP).[34] Concurrency can also be modelled using finite state machines such as Mealy and Moore machines. Mealy and Moore machines are in use as design tools in digital electronics systems, which we encounter in the form of hardware used in telecommunications or electronic devices in general.[35] This kind of design can be a bit of a challenge, to say the least, so it is important to keep things simple. For the Internet protocols, in particular and in retrospect, this meant a basis for protocol design was needed to allow decomposition of protocols into much simpler, cooperating protocols.
Concurrent programming

A concurrent program is an abstraction of cooperating processes suitable for formal treatment and study. The goal of the abstraction is to prove correctness of the program assuming the existence of some basic synchronization or data exchange mechanisms provided by the operating system (or other software) or hardware. The mechanisms are complex, so more convenient higher-level primitives are implemented with these mechanisms. The primitives are used to construct the concurrent program. The basic primitive for synchronization is the semaphore. All other primitives (locks, reentrant mutexes, monitors, message passing, tuple space) can be defined using semaphores. The semaphore is sufficiently elementary to be successfully studied by formal methods.[36] In order to synchronize or exchange data, the processes must communicate by means of either a shared memory, used to store data or access-restricted procedures, or the sending/receiving of signals (message passing) using a shared transmission medium. Most third-generation operating systems implement separate processes that use special instructions to ensure only one process can execute the restricted procedures. On distributed systems there is no common central memory, so the communications are always by means of message passing. In this case the processes simply have to wait for each other (synchronization by rendezvous) before exchanging data.[4] Conceptually, the concurrent program consists of several sequential processes whose execution sequences are interleaved. The execution sequences are divided into sections. A section manipulating shared resources is called a critical section. The interleaving scheme makes no timing assumptions other than that no process halts in its critical section and that ready processes are eventually scheduled for execution. For correct operation of the program, the critical sections of the processes need to be properly sequenced and synchronized.
This is achieved using small code fragments (protocols) at the start and the end of the critical sections. The code fragments determine whether the critical sections of two communicating processes should execute in parallel (rendezvous of processes) or should be executed sequentially (mutual exclusion of processes). A concurrent program is correct if it does not violate any safety property, such as mutual exclusion or rendezvous of critical sections, and does not violate any liveness property, for instance by suffering from deadlock or lockout. Correctness of the concurrent program can only be shown using a mathematical argument. Specifications of concurrent programs can be formulated using formal logics (like CSP), which make it possible to prove properties of the programs. Incorrectness can be shown using execution scenarios.[5] Mutual exclusion is extensively studied in the mutual exclusion problem. The rendezvous is studied in the producer-consumer problem, in which a producer process produces data if and only if the consumer process is ready to consume the data. Although both problems only involve two processes, their solutions require rather complex algorithms (Dekker's algorithm, Lamport's bakery algorithm). The readers-writers problem is a generalization of the mutual exclusion problem. The dining philosophers problem is a classical problem sufficiently difficult to expose many of the potential pitfalls of newly proposed primitives.[37]
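The entry and exit code fragments around a critical section can be sketched with a binary semaphore. The example below uses Python's standard threading module; the shared counter, the worker function and the thread counts are assumptions made purely for illustration, and running it demonstrates mutual exclusion in one scenario rather than proving correctness.

```python
import threading

counter = 0                      # the shared resource
mutex = threading.Semaphore(1)   # binary semaphore guarding the critical section


def worker(iterations):
    global counter
    for _ in range(iterations):
        mutex.acquire()          # entry protocol: wait until the section is free
        counter += 1             # critical section: manipulates shared state
        mutex.release()          # exit protocol: let another process enter


def run(n_threads=4, iterations=10_000):
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every increment happens while the semaphore is held, run() returns n_threads * iterations regardless of how the execution sequences are interleaved.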
A basis for protocol design

Systems do not use a single protocol to handle a transmission. Instead they use a set of cooperating protocols, sometimes called a protocol family or protocol suite.[9] To cooperate, the protocols have to communicate with each other, so some kind of conceptual framework is needed to make this communication possible. Also note that software is needed to implement both the 'xfer-mechanism' and a protocol (no protocol, no communication). In the literature there are numerous references to the analogies between computer communication and programming. By analogy we could say that the aforementioned 'xfer-mechanism' is comparable to a CPU: a 'xfer-mechanism' performs communications and a CPU performs computations. The 'framework' introduces something that allows the protocols to be designed independently of one another by providing separate execution environments for the protocols. Furthermore, it is repeatedly stated that protocols are to computer communication what programming languages are to computation.[38][39]
Protocol layering

Figure 3. Message flows using a protocol suite.
Protocol layering now forms the basis of protocol design.[8] It allows the decomposition of single, complex protocols into simpler, cooperating protocols, but it is also a functional decomposition, because each protocol belongs to a functional class, called a protocol layer.[2] The protocol layers each solve a distinct class of communications problems. The Internet protocol suite consists of the following layers: application, transport, internet and network interface functions.[10] Together, the layers make up a layering scheme or model. In computations, we have algorithms and data, and in communications, we have protocols and messages, so the analog of a data flow diagram would be some kind of message flow diagram.[24] To visualize protocol layering and protocol suites, a diagram of the message flows in and between two systems, A and B, is shown in figure 3. The systems both make use of the same protocol suite. The vertical flows (and protocols) are in-system and the horizontal message flows (and protocols) are between systems. The message flows are governed by rules and data formats specified by protocols. The blue lines therefore mark the boundaries of the (horizontal) protocol layers. The vertical protocols are not layered because they don't obey the protocol layering principle, which states that a layered protocol is designed so that layer n at the destination receives exactly the same object sent by layer n at the source. The horizontal protocols are layered protocols and all belong to the protocol suite. Layered protocols allow the protocol designer to concentrate on one layer at a time, without worrying about how other layers perform.[39] The vertical protocols need not be the same protocols on both systems, but they have to satisfy some minimal assumptions to ensure the protocol layering principle holds for the layered protocols.
This can be achieved using a technique called encapsulation.[40] Usually, a message or a stream of data is divided into small pieces, called messages or streams, packets, IP datagrams or network frames depending on the layer in which the pieces are to be transmitted. The pieces contain a header area and a data area. The data in the header area identifies the source and the destination on the network of the packet, the protocol, and other data meaningful to the protocol, like CRCs of the data to be sent, data length, and a timestamp.[41][42] The rule enforced by the vertical protocols is that the pieces for transmission are to be encapsulated in the data area of all lower protocols on the sending side and the reverse is to happen on the receiving side. The result is that at the lowest level the piece looks like this: 'Header1,Header2,Header3,data' and in the layer directly above it: 'Header2,Header3,data' and in the top layer: 'Header3,data', both on the sending and receiving side. This rule therefore ensures that the protocol layering principle holds and effectively virtualizes all but the lowest transmission lines, so for this reason some message flows are coloured red in figure 3. To ensure both sides use the same protocol, the pieces also carry data identifying the protocol in their header. The design of the protocol layering and the network (or Internet) architecture are interrelated, so one cannot be designed without the other.[43] Some of the more important features in this respect of the Internet architecture and the network services it provides are described next. The Internet offers universal interconnection, which means that any pair of computers connected to the Internet is allowed to communicate. Each computer is identified by an address on the Internet. All the interconnected physical networks appear to the user as a single large network.
This interconnection scheme is called an internetwork or internet.[44] Conceptually, an Internet address consists of a netid and a hostid. The netid identifies a network and the hostid identifies a host. The term host is misleading in that an individual computer can have multiple network interfaces, each having its own Internet address. An Internet address identifies a connection to the network, not an individual computer.[45] The netid is used by routers to decide where to send a packet.[46] Network technology independence is achieved using the low-level address resolution protocol (ARP), which is used to map Internet addresses to physical addresses. The mapping is called address resolution. This way physical addresses are only used by the protocols of the network interface layer.[47] The TCP/IP protocols can make use of almost any underlying communication technology.[48]
Figure 4. Message flows in the presence of a router
Physical networks are interconnected by routers. Routers forward packets between interconnected networks, making it possible for hosts to reach hosts on other physical networks. The message flows between two communicating systems A and B in the presence of a router R are illustrated in figure 4. Datagrams are passed from router to router until a router is reached that can deliver the datagram on a physically attached network (called direct delivery).[49] To decide whether a datagram is to be delivered directly or is to be sent to a router closer to the destination, a table called the IP routing table is consulted. The table consists of pairs of network ids and the paths to be taken to reach known networks. The path can be an indication that the datagram should be delivered directly or it can be the address of a router known to be closer to the destination.[50] A special entry can specify that a default router is chosen when there are no known paths.[51] All networks are treated equally. A LAN, a WAN or a point-to-point link between two computers are all considered as one network.[52] A connectionless packet delivery (or packet-switched) system (or service) is offered by the Internet, because it adapts well to different hardware, including best-effort delivery mechanisms like Ethernet. Connectionless delivery means that the messages or streams are divided into pieces that are multiplexed separately on the high-speed intermachine connections, allowing the connections to be used concurrently. Each piece carries information identifying the destination. The delivery of packets is said to be unreliable, because packets may be lost, duplicated, delayed or delivered out of order without notice to the sender or receiver.
Unreliability arises only when resources are exhausted or underlying networks fail.[53] The unreliable connectionless delivery system is defined by the Internet Protocol (IP). The protocol also specifies the routing function, which chooses a path over which data will be sent.[54] It is also possible to use TCP/IP protocols on connection-oriented systems. Connection-oriented systems build up virtual circuits (paths for exclusive use) between senders and receivers. Once a circuit is built up, the IP datagrams are sent as if they were data through the virtual circuits and forwarded (as data) to the IP protocol modules. This technique, called tunneling, can be used on X.25 networks and ATM networks.[55] A reliable stream transport service using the unreliable connectionless packet delivery service is defined by the Transmission Control Protocol (TCP). The services are layered as well, and the application programs residing in the layer above it, called the application services, can make use of TCP.[56] Programs wishing to interact with the packet delivery system itself can do so using the User Datagram Protocol (UDP).[57]
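The routing decision described above (direct delivery, a known router, or the default router) can be sketched as a table lookup. This is a simplified illustration, not how any particular IP implementation works: the table contents and addresses are made up, network ids are written as CIDR prefixes for convenience, and the first matching entry wins rather than the longest prefix.

```python
import ipaddress

DIRECT = "direct"  # marker: destination is on a physically attached network


def next_hop(routing_table, default_router, destination):
    """Return DIRECT, the address of a closer router, or the default router."""
    dest = ipaddress.ip_address(destination)
    for network, path in routing_table:
        if dest in ipaddress.ip_network(network):
            return path
    return default_router  # special entry used when no path is known


# Hypothetical table: (network id, path to take)
table = [
    ("192.168.1.0/24", DIRECT),        # attached network: deliver directly
    ("10.0.0.0/8", "192.168.1.254"),   # reachable through a known router
]
```

Note that the lookup only inspects the network part of the destination address, which is what keeps the table small: one entry per known network rather than one per host.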
Software layering

Having established the protocol layering and the protocols, the protocol designer can now proceed with the software design. The software has a layered organization and its relationship with protocol layering is visualized in figure 5.
Figure 5: Protocol and software layering
The software modules implementing the protocols are represented by cubes. The information flow between the modules is represented by arrows. The (top two horizontal) red arrows are virtual. The blue lines mark the layer boundaries. To send a message on system A, the top module interacts with the module directly below it and hands over the message to be encapsulated. This module reacts by encapsulating the message in its own data area, filling in its header data in accordance with the protocol it implements, and interacting with the module below it by handing over this newly formed message whenever appropriate. The bottom module directly interacts with the bottom module of system B, so the message is sent across. On the receiving system B the reverse happens, so ultimately (and assuming there were no transmission errors or protocol violations etc.) the message gets delivered in its original form to the top module of system B.[58] On protocol errors, a receiving module discards the piece it has received and reports the error condition back to the original source of the piece on the same layer by handing the error message down or, in the case of the bottom module, sending it across.[59] The division of the message or stream of data into pieces and the subsequent reassembly are handled in the layer that introduced the division/reassembly. The reassembly is done at the destination (i.e. not on any intermediate routers).[60] TCP/IP software is organized in four layers.[61]
Application layer. At the highest layer, the services available across a TCP/IP internet are accessed by application programs. The application chooses the style of transport to be used, which can be a sequence of individual messages or a continuous stream of bytes. The application program passes data to the transport layer for delivery.
Transport layer. The transport layer provides communication from one application to another. The transport layer may regulate flow of information and provide reliable transport, ensuring that data arrives without error and in sequence. To do so, the receiving side sends back acknowledgments and the sending side retransmits lost pieces, called packets. The stream of data is divided into packets by the module and each packet is passed along with a destination address to the next layer for transmission. The layer must accept data from many applications concurrently and therefore also includes codes in the packet header to identify the sending and receiving application program.
Internet layer. The Internet layer handles the communication between machines. Packets to be sent are accepted from the transport layer along with an identification of the receiving machine. The packets are encapsulated in IP datagrams and the datagram headers are filled. A routing algorithm is used to determine whether the datagram should be delivered directly or sent to a router. The datagram is passed to the appropriate network interface for transmission. Incoming datagrams are checked for validity and the routing algorithm is used to decide whether the datagram should be processed locally or forwarded. If the datagram is addressed to the local machine, the datagram header is deleted and the appropriate transport protocol for the packet is chosen. ICMP error and control messages are handled in this layer as well.
Network interface layer. The network interface layer is responsible for accepting IP datagrams and transmitting them over a specific network. A network interface may consist of a device driver or a complex subsystem that uses its own data link protocol.
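The way each module wraps the piece handed down to it, and unwraps it again on the receiving side, can be sketched as follows. The header format (a bracketed layer name) and the function names are purely illustrative assumptions; real headers carry addresses, lengths, checksums and protocol identifiers.

```python
# Layers below the application, listed from top to bottom.
LAYERS = ["transport", "internet", "network-interface"]


def send_down(message: str) -> str:
    """Sending side: each layer puts the piece in its data area, adds a header."""
    piece = message
    for layer in LAYERS:
        piece = f"[{layer}-hdr]" + piece
    return piece


def receive_up(piece: str) -> str:
    """Receiving side: each layer strips its own header, bottom layer first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]"
        if not piece.startswith(header):
            # protocol error: the module discards the piece
            raise ValueError(f"{layer} layer: unexpected header")
        piece = piece[len(header):]
    return piece
```

Here send_down("data") yields "[network-interface-hdr][internet-hdr][transport-hdr]data", so the lowest layer's header is outermost, and receive_up() returns the original message, which is the layering principle in miniature: each layer on the receiving side sees exactly what its peer on the sending side produced.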

Program translation has been divided into four subproblems: compiler, assembler, link editor, and loader. As a result, the translation software is layered as well, allowing the software layers to be designed independently. Noting that the ways to conquer the complexity of program translation could readily be applied to protocols because of the analogy between programming languages and protocols, the designers of the TCP/IP protocol suite were keen on imposing the same layering on the software framework. This can be seen in the TCP/IP layering by considering the translation of a Pascal program (message) that is compiled (function of the application layer) into an assembler program that is assembled (function of the transport layer) to object code (pieces) that is linked (function of the Internet layer) together with library object code (routing table) by the link editor, producing relocatable machine code (datagram) that is passed to the loader, which fills in the memory locations (Ethernet addresses) to produce executable code (network frame) to be loaded (function of the network interface layer) into physical memory (transmission medium). To show just how closely the analogy fits, the terms between parentheses in the previous sentence denote the relevant analogs and the terms written in italics denote data representations. Program translation forms a linear sequence, because each layer's output is passed as input to the next layer. Furthermore, the translation process involves multiple data representations. We see the same thing happening in protocol software, where multiple protocols define the data representations of the data passed between the software modules.[3] The network interface layer uses physical addresses and all the other layers only use IP addresses.
The boundary between network interface layer and Internet layer is called the high-level protocol address boundary.[62] The modules below the application layer are generally considered part of the operating system. Passing data between these modules is much less expensive than passing data between an application program and the transport layer. The boundary between application layer and transport layer is called the operating system boundary.[63]
Strict layering

Strictly adhering to a layered model, a practice known as strict layering, is not always the best approach to networking.[64] Strict layering can have a serious impact on the performance of the implementation, so there is at least a trade-off between simplicity and performance.[65] Another, perhaps more important, point can be shown by considering the fact that some of the protocols in the Internet Protocol Suite cannot be expressed using the TCP/IP model; in other words, some of the protocols behave in ways not described by the model.[66] To improve on the model, an offending protocol could perhaps be split up into two protocols, at the cost of one or two extra layers, but there is a hidden caveat, because the model is also used to provide a conceptual view of the suite for the intended users. There is a trade-off to be made here between preciseness for the designer and clarity for the intended user.[67]
Protocol development

For communication to take place, protocols have to be agreed upon. Recall that in digital computing systems, the rules can be expressed by algorithms and data structures, raising the opportunity of hardware independence. Expressing the algorithms in a portable programming language makes the protocol software operating system independent. The source code could be considered a protocol specification. This form of specification, however, is not suitable for the parties involved. For one thing, this would enforce a source on all parties and for another, proprietary software producers would not accept this. By describing the software interfaces of the modules on paper and agreeing on the interfaces, implementers are free to do it their way. This is referred to as source independence. By specifying the algorithms on paper and detailing hardware dependencies in an unambiguous way, a paper draft is created that, when adhered to and published, ensures interoperability between software and hardware. Such a paper draft can be developed into a protocol standard by getting the approval of a standards organization. To get the approval, the paper draft needs to enter and successfully complete the standardization process. This activity is referred to as protocol development. The members of the standards organization agree to adhere to the standard on a voluntary basis. Often the members are in control of large market shares relevant to the protocol, and in many cases standards are enforced by law or the government because they are thought to serve an important public interest, so getting approval can be very important for the protocol. It should be noted, though, that in some cases protocol standards are not sufficient to gain widespread acceptance; that is, sometimes the source code needs to be disclosed, enforced by law or the government, in the interest of the public.
