The Switch Book by Rich Seifert-Notes

Switch Book
Layer 2 concepts
THE SWITCH BOOK By Rich Seifert
CHAPTER-1
Foundations Of LAN Switches:
Network Architecture:
OSI model (Open Systems Interconnection): It consists of seven layers of network system functions.
1. Physical Layer:
Transmission and reception of signals from the communications medium. Data is sent as bits: 0s and 1s. This layer is a function of the design of the physical medium (cabling).
2. Data Link Layer:
Provides direct communication between devices. Communications are of two types: point-to-point and point-to-multipoint. It provides mechanisms for 1. Framing 2. Addressing 3. Error detection.
2 modes of operation:
1. Connectionless: (a) Just forwards the frame and does not receive an acknowledgement. (b) Does not provide error control or flow control.
2. Connection-oriented: (a) Continual exchange of data, with acknowledgements. (b) Provides error and flow control.
3. Network Layer:
Station-to-station data delivery across multiple links. Routing of packets across the internetwork, usually through routers. Protocols include: IP, IPX, AppleTalk etc.
4. Transport Layer:
Shields the upper layers from the lower layers. Provides an error-free, sequenced, guaranteed delivery service. Mechanisms: 1. Connection establishment 2. Error recovery 3. Flow control. Protocols: TCP, ATP, SPX etc.
5. Session Layer:
Establishment of communications sessions between applications. Deals with user authentication and access control (passwords).
6. Presentation Layer:
Presents data in the proper form to the application layer. Data formats: encryption, decryption, encoding, decoding.
7. Application Layer:
Provides APIs that allow user applications to communicate across the network. Functions include FTP, mail utilities (SMTP), NFS etc.
Data link sub layering:
1. Logical Link Control (LLC): It is the upper sublayer. Provides the data link service (connectionless or connection-oriented) to the higher-layer clients, independent of the underlying LAN. There are 3 types of service: (a) LLC Type 1: connectionless service (b) LLC Type 2: connection-oriented service (c) LLC Type 3: acknowledged connectionless service.
2. Medium Access Control (MAC): It is the lower sublayer. Deals with the details of frame formats associated with the particular technology in use.
LLC Frame Format:
MAC Header | DSAP (1 byte) | SSAP (1 byte) | CTRL (1 byte) | DATA
LLC/SNAP Format: If both SAPs are set to 0xAA, then SNAP is in use.
MAC Header | DSAP = 0xAA | SSAP = 0xAA | CTRL | SNAP OUI | SNAP PID | DATA
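The SNAP check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the input is assumed to start at the DSAP byte (the MAC header already stripped), the control field is assumed to be 1 byte, and the SNAP OUI and PID are assumed to be 3 and 2 bytes.

```python
SNAP_SAP = 0xAA  # SAP value that signals SNAP encapsulation


def uses_snap(llc: bytes) -> bool:
    """True if both DSAP and SSAP carry the SNAP value 0xAA."""
    return len(llc) >= 2 and llc[0] == SNAP_SAP and llc[1] == SNAP_SAP


def parse_snap(llc: bytes):
    """Return (oui, pid, data) for an LLC/SNAP payload, or None otherwise.

    Assumed layout: DSAP(1) SSAP(1) CTRL(1) OUI(3) PID(2) DATA(...).
    """
    if not uses_snap(llc) or len(llc) < 8:
        return None
    oui = llc[3:6]
    pid = int.from_bytes(llc[6:8], "big")
    return oui, pid, llc[8:]


# Example payload: SNAP header followed by two data bytes.
example = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x00]) + b"hi"
print(parse_snap(example))
```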
Addressing:
MAC Address: It is a 48-bit address used in the data link layer. It is also called the hardware address or physical address.
Byte:  1st | 2nd | 3rd | 4th | 5th | 6th
Field: OUI (bytes 1-3) | OAP (bytes 4-6)
The OUI is used to denote the manufacturer; the OAP is the portion assigned by that organization. Force10 has the OUI 00-01-E8.
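As a small illustration of the address layout, the following Python sketch splits a MAC address into its OUI and organization-assigned halves. The function name and the sample address (apart from the Force10 OUI quoted above) are made up for the example.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Split a 48-bit MAC address into (OUI, organization-assigned part)."""
    octets = mac.replace(":", "-").split("-")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError on non-hex digits
    octets = [o.upper() for o in octets]
    return "-".join(octets[:3]), "-".join(octets[3:])


# The last three bytes here are an arbitrary example value.
oui, oap = split_mac("00-01-e8-12-34-56")
print("OUI:", oui, "OAP:", oap)
```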
ETHERNET:
Low-cost, high-speed communication.
Frame transmission:
1. Sense the carrier. 2. Wait for the interframe gap. 3. Transmit the frame.
Frame reception:
The station monitors the channel for an incoming frame. When the channel becomes non-idle, it starts receiving bits. Frames shorter than one slot time are discarded as collision fragments. The receiver then verifies the FCS; if the frame is valid, the receiver checks whether the DA matches the physical address of the receiving station. If it matches, the frame is forwarded to the client.
Ethernet Frame Formats:
Type encapsulation: Ethernet Version 2
Field: Preamble/SFD | DA | SA | TYPE | DATA    | FCS
Bytes: 8            | 6  | 6  | 2    | 46-1500 | 4
Length encapsulation: IEEE 802.3
Field: Preamble/SFD | DA | SA | LENGTH | LLC header (DSAP, SSAP, CTRL) + DATA + PAD | FCS
Bytes: 8            | 6  | 6  | 2      | 46-1500                                    | 4
Preamble: It consists of 7 bytes and allows receivers to synchronize on the incoming frame. Each byte has the value 0x55.
SFD: It consists of 1 byte and signifies the beginning of the DA. Its value is 0xD5.
DA: Destination address of the frame. It consists of 6 bytes.
SA: Source address of the frame. It consists of 6 bytes.
DATA: It consists of 46-1500 bytes. It encapsulates the higher-layer protocol information being transferred across the Ethernet.
Pad: Used to add extra bytes when the data is shorter than 46 bytes. Without padding, the frame would fall below the minimum length and be discarded; the pad field prevents this.
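The padding rule can be illustrated with a short Python sketch. The function name is hypothetical, and zero bytes are assumed as the pad value.

```python
MIN_DATA = 46    # minimum size of the data field, in bytes
MAX_DATA = 1500  # maximum size of the data field, in bytes


def pad_data(data: bytes) -> bytes:
    """Return the data field, padded with zero bytes up to MIN_DATA."""
    if len(data) > MAX_DATA:
        raise ValueError("data field exceeds 1500 bytes")
    missing = max(0, MIN_DATA - len(data))
    return data + bytes(missing)


short = pad_data(b"ping")
print(len(short))  # 4 data bytes plus 42 pad bytes
```

In the length encapsulation the receiver can strip the pad again because the LENGTH field records how many bytes are real data.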
LAYER ENCAPSULATION:
Physical layer encapsulation (stream of bits) > Ethernet frame > IP packet > TCP segment

PL HEADER | ETH HEADER | IP HEADER | TCP HEADER | APPLICATION DATA | ETH TRAILER | PL TRAILER
A transport PDU is called a segment or message.
A Network PDU is called a packet.
A Data Link PDU is called a frame.
A Physical layer PDU is called a symbol stream.
PDU: Protocol Data Unit.
CHAPTER-2
TRANSPARENT BRIDGES
Transparent bridges.
Transparent bridges are so named because their presence and operation are transparent to network hosts. When transparent bridges are powered on, they learn the network's topology by analyzing the source address of incoming frames from all attached networks.
If, for example, a bridge sees a frame arrive on Line 1 from Host A, the bridge concludes that Host A can be reached through the network connected to Line 1. Through this process, transparent bridges build a table.
Host address | Network number
15           | 1
17           | 1
12           | 2
Figure 1: Transparent bridges build a table that determines a host's accessibility
The bridge uses its table as the basis for traffic forwarding. When a frame is received on one of the bridge's interfaces, the bridge looks up the frame's destination address in its internal table. If the table contains an association between the destination address and any of the bridge's ports aside from the one on which the frame was received, the frame is forwarded out the indicated port. If no association is found, the frame is flooded to all ports except the inbound port. Broadcasts and multicasts also are flooded in this way.
UNICAST OPERATION:
When a frame is received on any port, the bridge extracts the destination address from the frame, looks it up in the table, and determines the port to which the address maps. Two concepts are involved: filtering and forwarding.
Filtering: When a packet is received by a node, filtering is the task of determining (a) whether to forward the packet at all, and (b) which port(s) to forward it to. Filtering makes network operation more efficient by reducing the number of output ports to which the packet needs to be sent. For example, unicast packets need to go to only one output port, and that output port should be the next step in the desired path to the destination. Multicast packets need to go to a (sub)set of ports. The forwarding table encodes this subset of ports and avoids the need to carry such information in the packet itself.
Forwarding: Given a packet at a node, finding which output port it needs to go to is called forwarding. It is a per-node function, whereas routing may encompass several nodes. The forwarding function is performed by every node in the network, including hosts, repeaters, bridges and routers. Forwarding is trivial in the case of a single-port node (or a dual-port node, where the destination is on the port other than the input port); in this case addresses are not even needed.
Generating the address table:
1) The address table can be built automatically by considering the source address in received frames.
2) Bridges perform a table lookup on the destination address in order to determine on which ports to forward the frame.
Address table aging: If all we ever did was add learned addresses to the table and never remove them, we would have two problems.
1) The larger the table, the more time the lookup will require. Thus we have to restrict entries in the table to only those stations that are known to be currently active.
2) If a station moves from one port to another, the table will incorrectly indicate the old port until the station sends traffic that causes the bridge to learn its new location.
The simple solution to both problems is to age entries out of the address table when a station has not been heard from for some period of time. Thus, when we perform the table lookup for the source address, we not only make a new entry if needed, we also flag the entry as still active. On a regular basis we check for stale entries (entries that have not been flagged as active for some period of time) and remove them from the table.
Process Model of Table Operation
1. A lookup process compares the destination address in incoming frames to the entries in the table to determine whether to discard the frame, forward it to a specific port, or flood it to all the ports.
2. A learning process compares the source address in incoming frames to the entries in the table and updates the port mapping and activity indicators or creates new entries as needed.
3. An aging process removes stale entries from the table on a regular basis.
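The three processes can be sketched as a small Python class. This is a minimal illustration, not how a real bridge is built: the class and method names are hypothetical, a timestamp serves as the activity indicator, and the 300-second maximum age is an assumed value.

```python
import time


class BridgeTable:
    """Toy address table with lookup, learning, and aging processes."""

    def __init__(self, ports, max_age=300.0, clock=time.monotonic):
        self.ports = set(ports)
        self.max_age = max_age   # assumed aging time, in seconds
        self.clock = clock       # injectable so aging can be tested
        self.entries = {}        # MAC address -> (port, last_seen)

    def learn(self, src, in_port):
        """Learning process: map the source address to the arrival port."""
        self.entries[src] = (in_port, self.clock())

    def lookup(self, dst, in_port):
        """Lookup process: return the set of ports to send the frame on."""
        entry = self.entries.get(dst)
        if entry is None:
            return self.ports - {in_port}             # unknown: flood
        port, _ = entry
        return set() if port == in_port else {port}   # discard or forward

    def age(self):
        """Aging process: drop entries not refreshed within max_age."""
        now = self.clock()
        stale = [mac for mac, (_, seen) in self.entries.items()
                 if now - seen > self.max_age]
        for mac in stale:
            del self.entries[mac]


table = BridgeTable(ports=[1, 2, 3])
table.learn("host-A", 1)
print(table.lookup("host-A", 2))  # forwarded to the learned port
print(table.lookup("host-B", 1))  # unknown destination, flooded
```

Broadcast and multicast destinations never appear as source addresses, so in this sketch they are never learned and are always flooded.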
Custom Filtering and Forwarding
We can add filtering and forwarding criteria beyond the defaults. Many commercial bridges allow the network administrator to program custom filter and forward criteria. For example, the network administrator may wish to:
1. Prevent specific users from accessing certain resources.
2. Prevent sensitive traffic from being allowed to propagate beyond a set of controlled LANs.
3. Limit the amount of multicast traffic that is flooded onto certain LANs.
Implementing the bridge address table

Table operations:
There are three operations that need to be performed on the bridge address table: destination address lookup, source address learning, and entry aging. Considering the priority of the operations, the table design should be optimized for fast, real-time lookup, at the expense of slower and more complex update and aging algorithms if need be.
Search Algorithms:
1) Hash tables
2) Binary search
3) Content-addressable memories (CAM)
You can compare CAM to the inverse of RAM. When read, RAM produces the data for a given address. Conversely, CAM produces an address for a given data word. When searching for data within a RAM block, the search is performed serially, so finding a particular data word can take many cycles. CAM searches all addresses in parallel and produces the address storing a particular word. You can use CAM for any application requiring high-speed searches, such as networking, communications, data compression, and cache management.
Aging entries from the table:
The aging process is a non-critical, low-priority task. It can be done in the background without significant performance or operational penalty. A common mechanism is to maintain two bits per entry, Valid (V) and Hit (H). The Valid (V) bit indicates that a table entry is valid; the Hit (H) bit indicates that this entry has been hit, that is, seen as a source address, during the most recent aging process cycle.
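The Valid/Hit mechanism can be sketched as follows. This is one plausible reading of the two-bit scheme, with hypothetical names: an entry survives an aging pass if it was hit since the previous pass, and is invalidated otherwise.

```python
class Entry:
    """One address table entry with Valid (V) and Hit (H) bits."""

    def __init__(self, port):
        self.port = port
        self.valid = True
        self.hit = True


def mark_seen(table, mac, port):
    """Called by the learning process when mac is seen as a source."""
    entry = table.get(mac)
    if entry is None or not entry.valid:
        table[mac] = Entry(port)
    else:
        entry.port = port
        entry.hit = True


def aging_pass(table):
    """Background pass: clear H on hit entries, clear V on the rest."""
    for entry in table.values():
        if not entry.valid:
            continue
        if entry.hit:
            entry.hit = False    # gets one more cycle to be seen again
        else:
            entry.valid = False  # not seen for a full cycle: stale


table = {}
mark_seen(table, "host-A", 1)
aging_pass(table)   # host-A was hit, so it stays valid
aging_pass(table)   # no traffic since the last pass, so it is invalidated
print(table["host-A"].valid)
```

With this scheme an idle entry is removed after between one and two aging cycles, without storing a timestamp per entry.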
The IEEE 802.1D standard
In addition to the formal description of transparent bridge operation, the standard provides:
1) An architectural framework for the operation of bridges, including formal specifications for interlayer services.
2) A formal description of the bridge address table, frame filtering, and forwarding, including static and dynamic entries, forwarding rules and custom filter definitions.
3) A set of operating parameters for interoperation of bridged catenets.
CHAPTER-4
Principle of LAN switches
1. Switched LAN concepts
Access Domains: The set of stations sharing a given LAN and arbitrating among themselves using whatever access control mechanism is appropriate for that LAN.
Collision Domains: In an Ethernet LAN, the set of stations contending for access to a shared Ethernet LAN.
Token Domains: Similarly, the set of stations contending for the use of the token on a token-passing LAN.
Both collision domains and token domains are examples of access domains.
Each port on the switch acts as the termination of the access domain of that particular link.
It is the switch that separates the access domains of its ports.
Segmentation and Microsegmentation:
Segmentation is connecting a group of stations to each port of the switch (i.e. each port is connected to a shared LAN). A switch used in this manner provides a collapsed backbone.
To overcome the drawbacks of the collapsed backbone (stations on each segment still contend for a shared LAN), the concept of microsegmentation comes into play.
Microsegmentation: the direct connection of each end station to its own switch port.
Interesting characteristics of microsegmentation:
1. No access contention (i.e. no collisions), provided full duplex mode is used.
2. Access control can be eliminated when full duplex is used.
3. Each station has dedicated bandwidth (its own LAN segment), so the data rate of each port is independent. For example, one port can run at 10 Mbps while another runs at 100 Mbps or 1000 Mbps.
Extended distance limitations:
Switches allow us to extend distance coverage of a LAN.
Using full duplex (i.e. with microsegmentation), the distance constraints imposed by collision detection can be eliminated.
Increase aggregate capacity:
A switch provides greater data-carrying capability than a shared LAN.
Since a switching hub provides dedicated capacity on each switch port, the total LAN capacity increases with the number of switch ports. So the aggregate capacity will equal:
$$\text{Capacity}_{agg} = \sum_{port=1}^{n} \text{DataRate}_{port}$$
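As a worked example of the aggregate capacity formula, the sketch below sums the port data rates of a hypothetical switch. The port mix (24 ports at 100 Mbps and 2 at 1000 Mbps) is invented for illustration.

```python
def aggregate_capacity(port_rates_mbps):
    """Sum of the data rates of all switch ports, in Mbps."""
    return sum(port_rates_mbps)


# Hypothetical switch: 24 ports at 100 Mbps plus 2 ports at 1000 Mbps.
rates = [100] * 24 + [1000] * 2
print(aggregate_capacity(rates), "Mbps")  # 2400 + 2000 = 4400 Mbps
```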
2. Cut-Through versus Store-and-Forward
Store and forward: as the name implies, each frame is received (stored) completely and then decisions are made regarding whether and where to forward the frames.
This is done based on the destination address in the Ethernet frame. The destination address is the first field in the Ethernet frame (after the preamble and SFD).
In this method the switch may wait as long as 1.2 ms (a maximum-length frame at 10 Mb/s) for the frame to be fully received before the decision and forwarding are done.
To reduce this delay, the concept of cut-through comes into the picture.
Cut-Through: The switch begins transmitting the frame before the frame is fully received at the input side. Since the destination address is the first field in the Ethernet frame, as soon as the switch reads the destination address it can forward the frame toward the destination.
The switch can receive the destination address field within 11.2 µs, make the decision, and forward. So the switch need not wait for the whole frame to be received.
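The two latency figures can be reproduced with simple arithmetic. The sketch below assumes 10 Mb/s Ethernet, a 1518-byte maximum frame (DA through FCS), and that the cut-through decision is made once the 8-byte preamble/SFD and 6-byte DA have arrived; these assumptions are not stated in the notes but give the quoted numbers.

```python
def tx_time_us(num_bytes: int, rate_mbps: float) -> float:
    """Time in microseconds to receive num_bytes at rate_mbps."""
    return num_bytes * 8 / rate_mbps


RATE_MBPS = 10           # assumed: 10 Mb/s Ethernet
MAX_FRAME_BYTES = 1518   # assumed: maximum frame, DA through FCS
PREAMBLE_SFD_BYTES = 8
DA_BYTES = 6

store_and_forward_us = tx_time_us(MAX_FRAME_BYTES, RATE_MBPS)
cut_through_us = tx_time_us(PREAMBLE_SFD_BYTES + DA_BYTES, RATE_MBPS)

print(f"store-and-forward: {store_and_forward_us / 1000:.2f} ms")
print(f"cut-through:       {cut_through_us:.1f} us")
```

The store-and-forward delay shrinks with frame size, so the gap between the two modes is largest for maximum-length frames.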
Because of this advantage, cut-through mode has lower latency than store-and-forward. The implication was that a cut-through switch provided a 20:1 performance improvement over a store-and-forward switch. There are a number of fallacies in this conclusion:
1. Absolute latency is not a significant issue for most higher-layer protocols and applications (at least not latency on the order of a few milliseconds).
2. For those protocols that are sensitive to latency, the switch is only a small part of the problem.
3. Any latency benefit accrues only when the output port is available.
4. Cut-through operation is generally not possible for multicast or unknown destination addresses.
CHAPTER 5
Loop resolution
Spanning tree protocol

Frames would loop for an indefinite period of time in networks with physically redundant links. To prevent looping frames, STP blocks some ports from forwarding frames so that only one active path exists between any pair of LAN segments (collision domains). The result of STP is both good and bad.
Good: Frames do not loop infinitely, which makes the LAN usable.
Bad: The network does not actively take advantage of some of the redundant links, because they are blocked to prevent frames from looping. Some users' traffic travels a seemingly longer path through the network, because a shorter physical path is blocked.
However, the net result is good.
Terminology
Tree topology: Think of a tree. There is a root, branches (actually, a hierarchy of progressively smaller branches), and ultimately leaves. On a given tree, there are no disconnected parts that are still considered part of the tree; that is, the tree encompasses all of its leaves. In addition, there are no loops in the tree. Thus a tree is a loop-free topology that spans all of its parts.
Root Bridge: just as a tree has a root, spanning tree has a Root Bridge. The root Bridge is the logical center (but not necessarily the physical center) of the catenet. There is always exactly one Root Bridge in a catenet.
Designated Bridge: the bridge responsible for forwarding traffic in the direction from the root to a given link is known as the designated bridge for that link.
Designated Port: the port in the active topology used to forward traffic away from the root on to the link(s) for which this bridge is the Designated Bridge.
Root Port: the port in the active topology that provides connectivity from the designated bridge towards the root.
Bridge identifier: In order to properly configure, calculate, and maintain the spanning tree, there needs to be a way to uniquely identify each bridge in the catenet and each port within a bridge. A bridge identifier is a 64-bit field unique to each bridge in the catenet. The bridge id is the concatenation of a 16-bit priority value and a globally unique 48-bit field. The priority ranges from 0 to 65,535 (2^16 - 1) and the default priority value is 32768 (0x8000).
Port identifier: Each port of the bridge is assigned a port id. Similar to the bridge id, a port id concatenates an 8-bit priority field and a unique 8-bit port number. The range of the priority field in the port id is 0 to 255 (0xFF); the default value is the middle of the range (128, or 0x80).
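The two identifiers can be built with a couple of shifts. In this sketch the function names are hypothetical, the priority is assumed to occupy the most significant bits (so that it dominates a numeric comparison), and the MAC addresses are example values.

```python
def bridge_id(priority: int, mac: str) -> int:
    """64-bit bridge id: 16-bit priority followed by a 48-bit address."""
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("priority must fit in 16 bits")
    mac_value = int(mac.replace("-", "").replace(":", ""), 16)
    if mac_value >> 48:
        raise ValueError("address must fit in 48 bits")
    return (priority << 48) | mac_value


def port_id(priority: int, number: int) -> int:
    """16-bit port id: 8-bit priority followed by an 8-bit port number."""
    if not 0 <= priority <= 0xFF or not 0 <= number <= 0xFF:
        raise ValueError("priority and port number must fit in 8 bits")
    return (priority << 8) | number


default = bridge_id(0x8000, "00-01-E8-00-00-01")
tuned = bridge_id(0x1000, "00-01-E8-FF-FF-FF")
print(hex(default), hex(tuned))
print(tuned < default)  # a lower priority value wins regardless of address
print(hex(port_id(0x80, 1)))
```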
Link and link cost: Each port on a bridge connects to a link. That link may be a high-speed LAN or, alternatively, some wide area communications technology. The STP attempts to configure the catenet such that every end station is reachable from the root through the path with the lowest cost. By default,
Link cost = 1000 / data rate in Mbps
Table: link cost recommendations
DATA RATE | RECOMMENDED LINK COST RANGE | RECOMMENDED LINK COST VALUE
4 Mbps    | 100-1000                    | 250
10 Mbps   | 50-600                      | 100
16 Mbps   | 40-400                      | 62
100 Mbps  | 10-60                       | 19
1 Gbps    | 3-5                         | 4
10 Gbps   | 1-5                         | 2
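A quick calculation shows how the default formula relates to the recommended values in the table. The dictionary below simply restates the table; the function name is hypothetical.

```python
# Recommended link cost values from the table, keyed by data rate in Mbps.
RECOMMENDED_COST = {4: 250, 10: 100, 16: 62, 100: 19, 1000: 4, 10000: 2}


def default_link_cost(rate_mbps: float) -> float:
    """Default link cost: 1000 divided by the data rate in Mbps."""
    return 1000 / rate_mbps


for rate, recommended in RECOMMENDED_COST.items():
    formula = default_link_cost(rate)
    print(f"{rate:>6} Mbps  formula={formula:>7.1f}  recommended={recommended}")
```

The formula and the table agree up to 16 Mbps (62.5 rounds down to 62). From 100 Mbps upward the formula yields 10, 1 and 0.1, while the recommended values stay distinct whole numbers.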
Path cost: as stated earlier, the STP attempts to configure the catenet such that every station is reachable from the root through the path with the lowest cost. The cost of a path is the sum of the cost of the links attached to the root ports in that path, as calculated earlier.
Calculating and maintaining the spanning tree: The spanning tree topology for a given set of links and bridges is determined by the bridge id, the link cost, and the port id associated with the bridges in the catenet. Logically, we need to perform three operations:
1. Determine (elect) a root bridge. 2. Determine (elect) the designated bridge and designated ports for each link. 3. Maintain the topology over time.
In practice all of these are done in parallel, through the spanning tree algorithm operating identically and independently in each bridge.
Elect a root: The election algorithm is simple: the bridge with the numerically lowest bridge id becomes the root bridge at any given time.
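The election rule amounts to taking a minimum over bridge ids. The sketch below uses made-up bridge names and 64-bit ids, with the priority assumed to sit in the top 16 bits.

```python
def elect_root(bridges: dict) -> str:
    """Return the name of the bridge with the numerically lowest id."""
    return min(bridges, key=bridges.get)


# Hypothetical catenet: names and ids are invented for the example.
bridges = {
    "bridge-A": 0x8000_0001E8000003,  # default priority 0x8000
    "bridge-B": 0x8000_0001E8000001,  # default priority, lower address
    "bridge-C": 0x1000_0001E80000FF,  # administratively lowered priority
}
print(elect_root(bridges))
```

In the real protocol no single node sees all ids at once; each bridge compares the ids carried in received BPDUs, but the outcome is the same minimum.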
Elect the designated bridges and designated ports: By definition, the root bridge is the designated bridge for each link to which it attaches. For other links, the designated bridge is elected on the basis of cost: the bridge offering the lowest path cost back to the root becomes the designated bridge for that link.
If there is a tie in the path cost, then the bridge with the numerically lowest bridge id becomes the designated bridge. A designated bridge has only one designated port on a given link; if it has several ports on that link, the port with the numerically lowest port id becomes the designated port.
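The tie-breaking order (path cost, then bridge id, then port id) maps naturally onto tuple comparison. This is a sketch with hypothetical names and example values, where each candidate represents one bridge port attached to the link in question.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A port competing to be designated on one link.

    Field order matters: tuples compare field by field, which gives
    exactly the cost, bridge id, port id tie-breaking sequence.
    """
    root_path_cost: int
    bridge_id: int
    port_id: int


def elect_designated(candidates):
    """Return the winning candidate for a link."""
    return min(candidates)


# Example values only: costs use the recommended 100 Mbps / 1 Gbps figures.
link_candidates = [
    Candidate(root_path_cost=23, bridge_id=0x80000001E8000003, port_id=0x8001),
    Candidate(root_path_cost=19, bridge_id=0x80000001E8000005, port_id=0x8002),
    Candidate(root_path_cost=19, bridge_id=0x80000001E8000004, port_id=0x8001),
]
winner = elect_designated(link_candidates)
print(hex(winner.bridge_id), hex(winner.port_id))
```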
Spanning tree maintenance: In normal (steady state) operation, to maintain the tree, the protocol operates as follows:
Once every Hello Time (2 seconds), the root bridge transmits a configuration message encoded as a BPDU. All bridges sharing links with the root bridge receive the BPDU and pass it to the STP entity within the bridge. Unlike data frames, the BPDU is not forwarded by the bridge to the end stations. Each designated bridge creates a new BPDU based on the BPDU received from the root bridge and then transmits it. So in each tier, the designated bridges update the BPDU with their own information and transmit it to the next tier. This process continues until there are no more designated bridges.
CHAPTER 7
Full Duplex Operation
Half Duplex: A channel on which only one device transmits at a time while the other devices receive.
Full Duplex channel: A communication channel which supports data transfer in both directions simultaneously.
Half-duplex works optimally only if one device is transmitting and all the other devices are receiving. Otherwise, collisions occur. When collisions are detected, the devices causing the collision wait for a random time before retransmitting. Half-duplex is the most common transmission method and is adequate for normal workstation and PC connections. Full-duplex provides two-way communication on a point-to-point connection and allows each device to simultaneously transmit and receive on a connection. Full-duplex mode is typically used to connect to other switches or to connect fast access devices such as workgroup servers.
To use full-duplex communication, both ends of the connection must be configured to operate in full-duplex mode. Full-duplex operation is only possible on point-to-point Ethernet connections that use separate conductors or fibers for transmit and receive, such as 10Base-T and 100Base-FX cabling. Full-duplex operation is not possible on connections using coaxial or AUI (10Base-5) cables or with most hubs.
Full Duplex Operation in LAN:
It depends on:
1. Use of dedicated media as provided by popular structured cabling (10Base-T, 1000Base-SX, 1000Base-LX etc.).
2. The use of microsegmented (one PC to one port), dedicated LANs.
For full duplex operation to occur:
1. There should be exactly 2 devices on the LAN (switch to PC, PC to PC etc.).
2. Physical cabling should support full duplex.
3. The Ethernet MAC must be configured to work in full duplex mode (the Pascal code of the MAC description disables collision detection in this mode).
Full duplex operation is a subset of half duplex operation, with the half duplex functions disabled (no CS, no MA, no CD).
Implications of full duplex operation:
1. Eliminates collisions.
2. Increases aggregate channel capacity.
3. Increases potential load on the switch.
Transmitter Operation: A full duplex transmitter sends frames following two simple rules:
1) The station sends frame by frame, that is, it finishes sending one frame before sending the next pending frame.
2) The transmitter sends frames separated by an interframe gap, which gives the receiver some time to perform housekeeping chores.
Receiver Operation:
1) The receiver waits for a valid SFD and then begins to assemble the data link encapsulation of the frame.
2) The destination address is checked; if it does not match the device, the frame is discarded.
3) The FCS is checked, and any invalid frame is discarded.
4) The frame length is checked, and frames shorter than the minimum length are discarded.
5) The receiver passes up to its client all frames that have passed the previous tests.
Full Duplex Application Environments: Full duplex operation is most often seen in:
1) Switch-to-switch connections: these need increased capacity, meet the two-station LAN requirement for full duplex operation, and may require link lengths in excess of those allowed by the use of CSMA/CD.
2) Server and router connections: the increased capacity justifies using dedicated switch ports, even at very high speeds.
3) Long distance connections: optical fiber is commonly used as it supports long distances.
Chapter 8
LAN and Switch Flow Control
The need for flow control:

Both LANs and LAN switches are connectionless in nature. Frames are transferred without error to a high degree of probability, but there is no absolute assurance of success. In the event of a bit error, receiver buffer unavailability, or any other abnormal occurrence, a receiver simply discards the frame without providing any notification of the fact. This allows LAN interfaces to be built at very low cost; a connectionless system is much simpler to implement than a system that includes mechanisms for error recovery and flow control within the data link.
Default switch behavior
A switch receives frames on its input ports and forwards them onto the appropriate output ports based on information (typically the DA) in the received frame. Depending on the traffic patterns, switch performance limitations, and available buffer memory, it is possible that frames can arrive faster than the switch can receive, process, and forward them. The default behavior of a switch is to discard frames when faced with a congestion condition.
The Effect of Frame Loss
A higher-layer protocol or application that requires reliable delivery must implement some form of error control. Such mechanisms, as in TCP, use a positive acknowledgement and retransmission (PAR) algorithm. In this scheme, data being transferred in one direction between stations is acknowledged in the other. The originating station does not assume that data has been successfully delivered until an acknowledgement has been received. Depending on the transport protocol, a single lost frame can incur the penalty of idling the data transfer for seconds.
Controlling flow in half duplex networks

Half Duplex with Back Pressure
Half-duplex back pressure ensures retransmission of incoming packets if a half-duplex switch port is unable to receive them. When back pressure is enabled and no buffers are available to a port, the switch forces collisions on the affected port, which causes the transmitting station to back off and resend the packets. The switch can use this retransmission time to clear its receive buffer by sending packets already in the queue.
MAC Control
MAC Control frame format:
Preamble (7 bytes)
Start Frame Delimiter (1 byte)
Dest. MAC Address (6 bytes) = 01-80-C2-00-00-01 or unique DA
Source MAC Address (6 bytes)
Length/Type (2 bytes) = 802.3 MAC Control (88-08)
MAC Control Opcode (2 bytes) = PAUSE (00-01)
MAC Control Parameters (2 bytes) = 00-00 to FF-FF
Reserved (42 bytes) = all zeros
Frame Check Sequence (4 bytes)
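The field layout above can be turned into bytes with Python's struct module. This sketch builds only the part from the destination address through the reserved field; the preamble, SFD and FCS are assumed to be added by the hardware. The function name and the source address are made up for the example.

```python
import struct

PAUSE_DA = bytes.fromhex("0180C2000001")  # reserved multicast address
MAC_CONTROL_TYPE = 0x8808                 # Length/Type value for MAC Control
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, pause_time: int) -> bytes:
    """Build a PAUSE frame from DA through the reserved field (no FCS)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= pause_time <= 0xFFFF:
        raise ValueError("pause_time must fit in 2 bytes")
    # "!HHH" packs three unsigned 16-bit values in network byte order.
    control = struct.pack("!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_time)
    reserved = bytes(42)  # all zeros
    return PAUSE_DA + src_mac + control + reserved


frame = build_pause_frame(bytes.fromhex("0001E8000001"), pause_time=0x0010)
print(len(frame), frame[:18].hex("-"))
```

The result is 60 bytes long, which together with the 4-byte FCS gives a minimum-length frame.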
PAUSE Function
The PAUSE function is used to implement flow control on full duplex Ethernet links. PAUSE operation uses the MAC Control architecture and frame format. The operation is defined only for use across a single full duplex link; it cannot be used on a shared LAN. It may be used to control data frame flow between:
A pair of end stations
A switch and an end station
A switch-to-switch link
The PAUSE function is specifically designed to prevent switches from unnecessarily discarding frames due to input buffer overflow under short-term transient overload conditions.
PAUSE operation
PAUSE operation implements a very simple stop-start form of flow control. A device wishing to temporarily inhibit incoming data sends a PAUSE frame, with a parameter indicating the length of time that the full duplex partner should wait before sending any more data frames. When a station receives a PAUSE frame, it stops sending data frames for the period specified. A station that has issued a PAUSE may cancel the remainder of the pause period by issuing another PAUSE frame with a parameter of zero time.
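The stop-start behaviour on the receiving side of a PAUSE can be modelled as a simple countdown. The class below is a hypothetical sketch: time is counted in abstract ticks, and a newly received PAUSE is assumed to replace whatever remains of an earlier one, which is what makes the zero-time cancel work.

```python
class PauseState:
    """Tracks whether a station may send data frames after PAUSE frames."""

    def __init__(self):
        self.remaining = 0  # ticks left in the current pause period

    def on_pause(self, pause_time: int):
        """Handle a received PAUSE; a value of zero cancels the pause."""
        self.remaining = pause_time

    def tick(self, ticks: int = 1):
        """Advance time by the given number of ticks."""
        self.remaining = max(0, self.remaining - ticks)

    def may_send_data(self) -> bool:
        return self.remaining == 0


link = PauseState()
link.on_pause(100)           # partner asks us to stop for 100 ticks
print(link.may_send_data())  # paused
link.tick(40)
link.on_pause(0)             # partner cancels the rest of the pause
print(link.may_send_data())  # sending may resume
```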
FLOW CONTROL IMPLEMENTATION ISSUES
Design implications of PAUSE Function
1) Inserting PAUSE frames in the transmit queue: Without flow control, an Ethernet interface simply transmits frames in the order presented by the device driver. Inserting PAUSE frames in a timely manner is important for the effective use of flow control. The transmission of a PAUSE frame cannot preempt a data transmission in progress. Therefore, the interface should complete the transmission of any frame in progress, wait one interframe spacing, and then send the requested PAUSE frame.
2) Parsing received PAUSE frames: An interface must inspect and parse the fields in all incoming frames to determine when a valid PAUSE has been received, in order to act upon it. Fields such as the DA, Type field, MAC Control opcode, and FCS must be checked.
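The field checks can be sketched as a small parser. This is an illustration under stated assumptions: the function name is hypothetical, the input starts at the DA (preamble and SFD removed), and the FCS is assumed to have been verified and stripped by the hardware already.

```python
import struct

PAUSE_DA = bytes.fromhex("0180C2000001")
MAC_CONTROL_TYPE = 0x8808
PAUSE_OPCODE = 0x0001


def parse_pause(frame: bytes, own_mac: bytes):
    """Return the pause time if frame is a valid PAUSE, else None."""
    if len(frame) < 18:
        return None
    da = frame[0:6]
    if da != PAUSE_DA and da != own_mac:   # reserved multicast or unique DA
        return None
    eth_type, opcode, pause_time = struct.unpack("!HHH", frame[12:18])
    if eth_type != MAC_CONTROL_TYPE or opcode != PAUSE_OPCODE:
        return None
    return pause_time


# Example frame: reserved DA, zero SA, MAC Control type, PAUSE, time 0xFFFF.
sample = (PAUSE_DA + bytes(6)
          + bytes.fromhex("88080001FFFF") + bytes(42))
print(parse_pause(sample, own_mac=bytes.fromhex("0001E8000001")))
```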
3) PAUSE timing: Following the reception of the PAUSE frame itself (i.e. starting from the end of the last bit of the received FCS), the interface has a maximum of 512 bit times to validate, decode and act upon the PAUSE frame. If the transmitter begins transmission of a frame during this time, that frame is completed normally.
Chapter 9 LINK AGGREGATION
Why Link Aggregation?

Link Aggregation or trunking is a method of combining physical network links into a single logical link for increased bandwidth. With Link aggregation we are able to increase the capacity and availability of the communications channel between devices (both switches and end stations) using existing Fast Ethernet and Gigabit Ethernet technology. Two or more Gigabit Ethernet connections are combined in order to increase the bandwidth capability and to create resilient and redundant links. A set of multiple parallel physical links between two devices is grouped together to form a single logical link.
Link Aggregation also provides load balancing where the processing and communications activity is distributed across several links in a trunk so that no single link is overwhelmed. By taking multiple LAN connections and treating them as a unified, aggregated link, we can achieve practical benefits in many applications.
Link Aggregation provides the following important benefits:

- Higher link availability
- Increased link capacity
- Improvements are obtained using existing hardware (no upgrading to higher-capacity link technology is necessary)
Aggregating replaces Upgrading

If the link capacity is to be increased, there are usually two possibilities: either upgrade the native link capacity or use an aggregate of two or more lower-speed links. Upgrades typically occur in factors of 10. In many cases, however, the device cannot take advantage of this increase. A tenfold performance improvement is not achieved; instead, the bottleneck is just moved from the network link to some other element within the device. Link aggregation may be less expensive than a native speed upgrade and yet achieve a similar performance level. The hardware costs for a higher-speed link and for the equivalent number of lower-speed connections have to be weighed against each other to decide which approach is the most advantageous. Sometimes link aggregation may even be the only means to improve performance, when the highest data rate available on the market is not sufficient.
Types of Link Aggregation

There are a number of situations where Link Aggregation is commonly deployed:
- Switch-to-switch connections
- Switch-to-station (server or router) connections
- Station-to-station connections
Switch-to-Switch Connections
In this scenario, multiple workgroups are joined to form one aggregated link. By aggregating multiple links, higher-speed connections can be achieved without a hardware upgrade.
Switch-to-Station (Server or Router) Connections

Most server platforms can saturate a single 100 Mb/s link with many of the applications available today. Thus, link capacity becomes the limiting factor for overall system performance.
Station-to-Station Connections
In the case of aggregation directly between a pair of end stations, no switches are involved at all. As in the station-to-switch case, the higher performance channel is created without having to upgrade to higher-speed LAN hardware. In some cases, higher-speed NICs may not even be available for a particular server platform, making link aggregation the only practical choice for improved performance.
Physical issues in Link Aggregation

Addressing
Each network interface controller is assigned a unique MAC address. Usually this address is programmed into ROM during manufacturing. During initialization, the device driver reads the contents of the ROM and transfers the address to a register within the MAC controller. In most cases, this address is used as the source address of frames the interface transmits and as the destination address of frames sent to it. An aggregated link must appear as a single link with a single logical network interface, and therefore has only one virtual MAC address. The MAC address of one of the interfaces belonging to the aggregated link provides the virtual address of the logical link.
Frame Distribution (transmission of frames)

WAN technologies sometimes break frames into smaller units to accelerate transmission. LAN communications channels, however, do not support sub-frame transfers: a complete frame has to be sent through a single physical link. With aggregated links, the task is therefore to select the link on which to transmit a given frame. Sending one long frame may take longer than sending several short ones, so short frames sent on one link may be received before a long frame that was sent earlier on another link, and the order would then have to be restored at the receiver side.
To avoid this, an agreement has been made: all frames belonging to one conversation must be transmitted through the same physical link, which guarantees correct ordering at the receiving end station. For this reason no sequencing information needs to be added to the frames. Traffic belonging to separate conversations can be sent through different links in any order. The algorithm for assigning frames to a conversation depends on the application environment and the kind of devices used at each end of the link.
When a conversation is to be transferred to another link, either because the originally mapped link is out of service (failed or configured out of the aggregation) or because a new link has become available to relieve the existing ones, precautions have to be taken to avoid mis-ordering of frames at the receiver. This can be done either by means of a delay, whose length the distributor must somehow determine, or through an explicit marker protocol. In the marker protocol, the distributor inserts a marker message behind the last frame of a conversation. After the collector receives this marker message it sends a response to the distributor, which then knows that all frames of the conversation have been delivered. The distributor can now send frames of that conversation via a new link without delay. If the conversation is being transferred because the originally mapped link failed, this method will not work: there is no path on which the marker message can be transferred, so the distributor has to employ the timeout method.
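The rule that all frames of one conversation use the same physical link can be sketched in a few lines of Python. This is only an illustration: the notes do not say how a conversation is identified, so the source/destination address pair used as the conversation key, the CRC-32 hash, and the name select_link are all assumptions.

```python
import zlib

def select_link(src_mac: bytes, dst_mac: bytes, active_links: list[int]) -> int:
    """Pick the physical link for a frame from its conversation key."""
    if not active_links:
        raise ValueError("no active links in the aggregation")
    # Assumed conversation key: the source/destination address pair.
    key = src_mac + dst_mac
    # CRC-32 is deterministic, so the same conversation maps to the same
    # link for as long as the set of active links does not change.
    return active_links[zlib.crc32(key) % len(active_links)]

links = [0, 1, 2, 3]
station_a = bytes.fromhex("020000000001")
station_b = bytes.fromhex("020000000002")
first_choice = select_link(station_a, station_b, links)
```

Because the result depends on the number of active links, removing or adding a link can move a conversation to a different link. That is exactly the situation in which the delay or marker mechanism described above is needed.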
Technology Constraints
In principle, the devices used in the aggregation limit the throughput. An aggregation of four 100 Mb/s links has four times the capacity of a single 100 Mb/s link, but the throughput on each link remains the same, so a single conversation, which is confined to one link, still cannot exceed 100 Mb/s.
CHAPTER-11
Virtual LANs:Applications and Concepts
VLAN (Virtual LAN): A Virtual Local Area Network is a division of a local area network by software rather than by the physical arrangement of cables. Dividing the LAN into subgroups can simplify and speed up communications within a workgroup. Switching a user from one virtual LAN to another via software is also easier than rewiring the hardware. Stations in the same VLAN group can communicate with each other; a station cannot directly talk to or hear from stations that are not in the same VLAN group(s).
Applications of VLAN:
1.) Software patch panel: This simple application requires only port-based VLANs. In a centralized wiring center, connections between equipment on the LAN are made by patch cord interconnections on a wiring panel, so moving, adding or changing a station can be achieved simply by changing the patch cord interconnections without rewiring. Port-based VLANs let the same changes be made in software.
2.) LAN Security: A user on a shared LAN can create problems by sending lots of traffic to some targeted users, resulting in performance degradation. By creating logical partitions of the catenet with VLAN technology, we enhance the protection against unwanted traffic. A port-based VLAN allows free communication among the members of a given VLAN, but does not forward traffic among switch ports associated with members of different VLANs.
3.) User Mobility:
a.) The user's view of the network can stay consistent regardless of physical location.
b.) Network layer addresses may not need to be changed based on physical location.
c.) Mobile users are granted access privileges so that they can access their home servers.
4.) Bandwidth Preservation: VLAN technology isolates traffic between logically separated workgroups, thus preserving bandwidth.
VLAN Concepts: A station can be in multiple VLANs, depending upon the capabilities of the station and the switches deployed, and upon the applications operating within the station. Stations simply look at frames and classify each frame as belonging to a particular VLAN based on a set of VLAN association rules. VLAN-aware devices just need to apply the rules and classify frames as belonging to one VLAN or another.
VLAN Tagging:
Implicit Tags: Here no tag is involved; the frame is unmodified, exactly as sent by any station or switch. All frames sent by VLAN-unaware end stations are considered implicitly tagged. The VLAN is determined from a set of VLAN association rules, and the association is a function of protocol type, data link source address, higher-layer network identifiers, etc. If no explicit tag is provided, the VLAN-aware switch must determine the VLAN association by applying the rules.
Explicit Tags: An explicit tag is a predefined field in a frame that carries the VLAN identifier for that frame. These tags are applied by VLAN-aware devices, and a VLAN-aware device that receives an explicitly tagged frame does not need to re-apply the association rules.
Tagged Frame:
Type - this indicates the type of tag; for Ethernet frames this is currently always 0x8100.
Priority - this ranges from binary 000 (0) for low priority to binary 111 (7) for high priority.
Canonical - this is always 0.
VLAN ID - this identifies the VLAN number when trunking VLANs.
VLAN Awareness:
1.) Making frame forwarding decisions based on the VLAN association of a given frame (that is, based on the DA and also on the VLAN to which the frame belongs).
2.) Providing explicit VLAN identification within transmitted frames.
VLAN Aware Switches:
Edge Switches: These switches connect at the boundary between the VLAN-unaware domain and the VLAN-aware domain. An edge switch applies the association rules to every frame and then tags the frame before forwarding it to the backbone through the core switch. An edge switch removes the inserted tag before forwarding a frame to the VLAN-unaware domain.
Core Switches: These switches connect VLAN-aware devices to each other. They do not tag or untag frames; they forward frames purely on the basis of the VLAN identification in the tag. A core switch holds a table that maps VLAN identifiers to the set of ports needed to reach the members of each VLAN. The depth of the table is fixed at 4094 entries.
VLAN Aware End Stations (Advantages):
1.) A set of stations may negotiate a dynamically created VLAN for the purpose of carrying on a short-term audio or video conference, and the conferencing application can tag frames for that particular conference with a unique VLAN identifier.
2.) A frame sent by the station will reach only members of that same VLAN.
3.) If all frames carry VLAN tags, then all switches become core switches; that is, switches make decisions based on VLAN tag information.
VLAN awareness in end stations (Methods):
1.) Applications themselves need to be written to be VLAN aware.
2.) APIs need to be enhanced to support passing of VLAN information to and from applications.
3.) Device drivers for LAN interfaces need to be changed to allow a client to specify a VLAN in addition to the other information needed to send frames on its behalf.
4.) VLAN tags need to be inserted within transmitted frames. This is implemented in the device driver or in a VLAN-aware NIC.
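The core switch behaviour described above (a table mapping VLAN identifiers to ports, with forwarding decisions based on both the DA and the VLAN) might be sketched as follows. The class name CoreSwitch, the per-VLAN address learning, and the decision to discard frames arriving on a port outside their VLAN are assumptions made for the illustration, not details taken from the book.

```python
class CoreSwitch:
    """Toy VLAN-aware forwarding: decisions use both the VID and the DA."""

    def __init__(self, vlan_ports: dict[int, set[int]]):
        # VLAN table: VID -> set of ports needed to reach members of that VLAN.
        self.vlan_ports = vlan_ports
        # Address table keyed by (VID, MAC) -> port, learned from source addresses.
        # Learning per VLAN is an assumption of this sketch.
        self.addr_table: dict[tuple[int, str], int] = {}

    def forward(self, in_port: int, vid: int, sa: str, da: str) -> set[int]:
        """Return the set of output ports for one tagged frame."""
        members = self.vlan_ports.get(vid, set())
        if in_port not in members:
            # Assumption: a frame arriving on a port outside its VLAN is discarded.
            return set()
        self.addr_table[(vid, sa)] = in_port  # learn where the source lives
        out_port = self.addr_table.get((vid, da))
        if out_port is not None and out_port in members:
            return {out_port} - {in_port}
        # Unknown destination: flood, but only to ports in the same VLAN.
        return members - {in_port}

switch = CoreSwitch({10: {1, 2, 3}, 20: {3, 4}})
```

Port 3 belongs to both VLANs here, which is how a link towards another VLAN-aware device would typically appear in such a table.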
VLAN Unaware Switches: VLAN-unaware switches are not capable of tagging or untagging frames. A VLAN-unaware switch can still process a VLAN-tagged frame based on the addresses in the frame.
VLAN Association Rules: (Mapping frames to VLANs)
1.) Port-based VLAN mapping: Stations within a given VLAN can freely communicate among themselves. No communication is possible between stations connected to ports that are members of different VLANs. It is used for the software patch panel application, and it provides bandwidth preservation. This mapping is used in Force10.
2.) MAC address VLAN mapping: In this type of mapping the switch uses the source address to determine VLAN membership. The same lookup process that is used to learn the port mapping for the station is used to determine the VLAN mapping.
3.) Protocol-based VLAN mapping: A switch with protocol-based VLANs divides the physical network into logical VLAN groups, one for each required protocol. When a frame is received at a port, its VLAN membership is determined from the protocol type used by the inbound packet. Protocol-based VLAN mapping allows a station to be a member of multiple VLANs, depending on the number of protocols it supports (IP, IPX, AppleTalk, etc.). The VLAN mapping is a function of both the source address and the encapsulated protocol.
4.) IP subnet-based VLAN mapping: In this type of mapping the VLANs are divided based on IP subnets. A VLAN-aware switch needs to perform two operations to create IP subnet-based VLANs:
a.) Check whether the frame encapsulates an IP datagram.
b.) Extract the IP subnet portion of the IP source address in the encapsulated datagram.
5.) Application-based VLAN mapping: In this type the VLANs are divided based on higher-layer application processes. The applications could provide audio or video conferencing, group document preparation, etc. The use of application-based VLANs requires that the station be VLAN aware. The application ensures that each frame carries the VLAN identifier in an explicit tag, so that the VLAN-aware switches never need to parse the frames to determine the application; they can simply switch frames based upon the VLAN identified in the tag.
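The implicit association rules above can be illustrated with a small classifier. Everything specific here is hypothetical: the rule tables, the VLAN numbers, the Frame fields, and the order in which the rules are tried (subnet, then protocol, then MAC address, then port). Only the EtherType values for IPv4 (0x0800) and IPX (0x8137) are standard assignments.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPX = 0x8137

@dataclass
class Frame:
    in_port: int
    src_mac: str
    ethertype: int
    src_ip: Optional[str] = None  # present only if the frame carries an IP datagram

# Hypothetical rule tables; all VLAN numbers are made up for the example.
SUBNET_VLANS = {ipaddress.ip_network("10.1.0.0/16"): 30}
PROTOCOL_VLANS = {ETHERTYPE_IPX: 40}
MAC_VLANS = {"02:00:00:00:00:01": 50}
PORT_VLANS = {1: 10, 2: 10, 3: 20}

def classify(frame: Frame) -> Optional[int]:
    """Return the VLAN for an untagged frame, or None if no rule matches."""
    # IP subnet rule: (a) is it an IP datagram? (b) which subnet is the source in?
    if frame.ethertype == ETHERTYPE_IPV4 and frame.src_ip is not None:
        addr = ipaddress.ip_address(frame.src_ip)
        for network, vid in SUBNET_VLANS.items():
            if addr in network:
                return vid
    # Protocol rule: map the encapsulated protocol type to a VLAN.
    if frame.ethertype in PROTOCOL_VLANS:
        return PROTOCOL_VLANS[frame.ethertype]
    # MAC address rule: map the source address to a VLAN.
    if frame.src_mac in MAC_VLANS:
        return MAC_VLANS[frame.src_mac]
    # Port rule: fall back to the VLAN of the receiving port.
    return PORT_VLANS.get(frame.in_port)
```

Application-based mapping is left out on purpose: as noted above, it relies on the station inserting an explicit tag, so the switch has nothing to classify.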
Chapter 12
Virtual LANs: The IEEE Standard
VLAN: Virtual Local Area Network and IEEE 802.1Q

A Virtual LAN (VLAN) is a group of devices on one or more LANs that are configured so that they can communicate as if they were attached to the same wire, when in fact they are located on a number of different LAN segments. Because VLANs are based on logical instead of physical connections, they are very flexible for user/host management, bandwidth allocation and resource optimization.
There are the following types of Virtual LANs:
1. Port-based VLAN: each physical switch port is configured with an access list specifying membership in a set of VLANs.
2. MAC-based VLAN: a switch is configured with an access list mapping individual MAC addresses to VLAN membership.
3. Protocol-based VLAN: a switch is configured with a list mapping layer 3 protocol types to VLAN membership, thereby filtering IP traffic from nearby end stations using a particular protocol such as IPX.
The IEEE 802.1Q specification establishes a standard method for tagging Ethernet frames with VLAN membership information. The IEEE 802.1Q standard defines the operation of VLAN Bridges that permit the definition, operation and administration of Virtual LAN topologies within a Bridged LAN infrastructure. The 802.1Q standard is intended to address the problem of how to break large networks into smaller parts so that broadcast and multicast traffic does not take up more bandwidth than necessary. The standard also helps provide a higher level of security between segments of internal networks.
Protocol Structure - VLAN: Virtual Local Area Network and the IEEE 802.1Q
IEEE 802.1Q Tagged Frame for Ethernet:
Field:  Preamble | SFD | DA | SA | TPID | TCI | Type/Length | Data    | CRC
Bytes:  7        | 1   | 6  | 6  | 2    | 2   | 2           | 42-1496 | 4
TPID - defined value of 8100 in hex. When a frame has the EtherType equal to 8100, this frame carries the tag IEEE 802.1Q / 802.1P.
TCI - Tag Control Information field including user priority, Canonical format indicator and VLAN ID.
Tag-based VLAN Overview
Under the IEEE 802.1Q standard, a tag-based VLAN uses an extra tag in the MAC header to identify the VLAN membership of a frame across bridges. This tag is used for VLAN and QoS (Quality of Service) priority identification. VLANs can be created statically by hand or dynamically through GVRP. The VLAN ID associates a frame with a specific VLAN and provides the information that switches need to process the frame across the network. A tagged frame is four bytes longer than an untagged frame. It contains two bytes of TPID (Tag Protocol Identifier, which immediately follows the source address field and so occupies the position of the type/length field of an untagged frame) and two bytes of TCI (Tag Control Information, which follows the TPID).
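As a sketch of what "four bytes longer" means in practice, the following Python inserts and removes the tag on a frame held as bytes. The function names are hypothetical. The frame is assumed to start at the DA, with no preamble or SFD, and the CRC is ignored; a real device would have to recompute the CRC after changing the frame.

```python
import struct

TPID_8021Q = 0x8100

def add_vlan_tag(frame: bytes, vid: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Insert a 4-byte tag after the 6-byte DA and 6-byte SA."""
    if not 0 <= vid <= 0xFFF or not 0 <= priority <= 7 or cfi not in (0, 1):
        raise ValueError("tag field out of range")
    # TCI layout: 3 bits priority, 1 bit CFI, 12 bits VLAN ID.
    tci = (priority << 13) | (cfi << 12) | vid
    tag = struct.pack("!HH", TPID_8021Q, tci)  # network byte order
    return frame[:12] + tag + frame[12:]

def strip_vlan_tag(frame: bytes) -> bytes:
    """Remove the tag if present; return untagged frames unchanged."""
    if len(frame) < 16 or struct.unpack("!H", frame[12:14])[0] != TPID_8021Q:
        return frame
    return frame[:12] + frame[16:]

untagged = bytes(6) + bytes.fromhex("020000000001") + b"\x08\x00" + b"payload"
tagged = add_vlan_tag(untagged, vid=100, priority=5)
```

After tagging, the original type/length value (0x0800 in this example) still follows the tag, which matches the frame layout shown earlier.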
TPID: The TPID has a defined value of 8100 in hex. When a frame has an EtherType equal to 8100, the frame carries an IEEE 802.1Q / 802.1P tag.
Priority: The first three bits of the TCI define user priority, giving eight (2^3) priority levels. IEEE 802.1P defines the operation for these 3 user priority bits.
CFI: The Canonical Format Indicator is a single-bit flag, always set to zero for Ethernet switches. CFI is used for compatibility between Ethernet-type networks and Token Ring-type networks. If a frame received at an Ethernet port has CFI set to 1, then that frame should not be forwarded as it is to an untagged port.
VID: The VLAN ID identifies the VLAN and is the field on which 802.1Q operation is based. It has 12 bits and allows the identification of 4096 (2^12) VLANs. Of the 4096 possible VIDs, a VID of 0 is used to identify priority frames and the value 4095 (FFF) is reserved, so the maximum number of VLANs that can be configured is 4,094.
Note that user priority and VLAN ID are independent of each other. A frame with VID (VLAN Identifier) of null (0) is called a priority frame, meaning that only the priority level is significant and the default VID of the ingress port is given as the VID of the frame.
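The field widths and the special VID values described above can be checked with a short decoding sketch. The names Tci, parse_tci and effective_vid are made up for this example; the bit layout (3 bits priority, 1 bit CFI, 12 bits VID) is the one given in the text.

```python
from typing import NamedTuple

class Tci(NamedTuple):
    priority: int  # 3 bits, 0..7
    cfi: int       # 1 bit
    vid: int       # 12 bits, 0..4095

def parse_tci(tci: int) -> Tci:
    """Split the 16-bit Tag Control Information field into its three parts."""
    return Tci(priority=(tci >> 13) & 0x7, cfi=(tci >> 12) & 0x1, vid=tci & 0xFFF)

def effective_vid(tci: int, pvid: int) -> int:
    """VID a switch would use for the frame, given the ingress port's PVID."""
    vid = parse_tci(tci).vid
    if vid == 0xFFF:
        raise ValueError("VID 4095 is reserved")
    # VID 0 marks a priority frame: only the priority is significant,
    # so the default VID of the ingress port is used instead.
    return pvid if vid == 0 else vid

# VIDs 1..4094 remain once 0 and 4095 are excluded.
USABLE_VIDS = range(1, 0xFFF)
```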
How 802.1Q VLAN works
According to the VID information in the tag, the switch forwards and filters frames among its ports. Ports with the same VID can communicate with each other. The IEEE 802.1Q VLAN function consists of the following three tasks: Ingress Process, Forwarding Process and Egress Process.
1. Ingress Process: Each port is capable of passing tagged or untagged frames. The Ingress Process identifies whether an incoming frame contains a tag, and classifies the incoming frame as belonging to a VLAN. Each port has its own ingress rule. If the ingress rule accepts tagged frames only, the switch port drops all incoming untagged frames. If the ingress rule accepts all frame types, the switch port allows both tagged and untagged incoming frames:
When a tagged frame is received on a port, it carries a tag header that has an explicit VID. The Ingress Process passes the tagged frame directly to the Forwarding Process.
An untagged frame does not carry any VID identifying the VLAN to which it belongs. When an untagged frame is received, the Ingress Process inserts a tag containing the PVID into the untagged frame. Each physical port has a default VID called the PVID (Port VID). The PVID is assigned to untagged frames or priority-tagged frames (frames with a null (0) VID) received on this port.
After the Ingress Process, all frames carry a 4-byte tag with VID information, and they then go to the Forwarding Process.
2. Forwarding Process: The Forwarding Process decides where to forward the received frames according to the Filtering Database. For tagged frames to be forwarded to a certain port, that port must be an egress port for the frame's VID. An egress port is an outgoing port for the specified VLAN; that is, frames tagged with the specified VID can go out through this port. The Filtering Database stores and organizes VLAN registration information useful for switching frames to and from switch ports. It consists of static registration entries (the Static VLAN or SVLAN table) and dynamic registration entries (the Dynamic VLAN or DVLAN table). The SVLAN table is manually added and maintained by the administrator. The DVLAN table is learned automatically via GVRP, and cannot be created or updated by the administrator.
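The ingress and forwarding steps described above can be put together in a simplified sketch. It deliberately ignores destination addresses and looks only at VLAN membership. The class and function names are hypothetical, and three behaviours are assumptions of the sketch rather than statements from the notes: priority-tagged frames are treated like untagged frames by the "tagged only" rule, the egress set is the union of the static and dynamic entries, and a frame is never sent back out of the port it arrived on.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Port:
    pvid: int                  # default VID for frames that carry no VID
    tagged_only: bool = False  # ingress rule: accept tagged frames only

@dataclass
class FilteringDatabase:
    static: dict[int, set[int]] = field(default_factory=dict)   # set by the administrator
    dynamic: dict[int, set[int]] = field(default_factory=dict)  # e.g. registered via GVRP

    def egress_ports(self, vid: int) -> set[int]:
        # Assumption: a port is an egress port if either table lists it.
        return self.static.get(vid, set()) | self.dynamic.get(vid, set())

def ingress(port: Port, vid: Optional[int]) -> Optional[int]:
    """Return the VID the frame carries after ingress, or None if it is dropped.

    vid is None for an untagged frame and 0 for a priority-tagged frame.
    """
    carries_no_vid = vid is None or vid == 0
    if carries_no_vid and port.tagged_only:
        return None  # assumption: priority-tagged frames are dropped here too
    return port.pvid if carries_no_vid else vid

def forward(fdb: FilteringDatabase, ports: dict[int, Port],
            in_port: int, vid: Optional[int]) -> set[int]:
    """Ports on which the frame may leave, based on VLAN membership only."""
    frame_vid = ingress(ports[in_port], vid)
    if frame_vid is None:
        return set()
    return fdb.egress_ports(frame_vid) - {in_port}

ports = {1: Port(pvid=10), 2: Port(pvid=10), 3: Port(pvid=20, tagged_only=True)}
fdb = FilteringDatabase(static={10: {1, 2}}, dynamic={10: {3}, 20: {3}})
```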
CHAPTER 13
Priority Operation
Priority operation adds complexity to switches, and there is no need to pay for this complexity unless there is an application benefit to be gained. There are two situations to consider:
1.) The catenet cannot handle the steady-state traffic load offered by its users: This occurs if some link or switch in the catenet has inadequate capacity to support the desired application data flows. A steady-state problem will also occur if a switch does not support wire-speed operation at the higher data rate. The solution is to add capacity to the network.
2.) The catenet has sufficient capacity to handle the steady-state traffic load, but not short-term peak loads: There can be times when the traffic load will exceed the capacity of some link or switch, regardless of the design of the catenet. This is where priorities come into the picture. If some traffic streams are more important than others, these streams can be given priority over less important traffic. This helps only under overload conditions.
LAN Priority Mechanisms:
1.) Access priority: Giving priority to a particular station in a shared LAN.
(a.) Static: The station is given priority all the time.
(b.) Dynamic: Priority is given on a frame-by-frame basis, depending on the applications running.
2.) User priority: This is the priority assigned to a given frame by the application sourcing those frames.
For Ethernet access priority, some of the methods employed are:
1.) Shortened interframe gap: By reducing the IFG, a station gets its traffic onto the medium sooner than others.
2.) Modified backoff algorithm: When a collision occurs, the device with the shortened backoff time will transmit its frames sooner than the other stations involved in the collision.
3.) Long preamble: The longer the preamble, the higher the priority. The device with the longest preamble ignores the collision and continues with its frame transmission.
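To make the modified backoff idea concrete, here is a sketch based on the standard Ethernet truncated binary exponential backoff, in which a station picks a random number of slot times between 0 and 2^k - 1 after its n-th collision, with k = min(n, 10). The notes do not say how the backoff is shortened, so capping the exponent at a smaller value for a priority station is purely a hypothetical illustration.

```python
import random

def backoff_slots(collisions: int, max_exponent: int = 10, rng=random) -> int:
    """Number of slot times to wait after the given number of collisions."""
    k = min(collisions, max_exponent)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2**k - 1

def priority_backoff_slots(collisions: int, rng=random) -> int:
    # Hypothetical "shortened" variant: the exponent is capped at 2, so the
    # station never waits more than 3 slot times.
    return backoff_slots(collisions, max_exponent=2, rng=rng)
```

With the smaller range, the priority station is more likely to draw the shorter wait and so tends to retransmit before the other stations involved in the collision.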
VLAN and Priority Tagging:
Tagged Frame:
Type - this indicates the type of tag; for Ethernet frames this is currently always 0x8100.
Priority - this ranges from binary 000 (0) for low priority to binary 111 (7) for high priority.
Canonical - this is always 0.
VLAN ID - this identifies the VLAN number when trunking VLANs.
In order to use priority mechanisms:
1.) The operating system and protocol stack have to be modified.
2.) APIs in the internet protocol stacks have to be modified.
3.) Protocol implementations within the end stations may have to be enhanced.
4.) Operating system code (NIC APIs, network device drivers) has to be modified.
5.) Network interfaces have to be modified.
Edge Switches: These switches sit on the boundary between the priority-unaware world and the priority-aware core. They provide attachments for end stations directly.
Core Switches: These typically provide backbone interconnections between the edge switches.
Priority Operation in switches: If we don't invoke any priority mechanisms, the operation of a switch is quite straightforward; the switch handles all frames equally. The whole idea of priority is to allow frames that are more important to jump ahead of lower-priority frames in the queue.
Switch Process Flow for priority operation: It is a three-step process.
1.) Determining frame priority on input: On receipt of a frame, the switch must determine the priority of that frame, either from explicit priority information provided in the frame itself or implicitly from the frame contents and a set of administrative policy rules.
2.) Mapping input priority to class of service: Knowing the priority of the frame, the switch must map that priority to one of the set of classes of service available at each output port on which the frame is to be forwarded. Typically, each service class identifies a particular output queue on each port.
3.) Output scheduling: For a set of multiple output queues, the switch must apply some scheduling algorithm to transmit frames from those queues according to the needs of the classes of service that they represent.
Scheduling Algorithms:
1.) Strict priority: As the name implies, this policy interprets priority literally: higher-priority queues are served first, and lower-priority queues only afterwards. It is the easiest policy to implement. If a high-priority user offers more load than the capacity of the output port, no frames will be transmitted from the lower-priority queues. In the extreme case, all lower-priority frames will be discarded.
2.) Weighted Fair Queuing: This is an alternative approach that does not exclude lower-priority queues completely. A weight is assigned to each queue; higher-priority queues are given greater weight than lower-priority queues. The output scheduler then uses a round-robin algorithm tempered by the indicated weights. Weights are usually assigned according to the bandwidth allocated to each queue. That is, if all queues have traffic to send, the available bandwidth will be divided among them in the ratio of their weights.
Indicating the priority in transmitted frames: On input, we made a priority determination and possibly remapped that priority to a globally consistent set of semantics. On output, we have three choices:
1.) Signal the user priority in a VLAN-style tag: This relieves the next device from having to make an implicit priority determination from a set of administrative rules. The tagging approach requires that the output port support tagged frames.
2.) Signal the user priority in a LAN-specific manner: This method is used when the output port does not support tags, but supports native indication of user priority.
3.) Don't signal user priority: On Ethernet ports without tag support, there is no choice but to forward the frame without priority, and the next device to receive the frame will need to determine the priority through implicit means.
Priority Regeneration: The IEEE 802.1p and 802.1Q standards provide for priority regeneration. Priority regeneration is only used when explicit priority is indicated in received frames through a native priority field. Priority regeneration can be used not only to equalize service levels among departments but also to change or override the local administrative policy. Priority regeneration allows an easy means of migrating and merging priority-enabled LANs into a larger catenet without having to change all of the local administrative policies at once.
IEEE 802.1p: IEEE 802.1p defines a priority field that can be used by LAN switches and similar devices at the Ethernet level to prioritize traffic. The prioritization specification works at the media access control (MAC) framing layer (OSI model layer 2). The 802.1p standard also offers provisions to filter multicast traffic to ensure it does not proliferate over layer 2-switched networks. The 802.1p header includes a three-bit field for prioritization, which allows packets to be grouped into various traffic classes. The IEEE has made broad recommendations concerning how network managers can implement these traffic classes, but it stops short of mandating the use of its recommended traffic class definitions. 802.1p can also be described as best-effort QoS (Quality of Service) or CoS (Class of Service) at layer 2, and it is implemented in network adapters and switches without involving any reservation setup. 802.1p traffic is simply classified and sent to the destination; no bandwidth reservations are established.
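The two scheduling policies described in this chapter can be compared with a small sketch. It is a simplification under stated assumptions: weights are counted in frames rather than bytes, so the second function is a plain weighted round robin and only approximates a bandwidth split when frames are of similar size. The function names and the budget parameter (the number of frames the port can send) are made up for the example.

```python
from collections import deque

def strict_priority(queues: list[deque], budget: int) -> list:
    """queues[0] has the highest priority; send up to `budget` frames."""
    sent = []
    while len(sent) < budget:
        # Always serve the first non-empty queue in priority order.
        queue = next((q for q in queues if q), None)
        if queue is None:
            break
        sent.append(queue.popleft())
    return sent

def weighted_round_robin(queues: list[deque], weights: list[int], budget: int) -> list:
    """Visit queues in turn, sending up to `weight` frames from each per round."""
    if len(weights) != len(queues) or any(w <= 0 for w in weights):
        raise ValueError("one positive integer weight is needed per queue")
    sent = []
    while len(sent) < budget and any(queues):
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue or len(sent) >= budget:
                    break
                sent.append(queue.popleft())
    return sent
```

With a high-priority queue holding more frames than the budget, strict_priority never reaches the lower queue, which is the starvation effect noted above. weighted_round_robin with weights [3, 1] would instead interleave roughly three high-priority frames for every low-priority one.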
********************************************************