Juniper Link&Node Robustness

White Paper
IP Dependability: Network Link and Node Protection
Chuck Semeria
Marketing Engineer
Juniper Networks, Inc.

1194 North Mathilda Avenue
Sunnyvale, CA 94089 USA
408 745 2000 or 888 JUNIPER
www.juniper.net
Part Number : 200044-001

Contents
Executive Summary
Introduction
A Recovery Framework
Recovery Cycle
Reversion Cycle
Dynamic Rerouting Cycle
SONET Automatic Protection Switching (APS)
SONET APS 1+1 and APS 1:N
Juniper Networks Implementation
Example 1: Protecting Against a PIC Failure
Example 2: Protecting Against a Router Failure
Example 3: APS Load Sharing between Circuit Pairs
Virtual Router Redundancy Protocol (VRRP)
VRRP Operational Model
Example 1: Basic VRRP Configuration
Example 2: VRRP Load-Sharing Configuration
Link Aggregation and Redundancy
Multilink Point-to-Point Protocol (MLPPP): T1/E1 Link Bonding
Juniper Networks Implementation
IEEE 802.3ad: Ethernet Link Aggregation
Juniper Networks Implementation
SONET/SDH Aggregation
MPLS Label-Switched Path (LSP) Reliability
MPLS LSP Protection Mechanisms
Secondary Paths
Fast Reroute (or 1:1 Protection)
Link Protection (or 1:N Protection)
Secondary Paths
Fast Reroute (One-to-One Backup)
Link Protection (Many-to-One or Facility Backup)
PFE Local Repair
Fate Sharing
Summary
References
Juniper Networks Documentation
Requests for Comments
Internet Drafts
Other Standards Documents
Copyright © 2002, Juniper Networks, Inc.

List of Figures
Figure 1: Juniper Networks High-Dependability Architecture
Figure 2: The Recovery Cycle
Figure 3: The Reversion Cycle
Figure 4: The Dynamic Rerouting Cycle
Figure 5: SONET APS 1+1 and APS 1:N
Figure 6: APS Configuration Topologies
Figure 7: Protecting Against a PIC Failure
Figure 8: Protecting Against a Router Failure
Figure 9: APS Load Sharing Between Circuit Pairs (Initial Configuration)
Figure 10: APS Load Sharing Between Circuit Pairs (Failure Switchover)
Figure 11: Statically Configured Default Routes Create A Single Point of Failure
Figure 12: Typical VRRP Topology
Figure 13: Typical VRRP Topology
Figure 14: PPP vs. MLPPP Connectivity
Figure 15: Example IEEE 802.3ad Group Topologies
Figure 16: IEEE 802.3ad Virtual MACs
Figure 17: LSP Recovery Without Protection Paths
Figure 18: Standby Secondary Paths
Figure 19: Fast Reroute
Figure 20: Link Protection
Figure 21: Fast Reroute Detour Path
Figure 22: PFE Local Repair

IP Dependability: Network Link and Node Protection
Executive Summary
Providing protection against link or node failures is an important requirement for the
successful delivery of subscriber services. As the amount of mission-critical data carried by
native IP or converged MPLS infrastructures increases, the effect of disruptions caused by
network outages becomes more significant and more costly. This paper describes mechanisms
supported by JUNOS software that ensure the fundamental availability of the network to
support subscriber services when a link or node fails.
Introduction
Figure 1 shows the Juniper Networks architecture for providing solutions that support highly
dependable services across native IP or converged MPLS infrastructures. High dependability
results from the combination of dependable platforms, dependable networks, dependable
service-enabling features, and enhanced security capabilities.
Figure 1: Juniper Networks High-Dependability Architecture
(Diagram labels: Dependable Subscriber Services; Dependable Service-enabling Features:
Forwarding, MPLS, DiffServ; Security; Dependable Networks; Dependable Platforms:
Hardware, Software, Process)
Like a classic protocol stack diagram, Figure 1 shows how these various elements are related
and how high dependability is the result of a holistic approach to delivering network services:
■ Dependable platforms are based on reliable hardware, reliable software, and reliable design
and manufacturing processes.
■ Dependable networks result from implementing a set of features that allow providers to
hide network faults and service interruptions from their subscribers.
■ Dependable service-enabling features include raw packet forwarding performance,
industrial-strength implementations of MPLS traffic engineering, Differentiated Services
(DiffServ) support, and ATM and Frame Relay convergence features.
■ Security is fundamental because it impacts a carrier’s ability to provide the reliable
platforms, the reliable networks, and the reliable service-enabling features needed to meet
subscriber application requirements.
This paper focuses on the set of mechanisms supported by JUNOS software that are used to
deliver dependable networks. These techniques include:
■ Synchronous Optical Network (SONET) Automatic Protection Switching (APS) or
Synchronous Digital Hierarchy (SDH) Multiplex Section Protection (MSP)
■ Virtual Router Redundancy Protocol (VRRP)
■ Link aggregation and redundancy capabilities
    ■ Multilink Point-to-Point Protocol (MLPPP) for T1/E1 bonding
    ■ IEEE 802.3ad Ethernet link aggregation
    ■ SONET/SDH link aggregation
■ Reliability for Multiprotocol Label Switching (MPLS) label-switched paths (LSPs)
    ■ MPLS secondary paths
    ■ MPLS fast reroute
    ■ MPLS link protection
All of these strategies involve providing backup resources within the network to support the
forwarding of traffic in the presence of link or node failures.
A Recovery Framework
There are two basic models for physical and logical path recovery: rerouting and protection
switching. Rerouting uses the dynamic establishment of new paths to restore the flow of traffic
after a link or router failure. Protection switching uses pre-established paths to restore the flow
of traffic after a link or router failure. Protection switching can also involve the pre-reservation
of network resources along the protection path.
The recovery framework consists of three recovery models:
■ Recovery cycle
■ Reversion cycle
■ Dynamic rerouting cycle
Recovery Cycle
The recovery cycle uses protection switching to describe the individual steps required to detect
a network failure and then restore traffic to a recovery path. If the recovery path is not optimal,
then it may be followed by the reversion cycle or the dynamic rerouting cycle.
Figure 2: The Recovery Cycle
(Timeline. Events, in order: Traffic Flows on Primary Path; Primary Path Failure; Failure
Detected; Start of Notification; Start of Recovery Operation; Recovery Operation Complete;
Traffic Flows on Recovery Path. Intervals between events, in order: Primary Path Time; Fault
Detection Time; Hold-off Time; Notification Time; Recovery Operation Time; Traffic
Restoration Time.)
Each timing interval in the recovery cycle (Figure 2) is defined below:

■ The Primary Path Time is the time that the primary path is operational and traffic flows on
the primary path.

■ The Fault Detection Time is the time between the moment the failure occurs and the moment
the failure is detected by the recovery mechanism.
■ The Hold-Off Time is the configured waiting time between the moment the failure is
detected by the fault recovery mechanism and the start of the recovery operation. The
Hold-Off Time may occur after the Notification Time if the node responsible for the
switchover is configured to wait. In some implementations the Hold-Off Time can be zero
seconds.
■ The Notification Time is the time between the transmission of the fault indication signal by
the node detecting the fault and the start of the first recovery action by the node responsible
for the switchover. The Notification Time is zero if the node responsible for detecting the
failure is the same node that is responsible for the switchover.
■ The Recovery Operation Time is the time between the first recovery action and the last
recovery action.
■ The Traffic Restoration Time is the time between the last recovery action and the time when
the traffic flow is restored to the recovery path.
This is an example of the recovery cycle:
1. Traffic flows on the primary path.
2. A link or router on the primary path fails.
3. The fault recovery mechanism detects the failure and transmits a failure indication signal to
the node that is responsible for initiating the protection switching.
4. The failure indication signal is received by the node that is responsible for initiating the
recovery.
5. The node responsible for initiating the recovery initiates a protection switch to a
preconfigured recovery path.
6. The node responsible for the recovery switches traffic from the failed primary path to the
recovery path.
7. Traffic flows on the recovery path.
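The timing intervals defined above add up to the time during which traffic is disrupted. The following is a minimal Python sketch of that arithmetic, assuming the intervals are sequential and do not overlap, as drawn in Figure 2. The class name, field names, and sample millisecond values are hypothetical and are not measurements from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryCycle:
    """Intervals of the recovery cycle, all in milliseconds (illustrative)."""
    fault_detection_ms: float
    hold_off_ms: float
    notification_ms: float
    recovery_operation_ms: float
    traffic_restoration_ms: float

    def outage_ms(self) -> float:
        # Traffic is disrupted from the moment of failure until it flows on
        # the recovery path, so the outage is the sum of all five intervals.
        return (self.fault_detection_ms + self.hold_off_ms
                + self.notification_ms + self.recovery_operation_ms
                + self.traffic_restoration_ms)


# Hypothetical case: the node that detects the failure also performs the
# switchover (Notification Time is zero) and the Hold-Off Time is zero.
local_repair = RecoveryCycle(10.0, 0.0, 0.0, 20.0, 5.0)
print(local_repair.outage_ms())
```

Keeping each interval as a separate field makes it easy to see which term dominates when the same failure is handled by a remote node with a nonzero Notification Time.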
Reversion Cycle
When operating in reversion mode, protection switching requires that traffic be switched back
to the primary path when the failure on the primary path is corrected.
Figure 3: The Reversion Cycle
(Timeline. Events, in order: Traffic Flows on Recovery Path; Primary Path Repaired; Failure
Cleared; Primary Path Available; Start of Reversion Operation; Reversion Operation Complete;
Traffic Flows on Primary Path. Intervals between events, in order: Recovery Path Time; Fault
Clearing Time; Wait-to-restore Time; Notification Time; Reversion Operation Time; Traffic
Restoration Time.)

Each timing interval in the reversion cycle (Figure 3) is defined below:

■ The Recovery Path Time is the time that the primary path is not operational and traffic flows
on the recovery path.
■ The Fault Clearing Time is the time between the repairing of the primary path failure and the
time when the reversion mechanism learns that the failure has been cleared.
■ The Wait-to-Restore Time is the time between the clearing of the failure and the first
reversion action. The Wait-to-Restore Time may occur after the Notification Time if the
node responsible for the switchback is configured to wait.
■ The Notification Time is the time between the initiation of the reversion signal and the time
at which the first reversion action is taken.
■ The Reversion Operation Time is the time between the first reversion action and the last
reversion action.
■ The Traffic Restoration Time is the time between the last reversion action and the time when
the traffic flow is restored to the primary path.
This is an example of the recovery cycle followed by a reversion cycle:
1. Traffic flows on the primary path.
2. A link or router on the primary path fails.
3. The fault recovery mechanism detects the failure and transmits a failure indication signal to
the node that is responsible for initiating the protection switching.
4. The failure indication signal is received by the node that is responsible for initiating the
recovery.
5. The node responsible for the recovery initiates a protection switch to a preconfigured
recovery path.
6. The node responsible for the recovery switches traffic from the failed primary path to the
recovery path.
7. Traffic flows on the recovery path.
8. The failure on the primary path is corrected.
9. The reversion mechanism detects the correction of the failure on the primary path and
transmits a failure corrected indication signal to the node that is responsible for initiating
the reversion.
10. The failure corrected indication signal is received by the node that is responsible for
initiating the reversion.
11. The node responsible for the reversion initiates a protection switch from the recovery path
to the corrected primary path.
12. The node responsible for the reversion switches traffic from the recovery path back to the
corrected primary path.

Dynamic Rerouting Cycle

The dynamic rerouting cycle is designed to bring the network back to a stable state after a
network outage occurs. The dynamic rerouting cycle creates a reoptimized network after the
routing protocols converge by moving traffic from the recovery path to the new primary path.
Figure 4: The Dynamic Rerouting Cycle
(Timeline. Events, in order: Network Enters Semi-stable State; Dynamic Routing Protocols
Converge; Initial Setup of New Working Path; Switchover Operation Complete; Traffic Moved to
a New Working Path. Intervals between events, in order: Route Convergence Time; Hold-down
Time (optional); Switchover Time; Traffic Restoration Time.)
Each timing interval in the dynamic rerouting cycle (Figure 4) is defined below:
■ The Route Convergence Time is the time it takes for routing protocols to converge and for the
network to enter a stable state.
■ The Hold-Down Time is the time that the recovery path is used.
■ The Switchover Time is the time between the first switchover action and the last switchover
action.
■ The Traffic Restoration Time is the time between the last switchover action and the time when
the traffic flow is restored to the new working path.
This is an example of protection switching followed by dynamic rerouting:
1. Traffic flows on the primary path.
2. A link or router on the primary path fails.
3. The fault recovery mechanism detects the failure and transmits a failure indication signal to
the node that is responsible for the protection switching.
4. The failure indication signal is received by the node that is responsible for the protection
switching.
5. The node responsible for the protection switching initiates a protection switch to a
preconfigured recovery path.
6. The node responsible for the protection switching switches traffic from the failed primary
path to the recovery path.
7. Traffic flows on the recovery path.
8. The network enters a semi-stable state.
9. Dynamic routing protocols converge after the failure.
10. A new optimized primary path is calculated.
11. The new primary path is established.
12. Traffic is switched from the protection path to the new optimized primary path.
13. Traffic flows on the new optimized primary path.

SONET Automatic Protection Switching (APS)

Synchronous Optical Network (SONET) is the current transmission, multiplexing,
management, and interoperability standard for high-speed fiber-optic networks in North
America. The Synchronous Digital Hierarchy (SDH) is a closely related standard that has been
deployed in Europe and Asia. Although there are many similarities between SONET and SDH,
there are some important differences with respect to terminology. To avoid confusion, the
following discussion uses SONET, not SDH, terminology.
SONET Automatic Protection Switching (APS) provides the ability to restore Layer 1
connectivity after a network failure. The equivalent protocol in an SDH network is known as
multiplex section protection (MSP). The types of operational problems that can be corrected
with SONET APS include fiber cuts, laser failures, node failures, and signal degradation.
SONET network elements constantly monitor the well-being of the network. When a failure is
detected by one or more network elements, the network follows a standardized procedure to
transfer traffic from the primary facility (or “working” channel) to a backup facility (or
“protect” channel). The maximum allowable switchover time from the working channel to the
protect channel is 60 milliseconds.
SONET APS 1+1 and APS 1:N

SONET/SDH APS has two distinct flavors: ring APS and linear APS. For routers, only linear
APS applies. There are two types of linear APS architectures: APS 1+1 and APS 1:N (Figure 5).
■ The APS 1+1 architecture supports line redundancy by providing a dedicated protect
channel for each working channel. Traffic is transmitted simultaneously on two separate
fibers from the head end node to the tail end node producing a working signal and a
protect signal that are identical. The tail end node monitors the quality of both signals and
selects a signal from one of the two fibers for reception. To be effective, the protect channel
should follow a different physical path from the working channel.
■ The APS 1:N architecture provides one protect channel that protects N working channels,
where N is greater than or equal to 1. In 1:N protection, there are still multiple fibers
between the head end and the tail end nodes but traffic is transmitted only over the
working facilities, while the protect channel is kept free until a working channel fails. If
multiple working channels fail, the APS protocol ensures that traffic from only one of the
failed working channels is switched to the protect channel.
In both architectures, the failure of the working channel causes the protect channel to
automatically take over and restore data flow to the network.
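The 1:N rule that only one failed working channel can use the protect channel can be sketched in a few lines of Python. This paper does not say how the APS protocol chooses among several failed channels, so the tie-break below (lowest channel number wins) is an assumption made purely for illustration, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def select_protected_channel(failed_channels: Iterable[int]) -> Optional[int]:
    """Return the single working channel that is switched to the protect channel.

    Assumption for illustration only: the lowest-numbered failed channel wins.
    Returns None when no working channel has failed.
    """
    failed = sorted(set(failed_channels))
    return failed[0] if failed else None


print(select_protected_channel([]))      # no failure: protect channel stays free
print(select_protected_channel([3]))     # channel 3 is switched to protect
print(select_protected_channel([4, 2]))  # two failures, only one is protected
```

The point of the sketch is the return type: at most one channel is ever selected, which is why 1:N protection cannot cover simultaneous failures of several working channels.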
Figure 5: SONET APS 1+1 and APS 1:N
(Diagram, two panels with ADMs. 1+1 APS Architecture: one working channel and one protect
channel. 1:N APS Architecture: working channels 1 through N and one protect channel.)
The working channel and the protect channel constitute a bidirectional protection pair. Each
end of the link maintains an APS state machine to process the received protection group line
events and management requests. Signaling is sent between the APS state machines using the
K1/K2 overhead bytes (also called the “APS channel”) in the SONET frame. The setting of the
bits in the received K1/K2 bytes determines the current state of the remote end and whether a
switch operation is required. If a network failure is detected, the protect switchover is
coordinated by changing various bits within the K1/K2 bytes.
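To make the role of the K1 byte concrete, the following Python sketch splits a received K1 value into a request code and a channel number. This paper does not give the bit layout, so both the layout (upper four bits carry the request, lower four bits carry the channel number) and the request-code table are assumptions recalled from the linear APS specifications; verify them against the standard before relying on them. The function and table names are hypothetical.

```python
from typing import Tuple

# ASSUMPTION: request codes carried in the upper four bits of K1, as recalled
# from the linear APS specifications. Check the standard before use.
K1_REQUESTS = {
    0b1111: "Lockout of Protection",
    0b1110: "Forced Switch",
    0b1101: "Signal Fail (high priority)",
    0b1100: "Signal Fail (low priority)",
    0b1011: "Signal Degrade (high priority)",
    0b1010: "Signal Degrade (low priority)",
    0b1000: "Manual Switch",
    0b0110: "Wait-to-Restore",
    0b0100: "Exercise",
    0b0010: "Reverse Request",
    0b0001: "Do Not Revert",
    0b0000: "No Request",
}


def decode_k1(k1: int) -> Tuple[str, int]:
    """Split a K1 byte into (request name, channel number)."""
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 must be a single byte")
    request = K1_REQUESTS.get(k1 >> 4, "Unassigned")  # upper nibble
    channel = k1 & 0x0F                               # lower nibble
    return request, channel


print(decode_k1(0xC1))
```

An APS state machine would compare the decoded request from the remote end with its own local conditions to decide whether a switch operation is required.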
APS switching can operate in revertive or nonrevertive mode. When operating in revertive
mode and the original working channel is repaired, traffic is switched back to the original
working channel from the protect channel. When operating in nonrevertive mode and the
original working channel is repaired, traffic remains on the protect channel and is not switched
back to the original working channel.
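The difference between the two modes can be expressed as a small decision function. This is a sketch only: the function name and parameters are hypothetical, and the configurable wait before switching back (called revert_time_s here) is modeled as a simple threshold on the time elapsed since the repair.

```python
from typing import Optional


def active_channel(revertive: bool,
                   seconds_since_repair: Optional[float],
                   revert_time_s: float) -> str:
    """Return which channel carries traffic after a switch to the protect channel.

    seconds_since_repair is None while the working channel is still failed.
    """
    if seconds_since_repair is None:
        return "protect"   # working channel has not been repaired yet
    if not revertive:
        return "protect"   # nonrevertive: traffic stays on protect after repair
    # Revertive: wait out the configured delay, then switch back.
    return "working" if seconds_since_repair >= revert_time_s else "protect"


print(active_channel(True, 30.0, 60.0))    # repaired, still waiting
print(active_channel(True, 60.0, 60.0))    # wait elapsed, traffic reverts
print(active_channel(False, 600.0, 60.0))  # nonrevertive, never reverts
```

Waiting before reverting avoids switching traffic back onto a working channel that is repaired only briefly and then fails again.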
Juniper Networks Implementation

The JUNOS implementation of SONET APS/SDH MSP allows you to protect against circuit
failures between an Add Drop Multiplexer (ADM) and one or more routers, and between
multiple interfaces in the same router. The JUNOS software supports APS 1+1 switching,
bidirectional only (where both fibers go to protect, even if only one fails), and either revertive or
nonrevertive mode. Similar to all router APS 1+1 implementations, JUNOS software does not
transmit identical data on both the working and protect channels (as the APS specification
requires for 1+1 switching) but this has no operational impact.
To configure APS, you configure a working channel and a protect channel as a group (Figure 6).
■ To protect against a PIC or FPC failure, you connect one router to the ADM through both
the working and protect channels, configuring one of the PICs or FPCs as the working
channel and the second PIC or FPC as the protect channel. If the working PIC or FPC fails,
traffic is automatically switched over to the protect PIC or FPC.
■ To protect against a router failure, you connect two routers to the ADM and configure one
of the routers as the working router (Router A) and the other router as the protect router
(Router B). If Router A fails, traffic is automatically switched over to Router B.
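Figure 6: APS Configuration Topologies
(Diagram, two panels. Protecting Against a PIC or FPC Failure: one router connects to the ADM
over both the working channel and the protect channel. Protecting Against a Router Failure:
Router A connects to the ADM over the working channel and Router B connects over the
protect channel.)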
The working and protect configurations on the router interfaces must match the circuit
configurations on the ADM. That is, the working router must be connected to the ADM’s
working channel, and the protect router must be connected to the ADM’s protect channel.

To configure APS, you need to include the sonet-options aps statement:

aps {
    advertise-interval milliseconds;
    authentication-key key;
    force;
    hold-time milliseconds;
    lockout;
    neighbor address;
    paired-group group-name;    # group names are just "matched" among
    protect-circuit group-name; # interfaces & not defined independently
    request;
    revert-time seconds;
    working-circuit group-name;
}
The configuration parameters discussed in this section include the following:

■ The protect-circuit parameter allows you to identify the protect channel in an APS
circuit pair.
■ The working-circuit parameter allows you to identify the working channel in an APS
circuit pair.
■ The revert-time parameter allows you to configure APS revertive mode and to specify
the number of seconds to wait after the working channel has again become functional
before switching traffic over to the working channel.
■ The neighbor parameter allows you to specify the IP address of the other router in a router
pair when the working and protect channels are configured on different routers.
■ The authentication-key parameter defines an authentication password that is used
when the working and protect channels are configured on different routers. The
authentication-key parameter should be identical on both the working and protect
routers.
■ The paired-group parameter allows you to configure load sharing between two
working-protect channel pairs.
Example 1: Protecting Against a PIC Failure

Figure 7 shows a common APS 1+1 application where one PIC port on a router is configured to
be the working channel and another PIC port on the same router is configured to be the protect
channel.
Figure 7: Protecting Against a PIC Failure
(Diagram: one router connects to the ADM over the working channel on so-1/2/1 and over the
protect channel on so-4/0/2.)

The configuration for basic 1+1 APS support is very simple:

[edit interfaces so-1/2/1 sonet-options]
aps {
    working-circuit San-Jose; # groups this interface with so-4/0/2
}
[edit interfaces so-4/0/2 sonet-options]
aps {
    protect-circuit San-Jose; # groups this interface with so-1/2/1
}
Using the SONET group-name San-Jose for both the working circuit and the protect circuit
parameters defines the APS association between the two PIC interfaces.
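Because the association depends entirely on the two group names matching, a misspelled name silently leaves both interfaces unpaired. The following Python sketch checks a list of circuits for that condition. It is an offline illustration, not a JUNOS feature; the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

# (interface, role, group-name) as configured in this example.
circuits = [
    ("so-1/2/1", "working", "San-Jose"),
    ("so-4/0/2", "protect", "San-Jose"),
]


def unpaired_groups(circuits):
    """Return group names that lack exactly one working and one protect circuit."""
    roles = defaultdict(list)
    for _interface, role, group in circuits:
        roles[group].append(role)
    return sorted(group for group, found in roles.items()
                  if sorted(found) != ["protect", "working"])


print(unpaired_groups(circuits))  # matching names: nothing to report

# A group name typed with a space instead of a hyphen no longer matches.
mistyped = circuits[:1] + [("so-4/0/2", "protect", "San Jose")]
print(unpaired_groups(mistyped))
```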
Example 2: Protecting Against a Router Failure

Figure 8 shows a typical application for APS 1+1 where one router is configured to be the
working router and a second router is configured to be the protect router.
Figure 8: Protecting Against a Router Failure
(Diagram: Router A (200.5.2.3) connects to the ADM over the working channel on so-1/2/1.
Router B (200.5.2.4) connects to the ADM over the protect channel on so-4/0/2.)
Router A is configured with the following parameters:

aps {
    working-circuit Paris;     # groups interface with so-4/0/2 on Rtr B
    authentication-key "$123"; # must be set to same value on each router
    neighbor 200.5.2.4;        # Rtr B’s address on the link between A & B
}
Router B is configured with the following parameters:

aps {
    protect-circuit Paris;     # groups interface with so-1/2/1 on Rtr A
    authentication-key "$123"; # must be set to same value on each router
    neighbor 200.5.2.3;        # Rtr A’s address on the link between B & A
}

This configuration is sufficient to ensure that if Router A fails, traffic will be switched over to
the protect channel on Router B.
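The two configurations work only if they agree with each other: same group name, same authentication key, and neighbor addresses that point at each other. A minimal Python sketch of that cross-check follows. The ApsEnd class and pair_problems function are hypothetical names for an offline check and are not part of JUNOS software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ApsEnd:
    address: str    # this router's address on the link between the routers
    role: str       # "working" or "protect"
    group: str      # APS group-name
    key: str        # authentication-key
    neighbor: str   # configured neighbor address


def pair_problems(a: ApsEnd, b: ApsEnd) -> List[str]:
    """Return a description of each mismatch between two ends of one APS pair."""
    problems = []
    if a.group != b.group:
        problems.append("group names differ")
    if {a.role, b.role} != {"working", "protect"}:
        problems.append("need one working and one protect circuit")
    if a.key != b.key:
        problems.append("authentication keys differ")
    if a.neighbor != b.address or b.neighbor != a.address:
        problems.append("neighbor addresses do not point at each other")
    return problems


router_a = ApsEnd("200.5.2.3", "working", "Paris", "$123", "200.5.2.4")
router_b = ApsEnd("200.5.2.4", "protect", "Paris", "$123", "200.5.2.3")
print(pair_problems(router_a, router_b))
```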
Example 3: APS Load Sharing between Circuit Pairs

When two routers are connected to a single ADM, each router can back up the other on two
different pairs of circuits. This example requires two links to the same ADM on each router. Figure
9 shows a typical application where APS is configured to provide load balancing between two
routers if one of the working circuits fails. Router A has a working interface for SF traffic
(so-7/0/0) and a protect interface for NY traffic (so-0/0/0). Router B has a working interface
for NY traffic (so-6/0/0) and a protect interface for SF traffic (so-1/0/0). Under normal
operating conditions, Router A carries SF traffic through interface so-7/0/0 and Router B
carries NY traffic through interface so-6/0/0.
Figure 9: APS Load Sharing Between Circuit Pairs (Initial Configuration)
(Diagram: both routers connect to one ADM. Router A (200.5.2.3): so-7/0/0 carries SF
(working), so-0/0/0 carries NY (protect). Router B (200.5.2.4): so-6/0/0 carries NY (working),
so-1/0/0 carries SF (protect).)
To support load sharing between circuit pairs, Router A is configured with the following
parameters:
aps {
    working-circuit SF;        # groups interface with so-1/0/0 on Rtr B
    authentication-key "$123"; # key for SF traffic
    neighbor 200.5.2.4;        # Rtr B’s address on link between A & B
    paired-group SF-NY;        # load sharing between two pairs
}

aps {
    protect-circuit NY;        # groups interface with so-6/0/0 on Rtr B
    authentication-key "$987"; # key for NY traffic
    neighbor 200.5.2.4;        # Rtr B’s address on link between A & B
}
To support load sharing between circuit pairs, Router B is configured with the following
parameters:
aps {
    working-circuit NY;        # groups interface with so-0/0/0 on Rtr A
    authentication-key "$987"; # key for NY traffic
    neighbor 200.5.2.3;        # Rtr A’s address on link between B & A
}

aps {
    protect-circuit SF;        # groups interface with so-7/0/0 on Rtr A
    authentication-key "$123"; # key for SF traffic
    neighbor 200.5.2.3;        # Rtr A’s address on link between B & A
}
Figure 10 shows that when the working channel carrying SF traffic on Router A fails, the two
routers automatically switch SF traffic from its original working interface (so-7/0/0 on Router
A) to its protect channel (so-1/0/0 on Router B). However, at this point Router B is required to
carry both SF traffic and NY traffic. To ensure that each router is required to carry only a single
circuit’s worth of traffic, the two routers also switch NY traffic from its original working
interface (so-6/0/0 on Router B) to its protect interface (so-0/0/0 on Router A). After the
switchover occurs, the working interface on Router A (so-0/0/0) carries NY traffic and the
working interface on Router B (so-1/0/0) carries SF traffic.
Figure 10: APS Load Sharing Between Circuit Pairs (Failure Switchover)
(Diagram: both routers connect to one ADM. Router A (200.5.2.3): so-7/0/0 SF is now
(protect), so-0/0/0 NY is now (working). Router B (200.5.2.4): so-6/0/0 NY is now (protect),
so-1/0/0 SF is now (working).)
Virtual Router Redundancy Protocol (VRRP)

A host or server can use several different mechanisms to determine its first-hop router
towards a specific IP destination. These mechanisms include running a dynamic routing
protocol such as the Routing Information Protocol (RIP) or Open Shortest Path First (OSPF),
snooping a dynamic routing protocol, executing an ICMP router discovery client, or using a
statically configured default route. The use of statically configured default routes has been
widely deployed in large IP networks because it minimizes end system administration,
overcomes potential security issues, does not require active participation by all hosts on a
network, and is supported by virtually all IP host implementations.
Figure 11 shows a typical deployment where a single default router connects a server farm to
the Internet. The servers can communicate with any host on the Internet as long as a route
exists between them. However, the use of statically configured default routes in host systems
creates a single point of failure because the failure of the default router isolates all hosts that
rely on the default router for Internet connectivity.

Figure 11: Statically Configured Default Routes Create A Single Point of Failure
(Diagram: Server Farm, Switch, Default Router, Internet.)
The Virtual Router Redundancy Protocol (VRRP), defined in RFC 2338, is specifically designed
to protect against such failures by allowing two or more different routers to work together and
support a single virtual router for hosts and servers on a LAN. The routers in a VRRP group
share the IP and MAC addresses of the virtual router defined for the group. The IP address of
this virtual router is configured as the default gateway address on host systems. At any
moment in time, one of the VRRP routers is the master router (actively forwards packets) and
the other VRRP routers in the group serve as backups. VRRP supports the immediate and
automatic transfer of the routing responsibility (by means of the virtual router) from one
physical router to another physical router if the original default router fails. Furthermore,
VRRP is designed to operate in load-sharing environments and allow each load-sharing router
to act as a redundant backup for the others.
VRRP Operational Model

The operation of VRRP is discussed in two examples:
■ The first example is simple to help you understand the basic operation of the VRRP
protocol. It does not reflect what you are likely to see deployed in an operational network.
■ The second example is more complex and represents what you are likely to see deployed in
an operational network.
Example 1: Basic VRRP Configuration

Figure 12 shows two VRRP routers that are configured to support a single virtual router. Each
server is configured with a static default route to the IP address associated with virtual router
#1 (IP_VR1).

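Figure 12: Typical VRRP Topology
[Figure: a server farm connects through a switch to two VRRP routers, R1 (master for VR#1) and R2 (backup for VR#1), which connect to the Internet. Address labels shown: IP_VR1, IP_2.]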
R1 is configured with the following parameters:

vrrp-group 1 # identifies virtual router #1
virtual address = IP_VR1 # IP address for virtual router #1
priority = 255 # priority to become master for vr #1
R2 is configured with the following parameters:

vrrp-group 1
virtual address = IP_VR1
priority = 100
During the initialization process, each VRRP router determines if it is the master router or a
backup router for virtual router #1. For this example, R1 becomes the master router because it
has a higher priority than R2. R2 becomes a backup router because it has a lower priority than
R1.
After determining that it is the master router for virtual router #1, R1 does the following:
■ Transmit an advertisement to the other VRRP routers on the LAN declaring itself the
master router for virtual router #1.
■ Broadcast a gratuitous Address Resolution Protocol (ARP) request containing the virtual
router IEEE 802 MAC address (00-00-5E-00-01-<vrrp-group>) associated with virtual router
#1 so that switches can learn the location of the virtual router #1’s MAC address.
■ Initialize the VRRP advertisement-timer to the value set in the advertisement-interval.
■ Transition to the master state.
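The virtual router MAC address announced in the gratuitous ARP is derived directly from the VRRP group number. The following Python sketch illustrates the mapping; the function name is hypothetical, and the 1-255 range check assumes the group number range defined in RFC 2338.

```python
def vrrp_virtual_mac(vrrp_group: int) -> str:
    """Return the virtual router MAC address for a VRRP group number."""
    # Assumption: group numbers (VRIDs) are limited to 1-255, per RFC 2338.
    if not 1 <= vrrp_group <= 255:
        raise ValueError("vrrp-group must be in the range 1-255")
    # Fixed prefix 00-00-5E-00-01 followed by the group number in hex.
    return "00-00-5E-00-01-{:02X}".format(vrrp_group)


print(vrrp_virtual_mac(1))    # MAC address for virtual router #1
print(vrrp_virtual_mac(100))  # MAC address for virtual router #100
```

Because the address depends only on the group number, every router in the group computes the same MAC address, which is what allows a backup router to take over without any change on the hosts.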
After transitioning to master state for virtual router #1, R1 functions as the forwarding router
for the IP address associated with virtual router #1. While in this state, R1 does the following:
■ Respond to host ARP requests for the IP address associated with virtual router #1.
■ Forward packets it receives with destination MAC addresses equal to the virtual router #1
MAC address.
■ Accept packets addressed to the IP address(es) associated with virtual router #1.
■ Broadcast advertisements at regular intervals (specified by advertisement-interval) to inform
backup routers that it is still alive and acting as the master router for virtual router #1.

As a backup router for virtual router #1, R2 monitors the availability and state of the master
router. While in this state, R2 is responsible for the following:
■ Starting the master-down-timer and setting the master-down interval. The master-down interval
is typically 3X the advertisement-interval specified for the virtual router.
■ Receiving advertisements from the master router and verifying their validity.
■ Assuming the role of the master router, when preemption is enabled, if it receives an
advertisement from another router with a lower priority than its own.
■ Assuming the role of master router if it does not receive an advertisement from the master
router within the master-down interval.
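The master-down interval can be illustrated with a short calculation. The Python sketch below assumes the formula given in RFC 2338, which adds a small priority-dependent skew time to three advertisement intervals so that higher-priority backups time out slightly earlier. The function names are illustrative only.

```python
def skew_time(priority: int) -> float:
    """Skew time in seconds, assuming the RFC 2338 formula (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advertisement_interval: float, priority: int) -> float:
    """Seconds a backup waits without hearing the master before taking over."""
    return 3 * advertisement_interval + skew_time(priority)


# R2 from Example 1: priority 100, default 1-second advertisement interval.
print(master_down_interval(1, 100))
# A backup with a higher priority waits slightly less time.
print(master_down_interval(1, 200))
```

With the default advertisement interval of 1 second, a backup router therefore declares the master down after a little more than 3 seconds.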
If R1 fails, R2 detects the failure because its master-down-timer expires. R2 assumes the role of
master router for virtual router #1 (vrrp-group = 1), transmits an advertisement to the other
VRRP routers in the group declaring that it is the master router for virtual router #1,
broadcasts a gratuitous ARP using the same virtual router MAC address
(00-00-5E-00-01-<vrrp-group>) so that switches can learn the new location of the virtual router
MAC address, and initializes its advertisement-timer. As a result of these actions, LAN switches
learn the new location of the virtual router and the transition is completely transparent to end
systems.
In this example, if R1 fails, it is backed up by R2. However, if R2 fails, it is not backed up by R1.
Consequently, this example does not provide efficient asset or bandwidth utilization because
R2 remains idle and forwards traffic only when R1 fails. To provide better asset utilization,
support load balancing for outgoing traffic, and allow R1 to back up R2, a second virtual router
must be defined. A configuration supporting these capabilities is illustrated in the next
example.
Example 2: VRRP Load-Sharing Configuration

Figure 13 shows two VRRP routers that are configured to support two virtual routers, virtual
router #1 and virtual router #2. Servers A and B are configured with a static default route to the
IP address associated with virtual router #1 (IP_VR1). Servers C and D are configured with a
static default route to the IP address associated with virtual router #2 (IP_VR2).
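Figure 13: Typical VRRP Topology
[Figure: servers A, B, C, and D in a server farm connect through a switch to two VRRP routers, which connect to the Internet. R1 is master for VR#1 and backup for VR#2; R2 is master for VR#2 and backup for VR#1. Address labels shown: IP_VR1, IP_VR2.]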

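R1 is configured to be a participant in two virtual router groups:

vrrp-group 1 # virtual router #1
virtual address = IP_VR1
priority = 255

vrrp-group 2 # virtual router #2
virtual address = IP_VR2
priority = 100
Similarly, R2 is configured to participate in two virtual router groups:

vrrp-group 1 # virtual router #1
virtual address = IP_VR1
priority = 100

vrrp-group 2 # virtual router #2
virtual address = IP_VR2
priority = 255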
During the initialization process, each VRRP router determines its role in each of the virtual
router groups. Based on the configured priority values, R1 becomes the master router for
virtual router #1 and a backup router for virtual router #2. R2 becomes the master router for
virtual router #2 and a backup router for virtual router #1. This configuration has the effect of
load-balancing outgoing traffic across two virtual routers while also providing full
redundancy. R1 can assume the role as master router for virtual router #2 if R2 fails, and R2 can
become the master router for virtual router #1 if R1 fails.
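The role assignment in this example can be sketched in a few lines of Python. This is a simplified model rather than the VRRP state machine: it assumes that the live router with the highest configured priority is master for each group, and it ignores timers, preemption settings, and tie-breaking between equal priorities.

```python
from typing import Dict, Optional, Set

# Configured priorities from Example 2: router -> {vrrp-group: priority}
PRIORITIES: Dict[str, Dict[int, int]] = {
    "R1": {1: 255, 2: 100},
    "R2": {1: 100, 2: 255},
}


def elect_master(group: int, alive: Set[str]) -> Optional[str]:
    """Return the live router with the highest priority for the group."""
    candidates = [r for r in alive if group in PRIORITIES.get(r, {})]
    if not candidates:
        return None
    return max(candidates, key=lambda r: PRIORITIES[r][group])


def masters(alive: Set[str]) -> Dict[int, Optional[str]]:
    """Map each virtual router group to its current master."""
    groups = sorted({g for prios in PRIORITIES.values() for g in prios})
    return {g: elect_master(g, alive) for g in groups}


print(masters({"R1", "R2"}))  # normal operation: traffic is shared
print(masters({"R1"}))        # R2 has failed: R1 serves both groups
print(masters({"R2"}))        # R1 has failed: R2 serves both groups
```

In normal operation each router is master for one group, so servers A and B forward through R1 while servers C and D forward through R2. When either router fails, the survivor becomes master for both groups.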

The JUNOS software implementation supports the configuration of VRRP on Fast Ethernet
and Gigabit Ethernet interfaces. To configure VRRP, you need to include the vrrp-group
statement when configuring each physical router in a VRRP group:
vrrp-group group-number {               # virtual router number
    virtual-address [address];          # virtual router IP address
    priority number;                    # priority to become master router
    (accept-data|no-accept-data);
    advertise-interval seconds;
    authentication-type authentication;
    authentication-key key;
    (preempt|no-preempt);
    track {
        interface interface-name priority-cost cost;
    }
}
In addition to the previously defined parameters, the following parameters can also be
configured:
■ The accept-data|no-accept-data option specifies whether the interface accepts
packets addressed to the virtual IP address. The default value is accept-data.
■ The advertise-interval parameter specifies the interval between VRRP advertisement
packets by the master router to other members of the VRRP group. The default value is 1
second.
■ The authentication-type parameter specifies whether authentication is enabled and the
authentication scheme used by the VRRP group. The valid configuration options are none
(authentication disabled), simple (a simple text password), or md5 (packet checksum). The
default value is none.
■ The authentication-key parameter defines the authentication password. All routers in
the VRRP group must use the same authentication scheme and password.
■ The preempt|no-preempt option specifies whether a higher priority backup router can
preempt a lower priority master router. The default value is preempt.
Additionally, JUNOS software can provide redundancy for virtual routing domains on IEEE
802.1Q tagged interfaces. The IEEE 802.1Q standard allows you to provision multiple VLANs
by defining multiple logical interfaces on a specific Ethernet interface.
Finally, JUNOS software extends VRRP to provide the ability to track the up or down state of
other system interfaces. This feature allows a router in a VRRP group to monitor up to 10 local
interfaces and then dynamically change its priority within the VRRP group based on the state
of the tracked interfaces. A user-configured priority-cost defines the value that is
subtracted from the router’s priority parameter when a tracked interface transitions from up
to down. Tracking allows a change in the state of a non-VRRP interface to potentially trigger a
new master router election because the priority of one of the routers in the VRRP group is
changed (reduced). The sum of the costs for all the interfaces tracked must be less than or equal
to the system’s configured priority for the VRRP group.
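The effect of interface tracking on a router's priority can be sketched as follows. The Python example is illustrative only: the function names and the interface names are hypothetical, and the two checks simply encode the limits stated above (at most 10 tracked interfaces, and a total cost no greater than the configured priority).

```python
from typing import Dict

MAX_TRACKED_INTERFACES = 10  # limit stated in the text


def validate_tracking(priority: int, costs: Dict[str, int]) -> None:
    """Check the tracking limits for one VRRP group."""
    if len(costs) > MAX_TRACKED_INTERFACES:
        raise ValueError("at most 10 interfaces can be tracked")
    if sum(costs.values()) > priority:
        raise ValueError("sum of priority costs exceeds the configured priority")


def effective_priority(priority: int, costs: Dict[str, int],
                       is_up: Dict[str, bool]) -> int:
    """Configured priority minus the cost of every tracked interface that is down."""
    validate_tracking(priority, costs)
    penalty = sum(cost for name, cost in costs.items() if not is_up[name])
    return priority - penalty


# Hypothetical router with priority 200 tracking two uplinks.
costs = {"so-0/0/0": 60, "so-0/1/0": 60}
print(effective_priority(200, costs, {"so-0/0/0": True, "so-0/1/0": True}))
print(effective_priority(200, costs, {"so-0/0/0": False, "so-0/1/0": True}))
```

If a peer in the same group is configured with a priority between the two printed values, the loss of a single uplink is enough to trigger a new master election.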
Link Aggregation and Redundancy

Link aggregation allows multiple physical or logical network links to be joined together into a
single logical link to provide higher bandwidth data rates. Link aggregation provides a
cost-effective solution for applications that require incremental bandwidth scaling rather than
exponential bandwidth scaling. Additionally, link aggregation provides network redundancy
by load-balancing traffic across all available links. If one of the links should fail, the system
automatically load-balances traffic across all remaining links. Link aggregation is also referred
to as link bonding.
Juniper Networks provides three mechanisms to support link aggregation and redundancy:
■ Multilink Point-to-Point Protocol (MLPPP) for T1/E1 bonding
■ IEEE 802.3ad for Ethernet link aggregation
■ SONET/SDH aggregation
Multilink Point-to-Point Protocol (MLPPP): T1/E1 Link Bonding

The Point-to-Point Protocol (PPP), originally specified in RFC 1548 and subsequently updated
in RFC 2153, provides interoperability between two directly connected systems by supporting
the negotiation of different configuration options, including link quality, link authentication,
and network layer protocols (Figure 14). Although PPP is usually considered a single protocol,
it is actually a set of protocols that work together to provide a broad collection of network
connectivity services. Over the years, the IETF has extended PPP by defining new authentication
and encryption capabilities for security, new compression algorithms, and support for virtually
every major WAN service, including ISDN, Frame Relay, X.25, and SONET/SDH. Despite
widespread deployment in production networks, the fundamental limitation of PPP is that it is
designed to support only a single physical link or logical channel at a time.

Figure 14: PPP vs. MLPPP Connectivity
[Figure: a single PPP link between two systems compared with an MLPPP bundle of multiple links.]
Multilink PPP (MLPPP), defined in RFC 1990, is specifically designed to overcome this
limitation of PPP by providing a method of splitting, recombining, and sequencing datagrams
across multiple physical links or logical channels. By supporting the dynamic addition and
deletion of PPP links over multiple simultaneous communications paths between systems,
MLPPP allows you to bundle individual paths to deliver more bandwidth, provide additional
bandwidth on demand to subscribers, load-distribute traffic across multiple paths, and ensure
that the failure of an individual channel does not disrupt packet forwarding.
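The splitting, sequencing, and recombining performed by MLPPP can be illustrated with a simplified Python model. This is a sketch of the idea only and not the RFC 1990 wire format: it assumes equal-sized fragments distributed round-robin across the links, marks the first and last fragment of each datagram, and ignores sequence number wraparound and lost-fragment timers. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    seq: int      # sequence number used to restore ordering
    begin: bool   # first fragment of the datagram
    end: bool     # last fragment of the datagram
    link: int     # index of the member link that carries the fragment
    data: bytes


def split_datagram(datagram: bytes, num_links: int, first_seq: int = 0) -> List[Fragment]:
    """Split a datagram into at most num_links fragments, one per member link."""
    if num_links < 1:
        raise ValueError("a bundle needs at least one link")
    size = max(1, -(-len(datagram) // num_links))  # ceiling division
    chunks = [datagram[i:i + size] for i in range(0, len(datagram), size)] or [b""]
    last = len(chunks) - 1
    return [Fragment(first_seq + i, i == 0, i == last, i % num_links, chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> bytes:
    """Reorder fragments by sequence number and rebuild the datagram."""
    ordered = sorted(fragments, key=lambda f: f.seq)
    if not ordered or not ordered[0].begin or not ordered[-1].end:
        raise ValueError("incomplete datagram")
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.seq != prev.seq + 1:
            raise ValueError("missing fragment")
    return b"".join(f.data for f in ordered)


fragments = split_datagram(b"0123456789", num_links=4)
# Fragments may arrive out of order because each link has its own delay.
print(reassemble(list(reversed(fragments))))
```

The sequence numbers are what allow the receiver to restore the original byte order even though each member link delivers its fragment independently.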

Juniper Networks supports MLPPP using the Multilink Services PIC (ML PIC) for M5, M10,
M20, and M40 routers. This PIC supports the bonding of multiple T1 links or multiple E1 links
using MLPPP (RFC 1990) or Multilink-Frame Relay (MLFR, FRF.15) protocols. It addresses the
need for NxT1 and NxE1 access service where (subrate) DS3s and E3s are either unavailable or
difficult and expensive to provision. The ML PIC, along with other channelized interfaces,
provides sufficient MLPPP scalability on which to base a broadly deployed NxT1 or NxE1
service, leveraging the prevalence and maturity of T1 and E1 technology, meeting the needs of
subscribers who are outgrowing their T1 and E1 links, and enhancing the reliability of your
network infrastructure.
MLPPP is not a new technology. However, Juniper Networks was the first router vendor to
provide performance-based MLPPP. Despite the prevalence of T1 and E1 circuits, broad
deployment of multi-megabit services based on T1 or E1 has been hindered by the lack of
scalability of software-based MLPPP implementations. These implementations struggle to fill
links within a bundle and are often unstable. In contrast, the ML PIC is an extension of the
ASIC-based forwarding path of the M-Series architecture that can support literally hundreds of
ML bundles in a single chassis, without significantly impacting forwarding performance. This
performance enables providers to move multilink from a niche service for a selected few
customers to a broadly deployed, highly reliable, mainstream service with a service revenue
stream to match. The ML PIC provides a highly scalable solution for bridging the gap between
T1/E1 and DS3/E3.
In addition, the Juniper Networks MLPPP implementation does not restrict the physical
location of the individual link interfaces that are bundled together. The ML PIC can bundle
individual link interfaces residing in any FPC slot in the chassis. This provides an extra level of
flexibility when combining T1/E1 interfaces.
The features supported by the ML PIC include the following:
■ A multilink bundle can contain up to eight links.
■ There are three versions of the ML PIC. The ML-4 supports a maximum of 4 bundles with
an aggregate throughput of 64 Mbps (4 bundles * 8 E1s per bundle). The ML-32 supports a
maximum of 32 bundles with an aggregate throughput of 450 Mbps. The ML-128 supports
a maximum of 128 bundles with an aggregate throughput of 450 Mbps. Mixing different
versions of the ML PIC within a chassis is permitted.
■ A single chassis can support up to 512 distinct bundles.
■ Bundles can be built from any interface within a router chassis.
■ Bonding T1 links enables reliable services ranging from 1.5 Mbps through 12 Mbps.
■ Bonding E1 links enables reliable services ranging from 2 Mbps through 16 Mbps.
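The service ranges quoted above follow from simple multiplication. The Python sketch below assumes the standard line rates of 1.544 Mbps for T1 and 2.048 Mbps for E1, which the text rounds to 1.5 Mbps and 2 Mbps, together with the eight-link bundle limit. The function name is illustrative.

```python
T1_MBPS = 1.544            # standard T1 line rate (assumed; the text rounds to 1.5)
E1_MBPS = 2.048            # standard E1 line rate (assumed; the text rounds to 2)
MAX_LINKS_PER_BUNDLE = 8   # bundle limit stated in the text


def bundle_rate_mbps(link_rate_mbps: float, links: int) -> float:
    """Nominal aggregate rate of a multilink bundle, ignoring protocol overhead."""
    if not 1 <= links <= MAX_LINKS_PER_BUNDLE:
        raise ValueError("a bundle contains between 1 and 8 links")
    return link_rate_mbps * links


for n in (1, 2, 4, 8):
    print(n, round(bundle_rate_mbps(T1_MBPS, n), 3), round(bundle_rate_mbps(E1_MBPS, n), 3))
```

The same arithmetic explains the ML-4 figure: 4 bundles of 8 E1 links at a nominal 2 Mbps each give the quoted 64 Mbps.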
IEEE 802.3ad: Ethernet Link Aggregation

IEEE 802.3ad is the section of the IEEE 802.3 standard that defines Ethernet link aggregation.
Ethernet link aggregation specifies a mechanism that allows you to directly connect two
network elements using multiple parallel Ethernet links. The Ethernet links composing the
aggregate are treated as a single logical link (Figure 15).
Figure 15: Example IEEE 802.3ad Group Topologies
[Figure: three panels, each showing an 802.3ad group: Router to Router, Router to Layer 2 Switch, and Router to Server.]
The primary benefits of IEEE 802.3ad include link redundancy and increased bandwidth.
■ Link redundancy results from the multiple links that constitute the 802.3ad aggregated link.
■ Increased bandwidth results from the ability to establish multiple Ethernet links between
two network elements connected by the aggregated link. This is a cost-effective solution
and relatively simple migration strategy for applications that require incremental
bandwidth scaling rather than exponential scaling.
IEEE 802.3ad includes the following attributes:
■ Packets from the same network flow arrive in the same sequence as they were originally
transmitted because packets from the same flow traverse the same physical link. The
selection of the link used to transmit a particular packet is determined by the packet
distribution algorithm, which is not defined in the IEEE 802.3ad specification.
■ For ARP resolution to work properly, regardless of the link on which traffic will eventually
be forwarded, all ports that are members of an aggregated link must use the same MAC
address.
■ The entity that manages the aggregate link also monitors the ports that make up the
aggregated link for failures. Packets are never forwarded across a physical interface that is
down.
All packets that are sent or received across a physical interface that is part of an 802.3ad group
use the local Virtual MAC for the group (Figure 16). The Virtual MAC for an 802.3ad group
keeps track of the different physical MAC addresses of the links that are members of the group.
The source Virtual MAC distributes packets received from the Virtual MAC client to the physical
interfaces using a load-balancing algorithm and transmits the packet using the local Virtual
MAC address as the source address and the remote Virtual MAC address as the destination
address. When receiving packets, the destination Virtual MAC forwards packets that are
received from the physical MACs associated with the 802.3ad group to its Virtual MAC client.

Figure 16: IEEE 802.3ad Virtual MACs
[Figure: Router A and Router B are connected by an 802.3ad group. On each router, a Virtual MAC client sits above a Virtual MAC, which in turn sits above Physical MAC 1, Physical MAC 2, and Physical MAC 3.]
802.3ad also defines the Link Aggregation Control Protocol (LACP). This protocol allows you
to configure ports so they can automatically join or leave different 802.3ad groups. If the
protocol running between two nodes determines that link aggregation is possible, the links are
automatically aggregated into an 802.3ad group. If the protocol running between two nodes
determines that link aggregation is not possible, the links operate normally as individual links.

The JUNOS implementation of IEEE 802.3ad balances traffic across the links of an aggregated
Ethernet group based on the Layer 3 information carried in the packet header. The JUNOS
implementation uses the same load-balancing algorithm defined for per-packet load balancing.
On routers with the Internet Processor II ASIC, traffic flowing between routers with multiple
paths is first classified into individual traffic flows. Packets associated with a particular flow
are always transmitted over the same physical interface to maintain packet ordering. To
recognize packets belonging to each flow, the router uses the following information:
■ Source IP address
■ Destination IP address
■ Protocol
■ Source port number
■ Destination port number
■ Interface through which the packet entered the router
Each packet that has the same values for this set of parameters is treated as a member of a flow
and is forwarded out the same physical interface as other packets that are members of the flow.
If the 802.3ad group consists of five distinct interfaces, the traffic is load-balanced across all five
interfaces. If one of the five interfaces fails, forwarding continues but packets are now
load-balanced across the remaining four interfaces. If the failed interface returns to an UP state,
the traffic is now load-balanced across all five interfaces. Finally, if the number of UP interfaces
is less than the value of the user-configured minimum-links parameter, the entire 802.3ad
group is declared DOWN.
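The flow-based distribution described above can be sketched in Python. The actual hash used by the Internet Processor II ASIC is not described in this paper, so the CRC-32 hash and the modulo mapping below are assumptions chosen only to show the principle; the type and function names are hypothetical.

```python
import zlib
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int
    in_interface: str


def select_link(flow: Flow, links_up: List[str], minimum_links: int = 1) -> Optional[str]:
    """Pick the member link for a flow, or None if the group is down."""
    # The group is declared down when too few member links are up.
    if not links_up or len(links_up) < minimum_links:
        return None
    # Assumption: any deterministic hash over the six fields would serve here.
    key = "|".join(str(field) for field in flow).encode()
    return links_up[zlib.crc32(key) % len(links_up)]


flow = Flow("10.0.0.1", "10.0.1.1", 6, 1024, 80, "ge-0/0/0")
five_links = ["link-%d" % i for i in range(5)]
print(select_link(flow, five_links))       # always the same link for this flow
print(select_link(flow, five_links[:4]))   # one link failed: flows use the remaining four
print(select_link(flow, five_links[:2], minimum_links=3))  # group is down
```

Because the hash input is identical for every packet of a flow, all of its packets leave on the same physical interface, which preserves packet ordering. A simple modulo mapping such as this one can move many flows to a different link when the number of active links changes.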
The features supported by the JUNOS implementation of 802.3ad include the following:
■ All ports in an 802.3ad group must operate in full-duplex mode.
■ All ports in an 802.3ad group must have the same interface speed (either Fast Ethernet or
Gigabit Ethernet).
■ An 802.3ad group can contain up to eight physical interfaces.
■ A single router can support a maximum of 16 802.3ad groups.
■ An 802.3ad group can be built from any interface within a router chassis. Each physical
interface on a PIC can belong to a different 802.3ad group. Additionally, a physical interface
from a PIC on one FPC and a physical interface from another PIC on a different FPC can
belong to the same 802.3ad group. This means that if the links comprising an 802.3ad group
are allocated across different PICs or FPCs in the same router, connectivity does not depend
on the health of any particular PIC or FPC.
■ LACP is not supported. This means that the links comprising an 802.3ad group must be
explicitly configured.
■ 802.3ad MIB extensions are supported to maintain statistics as well as the operating state of
the Virtual MAC. Because LACP is not supported, the MIB extensions related to LACP are
not supported.
SONET/SDH Aggregation
JUNOS software supports the aggregation of SONET/SDH interfaces. An aggregate
SONET/SDH virtual link is defined by specifying the link number as a physical device and
then associating a set of physical interfaces into a SONET group. The capabilities provided by
SONET/SDH aggregation are similar to those supported by our implementation of IEEE
802.3ad but this feature is not defined in a public standard. Consequently, SONET/SDH
aggregation is proprietary to Juniper Networks and may not be compatible with equipment
from other vendors.
The features supported by the JUNOS implementation of SONET/SDH aggregation include
the following:
■ The transmission of both IP and MPLS traffic is supported.
■ Each port in a SONET/SDH virtual link must have the same interface speed. The interface
speed can be OC-3c, STM-1c, OC-12c, STM-4c, OC-48c, STM-16c, OC-192c, or STM-64c.
■ A SONET/SDH virtual link cannot mix SONET and SDH modes.
■ A SONET/SDH virtual link can contain up to eight different physical interfaces.
■ A single router can support a maximum of 16 SONET/SDH groups.
■ A SONET/SDH virtual link can include any interface on any PIC in any FPC within a
router chassis.
MPLS Label-Switched Path (LSP) Reliability

If protection paths have not been defined and a link or node in a Label Switched Path (LSP)
fails, the ingress Label Switching Router (LSR) continues to forward traffic toward its
destination as native IP packets using the IGP shortest path. When attempting to reroute the
physical path of a failed LSP, the ingress LSR establishes a new LSP to the egress LSR. After the
new route is calculated, the ingress LSR uses the MPLS signaling protocol (RSVP-TE) to
establish forwarding state at each hop in the new path. When the new LSP is established, the
ingress LSR forwards traffic to destinations downstream of the egress LSR across the newly
established LSP (Figure 17).

Figure 17: LSP Recovery Without Protection Paths
[Figure: an active LSP and a backup LSP between the ingress LSR and the egress LSR, annotated with four steps: 1: Outage; 2: Calculate new path; 3: Establish new path; 4: Switch traffic to new path.]
Two mechanisms can be used to make the ingress LSR aware of an LSP outage and the need to
calculate a new path for the LSP:
■ The LSR immediately upstream from the outage signals the failure to the ingress LSR.
■ If the link-state database of the ingress LSR indicates a failed link, the ingress LSR examines
the status of all LSPs traversing the failed link or node.
Unfortunately, this rerouting process can be time-consuming and prone to failures. For
example, the failure notification transmitted by the LSR immediately upstream of the outage
can be lost or the new path for the LSP can be slow to come up due to congestion or a sudden
surge of network activity.
MPLS LSP Protection Mechanisms

Three complementary mechanisms are defined to protect against LSP outages:
■ Secondary paths (standby and non-standby)
■ Fast reroute (one-to-one backup)
■ Link protection (many-to-one or facility backup)
Full traffic protection is afforded when both secondary paths and fast reroute (or link
protection) are configured for an LSP. When a link or node in the primary LSP fails, the LSR
immediately upstream of the failure routes traffic around the outage using the bypass link and
then notifies the ingress LSR of the outage. The bypass link allows traffic to continue flowing
until the ingress LSR processes the failure notification. After receiving the failure notification,
the ingress LSR switches traffic from the patched primary path to the more optimally routed
secondary path.

Secondary Paths
Secondary paths support the configuration of primary and secondary physical paths for an
LSP to protect against link and transit node forwarding plane failures (Figure 18). The primary
path is the preferred path while the secondary path is used as an alternative route when the
primary path fails. There are two types of secondary paths: standby and non-standby. A
standby secondary path is pre-computed and pre-signaled while a non-standby secondary
path is pre-computed but is not pre-signaled.
Figure 18: Standby Secondary Paths
[Figure: primary and secondary paths between the ingress LSR and the egress LSR, annotated with two steps: 1: Outage; 2: Switch traffic to pre-established secondary path.]
If a link or node in the primary path fails, the LSR immediately upstream of the outage uses the
MPLS signaling protocol to notify the ingress LSR of the failure. Upon receipt of the outage
notification, the ingress LSR reroutes traffic from the failed primary path to the secondary path.
The use of standby secondary paths enhances recovery time by eliminating the call-setup delay
that is required to establish a new physical path for the LSP. If the outage in the primary path is
corrected, the ingress LSR, after a few minutes of hold-down to ensure that the primary LSP
remains stable, switches traffic from the secondary path back to the primary path.
Secondary paths are appropriate when used in conjunction with off-line path computation
tools or when the secondary path is less constrained than the primary path. Because resources
are reserved even while the primary path is active, standby secondary paths waste more
resources than non-standby secondary paths. However, the restoration time for standby
secondary paths is faster than for non-standby secondary paths.
Fast Reroute (or 1:1 Protection)

Fast reroute (or one-to-one backup) allows an LSR immediately upstream from an outage to
quickly route around a failed link or node to an LSR downstream of the outage (Figure 19).
This is accomplished by precomputing and pre-establishing detour paths that bypass the
immediate downstream link and the next-hop LSR. The LSR immediately upstream from the
protected link or node calculates a detour path for each LSP. When an outage occurs, the LSR
immediately upstream of the outage switches traffic to the detour path and then signals the
failure to the ingress LSR. Fast reroute provides local repair and allows connectivity to be
restored faster than traffic can be switched by the ingress LSR to a standby secondary LSP. Fast
reroute is only a short-term solution because the detour paths may not provide adequate
bandwidth and the activation of a detour path can result in congestion on bypass links.
Figure 19: Fast Reroute
[Figure: LSP 1 and LSP 2 pass from the upstream LSR to the downstream LSR, annotated with two steps: 1: Outage; 2: Switch traffic on each LSP to its dedicated detour path.]
Fast reroute is appropriate when the number of LSPs to be protected is small relative to the
total number of LSPs, when satisfying the path selection criteria (priority, bandwidth, link
coloring) for detour paths is critical, when control at the granularity of individual LSPs is
important, or when simpler configuration is desired.
Link Protection (or 1:N Protection)

Link protection (or many-to-one backup) allows an LSR immediately upstream from a link
failure to use an alternate interface to forward traffic to its downstream neighbor LSR (Figure
20). This is accomplished by pre-establishing a bypass path that is shared by all protected LSPs
traversing the failed link. A single bypass path safeguards the set of protected LSPs. When an
outage occurs, the LSR immediately upstream of the link outage switches protected traffic to
the bypass link and then signals the link failure to the ingress LSR. Like fast reroute, link
protection provides local repair and allows connectivity to be restored faster than traffic can be
switched by the ingress LSR to a standby secondary LSP. However, unlike fast reroute, link
protection does not provide protection against the failure of the downstream neighbor LSR.
Figure 20: Link Protection
[Figure: LSP 1 and LSP 2 pass from the upstream LSR to the downstream LSR, annotated with two steps: 1: Outage; 2: Switch all LSP traffic to the bypass link.]

Link protection is appropriate when the number of LSPs to be protected is large, when
satisfying path selection criteria (priority, bandwidth, link coloring) for bypass paths is less
critical, when control at the granularity of individual LSPs is a non-goal, or where
configuration complexity is not an issue.

The JUNOS software implementation supports the configuration of secondary paths, standby
secondary paths, fast reroute, and facility link protection. Additionally, it delivers two critical
implementation features—PFE local protection to reduce the switchover time and fate sharing
to enhance the reliability of backup paths.
Secondary Paths
After defining a named path on the ingress LSR, you can configure the ingress LSR to use it as
a standby secondary path by including the secondary statement at the [edit protocols mpls
label-switched-path lsp-path-name] hierarchy level.
The following example configures an ingress LSR to use the secondary path
backup-sf-to-ny-path as a standby secondary path for the LSP sf-to-ny-lsp.
[edit protocols mpls label-switched-path sf-to-ny-lsp]

primary primary-sf-to-ny-path { # Primary path name for the LSP
...
}
secondary backup-sf-to-ny-path { # Standby secondary path name for the LSP
standby; # Path is signaled and always up
...
}
In this example, the primary statement creates the primary path, which is sf-to-ny-lsp’s
preferred path. The secondary statement establishes a pre-calculated and pre-signaled backup
path. If the primary path can no longer be used to reach the egress LSR, the alternative path is
used.
Fast Reroute (One-to-One Backup)

To enable fast reroute for an LSP, you need to include the fast-reroute statement at the [edit
protocols mpls label-switched-path lsp-path-name] hierarchy level on the ingress LSR.
[edit protocols mpls label-switched-path lsp-path-name]
fast-reroute {
bandwidth bps; # Bandwidth associated with the detour path
hop-limit number; # Maximum number of LSRs the LSP can traverse
}
It is not necessary to explicitly configure fast reroute on the transit or egress LSRs of an LSP.
Once fast-reroute is configured on the ingress LSR, the ingress LSR (during the LSP setup
phase) uses RSVP-TE to signal all downstream LSRs that fast-reroute is enabled for the LSP.
This informs each LSR along the physical path of the LSP that it needs to use the constrained
shortest path first (CSPF) algorithm on the information in the local LSR’s traffic engineering
database (TED) to compute and then pre-establish a detour path for the LSP. By default, the
detour path inherits the same administrative group constraints (link coloring or resource
classes) as its parent LSP when the detour path is calculated by CSPF.

In one-to-one backup, each LSP traversing a given node is backed up by a dedicated detour
path. Fast reroute detour paths are always calculated to bypass both the link facing the
immediate downstream neighbor LSR in the LSP and the immediate downstream neighbor
LSR itself (Figure 21). This approach provides protection against both the failure of an adjacent
downstream link and the failure of an adjacent downstream node. Of course, if the network
topology does not contain enough LSRs with enough links to other LSRs, it may
be impossible to establish a detour path.
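The detour calculation can be illustrated with a small path search. The Python sketch below is a deliberate simplification: it uses a breadth-first search on hop count as a stand-in for CSPF and ignores bandwidth, administrative groups, and hop limits. The topology, node names, and function name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Topology = Dict[str, Set[str]]  # node -> directly connected neighbors


def detour_path(topology: Topology, upstream: str, protected_node: str,
                merge_point: str) -> Optional[List[str]]:
    """Shortest path from upstream to merge_point that avoids protected_node.

    Excluding the protected node also excludes the protected link, because
    that link terminates on the protected node.
    """
    queue = deque([[upstream]])
    visited = {upstream, protected_node}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == merge_point:
            return path
        for neighbor in sorted(topology.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the topology offers no alternative


# LSP A -> B -> C -> D, with an alternative route from A to C through E.
topology = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"A", "C"},
}
print(detour_path(topology, upstream="A", protected_node="B", merge_point="C"))
```

Removing node E from this topology leaves no route around B, so the function returns None, which corresponds to the case where no detour path can be established.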
Figure 21: Fast Reroute Detour Path
[Figure: a detour path that bypasses the protected links on either side of a protected node.]
Work is underway in the IETF (draft-ietf-mpls-rsvp-lsp-fastreroute-00.txt) to standardize
MPLS fast reroute solutions so they provide multi-vendor interoperability. The Juniper
Networks approach to fast reroute is referred to as “one-to-one backup” in this draft.
Link Protection (Many-to-One or Facility Backup)

The same IETF draft (draft-ietf-mpls-rsvp-lsp-fastreroute-00.txt) also specifies a second
approach to fast reroute known as “facility backup.” Instead of creating a dedicated detour
path for each LSP, facility backup takes advantage of the label stack to create a single bypass
LSP that protects a set of LSPs. The LSR immediately upstream of the outage pushes a new
label onto the label stack of each redirected packet. At the penultimate hop of the bypass
tunnel, the label is popped from the label stack, exposing the label that identifies the protected
LSP. Comparable to one-to-one backup, facility backup protects against the failure of the
immediate downstream link and, optionally, the failure of the immediate downstream node’s
control plane.
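The label stack operations used by facility backup can be sketched as list manipulation. The Python example below is illustrative only: the label values and function names are hypothetical, index 0 represents the top of the stack, and any label swap performed on the protected LSP itself is omitted.

```python
from typing import List


def push_bypass_label(stack: List[int], bypass_label: int) -> List[int]:
    """LSR upstream of the outage: push the bypass LSP label on top of the stack."""
    return [bypass_label] + stack


def penultimate_hop_pop(stack: List[int]) -> List[int]:
    """Penultimate hop of the bypass tunnel: pop the top label."""
    if not stack:
        raise ValueError("label stack is empty")
    return stack[1:]


BYPASS_LABEL = 2002  # hypothetical label of the shared bypass LSP

# Two protected LSPs share the same bypass label during the outage.
for protected_label in (1001, 1002):
    on_bypass = push_bypass_label([protected_label], BYPASS_LABEL)
    at_merge_point = penultimate_hop_pop(on_bypass)
    print(on_bypass, at_merge_point)
```

Each packet keeps its own inner label throughout, which is why a single bypass LSP can carry traffic for many protected LSPs and still deliver each packet to the correct LSP downstream of the outage.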
The JUNOS software implementation of link protection supports only the mechanisms needed
to protect against the failure of the link facing the immediate downstream LSR and not the
failure of the immediate downstream node’s control plane. We believe that Graceful Restart is
more suitable than either local 1:1 protection (Fast Reroute) or 1:N protection (Link Protection)
when handling control plane restart issues. Unlike local 1:1 or 1:N protection, Graceful Restart
does not temporarily introduce jitter, packet loss, out-of-order delivery, or sub-optimal routing.
To enable link protection for an LSP, you need to include the link-protection statement at the
[edit protocols mpls label-switched-path LSP-name] hierarchy level on the ingress LSR.
[edit protocols mpls]
label-switched-path LSP-name {
    link-protection;
}
It is not necessary to explicitly configure link protection on the transit or egress LSRs of an LSP.
Once link protection is configured on the ingress LSR, the ingress LSR (during the LSP setup
phase) uses RSVP-TE to signal all downstream LSRs that link protection is enabled for the LSP.

You must also configure link protection on an RSVP interface by including the link-protection
statement at the [edit protocols rsvp interface interface-name] hierarchy level.
[edit protocols rsvp]
interface interface-name {
    link-protection;
}
JUNOS software supports the configuration of both fast reroute and link protection for an LSP.
This provides interoperability with equipment from Cisco Systems. Assume that you configure
both fast reroute and link protection for an LSP. If the downstream node is a Cisco Systems
router, then link protection is supported. If the downstream router is a Juniper Networks
router, then fast reroute is supported.
PFE Local Repair

Whenever fast reroute or link protection is configured for an LSP, PFE Local Repair is enabled
by default. PFE Local Repair is a Juniper Networks implementation mechanism and is not part
of the MPLS standard. PFE Local Repair is designed to reduce the amount of time needed to
switch MPLS traffic from a primary path to a protection path. This feature was originally
introduced in JUNOS software release 5.3 and is supported by all M-series and T-series routing
platforms (Figure 22).
Figure 22: PFE Local Repair

Without PFE Local Repair:
1: FPC informs PFE that interface went down.
2: PFE passes message to RE.
3: RE selects the best backup route.
4: RE updates PFE with the best backup route.
5: Traffic is switched to backup route.

With PFE Local Repair:
1: FPC informs PFE that interface went down.
2: PFE passes message to RE.
3: PFE switches traffic to precalculated and pre-established backup route.
4: RE recalculates a new backup route.
5: RE updates PFE with the new backup route.

With PFE Local Repair, the routing engine (RE) downloads detour path information to the PFE
for each LSP after the detour paths are calculated. Then, if an LSR detects that a downstream
link or node has failed, the PFE can immediately switch all LSP traffic from the failed path to
precomputed and pre-established detour paths. This feature reduces the time needed to
complete protection path switching by allowing the PFE to respond immediately to a path
failure without having to wait for the RE to download the detour path to the PFE. PFE Local
Repair is enabled by default and requires no additional configuration.
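
The idea behind PFE Local Repair can be shown with a small model. This is a conceptual sketch, not a description of the actual PFE data structures: the class names are hypothetical, and a next hop is reduced to an interface name.

```python
class ForwardingEntry:
    """One LSP entry as held by the PFE in this simplified model."""
    def __init__(self, primary, backup=None):
        self.primary = primary  # interface used by the primary path
        self.backup = backup    # interface of the pre-established detour, if downloaded
        self.active = primary

class Pfe:
    def __init__(self):
        self.table = {}

    def install(self, lsp, primary, backup=None):
        """RE downloads an entry; with local repair the detour is included up front."""
        self.table[lsp] = ForwardingEntry(primary, backup)

    def interface_down(self, interface):
        """Switch affected LSPs to their detours without waiting for the RE."""
        switched = []
        for lsp, entry in self.table.items():
            if entry.active == interface and entry.backup is not None:
                entry.active = entry.backup
                switched.append(lsp)
        return switched

pfe = Pfe()
pfe.install("lsp-1", primary="so-0/0/0", backup="so-1/0/0")  # detour already downloaded
pfe.install("lsp-2", primary="so-0/0/0")                     # no detour downloaded
switched = pfe.interface_down("so-0/0/0")
```

In this model only lsp-1 is repaired locally. The entry without a downloaded detour stays on the failed interface until the RE supplies a new route, which corresponds to the slower sequence without PFE Local Repair.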

Fate Sharing
Fate sharing allows you to create a database of information that is accessed by CSPF when it
computes backup protection paths. The database describes the relationships between all the
elements in the network—point-to-point links, LAN interfaces, and router IDs. Since the
network elements in the group share the same fate, this relationship is called fate sharing.
For a protection path to work optimally, it should be calculated to minimize the number of
physical links shared with the primary path. This ensures that any single point of failure will
not affect both the primary path and the protection path at the same time. The JUNOS software
fate sharing enhancement allows you to manage the way that CSPF computes protection paths so
that the number of shared links between the primary path and the protection path is
minimized. This feature can be applied when calculating primary paths, secondary paths, and
fast reroute detour paths.
To configure a fate-sharing group for protection path calculation, you need to include the
fate-sharing statement at the [edit routing-options] hierarchy level on each LSR for which you
want fate sharing enabled. The following sample configuration defines a fate-sharing group
called "test."
[edit routing-options]
fate-sharing {
    group test {
        cost 20;                    # Optional. The default value is 1
        from 1.2.3.4 to 1.2.3.5;    # Identifies a point-to-point link
        from 192.168.200.2;         # Identifies a LAN interface
        from 192.168.200.3;         # Identifies a LAN interface
        from 123.3.4.5;             # Identifies a Router ID of a router node
    }
}
A fate-sharing group is configured with a cost attribute that is comparable to a traffic
engineering metric. Each fate-sharing group can be composed of three types of objects:
■ Point-to-point links
■ Nonpoint-to-point links (LAN or NBMA interfaces)
■ Routing nodes (LSRs)
Typically, you place network objects into a fate-sharing group because all the objects in the
group are exposed to the same cause of failure. For example, a fate-sharing group can be defined
for all fibers that run through the same fiber conduit, all optical channels carried by the same
fiber optic cable, all links that connect to the same LAN switch, or all equipment that shares the
same power source.
Each fate-sharing group is configured with a user-defined cost attribute. The cost attribute
describes the impact that this group has in the CSPF computation. The higher the cost assigned
to a fate-sharing group, the less likely that a protection path will share any objects in the group
with the primary path.
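
The effect of the cost attribute can be illustrated with a simplified shortest-path computation. The sketch rests on assumptions that are not taken from the JUNOS implementation: the group cost is added to the metric of every member link whenever the primary path uses any member of the group, and the protection path is computed over the links the primary path does not use. The topology, metrics, and function names are hypothetical.

```python
import heapq

def shortest_path(links, src, dst, penalty=None):
    """Dijkstra over undirected links {(a, b): metric}, with optional per-link penalty."""
    penalty = penalty or {}
    adj = {}
    for (a, b), metric in links.items():
        cost = metric + penalty.get((a, b), 0)
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in adj.get(node, []):
            nd = dist + cost
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt, path + [nxt]))
    return None

def fate_sharing_penalty(links, groups, primary_path):
    """Assumed rule: penalize member links of any group that the primary path touches."""
    primary_links = {frozenset(hop) for hop in zip(primary_path, primary_path[1:])}
    penalty = {}
    for group in groups:
        members = {frozenset(link) for link in group["links"]}
        if members & primary_links:
            for link in links:
                if frozenset(link) in members:
                    penalty[link] = penalty.get(link, 0) + group["cost"]
    return penalty

primary = ["A", "B", "D"]
# Candidate links for the protection path (primary links already excluded).
candidates = {("A", "C"): 1, ("C", "D"): 1, ("A", "E"): 2, ("E", "D"): 2}
# Links A-B and A-C run through the same conduit.
groups = [{"cost": 20, "links": [("A", "B"), ("A", "C")]}]

plain = shortest_path(candidates, "A", "D")
aware = shortest_path(candidates, "A", "D",
                      fate_sharing_penalty(candidates, groups, primary))
```

Without the fate-sharing group, the cheapest protection path runs through C and shares a conduit with the primary path. With the group cost applied, the computation prefers the path through E even though its plain metric is higher.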

Summary
Highly dependable IP networks result from the combination of dependable platforms,
dependable networks, dependable service-enabling features, and enhanced security
capabilities. This paper focused on the mechanisms supported by JUNOS software and M- and
T-series routing platforms that ensure the fundamental ability of your network to continue
supporting subscriber services when links or nodes fail. These mechanisms include SONET
APS and SDH MSP, VRRP, a variety of link aggregation and redundancy techniques (MLPPP,
IEEE 802.3ad, and SONET/SDH link aggregation), and a number of MPLS LSP
protection features (standby secondary paths, fast reroute, and link protection). As the amount
of mission-critical data carried by native IP or converged MPLS infrastructures increases, these
features allow you to hide network faults from your subscribers and to continue to satisfy their
demanding service level agreements. By delivering the dependable platforms, dependable
networks, dependable service-enabling features, and enhanced security capabilities you need
to deploy a highly dependable IP infrastructure, Juniper Networks provides the tools you need
to support existing subscriber services, to grow these services, and to deploy new
revenue-generating services.
References
Juniper Networks Documentation

JUNOS Internet Software Configuration Guide: Getting Started, Release 5.4. Published July 2002.
JUNOS Internet Software Configuration Guide: Interfaces and Class of Service, Release 5.4. Published
July 2002.
JUNOS Internet Software Configuration Guide: MPLS Applications, Release 5.4. Published July
2002.
JUNOS Internet Software Configuration Guide: Network Management, Release 5.4. Published July
2002.
JUNOS Internet Software Configuration Guide: Routing and Routing Protocols, Release 5.4.
Published July 2002.
JUNOS Internet Software Operational Mode Command Reference, Release 5.4. Published July 2002.
JUNOS 5.4 Internet Software Release Notes. Published September 2002.
Requests for Comments

RFC 1990, The PPP Multilink Protocol (MP), K. Sklower, B. Lloyd, G. McGregor, D. Carr, and T.
Coradetti, August 1996.
RFC 2338, Virtual Router Redundancy Protocol, S. Knight, D. Weaver, D. Whipple, R. Hinden, D.
Mitzel, P. Hunt, P. Higginson, M. Shand, and A. Lindem, April 1998.

Internet Drafts
Sharma, Vishal et al; Framework for MPLS-based Recovery;
<draft-ietf-mpls-recovery-frmwrk-07.txt>; September 2002.
Pan, Ping et al; Fast Reroute Extensions to RSVP-TE for LSP Tunnels;
<draft-ietf-mpls-rsvp-lsp-fastreroute-00.txt>; July 2002.
Other Standards Documents

GR-253-CORE, SONET Transport Systems: Common Generic Criteria
IEEE, 802.3ad, Aggregation of Multiple Link Segments
Copyright © 2002, Juniper Networks, Inc. All rights reserved. Juniper Networks is registered in the U.S. Patent and Trademark Office and in other countries
as a trademark of Juniper Networks, Inc. Broadband Cable Processor, ERX, ESP, G1, G10, G-series, Internet Processor, JUNOS, JUNOScript, M5, M10, M20,
M40, M40e, M160, M-series, NMC-RX, SDX, ServiceGuard, T320, T640, T-series, UMC, and Unison are trademarks of Juniper Networks, Inc. All other
trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. All specifications are subject to change
without notice.
Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise
revise this publication without notice.
