
2013 IEEE 27th International Conference on Advanced Information Networking and Applications

A failure recovery method based on cycle structure and its verification by OpenFlow
Junichi Nagano and Norihiko Shinomiya
Graduate School of Engineering, Soka University, Tokyo, Japan
shinomi@soka.ac.jp
Abstract: The purpose of this study is to realize a fast failure recovery method that reduces packet loss on a large-scale, complicated network. Although conventional failure recovery methods work fast with few control messages, they are largely limited to ring topologies. Some studies try to apply ring failure recovery methods to mesh networks by extracting ring structures, or cycles, from the network. However, finding an efficient set of cycles that covers all communication links on a network is likely to require a large amount of calculation. This paper proposes a control method for single-link failure recovery using fundamental tie-sets that cover all links. Experimental results on an OpenFlow network demonstrate that the control method can reduce packet loss and control messages in the recovery process.

obtained in polynomial time. A fundamental tie-set created from a spanning tree is defined as a set of links on a cycle. A tree and its set of fundamental tie-sets are in one-to-one correspondence, and the number of fundamental tie-sets equals the number of links that are not included in the tree. Because the union of the fundamental tie-sets covers all links, a network can recover from a failure of any link [12]. The depth d of a tree, which is the number of links from the root node to the farthest leaf node, bounds the size of a fundamental tie-set by 2d + 1 at most. Thus, simple tree search algorithms that minimize the depth of the tree, such as BFS (Breadth First Search), can reduce the number of links in a tie-set.
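The bound of 2d + 1 follows from a short argument, sketched here with symbols that are introduced only for this illustration: e = (u, w) is the co-tree link that closes the tie-set L, a is the lowest common ancestor of u and w in the tree T, dist_T is the number of tree links between two nodes, and depth(x) is the number of tree links between the root and node x.

```latex
\begin{align*}
|L| &= 1 + \mathrm{dist}_T(u, a) + \mathrm{dist}_T(a, w) \\
    &\le 1 + \mathrm{depth}(u) + \mathrm{depth}(w) \\
    &\le 2d + 1 .
\end{align*}
```

The leading 1 counts the co-tree link itself, and each tree segment is at most as long as the path from its end node up to the root.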
Previous studies analyze the characteristics of tie-sets for failure recovery only theoretically and propose a simple control method. Nakayama [12] discusses the control information of nodes and a failure detection method, but the forwarding method for backup paths needs further consideration.
In order to verify the effectiveness of our method, this paper shows experimental results obtained in an emulation environment on an OpenFlow network, such as the number of lost packets, the number of control messages during a link failure, and recovery cycle sizes.

I. INTRODUCTION
As the Internet continues to grow in size and complexity, even a short failure can have an enormous influence on communications. For this reason, high-speed restoration from network failures has been attracting more attention, and the key to fast restoration could be reducing packet loss in the time from failure occurrence to repair. It is known that some restoration methods for ring-structured networks can achieve high-speed restoration [1], [2]. In addition, these methods need few control messages, because only the two end nodes of a failed communication link are needed to restore the failure. These failure recovery methods are highly reliable and have been deployed in commercial networks. However, they work only on limited network topologies, i.e., a single-ring or multi-ring structure; therefore, it would be effective if these methods could be extended to mesh networks.
On the other hand, some restoration methods focusing on ring structures (cycles) underlying a mesh network have been proposed [3]. These methods can improve the bandwidth efficiency of backup paths by increasing the number of links recoverable by a single cycle. Thus, the methods expand the size of a cycle and add recoverable links straddling the cycle. However, finding cycles that optimize the bandwidth efficiency would require a huge amount of calculation [4]. In order to reduce the computational complexity, approximate methods have also been proposed, but cycles obtained by such methods may cover only a part of the links in the entire network. Furthermore, the optimal cycles tend to include a large cycle such as a Hamiltonian cycle [9]; thus, backup paths would also be much longer.
Previous studies [10], [11] propose a theoretical basis for network management using fundamental tie-sets which can be
1550-445X/13 $26.00 2013 IEEE
DOI 10.1109/AINA.2013.81

II. RELATED WORKS

There are many works proposing various types of failure recovery methods using cycles on mesh networks [3]. These works employ pre-configured cycles called p-cycles that are constructed for recovery. Among them, some works [5], [6] attempt to apply p-cycles to WDM networks. A p-cycle can only restore links that are on the cycle or straddle the cycle. Fig. 1 and Fig. 2 show failure recovery examples for on-cycle links and straddling links, respectively. Fig. 1(a) illustrates a p-cycle (v2, v3, v5, v7, v6, v4) and a communication path (v1, v4, v6, v8) that includes an on-cycle link (v4, v6). Fig. 1(b) shows the backup path (v4, v2, v3, v5, v7, v6) used when the on-cycle link (v4, v6) fails. On the other hand, Fig. 2(a) shows a communication path (v1, v4, v5, v8) that includes a straddling link (v4, v5), with the same p-cycle as in Fig. 1(a). Two backup paths, (v4, v2, v3, v5) and (v4, v6, v7, v5), exist for the straddling link (v4, v5).
These p-cycle methods can hardly obtain a set of p-cycles that minimizes the bandwidth of backup paths on a large network [4]. This minimization is formulated as an Integer Linear Programming (ILP) problem in which the number of variables increases with the number of candidate cycles. As the number of possible cycles to be extracted increases

ment to single link failure recovery. Nakayama et al. [12]


propose a failure recovery algorithm that works in a distributed manner. Their experimental results, compared with RSTP, show that recovery using fundamental tie-sets reduces the number of nodes involved in the failure recovery process and the number of route changes. However, what seems to be lacking is consideration of a practical control method on a more realistic network. Thus, this paper proposes a method of creating the control tables of each node for failure recovery using fundamental tie-sets.
III. FAILURE RECOVERY METHOD BASED ON FUNDAMENTAL TIE-SET
A. Definition

This paper applies the concept of fundamental tie-sets defined in graph theory to failure recovery. A set of vertices $V = \{v_1, v_2, \ldots, v_{|V|}\}$ denotes network equipment such as routers and switches. $E = \{e_1, e_2, \ldots, e_{|E|}\}$ represents the links connecting network equipment, and a graph $G = (V, E)$ represents an information network. Because this paper deals with failure recovery, all graphs are bi-connected. We call an elementary closed path a tie-set and define a tree $T \subseteq E$ as the maximum edge set that includes no tie-sets. $\overline{T} = E \setminus T$ denotes the co-tree of a tree $T$, and the number of its edges is $|\overline{T}| = |E| - |T| = |E| - |V| + 1$. For any $e \in \overline{T}$ in graph $G$, it is known that $T \cup \{e\}$ includes exactly one fundamental tie-set $L$, and the union of the fundamental tie-sets obtained from a tree contains all links in a bi-connected graph. This feature of tie-sets enables failure recovery for any link.

Fig. 1. On-cycle link protection.
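The definition can be illustrated with a minimal Python sketch that builds a BFS spanning tree and closes one fundamental tie-set per co-tree link. The function names, the adjacency-list representation, and the five-node example graph are assumptions made for this illustration; they are not taken from the paper's implementation.

```python
from collections import deque


def bfs_tree(adj, root):
    """Return the parent map of a BFS spanning tree of a connected graph."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def path_to_root(parent, v):
    """List the nodes from v up to the root along tree links."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def fundamental_tie_sets(adj, root):
    """Return (tree links, list of fundamental tie-sets as sets of links)."""
    parent = bfs_tree(adj, root)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}
    edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    tie_sets = []
    for e in sorted(edges - tree, key=sorted):  # each co-tree link closes one cycle
        u, w = sorted(e)
        pu, pw = path_to_root(parent, u), path_to_root(parent, w)
        # Drop shared ancestors so both paths end at the lowest common ancestor.
        while len(pu) > 1 and len(pw) > 1 and pu[-2] == pw[-2]:
            pu.pop()
            pw.pop()
        cycle_nodes = pu + pw[-2::-1]  # u ... ancestor ... w
        links = {frozenset(pair) for pair in zip(cycle_nodes, cycle_nodes[1:])}
        links.add(e)
        tie_sets.append(links)
    return tree, tie_sets


# Hypothetical bi-connected example graph with |V| = 5 and |E| = 6.
EXAMPLE_ADJ = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
EXAMPLE_TREE, EXAMPLE_TIE_SETS = fundamental_tie_sets(EXAMPLE_ADJ, 1)
```

For the example graph the BFS tree has depth 2, so no tie-set can contain more than 2d + 1 = 5 links, and the two tie-sets together contain all six links.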

B. Behavior of the proposed method

The proposed method establishes a communication tree consisting of communication paths. The communication paths are the paths from each node to one node on the communication tree. Then, the method produces fundamental tie-sets from the communication tree and establishes backup paths from the fundamental tie-sets. When a link failure occurs on the communication tree, the proposed method uses a fundamental tie-set that includes the failed link. Fig. 3 shows an example of failure recovery that uses the tree {e1, e3, e4, e5, e8} and fundamental tie-sets L1, L2, L3, L4. If the link e4 fails, tie-set L3 or L4 can detour the failure. Communication messages passing the failed link e4 toward node v4 use the backup path (v3, v5, v4) when tie-set L3 is used for recovery. Communication messages in the opposite direction, toward node v3, are transmitted along the backup path (v4, v5, v3).

Fig. 2. Straddling link protection.
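The detour in this example can be sketched in Python as a walk around the tie-set in the direction that avoids the failed link. The function name and the representation of a tie-set as a node list in rotation order are assumptions for illustration; the ordering (v3, v4, v5) chosen for L3 is inferred from the backup paths given in the text, not stated by the paper.

```python
def backup_path(cycle, failed_link):
    """Return the detour from src to dst around a tie-set.

    cycle: nodes of the tie-set in rotation order (the cycle closes implicitly).
    failed_link: (src, dst), the direction in which messages crossed the link.
    """
    src, dst = failed_link
    n = len(cycle)
    i = cycle.index(src)
    if cycle[(i + 1) % n] == dst:
        step = -1  # the failed hop follows the rotation, so walk against it
    elif cycle[(i - 1) % n] == dst:
        step = 1   # the failed hop opposes the rotation, so walk along it
    else:
        raise ValueError("failed link is not on this tie-set")
    path = [src]
    while path[-1] != dst:
        i = (i + step) % n
        path.append(cycle[i])
    return path


L3 = ["v3", "v4", "v5"]  # assumed rotation order of tie-set L3
TOWARD_V4 = backup_path(L3, ("v3", "v4"))
TOWARD_V3 = backup_path(L3, ("v4", "v3"))
```

With this ordering the two calls reproduce the backup paths (v3, v5, v4) and (v4, v5, v3) described above.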

exponentially with the size of a network, the calculation time for solving the ILP is likely to be tremendous in a large-scale network.
Thus, methods for selecting candidate cycles have been proposed to calculate approximate solutions [4], [6], [7], [8]. SLA (Straddling Link Algorithm) [6] can decrease the number of candidate cycles to nearly the number of links, but its approximate solution cannot cover all links. Although the candidate cycle selection methods [7], [8] increase the number of recoverable links, the candidate cycles may still not cover all links.
On the other hand, previous studies [10], [11] indicate the possibility of network management using a set of fundamental
tie-sets. Koide et al. [11] apply fundamental tie-sets manage-

C. Messages for failure recovery

Our proposed method has three types of messages: communication, failure advertisement, and recovery messages. A communication message, such as an Ethernet frame or an IP packet, contains a destination address and the data to transfer. Failure advertisement messages are sent by the end nodes of a failed link to a control node. An advertisement message contains a sender node identifier and the number of the port connected to the failed link. The number of control nodes is variable. If a control node receives a failure advertisement message,


TABLE I
TREE TABLES OF THE NODE v4.

Key (destination address)   Output port
N1                          p2
N2                          p2
N3                          p2
N4                          p1
N5                          p2
N6                          p4

TABLE II
TIE-SET TABLES OF THE NODE v4.

Key (tie-set ID)   Action                 Output port
L3 Forward         strip tie-set header   p3
L3 Backward        none                   p2
L4 Forward         none                   p4
L4 Backward        none                   p2

Fig. 3. An example of fundamental tie-sets.

Fig. 4. Ports of the node v4.

the control node sends recovery messages to the end nodes in order to forward communication messages to a backup path. A recovery message, which is attached to a communication message, contains backup path information that includes a tie-set identifier and a direction of rotation. In this paper, the direction of a backup path is either Forward or Backward. Each tie-set has its own rotation direction. The rotation direction of the tie-set is called Forward, and the opposite direction is called Backward.

Fig. 5. Message transfer process of a node.

receives a recovery message that contains the ID of a tie-set including this node, the node transfers the communication message by referring to its tie-set tables. Otherwise, the node transfers the message according to its communication path tables.
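The transfer decision of a node can be sketched in Python using the entries of Table I and Table II for node v4. The dictionary layout, the message format (a dict with a "dst" key and an optional "recovery" key), and the function name are assumptions made for this illustration only.

```python
# Communication path (tree) table of node v4, taken from Table I.
TREE_TABLE = {"N1": "p2", "N2": "p2", "N3": "p2",
              "N4": "p1", "N5": "p2", "N6": "p4"}

# Tie-set table of node v4, taken from Table II: key -> (action, output port).
TIE_SET_TABLE = {
    ("L3", "Forward"): ("strip", "p3"),
    ("L3", "Backward"): (None, "p2"),
    ("L4", "Forward"): (None, "p4"),
    ("L4", "Backward"): (None, "p2"),
}


def forward(message):
    """Return (output port, outgoing message) for one incoming message."""
    recovery = message.get("recovery")  # (tie-set ID, direction) or None
    if recovery in TIE_SET_TABLE:
        action, port = TIE_SET_TABLE[recovery]
        if action == "strip":
            # Detach the recovery message so the packet returns to the tree.
            message = {k: v for k, v in message.items() if k != "recovery"}
        return port, message
    # No recovery message, or a tie-set this node does not belong to.
    return TREE_TABLE[message["dst"]], message
```

For example, `forward({"dst": "N1", "recovery": ("L3", "Forward")})` selects port p3 and removes the recovery message, whereas a message without a recovery message is forwarded by its destination address alone.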

D. Tables of a node
Each node contains tables of communication paths and tie-sets. A communication path table indicates the port to which communication messages are transferred in order to reach a destination address. For example, Table I shows the communication path tables of node v4 in Fig. 4. A tie-set table indicates the transfer port of a backup path on a tie-set whose direction is Forward or Backward. In order to switch communication messages from a backup path back to a communication path, a tie-set table detaches the recovery message when the link joined to the transfer port is included in the co-tree. For instance, Table II shows the tie-set tables of node v4, which belongs to tie-sets L3 and L4. The L3 Forward row of Table II detaches the recovery message from a communication message because port p3 of node v4 connects to the co-tree link e7.
Fig. 5 describes the message transfer process of a node with its communication path and tie-set tables. When a node
E. Failure recovery procedure

The following steps show how to switch from a communication path to a backup path on a graph $G = (V, E)$ in which a link $e_f = (v, w) \in E$ has failed. The graph $G$ includes a tree $T$ and the fundamental tie-sets $L_1, L_2, \ldots, L_{|\overline{T}|}$ extracted from $T$.
[Step 1] Extract the communication path tables of the link $e_f$. The end nodes $v$ and $w$ individually extract the communication path tables that contain the port connected to the failed link $e_f$.
[Step 2] Select tie-sets $L_v$, $L_w$ for recovery. The nodes $v$ and $w$ each select an optimal tie-set, $L_v$ and $L_w$ respectively, that includes the link $e_f$. An optimal tie-set is one with the minimum number of edges, so that the backup path is short.


TABLE III
OPENFLOW RULE.

In Port   VLAN ID   Ethernet            IP                  TCP
                    Src   Dst   Type    Src   Dst   Proto   Src   Dst

[Step 3] Overwrite the communication path tables for recovery. The node $v$ changes the port in the communication path tables extracted in Step 1 to a port connected to a link $e_r \in L_v$. The link $e_r$ is the link of the backup path that connects to node $v$ and is not the failed link. The node $w$ simultaneously changes its tables in the same way.
[Step 4] Attach recovery messages. The node $v$ attaches identical recovery messages to communication messages if the link $e_r$ is on the tree $T$. The recovery message has the ID of tie-set $L_v$ and the direction from $w$ to $v$. The node $w$ simultaneously attaches identical recovery messages to communication messages in the same way.
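Steps 1 to 4 for one end node can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the argument layout, the port names, and the example topology around node v3 (including nodes v1 and v6 and the contents of L4) are hypothetical.

```python
def recover_at_end_node(node, failed_link, tree_table, port_links,
                        tie_sets, tree_links):
    """Apply Steps 1-4 at one end node of the failed link.

    tree_table: destination -> output port (overwritten in place).
    port_links: port of this node -> link (frozenset of two node names).
    tie_sets:   tie-set ID -> set of links.
    tree_links: set of links on the communication tree.
    """
    peer = next(n for n in failed_link if n != node)
    # Step 1: extract the entries that use the port of the failed link.
    failed_port = next(p for p, l in port_links.items() if l == failed_link)
    affected = [dst for dst, port in tree_table.items() if port == failed_port]
    # Step 2: among tie-sets containing the failed link, pick the smallest.
    tie_set_id = min((i for i, links in tie_sets.items() if failed_link in links),
                     key=lambda i: len(tie_sets[i]))
    # Step 3: redirect the entries to the other link of that tie-set at this node.
    backup_port, backup_link = next(
        (p, l) for p, l in port_links.items()
        if l in tie_sets[tie_set_id] and l != failed_link)
    for dst in affected:
        tree_table[dst] = backup_port
    # Step 4: attach a recovery message only if the backup link is on the tree.
    recovery = (tie_set_id, (peer, node)) if backup_link in tree_links else None
    return affected, backup_port, recovery


def link(a, b):
    return frozenset((a, b))


# Hypothetical data for node v3 when link e4 = (v3, v4) fails.
E4 = link("v3", "v4")
V3_TABLE = {"N4": "p1", "N5": "p2", "N1": "p3"}
V3_PORTS = {"p1": E4, "p2": link("v3", "v5"), "p3": link("v1", "v3")}
TIE_SETS = {
    "L3": {E4, link("v4", "v5"), link("v3", "v5")},
    "L4": {E4, link("v4", "v6"), link("v6", "v1"), link("v1", "v3")},
}
TREE_LINKS = {E4, link("v3", "v5"), link("v1", "v3")}
RESULT = recover_at_end_node("v3", E4, V3_TABLE, V3_PORTS, TIE_SETS, TREE_LINKS)
```

In this example L3 is selected because it has fewer links than L4, the entry for N4 is redirected to port p2, and a recovery message carrying L3 and the direction from v4 to v3 is attached.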

Fig. 6. An example of network with port numbers.

to attach a recovery message to a communication message, we store the recovery message in the VLAN ID field of the communication message header. Table V shows an example of flow tables for a backup path that follows tie-set L2 in the clockwise direction. In Table V, v4 → v0 denotes the tie-set ID and its direction; additionally, the priority of a backup path should be higher than that of a communication path table. The node v4 strips the VLAN ID from communication messages in order to switch the messages from the backup path to a communication path. When a link fails, the controller sends flow tables to the end nodes of the link. These flow tables write both a tie-set ID and a rotation direction into the VLAN ID field. For example, Table VI shows the recovery flow table of node v1 when the link (v1, v0) in Fig. 6 fails. The flow table writes the tie-set ID of L2 and its direction (v4 → v0) into the message header. Based on the header information, the messages are dispatched to port p1. These messages follow the backup path (v1, v2, v4, v0), and finally node v4 removes the recovery message.
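The paper does not specify how a tie-set ID and a rotation direction are packed into the VLAN ID field. One possible encoding, given purely as an assumption for illustration, uses the lowest bit for the direction and the remaining bits for the tie-set ID, while staying clear of the VLAN IDs 0 and 4095 that IEEE 802.1Q reserves.

```python
FORWARD, BACKWARD = 0, 1
MAX_TIE_SET_ID = 2046  # keeps every encoded value inside the range 2..4093


def encode_vlan_id(tie_set_id, direction):
    """Pack a tie-set ID and a direction into a 12-bit VLAN ID (hypothetical scheme)."""
    if not 1 <= tie_set_id <= MAX_TIE_SET_ID:
        raise ValueError("tie-set ID out of range for this encoding")
    if direction not in (FORWARD, BACKWARD):
        raise ValueError("direction must be FORWARD or BACKWARD")
    return (tie_set_id << 1) | direction


def decode_vlan_id(vlan_id):
    """Recover (tie-set ID, direction) from an encoded VLAN ID."""
    return vlan_id >> 1, vlan_id & 1
```

Under this scheme, tie-set 2 in the Backward direction is carried as VLAN ID 5, and a switch on the backup path can match that single value in its flow table.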

IV. AN IMPLEMENTATION OF OUR METHOD IN OPENFLOW

A. Overview of OpenFlow
The OpenFlow protocol [13] enables programmable networks and also makes it possible to construct testbed environments for new control methods on more realistic networks. An OpenFlow network contains a server called a controller and OpenFlow switches, which simply follow the flow tables provided by the controller. This paper implements the proposed method in the controller to verify its characteristics. When an OpenFlow switch receives a communication message whose behavior is not defined in its flow tables, the switch forwards the message to the controller. Then, the controller creates flow tables for the message and sends them to the OpenFlow switches. A flow table contains a rule, actions, and a priority to determine the operation on messages matching the rule. The rule is usually set based on header information of Ethernet, IP, and TCP, as shown in Table III. The action defines the operation on the messages, such as sending a message to a port or modifying a header field of a message. The priority denotes the execution order of flow tables. A flow table whose priority is a larger number is executed first.
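The role of the priority can be illustrated with a small Python model of rule matching. This is a simplified sketch, not the OpenFlow switch algorithm itself: the entry layout and function name are assumptions, the two entries are the rows for node v2 from Table IV and Table V, and the string "v4->v0" merely stands in for the VLAN ID value used there.

```python
def select_actions(flow_entries, packet):
    """Return the actions of the highest-priority entry matching the packet."""
    matching = [entry for entry in flow_entries
                if all(packet.get(field) == value
                       for field, value in entry["match"].items())]
    if not matching:
        return None  # an unmatched message would be sent to the controller
    return max(matching, key=lambda entry: entry["priority"])["actions"]


V2_FLOW_ENTRIES = [
    # Communication path entry (Table IV).
    {"match": {"dl_dst": "00:00:00:01:00:01"},
     "actions": ["output:p4"], "priority": 65533},
    # Backup path entry for tie-set L2 (Table V).
    {"match": {"dl_vlan": "v4->v0"},
     "actions": ["output:p5"], "priority": 65534},
]

NORMAL = select_actions(V2_FLOW_ENTRIES, {"dl_dst": "00:00:00:01:00:01"})
DETOURED = select_actions(V2_FLOW_ENTRIES,
                          {"dl_dst": "00:00:00:01:00:01", "dl_vlan": "v4->v0"})
```

A message carrying the recovery VLAN ID matches both entries, and the higher priority of the backup entry sends it to port p5 instead of the tree port p4.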

C. Module elements and their behavior

We implement a tie-set switch module in the Trema framework [14]. Fig. 7 shows the composition of modules and the message sequences between these modules in a controller. Trema contains a switch manager and a packet-in filter as core modules, and topology discovery as an additional module. The OpenFlow switches in Fig. 7 are switches that follow the OpenFlow protocol. OpenFlow switches send control messages, which include LLDP (Link Layer Discovery Protocol) frames, state notify, port status, and headers of communication messages, to a controller. The switch manager receives the control messages and forwards them to the packet-in filter or the topology module. The packet-in filter transfers messages including LLDP frames to the topology discovery module. Messages without LLDP are sent to the tie-set switch. The topology discovery module detects a link from LLDP frames and notifies the topology module of the link information. When the topology module receives information about an unknown link, it sends a link add message, which contains the link information, to the tie-set switch. If a state notify or port status message indicates a link failure, the topology module sends a link down message to the tie-set switch. The tie-set switch

B. Flow tables for communication and backup paths

This section explains the flow tables of communication paths and backup paths. In order to create these tables, the controller should know the network topology and manage a Forwarding Database (FDB) that indicates the switch port to which a MAC address connects. Then, the controller extracts a tree and fundamental tie-sets as shown in Fig. 6. In normal operation, the controller establishes communication paths from this tree. An example of flow tables is shown in Table IV, where host1 sends communication messages to host2 on the network of Fig. 6. In Table IV, dl_dst and p indicate a destination MAC address and a port index of a switch, respectively. The flow tables of communication paths transfer messages to host2 using the tree links from all other nodes.
In case of failure recovery, the controller creates flow tables for backup paths following the fundamental tie-sets. In order


TABLE IV
FLOW TABLES OF A TREE.

Node   Rules                      Actions     Priority
v0     dl_dst=00:00:00:01:00:01   output:p1   65533
v1     dl_dst=00:00:00:01:00:01   output:p3   65533
v2     dl_dst=00:00:00:01:00:01   output:p4   65533
v3     dl_dst=00:00:00:01:00:01   output:p3   65533
v4     dl_dst=00:00:00:01:00:01   output:p3   65533

dl_dst: Destination MAC address

TABLE V
FLOW TABLES OF THE TIE-SET L2 (DIRECTION IS v4 TO v0).

Node   Rules                Actions                 Priority
v4     dl_vlan ID=v4 → v0   strip_vlan, output:p2   65534
v0     dl_vlan ID=v4 → v0   output:p3               65534
v1     dl_vlan ID=v4 → v0   output:p1               65534
v2     dl_vlan ID=v4 → v0   output:p5               65534

dl_vlan ID: VLAN ID field of an Ethernet header

TABLE VI
A FLOW TABLE FOR RECOVERY OF A LINK FAILURE.

Node   Rules                      Actions                                Priority
v1     dl_dst=00:00:00:01:00:01   mod_vlan_vid (ID=v4 → v0), output:p1   65534

mod_vlan_vid: modify the VLAN ID field of the Ethernet header to the specified number

Fig. 7. Composition of modules and message sequences in a controller.

24GB RAM, VMware Player 4.02 as virtualization software, and Ubuntu 11.04 as guest OS.
On the experimental networks, the number of nodes increases from 10 to 90 in steps of 10. These networks are random graphs created by the BA model, adjusted so as to output bi-connected graphs.
The verification experiments count the number of lost packets and two types of control messages. The lost packets are observed during the time between a failure occurrence and the failure recovery. One type of control message is used to construct communication paths and backup paths, and the other carries out recovery from one failure. The following list gives the procedure of our experiment.
1) Select two different nodes as a sender and a receiver of communication messages.
2) Wait for the establishment of communication paths and backup paths. Then, begin to send communication messages from the sender to the receiver.
3) Wait 5 seconds, and remove a link through which the communication messages pass.
4) After the communication finishes, measure the number of received packets.
To remove a link in step 3, we disable a network interface such as trema4-1 using the ifconfig command. The communication messages are UDP packets that contain 50 bytes of padding data. The sender node transfers 20000 UDP packets at 1000 packets per second, and the capacity of each link is 10 Mbps. Note that this capacity is sufficient for the communication. We iterate this procedure ten times for each of five random graphs that have the same number of nodes. Step 1 selects a node pair of a sender and a receiver in the following manner in order to choose a different node pair in each execution. A unique number is assigned to each node in a random graph, and a node with a smaller number tends to be adjacent to more nodes; therefore, smaller-numbered nodes are likely to be directly connected to each other. Because of this numbering feature, five candidate nodes for the sender and the receiver are selected from a random graph so that the differences between consecutive candidate node numbers are equal; for example, nodes 1, 5, 9, 13 and 17 could be the candidates if a random graph has 20 nodes. Then, ten node pairs of a sender and a receiver can be selected from the combinations of the five candidate nodes, with the smaller-numbered node acting as the sender.
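The node-pair selection used in step 1 of the experimental procedure can be sketched in Python. The paper gives only the 20-node example (candidates 1, 5, 9, 13 and 17); the general spacing rule used here, the number of nodes divided by five and rounded down, is an assumption that reproduces that example, and the function names are hypothetical.

```python
from itertools import combinations


def candidate_nodes(num_nodes, num_candidates=5):
    """Pick equally spaced node numbers, starting from node 1."""
    step = num_nodes // num_candidates  # assumed spacing rule
    return [1 + k * step for k in range(num_candidates)]


def sender_receiver_pairs(num_nodes):
    """Return (sender, receiver) pairs with the smaller number as the sender."""
    return list(combinations(candidate_nodes(num_nodes), 2))


PAIRS_FOR_20_NODES = sender_receiver_pairs(20)
```

Five candidates give ten unordered pairs, which matches the ten iterations per random graph described in the procedure.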

creates flow tables for communication and backup paths based on the received messages. The flow tables are transmitted to the OpenFlow switches through the switch manager. When the tie-set switch detects that a link is down, it creates flow tables in order to switch from a communication path to a backup path.

B. Results
Fig. 8 shows the number of packets lost in the time from failure occurrence to repair. In Fig. 8, the x-axis and the y-axis show the number of nodes and the number of lost packets, respectively. The number of lost packets remains around 40 regardless of the number of nodes. Thus, the proposed method can limit packet loss even on large networks.
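As a rough reading of this result, the loss count can be converted into an interruption time. The estimate assumes that packets are sent at a constant 1000 packets per second, as in the experimental setup, and that every packet sent during the interruption is lost; the symbols N_lost and r are introduced here only for illustration.

```latex
t_{\text{outage}} \approx \frac{N_{\text{lost}}}{r}
  = \frac{40\ \text{packets}}{1000\ \text{packets/s}}
  = 0.04\ \text{s} = 40\ \text{ms}
```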
Fig. 9 shows the number of control messages during the initialization and the failure recovery. The initialization value is the number of messages required for the construction of communication and backup paths. On the other hand, the
V. VERIFICATION EXPERIMENTS
A. Environment and procedure of experiments
We installed the experimental environment on a virtual machine. The host PC and the guest PC used for the experiments have Windows 7 as host OS, an Intel Core i7 960 3.20GHz CPU,


Fig. 8. Nodes vs. lost packets.

Fig. 10. The number of edges in a recovery tie-set.

Fig. 9. Nodes vs. control messages.

REFERENCES

[1] ITU-T Rec. G.8032, "Ethernet Ring Protection Switching," June 2008.
[2] ANSI T1.105.01-1998, "Telecommunications Synchronous Optical Network (SONET) Automatic Protection Switching," Jan. 1998.
[3] M.S. Kiaei, C. Assi and B. Jaumard, "A Survey on the p-Cycle Protection Method," IEEE Communications Surveys & Tutorials, Vol. 11, Issue 3, pp. 53-70, 2009.
[4] W.D. Grover and D. Stamatelakis, "Cycle-oriented distributed preconfiguration: ring-like speed with mesh-like capacity for self-planning network restoration," Proc. of IEEE International Conference on Communications, pp. 537-543, Atlanta, Georgia, USA, June 1998.
[5] W.D. Grover and D. Stamatelakis, "Bridging the ring-mesh dichotomy with p-cycles," Proc. of Design of Reliable Communication Networks, pp. 92-104, Munich, Germany, April 2000.
[6] H. Zhang and O. Yang, "Finding protection cycles in DWDM networks," Proc. of IEEE International Conference on Communications, vol. 5, pp. 2756-2760, New York, USA, April 2002.
[7] J. Doucette, D. He, W.D. Grover, O. Yang, "Algorithmic approaches for efficient enumeration of candidate p-cycles and capacitated p-cycle network design," Proc. of Design of Reliable Communication Networks, pp. 212-220, Banff, Alberta, Canada, Oct. 2003.
[8] C. Liu and L. Ruan, "Finding good candidate cycles for efficient p-cycle network design," Proc. of 13th International Conference on Computer Communication and Networks, pp. 321-326, Chicago, USA, Oct. 2004.
[9] A. Sack and W. D. Grover, "Hamiltonian p-cycles for fiber-level protection in homogeneous and semi-homogeneous optical networks," IEEE Network, Special Issue on Protection, Restoration, and Disaster Recovery, Nov./Dec. 2003.
[10] N. Shinomiya, T. Koide and H. Watanabe, "A theory of tie-set graph and its application to information network management," International J. of Circuit Theory and Applications, Vol. 29, No. 4, pp. 367-379, July 2001.
[11] T. Koide, H. Kubo, and H. Watanabe, "A study on the tie-set graph theory and network flow optimization problems," International J. of Circuit Theory and Applications, Vol. 32, pp. 447-470, 2004.
[12] K. Nakayama and N. Shinomiya, "Autonomous recovery for link failure based on tie-sets in information networks," Proc. of 2011 IEEE Symposium on Computers and Communications, pp. 671-676, Corfu, Greece, June 2011.
[13] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, J. Turner, "OpenFlow: Enabling Innovation in Campus Networks," ACM SIGCOMM Computer Communication Review, Vol. 38, Issue 2, pp. 69-74, April 2008.
[14] Trema: https://github.com/trema

failure recovery value indicates the number of messages necessary to recover from the failure. The x-axis is again the number of nodes, and the y-axis is the number of messages. The initialization value increases linearly with the size of the network; in contrast, the failure recovery value is stable at around ten messages. Therefore, the proposed method can recover from a failure with few control messages.
Fig. 10 illustrates the number of links in a tie-set used for failure recovery. The number of links indicates the length of a backup path, which ranges between three and the number of nodes. Furthermore, a backup path that uses a large tie-set consumes bandwidth on a network. The average in Fig. 10 denotes the average of 50 experiments, and the worst denotes the worst case. The slight increase in the number of links verifies that the proposed method can restrain the consumption of bandwidth on large networks.
VI. CONCLUSION
This paper proposes a backup path calculation and path switching method to recover from a link failure in realistic network environments. The emulation experiments on OpenFlow indicate that our method curbs packet loss, control messages, and backup path length as the number of nodes increases. Thus, the proposed method can realize efficient failure recovery on large networks. However, because the proposed method may increase the tables of all nodes, the reduction of tables requires further study.
