RTFT15 Unit 6

UNIT 6
Practical Systems for Fault Tolerance
Application: Ad-hoc wireless network
Application: NASA Remote Exploration & Experimentation
System Architecture: Fault tolerant computers, general purpose commercial systems, fault tolerant multiprocessor and VLSI based communication architecture
Fault tolerant software: Design, N-version
Roll No: 15
Ad hoc Wireless Networks

Definition
An ad-hoc network is:
A LAN or other small network with wireless connections.
Devices are part of the network only for the duration of a communications session, or while in close proximity to the network.
Decentralized.
E.g.: Bluetooth.
Real Time and Fault Tolerance
Ad hoc Wireless Networks

Definition
The principle behind ad hoc networking is multi-hop relaying, in which messages are sent from the source to the destination by relaying through intermediate hops (nodes).
In multi-hop wireless networks, communication between two end nodes is carried out through a number of intermediate nodes whose function is to relay information from one point to another.
A static string topology is an example of such a network:
[Figure: static string topology]
In the last few years, efforts have focused on multi-hop "ad hoc" networks, in which relaying nodes are in general mobile and communication needs are primarily between nodes within the same network.
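Multi-hop relaying can be illustrated with a small sketch. It assumes a five-node static string topology in which each node hears only its immediate neighbours; the node numbers, the `string_topology` table and the `relay_path` helper are hypothetical names chosen for illustration, and breadth-first search stands in for whatever route discovery a real protocol would use.

```python
from collections import deque

def relay_path(links, source, destination):
    """Return the list of nodes a message visits, or None if unreachable."""
    previous = {source: None}          # node -> node it was reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:    # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Assumed static string topology: 0 - 1 - 2 - 3 - 4
string_topology = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path = relay_path(string_topology, 0, 4)
hops = len(path) - 1                   # nodes 1, 2 and 3 act as relays
```

Here node 0 cannot reach node 4 directly, so the message is relayed through every intermediate node in the string.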
Applications of Ad hoc Wireless Networks
Military applications
Ad hoc wireless networks are useful in establishing communication on a battlefield.
Collaborative and Distributed Computing

A group of people at a conference can share data over an ad hoc network.
Multimedia objects can be streamed among the participating nodes.
Emergency Operations
Ad hoc wireless networks are useful in emergency operations such as search and rescue, and crowd control.
Issues in Protocol Design

Must run in a distributed environment
Must provide loop-free routes
Must be able to find multiple routes
Must establish routes quickly
Must minimize overhead in its communication and in its reaction to topology changes
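Two of these requirements, loop-free routes and multiple routes, can be made concrete with a short sketch. It assumes a hypothetical four-node mesh (`A` to `D`) and enumerates every route exhaustively, which is only practical for tiny networks; real ad hoc routing protocols discover routes in a distributed way rather than from a global table like this one.

```python
def loop_free_routes(links, source, destination, visited=()):
    """Yield every route that never revisits a node (so it cannot loop)."""
    visited = visited + (source,)
    if source == destination:
        yield list(visited)
        return
    for neighbour in links.get(source, ()):
        if neighbour not in visited:
            yield from loop_free_routes(links, neighbour, destination, visited)

def without_link(links, a, b):
    """Copy of the topology after the link a-b breaks (e.g. a node moved away)."""
    return {n: [m for m in nbrs if {n, m} != {a, b}] for n, nbrs in links.items()}

# Assumed topology; the node names are illustrative only.
mesh = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}

routes = sorted(loop_free_routes(mesh, "A", "D"), key=len)
best = routes[0]                       # fewest hops

# Topology change: link A-B is lost, so a backup route is selected.
remaining = loop_free_routes(without_link(mesh, "A", "B"), "A", "D")
backup = sorted(remaining, key=len)[0]
```

Keeping several loop-free routes on hand lets the protocol switch quickly when the topology changes, instead of starting route discovery from scratch.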
Remote Exploration & Experimentation

Introduction
The goals of Remote Exploration & Experimentation (REE) are:
to move supercomputing into space in a cost-effective manner
to allow the use of inexpensive, state-of-the-art, commercial-off-the-shelf (COTS) components and subsystems in these space-based supercomputers.
sombody@gmail.com
Some of the responsibilities of the REE system software include:
1. Managing system resources (maintaining state information about each node and about the global system, performing system resource diagnostics, etc.).
2. Job scheduling (globally scheduling jobs across the system, local job scheduling within the node, allocation of resources to jobs, etc.).
3. Managing the scientific applications (launching the applications, monitoring the applications for failure, initiating recovery for applications, etc.).
The immediate concern of the Applications Manager is to oversee the execution of the scientific applications.
As the applications represent the ultimate customer of the REE environment, efficiently supporting their required dependability level is paramount.
The Applications Manager monitors the science application for externally visible signs of faulty behavior as well as for messages generated internally by the applications requesting fault tolerance services.
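The two monitoring channels described above (externally visible symptoms and explicit requests from the application) might be sketched as follows. This is not the REE implementation: the class, its method names and the use of a missed heartbeat as the "externally visible sign" are all assumptions made for illustration, and time is passed in explicitly so the example is deterministic.

```python
class ApplicationsManager:
    """Hypothetical monitor: detects silent applications and serves recovery requests."""

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # application name -> time of last heartbeat
        self.requests = []         # explicit fault tolerance requests from applications
        self.restarts = []         # log of recovery actions taken

    def launch(self, app, now):
        self.last_heartbeat[app] = now

    def heartbeat(self, app, now):
        self.last_heartbeat[app] = now

    def request_service(self, app, reason):
        self.requests.append((app, reason))

    def check(self, now):
        """Recover every application that asked for help or went silent."""
        failed = {app for app, _ in self.requests}
        failed |= {app for app, t in self.last_heartbeat.items()
                   if now - t > self.heartbeat_timeout}
        self.requests.clear()
        for app in sorted(failed):
            self.restarts.append(app)
            self.last_heartbeat[app] = now   # a restart counts as a fresh start
        return sorted(failed)

manager = ApplicationsManager(heartbeat_timeout=5)
manager.launch("image_filter", now=0)
manager.launch("spectrometer", now=0)
manager.heartbeat("image_filter", now=4)
silent = manager.check(now=6)                # spectrometer missed its deadline
manager.heartbeat("spectrometer", now=7)
manager.request_service("image_filter", "checksum mismatch")
requested = manager.check(now=8)             # image_filter asked for recovery
```

The first check catches an application that failed without reporting it; the second handles an application that detected its own problem and asked for help.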

Fault tolerant computers

Fault-tolerant computing is the art and science of building computing systems that continue to operate satisfactorily in the presence of faults.
A fault-tolerant system may be able to tolerate one or more fault types, including:
i) transient, intermittent or permanent hardware faults,
ii) software and hardware design errors,
iii) operator errors,
iv) externally induced upsets or physical damage.
Fault-tolerant computer systems

Fault-tolerant computer systems are systems designed around the concepts of fault tolerance.
In essence, they must be able to continue working to a level of satisfaction in the presence of faults.
[Figure: conceptual design of a segregated-component fault-tolerant computer]
Fault Tolerant Multiprocessor

Introduction
Fault tolerance is often considered a good additional feature for multiprocessor systems, but it is now becoming an essential attribute.
Fault tolerance can be achieved with dedicated, customized hardware, which may have the disadvantage of high cost.
Another approach is to exploit the existing redundancy in multiprocessor systems via a task-scheduling software strategy based on time redundancy.
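One way to picture time redundancy on a multiprocessor is duplicate-and-compare: the same task is executed twice on different processors, and a third execution is spent only when the first two disagree. The slides do not specify the scheduling strategy, so the scheme below, the `healthy` and `faulty` processor stand-ins and the `square` task are all assumptions for illustration.

```python
from collections import Counter

def run_with_time_redundancy(task, argument, processors):
    """Run task twice on different processors; use a third run to break a mismatch."""
    first = processors[0](task, argument)
    second = processors[1](task, argument)
    if first == second:
        return first, 2                      # results agree: 2 executions used
    third = processors[2](task, argument)    # disagreement: spend extra time on a retry
    value, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no two executions agree")
    return value, 3

def healthy(task, argument):
    return task(argument)

def faulty(task, argument):
    return task(argument) + 1                # simulated fault corrupts the result

def square(x):
    return x * x

ok_result, ok_runs = run_with_time_redundancy(square, 7, [healthy, healthy, healthy])
fixed_result, fixed_runs = run_with_time_redundancy(square, 7, [healthy, faulty, healthy])
```

No extra hardware is needed: the cost of tolerating the fault is paid in additional execution time on processors the system already has.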
Fault tolerant software

N-Version Programming
Recovery Block Approach
N-Version Programming
The N-version software concept attempts to parallel the traditional hardware fault tolerance concept of N-way redundant hardware.
In an N-version software system, each module is made with up to N different implementations. Each variant accomplishes the same task, but hopefully in a different way.
Each version then submits its answer to a voter or decider, which determines the correct answer.
This system can hopefully overcome the design faults present in most software by relying upon the design diversity concept.
An important distinction in N-version software is the fact that the system could include multiple types of hardware using multiple versions of software.
The goal is to increase diversity in order to avoid common-mode failures.
With N-version software, each version should be implemented in as diverse a manner as possible, including different tool sets, different programming languages, and possibly different environments.
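A minimal sketch of the voter idea, assuming three hypothetical versions of a square-root module: one uses Newton's method, one uses the standard library, and one contains a deliberately injected design fault. Answers are rounded before voting so that small floating-point differences between correct versions do not count as disagreement; that rounding step is an assumption of this example, not part of the N-version concept itself.

```python
from collections import Counter
import math

def sqrt_newton(x):
    guess = x or 1.0
    for _ in range(50):                  # Newton's iteration for sqrt(x)
        guess = (guess + x / guess) / 2
    return guess

def sqrt_library(x):
    return math.sqrt(x)

def sqrt_buggy(x):
    return x / 2                         # simulated design fault in one version

def majority_vote(answers):
    """Return the answer given by more than half of the versions."""
    value, votes = Counter(answers).most_common(1)[0]
    if votes <= len(answers) // 2:
        raise RuntimeError("no majority among versions")
    return value

def n_version(versions, x, digits=6):
    answers = [round(version(x), digits) for version in versions]
    return majority_vote(answers)

result = n_version([sqrt_newton, sqrt_library, sqrt_buggy], 9.0)
```

The faulty version is outvoted by the two correct ones. The scheme only helps if the versions fail independently, which is why the slides stress design diversity.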
Recovery Block Approach

The recovery block operates with an adjudicator, which confirms the results of various implementations of the same algorithm.
In a system with recovery blocks, the system view is broken down into fault-recoverable blocks.
The entire system is constructed of these fault-tolerant blocks.
Each block contains at least primary, secondary, and exceptional case code, along with an adjudicator.
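The structure of a single recovery block can be sketched as below. The primary and secondary alternates are tried in order on a copy of the input (a simple stand-in for a checkpoint), the adjudicator is an acceptance test, and the exceptional case is an error raised when every alternate fails. The sorting task, the injected fault in `primary_sort` and all function names are hypothetical.

```python
from collections import Counter

def recovery_block(alternates, acceptance_test, state):
    """Try each alternate on a copy of the state; return the first accepted result."""
    for alternate in alternates:
        checkpoint = list(state)             # restore point: work on a fresh copy
        try:
            result = alternate(checkpoint)
        except Exception:
            continue                         # a crash counts as a failed alternate
        if acceptance_test(state, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

def primary_sort(items):
    items.pop()                              # simulated fault: loses an element
    return sorted(items)

def secondary_sort(items):
    return sorted(items)

def is_sorted_permutation(original, result):
    """Adjudicator: the result is in order and holds exactly the original items."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(original) == Counter(result)

data = [3, 1, 2]
result = recovery_block([primary_sort, secondary_sort], is_sorted_permutation, data)
```

Unlike N-version programming, the alternates run one after another, and the secondary is executed only when the primary's result is rejected.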
Fault tree diagrams

Figure 1 shows a simple fault tree diagram in which either A or B must occur in order for the output event to occur. In this diagram, the two events are connected to an OR gate.
Alternatively, if both input events must occur in order for the output event to occur, then they are connected by an AND gate.
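If probabilities are attached to the input events, the two gate types can be evaluated numerically. The sketch below assumes that events A and B are statistically independent, and the probabilities 0.1 and 0.2 are made-up values for illustration, not data from the slides.

```python
def or_gate(*probabilities):
    """Output event occurs if at least one input event occurs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p            # probability that this input does not occur
    return 1.0 - none_occur

def and_gate(*probabilities):
    """Output event occurs only if every input event occurs."""
    all_occur = 1.0
    for p in probabilities:
        all_occur *= p
    return all_occur

p_a, p_b = 0.1, 0.2                      # assumed, illustrative probabilities
p_or = or_gate(p_a, p_b)                 # 1 - (0.9 * 0.8), about 0.28
p_and = and_gate(p_a, p_b)               # 0.1 * 0.2, about 0.02
```

With the same inputs, the OR gate gives a much higher output probability than the AND gate, which is why redundancy (an AND of independent failures) lowers the chance of the top event.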
THANK YOU
