Cluster Computing

What are clusters?


A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers working cooperatively as a single, integrated computing resource.

The computers in a cluster share common network characteristics, such as the same namespace, and the cluster appears to other computers on the network as a single resource. The individual computers are linked by high-speed network interfaces, and the actual binding of all of them into one cluster is performed by the operating system and the cluster software.

MOTIVATION FOR CLUSTERING


High cost of ‘traditional’ High Performance Computing
Clustering using Commercial Off The Shelf (COTS) components is far cheaper than buying specialized machines for computing.
Cluster computing has emerged as a result of the convergence of
several trends, including the availability of inexpensive high
performance microprocessors and high-speed networks, and the
development of standard software tools for high performance
distributed computing.
Increased need for High Performance Computing
As more processing power becomes available, applications that require enormous amounts of processing, such as weather modeling, are becoming commonplace and demand the high performance computing that clusters provide.

Thus the viable alternative is “Building Your Own Cluster”, which is what cluster computing is all about.

Components of a Cluster
The main components of a cluster are the personal computers and the interconnection network. The computers can be built from commercial off-the-shelf (COTS) components and are economical.
The interconnection network can be an ATM (Asynchronous Transfer Mode) ring, which guarantees a fast and effective connection, or a Fast Ethernet connection, which is commonly available now. Gigabit Ethernet, which provides speeds up to 1000 Mbps, and Myrinet, a commercial interconnection network with high speed and reduced latency, are also viable options.
But for high-end scientific clustering, there are a variety of
network interface cards designed specifically for clustering.
Those include Myricom's Myrinet, Giganet's cLAN and the
IEEE 1596 standard Scalable Coherent Interface (SCI). Those
cards' function is not only to provide high bandwidth between
the nodes of the cluster but also to reduce the latency (the time it
takes to send messages). Low latency is crucial for exchanging state information between the nodes and keeping their operations synchronized.
INTERCONNECTION NETWORKS
Myricom
Myricom offers cards and switches that interconnect at
speeds of up to 1.28 Gbps in each direction. The cards come in
two different forms, copper-based and optical. The copper
version for LANs can communicate at full speed at a distance of
10 feet but can operate at half that speed at distances of up to 60
feet. Myrinet on fiber can operate at full speed up to 6.25 miles
on single-mode fiber, or about 340 feet on multimode fiber.
Myrinet offers only direct point-to-point, hub-based, or switch-
based network configurations, but it is not limited in the number
of switch fabrics that can be connected together. Adding switch
fabrics simply increases the latency between nodes. The average
latency between two directly connected nodes is 5 to 18
microseconds, an order of magnitude or more faster than Ethernet.

Giganet
Giganet was the first vendor of Virtual Interface (VI)
architecture cards for the Linux platform, with its cLAN cards
and switches. The VI architecture is a platform-neutral software
and hardware system that Intel has been promoting to create
clusters. It uses its own network communications protocol rather
than IP to exchange data directly between the servers, and it is
not intended to be a WAN routable system. The future of VI now
lies in the ongoing work of the System I/O Group, which is
itself a merger of the Next-Generation I/O group led by Intel,
and the Future I/O Group led by IBM and Compaq. Giganet's
products can currently offer 1 Gbps unidirectional
communications between the nodes at minimum latencies of 7
microseconds.

IEEE SCI
The IEEE standard SCI has even lower latencies (under 2.5
microseconds), and it can run at 400 MB per second (3.2 Gbps)
in each direction. SCI is a ring-topology-based networking
system unlike the star topology of Ethernet. That makes it faster
to communicate between the nodes on a larger scale. Even more
useful is a torus topology network, with many rings between the
nodes. A two-dimensional torus can be pictured as a grid of n by
m nodes with a ring network at every row and every column. A
three-dimensional torus is similar, with a 3D cubic grid of nodes
that also has rings at every level. Massively parallel supercomputers use such topologies to provide short communication paths between hundreds or thousands of nodes.
The limiting factor in most of those systems is not the operating
system or the network interfaces but the server's internal PCI
bus system. Basic 32-bit, 33-MHz PCI common in nearly all
desktop PCs and most low-end servers offers only 133 MB per
second (1 Gbps), stunting the power of those cards. Some costly
high-end servers such as the Compaq Proliant 6500 and IBM
Netfinity 7000 series have 64-bit, 66-MHz PCI buses that run at four
times that speed. Unfortunately, the paradox arises that more
organizations use the systems on the low end, and thus most
vendors end up building and selling more of the low-end PCI
cards. Specialized network cards for 64-bit, 66-MHz PCI also
exist, but they come at a much higher price. For example, Intel
offers a Fast Ethernet card of that sort for about $400 to $500,
almost five times the price of a regular PCI version.

CLUSTER CLASSIFICATION ACCORDING TO ARCHITECTURE
Clusters can be basically classified into two types:

o Closed Clusters
o Open Clusters
Closed Clusters
They hide most of the cluster behind the gateway node. Consequently they need fewer IP addresses and provide better security. They are well suited for computing tasks.

Open Clusters
All nodes can be seen from outside, and hence they need more IP addresses and raise more security concerns, but they are more flexible and are used for Internet/web/information server tasks.

Closed Cluster
[Figure: the compute nodes and a file server node sit on a high-speed internal network; a service network connects them to a gateway/front-end node, which is the only machine exposed to the external network.]

Open Cluster
[Figure: the compute nodes, the file server node and the front-end are all connected to the external network in addition to the high-speed network.]

TECHNICALITIES IN THE DESIGN OF A CLUSTER


Homogeneous and Heterogeneous Clusters.
A cluster can be built either from homogeneous machines, which have the same hardware and software configuration, or as a heterogeneous cluster with machines of different configurations. Heterogeneous clusters face problems such as differing performance profiles and more complex software configuration management.
Diskless Versus “Disk full” Configurations

This decision strongly influences what kind of networking system is used. Diskless systems are by their very nature slower performers than machines that have local disks, because no matter how fast the CPU is, the limiting factor on performance is how fast a program can be loaded over the network.
Network Selection.
Speed should be the main criterion for selecting the network. Channel bonding, a software technique that ties multiple network connections together to increase overall throughput, can be used to improve the performance of Ethernet networks.

Security Considerations
Special considerations are involved when completing the
implementation of a cluster. Even with the queue system and
parallel environment, extra services are required for a cluster to
function as a multi-user computational platform. These services
include the well-known network services NFS, NIS and rsh.
NFS allows cluster nodes to share user home directories as well
as installation files for the queue system and parallel
environment. NIS provides correct file and process ownership
across all the cluster nodes from the single source on the master
machine. Although these services are significant components of
a cluster, such services create numerous vulnerabilities. Thus, it
would be insecure to have cluster nodes function on an open
network. For these reasons, computational cluster nodes usually
reside on private networks, often accessible to users only
through a firewall gateway. In most cases, the firewall is
configured on the master node using ipchains or iptables.
Having all cluster machines on the same private network
requires them to be connected to the same switch (or linked
switches) and, therefore, localized at the same proximity. This
situation creates a severe limitation in terms of cluster
scalability. It is impossible to combine private network machines
in different geographic locations into one joint cluster, because
private networks are not routable with the standard Internet
Protocol (IP).
Combining cluster resources on different locations, so that users
from various departments would be able to take advantage of
available computational nodes, however, is possible.
Theoretically, merging clusters is not only desirable but also
advantageous, in the sense that different clusters are not
localized at one place but are, rather, centralized. This setup
provides higher availability and efficiency to clusters, and such a
proposition is highly attractive. But in order to merge clusters,
all the machines would have to be on a public network instead of
a private one, because every single node on every cluster needs
to be directly accessible from the others. If we were to do this,
however, it might create insurmountable problems because of
the potential--the inevitable--security breaches. We can see then
that to serve scalability, we severely compromise security, but
where we satisfy security concerns, scalability becomes
significantly limited. Faced with such a problem, how can we
make clusters scalable and, at the same time, establish a rock-
solid security on the cluster networks? Enter the Virtual Private
Network (VPN).
VPNs often are heralded as one of the most cutting-edge,
cost-saving solutions to various applications, and they are
widely deployed in the areas of security, infrastructure
expansion and inter-networking. A VPN adds more dimension to
networking and infrastructure because it enables private
networks to be connected in secure and robust ways. Private
networks generally are not accessible from the Internet and are
networked only within confined locations.
The technology behind VPNs, however, changes what we
have previously known about private networks. Through
effective use of a VPN, we are able to connect previously
unrelated private networks or individual hosts both securely and
transparently. Being able to connect private networks opens a
whole slew of new possibilities. With a VPN, we are not limited
to resources in only one location (a single private network). We
can finally take advantage of resources and information from all
other private networks connected via VPN gateways, without
having to largely change what we already have in our networks.
In many cases, a VPN is an invaluable solution to integrate and
better utilize fragmented resources.
In our environment, the VPN plays a significant role in
combining high performance Linux computational clusters
located on separate private networks into one large cluster. The
VPN, with its power to transparently combine two private
networks through an existing open network, enabled us to
connect seamlessly two unrelated clusters in different physical
locations. The VPN connection creates a tunnel between
gateways that allows hosts on two different subnets (e.g.,
192.168.1.0/24 and 192.168.5.0/24) to see each other as if they
are on the same network. Thus, we were able to operate critical
network services such as NFS, NIS, rsh and the queue system
over two different private networks, without compromising
security over the open network. Furthermore, the VPN encrypts
all the data being passed through the established tunnel and
makes the network more secure and less prone to malicious
exploits.
The VPN solved not only the previously discussed
problems with security, but it also opened a new door for
scalability. Since all the cluster nodes can reside in private
networks and operate through the VPN, the entire infrastructure
can be better organized and the IP addresses can be efficiently
managed, resulting in a more scalable and much cleaner
network. Before VPNs, assigning public IP addresses to every single node on the cluster was a persistent problem, and it limited the maximum number of nodes that could be added to the cluster. Now, with a VPN, our cluster can expand in greater
magnitude and scale in an organized manner. As can be seen, we
have successfully integrated the VPN technology to our
networks and have addressed important issues of scalability,
accessibility and security in cluster computing.

Beowulf Cluster
Basically, the Beowulf architecture is a multi-computer
architecture that is used for parallel computation applications.
Therefore, Beowulf clusters are primarily meant only for
processor-intensive and number crunching applications and
definitely not for storage applications. Primarily, a Beowulf
cluster consists of a server computer that controls the
functioning of many client nodes that are connected together
with Ethernet or another network built from switches or hubs. One good feature of Beowulf is that all of the system's components are available off the shelf and no special hardware is required to implement it. It also uses commodity software - most often Linux - and other commonly available components like the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI).
Besides serving all the client nodes in the Beowulf cluster,
the server node also acts as a gateway to external users and
passes files to the Beowulf system. The server is also used to
drive the console of the system from where the various
parameters and configuration can be monitored. In some cases,
especially in very large Beowulf configurations, there is
sometimes more than one server node with other specialized
nodes that perform tasks like monitoring stations and additional
consoles. In disk-less configurations, very often, the individual
client nodes do not even know their own addresses until the
server node informs them.
The major difference between the Beowulf clustering system
and the more commonly implemented Cluster of Workstations
(CoW) is the fact that Beowulf systems tend to appear as an
entire unit to the external world and not as individual
workstations. In most cases, the individual workstations do not
even have a keyboard, mouse or monitor and are accessed only
by remote login or through a console terminal. In fact, a
Beowulf node can be conceptualized as a CPU+memory
package that can be plugged into the Beowulf system, much as a CPU and memory are plugged into a motherboard.
It's important to realize that Beowulf is not a specific set of
components or a networking topology or even a specialized
kernel. Instead, it's simply a technology for clustering together
Linux computers to form a parallel, virtual supercomputer.

PROGRAMMING THE BEOWULF CLUSTER


Parallel programming requires skill and creativity and may
be more challenging than assembling the hardware of a Beowulf
system. The most common model for programming Beowulf
clusters is a master-slave arrangement. In this model, one node
acts as the master, directing the computations performed by one
or more tiers of slave nodes. Another challenge is balancing the
processing workload among the cluster's PCs.

A Beowulf cluster can be programmed in three ways:

o Using a parallel message passing library such as PVM or MPI
o Using a parallel language such as High Performance Fortran or OpenMP
o Using a parallel math library

PVM - Parallel Virtual Machine

A message passing system from Oak Ridge National Laboratory and the University of Tennessee. It appeared before MPI. It is flexible for non-dedicated clusters and easy to use, but it has lower performance and is less feature-rich than MPI.
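To give a feel for PVM's call style, here is a hedged sketch in C of a simple master/worker program. It assumes the PVM daemon is already running on the participating hosts and that the compiled binary is installed where PVM can find and spawn it; the worker count, message tag and the "result" are illustrative placeholders, not part of any particular setup.

/*
 * Hedged PVM master/worker sketch.  The same binary acts as master
 * (when it has no PVM parent) or as a spawned worker.
 */
#include <stdio.h>
#include <pvm3.h>

#define NWORKERS 4
#define TAG      1      /* arbitrary message tag */

int main(int argc, char **argv)
{
    int mytid  = pvm_mytid();          /* enrol this process in PVM */
    int parent = pvm_parent();

    if (parent == PvmNoParent) {       /* no parent: we are the master */
        int tids[NWORKERS];
        /* spawn copies of this program; PVM picks the hosts */
        pvm_spawn(argv[0], NULL, PvmTaskDefault, "", NWORKERS, tids);

        for (int w = 0; w < NWORKERS; w++) {
            int result;
            pvm_recv(-1, TAG);         /* receive from any worker */
            pvm_upkint(&result, 1, 1);
            printf("got result %d\n", result);
        }
    } else {                           /* spawned copy: act as a worker */
        int answer = mytid;            /* placeholder "result" */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&answer, 1, 1);
        pvm_send(parent, TAG);
    }

    pvm_exit();                        /* leave the virtual machine */
    return 0;
}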
MPI - Message Passing Interface
A standard message passing interface for programming clusters and parallel systems, defined by the MPI Forum. It is easy to use.
Since these libraries are available on many platforms and are the de facto standards for implementing parallel programs, programs written with PVM or MPI will run with little or no modification on large-scale machines, if the need arises.
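As a concrete illustration of the master-slave model described earlier, here is a minimal MPI sketch in C. Rank 0 acts as the master and hands one integer work item to every other rank; the workers square the item and return the result. The tag value and the squaring "computation" are arbitrary placeholders.

/*
 * Minimal master/worker sketch with MPI.  Rank 0 distributes work,
 * the other ranks compute and send results back.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int TAG = 1;                     /* arbitrary message tag */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master node */
        for (int w = 1; w < size; w++) {   /* hand out work items */
            int item = w * 10;
            MPI_Send(&item, 1, MPI_INT, w, TAG, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {   /* collect results */
            int result;
            MPI_Recv(&result, 1, MPI_INT, w, TAG, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("result from worker %d: %d\n", w, result);
        }
    } else {                               /* worker nodes */
        int item, result;
        MPI_Recv(&item, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = item * item;              /* the "computation" */
        MPI_Send(&result, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Such a program would typically be launched across the cluster with a command such as mpirun -np 5 ./master_worker, giving one master and four workers.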

Top 10 reasons to prefer MPI over PVM


o MPI has more than one freely available, quality implementation.
There are at least LAM, MPICH and CHIMP. The choice of development tools is not coupled to the programming interface.

o MPI defines a 3rd party profiling mechanism.
A tool builder can extract profile information from MPI applications by supplying the MPI standard profile interface in a separate library, without ever having access to the source code of the main implementation.

o MPI has full asynchronous communication.
Immediate send and receive operations can fully overlap computation (see the nonblocking communication sketch after this list).

o MPI groups are solid, efficient, and deterministic.
Group membership is static. There are no race conditions caused by processes independently entering and leaving a group. New group formation is collective and group membership information is distributed, not centralized.

o MPI efficiently manages message buffers.
Messages are sent and received from user data structures, not from staging buffers within the communication library. Buffering may, in some cases, be totally avoided.

o MPI synchronization protects the user from 3rd party software.
All communication within a particular group of processes is marked with an extra synchronization variable, allocated by the system. Independent software products within the same process do not have to worry about allocating message tags.

o MPI can efficiently program MPP and clusters.
A virtual topology reflecting the communication pattern of the application can be associated with a group of processes. An MPP implementation of MPI could use that information to match processes to processors in a way that optimizes communication paths.

o MPI is totally portable.
Recompile and run on any implementation. With virtual topologies and efficient buffer management, for example, an application moving from a cluster to an MPP could even expect good performance.

o MPI is formally specified.
Implementations have to live up to a published document of precise semantics.

o MPI is a standard.
Its features and behaviour were arrived at by consensus in an open forum. It can change only by the same process.
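The sketch below, in C, shows the kind of overlap referred to in the asynchronous-communication item above: each process starts a nonblocking send and receive to its ring neighbours, does some unrelated local work, and only then waits for the messages to complete. The ring pattern, buffer sizes and the "local work" are illustrative assumptions.

/*
 * Overlapping communication with computation using MPI's nonblocking
 * (immediate) operations in a simple ring exchange.
 */
#include <mpi.h>

#define N 1024

static void do_local_work(double *data, int n)
{
    /* placeholder for computation that does not need the incoming data */
    for (int i = 0; i < n; i++)
        data[i] *= 2.0;
}

int main(int argc, char **argv)
{
    int rank, size;
    double sendbuf[N], recvbuf[N], local[N];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* ring neighbours */
    int left  = (rank + size - 1) % size;

    for (int i = 0; i < N; i++) { sendbuf[i] = rank; local[i] = i; }

    /* start the exchange, then compute while the messages are in flight */
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[1]);

    do_local_work(local, N);                /* overlapped computation */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete the exchange */

    MPI_Finalize();
    return 0;
}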
Advantages and Disadvantages of Programming Using Message Passing
The main advantages are that these libraries are standards, and hence portable, and that they provide higher performance than the other approaches.
The disadvantage is that programming with them is quite difficult.
Programming Using Parallel Languages
There are hundreds of parallel languages but very few of
them are standard. The most popular parallel language is High
Performance Fortran. The High Performance Fortran Forum
(HPFF), a coalition of industry, academic and laboratory
representatives, works to define a set of extensions to Fortran 90
known collectively as High Performance Fortran (HPF). HPF
extensions provide access to high-performance architecture
features while maintaining portability across platforms.
The advantage of programming using parallel languages is that it is easy to code and that the code is portable. The disadvantages are lower performance and limited scalability.
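HPF itself is a Fortran extension, but the directive-based approach it represents can be illustrated compactly with OpenMP in C (OpenMP was listed earlier as one of the parallel-language options). This is a minimal sketch and assumes a compiler with OpenMP support enabled, for example via a flag such as -fopenmp.

/*
 * Loop-parallel OpenMP sketch: the compiler splits the iterations of
 * the annotated loop among the available threads and combines the
 * partial sums via the reduction clause.
 */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i * 0.5;   /* each thread works on its own chunk */
        sum += a[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}

Compiling with, for example, gcc -fopenmp and setting OMP_NUM_THREADS controls how many threads share the loop.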
Programming using Parallel Math Libraries
By using parallel math libraries, the complexity of writing
parallel code is avoided. Some examples are the PETSc, PLAPACK and ScaLAPACK math libraries.
PETSc.
PETSc is intended for use in large-scale application
projects, and several ongoing computational science projects are
built around the PETSc libraries. With strict attention to
component interoperability, PETSc facilitates the integration of
independently developed application modules, which often most
naturally employ different coding styles and data structures.
PETSc is easy to use for beginners. Moreover, its careful
design allows advanced users to have detailed control over the
solution process. PETSc includes an expanding suite of parallel
linear and nonlinear equation solvers that are easily used in
application codes written in C, C++, and Fortran. PETSc
provides many of the mechanisms needed within parallel
application codes, such as simple parallel matrix and vector
assembly routines that allow the overlap of communication and
computation. In addition, PETSc includes growing support for
distributed arrays.
Features include:
o Parallel vectors, with scatter and gather operations
o Parallel matrices, with several sparse storage formats and easy, efficient assembly
o Scalable parallel preconditioners
o Krylov subspace methods
o Parallel Newton-based nonlinear solvers
o Parallel timestepping (ODE) solvers
o Complete documentation
o Automatic profiling of floating point and memory usage
o Consistent interface
o Intensive error checking
o Portability to UNIX and Windows
o Over one hundred examples
PETSc is supported and will be actively enhanced for the next several years.
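As a rough illustration of how little code a PETSc program needs, the sketch below creates a parallel vector, fills it and computes its 2-norm. The calls follow the PETSc Vec interface, but exact signatures and the usual error-checking macros vary between PETSc releases, so treat this as an outline rather than a drop-in program.

/*
 * Hedged PETSc sketch: a vector distributed across all MPI ranks.
 */
#include <petscvec.h>

int main(int argc, char **argv)
{
    Vec       x;
    PetscReal norm;

    PetscInitialize(&argc, &argv, NULL, NULL);   /* also initializes MPI */

    VecCreate(PETSC_COMM_WORLD, &x);             /* vector spread over all ranks */
    VecSetSizes(x, PETSC_DECIDE, 100);           /* global size 100, local split chosen by PETSc */
    VecSetFromOptions(x);
    VecSet(x, 1.0);                              /* set every entry to 1 */

    VecNorm(x, NORM_2, &norm);                   /* collective reduction */
    PetscPrintf(PETSC_COMM_WORLD, "||x|| = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}

Like any MPI application, a PETSc program is started on the cluster with the usual launcher, for example mpirun.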

PLAPACK
PLAPACK is an MPI-based Parallel Linear Algebra
Package that provides an infrastructure for building parallel
dense linear algebra libraries. PLAPACK provides three unique features:
o Physically based matrix distribution
o An API to query matrices and vectors
o A programming interface that allows object-oriented programming

ScaLAPACK
ScaLAPACK is a library of high performance linear algebra routines for distributed memory MIMD computers. It contains routines for solving systems of linear equations. Most machine dependencies are limited to two standard libraries: the PBLAS, or Parallel Basic Linear Algebra Subroutines, and the BLACS, or Basic Linear Algebra Communication Subroutines. LAPACK and ScaLAPACK will run on any system where the PBLAS and the BLACS are available.

The traditional measure of supercomputer performance is benchmark speed: how fast the system runs a
standard program. As engineers, however, we prefer to focus on
how well the system can handle practical applications. Ingenuity
in parallel algorithm design is more important than raw speed or
capacity: in this young science, David and Goliath (or Beowulf
and Grendel!) still compete on a level playing field.
Beowulf systems are also muscling their way into the
corporate world. Major computer vendors are now selling
clusters to businesses with large computational needs. IBM, for
instance, is building a cluster of 1,250 servers for NuTec
Sciences, a biotechnology firm that plans to use the system to
identify disease-causing genes. An equally important trend is the
development of networks of PCs that contribute their processing
power to a collective task. An example is SETI@home, a project
launched by researchers at the University of California at
Berkeley who are analyzing deep-space radio signals for signs
of intelligent life. SETI@home sends chunks of data over the
Internet to more than three million PCs, which process the radio-
signal data in their idle time. Some experts in the computer
industry predict that researchers will eventually be able to tap
into a "computational grid" that will work like a power grid:
users will be able to obtain processing power just as easily as
they now get electricity.
Above all, the Beowulf concept is an empowering force.
It wrests high-level computing away from the privileged few
and makes low-cost parallel-processing systems available to
those with modest resources. Research groups, high schools,
colleges or small businesses can build or buy their own Beowulf
clusters, realizing the promise of a supercomputer in every
basement.

Diskless Cluster

A diskless compute node boots entirely over the network from the server:

o The diskless node broadcasts a DHCP request.
o The server assigns it an IP address.
o The node loads its operating system over TFTP, with the server supplying the OS image.
o The node then requests and mounts its root file system from the server over NFS.
