
In this class

Types of Grids
Desktop Grids
Motivations
Limitations
BOINC
Applications

Dr. Alejandro Zunino
CONICET / ISISTAN-UNICEN
Clusters

Types of Grids
Grid computing vendors have adopted various nomenclatures to explain and define the different types of grids.
Some define grids based on the structure of the organization (virtual or otherwise) that is served by the grid.
Others define grids by the principal resources used in the grid.

Departmental Grids
Solve problems for a particular group of people within an enterprise:
Cluster Grids (Sun Microsystems):
One or more systems working together to provide a single point of access to users.
Used by a team for a single project.
Support both high-throughput and high-performance jobs.
Infra Grids (IBM):
A grid that optimizes resources within an enterprise.
Does not involve any other internal partner.
Enterprise Grids
Consist of resources spread across an enterprise and provide service to all users within that enterprise:
Enterprise Grids (Platform Computing):
Deployed within large corporations that have a global presence or a need to access resources outside a single corporate location.
Intra Grids (IBM):
Resource sharing among different groups within an enterprise constitutes an intra grid.
Can be local or traverse the wide area network.
Campus Grids (Sun Microsystems):
Enable multiple projects or departments to share computing resources in a cooperative way.
May consist of dispersed workstations and servers as well as centralized resources located in multiple administrative domains, in departments, or across the enterprise.

Extraprise Grids
Established between companies, their partners, and their customers. The grid resources are generally made available through a virtual private network (VPN):
Extra Grids (IBM):
Enable sharing of resources with external partners.
Assume that connectivity between the two enterprises is through some trusted service, such as a private network or a VPN.
Partner Grids (Platform Computing):
Grids between organizations within similar industries that need to collaborate on projects and use each other's resources as a means to reach a common goal.

Global Grids
Grids established over the public Internet. They can be established by organizations to facilitate their business, or purchased in part or in whole from service providers:
Global Grids (Sun):
Allow users to tap into external resources.
Provide the power of distributed resources to users anywhere in the world for computing and collaboration.
They can be used to send overflow work over the public network to a grid services provider.
Inter Grids (IBM):
Provide the ability to share compute and data/storage resources across the public Web.

Compute Grids
Provide access to computational resources:
Desktop Grids:
Leverage the resources of desktop computers.
Server Grids:
Some corporations, while adopting Grid Computing, keep it limited to server resources.
Special servers are bought solely for the purpose of creating an internal utility grid with resources made available to various departments.
No desktops are included in server grids.
High-Performance/Cluster Grids:
High-end systems, such as supercomputers or HPC clusters.

Others
Data Grids:
Optimized for data-oriented operations.
Utility Grids:
Commercial compute resources that are maintained and managed by a service provider.
Customers that need to augment their existing, internal computational resources may purchase cycles from a utility grid.
Utility grids may also offer applications that can be purchased by the minute.

Desktop Grids

Overview
Historical context
What is (and what isn't) a Desktop Grid?
Deployment challenges and value proposition development for Desktop Grid technology
Key areas to assess when evaluating the suitability of a Desktop Grid
The role of Desktop Grids in an enterprise computing infrastructure
Standards on Desktop Grids
Examples

Motivation
[Figure: CPU Availability]
High computational power at low cost
Reuse existing infrastructure of resources
Successful deployment of high-throughput, compute-intensive applications
High-throughput?
Typically means that the performance metric is the task completion rate over long periods of time (e.g., a month), as opposed to makespan.
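To make the distinction between the two metrics concrete, here is a minimal C sketch that computes both from a list of task completion times. It assumes all times are measured from a common start and share one unit (e.g., hours); the function names are hypothetical and not part of any grid middleware.

```c
#include <stddef.h>

/* Makespan: the time at which the last task of a batch finishes. */
double makespan(const double *finish_times, size_t n) {
    double last = 0.0;
    for (size_t i = 0; i < n; i++)
        if (finish_times[i] > last)
            last = finish_times[i];
    return last;
}

/* Throughput: tasks completed within the observation window [0, window],
   divided by the window length (tasks per unit of time). */
double throughput(const double *finish_times, size_t n, double window) {
    size_t done = 0;
    for (size_t i = 0; i < n; i++)
        if (finish_times[i] <= window)
            done++;
    return (double)done / window;
}
```

With finish times {1, 2, 3, 100}, the single straggler sets the makespan to 100, while the throughput over a window of 4 time units is 3/4 = 0.75 tasks per unit. High-throughput workloads care mainly about the second number.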

Cause Computing
Searching for extra-terrestrials:
SETI@home: http://setiathome.ssl.berkeley.edu/
Evaluating AIDS drug candidates:
FightAIDS@Home: http://www.fightaidsathome.org/
Screening for extremely large prime numbers:
Great Internet Mersenne Prime Search: http://www.mersenne.org/prime.htm
Predicting climate on a global scale:
ClimatePrediction.net: http://www.climateprediction.net/index.php

Features
Long computations
Short communication packets
User-initiated tasks have preference
Minimally intrusive on the user and his
Internet connection
Primitive version of today's Desktop Grids

Why Would I Donate CPU Time?
Donate to a scientific cause:
Many users wish to advance the specific field of study.
Projects that help fight disease may have an emotional connection for those participating.
Stress test computers:
Places a computer under full CPU load.
Teams, credits, and competition:
Hopes of climbing to the top of the world charts.
Personal benefits and recognition:
Projects such as PlanetQuest plan on allowing individuals to name those planets discovered using their computers.

Limitations
Lack of Resource Management
Passive resource management:
Such grids rely on the enrolled PCs to initiate communication with the central administration server on a periodic basis.
This limits the degree to which the timeliness of results from such a grid can be predicted.
It also limits the ability to re-prioritize the computational behavior of the grid, for example, replacing the PC that is working on a particular task in a timely manner.

Limitations
Lack of Security
Even if some form of encryption is used in transit, the data usually reside in an unencrypted format on the enrolled PC.
This limits the nature of the problems that can be attempted over the public Internet to those in which compromise of the data is not a pressing issue.
The answers produced on the enrolled PC may be vulnerable to tampering:
Ex: SETI@Home alternative clients (buggy)

Limitations
Machine Heterogeneity
A wide variety of machines might be enrolled; these can vary in CPU speed, RAM, hard-drive capacity, and operating system level.
The management infrastructure either needs to operate at the lowest common denominator or needs to be aware of differences in the machines and assign tasks appropriately.
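A minimal C sketch of the second option (a management layer that is aware of machine differences): each task carries resource requirements and is assigned only to an enrolled machine that satisfies all of them. The `Client` and `Requirements` types, their fields, and the first-fit selection are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of an enrolled machine. */
typedef struct {
    double cpu_ghz;
    int ram_mb;
    int free_disk_mb;
    const char *os;        /* e.g. "windows", "linux" */
} Client;

/* Hypothetical resource requirements attached to a task. */
typedef struct {
    double min_cpu_ghz;
    int min_ram_mb;
    int min_disk_mb;
    const char *os;        /* required OS, or NULL for any */
} Requirements;

/* True if the machine meets every requirement of the task. */
bool can_run(const Client *c, const Requirements *r) {
    if (c->cpu_ghz < r->min_cpu_ghz) return false;
    if (c->ram_mb < r->min_ram_mb) return false;
    if (c->free_disk_mb < r->min_disk_mb) return false;
    if (r->os != NULL && strcmp(c->os, r->os) != 0) return false;
    return true;
}

/* First-fit: index of the first eligible machine, or -1 if none. */
int pick_client(const Client *clients, int n, const Requirements *r) {
    for (int i = 0; i < n; i++)
        if (can_run(&clients[i], r))
            return i;
    return -1;
}
```

First-fit is the simplest policy; a real scheduler might instead prefer the fastest or least-loaded eligible machine.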

Limitations
Resource Availability
The cause-computing paradigm relies on the idea of voluntary participation.
The PC may be turned off for the night, the screensaver may be changed, the control program may be disabled (either deliberately or inadvertently), etc.
This adds another layer of unpredictability to the performance expectations that can be associated with such a grid.

So... What For?
Embarrassingly Parallel (aka Pleasantly Parallel) applications:
Independent tasks (concurrent, out of order).
Example: Mandelbrot
Data Parallel / Iterative applications:
Synchronized processes.
Example: Jacobi, Matrix Multiply
Workflow applications:
Described by a DAG.
Example: some image processing applications
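The Mandelbrot example is embarrassingly parallel because every pixel, and therefore every image row, can be computed without reference to any other. The C sketch below treats one row as one independent task; the function names and the viewing region [-2, 1] × [-1.5, 1.5] are illustrative assumptions.

```c
/* Escape-time iteration count for the point c = cr + ci*i. */
int mandel_iters(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int k = 0;
    while (k < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;   /* z = z^2 + c */
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        k++;
    }
    return k;
}

/* One independent task: row `row` of a width x height image of the
   region [-2, 1] x [-1.5, 1.5]. It reads no other row's data. */
void mandel_row(int row, int width, int height, int max_iter, int *out) {
    double ci = -1.5 + 3.0 * row / height;
    for (int x = 0; x < width; x++) {
        double cr = -2.0 + 3.0 * x / width;
        out[x] = mandel_iters(cr, ci, max_iter);
    }
}
```

A Grid Server could hand out the row indices 0 to height-1 as separate work units and reassemble the returned rows in any order.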

Examples
67 TFlops, 500,000 workers, $700,000
17.5 TFlops, 80,000 workers
186 TFlops, 195,000 workers

Desktop Grid
A defined (named) collection of machines on a shared network.
May include dedicated machines, intermittently connected machines, and shared machines.
Any single machine is part of one, and only one, Desktop Grid.
A set of user-controlled policies describing the way in which each of these machines participates in the grid:
Support for automated addition and removal of machines without user or administrative intervention.

Desktop Grid
The machines on the grid are unaware of each other except as informed by the central server.
Client-server architecture (no peer-to-peer).
Managed mechanism for distribution, execution, and retrieval of work to and from the grid under control of a central server.

Components
Grid Server
This is a central machine that controls and administers the Desktop Grid.
Grid Client
An individual node that is a member of the Desktop Grid from which spare computational resources will be harvested.
Grid Client Executive
The software component of the grid infrastructure that resides on a PC, enables that PC to serve as a Grid Client, and manages all interaction between the Grid Client and the Grid Server.

Components
Work Unit
A computation assigned to a Grid Client by the Grid Server, consisting of:
a grid-enabled version of an application
instructions for establishing an environment for the application on the Grid Client
input data (or a pointer to the location of the input data)
instructions on how to execute the application and produce the output data.

Technologies: Considerations
Security
Unobtrusiveness
Application Integration
Robustness
Scalability
Central Management
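As an illustration only, the four elements of a work unit listed above can be modelled as a C struct. The `WorkUnit` type, its field names, and the `format_command` helper are hypothetical; the command format simply shows how a Grid Client Executive might combine the pieces once the environment has been set up.

```c
#include <stdio.h>

/* Hypothetical representation of a work unit with the four elements
   listed above (plus the name of the output to return). */
typedef struct {
    const char *app_binary;     /* grid-enabled version of the application */
    const char *env_setup;      /* instructions for preparing its environment */
    const char *input_location; /* input data, or a pointer (path/URL) to it */
    const char *command_line;   /* how to execute it to produce the output */
    const char *output_name;    /* where the output data should be written */
} WorkUnit;

/* Assembles the command a Grid Client Executive might run once the
   environment described by env_setup is in place. Returns the length
   of the full command, as snprintf does (truncated if >= len). */
int format_command(const WorkUnit *wu, char *buf, size_t len) {
    return snprintf(buf, len, "%s %s %s > %s", wu->app_binary,
                    wu->command_line, wu->input_location, wu->output_name);
}
```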

Technologies: Security
Disallow (or limit) access to network or local
resources by the distributed application.
Encrypt application and data to preserve
confidentiality and integrity.
Ensure that the Grid Client environment (disk
contents, memory utilization, registry contents, and
other settings) remains unchanged after running
the distributed application.
Prevent local user from interfering with the
execution of the distributed application.
Prevent local user from tampering with or deleting
data associated with the distributed application.

Technologies: Integration
Ability to simulate a standalone environment
within the Grid Client.
Integrated security and encryption of
sensitive data.
Easy integration (tools, examples, and
wizards are provided).
Support for any application...
Binary-level integration (no recompilation,
relinking, or source code access...).

Technologies: Scalability
Automatic addition, configuration, and registration of new Grid Clients.
Compatible with heterogeneous resource population.
Configurable over multiple geographic locations.

Technologies: Unobtrusiveness
Centrally manage unobtrusiveness levels that are
changeable based on time-of-day or other factors.
Ensure that the Grid Client Executive relinquishes
client resources automatically.
Ensure invisibility to local user.
Prevent distributed application from displaying
dialogs or action requests.
Prevent performance degradation (and total system
failure) due to execution of the distributed
application.
Require very little (ideally, zero) interaction with the
day-to-day user of the Grid Client.
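A minimal sketch of a centrally managed, time-of-day-dependent unobtrusiveness policy in C. The `ThrottlePolicy` fields, the idea of a CPU-percentage cap, and the rule that grid work yields completely while the local user is active are assumptions for illustration, not features of a specific product.

```c
/* Hypothetical, centrally managed throttling policy.
   Assumes work_start_hour < work_end_hour (no wrap past midnight). */
typedef struct {
    int work_start_hour;  /* local users are typically present from ... */
    int work_end_hour;    /* ... until this hour (exclusive), 0-23 */
    int daytime_cpu_pct;  /* CPU cap (percent) during working hours */
    int night_cpu_pct;    /* CPU cap (percent) outside working hours */
} ThrottlePolicy;

/* Maximum CPU share the Grid Client Executive may use at `hour` (0-23).
   Any sign of local activity makes the grid work relinquish the CPU. */
int cpu_cap(const ThrottlePolicy *p, int hour, int user_active) {
    if (user_active)
        return 0;
    if (hour >= p->work_start_hour && hour < p->work_end_hour)
        return p->daytime_cpu_pct;
    return p->night_cpu_pct;
}
```

Because the policy is plain data, the Grid Server can push a new `ThrottlePolicy` to every client without touching the client software.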

Technologies: Robustness
Allocate work to appropriately configured Grid
Clients.
Automatically reallocate work units when Grid
Clients are removed from grid either permanently or
temporarily.
Automatically reallocate work units due to other
resource or network failures.
Prevent aberrant applications from completely
consuming Grid Client resources (disk, memory,
CPU, etc.).
Provide transparent support for many OSs in the Grid Client population.
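Automatic reallocation is usually driven by a deadline: if a client has not reported back in time, the server assumes the work unit is lost and makes it available again. The C sketch below assumes a hypothetical `WuSlot` record per work unit; it is similar in spirit to BOINC's `<delay_bound>` work-unit parameter, but it is not BOINC code.

```c
/* Hypothetical server-side state for one work unit. */
typedef enum { WU_UNSENT, WU_IN_PROGRESS, WU_DONE } WuState;

typedef struct {
    WuState state;
    int client_id;   /* client currently holding it, or -1 */
    long sent_at;    /* time (seconds) it was handed out */
} WuSlot;

/* Any work unit still in progress more than `deadline` seconds after it
   was sent is presumed lost (client removed, crashed, or unreachable) and
   goes back to the unsent pool for reallocation. Returns how many were
   reclaimed. */
int reclaim_expired(WuSlot *wus, int n, long now, long deadline) {
    int reclaimed = 0;
    for (int i = 0; i < n; i++) {
        if (wus[i].state == WU_IN_PROGRESS && now - wus[i].sent_at > deadline) {
            wus[i].state = WU_UNSENT;
            wus[i].client_id = -1;
            reclaimed++;
        }
    }
    return reclaimed;
}
```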

Technologies: Central Manageability


Automated monitoring of all grid resources.
Central queuing and management of work units for
the grid.
Central policy administration for grid access and
utilization.
Compatibility with existing IT management systems.
Product installation and upgrade can be
accomplished using typical enterprise software
management environments.
Remote client deployment and management.


PC Grids Versus Supercomputers

An Example: BOINC
Berkeley Open Infrastructure for Network Computing (BOINC)
http://boinc.berkeley.edu/
Features:
Flexible application framework:
Existing applications in common languages (C, C++, Fortran) can run as BOINC applications with little or no modification.
New versions of applications can be deployed with no participant involvement.

An Example: BOINC
Security:
BOINC protects against several types of attacks; for example, digital signatures based on public-key encryption protect against the distribution of viruses.
Multiple servers and fault-tolerance:
Separate scheduling and data servers, with multiple servers of each type.
Clients automatically try alternate servers; if all servers are down, clients do exponential backoff to avoid flooding the servers when they come back up.

An Example: BOINC
Support for large data:
BOINC supports applications that produce or consume large amounts of data, or that use large amounts of memory.
Data distribution and collection can be spread across many servers, and participant hosts transfer large data unobtrusively.
Users can specify limits on disk usage and network bandwidth. Work is dispatched only to hosts able to handle it.
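The retry behavior described above can be sketched as capped exponential backoff. In this C sketch the base delay, the cap, and the optional random jitter are illustrative assumptions; BOINC's actual client uses its own constants and logic.

```c
#include <stdlib.h>

/* Delay (seconds) before the next attempt after `failures` consecutive
   failures to reach any server: base * 2^failures, capped at max_delay. */
double backoff_delay(int failures, double base, double max_delay) {
    double d = base;
    for (int i = 0; i < failures && d < max_delay; i++)
        d *= 2.0;
    return d < max_delay ? d : max_delay;
}

/* Jittered variant: a random delay in [d/2, d], so that many clients
   waiting on the same outage do not all retry at the same instant. */
double backoff_delay_jitter(int failures, double base, double max_delay) {
    double d = backoff_delay(failures, base, max_delay);
    double u = (double)rand() / RAND_MAX;   /* uniform in [0, 1] */
    return d * (0.5 + 0.5 * u);
}
```

With a 60-second base and a one-hour cap, successive failures wait 60, 120, 240, 480, ... seconds until the cap is reached. A client would typically cycle through its alternate servers first and apply the delay only after all of them have failed.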

BOINC

BOINC Credits
The credit system is designed to avoid cheating by validating results before granting credit.
This ensures users are returning accurate results.

BOINC Manages the Details, But...
Validation:
When a sufficient number (a 'quorum') of successful results have been returned, the application compares them and checks whether there is a 'consensus'. This requires:
a method of comparing results (which may need to take into account platform-varying floating-point arithmetic)
a policy for determining consensus (e.g., best two out of three)
If a consensus is reached, a particular result is designated as the 'canonical' result.
If a result arrives after a consensus has already been reached, the new result is compared with the canonical result; this determines whether the user gets credit.
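A minimal C sketch of quorum-based validation for results that are single floating-point numbers. It assumes a relative tolerance to absorb platform-dependent floating-point differences, and a result is accepted as canonical once at least `quorum` results agree with it (with a quorum of 2 among 3 results this is the "best two out of three" rule). The function names are hypothetical; this is not BOINC's validator interface.

```c
#include <stdbool.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Two results agree if they differ by at most rel_tol relative to the
   larger magnitude (or absolutely, for values below 1.0). */
bool results_agree(double a, double b, double rel_tol) {
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    if (scale < 1.0)
        scale = 1.0;
    return abs_d(a - b) <= rel_tol * scale;
}

/* Looks for a result that agrees with at least `quorum` of the n returned
   results (itself included). Returns its index, which becomes the
   canonical result, or -1 if there is no consensus yet. */
int find_canonical(const double *results, int n, int quorum, double rel_tol) {
    if (n < quorum)
        return -1;                      /* quorum not yet reached */
    for (int i = 0; i < n; i++) {
        int votes = 0;
        for (int j = 0; j < n; j++)
            if (results_agree(results[i], results[j], rel_tol))
                votes++;
        if (votes >= quorum)
            return i;
    }
    return -1;
}

/* A result arriving after consensus earns credit only if it agrees
   with the canonical result. */
bool grant_credit(double late_result, double canonical, double rel_tol) {
    return results_agree(late_result, canonical, rel_tol);
}
```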

Projects Using BOINC
Climateprediction.net, BBC Climate Change Experiment, and Seasonal Attribution Project: study climate change.
Cell Computing: biomedical research.
Einstein@home: search for gravitational signals emitted by pulsars.
Predictor@home: predict protein structure from protein sequence.
SZTAKI Desktop Grid: search for generalized binary number systems.
LHC@home: improve the design of the CERN LHC particle accelerator.
Quantum Monte Carlo at Home: study the structure and reactivity of molecules using Quantum Chemistry.
SIMAP: calculate protein similarity data for use by many biological research projects.
Rosetta@home: help researchers develop cures for human diseases.
World Community Grid: advance our knowledge of human disease.
SETI@home: look for radio evidence of extraterrestrial life.

Using BOINC: hello.C

#include <stdio.h>          // fprintf(), fclose()
#include <stdlib.h>         // exit(), rand()
#include "diagnostics.h"    // boinc_init_diagnostics()
#include "boinc_api.h"      // boinc_init(), boinc_finish()
#include "filesys.h"        // boinc_fopen(), etc...
#include "util.h"           // parse_command_line(), boinc_sleep()

int main(int argc, char **argv) {
    int rc;                   // return code from various functions
    char resolved_name[512];  // physical file name for out.txt
    FILE* f;                  // file pointer for out.txt

    /* Before initializing BOINC itself, initialize diagnostics so as
       to get stderr output to the file stderr.txt */
    rc = boinc_init_diagnostics(BOINC_DIAG_REDIRECTSTDERR|
                                BOINC_DIAG_DUMPCALLSTACKENABLED|
                                BOINC_DIAG_TRACETOSTDERR);
    if (rc) exit(rc);

    /* Output written to stderr will be returned with the Result (task) */
    fprintf(stderr, "Hello, stderr!\n");

    /* BOINC apps that do not use graphics just call boinc_init() */
    rc = boinc_init();
    if (rc) {
        fprintf(stderr, "APP: boinc_init() failed. RC=%d\n", rc);
        fflush(0);
        exit(rc);
    }

    /* Input/output files need to be "resolved" from their logical name
       for the application to the actual path on the client's disk */
    rc = boinc_resolve_filename("out.txt", resolved_name, sizeof(resolved_name));
    if (rc) {
        fprintf(stderr, "APP: cannot resolve output file name. RC=%d\n", rc);
        boinc_finish(rc);     /* back to BOINC core */
    }

    /* Open files with boinc_fopen() not just fopen()
       (Output files should usually be opened in "append" mode, in case
       this is actually a restart (which will not be the case here)) */
    f = boinc_fopen(resolved_name, "a");
    if (f == NULL) {
        fprintf(stderr, "APP: cannot open output file.\n");
        boinc_finish(1);      /* back to BOINC core */
    }
    fprintf(f, "Hello, BOINC World!\n");

    /* Now run up a wee bit of credit. This is the "worker" loop */
    {
        int j, num, N;
        N = 123456789;
        fprintf(f, "Starting some computation...\n");
        for (j = 0; j < N; j++) {
            num = rand() + rand();    // just do something to spin the wheels
        }
        fprintf(f, "Computation completed.\n");
    }

    fclose(f);
    fprintf(stderr, "goodbye!\n");

    /* All BOINC applications must exit via boinc_finish(rc), not
       merely exit() */
    boinc_finish(0);          /* does not return */
}

/* Dummy graphics API entry points.
   This app does not do graphics, but it still must provide these
   empty callbacks. */
void app_graphics_init() {}
void app_graphics_resize(int width, int height) {}
void app_graphics_render(int xs, int ys, double time_of_day) {}
/* ... */

Using BOINC: hello_re.xml (Results)

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <url><UPLOAD_URL/></url>
    <max_nbytes>2048</max_nbytes>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>out.txt</open_name>
    </file_ref>
</result>

Using BOINC: hello_wu.xml (Work Unit)

<workunit>
    <min_quorum>1</min_quorum>
    <target_nresults>2</target_nresults>
    <max_error_results>3</max_error_results>
    <max_total_results>9</max_total_results>
    <max_success_results>11</max_success_results>
    <rsc_fpops_est>3e9</rsc_fpops_est>
    <rsc_fpops_bound>9e11</rsc_fpops_bound>
    <delay_bound>8000</delay_bound>
    <rsc_memory_bound>204800</rsc_memory_bound>
    <rsc_disk_bound>307200</rsc_disk_bound>
</workunit>

Applications for Desktop Grids


Analyzing Application Distribution Possibilities
Data Parallel:
Process large input datasets in a sequential fashion with no application dependencies between or among the records of the dataset.
Parameter Sweep:
Use an iterative approach to generate a multidimensional series of input values used to evaluate a particular set of output functions.
Probabilistic:
Process a very large number of trials using randomized inputs to generate input values used to evaluate a particular set of output functions.

[Figure: Input → Application → Output. Inputs I1, I2, ..., In are each processed by a separate copy of App*, producing outputs O1, O2, ..., On. *The application is untouched.]
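A parameter sweep can be enumerated so that each grid point becomes one independent work unit for the untouched application. This C sketch assumes a two-dimensional sweep over inclusive ranges with evenly spaced points; the `SweepPoint` type and the index mapping k → (k / ny, k % ny) are illustrative choices.

```c
/* One point of a two-dimensional parameter sweep. */
typedef struct {
    double x;  /* first input parameter */
    double y;  /* second input parameter */
} SweepPoint;

/* Number of work units in an nx-by-ny sweep. */
int sweep_size(int nx, int ny) {
    return nx * ny;
}

/* Parameters for work unit k (0 <= k < nx*ny) over the inclusive ranges
   [x0, x1] and [y0, y1], with nx, ny >= 2 evenly spaced points per axis.
   k maps to grid indices i = k / ny and j = k % ny. */
SweepPoint sweep_point(int k, int nx, int ny,
                       double x0, double x1, double y0, double y1) {
    int i = k / ny, j = k % ny;
    SweepPoint p;
    p.x = x0 + (x1 - x0) * i / (nx - 1);
    p.y = y0 + (y1 - y0) * j / (ny - 1);
    return p;
}
```

The Grid Server only needs to hand out the integers 0 to nx*ny-1; each client can derive its own inputs from k.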

Enabling Applications for Grids
How to decompose the input(s) of a large, monolithic job into an equivalent set of smaller input(s) that can be processed in a distributed fashion?
How to recompose the output(s) from these smaller distributed instances of the application into a combined output that is indistinguishable from that which would have been produced by the single large job?
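Both questions can be answered directly for a simple data-parallel job. In this C sketch (all names hypothetical, with squaring a record standing in for the real per-record work), the input records are split into contiguous chunks, each chunk is processed as its own work unit, and the partial outputs are recombined by summation. Because integer addition is associative, the combined output is identical to that of the single large job; with floating-point data, a different summation order could change the last bits of the result, which matters when outputs must be indistinguishable.

```c
#include <stddef.h>

/* Stand-in for the per-record work of the application. */
static long long process_record(long long r) {
    return r * r;
}

/* What one distributed instance does: process records [begin, end). */
long long run_chunk(const long long *records, size_t begin, size_t end) {
    long long partial = 0;
    for (size_t i = begin; i < end; i++)
        partial += process_record(records[i]);
    return partial;
}

/* Decompose n records into `parts` (>= 1) contiguous chunks, the last one
   absorbing the remainder; run each as a separate work unit; recompose
   the partial outputs by summation. */
long long run_decomposed(const long long *records, size_t n, size_t parts) {
    size_t chunk = n / parts;
    long long total = 0;
    for (size_t p = 0; p < parts; p++) {
        size_t begin = p * chunk;
        size_t end = (p == parts - 1) ? n : begin + chunk;
        total += run_chunk(records, begin, end);   /* one work unit */
    }
    return total;
}
```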

Determining Application Suitability
Compute Intensity (CI):
Compares the time spent performing calculations on the data with the time spent moving data to and from the Desktop Grid Client:
$$CI = \frac{4 \times \mathrm{WorkUnitDuration}}{\mathrm{InputSize} + \mathrm{OutputSize}}$$
with WorkUnitDuration in seconds and InputSize and OutputSize in KB, as in the example.
In general, grid-enabled applications where CI is greater than 1.0 are well suited for distributed processing using a Desktop Grid solution.
What if the network is very fast (e.g., 1 Gbps)? Then a CI below 1.0 may be acceptable.
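The CI formula is simple arithmetic; this C sketch (hypothetical function names) encodes it together with the CI > 1.0 rule of thumb, taking durations in seconds and data sizes in KB as the slides' example does.

```c
/* Compute Intensity: 4 * work-unit duration (seconds) divided by the
   total data moved (input + output, in KB). */
double compute_intensity(double duration_s, double input_kb, double output_kb) {
    return (4.0 * duration_s) / (input_kb + output_kb);
}

/* Rule of thumb from the slides: CI > 1.0 suggests a good fit. */
int suits_desktop_grid(double ci) {
    return ci > 1.0;
}
```

With the example figures (900 s, 2,000 KB in, 400 KB out), `compute_intensity` returns 1.5, so the application passes the rule of thumb.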

Determining Application Suitability
Example:
A typical work unit executes in 15 minutes (900 seconds) on a hypothetical average grid client, consumes 2 MB (2,000 KB) of input data, and produces 0.4 MB (400 KB) of output data:
CI = (4 × 900) / (2000 + 400) = 3600 / 2400 = 1.5
Since CI is greater than 1.0, this application is a good candidate for a Desktop Grid.

The Grid Server: Additional Services
Client Group-level Operations
As the size and complexity of the grid grow, it is more useful to administer the grid as a collection of virtual, overlapping groups.
A set of rules allows client membership to be determined automatically both for new Grid Clients and for Grid Clients that have changed status (for example, upgrading the Windows operating system on that client or adding memory to that client).
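Rule-based, overlapping group membership can be sketched in C as a list of predicates over client attributes, re-evaluated whenever a client registers or its status changes. The `ClientInfo` fields, the two sample rules, and the bitmask encoding are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical attributes reported by a Grid Client. */
typedef struct {
    const char *os;      /* e.g. "Windows XP" */
    int ram_mb;
    double cpu_ghz;
} ClientInfo;

/* A group is defined by a rule: a predicate over client attributes. */
typedef bool (*GroupRule)(const ClientInfo *);

/* Two sample rules; groups defined this way may overlap. */
bool rule_big_memory(const ClientInfo *c) { return c->ram_mb >= 2048; }
bool rule_windows(const ClientInfo *c) { return strncmp(c->os, "Windows", 7) == 0; }

/* Bit g of the result is set when the client belongs to group g.
   Re-run this when a client registers and whenever its status changes. */
unsigned membership(const ClientInfo *c, const GroupRule *rules, int nrules) {
    unsigned mask = 0;
    for (int g = 0; g < nrules; g++)
        if (rules[g](c))
            mask |= 1u << g;
    return mask;
}
```

With the rules {big memory, Windows}, a Windows XP client with 1,024 MB of RAM is only in the Windows group (mask 2); after a memory upgrade to 4,096 MB, re-evaluation places it in both groups (mask 3).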

The Grid Server: Additional Services
Data Caching
The time needed to move data to and from the Grid Client plays an important role in the calculation of CI.
Data caching places the data needed for a work unit in (or very close to) the Grid Client. It can be:
manually controlled: certain data sets are pushed to particular Clients, and then any work unit that needs those data sets is assigned exclusively to those Clients
automatically administered: the Grid Server examines its queue of work and ensures that any data needed for a work unit will be available at the Client

The Grid Server: Additional Services
Job-level Scheduling and Administration:
"Run this application using these inputs with this priority and put the answers here."
This is substantially more abstract than the fundamental work-unit level of the internal workings of the Desktop Grid.
Should support various levels of job priority, along with the ability to select particular Clients (or groups of Clients) for a particular job based on characteristics of the job itself.
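The manually controlled data-caching policy can be sketched as a data-affinity check at assignment time. In this C sketch the `CacheClient` type, the bitmask of cached data sets, and the first-fit choice are hypothetical.

```c
/* Hypothetical client record: bit d of cached_sets is set when data
   set d has already been pushed to (cached on) this client. */
typedef struct {
    int id;
    unsigned cached_sets;
    int busy;              /* nonzero while running a work unit */
} CacheClient;

/* A work unit that needs data set d goes only to an idle client that
   already holds d. Returns that client's id, or -1 so the work unit
   waits (or, under automatic administration, so the server first
   pushes the data set to some client). */
int assign_with_affinity(const CacheClient *cs, int n, int d) {
    for (int i = 0; i < n; i++)
        if (!cs[i].busy && (cs[i].cached_sets & (1u << d)))
            return cs[i].id;
    return -1;
}
```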

The Grid Server: Additional Services
Performance Tuning and Analysis
Data and reports that allow an administrator to determine important performance characteristics of each Grid Client and of the grid as a whole:
optimum (theoretical) throughput calculations
actual throughput calculations for any particular job or set of work units
identification of any problematic Clients (or groups of Clients)
...

The Grid Server: Additional Services
Security
Each separately identified function within the Grid Server user environment should include user-level security:
which users may add new applications
which users may submit jobs
which users may review job output
...

The Grid Server: Additional Services
System Interfaces
The Grid Server should support a variety of interfaces for its various user and administrative functions.
At minimum, a browser-based interface.
Other interfaces that might be provided include a command-line interface (for scripting support), an API (for invoking grid functionality from other programs), and an XML interface.

Practical Uses of Desktop Grids
Data Mining
Engineering Design: CAD/CAM and rendering
Financial Modeling: Portfolio management and risk management
Geophysical Modeling: Climate prediction and seismic computations
Graphic Design: Animation and three-dimensional rendering
Life Sciences: Disease simulation and target identification
Material Sciences: Physical property prediction and product optimization
Supply Chain Management: Process optimization and total cost minimization

Conclusions
Desktop Grids started just a few years ago as noble-minded projects for combining the spare compute capacity of individual PCs.
They can aggregate the unused cycles of an organization's existing PC resources into a powerful, virtual computing engine.
They can supplement (or replace) existing HPC resources at a fraction of the cost.
Not all computing problems are well suited to Desktop Grids.