
Grid Computing as Virtual Supercomputing

Paper presented by
Rajsamayya Ganesh Yalagonda
Ashish Kawarlal Khatri
Third Year CSE, Anuradha Engineering College, Sakegaon Road, Chikhli, Buldana (M.S.)


Abstract
Grid computing is still a developing field and is related to several other innovative computing systems, some of which are categorized as shared computing, Software-as-a-Service (SaaS), utility computing, and cloud computing. A grid computing system works on pooled computing: a number of computers connected in a network collectively work to perform a single task. Such a setup is known as a virtual supercomputer. In such a system, all the computers work in parallel to achieve the goal. One major project which used grid computing is SETI@home, which aims to search for alien intelligence by studying radio telescope signals.


Keywords: Supercomputing, Network Computing, Grid Computing, SETI@home

1. Introduction:
The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid, in Ian Foster's and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure" (2004). Grid computing is a phrase in distributed computing which can have several meanings: a local computer cluster, which is like a "grid" because it is composed of multiple nodes; or offering online computation or storage as a metered commercial service, known as utility computing, "computing on demand", or "cloud computing".

2. Grid Computing as Virtual Supercomputer:

Grid computing can also mean the creation of a "virtual supercomputer" by using a network of geographically dispersed computers; this is the sense of the term used in this paper. Volunteer computing, which generally focuses on scientific, mathematical, and academic problems, is the most common application of this technology.

Before moving towards our further discussion of grid computing, it is useful to have an idea of what a supercomputer is. A supercomputer is a computer that is at the frontline of current processing capacity, particularly speed of calculation. Supercomputers using custom CPUs traditionally gained their speed over conventional computers through the use of innovative designs that allow them to perform many tasks in parallel, as well as complex detail engineering. They tend to be specialized for certain types of computation, usually numerical calculations, and perform poorly at more general computing tasks. Their memory hierarchy is very carefully designed to ensure the processor is kept fed with data and instructions at all times; in fact, much of the performance difference between slower computers and supercomputers is due to the memory hierarchy.

The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce computing resources similar to a many-CPU supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various CPUs and local storage areas do not have high-speed connections. This arrangement is thus well suited to applications where multiple parallel computations can take place independently, without the need to communicate intermediate results between CPUs. The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet. Conventional supercomputers also create physical challenges in supplying sufficient electricity and cooling capacity in a single location.

Both supercomputers and grids can be used to run multiple parallel computations at the same time, which might be different simulations for the same project, or computations for completely different applications. The infrastructure and programming considerations needed to do this on each type of platform are different, however. There are also differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can cause conventional, standalone programs to run on multiple machines, with each given a different part of the same problem, as sketched below. This makes it possible to write and debug programs on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
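To make the "thin grid layer" idea concrete, the following minimal Python sketch (illustrative only, not taken from any real grid middleware) splits an embarrassingly parallel problem into independent work units, hands each unit to a standalone worker, and merges the partial results afterwards. Python's multiprocessing pool stands in here for the separate machines of a real grid.

    # Sketch: split a problem into independent work units and farm them out.
    from multiprocessing import Pool

    def analyze(work_unit):
        """Standalone computation on one work unit (here: a sum of squares)."""
        start, stop = work_unit
        return sum(n * n for n in range(start, stop))

    def split(problem_size, unit_size):
        """Cut the full problem into independent, non-overlapping work units."""
        return [(lo, min(lo + unit_size, problem_size))
                for lo in range(0, problem_size, unit_size)]

    if __name__ == "__main__":
        units = split(problem_size=1_000_000, unit_size=50_000)
        with Pool() as pool:                      # "grid" of workers
            partials = pool.map(analyze, units)   # each worker gets one unit
        print("combined result:", sum(partials))  # concatenate/merge results

On a real grid, the pool would be replaced by a scheduler that ships each work unit to a remote node, but the program structure stays the same, which is why a conventional standalone program needs so little change.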

3. Design Considerations and Variations:

One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks.

One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit; discrepancies would identify malfunctioning and malicious nodes (a simple sketch of this check is given at the end of this section).

Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dial-up Internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results as expected.

The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors. In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system. For example, Parabon Computation produces grid computing software that operates in a Java sandbox.

Public systems, or those crossing administrative domains (including different departments in the same organization), often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a tradeoff between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this tradeoff, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform). Various middleware projects have created generic infrastructure to allow various scientific and commercial projects to harness a particular associated grid, or to set up new grids.
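As an illustration of the redundancy check described above, here is a hedged Python sketch (hypothetical, not any project's actual validator): each work unit is assigned to at least two randomly chosen nodes, a result is accepted only when two independent nodes agree, and nodes that disagree with the accepted answer are flagged as suspect.

    # Sketch: redundant assignment of work units and majority-agreement validation.
    import random
    from collections import Counter

    def assign(work_units, nodes, replication=2):
        """Map each work unit id to a random sample of distinct nodes."""
        return {wu: random.sample(nodes, replication) for wu in work_units}

    def validate(reports):
        """reports: {work_unit: {node: result}} -> (accepted results, suspect nodes)."""
        accepted, suspects = {}, set()
        for wu, by_node in reports.items():
            counts = Counter(by_node.values())
            result, votes = counts.most_common(1)[0]
            if votes >= 2:                       # at least two nodes agree
                accepted[wu] = result
                suspects |= {n for n, r in by_node.items() if r != result}
            # else: no quorum yet -- reassign this work unit to more nodes
        return accepted, suspects

    print(assign(work_units=[0, 1], nodes=["n1", "n2", "n3", "n4"]))
    # Node "n3" returns a bad answer for work unit 1 and gets flagged:
    reports = {0: {"n1": 42, "n2": 42}, 1: {"n2": 17, "n3": 99, "n4": 17}}
    print(validate(reports))

A work unit that never reaches a quorum (because a node dropped out or lied) would simply be reassigned, which is the same mechanism used to handle nodes that disappear from the network.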

4. How It Actually Works:

For most volunteer grid projects, the process of participation is the same. A user interested in participating downloads an application from the respective project's Web site. After installation, the application contacts the project's control node. The control node sends a chunk of data to the user's computer for analysis. The software then analyzes the data, powered by untapped CPU resources.

The project's software has a very low resource priority: if the user needs to activate a program that requires a lot of processing power, the project software shuts down temporarily. Once CPU usage returns to normal, the software begins analyzing data again. Eventually, the user's computer will complete the requested data analysis. At that time, the project software sends the data back to the control node, which relays it to the proper database. Then the control node sends a new chunk of data to the user's computer, and the cycle repeats itself. If the project attracts enough users, it can complete ambitious goals in a relatively short time span.

"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected by a conventional network interface, such as Ethernet or the Internet. This is in contrast to the traditional notion of a supercomputer, which has many CPUs connected by a local high-speed computer bus.

In most grid computing systems, only certain users are authorized to access the full capabilities of the network. Otherwise, the control node would be flooded with processing requests and nothing would happen (a situation called deadlock in the IT business). It is also important to limit access for security purposes. For that reason, most systems have authorization and authentication protocols. These protocols limit network access to a select number of users. Other users are still able to access their own machines, but they can't leverage the entire network.
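The participation cycle described above can be summarized as a short, hypothetical client loop. The control-node URL, endpoints, and "analysis" below are placeholders invented for illustration; a real project (for example, one built on BOINC) uses its own protocol. The sketch fetches a chunk, processes it at the lowest CPU priority, reports the result, and repeats.

    # Sketch of a volunteer client loop: fetch work, analyze at low priority, report.
    import os
    import time
    import urllib.request

    CONTROL_NODE = "https://example.org/project"   # placeholder URL, not a real project

    def fetch_chunk():
        with urllib.request.urlopen(CONTROL_NODE + "/work") as resp:
            return resp.read()                     # raw data to analyze

    def analyze(chunk):
        return sum(chunk) % 251                    # stand-in for the real analysis

    def report(result):
        req = urllib.request.Request(CONTROL_NODE + "/result",
                                     data=str(result).encode(), method="POST")
        urllib.request.urlopen(req)

    if __name__ == "__main__":
        os.nice(19)                # lowest CPU priority (POSIX only), mirroring the
                                   # "very low resource priority" described above
        while True:
            try:
                chunk = fetch_chunk()
                report(analyze(chunk))             # send the answer back, get new work
            except OSError:
                time.sleep(600)                    # offline or server busy: retry later

Authorization and authentication would sit in front of the /work and /result endpoints in a real deployment, so that only registered clients can pull work or submit results.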

The middleware and control node of a grid computing system are responsible for keeping the system running smoothly. Together, they control how much access each computer has to the network's resources and vice versa. While it's important not to let any one computer dominate the network, it's just as important not to let network applications take up all the resources of any one computer. If the system robs users of computing resources, it's not an efficient system.

5. Applications Using Grid Computing:

Grids offer a way to solve Grand Challenge problems such as protein folding, financial modeling, earthquake simulation, and climate/weather modeling. Grids offer a way of using information technology resources optimally inside an organization. They also provide a means for offering information technology as a utility for commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.

1. The Search for Extraterrestrial Intelligence (SETI) project is one of the earliest grid computing systems to gain popular attention. The mission of the SETI project is to analyze data gathered by radio telescopes in search of evidence for intelligent alien communications. There is far too much information for a single computer to analyze effectively. The SETI project created a program called SETI@home, which networks computers together to form a virtual supercomputer instead.

2. A similar program is the Folding@home project administered by the Pande Group, a nonprofit institution in Stanford University's chemistry department. The Pande Group is studying proteins. The research includes the way proteins take certain shapes, called folds, and how that relates to what proteins do. Scientists believe that protein "misfolding" could be the cause of diseases like Parkinson's or Alzheimer's. It is possible that by studying proteins, the Pande Group may discover new ways to treat or even cure these diseases. The Genome Comparison Project, a research project comparing the protein sequences of more than 3,500 organisms against each other, began on Dec. 20, 2006. By July 21, 2007, the project had achieved all its goals by using a grid computing system.

3. Grid computing is presently being applied successfully by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb Co., and American Express. The NASA Advanced Supercomputing facility (NAS) has run genetic algorithms using the Condor cycle scavenger running on about 350 Sun and SGI workstations.

6. SETI@home:
("SETI at home") is an internet-based public

volunteer computing project employing the BOINC software platform, hosted by the Space Sciences Laboratory, at the University of California, Berkeley, in the United States. SETI is an acronym for the Search for Extra-Terrestrial Intelligence. Its purpose is to analyze radio signals, searching for signs of extra terrestrial intelligence, and is one of many activities undertaken as part of SETI. SETI@home was released to the public on May 17, 1999, making it the second large-scale use of distributed computing over the Internet for research purposes, as Distributed.net was launched in 1997. Along with Milkyway@home and Einstein@home, it is the third computing project of this type that has the investigation of phenomena in interstellar space as its primary purpose. Technology SETI@home version 4.45 Anybody with an at least intermittently Internetconnected computer can participate in SETI@home by running a free program that downloads and analyzes radio telescope data. Observational Data is recorded on 36 Gigabyte tapes at the Arecibo Observatory in Puerto Rico, each holding 15.5 hours of observations, which is then mailed to Berkeley. Arecibo does not have a high bandwidth internet connection, so data must go by postal mail to Berkeley at first. Once there, it is divided in both time and frequency domains work units of 107 seconds of data, or approximately 0.35 MB, which overlap in time but not in frequency. These work units then get sent from the SETI@home server over the Internet to personal computers around the world to analyze. The analysis software can search for signals with about one-tenth the strength of those sought in previous surveys, because it makes use of a computationally intensive algorithm called coherent integration that no one else has had the computing power to implement. Data is merged into a database using SETI@home computers in Berkeley. Interference is rejected, and various pattern-detection algorithms are applied to search for the most interesting signals. Software
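A rough back-of-the-envelope check, using only the figures quoted above (a sketch: decimal megabytes/gigabytes are assumed and the time overlap between work units is ignored; the project's actual splitting parameters are not given here), shows how those numbers fit together.

    # Rough arithmetic from the quoted figures: 36 GB per 15.5-hour tape,
    # 0.35 MB work units covering 107 seconds each.
    TAPE_BYTES   = 36e9          # one 36 GB tape
    TAPE_SECONDS = 15.5 * 3600   # 15.5 hours of observation per tape
    UNIT_BYTES   = 0.35e6        # ~0.35 MB per work unit
    UNIT_SECONDS = 107           # 107 seconds of data per work unit

    recording_rate = TAPE_BYTES / TAPE_SECONDS    # bytes/s written at the telescope
    unit_rate      = UNIT_BYTES / UNIT_SECONDS    # bytes/s covered by one work unit
    freq_slices    = recording_rate / unit_rate   # ~200 frequency slices per moment
    units_per_tape = TAPE_BYTES / UNIT_BYTES      # ~100,000 units, ignoring overlap

    print(f"recording rate  : {recording_rate / 1e6:.2f} MB/s")
    print(f"work-unit rate  : {unit_rate / 1e3:.2f} kB/s")
    print(f"frequency slices: {freq_slices:.0f}")
    print(f"work units/tape : {units_per_tape:,.0f}")

In other words, each tape yields on the order of a hundred thousand work units, and each work unit covers only a narrow slice (roughly 1/200) of the recorded frequency band, which is what keeps a single unit down to about 0.35 MB.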

Software:

The initial software platform, now referred to as "SETI@home Classic", ran from 17 May 1999 to 15 December 2005. This program was only capable of running SETI@home; it was replaced by the Berkeley Open Infrastructure for Network Computing (BOINC), which also allows users to contribute to other distributed computing projects at the same time as running SETI@home. The BOINC platform also allows testing for more types of signals. The discontinuation of the SETI@home Classic platform rendered older Macintosh computers running pre-Mac OS X versions of the Mac OS unsuitable for participating in the project.

On 3 May 2006, work units for a new version of SETI@home called "SETI@home Enhanced" started distribution. Since computers now have the power for more computationally intensive work than when the project began, this new version is more sensitive by a factor of two with respect to Gaussian signals and to some kinds of pulsed signals than the original SETI@home (BOINC) software. The new application has been optimized to the point where it runs faster on some work units than earlier versions; however, some work units (the best work units, scientifically speaking) take significantly longer. In addition, some distributions of the SETI@home application have been optimized for a particular type of CPU. These are referred to as "optimized executables" and have been found to run faster on systems with that specific CPU. As of 2007, most of these applications are optimized for Intel processors (and their corresponding instruction sets). The results of the data processing are normally transmitted automatically when the computer is next connected to the Internet; the software can also be instructed to connect to the Internet as needed.

a) SETI@home (version 3.08) under classic client:

The SETI@home distributed computing software runs either as a screensaver or continuously while a user works, making use of processor time that would otherwise be unused.

b) Statistics:

With over 5.2 million participants worldwide, the project is the distributed computing project with the most participants to date. The original intent of SETI@home was to utilize 50,000 to 100,000 home computers. Since its launch on May 17, 1999, the project has logged over two million years of aggregate computing time. On September 26, 2001, SETI@home had performed a total of 10^21 floating-point operations. It is acknowledged by the Guinness World Records as the largest computation in history. With over 278,832 active computers in the system (2.4 million total) in 234 countries, as of November 14, 2009, SETI@home had the ability to compute over 769 teraFLOPS. For comparison, the Cray Jaguar, which as of 26 September 2009 was the world's fastest supercomputer, achieved 1,759 teraFLOPS.


7. Advantages & Disadvantages


Some advantages are quite obvious:

1. There is no need to buy large six-figure SMP servers for applications that can be split up and farmed out to smaller commodity-type servers. Results can then be concatenated and analyzed upon job completion.

2. It makes much more efficient use of idle resources. Jobs can be farmed out to idle servers or even idle desktops. Many of these resources sit idle, especially during off-business hours. Policies can be put in place that allow jobs to go only to servers that are lightly loaded or have the appropriate memory/CPU characteristics for the particular application.

3. Grid environments are much more modular and don't have single points of failure. If one of the servers or desktops within the grid fails, there are plenty of other resources able to pick up the load. Jobs can automatically restart if a failure occurs.

4. Policies can be managed by the grid software. The software is really the brains behind the grid. A client resides on each server and sends information back to the master telling it what type of availability or resources it has to complete incoming jobs (see the sketch after this list).
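As an illustration of points 2 and 4, the following minimal Python sketch (hypothetical field names and thresholds, not any particular grid product) shows how a master could use the availability reports sent by clients to select only lightly loaded nodes that meet a job's memory and CPU requirements.

    # Sketch: clients report availability; the master matches jobs to eligible nodes.
    from dataclasses import dataclass

    @dataclass
    class NodeReport:                 # what a client sends back to the master
        name: str
        load: float                   # 1-minute load average
        free_mem_gb: float
        cpus: int

    @dataclass
    class JobPolicy:                  # requirements attached to an incoming job
        max_load: float
        min_mem_gb: float
        min_cpus: int

    def eligible(nodes, policy):
        """Return the lightly loaded nodes that meet the job's requirements."""
        return [n for n in nodes
                if n.load <= policy.max_load
                and n.free_mem_gb >= policy.min_mem_gb
                and n.cpus >= policy.min_cpus]

    nodes = [NodeReport("desk-01", 0.1, 4, 4),
             NodeReport("srv-07", 3.5, 64, 16),
             NodeReport("desk-02", 0.3, 8, 8)]
    policy = JobPolicy(max_load=0.5, min_mem_gb=4, min_cpus=4)
    print([n.name for n in eligible(nodes, policy)])   # ['desk-01', 'desk-02']

Real grid schedulers layer queueing, priorities, and fault handling on top of this basic matching step, but the policy idea is the same.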

Some disadvantages:

1. For memory-hungry applications that can't take advantage of MPI, you may be forced to run on a large SMP machine.

2. You may need a fast interconnect between compute resources (gigabit Ethernet at a minimum), or InfiniBand for MPI-intensive applications.

8. Future Scope:

As grid computing systems grow in sophistication, we'll see more organizations and corporations create versatile networks. There may even come a day when corporations internetwork with other companies. In that environment, computational problems that seem impossible now may be reduced to a project that lasts a few hours. We'll have to wait and see.

9. Conclusion:

We can say that grid computing will serve as a cost-effective technology, substituting for the use of a supercomputer, as demonstrated by projects like SETI@home.

10. References:

Plaszczak, Pawel; Wellner, Rich, Jr. Grid Computing: The Savvy Manager's Guide. Morgan Kaufmann Publishers. ISBN 0-12-742503-9.
Berman, Fran; Hey, Anthony J. G.; Fox, Geoffrey C. Grid Computing: Making the Global Infrastructure a Reality.
Li, Maozhen; Baker, Mark A. The Grid: Core Technologies. Wiley. ISBN 0-470-09417-6.
en.m.wikipedia.org/wiki/Grid_computing
en.m.wikipedia.org/wiki/SETI%40home
en.m.wikipedia.org/wiki/condor_cycle_scavenger
