
From Scheduling Theory to Practice: A Case Study

Ghislain Charrier
University of Lyon. LIP Laboratory
UMR CNRS - ENS Lyon - INRIA
46 allée d’Italie
69364 Lyon Cedex 07, FRANCE
ghislain.charrier@ens-lyon.fr

ABSTRACT

This paper presents the subject of my Ph.D. by developing a case study about scheduling heuristics for the Ocean-Atmosphere application. Heuristics have been proposed and validated by performing simulations and real experiments on the grid. Possible future work revolving around scheduling over the grid is also presented.

Categories and Subject Descriptors
J.2 [Computer Applications]: Physical Science And Engineering—Earth and Atmospheric Sciences

General Terms
Experimentation, Performance

Keywords
Scheduling, Climate Prediction, Grid Computing

1. INTRODUCTION

Grid infrastructures and middleware are becoming mature. However, many open research questions remain. Scheduling is one of the most active research areas in the grid domain: in order to provide efficient execution of computing-intensive applications, scheduling is a necessity.

The main topic of my Ph.D. is going from theoretical to practical scheduling over the grid. To attain this goal, I studied an application that predicts the climate evolution for the years to come and proposed an appropriate scheduling heuristic to execute it over the grid.

In this paper, I present a case study of scheduling for a climate prediction application called Ocean-Atmosphere. First, I describe the application, then the proposed heuristic and the experimental results obtained in Section 2. I then present possible future work for my Ph.D. in Section 3 before concluding in Section 4.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CSTST 2008, October 27-31, 2008, Cergy-Pontoise, France.
Copyright 2008 ACM 978-1-60558-046-3/08/0003 ...$5.00.

2. CASE STUDY

It is well known that the world's climate is changing. One of the main reasons for this change is the increase of greenhouse gases in the atmosphere. In order to predict the climate evolution, climatologists need to perform simulations. Cerfacs is a research organization that aims at developing new algorithms and new simulation methods for large scientific problems requiring access to enormous computational resources. Cerfacs works on different areas such as parallel algorithms, aerodynamics, combustion, data assimilation, electromagnetism, climate and environment.

Ocean-Atmosphere is an application developed by Cerfacs. This software performs simulations to predict the climate evolution for the years to come. Choosing the initial parameters of a simulation is hard, so several simulations are executed concurrently with different initializations.

The computational needs of these simulations are enormous, and a grid seems to be a good way to execute this kind of application. Even if the simulations can be performed in parallel, the time spent to obtain the results is quite large, so scheduling techniques can be applied to reduce the execution time. Another consequence of decreasing the execution time is a decrease in energy consumption, leading to less greenhouse gases emitted and less money spent on electricity or resource rental.

The goal of this research is to reduce the execution time needed to perform several simulations in parallel. This is done by using heuristics. To achieve this goal, I will first present a representation of the application, and then suggest a heuristic to minimize the execution time based on this representation.

2.1 Ocean-Atmosphere

The climate modeling application consists of executing simulations of the present climate followed by the 21st century, for a total of 150 years (a scenario). A scenario combines 1800 one-month simulations (150×12), launched one after the other: the results of the nth monthly simulation are the starting point of the (n+1)th. For the whole experiment, several scenarios must be performed. The number of months to be simulated and the number of scenarios are chosen by the user.

A one-month simulation decomposes into three phases: a pre-processing phase, a main-processing phase and a post-processing phase. The pre-processing phase only takes a few seconds, while the main phase takes at least 20 minutes to run in the best case. The post-processing phase needs a few minutes to execute.
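As a rough illustration of this structure, the chain of monthly simulations within one scenario can be sketched as follows; the function and variable names are hypothetical, not taken from the actual application:

```python
# Illustrative sketch only: a scenario is a chain of monthly simulations,
# each one starting from the results of the previous month; several
# scenarios run concurrently with different initializations.

def run_month(scenario_id, month, previous_state):
    """Stand-in for one monthly simulation (pre-, main- and post-processing)."""
    # A real run would launch the coupled ocean-atmosphere model here.
    return f"state(scenario {scenario_id}, month {month})"

def run_scenario(scenario_id, n_months=1800):
    """Run a 150-year scenario as 1800 one-month simulations in sequence."""
    state = f"initial state(scenario {scenario_id})"  # scenario-specific initialization
    for month in range(1, n_months + 1):
        state = run_month(scenario_id, month, state)
    return state
```

The sequential dependency between months is what prevents parallelizing a single scenario; only the concurrent scenarios provide parallelism between chains.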

- 581 -
To simplify the problem, the pre-processing phase and the main phase are merged into one task, and the post-processing phase is also represented by a single task. So, there are now two tasks per month: a main task and a post-processing task. The former is a parallel task using from 4 to 11 processors, and the latter is a sequential task. The speedup of the main task is superlinear.

Figure 1 shows the data dependencies between two consecutive months of a scenario. The left side presents the tasks before merging them together, and the right side presents the new merged tasks. The number after each small task's name is a possible time needed to execute that task.

[Figure 1: Chain of 2 consecutive monthly simulations. Left: the original tasks caif(1), mp(1), pcr(1260), cof(60), emf(60) and cd(60) for each month; right: the merged tasks main1, post1, main2 and post2.]
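The merged per-month representation can be captured as a small task graph. This is a hedged sketch under the dependencies shown in Figure 1 (each main task needs the previous main task, and each post task needs only its own main task); it is not code from the application:

```python
# Sketch of one scenario's merged task graph: per month, one parallel
# "main" task and one sequential "post" task. Dependencies as in Figure 1:
# main(n) needs main(n-1), and post(n) needs only main(n).

def build_scenario_dag(n_months):
    """Return a dict mapping each task to the set of tasks it depends on."""
    deps = {}
    for n in range(1, n_months + 1):
        deps[("main", n)] = {("main", n - 1)} if n > 1 else set()
        deps[("post", n)] = {("main", n)}
    return deps
```

Because the post tasks have no successors, they can all be postponed without delaying the chain of main tasks, which is what motivates scheduling them at the end.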
2.2 Scheduling

Using the representation defined in Section 2.1, the heuristic we developed to run the application on a homogeneous platform is the following: all the post-processing tasks are scheduled at the end, and the processors are divided into groups on which the main tasks are executed. The grouping is computed using linear constraints: solving the constraints gives the groups of processors that maximize the portion of the application executed at each time unit. An estimation of the execution time of a one-month simulation is needed to compute the grouping; to obtain this time, benchmarks must be conducted on the targeted platform.

Figure 2 shows a possible schedule for the tasks on the resources of a homogeneous platform.

[Figure 2: Possible schedule (resources versus time).]
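A minimal sketch of the grouping idea, assuming the linear constraints reduce to picking a single group size that maximizes the work completed per time unit (the actual heuristic solves linear constraints and is more general; `bench` and the function name are illustrative):

```python
# Hedged sketch: choose a group size g maximizing (number of groups) / T(g),
# where T(g) is the benchmarked duration of a one-month main task on g
# processors. Group sizes not divisible by the cores per node can be
# rejected, so that no node is ever split between two scenarios.

def best_group_size(n_procs, bench, cores_per_node=1):
    """bench maps candidate group sizes to benchmarked main-task durations."""
    best = None  # (throughput, group size, number of groups)
    for g, t in sorted(bench.items()):
        if g % cores_per_node != 0:
            continue  # grouping would split a node between scenarios
        n_groups = n_procs // g
        if n_groups == 0:
            continue
        throughput = n_groups / t  # months progressed per time unit
        if best is None or throughput > best[0]:
            best = (throughput, g, n_groups)
    return best[1], best[2]
```

For instance, with 32 processors and benchmarked times {4: 100, 8: 45, 11: 30}, the sketch picks four groups of 8 processors, since 4/45 beats both 8/100 and 2/30.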
The aim of this work is to execute the Ocean-Atmosphere application on Grid'5000. Grid'5000 is composed of several homogeneous clusters, but they differ from one another, so another heuristic is added to take the heterogeneity between clusters into account. The idea is to schedule more scenarios on a fast cluster than on a slow one: scenarios are added one by one, each on the cluster whose execution time suffers the least from the addition of a scenario.

2.3 Experimental Results

The heuristic has been implemented and tested on Grid'5000. Several problems occurred when performing these real experiments. First, the application crashes for unknown reasons, so a restart feature has been implemented in the server running the application. Secondly, due to the massive amount of data exchanged between two consecutive months, it was necessary to implement a flush mechanism to execute the post-processing tasks and delete some temporary data. Another problem caused by the large amount of data is the execution of the post-processing tasks: when several post-processing tasks run in parallel, they all retrieve the needed data from the NFS server, which becomes saturated and slows down the execution.

The last problem was a bug due to the MPI implementation we used. When dividing the resources into groups, the grouping may split a node between two scenarios, because all the nodes on Grid'5000 have at least two cores. In such a case, MPI crashes. To tackle this problem, we added an extra constraint to the choice of the number of resources per group: the number of resources in a group must be divisible by the number of cores of each node. To reduce the loss induced by this constraint, we allowed each scenario to be executed on 12 resources instead of 11; adding the possible 12th resource diminishes the loss.

Simulations and real experiments have been compared: the simulated times for the main tasks are quite accurate, but once the post-processing tasks are taken into account, the simulations always underestimate the execution time.

This section presented the heuristic used to execute Ocean-Atmosphere over the grid. It works quite well, but some points must be changed to achieve the best possible performance.

3. FUTURE WORK

The future work of my Ph.D. continues in the area of scheduling over the grid. More precisely, I will work with Diet, a GridRPC middleware, and develop new scheduling algorithms to use in it. I already used Diet to perform the Ocean-Atmosphere experiments on Grid'5000. Diet is executed on grids that most of the time rely on a batch scheduler to reserve resources, so it is not always possible to use classical algorithms to schedule the new tasks to execute. The reservation deadline, the load of the grid, the migration of tasks and many other parameters must all be taken into account at the same time to provide efficient scheduling heuristics. To test the new heuristics, we will collaborate with other institutes to execute their applications. Several domains are envisaged, such as cosmology or calculations on sparse matrices.

4. CONCLUSIONS

In this paper, I presented a case study of a cross-field research project between computer science and climatology. The climatology part is used to predict the changes in the climate, while the scheduling part is used to speed up the execution of the simulations predicting the climate evolution. I also gave a quick overview of some possible future work in the continuity of my Ph.D.

