Ghislain Charrier
University of Lyon. LIP Laboratory
UMR CNRS - ENS Lyon - INRIA
46 allée d’Italie
69364 Lyon Cedex 07, FRANCE
ghislain.charrier@ens-lyon.fr
few minutes to execute. To simplify the problem, the pre-processing phase and the main phase are merged into one task. The post-processing phase is also represented by a single task. So, there are now two tasks: a main task and a post-processing task. The former is a parallel task using from 4 to 11 processors and the latter is a sequential task. The speedup of the main task is superlinear.

Figure 1 shows the data dependencies between two consecutive months of a scenario. The left side presents the tasks before merging them together, and the right side presents the new merged tasks. The number after each small task name is a possible time needed to execute that task.

[Figure 1. Left: original tasks of months 1 and 2 with possible execution times: caif (1), mp (1), pcr (1260), cof (60), emf (60), cd (60). Right: merged tasks main1, post1, main2, post2.]

Figure 1: Chain of 2 consecutive monthly simulations.

2.2 Scheduling

Using the representation defined in Section 2.1, the heuristic we developed to run the application on a homogeneous platform is the following: all the post-processing tasks are scheduled at the end, and the processors are divided into groups on which the main tasks will be executed. The grouping is computed using linear constraints. Solving the constraints gives the groups of processors that maximize the portion of the application executed at each time unit. An estimate of the execution time of a one-month simulation is needed to compute the grouping. To obtain this time, benchmarks must be conducted on the targeted platform.

Figure 2 shows a possible schedule for the tasks on the resources of a homogeneous platform.

2.3 Experimental Results

The heuristic has been implemented and tested on Grid’5000. Several problems occurred when performing these real experiments. First, the application crashes for unknown reasons, so a restart feature has been implemented in the server running the application. Second, due to the massive amount of data between two consecutive months, it is necessary to implement a flush mechanism to execute the post-processing tasks and delete some temporary data. Another problem caused by the large amount of data is the execution of the post-processing tasks: when we execute several post-processing tasks in parallel, they all retrieve the needed data from the NFS, which becomes saturated and slows down the execution.

The last problem was a bug due to the implementation of MPI we used. When dividing the resources into groups, it is possible that the grouping splits a node between two scenarios, because all the nodes on Grid’5000 are at least bi-core. In such a case, MPI crashes. To tackle this problem, we added an extra constraint to the choice of the number of resources per group: the number of resources in a group must be divisible by the number of cores of each node. To reduce the loss induced by this constraint, we allowed each scenario to be executed on 12 resources instead of 11. Adding the possible 12th resource diminishes the loss.

Simulations and real experiments have been compared. The simulated times for the main tasks are close to the measured ones, but if the post-processing tasks are taken into account, the simulations always underestimate the execution time.

This section presented the heuristic used to execute Ocean-Atmosphere over the grid. It works quite well, but some points must be changed if we want to achieve the best possible performance.

3. FUTURE WORK

The future work of my Ph.D. continues in the area of scheduling over the grid. More precisely, I will work with Diet, which is a GridRPC middleware, and develop new scheduling algorithms to use in it. I already used Diet to perform the experiments of Ocean-Atmosphere on Grid’5000.
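As an illustration of the grouping step of Section 2.2 combined with the divisibility constraint of Section 2.3, the following Python sketch picks group sizes that maximize the number of monthly simulations completed per time unit. It is only a sketch under stated assumptions: it replaces the linear constraints with a small dynamic program, and the function name `best_grouping`, the benchmark times and the platform size are hypothetical values chosen for the example, not measurements.

```python
def best_grouping(total_resources, n_scenarios, month_time, cores_per_node=1):
    """Choose group sizes maximizing the months simulated per time unit.

    month_time maps a group size to the benchmarked time of one monthly
    main task on that many resources (hypothetical values in this sketch).
    At most n_scenarios groups are built, using at most total_resources.
    Returns (throughput, list of group sizes).
    """
    # Keep only sizes that never split a node between two groups.
    sizes = [g for g in sorted(month_time) if g % cores_per_node == 0]

    # best[k][r]: best (throughput, groups) with at most k groups
    # and at most r resources.
    best = [[(0.0, ())] * (total_resources + 1)
            for _ in range(n_scenarios + 1)]
    for k in range(1, n_scenarios + 1):
        for r in range(total_resources + 1):
            candidate = best[k - 1][r]  # option: do not add a group
            for g in sizes:
                if g <= r:
                    prev_tp, prev_groups = best[k - 1][r - g]
                    tp = prev_tp + 1.0 / month_time[g]
                    if tp > candidate[0]:
                        candidate = (tp, prev_groups + (g,))
            best[k][r] = candidate

    throughput, groups = best[n_scenarios][total_resources]
    return throughput, sorted(groups, reverse=True)


if __name__ == "__main__":
    # Hypothetical benchmark times for one month on 4 to 12 resources.
    times = {4: 3600.0, 5: 2700.0, 6: 2200.0, 7: 1850.0, 8: 1600.0,
             9: 1420.0, 10: 1280.0, 11: 1170.0, 12: 1100.0}
    print(best_grouping(53, 10, times, cores_per_node=2))
```

With bi-core nodes (`cores_per_node=2`) every odd size, including 11, is filtered out. This is why allowing a 12th resource per scenario matters: without it, the largest usable group would have only 10 resources.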
Since Diet is executed on grids that most of the time have a
batch scheduler to reserve resources, it is not always pos-
sible to use classical algorithms to schedule the new tasks
to execute. The reservation deadline, the load of the grid,