Professional Documents
Culture Documents
Technical Report
NWU-CS-99-3
May 27, 1999
Stochastic Scheduling
Jennifer M. Schopf
Francine Berman
Abstract
There is a current need for scheduling policies that can leverage the performance variability
observed when scheduling parallel computations on multi-user clusters. We develop one solution to
this problem called stochastic scheduling that utilizes a distribution of application execution
performance on the target resources to determine a performance-efficient schedule.
In this paper, we define a stochastic scheduling policy based on time-balancing for data parallel
applications whose execution behavior can be adequately represented as a normal distribution. We
demonstrate using a distributed SOR application that a stochastic scheduling policy can achieve
good and predictable performance for the application as evaluated by several performance measures.
!"
#"
$% &'''
!"
!" !"" " !" #
$!
" " !% &! !! % "! "' '
%'"!' % ! ! # !
! #% "!
! % "! "% %! % % !
% ! "!! ! &! #'
! "% %
!! ' %'"! "! ! %'# !
" ! !% % ! &! ! ' "
( %! #' ' "! " !%#
" ! &! "
) ! ! % ! !
"! ! %* ' % ' " %
+ %! # ! !
!%# !' , " ! ! -% . " !%#
& !
"! "% " /% ' ' " %0 ! !!
!
( %%! % # # ! !! ! ' #
1 &" "' %% 2 3 0!%
4
!! ! !! ! !# "
1 &"
! 2 3 0!% %% "' ! ' ' " 5 3 0!%
+' " /# "% ! ! " # %! ! !
! !% %! % %% " ! % !%
& #!"
!"#$%
#/% ; ! < %! " !# !%# !' %
# " ! ! " ! % " /
=&" !%#
!' %! % !
% % > % !!% ! 5
1 " # ! ,&% % #
) !% % ! ! "
% " ! " % " % #%
?
" !%# ", % ! "; #
$
!"" !%# !' % ! " ""/
&! " % ! ' ## % ! ! , &! #
#' " "
"' !!"% ' # * ! #
=* 7 % " % #" - # * .
@ A @
-7.
A
; " !" % !
; " % !
; " % %
* " %
) "
%% % #"
-( !""! " ! " %
#% ! ! %%% !
. $ &" %# % #
" !# * "!! % ! ! % B+)1 CD
! ! " ! - " !" %
! . " !# *
( ! ! # # - . %%
% * %
( ! ! # # -
. ! " # # ' ! ! %!
"!
E& ' ! # " #
!!
# #" % ! ' " # # F!
' ( )
* + ,* -% ,
* (.
*
+
*
( /
%
<
"! ' ! - ' ## % # '
"!.
! "# % ! %* ' % '
" %
6 ! % " ! ! !% "
" % " " " %% %
! %% %
!,! "! # "! ! ! % % ! %9 %
"!%% "
) ! " % " " %% %
%% - ! ". "
# 1! ! %,% '
' % % " ! # G! "H !%# !'
A @
-<.
%! % !" " 0!
%! % !" " 0!
% % % " " %% % %% - ! ".
" % " ! % ! %
? %, =* < * ! " ,% ;
- @
. @ A - @
. @
-.
A
# 1! ' ' " %,% ' "
%% ! ! "
% "" ! !
# 1! ' " %,# ! ! "
!# !'
( % G H " !%# !' " !
6 !" " % % "# # 1! ' " , % !
! % # ! ! %! "
, % ' ! % "#
# 1! % " !%# ! % I BI D
( % #
" !% ! % !! ! -! # % ! %
!
. !%% % %!% "! % -!
!! ' " ""' !! ' !
. !%%
! % % !
# " " ! !%# ! " " ' "!
) % " ' ' " ;
"! & E! # "! "J ' "!
& % ! "! % %% % ! ! %! J % "
!" # !! ' "! !%
) , -# "
" ! !% . " " ' ! % "
!,!'
, ;
( ' "! " '#
" % " " ! !% " "9 '
"! & #
*
(0 23!"4%
"! "' " % #% # ' ' #
"! ! &! "
High Power
High Variability
<
<
Low Power
High Variability
High Power
Low Variability
<
<
Low Power
Low Variablity
Figure 1. Diagram depicting the relative values of Tuning Factors for different configurations of power and
variability.
Value_i
number of machines
# " # 1# < * %, " !,!
&! " ;
! $ " !""! % !" "! !
! *" !
! % "% ' ! !,! !"
>
# ! $ " ' "! %% ' "!
1 &
" K '- . * %% % ""
!
"
# $ % , "! # '
"' '
! "
1 &" "" #' G"%H
&" '
5 ! #
# !
) %,%
4#K ' %% % "" #
* % # "% " &" %
" ! %
%9 % " % ! %,% '
$
# 1! # %, # # 1!
1
# " %,% 1# < # 1! # " %,%;
&
% &
% % &
%
! % % "
! # ! " !% "! % !
( !% ! ! % # " % % !
* ' !% %! " " %% % % "
%% %
&
% A &
% A 7 %
&
% A <
# # " # 1# < !% # ' " 58 58 !
!% -
!% ! # ! "! & # !.
% 1#
! # -%%! %. " &! "
! "!
$%%# ' ' " - ' %%# . '!' "!
"! # ' %# " ! !% " # F! '
!%# !' "' %9 % E! &! ! ! ! ! !
"' F! ! &! " " BI?2D
"
( ! !" " !# !%# !' " /% ' "% -
# 1! ' . " !# ! % "% ' / #
1!
) ! "% !%# !' - 1A. %
3
!! !%# !' % ,&% " !# * # =* 7 -
" " ' " % .
%
$ !%# ! &" !" ! "! %!
"% B!2D " /% ' " " %% ' ) ! B)C
)2 )4LD
) &" ! # % % % "% '
! !' " % !"% &! "
( % !" ! ' !%# ! # " /
! ' %9! !% &! " % % "
). " 5
, ( 0
+ *
5% Conservative
Mean + 1 SD
Mean
Mean + 2 SD
50% Conservative
30% Conservative
95% Conservative
70% Conservative
6 -!! 6& . # 3 !%
I!: *
6 "" % ! ! ! !! !
#% " %% ! % " 7
!
%%% G%H % G !H !""! % !" # !
(
"" %,% "
&" : " I ' -I. % I& !
I ! ! > # ! !! % ' "& %
J I& ! > # I& !! % '
+ " %
! % %# " &"
) !% !%# !' G H & &! " #
( &" G H !%# !' " % %F % !%
# " ! 1# > &! " !% ' 3 "
5L % K1 <2 " 5L % % 51 << " 5L % %! #
K1 !' " " ' ! "" &! " 3
!% ""' ! ! ! 0 ! ! ""
&! "
; ! &! "
% ! &! " ! M %
# &! " # 7 K1 <CL !
&! " 3 51 J 51 <25L %
C
!'
3
K1
51
3
K1
51
3
K1
51
3
K1 3 3 <L<7 "&% %
51 K1
# <
% %! K1 " ' &! "
" % ' 3 &"
1# >
% " ! 1
2 % 1
2
" ! !%# !' >
" ! # #
"! ! !%# ! # % "
1 &" % #
1# > 3 !' % " <
L2 % %% %
2L K1 !'
% "
7 % %% %
> % 51 !' % "
CC
% %% % <
25
%! K1 !' & % 5L8
"" &! " % &! " 3 !'
=&" 3 K1 51
1# >
<2
<<
1# 5
<>
<>
7
1# C
L
77
1# 2
<5
7>
7
Table 3. Window metric for each scheduling policy out of 58 possible windows.
=&"
1# >
1# 5
1# C
1# 2
3
3
<
L2
2L
>
5<
C
<>
2> <
2<
<
7 >
5
K1
3
7
>
>
5< 7
<<
<<
75 <
L
<<
2L <
2>
51
3
CC <
25
5
< 7
5L
<>
>L <
52
<<
7 <
25
Table 4. Average mean and average standard deviation for entire set of runs for each scheduling policy.
% "%" % % # 1# 5 !"
!%# ! ' ! ! % I& !
1# C
!" !%# ! I ! % "! '
' % "! # '
1# 2 !"
!%# ! I ! % "! # '
0 / % ,
% " ! # " <
% >
1" < ! %F ! &! %F % !% % !%#
!' ! !!
4 &" % !%# !' !
,&% &! -K1. " -!
&! ". % ! " % !%
3 ! !' -51.
" # ' 3 % " !%
) #,! # % !! # 1! # " " !
!% % " "!! ! &!
! ! !%# # % !%# ! $I 9! B+)C
+)2 +)1 CD
6 ! % ! % !%# ! #' % ' %
I BI I ID
!! % !" " !%# ! !!
! % ! ! % % ' " , % - # % ,% !.
"# % % !
$ !" " ! %% ! ' " "
% # % %F ! !
6
/ .*( +( 0 #7 0 (0 0 , % * ,.
0 +( ( #7 0 0
. ( % ( %
0 +( ( #7
0 0
0 (, ( %
40
35
30
25
Mean Scheduling Policy
VTF Scheduling Policy
95TF Scheduling Policy
20
910002000.0
910004000.0
Timestamps
910006000.0
Figure 4. Comparison of %, #, and &' policies, for the SOR benchmark on the Linux cluster
with 2 low variability machines, one medium and one high.
40
38
36
34
32
30
28
910142000.0
910144000.0
Timestamps
910146000.0
Figure 5. Comparison of %, #, and &' policies, for the SOR benchmark on the Linux cluster
for 4 low variability machines.
35
30
25
20
15
10
Mean Scheduling Policy
VTF Scheduling Policy
95TF Scheduling Policy
0
910528000.0 910529000.0 910530000.0 910531000.0 910532000.0
Timestamps
Figure 6. Comparison of %, #, and &' policies, for the SOR benchmark on the PCL cluster
when 2 of the machines had a low variability in available CPU and two had high.
35
30
25
20
15
10
0
910531000.0
910533000.0
Timestamps
910535000.0
Figure 7. Comparison of %, #, and &' policies, for the SOR benchmark on the PCL cluster
when 1 of the machines had a low variability in available CPU and three had high.
7
N
BNICD !% % ! ! !%#
!"%
% !# !* # %F !" %F ! % '#
#" % ' " "
4 % "% % %% !! ' '
' "
( ! ! ! !%# % !
) "%'
" !# % ! !' ! ! " % "# % !
! ! " % " %
6 ! ! !%# !'
# 1! % " "' %% % % %%% % "
!%
# 1! % G H % " ! # ! " %%
!%# !'
) ' ! !%# % % 6 !
6
&" %" &! " % " %!
! # ! ! !%#
) % !* ! & ! % %'"!
" ! ! %" !! !* ! &! % %
" "
% "# ' %" %'"!
" ! % !!' " % % " ! "!
!"& % !## "
1' " & !% % !%# ! " !#
! "
) ! %% ' ! ) % % ' # ?/
6 % O" 4' :
3' ! % $I #
!"" %"
#
!
&
' (
*
+
,'!
!
!
,2
,2*
%
)
!
- ,
. ' /
$ /
( 0
"
#$
% &
!
&
1
/
(
&
&
! % *
'*
3
4+ % ,
5(
)
+
'*
3
4+ % ,
2
+
)
+
'
!
2
77
) *
,0
'*
3
4+ % ,
%
)
)
+
6789
0
5(
(
+
+ +
7
#
9 "
")3)66 :35 34 5
!
5(
(
+
(
+
+
;<
>,/
!
-
& ; (
/
*(
)
= (
-.+
/
0
1 +
'!*23
+ +
!
7<
<
3?
(
*