
Daniel Mosse and Rami Melhem

Computer Science Department

University of Pittsburgh

Pittsburgh, PA 15260

{ramesh,namrata,zdk,mosse,melhem}@cs.pitt.edu

Abstract

Power management has become popular in mobile computing as well as in server farms. Although a lot of work has been done to manage the energy consumption on uniprocessor real-time systems, there is less work done on their multicomputer counterparts. For a set of real-time tasks with precedence constraints executing on a distributed system, we propose new static and dynamic power management schemes. Assuming a given static schedule generated from any list scheduling heuristic algorithm, our static power management scheme uses the static slack (if any) based on the degree of parallelism in the schedule. To consider the run-time behavior of tasks, an on-line dynamic power management technique is proposed to further explore the idle periods of processors. By comparing our static technique with the simple static power management, where the static slack is distributed to the schedule proportionally, we find that our static scheme can save an average of 10% more energy. When combined with dynamic schemes, our schemes significantly improve energy savings.

1 Introduction

Energy aware computing has recently become popular not only for mobile computing systems to lengthen battery life but also in large systems consisting of multiple processing units to reduce energy consumption and associated cooling cost. Since processors consume a large percentage of energy in computer systems, especially in embedded systems, much work has been done on managing energy consumption for processors. Based on the dynamic voltage scaling (DVS) technique, energy management schemes in [...] explored. Fewer works, however, have focused on energy management for parallel and distributed real-time systems. For shared memory systems, we have recently proposed some power management schemes for real-time applications based on global scheduling techniques [27, 28]. In this paper, we address energy management for distributed real-time systems, where the communication time is significant and tasks may have precedence constraints. We start by taking a schedule that is generated from some scheduling discipline, such as list scheduling, and then apply two techniques to enhance the power management of the system. The mapping of tasks to processors and static scheduling algorithm used in this work is taken from [21]. First, we propose a new static power management scheme based on the degree of parallelism in the schedule. Then, to further explore the tasks' run-time behavior, we propose two dynamic techniques, greedy and gap-filling, that use the processor idle periods to execute tasks at reduced speeds for more energy savings. Our algorithms manage slack that is dynamically created to slow down scheduled tasks as well as to change the order in which tasks were originally scheduled.

The paper is organized in the following way. The related work is addressed in Section 2. The application, power and system models are described in Section 3. The static power management scheme is addressed in Section 4. Section 5 discusses the dynamic power management scheme, and simulation results are shown and analyzed in Section 6. Finally, Section 7 concludes this paper.

This work has been supported by the Defense Advanced Research Projects Agency through the PARTS project (Contract F33615-00-C-1736).

2 Related Work

Many hardware and software techniques have been proposed to reduce the energy consumption of such systems, such as shutting down unused components and low energy circuit designs. With CMOS technology, processor power

is dominated by dynamic power dissipation, which is determined by processor supply voltage and clock frequency [4, 6]. By reducing processor clock frequency and supply voltage, we can reduce energy consumption at the cost of performance of processors. Processors with the ability of dynamic voltage scaling (DVS) are currently commercially available [11, 10]. There is an interesting trade-off between the energy consumption and performance of processors, especially for real-time systems in which high performance is sometimes necessary in order to meet the timing constraints.

For uniprocessor systems, Weiser et al. first discussed the problem of scheduling tasks to reduce the energy consumption of processors [23]. Yao et al. described an off-line scheduling algorithm for independent tasks running with variable speed, assuming worst-case execution time [25]. Based on the dynamic voltage scaling (DVS) technique, Mosse et al. proposed and analyzed several schemes to dynamically adjust processor speed with slack reclamation [18]. In [22], Shin et al. set the processor's speed at branches according to the ratio of the longest path to the taken paths from the branch statement to the end of the program. Kumar et al. predict the execution time of tasks based on the statistics gathered about execution time of previous instances of the same task [13]. The best scheme is an adaptive one that takes an aggressive approach while providing safeguards that avoid violation of the application deadlines [2, 17].

When considering limited voltage/speed levels in uniprocessor systems, Chandrakasan et al. have shown that, for periodic tasks, a few voltage/speed levels are sufficient to achieve almost the same energy savings as infinite voltage/speed levels [5]. Pillai et al. also proposed a set of scheduling algorithms (static and dynamic) for periodic tasks based on the EDF/RM scheduling policy [19]. AbouGhazaleh et al. have studied the effect of voltage/speed adjustment overhead on choosing the granularity of inserting power management points in a program [1].

For periodic task graphs and aperiodic tasks in distributed systems, with a given static schedule for periodic tasks and hard aperiodic tasks, Luo et al. proposed a static optimization algorithm by shifting the static schedule to redistribute the static slack according to the average slack ratio on each processor element [15]. They improved the static optimization by using critical path analysis and task execution order refinement to get the maximal static slow down factor for each task [16]. For a fixed task set and predictable execution times, static power management (SPM) can be accomplished by deciding beforehand the best voltage/speed for each processor [8]. When there are dependence constraints between tasks, for a given task assignment, Gruian et al. proposed a priority based energy sensitive list scheduling heuristic to determine the amount of time allocated to each task, considering energy consumption and critical path timing requirement in the priority function [9]. For SOCs (system-on-chip) with two processors running at two different fixed voltage levels, Yang et al. proposed a two-phase scheduling scheme that minimizes the energy consumption while meeting the timing constraints by choosing different scheduling options determined at compile time [24]. In [26], Zhang et al. proposed a priority based task mapping and scheduling for a fixed task graph and formulated the voltage scaling problem as an integer programming (IP) problem.

3 Models

In this section, we briefly discuss the application, system and power models that we have used in our work.

Application Model. A task $i$ is represented by a tuple $(c'_i, a'_i)$, where $c'_i$ and $a'_i$ are the worst case and average case number of cycles needed to execute task $i$. The precedence constraints and communication cost between tasks within an application are represented by a directed acyclic graph, $G(V, E)$, where vertices represent tasks and edges represent dependencies between tasks. There is an edge $e: v_i \to v_j \in E$ if $v_i$ is an immediate predecessor of $v_j$, which means that $v_j$ depends on $v_i$. In other words, $v_j$ is ready to begin execution only after $v_i$ finishes execution. The weight associated with each edge represents the communication cost when communicating tasks are scheduled on two different processors. We assume that the communication cost is zero if the communicating tasks are scheduled on the same processor.

To simplify the presentation, we consider frame based applications [14], that is, the applications consist of a set of tasks which have a common deadline. This model is realistic if we consider that each task graph has been assigned a certain amount of time to execute. It can be easily achieved with real-time operating systems that provide temporal protection, such as LinuxRK [20].

Power and System Model. We assume that processor power consumption is dominated by dynamic power dissipation $P_d$, which is given by $P_d = C_{ef} V_{dd}^2 f$, where $C_{ef}$ is the effective switch capacitance, $V_{dd}$ is the supply voltage and $f$ is the processor clock frequency. Processor speed, represented by $f$, is almost linearly related to the supply voltage: $f = k \frac{(V_{dd} - V_t)^2}{V_{dd}}$, where $k$ is constant and $V_t$ is the threshold voltage [4, 6]. The energy consumed by a specific task $i$ can be given as $E_i = C_{ef} V_{dd}^2 c'_i$, where $c'_i$ is the number of cycles needed to execute task $i$. When we decrease processor speed, we also reduce the supply voltage. This reduces processor power consumption cubically with $f$ and reduces the task's energy consumption quadratically, at the expense of linearly decreasing speed and increasing

execution time of the task. From now on, we refer to speed adjustment as changing both the processor supply voltage and frequency. We assume that $c'_i$ and $a'_i$ do not change with different processor speeds. We define $c_i$ and $a_i$ as the worst case and average case execution time of task $i$ for a specific processor, running at maximal processor speed $f_{max}$, that is, $c_i = \frac{c'_i}{f_{max}}$ and $a_i = \frac{a'_i}{f_{max}}$. We also assume that an idle processor consumes 15% of the maximum possible power (power consumed without any speed reduction). We varied this value from 5% to 15% and the nature of the graphs remained the same.

We consider a distributed system where each processing unit has its private memory. The communication cost between processors is significant and cannot be ignored. We assume continuous speed change for the processors in the system. We also consider preemptive scheduling, but we consider no migration. For simplicity, we ignore overheads of speed adjustments and preemptions (the overhead effect is discussed in detail in [18, 28]).

4 Static Power Management

The mapping of tasks to processors and static scheduling algorithm used in this work is taken from [21]. For example, the task graph in Figure 1a has a static schedule shown in Figure 1b, where the dotted line with an arrow represents the communication between tasks A and C. After the static schedule is generated, we apply our static power management scheme, which is described below.

We say there is static slack in the system if an application executes for its worst case execution time but still finishes before its deadline. Global static slack is defined as the difference between the length of the static schedule and the deadline. For example, when the task graph in Figure 1a runs on a 2-processor distributed system, the static schedule obtained is as shown in Figure 1b with schedule length 4. In the schedule, the y-axis represents the processor speed and the x-axis represents time. The area of the rectangle represents the worst case number of cycles needed to be executed by the task. Assuming that the application has a deadline at 6, the global static slack will be $L_0 = 6 - 4 = 2$.

[Figure 1. An example and its static schedule. a. An application (task graph with tasks A, B and C); b. Static schedule for the application (x-axis: time t from 0 to 6; y-axis: speed f; D marks the deadline and L0 the global static slack).]

If there is some slack in the system, the system can appropriately slow down the processor to save energy. We will first discuss three static power management schemes to allocate the global static slack.

4.1 Greedy Static Power Management (G-SPM)

This algorithm shifts the static schedule forward (that is, toward the deadline) and allocates the entire global static slack to the first task on each processor, if the task is not dependent on others. By shifting all the tasks together, all precedence and synchronization constraints are maintained. The first task on each processor is executed at a lower speed because it has more time to execute. Applying G-SPM to the example task graph in Figure 1a, both tasks A and B will get 2 units of slack and slow down proportionally. The static schedule is shown in Figure 2a with different processor speeds shown for each task.

[Figure 2. Static Power Management. a. Greedy Static Power Management (A at 1/3 f_max, B at 1/2 f_max, C at f_max); b. Simple Static Power Management (A, B and C at 2/3 f_max); c. Another Static Power Management (A, B and C at 1/2 f_max). x-axis: time t from 0 to 6; y-axis: speed f; D marks the deadline.]

4.2 Simple Static Power Management (S-SPM)

Assuming that every task in an application executes for exactly $c_i$ units of time, the optimal static speed for a uniprocessor system to get minimal energy consumption can be obtained by proportionally distributing the static slack to each task according to its $c_i$ [12]. Following the same idea, a simple static power management (S-SPM)

scheme for multiprocessor systems was proposed by distributing global static slack over the length of a schedule [8]. Applying S-SPM to the task graph in Figure 1a, we see that every task will run at $\frac{2}{3} f_{max}$ and the static schedule is shown in Figure 2b.

Note that, for multiprocessor systems, S-SPM is not optimal in terms of energy consumption because of the different degrees of parallelism in a schedule. For the example in Figure 1, S-SPM consumes $\frac{4}{9}E$, where $E$ is the energy consumed when no power management (NPM) is applied (assuming that a processor consumes no power when it is in idle state). Another static power management is shown in Figure 2c. It allocates 2 units of time to tasks A and C, and 4 units of time to task B. The energy consumption will be $\frac{1}{4}E$, which is less than $\frac{4}{9}E$. Actually, S-SPM consumes even more energy than G-SPM, which consumes $\frac{29}{72}E$. The reason for this is that S-SPM wastes an additional 1 unit of slack by uniformly stretching the whole schedule. For a given static mapping and schedule, we now explore the allocation of global static slack (if any) in terms of minimizing energy consumption.

4.3 Static Power Management with Parallelism (P-SPM)

From the above discussion, we observe that S-SPM is not optimal for parallel and distributed systems when parallelism varies in an application. The intuition is that more energy savings can be obtained by giving more slack to sections with higher parallelism, thereby reducing the idle periods in the system. We propose a static power management for parallelism (P-SPM) scheme which takes into consideration the degree of parallelism when allocating global static slack to different sections of a schedule.

4.3.1 P-SPM for 2-processor systems

For applications running on a 2-processor system, the degree of parallelism (DP) in a static schedule will range from 0 to 2 (when communication cost is significant, part of the schedule may be used only for communication with zero parallelism; see time 2-3 in Figure 3).

The static schedule will first be partitioned according to parallelism. For the example in Figure 1, the first time unit has parallelism of 2, the second and fourth time units have parallelism of 1, and the third time unit has parallelism of 0. We define $T_{ij}$ as the length of the $j$th section of a schedule with parallelism of $i$, and define the total length with parallelism of $i$ in a schedule as $T_i = \sum_j T_{ij}$. The static schedule for the example will be partitioned as in Figure 3. Here, we have $T_0 = 1$, $T_1 = 2$ and $T_2 = 1$.

[Figure 3. Parallelism in the schedule. The static schedule of Figure 1b partitioned into sections $T_{21}$, $T_{11}$, $T_{01}$ and $T_{12}$ (x-axis: time t from 0 to 6; y-axis: speed f; D marks the deadline and L0 the global static slack).]

In general, suppose that an application runs on a 2-processor system with global static slack of $L_0$ and the partitioned static schedule has specific $T_0$, $T_1$ and $T_2$. Assume that the amount of static slack allocated to $T_i$ is denoted by $l_i$ ($i = 0, 1, 2$). The total energy consumption $E$ in the worst case after allocating the global static slack $L_0$ will be:

$$E = \sum_i E_i \qquad (1)$$

$$= \sum_i \left( C_{ef} \, i \, f_i^3 \, (T_i + l_i) \right) \qquad (2)$$

$$= \sum_i C_{ef} \left( i \left( \frac{T_i}{T_i + l_i} f_{max} \right)^3 (T_i + l_i) \right) \qquad (3)$$

$$= C_{ef} \, f_{max}^3 \sum_i \left( i \, \frac{T_i^3}{(T_i + l_i)^2} \right) \qquad (4)$$

where $C_{ef}$ is the effective switch capacitance and $f_i$ is the speed for sections with parallelism $i$. For simplicity, the idle state is assumed to consume no energy. To optimally allocate $L_0$, we need to minimize $E$ subject to:

$$l_0 + l_1 + l_2 \le L_0$$

$$0 \le l_i \le L_0; \quad i = 0, 1, 2$$

Using $l_0 = L_0 - l_1 - l_2$ in Equation (4), and setting $\frac{\partial E}{\partial l_1} = \frac{\partial E}{\partial l_2} = 0$, we can get the following optimal solutions:

$$l_0 = 0;$$

$$l_1 = \frac{T_1 \left( L_0 - (2^{1/3} - 1) \, T_2 \right)}{T_1 + 2^{1/3} \, T_2};$$

$$l_2 = \frac{T_2 \left( 2^{1/3} L_0 + (2^{1/3} - 1) \, T_1 \right)}{T_1 + 2^{1/3} \, T_2};$$

where $0 \le l_1, l_2 \le L_0$.

From the solutions, if $L_0 \le (2^{1/3} - 1) T_2$ then $l_1 = 0$ (since there is a constraint that $l_i \ge 0$) and $l_2 = L_0$, that is, all the global static slack will be allocated to the sections of the schedule with parallelism 2. For the example in Figure 1, we get $l_0 = 0$, $l_1 = 1.0676$ and $l_2 = 0.9324$, and the minimal energy consumption is $0.3464E$. Compared with the energy consumption when using S-SPM, $\frac{4}{9}E = 0.4444E$, an additional 22% energy is saved by P-SPM. Note that P-SPM consumes more energy than the static schedule in Figure 2c, which consumes only $0.25E$. The reason is that the schedule in Figure 2c claims the gap in the middle of the schedule while P-SPM does not.
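The numbers quoted for the Figure 1 example can be checked with a short script. This is a minimal sketch, not the authors' code: the function and constant names (`normalized_energy`, `p_spm_two_processors`, `section_energy`, `G_SPM`, `S_SPM`, `FIG_2C`) are hypothetical, supply voltage is assumed to scale linearly with speed so that a task's energy is proportional to work times speed squared, idle power is taken as zero, and $T_1, T_2 > 0$ is assumed.

```python
from typing import Dict, Tuple

CBRT2 = 2.0 ** (1.0 / 3.0)

# task -> (work in time units at f_max, speed as a fraction of f_max),
# read off the schedules of Figure 2 for the example of Figure 1
G_SPM: Dict[str, Tuple[float, float]] = {"A": (1.0, 1 / 3), "B": (2.0, 1 / 2), "C": (1.0, 1.0)}
S_SPM: Dict[str, Tuple[float, float]] = {"A": (1.0, 2 / 3), "B": (2.0, 2 / 3), "C": (1.0, 2 / 3)}
FIG_2C: Dict[str, Tuple[float, float]] = {"A": (1.0, 1 / 2), "B": (2.0, 1 / 2), "C": (1.0, 1 / 2)}


def normalized_energy(schedule: Dict[str, Tuple[float, float]]) -> float:
    """Energy relative to NPM (every task at f_max), with zero idle power."""
    total_work = sum(work for work, _ in schedule.values())
    return sum(work * speed ** 2 for work, speed in schedule.values()) / total_work


def p_spm_two_processors(t1: float, t2: float, slack: float) -> Tuple[float, float]:
    """Closed-form split (l1, l2) of the global static slack for 2 processors."""
    threshold = (CBRT2 - 1.0) * t2
    if slack <= threshold:
        # Too little slack: everything goes to the sections with parallelism 2.
        return 0.0, slack
    l1 = t1 * (slack - threshold) / (t1 + CBRT2 * t2)
    return l1, slack - l1


def section_energy(t1: float, t2: float, l1: float, l2: float) -> float:
    """Equation (4) divided by the NPM energy, which is proportional to t1 + 2*t2."""
    stretched = t1 ** 3 / (t1 + l1) ** 2 + 2.0 * t2 ** 3 / (t2 + l2) ** 2
    return stretched / (t1 + 2.0 * t2)
```

Calling `p_spm_two_processors(2.0, 1.0, 2.0)` and passing the result to `section_energy` reproduces the P-SPM figures of the example, while `normalized_energy` applied to the three dictionaries gives the S-SPM, G-SPM and Figure 2c fractions of $E$.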

4.3.2 P-SPM for N-processor systems

The above idea can be easily extended to N-processor systems. Assuming that there are $N$ processors in a system, the degree of parallelism (DP) in a static schedule will range from 0 to $N$. Suppose that a schedule section with DP $= i$ has total length of $T_i$ (which may consist of several sub-sections $T_{ij}$, $j = 1, \ldots, u_i$, where $u_i$ is the total number of sub-sections with parallelism $i$) and the global static slack in the system is $L_0$. The amount of slack allocated to $T_i$ is $l_i$. The total energy consumption $E$ after allocating $L_0$ would be the same as shown in Equation (1). Here $i = 0, \ldots, N$.

The problem of finding an optimal allocation of $L_0$ to $T_i$ in terms of energy consumption will be to find $l_0, \ldots, l_N$ so as to

$$\text{minimize}(E)$$

subject to:

$$l_i \ge 0$$

$$\sum_i l_i \le L_0$$

where $i = 0, \ldots, N$. The constraints put limitations on how to allocate global static slack.

Solving the above problem is similar to solving the constrained optimization problem presented in [3]. The P-SPM scheme in Algorithm 1 approximates the solution, where $f_i$ is the speed of section $T_i$ and initially $f_i = f_{max}$.

Algorithm 1 P-SPM Algorithm
  Partition static schedule and generate $T_i = \sum_j T_{ij}$;
  Divide $L_0$ into $\frac{L_0}{\Delta L}$ intervals, each of length $\Delta L$;
  while there is still slack $\Delta L$ do
    Allocate $\Delta L$ to $T_i$ such as to maximize $\Delta E_i = C_{ef} \, i \, f_i^3 \, \frac{T_i \, \Delta L \, (2 T_i + \Delta L)}{(T_i + \Delta L)^2}$;
    Update $f_i = f_i \, \frac{T_i}{T_i + \Delta L}$ and $T_i = T_i + \Delta L$.
  end while
  For all $i$, allocate $l_i$ to $T_{ij}$ with $l_{ij} = l_i \, \frac{T_{ij}}{T_i}$;
  For every task $i$, gather its allotted time $t_i$, and determine the speed $f_i = \frac{c_i}{t_i}$;

First, the algorithm partitions the static schedule into sections according to parallelism and $T_i$ is generated. The slack $L_0$ is divided into $\Delta L$ segments and there will be $\frac{L_0}{\Delta L}$ such segments. Then, the algorithm will allocate one $\Delta L$ to some $T_i$ in each iteration of the while-loop. In each iteration, $\Delta L$ is allocated to schedule sections with DP $= i$ such that the energy reduction $\Delta E_i$ is maximized.

$$\Delta E_i = E_i - E'_i$$

$$= C_{ef} \, i \left( f_i^3 \, T_i - \left( \frac{T_i}{T_i + \Delta L} f_i \right)^3 (T_i + \Delta L) \right)$$

$$= C_{ef} \, i \, f_i^3 \, \frac{T_i \, \Delta L \, (2 T_i + \Delta L)}{(T_i + \Delta L)^2}$$

where $E_i$ and $E'_i$ are the energy consumptions for sections with parallelism $i$ before and after getting $\Delta L$, respectively. In general¹, the smaller $\Delta L$ is, the more accurate the solution is. But the more allocation steps there are, the more time consuming the algorithm is. After allocating each $\Delta L$ segment, $l_i$ will be re-distributed to $T_{ij}$. Finally, each task will gather all slack allocated to it and a single static speed for the task is computed.

¹ In Section 6.2, we discuss this issue in detail.

Due to synchronization of tasks and parallelism of an application, gaps may exist in the middle of a static schedule. After distributing global static slack, gaps in the middle of the schedule can be further explored. Finding an optimal usage of such gaps seems to be a non-trivial problem. One simple scheme is to stretch tasks adjacent to the gap when such stretching does not affect the application timing constraints.

From the above discussion, we can see that even P-SPM is not optimal. Since dynamic power management is needed to save more energy by taking advantage of the tasks' actual run-time behavior (which varies significantly [7]), we explore dynamic power management schemes next.

5 Dynamic Power Management

Dynamic slack is generated when tasks of the application execute less than their worst case execution time. Dynamic power management is applied in addition to static power management and used to reclaim dynamic slack. We use two techniques to reclaim dynamic slack. The first is greedy, that is, all available slack on one processor is given to the next expected task running on that processor. If the expected task is not ready when the previous task finishes execution, the processor will enter the idle state if no preemption is allowed. The second technique, gap filling, is used when preemption is allowed. Instead of putting a processor to the idle state if there is some slack and the next expected task is not ready, the gap filling technique will fetch the first future ready task in the local queue. This gap is added to the allotted time of this future task to allow it to execute at a reduced speed. The execution of the out-of-order task will be preempted by the next expected task when it receives all its data and is ready.

The dynamic power management algorithm is illustrated in Algorithm 2. After the schedule is generated and static power management is applied, each processor will execute tasks from its local queue until the queue is empty. We use the function sleep() to put a processor to sleep and assume it will wake up when the head of the queue is ready or a new frame begins. If the task executed is an out-of-order task fetched by gap filling(), it will be preempted when the head task of the queue is ready.
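The incremental slack allocation of Algorithm 1 in Section 4.3.2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `p_spm_allocate` and its parameters are hypothetical, speeds are normalized so that $f_{max} = 1$, $C_{ef}$ is set to 1, idle power is zero, and only the per-parallelism totals $l_i$ are computed (the re-distribution to sub-sections $T_{ij}$ and to individual tasks is left out).

```python
from typing import Dict


def p_spm_allocate(section_lengths: Dict[int, float], slack: float, steps: int) -> Dict[int, float]:
    """Greedy sketch of the while-loop in Algorithm 1.

    section_lengths maps a degree of parallelism i to the total length T_i.
    The slack L0 is cut into `steps` pieces of length delta = L0 / steps, and
    each piece goes to the section whose energy drops the most.
    Returns the slack l_i given to each degree of parallelism.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    allotted = {i: 0.0 for i in section_lengths}
    if slack <= 0.0:
        return allotted

    delta = slack / steps
    length = dict(section_lengths)             # current (stretched) T_i
    speed = {i: 1.0 for i in section_lengths}  # current f_i, initially f_max = 1

    def energy_reduction(i: int) -> float:
        # Delta E_i = i * f_i^3 * T_i * dL * (2*T_i + dL) / (T_i + dL)^2, with C_ef = 1
        t = length[i]
        return i * speed[i] ** 3 * t * delta * (2.0 * t + delta) / (t + delta) ** 2

    for _ in range(steps):
        best = max(length, key=energy_reduction)
        t = length[best]
        speed[best] *= t / (t + delta)  # the section slows down as it is stretched
        length[best] = t + delta
        allotted[best] += delta
    return allotted
```

For the example of Figure 1, `p_spm_allocate({0: 1.0, 1: 2.0, 2: 1.0}, 2.0, 2000)` gives no slack to the zero-parallelism section and approaches the closed-form values $l_1 = 1.0676$ and $l_2 = 0.9324$ of Section 4.3.1 as the number of steps grows.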

Algorithm 2 DPM - Main Algorithm
  while local queue is not empty do
    if (next expected task k is ready) then
      Reclaim dynamic slack (if any) and execute k with reduced speed;
    else if (k is not ready) AND (future task j is ready in local queue) then
      Gap filling(j);
    else
      Sleep();
    end if
  end while

[...] of how much work is left for each task. Luckily, many real-time operating systems have this feature, facilitating implementation. However, although we have not evaluated it, it is possible to apply a technique similar to gap filling in non-preemptive systems. This would necessitate delaying execution. Obviously, this can only be done after checking precedence and synchronization constraints.

6 Evaluation and Analysis

6.1 Simulation Methodology

[...] task graphs with 7, 50 and 90 nodes. Since the nature of the results is more or less the same, we only show the results for a 50-node graph.

The $c_i$ and the communication times are randomly generated [...] communication times are varied from 1 to 4 units. We run 100 executions on the same task set to get statistically significant results. To get the actual execution time of the task, we define a parameter $\alpha_i$ for each task which is the ratio of [...] and get the values of $\alpha_i$ from a discretized normal distribution with average $\alpha$ and standard deviation $0.48 \, (1 - \alpha)$ if $\alpha > 0.5$ and $0.48 \, \alpha$ if $\alpha \le 0.5$ (the 0.48 value comes from discretizing the values of the normal distribution).

We compare the energy consumed by all the schemes with the one consumed by no power management (NPM). To summarize, we consider the following schemes:

G-SPM: greedy static power management;
S-SPM: simple static power management;
P-SPM: static power management for parallelism;
DPM-G: G-SPM + dynamic power management;
DPM-S: S-SPM + dynamic power management;
DPM-P: P-SPM + dynamic power management;

The algorithms DPM-G, DPM-S and DPM-P are executed first using only the dynamic greedy technique and then using the gap filling technique on top of it. We use the terms DPM GAPFILL ON and DPM GAPFILL OFF to distinguish between the two techniques in the graphs.

6.2 Sensitivity Analysis

[Figure 4. Sensitivity Analysis for a randomly generated 50 node graph. a. energy vs. $L_0 / \Delta L$ for 4 Processors; b. energy vs. $L_0 / \Delta L$ for 8 Processors. x-axis: Granularity (0 to 100); y-axis: Energy Consumed.]

Sensitivity analysis is done offline to find out the optimal value of the $\Delta L$ unit for P-SPM. Intuitively, a smaller value of $\Delta L$ will lead to better results due to the fine granularity of slack distribution. However, small $\Delta L$ values may significantly increase the cost of the algorithm. We plot the energy

consumption values obtained for varying $\Delta L$ on 4 and 8 processors and the graphs obtained are as shown in Figure 4a and Figure 4b. However, the graphs indicate that as we decrease the granularity of $\Delta L$, the energy consumption does not decrease strictly.

The fact that energy does not decrease uniformly with reducing $\Delta L$ can be explained by taking the structure of the task graph and the nature of the algorithm into account. If [...] the distribution of the slack units may result in speeds that give higher energy consumption in case of $x + 1$ $\Delta L$ units. The reason is that slack units are distributed to $L_i$ intervals [...]

To choose the best value of $\Delta L$, we do the sensitivity analysis by running the algorithms for different values of [...] Figure 4. From the figures, we can see that, when $\frac{L_0}{\Delta L} =$ [...] For the following experiments (Figures 6 and 7), we do the same experiment and choose the smallest $K_{min}$ such that, when $\frac{L_0}{\Delta L} \ge K_{min}$, the energy consumption error is within 2%.

6.3 Performance Comparison

We start by comparing the energy consumed by our new scheme P-SPM with S-SPM, NPM and SHIFT (the scheme [...] we fix the number of processors to 4 and show the energy normalized to NPM as a function of the laxity in the system (Figure 5a). In these graphs, laxity is a factor that is multiplied by the static schedule span to yield the deadline. That is, $D = \text{laxity} \times \text{static schedule span}$. In some sense, it gives the inverse of the load imposed on the system. As we can see from the graph in Figure 5a, the P-SPM scheme performs best in terms of energy savings. On increasing the number of processors to 8 (Figure 5b), P-SPM experiences more energy savings because of the increase in degree of parallelism in the schedule. P-SPM is able to exploit this feature by sharing the slack with more tasks at a time. Although not shown, the total energy consumed in case of 8 processors is greater than that consumed using 4 processors. The reason is that total idle time typically increases for a larger number of processors due to more synchronization needed on more processors (recall that an idle processor consumes energy). Another interesting result is that the SHIFT technique performs better in case of smaller laxity, whereas P-SPM gets better with increasing laxity. The reason is that SHIFT reclaims slack in the middle of the schedule while P-SPM only reclaims global static slack. There is less global static slack with smaller laxity, and the effect of slack in the middle of the schedule decreases as laxity increases. Finally, the P-SPM scheme saves 5% more than S-SPM and 40% more than G-SPM when using 4 processors. When there are 8 processors these values are 10% and 50%, respectively.

[Figure 5. The normalized energy vs. laxity for different SPMs. a. energy vs. laxity for 4 Processors; b. energy vs. laxity for 8 Processors. Curves: NPM, G-SPM, S-SPM, P-SPM. x-axis: laxity (1.1 to 2); y-axis: Normalized Energy Consumed.]

Next, we measure the energy saved by the dynamic schemes, with and without gap-filling. We perform these experiments by varying the laxity factor from 1.25 to 2.0 and the number of processors from 4 to 8. The results can be seen in Figures 6 and 7. We first run all SPM algorithms and then apply our dynamic scheme over the resultant schedule obtained from the different SPM algorithms. We find that the DPM-P technique is the best among all three.

Another interesting observation is that with DPM GAPFILL ON we save as much as 5% compared to DPM-G, DPM-S and DPM-P with GAPFILL OFF. We also notice from the graphs that for eight processors DPM-G performs better than DPM-S at lower values of $\alpha$. This is

1 0.9

DPM-G-GAPFILL-ON DPM-G-GAPFILL-ON

DPM-G-GAPFILL-OFF DPM-G-GAPFILL-OFF

DPM-S-GAPFILL-ON DPM-S-GAPFILL-ON

0.9 DPM-S-GAPFILL-OFF DPM-S-GAPFILL-OFF

DPM-P-GAPFILL-ON 0.8 DPM-P-GAPFILL-ON

DPM-P-GAPFILL-OFF DPM-P-GAPFILL-OFF

0.8

0.7

Normalized Energy Consumed

0.7

0.6

0.6

0.5

0.5

0.4

0.4

0.3

0.3

0.2 0.2

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

alpha alpha

1 1

DPM-G-GAPFILL-ON DPM-G-GAPFILL-ON

DPM-G-GAPFILL-OFF DPM-G-GAPFILL-OFF

DPM-S-GAPFILL-ON DPM-S-GAPFILL-ON

DPM-S-GAPFILL-OFF 0.95 DPM-S-GAPFILL-OFF

0.9 DPM-P-GAPFILL-ON DPM-P-GAPFILL-ON

DPM-P-GAPFILL-OFF DPM-P-GAPFILL-OFF

0.9

Normalized Energy Consumed

0.8 0.85

0.8

0.7

0.75

0.6 0.7

0.65

0.5

0.6

0.4 0.55

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

alpha alpha

Figure 6. The normalized energy vs. for DPM Figure 7. The normalized energy vs. for DPM

(laxity = 1.25, for both GAPFILL ON and GAP- (Laxity = 1.75, for both GAPFILL ON and GAP-

FILL OFF) FILL OFF)

because when much dynamic slack is generated, DPM-S does not use the whole available slack at once, saving slack for future tasks. However, when slack is generated dynamically, this turns out to be a very conservative approach, which benefits DPM-G. Also, increasing the number of processors increases the relative gain, but at the same time the overall energy consumption also increases because the total idle time increases.

7 Conclusion

In this paper, we propose two novel techniques for power management in distributed systems. First, static power management for parallelism (P-SPM) allocates global static slack, defined as the difference between the length of the static schedule and the deadline, to different sections of the schedule according to their degree of parallelism. Second, the gap-filling technique enhances the greedy algorithm by allowing out-of-order execution when preemption is considered; that is, if there is some slack and the next expected task is not ready, the processor will run the future ready tasks mapped to it.

We compared our schemes with some previously proposed schemes. The simulation results show that P-SPM can save 10-20% more energy than simple static power management (S-SPM) for parallel systems, which distributes global static slack proportionally to the length of the schedule, and 10% more than the static scheme proposed in [15]. The gap-filling technique can save 5% more energy when applied after greedy.

In this work, we assume continuous speed changes. The schemes can be easily adapted for processors with discrete speed levels as shown in [12, 28].

References

[1] N. AbouGhazaleh, D. Mosse, B. Childers, and R. Melhem. Toward the placement of power management points in real time applications. In Proc. of Workshop on Compilers and Operating Systems for Low Power, Barcelona, Spain, 2001.

[2] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proc. of The 22nd IEEE Real-Time Systems Symposium, London, UK, Dec. 2001.

[3] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez. Optimal reward-based scheduling for periodic real-time systems. IEEE Trans. on Computers, 50(2):111-130, 2001.

[4] T. D. Burd and R. W. Brodersen. Energy efficient CMOS microprocessor design. In Proc. of The HICSS Conference, pages 288-297, Maui, Hawaii, Jan. 1995.

[5] A. Chandrakasan, V. Gutnik, and T. Xanthopoulos. Data driven signal processing: An approach for energy efficient computing. In Proc. Int'l Symposium on Low-Power Electronic Devices, Monterey, CA, 1996.

[6] A. Chandrakasan, S. Sheng, and R. Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits, 27(4):473-484, 1992.

[7] R. Ernst and W. Ye. Embedded program timing analysis based on path clustering and architecture classification. In Proc. of The International Conference on Computer-Aided Design, pages 598-604, San Jose, CA, Nov. 1997.

[8] F. Gruian. System-level design methods for low-energy architectures containing variable voltage processors. In Proc. of The Workshop on Power-Aware Computing Systems (PACS), Cambridge, MA, Nov. 2000.

[9] F. Gruian and K. Kuchcinski. Lenes: Task scheduling for low-energy systems using variable supply voltage processors. In Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, Japan, Jan. 2001.

[10] http://developer.intel.com/design/intelxscale/.

[11] http://www.transmeta.com.

[12] T. Ishihara and H. Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Proc. of The 1998 International Symposium on Low Power Electronics and Design, pages 197-202, Monterey, CA, Aug. 1998.

[13] P. Kumar and M. Srivastava. Predictive strategies for low-power RTOS scheduling. In Proc. of the 2000 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Austin, TX, Sep. 2000.

[14] F. Liberato, S. Lauzac, R. Melhem, and D. Mosse. Fault-tolerant real-time global scheduling on multiprocessors. In Proc. of The 10th IEEE Euromicro Workshop in Real-Time Systems, York, UK, Jun. 1999.

[15] J. Luo and N. K. Jha. Power-conscious joint scheduling of periodic task graphs and aperiodic tasks in distributed real-time embedded systems. In Proc. of International Conference on Computer Aided Design (ICCAD), San Jose, CA, Nov. 2000.

[16] J. Luo and N. K. Jha. Static and dynamic variable voltage scheduling algorithms for real-time heterogeneous distributed embedded systems. In Proc. of 15th International Conference on VLSI Design, Bangalore, India, Jan. 2002.

[17] R. Melhem, N. AbouGhazaleh, H. Aydin, and D. Mosse. Power Management Points in Power-Aware Real-Time Systems, chapter 7, pages 127-152. Power Aware Computing. Plenum/Kluwer Publishers, 2002.

[18] D. Mosse, H. Aydin, B. Childers, and R. Melhem. Compiler-assisted dynamic power-aware scheduling for real-time applications. In Proc. of Workshop on Compiler and OS for Low Power, Philadelphia, PA, Oct. 2000.

[19] P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proc. of 18th ACM Symposium on Operating Systems Principles (SOSP'01), Banff, Canada, Oct. 2001.

[20] R. RajKumar, K. Juvva, A. Molano, and S. Oikawa. Resource kernels: A resource-centric approach to real-time systems. In Proc. of the SPIE/ACM Conference on Multimedia Computing and Networking, Jan. 1998.

[21] S. Selvakumar and C. Siva Ram Murthy. Scheduling precedence constrained task graphs with non-negligible intertask communication onto multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 5(3):328-336, 1994.

[22] D. Shin, J. Kim, and S. Lee. Intra-task voltage scheduling for low-energy hard real-time applications. IEEE Design & Test of Computers, 18(2):20-30, 2001.

[23] M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proc. of The First USENIX Symposium on Operating Systems Design and Implementation, pages 13-23, Monterey, CA, Nov. 1994.

[24] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins. Energy-aware runtime scheduling for embedded-multiprocessor SoCs. IEEE Design & Test of Computers, 18(5):46-58, 2001.

[25] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proc. of The 36th Annual Symposium on Foundations of Computer Science, pages 374-382, Milwaukee, WI, Oct. 1995.

[26] Y. Zhang, X. Hu, and D. Z. Chen. Task scheduling and voltage selection for energy minimization. In Proc. of The 39th Design Automation Conference, pages 183-188, New Orleans, LA, Jun. 2002.

[27] D. Zhu, N. AbouGhazaleh, D. Mosse, and R. Melhem. Power aware scheduling for AND/OR graphs in multi-processor real-time systems. In Proc. of The Int'l Conference on Parallel Processing, pages 593-601, Vancouver, Canada, Aug. 2002.

[28] D. Zhu, R. Melhem, and B. Childers. Scheduling with dynamic voltage/speed adjustment using slack reclamation in multi-processor real-time systems. In Proc. of The 22nd IEEE Real-Time Systems Symposium, pages 84-94, London, UK, Dec. 2001.
