
CPU Scheduling

The scheduler is the component of the kernel that selects which process to run next. The scheduler (or
process scheduler, as it is sometimes called) can be viewed as the code that divides the finite resource
of processor time between the runnable processes on a system. The scheduler is the basis of a
multitasking operating system such as Linux. By deciding what process can run, the scheduler is
responsible for best utilising the system and giving the impression that multiple processes are
simultaneously executing.
The idea behind the scheduler is simple. To best utilise processor time, assuming there are runnable
processes, a process should always be running. If there are more processes than processors in a system,
some processes will not always be running. These processes are waiting to run. Deciding what process
runs next, given a set of runnable processes, is a fundamental decision the scheduler must make.
Multitasking operating systems come in two flavours: cooperative multitasking and preemptive multitasking.
Linux, like all Unix variants and most modern operating systems, provides preemptive multitasking. In
preemptive multitasking, the scheduler decides when a process is to cease running and a new process is
to resume running. The act of involuntarily suspending a running process is called preemption. The
time a process runs before it is preempted is predetermined, and is called the timeslice of the process.
The timeslice, in effect, gives each process a slice of the processor's time. Managing the timeslice
enables the scheduler to make global scheduling decisions for the system. It also prevents any one
process from monopolising the system. This timeslice is dynamically calculated in the Linux scheduler
to provide some interesting benefits.
Conversely, in cooperative multitasking, a process does not stop running until it voluntarily decides to
do so. The act of a process voluntarily suspending itself is called yielding. The shortcomings of this
approach are numerous: The scheduler cannot make global decisions regarding how long processes run,
processes can monopolise the processor for longer than the user desires, and a hung process that never
yields can potentially bring down the entire system. Thankfully, most operating systems designed in the
last decade have provided preemptive multitasking. Unix has been preemptively multitasked since the
beginning.
The Linux kernel, unlike most other Unix variants and many other operating systems, is a fully
preemptive kernel. In non-preemptive kernels, kernel code runs until completion. That is, the
scheduler is not capable of rescheduling a task while it is in the kernel: kernel code is scheduled
cooperatively, not preemptively. Kernel code runs until it finishes (returns to user-space) or explicitly
blocks.

Policy
Policy is the behaviour of the scheduler that determines what runs when. A scheduler's policy often
determines the overall feel of a system and is responsible for optimally utilising processor time.
Therefore, it is very important.
I/O-Bound Versus Processor-Bound Processes
Processes can be classified as either I/O-bound or processor-bound. The former is characterised as a
process that spends much of its time submitting and waiting on I/O requests. Consequently, such a
process is often runnable, but only for short periods, because it will eventually block waiting on more
I/O (this is any type of I/O, such as keyboard activity, and not just disk I/O). Conversely, processor-
bound processes spend much of their time executing code. They tend to run until they are preempted
because they do not block on I/O requests very often. Because they are not I/O-driven, however,
system response does not dictate that the scheduler run them often. The scheduler policy for processor-
bound processes, therefore, tends to run such processes less frequently but for longer periods. Of
course, these classifications are not mutually exclusive. The scheduler policy in Unix variants tends to
explicitly favour I/O-bound processes.
The scheduling policy in a system must attempt to satisfy two conflicting goals: fast process response
time (low latency) and high process throughput. To satisfy these requirements, schedulers often employ
complex algorithms to determine the most worthwhile process to run, while not compromising fairness
to other, lower priority, processes. Favouring I/O-bound processes provides improved process response
time, because interactive processes are I/O-bound. Linux, to provide good interactive response,
optimises for process response (low latency), thus favouring I/O-bound processes over processor-bound
processes. This is done in a way that does not neglect processor-bound processes.
Timeslice
The timeslice is the numeric value that represents how long a task can run until it is preempted. The
scheduler policy must dictate a default timeslice, which is not simple. A timeslice that is too long will
cause the system to have poor interactive performance; the system will no longer feel as if applications
are being concurrently executed. A timeslice that is too short will cause significant amounts of
processor time to be wasted on the overhead of switching processes, as a significant percentage of the
system's time will be spent switching from one process with a short timeslice to the next. Furthermore,
the conflicting goals of I/O-bound versus processor-bound processes again arise; I/O-bound processes
do not need longer timeslices, whereas processor-bound processes crave long timeslices (to keep their
caches hot, for example).
With this argument, it would seem that any long timeslice would result in poor interactive performance.
In many operating systems, this observation is taken to heart, and the default timeslice is rather low,
for example 20 ms. Linux, however, takes advantage of the fact that the highest priority process always
runs. The Linux scheduler bumps the priority of interactive tasks, enabling them to run more
frequently. Consequently, the Linux scheduler offers a relatively high default timeslice (see Figure 1).
Furthermore, the Linux scheduler dynamically determines the timeslice of a process based on priority.
This enables higher priority, allegedly more important, processes to run longer and more often.
Implementing dynamic timeslices and priorities provides robust scheduling performance.



Figure 1

Note that a process does not have to use all its timeslice at once. For example, a process with a 100
millisecond timeslice does not have to run for 100 milliseconds in one go or risk losing the remaining
timeslice. Instead, the process can run on five different reschedules for 20 milliseconds each. Thus, a
large timeslice also benefits interactive tasks: while they do not need such a large timeslice all at once,
it ensures they remain runnable for as long as possible.
When a process's timeslice runs out, the process is considered expired. A process with no timeslice is
not eligible to run until all other processes have exhausted their timeslice (that is, they all have zero
timeslice remaining). At that point, the timeslices for all processes are recalculated. The Linux
scheduler employs an interesting algorithm for handling timeslice exhaustion.
Timeslice is sometimes called quantum or processor slice in other systems. Linux calls it timeslice.


Scheduling Criteria
Many criteria have been suggested for comparing CPU scheduling algorithms. Which characteristics
are used for comparison can make a substantial difference in the determination of the best algorithm.
Criteria that are used include the following:

CPU utilisation. It is desirable to keep the CPU as busy as possible. CPU utilisation may range
from 0 to 100 percent. In a real system, it should range from 40 percent (for a lightly loaded
system) to 90 percent (for a heavily used system).

Throughput. If the CPU is busy, then work is being done. One measure of work is the number
of processes that are completed per time unit, called throughput. For long processes, this rate
may be one process per hour; for short transactions, throughput might be 10 processes per
second.

Turnaround time. From the point of view of a particular process, the important criterion is how
long it takes to execute that process. The interval from the time of submission to the time of
completion is the turnaround time.

Waiting time. The CPU scheduling algorithm does not affect the amount of time during which a
process executes or does I/O; it affects only the amount of time that a process spends waiting in
the ready queue. Waiting time is the sum of the periods spent waiting in the ready queue.

Response time. The response time is the amount of time it takes to start responding, but not
the time that it takes to output that response. Often, a process can produce some output fairly
early and can continue computing new results while previous results are being output to the
user.

It is desirable to maximise CPU utilisation and throughput, and to minimise turnaround time, waiting
time, and response time. In most cases, we optimise the average measure. However, there are
circumstances when it is desirable to optimise the minimum or maximum values, rather than the
average. For example, to guarantee that all users get good service, we may want to minimise the
maximum response time.



Scheduling Algorithms
CPU scheduling deals with the problem of deciding which of the processes in the ready queue is to be
allocated the CPU.

First-Come, First-Served
By far the simplest CPU scheduling algorithm is the first-come, first-served scheduling (FCFS)
algorithm. With this scheme, the process that requests the CPU first is allocated the CPU first. The
implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the
ready queue, its PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to
the process at the head of the queue. The running process is then removed from the queue. The code for
FCFS scheduling is simple to write and understand. The average waiting time under the FCFS policy,
however, is often quite long.

Consider the following set of processes that arrive at time 0, with the length of the CPU-burst time
given in milliseconds:



Process   Estimated runtime (A)   Waiting time (B)   Ratio B/A
P1        2                       0                  0
P2        60                      2                  0.03
P3        1                       62                 62
P4        3                       63                 21
P5        50                      66                 1.32

The waiting time is 0 milliseconds for process P1, 2 milliseconds for process P2, 62 milliseconds for
process P3, 63 milliseconds for P4 and 66 milliseconds for P5. Thus, the average waiting time is
(0 + 2 + 62 + 63 + 66)/5 = 38.6 milliseconds.
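The 38.6 ms figure can be reproduced with a short calculation. The sketch below is not from the text; it simply walks the FCFS queue, charging each process the combined runtime of everything queued ahead of it (all processes arrive at time 0).

    #include <stdio.h>

    /* FCFS: each process waits for the combined runtime of everything
     * queued ahead of it. Runtimes are taken from the table above. */
    int main(void)
    {
        const char *name[] = { "P1", "P2", "P3", "P4", "P5" };
        int runtime[] = { 2, 60, 1, 3, 50 };
        int n = 5, elapsed = 0, total_wait = 0;

        for (int i = 0; i < n; i++) {
            printf("%s waits %d ms\n", name[i], elapsed);
            total_wait += elapsed;
            elapsed += runtime[i];   /* the next process waits this long */
        }
        printf("average waiting time = %.1f ms\n", (double)total_wait / n);
        return 0;
    }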

Now consider the performance of FCFS scheduling in a dynamic situation. Assume we have one CPU-
bound process and many I/O-bound processes. As the processes flow around the system, the following
scenario may result. The CPU-bound process will get the CPU and hold it. During this time, all the
other processes will finish their I/O and move into the ready queue, waiting for the CPU. While the
processes wait in the ready queue, the I/O devices are idle. Eventually, the CPU-bound process finishes
its CPU burst and moves to an I/O device. All the I/O-bound processes, which have very short CPU
bursts, execute quickly and move back to the I/O queues. At this point, the CPU sits idle. The CPU-
bound process will then move back to the ready queue and be allocated the CPU. Again, all the I/O
processes end up waiting in the ready queue until the CPU-bound process is done. There is a convoy
effect, as all the other processes wait for the one big process to get off the CPU. This effect results in
lower CPU and device utilisation than might be possible if the shorter processes were allowed to go
first.
The FCFS scheduling algorithm is non-preemptive. Once the CPU has been allocated to a process, that
process keeps the CPU until it releases the CPU, either by terminating or by requesting I/O. The FCFS
algorithm is particularly unsuitable for time-sharing systems, where it is important that each user get a
share of the CPU at regular intervals.


Shortest-Job-First
A different approach to CPU scheduling is the shortest-job-first (SJF) algorithm: if the CPU is
available, it is assigned to the process that has the smallest next CPU burst. If two processes have the
same length next CPU burst, FCFS scheduling is used to arbitrate. Note that a more appropriate term
would be the shortest next CPU burst, because the scheduling is done by examining the length of the
next CPU burst of a process, rather than its total length. As an example, consider the following set of
processes, with the length of the CPU burst time given in milliseconds:


Process   Estimated runtime (A)   Waiting time (B)   Ratio B/A
P3        1                       0                  0
P1        2                       1                  0.5
P4        3                       3                  1.0
P5        50                      6                  0.1
P2        60                      56                 0.9



The waiting time is 1 millisecond for process P1, 56 milliseconds for process P2, 0 milliseconds for
process P3, 3 milliseconds for process P4 and 6 milliseconds for process P5. Thus, the average waiting
time is (1 + 56 + 0 + 3 + 6)/5 = 13.2 milliseconds. If we were using FCFS scheduling, the average
waiting time would be 38.6 milliseconds.
The SJF scheduling algorithm gives the minimum average waiting time for a given set of processes.
By moving a short process before a long one, the waiting time of the short process decreases more than
it increases the waiting time of the long process. Consequently, the average waiting time decreases.
The real difficulty with the SJF algorithm is knowing the length of the next CPU request. For long-
term (job) scheduling in a batch system, we can use as the length the process time limit that a user
specifies when submitting the job. Thus, users are motivated to estimate the process time limit
accurately, since a lower value may mean faster response. (Too low a value will cause a time-limit-
exceeded error and require resubmission.) SJF scheduling is used frequently in long-term scheduling.
Although the SJF algorithm is optimal, it cannot be implemented at the level of short-term CPU
scheduling. There is no way to know the length of the next CPU burst. One approach is to try to
approximate SJF scheduling. We may not know the length of the next CPU burst, but we may be able
to predict its value. We expect that the next CPU burst will be similar in length to the previous ones.
Thus, by computing an approximation of the length of the next CPU burst, we can pick the process
with the shortest predicted CPU burst.
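The text does not name a particular predictor. A common choice, assumed here purely for illustration, is an exponential average of the measured burst lengths: the new estimate is alpha times the last observed burst plus (1 - alpha) times the old estimate.

    #include <stdio.h>

    /* Predict the next CPU burst as an exponential average of past bursts:
     * estimate = alpha * last_burst + (1 - alpha) * previous_estimate.
     * alpha, the initial estimate and the observed bursts below are
     * illustrative values, not taken from the text. */
    static double predict_next_burst(double estimate, const double *bursts,
                                     int n, double alpha)
    {
        for (int i = 0; i < n; i++)
            estimate = alpha * bursts[i] + (1.0 - alpha) * estimate;
        return estimate;
    }

    int main(void)
    {
        double observed[] = { 6.0, 4.0, 6.0, 4.0, 13.0, 13.0, 13.0 };
        double guess = predict_next_burst(10.0, observed, 7, 0.5);
        printf("predicted next CPU burst: %.2f ms\n", guess);
        return 0;
    }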






Priority Scheduling
The SJF algorithm is a special case of the general priority scheduling algorithm. A priority is associated
with each process, and the CPU is allocated to the process with the highest priority. In the simplest
form, equal-priority processes are scheduled in FCFS order; on systems with timeslices, processes with
the same priority are scheduled round-robin (one after the next, repeating). On some systems, Linux
included, processes with a higher priority also receive a longer timeslice. The runnable process with
timeslice remaining and the highest priority always runs. Both the user and the system may set a
process's priority to influence the scheduling behaviour of the system.
Linux builds on this idea and provides dynamic priority-based scheduling. This concept begins with the
initial base priority, and then enables the scheduler to increase or decrease the priority dynamically to
fulfill scheduling objectives. For example, a process that is spending more time waiting on I/O than
running is clearly I/O bound. Under Linux, it receives an elevated dynamic priority. As a
counterexample, a process that continually uses up its entire timeslice is processor bound; it would
receive a lowered dynamic priority.
The Linux kernel implements two separate priority ranges. The first is the nice value, a number from
-20 to 19 with a default of zero. Larger nice values correspond to a lower priority: you are being nice to
the other processes on the system. Processes with a lower nice value (higher priority) run before
processes with a higher nice value (lower priority). The nice value also helps determine how long a
timeslice the process receives. A process with a nice value of -20 receives the maximum timeslice,
whereas a process with a nice value of 19 receives the minimum timeslice. Nice values are the standard
priority range used in all Unix systems.
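As a rough illustration of how a nice value might scale the timeslice, the sketch below interpolates linearly between a minimum and a maximum slice. The 5 ms and 800 ms endpoints and the linear mapping are assumptions for this sketch, not the kernel's actual calculation.

    #include <stdio.h>

    /* Illustrative mapping from nice value (-20..19) to timeslice.
     * The endpoints and the linear interpolation are assumptions of
     * this sketch, not the exact kernel formula. */
    #define MIN_TIMESLICE_MS   5
    #define MAX_TIMESLICE_MS 800

    static int nice_to_timeslice(int nice)
    {
        /* nice -20 -> MAX_TIMESLICE_MS, nice 19 -> MIN_TIMESLICE_MS */
        return MAX_TIMESLICE_MS -
               (nice + 20) * (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) / 39;
    }

    int main(void)
    {
        for (int nice = -20; nice <= 19; nice += 13)
            printf("nice %3d -> timeslice %3d ms\n",
                   nice, nice_to_timeslice(nice));
        return 0;
    }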
The second range is the real-time priority, which will be discussed later. By default, it ranges from zero
to 99. All real-time processes are at a higher priority than normal processes. Linux implements real-
time priorities in accordance with POSIX. Most modern Unix systems implement a similar scheme.

As an example, consider the following set of processes, assumed to have arrived at time 0 in the order
P1, P2, ..., P5, with the length of the CPU-burst time given in milliseconds:

Process   Burst time (ms)   Priority
P1        10                3
P2        1                 1
P3        2                 3
P4        1                 4
P5        5                 2


Using priority scheduling, we would schedule these processes as follows:

| P2 | P5 | P1 | P3 | P4 |
0    1    6    16   18   19

The average waiting time is (6 + 0 + 16 + 18 + 1)/5 = 8.2 milliseconds.
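The schedule and the 8.2 ms average can be checked with a short sort-and-sum, assuming all processes arrive at time 0, a lower priority number means higher priority, and ties are broken in FCFS order.

    #include <stdio.h>
    #include <stdlib.h>

    /* Non-preemptive priority scheduling for the table above.
     * Lower priority number = higher priority; ties keep FCFS order. */
    struct proc { const char *name; int burst; int prio; int order; };

    static int by_priority(const void *a, const void *b)
    {
        const struct proc *pa = a, *pb = b;
        if (pa->prio != pb->prio)
            return pa->prio - pb->prio;
        return pa->order - pb->order;   /* FCFS tie-break keeps P1 before P3 */
    }

    int main(void)
    {
        struct proc p[] = {
            { "P1", 10, 3, 0 }, { "P2", 1, 1, 1 }, { "P3", 2, 3, 2 },
            { "P4", 1, 4, 3 }, { "P5", 5, 2, 4 },
        };
        int n = 5, clock = 0, total_wait = 0;

        qsort(p, n, sizeof p[0], by_priority);
        for (int i = 0; i < n; i++) {
            printf("%s starts at %2d ms (waited %2d ms)\n",
                   p[i].name, clock, clock);
            total_wait += clock;
            clock += p[i].burst;
        }
        printf("average waiting time = %.1f ms\n", (double)total_wait / n);
        return 0;
    }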
Priority scheduling can be either preemptive or non-preemptive. When a process arrives at the ready
queue, its priority is compared with the priority of the currently running process. A preemptive priority
scheduling algorithm will preempt the CPU if the priority of the newly arrived process is higher than
the priority of the currently running process. A non-preemptive priority scheduling algorithm will
simply put the new process at the head of the ready queue.
A major problem with priority scheduling algorithms is indefinite blocking or starvation. A process
that is ready to run but unable to get CPU time can be considered blocked. A priority scheduling
algorithm can leave some low-priority processes waiting indefinitely for the CPU. In a heavily loaded
computer system, a steady stream of higher-priority processes can prevent a low-priority process from
ever getting the CPU. Generally, one of two things will happen. Either the process will eventually be
run (when the system is finally lightly loaded), or the computer system will eventually crash and lose
all unfinished low-priority processes.
A solution to the problem of indefinite blockage of low-priority processes is aging. Aging is a
technique of gradually increasing the priority of processes that wait in the system for a long time. For
example, if priorities range from 0 (low) to 127 (high), we could increment the priority of a waiting
process by 1 every 15 minutes. Eventually, even a process with an initial priority of 0 would have the
highest priority in the system and would be executed.

Round-Robin Scheduling
The round-robin (RR) scheduling algorithm is designed especially for time-sharing systems. It is
similar to FCFS scheduling, but preemption is added to switch between processes. A small unit of time,
called a time quantum, or time slice, is defined. A time quantum is generally from 10 to 100
milliseconds. The ready queue is treated as a circular queue. The CPU scheduler goes around the ready
queue, allocating the CPU to each process for a time interval of up to 1 time quantum.
To implement RR scheduling, the ready queue is kept as a FIFO queue of processes. New processes
are added to the tail of the ready queue. The CPU scheduler picks the first process from the ready
queue, sets a timer to interrupt after 1 time quantum, and dispatches the process. One of two things will
then happen. The process may have a CPU burst of less than 1 time quantum. In this case, the process
itself will release the CPU voluntarily. The scheduler will then proceed to the next process in the ready
queue. Otherwise, if the CPU burst of the currently running process is longer than 1 time quantum, the
timer will go off and will cause an interrupt to the operating system. A context switch will be executed,
and the process will be put at the tail of the ready queue. The CPU scheduler will then select the next
process in the ready queue. The average waiting time under the RR policy, however, is often quite
long.
Consider the following set of processes that arrive at time 0, with the length of the CPU-burst time
given in milliseconds:

Process   Burst time (ms)
P1        24
P2        3
P3        3

Using a time quantum of 4 milliseconds, process P1 gets the first 4 milliseconds. Since it requires
another 20 milliseconds, it is preempted after the first time quantum, and the CPU is given to the next
process in the queue, process P2. Since process P2 does not need 4 milliseconds, it quits before its time
quantum expires. The CPU is then given to the next process, process P3. Once each process has
received 1 time quantum, the CPU is returned to process P1 for an additional time quantum. The
resulting RR schedule is


| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0    4    7    10   14   18   22   26   30
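A minimal simulation of this example (quantum of 4 ms, all three processes arriving at time 0) reproduces the schedule above. It cycles over a fixed array rather than maintaining a real FIFO ready queue, which is sufficient here because nothing arrives mid-run.

    #include <stdio.h>

    /* Round-robin over the example above: quantum 4 ms, all arrive at
     * time 0. Prints each dispatch and the time at which it ends. */
    int main(void)
    {
        const char *name[] = { "P1", "P2", "P3" };
        int remaining[] = { 24, 3, 3 };
        const int n = 3, quantum = 4;
        int clock = 0, left = n;

        while (left > 0) {
            for (int i = 0; i < n; i++) {
                if (remaining[i] == 0)
                    continue;
                int run = remaining[i] < quantum ? remaining[i] : quantum;
                clock += run;
                remaining[i] -= run;
                printf("%s runs until t=%d ms\n", name[i], clock);
                if (remaining[i] == 0)
                    left--;
            }
        }
        return 0;
    }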

The performance of the RR algorithm depends heavily on the size of the time quantum. At one
extreme, if the time quantum is very large (infinite), the RR policy is the same as the FCFS policy. If
the time quantum is very small (say 1 microsecond), the RR approach is called processor sharing, and
appears (in theory) to the users as though each of n processes has its own processor running at 1/n the
speed of the real processor. In software, however, the effect of context switching on the performance
needs to be considered.
For example, assume there is only one process of 10 time units. If the quantum is 12 time units, the process
finishes in less than 1 time quantum, with no overhead. If the quantum is 6 time units, however, the
process requires 2 quanta, resulting in a context switch, thus slowing the execution of the process.

Thus, we want the time quantum to be large with respect to the context-switch time. If the context-
switch time is approximately 10 percent of the time quantum, then about 10 percent of the CPU time
will be spent in context switch. Turnaround time also depends on the size of the time quantum. The
average turnaround time of a set of processes does not necessarily improve as the
time-quantum size increases. In general, the average turnaround time can be improved if most
processes finish their next CPU burst in a single time quantum.

Process Ageing
One of the problems with priority scheduling is that processes can become starved. This occurs when
low priority processes are continually gazumped by higher priority processes and never get to run. One
way round this is to introduce process ageing.


A    B    C    D    E    F
15   14   13   12   12   10

[Diagram: successive states of this queue as each process runs, with the priority of every waiting process incremented after each run and the process that has just run returning at its original priority.]
After each process runs, all other processes have their priorities incremented, whilst the running
process has its priority set back to its original priority and is placed back into the queue in order of
priority. This means that F, which has the lowest priority, will eventually reach the front of the queue.
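A sketch of this mechanism, using the A to F priorities above: after each run, every waiting process gains one priority point and the process that just ran drops back to its base priority. Ties are broken here towards the earlier process in the table, and the number of rounds shown is arbitrary; both are assumptions of the sketch.

    #include <stdio.h>

    /* Process ageing over the A..F example: higher number = higher
     * priority. After each run, waiting processes are aged by +1 and the
     * process that just ran returns to its base priority. */
    #define N 6

    struct task { char name; int base; int prio; };

    static int pick_highest(struct task *t)
    {
        int best = 0;
        for (int i = 1; i < N; i++)
            if (t[i].prio > t[best].prio)
                best = i;
        return best;
    }

    int main(void)
    {
        struct task t[N] = {
            { 'A', 15, 15 }, { 'B', 14, 14 }, { 'C', 13, 13 },
            { 'D', 12, 12 }, { 'E', 12, 12 }, { 'F', 10, 10 },
        };

        for (int round = 0; round < 12; round++) {
            int run = pick_highest(t);
            printf("round %2d: %c runs (priority %d)\n",
                   round + 1, t[run].name, t[run].prio);
            for (int i = 0; i < N; i++)
                if (i != run)
                    t[i].prio++;          /* age every waiting process   */
            t[run].prio = t[run].base;    /* running process resets here */
        }
        return 0;
    }

Running it shows F climbing steadily and being dispatched within a dozen rounds, which is the point of the scheme.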

Multilevel Queue Scheduling
Another class of scheduling algorithms has been created for situations in which processes are easily
classified into different groups. For example, a common division is made between foreground
(interactive) processes and background (batch) processes. These two types of processes have different
response-time requirements, and so might have different scheduling needs. In addition, foreground
processes may have priority (externally defined) over background processes.
A multilevel queue-scheduling algorithm partitions the ready queue into several separate queues. The
processes are permanently assigned to one queue, generally based on some property of the process,
such as memory size, process priority, or process type. Each queue has its own scheduling algorithm.
For example, separate queues might be used for foreground and background processes. The foreground
queue might be scheduled by an RR algorithm, while the background queue is scheduled by an FCFS
algorithm. In addition, there must be scheduling between the queues, which is commonly implemented
as fixed-priority preemptive scheduling. For example, the foreground queue may have absolute
priority over the background queue.


Example - a multilevel queue scheduling algorithm with five queues:

System processes (highest priority)
Interactive processes
Interactive editing processes
Batch processes
Student processes (lowest priority)

Each queue has absolute priority over lower-priority queues. No process in the batch queue, for
example, could run unless the queues for system processes, interactive processes, and interactive
editing processes were all empty. If an interactive editing process entered the ready queue while a batch
process was running, the batch process would be preempted.
Another possibility is to time slice between the queues. Each queue gets a certain portion of the CPU
time, which it can then schedule among the various processes in its queue. For instance, in the
foreground-background queue example, the foreground queue can be given 80 percent of the CPU time
for RR scheduling among its processes, whereas the background queue receives 20 percent
of the CPU to give to its processes in a FCFS manner.

Multilevel Feedback Queue Scheduling
Normally, in a multilevel queue-scheduling algorithm, processes are permanently assigned to a queue
on entry to the system. Processes do not move between queues. If there are separate queues for
foreground and background processes, for example, processes do not move from one queue to the
other, since processes do not change their foreground or background nature. This setup has the
advantage of low scheduling overhead, but is inflexible. Multilevel feedback queue scheduling,
however, allows a process to move between queues. The idea is to separate processes with different
CPU-burst characteristics. If a process uses too much CPU time, it will be moved to a
lower-priority queue. This scheme leaves I/O-bound and interactive processes in the higher-priority
queues. Similarly, a process that waits too long in a lower-priority queue may be moved to a higher-
priority queue. This form of aging prevents starvation.
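A compressed sketch of the feedback rules just described, assuming three queues, demotion when a process consumes its whole quantum, and promotion once it has waited past an ageing threshold. The queue count, quanta and threshold are illustrative values, not taken from any particular system.

    #include <stdio.h>

    /* Multilevel feedback: queue 0 is highest priority. A process that
     * burns its whole quantum is demoted; one that has waited too long is
     * promoted. Quanta of 8/16/32 ms and a 200 ms ageing threshold are
     * illustrative only. */
    enum { NQUEUES = 3 };
    static const int quantum_ms[NQUEUES] = { 8, 16, 32 };
    static const int age_threshold_ms = 200;

    struct job { int queue; int waited_ms; };

    static void on_quantum_used(struct job *j, int ran_ms)
    {
        if (ran_ms >= quantum_ms[j->queue] && j->queue < NQUEUES - 1)
            j->queue++;                   /* CPU-hungry: demote */
        j->waited_ms = 0;
    }

    static void on_tick_waiting(struct job *j, int tick_ms)
    {
        j->waited_ms += tick_ms;
        if (j->waited_ms >= age_threshold_ms && j->queue > 0) {
            j->queue--;                   /* starved: promote (ageing) */
            j->waited_ms = 0;
        }
    }

    int main(void)
    {
        struct job cpu_hog = { 0, 0 };
        for (int i = 0; i < 3; i++) {
            on_quantum_used(&cpu_hog, quantum_ms[cpu_hog.queue]);
            printf("CPU-bound job now in queue %d\n", cpu_hog.queue);
        }

        struct job starved = { 2, 0 };
        for (int t = 0; t < 500; t += 100)
            on_tick_waiting(&starved, 100);
        printf("long-waiting job now in queue %d\n", starved.queue);
        return 0;
    }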

Linux
Linux uses MFQ with 32 run queues. System run queues use queues 0 to 7; processes executing in user
space use queues 8 to 31. Inside each queue, UNIX uses round-robin scheduling. Various distributions of
UNIX have different time quanta, but all are less than 100 milliseconds. Every process has a nice priority,
but it is used only to influence, not solely determine, priorities.

Real-time Operating Systems

Definition: A real-time operating system (RTOS) is an operating system that guarantees a certain
capability within a specified time constraint.

For example, an operating system might be designed to ensure that a certain object was available for a
robot on an assembly line. In what is usually called a "hard" real-time operating system, if the
calculation could not be performed for making the object available at the designated time, the operating
system would terminate with a failure. In a "soft" real-time operating system, the assembly line would
continue to function but the production output might be lower as objects failed to appear at their
designated time, causing the robot to be temporarily unproductive.

Hard vs. Soft Real-time
The issue of predictability is of key concern in real-time systems. In fact, the term "predictable" can
often be found in the multitude of definitions of real-time in the literature. Both periodic and aperiodic
tasks have strict timing requirements in the form of deadlines. That is, the scheduler must be able to
guarantee that each task is allocated a resource by a particular point in time, based on the task's
parameters. In order to accomplish this, every allocation of any resource to any task must incur a
latency that is deterministic or that can at least be predicted within a statistical margin of error.

Hard real-time tasks are required to meet all deadlines for every instance, and for these activities the
failure to meet even a single deadline is considered catastrophic. Examples are found in flight
navigation, automobile, and spacecraft systems. In contrast, soft real-time tasks allow for a statistical
bound on the number of deadlines missed, or on the allowable lateness of completing processing for an
instance in relation to a deadline. Soft real-time applications include media streaming in distributed
systems and non-mission-critical tasks in control systems.

Periodic Real-time Tasks
In general, a real-time task requires a specified amount of particular resources during specified periods
of time. A periodic task is a task that requests resources at time values representing a periodic function.
That is, there is a continuous and deterministic pattern of time intervals between requests of a resource.
In addition to this requirement, a real-time periodic task must complete processing by a specified
deadline relative to the time that it acquires the processor (or some other resource).

For simplicity, assume that a real-time task has a constant request period (i.e., it must begin execution on
the processor every n milliseconds).
For example, a robotics application may consist of a number of periodic real-time tasks, which perform
activities such as sensor data collection or regular network transmissions. Suppose the robot runs a task
that must collect infrared sensor data to determine if a barrier is nearby at regular time intervals. If the
configuration of this task requires that every 5 milliseconds it must complete 2 milliseconds of
collecting and processing the sensor data, then the task is a periodic real-time task.
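The sensor example can be written down as data: a period, a worst-case computation time and a relative deadline. The struct below is a sketch of that model (not any particular RTOS API), and the utilisation figure is simply compute time divided by period.

    #include <stdio.h>

    /* Periodic real-time task model: every period_ms the task must receive
     * compute_ms of processor time before its relative deadline expires.
     * The infrared-sensor task from the text: 2 ms of work every 5 ms. */
    struct periodic_task {
        const char *name;
        int period_ms;      /* time between successive releases   */
        int compute_ms;     /* worst-case execution time per cycle */
        int deadline_ms;    /* relative deadline (often == period) */
    };

    int main(void)
    {
        struct periodic_task sensor = { "ir_sensor", 5, 2, 5 };
        double utilisation = (double)sensor.compute_ms / sensor.period_ms;

        printf("%s: %d ms of work every %d ms -> %.0f%% of one processor\n",
               sensor.name, sensor.compute_ms, sensor.period_ms,
               utilisation * 100.0);
        return 0;
    }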




Aperiodic Real-time Tasks
The aperiodic real-time task model involves real-time activities that request a resource during non-
deterministic request periods. Each task instance is also associated with a specified deadline, the time
by which it must complete its execution.
Examples of aperiodic real-time tasks are found in event-driven real-time systems, such as ejection of a
pilot seat when the command is given to the navigation system in a jet fighter. Many less time-sensitive
applications also arise in distributed systems involving real-time streaming media (i.e., end-host routing
over a logical overlay).

RTOS Scheduling
The RTOS scheduling policy is one of the most important features of the RTOS to consider when
assessing its usefulness in a real-time application. There are a multitude of scheduling algorithms and
scheduler characteristics.
An RTOS requires a specific set of attributes to be effective. The task scheduling should be priority
based. A task scheduler for an RTOS has multiple levels of interrupt priorities where the higher priority
tasks run first. The task scheduler for an RTOS is also preemptive. If a higher priority task becomes
ready to run, it will immediately pre-empt a lower priority running task. This is required for real-time
applications. Finally, the RTOS must be event driven. The RTOS has the capability to respond to
external events such as interrupts from the environment. The RTOS can also respond to internal events
if required.

Scheduler Jargon
Multitasking jargon is often confusing as there are so many different strategies and techniques for
multiplexing the processor. The key differences among the various strategies revolve around how a task
loses control and how it gains control of the processor. While these are separate design decisions (for the
RTOS designer), they are often treated as being implicitly linked in marketing literature and casual
conversation.
Scheduling strategies viewed by how tasks lose control:
Only by voluntary surrender. This style is called cooperative multitasking. For a task to lose control of
the processor, the task must voluntarily call the RTOS. These systems continue to multitask only so
long as all the tasks continue to share graciously.
Only after they've finished their work. Called run to completion. To be practical, this style requires that
all tasks be of relatively short duration.
Whenever the scheduler says so. Called preemptive. In this style the RTOS scheduler can interrupt a
task at any time. Preemptive schedulers are generally more able to meet specific timing requirements
than others. (Notice that if the scheduler "says so" at regular fixed intervals, then this style is called time
slicing.)

Scheduling strategies viewed by how tasks gain control:
By being next in line. A simple FIFO task queue. Sometimes called round-robin. Very uncommon.
By waiting for its turn in a fixed-rotation. If the cycle is only allowed to restart at specific fixed
intervals, it's called a rate cyclic scheduler.
By waiting a specific amount of time. A very literal form of multiplexing in which each ready to
execute task is given the processor for a fixed-quantum of time. If the tasks are processed in FIFO
order, this style is called a round-robin scheduler. If the tasks are selected using some other scheme
it's considered a time-slicing scheduler.
By having a higher priority than any other task wanting the processor. A priority-based or
prioritised scheduler.

Not all of these combinations make sense, but even so, it's important to understand that task
interruption and task selection are separate mechanisms. Certain combinations are so common (e.g.,
preemptive prioritised), that one trait (e.g., prioritised) is often misconstrued to imply the other (e.g.,
preemptive). In fact, it is perfectly reasonable to have a prioritised, non-preemptive (e.g. run to
completion) scheduler. For technical reasons, prioritised-preemptive schedulers are the most frequently
used in RTOSs.
The scheduler is a central part of the kernel. It executes periodically and whenever the state of a thread
changes. A single-task system does not really need a scheduler since there is no competition for the
processor. Multitasking implies scheduling because there are multiple tasks competing for the
processor. A scheduler must run often enough to monitor the usage of the CPU by the tasks. In most
real-time systems, the scheduler is invoked at regular intervals. This invocation is usually the result of a
periodic timer interrupt. The period in which this interrupt is invoked is called the tick size or the
system "heartbeat." At each clock interrupt the RTOS updates the state of the system by analysing task
execution budgets and making decisions as to which task should have access to the system CPU.

Scheduling Policies in Real-Time Systems
There are several approaches to scheduling tasks in real-time systems. These fall into two general
categories, fixed or static priority scheduling policies and dynamic priority scheduling policies.
Fixed-priority scheduling algorithms do not modify a job's priority while the task is running. The task
itself is allowed to modify its own priority. This approach requires very little support code in the
scheduler to implement this functionality. The scheduler is fast and predictable with this approach. The
scheduling is mostly done offline (before the system runs). This requires the system designer to know
the task set a-priori (ahead of time) and is not suitable for tasks that are created dynamically during run
time. The priority of the task set must be determined beforehand and cannot change when the system
runs unless the task itself changes its own priority.
Dynamic scheduling algorithms allow a scheduler to modify a job's priority based on one of several
scheduling algorithms or policies. This is a more complicated approach and requires more code in the
scheduler to implement. This leads to more overhead in managing a task set in a system because the
scheduler must now spend more time dynamically sorting through the system task set and prioritising
tasks for execution based on the scheduling policy. This leads to non-determinism, which is not
favourable, especially for hard real-time systems. Dynamic scheduling algorithms are online scheduling
algorithms. The scheduling policy is applied to the task set during the execution of the system. The
active task set changes dynamically as the system runs. The priority of the tasks can also change
dynamically.







Real Time Scheduling Policies
Rate Monotonic Scheduling (RMS) is an approach that is used to assign task priority for a pre-emptive system in
such a way that the correct execution can be guaranteed. It assumes that the task priorities are fixed for
a given set of tasks and are not dynamically changed during execution. It assumes there are sufficient
task priority levels for the task set and that the tasks are periodic.
The key to this scheme is that tasks with shorter execution periods are given the highest priority. This
means that the more frequently executing tasks can pre-empt the slower periodic tasks so that they can
meet their deadlines.
Figure 1 shows how this policy works. In the diagrams, events that start a task are shown as lines that
cross the horizontal time line and tasks are shown as rectangles whose length determines their
execution time. Example 1 shows a single periodic task where the task t is executed with a periodicity
of time t. The second example adds a second task S where its periodicity is longer than that of task t.
The task priority shown is with task S having highest priority. In this case, the RMS policy has not
been followed because the longest task has been given a higher priority than the shortest task.
However, in this case the system works fine because of the timing of the tasks periods.

Example 3 shows the problems if the priority is changed and the periodicity for task S approaches that
of task t. When t3 occurs, task t is activated and starts to run. It does not complete because S2 occurs
and task S is swapped in due to its higher priority. When task S completes, task t resumes but, during its
execution, the event t4 occurs and thus task t has failed to meet its t3 deadline. This could result in
missed or corrupted data, for example. When task t completes, it is then reactivated to cope with the t4
event.
Example 4 shows the same scenario with the task priorities reversed so that task t pre-empts task S. In
this case, RMS policy has been followed and the system works fine with both tasks reaching their
deadlines.

Deadline Monotonic scheduling
Deadline monotonic scheduling (DMS) is a generalisation of the Rate-Monotonic scheduling
policy that uses the nearest deadline as the criterion for assigning task priority. Given a set of tasks, the
one with the nearest deadline is given the highest priority, hence, the shorter this (fixed) deadline, the
higher the priority.
This means that the scheduler or designer must know when these deadlines are to take place.
Tracking, and indeed obtaining, this information in the first place can be difficult, and this is often the
reason why deadline scheduling is second choice compared to RMS.
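Both policies reduce to a static sort performed at design time: RMS orders tasks by period (shortest period first), DMS by relative deadline (nearest deadline first). The sketch below uses a small, made-up task set to show how the two orderings can differ when a long-period task has a tight deadline.

    #include <stdio.h>
    #include <stdlib.h>

    /* Static priority assignment: RMS sorts by period, DMS by relative
     * deadline; in both cases the smaller value gets the higher priority.
     * The three tasks here are illustrative only. */
    struct task { const char *name; int period_ms; int deadline_ms; };

    static int by_period(const void *a, const void *b)
    {
        return ((const struct task *)a)->period_ms -
               ((const struct task *)b)->period_ms;
    }

    static int by_deadline(const void *a, const void *b)
    {
        return ((const struct task *)a)->deadline_ms -
               ((const struct task *)b)->deadline_ms;
    }

    static void show(const char *policy, struct task *t, int n)
    {
        printf("%s priority order:", policy);
        for (int i = 0; i < n; i++)
            printf(" %s", t[i].name);
        printf("\n");
    }

    int main(void)
    {
        struct task set[] = {
            { "t1", 50, 20 },   /* long period, tight deadline */
            { "t2", 20, 20 },
            { "t3", 30, 25 },
        };
        int n = 3;

        qsort(set, n, sizeof set[0], by_period);
        show("RMS", set, n);     /* t2, t3, t1 */

        qsort(set, n, sizeof set[0], by_deadline);
        show("DMS", set, n);     /* t1/t2 share deadline 20, then t3 */
        return 0;
    }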

Examples of RMS vs DMS Scheduling
Rate Monotonic Scheduling has been shown to be optimal among static priority policies. However, some
task sets that aren't schedulable using RMS can be scheduled using dynamic strategies. An example is a
task set where the deadline for completing processing is not the task period (the deadline is some time
shorter than the task period). In this example, we'll show a task set that can be scheduled under the
deadline monotonic priority policy, but not under RMS.
Consider the following task set:



Using the Deadline Monotonic approach to scheduling, the task execution profile is shown in Figure 1.
All tasks meet their respective deadlines using this approach.


Figure 1. Example of deadline monotonic scheduling.
Now consider the same task set, this time prioritised using the Rate Monotonic approach to scheduling.
The task priorities change under the RMS approach, as shown below:



Same task set for rate monotonic scheduling
The timeline analysis using the RMS scheduling technique is shown in Figure 2. Notice that, using the
RMS approach and the deadline constraints defined in the task set, task 1 is now not schedulable.
Although task 1 meets its period, it misses its defined deadline.


Figure 2. Same example with rate monotonic scheduling: task misses deadline.


Dynamic Scheduling Policies
Dynamic scheduling algorithms can be broken into two main classes of algorithms. The first is referred
to as a "dynamic planning based approach." This approach is very useful for systems that must
dynamically accept new tasks into the system, for example a wireless base station that must accept new
calls into the system at some dynamic rate. This approach combines some of the flexibility of a
dynamic approach and some of the predictability of a more static approach. After a task arrives, but
before its execution begins, a check is made to determine whether a schedule can be created that can
handle the new task as well as the currently executing tasks. Another approach, called the dynamic best
effort approach, uses the task deadlines and slack to set the priorities. With this approach, a task could
be pre-empted at any time during its execution. So, until the deadline arrives or the task finishes
execution, there is no guarantee that a timing constraint can be met. Examples of dynamic best effort
algorithms are Earliest Deadline First and Least Slack scheduling.

Earliest deadline first scheduling
With this approach, the deadline of a task instance is the absolute point in time by which
the instance must complete. The task deadline is computed when the instance is created.
The operating system scheduler picks the task with the earliest deadline to run. A task
with an earlier deadline preempts a task with a later deadline. If a task set is schedulable,
the EDF algorithm results in a schedule that achieves optimal resource utilisation.
However, EDF is shown to be unpredictable if the required utilisation exceeds 100%,
known as an overload condition. EDF is useful for scheduling aperiodic tasks, since the
dynamic priorities of tasks do not depend on the determinism of request periods.
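A minimal sketch of the EDF dispatch decision, assuming each ready task instance carries an absolute deadline (in milliseconds since boot) computed when the instance was created; the scheduler simply selects the smallest deadline. The task names and values are illustrative.

    #include <stdio.h>

    /* Earliest Deadline First: among the ready task instances, run the one
     * whose absolute deadline is closest. The three instances below are
     * illustrative. */
    struct instance { const char *name; long abs_deadline_ms; };

    static const struct instance *edf_pick(const struct instance *ready, int n)
    {
        const struct instance *best = &ready[0];
        for (int i = 1; i < n; i++)
            if (ready[i].abs_deadline_ms < best->abs_deadline_ms)
                best = &ready[i];
        return best;
    }

    int main(void)
    {
        struct instance ready[] = {
            { "telemetry", 1200 },
            { "sensor",     905 },
            { "logger",    5000 },
        };
        printf("EDF runs: %s\n", edf_pick(ready, 3)->name);  /* sensor */
        return 0;
    }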

Least slack scheduling
Least slack scheduling is also a dynamic priority preemptive policy. The slack of a task instance is the
absolute deadline minus the remaining execution time for the instance to complete. The OS scheduler
picks the task with the shortest slack to run first. A task with a smaller slack preempts a task with a
larger slack. This approach minimises the maximum lateness of tasks.

Scheduling with Task Synchronization
Independent tasks have been assumed so far, but this is very limiting. Task interaction is common in
almost all applications. Task synchronization requirements introduce a new set of potential problems.
Consider the following scenario: A task enters a critical section (it needs exclusive use of a resource
such as IO devices or data structures). A higher priority task preempts and wants to use the same
resource. The high priority task is then blocked until the lower priority task completes. Because the low
priority task could be blocked by other higher priority tasks, this is unbounded. This example of a
higher-priority task having to wait for a lower-priority task is called priority inversion.

Example: priority inversion
An example of priority inversion is shown in Figure 3. Tasklow begins executing and requires the use
of a critical section. While in the critical section, a higher priority task, Taskh, preempts the lower
priority task and begins its execution. During execution, this task requires the use of the same critical
resource. Since the resource is already owned by the lower priority task, Taskh must block waiting on
the lower priority task to release the resource. Tasklow resumes execution, only to be pre-empted by a
medium priority task, Taskmed. Taskmed does not require the use of the same critical resource and
executes to completion. Tasklow then resumes execution, finishes with the critical resource and is
immediately (actually on the next scheduling interval) pre-empted by the higher priority task, which
executes its critical section, completes execution and relinquishes control back to the lower priority
task to complete. Priority inversion occurs while the higher priority task is waiting for the lower
priority task to release the critical resource.


Figure 3. Example of priority inversion.

