
978    IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 7, JULY 2011

On Improving Real-Time Interrupt Latencies of Hybrid Operating Systems with Two-Level Hardware Interrupts
Miao Liu, Duo Liu, Yi Wang, Meng Wang, and Zili Shao, Member, IEEE
Abstract—In this paper, we propose to implement hybrid operating systems based on two-level hardware interrupts. We analyze and model the worst-case real-time interrupt latency for RTAI and identify the key component for its optimization. Then, we propose our methodology to implement hybrid operating systems with two-level hardware interrupts by combining the real-time kernel and the time-sharing OS (Operating System) kernel. Based on the methodology, we discuss the important issues for the implementation. Finally, we implement a hybrid system called RTLinux-THIN (Real-Time LINUX with Two-level Hardware INterrupts) on the ARM architecture by combining ARM Linux kernel 2.6.9 and μC/OS-II. We conduct experiments on a set of real application programs, including mplayer, Bonnie, and iperf, and compare the interrupt latency and interrupt task distributions of RTLinux-THIN (with and without cache locking), RTAI, Linux, and Linux with RT patch on a hardware platform based on the Intel PXA270 processor. The results show that our scheme not only provides an easy method for implementing hybrid systems but also achieves performance improvements for both the time-sharing and real-time subsystems.

Index Terms—Hybrid operating systems, real-time interrupt latency, RTAI, Linux, two-level hardware interrupts.

1 INTRODUCTION
Combining a real-time and a time-sharing subsystem, hybrid operating systems can provide both predictable real-time task execution and non-real-time services with well-known interfaces and a wealth of existing applications. In order to achieve relatively low development and maintenance costs, the time-sharing subsystem of a hybrid system is often based on commodity operating systems such as Linux [2], [3], [4], [5], [6], [7]. Since commodity operating systems usually focus on general-purpose computing, how to use them effectively in real-time environments without impairing the predictability of real-time applications becomes an important problem. The schedulability problem in an environment with both real-time and non-real-time applications was first studied in [8], [9], [10]. In [9], Deng and Liu proposed an open system architecture to run real-time applications concurrently with non-real-time applications. In their work, a two-level hierarchical scheduling framework was developed to guarantee the schedulability of each real-time application regardless of the behavior of other applications. Based on this, various techniques have been proposed to address different issues of the open system, such as resource sharing [11], [12], [13], resource partitioning [14], and real-time

M. Liu is with the Robot Research Institute, Beihang University, Beijing, China. E-mail: threewaterl@163.com.
D. Liu, Y. Wang, M. Wang, and Z. Shao are with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. E-mail: {csdliu, csywang, csmewang, cszlshao}@comp.polyu.edu.hk.
Manuscript received 1 Apr. 2009; revised 14 Jan. 2010; accepted 12 May 2010; published online 4 June 2010.
Recommended for acceptance by S.H. Son.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-2009-04-0140.
Digital Object Identifier no. 10.1109/TC.2010.119.
0018-9340/11/$26.00 © 2011 IEEE

schedulers [15], [16], [17], [18], [19]. The hierarchical scheduling framework has been extended to different system models such as component-based real-time embedded systems [20], and parallel and distributed systems [21]. To solve the predictability problem in hybrid systems, many techniques have been proposed in previous work. A general approach used in these techniques is to defer non-real-time tasks when there are real-time tasks awaiting service, by modifying the interrupt-handling code of a commodity OS [22]. As interrupt latency has a big impact on both performance and predictability in hybrid systems, many studies have been conducted on various aspects such as interrupt latency analysis [23], [24], [25], [26], [27], hybrid interrupt-polling schemes [28], delay locking [29], and interrupt scheduling techniques [18], [22], [30], [31], [32], [33], [34], [35]. Most of the above work is based on one-level hardware interrupts, in which both real-time and non-real-time interrupts come from the same interrupt request entry and are separated in the interrupt-handling code. We found that one-level hardware interrupts cause several problems for hybrid systems. We use RTAI [3], an open-source hybrid system based on Linux, as the representative of commodity-OS-based hybrid systems, and discuss these problems below. RTAI is chosen because it is being actively developed and supported. In RTAI, the Linux OS kernel is treated as the idle task, and it only executes when there are no real-time tasks to run and the real-time kernel is inactive. The interrupt-handling code is modified to emulate the function of the interrupt controller, so the Linux task can never block real-time interrupts or prevent itself from being preempted [3] (see Section 2 for details). Several problems arise from using the interrupt-handling code to separate real-time and non-real-time interrupts and to emulate the interrupt controller, as shown in the following:
Published by the IEEE Computer Society

-  Interrupt disabling is frequently used in interrupt handlers, critical sections, and so on, in the Linux task. However, in RTAI, interrupt disabling is processed by setting a flag in the interrupt-handling code, based on the software emulation of the interrupt controller, without really disabling interrupts, in order to avoid blocking real-time interrupts. Therefore, non-real-time interrupts of the Linux task can still be delivered when they should be disabled. Although these interrupts will be turned off later if the flag is set in the interrupt-handling code, they can only be turned off individually. These unnecessary interrupt responses and processing not only degrade the performance of the Linux task but also increase the unpredictability of the real-time subsystem.
-  The size of the interrupt-handling code for interrupt requests grows with the added identification and emulation functions. With the large code size, it is difficult to lock the interrupt-handling code into the cache, so various cache locking techniques [36], [37] may not be applicable for improving predictability.
-  Although a hardware abstraction layer (HAL) is abstracted in RTAI [3], the code related to interrupts in the Linux OS kernel must be rewritten. This makes porting a hybrid system to various processors laborious.

To solve these problems, we propose to implement hybrid operating systems based on two-level hardware interrupts. By separating real-time and non-real-time interrupts in hardware, we show that it is easier to build hybrid systems with better performance. Our focus is on improving the predictability and real-time interrupt latency of the real-time subsystem, while enhancing the performance of the time-sharing subsystem as well. Two-level hardware interrupts with different interrupt request entries are provided in high-end embedded processors such as those based on the ARM architecture [38].
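The core idea can be illustrated with a toy C model (our construction, not code from the paper): two independent interrupt entries, where masking the non-real-time entry never blocks the real-time one.

```c
#include <stdbool.h>

/* Toy model of two-level hardware interrupts: separate entries for
 * real-time (RT) and non-real-time (NRT) requests. Masking the NRT
 * entry has no effect on RT delivery. */
static bool nrt_masked;
static int rt_served, nrt_served;

/* The RT entry is never affected by the NRT mask. */
static void rt_entry(void)
{
    rt_served++;
}

/* The NRT entry honors the (freely settable) NRT mask. */
static void nrt_entry(void)
{
    if (!nrt_masked)
        nrt_served++;
}
```

With one-level interrupts, a single mask bit would block both paths; here the time-sharing side can mask its own entry without touching real-time delivery.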
With this architecture support, our scheme not only provides an easy method for implementing hybrid systems but also achieves performance improvements for both the time-sharing and real-time subsystems. Our main contributions are summarized as follows:

-  We analyze and discuss the key issues of implementing a hybrid system based on two-level hardware interrupts, including the methodology of combining the real-time kernel and the time-sharing OS kernel, the implementation of real-time scheduling, and the analysis of real-time interrupt latency.
-  We implement a hybrid system called RTLinux-THIN (Real-Time LINUX with Two-level Hardware INterrupts) on the ARM architecture by combining ARM Linux kernel 2.6.9 and μC/OS-II, two widely used kernels with source code available in the embedded and real-time fields (ARM Linux for time-sharing systems and μC/OS-II for real-time systems). We implement RTLinux-THIN on a hardware platform based on the Intel PXA270 processor [39]. We conduct experiments with three real application

Fig. 1. A typical interrupt request handling procedure in a hybrid system.

programs: mplayer [40], Bonnie [41], and iperf [42], and compare statistical interrupt latency and interrupt task latency distributions in four system conditions: idle, intensive memory access (decoding a fragment of MPEG-4 video in mplayer), intensive IDE disk access (evaluating the speed of the file system with a set of file operating benchmarks in Bonnie), and intensive network communication (measuring TCP/UDP bandwidth performance in iperf). The experimental results show that RTLinux-THIN improves real-time interrupt latencies and provides better predictability.

The remainder of the paper is organized as follows: In Section 2, we present the necessary background by introducing basic knowledge of interrupts and analyzing the interrupt processing and worst-case interrupt latency of RTAI. In Section 3, we discuss the key issues of implementing a hybrid system based on two-level hardware interrupts. The system implementation and the experiments are shown in Sections 4 and 5, respectively. The related work is presented in Section 6. In Section 7, we conclude the paper.

2 BACKGROUND

In this section, we present the background on interrupts and interrupt handling in hybrid systems, as interrupt handling plays a very important role in implementing commodity-OS-based hybrid systems. We first introduce interrupts, interrupt handlers, and interrupt latencies in Section 2.1. Then, we analyze interrupt handling and the worst-case real-time interrupt latencies in RTAI in Section 2.2.

2.1 Interrupts in Hybrid Systems

Fig. 1 shows a typical interrupt request handling procedure in a hybrid system. Basically, real-time and non-real-time interrupt requests are passed to the interrupt-handling code through the interrupt request entry. The interrupt-handling code can be separated into two parts, the interrupt distribution routine and the interrupt service routine (ISR). First, the interrupt distribution routine determines the entry point for an interrupt request. Next, a specific interrupt service routine is called, and then the interrupted task/program/interrupt is resumed or a new task/interrupt is scheduled to be executed before exiting from the ISR. Although interrupt handling in hybrid systems looks like that in a general-purpose OS, it has to change considerably with the structure shown in Fig. 1, in which real-time and non-real-time interrupts are passed through the same interrupt request entry. First, in the interrupt distribution code, in order to satisfy the predictability of the real-time subsystem, we need to separate real-time and non-real-time interrupts since they will be processed differently. Second,

Fig. 2. The interrupt processing flow of the interrupt distribution routine in RTAI.

we have to solve the interrupt disabling problem when dealing with non-real-time interrupts. The interrupt disabling problem arises as follows: The time-sharing subsystem of a hybrid system is usually treated as the task with the lowest priority. With the lowest priority, the time-sharing subsystem task cannot block real-time interrupts, nor can it prevent itself from being preempted. On the other hand, in a time-sharing operating system such as Linux, interrupt disabling is frequently used in interrupt handlers, critical sections, and so on. In most processors, interrupt disabling is achieved by setting the interrupt disabling/enabling bit in the Program Status Word (PSW) register, and all interrupt requests are disabled while the bit is set. In hybrid systems, we cannot really set the interrupt disabling/enabling bit for interrupt disabling from the time-sharing subsystem task. Currently, a general approach to solve this problem is to use software to emulate such interrupt control [2], [3]. Next, we will use RTAI as an example to analyze the detailed interrupt handling and worst-case real-time interrupt latency.
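A few lines of C can model this PSW-style global masking (our simplification; real hardware does this in the processor, not in code): one bit blocks every request on the shared entry, which is exactly why a hybrid system cannot let the time-sharing task set the real bit.

```c
#include <stdbool.h>

/* Toy model of a Program Status Word with one interrupt
 * disabling/enabling bit. The bit position is illustrative. */
#define PSW_I_BIT (1u << 7)

static unsigned psw;
static int serviced;

/* When the bit is set, ALL requests on the shared entry are blocked,
 * real-time and non-real-time alike. */
static void interrupt_request(void)
{
    if (psw & PSW_I_BIT)
        return;          /* masked: the request is simply not taken */
    serviced++;          /* otherwise the CPU takes the interrupt   */
}
```

The indiscriminate effect of the single bit is the crux: setting it from the Linux side would also block real-time interrupts, so RTAI must emulate it in software instead.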

2.2 Interrupt Handling in RTAI

RTAI is a Linux-based hybrid system in which the Linux kernel is treated as the idle task: it only executes when there are no real-time tasks to run and the real-time kernel is inactive [3]. In RTAI, to solve the interrupt disabling problem, the interrupt distribution routine uses a software emulation method called the virtual interrupt controller, based on the Adaptive Domain Environment for Operating Systems [43], to manage the interrupts of the Linux task. Basically, when the Linux task disables interrupts, the interrupt disabling/enabling bit in the PSW is not set; instead, it is set in the virtual interrupt controller, the software emulation of the interrupt controller. If non-real-time interrupts occur while the interrupt disabling/enabling bit in the virtual interrupt controller is set, they are recorded into a First-In-First-Out (FIFO) queue, acknowledged but not served. These interrupts are served one by one when the interrupt disabling/enabling bit in the virtual interrupt controller is cleared. Correspondingly, the processing flow of the interrupt distribution routine in RTAI is shown in Fig. 2. As shown in Fig. 2, in the interrupt distribution routine, when an interrupt request is responded to, we first decide whether it is a real-time or non-real-time interrupt. A real-time interrupt will be directly served by calling its corresponding

interrupt service routine. For a non-real-time interrupt, we first check whether there are any ready or running real-time tasks. If there are, the interrupt request is recorded into a FIFO queue, acknowledged but not served, because real-time tasks have higher priority than non-real-time interrupts. Otherwise, we check whether the interrupt disabling/enabling bit in the virtual interrupt controller is set. If the bit is not set (interrupts are not disabled), the interrupt is served and the corresponding interrupt routine is called; otherwise, the interrupt request is recorded and acknowledged but not served. This software emulation method causes some problems for RTAI. First, interrupt disabling from the Linux task is processed by setting a flag in the interrupt-handling code, based on the software emulation of the interrupt controller, without really disabling interrupts. Therefore, non-real-time interrupts of the Linux task can still be responded to when they should be disabled. These unnecessary responses cause CPU overhead. Moreover, the software method can cause problems for systems with level-triggered interrupts. For a level-triggered hardware interrupt, the level (high or low) of the interrupt request line indicates whether there is an unserviced interrupt. If a device wants to signal an interrupt, it drives the line to the active level and holds it at that level until the interrupt is serviced. Therefore, the normal processing procedure is to first serve the interrupt and then acknowledge it by driving the line to the inactive level. However, with the software emulation method, when the interrupt disabling/enabling bit is set in the virtual interrupt controller, we have to first acknowledge an interrupt and serve it later. Therefore, it cannot correctly handle level-triggered interrupts.
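The flow of Fig. 2, including the acknowledge-but-defer FIFO, can be sketched as follows. This is our illustrative reconstruction with invented names, not RTAI source code:

```c
#include <stdbool.h>

#define QLEN 16

static bool rt_task_ready;   /* any real-time task ready or running? */
static bool virt_disabled;   /* emulated Linux disable/enable bit    */
static int  pending[QLEN];   /* FIFO: acknowledged but not served    */
static int  head, tail;
static int  served[QLEN], nserved;

static void serve(int irq)
{
    served[nserved++] = irq;   /* stand-in for calling the ISR */
}

/* Interrupt distribution routine, following Fig. 2. */
static void distribute(int irq, bool is_realtime)
{
    if (is_realtime) {
        serve(irq);                       /* real-time: serve at once */
    } else if (rt_task_ready || virt_disabled) {
        pending[tail++ % QLEN] = irq;     /* ack and record for later */
    } else {
        serve(irq);
    }
}

/* Linux clears its emulated disable bit: replay deferred requests
 * one by one, as the virtual interrupt controller does. */
static void virt_enable(void)
{
    virt_disabled = false;
    while (head != tail)
        serve(pending[head++ % QLEN]);
}
```

Note how a non-real-time request arriving under the emulated disable bit is acknowledged immediately but served only at `virt_enable()` time; this ack-before-serve ordering is precisely what breaks down for level-triggered lines.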
In the level-triggered case, the interrupt handler cannot directly acknowledge such an interrupt request, as the request has not been serviced yet. In other words, if we directly acknowledge it by driving the line to the inactive level, we signal that the interrupt request has been served when it has not. On the other hand, if we do not acknowledge the interrupt request, it will repeatedly trigger interrupts as long as the interrupt request line stays at the active level. Responding to these interrupts over and over, the system cannot make progress. The software emulation method also causes overhead for the real-time subsystem. To provide predictable real-time services, real-time interrupt latency is one of the most important performance metrics for a hybrid system. In this paper, we define interrupt latency as the time interval from the point when an interrupt is asserted by a device to the point when the corresponding interrupt service routine starts executing. As predictability is the main concern in real-time systems, in the following, we will analyze the Worst-Case Execution Time (WCET) for real-time interrupt latency. Without loss of generality, assume that there are $N$ real-time interrupts, $I_1, I_2, \ldots, I_N$. The priority order of the $N$ interrupts is $I_1 > I_2 > \cdots > I_N$, which means that if $I_1, I_2, \ldots, I_N$ occur at the same time, the processing order is $I_1, I_2, \ldots, I_N$. From the processing procedure shown in Fig. 1, the real-time interrupt latency for interrupt $I_K$ $(1 \le K \le N)$ is related to the waiting time and the interrupt processing time. For the convenience of analysis, we further divide the interrupt processing into two parts:

LIU ET AL.: ON IMPROVING REAL-TIME INTERRUPT LATENCIES OF HYBRID OPERATING SYSTEMS WITH TWO-LEVEL HARDWARE...

981

Fig. 3. The real-time interrupt latency of interrupt IK .

the distribution part and the interrupt service part. Based on this division, the real-time interrupt latency for interrupt $I_K$ is shown in Fig. 3. As shown in Fig. 3, the interrupt processing time of interrupt $I_K$, $T_P(K)$, consists of two parts, the distribution time $T_D(K)$ and the service time $T_S(K)$:

$$T_P(K) = T_D(K) + T_S(K). \eqno(1)$$

$T_D(K)$ is the time interval from the point the interrupt occurs to the point the interrupt is served by reaching its corresponding interrupt service routine. $T_S(K)$ is the interrupt service time in its corresponding interrupt service routine. Based on this definition, the worst-case real-time interrupt latency of interrupt $I_K$, $WCET(I_K)$, is the sum of the distribution time $T_D(K)$ and the waiting time:

$$WCET(I_K) = T_D(K) + \text{the worst-case waiting time}. \eqno(2)$$

First, let us assume that each interrupt can occur only once in the worst case. The worst-case waiting time can be divided into the following two parts:

1. The priority-related worst case: Considering the priority, the worst case is that interrupt $I_K$ occurs at the same time as $I_1, I_2, \ldots, I_{K-1}$, all interrupts that have higher priority than $I_K$. Based on the priority, $I_K$ can only be processed after the processing of $I_1, I_2, \ldots, I_{K-1}$ has finished. The worst-case waiting time of this part, therefore, is $\sum_{i=1}^{K-1} T_P(i) = \sum_{i=1}^{K-1}\{T_D(i) + T_S(i)\}$.

2. The interrupt-disable-related worst case: When $I_1, I_2, \ldots, I_K$ occur, the system may be in one of the following four states in which the hardware interrupts are disabled:

a. The system has just entered the interrupt distribution routine. Therefore, we have to wait $\max\{T_D\}$ in the worst case until the interrupts are enabled.

b. The system has just entered a real-time interrupt service routine. Therefore, we have to wait $\max\{T_S\}$ in the worst case until the interrupt service is finished.

c. The system has just entered a critical section in a real-time task. Therefore, we have to wait $\max\{T_C\}$ in the worst case until the system exits from the critical section, where $T_C$ represents one of the possible time intervals in which the hardware interrupts are disabled because a real-time task entered a critical section.

d. The system has just finished executing an operating system trap instruction in which all interrupts are automatically disabled. Therefore, we have to wait $\max\{T_T\}$ in the worst case until the interrupts are enabled, where $T_T$ represents one of the possible time intervals from the point when the interrupt is disabled to the point when the interrupt is enabled, caused by executing the trap instruction.

In each of the above states, the hardware interrupts are disabled when the system enters it and enabled after the system exits from it. Each interrupt can only occur once, and the worst-case execution times of $I_1, I_2, \ldots, I_{K-1}$ have been counted in the priority-related part above. If the system is in State (a), it should be processing an interrupt whose priority is lower than $I_K$, so it will continue to process $I_1, I_2, \ldots, I_K$ after it exits from that state. The same holds for the other states. Therefore, the worst-case waiting time of this part is $\max\{\max\{T_D\}, \max\{T_S\}, \max\{T_C\}, \max\{T_T\}\}$.

So we obtain the worst-case interrupt latency of interrupt $I_K$, $WCET(I_K)$, as follows:

$$WCET(I_K) = T_D(K) + \sum_{i=1}^{K-1}\bigl[T_D(i) + T_S(i)\bigr] + \max\{\max\{T_D\}, \max\{T_S\}, \max\{T_C\}, \max\{T_T\}\}. \eqno(3)$$

In the above analysis, we assume that each interrupt can only occur once in the worst case. Next, we extend this to a more general case in which each interrupt can occur multiple times. Assume that for interrupt $I_i$ $(1 \le i \le K)$, $N(i)$ is the number of times it occurs, and the occurrence sequence is $I_{i,1}, I_{i,2}, \ldots, I_{i,N(i)}$. From the processing procedure shown in Fig. 1, the real-time interrupt latency for interrupt $I_{K,j}$ $(1 \le K \le N,\ 1 \le j \le N(K))$ is related to the waiting time and the interrupt processing time. Based on the priority, the interrupts of $I_K$ can only be processed after the processing of $I_{1,1}, I_{1,2}, \ldots, I_{K-1,N(K-1)}$ has finished. The worst-case waiting time of this part, therefore, is $\sum_{i=1}^{K-1} N(i)\bigl[T_D(i) + T_S(i)\bigr]$. Furthermore, considering the interrupts of $I_K$, the $j$th interrupt of $I_K$ can only be processed after the processing of $I_{K,1}, I_{K,2}, \ldots, I_{K,j-1}$ has finished. As a result, the worst-case waiting time of this part is $(j-1)\bigl[T_D(K) + T_S(K)\bigr]$. Based on (3), we obtain the worst-case interrupt latency of $I_{K,j}$ (the $j$th interrupt of $I_K$), $WCET(I_{K,j})$, as follows:

$$\begin{cases} WCET(I_{K,1}) = T_D(K) + \sum_{i=1}^{K-1} N(i)\bigl[T_D(i) + T_S(i)\bigr] + \max\{\max\{T_D\}, \max\{T_S\}, \max\{T_C\}, \max\{T_T\}\}, \\ WCET(I_{K,j}) = WCET(I_{K,1}) + (j-1)\bigl[T_D(K) + T_S(K)\bigr], \quad j > 1. \end{cases} \eqno(4)$$

From the above equations, we can see that the worst-case real-time interrupt latency is related to $T_D$, $T_S$, $T_C$, and $T_T$, especially the first two. $T_S$, the processing time for an interrupt service routine, is decided by the
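As a quick sanity check of (3), consider an invented two-interrupt example (the numbers are ours, not measurements from the paper):

```latex
% K = 2: I_2 waits for I_1 plus the worst single blocking interval.
% Take T_D(1)=2, T_S(1)=10, T_D(2)=3, and
% max{T_D}=3, max{T_S}=12, max{T_C}=8, max{T_T}=5 (all in microseconds).
\begin{align*}
WCET(I_2) &= T_D(2) + \bigl[T_D(1)+T_S(1)\bigr] + \max\{3,\,12,\,8,\,5\} \\
          &= 3 + 12 + 12 = 27~\mu\mathrm{s}.
\end{align*}
```

Note that the blocking term contributes a full $\max\{T_S\}$ here, which is why shrinking the distribution routine alone does not remove all latency, only the $T_D$ terms.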
Fig. 4. The two-level hardware interrupts.

interrupt processing and scheduling methods in the real-time subsystem. $T_C$, the waiting time for entering a critical section in a real-time task, is decided by the real-time subsystem. $T_T$, the delay caused by the trap instruction, is decided by the method used to handle system calls in the time-sharing subsystem. Optimizing $T_S$, $T_C$, and $T_T$ is closely tied to the OS design. However, $T_D$, the delay caused by the interrupt distribution routine, can be optimized by changing the interrupt distribution method. As $T_D$ (the distribution time) has a huge influence on the worst-case real-time interrupt latency, it needs to be reduced as much as possible. In the interrupt distribution routine of RTAI, we need to separate real-time and non-real-time interrupts and emulate the functions of the interrupt controller with the virtual interrupt controller. This makes the routine slow, and its large code size makes it difficult to lock into the cache. As this distribution method is applied in most hybrid systems, it causes a general problem. Next, we will propose a scheme to solve it.

3 IMPLEMENTING HYBRID SYSTEMS WITH TWO-LEVEL HARDWARE INTERRUPTS

As shown above, several problems are caused by using the interrupt-handling code to separate real-time and non-real-time interrupts and emulate the interrupt controller. To solve these problems, we propose to implement hybrid operating systems based on two-level hardware interrupts. In this section, we first discuss the methodology and key issues of implementing a hybrid system based on two-level interrupts. Then, we analyze the worst-case real-time interrupt latency based on this scheme.

Our two-level-interrupt hybrid-system implementation scheme is based on the two-level hardware interrupt architecture shown in Fig. 4. In Fig. 4, real-time and non-real-time interrupts are separated by hardware and processed through different interrupt request entries. Therefore, in the interrupt distribution routine, we do not need to separate them, and since we can disable real-time and non-real-time interrupts independently, we do not need to emulate the interrupt controller. A similar two-level hardware interrupt architecture is provided in high-end embedded processors such as those based on the ARM architecture [38]. Next, we will discuss the key issues of implementing hybrid systems based on this two-level hardware interrupt architecture.

With two-level hardware interrupts, it is relatively easy to implement hybrid systems. It is easier to implement interrupt distribution routines for both the real-time and time-sharing subsystems. There is no need to separate real-time and non-real-time interrupts in the interrupt distribution routine, since this has been done by hardware. The interrupts for the time-sharing subsystem can be disabled and enabled freely, since this does not influence the interrupts for the real-time subsystem. Therefore, the problem caused by using software to emulate the interrupt controller without really disabling interrupts is solved. In the interrupt distribution routine, there is also no need to emulate the interrupt controller. Thus, the interrupt distribution routines for both the real-time and time-sharing subsystems can be simplified and run faster. Correspondingly, we can reduce interrupt latency for both the real-time and time-sharing subsystems.

The predictability of the system can be improved with this scheme. First, the real-time interrupt distribution routine is simpler and can be implemented with a smaller code size. Therefore, it is easier to lock it into the cache. In this way, we can reduce its execution time and improve predictability. Second, even when non-real-time interrupts are disabled, we can still preempt the time-sharing subsystem task or non-real-time interrupts through the real-time interrupt entry. Therefore, we can reduce the waiting time of a real-time interrupt. Compared with the worst-case real-time interrupt latency from (3), with the two-level hardware interrupts, $T_T$ (the waiting time caused by the trap instruction) can be removed, since we can preempt the Linux task through the real-time interrupt request entry. The real-time interrupt distribution delay can be reduced, since the code is simplified by removing the real-time/non-real-time interrupt separation and the software emulation of the interrupt controller. Therefore, with our scheme, we obtain the worst-case real-time interrupt latency for interrupt $I_K$, $WCET'(I_K)$, as follows:

$$WCET'(I_K) = T'_D(K) + \sum_{i=1}^{K-1}\bigl[T'_D(i) + T'_S(i)\bigr] + \max\{\max\{T'_D\}, \max\{T'_S\}, \max\{T'_C\}\}. \eqno(5)$$

In the above equation, similar to (3), $T'_D$ is used to represent the execution time of the interrupt distribution routine, $T'_S$ the processing time for an interrupt service routine, and $T'_C$ the waiting time for entering a critical section in a real-time task. Similarly, the worst-case real-time interrupt latency of $I_{K,j}$ for the general case, in which an interrupt can occur multiple times, is

$$\begin{cases} WCET'(I_{K,1}) = T'_D(K) + \sum_{i=1}^{K-1} N(i)\bigl[T'_D(i) + T'_S(i)\bigr] + \max\{\max\{T'_D\}, \max\{T'_S\}, \max\{T'_C\}\}, \\ WCET'(I_{K,j}) = WCET'(I_{K,1}) + (j-1)\bigl[T'_D(K) + T'_S(K)\bigr], \quad j > 1. \end{cases} \eqno(6)$$

From the above equations, we can see that there is no component related to the time-sharing (Linux) subsystem, and $T'_D$ is smaller than $T_D$ in (3) and (4) because the real-time interrupt distribution routine in our scheme is simpler than the interrupt distribution routine in RTAI. We do not need software to emulate the interrupt controller with this scheme. Therefore, we need to find different methods to implement some techniques that were built on the software-emulated interrupt

controller. For example, in RTAI, the communication between a real-time task and the Linux task is based on a real-time FIFO that is implemented on top of the virtual interrupt controller. In Section 4, we discuss this problem and give a solution using spin locks.

4 SYSTEM IMPLEMENTATION

Based on the two-level hardware interrupt scheme, we implement a hybrid system called RTLinux-THIN on the ARM architecture [38] by combining ARM Linux kernel 2.6.9 and μC/OS-II [44], two widely used kernels with source code available in the embedded and real-time fields. In this section, we first introduce the two-level interrupts in the ARM architecture. Then, we present the system structure and implementation details of our RTLinux-THIN system.

4.1 IRQ and FIQ in the ARM Architecture

In the ARM architecture, two-level hardware interrupts, Interrupt Request (IRQ) and Fast Interrupt Request (FIQ), are provided. All external interrupts are usually mapped to IRQ, which is the case in most operating systems. However, FIQ provides a faster method to serve interrupt requests. First, FIQ has higher priority than IRQ and other exceptions such as the software interrupt exception, the undefined instruction exception, and the data abort. Therefore, FIQ can preempt all other interrupts. Second, FIQ has its own private registers (R8-R14), so fewer registers need to be saved in its interrupt distribution routine. In our implementation, non-real-time interrupts are mapped to IRQ and real-time interrupts are mapped to FIQ.

4.2 The System Structure of RTLinux-THIN

Based on IRQ and FIQ, we implement RTLinux-THIN by combining μC/OS-II and ARM Linux. The system structure of RTLinux-THIN is shown in Fig. 5. The right-hand side of RTLinux-THIN is the real-time area, managed by the μC/OS-II kernel based on FIQ. The non-real-time area on the left-hand side is managed by the Linux kernel based on IRQ. In RTLinux-THIN, the μC/OS-II kernel is compiled as a part of the Linux kernel and works in kernel mode. Moreover, μC/OS-II is the scheduler for the whole system, and the Linux kernel is treated as the idle task with the lowest priority.

Fig. 5. The system structure of RTLinux-THIN by combining μC/OS-II and ARM Linux.

4.3 Implementation Details

Next, we introduce some key implementation issues, including interrupt setting and real-time scheduling.

4.3.1 Interrupt Setting

According to the ARM architecture, all exception routine entries, including IRQ, FIQ, and software interrupts, are put at a fixed virtual address by the function trap_init at Linux kernel startup. In Linux, FIQ is not used, and it is disabled when the Linux kernel is booted. Moreover, there is no code involving FIQ in the kernel. Therefore, the initialization procedure of IRQ for Linux does not need to change. For the FIQ setting, we need to set up its environment correctly and then enable it in the Linux booting code. In RTLinux-THIN, we set up the environment for FIQ as follows: 1) Set up the stack for FIQ; a private stack pointer is provided for FIQ, and it must be set correctly. 2) Initialize the ISR descriptor table for FIQ so the corresponding interrupt service routines can be called correctly; this must be done at startup and updated with real-time ISRs. 3) Enable FIQ. FIQ is set up and used only by the real-time kernel, without any effect on the non-RT Linux subsystem. In the setting, we ensure that FIQ can be triggered and return to the original status regardless of the current mode of the Linux subsystem.

4.3.2 Real-Time Scheduling

As mentioned above, in RTLinux-THIN, the μC/OS-II kernel is the scheduler for the whole system, and the Linux kernel is treated as the idle task with the lowest priority. Real-time scheduling is implemented based on the scheduler of μC/OS-II. The priority order is set as follows: real-time interrupts > real-time tasks > non-real-time interrupts > non-real-time tasks. Based on this priority order, a real-time task from the μC/OS-II kernel has higher priority than non-real-time interrupts. Therefore, scheduling happens at the end of the interrupt service routines of FIQ, as in traditional μC/OS-II. When a real-time task is running, the interrupts from IRQ are disabled. In this way, we can guarantee the real-time performance of real-time tasks.
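The FIQ-side dispatch and the end-of-ISR scheduling point described in Section 4.3 can be sketched as a toy model. All names here are hypothetical (this is not the μC/OS-II API); the point is the shape of the routine: a plain table dispatch followed by a scheduling decision.

```c
#define NUM_FIQ 8

typedef void (*rt_isr_t)(void);

/* ISR descriptor table for FIQ, initialized at startup and updated
 * with real-time ISRs (step 2 of the FIQ setup). */
static rt_isr_t fiq_isr_table[NUM_FIQ];
static int sched_calls;

/* Stand-in for the uC/OS-II scheduling hook invoked at ISR exit;
 * the real kernel would call its own interrupt-exit function here. */
static void os_int_exit(void)
{
    sched_calls++;
}

/* FIQ distribution routine: no RT/NRT separation and no controller
 * emulation, just a table dispatch, then a scheduling decision
 * before returning, as described in Section 4.3.2. */
static void fiq_entry(int irq)
{
    if (irq >= 0 && irq < NUM_FIQ && fiq_isr_table[irq])
        fiq_isr_table[irq]();
    os_int_exit();   /* may switch to a newly readied real-time task */
}
```

Because the routine contains no branching on interrupt class and no emulation state, it stays small, which is what makes locking it into a handful of cache lines feasible.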

5 EXPERIMENTS

We implement our RTLinux-THIN on a hardware platform based on the Intel PXA270 processor [39]. We conduct experiments with three real application programs, mplayer [40], Bonnie [41], and iperf [42], and compare statistical interrupt latency distributions under four system conditions: system idle, intensive memory accesses (decoding a fragment of MPEG-4 video in mplayer), intensive IDE disk accesses (evaluating the speed of the file system with a set of file-operation benchmarks in Bonnie), and intensive network communication (measuring TCP/UDP bandwidth performance in iperf). In this section, we present and analyze the experimental results. We first introduce the experimental environment, including the hardware platform and the operating systems we evaluate. Then, we present the performance


TABLE 1 The Real-Time Interrupt Distribution Routines of RTLinux-THIN and RTAI

metric and our measurement method. Finally, the experimental results are given and discussed.

5.1 Experimental Environment

5.1.1 Hardware Platform

The hardware platform we use is based on the Intel PXA270 processor [39], an XScale-based core implementing the ARMv5TE instruction set. On this platform, the processor runs at 520 MHz with 64 MB SDRAM and a 32 KB I-Cache, and the cache and eight-entry write buffer are enabled. The on-chip LCD controller is configured for VGA resolution with 16-bit color and a frame refresh frequency of about 70 Hz.

5.1.2 Operating Systems Evaluated

We implement RTLinux-THIN and port ARM Linux 2.6.9 and RTAI to this hardware platform. As shown in Sections 2 and 3, the interrupt distribution routines in a hybrid system have a huge influence on interrupt latency. In Table 1, we compare the main functions and code sizes of the real-time interrupt distribution routines in RTLinux-THIN and RTAI on this platform. From Table 1, we can see that in order to lock the real-time interrupt distribution routines into the cache on the Intel PXA270 processor, 6 and 46 cache lines are needed for RTLinux-THIN and RTAI, respectively. We implement a version of RTLinux-THIN that locks its real-time interrupt distribution routine into the cache, as it only takes six cache lines. The version with cache locking is called RTLinux-THINLock. In total, we conduct experiments on five systems: RTLinux-THINLock, RTLinux-THIN, RTAI, Linux (ARM Linux 2.6.9), and Linux with RT patch (ARM Linux 2.6.31). RTLinux-THINLock, RTLinux-THIN, RTAI, and ARM Linux are based on the preemptible Linux kernel 2.6.9; Linux 2.6.9 is selected as it is the newest version with an RTAI patch for ARM processors. To compare the performance with Linux with RT patch, we also port ARM Linux 2.6.31 with RT patch (2.6.31 being the latest Linux version at the time of writing) to our hardware platform.

5.2 Performance Metric and Measurement Method

The interrupt latency is the performance metric we use to evaluate the five systems.
For RTAI, RTLinux-THIN and RTLinux-THINLock, in which real-time tasks are executed,

we also measure interrupt task latency. Interrupt task latency is defined as the time interval from the point when an interrupt (associated with the real-time task scheduler of a system) is asserted to the point when a task is scheduled and starts to be executed. We use it to further evaluate the real-time performance of the hybrid systems. To obtain a complete performance evaluation, the statistical distribution of the interrupt latencies or interrupt task latencies over a time period is used in our experiments. We first record each latency and then calculate the statistical distribution within a time period. In the following, we show how to set up a hardware timer so that hardware interrupts can be generated periodically, and how to measure its latencies. We use a timer whose frequency is set to 3.25 MHz (one clock cycle is about 308 ns) to measure interrupt latency. Two registers in this timer, the OS Timer Count Register (OSCR) and the OS Timer Match Register (OSMR), are used for calculating the time interval from the point when an interrupt occurs to the point when its corresponding ISR is reached. OSCR is incremented in each clock cycle. When its value reaches the value we preset in OSMR, an interrupt is generated. In the ISR of this interrupt, we read the current value of OSCR into a register R. Then, the interrupt latency can be calculated as (R - OSMR) / 3.25 μs. After we obtain this latency, we record it into an array in memory, and then we set the value of OSMR to OSCR + 32,500 (32,500 ticks = 10 ms x 3.25 MHz) so as to generate the next interrupt in about 10 ms. The accuracy of this measurement method is within several tens of nanoseconds on the PXA270 processor (520 MHz); the only deviation is caused by the time of executing two instructions in the ISR (getting the address of OSCR, and loading the value of OSCR into a register). Interrupt task latencies are obtained similarly.
This interrupt is registered as a real-time interrupt by using rt_request_irq in RTAI and request_fiq in our RTLinux-THIN and RTLinux-THINLock. We implement a device driver for this timer to provide a user interface. Based on this, in a user program, we can start/stop the interrupt latency recording in the ISR of the timer, and read the experimental results into the user space. Interrupt latencies and interrupt task latencies are obtained under four system conditions with three real application programs: mplayer [40], Bonnie [41], and iperf [42]. The four system conditions are: system idle, intensive memory accesses by decoding a fragment of MPEG-4 video in mplayer, intensive IDE disk accesses by evaluating the speed of the file system with a set of file-operation benchmarks in Bonnie, and intensive network communication by measuring TCP/UDP bandwidth performance in iperf. For each operating system, under each system condition, a user testing program runs for 10 minutes to obtain about 60,000 interrupt latency samples. Each experiment is repeated 12 times, and all results are then used to generate the final statistical distribution. For interrupt task latencies, we further test two cases. In Case 1, there is only one real-time periodic task, whose period is 1 ms. In Case 2, there are two real-time periodic tasks, one with higher priority than the other. The period of the high-priority task is 1 ms and that of the low-priority task is 3 ms. In each case, we measure both the


Fig. 6. Latency distribution when the Linux subsystem is idle. (a) Interrupt latency (no real-time task). (b) Interrupt latency (one real-time task). (c) Interrupt task latency (one real-time task). (d) Interrupt latency (two real-time tasks). (e) Interrupt task latency (two real-time tasks).

interrupt latency and interrupt task latency independently. For Case 2 in which there are two tasks, the interrupt task latency is measured for the high-priority task.

5.3 Results and Discussion

Based on the above method, the statistical distributions of interrupt latencies and interrupt task latencies are obtained for the above four conditions, and the results are shown in Figs. 6, 7, 8, and 9, respectively, in which THIN represents RTLinux-THIN, THINLock represents RTLinux-THINLock, and RTLin represents Linux with RT patch. Each of Figs. 6, 7, 8, and 9 corresponds to one condition and includes five subfigures (labeled a, b, c, d, and e). The first subfigure (a) shows the statistical interrupt latency distributions for the five operating systems (RTLinux-THIN, RTLinux-THINLock, RTAI, Linux, and Linux with RT patch). The second and third subfigures (b and c) show the statistical interrupt latency and interrupt task latency distributions, respectively, when there is one real-time periodic task executed in RTLinux-THIN, RTLinux-THINLock, and RTAI. Similarly, the fourth and fifth subfigures (d and e) show the statistical interrupt latency and interrupt task latency distributions, respectively, when there are two real-time periodic tasks executed in RTLinux-THIN, RTLinux-THINLock, and RTAI; here, the interrupt task latency distributions are for the high-priority task. In each of the subfigures, there is a two-dimensional plot in which the X-axis denotes the time (unit: μs) on a logarithmic scale and the Y-axis denotes the percentage. On each plot, there are several curves, and each curve denotes one statistical distribution of latencies. For each point (x, y) on a curve, y denotes the percentage of the number of latencies

that are less than or equal to x to the total number of latencies. For example, on the curve for RTLinux-THINLock in Fig. 6a, the point (4.31, 100) denotes that 100 percent of the interrupt latencies are less than or equal to 4.31 μs. Next, we present and discuss the four figures for the four conditions, respectively.

5.3.1 System Idle

The five curves in Fig. 6a show the statistical interrupt latency distributions for the five operating systems when there is no real-time task. When the Linux subsystem is idle and there is no real-time task, the system performance is very predictable, and interrupt conflicts hardly ever happen. Based on (3) and (5), therefore, the execution times of the interrupt distribution routines (TD and TD' in the equations) play the most important role in interrupt latencies. Hence, for RTLinux-THIN, RTLinux-THINLock, RTAI, and Linux, we can see that over 98 percent of interrupt latencies reach the minimum interrupt latency. For the worst case, 7.69 and 4.31 μs are obtained by RTLinux-THIN and RTLinux-THINLock, respectively, while 12.31 μs is obtained by RTAI. Both the best- and worst-case interrupt latencies of Linux with RT patch (RTLin in the figure) are worse than those of the other systems, as its interrupt distribution routine is more complicated. Compared with RTAI, RTLinux-THIN provides the better worst-case real-time interrupt latency when the system is idle. This is mainly caused by the improvement in the interrupt distribution routines. To further evaluate the real-time performance of the hybrid systems, we provide the interrupt latency and interrupt task latency distributions for RTLinux-THIN,


Fig. 7. Latency distribution when a fragment of MPEG-4 video is decoded in mplayer [40]. (a) Interrupt latency (no real-time task). (b) Interrupt latency (one real-time task). (c) Interrupt task latency (one real-time task). (d) Interrupt latency (two real-time tasks). (e) Interrupt task latency (two real-time tasks).

Fig. 8. Latency distribution when the speed of the file system is evaluated with a set of file operating benchmarks in Bonnie [41]. (a) Interrupt latency (no real-time task). (b) Interrupt latency (one real-time task). (c) Interrupt task latency (one real-time task). (d) Interrupt latency (two real-time tasks). (e) Interrupt task latency (two real-time tasks).

RTLinux-THINLock, and RTAI, respectively, when there is one real-time task and there are two real-time tasks. When the interrupt latencies are measured (the results are shown

in Figs. 6b and 6d), the main function of the real-time task is to get into sleep by calling the sleep function. When the interrupt task latencies are measured (the results are shown


Fig. 9. Latency distribution when the TCP/UDP bandwidth performance is measured by iperf [42]. (a) Interrupt latency (no real-time task). (b) Interrupt latency (one real-time task). (c) Interrupt task latency (one real-time task). (d) Interrupt latency (two real-time tasks). (e) Interrupt task latency (two real-time tasks).

in Figs. 6c and 6e), the main function of the real-time task is to set up the register to record the time as described above and then go to sleep by calling the sleep function. Based on (3) and (5), when real-time tasks exist, we may need to wait until the system exits from a critical section in a real-time task, so TC and TC' in the equations may be introduced. For the interrupt latencies in Figs. 6b and 6d, we observe that the minimum interrupt latencies obtained for all three systems are still the same (1.85 μs) as in the case when there is no real-time task. Moreover, most interrupt latencies reach the minimum: about 98 percent and 91 percent of interrupt latencies reach the minimum for the cases with one real-time task and two real-time tasks, respectively. The reason is that the execution times of the real-time tasks are very small, so in most cases an interrupt will not occur at the same time that a task is scheduled to be executed (in the critical section). However, in the case with two real-time tasks, by introducing one more real-time task, this possibility is increased, so the percentage reaching the minimum interrupt latency drops slightly in Fig. 6d compared with Fig. 6b. The worst-case interrupt latencies obtained are the same for RTLinux-THIN and RTLinux-THINLock (8 μs). One cause is that data TLB misses may occur when the registers are saved to the stack in the real-time interrupt distribution routine. Compared with RTAI, RTLinux-THIN still provides the better worst-case real-time interrupt latencies in both cases. For the interrupt task latencies, the results are shown in Figs. 6c and 6e. Here, the influences of the real-time

schedulers are added into the latencies. When there is one real-time task, comparing the minimum interrupt latency in Fig. 6b with the minimum interrupt task latency in Fig. 6c, the latency increases from 1.85 to 3.69 μs for RTLinux-THIN and RTLinux-THINLock, and from 1.85 to 4.92 μs for RTAI. This shows that the best-case execution time of the scheduler in μC/OS-II is slightly shorter than that of the scheduler in RTAI. In Fig. 6c, for RTLinux-THIN and RTLinux-THINLock, over 95 percent of interrupt task latencies are no more than 4 μs, while RTAI needs 5 μs to achieve this. In Figs. 6c and 6e, we can see that in terms of the worst-case interrupt task latency, RTLinux-THINLock is slightly better than RTLinux-THIN, and both are better than RTAI. This is mainly caused by cache misses.

5.3.2 Intensive Memory Accesses in mplayer

Fig. 7a shows the interrupt latency distributions when a fragment of MPEG-4 video is decoded in RAM by mplayer [40] and there is no real-time task. For RTAI and Linux, because the video is stored in RAM, interrupt latencies are mainly influenced by two factors: cache misses and system calls. Based on (3) and (5), besides the difference between TD and TD' (the execution times of the interrupt distribution routines), for RTAI, TT in (3), the delay caused by the trap instruction from system calls, also introduces extra waiting time, while this is removed in RTLinux-THIN with the two-level interrupt scheme. With these influences, for Linux and Linux with RT patch, the worst-case interrupt latencies are 67.38 and 156.31 μs, respectively. The main reason that Linux shows better results than Linux with RT patch is that the interrupt distribution routine of Linux with RT patch is more


complicated. For RTAI, the worst-case real-time interrupt latency is 42.77 μs. With our two-level interrupt scheme, RTLinux-THIN shows better performance with 12.00 μs by completely avoiding the influence of system calls from the Linux kernel, and RTLinux-THINLock gives the best performance with 6.15 μs. From Fig. 7a, because of the cache misses introduced by running mplayer, we can also see that the latencies within which 90 percent of the interrupt latencies fall are increased compared with the condition when the system is idle. For example, for RTLinux-THIN and RTAI, when the system is idle, these latencies are 1.85 μs as shown in Fig. 6a, while they are about 9 and 26 μs, respectively, in Fig. 7a. Figs. 7b, 7c, 7d, and 7e show the results for RTAI, RTLinux-THIN, and RTLinux-THINLock when there is one real-time task (Figs. 7b and 7c) and when there are two real-time tasks (Figs. 7d and 7e), respectively. Here, the interrupt latencies are shown in Figs. 7b and 7d, while the interrupt task latencies are shown in Figs. 7c and 7e. The results show similar trends to those in Section 5.3.1 when the system is idle, with several differences caused by running mplayer. First, although the minimum latencies are almost unchanged, the percentages of latencies achieving them are greatly reduced. Second, the latencies within which about 90 percent of interrupt latencies fall are increased. Third, the worst-case latencies are increased as well. These differences are mainly caused by cache misses.

Fig. 10. The Ethernet throughput bandwidth for RTLinux-THIN, RTLinux-THINLock, RTAI, and Linux when the TCP/UDP bandwidth performance is measured by iperf [42].

5.3.3 Intensive IDE Disk Accesses in Bonnie

Fig. 8a shows the interrupt latency distributions for RTLinux-THIN, RTLinux-THINLock, RTAI, Linux, and Linux with RT patch when the speed of the file system is evaluated with a set of file-operation benchmarks in Bonnie. Since the device driver for the IDE disk on our platform does not use DMA (the board does not support DMA), interrupts may be disabled for a very long time in the driver. This has a huge impact on the worst-case interrupt latency of Linux, which is 11,353 μs. It is greatly reduced (to 147.08 μs) by Linux with RT patch, in which interrupt handlers are converted into fully preemptible kernel threads. In RTAI, this problem is solved using the virtual interrupt controller; therefore, RTAI shows a huge improvement over Linux with 68 μs. Our RTLinux-THIN and RTLinux-THINLock provide further improvements with 9.54 and 6.15 μs, respectively. Compared with RTAI, the better worst-case real-time interrupt latency is again produced by RTLinux-THIN. With the intensive IDE disk accesses in Bonnie, there are many system calls and interrupts from file I/O operations. Based on (3), in RTAI, the worst-case real-time interrupt latency is mainly caused by three components: TD (the execution time of the interrupt distribution routine), TT (the delay caused by the trap instruction from system calls), and interrupt conflicts. An interrupt conflict means that a real-time interrupt may occur while the system is in the middle of processing an interrupt caused by file I/O operations; in this case, the real-time interrupt can only be handled after the current low-priority interrupt has been processed. Based on (5), in RTLinux-THIN, both TT and the waiting time caused by interrupt conflicts are removed, as we have separate interrupt entries for real-time and non-real-time interrupts.

Figs. 8b, 8c, 8d, and 8e show the results for RTAI, RTLinux-THIN, and RTLinux-THINLock when there is one real-time task (Figs. 8b and 8c) and when there are two real-time tasks (Figs. 8d and 8e), respectively. From the figures, similar trends to those in Section 5.3.2 can be observed.

5.3.4 Intensive Network Communication in iperf

Fig. 9a shows the interrupt latency distributions for RTLinux-THIN, RTLinux-THINLock, RTAI, Linux, and Linux with RT patch when the TCP/UDP bandwidth performance is measured by iperf. When iperf is running, the interrupt handling code is frequently invoked. As a result, the worst-case interrupt latencies of Linux and Linux with RT patch are 140.62 and 165.23 μs, respectively. For RTAI, the worst-case real-time interrupt latency is 23.08 μs, which is mainly caused by its interrupt distribution routine. Our RTLinux-THIN improves this to 12.92 μs, since its distribution routine is simpler and runs faster. RTLinux-THINLock further improves this to 9.23 μs by reducing cache misses with cache locking. Compared with RTAI, RTLinux-THIN provides better worst-case real-time interrupt latencies when iperf is executed. Based on (3) and (5), besides the difference between TD and TD' (the execution times of the interrupt distribution routines), for RTAI, the extra waiting time caused by interrupt conflicts is another factor contributing to the difference in the worst-case real-time interrupt latencies between RTAI and RTLinux-THIN. Figs. 9b, 9c, 9d, and 9e show the results for RTAI, RTLinux-THIN, and RTLinux-THINLock when there is one real-time task (Figs. 9b and 9c) and when there are two real-time tasks (Figs. 9d and 9e), respectively. From the figures, similar trends to those in Section 5.3.2 can be observed.

5.3.5 Performance Influence on the Linux Subsystem

In the above, we compared real-time interrupt latencies for the systems under the four conditions. Next, we use the Ethernet throughput bandwidth obtained from iperf to show the performance influence on the Linux subsystem in RTAI, RTLinux-THIN, and RTLinux-THINLock. Fig. 10 shows the Ethernet throughput bandwidth for RTLinux-THIN, RTLinux-THINLock, RTAI, and Linux when the network performance is measured by iperf. In the plot in Fig. 10, the X-axis denotes the throughput bandwidth (unit: Mbit/s) and the Y-axis denotes the percentage. For each point (x, y) on a curve, y denotes the percentage of the Ethernet throughput bandwidth samples that are equal to x. The data for the throughput bandwidth of Linux are used as the baseline for comparison. Compared with RTAI, both RTLinux-THIN and RTLinux-THINLock cause less performance overhead for the Linux subsystem. In RTAI and RTLinux-THIN, as the Linux subsystem runs as the idle task, iperf, which runs in the Linux subsystem, will be preempted by real-time interrupts. Therefore, in our experiments, the performance of iperf is mainly influenced by the efficiency of the interrupt distribution routines. With our two-level interrupt scheme, the interrupt distribution routine of RTLinux-THIN is simpler and more efficient than that of RTAI with the software emulation method. This explains why RTLinux-THIN introduces less overhead than RTAI for the Linux subsystem. It is interesting to note that RTLinux-THINLock causes more performance overhead than RTLinux-THIN. The main reason is that for RTLinux-THINLock, the cache lines locked for the interrupt distribution routine cannot be used by the Linux subsystem, so more overhead is incurred.

6 RELATED WORK

In this section, we present the related work in terms of open system architecture, hybrid operating systems, interrupt-related WCET analysis, and cache-related WCET analysis. The open system architecture, first proposed by Deng et al. [8], [9], [10], allows real-time applications to run concurrently with non-real-time applications, where the schedulability of each real-time application can be guaranteed regardless of the behaviors of other applications. To cater for the scheduling requirements of open real-time systems, hierarchical scheduling has been developed to provide temporal partitioning and spatial isolation among applications. Over the years, there has been a lot of research work on hierarchical scheduling for real-time systems [8], [9], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. A hybrid operating system is a combination of a real-time OS and a general-purpose OS with different characteristics, built in order to run real-time applications together with non-real-time applications. The advantage of a hybrid OS is that it can utilize the existing resources and properties of both operating systems. RTAI [3], [43] and RTLinux [2], two famous hybrid operating systems based on Linux, are implemented to support low latency and predictable execution of kernel-level real-time tasks with one-level hardware interrupts. In RTAI and RTLinux, a virtual interrupt controller (VIC), a software emulation method, is used to separate real-time and non-real-time interrupts [43]. However, this approach causes the problems discussed in this paper. West and Parmer [45], [46], [47] illustrated how to build a predictable execution environment in user space by using commonly available hardware protection techniques. In addition, their work solved the problem of how to add safe, predictable, and efficient application-specific services to commodity off-the-shelf (COTS) systems, so that they can be tailored to the real-time requirements of target applications.
Interrupts are events that can be either synchronous or asynchronous and that incur both hardware and software overhead. They

are generated by both hardware devices and program conditions. Various strategies for reducing interrupt overheads have been studied [18], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35]. Mehnert et al. [26], [27] determined the cost of introducing address-space protection to RTLinux [2]. By exploring the dependency between device interrupts and processes, the scheme in [22] showed how to properly account for interrupt handling and how to schedule deferrable interrupt servicing so that predictable task execution is guaranteed. Regehr and Duongsaa [32] quantified the worst-case delays incurred by low-priority interrupts and non-interrupt work, and developed two software-based mechanisms for protecting embedded systems against interrupt overload. A delayed locking technique [29] has been proposed for improving the real-time performance of embedded Linux. Tindell et al. [49] developed a window-based technique to find the worst-case response time of tasks with arbitrary deadlines. Modern microprocessors integrate cache memories into their memory hierarchies to increase system performance. However, in real-time systems, caches introduce behaviors that are hard to predict. Various techniques for improving the predictability of caches have been investigated in previous work [36], [37], [50], [51], [52], [53], [54], [55]. Arnaud and Puaut [37], [50] proposed static and dynamic cache locking schemes to make memory access times and cache-related preemption delays entirely predictable. Kirner and Schoeberl [53] introduced an analysis technique to statically calculate the WCET of systems with a function cache, a special type of instruction cache that stores only whole functions.

7 CONCLUSION AND FUTURE WORK

In this paper, we proposed to implement hybrid systems with two-level hardware interrupts. We first investigated interrupt processing in hybrid systems and analyzed the worst-case real-time interrupt latency of RTAI. Then, we discussed the key issues in implementing a hybrid system based on two-level hardware interrupts. Based on this scheme, we implemented a hybrid system called RTLinux-THIN on the ARM architecture by combining the ARM Linux kernel 2.6.9 and μC/OS-II, on a hardware platform based on the Intel PXA270 processor [39]. We conducted experiments with three real application programs, mplayer [40], Bonnie [41], and iperf [42], and compared statistical interrupt latency and interrupt task latency distributions under four system conditions (idle, intensive memory accesses, intensive IDE disk accesses, and intensive network communication) for RTLinux-THIN (with and without cache locking), RTAI, Linux, and Linux with RT patch. The experimental results show that our scheme improves real-time interrupt latencies. Note that our scheme also improves the performance of the time-sharing subsystem by introducing less performance overhead compared with the original schemes with one-level hardware interrupts. There are several directions for future work. First, although RTLinux-THIN can improve the worst-case real-time interrupt latency, we found that the worst-case latencies still vary a little in different situations. How to further


reduce this variation and provide better predictability is one interesting topic. Second, we have been applying RTLinux-THIN in various industrial applications such as robots and Computerized Numerical Control (CNC) systems. We found that it is not easy to directly map an application to the system implementation if users are not familiar with the Linux kernel or μC/OS-II. Therefore, a more user-friendly development platform needs to be developed so as to reduce the technical difficulties in application program design. The work on supporting predictable execution environments in user space with hardware protection in [45], [46], [47], [48] provides a promising direction to solve this problem; in the future, we will combine our work with these techniques. Third, the system is implemented based on the ARM Linux kernel 2.6.9 and μC/OS-II; in the future, we will extend it to combine other kernels as well. Moreover, for processors such as the Intel x86, there is only one interrupt entry, so our method cannot be directly applied to them. A special interrupt controller that can separate real-time and non-real-time interrupts is needed in order to apply our technique to such systems; how to design such interrupt controllers is one of the future topics we will study. Finally, multicore architectures have been widely used in both general-purpose and embedded computing systems. Currently, we are working on extending our technique to multicore processors.


ACKNOWLEDGMENTS
The work described in this paper is partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (GRF PolyU 5260/07E), The Hong Kong Polytechnic University (HK PolyU 1-ZV5S), the National 863 Program of China (2008AA042801, 2008AA042803, and 2009AA043901), and the Major National S&T Program of China (2009ZX04013-012). A preliminary version of this work appeared in the Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007) [1]. This work was done when Miao Liu was a Research Assistant in the Department of Computing at The Hong Kong Polytechnic University. RTLinux-THIN can be downloaded from http://www4.comp.polyu.edu.hk/~cszlshao/software/RT-THIN.



REFERENCES
[1] M. Liu, Z. Shao, M. Wang, H. Wei, and T. Wang, "Implementing Hybrid Operating Systems with Two-Level Hardware Interrupts," Proc. 28th IEEE Int'l Real-Time Systems Symp. (RTSS '07), pp. 244-253, 2007.
[2] V. Yodaiken and M. Barabanov, "Real-Time Linux," Proc. Applications Development and Deployment Conf. (USELINUX), Jan. 1997.
[3] P. Mantegazza, E.L. Dozio, and S. Papacharalambous, "RTAI: Real Time Application Interface," Linux J., vol. 2000, no. 72es, p. 10, 2000.
[4] S. Oikawa and R. Rajkumar, "Linux/RK: A Portable Resource Kernel in Linux," Proc. IEEE Real-Time Systems Symp., 1998.
[5] QLinux, http://www.cs.umass.edu/lass/software/qlinux/, 2002.
[6] Y.-C. Wang and K.-J. Lin, "Implementing a General Real-Time Framework in the Red-Linux Real-Time Kernel," Proc. 20th IEEE Real-Time Systems Symp. (RTSS '99), pp. 246-255, 1999.
[7] B. Srinivasan, S. Pather, R. Hill, F. Ansari, and D. Niehaus, "A Firm Real-Time System Implementation Using Commercial Off-the-Shelf Hardware and Free Software," Proc. Fourth IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '98), p. 112, 1998.
[8] Z. Deng, J. Liu, and J. Sun, "A Scheme for Scheduling Hard Real-Time Applications in Open System Environment," Proc. Ninth Euromicro Workshop on Real-Time Systems, pp. 191-199, June 1997.
[9] Z. Deng and J.W.-S. Liu, "Scheduling Real-Time Applications in an Open Environment," Proc. 18th IEEE Real-Time Systems Symp. (RTSS '97), pp. 308-319, 1997.
[10] Z. Deng, J.W.-S. Liu, L. Zhang, S. Mouna, and A. Frei, "An Open Environment for Real-Time Applications," Real-Time Systems, vol. 16, nos. 2/3, pp. 155-185, 1999.
[11] R.I. Davis and A. Burns, "Resource Sharing in Hierarchical Fixed Priority Pre-Emptive Systems," Proc. 27th IEEE Int'l Real-Time Systems Symp. (RTSS '06), pp. 257-270, 2006.
[12] X.A. Feng and A.K. Mok, "A Model of Hierarchical Real-Time Virtual Resources," Proc. 23rd IEEE Real-Time Systems Symp. (RTSS '02), pp. 26-35, 2002.
[13] I. Lee and I. Shin, "Periodic Resource Model for Compositional Real-Time Guarantees," Proc. 24th IEEE Int'l Real-Time Systems Symp. (RTSS '03), pp. 2-13, 2003.
[14] A.K. Mok, X.A. Feng, and D. Chen, "Resource Partition for Real-Time Systems," Proc. Seventh IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '01), pp. 75-84, 2001.
[15] T.-W. Kuo and C.-H. Li, "A Fixed-Priority-Driven Open Environment for Real-Time Applications," Proc. 20th IEEE Real-Time Systems Symp. (RTSS '99), pp. 256-267, 1999.
[16] G. Lipari and S.K. Baruah, "Efficient Scheduling of Real-Time Multi-Task Applications in Dynamic Systems," Proc. Sixth IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '00), p. 166, 2000.
[17] G. Lipari, E. Bini, and G. Fohler, "A Framework for Composing Real-Time Schedulers," Proc. Int'l Workshop on Test and Analysis of Component-Based Systems (ETAPS '03), pp. 133-146, 2003.
[18] J. Regehr, A. Reid, K. Webb, M. Parker, and J. Lepreau, "Evolving Real-Time Systems Using Hierarchical Scheduling and Concurrency Analysis," Proc. 24th IEEE Int'l Real-Time Systems Symp. (RTSS '03), pp. 25-36, 2003.
[19] J. Regehr and J.A. Stankovic, "HLS: A Framework for Composing Soft Real-Time Schedulers," Proc. 22nd IEEE Real-Time Systems Symp. (RTSS '01), pp. 3-14, 2001.
[20] J.L. Lorente, G. Lipari, and E. Bini, "A Hierarchical Scheduling Model for Component-Based Real-Time Systems," Proc. 20th IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '06), pp. 25-36, 2006.
[21] T.-W. Kuo, K.-J. Lin, and Y.-C. Wang, "An Open Real-Time Environment for Parallel and Distributed Systems," Proc. 20th Int'l Conf. Distributed Computing Systems (ICDCS '00), pp. 206-213, 2000.
[22] Y. Zhang and R. West, "Process-Aware Interrupt Scheduling and Accounting," Proc. 27th IEEE Int'l Real-Time Systems Symp. (RTSS '06), pp. 191-201, 2006.
[23] L. Abeni, A. Goel, C. Krasic, J. Snow, and J. Walpole, "A Measurement-Based Analysis of the Real-Time Performance of Linux," Proc. Eighth IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '02), pp. 133-142, Sept. 2002.
[24] S.A. Banachowski and S.A. Brandt, "Better Real-Time Response for Time-Share Scheduling," Proc. 17th Int'l Symp. Parallel and Distributed Processing (IPDPS '03), p. 124, Apr. 2003.
[25] H. Kopetz, "The Time-Triggered Model of Computation," Proc. 19th IEEE Real-Time Systems Symp. (RTSS '98), pp. 168-177, Dec. 1998.
[26] F. Mehnert, M. Hohmuth, and H. Härtig, "Cost and Benefit of Separate Address Spaces in Real-Time Operating Systems," Proc. 23rd IEEE Real-Time Systems Symp. (RTSS '02), pp. 124-133, 2002.
[27] F. Mehnert, M. Hohmuth, S. Schönberg, and H. Härtig, "RTLinux with Address Spaces," Proc. Third Real-Time Linux Workshop, Nov. 2001.
[28] C. Dovrolis, B. Thayer, and P. Ramanathan, "HIP: Hybrid Interrupt-Polling for the Network Interface," ACM SIGOPS Operating Systems Rev., vol. 35, no. 4, pp. 50-60, 2001.
[29] J. Lee and K.-H. Park, "Delayed Locking Technique for Improving Real-Time Performance of Embedded Linux by Prediction of Timer Interrupt," Proc. 11th IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '05), pp. 487-496, 2005.
[30] T. Facchinetti, G.C. Buttazzo, M. Marinoni, and G. Guidi, "Non-Preemptive Interrupt Scheduling for Safe Reuse of Legacy Drivers in Real-Time Systems," Proc. 17th Euromicro Conf. Real-Time Systems (ECRTS '05), pp. 98-105, July 2005.
[31] J.C. Mogul and K.K. Ramakrishnan, "Eliminating Receive Livelock in an Interrupt-Driven Kernel," ACM Trans. Computer Systems, vol. 15, no. 3, pp. 217-252, 1997.


[32] J. Regehr and U. Duongsaa, "Preventing Interrupt Overload," Proc. ACM SIGPLAN/SIGBED Conf. Languages, Compilers, and Tools for Embedded Systems (LCTES '05), pp. 50-58, 2005.
[33] K. Sandstrom, C. Eriksson, and G. Fohler, "Handling Interrupts with Static Scheduling in an Automotive Vehicle Control System," Proc. Fifth Int'l Workshop Real-Time Computing Systems and Applications (RTCSA '98), pp. 158-165, 1998.
[34] D.B. Stewart and G. Arora, "A Tool for Analyzing and Fine Tuning the Real-Time Properties of an Embedded System," IEEE Trans. Software Eng., vol. 29, no. 4, pp. 311-326, Apr. 2003.
[35] J. Yang, Y. Chen, H. Wang, and B. Wang, "A Linux Kernel with Fixed Interrupt Latency for Embedded Real-Time System," Proc. Second Int'l Conf. Embedded Software and Systems (ICESS '05), pp. 127-134, 2005.
[36] K.W. Batcher and R.A. Walker, "Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache," Proc. 12th IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '06), pp. 91-102, 2006.
[37] A. Arnaud and I. Puaut, "Dynamic Instruction Cache Locking in Hard Real-Time Systems," Proc. 14th Int'l Conf. Real-Time and Network Systems (RTNS '06), May 2006.
[38] D. Seal, ARM Architecture Reference Manual, second ed., Addison-Wesley, Nov. 2000.
[39] Intel Inc., Intel PXA27x Processor Family Developer's Manual, Jan. 2006.
[40] mplayer, http://www.mplayerhq.hu/, 2009.
[41] Bonnie, http://www.garloff.de/kurt/linux/bonnie/, 1996.
[42] iperf, http://dast.nlanr.net/projects/iperf, 2005.
[43] K. Yaghmour, "Adaptive Domain Environment for Operating Systems," June 2002.
[44] J.J. Labrosse, MicroC/OS-II: The Real-Time Kernel, second ed., CMP Books, Apr. 2002.
[45] R. West and G. Parmer, "Application-Specific Service Technologies for Commodity Operating Systems in Real-Time Environments," Proc. 12th IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '06), pp. 3-13, 2006.
[46] G. Parmer and R. West, "Hijack: Taking Control of COTS Systems for Real-Time User-Level Services," Proc. 13th IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS '07), pp. 133-146, 2007.
[47] G. Parmer and R. West, "Predictable Interrupt Management and Scheduling in the Composite Component-Based System," Proc. Real-Time Systems Symp. (RTSS '08), pp. 232-243, 2008.
[48] G. Parmer and R. West, "Mutable Protection Domains: Towards a Component-Based System for Dependable and Predictable Computing," Proc. 28th IEEE Int'l Real-Time Systems Symp. (RTSS '07), pp. 365-378, 2007.
[49] K.W. Tindell, A. Burns, and A.J. Wellings, "An Extendible Approach for Analyzing Fixed Priority Hard Real-Time Tasks," Real-Time Systems, vol. 6, no. 2, pp. 133-151, 1994.
[50] A. Arnaud and I. Puaut, "Towards a Predictable and High Performance Use of Instruction Caches in Hard Real-Time Systems," Proc. Work-in-Progress Session of the 15th Euromicro Conf. Real-Time Systems (ECRTS '03), pp. 61-64, July 2003.
[51] A.M. Campoy, A. Perles, F. Rodriguez, and J.V. Busquets-Mataix, "Static Use of Locking Caches vs. Dynamic Use of Locking Caches for Real-Time Systems," Proc. Canadian Conf. Electrical and Computer Eng. (CCECE '03), vol. 2, pp. 1283-1286, 2003.
[52] M. Campoy, A. Ivars, and J. Busquets-Mataix, "Static Use of Locking Caches in Multitask Preemptive Real-Time Systems," Proc. IEEE Real-Time Embedded System Workshop (Satellite of the IEEE Real-Time Systems Symp.), 2001.
[53] R. Kirner and M. Schoeberl, "Modeling the Function Cache for Worst-Case Execution Time Analysis," Proc. 44th Ann. Design Automation Conf. (DAC '07), pp. 471-476, 2007.
[54] I. Puaut and D. Decotigny, "Low-Complexity Algorithms for Static Cache Locking in Multitasking Hard Real-Time Systems," Proc. 23rd IEEE Real-Time Systems Symp. (RTSS '02), pp. 114-123, 2002.
[55] X. Vera, B. Lisper, and J. Xue, "Data Caches in Multitasking Hard Real-Time Systems," Proc. 24th IEEE Int'l Real-Time Systems Symp. (RTSS '03), pp. 154-165, 2003.

Miao Liu received the BE degree in mechanical engineering and automation and the PhD degree in electronic mechanics from Beihang University, Beijing, China, in 2002 and 2010, respectively. He has been a postdoctoral fellow in the Department of Computer Science, Beihang University, Beijing, China, since 2010. His research interests include embedded systems, electromechanical control systems, and modular robotics.

Duo Liu received the BE degree in computer science from the Southwest University of Science and Technology, Sichuan, China, in 2003 and the ME degree from the Department of Computer Science, University of Science and Technology of China, in 2006. He is currently a PhD candidate in the Department of Computing at the Hong Kong Polytechnic University. His research interests include embedded systems and high-performance computing for multicore processors.

Yi Wang received the BE and ME degrees in electrical engineering from Harbin Institute of Technology, China, in 2005 and 2008, respectively. He is currently a PhD candidate in the Department of Computing at the Hong Kong Polytechnic University. His research interests include embedded systems and real-time scheduling for multicore systems.

Meng Wang received the BE and MS degrees in computer science from Xidian University, Xi'an, China, in 2003 and 2006, respectively. He has been a PhD candidate with the Department of Computing, Hong Kong Polytechnic University, since 2006. His research interests include embedded systems, compiler optimization, and real-time systems.

Zili Shao received the BE degree in electronic mechanics from the University of Electronic Science and Technology of China, Sichuan, in 1995, and the MS and PhD degrees from the Department of Computer Science, University of Texas at Dallas, in 2003 and 2005, respectively. He has been an assistant professor with the Department of Computing, Hong Kong Polytechnic University, since 2005. His research interests include embedded systems, real-time systems, compiler optimization, and hardware/software codesign. He is a member of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
