
Scheduling Policy Optimization in Kernel-based Virtual Machine

ABSTRACT
Virtualization in computing is the creation of a virtual version of something, such as a hardware platform, an operating system, a storage device, or network resources. Scheduling refers to the way processes are assigned to run on the available CPUs, since there are typically many more runnable processes than available CPUs. This assignment is carried out by software known as a scheduler. The scheduler is concerned mainly with: throughput, the number of processes that complete their execution per unit time; latency, the delay in processing; turnaround time, the total time between submission of a process and its completion; response time, the time from when a request is submitted until the first response is produced; and fairness/waiting time, giving equal CPU time to each process.

The scheduling policy of the host operating system affects the performance of the Kernel-based Virtual Machine (KVM). The KVM driver is added to the Linux kernel, making the Linux kernel act as a virtual machine monitor. By adding virtualization capabilities to a standard Linux kernel, all the fine-tuning work that has gone into the kernel is carried over into the virtualized environment. In this model, every virtual machine is a regular Linux process scheduled by the standard Linux scheduler, and its memory is allocated by the Linux memory allocator. Virtualization has led to the creation of hypervisors. A hypervisor, also called a virtual machine manager (VMM), is a hardware virtualization technique that allows multiple operating systems, termed guests, to run concurrently on a host computer. The hypervisor presents the guest operating systems with a virtual operating platform and manages their execution. In this report an optimized scheduling policy is proposed to improve the performance of KVM. In the first stage, a special process queue for virtual machines is added to the host operating system and is scheduled before normal processes. Each virtual machine is given the same time slices so that their load is balanced. Next, the virtual machine process queue is periodically sorted by the remaining time slices of each virtual machine, and a special state is added to identify I/O-intensive virtual machines, ensuring that an I/O-intensive virtual machine is treated specially and receives an earlier scheduling opportunity. Finally, an experiment in a KVM environment is designed and executed to show the effectiveness of this optimized scheduling policy. In a KVM environment, the scheduler within the host OS plays a key role in determining the overall fairness and performance characteristics of the whole virtualized system.

KEYWORDS: Kernel-based virtual machine, scheduling policy, host operating system performance, response latency, native virtualization, hypervisor, QEMU process, KVM driver, virtualization.


1. INTRODUCTION
Virtualization has gained widespread use in cloud computing, server consolidation, and information security for its many benefits, such as flexibility, isolation, high resource utilization, easy IT infrastructure management, and power saving. In virtualization systems, resource virtualization of the underlying hardware and concurrent execution of virtual machines are in the charge of software called a virtual machine monitor (VMM) or hypervisor. By presenting the same view of underlying hardware and platform APIs from different vendors, the virtual machine monitor enables virtual machines to run on any available computer. This not only eases the numerous applications of desktop computers but also reduces the hardware cost of distributed environments. However, these benefits are not free: the existence of the virtual machine monitor layer degrades the performance of some specific operations. As one of the core components, the virtual machine monitor affects the performance of virtualization systems to a great extent, so it is important to measure and analyze the performance of virtual machine monitors.

Kernel-based Virtual Machine (KVM) is a virtualization infrastructure for the Linux kernel. KVM supports native virtualization on processors with hardware virtualization extensions. Native virtualization is a platform virtualization approach that enables efficient full virtualization with help from hardware capabilities, primarily the host processors. Full virtualization simulates a complete hardware environment, or virtual machine, in which an unmodified guest operating system (using the same instruction set as the host machine) executes in complete isolation. Full virtualization requires that every salient feature of the hardware be reflected into each virtual machine, including the full instruction set, input/output operations, interrupts, memory access, and whatever other elements are used by the software that runs on the bare machine and is intended to run in a virtual machine. In such an environment, any software capable of execution on the raw hardware can be run in the virtual machine, in particular any operating system. The obvious test of virtualization is whether an operating system intended for stand-alone use can successfully run inside a virtual machine.

The kernel-based virtual machine is a virtualization solution based on the Linux kernel, and it needs x86 hardware virtualization support. KVM has two components: the KVM driver and the QEMU process. The KVM driver is a loadable module that is part of the Linux kernel and provides the core virtualization infrastructure, including the virtual CPU (VCPU) and virtual memory for a virtual machine (VM). The other component is a lightly modified QEMU process, which simulates PC hardware components in user space and provides an I/O device model for the virtual machine. In conjunction with CPU emulation, it also provides a set of device models, allowing it to run a variety of unmodified guest operating systems; it can thus be viewed as a hosted virtual machine monitor. It also provides an accelerated mode supporting a mixture of binary translation (for kernel code) and native execution (for user code), in the same fashion as VMware Workstation and VirtualBox.
QEMU can also be used purely for CPU emulation of user-level processes, allowing applications compiled for one architecture to run on another. Kernel modules expose the virtualization of hardware resources through the /dev/kvm device. Through /dev/kvm, each guest operating system has its own address space, allocated by the Linux scheduler. The physical memory mapped for each guest operating system is actually the virtual memory of its corresponding process. A set of shadow page tables is maintained to support the translation from guest physical addresses to host physical addresses. User space takes charge of I/O virtualization by employing a lightly modified QEMU to simulate the behavior of I/O, or, when necessary, by triggering the real I/O. KVM also provides a mechanism for user space to inject interrupts into the guest operating system. Any I/O request of the guest operating system is trapped into user space and simulated by QEMU. In the kernel-based virtual machine, each virtual machine is a standard process in the Linux operating system, and the Linux kernel is used to schedule virtual machines.

1.1 MOTIVATION
Linux treats each virtual machine as a normal process; hence virtual machines have the same states as normal processes and enjoy the benefits of Linux features. However, when a virtual machine is running a high-priority application and a normal process arrives whose priority is higher than the virtual machine's but lower than that of the application inside it, the Linux scheduler, unaware of the situation, forces the virtual machine to give up the processor and schedules the newly arrived normal process. As a result, the high-priority application in the virtual machine is not executed, and an unexpected process switch occurs, increasing the virtual machine switching overhead. Because the Linux kernel does not treat each virtual machine fairly, when the kernel-based virtual machine runs a network monitoring application, it cannot guarantee that each virtual machine receives the same network packets or a balanced workload, and therefore cannot meet the basic requirement of network monitoring. If the kernel-based virtual machine is a system virtual machine in the host operating system, the Linux scheduler is not good enough to meet the requirement. To resolve this problem, an optimized scheduling policy is proposed.

1.1.1 VISION
To address the drawbacks of the Linux scheduling policy towards KVM, an optimized scheduling policy is proposed to improve the performance of KVM.

1.1.2 MISSION
Improve the efficiency of scheduling in the kernel-based virtual machine. Improve the response latency of I/O-intensive virtual machines.

1.1.3 OBJECTIVES
1. A process queue for scheduling KVM processes is added to the host scheduler, and a VM in this queue has two states, HAVE and OVER, depending on whether it has time slices remaining. VMs in the HAVE state always run before those in the OVER state, in first-in, first-out (FIFO) order, and the KVM process queue has higher priority than the normal process queue, which prevents a running VM from being preempted by a normal process.

2. To improve the response latency of I/O-intensive virtual machines, another KVM state is added: URGENT. When the sleep_avg value of a virtual machine process is high enough, its state is set to URGENT, which marks it as an I/O-intensive VM that needs higher priority than other virtual machines. Besides sorting the KVM process queue periodically according to remaining time slices, the I/O-intensive VM is also scheduled more frequently. A minimal sketch of the per-VM bookkeeping these two objectives imply is given below.
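
The following C sketch shows one plausible shape of that bookkeeping; the type and field names are illustrative assumptions, not identifiers taken from the report.

    /* Per-VM scheduling state implied by the objectives above (illustrative). */
    enum vm_state {
        VM_HAVE,    /* time slices remaining: runs first, FIFO within state */
        VM_OVER,    /* slices exhausted: runs only when no HAVE VM is ready */
        VM_URGENT,  /* I/O-intensive (high sleep_avg): highest priority     */
    };

    struct vm_proc {
        int             time_slices;   /* remaining slices in this round       */
        unsigned long   sleep_avg;     /* average sleep time, from task_struct */
        enum vm_state   state;
        struct vm_proc *next;          /* FIFO link in the VM process queue    */
    };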

1.2 LITERATURE SURVEY


Kernel-based Virtual Machine (KVM) is a virtualization infrastructure for the Linux kernel that supports native virtualization on processors. Native virtualization is a platform virtualization approach that enables efficient full virtualization with help from hardware capabilities, primarily the host processors [1][2]. Virtualization has led to the creation of hypervisors. A hypervisor, also called a virtual machine manager (VMM), is a hardware virtualization technique that allows multiple operating systems, termed guests, to run concurrently on a host computer [3]. The other KVM component is a lightly modified QEMU process, which simulates PC hardware components in user space and provides an I/O device model for the virtual machine [4]. A scheduling policy is the set of decisions made regarding scheduling priorities, goals, and objectives [5]. Virtualization, in computing, is the creation of a virtual version of something, such as a hardware platform, operating system, storage device, or network resources [6]. The term network monitoring describes the use of a system that constantly monitors a computer network for slow or failing components and notifies the network administrator; it is a subset of the functions involved in network management [7]. Netperf is a benchmark that can be used to measure the performance of many different types of networking; it provides tests for both unidirectional throughput and end-to-end latency [8]. The Linux kernel uses a circular doubly linked list of struct task_struct to store process descriptors; this structure is declared in linux/sched.h [9]. sleep_avg is a running average of the time a process spends sleeping; tasks with high sleep_avg values are considered interactive and given a higher dynamic priority and a larger time slice [10]. Linux has a well-designed scheduling framework that includes three scheduler classes: the real-time class, the Completely Fair Scheduler (CFS) class, and the idle class [11]. Kernel modules expose the virtualization of hardware resources through the /dev/kvm device; with /dev/kvm, a guest operating system may have its own address space allocated by the Linux scheduler [12]. Without shadow pages, a virtual address had to be translated into guest-physical memory and the latter then translated into real physical memory; the shadow page table avoids this double bookkeeping by making the MMU map the guest's virtual memory directly to real physical memory, effectively skipping the intermediate guest-physical step [13]. NAPI ("New API") is a modification to the device driver packet-processing framework designed to improve the performance of high-speed networking [14]. NAPI works through interrupt mitigation and packet throttling. High-speed networking can create thousands of interrupts per second, all of which tell the system something it already knew: it has lots of packets to process; NAPI allows drivers to run with (some) interrupts disabled during times of high traffic, with a corresponding decrease in system load. As for packet throttling, when the system is overwhelmed and must drop packets, it is better to discard them before much effort goes into processing them.

NAPI-compliant drivers can often cause packets to be dropped in the network adapter itself, before the kernel sees them at all [14]. The Completely Fair Scheduler is a task scheduler merged into the 2.6.23 release of the Linux kernel; it handles CPU resource allocation for executing processes and aims to maximize overall CPU utilization while also maximizing interactive performance [15]. Virtualization has gained widespread use in cloud computing, server consolidation, and information security for benefits such as flexibility, isolation, high resource utilization, easy IT infrastructure management, and power saving; in virtualization systems, resource virtualization of the underlying hardware and concurrent execution of virtual machines are in the charge of software called a virtual machine monitor (VMM) or hypervisor [12]. Full virtualization is a virtualization technique used to provide a virtual machine environment that is a complete simulation of the underlying hardware; it requires that every salient feature of the hardware be reflected into each virtual machine, including the full instruction set, input/output operations, interrupts, memory access, and whatever other elements are used by the software that runs on the bare machine and is intended to run in a virtual machine [16]. A network interface controller, also known as a network interface card, network adapter, or LAN adapter, is a computer hardware component that connects a computer to a computer network; whereas network interface controllers were commonly implemented on expansion cards that plug into a computer bus, the low cost and ubiquity of the Ethernet standard mean that most newer computers have a network interface built into the motherboard [17]. Optimizations of current I/O virtualization approaches focus on the following aspects: decreasing the overhead introduced by interactions among the inner parts of the VMM architecture, including inter-VM communication; reducing context-switch overhead among guest OS, VMM, and host OS; optimizing the scheduler to balance I/O performance between VMs; and migrating a VM to access hardware devices directly [18].
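
To make the NAPI pattern above concrete, here is a minimal driver-side sketch against the 2.6-era kernel API that the report's references target; the mydrv_* helpers are hypothetical device-specific routines, not real kernel functions.

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    struct mydrv {                       /* hypothetical driver state */
        struct napi_struct napi;
        struct net_device *dev;
    };

    /* Device-specific helpers, assumed to exist for this sketch. */
    void mydrv_disable_rx_irq(struct mydrv *priv);
    void mydrv_enable_rx_irq(struct mydrv *priv);
    int  mydrv_rx_packets(struct mydrv *priv, int budget);

    /* Interrupt handler: mask RX interrupts and defer work to polling. */
    static irqreturn_t mydrv_isr(int irq, void *data)
    {
        struct mydrv *priv = data;
        mydrv_disable_rx_irq(priv);
        napi_schedule(&priv->napi);      /* schedule the poll function */
        return IRQ_HANDLED;
    }

    /* Poll function: process up to 'budget' packets with interrupts off. */
    static int mydrv_poll(struct napi_struct *napi, int budget)
    {
        struct mydrv *priv = container_of(napi, struct mydrv, napi);
        int done = mydrv_rx_packets(priv, budget);

        if (done < budget) {             /* ring drained: back to IRQ mode */
            napi_complete(napi);
            mydrv_enable_rx_irq(priv);
        }
        return done;
    }

    /* Registered at probe time (2.6-era signature with an explicit weight):
     * netif_napi_add(priv->dev, &priv->napi, mydrv_poll, 64); */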

1.3 TAXONOMY
1. Kernel-based virtual machine: Kernel-based Virtual Machine (KVM) is a virtualization infrastructure for the Linux kernel that supports native virtualization on processors with hardware virtualization extensions.
2. Hypervisor: A hypervisor, also called a virtual machine manager (VMM), is a hardware virtualization technique that allows multiple operating systems, termed guests, to run concurrently on a host computer.
3. Scheduling policy: A scheduling policy is the set of decisions made regarding scheduling priorities, goals, and objectives.
4. Native virtualization: Native virtualization is a platform virtualization approach that enables efficient full virtualization with help from hardware capabilities, primarily the host processors.
5. KVM driver: The KVM driver is a loadable module that is part of the Linux kernel and provides the core virtualization infrastructure, including the virtual CPU (VCPU) and virtual memory for a virtual machine (VM).
6. QEMU process: QEMU is a processor emulator that relies on dynamic binary translation to achieve reasonable speed while being easy to port to new host CPU architectures.
7. NAPI: NAPI ("New API") is a modification to the device driver packet-processing framework designed to improve the performance of high-speed networking.
8. NIC: A network interface controller, also known as a network interface card, network adapter, or LAN adapter, is a computer hardware component that connects a computer to a computer network.
9. Full virtualization: Full virtualization is a virtualization technique used to provide a virtual machine environment that is a complete simulation of the underlying hardware.
10. Optimization: Optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources.


2. PROPOSED TECHNOLOGY
Linux treats each virtual machine as a normal process, so virtual machines have the same states as normal processes, such as TASK_RUNNING, TASK_INTERRUPTIBLE, and TASK_STOPPED. When a virtual machine is created, Linux sets its state to TASK_RUNNING and puts it into a CPU's process queue, where it waits to be scheduled. Each CPU has a process queue made up of 140 priority lists that are serviced in FIFO order. Processes scheduled to execute are added to the end of their respective priority list in the process queue. Each process has a time slice that determines how much time it is permitted to execute. The first 100 priority lists of the process queue are reserved for real-time processes, and the last 40 are used for normal processes. The figure below depicts the CPU process queue used for scheduling.

[FIGURE 2.1 CPU PROCESS QUEUE FOR SCHEDULE — each CPU (CPU 0, CPU 1, …, CPU i) holds an active and an expired priority array of queues Queue[0] through Queue[139], each queue a FIFO list of tasks]
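
The structure in Figure 2.1 can be sketched in C as follows; this is a simplified illustration, and the type and field names are not the kernel's exact ones.

    #include <stdint.h>

    #define MAX_PRIO     140                 /* 0-99 real-time, 100-139 normal */
    #define BITMAP_WORDS ((MAX_PRIO + 31) / 32)

    struct task;                             /* opaque process descriptor */

    /* One priority array: 140 FIFO lists plus a bitmap of non-empty lists. */
    struct prio_array {
        uint32_t     bitmap[BITMAP_WORDS];   /* bit i set => queue[i] non-empty */
        struct task *queue[MAX_PRIO];        /* head of the FIFO list per priority */
    };

    /* Each CPU keeps an active and an expired array and swaps them. */
    struct runqueue {
        struct prio_array *active;
        struct prio_array *expired;
        struct prio_array  arrays[2];
    };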



In addition to the CPU's active process queue, there is also an expired process queue. When a process on the active queue uses up its time slice, it is moved to the expired queue, and its time slice is recalculated at that point. If no process exists on the active queue for a given priority, the pointers for the active and expired queues are swapped, making the expired priority list the active one. The economics of supporting a growing number of Internet-based application services has created a demand for server consolidation, and consequently there has been a resurgence of interest in machine virtualization. A virtual machine monitor (VMM) enables multiple virtual machines, each encapsulating one or more services, to share the same physical machine safely and fairly: it provides isolation between the virtual machines and manages their access to hardware resources. The scheduler within the VMM plays a key role in determining the overall fairness and performance characteristics of the virtualized system. Traditionally, VMM schedulers have focused on fairly sharing processor resources among domains while leaving the scheduling of I/O resources as a secondary concern. However, this can cause poor or unpredictable I/O performance, making virtualization less desirable for applications whose performance depends critically on I/O latency or bandwidth. The scheduler works as follows: it chooses the process on the highest-priority list to execute, and to make this choice efficient, a bitmap records which priority lists are non-empty.
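
Continuing the sketch above, selection then reduces to a bitmap scan plus the active/expired swap; a simplified illustration:

    /* Find the highest-priority non-empty list; priority 0 is highest,
     * so the lowest set bit wins. */
    static int first_set_prio(const struct prio_array *a)
    {
        for (int w = 0; w < BITMAP_WORDS; w++)
            if (a->bitmap[w])
                return w * 32 + __builtin_ctz(a->bitmap[w]);
        return -1;                            /* array is empty */
    }

    /* O(1) pick: scan the bitmap; swap active/expired if active is empty. */
    struct task *pick_next(struct runqueue *rq)
    {
        int prio = first_set_prio(rq->active);
        if (prio < 0) {
            struct prio_array *tmp = rq->active;  /* expired becomes active */
            rq->active  = rq->expired;
            rq->expired = tmp;
            prio = first_set_prio(rq->active);
            if (prio < 0)
                return NULL;                      /* nothing runnable */
        }
        return rq->active->queue[prio];           /* head of the FIFO list */
    }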

Usually, a find-first-bit-set instruction is used to find the highest-priority set bit in one of five 32-bit words (covering the 140 priorities), so the time it takes to find a process to execute depends not on the number of active processes but on the number of priorities. Linux processes are preemptive: if a process enters the TASK_RUNNING state, the kernel checks whether its dynamic priority is greater than that of the currently running process. If it is, execution of the current process is interrupted and the scheduler is invoked to select another process to run, usually the process that just became runnable. A process may also be preempted when its time quantum expires; when this occurs, the need_resched field of the current process is set, so the scheduler is invoked when the timer interrupt handler terminates. In the kernel-based virtual machine environment, the host operating system is the scheduler of the virtual machine, and the virtual machine is treated as a normal process by Linux, which takes advantage of Linux features. The KVM code, which is rather small (about 10,000 lines), turns a Linux kernel into a hypervisor by loading a kernel module. Instead of writing a hypervisor and its necessary components, such as a scheduler, memory manager, I/O stack, and device drivers, KVM leverages the ongoing development of the Linux kernel. The kernel module exports a device called /dev/kvm, which enables a guest mode of the kernel (in addition to the traditional kernel and user modes). With /dev/kvm, a virtual machine has a unique address space. Devices in the device tree (/dev) are common to all user-space processes, but /dev/kvm is different: each process that opens it sees a different map, in order to support isolation of the virtual machines. KVM takes advantage of hardware-based virtualization extensions to run an unmodified OS.

Despite a simplified I/O call procedure, I/O performance, as well as the lack of support for low-latency drivers, long plagued the acceptance of KVM as a viable VMM.

KVM THREADS PRIORITY
The two most important kinds of KVM threads are the QEMU threads and the VCPU threads. QEMU threads do the actual I/O and emulate the devices; VCPU threads execute code by instruction emulation or direct code execution. When a QEMU thread has a higher priority, an I/O request will be met in a short time. But QEMU finishes I/O through emulation, which has additional overhead, and if I/O is emulated too frequently, performance will also suffer. For example, when the guest does network I/O, the guest's network speed will be slowed down if QEMU emulates I/O for every packet. KVM thread priorities should therefore be configured according to workload type to improve virtualization performance.

FIGURE 2.2 KVM THREADS IN LINUX KERNEL
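
How such a priority might be set is sketched below with the standard POSIX scheduling API; the thread ID and the SCHED_FIFO policy choice are illustrative assumptions, not a configuration the report prescribes.

    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Raise an I/O (QEMU) thread's priority so pending I/O is served quickly.
     * 'tid' is the kernel thread ID of the QEMU I/O thread, assumed known
     * (e.g. from /proc/<pid>/task). Requires CAP_SYS_NICE. */
    static int boost_io_thread(pid_t tid, int rt_priority)
    {
        struct sched_param sp = { .sched_priority = rt_priority };
        if (sched_setscheduler(tid, SCHED_FIFO, &sp) != 0) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }

    /* Usage: boost_io_thread(qemu_io_tid, 10); */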

KVM THREAD ALLOCATION MECHANISM
The thread allocation mechanism decides on which physical CPU to place a thread at initialization or migration time. In the worst case, multiple VCPUs of one guest virtual machine are placed onto the same physical core, which may cause serious contention; this can happen in practice because the load may still appear balanced in such a situation. The hypervisor model consists of a software layer that multiplexes the hardware among several guest operating systems. The hypervisor performs basic scheduling and memory management and typically delegates management and I/O functions to a special, privileged guest.


[FIGURE 2.3 HYPERVISOR BASED ARCHITECTURE — unprivileged guests 1 and 2 have their I/O proxied through a privileged guest kernel, all running on the hypervisor]
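
Returning to thread placement: a common mitigation for the worst-case placement described above is pinning each VCPU thread of a guest to a distinct physical core. Below is a minimal sketch with the Linux affinity API; the thread IDs and core numbering are illustrative assumptions.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Pin VCPU thread 'tid' to physical CPU 'cpu' so that two VCPUs of the
     * same guest never share a core (the tid/cpu pairing is an assumption). */
    static int pin_vcpu(pid_t tid, int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(tid, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

    /* e.g. pin_vcpu(vcpu_tid[i], i % num_physical_cores); */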

Today's hardware, however, is becoming increasingly complex. Basic scheduling operations have to take into account multiple hardware threads on a core, multiple cores on a socket, and multiple sockets in a system. Similarly, on-chip memory controllers require that memory management account for the Non-Uniform Memory Access (NUMA) characteristics of a system. While great effort is invested in adding these capabilities to hypervisors, a mature scheduler and memory management system that already handles these issues very well exists in the Linux kernel. When virtualization capabilities are added to a standard Linux kernel, all the fine-tuning work that has gone (and is going) into the kernel is inherited, bringing that benefit into the virtualized environment. Under this model, every virtual machine is a regular Linux process scheduled by the standard Linux scheduler, and its memory is allocated by the Linux memory allocator, with its knowledge of NUMA, integrated with the scheduler.



A normal Linux process has two modes of execution: kernel and user. KVM adds a third mode, guest mode, which has its own kernel and user modes. The division of labor among the modes is: guest mode executes non-I/O guest code; kernel mode switches into guest mode and handles any exits from guest mode due to I/O or special instructions; user mode performs I/O on behalf of the guest. By integrating into the kernel, the KVM 'hypervisor' automatically tracks the latest hardware and scalability features without additional effort.
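
This life cycle is visible in the /dev/kvm ioctl interface. A compressed C sketch of the canonical usage follows; error handling and the guest memory and register setup are omitted, so this is a skeleton rather than a bootable VM.

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int kvm    = open("/dev/kvm", O_RDWR);         /* per-process device  */
        int vmfd   = ioctl(kvm, KVM_CREATE_VM, 0);     /* new VM address space */
        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);  /* one virtual CPU      */

        /* Shared run structure: how kernel mode reports exits to user mode. */
        long sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpufd, 0);

        for (;;) {
            ioctl(vcpufd, KVM_RUN, 0);                 /* enter guest mode */
            switch (run->exit_reason) {                /* back in user mode */
            case KVM_EXIT_IO:                          /* QEMU would emulate */
            case KVM_EXIT_MMIO:
                /* perform I/O on behalf of the guest, then loop */
                break;
            case KVM_EXIT_HLT:
                return 0;
            }
        }
    }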

LINUX SCHEDULER
Linux has a well-designed scheduling framework that includes three scheduler classes: the real-time class, the Completely Fair Scheduler (CFS) class, and the idle class. CFS models an ideal, precise multi-tasking CPU on real hardware, one that could run every task at equal speed in parallel. CFS uses a red-black tree to sort tasks according to their virtual runtime; a higher-priority task's virtual runtime increases more slowly. Each task has a priority from 0 to 139 in the kernel, with the range 0 to 99 reserved for real-time processes. In user space, a task's nice value lies in [-20, 19], which maps to the range 100 to 139; a smaller priority number means a more important task. Load balancing is responsible for distributing tasks among the available CPUs. CFS has both passive and active balancing. Passive balancing tries to even out the load across CPUs, but it may fail at times if all the tasks on the busiest CPU have a higher priority. Active balancing moves exactly one task from the busiest CPU's run queue to the initiator; it is more likely to succeed because it does not perform the priority comparison. CFS balances tasks even if the local CPU becomes busier than the busiest CPU, and will do the balancing as long as the absolute value of the imbalance between the two CPUs does not grow. It is not necessary to load-balance KVM threads too frequently, because doing so may cause many cache misses and low performance.
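
The "increases more slowly for higher priority" rule can be written down concretely. A simplified version of the CFS virtual-runtime update follows; the struct is an illustrative stand-in for the kernel's scheduling entity, though the weight for nice 0 matches the kernel's value.

    /* Simplified CFS accounting: virtual runtime advances inversely to weight,
     * so heavier (more important) tasks accumulate vruntime more slowly and
     * stay further left in the red-black tree. */
    #define NICE_0_WEIGHT 1024        /* kernel's load weight for nice 0 */

    struct cfs_task {
        unsigned long long vruntime; /* key in the red-black tree        */
        unsigned long      weight;   /* derived from nice in [-20, 19]   */
    };

    static void account_runtime(struct cfs_task *t, unsigned long long delta_ns)
    {
        /* vruntime += delta * NICE_0_WEIGHT / weight, in this order to keep
         * integer precision; same idea as the kernel's calc_delta_fair(). */
        t->vruntime += delta_ns * NICE_0_WEIGHT / t->weight;
    }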


[FIGURE 2.4 KERNEL BASED VIRTUAL MACHINE ARCHITECTURE — virtual machines in guest mode with their QEMU I/O threads run alongside normal user processes above the Linux kernel, which contains the KVM driver]

A virtual machine process, as a system virtual machine, is special compared with a normal system process, so it needs higher priority to run its own applications. Moreover, when a virtual machine tries to access a privileged system resource, a VM exit and a VM context switch result, and the Linux kernel then completes the real work on the virtual machine's behalf. Since a virtual machine context switch brings system overhead, a VM preempted by a normal process increases the overhead of the whole system, which is undesirable. When a virtual machine is running a high-priority application and a normal process arrives whose priority is higher than the virtual machine's but lower than that of the application inside it, the Linux scheduler, unaware of the situation, forces the virtual machine to give up the processor and runs the new process. As a result, the high-priority application in the virtual machine is not executed, and an unexpected process switch occurs, increasing the virtual machine switching overhead. Because the Linux kernel does not treat each virtual machine fairly, when the kernel-based virtual machine runs a network monitoring application, it cannot guarantee that each virtual machine receives the same network packets or a balanced workload, and it therefore cannot meet the basic requirement of network monitoring.

To resolve this problem, an optimized scheduling policy is proposed. Optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. An efficient and simple I/O virtualization approach that gives a VM high I/O throughput, low CPU utilization, and rich functionality is still required. Much research tries to optimize current I/O virtualization approaches, mostly along the following lines: decreasing the overhead introduced by interactions among the inner parts of the VMM architecture, including inter-VM communication; reducing context-switch overhead among guest OS, VMM, and host OS; optimizing the scheduler to balance I/O performance between VMs; and letting a VM access a hardware device directly.

The optimized scheduling policy is implemented for the kernel-based virtual machine as follows. First, a new process queue for kernel-based virtual machines is added. When a virtual machine starts, it is put into this virtual machine process queue instead of the normal process queue. Because the virtual machine process queue has higher priority than the normal process queue, virtual machine processes are scheduled before normal processes. Each virtual machine is given the same time slices so that all are treated fairly. Next, to reduce the response latency of I/O-intensive virtual machines, a new virtual machine state for I/O-intensive virtual machines is added. By sorting the virtual machine process queue according to remaining time slices, I/O-intensive virtual machines are prioritized for scheduling.

Optimization of KVM also includes the following aspects. Reduce VM exit frequency: hot-code tests show that a few instructions cause most VM exits; if clustered hot instructions can be merged into a single special instruction, the frequency of VM exits can be reduced. Simplify the guest OS: some operations in the guest OS are redundant in a virtual environment; for example, verify_pmtmr_rate is a useless function in the guest OS, and NAPI support in the NIC driver is useless too, so deleting such operations makes the guest more efficient in a virtual environment. Enhance QEMU performance: evaluating KVM after reducing the VM exit frequency shows that VM exits are not the main source of overhead; QEMU emulation cost may be, since QEMU emulates I/O devices in user space. Optimize the multiple levels of schedulers: there are two levels of disk I/O schedulers in a VM environment, one in the guest OS and one in the host OS; one of them is redundant, and omitting it may improve latency and CPU utilization.


2.1 WORK FLOW DIAGRAM

[Workflow: Linux/KVM → VM process queue → FIFO VM process scheduler → check the VM process time quantum tqi → if tqi > 0, process state HAVE, else process state OVER → assign priority to the VM process → check the avg_sleep time via the Linux task_struct]


2.2 DATA FLOW DIAGRAM


[Data flow: Linux hands the VM process and its time quantum tqi to the VM process queue; the FCFS scheduler checks the queue and the time quantum, assigning process state HAVE (tqi > 0) or OVER (tqi <= 0); the sleep_avg time from the Linux task_struct may further set the state to URGENT; the VM process is then assigned a priority and the prioritized VM is scheduled to run]


3. DEVELOPING TECHNIQUES
Each virtual machine is given a certain number of time slices when it starts and is then put into the virtual machine process queue. The overall objective of the optimized scheduler is to allocate processor resources fairly, weighted by the time slices each virtual machine is allocated; therefore, each virtual machine is given the same time slices so that each gets an equal fraction of processor resources. A virtual machine in the process queue can be in one of two states: HAVE or OVER. HAVE means it has time slices remaining; OVER means it has used up its time slice allocation. Time slices are accounted at periodic scheduler interrupts, which occur every 10 ms; at each scheduler interrupt, the currently running virtual machine consumes 100 slices. When the time slices of all the virtual machines in the system go negative, all VMs are given new time slices. Scheduling decisions ensure that virtual machines in the HAVE state run before virtual machines in the OVER state: a virtual machine whose time slice allocation is OVER executes only if no virtual machine in the HAVE state is ready to run. When making scheduling decisions, the kernel-based virtual machine scheduler considers only whether a virtual machine is in the OVER or HAVE state; the amount of time slice remaining is irrelevant. Virtual machines in the same state are scheduled in FIFO manner: a virtual machine is inserted into the process queue after all other VMs in the same state, and the scheduler selects the virtual machine at the front of the process queue to execute. When a VM reaches the front of the queue, it is allowed to run for three scheduling intervals (for a total of 30 ms) as long as it has sufficient time slices to do so. When a virtual machine's time slice is over, it enters the OVER state. As shown in the figure below, the kernel-based virtual machine is loaded into the Linux kernel and provides a number of virtual machines, each of which can accommodate a guest operating system (the top-layer rectangles in the diagram). The kernel-based virtual machine process queue is isolated from the normal process queue and is scheduled separately. The schedule controller takes input from the system clock, which lets it adapt the scheduling to the time quantum. Scheduling is done in FCFS manner depending on the state of the virtual machine, which in turn depends on the time quantum it holds. The scheduler dispatches the virtual machine at the front of the queue for execution if its state is HAVE; if the virtual machine is in the OVER state, it is sent back to the tail of the queue and its state is changed to HAVE. An I/O-intensive virtual machine is given the additional state and prioritized.
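
A minimal C sketch of this policy, reusing the struct vm_proc bookkeeping sketched under the objectives; the constants mirror the numbers above, while the function names are illustrative, not from the report.

    #define TICK_MS         10    /* scheduler interrupt period              */
    #define SLICES_PER_TICK 100   /* slices the running VM consumes per tick */
    #define RUN_INTERVALS   3     /* a VM at the head runs up to 3 ticks (30 ms) */

    /* Per-tick accounting for the currently running VM. */
    void on_tick(struct vm_proc *cur)
    {
        cur->time_slices -= SLICES_PER_TICK;
        if (cur->time_slices <= 0)
            cur->state = VM_OVER;        /* exhausted: back of the queue */
    }

    /* FIFO pick: any HAVE VM runs before any OVER VM; the remaining slice
     * count is ignored here, only the state matters. */
    struct vm_proc *pick_vm(struct vm_proc *queue_head)
    {
        for (struct vm_proc *v = queue_head; v; v = v->next)
            if (v->state == VM_HAVE)
                return v;
        return queue_head;               /* only OVER VMs are ready */
    }

    /* When every VM in the system is OVER, everyone is refilled equally. */
    void refill_all(struct vm_proc *queue_head, int slices)
    {
        for (struct vm_proc *v = queue_head; v; v = v->next) {
            v->time_slices = slices;
            v->state = VM_HAVE;
        }
    }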


[FIGURE 3.1 KVM PROCESS QUEUE — guest OSes with their process information (VM1 … VMn) run above the Linux kernel, which maintains a normal run queue and a separate KVM run queue under a scheduler control unit; the optimized Linux scheduler, driven by the hardware clock, dispatches processes (P1 … Pi, VM … VMi) to the CPU and memory manager]

In virtualized data centers, I/O performance problems are caused by running numerous virtual machines on one server. In early server virtualization implementations, the number of virtual machines per server was typically limited to six or fewer. But it was found that seven or more applications per server could safely be run, often using 80 percent of total server capacity, an improvement over the average 5 to 15 percent utilized with non-virtualized servers.



However, the increased utilization created by virtualization placed a significant strain on the server's I/O capacity. Network traffic, storage traffic, and inter-server communications combine to impose increased loads that may overwhelm the server's channels, leading to backlogs and idle CPUs as they wait for data. Virtual I/O addresses performance bottlenecks by consolidating I/O into a single connection whose bandwidth ideally exceeds the I/O capacity of the server itself, thereby ensuring that the I/O link itself is not a bottleneck. That bandwidth is then dynamically allocated in real time across multiple virtual connections to both storage and network resources. In I/O-intensive applications, this approach can help increase both VM performance and the potential number of VMs per server.

The scheduler is not aware of the urgency with which a virtual machine needs to execute, because it only attempts to allocate processor resources fairly among virtual machines over the long run. This creates a problem for I/O-intensive virtual machines, whose response latency will vary widely: if a virtual machine is close to the front of the queue, its response latency will be low; if it is far from the front, it has to wait until the computation-intensive virtual machines ahead of it finish, resulting in high response latency. To resolve this problem, an additional state is added: URGENT. A virtual machine in this state has higher priority than virtual machines in the HAVE and OVER states. The sleep_avg field of the Linux task_struct records the past behavior of the process, in particular its average sleep time (the Linux kernel stores process descriptors in a circular doubly linked list of struct task_struct, declared in linux/sched.h). So when a virtual machine process reaches the maximum sleep_avg value, it is considered an I/O-intensive virtual machine and is set to the URGENT state. Once a virtual machine enters the URGENT state, it no longer enters the process queue at the tail and no longer waits behind all other active virtual machines before it executes: it preempts the current virtual machine and starts running. By raising a virtual machine's priority in this fashion, the response latency of I/O-intensive virtual machines can be reduced. In the optimized scheduler, when a virtual machine becomes runnable, its remaining time slices have only a limited effect on its place in the process queue: specifically, the remaining time slices determine only the virtual machine's state, and the virtual machine is always queued after the last virtual machine in the same state. In fact, an I/O-intensive virtual machine is not charged any time slices if it happens to block before the periodic scheduler interrupt, so I/O-intensive VMs often consume their time slices more slowly than CPU-intensive VMs. By sorting the process queue periodically according to each virtual machine's remaining time slices, the latency until an I/O-intensive virtual machine executes can therefore be reduced. The overall process of the new KVM scheduler is depicted in the following flow chart.
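
A sketch of this I/O-latency path, again reusing struct vm_proc: promotion to URGENT on a high sleep_avg, head insertion with preemption, and the periodic sort key. The threshold constant and the two helper hooks are assumptions for illustration.

    #define MAX_SLEEP_AVG 1000    /* illustrative promotion threshold */

    void request_preempt(void);                                   /* assumed hook */
    void enqueue_tail(struct vm_proc **head, struct vm_proc *vm); /* assumed hook */

    /* On wakeup: a VM that sleeps a lot is I/O-intensive; move it to the
     * head of the queue so it preempts the current VM instead of waiting. */
    void on_vm_wakeup(struct vm_proc **queue_head, struct vm_proc *vm)
    {
        if (vm->sleep_avg >= MAX_SLEEP_AVG) {
            vm->state = VM_URGENT;
            vm->next = *queue_head;       /* head, not tail */
            *queue_head = vm;
            request_preempt();
        } else {
            enqueue_tail(queue_head, vm); /* normal FIFO behavior */
        }
    }

    /* Periodic sort key: I/O-bound VMs burn slices slowly, so ordering by
     * remaining slices (descending) pulls them toward the front. */
    int cmp_remaining(const struct vm_proc *a, const struct vm_proc *b)
    {
        return b->time_slices - a->time_slices;
    }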


[FLOW CHART OF THE KVM SCHEDULER PROCESS]


EVALUATION AND ANALYSIS


Some evaluation and analysis of the optimization described above was carried out. The optimizations were implemented in the KVM module of the Linux 2.6.29 kernel, with CentOS 5.3 running in the VM. The hardware used for the evaluation is an Intel Xeon CPU at 3.0 GHz with 4 × 1 GB of DDR2-667 memory. The physical network interface card is an Intel PRO/1000 Gigabit Server Adapter, and the virtual network interface card is similar to the host's. The netperf tool, version 2.4.5, is used to measure the CPU utilization of the whole VM when receiving packets in the VM.

FIGURE 3.3 CPU UTILIZATION WHEN NETSERVER RUNS IN ONE VM

The figure above shows the total CPU utilization over 120 seconds when netserver runs in one virtual machine, before and after optimization. After optimization, the CPU utilization of the virtual machine is reduced by almost 50%.


FIGURE 3.4 CPU UTILIZATION WHEN NETSERVER RUNS IN TWO VMs

The figure above shows the total CPU utilization over 120 seconds when netserver runs in two virtual machines, before and after optimization. After optimization, each virtual machine's CPU utilization falls by nearly 50% and becomes more stable, and the virtual machines tend toward a balanced workload. Comparing the two figures, the sum of the two virtual machines' CPU utilization is also lower than the single-VM utilization.


4. CONCLUSION
In this report, the problem of scheduling kernel-based virtual machines is analyzed, and an optimized scheduling policy, covering both normal virtual machines and I/O-intensive virtual machines, is proposed to improve performance.


BIBLIOGRAPHY
1. http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
2. http://en.wikipedia.org/wiki/Native_virtualization
3. http://www.qumranet.com/art_images/files/8/KVM_Whitepaper.pdf
4. http://en.wikipedia.org/wiki/QEMU
5. http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture08.pdf
6. http://en.wikipedia.org/wiki/Virtualization
7. http://en.wikipedia.org/wiki/Network_monitoring
8. The netperf benchmark: http://www.netperf.org/netperf/NetperfPage.html
9. http://linuxgazette.net/133/saha.html
10. http://kerneltrap.org/node/525
11. CFS Optimizations to KVM Threads on Multi-Core Environment (IEEE, 2010)
12. A Synthetic Performance Evaluation of OpenVZ, Xen and KVM (IEEE, 2009)
13. http://www.anandtech.com/show/2480/10
14. http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
15. http://en.wikipedia.org/wiki/Completely_Fair_Scheduler
16. http://en.wikipedia.org/wiki/Full_virtualization
17. http://en.wikipedia.org/wiki/Network_interface_controller
18. A Survey on I/O Virtualization and Optimization (The Fifth Annual ChinaGrid Conference)
