
CCMA & Cloud OS

符儒嘉
Cloud Computing Center, Industrial Technology Research Institute (ITRI)
Head of the System Software Group
Agenda
• Introduction
• CCMA @ ITRI (Cloud Computing Center for Mobile Applications, Industrial Technology Research Institute)
• Cloud OS
– Virtual Data Center & Virtual Clusters
– Virtualized Storage
– Networking in a Cloud Data Center
– Runtime Virtual Machine Management
– Security
• System Management
• Summary
Cloud Computing Definition
• Provisioning of dynamically scalable and virtualized
resources as a service over the Internet.
– Multi-tenancy
– Device & Location independence
– Ability to obtain virtual computing resources on demand
– Provides the illusion of infinite computing resources
– Self-Provisioning of virtual resources
– Eliminates the need for up-front commitment by Cloud
developers
– Provides the ability to pay as you go for use of computing
resources
– Reliability, Scalability, Security, Manageability
Cloud Computing vs. Utility Services
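[Figure: cloud computing compared with electric power supply. Cloud computing side: Data Center, IaaS Providers, PaaS Providers, SaaS Providers, ISVs, Enterprises, End Users. Power supply side: Primary Substation, Secondary Substation.]
Source: IEK (2010/02)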
Timing is right
• Technology Push
– Broadband network connectivity getting faster
and more reliable
– Internet service availability significantly
improved
– Sufficient trust in infrastructure providers
– By many measures, Google is already a
critical service for most of the world, and it is
in the cloud!

• Market Pull
– Big Data
– Software installed on premise → Software as a
service (SaaS)
– Information technology (IT) on premise → IT
service as a rented utility (as in electricity)
• “IT should not and will not be a core
competence for most corporations”
Nicholas Carr, “Does IT Matter?” and
“The Big Switch”
– Lowering up-front and day-to-day IT cost: pay
only as much as actual resource usage
Cost of Data Center
[Figure: pie chart “Data Center Budget” with segments Servers, Power Distribution & Cooling, Power Draw (utility), and Network; segment values as extracted: 15%, 45%, 25%, 15%. Additional label: Power Usage.]
Types of Clouds

[Figure: Public Cloud, Private Cloud, and Hybrid Cloud, linking Service Providers and Service Users. Cloud providers offer the following layers, from top to bottom:]
• Cloud End-User Services (SaaS)
• Cloud Platform Services (PaaS)
• Cloud Infrastructure Services (IaaS)
• Physical Infrastructure
Infrastructure as a Service
Example Players
• Amazon
• GoGrid
• RightScale
• Rackspace
• VMOps
• Eucalyptus
• ElasticHosts
•…
Platform as a Service
Example Players
• Microsoft Azure
• Google App Engine
• Force.com
• Rackspace Cloud
• Heroku
• QuickBase
• Caspio
•…
Software as a Service
Example Players
• Salesforce.com
• Adobe.com
• Autodesk
• WebEx
• Microsoft Office
• Gmail & other
Google Apps
• Flickr
•…
DataCenter as a Computer
• Majority of cloud computing infrastructure consists of
reliable services delivered through data centers
• Traditional colocation Datacenters
– Multiple servers and communications gear collocated due to
common environmental & security needs
– Hosts a large number of relatively small or medium-sized
applications, each running on a dedicated hardware
infrastructure
• Datacenters for Cloud Computing platform
– Belongs to a single organization
– Uses a relatively homogeneous hardware and system software
platform, and shares a common system management layer.
– Runs a smaller number of very large applications
– Cloud computing workloads must be designed to gracefully
tolerate large numbers of component faults with little or no
impact on service level performance and availability.
Warehouse Scale Computers (WSC)
• Not just a collection of servers
– Hundreds to thousands of servers running in concert
– Typically runs on a virtualized platform
– Fault behavior & energy considerations have significant impact
– Needs to be considered as a single unit
• Must be highly manageable
– Deployment of software updates
– Monitoring & system management
• Affordability
– Currently powers public clouds such as those of Google, Amazon, Yahoo,
and Microsoft
– Soon to be affordable by Enterprises
• A rack of servers can easily have > 600 cores
Google “Warehouse Style Computer” Data Center
“Secret Sauce” of Cloud Computing
• Commodity components
• Virtualization
– Servers, Memory, Storage, Network
• Self Provisioning
– Programmatic Control
• Elasticity
• Data vs. Response time
– Data and traffic keep growing, but response time must
remain relatively constant
• Data Center must “scale out”
– Manageability
– High Availability
– “Green” Computing
The New Data Center Industry
• Container Computer for high efficiency and
environmental conservation (Packaging, PUE, …)
• Bundled software (Cloud OS) for integrated service, high
scalability, and availability
• Large enterprises will bypass traditional server channels
(IBM, HP, Dell, …)
– Purchase of entire data center directly from ODM manufacturers
• Significant cost reductions
• Horizontal scalability
• High Availability
• Google already purchases directly from Taiwanese manufacturers
Cloud Computing Center for Mobile Applications, ITRI

CCMA@ITRI
Mission Statement

Deliver end-to-end data center architecture know-how and a
system software suite that will enable a cloud service provider to
operate a mega data center that is the most efficient and capable in
the world
Cloud Computing Food Chain

[Figure: “Build Cloud Data Center the Google Way” from three elements: Hardware, Cloud OS, and Data Center Know-how]
Container Computers

Data Center Architecture Know-how
• Treat the entire data center as a computer
– Air flow analysis
– Cooling architecture (thermal management)
– Power/energy management
– Focus on ease of system and network management
– What cannot be managed/monitored does not get deployed
• Modular and Scalable (Card to Rack to Container to
Warehouse)
• Explore low power, commodity CPU as a building block
• Google data center tour (http://www.google.com/corporate/green/datacenters/summit.html)
System Software (Cloud OS)
• Virtualization Platform
– CPUs
– Storage (Filesystems)
– Network
• Resource Management
– Provisioning of virtual clusters
– Physical machine load balancing
– Network traffic load balancing
• Power Management
• Security
– Hypervisor protection
– Compartmentalization between clusters
• System Management
– FCAPS
• High Availability
– Physical component failure does not interrupt availability of virtual resources
• Cloud Applications management
[Figure: Mail, Bkup, HC, and AppX virtual clusters of VMs run on the CCMA Infrastructure SW (Cloud OS), which runs on a set of physical nodes]
What’s different about WSC’s?
As computation continues to move into the cloud, the computing
platform of interest no longer resembles a pizza box or a
refrigerator, but a warehouse full of computers. These new large
datacenters are quite different from traditional hosting facilities of
earlier times and cannot be viewed simply as a collection of co-
located servers. Large portions of the hardware and software
resources in these facilities must work in concert to efficiently deliver
good levels of Internet service performance, something that can
only be achieved by a holistic approach to their design and
deployment. In other words, we must treat the datacenter itself as
one massive warehouse-scale computer (WSC).

The Datacenter as a Computer: An Introduction to the Design of


May, 2009
Commodity Hardware-Only System Architecture
[Figure: Layer-3 border routers sit in front of a layer-2-only data center network. The network connects a server load balancer cluster, compute server racks, and storage servers. Each physical server hosts VM0, VM1 through VMn.]
Architecture Principles
• Commodity Hardware
– A set of compute servers, each equipped with
multiple homogeneous CPUs
• Requires CPU/memory/IO virtualization support
– A set of JBOD (just a bunch of disks) storage servers
proportionally intermixed with the compute servers
• Low-power CPU is sufficient; RAID is optional
– A layer-2-only network, consisting of top-of-rack switches
and core switches, connects all servers
• Everything is virtualized
– CPU, Memory, Storage, Network
• If a resource cannot be remotely managed, it
should not be part of the CCMA data center
Software Stack for Cloud OS
[Figure: layered software stack. Components: Cloud Application Management Tool; Virtual Cluster Provisioning; Network/System Management; Physical Cluster Deployment Tool; Security; Intra-Virtual-Cluster Load Balancing; Power Management; Virtual Machine Management. Underlying resources: Physical Compute Servers, Distributed Main/Secondary Storage, All-layer-2 Network.]
Virtualization Platform
• Leverage existing hypervisors
– Allocation of virtual machine instances
– Monitor VM performance
– Virtual storage provisioning
– Intra-VirtualCluster load balancing
– Scalable data center network
– Isolation between virtual clusters
– Virtual machine migration
[Figure: Mail, Bkup, HC, and AppXYZ virtual clusters run on compute nodes that host Cloud OS agents. Service nodes host system service daemons. Data nodes are backed by storage servers. All of these run on physical nodes.]
Virtual Resource Provisioning
• Physical cluster deployment
• Virtual Cluster
– A group of VMs providing the same service, front-ended by a network load
balancer
– Configuration
• Storage space requirement
• External network bandwidth requirement
• Load Balancing policy
• Firewall/IDS setting
• Network configuration, including DNS and DHCP
• OS image and application image
• Virtual Data Center
– One or more virtual clusters working in coordination (multi-tier web
services, EMRs, VDIs, etc.)
• Physical Machine Load Balancing
– Satisfy each virtual cluster’s performance requirement while minimizing
the total amount of physical resource reserved
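
The configuration items listed for a virtual cluster can be pictured as a simple record, with a virtual data center grouping several such records. This is only a minimal sketch and not the CCMA implementation: the class names, field names, and sample values below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualClusterConfig:
    """Hypothetical record of the per-cluster settings listed on the slide."""
    name: str
    vm_count: int                 # VMs providing the same service
    storage_gb: int               # storage space requirement
    external_bandwidth_mbps: int  # external network bandwidth requirement
    lb_policy: str                # load balancing policy (names are made up)
    firewall_rules: List[str] = field(default_factory=list)  # firewall/IDS setting
    dns_name: str = ""            # network configuration: DNS
    use_dhcp: bool = True         # network configuration: DHCP
    os_image: str = ""            # OS image
    app_image: str = ""           # application image


@dataclass
class VirtualDataCenter:
    """One or more virtual clusters working in coordination."""
    name: str
    clusters: List[VirtualClusterConfig] = field(default_factory=list)

    def total_vms(self) -> int:
        return sum(c.vm_count for c in self.clusters)

    def total_storage_gb(self) -> int:
        return sum(c.storage_gb for c in self.clusters)


# A two-tier web service expressed as two virtual clusters.
web = VirtualClusterConfig("web", vm_count=4, storage_gb=100,
                           external_bandwidth_mbps=200, lb_policy="least-load")
db = VirtualClusterConfig("db", vm_count=2, storage_gb=500,
                          external_bandwidth_mbps=0, lb_policy="round-robin")
vdc = VirtualDataCenter("shop", [web, db])
print(vdc.total_vms(), vdc.total_storage_gb())
```

Aggregates such as the total VM count and total storage are the kind of input a provisioning layer would need when deciding how much physical resource to reserve.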
Virtual Storage Management
• Storage virtualization
– Service models
• Dedicated or Shared Volume
• Shared Filesystem
• Shared Database
• Distributed main storage
– Provides a global storage abstraction on a large number of
distributed storage servers
• Distributed secondary storage
– Replication, Snapshot, Deduplication
• Unification of SAN and LAN: 10G Ethernet interconnect
• Each storage block in a disk volume remains available
despite failure in switch, server, or disk drive
• Thin Provisioning
• Scales to a very large number of concurrent accesses
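
Thin provisioning means a volume advertises more capacity than is physically backed and consumes real space only when a block is first written. The sketch below illustrates the idea with an in-memory dictionary. It is an assumption-laden toy, not the CCMA storage design: the class `ThinVolume`, its methods, and the 4096-byte block size are all hypothetical.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume: blocks are backed only once written."""

    def __init__(self, virtual_blocks: int, block_size: int = 4096):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self._blocks = {}  # block index -> bytes, allocated lazily on first write

    def _check(self, index: int) -> None:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block index out of range")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) > self.block_size:
            raise ValueError("data larger than block size")
        # Pad to a full block so reads always return block_size bytes.
        self._blocks[index] = data.ljust(self.block_size, b"\0")

    def read(self, index: int) -> bytes:
        self._check(index)
        # Unwritten blocks read back as zeros without consuming any space.
        return self._blocks.get(index, b"\0" * self.block_size)

    def virtual_bytes(self) -> int:
        return self.virtual_blocks * self.block_size

    def allocated_bytes(self) -> int:
        return len(self._blocks) * self.block_size


vol = ThinVolume(virtual_blocks=1_000_000)  # advertises about 4 GB
vol.write(42, b"hello")
print(vol.virtual_bytes(), vol.allocated_bytes())
```

After one write, the volume still advertises its full virtual size while only a single block is actually allocated.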
Cloud Storage System Architecture

[Figure: components shown are VM Volume, iSCSI Initiator, iSCSI Target, DFS Client, and multiple DFS DataNodes, with DMS and DFS Metadata above them]
Networking in Cloud OS
• Scalable Load Balancer Cluster
– Inter-VirtualCluster load balancing
– Each member of the SLB cluster is responsible for load balancing one or
more VCs
– Load balance based on current load on virtual machines
• Layer 2 only
– How to scale up to 100,000 physical servers with commodity Ethernet
switches
– Load balancing of network packet routing
• Support for fast fail-over
– Pre-computed main and alternative routes
– Fast failure detection and re-routing
• Use Valiant load balancing to avoid congestion or bottlenecks
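
Valiant load balancing sends each flow through a randomly chosen intermediate switch before forwarding it to its destination, so that no single core link becomes a hot spot regardless of the traffic pattern. The slides do not describe how this is implemented in the CCMA network, so the following is only a sketch under simplifying assumptions: a two-tier topology with a single intermediate hop, a hypothetical function `valiant_path`, made-up switch names, and a `failed` set standing in for the result of failure detection.

```python
import random
from typing import Dict, List, Optional, Set


def valiant_path(src_tor: str, dst_tor: str, core_switches: List[str],
                 failed: Optional[Set[str]] = None,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Two-phase Valiant routing: bounce via a random healthy intermediate switch."""
    failed = failed or set()
    rng = rng or random.Random()
    if src_tor == dst_tor:
        return [src_tor]  # traffic stays inside the rack
    healthy = [c for c in core_switches if c not in failed]
    if not healthy:
        raise RuntimeError("no healthy core switch available")
    # Phase 1: source -> random intermediate. Phase 2: intermediate -> destination.
    intermediate = rng.choice(healthy)
    return [src_tor, intermediate, dst_tor]


cores = ["core-1", "core-2", "core-3", "core-4"]
rng = random.Random(0)  # fixed seed so the run is repeatable
counts: Dict[str, int] = {c: 0 for c in cores}
for _ in range(10_000):
    path = valiant_path("tor-a", "tor-b", cores, failed={"core-3"}, rng=rng)
    counts[path[1]] += 1
print(counts)
```

Flows spread roughly evenly over the healthy core switches, and a switch marked as failed receives no traffic, which is the simplest form of re-routing around a detected failure.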
Layer-2-Only Data Center Network

[Figure: three tiers of layer-2 switches (Core, Region, Top of Rack) sit above the compute server racks. Nodes are labeled Node #1 to Node #400, and each hosts VM #1 through VM #24. Address labels IP1/MAC1 and IP2/MAC2 are shown beside the servers. Callouts: Valiant load balancing, server load balancing, network load balancing, server fail-over, fast failure detection and re-routing.]
Virtual Machine Management
• Objective
– Power Management
– Physical Machine Load Balancing
• Monitor runtime VM statistics
– Heuristic calculation to predict workload for virtual clusters
• Determine power down/up of machines
– Two-dimensional bin packing
– VM migration algorithm
• Physical machine load balancing
– Migration of VMs to other physical machines to balance out CPU and
I/O load

[Figure legend: CONSIGNEE; CONSIGNOR = PM to be turned off; CONSIGNMENT = VM to be migrated]
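
Deciding which machines can be powered down is treated above as a two-dimensional bin packing problem: pack the VMs onto as few physical machines (PMs) as possible, then turn off the rest. The slides do not say which heuristic is used, so the sketch below substitutes first-fit decreasing, a common approximation. The choice of CPU and memory as the two dimensions, the normalized capacity of 1.0 per PM, and the function name `consolidate` are all assumptions.

```python
from typing import Dict, List, Tuple

VM = Tuple[str, float, float]  # (name, CPU demand, memory demand) as fractions of one PM


def consolidate(vms: List[VM], pm_count: int) -> Dict[int, List[str]]:
    """First-fit decreasing over two dimensions; every PM has capacity 1.0 per dimension."""
    free = [[1.0, 1.0] for _ in range(pm_count)]  # remaining [cpu, mem] per PM
    placement: Dict[int, List[str]] = {}
    # Place the largest VMs first, ranked by their dominant resource demand.
    for name, cpu, mem in sorted(vms, key=lambda v: max(v[1], v[2]), reverse=True):
        for pm, (free_cpu, free_mem) in enumerate(free):
            # Small tolerance guards against floating-point rounding.
            if cpu <= free_cpu + 1e-9 and mem <= free_mem + 1e-9:
                free[pm][0] -= cpu
                free[pm][1] -= mem
                placement.setdefault(pm, []).append(name)
                break
        else:
            raise RuntimeError(f"no physical machine can host {name}")
    return placement


vms = [("vm1", 0.5, 0.3), ("vm2", 0.4, 0.6), ("vm3", 0.3, 0.2),
       ("vm4", 0.2, 0.4), ("vm5", 0.1, 0.1)]
placement = consolidate(vms, pm_count=4)
idle = [pm for pm in range(4) if pm not in placement]  # candidates for power-down
print(placement, idle)
```

In the terms of the figure legend, PMs left without any VM would be the consignors, and the VMs that have to move to reach the packed layout would be the consignments.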
Fail-over & Load Balancing

[Figure: the Virtual Machine Manager works with a monitor on each hypervisor to handle two cases]
1. One VM dies
1.1 Restart the dead VM (the restarted VM announces “I am the new one!”)
2. System is busy
2.1 Migrate to meet load balancing
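
The two cases handled by the Virtual Machine Manager, restarting a dead VM and migrating VMs away from a busy machine, can be sketched as a single management pass. Everything below is illustrative and assumed rather than taken from the CCMA software: the `Host` class, the `manage` function, load expressed as an integer percentage, the 80% threshold, restarting on the same host, and moving the smallest VM first.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    vms: Dict[str, int] = field(default_factory=dict)  # VM name -> load in percent
    dead: List[str] = field(default_factory=list)      # VMs the monitor reported dead

    def load(self) -> int:
        return sum(self.vms.values())


def manage(hosts: List[Host], busy_threshold: int = 80) -> List[str]:
    """One pass of a hypothetical VM manager; returns the actions it decided on."""
    actions = []
    for host in hosts:
        # Case 1: a VM died -> 1.1 restart it (here: on the same host).
        for vm in host.dead:
            actions.append(f"restart {vm} on {host.name}")
        host.dead.clear()
        # Case 2: the host is busy -> 2.1 migrate VMs to the least-loaded host.
        while host.load() > busy_threshold and len(host.vms) > 1:
            others = [h for h in hosts if h is not host]
            if not others:
                break
            target = min(others, key=Host.load)
            vm = min(host.vms, key=host.vms.get)  # move the smallest VM first
            if target.load() + host.vms[vm] > busy_threshold:
                break  # migrating would only move the overload elsewhere
            target.vms[vm] = host.vms.pop(vm)
            actions.append(f"migrate {vm} from {host.name} to {target.name}")
    return actions


hosts = [Host("pm1", {"a": 50, "b": 30, "c": 20}, dead=["d"]),
         Host("pm2", {"e": 10})]
actions = manage(hosts)
print(actions)
```

A real manager would drive these decisions from the per-hypervisor monitors and issue the restart and migration through the hypervisor's own interface; here the actions are only recorded as strings.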
Security
• Multi-tenancy architecture
• Inter-virtual-cluster
compartmentalization
– Works in the presence of
constant VM motion
• Virtual appliance-based
firewall and IDS/IPS
– Leverages open-source
firewall/IDS/IPS technology
• Support for AAA, VPN,
and standard access
control
System Management
• Leverages open-source network/system monitor tool and server configuration tool
• Discovery of comprehensive inter-service dependency map: how an arbitrary service depends on other services and in what temporal order
• Provides application-level performance monitoring support to cloud application management tool
• Comprehensive resource usage accounting for SLA or billing purposes
• Virtualization-aware, temperature-aware, and power-aware
[Figure: FCAPS tooling around the container computer and its network operating system. Fault: MANTIS; Configuration: CFENGINE; Accounting: RADIUS; Performance: GANGLIA; Security: LDAP; SNMP and IPMI agents.]
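
One use of an inter-service dependency map is to derive an order in which services can safely be started: each service comes after everything it depends on. The sketch below shows this with a topological sort from the Python standard library. The map itself is a made-up example, and the slides do not state how CCMA discovers or represents dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: service -> set of services it depends on.
depends_on = {
    "web": {"app", "dns"},
    "app": {"db", "ldap"},
    "db": {"storage"},
    "ldap": set(),
    "dns": set(),
    "storage": set(),
}

# static_order() yields each service only after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

Several valid orders exist for the same map, so only the relative positions matter: storage before db, db before app, and app before web. A cyclic map would make `static_order()` raise `graphlib.CycleError`, which is itself a useful diagnostic.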
Summary
Why do we need Cloud OS?
• A Warehouse Scale Computer (WSC) takes a holistic view
of the entire data center to make sure it works as if it were a
single server
• Cloud OS integrates server virtualization, storage
virtualization, and network virtualization to provide:
– Resource management for Virtual Data Centers and Virtual Clusters
– Scalable Data Center Networking
– Load Balancing of Virtual Cluster, Network Traffic, and Physical Machines
– Ease of management for all Data Center resources
– Highly Available services
– End-to-end security and QoS guarantee
• Taiwan's ODM manufacturers are uniquely positioned to take
advantage of the growth of the data center industry driven by cloud
computing
– WSC will be used in both Public clouds and Private clouds
– Cloud OS will significantly enhance the value of WSC’s
Q&A
Thank you!
