& Cloud OS
符儒嘉
ITRI Cloud Computing Center
Director, System Software Division
Agenda
• Introduction
• CCMA @ ITRI (ITRI Cloud Computing Center for Mobile Applications)
• Cloud OS
– Virtual Data Center & Virtual Clusters
– Virtualized Storage
– Networking in a Cloud Data Center
– Runtime Virtual Machine Management
– Security
• System Management
• Summary
Cloud Computing Definition
• Provisioning of dynamically scalable and virtualized
resources as a service over the Internet.
– Multi-tenancy
– Device & Location independence
– Ability to obtain virtual computing resources on demand
– Provides the illusion of infinite computing resources
– Self-provisioning of virtual resources
– Eliminates the need for up-front commitment by cloud
developers
– Provides the ability to pay as you go for use of computing
resources
– Reliability, Scalability, Security, Manageability
Cloud Computing vs. Utility Services
[Figure: cloud computing (data center; IaaS, PaaS, and SaaS providers; enterprises; ISVs; end users) compared with electricity supply (primary substation, secondary substation)]
Source: IEK (2010/02)
Timing is right
• Technology Push
– Broadband network connectivity is getting faster
and more reliable
– Internet service availability has significantly
improved
– Sufficient trust in infrastructure providers
– By many measures, Google is already a
critical service for most of the world, and it is
in the cloud!
• Market Pull
– Big Data
– Software installed on premise → Software as a
Service (SaaS)
– On-premise information technology (IT) → IT
service as a rented utility (as with electricity)
• “IT should not and will not be a core
competence for most corporations”
(Nicholas Carr, “Does IT Matter?” and
“The Big Switch”)
– Lowering up-front and day-to-day IT costs: pay
only for actual resource usage
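The pay-as-you-go argument can be made concrete with a little arithmetic. The sketch below compares owning enough servers for the peak hour against paying only for server-hours actually used. The demand curve, the price, and the function names are hypothetical, and it assumes the same per-server-hour price in both models, which real offerings rarely share.

```python
# Hypothetical demand: servers actually needed in each hour of a 6-hour window.
HOURLY_DEMAND = [10, 10, 40, 80, 40, 10]
RATE_CENTS = 10  # hypothetical price per server-hour, in cents (same for both models)


def provisioned_cost(demand, rate_cents):
    """Up-front model: own enough servers for the peak and pay for them every hour."""
    return max(demand) * len(demand) * rate_cents


def pay_as_you_go_cost(demand, rate_cents):
    """Utility model: pay only for the server-hours actually used."""
    return sum(demand) * rate_cents


def utilization(demand):
    """Fraction of peak-provisioned server-hours that were actually used."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    print(provisioned_cost(HOURLY_DEMAND, RATE_CENTS))    # 4800
    print(pay_as_you_go_cost(HOURLY_DEMAND, RATE_CENTS))  # 1900
    print(round(utilization(HOURLY_DEMAND), 3))           # 0.396
```

With a bursty load, peak provisioning leaves most paid-for capacity idle (here under 40% utilization), which is the gap the utility model removes.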
Cost of Data Center
Data center budget:
• Servers: 45%
• Power distribution & cooling: 25%
• Power draw (utility): 15%
• Network: 15%
Power usage (power distribution & cooling + power draw): 40%
Types of Clouds
[Figure: types of clouds, with labels for providers, service users, services, private cloud, cloud providers, cloud platform services (PaaS), and physical infrastructure]
Infrastructure as a Service
Example Players
• Amazon
• GoGrid
• RightScale
• Rackspace
• VMOps
• Eucalyptus
• ElasticHosts
•…
Platform as a Service
Example Players
• Microsoft Azure
• Google App Engine
• Force.com
• Rackspace Cloud
• Heroku
• QuickBase
• Caspio
•…
Software as a Service
Example Players
• Salesforce.com
• Adobe.com
• Autodesk
• WebEx
• Microsoft Office
• Gmail & other Google Apps
• Flickr
•…
DataCenter as a Computer
• The majority of cloud computing infrastructure consists of
reliable services delivered through data centers
• Traditional colocation data centers
– Multiple servers and communications gear co-located because of
common environmental & security needs
– Host a large number of relatively small or medium-sized
applications, each running on dedicated hardware
infrastructure
• Data centers for cloud computing platforms
– Belong to a single organization
– Use a relatively homogeneous hardware and system software
platform and share a common system management layer
– Run a smaller number of very large applications
– Cloud computing workloads must be designed to gracefully
tolerate large numbers of component faults with little or no
impact on service level performance and availability.
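Why faults must be treated as routine becomes clear from a quick calculation. The sketch below estimates daily server failures for a large fleet, assuming independent failures with a fixed per-server daily probability. The fleet size and failure rate are hypothetical placeholders, not measurements.

```python
import math


def expected_failures(num_servers, daily_failure_prob):
    """Expected number of servers that fail in one day (linearity of expectation)."""
    return num_servers * daily_failure_prob


def prob_any_failure(num_servers, daily_failure_prob):
    """Probability that at least one server fails in a day.

    Assumes independent failures: 1 - (1 - p)^n, computed with log1p/expm1
    so the result stays accurate when p is tiny.
    """
    return -math.expm1(num_servers * math.log1p(-daily_failure_prob))


if __name__ == "__main__":
    n, p = 10_000, 0.001  # hypothetical fleet size and per-server daily failure rate
    print(expected_failures(n, p))          # about 10 failures per day
    print(f"{prob_any_failure(n, p):.5f}")  # about 0.99995
```

Even with a 0.1% daily failure rate per server, a 10,000-server data center should expect several failures every day, so the software, not the hardware, has to absorb them.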
Warehouse Scale Computers (WSC)
• Not just a collection of servers
– Hundreds to thousands of servers running in concert
– Typically runs on a virtualized platform
– Fault behavior & energy considerations have a significant impact
– Needs to be considered as a single unit
• Must be highly manageable
– Deployment of software updates
– Monitoring & system management
• Affordability
– Currently powers public clouds such as Google, Amazon, Yahoo,
Microsoft, etc.
– Soon to be affordable for enterprises
• A rack of servers can easily have more than 600 cores
Google “Warehouse-Scale Computer” Data Center
“Secret Sauce” of Cloud Computing
• Commodity components
• Virtualization
– Servers, Memory, Storage, Network
• Self Provisioning
– Programmatic Control
• Elasticity
• Data vs. response time
– Data and traffic keep growing, but response time must
remain relatively constant
• Data Center must “scale out”
– Manageability
– High Availability
– “Green” Computing
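Elasticity and scale-out come down to resizing the server pool as load changes instead of provisioning for the peak. The sketch below sizes the pool so that no server exceeds a target utilization, a common rule of thumb for keeping response time from climbing as traffic grows (queueing delay rises sharply near saturation). The function name, the 70% target, and the capacities are hypothetical.

```python
def servers_needed(request_rate, per_server_capacity, target_util_pct=70):
    """Servers required so that each runs at no more than target_util_pct % of capacity.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    if per_server_capacity <= 0 or not 0 < target_util_pct <= 100:
        raise ValueError("capacity must be positive and target in (0, 100]")
    usable = per_server_capacity * target_util_pct  # usable capacity, scaled by 100
    needed = -(-request_rate * 100 // usable)       # ceil(rate / (capacity * pct / 100))
    return max(1, needed)                           # always keep at least one instance


if __name__ == "__main__":
    # Hypothetical: each server handles 100 req/s; traffic grows from 50 to 2,000 req/s.
    for rate in (50, 700, 2_000):
        print(rate, servers_needed(rate, 100))  # 1, 10, 29 servers
```

Because per-server load stays roughly fixed while the pool grows, the same logic run in reverse lets idle servers be released, which is where the "green" benefit comes from.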
The New Data Center Industry
• Container computers for high efficiency and
environmental conservation (packaging, PUE, …)
• Bundled software (Cloud OS) for integrated service, high
scalability, and availability
• Large enterprises will bypass traditional server channels
(IBM, HP, Dell, …)
– Purchase entire data centers directly from ODMs
• Significant cost reductions
• Horizontal scalability
• High availability
• Google already purchases directly from Taiwanese manufacturers
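PUE (Power Usage Effectiveness), cited above as a container-computer design target, is the ratio of total facility power to the power delivered to IT equipment; 1.0 would mean no overhead at all. A minimal sketch follows, with hypothetical wattages.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(total_facility_kw, it_equipment_kw):
    """Power spent on cooling, power-distribution losses, lighting, and so on."""
    return total_facility_kw - it_equipment_kw


if __name__ == "__main__":
    # Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching servers/storage/network.
    print(pue(1500, 1000))          # 1.5
    print(overhead_kw(1500, 1000))  # 500
```

Lowering PUE attacks the power-distribution, cooling, and power-draw share of the data center budget directly.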
ITRI Cloud Computing Application Technology Center
CCMA@ITRI
Mission Statement
[Figure: the center's three focus areas: hardware, Cloud OS, and data center know-how]
Container Computers
Data Center Architecture Know-how
• Treat the entire data center as a computer
– Air flow analysis
– Cooling architecture (thermal management)
– Power/energy management
– Focus on ease of system and network management
– What cannot be managed/monitored does not get deployed
• Modular and scalable (card to rack to container to
warehouse)
• Explore low-power commodity CPUs as building blocks
• Google data center tour (http://www.google.com/corporate/green/datacenters/summit.html)
System Software (Cloud OS)
• Virtualization Platform
– CPUs
– Storage (Filesystems)
– Network
• Resource Management
– Provisioning of virtual clusters
– Physical machine load balancing
– Network traffic load balancing
• Power Management
• Security
– Hypervisor protection
– Compartmentalization between clusters
• System Management
– FCAPS (fault, configuration, accounting, performance, and security management)
• High Availability
[Figure: virtual clusters (Mail, Bkup, HC, AppX), each made of VMs, running on the CCMA infrastructure software over physical nodes; layer-3 border routers, a layer-2-only data center network, a server load balancer cluster, compute server racks, and storage servers]
Architecture Principles
• Commodity hardware
– A set of compute servers, each equipped with
multiple homogeneous CPUs
• Requires CPU/memory/IO virtualization support
– A set of JBOD (just a bunch of disks) storage servers
proportionally intermixed with the compute servers
• Low-power CPU is sufficient; RAID is optional
– A layer-2-only network, consisting of top-of-rack switches and
core switches, connects all servers
• Everything is virtualized
– CPU, Memory, Storage, Network
• If a resource cannot be remotely managed, it
should not be part of the CCMA data center
Software Stack for Cloud OS
[Figure: Cloud OS software stack: Cloud Application Management Tool; Physical Cluster Deployment Tool; Security; Distributed Main/Secondary Storage; Intra-Virtual-Cluster Load Balancing; Power Management; Virtual Machine Management; all-layer-2 network; physical compute servers]
Virtualization Platform
• Leverage existing hypervisors
– Monitor VM performance
[Figure: virtual clusters (Mail, Bkup, HC, AppXYZ) running on nodes with monitoring agents; storage components: VM volume, iSCSI initiator, iSCSI target, DFS client, DFS metadata, DFS DataNodes, DMS]
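The storage layer lists DFS metadata alongside DFS DataNodes. The slides do not describe the placement policy, so the sketch below is purely illustrative: a hypothetical metadata service that maps each block of a VM volume to several distinct DataNodes (replication factor 3 assumed) and rotates the starting node so that load spreads across the storage servers. Class and method names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Toy DFS metadata service: maps (volume, block index) to DataNode replicas."""
    datanodes: list
    replication: int = 3                              # assumed replication factor
    block_map: dict = field(default_factory=dict)

    def allocate(self, volume, num_blocks):
        """Assign replicas for num_blocks new blocks of a volume; return their placements."""
        n = len(self.datanodes)
        if self.replication > n:
            raise ValueError("not enough DataNodes for the replication factor")
        start = len(self.block_map)  # rotate the first node across all allocations
        for i in range(num_blocks):
            first = (start + i) % n
            # consecutive nodes modulo n are distinct because replication <= n
            replicas = [self.datanodes[(first + r) % n] for r in range(self.replication)]
            self.block_map[(volume, i)] = replicas
        return [self.block_map[(volume, i)] for i in range(num_blocks)]

    def locate(self, volume, block_index):
        """Look up where a block's replicas live (what a DFS client would ask)."""
        return self.block_map[(volume, block_index)]


if __name__ == "__main__":
    ms = MetadataServer(["dn1", "dn2", "dn3", "dn4"])
    print(ms.allocate("vm1-disk", 2))  # [['dn1', 'dn2', 'dn3'], ['dn2', 'dn3', 'dn4']]
    print(ms.locate("vm1-disk", 1))    # ['dn2', 'dn3', 'dn4']
```

Keeping each replica on a different DataNode is what lets a JBOD storage server fail without losing a VM's volume, which is why RAID can be optional.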
Networking in Cloud OS
• Scalable Load Balancer Cluster
– Inter-virtual-cluster load balancing
– Each member of the SLB cluster is responsible for load balancing one or
more virtual clusters (VCs)
– Load balancing based on the current load on virtual machines
• Layer 2 only
– How to scale up to 100,000 physical servers with commodity Ethernet
switches
– Load balancing of network packet routing
• Support for fast fail-over
– Pre-computed main and alternative routes
– Fast failure detection and re-routing
• Use Valiant load balancing to avoid congestion or bottlenecks
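The scalable load balancer (SLB) cluster described above can be sketched in a few lines: each SLB member owns a set of virtual clusters and sends each new request to the least-loaded VM in the target cluster. This is a hypothetical illustration of "load balance based on current load on virtual machines"; the class, its methods, and the load metric (a fraction such as CPU utilization) are assumptions, not the CCMA implementation.

```python
class SLBMember:
    """One member of a scalable load balancer (SLB) cluster (hypothetical sketch)."""

    def __init__(self, name, owned_vcs):
        self.name = name
        self.owned_vcs = set(owned_vcs)  # virtual clusters this member balances

    def pick_vm(self, vc, vm_loads):
        """Return the least-loaded VM in vc; vm_loads maps VM name -> current load."""
        if vc not in self.owned_vcs:
            raise KeyError(f"{self.name} does not balance {vc}")
        if not vm_loads:
            raise ValueError(f"no VMs available in {vc}")
        # Least-loaded selection; ties broken by VM name for deterministic results.
        return min(vm_loads, key=lambda vm: (vm_loads[vm], vm))


def member_for(vc, members):
    """Find the SLB member responsible for a virtual cluster."""
    for member in members:
        if vc in member.owned_vcs:
            return member
    raise KeyError(vc)


if __name__ == "__main__":
    members = [SLBMember("slb1", ["Mail", "Bkup"]), SLBMember("slb2", ["HC"])]
    slb = member_for("Mail", members)
    print(slb.name, slb.pick_vm("Mail", {"vm1": 0.8, "vm2": 0.3, "vm3": 0.5}))  # slb1 vm2
```

Partitioning virtual clusters across SLB members lets the balancer tier grow by adding members, rather than funneling all traffic through one device.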
Layer-2-Only Data Center Network
[Figure: layer-2-only data center network. Core and region layer-2 switches connect compute server racks of nodes (Node #1 … Node #400), each hosting VMs #1 to #24, with VM addresses shown as IP1/MAC1, IP2/MAC2. Annotations: Valiant load balancing, server load balancing, network load balancing, server fail-over, fast failure detection and re-routing]
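Valiant load balancing sends traffic to a pseudo-randomly chosen intermediate (core) switch and from there to the destination, which spreads load evenly over the core regardless of the traffic pattern. The sketch below shows a per-flow variant combined with pre-computed alternative routes for fast fail-over. Hashing the flow identifier, the route tuple format, and all names are assumptions made for illustration.

```python
import hashlib


def flow_hash(flow_id):
    """Stable (process-independent) hash of a flow identifier, e.g. a 5-tuple string."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def precompute_routes(src, dst, core_switches):
    """All two-hop paths src -> core -> dst, computed ahead of time."""
    return [(src, core, dst) for core in core_switches]


def vlb_route(flow_id, routes, failed=frozenset()):
    """Per-flow Valiant load balancing over pre-computed routes.

    The hash spreads flows over all core switches while keeping every packet
    of one flow on the same path (no reordering). If the chosen core switch
    has failed, fall through to the next pre-computed alternative.
    """
    if not routes:
        raise ValueError("no routes")
    start = flow_hash(flow_id) % len(routes)
    for k in range(len(routes)):
        route = routes[(start + k) % len(routes)]
        if route[1] not in failed:
            return route
    raise RuntimeError("all core switches have failed")


if __name__ == "__main__":
    routes = precompute_routes("tor1", "tor7", ["core1", "core2", "core3", "core4"])
    main = vlb_route("10.0.0.1:5000->10.0.1.9:80", routes)
    backup = vlb_route("10.0.0.1:5000->10.0.1.9:80", routes, failed={main[1]})
    print(main, backup)
```

Because alternatives are already computed, fail-over is a table lookup once a failure is detected, rather than a route recomputation.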
Virtual Machine Management
• Objective
– Power Management
– Physical Machine Load Balancing
• Monitor runtime VM statistics
– Heuristic calculation to predict workload for virtual clusters
• Determine power down/up of machines
– Two-dimensional bin packing
– VM migration algorithm
• Physical machine load balancing
– Migration of VMs to other physical machines to balance CPU and
I/O load
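The slide calls the workload prediction only a "heuristic calculation" and does not name it. As a stand-in (our assumption, not necessarily the heuristic CCMA uses), an exponentially weighted moving average over recent per-VM load samples gives a simple next-interval forecast, and a virtual cluster's forecast is the sum over its VMs. The smoothing factor and the function names are hypothetical.

```python
def ewma_forecast(samples, alpha=0.5):
    """Exponentially weighted moving average of load samples (oldest first).

    Each new sample gets weight alpha, older history gets (1 - alpha); the
    final value is used as the predicted load for the next interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate


def predict_cluster_load(vm_samples, alpha=0.5):
    """Predicted load of a virtual cluster: sum of per-VM forecasts."""
    return sum(ewma_forecast(s, alpha) for s in vm_samples.values())


if __name__ == "__main__":
    # Hypothetical CPU-load samples (percent of one core) for two VMs.
    print(ewma_forecast([20, 40, 60]))                                   # 45.0
    print(predict_cluster_load({"vm1": [20, 40, 60], "vm2": [10, 10, 10]}))  # 55.0
```

A larger alpha reacts faster to load spikes but also powers machines up and down more often; the choice trades responsiveness against churn.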
[Figure: VM consolidation. Consignor = PM to be turned off; consignment = VM to be migrated; consignee = PM receiving the migrated VM]
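The consolidation step ("2-dimensional bin packing") decides how many physical machines must stay powered on. A common heuristic for this is first-fit decreasing; the sketch below applies it to two resource dimensions, assumed here to be CPU cores and memory (the slide does not say which two). It is an illustrative stand-in, not the CCMA migration algorithm, and the names and capacities are hypothetical.

```python
def consolidate(vms, pm_cpu, pm_mem):
    """First-fit decreasing for 2-D (CPU, memory) vector bin packing.

    vms: dict mapping VM name -> (cpu, mem) demand, in the units of pm_cpu/pm_mem.
    Returns one dict per powered-on PM (VM name -> demand); any PM beyond
    len(result) is a candidate for power-down.
    """
    # Largest normalized demand first; sorted() is stable, so ties keep input order.
    order = sorted(
        vms,
        key=lambda v: max(vms[v][0] / pm_cpu, vms[v][1] / pm_mem),
        reverse=True,
    )
    placements = []  # VMs assigned to each powered-on PM
    free = []        # remaining (cpu, mem) on each powered-on PM
    for vm in order:
        cpu, mem = vms[vm]
        if cpu > pm_cpu or mem > pm_mem:
            raise ValueError(f"{vm} does not fit on an empty PM")
        for i, (free_cpu, free_mem) in enumerate(free):
            if cpu <= free_cpu and mem <= free_mem:  # first PM with room in both dimensions
                placements[i][vm] = (cpu, mem)
                free[i] = (free_cpu - cpu, free_mem - mem)
                break
        else:  # no existing PM fits: power on another one
            placements.append({vm: (cpu, mem)})
            free.append((pm_cpu - cpu, pm_mem - mem))
    return placements


if __name__ == "__main__":
    # Hypothetical VMs as (cores, GB) on PMs with 8 cores and 16 GB each.
    demo = {"a": (4, 8), "b": (2, 4), "c": (4, 4), "d": (2, 8), "e": (1, 2)}
    for i, pm in enumerate(consolidate(demo, 8, 16)):
        print(f"PM{i}: {sorted(pm)}")  # PM0: ['a', 'c']  PM1: ['b', 'd', 'e']
```

Comparing the packing against the current VM-to-PM mapping yields the consignors (PMs that end up empty and can be turned off) and the consignments (VMs that must move).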
Fail-over & Load Balancing
1. A VM dies.
1.1 Restart the dead VM (“I am the new one!”).
2. The system is busy.
2.1 Migrate VMs to meet load-balancing goals.
[Figure: monitors running on the hypervisor]
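The two cases above can be expressed as one pass of a monitoring loop. The sketch below is a hypothetical policy, not CCMA's monitor: it restarts dead VMs in place, then moves one VM off any host whose CPU load exceeds a threshold onto the least-loaded other host. The `Host` structure, the 85% threshold, and the naive choice of which VM to move are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0
    vms: dict        # VM name -> "running" or "dead"


def monitor_step(hosts, busy_threshold=0.85):
    """One monitoring pass; returns (action, vm, source host, target host) tuples."""
    actions = []
    # Case 1: a VM died -> restart it (step 1.1).
    for host in hosts:
        for vm, state in host.vms.items():
            if state == "dead":
                host.vms[vm] = "running"
                actions.append(("restart", vm, host.name, host.name))
    # Case 2: a host is busy -> migrate a VM to the least-loaded other host (step 2.1).
    for host in hosts:
        others = [h for h in hosts if h is not host]
        if host.cpu_load > busy_threshold and host.vms and others:
            target = min(others, key=lambda h: h.cpu_load)
            if target.cpu_load < busy_threshold:
                vm = next(iter(host.vms))  # naive choice: first VM on the busy host
                target.vms[vm] = host.vms.pop(vm)
                actions.append(("migrate", vm, host.name, target.name))
    return actions


if __name__ == "__main__":
    pm1 = Host("pm1", 0.95, {"vm1": "running", "vm2": "dead"})
    pm2 = Host("pm2", 0.20, {"vm3": "running"})
    for action in monitor_step([pm1, pm2]):
        print(action)
```

A production monitor would also update host loads after each migration and pick the VM whose move best relieves the hot host; the sketch omits both for brevity.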
Security
• Multi-tenancy architecture
• Inter-virtual-cluster compartmentalization
– Works in the presence of constant VM motion
• Virtual appliance-based firewall and IDS/IPS
– Leverages open-source firewall/IDS/IPS technology
• Support for AAA, VPN, and standard access control
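One way inter-virtual-cluster compartmentalization can keep working while VMs constantly migrate is to key every rule on a VM's tenant and virtual cluster, never on its physical host or switch port. The sketch below illustrates that idea with a hypothetical policy object; the rule format and the blanket cross-tenant deny are assumptions, not the CCMA design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    tenant: str
    cluster: str  # virtual cluster the VM belongs to


class Compartmentalizer:
    """Hypothetical inter-virtual-cluster policy, independent of physical location."""

    def __init__(self):
        self.allowed = set()  # (src_cluster, dst_cluster) pairs allowed to communicate
        self.location = {}    # VM name -> current physical host (informational only)

    def allow(self, src_cluster, dst_cluster):
        self.allowed.add((src_cluster, dst_cluster))

    def migrate(self, vm, new_host):
        # Recording the move does not touch any rule: policy follows the VM.
        self.location[vm.name] = new_host

    def permits(self, src, dst):
        if src.tenant != dst.tenant:
            return False  # assumed policy: tenants never talk directly
        if src.cluster == dst.cluster:
            return True   # traffic inside one virtual cluster
        return (src.cluster, dst.cluster) in self.allowed


if __name__ == "__main__":
    mail1, bk1 = VM("mail1", "t1", "Mail"), VM("bk1", "t1", "Bkup")
    policy = Compartmentalizer()
    policy.allow("Mail", "Bkup")
    policy.migrate(mail1, "pm7")
    print(policy.permits(mail1, bk1), policy.permits(bk1, mail1))  # True False
```

Since nothing in the check depends on `location`, a live migration never opens or closes a path between clusters.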
System Management
• Leverages open-source network/system monitoring tools and
server configuration tools
[Figure: management via SNMP, IPMI, and agents]