
MCAE25 Cloud Computing

Chethan Venkatesh Department of MCA M S Ramaiah Institute of Technology


25 December 2012 Chethan Venkatesh, Dept of MCA, MSRIT 1

UNIT-IV


Programming Support of Google App Engine


Programming the Google App Engine : Web resources : http://code.google.com/appengine. Books and articles : www.byteonic.com/2009/overview-of-java-in-google-appengine/



Key features of the GAE programming model, which supports two languages, Java and Python :
1. A client environment with an Eclipse plug-in for Java lets you debug GAE applications on the local machine.
2. GWT (Google Web Toolkit) is available for Java web application developers. Developers can also use other languages through a JVM-based interpreter or compiler, such as JavaScript or Ruby.
Python is often used with frameworks such as Django and CherryPy. Google also supplies a built-in webapp Python environment.



Powerful constructs for storing and accessing data : The datastore is a NoSQL data management system with schema-less properties. Java offers the JDO (Java Data Objects) and JPA (Java Persistence API) interfaces, implemented by the open source DataNucleus Access Platform. Python has an SQL-like query language called GQL. The datastore is strongly consistent and uses optimistic concurrency control.



An update of an entity occurs in a transaction that is retried a fixed number of times if other processes are trying to update the same entity simultaneously. Data store implements transactions across its distributed network using entity groups. Entities of the same group are stored together. A transaction manipulates entities within a single group. GAE applications can assign entities to groups when they are created. Performance can be enhanced by in-memory caching using memcache.
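The retry behaviour described above can be pictured as a read-modify-write loop guarded by a version check. This is a minimal in-memory sketch under stated assumptions: `ToyDatastore`, `ConcurrentModification`, and `transact_with_retries` are hypothetical names (not the App Engine API), and a per-entity version counter stands in for whatever conflict detection the real datastore uses.

```python
import threading


class ConcurrentModification(Exception):
    """Raised when an entity changed between our read and our commit."""


class ToyDatastore:
    """Hypothetical in-memory store; NOT the App Engine datastore API."""

    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # makes the check-and-set atomic

    def get(self, key):
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        # Optimistic check: succeed only if nobody committed since our read.
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                raise ConcurrentModification(key)
            self._data[key] = (version + 1, new_value)


def transact_with_retries(store, key, update, retries=3):
    """Apply update(old_value) -> new_value, retrying a fixed number of times."""
    for _ in range(retries + 1):
        version, value = store.get(key)
        try:
            store.commit(key, version, update(value))
            return
        except ConcurrentModification:
            continue                    # another writer won: re-read and retry
    raise ConcurrentModification(f"gave up on {key!r} after {retries} retries")
```

For example, `transact_with_retries(store, "counter", lambda v: (v or 0) + 1)` increments a counter. No lock is held while `update` runs; conflicts are only detected at commit time, which is what makes the scheme optimistic.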


Blobstore :- stores large files. Google SDC (Secure Data Connector) :- tunnels through the Internet and links your intranet to an external GAE application. URL Fetch operation :- lets applications fetch resources and communicate with other hosts over the Internet using HTTP and HTTPS requests, so applications can access web services, resources, and other data on the Internet. A specialized mail mechanism sends e-mail from your GAE application. Google's corporate facilities include Maps, Sites, Groups, Calendar, Docs, and YouTube; these support the Google Data API and can be used inside GAE.


Google Accounts :- used by applications for user authentication. Handles account creation and sign-in, which is easy for users who already have a Google account. Image service :- manipulates image data (resize, rotate, flip, crop). Cron jobs :- let applications perform scheduled tasks outside of responding to web requests. GAE applications consume resources up to configured limits, or quotas, and use is free up to certain quotas.



Google File System (GFS) : the fundamental storage service for Google's search engine. The volume of crawled and saved web data was huge, so Google needed a distributed file system to redundantly store massive amounts of data on cheap and unreliable computers. Assumptions : The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.



The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them. The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. The workloads also have many large, sequential writes that append data to files.


The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. High sustained bandwidth is more important than low latency.



Design Decision :-

- Files stored as chunks: fixed size (64 MB).
- Reliability through replication: each chunk replicated across 3+ chunkservers.
- Single master to coordinate access and keep metadata: simple centralized management.
- No data caching: little benefit due to large data sets and streaming reads.
- Familiar interface, but customized API: simplify the problem and focus on Google apps; add snapshot and record append operations.
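Because chunks have a fixed size, a client can translate a file byte offset into a chunk index with integer division before asking the master where that chunk lives (this translation step is described in the GFS paper). A minimal sketch; the function names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk within a file that holds byte_offset."""
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> list:
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        idx, start_in_chunk = divmod(offset, CHUNK_SIZE)
        nbytes = min(CHUNK_SIZE - start_in_chunk, end - offset)
        pieces.append((idx, start_in_chunk, nbytes))
        offset += nbytes
    return pieces
```

For example, byte 200,000,000 lies in chunk 2, and a 20-byte read that starts 10 bytes before a chunk boundary touches two chunks, so the client needs two chunk locations.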


Single master, multiple chunkservers.
Master :
- Manages the namespace and metadata.
- Manages chunk creation, replication, and placement.
- Performs the snapshot operation to create a duplicate of a file or directory tree.
- Performs checkpointing and logging of changes to metadata.
- Performs load balancing.
- Reclaims unused storage.
- Periodically communicates with chunkservers (HeartBeat messages).

Chunkservers :
- Store chunk data and a checksum for each block.
- On startup or failure recovery, report their chunks to the master.
- Periodically report a subset of their chunks to the master (to detect chunks that are no longer needed).
- Store chunks on local disk as Linux files.
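HeartBeat messages and chunk reports give the master enough information to notice a silent chunkserver and to see which chunks have fallen below the replication goal. The sketch below assumes a simple timeout rule; `ToyMaster`, `HEARTBEAT_TIMEOUT`, and the 60-second value are hypothetical, and real re-replication also involves choosing targets and prioritizing chunks.

```python
REPLICATION_GOAL = 3       # "each chunk replicated across 3+ chunkservers"
HEARTBEAT_TIMEOUT = 60.0   # hypothetical: seconds of silence before a server counts as dead


class ToyMaster:
    """Hypothetical master bookkeeping; timestamps are passed in explicitly."""

    def __init__(self):
        self.last_heartbeat = {}   # chunkserver -> time of its last HeartBeat
        self.locations = {}        # chunk handle -> set of chunkservers reporting it

    def heartbeat(self, server, chunks, now):
        """A chunkserver reports in, listing (a subset of) the chunks it holds."""
        self.last_heartbeat[server] = now
        for handle in chunks:
            self.locations.setdefault(handle, set()).add(server)

    def dead_servers(self, now):
        return {s for s, t in self.last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT}

    def under_replicated(self, now):
        """chunk handle -> number of new replicas needed to reach the goal."""
        dead = self.dead_servers(now)
        needed = {}
        for handle, servers in self.locations.items():
            live = len(servers - dead)
            if live < REPLICATION_GOAL:
                needed[handle] = REPLICATION_GOAL - live
        return needed
```

Passing `now` explicitly keeps the example deterministic; a real master would read a clock.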


From distributed systems we know that a single master is a:
- Single point of failure
- Scalability bottleneck

GFS solutions:
- Shadow masters.
- Minimize master involvement (clients talk to chunkservers directly):
  - never move data through the master; use it only for metadata,
  - cache metadata at clients,
  - use a large chunk size,
  - the master delegates authority to primary replicas in data mutations (chunk leases).

Simple, and good enough!
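Caching metadata at clients is what keeps the master off the data path: after one lookup, a client talks to chunkservers directly. A minimal sketch under assumptions; `ToyClient` and the `master_lookup` callback are hypothetical, and a real client would also expire entries after a timeout.

```python
class ToyClient:
    """Hypothetical GFS-style client that caches chunk locations."""

    def __init__(self, master_lookup):
        # master_lookup(filename, chunk_index) -> (chunk_handle, [replica addresses])
        self._master_lookup = master_lookup
        self._cache = {}
        self.master_calls = 0      # how often we had to contact the master

    def locate(self, filename, chunk_index):
        key = (filename, chunk_index)
        if key not in self._cache:             # miss: one metadata request
            self.master_calls += 1
            self._cache[key] = self._master_lookup(filename, chunk_index)
        return self._cache[key]                # hit: no master involvement

    def invalidate(self, filename, chunk_index):
        """Drop a stale entry, e.g. after a replica turns out to be unreachable."""
        self._cache.pop((filename, chunk_index), None)
```

Repeated reads of the same chunk then cost the master nothing, which is why a single master can serve many clients.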



Data mutation (Write, Append operations) in GFS :-

The data must be written to all replicas of the chunk. Goal: minimize involvement of the master.


Steps in mutation :
1. The client asks the master which chunk server holds the current lease for the chunk and the locations of the other replicas. If no one has a lease, the master grants one to a replica it chooses (not shown).
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client caches this data for future mutations. It needs to contact the master again only when the primary becomes unreachable or replies that it no longer holds a lease.



3. The client pushes the data to all the replicas. A client can do so in any order. Each chunk server will store the data in an internal LRU buffer cache until the data is used or aged out. By decoupling the data flow from the control flow, we can improve performance by scheduling the expensive data flow based on the network topology regardless of which chunk server is the primary.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The request identifies the data pushed earlier to all of the replicas. The primary assigns consecutive serial numbers to all the mutations it receives, possibly from multiple clients, which provides the necessary serialization. It applies the mutation to its own local state in serial number order.


5. The primary forwards the write request to all secondary replicas. Each secondary replica applies mutations in the same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they have completed the operation.
7. The primary replies to the client. Any errors encountered at any of the replicas are reported to the client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the secondary replicas.



(If it had failed at the primary, it would not have been assigned a serial number and forwarded.) The client request is considered to have failed, and the modified region is left in an inconsistent state. The GFS client code handles such errors by retrying the failed mutation. It will make a few attempts at steps (3) through (7) before falling back to a retry from the beginning of the write.
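Steps 3 to 7 can be sketched in a few classes to show how the primary's serial numbers keep every replica in the same order, and how a partial failure leaves replicas inconsistent. This is a simplified single-process model; `Replica`, `Primary`, and `client_write` are hypothetical names, and leases, acknowledgements, and retries are left out.

```python
class Replica:
    """Hypothetical chunk replica (secondary)."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}        # data_id -> bytes pushed in step 3 (an LRU cache in GFS)
        self.chunk = bytearray()
        self.applied = []       # serial numbers applied, in order

    def push_data(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        self.chunk += self.buffer.pop(data_id)   # KeyError if the data never arrived
        self.applied.append(serial)


class Primary(Replica):
    """Replica holding the chunk lease; it serializes all mutations."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        if data_id not in self.buffer:
            raise KeyError(data_id)          # fails at the primary: no serial assigned
        serial = self.next_serial            # step 4: assign a serial number...
        self.next_serial += 1
        self.apply(serial, data_id)          # ...and apply locally in that order
        failed = []
        for s in self.secondaries:           # step 5: forward in the same order
            try:
                s.apply(serial, data_id)
            except KeyError:
                failed.append(s.name)        # this region is now inconsistent
        return failed                        # steps 6-7: report errors to the client


def client_write(primary, replicas, data_id, data):
    for r in replicas:                        # step 3: push data, in any order
        r.push_data(data_id, data)
    return primary.write(data_id)             # step 4 onward (control flow)
```

A non-empty return value corresponds to the error case above: the client would retry steps 3 to 7.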



BigTable, Google's NoSQL System :

Lots of (semi-)structured data at Google :
- URLs: contents, crawl metadata, links, anchors, PageRank, ...
- Per-user data: user preference settings, recent queries/search results, ...
- Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, ...

Scale is large :
- Billions of URLs, many versions per page (~20 KB per version)
- Hundreds of millions of users, thousands of queries per second
- 100 TB+ of satellite image data


Why not a commercial database?
- Scale is too large for most commercial databases.
- Even if it weren't, the cost would be very high.
- Building internally means the system can be applied across many projects for low incremental cost.
- Low-level storage optimizations help performance significantly, which is much harder to do when running on top of a database layer.



BigTable is :
- A distributed multi-level map, with an interesting data model
- Fault-tolerant and persistent
- Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, efficient scans
- Self-managing: servers can be added/removed dynamically, and servers adjust to load imbalance
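On a single machine, the "multi-level map" can be pictured as a map from (row key, column, timestamp) to value, with rows kept in sorted order so range scans are cheap. A minimal sketch under that assumption; `ToyBigtable` is a hypothetical name, and the reversed-URL row key and `contents:` / `anchor:` columns follow the webtable example from the BigTable paper.

```python
from collections import defaultdict


class ToyBigtable:
    """Single-machine model: (row, column, timestamp) -> value."""

    def __init__(self):
        # row -> column ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        self._rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        """Newest value at or before ts (newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield rows in [start_row, end_row) in lexicographic order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, dict(self._rows[row])
```

Storing URLs reversed (`com.cnn.www`) keeps pages from the same domain in adjacent rows, which is the kind of row-key locality the tablet discussion relies on.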



The BigTable system is built on top of existing Google cloud infrastructure. It uses the following building blocks :
1. GFS : stores persistent state
2. Scheduler : schedules jobs involved in BigTable serving
3. Lock service : master election, location bootstrapping
4. MapReduce : often used to read/write BigTable data



Tablets :
- Large tables are broken into tablets at row boundaries.
- A tablet holds a contiguous range of rows; clients can often choose row keys to achieve locality.
- Aim for ~100 MB to 200 MB of data per tablet.
- Each serving machine is responsible for ~100 tablets. This gives fast recovery (100 machines each pick up 1 tablet from a failed machine) and fine-grained load balancing (migrate tablets away from an overloaded machine).
- The master makes load-balancing decisions.
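Cutting a sorted table into tablets only at row boundaries can be sketched as a single pass that closes a tablet whenever the next row would push it past the size target. This is a static-partition sketch under assumptions (BigTable itself splits tablets dynamically as they grow); `split_into_tablets` and the 100 MB target are illustrative.

```python
TARGET_TABLET_BYTES = 100 * 1024 * 1024   # low end of the ~100-200 MB target


def split_into_tablets(rows, target=TARGET_TABLET_BYTES):
    """rows: (row_key, size_in_bytes) pairs in sorted key order.
    Returns (first_row, last_row) ranges, cut only at row boundaries."""
    tablets, first, size, prev = [], None, 0, None
    for key, nbytes in rows:
        if first is not None and size + nbytes > target:
            tablets.append((first, prev))     # close the current tablet
            first, size = None, 0
        if first is None:
            first = key                       # a new tablet starts at this row
        size += nbytes
        prev = key
    if first is not None:
        tablets.append((first, prev))
    return tablets
```

A single row larger than the target still gets a tablet of its own, since rows are never split.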



Tablet Location Hierarchy :



Since tablets move around from server to server, given a row, how do clients find the right machine? They need to find the tablet whose row range covers the target row.
- One approach: use the BigTable master. But a central server would almost certainly be a bottleneck in a large system.
- Instead: store special tables containing tablet location info in the BigTable cell itself.



Google's approach : a 3-level hierarchical lookup scheme for tablets. A location is the ip:port of the relevant server.
- 1st level: bootstrapped from the lock service; points to the owner of META0.
- 2nd level: uses META0 data to find the owner of the appropriate META1 tablet.
- 3rd level: the META1 table holds the locations of the tablets of all other tables. The META1 table itself can be split into multiple tablets.
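The three-level lookup can be sketched as two binary searches over sorted "end row key" indexes, starting from a root location obtained from the lock service. This is a simplified model under assumptions: a single user table, each META tablet represented as a sorted list of `(end_row_key, location)` pairs, and a hypothetical `END` sentinel for the end of the key space; client-side location caching is omitted.

```python
from bisect import bisect_left

END = "\uffff"  # hypothetical sentinel: "end of the key space"


def find(index, row):
    """index: list of (end_row_key, location) sorted by end key.
    Return the location of the first tablet whose end key >= row."""
    ends = [end for end, _ in index]
    return index[bisect_left(ends, row)][1]


def locate_tablet(root_location, tablets_by_server, row):
    """root_location: read from the lock service (level 1), points at META0.
    tablets_by_server: server -> the (end_key, location) index it serves."""
    meta1_server = find(tablets_by_server[root_location], row)   # level 2: META0
    return find(tablets_by_server[meta1_server], row)             # level 3: META1
```

Each level narrows the search by row range, so no single server has to know every tablet's location.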



Chubby, Google's Distributed Lock Service : Provides a coarse-grained locking service. Stores small files inside Chubby storage, which provides a namespace organized as a file system tree. Files stored are small compared to GFS files. Uses the Paxos agreement protocol among its replicas, which makes it reliable.


Two main components, which communicate via RPC:
- Server (Chubby cell)
- Client library
An optional proxy can sit between clients and the Chubby cell.


Programming on Amazon AWS and Microsoft Azure

Programming on Amazon EC2 : Amazon was the first company to introduce VMs in application hosting. Customers rent VMs instead of physical machines and can load any software on a VM. Elastic feature: customers can create, launch, and terminate server instances as needed, and pay by the hour for active servers. Amazon provides preinstalled VM images, called Amazon Machine Images (AMIs), preconfigured with operating systems based on Linux or Windows plus additional software. The following table defines 3 types of AMI.


Image Type : Definition
Private : Images created by you, which are private by default. You can grant access to other users to launch your private images.
Public : Images created by users and released to the Amazon Web Services community, so anyone can launch instances based on them and use them any way they like. The Amazon Web Services Developer Connection Web site lists all public images.
Paid : You can create images providing specific functions that can be launched by anyone willing to pay you per each hour of usage on top of Amazon charges.


[Figure: Execution environment of Amazon EC2. Amazon Machine Images (public, private, and paid AMIs); the workflow Create an AMI, Create Key Pair, Configure Firewall, Launch; Elastic IP address and Elastic Block Store; all on a virtualization layer over compute, storage, and server resources.]


AMIs are the templates for instances, which are running VMs. The workflow to create a VM is : Create an AMI, Create Key Pair, Configure Firewall, Launch.

This sequence is supported by public, private, and paid AMIs. The following table shows the instance types available on Amazon EC2 (Oct 6, 2010).


Compute Instance        | Memory (GB) | ECU (EC2 Compute Units) | Virtual Cores | Storage (GB) | Platform (bit)
Standard: Small         | 1.7         | 1                       | 1             | 160          | 32
Standard: Large         | 7.5         | 4                       | 2             | 850          | 64
Standard: Extra Large   | 15          | 8                       | 4             | 1690         | 64
Micro                   | 0.613       | Up to 2                 | -             | EBS only     | 32 or 64
High-Memory             | 17.1        | 6.5                     | 2             | 420          | 64
High-Memory: Double     | 34.2        | 13                      | 4             | 850          | 64
High-Memory: Quadruple  | 68.4        | 26                      | 8             | 1690         | 64
High-CPU: Medium        | 1.7         | 5                       | 2             | 350          | 32
High-CPU: Extra Large   | 7           | 20                      | 8             | 1690         | 64
Cluster Compute         | 23          | 33.5                    | 8             | 1690         | 64
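The Create Key Pair, Configure Firewall, Launch workflow described earlier maps onto a handful of EC2 API calls. A sketch using the boto3 EC2 client, under stated assumptions: the AMI already exists (covering the Create an AMI step), the key, group, and instance-type names are illustrative, and SSH is opened to the whole Internet only to keep the example short. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def launch_instance(ec2, ami_id, key_name, group_name, instance_type):
    """Create Key Pair -> Configure Firewall -> Launch, using an existing AMI.
    `ec2` is expected to behave like boto3.client("ec2")."""
    # Create Key Pair: AWS returns the private key only once, so keep it.
    key = ec2.create_key_pair(KeyName=key_name)
    private_key_pem = key["KeyMaterial"]

    # Configure Firewall: a security group that allows inbound SSH.
    group = ec2.create_security_group(GroupName=group_name,
                                      Description="SSH access for demo instance")
    group_id = group["GroupId"]
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                        # 0.0.0.0/0 is for illustration; restrict this in practice
                        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}])

    # Launch: one instance using the AMI as its template.
    resp = ec2.run_instances(ImageId=ami_id, InstanceType=instance_type,
                             MinCount=1, MaxCount=1, KeyName=key_name,
                             SecurityGroupIds=[group_id])
    return resp["Instances"][0]["InstanceId"], private_key_pem
```

Typical use would be `launch_instance(boto3.client("ec2", region_name="us-east-1"), my_ami_id, "demo-key", "demo-ssh", "t3.micro")`, where `my_ami_id` is your own image ID; instance type names today differ from the 2010 list in the table.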


Amazon Simple Storage Service (S3) : Provides a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. S3 provides an object-oriented storage service. Users can access their objects through the Simple Object Access Protocol (SOAP) or REST, using browsers or other client programs that support these interfaces. SQS (Simple Queue Service) :- responsible for ensuring a reliable message service between two processes, even if the receiver processes are not running. The following figure shows the S3 execution environment.



[Figure: Amazon S3 execution environment. A user reaches S3 through a SOAP interface or a REST interface. A bucket stores objects; the object is the basic unit of data and is attributed with a key (used to retrieve the data object), a value, metadata, and access control. S3 runs on a virtualization layer.]


Object-based storage. Objects range from 1 B to 5 GB each. Redundant through geographic dispersion. 99.99% availability goal. Authentication mechanisms; objects can be private or public. Per-object URLs and ACLs. BitTorrent support (the default download protocol is HTTP).
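The bucket/key/object model and the private-or-public rule can be sketched in memory. This is a hypothetical model, not the S3 API: `ToyBucket` and `S3Object` are illustrative names, the `authenticated` flag is a crude stand-in for real ACL checks, and the path-style URL is just one form a per-object URL can take.

```python
from dataclasses import dataclass, field

MAX_OBJECT_BYTES = 5 * 1024 ** 3   # objects of 1 B to 5 GB, per the 2010 limits above


@dataclass
class S3Object:
    value: bytes
    metadata: dict = field(default_factory=dict)
    public: bool = False            # objects are private unless made public


class ToyBucket:
    """Hypothetical model of a bucket: key -> object (value, metadata, access)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, value, metadata=None, public=False):
        if not 1 <= len(value) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 B and 5 GB")
        self.objects[key] = S3Object(value, metadata or {}, public)

    def get(self, key, authenticated=False):
        obj = self.objects[key]
        if not (obj.public or authenticated):
            raise PermissionError(f"{key} is private")
        return obj.value

    def url(self, key):
        # Per-object URL; path-style form chosen for illustration.
        return f"https://s3.amazonaws.com/{self.name}/{key}"
```

The key is the only way to retrieve an object, so choosing meaningful keys plays the role that directory paths play in a file system.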


Pricing : $0.15 per GB per month for storage. The first 1 GB per month of input or output is free, and then $0.08 to $0.15 per GB for transfers outside the S3 region. There is no data transfer charge for data transferred between EC2 and S3 within the same region, or for data transferred between EC2 Northern Virginia and the S3 U.S. Standard region (Oct 6, 2010).
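A small worked example makes the 2010 price list concrete. The sketch assumes a single flat transfer rate chosen by the caller (the real tiers fall somewhere in the quoted $0.08 to $0.15 range) and ignores request charges; `s3_monthly_cost` is a hypothetical helper.

```python
STORAGE_PER_GB_MONTH = 0.15   # $ per GB-month (Oct 6, 2010 figure above)
FREE_TRANSFER_GB = 1          # first 1 GB of transfer per month is free


def s3_monthly_cost(stored_gb, transfer_gb, transfer_rate=0.15, same_region_as_ec2=False):
    """Estimated monthly S3 bill in dollars.

    transfer_rate: $ per GB for transfer outside the S3 region; the quoted
    range is $0.08 to $0.15, so the exact tier is left to the caller.
    """
    storage = stored_gb * STORAGE_PER_GB_MONTH
    if same_region_as_ec2:
        transfer = 0.0        # EC2 <-> S3 within one region is not charged
    else:
        transfer = max(0, transfer_gb - FREE_TRANSFER_GB) * transfer_rate
    return round(storage + transfer, 2)
```

Storing 100 GB and transferring 11 GB out of the region costs 100 × 0.15 + 10 × 0.15 = $16.50 at the top rate, or $15.80 at $0.08 per GB; if the traffic stays between EC2 and S3 in one region, only the $15.00 storage charge remains.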


Amazon Elastic Block Store (EBS) and SimpleDB : EBS provides a volume block interface for saving and restoring the virtual images of EC2 instances. The state of a traditional EC2 instance is destroyed after use; with EBS, the state of an EC2 instance is saved to the EBS volume after the machine is shut down. S3 is Storage as a Service with a messaging interface, whereas EBS is more like a distributed file system. EBS lets users create volumes from 1 GB to 1 TB that can be mounted on EC2 instances, and multiple volumes can be mounted on the same instance. A storage volume behaves like a raw, unformatted block device.

You can create a file system on top of EBS volumes, or use them as you would a hard disk. Snapshots provide incremental data saving. Pricing : $0.10 per GB/month, plus $0.10 per 1 million I/O requests made to the storage (Oct 6, 2010). Nimbus provides an open source equivalent to EBS.
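The two EBS charges (capacity and I/O requests) combine as below. A sketch using the Oct 2010 rates quoted above; `ebs_monthly_cost` is a hypothetical helper, snapshot storage is ignored, and 1 TB is taken as 1024 GB for the size check.

```python
def ebs_monthly_cost(volume_gb, io_requests, per_gb_month=0.10, per_million_io=0.10):
    """Monthly EBS bill in dollars: provisioned capacity plus I/O requests."""
    if not 1 <= volume_gb <= 1024:          # volumes from 1 GB to 1 TB (1024 GB assumed)
        raise ValueError("EBS volumes range from 1 GB to 1 TB")
    capacity = volume_gb * per_gb_month
    io = io_requests / 1_000_000 * per_million_io
    return round(capacity + io, 2)
```

A 100 GB volume that serves 5 million I/O requests in a month costs $10.00 + $0.50 = $10.50; for I/O-heavy workloads the request charge can matter as much as the capacity charge.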


Amazon SimpleDB Service : Also called LittleTable (used for metadata). Provides a simplified data model based on the relational database data model. Domains are used to organize structured data; each domain can be considered a table, items are the rows in the table, and a cell is the value for a specific attribute (column name) of the corresponding row. Key feature :- it is possible to assign multiple values to a single cell in the table (not permitted in traditional databases).


SimpleDB removes the requirement to maintain database schemas and offers faster store, access, and query operations. Pricing : the first 25 Amazon SimpleDB Machine Hours consumed per month are free; after that, $0.140 per Amazon SimpleDB Machine Hour consumed (Oct 6, 2010).
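The domain/item/attribute model, including multi-valued cells and the absence of a schema, can be sketched with nested dictionaries of sets. This is a hypothetical in-memory model; the method names loosely echo SimpleDB's PutAttributes and Select operations but this is not the SimpleDB API.

```python
from collections import defaultdict


class ToyDomain:
    """Hypothetical model: domain ~ table, item ~ row, attribute ~ column.
    Each cell holds a set of values, which is the multi-value feature."""

    def __init__(self, name):
        self.name = name
        self.items = defaultdict(lambda: defaultdict(set))  # item -> attribute -> values

    def put_attributes(self, item, **attrs):
        """Add values; there is no schema, so any item can gain any attribute."""
        for attr, value in attrs.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            self.items[item][attr].update(values)

    def select(self, attr, value):
        """Names of items whose `attr` cell contains `value`."""
        return sorted(name for name, cells in self.items.items()
                      if value in cells.get(attr, ()))
```

For instance, one product item can carry `color` = {"red", "blue"} while another carries only "red", and a query for red matches both without any schema change.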


Microsoft Azure Programming Support : The components are shown in the figure on page 385 of the textbook. The underlying Azure fabric consists of virtualized hardware together with a sophisticated control environment. It implements dynamic assignment of resources, fault tolerance, DNS, and monitoring capabilities. Automated service management allows service models to be defined by an XML template and multiple service copies to be instantiated on request. Azure storage :- stores event logs, trace/debug data, performance counters, IIS web server logs, crash dumps, and other log files. There is no debugging capability for running cloud applications.

Basic capabilities : storage and compute. Web role :- a customized VM (appliance) linked to the Internet for Microsoft web hosting. Worker role :- schedules needed resources. Roles support HTTP(S) and TCP. Roles offer an OnStart() method that allows you to perform initialization tasks, an OnStop() method that is called when a role is to be shut down, and a Run() method that contains the main logic.
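The OnStart/Run/OnStop life cycle can be mirrored in a few lines to show the order in which the hooks fire. This is a hypothetical Python sketch (real Azure roles implement these methods in .NET); `WorkerRole`, `host`, and the polling interval are illustrative, and `host` is only a toy stand-in for the Azure fabric.

```python
import threading
import time


class WorkerRole:
    """Hypothetical Python mirror of the OnStart/Run/OnStop hooks; not the Azure API."""

    def __init__(self):
        self._stopping = threading.Event()
        self.log = []

    def on_start(self):
        self.log.append("initialised")     # e.g. read configuration, open connections
        return True                         # in this sketch, False aborts start-up

    def run(self):
        while not self._stopping.is_set():  # main logic, e.g. poll a queue for work
            self.log.append("tick")
            self._stopping.wait(0.01)

    def on_stop(self):
        self._stopping.set()                # ask run() to finish its loop
        self.log.append("stopped")


def host(role, lifetime=0.05):
    """Toy stand-in for the fabric: start the role, let it run, then stop it."""
    if not role.on_start():
        return
    worker = threading.Thread(target=role.run)
    worker.start()
    time.sleep(lifetime)
    role.on_stop()
    worker.join()
```

Keeping `run()` as a loop that checks a stop flag lets the stop hook end the role cleanly instead of killing it mid-task.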


SQL Azure : offers SQL Server as a service. All the storage modalities are accessed with REST interfaces except drives. Drives :- recently introduced and similar to Amazon EBS; they offer a file system interface as a durable NTFS volume backed by blob storage. Storage is replicated 3 times for fault tolerance.


The basic storage system is built from blobs, similar to S3 objects. Blobs are arranged in a 3-level hierarchy : Account, Containers, Page or Block Blobs. Containers are similar to directories in traditional file systems, with the account as the root. Block blobs are used to stream data; each is a sequence of blocks of up to 4 MB, and each block has an ID of up to 64 bytes. Block blobs can be up to 200 GB in size. Page blobs are for random read/write access; a page blob is an array of pages, with a maximum blob size of 1 TB. Metadata can be associated with blobs as <name, value> pairs, up to 8 KB per blob.
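A block blob upload can be pictured as cutting a stream into blocks of at most 4 MB, giving each a short ID, and then committing an ordered list of IDs; this mirrors the idea behind Azure's Put Block and Put Block List operations. The sketch assumes, as Azure requires, base64-encoded block IDs of equal length within one blob; `make_blocks` and `assemble` are hypothetical helpers, not the Azure SDK.

```python
import base64

MAX_BLOCK_BYTES = 4 * 1024 * 1024   # blocks of up to 4 MB, as stated above


def make_blocks(data: bytes, block_size: int = MAX_BLOCK_BYTES):
    """Split a byte stream into (block_id, chunk) pairs for a block-blob upload.

    IDs are base64 of a zero-padded counter, so every ID has the same length
    and stays far below the 64-byte limit.
    """
    if not 0 < block_size <= MAX_BLOCK_BYTES:
        raise ValueError("block size must be between 1 byte and 4 MB")
    blocks = []
    for i, start in enumerate(range(0, len(data), block_size)):
        block_id = base64.b64encode(f"{i:08d}".encode("ascii")).decode("ascii")
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def assemble(blocks, committed_ids):
    """Rebuild the blob from the committed list of block IDs, in that order."""
    by_id = dict(blocks)
    return b"".join(by_id[block_id] for block_id in committed_ids)
```

Separating upload from commit means blocks can be sent in parallel or retried individually; the blob only changes when the block list is committed.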

Azure Tables : Azure Table and queue storage modes are for smaller data volumes. Queues provide reliable message delivery and support work spooling between web and worker roles. Messages have an 8 KB size limit, and a queue can hold an unlimited number of messages. Azure supports PUT, GET, and DELETE message operations, and CREATE and DELETE operations on queues. Each account can have any number of Azure tables, consisting of rows (entities) and columns (properties).
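Work spooling between a web role and a worker role stays reliable if a fetched message is only hidden, not removed, until the worker deletes it after finishing. The sketch below assumes that visibility-timeout behaviour; `ToyQueue` and the 30-second timeout are hypothetical, and time is passed in explicitly to keep it deterministic.

```python
import itertools
from collections import OrderedDict

MAX_MESSAGE_BYTES = 8 * 1024   # 8 KB message-size limit stated above


class ToyQueue:
    """Hypothetical queue with a visibility timeout: a fetched message is hidden,
    and reappears unless the worker deletes it after finishing the job."""

    def __init__(self, visibility_timeout=30.0):
        self._timeout = visibility_timeout
        self._messages = OrderedDict()     # id -> [payload, visible_again_at]
        self._ids = itertools.count()

    def put(self, payload: bytes):
        if len(payload) > MAX_MESSAGE_BYTES:
            raise ValueError("over 8 KB: store the payload in a blob and enqueue its name")
        self._messages[next(self._ids)] = [payload, 0.0]

    def get(self, now):
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:                  # message is visible
                entry[1] = now + self._timeout   # hide it while a worker processes it
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        del self._messages[msg_id]               # call only after successful processing
```

If a worker crashes before calling `delete`, the message becomes visible again and another worker picks it up, so no job is lost.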

All entities can have up to 255 general properties, stored as <name, type, value> triples, plus two extra properties, PartitionKey and RowKey. RowKey gives a unique label to each entity. PartitionKey is designed to be shared, and entities with the same PartitionKey are stored next to each other. The maximum storage for an entity is 1 MB; for large values, store a link to a blob in the table property value. ADO.NET and LINQ support table queries.
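Storing entities sorted by (PartitionKey, RowKey) is what makes "entities with the same PartitionKey are stored next to each other" useful: a partition becomes one contiguous range. A minimal sketch under that assumption; `ToyTable` is hypothetical, and the key pair is treated as the entity's unique identity.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical table: entities kept sorted by (PartitionKey, RowKey)."""

    def __init__(self):
        self._keys = []          # sorted list of (PartitionKey, RowKey)
        self._entities = {}

    def insert(self, entity):
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._entities:
            raise KeyError(f"duplicate key {key}")   # the key pair identifies the entity
        insort(self._keys, key)
        self._entities[key] = entity

    def query_partition(self, partition_key):
        """All entities of one partition, in RowKey order (one contiguous range)."""
        i = bisect_left(self._keys, (partition_key, ""))   # "" sorts before any RowKey
        out = []
        while i < len(self._keys) and self._keys[i][0] == partition_key:
            out.append(self._entities[self._keys[i]])
            i += 1
        return out
```

Choosing a PartitionKey that groups data queried together (for example, one customer region) turns such queries into a single range read.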


Emerging Cloud Software Environments

Open Source Eucalyptus and Nimbus :
- A software platform developed by Eucalyptus Systems, Inc. (started 2008; stable release 2010).
- Written in Java and C, runs on Linux, and can host Linux and Windows VMs.
- Uses hypervisors (Xen, KVM, and VMware) and is compatible with EC2 and S3 services.
- Eucalyptus stands for Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems.
- For use in developing IaaS-style private or hybrid clouds on a computer cluster, working with the AWS API.
- License : proprietary or GPLv3 for the open-core enterprise edition; an open-source edition is also available.
- Web site : www.eucalyptus.com

Eucalyptus Architecture :


Eucalyptus is an open software environment that supports cloud programmers in VM image management, and supports both compute clouds and storage clouds. VM Image Management : Eucalyptus takes many design cues from EC2 and has a similar image management system. It stores images in Walrus, a storage system similar to S3. A user can bundle his/her own root file system, upload it, then register this image and link it with a particular kernel and ramdisk image.

Images are uploaded into a user-defined bucket within Walrus and can be retrieved at any time from any availability zone. Users need to create special virtual appliances and deploy them on Eucalyptus (see http://en.wikipedia.org/wiki/Virtual_appliance). Eucalyptus is available in both commercial proprietary and open source versions.


Nimbus :


Nimbus is a set of open source tools that together provide an IaaS cloud computing solution. It allows a client to lease remote resources by deploying VMs on those resources and configuring them. It offers a special web interface, known as Nimbus Web, which provides administrative and user functions in a friendly interface. Cumulus :- a storage cloud implementation tightly integrated with the other central services. It is compatible with the Amazon S3 REST API and adds quota management. The Nimbus cloud client uses the Java JetS3t library to interact with Cumulus.

Two resource management strategies are supported by Nimbus :
1. Default resource pool mode : the service has direct control of a pool of VM manager nodes.
2. Pilot mode : the service makes requests to a cluster's Local Resource Management System (LRMS) to get a VM manager available to deploy VMs.
Nimbus also provides an implementation of Amazon's EC2 interface.


OpenNebula : an open source toolkit that allows users to transform existing infrastructure into an IaaS cloud with cloud-like interfaces.


Main features of OpenNebula :
Feature : Function
Internal Interface : Unix-like CLI for full management of the VM life cycle and physical boxes; XML-RPC API and libvirt virtualization API
Scheduler : Requirement/rank matchmaker allowing the definition of workload- and resource-aware allocation policies; support for advance reservation of capacity through Haizea
Virtualization Management : Xen, KVM, and VMware; generic libvirt connector (VirtualBox planned for 1.4.2)
Image Management : General mechanisms to transfer and clone VM images
Network Management : Definition of isolated virtual networks to interconnect VMs
Service Management and Contextualization : Support for multi-tier services consisting of groups of interconnected VMs, and their autoconfiguration at boot time
Security : Management of users by the infrastructure administrator
Fault Tolerance : Persistent database backend to store host and VM information
Scalability : Tested in the management of medium-scale infrastructures with hundreds of servers and VMs (no scalability issues have been reported)
Installation : Installation on a UNIX cluster front-end without requiring new services; distributed in Ubuntu 9.04 (Jaunty Jackalope)
Flexibility and Extensibility : Open, flexible, and extensible architecture, interfaces, and components, allowing its integration with any product or tool
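The scheduler's requirement/rank matchmaking can be sketched as "filter, then maximize": discard hosts that fail the VM's requirements, then pick the best-ranked survivor. In OpenNebula both are written as expressions in the VM template; here they are plain Python functions, and `schedule` and the host fields (`free_mem_mb`, `free_cpu`) are hypothetical.

```python
def schedule(vm, hosts):
    """Requirement/rank matchmaking sketch.

    vm["requirements"]: predicate a host must satisfy.
    vm["rank"]: score to maximize among the hosts that qualify.
    Returns the chosen host name, or None if no host qualifies.
    """
    candidates = [h for h in hosts if vm["requirements"](h)]
    if not candidates:
        return None                           # VM stays pending
    return max(candidates, key=vm["rank"])["name"]


hosts = [
    {"name": "h1", "free_mem_mb": 2048, "free_cpu": 50},
    {"name": "h2", "free_mem_mb": 8192, "free_cpu": 20},
    {"name": "h3", "free_mem_mb": 4096, "free_cpu": 90},
]
# Striping policy: among hosts with at least 4 GB free, prefer the least loaded CPU.
vm = {"requirements": lambda h: h["free_mem_mb"] >= 4096, "rank": lambda h: h["free_cpu"]}
```

Here `schedule(vm, hosts)` picks `h3`. Ranking by the negated free CPU instead would pack VMs onto busier hosts, which is how one matchmaker can express different allocation policies.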


The Core :
- Request Manager: provides an XML-RPC interface to manage and get information about ONE entities.
- SQL Pool: database that holds the state of ONE entities.
- VM Manager (virtual machine): takes care of the VM life cycle.
- Host Manager: holds handling information about hosts.
- VN Manager (virtual network): in charge of generating MAC and IP addresses.
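One simple way for a virtual network manager to generate matching addresses is to derive each MAC from its IP lease: a fixed prefix followed by the four IPv4 bytes, so the pair never collides within the network. The sketch follows a scheme in the spirit of OpenNebula's; the `02:00` prefix and the `lease` helper are assumptions for illustration.

```python
import ipaddress
from itertools import islice

MAC_PREFIX = (0x02, 0x00)   # assumed prefix; 0x02 marks a locally administered MAC


def lease(network_cidr, index):
    """Return (ip, mac) for the index-th usable host address of a virtual network."""
    net = ipaddress.ip_network(network_cidr)
    ip = next(islice(net.hosts(), index, None), None)
    if ip is None:
        raise ValueError(f"{network_cidr} has no host address #{index}")
    mac = ":".join(f"{b:02x}" for b in (*MAC_PREFIX, *ip.packed))
    return str(ip), mac
```

For example, the first lease on `192.168.0.0/24` is `("192.168.0.1", "02:00:c0:a8:00:01")`. Because the MAC encodes the IP, a booting VM could in principle recover its IP from its own MAC without a DHCP server.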


The tools layer :
- Scheduler: searches for physical hosts to deploy newly defined VMs.
- Command Line Interface: commands to manage OpenNebula.
  - onevm: virtual machines (create, list, migrate)
  - onehost: hosts (create, list, disable)
  - onevnet: virtual networks (create, list, delete)


The drivers layer :
- Transfer Driver: takes care of the images (cloning, deleting, creating swap images).
- Virtual Machine Driver: manager of the life cycle of a virtual machine (deploy, shutdown, poll, migrate).
- Information Driver: executes scripts on physical hosts to gather information about them (total memory, free memory, total CPUs, CPU consumed).


When local resources are insufficient, OpenNebula can support a hybrid cloud model by using cloud drivers to interface with external clouds, which improves availability. It currently includes an EC2 driver that can submit requests to Amazon EC2 and Eucalyptus.


Sector/Sphere : a software platform that supports very large distributed data storage and simplified distributed data processing over large clusters of commodity computers, either within a single data center or across multiple data centers.
- Sector: distributed file system
- Sphere: simplified parallel data processing framework
- Goal: handling big data on commodity clusters
- Open source software, BSD license, written in C++; started in 2006, current version 2.3; http://sector.sf.net
- For the architecture figure, refer to page 391 of the textbook.

The DFS is designed to work on commodity hardware: racks of computers with internal hard disks and high-speed network connections.
- File-system-level fault tolerance via replication
- Supports wide area networks
- Can be used for data collection and distribution
Security server : user accounts, permissions, IP access control lists. It uses independent accounts, but can connect to an existing account database via a simple driver (e.g., Linux accounts, LDAP). There is a single security server; the system continues to run when the security server is down, but new users cannot log in.

Master servers :
- Maintain file system metadata. Metadata is a customizable module; currently there are two implementations, one in-memory and one on disk.
- Authenticate users, slaves, and other masters (via the security server).
- Maintain and manage file replication, data I/O, and data processing requests.
- Topology aware.
- Multiple active masters can dynamically join and leave, with load balancing between masters.


Slave nodes :
- Store Sector files. A Sector file is not split into blocks; one Sector file is stored on the native file system (e.g., EXT, XFS) of one or more slave nodes.
- Process Sector data. Data is processed on the same storage node, or the nearest possible storage node. Input and output are Sector files.


Emerging Cloud Software Environments cont.. Clients: Sector file system client API: access Sector files in applications using the C++ API. Sector system tools: file system access tools. FUSE: mount the Sector file system as a local directory. Sphere programming API: develop parallel data processing applications that process Sector data with a set of simple APIs. The client communicates with slaves directly for data I/O, via UDT.


Emerging Cloud Software Environments cont.. UDT: UDP-based Data Transfer (http://udt.sf.net). An open source, UDP-based data transfer protocol with reliability control and congestion control. It is fast, firewall friendly, and easy to use, and is already used in many commercial and research systems for large data transfers.
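UDT builds reliability on top of UDP by numbering packets and having the receiver report missing ones back to the sender as negative acknowledgements (NAKs). The sketch below shows only the general idea of receiver-side gap detection; the class `LossTracker` is hypothetical, and real UDT (packet format, sequence-number wraparound, timers, rate-based congestion control) is considerably more involved.

```python
class LossTracker:
    """Hypothetical receiver-side tracker that detects sequence-number gaps."""

    def __init__(self) -> None:
        self.next_expected = 0          # next in-order sequence number
        self.missing: set[int] = set()  # losses reported but not yet repaired

    def on_packet(self, seq: int) -> list[int]:
        """Record an arriving packet; return newly detected losses for a NAK."""
        newly_lost: list[int] = []
        if seq > self.next_expected:
            # Every number between next_expected and seq was skipped.
            newly_lost = list(range(self.next_expected, seq))
            self.missing.update(newly_lost)
            self.next_expected = seq + 1
        elif seq == self.next_expected:
            self.next_expected += 1     # normal in-order arrival
        else:
            # A late or retransmitted packet fills an earlier gap.
            self.missing.discard(seq)
        return newly_lost

tracker = LossTracker()
for seq in [0, 1, 4, 2, 5]:
    nak = tracker.on_packet(seq)
    if nak:
        print("NAK", nak)           # NAK [2, 3]
print(sorted(tracker.missing))      # [3]
```

Packet 4 arriving before 2 and 3 triggers a loss report for both; the retransmitted packet 2 then clears one entry, leaving only 3 outstanding.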


Emerging Cloud Software Environments cont.. Files are not split into blocks, so users are responsible for using properly sized files. Directory and File Family: Sector keeps related files together during upload and replication.
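Because each Sector file is stored whole on one or more slave nodes, placement only has to decide which slaves receive complete copies. A minimal sketch, assuming a replica count of 2, a hypothetical "least-used nodes first" policy, and that all files of one family go to the same nodes; `place_files` and the policy are illustrations, not Sector's actual placement algorithm.

```python
from collections import defaultdict

def place_files(files, nodes, replicas=2):
    """Assign each whole file to `replicas` distinct slave nodes.

    files: list of (name, size_bytes, family) tuples; family may be None.
    nodes: list of slave node names.
    Returns {file_name: [node, ...]}.
    """
    used = defaultdict(int)  # bytes stored per node so far
    family_nodes = {}        # family -> nodes chosen for its first file
    placement = {}
    for name, size, family in files:
        if family is not None and family in family_nodes:
            # Keep related files together on the same nodes.
            targets = family_nodes[family]
        else:
            # Pick the least-used nodes; ties broken by node name.
            targets = sorted(nodes, key=lambda n: (used[n], n))[:replicas]
            if family is not None:
                family_nodes[family] = targets
        for node in targets:
            used[node] += size  # the whole file is copied, never split
        placement[name] = targets
    return placement

files = [("a.dat", 100, "logs"), ("b.dat", 50, None), ("c.dat", 70, "logs")]
print(place_files(files, ["s1", "s2", "s3"]))
# {'a.dat': ['s1', 's2'], 'b.dat': ['s3', 's1'], 'c.dat': ['s1', 's2']}
```

The trade-off is visible even here: keeping a family together can unbalance storage, which is one reason users must choose sensible file sizes themselves.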


Emerging Cloud Software Environments cont.. Sphere: data parallel applications. Data is processed where it resides, or on the nearest possible node (locality). The same user-defined function (UDF) is applied to all elements (records, blocks, files, or directories). Processing output can be written to Sector files or sent back to the client. Load balancing and fault tolerance are transparent to the user.
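Sphere's model (one UDF applied to every element, executed where the data lives, with results gathered for the client) can be simulated in a single process. This is a minimal Python sketch, not the Sphere C++ API; the `segments` layout mapping node names to locally stored records and the `run_sphere_job` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sphere_job(segments, udf):
    """Apply `udf` to every record of every segment, one worker thread per node.

    segments: {node_name: [record, ...]}, the records stored on that node.
    Each worker touches only its own node's records (data locality); the
    per-node outputs are gathered back for the client.
    """
    def process_local(node):
        return node, [udf(record) for record in segments[node]]

    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        return dict(pool.map(process_local, segments))

# Hypothetical data: text records already resident on two slave nodes.
segments = {"slave1": ["error a", "ok"], "slave2": ["error b"]}
result = run_sphere_job(segments, lambda rec: rec.startswith("error"))
print(result)  # {'slave1': [True, False], 'slave2': [True]}
```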


[Figure: OpenStack Community Today]


Emerging Cloud Software Environments cont.. OpenStack: introduced by Rackspace and NASA in July 2010. An open source community spanning technologists, developers, researchers, and industry that shares resources and technologies. Its goal is to create a massively scalable and secure cloud infrastructure. The software is open source and limited to just open source APIs. It addresses both compute and storage through the OpenStack Compute and OpenStack Storage solutions.


Emerging Cloud Software Environments cont.. The pieces of OpenStack:

OpenStack Compute (Nova)

OpenStack Object Storage (Swift)

OpenStack Image Service (Glance)


Emerging Cloud Software Environments cont.. OpenStack Compute (Nova): OpenStack is developing a cloud computing fabric controller, a component of an IaaS system. The architecture is built on the concepts of shared-nothing and message-based information exchange; communication in Nova is facilitated by message queues. Shared-nothing: the overall system state is kept in a distributed data system, and updates are made consistent through atomic transactions. Nova is implemented in Python. For the architecture figure, refer to textbook page 392.
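The message-based style means components never call each other directly; they only put messages on a queue and read replies from one. A minimal in-process sketch of the pattern, using Python's standard `queue.Queue` in place of a real message broker (Nova used an AMQP broker such as RabbitMQ); the message fields (`method`, `instance_id`) and the `compute_worker` function are hypothetical, not Nova's actual RPC format.

```python
import queue
import threading

task_queue = queue.Queue()   # messages: dicts, or None to stop the worker
reply_queue = queue.Queue()  # status strings sent back by the worker

def compute_worker() -> None:
    """Hypothetical compute worker: consume messages until a None sentinel."""
    while True:
        msg = task_queue.get()
        if msg is None:
            break
        if msg["method"] == "run_instance":
            reply_queue.put(f"instance {msg['instance_id']} running")
        else:
            reply_queue.put(f"unknown method {msg['method']}")

worker = threading.Thread(target=compute_worker)
worker.start()

# The controller only enqueues messages; it holds no reference to worker state.
task_queue.put({"method": "run_instance", "instance_id": "i-001"})
task_queue.put(None)
worker.join()
print(reply_queue.get())  # instance i-001 running
```

Because the two sides share nothing but the queues, either side can be replaced, restarted, or scaled out without the other knowing, which is the point of the shared-nothing design.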


Emerging Cloud Software Environments cont.. Nova supports external libraries and components such as Boto, an Amazon API provided in Python, and Tornado, a fast HTTP server used to implement the S3 capabilities in OpenStack. Cloud controller: maintains the global state of the system, ensures authorization while interacting with the user manager via the Lightweight Directory Access Protocol (LDAP), interacts with the S3 service, and manages nodes as well as storage workers through a queue.


Emerging Cloud Software Environments cont.. Nova integrates networking components to manage private networks, public IP addressing, virtual private network (VPN) connectivity, and firewall rules. It includes the following node types:
NetworkController: manages address and virtual LAN (VLAN) allocations.
RoutingNode: governs the NAT (network address translation) conversion of public IPs to private IPs, and enforces firewall rules.
AddressingNode: runs Dynamic Host Configuration Protocol (DHCP) services for private networks.
TunnelingNode: provides VPN connectivity.
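The RoutingNode's two duties, translating a public IP to a private IP and applying firewall rules, can be pictured as a lookup table plus a filter. A toy sketch under stated assumptions: the class `RoutingTable`, the rule format (a set of allowed destination ports per public IP), and the addresses are all hypothetical and do not reflect how Nova actually configures host networking.

```python
from typing import Optional

class RoutingTable:
    """Toy NAT table: public IP -> private IP, plus allowed ports per public IP."""

    def __init__(self) -> None:
        self.nat: dict[str, str] = {}
        self.allowed_ports: dict[str, set[int]] = {}

    def associate(self, public_ip: str, private_ip: str, ports: set[int]) -> None:
        """Map a public IP to a private IP and record its firewall rule."""
        self.nat[public_ip] = private_ip
        self.allowed_ports[public_ip] = set(ports)

    def route(self, public_ip: str, port: int) -> Optional[str]:
        """Return the private destination, or None if unmapped or blocked."""
        if public_ip not in self.nat:
            return None
        if port not in self.allowed_ports[public_ip]:
            return None  # firewall rule drops the packet
        return self.nat[public_ip]

table = RoutingTable()
table.associate("203.0.113.10", "10.0.0.5", {22, 80})
print(table.route("203.0.113.10", 80))    # 10.0.0.5
print(table.route("203.0.113.10", 3306))  # None
```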


Emerging Cloud Software Environments cont.. The network state (managed in the distributed object store) consists of the following:
VLAN assignment to a project.
Private subnet assignment to a security group in a VLAN.
Private IP assignments to running instances.
Public IP allocations to a project.
Public IP associations to a private IP/running instance.
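The five kinds of network state listed above map naturally onto one record per project. A minimal sketch, assuming that per-project grouping; the class `ProjectNetworkState` and its field names are hypothetical and are not Nova's actual database schema.

```python
from dataclasses import dataclass, field

@dataclass
class ProjectNetworkState:
    """Hypothetical per-project view of the network state."""
    project: str
    vlan: int                                                   # VLAN assigned to the project
    subnets: dict[str, str] = field(default_factory=dict)       # security group -> private subnet
    private_ips: dict[str, str] = field(default_factory=dict)   # running instance -> private IP
    public_ips: set[str] = field(default_factory=set)           # public IPs allocated to the project
    associations: dict[str, str] = field(default_factory=dict)  # public IP -> private IP

    def associate(self, public_ip: str, instance: str) -> None:
        """Bind an allocated public IP to a running instance's private IP."""
        if public_ip not in self.public_ips:
            raise ValueError(f"{public_ip} is not allocated to {self.project}")
        self.associations[public_ip] = self.private_ips[instance]

state = ProjectNetworkState(project="demo", vlan=100)
state.subnets["default"] = "10.0.100.0/24"
state.private_ips["vm-1"] = "10.0.100.3"
state.public_ips.add("203.0.113.20")
state.associate("203.0.113.20", "vm-1")
print(state.associations)  # {'203.0.113.20': '10.0.100.3'}
```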


Emerging Cloud Software Environments cont.. OpenStack Storage (Swift): built around a number of interacting components and concepts, including a proxy server, a ring, an object server, a container server, an account server, replication, updaters, and auditors. Proxy server: enables lookups of the accounts, containers, or objects in OpenStack storage rings and routes the requests. Ring: represents a mapping between the names of entities stored on disk and their physical locations. Separate rings exist for accounts, containers, and objects. A ring uses the concepts of zones, devices, partitions, and replicas; because each partition is replicated across the cluster, handling failure is easier.
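The ring's name-to-location mapping can be sketched as "hash the name to a partition, then look up the devices holding that partition". This is a simplified illustration in the spirit of Swift's ring, assuming an MD5 hash of the entity path, a partition power of 4 (16 partitions), three replicas, and a naive assignment that places each partition's replicas in three different zones. The device list and the `build_ring`/`lookup` helpers are hypothetical; Swift's real ring builder also accounts for device weights and minimizes data movement on rebalance.

```python
import hashlib
import struct

PART_POWER = 4  # 2**4 = 16 partitions (assumed; tiny for illustration)
REPLICAS = 3

# Hypothetical devices: (device id, zone)
DEVICES = [("d0", 1), ("d1", 1), ("d2", 2), ("d3", 2), ("d4", 3), ("d5", 3)]

def build_ring():
    """Map each partition to REPLICAS devices, each in a different zone."""
    zones = sorted({z for _, z in DEVICES})
    by_zone = {z: [d for d, dz in DEVICES if dz == z] for z in zones}
    ring = []
    for part in range(2 ** PART_POWER):
        replicas = []
        for r in range(REPLICAS):
            # (part + r) % len(zones) yields a distinct zone for each replica
            # as long as REPLICAS <= number of zones.
            zone = zones[(part + r) % len(zones)]
            devs = by_zone[zone]
            replicas.append(devs[part % len(devs)])
        ring.append(replicas)
    return ring

def lookup(ring, path: str):
    """Hash the entity name and return (partition, devices holding it)."""
    digest = hashlib.md5(path.encode()).digest()
    # Use the top PART_POWER bits of the first 4 bytes as the partition number.
    part = struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)
    return part, ring[part]

ring = build_ring()
part, devs = lookup(ring, "/account/container/object")
print(part, devs)
```

Since every partition's replicas sit in different zones, losing a whole zone still leaves two copies of every partition reachable.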


Emerging Cloud Software Environments cont.. Manjrasoft Aneka Cloud and Appliances: What is Aneka? A cloud application platform developed by Manjrasoft, based in Melbourne, Australia (www.manjrasoft.com). It is designed to support rapid development and deployment of parallel and distributed applications on private or public clouds. It follows a service-oriented architecture (SOA) and provides a runtime environment and a set of APIs. It is a choice for flexible, extensible .NET enterprise cloud application development and deployment.

Emerging Cloud Software Environments cont.. Meaning of Aneka: "many", "in many ways", "many in one".

This means:
Multiple programming/deployment models
Multiple scheduling strategies
Multiple authentication models
Multiple persistence backends
Multiple platforms and OSs

Aneka is designed to be configurable middleware, with the aim of supporting an open-ended set of abstractions for distributed computing and deployment scenarios.


Emerging Cloud Software Environments cont.. Aneka acts as a workload distribution and management platform for accelerating applications in both Linux and Microsoft .NET framework environments. Advantages with respect to workload distribution:
Support for multiple programming and application environments.
Simultaneous support for multiple runtime environments.
Rapid deployment tools and framework.
Ability to harness multiple virtual and/or physical machines for accelerating application provisioning based on users' quality of service/service-level agreement (QoS/SLA) requirements.
Built on top of the Microsoft .NET framework, with support for Linux environments.

Emerging Cloud Software Environments cont.. Aneka offers three types of capabilities: Build, Accelerate, and Manage. Build: includes a new SDK that combines APIs and tools to enable users to rapidly develop applications. It allows users to build different runtime environments, such as an enterprise/private cloud, using compute resources in the network or in enterprise data centers.


Emerging Cloud Software Environments cont.. Accelerate: supports rapid development and deployment of applications in multiple runtime environments running different OSs, such as Windows or Linux/UNIX. It uses physical machines to achieve maximum utilization in local environments. When local resources are insufficient to meet QoS parameters, it supports dynamic leasing of extra capacity from public clouds such as EC2.
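The decision to lease public-cloud capacity when local resources cannot meet a QoS target reduces to simple arithmetic in the easiest case. A minimal sketch, assuming independent tasks of equal duration, one task per node at a time, and a deadline as the QoS parameter; the function `extra_nodes_needed` is hypothetical and is not Aneka's provisioning algorithm or API.

```python
import math

def extra_nodes_needed(num_tasks: int, task_minutes: float,
                       local_nodes: int, deadline_minutes: float) -> int:
    """Return how many additional (e.g. EC2) nodes to lease to meet the deadline.

    Assumes each node runs one task at a time and every task takes task_minutes.
    """
    # Number of tasks a single node can finish before the deadline.
    tasks_per_node = math.floor(deadline_minutes / task_minutes)
    if tasks_per_node == 0:
        raise ValueError("a single task is longer than the deadline")
    nodes_required = math.ceil(num_tasks / tasks_per_node)
    return max(0, nodes_required - local_nodes)

# 200 ten-minute tasks, 10 local nodes, 60-minute deadline:
# each node can finish 6 tasks, so 34 nodes are needed and 24 must be leased.
print(extra_nodes_needed(200, 10, 10, 60))  # 24
```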


Emerging Cloud Software Environments cont.. Manage: Management tools include a GUI and APIs to set up, monitor, manage, and maintain remote and global Aneka compute clouds. An accounting mechanism manages priorities and scalability based on SLA/QoS, which enables dynamic provisioning. Important programming models supported by Aneka for both cloud and traditional parallel applications:
1. Thread programming model.
2. Task programming model.
3. MapReduce programming model.
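Of the three models, the task programming model is the simplest to picture: an application is a bag of independent tasks that the middleware schedules on whatever nodes are available. Aneka's own APIs are .NET-based; the sketch below is only a Python analogy using the standard `concurrent.futures` module, with a hypothetical `render_frame` task standing in for real work and threads standing in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def render_frame(frame: int) -> tuple[int, int]:
    """Hypothetical independent task; returns (frame, a derived value)."""
    return frame, sum(i * frame for i in range(1000)) % 997

def run_bag_of_tasks(frames, max_workers: int = 4) -> dict[int, int]:
    """Submit every frame as its own task and collect results as they finish."""
    results = {}
    # Threads stand in for worker nodes; tasks share no state and may
    # complete in any order, so results are keyed by frame number.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(render_frame, f) for f in frames]
        for fut in as_completed(futures):
            frame, value = fut.result()
            results[frame] = value
    return results

print(sorted(run_bag_of_tasks(range(5))))  # [0, 1, 2, 3, 4]
```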


Emerging Cloud Software Environments cont.. Aneka Architecture: the cloud platform features a homogeneous distributed environment for applications, made up of a collection of physical and virtual nodes hosting the Aneka container. The container interacts with the hosting platform through the PAL (Platform Abstraction Layer), which hides the heterogeneity of different OSs and supports all infrastructure-related tasks. The PAL and the container together represent the hosting environment of services.
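Hiding OS differences behind the PAL is a classic interface-plus-implementations design: the container codes against one abstract interface, and each platform supplies its own implementation. A minimal sketch, assuming the container only needs a node's OS name and memory size; the classes `PlatformAbstractionLayer`, `LinuxPAL`, `WindowsPAL` and the fixed values they return are hypothetical, not Aneka's actual (.NET) PAL interface.

```python
from abc import ABC, abstractmethod

class PlatformAbstractionLayer(ABC):
    """What the container may ask of the host, independent of the OS."""

    @abstractmethod
    def os_name(self) -> str: ...

    @abstractmethod
    def total_memory_mb(self) -> int: ...

class LinuxPAL(PlatformAbstractionLayer):
    def os_name(self) -> str:
        return "Linux"

    def total_memory_mb(self) -> int:
        # A real implementation might parse /proc/meminfo; fixed value here.
        return 8192

class WindowsPAL(PlatformAbstractionLayer):
    def os_name(self) -> str:
        return "Windows"

    def total_memory_mb(self) -> int:
        # A real implementation would query the Windows API; fixed value here.
        return 16384

def describe_node(pal: PlatformAbstractionLayer) -> str:
    """Container-side code: runs unchanged on any platform."""
    return f"{pal.os_name()} node with {pal.total_memory_mb()} MB"

for pal in (LinuxPAL(), WindowsPAL()):
    print(describe_node(pal))
# Linux node with 8192 MB
# Windows node with 16384 MB
```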


Emerging Cloud Software Environments cont.. Categories of services: Fabric Services: implement the fundamental operations of the cloud infrastructure. These include high availability (HA) and failover for improved reliability, node membership and directory, resource provisioning, performance monitoring, and hardware profiling. Foundation Services: comprise the core functionalities of the Aneka middleware and provide a basic set of capabilities that enhance application execution in the cloud. These include storage management, resource reservation, reporting, accounting, billing, service monitoring, and licensing.

Emerging Cloud Software Environments cont.. Application Services: handle the execution of applications, providing the appropriate runtime environment for each application model. They leverage foundation and fabric services for several tasks of application execution, such as elasticity, scalability, data transfer, performance monitoring, accounting, and billing.

Virtual Appliances: refer to textbook page 398.
