UNIT-IV
25 December 2012
Files stored as chunks
    Fixed size (64 MB)
Reliability through replication
    Each chunk replicated across 3+ chunkservers
Single master to coordinate access and keep metadata
    Simple centralized management
No data caching
    Little benefit due to large data sets and streaming reads
Familiar interface, but customized API
    Simplify the problem; focus on Google apps
    Add snapshot and record append operations
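With a fixed chunk size, a client can work out which chunk holds a given byte on its own. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size, as stated on the slide


def locate(offset: int) -> tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> range:
    """Chunk indexes touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first, _ = locate(offset)
    last, _ = locate(offset + length - 1)
    return range(first, last + 1)
```

A read that straddles a chunk boundary touches two chunks, so the client would ask the master for two chunk locations.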
25 December 2012 Chethan Venkatesh, Dept of MCA, MSRIT 12
Chunkservers
    Store chunk data and a checksum for each block
    On startup/failure recovery, report chunks to the master
    Periodically report a subset of chunks to the master (to detect chunks that are no longer needed)
    Store chunks on local disk as Linux files
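Per-block checksums let a chunkserver detect which part of a chunk is damaged. The sketch below illustrates the idea with CRC-32 from the Python standard library; the 64 KB block size and the function names are assumptions of this example, not details given on the slide.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed block size for this sketch


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute one CRC-32 checksum per block of the chunk."""
    return [
        zlib.crc32(chunk[i:i + block_size])
        for i in range(0, len(chunk), block_size)
    ]


def corrupted_blocks(chunk: bytes, stored: list[int],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks whose checksum no longer matches."""
    current = block_checksums(chunk, block_size)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]
```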
Programming Support of Google App Engine Cont..
From distributed systems we know that a single master is a:
    Single point of failure
    Scalability bottleneck
GFS solutions:
    Shadow masters
    Minimize master involvement (clients talk to chunkservers directly)
        Never move data through the master; use it only for metadata, and cache metadata at clients
        Large chunk size
        Master delegates authority to primary replicas in data mutations (chunk leases)
Data blocks must be created for all replicas. Goal: minimize the involvement of the master.
Per-user data:
    User preference settings, recent queries/search results
Geographic locations:
    Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations
Scale is large:
    Billions of URLs, many versions per page (~20K/version)
    Hundreds of millions of users, thousands of queries/sec
    100 TB+ of satellite image data
Programming on Amazon AWS and Microsoft AZURE cont..
Programming on Amazon EC2 :
    Amazon was the first company to introduce VMs in application hosting.
    Customers rent VMs instead of physical machines.
    Customers can load any software on a VM.
    Elastic feature: customers can create, launch, and terminate server instances as needed.
    Pay by the hour for active servers.
    Provides preinstalled VMs, called Amazon Machine Images (AMIs).
    AMIs are preconfigured with operating systems based on Linux or Windows, and additional software.
    The table defines three types of AMI.
Type       Definition
Private    Images created by you, which are private by default. You can grant access to other users to launch your private images.
Public     Images created by users and released to the Amazon Web Services community, so anyone can launch instances based on them and use them any way they like. The Amazon Web Services Developer Connection Web site lists all public images.
Paid       You can create images providing specific functions that can be launched by anyone willing to pay you per each hour of usage on top of Amazon charges.
Programming on Amazon AWS and Microsoft AZURE cont..
Execution environment of Amazon EC2
[Figure: Amazon EC2 execution environment. Amazon Machine Images (public, private, and paid AMIs); steps: create an AMI, create key pair, configure firewall, launch; virtualization layer over compute, storage, and server resources.]
Programming on Amazon AWS and Microsoft AZURE cont..
AMIs are the templates for instances, which are running VMs.
The workflow to create a VM is:
    Create an AMI -> Create Key Pair -> Configure Firewall -> Launch
This sequence is supported by public, private, and paid AMIs.
The table shows the instance types available on Amazon EC2 (Oct 6, 2010).
Compute Instance           Memory GB   EC2 Compute Units   Virtual Cores   Storage GB   32/64 Bit
Standard: Small            1.7         1                   1               160          32
Standard: Large            7.5         4                   2               850          64
Standard: Extra Large      15          8                   4               1690         64
Micro                      0.613       Up to 2             -               Only EBS     32 or 64
High-Memory                17.1        6.5                 2               420          64
High-Memory: Double        34.2        13                  4               850          64
High-Memory: Quadruple     68.4        26                  8               1690         64
High-CPU: Medium           1.7         5                   2               350          32
High-CPU: Extra Large      7           20                  8               1690         64
Cluster Compute            23          33.5                8               1690         64
Programming on Amazon AWS and Microsoft AZURE cont..
Amazon Simple Storage Service (S3) :
    Provides a simple web service interface.
    Used to store and retrieve any amount of data, at any time, from anywhere on the web.
    Provides an object-oriented storage service.
    Users can access their objects through Simple Object Access Protocol (SOAP), using browsers or other client programs that support SOAP.
    SQS :- responsible for ensuring a reliable message service between two processes, even if the receiver processes are not running.
    The figure shows the S3 execution environment.
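[Figure: S3 execution environment. A REST interface gives access to buckets; each bucket holds objects; each object is attributed with a key, value, metadata, and access control; the service sits on a virtualization layer.]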
Programming on Amazon AWS and Microsoft AZURE cont..
    Object-based storage.
    1 B to 5 GB per object.
    Redundant through geographic dispersion.
    99.99% availability goal.
    Authentication mechanisms.
    Objects can be private or public.
    Per-object URLs and ACLs.
    BitTorrent support (the default download protocol is HTTP).
Programming on Amazon AWS and Microsoft AZURE cont..
Pricing :
    $0.15 per GB per month for storage.
    The first 1 GB per month of input or output is free, then $0.08 to $0.15 per GB for transfers outside the S3 region.
    There is no data transfer charge for data transferred between EC2 and S3 within the same region, or for data transferred between EC2 Northern Virginia and the S3 U.S. Standard region (Oct 6, 2010).
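As a worked example of these 2010 prices, the sketch below estimates a monthly S3 bill. It is a simplification under stated assumptions: a single flat transfer rate is passed in (the slide gives only a range of $0.08 to $0.15), all transfers are assumed to leave the S3 region, and the function name is hypothetical.

```python
from decimal import Decimal

STORAGE_PER_GB_MONTH = Decimal("0.15")  # storage price from the slide
FREE_TRANSFER_GB = Decimal("1")         # first 1 GB per month is free


def s3_monthly_cost(stored_gb, transferred_gb, transfer_rate=Decimal("0.15")):
    """Estimate a monthly bill in dollars; transfer_rate is an assumed flat rate."""
    stored = Decimal(str(stored_gb))
    moved = Decimal(str(transferred_gb))
    billable = max(moved - FREE_TRANSFER_GB, Decimal("0"))
    return stored * STORAGE_PER_GB_MONTH + billable * transfer_rate
```

For instance, storing 100 GB and transferring 11 GB at the upper rate costs 100 x 0.15 + 10 x 0.15 = $16.50.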
Programming on Amazon AWS and Microsoft AZURE cont..
Amazon Elastic Block Store (EBS) and SimpleDB :
    EBS provides a volume block interface for saving and restoring the virtual images of EC2 instances.
    Traditional EC2 instances are destroyed after use.
    With EBS, the status of an EC2 instance is saved after the machine is shut down.
    S3 is Storage as a Service with a messaging interface.
    EBS is similar to a distributed file system.
    Users can create volumes from 1 GB to 1 TB, which can be mounted on EC2 instances.
    Multiple volumes can be mounted on the same instance.
    Storage volumes behave like raw, unformatted block devices.
Programming on Amazon AWS and Microsoft AZURE cont..
    You can create a file system on top of EBS volumes, or use them as a hard disk.
    Snapshots provide incremental data saving.
    Pricing : $0.10 per GB/month, plus $0.10 per 1 million I/O requests made to the storage (Oct 6, 2010).
    Nimbus offers an open source equivalent of EBS.
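The idea of incremental snapshots can be illustrated with a toy model in which a volume is a list of blocks and each snapshot stores only the blocks that changed. This is a sketch of the general technique, not a description of how EBS is implemented; all names are hypothetical.

```python
def take_snapshot(volume: list[bytes], baseline: dict[int, bytes]) -> dict[int, bytes]:
    """Return {block index: data} for blocks that differ from `baseline`."""
    return {i: block for i, block in enumerate(volume) if baseline.get(i) != block}


def restore(snapshots: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Replay snapshots oldest-first to rebuild the latest block map."""
    state: dict[int, bytes] = {}
    for snap in snapshots:
        state.update(snap)
    return state
```

The first snapshot of a volume stores every block; later snapshots store only what changed since the state rebuilt from earlier ones.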
Programming on Amazon AWS and Microsoft AZURE cont..
Amazon SimpleDB Service :
    Also called LittleTable (metadata).
    Provides a simplified data model based on the relational database data model.
    Domains are used to organize structured data; each domain can be considered a table.
    Items are the rows of the table.
    A cell is the value for a specific attribute (column name) of the corresponding row.
    Key feature :- multiple values can be assigned to a single cell in the table (not permitted in traditional databases).
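The domain/item/attribute model, including multi-valued cells, can be sketched with nested dictionaries. The class below is a hypothetical in-memory stand-in written for illustration; it does not use or imitate the real SimpleDB client API.

```python
from collections import defaultdict


class Domain:
    """Hypothetical in-memory stand-in for a SimpleDB domain (a table)."""

    def __init__(self, name: str):
        self.name = name
        # item name -> attribute name -> set of values (a cell may hold several)
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item: str, attribute: str, value: str) -> None:
        """Add a value to a cell without replacing the values already there."""
        self.items[item][attribute].add(value)

    def get(self, item: str, attribute: str) -> list[str]:
        """Return all values stored in one cell, sorted for stable output."""
        return sorted(self.items.get(item, {}).get(attribute, set()))

    def select(self, attribute: str, value: str) -> list[str]:
        """Return the names of items whose cell contains the given value."""
        return sorted(
            name for name, attrs in self.items.items()
            if value in attrs.get(attribute, ())
        )
```

No schema is declared anywhere: an attribute exists for an item as soon as a value is put into it.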
Programming on Amazon AWS and Microsoft AZURE cont..
    Removes the requirement to maintain database schemas.
    Faster store, access, and query operations.
    Pricing : the first 25 Amazon SimpleDB Machine Hours consumed per month are free; $0.140 per Amazon SimpleDB Machine Hour consumed after that (Oct 6, 2010).
Programming on Amazon AWS and Microsoft AZURE cont..
Microsoft Azure Programming Support :
    Components are shown in the figure (refer to textbook page 385).
    The underlying Azure fabric consists of virtualized hardware together with a sophisticated control environment.
    The fabric implements dynamic assignment of resources, fault tolerance, DNS, and monitoring capabilities.
    Automated service management allows service models to be defined by an XML template and multiple service copies to be instantiated on request.
    Azure storage :- stores event logs, trace/debug data, performance counters, IIS web server logs, crash dumps, and other log files.
    There is no debugging capability for running cloud applications.
Programming on Amazon AWS and Microsoft AZURE cont..
Basic capabilities :
    Storage.
    Compute.
Web role :- customized VM (appliance) linked to the internet for Microsoft web hosting.
Worker role :- schedules needed resources.
Roles support HTTP(S) and TCP.
Roles offer the following methods :
    OnStart() performs initialization tasks.
    OnStop() is called when a role is to be shut down.
    Run() contains the main logic.
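The role lifecycle can be sketched in a few lines. The Python classes below only mirror the OnStart/Run/OnStop sequence for illustration; the names `Role`, `EchoWorker`, and `host`, and the choice to skip `run` when `on_start` returns False, are assumptions of this sketch and not the Azure SDK.

```python
class Role:
    """Hypothetical base class mirroring the three lifecycle methods."""

    def on_start(self) -> bool:
        return True  # initialization succeeded

    def run(self) -> None:
        pass  # main logic goes here

    def on_stop(self) -> None:
        pass  # cleanup before shutdown


class EchoWorker(Role):
    """Toy worker role that records each lifecycle step in a log."""

    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.log = []

    def on_start(self) -> bool:
        self.log.append("start")
        return True

    def run(self) -> None:
        for job in self.jobs:
            self.log.append(f"processed {job}")

    def on_stop(self) -> None:
        self.log.append("stop")


def host(role: Role) -> None:
    """Hypothetical host loop: initialize, run the main logic, always shut down."""
    if not role.on_start():
        return
    try:
        role.run()
    finally:
        role.on_stop()
```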
Programming on Amazon AWS and Microsoft AZURE cont..
SQL Azure :
    Offers SQL Server as a service.
    All storage modalities are accessed with REST interfaces, except drives.
    Drives :- recently introduced; similar to Amazon EBS.
        Offer a file system interface as a durable NTFS volume.
        Backed by blob storage.
    Storage is replicated three times for fault tolerance.
Programming on Amazon AWS and Microsoft AZURE cont..
    The basic storage system is built from blobs, similar to S3.
    Blobs are arranged in a three-level hierarchy: Account -> Containers -> Page or Block Blobs.
    Containers are similar to directories in traditional file systems, with the account as the root.
    Block blobs are used to stream data; each is a sequence of blocks of up to 4 MB.
    Each block has a 64-byte ID.
    Block blobs can be up to 200 GB in size.
    Page blobs are for random read/write access: an array of pages with a maximum blob size of 1 TB.
    Metadata can be associated with blobs as <name, value> pairs, up to 8 KB per blob.
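The block limits translate directly into how a client would split an upload. A minimal sketch, assuming binary units (1 MB = 1024 x 1024 bytes) and a hypothetical helper name:

```python
MB = 1024 * 1024
GB = 1024 * MB
MAX_BLOCK = 4 * MB          # largest single block, from the slide
MAX_BLOCK_BLOB = 200 * GB   # largest block blob, from the slide


def split_into_blocks(blob_size: int, block_size: int = MAX_BLOCK) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering a blob of `blob_size` bytes."""
    if not 0 < block_size <= MAX_BLOCK:
        raise ValueError("block size must be between 1 byte and 4 MB")
    if not 0 <= blob_size <= MAX_BLOCK_BLOB:
        raise ValueError("block blobs are limited to 200 GB")
    return [
        (offset, min(block_size, blob_size - offset))
        for offset in range(0, blob_size, block_size)
    ]
```

A 10 MB blob, for example, becomes two full 4 MB blocks followed by one 2 MB block.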
Programming on Amazon AWS and Microsoft AZURE cont..
Azure Tables :
    Azure Table and queue storage modes are intended for smaller data volumes.
    Queues provide reliable message delivery.
        Support work spooling between web and worker roles.
        8 KB limit on message size.
        A queue can contain an unlimited number of messages.
        Azure supports PUT, GET, and DELETE message operations, and CREATE and DELETE operations on queues.
    Each account can have any number of Azure tables.
    Tables consist of rows, called entities, and columns, called properties.
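Work spooling between a web role and a worker role can be pictured with a small in-memory queue that enforces the 8 KB message limit. This is a simplified sketch: the class and method names are hypothetical, and `get` here simply peeks at the oldest message, which the worker deletes once it has been processed.

```python
from collections import deque

MAX_MESSAGE_BYTES = 8 * 1024  # 8 KB limit on message size, from the slide


class WorkQueue:
    """Hypothetical in-memory queue with PUT, GET, and DELETE message operations."""

    def __init__(self):
        self._messages = deque()

    def put(self, message: bytes) -> None:
        if len(message) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds the 8 KB limit")
        self._messages.append(message)

    def get(self):
        """Return the oldest message without removing it, or None if empty."""
        return self._messages[0] if self._messages else None

    def delete(self) -> None:
        """Remove the oldest message after it has been processed."""
        self._messages.popleft()
```

Separating GET from DELETE means a message is not lost if the worker fails before finishing the job.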
Programming on Amazon AWS and Microsoft AZURE cont..
    All entities can have up to 255 general properties, which are <name, type, value> triples.
    Two extra properties: PartitionKey and RowKey.
        RowKey gives a unique label to each entity.
        PartitionKey is designed to be shared; entities with the same PartitionKey are stored next to each other.
    Maximum storage for an entity is 1 MB.
    For large values, store a link to a blob store in the table property value.
    ADO.NET and LINQ support table queries.
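The effect of the two keys can be shown with a small in-memory model in which entities that share a PartitionKey are kept together and looked up by RowKey. The `Table` class is a hypothetical illustration, not the Azure client library, and it checks only the 255-property limit.

```python
from collections import defaultdict

MAX_PROPERTIES = 255  # general properties per entity, from the slide


class Table:
    """Hypothetical in-memory table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        # PartitionKey -> {RowKey: properties}; one partition is stored together
        self._partitions = defaultdict(dict)

    def insert(self, partition_key: str, row_key: str, **properties) -> None:
        if len(properties) > MAX_PROPERTIES:
            raise ValueError("an entity can have at most 255 general properties")
        self._partitions[partition_key][row_key] = properties

    def get(self, partition_key: str, row_key: str) -> dict:
        return self._partitions[partition_key][row_key]

    def scan_partition(self, partition_key: str) -> list[tuple[str, dict]]:
        """Return all entities of one partition, ordered by RowKey."""
        rows = self._partitions.get(partition_key, {})
        return [(row_key, rows[row_key]) for row_key in sorted(rows)]
```

Queries that stay inside one partition touch entities stored next to each other, which is why the PartitionKey is chosen to group related entities.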
Emerging Cloud Software Environments
Open Source Eucalyptus and Nimbus :
    Eucalyptus is a software platform developed by Eucalyptus Systems, Inc. (started in 2008, stable release in 2010).
    Written in Java and C; runs on Linux; can host Linux and Windows VMs.
    Uses hypervisors (Xen, KVM, and VMware) and is compatible with the EC2 and S3 services.
    Eucalyptus stands for Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems.
    Used to develop IaaS-style private or hybrid clouds on computer clusters, working with the AWS API.
    License : proprietary or GPLv3 for the open-core enterprise edition; an open-source edition is also available.
    Web site: www.eucalyptus.com
Emerging Cloud Software Environments cont..
    Open software environment.
    Supports cloud programmers in VM image management.
    Supports both compute cloud and storage cloud.
VM Image Management :
    Takes many design cues from EC2, with a similar image management system.
    Stores images in Walrus.
    Walrus :- block storage system similar to S3.
    Users can bundle their own root file system, upload it, register the image, and link it with a particular kernel and ramdisk image.
Emerging Cloud Software Environments cont..
    Images are uploaded into a user-defined bucket within Walrus and can be retrieved at any time from any availability zone.
    Users need to create special virtual appliances and deploy them on Eucalyptus. http://en.wikipedia.org/wiki/Virtual_appliance
    Available in both commercial proprietary and open source versions.
Emerging Cloud Software Environments cont..
Nimbus :
    Set of open source tools that together provide an IaaS cloud computing solution.
    Allows clients to lease remote resources by deploying VMs on those resources and configuring them.
    Offers a special web interface known as Nimbus Web, which provides administrative and user functions in a friendly interface.
    Cumulus :- a storage cloud implementation tightly integrated with the other central services.
        Compatible with the Amazon S3 REST API.
        Additional feature: quota management.
    The Nimbus cloud client uses the Java Jets3t library to interact with Cumulus.
Emerging Cloud Software Environments cont..
Two resource management strategies are supported by Nimbus :
    Default resource pool mode: the service has direct control of a pool of VM manager nodes.
    Pilot mode: the service makes requests to a cluster's Local Resource Management System (LRMS) to get a VM manager available to deploy VMs.
Nimbus also provides an implementation of Amazon's EC2 interface.
Emerging Cloud Software Environments cont..
OpenNebula :
    Open source toolkit that allows users to transform existing infrastructure into an IaaS cloud with cloud-like interfaces.
Emerging Cloud Software Environments cont..
Main features of OpenNebula (Feature: Function) :
    Internal Interface: Unix-like CLI for full management of the VM life cycle and physical boxes; XML-RPC API and libvirt virtualization API.
    Scheduler: Requirement/rank matchmaker allowing the definition of workload and resource-aware allocation policies; support for advance reservation of capacity through Haizea.
    Virtualization Management: Xen, KVM, and VMware; generic libvirt connector (VirtualBox planned for 1.4.2).
    Image Management: General mechanisms to transfer and clone VM images.
    Network Management: Definition of isolated virtual networks to interconnect VMs.
    Service Management and Contextualization: Support for multi-tier services consisting of groups of inter-connected VMs, and their autoconfiguration at boot time.
    Security: Management of users by the infrastructure administrator.
    Fault Tolerance: Persistent database backend to store host and VM information.
    Scalability: Tested in the management of medium scale infrastructures with hundreds of servers and VMs (no scalability issues have been reported).
    Installation: Installation on a UNIX cluster front-end without requiring new services; distributed in Ubuntu 9.04 (Jaunty Jackalope).
    Flexibility and Extensibility: Open, flexible, and extensible architecture, interfaces, and components, allowing its integration with any product or tool.
Emerging Cloud Software Environments cont..
The core :
    Request manager: provides an XML-RPC interface to manage and get information about ONE entities.
    SQL Pool: database that holds the state of ONE entities.
    VM Manager (virtual machine): takes care of the VM life cycle.
    Host Manager: holds handling information about hosts.
    VN Manager (virtual network): in charge of generating MAC and IP addresses.
Emerging Cloud Software Environments cont..
The tools layer :
    Scheduler: searches for physical hosts on which to deploy newly defined VMs.
    Command Line Interface: commands to manage OpenNebula.
        onevm: virtual machines (create, list, migrate)
        onehost: hosts (create, list, disable)
        onevnet: virtual networks (create, list, delete)
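The scheduler's search for a host can be sketched as a requirement/rank matchmaker: a requirement filters out hosts that cannot run the VM, and a rank picks the best of the rest. The `Host` fields and the `schedule` function are hypothetical names for this illustration and do not reproduce OpenNebula's own scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu: float  # free CPU cores
    free_mem: int    # free memory in MB


def schedule(hosts, requirement, rank):
    """Keep hosts that satisfy `requirement`, then return the highest-ranked one."""
    candidates = [host for host in hosts if requirement(host)]
    return max(candidates, key=rank) if candidates else None
```

For example, a policy of "at least 1 GB of free memory, then prefer the host with the most free CPU" is expressed as one requirement function and one rank function.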
Emerging Cloud Software Environments cont..
The drivers layer :
    Transfer Driver: takes care of the images (cloning, deleting, creating swap images).
    Virtual Machine Driver: manages the life cycle of a virtual machine (deploy, shutdown, poll, migrate).
    Information Driver: executes scripts on physical hosts to gather information about them (total memory, free memory, total CPUs, CPU consumed).
Emerging Cloud Software Environments cont..
    When local resources are insufficient, OpenNebula can support a hybrid cloud model by using cloud drivers to interface with external clouds.
    This leads to high availability (HA).
    Currently includes an EC2 driver, which submits requests to EC2 and Eucalyptus.
Emerging Cloud Software Environments cont..
Sector/Sphere :
    Software platform that supports very large distributed data storage and simplified distributed data processing over large clusters of commodity computers.
    Processing can take place either within a single data center or across multiple data centers.
    Sector: distributed file system.
    Sphere: simplified parallel data processing framework.
    Goal: handling big data on commodity clusters.
    Open source software, BSD license, written in C++.
    Started in 2006; current version 2.3.
    http://sector.sf.net
    For the architecture figure, refer to textbook page 391.
Emerging Cloud Software Environments cont..
    The DFS is designed to work on commodity hardware: racks of computers with internal hard disks and high speed network connections.
    File system level fault tolerance via replication.
    Supports wide area networks.
    Can be used for data collection and distribution.
Security server :
    Manages user accounts, permissions, and IP access control lists.
    Uses independent accounts, but can connect to an existing account database via a simple driver (e.g., Linux accounts, LDAP).
    There is a single security server; the system continues to run when the security server is down, but new users cannot log in.
Emerging Cloud Software Environments cont..
Master servers :
    Maintain file system metadata.
        Metadata is a customizable module; currently there are two implementations, one in memory and one on disk.
    Authenticate users, slaves, and other masters (via the security server).
    Maintain and manage file replication, data I/O, and data processing requests.
    Topology aware.
    Multiple active masters can dynamically join and leave; load is balanced between masters.
Emerging Cloud Software Environments cont..
Slave nodes :
    Store Sector files.
        A Sector file is not split into blocks.
        One Sector file is stored on the native file system (e.g., EXT, XFS) of one or more slave nodes.
    Process Sector data.
        Data is processed on the same storage node, or on the nearest storage node possible.
        Input and output are Sector files.
Emerging Cloud Software Environments cont..
Clients :
    Sector file system client API: access Sector files in applications using the C++ API.
    Sector system tools: file system access tools.
    FUSE: mount the Sector file system as a local directory.
    Sphere programming API: develop parallel data processing applications that process Sector data with a set of simple APIs.
    The client communicates with slaves directly for data I/O, via UDT.
Emerging Cloud Software Environments cont..
UDT: UDP-based Data Transfer :
    http://udt.sf.net
    Open source, UDP-based data transfer protocol with reliability control and congestion control.
    Fast, firewall friendly, easy to use.
    Already used in many commercial and research systems for large data transfers.
Emerging Cloud Software Environments cont..
    Files are not split into blocks.
        Users are responsible for using properly sized files.
    Directory and file family.
        Sector keeps related files together during upload and replication.
Emerging Cloud Software Environments cont..
Sphere :
    Targets data parallel applications.
    Data is processed where it resides, or on the nearest possible node (locality).
    The same user-defined functions (UDFs) are applied to all elements (records, blocks, files, or directories).
    Processing output can be written to Sector files or sent back to the client.
    Transparent load balancing and fault tolerance.
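The data-parallel idea, one UDF applied independently to every element, can be sketched on a single machine with a thread pool. This is only an analogy in Python: Sphere itself exposes a C++ API and schedules work near the data, which this sketch does not model, and the names `sphere_process` and `word_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sphere_process(records, udf, workers: int = 4) -> list:
    """Apply the same user-defined function to every element, in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input records
        return list(pool.map(udf, records))


def word_count(record: str) -> int:
    """Example UDF: count the words in one record."""
    return len(record.split())
```

Because the UDF sees one element at a time and shares no state, the framework is free to run it wherever the element is stored.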
Emerging Cloud Software Environments cont..
OpenStack :
    Introduced by Rackspace and NASA in July 2010.
    An open source community spanning technologists, developers, researchers, and industry to share resources and technologies.
    Goal: creating a massively scalable and secure cloud infrastructure.
    The software is open source and limited to just open source APIs.
    Addresses compute and storage aspects through the OpenStack Compute and OpenStack Storage solutions.
Emerging Cloud Software Environments cont..
OpenStack Compute :
    Nova :- OpenStack is developing a cloud computing fabric controller, a component of an IaaS system.
    The architecture is built on the concepts of shared-nothing and message-based information exchange.
    Communication in Nova is facilitated by message queues.
    Shared-nothing :- the overall system state is kept in a distributed data system.
    Updates are made consistent through atomic transactions.
    Implemented in Python.
    For the architecture figure, refer to textbook page 392.
Emerging Cloud Software Environments cont..
    Supports external libraries and components: Boto, an Amazon API provided in Python, and Tornado, a fast HTTP server used to implement the S3 capabilities in OpenStack.
    Cloud controller :- maintains the global state of the system.
        Ensures authorization while interacting with the user manager via Lightweight Directory Access Protocol (LDAP).
        Interacts with the S3 service.
        Manages nodes as well as storage workers through a queue.
Emerging Cloud Software Environments cont..
    Integrates networking components to manage private networks, public IP addressing, virtual private network (VPN) connectivity, and firewall rules.
    Includes the following types :
        NetworkController :- manages address and virtual LAN (VLAN) allocations.
        RoutingNode :- governs the NAT (network address translation) conversion of public IPs to private IPs, and enforces firewall rules.
        AddressingNode :- runs Dynamic Host Configuration Protocol (DHCP) services for private networks.
        TunnelingNode :- provides VPN connectivity.
Emerging Cloud Software Environments cont..
The network state (managed in the distributed object store) consists of the following :
    VLAN assignment to a project.
    Private subnet assignment to a security group in a VLAN.
    Private IP assignments to running instances.
    Public IP allocations to a project.
    Public IP associations to a private IP/running instance.
Emerging Cloud Software Environments cont..
OpenStack Storage :
    Built around a number of interacting components and concepts, including a proxy server, a ring, an object server, a container server, an account server, replication, updaters, and auditors.
    Proxy server :- enables lookups of the accounts, containers, or objects in OpenStack storage rings and routes the requests.
    Ring :- represents a mapping between the names of entities stored on disk and their physical location.
        Separate rings exist for accounts, containers, and objects.
        A ring includes the concepts of zones, devices, partitions, and replicas.
        This makes handling failures easier.
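The ring's mapping from a name to physical locations can be sketched by hashing the name to a partition and then picking replica devices in distinct zones. This is a deliberately simplified model: the `Ring` class, its parameters, the use of SHA-256, and the walk over the device list are all assumptions of this sketch rather than the OpenStack implementation.

```python
import hashlib


class Ring:
    """Hypothetical ring: name -> partition -> replica devices in distinct zones."""

    def __init__(self, devices, partition_power: int = 4, replicas: int = 3):
        self.devices = list(devices)          # list of (device name, zone)
        self.partitions = 2 ** partition_power
        self.replicas = replicas

    def partition(self, name: str) -> int:
        """Hash an entity name to a partition number."""
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.partitions

    def devices_for(self, name: str) -> list[str]:
        """Walk the device list from the partition's slot, one device per zone."""
        start = self.partition(name) % len(self.devices)
        chosen, zones = [], set()
        for step in range(len(self.devices)):
            device, zone = self.devices[(start + step) % len(self.devices)]
            if zone not in zones:
                chosen.append(device)
                zones.add(zone)
            if len(chosen) == self.replicas:
                break
        return chosen
```

Placing each replica in a different zone is what lets the system keep serving an object when one zone fails.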
Emerging Cloud Software Environments cont..
Manjrasoft Aneka Cloud and Appliances :
What is Aneka?
    Cloud application platform developed by Manjrasoft, based in Melbourne, Australia. www.manjrasoft.com
    Designed to support rapid development and deployment of parallel and distributed applications on private or public clouds.
    Service Oriented Architecture (SOA).
    Provides a runtime environment and a set of APIs.
    A choice for flexible, extensible .NET enterprise cloud application development and deployment.
Emerging Cloud Software Environments cont..
Aneka meaning : many, in many ways, many in one.
This means:
    Multiple programming/deployment models
    Multiple scheduling strategies
    Multiple authentication models
    Multiple persistence backends
    Multiple platforms and OSs
Designed to be a configurable middleware, with the aim of supporting an open-ended set of abstractions for distributed computing and deployment scenarios.
Emerging Cloud Software Environments cont..
Aneka acts as a workload distribution and management platform for accelerating applications in both Linux and Microsoft .NET framework environments.
Advantages with respect to workload distribution :
    Support for multiple programming and application environments.
    Simultaneous support for multiple runtime environments.
    Rapid deployment tools and framework.
    Ability to harness multiple virtual and/or physical machines for accelerating application provisioning based on users' Quality of Service/service-level agreement (QoS/SLA) requirements.
    Built on top of the Microsoft .NET framework, with support for Linux environments.
Emerging Cloud Software Environments cont..
Offers three types of capabilities : Build, Accelerate, Manage.
Build :- includes a new SDK that combines APIs and tools to enable users to rapidly develop applications.
    Allows users to build different runtime environments, such as an enterprise/private cloud.
    Achieved with the help of compute resources in the network or in enterprise data centers.
Emerging Cloud Software Environments cont..
Accelerate :- supports rapid development and deployment of applications in multiple runtime environments running different OSs, such as Windows or Linux/UNIX.
    Uses physical machines to achieve maximum utilization in local environments.
    When local resources are insufficient to meet QoS parameters, supports dynamic leasing of extra capabilities from public clouds such as EC2.
Emerging Cloud Software Environments cont..
Manage :
    Management tools include a GUI and APIs to set up, monitor, manage, and maintain remote and global Aneka compute clouds.
    An accounting mechanism manages priorities and scalability based on SLA/QoS, which enables dynamic provisioning.
Important programming models supported by Aneka for both cloud and traditional parallel applications :
    1. Thread programming model.
    2. Task programming model.
    3. MapReduce programming model.
Emerging Cloud Software Environments cont..
Aneka Architecture :
    The cloud platform features a homogeneous distributed environment for applications.
    It is a collection of physical and virtual nodes hosting the Aneka container.
    The container interacts with the hosting platform through the PAL (Platform Abstraction Layer).
        The PAL hides the heterogeneity of the different OSs.
        It supports all infrastructure-related tasks.
    The PAL and the container together represent the hosting environment of services.
Emerging Cloud Software Environments cont..
Categories of services are :
Fabric Services :
    Implement fundamental operations of the infrastructure of the cloud.
    Services are HA and failover for improved reliability, node membership and directory, resource provisioning, performance monitoring, and hardware profiling.
Foundation Services :
    Comprise the core functionalities of the Aneka middleware.
    Provide a basic set of capabilities to enhance application execution in the cloud.
    Services are storage management, resource reservation, reporting, accounting, billing, services monitoring, and licensing.
Emerging Cloud Software Environments cont..
Application Services :
    Handle the execution of applications.
    Provide the appropriate runtime environment for each application model.
    Leverage foundation and fabric services for several tasks of application execution, such as elastic scalability, data transfer, performance monitoring, accounting, and billing.