
70-534 Architecting Microsoft Azure Solutions

Design Azure Resource Manager (ARM) networking (5-10%)


1. Design Azure virtual networks

Extend on-premises; leverage Azure networking services: implement load balancing using Azure Load Balancer and Azure
Traffic Manager; define DNS, DHCP, and IP addressing configuration; define static IP reservations; apply Network Security
Groups (NSGs) and User Defined Routes (UDRs); deploy Azure Application Gateway

2. Describe Azure VPN and ExpressRoute architecture and design

Describe Azure point-to-site (P2S) and site-to-site (S2S) VPN, leverage Azure VPN and ExpressRoute in network architecture

Architect an Azure Compute infrastructure (10-15%)


1. Design ARM Virtual Machines (VMs)

Design VM deployments leveraging availability sets, fault domains, and update domains in Azure; select appropriate VM
SKUs

2. Design ARM template deployment

Author ARM templates; deploy ARM templates via the portal, PowerShell, and CLI

3. Design for availability

Implement regional availability and high availability for Azure deployments

Secure resources (20-25%)


4. Secure resources by using managed identities

Describe the differences between Active Directory on-premises and Azure Active Directory (Azure AD), programmatically
access Azure AD using Graph API, secure access to resources from Azure AD applications using OAuth and OpenID Connect

5. Secure resources by using hybrid identities

Use SAML claims to authenticate to on-premises resources, describe AD Connect synchronization, implement federated
identities using Active Directory Federation Services (ADFS)

6. Secure resources by using identity providers

Provide access to resources using identity providers, such as Microsoft account, Facebook, Google, and Yahoo!; manage
identity and access by using Azure AD B2C; implement Azure AD B2B

7. Identify an appropriate data security solution

Identify security requirements for data in transit and data at rest; identify security requirements using Azure services,
including Azure Storage Encryption, Azure Disk Encryption, and Azure SQL Database TDE

8. Design a role-based access control (RBAC) strategy

Secure resource scopes, such as the ability to create VMs and Azure Web Apps; implement Azure RBAC standard roles;
design Azure RBAC custom roles

9. Manage security risks by using an appropriate security solution

Identify, assess, and mitigate security risks by using Azure Security Center, Operations Management Suite, and other services

Design an application storage and data access strategy (5-10%)


10. Design data storage

Design storage options for data, including Table Storage, SQL Database, DocumentDB, Blob Storage, MongoDB, and MySQL;
design security options for SQL Database or Azure Storage

11. Select the appropriate storage option

Select the appropriate storage for performance, identify storage options for cloud services and hybrid scenarios with
compute on-premises and storage on Azure

Design Azure Web and Mobile Apps (5-10%)


12. Design Web Applications


Design Azure App Service Web Apps, design custom web API, offload long-running applications using WebJobs, secure Web
API using Azure AD, design Web Apps for scalability and performance, deploy Azure Web Apps to multiple regions for high
availability, deploy Web Apps, create App Service plans, design Web Apps for business continuity, configure data replication
patterns, update Azure Web Apps with minimal downtime, back up and restore data, design for disaster recovery

13. Design Mobile Applications

Design Azure Mobile Services; consume Mobile Apps from cross-platform clients; integrate offline sync capabilities into an
application; extend Mobile Apps using custom code; implement Mobile Apps using Microsoft .NET or Node.js; secure Mobile
Apps using Azure AD; implement push notification services in Mobile Apps; send push notifications to all subscribers, specific
subscribers, or a segment of subscribers

Design advanced applications (20-25%)


14. Create compute-intensive applications

Design high-performance computing (HPC) and other compute-intensive applications using Azure Services

15. Create long-running applications

Implement Azure Batch for scalable processing, design stateless components to accommodate scale, use Azure Scheduler

16. Integrate Azure services in a solution

Design Azure architecture using Azure services, such as Azure AD, Azure App Service, API Management, Azure Cache, Azure
Search, Service Bus, Event Hubs, Stream Analytics, and IoT Hub; identify the appropriate use of Azure Machine Learning, big
data, Azure Media Services, and Azure Search services

17. Implement messaging applications

Use a queue-centric pattern for development; select appropriate technology, such as Azure Storage Queues, Azure Service
Bus queues, topics, subscriptions, and Azure Event Hubs

18. Implement applications for background processing

Implement Azure Batch for compute-intensive tasks, use Azure WebJobs to implement background tasks, use Azure
Functions to implement event-driven actions, leverage Azure Scheduler to run processes at preset/recurring timeslots

19. Design connectivity for hybrid applications

Connect to on-premises data from Azure applications using Service Bus Relay, Hybrid Connections, or the Azure Web App
virtual private network (VPN) capability; identify constraints for connectivity with VPN; identify options for joining VMs to
domains or cloud services

Design a management, monitoring, and business continuity strategy (20-25%)


20. Design a monitoring strategy

Identify the Microsoft products and services for monitoring Azure solutions; leverage the capabilities of Azure Operations
Management Suite and Azure Application Insights for monitoring Azure solutions; leverage built-in Azure capabilities;
identify third-party monitoring tools, including open source; describe Azure architecture constructs, such as availability sets
and update domains, and how they impact a patching strategy; analyze logs by using the Azure Operations Management
Suite

21. Describe Azure business continuity/disaster recovery (BC/DR) capabilities

Leverage the architectural capabilities of BC/DR, describe Hyper-V Replica and Azure Site Recovery (ASR), describe use cases
for Hyper-V Replica and ASR

22. Design a disaster recovery strategy

Design and deploy Azure Backup and other Microsoft backup solutions for Azure, leverage use cases when StorSimple and
System Center Data Protection Manager would be appropriate, design and deploy Azure Site recovery

23. Design Azure Automation and PowerShell workflows

Create a PowerShell script specific to Azure, automate tasks by using the Azure Operations Management Suite

24. Describe the use cases for Azure Automation configuration

Evaluate when to use Azure Automation, Chef, Puppet, PowerShell, or Desired State Configuration (DSC).

CHAPTER 1: Design Azure Resource Manager (ARM) networking

The Azure Virtual Network service enables you to securely connect Azure resources to each other with virtual
networks (VNets). A VNet is a representation of your own network in the cloud. A VNet is a logical isolation of the
Azure cloud dedicated to your subscription. You can also connect VNets to your on-premises network. The sections
below describe some of the capabilities of the Azure Virtual Network service.

Network isolation and segmentation

You can implement multiple VNets within each Azure subscription and Azure region. Each VNet is isolated from other
VNets. For each VNet you can:

Specify a custom private IP address space using public and private (RFC 1918) addresses. Azure assigns
resources connected to the VNet a private IP address from the address space you assign.

Segment the VNet into one or more subnets and allocate a portion of the VNet address space to each subnet.

Use Azure-provided name resolution or specify your own DNS server for use by resources connected to a
VNet. To learn more about name resolution in VNets, read the Name resolution for VMs and Cloud
Services article.
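
The address space, subnet, and name-resolution settings listed above can be scripted. The following is a minimal Azure PowerShell (AzureRM module) sketch that creates a VNet with a custom address space, two subnets, and a custom DNS server; the resource group, names, and address prefixes are placeholder values.

```powershell
# Placeholder names and prefixes - adjust to your own environment.
New-AzureRmResourceGroup -Name "NetRG" -Location "West US"

$frontEnd = New-AzureRmVirtualNetworkSubnetConfig -Name "FrontEnd" -AddressPrefix "10.1.0.0/24"
$backEnd  = New-AzureRmVirtualNetworkSubnetConfig -Name "BackEnd"  -AddressPrefix "10.1.1.0/24"

# -DnsServer overrides Azure-provided name resolution for resources connected to the VNet.
New-AzureRmVirtualNetwork -Name "VNet1" -ResourceGroupName "NetRG" -Location "West US" `
    -AddressPrefix "10.1.0.0/16" -Subnet $frontEnd, $backEnd -DnsServer "10.1.0.4"
```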

Connect to the Internet

All resources connected to a VNet have outbound connectivity to the Internet by default. The private IP address of the
resource is source network address translated (SNAT) to a public IP address by the Azure infrastructure. To learn more
about outbound Internet connectivity, read the Understanding outbound connections in Azure article. You can change
the default connectivity by implementing custom routing and traffic filtering.

To communicate inbound to Azure resources from the Internet, or to communicate outbound to the Internet without
SNAT, a resource must be assigned a public IP address. To learn more about public IP addresses, read the Public IP
addresses article.

Connect Azure resources

You can connect several Azure resources to a VNet, such as Virtual Machines (VM), Cloud Services, App Service
Environments, and Virtual Machine Scale Sets. VMs connect to a subnet within a VNet through a network interface
(NIC). To learn more about NICs, read the Network interfaces article.

Connect virtual networks


You can connect VNets to each other, enabling resources connected to either VNet to communicate with each other
across VNets. You can use either or both of the following options to connect VNets to each other:

Peering: Enables resources connected to different Azure VNets within the same Azure location to
communicate with each other. The bandwidth and latency across the VNets is the same as if the resources
were connected to the same VNet. To learn more about peering, read the Virtual network peering article.

VNet-to-VNet connection: Enables resources connected to different Azure VNets, within the same or different
Azure locations, to communicate with each other. Unlike peering, bandwidth is limited between VNets because
traffic must flow through an Azure VPN Gateway. To learn more about connecting VNets with a VNet-to-VNet
connection, read the Configure a VNet-to-VNet connection article.
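
As a sketch of the peering option above, the following Azure PowerShell (AzureRM module) commands create a peering in each direction between two VNets in the same region; the VNet and peering names are placeholders, and both VNets are assumed to already exist in the same subscription.

```powershell
# Both VNets are assumed to exist already; peering must be created in both directions.
$vnet1 = Get-AzureRmVirtualNetwork -Name "VNet1" -ResourceGroupName "NetRG"
$vnet2 = Get-AzureRmVirtualNetwork -Name "VNet2" -ResourceGroupName "NetRG"

Add-AzureRmVirtualNetworkPeering -Name "VNet1ToVNet2" -VirtualNetwork $vnet1 -RemoteVirtualNetworkId $vnet2.Id
Add-AzureRmVirtualNetworkPeering -Name "VNet2ToVNet1" -VirtualNetwork $vnet2 -RemoteVirtualNetworkId $vnet1.Id
```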

Connect to an on-premises network

You can connect your on-premises network to a VNet using any combination of the following options:

Point-to-site virtual private network (VPN): Established between a single PC connected to your network and
the VNet. This connection type is great if you're just getting started with Azure, or for developers, because it
requires little or no changes to your existing network. The connection uses the SSTP protocol to provide
encrypted communication over the Internet between the PC and the VNet. The latency for a point-to-site VPN
is unpredictable, since the traffic traverses the Internet.

Site-to-site VPN: Established between your VPN device and an Azure VPN Gateway. This connection type
enables any on-premises resource you authorize to access a VNet. The connection is an IPSec/IKE VPN that
provides encrypted communication over the Internet between your on-premises device and the Azure VPN
gateway. The latency for a site-to-site connection is unpredictable, since the traffic traverses the Internet.

Azure ExpressRoute: Established between your network and Azure, through an ExpressRoute partner. This
connection is private. Traffic does not traverse the Internet. The latency for an ExpressRoute connection is
predictable, since traffic doesn't traverse the Internet.

To learn more about all the previous connection options, read the Connection topology diagrams article.

Filter network traffic

You can filter network traffic between subnets using either or both of the following options:

Network security groups (NSG): Each NSG can contain multiple inbound and outbound security rules that
enable you to filter traffic by source and destination IP address, port, and protocol. You can apply an NSG to
each NIC in a VM. You can also apply an NSG to the subnet a NIC, or other Azure resource, is connected to. To
learn more about NSGs, read the Network security groups article.

Network virtual appliances (NVA): An NVA is a VM running software that performs a network function, such as
a firewall. View a list of available NVAs in the Azure Marketplace. NVAs are also available that provide WAN
optimization and other network traffic functions. NVAs are typically used with user-defined or BGP routes.
You can also use an NVA to filter traffic between VNets.
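
To illustrate the NSG option above, here is a minimal Azure PowerShell (AzureRM module) sketch that creates an NSG with a single inbound rule allowing HTTP and associates it with an existing subnet; all names and prefixes are placeholder values.

```powershell
# Allow inbound HTTP from the Internet to the subnet; everything else follows the default rules.
$httpRule = New-AzureRmNetworkSecurityRuleConfig -Name "Allow-HTTP" -Protocol Tcp -Direction Inbound `
    -Priority 100 -SourceAddressPrefix Internet -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 80 -Access Allow

$nsg = New-AzureRmNetworkSecurityGroup -Name "FrontEndNsg" -ResourceGroupName "NetRG" `
    -Location "West US" -SecurityRules $httpRule

# Associate the NSG with an existing subnet, then persist the change on the VNet.
$vnet = Get-AzureRmVirtualNetwork -Name "VNet1" -ResourceGroupName "NetRG"
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "FrontEnd" `
    -AddressPrefix "10.1.0.0/24" -NetworkSecurityGroup $nsg
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet
```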

Route network traffic

Azure creates route tables that enable resources connected to any subnet in any VNet to communicate with each
other, by default. You can implement either or both of the following options to override the default routes Azure
creates:

User-defined routes: You can create custom route tables with routes that control where traffic is routed to for
each subnet. To learn more about user-defined routes, read the User-defined routes article.

BGP routes: If you connect your VNet to your on-premises network using an Azure VPN Gateway or
ExpressRoute connection, you can propagate BGP routes to your VNets.
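
As a sketch of the user-defined route option above, the following Azure PowerShell (AzureRM module) commands create a route table that sends traffic destined for another subnet through a network virtual appliance and apply it to a subnet; the appliance IP address, names, and prefixes are placeholder values.

```powershell
# Route traffic bound for the back-end subnet through a firewall NVA at 10.1.2.4 (placeholder).
$route = New-AzureRmRouteConfig -Name "ToBackEndViaNva" -AddressPrefix "10.1.1.0/24" `
    -NextHopType VirtualAppliance -NextHopIpAddress "10.1.2.4"

$routeTable = New-AzureRmRouteTable -Name "FrontEndRoutes" -ResourceGroupName "NetRG" `
    -Location "West US" -Route $route

# Apply the route table to the FrontEnd subnet and persist the change.
$vnet = Get-AzureRmVirtualNetwork -Name "VNet1" -ResourceGroupName "NetRG"
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "FrontEnd" `
    -AddressPrefix "10.1.0.0/24" -RouteTable $routeTable
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet
```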


Pricing

There is no charge for virtual networks, subnets, route tables, or network security groups. Outbound Internet
bandwidth usage, public IP addresses, virtual network peering, VPN Gateways, and ExpressRoute each have their own
pricing structures. View the Virtual network, VPN Gateway, and ExpressRoute pricing pages for more information.

Creating a VNet to experiment with is easy enough, but chances are, you will deploy multiple VNets over time to
support the production needs of your organization. With some planning and design, you will be able to deploy VNets
and connect the resources you need more effectively. If you are not familiar with VNets, it's recommended that
you learn about VNets and how to deploy one before proceeding.

Plan

A thorough understanding of Azure subscriptions, regions, and network resources is critical for success. You can use
the list of considerations below as a starting point. Once you understand those considerations, you can define the
requirements for your network design.

Considerations

Before answering the planning questions below, consider the following:

Everything you create in Azure is composed of one or more resources. A virtual machine (VM) is a resource,
the network interface (NIC) used by a VM is a resource, the public IP address used by a NIC is a
resource, and the VNet the NIC is connected to is a resource.

You create resources within an Azure region and subscription. Resources can only be connected to a VNet
that exists in the same region and subscription as the resources themselves.

You can connect VNets to each other by using an Azure VPN Gateway. You can also connect VNets across
regions and subscriptions this way.

You can connect VNets to your on-premises network by using one of the connectivity options available in
Azure.

Different resources can be grouped together in resource groups, making it easier to manage the resources as a
unit. A resource group can contain resources from multiple regions, as long as the resources belong to the
same subscription.

Define requirements

Use the questions below as a starting point for your Azure network design.

1. What Azure locations will you use to host VNets?

2. Do you need to provide communication between these Azure locations?

3. Do you need to provide communication between your Azure VNet(s) and your on-premises datacenter(s)?

4. How many Infrastructure as a Service (IaaS) VMs, cloud services roles, and web apps do you need for your
solution?

5. Do you need to isolate traffic based on groups of VMs (i.e. front end web servers and back end database
servers)?

6. Do you need to control traffic flow using virtual appliances?

7. Do users need different sets of permissions to different Azure resources?

Understand VNet and subnet properties


VNet and subnet resources help define a security boundary for workloads running in Azure. A VNet is characterized
by a collection of address spaces, defined as CIDR blocks.

Note

Network administrators are familiar with CIDR notation. If you are not familiar with CIDR, learn more about it.

VNets contain the following properties.

Property: name
Description: VNet name.
Constraints: String of up to 80 characters. May contain letters, numbers, underscores, periods, or hyphens. Must start with a letter or number. Must end with a letter, number, or underscore. Can contain upper or lower case letters.

Property: location
Description: Azure location (also referred to as region).
Constraints: Must be one of the valid Azure locations.

Property: addressSpace
Description: Collection of address prefixes that make up the VNet, in CIDR notation.
Constraints: Must be an array of valid CIDR address blocks, including public IP address ranges.

Property: subnets
Description: Collection of subnets that make up the VNet.
Constraints: See the subnet properties table below.

Property: dhcpOptions
Description: Object that contains a single required property named dnsServers.

Property: dnsServers
Description: Array of DNS servers used by the VNet. If no server is specified, Azure internal name resolution is used.
Constraints: Must be an array of up to 10 DNS servers, specified by IP address.

A subnet is a child resource of a VNet, and helps define segments of address spaces within a CIDR block, using IP
address prefixes. NICs can be added to subnets, and connected to VMs, providing connectivity for various workloads.

Subnets contain the following properties.

Property: name
Description: Subnet name.
Constraints: String of up to 80 characters. May contain letters, numbers, underscores, periods, or hyphens. Must start with a letter or number. Must end with a letter, number, or underscore. Can contain upper or lower case letters.

Property: location
Description: Azure location (also referred to as region).
Constraints: Must be one of the valid Azure locations.

Property: addressPrefix
Description: Single address prefix that makes up the subnet, in CIDR notation.
Constraints: Must be a single CIDR block that is part of one of the VNet's address spaces.

Property: networkSecurityGroup
Description: NSG applied to the subnet.

Property: routeTable
Description: Route table applied to the subnet.

Property: ipConfigurations
Description: Collection of IP configuration objects used by NICs connected to the subnet.

Name resolution

By default, your VNet uses Azure-provided name resolution to resolve names inside the VNet, and on the public
Internet. However, if you connect your VNets to your on-premises data centers, you need to provide your own DNS
server to resolve names between your networks.

Limits

Review the networking limits in the Azure limits article to ensure that your design doesn't conflict with any of the
limits. Some limits can be increased by opening a support ticket.

Role-Based Access Control (RBAC)

You can use Azure RBAC to control the level of access different users may have to different resources in Azure. That
way you can segregate the work done by your team based on their needs.

As far as virtual networks are concerned, users in the Network Contributor role have full control over Azure Resource
Manager virtual network resources. Similarly, users in the Classic Network Contributor role have full control over
classic virtual network resources.
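
For example, assigning the Network Contributor role at a resource group scope might look like the following Azure PowerShell (AzureRM module) sketch; the user and resource group names are placeholders.

```powershell
# Grant a member of the networking team full control of network resources in the NetRG resource group.
New-AzureRmRoleAssignment -SignInName "netadmin@contoso.com" `
    -RoleDefinitionName "Network Contributor" `
    -ResourceGroupName "NetRG"
```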

Note

You can also create your own roles to separate your administrative needs.


Designing VNets
Once you know the answers to the questions in the Plan section, review the following before defining your VNets.

Number of subscriptions and VNets

You should consider creating multiple VNets in the following scenarios:

VMs that need to be placed in different Azure locations. VNets in Azure are regional. They cannot span
locations. Therefore you need at least one VNet for each Azure location you want to host VMs in.

Workloads that need to be completely isolated from one another. You can create separate VNets that even
use the same IP address spaces to isolate different workloads from one another.

Keep in mind that the limits you see above are per region, per subscription. That means you can use multiple
subscriptions to increase the limit of resources you can maintain in Azure. You can use a site-to-site VPN, or an
ExpressRoute circuit, to connect VNets in different subscriptions.

Subscription and VNet design patterns

The table below shows some common design patterns for using subscriptions and VNets.

Scenario: Single subscription, two VNets per app
Pros: Only one subscription to manage.
Cons: Maximum number of VNets per Azure region; you need more subscriptions after that. Review the Azure limits article for details.

Scenario: One subscription per app, two VNets per app
Pros: Uses only two VNets per subscription.
Cons: Harder to manage when there are too many apps.

Scenario: One subscription per business unit, two VNets per app
Pros: Balance between number of subscriptions and VNets.
Cons: Maximum number of VNets per business unit (subscription). Review the Azure limits article for details.

Scenario: One subscription per business unit, two VNets per group of apps
Pros: Balance between number of subscriptions and VNets.
Cons: Apps must be isolated by using subnets and NSGs.

Number of subnets

You should consider multiple subnets in a VNet in the following scenarios:

Not enough private IP addresses for all NICs in a subnet. If your subnet address space does not contain
enough IP addresses for the number of NICs in the subnet, you need to create multiple subnets. Keep in mind
that Azure reserves 5 private IP addresses from each subnet that cannot be used: the first and last addresses
of the address space (for the subnet address and multicast) and 3 addresses used internally (for DHCP
and DNS purposes). For example, a /24 subnet provides 256 addresses, of which 251 are usable.


Security. You can use subnets to separate groups of VMs from one another for workloads that have a multi-
layer structure, and apply different network security groups (NSGs) for those subnets.

Hybrid connectivity. You can use VPN gateways and ExpressRoute circuits to connect your VNets to one
another, and to your on-premises data center(s). VPN gateways and ExpressRoute circuits require a subnet of
their own to be created.

Virtual appliances. You can use a virtual appliance, such as a firewall, WAN accelerator, or VPN gateway in an
Azure VNet. When you do so, you need to route traffic to those appliances and isolate them in their own
subnet.

Subnet and NSG design patterns

The table below shows some common design patterns for using subnets.

Scenario: Single subnet, NSGs per application layer, per app
Pros: Only one subnet to manage.
Cons: Multiple NSGs necessary to isolate each application.

Scenario: One subnet per app, NSGs per application layer
Pros: Fewer NSGs to manage.
Cons: Multiple subnets to manage.

Scenario: One subnet per application layer, NSGs per app
Pros: Balance between number of subnets and NSGs.
Cons: Maximum number of NSGs per subscription. Review the Azure limits article for details.

Scenario: One subnet per application layer, per app, NSGs per subnet
Pros: Possibly smaller number of NSGs.
Cons: Multiple subnets to manage.

Sample design

To illustrate the application of the information in this article, consider the following scenario.


You work for a company that has two data centers in North America and two data centers in Europe. You identified 6
different customer-facing applications, maintained by 2 different business units, that you want to migrate to Azure as a
pilot. The basic architecture for the applications is as follows:

App1, App2, App3, and App4 are web applications hosted on Linux servers running Ubuntu. Each application
connects to a separate application server that hosts RESTful services on Linux servers. The RESTful services
connect to a back end MySQL database.

App5 and App6 are web applications hosted on Windows servers running Windows Server 2012 R2. Each
application connects to a back end SQL Server database.

All apps are currently hosted in one of the company's data centers in North America.

The on-premises data centers use the 10.0.0.0/8 address space.

You need to design a virtual network solution that meets the following requirements:

Each business unit should not be affected by resource consumption of other business units.

You should minimize the number of VNets and subnets to make management easier.

Each business unit should have a single test/development VNet used for all applications.

Each application is hosted in 2 different Azure data centers per continent (North America and Europe).

Each application is completely isolated from the others.

Each application can be accessed by customers over the Internet using HTTP.

Each application can be accessed by users connected to the on-premises data centers by using an encrypted
tunnel.

Connection to on-premises data centers should use existing VPN devices.

The company's networking group should have full control over the VNet configuration.

Developers in each business unit should only be able to deploy VMs to existing subnets.

All applications will be migrated as they are to Azure (lift-and-shift).

The databases in each location should replicate to other Azure locations once a day.

Each application should use 5 front end web servers, 2 application servers (when necessary), and 2 database
servers.

Plan

You should start your design planning by answering the questions in the Define requirements section as shown below.

1. What Azure locations will you use to host VNets?

2 locations in North America, and 2 locations in Europe. You should pick those based on the physical location of your
existing on-premises data centers. That way your connection from your physical locations to Azure will have lower
latency.

2. Do you need to provide communication between these Azure locations?

Yes, since the databases must be replicated to all locations.

3. Do you need to provide communication between your Azure VNet(s) and your on-premises data center(s)?


Yes, since users connected to the on-premises data centers must be able to access the applications through an
encrypted tunnel.

4. How many IaaS VMs do you need for your solution?

200 IaaS VMs. App1, App2, App3, and App4 require 5 web servers, 2 application servers, and 2 database
servers each. That's a total of 9 IaaS VMs per application, or 36 IaaS VMs. App5 and App6 require 5 web servers and 2
database servers each. That's a total of 7 IaaS VMs per application, or 14 IaaS VMs. Therefore, you need 50 IaaS VMs
for all applications in each Azure region. Since 4 regions are used, there will be 200 IaaS VMs.

You will also need to provide DNS servers in each VNet, or in your on-premises data centers, to resolve names between
your Azure IaaS VMs and your on-premises network.

5. Do you need to isolate traffic based on groups of VMs (i.e. front end web servers and back end database
servers)?

Yes. Each application should be completely isolated from the others, and each application layer should also be
isolated.

6. Do you need to control traffic flow using virtual appliances?

No. Virtual appliances can be used to provide more control over traffic flow, including more detailed data plane
logging.

7. Do users need different sets of permissions to different Azure resources?

Yes. The networking team needs full control over the virtual networking settings, while developers should only be able
to deploy their VMs to pre-existing subnets.

Design

Your design should specify subscriptions, VNets, subnets, and NSGs. NSGs are discussed here, but you
should learn more about NSGs before finishing your design.

Number of subscriptions and VNets

The following requirements are related to subscriptions and VNets:

Each business unit should not be affected by resource consumption of other business units.

You should minimize the number of VNets and subnets.

Each business unit should have a single test/development VNet used for all applications.

Each application is hosted in 2 different Azure data centers per continent (North America and Europe).

Based on those requirements, you need a subscription for each business unit. That way, consumption of resources
from one business unit does not count towards limits for other business units. And since you want to minimize the
number of VNets, you should consider using the one subscription per business unit, two VNets per group of
apps pattern described earlier.


You also need to specify the address space for each VNet. Since you need connectivity between the on-premises data
centers and the Azure regions, the address space used for Azure VNets cannot clash with the on-premises network,
and the address space used by each VNet should not clash with other existing VNets. You could use the address
spaces in the table below to satisfy these requirements.

Subscription VNet Azure region Address space

BU1 ProdBU1US1 West US 172.16.0.0/16

BU1 ProdBU1US2 East US 172.17.0.0/16

BU1 ProdBU1EU1 North Europe 172.18.0.0/16

BU1 ProdBU1EU2 West Europe 172.19.0.0/16

BU1 TestDevBU1 West US 172.20.0.0/16

BU2 TestDevBU2 West US 172.21.0.0/16

BU2 ProdBU2US1 West US 172.22.0.0/16

BU2 ProdBU2US2 East US 172.23.0.0/16

BU2 ProdBU2EU1 North Europe 172.24.0.0/16

BU2 ProdBU2EU2 West Europe 172.25.0.0/16
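
As an illustration of this table, the ProdBU1US1 VNet could be created with Azure PowerShell (AzureRM module) roughly as follows; the resource group name and the subnet prefixes carved out of 172.16.0.0/16 are assumptions for the sketch.

```powershell
# Placeholder subnet prefixes carved from the 172.16.0.0/16 address space.
$frontEnd  = New-AzureRmVirtualNetworkSubnetConfig -Name "FrontEnd"  -AddressPrefix "172.16.1.0/24"
$appLayer  = New-AzureRmVirtualNetworkSubnetConfig -Name "AppLayer"  -AddressPrefix "172.16.2.0/24"
$dataLayer = New-AzureRmVirtualNetworkSubnetConfig -Name "DataLayer" -AddressPrefix "172.16.3.0/24"
# The gateway subnet must be named GatewaySubnet (see the VPN Gateway section later in this chapter).
$gateway   = New-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -AddressPrefix "172.16.0.0/27"

New-AzureRmVirtualNetwork -Name "ProdBU1US1" -ResourceGroupName "BU1-Networking" -Location "West US" `
    -AddressPrefix "172.16.0.0/16" -Subnet $frontEnd, $appLayer, $dataLayer, $gateway
```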

Number of subnets and NSGs

The following requirements are related to subnets and NSGs:


You should minimize the number of VNets and subnets.

Each application is completely isolated from the others.

Each application can be accessed by customers over the Internet using HTTP.

Each application can be accessed by users connected to the on-premises data centers by using an encrypted
tunnel.

Connection to on-premises data centers should use existing VPN devices.

The databases in each location should replicate to other Azure locations once a day.

Based on those requirements, you could use one subnet per application layer, and use NSGs to filter traffic per
application. That way, you only have 3 subnets in each VNet (front end, application layer, and data layer) and one NSG
per application per subnet. In this case, you should consider using the one subnet per application layer, NSGs per
app design pattern for the ProdBU1US1 VNet.

However, you also need to create an extra subnet for the VPN connectivity between the VNets and your on-premises
data centers, and you need to specify the address space for each subnet. You would replicate this layout for each VNet.


Access Control

The following requirements are related to access control:

The company's networking group should have full control over the VNet configuration.

Developers in each business unit should only be able to deploy VMs to existing subnets.

Based on those requirements, you could add users from the networking team to the built-in Network Contributor role
in each subscription, and create a custom role for the application developers in each subscription that gives them
rights to add VMs to existing subnets.
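
A custom role like that is typically created by cloning a built-in role definition and trimming it, as in the hedged Azure PowerShell (AzureRM module) sketch below; the role name, added action, and subscription ID are placeholders, and the exact action list would need to be validated against your requirements.

```powershell
# Start from the built-in Virtual Machine Contributor definition and restrict it (placeholder values).
$role = Get-AzureRmRoleDefinition -Name "Virtual Machine Contributor"
$role.Id = $null
$role.Name = "BU1 VM Deployer"
$role.Description = "Can deploy VMs into existing subnets only."
# Developers may join NICs to existing subnets, but cannot change VNets or subnets.
$role.Actions.Add("Microsoft.Network/virtualNetworks/subnets/join/action")
$role.AssignableScopes.Clear()
$role.AssignableScopes.Add("/subscriptions/<BU1-subscription-id>")

New-AzureRmRoleDefinition -Role $role
```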

VPN Gateway
A VPN gateway is a type of virtual network gateway that sends encrypted traffic across a public connection to an on-
premises location. You can also use VPN gateways to send encrypted traffic between Azure virtual networks over the
Microsoft network. To send encrypted network traffic between your Azure virtual network and your on-premises site,
you must create a VPN gateway for your virtual network.

Each virtual network can have only one VPN gateway; however, you can create multiple connections to the same VPN
gateway. An example of this is a Multi-Site connection configuration. When you create multiple connections to the
same VPN gateway, all VPN tunnels, including Point-to-Site VPNs, share the bandwidth that is available for the
gateway.

What is a virtual network gateway?

A virtual network gateway is composed of two or more virtual machines that are deployed to a specific subnet called
the GatewaySubnet. The VMs that are located in the GatewaySubnet are created when you create the virtual network
gateway. Virtual network gateway VMs are configured to contain routing tables and gateway services specific to the
gateway. You can't directly configure the VMs that are part of the virtual network gateway and you should never
deploy additional resources to the GatewaySubnet.

When you create a virtual network gateway using the gateway type 'Vpn', Azure creates a specific type of virtual network
gateway that encrypts traffic: a VPN gateway. A VPN gateway can take up to 45 minutes to create. This is because the
VMs for the VPN gateway are being deployed to the GatewaySubnet and configured with the settings that you
specified. The Gateway SKU that you select determines how powerful the VMs are.
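
These steps can be scripted. The following is a minimal Azure PowerShell (AzureRM module) sketch that adds a GatewaySubnet to an existing VNet and creates a route-based VPN gateway; the names, prefixes, and SKU are placeholder choices, and the final command can take up to 45 minutes to complete.

```powershell
# Add the required GatewaySubnet to an existing VNet (placeholder names and prefix).
$vnet = Get-AzureRmVirtualNetwork -Name "ProdBU1US1" -ResourceGroupName "BU1-Networking"
Add-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -AddressPrefix "172.16.0.0/27" -VirtualNetwork $vnet
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

# The gateway needs a public IP address and an IP configuration tied to the GatewaySubnet.
$gwPip = New-AzureRmPublicIpAddress -Name "VNetGwIp" -ResourceGroupName "BU1-Networking" `
    -Location "West US" -AllocationMethod Dynamic
$vnet = Get-AzureRmVirtualNetwork -Name "ProdBU1US1" -ResourceGroupName "BU1-Networking"
$gwSubnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $vnet
$gwIpConfig = New-AzureRmVirtualNetworkGatewayIpConfig -Name "gwipconfig" `
    -SubnetId $gwSubnet.Id -PublicIpAddressId $gwPip.Id

# GatewayType 'Vpn' and VpnType 'RouteBased' create a route-based VPN gateway.
New-AzureRmVirtualNetworkGateway -Name "VNetGw" -ResourceGroupName "BU1-Networking" -Location "West US" `
    -IpConfigurations $gwIpConfig -GatewayType Vpn -VpnType RouteBased -GatewaySku VpnGw1
```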

Gateway SKUs

When you create a virtual network gateway, you need to specify the gateway SKU that you want to use. Select the
SKUs that satisfy your requirements based on the types of workloads, throughputs, features, and SLAs.

Note

The new VPN gateway SKUs (VpnGw1, VpnGw2, and VpnGw3) are supported for the Resource Manager deployment
model only. Classic virtual networks should continue to use the old SKUs. For more information about the old gateway
SKUs, see Working with virtual network gateway SKUs (old).

Azure offers the following VPN gateway SKUs:

SKU      S2S/VNet-to-VNet Tunnels   P2S Connections   Aggregate Throughput
VpnGw1   Max. 30                    Max. 128          500 Mbps
VpnGw2   Max. 30                    Max. 128          1 Gbps
VpnGw3   Max. 30                    Max. 128          1.25 Gbps
Basic    Max. 10                    Max. 128          100 Mbps

Throughput is based on measurements of multiple tunnels aggregated through a single gateway. It is not a
guaranteed throughput due to Internet traffic conditions and your application behaviors.

Pricing information can be found on the Pricing page.

SLA (Service Level Agreement) information can be found on the SLA page.

Production vs. Dev-Test Workloads

Due to the differences in SLAs and feature sets, we recommend the following SKUs for production vs. dev-test:


Workload SKUs

Production, critical workloads VpnGw1, VpnGw2, VpnGw3

Dev-test or proof of concept Basic

If you are using the old SKUs, the production SKU recommendations are Standard and HighPerformance SKUs. For
information on the old SKUs, see Gateway SKUs (old).

Gateway SKU feature sets

The new gateway SKUs streamline the feature sets offered on the gateways:

SKU: VpnGw1, VpnGw2, VpnGw3
Features: Route-based VPN up to 30 tunnels*, P2S, BGP, active-active, custom IPsec/IKE policy, ExpressRoute/VPN co-existence.

SKU: Basic
Features: Route-based: 10 tunnels with P2S. Policy-based (IKEv1): 1 tunnel; no P2S.

* You can configure "PolicyBasedTrafficSelectors" to connect a route-based VPN gateway
(VpnGw1, VpnGw2, VpnGw3) to multiple on-premises policy-based firewall devices. Refer
to Connect VPN gateways to multiple on-premises policy-based VPN devices using
PowerShell for details.

Resizing gateway SKUs

1. You can resize between VpnGw1, VpnGw2, and VpnGw3 SKUs.

2. When working with the old gateway SKUs, you can resize between Basic, Standard, and HighPerformance
SKUs.

3. You cannot resize from Basic/Standard/HighPerformance SKUs to the new VpnGw1/VpnGw2/VpnGw3 SKUs.
You must, instead, migrate to the new SKUs.

Migrating from old SKUs to the new SKUs

Note

The VPN Gateway Public IP address will change when migrating from an old SKU to a new SKU.

You can't resize your Azure VPN gateways directly between the old SKUs and the new SKU families. If you have VPN
gateways in the Resource Manager deployment model that are using the older version of the SKUs, you can migrate to
the new SKUs. To migrate, you delete the existing VPN gateway for your virtual network, then create a new one.

Migration workflow:


1. Remove any connections to the virtual network gateway.

2. Delete the old VPN gateway.

3. Create the new VPN gateway.

4. Update your on-premises VPN devices with the new VPN gateway IP address (for Site-to-Site connections).

5. Update the gateway IP address value for any VNet-to-VNet local network gateways that will connect to this
gateway.

6. Download new client VPN configuration packages for P2S clients connecting to the virtual network through
this VPN gateway.

7. Recreate the connections to the virtual network gateway.
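
Under the assumption that the gateway and connection names below are placeholders, the delete-and-recreate portion of this workflow looks roughly like the following in Azure PowerShell (AzureRM module); recreating the gateway reuses the same pattern shown earlier in this chapter.

```powershell
# Steps 1-2: remove the connections, then the old gateway (placeholder names).
Remove-AzureRmVirtualNetworkGatewayConnection -Name "S2SConnection" -ResourceGroupName "BU1-Networking" -Force
Remove-AzureRmVirtualNetworkGateway -Name "VNetGw" -ResourceGroupName "BU1-Networking" -Force

# Step 3: recreate the gateway on a new SKU. $gwIpConfig is assumed to have been rebuilt
# (with a new public IP address) as in the gateway creation sketch earlier.
New-AzureRmVirtualNetworkGateway -Name "VNetGw" -ResourceGroupName "BU1-Networking" -Location "West US" `
    -IpConfigurations $gwIpConfig -GatewayType Vpn -VpnType RouteBased -GatewaySku VpnGw1

# Steps 4-7: update on-premises devices, local network gateways, and P2S client packages,
# then recreate the connections with New-AzureRmVirtualNetworkGatewayConnection.
```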

Configuring a VPN Gateway

A VPN gateway connection relies on multiple resources that are configured with specific settings. Most of the
resources can be configured separately, although they must be configured in a certain order in some cases.

Settings

The settings that you choose for each resource are critical to creating a successful connection. For information about
individual resources and settings for VPN Gateway, see About VPN Gateway settings. You'll find information to help
you understand gateway types, VPN types, connection types, gateway subnets, local network gateways, and various
other resource settings that you may want to consider.

Deployment tools

You can start out creating and configuring resources using one configuration tool, such as the Azure portal. You can
then later decide to switch to another tool, such as PowerShell, to configure additional resources, or modify existing
resources when applicable. Currently, you can't configure every resource and resource setting in the Azure portal. The
instructions in the articles for each connection topology specify when a specific configuration tool is needed.

Deployment model

When you configure a VPN gateway, the steps you take depend on the deployment model that you used to create
your virtual network. For example, if you created your VNet using the classic deployment model, you use the
guidelines and instructions for the classic deployment model to create and configure your VPN gateway settings. For
more information about deployment models, see Understanding Resource Manager and classic deployment models.

Connection topology diagrams

It's important to know that there are different configurations available for VPN gateway connections. You need to
determine which configuration best fits your needs. The sections below describe the following VPN gateway
connection topologies. Each section contains a table that lists:

Available deployment model

Available configuration tools

Links that take you directly to an article, if available

Use the diagrams and descriptions to help select the connection topology to match your requirements. The diagrams
show the main baseline topologies, but it's possible to build more complex configurations using the diagrams as a
guideline.

Site-to-Site and Multi-Site (IPsec/IKE VPN tunnel)

Site-to-Site

A Site-to-Site (S2S) VPN gateway connection is a connection over IPsec/IKE (IKEv1 or IKEv2) VPN tunnel. This type of
connection requires a VPN device located on-premises that has a public IP address assigned to it and is not located
behind a NAT. S2S connections can be used for cross-premises and hybrid configurations.
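
A hedged Azure PowerShell (AzureRM module) sketch of wiring up an S2S connection to an existing route-based VPN gateway follows; the on-premises public IP address, address prefix, shared key, and names are placeholder values.

```powershell
# The local network gateway represents the on-premises VPN device and address space (placeholders).
$local = New-AzureRmLocalNetworkGateway -Name "OnPremSite" -ResourceGroupName "BU1-Networking" `
    -Location "West US" -GatewayIpAddress "203.0.113.10" -AddressPrefix "10.0.0.0/8"

$gateway = Get-AzureRmVirtualNetworkGateway -Name "VNetGw" -ResourceGroupName "BU1-Networking"

# Create the IPsec connection; the shared key must match the on-premises device configuration.
New-AzureRmVirtualNetworkGatewayConnection -Name "S2SConnection" -ResourceGroupName "BU1-Networking" `
    -Location "West US" -VirtualNetworkGateway1 $gateway -LocalNetworkGateway2 $local `
    -ConnectionType IPsec -SharedKey "<placeholder-shared-key>"
```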

Multi-Site

This type of connection is a variation of the Site-to-Site connection. You create more than one VPN connection from
your virtual network gateway, typically connecting to multiple on-premises sites. When working with multiple
connections, you must use a RouteBased VPN type (known as a dynamic gateway when working with classic VNets).
Because each virtual network can only have one VPN gateway, all connections through the gateway share the
available bandwidth. This is often called a "multi-site" connection.

Deployment models and methods for Site-to-Site and Multi-Site

Deployment Model/Method Azure Portal Classic Portal PowerShell Azure CLI

Resource Manager Article Not Supported Article Article

Classic Article** Article* Article+ Not Supported

(*) denotes that the classic portal can only support creating one S2S VPN connection.

(**) denotes that this method contains steps that require PowerShell.

(+) denotes that this article is written for multi-site connections.

Point-to-Site (VPN over SSTP)


A Point-to-Site (P2S) VPN gateway connection allows you to create a secure connection to your virtual network from
an individual client computer. P2S is a VPN connection over SSTP (Secure Socket Tunneling Protocol). Unlike S2S
connections, P2S connections do not require an on-premises public-facing IP address or a VPN device. You establish
the VPN connection by starting it from the client computer. This solution is useful when you want to connect to your
VNet from a remote location, such as from home or a conference, or when you only have a few clients that need to
connect to a VNet. P2S connections can be used with S2S connections through the same VPN gateway, as long as all
the configuration requirements for both connections are compatible.

Deployment models and methods for Point-to-Site

Deployment Model/Method Azure Portal Classic Portal PowerShell

Classic Article Supported Supported

Resource Manager Article Not Supported Article

VNet-to-VNet connections (IPsec/IKE VPN tunnel)

Connecting a virtual network to another virtual network (VNet-to-VNet) is similar to connecting a VNet to an on-
premises site location. Both connectivity types use a VPN gateway to provide a secure tunnel using IPsec/IKE. You can
even combine VNet-to-VNet communication with multi-site connection configurations. This lets you establish network
topologies that combine cross-premises connectivity with inter-virtual network connectivity.

The VNets you connect can be:

in the same or different regions

in the same or different subscriptions

in the same or different deployment models
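
For two Resource Manager VNets that each already have a VPN gateway, a VNet-to-VNet connection can be sketched in Azure PowerShell (AzureRM module) as shown below; the gateway names, resource groups, regions, and shared key are placeholders, and a connection is created in each direction.

```powershell
$gw1 = Get-AzureRmVirtualNetworkGateway -Name "VNet1Gw" -ResourceGroupName "NetRG1"
$gw2 = Get-AzureRmVirtualNetworkGateway -Name "VNet2Gw" -ResourceGroupName "NetRG2"

# VNet-to-VNet requires a connection in both directions using the same shared key.
New-AzureRmVirtualNetworkGatewayConnection -Name "VNet1ToVNet2" -ResourceGroupName "NetRG1" `
    -Location "West US" -VirtualNetworkGateway1 $gw1 -VirtualNetworkGateway2 $gw2 `
    -ConnectionType Vnet2Vnet -SharedKey "<placeholder-shared-key>"

New-AzureRmVirtualNetworkGatewayConnection -Name "VNet2ToVNet1" -ResourceGroupName "NetRG2" `
    -Location "East US" -VirtualNetworkGateway1 $gw2 -VirtualNetworkGateway2 $gw1 `
    -ConnectionType Vnet2Vnet -SharedKey "<placeholder-shared-key>"
```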


Connections between deployment models

Azure currently has two deployment models: classic and Resource Manager. If you have been using Azure for some
time, you probably have Azure VMs and instance roles running in a classic VNet. Your newer VMs and role instances
may be running in a VNet created in Resource Manager. You can create a connection between the VNets to allow the
resources in one VNet to communicate directly with resources in another.

VNet peering

You may be able to use VNet peering to create your connection, as long as your virtual network meets certain
requirements. VNet peering does not use a virtual network gateway. For more information, see VNet peering.

Deployment models and methods for VNet-to-VNet

Deployment Model/Method                           Azure Portal   Classic Portal   PowerShell   Azure CLI

Classic                                           Article*       Article*         Supported    Not Supported

Resource Manager                                  Article+       Not Supported    Article      Article

Connections between different deployment models  Article*       Supported*       Article      Not Supported

(+) denotes this deployment method is available only for VNets in the same subscription.
(*) denotes that this deployment method also requires PowerShell.

ExpressRoute (dedicated private connection)

Microsoft Azure ExpressRoute lets you extend your on-premises networks into the Microsoft cloud over a dedicated
private connection facilitated by a connectivity provider. With ExpressRoute, you can establish connections to
Microsoft cloud services, such as Microsoft Azure, Office 365, and CRM Online. Connectivity can be from an any-to-
any (IP VPN) network, a point-to-point Ethernet network, or a virtual cross-connection through a connectivity provider
at a co-location facility.

ExpressRoute connections do not go over the public Internet. This allows ExpressRoute connections to offer more
reliability, faster speeds, lower latencies, and higher security than typical connections over the Internet.

An ExpressRoute connection does not use a VPN gateway, although it does use a virtual network gateway as part of its
required configuration. In an ExpressRoute connection, the virtual network gateway is configured with the gateway
type 'ExpressRoute', rather than 'Vpn'. For more information about ExpressRoute, see the ExpressRoute technical
overview.

Site-to-Site and ExpressRoute coexisting connections



ExpressRoute is a direct, dedicated connection from your WAN (not over the public Internet) to Microsoft Services,
including Azure. Site-to-Site VPN traffic travels encrypted over the public Internet. Being able to configure Site-to-Site
VPN and ExpressRoute connections for the same virtual network has several advantages.

You can configure a Site-to-Site VPN as a secure failover path for ExpressRoute, or use Site-to-Site VPNs to connect to
sites that are not part of your network, but that are connected through ExpressRoute. Notice that this configuration
requires two virtual network gateways for the same virtual network, one using the gateway type 'Vpn', and the other
using the gateway type 'ExpressRoute'.

Deployment models and methods for S2S and ExpressRoute

Classic Deployment Resource Manager Deployment

Classic Portal Not Supported Not Supported

Azure Portal Not Supported Not Supported

PowerShell Article Article

Pricing

You pay for two things: the hourly compute costs for the virtual network gateway, and the egress data transfer from
the virtual network gateway. Pricing information can be found on the Pricing page.

Virtual network gateway compute costs


Each virtual network gateway has an hourly compute cost. The price is based on the gateway SKU that you specify
when you create a virtual network gateway. The cost is for the gateway itself and is in addition to the data transfer
that flows through the gateway.

Data transfer costs


Data transfer costs are calculated based on egress traffic from the source virtual network gateway.


If you are sending traffic to your on-premises VPN device, it will be charged at the Internet egress data
transfer rate.

If you are sending traffic between virtual networks in different regions, the pricing is based on the region.

If you are sending traffic only between virtual networks that are in the same region, there are no data costs.
Traffic between VNets in the same region is free.

Planning and design for VPN Gateway


Planning and designing your cross-premises and VNet-to-VNet configurations can be either simple or complicated,
depending on your networking needs. This article walks you through basic planning and design considerations.

Planning

Cross-premises connectivity options

If you want to connect your on-premises sites securely to a virtual network, you have three different ways to do so:
Site-to-Site, Point-to-Site, and ExpressRoute. Compare the different cross-premises connections that are available. The
option you choose can depend on various considerations, such as:

What kind of throughput does your solution require?

Do you want to communicate over the public Internet via secure VPN, or over a private connection?

Do you have a public IP address available to use?

Are you planning to use a VPN device? If so, is it compatible?

Are you connecting just a few computers, or do you want a persistent connection for your site?

What type of VPN gateway is required for the solution you want to create?

Which gateway SKU should you use?

Planning table

The following table can help you decide the best connectivity option for your solution.

Point-to-Site
Azure supported services: Cloud Services and Virtual Machines
Typical bandwidths: Typically < 100 Mbps aggregate
Protocols supported: Secure Sockets Tunneling Protocol (SSTP)
Routing: RouteBased (dynamic)
Connection resiliency: active-passive
Typical use case: Prototyping, dev / test / lab scenarios for cloud services and virtual machines

Site-to-Site
Azure supported services: Cloud Services and Virtual Machines
Typical bandwidths: Typically < 100 Mbps aggregate
Protocols supported: IPsec
Routing: We support PolicyBased (static routing) and RouteBased (dynamic routing VPN)
Connection resiliency: active-passive
Typical use case: Dev / test / lab scenarios and small scale production workloads for cloud services and virtual machines

ExpressRoute
Azure supported services: Services list
Typical bandwidths: 50 Mbps, 100 Mbps, 200 Mbps, 500 Mbps, 1 Gbps, 2 Gbps, 5 Gbps, 10 Gbps
Protocols supported: Direct connection over VLANs, NSP's VPN technologies (MPLS, VPLS, ...)
Routing: BGP
Connection resiliency: active-active
Typical use case: Access to all Azure services (validated list), enterprise-class and mission critical workloads, backup, Big Data, Azure as a DR site

Gateway SKUs

Azure offers the following VPN gateway SKUs:

SKU      S2S/VNet-to-VNet Tunnels   P2S Connections   Aggregate Throughput
VpnGw1   Max. 30                    Max. 128          500 Mbps
VpnGw2   Max. 30                    Max. 128          1 Gbps
VpnGw3   Max. 30                    Max. 128          1.25 Gbps
Basic    Max. 10                    Max. 128          100 Mbps

Throughput is based on measurements of multiple tunnels aggregated through a single gateway. It is not a
guaranteed throughput due to Internet traffic conditions and your application behaviors.

Pricing information can be found on the Pricing page.

SLA (Service Level Agreement) information can be found on the SLA page.

Workflow

The following list outlines the common workflow for cloud connectivity:

1. Design and plan your connectivity topology and list the address spaces for all networks you want to connect.

2. Create an Azure virtual network.


3. Create a VPN gateway for the virtual network.

4. Create and configure connections to on-premises networks or other virtual networks (as needed).

5. Create and configure a Point-to-Site connection for your Azure VPN gateway (as needed).

Design

Connection topologies

Start by looking at the diagrams in the About VPN Gateway article. The article contains basic diagrams, the
deployment models for each topology (Resource Manager or classic), and which deployment tools you can use to
deploy your configuration.

Design basics

The following sections discuss the VPN gateway basics. Also, consider networking services limitations.

About subnets

When you are creating connections, you must consider your subnet ranges. You cannot have overlapping subnet
address ranges. An overlapping subnet is when one virtual network or on-premises location contains the same
address space that the other location contains. This means that you need the network engineers for your local on-
premises networks to carve out a range for you to use for your Azure IP addressing space and subnets. You need address
space that is not being used on the local on-premises network.

Avoiding overlapping subnets is also important when you are working with VNet-to-VNet connections. If your subnets
overlap and an IP address exists in both the sending and destination VNets, VNet-to-VNet connections fail. Azure can't
route the data to the other VNet because the destination address is part of the sending VNet.

VPN Gateways require a specific subnet called a gateway subnet. All gateway subnets must be named GatewaySubnet
to work properly. Be sure not to name your gateway subnet a different name, and don't deploy VMs or anything else
to the gateway subnet. See Gateway Subnets.

About local network gateways

The local network gateway typically refers to your on-premises location. In the classic deployment model, the local
network gateway is referred to as a Local Network Site. When you configure a local network gateway, you give it a
name, specify the public IP address of the on-premises VPN device, and specify the address prefixes that are in the on-
premises location. Azure looks at the destination address prefixes for network traffic, consults the configuration that
you have specified for the local network gateway, and routes packets accordingly. You can modify these address
prefixes as needed. For more information, see Local network gateways.

About gateway types

Selecting the correct gateway type for your topology is critical. If you select the wrong type, your gateway won't work
properly. The gateway type specifies how the gateway itself connects and is a required configuration setting for the
Resource Manager deployment model.

The gateway types are:

Vpn

ExpressRoute

About connection types

Each configuration requires a specific connection type. The connection types are:

IPsec


Vnet2Vnet

ExpressRoute

VPNClient

About VPN types

Each configuration requires a specific VPN type. If you are combining two configurations, such as creating a Site-to-
Site connection and a Point-to-Site connection to the same VNet, you must use a VPN type that satisfies both
connection requirements.

PolicyBased: PolicyBased VPNs were previously called static routing gateways in the classic deployment model.
Policy-based VPNs encrypt and direct packets through IPsec tunnels based on the IPsec policies configured
with the combinations of address prefixes between your on-premises network and the Azure VNet. The policy
(or traffic selector) is usually defined as an access list in the VPN device configuration. The value for a
PolicyBased VPN type is PolicyBased. When using a PolicyBased VPN, keep in mind the following limitations:

o PolicyBased VPNs can only be used on the Basic gateway SKU. This VPN type is not compatible with
other gateway SKUs.

o You can have only 1 tunnel when using a PolicyBased VPN.

o You can only use PolicyBased VPNs for S2S connections, and only for certain configurations. Most VPN
Gateway configurations require a RouteBased VPN.

RouteBased: RouteBased VPNs were previously called dynamic routing gateways in the classic deployment
model. RouteBased VPNs use "routes" in the IP forwarding or routing table to direct packets into their
corresponding tunnel interfaces. The tunnel interfaces then encrypt or decrypt the packets in and out of the
tunnels. The policy (or traffic selector) for RouteBased VPNs is configured as any-to-any (or wild cards). The
value for a RouteBased VPN type is RouteBased.

The following tables show the VPN type as it maps to each connection configuration. Make sure the VPN type for your
gateway matches the configuration that you want to create.

VPN type - Resource Manager deployment model

RouteBased PolicyBased

Site-to-Site Supported Supported

VNet-to-VNet Supported Not Supported

Multi-Site Supported Not Supported

S2S and ExpressRoute coexist Supported Not Supported

Point-to-Site Supported Not Supported


Classic to Resource Manager Supported Not Supported

VPN type - classic deployment model

Dynamic Static

Site-to-Site Supported Supported

VNet-to-VNet Supported Not Supported

Multi-Site Supported Not Supported

S2S and ExpressRoute coexist Supported Not Supported

Point-to-Site Supported Not Supported

Classic to Resource Manager Supported Not Supported

VPN devices for Site-to-Site connections

To configure a Site-to-Site connection, regardless of deployment model, you need the following items:

A VPN device that is compatible with Azure VPN gateways

A public-facing IPv4 address that is not behind a NAT

You need to have experience configuring your VPN device, or have someone who can configure the device for you. For
more information about VPN devices, see About VPN devices. The VPN devices article contains information about
validated devices, requirements for devices that have not been validated, and links to device configuration
documents if available.

Consider forced tunnel routing

For most configurations, you can configure forced


tunneling. Forced tunneling lets you redirect or
"force" all Internet-bound traffic back to your on-
premises location via a Site-to-Site VPN tunnel for

28 | P a g e
70-534 Architecting Microsoft Azure Solutions

inspection and auditing. This is a critical security requirement for most enterprise IT policies.

Without forced tunneling, Internet-bound traffic from your VMs in Azure will always traverse from Azure network
infrastructure directly out to the Internet, without the option to allow you to inspect or audit the traffic. Unauthorized
Internet access can potentially lead to information disclosure or other types of security breaches.

A forced tunneling connection can be configured in both deployment models and by using different tools. For more
information, see Configure forced tunneling.
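
The exact steps depend on the tools you use. As a hedged sketch in Azure PowerShell for the Resource Manager model, forced tunneling is configured by pointing the VPN gateway's default site at an on-premises local network gateway (the gateway and site names below are assumptions, not values defined in this guide):

# Sketch: force Internet-bound traffic through the S2S tunnel by setting a default site.
# OnPremSite, VNetGW, and TestRG are illustrative names.
$lng = Get-AzureRmLocalNetworkGateway -Name OnPremSite -ResourceGroupName TestRG
$gw  = Get-AzureRmVirtualNetworkGateway -Name VNetGW -ResourceGroupName TestRG

Set-AzureRmVirtualNetworkGatewayDefaultSite -VirtualNetworkGateway $gw -GatewayDefaultSite $lng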

Azure Load Balancer

Azure Load Balancer delivers high availability and network performance to your applications. It is a Layer 4 (TCP, UDP)
load balancer that distributes incoming traffic among healthy instances of services defined in a load-balanced set.

Azure Load Balancer can be configured to:

Load balance incoming Internet traffic to virtual machines. This configuration is known as Internet-facing load
balancing.

Load balance traffic between virtual machines in a virtual network, between virtual machines in cloud services, or
between on-premises computers and virtual machines in a cross-premises virtual network. This configuration is known
as internal load balancing.

Forward external traffic to a specific virtual machine.

All resources in the cloud need a public IP address to be reachable from the Internet. The cloud infrastructure in
Azure uses non-routable IP addresses for its resources. Azure uses network address translation (NAT) with public IP
addresses to communicate to the Internet.

Azure deployment models

It's important to understand the differences between the Azure classic and Resource Manager deployment models.
Azure Load Balancer is configured differently in each model.

Figure 1 - Azure Load Balancer in the classic deployment model

Azure classic deployment model

Virtual machines deployed within a cloud service boundary can be grouped to use a load balancer. In this model a
public IP address and a Fully Qualified Domain Name, (FQDN) are assigned to a cloud service. The load balancer does
port translation and load balances the network traffic by using the public IP address for the cloud service.

Load-balanced traffic is defined by endpoints. Port translation endpoints have a one-to-one relationship between the
public-assigned port of the public IP address and the local port assigned to the service on a specific virtual machine.
Load balancing endpoints have a one-to-many relationship between the public IP address and the local ports assigned
to the services on the virtual machines in the cloud service.

The domain label for the public IP address that the load balancer uses for this deployment model is <cloud service
name>.cloudapp.net. The following graphic shows the Azure Load Balancer in this model.


Azure Resource Manager deployment model

In the Resource Manager deployment model there is no need to create a Cloud service. The load balancer is created
to explicitly route traffic among multiple virtual machines.

A public IP address is an individual resource that has a domain label (DNS name). The public IP address is associated
with the load balancer resource. Load balancer rules and inbound NAT rules use the public IP address as the Internet
endpoint for the resources that are receiving load-balanced network traffic.

A private or public IP address is assigned to the network interface resource attached to a virtual machine. Once a
network interface is added to a load balancer's back-end IP address pool, the load balancer is able to send load-
balanced network traffic based on the load-balanced rules that are created.

The following graphic shows the Azure Load Balancer in this model:

Figure 2 - Azure Load Balancer in Resource Manager

The load balancer can be managed through Resource Manager-based templates, APIs, and tools. To learn more about
Resource Manager, see the Resource Manager overview.
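
To make the Resource Manager model concrete, the following PowerShell sketch creates a public IP, a front-end configuration, a back-end pool, and a load-balancing rule, and then ties them together in a load balancer. All resource names and the location are assumptions for illustration:

# Sketch: an Internet-facing load balancer in the Resource Manager deployment model.
$pip = New-AzureRmPublicIpAddress -Name LBPublicIP -ResourceGroupName TestRG `
    -Location westus -AllocationMethod Static

$frontend = New-AzureRmLoadBalancerFrontendIpConfig -Name LBFrontEnd -PublicIpAddress $pip
$backend  = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name LBBackEndPool

$rule = New-AzureRmLoadBalancerRuleConfig -Name HTTPRule -FrontendIpConfiguration $frontend `
    -BackendAddressPool $backend -Protocol Tcp -FrontendPort 80 -BackendPort 80

New-AzureRmLoadBalancer -Name WebLB -ResourceGroupName TestRG -Location westus `
    -FrontendIpConfiguration $frontend -BackendAddressPool $backend -LoadBalancingRule $rule

NICs added to the LBBackEndPool back-end pool then receive the load-balanced traffic, as described above.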

Load Balancer features

Hash-based distribution

Azure Load Balancer uses a hash-based distribution algorithm. By default, it uses a 5-tuple hash composed of source
IP, source port, destination IP, destination port, and protocol type to map traffic to available servers. It provides
stickiness only within a transport session. Packets in the same TCP or UDP session will be directed to the same
instance behind the load-balanced endpoint. When the client closes and reopens the connection or starts a new
session from the same source IP, the source port changes. This may cause the traffic to go to a different endpoint in a
different datacenter.

For more details, see Load balancer distribution mode. The following graphic shows the hash-based distribution:


Figure 3 - Hash based distribution

Port forwarding

Azure Load Balancer gives you control over how inbound communication is managed. This communication includes
traffic initiated from Internet hosts, virtual machines in other cloud services, or virtual networks. This control is
represented by an endpoint (also called an input endpoint).

An input endpoint listens on a public port and forwards traffic to an internal port. You can map the same ports for an
internal or external endpoint or use a different port for them. For example, you can have a web server configured to
listen to port 81 while the public endpoint mapping is port 80. The creation of a public endpoint triggers the creation
of a load balancer instance.

When created using the Azure portal, the portal automatically creates endpoints to the virtual machine for the
Remote Desktop Protocol (RDP) and remote Windows PowerShell session traffic. You can use these endpoints to
remotely administer the virtual machine over the Internet.
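
In the Resource Manager model, the equivalent of the port 80-to-81 mapping described above is an inbound NAT rule. The following is a hedged sketch against an existing load balancer (WebLB and TestRG are assumed names):

# Sketch: forward public port 80 to port 81 on a back-end VM with an inbound NAT rule.
$lb = Get-AzureRmLoadBalancer -Name WebLB -ResourceGroupName TestRG

$lb | Add-AzureRmLoadBalancerInboundNatRuleConfig -Name Web80to81 `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -Protocol Tcp -FrontendPort 80 -BackendPort 81

# Save the updated configuration to Azure; the rule is then associated with a NIC IP configuration.
$lb | Set-AzureRmLoadBalancer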

Automatic reconfiguration

Azure Load Balancer instantly reconfigures itself when you scale instances up or down. For example, this
reconfiguration happens when you increase the instance count for web/worker roles in a cloud service or when you
add additional virtual machines into the same load-balanced set.

Service monitoring

Azure Load Balancer can probe the health of the various server instances. When a probe fails to respond, the load
balancer stops sending new connections to the unhealthy instances. Existing connections are not impacted.

Three types of probes are supported:

o Guest agent probe (on Platform as a Service Virtual Machines only): The load balancer utilizes the
guest agent inside the virtual machine. The guest agent listens and responds with an HTTP 200 OK
response only when the instance is in the ready state (i.e. the instance is not in a state like busy,
recycling, or stopping). If the agent fails to respond with an HTTP 200 OK, the load balancer marks the
instance as unresponsive and stops sending traffic to that instance. The load balancer continues to
ping the instance. If the guest agent responds with an HTTP 200, the load balancer will send traffic to
that instance again. When you're using a web role, your website code typically runs in w3wp.exe,
which is not monitored by the Azure fabric or guest agent. This means that failures in w3wp.exe (e.g.
HTTP 500 responses) will not be reported to the guest agent, and the load balancer will not know to
take that instance out of rotation.

o HTTP custom probe: This probe overrides the default (guest agent) probe. You can use it to create
your own custom logic to determine the health of the role instance. The load balancer will regularly
probe your endpoint (every 15 seconds, by default). The instance is considered to be in rotation if it
responds with a TCP ACK or HTTP 200 within the timeout period (default of 31 seconds). This is useful
for implementing your own logic to remove instances from the load balancer's rotation. For example,
you can configure the instance to return a non-200 status if the instance is above 90% CPU. For web
roles that use w3wp.exe, you also get automatic monitoring of your website, since failures in your
website code return a non-200 status to the probe.

o TCP custom probe: This probe relies on successful TCP session establishment to a defined probe port.

For more information, see the LoadBalancerProbe schema.
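
As a hedged sketch of an HTTP custom probe in the Resource Manager model, using the default 15-second interval described above (the probe name, port, and health-check path are assumptions):

# Sketch: an HTTP custom probe; two consecutive failures take an instance out of rotation.
$probe = New-AzureRmLoadBalancerProbeConfig -Name HealthProbe -Protocol Http `
    -Port 80 -RequestPath "/healthcheck" -IntervalInSeconds 15 -ProbeCount 2

# Attach the probe when creating the load balancer (-Probe $probe), or add it to an existing
# one with Add-AzureRmLoadBalancerProbeConfig followed by Set-AzureRmLoadBalancer.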

Source NAT

All outbound traffic to the Internet that originates from your service undergoes source NAT (SNAT) by using the same
VIP address as the incoming traffic. SNAT provides important benefits:

o It enables easy upgrade and disaster recovery of services, since the VIP can be dynamically mapped to
another instance of the service.

o It makes access control list (ACL) management easier. ACLs expressed in terms of VIPs do not change
as services scale up, down, or get redeployed.

The load balancer configuration supports full cone NAT for UDP. Full cone NAT is a type of NAT where the port allows
inbound connections from any external host (in response to an outbound request).

For each new outbound connection that a virtual machine initiates, an outbound port is also allocated by the load
balancer. The external host sees traffic with a virtual IP (VIP)-allocated port. For scenarios that require a large number
of outbound connections, it is recommended to use instance-level public IP addresses so that the VMs have a
dedicated outbound IP address for SNAT. This reduces the risk of port exhaustion.

See the outbound connections article for more details on this topic.

Support for multiple load-balanced IP addresses for virtual machines

You can assign more than one load-balanced public IP address to a set of virtual machines. With this ability, you can
host multiple SSL websites and/or multiple SQL Server AlwaysOn Availability Group listeners on the same set of virtual
machines. For more information, see Multiple VIPs per cloud service.

Load Balancer differences

There are several options to distribute network traffic in Microsoft Azure. These options work differently from
each other, have different feature sets, and support different scenarios. They can each be used in isolation or in
combination.

Azure Load Balancer works at the transport layer (Layer 4 in the OSI network reference stack). It provides
network-level distribution of traffic across instances of an application running in the same Azure data center.

Application Gateway works at the application layer (Layer 7 in the OSI network reference stack). It acts as a
reverse-proxy service, terminating the client connection and forwarding requests to back-end endpoints.

Traffic Manager works at the DNS level. It uses DNS responses to direct end-user traffic to globally distributed
endpoints. Clients then connect to those endpoints directly.

The following table summarizes the features offered by each service:

Service                 Azure Load Balancer             Application Gateway               Traffic Manager

Technology              Transport level (Layer 4)       Application level (Layer 7)       DNS level

Application protocols   Any                             HTTP, HTTPS, and WebSockets       Any (an HTTP endpoint is
supported                                                                                 required for endpoint
                                                                                          monitoring)

Endpoints               Azure VMs and Cloud             Any Azure internal IP address,    Azure VMs, Cloud Services,
                        Services role instances         public internet IP address,       Azure Web Apps, and
                                                        Azure VM, or Azure Cloud          external endpoints
                                                        Service

Vnet support            Can be used for both            Can be used for both Internet     Only supports Internet-
                        Internet facing and             facing and internal (Vnet)        facing applications
                        internal (Vnet) applications    applications

Endpoint monitoring     Supported via probes            Supported via probes              Supported via
                                                                                          HTTP/HTTPS GET

Azure Load Balancer and Application Gateway both route network traffic to endpoints, but they address different usage
scenarios. The following table helps you understand the differences between the two load balancers:

Type                          Azure Load Balancer                           Application Gateway

Protocols                     UDP/TCP                                       HTTP, HTTPS, and WebSockets

IP reservation                Supported                                     Not supported

Load balancing mode           5-tuple (source IP, source port,              Round Robin
                              destination IP, destination port,             Routing based on URL
                              protocol type)

Load balancing mode           2-tuple (source IP and destination IP),       Cookie-based affinity
(source IP / sticky           3-tuple (source IP, destination IP, and       Routing based on URL
sessions)                     port). Can scale up or down based on
                              the number of virtual machines

Health probes                 Default: probe interval - 15 secs. Taken      Idle probe interval 30 secs. Taken out
                              out of rotation: 2 continuous failures.       after 5 consecutive live traffic failures
                              Supports user-defined probes                  or a single probe failure in idle mode.
                                                                            Supports user-defined probes

SSL offloading                Not supported                                 Supported

Url-based routing             Not supported                                 Supported

SSL Policy                    Not supported                                 Supported

Create, change, or delete network interfaces


A NIC enables an Azure Virtual Machine (VM) to communicate with the Internet, Azure, and on-premises resources. When
creating a VM using the Azure portal, the portal creates one NIC with default settings for you. You may instead choose
to create NICs with custom settings and add one or more to VMs when you create them. You may also want to change
default NIC settings for existing NICs. This article explains how to create NICs with custom settings, change existing NIC
settings, such as network filter assignment (network security groups), subnet assignment, DNS server settings, and IP
forwarding, and delete NICs.

If you need to add, change, or remove IP addresses for a NIC, read the Add, change, or remove IP addresses article. If
you need to add NICs to, or remove NICs from VMs, read the Add or remove NICs article.

Before you begin

Complete the following tasks before completing any steps in any section of this article:

Review the Azure limits article to learn about limits for NICs.

Log in to the Azure portal, Azure command-line interface (CLI), or Azure PowerShell with an Azure account. If
you don't already have an Azure account, sign up for a free trial account.

If using PowerShell commands to complete tasks in this article, install and configure Azure PowerShell by
completing the steps in the How to install and configure Azure PowerShell article. Ensure you have the most
recent version of the Azure PowerShell cmdlets installed. To get help for PowerShell commands, with
examples, type get-help <command> -full.

If using Azure Command-line interface (CLI) commands to complete tasks in this article, install and configure
the Azure CLI by completing the steps in the How to install and configure the Azure CLI article. Ensure you
have the most recent version of the Azure CLI installed. To get help for CLI commands, type az <command> --
help.

Create a NIC

When creating a VM using the Azure portal, the portal creates a NIC with default settings for you. If you'd rather
specify all your NIC settings, you can create a NIC with custom settings and attach the NIC to a VM when creating a
VM. You can also create a NIC and add it to an existing VM. To learn how to create a VM with an existing NIC or to add
to, or remove NICs from existing VMs, read the Add or remove NICs article. Before creating a NIC, you must have an
existing virtual network (VNet) in the same location and subscription you create a NIC in. To learn how to create a
VNet, read the Create a VNet article.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. In the Network interfaces blade that appears, click + Add.

4. In the Create network interface blade that appears, enter, or select values for the following settings, then
click Create:

Setting Required? Details

Name Yes The name must be unique within the resource group you select. Over
time, you'll likely have several NICs in your Azure subscription. Read
the Naming conventions article for suggestions when creating a naming
convention to make managing several NICs easier. The name cannot be
changed after the NIC is created.

Virtual Yes Select a VNet to connect the NIC to. You can only connect a NIC to a
network VNet that exists in the same subscription and location as the NIC. Once a
NIC is created, you cannot change the VNet it is connected to. The VM
you add the NIC to must also exist in the same location and subscription
as the NIC.

Subnet Yes Select a subnet within the VNet you selected. You can change the
subnet the NIC is connected to after it's created.

Private IP Yes Choose from the following assignment methods: Dynamic: When
address selecting this option, Azure automatically assigns an available address
assignment from the address space of the subnet you selected. Azure may assign a
different address to a NIC when the VM it's in is started after having
been in the stopped (deallocated) state. The address remains the same
if the VM is restarted without having been in the stopped (deallocated)
state. Static: When selecting this option, you must manually assign an
available IP address from within the address space of the subnet you
selected. Static addresses do not change until you change them or the
NIC is deleted. You can change the assignment method after the NIC is
created. The Azure DHCP server assigns this address to the NIC within
the operating system of the VM.

Network No Leave set to None, select an existing network security group (NSG), or
security group create an NSG. NSGs enable you to filter network traffic in and out of a
NIC. To learn more about NSGs, read the Network security
groups article. To create an NSG, read the Create an NSG article. You

can apply zero or one NSG to a NIC. Zero or one NSG can also be applied
to the subnet the NIC is connected to. When an NSG is applied to a NIC
and the subnet the NIC is connected to, sometimes unexpected results
occur. To troubleshoot NSGs applied to NICs and subnets, read
the Troubleshoot NSGs article.

Subscription Yes Select one of your Azure subscriptions. The VM you attach a NIC to and
the VNet you connect it to must exist in the same subscription.

Resource Yes Select an existing resource group or create one. A NIC can exist in the
group same, or different resource group, than the VM you attach it to, or the
VNet you connect it to.

Location Yes The VM you attach a NIC to and the VNet you connect it to must exist in
the same location, also referred to as a region.

The portal doesn't provide the option to assign a public IP address to the NIC when you create it, though it does assign
a public IP address to a NIC when you create a VM using the portal. To learn how to add a public IP address to the NIC
after creating it, read the Add, change, or remove IP addresses article. If you want to create a NIC with a public IP
address, you must use the CLI or PowerShell to create the NIC.

Note

Azure assigns a MAC address to the NIC only after the NIC is attached to a VM and the VM is started the first time. You
cannot specify the MAC address that Azure assigns to the NIC. The MAC address remains assigned to the NIC until the
NIC is deleted or the private IP address assigned to the primary IP configuration of the primary NIC is changed. To
learn more about IP addresses and IP configurations, read the Add, change, or remove IP addresses article.

Commands

Tool Command

CLI az network nic create

PowerShell New-AzureRmNetworkInterface
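
As a hedged example of the PowerShell command above, the following sketch creates a NIC with a static private IP in an existing subnet. The VNet, subnet, NIC name, address, and location are assumptions reused from scenarios elsewhere in this guide:

# Sketch: create a NIC with custom settings and a static private IP address.
$vnet   = Get-AzureRmVirtualNetwork -ResourceGroupName TestRG -Name TestVNet
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name FrontEnd -VirtualNetwork $vnet

New-AzureRmNetworkInterface -Name mywebserver256 -ResourceGroupName TestRG -Location westus `
    -SubnetId $subnet.Id -PrivateIpAddress 192.168.1.101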

View NIC settings

You can view and change most settings for a NIC.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.

4. The following settings are listed in the blade that appears for the NIC you selected:

Overview: Provides information about the NIC, such as the IP addresses assigned to it, the
VNet/subnet the NIC is connected to, and the VM the NIC is attached to (if it's attached to one). The
following picture shows the overview settings for a NIC
named mywebserver256:

You can move a NIC to a different resource group or subscription by clicking (change) next to
the Resource group or Subscription name. If you move the NIC, you must move all resources related
to the NIC with it. If the NIC is attached to a VM, for example, you must also move the VM, and other
VM-related resources. To move a NIC, read the Move resource to a new resource group or
subscription article. The article lists prerequisites, and how to move resources using the Azure portal,
PowerShell, and the Azure CLI.

IP configurations: Public and private IP addresses are assigned to one or more IP configurations for a
NIC. To learn more about the maximum number of IP configurations supported for a NIC, read
the Azure limits article. Each IP configuration has one assigned private IP address, and may have one
public IP address associated to it. To add, change, or delete IP configurations from the NIC, complete
the steps in the Add a secondary IP configuration to a NIC, Change an IP configuration, or Delete an IP
configuration sections of the Add, change, or remove IP addresses article. IP forwarding and subnet
assignment are also configured in this section. To learn more about these settings, read the Enable-
disable IP forwarding and Change subnet assignment sections of this article.

DNS servers: You can specify which DNS server a NIC is assigned by the Azure DHCP servers. The NIC
can inherit the setting from the VNet the NIC is connected to, or have a custom setting that overrides
the setting for the VNet it's connected to. To modify what's displayed, complete the steps in
the Change DNS servers section of this article.

Network security group (NSG): Displays which NSG is associated to the NIC (if any). An NSG contains
inbound and outbound rules to filter network traffic for the NIC. If an NSG is associated to the NIC, the
name of the associated NSG is displayed. To modify what's displayed, complete the steps in
the Associate an NSG to or disassociate an NSG from a network interface section of this article.

Properties: Displays key settings about the NIC, including its MAC address (blank if the NIC isn't
attached to a VM), and the subscription it exists in.

Effective security rules: Security rules are listed if the NIC is attached to a running VM, and an NSG is
associated to the NIC, the subnet it's connected to, or both. To learn more about what's displayed,

read the Troubleshoot network security groups article. To learn more about NSGs, read the Network
security groups article.

Effective routes: Routes are listed if the NIC is attached to a running VM. The routes are a
combination of the Azure default routes, any user-defined routes (UDR), and any BGP routes that may
exist for the subnet the NIC is connected to. To learn more about what's displayed, read
the Troubleshoot routes article. To learn more about Azure default and UDRs, read the User-defined
routes article.

Common Azure Resource Manager settings: To learn more about common Azure Resource Manager
settings, read the Activity log, Access control (IAM), Tags, Locks, and Automation script articles.

Commands

Tool Command

CLI az network nic list to view NICs in the subscription; az network nic show to view settings for
a NIC

PowerShell Get-AzureRmNetworkInterface to view NICs in the subscription or view settings for a NIC

Change DNS servers

The DNS server is assigned by the Azure DHCP server to the NIC within the VM operating system. The DNS server
assigned is whatever the DNS server setting is for a NIC. To learn more about name resolution settings for a NIC, read
the Name resolution for VMs article. The NIC can inherit the settings from the VNet, or use its own unique settings
that override the setting for the VNet.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.

4. In the blade for the NIC you selected, click DNS servers under SETTINGS.

5. Click either:

Inherit from virtual network (default): Choose this option to inherit the DNS server setting defined for
the virtual network the NIC is connected to. At the VNet level, either a custom DNS server or the
Azure-provided DNS server is defined. The Azure-provided DNS server can resolve hostnames for
resources connected to the same VNet. An FQDN must be used to resolve names for resources connected to
different VNets.

Custom: You can configure your own DNS server to resolve names across multiple VNets. Enter the IP
address of the server you want to use as a DNS server. The DNS server address you specify is assigned
only to this NIC and overrides any DNS setting for the VNet the NIC is connected to.

6. Click Save.

Commands

Tool Command

CLI az network nic update

PowerShell Set-AzureRmNetworkInterface
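
A hedged sketch of the custom DNS option using the PowerShell command above (the NIC name and DNS server address are assumptions):

# Sketch: override the inherited VNet DNS setting for a single NIC with a custom DNS server.
$nic = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name mywebserver256

$nic.DnsSettings.DnsServers.Clear()
$nic.DnsSettings.DnsServers.Add("192.168.1.10")

Set-AzureRmNetworkInterface -NetworkInterface $nic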

Enable-disable IP forwarding

IP forwarding enables the VM a NIC is attached to:

Receive network traffic not destined for one of the IP addresses assigned to any of the IP configurations
assigned to the NIC.

Send network traffic with a different source IP address than the one assigned to one of a NIC's IP
configurations.

The setting must be enabled for every NIC attached to the VM that receives traffic that the VM needs to forward. A
VM can forward traffic whether it has multiple NICs or a single NIC attached to it. While IP forwarding is an Azure
setting, the VM must also run an application able to forward the traffic, such as firewall, WAN optimization, and load
balancing applications. When a VM is running network applications, the VM is often referred to as a network virtual
appliance (NVA). You can view a list of ready to deploy NVAs in the Azure Marketplace. IP forwarding is typically used
with user-defined routes. To learn more about user-defined routes, read the User-defined routes article.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. In the Network interfaces blade that appears, click the NIC you want to enable or disable IP forwarding for.

4. In the blade for the NIC you selected, click IP configurations in the SETTINGS section.

5. Click Enabled or Disabled (default setting) to change the setting.

6. Click Save.

Commands

Tool Command

CLI az network nic update

PowerShell Set-AzureRmNetworkInterface

Change subnet assignment

You can change the subnet, but not the VNet, that a NIC is connected to.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.

4. Click IP configurations under SETTINGS in the blade for the NIC you selected. If any private IP addresses for
any IP configurations listed have (Static) next to them, you must change the IP address assignment method to
dynamic by completing the steps that follow. All private IP addresses must be assigned with the dynamic
assignment method to change the subnet assignment for the NIC. If the addresses are assigned with the
dynamic method, continue to step five. If any addresses are assigned with the static assignment method,
complete the following steps to change the assignment method to dynamic:

Click the IP configuration you want to change the IP address assignment method for from the list of IP
configurations.

In the blade that appears for the IP configuration, click Dynamic for the Assignment method.

Click Save.

5. Select the subnet you want to connect the NIC to from the Subnet drop-down list.

6. Click Save. New dynamic addresses are assigned from the subnet address range for the new subnet. After
assigning the NIC to a new subnet, you can assign a static IP address from the new subnet address range if you
choose. To learn more about adding, changing, and removing IP addresses for a NIC, read the Add, change, or
remove IP addresses article.

Commands

Tool Command

CLI az network nic ip-config update

PowerShell Set-AzureRmNetworkInterfaceIpConfig
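
A hedged sketch of the same change with the PowerShell commands above. The NIC, VNet, subnet, and IP configuration names are assumptions, and the IP configuration must already use dynamic assignment:

# Sketch: move a NIC's IP configuration to a different subnet, then save the NIC.
$vnet   = Get-AzureRmVirtualNetwork -ResourceGroupName TestRG -Name TestVNet
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name BackEnd -VirtualNetwork $vnet
$nic    = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name mywebserver256

Set-AzureRmNetworkInterfaceIpConfig -NetworkInterface $nic -Name ipconfig1 -Subnet $subnet

Set-AzureRmNetworkInterface -NetworkInterface $nic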

Delete a NIC

You can delete a NIC as long as it's not attached to a VM. If it is attached to a VM, you must first place the VM in the
stopped (deallocated) state, then detach the NIC from the VM, before you can delete the NIC. To detach a NIC from a
VM, complete the steps in the Detach a NIC from a virtual machine section of the Add or remove network
interfaces article. Deleting a VM detaches all NICs attached to it, but does not delete the NICs.

1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.

2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.

3. Right-click the NIC you want to delete and click Delete.

4. Click Yes to confirm deletion of the NIC.

When you delete a NIC, any MAC or IP addresses assigned to it are released.

Commands

Tool Command

CLI az network nic delete

PowerShell Remove-AzureRmNetworkInterface

Create a VM with a static public IP address using the Azure portal


You can create virtual machines (VMs) in Azure and expose
them to the public Internet by using a public IP address. By
default, Public IPs are dynamic and the address associated to
them may change when the VM is deleted. To guarantee that
the VM always uses the same public IP address, you need to
create a static Public IP.

Before you can implement static Public IPs in VMs, it is necessary to understand when you can use static Public IPs,
and how they are used. Read the IP addressing overview to learn more about IP addressing in Azure.

Note

Azure has two different deployment models for creating and working with resources: Resource Manager and classic.
This article covers using the Resource Manager deployment model, which Microsoft recommends for most new
deployments instead of the classic deployment model.

Scenario

This document will walk through a deployment that uses a static public IP address allocated to a virtual machine (VM).
In this scenario, you have a single VM with its own static public IP address. The VM is part of a subnet named FrontEnd
and also has a static private IP address (192.168.1.101) in that subnet.

You may need a static IP address for web servers that require SSL connections in which the SSL certificate is linked to
an IP address.

You can follow the steps below to deploy this environment.

Create a VM with a static public IP

To create a VM with a static public IP address in the Azure portal, complete the following steps:
1. From a browser, navigate to the Azure portal and, if necessary, sign in with
your Azure account.

2. On the top left hand corner of the portal, click New > Compute > Windows Server 2012 R2 Datacenter.

3. In the Select a deployment model list, select Resource Manager and click Create.

4. In the Basics blade, enter the VM information as shown below, and then click OK.

5. In the Choose a size blade, click A1 Standard as shown below, and then click Select.

6. In the Settings blade, click Public IP address, then in the Create public IP
address blade, under Assignment, click Static as shown below. And then
click OK.

7. In the Settings blade, click OK.

8. Review the Summary blade, as shown below, and then click OK.

9. Once the VM is created, the Settings blade will be displayed as shown below
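
If you prefer scripting, a hedged PowerShell equivalent of the static assignment chosen in step 6 is a static public IP resource that can then be associated with the VM's NIC (the name, resource group, and location are assumptions):

# Sketch: a static public IP address for the VM created above.
New-AzureRmPublicIpAddress -Name WebServerPIP -ResourceGroupName TestRG `
    -Location westus -AllocationMethod Static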

Create network security groups using the Azure portal


You can use an NSG to control traffic to one or more virtual machines (VMs), role instances, network adapters (NICs),
or subnets in your virtual network. An NSG contains access control rules that allow or deny traffic based on traffic
direction, protocol, source address and port, and destination address and port. The rules of an NSG can be changed at
any time, and changes are applied to all associated instances.

For more information about NSGs, visit what is an NSG.

Scenario

To better illustrate how to create NSGs, this document will use the scenario below.

In this scenario you will create an NSG for each subnet in the TestVNet virtual network, as described below:

NSG-FrontEnd. The front end NSG will be applied to the FrontEnd subnet, and contain two rules:

o rdp-rule. This rule will allow RDP traffic to the FrontEnd subnet.

o web-rule. This rule will allow HTTP traffic to the FrontEnd subnet.

NSG-BackEnd. The back end NSG will be applied to the BackEnd subnet, and contain two rules:

o sql-rule. This rule allows SQL traffic only from the FrontEnd subnet.

o web-rule. This rule denies all internet bound traffic from the BackEnd subnet.

The combination of these rules creates a DMZ-like scenario, where the back end subnet can only receive incoming
traffic for SQL from the front end subnet and has no access to the Internet, while the front end subnet can
communicate with the Internet and receive incoming HTTP requests only.

The steps below expect a simple environment already created based on the scenario above. If you want to follow the
steps as they are displayed in this document, first build the test environment by deploying this template: click Deploy
to Azure, replace the default parameter values if necessary, and follow the instructions in the portal. The steps below
use RG-NSG as the name of the resource group the template was deployed to.

Create the NSG-FrontEnd NSG

To create the NSG-FrontEnd NSG as shown in the scenario above, follow the steps below.

1. From a browser, navigate to https://portal.azure.com and, if necessary, sign in with your Azure account.

2. Click Browse > Network Security Groups.

3. In the Network security groups blade, click Add.

4. In the Create network security group blade, create an NSG named NSG-FrontEnd in the RG-NSG resource
group, and then click Create.

Create rules in an existing NSG

To create rules in an existing NSG from the Azure portal, follow the steps below.

1. Click Browse > Network security groups.

2. In the list of NSGs, click NSG-FrontEnd > Inbound security rules

3. In the list of Inbound security rules, click Add.

4. In the Add inbound security rule blade, create a rule named web-rule with priority of 200 allowing access
via TCP to port 80 to any VM from any source, and then click OK. Notice that most of these settings are
default values already.

5. After a few seconds you will see the new rule in the NSG.

6. Repeat the previous steps to create an inbound rule named rdp-rule with a priority of 250 allowing access via TCP
to port 3389 to any VM from any source.

Associate the NSG to the FrontEnd subnet

1. Click Browse > Resource groups > RG-NSG.

2. In the RG-NSG blade, click ... > TestVNet.

3. In the Settings blade, click Subnets > FrontEnd > Network security group > NSG-FrontEnd.

4. In the FrontEnd blade, click Save.
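
For reference, the following is a hedged PowerShell sketch of the same result - creating NSG-FrontEnd with its two inbound rules and associating it with the FrontEnd subnet. RG-NSG and TestVNet come from the scenario; the location and the FrontEnd address prefix (192.168.1.0/24, matching the UDR scenario later in this guide) are assumptions:

# Sketch: create the NSG-FrontEnd rules and group, then attach the NSG to the FrontEnd subnet.
$webRule = New-AzureRmNetworkSecurityRuleConfig -Name web-rule -Priority 200 -Access Allow `
    -Direction Inbound -Protocol Tcp -SourceAddressPrefix * -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 80

$rdpRule = New-AzureRmNetworkSecurityRuleConfig -Name rdp-rule -Priority 250 -Access Allow `
    -Direction Inbound -Protocol Tcp -SourceAddressPrefix * -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 3389

$nsg = New-AzureRmNetworkSecurityGroup -Name NSG-FrontEnd -ResourceGroupName RG-NSG `
    -Location westus -SecurityRules $webRule, $rdpRule

$vnet = Get-AzureRmVirtualNetwork -ResourceGroupName RG-NSG -Name TestVNet
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name FrontEnd `
    -AddressPrefix 192.168.1.0/24 -NetworkSecurityGroup $nsg
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet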

Create the NSG-BackEnd NSG

To create the NSG-BackEnd NSG and associate it to the BackEnd subnet, follow the steps below.

1. Repeat the steps in Create the NSG-FrontEnd NSG to create an NSG named NSG-BackEnd

2. Repeat the steps in Create rules in an existing NSG to create the rules described below.

(Rule tables from the source article: the inbound sql-rule allows SQL traffic to the BackEnd subnet from the FrontEnd
subnet only; the outbound web-rule denies all Internet-bound traffic from the BackEnd subnet.)

3. Repeat the steps in Associate the NSG to the FrontEnd subnet to associate the NSG-Backend NSG to
the BackEnd subnet.

Create User-Defined Routes (UDR) using PowerShell


Although the use of system routes facilitates traffic automatically for your deployment, there are cases in which you
want to control the routing of packets through a virtual appliance. You can do so by creating user defined routes that
specify the next hop for packets flowing to a specific subnet to go to your virtual appliance instead, and enabling IP
forwarding for the VM running as the virtual appliance.

Some of the cases where virtual appliances can be used include:

Monitoring traffic with an intrusion detection system (IDS)

Controlling traffic with a firewall

For more information about UDR and IP forwarding, visit User Defined Routes and IP Forwarding.

Scenario

To better illustrate how to create UDRs, this document will use the scenario below.

In this scenario you will create one UDR for the front-end subnet and another UDR for the back-end subnet, as
described below:

UDR-FrontEnd. The front end UDR will be applied to the FrontEnd subnet, and contain one route:

o RouteToBackend. This route will send all traffic to the back end subnet to the FW1 virtual machine.

UDR-BackEnd. The back end UDR will be applied to the BackEnd subnet, and contain one route:

o RouteToFrontend. This route will send all traffic to the front end subnet to the FW1 virtual machine.

The combination of these routes will ensure that all traffic destined from one subnet to another will be routed to
the FW1 virtual machine, which is being used as a virtual appliance. You also need to turn on IP forwarding for that
VM, to ensure it can receive traffic destined to other VMs.

The sample PowerShell commands below expect a simple environment already created based on the scenario above.
If you want to run the commands as they are displayed in this document, first build the test environment by
deploying this template, click Deploy to Azure, replace the default parameter values if necessary, and follow the
instructions in the portal.

Prerequisite: Install the Azure PowerShell Module

To perform the steps in this article, you'll need to install and configure Azure PowerShell and follow the instructions
all the way to the end to sign into Azure and select your subscription.

Note

If you don't have an Azure account, you'll need one. Go sign up for a free trial here.

Create the UDR for the front-end subnet

To create the route table and route needed for the front-end subnet based on the scenario above, complete the
following steps:

1. Create a route used to send all traffic destined to the back-end subnet (192.168.2.0/24) to be routed to
the FW1 virtual appliance (192.168.0.4).
$route = New-AzureRmRouteConfig -Name RouteToBackEnd `

-AddressPrefix 192.168.2.0/24 -NextHopType VirtualAppliance `

-NextHopIpAddress 192.168.0.4

2. Create a route table named UDR-FrontEnd in the westus region that contains the route.
$routeTable = New-AzureRmRouteTable -ResourceGroupName TestRG -Location westus `

-Name UDR-FrontEnd -Route $route

3. Create a variable that contains the VNet where the subnet is. In our scenario, the VNet is named TestVNet.
$vnet = Get-AzureRmVirtualNetwork -ResourceGroupName TestRG -Name TestVNet

4. Associate the route table created above to the FrontEnd subnet.


Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name FrontEnd `

-AddressPrefix 192.168.1.0/24 -RouteTable $routeTable

Warning

The output for the command above shows the content for the virtual network configuration object, which only exists
on the computer where you are running PowerShell. You need to run the Set-AzureRmVirtualNetwork cmdlet to save
these settings to Azure.

5. Save the new subnet configuration in Azure.


Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

Expected output:
Name : TestVNet
ResourceGroupName : TestRG
Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =====
displayName VNet

AddressSpace : {
"AddressPrefixes": [
"192.168.0.0/16"
]
}
DhcpOptions : {
"DnsServers": null
}
NetworkInterfaces : null
Subnets : [
...,
{
"Name": "FrontEnd",
"Etag": "W/\"[Id]\"",
"Id":
"/subscriptions/[Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/F
rontEnd",
"AddressPrefix": "192.168.1.0/24",
"IpConfigurations": [
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICWEB2/ipConfigurations/ipconfig
1"
},
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICWEB1/ipConfigurations/ipconfig
1"
}
],
"NetworkSecurityGroup": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkSecurityGroups/NSG-FrontEnd"
},
"RouteTable": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/routeTables/UDR-FrontEnd"
},
"ProvisioningState": "Succeeded"
},
...
]
Create the UDR for the back-end subnet

To create the route table and route needed for the back-end subnet based on the scenario above, follow the steps
below.

1. Create a route used to send all traffic destined to the front-end subnet (192.168.1.0/24) to be routed to
the FW1 virtual appliance (192.168.0.4).
$route = New-AzureRmRouteConfig -Name RouteToFrontEnd `
-AddressPrefix 192.168.1.0/24 -NextHopType VirtualAppliance `
-NextHopIpAddress 192.168.0.4

2. Create a route table named UDR-BackEnd in the westus region that contains the route created above.
$routeTable = New-AzureRmRouteTable -ResourceGroupName TestRG -Location westus `
-Name UDR-BackEnd -Route $route

3. Associate the route table created above to the BackEnd subnet.


Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name BackEnd `
-AddressPrefix 192.168.2.0/24 -RouteTable $routeTable

4. Save the new subnet configuration in Azure.


Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

Expected output:
Name : TestVNet
ResourceGroupName : TestRG
Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =====
displayName VNet

AddressSpace : {
"AddressPrefixes": [
"192.168.0.0/16"
]
}
DhcpOptions : {
"DnsServers": null
}
NetworkInterfaces : null
Subnets : [
...,
{
"Name": "BackEnd",
"Etag": "W/\"[Id]\"",
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/BackEnd",
"AddressPrefix": "192.168.2.0/24",
"IpConfigurations": [
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICSQL2/ipConfigurations/ipconfig
1"
},
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICSQL1/ipConfigurations/ipconfig
1"
}
],
"NetworkSecurityGroup": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkSecurityGroups/NSG-BacEnd"
},
"RouteTable": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/routeTables/UDR-BackEnd"
},
"ProvisioningState": "Succeeded"
}
]
Enable IP forwarding on FW1

To enable IP forwarding in the NIC used by FW1, follow the steps below.

1. Create a variable that contains the settings for the NIC used by FW1. In our scenario, the NIC is
named NICFW1.
$nicfw1 = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name NICFW1

2. Enable IP forwarding, and save the NIC settings.


$nicfw1.EnableIPForwarding = 1
Set-AzureRmNetworkInterface -NetworkInterface $nicfw1

Expected output:
Name : NICFW1
ResourceGroupName : TestRG

Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICFW1
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =======================
displayName NetworkInterfaces - DMZ

VirtualMachine : {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Compute/virtualMachines/FW1"
}
IpConfigurations : [
{
"Name": "ipconfig1",
"Etag": "W/\"[Id]\"",
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICFW1/ipConfigurations/ipconfig1
",
"PrivateIpAddress": "192.168.0.4",
"PrivateIpAllocationMethod": "Static",
"Subnet": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/DMZ"
},
"PublicIpAddress": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/publicIPAddresses/PIPFW1"
},
"LoadBalancerBackendAddressPools": [],
"LoadBalancerInboundNatRules": [],
"ProvisioningState": "Succeeded"
}
]
DnsSettings : {
"DnsServers": [],
"AppliedDnsServers": [],
"InternalDnsNameLabel": null,
"InternalFqdn": null
}
EnableIPForwarding : True
NetworkSecurityGroup : null
Primary : True

CHAPTER 2: Architect an Azure Compute infrastructure

Regions and availability for virtual machines in Azure


It is important to understand how and where your virtual machines (VMs) operate in Azure, along with your options to
maximize performance, availability, and redundancy. Azure operates in multiple datacenters around the world. These
datacenters are grouped into geographic regions, giving you flexibility in choosing where to build your applications.
This article provides you with an overview of the availability and redundancy features of Azure.

What are Azure regions?

Azure allows you to create resources, such as VMs, in defined geographic regions like 'West US', 'North Europe', or
'Southeast Asia'. There are currently 30 Azure regions around the world. You can review the list of regions and their
locations. Within each region, multiple datacenters exist to provide for redundancy and availability. This approach
gives you flexibility when building your applications to create VMs closest to your users and to meet any legal,
compliance, or tax requirements.

Special Azure regions

There are some special Azure regions for compliance or legal purposes that you may wish to use when building out
your applications. These special regions include:

US Gov Virginia and US Gov Iowa

o A physical and logical network-isolated instance of Azure for US government agencies and partners,
operated by screened US persons. Includes additional compliance certifications such
as FedRAMP and DISA. Read more about Azure Government.

China East and China North

o These regions are available through a unique partnership between Microsoft and 21Vianet, whereby
Microsoft does not directly maintain the datacenters. See more about Microsoft Azure in China.

Germany Central and Germany Northeast

o These regions are currently available via a data trustee model whereby customer data remains in
Germany under control of T-Systems, a Deutsche Telekom company, acting as the German data
trustee.

Region pairs

Each Azure region is paired with another region within the same geography (such as US, Europe, or Asia). This
approach allows for the replication of resources, such as VM storage, across a geography that should reduce the
likelihood of natural disasters, civil unrest, power outages, or physical network outages affecting both regions at once.
Additional advantages of region pairs include:

In the event of a wider Azure outage, one region is prioritized out of every pair to help reduce the time to
restore for applications.

Planned Azure updates are rolled out to paired regions one at a time to minimize downtime and risk of
application outage.

Data continues to reside within the same geography as its pair (except for Brazil South) for tax and law
enforcement jurisdiction purposes.

Examples of region pairs include:

Primary Secondary

West US East US

North Europe West Europe

Southeast Asia East Asia

You can see the full list of regional pairs here.

Feature availability

Some services or VM features are only available in certain regions, such as specific VM sizes or storage types. There
are also some global Azure services that do not require you to select a particular region, such as Azure Active
Directory, Traffic Manager, or Azure DNS. To assist you in designing your application environment, you can check
the availability of Azure services across each region.

Storage availability

Understanding Azure regions and geographies becomes important when you consider the available storage replication
options. Depending on the storage type, you have different replication options.

Azure Managed Disks

Locally redundant storage (LRS)

o Replicates your data three times within the region in which you created your storage account.

Storage account-based disks

Locally redundant storage (LRS)

o Replicates your data three times within the region in which you created your storage account.

Zone redundant storage (ZRS)

o Replicates your data three times across two to three facilities, either within a single region or across
two regions.

Geo-redundant storage (GRS)

o Replicates your data to a secondary region that is hundreds of miles away from the primary region.

Read-access geo-redundant storage (RA-GRS)

o Replicates your data to a secondary region, as with GRS, but also then provides read-only access to
the data in the secondary location.

The following table provides a quick overview of the differences between the storage replication types:

Replication strategy                                                          LRS    ZRS    GRS    RA-GRS

Data is replicated across multiple facilities.                                No     Yes    Yes    Yes

Data can be read from the secondary location and from the primary location.  No     No     No     Yes

Number of copies of data maintained on separate nodes.                       3      3      6      6

You can read more about Azure Storage replication options here. For more information about managed disks,
see Azure Managed Disks overview.
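
The replication option is selected when the storage account is created. A hedged PowerShell sketch (the account name, resource group, and location are assumptions):

# Sketch: create a storage account that uses geo-redundant storage (GRS).
# Other -SkuName values include Standard_LRS, Standard_ZRS, Standard_RAGRS, and Premium_LRS.
New-AzureRmStorageAccount -ResourceGroupName TestRG -Name mystorageacct001 `
    -Location westus -SkuName Standard_GRS -Kind Storage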

Storage costs

Prices vary depending on the storage type and availability that you select.

Azure Managed Disks

Premium Managed Disks are backed by Solid State Drives (SSDs) and Standard Managed Disks are backed by
regular spinning disks. Both Premium and Standard Managed Disks are charged based on the provisioned
capacity for the disk.

Unmanaged disks

Premium storage is backed by Solid State Drives (SSDs) and is charged based on the capacity of the disk.

Standard storage is backed by regular spinning disks and is charged based on the in-use capacity and desired
storage availability.

o For RA-GRS, there is an additional Geo-Replication Data Transfer charge for the bandwidth of
replicating that data to another Azure region.

Azure images

In Azure, VMs are created from an image. Typically, images are from the Azure Marketplace where partners can
provide pre-configured complete OS or application images.

When you create a VM from an image in the Azure Marketplace, you are actually working with templates. Azure
Resource Manager templates are declarative JavaScript Object Notation (JSON) files that can be used to create
complex application environments comprising VMs, storage, virtual networking, etc. You can read more about
using Azure Resource Manager templates, including how to build your own templates.

You can also create your own custom images and upload them using Azure CLI or Azure PowerShell to quickly create
custom VMs to your specific build requirements.

Availability sets

An availability set is a logical grouping of VMs that allows Azure to understand how your application is built to provide
for redundancy and availability. It is recommended that two or more VMs are created within an availability set to
provide for a highly available application and to meet the 99.95% Azure SLA. When a single VM is using Azure
Premium Storage, the Azure SLA applies for unplanned maintenance events. An availability set is composed of two
additional groupings that protect against hardware failures and allow updates to safely be applied - fault domains
(FDs) and update domains (UDs).
You can read more about how to manage the availability of Linux VMs or Windows VMs.

Fault domains

A fault domain is a logical group of underlying hardware that share a common power source and network switch,
similar to a rack within an on-premises datacenter. As you create VMs within an availability set, the Azure platform
automatically distributes your VMs across these fault domains. This approach limits the impact of potential physical
hardware failures, network outages, or power interruptions.

Managed Disk fault domains and availability sets

For VMs using Azure Managed Disks, VMs are aligned with managed disk fault domains when using a managed
availability set. This alignment ensures that all the managed disks attached to a VM are within the same managed disk
fault domain. Only VMs with managed disks can be created in a managed availability set. The number of managed disk
fault domains varies by region - either two or three managed disk fault domains per region.

Important

The number of fault domains for managed availability sets varies by region - either two or three per region

Update domains

An update domain is a logical group of underlying hardware that can undergo maintenance or be rebooted at the
same time. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across
these update domains. This approach ensures that at least one instance of your application always remains running as
the Azure platform undergoes periodic maintenance. The order of update domains being rebooted may not proceed
sequentially during planned maintenance, but only one update domain is rebooted at a time.

Understand planned vs. unplanned maintenance

There are two types of Microsoft Azure platform events that can affect the availability of your virtual machines:
planned maintenance and unplanned maintenance.

Planned maintenance events are periodic updates made by Microsoft to the underlying Azure platform to
improve overall reliability, performance, and security of the platform infrastructure that your virtual machines
run on. Most of these updates are performed without any impact upon your virtual machines or cloud
services. However, there are instances where these updates require a reboot of your virtual machine to apply
the required updates to the platform infrastructure.

Unplanned maintenance events occur when the hardware or physical infrastructure underlying your virtual
machine has faulted in some way. This may include local network failures, local disk failures, or other rack
level failures. When such a failure is detected, the Azure platform automatically migrates your virtual machine
from the unhealthy physical machine hosting your virtual machine to a healthy physical machine. Such events
are rare, but may also cause your virtual machine to reboot.

To reduce the impact of downtime due to one or more of these events, we recommend the following high availability
best practices for your virtual machines:

Configure multiple virtual machines in an availability set for redundancy

To provide redundancy to your application, we recommend that you group two or more virtual machines in an
availability set. This configuration ensures that during either a planned or unplanned maintenance event, at least one
virtual machine is available and meets the 99.95% Azure SLA. For more information, see the SLA for Virtual Machines.

Important

Avoid leaving a single-instance virtual machine in an availability set by itself. VMs in this configuration do not qualify
for an SLA guarantee and face downtime during Azure planned maintenance events, except when a single VM is
using Azure Premium Storage. For single VMs using premium storage, the Azure SLA applies.

Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure
platform. For a given availability set, five non-user-configurable update domains are assigned by default (for Resource
Manager deployments, this can be increased to provide up to 20 update domains) to indicate groups of virtual
machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual
machines are configured within a single availability set, the sixth virtual machine is placed into the same update
domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on.
The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only
one update domain is rebooted at a time.

Fault domains define the group of virtual machines that share a common power source and network switch. By
default, the virtual machines configured within your availability set are separated across up to three fault domains for
Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an
availability set does not protect your application from operating system or application-specific failures, it does limit
the impact of potential physical hardware failures, network outages, or power interruptions.
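
As an illustration, the following PowerShell sketch creates a managed availability set and associates a new VM with it. It assumes the AzureRM module; the resource group, location, and names are placeholders, and depending on your AzureRM.Compute version the managed flag is either -Sku Aligned or the older -Managed switch. The full VM configuration (image, credentials, NIC) is omitted.

# Sketch: create a managed ("Aligned") availability set with 2 fault domains and 5 update domains.
# "MyRG", "MyAvSet", and the location are placeholder values.
New-AzureRmAvailabilitySet -ResourceGroupName "MyRG" -Name "MyAvSet" -Location "West Europe" `
    -PlatformFaultDomainCount 2 -PlatformUpdateDomainCount 5 -Sku Aligned

$avSet = Get-AzureRmAvailabilitySet -ResourceGroupName "MyRG" -Name "MyAvSet"

# Associate each new VM with the availability set at creation time; a VM cannot be added
# to an availability set after it has been created. Remaining VM configuration is omitted.
$vmConfig = New-AzureRmVMConfig -VMName "MyVm1" -VMSize "Standard_DS1_v2" -AvailabilitySetId $avSet.Id
# ... Set-AzureRmVMOperatingSystem / Set-AzureRmVMSourceImage / Add-AzureRmVMNetworkInterface ...
# New-AzureRmVM -ResourceGroupName "MyRG" -Location "West Europe" -VM $vmConfig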

Use managed disks for VMs in an availability set

If you are currently using VMs with unmanaged disks, we highly recommend that you convert the VMs in your
availability set to use managed disks.

Managed disks provide better reliability for availability sets by ensuring that the disks of VMs in an availability set are
sufficiently isolated from each other to avoid single points of failure. They do this by automatically placing the disks in
different storage clusters. If a storage cluster fails due to hardware or software failure, only the VM instances with
disks in that storage cluster fail.
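
The conversion itself can be scripted. The following is a minimal sketch, assuming the AzureRM module and placeholder names; depending on the module version, the availability set is marked as managed with -Sku Aligned or the older -Managed switch, and each VM is deallocated while its disks are converted.

# Sketch: convert an availability set and its VMs from unmanaged to managed disks.
$rg    = "MyRG"                                    # placeholder resource group name
$avSet = Get-AzureRmAvailabilitySet -ResourceGroupName $rg -Name "MyAvSet"

# Mark the availability set as "Aligned" (managed) so managed disk fault domains apply.
Update-AzureRmAvailabilitySet -AvailabilitySet $avSet -Sku Aligned

foreach ($vmRef in $avSet.VirtualMachinesReferences) {
    $vm = Get-AzureRmVM -ResourceGroupName $rg | Where-Object { $_.Id -eq $vmRef.Id }
    Stop-AzureRmVM -ResourceGroupName $rg -Name $vm.Name -Force            # VM must be deallocated
    ConvertTo-AzureRmVMManagedDisk -ResourceGroupName $rg -VMName $vm.Name  # restarts the VM when done
}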

If you plan to use VMs with unmanaged disks, follow the best practices below for storage accounts where the virtual
hard disks (VHDs) of VMs are stored as page blobs.

1. Keep all disks (OS and data) associated with a VM in the same storage account

2. Review the limits on the number of unmanaged disks in a Storage account before adding more VHDs to a
storage account

3. Use a separate storage account for each VM in an availability set. Do not share storage accounts with multiple
VMs in the same availability set. It is acceptable for VMs across different availability sets to share storage
accounts if the above best practices are followed

Configure each application tier into separate availability sets

If your virtual machines are all nearly identical and serve the same purpose for your application, we recommend that
you configure an availability set for each tier of your application. If you place two different tiers in the same
availability set, all virtual machines in the same application tier can be rebooted at once. By configuring at least two
virtual machines in an availability set for each tier, you guarantee that at least one virtual machine in each tier is
available.

For example, you could put all the virtual machines in the front end of your application running IIS, Apache, or Nginx
in a single availability set. Make sure that only front-end virtual machines are placed in the same availability set.
Similarly, make sure that only data-tier virtual machines are placed in their own availability set, such as your
replicated SQL Server virtual machines or your MySQL virtual machines.

Combine a load balancer with availability sets

Combine the Azure Load Balancer with an availability set to get the most application resiliency. The Azure Load
Balancer distributes traffic between multiple virtual machines. For our Standard tier virtual machines, the Azure Load
Balancer is included. Not all virtual machine tiers include the Azure Load Balancer. For more information about load
balancing your virtual machines, see Load Balancing virtual machines.

If the load balancer is not configured to balance traffic across multiple virtual machines, then any planned
maintenance event affects the only traffic-serving virtual machine, causing an outage to your application tier. Placing
multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be
continuously served by at least one instance.
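
As a rough sketch (placeholder names; it assumes the AzureRM module and that a load balancer with at least one backend address pool, plus a NIC attached to a VM in the availability set, already exist), a VM's NIC can be joined to a load balancer backend pool like this:

# Sketch: add an existing VM's NIC to an existing Azure Load Balancer backend pool.
$lb  = Get-AzureRmLoadBalancer     -ResourceGroupName "MyRG" -Name "MyLoadBalancer"
$nic = Get-AzureRmNetworkInterface -ResourceGroupName "MyRG" -Name "MyVmNic"

# Reference the first backend pool from the NIC's primary IP configuration, then save the NIC.
$nic.IpConfigurations[0].LoadBalancerBackendAddressPools.Add($lb.BackendAddressPools[0])
Set-AzureRmNetworkInterface -NetworkInterface $nic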

Sizes for Windows virtual machines in Azure

This article describes the available sizes and options for the Azure virtual machines you can use to run your Windows
apps and workloads. It also provides deployment considerations to be aware of when you're planning to use these
resources. This article is also available for Linux virtual machines.


General purpose (DSv2, Dv2, DS, D, Av2, A0-7): Balanced CPU-to-memory ratio. Ideal for testing and development,
small to medium databases, and low to medium traffic web servers.

Compute optimized (Fs, F): High CPU-to-memory ratio. Good for medium traffic web servers, network appliances,
batch processes, and application servers.

Memory optimized (M, GS, G, DSv2, DS, Dv2, D): High memory-to-core ratio. Great for relational database servers,
medium to large caches, and in-memory analytics.

Storage optimized (Ls): High disk throughput and IO. Ideal for Big Data, SQL, and NoSQL databases.

GPU (NV, NC): Specialized virtual machines targeted for heavy graphic rendering and video editing. Available with
single or multiple GPUs.

High performance compute (H, A8-11): Our fastest and most powerful CPU virtual machines with optional
high-throughput network interfaces (RDMA).
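
The sizes actually offered vary by region and subscription, so it is worth listing them before choosing one. A minimal PowerShell sketch (the region and names are placeholders):

# List the VM sizes available in a region, with core count, memory, and data disk limits.
Get-AzureRmVMSize -Location "West Europe" |
    Sort-Object Name |
    Format-Table Name, NumberOfCores, MemoryInMB, MaxDataDiskCount -AutoSize

# For an existing VM, the sizes it can be resized to can be listed with:
# Get-AzureRmVMSize -ResourceGroupName "MyRG" -VMName "MyVm"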

Understand the structure and syntax of Azure Resource Manager templates


This topic describes the structure of an Azure Resource Manager template. It presents the different sections of a
template and the properties that are available in those sections. The template consists of JSON and expressions that
you can use to construct values for your deployment. For a step-by-step tutorial on creating a template, see Create
your first Azure Resource Manager template.

Template format

In its simplest structure, a template contains the following elements:


{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "",
"parameters": { },
"variables": { },
"resources": [ ],
"outputs": { }
}


Element name Required Description

$schema Yes Location of the JSON schema file that describes the version of the
template language. Use the URL shown in the preceding example.

contentVersion Yes Version of the template (such as 1.0.0.0). You can provide any value for
this element. When deploying resources using the template, this value
can be used to make sure that the right template is being used.

parameters No Values that are provided when deployment is executed to customize resource deployment.

variables No Values that are used as JSON fragments in the template to simplify
template language expressions.

resources Yes Resource types that are deployed or updated in a resource group.

outputs No Values that are returned after deployment.

Each element contains properties you can set. The following example contains the full syntax for a template:
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "",
"parameters": {
"<parameter-name>" : {
"type" : "<type-of-parameter-value>",
"defaultValue": "<default-value-of-parameter>",
"allowedValues": [ "<array-of-allowed-values>" ],
"minValue": <minimum-value-for-int>,
"maxValue": <maximum-value-for-int>,
"minLength": <minimum-length-for-string-or-array>,
"maxLength": <maximum-length-for-string-or-array-parameters>,
"metadata": {
"description": "<description-of-the parameter>"
}
}
},
"variables": {
"<variable-name>": "<variable-value>",
"<variable-name>": {
<variable-complex-type-value>
}
},
"resources": [
{
"condition": "<boolean-value-whether-to-deploy>",
"apiVersion": "<api-version-of-resource>",
"type": "<resource-provider-namespace/resource-type-name>",
"name": "<name-of-the-resource>",
"location": "<location-of-resource>",
"tags": {
"<tag-name1>": "<tag-value1>",
"<tag-name2>": "<tag-value2>"

},
"comments": "<your-reference-notes>",
"copy": {
"name": "<name-of-copy-loop>",
"count": "<number-of-iterations>",
"mode": "<serial-or-parallel>",
"batchSize": "<number-to-deploy-serially>"
},
"dependsOn": [
"<array-of-related-resource-names>"
],
"properties": {
"<settings-for-the-resource>",
"copy": [
{
"name": ,
"count": ,
"input": {}
}
]
},
"resources": [
"<array-of-child-resources>"
]
}
],
"outputs": {
"<outputName>" : {
"type" : "<type-of-output-value>",
"value": "<output-value-expression>"
}
}
}
We examine the sections of the template in greater detail later in this topic.

Expressions and functions

The basic syntax of the template is JSON. However, expressions and functions extend the JSON values available within
the template. Expressions are written within JSON string literals whose first and last characters are the
brackets: [ and ], respectively. The value of the expression is evaluated when the template is deployed. While written
as a string literal, the result of evaluating the expression can be of a different JSON type, such as an array or integer,
depending on the actual expression. To have a literal string start with a bracket [, but not have it interpreted as an
expression, add an extra bracket to start the string with [[.

Typically, you use expressions with functions to perform operations for configuring the deployment. Just like in
JavaScript, function calls are formatted as functionName(arg1,arg2,arg3). You reference properties by using the dot
and [index] operators.

The following example shows how to use several functions when constructing values:
"variables": {
"location": "[resourceGroup().location]",
"usernameAndPassword": "[concat(parameters('username'), ':', parameters('password'))]",
"authorizationHeader": "[concat('Basic ', base64(variables('usernameAndPassword')))]"
}
For the full list of template functions, see Azure Resource Manager template functions.

Parameters

In the parameters section of the template, you specify which values you can input when deploying the resources.
These parameter values enable you to customize the deployment by providing values that are tailored for a particular
environment (such as dev, test, and production). You do not have to provide parameters in your template, but
without parameters your template would always deploy the same resources with the same names, locations, and
properties.


You define parameters with the following structure:


"parameters": {
"<parameter-name>" : {
"type" : "<type-of-parameter-value>",
"defaultValue": "<default-value-of-parameter>",
"allowedValues": [ "<array-of-allowed-values>" ],
"minValue": <minimum-value-for-int>,
"maxValue": <maximum-value-for-int>,
"minLength": <minimum-length-for-string-or-array>,
"maxLength": <maximum-length-for-string-or-array-parameters>,
"metadata": {
"description": "<description-of-the parameter>"
}
}
}

Element name Required Description

parameterName Yes Name of the parameter. Must be a valid JavaScript identifier.

type Yes Type of the parameter value. See the list of allowed types after this
table.

defaultValue No Default value for the parameter, if no value is provided for the
parameter.

allowedValues No Array of allowed values for the parameter to make sure that the right
value is provided.

minValue No The minimum value for int type parameters, this value is inclusive.

maxValue No The maximum value for int type parameters, this value is inclusive.

minLength No The minimum length for string, secureString, and array type
parameters, this value is inclusive.

maxLength No The maximum length for string, secureString, and array type
parameters, this value is inclusive.

description No Description of the parameter that is displayed to users through the portal.

The allowed types and values are:

string
secureString
int
bool
object
secureObject
array
To specify a parameter as optional, provide a defaultValue (can be an empty string).

If you specify a parameter name in your template that matches a parameter in the command to deploy the template,
there is potential ambiguity about the values you provide. Resource Manager resolves this confusion by adding the
postfix FromTemplate to the template parameter. For example, if you include a parameter
named ResourceGroupName in your template, it conflicts with the ResourceGroupName parameter in the New-
AzureRmResourceGroupDeployment cmdlet. During deployment, you are prompted to provide a value
for ResourceGroupNameFromTemplate. In general, you should avoid this confusion by not naming parameters with
the same name as parameters used for deployment operations.

Note

All passwords, keys, and other secrets should use the secureString type. If you pass sensitive data in a JSON object, use
the secureObject type. Template parameters with secureString or secureObject types cannot be read after resource
deployment.

For example, the following entry in the deployment history shows the value for a string and object but not for
secureString and secureObject.

The following example shows how to define parameters:


"parameters": {
"siteName": {
"type": "string",
"defaultValue": "[concat('site', uniqueString(resourceGroup().id))]"
},
"hostingPlanName": {
"type": "string",
"defaultValue": "[concat(parameters('siteName'),'-plan')]"
},
"skuName": {
"type": "string",
"defaultValue": "F1",
"allowedValues": [
"F1",
"D1",
"B1",
"B2",
"B3",
"S1",
"S2",
"S3",
"P1",
"P2",
"P3",

66 | P a g e
70-534 Architecting Microsoft Azure Solutions

"P4"
]
},
"skuCapacity": {
"type": "int",
"defaultValue": 1,
"minValue": 1
}
}
For how to input the parameter values during deployment, see Deploy an application with Azure Resource Manager
template.

Variables

In the variables section, you construct values that can be used throughout your template. You do not need to define
variables, but they often simplify your template by reducing complex expressions.

You define variables with the following structure:


"variables": {
"<variable-name>": "<variable-value>",
"<variable-name>": {
<variable-complex-type-value>
}
}
The following example shows how to define a variable that is constructed from two parameter values:
"variables": {
"connectionString": "[concat('Name=', parameters('username'), ';Password=', parameters('password'))]"
}
The next example shows a variable that is a complex JSON type, and variables that are constructed from other
variables:
"parameters": {
"environmentName": {
"type": "string",
"allowedValues": [
"test",
"prod"
]
}
},
"variables": {
"environmentSettings": {
"test": {
"instancesSize": "Small",
"instancesCount": 1
},
"prod": {
"instancesSize": "Large",
"instancesCount": 4
}
},
"currentEnvironmentSettings": "[variables('environmentSettings')[parameters('environmentName')]]",
"instancesSize": "[variables('currentEnvironmentSettings').instancesSize]",
"instancesCount": "[variables('currentEnvironmentSettings').instancesCount]"
}

Resources

In the resources section, you define the resources that are deployed or updated. This section can get complicated
because you must understand the types you are deploying to provide the right values. For the resource-specific values
(apiVersion, type, and properties) that you need to set, see Define resources in Azure Resource Manager templates.

You define resources with the following structure:



"resources": [
{
"condition": "<boolean-value-whether-to-deploy>",
"apiVersion": "<api-version-of-resource>",
"type": "<resource-provider-namespace/resource-type-name>",
"name": "<name-of-the-resource>",
"location": "<location-of-resource>",
"tags": {
"<tag-name1>": "<tag-value1>",
"<tag-name2>": "<tag-value2>"
},
"comments": "<your-reference-notes>",
"copy": {
"name": "<name-of-copy-loop>",
"count": "<number-of-iterations>",
"mode": "<serial-or-parallel>",
"batchSize": "<number-to-deploy-serially>"
},
"dependsOn": [
"<array-of-related-resource-names>"
],
"properties": {
"<settings-for-the-resource>",
"copy": [
{
"name": ,
"count": ,
"input": {}
}
]
},
"resources": [
"<array-of-child-resources>"
]
}
]

Element name Required Description

condition No Boolean value that indicates whether the resource is deployed.

apiVersion Yes Version of the REST API to use for creating the resource.

type Yes Type of the resource. This value is a combination of the namespace of the
resource provider and the resource type (such
as Microsoft.Storage/storageAccounts).

name Yes Name of the resource. The name must follow URI component restrictions
defined in RFC3986. In addition, Azure services that expose the resource
name to outside parties validate the name to make sure it is not an attempt
to spoof another identity.

location Varies Supported geo-locations of the provided resource. You can select any of the
available locations, but typically it makes sense to pick one that is close to
your users. Usually, it also makes sense to place resources that interact with
each other in the same region. Most resource types require a location, but
some types (such as a role assignment) do not require a location. See Set
resource location in Azure Resource Manager templates.

tags No Tags that are associated with the resource. See Tag resources in Azure
Resource Manager templates.

comments No Your notes for documenting the resources in your template

copy No If more than one instance is needed, the number of resources to create. The
default mode is parallel. Specify serial mode when you do not want all of the
resources to deploy at the same time. For more information, see Create
multiple instances of resources in Azure Resource Manager.

dependsOn No Resources that must be deployed before this resource is deployed. Resource
Manager evaluates the dependencies between resources and deploys them
in the correct order. When resources are not dependent on each other, they
are deployed in parallel. The value can be a comma-separated list of
resource names or resource unique identifiers. Only list resources that are
deployed in this template. Resources that are not defined in this template
must already exist. Avoid adding unnecessary dependencies as they can slow
your deployment and create circular dependencies. For guidance on setting
dependencies, see Defining dependencies in Azure Resource Manager
templates.

properties No Resource-specific configuration settings. The values for the properties are
the same as the values you provide in the request body for the REST API
operation (PUT method) to create the resource. You can also specify a copy
array to create multiple instances of a property. For more information,
see Create multiple instances of resources in Azure Resource Manager.

resources No Child resources that depend on the resource being defined. Only provide
resource types that are permitted by the schema of the parent resource.
The fully qualified type of the child resource includes the parent resource
type, such as Microsoft.Web/sites/extensions. Dependency on the parent
resource is not implied. You must explicitly define that dependency.

The resources section contains an array of the resources to deploy. Within each resource, you can also define an array
of child resources. Therefore, your resources section could have a structure like:
"resources": [
{
"name": "resourceA",


},
{
"name": "resourceB",
"resources": [
{
"name": "firstChildResourceB",
},
{
"name": "secondChildResourceB",
}
]
},
{
"name": "resourceC",
}
]
For more information about defining child resources, see Set name and type for child resource in Resource Manager
template.

The condition element specifies whether the resource is deployed. The value for this element resolves to true or false.
For example, to specify whether a new storage account is deployed, use:
{
"condition": "[equals(parameters('newOrExisting'),'new')]",
"type": "Microsoft.Storage/storageAccounts",
"name": "[variables('storageAccountName')]",
"apiVersion": "2017-06-01",
"location": "[resourceGroup().location]",
"sku": {
"name": "[variables('storageAccountType')]"
},
"kind": "Storage",
"properties": {}
}
For an example of using a new or existing resource, see New or existing condition template.

To specify whether a virtual machine is deployed with a password or SSH key, define two versions of the virtual
machine in your template and use condition to differentiate usage. Pass a parameter that specifies which scenario to
deploy.
{
"condition": "[equals(parameters('passwordOrSshKey'),'password')]",
"apiVersion": "2016-03-30",
"type": "Microsoft.Compute/virtualMachines",
"name": "[concat(variables('vmName'),'password')]",
"properties": {
"osProfile": {
"computerName": "[variables('vmName')]",
"adminUsername": "[parameters('adminUsername')]",
"adminPassword": "[parameters('adminPassword')]"
},
...
},
...
},
{
"condition": "[equals(parameters('passwordOrSshKey'),'sshKey')]",
"apiVersion": "2016-03-30",
"type": "Microsoft.Compute/virtualMachines",
"name": "[concat(variables('vmName'),'ssh')]",
"properties": {
"osProfile": {
"linuxConfiguration": {
"disablePasswordAuthentication": "true",
"ssh": {
"publicKeys": [
{


"path": "[variables('sshKeyPath')]",
"keyData": "[parameters('adminSshKey')]"
}
]
}
}
},
...
},
...
}
For an example of using a password or SSH key to deploy virtual machine, see Username or SSH condition template.

Outputs

In the Outputs section, you specify values that are returned from deployment. For example, you could return the URI
to access a deployed resource.

The following example shows the structure of an output definition:


"outputs": {
"<outputName>" : {
"type" : "<type-of-output-value>",
"value": "<output-value-expression>"
}
}

Element name Required Description

outputName Yes Name of the output value. Must be a valid JavaScript identifier.

type Yes Type of the output value. Output values support the same types as
template input parameters.

value Yes Template language expression that is evaluated and returned as output
value.

The following example shows a value that is returned in the Outputs section.
"outputs": {
"siteUri" : {
"type" : "string",
"value": "[concat('http://',reference(resourceId('Microsoft.Web/sites',
parameters('siteName'))).hostNames[0])]"
}
}
For more information about working with output, see Sharing state in Azure Resource Manager templates.
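
For example, after a deployment completes you can read an output such as siteUri back with PowerShell. This is a small sketch assuming a deployment named ExampleDeployment in resource group ExampleResourceGroup whose template defines the siteUri output shown above:

# Retrieve output values from a completed deployment.
$deployment = Get-AzureRmResourceGroupDeployment -ResourceGroupName "ExampleResourceGroup" -Name "ExampleDeployment"
$deployment.Outputs.siteUri.Value    # the evaluated output value, for example the site's URI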

Template limits

Limit the size of your template to 1 MB, and each parameter file to 64 KB. The 1-MB limit applies to the final state of
the template after it has been expanded with iterative resource definitions, and values for variables and parameters.

You are also limited to:

256 parameters

256 variables

800 resources (including copy count)

64 output values

24,576 characters in a template expression

You can exceed some template limits by using a nested template. For more information, see Using linked templates
when deploying Azure resources. To reduce the number of parameters, variables, or outputs, you can combine
several values into an object. For more information, see Objects as parameters.

Deploy resources with Resource Manager templates and Azure PowerShell


This topic explains how to use Azure PowerShell with Resource Manager templates to deploy your resources to Azure.
If you are not familiar with the concepts of deploying and managing your Azure solutions, see Azure Resource
Manager overview.

The Resource Manager template you deploy can either be a local file on your machine, or an external file that is
located in a repository like GitHub. The template you deploy in this article is available in the Sample template section,
or as storage account template in GitHub.

If needed, install the Azure PowerShell module using the instructions found in the Azure PowerShell guide, and then
run Login-AzureRmAccount to create a connection with Azure. Also, you need to have an SSH public key
named id_rsa.pub in the .ssh directory of your user profile.

Deploy a template from your local machine

When deploying resources to Azure, you:

1. Log in to your Azure account


2. Create a resource group that serves as the container for the deployed resources. The name of the resource
group can only include alphanumeric characters, periods, underscores, hyphens, and parentheses. It can be up
to 90 characters. It cannot end in a period.
3. Deploy to the resource group the template that defines the resources to create
A template can include parameters that enable you to customize the deployment. For example, you can provide
values that are tailored for a particular environment (such as dev, test, and production). The sample template defines
a parameter for the storage account SKU.

The following example creates a resource group, and deploys a template from your local machine:
Login-AzureRmAccount

New-AzureRmResourceGroup -Name ExampleResourceGroup -Location "South Central US"


New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateFile c:\MyTemplates\storage.json -storageAccountType Standard_GRS
The deployment can take a few minutes to complete. When it finishes, you see a message that includes the result:
ProvisioningState : Succeeded
Deploy a template from an external source

Instead of storing Resource Manager templates on your local machine, you may prefer to store them in an external
location. You can store templates in a source control repository (such as GitHub). Or, you can store them in an Azure
storage account for shared access in your organization.

To deploy an external template, use the TemplateUri parameter. Use the URI in the example to deploy the sample
template from GitHub.
New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
  -TemplateUri https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-storage-account-create/azuredeploy.json `
  -storageAccountType Standard_GRS


The preceding example requires a publicly accessible URI for the template, which works for most scenarios because
your template should not include sensitive data. If you need to specify sensitive data (like an admin password), pass
that value as a secure parameter. However, if you do not want your template to be publicly accessible, you can
protect it by storing it in a private storage container. For information about deploying a template that requires a
shared access signature (SAS) token, see Deploy private template with SAS token.

Parameter files

Rather than passing parameters as inline values in your script, you may find it easier to use a JSON file that contains
the parameter values. The parameter file must be in the following format:
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"storageAccountType": {
"value": "Standard_GRS"
}
}
}
Notice that the parameters section includes a parameter name that matches the parameter defined in your template
(storageAccountType). The parameter file contains a value for the parameter. This value is automatically passed to the
template during deployment. You can create multiple parameter files for different deployment scenarios, and then
pass in the appropriate parameter file.

Copy the preceding example and save it as a file named storage.parameters.json.

To pass a local parameter file, use the TemplateParameterFile parameter:


New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateFile c:\MyTemplates\storage.json `
-TemplateParameterFile c:\MyTemplates\storage.parameters.json

To pass an external parameter file, use the TemplateParameterUri parameter:


New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
  -TemplateUri https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-storage-account-create/azuredeploy.json `
  -TemplateParameterUri https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-storage-account-create/azuredeploy.parameters.json

You can use inline parameters and a local parameter file in the same deployment operation. For example, you can
specify some values in the local parameter file and add other values inline during deployment. If you provide values
for a parameter in both the local parameter file and inline, the inline value takes precedence.
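
For example, reusing the file names from the earlier examples, the following sketch takes most values from the local parameter file while overriding storageAccountType inline:

New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
  -TemplateFile c:\MyTemplates\storage.json `
  -TemplateParameterFile c:\MyTemplates\storage.parameters.json `
  -storageAccountType Standard_LRS    # the inline value wins over the value in the parameter file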

However, when you use an external parameter file, you cannot pass other values either inline or from a local file.
When you specify a parameter file in the TemplateParameterUri parameter, all inline parameters are ignored. Provide
all parameter values in the external file. If your template includes a sensitive value that you cannot include in the
parameter file, either add that value to a key vault, or dynamically provide all parameter values inline.

If your template includes a parameter with the same name as one of the parameters in the PowerShell command,
PowerShell presents the parameter from your template with the postfix FromTemplate. For example, a parameter
named ResourceGroupName in your template conflicts with the ResourceGroupName parameter in the New-
AzureRmResourceGroupDeployment cmdlet. You are prompted to provide a value
for ResourceGroupNameFromTemplate. In general, you should avoid this confusion by not naming parameters with
the same name as parameters used for deployment operations.

Test a template deployment


To test your template and parameter values without actually deploying any resources, use Test-AzureRmResourceGroupDeployment.
Test-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateFile c:\MyTemplates\storage.json -storageAccountType Standard_GRS

If no errors are detected, the command finishes without a response. If an error is detected, the command returns an
error message. For example, attempting to pass an incorrect value for the storage account SKU returns the following
error:
Test-AzureRmResourceGroupDeployment -ResourceGroupName testgroup `
-TemplateFile c:\MyTemplates\storage.json -storageAccountType badSku
Code : InvalidTemplate
Message : Deployment template validation failed: 'The provided value 'badSku' for the template parameter
'storageAccountType'
at line '15' and column '24' is not valid. The parameter value is not part of the allowed
value(s):
'Standard_LRS,Standard_ZRS,Standard_GRS,Standard_RAGRS,Premium_LRS'.'.
Details :

If your template has a syntax error, the command returns an error indicating it could not parse the template. The
message indicates the line number and position of the parsing error.
Test-AzureRmResourceGroupDeployment : After parsing a value an unexpected character was encountered:
". Path 'variables', line 31, position 3.

Incremental and complete deployments

When deploying your resources, you specify that the deployment is either an incremental update or a complete
update. The primary difference between these two modes is how Resource Manager handles existing resources in the
resource group that are not in the template:

In complete mode, Resource Manager deletes resources that exist in the resource group but are not specified
in the template.

In incremental mode, Resource Manager leaves unchanged resources that exist in the resource group but are
not specified in the template.

For both modes, Resource Manager attempts to provision all resources specified in the template. If the resource
already exists in the resource group and its settings are unchanged, the operation results in no change. If you change
the settings for a resource, the resource is provisioned with those new settings. If you attempt to update the location
or type of an existing resource, the deployment fails with an error. Instead, deploy a new resource with the location or
type that you need.

By default, Resource Manager uses the incremental mode.

To illustrate the difference between incremental and complete modes, consider the following scenario.

Existing Resource Group contains:


Resource A
Resource B
Resource C
Template defines:
Resource A
Resource B
Resource D
When deployed in incremental mode, the resource group contains:
Resource A


Resource B
Resource C
Resource D
When deployed in complete mode, Resource C is deleted. The resource group contains:
Resource A
Resource B
Resource D
To use complete mode, use the Mode parameter:
New-AzureRmResourceGroupDeployment -Mode Complete -Name ExampleDeployment `
-ResourceGroupName ExampleResourceGroup -TemplateFile c:\MyTemplates\storage.json

Sample template

The following template is used for the examples in this topic. Copy and save it as a file named storage.json. To
understand how to create this template, see Create your first Azure Resource Manager template.
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"storageAccountType": {
"type": "string",
"defaultValue": "Standard_LRS",
"allowedValues": [
"Standard_LRS",
"Standard_GRS",
"Standard_ZRS",
"Premium_LRS"
],
"metadata": {
"description": "Storage Account type"
}
}
},
"variables": {
"storageAccountName": "[concat(uniquestring(resourceGroup().id), 'standardsa')]"
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"name": "[variables('storageAccountName')]",
"apiVersion": "2016-01-01",
"location": "[resourceGroup().location]",
"sku": {
"name": "[parameters('storageAccountType')]"
},
"kind": "Storage",
"properties": {
}
}
],
"outputs": {
"storageAccountName": {
"type": "string",
"value": "[variables('storageAccountName')]"
}
}
}


CHAPTER 3: Secure resources


What is Azure Active Directory?
Azure Active Directory (Azure AD) is Microsoft's multi-tenant, cloud-based directory and identity management service.
For IT Admins, Azure AD provides an affordable, easy to use solution to give employees and business partners single
sign-on (SSO) access to thousands of cloud SaaS Applications like Office365, Salesforce.com, DropBox, and Concur.
For application developers, Azure AD lets you focus on building your application by making it fast and simple to
integrate with a world class identity management solution used by millions of organizations around the world.
Azure AD also includes a full suite of identity management capabilities including multi-factor authentication, device
registration, self-service password management, self-service group management, privileged account management,
role based access control, application usage monitoring, rich auditing and security monitoring and alerting. These
capabilities can help secure cloud based applications, streamline IT processes, cut costs and help ensure that
corporate compliance goals are met.
Additionally, with just four clicks, Azure AD can be integrated with an existing Windows Server Active Directory, giving
organizations the ability to leverage their existing on-premises identity investments to manage access to cloud based
SaaS applications.
If you are an Office365, Azure or Dynamics CRM Online customer, you might not realize that you are already using
Azure AD. Every Office365, Azure and Dynamics CRM tenant is actually already an Azure AD tenant. Whenever you
want, you can start using that tenant to manage access to thousands of other cloud applications Azure AD integrates
with!

How reliable is Azure AD?


The multi-tenant, geo-distributed, high availability design of Azure AD means that you can rely on it for your most
critical business needs. Running out of 28 data centers around the world with automated failover, you'll have the
comfort of knowing that Azure AD is highly reliable and that even if a data center goes down, copies of your directory
data are live in at least two more regionally dispersed data centers and available for instant access.
For more details, see Service Level Agreements.
What are the benefits of Azure AD?
Your organization can use Azure AD to improve employee productivity, streamline IT processes, improve security and
cut costs in many ways:
Quickly adopt cloud services, providing employees and partners with an easy single sign-on experience
powered by Azure AD's fully automated SaaS app access management and provisioning service capabilities.
Empower employees with access to world class cloud apps and self-service capabilities from wherever they
need to work on the devices they love to use.
Easily and securely manage employee and vendor access to your corporate social media accounts.
Improve application security with Azure AD multifactor authentication and conditional access.
Implement consistent, self-service application access management, empowering business owners to move
quickly while cutting IT costs and overheads.

Monitor application usage and protect your business from advanced threats with security reporting and
monitoring.
Secure mobile (remote) access to on-premises applications.

How does Azure AD compare to on-premises Active Directory Domain Services (AD DS)?
Both Azure Active Directory (Azure AD) and on-premises Active Directory (Active Directory Domain Services or AD DS)
are systems that store directory data and manage communication between users and resources, including user logon
processes, authentication, and directory searches.
AD DS is a server role on Windows Server, which means that it can be deployed on physical or virtual machines. It has
a hierarchical structure based on X.500. It uses DNS for locating objects, can be interacted with using LDAP, and it
primarily uses Kerberos for authentication. Active Directory enables organizational units (OUs) and Group Policy
Objects (GPOs) in addition to joining machines to the domain, and trusts are created between domains.
Azure AD is a multi-customer public directory service, which means that within Azure AD you can create a tenant for
your cloud servers and applications such as Office 365. Users and groups are created in a flat structure without OUs or
GPOs. Authentication is performed through protocols such as SAML, WS-Federation, and OAuth. It's possible to query
Azure AD, but instead of using LDAP you must use a REST API called AD Graph API. These all work over HTTP and
HTTPS.

Get started with Azure Active Directory Identity Protection & Graph API
Microsoft Graph is Microsoft's unified API endpoint and the home of Azure Active Directory Identity Protection's APIs.
Our first API, identityRiskEvents, allows you to query Microsoft Graph for a list of risk events and associated
information. This article gets you started querying this API. For an in depth introduction, full documentation, and
access to the Graph Explorer, see the Microsoft Graph site.
There are three steps to accessing Identity Protection data through Microsoft Graph:
1. Add an application with a client secret.
2. Use this secret and a few other pieces of information to authenticate to Microsoft Graph, where you receive
an authentication token.
3. Use this token to make requests to the API endpoint and get Identity Protection data back.
Before you get started, you'll need:
Administrator privileges to create the application in Azure AD
The name of your tenant's domain (for example, contoso.onmicrosoft.com)
Add an application with a client secret
1. Sign in to your Azure classic portal as an administrator.
2. On the left navigation pane, click Active Directory.

3. From the Directory list, select the directory for which you want to enable directory integration.
4. In the menu on the top, click Applications.

5. Click Add at the bottom of the page.

6. On the What do you want to do dialog, click Add an application my organization is developing.


7. On the Tell us about your application dialog, perform the following steps:

a. In the Name textbox, type a name for your application (e.g.: AADIP Risk Event API Application).
b. As Type, select Web Application And / Or Web API.
c. Click Next.
8. On the App properties dialog, perform the following steps:

a. In the Sign-On URL textbox, type http://localhost.


b. In the App ID URI textbox, type http://localhost.
c. Click Complete.
You can now configure your application.

Grant your application permission to use the API


1. On your application's page, in the menu on the top, click Configure.


2. In the permissions to other applications section, click Add application.

3. On the permissions to other applications dialog, perform the following steps:

a. Select Microsoft Graph.


b. Click Complete.
4. Click Application Permissions: 0, and then select Read all identity risk event information.

5. Click Save at the bottom of the page.

Get an access key


1. On your application's page, in the keys section, select 1 year as duration.

2. Click Save at the bottom of the page.

3. In the keys section, copy the value of your newly created key, and then paste it into a safe location.


Note
If you lose this key, you will have to return to this section and create a new key. Keep this key a secret: anyone who
has it can access your data.
4. In the properties section, copy the Client ID, and then paste it into a safe location.
Authenticate to Microsoft Graph and query the Identity Risk Events API
At this point, you should have:
The client ID you copied above
The key you copied above
The name of your tenant's domain
To authenticate, send a POST request to https://login.microsoft.com with the following parameters in the body:
grant_type: client_credentials
resource: https://graph.microsoft.com
client_id:
client_secret:
Note
You need to provide values for the client_id and the client_secret parameter.
If successful, this returns an authentication token.
To call the API, create a header with the following parameter:
Authorization = "<token_type> <access_token>"
When authenticating, you can find the token type and access token in the returned token.
Send this header as a request to the following API URL: https://graph.microsoft.com/beta/identityRiskEvents
The response, if successful, is a collection of identity risk events and associated data in the OData JSON format, which
can be parsed and handled as you see fit.
Here's sample code for authenticating and calling the API using PowerShell.
Just add your client ID, key, and tenant domain.
$ClientID = "<your client ID here>"             # Should be a ~36 hex character string; insert your info here
$ClientSecret = "<your client secret here>"     # Should be a ~44 character string; insert your info here
$tenantdomain = "<your tenant domain here>"     # For example, contoso.onmicrosoft.com

$loginURL = "https://login.microsoft.com"
$resource = "https://graph.microsoft.com"

$body = @{grant_type="client_credentials";resource=$resource;client_id=$ClientID;client_secret=$ClientSecret}
$oauth = Invoke-RestMethod -Method Post -Uri $loginURL/$tenantdomain/oauth2/token?api-version=1.0 -Body $body

Write-Output $oauth

if ($oauth.access_token -ne $null) {


$headerParams = @{'Authorization'="$($oauth.token_type) $($oauth.access_token)"}

$url = "https://graph.microsoft.com/beta/identityRiskEvents"
Write-Output $url

$myReport = (Invoke-WebRequest -UseBasicParsing -Headers $headerParams -Uri $url)

foreach ($event in ($myReport.Content | ConvertFrom-Json).value) {


Write-Output $event
}

} else {
Write-Host "ERROR: No Access Token"
}


Azure Active Directory v2.0 and the OpenID Connect protocol


OpenID Connect is an authentication protocol built on OAuth 2.0 that you can use to securely sign in a user to a web
application. When you use the v2.0 endpoint's implementation of OpenID Connect, you can add sign-in and API access
to your web-based apps. In this article, we show you how to do this independent of language. We describe how to
send and receive HTTP messages without using any Microsoft open-source libraries.
Note
The v2.0 endpoint does not support all Azure Active Directory scenarios and features. To determine whether you
should use the v2.0 endpoint, read about v2.0 limitations.
OpenID Connect extends the OAuth 2.0 authorization protocol to use as an authentication protocol, so that you can
perform single sign-on using OAuth. OpenID Connect introduces the concept of an ID token, which is a security token
that allows the client to verify the identity of the user. The ID token also gets basic profile information about the user.
Because OpenID Connect extends OAuth 2.0, apps can securely acquire access tokens, which can be used to access
resources that are secured by an authorization server. We recommend that you use OpenID Connect if you are
building a web application that is hosted on a server and accessed via a browser.
Protocol diagram: Sign-in
The most basic sign-in flow has the steps shown in the next diagram. We describe each step in detail in this article.

Fetch the OpenID Connect metadata document


OpenID Connect describes a metadata document that contains most of the information required for an app to
perform sign-in. This includes information such as the URLs to use and the location of the service's public signing keys.
For the v2.0 endpoint, this is the OpenID Connect metadata document you should use:
https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration

The {tenant} can take one of four values:


common: Users with both a personal Microsoft account and a work or school account from Azure Active Directory
(Azure AD) can sign in to the application.

organizations: Only users with work or school accounts from Azure AD can sign in to the application.

consumers: Only users with a personal Microsoft account can sign in to the application.

8eaef023-2b34-4da1-9baa-8bc8c9d6a490 or contoso.onmicrosoft.com: Only users with a work or school account from
a specific Azure AD tenant can sign in to the application. Either the friendly domain name of the Azure AD tenant or
the tenant's GUID identifier can be used.

The metadata is a simple JavaScript Object Notation (JSON) document. See the following snippet for an example. The
snippet's contents are fully described in the OpenID Connect specification.
{
"authorization_endpoint": "https:\/\/login.microsoftonline.com\/common\/oauth2\/v2.0\/authorize",
"token_endpoint": "https:\/\/login.microsoftonline.com\/common\/oauth2\/v2.0\/token",
"token_endpoint_auth_methods_supported": [
"client_secret_post",
"private_key_jwt"
],
"jwks_uri": "https:\/\/login.microsoftonline.com\/common\/discovery\/v2.0\/keys",

...

}
Typically, you would use this metadata document to configure an OpenID Connect library or SDK; the library would
use the metadata to do its work. However, if you're not using a pre-built OpenID Connect library, you can follow the
steps in the remainder of this article to perform sign-in in a web app by using the v2.0 endpoint.
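As a quick illustration, the metadata document can be fetched and inspected with PowerShell (here using the common endpoint; substitute your tenant as needed):

# Fetch the OpenID Connect metadata document and read a few endpoints from it.
$metadata = Invoke-RestMethod -Uri "https://login.microsoftonline.com/common/v2.0/.well-known/openid-configuration"
$metadata.authorization_endpoint    # where sign-in requests are sent
$metadata.token_endpoint            # where authorization codes are redeemed
$metadata.jwks_uri                  # public signing keys used to validate tokens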
Send the sign-in request
When your web app needs to authenticate the user, it can direct the user to the /authorize endpoint. This request is
similar to the first leg of the OAuth 2.0 authorization code flow, with these important distinctions:
The request must include the openid scope in the scope parameter.
The response_type parameter must include id_token.
The request must include the nonce parameter.
For example:
// Line breaks are for legibility only.

GET https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?
client_id=6731de76-14a6-49ae-97bc-6eba6914391e
&response_type=id_token
&redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F
&response_mode=form_post
&scope=openid
&state=12345
&nonce=678910

Tip: You can try a request like this in a browser against https://login.microsoftonline.com/common/oauth2/v2.0/authorize.
After you sign in, your browser is redirected to https://localhost/myapp/ with an ID token in the address bar. Note that
such a browser test uses response_mode=query (for demonstration purposes only); we recommend that you
use response_mode=form_post.

Parameter Condition Description

tenant Required You can use the {tenant} value in the path of the request to
control who can sign in to the application. The allowed values
are common, organizations, consumers, and tenant identifiers.
For more information, see protocol basics.

client_id Required The Application ID that the Application Registration Portal assigned to your app.

response_type Required Must include id_token for OpenID Connect sign-in. It might also
include other response_types values, such as code.

redirect_uri Recommended The redirect URI of your app, where authentication responses can
be sent and received by your app. It must exactly match one of
the redirect URIs you registered in the portal, except that it must
be URL encoded.

scope Required A space-separated list of scopes. For OpenID Connect, it must include the scope openid, which
translates to the "Sign you in" permission in the consent UI. You might also include other scopes
in this request for requesting consent.

nonce Required A value included in the request, generated by the app, that will be
included in the resulting id_token value as a claim. The app can
verify this value to mitigate token replay attacks. The value
typically is a randomized, unique string that can be used to
identify the origin of the request.

response_mode Recommended Specifies the method that should be used to send the resulting
authorization code back to your app. Can be one
of query, form_post, or fragment. For web applications, we
recommend using response_mode=form_post, to ensure the
most secure transfer of tokens to your application.

state Recommended A value included in the request that also will be returned in the
token response. It can be a string of any content you want. A
randomly generated unique value typically is used to prevent
cross-site request forgery attacks. The state also is used to encode
information about the user's state in the app before the
authentication request occurred, such as the page or view the
user was on.

prompt Optional Indicates the type of user interaction that is required. The only
valid values at this time are login, none, and consent.
The prompt=login claim forces the user to enter their credentials
on that request, which negates single sign-on.
The prompt=none claim is the opposite. This claim ensures that
the user is not presented with any interactive prompt whatsoever.

If the request cannot be completed silently via single sign-on, the v2.0 endpoint returns an error.
The prompt=consent claim triggers the OAuth consent dialog after the user signs in. The dialog asks the user to grant
permissions to the app.

login_hint Optional You can use this parameter to pre-fill the username and email
address field of the sign-in page for the user, if you know the
username ahead of time. Often, apps use this parameter during
re-authentication, after already extracting the username from an
earlier sign-in by using the preferred_username claim.

domain_hint Optional This value can be consumers or organizations. If included, it skips
the email-based discovery process that the user goes through on
the v2.0 sign-in page, for a slightly more streamlined user
experience. Often, apps use this parameter during re-
authentication by extracting the tid claim from the ID token. If
the tid claim value is 9188040d-6c67-4c5b-b112-36a304b66dad,
use domain_hint=consumers. Otherwise,
use domain_hint=organizations.

At this point, the user is prompted to enter their credentials and complete the authentication. The v2.0 endpoint
verifies that the user has consented to the permissions indicated in the scope query parameter. If the user has not
consented to any of those permissions, the v2.0 endpoint prompts the user to consent to the required permissions.
You can read more about permissions, consent, and multitenant apps.
After the user authenticates and grants consent, the v2.0 endpoint returns a response to your app at the indicated
redirect URI by using the method specified in the response_mode parameter.
Successful response
A successful response when you use response_mode=form_post looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNB...&state=12345

Parameter Description

id_token The ID token that the app requested. You can use the id_token parameter to verify the
user's identity and begin a session with the user. For more details about ID tokens and their
contents, see the v2.0 endpoint tokens reference.

state If a state parameter is included in the request, the same value should appear in the
response. The app should verify that the state values in the request and response are
identical.

Error response: Error responses might also be sent to the redirect URI so that the app can handle them. An error
response looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

error=access_denied&error_description=the+user+canceled+the+authentication


Parameter Description

error An error code string that you can use to classify types of errors that occur, and to
react to errors.

error_description A specific error message that can help you identify the root cause of an
authentication error.

Error codes for authorization endpoint errors


The following table describes error codes that can be returned in the error parameter of the error response:

invalid_request: Protocol error, such as a missing, required parameter. Client action: Fix and resubmit the request.
This is a development error that typically is caught during initial testing.

unauthorized_client: The client application cannot request an authorization code. Client action: This usually occurs
when the client application is not registered in Azure AD or is not added to the user's Azure AD tenant. The
application can prompt the user with instructions to install the application and add it to Azure AD.

access_denied: The resource owner denied consent. Client action: The client application can notify the user that it
cannot proceed unless the user consents.

unsupported_response_type: The authorization server does not support the response type in the request. Client
action: Fix and resubmit the request. This is a development error that typically is caught during initial testing.

server_error: The server encountered an unexpected error. Client action: Retry the request. These errors can result
from temporary conditions. The client application might explain to the user that its response is delayed due to a
temporary error.

temporarily_unavailable: The server is temporarily too busy to handle the request. Client action: Retry the request.
The client application might explain to the user that its response is delayed due to a temporary condition.

invalid_resource: The target resource is invalid because either it does not exist, Azure AD cannot find it, or it is not
correctly configured. Client action: This indicates that the resource, if it exists, has not been configured in the tenant.
The application can prompt the user with instructions for installing the application and adding it to Azure AD.

Validate the ID token


Receiving an ID token is not sufficient to authenticate the user. You must also validate the ID token's signature and
verify the claims in the token per your app's requirements. The v2.0 endpoint uses JSON Web Tokens (JWTs) and
public key cryptography to sign tokens and verify that they are valid.
You can choose to validate the ID token in client code, but a common practice is to send the ID token to a back-end
server and perform the validation there. After you've validated the signature of the ID token, you'll need to verify a
few claims. For more information, including more about validating tokens and important information about signing
key rollover, see the v2.0 tokens reference. We recommend using a library to parse and validate tokens. There's at
least one of these libraries available for most languages and platforms.
You also might want to validate additional claims, depending on your scenario. Some common validations include:
Ensure that the user or organization has signed up for the app.
Ensure that the user has required authorization or privileges.
Ensure that a certain strength of authentication has occurred, such as multi-factor authentication.
For more information about the claims in an ID token, see the v2.0 endpoint tokens reference.
After you have completely validated the ID token, you can begin a session with the user. Use the claims in the ID token
to get information about the user in your app. You can use this information for display, records, authorizations, and so
on.
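
As an illustration only, the following PowerShell sketch decodes the claims payload of an ID token and checks a few of the claims discussed above (aud and exp). It does not validate the signature; as recommended, use a proper JWT library for that. The token and application ID values are placeholders.

# Minimal sketch: decode the ID token payload and inspect a few claims.
# This does NOT validate the signature; use a JWT validation library for that.
$idToken = "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs..."          # placeholder; use the real token
$expectedAudience = "6731de76-14a6-49ae-97bc-6eba6914391e"     # your Application ID

# A JWT is three base64url segments separated by '.'; the second segment is the claims payload.
$payload = $idToken.Split('.')[1].Replace('-', '+').Replace('_', '/')
switch ($payload.Length % 4) { 2 { $payload += '==' } 3 { $payload += '=' } }
$claims = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($payload)) | ConvertFrom-Json

if ($claims.aud -ne $expectedAudience) { throw "Token was not issued for this application." }
if ([DateTimeOffset]::FromUnixTimeSeconds([long]$claims.exp).UtcDateTime -lt (Get-Date).ToUniversalTime()) {
    throw "Token has expired."
}
Write-Output "Token issued by $($claims.iss) for $($claims.preferred_username)"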
Send a sign-out request
When you want to sign out the user from your app, it isn't sufficient to clear your app's cookies or otherwise end the
user's session. You must also redirect the user to the v2.0 endpoint to sign out. If you don't do this, the user re-
authenticates to your app without entering their credentials again, because they will have a valid single sign-on session
with the v2.0 endpoint.
You can redirect the user to the end_session_endpoint listed in the OpenID Connect metadata document:
GET https://login.microsoftonline.com/common/oauth2/v2.0/logout?
post_logout_redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F

Parameter Condition Description

post_logout_redirect_uri Recommended The URL that the user is redirected to after successfully
signing out. If the parameter is not included, the user is
shown a generic message that's generated by the v2.0
endpoint. This URL must match one of the redirect URIs
registered for your application in the app registration
portal.

Single sign-out
When you redirect the user to the end_session_endpoint, the v2.0 endpoint clears the user's session from the
browser. However, the user may still be signed in to other applications that use Microsoft accounts for authentication.
To enable those applications to sign the user out simultaneously, the v2.0 endpoint sends an HTTP GET request to the
registered LogoutUrl of all the applications that the user is currently signed in to. Applications must respond to this
request by clearing any session that identifies the user and returning a 200 response. If you wish to support single sign
out in your application, you must implement such a LogoutUrl in your application's code. You can set
the LogoutUrl from the app registration portal.
Protocol diagram: Token acquisition
Many web apps need to not only sign the user in, but also to access a web service on behalf of the user by using
OAuth. This scenario combines OpenID Connect for user authentication while simultaneously getting an authorization
code that you can use to get access tokens if you are using the OAuth authorization code flow.
The full OpenID Connect sign-in and token acquisition flow looks similar to the next diagram. We describe each step in
detail in the next sections of the article.

Get access tokens


To acquire access tokens, modify the sign-in request:
// Line breaks are for legibility only.

GET https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?
client_id=6731de76-14a6-49ae-97bc-6eba6914391e      // Your registered Application ID
&response_type=id_token%20code
&redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F     // Your registered redirect URI, URL encoded
&response_mode=form_post                            // 'query', 'form_post', or 'fragment'
&scope=openid%20offline_access%20https%3A%2F%2Fgraph.microsoft.com%2Fmail.read
                                                    // Include 'openid' plus the scopes your app needs
&state=12345                                        // Any value, provided by your app
&nonce=678910                                       // Any value, provided by your app
Tip: If you execute this request in a browser and sign in, the browser is redirected to https://localhost/myapp/ with an
ID token and a code in the address bar. Such a demonstration request uses response_mode=query (for demonstration
purposes only); we recommend that you use response_mode=form_post.
By including permission scopes in the request and by using response_type=id_token code, the v2.0 endpoint ensures
that the user has consented to the permissions indicated in the scope query parameter. It returns an authorization
code to your app to exchange for an access token.
Successful response: A successful response from using response_mode=form_post looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNB...&code=AwABAAAAvPM1KaPlrEqdFSBzjqfTGBC
mLdgfSTLEMPGYuNHSUYBrq...&state=12345

Parameter Description

id_token The ID token that the app requested. You can use the ID token to verify the user's identity
and begin a session with the user. You'll find more details about ID tokens and their contents
in the v2.0 endpoint tokens reference.

code The authorization code that the app requested. The app can use the authorization code to
request an access token for the target resource. An authorization code is very short-lived.
Typically, an authorization code expires in about 10 minutes.

state If a state parameter is included in the request, the same value should appear in the
response. The app should verify that the state values in the request and response are
identical.

Error response: Error responses might also be sent to the redirect URI so that the app can handle them appropriately.
An error response looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

error=access_denied&error_description=the+user+canceled+the+authentication

Parameter Description

error An error code string that you can use to classify types of errors that occur, and to
react to errors.

error_description A specific error message that can help you identify the root cause of an
authentication error.

For a description of possible error codes and recommended client responses, see Error codes for authorization
endpoint errors.
When you have an authorization code and an ID token, you can sign the user in and get access tokens on their behalf.
To sign the user in, you must validate the ID token exactly as described. To get access tokens, follow the steps
described in our OAuth protocol documentation.
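
As a hedged illustration of that last step, the sketch below redeems the authorization code at the v2.0 token endpoint with Invoke-RestMethod. The application ID, secret, code, and redirect URI are placeholders taken from the examples above; a confidential web app supplies its client secret, while a public client would omit it.

# Minimal sketch: exchange the authorization code for tokens at the v2.0 token endpoint.
$tokenEndpoint = "https://login.microsoftonline.com/common/oauth2/v2.0/token"
$body = @{
    client_id     = "6731de76-14a6-49ae-97bc-6eba6914391e"   # your registered Application ID
    client_secret = "<application-secret>"                    # placeholder; web apps only
    grant_type    = "authorization_code"
    code          = "AwABAAAAvPM1KaPlrEqdFSBzjqfTGBC..."      # placeholder; code from the redirect
    redirect_uri  = "http://localhost/myapp/"
    scope         = "openid offline_access https://graph.microsoft.com/mail.read"
}
$tokens = Invoke-RestMethod -Method Post -Uri $tokenEndpoint -Body $body `
    -ContentType "application/x-www-form-urlencoded"

$tokens.access_token    # use against the resource (Microsoft Graph in this example)
$tokens.refresh_token   # returned because offline_access was requested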

How Azure Active Directory uses the SAML protocol


Azure Active Directory (Azure AD) uses the SAML 2.0 protocol to enable applications to provide a single sign-on
experience to their users. The Single Sign-On and Single Sign-Out SAML profiles of Azure AD explain how SAML
assertions, protocols and bindings are used in the identity provider service.

SAML Protocol requires the identity provider (Azure AD) and the service provider (the application) to exchange
information about themselves.

When an application is registered with Azure AD, the app developer registers federation-related information with
Azure AD. This includes the Redirect URI and Metadata URI of the application.

Azure AD uses the Metadata URI of the cloud service to retrieve the signing key and the logout URI of the cloud
service. If the application does not support a metadata URI, the developer must contact Microsoft support to provide
the logout URI and signing key.

Azure Active Directory exposes tenant-specific and common (tenant-independent) single sign-on and single sign-out
endpoints. These URLs represent addressable locations -- they are not just identifiers -- so you can go to the
endpoint to read the metadata.

The tenant-specific endpoint is located
at https://login.microsoftonline.com/<TenantDomainName>/FederationMetadata/2007-06/FederationMetadata.xml.
The <TenantDomainName> placeholder represents a registered domain name or TenantID GUID of an Azure AD tenant.
For example, the federation metadata of the contoso.com tenant is
at: https://login.microsoftonline.com/contoso.com/FederationMetadata/2007-06/FederationMetadata.xml
The tenant-independent endpoint is located
at https://login.microsoftonline.com/common/FederationMetadata/2007-06/FederationMetadata.xml. In
this endpoint address, common appears instead of a tenant domain name or ID.
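
Because the endpoints are addressable, you can retrieve and inspect the metadata directly; the short PowerShell sketch below (contoso.com is illustrative) downloads the tenant-specific document and reads the entity ID it advertises.

# Minimal sketch: download the federation metadata and read the entity ID.
$uri = "https://login.microsoftonline.com/contoso.com/FederationMetadata/2007-06/FederationMetadata.xml"
[xml]$metadata = (Invoke-WebRequest -Uri $uri -UseBasicParsing).Content
$metadata.EntityDescriptor.entityID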

Azure AD Connect sync: Understand and customize synchronization


The Azure Active Directory Connect synchronization services (Azure AD Connect sync) is a main component of Azure
AD Connect. It takes care of all the operations that are related to synchronizing identity data between your on-premises
environment and Azure AD. Azure AD Connect sync is the successor of DirSync, Azure AD Sync, and Forefront Identity
Manager with the Azure Active Directory Connector configured.
This topic is the home for Azure AD Connect sync (also called sync engine) and lists links to all other topics related to it.
For links to Azure AD Connect, see Integrating your on-premises identities with Azure Active Directory.
The sync service consists of two components, the on-premises Azure AD Connect sync component and the service side
in Azure AD called Azure AD Connect sync service. The service is common for DirSync, Azure AD Sync, and Azure AD
Connect.

Deploying Active Directory Federation Services in Azure


AD FS provides simplified, secured identity federation and web single sign-on (SSO) capabilities. Federation with Azure
AD or Office 365 enables users to authenticate using on-premises credentials and access all resources in the cloud. As a result,
it becomes important to have a highly available AD FS infrastructure to ensure access to resources both on-premises
and in the cloud. Deploying AD FS in Azure can help achieve the required high availability with minimal effort. There
are several advantages of deploying AD FS in Azure; a few of them are listed below:
High availability: With the power of Azure Availability Sets, you ensure a highly available infrastructure.
Easy to scale: Need more performance? Easily migrate to more powerful machines with just a few clicks in Azure.
Cross-geo redundancy: With Azure geo-redundancy, you can be assured that your infrastructure is highly available across the globe.
Easy to manage: With highly simplified management options in the Azure portal, managing your infrastructure is easy and hassle-free.

Design principles

The diagram above shows the recommended basic topology to start deploying your AD FS infrastructure in Azure. The
principles behind the various components of the topology are listed below:
DC / AD FS servers: If you have fewer than 1,000 users, you can simply install the AD FS role on your domain
controllers. If you do not want any performance impact on the domain controllers, or if you have more than
1,000 users, deploy AD FS on separate servers.
WAP servers: It is necessary to deploy Web Application Proxy servers so that users can reach AD FS even when
they are not on the company network.
DMZ: The Web Application Proxy servers will be placed in the DMZ, and ONLY TCP/443 access is allowed
between the DMZ and the internal subnet.
Load balancers: To ensure high availability of AD FS and Web Application Proxy servers, we recommend using
an internal load balancer for the AD FS servers and an Azure load balancer for the Web Application Proxy servers.
Availability sets: To provide redundancy to your AD FS deployment, it is recommended that you group two or
more virtual machines in an availability set for similar workloads. This configuration ensures that during either
a planned or unplanned maintenance event, at least one virtual machine will be available.
Storage accounts: It is recommended to have two storage accounts. A single storage account can become a single
point of failure and could cause the deployment to become unavailable in the unlikely scenario that the storage
account goes down. Two storage accounts let you associate one storage account with each fault line.
Network segregation: Web Application Proxy servers should be deployed in a separate DMZ network. You can
divide one virtual network into two subnets and then deploy the Web Application Proxy server(s) in an
isolated subnet. You can simply configure the network security group settings for each subnet and allow only
the required communication between the two subnets. More details are given per deployment scenario below.
Steps to deploy AD FS in Azure
The steps in this section outline how to deploy the AD FS infrastructure depicted above in Azure.
1. Deploying the network
As outlined above, you can either create two subnets in a single virtual network or create two completely
different virtual networks (VNets). This article focuses on deploying a single virtual network and dividing it into two
subnets. This is currently an easier approach, as two separate VNets would require a VNet-to-VNet gateway for
communication.
1.1 Create virtual network

In the Azure portal, select Virtual network; you can deploy the virtual network and one subnet immediately with just
one click. The INT subnet is also defined and is now ready for VMs to be added. The next step is to add another subnet
to the network, i.e. the DMZ subnet. To create the DMZ subnet:
Select the newly created network
In the properties, select Subnets
In the Subnets panel, click the Add button
Provide the subnet name and address space information to create the subnet
A scripted alternative to these portal steps is sketched below.
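
The sketch below is one possible scripted equivalent, assuming the AzureRM PowerShell module; the resource group, names, and address ranges are illustrative and chosen to match the 10.3.0.8 internal load balancer address used later in this walkthrough.

# Minimal sketch: one virtual network with an INT and a DMZ subnet.
$rg       = "contoso-adfs-rg"
$location = "West US"

$intSubnet = New-AzureRmVirtualNetworkSubnetConfig -Name "INT" -AddressPrefix "10.3.0.0/24"
$dmzSubnet = New-AzureRmVirtualNetworkSubnetConfig -Name "DMZ" -AddressPrefix "10.3.1.0/24"

New-AzureRmVirtualNetwork -Name "contoso-vnet" -ResourceGroupName $rg -Location $location `
    -AddressPrefix "10.3.0.0/16" -Subnet $intSubnet, $dmzSubnet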

1.2. Creating the network security groups


A Network security group (NSG) contains a list of Access Control List (ACL)
rules that allow or deny network traffic to your VM instances in a Virtual
Network. NSGs can be associated with either subnets or individual VM
instances within that subnet. When an NSG is associated with a subnet, the
ACL rules apply to all the VM instances in that subnet. For the purpose of this
guidance, we will create two NSGs: one each for an internal network and a
DMZ. They will be labeled NSG_INT and NSG_DMZ respectively.

After the NSG is created, there will be 0 inbound and 0 outbound rules. Once
the roles on the respective servers are installed and functional, then the
inbound and outbound rules can be made according to the desired level of
security.

After the NSGs are created, associate NSG_INT with subnet INT and NSG_DMZ
with subnet DMZ. An example screenshot is given below:

Click on Subnets to open the panel for subnets
Select the subnet to associate with the NSG
After configuration, the panel for Subnets should look like below:
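
Before moving on to the on-premises connection, here is a scripted equivalent of creating the NSGs and associating them with the subnets (a sketch, assuming the AzureRM module and the illustrative names and prefixes used earlier).

# Minimal sketch: create NSG_INT and NSG_DMZ and associate them with the matching subnets.
$rg       = "contoso-adfs-rg"
$location = "West US"
$nsgInt = New-AzureRmNetworkSecurityGroup -Name "NSG_INT" -ResourceGroupName $rg -Location $location
$nsgDmz = New-AzureRmNetworkSecurityGroup -Name "NSG_DMZ" -ResourceGroupName $rg -Location $location

$vnet = Get-AzureRmVirtualNetwork -Name "contoso-vnet" -ResourceGroupName $rg
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "INT" `
    -AddressPrefix "10.3.0.0/24" -NetworkSecurityGroup $nsgInt
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "DMZ" `
    -AddressPrefix "10.3.1.0/24" -NetworkSecurityGroup $nsgDmz
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet    # commit the subnet changes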

1.3. Create Connection to on-premises


We will need a connection to on-premises in order to deploy the domain controller (DC) in Azure. Azure offers various
connectivity options to connect your on-premises infrastructure to your Azure infrastructure.
Point-to-site
Virtual network site-to-site
ExpressRoute
It is recommended to use ExpressRoute. ExpressRoute lets you create private connections between Azure datacenters
and infrastructure that's on your premises or in a co-location environment. ExpressRoute connections do not go over
the public Internet. They offer more reliability, faster speeds, lower latencies, and higher security than typical
connections over the Internet. While it is recommended to use ExpressRoute, you may choose any connection
method best suited for your organization. To learn more about ExpressRoute and the various connectivity options
using ExpressRoute, read the ExpressRoute technical overview.
2. Create storage accounts
In order to maintain high availability and avoid dependence on a single storage account, you can create two storage
accounts. Divide the machines in each availability set into two groups and then assign each group a separate storage
account.
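
As a sketch (AzureRM module; the names are the illustrative ones used in the tables that follow), the two accounts could be created like this:

# Minimal sketch: two storage accounts, one per half of each availability set.
$rg = "contoso-adfs-rg"; $location = "West US"
New-AzureRmStorageAccount -ResourceGroupName $rg -Name "contososac1" -Location $location -SkuName "Standard_LRS"
New-AzureRmStorageAccount -ResourceGroupName $rg -Name "contososac2" -Location $location -SkuName "Standard_LRS"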

3. Create availability sets


For each role (DC/AD FS and WAP), create availability sets that will contain at least 2 machines each. This will
help achieve higher availability for each role. While creating the availability sets, it is essential to decide on the
following:
Fault domains: Virtual machines in the same fault domain share the same power source and physical network
switch. A minimum of 2 fault domains is recommended. The default value is 3 and you can leave it as is for
the purpose of this deployment.
Update domains: Machines belonging to the same update domain are restarted together during an update.
You want to have a minimum of 2 update domains. The default value is 5 and you can leave it as is for the
purpose of this deployment.

Create the following availability sets

Availability Set Role Fault domains Update domains

contosodcset DC/ADFS 3 5

contosowapset WAP 3 5
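
A scripted sketch of the same two availability sets (AzureRM module assumed; the fault and update domain counts mirror the defaults shown in the table) follows.

# Minimal sketch: one availability set per role.
$rg = "contoso-adfs-rg"; $location = "West US"
New-AzureRmAvailabilitySet -ResourceGroupName $rg -Name "contosodcset" -Location $location `
    -PlatformFaultDomainCount 3 -PlatformUpdateDomainCount 5
New-AzureRmAvailabilitySet -ResourceGroupName $rg -Name "contosowapset" -Location $location `
    -PlatformFaultDomainCount 3 -PlatformUpdateDomainCount 5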

4. Deploy virtual machines


The next step is to deploy virtual machines that will host the different roles in your infrastructure. A minimum of two
machines are recommended in each availability set. Create four virtual machines for the basic deployment.

Machine Role Subnet Availability set Storage account IP Address

contosodc1 DC/ADFS INT contosodcset contososac1 Static

contosodc2 DC/ADFS INT contosodcset contososac2 Static

contosowap1 WAP DMZ contosowapset contososac1 Static

contosowap2 WAP DMZ contosowapset contososac2 Static

As you might have noticed, no NSG has been specified. This is because Azure lets you apply NSGs at the subnet level.
You can then control machine network traffic by using the individual NSG associated with either the subnet or the
NIC object. Read more in What is a Network Security Group (NSG). A static IP address is recommended if you are
managing the DNS yourself. Alternatively, you can use Azure DNS and refer to the new machines by their Azure FQDNs
in the DNS records for your domain. Your virtual machine pane should look like the below after the deployment is completed:
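
For illustration, the sketch below deploys one of the four machines (contosodc1) with a static private IP in the INT subnet, assuming the AzureRM module. The size, image, and address are placeholders; disk configuration is left to the defaults (placing VHDs explicitly in contososac1/contososac2 would additionally use Set-AzureRmVMOSDisk), and the other three VMs follow the same pattern.

# Minimal sketch: one DC/AD FS VM with a static private IP address.
$rg = "contoso-adfs-rg"; $location = "West US"
$cred   = Get-Credential                                   # local administrator credentials
$vnet   = Get-AzureRmVirtualNetwork -Name "contoso-vnet" -ResourceGroupName $rg
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "INT" -VirtualNetwork $vnet
$avSet  = Get-AzureRmAvailabilitySet -ResourceGroupName $rg -Name "contosodcset"

$nic = New-AzureRmNetworkInterface -Name "contosodc1-nic" -ResourceGroupName $rg -Location $location `
    -SubnetId $subnet.Id -PrivateIpAddress "10.3.0.4"      # supplying an address makes it static

$vm = New-AzureRmVMConfig -VMName "contosodc1" -VMSize "Standard_DS2_v2" -AvailabilitySetId $avSet.Id |
    Set-AzureRmVMOperatingSystem -Windows -ComputerName "contosodc1" -Credential $cred |
    Set-AzureRmVMSourceImage -PublisherName "MicrosoftWindowsServer" -Offer "WindowsServer" `
        -Skus "2012-R2-Datacenter" -Version "latest" |
    Add-AzureRmVMNetworkInterface -Id $nic.Id
New-AzureRmVM -ResourceGroupName $rg -Location $location -VM $vm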

5. Configuring the domain controller / AD FS servers


In order to authenticate any incoming request, AD FS will need to contact the domain controller. To save the costly
round trip from Azure to the on-premises DC for authentication, it is recommended to deploy a replica of the domain
controller in Azure. In order to attain high availability, it is recommended to create an availability set of at least 2
domain controllers.

Domain controller Role Storage account

contosodc1 Replica contososac1

contosodc2 Replica contososac2

Promote the two servers as replica domain controllers with DNS


Configure the AD FS servers by installing the AD FS role using the server manager.
6. Deploying Internal Load Balancer (ILB)
6.1. Create the ILB
To deploy an ILB, select Load Balancers in the Azure portal and click on add (+).
Note
if you do not see Load Balancers in your menu,
click Browse in the lower left of the portal and scroll
until you see Load Balancers. Then click the yellow
star to add it to your menu. Now select the new
load balancer icon to open the panel to begin
configuration of the load balancer.

Name: Give any suitable name to the load balancer
Scheme: Since this load balancer will be placed in front of the AD FS servers and is meant for internal network connections ONLY, select Internal
Virtual network: Choose the virtual network where you are deploying your AD FS
Subnet: Choose the internal subnet here
IP address assignment: Dynamic

After you click create and the ILB is deployed, you should see it in
the list of load balancers:

The next step is to configure the backend pool and the backend probe.
6.2. Configure ILB backend pool
Select the newly created ILB in the Load Balancers panel. It will open the settings panel.
1. Select backend pools from the settings panel
2. In the add backend pool panel, click on add virtual machine


3. You will be presented with a panel where you can choose availability set
4. Choose the AD FS availability set

6.3. Configuring probe


In the ILB settings panel, select Probes.
1. Click on Add
2. Provide details for the probe:
   a. Name: probe name
   b. Protocol: TCP
   c. Port: 443 (HTTPS)
   d. Interval: 5 (default value) - the interval at which the ILB will probe the machines in the backend pool
   e. Unhealthy threshold limit: 2 (default value) - the number of consecutive probe failures after which the ILB will declare a machine in the backend pool non-responsive and stop sending traffic to it

6.4. Create load balancing rules


In order to effectively balance the traffic, the ILB should be configured with load balancing rules. To create a
load balancing rule:
1. Select Load balancing rules from the settings panel of the ILB
2. Click on Add in the Load balancing rules panel
3. In the Add load balancing rule panel:
   a. Name: provide a name for the rule
   b. Protocol: select TCP
   c. Port: 443
   d. Backend port: 443
   e. Backend pool: select the pool you created for the AD FS cluster earlier
   f. Probe: select the probe created for the AD FS servers earlier
A scripted equivalent of steps 6.1 through 6.4 is sketched below.
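
The sketch below shows one way to script steps 6.1 through 6.4 with the AzureRM module; the 10.3.0.8 frontend address matches the example used in the next step, and adding the AD FS NICs to the backend pool is only indicated in a comment.

# Minimal sketch: internal load balancer with backend pool, HTTPS probe, and 443 rule.
$rg = "contoso-adfs-rg"; $location = "West US"
$vnet     = Get-AzureRmVirtualNetwork -Name "contoso-vnet" -ResourceGroupName $rg
$subnet   = Get-AzureRmVirtualNetworkSubnetConfig -Name "INT" -VirtualNetwork $vnet
$frontEnd = New-AzureRmLoadBalancerFrontendIpConfig -Name "adfs-ilb-fe" -SubnetId $subnet.Id `
    -PrivateIpAddress "10.3.0.8"                           # omit the address for dynamic assignment
$pool  = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name "adfs-pool"
$probe = New-AzureRmLoadBalancerProbeConfig -Name "adfs-probe" -Protocol Tcp -Port 443 `
    -IntervalInSeconds 5 -ProbeCount 2
$rule  = New-AzureRmLoadBalancerRuleConfig -Name "adfs-https" -Protocol Tcp -FrontendPort 443 `
    -BackendPort 443 -FrontendIpConfiguration $frontEnd -BackendAddressPool $pool -Probe $probe

New-AzureRmLoadBalancer -Name "contoso-adfs-ilb" -ResourceGroupName $rg -Location $location `
    -FrontendIpConfiguration $frontEnd -BackendAddressPool $pool -Probe $probe -LoadBalancingRule $rule
# The AD FS NICs are then added to $pool (for example by updating each NIC's IP configuration
# with Set-AzureRmNetworkInterface), which corresponds to choosing the availability set in the portal.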

6.5. Update DNS with ILB


Go to your DNS server and create a CNAME for the ILB. The CNAME should be for the federation service, with the IP
address pointing to the IP address of the ILB. For example, if the ILB IP address is 10.3.0.8 and the federation service
installed is fs.contoso.com, create a CNAME for fs.contoso.com pointing to 10.3.0.8. This ensures that all
communication regarding fs.contoso.com ends up at the ILB and is appropriately routed.
7. Configuring the Web Application Proxy server
7.1. Configuring the Web Application Proxy servers to reach AD FS servers
In order to ensure that the Web Application Proxy servers are able to reach the AD FS servers behind the ILB, create a
record in %systemroot%\system32\drivers\etc\hosts for the ILB. The host name should be the federation service name,
for example fs.contoso.com, and the IP entry should be the ILB's IP address (10.3.0.8 in the example).
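
For example, on each Web Application Proxy server the entry could be added like this (address and name are the illustrative values from above):

# Minimal sketch: point the federation service name at the internal load balancer.
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "10.3.0.8 fs.contoso.com"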
7.2. Installing the Web Application Proxy role
After you ensure that the Web Application Proxy servers are able to reach the AD FS servers behind the ILB, you can
install the Web Application Proxy servers. Web Application Proxy servers do not need to be joined to the domain. Install
the Web Application Proxy role on the two Web Application Proxy servers by selecting the Remote Access role. The
server manager will guide you through completing the WAP installation. For more information on how to deploy WAP,
read Install and Configure the Web Application Proxy Server.
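
A scripted sketch of the role installation and configuration is shown below; the federation service name is the illustrative fs.contoso.com, and the certificate thumbprint and the credential (an account allowed to register the trust with AD FS) are placeholders.

# Minimal sketch: install and configure Web Application Proxy on a WAP server.
Install-WindowsFeature Web-Application-Proxy -IncludeManagementTools

Install-WebApplicationProxy `
    -FederationServiceName "fs.contoso.com" `
    -CertificateThumbprint "<thumbprint of the fs.contoso.com certificate>" `
    -FederationServiceTrustCredential (Get-Credential)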
8. Deploying the Internet Facing (Public) Load Balancer
8.1. Create Internet Facing (Public) Load Balancer
In the Azure portal, select Load balancers and then click on Add. In the Create load balancer panel, enter the following
information:
1. Name: a name for the load balancer
2. Scheme: Public - this option tells Azure that this load balancer needs a public address
3. IP address: create a new IP address (dynamic)

After deployment, the load balancer will appear in the Load balancers list.

8.2. Assign a DNS label to the public IP


Click on the newly created load balancer entry in the Load balancers panel to bring up the panel for configuration.
Follow the steps below to configure the DNS label for the public IP:
1. Click on the public IP address. This opens the panel for the public IP and its settings
2. Click on Configuration
3. Provide a DNS label. This becomes the public DNS label that you can access from anywhere, for example
contosofs.westus.cloudapp.azure.com. You can add an entry in the external DNS for the federation service
(like fs.contoso.com) that resolves to the DNS label of the external load balancer
(contosofs.westus.cloudapp.azure.com).
A scripted equivalent of steps 8.1 and 8.2 is sketched below.
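
One scripted form of steps 8.1 and 8.2, assuming the AzureRM module and illustrative names, is sketched below; the backend pool, probe, and 443 rule are then added exactly as for the internal load balancer.

# Minimal sketch: public IP with a DNS label, fronting the external (WAP) load balancer.
$rg = "contoso-adfs-rg"; $location = "West US"
$publicIp = New-AzureRmPublicIpAddress -Name "contoso-wap-pip" -ResourceGroupName $rg `
    -Location $location -AllocationMethod Dynamic -DomainNameLabel "contosofs"
$frontEnd = New-AzureRmLoadBalancerFrontendIpConfig -Name "wap-lb-fe" -PublicIpAddress $publicIp

New-AzureRmLoadBalancer -Name "contoso-wap-lb" -ResourceGroupName $rg -Location $location `
    -FrontendIpConfiguration $frontEnd
# Add the backend pool (contosowapset NICs), TCP 443 probe, and TCP 443 rule as in section 6.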

8.3. Configure backend pool for Internet Facing (Public) Load Balancer
Follow the same steps as in creating the internal load balancer, to configure the backend pool for Internet Facing
(Public) Load Balancer as the availability set for the WAP servers. For example, contosowapset.

8.4. Configure probe


Follow the same steps as in configuring the internal load balancer to configure the probe for the backend pool of WAP
servers.

8.5. Create load balancing rule(s)


Follow the same steps as in ILB to configure the load balancing rule for TCP 443.

9. Securing the network


9.1. Securing the internal subnet
Overall, you need the following rules to efficiently secure your internal subnet (in the order listed below):

Rule Description Flow

AllowHTTPSFromDMZ Allow the HTTPS communication from DMZ Inbound

DenyInternetOutbound No access to internet Outbound

9.2. Securing the DMZ subnet

Rule Description Flow

AllowHTTPSFromInternet Allow HTTPS from internet to the DMZ Inbound

DenyInternetOutbound Anything except HTTPS to internet is blocked Outbound
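
As a sketch of how the internal-subnet pair of rules could be applied with the AzureRM module (priorities and the DMZ prefix are illustrative; the DMZ NSG receives the analogous pair from the second table):

# Minimal sketch: apply the two rules from section 9.1 to NSG_INT.
$rg = "contoso-adfs-rg"
Get-AzureRmNetworkSecurityGroup -Name "NSG_INT" -ResourceGroupName $rg |
    Add-AzureRmNetworkSecurityRuleConfig -Name "AllowHTTPSFromDMZ" -Direction Inbound -Access Allow `
        -Protocol Tcp -Priority 100 -SourceAddressPrefix "10.3.1.0/24" -SourcePortRange "*" `
        -DestinationAddressPrefix "*" -DestinationPortRange 443 |
    Add-AzureRmNetworkSecurityRuleConfig -Name "DenyInternetOutbound" -Direction Outbound -Access Deny `
        -Protocol "*" -Priority 200 -SourceAddressPrefix "*" -SourcePortRange "*" `
        -DestinationAddressPrefix Internet -DestinationPortRange "*" |
    Set-AzureRmNetworkSecurityGroup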

Note
If client user certificate authentication (client TLS authentication using X.509 user certificates) is required, then AD FS
requires TCP port 49443 to be enabled for inbound access.
10. Test the AD FS sign-in
The easiest way to test AD FS is by using the IdpInitiatedSignon.aspx page. To be able to do that, you must enable
IdpInitiatedSignOn in the AD FS properties. Follow the steps below to verify your AD FS setup:
1. Run the following cmdlet in PowerShell on the AD FS server to enable it: Set-AdfsProperties -EnableIdPInitiatedSignonPage $true
2. From any external machine, access https://adfs.thecloudadvocate.com/adfs/ls/IdpInitiatedSignon.aspx
3. You should see the AD FS page like the one below:

On successful sign-in, it will provide you with a success message as shown below:

Azure AD B2C
Azure AD B2C is a cloud identity management solution for your web and mobile applications. It is a highly available
global service that scales to hundreds of millions of identities. Built on an enterprise-grade secure platform, Azure AD
B2C keeps your applications, your business, and your customers protected.
With minimal configuration, Azure AD B2C enables your application to authenticate:
Social Accounts (such as Facebook, Google, LinkedIn, and more)
Enterprise Accounts (using open standard protocols, OpenID Connect or SAML)
Local Accounts (email address and password, or username and password)

Azure Active Directory B2C: Provide sign-up and sign-in to consumers with Microsoft accounts
Create a Microsoft account application
To use Microsoft account as an identity provider in Azure Active Directory (Azure AD) B2C, you need to create a
Microsoft account application and supply it with the right parameters. You need a Microsoft account to do this. If you
don't have one, you can get one at https://www.live.com/.
1. Go to the Microsoft Application Registration Portal and sign in with your Microsoft account credentials.
2. Click Add an app.

3. Provide a Name for your application and click Create application.

4. Copy the value of Application Id. You will need it to configure Microsoft account as an identity provider in your
tenant.

5. Click on Add platform and choose Web.

6. Enter https://login.microsoftonline.com/te/{tenant}/oauth2/authresp in the Redirect URIs field.


Replace {tenant} with your tenant's name (for example, contosob2c.onmicrosoft.com).

7. Click on Generate New Password under the Application Secrets section. Copy the new password displayed on
screen. You will need it to configure Microsoft account as an identity provider in your tenant. This password is
an important security credential.

8. Check the box that says Live SDK support under the Advanced Options section. Click Save.

Configure Microsoft account as an identity provider in your tenant


1. Follow these steps to navigate to the B2C features blade on the Azure portal.
2. On the B2C features blade, click Identity providers.
3. Click +Add at the top of the blade.
4. Provide a friendly Name for the identity provider configuration. For example, enter "MSA".
5. Click Identity provider type, select Microsoft account, and click OK.
6. Click Set up this identity provider and enter the Application Id and password of the Microsoft account
application that you created earlier.
7. Click OK and then click Create to save your Microsoft account configuration.

Azure Active Directory B2C: Enable Multi-Factor Authentication in your consumer-facing applications
Azure Active Directory (Azure AD) B2C integrates directly with Azure Multi-Factor Authentication so that you can add a
second layer of security to sign-up and sign-in experiences in your consumer-facing applications. And you can do this
without writing a single line of code. Currently we support phone call and text message verification. If you already
created sign-up and sign-in policies, you can still enable Multi-Factor Authentication.
Note
Multi-Factor Authentication can also be enabled when you create sign-up and sign-in policies, not just by editing
existing policies.
This feature helps applications handle scenarios such as the following:
You don't require Multi-Factor Authentication to access one application, but you do require it to access
another one. For example, the consumer can sign into an auto insurance application with a social or local
account, but must verify the phone number before accessing the home insurance application registered in the
same directory.

You don't require Multi-Factor Authentication to access an application in general, but you do require it to
access the sensitive portions within it. For example, the consumer can sign in to a banking application with a
social or local account and check account balance, but must verify the phone number before attempting a
wire transfer.
Modify your sign-up policy to enable Multi-Factor Authentication
1. Follow these steps to navigate to the B2C features blade on the Azure portal.
2. Click Sign-up policies.
3. Click your sign-up policy (for example, "B2C_1_SiUp") to open it.
4. Click Multi-factor authentication and turn the State to ON. Click OK.
5. Click Save at the top of the blade.
You can use the "Run now" feature on the policy to verify the consumer experience. Confirm the following:
A consumer account gets created in your directory before the Multi-Factor Authentication step occurs. During the
step, the consumer is asked to provide his or her phone number and verify it. If verification is successful, the phone
number is attached to the consumer account for later use. Even if the consumer cancels or drops out, he or she can
be asked to verify a phone number again during the next sign-in (with Multi-Factor Authentication enabled).
Modify your sign-in policy to enable Multi-Factor Authentication
1. Follow these steps to navigate to the B2C features blade on the Azure portal.
2. Click Sign-in policies.
3. Click your sign-in policy (for example, "B2C_1_SiIn") to open it. Click Edit at the top of the blade.
4. Click Multi-factor authentication and turn the State to ON. Click OK.
5. Click Save at the top of the blade.
You can use the "Run now" feature on the policy to verify the consumer experience. Confirm the following:
When the consumer signs in (using a social or local account), if a verified phone number is attached to the consumer
account, he or she is asked to verify it. If no phone number is attached, the consumer is asked to provide one and
verify it. On successful verification, the phone number is attached to the consumer account for later use.
Multi-Factor Authentication on other policies
As described for sign-up & sign-in policies above, it is also possible to enable multi-factor authentication on sign-up or
sign-in policies and password reset policies. It will be available soon on profile editing policies.

Azure Active Directory B2B


On-premises Active Directory puts some requirements on your infrastructure, but moving AD to the cloud has
removed most of these obstacles. You don't care about replication or the number of domain controllers when it's all
in Azure. It has, however, not easily solved one of the other big issues that has plagued a lot of admins for
quite a while: how do I let non-employees into my directory? The typical use case for this is having a consultant
who needs to access your servers to do work for you. If they come in for a hit-and-run install job, you might just log in
for them with an account and let them do their business. But if they need access over time, and don't come into your
office, that isn't going to work.
Still, consultants are "easy" in the big picture. What about having a joint project with another organization where you
need to collaborate on documents, share web apps, etc.? That's even more taxing.
The collaboration scenarios have typically been solved for things like Lync/Skype for Business by setting up federation
between the organizations. This works fairly well once you've got it up and running. The challenge with this approach
is twofold: not every organization out there has a setup suited for federation. If you have on-prem Skype for Business
you probably have a server or two already, and adding a few more for something like ADFS isn't necessarily an issue.
But if you have a smaller organization and you don't have the necessary solutions in place, getting one going usually
has some level of complexity to it. Disregarding the complexity involved in implementing these products, some of the
solutions in the market have a high price tag attached to them as well.
Even if you overcome these factors, the problem is scaling. Granted, if you only have a few companies you're working
with, this isn't something you need to worry about. If you have a lot of partners/customers/etc. you're working with,
setting up federations for all of them is probably going to become less enjoyable over time.
Enter Azure AD B2B to assist.
Note: the service is in preview, and things might change between writing this and the release going GA.

AAD B2B doesn't remove the concept of federation, but it takes the work away from you. If you use AAD, and an
organization you work with also has AAD you both have "stuff" in Azure so wouldn't it make sense that Microsoft
handles this for you? Yes, it would, and that is what AAD B2B will leverage.
If you have resources in your AAD tenant, you can invite users from other AAD tenants to be linked to your resources.
They log in with their existing credentials, but gain access to your data through a federation not visible to you.
It's not an AAD-only thing either; you will be able to invite users not in an AAD tenant as well, with the option for them
to step up to getting one should they like. The only restriction for now is that the external users cannot have a
consumer email provider like Gmail/Outlook.com.
Let's go through the necessary steps for setting this up between two organizations. Both of these organizations have an
Office 365 subscription and an associated Azure AD tenant.
Go to the Active Directory section in the legacy Azure portal https://manage.windowsazure.com, navigate to the Users
tab, and click "Add User".

You should select "Users in partner organizations" as the user type. As you can see this currently requires uploading a
csv-file to progress.
The csv-file has a format like this:
Email,DisplayName,InviteAppID,InviteReplyUrl,InviteAppResources,InviteGroupResources,InviteContactUsUrl
andreas@contoso.com,Andreas,cd3ed3de-93ee-400b-8b19-b61ef44a0f29,,,,http://contoso.com

The invite is sent via mail to the external users. You can verify the status of the invite in the management portal. (If
your csv-file is incorrect you will also be notified.)

The invited user receives a mail with a unique link:

Once they click it they will need to accept/confirm.

The same link can be reused later, but it's probably easier to save a favorite for the following url:
https://myapps.microsoft.com/

In addition to the apps of your own organization, you should have an extra icon for apps belonging to other
organizations:

At the moment you can't tell the difference between the apps based on your organizational belonging.
If you browse to the Users tab of the AAD section in the portal, you will be able to see the external user as an AAD
account (Other directory).

As you can see this is a fairly friction-free process with a minimum of configuration needed. Even without deep
knowledge about Identity Management this is achievable.

Azure Storage security


Azure Storage provides a comprehensive set of security capabilities which together enable developers to build secure
applications. The storage account itself can be secured using Role-Based Access Control and Azure Active Directory.
Data can be secured in transit between an application and Azure by using Client-Side Encryption, HTTPS, or SMB 3.0.
Data can be set to be automatically encrypted when written to Azure Storage using Storage Service Encryption (SSE).
OS and Data disks used by virtual machines can be set to be encrypted using Azure Disk Encryption. Delegated access
to the data objects in Azure Storage can be granted using Shared Access Signatures.
This article will provide an overview of each of these security features that can be used with Azure Storage. Links are
provided to articles that will give details of each feature so you can easily do further investigation on each topic.
Here are the topics to be covered in this article:
Management Plane Security: Securing your Storage Account

The management plane consists of the resources used to manage your storage account. In this section, we'll talk
about the Azure Resource Manager deployment model and how to use Role-Based Access Control (RBAC) to control
access to your storage accounts. We will also talk about managing your storage account keys and how to regenerate
them.
Data Plane Security: Securing Access to Your Data
In this section, we'll look at allowing access to the actual data objects in your Storage account, such as blobs, files,
queues, and tables, using Shared Access Signatures and Stored Access Policies. We will cover both service-level SAS
and account-level SAS. We'll also see how to limit access to a specific IP address (or range of IP addresses), how to
limit the protocol used to HTTPS, and how to revoke a Shared Access Signature without waiting for it to expire.
Encryption in Transit
This section discusses how to secure data when you transfer it into or out of Azure Storage. We'll talk about the
recommended use of HTTPS and the encryption used by SMB 3.0 for Azure File Shares. We will also take a look at
Client-side Encryption, which enables you to encrypt the data before it is transferred into Storage in a client
application, and to decrypt the data after it is transferred out of Storage.
Encryption at Rest
We will talk about Storage Service Encryption (SSE), and how you can enable it for a storage account, resulting in your
block blobs, page blobs, and append blobs being automatically encrypted when written to Azure Storage. We will also
look at how you can use Azure Disk Encryption and explore the basic differences and cases of Disk Encryption versus
SSE versus Client-Side Encryption. We will briefly look at FIPS compliance for U.S. Government computers.
Using Storage Analytics to audit access of Azure Storage
This section discusses how to find information in the storage analytics logs for a request. We'll take a look at real
storage analytics log data and see how to discern whether a request is made with the Storage account key, with a
Shared Access signature, or anonymously, and whether it succeeded or failed.
Enabling Browser-Based Clients using CORS
This section talks about how to allow cross-origin resource sharing (CORS). We'll talk about cross-domain access, and
how to handle it with the CORS capabilities built into Azure Storage.
Management Plane Security
The management plane consists of operations that affect the storage account itself. For example, you can create or
delete a storage account, get a list of storage accounts in a subscription, retrieve the storage account keys, or
regenerate the storage account keys.
When you create a new storage account, you select a deployment model of Classic or Resource Manager. The Classic
model of creating resources in Azure only allows all-or-nothing access to the subscription, and in turn, the storage
account.
This guide focuses on the Resource Manager model which is the recommended means for creating storage accounts.
With the Resource Manager storage accounts, rather than giving access to the entire subscription, you can control
access on a more finite level to the management plane using Role-Based Access Control (RBAC).
How to secure your storage account with Role-Based Access Control (RBAC)
Let's talk about what RBAC is, and how you can use it. Each Azure subscription has an Azure Active Directory. Users,
groups, and applications from that directory can be granted access to manage resources in the Azure subscription that
use the Resource Manager deployment model. This is referred to as Role-Based Access Control (RBAC). To manage
this access, you can use the Azure portal, the Azure CLI tools, PowerShell, or the Azure Storage Resource Provider
REST APIs.
With the Resource Manager model, you put the storage account in a resource group and control access to the
management plane of that specific storage account using Azure Active Directory. For example, you can give specific
users the ability to access the storage account keys, while other users can view information about the storage
account, but cannot access the storage account keys.
Granting Access
Access is granted by assigning the appropriate RBAC role to users, groups, and applications, at the right scope. To
grant access to the entire subscription, you assign a role at the subscription level. You can grant access to all of the
resources in a resource group by granting permissions to the resource group itself. You can also assign specific roles to
specific resources, such as storage accounts.
Here are the main points that you need to know about using RBAC to access the management operations of an Azure
Storage account:

When you assign access, you basically assign a role to the account that you want to have access. You can
control access to the operations used to manage that storage account, but not to the data objects in the
account. For example, you can grant permission to retrieve the properties of the storage account (such as
redundancy), but not to a container or data within a container inside Blob Storage.
For someone to have permission to access the data objects in the storage account, you can give them
permission to read the storage account keys, and that user can then use those keys to access the blobs,
queues, tables, and files.
Roles can be assigned to a specific user account, a group of users, or to a specific application.
Each role has a list of Actions and Not Actions. For example, the Virtual Machine Contributor role has an
Action of "listKeys" that allows the storage account keys to be read. The Contributor has "Not Actions" such as
updating the access for users in the Active Directory.
Roles for storage include (but are not limited to) the following:
o Owner: They can manage everything, including access.
o Contributor: They can do anything the owner can do except assign access. Someone with this role
can view and regenerate the storage account keys. With the storage account keys, they can access
the data objects.
o Reader: They can view information about the storage account, except secrets. For example, if you
assign a role with reader permissions on the storage account to someone, they can view the
properties of the storage account, but they can't make any changes to the properties or view the
storage account keys.
o Storage Account Contributor: They can manage the storage account; they can read the
subscription's resource groups and resources, and create and manage subscription resource group
deployments. They can also access the storage account keys, which in turn means they can access the
data plane.
o User Access Administrator: They can manage user access to the storage account. For example, they
can grant Reader access to a specific user.
o Virtual Machine Contributor: They can manage virtual machines but not the storage account to
which they are connected. This role can list the storage account keys, which means that the user to
whom you assign this role can update the data plane.
In order for a user to create a virtual machine, they have to be able to create the corresponding VHD file in a storage
account. To do that, they need to be able to retrieve the storage account key and pass it to the API creating the VM.
Therefore, they must have this permission so they can list the storage account keys.
The ability to define custom roles is a feature that allows you to compose a set of actions from a list of
available actions that can be performed on Azure resources.
The user has to be set up in your Azure Active Directory before you can assign a role to them.
You can create a report of who granted/revoked what kind of access to/from whom and on what scope using
PowerShell or the Azure CLI.
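
To make the scope idea concrete, here is a minimal sketch (AzureRM module; the user, resource group, account, and subscription ID are placeholders) that assigns the built-in Reader role at the scope of a single storage account and then lists the assignments at that scope:

# Minimal sketch: assign the Reader role on one storage account only.
$scope = "/subscriptions/<subscription-id>/resourceGroups/contoso-rg/providers/Microsoft.Storage/storageAccounts/contososac1"
New-AzureRmRoleAssignment -SignInName "alice@contoso.com" -RoleDefinitionName "Reader" -Scope $scope

# Report who has access at that scope.
Get-AzureRmRoleAssignment -Scope $scope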
Resources
Azure Active Directory Role-based Access Control
This article explains the Azure Active Directory Role-based Access Control and how it works.
RBAC: Built in Roles
This article details all of the built-in roles available in RBAC.
Understanding Resource Manager deployment and classic deployment
This article explains the Resource Manager deployment and classic deployment models, and explains the benefits of
using the Resource Manager and resource groups. It explains how the Azure Compute, Network, and Storage
Providers work under the Resource Manager model.
Managing Role-Based Access Control with the REST API
This article shows how to use the REST API to manage RBAC.
Azure Storage Resource Provider REST API Reference
This is the reference for the APIs you can use to manage your storage account programmatically.
Developer's guide to auth with Azure Resource Manager API
This article shows how to authenticate using the Resource Manager APIs.
Role-Based Access Control for Microsoft Azure from Ignite

This is a link to a video on Channel 9 from the 2015 MS Ignite conference. In this session, they talk about access
management and reporting capabilities in Azure, and explore best practices around securing access to Azure
subscriptions using Azure Active Directory.
Managing Your Storage Account Keys
Storage account keys are 512-bit strings created by Azure that, along with the storage account name, can be used to
access the data objects stored in the storage account, e.g. blobs, entities within a table, queue messages, and files on
an Azure Files share. Controlling access to the storage account keys controls access to the data plane for that storage
account.
Each storage account has two keys referred to as "Key 1" and "Key 2" in the Azure portal and in the PowerShell
cmdlets. These can be regenerated manually using one of several methods, including, but not limited to using
the Azure portal, PowerShell, the Azure CLI, or programmatically using the .NET Storage Client Library or the Azure
Storage Services REST API.
There are any number of reasons to regenerate your storage account keys.
You might regenerate them on a regular basis for security reasons.
You would regenerate your storage account keys if someone managed to hack into an application and retrieve
the key that was hardcoded or saved in a configuration file, giving them full access to your storage account.
Another case for key regeneration is if your team is using a Storage Explorer application that retains the
storage account key, and one of the team members leaves. The application would continue to work, giving
them access to your storage account after they're gone. This is actually the primary reason account-level
Shared Access Signatures were created: you can use an account-level SAS instead of storing the access keys
in a configuration file.
Key regeneration plan
You don't want to just regenerate the key you are using without some planning. If you do that, you could cut off all
access to that storage account, which can cause major disruption. This is why there are two keys. You should
regenerate one key at a time.
Before you regenerate your keys, be sure you have a list of all of your applications that are dependent on the storage
account, as well as any other services you are using in Azure. For example, if you are using Azure Media Services that
are dependent on your storage account, you must re-sync the access keys with your media service after you
regenerate the key. If you are using any applications such as a storage explorer, you will need to provide the new keys
to those applications as well. Note that if you have VMs whose VHD files are stored in the storage account, they will
not be affected by regenerating the storage account keys.
You can regenerate your keys in the Azure portal. Once keys are regenerated they can take up to 10 minutes to be
synchronized across Storage Services.
When you're ready, here's the general process detailing how you should change your key. In this case, the assumption
is that you are currently using Key 1 and you are going to change everything to use Key 2 instead.
1. Regenerate Key 2 to ensure that it is secure. You can do this in the Azure portal.
2. In all of the applications where the storage key is stored, change the storage key to use Key 2's new value.
Test and publish the application.
3. After all of the applications and services are up and running successfully, regenerate Key 1. This ensures that
anybody to whom you have not expressly given the new key will no longer have access to the storage
account.
If you are currently using Key 2, you can use the same process, but reverse the key names.
You can migrate over a couple of days, changing each application to use the new key and publishing it. After all of
them are done, you should then go back and regenerate the old key so it no longer works.
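
A minimal PowerShell sketch of that rotation (AzureRM module; resource group and account names are placeholders) looks like this:

# Minimal sketch: rotate from key1 to key2, then retire key1.
New-AzureRmStorageAccountKey -ResourceGroupName "contoso-rg" -Name "contososac1" -KeyName "key2"
(Get-AzureRmStorageAccountKey -ResourceGroupName "contoso-rg" -Name "contososac1" |
    Where-Object KeyName -eq "key2").Value        # distribute this value to your applications

# ...after every application and service has been switched over to key2:
New-AzureRmStorageAccountKey -ResourceGroupName "contoso-rg" -Name "contososac1" -KeyName "key1"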
Another option is to put the storage account key in an Azure Key Vault as a secret and have your applications retrieve
the key from there. Then when you regenerate the key and update the Azure Key Vault, the applications will not need
to be redeployed because they will pick up the new key from the Azure Key Vault automatically. Note that you can
have the application read the key each time you need it, or you can cache it in memory and if it fails when using it,
retrieve the key again from the Azure Key Vault.
Using Azure Key Vault also adds another level of security for your storage keys. If you use this method, you will never
have the storage key hardcoded in a configuration file, which removes that avenue of somebody getting access to the
keys without specific permission.
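
A sketch of that pattern (AzureRM module; vault, secret, and account names are placeholders, and the reading application needs a Key Vault access policy) is shown below:

# Minimal sketch: store the storage key as a Key Vault secret and read it back.
$key = (Get-AzureRmStorageAccountKey -ResourceGroupName "contoso-rg" -Name "contososac1")[0].Value
Set-AzureKeyVaultSecret -VaultName "contoso-vault" -Name "contososac1-key" `
    -SecretValue (ConvertTo-SecureString $key -AsPlainText -Force)

# In the application (or a script acting on its behalf):
$secret = Get-AzureKeyVaultSecret -VaultName "contoso-vault" -Name "contososac1-key"
$storageKey = $secret.SecretValueText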

Another advantage of using Azure Key Vault is you can also control access to your keys using Azure Active Directory.
This means you can grant access to the handful of applications that need to retrieve the keys from Azure Key Vault,
and know that other applications will not be able to access the keys without granting them permission specifically.
Note: it is recommended to use only one of the keys in all of your applications at the same time. If you use Key 1 in
some places and Key 2 in others, you will not be able to rotate your keys without some application losing access.
Resources
About Azure Storage Accounts
This article gives an overview of storage accounts and discusses viewing, copying, and regenerating storage access
keys.
Azure Storage Resource Provider REST API Reference
This article contains links to specific articles about retrieving the storage account keys and regenerating the storage
account keys for an Azure Account using the REST API. Note: This is for Resource Manager storage accounts.
Operations on storage accounts
This article in the Storage Service Manager REST API Reference contains links to specific articles on retrieving and
regenerating the storage account keys using the REST API. Note: This is for the Classic storage accounts.
Say goodbye to key management manage access to Azure Storage data using Azure AD
This article shows how to use Active Directory to control access to your Azure Storage keys in Azure Key Vault. It also
shows how to use an Azure Automation job to regenerate the keys on an hourly basis.
Data Plane Security
Data Plane Security refers to the methods used to secure the data objects stored in Azure Storage the blobs,
queues, tables, and files. We've seen methods to encrypt the data and security during transit of the data, but how do
you go about allowing access to the objects?
There are basically two methods for controlling access to the data objects themselves. The first is by controlling access
to the storage account keys, and the second is using Shared Access Signatures to grant access to specific data objects
for a specific amount of time.
One exception to note is that you can allow public access to your blobs by setting the access level for the container
that holds the blobs accordingly. If you set access for a container to Blob or Container, it will allow public read access
for the blobs in that container. This means anyone with a URL pointing to a blob in that container can open it in a
browser without using a Shared Access Signature or having the storage account keys.
Storage Account Keys
Storage account keys are 512-bit strings created by Azure that, along with the storage account name, can be used to
access the data objects stored in the storage account.
For example, you can read blobs, write to queues, create tables, and modify files. Many of these actions can be
performed through the Azure portal, or using one of many Storage Explorer applications. You can also write code to
use the REST API or one of the Storage Client Libraries to perform these operations.
As discussed in the section on the Management Plane Security, access to the storage keys for a Classic storage
account can be granted by giving full access to the Azure subscription. Access to the storage keys for a storage
account using the Azure Resource Manager model can be controlled through Role-Based Access Control (RBAC).
How to delegate access to objects in your account using Shared Access Signatures and Stored Access Policies
A Shared Access Signature is a string containing a security token that can be attached to a URI that allows you to
delegate access to storage objects and specify constraints such as the permissions and the date/time range of access.
You can grant access to blobs, containers, queue messages, files, and tables. With tables, you can actually grant
permission to access a range of entities in the table by specifying the partition and row key ranges to which you want
the user to have access. For example, if you have data stored with a partition key of geographical state, you could give
someone access to just the data for California.
In another example, you might give a web application a SAS token that enables it to write entries to a queue, and give
a worker role application a SAS token to get messages from the queue and process them. Or you could give one
customer a SAS token they can use to upload pictures to a container in Blob Storage, and give a web application
permission to read those pictures. In both cases, there is a separation of concerns: each application can be given just
the access that it requires in order to perform its task. This is possible through the use of Shared Access
Signatures.
Why you want to use Shared Access Signatures
Why would you want to use an SAS instead of just giving out your storage account key, which is so much easier?
Giving out your storage account key is like sharing the keys of your storage kingdom. It grants complete access.
Someone could use your keys and upload their entire music library to your storage account. They could also replace
your files with virus-infected versions, or steal your data. Giving away unlimited access to your storage account is
something that should not be taken lightly.
With Shared Access Signatures, you can give a client just the permissions required for a limited amount of time. For
example, if someone is uploading a blob to your account, you can grant them write access for just enough time to
upload the blob (depending on the size of the blob, of course). And if you change your mind, you can revoke that
access.
Additionally, you can specify that requests made using a SAS are restricted to a certain IP address or IP address range
external to Azure. You can also require that requests are made using a specific protocol (HTTPS only, or both HTTPS and HTTP). This
means if you only want to allow HTTPS traffic, you can set the required protocol to HTTPS only, and HTTP traffic will be
blocked.
Definition of a Shared Access Signature
A Shared Access Signature is a set of query parameters appended to the URL pointing at the resource
that provides information about the access allowed and the length of time for which the access is permitted. Here is
an example; this URI provides read access to a blob for five minutes. Note that SAS query parameters must be URL
Encoded, such as %3A for colon (:) or %20 for a space.
http://mystorage.blob.core.windows.net/mycontainer/myblob.txt (URL to the blob)
?sv=2015-04-05 (storage service version)
&st=2015-12-10T22%3A18%3A26Z (start time, in UTC time and URL encoded)
&se=2015-12-10T22%3A23%3A26Z (end time, in UTC time and URL encoded)
&sr=b (resource is a blob)
&sp=r (read access)
&sip=168.1.5.60-168.1.5.70 (requests can only come from this range of IP addresses)
&spr=https (only allow HTTPS requests)
&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D (signature used for the authentication of the SAS)
How the Shared Access Signature is authenticated by the Azure Storage Service
When the storage service receives the request, it takes the input query parameters and creates a signature using the
same method as the calling program. It then compares the two signatures. If they agree, then the storage service can
check the storage service version to make sure it's valid, verify that the current date and time are within the specified
window, make sure the access requested corresponds to the request made, etc.
For example, with our URL above, if the URL was pointing to a file instead of a blob, this request would fail because it
specifies that the Shared Access Signature is for a blob. If the REST command being called was to update a blob, it
would fail because the Shared Access Signature specifies that only read access is permitted.
Types of Shared Access Signatures
A service-level SAS can be used to access specific resources in a storage account. Some examples of this are
retrieving a list of blobs in a container, downloading a blob, updating an entity in a table, adding messages to
a queue or uploading a file to a file share.
An account-level SAS can be used to access anything that a service-level SAS can be used for. Additionally, it
can give options to resources that are not permitted with a service-level SAS, such as the ability to create
containers, tables, queues, and file shares. You can also specify access to multiple services at once. For
example, you might give someone access to both blobs and files in your storage account.
Creating an SAS URI
1. You can create an ad hoc URI on demand, defining all of the query parameters each time.
This is really flexible, but if you have a logical set of parameters that are similar each time, using a Stored Access Policy
is a better idea.
2. You can create a Stored Access Policy for an entire container, file share, table, or queue. Then you can use this
as the basis for the SAS URIs you create. Permissions based on Stored Access Policies can be easily revoked.
You can have up to 5 policies defined on each container, queue, table, or file share.
For example, if you were going to have many people read the blobs in a specific container, you could create a Stored
Access Policy that says "give read access" and any other settings that will be the same each time. Then you can create
an SAS URI using the settings of the Stored Access Policy and specifying the expiration date/time. The advantage of
this is that you don't have to specify all of the query parameters every time.
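As a rough illustration (the account, container, blob, and policy names here are hypothetical placeholders, and the cmdlets assume the Azure.Storage PowerShell module of this era), an ad hoc SAS and a Stored Access Policy based SAS might be created like this:
$ctx = New-AzureStorageContext -StorageAccountName "mystorage" -StorageAccountKey "<storage-account-key>"
# Ad hoc SAS: every constraint (permissions, time window) is baked into the token itself.
$adHocSas = New-AzureStorageBlobSASToken -Container "mycontainer" -Blob "myblob.txt" `
    -Permission r -StartTime (Get-Date) -ExpiryTime (Get-Date).AddMinutes(5) -Context $ctx
# Stored Access Policy: the permissions and expiry live on the container, so they can be
# changed or revoked later without reissuing anything to the clients holding the SAS.
New-AzureStorageContainerStoredAccessPolicy -Container "mycontainer" -Policy "read-only" `
    -Permission r -ExpiryTime (Get-Date).AddDays(7) -Context $ctx
$policySas = New-AzureStorageContainerSASToken -Name "mycontainer" -Policy "read-only" -Context $ctx
# Append either token to the resource URL to hand out delegated access.
"https://mystorage.blob.core.windows.net/mycontainer/myblob.txt" + $adHocSas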
Revocation
Suppose your SAS has been compromised, or you want to change it because of corporate security or regulatory
compliance requirements. How do you revoke access to a resource using that SAS? It depends on how you created the
SAS URI.
If you are using ad hoc URIs, you have three options. You can issue SAS tokens with short expiration policies and
simply wait for the SAS to expire. You can rename or delete the resource (assuming the token was scoped to a single
object). You can change the storage account keys. This last option can have a big impact, depending on how many
services are using that storage account, and probably isn't something you want to do without some planning.
If you are using a SAS derived from a Stored Access Policy, you can remove access by revoking the Stored Access Policy:
you can change it so that it has already expired, or you can remove it altogether. This takes effect immediately, and
invalidates every SAS created using that Stored Access Policy. Updating or removing the Stored Access Policy may
impact people accessing that specific container, file share, table, or queue via SAS, but if the clients are written so they
request a new SAS when the old one becomes invalid, this will work fine.
Because using a SAS derived from a Stored Access Policy gives you the ability to revoke that SAS immediately, it is the
recommended best practice to always use Stored Access Policies when possible.
Resources
For more detailed information on using Shared Access Signatures and Stored Access Policies, complete with examples,
please refer to the following articles:
These are the reference articles.
o Service SAS
This article provides examples of using a service-level SAS with blobs, queue messages, table ranges, and files.
o Constructing a service SAS
o Constructing an account SAS
These are tutorials for using the .NET client library to create Shared Access Signatures and Stored Access
Policies.
o Using Shared Access Signatures (SAS)
o Shared Access Signatures, Part 2: Create and Use a SAS with the Blob Service
This article includes an explanation of the SAS model, examples of Shared Access Signatures, and recommendations
for the best practice use of SAS. Also discussed is the revocation of the permission granted.
Limiting access by IP Address (IP ACLs)
o What is an endpoint Access Control List (ACLs)?
o Constructing a Service SAS
This is the reference article for service-level SAS; it includes an example of IP ACLing.
o Constructing an Account SAS
This is the reference article for account-level SAS; it includes an example of IP ACLing.
Authentication
o Authentication for the Azure Storage Services
Shared Access Signatures: Getting Started Tutorial
o SAS Getting Started Tutorial
Encryption in Transit
Transport-Level Encryption Using HTTPS
Another step you should take to ensure the security of your Azure Storage data is to encrypt the data between the
client and Azure Storage. The first recommendation is to always use the HTTPS protocol, which ensures secure
communication over the public Internet.
To have a secure communication channel, you should always use HTTPS when calling the REST APIs or accessing
objects in storage. Also, Shared Access Signatures, which can be used to delegate access to Azure Storage objects,
include an option to specify that only the HTTPS protocol can be used, ensuring that anybody sending out links with
SAS tokens will use the proper protocol.
You can enforce the use of HTTPS when calling the REST APIs to access objects in storage accounts by enabling Secure
transfer required for the storage account. Connections using HTTP will be refused once this is enabled.
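As a quick, hedged example (the names are hypothetical, and the -EnableHttpsTrafficOnly parameter assumes a version of the AzureRM.Storage module that exposes the Secure transfer required setting), this could be scripted as follows:
# Reject any HTTP request to this account; only HTTPS (and encrypted SMB) connections are accepted.
Set-AzureRmStorageAccount -ResourceGroupName "MyResourceGroup" -Name "mystorage" -EnableHttpsTrafficOnly $true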
Using encryption during transit with Azure File Shares
Azure File Storage supports HTTPS when using the REST API, but is more commonly used as an SMB file share attached
to a VM. SMB 2.1 does not support encryption, so connections are only allowed within the same region in Azure.
However, SMB 3.0 supports encryption, and it's available in Windows Server 2012 R2, Windows 8, Windows 8.1, and
Windows 10, allowing cross-region access and even access on the desktop.
Note that while Azure File Shares can be used with Linux, the Linux SMB client does not yet support encryption, so
access is only allowed within an Azure region. Encryption support for Linux is on the roadmap of Linux developers
responsible for SMB functionality. When they add encryption, you will have the same ability for accessing an Azure
File Share on Linux as you do for Windows.
You can enforce the use of encryption with the Azure Files service by enabling Secure transfer required for the storage
account. If using the REST APIs, HTTPS is required. For SMB, only SMB connections that support encryption will
connect successfully.
Resources
How to use Azure File Storage with Linux
This article shows how to mount an Azure File Share on a Linux system and upload/download files.
Get started with Azure File storage on Windows
This article gives an overview of Azure File shares and how to mount and use them using PowerShell and .NET.
Inside Azure File Storage
This article announces the general availability of Azure File Storage and provides technical details about the SMB 3.0
encryption.
Using Client-side encryption to secure data that you send to storage
Another option that helps you ensure that your data is secure while being transferred between a client application
and Storage is Client-side Encryption. The data is encrypted before being transferred into Azure Storage. When
retrieving the data from Azure Storage, the data is decrypted after it is received on the client side. Even though the
data is encrypted going across the wire, we recommend that you also use HTTPS, as it has data integrity checks built in
which help mitigate network errors affecting the integrity of the data.
Client-side encryption is also a method for encrypting your data at rest, as the data is stored in its encrypted form.
We'll talk about this in more detail in the section on Encryption at Rest.
Encryption at Rest
There are three Azure features that provide encryption at rest. Azure Disk Encryption is used to encrypt the OS and
data disks in IaaS Virtual Machines. The other two, Client-side Encryption and SSE, are both used to encrypt data in
Azure Storage. Let's look at each of these, and then do a comparison and see when each one can be used.
While you can use Client-side Encryption to encrypt the data in transit (which is also stored in its encrypted form in
Storage), you may prefer to simply use HTTPS during the transfer, and have some way for the data to be automatically
encrypted when it is stored. There are two ways to do this -- Azure Disk Encryption and SSE. One is used to directly
encrypt the data on OS and data disks used by VMs, and the other is used to encrypt data written to Azure Blob
Storage.
Storage Service Encryption (SSE)
SSE allows you to request that the storage service automatically encrypt the data when writing it to Azure Storage.
When you read the data from Azure Storage, it will be decrypted by the storage service before being returned. This
enables you to secure your data without having to modify code or add code to any applications.
This is a setting that applies to the whole storage account. You can enable and disable this feature by changing the
value of the setting. To do this, you can use the Azure portal, PowerShell, the Azure CLI, the Storage Resource Provider
REST API, or the .NET Storage Client Library. By default, SSE is turned off.
At this time, the keys used for the encryption are managed by Microsoft. We generate the keys originally, and manage
the secure storage of the keys as well as the regular rotation as defined by internal Microsoft policy. In the future, you
will get the ability to manage your own encryption keys, and provide a migration path from Microsoft-managed keys
to customer-managed keys.
This feature is available for Standard and Premium Storage accounts created using the Resource Manager deployment
model. SSE applies only to block blobs, page blobs, and append blobs. The other types of data, including tables,
queues, and files, will not be encrypted.
Data is only encrypted when SSE is enabled and the data is written to Blob Storage. Enabling or disabling SSE does not
impact existing data. In other words, when you enable this encryption, it will not go back and encrypt data that
already exists; nor will it decrypt the data that already exists when you disable SSE.
If you want to use this feature with a Classic storage account, you can create a new Resource Manager storage
account and use AzCopy to copy the data to the new account.
Client-side Encryption
We mentioned client-side encryption when discussing the encryption of the data in transit. This feature allows you to
programmatically encrypt your data in a client application before sending it across the wire to be written to Azure
Storage, and to programmatically decrypt your data after retrieving it from Azure Storage.
This does provide encryption in transit, but it also provides the feature of Encryption at Rest. Note that although the
data is encrypted in transit, we still recommend using HTTPS to take advantage of the built-in data integrity checks
which help mitigate network errors affecting the integrity of the data.
An example of where you might use this is if you have a web application that stores blobs and retrieves blobs, and you
want the application and data to be as secure as possible. In that case, you would use client-side encryption. The
traffic between the client and the Azure Blob Service contains the encrypted resource, and nobody can interpret the
data in transit and reconstitute it into your private blobs.
Client-side encryption is built into the Java and the .NET storage client libraries, which in turn use the Azure Key Vault
APIs, making it pretty easy for you to implement. The process of encrypting and decrypting the data uses the envelope
technique, and stores metadata used by the encryption in each storage object. For example, for blobs, it stores it in
the blob metadata, while for queues, it adds it to each queue message.
For the encryption itself, you can generate and manage your own encryption keys. You can also use keys generated by
the Azure Storage Client Library, or you can have the Azure Key Vault generate the keys. You can store your
encryption keys in your on-premises key storage, or you can store them in an Azure Key Vault. Azure Key Vault allows
you to grant access to the secrets in Azure Key Vault to specific users using Azure Active Directory. This means that
not just anybody can read the Azure Key Vault and retrieve the keys you're using for client-side encryption.
Resources
Encrypt and decrypt blobs in Microsoft Azure Storage using Azure Key Vault
This article shows how to use client-side encryption with Azure Key Vault, including how to create the KEK and store it
in the vault using PowerShell.
Client-Side Encryption and Azure Key Vault for Microsoft Azure Storage
This article gives an explanation of client-side encryption, and provides examples of using the storage client library to
encrypt and decrypt resources from the four storage services. It also talks about Azure Key Vault.
Using Azure Disk Encryption to encrypt disks used by your virtual machines
Azure Disk Encryption is a new feature. This feature allows you to encrypt the OS disks and Data disks used by an IaaS
Virtual Machine. For Windows, the drives are encrypted using industry-standard BitLocker encryption technology. For
Linux, the disks are encrypted using the DM-Crypt technology. This is integrated with Azure Key Vault to allow you to
control and manage the disk encryption keys.
The solution supports the following scenarios for IaaS VMs when they are enabled in Microsoft Azure:
Integration with Azure Key Vault
Standard tier VMs: A, D, DS, G, GS, and so forth series IaaS VMs
Enabling encryption on Windows and Linux IaaS VMs
Disabling encryption on OS and data drives for Windows IaaS VMs
Disabling encryption on data drives for Linux IaaS VMs
Enabling encryption on IaaS VMs that are running Windows client OS
Enabling encryption on volumes with mount paths
Enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm
Enabling encryption on Linux VMs by using LVM for data disks
Enabling encryption on Windows VMs that are configured by using storage spaces
All Azure public regions are supported
The solution does not support the following scenarios, features, and technology in the release:
Basic tier IaaS VMs
Disabling encryption on an OS drive for Linux IaaS VMs
IaaS VMs that are created by using the classic VM creation method
Integration with your on-premises Key Management Service
Azure Files (shared file system), Network File System (NFS), dynamic volumes, and Windows VMs that are
configured with software-based RAID systems
Note
Linux OS disk encryption is currently supported on the following Linux distributions: RHEL 7.2, CentOS 7.2n, and
Ubuntu 16.04.
This feature ensures that all data on your virtual machine disks is encrypted at rest in Azure Storage.
Resources
Azure Disk Encryption for Windows and Linux IaaS VMs
Comparison of Azure Disk Encryption, SSE, and Client-Side Encryption
IaaS VMs and their VHD files
For disks used by IaaS VMs, we recommend using Azure Disk Encryption. You can turn on SSE to encrypt the VHD files
that are used to back those disks in Azure Storage, but it only encrypts newly written data. This means if you create a
VM and then enable SSE on the storage account that holds the VHD file, only the changes will be encrypted, not the
original VHD file.
If you create a VM using an image from the Azure Marketplace, Azure performs a shallow copy of the image to your
storage account in Azure Storage, and it is not encrypted even if you have SSE enabled. After it creates the VM and
starts updating the image, SSE will start encrypting the data. For this reason, it's best to use Azure Disk Encryption on
VMs created from images in the Azure Marketplace if you want them fully encrypted.
If you bring a pre-encrypted VM into Azure from on-premises, you will be able to upload the encryption keys to Azure
Key Vault, and continue using the encryption for that VM that you were using on-premises. Azure Disk Encryption is
enabled to handle this scenario.
If you have a non-encrypted VHD from on-premises, you can upload it into the gallery as a custom image and provision
a VM from it. If you do this using the Resource Manager templates, you can ask it to turn on Azure Disk Encryption
when it boots up the VM.
When you add a data disk and mount it on the VM, you can turn on Azure Disk Encryption on that data disk. It will
encrypt that data disk locally first, and then the service management layer will do a lazy write against storage so the
storage content is encrypted.
Client-side encryption
Client-side encryption is the most secure method of encrypting your data, because it encrypts it before transit, and
encrypts the data at rest. However, it does require that you add code to your applications using storage, which you
may not want to do. In those cases, you can use HTTPS for your data in transit, and SSE to encrypt the data at rest.
With client-side encryption, you can encrypt table entities, queue messages, and blobs. With SSE, you can only
encrypt blobs. If you need table and queue data to be encrypted, you should use client-side encryption.
Client-side encryption is managed entirely by the application. This is the most secure approach, but does require you
to make programmatic changes to your application and put key management processes in place. You would use this
when you want the extra security during transit, and you want your stored data to be encrypted.
Client-side encryption puts more load on the client, and you have to account for this in your scalability plans, especially if
you are encrypting and transferring a lot of data.
Storage Service Encryption (SSE)
SSE is managed by Azure Storage. Using SSE does not provide for the security of the data in transit, but it does encrypt
the data as it is written to Azure Storage. There is no impact on the performance when using this feature.
You can only encrypt block blobs, append blobs, and page blobs using SSE. If you need to encrypt table data or queue
data, you should consider using client-side encryption.
If you have an archive or library of VHD files that you use as a basis for creating new virtual machines, you can create a
new storage account, enable SSE, and then upload the VHD files to that account. Those VHD files will be encrypted by
Azure Storage.
If you have Azure Disk Encryption enabled for the disks in a VM and SSE enabled on the storage account holding the
VHD files, it will work fine; it will result in any newly-written data being encrypted twice.
Storage Analytics
Using Storage Analytics to monitor authorization type
For each storage account, you can enable Azure Storage Analytics to perform logging and store metrics data. This is a
great tool to use when you want to check the performance metrics of a storage account, or need to troubleshoot a
storage account because you are having performance problems.
Another piece of data you can see in the storage analytics logs is the authentication method used by someone when
they access storage. For example, with Blob Storage, you can see if they used a Shared Access Signature or the storage
account keys, or if the blob accessed was public.
This can be really helpful if you are tightly guarding access to storage. For example, in Blob Storage you can set all of
the containers to private and implement the use of an SAS service throughout your applications. Then you can check
the logs regularly to see if your blobs are accessed using the storage account keys, which may indicate a breach of
security, or if the blobs are public but they shouldn't be.
What do the logs look like?
After you enable the storage account metrics and logging through the Azure portal, analytics data will start to
accumulate quickly. The logging and metrics for each service is separate; the logging is only written when there is
activity in that storage account, while the metrics will be logged every minute, every hour, or every day, depending on
how you configure it.
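If you prefer to script it, here is a minimal sketch using the Azure.Storage PowerShell cmdlets (the account name is a hypothetical placeholder):
$ctx = New-AzureStorageContext -StorageAccountName "mystorage" -StorageAccountKey "<storage-account-key>"
# Log all read, write, and delete operations against the Blob service and keep the logs for 10 days.
Set-AzureStorageServiceLoggingProperty -ServiceType Blob -LoggingOperations All -RetentionDays 10 -Context $ctx
# Collect hourly metrics (per service and per API) for the Blob service, also kept for 10 days.
Set-AzureStorageServiceMetricsProperty -ServiceType Blob -MetricsType Hour -MetricsLevel ServiceAndApi -RetentionDays 10 -Context $ctx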
The logs are stored in block blobs in a container named $logs in the storage account. This container is automatically
created when Storage Analytics is enabled. Once this container is created, you can't delete it, although you can delete
its contents.
Under the $logs container, there is a folder for each service, and then there are subfolders for the
year/month/day/hour. Under hour, the logs are simply numbered, so a log blob name looks like blob/2015/11/17/0200/000000.log.
Every request to Azure Storage is logged. Each log entry is a single semicolon-delimited line of fields (sample entries appear
later in this section), so you can use the logs to track any kind of call made to a storage account.
What are all of those fields for?
There is an article listed in the resources below that provides the full list of fields in the logs and what they are used for.
We're interested in the entries for GetBlob, and how they are authenticated, so we need to look for entries with
operation-type "GetBlob", and check the request-status (4th field) and the authorization-type (8th field).
For example, in an entry where the request-status is "Success" and the authorization-type is "authenticated", the
request was validated using the storage account key.
How are my blobs being authenticated?
We have three cases that we are interested in.
1. The blob is public and it is accessed using a URL without a Shared Access Signature. In this case, the request-
status is "AnonymousSuccess" and the authorization-type is "anonymous".
1.0;2015-11-17T02:01:29.0488963Z;GetBlob;AnonymousSuccess;200;124;37;anonymous;;mystorage
2. The blob is private and was used with a Shared Access Signature. In this case, the request-status is
"SASSuccess" and the authorization-type is "sas".
1.0;2015-11-16T18:30:05.6556115Z;GetBlob;SASSuccess;200;416;64;sas;;mystorage
3. The blob is private and the storage key was used to access it. In this case, the request-status is "Success" and
the authorization-type is "authenticated".
1.0;2015-11-16T18:32:24.3174537Z;GetBlob;Success;206;59;22;authenticated;mystorage
You can use the Microsoft Message Analyzer to view and analyze these logs. It includes search and filter capabilities.
For example, you might want to search for instances of GetBlob to see if the usage is what you expect, i.e. to make
sure someone is not accessing your storage account inappropriately.
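As a rough sketch of that kind of check (the names and local path are hypothetical, and it assumes the Azure.Storage cmdlets and the log format described above), you could pull down the Blob service logs and flag GetBlob entries that used the account key:
$ctx = New-AzureStorageContext -StorageAccountName "mystorage" -StorageAccountKey "<storage-account-key>"
# Download the Blob service logs from the $logs container to a local folder.
Get-AzureStorageBlob -Container '$logs' -Prefix "blob/" -Context $ctx |
    Get-AzureStorageBlobContent -Destination "C:\StorageLogs\" -Context $ctx -Force
# Flag GetBlob entries whose authorization-type (the 8th semicolon-delimited field) is "authenticated".
Get-ChildItem "C:\StorageLogs\" -Recurse -Filter *.log |
    Select-String -Pattern ";GetBlob;" |
    Where-Object { ($_.Line -split ';')[7] -eq "authenticated" }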
Resources
Storage Analytics
This article is an overview of storage analytics and how to enable them.
Storage Analytics Log Format
This article illustrates the Storage Analytics Log Format, and details the fields available therein, including
authentication-type, which indicates the type of authentication used for the request.
Monitor a Storage Account in the Azure portal
This article shows how to configure monitoring of metrics and logging for a storage account.
End-to-End Troubleshooting using Azure Storage Metrics and Logging, AzCopy, and Message Analyzer
This article talks about troubleshooting using the Storage Analytics and shows how to use the Microsoft Message
Analyzer.
Microsoft Message Analyzer Operating Guide
This article is the reference for the Microsoft Message Analyzer and includes links to a tutorial, quick start, and feature
summary.
Cross-Origin Resource Sharing (CORS)
Cross-domain access of resources
When a web browser running in one domain makes an HTTP request for a resource from a different domain, this is
called a cross-origin HTTP request. For example, an HTML page served from contoso.com makes a request for a jpeg
hosted on fabrikam.blob.core.windows.net. For security reasons, browsers restrict cross-origin HTTP requests initiated
from within scripts, such as JavaScript. This means that when some JavaScript code on a web page on contoso.com
requests that jpeg on fabrikam.blob.core.windows.net, the browser will not allow the request.
What does this have to do with Azure Storage? Well, if you are storing static assets such as JSON or XML data files in
Blob Storage using a storage account called Fabrikam, the domain for the assets will be
fabrikam.blob.core.windows.net, and the contoso.com web application will not be able to access them using
JavaScript because the domains are different. This is also true if you're trying to call one of the Azure Storage Services
such as Table Storage that return JSON data to be processed by the JavaScript client.
Possible solutions
One way to resolve this is to assign a custom domain like "storage.contoso.com" to fabrikam.blob.core.windows.net.
The problem is that you can only assign that custom domain to one storage account. What if the assets are stored in
multiple storage accounts?
Another way to resolve this is to have the web application act as a proxy for the storage calls. This means if you are
uploading a file to Blob Storage, the web application would either write it locally and then copy it to Blob Storage, or it
would read all of it into memory and then write it to Blob Storage. Alternately, you could write a dedicated web
application (such as a Web API) that uploads the files locally and writes them to Blob Storage. Either way, you have to
account for that function when determining the scalability needs.
How can CORS help?
Azure Storage allows you to enable CORS (Cross-Origin Resource Sharing). For each storage account, you can specify
domains that can access the resources in that storage account. For example, in our case outlined above, we can
enable CORS on the fabrikam.blob.core.windows.net storage account and configure it to allow access to contoso.com.
Then the web application contoso.com can directly access the resources in fabrikam.blob.core.windows.net.
One thing to note is that CORS allows access, but it does not provide authentication, which is required for all non-
public access of storage resources. This means you can only access blobs if they are public or you include a Shared
Access Signature giving you the appropriate permission. Tables, queues, and files have no public access, and require a
SAS.
By default, CORS is disabled on all services. You can enable CORS by using the REST API or the storage client library to
call one of the methods to set the service policies. When you do that, you include a CORS rule, which is in XML. Here's
an example of a CORS rule that has been set using the Set Service Properties operation for the Blob Service for a
storage account. You can perform that operation using the storage client library or the REST APIs for Azure Storage.
<Cors>
<CorsRule>
<AllowedOrigins>http://www.contoso.com, http://www.fabrikam.com</AllowedOrigins>
<AllowedMethods>PUT,GET</AllowedMethods>
<AllowedHeaders>x-ms-meta-data*,x-ms-meta-target*,x-ms-meta-abc</AllowedHeaders>
<ExposedHeaders>x-ms-meta-*</ExposedHeaders>
<MaxAgeInSeconds>200</MaxAgeInSeconds>
</CorsRule>
</Cors>
Here's what each row means:
AllowedOrigins: This tells which non-matching domains can request and receive data from the storage service.
This says that both contoso.com and fabrikam.com can request data from Blob Storage for a specific storage
account. You can also set this to a wildcard (*) to allow all domains to make requests.
AllowedMethods: This is the list of methods (HTTP request verbs) that can be used when making the request.
In this example, only PUT and GET are allowed. You can set this to a wildcard (*) to allow all methods to be
used.
AllowedHeaders: These are the request headers that the origin domain can specify when making the request. In
this example, all metadata headers starting with x-ms-meta-data, x-ms-meta-target, and x-ms-meta-abc are
permitted. The wildcard character (*) indicates that any header beginning with the specified prefix is allowed.
ExposedHeaders: This tells which response headers should be exposed by the browser to the request issuer. In
this example, any header starting with "x-ms-meta-" will be exposed.
MaxAgeInSeconds: This is the maximum amount of time that a browser will cache the preflight OPTIONS
request. (For more information about the preflight request, check the first article below.)
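For reference, the same rule can also be set with the Azure.Storage PowerShell cmdlets; here is a hedged sketch (the account name is a hypothetical placeholder):
$ctx = New-AzureStorageContext -StorageAccountName "fabrikam" -StorageAccountKey "<storage-account-key>"
# One CORS rule, equivalent to the XML above, applied to the Blob service.
$rule = @{
    AllowedOrigins  = @("http://www.contoso.com", "http://www.fabrikam.com");
    AllowedMethods  = @("Put", "Get");
    AllowedHeaders  = @("x-ms-meta-data*", "x-ms-meta-target*", "x-ms-meta-abc");
    ExposedHeaders  = @("x-ms-meta-*");
    MaxAgeInSeconds = 200
}
Set-AzureStorageCORSRule -ServiceType Blob -CorsRules $rule -Context $ctx
# Inspect the rules currently in effect.
Get-AzureStorageCORSRule -ServiceType Blob -Context $ctx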
Resources
For more information about CORS and how to enable it, please check out these resources.
Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services on Azure.com
This article provides an overview of CORS and how to set the rules for the different storage services.
Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services on MSDN
This is the reference documentation for CORS support for the Azure Storage Services. This has links to articles applying
to each storage service, and shows an example and explains each element in the CORS file.
Microsoft Azure Storage: Introducing CORS
This is a link to the initial blog article announcing CORS and showing how to use it.
Frequently asked questions about Azure Storage security
1. How can I verify the integrity of the blobs I'm transferring into or out of Azure Storage if I can't use the HTTPS
protocol?
If for any reason you need to use HTTP instead of HTTPS and you are working with block blobs, you can use MD5
checking to help verify the integrity of the blobs being transferred. This will help with protection from
network/transport layer errors, but not necessarily with intermediary attacks.
If you can use HTTPS, which provides transport level security, then using MD5 checking is redundant and unnecessary.
For more information, please check out the Azure Blob MD5 Overview.
2. What about FIPS-Compliance for the U.S. Government?
The United States Federal Information Processing Standard (FIPS) defines cryptographic algorithms approved for use
by U.S. Federal government computer systems for the protection of sensitive data. Enabling FIPS mode on a Windows
server or desktop tells the OS that only FIPS-validated cryptographic algorithms should be used. If an application uses
non-compliant algorithms, the application will break. With .NET Framework versions 4.5.2 or higher, the application
automatically switches the cryptography algorithms to use FIPS-compliant algorithms when the computer is in FIPS
mode.
Microsoft leaves it up to each customer to decide whether to enable FIPS mode. We believe there is no compelling
reason for customers who are not subject to government regulations to enable FIPS mode by default.
Resources
Why We're Not Recommending "FIPS Mode" Anymore
This blog article gives an overview of FIPS and explains why Microsoft no longer recommends enabling FIPS mode by default.
FIPS 140 Validation
This article provides information on how Microsoft products and cryptographic modules comply with the FIPS
standard for the U.S. Federal government.
"System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing" security settings
effects in Windows XP and in later versions of Windows
This article talks about the use of FIPS mode in older Windows computers.
Azure Storage Service Encryption for Data at Rest
Azure Storage Service Encryption (SSE) for Data at Rest helps you protect and safeguard your data to meet your
organizational security and compliance commitments. With this feature, Azure Storage automatically encrypts your
data prior to persisting to storage and decrypts prior to retrieval. The encryption, decryption, and key management
are totally transparent to users.
The following sections provide detailed guidance on how to use the
Storage Service Encryption features as well as the supported
scenarios and user experiences.
Overview
Azure Storage provides a comprehensive set of security capabilities
which together enable developers to build secure applications. Data
can be secured in transit between an application and Azure by
using Client-Side Encryption, HTTPS, or SMB 3.0. Storage Service
Encryption provides encryption at rest, handling encryption,
decryption, and key management in a totally transparent fashion. All
data is encrypted using 256-bit AES encryption, one of the strongest
block ciphers available.
SSE works by encrypting the data when it is written to Azure Storage,
and can be used for Azure Blob Storage and File Storage. It works for
the following:
Standard Storage: General purpose storage accounts for
Blobs and File storage and Blob storage accounts
Premium storage
All redundancy levels (LRS, ZRS, GRS, RA-GRS)
Azure Resource Manager storage accounts (but not classic)
All regions.
To learn more, please refer to the FAQ.
To enable or disable Storage Service Encryption for a storage account,
log into the Azure portal and select a storage account. On the
Settings blade, look for the Blob Service section (Figure 1) or the
File Service section (Figure 2) and click Encryption.
Figure 1: Enable SSE for Blob Service (Step1)
Figure 2: Enable SSE for File Service (Step1)
After you click the Encryption setting, you can enable or disable Storage Service Encryption.
Figure 3: Enable SSE for Blob and File Service (Step2)
Encryption Scenarios
Storage Service Encryption can be enabled at a storage account level. Once enabled, customers will choose which
services to encrypt. It supports the following customer scenarios:
Encryption of Blob Storage and File Storage in Resource Manager accounts.
Encryption of Blob and File Service in classic storage accounts once migrated to Resource Manager storage
accounts.
SSE has the following limitations:
Encryption of classic storage accounts is not supported.
Existing Data - SSE only encrypts newly created data after the encryption is enabled. If for example you create
a new Resource Manager storage account but don't turn on encryption, and then you upload blobs or
archived VHDs to that storage account and then turn on SSE, those blobs will not be encrypted unless they are
rewritten or copied.
Marketplace Support - Encryption can be enabled for VMs created from the Marketplace using the Azure portal,
PowerShell, and Azure CLI, but the VHD base image will remain unencrypted; however, any writes done after the
VM has spun up will be encrypted.
Table and Queue data will not be encrypted.
Getting Started
Step 1: Create a new storage account.
Step 2: Enable encryption.
You can enable encryption using the Azure portal.
Note
If you want to programmatically enable or disable the Storage Service Encryption on a storage account, you can use
the Azure Storage Resource Provider REST API, the Storage Resource Provider Client Library for .NET, Azure
PowerShell, or the Azure CLI.
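For example, with the AzureRM.Storage module this might look like the following sketch (the names are hypothetical, and the -EnableEncryptionService parameter assumes a module version that supports it):
# Turn on Storage Service Encryption for the Blob and File services of an existing account.
Set-AzureRmStorageAccount -ResourceGroupName "MyResourceGroup" -Name "mystorage" -EnableEncryptionService "Blob,File"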
Step 3: Copy data to storage account
If you enable SSE for the Blob service, any blobs written to that storage account will be encrypted. Any blobs already
located in that storage account will not be encrypted until they are rewritten. You can copy the data from one storage
account to one with SSE enabled, or even enable SSE and copy the blobs from one container to another to ensure that
previous data is encrypted. You can use any of the following tools to accomplish this. This is the same behavior for File
Storage as well.
Using AzCopy
AzCopy is a Windows command-line utility designed for copying data to and from Microsoft Azure Blob, File, and Table
storage using simple commands with optimal performance. You can use this to copy your blobs or files from one
storage account to another one that has SSE enabled.
To learn more, please visit Transfer data with the AzCopy Command-Line Utility.
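As a rough example (the account names, container name, and AzCopy install path are hypothetical placeholders), copying a container into an SSE-enabled account might look like this:
$srcKey = "<key for the source account>"
$dstKey = "<key for the SSE-enabled destination account>"
# Recursively copy every blob in the container; the copies are encrypted as they are written.
& "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" `
    /Source:https://oldstorage.blob.core.windows.net/mycontainer `
    /Dest:https://newstorage.blob.core.windows.net/mycontainer `
    /SourceKey:$srcKey /DestKey:$dstKey /S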
Using SMB
Azure File storage offers file shares in the cloud using the standard SMB protocol. You can mount a file share from a
client on premises or in Azure. Once mounted, tools such as Robocopy can be used to copy files over to Azure File
shares. For more information, see how to mount an Azure File share on Windows and how to mount an Azure File share on
Linux.
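A minimal sketch (hypothetical account, share, and local path) of mounting the share from Windows and copying files into it:
$key = "<storage-account-key>"
# Mount the share over SMB 3.0; data written to it lands in the SSE-enabled account encrypted at rest.
net use Z: \\mystorage.file.core.windows.net\myshare /user:AZURE\mystorage $key
# Mirror a local folder into the mounted share.
robocopy "C:\LocalData" "Z:\" /MIR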
Using the Storage Client Libraries
You can copy blob or file data to and from blob storage or between storage accounts using our rich set of Storage
Client Libraries including .NET, C++, Java, Android, Node.js, PHP, Python, and Ruby.
To learn more, please visit our Get started with Azure Blob storage using .NET.
Using a Storage Explorer
You can use a Storage explorer to create storage accounts, upload and download data, view contents of blobs, and
navigate through directories. You can use one of these to upload blobs to your storage account with encryption
enabled. With some storage explorers, you can also copy data from existing blob storage to a different container in
the storage account or a new storage account that has SSE enabled.
To learn more, please visit Azure Storage Explorers.
Step 4: Query the status of the encrypted data
An updated version of the Storage Client libraries has been deployed that allows you to query the state of an object to
determine if it is encrypted or not. This is currently only available for Blob storage. Support for File storage is on the
roadmap.
In the meantime, you can call Get Account Properties to verify that the storage account has encryption enabled or
view the storage account properties in the Azure portal.
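In PowerShell, a quick way to check the account-level setting is shown below (the names are hypothetical, and the property path assumes the AzureRM.Storage module of this era):
$account = Get-AzureRmStorageAccount -ResourceGroupName "MyResourceGroup" -Name "mystorage"
$account.Encryption.Services.Blob.Enabled   # True when Blob service encryption is turned on
$account.Encryption.Services.File.Enabled   # True when File service encryption is turned on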
Encryption and Decryption Workflow
Here is a brief description of the encryption/decryption workflow:
The customer enables encryption on the storage account.
When the customer writes new data (PUT Blob, PUT Block, PUT Page, PUT File etc.) to Blob or File storage;
every write is encrypted using 256-bit AES encryption, one of the strongest block ciphers available.
When the customer needs to access data (GET Blob, etc.), data is automatically decrypted before returning to
the user.
If encryption is disabled, new writes are no longer encrypted and existing encrypted data remains encrypted
until rewritten by the user. While encryption is enabled, writes to Blob or File storage will be encrypted. The
state of data does not change with the user toggling between enabling/disabling encryption for the storage
account.
All encryption keys are stored, encrypted, and managed by Microsoft.
Frequently asked questions about Storage Service Encryption for Data at Rest
Q: I have an existing classic storage account. Can I enable SSE on it?
A: No, SSE is only supported on Resource Manager storage accounts.
Q: How can I encrypt data in my classic storage account?
A: You can create a new Resource Manager storage account and copy your data using AzCopy from your existing
classic storage account to your newly created Resource Manager storage account.
If you migrate your classic storage account to a Resource Manager storage account, this operation is instantaneous; it
changes the type of your account but does not affect your existing data. Any new data written will be encrypted only
after enabling encryption. For more information, see Platform Supported Migration of IaaS Resources from Classic to
Resource Manager. Please note that this is supported only for Blob and File services.
Q: I have an existing Resource Manager storage account. Can I enable SSE on it?
A: Yes, but only newly written data will be encrypted. It does not go back and encrypt data that was already present.
This is not yet supported for the File Storage Preview.
Q: How do I encrypt the current data in an existing Resource Manager storage account?
A: You can enable SSE at any time in a Resource Manager storage account. However, data that was already present
will not be encrypted. To encrypt existing data, you can copy it to another blob name or another container and then
remove the unencrypted versions.
Q: I'm using Premium storage; can I use SSE?
A: Yes, SSE is supported on both Standard Storage and Premium Storage. Premium Storage is not supported for the
File Service.
Q: If I create a new storage account and enable SSE, then create a new VM using that storage account, does that mean
my VM is encrypted?
A: Yes. Any disks created that use the new storage account will be encrypted, as long as they are created after SSE is
enabled. If the VM was created using the Azure Marketplace, the VHD base image will remain unencrypted; however, any
writes done after the VM has spun up will be encrypted.
Q: Can I create new storage accounts with SSE enabled using Azure PowerShell and Azure CLI?
A: Yes.
Q: How much more does Azure Storage cost if SSE is enabled?
A: There is no additional cost.
Q: Who manages the encryption keys?
A: The keys are managed by Microsoft.
Q: Can I use my own encryption keys?
A: We are working on providing capabilities for customers to bring their own encryption keys.
Q: Can I revoke access to the encryption keys?
A: Not at this time; the keys are fully managed by Microsoft.
Q: Is SSE enabled by default when I create a new storage account?
A: SSE is not enabled by default; you can use the Azure portal to enable it. You can also programmatically enable this
feature using the Storage Resource Provider REST API.
Q: How is this different from Azure Disk Encryption?
A: This feature is used to encrypt data in Azure Blob storage. The Azure Disk Encryption is used to encrypt OS and Data
disks in IaaS VMs. For more details, please visit our Storage Security Guide.
Q: What if I enable SSE, and then go in and enable Azure Disk Encryption on the disks?
A: This will work seamlessly. Your data will be encrypted by both methods.
Q: My storage account is set up to be replicated geo-redundantly. If I enable SSE, will my redundant copy also be
encrypted?
A: Yes, all copies of the storage account are encrypted, and all redundancy options are supported: Locally Redundant
Storage (LRS), Zone-Redundant Storage (ZRS), Geo-Redundant Storage (GRS), and Read Access Geo-Redundant Storage
(RA-GRS).
Q: I can't enable encryption on my storage account.
A: Is it a Resource Manager storage account? Classic storage accounts are not supported.
Q: Is SSE only permitted in specific regions?
A: SSE is available in all regions for Blob storage. Please check the Availability Section for File storage.
Q: How do I contact someone if I have any issues or want to provide feedback?
A: Please contact ssediscussions@microsoft.com for any issues related to Storage Service Encryption.
Azure Disk Encryption for Windows and Linux IaaS VMs
Microsoft Azure is strongly committed to ensuring your data privacy and data sovereignty, and enables you to control your
Azure-hosted data through a range of advanced technologies to encrypt, control, and manage encryption keys, and to
control and audit access to data. This provides Azure customers the flexibility to choose the solution that best meets their
business needs. This section introduces a technology solution, Azure Disk Encryption for Windows and Linux IaaS VMs, to
help protect and safeguard your data and meet your organizational security and compliance commitments. It provides
detailed guidance on how to use the Azure Disk Encryption features, including the supported scenarios and the user
experiences.
Note
Certain recommendations might increase data, network, or compute resource usage, resulting in additional license or
subscription costs.
Overview
Azure Disk Encryption is a new capability that helps you encrypt your Windows and Linux IaaS virtual machine disks.
Azure Disk Encryption leverages the industry standard BitLocker feature of Windows and the DM-Crypt feature of
Linux to provide volume encryption for the OS and the data disks. The solution is integrated with Azure Key Vault to
help you control and manage the disk-encryption keys and secrets in your key vault subscription. The solution also
ensures that all data on the virtual machine disks is encrypted at rest in your Azure Storage.
Azure disk encryption for Windows and Linux IaaS VMs is now in General Availability in all Azure public regions and
AzureGov regions for Standard VMs and VMs with premium storage.
Encryption scenarios
The Azure Disk Encryption solution supports the following customer scenarios:
Enable encryption on new IaaS VMs created from pre-encrypted VHD and encryption keys
Enable encryption on new IaaS VMs created from the Azure Gallery images
Enable encryption on existing IaaS VMs running in Azure
Disable encryption on Windows IaaS VMs
Disable encryption on data drives for Linux IaaS VMs
Enable encryption of managed disk VMs
Update encryption settings of an existing encrypted non-premium storage VM
Backup and restore of encrypted VMs, encrypted with key encryption key
The solution supports the following scenarios for IaaS VMs when they are enabled in Microsoft Azure:
Integration with Azure Key Vault
Standard tier VMs: A, D, DS, G, GS, F, and so forth series IaaS VMs
Enable encryption on Windows and Linux IaaS VMs and managed disk VMs
Disable encryption on OS and data drives for Windows IaaS VMs and managed disk VMs
Disable encryption on data drives for Linux IaaS VMs and managed disk VMs
Enable encryption on IaaS VMs running Windows Client OS
Enable encryption on volumes with mount paths
Enable encryption on Linux VMs configured with disk striping (RAID) using mdadm
Enable encryption on Linux VMs using LVM for data disks
Enable encryption on Windows VMs configured with Storage Spaces
Update encryption settings of an existing encrypted non-premium storage VM
All Azure Public and AzureGov regions are supported
The solution does not support the following scenarios, features, and technology:
Basic tier IaaS VMs
Disabling encryption on an OS drive for Linux IaaS VMs
IaaS VMs that are created by using the classic VM creation method
Integration with your on-premises Key Management Service
Azure Files (shared file system), Network File System (NFS), dynamic volumes, and Windows VMs that are
configured with software-based RAID systems
Backup and restore of encrypted VMs, encrypted without key encryption key.
Update encryption settings of an existing encrypted premium storage VM.
Note
Backup and restore of encrypted VMs is supported only for VMs that are encrypted with the KEK configuration. It is
not supported on VMs that are encrypted without KEK. KEK is an optional parameter that enables VM encryption. This
support is coming soon. Updating the encryption settings of an existing encrypted premium storage VM is not supported;
this support is also coming soon.
Encryption features
When you enable and deploy Azure Disk Encryption for Azure IaaS VMs, the following capabilities are enabled,
depending on the configuration provided:
Encryption of the OS volume to protect the boot volume at rest in your storage
Encryption of data volumes to protect the data volumes at rest in your storage
Disabling encryption on the OS and data drives for Windows IaaS VMs
Disabling encryption on the data drives for Linux IaaS VMs
Safeguarding the encryption keys and secrets in your key vault subscription
Reporting the encryption status of the encrypted IaaS VM
Removal of disk-encryption configuration settings from the IaaS virtual machine
Backup and restore of encrypted VMs by using the Azure Backup service
Note
Backup and restore of encrypted VMs is supported only for VMs that are encrypted with the KEK configuration. It is
not supported on VMs that are encrypted without KEK. KEK is an optional parameter that enables VM encryption.
The Azure Disk Encryption for IaaS VMs for Windows and Linux solution includes:
The disk-encryption extension for Windows.
The disk-encryption extension for Linux.
The disk-encryption PowerShell cmdlets.
The disk-encryption Azure command-line interface (CLI) cmdlets.
The disk-encryption Azure Resource Manager templates.
The Azure Disk Encryption solution is supported on IaaS VMs that are running Windows or Linux OS. For more
information about the supported operating systems, see the "Prerequisites" section.
Note
There is no additional charge for encrypting VM disks with Azure Disk Encryption.
Value proposition
When you apply the Azure Disk Encryption-management solution, you can satisfy the following business needs:
IaaS VMs are secured at rest, because you can use industry-standard encryption technology to address
organizational security and compliance requirements.
IaaS VMs boot under customer-controlled keys and policies, and you can audit their usage in your key vault.
Encryption workflow
To enable disk encryption for Windows and Linux VMs, do the following:
1. Choose an encryption scenario from among the preceding encryption scenarios.
2. Opt in to enabling disk encryption via the Azure Disk Encryption Resource Manager template, PowerShell
cmdlets, or CLI command, and specify the encryption configuration.
For the customer-encrypted VHD scenario, upload the encrypted VHD to your storage account and
the encryption key material to your key vault. Then, provide the encryption configuration to enable
encryption on a new IaaS VM.
For new VMs that are created from the Marketplace and existing VMs that are already running in
Azure, provide the encryption configuration to enable encryption on the IaaS VM.
3. Grant access to the Azure platform to read the encryption-key material (BitLocker encryption keys for
Windows systems and Passphrase for Linux) from your key vault to enable encryption on the IaaS VM.
4. Provide the Azure Active Directory (Azure AD) application identity to write the encryption key material to your
key vault. Doing so enables encryption on the IaaS VM for the scenarios mentioned in step 2.
5. Azure updates the VM service model with encryption and the key vault configuration, and sets up your
encrypted VM.
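As a hedged PowerShell sketch of steps 2 through 4 of the workflow above (the resource group, VM, vault, and Azure AD application values are hypothetical, and the cmdlets assume the AzureRM compute and Key Vault modules):
$rg     = "MyResourceGroup"
$vmName = "MyWindowsVM"
$vault  = Get-AzureRmKeyVault -VaultName "MyDiskVault" -ResourceGroupName $rg
# Enable encryption on the OS and data volumes, writing the key material to the key vault
# via the Azure AD application that has been granted access to it.
Set-AzureRmVMDiskEncryptionExtension -ResourceGroupName $rg -VMName $vmName `
    -AadClientID "<aad-application-id>" -AadClientSecret "<aad-client-secret>" `
    -DiskEncryptionKeyVaultUrl $vault.VaultUri -DiskEncryptionKeyVaultId $vault.ResourceId `
    -VolumeType All
# Report the encryption status of the OS and data disks.
Get-AzureRmVMDiskEncryptionStatus -ResourceGroupName $rg -VMName $vmName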
Decryption workflow
To disable disk encryption for IaaS VMs, complete the following high-level steps:
1. Choose to disable encryption (decryption) on a running IaaS VM in Azure via the Azure Disk Encryption
Resource Manager template or PowerShell cmdlets, and specify the decryption configuration.
This step disables encryption of the OS or the data volume or both on the running Windows IaaS VM. However, as
mentioned in the previous section, disabling OS disk encryption for Linux is not supported. The decryption step is
allowed only for data drives on Linux VMs.
2. Azure updates the VM service model, and the IaaS VM is marked decrypted. The contents of the VM are no
longer encrypted at rest.
Note
The disable-encryption operation does not delete your key vault and the encryption key material (BitLocker
encryption keys for Windows systems or Passphrase for Linux). Disabling OS disk encryption for Linux is not supported.
The decryption step is allowed only for data drives on Linux VMs.
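A minimal sketch of the disable step with PowerShell (the names are hypothetical; remember that for Linux only the data volumes can be decrypted):
# Disable encryption on the data volumes only; for Windows VMs the OS volume may also be included.
Disable-AzureRmVMDiskEncryption -ResourceGroupName "MyResourceGroup" -VMName "MyLinuxVM" -VolumeType Data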
Transparent Data Encryption with Azure SQL Database
Azure SQL Database transparent data encryption helps protect against the threat of malicious activity by performing
real-time encryption and decryption of the database, associated backups, and transaction log files at rest without
requiring changes to the application.
TDE encrypts the storage of an entire database by using a symmetric key called the database encryption key. In SQL
Database the database encryption key is protected by a built-in server certificate. The built-in server certificate is
unique for each SQL Database server. If a database is in a GeoDR relationship, it is protected by a different key on each
server. If two databases are connected to the same server, they share the same built-in certificate. Microsoft
automatically rotates these certificates at least every 90 days. For a general description of TDE, see Transparent Data
Encryption (TDE).
Azure SQL Database does not support Azure Key Vault integration with TDE. SQL Server running on an Azure virtual
machine can use an asymmetric key from the Key Vault. For more information, see Extensible Key Management Using
Azure Key Vault (SQL Server).
Permissions
To configure TDE through the Azure portal, by using the REST API, or by using PowerShell, you must be connected as
the Azure Owner, Contributor, or SQL Security Manager.
To configure TDE by using Transact-SQL, executing the ALTER DATABASE statement with the SET option requires membership in the dbmanager role.
Enable TDE on a Database Using the Portal
1. Visit the Azure Portal at https://portal.azure.com and sign in with your Azure Administrator or Contributor account.
2. On the left banner, click BROWSE, and then click SQL databases.
3. With SQL databases selected in the left pane, click your user database.
4. In the database blade, click All settings.
5. In the Settings blade, click the Transparent data encryption part to open the Transparent data encryption blade.
6. In the Transparent data encryption blade, move the Data encryption button to On, and then click Save (at the top of the page) to apply the setting. The Encryption status will approximate the progress of the transparent data encryption.

You can also monitor the progress of encryption by connecting to SQL Database using a query tool such as SQL Server
Management Studio as a database user with the VIEW DATABASE STATE permission. Query
the encryption_state column of the sys.dm_database_encryption_keys view.
Enabling TDE on SQL Database by Using Transact-SQL
The following steps enable TDE.
1. Connect to the database using a login that is an administrator or a member of the dbmanager role in the
master database.
2. Execute the following statements to encrypt the database.
-- Enable encryption
ALTER DATABASE [AdventureWorks] SET ENCRYPTION ON;
GO
3. To monitor the progress of encryption on SQL Database, database users with the VIEW DATABASE
STATE permission can query the encryption_state column of the sys.dm_database_encryption_keys view.
Enabling and Disabling TDE on SQL Database by Using PowerShell
Using Azure PowerShell, you can run the following commands to turn TDE on or off. You must connect your account to the PowerShell session before running the commands. Customize the example to use your values for
the ServerName, ResourceGroupName, and DatabaseName parameters. For additional information about PowerShell,
see How to install and configure Azure PowerShell.
Note
To continue, you should install and configure version 1.0 of Azure PowerShell. Version 0.9.8 can be used but it is
deprecated and it requires switching to the AzureResourceManager cmdlets by using the PS C:\> Switch-AzureMode -
Name AzureResourceManager command.
1. To enable TDE, return the TDE status, and view the encryption activity:
PS C:\> Set-AzureRMSqlDatabaseTransparentDataEncryption -ServerName "myserver" -ResourceGroupName "Default-SQL-WestUS" -DatabaseName "database1" -State "Enabled"

PS C:\> Get-AzureRMSqlDatabaseTransparentDataEncryption -ServerName "myserver" -ResourceGroupName "Default-SQL-WestUS" -DatabaseName "database1"

PS C:\> Get-AzureRMSqlDatabaseTransparentDataEncryptionActivity -ServerName "myserver" -ResourceGroupName "Default-SQL-WestUS" -DatabaseName "database1"
If using version 0.9.8, use the Set-AzureSqlDatabaseTransparentDataEncryption, Get-AzureSqlDatabaseTransparentDataEncryption, and Get-AzureSqlDatabaseTransparentDataEncryptionActivity commands.
2. To disable TDE:
PS C:\> Set-AzureRMSqlDatabaseTransparentDataEncryption -ServerName "myserver" -ResourceGroupName
"Default-SQL-WestUS" -DatabaseName "database1" -State "Disabled"
If using version 0.9.8 use the Set-AzureSqlDatabaseTransparentDataEncryption command.
Decrypting a TDE Protected Database on SQL Database
To Disable TDE by Using the Azure Portal
1. Visit the Azure Portal at https://portal.azure.com and sign in with your Azure Administrator or Contributor account.
2. On the left banner, click BROWSE, and then click SQL databases.
3. With SQL databases selected in the left pane, click your user database.
4. In the database blade, click All settings.
5. In the Settings blade, click the Transparent data encryption part to open the Transparent data encryption blade.
6. In the Transparent data encryption blade, move the Data encryption button to Off, and then click Save (at the
top of the page) to apply the setting. The Encryption status will approximate the progress of the transparent
data decryption.
You can also monitor the progress of decryption by connecting to SQL Database using a query tool such as
Management Studio as a database user with the VIEW DATABASE STATE permission. Query
the encryption_state column of the sys.dm_database_encryption_keys view.
To Disable TDE by Using Transact-SQL
1. Connect to the database using a login that is an administrator or a member of the dbmanager role in the
master database.
2. Execute the following statements to decrypt the database.
-- Disable encryption
ALTER DATABASE [AdventureWorks] SET ENCRYPTION OFF;
GO
3. To monitor the progress of decryption on SQL Database, database users with the VIEW DATABASE STATE permission can query the encryption_state column of the sys.dm_database_encryption_keys view.
Moving a TDE Protected Database on SQL Database
You do not need to decrypt databases for operations within Azure. The TDE settings on the source database or
primary database are transparently inherited on the target. This includes operations involving:
Geo-Restore
Self-Service Point in Time Restore
Restore a Deleted Database
Active Geo-Replication
Creating a Database Copy
When exporting a TDE protected database using the Export Database function in the Azure SQL Database Portal or the
SQL Server Import and Export Wizard, the exported content of the database is not encrypted. This exported content is
stored in unencrypted .bacpac files. Be sure to protect the .bacpac files appropriately and enable TDE once import of
the new database is completed.
For example, if the .bacpac file is exported from an on-premises SQL Server, then the imported content of the new
database will not be automatically encrypted. Likewise, if the .bacpac file is exported from an Azure SQL Database to
an on-premises SQL Server, the new database is also not automatically encrypted.
The one exception is when exporting to and from Azure SQL Database: TDE will be enabled in the new database, but the .bacpac file itself is still not encrypted.
Use Role-Based Access Control to manage access to your Azure subscription resources
Azure Role-Based Access Control (RBAC) enables fine-grained access management for Azure. Using RBAC, you can
grant only the amount of access that users need to perform their jobs. This article helps you get up and running with
RBAC in the Azure portal. If you want more details about how RBAC helps you manage access, see What is Role-Based
Access Control.
Within each subscription, you can grant up to 2000 role assignments.
View access
You can see who has access to a resource, resource group, or subscription from its main blade in the Azure portal. For
example, we want to see who has access to one of our resource groups:
1. Select Resource groups in the navigation bar on the left.

2. Select the name of the resource group from the Resource groups blade.
3. Select Access control (IAM) from the left menu.
4. The Access control blade lists all users, groups, and applications that have been granted access to the
resource group.

Notice that some users were Assigned access while others Inherited it. Access is either assigned specifically to the
resource group or inherited from an assignment to the parent subscription.
Note
Classic subscription admins and co-admins are considered owners of the subscription in the new RBAC model.
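The same access information can also be listed from PowerShell; a minimal sketch, assuming a resource group named MyResourceGroup:
# List every role assignment (assigned or inherited) that applies to the resource group.
Get-AzureRmRoleAssignment -ResourceGroupName "MyResourceGroup" | FT DisplayName, RoleDefinitionName, Scope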
Add Access
You grant access from within the resource, resource group, or subscription that is the scope of the role assignment.
1. Select Add on the Access control blade.
2. Select the role that you wish to assign from the Select a role blade.
3. Select the user, group, or application in your directory that you wish to grant access to. You can search the directory with display names, email addresses, and object identifiers.
4. Select OK to create the assignment. The Adding user popup tracks the progress.
After successfully adding a role assignment, it will appear on the Users blade.
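Role assignments can also be created from PowerShell; a minimal sketch with placeholder user and resource group names:
# Grant the Virtual Machine Contributor role on one resource group to a single user.
New-AzureRmRoleAssignment -SignInName "user@contoso.com" -RoleDefinitionName "Virtual Machine Contributor" -ResourceGroupName "MyResourceGroup"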
Remove Access
1. Use the check boxes on the Access control blade to select one or more role assignments.
2. Select Remove.
3. A box will pop up asking you to confirm the action. Select Yes to remove the role assignments.
Inherited assignments cannot be removed. If you need to remove an inherited assignment, you need to do it at the scope where the role assignment was created. In the Scope column, next to Inherited there is a link that takes you to the resources where this role was assigned. Go to the resource listed there to remove the role assignment.
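The PowerShell equivalent, using the same placeholder names, removes the assignment at the scope where it was created:
# Remove the role assignment that was granted at the resource group scope.
Remove-AzureRmRoleAssignment -SignInName "user@contoso.com" -RoleDefinitionName "Virtual Machine Contributor" -ResourceGroupName "MyResourceGroup"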
Built-in roles for Azure role-based access control
Azure Role-Based Access Control (RBAC) comes with the following built-in roles that can be assigned to users, groups, and services. You can't modify the definitions of built-in roles. However, you can create custom roles in Azure RBAC to fit the specific needs of your organization.
Roles in Azure
The following table provides brief descriptions of the built-in roles. Click the role name to see the detailed list
of actions and notactions for the role. The actions property specifies the allowed actions on Azure resources. Action
strings can use wildcard characters. The notactions property specifies the actions that are excluded from the allowed
actions.
The action defines what type of operations you can perform on a given resource type. For example:
Write enables you to perform PUT, POST, PATCH, and DELETE operations.
Read enables you to perform GET operations.
This article only addresses the different roles that exist today. When you assign a role to a user, though, you can limit
the allowed actions further by defining a scope. This is helpful if you want to make someone a Website Contributor,
but only for one resource group.
Note
The Azure role definitions are constantly evolving. This article is kept as up to date as possible, but you can always find the latest role definitions in Azure PowerShell. Use the Get-AzureRmRoleDefinition cmdlet to list all current roles. You can dive in to a specific role using (Get-AzureRmRoleDefinition "<role name>").Actions or (Get-AzureRmRoleDefinition "<role name>").NotActions as applicable. Use Get-AzureRmProviderOperation to list operations of specific Azure resource providers.
API Management Service Contributor: Can manage API Management service and the APIs
API Management Service Operator Role: Can manage API Management service, but not the APIs themselves
API Management Service Reader Role: Read-only access to API Management service and APIs
Application Insights Component Contributor: Can manage Application Insights components
Automation Operator: Able to start, stop, suspend, and resume jobs
Backup Contributor: Can manage backup in Recovery Services vault
Backup Operator: Can manage backup, except removing backup, in Recovery Services vault
Backup Reader: Can view all backup management services
Billing Reader: Can view all billing information
BizTalk Contributor: Can manage BizTalk services
ClearDB MySQL DB Contributor: Can manage ClearDB MySQL databases
Contributor: Can manage everything except access
Data Factory Contributor: Can create and manage data factories, and child resources within them
DevTest Labs User: Can view everything and connect, start, restart, and shutdown virtual machines
DNS Zone Contributor: Can manage DNS zones and records
Azure Cosmos DB Account Contributor: Can manage Azure Cosmos DB accounts
Intelligent Systems Account Contributor: Can manage Intelligent Systems accounts
Logic App Contributor: Can manage all aspects of a Logic App, but not create a new one
Logic App Operator: Can start and stop workflows defined within a Logic App
Monitoring Reader: Can read all monitoring data
Monitoring Contributor: Can read monitoring data and edit monitoring settings
Network Contributor: Can manage all network resources
New Relic APM Account Contributor: Can manage New Relic Application Performance Management accounts and applications
Owner: Can manage everything, including access
Reader: Can view everything, but can't make changes
Redis Cache Contributor: Can manage Redis caches
Scheduler Job Collections Contributor: Can manage scheduler job collections
Search Service Contributor: Can manage search services
Security Manager: Can manage security components, security policies, and virtual machines
SQL DB Contributor: Can manage SQL databases, but not their security-related policies
SQL Security Manager: Can manage the security-related policies of SQL servers and databases
SQL Server Contributor: Can manage SQL servers and databases, but not their security-related policies
Classic Storage Account Contributor: Can manage classic storage accounts
Storage Account Contributor: Can manage storage accounts
Support Request Contributor: Can create and manage support requests
User Access Administrator: Can manage user access to Azure resources
Classic Virtual Machine Contributor: Can manage classic virtual machines, but not the virtual network or storage account to which they are connected
Virtual Machine Contributor: Can manage virtual machines, but not the virtual network or storage account to which they are connected
Classic Network Contributor: Can manage classic virtual networks and reserved IPs
Web Plan Contributor: Can manage web plans
Website Contributor: Can manage websites, but not the web plans to which they are connected
Create custom roles for Azure Role-Based Access Control
Create a custom role in Azure Role-Based Access Control (RBAC) if none of the built-in roles meet your specific access
needs. Custom roles can be created using Azure PowerShell, Azure Command-Line Interface (CLI), and the REST API.
Just like built-in roles, custom roles can be assigned to users, groups, and applications at subscription, resource group,
and resource scopes. Custom roles are stored in an Azure AD tenant and can be shared across all subscriptions that
use that tenant as the Azure AD directory for the subscription.
Each tenant can create up to 2000 custom roles.
The following is an example of a custom role for monitoring and restarting virtual machines:
{
  "Name": "Virtual Machine Operator",
  "Id": "cadb4a5a-4e7a-47be-84db-05cad13b6769",
  "IsCustom": true,
  "Description": "Can monitor and restart virtual machines.",
  "Actions": [
    "Microsoft.Storage/*/read",
    "Microsoft.Network/*/read",
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action",
    "Microsoft.Authorization/*/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Insights/alertRules/*",
    "Microsoft.Insights/diagnosticSettings/*",
    "Microsoft.Support/*"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e",
    "/subscriptions/e91d47c4-76f3-4271-a796-21b4ecfe3624",
    "/subscriptions/34370e90-ac4a-4bf9-821f-85eeedeae1a2"
  ]
}
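A role definition like the one above can be registered from PowerShell; a minimal sketch, assuming the JSON has been saved to a local file (the path is a placeholder):
# Create the custom role from the JSON definition, then confirm that it exists.
New-AzureRmRoleDefinition -InputFile "C:\CustomRoles\virtual-machine-operator.json"
Get-AzureRmRoleDefinition "Virtual Machine Operator"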
Actions
The Actions property of a custom role specifies the Azure operations to which the role grants access. It is a collection
of operation strings that identify securable operations of Azure resource providers. Operation strings follow the
format of Microsoft.<ProviderName>/<ChildResourceType>/<action>. Operation strings that contain wildcards (*)
grant access to all operations that match the operation string. For instance:
*/read grants access to read operations for all resource types of all Azure resource providers.
Microsoft.Compute/* grants access to all operations for all resource types in the Microsoft.Compute resource
provider.
Microsoft.Network/*/read grants access to read operations for all resource types in the Microsoft.Network
resource provider of Azure.
Microsoft.Compute/virtualMachines/* grants access to all operations of virtual machines and its child
resource types.
Microsoft.Web/sites/restart/Action grants access to restart websites.
Use Get-AzureRmProviderOperation (in PowerShell) or azure provider operations show (in Azure CLI) to list operations
of Azure resource providers. You may also use these commands to verify that an operation string is valid, and to
expand wildcard operation strings.
Get-AzureRMProviderOperation Microsoft.Compute/virtualMachines/*/action | FT Operation, OperationName

Get-AzureRMProviderOperation Microsoft.Network/*

azure provider operations show "Microsoft.Compute/virtualMachines/*/action" --json | jq '.[] | .operation'

azure provider operations show "Microsoft.Network/*"
NotActions
Use the NotActions property if the set of operations that you wish to allow is more easily defined by excluding
restricted operations. The access granted by a custom role is computed by subtracting the NotActions operations from
the Actions operations.
Note
If a user is assigned a role that excludes an operation in NotActions, and is assigned a second role that grants access to the same operation, the user will be allowed to perform that operation. NotActions is not a deny rule; it is simply a convenient way to create a set of allowed operations when specific operations need to be excluded.
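A common way to build a role that uses NotActions is to clone a built-in role in PowerShell and exclude the operations you do not want; the sketch below assumes placeholder names and a placeholder subscription ID:
# Clone the built-in Virtual Machine Contributor role, exclude VM deletion,
# and register the result as a new custom role.
$role = Get-AzureRmRoleDefinition "Virtual Machine Contributor"
$role.Id = $null
$role.Name = "Virtual Machine Contributor (No Delete)"
$role.Description = "Can manage virtual machines except deleting them."
$role.NotActions.Add("Microsoft.Compute/virtualMachines/delete")
$role.AssignableScopes.Clear()
$role.AssignableScopes.Add("/subscriptions/<subscription-id>")
New-AzureRmRoleDefinition -Role $role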
AssignableScopes
The AssignableScopes property of the custom role specifies the scopes (subscriptions, resource groups, or resources)
within which the custom role is available for assignment. You can make the custom role available for assignment in
only the subscriptions or resource groups that require it, and not clutter user experience for the rest of the
subscriptions or resource groups.
Examples of valid assignable scopes include:
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e, /subscriptions/e91d47c4-76f3-4271-a796-
21b4ecfe3624 - makes the role available for assignment in two subscriptions.
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e - makes the role available for assignment in a
single subscription.
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e/resourceGroups/Network - makes the role
available for assignment only in the Network resource group.
Note
You must use at least one subscription, resource group, or resource ID.
Custom roles access control
The AssignableScopes property of the custom role also controls who can view, modify, and delete the role.
Who can create a custom role? Owners (and User Access Administrators) of subscriptions, resource groups,
and resources can create custom roles for use in those scopes. The user creating the role needs to be able to
perform Microsoft.Authorization/roleDefinition/write operation on all the AssignableScopes of the role.
Who can modify a custom role? Owners (and User Access Administrators) of subscriptions, resource groups,
and resources can modify custom roles in those scopes. Users need to be able to perform
the Microsoft.Authorization/roleDefinition/write operation on all the AssignableScopes of a custom role.
Who can view custom roles? All built-in roles in Azure RBAC allow viewing of roles that are available for
assignment. Users who can perform the Microsoft.Authorization/roleDefinition/read operation at a scope can
view the RBAC roles that are available for assignment at that scope.
Azure Security Center planning and operations guide
This guide is for information technology (IT) professionals, IT architects, information security analysts, and cloud
administrators whose organizations are planning to use Azure Security Center.
Note
Beginning in early June 2017, Security Center will use the Microsoft Monitoring Agent to collect and store data.
See Azure Security Center Platform Migration to learn more. The information in this article represents Security Center
functionality after transition to the Microsoft Monitoring Agent.
Planning guide
This guide covers a set of steps and tasks that you can follow to optimize your use of Security Center based on your
organization's security requirements and cloud management model. To take full advantage of Security Center, it is
important to understand how different individuals or teams in your organization use the service to meet secure
development and operations, monitoring, governance, and incident response needs. The key areas to consider when
planning to use Security Center are:
Security Roles and Access Controls
Security Policies and Recommendations
Data Collection and Storage
Ongoing Security Monitoring
Incident Response
In the next section, you will learn how to plan for each one of those areas and apply those recommendations based on
your requirements.
Note
Read Azure Security Center frequently asked questions (FAQ) for a list of common questions that can also be useful
during the designing and planning phase.
Security roles and access controls
Depending on the size and structure of your organization, multiple individuals and teams may use Security Center to
perform different security-related tasks. In the following diagram you have an example of fictitious personas and their
respective roles and security responsibilities:
Security Center enables these individuals to meet these various responsibilities. For example:
Jeff (Cloud Workload Owner)
Manage a cloud workload and its related resources
Responsible for implementing and maintaining protections in accordance with company security policy
Ellen (CISO/CIO)
Responsible for all aspects of security for the company
Wants to understand the company's security posture across cloud workloads
Needs to be informed of major attacks and risks
David (IT Security)
Sets company security policies to ensure the appropriate protections are in place
Monitors compliance with policies
Generates reports for leadership or auditors
Judy (Security Operations)
Monitors and responds to security alerts 24/7
Escalates to Cloud Workload Owner or IT Security Analyst
Sam (Security Analyst)
Investigate attacks
Work with Cloud Workload Owner to apply remediation
Security Center uses Role-Based Access Control (RBAC), which provides built-in roles that can be assigned to users,
groups, and services in Azure. When a user opens Security Center, they see only information related to resources they have access to, which means the user must be assigned the role of Owner, Contributor, or Reader for the subscription or resource group that a resource belongs to. In addition to these roles, there are two specific Security Center roles:
Security reader: a user that belongs to this role has view rights to Security Center, which includes recommendations, alerts, policy, and health, but won't be able to make changes.
Security admin: the same as Security reader, but can also update the security policy and dismiss recommendations and alerts.
The Security Center roles described above do not have access to other service areas of Azure such as Storage, Web &
Mobile, or Internet of Things.
Note
A user needs to be at least a subscription or resource group Owner or Contributor to be able to see Security Center in Azure.
Using the personas explained in the previous diagram, the following RBAC would be needed:
Jeff (Cloud Workload Owner)
Resource Group Owner/Contributor
David (IT Security)
Subscription Owner/Contributor or Security Admin
Judy (Security Operations)
Subscription Reader or Security Reader to view Alerts
Subscription Owner/Contributor or Security Admin required to dismiss Alerts
Sam (Security Analyst)
Subscription Reader to view Alerts
Subscription Owner/Contributor required to dismiss Alerts
Access to the workspace may be required
Some other important information to consider:
Only subscription Owners/Contributors and Security Admins can edit a security policy
Only subscription and resource group Owners and Contributors can apply security recommendations for a
resource
When planning access control using RBAC for Security Center, be sure to understand who in your organization will be using Security Center and what types of tasks they will be performing, and then configure RBAC accordingly.
Note
We recommend that you assign the least permissive role needed for users to complete their tasks. For example, users
who only need to view information about the security state of resources but not take action, such as applying
recommendations or editing policies, should be assigned the Reader role.
Security policies and recommendations
A security policy defines the set of controls that are recommended for resources within the specified subscription.
In Security Center, you define policies according to your company's security requirements and the type of applications
or sensitivity of the data.
Policies that are enabled at the subscription level automatically propagate to all resource groups within the subscription, as shown in the following diagram:

Note
If you need to review which policies were changed, you can use Azure Audit Logs. Policy changes are always logged in
Azure Audit Logs.
Security recommendations
Before configuring security policies, review each of the security recommendations and determine whether these policies are appropriate for your various subscriptions and resource groups. It is also important to understand what action should be taken to address security recommendations, and who in your organization will be responsible for monitoring new recommendations and taking the needed steps.
Security Center will recommend that you provide security contact details for your Azure subscription. This information
will be used by Microsoft to contact you if the Microsoft Security Response Center (MSRC) discovers that your
customer data has been accessed by an unlawful or unauthorized party. Read Provide security contact details in Azure
Security Center for more information on how to enable this recommendation.
Data collection and storage
Azure Security Center uses the Microsoft Monitoring Agent, the same agent used by the Operations Management Suite and the Log Analytics service, to collect security data from your virtual machines. Data collected from this agent is stored in your Log Analytics workspace(s).
Agent
After data collection is enabled in the security policy, the Microsoft Monitoring Agent (for Windows or Linux) is installed on all supported Azure VMs and any new ones that are created. If the VM already has the Microsoft Monitoring Agent installed, Azure Security Center will leverage the currently installed agent. The agent's process is designed to be non-invasive and to have minimal impact on VM performance.
The Microsoft Monitoring Agent for Windows requires TCP port 443. See the Troubleshooting article for additional details.
If at some point you want to disable Data Collection, you can turn it off in the security policy. However, because the
Microsoft Monitoring Agent may be used by other Azure management and monitoring services, the agent will not be
uninstalled automatically when you turn off data collection in Security Center. You can manually uninstall the agent if
needed.
Note
To find a list of supported VMs, read the Azure Security Center frequently asked questions (FAQ).
Workspace
Data collected from the Microsoft Monitoring Agent (on behalf of Azure Security Center) will be stored in either an
existing Log Analytics workspace(s) associated with your Azure subscription or a new workspace(s), taking into
account the Geo of the VM.
In the Azure portal, you can browse to see a list of your Log Analytics workspaces, including any created by Azure
Security Center. A related resource group will be created for new workspaces. Both will follow this naming
convention:
Workspace: DefaultWorkspace-[subscription-ID]-[geo]
Resource Group: DefaultResourceGroup-[geo]
For workspaces created by Azure Security Center, data is retained for 30 days. For existing workspaces, retention is based on the workspace pricing tier.
Note
Microsoft makes strong commitments to protect the privacy and security of this data. Microsoft adheres to strict compliance and security guidelines, from coding to operating a service. For more information about data handling and privacy, read Azure Security Center Data Security.
Ongoing security monitoring
After initial configuration and application of Security Center recommendations, the next step is considering Security
Center operational processes.
To access Security Center from the Azure portal, click Browse and type Security Center in the Filter field. The views that the user gets reflect the applied filters; the example below shows an environment with many issues to be addressed:
Note
Security Center will not interfere with your normal operational procedures; it passively monitors your deployments and provides recommendations based on the security policies you enabled.
When you first opt in to use Security Center for your current Azure environment, make sure that you review all
recommendations, which can be done in the Recommendations tile or per resource (Compute, Networking, Storage &
data, Application).
Once you address all recommendations, the Prevention section should be green for all resources that were addressed.
Ongoing monitoring at this point becomes easier since you will only take actions based on changes in the resource
security health and recommendations tiles.
The Detection section is more reactive; these are alerts regarding issues that are either taking place now, or occurred in the past and were detected by Security Center controls and 3rd-party systems. The Security Alerts tile will show bar
graphs that represent the number of threat detection alerts that were found in each day, and their distribution among
the different severity categories (low, medium, high). For more information about Security Alerts, read Managing and
responding to security alerts in Azure Security Center.
Note
You can also leverage Microsoft Power BI to visualize your Security Center data. Read Get insights from Azure Security
Center data with Power BI.
Monitoring for new or changed resources
Most Azure environments are dynamic, with new resources being spun up and down on a regular basis, configuration changes, and so on. Security Center helps ensure that you have visibility into the security state of these new resources.
When you add new resources (VMs, SQL DBs) to your Azure Environment, Security Center will automatically discover
these resources and begin to monitor their security. This also includes PaaS web roles and worker roles. If Data
Collection is enabled in the Security Policy, additional monitoring capabilities will be enabled automatically for your
virtual machines.
1. For virtual machines, click Compute, under the Prevention section. Any issues with enabling data collection or related recommendations will be surfaced in the Overview tab and the Monitoring Recommendations section.
2. View the Recommendations to see what, if any, security risks were identified for the new resource.
3. It is very common that when new VMs are added to your environment, only the operating system is initially
installed. The resource owner might need some time to deploy other apps that will be used by these VMs.
Ideally, you should know the final intent of this workload. Is it going to be an Application Server? Based on
what this new workload is going to be, you can enable the appropriate Security Policy, which is the third step
in this workflow.
4. As new resources are added to your Azure environment, it is possible that new alerts appear in the Security
Alerts tile. Always verify if there are new alerts in this tile and take actions according to Security Center
recommendations.
You will also want to regularly monitor the state of existing resources to identify configuration changes that have
created security risks, drift from recommended baselines, and security alerts. Start at the Security Center dashboard.
From there you have three major areas to review on a consistent basis.
1. The Prevention section panel provides you quick access to your key resources. Use this option to monitor
Compute, Networking, Storage & data and Applications.
2. The Recommendations panel enables you to review Security Center recommendations. During your ongoing monitoring you may find that you don't have recommendations on a daily basis, which is normal since you addressed all recommendations during the initial Security Center setup. For this reason, you may not have new information in this section every day and will just need to access it as needed.
3. The Detection section might change on either a very frequent or very infrequent basis. Always review your
security alerts and take actions based on Security Center recommendations.
Incident response
Security Center detects and alerts you to threats as they occur. Organizations should monitor for new security alerts
and take action as needed to investigate further or remediate the attack. For more information on how Security
Center threat detection works, read Azure Security Center detection capabilities.
While this article is not intended to help you create your own incident response plan, we are going to use the Microsoft Azure Security Response in the Cloud lifecycle as the foundation for incident response stages. The stages are shown in the following diagram:
Note
You can use the National Institute of Standards and Technology (NIST) Computer Security Incident Handling Guide as a reference to assist you in building your own.
You can use Security Center Alerts during the following stages:
Detect: identify a suspicious activity in one or more resources.
Assess: perform the initial assessment to obtain more information about the suspicious activity.
Diagnose: use the remediation steps to conduct the technical procedure to address the issue.
Each Security Alert provides information that can be used to better understand the nature of the attack and suggest
possible mitigations. Some alerts also provide links to either more information or to other sources of information
within Azure. You can use the information provided for further research and to begin mitigation, and you can also
search security-related data that is stored in your workspace.
The following example shows a suspicious RDP activity taking place:
As you can see, this blade shows details regarding the time that the attack took place, the source hostname, the
target VM and also gives recommendation steps. In some circumstances the source information of the attack may be
empty. Read Missing Source Information in Azure Security Center Alerts for more information about this type of
behavior.
In the How to Leverage the Azure Security Center & Microsoft Operations Management Suite for an Incident
Response video you can see some demonstrations that can help you to understand how Security Center can be used
in each one of those stages.
CHAPTER 4: Design an application storage and data access strategy
Introduction to Microsoft Azure Storage
Azure Storage is the cloud storage solution for modern applications that rely on durability, availability, and scalability
to meet the needs of their customers. By reading this article, developers, IT Pros, and business decision makers can
learn about:
What Azure Storage is, and how you can take advantage of it in your cloud, mobile, server, and desktop
applications
What kinds of data you can store with the Azure Storage services: blob (object) data, NoSQL table data, queue
messages, and file shares.
How access to your data in Azure Storage is managed
How your Azure Storage data is made durable via redundancy and replication
Where to go next to build your first Azure Storage application
For details on tools, libraries, and other resources for working with Azure Storage, see Next Steps below.
What is Azure Storage?
Cloud computing enables new scenarios for applications requiring scalable, durable, and highly available storage for their data, which is exactly why Microsoft developed Azure Storage. In addition to making it possible for developers
to build large-scale applications to support new scenarios, Azure Storage also provides the storage foundation for
Azure Virtual Machines, a further testament to its robustness.
Azure Storage is massively scalable, so you can store and process hundreds of terabytes of data to support the big
data scenarios required by scientific, financial analysis, and media applications. Or you can store the small amounts of
data required for a small business website. Wherever your needs fall, you pay only for the data you're storing. Azure
Storage currently stores tens of trillions of unique customer objects, and handles millions of requests per second on
average.
Azure Storage is elastic, so you can design applications for a large global audience, and scale those applications as
needed - both in terms of the amount of data stored and the number of requests made against it. You pay only for
what you use, and only when you use it.
Azure Storage uses an auto-partitioning system that automatically load-balances your data based on traffic. This means that as the demands on your application grow, Azure Storage automatically allocates the appropriate resources to meet them.
Azure Storage is accessible from anywhere in the world, from any type of application, whether it's running in the
cloud, on the desktop, on an on-premises server, or on a mobile or tablet device. You can use Azure Storage in mobile
scenarios where the application stores a subset of data on the device and synchronizes it with a full set of data stored
in the cloud.
Azure Storage supports clients using a diverse set of operating systems (including Windows and Linux) and a variety of programming languages (including .NET, Java, Node.js, Python, Ruby, PHP, and C++, as well as mobile programming languages) for convenient development. Azure Storage also exposes data resources via simple REST APIs, which are available to any client capable of sending and receiving data via HTTP/HTTPS.
Azure Premium Storage delivers high-performance, low-latency disk support for I/O intensive workloads running on
Azure Virtual Machines. With Azure Premium Storage, you can attach multiple persistent data disks to a virtual
machine and configure them to meet your performance requirements. Each data disk is backed by an SSD disk in
Azure Premium Storage for maximum I/O performance. See Premium Storage: High-Performance Storage for Azure
Virtual Machine Workloads for more details.
Introducing the Azure Storage services
Azure storage provides the following four services: Blob storage, Table storage, Queue storage, and File storage.
Blob Storage stores unstructured object data. A blob can be any type of text or binary data, such as a
document, media file, or application installer. Blob storage is also referred to as Object storage.
Table Storage stores structured datasets. Table storage is a NoSQL key-attribute data store, which allows for
rapid development and fast access to large quantities of data.
Queue Storage provides reliable messaging for workflow processing and for communication between
components of cloud services.
File Storage offers shared storage for legacy applications using the standard SMB protocol. Azure virtual
machines and cloud services can share file data across application components via mounted shares, and on-
premises applications can access file data in a share via the File service REST API.

An Azure storage account is a secure account that gives you access to services in Azure Storage. Your storage account
provides the unique namespace for your storage resources. The image below shows the relationships between the
Azure storage resources in a storage account:
There are two types of storage accounts:
General-purpose Storage Accounts
A general-purpose storage account gives you access to Azure Storage services such as Tables, Queues, Files, Blobs and
Azure virtual machine disks under a single account. This type of storage account has two performance tiers:
A standard storage performance tier which allows you to store Tables, Queues, Files, Blobs and Azure virtual
machine disks.
A premium storage performance tier which currently only supports Azure virtual machine disks. See Premium
Storage: High-Performance Storage for Azure Virtual Machine Workloads for an in-depth overview of
Premium storage.
Blob Storage Accounts
A Blob storage account is a specialized storage account for storing your unstructured data as blobs (objects) in Azure
Storage. Blob storage accounts are similar to your existing general-purpose storage accounts and share all the great
durability, availability, scalability, and performance features that you use today including 100% API consistency for
block blobs and append blobs. For applications requiring only block or append blob storage, we recommend using
Blob storage accounts.
Note
Blob storage accounts support only block and append blobs, and not page blobs.
Blob storage accounts expose the Access Tier attribute which can be specified during account creation and modified
later as needed. There are two types of access tiers that can be specified based on your data access pattern:
A Hot access tier which indicates that the objects in the storage account will be more frequently accessed.
This allows you to store data at a lower access cost.
A Cool access tier which indicates that the objects in the storage account will be less frequently accessed. This
allows you to store data at a lower data storage cost.
If there is a change in the usage pattern of your data, you can also switch between these access tiers at any time.
Changing the access tier may result in additional charges. Please see Pricing and billing for Blob storage accounts for
more details.
For more details on Blob storage accounts, see Azure Blob Storage: Cool and Hot tiers.
Before you can create a storage account, you must have an Azure subscription, which is a plan that gives you access to
a variety of Azure services. You can get started with Azure with a free account. Once you decide to purchase a
subscription plan, you can choose from a variety of purchase options. If you're an MSDN subscriber, you get free
monthly credits that you can use with Azure services, including Azure Storage. See Azure Storage Pricing for
information on volume pricing.
To learn how to create a storage account, see Create a storage account for more details. You can create up to 200
uniquely named storage accounts with a single subscription. See Azure Storage Scalability and Performance
Targets for details about storage account limits.
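Creating an account from PowerShell is a short operation; a minimal sketch with placeholder names (the SKU chosen here maps to the replication options described later in this article):
# Create a resource group and a general-purpose storage account using locally redundant storage.
New-AzureRmResourceGroup -Name "mystorage-rg" -Location "West US"
New-AzureRmStorageAccount -ResourceGroupName "mystorage-rg" -Name "mystorageaccount001" -Location "West US" -SkuName "Standard_LRS" -Kind "Storage"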
Storage Service Versions
The Azure Storage services are regularly updated with support for new features. The Azure Storage services REST API
reference describes each supported version and its features. We recommend that you use the latest version
whenever possible. For information on the latest version of the Azure Storage services, as well as information on
previous versions, see Versioning for the Azure Storage Services.
Blob storage
For users with large amounts of unstructured object data to store in the cloud, Blob storage offers a cost-effective and
scalable solution. You can use Blob storage to store content such as:
Documents
Social data such as photos, videos, music, and blogs
Backups of files, computers, databases, and devices
Images and text for web applications
Configuration data for cloud applications
Big data, such as logs and other large datasets
Every blob is organized into a container. Containers also provide a useful way to assign security policies to groups of
objects. A storage account can contain any number of containers, and a container can contain any number of blobs,
up to the 500 TB capacity limit of the storage account.
Blob storage offers three types of blobs: block blobs, append blobs, and page blobs (disks).
Block blobs are optimized for streaming and storing cloud objects, and are a good choice for storing
documents, media files, backups etc.
Append blobs are similar to block blobs, but are optimized for append operations. An append blob can be
updated only by adding a new block to the end. Append blobs are a good choice for scenarios such as logging,
where new data needs to be written only to the end of the blob.
Page blobs are optimized for representing IaaS disks and supporting random writes, and may be up to 1 TB in
size. An Azure virtual machine network attached IaaS disk is a VHD stored as a page blob.
For very large datasets where network constraints make uploading or downloading data to Blob storage over the wire
unrealistic, you can ship a hard drive to Microsoft to import or export data directly from the data center. See Use the
Microsoft Azure Import/Export Service to Transfer Data to Blob Storage.
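As a short sketch of working with Blob storage from PowerShell, using a placeholder account name, key, and file path:
# Build a storage context from the account name and key, create a private container, and upload a block blob.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount001" -StorageAccountKey "<account-key>"
New-AzureStorageContainer -Name "documents" -Permission Off -Context $ctx
Set-AzureStorageBlobContent -File "C:\data\report.pdf" -Container "documents" -Blob "report.pdf" -Context $ctx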

Table storage
Tip: The content in this article applies to the original basic Azure Table storage. However, there is now a premium
offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and
automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure
Cosmos DB: Table API.
Modern applications often demand data stores with greater scalability and flexibility than previous generations of
software required. Table storage offers highly available, massively scalable storage, so that your application can
automatically scale to meet user demand. Table storage is Microsoft's NoSQL key/attribute store; it has a schemaless design, making it different from traditional relational databases. With a schemaless data store, it's easy to adapt your data as the needs of your application evolve. Table storage is easy to use, so developers can create applications quickly. Access to data is fast and cost-effective for all kinds of applications. Table storage is typically significantly lower in cost than traditional SQL for similar volumes of data.
Table storage is a key-attribute store, meaning that every value in a table is stored with a typed property name. The
property name can be used for filtering and specifying selection criteria. A collection of properties and their values
comprise an entity. Since Table storage is schemaless, two entities in the same table can contain different collections
of properties, and those properties can be of different types.
You can use Table storage to store flexible datasets, such as user data for web applications, address books, device
information, and any other type of metadata that your service requires. You can store any number of entities in a
table, and a storage account may contain any number of tables, up to the capacity limit of the storage account.
As with Blobs and Queues, developers can manage and access Table storage using standard REST protocols; Table storage also supports a subset of the OData protocol, simplifying advanced querying capabilities and enabling both JSON and AtomPub (XML-based) formats.
For today's Internet-based applications, NoSQL databases like Table storage offer a popular alternative to traditional
relational databases.
Queue storage
In designing applications for scale, application components are often decoupled, so that they can scale independently.
Queue storage provides a reliable messaging solution for asynchronous communication between application
components, whether they are running in the cloud, on the desktop, on an on-premises server, or on a mobile device.
Queue storage also supports managing asynchronous tasks and building process workflows.
A storage account can contain any number of queues. A queue can contain any number of messages, up to the
capacity limit of the storage account. Individual messages may be up to 64 KB in size.
File storage
Azure File storage offers cloud-based SMB file shares, so that you can migrate legacy applications that rely on file
shares to Azure quickly and without costly rewrites. With Azure File storage, applications running in Azure virtual
machines or cloud services can mount a file share in the cloud, just as a desktop application mounts a typical SMB
share. Any number of application components can then mount and access the File storage share simultaneously.
Since a File storage share is a standard SMB file share, applications running in Azure can access data in the share via
file system I/O APIs. Developers can therefore leverage their existing code and skills to migrate existing applications. IT
Pros can use PowerShell cmdlets to create, mount, and manage File storage shares as part of the administration of
Azure applications.

Like the other Azure storage services, File storage exposes a REST API for accessing data in a share. On-premises
applications can call the File storage REST API to access data in a file share. This way, an enterprise can choose to
migrate some legacy applications to Azure and continue running others from within their own organization. Note that
mounting a file share is only possible for applications running in Azure; an on-premises application may only access
the file share via the REST API.
Distributed applications can also use File storage to store and share useful application data and development and
testing tools. For example, an application may store configuration files and diagnostic data such as logs, metrics, and
crash dumps in a File storage share so that they are available to multiple virtual machines or roles. Developers and
administrators can store utilities that they need to build or manage an application in a File storage share that is
available to all components, rather than installing them on every virtual machine or role instance.
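A sketch of the PowerShell and SMB mounting steps mentioned above, again with placeholder names; the drive letter and share name are arbitrary:
# Create a file share in the storage account.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount001" -StorageAccountKey "<account-key>"
New-AzureStorageShare -Name "appconfig" -Context $ctx
# From a VM running in Azure, mount the share over SMB (TCP port 445 must be reachable).
net use Z: \\mystorageaccount001.file.core.windows.net\appconfig /u:mystorageaccount001 <account-key>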
Access to Blob, Table, Queue, and File resources
By default, only the storage account owner can access resources in the storage account. For the security of your data,
every request made against resources in your account must be authenticated. Authentication relies on a Shared Key
model. Blobs can also be configured to support anonymous authentication.
Your storage account is assigned two private access keys on creation that are used for authentication. Having two keys
ensures that your application remains available when you regularly regenerate the keys as a common security key
management practice.
If you do need to allow users controlled access to your storage resources, then you can create a shared access
signature. A shared access signature (SAS) is a token that can be appended to a URL that enables delegated access to a
storage resource. Anyone who possesses the token can access the resource it points to with the permissions it
specifies, for the period of time that it is valid. Beginning with version 2015-04-05, Azure Storage supports two kinds
of shared access signatures: service SAS and account SAS.
The service SAS delegates access to a resource in just one of the storage services: the Blob, Queue, Table, or File
service.
An account SAS delegates access to resources in one or more of the storage services. You can delegate access to
service-level operations that are not available with a service SAS. You can also delegate access to read, write, and
delete operations on blob containers, tables, queues, and file shares that are not permitted with a service SAS.
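A minimal sketch of generating a service SAS for a blob container from PowerShell, with read and list permissions valid for eight hours (placeholder account name, key, and container):
# Create a read/list SAS token scoped to one container; append it to the container or blob URL to delegate access.
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount001" -StorageAccountKey "<account-key>"
New-AzureStorageContainerSASToken -Name "documents" -Permission "rl" -ExpiryTime (Get-Date).AddHours(8) -Context $ctx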
Finally, you can specify that a container and its blobs, or a specific blob, are available for public access. When you
indicate that a container or blob is public, anyone can read it anonymously; no authentication is required. Public
containers and blobs are useful for exposing resources such as media and documents that are hosted on websites. To
decrease network latency for a global audience, you can cache blob data used by websites with the Azure CDN.
See Using Shared Access Signatures (SAS) for more information on shared access signatures. See Manage anonymous
read access to containers and blobs and Authentication for the Azure Storage Services for more information on secure
access to your storage account.
Replication for durability and high availability
The data in your Microsoft Azure storage account is always replicated to ensure durability and high availability.
Replication copies your data, either within the same data center, or to a second data center, depending on which
replication option you choose. Replication protects your data and preserves your application up-time in the event of
transient hardware failures. If your data is replicated to a second data center, that also protects your data against a
catastrophic failure in the primary location.
Replication ensures that your storage account meets the Service-Level Agreement (SLA) for Storage even in the face of
failures. See the SLA for information about Azure Storage guarantees for durability and availability.
When you create a storage account, you can select one of the following replication options:
Locally redundant storage (LRS). Locally redundant storage maintains three copies of your data. LRS is
replicated three times within a single data center in a single region. LRS protects your data from normal
hardware failures, but not from the failure of a single data center.
LRS is offered at a discount. For maximum durability, we recommend that you use geo-redundant storage, described
below.
Zone-redundant storage (ZRS). Zone-redundant storage maintains three copies of your data. ZRS is replicated
three times across two to three facilities, either within a single region or across two regions, providing higher
durability than LRS. ZRS ensures that your data is durable within a single region.
ZRS provides a higher level of durability than LRS; however, for maximum durability, we recommend that you use geo-
redundant storage, described below.
Note
ZRS is currently available only for block blobs, and is only supported for versions 2014-02-14 and later.
Once you have created your storage account and selected ZRS, you cannot convert it to any other type of replication, or vice versa.
Geo-redundant storage (GRS). GRS maintains six copies of your data. With GRS, your data is replicated three
times within the primary region, and is also replicated three times in a secondary region hundreds of miles
away from the primary region, providing the highest level of durability. In the event of a failure at the primary
region, Azure Storage will failover to the secondary region. GRS ensures that your data is durable in two
separate regions.
For information about primary and secondary pairings by region, see Azure Regions.
Read-access geo-redundant storage (RA-GRS). Read-access geo-redundant storage replicates your data to a
secondary geographic location, and also provides read access to your data in the secondary location. Read-
access geo-redundant storage allows you to access your data from either the primary or the secondary
location, in the event that one location becomes unavailable. Read-access geo-redundant storage is the default option for your storage account when you create it.
Important: You can change how your data is replicated after your storage account has been created, unless you
specified ZRS when you created the account. However, note that you may incur an additional one-time data transfer
cost if you switch from LRS to GRS or RA-GRS.
See Azure Storage replication for additional details about storage replication options.
For pricing information for storage account replication, see Azure Storage Pricing. See Azure Regions for more
information about what services are available in each region.
For architectural details about durability with Azure Storage, see SOSP Paper - Azure Storage: A Highly Available Cloud
Storage Service with Strong Consistency.
Transferring data to and from Azure Storage
You can use the AzCopy command-line utility to copy blob, file, and table data within your storage account or across
storage accounts. See Transfer data with the AzCopy Command-Line Utility for more information.
AzCopy is built on top of the Azure Data Movement Library, which is currently available in preview.
The Azure Import/Export service provides a way to import blob data into or export blob data from your storage
account via a hard drive disk mailed to the Azure data center. For more information about the Import/Export service,
see Use the Microsoft Azure Import/Export Service to Transfer Data to Blob Storage.

Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. With the click of a button, Azure Cosmos
DB enables you to elastically and independently scale throughput and storage across any number of Azure's
geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service
level agreements (SLAs), something no other database service can offer.

Azure Cosmos DB contains a write optimized, resource governed, schema-agnostic database engine that natively
supports multiple data models: key-value, documents, graphs, and columnar. It also supports many APIs for accessing
data, including MongoDB, DocumentDB SQL, Gremlin (preview), and Azure Tables (preview), in an extensible manner.
Azure Cosmos DB started in late 2010 to address developer pain-points that are faced by large scale applications
inside Microsoft. Since building globally distributed applications is not a problem unique to Microsoft, we made
the service available externally to all Azure Developers in the form of Azure DocumentDB. Azure Cosmos DB is the
next big leap in the evolution of DocumentDB and we are now making it available for you to use. As a part of this
release of Azure Cosmos DB, DocumentDB customers (with their data) are automatically Azure Cosmos DB customers.
The transition is seamless and they now have access to a broader range of new capabilities offered by Azure Cosmos
DB.
Capability comparison
Azure Cosmos DB provides the best capabilities of relational and non-relational databases.

Capabilities          Relational DBs     Non-relational (NoSQL) DBs   Azure Cosmos DB
Global distribution   x                  x                            Turnkey, 30+ regions, multi-homing
Horizontal scale      x                                               Independently scale storage and throughput
Latency guarantees    x                                               <10 ms for reads, <15 ms for writes at p99
High availability     x                                               Always on, PACELC tradeoffs, automatic & manual failover
Data model + API      Relational + SQL   Multi-model + OSS API        Multi-model + SQL + OSS API (more coming soon)
SLAs                  x                                               Comprehensive SLAs for latency, throughput, consistency, availability

Key capabilities
As a globally distributed database service, Azure Cosmos DB provides the following capabilities to help you build
scalable, globally distributed, highly responsive applications:
Turnkey global distribution
o Your application is instantly available to your users, everywhere. Now your data can be too.
o Don't worry about hardware, adding nodes, VMs or cores. Just point and click, and your data is there.
Multiple data models and popular APIs for accessing and querying data
o Support for multiple data models including key-value, document, graph, and columnar.
o Extensible APIs for Node.js, Java, .NET, .NET Core, Python, and MongoDB.
o SQL and Gremlin for queries.
Elastically scale throughput and storage on demand, worldwide
o Easily scale throughput at second and minute granularities, and change it anytime you want.
o Scale storage transparently and automatically to cover your size requirements now and forever.
Build highly responsive and mission-critical applications
o Get access to your data with single digit millisecond latencies at the 99th percentile, anywhere in the
world.
Ensure "always on" availability
o 99.99% availability within a single region.
o Deploy to any number of Azure regions for higher availability.
o Simulate a failure of one or more regions with zero-data loss guarantees.
Write globally distributed applications, the right way
o Five consistency models span the spectrum from strong SQL-like consistency to NoSQL-like eventual
consistency, and everything in between.
Money back guarantees
o Your data gets there fast, or your money back.
o Service level agreements for availability, latency, throughput, and consistency.
No database schema/index management
o Stop worrying about keeping your database schema and indexes in sync with your application's
schema. We're schema-free.
Low cost of ownership
o Five to ten times more cost effective than a non-managed solution.
o Three times cheaper than DynamoDB.
Global distribution
Azure Cosmos DB containers are distributed along two dimensions:
1. Within a given region, all resources are horizontally partitioned using resource partitions (local distribution).
2. Each resource partition is also replicated across geographical regions (global distribution).

When your storage and throughput need to be scaled, Cosmos DB transparently performs partition management
operations across all the regions. Independent of the scale, distribution, or failures, Cosmos DB continues to provide a
single system image of the globally distributed resources.
Global distribution of resources in Cosmos DB is turn-key. At any time with a few button clicks (or programmatically
with a single API call), you can associate any number of geographical regions with your database account.
Regardless of the amount of data or the number of regions, Cosmos DB guarantees that each newly associated region
will start processing client requests within an hour at the 99th percentile. This is done by parallelizing the seeding and
copying of data from all the source resource partitions to the newly associated region. Customers can also remove an
existing region or take a region that was previously associated with their database account offline.
Multi-model, multi-API support
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family.
The core content-model of Cosmos DB's database engine is based on atom-record-sequence (ARS). Atoms consist of a
small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are
arrays consisting of atoms, records, or sequences.

The database engine can efficiently translate and project different data models onto the ARS-based data model. The
core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be
exposed as-is as JSON.
The service also supports popular database APIs for data access and querying. Cosmos DB's database engine currently
supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build
applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed
database service.
Horizontal scaling of storage and throughput
All the data within a Cosmos DB container (for example, a document collection, table, or graph) is horizontally
partitioned and transparently managed by resource partitions. A resource partition is a consistent and highly available
container of data partitioned by a customer specified partition-key. It provides a single system image for a set of
resources it manages and is a fundamental unit of scalability and distribution. Cosmos DB is designed to let you
elastically scale throughput based on the application traffic patterns across different geographical regions to support
fluctuating workloads varying both by geography and time. The service manages the partitions transparently without
compromising the availability, consistency, latency, or throughput of a Cosmos DB container.

You can elastically scale throughput of an Azure Cosmos DB container by programmatically provisioning throughput
using request units per second (RU/s). Internally, the service transparently manages resource partitions to deliver the
throughput on a given container. Cosmos DB ensures that the throughput is available for use across all the regions
associated with the container. The new throughput is effective within five seconds of the change in the configured
throughput value.
You can provision throughput on a Cosmos DB container at both, per-second and at per-minute (RU/m) granularities.
The provisioned throughput at per-minute granularity is used to manage unexpected spikes in the workload occurring
at a per-second granularity.
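As a rough illustration of provisioning throughput programmatically (a sketch, not an official snippet from this guide), the DocumentDB .NET SDK lets you set RU/s when creating a collection; the endpoint, key, database, collection, and partition key below are placeholders, and the database is assumed to exist already:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class ThroughputExample
{
    static async Task CreatePartitionedCollectionAsync()
    {
        var client = new DocumentClient(
            new Uri("https://myaccount.documents.azure.com:443/"), "<key>");

        var collection = new DocumentCollection { Id = "orders" };
        collection.PartitionKey.Paths.Add("/customerId");

        // Provision 10,000 RU/s on the collection; the value can be changed later.
        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri("retail"),
            collection,
            new RequestOptions { OfferThroughput = 10000 });
    }
}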
Low latency guarantees at the 99th percentile
As part of its SLAs, Cosmos DB guarantees end-to-end low latency at the 99th percentile to its customers. For a typical
1-KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the
99th percentile, within the same Azure region. The median latencies are significantly lower (under 5 ms). With an
upper bound on request processing for every database transaction, Cosmos DB allows clients to clearly distinguish
between transactions with high latency and a database that is unavailable.
Transparent multi-homing and 99.99% high availability

You can dynamically associate "priorities" to the regions associated with your Azure Cosmos DB database account.
Priorities are used to direct the requests to specific regions in the event of regional failures. In the unlikely event of a
regional disaster, Cosmos DB automatically fails over in the order of priority.
To test the end-to-end availability of the application, you can manually trigger failover (rate limited to two operations
within an hour). Cosmos DB guarantees zero data loss during manual regional failovers. In case a regional disaster
occurs, Cosmos DB guarantees an upper-bound on data loss during the system-initiated automatic failover. You do not
have to redeploy your application after a regional failover, and availability SLAs are maintained by Azure Cosmos DB.
For this scenario, Cosmos DB allows you to interact with resources using either logical (region-agnostic) or physical
(region-specific) endpoints. The former ensures that the application can transparently be multi-homed in case of
failover. The latter provides fine-grained control to the application to redirect reads and writes to specific regions.
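As a sketch of the client side of this multi-homing behavior (account name, key, and regions below are placeholders), the DocumentDB .NET SDK lets you connect through the logical endpoint and list preferred regions; reads are served from the first available region in that list:

using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class MultiHomingExample
{
    static DocumentClient CreateClient()
    {
        var policy = new ConnectionPolicy();

        // Reads go to the first available region in this list; the SDK transparently
        // moves down the list if a region fails over or becomes unavailable.
        policy.PreferredLocations.Add(LocationNames.WestUS);
        policy.PreferredLocations.Add(LocationNames.NorthEurope);

        // Region-agnostic (logical) endpoint plus a placeholder key.
        return new DocumentClient(
            new Uri("https://myaccount.documents.azure.com:443/"), "<key>", policy);
    }
}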
Cosmos DB guarantees 99.99% availability SLA for every database account. The availability guarantees are agnostic of
the scale (provisioned throughput and storage), number of regions, or geographical distance between regions
associated with a given database.
Multiple, well-defined consistency models
Commercial distributed databases fall into two categories: databases that do not offer well-defined, provable
consistency choices at all, and databases which offer two extreme programmability choices (strong vs. eventual
consistency). The former burdens application developers with the minutiae of their replication protocols and expects them
to make difficult tradeoffs between consistency, availability, latency, and throughput. The latter pressures them to
choose one of the two extremes. Despite the abundance of research and proposals for more than 50 consistency
models, the distributed database community has not been able to commercialize consistency levels beyond strong
and eventual consistency.
Cosmos DB allows you to choose between five well-defined consistency models along the consistency spectrum:
strong, bounded staleness, session, consistent prefix, and eventual.

The following table illustrates the specific guarantees each consistency level provides.
Consistency Levels and guarantees

Consistency Level    Guarantees
Strong               Linearizability
Bounded Staleness    Consistent Prefix. Reads lag behind writes by k prefixes or t interval
Session              Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads
Consistent Prefix    Updates returned are some prefix of all the updates, with no gaps
Eventual             Out of order reads

You can configure the default consistency level on your Cosmos DB account (and later override the consistency on a
specific read request). Internally, the default consistency level applies to data within the partition sets, which may
span regions.
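To illustrate the per-request override (a sketch with placeholder names; a request-level consistency can only be weaker than the account default), a single read can be relaxed to eventual consistency like this:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class ConsistencyExample
{
    static async Task ReadWithRelaxedConsistencyAsync()
    {
        // Session is used here as the client-wide default consistency level.
        var client = new DocumentClient(
            new Uri("https://myaccount.documents.azure.com:443/"), "<key>",
            new ConnectionPolicy(), ConsistencyLevel.Session);

        var response = await client.ReadDocumentAsync(
            UriFactory.CreateDocumentUri("retail", "orders", "order-42"),
            new RequestOptions
            {
                PartitionKey = new PartitionKey("customer-1"),
                // Relax this single read to eventual consistency.
                ConsistencyLevel = ConsistencyLevel.Eventual
            });

        Console.WriteLine(response.Resource.Id);
    }
}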
Guaranteed service level agreements
Cosmos DB is the first managed database service to offer 99.99% SLA guarantees for availability, throughput, low
latency, and consistency.
Availability: 99.99% uptime availability SLA for each of the data and control plane operations.
Throughput: 99.99% of requests complete successfully
Latency: 99.99% of <10 ms latencies at the 99th percentile
Consistency: 100% of read requests will meet the consistency guarantee for the consistency level requested
by you.
Schema-free
Both relational and NoSQL databases force you to deal with schema and index management, versioning, and migration;
all of this is extremely challenging in a globally distributed setup. But don't worry: Cosmos DB makes this problem go
away. With Cosmos DB, you do not have to manage schemas and indexes, deal with schema versioning, or worry
about application downtime while migrating schemas. Cosmos DB's database engine is fully schema-agnostic: it
automatically indexes all the data it ingests without requiring any schema or indexes, and serves blazing fast queries.
Low cost of ownership
When all total cost of ownership (TCO) considerations are taken into account, managed cloud services like Azure Cosmos
DB can be five to ten times more cost effective than their OSS counterparts running on-premises or on virtual machines.
Azure Cosmos DB can also be up to two to three times cheaper than DynamoDB for high-volume workloads. Learn more in
the TCO whitepaper.

Azure Storage Scalability and Performance Targets


This topic describes the scalability and performance targets for Microsoft Azure Storage. For a summary of other Azure
limits, see Azure Subscription and Service Limits, Quotas, and Constraints.
Note
All storage accounts run on the new flat network topology and support the scalability and performance targets
outlined below, regardless of when they were created. For more information on the Azure Storage flat network
architecture and on scalability, see Microsoft Azure Storage: A Highly Available Cloud Storage Service with Strong
Consistency.
Important
The scalability and performance targets listed here are high-end targets, but are achievable. In all cases, the request
rate and bandwidth achieved by your storage account depends upon the size of objects stored, the access patterns
utilized, and the type of workload your application performs. Be sure to test your service to determine whether its
performance meets your requirements. If possible, avoid sudden spikes in the rate of traffic and ensure that traffic is
well-distributed across partitions.
When your application reaches the limit of what a partition can handle for your workload, Azure Storage will begin to
return error code 503 (Server Busy) or error code 500 (Operation Timeout) responses. When this occurs, the
application should use an exponential backoff policy for retries. The exponential backoff allows the load on the
partition to decrease and eases spikes in traffic to that partition.
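As a minimal sketch of that retry guidance (the connection string and container name are placeholders), the storage client library ships an exponential retry policy that can be set as the default for a client:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;

class RetryExample
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("<connection string>");
        CloudBlobClient client = account.CreateCloudBlobClient();

        // Back off roughly 2s, 4s, 8s, ... between attempts, for up to 5 attempts,
        // so a throttled (503/500) partition has time to recover.
        client.DefaultRequestOptions.RetryPolicy =
            new ExponentialRetry(TimeSpan.FromSeconds(2), 5);

        CloudBlobContainer container = client.GetContainerReference("media");
        container.CreateIfNotExists();
    }
}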
If the needs of your application exceed the scalability targets of a single storage account, you can build your
application to use multiple storage accounts, and partition your data objects across those storage accounts. See Azure
Storage Pricing for information on volume pricing.
Scalability targets for blobs, queues, tables, and files

Number of storage accounts per subscription: 200 (1)
TB per storage account: 500 TB
Max number of blob containers, blobs, file shares, tables, queues, entities, or messages per storage account: Only limit is the 500 TB storage account capacity
Max size of a single blob container, table, or queue: 500 TB
Max number of blocks in a block blob or append blob: 50,000
Max size of a block in a block blob: 100 MB
Max size of a block blob: 50,000 x 100 MB (approx. 4.75 TB)
Max size of a block in an append blob: 4 MB
Max size of an append blob: 50,000 x 4 MB (approx. 195 GB)
Max size of a page blob: 8 TB
Max size of a table entity: 1 MB
Max number of properties in a table entity: 252
Max size of a message in a queue: 64 KB
Max size of a file share: 5 TB
Max size of a file in a file share: 1 TB
Max number of files in a file share: Only limit is the 5 TB total capacity of the file share
Max IOPS per share: 1000
Max number of stored access policies per container, file share, table, or queue: 5
Maximum request rate per storage account:
    Blobs: 20,000 requests per second for blobs of any valid size (capped only by the account's ingress/egress limits)
    Files: 1000 IOPS (8 KB in size) per file share
    Queues: 20,000 messages per second (assuming 1 KB message size)
    Tables: 20,000 transactions per second (assuming 1 KB entity size)
Target throughput for a single blob: Up to 60 MB per second, or up to 500 requests per second
Target throughput for a single queue (1 KB messages): Up to 2000 messages per second
Target throughput for a single table partition (1 KB entities): Up to 2000 entities per second
Target throughput for a single file share: Up to 60 MB per second
Max ingress (2) per storage account (US regions): 10 Gbps if GRS/ZRS (3) enabled, 20 Gbps for LRS
Max egress (2) per storage account (US regions): 20 Gbps if RA-GRS/GRS/ZRS (3) enabled, 30 Gbps for LRS
Max ingress (2) per storage account (non-US regions): 5 Gbps if GRS/ZRS (3) enabled, 10 Gbps for LRS
Max egress (2) per storage account (non-US regions): 10 Gbps if RA-GRS/GRS/ZRS (3) enabled, 15 Gbps for LRS

(1) This includes both Standard and Premium storage accounts. If you require more than 200 storage accounts, make a
request through Azure Support. The Azure Storage team will review your business case and may approve up to 250
storage accounts.
(2) Ingress refers to all data (requests) being sent to a storage account. Egress refers to all data (responses) being
received from a storage account.
(3) Azure Storage replication options include:
    RA-GRS: Read-access geo-redundant storage. If RA-GRS is enabled, egress targets for the secondary location
    are identical to those for the primary location.
    GRS: Geo-redundant storage.
    ZRS: Zone-redundant storage. Available only for block blobs.
    LRS: Locally redundant storage.

Scalability targets for virtual machine disks
An Azure virtual machine supports attaching a number of data disks. For optimal performance, you will want to limit
the number of highly utilized disks attached to the virtual machine to avoid possible throttling. If all disks are not being
highly utilized at the same time, the storage account can support a larger number of disks.
For Azure Managed Disks: Managed Disks count limit is regional and also depends on the storage type. The
default and also the maximum limit is 10,000 per subscription, per region and per storage type. For example,
you can create up to 10,000 standard managed disks and also 10,000 premium managed disks in a
subscription and in a region.
Managed Snapshots and Images are counted against the Managed Disks limit.
For standard storage accounts: A standard storage account has a maximum total request rate of 20,000 IOPS.
The total IOPS across all of your virtual machine disks in a standard storage account should not exceed this
limit.
You can roughly calculate the number of highly utilized disks supported by a single standard storage account based on
the request rate limit. For example, for a Basic Tier VM, the maximum number of highly utilized disks is about 66
(20,000/300 IOPS per disk), and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk), as shown in the
table below.
For premium storage accounts: A premium storage account has a maximum total throughput rate of 50 Gbps.
The total throughput across all of your VM disks should not exceed this limit.
See Windows VM sizes or Linux VM sizes for additional details.
Managed virtual machine disks
Standard managed virtual machine disks

Standard Disk Type    S4         S6         S10        S20        S30              S40              S50
Disk size             30 GB      64 GB      128 GB     512 GB     1024 GB (1 TB)   2048 GB (2 TB)   4095 GB (4 TB)
IOPS per disk         500        500        500        500        500              500              500
Throughput per disk   60 MB/sec  60 MB/sec  60 MB/sec  60 MB/sec  60 MB/sec        60 MB/sec        60 MB/sec

Premium managed virtual machine disks: per disk limits

Premium Disk Type     P4         P6         P10         P20         P30              P40              P50
Disk size             32 GB      64 GB      128 GB      512 GB      1024 GB (1 TB)   2048 GB (2 TB)   4095 GB (4 TB)
IOPS per disk         120        240        500         2300        5000             7500             7500
Throughput per disk   25 MB/sec  50 MB/sec  100 MB/sec  150 MB/sec  200 MB/sec       250 MB/sec       250 MB/sec

Premium managed virtual machine disks: per VM limits

Resource                  Default Limit
Max IOPS per VM           80,000 IOPS with GS5 VM (1)
Max throughput per VM     2,000 MB/s with GS5 VM (1)

(1) Refer to VM Size for limits on other VM sizes.

Unmanaged virtual machine disks


Standard unmanaged virtual machine disks: per disk limits

VM Tier                                    Basic Tier VM    Standard Tier VM
Disk size                                  4095 GB          4095 GB
Max 8 KB IOPS per persistent disk          300              500
Max number of disks performing max IOPS    66               40

Premium unmanaged virtual machine disks: per account limits

Resource                                            Default Limit
Total disk capacity per account                     35 TB
Total snapshot capacity per account                 10 TB
Max bandwidth per account (ingress + egress (1))    <= 50 Gbps

(1) Ingress refers to all data (requests) being sent to a storage account. Egress refers to all data (responses) being
received from a storage account.
Premium unmanaged virtual machine disks: per disk limits

Premium Storage Disk Type                  P10         P20         P30               P40               P50
Disk size                                  128 GiB     512 GiB     1024 GiB (1 TB)   2048 GiB (2 TB)   4095 GiB (4 TB)
Max IOPS per disk                          500         2300        5000              7500              7500
Max throughput per disk                    100 MB/s    150 MB/s    200 MB/s          250 MB/s          250 MB/s
Max number of disks per storage account    280         70          35                17                8

Premium unmanaged virtual machine disks: per VM limits

Resource                  Default Limit
Max IOPS per VM           80,000 IOPS with GS5 VM (1)
Max throughput per VM     2,000 MB/s with GS5 VM (1)

(1) Refer to VM Size for limits on other VM sizes.
Scalability targets for Azure resource manager
The following limits apply when using the Azure Resource Manager and Azure Resource Groups only.

Resource                                          Default Limit
Storage account management operations (read)      800 per 5 minutes
Storage account management operations (write)     200 per hour
Storage account management operations (list)      100 per 5 minutes

Partitions in Azure Storage


Every object that holds data that is stored in Azure Storage (blobs, messages, entities, and files) belongs to a partition,
and is identified by a partition key. The partition determines how Azure Storage load balances blobs, messages,
entities, and files across servers to meet the traffic needs of those objects. The partition key is unique and is used to
locate a blob, message, or entity.
The table shown above in Scalability Targets for Standard Storage Accounts lists the performance targets for a single
partition for each service.
Partitions affect load balancing and scalability for each of the storage services in the following ways:
Blobs: The partition key for a blob is account name + container name + blob name. This means that each blob
can have its own partition if load on the blob demands it. Blobs can be distributed across many servers in
order to scale out access to them, but a single blob can only be served by a single server. While blobs can be
logically grouped in blob containers, there are no partitioning implications from this grouping.
Files: The partition key for a file is account name + file share name. This means all files in a file share are also
in a single partition.
Messages: The partition key for a message is the account name + queue name, so all messages in a queue are
grouped into a single partition and are served by a single server. Different queues may be processed by
different servers to balance the load for however many queues a storage account may have.
Entities: The partition key for an entity is account name + table name + partition key, where the partition key
is the value of the required user-defined PartitionKey property for the entity. All entities with the same
partition key value are grouped into the same partition and are served by the same partition server. This is an
important point to understand in designing your application. Your application should balance the scalability
benefits of spreading entities across multiple partitions with the data access advantages of grouping entities
in a single partition.
A key advantage to grouping a set of entities in a table into a single partition is that it's possible to perform atomic
batch operations across entities in the same partition, since a partition exists on a single server. Therefore, if you wish
to perform batch operations on a group of entities, consider grouping them with the same partition key.
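A minimal sketch of such an entity group transaction with the .NET table client (the entity type and names are illustrative, not from this guide); because every operation in the batch shares the partition key "customer-1", the whole batch commits or fails atomically:

using Microsoft.WindowsAzure.Storage.Table;

class BatchExample
{
    // Entities that share a PartitionKey live in the same partition, so they can be
    // written together in one atomic batch.
    class OrderEntity : TableEntity
    {
        public OrderEntity() { }
        public OrderEntity(string customerId, string orderId) : base(customerId, orderId) { }
        public double Total { get; set; }
    }

    static void InsertOrders(CloudTable table)
    {
        var batch = new TableBatchOperation();
        batch.Insert(new OrderEntity("customer-1", "order-001") { Total = 12.50 });
        batch.Insert(new OrderEntity("customer-1", "order-002") { Total = 99.00 });

        // All operations target the same partition, so they succeed or fail together.
        table.ExecuteBatch(batch);
    }
}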
On the other hand, entities that are in the same table but have different partition keys can be load balanced across
different servers, making it possible to have greater scalability.
Detailed recommendations for designing partitioning strategy for tables can be found here.

CHAPTER 5: Design Azure Web and Mobile Apps


App Service Web Apps
App Service Web Apps is a fully managed compute platform that is optimized for hosting websites and web
applications. This platform-as-a-service (PaaS) offering of Microsoft Azure lets you focus on your business logic while
Azure takes care of the infrastructure to run and scale your apps.
What is a web app in App Service?
In App Service, a web app is the compute resources that Azure provides for hosting a website or web application.
The compute resources may be on shared or dedicated virtual machines (VMs), depending on the pricing tier that you
choose. Your application code runs in a managed VM that is isolated from other customers.
Your code can be in any language or framework that is supported by Azure App Service, such as ASP.NET, Node.js,
Java, PHP, or Python. You can also run scripts that use PowerShell and other scripting languages in a web app.
For examples of typical application scenarios that you can use Web Apps for, see Web app scenarios and the Scenarios
and recommendations section of Azure App Service, Virtual Machines, Service Fabric, and Cloud Services comparison.
Why use Web Apps?
Here are some key features of App Service that apply to Web Apps:
Multiple languages and frameworks - App Service has first-class support for ASP.NET, Node.js, Java, PHP, and
Python. You can also run PowerShell and other scripts or executables on App Service VMs.
DevOps optimization - Set up continuous integration and deployment with Visual Studio Team Services,
GitHub, or BitBucket. Promote updates through test and staging environments. Perform A/B testing. Manage
your apps in App Service by using Azure PowerShell or the cross-platform command-line interface (CLI).
Global scale with high availability - Scale up or out manually or automatically. Host your apps anywhere in
Microsoft's global datacenter infrastructure, and the App Service SLA promises high availability.
Connections to SaaS platforms and on-premises data - Choose from more than 50 connectors for enterprise
systems (such as SAP, Siebel, and Oracle), SaaS services (such as Salesforce and Office 365), and internet
services (such as Facebook and Twitter). Access on-premises data using Hybrid Connections and Azure Virtual
Networks.
Security and compliance - App Service is ISO, SOC, and PCI compliant.
Application templates - Choose from an extensive list of application templates in the Azure Marketplace that
let you use a wizard to install popular open-source software such as WordPress, Joomla, and Drupal.
Visual Studio integration - Dedicated tools in Visual Studio streamline the work of creating, deploying, and
debugging.
In addition, a web app can take advantage of features offered by API Apps (such as CORS support) and Mobile
Apps (such as push notifications). For more information about app types in App Service, see Azure App Service
overview.
Besides Web Apps in App Service, Azure offers other services that can be used for hosting websites and web
applications. For most scenarios, Web Apps is the best choice. For microservice architecture, consider Service Fabric,
and if you need more control over the VMs that your code runs on, consider Azure Virtual Machines. For more
information about how to choose between these Azure services, see Azure App Service, Virtual Machines, Service
Fabric, and Cloud Services comparison.

Host ASP.NET Web API 2 in an Azure Worker Role


This tutorial shows how to host ASP.NET Web API in an Azure Worker Role, using OWIN to self-host the Web API
framework.
Open Web Interface for .NET (OWIN) defines an abstraction between .NET web servers and web applications. OWIN
decouples the web application from the server, which makes OWIN ideal for self-hosting a web application in your
own process, outside of IIS; for example, inside an Azure worker role.
In this tutorial, you'll use the Microsoft.Owin.Host.HttpListener package, which provides an HTTP server that can be used
to self-host OWIN applications.
Software versions used in the tutorial
Visual Studio 2013
Web API 2

Azure SDK for .NET 2.3


Create a Microsoft Azure Project
Start Visual Studio with administrator privileges. Administrator privileges are needed to debug the application locally,
using the Azure compute emulator.
On the File menu, click New, then click Project. From Installed Templates, under Visual C#, click Cloud and then
click Windows Azure Cloud Service. Name the project "AzureApp" and click OK.

In the New Windows Azure Cloud Service dialog, double-click Worker Role. Leave the default name ("WorkerRole1").
This step adds a worker role to the solution. Click OK.

The Visual Studio solution that is created contains two projects:


"AzureApp" defines the roles and configuration for the Azure application.
"WorkerRole1" contains the code for the worker role.

In general, an Azure application can contain multiple roles, although this tutorial uses a single role.

Add the Web API and OWIN Packages


From the Tools menu, click Library Package Manager, then click Package Manager Console.
In the Package Manager Console window, enter the following command:
Install-Package Microsoft.AspNet.WebApi.OwinSelfHost

Add an HTTP Endpoint


In Solution Explorer, expand the AzureApp project. Expand the Roles node, right-click WorkerRole1, and
select Properties.

Click Endpoints, and then click Add Endpoint.


In the Protocol dropdown list, select "http". In Public Port and Private Port, type 80. These port numbers can be
different. The public port is what clients use when they send a request to the role.

Configure Web API for Self-Host


In Solution Explorer, right click the WorkerRole1 project and select Add / Class to add a new class. Name the
class Startup.

Replace all of the boilerplate code in this file with the following:
using Owin;
using System.Web.Http;

namespace WorkerRole1
{
    class Startup
    {
        // Called by the OWIN host when the application starts.
        public void Configuration(IAppBuilder app)
        {
            // Define a default convention-based route and add Web API to the OWIN pipeline.
            HttpConfiguration config = new HttpConfiguration();
            config.Routes.MapHttpRoute(
                "Default",
                "{controller}/{id}",
                new { id = RouteParameter.Optional });

            app.UseWebApi(config);
        }
    }
}

Add a Web API Controller


Next, add a Web API controller class. Right-click the WorkerRole1 project and select Add / Class. Name the class
TestController. Replace all of the boilerplate code in this file with the following:

using System;
using System.Net.Http;
using System.Web.Http;

namespace WorkerRole1
{
    public class TestController : ApiController
    {
        // GET /test
        public HttpResponseMessage Get()
        {
            return new HttpResponseMessage()
            {
                Content = new StringContent("Hello from OWIN!")
            };
        }

        // GET /test/{id}
        public HttpResponseMessage Get(int id)
        {
            string msg = String.Format("Hello from OWIN (id = {0})", id);
            return new HttpResponseMessage()
            {
                Content = new StringContent(msg)
            };
        }
    }
}

For simplicity, this controller just defines two GET methods that return plain text.
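Once the host is running, you can sanity-check the controller with a plain HTTP client. The sketch below assumes the local compute emulator address used later in this tutorial:

using System;
using System.Net.Http;

class ClientExample
{
    static void Main()
    {
        using (var client = new HttpClient())
        {
            // Routed to TestController.Get(int id) by the "{controller}/{id}" route.
            string body = client.GetStringAsync("http://127.0.0.1:80/test/1").Result;
            Console.WriteLine(body);  // "Hello from OWIN (id = 1)"
        }
    }
}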
Start the OWIN Host
Open the WorkerRole.cs file. This class defines the code that runs when the worker role is started and stopped.
Add the following using statement:
using Microsoft.Owin.Hosting;

Add an IDisposable member to the WorkerRole class:


public class WorkerRole : RoleEntryPoint
{
private IDisposable _app = null;

// ....
}

In the OnStart method, add the following code to start the host:
public override bool OnStart()
{
    ServicePointManager.DefaultConnectionLimit = 12;

    // New code:
    var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["Endpoint1"];
    string baseUri = String.Format("{0}://{1}",
        endpoint.Protocol, endpoint.IPEndpoint);

    Trace.TraceInformation(String.Format("Starting OWIN at {0}", baseUri),
        "Information");

    _app = WebApp.Start<Startup>(new StartOptions(url: baseUri));

    return base.OnStart();
}

The WebApp.Start method starts the OWIN host. The name of the Startup class is a type parameter to the method. By
convention, the host will call the Configuration method of this class.
Override the OnStop to dispose of the _app instance:
public override void OnStop()
{
if (_app != null)
{
_app.Dispose();
}
base.OnStop();
}

Here is the complete code for WorkerRole.cs:


using Microsoft.Owin.Hosting;
using Microsoft.WindowsAzure.ServiceRuntime;
using System;
using System.Diagnostics;
using System.Net;
using System.Threading;

namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        private IDisposable _app = null;

        public override void Run()
        {
            Trace.TraceInformation("WebApiRole entry point called", "Information");

            while (true)
            {
                Thread.Sleep(10000);
                Trace.TraceInformation("Working", "Information");
            }
        }

        public override bool OnStart()
        {
            ServicePointManager.DefaultConnectionLimit = 12;

            var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["Endpoint1"];
            string baseUri = String.Format("{0}://{1}",
                endpoint.Protocol, endpoint.IPEndpoint);

            Trace.TraceInformation(String.Format("Starting OWIN at {0}", baseUri),
                "Information");

            _app = WebApp.Start<Startup>(new StartOptions(url: baseUri));

            return base.OnStart();
        }

        public override void OnStop()
        {
            if (_app != null)
            {
                _app.Dispose();
            }
            base.OnStop();
        }
    }
}
Build the solution, and press F5 to run the application locally in the Azure Compute Emulator. Depending on your
firewall settings, you might need to allow the emulator through your firewall.
Note
If you get an exception like the following, please see this blog post for a workaround. "Could not load file or assembly
'Microsoft.Owin, Version=2.0.2.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies.
The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT:
0x80131040)"
The compute emulator assigns a local IP address to the endpoint. You can find the IP address by viewing the Compute
Emulator UI. Right-click the emulator icon in the task bar notification area, and select Show Compute Emulator UI.

Find the IP address under Service Deployments, deployment [id], Service Details. Open a web browser and navigate to
http://address/test/1, where address is the IP address assigned by the compute emulator; for
example, http://127.0.0.1:80/test/1. You should see the response from the Web API controller:

Deploy to Azure
For this step, you must have an Azure account. If you don't already have one, you can create a free trial account in just
a couple of minutes. For details, see Microsoft Azure Free Trial.
In Solution Explorer, right-click the AzureApp project. Select Publish.

If you are not signed in to your Azure account, click Sign In.

After you are signed in, choose a subscription and click Next.

Enter a name for the cloud service and choose a region. Click Create.

Click Publish.

The Azure Activity Log window shows the progress of the deployment. When the app is deployed, browse
to http://appname.cloudapp.net/test/1.

Run Background tasks with WebJobs


You can run programs or scripts in WebJobs in your Azure App Service web app in three ways: on demand,
continuously, or on a schedule. There is no additional cost to use WebJobs.
Note
The WebJobs SDK does not yet support .NET Core.
This article shows how to deploy WebJobs by using the Azure Portal. For information about how to deploy by using
Visual Studio or a continuous delivery process, see How to Deploy Azure WebJobs to Web Apps.
The Azure WebJobs SDK simplifies many WebJobs programming tasks. For more information, see What is the
WebJobs SDK.
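As a hedged illustration of what the WebJobs SDK provides (the queue name is a placeholder; the SDK reads its storage connection string from the AzureWebJobsStorage setting), a minimal continuous WebJob that processes queue messages looks roughly like this:

using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // The SDK polls the "orders" queue and invokes this method for each new message.
    public static void ProcessQueueMessage([QueueTrigger("orders")] string message, TextWriter log)
    {
        log.WriteLine("Processing: " + message);
    }
}

class Program
{
    static void Main()
    {
        // RunAndBlock keeps the WebJob running continuously, listening for triggers.
        var host = new JobHost();
        host.RunAndBlock();
    }
}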
Azure Functions provides another way to run programs and scripts from either a serverless environment or from an
App Service app. For more information, see Azure Functions overview.
Note
Although this article refers to web apps, it also applies to API apps and mobile apps.
Acceptable file types for scripts or programs
The following file types are accepted:
.cmd, .bat, .exe (using Windows cmd)
.ps1 (using PowerShell)
.sh (using Bash)
.php (using PHP)
.py (using Python)
.js (using Node.js)
.jar (using Java)

Create an on demand WebJob in the portal


1. In the Web App blade of the Azure Portal, click All settings > WebJobs to show the WebJobs blade.

2. Click Add. The Add WebJob dialog appears.

3. Under Name, provide a name for the WebJob. The name must start with a letter or a number and cannot
contain any special characters other than "-" and "_".
4. In the How to Run box, choose Run on Demand.

5. In the File Upload box, click the folder icon and browse to the zip file that contains your script. The zip file
should contain your executable (.exe .cmd .bat .sh .php .py .js) as well as any supporting files needed to run
the program or script.
6. Click Create to upload the script to your web app.
The name you specified for the WebJob appears in the list on the WebJobs blade.
7. To run the WebJob, right-click its name in the list and click Run.

Create a continuously running WebJob


1. To create a continuously executing WebJob, follow the same steps for creating a WebJob that runs once, but
in the How to Run box, choose Continuous.
2. To start or stop a continuous WebJob, right-click the WebJob in the list and click Start or Stop.
Note
If your web app runs on more than one instance, a continuously running WebJob will run on all of your instances. On-
demand and scheduled WebJobs run on a single instance selected for load balancing by Microsoft Azure.
For continuous WebJobs to run reliably and on all instances, enable the Always On configuration setting for the web
app; otherwise they can stop running when the SCM host site has been idle for too long.
Create a scheduled WebJob using a CRON expression
This technique is available to Web Apps running in Basic, Standard, or Premium mode, and requires the Always
On setting to be enabled on the app.
To turn an On Demand WebJob into a scheduled WebJob, simply include a settings.job file at the root of your WebJob
zip file. This JSON file should include a schedule property with a CRON expression, as shown in the example below.
The CRON expression is composed of 6 fields: {second} {minute} {hour} {day} {month} {day of the week}.
For example, to trigger your WebJob every 15 minutes, your settings.job would have:
{
"schedule": "0 */15 * * * *"
}

Other CRON schedule examples:


Every hour (i.e. whenever the count of minutes is 0): 0 0 * * * *
Every hour from 9 AM to 5 PM: 0 0 9-17 * * *
At 9:30 AM every day: 0 30 9 * * *
At 9:30 AM every week day: 0 30 9 * * 1-5

Note: when deploying a WebJob from Visual Studio, make sure to mark your settings.job file properties as 'Copy if
newer'.
Create a scheduled WebJob using the Azure Scheduler
The following alternate technique makes use of the Azure Scheduler. In this case, your WebJob does not have any
direct knowledge of the schedule. Instead, the Azure Scheduler gets configured to trigger your WebJob on a schedule.
The Azure Portal doesn't yet have the ability to create a scheduled WebJob, but until that feature is added you can do
it by using the classic portal.
1. In the classic portal go to the WebJob page and click Add.
2. In the How to Run box, choose Run on a schedule.

3. Choose the Scheduler Region for your job, and then click the arrow on the bottom right of the dialog to
proceed to the next screen.
4. In the Create Job dialog, choose the type of Recurrence you want: One-time job or Recurring job.

5. Also choose a Starting time: Now or At a specific time.

6. If you want to start at a specific time, choose your starting time values under Starting On.

Scale up an app in Azure


There are two workflows for scaling, scale up and scale out, and this article explains the scale up workflow.
Scale up: Get more CPU, memory, disk space, and extra features like dedicated virtual machines (VMs),
custom domains and certificates, staging slots, autoscaling, and more. You scale up by changing the pricing
tier of the App Service plan that your app belongs to.
Scale out: Increase the number of VM instances that run your app. You can scale out to as many as 20
instances, depending on your pricing tier. App Service Environments in Premium tier will further increase your
scale-out count to 50 instances. For more information about scaling out, see Scale instance count manually or
automatically. There you will find out how to use autoscaling, which is to scale instance count automatically
based on predefined rules and schedules.
The scale settings take only seconds to apply and affect all apps in your App Service plan. They do not require you to
change your code or redeploy your application.
For information about the pricing and features of individual App Service plans, see App Service Pricing Details.
Note
Before you switch an App Service plan from the Free tier, you must first remove the spending limits in place for your
Azure subscription. To view or change options for your Microsoft Azure App Service subscription, see Microsoft Azure
Subscriptions.
Scale up your pricing tier
1. In your browser, open the Azure portal.
2. In your app's blade, click All settings, and then click Scale Up.

3. Choose your tier, and then click Select.


The Notifications tab will flash a green SUCCESS after the operation is complete.
Scale related resources
If your app depends on other services, such as Azure SQL Database or Azure Storage, you can also scale up those
resources based on your needs. These resources are not scaled with the App Service plan and must be scaled
separately.
1. In Essentials, click the Resource group link.

2. In the Summary part of the Resource group blade, click a resource that you want to scale. The following
screenshot shows a SQL Database resource and an Azure Storage resource.

3. For a SQL Database resource, click Settings > Pricing tier to scale the pricing tier.

You can also turn on geo-replication for your SQL Database instance.
For an Azure Storage resource, click Settings > Configuration to scale up your storage options.

Learn about developer features


Depending on the pricing tier, the following developer-oriented features are available:
Bitness
The Basic, Standard, and Premium tiers support 64-bit and 32-bit applications.
The Free and Shared plan tiers support 32-bit applications only.
Debugger support
Debugger support is available for the Free, Shared, and Basic modes at one connection per App Service plan.
Debugger support is available for the Standard and Premium modes at five concurrent connections per App
Service plan.

Scale instance count manually or automatically


In the Azure Portal, you can manually set the instance count of your service, or you can set parameters to have it
automatically scale based on demand. This is typically referred to as Scale out or Scale in.
Before scaling based on instance count, you should consider that scaling is affected by Pricing tier in addition to
instance count. Different pricing tiers can have different numbers of cores and memory, and so they will have better
performance for the same number of instances (which is Scale up or Scale down). This article specifically
covers Scale in and out.
You can scale in the portal, and you can also use the REST API or .NET SDK to adjust scale manually or
automatically.
Note: This article describes how to create an autoscale setting in the portal at http://portal.azure.com. Autoscale
settings created in this portal cannot be edited in the classic portal (http://manage.windowsazure.com).
Scaling manually
In the Azure Portal, click Browse, then navigate to the resource you want to scale, such as an App Service plan.
Click Settings > Scale out (App Service plan).
At the top of the Scale blade you can see a history of autoscale actions of the service.

Note: Only actions that are performed by autoscale will show up in this chart. If you manually adjust the instance
count, the change will not be reflected in this chart.
You can manually adjust the number of instances with the slider. Click the Save command and you'll be scaled to
that number of instances almost immediately.
Scaling based on a pre-set metric
If you want the number of instances to automatically adjust based on a metric, select the metric you want in
the Scale by dropdown. For example, for an App Service plan you can scale by CPU Percentage.
When you select a metric you'll get a slider and/or text boxes to enter the number of instances you want to scale
between:

Autoscale will never take your service below or above the boundaries that you set, no matter your load.
Second, you choose the target range for the metric. For example, if you chose CPU percentage, you can set a
target for the average CPU across all of the instances in your service. A scale out will happen when the average
CPU exceeds the maximum you define; likewise, a scale in will happen whenever the average CPU drops below the
minimum.
Click the Save command. Autoscale will check every few minutes to make sure that you are in the instance range
and target for your metric. When your service receives additional traffic, you will get more instances without
doing anything.
Scale based on other metrics
You can scale based on metrics other than the
presets that appear in the Scale by dropdown,
and can even have a complex set of scale out
and scale in rules.
Adding or changing a rule

Choose the schedule and performance rules in the Scale by dropdown:
If you previously had autoscale on, you'll see a view of the exact rules that you had.
To scale based on another metric, click the Add Rule row. You can also click one of the existing rows to change
from the metric you previously had to the metric you want to scale by.

Now you need to select which metric you want to scale by. When choosing a metric there are a couple of things to
consider:
The resource the metric comes from. Typically, this will be the same as the resource you are scaling. However, if
you want to scale by the depth of a Storage queue, the resource is the queue that you want to scale by.

The metric name itself.


The time aggregation of the metric. This is how the data is combined over the duration.
After choosing your metric, you choose the threshold for the metric and the operator. For example, you could
say Greater than 80%.
Then choose the action that you want to take. There are a few different types of actions:
Increase or decrease by - this will add or remove the Value number of instances you define
Increase or decrease percent - this will change the instance count by a percent. For example, you could put 25 in
the Value field, and if you currently had 8 instances, 2 would be added.
Increase or decrease to - this will set the instance count to the Value you define.
Finally, you can choose cool down - how long this rule should wait after the previous scale action to scale again.
After configuring your rule hit OK.
Once you have configured all of the rules you want, be sure to hit the Save command.
Scaling with multiple steps
The examples above are pretty basic. However, if you want to be more aggressive about scaling up (or down), you
can even add multiple scale rules for the same metric. For example, you can define two scale rules on CPU
percentage:
Scale out by 1 instance if CPU percentage is above 60%
Scale out by 3 instances if CPU percentage is above 85%

With this additional rule, if your load exceeds 85% before a scale action, you will get two additional instances
instead of one.
Scale based on a schedule
By default, when you create a scale rule it will always apply. You can see that when you click on the profile header:

However, you may want to have more aggressive scaling during the day, or the week, than on the weekend. You
could even shut down your service entirely outside working hours.
To do this, on the profile you have, select recurrence instead of always, and choose the times that you want the
profile to apply.
For example, to have a profile that applies during the week, in the Days dropdown uncheck Saturday and Sunday.
To have a profile that applies during the daytime, set the Start time to the time of day that you want to start at.

Click OK.
Next, you will need to add the profile that you want to apply at other
times. Click the Add Profile row.

Name your new, second profile; for example, you could call it Off work.

Then select recurrence again, and choose the instance count range you want during this time.
As with the Default profile, choose the Days you want this profile to apply to, and the Start time during the day.
Note
Autoscale will use the Daylight savings rules for whichever Time zone you select. However, during Daylight savings
time the UTC offset will show the base Time zone offset, not the Daylight savings UTC offset.
Click OK.
Now, you will need to add whatever rules you want to apply during your second profile. Click Add Rule, and then
you could construct the same rule you have during the Default profile.

Be sure to create rules for both scale out and scale in; otherwise, during the profile the instance count will only grow
(or only shrink).
Finally, click Save.

Controlling Azure web app traffic with Azure Traffic Manager


Note: This article provides summary information for Microsoft Azure Traffic Manager as it relates to Azure App
Service Web Apps. More information about Azure Traffic Manager itself can be found by visiting the links at the
end of this article.
Introduction

You can use Azure Traffic Manager to control how requests from web clients are distributed to web apps in Azure
App Service. When web app endpoints are added to an Azure Traffic Manager profile, Azure Traffic Manager keeps
track of the status of your web apps (running, stopped or deleted) so that it can decide which of those endpoints
should receive traffic.
Load Balancing Methods
Azure Traffic Manager uses three different load balancing methods. These are described in the following list as
they pertain to Azure web apps.
Failover: If you have web app clones in different regions, you can use this method to configure one web app to
service all web client traffic, and configure another web app in a different region to service that traffic in case the
first web app becomes unavailable.
Round Robin: If you have web app clones in different regions, you can use this method to distribute traffic equally
across the web apps in different regions.
Performance: The Performance method distributes traffic based on the shortest round trip time to clients. The
Performance method can be used for web apps within the same region or in different regions.
Web Apps and Traffic Manager Profiles
To configure the control of web app traffic, you create a profile in Azure Traffic Manager that uses one of the
three load balancing methods described previously, and then add the endpoints (in this case, web apps) for which
you want to control traffic to the profile. Your web app status (running, stopped or deleted) is regularly
communicated to the profile so that Azure Traffic Manager can direct traffic accordingly.
When using Azure Traffic Manager with Azure, keep in mind the following points:
For web app only deployments within the same region, Web Apps already provides failover and round-robin
functionality without regard to web app mode.
For deployments in the same region that use Web Apps in conjunction with another Azure cloud service, you can
combine both types of endpoints to enable hybrid scenarios.
You can only specify one web app endpoint per region in a profile. When you select a web app as an endpoint for
one region, the remaining web apps in that region become unavailable for selection for that profile.
The web app endpoints that you specify in an Azure Traffic Manager profile will appear under the Domain
Names section on the Configure page for the web app in the profile, but will not be configurable there.
After you add a web app to a profile, the Site URL on the Dashboard of the web app's portal page will display the
custom domain URL of the web app if you have set one up. Otherwise, it will display the Traffic Manager profile
URL (for example, contoso.trafficmgr.com). Both the direct domain name of the web app and the Traffic Manager
URL will be visible on the web app's Configure page under the Domain Names section.
Your custom domain names will work as expected, but in addition to adding them to your web apps, you must
also configure your DNS map to point to the Traffic Manager URL. For information on how to set up a custom
domain for an Azure web app, see Configuring a custom domain name for an Azure web site.
You can only add web apps that are in standard mode to an Azure Traffic Manager profile.

Azure App Service plans in-depth overview


App Service plans represent the collection of physical resources used to host your apps.
App Service plans define:
Region (West US, East US, etc.)
Scale count (one, two, three instances, etc.)
Instance size (Small, Medium, Large)
SKU (Free, Shared, Basic, Standard, Premium)
Web Apps, Mobile Apps, API Apps, and Function Apps (or Functions) in Azure App Service all run in an App Service plan.
Apps in the same subscription, region, and resource group can share an App Service plan.
All applications assigned to an App Service plan share the resources defined by it. This sharing saves money when
hosting multiple apps in a single App Service plan.
Your App Service plan can scale from Free and Shared SKUs to Basic, Standard, and Premium SKUs, giving you access to
more resources and features along the way.
If your App Service plan is set to Basic SKU or higher, then you can control the size and scale count of the VMs.
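As a sketch of how these settings come together, the following Azure PowerShell creates a Standard-tier plan with two Small instances; the resource group, plan name, and location are placeholders.
# Create a Standard-tier App Service plan with two Small instances in West US.
New-AzureRmAppServicePlan -ResourceGroupName "contoso-rg" -Name "contoso-plan" `
    -Location "West US" -Tier Standard -NumberofWorkers 2 -WorkerSize Small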

For example, if your plan is configured to use two "small" instances in the standard service tier, all apps that are
associated with that plan run on both instances. Apps also have access to the standard service tier features. Plan
instances on which apps are running are fully managed and highly available.
Important
The SKU and scale of the App Service plan determine the cost, not the number of apps hosted in it.
This article explores the key characteristics, such as tier and scale, of an App Service plan and how they come into play
while managing your apps.
Apps and App Service plans
An app in App Service can be associated with only one App Service plan at any given time.
Both apps and plans are contained in a resource group. A resource group serves as the lifecycle boundary for every
resource that's within it. You can use resource groups to manage all the pieces of an application together.
Because a single resource group can have multiple App Service plans, you can allocate different apps to different
physical resources.
For example, you can separate resources among dev, test, and production environments. Having separate
environments for production and dev/test lets you isolate resources. In this way, load testing against a new version of
your apps does not compete for the same resources as your production apps, which are serving real customers.
When you have multiple plans in a single resource group, you can also define an application that spans geographical
regions.
For example, a highly available app running in two regions includes at least two plans, one for each region, and one
app associated with each plan. In such a situation, all the copies of the app are then contained in a single resource
group. Having a resource group with multiple plans and multiple apps makes it easy to manage, control, and view the
health of the application.
Create an App Service plan or use existing one
When you create an app that is an altogether new application, consider creating a new resource group for it. On the
other hand, if this app is a component of a larger application, create it within the resource group that's allocated for
that larger application.
Whether the app is an altogether new application or part of a larger one, you can choose to use an existing plan to
host it or create a new one. This decision is more a question of capacity and expected load.
We recommend isolating your app into a new App Service plan when:
The app is resource-intensive.
The app has different scaling factors from the other apps hosted in an existing plan.
The app needs resources in a different geographical region.
This way you can allocate a new set of resources for your app and gain greater control of your apps.
Create an App Service plan
Tip
If you have an App Service Environment, you can review the documentation specific to App Service Environments
here: Create an App Service plan in an App Service Environment
You can create an empty App Service plan from the App Service plan browse experience or as part of app creation.

In the Azure portal, click New > Web + mobile, and then select Web App or another kind of App Service app.

You can then select or create the App Service plan for the new app.

To create an App Service plan, click [+] Create New, type the App Service plan name, and then select an
appropriate Location. Click Pricing tier, and then select an appropriate pricing tier for the service. Select View all to
view more pricing options, such as Free and Shared. After you have selected the pricing tier, click the Select button.
Move an app to a different App Service plan
You can move an app to a different App Service plan in the Azure portal. You can move apps between plans as long as
the plans are in the same resource group and geographical region.
To move an app to another plan:
Navigate to the app that you want to move.
In the Menu, look for the App Service Plan section.
Select Change App Service plan to start the process.


Change App Service plan opens the App Service plan selector. At this point, you can pick an existing plan to move this
app into.
Important
Only valid plans (in the same resource group and geographical location) are shown.

Each plan has its own pricing tier. For example, moving a site from the Free tier to the Standard tier enables all apps
assigned to that plan to use the features and resources of the Standard tier.
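The same move can be scripted with Azure PowerShell by pointing the app at another plan. This is a sketch with placeholder names, and it assumes the target plan already exists in the same resource group and region.
# Reassign an existing web app to another App Service plan in the same
# resource group and region.
Set-AzureRmWebApp -ResourceGroupName "contoso-rg" -Name "contoso-app" `
    -AppServicePlan "contoso-standard-plan"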
Clone an app to a different App Service plan
If you want to move the app to a different region, one alternative is app cloning. Cloning makes a copy of your app in a
new or existing App Service plan in any region.
You can find Clone App in the Development Tools section of the menu.
Important
Cloning has some limitations that you can read about at Azure App Service App cloning using Azure portal.
Scale an App Service plan
There are three ways to scale a plan:
Change the plan's pricing tier. A plan in the Basic tier can be converted to Standard, and all apps assigned to it
can then use the features of the Standard tier.
Change the plan's instance size. As an example, a plan in the Basic tier that uses small instances can be
changed to use large instances. All apps that are associated with that plan can now use the additional memory
and CPU resources that the larger instance size offers.
Change the plan's instance count. For example, a Standard plan that's scaled out to three instances can be
scaled to 10 instances. A Premium plan can be scaled out to 20 instances (subject to availability). All apps that
are associated with that plan can now use the additional memory and CPU resources that the larger instance
count offers.
You can change the pricing tier and instance size by clicking the Scale Up item under settings for either the app or the
App Service plan. Changes apply to the App Service plan and affect all apps that it hosts.
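All three scale operations can also be applied in a single Azure PowerShell call; a minimal sketch with placeholder names:
# Scale up to the Standard tier with Large instances and scale out to 10 instances.
Set-AzureRmAppServicePlan -ResourceGroupName "contoso-rg" -Name "contoso-plan" `
    -Tier Standard -WorkerSize Large -NumberofWorkers 10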

App Service plan cleanup


Important
App Service plans that have no apps associated with them still incur charges, since they continue to reserve the compute
capacity.
To avoid unexpected charges, when the last app hosted in an App Service plan is deleted, the resulting empty App
Service plan is also deleted.
Summary
App Service plans represent a set of features and capacity that you can share across your apps. App Service plans give
you the flexibility to allocate specific apps to a set of resources and further optimize your Azure resource utilization.
This way, if you want to save money on your testing environment, you can share a plan across multiple apps. You can
also maximize throughput for your production environment by scaling it across multiple regions and plans.

Set up staging environments in Azure App Service


When you deploy your web app, web app on Linux, mobile back end, or API app to App Service, you can deploy to a
separate deployment slot instead of the default production slot when running in the Standard or Premium App Service
plan mode. Deployment slots are actually live apps with their own hostnames. App content and configuration
elements can be swapped between two deployment slots, including the production slot. Deploying your application to
a deployment slot has the following benefits:
You can validate app changes in a staging deployment slot before swapping it with the production slot.
Deploying an app to a slot first and swapping it into production ensures that all instances of the slot are
warmed up before being swapped into production. This eliminates downtime when you deploy your app. The
traffic redirection is seamless, and no requests are dropped as a result of swap operations. This entire
workflow can be automated by configuring Auto Swap when pre-swap validation is not needed.
After a swap, the slot with the previously staged app now has the previous production app. If the changes
swapped into the production slot are not as you expected, you can perform the same swap immediately to
get your "last known good site" back.
Each App Service plan mode supports a different number of deployment slots. To find out the number of slots your
app's mode supports, see App Service Pricing.
When your app has multiple slots, you cannot change the mode.
Scaling is not available for non-production slots.

Linked resource management is not supported for non-production slots. In the Azure Portal only, you can
avoid this potential impact on a production slot by temporarily moving the non-production slot to a different
App Service plan mode. Note that the non-production slot must once again share the same mode with the
production slot before you can swap the two slots.
Add a deployment slot
The app must be running in the Standard or Premium mode in order for you to enable multiple deployment slots.
1. In the Azure Portal, open your app's resource blade.
2. Choose the Deployment slots option, then click Add Slot.

Note: If the app is not already in the Standard or Premium mode, you will receive a message indicating the supported
modes for enabling staged publishing. At this point, you have the option to select Upgrade and navigate to
the Scale tab of your app before continuing.
3. In the Add a slot blade, give the slot a name, and select whether to clone app configuration from another
existing deployment slot. Click the check mark to continue.

The first time you add a slot, you will only have two choices: clone configuration from the default slot in production or
not at all. After you have created several slots, you will be able to clone configuration from a slot other than the one in
production:

4. In your app's resource blade, click Deployment slots, then click a deployment slot to open that slot's resource
blade, with a set of metrics and configuration just like any other app. The name of the slot is shown at the top
of the blade to remind you that you are viewing the deployment slot.

5. Click the app URL in the slot's blade. Notice the deployment slot has its own hostname and is also a live app.
To limit public access to the deployment slot, see App Service Web App block web access to non-production
deployment slots.
There is no content after deployment slot creation. You can deploy to the slot from a different repository branch, or
an altogether different repository. You can also change the slot's configuration. Use the publish profile or deployment
credentials associated with the deployment slot for content updates. For example, you can publish to this slot with git.
Configuration for deployment slots
When you clone configuration from another deployment slot, the cloned configuration is editable. Furthermore, some
configuration elements will follow the content across a swap (not slot specific) while other configuration elements will
stay in the same slot after a swap (slot specific). The following lists show the configuration that will change when you
swap slots.
Settings that are swapped:
General settings - such as framework version, 32/64-bit, Web sockets
App settings (can be configured to stick to a slot)
Connection strings (can be configured to stick to a slot)
Handler mappings
Monitoring and diagnostic settings
WebJobs content
Settings that are not swapped:
Publishing endpoints
Custom Domain Names
SSL certificates and bindings
Scale settings
WebJobs schedulers
To configure an app setting or connection string to stick to a slot (not swapped), access the Application Settings blade
for a specific slot, then select the Slot Setting box for the configuration elements that should stick to the slot. Note that
marking a configuration element as slot specific has the effect of establishing that element as not swappable across all
the deployment slots associated with the app.
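Slot settings can also be declared with Azure PowerShell. The following is a sketch with placeholder names; the app setting and connection string names are hypothetical and must already exist on the app.
# Mark one app setting and one connection string as slot settings (not swapped).
Set-AzureRmWebAppSlotConfigName -ResourceGroupName "contoso-rg" -Name "contoso-app" `
    -AppSettingNames @("STAGING_ONLY_FLAG") -ConnectionStringNames @("StagingDb")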

Swap deployment slots


You can swap deployment slots in the Overview or Deployment slots view of your app's resource blade.
Important
Before you swap an app from a deployment slot into production, make sure that all non-slot specific settings are
configured exactly as you want them in the swap target.
1. To swap deployment slots, click the Swap button in the command bar of the app or in the command bar of a
deployment slot.

2. Make sure that the swap source and swap target are set properly. Usually, the swap target is the production
slot. Click OK to complete the operation. When the operation finishes, the deployment slots have been
swapped.
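In addition to the Invoke-AzureRmResourceAction samples listed later in this article, the swap can also be performed with a dedicated Azure PowerShell cmdlet; a sketch with placeholder names:
# Swap the staging slot with the production slot.
Switch-AzureRmWebAppSlot -ResourceGroupName "contoso-rg" -Name "contoso-app" `
    -SourceSlotName "staging" -DestinationSlotName "production"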

For the Swap with preview swap type, see Swap with preview (multi-phase swap).
Swap with preview (multi-phase swap)
Swap with preview, or multi-phase swap, simplifies validation of slot-specific configuration elements, such as
connection strings. For mission-critical workloads, you want to validate that the app behaves as expected when the
production slot's configuration is applied, and you must perform such validation before the app is swapped into
production. Swap with preview is what you need.
Note
Swap with preview is not supported in web apps on Linux.
When you use the Swap with preview option (see Swap deployment slots), App Service does the following:
Keeps the destination slot unchanged so existing workload on that slot (e.g. production) is not impacted.
Applies the configuration elements of the destination slot to the source slot, including the slot-specific
connection strings and app settings.
Restarts the worker processes on the source slot using these aforementioned configuration elements.
When you complete the swap: Moves the pre-warmed-up source slot into the destination slot. The
destination slot is moved into the source slot as in a manual swap.
When you cancel the swap: Reapplies the configuration elements of the source slot to the source slot.
You can preview exactly how the app will behave with the destination slot's configuration. Once you complete
validation, you complete the swap in a separate step. This step has the added advantage that the source slot is already
warmed up with the desired configuration, and clients will not experience any downtime.
Samples for the Azure PowerShell cmdlets available for multi-phase swap are included in the Azure PowerShell
cmdlets for deployment slots section.
Configure Auto Swap
Auto Swap streamlines DevOps scenarios where you want to continuously deploy your app with zero cold start and
zero downtime for end customers of the app. When a deployment slot is configured for Auto Swap into production,
every time you push your code update to that slot, App Service will automatically swap the app into production after it
has already warmed up in the slot.
Important

When you enable Auto Swap for a slot, make sure the slot configuration is exactly the configuration intended for the
target slot (usually the production slot).
Note: Auto Swap is not supported in web apps on Linux.
Configuring Auto Swap for a slot is easy. Follow the steps below:
1. In Deployment Slots, select a non-production slot, and choose Application Settings in that slot's resource
blade.

2. Select On for Auto Swap, select the desired target slot in Auto Swap Slot, and click Save in the command bar.
Make sure configuration for the slot is exactly the configuration intended for the target slot.
The Notifications tab will flash a green SUCCESS once the operation is complete.

Note: To test Auto Swap for your app, you can first select a non-production target slot in Auto Swap Slot to become
familiar with the feature.
3. Execute a code push to that deployment slot. Auto Swap will happen after a short time and the update will be
reflected at your target slot's URL.
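Auto Swap can also be configured from Azure PowerShell by setting the target slot on the non-production slot; a minimal sketch with placeholder names:
# Configure the staging slot to auto swap into production after each deployment.
Set-AzureRmWebAppSlot -ResourceGroupName "contoso-rg" -Name "contoso-app" `
    -Slot "staging" -AutoSwapSlotName "production"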
To roll back a production app after a swap
If any errors are identified in production after a slot swap, roll the slots back to their pre-swap states by swapping the
same two slots immediately.
Custom warm-up before swap

Some apps may require custom warm-up actions. The applicationInitialization configuration element in web.config
allows you to specify custom initialization actions to be performed before a request is received. The swap operation
will wait for this custom warm-up to complete. Here is a sample web.config fragment.
<applicationInitialization>
<add initializationPage="/" hostName="[app hostname]" />
<add initializationPage="/Home/About" hostname="[app hostname]" />
</applicationInitialization>

To delete a deployment slot


Open the deployment slot's blade, click Overview (the default page), and click Delete in the command bar.

Azure PowerShell cmdlets for deployment slots


Azure PowerShell is a module that provides cmdlets to manage Azure through Windows PowerShell, including support
for managing deployment slots in Azure App Service.
For information on installing and configuring Azure PowerShell, and on authenticating Azure PowerShell with
your Azure subscription, see How to install and configure Microsoft Azure PowerShell.

Create a web app


New-AzureRmWebApp -ResourceGroupName [resource group name] -Name [app name] -Location [location] -AppServicePlan [app service plan name]

Create a deployment slot


New-AzureRmWebAppSlot -ResourceGroupName [resource group name] -Name [app name] -Slot [deployment slot name] -AppServicePlan [app service plan name]

Initiate a swap with preview (multi-phase swap) and apply destination slot configuration to source slot
$ParametersObject = @{targetSlot = "[slot name e.g. production]"}
Invoke-AzureRmResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action applySlotConfig -Parameters $ParametersObject -ApiVersion 2015-07-01

Cancel a pending swap (swap with preview) and restore source slot configuration
Invoke-AzureRmResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action resetSlotConfig -ApiVersion 2015-07-01

Swap deployment slots


$ParametersObject = @{targetSlot = "[slot name e.g. production]"}
Invoke-AzureRmResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action slotsswap -Parameters $ParametersObject -ApiVersion 2015-07-01

Delete deployment slot


Remove-AzureRmResource -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -Name [app name]/[slot name] -ApiVersion 2015-07-01

Azure Command-Line Interface (Azure CLI) commands for Deployment Slots


The Azure CLI provides cross-platform commands for working with Azure, including support for managing App Service
deployment slots.
For instructions on installing and configuring the Azure CLI, including information on how to connect Azure CLI
to your Azure subscription, see Install and Configure the Azure CLI.
To list the commands available for Azure App Service in the Azure CLI, call azure site -h.
Note
For Azure CLI 2.0 commands for deployment slots, see az appservice web deployment slot.

azure site list


For information about the apps in the current subscription, call azure site list, as in the following example.
azure site list webappslotstest

azure site create


To create a deployment slot, call azure site create and specify the name of an existing app and the name of the slot to
create, as in the following example.
azure site create webappslotstest --slot staging
To enable source control for the new slot, use the --git option, as in the following example.
azure site create --git webappslotstest --slot staging

azure site swap


To make the updated deployment slot the production app, use the azure site swap command to perform a swap
operation, as in the following example. The production app will not experience any down time, nor will it undergo a
cold start.
azure site swap webappslotstest

azure site delete


To delete a deployment slot that is no longer needed, use the azure site delete command, as in the following example.
azure site delete webappslotstest --slot staging

Restore an app in Azure


This article shows you how to restore an app in Azure App Service that you have previously backed up (see Back up
your app in Azure). You can restore your app with its linked databases on-demand to a previous state, or create a new
app based on one of your original app's backups. Azure App Service supports the following databases for backup and
restore:
SQL Database
Azure Database for MySQL (Preview)
Azure Database for PostgreSQL (Preview)
ClearDB MySQL
MySQL in-app
Restoring from backups is available to apps running in the Standard and Premium tiers. For information about scaling up
your app, see Scale up an app in Azure. The Premium tier allows a greater number of daily backups to be performed
than the Standard tier.
Restore an app from an existing backup

1. On the Settings blade of your app in the Azure Portal, click Backups to display the Backups blade. Then
click Restore.

2. In the Restore blade, first select the backup source.

The App backup option shows you all the existing backups of the current app, and you can easily select one.
The Storage option lets you select any backup ZIP file from any existing Azure Storage account and container in your
subscription. If you're trying to restore a backup of another app, use the Storage option.
3. Then, specify the destination for the app restore in Restore destination.

Warning: If you choose Overwrite, all existing data in your current app is erased and overwritten. Before you click OK,
make sure that this is exactly what you want to do. You can select Existing App to restore the app backup to another app
in the same resource group. Before you use this option, you should have already created another app in your resource
group whose database configuration mirrors the one defined in the app backup. You can also select Create New to
restore your content to a new app.
4. Click OK.
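A restore can also be scripted. The following is a minimal sketch using the Restore-AzureRmWebAppBackup cmdlet from the AzureRM.Websites module; the bracketed values are placeholders in the same style as the other samples in this article.
# Restore a backup ZIP from a storage container into the current app,
# overwriting its existing content.
Restore-AzureRmWebAppBackup -ResourceGroupName "[resource group name]" -Name "[app name]" `
    -StorageAccountUrl "[SAS URL of the storage container]" -BlobName "[backup blob name].zip" `
    -Overwrite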
Download or delete a backup from a storage account
1. From the main Browse blade of the Azure portal, select Storage accounts. A list of your existing storage
accounts is displayed.
2. Select the storage account that contains the backup that you want to download or delete. The blade for the
storage account is displayed.
3. In the storage account blade, select the container that you want.

4. Select the backup file that you want to download or delete.

5. Click Download or Delete depending on what you want to do.


Monitor a restore operation
To see details about the success or failure of the app restore operation, navigate to the Activity Log blade in the Azure
portal.
Scroll down to find the desired restore operation and click to select it.
The details blade displays the available information related to the restore operation.

What is Mobile Apps?


Azure App Service is a fully managed Platform as a Service (PaaS) offering for professional developers that brings a rich
set of capabilities to web, mobile, and integration scenarios. Mobile Apps in Azure App Service offers a highly scalable,
globally available mobile application development platform for Enterprise Developers and System Integrators that
brings a rich set of capabilities to mobile developers.

Why Mobile Apps?


Mobile Apps in Azure App Service brings a rich set of capabilities to mobile developers. With Mobile Apps you can:
Build native and cross platform apps - whether you're building native iOS, Android, and Windows apps or
cross-platform Xamarin or Cordova (Phonegap) apps, you can take advantage of App Service using native
SDKs.
Connect to your enterprise systems - with Mobile Apps you can add corporate sign on in minutes, and connect
to your enterprise on-premises or cloud resources.
Build offline-ready apps with data sync - make your mobile workforce productive by building apps that work
offline and use Mobile Apps to sync data in the background with any of your enterprise data sources or SaaS
APIs whenever connectivity is present.
Push Notifications to millions in seconds - engage your customers with instant push notifications on any
device, personalized to their needs, sent when the time is right.
Mobile App Features
The following features are important to cloud-enabled mobile development:
Authentication and Authorization - Select from an ever-growing list of identity providers, including Azure
Active Directory for enterprise authentication, plus social providers like Facebook, Google, Twitter and
Microsoft Account. Azure Mobile Apps provides an OAuth 2.0 service for each provider. You can also integrate
the SDK for the identity provider for provider specific functionality.
Discover more about our authentication features.
Data Access - Azure Mobile Apps provides a mobile-friendly OData v3 data source linked to SQL Azure or an
on-premises SQL Server. This service can be based on Entity Framework, allowing you to easily integrate with
other NoSQL and SQL data providers, including Azure Table Storage, MongoDB, DocumentDB and SaaS API
providers like Office 365 and Salesforce.com.
Offline Sync - Our Client SDKs make it easy for you to build robust and responsive mobile applications that
operate with an offline data set that can be automatically synchronized with the backend data, including
conflict resolution support.
Discover more about our data features.
Push Notifications - Our Client SDKs seamlessly integrate with the registration capabilities of Azure Notification
Hubs, allowing you to send push notifications to millions of users simultaneously.
Discover more about our push notification features.
Client SDKs - We provide a complete set of Client SDKs that cover native development
(iOS, Android and Windows), cross-platform development (Xamarin for iOS and Android, Xamarin Forms) and
hybrid application development (Apache Cordova). Each client SDK is available with an MIT license and is
open-source.

Azure App Service Features


The following platform features are generally useful for mobile production sites.
Auto Scaling - App Service enables you to quickly scale up or out to handle any incoming customer load.
Manually select the number and size of VMs or set up auto-scaling to scale your mobile app backend based on
load or schedule.
Discover more about auto scaling.
Staging Environments - App Service can run multiple versions of your site, allowing you to perform A/B testing,
test in production as part of a larger DevOps plan and do in-place staging of a new backend.
Discover more about staging environments.
Continuous Deployment - App Service can integrate with common SCM systems, allowing you to automatically
deploy a new version of your backend by pushing to a branch of your SCM system.
Discover more about deployment options.
Virtual Networking - App Service can connect to on-premises resources using virtual network, ExpressRoute or
hybrid connections.
Discover more about hybrid connections, virtual networks, and ExpressRoute.

Isolated / Dedicated Environments - App Service can be run in a fully isolated and dedicated environment for
securely running Azure App Service apps at high scale. This is ideal for application workloads requiring very
high scale, isolation or secure network access.
Discover more about App Service Environments.
Getting Started
To get started with Mobile Apps, follow the Get Started tutorial. This will cover the basics of producing a mobile
backend and client of your choice, then integrating authentication, offline sync and push notifications. You can follow
the Get Started tutorial several times - once for each client application.

Offline Data Sync in Azure Mobile Apps


What is offline data sync?
Offline data sync is a client and server SDK feature of Azure Mobile Apps that makes it easy for developers to create
apps that are functional without a network connection.
When your app is in offline mode, you can still create and modify data, which is saved to a local store. When the app
is back online, it can synchronize local changes with your Azure Mobile App backend. The feature also includes
support for detecting conflicts when the same record is changed on both the client and the backend. Conflicts can
then be handled either on the server or the client.
Offline sync has several benefits:
Improve app responsiveness by caching server data locally on the device
Create robust apps that remain useful when there are network issues
Allow end users to create and modify data even when there is no network access, supporting scenarios with
little or no connectivity
Sync data across multiple devices and detect conflicts when the same record is modified by two devices
Limit network use on high-latency or metered networks
The following tutorials show how to add offline sync to your mobile clients using Azure Mobile Apps:
Android: Enable offline sync
Apache Cordova: Enable offline sync
iOS: Enable offline sync
Xamarin iOS: Enable offline sync
Xamarin Android: Enable offline sync
Xamarin.Forms: Enable offline sync
Universal Windows Platform: Enable offline sync
What is a sync table?
To access the "/tables" endpoint, the Azure Mobile client SDKs provide interfaces such as IMobileServiceTable (.NET
client SDK) or MSTable (iOS client). These APIs connect directly to the Azure Mobile App backend and fail if the client
device does not have a network connection.
To support offline use, your app should instead use the sync table APIs, such as IMobileServiceSyncTable (.NET client
SDK) or MSSyncTable (iOS client). All the same CRUD operations (Create, Read, Update, Delete) work against sync
table APIs, except now they read from or write to a local store. Before any sync table operations can be performed,
the local store must be initialized.
What is a local store?
A local store is the data persistence layer on the client device. The Azure Mobile Apps client SDKs provide a default
local store implementation. On Windows, Xamarin and Android, it is based on SQLite. On iOS, it is based on Core Data.
To use the SQLite-based implementation on Windows Phone or Windows Store 8.1, you need to install a SQLite
extension. For more information, see Universal Windows Platform: Enable offline sync. Android and iOS ship with a
version of SQLite in the device operating system itself, so it is not necessary to reference your own version of SQLite.
Developers can also implement their own local store. For instance, if you wish to store data in an encrypted format on
the mobile client, you can define a local store that uses SQLCipher for encryption.
What is a sync context?
A sync context is associated with a mobile client object (such as IMobileServiceClient or MSClient) and tracks changes
that are made with sync tables. The sync context maintains an operation queue, which keeps an ordered list of CUD
operations (Create, Update, Delete) that is later sent to the server.

A local store is associated with the sync context using an initialize method such
as IMobileServicesSyncContext.InitializeAsync(localstore) in the .NET client SDK.
How offline synchronization works
When using sync tables, your client code controls when local changes are synchronized with an Azure Mobile App
backend. Nothing is sent to the backend until there is a call to push local changes. Similarly, the local store is
populated with new data only when there is a call to pull data.
Push: Push is an operation on the sync context and sends all CUD changes since the last push. Note that it is
not possible to send only an individual table's changes, because otherwise operations could be sent out of
order. Push executes a series of REST calls to your Azure Mobile App backend, which in turn modifies your
server database.
Pull: Pull is performed on a per-table basis and can be customized with a query to retrieve only a subset of the
server data. The Azure Mobile client SDKs then insert the resulting data into the local store.
Implicit Pushes: If a pull is executed against a table that has pending local updates, the pull first executes
a push() on the sync context. This push helps minimize conflicts between changes that are already queued
and new data from the server.
Incremental Sync: The first parameter to the pull operation is a query name that is used only on the client. If
you use a non-null query name, the Azure Mobile SDK performs an incremental sync. Each time a pull
operation returns a set of results, the latest updatedAt timestamp from that result set is stored in the SDK
local system tables. Subsequent pull operations retrieve only records after that timestamp.
To use incremental sync, your server must return meaningful updatedAt values and must also support sorting by this
field. However, since the SDK adds its own sort on the updatedAt field, you cannot use a pull query that has its
own orderBy clause.
The query name can be any string you choose, but it must be unique for each logical query in your app. Otherwise,
different pull operations could overwrite the same incremental sync timestamp and your queries can return incorrect
results.
If the query has a parameter, one way to create a unique query name is to incorporate the parameter value. For
instance, if you are filtering on userid, your query name could be as follows (in C#):
await todoTable.PullAsync("todoItems" + userid,
syncTable.Where(u => u.UserId == userid));
If you want to opt out of incremental sync, pass null as the query ID. In this case, all records are retrieved on every call
to PullAsync, which is potentially inefficient.
Purging: You can clear the contents of the local store using IMobileServiceSyncTable.PurgeAsync. Purging may
be necessary if you have stale data in the client database, or if you wish to discard all pending changes.
A purge clears a table from the local store. If there are operations awaiting synchronization with the server database,
the purge throws an exception unless the force purge parameter is set.
As an example of stale data on the client, suppose in the "todo list" example, Device1 only pulls items that are not
completed. A todoitem "Buy milk" is marked completed on the server by another device. However, Device1 still has
the "Buy milk" todoitem in local store because it is only pulling items that are not marked complete. A purge clears
this stale item.

Add push notifications to your Windows app


Overview
In this tutorial, you add push notifications to the Windows quick start project so that a push notification is sent to the
device every time a record is inserted.
If you do not use the downloaded quick start server project, you will need the push notification extension package.
See Work with the .NET backend server SDK for Azure Mobile Apps for more information.
Configure a Notification Hub
The Mobile Apps feature of Azure App Service uses Azure Notification Hubs to send pushes, so you will be configuring
a notification hub for your mobile app.
1. In the Azure portal, go to App Services, and then click your app back end. Under Settings, click Push.
2. Click Connect to add a notification hub resource to the app. You can either create a hub or connect to an
existing one.
Now you have connected a notification hub to your Mobile Apps back-end project. Later you will configure this
notification hub to connect to a platform notification system (PNS) to push to devices.
Register your app for push notifications
You need to submit your app to the Windows Store, then configure your server project to integrate with Windows
Notification Services (WNS) to send push.
1. In Visual Studio Solution Explorer, right-click the UWP app project, click Store > Associate App with the Store....

2. In the wizard, click Next, sign in with your Microsoft account, type a name for your app in Reserve a new app
name, then click Reserve.
3. After the app registration is successfully created, select the new app name, click Next, and then
click Associate. This adds the required Windows Store registration information to the application manifest.

4. Navigate to the Windows Dev Center, sign-in with your Microsoft account, click the new app registration
in My apps, then expand Services > Push notifications.
5. In the Push notifications page, click Live Services site under Microsoft Azure Mobile Services.
6. In the registration page, make a note of the value under Application secrets and the Package SID, which you
will next use to configure your mobile app backend.

Important
The client secret and package SID are important security credentials. Do not share these values with anyone or
distribute them with your app. The Application Id is used with the secret to configure Microsoft Account
authentication.
Configure the backend to send push notifications
1. In the Azure portal, click Browse All > App Services, and click your Mobile Apps back end. Under Settings,
click App Service Push, and then click your notification hub name.
2. Go to Windows (WNS), enter the Security key (client secret) and Package SID that you obtained from the Live
Services site, and then click Save.

Your back end is now configured to use WNS to send push notifications.
Update the server to send push notifications
Use the procedure below that matches your backend project type, either .NET backend or Node.js backend.
.NET backend project
1. In Visual Studio, right-click the server project and click Manage NuGet Packages, search for
Microsoft.Azure.NotificationHubs, then click Install. This installs the Notification Hubs client library.
2. Expand Controllers, open TodoItemController.cs, and add the following using statements:
using System.Collections.Generic;
using Microsoft.Azure.NotificationHubs;
using Microsoft.Azure.Mobile.Server.Config;
3. In the PostTodoItem method, add the following code after the call to InsertAsync:
// Get the settings for the server project.
HttpConfiguration config = this.Configuration;
MobileAppSettingsDictionary settings =
this.Configuration.GetMobileAppSettingsProvider().GetMobileAppSettings();

// Get the Notification Hubs credentials for the Mobile App.


string notificationHubName = settings.NotificationHubName;
string notificationHubConnection = settings
.Connections[MobileAppSettingsKeys.NotificationHubConnectionString].ConnectionString;

// Create the notification hub client.


NotificationHubClient hub = NotificationHubClient
.CreateClientFromConnectionString(notificationHubConnection, notificationHubName);

// Define a WNS payload


var windowsToastPayload = @"<toast><visual><binding template=""ToastText01""><text id=""1"">"
+ item.Text + @"</text></binding></visual></toast>";
try
{
// Send the push notification.
var result = await hub.SendWindowsNativeNotificationAsync(windowsToastPayload);

// Write the success result to the logs.


config.Services.GetTraceWriter().Info(result.State.ToString());
}
catch (System.Exception ex)
{
// Write the failure result to the logs.
config.Services.GetTraceWriter()
.Error(ex.Message, null, "Push.SendAsync Error");
}

This code tells the notification hub to send a push notification after a new item is inserted.
4. Republish the server project.
Node.js backend project
1. If you haven't already done so, download the quickstart project or else use the online editor in the Azure
portal.
2. Replace the existing code in the todoitem.js file with the following:
var azureMobileApps = require('azure-mobile-apps'),
promises = require('azure-mobile-apps/src/utilities/promises'),
logger = require('azure-mobile-apps/src/logger');

var table = azureMobileApps.table();

table.insert(function (context) {
// For more information about the Notification Hubs JavaScript SDK,
// see http://aka.ms/nodejshubs
logger.info('Running TodoItem.insert');

// Define the WNS payload that contains the new item Text.
var payload = "<toast><visual><binding template=\ToastText01\><text id=\"1\">"
+ context.item.text + "</text></binding></visual></toast>";

// Execute the insert. The insert returns the results as a Promise,


// Do the push as a post-execute action within the promise flow.
return context.execute()
.then(function (results) {
// Only do the push if configured
if (context.push) {
// Send a WNS native toast notification.
context.push.wns.sendToast(null, payload, function (error) {
if (error) {
logger.error('Error while sending push notification: ', error);
} else {
logger.info('Push notification sent successfully!');
}
});
}
// Don't forget to return the results from the context.execute()
return results;
})
.catch(function (error) {
logger.error('Error while running context.execute: ', error);
});
});

module.exports = table;
This sends a WNS toast notification that contains the item.text when a new todo item is inserted.
3. When editing the file on your local computer, republish the server project.
Add push notifications to your app
Next, your app must register for push notifications on start-up. If you have already enabled authentication, make
sure that the user signs in before trying to register for push notifications.
1. Open the App.xaml.cs project file and add the following using statements:
using System.Threading.Tasks;
using Windows.Networking.PushNotifications;
2. In the same file, add the following InitNotificationsAsync method definition to the App class:
private async Task InitNotificationsAsync()
{
// Get a channel URI from WNS.
var channel = await PushNotificationChannelManager
.CreatePushNotificationChannelForApplicationAsync();

// Register the channel URI with Notification Hubs.


await App.MobileService.GetPush().RegisterAsync(channel.Uri);
}
This code retrieves the ChannelURI for the app from WNS, and then registers that ChannelURI with your App Service
Mobile App.
3. At the top of the OnLaunched event handler in App.xaml.cs, add the async modifier to the method definition
and add the following call to the new InitNotificationsAsync method, as in the following example:
protected async override void OnLaunched(LaunchActivatedEventArgs e)
{
await InitNotificationsAsync();

// ...
}
This guarantees that the short-lived ChannelURI is registered each time the application is launched.
4. Rebuild your UWP app project. Your app is now ready to receive toast notifications.
Test push notifications in your app
1. Right-click the Windows Store project, click Set as StartUp Project, then press the F5 key to run the Windows
Store app.
After the app starts, the device is registered for push notifications.
2. Stop the Windows Store app and repeat the previous step for the Windows Phone Store app.
At this point, both devices are registered to receive push notifications.
3. Run the Windows Store app again, and type text in Insert a TodoItem, and then click Save.
Note that after the insert completes, both the Windows Store and the Windows Phone apps receive a push
notification from WNS. The notification is displayed on Windows Phone even when the app isn't running.

Configure your App Service application to use Azure Active Directory login
This topic shows you how to configure Azure App Services to use Azure Active Directory as an authentication provider.
Configure Azure Active Directory using express settings
1. In the Azure portal, navigate to your application. Click Settings, and then Authentication/Authorization.
2. If the Authentication / Authorization feature is not enabled, turn the switch to On.
3. Click Azure Active Directory, and then click Express under Management Mode.
4. Click OK to register the application in Azure Active Directory. This will create a new registration. If you want to
choose an existing registration instead, click Select an existing app and then search for the name of a
previously created registration within your tenant. Click the registration to select it and click OK. Then
click OK on the Azure Active Directory settings blade.

By default, App Service provides authentication but does not restrict authorized access to your site content and APIs.
You must authorize users in your app code.

5. (Optional) To restrict access to your site to only users authenticated by Azure Active Directory, set Action to
take when request is not authenticated to Log in with Azure Active Directory. This requires that all requests be
authenticated, and all unauthenticated requests are redirected to Azure Active Directory for authentication.
6. Click Save.
You are now ready to use Azure Active Directory for authentication in your app.
(Alternative method) Manually configure Azure Active Directory with advanced settings
You can also choose to provide configuration settings manually. This is the preferred solution if the AAD tenant you
wish to use is different from the tenant with which you sign into Azure. To complete the configuration, you must first
create a registration in Azure Active Directory, and then you must provide some of the registration details to App
Service.
Register your application with Azure Active Directory
1. Log on to the Azure portal, and navigate to your application. Copy your URL. You will use this to configure your
Azure Active Directory app.
2. Sign in to the Azure classic portal and navigate to Active Directory.

3. Select your directory, and then select the Applications tab at the top. Click ADD at the bottom to create a new
app registration.
4. Click Add an application my organization is developing.
5. In the Add Application Wizard, enter a Name for your application and click the Web Application And/Or Web
API type. Then click to continue.
6. In the SIGN-ON URL box, paste the application URL you copied earlier. Enter that same URL in the App ID
URI box. Then click to continue.
7. Once the application has been added, click the Configure tab. Edit the Reply URL under Single Sign-on to be
the URL of your application appended with the path, /.auth/login/aad/callback. For
example, https://contoso.azurewebsites.net/.auth/login/aad/callback. Make sure that you are using the HTTPS scheme.

8. Click Save. Then copy the Client ID for the app. You will configure your application to use this later.
9. In the bottom command bar, click View Endpoints, and then copy the Federation Metadata Document URL and
download that document or navigate to it in a browser.
10. Within the root EntityDescriptor element, there should be an entityID attribute of the
form https://sts.windows.net/ followed by a GUID specific to your tenant (called a "tenant ID"). Copy this
value - it will serve as your Issuer URL. You will configure your application to use this later.
Add Azure Active Directory information to your application
1. Back in the Azure portal, navigate to your application. Click Settings, and then Authentication/Authorization.
2. If the Authentication/Authorization feature is not enabled, turn the switch to On.
3. Click Azure Active Directory, and then click Advanced under Management Mode. Paste in the Client ID and
Issuer URL value which you obtained previously. Then click OK.

By default, App Service provides authentication but does not restrict authorized access to your site content and APIs.
You must authorize users in your app code.
4. (Optional) To restrict access to your site to only users authenticated by Azure Active Directory, set Action to
take when request is not authenticated to Log in with Azure Active Directory. This requires that all requests be
authenticated, and all unauthenticated requests are redirected to Azure Active Directory for authentication.
5. Click Save.
You are now ready to use Azure Active Directory for authentication in your app.
(Optional) Configure a native client application
Azure Active Directory also allows you to register native clients, which provides greater control over permissions
mapping. You need this if you wish to perform logins using a library such as the Active Directory Authentication Library.
1. Navigate to Active Directory in the Azure classic portal.
2. Select your directory, and then select the Applications tab at the top. Click ADD at the bottom to create a new
app registration.
3. Click Add an application my organization is developing.
4. In the Add Application Wizard, enter a Name for your application and click the Native Client Application type.
Then click to continue.
5. In the Redirect URI box, enter your site's /.auth/login/done endpoint, using the HTTPS scheme. This value
should be similar to https://contoso.azurewebsites.net/.auth/login/done. If creating a Windows application,
instead use the package SID as the URI.
6. Once the native application has been added, click the Configure tab. Find the Client ID and make a note of this
value.
7. Scroll the page down to the Permissions to other applications section and click Add application.
8. Search for the web application that you registered earlier and click the plus icon. Then click the check to close
the dialog. If the web application cannot be found, navigate to its registration and add a new reply URL (e.g.,
the HTTP version of your current URL), click save, and then repeat these steps - the application should show
up in the list.
9. On the new entry you just added, open the Delegated Permissions dropdown and select Access (appName).
Then click Save.
You have now configured a native client application which can access your App Service application.

Azure Notification Hubs


Azure Notification Hubs provide an easy-to-use, multi-platform, scaled-out push engine. With a single cross-platform
API call, you can easily send targeted and personalized push notifications to any mobile platform from any cloud or
on-premises backend.
Notification Hubs works great for both enterprise and consumer scenarios. Here are a few examples customers use
Notification Hubs for:
Send breaking news notifications to millions with low latency.
Send location-based coupons to interested user segments.
Send event-related notifications to users or groups for media/sports/finance/gaming applications.
Push promotional contents to apps to engage and market to customers.
Notify users of enterprise events like new messages and work items.
Send codes for multi-factor authentication.
What are Push Notifications?
Push notifications are a form of app-to-user communication where users of mobile apps are notified of certain desired
information, usually in a pop-up or dialog box. Users can generally choose to view or dismiss the message, and
choosing the former opens the mobile app that sent the notification.
Push notifications are vital for consumer apps in increasing app engagement and usage, and for enterprise apps in
communicating up-to-date business information. They are an effective app-to-user communication channel because
they are energy-efficient for mobile devices, flexible for the notification senders, and available even while the
corresponding apps are not active.
For more information on push notifications for a few popular platforms:
iOS
Android
Windows
How Push Notifications Work
Push notifications are delivered through platform-specific infrastructures called Platform Notification Systems (PNSes).
They offer bare-bones push functionality to deliver a message to a device with a provided handle, and have no
common interface. To send a notification to all customers across the iOS, Android, and Windows versions of an app,
the developer must work with APNS (Apple Push Notification Service), FCM (Firebase Cloud Messaging), and WNS
(Windows Notification Service), while batching the sends.
At a high level, here is how push works:
1. The client app decides it wants to receive pushes, so it contacts the corresponding PNS to retrieve its unique
and temporary push handle. The handle type depends on the system (e.g. WNS has URIs while APNS has
tokens).
2. The client app stores this handle in the app back-end or provider.
3. To send a push notification, the app back-end contacts the PNS using the handle to target a specific client app.
4. The PNS forwards the notification to the device specified by the handle.

The Challenges of Push Notifications


While PNSes are powerful, they leave much work to the app developer in order to implement even common push
notification scenarios, such as broadcasting or sending push notifications to segmented users.
Push is one of the most requested features in mobile cloud services, because making it work requires complex
infrastructure that is unrelated to the app's main business logic. Some of the infrastructural challenges are:
Platform dependency:
o The backend needs to have complex and hard-to-maintain platform-dependent logic to send
notifications to devices on various platforms as PNSes are not unified.
Scale:
o Per PNS guidelines, device tokens must be refreshed upon every app launch. This means the backend
is dealing with a large amount of traffic and database access just to keep the tokens up-to-date. When
the number of devices grows to hundreds and thousands of millions, the cost of creating and
maintaining this infrastructure is massive.
o Most PNSes do not support broadcast to multiple devices. This means a simple broadcast to a million
devices results in a million calls to the PNSes. Scaling this amount of traffic with minimal latency is
nontrivial.
Routing:
o Though PNSes provide a way to send messages to devices, most app notifications are targeted at
users or interest groups. This means the backend must maintain a registry to associate devices with
interest groups, users, properties, etc. This overhead adds to the time to market and maintenance
costs of an app.
Why Use Notification Hubs?
Notification Hubs eliminates all complexities associated with enabling push on your own. Its multi-platform, scaled-out
push notification infrastructure reduces push-related code and simplifies your backend. With Notification Hubs,
devices are merely responsible for registering their PNS handles with a hub, while the backend sends messages to
users or interest groups, as shown in the following figure:

Notification Hubs is your ready-to-use push engine with the following advantages:
Cross platforms
o Support for all major push platforms, including iOS, Android, Windows, Kindle, and Baidu.
o A common interface to push to all platforms in platform-specific or platform-independent formats
with no platform-specific work.
o Device handle management in one place.
Cross backends
o Cloud or on-premises
o .NET, Node.js, Java, etc.
Rich set of delivery patterns:
o Broadcast to one or multiple platforms: You can instantly broadcast to millions of devices across
platforms with a single API call.
o Push to device: You can target notifications to individual devices.
o Push to user: Tags and templates features help you reach all cross-platform devices of a user.
o Push to segment with dynamic tags: Tags feature helps you segment devices and push to them
according to your needs, whether you are sending to one segment or an expression of segments (e.g.
active AND lives in Seattle NOT new user). Instead of being restricted to pub-sub, you can update
device tags anywhere and anytime.
o Localized push: Templates feature helps achieve localization without affecting backend code.
o Silent push: You can enable the push-to-pull pattern by sending silent notifications to devices and
triggering them to complete certain pulls or actions.
o Scheduled push: You can schedule notifications to be sent at any time.
o Direct push: You can skip registering devices with the service and directly batch-push to a list of device
handles.
o Personalized push: Device push variables help you send device-specific, personalized push
notifications with customized key-value pairs.
Rich telemetry
o General push, device, error, and operation telemetry is available in the Azure portal and
programmatically.
o Per-message telemetry tracks each push from your initial request call until the service successfully
batches the pushes out.
o Platform Notification System feedback surfaces all feedback from the PNSes to assist in debugging.
Scalability
o Send messages to millions of devices quickly, without re-architecting or device sharding.
Security
o Shared Access Signature (SAS) or federated authentication.
Integration with App Service Mobile Apps
To facilitate a seamless and unified experience across Azure services, App Service Mobile Apps has built-in support
for push notifications using Notification Hubs. App Service Mobile Apps offers a highly scalable, globally available
mobile application development platform for Enterprise Developers and System Integrators that brings a rich set of
capabilities to mobile developers.
Mobile Apps developers can utilize Notification Hubs with the following workflow:
1. Retrieve device PNS handle
2. Register device with Notification Hubs through convenient Mobile Apps Client SDK register API
Note that Mobile Apps strips away all tags on registrations for security purposes. Work with
Notification Hubs from your backend directly to associate tags with devices.
3. Send notifications from your app backend with Notification Hubs
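As a hedged sketch of steps 1-2 above using the Mobile Apps managed client SDK (the app URL is a placeholder, and the PNS handle is assumed to have been obtained from WNS, APNS, or FCM in step 1):

using System.Threading.Tasks;
using Microsoft.WindowsAzure.MobileServices;

class PushRegistration
{
    // Hypothetical App Service Mobile App endpoint; replace with your own.
    static readonly MobileServiceClient Client =
        new MobileServiceClient("https://contoso-mobile.azurewebsites.net");

    // pnsHandle is the platform-specific handle retrieved from the PNS in step 1.
    public static Task RegisterForPushAsync(string pnsHandle)
    {
        // Step 2: register the handle with the notification hub linked to the mobile app.
        return Client.GetPush().RegisterAsync(pnsHandle);
    }
}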
Here are some conveniences brought to developers with this integration:
Mobile Apps Client SDKs: These multi-platform SDKs provide simple APIs for registration and talk to the
notification hub linked up with the mobile app automatically. Developers do not need to dig through
Notification Hubs credentials and work with an additional service.
o Push to user: The SDKs automatically tag the given device with the Mobile Apps authenticated user ID to
enable the push-to-user scenario.
o Push to device: The SDKs automatically use the Mobile Apps Installation ID as a GUID to register with
Notification Hubs, saving developers the trouble of maintaining multiple service GUIDs.
Installation model: Mobile Apps works with Notification Hubs' latest push model to represent all push
properties associated with a device in a JSON Installation that aligns with Push Notification Services and is
easy to use.
Flexibility: Developers can always choose to work with Notification Hubs directly even with the integration in
place.
Integrated experience in Azure portal: Push as a capability is represented visually in Mobile Apps and
developers can easily work with the associated notification hub through Mobile Apps.


CHAPTER 6: Design advanced applications


Big Compute: HPC & Batch
Large-scale cloud computing power on demand
Azure provides on-demand compute resources that enable you to run large parallel
and batch compute jobs in the cloud. Extend your on-premises HPC cluster to the
cloud when you need more capacity, or run work entirely in Azure. Scale easily and
take advantage of advanced networking features such as RDMA to run true HPC
applications using MPI to get the results you want, when you need them.

Cloud-enable cluster applications


Azure Batch provides job scheduling and auto-scaling of compute resources as a
platform service, making it easy to run large-scale parallel and HPC applications in
the cloud. By using the Batch SDKs and Batch service, you can configure one or
more applications to run on demand or on a schedule across a pool of VMs.
Describe how the data should be distributed, what parameters to use for each task, and the command line to get
started. Azure Batch handles both scale and scheduling, managing the execution as a whole.
Learn more about Azure Batch

True HPC capabilities in the cloud, on demand


The performance and scalability of a world-class supercomputing center is now
available to everyone, on demand in the cloud. Run your Windows and Linux HPC
applications using high performance A8 and A9 compute instances on Azure, and
take advantage of a backend network with MPI latency under 3 microseconds and
non-blocking 32 Gbps throughput. This backend network includes remote direct
memory access (RDMA) technology on Windows and Linux that enables parallel
applications to scale to thousands of cores. Azure provides you with high memory
and HPC-class CPUs to help you get results fast. Scale up and down based upon
what you need and pay only for what you use to reduce costs.
Learn more about high performance A8 and A9 compute instances
Read about the performance improvements ANEO achieved with Azure

Linux and open source support


Customers running high performance computing workloads on Linux can tap into the power of Azure. Azure supports
Linux remote direct memory access (RDMA) technology on high performance A8 and A9 compute instances, enabling
scientists and engineers to solve complex problems using many popular industry-standard applications for Linux, or by
bringing their own Linux application to Azure. You can also use HPC Pack to schedule jobs on Linux Virtual Machines in
Azure, giving you the ability to use a single job scheduling solution for your Linux and Windows HPC applications.
Learn more about Linux RDMA support on Azure Virtual Machines

Extend your HPC cluster or run entirely in the cloud


With Microsoft HPC Pack you can deploy an on-premises Windows compute cluster and
dynamically extend to Azure when you need additional capacity. You can also use HPC Pack to
deploy a cluster entirely on Azure and connect to it over a VPN or the Internet. Do all this without
compromising performance thanks to a wide range of compute options, including memory
intensive and compute intensive instances. Get started today!
Build a high performance cluster by spinning up compute resources on demand

Broad partner ecosystem


Take advantage of a wide range of Linux and Windows applications, libraries, and tools from independent software
vendors with solutions across industries such as financial services, engineering, oil and gas, life sciences, and digital
content creation. Your existing cluster manager and job scheduler can work with Azure Virtual Machines. Microsoft
partners including Excelian, Cycle Computing, Techila, Rescale, Fixstars, and Nimbo can help make the cloud work for
you.

Run intrinsically parallel workloads with Batch


Azure Batch is a platform service for running large-scale parallel and high-performance computing (HPC) applications
efficiently in the cloud. Azure Batch schedules compute-intensive work to run on a managed collection of virtual
machines, and can automatically scale compute resources to meet the needs of your jobs.
With Azure Batch, you can easily define Azure compute resources to execute your applications in parallel, and at scale.
There's no need to manually create, configure, and manage an HPC cluster, individual virtual machines, virtual
networks, or a complex job and task scheduling infrastructure. Azure Batch automates or simplifies these tasks for
you.
Use cases for Batch
Batch is a managed Azure service that is used for batch processing or batch computing--running a large volume of
similar tasks for a desired result. Batch computing is most commonly used by organizations that regularly process,
transform, and analyze large volumes of data.
Batch works well with intrinsically parallel (also known as "embarrassingly parallel") applications and workloads.
Intrinsically parallel workloads are those that are easily split into multiple tasks that perform work simultaneously on
many computers.

Some examples of workloads that are commonly processed using this technique are:
Financial risk modeling
Climate and hydrology data analysis
Image rendering, analysis, and processing
Media encoding and transcoding
Genetic sequence analysis
Engineering stress analysis
Software testing
Batch can also perform parallel calculations with a reduce step at the end, and execute more complex HPC workloads
such as Message Passing Interface (MPI) applications.
For a comparison between Batch and other HPC solution options in Azure, see Batch and HPC solutions.
Pricing
Azure Batch is a free service; you aren't charged for the Batch account itself. You are charged for the underlying Azure
compute resources that your Batch solutions consume, and for the resources consumed by other services when your
workloads run. For example, you are charged for the compute nodes (VMs) in your pools and for the data you store in
Azure Storage as input or output for your tasks. Similarly, if you use the application packages feature of Batch, you are
charged for the Azure Storage resources used for storing your application packages. See Batch pricing for more
information.
Low-priority VMs can significantly reduce the cost of Batch workloads. For information about pricing for low-priority
VMs, see Batch Pricing.
Scenario: Scale out a parallel workload
A common solution that uses the Batch APIs to interact with the Batch service involves scaling out intrinsically parallel
work--such as the rendering of images for 3D scenes--on a pool of compute nodes. This pool of compute nodes can be
your "render farm" that provides tens, hundreds, or even thousands of cores to your rendering job, for example.
The following diagram shows a common Batch workflow, with a client application or hosted service using Batch to run
a parallel workload.

In this common scenario, your application or service processes a computational workload in Azure Batch by
performing the following steps:
1. Upload the input files and the application that will process those files to your Azure Storage account. The input
files can be any data that your application will process, such as financial modeling data, or video files to be
transcoded. The application files can be any application that is used for processing the data, such as a 3D
rendering application or media transcoder.
2. Create a Batch pool of compute nodes in your Batch account--these nodes are the virtual machines that will
execute your tasks. You specify properties such as the node size, their operating system, and the location in
Azure Storage of the application to install when the nodes join the pool (the application that you uploaded in
step #1). You can also configure the pool to automatically scale in response to the workload that your tasks
generate. Auto-scaling dynamically adjusts the number of compute nodes in the pool.
3. Create a Batch job to run the workload on the pool of compute nodes. When you create a job, you associate it
with a Batch pool.
4. Add tasks to the job. When you add tasks to a job, the Batch service automatically schedules the tasks for
execution on the compute nodes in the pool. Each task uses the application that you uploaded to process the
input files.
4a. Before a task executes, it can download the data (the input files) that it is to process to the
compute node it is assigned to. If the application has not already been installed on the node (see step
#2), it can be downloaded here instead. When the downloads are complete, the tasks execute on
their assigned nodes.
5. As the tasks run, you can query Batch to monitor the progress of the job and its tasks. Your client application
or service communicates with the Batch service over HTTPS. Because you may be monitoring thousands of
tasks running on thousands of compute nodes, be sure to query the Batch service efficiently.
6. As the tasks complete, they can upload their result data to Azure Storage. You can also retrieve files directly
from the file system on a compute node.
7. When your monitoring detects that the tasks in your job have completed, your client application or service
can download the output data for further processing or evaluation.
Keep in mind this is just one way to use Batch, and this scenario describes only a few of its available features. For
example, you can execute multiple tasks in parallel on each compute node, and you can use job preparation and
completion tasks to prepare the nodes for your jobs, then clean up afterward.
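As an illustrative sketch only (the account endpoint, keys, pool and job names, node count, and render command line are all placeholder values), steps 2-4 of this workflow might look like the following with the Batch .NET client library:

using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

class BatchWorkflowSketch
{
    static void Main()
    {
        // Hypothetical Batch account endpoint, name, and key.
        var credentials = new BatchSharedKeyCredentials(
            "https://mybatchaccount.westus.batch.azure.com", "mybatchaccount", "<account-key>");

        using (BatchClient batchClient = BatchClient.Open(credentials))
        {
            // Step 2: create a pool of compute nodes (Cloud Services configuration, 4 nodes).
            CloudPool pool = batchClient.PoolOperations.CreatePool(
                "renderpool", "STANDARD_A1", new CloudServiceConfiguration("5"), 4);
            pool.Commit();

            // Step 3: create a job and associate it with the pool.
            CloudJob job = batchClient.JobOperations.CreateJob();
            job.Id = "renderjob";
            job.PoolInformation = new PoolInformation { PoolId = "renderpool" };
            job.Commit();

            // Step 4: add tasks; the Batch service schedules them onto the pool's nodes.
            for (int i = 0; i < 100; i++)
            {
                var task = new CloudTask("task" + i, "cmd /c render.exe frame" + i + ".dat");
                batchClient.JobOperations.AddTask("renderjob", task);
            }
        }
    }
}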

What is Azure Scheduler?


Azure Scheduler allows you to declaratively describe actions to run in the cloud. It then schedules and runs those
actions automatically. You can create and manage these scheduled jobs by using the Azure portal, code, the REST API, or Azure PowerShell.
Scheduler creates, maintains, and invokes scheduled work. Scheduler does not host any workloads or run any code. It
only invokes code hosted elsewhere: in Azure, on-premises, or with another provider. It invokes via HTTP, HTTPS, a
storage queue, a Service Bus queue, or a Service Bus topic.
Scheduler schedules jobs, keeps a history of job execution results that one can review, and deterministically and
reliably schedules workloads to be run. Azure WebJobs (part of the Web Apps feature in Azure App Service) and other
Azure scheduling capabilities use Scheduler in the background. The Scheduler REST API helps manage the
communication for these actions. As such, Scheduler supports complex schedules and advanced recurrence easily.
There are several scenarios that lend themselves to the usage of Scheduler. For example:
Recurring application actions: Periodically gathering data from Twitter into a feed.
Daily maintenance: Daily pruning of logs, performing backups, and other maintenance tasks. For example, an
administrator may choose to back up the database at 1:00 A.M. every day for the next nine months.
Scheduler allows you to create, update, delete, view, and manage jobs and job collections programmatically, by using
scripts, and in the portal.

Azure Service Bus


Whether an application or service runs in the cloud or on premises, it often needs to interact with other applications
or services. To provide a broadly useful way to do this, Microsoft Azure offers Service Bus. This article looks at this
technology, describing what it is and why you might want to use it.
Service Bus fundamentals
Different situations call for different styles of communication. Sometimes, letting applications send and receive
messages through a simple queue is the best solution. In other situations, an ordinary queue isn't enough; a queue
with a publish-and-subscribe mechanism is better. In some cases, all that's needed is a connection between
applications, and queues are not required. Service Bus provides all three options, enabling your applications to
interact in several different ways.
Service Bus is a multi-tenant cloud service, which means that the service is shared by multiple users. Each user, such
as an application developer, creates a namespace, then defines the communication mechanisms needed within that
namespace. Figure 1 shows how this architecture looks.


Figure 1: Service Bus provides a multi-tenant service for connecting applications through the cloud.
Within a namespace, you can use one or more instances of three different communication mechanisms, each of
which connects applications in a different way. The choices are:
Queues, which allow one-directional communication. Each queue acts as an intermediary (sometimes called
a broker) that stores sent messages until they are received. Each message is received by a single recipient.
Topics, which provide one-directional communication using subscriptions; a single topic can have multiple
subscriptions. Like a queue, a topic acts as a broker, but each subscription can optionally use a filter to receive
only messages that match specific criteria.
Relays, which provide bi-directional communication. Unlike queues and topics, a relay doesn't store in-flight
messages; it's not a broker. Instead, it just passes them on to the destination application.
When you create a queue, topic, or relay, you give it a name. Combined with whatever you called your namespace,
this name creates a unique identifier for the object. Applications can provide this name to Service Bus, then use that
queue, topic, or relay to communicate with one another.
To use any of these objects in the relay scenario, Windows applications can use Windows Communication Foundation
(WCF). This service is known as WCF Relay. For queues and topics, Windows applications can use Service Bus-defined
messaging APIs. To make these objects easier to use from non-Windows applications, Microsoft provides SDKs for
Java, Node.js, and other languages. You can also access queues and topics using REST APIs over HTTPS.
It's important to understand that even though Service Bus itself runs in the cloud (that is, in Microsoft's Azure
datacenters), applications that use it can run anywhere. You can use Service Bus to connect applications running on
Azure, for example, or applications running inside your own datacenter. You can also use it to connect an application
running on Azure or another cloud platform with an on-premises application or with tablets and phones. It's even
possible to connect household appliances, sensors, and other devices to a central application or to one other. Service
Bus is a communication mechanism in the cloud that's accessible from pretty much anywhere. How you use it
depends on what your applications need to do.
Queues
Suppose you decide to connect two applications using a Service Bus queue. Figure 2 illustrates this situation.


Figure 2: Service Bus queues provide one-way asynchronous queuing.


The process is simple: A sender sends a message to a Service Bus queue, and a receiver picks up that message at some
later time. A queue can have just a single receiver, as Figure 2 shows. Or, multiple applications can read from the
same queue. In the latter situation, each message is read by just one receiver. For a multi-cast service, you should use
a topic instead.
Each message has two parts: a set of properties, each a key/value pair, and a message payload. The payload can be
binary, text, or even XML. How they're used depends on what an application is trying to do. For example, an
application sending a message about a recent sale might include the properties Seller="Ava" and Amount=10000. The
message body might contain a scanned image of the sale's signed contract or, if there isn't one, remain empty.
A receiver can read a message from a Service Bus queue in two different ways. The first option,
called ReceiveAndDelete, removes a message from the queue and immediately deletes it. This option is simple, but if
the receiver crashes before it finishes processing the message, the message is lost. Because it's been removed from
the queue, no other receiver can access it.
The second option, PeekLock, is meant to help with this problem. Like ReceiveAndDelete, a PeekLock read retrieves a
message from the queue. Rather than deleting the message, however, it locks it, making it invisible to
other receivers, then waits for one of three events:
If the receiver processes the message successfully, it calls Complete(), and the queue deletes the message.
If the receiver decides that it can't process the message successfully, it calls Abandon(). The queue then
removes the lock from the message and makes it available to other receivers.
If the receiver calls neither of these methods within a configurable period of time (by default, 60 seconds), the
queue assumes the receiver has failed. In this case, it behaves as if the receiver had called Abandon, making
the message available to other receivers.
Notice what can happen here: the same message might be delivered twice, perhaps to two different receivers.
Applications using Service Bus queues must be prepared for this event. To make duplicate detection easier, each
message has a unique MessageID property that by default stays the same no matter how many times the message is
read from a queue.
Queues are useful in quite a few situations. They enable applications to communicate even when both aren't running
at the same time, something that's especially handy with batch and mobile applications. A queue with multiple
receivers also provides automatic load balancing, since sent messages are spread across these receivers.
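A hedged sketch of this receive loop with the classic Microsoft.ServiceBus.Messaging client (connection string and queue name are placeholders):

using System;
using Microsoft.ServiceBus.Messaging;

class PeekLockReceiver
{
    static void ProcessOne()
    {
        // Hypothetical connection string and queue; replace with your own values.
        var client = QueueClient.CreateFromConnectionString(
            "<Service Bus connection string>", "ordersqueue", ReceiveMode.PeekLock);

        BrokeredMessage message = client.Receive(TimeSpan.FromSeconds(5));
        if (message == null) return;          // nothing to process right now

        try
        {
            // ... process the message body here ...
            message.Complete();               // success: the queue deletes the message
        }
        catch (Exception)
        {
            message.Abandon();                // failure: unlock so another receiver can try
        }
    }
}

If neither Complete nor Abandon is called within the lock duration, Service Bus behaves as if Abandon had been called, as described above.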
Topics


Useful as they are, queues aren't always the right solution. Sometimes, Service Bus topics are better. Figure 3
illustrates this idea.

Figure 3: Based on the filter a subscribing application specifies, it can receive some or all the messages sent to a Service
Bus topic.
A topic is similar in many ways to a queue. Senders submit messages to a topic in the same way that they submit
messages to a queue, and those messages look the same as with queues. The difference is that topics enable each
receiving application to create its own subscription by defining a filter. A subscriber then sees only the messages that
match that filter. For example, Figure 3 shows a sender and a topic with three subscribers, each with its own filter:
Subscriber 1 receives only messages that contain the property Seller="Ava".
Subscriber 2 receives messages that contain the property Seller="Ruby" and/or contain an Amount property
whose value is greater than 100,000. Perhaps Ruby is the sales manager, so she wants to see both her own
sales and all large sales regardless of who makes them.
Subscriber 3 has set its filter to True, which means that it receives all messages. For example, this application
might be responsible for maintaining an audit trail and therefore it needs to see all the messages.
As with queues, subscribers to a topic can read messages using either ReceiveAndDelete or PeekLock. Unlike queues,
however, a single message sent to a topic can be received by multiple subscriptions. This approach, commonly
called publish and subscribe (or pub/sub), is useful whenever multiple applications are interested in the same
messages. By defining the right filter, each subscriber can tap into just the part of the message stream that it needs to
see.
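As a hedged illustration of Figure 3 (the topic and subscription names are hypothetical, using the NamespaceManager API shown later in this chapter), the three subscriptions could be created with SQL filters like these:

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class SalesTopicSetup
{
    static void CreateSubscriptions(NamespaceManager namespaceManager)
    {
        // Subscriber 1: only Ava's sales.
        namespaceManager.CreateSubscription("salestopic", "AvaSales",
            new SqlFilter("Seller = 'Ava'"));

        // Subscriber 2: Ruby's sales, plus any large sale.
        namespaceManager.CreateSubscription("salestopic", "RubyAndLargeSales",
            new SqlFilter("Seller = 'Ruby' OR Amount > 100000"));

        // Subscriber 3: everything, for the audit trail.
        namespaceManager.CreateSubscription("salestopic", "AuditTrail",
            new TrueFilter());
    }
}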
Relays
Both queues and topics provide one-way asynchronous communication through a broker. Traffic flows in just one
direction, and there's no direct connection between senders and receivers. But what if you don't want this
connection? Suppose your applications need to both send and receive messages, or perhaps you want a direct link
between them and you don't need a broker to store messages. To address scenarios such as this, Service Bus
provides relays, as Figure 4 shows.


Figure 4: Service Bus relay provides synchronous, two-way communication between applications.
The obvious question to ask about relays is this: why would I use one? Even if I don't need queues, why make
applications communicate via a cloud service rather than just interact directly? The answer is that talking directly can
be harder than you might think.
Suppose you want to connect two on-premises applications, both running inside corporate datacenters. Each of these
applications sits behind a firewall, and each datacenter probably uses network address translation (NAT). The firewall
blocks incoming data on all but a few ports, and NAT implies that the machine each application is running on doesn't
have a fixed IP address that you can reach directly from outside the datacenter. Without some extra help, connecting
these applications over the public internet is problematic.
A Service Bus relay can help. To communicate bi-directionally through a relay, each application establishes an
outbound TCP connection with Service Bus, then keeps it open. All communication between the two applications
travels over these connections. Because each connection was established from inside the datacenter, the firewall
allows incoming traffic to each application without opening new ports. This approach also gets around the NAT
problem, because each application has a consistent endpoint in the cloud throughout the communication. By
exchanging data through the relay, the applications can avoid the problems that would otherwise make
communication difficult.
To use Service Bus relays, applications rely on the Windows Communication Foundation (WCF). Service Bus provides
WCF bindings that make it straightforward for Windows applications to interact via relays. Applications that already
use WCF can typically specify one of these bindings, then talk to each other through a relay. Unlike queues and topics,
however, using relays from non-Windows applications, while possible, requires some programming effort; no
standard libraries are provided.
Unlike queues and topics, applications don't explicitly create relays. Instead, when an application that wishes to
receive messages establishes a TCP connection with Service Bus, a relay is created automatically. When the
connection is dropped, the relay is deleted. To enable an application to find the relay created by a specific listener,
Service Bus provides a registry that enables applications to locate a specific relay by name.
Relays are the right solution when you need direct communication between applications. For example, consider an
airline reservation system running in an on-premises datacenter that must be accessed from check-in kiosks, mobile
devices, and other computers. Applications running on all these systems could rely on Service Bus relays in the cloud
to communicate, wherever they might be running.
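For illustration only (the namespace, SAS key, and contract are hypothetical), a WCF relay listener typically follows this shape: the service opens an outbound connection to Service Bus, which creates the relay endpoint.

using System;
using System.ServiceModel;
using Microsoft.ServiceBus;

[ServiceContract]
interface IEchoService
{
    [OperationContract]
    string Echo(string text);
}

class EchoService : IEchoService
{
    public string Echo(string text) => "Echo: " + text;
}

class RelayListener
{
    static void Main()
    {
        // Hypothetical namespace and SAS credentials.
        var address = ServiceBusEnvironment.CreateServiceUri("sb", "contoso-relay", "echo");
        var tokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(
            "RootManageSharedAccessKey", "<key>");

        var host = new ServiceHost(typeof(EchoService));
        var endpoint = host.AddServiceEndpoint(typeof(IEchoService), new NetTcpRelayBinding(), address);
        endpoint.EndpointBehaviors.Add(new TransportClientEndpointBehavior { TokenProvider = tokenProvider });

        host.Open();   // the outbound connection to Service Bus creates the relay
        Console.WriteLine("Listening on " + address);
        Console.ReadLine();
        host.Close();
    }
}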
Summary
Connecting applications has always been part of building complete solutions, and the range of scenarios that require
applications and services to communicate with each other is set to increase as more applications and devices are
connected to the internet. By providing cloud-based technologies for achieving communication through queues,
topics, and relays, Service Bus aims to make this essential function easier to implement and more broadly available.

Storage queues and Service Bus queues - compared and contrasted


This article analyzes the differences and similarities between the two types of queues offered by Microsoft Azure
today: Storage queues and Service Bus queues. By using this information, you can compare and contrast the
respective technologies and be able to make a more informed decision about which solution best meets your needs.
Introduction
Azure supports two types of queue mechanisms: Storage queues and Service Bus queues.
Storage queues, which are part of the Azure storage infrastructure, feature a simple REST-based Get/Put/Peek
interface, providing reliable, persistent messaging within and between services.
Service Bus queues are part of a broader Azure messaging infrastructure that supports queuing as well as
publish/subscribe, and more advanced integration patterns. For more information about Service Bus
queues/topics/subscriptions, see the overview of Service Bus.
While both queuing technologies exist concurrently, Storage queues were introduced first, as a dedicated queue
storage mechanism built on top of Azure Storage services. Service Bus queues are built on top of the broader
"messaging" infrastructure designed to integrate applications or application components that may span multiple
communication protocols, data contracts, trust domains, and/or network environments.
Technology selection considerations


Both Storage queues and Service Bus queues are implementations of the message queuing service currently offered
on Microsoft Azure. Each has a slightly different feature set, which means you can choose one or the other, or use
both, depending on the needs of your particular solution or business/technical problem you are solving.
When determining which queuing technology fits the purpose for a given solution, solution architects and developers
should consider the recommendations below. For more details, see the next section.
As a solution architect/developer, you should consider using Storage queues when:
Your application must store over 80 GB of messages in a queue, where the messages have a lifetime shorter
than 7 days.
Your application wants to track progress for processing a message inside of the queue. This is useful if the
worker processing a message crashes. A subsequent worker can then use that information to continue from
where the prior worker left off.
You require server side logs of all of the transactions executed against your queues.
As a solution architect/developer, you should consider using Service Bus queues when:
Your solution must be able to receive messages without having to poll the queue. With Service Bus, this can
be achieved through the use of the long-polling receive operation using the TCP-based protocols that Service
Bus supports.
Your solution requires the queue to provide a guaranteed first-in-first-out (FIFO) ordered delivery.
You want a symmetric experience in Azure and on Windows Server (private cloud). For more information,
see Service Bus for Windows Server.
Your solution must be able to support automatic duplicate detection.
You want your application to process messages as parallel long-running streams (messages are associated
with a stream using the SessionId property on the message). In this model, each node in the consuming
application competes for streams, as opposed to messages. When a stream is given to a consuming node, the
node can examine the application stream state using transactions.
Your solution requires transactional behavior and atomicity when sending or receiving multiple messages
from a queue.
The time-to-live (TTL) characteristic of the application-specific workload can exceed the 7-day period.
Your application handles messages that can exceed 64 KB but will not likely approach the 256 KB limit.
You deal with a requirement to provide a role-based access model to the queues, and different
rights/permissions for senders and receivers.
Your queue size will not grow larger than 80 GB.
You want to use the AMQP 1.0 standards-based messaging protocol. For more information about AMQP,
see Service Bus AMQP Overview.
You can envision an eventual migration from queue-based point-to-point communication to a message
exchange pattern that enables seamless integration of additional receivers (subscribers), each of which
receives independent copies of either some or all messages sent to the queue. The latter refers to the
publish/subscribe capability natively provided by Service Bus.
Your messaging solution must be able to support the "At-Most-Once" delivery guarantee without the need for
you to build the additional infrastructure components.
You would like to be able to publish and consume batches of messages.
Comparing Storage queues and Service Bus queues
The tables in the following sections provide a logical grouping of queue features and let you compare, at a glance, the
capabilities available in both Storage queues and Service Bus queues.
Foundational capabilities
This section compares some of the fundamental queuing capabilities provided by Storage queues and Service Bus
queues.

Ordering guarantee
o Storage queues: No. For more information, see the first note in the Additional Information section.
o Service Bus queues: Yes - First-In-First-Out (FIFO), through the use of messaging sessions.
Delivery guarantee
o Storage queues: At-Least-Once.
o Service Bus queues: At-Least-Once; At-Most-Once.
Atomic operation support
o Storage queues: No.
o Service Bus queues: Yes.
Receive behavior
o Storage queues: Non-blocking (completes immediately if no new message is found).
o Service Bus queues: Blocking with/without timeout (offers long polling, or the "Comet technique"); non-blocking (through the use of the .NET managed API only).
Push-style API
o Storage queues: No.
o Service Bus queues: Yes (the OnMessage and OnMessage sessions .NET API).
Receive mode
o Storage queues: Peek & Lease.
o Service Bus queues: Peek & Lock; Receive & Delete.
Exclusive access mode
o Storage queues: Lease-based.
o Service Bus queues: Lock-based.
Lease/Lock duration
o Storage queues: 30 seconds (default), 7 days (maximum). You can renew or release a message lease using the UpdateMessage API.
o Service Bus queues: 60 seconds (default). You can renew a message lock using the RenewLock API.
Lease/Lock precision
o Storage queues: Message level. Each message can have a different timeout value, which you can then update as needed while processing the message, by using the UpdateMessage API.
o Service Bus queues: Queue level. Each queue has a lock precision applied to all of its messages, but you can renew the lock using the RenewLock API.
Batched receive
o Storage queues: Yes (explicitly specifying the message count when retrieving messages, up to a maximum of 32 messages).
o Service Bus queues: Yes (implicitly by enabling a prefetch property, or explicitly through the use of transactions).
Batched send
o Storage queues: No.
o Service Bus queues: Yes (through the use of transactions or client-side batching).
Additional information
Messages in Storage queues are typically first-in-first-out, but sometimes they can be out of order; for
example, when a message's visibility timeout duration expires (for example, as a result of a client application
crashing during processing). When the visibility timeout expires, the message becomes visible again on the
queue for another worker to dequeue it. At that point, the newly visible message might be placed in the
queue (to be dequeued again) after a message that was originally enqueued after it.
The guaranteed FIFO pattern in Service Bus queues requires the use of messaging sessions. In the event that
the application crashes while processing a message received in the Peek & Lock mode, the next time a queue
receiver accepts a messaging session, it will start with the failed message after its time-to-live (TTL) period
expires.
Storage queues are designed to support standard queuing scenarios, such as decoupling application
components to increase scalability and tolerance for failures, load leveling, and building process workflows.
Service Bus queues support the At-Least-Once delivery guarantee. In addition, the At-Most-Once semantic can
be supported by using session state to store the application state and by using transactions to atomically
receive messages and update the session state.
Storage queues provide a uniform and consistent programming model across queues, tables, and BLOBs, both
both for developers and for operations teams.
Service Bus queues provide support for local transactions in the context of a single queue.
The Receive and Delete mode supported by Service Bus provides the ability to reduce the messaging
operation count (and associated cost) in exchange for lowered delivery assurance.
Storage queues provide leases with the ability to extend the leases for messages. This allows the workers to
maintain short leases on messages. Thus, if a worker crashes, the message can be quickly processed again by
another worker. In addition, a worker can extend the lease on a message if it needs to process it longer than
the current lease time.
Storage queues offer a visibility timeout that you can set upon the enqueueing or dequeuing of a message. In
addition, you can update a message with different lease values at run-time, and update different values across
messages in the same queue. Service Bus lock timeouts are defined in the queue metadata; however, you can
renew the lock by calling the RenewLock method.
The maximum timeout for a blocking receive operation in Service Bus queues is 24 days. However, REST-
based timeouts have a maximum value of 55 seconds.
Client-side batching provided by Service Bus enables a queue client to batch multiple messages into a single
send operation. Batching is only available for asynchronous send operations.
Features such as the 200 TB ceiling of Storage queues (more when you virtualize accounts) and unlimited
queues make it an ideal platform for SaaS providers.
Storage queues provide a flexible and performant delegated access control mechanism.
Advanced capabilities
This section compares advanced capabilities provided by Storage queues and Service Bus queues.

Scheduled delivery
o Storage queues: Yes.
o Service Bus queues: Yes.
Automatic dead lettering
o Storage queues: No.
o Service Bus queues: Yes.
Increasing queue time-to-live value
o Storage queues: Yes (via in-place update of the visibility timeout).
o Service Bus queues: Yes (provided via a dedicated API function).
Poison message support
o Storage queues: Yes.
o Service Bus queues: Yes.
In-place update
o Storage queues: Yes.
o Service Bus queues: Yes.
Server-side transaction log
o Storage queues: Yes.
o Service Bus queues: No.
Storage metrics
o Storage queues: Yes. Minute Metrics provides real-time metrics for availability, TPS, API call counts, error counts, and more, aggregated per minute and reported within a few minutes of what just happened in production. For more information, see About Storage Analytics Metrics.
o Service Bus queues: Yes (bulk queries by calling GetQueues).
State management
o Storage queues: No.
o Service Bus queues: Yes (Microsoft.ServiceBus.Messaging.EntityStatus.Active, Disabled, SendDisabled, ReceiveDisabled).
Message auto-forwarding
o Storage queues: No.
o Service Bus queues: Yes.
Purge queue function
o Storage queues: Yes.
o Service Bus queues: No.
Message groups
o Storage queues: No.
o Service Bus queues: Yes (through the use of messaging sessions).
Application state per message group
o Storage queues: No.
o Service Bus queues: Yes.
Duplicate detection
o Storage queues: No.
o Service Bus queues: Yes (configurable on the sender side).
Browsing message groups
o Storage queues: No.
o Service Bus queues: Yes.
Fetching message sessions by ID
o Storage queues: No.
o Service Bus queues: Yes.
Additional information
Both queuing technologies enable a message to be scheduled for delivery at a later time.
Queue auto-forwarding enables thousands of queues to auto-forward their messages to a single queue, from
which the receiving application consumes the message. You can use this mechanism to achieve security and
flow control, and to isolate storage between message publishers.
Storage queues provide support for updating message content. You can use this functionality for persisting
state information and incremental progress updates into the message so that it can be processed from the
last known checkpoint, instead of starting from scratch. With Service Bus queues, you can enable the same
scenario through the use of message sessions. Sessions enable you to save and retrieve the application
processing state (by using SetState and GetState).
Dead lettering, which is only supported by Service Bus queues, can be useful for isolating messages that
cannot be processed successfully by the receiving application or when messages cannot reach their
destination due to an expired time-to-live (TTL) property. The TTL value specifies how long a message remains
in the queue. With Service Bus, the message will be moved to a special queue called $DeadLetterQueue when
the TTL period expires.
To find "poison" messages in Storage queues, when dequeuing a message the application examines
the DequeueCount property of the message. If DequeueCount is greater than a given threshold, the
application moves the message to an application-defined "dead letter" queue (a short sketch follows this list).
Storage queues enable you to obtain a detailed log of all of the transactions executed against the queue, as
well as aggregated metrics. Both of these options are useful for debugging and understanding how your
application uses Storage queues. They are also useful for performance-tuning your application and reducing
the costs of using queues.
The concept of "message sessions" supported by Service Bus enables messages that belong to a certain logical
group to be associated with a given receiver, which in turn creates a session-like affinity between messages
and their respective receivers. You can enable this advanced functionality in Service Bus by setting
the SessionID property on a message. Receivers can then listen on a specific session ID and receive messages
that share the specified session identifier.
The duplication detection functionality supported by Service Bus queues automatically removes duplicate
messages sent to a queue or topic, based on the value of the MessageId property.
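Building on the poison-message note above, here is a hedged sketch using the classic Azure Storage client library; the threshold and queue names are hypothetical:

using System;
using Microsoft.WindowsAzure.Storage.Queue;

class PoisonMessageHandler
{
    const int MaxDequeueCount = 5;   // hypothetical threshold

    static void ProcessNext(CloudQueue workQueue, CloudQueue deadLetterQueue)
    {
        CloudQueueMessage message = workQueue.GetMessage(TimeSpan.FromMinutes(1));
        if (message == null) return;

        if (message.DequeueCount > MaxDequeueCount)
        {
            // Too many failed attempts: park it in an application-defined dead-letter queue.
            deadLetterQueue.AddMessage(new CloudQueueMessage(message.AsString));
            workQueue.DeleteMessage(message);
            return;
        }

        // ... process the message here; delete only on success ...
        workQueue.DeleteMessage(message);
    }
}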
Capacity and quotas
This section compares Storage queues and Service Bus queues from the perspective of capacity and quotas that may
apply.

Maximum queue size
o Storage queues: 500 TB (limited to a single storage account's capacity).
o Service Bus queues: 1 GB to 80 GB (defined upon creation of a queue and by enabling partitioning; see the Additional Information section).
Maximum message size
o Storage queues: 64 KB (48 KB when using Base64 encoding). Azure supports large messages by combining queues and blobs, at which point you can enqueue up to 200 GB for a single item.
o Service Bus queues: 256 KB or 1 MB (including both header and body; maximum header size: 64 KB). Depends on the service tier.
Maximum message TTL
o Storage queues: 7 days.
o Service Bus queues: TimeSpan.Max.
Maximum number of queues
o Storage queues: Unlimited.
o Service Bus queues: 10,000 (per service namespace; can be increased).
Maximum number of concurrent clients
o Storage queues: Unlimited.
o Service Bus queues: Unlimited (the 100 concurrent connection limit only applies to TCP protocol-based communication).

Additional information
Service Bus enforces queue size limits. The maximum queue size is specified upon creation of the queue and
can have a value between 1 and 80 GB. If the queue size value set on creation of the queue is reached,
additional incoming messages will be rejected and an exception will be received by the calling code. For more
information about quotas in Service Bus, see Service Bus Quotas.
In the Standard tier, you can create Service Bus queues in 1, 2, 3, 4, or 5 GB sizes (the default is 1 GB). In the
Premium tier, you can create queues up to 80 GB in size. In Standard tier, with partitioning enabled (which is
the default), Service Bus creates 16 partitions for each GB you specify. As such, if you create a queue that is 5
GB in size, with 16 partitions the maximum queue size becomes (5 * 16) = 80 GB. You can see the maximum
size of your partitioned queue or topic by looking at its entry on the Azure portal. In the Premium tier, only 2
partitions are created per queue.


With Storage queues, if the content of the message is not XML-safe, then it must be Base64 encoded. If
you Base64-encode the message, the user payload can be up to 48 KB, instead of 64 KB.
With Service Bus queues, each message stored in a queue is composed of two parts: a header and a body. The
total size of the message cannot exceed the maximum message size supported by the service tier.
When clients communicate with Service Bus queues over the TCP protocol, the maximum number of
concurrent connections to a single Service Bus queue is limited to 100. This number is shared between
senders and receivers. If this quota is reached, subsequent requests for additional connections will be
rejected and an exception will be received by the calling code. This limit is not imposed on clients connecting
to the queues using REST-based API.
If you require more than 10,000 queues in a single Service Bus namespace, you can contact the Azure support
team and request an increase. To scale beyond 10,000 queues with Service Bus, you can also create additional
namespaces using the Azure portal.
Management and operations
This section compares the management features provided by Storage queues and Service Bus queues.

Management protocol
o Storage queues: REST over HTTP/HTTPS.
o Service Bus queues: REST over HTTPS.
Runtime protocol
o Storage queues: REST over HTTP/HTTPS.
o Service Bus queues: REST over HTTPS; AMQP 1.0 standard (TCP with TLS).
.NET API
o Storage queues: Yes (.NET Storage Client API).
o Service Bus queues: Yes (.NET Service Bus API).
Native C++
o Storage queues: Yes.
o Service Bus queues: Yes.
Java API
o Storage queues: Yes.
o Service Bus queues: Yes.
PHP API
o Storage queues: Yes.
o Service Bus queues: Yes.
Node.js API
o Storage queues: Yes.
o Service Bus queues: Yes.
Arbitrary metadata support
o Storage queues: Yes.
o Service Bus queues: No.
Queue naming rules
o Storage queues: Up to 63 characters long (letters in a queue name must be lowercase).
o Service Bus queues: Up to 260 characters long (queue paths and names are case-insensitive).
Get queue length function
o Storage queues: Yes (approximate value if messages expire beyond the TTL without being deleted).
o Service Bus queues: Yes (exact, point-in-time value).
Peek function
o Storage queues: Yes.
o Service Bus queues: Yes.

Additional information
Storage queues provide support for arbitrary attributes that can be applied to the queue description, in the
form of name/value pairs.
Both queue technologies offer the ability to peek a message without having to lock it, which can be useful
when implementing a queue explorer/browser tool.
The Service Bus .NET brokered messaging APIs leverage full-duplex TCP connections for improved
performance when compared to REST over HTTP, and they support the AMQP 1.0 standard protocol.
Names of Storage queues can be 3-63 characters long and can contain lowercase letters, numbers, and hyphens.
For more information, see Naming Queues and Metadata.
Service Bus queue names can be up to 260 characters long and have less restrictive naming rules. Service Bus
queue names can contain letters, numbers, periods, hyphens, and underscores.

Authentication and authorization


This section discusses the authentication and authorization features supported by Storage queues and Service Bus
queues.

Authentication
o Storage queues: Symmetric key.
o Service Bus queues: Symmetric key.
Security model
o Storage queues: Delegated access via SAS tokens.
o Service Bus queues: SAS.
Identity provider federation
o Storage queues: No.
o Service Bus queues: Yes.

Additional information
Every request to either of the queuing technologies must be authenticated. Public queues with anonymous
access are not supported. Using SAS, you can address this scenario by publishing a write-only SAS, read-only
SAS, or even a full-access SAS.
The authentication scheme provided by Storage queues involves the use of a symmetric key, which is a hash-
based Message Authentication Code (HMAC), computed with the SHA-256 algorithm and encoded as
a Base64 string. For more information about the respective protocol, see Authentication for the Azure Storage
Services. Service Bus queues support a similar model using symmetric keys. For more information, see Shared
Access Signature Authentication with Service Bus.
Conclusion
By gaining a deeper understanding of the two technologies, you will be able to make a more informed decision on
which queue technology to use, and when. The decision on when to use Storage queues or Service Bus queues clearly
depends on a number of factors. These factors may depend heavily on the individual needs of your application and its
architecture. If your application already uses the core capabilities of Microsoft Azure, you may prefer to choose
Storage queues, especially if you require basic communication and messaging between services or need queues that
can be larger than 80 GB in size.
Because Service Bus queues provide a number of advanced features, such as sessions, transactions, duplicate
detection, automatic dead-lettering, and durable publish/subscribe capabilities, they may be a preferred choice if you
are building a hybrid application or if your application otherwise requires these features.

Service Bus queues, topics, and subscriptions


Microsoft Azure Service Bus supports a set of cloud-based, message-oriented middleware technologies including
reliable message queuing and durable publish/subscribe messaging. These "brokered" messaging capabilities can be
thought of as decoupled messaging features that support publish-subscribe, temporal decoupling, and load balancing
scenarios using the Service Bus messaging fabric. Decoupled communication has many advantages; for example,
clients and servers can connect as needed and perform their operations in an asynchronous fashion.
The messaging entities that form the core of the messaging capabilities in Service Bus are queues, topics and
subscriptions, and rules/actions.
Queues
Queues offer First In, First Out (FIFO) message delivery to one or more competing consumers. That is, messages are
typically expected to be received and processed by the receivers in the order in which they were added to the queue,
and each message is received and processed by only one message consumer. A key benefit of using queues is to
achieve "temporal decoupling" of application components. In other words, the producers (senders) and consumers
(receivers) do not have to be sending and receiving messages at the same time, because messages are stored durably
in the queue. Furthermore, the producer does not have to wait for a reply from the consumer in order to continue to
process and send messages.
A related benefit is "load leveling," which enables producers and consumers to send and receive messages at different
rates. In many applications, the system load varies over time; however, the processing time required for each unit of
work is typically constant. Intermediating message producers and consumers with a queue means that the consuming
application only has to be provisioned to be able to handle average load instead of peak load. The depth of the queue
grows and contracts as the incoming load varies. This directly saves money with regard to the amount of
infrastructure required to service the application load. As the load increases, more worker processes can be added to
read from the queue. Each message is processed by only one of the worker processes. Furthermore, this pull-based
load balancing allows for optimum use of the worker computers even if the worker computers differ with regard to
processing power, as they will pull messages at their own maximum rate. This pattern is often termed the "competing
consumer" pattern.
Using queues to intermediate between message producers and consumers provides an inherent loose coupling
between the components. Because producers and consumers are not aware of each other, a consumer can be
upgraded without having any effect on the producer.
Creating a queue is a multi-step process. You perform management operations for Service Bus messaging entities
(both queues and topics) via the Microsoft.ServiceBus.NamespaceManager class, which is constructed by supplying
the base address of the Service Bus namespace and the user credentials. NamespaceManager provides methods to
create, enumerate and delete messaging entities. After creating a Microsoft.ServiceBus.TokenProvider object from
the SAS name and key, and a service namespace management object, you can use
the Microsoft.ServiceBus.NamespaceManager.CreateQueue method to create the queue. For example:
// Create management credentials
TokenProvider credentials =
TokenProvider.CreateSharedAccessSignatureTokenProvider(sasKeyName,sasKeyValue);
// Create namespace client
NamespaceManager namespaceClient = new NamespaceManager(ServiceBusEnvironment.CreateServiceUri("sb",
ServiceNamespace, string.Empty), credentials);
You can then create a queue object and a messaging factory with the Service Bus URI as an argument. For example:
QueueDescription myQueue;
myQueue = namespaceClient.CreateQueue("TestQueue");
MessagingFactory factory = MessagingFactory.Create(ServiceBusEnvironment.CreateServiceUri("sb",
ServiceNamespace, string.Empty), credentials);
QueueClient myQueueClient = factory.CreateQueueClient("TestQueue");
You can then send messages to the queue. For example, if you have a list of brokered messages called MessageList,
the code appears similar to the following:
for (int count = 0; count < 6; count++)
{
    var issue = MessageList[count];
    issue.Label = issue.Properties["IssueTitle"].ToString();
    myQueueClient.Send(issue);
}
You then receive messages from the queue as follows:
while ((message = myQueueClient.Receive(new TimeSpan(hours: 0, minutes: 0, seconds: 5))) != null)
{
    Console.WriteLine(string.Format("Message received: {0}, {1}, {2}", message.SequenceNumber,
        message.Label, message.MessageId));
    message.Complete();

    Console.WriteLine("Processing message (sleeping...)");
    Thread.Sleep(1000);
}
In the ReceiveAndDelete mode, the receive operation is single-shot; that is, when Service Bus receives the request, it
marks the message as being consumed and returns it to the application. ReceiveAndDelete mode is the simplest
model and works best for scenarios in which the application can tolerate not processing a message in the event of a
failure. To understand this, consider a scenario in which the consumer issues the receive request and then crashes
before processing it. Because Service Bus marks the message as being consumed, when the application restarts and
begins consuming messages again, it will have missed the message that was consumed prior to the crash.
In PeekLock mode, the receive operation becomes two-stage, which makes it possible to support applications that
cannot tolerate missing messages. When Service Bus receives the request, it finds the next message to be consumed,
locks it to prevent other consumers from receiving it, and then returns it to the application. After the application
finishes processing the message (or stores it reliably for future processing), it completes the second stage of the
receive process by calling Complete on the received message. When Service Bus sees the Complete call, it marks the
message as being consumed.
If the application is unable to process the message for some reason, it can call the Abandon method on the received
message (instead of Complete). This enables Service Bus to unlock the message and make it available to be received
again, either by the same consumer or by another competing consumer. Second, there is a timeout associated with
the lock: if the application fails to process the message before the lock timeout expires (for example, if the
application crashes), then Service Bus unlocks the message and makes it available to be received again (essentially
performing an Abandon operation by default).
Note that in the event that the application crashes after processing the message, but before the Complete request is
issued, the message is redelivered to the application when it restarts. This is often called At Least Once processing;
that is, each message is processed at least once. However, in certain situations the same message may be redelivered.
If the scenario cannot tolerate duplicate processing, then additional logic is required in the application to detect
duplicates which can be achieved based upon the MessageId property of the message, which remains constant across
delivery attempts. This is known as Exactly Once processing.
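As a hedged sketch of such application-level duplicate detection based on the MessageId property (the in-memory set is only illustrative; a production implementation would use durable storage):

using System;
using System.Collections.Generic;
using Microsoft.ServiceBus.Messaging;

class DuplicateAwareReceiver
{
    // Illustrative only: an in-memory set survives neither restarts nor scale-out.
    static readonly HashSet<string> ProcessedIds = new HashSet<string>();

    static void Receive(QueueClient queueClient)
    {
        BrokeredMessage message;
        while ((message = queueClient.Receive(TimeSpan.FromSeconds(5))) != null)
        {
            if (ProcessedIds.Contains(message.MessageId))
            {
                message.Complete();           // already handled; just remove the duplicate
                continue;
            }

            // ... process the message here ...
            ProcessedIds.Add(message.MessageId);
            message.Complete();
        }
    }
}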
Topics and subscriptions
In contrast to queues, in which each message is processed by a single consumer, topics and subscriptions provide a
one-to-many form of communication, in a publish/subscribe pattern. Useful for scaling to very large numbers of
recipients, each published message is made available to each subscription registered with the topic. Messages are
sent to a topic and delivered to one or more associated subscriptions, depending on filter rules that can be set on a
per-subscription basis. The subscriptions can use additional filters to restrict the messages that they want to receive.
Messages are sent to a topic in the same way they are sent to a queue, but messages are not received from the topic
directly. Instead, they are received from subscriptions. A topic subscription resembles a virtual queue that receives
copies of the messages that are sent to the topic. Messages are received from a subscription identically to the way
they are received from a queue.
By way of comparison, the message-sending functionality of a queue maps directly to a topic and its message-
receiving functionality maps to a subscription. Among other things, this means that subscriptions support the same
patterns described earlier in this section with regard to queues: competing consumer, temporal decoupling, load
leveling, and load balancing.
Creating a topic is similar to creating a queue, as shown in the example in the previous section. Create the service URI,
and then use the NamespaceManager class to create the namespace client. You can then create a topic using
the CreateTopic method. For example:
TopicDescription dataCollectionTopic = namespaceClient.CreateTopic("DataCollectionTopic");
Next, add subscriptions as desired:
SubscriptionDescription myAgentSubscription = namespaceClient.CreateSubscription(dataCollectionTopic.Path,
"Inventory");
SubscriptionDescription myAuditSubscription = namespaceClient.CreateSubscription(dataCollectionTopic.Path,
"Dashboard");
You can then create a topic client. For example:
MessagingFactory factory = MessagingFactory.Create(serviceUri, tokenProvider);
TopicClient myTopicClient = factory.CreateTopicClient(dataCollectionTopic.Path);
Using the topic client, you can send messages to the topic. For example:
foreach (BrokeredMessage message in messageList)
{
    myTopicClient.Send(message);
    Console.WriteLine(
        string.Format("Message sent: Id = {0}, Body = {1}", message.MessageId, message.GetBody<string>()));
}
Similar to queues, messages are received from a subscription using a SubscriptionClient object instead of
a QueueClient object. Create the subscription client, passing the name of the topic, the name of the subscription, and
(optionally) the receive mode as parameters. For example, with the Inventory and Dashboard subscriptions:
// Create the subscription clients
MessagingFactory factory = MessagingFactory.Create(serviceUri, tokenProvider);

SubscriptionClient agentSubscriptionClient = factory.CreateSubscriptionClient("DataCollectionTopic",
    "Inventory", ReceiveMode.PeekLock);
SubscriptionClient auditSubscriptionClient = factory.CreateSubscriptionClient("DataCollectionTopic",
    "Dashboard", ReceiveMode.ReceiveAndDelete);

BrokeredMessage message;

// Receive from the Inventory subscription using PeekLock mode; Complete marks each message as consumed.
while ((message = agentSubscriptionClient.Receive(TimeSpan.FromSeconds(5))) != null)
{
    Console.WriteLine("\nReceiving message from Inventory...");
    Console.WriteLine(string.Format("Message received: Id = {0}, Body = {1}", message.MessageId,
        message.GetBody<string>()));
    message.Complete();
}

// Receive from the Dashboard subscription using ReceiveAndDelete mode; messages are removed as soon as they are received.
while ((message = auditSubscriptionClient.Receive(TimeSpan.FromSeconds(5))) != null)
{
    Console.WriteLine("\nReceiving message from Dashboard...");
    Console.WriteLine(string.Format("Message received: Id = {0}, Body = {1}", message.MessageId,
        message.GetBody<string>()));
}
Rules and actions
In many scenarios, messages that have specific characteristics must be processed in different ways. To enable this,
you can configure subscriptions to find messages that have desired properties and then perform certain modifications
to those properties. Although Service Bus subscriptions see all messages sent to the topic, only a subset of those
messages is copied to the virtual subscription queue. This is accomplished using subscription filters; the
modifications to message properties are called filter actions. When a subscription is created, you can supply a filter
expression that operates on the properties of the message, both the system properties (for example, Label) and custom
application properties (for example, StoreName). The SQL filter expression is optional; without a SQL filter
expression, any filter action defined on a subscription is performed on all the messages for that subscription.
Using the previous example, to filter messages coming only from Store1, you would create the Dashboard subscription
as follows:
namespaceClient.CreateSubscription("DataCollectionTopic", "Dashboard", new SqlFilter("StoreName = 'Store1'"));
With this subscription filter in place, only messages that have the StoreName property set to Store1 are copied to the
virtual queue for the Dashboard subscription.
For more information about possible filter values, see the documentation for the SqlFilter and SqlRuleAction classes.
Also, see the Brokered Messaging: Advanced Filters and Topic Filters samples.
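A filter action can also modify message properties as messages are copied to a subscription. The following minimal
sketch combines a SqlFilter with a SqlRuleAction; the AuditLog subscription name and the AuditFlag property are
hypothetical additions, not part of the preceding example:
namespaceClient.CreateSubscription(
    new SubscriptionDescription("DataCollectionTopic", "AuditLog"),
    new RuleDescription
    {
        Filter = new SqlFilter("StoreName = 'Store1'"),     // copy only Store1 messages
        Action = new SqlRuleAction("SET AuditFlag = 1")     // stamp each copy with a custom property
    });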

What is Event Hubs?


Azure Event Hubs is a highly scalable data streaming platform and event ingestion service capable of receiving and
processing millions of events per second. Event Hubs can process and store events, data, or telemetry produced by
distributed software and devices. Data sent to an event hub can be transformed and stored using any real-time
analytics provider or batching/storage adapters. With the ability to provide publish-subscribe capabilities with low
latency and at massive scale, Event Hubs serves as the "on ramp" for Big Data.
Why use Event Hubs?
The event and telemetry handling capabilities of Event Hubs make it especially useful for:
Application instrumentation
User experience or workflow processing
Internet of Things (IoT) scenarios
For example, Event Hubs enables behavior tracking in mobile apps, traffic information from web farms, in-game event
capture in console games, or telemetry collected from industrial machines, connected vehicles, or other devices.
Azure Event Hubs overview
The common role that Event Hubs plays in solution architectures is the "front door" for an event pipeline, often called
an event ingestor. An event ingestor is a component or service that sits between event publishers and event
consumers to decouple the production of an event stream from the consumption of those events. The following figure
depicts this architecture:

Event Hubs provides message stream handling capability but has characteristics that are different from traditional
enterprise messaging. Event Hubs capabilities are built around high throughput and event processing scenarios. As
such, Event Hubs is different from Azure Service Bus messaging, and does not implement some of the capabilities that
are available for Service Bus messaging entities, such as topics.
Event Hubs features
Event Hubs contains the following key elements:
Event producers/publishers: An entity that sends data to an event hub. An event is published via AMQP 1.0 or
HTTPS.
Partitions: Enable each consumer to read only a specific subset, or partition, of the event stream.
SAS tokens: Used to identify and authenticate the event publisher.
Event consumers: An entity that reads event data from an event hub. Event consumers connect via AMQP 1.0.
Consumer groups: Provide each consuming application with a separate view of the event stream, enabling those
consumers to act independently.
Throughput units: Pre-purchased units of capacity. A single partition has a maximum scale of one throughput
unit.
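As a minimal sketch of publishing to an event hub with the classic Microsoft.ServiceBus.Messaging client (the
connection string, event hub name, and payload are placeholder values; the System.Text namespace is assumed for
Encoding):
string connectionString = "<Event Hubs namespace connection string>";
EventHubClient eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, "myeventhub");

// A partition key routes related events (here, events from one device) to the same partition.
string telemetry = "{\"deviceId\":\"sensor-01\",\"temperature\":21.5}";
eventHubClient.Send(new EventData(Encoding.UTF8.GetBytes(telemetry)) { PartitionKey = "sensor-01" });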

Develop large-scale parallel compute solutions with Batch


In this overview of the core components of the Azure Batch service, we discuss the primary service features and
resources that Batch developers can use to build large-scale parallel compute solutions.
Whether you're developing a distributed computational application or service that issues direct REST API calls or
you're using one of the Batch SDKs, you'll use many of the resources and features discussed in this article.
Tip

For a higher-level introduction to the Batch service, see Basics of Azure Batch.
Batch service workflow
The following high-level workflow is typical of nearly all applications and services that use the Batch service for
processing parallel workloads:
1. Upload the data files that you want to process to an Azure Storage account. Batch includes built-in support for
accessing Azure Blob storage, and your tasks can download these files to compute nodes when the tasks are
run.
2. Upload the application files that your tasks will run. These files can be binaries or scripts and their
dependencies, and are executed by the tasks in your jobs. Your tasks can download these files from your
Storage account, or you can use the application packages feature of Batch for application management and
deployment.
3. Create a pool of compute nodes. When you create a pool, you specify the number of compute nodes for the
pool, their size, and the operating system. When each task in your job runs, it's assigned to execute on one of
the nodes in your pool.
4. Create a job. A job manages a collection of tasks. You associate each job to a specific pool where that job's
tasks will run.
5. Add tasks to the job. Each task runs the application or script that you uploaded to process the data files it
downloads from your Storage account. As each task completes, it can upload its output to Azure Storage.
6. Monitor job progress and retrieve the task output from Azure Storage.
The following sections discuss these and the other resources of Batch that enable your distributed computational
scenario.
Note: You need a Batch account to use the Batch service. Also, nearly all solutions use an Azure Storage account for
file storage and retrieval. Batch currently supports only the General purpose storage account type, as described in step
5 of Create a storage account in About Azure storage accounts.
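As a rough illustration of steps 3 through 5 above, the following is a minimal sketch using the Batch .NET library
(Microsoft.Azure.Batch and Microsoft.Azure.Batch.Auth). The account URL, key, pool ID, job ID, and command lines are
placeholder values, and parameter names can vary slightly between SDK versions:
using System.Collections.Generic;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

var credentials = new BatchSharedKeyCredentials(
    "https://mybatchaccount.westeurope.batch.azure.com", "mybatchaccount", "<account key>");

using (BatchClient batchClient = BatchClient.Open(credentials))
{
    // 3. Create a pool of three small Windows nodes (Cloud Services configuration, OS family 5).
    CloudPool pool = batchClient.PoolOperations.CreatePool(
        "mypool", "small", new CloudServiceConfiguration("5"), 3);
    pool.Commit();

    // 4. Create a job and associate it with the pool.
    CloudJob job = batchClient.JobOperations.CreateJob();
    job.Id = "myjob";
    job.PoolInformation = new PoolInformation { PoolId = "mypool" };
    job.Commit();

    // 5. Add tasks to the job; each task runs a command line on a compute node.
    var tasks = new List<CloudTask>
    {
        new CloudTask("task1", "cmd /c echo Processing file1.txt"),
        new CloudTask("task2", "cmd /c echo Processing file2.txt")
    };
    batchClient.JobOperations.AddTask("myjob", tasks);
}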
Batch service resources
Some of the following resources--accounts, compute nodes, pools, jobs, and tasks--are required by all solutions that
use the Batch service. Others, like job schedules and application packages, are helpful, but optional, features.
Account
Compute node
Pool
Job
o Job schedules
Task
o Start task
o Job manager task
o Job preparation and release tasks
o Multi-instance task (MPI)
o Task dependencies
Application packages
Account
A Batch account is a uniquely identified entity within the Batch service. All processing is associated with a Batch
account.
You can create an Azure Batch account using the Azure portal or programmatically, such as with the Batch
Management .NET library. When creating the account, you can associate an Azure storage account.
Batch supports two account configurations, and you'll need to select the appropriate configuration when you create
your Batch account. The difference between the two account configurations lies in how Batch pools are allocated for
the account. You can either allocate pools of compute nodes in a subscription managed by Azure Batch, or you can
allocate them in your own subscription. The pool allocation mode property for the account determines which
configuration it uses.
To decide which account configuration to use, consider which best fits your scenario:
Batch Service: Batch Service is the default account configuration. For an account created with this
configuration, Batch pools are allocated behind the scenes in Azure-managed subscriptions. Keep in mind
these key points about the Batch Service account configuration:
o The Batch Service account configuration supports both Cloud Service and Virtual Machine pools.
o The Batch Service account configuration supports access to the Batch APIs using either shared key
authentication or Azure Active Directory authentication.
o You can use either dedicated or low-priority compute nodes in pools in the Batch Service account
configuration.
o Do not use the Batch Service account configuration if you plan to create Azure virtual machine pools
from custom VM images, or if you plan to use a virtual network. Create your account with the User
Subscription account configuration instead.
o Virtual Machine pools provisioned in an account with the Batch Service account configuration must be
created from Azure Virtual Machines Marketplace images.
User subscription: With the User Subscription account configuration, Batch pools are allocated in the Azure
subscription where the account is created. Keep in mind these key points about the User Subscription account
configuration:
o The User Subscription account configuration supports only Virtual Machine pools. It does not support
Cloud Services pools.
o To create Virtual Machine pools from custom VM images or to use a virtual network with Virtual
Machine pools, you must use the User Subscription configuration.
o You must authenticate requests to the Batch service using Azure Active Directory authentication.
o The User Subscription account configuration requires you to set up an Azure key vault for your Batch
account.
o You can use only dedicated compute nodes in pools in an account created with the User Subscription
account configuration. Low-priority nodes are not supported.
o Virtual Machine pools provisioned in an account with the User Subscription account configuration can
be created either from Azure Virtual Machines Marketplace images, or from custom images that you
provide.
Compute node
A compute node is an Azure virtual machine (VM) or cloud service VM that is dedicated to processing a portion of
your application's workload. The size of a node determines the number of CPU cores, memory capacity, and local file
system size that is allocated to the node. You can create pools of Windows or Linux nodes by using either Azure Cloud
Services or Virtual Machines Marketplace images. See the following Pool section for more information on these
options.
Nodes can run any executable or script that is supported by the operating system environment of the node. This
includes *.exe, *.cmd, *.bat and PowerShell scripts for Windows--and binaries, shell, and Python scripts for Linux.
All compute nodes in Batch also include:
A standard folder structure and associated environment variables that are available for reference by tasks.
Firewall settings that are configured to control access.
Remote access to both Windows (Remote Desktop Protocol (RDP)) and Linux (Secure Shell (SSH)) nodes.
Pool
A pool is a collection of nodes that your application runs on. The pool can be created manually by you, or
automatically by the Batch service when you specify the work to be done. You can create and manage a pool that
meets the resource requirements of your application. A pool can be used only by the Batch account in which it was
created. A Batch account can have more than one pool.
Azure Batch pools build on top of the core Azure compute platform. They provide large-scale allocation, application
installation, data distribution, health monitoring, and flexible adjustment of the number of compute nodes within a
pool (scaling).
Every node that is added to a pool is assigned a unique name and IP address. When a node is removed from a pool,
any changes that are made to the operating system or files are lost, and its name and IP address are released for
future use. When a node leaves a pool, its lifetime is over.
When you create a pool, you can specify the following attributes. Some settings differ, depending on the pool
allocation mode of the Batch account:
Compute node operating system and version
Compute node type and target number of nodes
Size of the compute nodes
Scaling policy
Task scheduling policy
Communication status for compute nodes


Start tasks for compute nodes
Application packages
Network configuration
Each of these settings is described in more detail in the following sections.
Important: Batch accounts created with the Batch Service configuration have a default quota that limits the number of
cores in a Batch account. The number of cores corresponds to the number of compute nodes. You can find the default
quotas and instructions on how to increase a quota in Quotas and limits for the Azure Batch service. If your pool is not
achieving its target number of nodes, the core quota might be the reason.
Batch accounts created with the User Subscription configuration do not observe the Batch service quotas. Instead,
they share in the core quota for the specified subscription. For more information, see Virtual Machines limits in Azure
subscription and service limits, quotas, and constraints.
Compute node operating system and version
When you create a Batch pool, you can specify the Azure virtual machine configuration and the type of operating
system you want to run on each compute node in the pool. The two types of configurations available in Batch are:
The Virtual Machine Configuration, which specifies that the pool is comprised of Azure virtual machines. These
VMs may be created from either Linux or Windows images.
When you create a pool based on the Virtual Machine Configuration, you must specify not only the size of the nodes
and the source of the images used to create them, but also the virtual machine image reference and the Batch node
agent SKU to be installed on the nodes. For more information about specifying these pool properties, see Provision
Linux compute nodes in Azure Batch pools.
The Cloud Services Configuration, which specifies that the pool is comprised of Azure Cloud Services nodes.
Cloud Services provide Windows compute nodes only.
Available operating systems for Cloud Services Configuration pools are listed in the Azure Guest OS releases and SDK
compatibility matrix. When you create a pool that contains Cloud Services nodes, you need to specify the node size
and its OS Family. Cloud Services are deployed to Azure more quickly than virtual machines running Windows. If you
want pools of Windows compute nodes, you may find that Cloud Services provide a performance benefit in terms of
deployment time.
o The OS Family also determines which versions of .NET are installed with the OS.
o As with worker roles within Cloud Services, you can specify an OS Version (for more information on
worker roles, see the Tell me about cloud services section in the Cloud Services overview).
o As with worker roles, we recommend that you specify * for the OS Version so that the nodes are
automatically upgraded, and there is no work required to cater to newly released versions. The
primary use case for selecting a specific OS version is to ensure application compatibility, which allows
backward compatibility testing to be performed before allowing the version to be updated. After
validation, the OS Version for the pool can be updated and the new OS image can be installed--any
running tasks are interrupted and requeued.
See the Account section for information on setting the pool allocation mode when you create a Batch account.
Custom images for Virtual Machine pools
To use custom images for your Virtual Machine pools, create your Batch account with the User Subscription account
configuration. With this configuration, Batch pools are allocated into the subscription where the account resides. See
the Account section for information on setting the pool allocation mode when you create a Batch account.
To create a Virtual Machine Configuration pool using a custom image, you'll need one or more standard Azure Storage
accounts to store your custom VHD images. Custom images are stored as blobs. To reference your custom images
when you create a pool, specify the URIs of the custom image VHD blobs for the osDisk property of
the virtualMachineConfiguration property.
Make sure that your storage accounts meet the following criteria:
The storage accounts containing the custom image VHD blobs need to be in the same subscription as the
Batch account (the user subscription).
The specified storage accounts need to be in the same region as the Batch account.
Only standard storage accounts are currently supported. Azure Premium storage will be supported in the
future.
You can specify one storage account with multiple custom VHD blobs, or multiple storage accounts each
having a single blob. We recommend using multiple storage accounts for better performance.
One unique custom image VHD blob can support up to 40 Linux VM instances or 20 Windows VM instances.
You will need to create copies of the VHD blob to create pools with more VMs. For example, a pool with 200
Windows VMs needs 10 unique VHD blobs specified for the osDisk property.
When you create a pool, you need to select the appropriate nodeAgentSkuId, depending on the OS of the base image
of your VHD. You can get a mapping of available node agent SKU IDs to their OS Image references by calling the List
Supported Node Agent SKUs operation.
To create a pool from a custom image using the Azure portal:
1. Navigate to your Batch account in the Azure portal.
2. On the Settings blade, select the Pools menu item.
3. On the Pools blade, select the Add command; the Add pool blade will be displayed.
4. Select Custom Image (Linux/Windows) from the Image Type dropdown. The portal displays the Custom
Image picker. Choose one or more VHDs from the same container and click the Select button. Support for
multiple VHDs from different storage accounts and different containers will be added in the future.
5. Select the correct Publisher/Offer/Sku for your custom VHDs, select the desired Caching mode, then fill in all
the other parameters for the pool.
6. To check if a pool is based on a custom image, see the Operating System property in the resource summary
section of the Pool blade. The value of this property should be Custom VM image.
7. All custom VHDs associated with a pool are displayed on the pool's Properties blade.
Compute node type and target number of nodes
When you create a pool, you can specify which types of compute nodes you want and the target number for each.
The two types of compute nodes are:
Dedicated compute nodes. Dedicated compute nodes are reserved for your workloads. They are more
expensive than low-priority nodes, but they are guaranteed to never be preempted.
Low-priority compute nodes. Low-priority nodes take advantage of surplus capacity in Azure to run your Batch
workloads. Low-priority nodes are less expensive per hour than dedicated nodes, and enable workloads
requiring a lot of compute power. For more information, see Use low-priority VMs with Batch.
Low-priority compute nodes may be preempted when Azure has insufficient surplus capacity. If a node is preempted
while running tasks, the tasks are requeued and run again once a compute node becomes available again. Low-priority
nodes are a good option for workloads where the job completion time is flexible and the work is distributed across
many nodes. Before you decide to use low-priority nodes for your scenario, make sure that any work lost due to
preemption will be minimal and easy to recreate.
Low-priority compute nodes are available only for Batch accounts created with the pool allocation mode set to Batch
Service.
You can have both low-priority and dedicated compute nodes in the same pool. Each type of node (low-priority and
dedicated) has its own target setting, for which you can specify the desired number of nodes.
The number of compute nodes is referred to as a target because, in some situations, your pool might not reach the
desired number of nodes. For example, a pool might not achieve the target if it reaches the core quota for your Batch
account first. Or, the pool might not achieve the target if you have applied an auto-scaling formula to the pool that
limits the maximum number of nodes.
For pricing information for both low-priority and dedicated compute nodes, see Batch Pricing.
Size of the compute nodes
Cloud Services Configuration compute node sizes are listed in Sizes for Cloud Services. Batch supports all Cloud
Services sizes except ExtraSmall, STANDARD_A1_V2, and STANDARD_A2_V2.
Virtual Machine Configuration compute node sizes are listed in Sizes for virtual machines in Azure (Linux) and Sizes for
virtual machines in Azure(Windows). Batch supports all Azure VM sizes except STANDARD_A0 and those with
premium storage (STANDARD_GS, STANDARD_DS, and STANDARD_DSV2 series).
When selecting a compute node size, consider the characteristics and requirements of the applications you'll run on
the nodes. Aspects like whether the application is multithreaded and how much memory it consumes can help
determine the most suitable and cost-effective node size. It's typical to select a node size assuming one task will run
on a node at a time. However, it is possible to have multiple tasks (and therefore multiple application instances) run in
parallel on compute nodes during job execution. In this case, it is common to choose a larger node size to
accommodate the increased demand of parallel task execution. See Task scheduling policy for more information.
All of the nodes in a pool are the same size. If you intend to run applications with differing system requirements
and/or load levels, we recommend that you use separate pools.
Scaling policy
For dynamic workloads, you can write and apply an auto-scaling formula to a pool. The Batch service periodically
evaluates your formula and adjusts the number of nodes within the pool based on various pool, job, and task
parameters that you can specify.
Task scheduling policy
The max tasks per node configuration option determines the maximum number of tasks that can be run in parallel on
each compute node within the pool.
The default configuration specifies that one task at a time runs on a node, but there are scenarios where it is
beneficial to have two or more tasks executed on a node simultaneously. See the example scenario in the concurrent
node tasks article to see how you can benefit from multiple tasks per node.
You can also specify a fill type which determines whether Batch spreads the tasks evenly across all nodes in a pool, or
packs each node with the maximum number of tasks before assigning tasks to another node.
Communication status for compute nodes
In most scenarios, tasks operate independently and do not need to communicate with one another. However, there
are some applications in which tasks must communicate, like MPI scenarios.
You can configure a pool to allow internode communication, so that nodes within a pool can communicate at runtime.
When internode communication is enabled, nodes in Cloud Services Configuration pools can communicate with each
other on ports greater than 1100, and Virtual Machine Configuration pools do not restrict traffic on any port.
Note that enabling internode communication also impacts the placement of the nodes within clusters and might limit
the maximum number of nodes in a pool because of deployment restrictions. If your application does not require
communication between nodes, the Batch service can allocate a potentially large number of nodes to the pool from
many different clusters and datacenters to enable increased parallel processing power.
Start tasks for compute nodes
The optional start task executes on each node as that node joins the pool, and each time a node is restarted or
reimaged. The start task is especially useful for preparing compute nodes for the execution of tasks, like installing the
applications that your tasks run on the compute nodes.
Application packages
You can specify application packages to deploy to the compute nodes in the pool. Application packages provide
simplified deployment and versioning of the applications that your tasks run. Application packages that you specify for
a pool are installed on every node that joins that pool, and every time a node is rebooted or reimaged. Application
packages are currently unsupported on Linux compute nodes.
Network configuration
You can specify the subnet of an Azure virtual network (VNet) in which the pool's compute nodes should be created.
See the Pool network configuration section for more information.
Job
A job is a collection of tasks. It manages how computation is performed by its tasks on the compute nodes in a pool.
The job specifies the pool in which the work is to be run. You can create a new pool for each job, or use one
pool for many jobs. You can create a pool for each job that is associated with a job schedule, or for all jobs
that are associated with a job schedule.
You can specify an optional job priority. When a job is submitted with a higher priority than jobs that are
currently in progress, the tasks for the higher-priority job are inserted into the queue ahead of tasks for the
lower-priority jobs. Tasks in lower-priority jobs that are already running are not preempted.
You can use job constraints to specify certain limits for your jobs:
You can set a maximum wallclock time, so that if a job runs for longer than the maximum wallclock time that is
specified, the job and all of its tasks are terminated.
Batch can detect and then retry failed tasks. You can specify the maximum number of task retries as a constraint,
including whether a task is always or never retried. Retrying a task means that the task is requeued to be run again.
Your client application can add tasks to a job, or you can specify a job manager task. A job manager task
contains the information that is necessary to create the required tasks for a job, with the job manager task
being run on one of the compute nodes in the pool. The job manager task is handled specifically by Batch--it is
queued as soon as the job is created, and is restarted if it fails. A job manager task is required for jobs that are
created by a job schedule because it is the only way to define the tasks before the job is instantiated.
By default, jobs remain in the active state when all tasks within the job are complete. You can change this
behavior so that the job is automatically terminated when all tasks in the job are complete. Set the
job's onAllTasksComplete property (OnAllTasksComplete in Batch .NET) to terminatejob to automatically
terminate the job when all of its tasks are in the completed state.
Note that the Batch service considers a job with no tasks to have all of its tasks completed. Therefore, this option is
most commonly used with a job manager task. If you want to use automatic job termination without a job manager,
you should initially set a new job's onAllTasksComplete property to noaction, then set it to terminatejob only after
you've finished adding tasks to the job.
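Continuing the earlier Batch .NET sketch (batchClient already opened; job and pool IDs are placeholders, and the
OnAllTasksComplete enum lives in Microsoft.Azure.Batch.Common), automatic termination without a job manager task
might look like this:
// Don't allow automatic termination while tasks are still being added.
CloudJob job = batchClient.JobOperations.CreateJob();
job.Id = "myjob";
job.PoolInformation = new PoolInformation { PoolId = "mypool" };
job.OnAllTasksComplete = OnAllTasksComplete.NoAction;
job.Commit();

// ... add all of the job's tasks here ...

// Once every task has been added, switch the bound job to automatic termination.
CloudJob boundJob = batchClient.JobOperations.GetJob("myjob");
boundJob.OnAllTasksComplete = OnAllTasksComplete.TerminateJob;
boundJob.Commit();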
Job priority
You can assign a priority to jobs that you create in Batch. The Batch service uses the priority value of the job to
determine the order of job scheduling within an account (this is not to be confused with a scheduled job). The priority
values range from -1000 to 1000, with -1000 being the lowest priority and 1000 being the highest. To update the
priority of a job, call the Update the properties of a job operation (Batch REST), or modify
the CloudJob.Priority property (Batch .NET).
Within the same account, higher-priority jobs have scheduling precedence over lower-priority jobs. A job with a
higher-priority value in one account does not have scheduling precedence over another job with a lower-priority value
in a different account.
Job scheduling across pools is independent. Between different pools, it is not guaranteed that a higher-priority job is
scheduled first if its associated pool is short of idle nodes. In the same pool, jobs with the same priority level have an
equal chance of being scheduled.
Scheduled jobs
Job schedules enable you to create recurring jobs within the Batch service. A job schedule specifies when to run jobs
and includes the specifications for the jobs to be run. You can specify the duration of the schedule--how long and
when the schedule is in effect--and how frequently jobs are created during the scheduled period.
Task
A task is a unit of computation that is associated with a job. It runs on a node. Tasks are assigned to a node for
execution, or are queued until a node becomes free. Put simply, a task runs one or more programs or scripts on a
compute node to perform the work you need done.
When you create a task, you can specify the following (a brief sketch appears after this list):
The command line for the task. This is the command line that runs your application or script on the compute
node.
It is important to note that the command line does not actually run under a shell. Therefore, it cannot natively take
advantage of shell features like environment variable expansion (this includes the PATH). To take advantage of such
features, you must invoke the shell in the command line--for example, by launching cmd.exe on Windows nodes
or /bin/sh on Linux:
cmd /c MyTaskApplication.exe %MY_ENV_VAR%
/bin/sh -c MyTaskApplication $MY_ENV_VAR
If your tasks need to run an application or script that is not in the node's PATH or reference environment variables,
invoke the shell explicitly in the task command line.
Resource files that contain the data to be processed. These files are automatically copied to the node from
Blob storage in a general-purpose Azure Storage account before the task's command line is executed. For
more information, see the sections Start task and Files and directories.
The environment variables that are required by your application. For more information, see the Environment
settings for tasks section.
The constraints under which the task should execute. For example, constraints include the maximum time that
the task is allowed to run, the maximum number of times a failed task should be retried, and the maximum
time that files in the task's working directory are retained.
Application packages to deploy to the compute node on which the task is scheduled to run. Application
packages provide simplified deployment and versioning of the applications that your tasks run. Task-level
application packages are especially useful in shared-pool environments, where different jobs are run on one
pool, and the pool is not deleted when a job is completed. If your job has fewer tasks than nodes in the pool,
task application packages can minimize data transfer since your application is deployed only to the nodes that
run tasks.
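The following Batch .NET sketch shows several of these task-level settings together (continuing the earlier example;
all IDs and values are placeholders):
CloudTask task = new CloudTask("task1", "cmd /c MyTaskApplication.exe %MY_ENV_VAR%");

// Custom environment variables available to the task's process.
task.EnvironmentSettings = new List<EnvironmentSetting>
{
    new EnvironmentSetting("MY_ENV_VAR", "some-value")
};

// Constraints: maximum run time, working-directory retention, and retry count.
task.Constraints = new TaskConstraints(
    maxWallClockTime: TimeSpan.FromHours(1),
    retentionTime: TimeSpan.FromDays(1),
    maxTaskRetryCount: 3);

// A task-level application package deployed only to the nodes that run this task.
task.ApplicationPackageReferences = new List<ApplicationPackageReference>
{
    new ApplicationPackageReference { ApplicationId = "mytaskapp", Version = "1.0" }
};

batchClient.JobOperations.AddTask("myjob", task);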
In addition to tasks you define to perform computation on a node, the following special tasks are also provided by the
Batch service:
Start task
Job manager task


Job preparation and release tasks
Multi-instance tasks (MPI)
Task dependencies
Start task
By associating a start task with a pool, you can prepare the operating environment of its nodes. For example, you can
perform actions like installing the applications that your tasks run or starting background processes. The start task
runs every time a node starts, for as long as it remains in the pool--including when the node is first added to the pool
and when it is restarted or reimaged.
A primary benefit of the start task is that it can contain all of the information that is necessary to configure a compute
node and install the applications that are required for task execution. Therefore, increasing the number of nodes in a
pool is as simple as specifying the new target node count. The start task provides the Batch service the information
that is needed to configure the new nodes and get them ready for accepting tasks.
As with any Azure Batch task, you can specify a list of resource files in Azure Storage, in addition to a command line to
be executed. The Batch service first copies the resource files to the node from Azure Storage, and then runs the
command line. For a pool start task, the file list typically contains the task application and its dependencies.
However, the start task could also include reference data to be used by all tasks that are running on the compute
node. For example, a start task's command line could perform a robocopy operation to copy application files (which
were specified as resource files and downloaded to the node) from the start task's working directory to the shared
folder, and then run an MSI or setup.exe.
Important
Batch currently supports only the General purpose storage account type, as described in step 5 of Create a storage
account in About Azure storage accounts. Your Batch tasks (including standard tasks, start tasks, job preparation tasks,
and job release tasks) must specify resource files that reside only in General purpose storage accounts.
It is typically desirable for the Batch service to wait for the start task to complete before considering the node ready to
be assigned tasks, but you can configure this.
If a start task fails on a compute node, then the state of the node is updated to reflect the failure, and the node is not
assigned any tasks. A start task can fail if there is an issue copying its resource files from storage, or if the process
executed by its command line returns a nonzero exit code.
If you add or update the start task for an existing pool, you must reboot its compute nodes for the start task to be
applied to the nodes.
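A minimal Batch .NET sketch of attaching a start task to an existing pool follows. The pool ID, installer name, and
SAS URL are placeholders, and the ResourceFile constructor shown is the one from the classic Batch .NET library used
at the time of this guide (newer SDK versions construct resource files through static factory methods):
CloudPool pool = batchClient.PoolOperations.GetPool("mypool");

pool.StartTask = new StartTask
{
    // Install the task application when the node joins the pool.
    CommandLine = "cmd /c msiexec /i MyTaskApplication.msi /quiet",
    ResourceFiles = new List<ResourceFile>
    {
        new ResourceFile(
            "https://mystorage.blob.core.windows.net/installers/MyTaskApplication.msi?<SAS>",
            "MyTaskApplication.msi")
    },
    // Don't schedule tasks on the node until the start task succeeds.
    WaitForSuccess = true
};

pool.Commit();
// For an existing pool, reboot the compute nodes so the new start task takes effect.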
Job manager task
You typically use a job manager task to control and/or monitor job execution--for example, to create and submit the
tasks for a job, determine additional tasks to run, and determine when work is complete. However, a job manager task
is not restricted to these activities. It is a fully fledged task that can perform any actions that are required for the job.
For example, a job manager task might download a file that is specified as a parameter, analyze the contents of that
file, and submit additional tasks based on those contents.
A job manager task is started before all other tasks. It provides the following features:
It is automatically submitted as a task by the Batch service when the job is created.
It is scheduled to execute before the other tasks in a job.
Its associated node is the last to be removed from a pool when the pool is being downsized.
Its termination can be tied to the termination of all tasks in the job.
A job manager task is given the highest priority when it needs to be restarted. If an idle node is not available,
the Batch service might terminate one of the other running tasks in the pool to make room for the job
manager task to run.
A job manager task in one job does not have priority over the tasks of other jobs. Across jobs, only job-level
priorities are observed.
Job preparation and release tasks
Batch provides job preparation tasks for pre-job execution setup. Job release tasks are for post-job maintenance or
cleanup.
Job preparation task: A job preparation task runs on all compute nodes that are scheduled to run tasks, before
any of the other job tasks are executed. You can use a job preparation task to copy data that is shared by all
tasks, but is unique to the job, for example.

Job release task: When a job has completed, a job release task runs on each node in the pool that executed at
least one task. You can use a job release task to delete data that is copied by the job preparation task, or to
compress and upload diagnostic log data, for example.
Both job preparation and release tasks allow you to specify a command line to run when the task is invoked. They
offer features like file download, elevated execution, custom environment variables, maximum execution duration,
retry count, and file retention time.
For more information on job preparation and release tasks, see Run job preparation and completion tasks on Azure
Batch compute nodes.
Multi-instance task
A multi-instance task is a task that is configured to run on more than one compute node simultaneously. With multi-
instance tasks, you can enable high-performance computing scenarios that require a group of compute nodes that are
allocated together to process a single workload (like Message Passing Interface (MPI)).
For a detailed discussion on running MPI jobs in Batch by using the Batch .NET library, check out Use multi-instance
tasks to run Message Passing Interface (MPI) applications in Azure Batch.
Task dependencies
Task dependencies, as the name implies, allow you to specify that a task depends on the completion of other tasks
before its execution. This feature provides support for situations in which a "downstream" task consumes the output
of an "upstream" task--or when an upstream task performs some initialization that is required by a downstream task.
To use this feature, you must first enable task dependencies on your Batch job. Then, for each task that depends on
another (or many others), you specify the tasks which that task depends on.
With task dependencies, you can configure scenarios like the following:
taskB depends on taskA (taskB will not begin execution until taskA has completed).
taskC depends on both taskA and taskB.
taskD depends on a range of tasks, such as tasks 1 through 10, before it executes.
Check out Task dependencies in Azure Batch and the TaskDependencies code sample in the azure-batch-
samples GitHub repository for more in-depth details on this feature.
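The taskB-depends-on-taskA scenario might look like the following Batch .NET sketch (continuing the earlier example;
IDs and command lines are placeholders):
// Task dependencies must be enabled on the job before dependent tasks are added.
CloudJob job = batchClient.JobOperations.CreateJob();
job.Id = "myjob";
job.PoolInformation = new PoolInformation { PoolId = "mypool" };
job.UsesTaskDependencies = true;
job.Commit();

CloudTask taskA = new CloudTask("taskA", "cmd /c echo upstream work");
CloudTask taskB = new CloudTask("taskB", "cmd /c echo downstream work");

// taskB is not scheduled until taskA has completed.
taskB.DependsOn = TaskDependencies.OnId("taskA");

batchClient.JobOperations.AddTask("myjob", new[] { taskA, taskB });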
Environment settings for tasks
Each task executed by the Batch service has access to environment variables that it sets on compute nodes. This
includes environment variables defined by the Batch service (service-defined) and custom environment variables that
you can define for your tasks. The applications and scripts your tasks execute have access to these environment
variables during execution.
You can set custom environment variables at the task or job level by populating the environment settings property for
these entities. For example, see the Add a task to a job operation (Batch REST API), or
the CloudTask.EnvironmentSettings and CloudJob.CommonEnvironmentSettingsproperties in Batch .NET.
Your client application or service can obtain a task's environment variables, both service-defined and custom, by using
the Get information about a task operation (Batch REST) or by accessing the CloudTask.EnvironmentSettings property
(Batch .NET). Processes executing on a compute node can access these and other environment variables on the node,
for example, by using the familiar %VARIABLE_NAME% (Windows) or $VARIABLE_NAME (Linux) syntax.
You can find a full list of all service-defined environment variables in Compute node environment variables.
Files and directories
Each task has a working directory under which it creates zero or more files and directories. This working directory can
be used for storing the program that is run by the task, the data that it processes, and the output of the processing it
performs. All files and directories of a task are owned by the task user.
The Batch service exposes a portion of the file system on a node as the root directory. Tasks can access the root
directory by referencing the AZ_BATCH_NODE_ROOT_DIR environment variable. For more information about using
environment variables, see Environment settings for tasks.
The root directory contains the following directory structure:

shared: This directory provides read/write access to all tasks that run on a node. Any task that runs on the
node can create, read, update, and delete files in this directory. Tasks can access this directory by referencing
the AZ_BATCH_NODE_SHARED_DIR environment variable.
startup: This directory is used by a start task as its working directory. All of the files that are downloaded to
the node by the start task are stored here. The start task can create, read, update, and delete files under this
directory. Tasks can access this directory by referencing the AZ_BATCH_NODE_STARTUP_DIR environment
variable.
Tasks: A directory is created for each task that runs on the node. It is accessed by referencing
the AZ_BATCH_TASK_DIR environment variable.
Within each task directory, the Batch service creates a working directory (wd) whose unique path is specified by
the AZ_BATCH_TASK_WORKING_DIR environment variable. This directory provides read/write access to the task. The
task can create, read, update, and delete files under this directory. This directory is retained based on
the RetentionTime constraint that is specified for the task.
stdout.txt and stderr.txt: These files are written to the task folder during the execution of the task.
Important
When a node is removed from the pool, all of the files that are stored on the node are removed.
Application packages
The application packages feature provides easy management and deployment of applications to the compute nodes in
your pools. You can upload and manage multiple versions of the applications run by your tasks, including their binaries
and support files. Then you can automatically deploy one or more of these applications to the compute nodes in your
pool.
You can specify application packages at the pool and task level. When you specify pool application packages, the
application is deployed to every node in the pool. When you specify task application packages, the application is
deployed only to nodes that are scheduled to run at least one of the job's tasks, just before the task's command line is
run.
Batch handles the details of working with Azure Storage to store your application packages and deploy them to
compute nodes, so both your code and management overhead can be simplified.
To find out more about the application package feature, check out Application deployment with Azure Batch
application packages.
Note
If you add pool application packages to an existing pool, you must reboot its compute nodes for the application
packages to be deployed to the nodes.
Pool and compute node lifetime
When you design your Azure Batch solution, you have to make a design decision about how and when pools are
created, and how long compute nodes within those pools are kept available.
On one end of the spectrum, you can create a pool for each job that you submit, and delete the pool as soon as its
tasks finish execution. This maximizes utilization because the nodes are only allocated when needed, and shut down
as soon as they're idle. While this means that the job must wait for the nodes to be allocated, it's important to note
that tasks are scheduled for execution as soon as nodes are individually available, allocated, and the start task has
completed. Batch does not wait until all nodes within a pool are available before assigning tasks to the nodes. This
ensures maximum utilization of all available nodes.

At the other end of the spectrum, if having jobs start immediately is the highest priority, you can create a pool ahead
of time and make its nodes available before jobs are submitted. In this scenario, tasks can start immediately, but
nodes might sit idle while waiting for them to be assigned.
A combined approach is typically used for handling a variable, but ongoing, load. You can have a pool that multiple
jobs are submitted to, but can scale the number of nodes up or down according to the job load (see Scaling compute
resources in the following section). You can do this reactively, based on current load, or proactively, if load can be
predicted.
Pool network configuration
When you create a pool of compute nodes in Azure Batch, you can specify a subnet ID of an Azure virtual network
(VNet) in which the pool's compute nodes should be created.
The VNet must be:
o In the same Azure region as the Azure Batch account.
o In the same subscription as the Azure Batch account.
The type of VNet supported depends on how pools are being allocated for the Batch account:
o If the Batch account was created with its poolAllocationMode property set to 'BatchService', then the
specified VNet must be a classic VNet.
o If the Batch account was created with its poolAllocationMode property set to 'UserSubscription', then
the specified VNet may be a classic VNet or an Azure Resource Manager VNet. Pools must be created
with a virtual machine configuration in order to use a VNet. Pools created with a cloud service
configuration are not supported.
If the Batch account was created with its poolAllocationMode property set to 'BatchService', then you must
provide permissions for the Batch service principal to access the VNet. The Batch service principal, named
'Microsoft Azure Batch' or 'MicrosoftAzureBatch', must have the Classic Virtual Machine Contributor Role-
Based Access Control (RBAC) role for the specified VNet. If the specified RBAC role is not provided, the Batch
service returns 400 (Bad Request).
The specified subnet should have enough free IP addresses to accommodate the total number of target
nodes; that is, the sum of the targetDedicatedNodes and targetLowPriorityNodes properties of the pool. If the
subnet doesn't have enough free IP addresses, the Batch service partially allocates the compute nodes in the
pool and returns a resize error.
The specified subnet must allow communication from the Batch service to be able to schedule tasks on the
compute nodes. If communication to the compute nodes is denied by a Network Security Group
(NSG) associated with the VNet, then the Batch service sets the state of the compute nodes to unusable.
If the specified VNet has any associated Network Security Groups (NSG), then a few reserved system ports
must be enabled for inbound communication. For pools created with a virtual machine configuration, enable
ports 29876 and 29877, as well as port 22 for Linux and port 3389 for Windows. For pools created with a
cloud service configuration, enable ports 10100, 20100, and 30100. Additionally, enable outbound
connections to Azure Storage on port 443.
The following table describes the inbound ports that you need to enable for pools that you created with the virtual
machine configuration:

Destination port(s): 29876, 29877 (pools created with the virtual machine configuration); 10100, 20100, 30100 (pools
created with the cloud service configuration)
Source IP address: Only Batch service role IP addresses
Does Batch add NSGs? Yes. Batch adds NSGs at the level of the network interfaces (NICs) attached to the VMs. These
NSGs allow traffic only from Batch service role IP addresses; even if you open these ports to the entire web, the
traffic is blocked at the NIC.
Required for VM to be usable? Yes
Action from user: You do not need to specify an NSG, because Batch allows only Batch IP addresses. However, if you do
specify an NSG, ensure that these ports are open for inbound traffic. If you specify * as the source IP in your NSG,
Batch still adds NSGs at the level of the NICs attached to the VMs.

Destination port(s): 3389, 22
Source IP address: User machines, used for debugging purposes, so that you can remotely access the VM
Does Batch add NSGs? No
Required for VM to be usable? No
Action from user: Add NSGs if you want to permit remote access (RDP for port 3389, SSH for port 22) to the VM.

The following table describes the outbound port that you need to enable to permit access to Azure Storage:

Outbound port(s): 443
Destination: Azure Storage
Does Batch add NSGs? No
Required for VM to be usable? Yes
Action from user: If you add any NSGs, ensure that this port is open to outbound traffic.
Additional settings for the VNet depend on the pool allocation mode of the Batch account.
VNets for pools provisioned in the Batch service
In Batch service allocation mode, only Cloud Services Configuration pools can be assigned a VNet. Additionally, the
specified VNet must be a classic VNet. VNets created with the Azure Resource Manager deployment model are not
supported.
The MicrosoftAzureBatch service principal must have the Classic Virtual Machine Contributor Role-Based
Access Control (RBAC) role for the specified VNet. In the Azure portal:
o Select the VNet, then Access control (IAM) > Roles > Classic Virtual Machine Contributor > Add
o Enter "MicrosoftAzureBatch" in the Search box
o Check the MicrosoftAzureBatch check box
o Select the Select button
VNets for pools provisioned in a user subscription


In user subscription allocation mode, only Virtual Machine Configuration pools are supported and can be assigned a
VNet. Additionally, the specified VNet must be a Resource Manager based VNet. VNets created with the classic
deployment model are not supported.
Scaling compute resources
With automatic scaling, you can have the Batch service dynamically adjust the number of compute nodes in a pool
according to the current workload and resource usage of your compute scenario. This allows you to lower the overall
cost of running your application by using only the resources you need, and releasing those you don't need.
You enable automatic scaling by writing an automatic scaling formula and associating that formula with a pool. The
Batch service uses the formula to determine the target number of nodes in the pool for the next scaling interval (an
interval that you can configure). You can specify the automatic scaling settings for a pool when you create it, or enable
scaling on a pool later. You can also update the scaling settings on a scaling-enabled pool.
As an example, perhaps a job requires that you submit a very large number of tasks to be executed. You can assign a
scaling formula to the pool that adjusts the number of nodes in the pool based on the current number of queued
tasks and the completion rate of the tasks in the job. The Batch service periodically evaluates the formula and resizes
the pool, based on workload and your other formula settings. The service adds nodes as needed when there are a
large number of queued tasks, and removes nodes when there are no queued or running tasks.
A scaling formula can be based on the following metrics:
Time metrics are based on statistics collected every five minutes in the specified number of hours.
Resource metrics are based on CPU usage, bandwidth usage, memory usage, and number of nodes.
Task metrics are based on task state, such as Active (queued), Running, or Completed.
When automatic scaling decreases the number of compute nodes in a pool, you must consider how to handle tasks
that are running at the time of the decrease operation. To accommodate this, Batch provides a node deallocation
option that you can include in your formulas. For example, you can specify that running tasks are stopped
immediately, stopped immediately and then requeued for execution on another node, or allowed to finish before the
node is removed from the pool.
For more information about automatically scaling an application, see Automatically scale compute nodes in an Azure
Batch pool.
Tip
To maximize compute resource utilization, set the target number of nodes to zero at the end of a job, but allow
running tasks to finish.
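As an illustration, the following Batch .NET sketch enables autoscaling on an existing pool with a formula along these
lines. The pool ID, the one-node-per-four-tasks ratio, the 20-node cap, and the 15-minute evaluation interval are
placeholder choices; the service-defined variable $TargetDedicatedNodes applies to newer Batch API versions (older
versions use $TargetDedicated):
// The formula samples active (queued) task counts and lets running tasks finish before nodes are removed.
string formula = @"
    $samples = $ActiveTasks.GetSamplePercent(TimeInterval_Minute * 15);
    $tasks = $samples < 70 ? max(0, $ActiveTasks.GetSample(1))
                           : avg($ActiveTasks.GetSample(TimeInterval_Minute * 15));
    $targetVMs = $tasks > 0 ? $tasks / 4 : max(0, $TargetDedicatedNodes / 2);
    $TargetDedicatedNodes = max(0, min($targetVMs, 20));
    $NodeDeallocationOption = taskcompletion;";

batchClient.PoolOperations.EnableAutoScale("mypool", formula, TimeSpan.FromMinutes(15));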
Security with certificates
You typically need to use certificates when you encrypt or decrypt sensitive information for tasks, like the key for
an Azure Storage account. To support this, you can install certificates on nodes. Encrypted secrets are passed to tasks
via command-line parameters or embedded in one of the task resources, and the installed certificates can be used to
decrypt them.
You use the Add certificate operation (Batch REST) or CertificateOperations.CreateCertificate method (Batch .NET) to
add a certificate to a Batch account. You can then associate the certificate with a new or existing pool. When a
certificate is associated with a pool, the Batch service installs the certificate on each node in the pool. The Batch
service installs the appropriate certificates when the node starts up, before launching any tasks (including the start
task and job manager task).
If you add certificates to an existing pool, you must reboot its compute nodes for the certificates to be applied to the
nodes.
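A minimal sketch with the classic Batch .NET library (file path, password, thumbprint, and pool ID are placeholders):
// Add the certificate to the Batch account.
Certificate cert = batchClient.CertificateOperations.CreateCertificate(
    @"C:\certs\taskcert.pfx", "<pfx password>");
cert.Commit();

// Associate the certificate with a pool so it is installed on every node.
CloudPool pool = batchClient.PoolOperations.GetPool("mypool");
pool.CertificateReferences = new List<CertificateReference>
{
    new CertificateReference
    {
        Thumbprint = "<certificate thumbprint>",
        ThumbprintAlgorithm = "sha1",
        StoreLocation = CertStoreLocation.LocalMachine,
        StoreName = "My",
        Visibility = CertificateVisibility.Task   // make the certificate available to tasks
    }
};
pool.Commit();
// Existing compute nodes must be rebooted before newly associated certificates are installed.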
Error handling
You might find it necessary to handle both task and application failures within your Batch solution.
Task failure handling
Task failures fall into these categories:
Pre-processing failures
If a task fails to start, a pre-processing error is set for the task.
Pre-processing errors can occur if the task's resource files have moved, the Storage account is no longer available, or
another issue was encountered that prevented the successful copying of files to the node.
File upload failures
If uploading files that are specified for a task fails for any reason, a file upload error is set for the task.

File upload errors can occur if the SAS supplied for accessing Azure Storage is invalid or does not provide write
permissions, if the storage account is no longer available, or if another issue was encountered that prevented the
successful copying of files from the node.
Application failures
The process that is specified by the task's command line can also fail. The process is deemed to have failed when a
nonzero exit code is returned by the process that is executed by the task (see Task exit codes in the next section).
For application failures, you can configure Batch to automatically retry the task up to a specified number of times.
Constraint failures
You can set a constraint that specifies the maximum execution duration for a job or task, the maxWallClockTime. This
can be useful for terminating tasks that fail to progress.
When the maximum amount of time has been exceeded, the task is marked as completed, but the exit code is set
to 0xC000013A and the schedulingError field is marked as { category:"ServerError", code="TaskEnded"}.
Debugging application failures
stderr and stdout
During execution, an application might produce diagnostic output that you can use to troubleshoot issues. As
mentioned in the earlier section Files and directories, the Batch service writes standard output and standard error
output to stdout.txt and stderr.txt files in the task directory on the compute node. You can use the Azure portal or one
of the Batch SDKs to download these files. For example, you can retrieve these and other files for troubleshooting
purposes by using ComputeNode.GetNodeFile and CloudTask.GetNodeFile in the Batch .NET library.
Task exit codes
As mentioned earlier, a task is marked as failed by the Batch service if the process that is executed by the task returns
a nonzero exit code. When a task executes a process, Batch populates the task's exit code property with the return
code of the process. It is important to note that a task's exit code is not determined by the Batch service. A task's exit
code is determined by the process itself or the operating system on which the process executed.
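Putting the last two points together, here is a hedged Batch .NET sketch that reads a failed task's exit code and pulls back its stderr.txt; the property names and the file-name string are assumptions built on the GetNodeFile call mentioned above:

// assumes an authenticated BatchClient named batchClient (see the earlier autoscale sketch)
CloudTask task = batchClient.JobOperations.GetTask("myjob", "mytask");
int? exitCode = task.ExecutionInformation?.ExitCode;   // set by the process or OS, not by the Batch service
if (exitCode.HasValue && exitCode.Value != 0)
{
    NodeFile stderr = task.GetNodeFile("stderr.txt");  // CloudTask.GetNodeFile as referenced in the text above
    Console.WriteLine($"Task failed with exit code {exitCode}:");
    Console.WriteLine(stderr.ReadAsString());
}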
Accounting for task failures or interruptions
Tasks might occasionally fail or be interrupted. The task application itself might fail, the node on which the task is
running might be rebooted, or the node might be removed from the pool during a resize operation if the pool's
deallocation policy is set to remove nodes immediately without waiting for tasks to finish. In all cases, the task can be
automatically requeued by Batch for execution on another node.
It is also possible for an intermittent issue to cause a task to hang or take too long to execute. You can set the
maximum execution interval for a task. If the maximum execution interval is exceeded, the Batch service interrupts
the task application.
Connecting to compute nodes
You can perform additional debugging and troubleshooting by signing in to a compute node remotely. You can use the
Azure portal to download a Remote Desktop Protocol (RDP) file for Windows nodes and obtain Secure Shell (SSH)
connection information for Linux nodes. You can also do this by using the Batch APIs--for example, with Batch
.NET or Batch Python.
Important
To connect to a node via RDP or SSH, you must first create a user on the node. To do this, you can use the Azure
portal, add a user account to a node by using the Batch REST API, call
the ComputeNode.CreateComputeNodeUser method in Batch .NET, or call the add_user method in the Batch Python
module.
Troubleshooting problematic compute nodes
In situations where some of your tasks are failing, your Batch client application or service can examine the metadata
of the failed tasks to identify a misbehaving node. Each node in a pool is given a unique ID, and the node on which a
task runs is included in the task metadata. After you've identified a problem node, you can take several actions with it:
Reboot the node (REST | .NET)
Restarting the node can sometimes clear up latent issues like stuck or crashed processes. Note that if your pool uses a
start task or your job uses a job preparation task, they are executed when the node restarts.
Reimage the node (REST | .NET)
This reinstalls the operating system on the node. As with rebooting a node, start tasks and job preparation tasks are
rerun after the node has been reimaged.
Remove the node from the pool (REST | .NET)
Sometimes it is necessary to completely remove the node from the pool.
Disable task scheduling on the node (REST | .NET)
This effectively takes the node offline so that no further tasks are assigned to it, but allows the node to remain
running and in the pool. This enables you to perform further investigation into the cause of the failures without losing
the failed task's data, and without the node causing additional task failures. For example, you can disable task
scheduling on the node, then sign in remotely to examine the node's event logs or perform other troubleshooting.
After you've finished your investigation, you can then bring the node back online by enabling task scheduling
(REST | .NET), or perform one of the other actions discussed earlier.
Important: With each action that is described in this section--reboot, reimage, remove, and disable task scheduling--
you are able to specify how tasks currently running on the node are handled when you perform the action. For
example, when you disable task scheduling on a node by using the Batch .NET client library, you can specify
a DisableComputeNodeSchedulingOption enum value to specify whether to Terminate running tasks, Requeue them
for scheduling on other nodes, or allow running tasks to complete before performing the action (TaskCompletion).
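As a rough Batch .NET illustration of the disable/enable cycle described above (the pool ID and node ID are placeholders, and the method names are assumptions based on the REST/.NET operations listed here):

// assumes an authenticated BatchClient named batchClient and the ID of the suspect node
ComputeNode node = batchClient.PoolOperations.GetComputeNode("mypool", problemNodeId);
node.DisableScheduling(DisableComputeNodeSchedulingOption.Requeue);  // requeue its running tasks on other nodes
// ...sign in, examine event logs, collect task output, and so on...
node.EnableScheduling();                                             // bring the node back online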
Run Background tasks with WebJobs
You can run programs or scripts in WebJobs in your Azure App Service web app in three ways: on demand,
continuously, or on a schedule. There is no additional cost to use WebJobs.
Note
The WebJobs SDK does not yet support .NET Core.
This article shows how to deploy WebJobs by using the Azure Portal. For information about how to deploy by using
Visual Studio or a continuous delivery process, see How to Deploy Azure WebJobs to Web Apps.
The Azure WebJobs SDK simplifies many WebJobs programming tasks. For more information, see What is the
WebJobs SDK.
Azure Functions provides another way to run programs and scripts from either a serverless environment or from an
App Service app. For more information, see Azure Functions overview.
Note
Although this article refers to web apps, it also applies to API apps and mobile apps.
Acceptable file types for scripts or programs
The following file types are accepted:
.cmd, .bat, .exe (using Windows cmd)
.ps1 (using PowerShell)
.sh (using Bash)
.php (using PHP)
.py (using Python)
.js (using Node.js)
.jar (using Java)
Create an on demand WebJob in the portal
1. In the Web App blade of the Azure Portal, click All settings > WebJobs to show the WebJobs blade.

2. Click Add. The Add WebJob dialog appears.
3. Under Name, provide a name for the WebJob. The name must start with a letter or a number and cannot
contain any special characters other than "-" and "_".
4. In the How to Run box, choose Run on Demand.
5. In the File Upload box, click the folder icon and browse to the zip file that contains your script. The zip file
should contain your executable (.exe .cmd .bat .sh .php .py .js) as well as any supporting files needed to run
the program or script.
6. Check Create to upload the script to your web app.
The name you specified for the WebJob appears in the list on the WebJobs blade.
7. To run the WebJob, right-click its name in the list and click Run.
Create a continuously running WebJob
1. To create a continuously executing WebJob, follow the same steps for creating a WebJob that runs once, but
in the How to Run box, choose Continuous.
2. To start or stop a continuous WebJob, right-click the WebJob in the list and click Start or Stop.
Note: If your web app runs on more than one instance, a continuously running WebJob will run on all of your
instances. On-demand and scheduled WebJobs run on a single instance selected for load balancing by Microsoft
Azure.
For Continuous WebJobs to run reliably and on all instances, enable the Always On configuration setting for the web
app; otherwise they can stop running when the SCM host site has been idle for too long.
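For reference, a continuous WebJob is typically just a console application that never exits. With the WebJobs SDK mentioned earlier, a minimal queue-triggered job looks roughly like the sketch below (WebJobs SDK 2.x; the queue name is a placeholder and the AzureWebJobsStorage/AzureWebJobsDashboard connection strings are assumed to be configured on the web app):

using System.IO;
using Microsoft.Azure.WebJobs;

public class Program
{
    public static void Main()
    {
        // RunAndBlock keeps the process alive, which is what a continuous WebJob needs.
        var host = new JobHost(new JobHostConfiguration());
        host.RunAndBlock();
    }
}

public class Functions
{
    // Runs whenever a message lands on the "orders" queue in the configured storage account.
    public static void ProcessQueueMessage([QueueTrigger("orders")] string message, TextWriter log)
    {
        log.WriteLine($"Processing order: {message}");
    }
}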
Create a scheduled WebJob using a CRON expression
This technique is available to Web Apps running in Basic, Standard or Premium mode, and requires the Always
On setting to be enabled on the app.
To turn an On Demand WebJob into a scheduled WebJob, simply include a settings.job file at the root of your WebJob
zip file. This JSON file should include a schedule property with a CRON expression, as in the example below.
The CRON expression is composed of 6 fields: {second} {minute} {hour} {day} {month} {day of the week}.
For example, to trigger your WebJob every 15 minutes, your settings.job would have:
{
"schedule": "0 */15 * * * *"
}
Other CRON schedule examples:
Every hour (i.e. whenever the count of minutes is 0): 0 0 * * * *
Every hour from 9 AM to 5 PM: 0 0 9-17 * * *
At 9:30 AM every day: 0 30 9 * * *
At 9:30 AM every week day: 0 30 9 * * 1-5

Note: when deploying a WebJob from Visual Studio, make sure to mark your settings.job file properties as 'Copy if
newer'.
Create a scheduled WebJob using the Azure Scheduler
The following alternate technique makes use of the Azure Scheduler. In this case, your WebJob does not have any
direct knowledge of the schedule. Instead, the Azure Scheduler gets configured to trigger your WebJob on a schedule.
The Azure Portal doesn't yet have the ability to create a scheduled WebJob, but until that feature is added you can do
it by using the classic portal.
1. In the classic portal go to the WebJob page and click Add.
2. In the How to Run box, choose Run on a schedule.

3. Choose the Scheduler Region for your job, and then click the arrow on the bottom right of the dialog to
proceed to the next screen.
4. In the Create Job dialog, choose the type of Recurrence you want: One-time job or Recurring job.

5. Also choose a Starting time: Now or At a specific time.
6. If you want to start at a specific time, choose your starting time values under Starting On.

7. If you chose a recurring job, you have the Recur Every option to specify the frequency of occurrence and
the Ending On option to specify an ending time.

8. If you choose Weeks, you can select the On a Particular Schedule box and specify the days of the week that
you want the job to run.
9. If you choose Months and select the On a Particular Schedule box, you can set the job to run on particular
numbered Days in the month.

10. If you choose Week Days, you can select which day or days of the week in the month you want the job to run
on.
11. Finally, you can also use the Occurrences option to choose which week in the month (first, second, third etc.)
you want the job to run on the week days you specified.

12. After you have created one or more jobs, their names will appear on the WebJobs tab with their status,
schedule type, and other information. Historical information for the last 30 WebJobs is maintained.
Scheduled jobs and Azure Scheduler
Scheduled jobs can be further configured in the Azure Scheduler pages of the classic portal.
1. On the WebJobs page, click the job's schedule link to navigate to the Azure Scheduler portal page.

2. On the Scheduler page, click the job.

3. The Job Action page opens, where you can further configure the job.
View the job history
1. To view the execution history of a job, including jobs created with the WebJobs SDK, click its corresponding
link under the Logs column of the WebJobs blade. (You can use the clipboard icon to copy the URL of the log
file page to the clipboard if you wish.)

2. Clicking the link opens the details page for the WebJob. This page shows you the name of the command run,
the last times it ran, and its success or failure. Under Recent job runs, click a time to see further details.

3. The WebJob Run Details page appears. Click Toggle Output to see the text of the log contents. The output log
is in text format.
4. To see the output text in a separate browser window, click the download link. To download the text itself,
right-click the link and use your browser options to save the file contents.

5. The WebJobs link at the top of the page provides a convenient way to get to a list of WebJobs on the history
dashboard.
Clicking one of these links takes you to the WebJob Details page for the job you selected.
Notes
Web apps in Free mode can time out after 20 minutes if there are no requests to the scm (deployment) site
and the web app's portal is not open in Azure. Requests to the actual site will not reset this.
Code for a continuous job needs to be written to run in an endless loop.
Continuous jobs run continuously only when the web app is up.
Basic and Standard modes offer the Always On feature which, when enabled, prevents web apps from
becoming idle.
You can only debug continuously running WebJobs. Debugging scheduled or on-demand WebJobs is not
supported.

Create a function triggered by a GitHub webhook
Learn how to create a function that is triggered by an HTTP webhook request with a GitHub-specific payload.

Prerequisites
A GitHub account with at least one project.
An Azure subscription. If you don't have one, create a free account before you begin.
Add Function Apps to your portal favorites
If you haven't already done so, add Function Apps to your favorites in the Azure portal. This makes it easier to find
your function apps. If you have already done this, skip to the next section.
1. Log in to the Azure portal.
2. Click the arrow at the bottom left to expand all services, type Functions in the Filter field, and then click the
star next to Function Apps.

This adds the Functions icon to the menu on the left of the portal.
3. Close the menu, then scroll down to the bottom to see the Functions icon. Click this icon to see a list of all
your function apps. Click your function app to work with functions in this app.
Create an Azure Function app
1. Click the New button found on the upper left-hand corner of the Azure portal.
2. Click Compute > Function App, select your Subscription. Then, use the function app settings as specified in the
table.
Setting | Suggested value | Description
App name | Globally unique name | Name that identifies your new function app.
Resource Group | myResourceGroup | Name for the new resource group in which to create your function app.
Hosting plan | Consumption plan | Hosting plan that defines how resources are allocated to your function app. In the default Consumption Plan, resources are added dynamically as required by your functions. You only pay for the time your functions run.
Location | West Europe | Choose a location near you or near other services your functions will access.
Storage account | Globally unique name | Name of the new storage account used by your function app. You can also use an existing account.
3. Click Create to provision and deploy the new function app.

Next, you create a function in the new function app.
Create a GitHub webhook triggered function
1. Expand your function app and click the + button next to Functions. If this is the first function in your function
app, select Custom function. This displays the complete set of function templates.
2. Select the GitHubWebHook template for your desired language. Name your function, then select Create.
3. In your new function, click </> Get function URL, then copy and save the values. Do the same thing for </> Get
GitHub secret. You use these values to configure the webhook in GitHub.

Next, you create a webhook in your GitHub repository.
Configure the webhook
1. In GitHub, navigate to a repository that you own. You can also use any repository that you have forked. If you
need to fork a repository, use https://github.com/Azure-Samples/functions-quickstart.
2. Click Settings, then click Webhooks, and Add webhook.

3. Use settings as specified in the table, then click Add webhook.
Setting | Suggested value | Description
Payload URL | Copied value | Use the value returned by </> Get function URL.
Secret | Copied value | Use the value returned by </> Get GitHub secret.
Content type | application/json | The function expects a JSON payload.
Event triggers | Let me select individual events: Issue comment | We only want to trigger on issue comment events.
Now, the webhook is configured to trigger your function when a new issue comment is added.
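For orientation, the C# version of the GitHubWebHook template generates a run.csx along the lines of the hedged sketch below; it reads the webhook payload and echoes the issue-comment text back to GitHub (treat this as a paraphrase of the template, not an exact copy):

#r "Newtonsoft.Json"

using System.Net;
using Newtonsoft.Json;

public static async Task<object> Run(HttpRequestMessage req, TraceWriter log)
{
    // The webhook is registered for issue comment events, so the text is at comment.body in the payload.
    string jsonContent = await req.Content.ReadAsStringAsync();
    dynamic data = JsonConvert.DeserializeObject(jsonContent);

    log.Info($"GitHub WebHook triggered: {data.comment.body}");

    return req.CreateResponse(HttpStatusCode.OK, new { body = $"New GitHub comment: {data.comment.body}" });
}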
Test the function
1. In your GitHub repository, open the Issues tab in a new browser window.
2. In the new window, click New Issue, type a title, and then click Submit new issue.
3. In the issue, type a comment and click Comment.

4. Go back to the portal and view the logs. You should see a trace entry with the new comment text.

Clean up resources
Other quick starts in this collection build upon this quick start. If you plan to continue on to work with subsequent
quick starts or with the tutorials, do not clean up the resources created in this quick start.
If you do not plan to continue, click the Resource group for the function app in the portal, and then click Delete.
Get started with Azure Scheduler in Azure portal
It's easy to create scheduled jobs in Azure Scheduler. In this tutorial, you'll learn how to create a job. You'll also learn
Scheduler's monitoring and management capabilities.
Create a job
1. Sign in to Azure portal.
2. Click +New > type Scheduler in the search box > select Scheduler in results > click Create.

3. Let's create a job that simply hits http://www.microsoft.com/ with a GET request. In the Scheduler Job screen,
enter the following information:
a. Name: getmicrosoft
b. Subscription: Your Azure subscription
c. Job Collection: Select an existing job collection, or click Create New > enter a name.
4. Next, in Action Settings, define the following values:
a. Action Type: HTTP
b. Method: GET
c. URL: http://www.microsoft.com
5. Finally, let's define a schedule. The job could be defined as a one-time job, but let's pick a recurrence
schedule:
a. Recurrence: Recurring
b. Start: Today's date
c. Recur every: 12 Hours
d. End by: Two days from today's date
6. Click Create
Manage and monitor jobs
Once a job is created, it appears in the main Azure dashboard. Click the job and a new window opens with the
following tabs:
1. Properties
2. Action Settings
3. Schedule
4. History
5. Users

Properties
These read-only properties describe the management metadata for the Scheduler job.
Action settings
Clicking on a job in the Jobs screen allows you to configure that job. This lets you configure advanced settings, if you
didn't configure them in the quick-create wizard.
For all action types, you may change the retry policy and the error action.
For HTTP and HTTPS job action types, you may change the method to any allowed HTTP verb. You may also add,
delete, or change the headers and basic authentication information.
For storage queue action types, you may change the storage account, queue name, SAS token, and body.
For service bus action types, you may change the namespace, topic/queue path, authentication settings, transport
type, message properties, and message body.
Schedule
This lets you reconfigure the schedule, if you'd like to change the schedule you created in the quick-create wizard.
This is an opportunity to build complex schedules and advanced recurrence in your job.
You may change the start date and time, recurrence schedule, and the end date and time (if the job is recurring).
History
The History tab displays selected metrics for every job execution in the system for the selected job. These metrics
provide real-time values regarding the health of your Scheduler:
1. Status
2. Details
3. Retry attempts
4. Occurrence: 1st, 2nd, 3rd, etc.
5. Start time of execution
6. End time of execution
You can click on a run to view its History Details, including the whole response for every execution. This dialog box also
allows you to copy the response to the clipboard.

Users
Azure Role-Based Access Control (RBAC) enables fine-grained access management for Azure Scheduler. To learn how
to use the Users tab, refer to Azure Role-Based Access Control
What is Azure Relay?
The Azure Relay service facilitates hybrid applications by enabling you to securely expose services that reside within a
corporate enterprise network to the public cloud, without having to open a firewall connection, or require intrusive
changes to a corporate network infrastructure. Relay supports a variety of different transport protocols and web
services standards.
The relay service supports traditional one-way, request/response, and peer-to-peer traffic. It also supports event
distribution at internet-scope to enable publish/subscribe scenarios and bi-directional socket communication for
increased point-to-point efficiency.
In the relayed data transfer pattern, an on-premises service connects to the relay service through an outbound port
and creates a bi-directional socket for communication tied to a particular rendezvous address. The client can then
communicate with the on-premises service by sending traffic to the relay service targeting the rendezvous address.
The relay service then "relays" data to the on-premises service through a bi-directional socket dedicated to each
client. The client does not need a direct connection to the on-premises service, it is not required to know where the
service resides, and the on-premises service does not need any inbound ports open on the firewall.
The key capability elements provided by Relay are bi-directional, unbuffered communication across network
boundaries with TCP-like throttling, endpoint discovery, connectivity status, and overlaid endpoint security. The relay
capabilities differ from network-level integration technologies such as VPN, in that relay can be scoped to a single
application endpoint on a single machine, while VPN technology is far more intrusive as it relies on altering the
network environment.
Azure Relay has two features:
1. Hybrid Connections - Uses the open standard web sockets enabling multi-platform scenarios.
2. WCF Relays - Uses Windows Communication Foundation (WCF) to enable remote procedure calls. WCF Relay
is the legacy relay offering that many customers already use with their WCF programming models.
Hybrid Connections and WCF Relays both enable secure connection to assets that exist within a corporate enterprise
network. Use of one over the other is dependent on your particular needs, as described in the following table:
Capability | WCF Relay | Hybrid Connections
WCF | x |
.NET Core | | x
.NET Framework | x | x
JavaScript/NodeJS | | x
Standards-Based Open Protocol | | x
Multiple RPC Programming Models | x |
Hybrid Connections
The Azure Relay Hybrid Connections capability is a secure, open-protocol evolution of the existing Relay features that
can be implemented on any platform and in any language that has a basic WebSocket capability, which explicitly
includes the WebSocket API in common web browsers. Hybrid Connections is based on HTTP and WebSockets.
WCF Relays
The WCF Relay works for the full .NET Framework (NETFX) and for WCF. You initiate the connection between your on-
premises service and the relay service using a suite of WCF "relay" bindings. Behind the scenes, the relay bindings map
to new transport binding elements designed to create WCF channel components that integrate with Service Bus in the
cloud.
Service history
Hybrid Connections supplants the former, similarly named "BizTalk Services" feature that was built on the Azure
Service Bus WCF Relay. The new Hybrid Connections capability complements the existing WCF Relay feature and these
two service capabilities exist side-by-side in the Azure Relay service for the foreseeable future. They share a common
gateway, but are otherwise different implementations.

Azure Relay Hybrid Connections protocol
Azure Relay is one of the key capability pillars of the Azure Service Bus platform. The Relay's new "Hybrid
Connections" capability is a secure, open-protocol evolution based on HTTP and WebSockets. It supersedes the
former, equally named "BizTalk Services" feature that was built on a proprietary protocol foundation. The integration
of Hybrid Connections into Azure App Services will continue to function as-is.
"Hybrid Connections" enables bi-directional, binary stream communication between two networked applications,
whereby either or both parties can reside behind NATs or firewalls. This article describes the client-side interactions
with the Hybrid Connections relay for connecting clients in listener and sender roles, and how listeners accept new
connections.
Interaction model
The Hybrid Connections relay connects two parties by providing a rendezvous point in the Azure cloud that both
parties can discover and connect to from their own networks perspective. That rendezvous point is called "Hybrid
Connection" in this and other documentation, in the APIs, and also in the Azure portal. The Hybrid Connections service
endpoint will be referred to as the "service" for the rest of this article. The interaction model leans on the
nomenclature established by many other networking APIs:
There is a listener that first indicates readiness to handle incoming connections, and subsequently accepts them as
they arrive. On the other side, there is a connecting client that connects towards the listener, expecting that
connection to be accepted for establishing a bi-directional communication path. "Connect," "Listen," and "Accept" are
the same terms you will find on most socket APIs.
Any relayed communication model has either party making outbound connections towards a service endpoint, which
makes the "listener" also a "client" in colloquial use and may also cause other terminology overloads; the precise
terminology we therefore use for Hybrid Connections is as follows:
The programs on both sides of a connection are called "client," since they are clients to the service. The client that
waits for and accepts connections is the "listener," or is said to be in the "listener role." The client that initiates a new
connection towards a listener via the service is called the "sender," or is in the "sender role."
Listener interactions
The listener has four interactions with the service; all wire details are described later in this article in the reference
section.
Listen
To indicate readiness to the service that a listener is ready to accept connections, it creates an outbound web socket
connection. The connection handshake carries the name of a Hybrid Connection configured in the Relay namespace,
and a security token that confers the "Listen" right on that name. When the web socket is accepted by the service, the
registration is complete and the established web socket is kept alive as the "control channel" for enabling all
subsequent interactions. The service allows up to 25 concurrent listeners on a Hybrid Connection. If there are 2 or
more active listeners, incoming connections will be balanced across them in random order; fair distribution is not
guaranteed.
Accept
Whenever a sender opens up a new connection on the service, the service will choose and notify one of the active
listeners on the Hybrid Connection. The notification is sent to the listener over the open control channel as a JSON
message containing the URL of the Web socket endpoint that the listener must connect to for accepting the
connection.
The URL can and must be used directly by the listener without any extra work; the encoded information is only valid
for a short period of time, essentially for as long as the sender is willing to wait for the connection to be established
end-to-end, but up to a maximum of 30 seconds. The URL can only be used for one successful connection attempt. As
soon as the Web socket connection with the rendezvous URL is established, all further activity on this web socket is
relayed from and to the sender, without any intervention or interpretation by the service.
Renew
The security token that must be used to register the listener and maintain the control channel may expire while the
listener is active. The token expiry will not affect ongoing connections, but it will cause the control channel to be
dropped by the service at or soon after the instant of expiry. The "renew" operation is a JSON message that the
listener can send to replace the token associated with the control channel, so that the control channel can be
maintained for extended periods.
Ping
If the control channel stays idle for a long time, intermediaries on the way, such as load balancers or NATs may drop
the TCP connection. The "ping" operation avoids that by sending a small amount of data on the channel that reminds
everyone on the network route that the connection is meant to be alive, and it also serves as a "liveness" test for the
listener. If the ping fails, the control channel should be considered unusable and the listener should reconnect.
Sender interaction
The sender only has a single interaction with the service; it connects.
Connect
The "connect" operation opens a web socket on the service, providing the name of the Hybrid Connection and
(optionally, but required by default) a security token conferring "Send" permission in the query string. The service will
then interact with the listener in the way described previously, and have the listener create a rendezvous connection
that will be joined with this web socket. After the web socket has been accepted, all further interactions on the web
socket will therefore be with a connected listener.
Interaction summary
The result of this interaction model is that the sender client comes out of the handshake with a "clean" web socket
which is connected to a listener and that needs no further preambles or preparation. This allows practically any
existing web socket client implementation to readily take advantage of the Hybrid Connections service by simply
supplying a correctly-constructed URL into their web socket client layer.
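As a minimal sender-side sketch of that idea using the standard .NET ClientWebSocket (the namespace, path, and token below are placeholders, not values from this guide):

using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

class SenderSketch
{
    static async Task Main()
    {
        // Placeholder SAS token with the Send right; see the token sketch in the protocol reference below.
        string sasToken = "SharedAccessSignature sr=...&sig=...&se=...&skn=...";
        string url = "wss://contoso.servicebus.windows.net/$hc/hyco"
                   + "?sb-hc-action=connect&sb-hc-token=" + Uri.EscapeDataString(sasToken);

        using (var ws = new ClientWebSocket())
        {
            await ws.ConnectAsync(new Uri(url), CancellationToken.None);
            // From this point on, the web socket is relayed end-to-end to whichever listener accepted it.
        }
    }
}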
The rendezvous connection web Socket that the listener obtains through the accept interaction is also clean and can
be handed to any existing web socket server implementation with some minimal extra abstraction that distinguishes
between "accept" operations on their framework's local network listeners and Hybrid Connections remote "accept"
operations.
Protocol reference
This section describes the details of the protocol interactions described above.
All web socket connections are made on port 443 as an upgrade from HTTPS 1.1, which is commonly abstracted by
some web socket framework or API. The description here is kept implementation neutral, without suggesting a
specific framework.
Listener protocol
The listener protocol consists of two connection gestures and three message operations.
Listener control channel connection
The control channel is opened by creating a web socket connection to:
wss://{namespace-address}/$hc/{path}?sb-hc-action=...[&sb-hc-id=...]&sb-hc-token=...
The namespace-address is the fully qualified domain name of the Azure Relay namespace that hosts the Hybrid
Connection, typically of the form {myname}.servicebus.windows.net.
The query string parameter options are as follows.
Parameter | Required | Description
sb-hc-action | Yes | For the listener role the parameter must be sb-hc-action=listen
{path} | Yes | The URL-encoded namespace path of the preconfigured Hybrid Connection to register this listener on. This expression is appended to the fixed $hc/ path portion.
sb-hc-token | Yes* | The listener must provide a valid, URL-encoded Service Bus Shared Access Token for the namespace or Hybrid Connection that confers the Listen right.
sb-hc-id | No | This client-supplied optional ID enables end-to-end diagnostic tracing.
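The sb-hc-token value is a standard Service Bus Shared Access Signature. A hedged C# sketch of producing one follows (the resource URI, key name, and key are placeholders; the output format matches the renewToken example later in this article):

using System;
using System.Net;
using System.Security.Cryptography;
using System.Text;

static class RelaySas
{
    public static string CreateToken(string resourceUri, string keyName, string key, TimeSpan ttl)
    {
        // Service Bus SAS: HMAC-SHA256 over "<url-encoded resource>\n<expiry in Unix seconds>".
        string encodedResource = WebUtility.UrlEncode(resourceUri);
        long expiry = DateTimeOffset.UtcNow.Add(ttl).ToUnixTimeSeconds();
        using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(key)))
        {
            string signature = Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(encodedResource + "\n" + expiry)));
            return $"SharedAccessSignature sr={encodedResource}&sig={WebUtility.UrlEncode(signature)}&se={expiry}&skn={keyName}";
        }
    }
}

The resulting string is then URL-encoded again when it is placed into the sb-hc-token query parameter.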
If the web socket connection fails due to the Hybrid Connection path not being registered, or an invalid or missing
token, or some other error, the error feedback will be provided using the regular HTTP 1.1 status feedback model. The
status description will contain an error tracking-id that can be communicated to Azure support:
Code | Error | Description
404 | Not Found | The Hybrid Connection path is invalid or the base URL is malformed.
401 | Unauthorized | The security token is missing or malformed or invalid.
403 | Forbidden | The security token is not valid for this path for this action.
500 | Internal Error | Something went wrong in the service.
If the web socket connection is intentionally shut down by the service after it was initially set up, the reason for doing
so will be communicated using an appropriate web socket protocol error code along with a descriptive error message
that will also include a tracking ID. The service will not shut down the control channel without encountering an error
condition. Any clean shutdown is client controlled.
WS Status | Description
1001 | The Hybrid Connection path has been deleted or disabled.
1008 | The security token has expired and the authorization policy is therefore violated.
1011 | Something went wrong in the service.
Accept handshake
The accept notification is sent by the service to the listener over the previously established control channel as a JSON
message in a web socket text frame. There is no reply to this message.
The message contains a JSON object named "accept," which defines the following properties at this time:
address: the URL string to be used for establishing the web socket to the service to accept an incoming connection.
id: the unique identifier for this connection. If the ID was supplied by the sender client, it is the sender-supplied value; otherwise it is a system-generated value.
connectHeaders: all HTTP headers that have been supplied to the Relay endpoint by the sender, which also includes the Sec-WebSocket-Protocol and the Sec-WebSocket-Extensions headers.
Accept Message
{
  "accept" : {
    "address" : "wss://168.61.148.205:443/$hc/{path}?...",
    "id" : "4cb542c3-047a-4d40-a19f-bdc66441e736",
    "connectHeaders" : {
      "Host" : "...",
      "Sec-WebSocket-Protocol" : "...",
      "Sec-WebSocket-Extensions" : "..."
    }
  }
}
The address URL provided in the JSON message is used by the listener to establish the web socket for accepting or
rejecting the sender socket.
Accepting the Socket
To accept, the listener establishes a web socket connection to the provided address.
If the "accept" message carries a "Sec-WebSocket-Protocol" header, it is expected that the listener will only accept the
web socket if it supports that protocol and that it sets the header as the web socket is established.
The same applies to the "Sec-WebSocket-Extensions" header. If the framework supports an extension, it should set
the header to the server side reply of the required "Sec-WebSocket-Extensions" handshake for the extension.
The URL must be used as-is for establishing the accept socket, but contains the following parameters:
Parameter | Required | Description
sb-hc-action | Yes | For accepting a socket the parameter must be sb-hc-action=accept
{path} | Yes | (see the following paragraph)
sb-hc-id | No | See description of id above.
The {path} is the URL-encoded namespace path of the preconfigured Hybrid Connection on which to register this
listener. This expression is appended to the fixed $hc/ path portion.
The path expression MAY be extended with a suffix and a query string expression that follows the registered name
after a separating forward slash. This allows the sender client to pass dispatch arguments to the accepting listener
when it is not possible to include HTTP headers. The expectation is that the listener framework will parse out the fixed
path portion and the registered name from the path and make the remainder, possibly without any query string
arguments prefixed by "sb-", available to the application for deciding whether to accept the connection.
For more details see the following "Sender Protocol" section.
If there's an error, the service may reply as follows:
Code | Error | Description
403 | Forbidden | The URL is not valid.
500 | Internal Error | Something went wrong in the service.
After the connection has been established, the server will shut down the web socket when the sender web socket
shuts down, or with the following status.
WS Status | Description
1001 | The sender client shuts down the connection.
1001 | The Hybrid Connection path has been deleted or disabled.
1008 | The security token has expired and therefore the authorization policy is violated.
1011 | Something went wrong in the service.
Rejecting the Socket
Rejecting the socket after inspecting the "accept" message requires a similar handshake so that the status code and
status description communicating the reason for the rejection can flow back to the sender.
The protocol design choice here is to use a web socket handshake (that is designed to end in a defined error state) so
that listener client implementations can continue to rely on a web socket client and don't need to employ an extra,
bare HTTP client.
To reject the socket, the client takes the address URI from the "accept" message and appends two query string
parameters to it:
Param | Required | Description
statusCode | Yes | Numeric HTTP status code.
statusDescription | Yes | Human readable reason for the rejection.
The resulting URI is then used to establish a WebSocket connection.
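A hedged sketch of such a rejection, starting from the address value taken from the "accept" message (the status code and description below are arbitrary examples):

// acceptAddress is the "address" string from the accept control message
var builder = new UriBuilder(acceptAddress);
string rejection = "statusCode=403&statusDescription=" + Uri.EscapeDataString("Not allowed");
builder.Query = string.IsNullOrEmpty(builder.Query)
    ? rejection
    : builder.Query.TrimStart('?') + "&" + rejection;

try
{
    using (var ws = new System.Net.WebSockets.ClientWebSocket())
    {
        // The service ends this handshake in a defined error state, so the connect is expected to throw.
        await ws.ConnectAsync(builder.Uri, System.Threading.CancellationToken.None);
    }
}
catch (System.Net.WebSockets.WebSocketException)
{
    // Expected: the rejection has been delivered to the sender.
}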
When completing correctly, this handshake will intentionally fail with an HTTP error code 410, since no web socket
has been established. If an error occurs, these are the options:
Code | Error | Description
403 | Forbidden | The URL is not valid.
500 | Internal Error | Something went wrong in the service.
Listener token renewal
When the listener token is about to expire, it can replace it by sending a text frame message to the service via the
established control channel. The message contains a JSON object named "renewToken," which defines the following
property at this time:
token: a valid, URL-encoded Service Bus Shared Access Token for the namespace or Hybrid Connection that confers the Listen right.
renewToken Message
{
  "renewToken" : {
    "token" : "SharedAccessSignature sr=http%3a%2f%2fcontoso.servicebus.windows.net%2fhyco%2f&sig=XXXXXXXXXX%3d&se=1471633754&skn=SasKeyName"
  }
}
If the token validation fails, access is denied and the cloud service closes the control channel web socket with an
error; otherwise there is no reply.
WS Status | Description
1008 | The security token has expired and the authorization policy is therefore violated.
Sender protocol
The sender protocol is effectively identical to how a listener is established. The goal is maximum transparency for the
end-to-end web socket. The address to connect to is the same as for the listener, but the "action" differs and the
token needs a different permission:
wss://{namespace-address}/$hc/{path}?sb-hc-action=...&sb-hc-id=...&sb-hc-token=...

The namespace-address is the fully qualified domain name of the Azure Relay namespace that hosts the Hybrid
Connection, typically of the form {myname}.servicebus.windows.net.
The request may contain arbitrary extra HTTP headers, including application-defined ones. All supplied headers flow
to the listener and can be found on the "connectHeader" object of the "accept" control message.
The query string parameter options are as follows:
Param | Required? | Description
sb-hc-action | Yes | For the sender role the parameter must be sb-hc-action=connect.
{path} | Yes | (see the following paragraph)
sb-hc-token | Yes* | The sender must provide a valid, URL-encoded Service Bus Shared Access Token for the namespace or Hybrid Connection that confers the Send right.
sb-hc-id | No | An optional ID that enables end-to-end diagnostic tracing and is made available to the listener during the accept handshake.
The {path} is the URL-encoded namespace path of the preconfigured Hybrid Connection to register this listener on.
The path expression MAY be extended with a suffix and a query string expression to communicate further. If the
Hybrid Connection is registered under the path "hyco," the path expression can
be hyco/suffix?param=value&... followed by the query string parameters defined here. A complete expression may
then be:
wss://{namespace-address}/$hc/hyco/suffix?param=value&sb-hc-action=...[&sb-hc-id=...&]sb-hc-token=...

The path expression is passed through to the listener in the address URI contained in the "accept" control message.
If the web socket connection fails due to the Hybrid Connection path not being registered, or an invalid or missing
token, or some other error, the error feedback will be provided using the regular HTTP 1.1 status feedback model. The
status description will contain an error tracking-id that can be communicated to Azure support:
Code | Error | Description
404 | Not Found | The Hybrid Connection path is invalid or the base URL is malformed.
401 | Unauthorized | The security token is missing or malformed or invalid.
403 | Forbidden | The security token is not valid for this path for this action.
500 | Internal Error | Something went wrong in the service.
If the web socket connection is intentionally shut down by the service after it has been initially set up, the reason for
doing so will be communicated using an appropriate web socket protocol error code along with a descriptive error
message that will also include a tracking ID.
WS Status | Description
1000 | The listener shut down the socket.
1001 | The Hybrid Connection path has been deleted or disabled.
1008 | The security token has expired and therefore the authorization policy is violated.
1011 | Something went wrong in the service.
Integrate your app with an Azure Virtual Network
This document describes the Azure App Service virtual network integration feature and shows how to set it up with
apps in Azure App Service. If you are unfamiliar with Azure Virtual Networks (VNETs), this is a capability that allows
you to place many of your Azure resources in a non-internet routable network that you control access to. These
networks can then be connected to your on-premises networks using a variety of VPN technologies. To learn more
about Azure Virtual Networks start with the information here: Azure Virtual Network Overview.
The Azure App Service has two forms.
1. The multi-tenant systems that support the full range of pricing plans
2. The App Service Environment (ASE) premium feature which deploys into your VNET.
This document goes through VNET Integration and not App Service Environment. If you want to learn more about the
ASE feature then start with the information here: App Service Environment introduction.
VNET Integration gives your web app access to resources in your virtual network but does not grant private access to
your web app from the virtual network. Private site access is only available with an ASE configured with an Internal
Load Balancer (ILB). For details on using an ILB ASE, start with the article here: Creating and using an ILB ASE.
A common scenario where you would use VNET Integration is enabling access from your web app to a database or a
web service running on a virtual machine in your Azure virtual network. With VNET Integration you don't need to
expose a public endpoint for applications on your VM but can use the private non-internet routable addresses instead.
The VNET Integration feature:
requires a Standard or Premium pricing plan
will work with Classic(V1) or Resource Manager(V2) VNET
supports TCP and UDP
works with Web, Mobile and API apps
enables an app to connect to only 1 VNET at a time
enables up to 5 VNETs to be integrated with an App Service Plan
allows the same VNET to be used by multiple apps in an App Service Plan
supports a 99.9% SLA due to a reliance on the VNET Gateway
There are some things that VNET Integration does not support including:
mounting a drive
AD integration
NetBios
private site access
Getting started
Here are some things to keep in mind before connecting your web app to a virtual network:
VNET Integration only works with apps in a Standard or Premium pricing plan. If you enable the feature and
then scale your App Service Plan to an unsupported pricing plan your apps will lose their connections to the
VNETs they are using.
If your target virtual network already exists, it must have point-to-site VPN enabled with a Dynamic routing
gateway before it can be connected to an app. You cannot enable point-to-site Virtual Private Network (VPN)
if your gateway is configured with Static routing.
The VNET must be in the same subscription as your App Service Plan(ASP).
The apps that integrate with a VNET will use the DNS that is specified for that VNET.
By default your integrating apps will only route traffic into your VNET based on the routes that are defined in
your VNET.
Enabling VNET Integration
This document is focused primarily on using the Azure Portal for VNET Integration. To enable VNET Integration with
your app using PowerShell, follow the directions here: Connect your app to your virtual network by using PowerShell.
You have the option to connect your app to a new or existing virtual network. If you create a new network as a part of
your integration then in addition to just creating the VNET, a dynamic routing gateway will be pre-configured for you
and Point to Site VPN will be enabled.
Note: Configuring a new virtual network integration can take several minutes.
To enable VNET Integration open your app Settings and then select Networking. The UI that opens up offers three
networking choices. This guide is only going into VNET Integration though Hybrid Connections and App Service
Environments are discussed later in this document.
If your app is not in the correct pricing plan the UI will helpfully enable you to scale your plan to a higher pricing plan
of your choice.

Enabling VNET Integration with a pre-existing VNET
The VNET Integration UI allows you to select from a list of your VNETs. The Classic VNETs will indicate that they are
such with the word "Classic" in parenthesis next to the VNET name. The list is sorted such that the Resource Manager
VNETs are listed first. In the image shown below you can see that only one VNET can be selected. There are multiple
reasons that a VNET will be greyed out including:
the VNET is in another subscription that your account has access to
the VNET does not have Point to Site enabled
the VNET does not have a dynamic routing gateway
To enable integration simply click on the VNET you wish to integrate with. After you select the VNET, your app will be
automatically restarted for the changes to take effect.
Enable Point to Site in a Classic VNET
If your VNET does not have a gateway nor has Point to Site then you have to set that up first. To do this for a Classic
VNET, go to the Azure Portal and bring up the list of Virtual Networks (classic). From here click on the network you
want to integrate with and click on the big box under Essentials called VPN Connections. From here you can create
your point to site VPN and even have it create a gateway. After you go through the point to site with gateway creation
experience it will be about 30 minutes before it is ready.
Enabling Point to Site in a Resource Manager VNET
To configure a Resource Manager VNET with a gateway and Point to Site, you can use either PowerShell as
documented here, Configure a Point-to-Site connection to a virtual network using PowerShell or use the Azure Portal
as documented here, Configure a Point-to-Site connection to a VNet using the Azure Portal. The UI to perform this
capability is not yet available.
Creating a pre-configured VNET
If you want to create a new VNET that is configured with a gateway and Point-to-Site, then the App Service networking
UI has the capability to do that but only for a Resource manager VNET. If you wish to create a Classic VNET with a
gateway and Point-to-Site then you need to do this manually through the Networking user interface.
To create a Resource Manager VNET through the VNET Integration UI, simply select Create New Virtual Network and
provide the:
Virtual Network Name
Virtual Network Address Block
Subnet Name
Subnet Address Block
Gateway Address Block
Point-to-Site Address Block
If you want this VNET to connect to any of your other networks, then you should avoid picking IP address space that
overlaps with those networks.
Note
Resource Manager VNET creation with a gateway takes about 30 minutes and currently will not integrate the VNET
with your app. After your VNET is created with the gateway you need to come back to your app VNET Integration UI
and select your new VNET.
Azure VNETs normally are created within private network addresses. By default the VNET Integration feature will
route any traffic destined for those IP address ranges into your VNET. The private IP address ranges are:
10.0.0.0/8 - this is the same as 10.0.0.0 - 10.255.255.255
172.16.0.0/12 - this is the same as 172.16.0.0 - 172.31.255.255
192.168.0.0/16 - this is the same as 192.168.0.0 - 192.168.255.255
The VNET address space needs to be specified in CIDR notation. If you are unfamiliar with CIDR notation, it is a
method for specifying address blocks using an IP address and an integer that represents the network mask. As a quick
reference, consider that 10.1.0.0/24 would be 256 addresses and 10.1.0.0/25 would be 128 addresses. An IPv4
address with a /32 would be just 1 address.
If you set the DNS server information here then that will be set for your VNET. After VNET creation you can edit this
information from the VNET user experiences.
When you create a Classic VNET using the VNET Integration UI, it will create a VNET in the same resource group as
your app.
How the system works
Under the covers this feature builds on top of Point-to-Site VPN technology to connect your app to your VNET. Apps in
Azure App Service have a multi-tenant system architecture which precludes provisioning an app directly in a VNET as
is done with virtual machines. By building on point-to-site technology we limit network access to just the virtual
machine hosting the app. Access to the network is further restricted on those app hosts so that your apps can only
access the networks that you configure them to access.
If you haven't configured a DNS server with your virtual network you will need to use IP addresses. While using IP
addresses, remember that the major benefit of this feature is that it enables you to use the private addresses within
your private network. If you set your app up to use public IP addresses for one of your VMs then you aren't using the
VNET Integration feature and are communicating over the internet.
Managing the VNET Integrations
The ability to connect and disconnect to a VNET is at an app level. Operations that can affect the VNET Integration
across multiple apps are at an ASP level. From the UI that is shown at the app level you can get details on your VNET.
Most of the same information is also shown at the ASP level.

From the Network Feature Status page you can see if your app is connected to your VNET. If your VNET gateway is
down for whatever reason then this would show as not-connected.
The information you now have available to you in the app level VNET Integration UI is the same as the detail
information you get from the ASP. Here are those items:
VNET Name - This link opens the network UI
Location - This reflects the location of your VNET. It is possible to integrate with a VNET in another location.
Certificate Status - There are certificates used to secure the VPN connection between the app and the VNET.
This reflects a test to ensure they are in sync.
Gateway Status - Should your gateways be down for whatever reason then your app cannot access resources
in the VNET.
VNET address space - This is the IP address space for your VNET.
Point to Site address space - This is the point to site IP address space for your VNET. Your app will show
communication as coming from one of the IPs in this address space.
Site to site address space - You can use Site to Site VPNs to connect your VNET to your on premise resources
or to other VNETs. Should you have that configured then the IP ranges defined with that VPN connection will
show here.
DNS Servers - If you have DNS Servers configured with your VNET then they are listed here.
IPs routed to the VNET - There is a list of IP addresses that your VNET has routing defined for. Those
addresses will show here.
The only operation you can take in the app view of your VNET Integration is to disconnect your app from the VNET it is
currently connected to. To do this simply click Disconnect at the top. This action does not change your VNET. The
VNET and its configuration, including the gateways, remains unchanged. If you then want to delete your VNET you
need to first delete the resources in it including the gateways.
The App Service Plan view has a number of additional operations. It is also accessed differently than from the app. To
reach the ASP Networking UI simply open your ASP UI and scroll down. There is a UI element called Network Feature
Status. It will give some minor details around your VNET Integration. Clicking on this UI opens the Network Feature
Status UI. If you then click on "Click here to manage" you will open up UI that lists the VNET Integrations in this ASP.

The location of the ASP is good to remember when looking at the locations of the VNETs you are integrating with.
When the VNET is in another location you are far more likely to see latency issues.
The VNETs integrated with is a reminder on how many VNETs your apps are integrated with in this ASP and how many
you can have.
To see added details on each VNET, just click on the VNET you are interested in. In addition to the details that were
noted earlier you will also see a list of the apps in this ASP that are using that VNET.
With respect to actions there are two primary actions. The first is the ability to add routes that drive traffic leaving
your app into your VNET. The second action is the ability to sync certificates and network information.
Routing As noted earlier the routes that are defined in your VNET are what is used for directing traffic into your VNET
from your app. There are some uses though where customers want to send additional outbound traffic from an app
into the VNET and for them this capability is provided. What happens to the traffic after that is up to how the
customer configures their VNET.
Certificates The Certificate Status reflects a check being performed by the App Service to validate that the certificates
that we are using for the VPN connection are still good. When VNET Integration is enabled, and this is the first
integration to that VNET from any app in this ASP, there is a required exchange of certificates to ensure the security
of the connection. Along with the certificates we get the DNS configuration, routes, and other similar things that
describe the network. If those certificates or network information change, then you will need to click "Sync
Network". NOTE: When you click "Sync Network" you will cause a brief outage in connectivity between your app
and your VNET. While your app will not be restarted, the loss of connectivity could cause your site to not function
properly.
Accessing on premise resources


One of the benefits of the VNET Integration feature is that if your VNET is connected to your on premise network with
a Site to Site VPN, then your apps can access your on premise resources. For this to work, though, you may need to
update your on premise VPN gateway with the routes for your Point to Site IP range. When the Site to Site VPN is first
set up, the scripts used to configure it should set up routes, including your Point to Site VPN. If you add the Point to
Site VPN after you create your Site to Site VPN, then you will need to update the routes manually. Details on how to do
that will vary per gateway and are not described here.
Note
While the VNET Integration feature will work with a Site to Site VPN to access on premise resources it currently will
not work with an ExpressRoute VPN to do the same. This is true when integrating with either a Classic or Resource
Manager VNET. If you need to access resources through an ExpressRoute VPN then you can use an ASE which can run
in your VNET.
Pricing details
There are a few pricing nuances that you should be aware of when using the VNET Integration feature. There are 3
related charges to the use of this feature:
ASP pricing tier requirements
Data transfer costs
VPN Gateway costs.
For your apps to be able to use this feature, they need to be in a Standard or Premium App Service Plan. You can see
more details on those costs here: App Service Pricing.
Due to the way Point to Site VPNs are handled, you always have a charge for outbound data through your VNET
Integration connection even if the VNET is in the same data center. To see what those charges are take a look
here: Data Transfer Pricing Details.
The last item is the cost of the VNET gateways. If you don't need the gateways for something else such as Site to Site
VPNs then you are paying for gateways to support the VNET Integration feature. There are details on those costs
here: VPN Gateway Pricing.
Troubleshooting
While the feature is easy to set up, that doesn't mean that your experience will be problem free. Should you encounter
problems accessing your desired endpoint, there are some utilities you can use to test connectivity from the app
console. There are two console experiences you can use: one is the Kudu console, and the other is the console
that you can reach in the Azure Portal. To get to the Kudu console from your app, go to Tools -> Kudu. This is the same
as going to [sitename].scm.azurewebsites.net. Once that opens, simply go to the Debug console tab. To get to the
Azure portal hosted console, from your app go to Tools -> Console.
Tools
The tools ping, nslookup, and tracert won't work through the console due to security constraints. To fill the void, two
separate tools have been added. To test DNS functionality, a tool named nameresolver.exe was added. The
syntax is:
nameresolver.exe hostname [optional: DNS Server]
You can use nameresolver to check the hostnames that your app depends on. This way you can test if you have
anything mis-configured with your DNS or perhaps don't have access to your DNS server.
The next tool allows you to test for TCP connectivity to a host and port combination. This tool is called tcpping.exe and
the syntax is:
tcpping.exe hostname [optional: port]
This tool will tell you if you can reach a specific host and port but will not perform the same task you get with the
ICMP based ping utility. The ICMP ping utility will tell you if your host is up. With tcpping you find out if you can access
a specific port on a host.
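For example, to confirm that a hypothetical database endpoint is both resolvable and reachable, you could run the two tools back to back from the console (the host name db1.contoso.local and port 1433 below are placeholders for your own endpoint):
nameresolver.exe db1.contoso.local
tcpping.exe db1.contoso.local 1433
If nameresolver returns the expected private IP but tcpping times out, DNS is working and the failure is more likely a firewall, an NSG rule, or a service that isn't listening on that port.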
Debugging access to VNET hosted resources
There are a number of things that can prevent your app from reaching a specific host and port. Most of the time it is
one of three things:
There is a firewall in the way - If you have a firewall in the way, you will hit the TCP timeout, which is 21
seconds in this case. Use the tcpping tool to test connectivity. TCP timeouts can be due to many things
beyond firewalls, but start there.
DNS is not accessible - The DNS timeout is 3 seconds per DNS server. If you have 2 DNS servers, that is 6
seconds. Use nameresolver to see if DNS is working. Remember you can't use nslookup, as that does not use
the DNS your VNET is configured with.
Invalid P2S IP range - The point to site IP range needs to be in the RFC 1918 private IP ranges (10.0.0.0-
10.255.255.255 / 172.16.0.0-172.31.255.255 / 192.168.0.0-192.168.255.255). If the range uses IPs outside of
that, then things won't work.
If those items don't answer your problem, look first for the simple things like:
Does the Gateway show as being up in the Portal?
Do certificates show as being in sync?
Did anybody change the network configuration without doing a "Sync Network" in the affected ASPs?
If your gateway is down, then bring it back up. If your certificates are out of sync, then go to the ASP view of your VNET
Integration and hit "Sync Network". If you suspect that there has been a change made to your VNET configuration and
it wasn't sync'd with your ASPs, then go to the ASP view of your VNET Integration and hit "Sync Network". Just as a
reminder, this will cause a brief outage with your VNET connection and your apps.
If all of that is fine then you need to dig in a bit deeper:
Are there any other apps using VNET Integration to reach resources in the same VNET?
Can you go to the app console and use tcpping to reach any other resources in your VNET?
If either of the above are true then your VNET Integration is fine and the problem is somewhere else. This is where it
gets to be more of a challenge because there is no simple way to see why you can't reach a host:port. Some of the
causes include:
you have a firewall up on your host preventing access to the application port from your point to site IP range.
Crossing subnets often requires Public access.
your target host is down
your application is down
you had the wrong IP or hostname
your application is listening on a different port than what you expected. You can check this by going onto that
host and using "netstat -aon" from the cmd prompt; this will show you what process ID is listening on what
port (see the example after this list).
your network security groups are configured in such a manner that they prevent access to your application
host and port from your point to site IP range
Remember that you don't know which IP in your Point to Site IP range your app will use, so you need to allow
access from the entire range.
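As a minimal example of that check (the port 8080 and PID 1234 below are hypothetical), you can filter the netstat output to the port your application should be listening on and then map the owning process ID back to an executable:
netstat -aon | findstr :8080
tasklist /fi "PID eq 1234"
The first command shows whether anything is in the LISTENING state on that port and which PID owns the socket; the second, run with the PID you actually found, tells you which process that is.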
Additional debug steps include:
log onto another VM in your VNET and attempt to reach your resource host:port from there. There are some
TCP ping utilities that you can use for this purpose, or you can even use telnet if need be. The purpose here is just
to determine if connectivity is there from this other VM (a sketch using Test-NetConnection follows this list).
bring up an application on another VM and test access to that host and port from the console of your app
On premise resources - If you cannot reach resources on premise, then the first thing you should
check is whether you can reach a resource in your VNET. If that is working, then the next steps are pretty easy. From a
VM in your VNET, try to reach the on premise application. You can use telnet or a TCP ping utility.
If your VM can't reach your on premise resource, then first make sure your Site to Site VPN connection is
working. If it is working, then check the same things noted earlier as well as the on premise gateway
configuration and status.
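A minimal sketch of that connectivity check from a VM inside the VNET, using the built-in Test-NetConnection PowerShell cmdlet (the addresses and port below are placeholders):
Test-NetConnection -ComputerName 10.0.1.4 -Port 1433
Test-NetConnection -ComputerName onprem-sql.contoso.local -Port 1433
TcpTestSucceeded : True in the output means the VM can open a TCP connection to that host and port. If the VM succeeds but tcpping from the app console fails, the problem is specific to the Point to Site path, such as routes, NSGs, or firewalls filtering the Point to Site range.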
Now if your VNET hosted VM can reach your on premise system but your app can't then the reason is likely one of the
following:
your routes are not configured with your point to site IP ranges in your on premise gateway
your network security groups are blocking access for your Point to Site IP range
your on premise firewalls are blocking traffic from your Point to Site IP range
you have a User Defined Route (UDR) in your VNET that prevents your Point to Site based traffic from reaching
your on premise network
Hybrid Connections and App Service Environments
There are 3 features that enable access to VNET hosted resources. They are:
VNET Integration
Hybrid Connections
App Service Environments


Hybrid Connections requires you to install a relay agent called the Hybrid Connection Manager (HCM) in your network.
The HCM needs to be able to connect to Azure and also to your application. This solution is especially useful for
reaching resources in a remote network, such as your on premise network or even another cloud hosted network,
because it does not require an internet accessible endpoint. The HCM only runs on Windows, and you can have up to 5
instances running to provide high availability. Hybrid Connections only supports TCP, though, and each HC endpoint
has to match a specific host:port combination.
The App Service Environment feature allows you to run an instance of the Azure App Service in your VNET. This lets
your apps access resources in your VNET without any extra steps. Some of the other benefits of an App Service
Environment are that you can use 8 core dedicated workers with 14 GB of RAM, and that you can scale
the system to meet your needs. Unlike the multi-tenant environments where your ASP is limited in size, in an ASE you
control how many resources you want to give to the system. With respect to the network focus of this document,
though, one of the things you get with an ASE that you don't get with VNET Integration is that it can work with an
ExpressRoute VPN.
While there is some use case overlap, none of these features can replace any of the others. Knowing which feature to
use is tied to your needs and how you will want to use it. For example:
If you are a developer and simply want to run a site in Azure and have it access the database on the
workstation under your desk then the easiest thing to use is Hybrid Connections.
If you are a large organization that wants to put a large number of web properties in the public cloud and
manage them in your own network then you want to go with the App Service Environment.
If you have a number of App Service hosted apps and simply want to access resources in your VNET then
VNET Integration is the way to go.
Beyond the use cases there are some simplicity related aspects. If your VNET is already connected to your on premise
network then using VNET Integration or an App Service Environment is an easy way to consume on premise resources.
On the other hand, if your VNET is not connected to your on premise network then it's a lot more overhead to set up a
site to site VPN with your VNET compared with installing the HCM.
Beyond the functional differences there are also pricing differences. The App Service Environment feature is a
Premium service offering but offers the most network configuration possibilities in addition to other great features.
VNET Integration can be used with Standard or Premium ASPs and is perfect for securely consuming resources in your
VNET from the multi-tenant App Service. Hybrid Connections currently depends on a BizTalk account which has pricing
levels that start free and then get progressively more expensive based on the amount you need. When it comes to
working across many networks though, there is no other feature like Hybrid Connections which can enable you to
access resources in well over 100 separate networks.
CHAPTER 7: Design a management, monitoring, and business continuity strategy
Microsoft products and services for monitoring Azure solutions:

What is Application Insights?


Application Insights is an extensible Application Performance Management (APM) service for web developers on
multiple platforms. Use it to monitor your live web application. It will automatically detect performance anomalies. It
includes powerful analytics tools to help you diagnose issues and to understand what users actually do with your app.
It's designed to help you continuously improve performance and usability. It works for apps on a wide variety of
platforms including .NET, Node.js and J2EE, hosted on-premises or in the cloud. It integrates with your DevOps
process, and has connection points to a variety of development tools.
How does Application Insights work?
You install a small instrumentation package in your application, and set up an Application Insights resource in the
Microsoft Azure portal. The instrumentation monitors your app and sends telemetry data to the portal. (The
application can run anywhere - it doesn't have to be hosted in Azure.)
You can instrument not only the web service application, but also any background components, and the JavaScript in
the web pages themselves.
In addition, you can pull in telemetry from the host environments such as performance counters, Azure diagnostics, or
Docker logs. You can also set up web tests that periodically send synthetic requests to your web service.
All these telemetry streams are integrated in the Azure portal, where you can apply powerful analytic and search tools
to the raw data.
What's the overhead?
The impact on your app's performance is very small. Tracking calls are non-blocking, and are batched and sent in a
separate thread.
What does Application Insights monitor?
Application Insights is aimed at the development team, to help you understand how your app is performing and how
it's being used. It monitors:
Request rates, response times, and failure rates - Find out which pages are most popular, at what times of day,
and where your users are. See which pages perform best. If your response times and failure rates go high
when there are more requests, then perhaps you have a resourcing problem.
Dependency rates, response times, and failure rates - Find out whether external services are slowing you
down.
Exceptions - Analyse the aggregated statistics, or pick specific instances and drill into the stack trace and
related requests. Both server and browser exceptions are reported.
Page views and load performance - reported by your users' browsers.
AJAX calls from web pages - rates, response times, and failure rates.
User and session counts.
Performance counters from your Windows or Linux server machines, such as CPU, memory, and network
usage.
Host diagnostics from Docker or Azure.
Diagnostic trace logs from your app - so that you can correlate trace events with requests.
Custom events and metrics that you write yourself in the client or server code, to track business events such
as items sold or games won.
Where do I see my telemetry?
There are plenty of ways to explore your data. Check out these articles:

Smart detection and manual alerts


Automatic alerts adapt to your app's normal
patterns of telemetry and trigger when there's
something outside the usual pattern. You can
also set alerts on particular levels of custom or
standard metrics.

Application map
The components of your app, with key metrics and
alerts.

Profiler
Inspect the execution profiles of sampled requests.
Usage analysis
Analyze user segmentation and retention.

Diagnostic search for instance data


Search and filter events such as requests,
exceptions, dependency calls, log traces, and page
views.

Metrics Explorer for aggregated data


Explore, filter, and segment aggregated data such
as rates of requests, failures, and exceptions;
response times, page load times.

Dashboards
Mash up data from multiple resources and share
with others. Great for multi-component
applications, and for continuous display in the team
room.

Live Metrics Stream


When you deploy a new build, watch these near-
real-time performance indicators to make sure
everything works as expected.
Analytics
Answer tough questions about your app's
performance and usage by using this powerful
query language.

Visual Studio
See performance data in the code. Go to code from
stack traces.

Snapshot debugger
Debug snapshots sampled from live operations,
with parameter values.

Power BI
Integrate usage metrics with other business
intelligence.

REST API
Write code to run queries over your metrics and
raw data.
Continuous export
Bulk export of raw data to storage as soon as it
arrives.

How do I use Application Insights?


Monitor
Install Application Insights in your app, set up availability web tests, and:
Set up a dashboard for your team room to keep an eye on load, responsiveness, and the performance of your
dependencies, page loads, and AJAX calls.
Discover which are the slowest and most failing requests.
Watch Live Stream when you deploy a new release, to know immediately about any degradation.
Detect, Diagnose
When you receive an alert or discover a problem:
Assess how many users are affected.
Correlate failures with exceptions, dependency calls and traces.
Examine profiler, snapshots, stack dumps, and trace logs.
Build, Measure, Learn
Measure the effectiveness of each new feature that you deploy.
Plan to measure how customers use new UX or business features.
Write custom telemetry into your code.
Base the next development cycle on hard evidence from your telemetry.
Get started
Application Insights is one of the many services hosted within Microsoft Azure, and telemetry is sent there for analysis
and presentation. So before you do anything else, you'll need a subscription to Microsoft Azure. It's free to sign up,
and if you choose the basic pricing plan of Application Insights, there's no charge until your application has grown to
have substantial usage. If your organization already has a subscription, they could add your Microsoft account to it.
There are several ways to get started. Begin with whichever works best for you. You can add the others later.
At run time: instrument your web app on the server. Avoids any update to the code. You need admin access to
your server.
o IIS on-premises or on a VM
o Azure web app or VM
o J2EE
At development time: add Application Insights to your code. Allows you to write custom telemetry and to
instrument back-end and desktop apps.
o Visual Studio 2013 update 2 or later.
o Java in Eclipse or other tools
o Node.js
o Other platforms
Instrument your web pages for page view, AJAX and other client-side telemetry.
Availability tests - ping your website regularly from our servers.

Overview of Monitoring in Microsoft Azure


This article provides an overview of tools available for monitoring Microsoft Azure. It applies to
monitoring applications running in Microsoft Azure
tools/services that run outside of Azure that can monitor objects in Azure.
It discusses the various products and services available and how they work together, and it can help you determine
which tools are most appropriate for you in which cases.
Why use Monitoring and Diagnostics?
Performance issues in your cloud app can impact your business. With multiple interconnected components and
frequent releases, degradations can happen at any time. And if you're developing an app, your users usually discover
issues that you didn't find in testing. You should know about these issues immediately, and have tools for diagnosing
and fixing the problems. Microsoft Azure has a range of tools for identifying these problems.
How do I monitor my Azure cloud apps?
There is a range of tools for monitoring Azure applications and services. Some of their features overlap. This is partly
for historical reasons and partly due to the blurring between development and operation of an application.
Here are the principal tools:
Azure Monitor is the basic tool for monitoring services running on Azure. It gives you infrastructure-level data
about the throughput of a service and the surrounding environment. If you are managing your apps all in
Azure, and deciding whether to scale resources up or down, then Azure Monitor gives you what you need to get
started (a short PowerShell sketch follows this list).
Application Insights can be used for development and as a production monitoring solution. It works by
installing a package into your app, and so gives you a more internal view of what's going on. Its data includes
response times of dependencies, exception traces, debugging snapshots, execution profiles. It provides
powerful smart tools for analyzing all this telemetry both to help you debug an app and to help you
understand what users are doing with it. You can tell whether a spike in response times is due to something in
an app, or some external resourcing issue. If you use Visual Studio and the app is at fault, you can be taken
right to the problem line(s) of code so you can fix it.
Log Analytics is for those who need to tune performance and plan maintenance on applications running in
production. It is based in Azure. It collects and aggregates data from many sources, though with a delay of 10
to 15 minutes. It provides a holistic IT management solution for Azure, on-premises, and third-party cloud-
based infrastructure (such as Amazon Web Services). It provides richer tools to analyze data across more
sources, allows complex queries across all logs, and can proactively alert on specified conditions. You can even
collect custom data into its central repository so you can query and visualize it.
System Center Operations Manager (SCOM) is for managing and monitoring large cloud installations. You
might be already familiar with it as a management tool for on-premises Windows Server and Hyper-V based
clouds, but it can also integrate with and manage Azure apps. Among other things, it can install Application
Insights on existing live apps. If an app goes down, it tells you in seconds. Note that Log Analytics does not
replace SCOM. It works well in conjunction with it.
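As a rough sketch of the kind of infrastructure-level data Azure Monitor exposes, the AzureRM.Insights PowerShell cmdlets can list and pull metrics for a resource. The resource ID below is a placeholder, and parameter sets differ between module versions, so treat this as an illustration rather than a recipe:
# Assumes you have signed in with Login-AzureRmAccount and have the AzureRM.Insights module installed
$resourceId = "/subscriptions/<subscription-id>/resourceGroups/myRG/providers/Microsoft.Web/sites/mysite"  # hypothetical resource
Get-AzureRmMetricDefinition -ResourceId $resourceId            # list the metrics available for this resource
Get-AzureRmMetric -ResourceId $resourceId -TimeGrain 01:00:00  # retrieve hourly metric values
From there you can feed the values into your own scale-up or scale-down decisions, or simply chart them in the portal.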
Accessing monitoring in the Azure portal
All Azure monitoring services are now available in a single UI pane. For more information on how to access this area,
see Get started with Azure Monitor.
You can also access monitoring functions for specific resources by highlighting those resources and drilling down into
their monitoring options.
Examples of when to use which tool
The following sections show some basic scenarios and which tools should be used together.
Scenario 1 - Fix errors in an Azure application under development
The best option is to use Application Insights, Azure Monitor, and Visual Studio together.
Azure now provides the full power of the Visual Studio debugger in the cloud. Configure Azure Monitor to send
telemetry to Application Insights. Enable Visual Studio to include the Application Insights SDK in your application. Once
in Application Insights, you can use the Application Map to discover visually which parts of your running application
are unhealthy or not. For those parts that are not healthy, errors and exceptions are already available for exploration.
You can use the various analytics in Application Insights to go deeper. If you are not sure about the error, you can use
the Visual Studio debugger to trace into code and pinpoint a problem further.
For more information, see Monitoring Web Apps and refer to the table of contents on the left for instructions on
various types of apps and languages.
Scenario 2 - Debug an Azure .NET web application for errors that only show in production
Note
These features are in preview.
The best option is to use Application Insights and if possible Visual Studio for the full debugging experience.
Use the Application Insights Snapshot Debugger to debug your app. When a certain error threshold occurs with
production components, the system automatically captures telemetry in windows of time called "snapshots." The
amount captured is safe for a production cloud because it's small enough not to affect performance but significant
enough to allow tracing. The system can capture multiple snapshots. You can look at a point in time in the Azure
portal or use Visual Studio for the full experience. With Visual Studio, developers can walk through that snapshot as if
they were debugging in real-time. Local variables, parameters, memory, and frames are all available. Developers must
be granted access to this production data via an RBAC role.
For more information, see Snapshot debugging.
Scenario 3 - Debug an Azure application that uses containers or microservices
Same as scenario 1. Use Application Insights, Azure Monitor, and Visual Studio together. Application Insights also
supports gathering telemetry from processes running inside containers and from microservices (Kubernetes, Docker,
Azure Service Fabric). For more information, see this video on debugging containers and microservices.
Scenario 4 - Fix performance issues in your Azure application
The Application Insights profiler is designed to help troubleshoot these types of issues. You can identify and
troubleshoot performance issues for applications running in App Services (Web Apps, Logic Apps, Mobile Apps, API
Apps) and other compute resources such as Virtual Machines, Virtual machine scale sets (VMSS), Cloud Services, and
Service Fabric.
Note
Ability to profile Virtual Machines, Virtual machine scale sets (VMSS), Cloud Services, and Service Fabric is in preview.
In addition, you are proactively notified by email about certain types of errors, such as slow page load times, by the
Smart Detection tool. You don't need to do any configuration on this tool. For more information, see Smart Detection
- Performance Anomalies.

What is Log Analytics?


Log Analytics is a service in Operations Management Suite (OMS) that monitors your cloud and on-premises
environments to maintain their availability and performance. It collects data generated by resources in your cloud and
on-premises environments and from other monitoring tools to provide analysis across multiple sources. This article
provides a brief discussion of the value that Log Analytics provides, an overview of how it operates, and links to more
detailed content so you can dig further.
Is Log Analytics for you?
If you have no current monitoring in place for your Azure environment, you should start with Azure Monitor which
collects and analyzes monitoring data for your Azure resources. Log Analytics can collect data from Azure Monitor to
correlate it with other data and provide additional analysis.
If you want to monitor your on-premises environment or you have existing monitoring using services such as Azure
Monitor or System Center Operations Manager, then Log Analytics can add significant value. It can collect data
directly from your agents and also from these other tools into a single repository. Analysis tools in Log Analytics such
as log searches, views, and solutions work against all collected data providing you with centralized analysis of your
entire environment.
Using Log Analytics
You can access Log Analytics through the OMS portal or the Azure portal which run in any browser and provide you
with access to configuration settings and multiple tools to analyze and act on collected data. From the portal you can
leverage log searches where you construct queries to analyze collected data, dashboards which you can customize
with graphical views of your most valuable searches, and solutions which provide additional functionality and analysis
tools.
The image below is from the OMS portal which shows the dashboard that displays summary information for
the solutions that are installed in the workspace. You can click on any tile to drill further into the data for that solution.
Log Analytics includes a query language to quickly retrieve and consolidate data in the repository. You can create and
save Log Searches to directly analyze data in the portal or have log searches run automatically to create an alert if the
results of the query indicate an important condition.
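As a sketch of driving a log search from PowerShell (assuming the AzureRM.OperationalInsights module; the resource group, workspace name, and the legacy-style query below are placeholders, and the query syntax differs if your workspace has been upgraded to the newer query language):
# Run a search query against a workspace and return the matching records
$results = Get-AzureRmOperationalInsightsSearchResults -ResourceGroupName "myRG" -WorkspaceName "myWorkspace" -Query "Type=Event EventLevelName=error" -Top 100
$results.Value   # raw records returned by the search
The same query text can be saved in the portal and wired to an alert so it runs automatically and notifies you when the result count crosses a threshold.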

To get a quick graphical view of the health of your overall environment, you can add visualizations for saved log
searches to your dashboard.
In order to analyze data outside of Log Analytics, you can export the data from the OMS repository into tools such
as Power BI or Excel. You can also leverage the Log Search API to build custom solutions that leverage Log Analytics
data or to integrate with other systems.
Add functionality with management solutions
Management solutions add functionality to OMS, providing additional data and analysis tools to Log Analytics. They
may also define new record types to be collected that can be analyzed with Log Searches or by additional user
interface provided by the solution in the dashboard. The example image below shows the Change Tracking solution.
Solutions are available for a variety of functions, and additional solutions are consistently being added. You can easily
browse available solutions and add them to your OMS workspace from the Solutions Gallery or Azure Marketplace.
Many will be automatically deployed and start working immediately while others may require moderate configuration.
Log Analytics components


At the center of Log Analytics is the OMS repository which is hosted in the Azure cloud. Data is collected into the
repository from connected sources by configuring data sources and adding solutions to your subscription. Data
sources and solutions will each create different record types that have their own set of properties but may still be
analyzed together in queries to the repository. This allows you to use the same tools and methods to work with
different kinds of data collected by different sources.

Connected sources are the computers and other resources that generate data collected by Log Analytics. This can
include agents installed on Windows and Linux computers that connect directly or agents in a connected System
Center Operations Manager management group. For Azure resources, Log Analytics collects data from Azure Monitor
and Azure Diagnostics.
Data sources are the different kinds of data collected from each connected source. This
includes events and performance data from Windows and Linux agents in addition to sources such as IIS logs,
and custom text logs. You configure each data source that you want to collect, and the configuration is automatically
delivered to each connected source.
If you have custom requirements, then you can use the HTTP Data Collector API to write data to the repository from a
REST API client.
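A rough PowerShell sketch of that pattern follows, based on the documented SharedKey scheme; the workspace ID, key, record body, and log type are placeholders, and the endpoint and signature details should be checked against the Data Collector API reference for your environment:
# Post one custom record to the Log Analytics HTTP Data Collector API (illustrative only)
$workspaceId = "<workspace-id>"          # OMS workspace ID (placeholder)
$sharedKey   = "<primary-key>"           # workspace primary key (placeholder)
$logType     = "MyCustomLog"             # surfaces as MyCustomLog_CL in log searches
$body        = '[{"Computer":"web01","ResponseTimeMs":120}]'
$method      = "POST"
$contentType = "application/json"
$resource    = "/api/logs"
$date        = [DateTime]::UtcNow.ToString("r")
$length      = [Text.Encoding]::UTF8.GetBytes($body).Length
# Signature = Base64(HMAC-SHA256(key, "POST`n<length>`napplication/json`nx-ms-date:<date>`n/api/logs"))
$stringToSign = "$method`n$length`n$contentType`nx-ms-date:$date`n$resource"
$hmac         = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key     = [Convert]::FromBase64String($sharedKey)
$signature    = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($stringToSign)))
$headers = @{
    "Authorization" = "SharedKey ${workspaceId}:${signature}"
    "Log-Type"      = $logType
    "x-ms-date"     = $date
}
$uri = "https://${workspaceId}.ods.opinsights.azure.com${resource}?api-version=2016-04-01"
Invoke-WebRequest -Uri $uri -Method $method -ContentType $contentType -Headers $headers -Body $body -UseBasicParsing
Once records like this are accepted, they appear as a custom record type that you can query alongside data from agents and solutions.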
Log Analytics architecture
The deployment requirements of Log Analytics are minimal since the central components are hosted in the Azure
cloud. This includes the repository in addition to the services that allow you to correlate and analyze collected data.
The portal can be accessed from any browser so there is no requirement for client software.
You must install agents on Windows and Linux computers, but there is no additional agent required for computers
that are already members of a connected SCOM management group. SCOM agents will continue to communicate with
management servers which will forward their data to Log Analytics. Some solutions though will require agents to
communicate directly with Log Analytics. The documentation for each solution will specify its communication
requirements.
When you sign up for Log Analytics, you will create an OMS workspace. You can think of the workspace as a unique
Log Analytics environment with its own data repository, data sources, and solutions. You may create multiple
workspaces in your subscription to support multiple environments such as production and test.

What is Site Recovery?


Welcome to the Azure Site Recovery service! This article provides a quick overview of the service.
Outages are caused by natural events and operational failures. Your organization needs a business continuity and
disaster recovery (BCDR) strategy so that, during planned and unplanned downtime, data stays safe, apps remain
available, and business recovers to normal working conditions as soon as possible.
Azure Recovery Services contribute to your BCDR strategy. The Azure Backup service keeps your data safe and
recoverable. Site Recovery replicates, fails over, and recovers workloads, so that they remain available when failure
occurs.
What does Site Recovery provide?
Disaster recovery in the cloud - You can replicate and protect VMs running on Azure, using the Azure to Azure
disaster recovery solution. You can replicate workloads running on VMs and physical servers to Azure, rather
than to a secondary site. This eliminates the cost and complexity of maintaining a secondary datacenter.
Flexible replication for hybrid environments - You can replicate any workload supported on Azure VMs, on-
premises Hyper-V VMs, VMware VMs, and Windows/Linux physical servers.
Migration - You can use Site Recovery to migrate on-premises and AWS instances to Azure VMs, or to migrate
Azure VMs between Azure regions.
Simplified BCDR - You can deploy replication from a single location in the Azure portal. You can run simple
failovers and failback of single and multiple machines.
Resilience - Site Recovery orchestrates replication and failover, without intercepting application data.
Replicated data is stored in Azure storage, with the resilience that Azure storage provides. When failover
occurs, Azure VMs are created based on the replicated data.
Replication performance - Site Recovery provides continuous replication for Azure VMs and VMware VMs, and
replication frequency as low as 30 seconds for Hyper-V. You can reduce recovery time objective (RTO) with
Site Recovery's automated recovery process, and integration with Azure Traffic Manager.
Application consistency - You can configure application-consistent snapshots for the recovery points. In
addition to capturing disk data, application-consistent snapshots capture all data in memory, and all
transactions in process.
Testing without disruption - You can easily run test failovers to support disaster recovery drills, without
affecting production environments and the ongoing replication.
Flexible failover and recovery - You can run planned failovers for expected outages with zero data loss, or
unplanned failovers with minimal data loss (depending on replication frequency) for unexpected disasters.
You can easily fail back to your primary site when it's available again.
Custom recovery plans - Recovery plans allow you to model and customize failover and recovery of multi-tier
applications that are spread over multiple VMs. You order groups within plans, and add scripts and manual
actions. Recovery plans can be integrated with Azure Automation runbooks.
Multi-tier apps - You can create recovery plans for sequenced failover and recovery of multi-tiered apps. You
can group machines in different tiers (for example database, web, app) within a recovery plan, and customize
how each group fails over and starts up.
Integration with existing BCDR technologies - Site Recovery integrates with other BCDR technologies. For
example, you can use Site Recovery to protect the SQL Server backend of corporate workloads, including
native support for SQL Server AlwaysOn, to manage the failover of availability groups.
Integration with the automation library - A rich Azure Automation library provides production-ready,
application-specific scripts that can be downloaded and integrated with Site Recovery.
Simple network management - Advanced network management in Site Recovery and Azure simplifies
application network requirements, including reserving IP addresses, configuring load-balancers, and
integrating Azure Traffic Manager for efficient network switchovers.
What's supported?

Supported | Details
Which regions are supported for Site Recovery? | Supported regions
What can I replicate? | Azure VMs (in preview), on-premises VMware VMs, Hyper-V VMs, Windows and Linux physical servers.
What operating systems do replicated machines need? | Supported operating systems for Azure VMs; supported operating systems for VMware VMs; for Hyper-V VMs, any guest OS supported by Azure and Hyper-V is supported; operating systems for physical servers.
Where can I replicate to? | For Azure VMs, you can replicate to another Azure region. For on-premises machines, you can replicate to Azure storage, or to a secondary datacenter.

Note
For Hyper-V, only VMs on Hyper-V hosts managed in System Center VMM clouds can replicate to a secondary
datacenter.
What VMware servers/hosts do I need? | VMware VMs you want to replicate can be managed by supported vSphere hosts/vCenter servers.
What workloads can I replicate? | You can replicate any workload running on a supported replication machine. In addition, the Site Recovery team has performed app-specific testing for a number of apps.
Which Azure portal?
Site Recovery can be deployed in the Azure portal.
In the Azure classic portal, you can manage Site Recovery with the classic services management model.
The classic portal should only be used to maintain existing Site Recovery deployments. You can't create new
vaults in the classic portal.

How does Hyper-V replication to Azure work in Site Recovery?


This article describes the components and processes involved when replicating on-premises Hyper-V virtual machines,
to Azure using the Azure Site Recovery service.
Site Recovery can replicate Hyper-V VMs on Hyper-V clusters and standalone hosts that are managed with or without
System Center Virtual Machine Manager (VMM).
Architectural components
There are a number of components involved when replicating Hyper-V VMs to Azure.

Component | Location | Details
Azure | In Azure you need a Microsoft Azure account, an Azure storage account, and an Azure network. | Replicated data is stored in the storage account, and Azure VMs are created with the replicated data when failover from your on-premises site occurs. The Azure VMs connect to the Azure virtual network when they're created.
VMM server | Hyper-V hosts are located in VMM clouds. | If Hyper-V hosts are managed in VMM clouds, you register the VMM server in the Recovery Services vault. On the VMM server you install the Site Recovery Provider to orchestrate replication with Azure. You need logical and VM networks set up to configure network mapping. A VM network should be linked to a logical network that's associated with the cloud.
Hyper-V host | Hyper-V hosts and clusters can be deployed with or without a VMM server. | If there's no VMM server, the Site Recovery Provider is installed on the host to orchestrate replication with Site Recovery over the internet. If there's a VMM server, the Provider is installed on it, and not on the host. The Recovery Services agent is installed on the host to handle data replication. Communications from both the Provider and the agent are secure and encrypted. Replicated data in Azure storage is also encrypted.
Hyper-V VMs | You need one or more VMs running on a Hyper-V host server. | Nothing needs to be explicitly installed on the VMs.

Learn about the deployment prerequisites and requirements for each of these components in the support matrix.
Figure 1: Hyper-V site to Azure replication

Figure 2: Hyper-V in VMM clouds to Azure replication

Replication process
Figure 3: Replication and recovery process for Hyper-V replication to Azure

Enable protection
1. After you enable protection for a Hyper-V VM, in the Azure portal or on-premises, the Enable
protection job starts.
2. The job checks that the machine complies with prerequisites, before invoking
the CreateReplicationRelationship method, to set up replication with the settings you've configured.
3. The job starts initial replication by invoking the StartReplication method, to initialize a full VM replication, and
send the VM's virtual disks to Azure.
4. You can monitor the job in the Jobs tab.

Replicate the initial data


1. A Hyper-V VM snapshot is taken when initial replication is triggered.
2. Virtual hard disks are replicated one by one until they're all copied to Azure. It could take a while, depending
on the VM size, and network bandwidth. To optimize your network usage, see How to manage on-premises to
Azure protection network bandwidth usage.
3. If disk changes occur while initial replication is in progress, the Hyper-V Replica Replication Tracker tracks
those changes as Hyper-V Replication Logs (.hrl). These files are located in the same folder as the disks. Each
disk has an associated .hrl file that will be sent to secondary storage.
4. The snapshot and log files consume disk resources while initial replication is in progress.
5. When the initial replication finishes, the VM snapshot is deleted. Delta disk changes in the log are
synchronized and merged to the parent disk.
Finalize protection
1. After the initial replication finishes, the Finalize protection on the virtual machine job configures network and
other post-replication settings, so that the virtual machine is protected.

2. If you're replicating to Azure, you might need to tweak the settings for the virtual machine so that it's ready
for failover. At this point you can run a test failover to check everything is working as expected.
Replicate the delta
1. After the initial replication, delta synchronization begins, in accordance with replication settings.
2. The Hyper-V Replica Replication Tracker tracks the changes to a virtual hard disk as .hrl files. Each disk that's
configured for replication has an associated .hrl file. This log is sent to the customer's storage account after
initial replication is complete. When a log is in transit to Azure, the changes in the primary disk are tracked in
another log file, in the same directory.
3. During initial and delta replication, you can monitor the VM in the VM view. Learn more.
Synchronize replication
1. If delta replication fails, and a full replication would be costly in terms of bandwidth or time, then a VM is
marked for resynchronization. For example, if the .hrl files reach 50% of the disk size, then the VM will be
marked for resynchronization.
2. Resynchronization minimizes the amount of data sent by computing checksums of the source and target
virtual machines, and sending only the delta data. Resynchronization uses a fixed-block chunking algorithm
where source and target files are divided into fixed chunks. Checksums for each chunk are generated and
then compared to determine which blocks from the source need to be applied to the target.
3. After resynchronization finishes, normal delta replication should resume. By default resynchronization is
scheduled to run automatically outside office hours, but you can resynchronize a virtual machine manually.
For example, you can resume resynchronization if a network outage or another outage occurs. To do this,
select the VM in the portal > Resynchronize.
Retry logic
If a replication error occurs, there's a built-in retry. This logic can be classified into two categories:

Category | Details
Non-recoverable errors | No retry is attempted. VM status will be Critical, and administrator intervention is required. Examples of these errors include: broken VHD chain; invalid state for the replica VM; network authentication errors; authorization errors; VM not found errors (for standalone Hyper-V servers).
Recoverable errors | Retries occur every replication interval, using an exponential back-off that increases the retry interval from the start of the first attempt by 1, 2, 4, 8, and 10 minutes. If an error persists, retry every 30 minutes. Examples include: network errors; low disk errors; low memory conditions.

Failover and failback process


1. You can run a planned or unplanned failover from on-premises Hyper-V VMs to Azure. If you run a planned
failover, then source VMs are shut down to ensure no data loss.
2. You can fail over a single machine, or create recovery plans to orchestrate failover of multiple machines.
3. After you run the failover, you should be able to see the created replica VMs in Azure. You can assign a public
IP address to the VM if required.
4. You then commit the failover to start accessing the workload from the replica Azure VM.
5. When your primary on-premises site is available again, you can fail back. You kick off a planned failover from
Azure to the primary site. For a planned failover you can select to failback to the same VM or to an alternate
location, and synchronize changes between Azure and on-premises, to ensure no data loss. When VMs are
created on-premises, you commit the failover.

Azure Site Recovery support matrix for replicating from on-premises to Azure
This article summarizes supported configurations and components for Azure Site Recovery when replicating and
recovering to Azure. For more about Azure Site Recovery requirements, see the prerequisites.
Support for deployment options

Hyper-V (with/without Virtual


Deployment VMware/physical server Machine Manager)

Azure portal On-premises VMware VMs to Azure On-premises Hyper-V VMs to


storage, with Azure Resource Azure storage, with Resource
Manager or classic storage and Manager or classic storage and
networks. networks.

Failover to Resource Manager-based Failover to Resource Manager-


or classic VMs. based or classic VMs.

Classic Maintenance mode only. New vaults Maintenance mode only.


portal can't be created.

PowerShell Not currently supported. Supported

Support for datacenter management servers


Virtualization management entities
Deployment | Support
VMware VM/physical server | vSphere 6.0, 5.5, or 5.1 with latest update
Hyper-V (with Virtual Machine Manager) | System Center Virtual Machine Manager 2016 and System Center Virtual Machine Manager 2012 R2

Note
A System Center Virtual Machine Manager 2016 cloud with a mixture of Windows Server 2016 and 2012 R2 hosts
isn't currently supported.
Host servers

Deployment | Support
VMware VM/physical server | vCenter 5.5 or 6.0 (support for 5.5 features only)
Hyper-V (with/without Virtual Machine Manager) | Windows Server 2016, Windows Server 2012 R2 with latest updates. If SCVMM is used, Windows Server 2016 hosts should be managed by SCVMM 2016.

Note
A Hyper-V site that mixes hosts running Windows Server 2016 and 2012 R2 isn't currently supported. Recovery to
an alternate location for VMs on a Windows Server 2016 host isn't currently supported.
Support for replicated machine OS versions
Virtual machines that are protected must meet Azure requirements when replicating to Azure. The following table
summarizes replicated operating system support in various deployment scenarios while using Azure Site Recovery.
This support is applicable for any workload running on the mentioned OS.

VMware/physical server | Hyper-V (with/without VMM)
64-bit Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2 with at least SP1; Red Hat Enterprise Linux 6.7, 6.8, 7.1, 7.2; CentOS 6.5, 6.6, 6.7, 6.8, 7.0, 7.1, 7.2; Ubuntu 14.04 LTS server (supported kernel versions); Oracle Enterprise Linux 6.4, 6.5 running either the Red Hat compatible kernel or Unbreakable Enterprise Kernel Release 3 (UEK3); SUSE Linux Enterprise Server 11 SP3; SUSE Linux Enterprise Server 11 SP4. (Upgrade of replicating machines from SLES 11 SP3 to SLES 11 SP4 is not supported. If a replicated machine has been upgraded from SLES 11 SP3 to SLES 11 SP4, you'll need to disable replication and protect the machine again post the upgrade.) | Any guest OS supported by Azure
Important
(Applicable to VMware/Physical servers replicating to Azure)
On Red Hat Enterprise Linux Server 7+ and CentOS 7+ servers, kernel version 3.10.0-514 is supported starting
from version 9.8 of the Azure Site Recovery Mobility service.

Customers on the 3.10.0-514 kernel with a version of the Mobility service lower than version 9.8 are required to
disable replication, update the version of the Mobility service to version 9.8 and then enable replication again.
Supported Ubuntu kernel versions for VMware/physical servers

Release | Mobility service version | Kernel version
14.04 LTS | 9.9 | 3.13.0-24-generic to 3.13.0-117-generic, 3.16.0-25-generic to 3.16.0-77-generic, 3.19.0-18-generic to 3.19.0-80-generic, 4.2.0-18-generic to 4.2.0-42-generic, 4.4.0-21-generic to 4.4.0-75-generic

Supported file systems and guest storage configurations on Linux (VMware/Physical servers)
The following file systems and storage configuration software is supported on Linux servers running on VMware or
Physical servers:
File systems: ext3, ext4, ReiserFS (SUSE Linux Enterprise Server only), XFS (up to v4 only)
Volume manager: LVM2
Multipath software: Device Mapper
Physical servers with the HP CCISS storage controller aren't supported.
Note
On Linux servers the following directories (if set up as separate partitions/file-systems) must all be on the same
disk (the OS disk) on the source server: / (root), /boot, /usr, /usr/local, /var, /etc

XFS v5 features such as metadata checksum are currently not supported by ASR on XFS filesystems. Ensure that
your XFS filesystems aren't using any v5 features. You can use the xfs_info utility to check the XFS superblock for
the partition. If ftype is set to 1, then XFSv5 features are being used.
Support for network configuration
The following tables summarize network configuration support in various deployment scenarios that use Azure
Site Recovery to replicate to Azure.
Host network configuration

Configuration | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
NIC teaming | Yes (not supported in physical machines) | Yes
VLAN | Yes | Yes
IPv4 | Yes | Yes
IPv6 | No | No

Guest VM network configuration
Configuration | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
NIC teaming | No | No
IPv4 | Yes | Yes
IPv6 | No | No
Static IP (Windows) | Yes | Yes
Static IP (Linux) | No | No
Multi-NIC | Yes | Yes

Failed-over Azure VM network configuration

Azure networking | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
Express Route | Yes | Yes
ILB | Yes | Yes
ELB | Yes | Yes
Traffic Manager | Yes | Yes
Multi-NIC | Yes | Yes
Reserved IP | Yes | Yes
IPv4 | Yes | Yes
Retain source IP | Yes | Yes

Support for storage


The following tables summarize storage configuration support in various deployment scenarios that use Azure Site
Recovery to replicate to Azure.
Host storage configuration

Configuration | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
NFS | Yes for VMware; No for physical servers | N/A
SMB 3.0 | N/A | Yes
SAN (ISCSI) | Yes | Yes
Multi-path (MPIO) | Yes (tested with: Microsoft DSM, EMC PowerPath 5.7 SP4, EMC PowerPath DSM for CLARiiON) | Yes

Guest or physical server storage configuration

Configuration | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
VMDK | Yes | N/A
VHD/VHDX | N/A | Yes
Gen 2 VM | N/A | Yes
EFI/UEFI | No | Yes
Shared cluster disk | Yes for VMware; N/A for physical servers | No
Encrypted disk | No | No
NFS | No | N/A
SMB 3.0 | No | No
RDM | Yes; N/A for physical servers | N/A
Disk > 1 TB | No | No
Volume with striped disk > 1 TB (LVM - Logical Volume Management) | Yes | Yes
Storage Spaces | No | Yes
Hot add/remove disk | No | No
Exclude disk | Yes | Yes
Multi-path (MPIO) | N/A | Yes

Azure storage | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
LRS | Yes | Yes
GRS | Yes | Yes
RA-GRS | Yes | Yes
Cool storage | No | No
Hot storage | No | No
Encryption at rest (SSE) | Yes | Yes
Premium storage | Yes | Yes
Import/export service | No | No

Support for Azure compute configuration

Compute feature | VMware/physical server | Hyper-V (with/without Virtual Machine Manager)
Availability sets | Yes | Yes
HUB | Yes | Yes

Failed-over Azure VM requirements
You can deploy Site Recovery to replicate virtual machines and physical servers running any operating system
supported by Azure. This includes most versions of Windows and Linux. On-premises VMs that you want to
replicate must conform with the following Azure requirements while replicating to Azure.

Entity | Requirements | Details
Guest operating system | Hyper-V to Azure replication: Site Recovery supports all operating systems that are supported by Azure. For VMware and physical server replication: check the Windows and Linux prerequisites. | Prerequisites check will fail if unsupported.
Guest operating system architecture | 64-bit | Prerequisites check will fail if unsupported.
Operating system disk size | Up to 1023 GB | Prerequisites check will fail if unsupported.
Operating system disk count | 1 | Prerequisites check will fail if unsupported.
Data disk count | 64 or less if you are replicating VMware VMs to Azure; 16 or less if you are replicating Hyper-V VMs to Azure | Prerequisites check will fail if unsupported.
Data disk VHD size | Up to 1023 GB | Prerequisites check will fail if unsupported.
Network adapters | Multiple adapters are supported |
Shared VHD | Not supported | Prerequisites check will fail if unsupported.
FC disk | Not supported | Prerequisites check will fail if unsupported.
Hard disk format | VHD, VHDX | Although VHDX isn't currently supported in Azure, Site Recovery automatically converts VHDX to VHD when you fail over to Azure. When you fail back to on-premises the virtual machines continue to use the VHDX format.
Bitlocker | Not supported | Bitlocker must be disabled before protecting a virtual machine.
VM name | Between 1 and 63 characters. Restricted to letters, numbers, and hyphens. The VM name must start and end with a letter or number. | Update the value in the virtual machine properties in Site Recovery.
VM type | Generation 1; Generation 2 -- Windows | Generation 2 VMs with an OS disk type of basic (which includes one or two data volumes formatted as VHDX) and less than 300 GB of disk space are supported. Linux Generation 2 VMs aren't supported. Learn more

Support for Recovery Services vault actions

Action | VMware/physical server | Hyper-V (no Virtual Machine Manager) | Hyper-V (with Virtual Machine Manager)
Move vault across resource groups (within and across subscriptions) | No | No | No
Move storage, network, Azure VMs across resource groups (within and across subscriptions) | No | No | No

Support for Provider and Agent
Name | Description | Latest version | Details
Azure Site Recovery Provider | Coordinates communications between on-premises servers and Azure. Installed on on-premises Virtual Machine Manager servers, or on Hyper-V servers if there's no Virtual Machine Manager server. | 5.1.19 (available from portal) | Latest features and fixes
Azure Site Recovery Unified Setup (VMware to Azure) | Coordinates communications between on-premises VMware servers and Azure. Installed on on-premises VMware servers. | 9.3.4246.1 (available from portal) | Latest features and fixes
Mobility service | Coordinates replication between on-premises VMware servers/physical servers and Azure/secondary site. Installed on VMware VMs or physical servers you want to replicate. | N/A (available from portal) | N/A
Microsoft Azure Recovery Services (MARS) agent | Coordinates replication between Hyper-V VMs and Azure. Installed on on-premises Hyper-V servers (with or without a Virtual Machine Manager server). | Latest agent (available from portal) |

Overview of the features in Azure Backup


Azure Backup is the Azure-based service you can use to back up (or protect) and restore your data in the Microsoft
cloud. Azure Backup replaces your existing on-premises or off-site backup solution with a cloud-based solution that is
reliable, secure, and cost-competitive. Azure Backup offers multiple components that you download and deploy on
the appropriate computer, server, or in the cloud. The component, or agent, that you deploy depends on what you
want to protect. All Azure Backup components (no matter whether you're protecting data on-premises or in the
cloud) can be used to back up data to a Recovery Services vault in Azure. See the Azure Backup components
table (later in this article) for information about which component to use to protect specific data, applications, or
workloads.
Watch a video overview of Azure Backup
Why use Azure Backup?
Traditional backup solutions have evolved to treat the cloud as an endpoint, or static storage destination, similar to
disks or tape. While this approach is simple, it is limited and doesn't take full advantage of an underlying cloud
platform, which translates to an expensive, inefficient solution. Other solutions are expensive because you end up
paying for the wrong type of storage, or storage that you don't need. Other solutions are often inefficient because
they don't offer you the type or amount of storage you need, or administrative tasks require too much time. In
contrast, Azure Backup delivers these key benefits:
Automatic storage management - Hybrid environments often require heterogeneous storage - some on-premises and
some in the cloud. With Azure Backup, there is no cost for using on-premises storage devices. Azure Backup
automatically allocates and manages backup storage, and it uses a pay-as-you-use model. Pay-as-you-use means that
you only pay for the storage that you consume. For more information, see the Azure pricing article.
Unlimited scaling - Azure Backup uses the underlying power and unlimited scale of the Azure cloud to deliver high-
availability - with no maintenance or monitoring overhead. You can set up alerts to provide information about events,
but you don't need to worry about high-availability for your data in the cloud.
Multiple storage options - An aspect of high-availability is storage replication. Azure Backup offers two types of
replication: locally redundant storage and geo-redundant storage. Choose the backup storage option based on need:
Locally redundant storage (LRS) replicates your data three times (it creates three copies of your data) in a
paired datacenter in the same region. LRS is a low-cost option for protecting your data from local hardware
failures.
Geo-redundant storage (GRS) replicates your data to a secondary region (hundreds of miles away from the
primary location of the source data). GRS costs more than LRS, but GRS provides a higher level of durability for
your data, even if there is a regional outage.
Unlimited data transfer - Azure Backup does not limit the amount of inbound or outbound data you transfer. Azure
Backup also does not charge for the data that is transferred. However, if you use the Azure Import/Export service to
import large amounts of data, there is a cost associated with inbound data. For more information about this cost,
see Offline-backup workflow in Azure Backup. Outbound data refers to data transferred from a Recovery Services
vault during a restore operation.
Data encryption - Data encryption allows for secure transmission and storage of your data in the public cloud. You
store the encryption passphrase locally, and it is never transmitted or stored in Azure. If it is necessary to restore any
of the data, only you have the encryption passphrase, or key.
Application-consistent backup - Whether backing up a file server, virtual machine, or SQL database, you need to know
that a recovery point has all required data to restore the backup copy. Azure Backup provides application-consistent
backups, which ensures that additional fixes are not needed to restore the data. Restoring application-consistent data
reduces the restoration time, allowing you to quickly return to a running state.
Long-term retention - Instead of switching backup copies from disk to tape and moving the tape to an off-site location,
you can use Azure for short-term and long-term retention. Azure doesn't limit the length of time data remains in a
Backup or Recovery Services vault. You can keep data in a vault for as long as you like. Azure Backup has a limit of
9999 recovery points per protected instance. See the Backup and retention section in this article for an explanation of
how this limit may impact your backup needs.
Which Azure Backup components should I use?
If you aren't sure which Azure Backup component works for your needs, see the following table for information about
what you can protect with each component. The Azure portal provides a wizard, which is built into the portal, to guide
you through choosing the component to download and deploy. The wizard, which is part of the Recovery Services
vault creation, leads you through the steps for selecting a backup goal, and choosing the data or application to
protect.

Component | Benefits | Limits | What is protected? | Where are backups stored?
Azure Backup (MARS) agent | Back up files and folders on physical or virtual Windows OS (VMs can be on-premises or in Azure); No separate backup server required | Backup 3x per day; Not application aware; file, folder, and volume-level restore only; No support for Linux | Files, Folders | Recovery Services vault
System Center DPM | Application-aware snapshots (VSS); Full flexibility for when to take backups; Recovery granularity (all); Can use Recovery Services vault; Linux support on Hyper-V and VMware VMs; Back up and restore VMware VMs using DPM 2012 R2 | Cannot back up Oracle workload | Files, Folders, Volumes, VMs, Applications, Workloads | Recovery Services vault, Locally attached disk, Tape (on-premises only)
Azure Backup Server | App-aware snapshots (VSS); Full flexibility for when to take backups; Recovery granularity (all); Can use Recovery Services vault; Linux support on Hyper-V and VMware VMs; Back up and restore VMware VMs; Does not require a System Center license | Cannot back up Oracle workload; Always requires a live Azure subscription; No support for tape backup | Files, Folders, Volumes, VMs, Applications, Workloads | Recovery Services vault, Locally attached disk
Azure IaaS VM Backup | Native backups for Windows/Linux; No specific agent installation required; Fabric-level backup with no backup infrastructure needed | Back up VMs once a day; Restore VMs only at disk level; Cannot back up on-premises | VMs, All disks (using PowerShell) | Recovery Services vault

What are the deployment scenarios for each component?

Component | Can be deployed in Azure? | Can be deployed on-premises? | Target storage supported
Azure Backup (MARS) agent | Yes. The Azure Backup agent can be deployed on any Windows Server VM that runs in Azure. | Yes. The Backup agent can be deployed on any Windows Server VM or physical machine. | Recovery Services vault
System Center DPM | Yes. Learn more about how to protect workloads in Azure by using System Center DPM. | Yes. Learn more about how to protect workloads and VMs in your datacenter. | Locally attached disk, Recovery Services vault, tape (on-premises only)
Azure Backup Server | Yes. Learn more about how to protect workloads in Azure by using Azure Backup Server. | Yes. Learn more about how to protect workloads in Azure by using Azure Backup Server. | Locally attached disk, Recovery Services vault
Azure IaaS VM Backup | Yes. Part of the Azure fabric. Specialized for backup of Azure infrastructure as a service (IaaS) virtual machines. | No. Use System Center DPM to back up virtual machines in your datacenter. | Recovery Services vault

Which applications and workloads can be backed up?
The following table provides a matrix of the data and workloads that can be protected using Azure Backup. The Azure
Backup solution column has links to the deployment documentation for that solution. Each Azure Backup component
can be deployed in a Classic (Service Manager-deployment) or Resource Manager-deployment model environment.
Important
Before you work with Azure resources, get familiar with the deployment models: Resource Manager, and classic.

Data or Workload | Source environment | Azure Backup solution
Files and folders | Windows Server | Azure Backup agent; System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Files and folders | Windows computer | Azure Backup agent; System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Hyper-V virtual machine (Windows) | Windows Server | System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Hyper-V virtual machine (Linux) | Windows Server | System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Microsoft SQL Server | Windows Server | System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Microsoft SharePoint | Windows Server | System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Microsoft Exchange | Windows Server | System Center DPM (+ the Azure Backup agent); Azure Backup Server (includes the Azure Backup agent)
Azure IaaS VMs (Windows) | Running in Azure | Azure Backup (VM extension)
Azure IaaS VMs (Linux) | Running in Azure | Azure Backup (VM extension)

Linux support
The following table shows the Azure Backup components that have support for Linux.

Component | Linux (Azure endorsed) support
Azure Backup (MARS) agent | No (only Windows-based agent)
System Center DPM | File-consistent backup of Linux Guest VMs on Hyper-V and VMware; VM restore of Hyper-V and VMware Linux Guest VMs; File-consistent backup not available for Azure VMs
Azure Backup Server | File-consistent backup of Linux Guest VMs on Hyper-V and VMware; VM restore of Hyper-V and VMware Linux Guest VMs; File-consistent backup not available for Azure VMs
Azure IaaS VM Backup | Application-consistent backup using pre-script and post-script framework; Granular file recovery; Restore all VM disks; VM restore

Using Premium Storage VMs with Azure Backup

Azure Backup protects Premium Storage VMs. Azure Premium Storage is solid-state drive (SSD)-based storage
designed to support I/O-intensive workloads. Premium Storage is attractive for virtual machine (VM) workloads. For
more information about Premium Storage, see the article, Premium Storage: High-Performance Storage for Azure
Virtual Machine Workloads.
Back up Premium Storage VMs
While backing up Premium Storage VMs, the Backup service creates a temporary staging location, named
"AzureBackup-", in the Premium Storage account. The size of the staging location is equal to the size of the recovery
point snapshot. Be sure the Premium Storage account has adequate free space to accommodate the temporary
staging location. For more information, see the article, premium storage limitations. Once the backup job finishes, the
staging location is deleted. The price of storage used for the staging location is consistent with all Premium storage
pricing.
Note: Do not modify or edit the staging location.
Restore Premium Storage VMs
Premium Storage VMs can be restored to either Premium Storage or to normal storage. Restoring a Premium Storage
VM recovery point back to Premium Storage is the typical process of restoration. However, it can be cost effective to
restore a Premium Storage VM recovery point to standard storage. This type of restoration can be used if you need a
subset of files from the VM.
Using managed disk VMs with Azure Backup
Azure Backup protects managed disk VMs. Managed disks free you from managing storage accounts of virtual
machines and greatly simplify VM provisioning.
Back up managed disk VMs
Backing up VMs on managed disks is no different than backing up Resource Manager VMs. In the Azure portal, you
can configure the backup job directly from the Virtual Machine view or from the Recovery Services vault view. You can
back up VMs on managed disks through RestorePoint collections built on top of managed disks. Azure Backup also
supports backing up managed disk VMs that are encrypted using Azure Disk Encryption (ADE).
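The same protection can also be enabled from PowerShell. A minimal sketch, assuming the AzureRM.RecoveryServices.Backup module and placeholder vault, policy, VM, and resource group names (a new vault ships with a policy named DefaultPolicy, but verify the name in your own vault):

# Enable backup for an existing Azure VM using a policy from the vault (placeholder names throughout).
$vault = Get-AzureRmRecoveryServicesVault -Name "ContosoVault" -ResourceGroupName "ContosoRG"
Set-AzureRmRecoveryServicesVaultContext -Vault $vault
$policy = Get-AzureRmRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy"
Enable-AzureRmRecoveryServicesBackupProtection -Policy $policy -Name "ContosoVM" -ResourceGroupName "ContosoVMRG"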
Restore managed disk VMs
Azure Backup allows you to restore a complete VM with managed disks, or restore managed disks to a storage
account. Azure manages the managed disks during the restore process. You (the customer) manage the storage
account created as part of the restore process. When restoring managed encrypted VMs, the VM's keys and secrets
should exist in the key vault prior to starting the restore operation.
What are the features of each Backup component?
The following sections provide tables that summarize the availability or support of various features in each Azure
Backup component. See the information following each table for additional support or details.
Storage

Feature | Azure Backup agent | System Center DPM | Azure Backup Server | Azure IaaS VM Backup
Recovery Services vault | Yes | Yes | Yes | Yes
Disk storage | No | Yes | Yes | No
Tape storage | No | Yes | No | No
Compression (in Recovery Services vault) | Yes | Yes | Yes | No
Incremental backup | Yes | Yes | Yes | Yes
Disk deduplication | No | Yes | Yes | No

The Recovery Services vault is the preferred storage target across all components. System Center DPM and Azure
Backup Server also provide the option to have a local disk copy. However, only System Center DPM provides the
option to write data to a tape storage device.
Compression
Backups are compressed to reduce the required storage space. The only component that does not use compression is
the VM extension. The VM extension copies all backup data from your storage account to the Recovery Services vault
in the same region. No compression is used when transferring the data. Transferring the data without compression
slightly inflates the storage used. However, storing the data without compression allows for faster restoration, should
you need that recovery point.
Disk Deduplication
You can take advantage of deduplication when you deploy System Center DPM or Azure Backup Server on a Hyper-V
virtual machine. Windows Server performs data deduplication (at the host level) on virtual hard disks (VHDs) that are
attached to the virtual machine as backup storage.
Note
Deduplication is not available in Azure for any Backup component. When System Center DPM and Backup Server are
deployed in Azure, the storage disks attached to the VM cannot be deduplicated.
Incremental backup explained
Every Azure Backup component supports incremental backup regardless of the target storage (disk, tape, Recovery
Services vault). Incremental backup ensures that backups are storage and time efficient, by transferring only those
changes made since the last backup.
Comparing Full, Differential and Incremental backup
Storage consumption, recovery time objective (RTO), and network consumption varies for each type of backup
method. To keep the backup total cost of ownership (TCO) down, you need to understand how to choose the best
backup solution. The following image compares Full Backup, Differential Backup, and Incremental Backup. In the
image, data source A is composed of 10 storage blocks A1-A10, which are backed up monthly. Blocks A2, A3, A4, and
A9 change in the first month, and block A5 changes in the next month.

With Full Backup, each backup copy contains the entire data source. Full backup consumes a large amount of network
bandwidth and storage, each time a backup copy is transferred.
Differential backup stores only the blocks that changed since the initial full backup, which results in a smaller amount
of network and storage consumption. Differential backups don't retain redundant copies of unchanged data.
However, because each differential backup transfers and stores every block that has changed since the initial full
backup, even blocks that haven't changed since the previous differential, differential backups become inefficient over
time. In the second month, changed blocks A2, A3, A4, and A9 are backed up. In the third month, these same blocks are
backed up again, along with changed block A5. The changed blocks continue to be backed up until the next full backup happens.
Incremental Backup achieves high storage and network efficiency by storing only the blocks of data that changed since
the previous backup. With incremental backup, there is no need to take regular full backups. In the example, after the
full backup is taken for the first month, changed blocks A2, A3, A4, and A9 are marked as changed and transferred for
the second month. In the third month, only changed block A5 is marked and transferred. Moving less data saves
storage and network resources, which decreases TCO.
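As a rough check on the comparison above, here is a small PowerShell sketch (illustrative only, not part of any Azure tooling) that tallies the blocks transferred in the example: month 1 is the initial full backup of A1-A10, month 2 changes A2, A3, A4, and A9, and month 3 changes A5.

# Block change history from the example above.
$allBlocks     = 1..10 | ForEach-Object { "A$_" }
$changedMonth2 = @('A2','A3','A4','A9')
$changedMonth3 = @('A5')

# Full backup: every month transfers all 10 blocks.
$full = $allBlocks.Count * 3
# Differential: month 1 full, then each month resends everything changed since the full backup.
$differential = $allBlocks.Count + $changedMonth2.Count + ($changedMonth2 + $changedMonth3).Count
# Incremental: month 1 full, then only the blocks changed since the previous backup.
$incremental = $allBlocks.Count + $changedMonth2.Count + $changedMonth3.Count

"Full: $full blocks; Differential: $differential blocks; Incremental: $incremental blocks"
# Prints: Full: 30 blocks; Differential: 19 blocks; Incremental: 15 blocks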
Security

Feature | Azure Backup agent | System Center DPM | Azure Backup Server | Azure IaaS VM Backup
Network security (to Azure) | Yes | Yes | Yes | Yes
Data security (in Azure) | Yes | Yes | Yes | Configured within the VM (see the data security note below)

Network security
All backup traffic from your servers to the Recovery Services vault is encrypted using Advanced Encryption Standard
256. The backup data is sent over a secure HTTPS link. The backup data is also stored in the Recovery Services vault in
encrypted form. Only you, the Azure customer, have the passphrase to unlock this data. Microsoft cannot decrypt the
backup data at any point.
Warning
Once you establish the Recovery Services vault, only you have access to the encryption key. Microsoft never maintains
a copy of your encryption key, and does not have access to the key. If the key is misplaced, Microsoft cannot recover
the backup data.
Data security
Backing up Azure VMs requires setting up encryption within the virtual machine. Use BitLocker on Windows virtual
machines and dm-crypt on Linux virtual machines. Azure Backup does not automatically encrypt backup data that
comes through this path.
Network

Feature | Azure Backup agent | System Center DPM | Azure Backup Server | Azure IaaS VM Backup
Network compression (to backup server) | Not applicable | Yes | Yes | Not applicable
Network compression (to Recovery Services vault) | Yes | Yes | Yes | No
Network protocol (to backup server) | Not applicable | TCP | TCP | Not applicable
Network protocol (to Recovery Services vault) | HTTPS | HTTPS | HTTPS | HTTPS

The VM extension (on the IaaS VM) reads the data directly from the Azure storage account over the storage network,
so it is not necessary to compress this traffic.
If you use a System Center DPM server or Azure Backup Server as a secondary backup server, compress the data going
from the primary server to the backup server. Compressing data before backing it up to DPM or Azure Backup Server
saves bandwidth.
Network Throttling
The Azure Backup agent offers network throttling, which allows you to control how network bandwidth is used during
data transfer. Throttling can be helpful if you need to back up data during work hours but do not want the backup
process to interfere with other internet traffic. Throttling for data transfer applies to backup and restore activities.
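The MARS agent exposes the same throttling settings through the MSOnlineBackup PowerShell module that installs with it. A minimal sketch, assuming the module is present on the protected server; the work hours and bandwidth values are placeholders, not recommendations:

# Limit backup traffic to 512 Kbps during work hours (Mon-Fri, 09:00-18:00) and 2 Mbps otherwise.
Import-Module MSOnlineBackup
Set-OBMachineSetting -WorkDay "Monday","Tuesday","Wednesday","Thursday","Friday" `
    -StartWorkHour "09:00:00" -EndWorkHour "18:00:00" `
    -WorkHourBandwidth (512*1024) -NonWorkHourBandwidth (2048*1024)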
Backup and retention
Azure Backup has a limit of 9999 recovery points, also known as backup copies or snapshots, per protected instance. A
protected instance is a computer, server (physical or virtual), or workload configured to back up data to Azure. For
more information, see the section, What is a protected instance. An instance is protected once a backup copy of data
has been saved. The backup copy of data is the protection. If the source data was lost or became corrupt, the backup
copy could restore the source data. The following table shows the maximum backup frequency for each component.
Your backup policy configuration determines how quickly you consume the recovery points. For example, if you create
a recovery point each day, then you can retain recovery points for 27 years before you run out. If you take a monthly
recovery point, you can retain recovery points for 833 years before you run out. The Backup service does not set an
expiration time limit on a recovery point.
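As a quick sanity check on those retention figures, the following PowerShell sketch (illustrative only) divides the 9999 recovery-point limit by the number of recovery points a schedule creates per year:

$maxRecoveryPoints = 9999
$pointsPerYear = @{ 'One recovery point per day' = 365; 'Three recovery points per day' = 3 * 365; 'One recovery point per month' = 12 }
foreach ($schedule in $pointsPerYear.Keys) {
    $years = [math]::Floor($maxRecoveryPoints / $pointsPerYear[$schedule])
    "{0}: about {1} years before the limit is reached" -f $schedule, $years
}
# One per day gives about 27 years, three per day about 9 years, one per month about 833 years.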

Feature | Azure Backup agent | System Center DPM | Azure Backup Server | Azure IaaS VM Backup
Backup frequency (to Recovery Services vault) | Three backups per day | Two backups per day | Two backups per day | One backup per day
Backup frequency (to disk) | Not applicable | Every 15 minutes for SQL Server; every hour for other workloads | Every 15 minutes for SQL Server; every hour for other workloads | Not applicable
Retention options | Daily, weekly, monthly, yearly | Daily, weekly, monthly, yearly | Daily, weekly, monthly, yearly | Daily, weekly, monthly, yearly
Maximum recovery points per protected instance | 9999 | 9999 | 9999 | 9999
Maximum retention period | Depends on backup frequency | Depends on backup frequency | Depends on backup frequency | Depends on backup frequency
Recovery points on local disk | Not applicable | 64 for File Servers, 448 for Application Servers | 64 for File Servers, 448 for Application Servers | Not applicable
Recovery points on tape | Not applicable | Unlimited | Not applicable | Not applicable

What is a protected instance
A protected instance is a generic reference to a Windows computer, a server (physical or virtual), or SQL database that
has been configured to back up to Azure. An instance is protected once you configure a backup policy for the
computer, server, or database, and create a backup copy of the data. Subsequent copies of the backup data for that
protected instance (which are called recovery points), increase the amount of storage consumed. You can create up to
9999 recovery points for a protected instance. If you delete a recovery point from storage, it does not count against
the 9999 recovery point total. Some common examples of protected instances are virtual machines, application
servers, databases, and personal computers running the Windows operating system. For example:
A virtual machine running in the Hyper-V or Azure IaaS hypervisor fabric. The guest operating systems for the
virtual machine can be Windows Server or Linux.
An application server: The application server can be a physical or virtual machine running Windows Server and
workloads with data that needs to be backed up. Common workloads are Microsoft SQL Server, Microsoft
Exchange server, Microsoft SharePoint server, and the File Server role on Windows Server. To back up these
workloads you need System Center Data Protection Manager (DPM) or Azure Backup Server.
A personal computer, workstation, or laptop running the Windows operating system.
What is a Recovery Services vault?
A Recovery Services vault is an online storage entity in Azure used to hold data such as backup copies, recovery points,
and backup policies. You can use Recovery Services vaults to hold backup data for Azure services and on-premises
servers and workstations. Recovery Services vaults make it easy to organize your backup data, while minimizing
management overhead. You can create as many Recovery Services vaults as you like, within a subscription.
Backup vaults, which are based on Azure Service Manager, were the first version of the vault. Recovery Services
vaults, which add the Azure Resource Manager model features, are the second version of the vault. See the Recovery
Services vault overview article for a full description of the feature differences. You can no longer create Backup vaults
in the Azure portal, but Backup vaults are still supported.
Important
You can now upgrade your Backup vaults to Recovery Services vaults. For details, see the article Upgrade a Backup
vault to a Recovery Services vault. Microsoft encourages you to upgrade your Backup vaults to Recovery Services
vaults.
Starting November 1, 2017:
Any remaining Backup vaults will be automatically upgraded to Recovery Services vaults.
You won't be able to access your backup data in the classic portal. Instead, use the Azure portal to access your
backup data in Recovery Services vaults.
How does Azure Backup differ from Azure Site Recovery?
Azure Backup and Azure Site Recovery are related in that both services back up data and can restore that data.
However, these services serve different purposes in providing business continuity and disaster recovery in your
business. Use Azure Backup to protect and restore data at a more granular level. For example, if a presentation on a
laptop became corrupted, you would use Azure Backup to restore the presentation. If you wanted to replicate the
configuration and data on a VM to another datacenter, use Azure Site Recovery.
Azure Backup protects data on-premises and in the cloud. Azure Site Recovery coordinates virtual-machine and
physical-server replication, failover, and failback. Both services are important because your disaster recovery solution
needs to keep your data safe and recoverable (Backup) and keep your workloads available (Site Recovery) when
outages occur.
The following concepts can help you make important decisions around backup and disaster recovery.

Concept | Details | Backup | Disaster recovery (DR)
Recovery point objective (RPO) | The amount of acceptable data loss if a recovery needs to be done. | Backup solutions have wide variability in their acceptable RPO. Virtual machine backups usually have an RPO of one day, while database backups have RPOs as low as 15 minutes. | Disaster recovery solutions have low RPOs. The DR copy can be behind by a few seconds or a few minutes.
Recovery time objective (RTO) | The amount of time that it takes to complete a recovery or restore. | Because of the larger RPO, the amount of data that a backup solution needs to process is typically much higher, which leads to longer RTOs. For example, it can take days to restore data from tapes, depending on the time it takes to transport the tape from an off-site location. | Disaster recovery solutions have smaller RTOs because they are more in sync with the source. Fewer changes need to be processed.
Retention | How long data needs to be stored. | For scenarios that require operational recovery (data corruption, inadvertent file deletion, OS failure), backup data is typically retained for 30 days or less. From a compliance standpoint, data might need to be stored for months or even years. Backup data is ideally suited for archiving in such cases. |

First look: back up files and folders in Resource Manager deployment
This article explains how to back up your Windows Server (or Windows computer) files and folders to Azure using a
Resource Manager deployment. It's a tutorial intended to walk you through the basics. If you want to get started using
Azure Backup, you're in the right place.
If you want to know more about Azure Backup, read this overview.
If you don't have an Azure subscription, create a free account that lets you access any Azure service.
Create a recovery services vault
To back up your files and folders, you need to create a Recovery Services vault in the region where you want to store
the data. You also need to determine how you want your storage replicated.
To create a Recovery Services vault
1. If you haven't already done so, sign in to the Azure Portal using your Azure subscription.
2. On the Hub menu, click More services and in the list of resources, type Recovery Services and click Recovery
Services vaults.

If there are recovery services vaults in the subscription, the vaults are listed.
3. On the Recovery Services vaults menu, click Add.

The Recovery Services vault blade opens, prompting you to provide a Name, Subscription, Resource group,
and Location.

4. For Name, enter a friendly name to identify the vault. The name needs to be unique for the Azure
subscription. Type a name that contains between 2 and 50 characters. It must start with a letter, and can
contain only letters, numbers, and hyphens.
5. In the Subscription section, use the drop-down menu to choose the Azure subscription. If you use only one
subscription, that subscription appears and you can skip to the next step. If you are not sure which
subscription to use, use the default (or suggested) subscription. There are multiple choices only if your
organizational account is associated with multiple Azure subscriptions.
6. In the Resource group section:
select Create new if you want to create a new Resource group. Or
select Use existing and click the drop-down menu to see the available list of Resource groups.
For complete information on Resource groups, see the Azure Resource Manager overview.
7. Click Location to select the geographic region for the vault. This choice determines the geographic region
where your backup data is sent.
8. At the bottom of the Recovery Services vault blade, click Create.
It can take several minutes for the Recovery Services vault to be created. Monitor the status notifications in the upper
right-hand area of the portal. Once your vault is created, it appears in the list of Recovery Services vaults. If after
several minutes you don't see your vault, click Refresh.
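The portal steps above can also be scripted. A minimal sketch with Azure PowerShell, assuming the AzureRM modules and placeholder resource group, vault, and region names:

# Create a resource group and a Recovery Services vault (placeholder names and region).
Login-AzureRmAccount
New-AzureRmResourceGroup -Name "ContosoRG" -Location "West US"
New-AzureRmRecoveryServicesVault -Name "ContosoVault" -ResourceGroupName "ContosoRG" -Location "West US"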

Once you see your vault in the list of Recovery Services vaults, you are ready to set the storage redundancy.

Set storage redundancy for the vault
When you create a Recovery Services vault, make sure storage redundancy is configured the way you want.
1. From the Recovery Services vaults blade, click the new vault.

When you select the vault, the Recovery Services vault blade narrows, and the Settings blade (which has the name of
the vault at the top) and the vault details blade open.

2. In the new vault's Settings blade, use the vertical slide to scroll down to the Manage section, and click Backup
Infrastructure. The Backup Infrastructure blade opens.
3. In the Backup Infrastructure blade, click Backup Configuration to open the Backup Configuration blade.

4. Choose the appropriate storage replication option for your vault.

By default, your vault has geo-redundant storage. If you use Azure as a primary backup storage endpoint, continue to
use Geo-redundant. If you don't use Azure as a primary backup storage endpoint, then choose Locally-redundant,
which reduces the Azure storage costs. Read more about geo-redundant and locally redundant storage options in
this Storage redundancy overview.
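The storage replication option can also be set with PowerShell before any items are protected. A minimal sketch, assuming the AzureRM.RecoveryServices module and the placeholder vault names from the earlier sketch:

# Switch the vault's backup storage redundancy to locally redundant storage.
$vault = Get-AzureRmRecoveryServicesVault -Name "ContosoVault" -ResourceGroupName "ContosoRG"
Set-AzureRmRecoveryServicesBackupProperties -Vault $vault -BackupStorageRedundancy LocallyRedundant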
Now that you've created a vault, configure it for backing up files and folders.
Configure the vault
1. On the Recovery Services vault blade (for the vault you just created), in the Getting Started section,
click Backup, then on the Getting Started with Backup blade, select Backup goal.

The Backup Goal blade opens.

2. From the Where is your workload running? drop-down menu, select On-premises.
You choose On-premises because your Windows Server or Windows computer is a physical machine that is not in
Azure.
3. From the What do you want to backup? menu, select Files and folders, and click OK.

After clicking OK, a checkmark appears next to Backup goal, and the Prepare infrastructure blade opens.

4. On the Prepare infrastructure blade, click Download Agent for Windows Server or Windows Client.

If you are using Windows Server Essentials, then choose to download the agent for Windows Server Essentials. A pop-up
menu prompts you to run or save MARSAgentInstaller.exe.

5. In the download pop-up menu, click Save.
By default, the MARSagentinstaller.exe file is saved to your Downloads folder. When the download completes, you will
see a pop-up asking if you want to run the installer, or open the folder.

You don't need to install the agent yet. You can install the agent after you have downloaded the vault credentials.
6. On the Prepare infrastructure blade, click Download.

The vault credentials download to your Downloads folder. After the vault credentials finish downloading, you see a
pop-up asking if you want to open or save the credentials. Click Save. If you accidentally click Open, let the dialog that
attempts to open the vault credentials fail. You cannot open the vault credentials. Proceed to the next step. The vault
credentials are in the Downloads folder.

Install and register the agent
Note
Enabling backup through the Azure portal is not yet available. Use the Microsoft Azure Recovery Services Agent to
back up your files and folders.
1. Locate and double-click the MARSagentinstaller.exe from the Downloads folder (or other saved location).
The installer provides a series of messages as it extracts, installs, and registers the Recovery Services agent.

2. Complete the Microsoft Azure Recovery Services Agent Setup Wizard. To complete the wizard, you need to:
Choose a location for the installation and cache folder.
Provide your proxy server info if you use a proxy server to connect to the internet.
Provide your user name and password details if you use an authenticated proxy.
Provide the downloaded vault credentials
Save the encryption passphrase in a secure location.
Note
If you lose or forget the passphrase, Microsoft cannot help recover the backup data. Save the file in a secure location.
It is required to restore a backup.
The agent is now installed and your machine is registered to the vault. You're ready to configure and schedule your
backup.
Back up your files and folders
The initial backup includes two key tasks:
Schedule the backup
Back up files and folders for the first time
To complete the initial backup, use the Microsoft Azure Recovery Services agent.
To schedule the backup job
1. Open the Microsoft Azure Recovery Services agent. You can find it by searching your machine for Microsoft
Azure Backup.

2. In the Recovery Services agent, click Schedule Backup.

3. On the Getting started page of the Schedule Backup Wizard, click Next.
4. On the Select Items to Backup page, click Add Items.
5. Select the files and folders that you want to back up, and then click Okay.
6. Click Next.
7. On the Specify Backup Schedule page, specify the backup schedule and click Next.
You can schedule daily (at a maximum rate of three times per day) or weekly backups.

Note
For more information about how to specify the backup schedule, see the article Use Azure Backup to replace your
tape infrastructure.
8. On the Select Retention Policy page, select the Retention Policy for the backup copy.
The retention policy specifies how long the backup data is stored. Rather than specifying a flat policy for all backup
points, you can specify different retention policies based on when the backup occurs. You can modify the daily,
weekly, monthly, and yearly retention policies to meet your needs.
9. On the Choose Initial Backup Type page, choose the initial backup type. Leave the option Automatically over
the network selected, and then click Next.
You can back up automatically over the network, or you can back up offline. The remainder of this article describes
the process for backing up automatically. If you prefer to do an offline backup, review the article Offline backup
workflow in Azure Backup for additional information.
10. On the Confirmation page, review the information, and then click Finish.
11. After the wizard finishes creating the backup schedule, click Close.
To back up files and folders for the first time
1. In the Recovery Services agent, click Back Up Now to complete the initial seeding over the network.

2. On the Confirmation page, review the settings that the Back Up Now Wizard will use to back up the machine.
Then click Back Up.

3. Click Close to close the wizard. If you close the wizard before the backup process finishes, the wizard
continues to run in the background.
After the initial backup is completed, the Job completed status appears in the Backup console.
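If you prefer scripting over the wizard, the MARS agent installs the MSOnlineBackup PowerShell module, and the same schedule can be built with its OB* cmdlets. A minimal sketch; the folder path, schedule, and retention length below are placeholders:

# Define a files-and-folders policy, schedule it, set retention, commit it, and run the first backup.
Import-Module MSOnlineBackup
$policy = New-OBPolicy
$fileSpec = New-OBFileSpec -FileSpec @("D:\Data")
Add-OBFileSpec -Policy $policy -FileSpec $fileSpec
$schedule = New-OBSchedule -DaysOfWeek Saturday, Sunday -TimesOfDay 16:00
Set-OBSchedule -Policy $policy -Schedule $schedule
$retention = New-OBRetentionPolicy -RetentionDays 30
Set-OBRetentionPolicy -Policy $policy -RetentionPolicy $retention
Set-OBPolicy -Policy $policy
Start-OBBackup -Policy $policy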

Azure Automation overview
Microsoft Azure Automation provides a way for users to automate the manual, long-running, error-prone, and
frequently repeated tasks that are commonly performed in a cloud and enterprise environment. It saves time and
increases the reliability of regular administrative tasks and even schedules them to be automatically performed at
regular intervals. You can automate processes using runbooks or automate configuration management using Desired
State Configuration. This article provides a brief overview of Azure Automation and answers some common questions.
You can refer to other articles in this library for more detailed information on the different topics.
Automating processes with runbooks
A runbook is a set of tasks that perform some automated process in Azure Automation. It may be a simple process
such as starting a virtual machine and creating a log entry, or you may have a complex runbook that combines other
smaller runbooks to perform a complex process across multiple resources or even multiple clouds and on-premises
environments.
For example, you might have an existing manual process for truncating a SQL database if it is approaching its maximum
size. The process includes multiple steps, such as connecting to the server, connecting to the database, getting the
current size of the database, checking whether the threshold has been exceeded, truncating the database, and notifying
the user. Instead of manually performing each of these steps, you could create a runbook that performs all of these
tasks as a single process. You would start the runbook, provide the required information such as the SQL server name,
database name, and recipient e-mail address, and then sit back while the process completes.
What can runbooks automate?
Runbooks in Azure Automation are based on Windows PowerShell or Windows PowerShell Workflow, so they can do
anything that PowerShell can do. If an application or service has an API, then a runbook can work with it. If you have a
PowerShell module for the application, then you can load that module into Azure Automation and include those
cmdlets in your runbook. Azure Automation runbooks run in the Azure cloud and can access any cloud resources or
external resources that can be accessed from the cloud. Using Hybrid Runbook Worker, runbooks can run in your local
data center to manage local resources.
Getting runbooks from the community
The Runbook Gallery contains runbooks from Microsoft and the community that you can either use unchanged in your
environment or customize them for your own purposes. They are also useful as references to learn how to create
your own runbooks. You can even contribute your own runbooks to the gallery that you think other users may find
useful.
Creating Runbooks with Azure Automation
You can create your own runbooks from scratch or modify runbooks from the Runbook Gallery for your own
requirements. There are four different runbook types that you can choose from based on your requirements and
PowerShell experience. If you prefer to work directly with the PowerShell code, then you can use a PowerShell
runbook or PowerShell Workflow runbook that you edit offline or with the textual editor in the Azure portal. If you
prefer to edit a runbook without being exposed to the underlying code, then you can create a Graphical runbook using
the graphical editor in the Azure portal.
Prefer watching to reading? Have a look at the video below from the Microsoft Ignite session in May 2015. Note: While
the concepts and features discussed in this video are correct, Azure Automation has progressed a lot since the video
was recorded; it now has a more extensive UI in the Azure portal and supports additional capabilities.
Automating configuration management with Desired State Configuration
PowerShell DSC is a management platform that allows you to manage, deploy and enforce configuration for physical
hosts and virtual machines using a declarative PowerShell syntax. You can define configurations on a central DSC Pull
Server that target machines can automatically retrieve and apply. DSC provides a set of PowerShell cmdlets that you
can use to manage configurations and resources.
Azure Automation DSC is a cloud based solution for PowerShell DSC that provides services required for enterprise
environments. You can manage your DSC resources in Azure Automation and apply configurations to virtual or
physical machines that retrieve them from a DSC Pull Server in the Azure cloud. It also provides reporting services that
inform you of important events such as when nodes have deviated from their assigned configuration and when a new
configuration has been applied.
Creating your own DSC configurations with Azure Automation
DSC configurations specify the desired state of a node. Multiple nodes can apply the same configuration to assure that
they all maintain an identical state. You can create a configuration using any text editor on your local machine and
then import it into Azure Automation where you can compile it and apply it to nodes.
Getting modules and configurations
You can get PowerShell modules containing cmdlets that you can use in your runbooks and DSC configurations from
the PowerShell Gallery. You can launch this gallery from the Azure portal and import modules directly into Azure
Automation, or you can download and import them manually. You cannot install the modules directly from the Azure
portal, but you can download them and install them as you would any other module.
Example practical applications of Azure Automation
The following are just a few examples of the kinds of automation scenarios that are possible with Azure Automation.
Create and copy virtual machines in different Azure subscriptions.
Schedule file copies from a local machine to an Azure Blob Storage container.
Automate security functions, such as denying requests from a client when a denial-of-service attack is detected.
Ensure machines continually align with configured security policy.
Manage continuous deployment of application code across cloud and on premises infrastructure.
Build an Active Directory forest in Azure for your lab environment.
Truncate a table in a SQL database if the database is approaching its maximum size.
Remotely update environment settings for an Azure website.
How does Azure Automation relate to other automation tools?
Service Management Automation (SMA) is intended to automate management tasks in the private cloud. It is installed
locally in your data center as a component of Microsoft Azure Pack. SMA and Azure Automation use the same
runbook format based on Windows PowerShell and Windows PowerShell Workflow, but SMA does not
support graphical runbooks.
System Center 2012 Orchestrator is intended for automation of on-premises resources. It uses a different runbook
format than Azure Automation and Service Management Automation and has a graphical interface to create runbooks
without requiring any scripting. Its runbooks are composed of activities from Integration Packs that are written
specifically for Orchestrator.
Where can I get more information?
A variety of resources are available for you to learn more about Azure Automation and creating your own runbooks.
Azure Automation Library is where you are right now. The articles in this library provide complete
documentation on the configuration and administration of Azure Automation and for authoring your own
runbooks.
Azure PowerShell cmdlets provides information for automating Azure operations using Windows PowerShell.
Runbooks use these cmdlets to work with Azure resources.
Management Blog provides the latest information on Azure Automation and other management technologies
from Microsoft. You should subscribe to this blog to stay up to date with the latest from the Azure
Automation team.
Automation Forum allows you to post questions about Azure Automation to be addressed by Microsoft and
the Automation community.
Azure Automation Cmdlets provides information for automating administration tasks. It contains cmdlets to
manage Automation accounts, assets, runbooks, DSC.

My first PowerShell runbook
This tutorial walks you through the creation of a PowerShell runbook in Azure Automation. We start with a simple
runbook that we test and publish while we explain how to track the status of the runbook job. Then we modify the
runbook to actually manage Azure resources, in this case starting an Azure virtual machine. Lastly, we make the
runbook more robust by adding runbook parameters.
Prerequisites
To complete this tutorial, you need the following:
Azure subscription. If you don't have one yet, you can activate your MSDN subscriber benefits or sign up for a
free account (https://azure.microsoft.com/free/).
Automation account to hold the runbook and authenticate to Azure resources. This account must have
permission to start and stop the virtual machine.
An Azure virtual machine. We stop and start this machine so it should not be a production VM.
Step 1 - Create new runbook
We'll start by creating a simple runbook that outputs the text Hello World.
1. In the Azure portal, open your Automation account.
The Automation account page gives you a quick view of the resources in this account. You should already have
some assets. Most of those are the modules that are automatically included in a new Automation account.
You should also have the Credential asset that's mentioned in the prerequisites.
2. Click the Runbooks tile to open the list of runbooks.

3. Create a new runbook by clicking the Add a runbook button and then Create a new runbook.
4. Give the runbook the name MyFirstRunbook-PowerShell.

5. In this case, we're going to create a PowerShell runbook, so select PowerShell for Runbook type.

6. Click Create to create the runbook and open the textual editor.
Step 2 - Add code to the runbook
You can either type code directly into the runbook, or you can select cmdlets, runbooks, and assets from the Library
control and have them added to the runbook with any related parameters. For this walkthrough, we type directly in
the runbook.
1. Our runbook is currently empty; type Write-Output "Hello World".

2. Save the runbook by clicking Save.

Step 3 - Test the runbook
Before we publish the runbook to make it available in production, we want to test it to make sure that it works
properly. When you test a runbook, you run its Draft version and view its output interactively.
1. Click Test pane to open the Test pane.

2. Click Start to start the test. This should be the only enabled option.
3. A runbook job is created and its status displayed.
The job status starts as Queued indicating that it is waiting for a runbook worker in the cloud to come
available. It will then move to Starting when a worker claims the job, and then Running when the runbook
actually starts running.
4. When the runbook job completes, its output is displayed. In our case, we should see Hello World.

5. Close the Test pane to return to the canvas.
Step 4 - Publish and start the runbook
The runbook that we created is still in Draft mode. We need to publish it before we can run it in production. When
you publish a runbook, you overwrite the existing Published version with the Draft version. In our case, we don't have
a Published version yet because we just created the runbook.
1. Click Publish to publish the runbook and then Yes when prompted.

2. If you scroll left to view the runbook in the Runbooks pane now, it will show an Authoring Status of Published.
3. Scroll back to the right to view the pane for MyFirstRunbook-PowerShell.
The options across the top allow us to start the runbook, view the runbook, schedule it to start at some time
in the future, or create a webhook so it can be started through an HTTP call.
4. We want to start the runbook, so click Start and then click Ok when the Start Runbook blade opens.

5. A job pane is opened for the runbook job that we created. We can close this pane, but in this case we leave it
open so we can watch the job's progress.

6. The job status is shown in Job Summary and matches the statuses that we saw when we tested the runbook.

7. Once the runbook status shows Completed, click Output. The Output pane is opened, and we can see
our Hello World.

8. Close the Output pane.
9. Click All Logs to open the Streams pane for the runbook job. We should only see Hello World in the output
stream, but this can show other streams for a runbook job such as Verbose and Error if the runbook writes to
them.

10. Close the Streams pane and the Job pane to return to the MyFirstRunbook-PowerShell pane.
11. Click Jobs to open the Jobs pane for this runbook. This lists all of the jobs created by this runbook. We should
only see one job listed since we only ran the job once.

12. You can click this job to open the same Job pane that we viewed when we started the runbook. This allows
you to go back in time and view the details of any job that was created for a particular runbook.
Step 5 - Add authentication to manage Azure resources
We've tested and published our runbook, but so far it doesn't do anything useful. We want to have it manage Azure
resources. It won't be able to do that though unless we have it authenticate using the credentials that are referred to
in the prerequisites. We do that with the Add-AzureRmAccount cmdlet.
1. Open the textual editor by clicking Edit on the MyFirstRunbook-PowerShell pane.

2. We don't need the Write-Output line anymore, so go ahead and delete it.
3. Type or copy and paste the following code that handles the authentication with your Automation Run As
account:
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID `
-ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint

4. Click Test pane so that we can test the runbook.
5. Click Start to start the test. Once it completes, you should receive output similar to the following, displaying
basic information from your account. This confirms that the credential is valid.

Step 6 - Add code to start a virtual machine
Now that our runbook is authenticating to our Azure subscription, we can manage resources. We add a command to
start a virtual machine. You can pick any virtual machine in your Azure subscription, and for now we will hardcode that
name in the runbook.
1. After Add-AzureRmAccount, type Start-AzureRmVM -Name 'VMName' -ResourceGroupName
'NameofResourceGroup' providing the name and Resource Group name of the virtual machine to start.
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID `
-ApplicationID $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
Start-AzureRmVM -Name 'VMName' -ResourceGroupName 'ResourceGroupName'

2. Save the runbook and then click Test pane so that we can test it.
3. Click Start to start the test. Once it completes, check that the virtual machine was started.
Step 7 - Add an input parameter to the runbook
Our runbook currently starts the virtual machine that we hardcoded in the runbook, but it would be more useful if we
specify the virtual machine when the runbook is started. We will now add input parameters to the runbook to provide
that functionality.
1. Add parameters for VMName and ResourceGroupName to the runbook and use these variables with the Start-
AzureRmVM cmdlet as in the example below.
Param(
[string]$VMName,
[string]$ResourceGroupName
)
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID `
-ApplicationID $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
Start-AzureRmVM -Name $VMName -ResourceGroupName $ResourceGroupName

2. Save the runbook and open the Test pane. You can now provide values for the two input variables that are
used in the test.
3. Close the Test pane.
4. Click Publish to publish the new version of the runbook.
5. Stop the virtual machine that you started in the previous step.
6. Click Start to start the runbook. Type in the VMName and ResourceGroupName for the virtual machine that
you're going to start.

7. When the runbook completes, check that the virtual machine was started.
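The published runbook can also be started from Azure PowerShell rather than the portal. A minimal sketch, assuming the AzureRM.Automation module and placeholder Automation account, resource group, and VM names:

# Start the published runbook and pass the two input parameters (placeholder names).
$params = @{ VMName = 'MyVM'; ResourceGroupName = 'MyVMResourceGroup' }
Start-AzureRmAutomationRunbook -AutomationAccountName 'MyAutomationAccount' -ResourceGroupName 'MyAutomationRG' `
    -Name 'MyFirstRunbook-PowerShell' -Parameters $params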
Differences from PowerShell Workflow
PowerShell runbooks have the same lifecycle, capabilities, and management as PowerShell Workflow runbooks but
there are some differences and limitations:
1. PowerShell runbooks run faster than PowerShell Workflow runbooks because they don't have a compilation step.
2. PowerShell Workflow runbooks support checkpoints. Using checkpoints, PowerShell Workflow runbooks can
resume from any point in the runbook, whereas PowerShell runbooks can only resume from the beginning.
3. PowerShell Workflow runbooks support parallel and serial execution whereas PowerShell runbooks can only
execute commands serially.
4. In a PowerShell Workflow runbook, an activity, a command, or a script block can have its own runspace
whereas in a PowerShell runbook, everything in a script runs in a single runspace. There are also
some syntactic differences between a native PowerShell runbook and a PowerShell Workflow runbook.

Azure Automation DSC Overview
Azure Automation DSC is an Azure service that allows you to write, manage, and compile PowerShell Desired State
Configuration (DSC) configurations, import DSC Resources, and assign configurations to target nodes, all in the cloud.
Why use Azure Automation DSC
Azure Automation DSC provides several advantages over using DSC outside of Azure.
Built-in pull server
Azure Automation provides a DSC pull server so that target nodes automatically receive configurations, conform to
the desired state, and report back on their compliance. The built-in pull server in Azure Automation eliminates the
need to set up and maintain your own pull server. Azure Automation can target virtual or physical Windows or Linux
machines, in the cloud or on-premises.
Management of all your DSC artifacts
Azure Automation DSC brings the same management layer to PowerShell Desired State Configuration as Azure
Automation offers for PowerShell scripting.
From the Azure portal, or from PowerShell, you can manage all your DSC configurations, resources, and target nodes.
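As a rough illustration of that workflow, the sketch below defines a small configuration, imports and compiles it in Azure Automation, and registers an Azure VM against the resulting node configuration. The configuration file path, Automation account, resource group, and VM names are placeholders, and the cmdlets assume the AzureRM.Automation module:

# Contents of C:\DSC\WebServer.ps1 (placeholder path; the file name must match the configuration name).
Configuration WebServer {
    Node 'webserver' {
        WindowsFeature IIS {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
    }
}

# Import, compile, and assign the configuration to an Azure VM (placeholder names).
Import-AzureRmAutomationDscConfiguration -SourcePath 'C:\DSC\WebServer.ps1' -Published `
    -AutomationAccountName 'MyAutomationAccount' -ResourceGroupName 'MyAutomationRG'
Start-AzureRmAutomationDscCompilationJob -ConfigurationName 'WebServer' `
    -AutomationAccountName 'MyAutomationAccount' -ResourceGroupName 'MyAutomationRG'
Register-AzureRmAutomationDscNode -AzureVMName 'MyVM' -NodeConfigurationName 'WebServer.webserver' `
    -AutomationAccountName 'MyAutomationAccount' -ResourceGroupName 'MyAutomationRG'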

Import reporting data into Log Analytics
Nodes that are managed with Azure Automation DSC send detailed reporting status data to the built-in pull server.
You can configure Azure Automation DSC to send this data to your Microsoft Operations Management Suite (OMS)
Log Analytics workspace. To learn how to send DSC status data to your Log Analytics workspace, see Forward Azure
Automation DSC reporting data to OMS Log Analytics.
Introduction video
Prefer watching to reading? Have a look at the following video from May 2015, when Azure Automation DSC was first
announced.
Note
While the concepts and life cycle discussed in this video are correct, Azure Automation DSC has progressed a lot since
this video was recorded. It is now generally available, has a much more extensive UI in the Azure portal, and supports
many additional capabilities.

Table of Contents

Azure Architecture Center
Design Review Checklists
Availability
DevOps
Resiliency
Scalability
Availability checklist

Application design
Avoid any single point of failure. All components, services, resources, and compute instances should be
deployed as multiple instances to prevent a single point of failure from affecting availability. This includes
authentication mechanisms. Design the application to be configurable to use multiple instances, and to
automatically detect failures and redirect requests to non-failed instances where the platform does not do this
automatically.
Decompose workload per different service-level agreement. If a service is composed of critical and less-
critical workloads, manage them differently and specify the service features and number of instances to meet
their availability requirements.
Minimize and understand service dependencies. Minimize the number of different services used where
possible, and ensure you understand all of the feature and service dependencies that exist in the system. This
includes the nature of these dependencies, and the impact of failure or reduced performance in each one on the
overall application. Microsoft guarantees at least 99.9 percent availability for most services, but because the
composite availability of an application is the product of the SLAs of the services it depends on, every additional
service an application relies on potentially reduces the overall availability of your system by roughly 0.1 percent.
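As a quick worked example (the figures are illustrative, not actual SLAs), composite availability multiplies across dependencies:

$frontEnd  = 0.999          # 99.9% availability
$database  = 0.999          # 99.9% availability
$composite = $frontEnd * $database
"{0:P2}" -f $composite      # 99.80% - lower than either service on its own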
Design tasks and messages to be idempotent (safely repeatable) where possible, so that duplicated
requests will not cause problems. For example, a service can act as a consumer that handles messages sent as
requests by other parts of the system that act as producers. If the consumer fails after processing the message,
but before acknowledging that it has been processed, a producer might submit a repeat request which could be
handled by another instance of the consumer. For this reason, consumers and the operations they carry out
should be idempotent so that repeating a previously executed operation does not render the results invalid. This
may mean detecting duplicated messages, or ensuring consistency by using an optimistic approach to handling
conflicts.
Use a message broker that implements high availability for critical transactions. Many scenarios for
initiating tasks or accessing remote services use messaging to pass instructions between the application and the
target service. For best performance, the application should be able to send the message and then return to
process more requests, without needing to wait for a reply. To guarantee delivery of messages, the messaging
system should provide high availability. Azure Service Bus message queues implement at least once semantics.
This means that each message posted to a queue will not be lost, although duplicate copies may be delivered
under certain circumstances. If message processing is idempotent (see the previous item), repeated delivery
should not be a problem.
Design applications to gracefully degrade when reaching resource limits, and take appropriate action to
minimize the impact for the user. In some cases, the load on the application may exceed the capacity of one or
more parts, causing reduced availability and failed connections. Scaling can help to alleviate this, but it may
reach a limit imposed by other factors, such as resource availability or cost. Design the application so that, in this
situation, it can automatically degrade gracefully. For example, in an ecommerce system, if the order-processing
subsystem is under strain (or has even failed completely), it can be temporarily disabled while allowing other
functionality (such as browsing the product catalog) to continue. It might be appropriate to postpone requests to
a failing subsystem, for example still enabling customers to submit orders but saving them for later processing,
when the orders subsystem is available again.
Gracefully handle rapid burst events. Most applications need to handle varying workloads over time, such as
peaks first thing in the morning in a business application or when a new product is released in an ecommerce
site. Auto-scaling can help to handle the load, but it may take some time for additional instances to come online
and handle requests. Prevent sudden and unexpected bursts of activity from overwhelming the application:
design it to queue requests to the services it uses and degrade gracefully when queues are near to full capacity.
Ensure that there is sufficient performance and capacity available under non-burst conditions to drain the
queues and handle outstanding requests. For more information, see the Queue-Based Load Leveling Pattern.

Deployment and maintenance


Deploy multiple instances of roles for each service. Microsoft makes availability guarantees for services
that you create and deploy, but these guarantees are only valid if you deploy at least two instances of each role
in the service. This enables one role to be unavailable while the other remains active. This is especially important
if you need to deploy updates to a live system without interrupting clients' activities; instances can be taken
down and upgraded individually while the others continue online.
Host applications in multiple datacenters. Although extremely unlikely, it is possible for an entire datacenter
to go offline through an event such as a natural disaster or Internet failure. Vital business applications should be
hosted in more than one datacenter to provide maximum availability. This can also reduce latency for local
users, and provide additional opportunities for flexibility when updating applications.
Automate and test deployment and maintenance tasks. Distributed applications consist of multiple parts
that must work together. Deployment should therefore be automated, using tested and proven mechanisms
such as scripts and deployment applications. These can update and validate configuration, and automate the
deployment process. Automated techniques should also be used to perform updates of all or parts of
applications. It is vital to test all of these processes fully to ensure that errors do not cause additional downtime.
All deployment tools must have suitable security restrictions to protect the deployed application; define and
enforce deployment policies carefully and minimize the need for human intervention.
Consider using staging and production features of the platform where these are available. For example,
using Azure Cloud Services staging and production environments allows applications to be switched from one
to another instantly through a virtual IP address swap (VIP Swap). However, if you prefer to stage on-premises,
or deploy different versions of the application concurrently and gradually migrate users, you may not be able to
use a VIP Swap operation.
Apply configuration changes without recycling the instance when possible. In many cases, the
configuration settings for an Azure application or service can be changed without requiring the role to be
restarted. Roles expose events that can be handled to detect configuration changes and apply them to
components within the application. However, some changes to the core platform settings do require a role to be
restarted. When building components and services, maximize availability and minimize downtime by designing
them to accept changes to configuration settings without requiring the application as a whole to be restarted.
Use upgrade domains for zero downtime during updates. Azure compute units such as web and
worker roles are allocated to upgrade domains. Upgrade domains group role instances together so that,
when a rolling update takes place, each role in the upgrade domain is stopped, updated, and restarted in
turn. This minimizes the impact on application availability. You can specify how many upgrade domains
should be created for a service when the service is deployed.

NOTE
Roles are also distributed across fault domains, each of which is reasonably independent from other fault domains in
terms of server rack, power, and cooling provision, in order to minimize the chance of a failure affecting all role
instances. This distribution occurs automatically, and you cannot control it.

Configure availability sets for Azure virtual machines. Placing two or more virtual machines in the same
availability set guarantees that these virtual machines will not be deployed to the same fault domain. To
maximize availability, you should create multiple instances of each critical virtual machine used by your system
and place these instances in the same availability set. If you are running multiple virtual machines that serve
different purposes, create an availability set for each virtual machine. Add instances of each virtual machine to
each availability set. For example, if you have created separate virtual machines to act as a web server and a
reporting server, create an availability set for the web server and another availability set for the reporting server.
Add instances of the web server virtual machine to the web server availability set, and add instances of the
reporting server virtual machine to the reporting server availability set.
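A minimal PowerShell sketch of this pattern follows (names are illustrative; the NIC, image, and credential configuration needed before New-AzureRmVM are omitted):

# One availability set per role
$webAvSet = New-AzureRmAvailabilitySet -ResourceGroupName "rg-web" -Name "as-web" -Location "West US" `
    -PlatformFaultDomainCount 2 -PlatformUpdateDomainCount 5

# Each web server instance references the web availability set
$vmConfig = New-AzureRmVMConfig -VMName "web-vm-01" -VMSize "Standard_DS2_v2" `
    -AvailabilitySetId $webAvSet.Id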

Data management
Geo-replicate data in Azure Storage. Data in Azure Storage is automatically replicated within a datacenter.
For even higher availability, use read-access geo-redundant storage (RA-GRS), which replicates your data to a
secondary region and provides read-only access to the data in the secondary location. The data is durable even
in the case of a complete regional outage or a disaster. For more information, see Azure Storage
replication.
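For example, a storage account can be provisioned with RA-GRS replication from PowerShell (the resource names here are illustrative):

New-AzureRmStorageAccount -ResourceGroupName "rg-data" -Name "appdatastore01" `
    -Location "West US" -SkuName "Standard_RAGRS"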
Geo-replicate databases. Azure SQL Database and Cosmos DB both support geo-replication, which enables
you to configure secondary database replicas in other regions. Secondary databases are available for querying
and for failover in the case of a data center outage or the inability to connect to the primary database. For more
information, see Failover groups and active geo-replication (SQL Database) and How to distribute data globally
with Azure Cosmos DB?.
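As a hedged sketch (server and database names are illustrative), an active geo-replication secondary for an Azure SQL database can be created with:

New-AzureRmSqlDatabaseSecondary -ResourceGroupName "rg-primary" -ServerName "sql-primary" `
    -DatabaseName "AppDb" `
    -PartnerResourceGroupName "rg-secondary" -PartnerServerName "sql-secondary" `
    -AllowConnections "All"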
Use optimistic concurrency and eventual consistency where possible. Transactions that block access to
resources through locking (pessimistic concurrency) can cause poor performance and considerably reduce
availability. These problems can become especially acute in distributed systems. In many cases, careful design
and techniques such as partitioning can minimize the chances of conflicting updates occurring. Where data is
replicated, or is read from a separately updated store, the data will only be eventually consistent. However, the
advantages of this approach usually far outweigh the availability impact of using transactions to enforce immediate
consistency.
Use periodic backup and point-in-time restore, and ensure it meets the Recovery Point Objective (RPO).
Regularly and automatically back up data that is not preserved elsewhere, and verify you can reliably restore
both the data and the application itself should a failure occur. Data replication is not a backup feature because
errors and inconsistencies introduced through failure, error, or malicious operations will be replicated across all
stores. The backup process must be secure to protect the data in transit and in storage. Databases or parts of a
data store can usually be recovered to a previous point in time by using transaction logs. Microsoft Azure
provides a backup facility for data stored in Azure SQL Database. The data is exported to a backup package on
Azure blob storage, and can be downloaded to a secure on-premises location for storage.
Enable the high availability option to maintain a secondary copy of an Azure Redis cache. When using
Azure Redis Cache, choose the standard option to maintain a secondary copy of the contents. For more
information, see Create a cache in Azure Redis Cache.

Errors and failures


Introduce the concept of a timeout. Services and resources may become unavailable, causing requests to
fail. Ensure that the timeouts you apply are appropriate for each service or resource as well as the client that is
accessing them. (In some cases, it may be appropriate to allow a longer timeout for a particular instance of a
client, depending on the context and other actions that the client is performing.) Very short timeouts may cause
excessive retry operations for services and resources that have considerable latency. Very long timeouts can
cause blocking if a large number of requests are queued, waiting for a service or resource to respond.
Retry failed operations caused by transient faults. Design a retry strategy for access to all services and
resources where they do not inherently support automatic connection retry. Use a strategy that includes an
increasing delay between retries as the number of failures increases, to prevent overloading of the resource and
to allow it to gracefully recover and handle queued requests. Continual retries with very short delays are likely
to exacerbate the problem.
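A minimal sketch of such a retry strategy in PowerShell follows (the function name and delay values are illustrative; many Azure client libraries already provide this behavior):

function Invoke-WithRetry {
    param(
        [scriptblock]$Operation,
        [int]$MaxAttempts = 5,
        [int]$BaseDelaySeconds = 2
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Operation
        }
        catch {
            if ($attempt -eq $MaxAttempts) { throw }                    # give up after the final attempt
            $delay = $BaseDelaySeconds * [math]::Pow(2, $attempt - 1)   # exponential backoff
            Start-Sleep -Seconds $delay
        }
    }
}

# Example: retry a call that may fail transiently
Invoke-WithRetry { Invoke-RestMethod -Uri "https://example.com/api/orders" -TimeoutSec 10 }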
Stop sending requests to avoid cascading failures when remote services are unavailable. There may be
situations in which transient or other faults, ranging in severity from a partial loss of connectivity to the
complete failure of a service, take much longer than expected to return to normal. Additionally, if a service is
very busy, failure in one part of the system may lead to cascading failures, and result in many operations
becoming blocked while holding onto critical system resources such as memory, threads, and database
connections. Instead of continually retrying an operation that is unlikely to succeed, the application should
quickly accept that the operation has failed, and gracefully handle this failure. You can use the circuit breaker
pattern to reject requests for specific operations for defined periods. For more information, see Circuit Breaker
Pattern.
Compose or fall back to multiple components to mitigate the impact of a specific service being offline or
unavailable. Design applications to take advantage of multiple instances without affecting operation and existing
connections where possible. Use multiple instances and distribute requests between them, and detect and avoid
sending requests to failed instances, in order to maximize availability.
Fall back to a different service or workflow where possible. For example, if writing to SQL Database fails,
temporarily store data in blob storage. Provide a facility to replay the writes in blob storage to SQL Database
when the service becomes available. In some cases, a failed operation may have an alternative action that allows
the application to continue to work even when a component or service fails. If possible, detect failures and
redirect requests to other services that can offer suitable alternative functionality, or to backup or
reduced-functionality instances that can maintain core operations while the primary service is offline.

Monitoring and disaster recovery


Provide rich instrumentation for likely failures and failure events to report the situation to operations
staff. For failures that are likely but have not yet occurred, provide sufficient data to enable operations staff to
determine the cause, mitigate the situation, and ensure that the system remains available. For failures that have
already occurred, the application should return an appropriate error message to the user but attempt to
continue running, albeit with reduced functionality. In all cases, the monitoring system should capture
comprehensive details to enable operations staff to effect a quick recovery, and if necessary, for designers and
developers to modify the system to prevent the situation from arising again.
Monitor system health by implementing checking functions. The health and performance of an
application can degrade over time, without being noticeable until it fails. Implement probes or check functions
that are executed regularly from outside the application. These checks can be as simple as measuring response
time for the application as a whole, for individual parts of the application, for individual services that the
application uses, or for individual components. Check functions can execute processes to ensure they produce
valid results, measure latency and check availability, and extract information from the system.
Regularly test all failover and fallback systems to ensure they are available and operate as expected.
Changes to systems and operations may affect failover and fallback functions, but the impact may not be
detected until the main system fails or becomes overloaded. Test these failover and fallback functions before they
are needed to compensate for a live problem at runtime.
Test the monitoring systems. Automated failover and fallback systems, and manual visualization of system
health and performance by using dashboards, all depend on monitoring and instrumentation functioning
correctly. If these elements fail, miss critical information, or report inaccurate data, an operator might not realize
that the system is unhealthy or failing.
Track the progress of long-running workflows and retry on failure. Long-running workflows are often
composed of multiple steps. Ensure that each step is independent and can be retried to minimize the chance that
the entire workflow will need to be rolled back, or that multiple compensating transactions need to be executed.
Monitor and manage the progress of long-running workflows by implementing a pattern such as Scheduler
Agent Supervisor Pattern.
Plan for disaster recovery. Create an accepted, fully-tested plan for recovery from any type of failure that may
affect system availability. Choose a multi-site disaster recovery architecture for any mission-critical applications.
Identify a specific owner of the disaster recovery plan, including automation and testing. Ensure the plan is well-
documented, and automate the process as much as possible. Establish a backup strategy for all reference and
transactional data, and test the restoration of these backups regularly. Train operations staff to execute the plan,
and perform regular disaster simulations to validate and improve the plan.
DevOps Checklist

DevOps is the integration of development, quality assurance, and IT operations into a unified culture and set of
processes for delivering software.
Use this checklist as a starting point to assess your DevOps culture and process.

Culture
Ensure business alignment across organizations and teams. Conflicts over resources, purpose, goals, and
priorities within an organization can be a risk to successful operations. Ensure that the business, development, and
operations teams are all aligned.
Ensure the entire team understands the software lifecycle. Your team needs to understand the overall
lifecycle of the application, and which part of the lifecycle the application is currently in. This helps all team
members know what they should be doing now, and what they should be planning and preparing for in the future.
Reduce cycle time. Aim to minimize the time it takes to move from ideas to usable developed software. Limit the
size and scope of individual releases to keep the test burden low. Automate the build, test, configuration, and
deployment processes whenever possible. Clear any obstacles to communication among developers, and between
developers and operations.
Review and improve processes. Your processes and procedures, both automated and manual, are never final. Set
up regular reviews of current workflows, procedures, and documentation, with a goal of continual improvement.
Do proactive planning. Proactively plan for failure. Have processes in place to quickly identify issues when they
occur, escalate to the correct team members to fix, and confirm resolution.
Learn from failures. Failures are inevitable, but it's important to learn from failures to avoid repeating them. If an
operational failure occurs, triage the issue, document the cause and solution, and share any lessons that were
learned. Whenever possible, update your build processes to automatically detect that kind of failure in the future.
Optimize for speed and collect data. Every planned improvement is a hypothesis. Work in the smallest
increments possible. Treat new ideas as experiments. Instrument the experiments so that you can collect production
data to assess their effectiveness. Be prepared to fail fast if the hypothesis is wrong.
Allow time for learning. Both failures and successes provide good opportunities for learning. Before moving on
to new projects, allow enough time to gather the important lessons, and make sure those lessons are absorbed by
your team. Also give the team the time to build skills, experiment, and learn about new tools and techniques.
Document operations. Document all tools, processes, and automated tasks with the same level of quality as your
product code. Document the current design and architecture of any systems you support, along with recovery
processes and other maintenance procedures. Focus on the steps you actually perform, not theoretically optimal
processes. Regularly review and update the documentation. For code, make sure that meaningful comments are
included, especially in public APIs, and use tools to automatically generate code documentation whenever possible.
Share knowledge. Documentation is only useful if people know that it exists and can find it. Ensure the
documentation is organized and easily discoverable. Be creative: Use brown bags (informal presentations), videos,
or newsletters to share knowledge.

Development
Provide developers with production-like environments. If development and test environments don't match
the production environment, it is hard to test and diagnose problems. Therefore, keep development and test
environments as close to the production environment as possible. Make sure that test data is consistent with the
data used in production, even if it's sample data and not real production data (for privacy or compliance reasons).
Plan to generate and anonymize sample test data.
Ensure that all authorized team members can provision infrastructure and deploy the application. Setting
up production-like resources and deploying the application should not involve complicated manual tasks or
detailed technical knowledge of the system. Anyone with the right permissions should be able to create or deploy
production-like resources without going to the operations team.

This recommendation doesn't imply that anyone can push live updates to the production deployment. It's about
reducing friction for the development and QA teams to create production-like environments.

Instrument the application for insight. To understand the health of your application, you need to know how it's
performing and whether it's experiencing any errors or problems. Always include instrumentation as a design
requirement, and build the instrumentation into the application from the start. Instrumentation must include event
logging for root cause analysis, but also telemetry and metrics to monitor the overall health and usage of the
application.
Track your technical debt. In many projects, release schedules can get prioritized over code quality to one degree
or another. Always keep track when this occurs. Document any shortcuts or other nonoptimal implementations,
and schedule time in the future to revisit these issues.
Consider pushing updates directly to production. To reduce the overall release cycle time, consider pushing
properly tested code commits directly to production. Use feature toggles to control which features are enabled. This
allows you to move from development to release quickly, using the toggles to enable or disable features. Toggles
are also useful when performing tests such as canary releases, where a particular feature is deployed to a subset of
the production environment.

Testing
Automate testing. Manually testing software is tedious and susceptible to error. Automate common testing tasks
and integrate the tests into your build processes. Automated testing ensures consistent test coverage and
reproducibility. Integrated UI tests should also be performed by an automated tool. Azure offers development and
test resources that can help you configure and execute testing. For more information, see Development and test.
Test for failures. If a system can't connect to a service, how does it respond? Can it recover once the service is
available again? Make fault injection testing a standard part of review on test and staging environments. When
your test process and practices are mature, consider running these tests in production.
Test in production. The release process doesn't end with deployment to production. Have tests in place to ensure
that deployed code works as expected. For deployments that are infrequently updated, schedule production testing
as a regular part of maintenance.
Automate performance testing to identify performance issues early. The impact of a serious performance
issue can be just as severe as a bug in the code. While automated functional tests can prevent application bugs,
they might not detect performance problems. Define acceptable performance goals for metrics like latency, load
times, and resource usage. Include automated performance tests in your release pipeline, to make sure the
application meets those goals.
Perform capacity testing. An application might work fine under test conditions, and then have problems in
production due to scale or resource limitations. Always define the maximum expected capacity and usage limits.
Test to make sure the application can handle those limits, but also test what happens when those limits are
exceeded. Capacity testing should be performed at regular intervals.
After the initial release, you should run performance and capacity tests whenever updates are made to production
code. Use historical data to fine tune tests and to determine what types of tests need to be performed.
Perform automated security penetration testing. Ensuring your application is secure is as important as testing
any other functionality. Make automated penetration testing a standard part of the build and deployment process.
Schedule regular security tests and vulnerability scanning on deployed applications, monitoring for open ports,
endpoints, and attacks. Automated testing does not remove the need for in-depth security reviews at regular
intervals.
Perform automated business continuity testing. Develop tests for large scale business continuity, including
backup recovery and failover. Set up automated processes to perform these tests regularly.

Release
Automate deployments. Automate deploying the application to test, staging, and production environments.
Automation enables faster and more reliable deployments, and ensures consistent deployments to any supported
environment. It removes the risk of human error caused by manual deployments. It also makes it easy to schedule
releases for convenient times, to minimize any effects of potential downtime.
Use continuous integration. Continuous integration (CI) is the practice of merging all developer code into a
central codebase on a regular schedule, and then automatically performing standard build and test processes. CI
ensures that an entire team can work on a codebase at the same time without having conflicts. It also ensures that
code defects are found as early as possible. Preferably, the CI process should run every time that code is committed
or checked in. At the very least, it should run once per day.

Consider adopting a trunk based development model. In this model, developers commit to a single branch (the
trunk). There is a requirement that commits never break the build. This model facilitates CI, because all feature
work is done in the trunk, and any merge conflicts are resolved when the commit happens.

Consider using continuous delivery. Continuous delivery (CD) is the practice of ensuring that code is always
ready to deploy, by automatically building, testing, and deploying code to production-like environments. Adding
continuous delivery to create a full CI/CD pipeline will help you detect code defects as soon as possible, and
ensures that properly tested updates can be released in a very short time.

Continuous deployment is an additional process that automatically takes any updates that have passed through
the CI/CD pipeline and deploys them into production. Continuous deployment requires robust automatic
testing and advanced process planning, and may not be appropriate for all teams.

Make small incremental changes. Large code changes have a greater potential to introduce bugs. Whenever
possible, keep changes small. This limits the potential effects of each change, and makes it easier to understand and
debug any issues.
Control exposure to changes. Make sure you're in control of when updates are visible to your end users.
Consider using feature toggles to control when features are enabled for end users.
Implement release management strategies to reduce deployment risk. Deploying an application update to
production always entails some risk. To minimize this risk, use strategies such as canary releases or blue-green
deployments to deploy updates to a subset of users. Confirm the update works as expected, and then roll the
update out to the rest of the system.
Document all changes. Minor updates and configuration changes can be a source of confusion and versioning
conflict. Always keep a clear record of any changes, no matter how small. Log everything that changes, including
patches applied, policy changes, and configuration changes. (Don't include sensitive data in these logs. For example,
log that a credential was updated, and who made the change, but don't record the updated credentials.) The record
of the changes should be visible to the entire team.
Automate Deployments. Automate all deployments, and have systems in place to detect any problems during
rollout. Have a mitigation process for preserving the existing code and data in production, before the update
replaces them in all production instances. Have an automated way to roll forward fixes or roll back changes.
Consider making infrastructure immutable. Immutable infrastructure is the principle that you shouldn't modify
infrastructure after it's deployed to production. Otherwise, you can get into a state where ad hoc changes have
been applied, making it hard to know exactly what changed. Immutable infrastructure works by replacing entire
servers as part of any new deployment. This allows the code and the hosting environment to be tested and
deployed as a block. Once deployed, infrastructure components aren't modified until the next build and deploy
cycle.

Monitoring
Make systems observable. The operations team should always have clear visibility into the health and status of a
system or service. Set up external health endpoints to monitor status, and ensure that applications are coded to
instrument the operations metrics. Use a common and consistent schema that lets you correlate events across
systems. Azure Diagnostics and Application Insights are the standard methods of tracking the health and status of
Azure resources. Microsoft Operations Management Suite also provides centralized monitoring and management
for cloud or hybrid solutions.
Aggregate and correlate logs and metrics. A properly instrumented telemetry system will provide a large
amount of raw performance data and event logs. Make sure that telemetry and log data is processed and correlated
in a short period of time, so that operations staff always have an up-to-date picture of system health. Organize and
display data in ways that give a cohesive view of any issues, so that whenever possible it's clear when events are
related to one another.

Consult your corporate retention policy for requirements on how data is processed and how long it should be
stored.

Implement automated alerts and notifications. Set up monitoring tools like Azure Monitor to detect patterns
or conditions that indicate potential or current issues, and send alerts to the team members who can address the
issues. Tune the alerts to avoid false positives.
Monitor assets and resources for expirations. Some resources and assets, such as certificates, expire after a
given amount of time. Make sure to track which assets expire, when they expire, and what services or features
depend on them. Use automated processes to monitor these assets. Notify the operations team before an asset
expires, and escalate if expiration threatens to disrupt the application.

Management
Automate operations tasks. Manually handling repetitive operations processes is error-prone. Automate these
tasks whenever possible to ensure consistent execution and quality. Code that implements the automation should
be versioned in source control. As with any other code, automation tools must be tested.
Take an infrastructure-as-code approach to provisioning. Minimize the amount of manual configuration
needed to provision resources. Instead, use scripts and Azure Resource Manager templates. Keep the scripts and
templates in source control, like any other code you maintain.
Consider using containers. Containers provide a standard package-based interface for deploying applications.
Using containers, an application is deployed using self-contained packages that include any software,
dependencies, and files needed to run the application, which greatly simplifies the deployment process.
Containers also create an abstraction layer between the application and the underlying operating system, which
provides consistency across environments. This abstraction can also isolate a container from other processes or
applications running on a host.
Implement resiliency and self-healing. Resiliency is the ability of an application to recover from failures.
Strategies for resiliency include retrying transient failures, and failing over to a secondary instance or even another
region. For more information, see Designing resilient applications for Azure. Instrument your applications so that
issues are reported immediately and you can manage outages or other system failures.
Have an operations manual. An operations manual or runbook documents the procedures and management
information needed for operations staff to maintain a system. Also document any operations scenarios and
mitigation plans that might come into play during a failure or other disruption to your service. Create this
documentation during the development process, and keep it up to date afterwards. This is a living document, and
should be reviewed, tested, and improved regularly.
Shared documentation is critical. Encourage team members to contribute and share knowledge. The entire team
should have access to documents. Make it easy for anyone on the team to help keep documents updated.
Document on-call procedures. Make sure on-call duties, schedules, and procedures are documented and shared
to all team members. Keep this information up-to-date at all times.
Document escalation procedures for third-party dependencies. If your application depends on external third-
party services that you don't directly control, you must have a plan to deal with outages. Create documentation for
your planned mitigation processes. Include support contacts and escalation paths.
Use configuration management. Configuration changes should be planned, visible to operations, and recorded.
This could take the form of a configuration management database, or a configuration-as-code approach.
Configuration should be audited regularly to ensure that what's expected is actually in place.
Get an Azure support plan and understand the process. Azure offers a number of support plans. Determine
the right plan for your needs, and make sure the entire team knows how to use it. Team members should
understand the details of the plan, how the support process works, and how to open a support ticket with Azure. If
you are anticipating a high-scale event, Azure support can assist you with increasing your service limits. For more
information, see the Azure Support FAQs.
Follow least-privilege principles when granting access to resources. Carefully manage access to resources.
Access should be denied by default, unless a user is explicitly given access to a resource. Only grant a user access to
what they need to complete their tasks. Track user permissions and perform regular security audits.
Use role-based access control. Assigning user accounts and access to resources should not be a manual process.
Use Role-Based Access Control (RBAC) to grant access based on Azure Active Directory identities and groups.
Use a bug tracking system to track issues. Without a good way to track issues, it's easy to miss items, duplicate
work, or introduce additional problems. Don't rely on informal person-to-person communication to track the status
of bugs. Use a bug tracking tool to record details about problems, assign resources to address them, and provide
an audit trail of progress and status.
Manage all resources in a change management system. All aspects of your DevOps process should be
included in a management and versioning system, so that changes can be easily tracked and audited. This includes
code, infrastructure, configuration, documentation, and scripts. Treat all these types of resources as code
throughout the test/build/review process.
Use checklists. Create operations checklists to ensure processes are followed. It's common to miss something in a
large manual, and following a checklist can force attention to details that might otherwise be overlooked. Maintain
the checklists, and continually look for ways to automate tasks and streamline processes.
For more about DevOps, see What is DevOps? on the Visual Studio site.
Resiliency checklist

Designing your application for resiliency requires planning for and mitigating a variety of failure modes that could
occur. Review the items in this checklist against your application design to improve its resiliency.

Requirements
Define your customer's availability requirements. Your customer will have availability requirements for the
components in your application and this will affect your application's design. Get agreement from your
customer for the availability targets of each piece of your application, otherwise your design may not meet the
customer's expectations. For more information, see Defining your resiliency requirements.

Application Design
Perform a failure mode analysis (FMA) for your application. FMA is a process for building resiliency
into an application early in the design stage. For more information, see Failure mode analysis. The goals of
an FMA include:
Identify what types of failures an application might experience.
Capture the potential effects and impact of each type of failure on the application.
Identify recovery strategies.
Deploy multiple instances of services. If your application depends on a single instance of a service, it
creates a single point of failure. Provisioning multiple instances improves both resiliency and scalability. For
Azure App Service, select an App Service Plan that offers multiple instances. For Azure Cloud Services,
configure each of your roles to use multiple instances. For Azure Virtual Machines (VMs), ensure that your
VM architecture includes more than one VM and that each VM is included in an availability set.
Use autoscaling to respond to increases in load. If your application is not configured to scale out
automatically as load increases, it's possible that your application's services will fail if they become saturated
with user requests. For more details, see the following:
General: Scalability checklist
Azure App Service: Scale instance count manually or automatically
Cloud Services: How to auto scale a cloud service
Virtual Machines: Automatic scaling and virtual machine scale sets
Use load balancing to distribute requests. Load balancing distributes your application's requests to healthy
service instances by removing unhealthy instances from rotation. If your service uses Azure App Service or
Azure Cloud Services, it is already load balanced for you. However, if your application uses Azure VMs, you will
need to provision a load balancer. See the Azure Load Balancer overview for more details.
Configure Azure Application Gateways to use multiple instances. Depending on your application's
requirements, an Azure Application Gateway may be better suited to distributing requests to your application's
services. However, single instances of the Application Gateway service are not guaranteed by an SLA so it's
possible that your application could fail if the Application Gateway instance fails. Provision more than one
medium or larger Application Gateway instance to guarantee availability of the service under the terms of the
SLA.
Use Availability Sets for each application tier. Placing your instances in an availability set provides a higher
SLA.
Consider deploying your application across multiple regions. If your application is deployed to a single
region, in the rare event the entire region becomes unavailable, your application will also be unavailable. This
may be unacceptable under the terms of your application's SLA. If so, consider deploying your application and
its services across multiple regions. A multi-region deployment can use an active-active pattern (distributing
requests across multiple active instances) or an active-passive pattern (keeping a "warm" instance in reserve, in
case the primary instance fails). We recommend that you deploy multiple instances of your application's
services across regional pairs. For more information, see Business continuity and disaster recovery (BCDR):
Azure Paired Regions.
Use Azure Traffic Manager to route your application's traffic to different regions. Azure Traffic Manager
performs load balancing at the DNS level and will route traffic to different regions based on the traffic routing
method you specify and the health of your application's endpoints. Without Traffic Manager, you are limited to a
single region for your deployment, which limits scale, increases latency for some users, and causes application
downtime in the case of a region-wide service disruption.
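A hedged sketch of a priority (failover) Traffic Manager profile across two regions follows, using illustrative profile, resource group, and endpoint names:

$tm = New-AzureRmTrafficManagerProfile -Name "app-tm" -ResourceGroupName "rg-global" `
    -TrafficRoutingMethod Priority -RelativeDnsName "myapp-tm" -Ttl 30 `
    -MonitorProtocol HTTPS -MonitorPort 443 -MonitorPath "/health"

# Primary endpoint receives all traffic while healthy; the secondary takes over on failure
New-AzureRmTrafficManagerEndpoint -Name "primary-westus" -ProfileName "app-tm" `
    -ResourceGroupName "rg-global" -Type ExternalEndpoints `
    -Target "myapp-westus.azurewebsites.net" -EndpointLocation "West US" -Priority 1

New-AzureRmTrafficManagerEndpoint -Name "secondary-eastus" -ProfileName "app-tm" `
    -ResourceGroupName "rg-global" -Type ExternalEndpoints `
    -Target "myapp-eastus.azurewebsites.net" -EndpointLocation "East US" -Priority 2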
Configure and test health probes for your load balancers and traffic managers. Ensure that your health
logic checks the critical parts of the system and responds appropriately to health probes.
The health probes for Azure Traffic Manager and Azure Load Balancer serve a specific function. For Traffic
Manager, the health probe determines whether to fail over to another region. For a load balancer, it
determines whether to remove a VM from rotation.
For a Traffic Manager probe, your health endpoint should check any critical dependencies that are
deployed within the same region, and whose failure should trigger a failover to another region.
For a load balancer, the health endpoint should report the health of the VM. Don't include other tiers or
external services. Otherwise, a failure that occurs outside the VM will cause the load balancer to remove
the VM from rotation.
For guidance on implementing health monitoring in your application, see Health Endpoint Monitoring
Pattern.
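For a load balancer, a health probe can be attached from PowerShell, for example (the load balancer name and probe path are illustrative):

$lb = Get-AzureRmLoadBalancer -ResourceGroupName "rg-web" -Name "lb-web"
Add-AzureRmLoadBalancerProbeConfig -LoadBalancer $lb -Name "HttpHealthProbe" `
    -Protocol Http -Port 80 -RequestPath "/health" `
    -IntervalInSeconds 15 -ProbeCount 2
Set-AzureRmLoadBalancer -LoadBalancer $lb   # persist the updated configuration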
Monitor third-party services. If your application has dependencies on third-party services, identify where and
how these third-party services can fail and what effect those failures will have on your application. A third-party
service may not include monitoring and diagnostics, so it's important to log your invocations of them and
correlate them with your application's health and diagnostic logging using a unique identifier. For more
information on proven practices for monitoring and diagnostics, see Monitoring and Diagnostics guidance.
Ensure that any third-party service you consume provides an SLA. If your application depends on a third-
party service, but the third party provides no guarantee of availability in the form of an SLA, your application's
availability also cannot be guaranteed. Your SLA is only as good as the least available component of your
application.
Implement resiliency patterns for remote operations where appropriate. If your application depends on
communication between remote services, follow design patterns for dealing with transient failures, such as
Retry Pattern, and Circuit Breaker Pattern. For more information, see Resiliency strategies.
Implement asynchronous operations whenever possible. Synchronous operations can monopolize
resources and block other operations while the caller waits for the process to complete. Design each part of your
application to allow for asynchronous operations whenever possible. For more information on how to
implement asynchronous programming in C#, see Asynchronous Programming with async and await.

Data management
Understand the replication methods for your application's data sources. Your application data will be
stored in different data sources and have different availability requirements. Evaluate the replication methods
for each type of data storage in Azure, including Azure Storage Replication and SQL Database Active Geo-
Replication to ensure that your application's data requirements are satisfied.
Ensure that no single user account has access to both production and backup data. Your data backups
are compromised if one single user account has permission to write to both production and backup sources. A
malicious user could purposely delete all your data, while a regular user could accidentally delete it. Design your
application to limit the permissions of each user account so that only the users that require write access have
write access and it's only to either production or backup, but not both.
Document your data source fail over and fail back process and test it. In the case where your data source
fails catastrophically, a human operator will have to follow a set of documented instructions to fail over to a new
data source. If the documented steps have errors, an operator will not be able to successfully follow them and
fail over the resource. Regularly test the instruction steps to verify that an operator following them is able to
successfully fail over and fail back the data source.
Validate your data backups. Regularly verify that your backup data is what you expect by running a script to
validate data integrity, schema, and queries. There's no point having a backup if it's not useful to restore your
data sources. Log and report any inconsistencies so the backup service can be repaired.
Consider using a storage account type that is geo-redundant. Data stored in an Azure Storage account
is always replicated locally. However, there are multiple replication strategies to choose from when a Storage
Account is provisioned. Select Azure Read-Access Geo Redundant Storage (RA-GRS) to protect your
application data against the rare case when an entire region becomes unavailable.

NOTE
For VMs, do not rely on RA-GRS replication to restore the VM disks (VHD files). Instead, use Azure Backup.

Security
Implement application-level protection against distributed denial of service (DDoS) attacks. Azure
services are protected against DDoS attacks at the network layer. However, Azure cannot protect against
application-layer attacks, because it is difficult to distinguish true user requests from malicious user
requests. For more information on how to protect against application-layer DDoS attacks, see the "Protecting
against DDoS" section of Microsoft Azure Network Security (PDF download).
Implement the principle of least privilege for access to the application's resources. The default for
access to the application's resources should be as restrictive as possible. Grant higher level permissions on an
approval basis. Granting overly permissive access to your application's resources by default can result in
someone purposely or accidentally deleting resources. Azure provides role-based access control to manage user
privileges, but it's important to verify least privilege permissions for other resources that have their own
permissions systems such as SQL Server.

Testing
Perform failover and failback testing for your application. If you haven't fully tested failover and failback,
you can't be certain that the dependent services in your application come back up in a synchronized manner
during disaster recovery. Ensure that your application's dependent services failover and fail back in the correct
order.
Perform fault-injection testing for your application. Your application can fail for many different reasons,
such as certificate expiration, exhaustion of system resources in a VM, or storage failures. Test your application
in an environment as close as possible to production, by simulating or triggering real failures. For example,
delete certificates, artificially consume system resources, or delete a storage source. Verify your application's
ability to recover from all types of faults, alone and in combination. Check that failures are not propagating or
cascading through your system.
Run tests in production using both synthetic and real user data. Test and production are rarely identical,
so it's important to use blue/green or a canary deployment and test your application in production. This allows
you to test your application in production under real load and ensure it will function as expected when fully
deployed.

Deployment
Document the release process for your application. Without detailed release process documentation, an
operator might deploy a bad update or improperly configure settings for your application. Clearly define and
document your release process, and ensure that it's available to the entire operations team.
Automate your application's deployment process. If your operations staff is required to manually deploy
your application, human error can cause the deployment to fail.
Design your release process to maximize application availability. If your release process requires services
to go offline during deployment, your application will be unavailable until they come back online. Use the
blue/green or canary release deployment technique to deploy your application to production. Both of these
techniques involve deploying your release code alongside production code so users of release code can be
redirected to production code in the event of a failure.
Log and audit your application's deployments. If you use staged deployment techniques such as
blue/green or canary releases there will be more than one version of your application running in production. If a
problem should occur, it's critical to determine which version of your application is causing a problem.
Implement a robust logging strategy to capture as much version-specific information as possible.
Have a rollback plan for deployment. It's possible that your application deployment could fail and cause
your application to become unavailable. Design a rollback process to go back to a last known good version and
minimize downtime.

Operations
Implement best practices for monitoring and alerting in your application. Without proper monitoring,
diagnostics, and alerting, there is no way to detect failures in your application and alert an operator to fix them.
For more information, see Monitoring and Diagnostics guidance.
Measure remote call statistics and make the information available to the application team. If you don't
track and report remote call statistics in real time and provide an easy way to review this information, the
operations team will not have an instantaneous view into the health of your application. And if you only
measure average remote call time, you will not have enough information to reveal issues in the services.
Summarize remote call metrics such as latency, throughput, and errors in the 99th and 95th percentiles. Perform
statistical analysis on the metrics to uncover errors that occur within each percentile.
Track the number of transient exceptions and retries over an appropriate timeframe. If you don't track
and monitor transient exceptions and retry attempts over time, it's possible that an issue or failure could be
hidden by your application's retry logic. That is, if your monitoring and logging only shows success or failure of
an operation, the fact that the operation had to be retried multiple times due to exceptions will be hidden. A
trend of increasing exceptions over time indicates that the service is having an issue and may fail. For more
information, see Retry service specific guidance.
Implement an early warning system that alerts an operator. Identify the key performance indicators of
your application's health, such as transient exceptions and remote call latency, and set appropriate threshold
values for each of them. Send an alert to operations when the threshold value is reached. Set these thresholds at
levels that identify issues before they become critical and require a recovery response.
Ensure that more than one person on the team is trained to monitor the application and perform any
manual recovery steps. If you only have a single operator on the team who can monitor the application and
kick off recovery steps, that person becomes a single point of failure. Train multiple individuals on detection and
recovery and make sure there is always at least one active at any time.
Ensure that your application does not run up against Azure subscription limits. Azure subscriptions have
limits on certain resource types, such as number of resource groups, number of cores, and number of storage
accounts. If your application requirements exceed Azure subscription limits, create another Azure subscription
and provision sufficient resources there.
Ensure that your application does not run up against per-service limits. Individual Azure services have
consumption limits; for example, limits on storage, throughput, number of connections, requests per second,
and other metrics. Your application will fail if it attempts to use resources beyond these limits. This will result in
service throttling and possible downtime for affected users. Depending on the specific service and your
application requirements, you can often avoid these limits by scaling up (for example, choosing another pricing
tier) or scaling out (adding new instances).
Design your application's storage requirements to fall within Azure storage scalability and
performance targets. Azure storage is designed to function within predefined scalability and performance
targets, so design your application to utilize storage within those targets. If you exceed these targets your
application will experience storage throttling. To fix this, provision additional Storage Accounts. If you run up
against the Storage Account limit, provision additional Azure Subscriptions and then provision additional
Storage Accounts there. For more information, see Azure Storage Scalability and Performance Targets.
Select the right VM size for your application. Measure the actual CPU, memory, disk, and I/O of your VMs in
production and verify that the VM size you've selected is sufficient. If not, your application may experience
capacity issues as the VMs approach their limits. VM sizes are described in detail in Sizes for virtual machines in
Azure.
Determine if your application's workload is stable or fluctuating over time. If your workload fluctuates
over time, use Azure VM scale sets to automatically scale the number of VM instances. Otherwise, you will have
to manually increase or decrease the number of VMs. For more information, see the Virtual Machine Scale Sets
Overview.
Select the right service tier for Azure SQL Database. If your application uses Azure SQL Database, ensure
that you have selected the appropriate service tier. If you select a tier that is not able to handle your application's
database transaction unit (DTU) requirements, your data use will be throttled. For more information on selecting
the correct service plan, see SQL Database options and performance: Understand what's available in each
service tier.
Create a process for interacting with Azure support. If the process for contacting Azure support is not set
before the need to contact support arises, downtime will be prolonged as the support process is navigated for
the first time. Include the process for contacting support and escalating issues as part of your application's
resiliency from the outset.
Ensure that your application doesn't use more than the maximum number of storage accounts per
subscription. Azure allows a maximum of 200 storage accounts per subscription. If your application requires
more storage accounts than are currently available in your subscription, you will have to create a new
subscription and create additional storage accounts there. For more information, see Azure subscription and
service limits, quotas, and constraints.
Ensure that your application doesn't exceed the scalability targets for virtual machine disks. An Azure
IaaS VM supports attaching a number of data disks depending on several factors, including the VM size and type
of storage account. If your application exceeds the scalability targets for virtual machine disks, provision
additional storage accounts and create the virtual machine disks there. For more information, see Azure Storage
Scalability and Performance Targets.
Telemetry
Log telemetry data while the application is running in the production environment. Capture robust
telemetry information while the application is running in the production environment or you will not have
sufficient information to diagnose the cause of issues while it's actively serving users. For more information, see
Monitoring and Diagnostics.
Implement logging using an asynchronous pattern. If logging operations are synchronous, they might
block your application code. Ensure that your logging operations are implemented as asynchronous operations.
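The following is a minimal sketch of this pattern in Python, using the standard library's QueueHandler and QueueListener. Application code only drops a record onto an in-memory queue; the slow I/O happens on a background thread, so logging never blocks the request path.

    # Asynchronous logging sketch using only the Python standard library.
    import logging
    import logging.handlers
    import queue

    log_queue = queue.Queue(-1)                      # unbounded in-memory buffer
    queue_handler = logging.handlers.QueueHandler(log_queue)

    # The slow handler (file, HTTP, and so on) runs on the listener's background thread.
    file_handler = logging.FileHandler("app.log")
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    logger = logging.getLogger("myapp")
    logger.addHandler(queue_handler)
    logger.setLevel(logging.INFO)

    logger.info("request processed")                 # returns immediately
    listener.stop()                                  # flush remaining records on shutdown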
Correlate log data across service boundaries. In a typical n-tier application, a user request may traverse
several service boundaries. For example, a user request typically originates in the web tier and is passed to the
business tier and finally persisted in the data tier. In more complex scenarios, a user request may be distributed
to many different services and data stores. Ensure that your logging system correlates calls across service
boundaries so you can track the request throughout your application.
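A hedged sketch of one way to do this in Python: each incoming request either carries or is assigned a correlation ID, which is stamped on every log record and forwarded on outbound calls so entries from different tiers can be joined. The header name x-correlation-id and the downstream URL are illustrative conventions, not fixed standards.

    import logging
    import uuid
    import requests

    logging.basicConfig(format="%(asctime)s %(correlation_id)s %(message)s",
                        level=logging.INFO)
    logger = logging.getLogger("web-tier")

    def handle_request(headers: dict) -> None:
        corr_id = headers.get("x-correlation-id") or str(uuid.uuid4())
        extra = {"correlation_id": corr_id}
        logger.info("received request", extra=extra)

        # Pass the same ID downstream so the business tier's logs can be correlated.
        # The URL below is a placeholder for the next service in the chain.
        requests.post("https://business-tier.example.com/orders",
                      headers={"x-correlation-id": corr_id},
                      json={"item": "widget"},
                      timeout=5)
        logger.info("forwarded to business tier", extra=extra)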
Azure Resources
Use Azure Resource Manager templates to provision resources. Resource Manager templates make it
easier to automate deployments via PowerShell or the Azure CLI, which leads to a more reliable deployment
process. For more information, see Azure Resource Manager overview.
Give resources meaningful names. Giving resources meaningful names makes it easier to locate a specific
resource and understand its role. For more information, see Naming conventions for Azure resources
Use role-based access control (RBAC). Use RBAC to control access to the Azure resources that you deploy.
RBAC lets you assign authorization roles to members of your DevOps team, to prevent accidental deletion or
changes to deployed resources. For more information, see Get started with access management in the Azure
portal
Use resource locks for critical resources, such as VMs. Resource locks prevent an operator from accidentally
deleting a resource. For more information, see Lock resources with Azure Resource Manager
Choose regional pairs. When deploying to two regions, choose regions from the same regional pair. In the
event of a broad outage, recovery of one region is prioritized out of every pair. Some services such as Geo-
Redundant Storage provide automatic replication to the paired region. For more information, see Business
continuity and disaster recovery (BCDR): Azure Paired Regions
Organize resource groups by function and lifecycle. In general, a resource group should contain resources
that share the same lifecycle. This makes it easier to manage deployments, delete test deployments, and assign
access rights, reducing the chance that a production deployment is accidentally deleted or modified. Create
separate resource groups for production, development, and test environments. In a multi-region deployment,
put resources for each region into separate resource groups. This makes it easier to redeploy one region without
affecting the other region(s).
Azure Services
The following checklist items apply to specific services in Azure.
App Service
Use Standard or Premium tier. These tiers support staging slots and automated backups. For more
information, see Azure App Service plans in-depth overview
Avoid scaling up or down. Instead, select a tier and instance size that meet your performance requirements
under typical load, and then scale out the instances to handle changes in traffic volume. Scaling up and down
may trigger an application restart.
Store configuration as app settings. Use app settings to hold configuration values. Define
the settings in your Resource Manager templates, or by using PowerShell, so that you can apply them as part of an
automated deployment / update process, which is more reliable. For more information, see Configure web apps
in Azure App Service.
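At runtime, App Service surfaces app settings to the application as environment variables, so application code reads configuration from the environment rather than from files baked into the deployment. A minimal Python sketch, with illustrative setting names:

    import os

    queue_name = os.environ["QUEUE_NAME"]                  # fail fast if the setting is missing
    max_retries = int(os.environ.get("MAX_RETRIES", "3"))  # optional setting with a default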
Create separate App Service plans for production and test. Don't use slots on your production deployment
for testing. All apps within the same App Service plan share the same VM instances. If you put production and
test deployments in the same plan, it can negatively affect the production deployment. For example, load tests
might degrade the live production site. By putting test deployments into a separate plan, you isolate them from
the production version.
Separate web apps from web APIs. If your solution has both a web front-end and a web API, consider
decomposing them into separate App Service apps. This design makes it easier to decompose the solution by
workload. You can run the web app and the API in separate App Service plans, so they can be scaled
independently. If you don't need that level of scalability at first, you can deploy the apps into the same plan, and
move them into separate plans later, if needed.
Avoid using the App Service backup feature to back up Azure SQL databases. Instead, use SQL Database
automated backups. App Service backup exports the database to a SQL .bacpac file, which costs DTUs.
Deploy to a staging slot. Create a deployment slot for staging. Deploy application updates to the staging slot,
and verify the deployment before swapping it into production. This reduces the chance of a bad update in
production. It also ensures that all instances are warmed up before being swapped into production. Many
applications have a significant warmup and cold-start time. For more information, see Set up staging
environments for web apps in Azure App Service.
Create a deployment slot to hold the last-known-good (LKG) deployment. When you deploy an update
to production, move the previous production deployment into the LKG slot. This makes it easier to roll back a
bad deployment. If you discover a problem later, you can quickly revert to the LKG version. For more
information, see Basic web application.
Enable diagnostics logging, including application logging and web server logging. Logging is important for
monitoring and diagnostics. See Enable diagnostics logging for web apps in Azure App Service
Log to blob storage. This makes it easier to collect and analyze the data.
Create a separate storage account for logs. Don't use the same storage account for logs and application
data. This helps to prevent logging from reducing application performance.
Monitor performance. Use a performance monitoring service such as New Relic or Application Insights to
monitor application performance and behavior under load. Performance monitoring gives you real-time insight
into the application. It enables you to diagnose issues and perform root-cause analysis of failures.
Application Gateway
Provision at least two instances. Deploy Application Gateway with at least two instances. A single instance is
a single point of failure. Use two or more instances for redundancy and scalability. In order to qualify for the
SLA, you must provision two or more medium or larger instances.
Azure Search
Provision more than one replica. Use at least two replicas for read high-availability, or three for read-write
high-availability.
Configure indexers for multi-region deployments. If you have a multi-region deployment, consider
your options for continuity in indexing.
If the data source is geo-replicated, you should generally point each indexer of each regional Azure
Search service to its local data source replica. However, that approach is not recommended for large
datasets stored in Azure SQL Database. The reason is that Azure Search cannot perform incremental
indexing from secondary SQL Database replicas, only from primary replicas. Instead, point all indexers to
the primary replica. After a failover, point the Azure Search indexers at the new primary replica.
If the data source is not geo-replicated, point multiple indexers at the same data source, so that Azure
Search services in multiple regions continuously and independently index from the data source. For more
information, see Azure Search performance and optimization considerations.
Azure Storage
For application data, use read-access geo-redundant storage (RA-GRS). RA-GRS storage replicates the
data to a secondary region, and provides read-only access from the secondary region. If there is a storage
outage in the primary region, the application can read the data from the secondary region. For more
information, see Azure Storage replication.
For VM disks, use Managed Disks. Managed Disks provide better reliability for VMs in an availability set,
because the disks are sufficiently isolated from each other to avoid single points of failure. Also, Managed Disks
aren't subject to the IOPS limits of VHDs created in a storage account. For more information, see Manage the
availability of Windows virtual machines in Azure.
For Queue storage, create a backup queue in another region. For Queue storage, a read-only replica has
limited use, because you can't queue or dequeue items. Instead, create a backup queue in a storage account in
another region. If there is a storage outage, the application can use the backup queue, until the primary region
becomes available again. That way, the application can still process new requests.
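A hedged sketch of the write path in Python, assuming the azure-storage-queue package (v12-style API): messages go to the primary queue, and fall back to a backup queue in a storage account in another region if the primary is unavailable. The connection strings and queue name are placeholders.

    from azure.core.exceptions import AzureError
    from azure.storage.queue import QueueClient

    primary = QueueClient.from_connection_string("<primary-conn-string>", "orders")
    backup = QueueClient.from_connection_string("<backup-conn-string>", "orders")

    def enqueue(message: str) -> None:
        try:
            primary.send_message(message)
        except AzureError:
            # Primary region unavailable; keep accepting new work via the backup queue.
            backup.send_message(message)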
Cosmos DB
Replicate the database across regions. Cosmos DB allows you to associate any number of Azure regions
with a Cosmos DB database account. A Cosmos DB database can have one write region and multiple read regions. If
there is a failure in the write region, you can read from another replica. The Client SDK handles this
automatically. You can also fail over the write region to another region. For more information, see How to
distribute data globally with Azure Cosmos DB?
SQL Database
Use Standard or Premium tier. These tiers provide a longer point-in-time restore period (35 days). For more
information, see SQL Database options and performance.
Enable SQL Database auditing. Auditing can be used to diagnose malicious attacks or human error. For more
information, see Get started with SQL database auditing.
Use Active Geo-Replication. Use Active Geo-Replication to create a readable secondary in a different region. If
your primary database fails, or simply needs to be taken offline, perform a manual failover to the secondary
database. Until you fail over, the secondary database remains read-only. For more information, see SQL
Database Active Geo-Replication.
Use sharding. Consider using sharding to partition the database horizontally. Sharding can provide fault
isolation. For more information, see Scaling out with Azure SQL Database.
Use point-in-time restore to recover from human error. Point-in-time restore returns your database to an
earlier point in time. For more information, see Recover an Azure SQL database using automated database
backups.
Use geo-restore to recover from a service outage. Geo-restore restores a database from a geo-redundant
backup. For more information, see Recover an Azure SQL database using automated database backups.
SQL Server (running in a VM)
Replicate the database. Use SQL Server Always On Availability Groups to replicate the database. This provides
high availability if one SQL Server instance fails. For more information, see Run Windows VMs for an N-tier
application
Back up the database. If you are already using Azure Backup to back up your VMs, consider using Azure
Backup for SQL Server workloads using DPM. With this approach, there is one backup administrator role for the
organization and a unified recovery procedure for VMs and SQL Server. Otherwise, use SQL Server Managed
Backup to Microsoft Azure.
Traffic Manager
Perform manual failback. After a Traffic Manager failover, perform manual failback, rather than automatically
failing back. Before failing back, verify that all application subsystems are healthy. Otherwise, you can create a
situation where the application flips back and forth between data centers. For more information, see Run VMs in
multiple regions for high availability.
Create a health probe endpoint. Create a custom endpoint that reports on the overall health of the
application. This enables Traffic Manager to fail over if any critical path fails, not just the front end. The endpoint
should return an HTTP error code if any critical dependency is unhealthy or unreachable. Don't report errors for
non-critical services, however. Otherwise, the health probe might trigger failover when it's not needed, creating
false positives. For more information, see Traffic Manager endpoint monitoring and failover.
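A minimal sketch of such an endpoint in Python using Flask. The dependency checks are placeholders for whatever sits on the application's critical path; non-critical services are deliberately left out of the probe.

    from flask import Flask

    app = Flask(__name__)

    def check_database() -> bool:
        # Placeholder: run a cheap query against the primary data store.
        return True

    def check_queue() -> bool:
        # Placeholder: verify the work queue is reachable.
        return True

    @app.route("/health")
    def health():
        if check_database() and check_queue():
            return "OK", 200
        # Any non-success status tells Traffic Manager this endpoint is unhealthy.
        return "Unhealthy", 503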
Virtual Machines
Avoid running a production workload on a single VM. A single VM deployment is not resilient to planned
or unplanned maintenance. Instead, put multiple VMs in an availability set or VM scale set, with a load balancer
in front.
Specify an availability set when you provision the VM. Currently, there is no way to add a VM to an
availability set after the VM is provisioned. When you add a new VM to an existing availability set, make sure to
create a NIC for the VM, and add the NIC to the back-end address pool on the load balancer. Otherwise, the load
balancer won't route network traffic to that VM.
Put each application tier into a separate Availability Set. In an N-tier application, don't put VMs from
different tiers into the same availability set. VMs in an availability set are placed across fault domains (FDs) and
update domains (UDs). However, to get the redundancy benefit of FDs and UDs, every VM in the availability set
must be able to handle the same client requests.
Choose the right VM size based on performance requirements. When moving an existing workload to
Azure, start with the VM size that's the closest match to your on-premises servers. Then measure the
performance of your actual workload with respect to CPU, memory, and disk IOPS, and adjust the size if needed.
This helps to ensure the application behaves as expected in a cloud environment. Also, if you need multiple NICs,
be aware of the NIC limit for each size.
Use Managed Disks for VHDs. Managed Disks provide better reliability for VMs in an availability set, because
the disks are sufficiently isolated from each other to avoid single points of failure. Also, Managed Disks aren't
subject to the IOPS limits of VHDs created in a storage account. For more information, see Manage the
availability of Windows virtual machines in Azure.
Install applications on a data disk, not the OS disk. Otherwise, you may reach the disk size limit.
Use Azure Backup to back up VMs. Backups protect against accidental data loss. For more information, see
Protect Azure VMs with a recovery services vault.
Enable diagnostic logs, including basic health metrics, infrastructure logs, and boot diagnostics. Boot
diagnostics can help you diagnose a boot failure if your VM gets into a non-bootable state. For more
information, see Overview of Azure Diagnostic Logs.
Use the AzureLogCollector extension. (Windows VMs only.) This extension aggregates Azure platform logs
and uploads them to Azure storage, without the operator remotely logging into the VM. For more information,
see AzureLogCollector Extension.
Virtual Network
To whitelist or block public IP addresses, add an NSG to the subnet. Block access from malicious users, or
allow access only from users who have privilege to access the application.
Create a custom health probe. Load Balancer Health Probes can test either HTTP or TCP. If a VM runs an HTTP
server, the HTTP probe is a better indicator of health status than a TCP probe. For an HTTP probe, use a custom
endpoint that reports the overall health of the application, including all critical dependencies. For more
information, see Azure Load Balancer overview.
Don't block the health probe. The Load Balancer Health probe is sent from a known IP address,
168.63.129.16. Don't block traffic to or from this IP in any firewall policies or network security group (NSG)
rules. Blocking the health probe would cause the load balancer to remove the VM from rotation.
Enable Load Balancer logging. The logs show how many VMs on the back-end are not receiving network
traffic due to failed probe responses. For more information, see Log analytics for Azure Load Balancer.
Scalability checklist
Service design
Partition the workload. Design parts of the process to be discrete and decomposable. Minimize the size of
each part, while following the usual rules for separation of concerns and the single responsibility principle. This
allows the component parts to be distributed in a way that maximizes use of each compute unit (such as a role
or database server). It also makes it easier to scale the application by adding instances of specific resources. For
more information, see Compute Partitioning Guidance.
Design for scaling. Scaling allows applications to react to variable load by increasing and decreasing the
number of instances of roles, queues, and other services they use. However, the application must be designed
with this in mind. For example, the application and the services it uses must be stateless, to allow requests to be
routed to any instance. This also prevents the addition or removal of specific instances from adversely
impacting current users. You should also implement configuration or auto-detection of instances as they are
added and removed, so that code in the application can perform the necessary routing. For example, a web
application might use a set of queues in a round-robin approach to route requests to background services
running in worker roles. The web application must be able to detect changes in the number of queues, to
successfully route requests and balance the load on the application.
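A hedged sketch of that round-robin routing in Python: requests are spread across however many work queues currently exist, so instances can be added or removed without code changes. The list_work_queues function is a placeholder for whatever discovery mechanism (configuration, a naming convention, or a management API) the application uses.

    import itertools

    def list_work_queues() -> list[str]:
        # Placeholder: return the current set of queue names, e.g. from configuration.
        return ["work-0", "work-1", "work-2"]

    _counter = itertools.count()

    def pick_queue() -> str:
        queues = list_work_queues()          # re-read so scale-out or scale-in is picked up
        return queues[next(_counter) % len(queues)]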
Scale as a unit. Plan for additional resources to accommodate growth. For each resource, know the upper
scaling limits, and use sharding or decomposition to go beyond these limits. Determine the scale units for the
system in terms of well-defined sets of resources. This makes applying scale-out operations easier, and less
prone to negative impact on the application through limitations imposed by lack of resources in some part of
the overall system. For example, adding x number of web and worker roles might require y number of
additional queues and z number of storage accounts to handle the additional workload generated by the roles.
So a scale unit could consist of x web and worker roles, y queues, and z storage accounts. Design the application
so that it's easily scaled by adding one or more scale units.
Avoid client affinity. Where possible, ensure that the application does not require affinity. Requests can thus
be routed to any instance, and the number of instances is irrelevant. This also avoids the overhead of storing,
retrieving, and maintaining state information for each user.
Take advantage of platform autoscaling features. Where the hosting platform supports an autoscaling
capability, such as Azure Autoscale, prefer it to custom or third-party mechanisms unless the built-in
mechanism can't fulfill your requirements. Use scheduled scaling rules where possible to ensure resources are
available without a start-up delay, but add reactive autoscaling to the rules where appropriate to cope with
unexpected changes in demand. You can use the autoscaling operations in the Service Management API to
adjust autoscaling, and to add custom counters to rules. For more information, see Auto-scaling guidance.
Offload intensive CPU/IO tasks as background tasks. If a request to a service is expected to take a long time
to run or absorb considerable resources, offload the processing for this request to a separate task. Use worker
roles or background jobs (depending on the hosting platform) to execute these tasks. This strategy enables the
service to continue receiving further requests and remain responsive. For more information, see Background
jobs guidance.
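A minimal Python sketch of the request side of this pattern: the front end only records the job on a queue and returns an identifier, while separate worker instances do the heavy processing. The in-memory queue.Queue stands in for a durable queue service shared between the front end and the workers.

    import json
    import queue
    import uuid

    work_queue: "queue.Queue[str]" = queue.Queue()

    def submit_report_request(parameters: dict) -> str:
        job_id = str(uuid.uuid4())
        work_queue.put(json.dumps({"job_id": job_id, "params": parameters}))
        return job_id          # respond immediately; a worker picks the job up later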
Distribute the workload for background tasks. Where there are many background tasks, or the tasks
require considerable time or resources, spread the work across multiple compute units (such as worker roles or
background jobs). For one possible solution, see the Competing Consumers Pattern.
Consider moving towards a shared-nothing architecture. A shared-nothing architecture uses independent,
self-sufficient nodes that have no single point of contention (such as shared services or storage). In theory, such
a system can scale almost indefinitely. While a fully shared-nothing approach is generally not practical for most
applications, it may provide opportunities to design for better scalability. For example, avoiding server-side
session state and client affinity, and partitioning data, are all steps towards a shared-nothing architecture.
Data management
Use data partitioning. Divide the data across multiple databases and database servers, or design the
application to use data storage services that can provide this partitioning transparently (examples include Azure
SQL Database Elastic Database and Azure Table storage). This approach can help to maximize performance and
allow easier scaling. There are different partitioning techniques, such as horizontal, vertical, and functional. You
can use a combination of these to achieve maximum benefit from increased query performance, simpler
scalability, more flexible management, better availability, and to match the type of store to the data it will hold.
Also, consider using different types of data store for different types of data, choosing the types based on how
well they are optimized for the specific type of data. This may include using table storage, a document database,
or a column-family data store, instead of, or as well as, a relational database. For more information, see Data
partitioning guidance.
Design for eventual consistency. Eventual consistency improves scalability by reducing or removing the time
needed to synchronize related data partitioned across multiple stores. The cost is that data is not always
consistent when it is read, and some write operations may cause conflicts. Eventual consistency is ideal for
situations where the same data is read frequently but written infrequently. For more information, see the Data
Consistency Primer.
Reduce chatty interactions between components and services. Avoid designing interactions in which an
application is required to make multiple calls to a service (each of which returns a small amount of data), rather
than a single call that can return all of the data. Where possible, combine several related operations into a single
request when the call is to a service or component that has noticeable latency. This makes it easier to monitor
performance and optimize complex operations. For example, use stored procedures in databases to encapsulate
complex logic, and reduce the number of round trips and resource locking.
Use queues to level the load for high velocity data writes. Surges in demand for a service can overwhelm
that service and cause escalating failures. To prevent this, consider implementing the Queue-Based Load
Leveling Pattern. Use a queue that acts as a buffer between a task and a service that it invokes. This can smooth
intermittent heavy loads that may otherwise cause the service to fail or the task to time out.
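A hedged Python sketch of the consumer side of queue-based load leveling: producers enqueue at whatever rate traffic arrives, while the consumer drains the buffer at a pace the downstream service can sustain, so bursts are smoothed rather than cascading into failures. The ten-calls-per-second ceiling is illustrative.

    import queue
    import time

    buffer: "queue.Queue[str]" = queue.Queue()

    def consumer(call_service) -> None:
        while True:
            item = buffer.get()
            call_service(item)       # invoke the protected downstream service
            buffer.task_done()
            time.sleep(0.1)          # cap the drain rate at roughly 10 calls per second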
Minimize the load on the data store. The data store is commonly a processing bottleneck, a costly resource,
and often not easy to scale out. Where possible, remove logic (such as processing XML documents or JSON
objects) from the data store, and perform processing within the application. For example, instead of passing
XML to the database (other than as an opaque string for storage), serialize or deserialize the XML within the
application layer and pass it in a form that is native to the data store. It's typically much easier to scale out the
application than the data store, so you should attempt to do as much of the compute-intensive processing as
possible within the application.
Minimize the volume of data retrieved. Retrieve only the data you require by specifying columns and using
criteria to select rows. Make use of table value parameters and the appropriate isolation level. Use mechanisms
like entity tags to avoid retrieving data unnecessarily.
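A minimal sketch of the entity-tag technique over HTTP in Python: the client remembers the ETag from the last response and sends it in If-None-Match, and a 304 reply means the cached copy is still valid and no body is transferred. The URL is a placeholder.

    import requests

    cached_body = None
    cached_etag = None

    def fetch(url: str) -> bytes:
        global cached_body, cached_etag
        headers = {"If-None-Match": cached_etag} if cached_etag else {}
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 304:
            return cached_body                 # unchanged: reuse the local copy
        cached_body = response.content
        cached_etag = response.headers.get("ETag")
        return cached_body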
Aggressively use caching. Use caching wherever possible to reduce the load on resources and services that
generate or deliver data. Caching is typically suited to data that is relatively static, or that requires considerable
processing to obtain. Caching should occur at all levels where appropriate in each layer of the application,
including data access and user interface generation. For more information, see the Caching Guidance.
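A minimal sketch of application-layer caching in Python. A small time-to-live keeps relatively static data out of the data store on most requests; in a multi-instance deployment a shared cache such as Azure Cache for Redis would replace this in-process dictionary.

    import time

    _cache: dict = {}
    TTL_SECONDS = 300

    def get_product(product_id: str, load_from_store):
        now = time.monotonic()
        hit = _cache.get(product_id)
        if hit and now - hit[0] < TTL_SECONDS:
            return hit[1]                       # served from cache
        value = load_from_store(product_id)     # hit the data store only on miss or expiry
        _cache[product_id] = (now, value)
        return value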
Handle data growth and retention. The amount of data stored by an application grows over time. This
growth increases storage costs and increases latency when accessing the data, which affects application
throughput and performance. It may be possible to periodically archive some of the old data that is no longer
accessed, or move data that is rarely accessed into long-term storage that is more cost efficient, even if the
access latency is higher.
Optimize Data Transfer Objects (DTOs) using an efficient binary format. DTOs are passed between the
layers of an application many times. Minimizing the size reduces the load on resources and the network.
However, balance the savings with the overhead of converting the data to the required format in each location
where it is used. Adopt a format that has the maximum interoperability to enable easy reuse of a component.
Set cache control. Design and configure the application to use output caching or fragment caching where
possible, to minimize processing load.
Enable client side caching. Web applications should enable cache settings on the content that can be cached.
This is commonly disabled by default. Configure the server to deliver the appropriate cache control headers to
enable caching of content on proxy servers and clients.
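A minimal sketch, again using Flask, of emitting cache-control headers so browsers and proxies can cache content that rarely changes; the one-hour max-age value is illustrative.

    from flask import Flask, make_response

    app = Flask(__name__)

    def load_catalog_page() -> str:
        # Placeholder: render or load the mostly-static content.
        return "<html>...</html>"

    @app.route("/catalog")
    def catalog():
        response = make_response(load_catalog_page())
        # Allow clients and intermediate proxies to cache this response for an hour.
        response.headers["Cache-Control"] = "public, max-age=3600"
        return response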
Use Azure blob storage and the Azure Content Delivery Network to reduce the load on the
application. Consider storing static or relatively static public content, such as images, resources, scripts, and
style sheets, in blob storage. This approach relieves the application of the load caused by dynamically
generating this content for each request. Additionally, consider using the Content Delivery Network to cache
this content and deliver it to clients. Using the Content Delivery Network can improve performance at the client
because the content is delivered from the geographically closest datacenter that contains a Content Delivery
Network cache. For more information, see the Content Delivery Network Guidance.
Optimize and tune SQL queries and indexes. Some T-SQL statements or constructs may have an impact on
performance that can be reduced by optimizing the code in a stored procedure. For example, avoid converting
datetime types to a varchar before comparing with a datetime literal value. Use date/time comparison
functions instead. Lack of appropriate indexes can also slow query execution. If you use an object/relational
mapping framework, understand how it works and how it may affect performance of the data access layer. For
more information, see Query Tuning.
Consider de-normalizing data. Data normalization helps to avoid duplication and inconsistency. However,
maintaining multiple indexes, checking for referential integrity, performing multiple accesses to small chunks of
data, and joining tables to reassemble the data imposes an overhead that can affect performance. Consider if
some additional storage volume and duplication is acceptable in order to reduce the load on the data store.
Also, consider if the application itself (which is typically easier to scale) can be relied upon to take over tasks
such as managing referential integrity in order to reduce the load on the data store. For more information, see
Data partitioning guidance.
Service implementation
Use asynchronous calls. Use asynchronous code wherever possible when accessing resources or services that
may be limited by I/O or network bandwidth, or that have a noticeable latency, in order to avoid locking the
calling thread. To implement asynchronous operations, use the Task-based Asynchronous Pattern (TAP).
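A minimal sketch of the same pattern in Python (the counterpart of the .NET TAP mentioned above) using asyncio: I/O-bound calls are awaited rather than blocking a thread, and independent calls are issued concurrently. The aiohttp package is assumed to be available, and the URLs are placeholders.

    import asyncio
    import aiohttp

    async def fetch(session: aiohttp.ClientSession, url: str) -> str:
        async with session.get(url) as response:
            return await response.text()

    async def load_dashboard() -> list:
        async with aiohttp.ClientSession() as session:
            # Both requests are in flight at the same time; neither blocks a thread.
            return await asyncio.gather(
                fetch(session, "https://service-a.example.com/summary"),
                fetch(session, "https://service-b.example.com/status"),
            )

    if __name__ == "__main__":
        results = asyncio.run(load_dashboard())
        print(results)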
Avoid locking resources, and use an optimistic approach instead. Never lock access to resources such as
storage or other services that have noticeable latency, because this is a primary cause of poor performance.
Always use optimistic approaches to managing concurrent operations, such as writing to storage. Use features
of the storage layer to manage conflicts. In distributed applications, data may be only eventually consistent.
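A hedged sketch of optimistic concurrency against an HTTP/REST store in Python (Azure Storage and many REST APIs behave this way): read the entity and its ETag, then write back with If-Match so the update is rejected if another writer got there first, instead of holding any lock. The URL is a placeholder.

    import requests

    def update_price(url: str, new_price: float) -> bool:
        current = requests.get(url, timeout=10)
        etag = current.headers["ETag"]
        body = current.json()
        body["price"] = new_price

        result = requests.put(url, json=body,
                              headers={"If-Match": etag}, timeout=10)
        if result.status_code == 412:        # precondition failed: another writer won the race
            return False                     # caller can re-read and retry
        result.raise_for_status()
        return True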
Compress highly compressible data over high latency, low bandwidth networks. In the majority of cases
in a web application, the largest volume of data generated by the application and passed over the network is
HTTP responses to client requests. HTTP compression can reduce this considerably, especially for static content.
This can reduce cost as well as reducing the load on the network, though compressing dynamic content does
apply a fractionally higher load on the server. In other, more generalized environments, data compression can
reduce the volume of data transmitted and minimize transfer time and costs, but the compression and
decompression processes incur overhead. As such, compression should only be used when there is a
demonstrable gain in performance. Other serialization methods, such as JSON or binary encodings, may reduce
the payload size while having less impact on performance, whereas XML is likely to increase it.
Minimize the time that connections and resources are in use. Maintain connections and resources only for
as long as you need to use them. For example, open connections as late as possible, and allow them to be
returned to the connection pool as soon as possible. Acquire resources as late as possible, and dispose of them
as soon as possible.
Minimize the number of connections required. Service connections absorb resources. Limit the number
that are required and ensure that existing connections are reused whenever possible. For example, after
performing authentication, use impersonation where appropriate to run code as a specific identity. This can
help to make best use of the connection pool by reusing connections.
NOTE: APIs for some services automatically reuse connections, provided service-specific guidelines are followed. It's
important that you understand the conditions that enable connection reuse for each service that your application
uses.
Send requests in batches to optimize network use. For example, send and read messages in batches when
accessing a queue, and perform multiple reads or writes as a batch when accessing storage or a cache. This can
help to maximize efficiency of the services and data stores by reducing the number of calls across the network.
Avoid a requirement to store server-side session state where possible. Server-side session state
management typically requires client affinity (that is, routing each request to the same server instance), which
affects the ability of the system to scale. Ideally, you should design clients to be stateless with respect to the
servers that they use. However, if the application must maintain session state, store sensitive data or large
volumes of per-client data in a distributed server-side cache that all instances of the application can access.
Optimize table storage schemas. When using table stores that require the table and column names to be
passed and processed with every query, such as Azure table storage, consider using shorter names to reduce
this overhead. However, do not sacrifice readability or manageability by using overly compact names.
Use the Task Parallel Library (TPL) to perform asynchronous operations. The TPL makes it easy to write
asynchronous code that performs I/O-bound operations. Use ConfigureAwait(false) wherever possible to
eliminate the dependency of a continuation on a specific synchronization context. This reduces the chances of
thread-deadlock occurring.
Create resource dependencies during deployment or at application startup. Avoid repeated calls to
methods that test the existence of a resource and then create the resource if it does not exist. (Methods such as
CloudTable.CreateIfNotExists and CloudQueue.CreateIfNotExists in the Azure Storage Client Library follow this
pattern). These methods can impose considerable overhead if they are invoked before each access to a storage
table or storage queue. Instead:
Create the required resources when the application is deployed, or when it first starts (a single call to
CreateIfNotExists for each resource in the startup code for a web or worker role is acceptable). However,
be sure to handle exceptions that may arise if your code attempts to access a resource that doesn't exist.
In these situations, you should log the exception, and possibly alert an operator that a resource is
missing.
Under some circumstances, it may be appropriate to create the missing resource as part of the exception
handling code. But you should adopt this approach with caution as the non-existence of the resource
might be indicative of a programming error (a misspelled resource name for example), or some other
infrastructure-level issue.
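A hedged sketch of this guidance in Python, assuming the azure-storage-queue package: create the queue once at startup instead of calling a create-if-not-exists method before every access, and treat a missing resource at runtime as an error to log rather than something to silently recreate. The connection string and queue name are placeholders.

    import logging
    from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
    from azure.storage.queue import QueueClient

    logger = logging.getLogger("startup")
    client = QueueClient.from_connection_string("<conn-string>", "orders")

    def on_startup() -> None:
        try:
            client.create_queue()            # a single call when the application starts
        except ResourceExistsError:
            pass                             # already provisioned; nothing to do

    def enqueue(message: str) -> None:
        try:
            client.send_message(message)
        except ResourceNotFoundError:
            # Don't recreate here: a missing queue usually signals a deployment or
            # configuration problem that an operator should investigate.
            logger.exception("work queue is missing")
            raise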
Use lightweight frameworks. Carefully choose the APIs and frameworks you use to minimize resource usage,
execution time, and overall load on the application. For example, using Web API to handle service requests can
reduce the application footprint and increase execution speed, but it may not be suitable for advanced scenarios
where the additional capabilities of Windows Communication Foundation are required.
Consider minimizing the number of service accounts. For example, use a specific account to access
resources or services that impose a limit on connections, or perform better where fewer connections are
maintained. This approach is common for services such as databases, but it can affect the ability to accurately
audit operations due to the impersonation of the original user.
Carry out performance profiling and load testing during development, as part of test routines, and before
final release to ensure the application performs and scales as required. This testing should occur on the same
type of hardware as the production platform, and with the same types and quantities of data and user load as it
will encounter in production. For more information, see Testing the performance of a cloud service.