
Windows Azure and Java: Working with Blob Storage

By Satish Nikam, Senior Architect, CTO Organisation. Posted June 22, 2012 in Technology Frontier.

Windows Azure Blobs are part of the Windows Azure Storage service, along with Queues and Tables. Windows Azure
Blob Storage can store large amounts of data such as videos, audio, and images. Data stored in Blob storage can be
exposed publicly or privately and can be accessed from anywhere via HTTP or HTTPS. A single blob can store up to
200GB (or 1TB), depending on type. A storage account can have up to 100TB of blobs. Data stored in Windows Azure
Storage is durable, meaning storage is triple-replicated within the datacenter, providing resiliency to hardware failures.
Blobs are also, by default, replicated to another sub-region, which provides a high degree of disaster recovery.
Blob Storage can be accessed using the Windows Azure SDK for Java, which is a wrapper over the REST API and provides a way to work with containers and blobs.
Here we will demonstrate the use of the Windows Azure Blob Storage service from a Java application. Blob Storage can be accessed from a Java application running locally or within Windows Azure worker and web role instances. We recently published CloudNinja for Java to GitHub, a reference application illustrating how to build multi-tenant, Java-based applications for Windows Azure. CloudNinja for Java uses Windows Azure Blob Storage for storing Tomcat access logs and tenant logo files.
Here we will discuss the following operations on Blob Storage:
Create and delete a blob container
Create and delete blobs inside a container
Verify the integrity of the blob content
Lease blobs
Create and delete blob snapshots
Set Access Control Levels (ACLs) on blobs and containers
List blobs in a container
Create a directory structure of blobs and containers
Use Shared Access Signatures on containers

PREREQUISITES
The prerequisites for using Windows Azure Blob Storage service from a Java application are:
Windows Azure Libraries for Java
Windows Azure SDK

Java Development Kit (JDK)

CREATING A JAVA APPLICATION TO ACCESS BLOB STORAGE


We add the following import statements to the Java classes that we use to access Blob Storage.

// Import the following to use the blob APIs
import java.io.*;
import java.net.*;
import java.util.*;
import com.microsoft.windowsazure.services.blob.client.*;
import com.microsoft.windowsazure.services.core.storage.*;
import java.security.InvalidKeyException;

Retrieving a Storage Account


A storage account is required to create a blob client, which is used to perform various operations on Blob Storage. To
retrieve a storage account, initialize an object of the CloudStorageAccount class. The initialized object represents the
storage account. We can initialize CloudStorageAccount using a Windows Azure Storage account or an emulated
storage account (Storage Emulator account).
RETRIEVING A WINDOWS AZURE STORAGE ACCOUNT
We first need to retrieve the cloud storage account using the CloudStorageAccount class. The cloud storage account
can be retrieved by parsing the connection string using the CloudStorageAccount.parse method. The connection string
consists of the default endpoint protocol, storage account name, and storage account key.
Here is sample code for retrieving the cloud storage account.

// Define the connection string with your values
public static final String storageConnectionString =
    "DefaultEndpointsProtocol=http;" +
    "AccountName=your_storage_account;" +
    "AccountKey=your_storage_account_key";

// Retrieve the storage account from the connection string
CloudStorageAccount storageAccount =
    CloudStorageAccount.parse(storageConnectionString);
In this code, the storage account is specified as AccountName and the primary access key of the storage account is
specified as AccountKey. The primary access key is listed in the Windows Azure management portal.
WORKING WITH STORAGE ACCOUNT LOCAL EMULATOR

Windows Azure SDK provides a Storage Emulator that emulates Windows Azure Storage, and is backed by a local SQL
Server instance (SQL Express, by default). While the storage emulator is fine for development, it differs from Windows
Azure Storage. Please see this MSDN article for details about specific differences.
The code below retrieves the emulated storage account. Before running the following code, ensure that Storage Emulator
is up and running.

CloudStorageAccount storageAccount =
CloudStorageAccount.getDevelopmentStorageAccount();
While developing an application, the CloudStorageAccount.getDevelopmentStorageAccount method can be used to access the emulated storage account. This is particularly useful if the developer does not have access to a Windows Azure Storage account. However, you should not use this method in code that you deploy to Windows Azure, because the development storage account is not available in Windows Azure.
An alternative approach to accessing the local emulator storage account is to access it just like you would access a real
storage account, with a storage account name and key in your configuration file. The emulator account has a special
account name and key:
Account name: devstoreaccount1
Account key:
Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
You can place these in the local configuration file, and place your real credentials in the cloud configuration file, allowing
you to easily run code against either account without changing any code.
Development storage account details are documented in this MSDN article.
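For example, here is a minimal sketch of an emulator connection string built from these credentials (the endpoint shown is the emulator's default local blob address; adjust it if your emulator is configured differently):

// Connection string for the local Storage Emulator account.
// 127.0.0.1:10000 is the emulator's default blob endpoint.
public static final String emulatorConnectionString =
    "DefaultEndpointsProtocol=http;" +
    "AccountName=devstoreaccount1;" +
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;" +
    "BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1";

// Parsed exactly like a real storage account connection string.
CloudStorageAccount emulatorAccount =
    CloudStorageAccount.parse(emulatorConnectionString);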

PERFORMING OPERATIONS ON BLOB STORAGE FROM A JAVA APPLICATION


To access the Blob Storage service, a blob client is required. We use the CloudBlobClient class to get the reference to
blobs and containers. We initialize blobClient, the object of CloudBlobClient, using the CloudStorageAccount class.
Here is the sample code to create a blob client.

CloudBlobClient blobClient = storageAccount.createCloudBlobClient();

blobClient is used to perform various operations on Blob Storage.

HOW TO CREATE A BLOB CONTAINER


A blob container is necessary to create and store a blob, and it facilitates organization of blobs. The container has its own
metadata and properties. To create a blob container, we initialize an object of the CloudBlobContainer class by getting
the reference of a container with the help of blobClient.

CloudBlobContainer blobContainer =
    blobClient.getContainerReference("container-name");
blobContainer.createIfNotExist();
Create a blob container using the createIfNotExist method, which checks whether a container with the same name already exists. The method creates the blob container only if no container with the same name exists; otherwise, no operation is performed.
It is better to use the createIfNotExist method instead of the create method, as create throws a StorageException if the specified container name already exists.
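If you do use create, a minimal sketch of handling that exception might look like this:

try {
    // create throws a StorageException if a container with this name already exists.
    blobContainer.create();
} catch (StorageException e) {
    // The container already exists, or another storage error occurred.
    e.printStackTrace();
}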

BLOB AND BLOB CONTAINER NAMING CONVENTIONS


Blob container names must be lowercase and may contain letters, numbers, and dashes, while blob names are case-sensitive. For complete naming rules, please view this article.

HOW TO DELETE A BLOB CONTAINER


Using the same approach that we specified for creating a blob container, a blob container can be deleted as well.

// Deletes the container; throws a StorageException if it does not exist.
blobContainer.delete();
// Deletes the container only if it exists.
blobContainer.deleteIfExists();
HOW TO CHECK THE EXISTENCE OF A CONTAINER
If a container exists, the following code returns true.

// Check if the container exists or not.


boolean containerExists = blobContainer.exists();
HOW TO CREATE A BLOB

Blobs are managed inside a blob container. By using CloudBlobContainer, we get the reference of the blobs available in the specific container.
Blobs are of two types:
Block blobs: Block blobs are optimized for streaming and can be a maximum of 200 GB in size. Block blobs consist of blocks, identified by block IDs. Each block can be a different size, but no larger than 4 MB. Block blobs are used where random access to the data is not required, for example streaming video or a JPEG file.
Page blobs: Page blobs are a collection of 512-byte pages optimized for random read and write operations. They provide the ability to write to a specific range of bytes and can be a maximum of 1 TB in size. Typical usage scenarios for page blobs include virtual hard drives (VHDs) and files with range-based updates (updating just the parts of the blob that have changed using ranged writes).

For more information on block blobs and page blobs, you can visit Understanding Block Blobs and Page Blobs.
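The examples that follow use block blobs. For illustration, a page blob can be created in a similar way; here is a minimal sketch, assuming the blobContainer reference from above and a hypothetical blob name, and noting that a page blob's size must be a multiple of 512 bytes:

// Get a reference to a page blob (the name "data.vhd" is just an example).
CloudPageBlob pageBlob = blobContainer.getPageBlobReference("data.vhd");
// Create the page blob with a fixed size, which must be a multiple of 512 bytes.
pageBlob.create(512 * 1024);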
We create a block blob for an image file using the following code. If a blob with the same name already exists, the code overwrites it.

// Getting a reference to a block blob.
CloudBlob blockBlob = blobContainer.getBlockBlobReference("image.gif");
blockBlob.getProperties().setContentType("image/gif");
// Uploading a stream to the blob.
InputStream stream = new FileInputStream("C:\\image.gif");
blockBlob.upload(stream, stream.available());
HOW TO DELETE A BLOB
Using the same approach that we specified for creating a blob, a blob can be deleted as well.

// Deletes the blob; throws a StorageException if it does not exist.
blob.delete();
// Deletes the blob only if it exists.
blob.deleteIfExists();
HOW TO VERIFY INTEGRITY OF THE BLOB CONTENT
Data transferred over a network may encounter errors. While uploading or downloading data to or from the cloud, the data may get corrupted due to network behavior or other intermittent issues.
To reduce the risk of processing corrupt data, Windows Azure Blob Storage supports MD5 hashing. This hashing ensures end-to-end data integrity.
While uploading a blob, we calculate the Base64-encoded MD5 hash of the blob content. This encoded hash is also sent in the request header and is used to verify the end-to-end integrity of the data being uploaded. If the blob content and its MD5 hash don't match, the upload operation fails: the blob is not uploaded, and the operation throws a StorageException.

String blobContent = "This is the blob content.";
byte[] blobContentBytes = blobContent.getBytes();
// Generating the MD5 hash of the blob content.
MessageDigest md = MessageDigest.getInstance("MD5");
md.reset();
md.update(blobContentBytes);
// Encode the MD5 digest using Base64 encoding.
String base64EncodedMD5content = Base64.encode(md.digest());
// Initialize the blob properties and assign the generated MD5 value.
BlobProperties blobProperties = blob.getProperties();
blobProperties.setContentMD5(base64EncodedMD5content);
// Upload the blob content along with the ContentMD5 property.
// The server verifies the uploaded content against ContentMD5 and,
// if they do not match, throws an exception.
InputStream stream = new ByteArrayInputStream(blobContentBytes);
try {
    // If the integrity check fails, this throws a StorageException.
    blob.upload(stream, stream.available());
} catch (StorageException storageException) {
    storageException.printStackTrace();
}
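For completeness, here is a minimal sketch of verifying integrity on the download side by recomputing the MD5 hash of the downloaded bytes and comparing it with the blob's ContentMD5 property (this assumes the blob's properties are populated by the download call):

// Download the blob content into memory.
ByteArrayOutputStream downloadStream = new ByteArrayOutputStream();
blob.download(downloadStream);
// Recompute the MD5 hash of the downloaded bytes.
MessageDigest digest = MessageDigest.getInstance("MD5");
String downloadedMD5 = Base64.encode(digest.digest(downloadStream.toByteArray()));
// Compare against the ContentMD5 property stored with the blob.
boolean contentIntact = downloadedMD5.equals(blob.getProperties().getContentMD5());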
LEASING BLOBS
Leasing means acquiring a lock on a blob. No other lease can be acquired for that blob until the existing lease is released. This is particularly useful in a multi-threaded or multi-role-instance scenario. Consider a process, such as a scheduler, that should be run by only one instance. This can be ensured by making use of a lease: the instance that succeeds in acquiring the lease runs the scheduler, and the other instances do not, as they fail to acquire a lease. Thus leases can be used to manage concurrency.
Another use of a lease is to maintain consistency of blob contents. For example, suppose a blob contains a counter that should be updated by only one instance in a multi-role-instance scenario. This is achieved with the help of a lease, as it provides exclusive write access to the blob. The instance that succeeds in acquiring the lease updates the counter.
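For illustration, here is a minimal sketch of that single-runner pattern, using the acquireLease method described in the next section (the lock blob is assumed to exist already, and runScheduler is a hypothetical method standing in for the protected work):

CloudBlob lockBlob = blobContainer.getBlockBlobReference("scheduler-lock");
try {
    // Only one instance will succeed in acquiring the lease.
    String schedulerLeaseId = lockBlob.acquireLease();
    // This instance holds the lease, so it runs the protected work.
    runScheduler();   // hypothetical method
} catch (StorageException e) {
    // Another instance already holds the lease; skip the work.
}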

HOW TO LEASE BLOBS


The Lease Blob operation establishes and manages a lock on a blob to obtain exclusive write access to the blob and its metadata or properties. A lease can be acquired for 15 to 60 seconds, or for an infinite period.
The Lease Blob operation can be called in one of four modes:
Acquire
Renew
Release
Break
ACQUIRE LEASE

Acquire lease requests a new lease. The method blob.acquireLease() returns a lease ID. Using this ID, we can modify, renew, and release the lease.

// Lease Blob and acquire lock. It returns the lease ID


String leaseId = blob.acquireLease();
RENEW LEASE
Renew lease renews an existing lease for an additional 60 seconds, continuing protected write access to the blob and its data.

// Setting the lease ID into Access Conditions


AccessCondition accessCondition = new AccessCondition();
accessCondition.setLeaseID(leaseId);
// Renew lease
blob.renewLease(accessCondition);
RELEASE LEASE
Release lease frees a lease when it is no longer needed, so that another client may immediately acquire a lease on the blob. It requires the lease ID.

blob.releaseLease(accessCondition);
BREAK LEASE
Break lease ends a lease but ensures that another client cannot acquire a new lease on the blob until the current lease period has expired. Breaking a lease leaves the blob without a lease holder for the remainder of the lease period; no client, including the previous holder, can acquire a new lease until that period expires.

// Break Lease
blob.breakLease();
HOW TO CREATE A BLOB SNAPSHOT
Sometimes an application may corrupt blob content while processing it, for example when a runtime exception occurs before the operations complete. In this situation, we may need to restore the blob to its original content, which can be done by creating a blob snapshot beforehand.
Another use of blob snapshots is creating backups of blobs. The name of a snapshot consists of the base blob name followed by a DateTime value as a suffix; the DateTime value indicates the time at which the snapshot was taken. Blob snapshots are read-only. They can be read, copied, and deleted, but never modified.
Upon creation, snapshots have no associated cost. However, as committed blocks (or pages) are replaced in the base
blob, storage costs begin to accrue as the base blob diverges from the snapshot.
More details about snapshots may be found here. Snapshot billing details are here.
The createSnapshot method creates a read-only snapshot of a blob.

// Create a snapshot
CloudBlob snapshotBlob = blob.createSnapshot();
A unique ID is associated with each snapshot blob. Generally, the timestamp of the snapshot blob is the ID. It appears as a string, for example 2012-03-26T14:16:18.0174890Z.
// Get the snapshot ID

String snapshotID = snapshotBlob.getSnapshotID();


HOW TO DELETE A BLOB SNAPSHOT
When we delete a snapshot blob, only the snapshot of the original blob is deleted. The original blob is not deleted. If we
have the reference of the snapshot blob, we delete the snapshot using the following code.

// Deleting the snapshot


boolean snapshotDeleted = snapshotBlob.deleteIfExists();
LISTING THE BLOB SNAPSHOTS
There can be scenarios where you need to restore the contents of a blob from a snapshot taken in a particular DateTime range, for example 12 July 2012 between 4 and 5 pm. In such a case you need to iterate through the snapshot list, retrieve each URI, and parse it for the DateTime value. Once found, restore from the snapshot by copying the snapshot over the base blob (a sketch of the copy step follows the listing code below).
The following code lists the existing blob snapshots along with the original blobs in the container.

for (ListBlobItem item : blobContainer.listBlobs("", true,
        EnumSet.of(BlobListingDetails.SNAPSHOTS), null, null)) {
    URI snapshotURI = item.getUri();
}
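For illustration, here is a minimal sketch of the restore step. It assumes that snapshotBlob is the snapshot you selected from the listing above and that the copyFromBlob method (which maps to the Copy Blob REST operation) is available in the SDK version you are using:

// Assumption: copyFromBlob issues a Copy Blob operation that overwrites the base blob
// with the snapshot's content; blob is the base blob reference used earlier.
blob.copyFromBlob(snapshotBlob);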
HOW TO SET ACCESS CONTROL LEVELS ON BLOB CONTAINERS AND BLOBS

Setting Access Control Levels (ACLs) for a blob container means setting permissions for the container. The permissions define who can access the blob container.
The access level can be public or private. Anyone can access a public container using the URLs of the blobs that are
available in the container, while a private container can only be accessed using the account credentials (or with a Shared
Access Signature, which will be covered shortly).
The BlobContainerPermissions class is used to set the permissions that are uploaded to a blob container.

BlobContainerPermissions permissions = new BlobContainerPermissions();


SETTING PRIVATE ACCESS LEVEL FOR A BLOB CONTAINER
By default, a blob container has a private access level. In some scenarios it may be necessary to change the access level from public back to private. Setting private access for a blob container restricts access to everyone except the account holder: only someone who holds the account credentials can work with the blobs inside a private container.

// Setting Private access to Container


permissions.setPublicAccess(BlobContainerPublicAccessType.OFF);
// Uploading the permissions
blobContainer.uploadPermissions(permissions);
SETTING PUBLIC ACCESS LEVEL FOR A BLOB CONTAINER
A container with public access is available to everyone who knows the container URL. All blobs under a public container are public and accessible to everyone.

Clients can read the container metadata and the blob content, and can also list the blobs within the container.

// Setting Public access to Container


permissions.setPublicAccess(BlobContainerPublicAccessType.CONTAINER);
// Uploading the permissions
blobContainer.uploadPermissions(permissions);
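Once the container is public, its blobs can be read anonymously over HTTP, with no credentials. Here is a minimal sketch using java.net.URL; the account, container, and blob names are hypothetical placeholders:

// Anonymous read of a blob in a public container; no storage credentials are required.
URL blobUrl = new URL(
    "http://your_storage_account.blob.core.windows.net/container-name/image.gif");
InputStream blobStream = blobUrl.openStream();
// ... read from the stream, then close it.
blobStream.close();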
SETTING PUBLIC ACCESS LEVEL FOR BLOBS IN A CONTAINER
The following code makes the blobs inside an otherwise private container publicly readable. Users can read the content and metadata of blobs within this container, but they cannot read the container metadata or list the blobs within the container.

// Setting blob-level public access on the container


permissions.setPublicAccess(BlobContainerPublicAccessType.BLOB);
// Uploading the permissions
blobContainer.uploadPermissions(permissions);
HOW TO USE SHARED ACCESS SIGNATURE ON BLOB CONTAINERS
A Shared Access Signature is used to make a private blob container or its blobs accessible to the public for a specific period of time. It lets us grant access rights to containers and blobs at a more granular level than simply setting a container's permission to public access.
The following sample code grants the Shared Access READ permission to a container for an hour.

// Set the container ACL to private.
BlobContainerPermissions permissions = new BlobContainerPermissions();
permissions.setPublicAccess(BlobContainerPublicAccessType.OFF);
Calendar cal = Calendar.getInstance();
cal.setTimeZone(TimeZone.getTimeZone("UTC"));
// Define the start and end time to grant permissions.
Date sharedAccessStartTime = cal.getTime();
cal.add(Calendar.HOUR, 1);
Date sharedAccessExpiryTime = cal.getTime();
// Define the shared access policy.
SharedAccessPolicy policy = new SharedAccessPolicy();
// In this sample the shared access permission is set to READ.
EnumSet<SharedAccessPermissions> perEnumSet =
    EnumSet.of(SharedAccessPermissions.READ);
policy.setPermissions(perEnumSet);
policy.setSharedAccessExpiryTime(sharedAccessExpiryTime);
policy.setSharedAccessStartTime(sharedAccessStartTime);
// Define the blob container permissions with a named policy.
HashMap<String, SharedAccessPolicy> map =
    new HashMap<String, SharedAccessPolicy>();
map.put("policy", policy);
permissions = new BlobContainerPermissions();
permissions.setSharedAccessPolicies(map);
// Uploading the permissions
blobContainer.uploadPermissions(permissions);
The Signature is generated using CloudBlobContainer as shown in the following code.

SharedAccessPolicy policy =
    blobContainer.downloadPermissions().getSharedAccessPolicies().get("policy");
String signature = blobContainer.generateSharedAccessSignature(policy);
After generating the Signature, the format of the URL to access the blobs in the container is:

http://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-name>?
<signature>
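For illustration, here is a minimal sketch of composing such a URL from a blob reference and the signature generated above (the blob name is the one used earlier in this article):

// Get a reference to a blob in the container and append the SAS query string.
CloudBlob sasBlob = blobContainer.getBlockBlobReference("image.gif");
String sasUrl = sasBlob.getUri().toString() + "?" + signature;
// sasUrl can now be handed to a client that has no storage account credentials.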
HOW TO USE SHARED ACCESS SIGNATURE ON BLOBS
A Shared Access Signature (SAS) on a blob allows operations on that blob for the specific duration defined in the code. The following code generates a SAS valid for a duration of 30 minutes.

// Generate a shared access signature on a blob.
Calendar cal = Calendar.getInstance();
cal.setTimeZone(TimeZone.getTimeZone("UTC"));
// Define the start and end time to grant permissions.
// To handle clock skew, set the start time 5 minutes early and the
// expiry time 5 minutes late: the intended 30-minute access window is
// 1hr 5min to 1hr 35min from now, so the SAS is issued for 1hr to
// 1hr 40min from now.
cal.add(Calendar.HOUR, 1);
Date sharedAccessStartTime = cal.getTime();
cal.add(Calendar.MINUTE, 40);
Date sharedAccessExpiryTime = cal.getTime();
// Define the shared access policy.
SharedAccessPolicy policy = new SharedAccessPolicy();
EnumSet<SharedAccessPermissions> perEnumSet =
    EnumSet.of(SharedAccessPermissions.READ);
policy.setPermissions(perEnumSet);
policy.setSharedAccessExpiryTime(sharedAccessExpiryTime);
policy.setSharedAccessStartTime(sharedAccessStartTime);
// Generating the Shared Access Signature.
String sharedUri = blob.generateSharedAccessSignature(policy);

A SAS token might become valid or expire earlier or later than expected as a result of clock skew. To handle clock skew, specify a start time a few minutes earlier than required and an expiry time a few minutes later than required.

HOW TO LIST BLOBS IN A CONTAINER


The CloudBlobContainer.listBlobs method returns the iterator for ListBlobItem objects from the container.

// Listing Blobs in a container and printing its URI


for(ListBlobItem blobItem : blobContainer.listBlobs()) {
// Getting URI
URI blobUri = blobItem.getUri();
// Get the blob using the Uri and
// perform operations on blob.
}
HOW TO CREATE A DIRECTORY STRUCTURE

Blob Storage does not have nested containers. To simulate a subdirectory structure in Blob Storage, we can use forward slashes (/) to represent directory levels in the blob name and its URI. For example, MainDir/SubDir/sampleTextDoc.txt.

CloudBlob blob =
    container.getBlockBlobReference("MainDir/SubDir/sampleTextDoc.txt");
This helps in organizing blobs within a container.
To search through a structure of subdirectories, you can use the listBlobs method of the blob container and specify the prefix parameter. The prefix value is the path of the subdirectory whose contents are to be listed.
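For illustration, here is a minimal sketch of listing only the blobs under one virtual directory, assuming the listBlobs overload that takes a prefix string:

// List only the blobs whose names start with the given virtual directory prefix.
for (ListBlobItem item : container.listBlobs("MainDir/SubDir/")) {
    URI itemUri = item.getUri();
}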

SUMMARY
In this article we discussed using the Blob Storage service to perform various operations on blobs. We also discussed how acquiring leases on blobs can be used to manage concurrency, and we demonstrated generating shared access signatures to provide access rights to blobs for specific time periods.
