
Intersection of Big Data and Cloud Computing

The entire industry is moving toward SMAC (Social-Mobile-Analytics-Cloud) and IoT technologies. This is where most of the opportunities lie over the next couple of years, both for those who want to get into the IT field and for those who are already neck deep in it. These technologies don't work alone; they work in tandem with each other. In this blog, we will explore how Big Data and Cloud technologies go hand in hand. Having a good grip on both Big Data and Cloud technologies will improve a candidate's prospects.
Traditionally, data used to be stored on high-end machines running proprietary software, both of which are expensive to procure and maintain. The trend is now moving toward Big Data tools and technologies, where we use commodity machines and open source software.

Open source software is relatively cheap compared to proprietary software and can be customized to fit our requirements. Apache Hadoop, Hive, Pig, Sqoop and Oozie are a few of the popular open source Big Data tools. Their source code can be found here.

Companies like Cloudera, Hortonworks, MapR, Databricks and DataStax take the open source software from the Apache Software Foundation, fine-tune it, make it much more user friendly, and offer it as a product with commercial support, documentation, training and so on.
Coming to commodity hardware: these machines are not as powerful as server-grade machines, nor as limited as desktop-grade machines; they fall somewhere in between. Big Data processing happens on thousands of these commodity machines, and the more the data, the more machines are required.

When the machines number in the thousands, there is a good probability that some of them will go down on a regular basis. Instead of using proprietary, high-end hardware to address these failure scenarios, the software tackles them.

A hardware failure can be a hard disk going down, a problem with the network card, or any one of countless other problems. In such cases, the software automatically routes the data and the processing to a healthy machine.
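
To make the idea concrete, here is a minimal Python sketch of this kind of software-level fault tolerance. It is not Hadoop's actual code; the block id, node names and replica map are all hypothetical.

# A minimal sketch of software-level fault tolerance: each data block is
# replicated on several machines, and a read is routed to the first
# healthy replica. All names here are hypothetical.
replica_map = {
    "block-0001": ["node-17", "node-42", "node-88"],
}

# node-17 has gone down; its replicas are still available elsewhere
healthy = {"node-17": False, "node-42": True, "node-88": True}

def read_block(block_id):
    """Try each replica in turn, skipping machines that are down."""
    for node in replica_map[block_id]:
        if healthy.get(node, False):
            return f"read {block_id} from {node}"
    raise IOError(f"all replicas of {block_id} are unavailable")

print(read_block("block-0001"))  # -> read block-0001 from node-42

HDFS does something similar in practice: it keeps multiple replicas of each block (three by default) and serves reads from whichever replica is reachable.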
With the amount and the variety/complexity of data increasing day by day, more and more machines are required for storage and computation. Nowadays, the processing is also shifting to specialized hardware like GPUs from Nvidia and ASIC processors like the Tensor Processing Unit from Google. This specialized hardware is not only expensive, but also gets outdated fast.

This is where the Cloud comes into play. In the Cloud, hardware can be obtained without any upfront commitment, and we pay exactly for what we use. It's just like renting a car. Let's say a server in the Amazon Cloud costs $1 an hour. If we use it for 10 hours, then we pay Amazon $10 at the end of the billing cycle.
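
As a toy illustration of this pay-as-you-go arithmetic (the hourly rate below is an assumed figure, not an actual AWS price):

# Toy pay-as-you-go billing calculation; the hourly rate is hypothetical.
hourly_rate_usd = 1.00   # assumed price of one server per hour
hours_used = 10
bill = hourly_rate_usd * hours_used
print(f"Amount due at the end of the billing cycle: ${bill:.2f}")  # $10.00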

The different Cloud vendors provide services like Amazon Elastic MapReduce (EMR) and Google Cloud Dataproc, which make it easy to spawn a cluster of machines and do complicated Big Data processing using Spark, Hive, Pig and other Big Data software.
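
As an example, here is a minimal sketch of spawning such a cluster with boto3, the AWS SDK for Python. The cluster name, release label, instance types and counts are placeholder values, and the account is assumed to already have the default EMR IAM roles set up.

# Sketch of launching an EMR cluster with Spark and Hive preinstalled.
# Names, release label and instance settings are placeholder values.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="demo-bigdata-cluster",          # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",            # an example EMR release
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,               # 1 master + 2 workers
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",    # assumes default EMR roles exist
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])

Google Cloud Dataproc offers a very similar workflow through its own client libraries and the gcloud command line.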
As users, we don't need to worry about procuring the hardware, installing the software and other minute details. We can think more about the business and the customers, and let the Cloud vendor worry about the rest. The different services provided by the Amazon and Google Clouds are mentioned here.

To summarize, Cloud and Big Data technologies both have very good prospects, but an individual who combines the two will be much more desirable in the IT field.

Turn to our expert trainers and career advisors, who will make you comfortable with our Cloud Computing with AWS and Big Data training programs.

For information on Kovid Academy training programmes, click here.


Visit Us: www.kovidacademy.com
Contact Us: support@kovidacademy.com
US: 609-436-9548, IND: +91 9700022933
