
BIG DATA

We are all inundated with ever-growing volumes of data from both within and outside the enterprise,
and are increasingly looking for tools and techniques to obtain actionable insights for effective decision
making.

From data warehousing, to data marts, to reporting tools, to BI, and now Big Data, organizations and
leaders have been inundated with technology fads.

What is Big Data?


Big data is a popular term used to describe the exponential growth, availability and use of information,
both structured and unstructured. Much has been written on the big data trend and how it can serve as
the basis for innovation, differentiation and growth. According to IDC, it is imperative that
organizations and IT leaders focus on the ever-increasing volume, variety and velocity of information
that forms big data.

Big data is a collection of data sets so large and complex that it becomes difficult to process using
on-hand database management tools or traditional data processing applications. The challenges include
capture, curation, storage, search, sharing, analysis, and visualization. The trend to larger data sets is
due to the additional information derivable from analysis of a single large set of related data, as
compared to separate smaller sets with the same total amount of data, allowing correlations to be
found to "spot business trends, determine quality of research, prevent diseases, link legal citations,
combat crime, and determine real-time roadway traffic conditions."

Big data is an emerging phenomenon characterized by the four Vs: volume, velocity, variety, and
variability:

Volume: Enterprises are swimming in data. Banks, for example, collect data in terabytes (the U.S.
Library of Congress total book stack measures 15 TB) and petabytes (Google processes more than
1 PB every hour).
Managing huge volumes of data is a major challenge for financial services firms, for example. Data
sharing across Wall Street enterprises is still a big issue, as each business unit frequently prefers
calculating from its own set of data. With these companies collecting even more unstructured data,
advanced enterprises have developed tools that can analyze news from video, audio and Twitter
feeds, for example, in real time to help make trading decisions.
New regulations focused on transparency and risk management, to be put in place in 2012, are
driving greater urgency among capital markets firms to manage big data.

Velocity: Global banks handle trillions of messages in a single day's trading, mostly processed by
computers.

Variety: The IT industry has dealt with big data for decades as structured data in static and
disciplined databases and spreadsheets.
What's new are tools to effectively capture, visualize and analyze unstructured data that is messy,
moving, ubiquitous, and streaming in as text, audio, video, clicks, PDF files, email, blogs, tweets,
sensor data and the rest.
About 80 percent of a corporation's data is unstructured, including office productivity documents,
email and Web content, in addition to social media. Email and messaging systems create more
unstructured data than anything else. While two of five respondents to a Unisphere survey say upper
management is barely aware of the challenges of unstructured data, IT professionals are seriously
concerned about the volumes they're getting. At least 57 percent of respondents report that
unstructured data is very important, and about 18 percent consider it core to their business.

Variability: Semantics, or the variability of meaning in language.

Big Data technologies like Apache Hadoop provide a framework for large-scale, distributed data
storage and processing across clusters of hundreds or even thousands of networked computers. The
overall goal is to provide a scalable solution for vast quantities of data (terabytes/petabytes/exabytes)
while maintaining reasonable processing times. These systems are incredibly effective for storing and
analyzing large volumes of structured as well as unstructured or semi-structured data such as text,
web or application logs, email, web pages, documents, and images.
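
The processing model most often associated with Hadoop is MapReduce: a map step emits key/value pairs from each input record, the framework groups the pairs by key, and a reduce step combines each group. The sketch below imitates those three steps in a single Python process so the idea is easy to follow. It does not use Hadoop's API, and the function names and sample log lines are hypothetical; on a real cluster the framework would distribute the map and reduce work across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework between map and reduce)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical application log lines, used only for illustration.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be spread over many machines, which is what makes the model suit clusters of the size described above.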

Examples of Big Data

Amazon.com handles millions of back-end operations every day, as well as queries from more than
half a million third-party sellers. The core technology that keeps Amazon running is Linux-based,
and as of 2005 the company had the world's three largest Linux databases, with capacities of 7.8 TB,
18.5 TB, and 24.7 TB.[36]

Walmart handles more than 1 million customer transactions every hour, which are imported into
databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent
of 167 times the information contained in all the books in the US Library of Congress.[5]

Facebook handles 50 billion photos from its user base.

FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.[37]

The volume of business data worldwide, across all companies, doubles every 1.2 years, according
to estimates.[38]
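
To put the doubling estimate in more familiar terms, a doubling time can be converted into an annual growth rate. The sketch below shows the arithmetic; it assumes smooth exponential growth, which the estimate itself does not state, and the function names are illustrative.

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Yearly growth rate implied by a doubling time, assuming exponential growth."""
    return 2 ** (1 / doubling_time_years) - 1


def growth_factor(years: float, doubling_time_years: float) -> float:
    """How many times larger the volume becomes after the given number of years."""
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    rate = annual_growth_rate(1.2)
    print(f"Implied annual growth: {rate:.0%}")
    print(f"Growth over 6 years: about {growth_factor(6, 1.2):.0f}x")
```

Under that assumption, doubling every 1.2 years corresponds to growth of roughly 78 percent per year, or about a 32-fold increase over six years (five doublings).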

Big Data in the Enterprise


Companies are capturing and digitizing more information than ever before. According to IDC, the world
produced one zettabyte (1,000,000,000,000 gigabytes) of data in 2010. Fueling this data explosion are
over five billion mobile phones, 30 billion pieces of content shared on Facebook per month, 20 billion
Internet searches per month, and millions of networked sensors connected to mobile phones, energy
meters, automobiles, shipping containers, retail packaging and more. Big Data is a platform for
transforming all of this data into actionable items for business decision making.
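
The storage figures quoted in this document mix two unit conventions: the zettabyte figure uses decimal prefixes (steps of 1,000), while the Walmart figure of 2.5 petabytes as 2560 terabytes uses binary steps of 1,024. The following sketch makes the difference explicit. The unit tables and the `convert` helper are illustrative, not taken from the source.

```python
# Decimal (SI) units step by 1,000; binary (IEC) units step by 1,024.
UNITS = {
    "GB": 10**9, "TB": 10**12, "PB": 10**15, "ZB": 10**21,
    "GiB": 2**30, "TiB": 2**40, "PiB": 2**50,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity between any two units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    print(convert(1, "ZB", "GB"))       # one zettabyte expressed in gigabytes
    print(convert(2.5, "PB", "TB"))     # decimal convention
    print(convert(2.5, "PiB", "TiB"))   # binary convention
```

With decimal prefixes, 2.5 petabytes is 2,500 terabytes; the 2560 figure appears only when each step is taken as 1,024.
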

The barriers to entry for Big Data analytics are rapidly shrinking. Big Data cloud services like Amazon
Elastic MapReduce and Microsoft's Hadoop distribution for Windows Azure allow companies to spin up
Big Data projects without upfront infrastructure costs and allow them to respond quickly to scale-out
requirements. Commercial vendor support from companies like Cloudera can speed development and
deliver more value from Big Data projects. Bundled server options such as Oracle's Big Data Appliance
offer fast setup and scale-out solutions. Finally, modular data center designs are emerging as a way to
efficiently manage hardware and scale out rapidly and cost-effectively.

Companies likely to get the most out of Big Data analytics include:

Supply chain, logistics, and manufacturing: With RFID sensors, handheld scanners, and on-board
GPS vehicle and shipment tracking, logistics and manufacturing operations produce vast
quantities of information offering significant insight into route optimization, cost savings and
operational efficiency.

Online services and web analytics: Internet companies invented Big Data technologies specifically to
process information at Internet scale. Implementation of these analytical platforms is
now viable for smaller online services companies, giving them an edge over competitors in
advertising, customer intelligence, capacity planning and more. Companies that don't offer online
services but do have an e-commerce or other online presence will benefit greatly from
understanding customer behavior and buying patterns via clickstream, cohort analysis and other
advanced analytics.

Financial services: Financial markets generate immense quantities of stock market and
banking transaction data that can help companies maximize trading opportunities or identify
potentially fraudulent charges, among various other uses. New regulations also require detailed
financial records to be maintained for longer periods.

Energy and utilities: Smart instrumentation such as smart grids and electronic sensors
attached to machinery, oil pipelines and equipment generate streams of incoming data that must
be stored and analyzed quickly to uncover and fix potential problems before they result in costly
or even disastrous failures.

Media and telecommunications: Data from streaming media, smartphones, tablets, browsing behavior
and text messages is captured at ever-increasing rates all over the world, representing a
potential treasure trove of knowledge about user behavior and tastes.

Health care and life sciences: Electronic medical records systems are some of the most
data-intensive systems in the world, and making sense of all this data to provide patient treatment
options and analyze data for clinical studies can have a dramatic effect on both individual
patients and public health management and policy.

Retail and consumer products: Retailers can analyze vast quantities of sales transaction data
to unearth patterns in user behavior and monitor brand awareness and sentiment with social
networking data.

Data Warehouse Integration


To apply this new technology effectively, it is important to understand its role and when and how to
integrate Big Data with the other components of the data warehouse environment. In the vast majority
of cases, Big Data does not replace the data warehouse. Hadoop is built for speed and flexibility across
huge sets of often unstructured data, but is best used for fairly simple workloads, such as sorting,
aggregating, converting, and filtering. Hadoop is also not intended to manage schema structure,
referential integrity or security. Database management systems are therefore still a vital part of the
overall solution architecture. So how will Big Data analytics be incorporated with existing BI/DW
investments?

Hadoop provides an adaptable and robust solution for storing large data volumes and for aggregating
and applying business rules for on-the-fly analysis that crosses the boundaries of traditional ETL and
ad hoc analysis. It is also common for Big Data processing jobs to be automated and their results
loaded into the data warehouse for further transformation, integration and analysis. This allows Big
Data to be integrated with data from other sources and exposed to users via BI tools, dashboards and
reports.

Several options are available for extracting data from Hadoop into the data warehouse. IBM,
Informatica, Microsoft, Oracle and SAP have released or announced tools to interface between Hadoop
and relational database management systems.
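
The pattern described above, in which a batch job filters and aggregates raw records and the summarized result is loaded into the warehouse, can be sketched in a few lines. In this illustration Python's built-in `sqlite3` module stands in for the data warehouse, and a plain function stands in for the Hadoop job. The log format, the table name `daily_page_views` and the function names are all hypothetical; a production setup would use one of the vendor connectors mentioned above instead.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical raw clickstream lines: date, page and HTTP status, tab-separated.
RAW_LOG = [
    "2012-03-01\t/home\t200",
    "2012-03-01\t/home\t200",
    "2012-03-01\t/cart\t500",
    "2012-03-02\t/home\t200",
    "2012-03-02\t/cart\t200",
]


def aggregate(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Filter out failed requests and count successful views per (date, page)."""
    counts: Counter = Counter()
    for line in lines:
        day, page, status = line.split("\t")
        if status == "200":
            counts[(day, page)] += 1
    return sorted((day, page, views) for (day, page), views in counts.items())


def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, int]]) -> None:
    """Load the aggregated rows into a relational table for BI-style queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_page_views "
        "(day TEXT, page TEXT, views INTEGER)"
    )
    conn.executemany("INSERT INTO daily_page_views VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
    load(conn, aggregate(RAW_LOG))
    query = "SELECT day, SUM(views) FROM daily_page_views GROUP BY day ORDER BY day"
    for row in conn.execute(query):
        print(row)
```

Only the small aggregated table reaches the relational side, so the warehouse keeps handling schema and integrity while the bulk of the raw data stays in the batch layer.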

User-Friendly Tools for Big Data


Tools like Apache Pig and Apache Hive provide high-level frameworks (Pig Latin scripts and the
SQL-like HiveQL, respectively) for advanced data analysts to run queries directly against data stored
in Hadoop. This is an effective way to do targeted, one-time analysis, perform exploratory data mining,
or develop queries that may later be automated, with the results loaded into a data warehouse.
However, these tools require technical expertise and do not cater to end users.

Luckily, there are some exciting end-user tools coming in 2012. Tableau has support for drag-and-drop
Hadoop reporting currently in beta, and Microsoft recently announced the Hive ODBC driver and the
Hive add-in for Excel, which will allow end-user access to data stored in Hadoop through Excel,
PowerPivot and Analysis Services. Tools that enable end users to slice, dice and visualize data in
Hadoop will become increasingly important components of a company's Big Data analytics arsenal
over the coming years.

Big Data adoption will continue to be driven by large and/or rapidly growing data being captured by
automated and digitized business processes. Successful adoption of this technology requires turning
this raw information into usable knowledge throughout the enterprise. To accomplish this, companies
will need to intelligently incorporate Big Data into their existing information management systems and
take advantage of the developing ecosystem of integration and analysis tools. As we move into the
age of Big Data, companies that are able to put this technology to work for them are likely to find
significant revenue-generating and cost-saving opportunities that will differentiate them from their
competitors and drive success well into the next decade.

Scope of Big Data in India


The big data market in India, which presents a huge market opportunity for IT services and
analytics firms, will grow to $1 billion in 2015. The global Big Data market opportunity is estimated to
grow at 45 percent annually to reach $25 billion by 2015, from the current size of about $8 billion,
according to a report by IT industry body Nasscom and Crisil. It added that the Indian
Big Data industry is expected to grow...
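
The global figures quoted above can be sanity-checked with compound growth arithmetic. The sketch below assumes three full years of growth between the "current" market size and 2015; the text does not state the base year, so that horizon is an assumption, and the function names are illustrative.

```python
import math


def project(start: float, annual_rate: float, years: float) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return start * (1 + annual_rate) ** years


def years_to_reach(start: float, target: float, annual_rate: float) -> float:
    """Number of years needed for start to grow to target at the given rate."""
    return math.log(target / start) / math.log(1 + annual_rate)


if __name__ == "__main__":
    # Values in billions of US dollars, as quoted in the text.
    print(f"After 3 years: ${project(8, 0.45, 3):.1f} billion")
    print(f"Years to reach $25 billion: {years_to_reach(8, 25, 0.45):.2f}")
```

Growing $8 billion at 45 percent a year gives about $24.4 billion after three years, so the quoted $25 billion target is consistent with a horizon of roughly three years.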

Software Companies Exploring Big Data


Opera Solutions. Opera Solutions is an interesting company: it is doing $100 million in revenue a
year, and its service is pretty compelling. Customers upload their data to Opera's platform, which then
analyzes it and delivers results based on the relevant signals in a customer's data set. Not content with
providing generic analysis to customers, Opera focuses on each customer's specific needs and employs
experts in a variety of industries to help it tailor unique analytics programs for each customer.

IBM. IBM has seemingly limitless options in terms of providing big data analytics as a cloud-based
service, but its current strategy appears centered around Hadoop. When IBM launched its SmartCloud
cloud computing platform in April, it promised that Hadoop workloads would be part of it. A likely
candidate to provide that capability is InfoSphere BigInsights, IBM's Hadoop-based software for
analyzing and visualizing large quantities of unstructured data. BigInsights previously was available as
a service on IBM's test-and-development cloud that SmartCloud replaced.

Amazon Web Services. AWS isn't providing actual analytics as a service, just the parallel processing
framework and computing power necessary to do them at scale. Its Elastic MapReduce platform is a
cloud-based Hadoop implementation onto which users port their Hadoop applications, then upload
their data and run the workload. Like all things AWS, customers only pay for the resources used while
the job is running, as well as for storing the data in AWS's S3 storage service.

HPCC Systems. LexisNexis spinoff and Hadoop alternative HPCC Systems plans to give customers
cloud-based access to a system running the company's HPCC data-processing software. In an
interview at Structure 2011, CTO Armando Escalante noted the company might even offer up its
own massive data sets, which span the financial, legal and intelligence sectors, among others, to
be processed by customers' applications.
