We are all inundated with ever-growing volumes of data from both within and outside the enterprise
and are increasingly looking for tools and techniques to obtain actionable insights for effective
decision making.
The tools have evolved from data warehousing, to data marts, to reporting tools, to BI, and now to Big Data.
Big data is a collection of data sets so large and complex that it becomes difficult to process using
on-hand database management tools or traditional data processing applications. The challenges include
capture, curation, storage, search, sharing, analysis, and visualization. The trend to larger data sets is
due to the additional information derivable from analysis of a single large set of related data, as
compared to separate smaller sets with the same total amount of data, allowing correlations to be
found to "spot business trends, determine quality of research, prevent diseases, link legal citations,
combat crime, and determine real-time roadway traffic conditions."
Big data is an emerging phenomenon characterized by the four Vs: volume, velocity, variety, and
variability.
Volume: Enterprises are swimming in data. Banks, for example, collect data in terabytes (the U.S.
Library of Congress total book stack measures 15 TB) and petabytes (Google processes more than
1 PB every hour).
Managing huge volumes of data is a major challenge for financial services firms, for example. Data
sharing across Wall Street enterprises is still a big issue, as each business unit frequently prefers
calculating from its own set of data. With these companies collecting even more unstructured data,
advanced enterprises have developed tools that can analyze news from video, audio and Twitter, for
example, in real time to help make trading decisions.
New regulations focused on transparency and risk management to be put in place in 2012 are
driving greater urgency among capital markets firms to manage big data.
Velocity: Global banks handle trillions of messages in a single day's trading, mostly processed by
computers.
Variety: The IT industry has dealt with big data for decades as structured data in static and
disciplined databases and spreadsheets.
What's new are tools to effectively capture, visualize and analyze unstructured data that is messy,
moving, ubiquitous, and streaming in as text, audio, video, clicks, PDF files, email, blogs, tweets,
sensor readings and the rest.
About 80 percent of a corporation's data is unstructured, including office productivity documents,
email, and Web content, in addition to social media. Email and messaging systems create more
unstructured data than anything else. While two of five respondents to a Unisphere survey say upper
management is barely aware of the challenges of unstructured data, IT professionals are seriously
concerned about the volumes they're getting. At least 57 percent of respondents report that
unstructured data is very important and about 18 percent consider it a core of their business.
Big Data technologies like Apache Hadoop provide a framework for large-scale, distributed data
storage and processing across clusters of hundreds or even thousands of networked computers. The
overall goal is to provide a scalable solution for vast quantities of data (terabytes/petabytes/exabytes)
while maintaining reasonable processing times. These systems are incredibly effective for storing and
analyzing large volumes of structured as well as unstructured or semi-structured data such as text,
web or application logs, email, web pages, documents, and images.
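To make the distributed-processing idea concrete, here is a minimal sketch of the MapReduce programming model that Hadoop implements, run in a single Python process rather than on a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration; they are not part of any Hadoop API. On a real cluster the framework would run the map and reduce steps on many machines and handle the grouping between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    logs = ["big data big", "data sets"]
    print(word_count(logs))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both steps can in principle be spread across many machines, which is what lets this model scale to very large inputs.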
Amazon.com handles millions of back-end operations every day, as well as queries from more than
half a million third-party sellers. The core technology that keeps Amazon running is Linux-based,
and as of 2005 they had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5
TB, and 24.7 TB.[36]
Walmart handles more than 1 million customer transactions every hour, which are imported into
databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent
of 167 times the information contained in all the books in the US Library of Congress.[5]
FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide.[37]
The volume of business data worldwide, across all companies, doubles every 1.2 years, according
to estimates.[38]
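Two of the figures above lend themselves to a quick arithmetic check. The sketch below assumes binary prefixes (1 PB = 1024 TB), which is what the 2560 TB figure implies, and assumes that "doubles every 1.2 years" means steady exponential growth. The function names are hypothetical.

```python
DOUBLING_PERIOD_YEARS = 1.2  # from the estimate quoted above
TB_PER_PB = 1024             # assumption: binary prefixes, matching 2.5 PB = 2560 TB


def pb_to_tb(petabytes):
    """Convert petabytes to terabytes using binary prefixes."""
    return petabytes * TB_PER_PB


def growth_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Multiple by which data volume grows after `years` of steady doubling."""
    return 2 ** (years / doubling_period)


if __name__ == "__main__":
    walmart_tb = pb_to_tb(2.5)
    print(f"Walmart estimate: {walmart_tb:.0f} TB")
    # Size of one "Library of Congress" implied by the 167x comparison.
    print(f"Implied Library of Congress size: {walmart_tb / 167:.1f} TB")
    print(f"Growth over 6 years: about {growth_factor(6):.0f}x")
```

Dividing 2560 TB by 167 gives about 15.3 TB, which is consistent with the 15 TB figure quoted earlier for the Library of Congress book stack. Under the steady-doubling assumption, six years is five doubling periods, so volume grows roughly 32-fold.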
Internet searches per month, and millions of networked sensors connected to mobile phones, energy
meters, automobiles, shipping containers, retail packaging and more. Big Data is a platform for
transforming all of this data into actionable items for business decision making.
The barriers to entry for Big Data analytics are rapidly shrinking. Big Data cloud services like Amazon
Elastic MapReduce and Microsoft's Hadoop distribution for Windows Azure allow companies to spin up
Big Data projects without upfront infrastructure costs and allow them to respond quickly to scale-out
requirements. Commercial vendor support from companies like Cloudera can speed development and
deliver more value from Big Data projects. Bundled server options such as Oracle's Big Data Appliance
offer fast setup and scale-out solutions. Finally, modular data center designs are emerging as a way to
efficiently manage hardware and scale out rapidly and cost-effectively.
Companies likely to get the most out of Big Data analytics include:
Supply chain, logistics, and manufacturing: With RFID sensors, handheld scanners, and onboard
GPS vehicle and shipment tracking, logistics and manufacturing operations produce vast
quantities of information offering significant insight into route optimization, cost savings and
operational efficiency.
Online services and web analytics: Internet companies invented Big Data specifically to
handle processing information at Internet scale. Implementation of these analytical platforms is
now viable for smaller online services companies to provide an edge over competitors for
advertising, customer intelligence, capacity planning and more. Companies that don't offer online
services but do have an ecommerce or other online presence will benefit greatly from
understanding customer behavior and buying patterns via clickstream, cohort analysis and other
advanced analytics.
Financial services: Financial markets generate immense quantities of stock market and
banking transaction data that can help companies maximize trading opportunities or identify
potentially fraudulent charges, among various other uses. New regulations also require detailed
financial records to be maintained for longer periods.
Energy and utilities: Smart instrumentation such as smart grids and electronic sensors
attached to machinery, oil pipelines and equipment generates streams of incoming data that must
be stored and analyzed quickly to uncover and fix potential problems before they result in costly
or even disastrous failures.
Health care and life sciences: Electronic medical records systems are some of the most
data-intensive systems in the world, and making sense of all this data to provide patient treatment
options and analyze data for clinical studies can have a dramatic effect for both individual
patients and public health management and policy.
Retail and consumer products: Retailers can analyze vast quantities of sales transaction data
to unearth patterns in user behavior and monitor brand awareness and sentiment with social
networking data.
Companies will need to intelligently incorporate Big Data into their existing information management
systems and take advantage of the developing ecosystem of integration and analysis tools. As we move
into the age of Big Data, companies that are able to put this technology to work for them are likely to
find significant revenue-generating and cost-saving opportunities that will differentiate them from
their competitors and drive success well into the next decade.
own massive data sets, which span the financial, legal and intelligence sectors, among others, to
be processed by customers' applications.