
White Paper

Closing the Big Data Management and Security Gap

By Nik Rouda, Senior Analyst



October 2014











This ESG White Paper was commissioned by Zettaset and is distributed under license from ESG.


© 2014 by The Enterprise Strategy Group, Inc. All Rights Reserved.

White Paper: Closing the Big Data Management & Security Gap 2

Contents
Big Data Is Gaining Momentum, but Increasing Concerns, Too
  Big Data Projects Still Rely Heavily on Professional Services
  Security Still a Top Concern for Big Data Platforms
How Organizations Should Automate and Secure Big Data Deployments
Zettaset Delivers a Safer, More Automated and Secure Solution
The Bigger Truth


















































All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of
this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the
express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and,
if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.


Big Data Is Gaining Momentum, but Increasing Concerns, Too


More and more companies are exploring new opportunities offered by big data and advanced analytics, across a
broad range of industries and functional lines of business. Data-driven decision making is being seen not as a luxury,
a management fad, or an area for future innovation, but as an essential need in order to compete successfully in
the modern world. In parallel or even driving this interest, emerging technologies like Hadoop and NoSQL databases
are finding a ready market and are increasingly being chosen as the primary platforms for accommodating the
intense demands of big data. The appetite and applications are virtually endless, applicable to nearly any business
process or activity, and limited more often by managerial creativity and institutional resistance to change than by
technology today.
IT budgets are suddenly reflecting this fundamental shift as well: recent ESG research found that 56% of companies
surveyed are increasing their investments in big data and analytics by more than 10% in 2014, as compared with
the previous year.¹ This rapid increase further indicates that most organizations are now moving beyond small
pilots and proof-of-concept stages into enterprise-wide production deployments.
However, as big data projects migrate from pilot to production deployment and extend beyond the exclusive realm
of IT and into the business unit, new factors come into play. How will the enterprise efficiently scale a technology
that is still relatively immature and overly dependent on manual installation and configuration processes? How will
the enterprise lock down sensitive data in Hadoop and NoSQL environments, given that these big data technologies
were never conceived with security in mind?

Big Data Projects Still Rely Heavily on Professional Services


Development of a big data solution is still a complex, highly interdisciplinary undertaking that requires
specialized personnel to provide operational support. Hadoop is rapidly evolving, but has not yet reached the level
of maturity and sophistication that traditional relational databases offer. There may not be enough in-house
expertise to understand all the requirements of the new big data platforms, making users more reliant on
professional services.
Persistent skills gaps in various IT disciplines impact projects. These include shortages in security (cited by 25%
of respondents), architecture planning (24%), BI and analytics (20%), and database administration (17%), as shown in
Figure 1.² If unaddressed, these staff gaps will often lead to unforeseen delays and risks in new initiatives.
Because Hadoop and NoSQL technologies are still maturing, users expecting lower operational costs from Hadoop
software and infrastructure can sometimes find they must spend significant sums on software support and
maintenance in the form of recurring subscription fees to vendors of branded Hadoop and NoSQL distributions.
It could be argued that since professional services represent a substantial revenue source for some distribution
vendors, they have less incentive to incorporate more process automation into their respective offerings. While
this model may have worked during the early phases of Hadoop deployment in pilot environments, it often
becomes a resource issue for organizations wishing to scale their deployments in an efficient and cost-effective
manner. More automation of management tasks could help organizations to avoid having to spend inordinate sums
for outside support and maintenance of a technology that has been touted as cost-saving.



¹ Source: ESG Research Report, Enterprise Data Analytics Trends, May 2014.
² Ibid.


Figure 1. Top Ten Skills Shortages Impacting Initiative Success

In which of the following areas do you believe your IT organization currently has a problematic shortage of
existing skills? (Percent of respondents, N=545, multiple responses accepted)

Information security: 25%
IT architecture/planning: 24%
Mobile application development: 21%
Business intelligence/data analytics: 20%
Server virtualization/private cloud infrastructure: 20%
Mobile device management: 19%
Application development: 18%
Database administration: 17%
Data protection (i.e., backup and recovery): 17%

Source: Enterprise Strategy Group, 2014.

Security Still a Top Concern for Big Data Platforms


As the number of distinct data sources and total data volumes grow exponentially, correspondingly more strategic
planning and tactical administration are required, and this basic talent problem is magnified to potentially deleterious
effect. This problem can manifest in different ways, but when asked about it by ESG, 38% of respondents cited
security requirements as a top-order challenge arising from unchecked growth in the size and number of databases.³
So not only is there more data, in more places, with too few people to steer projects, but the stakes are also raised
for protecting this sensitive information in the age of malicious hackers, advanced persistent threats, and
occasional internal malfeasance.
One implication is that these new big data projects can't be led solely by the data scientists, analysts, and database
administrators. While they may possess the know-how to design new functionality and support new applications,
they may not have the detailed understanding and skill set required to manage the security nuances. A copy of
privileged data in a test and development environment is still a copy susceptible to breach, and more worryingly,
the end goal of consolidating as much information as possible into a central data lake or hub can further compound
the exposure if not handled appropriately.
As such, ESG research found that 84% of respondents in a recent enterprise data survey say it is important or crucial
that security teams are actively involved in development of new big data and analytics initiatives.⁴ This is borne
out in customers' lists of technology evaluation criteria for selecting an enterprise data management platform in
Figure 2, below. Security is tied for first place as the most important factor according to survey respondents when
defining requirements for new initiatives in big data, analytics, or business intelligence.⁵ With these various
challenges in mind, most customers are looking for already proven approaches to achieving better security in the
face of pressure to deliver new deployments in the most efficient and cost-effective way.

³ Source: ESG Research Report, Enterprise Database Trends in a Big Data World, July 2014.
⁴ Source: ESG Research Report, Enterprise Data Analytics Trends, May 2014.
⁵ Source: Ibid.


Figure 2. Top Five Most Important Criteria in Evaluating a Big Data Solution

Which of the following attributes are most important to your organization when considering technology solutions
in the area of business intelligence, analytics, and big data? (Percent of respondents, N=375, three responses
accepted)

Security: 26%
Cost, ROI and/or TCO: 26%
Reliability: 22%
Performance: 21%
Ease of integration with other applications, APIs: 20%

Source: Enterprise Strategy Group, 2014.


How Organizations Should Automate and Secure Big Data Deployments
The good news is that as adoption has accelerated and more production deployments are being settled into
enterprise environments, there are now some emerging best practices to follow to automate and secure a Hadoop
environment. The bad news is that the requisite functionality is by no means yet a standardized part of any
particular distribution, and many customers will need to look carefully at vendors' glib promises to determine for
themselves which are best equipped for the deployment and security challenge. A typical CISO will be interested in
establishing sound methodologies for security efficacy, operational efficiency, and enabling the business to conduct
activities in a safe manner without undue burden.
Both IT and line-of-business leaders should take an interest and demand the best-of-breed capabilities outlined in
Table 1 from any production solution.
Table 1. Four Primary Considerations in Selecting a Secure Big Data Platform

Common Enterprise Requirements | Impact / Benefit
Deployment (incl. automation and integration of tested configurations) | Faster time to production and reduced risk of security gaps
Encryption (both at rest and in motion) and/or data masking as appropriate | Safer ETL and storage of everything in data lake/hub
Key management (incl. policies, HA, and key management interoperability protocol - KMIP) | Simplified key admin and more reliable access
User authentication and access control by role for users and administrators | Only approved people can see only appropriate data

Source: Enterprise Strategy Group, 2014.


While setup and configuration of a few management and data nodes in a Hadoop cluster may be touted as
relatively easy to do, the manual effort introduces a chance of error that grows with each additional
instance. Having an automated system for deployment simplifies this process, making for both a more scalable and
more reliably protected environment.
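To make the scaling argument concrete, the following is a minimal sketch, assuming each manually configured node has the same small, independent chance of a configuration mistake. The function name and the 2% rate are illustrative assumptions, not figures from ESG research.

```python
def p_any_misconfigured(nodes: int, per_node_error_rate: float) -> float:
    """Probability that at least one of `nodes` independent manual setups has an error."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    if not 0.0 <= per_node_error_rate <= 1.0:
        raise ValueError("per_node_error_rate must be between 0 and 1")
    # Complement of "every node was configured correctly".
    return 1.0 - (1.0 - per_node_error_rate) ** nodes


if __name__ == "__main__":
    # 2% is an assumed, illustrative rate, not a measured one.
    for n in (5, 50, 500):
        print(f"{n:>3} nodes: {p_any_misconfigured(n, 0.02):.1%}")
```

Under these assumptions, even a low per-node error rate makes at least one misconfiguration likely once a cluster reaches dozens of nodes, which is the practical case for automating deployment.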
Encryption may seem like a common tick box option on many Hadoop distributions, but not all follow the same
conventions or coverage model. Ensure that all data on disk is covered with strong encryption, and take steps to
also guard against network attacks for data being transferred between nodes; during extract, transform, and load
activities; and when exporting information. Data masking can also be useful if certain fields need to be identifiably
unique for analytics without exposing their actual contents.
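One common way to keep a field "identifiably unique" without exposing it is keyed hashing, sketched below with Python's standard hmac module. This is an illustrative technique chosen for the example, not a description of any vendor's masking feature; the function names, field names, and sample records are hypothetical.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes) -> str:
    """Return a deterministic token for `value` that does not reveal it."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask_records(records, fields, secret):
    """Return copies of `records` with the named fields replaced by tokens."""
    masked = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        for field in fields:
            if field in copy:
                copy[field] = mask_value(str(copy[field]), secret)
        masked.append(copy)
    return masked


if __name__ == "__main__":
    # Assumption: in practice the secret would come from a key manager.
    secret = b"demo-only-secret"
    rows = [
        {"ssn": "123-45-6789", "visits": 3},
        {"ssn": "987-65-4321", "visits": 1},
        {"ssn": "123-45-6789", "visits": 7},
    ]
    for row in mask_records(rows, ["ssn"], secret):
        print(row)
```

Because the same input always yields the same token, analysts can still join and count on the masked field, while the original value is not stored in the analytics copy.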
Though encryption itself may seem quite simple to turn on, key management is often the weak point of solutions,
particularly in larger, more varied, or more dynamic environments. Unique keys should be generated and controlled
via customizable policies, kept in and served from a highly available store, and managed in compliance with KMIP. Key
management should also have role-based administration and auditing capabilities.
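The requirements in this paragraph can be pictured with a toy, in-memory sketch: one unique key per data set, an age-based rotation policy, administration restricted by role, and an audit trail. The class and method names are hypothetical, and the sketch does not implement KMIP or high availability; it only illustrates the moving parts.

```python
import secrets
import time


class ToyKeyStore:
    """In-memory illustration only; not KMIP and not suitable for production."""

    def __init__(self, max_key_age_seconds: float, admins):
        self.max_key_age_seconds = max_key_age_seconds
        self.admins = set(admins)
        self._keys = {}      # data set name -> (key bytes, creation time)
        self.audit_log = []  # (timestamp, user, action, data set, allowed)

    def _record(self, user, action, dataset, allowed):
        self.audit_log.append((time.time(), user, action, dataset, allowed))

    def create_key(self, user: str, dataset: str) -> None:
        """Generate a fresh 256-bit key for one data set; admins only."""
        allowed = user in self.admins
        self._record(user, "create", dataset, allowed)  # log denials too
        if not allowed:
            raise PermissionError(f"{user} may not administer keys")
        self._keys[dataset] = (secrets.token_bytes(32), time.time())

    def needs_rotation(self, dataset: str, now=None) -> bool:
        """Apply the age policy to one data set's key."""
        _, created = self._keys[dataset]
        current = time.time() if now is None else now
        return current - created > self.max_key_age_seconds
```

A caller might create the store with `ToyKeyStore(max_key_age_seconds=90 * 24 * 3600, admins={"alice"})`, where the 90-day policy and the user name are again only placeholders.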
Even if the whole environment is defended from external attacks using these mechanisms, steps should be taken to
limit access to particular data sets to authenticated users only. Access control should be fine-grained and role-based,
should tie in automatically to Active Directory (AD) and LDAP directories, and should carry over permissions as
specified in these proven access control systems.
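A minimal sketch of role-based, deny-by-default access checks follows. It assumes the user's group memberships have already been retrieved from a directory such as AD or LDAP; the group names, roles, and data sets are hypothetical, and no directory lookup is performed here.

```python
# Hypothetical roles and the (data set, action) pairs each one grants.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "data_engineer": {("sales", "read"), ("sales", "write")},
    "hr_admin": {("hr", "read"), ("hr", "write")},
}

# Hypothetical mapping from directory groups to roles.
GROUP_TO_ROLE = {
    "analysts": "analyst",
    "data-engineering": "data_engineer",
    "hr-admins": "hr_admin",
}


def is_allowed(user_groups, dataset: str, action: str) -> bool:
    """Return True only if one of the user's groups maps to a role
    that explicitly grants `action` on `dataset`."""
    for group in user_groups:
        role = GROUP_TO_ROLE.get(group)
        if role is None:
            continue  # unknown groups grant nothing
        if (dataset, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False  # default deny


if __name__ == "__main__":
    print(is_allowed(["analysts"], "sales", "read"))
    print(is_allowed(["analysts"], "hr", "read"))
```

Keeping permissions attached to roles rather than to individuals means a change in the directory, such as moving a user between groups, carries over to data access without editing per-user rules.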
From a broader perspective, additional steps should be explored as best practices, including establishing a security
zone for the analytics servers, deploying these servers in a hardened configuration, frequent scanning and timely
patching, and traffic monitoring. These approaches are not necessarily different for Hadoop environments,
however, and should be considered as a standard part of a larger IT security framework.
Although this is a non-trivial undertaking, IT decision makers should build these requirements into their must-have
evaluation criteria, and select products that have functionality to match.

Zettaset Delivers a Safer, More Automated and Secure Solution


While many companies, young and old, are rushing to capitalize on the new opportunities afforded by big data,
many vendors are seeking to provide them with the technology to do so. Of these, some focus on performance,
some on connectivity, and some on vertical-specific applications. Zettaset is differentiating with a focus on building
rock-solid, enterprise-ready management and security applications that augment and improve the branded
open-source distribution frameworks. In doing so, Zettaset enables other vendors' big data solutions to also better meet
enterprise operational requirements. As already noted, these requirements may not be top of mind for the DBA or
data scientist, but they will be critical steps before IT infrastructure and operations teams can adopt the new
solutions and begin enterprise-wide production deployments.
Zettaset's Orchestrator provides a more mature, more comprehensive approach to managing big data
environments, automating and standardizing common activities like cluster configuration, node deployment, setup
of interfaces to applications, general administration, and not least, securing Hadoop environments.
With the recent Fast-PATH addition, Orchestrator process automation reduces reliance on manual efforts and
accelerates database cluster deployment. In the company's internal benchmark testing, Zettaset found Fast-PATH
was able to fully install a 50-node Hadoop cluster in 140 minutes, which would almost certainly be quicker and less
error-prone than a manual effort. The benchmark time includes installation of the Hadoop distribution, as well as
installation of Kerberos, HBase, Hive, Encryption, Key Management, and Zettaset's patented High-Availability
framework on all nodes. Orchestrator Fast-PATH dramatically lowers operational costs, reduces the IT resource
requirements necessary to implement Hadoop, and reduces time to value from weeks to hours.
Now Zettaset is going a step further and modularizing key components, like Hadoop security and its patented
multi-service high availability and automated failover, to more easily complement and integrate with popular
Hadoop distributions from Cloudera and Hortonworks. This enterprise-class add-on functionality enhances the
management and security mechanisms of most branded distributions, and will help address the considerations
outlined in Table 1.
Specific modularized Big Data management and security capabilities include:

• Data-at-rest Encryption - Zettaset offers a standards-based, low-overhead approach linking up AES-256 bit
  disk partition encryption with existing frameworks, and smoothly interoperates with KMIP-compliant key
  management, PKCS hardware security modules, and a wide range of leading Hadoop distributions and NoSQL
  databases. This complements open source encryption approaches for data in motion in Hadoop clusters, and
  also ensures the Orchestrator console communications are safe.
• Multi-Service High Availability - Hadoop cluster environments are complex, and require multiple
  services to productively function. Zettaset Orchestrator uniquely delivers enterprise class high
  availability with automated fail-over for all Hadoop services running in a cluster, eliminating single
  points of failure that exist in open source Hadoop, and delivering the robust security and compliance
  capabilities that enterprises expect and need.
• Fine-Grained, Role-based Access Control - Because Hadoop may often contain a wide range of
  information, both management tools and data itself must be restricted to those who need to know.
  Fine-grained controls ensure that roles and permissions can be easily customized, and that only
  appropriate administrators and users can make changes or access sensitive information.

Zettaset has a bigger vision, too, including smoother deployments, better reliability, improved performance, and
easier support and administration for broader big data environments. Centralizing and certifying management of all
required functions to meet enterprise operational standards will go a long way to facilitating the adoption of
technologies that are still evolving and maturing. Modularizing the Zettaset offerings opens them up to the wider
community with a flexible a la carte menu to suit specific enterprise requirements, while also paving the way for
an expanded, more comprehensive, and fully integrated solution for big data management and security.


The Bigger Truth
Big data is rapidly entering the mainstream, and new data platforms like Hadoop and NoSQL databases are
becoming increasingly popular tools to capture and serve up more enterprise data than ever before, spanning
sensitive personal profile, health, financial, and sometimes R&D information. Not only is more data being collected
and compiled into a single repository, but also more people are being given access to this data across multiple lines
of business for application development and for analysis and reporting. These emerging technologies, however, are not yet
fully mature in their security capabilities, increasing the risk of a "super breach." The financial repercussions and
brand damage of an incident are well documented, as are the limitations of simple perimeter-based security
products.
While many are leaping into the big data opportunity with enthusiasm, the need to build a robust, manageable, and
safe solution is paramount. Many vendors are paying lip service to these issues, but few have really understood the
scope of the problem or yet endeavored to design and implement a truly protected product. Zettaset has focused
on building more comprehensive security and management functionality, and offers a complementary
solution that addresses the inherent risks of Hadoop distribution frameworks.

















































20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com
