
Striking It Rich?

SQL Server Joins the Rush to Harness, Analyze Big Data


Microsoft has worked to address the big issue of big data, as companies deal with a constant influx of digital information, and SQL Server is no exception: Microsoft has given its enterprise database a big data makeover. This e-book takes a close look at SQL Server 2012 and finds out why some DBAs are flocking to it, and why others aren't. Read on for some of the bigger features of the new release, including big data integration, and dive into the connection Microsoft built between SQL Server and Hadoop.

1 Big Data Is a Big Deal for SQL Server 2012
2 Armed With Its New Database Release, Microsoft Enters the Realm of Big Data
3 SQL Server-Hadoop Highway to Big Data Ventures Into New Territory

Chapter 1

Big Data Is a Big Deal for SQL Server 2012


By Alan R. Earls

For Frank Rietta, the fact that Microsoft has been willing to work with an established open source project is the best part of SQL Server 2012. Rietta is talking about the Apache Hadoop integration with SQL Server 2012. He is a software developer and president of Atlanta-based Rietta Inc., one of the member companies at the Advanced Technology Development Center, which runs the incubator program at the Georgia Institute of Technology. "Big data is becoming increasingly important even for small startups," he said. "And Microsoft's partnering with an established open source project is refreshing," he said. "The alternative would have been reinventing the wheel, rendering itself incompatible with the competition." Rietta added that some of the new features in SQL Server 2012 look useful, including column-based querying, enhanced Excel-based analytics and reporting, and high availability improvements. But in his opinion, they are not compelling enough to encourage a migration from a working open source server stack to Microsoft. So he's not upgrading yet.

Making Big Data a Huge Feature

Discussions with a spectrum of users and consultants revealed the usual reasons some IT shops like SQL Server over competitors, namely its simplicity and lower cost. SQL Server 2012 had its virtual launch in March and was made generally available April 1. Big data is no doubt on the minds of many IT shops, and it has them intrigued. Count Sanjay Bhatia among them. Bhatia is CEO and founder of Izenda, a decade-old ad hoc reporting company based in Atlanta. The company uses SQL Server to store all its data, and it is the database that 90% of its customers use. Bhatia likes the Hadoop feature because it integrates with what he called "SQL core capability."

"So you may have an enormous amount of unstructured data, and with Hadoop-like query capability you could have an enormous user volume able to access that data," he said. Bhatia added that when you consider SQL Server 2012 with Windows Server 8 (the prerelease name of Windows Server 2012), a business unit can press a button and, in effect, get an application. "For example, if you want an app, it can be automatically installed in a private cloud. Then your business unit gets billed, or it can refund the money if it doesn't work to the level of the service-level agreement. So instead of tackling huge software installations, you can just try it first," he explained.

Another enthusiast is Jordan Hudgens, senior software engineer at MCW Services in Midland, Texas, which specializes in creating software applications for the oil field industry. His company has a heterogeneous database environment, including SQL Server 2008 for its .NET applications and MySQL for PHP. The company uses managed servers from Rackspace and LiquidWeb, along with other cloud-based systems, and is in the process of migrating to SQL Server 2012. Hudgens said the best feature in the 2012 release for MCW Services is its ad hoc query paging. "By being able to more efficiently filter query results, we can decrease lag time and increase the data available to users," he said. "Specifically, the new functionality built into the offset, select and fetch commands will help return more accurate results for our data-rich applications."
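In SQL Server 2012, the paging Hudgens describes is written with the OFFSET and FETCH clauses, which follow the ORDER BY of a SELECT statement. ORDER BY is required for OFFSET/FETCH. The minimal sketch below, an illustration rather than MCW Services' code, shows the page arithmetic and an in-memory equivalent. The table and column names (dbo.WellReadings, ReadingTime) are hypothetical placeholders.

```python
# Sketch of SQL Server 2012 OFFSET/FETCH paging arithmetic.
# The table and column names used below are hypothetical placeholders.


def page_offset(page: int, page_size: int) -> int:
    """Number of rows to skip before a 1-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size


def page_query(table: str, order_by: str, page: int, page_size: int) -> str:
    """Build a T-SQL paging query. Identifiers must be trusted constants, never user input."""
    offset = page_offset(page, page_size)
    return (
        f"SELECT * FROM {table} ORDER BY {order_by} "
        f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY;"
    )


def page_rows(sorted_rows: list, page: int, page_size: int) -> list:
    """In-memory equivalent of OFFSET/FETCH over rows that are already ordered."""
    offset = page_offset(page, page_size)
    return sorted_rows[offset:offset + page_size]


if __name__ == "__main__":
    print(page_query("dbo.WellReadings", "ReadingTime", page=3, page_size=50))
    print(page_rows(list(range(1, 11)), page=2, page_size=4))
```

Running it prints a query ending in `OFFSET 100 ROWS FETCH NEXT 50 ROWS ONLY;` followed by `[5, 6, 7, 8]`. Only the requested page leaves the server, which is where the reduced lag comes from.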

SQL Server, Generally Speaking


Bhatia said the biggest thing to like about SQL Server in general is that the Microsoft software requires less of a scarce resource: DBAs. "If this was 15 years ago, your only choice would be to buy an Oracle database or something similar and hire a team of DBAs," he said. Now, even small businesses can afford similar database technology, and although they might hire a number of people, there is a good chance none of them will be DBAs. The most profound change with SQL Server 2012, according to Bhatia, is that when used with modern hardware it will be able to handle close to 10 terabytes of data without tuning, a two-orders-of-magnitude improvement over its predecessor. While Oracle and IBM deliver similar capabilities, Bhatia said SQL Server 2012 has a big cost advantage. "There is a theoretical performance advantage with other databases, but to get that you must usually spend hundreds of hours tuning, while with Microsoft it is pretty much unnecessary," he said. "You don't need to requisition a DBA from IT and then wait a year. Even the customers that can afford to do that would rather not."

That difference highlights the built-in simplicity that Bhatia said has been a Microsoft hallmark. "It has always been simple, but with this release they simplified things further," he said. That simplicity begins with installation, something that Bhatia said even a nontechnical manager could always do. By contrast, Bhatia said complexity has been part of the culture with Oracle. "SQL Server was not structured that way; even the first version you could install without being certified." He also pointed to SQL Server's tuning aids: the Index Tuning Wizard and its successor, the Database Engine Tuning Advisor, introduced in SQL Server 2005, which he described as "basically the same thing." Those tools had already reduced the time required to complete tasks, Bhatia said, but with SQL Server 2012, the scale of capabilities has simply grown. Now you can buy SQL Server for terabytes of data and for crunching billions of records, still without hiring someone to optimize the database. Of course, he noted that companies can still benefit from optimization. But it's not a requirement, and that's what continues to make the 2012 version so attractive, according to Bhatia. "Because we have a self-service model for our customers, that ease of use is important for us," he said. "Today we still need to help customers optimize, but with SQL 2012, we are confident that if they are using new hardware it will handle the load without the optimization process, and that is huge for us."


Chapter 2

Armed With Its New Database Release, Microsoft Enters the Realm of Big Data
By Frank J. Ohlhorst

Microsoft has been pitching SQL Server as the alternative to big iron databases for years now. But the message has been somewhat muddled, with most adopters and reviewers thinking that, as an alternative to Oracle or IBM's DB2, SQL Server lacked the oomph to deal with large, enterprise-level databases. Nevertheless, Microsoft has plodded on, strengthening SQL Server with each iteration and adding capabilities to help move the product farther up the corporate ladder, into the realm where only the mighty database giants dared to tread. Big data, a towering IT phenomenon today, is an increasingly prominent part of that realm. With SQL Server 2012, Microsoft may now be able to challenge the relational database giants thanks to significant design changes, new technologies and a host of enhancements, all of which add up to a product that can go toe to toe with the best. At least that's what Microsoft hopes.

Perhaps the most significant enhancements to SQL Server stem from Microsoft's technology development partnership with Hadoop provider Hortonworks. In addition to enabling Apache Hadoop to run in Windows Server and cloud-based Windows Azure environments, the partnership gives SQL Server users access to several technologies for managing and analyzing big data. These include an open source version of an Open Database Connectivity (ODBC) driver for Hadoop's Hive data warehouse. That lets users analyze unstructured data stored in Hadoop with Excel or Microsoft business intelligence (BI) tools such as PowerPivot, SQL Server Analysis Services and SQL Server 2012's new Power View feature. Microsoft and Hortonworks are also working together to develop a JavaScript implementation for Hadoop that will let developers use the scripting language to create big data applications in Hadoop.

Additionally, Microsoft has built connectors to Hadoop into SQL Server and SQL Server Parallel Data Warehouse, enabling users to move data between SQL Server and the increasingly popular open source distributed computing framework and NoSQL database platform. The company also has released an ODBC driver that allows applications running on Linux systems, the usual platform for Hadoop, to connect to SQL Server databases. For organizations looking to make the leap into big data analytics, SQL Server 2012 brings with it tools and functionality for working with large data sets. These range from support for using the Power View reporting tool to display data from Hadoop clusters to the ability to leverage Microsoft Office tools such as Excel to create analytics algorithms. What's more, the integration between SQL Server 2012 and Hadoop enables Microsoft shops to bring management of Hadoop clusters under System Center 2012.

Now let's delve a little deeper into what else is new in SQL Server 2012 and why it's significant.

AlwaysOn availability groups. This is a feature that clearly moves SQL Server into the arena of mission-critical applications, where business continuity is a major concern. With AlwaysOn, users can fail over multiple databases in groups instead of individually. Also, secondary copies are readable and can be used for database backups. That provides a level of continuity not seen before in SQL Server, and it also means that disaster recovery environments no longer need to sit idle; they can actively participate in real-time activity.
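The behavior described above, where the group rather than the individual database is the unit of failover, can be pictured with a small model. This is a toy sketch under stated assumptions, not SQL Server's API. The class, server names and database names are hypothetical. Real availability groups also handle synchronization modes, listeners and quorum, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityGroup:
    """Toy model of an availability group (hypothetical, not a SQL Server API).

    The group, not the individual database, is the unit of failover.
    """

    name: str
    databases: list
    replicas: list  # server names; replicas[0] acts as the primary
    readable_secondaries: bool = True

    @property
    def primary(self) -> str:
        return self.replicas[0]

    @property
    def secondaries(self) -> list:
        return self.replicas[1:]

    def failover(self, target: str) -> None:
        """Promote a secondary; every database in the group moves with it."""
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary replica of {self.name}")
        self.replicas.remove(target)
        self.replicas.insert(0, target)

    def backup_candidates(self) -> list:
        """Replicas that could take backups or serve read-only queries."""
        return self.secondaries if self.readable_secondaries else [self.primary]


ag = AvailabilityGroup("SalesAG", ["Orders", "Inventory", "Billing"], ["SQL-A", "SQL-B", "SQL-DR"])
ag.failover("SQL-DR")
print(ag.primary, ag.databases)  # SQL-DR ['Orders', 'Inventory', 'Billing']
print(ag.backup_candidates())    # ['SQL-A', 'SQL-B']
```

One failover call moves all three databases to the disaster recovery server, and the remaining secondaries stay available for reads and backups rather than sitting idle.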

Support for Windows Server Core. SQL Server 2012 runs on Windows Server Core, allowing administrators to create SQL database appliances without the resource-intensive graphical user interface (GUI) normally associated with a Windows Server OS. Server Core is the GUI-less installation option of Windows Server; administrators work with it through the command prompt and PowerShell. It has a much lower footprint (50% less memory and disk space utilization), requires fewer patches and is more secure than the full install. SQL Server 2012 is the first version of SQL Server supported on it.

Column-store indexes. This new feature is geared toward large data warehouse queries. Column-store indexes are read-only indexes in which data is grouped and stored by column in a flat, compressed format, greatly reducing I/O and memory utilization on large queries.
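Storing a column contiguously helps large queries in two ways: a query reads only the columns it needs, and runs of repeated values compress well. The sketch below illustrates that idea with a pivot into columns plus run-length encoding. The rows are invented. SQL Server's actual column-store format (segments, dictionaries and its own compression) is more sophisticated and is not reproduced here.

```python
from itertools import groupby

# Hypothetical fact-table rows: (region, product, amount)
ROWS = [
    ("East", "Widget", 10),
    ("East", "Widget", 12),
    ("East", "Gadget", 7),
    ("West", "Widget", 5),
    ("West", "Gadget", 9),
]
COLUMNS = ("region", "product", "amount")


def to_columns(rows):
    """Pivot row-oriented tuples into one contiguous list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def rle_encode(values):
    """Run-length encode a column: adjacent repeats collapse to (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def count_by(encoded):
    """Answer a GROUP BY ... COUNT(*) directly from the compressed runs."""
    counts = {}
    for value, count in encoded:
        counts[value] = counts.get(value, 0) + count
    return counts


columns = to_columns(ROWS)
region_runs = rle_encode(columns["region"])
print(region_runs)             # [('East', 3), ('West', 2)]
print(count_by(region_runs))   # {'East': 3, 'West': 2}
print(sum(columns["amount"]))  # 43, reading only the amount column
```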

BI Semantic Model. A replacement for the Analysis Services Unified Dimensional Model (UDM), or "cubes," as most people referred to it. It's a hybrid model that supports BI functionality in SQL Server, yet still allows for text infographics processing and other hybrid constructions.

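Sequence objects. A longtime feature of Oracle databases that SQL Server had lacked, a sequence is a counter object that generates a series of numeric values on request (through NEXT VALUE FOR). Because it is not bound to any single table, one sequence can supply key values to several tables.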
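In T-SQL, a sequence is declared with a statement such as `CREATE SEQUENCE dbo.DocIds START WITH 1000 INCREMENT BY 10;` and read with `NEXT VALUE FOR dbo.DocIds`. The Python class below is only a toy model of that behavior. The names `Sequence`, `DocIds` and the two "tables" are hypothetical, and options such as CYCLE, MINVALUE/MAXVALUE and caching are not modeled. It shows the two properties that matter: values advance by a fixed increment, and the counter is shared rather than owned by one table.

```python
import threading


class Sequence:
    """Toy model of a sequence object: a named counter that belongs to no table."""

    def __init__(self, start: int = 1, increment: int = 1):
        if increment == 0:
            raise ValueError("increment must be nonzero")
        self._next = start
        self._increment = increment
        self._lock = threading.Lock()  # concurrent callers never receive the same value

    def next_value(self) -> int:
        """Return the next number and advance the counter, like NEXT VALUE FOR."""
        with self._lock:
            value = self._next
            self._next += self._increment
            return value


# One sequence shared by two hypothetical tables.
doc_ids = Sequence(start=1000, increment=10)
orders = [{"id": doc_ids.next_value(), "item": item} for item in ("bolts", "pipe")]
invoices = [{"id": doc_ids.next_value(), "order_id": o["id"]} for o in orders]
print([o["id"] for o in orders])    # [1000, 1010]
print([i["id"] for i in invoices])  # [1020, 1030]
```

A real SQL Server sequence can leave gaps (for example, after a rolled-back transaction or a restart with caching enabled), so it guarantees uniqueness, not contiguity.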

User-defined server roles. DBAs have always been able to create custom database-level roles, but not roles that applied across the entire server. For example, if the DBA wanted to give a development team read-and-write access to every database on a shared server, traditionally the only ways to do it were by hand or by using undocumented procedures. Neither was a good solution. Now the DBA can create a server role that has read-and-write access to every database on the server, or any other custom serverwide role.

Enhanced PowerShell support. SQL Server has robust support for Microsoft PowerShell, allowing administrators to script many events and triggers. PowerShell is becoming a common theme for Microsoft, which is incorporating the technology across all of its server products.

Enhanced auditing features. Server-level auditing is now available in all editions of SQL Server. Additionally, users can define custom audit specifications to write custom events to the audit log. New filtering features give greater flexibility in choosing which events to write to the log.

Distributed Replay. Distributed Replay allows admins to capture a workload on a production server and then replay it on another machine. That lets them validate changes to underlying schemas, service packs or hardware before moving those changes to a production environment. The feature is similar to Oracle's Real Application Testing option.

Power View. This feature is a powerful self-service BI tool kit that allows users to create mashups of BI reports from all over the enterprise. The key feature in a Reporting Services add-in for SharePoint Server 2010 Enterprise Edition, Power View can be used to build and distribute interactive reports. The reports are viewed in Silverlight, Microsoft's rival to Flash, and are designed to let users view data in a number of ways.

SQL Database enhancements. While not incorporated into SQL Server 2012, Microsoft is enhancing the Azure SQL Database platform (formerly called SQL Azure) to work better with SQL Server 2012. Improvements include Reporting Services for Azure and backup to the Windows Azure data store. SQL Data Sync now supports a hybrid model of cloud and on-premises technologies.

With SQL Server 2012, Microsoft is clearing a path that leads to large-scale enterprise use, yet it's still offering support for small and medium-sized businesses. The name of the game here is scalability, and the latest iteration promises to scale into realms once only dreamed of, while still keeping things relatively simple and low cost, at least when compared with the industry giants.


Chapter 3

SQL Server-Hadoop Highway to Big Data Ventures Into New Territory


By Robert Sheldon

Microsoft has made little secret of its move into the realm of big data, particularly the massive sets of unstructured files that companies like Google and Yahoo have been grappling with since their inceptions. Google, for instance, processes 20 petabytes (that's 20,000 terabytes) of data each day, much of that in the form of text-based index files. Yet big data is hardly limited to indexes. Corporations regularly manage large volumes of emails, documents, Web server logs, social networking feeds and an assortment of other unstructured information. To get a handle on all their data, companies such as Autodesk, IBM and Facebook (along with Google and Yahoo, not surprisingly) have implemented Apache Hadoop, an open source distributed computing platform designed to manage unstructured data sets too large for traditional tools.

Microsoft has taken notice of all this Hadoop hoopla and forged its own SQL Server-Hadoop connection. The company released connectors that let organizations move large amounts of data between Hadoop clusters and SQL Server 2008 R2, Parallel Data Warehouse or SQL Server 2012. Because the connectors allow data to move in both directions, businesses can take advantage of the storage and data processing power of SQL Server and still leverage Hadoop's ability to manage large, unstructured data sets. Yet the SQL Server-Hadoop connectors aren't quite what SQL Server users are accustomed to. They are command-line tools implemented in a Linux environment. As a result, SQL Server users planning to implement one of the connectors will benefit from having a conceptual overview of how they fit into the Hadoop environment.


The Apache Hadoop Cluster

Hadoop is a master-slave architecture implemented on a cluster of Linux computers. To process such massive volumes of data, the Hadoop environment must include the following components:

- The master node manages the slave nodes and the tasks related to processing, managing and accessing the data files. The master node also serves as the primary access point for outside applications making job requests into the Hadoop environment. The node is also referred to as the master server.
- The name node runs a daemon called NameNode, which manages the Hadoop Distributed File System (HDFS) namespace and regulates access to the data files. The node supports operations such as opening, closing and renaming files as well as determining how to map blocks of data on the slave nodes. In smaller environments, the name node can be implemented on the same server as the master node.
- Each slave node runs the DataNode daemon, which manages the storage of data files and processes read and write requests to those files. The slave nodes are made up of commodity hardware components that are relatively inexpensive and readily available, making it possible to run parallel operations on thousands of computers.

Figure 1 illustrates how the various components interact within the Hadoop environment. Notice that the master node runs the JobTracker daemon and each slave node runs the TaskTracker daemon. JobTracker processes requests from client applications and assigns those tasks to the various instances of TaskTracker. When it receives instructions from JobTracker, TaskTracker works with the DataNode daemon to run the assigned tasks and handle the movement of data during each phase of the operation.

The MapReduce Framework

Figure 1. You must implement the SQL Server-Hadoop connector within the Hadoop cluster. (Graphic by Robert Sheldon for TechTarget)

As Figure 1 illustrates, the master node supports the framework necessary to run MapReduce operations. MapReduce technology lies at the core of the Hadoop environment. In fact, you can think of Hadoop as a MapReduce framework, with components such as JobTracker and TaskTracker playing critical roles within that framework. MapReduce breaks large data sets into small, manageable chunks that can be spread across thousands of computers. It also provides the mechanisms necessary to perform numerous parallel operations, search petabytes of data, manage complex client requests and perform in-depth analyses on the data. In addition, the MapReduce structure provides the load balancing and fault tolerance needed to ensure that operations are completed quickly and accurately.

The MapReduce framework works hand in hand with HDFS, which stores each file as a sequence of blocks. Blocks are replicated across the cluster for fault tolerance, and except for the last block, all blocks within a file are the same size. The DataNode daemon on each slave node works with HDFS to create, delete and replicate blocks. But an HDFS file can be written to only once and by only one writer at a time.
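The division of labor described above can be sketched in a few lines of Python. HDFS cuts a file into equal-size blocks (the last may be shorter) and keeps several copies of each on different DataNodes. MapReduce then maps records to key/value pairs, shuffles them by key and reduces each group. This is a single-process illustration under stated assumptions, not Hadoop code. The block size, replication factor and node names are hypothetical. Placement is simple round-robin rather than HDFS's rack-aware policy. The map step works on whole lines because Hadoop's input formats, not the raw blocks, deal with records that straddle block boundaries.

```python
from collections import defaultdict

BLOCK_SIZE = 16   # bytes, tiny for the demo; real HDFS blocks are far larger
REPLICATION = 3   # copies kept of each block
DATANODES = ["slave1", "slave2", "slave3", "slave4"]  # hypothetical slave nodes


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """HDFS-style split: every block is block_size bytes except possibly the last."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes=DATANODES, replication: int = REPLICATION) -> dict:
    """Round-robin placement so each block's copies land on distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


def map_phase(record: str):
    """Map: emit (word, 1) for each word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(records) -> dict:
    intermediate = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


data = b"big data is a big deal\nsql server meets hadoop\n"
blocks = split_into_blocks(data)
print([len(b) for b in blocks])        # [16, 16, 15]
print(place_replicas(len(blocks))[0])  # ['slave1', 'slave2', 'slave3']
print(word_count(data.decode().splitlines())["big"])  # 2
```

On a real cluster, JobTracker would schedule each map task on a TaskTracker whose DataNode already holds a copy of the input block. The shuffle would move intermediate pairs across the network to the reducers.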

SQL Server-Hadoop Connector Implementation

You implement the SQL Server-Hadoop connector on the master node in the Hadoop cluster. However, the master node must also be configured with Sqoop and Microsoft's Java Database Connectivity (JDBC) driver. Sqoop is an open source, command-line tool used to import data from a relational database, transform the data using the Hadoop MapReduce framework and then export the data back into the database. When the SQL Server-Hadoop connector is also installed on the master node, you can use Sqoop to import or export SQL Server data. Note, however, that Sqoop and the connector operate within a Hadoop-centric view of your data. That means when you use Sqoop to import data, you're retrieving data from a SQL Server database and adding it to the Hadoop environment, but when you export data, you're retrieving data from Hadoop and sending it to SQL Server. Sqoop lets you import SQL Server data into, or export Hadoop data out of, one of the following three storage types:

- Text files: Basic text files delimited with commas, tabs or other supported characters.
- Sequence files: Binary files that contain serialized record data.
- Hive tables: Tables in a Hive data warehouse, a special warehousing infrastructure built on top of Hadoop.

Together, SQL Server and the Hadoop environment (MapReduce and HDFS) enable users to process large amounts of unstructured data and integrate that data into a structured environment that supports reporting, analysis and business intelligence.
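The import and export directions described in this section correspond to Sqoop's `import` and `export` tools. The three storage types correspond to its text-file, sequence-file and Hive-import options. The sketch below only assembles the argument lists, using standard Sqoop 1 arguments (`--connect`, `--table`, `--target-dir`, `--as-textfile`, `--as-sequencefile`, `--hive-import`, `--export-dir` and so on). The server, database, user, table and HDFS paths are hypothetical placeholders. Actually running the commands assumes Sqoop, the Microsoft JDBC driver and the connector are installed on the master node as described above. Check the options against the documentation for your connector version.

```python
import shlex

# Placeholder connection details for the Microsoft JDBC driver; substitute your own.
JDBC_URL = "jdbc:sqlserver://sqlhost:1433;databaseName=Sales"
USER = "etl_user"


def base_args(action: str, table: str) -> list:
    # -P makes Sqoop prompt for the password instead of putting it on the command line.
    return ["sqoop", action, "--connect", JDBC_URL, "--username", USER, "-P", "--table", table]


def sqoop_import(table: str, target: str, storage: str = "text") -> list:
    """Import: copy a SQL Server table into Hadoop in one of three storage types."""
    args = base_args("import", table)
    if storage == "text":
        args += ["--target-dir", target, "--as-textfile", "--fields-terminated-by", ","]
    elif storage == "sequence":
        args += ["--target-dir", target, "--as-sequencefile"]
    elif storage == "hive":
        args += ["--hive-import", "--hive-table", target]
    else:
        raise ValueError(f"unsupported storage type: {storage}")
    return args


def sqoop_export(table: str, export_dir: str) -> list:
    """Export: copy comma-delimited HDFS files back into an existing SQL Server table."""
    return base_args("export", table) + [
        "--export-dir", export_dir, "--input-fields-terminated-by", ","]


print(shlex.join(sqoop_import("Orders", "/data/orders")))
print(shlex.join(sqoop_export("OrderTotals", "/data/order_totals")))
# To run one for real: subprocess.run(sqoop_import("Orders", "/data/orders"), check=True)
```

Keeping the arguments as a list, rather than a single shell string, avoids quoting problems with the semicolon in the JDBC URL when the command is eventually passed to `subprocess.run`.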


About the Authors

Alan R. Earls is a Boston-area freelance writer focused on business and technology. Email him at alan.r.earls@gmail.com.

Frank J. Ohlhorst is an award-winning technology journalist, professional speaker and IT business consultant with more than 25 years of experience. He served as a network administrator and applications programmer at the U.S. Department of Energy before forming his own computer consulting firm, which can be found at ohlhorst.net.

Robert Sheldon is a technical consultant and the author of numerous books, articles and training materials related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. For more information, check out his blog, Slipstream.


Striking It Rich? SQL Server Joins the Rush to Harness, Analyze Big Data is a SearchSQLServer.com e-publication.

Jason Sparapani, Managing Editor, E-Publications
Mark Fontecchio, Site Editor
Lena Weiner, Associate Site Editor
Craig Stedman, Executive Editor
Linda Koury, Director of Online Design
Mike Bolduc, Publisher, mbolduc@techtarget.com
Ed Laplante, Director of Sales, elaplante@techtarget.com


© 2012 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.
