
WILLIAM MCKNIGHT

STARTING SMALL, BUT THINKING LARGE AND SCALING FAST

A Guide to the Teradata Appliance Line www.williammcknight.com




INTRODUCTION
As companies take steps to manage their information asset, choosing a platform and database management system (DBMS) is absolutely fundamental. In fact, the platform is the foundation of architecture and business intelligence and the starting point for tool selection, consultancy hires, and more. In short, a company's platform is key in defining its information culture.

These platform decisions are taking place in a challenging context. Data volumes continue to soar as history accumulates, syndicated data is collected, and new sources with more detailed data are added. Furthermore, communities consuming the data continue to grow, expanding well beyond usual company boundaries to customers, supply-chain partners, and even the internet. Companies need to make sure they choose a proven platform not just for initial, known requirements but also with the scalability to meet future, to-be-determined requirements as data, users, and applications grow.

These challenges no longer affect only the big players. Mid-size companies [1] have data management needs similar to those of Fortune companies, albeit with reduced data volume and, sometimes, fewer users. They, too, need:

- Rapid development that can be built upon over time.
- Quality data that is available.
- Architectures that provide low, long-term total cost of ownership (TCO).
- Good query performance that results in increased interactive usage.
- Ability to get to real-time feeds.
- A platform to support advanced workload management.
- A scalable path forward as data, users, and application needs grow.

Provided by: William McKnight, www.williammcknight.com

Table of Contents
Introduction
Information is of Major Importance
The Enterprise Data Warehouse Approach
Mid-market Data Warehousing and BI
Criteria for an EDW Platform Selection
Teradata Innovations for Performance and Availability
The Teradata Data Warehouse Appliance
The Teradata Data Mart Appliance
The Teradata Extreme Data Appliance
Scaling to the Teradata Active EDW
Conclusion
About the Author
[1] For purposes of this paper, mid-size companies will be defined as companies with $1B to $50B in annual revenue.


Complicating matters, in selecting the platform to support their data, companies now face far more variations on, and distinct departures from, the traditional online transactional processing (OLTP) DBMS than ever before.

In 2008, in concert with this increase in information management needs, Teradata Corporation, a successful data warehouse provider for the one-terabyte-plus market for nearly 30 years, began making its technology affordable to the mid-market customer. This move is ushering in a new era of scalability and performance in that segment, as the #1 platform provider is poised to provide its leadership and influence for companies off, as well as on, the Fortune charts.

INFORMATION IS OF MAJOR IMPORTANCE


The battleground on which many industries engage today extends well beyond customary core competencies to the collection, management, and use of data. As proof, even in a subdued economy, business intelligence remains at the forefront of IT-related spending. This is in large part due to the applicability of information, directly and indirectly, to the organization's bottom line.

Information must be flexible, manageable, and actionable. And it must be all these things within the framework of a multitude of IT-related realities, such as:

- Multiple, complex applications serving a variety of users
- Exploding data size
- Data latency becoming intolerable as real-time information becomes necessary to compete

As data is put to profitable use and platforms evolve to handle the workload, it's only a matter of time until new demands to leverage data arise, adding requirements on a seemingly ongoing basis. But there is a natural flow to information management maturity that Teradata is not only well aware of, but has helped define over the years. Today, this maturity includes using data to take advantage of relationships that extend beyond the company walls. But acknowledging these requirements and realities and being able to support them are two different things.

THE ENTERPRISE DATA WAREHOUSE APPROACH


The efficacy of having a centralized data store with quality, integrated, accessible, high-performance, and scalable data cannot be denied, regardless of company size. Yet some organizations with a decentralized orientation believe that initiating an enterprise data warehouse (EDW) is too difficult an endeavor without a quick and clear ROI. The assumption here is that EDW architecture implementation has an unbearable, year-plus timeline when it comes to delivering business value.

Fortunately, this is no longer the reality. Today, an EDW represents a commitment to organize the information of the corporation, regardless of its size, in the most efficient manner possible. It's not put in place using a big-bang approach, but is instead primarily accomplished by meeting the objectives of a key subject area, data source, business objective, or user department, and then progressively building the environment with scalability from there. Another manageable path to EDW implementation is the consolidation of smaller, independent data marts into a centralized, money-saving architecture.

The most efficient way to accomplish EDW objectives is to build a data warehouse that solves specific needs, but does so in a manner that leverages previous investment in the architecture, tools, processes, and people, and does not prohibit future growth. This enables an efficient, programmatic approach to data warehousing created to serve information to the enterprise.

Setting aside EDW implementation is also particularly important for mid-market organizations that are getting started developing their architectural foundations. Too often these decisions are made within departmental boundaries without consideration of an overarching data warehousing strategy. This has led many organizations down the path of data mart proliferation: the creation of non-integrated data sets developed to address specific application needs, usually with an inflexible design.

In the vast majority of cases, data mart proliferation is not the result of a chosen architectural strategy, but a consequence of lacking one. In either case, bringing the EDW approach to bear at the outset of such development is critical to economically taking advantage of its vast promise down the road.


MID-MARKET DATA WAREHOUSING AND BUSINESS INTELLIGENCE


Business intelligence vendors have been slow to respond to the needs of the midmarket. This factor, combined with their own more limited budgets, has meant that many in the midmarket have had to take different paths to business intelligence than the Fortune 50. In fact, the multi-layered architectures and multi-quarter timeframes-to-value were barriers to business intelligence in the midmarket long before the current recession began.

Teradata is among the vendors that have mobilized solutions with the realities of the midmarket in mind. Enterprise-class business intelligence with simplicity and scalability is available now in a midmarket-oriented suite of affordable platforms delivered in the increasingly popular preconfigured data warehouse appliance model. The data warehouse appliance is a hardware/software/OS/DBMS/storage preconfiguration for data management requirements. A low TCO for a mixed-workload data warehouse environment is a consequence of the appliance model.

Naturally, vendors can mix and match their components to best suit certain workloads. Without compromising on the criteria that experienced practitioners know to be required for success at any level, Teradata has done this with the Teradata Data Warehouse Appliance, the Teradata Data Mart Appliance, and the Teradata Extreme Data Appliance. All are designed and priced to meet midmarket needs, or the departmental needs of the larger enterprise. Teradata appliances use the proven and powerful Teradata DBMS. They also benefit from Teradata's industry-leading integration with multiple data integration and BI tools and vendors.

CRITERIA FOR AN ENTERPRISE DATA WAREHOUSE PLATFORM SELECTION


The decision process for choosing a data warehouse platform should go well beyond the usual consideration of the operational DBMS vendor. Nuances about several potential requirements include:

- The immediate availability of information
- Cross-functional complexity
- The level of query concurrency
- The scalability needs of the platform
- The functionality of the DBMS

Given the state of the marketplace, the technical architecture for a data platform in a mid-size-or-larger company should be:

Scalable: The solution should be scalable in both performance capacity and incremental data volume growth. The solution should scale in a near-linear fashion and allow for growth in database size, the number of concurrent users, and the complexity of queries. Understanding hardware and software requirements for such growth is paramount.

Powerful: The platform should be designed for complex decision support in an advanced workload management environment. The optimizer should be mature enough to support every type of query with good performance. It should determine the best execution plan based on changing data demographics. Check on conditional parallelism and the causes of variations in the parallelism deployed, and on dynamic and controllable prioritization of resources for queries.

Manageable: The solution should be manageable with minimal support tasks requiring DBA/System Administrator intervention. There should be no need for the proverbial army of DBAs to support an environment, and the system should provide a single point of control to simplify administration. You should be able to create and implement new tables and indexes at will.

Extensible: Look for flexible database design and system architecture that keeps pace with evolving business requirements and leverages existing investment in hardware and software applications. Know the answers to questions such as: What is required to add and delete columns? What is the impact of repartitioning tables?

Interoperable: The system should have integrated access to the web, internal networks, and corporate mainframes.

Recoverable: In the event of component failure, the system must continue providing value to the business. It also should allow the business to selectively recover the data to points in time and provide an easy-to-use mechanism for doing this quickly.

Affordable: The proposed solution (hardware, software, services) should provide a relatively low TCO over a multi-year period.

Flexible: The system should provide optimal performance across the full range of normalized, star, and hybrid data schemas with large numbers of tables. Look for proven ability to support multiple applications from different business units, leveraging data that is integrated across business functions and subject areas.


Robust in Database Management System Features and Functions: Make sure there are DBA productivity tools, monitoring features, parallel utilities, a robust query optimizer, locking schemes, a security methodology, intra-query parallel implementation for all possible access paths, chargeback and accounting features, and remote maintenance capabilities.

There are few vendors who understand what it means to build mission-critical, well-performing data platforms that meet all of the above criteria. Of course, the vendor itself should be a major consideration, especially in these days of consolidation. When making this all-important decision, consider a vendor's financial stability, the importance of data management to its overall business strategy, and its continued research and development in these areas toward a well-developed and relevant vision.

TERADATA INNOVATIONS FOR MAXIMUM PERFORMANCE AND AVAILABILITY


One of the hallmarks of Teradata's unique approach is that all database functions (table scan, index scan, joins, sorts, insert, delete, update, load, and all utilities) are done in parallel all of the time. There is no conditional parallelism. All units of parallelism participate in each database action.

Also of special note is the table scan. One of Teradata Database's main features is a technique called synchronous scan, which allows scan requests to piggyback onto scans already in process. Maximum concurrency is thus achieved through maximum leverage of every scan. Teradata Database keeps a detailed profile of the data under management to efficiently scan only the limited storage where query results might be found. [2]

The Teradata optimizer intelligently runs steps in a query in parallel wherever possible. For example, for a three-table join requiring three table scans, Teradata Database would start all three scans in parallel. When the scans of tables B and C finished, it would begin the join step as the scan of table A finished.

Teradata's optimizer is grounded in the knowledge that every query will be executing on a massively parallel processing (MPP) system. Such systems are generally acknowledged as the preferred architecture for analytic query, business intelligence, and data warehousing. Teradata systems do not share memory or disk across the nodes, the collections of CPU, memory, and bus connected in an MPP environment. Sharing disk and/or memory creates overhead. Sharing nothing minimizes disk access bottlenecks.

The Teradata BYNET, the node-to-node interconnect, which scales linearly to more than a thousand nodes, has fault-tolerant characteristics designed specifically for a parallel processing environment. Hot-pluggable components allow you to replace components without affecting your applications. If a component fails, built-in redundancy allows the application to continue running in Teradata systems.

Furthermore, the growth path in the Teradata environment is a function of easily adding nodes and disk storage. Continual feeding without table-level locks can be done with Teradata utilities, with multiple feeders at any point in time. And again, the impact of the data load on the resources is customizable. The process ensures no input data is missed regardless of the allocation.

Teradata has extended the concepts that are interesting to the midmarket and to a single-application focus from its Active Enterprise Data Warehouse into its new appliance family. In so doing, Teradata has ushered in true business intelligence affordability for the midmarket.
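The three-table join example in this section can be illustrated with a small sketch. This is not Teradata code and makes no claim about how the Teradata optimizer is implemented; it only shows the scheduling idea of starting all scans at once and beginning a join as soon as two of the inputs are ready. The tables, the `scan` and `hash_join` helpers, and the use of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "tables": lists of (key, value) rows.
TABLES = {
    "A": [(1, "a1"), (2, "a2"), (3, "a3")],
    "B": [(1, "b1"), (2, "b2")],
    "C": [(2, "c2"), (3, "c3"), (1, "c1")],
}


def scan(name):
    """Full scan of one table, returning a key -> 1-tuple mapping."""
    return {key: (value,) for key, value in TABLES[name]}


def hash_join(left, right):
    """Inner join of two key -> tuple mappings on their shared keys."""
    return {k: left[k] + right[k] for k in left.keys() & right.keys()}


def three_table_join():
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Start all three scans at once rather than one after another.
        scan_a = pool.submit(scan, "A")
        scan_b = pool.submit(scan, "B")
        scan_c = pool.submit(scan, "C")
        # Join B and C as soon as both scans are done; the scan of A
        # may still be running in the background at this point.
        b_join_c = hash_join(scan_b.result(), scan_c.result())
        # Finish by joining A with the intermediate result.
        return hash_join(scan_a.result(), b_join_c)


if __name__ == "__main__":
    print(three_table_join())
```

With tables this small the overlap saves nothing measurable; the point is only the ordering of the steps, which follows one reading of the example in the text.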

THE TERADATA DATA WAREHOUSE APPLIANCE


The Teradata Data Warehouse Appliance supports the EDW approach to building the data warehouse and is the flagship product of the Teradata appliance family. It is suitable for a true EDW in the upper midmarket or as the platform for a focused application. With four MPP nodes per cabinet and scaling up to 11 cabinets with 12.6 terabytes each, the Teradata Data Warehouse Appliance can manage up to 140 terabytes [3] with the workload characteristics of a typical data warehouse: multiple, complex applications serving a wide variety of users. The experience can begin at two terabytes of fully redundant user data on two nodes and grow, node by node if necessary, up to 46 nodes. The nodes can be provided with Capacity on Demand as well, which means the capacity can be configured into the system unlicensed until it is needed. This makes adding the capacity simple.

[2] Teradata Intelligent Scanning
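The capacity figures quoted for the Teradata Data Warehouse Appliance can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (11 cabinets, 12.6 terabytes per cabinet) and the footnote on compression (about 30% more user storage on average). The function names are hypothetical, and treating the 30% figure as a simple 1.3 multiplier is an assumption.

```python
import math

TB_PER_CABINET = 12.6     # from the text
MAX_CABINETS = 11         # from the text
COMPRESSION_GAIN = 0.30   # from the compression footnote; an average, not a guarantee


def raw_capacity_tb(cabinets):
    """User data capacity without compression."""
    return cabinets * TB_PER_CABINET


def compressed_capacity_tb(cabinets):
    """User data capacity assuming the average compression gain applies."""
    return raw_capacity_tb(cabinets) * (1 + COMPRESSION_GAIN)


def cabinets_needed(user_data_tb, compressed=False):
    """Smallest whole number of cabinets that holds the given volume."""
    per_cabinet = TB_PER_CABINET * (1 + COMPRESSION_GAIN) if compressed else TB_PER_CABINET
    return math.ceil(user_data_tb / per_cabinet)


if __name__ == "__main__":
    print(round(raw_capacity_tb(MAX_CABINETS), 1))         # 138.6, quoted as "up to 140"
    print(round(compressed_capacity_tb(MAX_CABINETS), 2))  # 180.18 with average compression
    print(cabinets_needed(60), cabinets_needed(60, compressed=True))  # 5 and 4
```

The full configuration works out to 138.6 terabytes, which the text rounds to 140. Four nodes per cabinet across 11 cabinets gives 44 nodes, slightly below the 46-node maximum quoted; the paper does not explain the difference.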

THE TERADATA DATA MART [4] APPLIANCE


The Teradata Data Mart Appliance is a more limited-capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for the data warehouse or another of the larger data stores in the midmarket. It's a single-node, single-cabinet design with a total user data capacity of six terabytes [5]. It should be noted, though, that a single-node environment comes with the potential for downtime: in the unlikely event that the node fails, there is no other node to cover for the failure.

THE TERADATA EXTREME DATA APPLIANCE


Though not strictly a mid-market need, the Teradata Extreme Data Appliance is also part of the Teradata appliance family, and represents affordability for the management of large data. It out-scales even the Teradata Active EDW platform. While the Active EDW tops out at 10 petabytes, the Extreme Data Appliance will scale to 50 petabytes. A system of this size would have fewer concurrent access requirements because access is spread out across the large data set.

The Teradata Extreme Data Appliance is designed for high-volume data capture such as that found in click stream capture, call detail records, high-end POS, scientific analysis, sensor data, and any other specialist system where the performance of straightforward, non-concurrent analytical queries is the overriding selection factor. It also can serve as a surrogate for nearline archival strategies that move interesting data to slow retrieval systems, and it will keep this data online.

SCALING TO THE TERADATA ACTIVE ENTERPRISE DATA WAREHOUSE


Any code built for a Teradata appliance is completely portable to the Teradata Active Enterprise Data Warehouse, in case you need to go beyond the chosen Teradata appliance. This data warehousing platform, with nine nodes per cabinet and scaling up to 1,024 nodes, has a total disk capacity of 10 petabytes. A superset of features is part of the Teradata Active EDW, including automatic node failover and recovery, active system management with full performance continuity with hot standby nodes, fallback, backup and recovery, and dual active systems. The system is designed to manage the most mission-critical systems. The need for such management could be one reason to upsize to this platform. Another reason, except for those using the Extreme Data Appliance, might be data sizing.
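The data-sizing argument can be made concrete with a small lookup over the maximum capacities quoted in this paper. This is only a sketch: the function name is hypothetical, one petabyte is taken as 1,000 terabytes (an assumption), and the paper quotes user data capacity for the appliances but total disk capacity for the Active EDW, so the limits are not strictly comparable. Workload characteristics, availability needs, and concurrency, which the paper treats as equally important, are deliberately left out.

```python
# Maximum capacities as quoted in the paper, in terabytes.
# 1 PB is taken as 1,000 TB here, which is an assumption.
PLATFORM_LIMITS_TB = [
    ("Teradata Data Mart Appliance", 6),
    ("Teradata Data Warehouse Appliance", 140),
    ("Teradata Active EDW", 10_000),
    ("Teradata Extreme Data Appliance", 50_000),
]


def smallest_platform(data_tb):
    """Return the first listed platform whose quoted limit covers the volume,
    or None if the volume exceeds every quoted limit."""
    for name, limit_tb in PLATFORM_LIMITS_TB:
        if data_tb <= limit_tb:
            return name
    return None


if __name__ == "__main__":
    for volume in (4, 100, 500, 20_000, 60_000):
        print(volume, "TB ->", smallest_platform(volume))
```

A volume of 500 terabytes, for example, exceeds the Data Warehouse Appliance limit and lands on the Active EDW, which is the "data sizing" reason to upsize that the section describes.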

CONCLUSION
From straightforward mid-market data warehouse requirements to the global enterprise and beyond, Teradata's platforms are built on a foundation that has served the largest and most complex environments in the world for nearly 30 years. By meeting the needs of the midmarket with the proven appliance model, as well as with a flexible combination of nodes, maximum data size, storage and cabinet configurations, and high availability features, Teradata is showing its leadership in the midmarket, as well as in the larger-company arena.

Teradata solutions allow you to start small, think big, and scale fast in terms of an EDW approach to data management and, if required, migrate to an Active EDW platform. The Teradata Data Mart Appliance is the robust selection for the mid-market data warehouse or data store. The Teradata Data Warehouse Appliance takes the data mart appliance benefits to another level, and the Teradata Extreme Data Appliance has the upper end of data size covered for any enterprise. Whatever your information needs, Teradata's principles of scalability, power, manageability, extensibility, interoperability, manageable long-term TCO, flexibility, and robust features and functions support the possibilities.

[3] Numbers do not assume compression, which should allow for 30% more user storage on average.

[4] Data Mart (vs. Warehouse) is a product label only and is meant to address the scale of the project and not the polar opposite of a Data Warehouse.

[5] However, as noted, once the limits are approached, porting to the Teradata Active Enterprise Data Warehouse is an attractive option.


About the Author


William functions as Strategist, Lead Enterprise Information Architect, and Program Manager for complex, high-volume, full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality, and operational business intelligence. Many of his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year Finalist and a frequent best practices judge, has authored more than 150 articles and white papers, and has given over 150 international keynotes and public seminars. His teams' implementations from both IT and consultant positions have won Best Practices awards. William is a former IT VP of a Fortune company and a former engineer of DB2 at IBM, and he holds an MBA. William can be reached at 214-514-1444 or william@williammcknight.com.

5960 W. Parker Rd., Suite 278-133 Plano, TX 75093 Tel (214) 514-1444 Fax (800) 886-7033

Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide.

EB-5933 > 0609
