Inmon vs. Kimball: Differences Between Industry Titans
Almost everyone who has a connection to the concept of a data warehouse has an opinion concerning the best way to construct one for optimum results. The two prominent proponents of data warehouse architecture are Bill Inmon, considered to be the father of the data warehouse, and Ralph Kimball, the creator of the data mart. Since the first time these two industry leaders published their conception of data warehouse architecture, there has been anticipation of a live debate of the similarities and differences between their concepts. It had been hoped that having Mr. Inmon and Mr. Kimball on the same stage, whether live or virtual, would provide a dialogue and clearly delineate their personal perspectives on data warehouse design.
Unfortunately for everyone who has expressed an interest in such a thought-provoking milestone event, it is still not possible to have these two industry leaders square off in the same forum. Mr. Kimball in the past has declined on several occasions to accept the invitation to such a debate and has also declined to be associated with this series of articles. The latest response from Mr. Kimball's office to our request to participate will appear at the end of this series.
However, because both Mr. Inmon and Mr. Kimball are very prolific in publishing their ideas and offering their opinions, I have undertaken the task of "staging" a debate between them based on their writings. It will appear as a series over the next five weeks. In preparation for this series, Bill Inmon has provided me with a source article, "The Great Inmon/Kimball Debate that Never Took Place." In presenting Mr. Inmon's position, I have also generally relied on the numerous articles he has published on Business Intelligence Network and for the most part will not be providing specific citations to them. I recommend them to you for reading at your leisure. When directly quoting Mr. Kimball, I will provide specific citations to his materials.
Even though by training and professional background I am a lawyer, this is not intended to be an "empty chair" examination. This will be a best effort to present the two viewpoints as impartially and accurately as possible given the constraints of the circumstances. That being said, allow me to introduce our participants:
Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has more than 35 years of experience in database technology management and data warehouse design. He has spoken at seminars worldwide on developing data warehouses. He has published more than 650 articles and 45 books on the subject.
Ralph Kimball is known worldwide as an innovator, writer, educator, speaker and consultant in the field of data warehousing. He maintains a strong conviction that data warehouses must be designed to be understandable and fast. He has written more than 100 articles, and his books on dimensional design techniques have been the all-time best sellers in data warehousing.
This establishes our players' eminent credentials for this forum. Our Agenda will be:
• Session I: Ralph Kimball's Concept;
• Session II: Bill Inmon's Concept;
• Session III: Similarities and Differences;
• Session IV: Relational vs. Multidimensional; and
• Session V: Summary, Reader Feedback and Reader Questions
At any time during the series, we will readily accept feedback from either Mr. Inmon or Mr. Kimball, and these will be published as their comments, clarifications or rebuttals. Readers' constructive comments and questions are also welcomed and will be published and addressed in a special blog.
class of user. The star schema approach has been viewed as a "Bottom Up" approach by those outside the Kimball group, as contrasted with the Bill Inmon approach, which has been termed "Top Down." The most accurate description of the Kimball approach, in the author's opinion, comes directly from material on the Kimball website, "Design Tip #49 'Off the Bench'":
"When we wrote 'The Data Warehouse Lifecycle Toolkit', we referred to our approach as the Business Dimensional Lifecycle. In retrospect, we should have probably just called it the Kimball Approach, as suggested by our publisher. We chose the Business Dimensional Lifecycle label instead because it reinforced our core tenets about successful data warehousing based on our collective experiences since the mid-1980s.
First and foremost, you need to focus on the business ... You must have one eye on the business' requirements, while the other is focused on broader enterprise integration and consistency issues. The analytic data should be delivered in dimensional models for ease-of-use and query performance. We recommend that the most atomic data be made available dimensionally so that it can be sliced and diced 'any which way.' While the data warehouse will constantly evolve, each iteration should be considered a project life cycle consisting of predictable activities with a finite start and end ..."
As the above-referenced Design Tip was written expressly to refute the "Bottom Up" label for the Kimball approach, it went on to explain that the Kimball approach recommends developing an "enterprise data warehouse bus matrix." Design Tip 49 continues:
"Finally, we believe conformed dimensions (which are logically defined in the bus matrix and then physically enforced through the staging process) are absolutely critical to data consistency and integration. They provide consistent labels, business rules/definitions and domains that are re-used as we construct more fact tables to integrate and capture the results from additional business processes/events."
The above excerpts from the design tip describe the more current Kimball approach, which is called the "data warehouse bus architecture." This architecture is comprised of:
• A staging area (which can have an E/R or relationally designed 3NF design or flat file format), which cannot be accessed by an end-user of the data warehouse bus.
• The Data Warehouse Bus itself, which includes several atomic data marts, several aggregated data marts and a personal data mart, but no single or centralized data warehouse component.
The Data Warehouse Bus:
• Is dimensional;
• Contains transaction and summary data;
• Includes data marts, which have single subject or fact tables; and
• Can consist of multiple data marts in a single database.
According to the article by Ralph Kimball and Margy Ross, "Differences of Opinion," in Intelligent Enterprise, March 2004, in the Data Warehouse Bus Architecture:
"... a dimensional model contains the same information as a normalized [3NF] model but packages it for ease-of-use and query performance ... It includes both atomic detail and summarized information ... Queries descend to progressively lower levels of detail without reprogramming ... Dimensional models are built by business processes ... not business departments. Once foundation business processes are available in the warehouse, consolidated dimensional models deliver cross-process metrics. The enterprise data warehouse identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)."
A fundamental concept of the Kimball Data Warehouse Bus design is that in this approach, the data warehouse is not a physical repository of the data as in the Inmon approach. It is "virtual." It is a collection of data marts, each having a star schema design at its base.
Our next session will describe Mr. Inmon's Concept of the data warehouse and set the stage for the Corporate Information Factory.
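As a rough illustration of the bus architecture described in this session, the sketch below builds two small star schemas that share one conformed product dimension and then "drills across" them by the shared category attribute. Everything in it is an assumption made for illustration: the table names (dim_product, fact_sales, fact_inventory), the columns and the figures do not come from the Kimball materials, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_bus(conn):
    """Create two hypothetical fact tables that share one conformed dimension."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- One fact table per business process, not per department.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            sale_date   TEXT,
            units_sold  INTEGER
        );
        CREATE TABLE fact_inventory (
            product_key   INTEGER REFERENCES dim_product(product_key),
            snapshot_date TEXT,
            units_on_hand INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, "2004-01-05", 10), (2, "2004-01-05", 4),
         (3, "2004-01-06", 7), (1, "2004-01-06", 5)],
    )
    conn.executemany(
        "INSERT INTO fact_inventory VALUES (?, ?, ?)",
        [(1, "2004-01-31", 40), (2, "2004-01-31", 25), (3, "2004-01-31", 12)],
    )


def drill_across(conn):
    """Summarize each fact table by the shared category, then merge the results."""
    return conn.execute("""
        SELECT s.category, s.units_sold, i.units_on_hand
        FROM (SELECT p.category AS category, SUM(f.units_sold) AS units_sold
              FROM fact_sales f JOIN dim_product p USING (product_key)
              GROUP BY p.category) AS s
        JOIN (SELECT p.category AS category, SUM(f.units_on_hand) AS units_on_hand
              FROM fact_inventory f JOIN dim_product p USING (product_key)
              GROUP BY p.category) AS i
          ON s.category = i.category
        ORDER BY s.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_bus(conn)
result = drill_across(conn)
print(result)
```

Because both fact tables use the same product keys and the same category labels, the two summaries line up without any reconciliation step. This is the role the quoted design tip assigns to conformed dimensions.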
"The relational foundation for the data warehouse needs to be built iteratively, one table at a time. Under no circumstances is it optimal to build a data warehouse all at once using the 'big bang' approach. Accordingly, the methodology that is appropriate to the building of a data warehouse is known as the 'spiral approach' ... In the [iterative] spiral approach, one small part of the system is ... complet[ed] ... Small parts of the relational data warehouse are added with each new iteration."
In the Inmon model, by using the iterative method, errors and adjustments can be applied to a small amount of data or code without the need to re-program or code large amounts of data in the data warehouse... This relationally designed or 3NF approach permits a granularity of integrated data which provides maximum flexibility to the enterprise. If the enterprise has new requirements for the data that is warehoused, the data in the data warehouse is in a form that is ready to be shaped or formatted to meet the new requirements.
Bill Inmon has provided an excellent description of his concept of data warehouse design: "Data warehouses are arranged [by] the corporate subject areas ... in the corporate data model. Usually the data warehouse is built and owned by centrally coordinated organizations. ... [It] is a truly corporate [-wide] effort."
He also advises that the data warehouse contains the corporation's most granular level of data. The structure and content of the data warehouse is not dictated by the requirements of any one department but instead is intended to serve the entire corporation's data requirements. The data warehouse therefore requires scalable technology to properly house it because of the tremendous volume of data needed for the entire enterprise. The data warehouse also contains historical data from many legacy sources. A critical design tenet of a data warehouse is that it is NOT a collection of data marts but is, in fact, a physically distinct component altogether.
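The idea that granular, normalized data is "ready to be shaped" for a new requirement can be sketched as follows. This is only a minimal illustration under stated assumptions: the tables (region, customer, orders, order_line), the view name mart_sales_by_region and the data are all hypothetical, and a SQL view stands in for what would in practice be a separately built dependent data mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized (3NF-style) warehouse tables at the most granular level.
    CREATE TABLE region     (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, name TEXT,
                             region_id INTEGER REFERENCES region(region_id));
    CREATE TABLE orders     (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customer(customer_id),
                             order_date TEXT);
    CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                             line_no INTEGER, amount INTEGER,
                             PRIMARY KEY (order_id, line_no));

    INSERT INTO region     VALUES (1, 'East'), (2, 'West');
    INSERT INTO customer   VALUES (1, 'Acme', 1), (2, 'Bolt', 2);
    INSERT INTO orders     VALUES (100, 1, '2004-01-10'),
                                  (101, 2, '2004-01-15'),
                                  (102, 1, '2004-02-01');
    INSERT INTO order_line VALUES (100, 1, 50), (100, 2, 25),
                                  (101, 1, 40), (102, 1, 10);

    -- A new departmental requirement is met by shaping the granular data,
    -- without changing the underlying warehouse tables.
    CREATE VIEW mart_sales_by_region AS
        SELECT r.region_name,
               substr(o.order_date, 1, 7) AS month,
               SUM(l.amount)              AS total_amount
        FROM order_line l
        JOIN orders   o ON o.order_id = l.order_id
        JOIN customer c ON c.customer_id = o.customer_id
        JOIN region   r ON r.region_id = c.region_id
        GROUP BY r.region_name, month;
""")

rows = conn.execute(
    "SELECT region_name, month, total_amount "
    "FROM mart_sales_by_region ORDER BY region_name, month"
).fetchall()
print(rows)
```

A different department could define another view, or a physical mart, over the same four tables, which is the flexibility the paragraph above attributes to the 3NF approach.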
The next Session will focus on specific similarities and differences between the Inmon Corporate Information Factory and the Kimball Bus Architecture.
• There is no single source of data for analytical processing ...;
• There is no easy reconcilability of data values ...;
• There is no foundation to build on for new data marts ... An independent data mart is rarely reusable for other purposes;
• There are too many interface programs to be built and maintained;
• There is a massive redundancy of detailed data in each data mart ... because there is no common place where that detailed data is collected and integrated;
• There is no convenient place for historical data;
• There is no low level of granularity guaranteed for all data marts to use;
• Each data mart integrates data from the source systems in a unique way, which does not permit reconcilability or integrity of the data across the enterprise; and
• The window for extracting data from the legacy environment is stretched, with each independent data mart requiring its own window of time for extraction ..."
In "Differences of Opinion" (previously cited), Mr. Kimball gives his opinion of independent data marts:
"Finally, stand-alone data marts or warehouses ... are problematic. These independent silos are built to satisfy specific needs, without regard to other existing or planned analytic data. They tend to be departmental in nature, often loosely dimensionally structured. Although often perceived as the path of least resistance because no coordination is required, the independent approach is unsustainable in the long run. Multiple, uncoordinated extracts from the same operational sources are inefficient and wasteful. They generate similar, but different variations with inconsistent naming conventions and business rules. The conflicting results cause confusion, rework and reconciliation. In the end, decision-making based on independent data is often clouded by fear, uncertainty and doubt."
It appears from the above that both Inmon and Kimball are of the opinion that independent or stand-alone data marts are of marginal use. However, for the most part, this is where the perception of similarity stops. You may discern later, as I have, that there are more similarities, but each of our data warehouse architects expresses them in a very different way.
Inmon believes that Kimball's star schema-only approach causes inflexibility and therefore leads to a "brittle" structure. He writes, "... this basic lack of flexibility is at the heart of the weakness of the star schema model as the basis of the data warehouse ... When there is an enterprise need for data, the star schema is not at all optimal. Taken together, a series of star schemas and multi-dimensional tables are brittle ... [They] cannot change gracefully over time ..."
Mr. Inmon believes his approach, which uses the dependent data mart as the source for star schema usage, solves the problem of enterprise-wide access to the same data, which can change over time. "The relational data warehouse is best served by a relational [3NF] database design running on relational technology ... This should be no surprise since the dbms technology the data warehouse runs on works the best with a relational database design."
The Kimball BUS architecture expresses that "raw data is transformed into presentable information in the staging area, ever mindful of throughput and quality. Staging begins with coordinated extracts from the operational source systems. Some staging 'kitchen' activities are centralized, such as maintenance and storage of common reference data, while others may be distributed." ("Data Warehouse Dining Experience," Intelligent Enterprise, Jan 1, 2004.)
The above indicates to this author that Kimball has gone beyond the individual star schema approach criticized by Inmon and, in fact, has described his multi-dimensional data warehouse. In this approach, the model contains atomic data and the summarized data, but its construction is based on business measurements, which enable disparate business departments to query the data from a higher level of detail to the lowest level without reprogramming.
Although this description appears to indicate that the Kimball "staging area" is VERY similar to the Inmon data warehouse, the Kimball approach does not recommend a real, physically implemented data warehouse. His "data warehouse" is still the collection of data marts with their conformed dimensions.
In Mastering Data Warehouse Design: Relational and Dimensional Techniques by Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger (Wiley, 2003), these authors analyze the Kimball approach as relying on star schemas for both atomic and aggregated storage. Summarizing this point of their research, the Data Warehouse Bus Architecture is said to consist of two types of data marts:
• The Atomic Data Marts, which hold multi-dimensional data at the lowest level. These can also include aggregated data for improved query performance.
• Aggregated Data Marts. These can store data according to a core business process.
In both the Atomic and Aggregated Data Marts, the data is stored in a star schema design.
Their description of the Kimball Bus Architecture seems to indicate that the Kimball Approach still does not recognize a need for, nor require, a central data warehouse repository.
The next article will highlight the differences in the two models regarding relational vs. multidimensional data.
fact table represents... When you make a grain declaration, you can have a very precise discussion of which dimensions are possible and which ones are not... Atomic data has the most dimensionality, and so it can be constrained and rolled up in every way that is possible for that data source. Atomic data is a perfect match for the dimensional approach... higher levels of aggregation will almost always have smaller dimensions... Since useful aggregations necessarily shrink dimensions and remove dimensions, it leads to the realization that aggregated data must always be used in conjunction with base atomic data, because aggregated data has less dimensional detail."
At this point in the comparison between Inmon and Kimball, it seems they agree on the need for atomic data and the need for it to be available when aggregated data is being used. Mr. Kimball is emphatic on this point in defending data marts: "Some authors get confused on this point, and after declaring that data marts necessarily consist of aggregated data, they criticize the data marts for 'anticipating business questions.'" He then points out that the misunderstanding can be clarified by providing the atomic data along with the derivative aggregated data.
The next and last article will be a summary and will provide reader feedback and answers to readers' questions received during this series.
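The point that aggregation shrinks or removes dimensions can be shown with a small sketch. The grain ("one row per product per store per day"), the field names and the numbers below are hypothetical and are not drawn from Mr. Kimball's writings; the helper functions roll_up and can_answer are likewise invented for this example.

```python
from collections import defaultdict

# Hypothetical atomic fact rows at the grain "one row per product per store per day".
atomic_facts = [
    {"date": "2004-01-05", "product": "Widget", "store": "Boston", "units": 3},
    {"date": "2004-01-05", "product": "Widget", "store": "Denver", "units": 2},
    {"date": "2004-01-06", "product": "Gadget", "store": "Boston", "units": 4},
    {"date": "2004-02-01", "product": "Widget", "store": "Boston", "units": 1},
]


def roll_up(facts, keys):
    """Sum units over every combination of the requested dimension keys."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[k] for k in keys)] += row["units"]
    return dict(totals)


def can_answer(facts, keys):
    """A question can be answered only if every requested dimension is still present."""
    return all(k in facts[0] for k in keys)


# Atomic data can be rolled up "any which way".
by_product = roll_up(atomic_facts, ["product"])
by_store = roll_up(atomic_facts, ["store"])

# Build a monthly aggregate: the date dimension shrinks to month and store is removed.
with_month = [{**row, "month": row["date"][:7]} for row in atomic_facts]
aggregate_rows = [
    {"month": month, "product": product, "units": units}
    for (month, product), units in roll_up(with_month, ["month", "product"]).items()
]

print(by_product)
print(by_store)
print(can_answer(aggregate_rows, ["product"]))  # product survived the aggregation
print(can_answer(aggregate_rows, ["store"]))    # store did not
```

A question about stores has to go back to the atomic rows, which is why the quoted passage says aggregated data must be used in conjunction with the base atomic data.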
Storing the atomic data in dimensional structures provides business users with the ability to get answers to immediate and sometimes unpredictable problems. According to the Kimball approach, this puts usable data in the hands of the business user making the query without requiring a data warehouse expert to drill into the different normalized structures for the data. Mr. Kimball also points out that his approach uses the enterprise data warehouse BUS architecture "with common, conformed dimensions for integration and drill-across support. Conformed dimensions are the backbone of any enterprise approach..."
In rebuttal from Mr. Inmon, the Inmon approach would accept the premise of Kimball stated above, "If you make the atomic data available in dimensional structures, you can always summarize the data 'any which way,'" BUT only to the extent that the atomic data in the dimensional structures is being analyzed using ONLY multi-dimensional methods. The Inmon approach would contend that statistical, mining and even exploratory methods cannot be used if the atomic data is made available only in dimensional states. In the Inmon approach, the main reason for storing the data in a 3NF fashion in the data warehouse is not to predispose the data to favor any particular analytical method.
As you can see, there are many more similarities than differences between the architectures once you get past the semantics.
Which is Better?
The answer is, of course, that it depends on how you cleanse your data; the level of granularity you choose to access it; the variety of analytical techniques you use to analyze the data; the time and resources you have to build it; and your prevailing corporate culture. Whether you decide to "Punch In" to the Inmon Corporate Information Factory or "get on" the Kimball BUS, we hope you have enjoyed this series, and we look forward to your comments.
This concludes the "Great Debate." The author wishes to thank Bill Inmon for his source material and also to acknowledge Claudia Imhoff, Intelligent Solutions; Dan Meers, Knightsbridge Consulting; Joyce Montanari, Independent Consultant; Genia Neuschloss, Gavrosche; Derek Strauss, Gavrosche; and Bob Terdeman, Independent Consultant, for their assistance and insights on the Inmon approach.
While the author relied heavily on Mr. Kimball's website, articles in Intelligent Enterprise and on the Design Tips from Kimball University to attempt to present his approach for this series, she does not expect him to reply. The reply received by Business Intelligence Network from Mr. Kimball's office appears below.
"As for the article series, we are constantly developing new content for our existing writing commitments," was the Kimball office response. "We can't review and edit/correct everything that's written about dimensional modeling and the Kimball Methods to ensure accuracy. We've decided to pass on a review of your series, rather than establishing any precedent for other similar requests."
The author does welcome any rebuttal or clarification to the Kimball approach she has presented from readers who have adopted or employed that approach. Thank you for your interest in "The Great Debate."