
What is a lookup table?

- Why should you put your data warehouse on a different system than your OLTP system?
- What are aggregate tables?
- What is a data warehouse?
- What is ODS?
- What is a dimension table?
- What is dimensional modelling, and why is it important?
- Why is data modeling important?
- What is data mining?
- What is ETL?
- Why are OLTP database designs not generally a good idea for a data warehouse?
- What is a fact table?
- What are conformed dimensions?
- What are the different methods of loading dimension tables?
- What is a conformed fact?

- What are data marts?
- What is the level of granularity of a fact table?
- How are dimension tables designed?
- What are non-additive facts?
- What type of indexing mechanism do we need for a typical data warehouse?
- What is a snowflake schema?
- What is real-time data warehousing?
- What are slowly changing dimensions?
- What are semi-additive and factless facts, and in which scenarios would you use such fact tables?
- What are the differences between star and snowflake schemas?
- What is a star schema?
- What is a general-purpose scheduling tool?

- What is an ER diagram?
- Which columns go to the fact table and which columns go to the dimension table?
- What modeling tools are available in the market? Name some of them.
- How do you load the time dimension?
- Explain the advantages of RAID 1, 1+0, and 5. What type of RAID setup would you use for your TX logs?
- What is the difference between E-R modeling and dimensional modeling?
- Why is the fact table in normal form?
- What are the advantages of data mining over traditional approaches?
- What are the various ETL tools in the market?
- What is a cube in the data warehousing concept?
- What are data validation strategies for data mart validation after the loading process?
- What is the datatype of the surrogate key?
- What is a degenerate dimension table?
- What does the level of granularity of a fact table signify?
- What is the difference between OLTP and OLAP?
- What are SCD1, SCD2 and SCD3?
- What is dimensional modelling?
- What are the methodologies of data warehousing?
- What is a linked cube?
- What is the main difference between the Inmon and Kimball philosophies of data warehousing?
- What is a data warehousing hierarchy?
- What is the main difference between schemas in an RDBMS and schemas in a data warehouse?
- What is a hybrid slowly changing dimension?
- What are the different architectures of a data warehouse?
- What is a VLDB?

- What are data marts?
- What are the steps to build a data warehouse?
- What is incremental loading?
- What is batch processing?
- What is a cross-reference table?
- What is an aggregate fact table?
- What is a junk dimension?
- What is the difference between a junk dimension and a degenerate dimension?
- What are the possible data marts in retail sales?
- What is the definition of normalized and denormalized views, and what are the differences between them?
- What is meant by metadata in the context of a data warehouse, and why is it important?
- What are the differences between star and snowflake schemas? In which situations is a snowflake schema better than a star schema, and when is the opposite true?
- What is a VLDB?
- What are the data types present in BO? What happens if we implement a view in the designer and report?
- Can a dimension table contain numeric values?
- What is the difference between a view and a materialized view?
- What is a surrogate key? Where do we use it? Explain with examples.
- What is an ER diagram?
- What are aggregate tables and aggregate fact tables? Any examples of both?
- What is active data warehousing?
- Why do we override the execute method in Struts? Please give details.
- What is the difference between data warehousing and business intelligence?
- What is the difference between OLAP and a data warehouse?
- What is a factless fact table? Where have you used it in your project?
- Why is denormalization promoted in Universe designing?
- What is the difference between ODS and OLTP?
- Are OLAP databases called decision support systems? True/false?

- Explain in detail SCD type 1, type 2 and type 3.
- What is a snapshot?
- What is the difference between a data warehouse and BI?
- What are non-additive facts, in detail?
- What is a BUS schema?
- What are the various reporting tools in the market?
- What is normalization, and what are First, Second and Third Normal Form?

What is OLAP? OLAP is an abbreviation of Online Analytical Processing. This system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes.

What is the difference between OLTP and OLAP? Data source. OLTP: operational data, from the original data source. OLAP: consolidated data, from various sources. Process goal. OLTP: a snapshot of the business processes that perform fundamental business tasks. OLAP: multi-dimensional views of business activities for planning and decision making. Queries and process scripts. OLTP: simple, quick-running queries run by users. OLAP: complex, long-running queries run by the system to update the aggregated data. Database design. OLTP: a normalized, small database. Speed is not an issue due to the smaller database, and normalization will not degrade performance. It adopts an entity-relationship (ER) model and an application-oriented database design. OLAP: a de-normalized, large database. Speed is an issue due to the larger size, and de-normalizing improves performance as there are fewer tables to scan while performing tasks. It adopts a star, snowflake or fact constellation model and a subject-oriented database design.

Describe the foreign key columns in fact tables and dimension tables. Foreign keys of dimension tables are primary keys of entity tables. Foreign keys of fact tables are primary keys of dimension tables.

What is data mining? Data mining is the process of analyzing data from different perspectives and summarizing it into useful information.

What is the difference between a view and a materialized view?
A view takes the output of a query and makes it appear like a virtual table, and it can be used in place of tables. A materialized view provides indirect access to table data by storing the results of a query in a separate schema object.

What is an ER diagram? Entity-relationship diagrams are a major data modelling tool and help organize the data in your project into entities and define the relationships between the entities. This process has proved to enable the analyst to produce a good database structure, so that the data can be stored and retrieved in the most efficient manner. An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database, a type of diagram used in data modelling for relational databases. These diagrams show the structure of each table and the links between tables.

What is ODS? ODS is an abbreviation of Operational Data Store: a database structure that is a repository for near real-time operational data rather than long-term trend data. The ODS may further become the enterprise shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational database.

What is ETL? ETL is an abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place; it doesn't really matter that the data is in different forms or formats. The data can come from any source, and ETL is powerful enough to handle such disparities. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data, using rules or lookup tables, or creating combinations with other data, to convert it to the desired state. Finally, the load function writes the resulting data to a target database.

What is a VLDB? VLDB is an abbreviation of Very Large DataBase. A one-terabyte database would normally be considered a VLDB. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

Is an OLTP database design optimal for a data warehouse? No. OLTP database tables are normalized, which adds time to queries returning results. Additionally, an OLTP database is smaller and does not contain the long-period (many years of) data that needs to be analyzed.
An OLTP system is basically an ER model, not a dimensional model. If a complex query is executed on an OLTP system, it may cause a heavy overhead on the OLTP server and affect the normal business processes.

If de-normalization improves data warehouse processes, why is the fact table in normal form? Foreign keys of fact tables are primary keys of dimension tables. Since the fact table contains columns that are primary keys of other tables, it is by that fact a normal-form table.

What are lookup tables? A lookup table is a table placed on the target table based upon the primary key of the target; it updates the table by allowing only modified (new or updated) records, based on the lookup condition.

What are aggregate tables? An aggregate table contains a summary of existing warehouse data, grouped to certain levels of dimensions. It is always easier to retrieve data from aggregate tables than to visit the original table with its millions of records. Aggregate tables reduce the load on the database server, increase query performance and return results quickly.

What is real-time data warehousing? Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.

What are conformed dimensions? Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.
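The aggregate-table idea above can be illustrated with a small runnable sketch, using SQLite via Python's sqlite3. All table and column names here are invented for the example, not taken from any real warehouse.

```python
import sqlite3

# Hypothetical detail-level sales fact table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("2024-01-05", "A", 10.0), ("2024-01-20", "A", 15.0),
     ("2024-02-03", "B", 7.5), ("2024-02-28", "A", 2.5)],
)

# The aggregate table stores the summary at month/product grain,
# so reports scan far fewer rows than the base fact table.
conn.execute("""
    CREATE TABLE agg_sales_month AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS amount
    FROM fact_sales
    GROUP BY month, product
""")

rows = conn.execute(
    "SELECT month, product, amount FROM agg_sales_month ORDER BY month, product"
).fetchall()
print(rows)
```

A reporting query against `agg_sales_month` returns the same monthly totals as grouping the detail table, but without touching the detail rows.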

What is a conformed fact? A conformed fact is a fact (measure) defined identically across multiple data marts, so it can be used in combination with multiple fact tables, just as conformed dimensions can be used across multiple data marts.

How do you load the time dimension? Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.

What is the level of granularity of a fact table? Level of granularity means the level of detail that you put into the fact table in a data warehouse: what detail you are willing to record for each transactional fact.

What are non-additive facts? Non-additive facts are facts that cannot be summed up over any of the dimensions present in the fact table. However, they are not useless: if the dimensions change, the same facts can still be useful.

What is a factless fact table? A fact table which does not contain numeric fact columns is called a factless fact table.

What are slowly changing dimensions (SCD)? SCD applies to cases where an attribute of a record varies over time. There are three types of SCD. 1) SCD1: the new record replaces the original record; only one record exists in the database, the current data. 2) SCD2: a new record is added to the dimension table; two records exist in the database, the current data and the previous history data. 3) SCD3: the original record is modified to include the new data; one record exists in the database, with the new information attached to the old information in the same row.

What is a hybrid slowly changing dimension? Hybrid SCDs are a combination of SCD1 and SCD2. It may happen that in a table some columns are important and we need to track changes to them, i.e. capture their historical data, whereas for other columns we don't care even if the data changes.

What is a BUS schema? A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What is a star schema? A star schema is a way of organizing the tables such that we can retrieve results from the database quickly in a warehouse environment.

What is a snowflake schema? In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.

Differences between star and snowflake schemas? Star schema: a single fact table with N dimensions, all linked directly to the fact table. This schema is de-normalized, resulting in simple joins, less complex queries and faster results. Snowflake schema: any dimension with extended dimensions is known as a snowflake schema; dimensions may be interlinked or have one-to-many relationships with other tables. This schema is normalized, resulting in complex joins, more complex queries and slower results.
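A star schema of the kind described above can be sketched in a few lines of SQLite. Every table and column name below is invented for illustration; the point is the shape: one fact table joining directly to each dimension.

```python
import sqlite3

# Minimal star schema: a fact table linked directly to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO dim_date VALUES (101, '2024-01-05', 2024), (102, '2024-01-06', 2024);
    INSERT INTO fact_sales VALUES (1, 101, 10.0), (2, 101, 5.0), (1, 102, 20.0);
""")

# A typical star query: simple, direct joins from the fact to each dimension.
total_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date d ON f.date_key = d.date_key
    WHERE d.year = 2024
    GROUP BY p.category
""").fetchall()
print(total_by_category)
```

In a snowflaked version, `category` would move out of `dim_product` into its own table, adding one more join to the same query.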

What is the difference between ER modelling and dimensional modelling? ER modelling is used for normalizing the OLTP database design. Dimensional modelling is used for de-normalizing the ROLAP/MOLAP design.

What is a degenerate dimension table? Values in a table which are neither dimensions nor measures are called degenerate dimensions.

Why is data modeling important? Data modeling is probably the most labor-intensive and time-consuming part of the development process. The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notation and natural language, it can be reviewed and verified as correct by the end users. In computer science, data modeling is the process of creating a data model by applying a data model theory to create a data model instance, where a data model theory is a formal data model description. When data modelling, we are structuring and organizing data; these data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure. Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases; they typically do not describe unstructured data such as word processing documents, email messages, pictures, digital audio and video. (Reference: Wikipedia)

What is a surrogate key? A surrogate key is a substitution for the natural primary key: a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the natural primary key can change, which makes updates more difficult. Surrogate keys are always integer or numeric.

What is a data mart? A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information (Reference: Wikipedia). Data marts are designed to help managers make strategic decisions about their business.

What is the difference between OLAP and a data warehouse? A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing the data: managing aggregations and partitioning information into cubes for in-depth visualization.

What is a cube, and a linked cube, with reference to a data warehouse? Cubes are logical representations of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values. The linking of cubes ensures that the data in the cubes remains consistent.

What is a junk dimension? A number of very small dimensions might be lumped together to form a single dimension, a junk dimension, in which the attributes are not closely related. Grouping random flags and text attributes and moving them to a separate sub-dimension is known as a junk dimension.

What is a snapshot with reference to a data warehouse? You can disconnect a report from the catalog to which it is attached by saving the report with a snapshot of the data.
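The surrogate-key idea above, a stable integer standing in for a changeable natural key, can be sketched as follows. The function and variable names are invented for this example.

```python
# Sketch of surrogate key generation: each natural key maps to a stable
# integer that the warehouse uses as the primary key.
surrogate_map = {}   # natural key -> surrogate key
next_key = 1

def get_surrogate(natural_key):
    """Return the surrogate key for a natural key, allocating a new one
    on first sight. The natural key can later change without affecting
    the fact rows, which reference only the surrogate."""
    global next_key
    if natural_key not in surrogate_map:
        surrogate_map[natural_key] = next_key
        next_key += 1
    return surrogate_map[natural_key]

print(get_surrogate("CUST-001"))  # first natural key seen gets key 1
print(get_surrogate("CUST-002"))
print(get_surrogate("CUST-001"))  # same natural key, same surrogate
```

Real ETL tools typically do this with an identity column or sequence in the database rather than in application code, but the lookup-then-allocate logic is the same.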

What is active data warehousing? An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively.

What is the difference between data warehousing and business intelligence? Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business, such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions. Typically, the term business intelligence is used to encompass OLAP, data visualization, data mining and query/reporting tools.

How do you index a dimension table? Answer: a clustered index on the dim key, and non-clustered indexes (individual) on the attribute columns that are used in queries' where clauses. Purpose: this question is critical if you are looking for a Data Warehouse Architect (DWA) or a Data Architect (DA). Many DWAs and DAs only know the logical data model; many of them don't know how to index, and don't know how different the physical tables are in Oracle compared to Teradata. This question is not essential if you are looking for a report or ETL developer. It's good for them to know, but it's not essential.

Tell me what you know about William Inmon. Answer: he was the one who introduced the concept of data warehousing. Arguably Barry Devlin was the first, but he's not as popular as Inmon. If you ask who Barry Devlin or Claudia Imhoff is, 99.9% of candidates won't know, but every decent practitioner in data warehousing should know about Inmon and Kimball. Purpose: to test whether the candidate is a decent practitioner in data warehousing. You'll be surprised (especially if you are interviewing a report developer) how many candidates don't know the answer. If someone is applying for a BI architect role and has never heard of Inmon, you should worry.

How do we build a real-time data warehouse? Answer: if the candidate asks "Do you mean real time or near real time?" it may indicate that they have a good amount of experience dealing with this in the past. There are two ways we build a real-time data warehouse (and this is applicable to both a normalised DW and a dimensional DW): a) by storing previous periods' data in the warehouse, then putting a view on top of it pointing to the source system's current-period data. The current period is usually 1 day in a DW, but in some industries, e.g. online trading and e-commerce, it is 1 hour. b) By storing previous periods' data in the warehouse, then using some kind of synchronous mechanism to propagate the current period's data. Examples of synchronous data propagation mechanisms are SQL Server 2008's Change Tracking or the old-school trigger. A near-real-time DW is built using an asynchronous data propagation mechanism, aka mini batch (2-5 min frequency) or micro batch (30 s to 1.5 min frequency). Purpose: to test whether the candidate understands complex, non-traditional mechanisms and follows the latest trends. A real-time DW was considered impossible 5 years ago and has only been developed in the last 5 years. If the DW is normalised, it's easier to make it real time than if the DW is dimensional, as there's a dim key lookup involved.
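Approach (a) above, a view that unions the historical warehouse data with the source system's current-period rows, can be sketched with SQLite. All table names are invented for this example; in practice the "source" table would live in a different system reached via a linked server or federation.

```python
import sqlite3

# Historical periods live in the warehouse table; a view unions them
# with the source system's current-period rows, so queries see
# up-to-the-moment data without waiting for a batch load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_orders (order_date TEXT, amount REAL);   -- previous periods
    CREATE TABLE src_orders (order_date TEXT, amount REAL);  -- today's data in the source

    INSERT INTO dw_orders VALUES ('2024-01-01', 100.0), ('2024-01-02', 50.0);
    INSERT INTO src_orders VALUES ('2024-01-03', 25.0);

    CREATE VIEW rt_orders AS
    SELECT order_date, amount FROM dw_orders
    UNION ALL
    SELECT order_date, amount FROM src_orders;
""")

total = conn.execute("SELECT SUM(amount) FROM rt_orders").fetchone()[0]
print(total)  # includes today's order even though it was never batch-loaded
```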

What is the difference between a data mart and a data warehouse?

Answer: most candidates will answer that one is big and the other is small. Some good candidates (particularly Kimball practitioners) will say that a data mart is one star, whereas a DW is a collection of all stars. An excellent candidate will say all of the above, plus that a DW can be the normalised model that stores the EDW, whereas a DM is the dimensional model containing 1-4 stars for a specific department (in both a relational DB and a multidimensional DB). Purpose: the question has 3 different levels of answer, so we can see how deep the candidate's knowledge of data warehousing is.

What is the purpose of having a multidimensional database? Answer: many candidates don't know what a multidimensional database (MDB) is. They have heard about OLAP, but not MDB. So if the candidate looks puzzled, help them by saying that an MDB is an OLAP database. Many will say "Oh I see", but actually they are still puzzled, so it will take a good few moments before they are back to earth again. So ask again: "What is the purpose of having an OLAP database?" The answer is performance and easier data exploration. An MDB (aka cube) is a hundred times faster than a relational DB for returning an aggregate, and an MDB is very easy to navigate, drilling up and down the hierarchies and across attributes, exploring the data. Purpose: this question is irrelevant for a report or ETL developer, but a must for a cube developer and a DWA/DA. Every decent cube developer (SSAS, Hyperion, Cognos) should be able to answer it, as it's their bread and butter.

Why do you need a staging area? Answer: because: a) some data transformations/manipulations from the source system to the DWH can't be done on the fly, but require several stages and therefore need to be landed on disk first; b) the time to extract data from the source system is limited (e.g. we were only given a 1-hour window), so we just get everything we need out first and process it later; c) for traceability and consistency, i.e. some data transforms are simple and some are complex, but for consistency we put all of them on stage first, then pick them up from stage for further processing; d) some data is required by more than one part of the warehouse (e.g. the ODS and the DDS) and we want to minimise the impact on the source system's workload. So rather than reading twice from the source system, we land the data on the staging area, and both the ODS and the DDS read the data from staging. Purpose: this question is intended more for an ETL developer than a report/cube developer. Obviously a data architect needs to know this too.

How do you decide whether to keep something as 1 dimension or split it into 2 dimensions? Take for example dim product: there are attributes at product-code level and attributes at product-group level. Should we keep them all in 1 dimension (product) or split them into 2 dimensions (product and product group)? Answer: it depends on how they are going to be used, as I explained in my article "One or two dimensions". Purpose: to test whether the candidate is conversant in dimensional modelling. This question is especially relevant for data architects and cube developers, and less relevant for a report or ETL developer.

Fact table columns are usually numeric. In what case does a fact table have a varchar column? Answer: a degenerate dimension. Purpose: to check whether the candidate has ever been involved in the detailed design of warehouse tables.
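Staging reason (d) above, extract once and let both the ODS and the DDS read from staging, can be sketched as a tiny pipeline. The function and structure names are invented; a real staging area would be tables or files, not in-memory lists.

```python
# Extract from the source once into a staging area, then let both the
# ODS and the DDS consume from staging, so the busy source system is
# only queried a single time.
def extract_from_source():
    # Imagine this is an expensive query against a busy OLTP system.
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]

staging = extract_from_source()   # landed once

# ODS copy: data kept as-is for operational reporting.
ods_rows = [dict(row) for row in staging]

# DDS copy: the same staged data, transformed for the dimensional store.
dds_rows = [{"id": row["id"], "amount": round(row["amount"])}
            for row in staging]

print(len(ods_rows), len(dds_rows))
```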

What kind of dimension is a degenerate dimension? Give me an example. Answer: a dimension which stays in the fact table. It is usually the reference number of the transaction, for example: transaction ID, payment ref and order ID. Purpose: just another question to test the fundamentals.
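A degenerate dimension as described above can be shown in one table definition. The names below are invented for illustration: the transaction reference lives as a varchar column directly on the fact table, with no dimension table to join to.

```python
import sqlite3

# The payment_ref column is a degenerate dimension: it identifies the
# transaction but has no dim_payment table of its own.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_payment (
        date_key     INTEGER,
        customer_key INTEGER,
        payment_ref  TEXT,    -- degenerate dimension, stays in the fact table
        amount       REAL
    )
""")
conn.execute("INSERT INTO fact_payment VALUES (101, 1, 'PAY-20240105-0001', 99.5)")

ref = conn.execute("SELECT payment_ref FROM fact_payment").fetchone()[0]
print(ref)
```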

What is snowflaking? What are the advantages and disadvantages? Answer: in dimensional modelling, snowflaking is breaking a dimension into several tables by normalising it. The advantages are: a) performance when processing dimensions in SSAS; b) flexibility if the sub-dimension is used in several places, e.g. city is used in dim customer and dim supplier (or, in an insurance DW, dim policy holder and dim broker); c) one place to update; and d) the DW load is quicker, as there is less duplication of data. The disadvantages are: a) it is more difficult to navigate the star*, i.e. you need to join a few tables; b) worse "sum group by"* query performance (compared to a pure star*); c) it is less flexible in accommodating requirements, i.e. the city attributes for dim supplier have to be the same as the city attributes for dim customer; and d) the DW load is more complex, as you have to integrate the city. *: a star is a fact table with all its dimensions; navigating means browsing/querying; "sum group by" is a SQL select statement with a group by clause; a pure star is a fact table with all its dimensions where none of the dims are snowflaked. Purpose: snowflaking is one of the classic debates in the dimensional modelling community. It is useful to check whether the candidate understands the reasons rather than just following blindly. This question applies particularly to data architects and OLAP designers; if their answers are way off, you should worry. But it is also relevant to ETL and report developers, as they will be populating and querying the structure.

How do you implement Slowly Changing Dimension type 2? I am not looking for the definition, but the practical implementation, e.g. table structure and ETL/loading. {M} Answer: create the dimension table as normal, i.e. first the dim key column as an integer, then the attributes as varchar (or varchar2 if you use Oracle). Then I'd create 3 additional columns: an IsCurrent flag, and Valid From and Valid To (datetime columns). With regards to the ETL, I'd first check whether the row already exists by comparing the natural key. If it exists, expire the row and insert a new row, setting the Valid From date to today's date or the current date-time. An experienced candidate (particularly a DW ETL developer) will not set the Valid From date to the current date-time, but to the time when the ETL started. This is so that all the rows in the same load will have the same Valid From, which is 1 millisecond after the expiry time of the previous version, thus avoiding issues with ETL workflows that run across midnight. Purpose: SCD 2 is one of the first things we learn in data warehousing; it is considered fundamental. The purpose of this question is to separate the quality candidates from the ones who are bluffing. If the candidate cannot answer this question, you should worry.

How do you index a fact table? And explain why. {H} Answer: index all the dim key columns, individually, non-clustered (SQL Server) or bitmap (Oracle). The dim key columns are used to join to the dimension tables, so if they are indexed the joins will be faster. An exceptional candidate will suggest 3 additional things: a) index the fact key separately; b) consider creating a covering index, in the right order, on the combination of dim keys; and c) if the fact table is partitioned, the partitioning key must be included in all indexes. Purpose: many people know data warehousing only in theory or only at the logical data model. This question is designed to separate those who have actually built a data warehouse from those who haven't.

In the source system, your customer record changes like this: customer1 and customer2 now become one company called customer99. Explain a) the impact to the customer dim

(SCD1), b) the impact to the fact tables. {M} Answer: in the customer dim we update the customer1 row, changing it to customer99 (remember that it is SCD1). We do a soft delete on the customer2 row by updating the IsActive flag column (a hard delete is not recommended). On the fact table we find the surrogate keys for customer1 and customer2 and update them with customer99's SK. Purpose: this is a common problem that everybody in data warehousing encounters. By asking this question we will know whether the candidate has enough experience in data warehousing. If they have not come across this (probably they are new to DW), we want to know whether they have the capability to deal with it or not.

What are the differences between the Kimball approach and Inmon's? Which one is better, and why? Answer: if you are looking for a junior role, e.g. a developer, then the expected answer is: in Kimball we do dimensional modelling, i.e. fact and dim tables, whereas in Inmon's we do CIF, i.e. an EDW in normalised form, and we then create a DM/DDS from the EDW. Junior candidates usually prefer Kimball, because of query performance and flexibility, or because that's the only one they know, which is fine. But if you are interviewing for a senior role, e.g. senior data architect, then they need to say that the approach depends on the situation; both Kimball's and Inmon's approaches have advantages and disadvantages. Purpose: a) to see whether the candidate understands the core principles of data warehousing or just knows the surface; b) to find out whether the candidate is open-minded, i.e. recognises that the solution depends on what we are trying to achieve (there is no universally right or wrong answer), or blindly uses Kimball for every situation.

Suppose a fact row has unknown dim keys. Do you load that row or not? Can you explain the advantages/disadvantages? Answer: we need to load that row so that the total of the measure/fact is correct. To enable us to load the row, we need to either set the unknown dim key to 0 or to the dim key of a newly created dim row. We can also choose not to load that row (so the total of the measure will differ from the source system) if the business requirements prefer it. In that case we load the fact row into a quarantine area, complete with error processing, a DQ indicator and an audit log; the next day, after we receive the dim row, we load the fact row. This is commonly known as Late Arriving Dimension Rows, and there are many sources for further information. Purpose: again, this is a common problem that we encounter on a regular basis in data warehousing. With this question we want to see whether the candidate's experience level is up to expectation or not.

Please tell me about your experience on your last 3 data warehouse projects. What were your roles in those projects? What were the issues, and how did you solve them? Answer: there's no wrong or right answer here. With this question you are looking for: a) whether they have done similar things to your current project; b) whether they have done the same role as the role you are offering; c) whether they faced the same issues as your current DW project. Purpose: some of the reasons why we pay more to certain candidates compared to others are: a) they have done it before, so they can deliver quicker than those who haven't; b) they come from our competitors, so we would know what's happening there and can make a better system than theirs; c) they have solved similar issues, so we could borrow their techniques.

What are the advantages of having a normalised DW compared to a dimensional DW, and vice versa? Answer: the advantages of a dimensional DW are: a) flexibility, e.g. we can accommodate changes in the requirements with minimal changes to the data model; b) performance, e.g. you can query it faster than a normalised model; c) it's quicker and simpler to develop than a normalised DW, and easier to maintain. Purpose: to see whether the candidate has seen the other side of the coin. Many people in data warehousing only know Kimball/dimensional. A second purpose of this question is to check whether the candidate understands the benefit of dimensional modelling, which is a fundamental understanding in data warehousing.

What is 3rd normal form? {L} Give me an example of a situation where the tables are not in 3rd NF. Answer: no column is transitively dependent on the PK. For example, column1 is dependent on column2 and column2 is dependent on column3; in this case column1 is transitively dependent on column3. To make it 3rd NF we need to split it into 2 tables: table1, which has column1 and column2, and table2, which has column2 and column3. Purpose: a lot of people talk about 3rd normal form but don't know what it means. This is to test whether the candidate is one of those people. If they can't answer 3rd NF, ask 2nd NF; if they can't answer 2nd NF, ask 1st NF.

Tell me how to design a data warehouse, i.e. what are the steps of doing dimensional modelling? Answer: there are many ways, but it should not be too far from this order: 1. understand the business process; 2. declare the grain of the fact table; 3. create the dimension tables, including attributes; 4. add the measures to the fact tables (from Kimball's Toolkit book, chapter 2). Steps 3 and 4 could be reversed (add the facts first, then create the dims), but steps 1 and 2 must be done in that order. Understanding the business process must always come first, and declaring the grain must always come second. Purpose: this question is for a data architect or data warehouse architect, to see whether they can do their job. It's not a question for an ETL, report or cube developer.

How do you join 2 fact tables? Answer: it's a trap question. You don't usually join 2 fact tables, especially if they have different grains. When designing a dimensional model, you include all the necessary measures in the same fact table. If the measure you need is located in another fact table, then there's something wrong with the design: you need to add that measure to the fact table you are working with. But what if the measure has a different grain?
Then you add the lower-grain measure to the higher-grain fact table. What if the fact table you are working with has the lower grain? Then you need to get the business logic for allocating the measure. It is possible to join 2 fact tables, i.e. using the common dim keys, but the performance is usually horrible, hence people don't do this in practice, except for small fact tables (<100k rows). For example, if FactTable1 has dim1key, dim2key and dim3key, and FactTable2 has dim1key and dim2key, then you could join them like this:

    select f2.dim1key, f2.dim2key, f1.measure1, f2.measure2
    from
    ( select dim1key, dim2key, sum(measure1) as measure1
      from FactTable1
      group by dim1key, dim2key
    ) f1
    join FactTable2 f2
      on f1.dim1key = f2.dim1key and f1.dim2key = f2.dim2key

So if we don't join 2 fact tables that way, how do we do it? The answer is by using the fact key column. It is a good practice (especially in SQL Server, because of the concept of the clustered index) to have a fact key column to enable us to identify rows on the fact table. The performance would be much better than joining on dim keys, but you need to plan this in advance, as you need to include the fact key column on the other fact table:

    select f2.dim1key, f2.dim2key, f1.measure1, f2.measure2
    from FactTable1 f1
    join FactTable2 f2
      on f2.fact1key = f1.factkey

I implemented this technique originally for self-joining, but then expanded its usage to joining to other fact tables. But this must be used on an exception basis rather than as the norm. Purpose: not to trap the candidate, of course, but to see whether they have experience dealing with a problem which doesn't happen every day.
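The aggregate-then-join-on-dim-keys query shown earlier can be run end to end in SQLite. The table names (FactTable1, FactTable2) follow the example; the data values are invented. FactTable1 is first aggregated to FactTable2's grain, then joined on the common dim keys.

```python
import sqlite3

# FactTable1 is at (dim1, dim2, dim3) grain; FactTable2 is at (dim1, dim2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactTable1 (dim1key INTEGER, dim2key INTEGER, dim3key INTEGER, measure1 REAL);
    CREATE TABLE FactTable2 (dim1key INTEGER, dim2key INTEGER, measure2 REAL);

    INSERT INTO FactTable1 VALUES (1, 1, 1, 10.0), (1, 1, 2, 5.0), (1, 2, 1, 3.0);
    INSERT INTO FactTable2 VALUES (1, 1, 100.0), (1, 2, 200.0);
""")

# Aggregate FactTable1 down to FactTable2's grain, then join on dim keys.
rows = conn.execute("""
    SELECT f2.dim1key, f2.dim2key, f1.measure1, f2.measure2
    FROM ( SELECT dim1key, dim2key, SUM(measure1) AS measure1
           FROM FactTable1
           GROUP BY dim1key, dim2key ) f1
    JOIN FactTable2 f2
      ON f1.dim1key = f2.dim1key AND f1.dim2key = f2.dim2key
    ORDER BY f2.dim1key, f2.dim2key
""").fetchall()
print(rows)
```

On tiny tables like these the join is instant; the performance warning in the text applies once the fact tables reach realistic row counts.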
