Data Modeling

Introduction

Name
Experience in Database
Expectation from the class

Rajesh Kumar V.S rajesh@aroha.co.in

www.ilearnonline.co.in

Agenda

Data Modeling
Different types of models
Operational Systems
3rd NF (ER Modeling)
Enterprise Data Modeling
Modeling Lifecycle
Modeling stakeholders
Validating models
Questions

Model

There are many perspectives of a business:
How it works can be defined by the process model.
How the organization is organized can be defined by the org chart.
Where the organization operates can be explained by a location or geo map.
To understand the data flow of the application, you can create a data flow diagram.
What information is needed to run the business is defined by the data model.

Types of Models

Subject Area / Contextual
  Define the main terms and definitions for the high-level entities.
  Not every project will have this model; it is needed only when the scope is big.

Conceptual (Overview)
  Important entities which hold the application together.
  Business concepts and rules.
  Identifying the relationships between the entities.

Logical
  Complete, detailed requirements, resolving the M:M relationships identified in the conceptual model.
  Detailing all the attributes needed for every entity.
  Providing relationships in terms of PK and FK.

Physical Model
  Exploiting the RDBMS to take full advantage of the model.

Business to Physical Modeling


The Business -> Contextual -> Conceptual -> Logical -> Physical

Usually there is a 1:5 to 1:7 ratio of entities between different models.

About Data Modeling


A data model is the place through which you maintain information about things (entities).
The facts about those things are stored as rows.
It's a real thing, not a technology.
It is stable as long as the business does not change; if the business changes, the data model has to be flexible enough to accommodate that change.
It drives consistency in dealing with information.
It should always start simple.
The data model is relevant to the business.
As the business rules and requirements get complex, the painful detail comes.

About Data Modeling


To be a better data modeler you should have both domain knowledge and good exposure to data modeling concepts.
It's a living thing, so expect changes for sure.
It is our visualization of information; a proactive vision of the business is key to building a model which does not change often.
A data model is a non-technical description of your business in terms of the things it needs to know about.
By following a few techniques a business analyst can build great data models.
Collect all the facts about the business through knowledge transfer sessions, and spend at least 10% of your time reading about your business and its processes.
Get involved in discussions with various business line managers to get an overview of what they are doing.

Information Visualization

Business -> Line of Business -> Business Process -> Activity -> Information -> Data

Information Visualization

Business           HDFC
Line of Business   Credit Card
Business Process   Customer Care
Activity           Customer verification
Information        Verifier, address etc.
Data               Data

Information Visualization

Business           IBM
Line of Business   ????
Business Process   ????
Activity           ????
Information        ????
Data               ????

Zachman Framework

Zachman Framework -- stakeholders

Contextual Data Model


It explains the major processes.
It focuses on a specific problem area (SCOPE).
Master entities are defined.
At this level we never talk about the data you are going to store (no details).
It is an excellent starting point for data modeling.
Vocabulary: define the terms. (A classic example of confused terms is vendor / supplier.)

Conceptual / Logical Model

Conceptual Model
  Entities are identified
  Not normalized
  M:M relationships
  No keys
  Can list the important attributes

Logical Data Model
  M:M resolved
  Shows PK, FK and all attributes
  It's fully normalized
  All attributes are atomic

Generalization of the data model


Any business has to have the following entities in order to work:
  Customers / markets
  Products
  Services

What is the product you sell in a software service company?
What is the product you sell in an airline?

Create simple data model


Create a model to hold all the employee information and the career path each individual chooses.
Each career path has a certain set of training programs.
Individuals can choose training programs based on their current capabilities and future needs.
Assign individuals as participants in upcoming training programs based on the requests they have made.
Look for manager-recommended candidates for the training program first, then fill the remaining seats with the individuals who requested it.
Create the set of training programs and associate the proposed candidates based on the business rules.
The system should be able to capture attendance for the training program.

Visualizing the functionality

Attributes conceptual vs logical


Conceptual
  Customer: Contacts, Addresses, Bank info

Logical Model
  Customer:            Customer_ID, Customer_nm, Effective dt, In_eff_dt, Cust_crdit
  Contact type lookup: Cntc_typ_id, Short_dsc, description
  Address type lookup: Addr_typ_id, Short_dsc, description
  Bank info:           Cust_id, Account_id, Bank_id, Bank_nm, Address1, City
  Contacts:            Cust_id, Cntc_typ_id, Contact_nm, Email, phone
  Addresses:           Cust_id, addr_typ_id, Address1, Address2, City, State
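To make the step from conceptual to logical concrete, here is a minimal sketch of the customer attributes above as tables, using Python's built-in sqlite3 module. Only the column names come from the slide; the table names (customer, contact_type, address_type, customer_contact, customer_address, customer_bank), the data types and the choice of primary keys are assumptions made for illustration.

```python
import sqlite3

# Column names follow the slide; table names, types and keys are assumed.
DDL = """
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    customer_nm  TEXT NOT NULL,
    effective_dt TEXT,
    in_eff_dt    TEXT,
    cust_crdit   REAL
);
CREATE TABLE contact_type (
    cntc_typ_id INTEGER PRIMARY KEY,
    short_dsc   TEXT,
    description TEXT
);
CREATE TABLE address_type (
    addr_typ_id INTEGER PRIMARY KEY,
    short_dsc   TEXT,
    description TEXT
);
CREATE TABLE customer_contact (
    cust_id     INTEGER NOT NULL REFERENCES customer(customer_id),
    cntc_typ_id INTEGER NOT NULL REFERENCES contact_type(cntc_typ_id),
    contact_nm  TEXT,
    email       TEXT,
    phone       TEXT,
    PRIMARY KEY (cust_id, cntc_typ_id)
);
CREATE TABLE customer_address (
    cust_id     INTEGER NOT NULL REFERENCES customer(customer_id),
    addr_typ_id INTEGER NOT NULL REFERENCES address_type(addr_typ_id),
    address1    TEXT,
    address2    TEXT,
    city        TEXT,
    state       TEXT,
    PRIMARY KEY (cust_id, addr_typ_id)
);
CREATE TABLE customer_bank (
    cust_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    account_id TEXT NOT NULL,
    bank_id    TEXT,
    bank_nm    TEXT,
    address1   TEXT,
    city       TEXT,
    PRIMARY KEY (cust_id, account_id)
);
"""


def build_logical_model():
    """Create the tables in an in-memory database with FK checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


conn = build_logical_model()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The three conceptual boxes (Contacts, Addresses, Bank info) each become a child table keyed by the customer, and the two type lookups classify the contact and address rows.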

How to model (Steps)

Requirement gathering and understanding the business process is the foundation for getting the right data model.
  Gather the information
  Questions and answers
  Analyze and confirm
  Add to model
Make the above 4 steps an iterative process before freezing the model.
Apply the business scenarios and see whether the data model accommodates them.

Depicting and Documenting


Data Model Diagram (Order data model)
  Customer places Order
  Order contains Order Line
  specifies (between Product and Order Line)

Data Model Document
  Introduction
  Entity definition
  Attribute definition
  Additional business rules

About entities

An entity is a person, place, thing, event or anything else of interest to the enterprise, about which facts may be recorded.
You should name it in real-world terms.
Eventually an entity becomes a table in a relational database.
Examples: Employee, Region, Department, Customer.
Entities are not supposed to be designed from the input or output formats (screens and reports). This won't give us enough flexibility in the data model.

Entity Definition

Answers the question

Definition
Interesting points
Exceptions
Examples (demonstrate the data)
  List of instances
  Scenarios

Entity example

Building
  A ground-based structure which supports the company's business operations by housing employees, equipment, or supplies.
  Examples: office buildings, retail stores, plants and warehouses.
  Can be owned or leased.
  May or may not have a street address.

Key Points
  We have to write definitions like this to avoid misunderstandings between entities.
  It's crucial, so do them early.

Entity Types

Kernels / Master
  The central objects in the model.
  Everything else either further describes, associates or classifies the kernel entities.
  Can exist independently.
  Ideally, they are the starting point for modeling.

Associative
  Relates two other entities.
  Evolves from resolving M:M relationships.
  Important associative entities are shown in the conceptual model; the rest are detailed in the logical model.

Types
  A kind of entity that classifies or categorizes other entities.
  Example: the attribute Customer Type in the customer table, or the Account Type and Account tables.

Transaction tables

Attributes

A property of a thing that can be expressed as a piece of information: one of the facts about things that must be maintained.
Attributes are the properties of the entities.
Example: for the customer entity, the following are attributes:
  Cust_name, Cust_contact, Cust_address, Cust_city

Questions to ask
  Is it a fundamental attribute, or could it be a derived attribute?
  Which business rules are associated at the attribute level?
  Exceptions

Relationships in the RDBMS


An association between two things (entities) is called a relationship.
We have three different types of relationships in an RDBMS:
  1:1 (One to One): rare
  1:M (One to Many): common
  M:M (Many to Many): more in the conceptual model, none in the logical and physical models

Examples
  1:1 (Person to PAN ID)
  1:M (Customer to Phone)
  M:M (Doctor and Patient)

M:M relationships

An RDBMS does not support an M:M relationship directly.
Only in the conceptual model can we have this relationship.
In the logical model, we have to resolve this relationship.
We resolve M:M by using the associative table concept.

Conceptual: doctor >-----< patient (M:M)
Logical:    doctor ----< doc_pat (associative table) >---- patient
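The doctor and patient example can be sketched with an associative table as follows. This is a minimal illustration using Python's sqlite3 module; the table name doc_pat comes from the slide, while the column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE doctor  (doctor_id  INTEGER PRIMARY KEY, doctor_nm  TEXT);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, patient_nm TEXT);

-- Associative table: one row per (doctor, patient) pairing.
-- Each side of the M:M becomes a 1:M relationship into doc_pat.
CREATE TABLE doc_pat (
    doctor_id  INTEGER NOT NULL REFERENCES doctor(doctor_id),
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    PRIMARY KEY (doctor_id, patient_id)
);

-- Hypothetical sample data.
INSERT INTO doctor  VALUES (1, 'Dr. A'), (2, 'Dr. B');
INSERT INTO patient VALUES (10, 'P1'), (11, 'P2');
INSERT INTO doc_pat VALUES (1, 10), (1, 11), (2, 10);
""")


def patients_of(doctor_id):
    """All patients seen by one doctor."""
    rows = conn.execute(
        """SELECT p.patient_nm
           FROM doc_pat dp JOIN patient p ON p.patient_id = dp.patient_id
           WHERE dp.doctor_id = ? ORDER BY p.patient_nm""", (doctor_id,))
    return [r[0] for r in rows]


def doctors_of(patient_id):
    """All doctors who see one patient."""
    rows = conn.execute(
        """SELECT d.doctor_nm
           FROM doc_pat dp JOIN doctor d ON d.doctor_id = dp.doctor_id
           WHERE dp.patient_id = ? ORDER BY d.doctor_nm""", (patient_id,))
    return [r[0] for r in rows]


print(patients_of(1), doctors_of(10))
```

The composite primary key prevents the same pairing from being stored twice; attributes that belong to the pairing itself (for example a visit date) would also live in doc_pat.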

M:M relationship

Steps in resolving M:M
  Split it into two 1:M relationships.
  An M:M is always intersected by some other parameter.

Example: Customer and Product have an M:M relationship.
  In this case the intersecting parameter is the orders.

Example: Customer and Survey.
  In a survey you will have multiple questions, so the set of questions becomes the parameter.

Example: Course and Branches.
  A specific course can be conducted in more than one branch, and one branch conducts different courses.

Special cases

Recursion

Empno and mgrno are stored in the same entity.
Mgrno is also an employee number.
Usually this is implemented as a self-referential integrity constraint.
We define it as a foreign key which references the PK of the same table.
A recursive relationship is fully optional.
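A recursive relationship of this kind can be sketched as a table whose foreign key points back at its own primary key. The example below uses Python's sqlite3 module; empno and mgrno come from the slide, while the ename column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    empno INTEGER PRIMARY KEY,
    ename TEXT NOT NULL,
    -- Self-referential FK; nullable because the relationship is optional
    -- (the top of the hierarchy has no manager).
    mgrno INTEGER REFERENCES employee(empno)
);
-- Hypothetical sample data.
INSERT INTO employee VALUES (1, 'Head', NULL);
INSERT INTO employee VALUES (2, 'Lead', 1);
INSERT INTO employee VALUES (3, 'Dev',  2);
""")


def manager_chain(empno):
    """Names of the managers above an employee, nearest manager first."""
    chain = []
    row = conn.execute(
        "SELECT mgrno FROM employee WHERE empno = ?", (empno,)).fetchone()
    while row is not None and row[0] is not None:
        ename, mgrno = conn.execute(
            "SELECT ename, mgrno FROM employee WHERE empno = ?",
            (row[0],)).fetchone()
        chain.append(ename)
        row = (mgrno,)
    return chain


print(manager_chain(3))
```

Because mgrno is itself an employee number, the database rejects a manager number that does not exist in the same table.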

customer

account

phone

Call records

Sample Tables

Customer
Cust_id  Cust_name  Email id       Contact name  Cust_since
100      Citi       citi@citi.com  Bill H        10-JAN-08
101      HSBC       hsbc@hsbc.com  Tim D         15-APR-10
102      SBI        sbi@sbi.com    Ram K         12-APR-11

Account
Act_id  Cust_id  No_of_phones
1234    100      2
1235    101      3
1236    100      2
1237    102      1

Phone
Phone_no   Act_id
123456776  1234
987654321  1235
456780987  1234
098756782  1234

Call records
Call_id  Duration  Phone_no
56789    14        123456776
56790    15        987654321
----     ----      ----

The current system generates bills based on the customer id. Today Bank A bought Bank B, so Bank B's bills should now be generated for Bank A only. How do we change this model?

Normalization

It's a methodology we follow to make sure there is no redundant data in the data model.
It is part of the logical modeling process within database design.
Advantages:
  Reduced space in the database
  Transaction speed increases
Disadvantage:
  Complex queries: a query which joins a larger number of tables tends to perform worse.

customer
  Cust_id, Cust_name, Cust_dob, Cust_phone, Cust_email, Cust_city

Assume we have 10000 rows in this table, with customers from 5 different cities. Mumbai has 2000 customers.
In the real world one activity happened: the name of one of the cities changed again (Mumbai --> ).
What statement will you issue to record the same change in our data? How many records did you change?
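The city rename question can be tried out directly. The sketch below, using Python's sqlite3 module, loads 10000 customers into a flat table and into a normalized pair of tables, then counts the rows touched by the rename in each. The table names, the even split of 2000 customers per city, the other city names and the placeholder 'NEW_NAME' are all assumptions; the slide does not give the new name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat design: the city name is repeated on every customer row.
CREATE TABLE customer_flat (
    cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT);

-- Normalized design: the city name is stored once.
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_nm TEXT UNIQUE);
CREATE TABLE customer_norm (
    cust_id INTEGER PRIMARY KEY, cust_name TEXT,
    city_id INTEGER REFERENCES city(city_id));
""")

# Hypothetical city names apart from Mumbai.
cities = ["Mumbai", "City2", "City3", "City4", "City5"]
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 list(enumerate(cities, start=1)))

for cust_id in range(1, 10001):
    city_id = (cust_id - 1) // 2000 + 1          # assumed: 2000 per city
    name = f"cust{cust_id}"
    conn.execute("INSERT INTO customer_flat VALUES (?, ?, ?)",
                 (cust_id, name, cities[city_id - 1]))
    conn.execute("INSERT INTO customer_norm VALUES (?, ?, ?)",
                 (cust_id, name, city_id))

# One real-world event, recorded in each design.
flat_changed = conn.execute(
    "UPDATE customer_flat SET cust_city = 'NEW_NAME' "
    "WHERE cust_city = 'Mumbai'").rowcount
norm_changed = conn.execute(
    "UPDATE city SET city_nm = 'NEW_NAME' "
    "WHERE city_nm = 'Mumbai'").rowcount

print(flat_changed, norm_changed)
```

One event in the real world costs 2000 row changes in the flat table and a single row change once the city is moved into its own entity.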

Normalization process

It's about breaking entities into their most granular form.

1st Normal Form (1NF)
  Every attribute must be atomic.
  Repeating attributes are moved to a separate entity.
2nd Normal Form (2NF)
  Every non-key attribute must be fully functionally dependent on the whole primary key, not on only a part of a concatenated key.
3rd Normal Form (3NF)
  Every non-key attribute must depend only on the primary key, not on another non-key attribute.

In short:
  No repeating elements or groups of elements: 1st NF
  No partial dependencies on a concatenated key: 2nd NF
  No dependencies on non-key attributes: 3rd NF

Table to be normalized

CUSTOMER       VEHICLE  COST   DATE_OUT   DATE_IN    CUSTOMER_PHONE  CUSTOMER_CITY
Dolye, Dawn    Ford     59.99  10-SEP-01  12-SEP-01  123789489       DALLAS
Davidow, Joel  chevy    79.99  12-OCT-01  15-OCT-01  879393399       NEW YORK
Fox, Valerie   Suzuki   69.99  15-NOV-01  19-NOV-01  89303809        DALLAS
Vidal, Alina   Honda    39.99  24-NOV-01  25-NOV-01  74990039        NEW YORK
Uma T          Honda    39.99  23-APR-11  25-APR-11  89009000        New york
Dolye, Dawn    Ford     59.99  15-jun-11  19-jun-11  123789489       DALLAS

This spreadsheet-style data tracks all the transactions that happen when a customer rents a vehicle. It has some customer information, vehicle information, and the rented-out and returned dates. Normalize this table based on your assumptions.

Normalized model
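One possible normalization of the rental table is sketched below with Python's sqlite3 module. It is not necessarily the model the slide showed. It rests on assumptions that should be confirmed with the business: a customer is identified by phone number, the city belongs to the customer, and the cost is a property of the vehicle (the sample rows are consistent with this, but six rows prove nothing). All table and column names are hypothetical.

```python
import sqlite3

# The six rows from the table to be normalized.
ROWS = [
    ("Dolye, Dawn",   "Ford",   59.99, "10-SEP-01", "12-SEP-01", "123789489", "DALLAS"),
    ("Davidow, Joel", "chevy",  79.99, "12-OCT-01", "15-OCT-01", "879393399", "NEW YORK"),
    ("Fox, Valerie",  "Suzuki", 69.99, "15-NOV-01", "19-NOV-01", "89303809",  "DALLAS"),
    ("Vidal, Alina",  "Honda",  39.99, "24-NOV-01", "25-NOV-01", "74990039",  "NEW YORK"),
    ("Uma T",         "Honda",  39.99, "23-APR-11", "25-APR-11", "89009000",  "New york"),
    ("Dolye, Dawn",   "Ford",   59.99, "15-jun-11", "19-jun-11", "123789489", "DALLAS"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY, cust_name TEXT,
    phone TEXT UNIQUE, city TEXT);
CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY, make TEXT UNIQUE, cost REAL);
CREATE TABLE rental (
    rental_id  INTEGER PRIMARY KEY,
    cust_id    INTEGER NOT NULL REFERENCES customer(cust_id),
    vehicle_id INTEGER NOT NULL REFERENCES vehicle(vehicle_id),
    date_out TEXT, date_in TEXT);
""")

for name, make, cost, date_out, date_in, phone, city in ROWS:
    make = make.title()                       # 'chevy' -> 'Chevy'
    # INSERT OR IGNORE keeps the first copy of each customer / vehicle.
    conn.execute(
        "INSERT OR IGNORE INTO customer (cust_name, phone, city) VALUES (?, ?, ?)",
        (name, phone, city.upper()))
    conn.execute(
        "INSERT OR IGNORE INTO vehicle (make, cost) VALUES (?, ?)", (make, cost))
    cust_id = conn.execute(
        "SELECT cust_id FROM customer WHERE phone = ?", (phone,)).fetchone()[0]
    vehicle_id = conn.execute(
        "SELECT vehicle_id FROM vehicle WHERE make = ?", (make,)).fetchone()[0]
    conn.execute(
        "INSERT INTO rental (cust_id, vehicle_id, date_out, date_in) VALUES (?, ?, ?, ?)",
        (cust_id, vehicle_id, date_out.upper(), date_in.upper()))

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("customer", "vehicle", "rental")}
print(counts)
```

The repeat customer and the repeated vehicles are now stored once, and each rental row carries only the two foreign keys and the dates that are facts about the rental itself.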

Tables to analyze

Store name  Address                         Jan_sales  Feb_Sales  Mar_sales  Apr_Sales  May_sales  June_sales
ABC Stores  123, MGRoad, Chennai            345609     94040      45958      748490     849938     84949
BBC Stores  124, 5th Cross, Anna Nagar, CHN 849409     440400     9840940    89989      456655     23455

cust_nm    Address               Email 1           Email 2           Mobile1   mobile2
Ram Kumar  123, Housur Road, BLR ram@gmail.com     ramk@hotmail.com  84940390  484040398
Ramesh K   34,ABC St             ramK12@gmail.com  Null              89490490  null
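Both tables contain repeating groups (one column per month, numbered email and mobile columns), which is what 1NF asks us to move into a separate entity. As a small illustration, the Python sketch below turns the monthly sales columns of the store table into one row per store and month. The function name and the three-letter month codes are assumptions.

```python
# The store table as it appears on the slide: one column per month.
WIDE_ROWS = [
    {"store_name": "ABC Stores", "Jan_sales": 345609, "Feb_Sales": 94040,
     "Mar_sales": 45958, "Apr_Sales": 748490, "May_sales": 849938,
     "June_sales": 84949},
    {"store_name": "BBC Stores", "Jan_sales": 849409, "Feb_Sales": 440400,
     "Mar_sales": 9840940, "Apr_Sales": 89989, "May_sales": 456655,
     "June_sales": 23455},
]


def unpivot_sales(wide_rows):
    """Return (store_name, month, sales) rows, one per store and month."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column.lower().endswith("_sales"):
                month = column.split("_")[0][:3]   # 'June_sales' -> 'Jun'
                long_rows.append((row["store_name"], month, value))
    return long_rows


store_month_sales = unpivot_sales(WIDE_ROWS)
for line in store_month_sales:
    print(line)
```

In the row-per-month form, adding July needs a new row instead of a new column, and the same idea applies to Email 1 / Email 2 and Mobile1 / mobile2 in the customer table.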

What should we have in requirements

Agent Management -- Requirements


An agent is the resource through which we sell products to our end customers. Without an agent you cannot sell any product we have.
Products have different life spans, minimum payments, minimum terms etc.
A product can be sold only during its offer period.
The company provides training to agents when a new product is launched. Only agents who have completed the training can sell that product.
When a customer buys a product through an agent, we create a contract called a policy.
Every policy number uniquely identifies a product, a customer and an agent.
We get revenue after getting the contract and have to pay commission to the agent. This commission depends on the premium paid.
The system should be able to track premium payment data and notify the respective agents to follow up on payments which are due.
We need to store the customer's multiple addresses, and the agent's family, addresses and bank / account information for payment processing.
The organization should be able to generate the expected commission for next month based on the policy premiums we expect.

Conceptual Model for Agent System


Entities: Agent, Bank, Family, Product, Terms, Plans, Comm. plan, Training program, Customer, Policy, Agent commission, Policy transactions

Normalization

1NF, 2NF and 3NF are about breaking entities into their most granular form.
4NF and 5NF are also about granularity, but only the granularity of associative entities with 3 or more parents.

AB (two parents): no 4NF or 5NF concerns
ABC (three parents): possible 4NF or 5NF concerns

4NF and 5NF


Assume the associative entity ABC (parents A, B and C) is wrong.
  4NF: split ABC into AB and BC
  5NF: split ABC into AB, BC and CA

Example
Assignment is an associative entity with three parents: Program, Worker and Role.

This is OK if any combination is valid.

BUT if:
  Workers have defined roles
  Programs require certain roles
  Workers are assigned to programs independent of roles
then we will end up writing lots and lots of code to implement these business rules. If we normalize it, some of the rules are automatically taken care of.
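The three independent rules in the Worker, Role and Program example can be sketched as three pair tables, with the valid assignments derived by joining them. This is a minimal illustration with Python's sqlite3 module; the table names and sample rows are assumptions, and it only applies if the three rules really are independent of each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per independent business rule (AB, BC, CA).
CREATE TABLE worker_role    (worker  TEXT, role    TEXT, PRIMARY KEY (worker, role));
CREATE TABLE program_role   (program TEXT, role    TEXT, PRIMARY KEY (program, role));
CREATE TABLE worker_program (worker  TEXT, program TEXT, PRIMARY KEY (worker, program));

-- Hypothetical sample data.
INSERT INTO worker_role    VALUES ('Asha', 'Tester'), ('Asha', 'Developer'),
                                  ('Ravi', 'Developer');
INSERT INTO program_role   VALUES ('Billing', 'Developer'), ('Billing', 'Tester'),
                                  ('Portal', 'Developer');
INSERT INTO worker_program VALUES ('Asha', 'Billing'), ('Ravi', 'Portal'),
                                  ('Ravi', 'Billing');
""")


def valid_assignments():
    """Derive (worker, program, role) rows that satisfy all three rules."""
    return conn.execute("""
        SELECT wp.worker, wp.program, wr.role
        FROM worker_program wp
        JOIN worker_role  wr ON wr.worker  = wp.worker
        JOIN program_role pr ON pr.program = wp.program AND pr.role = wr.role
        ORDER BY wp.worker, wp.program, wr.role
    """).fetchall()


for row in valid_assignments():
    print(row)
```

With a single three-parent Assignment table, application code would have to reject every row that breaks one of the rules; here an invalid combination simply cannot be derived.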

Modeling Approach
Top Down
  A good way of getting a model.
  We think outside the box because we talk about entities, how two entities relate to each other, etc.
  Think about the business in a broader sense with the help of the business analysts / subject experts.
  Most companies follow the top-down approach and validate the model against scenarios.

Bottom Up
  We create the model based on the output.
  A kind of reverse engineering (sometimes).
  Easy to normalize, but we may miss out on the bigger picture.

Mutually exclusive relationship


Diagram: Order is related to either Customer or Internal department (exclusive arc).

In this scenario, an order can be placed only by a customer or by an internal department, so the relationship is mutually exclusive (solid lines: exactly one).
Another example is the payment table in a telecom billing system, which can have credit card, cash or check. For one payment you accept exactly one of these three, so it is mutually exclusive.
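One common way to implement a mutually exclusive relationship is two nullable foreign keys plus a CHECK constraint that allows exactly one of them to be filled. The sketch below uses Python's sqlite3 module; the table and column names are assumptions, and other implementations (for example a supertype table) are equally possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer            (cust_id INTEGER PRIMARY KEY, cust_nm TEXT);
CREATE TABLE internal_department (dept_id INTEGER PRIMARY KEY, dept_nm TEXT);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER REFERENCES customer(cust_id),
    dept_id  INTEGER REFERENCES internal_department(dept_id),
    -- Exactly one of the two parents must be present.
    CHECK ((cust_id IS NOT NULL AND dept_id IS NULL)
        OR (cust_id IS NULL AND dept_id IS NOT NULL))
);

-- Hypothetical sample parents.
INSERT INTO customer            VALUES (1, 'Customer one');
INSERT INTO internal_department VALUES (1, 'Facilities');
""")


def place_order(order_id, cust_id=None, dept_id=None):
    """Try to store an order; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, cust_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(place_order(1, cust_id=1))             # placed by a customer
print(place_order(2, dept_id=1))             # placed by a department
print(place_order(3, cust_id=1, dept_id=1))  # both parents: rejected
print(place_order(4))                        # no parent: rejected
```

The rule is enforced by the database itself, so no application code path can store an order that belongs to both parents or to neither.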

Mutually inclusive relationship


Diagram: Advertisement is related to Radio, Television and Internet (inclusive arc).

In this scenario, an advertisement can be placed in different media, so the relationship is mutually inclusive (dotted lines: one or more).

Claim Processing system


Entities: region, state, city, branches, agent, contract, contract type, category, product, customer, cust_type, claims, claim status, claim transaction, claim category

www.aroha.co.in

Modeling time and history


Principles
  Don't change the stored data. If you do, you lose the history and can only report on the data as it is now.
  Add new records when there is a change. This way you track the history of changes to that record and when each change happened.
  If changes can be corrected, capture the correction date as well, so that you have all the information needed to trace back what happened.
  Have audit columns in place to capture the time and history.
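The principle of adding new records instead of changing stored data can be sketched as an insert-only history table with an effective date and an audit column. The example uses Python's sqlite3 module; the table, the columns and the sample values are assumptions, and dates are stored as ISO strings so that they sort correctly as text.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_city_hist (
    cust_id      INTEGER NOT NULL,
    city         TEXT    NOT NULL,
    effective_dt TEXT    NOT NULL,   -- business date the value applies from
    created_ts   TEXT    NOT NULL,   -- audit column: when the row was written
    PRIMARY KEY (cust_id, effective_dt)
)""")


def record_city(cust_id, city, effective_dt):
    """Record a change by adding a row; existing rows are never updated."""
    created_ts = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO customer_city_hist VALUES (?, ?, ?, ?)",
                 (cust_id, city, effective_dt, created_ts))


def city_as_of(cust_id, day):
    """Return the city that was in effect for the customer on a given day."""
    row = conn.execute(
        """SELECT city FROM customer_city_hist
           WHERE cust_id = ? AND effective_dt <= ?
           ORDER BY effective_dt DESC LIMIT 1""", (cust_id, day)).fetchone()
    return row[0] if row else None


# Hypothetical sample history for one customer.
record_city(1, "Dallas", "2020-01-01")
record_city(1, "New York", "2023-06-01")

print(city_as_of(1, "2021-05-05"), city_as_of(1, "2024-01-01"))
```

Because nothing is overwritten, a report can be run as of any past date, and the audit column shows when each version was actually recorded.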

Logical / Physical Modeling


Logical Data Model            Physical Data Model
Business info and rules       Model in the database
Entity, Attribute             Tables, Columns
Primary Key, Alternate Key    Primary Key constraint, unique index
Inversion Key                 Non-unique index (for performance)
Rule                          Check constraint / triggers
Relationship                  PK / FK
Definition of the entity      Comments

Physical Model Implementation


Almost the same as the logical model, with little variation.
Take advantage of the database you are dealing with.
Make sure all the PKs and FKs are numbers. (Try a POC before we say this.)
Examples:
  Should I use partitions, materialized views, external tables etc.?
  Should we use a columnar database (Vertica, open-source DBs) for certain data marts?
  Designing a rolling window based on partitions to improve performance.
  Creating views to simulate the scenario rather than increasing the database size (especially for small tables).

Checklist for Physical Model


Logical model as the input.
Factors of the DB on which we deploy the tables (Oracle, SQL Server, DB2, MySQL etc.)
Data velocity factors (frequency of activities in the business)
Initial load and incremental load planning
Data volume calculation (capacity planning)
Identifying the importance of tables in terms of joins and table data volume
Deciding partitions, indexes, views, materialized views
Implementing a backup and recovery mechanism for the system.

Channel Partner Payment system


HP sells its products through channel partners. The scope of the project is related only to sales.
Through the presales system HP generates quotes and provides them to channel partners; channel partners sell the products to end customers based on the quote. (This is outside of this system.)
Certain channel partners can sell only certain HP products.
An end customer who buys HP products through a channel partner can be an individual or a company (we don't want to consider end customers as part of the scope).
Based on the sales made by the channel partner, HP has to raise invoices to the channel partner.
The system should have the ability to store the grade of the channel partner.
HP pays commission to channel partners automatically, once in three months, based on the sales they made.
The system should have the ability to store the addresses, different kinds of contacts and bank information of channel partners.
Channel partner payments against our invoices and the commission payments to channel partners are tracked only in this system.

De-normalization

It is a process through which we increase query performance.
It is used especially for reporting, not to speed up transaction processing.
To get the best of both worlds, create the normalized model for faster transaction processing and take advantage of Oracle's materialized views to make your reports run faster.
We are making another copy of the data, but the system takes care of it. This way we don't introduce new bugs.
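The idea of keeping the normalized tables for transactions and a system-maintained copy for reports can be imitated in a small sketch. SQLite has no materialized views, so the example below (Python's sqlite3 module) rebuilds a summary table from the normalized tables as a stand-in for a materialized view refresh. All table names, column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized tables used by transaction processing.
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER REFERENCES customer(cust_id),
    amount   REAL);

-- Hypothetical sample data.
INSERT INTO customer VALUES (1, 'Chennai'), (2, 'Chennai'), (3, 'Mumbai');
INSERT INTO orders   VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 70.0), (4, 1, 30.0);
""")


def refresh_sales_by_city():
    """Rebuild the denormalized reporting copy from the normalized tables."""
    conn.execute("DROP TABLE IF EXISTS sales_by_city")
    conn.execute("""
        CREATE TABLE sales_by_city AS
        SELECT c.city, SUM(o.amount) AS total_amount, COUNT(*) AS order_cnt
        FROM orders o JOIN customer c ON c.cust_id = o.cust_id
        GROUP BY c.city""")


refresh_sales_by_city()
report = conn.execute(
    "SELECT city, total_amount, order_cnt FROM sales_by_city ORDER BY city"
).fetchall()
print(report)
```

Reports read the pre-joined, pre-aggregated copy, while inserts and updates keep going to the normalized tables; in Oracle the database itself would own the refresh instead of a function like this one.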

Insurance Business

The company sells insurance to various customers through agents.
We sell different kinds of insurance policies such as risk, kids' education, endowment, pension etc.
Nominations must exist for every policy. A maximum of 2 nominations can exist for each policy.
We can surrender the policy once the minimum number of premiums has been paid. This number varies between policies.
Customers can get a loan against a policy they hold.
Agent commission should be processed by the system. The commission to the agent differs based on the policy as well as the premium paid.
When we surrender the policy, charges are applicable.
When we pay the premium late, interest should be added to the payment.
Build a conceptual model and a logical model for the business described above.

Telephone Billing

A customer comes and buys a telephone. A customer can have multiple phones.
Customers can be either corporate or individual.
Billing flexibility should be available based on certain phone numbers, so that I can send consolidated bills to different groups within the corporation.
The billing cycle can be decided by the customer.
We must maintain the different addresses of the customer.
We always have to mail the bill to the billing address, and should have the flexibility to send it to an electronic address as well.
A customer can subscribe to multiple services such as wireless and internet.
A customer can select a specific plan (rate).

Retail Case Study


Assume we want to create a data model for a retail chain.
The model should store supplier information and the order management through which we place orders with all the suppliers.
We have one warehouse, from which we distribute to various stores in the city, and we store the point-of-sale data.
Customers can return goods within 7 days. Some products cannot be taken back.
One employee can work at multiple stores.

Star Schemas for General Insurance


Dimensions: Training program, Agent, Customer, Policy, Coverage type, Policy status, Product, Time, Payment type, Channel, Location

Facts: Trg fact, Enroll fact, Exp rev fact, Premium fact, Comm. fact, Claims fact, Workflow fact

Insurance Model - Normalized model


Org_cat:          Org_cat_id, Desc
Organization:     Org_id, Org_cat_id, Desc, Manager_id
Department:       Dept_id, Dept_name, Org_id
Employees:        Emp_id, Ename, Joindate, Sal, Comm, Org_id
Customers:        Cust_id, Name, Emp_id, Address, Phone, Email, State
Product:          Prod_id, Name, Start_date, End_date
Cust Policies:    Cust_id, Policy_id, Prod_id, Start_dt, End_dt, notes
Transaction_type: Tran_type_id, desc
Transactions:     Cust_id, Policy_id, Trans_id, Date, Tran_type_id, amt

Snow flake - Insurance model


txn_fact:       Org_id, Emp_id, Cust_id, Product_id, Policy_id, Cal_date_id, Payment_amt, Claims_amt

org_dim:        Org_id, Name, Cat_id, Manager_nm
Cat_lookup:     Cat_id, Cat_desc

cust_dim:       Cust_id, Name, Address, Phone, State_id
State_lookup:   State_id, State_desc

emp_dim:        Emp_id, Ename, Join date, Sal, Dept_id
dept_lookup:    Dept_id, Dept_name

policy_dim:     Product_id, Name, Start_date, End_date

Cal_dim:        Cal_date_id, Date, Week_id
Week_lookup:    Week_id, Cal_week, Month_id
month_lookup:   Month_id, Cal_month, Quarter_id
Quarter_lookup: Quarter_id, Quarter, Year_id
Year_lookup:    Year_id, year

Normalized Model (Electronic AtoZ Store)


In store POS

Online Sales

Call Center

Thank you

Questions
