
ISOM

MIS710 Module 1b
Relational Model and Normalization
Arijit Sengupta

Structure of this semester

0. Intro: Database Fundamentals
1. Design: Conceptual Modeling, Relational Model, Normalization
2. Querying: Query Languages, Advanced SQL
3. Applications: Java DB Applications, JDBC
4. Advanced Topics: Transaction Management, Data Mining

Progression: Newbie, Users, Designers, Developers, Professionals

Today's Buzzwords

Relational Model
Superkey, Candidate Key, Primary Key and Foreign Key
Entity Integrity Rule
Referential Integrity Rule
Normalization
First, Second, Third, and Boyce-Codd Normal Forms
Unnormalization

Objectives of this lecture

Understand the Relational Model and its properties
Understand the notion of keys
Understand the use and importance of referential integrity
Provide an alternative way to design relations using semantics rather than concepts
Take an existing flat file design and create a relational design from it through the process of Normalization
Identify sources of problems (or anomalies) within a given relational design
Argue about improvements to designs created by others

Relational Data Model

Originally proposed by Codd in 1970
Based on mathematical set theory

Relation (the column headings are the attribute names, each row is a tuple, each cell is an attribute value):

ID   Name   Age  Address      GPA
S1   Jose   21   Stoned Hill  3.1
S2   Alice  18   BigHead      3.2
S3   Lin    32   Done-Audy    2.9
S4   Joyce  20   Atlanta      3.7
S5   Sunil  27   Mare-iota    3.2

Relation: Properties

A relation is a set of tuples


A tuple is a set of attribute-value properties
(relations)
Ordering of attributes is immaterial
Ordering of Tuples is immaterial

Tuples are distinct from one another


Attributes contain atomic values only

Emp#  Name                 Address
E1    'Jose' 'M.' 'Smith'  '3413 Main Street', 'Atlanta', GA

Attributes

Attribute name
Attribute names are unique within a relation

Attribute domain
Set of all possible values an attribute may take
Domain (GPA) =
Domain (name) =
Domain (DateOfBirth) =
Domain (year)

Number of attributes: degree of the relation

Tuples

Aggregation of attribute values


S1 = (s1, Jose, 21, StonedHill, 3.1)
S2 = (s2, Alice, 18, BigHead, 3.2)

Cardinality: Number of tuples in a relation

ID   Name   Age  Address      GPA
S1   Jose   21   Stoned Hill  3.1
S2   Alice  18   BigHead      3.2
S3   Lin    32   Done-Audy    2.9
S4   Joyce  20   Atlanta      3.7
S5   Sunil  27   Mare-iota    3.2

What is the difference between the cardinality and the degree?
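As a rough illustration, the student relation from the slide can be held in memory as a set of tuples, and the two counts read off directly. The variable names are made up for this sketch.

```python
# Hypothetical in-memory copy of the student relation shown on the slide.
attributes = ("ID", "Name", "Age", "Address", "GPA")
student = {
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
}

degree = len(attributes)    # number of attributes (columns)
cardinality = len(student)  # number of tuples (rows)
print(degree, cardinality)
```

Using a Python set mirrors two of the relation properties: tuples are distinct, and their order does not matter. Adding a student changes the cardinality; the degree changes only if the definition of the relation changes.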

Primary Keys

Superkey: SK, a subset of attributes of R, satisfying uniqueness, that is, no two tuples have the same combination of values for these attributes
Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property.
Primary Key: PK, the selected Candidate key, K.
Can a primary key be composed of multiple attributes?
Can a relation have multiple primary keys?
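The uniqueness and minimality conditions can be tried out on sample data. The sketch below is only an illustration: the helper names are invented here, and a check against one set of rows can refute a key but never prove it, because a key is a claim about every legal state of the relation.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs (uniqueness)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A superkey is a candidate key if no proper subset is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Sample rows taken from the student relation used earlier in the lecture.
rows = [
    {"ID": "S1", "Name": "Jose", "GPA": 3.1},
    {"ID": "S2", "Name": "Alice", "GPA": 3.2},
    {"ID": "S5", "Name": "Sunil", "GPA": 3.2},
]

id_name_is_superkey = is_superkey(rows, ("ID", "Name"))
id_name_is_candidate = is_candidate_key(rows, ("ID", "Name"))
id_is_candidate = is_candidate_key(rows, ("ID",))
gpa_is_superkey = is_superkey(rows, ("GPA",))
```

Here (ID, Name) is unique but not minimal, since ID alone is already unique, so it is a superkey and not a candidate key.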

Keys - example
Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)

Superkeys?

Candidate keys?

Primary key?

Entity Integrity Rule

The primary key of a base relation cannot contain a NULL value.
Enforcement of the rule:
An update which results in a NULL value in the primary key must be rejected.

Are the following ok?

Primary Key: (Course, Section)

Course  Section  Meets  Enrolled
201     1        MW     20
201     NULL     TTh    25
NULL    NULL     MWF    18
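A minimal sketch of the check a DBMS would apply to each tuple, assuming the primary key of the table above is the pair (Course, Section) and representing NULL as Python's None. The function name is made up for this example.

```python
def violates_entity_integrity(row, primary_key):
    """True if any primary-key attribute of the row is NULL (None)."""
    return any(row[attr] is None for attr in primary_key)

# The three tuples from the slide; None stands in for NULL.
rows = [
    {"Course": 201, "Section": 1, "Meets": "MW", "Enrolled": 20},
    {"Course": 201, "Section": None, "Meets": "TTh", "Enrolled": 25},
    {"Course": None, "Section": None, "Meets": "MWF", "Enrolled": 18},
]
primary_key = ("Course", "Section")  # assumption: composite key

rejected = [violates_entity_integrity(r, primary_key) for r in rows]
print(rejected)  # [False, True, True]
```

Under that assumption a NULL in any part of a composite key is enough to reject the tuple, not only a key that is NULL in every part.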

Foreign Key

Physician (ID, Name, ...)      Patient (ID, Name, PhysID*, ...)
Club (ID, Name, ...)           Player (ID, Name, ?*, ...)
Order (OrdID, Date, ..., ?*)   Customer (ID, Name, ..., ?*)
Dept (DeptID, Name, ..., ?*)   Employee (EID, Name, ..., ?*)

Attribute(s) of one relation that reference(s) the PK of another relation
FK may or may not be (a part of) the PK of this relation
Course (CourseID, Name, ..., ?*)
Class (ClassID, Meets, ..., ?*)
Student (SID, Name, ..., ?*)   Registration (?)

Can an FK refer to a part of the PK of another relation?
Can an FK refer to a PK of the same relation?

Foreign Key (continued)

FK and referenced PK may have different names
The values of FK must draw from the value set of PK

[Diagram labels: Primary Key Value Set, Foreign Key Domain, Domain]

How do we define the Domain of an FK?
Can an FK have a NULL value?
What can we enforce with PKs and FKs?

Referential Integrity Rule

If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then:
the FK value must match the PK value in some tuple of R1, or
the FK value may be NULL, but only if the FK is not (a part of) the PK of R2.

Enforcement of the Rule
An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected.

Which operation on the primary key may violate this rule?
Which operation on the foreign key may violate this rule?

Referential Integrity Enforcement

If an operation violates referential integrity:
Restrict
reject the operation

Cascade
try to propagate the operation to all dependent FK values; if that is not possible, reject the operation

Nullify (or Default)
set all dependent FK values to NULL (or a default value); if that is not possible, reject the operation

Cases for each of the above situations?
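One possible set of cases, sketched with Python's built-in sqlite3 module rather than the Oracle syntax used elsewhere in these slides. The physician and patient tables follow the Foreign Key slide; the visit table and all row values are hypothetical, added only so that a cascade can be shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not check FKs unless asked
conn.executescript("""
    CREATE TABLE physician (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE patient (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        phys_id INTEGER REFERENCES physician(id) ON DELETE SET NULL
    );
    -- hypothetical table, not on the slides
    CREATE TABLE visit (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(id) ON DELETE CASCADE
    );
    INSERT INTO physician VALUES (1, 'Greene');
    INSERT INTO patient VALUES (10, 'Waters', 1);
    INSERT INTO visit VALUES (100, 10);
""")

# Restrict: an FK value with no matching PK is rejected.
try:
    conn.execute("INSERT INTO patient VALUES (11, 'Byron', 99)")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# Nullify: deleting the physician sets the dependent FK to NULL.
conn.execute("DELETE FROM physician WHERE id = 1")
phys_after = conn.execute(
    "SELECT phys_id FROM patient WHERE id = 10").fetchone()[0]

# Cascade: deleting the patient removes the dependent visit rows.
conn.execute("DELETE FROM patient WHERE id = 10")
visits_left = conn.execute("SELECT COUNT(*) FROM visit").fetchone()[0]
```

Nullify is only available for patient.phys_id because that column allows NULL. The visit.patient_id column is declared NOT NULL, so cascade or restrict are the only workable choices there.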

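Creating Relations

-- Oracle-style DDL
create table STUDENT (
    ID          char(11) not null primary key,
    Name        char(30) not null,
    age         int,
    GPA         number(2,1));   -- one digit before and one after the decimal point

create table COURSE (
    courseno    char(6) not null primary key,
    coursename  char(30) not null,
    credithours number(2,1));

-- REGISTRATION resolves the many-to-many link between STUDENT and COURSE.
-- The foreign key columns take their types from the columns they reference.
create table REGISTRATION (
    ID          references STUDENT (ID)
                on delete cascade,      -- deleting a student removes the registrations
    CourseNum   references COURSE (courseno),
    primary key (ID, CourseNum));       -- composite primary key made of two FKs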

Normalization - Motivating Example

SID  Name    Grade  Course#  Text  Major  Dept
s1   Joseph  A      CIS800   b1    CIS    CIS
s1   Joseph  B      CIS820   b2    CIS    CIS
s1   Joseph  A      CIS872   b5    CIS    CIS
s2   Alice   A      CIS800   b1    CS     MCS
s2   Alice   A      CIS872   b5    CS     MCS
s3   Tom     B      CIS800   b1    Acct   Acct
s3   Tom     B      CIS872   b5    Acct   Acct
s3   Tom     A      CIS860   b1    Acct   Acct

Is there any redundant data?
Can we insert a new course# with a new textbook?
What should be done if CIS is changed to MIS?
What would happen if we remove all CIS 800 students?

Why Normalization?

Poor Relation Design causes Anomalies
Insertion anomalies - Insertion of some piece of information cannot be performed unless other irrelevant information is added to it.
Update anomalies - Update of a single piece of information requires updates to multiple tuples.
Deletion anomalies - Deletion of a piece of information removes other unrelated but necessary information.

Normalization improves the design to remove these anomalies
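The update and deletion anomalies can be seen by running two of the questions from the motivating example against its eight rows. This is a small sketch on an in-memory list; reading "CIS is changed to MIS" as a change to the Dept column is an assumption.

```python
# Rows of the motivating example: (SID, Name, Grade, Course#, Text, Major, Dept)
rows = [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]

# Update anomaly: one real-world change (department CIS renamed to MIS)
# has to be applied to every tuple that repeats the value.
rows_to_update = sum(1 for r in rows if r[6] == "CIS")

# Deletion anomaly: removing every CIS800 enrollment also removes the
# only record of which textbook CIS800 uses.
course_text_before = {r[3]: r[4] for r in rows}
remaining = [r for r in rows if r[3] != "CIS800"]
course_text_after = {r[3]: r[4] for r in remaining}

print(rows_to_update)
print("CIS800" in course_text_before, "CIS800" in course_text_after)
```

The insertion anomaly is the mirror image: a new course and its textbook cannot be stored in this relation until some student is enrolled, because every tuple needs a SID.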

Why Normalization?

Benefits
contain minimum amount of redundancy
allow users to insert, delete and modify tuples
in the relation without errors or inconsistencies.
improve quality of information in the database
decrease storage space for the database

Costs
may contribute to performance problems
may require more storage in some cases

Unnormalized Relation

STUDENT  STUDENT  COURSE  COURSE        INSTR   ROOM  CREDITS  GRADE
ID       NAME     ID      NAME          NAME
224      Waters   CIS20   Intro CBIS    Greene  205G  5        A
                  CIS40   Database Mgt  Hong    311S  5        B
                  CIS50   Sys.Analysis  Purao   139S  5        B
351      Byron    CIS30   COBOL         Brown   629G  3        B
                  CIS50   Sys.Analysis  Purao   139S  5        C
421      Smith    CIS20   Intro CBIS    Greene  205G  5        B
                  CIS30   COBOL         Brown   629G  3        B
                  CIS50   Sys.Analysis  Purao   139S  5        B

Create a Definition for this relation.
Do you see any problems in the definition?
Do you see any anomalies in the data?

Normal Forms

[Diagram labels: NF2, 1NF, 2NF, 3NF, BCNF]

Unnormalized Relation
    Only atomic attributes
First Normal Form
    Remove nonkey dependency
Second Normal Form
    Remove transitive dependency
Third Normal Form
    Dependency preservation: BCNF
    Remove Multi-valued Dependencies: 4NF
    Remove Join Dependencies: 5NF
Higher Order Forms

The Basis of Normalization

Functional Dependency (FD)
Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R.

Y is functionally dependent on X iff:
    value of X in r1 = value of X in r2
    implies
    value of Y in r1 = value of Y in r2

Also stated as: R.X → R.Y or X → Y
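The definition translates almost word for word into a check over the tuples of a relation. The sketch below uses a made-up helper name and a few rows from the motivating example. It can show that an FD fails on the data, but an FD that happens to hold on one sample is not thereby proven, since FDs come from the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance:
    tuples that agree on X must also agree on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

# A few tuples from the motivating example.
rows = [
    {"SID": "s1", "Course#": "CIS800", "Text": "b1", "Grade": "A"},
    {"SID": "s1", "Course#": "CIS820", "Text": "b2", "Grade": "B"},
    {"SID": "s2", "Course#": "CIS800", "Text": "b1", "Grade": "A"},
    {"SID": "s3", "Course#": "CIS860", "Text": "b1", "Grade": "A"},
]

course_determines_text = fd_holds(rows, ["Course#"], ["Text"])
text_determines_course = fd_holds(rows, ["Text"], ["Course#"])
sid_determines_grade = fd_holds(rows, ["SID"], ["Grade"])
```

Course# → Text holds on these rows even though text b1 appears for two different courses: the same Y value may repeat, it is the same X with different Y values that breaks an FD.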

Properties of FDs

If R.X → R.Y or X → Y
X is called the determinant of Y.
X may or may not be the key attribute of R.
A FD changes with its semantic meaning
Name → Address?

X and Y may be composite
X and Y may be mutually dependent on each other
Husband → Wife, Wife → Husband

The same Y value may occur in multiple tuples
Course# → Text

Fully Functional Dependencies

When is X → Y a FFD?
When Y is not functionally dependent on any proper subset of X

X → Y is a fully functional dependency (FFD)

( SID, Course# ) → Name?    ( SID, Course# ) → Grade?
( SID, Name ) → Major?
( SID, Name ) → SID?

By default, the term FD refers to FFD

Transitive Dependencies

Given attributes X, Y, and Z of a relation R,
Z is transitively dependent on X (X → Z)
iff
X → Y and Y → Z

For example:
SID → Dept, SID → Major,
Dept → School, Major → Dept

Do you see any Transitive Functional Dependencies?

Some Inference Rules for FDs

An FD is redundant if it can be derived from other FDs based on a set of inference rules. Some of these rules are:

Reflexive rule: If X ⊇ Y, then X → Y
X always determines a subset of itself.

Augmentation rule: If X → Y, then XZ → YZ
Adding attribute(s) on both sides does not change the FD.

Transitive rule: If X → Y and Y → Z, then X → Z
Functional dependencies can be chained.

Decomposition rule: If X → YZ, then X → Y and X → Z

Given: { SID → Name, SID → Major, Major → Dept }, which of the following are redundant?
SID → School, SID → Dept, Dept → School
SID → ( Name, Major ), (SID, Name) → (Major, Name)
SID → SID, SID → (Name, SID)
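Redundancy can also be tested mechanically with the standard attribute-closure procedure: collect everything X determines under the given FDs, then check whether Y falls inside that set. The function names below are chosen for this sketch only.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds.
    fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_implied(lhs, rhs, fds):
    """X -> Y is redundant with respect to fds if Y lies in the closure of X."""
    return set(rhs) <= closure(lhs, fds)

# The given set from the slide.
given = [
    ({"SID"}, {"Name"}),
    ({"SID"}, {"Major"}),
    ({"Major"}, {"Dept"}),
]

sid_dept = is_implied({"SID"}, {"Dept"}, given)           # transitive rule
sid_name_sid = is_implied({"SID"}, {"Name", "SID"}, given)
sid_school = is_implied({"SID"}, {"School"}, given)
dept_school = is_implied({"Dept"}, {"School"}, given)
```

Anything mentioning School comes out as not implied, because no given FD ever produces School. Those FDs would be new information, not redundant ones.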

First Normal Form



DEFINITION
A relation R is in first normal form (1NF) if and
only if all underlying domains contain atomic
values only.

Translation
To be in first normal form the table must not
contain any repeating attributes.

Implication
Are all relations in First Normal Form (1NF) ?

Example - 1NF

The unnormalized relation has been decomposed in two.

Relation: Student

StudentID  StudentName
224        Waters
351        Byron
421        Smith

Relation: Student-Course

StudentID  Course#  Course Title  Instrname  ROOM  CREDITS  GRADE
224        CIS20    Intro CBIS    Greene     205G  5        A
224        CIS40    Database Mgt  Hong       311S  5        B
224        CIS50    Sys.Analysis  Purao      139S  5        B
351        CIS30    COBOL         Brown      629G  3        B
351        CIS50    Sys.Analysis  Purao      139S  5        C
421        CIS20    Intro CBIS    Greene     205G  5        B
421        CIS30    COBOL         Brown      629G  3        B
421        CIS50    Sys.Analysis  Purao      139S  5        B

What are the PKs?
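A rough sketch of the flattening step, assuming the unnormalized relation is held as one record per student with a repeating group of courses. The nested structure and field names are invented for this illustration, and only the course number and grade are carried along to keep it short.

```python
# Hypothetical nested form: each student record carries a repeating group.
unnormalized = [
    {"StudentID": 224, "StudentName": "Waters",
     "Courses": [("CIS20", "A"), ("CIS40", "B"), ("CIS50", "B")]},
    {"StudentID": 351, "StudentName": "Byron",
     "Courses": [("CIS30", "B"), ("CIS50", "C")]},
    {"StudentID": 421, "StudentName": "Smith",
     "Courses": [("CIS20", "B"), ("CIS30", "B"), ("CIS50", "B")]},
]

# Student keeps the attributes that occur once per student.
student = [(s["StudentID"], s["StudentName"]) for s in unnormalized]

# Student-Course gets one row per element of the repeating group,
# with the StudentID copied into every row so the link is not lost.
student_course = [
    (s["StudentID"], course, grade)
    for s in unnormalized
    for course, grade in s["Courses"]
]

print(len(student), len(student_course))
```

Every value in both results is atomic, which is all that 1NF asks for. The redundancy in the course columns is untouched at this stage.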

Anomalies (with only 1NF)



Insertion Anomaly
A new course cannot be inserted in the database (relation
Student-Course) until a student registers for that course.

Update Anomaly
If the instructor of a course is changed, this fact would have
to be noted at many places in the database (many tuples of
the relation Student-Course).

Deletion Anomaly
Withdrawal of all students from an existing course (that is,
deletion of related tuples from the relation Student-Course)
will result in unwarranted removal of that course from the
database.

Anomalies in 1NF

Course (SID, Name, Grade, Course#, Text, Major, Dept)

1NF Relations have anomalies
Redundant Information ?
Update Anomalies ?
Insertion Anomalies ?
Deletion Anomalies ?

[Dependency diagram over the attributes SID, Name, Course#, Grade, Major, Dept, Text]

Second Normal Form



DEFINITION
A relation R is in second normal form (2NF) if
and only if it is in 1NF and every nonkey attribute
is dependent on the full primary key.

Translation
A table is in second normal form if there are no
partial dependencies.

Implication
What kinds of primary keys may lead to a
violation of the Second Normal Form (2NF) ?

Bubble Chart

Reconsider the example ..

[Bubble chart: StudentId + CourseId with the attributes StudentName, Credits, CourseTitle, Instructor, Classroom, Grade]

Dealing with Compound Keys

[Revised bubble chart: StudentId and CourseId drawn separately, with the attributes StudentName, Credits, CourseTitle, Instructor, Classroom, Grade]

Example - 2NF

STUDENT  STUDENT
ID       NAME
224      Waters
351      Byron
421      Smith

COURSE  COURSE            CREDITS
ID      TITLE
CIS20   Intro to CIS      5
CIS30   Java              3
CIS40   DBMS              5
CIS50   Systems Analysis  5

STUDENT  COURSE  GRADE
ID       ID
224      CIS20   A
224      CIS40   B
224      CIS50   B
351      CIS30   B
351      CIS50   C
421      CIS20   B
421      CIS30   B
421      CIS50   B

Anomalies with (only) 2NF



Insertion anomaly
Information about a faculty member (potential advisor) cannot be
added to the database unless a student is assigned to
him/her.

Update anomaly
If the advisor's office location or phone were changed, many
tuples would need to be changed.

Deletion anomaly
If all students assigned to an advisor graduate, information
about the advisor will disappear from the database.

Third Normal Form



DEFINITION
A relation R is in third normal form (3NF) if and only if
it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.

Translation
A table is in Third Normal Form if every non-key
attribute is determined by the key, and nothing else.

Implication
How many total attributes must the relation have for a
possible violation of the Third Normal Form (3NF) ?

3NF Example

Chalk out the relations.

[Dependency diagram: StudentId with StudentName, TotalCredits, Advisor, Status; Advisor with AdvisorOffice, AdvisorPhone]

How do you maintain student-advisor relation?

Boyce-Codd Normal Form (BCNF)

Update anomalies occur in a 3NF relation R if
R has multiple candidate keys,
Those candidate keys are composite, and
The candidate keys overlap.

Computer-Lab (SID, Account, Class, Hours)

A relation R is in BCNF iff every determinant is a candidate key.
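The BCNF condition can be checked with attribute closures. The slide gives only the schema of Computer-Lab, so the FDs below are an assumption made up for this sketch (each student has one account and each account one student, and SID plus Class determines Hours). The check uses the common textbook phrasing that the determinant of every nontrivial FD must be a superkey.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant does not determine the whole schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]

schema = ("SID", "Account", "Class", "Hours")
# ASSUMED dependencies, not stated on the slide:
assumed_fds = [
    (("SID",), ("Account",)),
    (("Account",), ("SID",)),
    (("SID", "Class"), ("Hours",)),
]

violations = bcnf_violations(schema, assumed_fds)
print(violations)
```

Under these assumed FDs the candidate keys are (SID, Class) and (Account, Class): multiple, composite and overlapping, which is the situation the slide describes. SID and Account are determinants without being keys, so the relation is not in BCNF.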

The Normalization Process

1. Flatten the table completely (no composite columns)
2. Find the key and all FDs (well, as many as you can possibly detect)
3. Find partial dependencies and decompose the relation using them (2NF)
4. Find transitive dependencies and decompose using them (3NF)
5. Remember this is not a deterministic method: it depends on the order in which FDs are chosen, so the same relation with the same set of FDs can lead to different decompositions!

Lossless Decomposition

A bad decomposition loses information
In a good decomposition
The join of decomposed relations restores the original relation
Decomposed relations can be maintained independently

Rissanen's rule for non-loss decomposition: Two projections R1 and R2 of a relation R are independent iff:
Every FD in R can be logically deduced from those in R1 and R2, and
The common attributes of R1 and R2 form a candidate key for at least one of the pair.
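The "join restores the original relation" test can be tried directly on sample data. In the sketch below the helper names are invented, and student s4 is a hypothetical extra row added so that two majors share a department. The sketch covers only the join condition, not the dependency-preservation half of Rissanen's rule.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Join two projections on whatever attribute names they share."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

rows = [
    {"SID": "s1", "Major": "CIS", "Dept": "CIS"},
    {"SID": "s2", "Major": "CS", "Dept": "MCS"},
    {"SID": "s3", "Major": "Acct", "Dept": "Acct"},
    {"SID": "s4", "Major": "Math", "Dept": "MCS"},  # hypothetical row
]
original = {frozenset(r.items()) for r in rows}

# Good: the common attribute Major is a key of (Major, Dept).
good = natural_join(project(rows, ["SID", "Major"]),
                    project(rows, ["Major", "Dept"]))

# Bad: the common attribute Dept is a key of neither projection.
bad = natural_join(project(rows, ["SID", "Dept"]),
                   project(rows, ["Major", "Dept"]))

print(good == original, len(bad) - len(original))
```

The bad decomposition returns every original tuple plus spurious ones, such as s2 paired with Math. The information is lost by gaining tuples: the join can no longer tell which major belongs to which student.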

Higher Normal Forms



Fourth Normal Form


Multivalued Dependencies (Fagin 1977)

Fifth Normal Form


Join Dependencies (Fagin 1979)

Other Dependencies
Inclusion Dependencies (Casanova 1981)
Template Dependencies (Sadri 1982)
Domain-Key Normal Form (Fagin 1981)

In-class Exercise: Normalize this