Data Modeling & Database Design Concepts
Database Development Life Cycle
• Preliminary Planning
• Feasibility Study
• Requirements Definition
• Conceptual Design
• Implementation
• Evaluation & Maintenance
You are familiar with these steps from
Software Engineering.
Data Models
• A model is a representation of real world objects
and their associations.
• A data model, in contrast, is a representation of
real-world data objects and their associations.
• Data models actually define how data items are
connected to each other and how they are
processed inside the system.
• More precisely, a data model is the logical structure
of a database. It describes the design of the database
to reflect entities, attributes, relationships among
data, constraints, etc.
• The purpose of a data model is to represent the
data and to make the data understandable. If it
does this, it can easily be used to design a
database.
• Many data models have been proposed in the
literature, and we can categorize them according
to the level at which they describe the
database structure.
• High Level or Conceptual Data Models provide
concepts that are close to the way many users
perceive data, whereas Low-Level or Physical
Data Models provide concepts that describe the
details of how data is stored in the memory of the
computer.
• Concepts provided by low-level data models are
generally meant for computer specialists, not for
typical end users.
• Between these two extremes is a class of
Representational (or Implementation or
Traditional) Data Models, which provide
concepts that may be understood by end users
but are not too far from the way data is
organized within the memory of the computer.
• Representational data models hide some (not all)
details of data storage but can be implemented
on a computer system in a direct way.
• Conceptual Data Models use concepts such as entities,
attributes and relationships. (We shall discuss the ER
Model in detail.)
• Representational Data Models are the models used
most frequently in traditional commercial DBMSs, and
they include the most widely used relational data
model as well as the so-called legacy data models – the
Network and Hierarchical Data Models.
• Representational Data Models represent data by using
record structures (a characteristic close to physical
data models) and hence are referred to as Record
Based Data Models.
• We can regard Object Data Models as a new family of
high level implementation data models that are closer
to the conceptual data models.
• In practice Conceptual Data Models are used
to derive the highest level abstraction of the
database and the model thus obtained is
converted to representational data model
that can be directly implemented on a
computer (i.e. that can be utilized to create
conceptual and external schemas).
• In the next few portions of this lecture we
shall discuss Conceptual and Representational
Data Models at length.
Data Associations
• Data associations are the relationships between various data
items (entities and attributes). Data modeling, in fact, is
used to represent the entities of interest and their
relationships in the database.
• Entities specify distinct real-world items in an application.
Basically they are nouns, in other words anything of
interest to the organization.
• Attributes are nothing but the properties of Entities.
• When a large amount of data is stored in a database, we
have to formalize the storage mechanism that will be used
to obtain the correct information from the database. We
have to establish a means of showing the relationship
among various sets of data represented in the database.
• A relationship between two sets X and Y is a
correspondence or mapping between members of the sets.
The relationship that may exist between any two sets
(X & Y) may be one-to-one, one-to-many or many-to-many.
[Figure: mappings between members of sets X and Y, illustrating One-to-One, One-to-Many and Many-to-Many relationships]
An Example...
• Consider the Entity “Employee”.
• Attributes of this Entity may be – EMP_ID, SSN,
Name, Salary, Address, Status etc.
• Either EMP_ID or SSN can be
chosen as the Primary Key (an attribute that is
capable of uniquely identifying a given
employee).
• Some possible associations (relationships)
among the attributes of Employee have been
shown in the figure on the next page.
[Figure: associations among the attributes of EMPLOYEE]
EMP_ID - SSN: One-to-One (1 : 1)
EMP_ID - NAME: Many-to-One (M : 1)
NAME - SSN: One-to-Many (1 : M)
SALARY - STATUS: Many-to-Many (M : N)
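The association types in the figure can be checked against sample rows. The following Python sketch assumes each row is a dictionary of attribute values; `association_type` and the `employees` rows are hypothetical, introduced only for illustration. Because a sample shows only what happens to be present, the result describes that sample, not a rule the database guarantees.

```python
from collections import defaultdict


def association_type(rows, a, b):
    """Classify the association between attributes a and b in sample rows.

    The ratio is read from a to b, as in the EMPLOYEE figure:
    'M:1' means many values of a share one value of b.
    """
    b_values_for = defaultdict(set)  # value of a -> values of b seen with it
    a_values_for = defaultdict(set)  # value of b -> values of a seen with it
    for row in rows:
        b_values_for[row[a]].add(row[b])
        a_values_for[row[b]].add(row[a])
    one_b_per_a = all(len(v) == 1 for v in b_values_for.values())
    one_a_per_b = all(len(v) == 1 for v in a_values_for.values())
    if one_b_per_a and one_a_per_b:
        return "1:1"
    if one_b_per_a:
        return "M:1"  # each a has one b, but some b is shared by many a
    if one_a_per_b:
        return "1:M"  # each b has one a, but some a has many b
    return "M:N"


# Hypothetical sample rows; names and values are illustrative only.
employees = [
    {"EMP_ID": 1, "SSN": "111", "NAME": "Ravi", "SALARY": 10000, "STATUS": "Permanent"},
    {"EMP_ID": 2, "SSN": "222", "NAME": "Ravi", "SALARY": 8000, "STATUS": "Permanent"},
    {"EMP_ID": 3, "SSN": "333", "NAME": "Sandy", "SALARY": 10000, "STATUS": "Temporary"},
    {"EMP_ID": 4, "SSN": "444", "NAME": "Lucy", "SALARY": 8000, "STATUS": "Temporary"},
]

# Prints 1:1, M:1, 1:M and M:N in turn, matching the figure.
for a, b in [("EMP_ID", "SSN"), ("EMP_ID", "NAME"), ("NAME", "SSN"), ("SALARY", "STATUS")]:
    print(a, b, association_type(employees, a, b))
```

Two employees named Ravi are what make EMP_ID to NAME many-to-one and NAME to SSN one-to-many in this sample.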
Relationships Among Entities
• Just as associations (relationships) exist
between the attributes of a given entity,
two entities may also be related.
• We distinguish between the association
that exists among the attributes of an
entity, called attribute association, and
that which exists between entities, called
a relationship.
An Example: Employees in an organization work on several
projects. Identify entities and their relationships.
• Possible Entities are:
– MANAGER
– EMPLOYEES
– DEPARTMENT
– PROJECTS
• Relationships Among Entities:
– Relationship between DEPARTMENT and MANAGER is 1 : 1.
– Relationship between MANAGER and EMPLOYEES is 1 : M.
– Relationship between EMPLOYEES and PROJECTS is M : N.
– Relationship between DEPARTMENT and EMPLOYEES is 1 : M.
– Relationship between PROJECTS and MANAGER is M : N.
– Relationship between PROJECTS and DEPARTMENT is M : 1.
[Figure: the six relationships listed above drawn as linked entity pairs: DEPARTMENT - MANAGER (1:1), MANAGER - EMPLOYEES (1:M), EMPLOYEES - PROJECTS (M:N), DEPARTMENT - EMPLOYEES (1:M), PROJECTS - MANAGER (M:N), PROJECTS - DEPARTMENT (M:1)]
Conceptual Data Modeling
Entity Relationship (E-R) Model
• We analyze the organization and try to obtain an ER diagram
(model).
• We concentrate only on the relationships among entities and on the
attributes of a single entity, not on attribute associations.
• An entity is shown as a rectangle.
• A diamond represents the relationship among a number of entities
which are connected to the diamond by lines.
• Attributes, shown as ovals, are connected to the entities or
relationships by lines.
• Diamonds, ovals and rectangles are labeled. The type of
relationship existing between the entities is represented by writing
the type of the relationship on the lines joining the relationship to
the entity.
• The entity relationships shown in the figure above are drawn
in the ER diagram as follows.
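[Figure: ER diagram fragments, with cardinalities written on the connecting lines]
DEPARTMENT (1) - Has - (1) MANAGER
MANAGER (1) - Manages - (M) EMPLOYEES
EMPLOYEES (M) - Work On - (N) PROJECTS
DEPARTMENT (1) - Consists - (M) EMPLOYEES
PROJECTS (M) - Managed By - (N) MANAGER
PROJECTS (M) - Assigned To - (1) DEPARTMENT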
ER Diagram for Employees – Projects Problem
Problem Statement:
• Several employees in various
departments managed by different
individual managers work on several
projects. Draw an ER diagram for
modeling the data of the organization.
Students – Teachers – Courses Problem
Problem Statement:
• In a University department students are
taught several courses by different
teachers. Identify entities and draw an
ER diagram to model the data of the
department.
Assignment Problems
1. A machine shop produces many parts which it takes on contract.
It has many machinists who can operate any of the machines. A
part needs working on only one machine. A record is kept on
the quality of material needed for producing each part. The
production of each part is tracked by giving a job number, start
time, end time and machinist identification. Obtain an ER
diagram for this problem.
2. A magazine is published monthly and is sent by post to its
subscribers. Two months before the expiry of subscription, a
reminder is sent to the subscriber. If the subscription is not renewed within a
month, another reminder is sent. If the renewal subscription is not received
up to two weeks before the expiry of the subscription, the
subscriber’s name is removed from the mailing list and the
subscriber is informed. Obtain an ER diagram to model the
situation.
3. A library receives 1300 journals of varying periodicities. Journal
receipts have to be recorded and displayed. Action has to be taken
when journals are not received in time or are lost in the mail. Unless a
request for replacement is sent quickly, it may not be possible to
get a replacement. Journals have to be ordered at different times
during the year and subscriptions renewed in time. Late payment of
subscriptions may lead to non-availability of earlier issues or to paying
high amounts for those issues. Draw an ER diagram for the problem.
4. An advertisement is issued giving the essential qualifications for the
course, the last date for receipt of the application form and a fee to be
enclosed with the application. A clerk in the registrar’s office
checks the received applications to see if marks sheet and fee are
enclosed and sends valid applications to the concerned academic
departments. The department checks the application in detail and
decides the applicants to be admitted, those to be put in waiting
list and those rejected. Appropriate letters are sent to the
registrar’s office which intimates the applicants. Draw ER diagram
for the problem.
5. Draw an ER diagram for a banking system.
Representational (Traditional) Data Models
Hierarchical Data Model
• The Hierarchical Data Model (HDM) uses the tree concept to represent
data and the relationships among data.
• The nodes of the tree are the record types (segments) representing the
entities and are connected by links.
• Each hierarchical tree can have only one root record type and this
record type doesn’t have a parent record type.
• The root can have any number of child record types, each of which can
itself be a root of another hierarchical tree.
• Each child record type can have only one parent record type, thus a
many to many relationship can’t be directly expressed between two
record types.
• Data in parent record applies to all its child records. A child record
occurrence must have a parent record occurrence.
• Deleting a parent record occurrence requires deleting all its child record
occurrences.
• A hierarchical tree can have any number of record occurrences for each
record type at each level of hierarchy.
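A minimal Python sketch of these rules, assuming an in-memory tree of record occurrences: `PARENT_OF`, `Segment` and its methods are hypothetical names, and the sample values are taken from the DEPARTMENT/EMPLOYEE record occurrences used in this lecture. Each record type has exactly one parent type, a child occurrence cannot be created without a parent occurrence, and deleting a parent occurrence deletes its children.

```python
# Hypothetical schema: every record type names its single parent type
# (None marks the root). A dict value holds only one parent, which is
# exactly the hierarchical restriction.
PARENT_OF = {"DEPARTMENT": None, "EMPLOYEE": "DEPARTMENT"}


class Segment:
    """One record occurrence (segment occurrence) in a hierarchical tree."""

    def __init__(self, record_type, fields, parent=None):
        expected = PARENT_OF[record_type]
        if expected is None and parent is not None:
            raise ValueError(f"{record_type} is the root type and has no parent")
        if expected is not None and (parent is None or parent.record_type != expected):
            raise ValueError(f"{record_type} needs a parent occurrence of type {expected}")
        self.record_type = record_type
        self.fields = fields
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def delete(self):
        """Delete this occurrence; all its child occurrences go with it."""
        for child in list(self.children):  # copy: children remove themselves
            child.delete()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def size(self):
        """Number of occurrences in the subtree rooted here."""
        return 1 + sum(child.size() for child in self.children)


physics = Segment("DEPARTMENT", {"DID": 100, "DNAME": "Physics", "LOCATION": "West"})
ravi = Segment("EMPLOYEE", {"EID": "E1", "ENAME": "RAVI"}, parent=physics)
sanjay = Segment("EMPLOYEE", {"EID": "E2", "ENAME": "SANJAY"}, parent=physics)
print(physics.size())  # 3
ravi.delete()
print(physics.size())  # 2
```

Creating `Segment("EMPLOYEE", {...})` without a parent raises `ValueError`, which mirrors the rule that a child record occurrence must have a parent record occurrence.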
Transforming an ER Model to Hierarchical
Data Structures (Hierarchical Model)
1. Transforming One to One Relationships:
• We apply the following rule:
– For each entity E in the ER model, create a record type (segment) S
in the hierarchical model. All attributes of E are represented as
fields of S. Any of the segments may be chosen as parent and the
other segment becomes the child.
2. Transforming One to Many Relationships:
• We apply the following two rules:
– For each entity E in the ER model, create a record type (segment) S
in the hierarchical model. All attributes of E are represented as
fields of S.
– For one to many relationship between two entities, create
corresponding tree structure diagrams, making each entity as
record type and making one to many relationship as a parent child
relationship. The record type (segment) on the many side of the
relationship becomes the child record type and the record type on
the one side of the relationship becomes the parent.
3. Transforming Many to Many Relationships:
• We apply the following two rules:
– For entities E1 and E2 that have a many to many relationship and
from which segments (record types) S1 and S2 have been derived,
construct two different trees: S1 to S2 and S2 to S1. In one tree S1
would be parent and in the other tree it would be child. Similarly in
one tree S2 would be child and in the other tree it would be parent.
– If a many to many relationship has common attribute data, create a
new intersection segment I which contains that data. Each of the
segment types created from the entities will function as a root of a
distinct tree. Insert the new segment between the two entity types
and establish the corresponding one to many relationships
between parent child segments. If any of those parent child
relationships are exactly one to one, the common attribute data
might be combined into segments created from entities.
Example (1 : 1 Relationships)
ER Model: DEPARTMENT (DID, DNAME, LOCATION) (1) - Has - (1) MANAGER (MID, MNAME, SEX)
Hierarchical Data Model:
Segment 1, Parent (Department): DID | DNAME | LOCATION
Segment 2, Child (Manager): MID | MNAME | SEX
Record Occurrences:
100 Physics West -> M100 Ravi M
101 Chemistry East -> M101 Sanjay M
102 Maths East -> M102 Sandy F
Example (1 : M Relationships)
ER Model:
DEPARTMENT (DID, DNAME, LOCATION) (1) - Has - (M) EMPLOYEES (EID, ENAME, GRADE, SALARY, SEX)
DEPARTMENT (1) - Provides - (M) RETIREMENT PLAN (ID, TYPE, DATE, AMOUNT)
RETIREMENT PLAN (1) - Has - (M) EMPLOYEES
Hierarchical Data Model:
Tree 1: Parent (Department) [One Side]: DID | DNAME | LOCATION
        Child (Employee) [Many Side]: EID | ENAME | GRADE | SALARY | SEX
Tree 2: Parent (Department) [One Side]: DID | DNAME | LOCATION
        Child (Retirement Plan) [Many Side]: ID | TYPE | DATE | AMOUNT
Tree 3: Parent (Retirement Plan) [One Side]: ID | TYPE | DATE | AMOUNT
        Child (Employee) [Many Side]: EID | ENAME | GRADE | SALARY | SEX
Record Occurrences:
Tree 1: 100 Physics West -> E1 RAVI A 10000 M; E2 SANJAY B 8000 M; E3 SANDY A 10000 F
Tree 1: 101 Chemistry East -> F1 SMITH A 10000 M; F2 CLARK B 8000 M; F3 LUCY A 10000 F
Tree 2: 100 Physics West -> 100 A 31Mar09 5 Lacs; 101 B 10Jan99 4 Lacs; 102 C 5Feb2002 3 Lacs
Tree 3: 100 A 31Mar09 5 Lacs -> 101 Narayan A 10000 M; 102 Shankar B 8000 M; 103 Ajit A 15000 M
Example (M : N Relationships)
ER Model: PRODUCTS (PID, PNAME, DESC) (M) - MadeBy - (N) MANUFACTURERS (MID, MNAME, LOCATION)
Hierarchical Data Model:
Tree 1: Parent (Products): PID | PNAME | DESC
        Child (Manufacturer): MID | MNAME | LOCATION
Tree 2: Parent (Manufacturer): MID | MNAME | LOCATION
        Child (Products): PID | PNAME | DESC
Record Occurrences:
Tree 2: 101 Bajaj Mumbai -> 1 Scooter 25000; 2 Bike 40000; 3 Three Wheeler 60000
Tree 1: 1 Scooter 25000 -> 101 Bajaj Mumbai; 102 LML Italy; 103 Kinetic Mumbai
Example (Relationships Carrying Attribute Data)
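ER Model: SUPPLIERS (S#, SNAME, STATUS, CITY) - Supply (QTY) - PARTS (P#, PNAME, COLOR, WEIGHT, CITY)
Hierarchical Data Model (QTY is held in an intersection segment):
Tree 1: Parent (Supplier): S# | SNAME | STATUS | CITY
        Intersection segment: QTY
        Child (Part): P# | PNAME | COLOR | WEIGHT | CITY
Tree 2: Parent (Part): P# | PNAME | COLOR | WEIGHT | CITY
        Intersection segment: QTY
        Child (Supplier): S# | SNAME | STATUS | CITY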
• If you carefully examine the relations used by
C.J. Date for the supplier-parts problem
throughout his book, they are:
S <S#, SNAME, CITY, STATUS>
P <P#, PNAME, COLOR, WEIGHT, CITY>
SP <S#, P#, QTY>
• This indicates that QTY is the attribute data of the
Supply relationship: it depends on a supplier and a part
together, so it belongs in the intersection segment.
Evaluation of Hierarchical Model
• There are three features that define the hierarchical data structures –
trees, segments and fields.
• While any ER model can be transformed to a hierarchical data structure,
the requirement that all database records be trees may result in
segment duplication.
• Any situation whose natural mapping results in a segment being a child
segment of two distinct parent segments requires that those parent
segments occur in separate trees.
• As an example consider the ER Model and its transformation to
Hierarchical Data Structures, on the next page.
• Such duplication has the following negative points:
– Storage space is used inefficiently as the segment is repeated.
– Possibility of inconsistent data is there, if the data are changed in one
segment copy but not in the other.
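[Figure: ER model E1 (1) - (M) E2 (N) - (M) E3 and its hierarchical data structures: Tree 1, E1 (parent) -> E2 (child); Tree 2, E2 (parent) -> E3 (child); Tree 3, E3 (parent) -> E2 (child). Segment E2 appears in two trees.]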
• This problem has been dealt with using virtual segments. A virtual
segment contains no data but has a pointer to a data segment where
data are stored.
• When a segment is required to be replicated in two or more trees, the
actual data are stored in just one of the trees. All other occurrences of
that data segment will contain a pointer to the location where the
actual data are stored.
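The difference between a duplicated segment and a virtual segment can be sketched in Python, with an object reference standing in for the pointer. `DataSegment`, `VirtualSegment` and the field values are hypothetical; a real hierarchical DBMS stores a physical address rather than a Python reference.

```python
import copy


class DataSegment:
    """A segment occurrence that physically stores its field values."""

    def __init__(self, fields):
        self.fields = fields


class VirtualSegment:
    """A segment with no data of its own, only a pointer to a DataSegment."""

    def __init__(self, target):
        self.target = target  # the pointer to where the actual data are stored

    @property
    def fields(self):
        return self.target.fields  # every read goes through the pointer


# Hypothetical E2 occurrence that must appear in two trees (E1 -> E2 and E3 -> E2).
e2_stored = DataSegment({"KEY": "E2-1", "STATUS": "old"})

# Without virtual segments: the second tree holds its own duplicate copy.
e2_duplicate = DataSegment(copy.deepcopy(e2_stored.fields))
# With virtual segments: the second tree holds only a pointer.
e2_virtual = VirtualSegment(e2_stored)

e2_stored.fields["STATUS"] = "new"  # update the data in the first tree only
print(e2_duplicate.fields["STATUS"])  # old  (the copy is now inconsistent)
print(e2_virtual.fields["STATUS"])    # new  (consistent through the pointer)
```

The duplicate also occupies its own storage, which is the first drawback listed above; the virtual segment stores only the pointer.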
• This is shown in the figure on the next page.
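[Figure: Virtual Segments. E2 data are stored once, under E1 (Tree 1). Where E2 would appear in the other trees (as parent of E3, and as child of E3), it is replaced by a virtual segment holding only a pointer (shown as 500) to the stored data.]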
Examples of Commercial Database Management Systems
Based on Hierarchical Approach
• Hierarchical data organization is rarely chosen for new databases
today, although some legacy systems (notably IMS) are still in use.
• A few legacy DBMSs based on this approach are:
– IMS: IBM’s Information Management System. It was
once the leading DBMS based on Hierarchical approach.
– TDMS: System Development Corporation’s Time Shared
Data Management System.
– Mark IV: Control Data Corporation’s Multi Access
Retrieval System.
– System 2000: SAS Institute’s Hierarchical DBMS.
Network Data Model
• There are two fundamental data structures in Network data
Model:
– Record Types (same as Segments in Hierarchical Data Model)
– DBTG Sets or simply Sets
• Record Types are defined in the usual way as collections of logically
related data items (fields). Record types are the same as segments in the
hierarchical model.
• DBTG Set or simply Set in the DBTG model expresses a one to
one or one to many relationship between two record types
(segments) or entities. The record type on the one side of the
one to many relationship of a DBTG set is called the Owner
record type and the record type on the many side of the one to
many relationship of a DBTG set is called Member record type. In
a one to one relationship any record type can be chosen as
Owner and the other becomes the member.
Transforming an ER Model to Network Data
Structures (Network Model)
• The transformation from conceptual (ER) modeling to network data
modeling is expressed by means of Bachman Diagrams.
• The following conventions are used for constructing Bachman
Diagrams:
– The sets are denoted by arrows between the record types,
with each arrow pointing to the member record type.
– Each set consists of an owner record type, a member record type
and a name for the set. The set name is the label given to the
arrow.
• Simple & Complex Networks: A conceptual data structure (ER
Model) in which all relationships are one to one or one to many is
called a Simple Network and a conceptual data structure in which
one or more relationships are many to many is called Complex
Network. Note that the DBTG network model allows only simple
networks in which all relationships are one to many or one to one. A
complex network can’t be directly implemented in DBTG model.
Rules for Transformation
1. Transforming One to One Relationships:
• We apply the following rule:
– For each entity E in the ER model, create a record type R in
the network model. All attributes of E are represented as
fields of R. Any of the record types may be chosen as owner
and the other record type becomes the member.
2. Transforming One to Many Relationships:
• We apply the following rules:
– For each entity E in the ER model, create a record type R in
the network model. All attributes of E are represented as
fields of R.
– For one to many relationship the record type on one side of
the relationship becomes the owner and the record type on
the many side of the relationship becomes the member.
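The owner/member rules can be sketched in Python as follows. `Record`, `DBTGSet`, `connect` and `owner_of` are hypothetical names, and the pairing of order 100 with customer Jones and sales person Ravi is assumed for illustration (the set names CUS-PO and SAP-PO come from the 1 : M example in this lecture). Within one set a member occurrence has at most one owner, yet the same member record may belong to several different sets.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_type: str
    key: str


@dataclass
class DBTGSet:
    """A named set type with one owner record type and one member record type."""
    name: str
    owner_type: str
    member_type: str
    # owner key -> member occurrences owned by that owner occurrence
    occurrences: dict = field(default_factory=dict)

    def connect(self, owner, member):
        if owner.record_type != self.owner_type or member.record_type != self.member_type:
            raise TypeError(f"{self.name} links {self.owner_type} to {self.member_type}")
        if self.owner_of(member) is not None:
            # within one set, a member occurrence has at most one owner (1:M)
            raise ValueError(f"{member.key} already has an owner in {self.name}")
        self.occurrences.setdefault(owner.key, []).append(member)

    def owner_of(self, member):
        for owner_key, members in self.occurrences.items():
            if member in members:
                return owner_key
        return None


cus_po = DBTGSet("CUS-PO", "CUSTOMER", "PURCHASE ORDER")
sap_po = DBTGSet("SAP-PO", "SALES PERSON", "PURCHASE ORDER")

jones = Record("CUSTOMER", "Jones")
ravi = Record("SALES PERSON", "Ravi")
order_100 = Record("PURCHASE ORDER", "100")

# The same order occurrence is a member of two different sets.
cus_po.connect(jones, order_100)
sap_po.connect(ravi, order_100)
print(cus_po.owner_of(order_100), sap_po.owner_of(order_100))  # Jones Ravi
```

Calling `cus_po.connect(Record("CUSTOMER", "Smith"), order_100)` afterwards raises `ValueError`, because the order already has an owner in CUS-PO.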
Example (1 : 1 Relationships)
ER Model: COUNTRY (CODE, NAME, CURRENCY) (1) - Has - (1) PM (SSN, PMNAME, SEX)
Network Data Model (either record type may be chosen as owner):
Owner (Country): CODE | NAME | CURRENCY --CON-PM--> Member (PM): SSN | PMNAME | SEX
OR
Owner (PM): SSN | PMNAME | SEX --PM-CON--> Member (Country): CODE | NAME | CURRENCY
Record Occurrences (as per the first choice above):
100 India Rupees -> 1 M.M. Singh M
101 Pakistan Rupee -> 2 Mr. Gilani M
Example (1 : M Relationships)
(for simplicity we are considering one attribute for each entity and are considering only 1:M relationships)
ER Model: CUSTOMER (CNAME) (1) - Gives - (M) ORDERS (O No.)
SALESPERSON (NAME) (1) - Executes - (M) ORDERS
(For simplicity I have considered entities only. You are advised to use the complete
record types as we have done in the previous example.)
Network Data Model:
Owner CUSTOMER --CUS-PO--> Member PURCHASE ORDER <--SAP-PO-- Owner SALES PERSON
Record Occurrences:
Customers: Jones, Smith, Bill
Sales Persons: Ravi, Mohan, Sanjay
Orders: 100, 103, 109
3. Transforming Many to Many Relationships:
• When two entities are connected in many to many relationship we
create an intersection or link record type consisting of at least the
key attributes from both the entities. Other attributes may be
added at the discretion of the designer.
• What is a link record?
– A dummy record type that is created in order to convert a
complex network into an equivalent simple network is called a
link record or link record type.
– With the creation of the link record type, all many to many
relationships are converted into equivalent one to many
relationships, which is what the DBTG Network Model requires.
• So, to derive the Network Data Structure, we apply the following
rule:
– For each many to many relationship between entities E1 and E2,
create a link record type L, and make it the member record type
in the two set types, of which the set types owners are the
record types corresponding to E1 and E2.
Example (M : N Relationships)
ER Model: STUDENTS (ENO, SNAME) (M) - Attend - (N) COURSES (CID, CNAME)
Network Data Model:
Owners: STUDENTS (SID, SNAME) and COURSES (CID, CNAME)
Sets: STU-LREC (owner STUDENTS) and COU-LREC (owner COURSES)
Member: LREC (SID, CID)
Record Occurrences:
Students: S1 JONES, S2 CLARK, S3 SMITH
LREC: S1 C2, S1 C3, S1 C4, S2 C1, S2 C3, S2 C4, S3 C1, S3 C4
Courses: C1 PHY, C2 MATHS, C3 ECO, C4 CHEM
(Notice that on both sides the relationship is now 1:M.)
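The link records in this example can be traversed in either direction. In this Python sketch each LREC occurrence is a `(SID, CID)` tuple, and `courses_of` and `students_in` are hypothetical helpers that follow the STU-LREC and COU-LREC sets; the data are the record occurrences of the example.

```python
# Record occurrences from the STUDENTS-COURSES example.
students = {"S1": "JONES", "S2": "CLARK", "S3": "SMITH"}
courses = {"C1": "PHY", "C2": "MATHS", "C3": "ECO", "C4": "CHEM"}

# LREC occurrences: one link record (SID, CID) per student-course pair.
# Each link record has exactly one owner in STU-LREC (its SID) and one
# owner in COU-LREC (its CID), so both sets are plain 1:M sets.
lrec = [("S1", "C2"), ("S1", "C3"), ("S1", "C4"), ("S2", "C1"),
        ("S2", "C3"), ("S2", "C4"), ("S3", "C1"), ("S3", "C4")]


def courses_of(sid):
    """Follow STU-LREC from a student owner to its link records."""
    return [courses[cid] for s, cid in lrec if s == sid]


def students_in(cid):
    """Follow COU-LREC from a course owner to its link records."""
    return [students[sid] for sid, c in lrec if c == cid]


print(courses_of("S1"))   # ['MATHS', 'ECO', 'CHEM']
print(students_in("C4"))  # ['JONES', 'CLARK', 'SMITH']
```

The many to many relationship is recovered only by combining the two 1:M traversals through the link records.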
Hierarchical Vs. Network Data Model
• Consider the Network Data Structure shown on the
next page for the Customer-Sales Person-Purchase
problem.
• This example indicates the clear difference
between hierarchical and network models.
• In this figure, the PURCHASE ORDER record type is
a member (Child) of two sets (Parents) – CUS-PO
and SAP-PO.
• In the hierarchical data model no record type can
be a member (child) of two different sets (parents).
[Figure: Owner CUSTOMER --CUS-PO--> Member PURCHASE ORDER <--SAP-PO-- Owner SALES PERSON]

Examples of Commercial Database Management Systems
Based on Network Approach
• Network data organization is rarely chosen for new databases today,
although some legacy systems (for example, IDMS) are still in use.
• A few legacy DBMSs based on this approach are:
– IDMS/R: The most widely used commercial implementation
of the DBTG network data model. It stands for Integrated
Database Management System/Relational.
– DMS 1100: from UNIVAC.
– TOTAL: from Cincom.
– DBOMP (Database Organization and Maintenance Processor):
from IBM.
Relational Data Model
• In 1970 the way many people viewed databases was permanently changed when E.F.
Codd introduced the relational model.
• In this model relation is the only construct required to represent the associations
among the attributes of an entity as well as the relationships among different
entities.
• One of the major reasons for introducing this model was to increase the
productivity of application programmers.
• Users need not know the exact physical structure to use the database. They are,
however, required to know how the data has been partitioned into various relations.
• In relational data model, relation is the only data structure used to represent
entities and relationships among them.
• In addition Codd proposed two data languages which promised more power in
accessing and processing the data. These Languages are:
– Relational Algebra
– Relational Calculus
• Today, these languages provide the basis for the commercial database languages
used in many of the most popular commercial DBMSs.
Terminology of Relational Model
Relation:
Given a collection of sets $D_1, D_2, \ldots, D_n$, $R$ is a relation on
those $n$ sets if it is a set of ordered $n$-tuples $\langle d_1, d_2, \ldots, d_n \rangle$
such that $d_1 \in D_1, d_2 \in D_2, \ldots, d_n \in D_n$. The sets $D_1, D_2, \ldots, D_n$
are called the domains of $R$ and the value $n$ is called the
degree of $R$.
OR
We define $R$ to be a relation on sets $D_1, D_2, \ldots, D_n$ if it is a
subset of their Cartesian product $D_1 \times D_2 \times \cdots \times D_n$, which is the
set of all possible ordered $n$-tuples:
$$D_1 \times D_2 \times \cdots \times D_n = \{\langle d_1, d_2, \ldots, d_n \rangle : d_1 \in D_1, d_2 \in D_2, \ldots, d_n \in D_n\}$$
An Example
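Let the domain of S# be $\{S1, S2\}$ and the domain of P# be $\{P1, P2\}$. Their Cartesian product is
$$S\# \times P\# = \{\langle S1,P1\rangle, \langle S1,P2\rangle, \langle S2,P1\rangle, \langle S2,P2\rangle\}$$
which, written as a table, is:
S# P#
S1 P1
S1 P2
S2 P1
S2 P2
Then $R = \{\langle S1,P1\rangle, \langle S2,P1\rangle\}$, being a subset of
$S\# \times P\#$, is a relation on these sets.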
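The definition can be checked mechanically. This Python sketch builds S# X P# with `itertools.product` and tests whether R is a subset of it; `is_relation` is a hypothetical helper that checks each tuple component against its domain without materializing the whole Cartesian product.

```python
from itertools import product

S_domain = {"S1", "S2"}  # domain of S#
P_domain = {"P1", "P2"}  # domain of P#

# Cartesian product S# X P#: every ordered pair <s, p>
cartesian = set(product(sorted(S_domain), sorted(P_domain)))
print(sorted(cartesian))
# [('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1'), ('S2', 'P2')]


def is_relation(r, *domains):
    """True if every tuple of r is an ordered n-tuple drawn from the domains."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in r
    )


R = {("S1", "P1"), ("S2", "P1")}
print(R <= cartesian, is_relation(R, S_domain, P_domain))  # True True
print(len(next(iter(R))))  # degree of R: 2
```

Order matters in a tuple: `is_relation({("P1", "S1")}, S_domain, P_domain)` is False, because the first component must come from the S# domain.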
• Clearly, a relation represents a unique two dimensional
table consisting of rows and columns of data.
• Thus relational model organizes and represents the data in
the form of tables (the two dimensional data structure).
Cardinality of a Relation:
• The number of rows (tuples) in a relation is called the
cardinality of the relation.
• We have already defined the degree of a relation. Relations
of degree one are called unary relations, that of degree two
are called binary relations, that of degree three are called
ternary relations and so on.
• A relation of degree n is called n-ary relation.
Domains and Attributes:
• Each column in a relation is an attribute of the relation. For example
in our relation S# and P# are attributes.
• The set of all possible values that an attribute may have is called the
domain of attribute.
• Every attribute in a relation is defined on a domain. Domains may
be distinct for each attribute or two or more attributes may be
defined on the same domain.
• Note that at any given instant of time there will be values in the
domain that do not currently appear as values in the corresponding
attribute.
Tuples:
• The rows of a relation are called tuples. It is assumed that there is
no predefined order of the rows or tuples of a relation and that no
two tuples can have identical set of values.
Recursive Relationship:
• To understand the concept of recursive relationship, consider the ER
diagram on the next page.
• A relationship which relates an object set or an entity set to itself is
called the recursive relationship.
• For example, in the following diagram the relationship “supervises” is a
recursive relationship.
• We can derive the following relations from the ER diagram in question:
WORKER<WORKER_ID,NAME,HOURLY_RATE,SKILL_TYPE,SUPV_ID>
ASSIGNMENT<WORKER_ID,BUILD_ID,START_DATE,NUM_DAYS>
BUILDING<BUILD_ID,BUILD_ADDRESS,TYPE,QUALITY_LEVEL,STATUS>
SKILL<SKILL_TYPE,BONOUS_RATE,HOURS_PER_WEEK>
NULL Values:
• Suppose an attribute is not applicable in a specific case. For
example some employees in the WORKER relation don’t have
supervisors (or they themselves are supervisors).
• In that case no value exists corresponding to SUPV_ID attribute
for such employees.
• In addition when we are entering data for a row in a relation, we
might not know the values of one or more of the attributes for
that row.
• In either case we enter nothing, and the row is recorded in the
database with null values for those attributes.
• A null value is neither a blank nor a zero.
• It simply marks a value that is unknown or not applicable; an unknown value may be supplied at a later time.
Relational Keys
• A key is a single attribute or combination of two or more attributes of a relation that
is used to uniquely identify a tuple in the relation.
• For example, in the WORKER relation the WORKER_ID attribute is a key. When we
refer to the term key, we normally mean the primary key.
• Here we concentrate on the terminology used for relational keys.
Superkey:
• If we add additional attributes to a key (primary key) and the resulting combination
of attributes still uniquely identifies a tuple in the relation, it is called a superkey of
the relation.
• For example the combination (WORKER_ID,Name) uniquely identifies each worker
in the WORKER relation, so it is a superkey of the relation.
• Clearly, a key of a relation is a minimal set of such attributes.
• That is, a key is a minimal superkey. By minimal we mean that no subset of the set
of key attributes will uniquely identify tuples in the relation.
• As an example, consider the relation SP<S#,P#,QTY>. The key of the relation is (S#,P#). If
we add an additional attribute, i.e. QTY, to this attribute combination, we get the
superkey (S#,P#,QTY). Clearly the key is the minimal set of attributes
that uniquely identifies each row in the relation. No subset of the key attributes (i.e.
S# or P# individually) can be used as a key.
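Superkeys and key minimality can be tested against sample data. In this Python sketch, `is_superkey`, `is_key` and the `sp` rows are hypothetical. A sample can only disprove uniqueness: a combination that happens to be unique in a few rows is not thereby guaranteed to be a key for all future data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs (in this sample)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A key is a minimal superkey: no proper subset is itself a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical SP rows (QTY values are illustrative only).
sp = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 300},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400},
]

print(is_key(sp, ("S#", "P#")))         # True
print(is_key(sp, ("S#", "P#", "QTY")))  # False: a superkey, but not minimal
```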
Composite Key:
• A key consisting of more than one attribute is called a composite key.
• For example in the ASSIGNMENT relation the attributes combination
(WORKER_ID, BUILD_ID) uniquely identifies each tuple in the relation, so key
to the relation is composite.
• In the same manner the attributes combination (S#,P#) is the composite key
to SP relation.
Candidate Keys:
In any given relation there may be more than one set of attributes that could be
chosen as a key. These are called candidate keys.
It may appear for example that NAME is a candidate key in the WORKER
relation. This would be so if we could assume that NAME will always be unique.
Primary Key:
When one of the candidate keys is selected as the relational key, it is called the
Primary Key.
The remaining candidate keys would be considered as alternate keys.
When we use the term key, we mean the primary key.
Secondary Key:
A secondary key is an attribute or combination of attributes that may not be a
candidate key but that classifies the relation based on a particular
characteristic (or criterion).
• Actually, a secondary key is one that identifies a set of records having
the same values for the secondary key attribute.
• For example in the relation WORKER, the SKILL_TYPE attribute is a
secondary key when the following query is executed:
SELECT NAME, WORKER_ID FROM WORKER WHERE SKILL_TYPE = 'Engg.';
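A secondary-key query of this kind can be tried with Python's built-in `sqlite3` module. This sketch assumes a cut-down WORKER table using rows from the WORKER listing in this lecture; since that listing has no 'Engg.' skill, it selects on 'CE' instead. The index name `WORKER_SKILL_IDX` is hypothetical; an index on a secondary key attribute is optional and only speeds up such lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WORKER (WORKER_ID INTEGER PRIMARY KEY, NAME TEXT, SKILL_TYPE TEXT)")
conn.executemany("INSERT INTO WORKER VALUES (?, ?, ?)", [
    (100, "Ganesh", "EE"), (101, "Rajesh", "ME"), (102, "Mahesh", "CE"),
    (103, "Mukesh", "AR"), (104, "Ravi", "CE"), (105, "Meena", "COE"),
])
# Optional secondary index on the secondary key attribute (hypothetical name).
conn.execute("CREATE INDEX WORKER_SKILL_IDX ON WORKER (SKILL_TYPE)")

# The secondary key value selects a set of records, not a single one.
ce_workers = conn.execute(
    "SELECT NAME, WORKER_ID FROM WORKER WHERE SKILL_TYPE = ? ORDER BY WORKER_ID", ("CE",)
).fetchall()
print(ce_workers)  # [('Mahesh', 102), ('Ravi', 104)]
```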
Foreign Key:
• A foreign key is a set of attributes (or single attribute) in a relation that
is a primary key in another relation (or in the same relation).
• For example, SKILL_TYPE attribute in the WORKER relation and
BUILD_ID attribute in the ASSIGNMENT relation are foreign keys as they
are primary keys in the relations SKILL and BUILDING respectively.
Importance of Foreign Keys:
• Foreign keys are the essential links between relations. They are used to
tie data in one relation to the data in another relation.
• Thus SKILL_TYPE links the WORKER relation to the SKILL relation, and
BUILD_ID in the ASSIGNMENT relation (together with WORKER_ID) shows the
link between the WORKER and BUILDING relations.
Recursive Foreign Key:
• Foreign key attributes need not have the same names as the key attributes to
which they correspond (or refer).
• For example consider the listing of WORKER relation:
WORKER_ID  NAME    HOURLY_RATE  SKILL_TYPE  SUPV_ID
100        Ganesh  10.00        EE          102
101        Rajesh  12.00        ME          100
102        Mahesh  13.00        CE          (null)
103        Mukesh  15.00        AR          104
104        Ravi    11.00        CE          (null)
105        Meena   12.00        COE         102
• WORKER_ID and SUPV_ID in the WORKER relation have different names,
but both take their values from the same domain (i.e. the domain of worker
identification numbers).
• The SUPV_ID is a foreign key in the worker relation that references the key of
its own relation. Such foreign keys are called Recursive Foreign Keys.
• Thus a recursive foreign key is nothing but a foreign key that references the key
of its own relation.
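A recursive foreign key can be declared and followed with a self-join. This sketch uses Python's `sqlite3` module and the WORKER listing above. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the rejected worker 106 'Asha' is a hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("""
    CREATE TABLE WORKER (
        WORKER_ID   INTEGER PRIMARY KEY,
        NAME        TEXT NOT NULL,
        HOURLY_RATE REAL,
        SKILL_TYPE  TEXT,
        SUPV_ID     INTEGER REFERENCES WORKER (WORKER_ID)  -- recursive foreign key
    )""")
# Workers without a supervisor are inserted first so every reference resolves.
conn.executemany("INSERT INTO WORKER VALUES (?, ?, ?, ?, ?)", [
    (102, "Mahesh", 13.00, "CE", None),
    (104, "Ravi", 11.00, "CE", None),
    (100, "Ganesh", 10.00, "EE", 102),
    (101, "Rajesh", 12.00, "ME", 100),
    (103, "Mukesh", 15.00, "AR", 104),
    (105, "Meena", 12.00, "COE", 102),
])

# Self-join: each worker row is matched with the row its SUPV_ID points to.
pairs = conn.execute("""
    SELECT w.NAME, s.NAME
    FROM WORKER AS w LEFT JOIN WORKER AS s ON w.SUPV_ID = s.WORKER_ID
    ORDER BY w.WORKER_ID""").fetchall()
print(pairs)
# [('Ganesh', 'Mahesh'), ('Rajesh', 'Ganesh'), ('Mahesh', None),
#  ('Mukesh', 'Ravi'), ('Ravi', None), ('Meena', 'Mahesh')]

# A SUPV_ID that matches no WORKER_ID is rejected.
try:
    conn.execute("INSERT INTO WORKER VALUES (106, 'Asha', 12.00, 'EE', 999)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The LEFT JOIN keeps workers whose SUPV_ID is null; an inner join would silently drop them.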
Relational Database Schema:
• A listing that gives relation names followed by their attribute names with
key attributes underlined and with foreign keys designated is called a
relational schema.
• Note that the term “Schema” has been used loosely in this definition. What
we are going to create is closer to the relational view, because a
schema is defined using a data sublanguage, as we have practiced in the past.
• An example of relational schema is:
WORKER<WORKER_ID,NAME,HOURLY_RATE,SKILL_TYPE,SUPV_ID>
Foreign Keys: SKILL_TYPE REFERENCES SKILL
: SUPV_ID REFERENCES WORKER
ASSIGNMENT<WORKER_ID,BUILD_ID,START_DATE,NUM_DAYS>
Foreign Keys: WORKER_ID REFERENCES WORKER
: BUILD_ID REFERENCES BUILDING
BUILDING<BUILD_ID,BUILD_ADDRESS,TYPE,QUALITY_LEVEL,STATUS>
SKILL<SKILL_TYPE,BONOUS_RATE,HOURS_PER_WEEK>
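The schema above can be written as SQL DDL. This sketch runs it in SQLite through Python's `sqlite3` module; the column types and the sample SKILL, BUILDING and ASSIGNMENT values are assumptions, since the schema listing gives only attribute names. Attribute names, including BONOUS_RATE, are kept as in the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript("""
CREATE TABLE SKILL (
    SKILL_TYPE     TEXT PRIMARY KEY,
    BONOUS_RATE    REAL,
    HOURS_PER_WEEK INTEGER
);
CREATE TABLE BUILDING (
    BUILD_ID      INTEGER PRIMARY KEY,
    BUILD_ADDRESS TEXT,
    TYPE          TEXT,
    QUALITY_LEVEL INTEGER,
    STATUS        TEXT
);
CREATE TABLE WORKER (
    WORKER_ID   INTEGER PRIMARY KEY,
    NAME        TEXT,
    HOURLY_RATE REAL,
    SKILL_TYPE  TEXT REFERENCES SKILL (SKILL_TYPE),
    SUPV_ID     INTEGER REFERENCES WORKER (WORKER_ID)
);
CREATE TABLE ASSIGNMENT (
    WORKER_ID  INTEGER REFERENCES WORKER (WORKER_ID),
    BUILD_ID   INTEGER REFERENCES BUILDING (BUILD_ID),
    START_DATE TEXT,
    NUM_DAYS   INTEGER,
    PRIMARY KEY (WORKER_ID, BUILD_ID)  -- composite key
);
""")

# Hypothetical sample values, inserted parent tables first.
conn.execute("INSERT INTO SKILL VALUES ('CE', 1.5, 40)")
conn.execute("INSERT INTO BUILDING VALUES (435, '123 Main St.', 'Office', 2, 'Complete')")
conn.execute("INSERT INTO WORKER VALUES (102, 'Mahesh', 13.00, 'CE', NULL)")
conn.execute("INSERT INTO ASSIGNMENT VALUES (102, 435, '2024-01-10', 5)")

try:
    conn.execute("INSERT INTO ASSIGNMENT VALUES (102, 999, '2024-01-10', 5)")  # no such building
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```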
Views:
• We know how tables are defined in a relational database schema.
These tables are called base tables, because they contain the basic data
of the database.
• Portions of these base tables as well as information derived from them
can be defined in database views which are also defined as part of the
database schema.
• A view is a virtual table i.e. a window into a portion of the database.
Views are useful for maintaining confidentiality by restricting access to
selected parts of the database and for simplifying frequently used
query types.
• The following example illustrates how a view can be created:
CREATE VIEW B_WORKER AS SELECT WORKER_ID, SKILL_TYPE FROM WORKER;
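A short sketch of this view in action, using Python's `sqlite3` module and rows from the WORKER listing in this lecture. It shows that the view exposes only the selected columns and holds no data of its own: a row added to the base table appears in the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WORKER (WORKER_ID INTEGER PRIMARY KEY, NAME TEXT, "
             "HOURLY_RATE REAL, SKILL_TYPE TEXT, SUPV_ID INTEGER)")
conn.executemany("INSERT INTO WORKER VALUES (?, ?, ?, ?, ?)", [
    (100, "Ganesh", 10.00, "EE", 102),
    (102, "Mahesh", 13.00, "CE", None),
])

# The view: only WORKER_ID and SKILL_TYPE are visible through it.
conn.execute("CREATE VIEW B_WORKER AS SELECT WORKER_ID, SKILL_TYPE FROM WORKER")
cols = [d[0] for d in conn.execute("SELECT * FROM B_WORKER").description]
print(cols)  # ['WORKER_ID', 'SKILL_TYPE']

# The view stores no rows: a change to the base table shows up at once.
conn.execute("INSERT INTO WORKER VALUES (104, 'Ravi', 11.00, 'CE', NULL)")
print(conn.execute("SELECT * FROM B_WORKER ORDER BY WORKER_ID").fetchall())
# [(100, 'EE'), (102, 'CE'), (104, 'CE')]
```

Because NAME and HOURLY_RATE are not selected, a user granted access only to B_WORKER cannot see them, which is the confidentiality use mentioned above.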
Transforming E-R Diagrams into Relational
Data Structures (Model)
• Please refer to the class notes.
Normalization
• This topic is concerned with the design and implementation
issues that must be considered in an RDBMS (i.e. when we
design a database using a relational DBMS).
• This is actually in continuation of our earlier discussion on
RDBMS to further refine our design (View) to produce the
relations in a form that is least prone to the problems like
inconsistency and redundancy.
• In general the goal of a relational database design is to
generate a set of relational schema that allows us to store
information without unnecessary redundancy, yet allows us
to retrieve information easily.
What is Normalization?
• The entities and their attributes can be organized into a set of tables in
many different ways.
• One method of organization is to design schemas that are in an
appropriate normal form. The theory behind such an arrangement of
attributes in tables is known as Normalization.
• The normalization of data helps to ensure that a particular organization
conforms to such standards as:
– Minimization of duplication of data (redundancy).
– Providing flexibility to support different functional requirements.
– Easy and consistent modification of data.
– Enabling the organization to be translated to the actual database design.
• A number of normal forms have been defined for classifying relations.
Each normal form has associated with it a number of constraints.
• A relational schema is said to be in a particular normal form if it
satisfies all the constraints required for that normal form.
• In general, if a relation is in the (n+1)th Normal form, it is obviously
also in the nth Normal form.
Figure: Universe of Relations (Un-Normalized & Normalized), containing the nested classes 1 NF Relations (Normalized Relations) ⊃ 2 NF ⊃ 3 NF ⊃ BCNF ⊃ 4 NF ⊃ PJ/NF (5 NF).
Why Normalization?
Relations are normalized so that, when relations in a database are
altered during the lifetime of the database, we don’t lose vital
information or introduce inconsistencies. The types of alterations
normally needed for relations are:
Insertion:
• It should be possible to insert data items into the database without
being forced to leave blank fields for some attributes. A design that
forces us to do so suffers from an Insertion Anomaly.
Deletion:
• It should be possible to delete a tuple from a relation without
losing vital information. A design in which a deletion loses such
information suffers from a Deletion Anomaly.
Updation:
• It should be possible to update (change) the value of an attribute
without introducing inconsistencies in the database. A design in which
an update can introduce inconsistencies suffers from an Updation Anomaly.
Clearly, normalization is used to remove such anomalies from the design.
Un-Normalized Relations
• Consider the following two relations:
Order_No.  Order_Date   Item_Lines
                        Item_Code  Quantity  Price/Unit
1456       22-02-2002   3687       52        50.40
                        4627       34        60.50
                        4009       62        20.60
1667       26-02-2004   4627       40        60.50
                        3687       13        50.40
                        40.96      20        118.90
----       ----         ----       ----      ----

Dept._of_Faculty  Teacher  Course_Preferences
                           Course_No.  Course_Dept
Comp. Engg.       Smith    353         Comp. Engg
                           279         Comp. Engg
                           221         Mathematics
Chemistry         Clark    353         Comp. Engg
                           456         Mathematics
                           336         Chemistry
----              -----
• In the above relations the attribute Item_Lines (Course_Preferences)
is not a single attribute but is composed of three attributes, namely
Item_Code, Quantity and Price/Unit (two attributes, namely Course_No.
and Course_Dept).
• Each row contains multiple sets of values for some of the
columns. Such multiple values in a single row are called non-atomic values.
• This form of data is not suitable for storing as a file on the
computer, as retrieval of data based on a component of a
composite attribute is difficult.
• For example, finding out how many units have been ordered for a
specified Item_Code (or how many teachers are interested in teaching a
particular course) is really difficult.
• Thus relations as shown in the above two tables are not
allowed. Such relations are known as Un-normalized
Relations.
Normal Forms
First Normal Form
• The two relations shown above can be rewritten as below:
Order_No.  Order_Date  Item_Code  Quantity  Price/Unit
---        ----        ---        ---       --
---        ----        ---        ---       --
---        ----        ---        ---       --

Dept._of_Fac.  Teacher  Course_No.  Course_Dept.
----           ----     ----        ----
----           ----     ----        ----
----           ----     ----        ----
• The relations shown in the above two tables are called
normalized relations and they are in First Normal Form.
• Note that they don’t have any composite attribute i.e.
values in each row are atomic.
• So – “A Relation is in First Normal Form (1 NF) if and only if
(iff) all underlying domains contain atomic values only.”
• A database schema is in 1 NF if and only if, every relation
included in the database schema is in 1 NF.
• Further discussion of normalization requires knowledge of
functional dependencies, as the higher normal forms are
constrained by functional dependencies.
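To make the step from the un-normalized ORDER table to 1 NF concrete, here is a small Python sketch. The dictionary layout and the function name are illustrative assumptions. It repeats Order_No. and Order_Date on every item line so that every field holds an atomic value. The third item line of order 1667 is left out because its item code is garbled in the source table.

```python
# Un-normalized ORDER data: each order carries a repeating group of item lines.
UNNORMALIZED_ORDERS = [
    {"Order_No": 1456, "Order_Date": "22-02-2002",
     "Item_Lines": [(3687, 52, 50.40), (4627, 34, 60.50), (4009, 62, 20.60)]},
    {"Order_No": 1667, "Order_Date": "26-02-2004",
     "Item_Lines": [(4627, 40, 60.50), (3687, 13, 50.40)]},
]


def to_first_normal_form(orders):
    """Return one (Order_No, Order_Date, Item_Code, Quantity, Price/Unit) tuple per item line."""
    flat = []
    for order in orders:
        for item_code, quantity, price_per_unit in order["Item_Lines"]:
            # Order-level values are repeated on every line, so every field is atomic.
            flat.append((order["Order_No"], order["Order_Date"],
                         item_code, quantity, price_per_unit))
    return flat


rows = to_first_normal_form(UNNORMALIZED_ORDERS)
# A query on a component of the old composite attribute is now a simple filter.
units_of_4627 = sum(qty for _, _, code, qty, _ in rows if code == 4627)
print(units_of_4627)  # 74
```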
Functional Dependencies
• Given a relation R, attribute Y of R is functionally dependent on
attribute X of R if and only if, each X-value in R has associated with it
precisely one Y-value in R.
OR
• Given a relation R, attribute Y of R is functionally dependent on
attribute X of R, iff, whenever two tuples of R agree on their X-value,
they also agree on their Y-value.
• The notation for an FD is:
FD: R.X → R.Y
• The attribute on the left hand side of an FD is called determinant,
because it determines the value of the attribute on the right hand side
of the FD.
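• The above notation is read as: “Attribute Y of R is functionally
dependent on attribute X of R” or “Attribute X of R functionally
determines attribute Y of R”.
Figure: two mappings from X-values to Y-values. In the first (FD: X → Y) each X-value is associated with exactly one Y-value; in the second (Not an FD) some X-value is associated with more than one Y-value.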
• Graphically, an FD is shown as an arrow from X to Y:
X → Y
• The above diagram is known as a Functional Dependency Diagram.
• Note that both X and Y may be composite attributes.
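The second definition above translates directly into a check on a relation instance. A minimal Python sketch follows; the function name and the list-of-dicts representation are assumptions. Keep in mind that an instance can only show that an FD is violated: whether an FD holds is a statement about every legal instance of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on `lhs` but differ on `rhs`.

    rows: list of dicts, one per tuple; lhs, rhs: tuples of attribute names
    (either may be composite).
    """
    first_seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # setdefault stores y_value the first time x_value appears and returns
        # the stored value on later occurrences.
        if first_seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Relation S from the example that follows.
S = [dict(zip(("S#", "SNAME", "CITY", "STATUS"), t)) for t in [
    ("S1", "Smith", "London", 20),
    ("S2", "Jones", "Paris", 10),
    ("S3", "Blake", "Paris", 30),
    ("S4", "Clark", "London", 20),
    ("S5", "Adams", "Athens", 30),
]]

print(fd_holds(S, ("S#",), ("SNAME", "CITY", "STATUS")))  # True
print(fd_holds(S, ("CITY",), ("STATUS",)))  # False: Paris has 10 and 30
```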
An Example
• Consider the following relation (S):
S# SNAME CITY STATUS

S1 Smith London 20
S2 Jones Paris 10
S3 Blake Paris 30
S4 Clark London 20
S5 Adams Athens 30
• Find the Functional Dependencies between the attributes of S. Also
draw the dependency diagram.
ANSWER
• The following FDs hold on the above relation:
S.S# → S.SNAME
S.S# → S.CITY
S.S# → S.STATUS
S.SNAME → S.CITY
S.SNAME → S.S#
S.SNAME → S.STATUS
• Collectively we can indicate them as follows:
S.S# → S.(SNAME,CITY,STATUS)
S.SNAME → S.(S#,CITY,STATUS)
• The dependency diagram is shown in the following figure.
Figure: dependency diagram for S (attributes S#, SNAME, CITY, STATUS).
Example 2
• For the following relation (SP), determine FDs and draw the functional
dependency diagram:
S# P# QTY
S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200
ANSWER
• The following FD holds on the SP relation:
SP.(S#,P#) → SP.QTY
• The dependency diagram is shown in the following figure.
Figure: dependency diagram for SP, with the composite attribute (S#, P#) determining QTY.
Full Functional Dependency
• Given a relation R and an FD: R.X → R.Y.
Attribute Y of R is fully functionally
dependent on attribute X of R if there is no
Z, where Z is a proper subset of X, such that
Z determines Y.
• So, full functional dependency is meaningful
only when X is a composite attribute; otherwise
functional dependency and full functional
dependency are used interchangeably.
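Full functional dependency can be tested on an instance by also checking every proper subset of the determinant. The sketch below (function names are assumptions) reuses an instance-level FD check and applies it to relation S and to the SP relation of Example 2. It considers only non-empty proper subsets.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if tuples that agree on lhs also agree on rhs (checked on this instance)."""
    first_seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        if first_seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):  # sizes of proper subsets only
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


S = [dict(zip(("S#", "SNAME", "CITY", "STATUS"), t)) for t in [
    ("S1", "Smith", "London", 20), ("S2", "Jones", "Paris", 10),
    ("S3", "Blake", "Paris", 30), ("S4", "Clark", "London", 20),
    ("S5", "Adams", "Athens", 30),
]]
SP = [dict(zip(("S#", "P#", "QTY"), t)) for t in [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]]

print(is_full_fd(S, ("S#", "STATUS"), ("CITY",)))  # False: S# alone determines CITY
print(is_full_fd(SP, ("S#", "P#"), ("QTY",)))      # True
```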
An Example
• Consider the relation S. Determine whether S.CITY is fully
functionally dependent upon S.(S#,STATUS)?
• In relation S the FD:
S.(S#,STATUS) → S.CITY
holds, but it is not a full functional dependency.
Explanation:
We have
S.S# → S.CITY
As a proper subset (i.e. S#) of (S#,STATUS) determines CITY,
CITY is clearly not fully functionally dependent on the composite
attribute in question.
• Note that in the case of the SP relation, SP.QTY is fully functionally
dependent on SP.(S#,P#). (Why?)
Decompositions
• Decomposition is the process of splitting a relation into multiple
relations.
• Decompositions are mandatory requirements
Heath’s Theorem for Non-loss Decomposition

• Given a relation R with attributes A, B and C (each may be
composite) satisfying the Functional Dependency:
FD: R.A → R.B
then R can always be non-lossy decomposed into the two
relations R1<A,B> and R2<A,C>, i.e. R equals the natural join of R1
and R2. The relations R1 and R2 are called projections of relation R.
An Example: Decompose the following relation X (if possible):
Course_Code Course_Name Teacher
321 Maths – I Prof. White
321 Maths – I Prof. Black
321 Maths – I Prof. Brown
331 Chemistry - I Prof. Red
331 Chemistry - I Prof. Green
331 Chemistry - I Prof. Violet
332 Physics – I Prof. Blue
--- --- ---
--- --- ---
SOLUTION
• We can easily see that the following FD holds on the above relation:
FD: X.Course_Code → X.Course_Name
Applying Heath’s theorem, the given relation may be decomposed without loss as:
X1<Course_Code,Course_Name>
X2<Course_Code,Teacher>
X1
Course_Code  Course_Name
321          Maths - I
331          Chemistry - I
332          Physics - I
---          ---

X2
Course_Code  Teacher
321          Prof. White
321          Prof. Black
321          Prof. Brown
---          ---
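Heath’s theorem can be checked on this example by computing the two projections and joining them back. This is a minimal Python sketch, assuming relations are represented as (attribute names, set of tuples) and using only the seven rows printed in relation X. The helper names are illustrative.

```python
def project(relation, keep):
    """π(keep)(relation): keep the named columns; duplicates vanish because rows form a set."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, r2):
    """Join on all common attribute names, keeping one copy of each common column."""
    attrs1, rows1 = r1
    attrs2, rows2 = r2
    common = [a for a in attrs1 if a in attrs2]
    extra = [i for i, a in enumerate(attrs2) if a not in attrs1]
    rows = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[attrs1.index(c)] == t2[attrs2.index(c)] for c in common):
                rows.add(t1 + tuple(t2[i] for i in extra))
    return tuple(attrs1) + tuple(attrs2[i] for i in extra), rows


X = (("Course_Code", "Course_Name", "Teacher"), {
    (321, "Maths - I", "Prof. White"), (321, "Maths - I", "Prof. Black"),
    (321, "Maths - I", "Prof. Brown"), (331, "Chemistry - I", "Prof. Red"),
    (331, "Chemistry - I", "Prof. Green"), (331, "Chemistry - I", "Prof. Violet"),
    (332, "Physics - I", "Prof. Blue"),
})

X1 = project(X, ("Course_Code", "Course_Name"))  # 3 tuples
X2 = project(X, ("Course_Code", "Teacher"))      # 7 tuples
print(natural_join(X1, X2) == X)  # True: the decomposition loses nothing
```

Because Course_Code → Course_Name, each X2 tuple matches exactly one X1 tuple, so the join recreates X with no spurious rows.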
Problems with 1 NF
• Consider the relations FIRST and ORDER, both in First Normal Form
(1NF).
FIRST
S# STATUS CITY P# QTY
S1 20 LONDON P1 300
S1 20 LONDON P2 200
S1 20 LONDON P3 400
S1 20 LONDON P4 200
S1 20 LONDON P5 100
S1 20 LONDON P6 100
S2 10 PARIS P1 300
S2 10 PARIS P2 400
S3 10 PARIS P2 200
S4 20 LONDON P2 200
S4 20 LONDON P4 300
S4 20 LONDON P5 400
S5 20 ATHENS P1 300
ORDER
ORDER_NO. ORDER_DATE ITEM_CODE QTY PRICE/UNIT
1456 26-2-2001 3687 52 50.40
1456 26-2-2001 4627 38 60.20
1456 26-2-2001 3214 20 17.50
1886 04-03-2001 4629 45 20.25
1886 04-03-2001 4627 30 60.20
1788 04-04-2001 4630 40 62.20

• The key of the relation FIRST is the attribute combination (S#,P#)
and that of the relation ORDER is (Order_No.,Item_Code).
• Though these relations are in 1 NF, they still suffer from anomalies
with respect to insertion, deletion and updation, as explained
below:
Insertion Anomaly:
• We can’t enter the fact that a particular supplier is located in a
particular city until that supplier supplies at least one part. The
primary key of this relation is the attribute combination (S#,P#).
If we try to enter a new supplier S6 who doesn’t supply any part,
we can’t do so, because we would have to leave the P# value
blank (null), which is not allowed by Integrity Rule 1 (entity integrity).
• Similarly, in the ORDER relation we cannot enter an order
number that has no item code, as we would have
to leave the Item_Code field blank, and that would violate
Integrity Rule 1.
Deletion Anomaly:
• From the relation FIRST, if we delete the supplier having S#=‘S5’,
we not only destroy the shipment connecting S5 to P1, but also
lose the information that S5 is located in the city ATHENS.
• Similarly, in the ORDER relation, if we delete the order with
Order_No.=‘1788’, we lose the information that the Price/Unit of
Item_code=‘4630’ is Rs. 62.20.
Updation Anomaly:
• If the supplier S1 changes its city from LONDON to ATHENS, then
several tuples must be updated for STATUS and CITY. This
may lead to inconsistent values in the table.
• Similarly, in the ORDER relation, if the price for Item_code 4627
changes to 63.75, then we have to modify several tuples,
increasing the chances of inconsistency.
• What should we do to solve these problems?
• First we draw the dependency diagrams for these relations, as
shown below.
• Now, we decompose these relations as follows:
Decomposition of FIRST:
SECOND<S#,STATUS,CITY>
SP<S#,P#,QTY>
Decomposition of ORDER:
XORDER<ORDER_NO,ORDER_DATE>
PRICES<ITEM_CODE,PRICE/UNIT>
ITEMS<ORDER_NO,ITEM_CODE,QTY>
• The tables below show instances of the new relations.
Figure: Dependency Diagram for FIRST (attributes S#, P#, STATUS, CITY, QTY)
Figure: Dependency Diagram for ORDER (attributes ORDER_NO, ORDER_DATE, ITEM_CODE, QTY, PRICE/UNIT)

SECOND
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  20      ATHENS

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400
S5  P1  300
XORDER
ORDER_NO.  ORDER_DATE
1456       26-2-2001
1886       04-03-2001
1788       04-04-2001

ITEMS
ORDER_NO  ITEM_CODE  QTY
1456      3687       52
1456      4627       38
1456      3214       20
1886      4629       45
1886      4627       30
1788      4630       40

PRICES
ITEM_CODE  PRICE/UNIT
3687       50.40
4627       60.20
3214       17.50
4629       20.25
4630       62.20
• With these decompositions, we now observe how the insertion,
deletion and updation anomalies of the 1 NF representations
of these relations are solved:
Insertion Anomaly:
• We can now enter the information that S6 is located in Washington, even
though S6 currently doesn’t supply any part.
• Similarly, an order can be entered in XORDER before any of its items are recorded.
Deletion Anomaly:
• We can now delete the shipments connecting S5 to P1 and S3 to P2 by
deleting the appropriate tuples from the relation SP, yet we don’t lose the
information about the cities in which these suppliers are located.
• Similarly, if an order is cancelled we can delete the appropriate tuples from
the XORDER and ITEMS relations, and this time we don’t lose the price of the item(s)
described in that order, since prices are kept in PRICES.
Updation Anomaly:
• In the revised structures of the relations, the CITY of a given supplier
appears only once, so if a supplier moves from one city to another, its location
can easily be altered.
• Similarly, the PRICE/UNIT of an item can be changed easily in the PRICES
relation.
What Have We Actually Done to Solve the Problems?
• We have decomposed the relations in
such a way that in each of the resultant
relations every non key attribute is
fully functionally dependent on the key
attribute.
• Check the dependency diagrams of new
relations below:
Figure: Dependency Diagram for SECOND (S# determines STATUS and CITY)
Figure: Dependency Diagram for SP ((S#, P#) determines QTY)
Figure: Dependency Diagram for XORDER (ORDER_NO determines ORDER_DATE)
Figure: Dependency Diagram for PRICES (ITEM_CODE determines PRICE/UNIT)
Figure: Dependency Diagram for ITEMS ((ORDER_NO, ITEM_CODE) determines QTY)

Second Normal Form (2 NF)
• A relation R is in Second Normal Form
(2 NF) if and only if it is in First Normal
F o r m ( 1 N F ) a n d e v e r y n o n k e y
attribute is fully functionally dependent
on the key attribute.
• Further a relational schema is said to
be in 2 NF if each rel ati on i n th e
schema is in 2 NF.
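Whether a relation with a single composite key is in 2 NF can be decided from its FDs by computing the attribute closure of each proper part of the key. The sketch below is an illustration under stated assumptions: the FDs for FIRST (S# → CITY, CITY → STATUS, (S#,P#) → QTY) are read off the dependency discussion in these notes, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each non-key attribute to a proper part of `key` that already determines it."""
    non_key = {a for a in attributes if a not in key}
    partial = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for a in sorted(closure(part, fds) & non_key):
                partial.setdefault(a, part)
    return partial  # empty dict: no partial dependency, so the 1 NF relation is in 2 NF


# FDs assumed for FIRST, following the dependency discussion in the text.
FIRST_FDS = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",)), (("S#", "P#"), ("QTY",))]

print(partial_dependencies(("S#", "P#"), ("S#", "STATUS", "CITY", "P#", "QTY"), FIRST_FDS))
# {'CITY': ('S#',), 'STATUS': ('S#',)}  -> FIRST is not in 2 NF
print(partial_dependencies(("S#", "P#"), ("S#", "P#", "QTY"), FIRST_FDS))
# {}  -> SP is in 2 NF
```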
Problems with 2 NF
• Anomalies similar to those described for 1 NF can also occur with a
relation that is in second normal form (2 NF).
• To remove them, another normalization step is used that converts
second normal form relations into third normal form relations.
• For further discussion consider the following relations namely SECOND
and STU_INFO
SECOND
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  20      ATHENS

STU_INFO
ENo.   NAME   DEPT     YEAR  HALL
Y-100  RAMA   PHYSICS  1     MM
Y-212  ALI    CHEM     1     MM
Y-107  RAVI   MATHS    2     MM
Z-551  TONI   BOTANY   2     MM
X-553  MOHAN  GEOLOGY  3     AI
T-337  RAJAN  ZOOLOGY  4     RM

• Both the relations SECOND and STU_INFO are in second normal
form and are still suffering from anomalies with respect to
insertion, deletion and updation as indicated by the following
discussion:
Insertion Anomaly:
• We cannot enter the fact that a particular city has a particular
status value until we have some supplier located in that city.
For example, we can’t enter that Rome has a status value of 50
until we have a supplier in Rome.
• Similarly, in STU_INFO we cannot enter the fact that a
particular year value has a particular hall name until we
have a student belonging to that year.
Deletion Anomaly:
• If we delete the tuple for S#=‘S5’, we not only destroy the
information about the concerned supplier but also the information
that a particular city has a particular status value (Athens has
the status value 20 in this case).
• Similarly, in STU_INFO, if we delete the tuple having ENo=‘T-337’,
we not only lose the information about the concerned student but
also the information that 4th year students have been placed
in RM hall.
Updation Anomaly:
• The STATUS value for a given city appears many times in SECOND.
So if we need to change the status of a city, we will have to
change it in many tuples, increasing the chances of inconsistency.
• Similarly, in STU_INFO the hall value for a given year
appears many times. So if we need to change the hall of a
year, we will have to make changes in several tuples, increasing
the chances of inconsistency.
• All these anomalies are due to another kind of attribute
dependence that is known as Transitive Dependence.
• Next we shall discuss Transitive Dependence and the ways to
remove it.
Transitive Dependence
• Suppose A, B and C are three attributes or distinct collections of
attributes of a relation R. If C is functionally dependent on B and B is
functionally dependent on A then obviously C is functionally dependent
on A.
• If the inverse mapping is nonsimple (i.e. if A is not functionally
dependent on B or B is not functionally dependent on C), then C is said
to be transitively dependent on A via B.
• Now consider the dependency diagrams of the SECOND and
STU_INFO relations below.
• We notice that STATUS is transitively dependent on S# via CITY
and HALL is transitively dependent on ENo. via YEAR.
• In third normal form our goal is to remove these transitive
dependences.
• The only way to remove them is DECOMPOSITION. So, we once
again decompose the relations SECOND and STU_INFO as below:
Decomposition of SECOND:
SC<S#,CITY>
CS<CITY,STATUS>
Decomposition of STU_INFO:
STU1<ENO,NAME,DEPT,YEAR>
STU2<YEAR,HALL>
Figure: Transitive Dependence in SECOND (S# → CITY, CITY → STATUS)
Figure: Transitive Dependence in STU_INFO (ENo → NAME, DEPT, YEAR; YEAR → HALL)
Third Normal Form
• A Relation R is said to be in third normal form
(3 NF) if and only if it is in second normal form
(2 NF) and every non-key attribute is
nontransitively dependent on the primary key.
• A relational schema is in 3 NF if all of its
relations are in 3 NF.
• Consider the dependency diagrams below
for the relations SC, CS, STU1 and STU2
and note that all of them are in 3 NF.
Figure: Dependences in SC (S# → CITY)
Figure: Dependences in CS (CITY → STATUS)
Figure: Dependences in STU1 (ENo → NAME, DEPT, YEAR)
Figure: Dependence in STU2 (YEAR → HALL)
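A common way to test 3 NF from a set of FDs is this: every FD X → A, with A not in X, must have X as a superkey or A as a key attribute. The sketch below applies that test under the notes’ assumption that each relation has just one key. It reports the transitive dependencies in SECOND and STU_INFO and finds none in CS or STU2. Because a partial dependency also fails this test, the check subsumes the 2 NF condition. The FDs are taken from the dependencies described in the text; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, key, fds):
    """List (lhs, attribute) pairs from `fds` that break 3 NF.

    lhs -> A breaks 3 NF when lhs is not a superkey and A is a non-key
    attribute. `key` is assumed to be the only candidate key.
    """
    everything = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue  # lhs determines every attribute, so it is a superkey
        for a in rhs:
            if a not in lhs and a not in key:
                violations.append((tuple(lhs), a))
    return violations


SECOND_FDS = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
print(third_nf_violations(("S#", "STATUS", "CITY"), ("S#",), SECOND_FDS))
# [(('CITY',), 'STATUS')]  -> STATUS depends on S# only via CITY
print(third_nf_violations(("CITY", "STATUS"), ("CITY",), [(("CITY",), ("STATUS",))]))
# []  -> CS is in 3 NF
```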
Problems
• Reduce the following relation into third normal form.
Project number  Project name            Empnumber  Employee name     Rate category  Hourly rate
1023            Madagascar travel site  11         Vincent Radebe    A              $60
                                        12         Pauline James     B              $50
                                        16         Charles Ramoraz   C              $40
1056            Online estate agency    11         Vincent Radebe    A              $60
                                        17         Monique Williams  B              $50
• Consider the following invoice for International Widgets Corp.
Design a relational schema for the computer based system that is
expected to generate the same invoice. Make sure that your
schema is in 3 NF.
Relational Algebra & Relational Calculus
• In 1970 and 1971, E. F. Codd published two papers introducing the relational data model and
the relational data manipulation languages: Relational Algebra and Relational Calculus.
• Both of these languages allow the manipulation of data solely on the basis of their
logical characteristics.
• In his original paper Codd introduced the relational data model and relational
algebra.
• Relational Algebra is a procedural language for manipulating relations, i.e. relational
algebra uses a step-by-step procedure to create a relation containing the data that
answers the query.
• In a subsequent paper Codd introduced Relational Calculus. Relational calculus is
nonprocedural. In relational calculus a query is solved by defining a solution relation in a
single step.
• Codd showed that relational algebra and relational calculus are logically equivalent,
i.e. any query that can be formulated in relational calculus can also be
formulated in relational algebra, and vice versa.
• This provided a means of measuring logical power of a query language. If a language
was at least as powerful as relational algebra or relational calculus, it was called
Relationally Complete Language.
Relational Algebra
• Relational Algebra operators manipulate relations i.e. these
operators use one or two existing relations to create a new
relation. This new relation may then be used as an input to a new
operation.
• This powerful concept, i.e. the creation of new relations from old
ones, makes possible an infinite variety of data manipulations. It
also makes the solution of queries easier, since we can
experiment with partial solutions until we find an approach that
will lead us to the final solution.
• The relational algebra operators can be divided into following
two categories:
– Basic Set Oriented Operations
– Relational Oriented Operations
Basic Set Oriented Operations
• These are traditional set operations namely Union, Difference,
Intersection and Cartesian Product.
• Three of these four basic operations (Union, Intersection and
Difference) require the operand relations to be union
compatible, which means that the attributes of the operand
relations have the same names and that the resultant relation
inherits these names. Mathematically:
• Two relations P and Q are said to be union compatible if both P
and Q are of the same degree n and the domains of the
corresponding n attributes are identical, i.e. if P = {P1, P2, P3, …,
Pn} and Q = {Q1, Q2, Q3, …, Qn} then:
Dom(Pi) = Dom(Qi) for all i = 1 to n,
where Dom(Pi) represents the domain of the attribute Pi.
• Next we discuss the basic set oriented operators of relational
algebra.
1. Union (∪)
• If we assume that P and Q are two union compatible relations, then
the union of P and Q is the set-theoretic union of P and Q. The resultant
relation R = P ∪ Q has tuples drawn from P and Q such that:
R = { t | t ∈ P ∨ t ∈ Q } and
max(|P|, |Q|) ≤ |R| ≤ |P| + |Q|
Note: |X| means cardinality of relation X.
• The following example illustrates:
P                 Q                 R = P ∪ Q
ID   NAME         ID   NAME         ID   NAME
101  Jones    ∪   103  Smith    =   101  Jones
103  Smith        104  Lloyd        103  Smith
104  Lloyd        106  Byron        104  Lloyd
107  Evan         110  Drew         106  Byron
110  Drew                           107  Evan
112  Smith                          110  Drew
                                    112  Smith
Also, |P| = 6 and |Q| = 4
max(6,4) = 6
|R| = 7
Clearly, max(|P|, |Q|) ≤ |R| ≤ |P| + |Q|
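Relations are sets of tuples, so Python sets model union directly. In this sketch, domains are approximated by Python types (an assumption for illustration), and P and Q are the relations of the example above.

```python
def union_compatible(p_domains, q_domains):
    """Same degree n and Dom(Pi) = Dom(Qi) for every i = 1 to n."""
    return len(p_domains) == len(q_domains) and all(
        dp == dq for dp, dq in zip(p_domains, q_domains))


def union(p, q):
    """R = { t | t in P or t in Q }; a tuple present in both appears only once."""
    return set(p) | set(q)


P = {(101, "Jones"), (103, "Smith"), (104, "Lloyd"),
     (107, "Evan"), (110, "Drew"), (112, "Smith")}
Q = {(103, "Smith"), (104, "Lloyd"), (106, "Byron"), (110, "Drew")}

print(union_compatible((int, str), (int, str)))  # True: both are (ID, NAME)
R = union(P, Q)
print(len(R))  # 7, and max(6, 4) <= 7 <= 6 + 4
```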
2. Difference ( - )
• Let P and Q be two union compatible
relations; then the difference operation
removes from the first relation the tuples
it has in common with the second. Mathematically,
R = P - Q such that:
R = { t | t ∈ P ∧ t ∉ Q } and
0 ≤ |R| ≤ |P|.
• The following example illustrates the difference
operation:
P                 Q                 R = P - Q
101  Jones        103  Smith        101  Jones
103  Smith    -   104  Lloyd    =   107  Evan
104  Lloyd        106  Byron        112  Smith
107  Evan         110  Drew
110  Drew
112  Smith
Also, |P| = 6 and |Q| = 4
|R| = 3
Clearly, 0 ≤ |R| ≤ |P|
3. Intersection (∩)
• If we assume that P and Q are two union
compatible relations, then the intersection
operation selects the common tuples of P
and Q. The resultant relation R = P ∩ Q has
tuples drawn from P and Q such that:
R = { t | t ∈ P ∧ t ∈ Q } and
0 ≤ |R| ≤ min(|P|,|Q|)
Note: |X| means cardinality of relation X.
• The following example illustrates the intersection
operation:
P                 Q                 R = P ∩ Q
101  Jones        103  Smith        103  Smith
103  Smith    ∩   104  Lloyd    =   104  Lloyd
104  Lloyd        106  Byron        110  Drew
107  Evan         110  Drew
110  Drew
112  Smith
Also, |P| = 6 and |Q| = 4
|R| = 3
min(|P|, |Q|) = 4
Clearly, 0 ≤ |R| ≤ min(|P|, |Q|)
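Difference and intersection follow their set-builder definitions just as directly. The comprehensions below mirror { t | t ∈ P ∧ t ∉ Q } and { t | t ∈ P ∧ t ∈ Q } on the P and Q of the examples; representing tuples as Python tuples is an illustrative choice.

```python
P = {(101, "Jones"), (103, "Smith"), (104, "Lloyd"),
     (107, "Evan"), (110, "Drew"), (112, "Smith")}
Q = {(103, "Smith"), (104, "Lloyd"), (106, "Byron"), (110, "Drew")}


def difference(p, q):
    """R = { t | t in P and t not in Q }."""
    return {t for t in p if t not in q}


def intersection(p, q):
    """R = { t | t in P and t in Q }."""
    return {t for t in p if t in q}


print(sorted(difference(P, Q)))    # [(101, 'Jones'), (107, 'Evan'), (112, 'Smith')]
print(sorted(intersection(P, Q)))  # [(103, 'Smith'), (104, 'Lloyd'), (110, 'Drew')]
```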
4. Cartesian Product (X)
• The Extended Cartesian Product, or simply the
Cartesian Product, of two relations is the
concatenation of tuples belonging to the two
relations.
• A new resultant relation is created, consisting of all
possible combinations of tuples. Mathematically:
R = P X Q = { t1 || t2 | t1 ∈ P ∧ t2 ∈ Q }
• The degree of the resultant relation is given by:
Deg(R) = Deg(P) + Deg(Q)
• The cardinality of the resultant relation is given by:
|R| = |P| * |Q|
P                 Q               R = P X Q
ID   NAME         Project#        ID   NAME   Project#
101  Jones    X   J1          =   101  Jones  J1
103  Smith        J2              101  Jones  J2
104  Lloyd                        103  Smith  J1
                                  103  Smith  J2
                                  104  Lloyd  J1
                                  104  Lloyd  J2
Also, |P| = 3 and |Q| = 2
|R| = 3*2 = 6
Deg(P) = 2
Deg(Q) = 1
Deg(R) = 3 = Deg(P) + Deg(Q)
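The Cartesian product, with tuple concatenation t1 || t2, can be sketched with itertools.product from the Python standard library. The data are the P and Q of the example above.

```python
from itertools import product


def cartesian_product(p, q):
    """R = { t1 || t2 | t1 in P and t2 in Q }: each tuple of P concatenated with each tuple of Q."""
    return {t1 + t2 for t1, t2 in product(p, q)}


P = {(101, "Jones"), (103, "Smith"), (104, "Lloyd")}  # Deg(P) = 2
Q = {("J1",), ("J2",)}                                 # Deg(Q) = 1
R = cartesian_product(P, Q)
print(len(R))               # 6 = |P| * |Q|
print({len(t) for t in R})  # {3}: Deg(R) = Deg(P) + Deg(Q)
```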
Relational Oriented Operations
• They are also known as Additional Relational Algebra
Operations.
• The Basic Set Operations, which have very limited data
manipulation capabilities, have been supplemented by the
following additional operations:
– Projection (π)
– Selection (σ)
– Join (⋈)
– Division (÷)
• Projection and Selection are unary operators (i.e. they
require a single operand), whereas Join and Division are
binary operators.
1. Projection (π)
• The projection operation over attributes
a1, a2, a3, …, an, written as π(a1, a2,
a3, …, an)(R), works on a single relation R
and defines a relation that contains a
vertical subset of R, extracting the values
of the specified attributes and eliminating
duplicates.
• The following examples on relation SECOND
illustrate the Projection operation:
SECOND
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  20      ATHENS

1. π(S#)(SECOND) =
S#
S1
S2
S3
S4
S5

2. π(STATUS)(SECOND) =
STATUS
10
20

3. π(STATUS, CITY)(SECOND) =
STATUS  CITY
20      LONDON
10      PARIS
20      ATHENS
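A projection sketch in Python follows. It assumes a relation is represented as a pair (attribute names, set of tuples), so duplicates are eliminated automatically, exactly as in π(STATUS)(SECOND).

```python
def project(relation, keep):
    """π(keep)(relation): keep the named columns and drop duplicate tuples."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


SECOND = (("S#", "STATUS", "CITY"), {
    ("S1", 20, "LONDON"), ("S2", 10, "PARIS"), ("S3", 10, "PARIS"),
    ("S4", 20, "LONDON"), ("S5", 20, "ATHENS"),
})

print(sorted(project(SECOND, ("STATUS",))[1]))      # [(10,), (20,)]
print(len(project(SECOND, ("STATUS", "CITY"))[1]))  # 3
```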
2. Selection Operation (σ)
• The selection operation, written as
σpredicate(R), works on a single relation R
and defines a relation that contains only
those tuples of R that satisfy the specified
condition (predicate) on a given attribute
or on a combination of attributes.
• The following examples on relations P and
SECOND illustrate the Selection operation.
P
ID   Names
101  Jones
103  Smith
104  Lloyd
106  Byron
107  Evan

1. σID < 107(P) =
ID   Names
101  Jones
103  Smith
104  Lloyd
106  Byron

2. σCITY='LONDON'(SECOND) =
S#  STATUS  CITY
S1  20      LONDON
S4  20      LONDON
• Write a Relational Algebra expression to answer
the following query for relation SECOND:
“Get S# for all suppliers residing in PARIS”.
• Solution:
1. Select all suppliers residing in Paris:
σCITY='PARIS'(SECOND)
2. Project out the undesired attributes (i.e. take the
projection on S#):
π(S#)(σCITY='PARIS'(SECOND))
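The two-step solution composes selection and projection. This sketch represents a relation as a pair (attribute names, set of tuples), an assumption for illustration, and writes predicates as Python lambdas over a tuple viewed as a dict. The query evaluates to S2 and S3.

```python
def select(relation, predicate):
    """σpredicate(relation): keep the tuples for which predicate(tuple as dict) is true."""
    attrs, rows = relation
    return attrs, {row for row in rows if predicate(dict(zip(attrs, row)))}


def project(relation, keep):
    """π(keep)(relation): keep the named columns and drop duplicate tuples."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


SECOND = (("S#", "STATUS", "CITY"), {
    ("S1", 20, "LONDON"), ("S2", 10, "PARIS"), ("S3", 10, "PARIS"),
    ("S4", 20, "LONDON"), ("S5", 20, "ATHENS"),
})

# π(S#)(σCITY='PARIS'(SECOND))
paris = project(select(SECOND, lambda t: t["CITY"] == "PARIS"), ("S#",))
print(sorted(paris[1]))  # [('S2',), ('S3',)]
```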
3. Join (⋈)
• The Join operation as its name suggests, allows the
combination of two relations to form a single new relation.
The join operation is used to connect data across relations,
which is the most important function in any database
language.
• There are several versions of join namely:
– Natural Join
– Theta Join
– Equi Join
– Self Join
– Outer Join
• Join is supposed to be one of the most important
operations of relational databases.
(i) Natural Join
• Suppose we want to take the Natural Join of two relations P
and Q which have columns C1, C2, C3, …, Cn in common.
Then P ⋈ Q is obtained as follows:
1. Take the Cartesian Product of P and Q, i.e. find P X Q.
2. Eliminate all tuples from P X Q except those on which the
values of columns C1, C2, C3, …, Cn in P are equal,
respectively, to the values of those columns in Q.
3. Project out one copy of the columns C1, C2, C3, …, Cn.
• The degree of the resultant relation is Deg(P) + Deg(Q) - n, which is
always less than the sum of the degree of P and the degree of Q.
• The following example illustrates the Natural Join operation:
Computation of Natural Join for the relations E and S:
E              S
ID   NAME      ID   SALARY CODE
101  Jones     101  67
103  Smith     103  55
104  Evan      104  75
(i) Compute E X S
E.ID  NAME   S.ID  SALARY CODE
101   Jones  101   67
101   Jones  103   55
101   Jones  104   75
103   Smith  101   67
103   Smith  103   55
103   Smith  104   75
104   Evan   101   67
104   Evan   103   55
104   Evan   104   75
(ii) Select rows for which values of the common attribute are equal
E.ID  NAME   S.ID  SALARY CODE
101   Jones  101   67
103   Smith  103   55
104   Evan   104   75
(iii) Project out one copy of the common attribute
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
104  Evan   75
• We can note that the natural join of relations E and S
may also be expressed as:
E ⋈ S = π(ID, NAME, SALARY CODE)(σE.ID = S.ID(E X S))
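The three steps above translate almost line by line into code. The following sketch assumes the (attribute names, set of tuples) representation and renames SALARY CODE to SALARY_CODE so that it is a single valid attribute name. Each employee ends up with the SALARY CODE of the S tuple that has the same ID.

```python
from itertools import product


def natural_join(p, q):
    """P ⋈ Q: Cartesian product, keep tuples equal on common columns, drop one copy of them."""
    p_attrs, p_rows = p
    q_attrs, q_rows = q
    common = [a for a in p_attrs if a in q_attrs]
    q_only = [i for i, a in enumerate(q_attrs) if a not in p_attrs]
    rows = set()
    for t1, t2 in product(p_rows, q_rows):  # step 1: P X Q
        # step 2: equal values on every common column
        if all(t1[p_attrs.index(c)] == t2[q_attrs.index(c)] for c in common):
            rows.add(t1 + tuple(t2[i] for i in q_only))  # step 3: one copy of common columns
    return p_attrs + tuple(q_attrs[i] for i in q_only), rows


E = (("ID", "NAME"), {(101, "Jones"), (103, "Smith"), (104, "Evan")})
S = (("ID", "SALARY_CODE"), {(101, 67), (103, 55), (104, 75)})

attrs, rows = natural_join(E, S)
print(attrs)         # ('ID', 'NAME', 'SALARY_CODE'): degree 2 + 2 - 1 = 3
print(sorted(rows))  # [(101, 'Jones', 67), (103, 'Smith', 55), (104, 'Evan', 75)]
```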
(ii) Theta Join (⋈B)
• Theta Join is intended for those occasions when we need to join two
relations on the basis of some condition.
• So, theta join is a join operation that connects relations when the
values from specified columns of the relations have a specified
relationship.
• Let P and Q be two relations; then the theta join of relation P on attribute
X with relation Q on attribute Y is written as:
P ⋈B Q
where B is a condition of the form:
P.X θ Q.Y
and θ may be any one of the following:
>, <, =, !=, ≤, ≥
The theta join is defined as the set of all tuples of the Cartesian Product
of P and Q for which P.X θ Q.Y is true. The degree of the resultant relation is the
sum of the degree of P and the degree of Q.
Computation of Theta Join for the relations E and S:
E              S
ID   NAME      ID   STATUS
101  Jones     101  Clerk
102  Smith     107  Engineer
103  Evan
(i) E X S
ID   NAME   S.ID  S.STATUS
101  Jones  101   Clerk
101  Jones  107   Engineer
102  Smith  101   Clerk
102  Smith  107   Engineer
103  Evan   101   Clerk
103  Evan   107   Engineer
(ii) Assume that B is E.ID < S.ID (i.e. θ is <)
Select all rows for which B is TRUE
ID   NAME   S.ID  S.STATUS
101  Jones  107   Engineer
102  Smith  107   Engineer
103  Evan   107   Engineer
• We can easily note that the theta join of relations E and S may also be
expressed as:
E ⋈B S = σE.ID < S.ID(E X S)
• It would be interesting to note that the attributes on which theta join
operation is being carried out, may have different names in the two
relations. For example the ID attribute of relation S could have the
name CODE or something else. The true spirit of a theta join
operation is that the attributes on which it is being carried out must
take their values from the same domain.
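A theta join can be sketched by passing the comparison θ as a function; Python's standard operator module supplies <, = and the others. This is an illustrative sketch on the (attribute names, set of tuples) representation. The second call shows an equi join (θ is =) on columns with different names, ID and CODE, and SALARY CODE is renamed SALARY_CODE for code use.

```python
import operator
from itertools import product


def theta_join(p, q, x, theta, y):
    """P ⋈B Q with B = (P.x theta Q.y): the tuples of P X Q for which B is true.

    All columns of both relations are kept, so Deg(R) = Deg(P) + Deg(Q).
    """
    p_attrs, p_rows = p
    q_attrs, q_rows = q
    i, j = p_attrs.index(x), q_attrs.index(y)
    rows = {t1 + t2 for t1, t2 in product(p_rows, q_rows) if theta(t1[i], t2[j])}
    return p_attrs + q_attrs, rows


E = (("ID", "NAME"), {(101, "Jones"), (102, "Smith"), (103, "Evan")})
S = (("ID", "STATUS"), {(101, "Clerk"), (107, "Engineer")})
print(sorted(theta_join(E, S, "ID", operator.lt, "ID")[1]))
# [(101, 'Jones', 107, 'Engineer'), (102, 'Smith', 107, 'Engineer'), (103, 'Evan', 107, 'Engineer')]

# Equi join: theta is equality, and the compared columns may have different names.
E2 = (("ID", "NAME"), {(101, "Jones"), (103, "Smith"), (104, "Evan")})
S2 = (("CODE", "SALARY_CODE"), {(101, 67), (103, 55), (104, 75)})
print(sorted(theta_join(E2, S2, "ID", operator.eq, "CODE")[1]))
# [(101, 'Jones', 101, 67), (103, 'Smith', 103, 55), (104, 'Evan', 104, 75)]
```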
(iii) Equi Join:
• Theta join is called Equi Join when the condition for
comparison (θ) is equality.
• Clearly, Equi Join is a special case of Theta Join.
• Also, Natural Join is a special case of Equi Join where,
among the duplicate copies of columns, one copy is
projected out.
• One major distinction between Equi Join and Natural Join
is that, for Natural Join it is mandatory that the attributes
on which natural join operation is being carried out must
have same names in the two relations however in case of
equi join they may have different names.
• The following example illustrates EQUI Join:
Computation of EQUI Join for the relations E and S:
E              S
ID   NAME      CODE  SALARY CODE
101  Jones     101   67
103  Smith     103   55
104  Evan      104   75
(i) Compute E X S
ID   NAME   CODE  SALARY CODE
101  Jones  101   67
101  Jones  103   55
101  Jones  104   75
103  Smith  101   67
103  Smith  103   55
103  Smith  104   75
104  Evan   101   67
104  Evan   103   55
104  Evan   104   75
(ii) Assume that B is (E.ID = S.CODE), i.e. θ is =
Select all rows for which B is TRUE
ID   NAME   CODE  SALARY CODE
101  Jones  101   67
103  Smith  103   55
104  Evan   104   75

(iv) Outer Join
• The Outer Join extends the natural join by making
sure that every record from both relations is
listed in the join relation at least once.
• The Outer Join consists of two steps:
1. First, a natural join is executed.
2. Then, if any record in one relation does not match
a record from the other relation in the natural join,
that unmatched record is added to the join
relation and the additional columns are filled
with nulls.
• The following example illustrates Outer Join:
Computation of Outer Join for the relations E and S:
E              S
ID   NAME      ID   SALARY CODE
101  Jones     101  67
103  Smith     103  55
104  Evan
(i) E X S
E.ID  NAME   S.ID  SALARY CODE
101   Jones  101   67
101   Jones  103   55
103   Smith  101   67
103   Smith  103   55
104   Evan   101   67
104   Evan   103   55
(ii) Select rows for which values of the common attribute are equal
E.ID  NAME   S.ID  SALARY CODE
101   Jones  101   67
103   Smith  103   55
(iii) Project out one copy of the common columns
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
(iv) Add the unmatched records
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
104  Evan   null
Please Note…
• Strictly speaking the outer join operation
we have just executed is a Left Outer Join as
it keeps every tuple in the left hand relation
in the result.
• Similarly there is a Right Outer Join, that
keeps every tuple in the right hand relation
in the result.
• There is also a Full Outer Join that keeps all
tuples in both relations, padding tuples
with nulls when no matching tuples are
found.
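A left outer join can be sketched as a natural join that also keeps each unmatched left-hand tuple, padded with None in the role of null. This is an illustration on the (attribute names, set of tuples) representation, with SALARY CODE renamed SALARY_CODE. Calling it with the arguments swapped gives the right outer join, apart from column order.

```python
def left_outer_join(p, q):
    """Natural join of p and q, plus every unmatched tuple of p padded with None (null)."""
    p_attrs, p_rows = p
    q_attrs, q_rows = q
    common = [a for a in p_attrs if a in q_attrs]
    q_only = [i for i, a in enumerate(q_attrs) if a not in p_attrs]
    rows = set()
    for t1 in p_rows:
        matched = False
        for t2 in q_rows:
            if all(t1[p_attrs.index(c)] == t2[q_attrs.index(c)] for c in common):
                rows.add(t1 + tuple(t2[i] for i in q_only))  # matched tuple (natural join)
                matched = True
        if not matched:
            rows.add(t1 + (None,) * len(q_only))  # unmatched tuple padded with nulls
    return p_attrs + tuple(q_attrs[i] for i in q_only), rows


E = (("ID", "NAME"), {(101, "Jones"), (103, "Smith"), (104, "Evan")})
S = (("ID", "SALARY_CODE"), {(101, 67), (103, 55)})
attrs, rows = left_outer_join(E, S)
print(attrs)  # ('ID', 'NAME', 'SALARY_CODE')
print(sorted(rows, key=lambda t: t[0]))
# [(101, 'Jones', 67), (103, 'Smith', 55), (104, 'Evan', None)]
```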
(v) Self Join
• As the name indicates, self join means joining a
relation with itself.
• But how can we join a relation with itself when all the
attributes are common and take their values from
identical domains?
• To resolve this problem, we copy the relation into
another relation with a different name and execute the join
operation on the qualified attributes required by the query.
• The following example illustrates this.
• Consider the relation ASSIGNMENT:
ASSIGNMENT
EMP# PROD# JOB#
107 HEAP1 800
101 HEAP1 600
110 BINS9 800
103 HEAP1 700
101 BINS9 700
110 FM6 800
107 B++1 800
• Write a Relational Algebra expression to answer the query:
“Find the co-workers in all projects (but not necessarily
doing the same job)”.
SOLUTION
1. Copy ASSIGNMENT to COASSIGN.
2. Join ASSIGNMENT with COASSIGN on PROD# and get R as below:
R = ASSIGNMENT ⋈(ASSIGNMENT.PROD# = COASSIGN.PROD#) COASSIGN
3. Keep only the tuples in which the two employees differ and take the
projection of R on EMP# and COA.EMP#:
π(EMP#, COA.EMP#)(σEMP# != COA.EMP#(R))
4. The complete expression would be:
π(EMP#, COA.EMP#)(σEMP# != COA.EMP#(ASSIGNMENT ⋈(ASSIGNMENT.PROD# = COASSIGN.PROD#) COASSIGN))
ASSIGNMENT COASSIGN
EMP# PROD# JOB# EMP# PROD# JOB#
107 HEAP1 800 107 HEAP1 800
101 HEAP1 600 101 HEAP1 600
110 BINS9 800 110 BINS9 800
103 HEAP1 700 103 HEAP1 700
101 BINS9 700 101 BINS9 700
110 FM6 800 110 FM6 800
107 B++1 800 107 B++1 800
(i) Join ASSIGNMENT with COASSIGN on PROD#
EMP# PROD# JOB# COA.EMP# COA.PROD# COA.JOB#
107 HEAP1 800 107 HEAP1 800
107 HEAP1 800 101 HEAP1 600
107 HEAP1 800 103 HEAP1 700
101 HEAP1 600 107 HEAP1 800
101 HEAP1 600 101 HEAP1 600
101 HEAP1 600 103 HEAP1 700
110 BINS9 800 110 BINS9 800
110 BINS9 800 101 BINS9 700
103 HEAP1 700 107 HEAP1 800
103 HEAP1 700 101 HEAP1 600
103 HEAP1 700 103 HEAP1 700
101 BINS9 700 110 BINS9 800
101 BINS9 700 101 BINS9 700
110 FM6 800 110 FM6 800
107 B++1 800 107 B++1 800
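Taking the projection on EMP#, COA.EMP# (the projection also removes duplicate pairs). The join also pairs every employee with himself (e.g. 107 107); such pairs are omitted below, which formally corresponds to adding the selection σ(EMP# ≠ COA.EMP#) before the projection: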
EMP# COA.EMP#
107 101
107 103
101 107
101 103
110 101
103 107
103 101
101 110
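The whole self join can be checked with a small Python sketch. The copy COASSIGN is simply a second list holding the same tuples, the join keeps pairs with equal PROD#, and a Python set performs the duplicate elimination of the projection. The exclude_self flag (an illustrative name) applies the selection EMP# ≠ COA.EMP#; applying it after the projection gives the same result as applying it before, because it only uses projected attributes.

```python
# ASSIGNMENT(EMP#, PROD#, JOB#) from the example above.
ASSIGNMENT = [
    (107, "HEAP1", 800), (101, "HEAP1", 600), (110, "BINS9", 800),
    (103, "HEAP1", 700), (101, "BINS9", 700), (110, "FM6", 800),
    (107, "B++1", 800),
]

def co_workers(assignment, exclude_self=True):
    """Self join on PROD#, then project on (EMP#, COA.EMP#)."""
    coassign = list(assignment)  # the renamed copy COASSIGN
    # join: concatenate tuples whose PROD# values are equal
    joined = [a + c for a in assignment for c in coassign if a[1] == c[1]]
    # projection on EMP# (index 0) and COA.EMP# (index 3); the set drops duplicates
    pairs = {(t[0], t[3]) for t in joined}
    if exclude_self:
        # selection EMP# != COA.EMP#
        pairs = {p for p in pairs if p[0] != p[1]}
    return pairs

print(sorted(co_workers(ASSIGNMENT)))
print(len(co_workers(ASSIGNMENT, exclude_self=False)))
```

The first line reproduces the eight pairs of the table above; without the selection the projection contains 12 pairs, the extra four being 101, 103, 107 and 110 each paired with himself.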
Self Join: Another Example
• Consider the table shown below. Assume that
its name is Emp1:
The above table shows that:
• Unnath Nayar's supervisor is Vijes Setthi.
• Anant Kumar and Vinod Rathor are also under the
supervision of Vijes Setthi.
• Rakesh Patel and Mukesh Singh are under the
supervision of Unnith Nayar.
Now consider the following query:
• Get the list of employees along
with their supervisors.
Solution
• Copy Emp1 to Emp2
• Join Emp1 with Emp2 such that
Emp1.EMP_SUPV = Emp2.EMP_ID
• SQL Command:
SELECT EMP1.EMP_ID, EMP1.EMP_NAME, EMP2.EMP_ID, EMP2.EMP_NAME
FROM EMP1, EMP2
WHERE EMP1.EMP_SUPV = EMP2.EMP_ID;
Output
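Since the Emp1 table is not reproduced in the text, the Python sketch below runs the SQL command above against an in-memory SQLite database (Python's standard sqlite3 module) filled with hypothetical rows: the EMP_ID values are invented, and only the supervisor links follow the bullets above. An ORDER BY clause is added so that the row order is deterministic. The second query shows that in SQL the physical copy is not required: two table aliases over EMP1 give the same self join.

```python
import sqlite3

# Hypothetical data: EMP_IDs are made up; supervisor links follow the bullets above.
rows = [
    (1, "Vijes Setthi", None),
    (2, "Unnath Nayar", 1),
    (3, "Anant Kumar", 1),
    (4, "Vinod Rathor", 1),
    (5, "Rakesh Patel", 2),
    (6, "Mukesh Singh", 2),
]

conn = sqlite3.connect(":memory:")
for table in ("EMP1", "EMP2"):  # EMP2 is the copy of EMP1
    conn.execute(f"CREATE TABLE {table} (EMP_ID INTEGER PRIMARY KEY, "
                 "EMP_NAME TEXT, EMP_SUPV INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# The SQL command from the text, with ORDER BY added.
result = conn.execute("""
    SELECT EMP1.EMP_ID, EMP1.EMP_NAME, EMP2.EMP_ID, EMP2.EMP_NAME
    FROM EMP1, EMP2
    WHERE EMP1.EMP_SUPV = EMP2.EMP_ID
    ORDER BY EMP1.EMP_ID
""").fetchall()

# Same self join using two aliases over a single table, no copy needed.
alias_result = conn.execute("""
    SELECT E.EMP_ID, E.EMP_NAME, SUP.EMP_ID, SUP.EMP_NAME
    FROM EMP1 AS E, EMP1 AS SUP
    WHERE E.EMP_SUPV = SUP.EMP_ID
    ORDER BY E.EMP_ID
""").fetchall()

for r in result:
    print(r)
conn.close()
```

Vijes Setthi has no supervisor (EMP_SUPV is NULL), so, as in any inner join, he does not appear as an employee in the output.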
4. Divide (÷)
• Consider the relation P and the results obtained when P is
divided by different values of the relation Q:
P
A    B
A1   B1
A1   B2
A2   B1
A3   B1
A4   B1
A4   B2
A5   B1
A5   B2

Values of Relation Q (B)     Result of P ÷ Q (A)
{B1, B2}                     {A1, A4, A5}
{B1}                         {A1, A2, A3, A4, A5}
{B1, B2, B3}                 empty set (no tuples)
• Simply stated, P ÷ Q is the largest relation R (over the
attributes of P that are not in Q) such that the Cartesian
product R X Q is a subset of P.
“OR”
• If we assume that a tuple (A1, B1) of P represents the
object A1 having the property B1, then the resulting relation R
is the set of all objects in P that possess every property in Q.
For example, with Q = {B1}, R contains every object of P that possesses the property B1.
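Both readings of division can be sketched in Python, assuming P is a set of (A, B) pairs and Q a set of B values. divide follows the "objects possessing every property in Q" reading directly; divide_via_basic_ops uses the standard rewriting of division in terms of projection, Cartesian product and difference, P ÷ Q = π_A(P) − π_A((π_A(P) X Q) − P). Both function names are illustrative.

```python
# The relation P(A, B) from the table above.
P = {("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A3", "B1"),
     ("A4", "B1"), ("A4", "B2"), ("A5", "B1"), ("A5", "B2")}

def divide(p, q):
    """P(A, B) ÷ Q(B): the A values paired in P with every B value of Q."""
    candidates = {a for a, _ in p}
    return {a for a in candidates if all((a, b) in p for b in q)}

def divide_via_basic_ops(p, q):
    """Same result via pi_A(P) - pi_A((pi_A(P) X Q) - P)."""
    pi_a = {a for a, _ in p}
    product = {(a, b) for a in pi_a for b in q}  # every candidate with every property
    missing = product - p                        # required pairs that are absent from P
    return pi_a - {a for a, _ in missing}

for q in ({"B1", "B2"}, {"B1"}, {"B1", "B2", "B3"}):
    print(sorted(q), sorted(divide(P, q)), sorted(divide_via_basic_ops(P, q)))
```

For the three values of Q in the table, both functions return {A1, A4, A5}, {A1, A2, A3, A4, A5} and the empty set.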
Problems:
• Consider the relation schema given below, derived from the ER diagram
shown:
EMPLOYEE<Emp#, Name>
ASSIGNED_TO<Project#,Emp#>
PROJECT<Project#,Project_Name,Chief_Architects>
[ER diagram: EMPLOYEE --ASSIGNED_TO-- PROJECT]
• Write relational algebra expressions to answer the following queries
on the schema shown above:
1. Get Emp# of employees working on project COMP353.
2. Get details of employees (both number and name) working on
project COMP353.
3. Obtain details of employees working on Database projects.
4. Gather details of employees working on both COMP353 and
COMP354.
5. Find the numbers of employees who work on at least all of the
projects that employee 107 works on.
6. Find employee numbers of employees who do not work on project
COMP453.
7. Get employee number of employees who work on all projects.
8. List the employee numbers of employees other than employee
107 who work on at least one project that employee 107
works on.
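One possible way to check answers to several of these problems is to evaluate them on a small instance. In the Python sketch below the schema follows the text, but every tuple (employee names, project names, chief architects, assignments) is hypothetical; each comment gives the relational algebra expression that the line evaluates. Queries 2 and 3 are omitted (query 3 depends on how a "Database project" is identified in PROJECT).

```python
# Hypothetical instance: schema as in the text, all tuples made up.
EMPLOYEE = {(101, "Jones"), (103, "Smith"), (107, "Evan"), (110, "Clark")}
PROJECT = {("COMP353", "Database", "Arch1"), ("COMP354", "Compilers", "Arch2"),
           ("COMP453", "Database", "Arch3")}
ASSIGNED_TO = {("COMP353", 101), ("COMP354", 101), ("COMP453", 101),
               ("COMP353", 103), ("COMP354", 103),
               ("COMP353", 107), ("COMP354", 107),
               ("COMP353", 110)}

def divide(r, s):
    """R(Emp#, Project#) ÷ S(Project#): Emp# values paired in R with every Project# in S."""
    return {e for e, _ in r if all((e, p) in r for p in s)}

works_on = {(e, p) for p, e in ASSIGNED_TO}  # reordered to (Emp#, Project#)
projects_107 = {p for p, e in ASSIGNED_TO if e == 107}

# 1. π Emp#(σ Project#='COMP353'(ASSIGNED_TO))
q1 = {e for p, e in ASSIGNED_TO if p == "COMP353"}
# 4. numbers on COMP353 ∩ numbers on COMP354, then join with EMPLOYEE for names
q4 = q1 & {e for p, e in ASSIGNED_TO if p == "COMP354"}
q4_details = {(e, n) for e, n in EMPLOYEE if e in q4}
# 5. π Emp#,Project#(ASSIGNED_TO) ÷ π Project#(σ Emp#=107(ASSIGNED_TO))
q5 = divide(works_on, projects_107)
# 6. π Emp#(EMPLOYEE) − π Emp#(σ Project#='COMP453'(ASSIGNED_TO))
q6 = {e for e, _ in EMPLOYEE} - {e for p, e in ASSIGNED_TO if p == "COMP453"}
# 7. π Emp#,Project#(ASSIGNED_TO) ÷ π Project#(PROJECT)
q7 = divide(works_on, {p for p, _, _ in PROJECT})
# 8. π Emp#(ASSIGNED_TO ⋈ π Project#(σ Emp#=107(ASSIGNED_TO))) − {107}
q8 = {e for p, e in ASSIGNED_TO if p in projects_107} - {107}

for name, value in [("1", q1), ("4", q4_details), ("5", q5),
                    ("6", q6), ("7", q7), ("8", q8)]:
    print(name, sorted(value))
```

Note that query 5 returns employee 107 as well, since 107 trivially works on all of its own projects.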
Do Yourself...
• Consider the relation schema given below, derived from the ER diagram
shown:
ATTENDENT<AID#, Name>
BOARDING_TO<FLIGHT#,AID#>
FLIGHT<FLIGHT#,Destination,Chief_Pilot>
• Assuming that several attendants may be assigned to the same flight, write
relational algebra expressions to answer the following queries on the schema
shown above:
1. Get AID# of attendants boarding flight KF353.
2. Get details of attendants (both numbers and names) boarding flight KF353.
3. Obtain details of attendants boarding flights whose destination is 'Muscat'.
4. Gather details of attendants who have to board both KF353 and KF354.
5. Find the numbers of attendants who board at least all of the flights that attendant
107 boards.
6. Find AID# of attendants who do not board flight KF353.
7. Get AID# of attendants who board all flights.
8. List the AID# of attendants other than attendant 107 who board at least one flight
that attendant 107 boards.
