
Lecture Notes For DBMS and Data Mining and data Warehousing

Department of Electrical and Electronics By: Sulabh Bansal


Lecture 1.
What do you mean by Data and Database?

Data can be divided into three categories.

Raw data: a value such as 85 has no meaning when it stands alone. It might mean something if
you knew it was the weight of a man in kilograms.

Related raw data: a group (data set or data file) of organized raw data that can be tied
together. For example, it could be a group of names, weights, blood groups and identification
numbers, all tied to the identity cards issued to patients at a hospital.

Cleaned raw data: any of the above after it has been validated or processed. Such a process
might ensure, for example, that a blood group never holds a value such as red or black; only
values of the kind A, A+, B, B+, etc. are allowed.
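The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `clean_records`, the field names and the set of allowed blood groups are hypothetical assumptions chosen for the example.

```python
# Hypothetical list of allowed values; the exact set is an assumption for illustration.
ALLOWED_BLOOD_GROUPS = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}

def clean_records(records):
    """Split raw patient records into valid and rejected ones by blood group."""
    valid, rejected = [], []
    for record in records:
        group = record.get("blood_group", "").strip().upper()
        if group in ALLOWED_BLOOD_GROUPS:
            # Keep the record, with the blood group normalised to upper case.
            valid.append({**record, "blood_group": group})
        else:
            rejected.append(record)
    return valid, rejected

raw = [  # hypothetical sample data
    {"id": 1, "name": "Asha", "weight_kg": 85, "blood_group": "b+"},
    {"id": 2, "name": "Ravi", "weight_kg": 70, "blood_group": "red"},
]
valid, rejected = clean_records(raw)
print(valid)     # one record, blood group normalised to "B+"
print(rejected)  # the record whose blood group is "red"
```

Rejected records are kept rather than silently dropped, so they can be corrected before analysis.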

Data can be acquired from many different sources. It must always be evaluated to determine
which category it belongs to and whether it needs additional validation before the analysis
that produces information.

Database:
A database consists of an organized collection of interrelated data for one or more uses,
typically in digital form.
Examples include the databases of an educational institute, a bank, a library or a railway
reservation system.

What Is a DBMS?

• A DBMS consists of two things: a database and a set of programs.
• The database is a very large, integrated collection of data.
• The set of programs is used to access and process the database.
• So a DBMS can be defined as a software package designed to store, manage and process
the database.
• Management of data involves:
  o Definition of structures for the storage of information
  o Methods to manipulate information
  o Safety of the information stored, despite system crashes
• A database models a real-world enterprise using entities and relationships:
  o Entities (e.g., students, courses, class, subject)
www.jntuworld.com
www.jwjobs.net
s.sanyasirao1@gmail.com
  o Relationships (e.g., Arjun studies in class EEE VII)
File System

• Data is stored in different files in the form of records.
• Programs are written from time to time, as requirements arise, to manipulate the data
within the files, e.g.:
  o A program to debit and credit an account
  o A program to find the balance of an account
  o A program to generate monthly statements

Disadvantages of a File System Compared to a DBMS

The most significant disadvantages of a file system compared to a database management
system are as follows:
• Data Redundancy: Files are created in a file system as and when an enterprise needs them
over its growth path, so repetition of information about an entity cannot be avoided.
  E.g., customers' addresses will be present in the file holding information about
  savings-account customers and also in the file holding current-account information. If the
  same customer has both a savings account and a current account, his address will be stored
  in two places.
• Data Inconsistency: Data redundancy leads to a bigger problem than wasted storage: it may
lead to inconsistent data. The same data repeated in several places may no longer match
after it has been updated in only some of them.
  For example: Suppose a customer asks the bank to change the address for his account, and
  the program that is run updates only the savings account file, not the current account
  file. Afterwards the addresses of the same customer in the two files will not match, and
  there will be no way to find out which of the two is the latest.
• Difficulty in Accessing Data: For an ad hoc report, no program will already exist, so the
only options are to write a new program to generate the requested report or to produce it
manually. Either way takes an impractical amount of time and is more expensive.
  For example: Suppose the administrator suddenly receives a request to generate a list of
  all customers holding a savings account who live in a particular locality of the city. No
  program to produce that list has been written, but say there is a program that lists all
  customers holding a savings account. He can either go through that list manually and pick
  out the customers living in that locality, or write a new program to generate the new list.
  Both ways take a long time, which would generally be impractical.
• Data Isolation: Since the data files are created at different times, presumably by
different people, their structures generally will not match. The data for a particular
entity will be scattered across different files, so it will be difficult to obtain the
appropriate data.
  For example: Suppose the address in the savings account file has the fields Add line1,
  Add line2, City, State, Pin, while the address in the current account file has the fields
  House No., Street No., Locality, City, State, Pin. The administrator is asked to provide a
  list of customers living in a particular locality. A consolidated list of all such
  customers requires looking in both files, but they store addresses differently, so writing
  a program to generate such a list will be difficult.
• Integrity Problems: All consistency constraints have to be enforced through appropriate
checks coded into the application programs. This is very difficult when the number of such
constraints is very large.
  For example: An account should not have a balance of less than Rs. 500. To enforce this
  constraint, an appropriate check must be added to the program that adds a record and to the
  program that withdraws from an account. If this limit is later increased, all those checks
  must be updated to avoid inconsistency. Such repeated changes to the programs are a great
  headache for the administrator.
• Security and Access Control: The database should be protected from unauthorized users,
and not every user should be allowed to access all data. Since application programs are
added to the system in an ad hoc manner, enforcing such security constraints is difficult.
  For example: Payroll personnel in a bank should not be allowed to access customers'
  account information.
• Concurrency Problems: When more than one user is allowed to process the database, two or
more users may try to update a shared data element at about the same time, which may result
in inconsistent data.
  For example: Suppose the balance of an account is Rs. 500, and users A and B try to
  withdraw Rs. 100 and Rs. 50 respectively at almost the same time using the following
  Update process:
  Update:
    1. Read the balance amount.
    2. Subtract the withdrawn amount from the balance.
    3. Write the updated balance value.
  Suppose A performs steps 1 and 2, i.e. reads 500 and subtracts 100 from it. Meanwhile B
  performs the whole Update process: he also reads the balance as 500, subtracts 50 and
  writes back 450. A then writes his updated balance of 400. The two writes may happen in
  either order, depending on various factors in the system used by the two users, so the
  final balance will be either 400 or 450. Both values are wrong (the correct balance is
  Rs. 500 - 100 - 50 = Rs. 350), and the account is left with an inconsistent balance.
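The interleaving above can be replayed in a short Python sketch. `interleaved_withdrawals` reproduces the lost update step by step; `withdraw_locked` shows one simple remedy, making read-subtract-write indivisible with a lock. The class and function names are hypothetical, and a lock in one process is only a stand-in for the concurrency control a real DBMS performs.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def interleaved_withdrawals(start, amount_a, amount_b):
    """Replay the interleaving from the example, with no concurrency control."""
    account = Account(start)
    a_read = account.balance               # A, step 1: reads 500
    a_new = a_read - amount_a              # A, step 2: computes 400
    b_read = account.balance               # B, step 1: also reads 500
    account.balance = b_read - amount_b    # B, steps 2 and 3: writes 450
    account.balance = a_new                # A, step 3: overwrites with 400
    return account.balance

def withdraw_locked(account, amount):
    """Run read, subtract and write as one indivisible unit."""
    with account.lock:
        current = account.balance
        account.balance = current - amount

def concurrent_withdrawals(start, amounts):
    account = Account(start)
    threads = [threading.Thread(target=withdraw_locked, args=(account, amt)) for amt in amounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(interleaved_withdrawals(500, 100, 50))   # 400: B's withdrawal is lost
print(concurrent_withdrawals(500, [100, 50]))  # 350, whichever thread runs first
```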

Lecture 2
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.

Role of DBMS:

An earlier information system worked as follows:

[Figure: layers from top to bottom: Set of programs, File System, Disk]

A DBMS adds another layer of software between the file system and the set of application
programs. At a very high level, the role of the DBMS can be described by the following
diagram:

[Figure: Role of DBMS. Layers from top to bottom: Users, Application Programs, DBMS, File System, Disk]

Instances and Schema:

Database schema: The overall design of the database. An analogy in a programming language
is the declaration of variables with their data types. In a relational database management
system, the definitions of the table names and of their fields with data types form the
database schema.

Database instance: The collection of information stored in the database at a particular
moment is called a database instance. An analogy in programming languages is the values
stored in the variables during program execution. In a relational database management
system, the data stored in the various tables at a particular time is the instance of the
database.
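The distinction can be sketched with Python's built-in `sqlite3` module. The table `account` and its columns are hypothetical; `sqlite_master` is SQLite's own catalog table. The schema (the stored CREATE statement) stays the same while the instance (the rows) changes after an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the design of the table (its name, fields and their data types).
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, holder TEXT, balance INTEGER)")

def current_instance(conn):
    """Return the rows stored at this moment, i.e. the current instance."""
    return conn.execute("SELECT acc_no, holder, balance FROM account ORDER BY acc_no").fetchall()

conn.execute("INSERT INTO account VALUES (1, 'A', 500)")
instance_1 = current_instance(conn)

conn.execute("UPDATE account SET balance = 400 WHERE acc_no = 1")
instance_2 = current_instance(conn)

# The stored design is the same before and after the update.
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'account'").fetchone()[0]

print(instance_1)  # [(1, 'A', 500)]
print(instance_2)  # [(1, 'A', 400)]
print(schema)
```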

Data Abstraction: Three-Level Architecture of DBMS:
Since many database system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify their interaction with the
system:
• Physical Level:
  o Lowest level of abstraction.
  o Describes how the data are actually stored.
  o Complex low-level data structures are defined by system programs and are generally
  hidden even from high-level application programs.
  o In relational database management systems, the files and indexes used are described at
  the physical level of abstraction.
  o This is similar to the way a programming language hides exactly how the values of
  variables, records or arrays are stored; the exact way a record or an array defined in,
  say, C is stored corresponds to the physical level of abstraction.
  o The physical schema is used at the physical level of abstraction.
• Logical Level:
  o Describes what data are stored in the database and what relationships exist among
  those data.
  o The entire database is represented in terms of simple structures, even though their
  implementation may involve very complex structures at the physical level.
  o In relational database management systems, tables and their fields are defined at the
  logical level of abstraction.
  o An analogy in a programming language (say C) is the definition of record structures or
  arrays.
  o The logical schema is used at the logical level of abstraction.
• View Level:
  o Describes only part of the entire database.
  o Many users are not concerned with all the information stored in a database.
  o The system may provide several views for different types of users, each showing only
  the relevant part of the database.
  o The view schema is used at the view level of abstraction.
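A view can be sketched with SQLite's `CREATE VIEW`, run from Python's `sqlite3` module. The `customer` table, its columns and its rows are hypothetical; the view exposes only the names and localities, hiding the rest of the logical-level table from the users who query it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the complete customer table (hypothetical columns and data).
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, locality TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "A", "Sector 9", 5000), (2, "B", "Model Town", 800)],
)

# View level: only the part of the data one type of user needs.
conn.execute("CREATE VIEW locality_view AS SELECT name, locality FROM customer")

cursor = conn.execute("SELECT * FROM locality_view ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # ['name', 'locality']
print(view_rows)     # [('A', 'Sector 9'), ('B', 'Model Town')]
```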
[Figure: Three-level architecture. View level (View 1, View 2, ..., View n) above the Logical Level, above the Physical Level]
Lecture 3

Data Independence:

Physical Data Independence:
• Allows changes in the physical schema without requiring changes in the logical schema or
rewriting of application programs.
• Changes in the physical schema can include using new storage devices, different data
structures, different file organizations or storage structures, or changing a file index.
All these changes should be possible without changing the logical schema or rewriting
application programs.
Logical Data Independence:
• Allows changes in the logical schema without causing application programs to be
rewritten.
• Changes such as the addition or deletion of entities, attributes or relationships are
logical schema changes, and they should be possible without rewriting application programs
that have already been written.
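Both kinds of independence can be sketched with SQLite from Python. The table and the function `application_query` (standing in for an application program) are hypothetical. Adding an index is a physical change and adding a column is a logical change; the application's query returns the same answer afterwards without being rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, holder TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 'A', 500)")

def application_query(conn):
    """Stands in for an application program: it names only the columns it uses."""
    return conn.execute("SELECT holder, balance FROM account WHERE acc_no = 1").fetchone()

before = application_query(conn)

# Physical schema change: add an index (a new access structure).
conn.execute("CREATE INDEX idx_account_holder ON account (holder)")
# Logical schema change: add a new attribute to the table.
conn.execute("ALTER TABLE account ADD COLUMN email TEXT")

after = application_query(conn)
print(before, after)  # ('A', 500) ('A', 500): same answer, application code unchanged
```

Naming the columns explicitly is what lets the query survive the added attribute; a program that relied on the exact column count of `SELECT *` would be more fragile.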

Major components of a DBMS:
1. Data Definition Language Interpreter/ Compiler
2. Data Manipulation Language Compiler
3. Query processor
4. Database Manager

Data Definition Language (DDL):

• This language provides a set of commands that can be used to define:
  o what data is in the database
  o what the relationships between the various data elements are
  o what integrity constraints the various data items must satisfy
  o etc.

• It is used to define the records or structures of the database.

• DDL statements are compiled into the data dictionary (or data directory), which contains
the metadata, i.e. data about data.

• The DBMS consults the data dictionary before any operation on data.
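A DDL statement, the resulting metadata and an integrity constraint can be sketched with SQLite from Python. The `account` table is hypothetical and reuses the minimum-balance rule of Rs. 500 from Lecture 1; `sqlite_master` and `PRAGMA table_info` are SQLite-specific ways of reading its catalog, which plays the role of the data dictionary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure and an integrity constraint (minimum balance Rs. 500).
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        holder  TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 500)
    )
""")

# The compiled definitions are recorded in SQLite's catalog (its data dictionary),
# and this metadata can be queried like ordinary data.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
print(tables, columns)  # ['account'] ['acc_no', 'holder', 'balance']

# A statement that would violate the constraint is rejected by the DBMS itself,
# so no application program needs its own copy of the check.
conn.execute("INSERT INTO account VALUES (1, 'A', 800)")
rejected = False
try:
    conn.execute("UPDATE account SET balance = balance - 400 WHERE acc_no = 1")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```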

Data Manipulation Language (DML):

• It is a language that enables users to access or manipulate data in the database.
• It consists of very high-level statements that specify the operations to be performed on
the database.

Query Language:

It is the portion of DML that is used to access or retrieve the information from the database.

Database Manager:

• This is the software that takes care of executing all the statements specified in DDL or
DML. It handles all the problems of a database and is responsible for providing all of the
features claimed above, such as data consistency, non-redundant data, atomicity,
concurrency control and easy access to data.
• It may be subdivided into two major components:
  o Transaction Manager
  o Storage Manager

A transaction is a collection of operations that performs a single logical function in a
database application. The transaction manager takes care of identifying transactions and
executing them properly. It is responsible for providing features such as atomicity and
concurrency control.
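Atomicity can be sketched with SQLite from Python, where the connection used as a context manager commits on success and rolls back on an exception. The `account` table, the non-negative balance rule and the function `transfer` are hypothetical. The credit is deliberately executed before the debit, so a failed debit would leave a visible half-done transfer if the rollback did not undo it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: both happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))

print(transfer(conn, 1, 2, 200), balances(conn))   # True {1: 300, 2: 300}
print(transfer(conn, 1, 2, 1000), balances(conn))  # False {1: 300, 2: 300}
```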

The storage manager is responsible for interaction with the file system and implements the
physical level of data abstraction. It is responsible for providing users with easy access
to the database.

The overall structure of a database management system can be shown as follows:

[Figure: Overall system structure of a DBMS. Components shown: Application Interface, Application Programs, Query, DB Scheme, DML Compiler, Object Code, Query Processor, DDL Compiler, Database Manager, File Manager, Data, Data Dictionary, Disk]
Types of Users:

• DBA (Database Administrator): the person who designs the database and writes the database
schema in DDL based on the design.
• Sophisticated Users: people who know DML commands and operate on the database directly.
• Application Programmers: people who operate on the database through application programs,
usually written in a high-level language such as C, Java or VB.
• Naïve Users: people who execute application programs through interfaces written
specifically for their requirements. They are generally not familiar with computer
technology, e.g. tellers, agents, registrars, librarians.
Lecture 4
Data Models
• A data model is a collection of concepts for describing data, data relationships, data
semantics and consistency constraints.
• A schema is a description of a particular collection of data, using a given data model.
• The primary categories of data models are:
  o Object-based logical models
    - Provide a very high-level design of the database
    - Provide flexible structuring capabilities. The most popular ones are:
      * The Entity-Relationship model
      * The object-oriented model
      * The semantic model
      * The functional model
  o Record-based logical models
    - Provide a more implementation-oriented design
    - Specify the overall logical structure of the database and provide a high-level
    description of the implementation
    - The most popular ones are:
      * Relational Model
      * Network Model
      * Hierarchical Model
  o Physical models
    - Describe data at the lowest level
    - Capture aspects of the database-system implementation
    - The most widely known are the unifying model and the frame-memory model
Entity Relationship model (E-R Model):
• Identifies the basic elements, objects or entities that are core to the database.
• Consider, for example, a library database. The most basic entities of a library can be
identified as books and users. There are other basic entities such as suppliers,
magazines, journals, etc.
• When using the E-R model, we generally describe the database by diagrams called E-R
diagrams.
• A sample E-R diagram for this simple library database, having only two entities, Books and
Users, can be built as follows.
• All entities of type book are represented by an entity set (drawn as a rectangle):
  [Figure: entity set BOOK drawn as a rectangle]
• Similarly, all users are represented by an entity set:
  [Figure: entity set USER drawn as a rectangle]
• We identify the various attributes that describe the entities of an entity set. In a
library, a book is described by its Accession Number, Call Number, Title, Author,
Publisher, Year of Publication, etc. Attributes are attached to entities as ellipses:
  [Figure: entity set BOOK with attributes Acc. No., Call No., Title, Author, Publisher, Yr. of Pub.]
• Similarly, we attach the attributes that define a user in the library to the respective
entity set in the E-R diagram:
  [Figure: entity set USER with attributes Card No., Name, User Type, User ID]
• Apart from entities, the E-R model describes the relationships between entities. These
are again seen as relationship sets existing between entity sets. For example, a user can
borrow a book from the library. All such relationships between any book of the library and
any user are represented by a relationship set, which we can name the Borrowed By
relationship set. The Borrowed By relationship can have its own attributes, which exist only
when a relationship exists. For example, Date of Issue exists only when a book has been
borrowed by a particular user; it is an attribute of neither Book nor User. Relationships
are represented by diamonds in E-R diagrams:
  [Figure: E-R diagram with entity sets BOOK and USER, their attributes, and the relationship Borrowed By (a diamond) with attribute Date of issue]

Relational Model:

• Both data and relationships are represented by tables.
• So it is closer to an implementation of the E-R diagrams designed in the first phase of
modeling.
• The entities BOOK and USER can be represented by their respective tables, where the
columns of each table represent the attributes of the entity. Every column has a unique
name corresponding to its attribute name. The values for the different entities of the set
are filled into the table.

Book Table:
Acc. No. | Call No. | Title                           | Author                         | Publisher      | Yr. of Publication
312      | 245      | Database System Concepts        | Silberschatz, Korth, Sudarshan | McGraw-Hill    | 1997
433      | 23       | Fundamental of Database Systems | Elmasri, Navathe               | Addison Wesley | 1999

User Table:
Card No. | Name      | User Type | User ID
422      | Abhishek  | Student   | 0706412234
4322     | Mr. Lalit | Faculty   | 23456789

• Relationships can also be represented by tables. They include only those attributes of
the related entities that are sufficient to identify them uniquely, plus any attributes
specific to the relationship.

Borrowed By Table:
Book Acc. No. | User Card No. | Date of Issue
312           | 422           | 03/08/2010
433           | 4322          | 05/08/2010

• The above relationship table shows that user Abhishek borrowed Database System Concepts
by Silberschatz, Korth, Sudarshan and user Mr. Lalit borrowed Fundamental of Database
Systems by Elmasri, Navathe from the library on 03/08/2010 and 05/08/2010 respectively.
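The three tables above can be sketched in SQLite from Python, using the data from the notes. Table and column names such as `library_user` and `book_acc_no` are hypothetical. Because Borrowed By is many-to-one from Book to User, the book's key alone is used as the primary key of the relationship table; a join then reads the relationship back in terms of names and titles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        acc_no INTEGER PRIMARY KEY, call_no INTEGER, title TEXT,
        author TEXT, publisher TEXT, yr_of_pub INTEGER
    );
    CREATE TABLE library_user (
        card_no INTEGER PRIMARY KEY, name TEXT, user_type TEXT, user_id TEXT
    );
    -- Relationship table: keys of the related entities plus its own attribute.
    CREATE TABLE borrowed_by (
        book_acc_no   INTEGER PRIMARY KEY REFERENCES book(acc_no),
        user_card_no  INTEGER REFERENCES library_user(card_no),
        date_of_issue TEXT
    );
""")
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?, ?, ?)", [
    (312, 245, "Database System Concepts", "Silberschatz, Korth, Sudarshan", "McGraw-Hill", 1997),
    (433, 23, "Fundamental of Database Systems", "Elmasri, Navathe", "Addison Wesley", 1999),
])
conn.executemany("INSERT INTO library_user VALUES (?, ?, ?, ?)", [
    (422, "Abhishek", "Student", "0706412234"),
    (4322, "Mr. Lalit", "Faculty", "23456789"),
])
conn.executemany("INSERT INTO borrowed_by VALUES (?, ?, ?)", [
    (312, 422, "03/08/2010"),
    (433, 4322, "05/08/2010"),
])

# Join the three tables to read the relationship back in terms of names and titles.
loans = conn.execute("""
    SELECT u.name, b.title, bb.date_of_issue
    FROM borrowed_by AS bb
    JOIN book AS b ON b.acc_no = bb.book_acc_no
    JOIN library_user AS u ON u.card_no = bb.user_card_no
    ORDER BY bb.book_acc_no
""").fetchall()
for row in loans:
    print(row)
```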
Network Model:
• Data in the network model are represented by a collection of records.
• Relationships between data are represented by links or pointers.

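[Figure: Network model. Records Book1 (312, 245, Database System Concepts, *), Book2 (433, 23, Fundamental of Database Systems, *), Book3 (434, 24, Fundamental of Database Systems, *) and user records such as User1, each with a link field (*)]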
• The above diagram shows 5 data records, each having several data values for the
corresponding attributes and an extra field, marked *, that is used as a link or pointer.
• Whenever a relationship exists between two data elements, it is shown explicitly using
pointers. So the relationship of BOOK1 and BOOK2 with USER1 is shown by pointers in the
BOOK1 and BOOK2 records. Similarly, USER1 is related to these books, which can be shown by a
circular linked list containing only pointers. The link field of USER1 points to this list,
which contains pointers to all the books borrowed by USER1; the last pointer points back to
USER1.
• So data records and links can be combined in any way to form a network of data, as
convenient for designers and programmers.
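The records and links described above can be sketched with Python object references standing in for pointers. The classes `Record` and `Pointer` and the functions `link_borrowings` and `titles_borrowed` are hypothetical names for this illustration: each book's link field points at its user, and the user's link field starts a circular list of pointer-only entries whose last entry leads back to the user.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(eq=False)
class Record:
    """A data record: attribute values plus one link field (the '*')."""
    values: tuple
    link: Record | Pointer | None = None

@dataclass(eq=False)
class Pointer:
    """An entry of a pointer-only list: points to a record and to the next entry."""
    target: Record
    next: Pointer | Record | None = None

def link_borrowings(user, books):
    """Point each book at the user and build the user's circular list of pointers."""
    head = user  # the last pointer in the list leads back to the user record
    for book in reversed(books):
        book.link = user
        head = Pointer(target=book, next=head)
    user.link = head

def titles_borrowed(user):
    """Follow the user's pointer list until it comes back round to the user."""
    titles, node = [], user.link
    while isinstance(node, Pointer):
        titles.append(node.target.values[2])
        node = node.next
    return titles

book1 = Record((312, 245, "Database System Concepts"))
book2 = Record((433, 23, "Fundamental of Database Systems"))
book3 = Record((434, 24, "Fundamental of Database Systems"))
user1 = Record((422, "Abhishek", "Student"))

link_borrowings(user1, [book1, book2])
print(titles_borrowed(user1))  # ['Database System Concepts', 'Fundamental of Database Systems']
print(book1.link is user1, book3.link is None)  # True True
```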

Hierarchical Model:
• This model is very similar to the network model in that it also uses records and links to
represent data and relationships respectively.
• The difference is that the network model forms a network (graph) of data and connections,
while the hierarchical model forms only trees, which do not allow cycles.
• The data elements in the model have parent-child relationships: the data nodes that point
to others are called parents, and the nodes pointed to by their parents are called children.
• A child data node cannot be pointed to by two different parents.
• For example, if we make books the parents and users the children, two books cannot point
to the same user record. In that case we have to replicate the user record for each book.

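[Figure: Hierarchical model. Parent records Book1 (312, 245, Database System Concepts, *), Book2 (433, 23, Fundamental of Database Systems, *) and Book3 (434, 24, Fundamental of Database Systems, *); child records User1 (422, Abhishek, Student), a second copy of User1, and User 2 (4322, Mr. Lalit, Faculty), each with a link field (*). The User1 record is replicated because a child cannot have two parents.]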
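The replication forced by the tree structure can be sketched in Python. The classes `UserRecord` and `BookRecord` and the function `add_borrower` are hypothetical, and which book each user is attached to is an assumption for illustration. Because a child may have only one parent, each book receives its own copy of the user record, so Abhishek's record ends up stored twice.

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    card_no: int
    name: str
    user_type: str

@dataclass
class BookRecord:
    """A parent node; its children are the user records under it."""
    acc_no: int
    title: str
    children: list = field(default_factory=list)

def add_borrower(book, user):
    """Attach a user under a book. A child may have only one parent,
    so each book receives its own copy of the user record."""
    book.children.append(UserRecord(user.card_no, user.name, user.user_type))

abhishek = UserRecord(422, "Abhishek", "Student")
lalit = UserRecord(4322, "Mr. Lalit", "Faculty")
book1 = BookRecord(312, "Database System Concepts")
book2 = BookRecord(433, "Fundamental of Database Systems")
book3 = BookRecord(434, "Fundamental of Database Systems")

add_borrower(book1, abhishek)
add_borrower(book2, abhishek)
add_borrower(book3, lalit)

copies = [child for book in (book1, book2, book3) for child in book.children if child.card_no == 422]
print(len(copies))             # 2: Abhishek's record is stored twice
print(copies[0] is copies[1])  # False: two separate copies, not one shared node
```

The copies are equal in value but independent, so updating one of them does not update the other; this is the same redundancy problem discussed for file systems.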
Lecture 5
Entity-Relationship Model

The various symbols used in E-R diagrams are as follows:

[Figure: table of E-R diagram symbols with their meanings: Entity Type; Weak Entity Type; Relationship Type; Identifying Relationship (for weak entity); Attribute; Key Attribute; Multivalued Attribute; Composite Attribute; Derived Attribute; Total Participation of E2 in R; One to Many relationship between entity sets E1 and E2 (relationship R drawn between E1 and E2)]
Description of some symbols used in E-R diagrams:

• An entity is a basic element of a system that is identified by a set of attributes and has
an independent existence, e.g. a student, a faculty member or a subject in a college. An
entity type defines a set of entities that have the same attributes.
• A relationship type R among n entity types E1, E2, ..., En defines a set of associations
among these types.
• Entity types that do not have key attributes of their own are called weak entity types.
For example, consider an entity type Dependent, related to an Employee, with attributes
Dependent Name, Birth Date, Sex and Relationship. Two dependents of different employees may
have the same values for all these attributes, but they are still distinct entities because
they are linked to different employees.
• Entities belonging to a weak entity type are identified by being related to specific
entities of another entity type, in combination with some of their own attribute values.
This other entity type is called the identifying owner, and the relationship type that
relates the weak entity type to its owner is called the identifying relationship type. In
the above example, Dependent is the weak entity type, its owner is Employee, and the
relationship between Dependent and Employee is an identifying relationship type.
• Attributes that uniquely identify an entity in an entity set are called key attributes.
For example, the attribute Roll No. is a key attribute of the entity type Student.
• Some attributes may have several values at the same time for one entity of an entity set.
For example, the attribute College Degrees lists the names of the degrees obtained by a
person; it may have one value for one person and more than one value for others. Such
attributes are called multivalued attributes.
• Attributes formed by combining smaller subparts are called composite attributes. For
example, an Address attribute is formed from several subparts such as Street, City, State
and Pin Code.
• Attributes that need not be stored with the entities, because they can be calculated from
the values of other stored attributes, are called derived attributes. For example, if we
store an attribute Birth Date holding a person's date of birth, the attribute Age need not
be stored, since it can be calculated whenever we access that entity. So Age is a derived
attribute.
• When all the entities of an entity set/entity type must participate in a particular
relationship set/type, this is called total participation. Weak entities have total
participation in the identifying relationships with their identifying owners.
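The attribute kinds described above can be sketched as a Python data class. The classes `Address` and `Person`, the method `age` and the sample address values are hypothetical: `address` is composite, `degrees` is multivalued, `roll_no` plays the role of a key attribute, and age is derived from the stored birth date instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: built from smaller subparts."""
    street: str
    city: str
    state: str
    pin_code: str

@dataclass
class Person:
    roll_no: int                                  # key attribute (assumed unique)
    name: str
    birth_date: date
    address: Address                              # composite attribute
    degrees: list = field(default_factory=list)   # multivalued attribute

    def age(self, on: date) -> int:
        """Derived attribute: computed from birth_date, never stored."""
        had_birthday = (on.month, on.day) >= (self.birth_date.month, self.birth_date.day)
        return on.year - self.birth_date.year - (0 if had_birthday else 1)

p = Person(101, "Arjun", date(1990, 8, 15),
           Address("MG Road", "Jaipur", "Rajasthan", "302001"), ["B.Tech", "M.Tech"])
print(p.age(date(2010, 8, 14)))  # 19: birthday not yet reached in 2010
print(p.age(date(2010, 8, 15)))  # 20
```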

Degree of Relationship Set:
• The number of entity sets participating in a relationship is the degree of the
relationship.
• Example of a unary relationship: only one entity set participates in the relationship.
  [Figure: unary relationship Manages on the entity set Employee, with roles Manager and Subordinate]
• Example of a binary relationship: two entity sets participate in the relationship.
• Example of a ternary relationship: three entity types participate in the relationship.

• An n-ary relationship associates n entity sets.

Mapping Constraints:
• The two most important types of mapping constraints are mapping cardinalities and
existence dependencies.
• For a binary relationship set R between entity sets A and B, the mapping cardinality must
be one of the following:
  o One to One: an entity in A is associated with at most one entity in B, and vice versa.
  o One to Many: an entity in A is associated with any number of entities in B, but an
  entity in B is associated with at most one entity in A.
  o Many to One: an entity in A is associated with at most one entity in B, while an entity
  in B is associated with any number of entities in A.
  o Many to Many: an entity in A is associated with any number of entities in B, and vice
  versa.

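[Figure: the four mapping cardinalities: One to One, One to Many, Many to One, Many to Many]
[Figure: binary relationship example: entity sets Book and User linked by the relationship Borrowed By]
[Figure: ternary relationship example: relationship Owns among Customer, Account and Branch]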
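The four mapping cardinalities listed above can be checked against a sample relationship set with a small Python function. The function `mapping_cardinality` is a hypothetical helper: it counts how many partners each entity has on each side. A sample instance can only reveal the cardinality it happens to exhibit; the constraint itself is a design decision.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs of related entities."""
    max_b_per_a = max(Counter(a for a, _ in pairs).values(), default=0)
    max_a_per_b = max(Counter(b for _, b in pairs).values(), default=0)
    # "Many" on the A side means some entity of B is related to several entities of A.
    a_side = "Many" if max_a_per_b > 1 else "One"
    b_side = "Many" if max_b_per_a > 1 else "One"
    return f"{a_side} to {b_side}"

# Borrowed By as (book acc. no., user card no.) pairs: user 422 holds two books.
borrowed_by = [(312, 422), (433, 422), (434, 4322)]
print(mapping_cardinality(borrowed_by))  # Many to One
```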
• If the existence of entity x depends on the existence of entity y, then x is said to be
existence dependent on y: if y is deleted, so is x. Here y is the dominant entity and x is
the subordinate entity. For example, a payment entity is dependent on a loan entity.
Payments are identified by payment number, payment date and payment amount.
• Weak entities are existence dependent on their identifying owner, but not every existence
dependent entity is a weak entity. In the above example of loan and payment entities, if
the payment entity has the payment number as a unique key, it is existence dependent on the
loan without being a weak entity.
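Existence dependency can be sketched in SQLite with a foreign key declared `ON DELETE CASCADE`: deleting the dominant loan deletes its subordinate payments. The table and column names are hypothetical, and payment numbers are assumed globally unique here, matching the case of a dependent entity that is not weak. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
    CREATE TABLE loan (loan_number INTEGER PRIMARY KEY, loan_amount INTEGER);
    CREATE TABLE payment (
        payment_number INTEGER PRIMARY KEY,
        payment_date   TEXT,
        payment_amount INTEGER,
        -- each payment belongs to exactly one loan; deleting the loan deletes it
        loan_number    INTEGER NOT NULL REFERENCES loan(loan_number) ON DELETE CASCADE
    );
""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [(1, 2000), (2, 3000)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", [
    (1, "03/03/2010", 100, 1),
    (2, "03/04/2010", 100, 1),
    (3, "03/03/2010", 100, 2),
])

conn.execute("DELETE FROM loan WHERE loan_number = 1")  # delete the dominant entity
remaining = conn.execute("SELECT payment_number FROM payment ORDER BY payment_number").fetchall()
print(remaining)  # [(3,)]
```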

Lecture 6
Keys:

• Super Key: a set of one or more attributes that, taken collectively, can uniquely identify
an entity in the entity set. An entity set can have more than one super key.
• Candidate Key: a minimal super key is a candidate key, i.e. those super keys of the entity
set that have no proper subset which is also a super key are called candidate keys. An
entity set can have more than one candidate key.
• Primary Key: the candidate key chosen by the database administrator as the key when
implementing the database is called the primary key of the entity set.
• Example: Consider an entity set named Student with the following attributes:
  1. Student ID
  2. Roll Number
  3. Name
  4. Father's Name
  5. DOB
  6. Address
• The various entities of this entity set will have values for all the fields, but no two
entities, i.e. no two students, will have the same values for all six attributes. So one
super key is the set containing the following attributes:
  Super Key1: (Student ID, Roll Number, Name, Father's Name, DOB, Address)
  Similarly, Super Key2: (Student ID, Roll Number, Name, Father's Name, DOB) will not have
  the same values for any two students in the Student entity set. Other super keys of the
  above entity set include the following:
  Super Key3: (Student ID, Roll Number, Name, Father's Name)
  Super Key4: (Student ID, Roll Number, Name)
  Super Key5: (Student ID, Roll Number)
  Super Key6: (Student ID)
  Also, Roll Number is unique for a student, so
  Super Key7: (Roll Number)
  All of them are sufficient to identify a particular student entity in the entity set of
  all students.

  Super Key6 and Super Key7 are the candidate keys of this entity set, since no proper
  subset of either is a super key; they are minimal super keys.
  Candidate Key1: (Student ID)
  Candidate Key2: (Roll Number)

  Any one of the candidate keys can be used as the primary key. So
  Primary Key: Either (Student ID) or (Roll Number)
• The primary key of a many-to-many relationship set is formed from the primary keys of all the participating entity sets.
• The primary key of a many-to-one or one-to-many relationship set is the primary key of the entity set on the "many" side of the association. E.g., the Borrowed By relationship between the entity sets Book and User is many-to-one: many books may be borrowed by one user, but one book cannot be borrowed by many users. So the primary key of Borrowed By contains only the primary key of the entity set Book.
• For a one-to-one relationship set, the primary key of either participating entity set can be used as the primary key of the relationship.
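The three rules above can be written as a small lookup for a binary relationship set. This is only a sketch: the function name, the cardinality strings and the key names `acc_no` and `card_no` are hypothetical, and "many-to-one" is taken to mean many entities of the first set A relate to one entity of the second set B.

```python
def relationship_primary_key(cardinality, pk_a, pk_b):
    """Primary key of a binary relationship set between entity sets A and B.

    pk_a and pk_b are tuples of primary-key attribute names.
    """
    if cardinality == "many-to-many":
        return pk_a + pk_b            # keys of all participants
    if cardinality == "many-to-one":  # A is the "many" side
        return pk_a
    if cardinality == "one-to-many":  # B is the "many" side
        return pk_b
    if cardinality == "one-to-one":   # either key works; this sketch picks A's
        return pk_a
    raise ValueError(f"unknown cardinality: {cardinality}")
```

For Borrowed By (Book on the "many" side, User on the "one" side), `relationship_primary_key("many-to-one", ("acc_no",), ("card_no",))` gives `("acc_no",)`.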

Weak Entity Sets:
• Example: Consider an entity set LOAN with attributes (Loan Number, Loan Amount, Customer ID) containing all the loans taken. After taking a loan, the customer repays it in installments. Consider the entity set PAYMENT representing all the payments made against all loans. The attributes of PAYMENT are (Payment Number, Payment Date, Payment Amount). The Payment Number is the sequence number of a payment made by the customer against a particular loan: the first payment against any loan has Payment Number 1, the second payment against any loan has Payment Number 2, and so on.
Two entities in PAYMENT can therefore have the same values for all attributes and still be two different entities. E.g., suppose there are two loans with the following values:
Loan 1: (Loan Number = 1, Loan Amount = Rs. 2000, Customer ID = A)
Loan 2: (Loan Number = 2, Loan Amount = Rs. 3000, Customer ID = B)
Now customer A makes his first payment of Rs. 100 on 03/03/2010 and, by chance, B also makes his first payment of the same amount on the same date. The two payment entities then have the following values:
Payment by A: (Payment Number = 1, Date = 03/03/2010, Payment Amount = Rs. 100)
Payment by B: (Payment Number = 1, Date = 03/03/2010, Payment Amount = Rs. 100)
So no set of attributes of PAYMENT is guaranteed to have unique values, and the entity set PAYMENT has no key. The two payments above are different payments, because they are made against different loans, but within PAYMENT they are indistinguishable.
• Entity sets that do not have sufficient attributes to form a primary key are called weak entity sets.
To distinguish the entities of a weak entity set, we must define a relationship that associates each of them with an entity of a strong entity set. Each PAYMENT entity is linked to some loan, so the entity set PAYMENT is the dependent and LOAN is its owner. The relationship Loan-Payment between LOAN and PAYMENT, which associates loans with their payments, is needed to identify each payment in PAYMENT.
• A relationship that links the dependent entities of a weak entity set to their owners in a strong entity set, in order to identify the weak entities, is called an identifying relationship. It is shown as a doubly outlined diamond in E-R diagrams.
Every entity of a weak entity set is existence dependent on some entity of its owner entity set, so every entity of a weak entity set must participate in the identifying relationship.
• Weak entity sets must have total participation in their identifying relationships. This is shown by double lines.
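• The following E-R diagram represents the entity sets and relationships described above:
[Figure: E-R diagram. The strong entity set Loan (Loan No., Amount, Cust. Id.) is connected through the identifying relationship Loan-Payment to the weak entity set Payment (Payment No., Date, Amount).]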
• A weak entity set may also be modeled as a multivalued, composite attribute of its owner entity set. This is appropriate when the weak entity set participates only in the identifying relationship and has few attributes; otherwise, modeling it as a weak entity set is more appropriate.
• A weak entity set may contain several entities with the same values for all attributes, provided they are related to different owner entities. However, all the weak entities related to a particular owner entity must be distinguishable. The set of attributes that distinguishes the weak entities related to a particular strong entity is called the partial key or discriminator of the weak entity set (e.g., Payment Number for PAYMENT).
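The payment example can be checked directly: PAYMENT's own attributes collide, while the owner's key plus the discriminator does not. A minimal Python sketch; the field names (`loan_no`, `payment_no`, `date`, `amount`) are illustrative, and the `loan_no` field stands for the link supplied by the identifying relationship Loan-Payment.

```python
# The two payments from the example, each tagged with its owning loan.
payments = [
    {"loan_no": 1, "payment_no": 1, "date": "03/03/2010", "amount": 100},
    {"loan_no": 2, "payment_no": 1, "date": "03/03/2010", "amount": 100},
]

def has_duplicates(rows, attrs):
    """True if two rows agree on every attribute in attrs."""
    values = [tuple(r[a] for a in attrs) for r in rows]
    return len(values) != len(set(values))

# PAYMENT's own attributes cannot tell the two payments apart ...
own_attrs = ("payment_no", "date", "amount")
# ... but the owner's key plus the discriminator (partial key) can.
weak_key = ("loan_no", "payment_no")
```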
Lecture 7
Extended E-R Features:
• Specialization: the process of identifying subclasses of an entity set whose entities differ from the other entities of the set in the attributes they have or the relationships they participate in.
Consider the design of a database for an academic institution. While designing it we identify an entity set Employee, which represents all the employees of the institution, with attributes such as (Employee ID, Employee Name, Date of Joining, Address). We then notice a subclass of employees, the set of all Teachers, which differs from the other employees. Other employees may form subclasses such as Admin Staff, Technical Lab Staff, or Other Staff (peons and other workers).
All of these subclasses form a specialization of the class Employee. All the attributes and associations of Employee also apply to every subclass: every teacher has an Employee ID, Name, DOJ and Address, and so does every member of the admin staff, the technical lab staff and the other staff. Subclasses may also have attributes or associations of their own that distinguish them from the others. Teachers have a subject they teach, a department they belong to and an area of expertise, and they are associated with entities such as the classes they teach and the projects they guide. These attributes and associations do not apply to the other subsets of employees.
• Generalization: the reverse of specialization. It is the process of combining subclasses into a general class and moving their common attributes and associations from the subclasses to the general class. Only the practical approach differs: in specialization we start from a general class and form special classes from it, while in generalization we start from various lower-level classes and form a general class by combining them and identifying their common features. In the academic institution example, we may start with Teachers as an entity set, then Technical Lab Staff, then Admin Staff, and then observe that they have several fields in common, such as Employee ID, Name and Address. We combine them into a general class Employee that has only the attributes common to all three classes. These common fields are then removed from the three classes and kept once, for all employees collectively, in the general entity set Employee.
• Both processes give the same result: a hierarchy of classes and subclasses that can be represented by a tree structure. The result of generalization or specialization in this example is:
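[Figure: Employee (EMPID, Name, Address, DOJ) linked through an ISA triangle to the lower-level entity sets Teacher, Admin Staff, Lab Staff and Other Staff; the lower-level sets carry their own attributes (Subject, Deptt., Role, Lab Name).]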
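The hierarchy can be mirrored with class inheritance, which gives the same "every subclass has all the superclass attributes" behaviour. A minimal Python sketch using dataclasses; the class and field names and the sample values are hypothetical, and AdminStaff and OtherStaff would follow the same pattern.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    # Attributes of the higher-level entity set, inherited by every subclass.
    emp_id: str
    name: str
    address: str
    doj: str  # date of joining

@dataclass
class Teacher(Employee):
    # Attributes only teachers have.
    subject: str
    department: str

@dataclass
class LabStaff(Employee):
    lab_name: str

t = Teacher("E1", "Meena", "Agra", "2009-07-01", "DBMS", "EEE")
lab = LabStaff("E2", "Raj", "Agra", "2010-01-15", "Networks Lab")
```

A `Teacher` is also an `Employee` and carries `emp_id`, `name`, `address` and `doj`, while a `LabStaff` object has no `subject` field, just as the other employee subsets lack the teacher-specific attributes.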
• Aggregation: Consider a relationship set Borrower that associates customers in the entity set Customer with the loans they have taken in the entity set Loan. Suppose the bank decides to assign an employee to some of the customer-loan pairs in Borrower, perhaps based on the size of the loan or the status of the customer. This employee, called the loan officer, is responsible for tracking and following up the status of the loan from time to time. This suggests a relationship among Customer, Loan and Employee. So we could draw a simple E-R diagram in which Borrower relates Customer and Loan, and a separate ternary relationship Loan-officer relates Customer, Loan and Employee.
o This diagram may suggest that the relationships Borrower and Loan-officer be combined into one. But then a loan officer would have to be attached to every customer-loan pair, which is not the case.
o The diagram also has redundancy, as every customer-loan pair in Loan-officer is also in Borrower.
• A more appropriate way to represent these relationships is to treat the entire relationship Borrower, together with its associated entities Customer and Loan, as an entity (an aggregate entity), and then to represent the relationship Loan-officer between the entity Employee and this aggregate entity, as follows:
[Figure: aggregation. The relationship set Borrower, with its entity sets Customer (Cust-ID, Cust-name, Address) and Loan (Loan Number, Amount), is enclosed in a box and treated as one aggregate entity; the relationship set Loan-officer connects this aggregate to Employee (Emp-ID, Emp-name, Address).]
• Important terms related to generalization and specialization:
o Attribute inheritance: all the attributes of a higher-level entity set are inherited by its lower-level entity sets. Likewise, the lower-level entity sets participate in all the relationships in which their higher-level entity set participates.
o Disjoint: the lower-level entity sets of a higher-level entity set are called disjoint if no entity belongs to more than one of them.
o Overlapping: the same entity may belong to more than one lower-level entity set.
o Complete generalization: each higher-level entity must belong to at least one lower-level entity set.
o Partial generalization: some higher-level entities may not belong to any lower-level entity set.
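The disjoint/overlapping and complete/partial properties are set conditions, so they can be checked on a given population of entities. A minimal Python sketch; the function name, the entity identifiers and the set names are hypothetical.

```python
def classify(higher, subsets):
    """higher: set of entity ids in the higher-level entity set.
    subsets: dict mapping each lower-level entity set name to its entity ids.
    Returns ("disjoint" | "overlapping", "complete" | "partial")."""
    members = list(subsets.values())
    # Disjoint: no entity appears in two different lower-level sets.
    disjoint = all(a.isdisjoint(b)
                   for i, a in enumerate(members) for b in members[i + 1:])
    # Complete: every higher-level entity lies in at least one lower-level set.
    complete = higher <= set().union(*members)
    return ("disjoint" if disjoint else "overlapping",
            "complete" if complete else "partial")

# Every account is exactly one of saving or current.
accounts = classify({"A1", "A2", "A3"}, {"SAVING": {"A1", "A2"}, "CURRENT": {"A3"}})
# E1 is both a teacher and lab staff; E3 is in no subclass.
staff = classify({"E1", "E2", "E3"}, {"TEACHER": {"E1"}, "LAB_STAFF": {"E1", "E2"}})
```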
Lecture 8
Reduction of ER Schema to Tables:
• Strong Entity Sets: These are entity sets that have a set of attributes forming a primary key (or simply a key). To represent such an entity set as a table, we have one column for each attribute of the entity schema, and each entity of the entity set is represented by a row containing its value for each attribute.
For Example: the entity set BOOK referred to earlier has the attributes Acc. No., Call No., Title, Author, Publisher and Yr. of Publication. Suppose the library has only two books: Database System Concepts by Silberschatz, Korth and Sudarshan, published in 1997, with accession number 312 and call number 245; and Fundamentals of Database Systems by Elmasri and Navathe, published in 1999, with accession number 433 and call number 23. Then the entity set BOOK is represented in tabular form as follows:
Acc. No. | Call No. | Title | Author | Publisher | Yr. of Publication
312 | 245 | Database System Concepts | Silberschatz, Korth, Sudarshan | McGrawHill | 1997
433 | 23 | Fundamentals of Database Systems | Elmasri, Navathe | Addison Wesley | 1999
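The same table can be created in a real DBMS. A minimal sketch using SQLite through Python's standard `sqlite3` module; the table and column names (`book`, `acc_no`, `pub_year`, ...) are illustrative, and Acc. No. is used as the key because it is the key of BOOK used later in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per attribute of the entity set BOOK.
conn.execute("""
    CREATE TABLE book (
        acc_no    INTEGER PRIMARY KEY,
        call_no   INTEGER,
        title     TEXT,
        author    TEXT,
        publisher TEXT,
        pub_year  INTEGER
    )""")
# One row per entity.
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?, ?, ?)", [
    (312, 245, "Database System Concepts", "Silberschatz, Korth, Sudarshan", "McGrawHill", 1997),
    (433, 23, "Fundamentals of Database Systems", "Elmasri, Navathe", "Addison Wesley", 1999),
])
rows = conn.execute("SELECT acc_no, title FROM book ORDER BY acc_no").fetchall()
```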

• Weak Entity Sets: For these entity sets we cannot distinguish the different entities by their attributes alone; we must link each weak entity to some entity of another, strong entity set, which is called the owner of the weak entity set. When such an entity set is represented as a table, it has columns for the key attributes of the owner in addition to the columns for its own attributes.
For Example: Consider the PAYMENT entity set, a weak entity set that depends on its owner entity set LOAN. LOAN has the attributes Loan No. and Loan Amount, with Loan No. as the key attribute, and PAYMENT has the attributes Payment No., Payment Amount and Payment Date (it has no key, since it is a weak entity set, but Payment No. is a partial key). Consider the following table for LOAN:

Loan No. | Loan Amount
L-1 | Rs. 10000
L-2 | Rs. 40000

Also suppose that a payment of Rs. 100 is made for L-1 on 22/08/2010 as its first payment, and a payment of Rs. 300 is made for L-2 on 25/08/2010 as its first payment. Then the table for the entity set PAYMENT looks like this:

Loan No. | Payment No. | Payment Amount | Payment Date
L-1 | 1 | Rs. 100 | 22/08/2010
L-2 | 1 | Rs. 300 | 25/08/2010

Notice: We have included Loan No. as a column even though it is not an attribute of the entity set PAYMENT, because it is the key attribute of the owner entity set LOAN. Loan No. and Payment No. together form the primary key of this table.
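In SQL, the owner's key becomes a foreign key and the composite primary key enforces the "distinguishable within each owner" rule. A minimal sketch using SQLite through Python's `sqlite3`; the table and column names are illustrative, and amounts are stored as plain integers (rupees).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE loan (loan_no TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("""
    CREATE TABLE payment (
        loan_no      TEXT REFERENCES loan(loan_no),  -- key of the owner LOAN
        payment_no   INTEGER,                        -- partial key (discriminator)
        amount       INTEGER,
        payment_date TEXT,
        PRIMARY KEY (loan_no, payment_no)
    )""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-1", 10000), ("L-2", 40000)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)",
                 [("L-1", 1, 100, "22/08/2010"), ("L-2", 1, 300, "25/08/2010")])

def try_insert(row):
    """Return True if the payment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second "payment 1" for L-1 is rejected by the composite key, "payment 2" for L-1 is accepted, and a payment for a loan that does not exist is rejected by the foreign key.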
• Relationships: To represent a relationship set of an E-R diagram as a table, we have a column for the key attributes of each participating entity set, plus a column for each attribute attached directly to the relationship set.
For example: We defined earlier the relationship BORROWED BY between the entity sets BOOK and USER. It has an attribute, Date of Issue, attached directly to it. The table for BORROWED BY has a column Acc. No. corresponding to the key of BOOK, a column Card No. corresponding to the key of USER, and a column DOI (Date of Issue) for the attribute of the relationship set. The table may look like this, where the rows represent all the borrowings in the library:
Book Acc. No. | User Card No. | Date of Issue
312 | 422 | 03/08/2010
433 | 4322 | 05/08/2010
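A minimal SQLite sketch of the BORROWED BY table via Python's `sqlite3`; the names `library_user`, `acc_no`, `card_no` and `date_of_issue` are illustrative, and the primary key is Acc. No. alone, following the many-to-one rule stated earlier for this relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE book (acc_no INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE library_user (card_no INTEGER PRIMARY KEY)")
# One column per participating key, plus the relationship's own attribute.
conn.execute("""
    CREATE TABLE borrowed_by (
        acc_no        INTEGER PRIMARY KEY REFERENCES book(acc_no),
        card_no       INTEGER NOT NULL REFERENCES library_user(card_no),
        date_of_issue TEXT
    )""")
conn.executemany("INSERT INTO book VALUES (?)", [(312,), (433,)])
conn.executemany("INSERT INTO library_user VALUES (?)", [(422,), (4322,)])
conn.executemany("INSERT INTO borrowed_by VALUES (?, ?, ?)",
                 [(312, 422, "03/08/2010"), (433, 4322, "05/08/2010")])

def borrow(acc_no, card_no, date):
    """Record a borrowing; False if a key or foreign-key constraint rejects it."""
    try:
        conn.execute("INSERT INTO borrowed_by VALUES (?, ?, ?)", (acc_no, card_no, date))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because Acc. No. is the key, the same book cannot appear in two borrowing rows at once.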
• Existentially Dependent Entity Sets: The existence of every entity of a dependent entity set depends on the existence of some entity of its owner (dominant) entity set. So we may drop the table for the relationship between an existentially dependent entity set and its dominant entity set, and instead add to the table of the dependent entity set a column for the key of the dominant entity set.
For Example: ACCOUNT, with attributes Account No. (key) and Balance, is existentially dependent on BRANCH, with attributes Branch Id (key) and Address. So the table for the relationship BRANCH-ACCOUNT, which associates accounts with branches, can be dropped by adding a column Branch Id to the table for ACCOUNT. This works because each account belongs to exactly one branch, so one Branch Id value per account row is enough. The table has the following columns:

Account No. | Balance | Branch Id
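A minimal SQLite sketch (Python `sqlite3`) of this design; table and column names and the sample branch `B-1` are hypothetical. The `NOT NULL` foreign key is one way to express that every account must belong to an existing branch, and a join recovers what the dropped BRANCH-ACCOUNT table would have held.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.execute("CREATE TABLE branch (branch_id TEXT PRIMARY KEY, address TEXT)")
# No BRANCH-ACCOUNT table: the relationship is carried by account.branch_id.
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,
        balance    INTEGER,
        branch_id  TEXT NOT NULL REFERENCES branch(branch_id)
    )""")
conn.execute("INSERT INTO branch VALUES ('B-1', 'Agra')")
conn.execute("INSERT INTO account VALUES ('A-101', 500, 'B-1')")

def open_account(account_no, balance, branch_id):
    """False if the account has no branch or refers to a missing branch."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", (account_no, balance, branch_id))
        return True
    except sqlite3.IntegrityError:
        return False

# Which branch holds each account? A join replaces the dropped relationship table.
located = conn.execute("""
    SELECT a.account_no, b.address
    FROM account a JOIN branch b ON a.branch_id = b.branch_id""").fetchall()
```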

• Identifying Relationship Sets: These relationship sets, drawn as doubly outlined diamonds in E-R diagrams, associate a weak entity set with its owner. Since the primary key of the strong owner entity set is already included in the table for the weak entity set, no separate table is needed to represent an identifying relationship.
• Multivalued Attributes: An attribute of an entity that can have more than one value is called a multivalued attribute. It is drawn as a doubly outlined oval in E-R diagrams.
For example: Consider Dependants, an attribute of EMPLOYEE. Since an employee may have more than one dependant, we represent it as a multivalued attribute of EMPLOYEE. But if we represented it as a single column in the table for the entity set, we could not fit all of its values into one row. A multivalued attribute is therefore represented by a separate table, similar to a weak entity set, with a column for the primary key of the entity and a column for each sub-attribute of the multivalued attribute (there are sub-attributes when the multivalued attribute is itself a composite attribute). The E-R diagram and the corresponding tables for this example are:

[Figure: E-R diagram. EMPLOYEE (EMPID, EMPName) with the multivalued, composite attribute Dependent (Dependent No., Name, Relation).]

Employee Table
EMPID EMPName
EMP-1 Rajan
EMP-2 Sartaj

Dependant Table
EMPID Dependent No. Name Relation
EMP-1 1 Shashi Mother
EMP-1 2 Rachna Spouse
EMP-2 1 Shahina Spouse
EMP-2 2 Rehman Son

Here we see that Rajan has two dependants, his mother Shashi and his wife Rachna, and Sartaj also has two dependants, his wife Shahina and his son Rehman.
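The two tables, and the join that reassembles an employee's dependants, can be sketched in SQLite through Python's `sqlite3`. The table and column names are illustrative; the data is the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid TEXT PRIMARY KEY, empname TEXT)")
# The multivalued, composite attribute Dependent becomes its own table,
# keyed by the owner's key plus Dependent No.
conn.execute("""
    CREATE TABLE dependant (
        empid        TEXT REFERENCES employee(empid),
        dependent_no INTEGER,
        name         TEXT,
        relation     TEXT,
        PRIMARY KEY (empid, dependent_no)
    )""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("EMP-1", "Rajan"), ("EMP-2", "Sartaj")])
conn.executemany("INSERT INTO dependant VALUES (?, ?, ?, ?)", [
    ("EMP-1", 1, "Shashi", "Mother"), ("EMP-1", 2, "Rachna", "Spouse"),
    ("EMP-2", 1, "Shahina", "Spouse"), ("EMP-2", 2, "Rehman", "Son"),
])
# All of Rajan's dependants, recovered with a join on the owner's key.
rajan = conn.execute("""
    SELECT d.name, d.relation
    FROM employee e JOIN dependant d ON d.empid = e.empid
    WHERE e.empname = 'Rajan'
    ORDER BY d.dependent_no""").fetchall()
```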

• Generalization: Consider the following generalization example:
[Figure: ACCOUNT (AccNo, Balance) linked through an ISA triangle to the lower-level entity sets SAVING (InterestRate) and CURRENT (OverdraftAmount).]
It shows a general class of entities, ACCOUNT, with two special classes, SAVING and CURRENT, representing savings bank accounts and current bank accounts.
It can be represented in the following ways:
1. Tables for the general case:
ACCOUNT:
AccNo Balance

SAVING:
AccNo InterestRate

CURRENT:
AccNo OverdraftAmount



2. Tables when the generalization is disjoint and complete:
The case above has both properties: no account can be both a saving and a current account, and every account must be either a saving account or a current account. In such a case we may merge the ACCOUNT table into its child tables, which leaves only two tables:
SAVING:
AccNo Balance InterestRate

CURRENT:
AccNo Balance OverdraftAmount
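With the two-table design, the general ACCOUNT relation is no longer stored but can be rebuilt as a union. A minimal SQLite sketch via Python's `sqlite3`; the table names (`saving_account`, `current_account`) and the sample rows are hypothetical. Disjointness means the union counts no account twice and completeness means it misses none; note that this schema does not itself stop the same AccNo from appearing in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each child table repeats the inherited columns (AccNo, Balance).
conn.execute("CREATE TABLE saving_account "
             "(accno TEXT PRIMARY KEY, balance INTEGER, interest_rate REAL)")
conn.execute("CREATE TABLE current_account "
             "(accno TEXT PRIMARY KEY, balance INTEGER, overdraft_amount INTEGER)")
conn.execute("INSERT INTO saving_account VALUES ('S-1', 1000, 4.0)")
conn.execute("INSERT INTO current_account VALUES ('C-1', 5000, 20000)")
# The general ACCOUNT relation recovered as a union of projections.
all_accounts = conn.execute("""
    SELECT accno, balance FROM saving_account
    UNION
    SELECT accno, balance FROM current_account
    ORDER BY accno""").fetchall()
```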


• Aggregation: For the E-R diagram given earlier involving LOAN, CUSTOMER, BORROWER, EMPLOYEE and LOAN-OFFICER, we can have the following tables:
o Loan: with attributes LoanNumber and Amount.
o Customer: with attributes Cust-Name, Cust-ID and Address.
o Borrower: with attributes Cust-ID and LoanNumber.
o Employee: with attributes Emp-ID, Emp-Name and Address.
o LoanOfficer: with attributes Cust-ID, LoanNumber and Emp-ID.
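The point of aggregation shows up as a composite foreign key: a LoanOfficer row may only refer to a customer-loan pair that already exists in Borrower. A minimal SQLite sketch via Python's `sqlite3`; names and sample rows are hypothetical, and making (Cust-ID, LoanNumber) the key of LoanOfficer assumes at most one officer per pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, address TEXT);
    CREATE TABLE borrower (
        cust_id     TEXT REFERENCES customer(cust_id),
        loan_number TEXT REFERENCES loan(loan_number),
        PRIMARY KEY (cust_id, loan_number));
    CREATE TABLE employee (emp_id TEXT PRIMARY KEY, emp_name TEXT, address TEXT);
    -- Loan-officer relates an employee to a Borrower pair (the aggregate).
    CREATE TABLE loan_officer (
        cust_id     TEXT,
        loan_number TEXT,
        emp_id      TEXT REFERENCES employee(emp_id),
        PRIMARY KEY (cust_id, loan_number),
        FOREIGN KEY (cust_id, loan_number) REFERENCES borrower(cust_id, loan_number));
""")
conn.execute("INSERT INTO loan VALUES ('L-13', 1000)")
conn.execute("INSERT INTO customer VALUES ('C-1', 'Ram', 'Agra')")
conn.execute("INSERT INTO borrower VALUES ('C-1', 'L-13')")
conn.execute("INSERT INTO employee VALUES ('E-1', 'Mohan', 'Agra')")

def assign_officer(cust_id, loan_number, emp_id):
    """False if the pair is not in Borrower or already has an officer."""
    try:
        conn.execute("INSERT INTO loan_officer VALUES (?, ?, ?)",
                     (cust_id, loan_number, emp_id))
        return True
    except sqlite3.IntegrityError:
        return False
```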
Lecture 9
Relational Model
A relational database consists of a collection of tables, each of which is assigned a unique name.
A row in a table represents a relationship among a set of values.
A table is a collection of such rows (relationships), which is similar to a mathematical relation, i.e., a set of tuples.
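Basic Structure
A mathematical binary relation is an association of values from one set with values of another set. For example, the less-than relation associates a set of integers with another set of integers. Consider
$A = B = \{1, 2, 3, \dots\}$
The relationship less-than from A to B can be represented as a set of tuples $\langle x, y\rangle$, where x is an element of A and y is an element of B such that $x < y$, i.e.
Relationship less-than from A to B = $\{\langle 1,2\rangle, \langle 1,3\rangle, \langle 1,4\rangle, \dots, \langle 2,3\rangle, \langle 2,4\rangle, \dots, \langle 3,4\rangle, \dots\}$.
Similarly,
Relationship equal-to from A to B = $\{\langle 1,1\rangle, \langle 2,2\rangle, \langle 3,3\rangle, \dots\}$.
And
Relationship greater-than from A to B = $\{\langle 2,1\rangle, \langle 3,1\rangle, \langle 3,2\rangle, \langle 4,1\rangle, \langle 4,2\rangle, \langle 4,3\rangle, \dots\}$.
The Cartesian product of A and B contains all tuples $\langle x, y\rangle$ where x belongs to A and y belongs to B, i.e.
$$A \times B = \left\{\begin{matrix} \langle 1,1\rangle & \langle 1,2\rangle & \langle 1,3\rangle & \cdots \\ \langle 2,1\rangle & \langle 2,2\rangle & \langle 2,3\rangle & \cdots \\ \langle 3,1\rangle & \langle 3,2\rangle & \langle 3,3\rangle & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{matrix}\right\}$$
So we can see that each of the relations from A to B above is a subset of $A \times B$.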
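The subset relationship is easy to check in code once the sets are finite. A minimal Python sketch that, as an assumption for illustration, truncates A and B to {1, 2, 3, 4}; the variable names are illustrative.

```python
from itertools import product

A = B = {1, 2, 3, 4}              # finite stand-in for {1, 2, 3, ...}
cartesian = set(product(A, B))    # every pair <x, y> with x in A, y in B
less_than = {(x, y) for (x, y) in cartesian if x < y}
equal_to = {(x, y) for (x, y) in cartesian if x == y}
greater_than = {(x, y) for (x, y) in cartesian if x > y}
```

Each relation is a subset of `cartesian`, and on this finite range the three relations together cover all 16 pairs exactly once.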
Now suppose a table T in the relational model has columns titled A, B and C. If T represents an entity set, then A, B and C are its attributes. Each attribute can be assigned values only from a specified set.
The set of values that can be assigned to a particular column is called the domain of that column. So A, B and C have their specified domains; denote these domain sets by $D_A$, $D_B$ and $D_C$.
Any row of the table has a value from $D_A$ in the first column, from $D_B$ in the second column, and from $D_C$ in the third column. So a row of a relational table is similar to a tuple of a mathematical relation among the sets $D_A$, $D_B$ and $D_C$, and the whole table is a subset of $D_A \times D_B \times D_C$.
For Example: Consider a BOOK table with attributes acc-no, title and author. To keep it simple, we restrict the domain of acc-no to A = {100, 200, 300}, of title to B = {DBMS, COMPILER, OS} and of author to C = {Ramanuj, Aristotle, Silbershatz}. That means the first column of BOOK can take values only from A, the second only from B and the third only from C. The Cartesian product of these domain sets can be represented in tabular form as:
$A \times B \times C$ =
acc-no title author
100 DBMS Ramanuj
100 DBMS Aristotle
100 DBMS Silbershatz
100 COMPILER Ramanuj
100 COMPILER Aristotle
100 COMPILER Silbershatz
100 OS Ramanuj
100 OS Aristotle
100 OS Silbershatz
200 DBMS Ramanuj
200 DBMS Aristotle
200 DBMS Silbershatz
200 COMPILER Ramanuj
200 COMPILER Aristotle
200 COMPILER Silbershatz
200 OS Ramanuj
200 OS Aristotle
200 OS Silbershatz
300 DBMS Ramanuj
300 DBMS Aristotle
300 DBMS Silbershatz
300 COMPILER Ramanuj
300 COMPILER Aristotle
300 COMPILER Silbershatz
300 OS Ramanuj
300 OS Aristotle
300 OS Silbershatz
Now we can observe that any valid table representing the entity set BOOK for a library, given the above restrictions on the domains, contains only a subset of the rows of the table above, which represents $A \times B \times C$. For example, a valid entity set for all the books in the library can be:
BOOK
acc-no title author
100 DBMS Silbershatz
200 DBMS Ramanuj
300 COMPILER Silbershatz
So we can say that any table of the relational model is similar to a mathematical relation.
Every row of such a relational table is similar to a tuple of a mathematical relation. If the tuple variable t refers to the first tuple (first row) of the BOOK table above, we can refer to the individual elements of the tuple as t[acc-no] = 100, t[title] = DBMS and t[author] = Silbershatz.
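The domain restriction and the t[attribute] notation can be mimicked with tuples and an attribute list. A minimal Python sketch; the helper name `component` is hypothetical and stands in for t[attr].

```python
from itertools import product

ATTRS = ("acc-no", "title", "author")
DOMAINS = {
    "acc-no": {100, 200, 300},
    "title": {"DBMS", "COMPILER", "OS"},
    "author": {"Ramanuj", "Aristotle", "Silbershatz"},
}
# Every possible row: A x B x C, i.e. 3 * 3 * 3 = 27 tuples.
all_rows = set(product(*(DOMAINS[a] for a in ATTRS)))

BOOK = [(100, "DBMS", "Silbershatz"),
        (200, "DBMS", "Ramanuj"),
        (300, "COMPILER", "Silbershatz")]

def component(t, attr):
    """t[attr]: the value of attribute attr in tuple t."""
    return t[ATTRS.index(attr)]

t = BOOK[0]  # the first tuple of BOOK
```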
Query Languages
A language in which a user requests information from the database is called a query language.
o Procedural: the user instructs the system to perform a sequence of operations on the database to compute the desired result, e.g., relational algebra.
o Nonprocedural: the user describes the desired information without giving a specific procedure for obtaining it, e.g., tuple relational calculus and domain relational calculus.
Lecture 10
Relational Algebra:
Basic operations:
o Selection ($\sigma$): selects a subset of rows from a relation.
o Projection ($\pi$): selects a subset of columns from a relation.
o Cross-product ($\times$): combines two relations.
o Set-difference ($-$): tuples in relation 1 but not in relation 2.
o Union ($\cup$): tuples in relation 1 or in relation 2 (or in both).
o Rename ($\rho$): gives new names to tables or fields.
Additional operations:
o Intersection ($\cap$), join ($\bowtie$), division ($\div$): not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (The algebra is closed.)
Projection
• Deletes the attributes that are not in the projection list.
• The schema of the result contains exactly the fields in the projection list, with the same names they had in the (only) input relation. (Unary operation.)
• The projection operator has to eliminate duplicates, since it returns a relation, which is a set.
o Note: real systems typically do not eliminate duplicates unless the user explicitly asks for it. (Duplicate values may represent different real-world entities or relationships.)
Consider the BOOK table:
Acc-No Title Author
100 DBMS Silbershatz
200 DBMS Ramanuj
300 COMPILER Silbershatz
400 COMPILER Ullman
500 OS Sudarshan
600 DBMS Silbershatz
$\pi_{Title}(BOOK)$ =
Title
DBMS
COMPILER
OS
Selection
• Selects the rows that satisfy the selection condition.
• No duplicates in the result! (Why? The input is a set, and selection only drops rows, so no row can appear twice.)
• The schema of the result is identical to the schema of the (only) input relation.
• The result relation can be the input for another relational algebra operation (operator composition).
$\sigma_{Acc\text{-}no > 300}(BOOK)$ =
Acc-No Title Author
400 COMPILER Ullman
500 OS Sudarshan
600 DBMS Silbershatz
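Selection and projection translate directly into filtering rows and keeping columns. A minimal Python sketch with rows as dicts; the function names `project` and `select` are hypothetical, and a list is used (with explicit duplicate removal in `project`) only to keep the rows in display order.

```python
BOOK = [
    {"Acc-No": 100, "Title": "DBMS", "Author": "Silbershatz"},
    {"Acc-No": 200, "Title": "DBMS", "Author": "Ramanuj"},
    {"Acc-No": 300, "Title": "COMPILER", "Author": "Silbershatz"},
    {"Acc-No": 400, "Title": "COMPILER", "Author": "Ullman"},
    {"Acc-No": 500, "Title": "OS", "Author": "Sudarshan"},
    {"Acc-No": 600, "Title": "DBMS", "Author": "Silbershatz"},
]

def project(relation, attrs):
    """pi_attrs(relation): keep only attrs, dropping duplicate rows."""
    result = []
    for row in relation:
        new_row = {a: row[a] for a in attrs}
        if new_row not in result:
            result.append(new_row)
    return result

def select(relation, predicate):
    """sigma_predicate(relation): keep the rows for which predicate holds."""
    return [row for row in relation if predicate(row)]

titles = project(BOOK, ["Title"])
recent = select(BOOK, lambda r: r["Acc-No"] > 300)
# Composition: pi_Acc-no(sigma_Title="DBMS"(BOOK))
dbms_acc = project(select(BOOK, lambda r: r["Title"] == "DBMS"), ["Acc-No"])
```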
$\sigma_{Title = \text{"DBMS"}}(BOOK)$ =
Acc-No Title Author
100 DBMS Silbershatz
200 DBMS Ramanuj
600 DBMS Silbershatz
$\pi_{Acc\text{-}no}(\sigma_{Title = \text{"DBMS"}}(BOOK))$ =
Acc-No
100
200
600
Union, Intersection, Set-Difference
• All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o Corresponding fields have the same type.
• What is the schema of the result? It is the same as the schema of the first input relation.
Consider:
Borrower:
Cust-name Loan-no
Ram L-13
Shyam L-30
Suleman L-42
Depositor:
Cust-name Acc-no
Suleman A-100
Radheshyam A-300
Ram A-401
List of customers who are either borrowers or depositors at the bank = $\pi_{Cust\text{-}name}(Borrower) \cup \pi_{Cust\text{-}name}(Depositor)$ =
Cust-name
Ram
Shyam
Suleman
Radheshyam
Customers who are both borrowers and depositors = $\pi_{Cust\text{-}name}(Borrower) \cap \pi_{Cust\text{-}name}(Depositor)$ =
Cust-name
Ram
Suleman
Customers who are borrowers but not depositors = $\pi_{Cust\text{-}name}(Borrower) - \pi_{Cust\text{-}name}(Depositor)$ =
Cust-name
Shyam
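Because relations are sets of tuples, the three operations map onto Python's set operators. A minimal sketch using the Borrower and Depositor data above; the helper `project_name` is hypothetical and plays the role of $\pi_{Cust\text{-}name}$, which makes the two inputs union-compatible (one field of customer names each).

```python
borrower = {("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")}
depositor = {("Suleman", "A-100"), ("Radheshyam", "A-300"), ("Ram", "A-401")}

def project_name(relation):
    """pi_Cust-name: keep only the first field of each tuple."""
    return {(name,) for (name, _) in relation}

either = project_name(borrower) | project_name(depositor)         # union
both = project_name(borrower) & project_name(depositor)           # intersection
borrower_only = project_name(borrower) - project_name(depositor)  # set-difference
```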
Lecture-11
Cartesian-Product or Cross-Product ($S1 \times R1$)
• Each row of S1 is paired with each row of R1.
• The result schema has one field per field of S1 and R1, with field names 'inherited' if possible.
• Consider the Borrower and Loan tables as follows:
Borrower:
Cust-name Loan-no
Ram L-13
Shyam L-30
Suleman L-42
Loan:
Loan-no Amount
L-13 1000
L-30 20000
L-42 40000
The cross product of Borrower and Loan, $Borrower \times Loan$ =
Borrower.Cust-name Borrower.Loan-no Loan.Loan-no Loan.Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
• The rename operation can be used to rename fields to avoid confusion when two participating tables have fields with the same name. For example, the statement
$\rho_{Loan\text{-}borrower(Cust\text{-}name,\ Loan\text{-}No\text{-}1,\ Loan\text{-}No\text{-}2,\ Amount)}(Borrower \times Loan)$
creates a new table named Loan-borrower with four fields, renamed Cust-name, Loan-No-1, Loan-No-2 and Amount, whose rows are the same as those of the cross product of Borrower and Loan.
Loan-borrower:
Cust-name Loan-No-1 Loan-No-2 Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
Rename Operation:
• It can be used in two ways:
o $\rho_x(E)$ returns the result of expression E in a table named x.
o $\rho_{x(A_1, A_2, \ldots, A_n)}(E)$ returns the result of expression E in a table named x, with its attributes renamed to $A_1, A_2, \ldots, A_n$.
o Its benefit can be seen in the solution of the query "Find the largest account balance in the bank". It can be solved in the following steps:
1. Find the relation containing the balances that are not the largest.
a. Consider the Cartesian product of Account with itself, i.e., $Account \times Account$.
b. Compare the balances of the first Account table with the balances of the second Account table in the product.
c. To avoid confusion, rename one of the two Account tables, say to d.
d. This is done by the following operation:
$\pi_{Account.balance}(\sigma_{Account.balance < d.balance}(Account \times \rho_d(Account)))$
e. This relation contains exactly the balances that are not the largest, since each such balance is smaller than at least one other balance.
2. Subtract this relation from the relation containing all the balances, i.e., $\pi_{balance}(Account)$.
3. So the final expression for the query is
$\pi_{balance}(Account) - \pi_{Account.balance}(\sigma_{Account.balance < d.balance}(Account \times \rho_d(Account)))$
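The cross product, the rename operation and the largest-balance query can be traced in a short Python sketch. Relations are modelled as dicts with an attribute tuple and a row list; the helpers `cross` and `rename` and the Account rows are hypothetical sample material. Passing the name `d` for the second copy of Account plays the role of $\rho_d$.

```python
from itertools import product

def cross(r, s, r_name, s_name):
    """r x s: every row of r paired with every row of s; field names are
    qualified with the relation name, as in Borrower.Loan-no."""
    attrs = (tuple(f"{r_name}.{a}" for a in r["attrs"])
             + tuple(f"{s_name}.{a}" for a in s["attrs"]))
    rows = [x + y for x, y in product(r["rows"], s["rows"])]
    return {"attrs": attrs, "rows": rows}

def rename(relation, new_attrs):
    """rho_{x(A1,...,An)}(E): same rows under new field names."""
    if len(new_attrs) != len(relation["attrs"]):
        raise ValueError("need exactly one new name per field")
    return {"attrs": tuple(new_attrs), "rows": list(relation["rows"])}

borrower = {"attrs": ("Cust-name", "Loan-no"),
            "rows": [("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")]}
loan = {"attrs": ("Loan-no", "Amount"),
        "rows": [("L-13", 1000), ("L-30", 20000), ("L-42", 40000)]}
loan_borrower = rename(cross(borrower, loan, "Borrower", "Loan"),
                       ("Cust-name", "Loan-No-1", "Loan-No-2", "Amount"))

# Largest balance, following the three steps above (sample data is hypothetical).
account = {"attrs": ("acc-no", "balance"),
           "rows": [("A-1", 500), ("A-2", 900), ("A-3", 700), ("A-4", 900)]}
pairs = cross(account, account, "Account", "d")  # Account x rho_d(Account)
i = pairs["attrs"].index("Account.balance")
j = pairs["attrs"].index("d.balance")
not_largest = {(row[i],) for row in pairs["rows"] if row[i] < row[j]}  # steps 1a to 1e
all_balances = {(bal,) for (_, bal) in account["rows"]}                # pi_balance(Account)
largest = all_balances - not_largest                                   # step 3
```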
Additional Operations
Natural Join ($\bowtie$)
• Forms the Cartesian product of its two arguments and performs a selection forcing equality on the attributes that appear in both relations; the duplicate copies of those common attributes are then removed.
• For example, consider the Borrower and Loan relations. The natural join $Borrower \bowtie Loan$ automatically performs a selection on the table returned by $Borrower \times Loan$, forcing equality on the attribute that appears in both Borrower and Loan (Loan-no), and keeps only one column named Loan-no.
That means $Borrower \bowtie Loan = \pi_{Cust\text{-}name,\ Borrower.Loan\text{-}no,\ Amount}(\sigma_{Borrower.Loan\text{-}no = Loan.Loan\text{-}no}(Borrower \times Loan))$.
The result is obtained as follows. First, eliminate the rows of $Borrower \times Loan$ that do not satisfy the selection condition $Borrower.Loan\text{-}no = Loan.Loan\text{-}no$ (only the rows marked * satisfy it):
Borrower.Cust-name Borrower.Loan-no Loan.Loan-no Loan.Amount
Ram L-13 L-13 1000 *
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000 *
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000 *
Then remove one of the two columns named Loan-no.
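The select-then-project steps just described can be sketched in Python for relations given as an attribute list plus a list of row tuples. The function name `natural_join` is hypothetical; this is a nested-loop illustration, not how a DBMS would evaluate the join.

```python
def natural_join(r, r_attrs, s, s_attrs):
    """r natural-join s: pair rows that agree on every common attribute and
    keep one copy of each common attribute."""
    common = [a for a in r_attrs if a in s_attrs]
    s_extra = [a for a in s_attrs if a not in common]
    out_attrs = list(r_attrs) + s_extra
    rows = []
    for x in r:                      # Cartesian product ...
        for y in s:
            xd, yd = dict(zip(r_attrs, x)), dict(zip(s_attrs, y))
            if all(xd[a] == yd[a] for a in common):             # ... selection
                rows.append(tuple(x) + tuple(yd[a] for a in s_extra))  # projection
    return out_attrs, rows

borrower = [("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")]
loan = [("L-13", 1000), ("L-30", 20000), ("L-42", 40000)]
attrs, rows = natural_join(borrower, ("Cust-name", "Loan-no"),
                           loan, ("Loan-no", "Amount"))
```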
i.e., $Borrower \bowtie Loan$ =
Cust-name Loan-no Amount
Ram L-13 1000
Shyam L-30 20000
Suleman L-42 40000
Division Operation:
• Division, denoted by $\div$, is used for queries that include the phrase "for all".
• For example: "Find the customers who have an account at every branch located in the city Agra." Depositor is first joined with Account to obtain the branch of each account, and the query can then be solved by the following expression:
$\pi_{customer\text{-}name,\ branch\text{-}name}(Depositor \bowtie Account) \div \pi_{branch\text{-}name}(\sigma_{branch\text{-}city = \text{"Agra"}}(Branch))$
• The division operation can be expressed using only basic operations as follows. Let r(R) and s(S) be relations on schemas R and S, with $S \subseteq R$:
$r \div s = \pi_{R-S}(r) - \pi_{R-S}((\pi_{R-S}(r) \times s) - \pi_{R-S,S}(r))$
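The basic-operations formula for division can be followed term by term in Python. This sketch assumes r has exactly the two fields (customer-name, branch-name) and s the single field branch-name, so $\pi_{R-S,S}(r)$ is r itself; the branch name "Civil Lines" and all rows are hypothetical.

```python
from itertools import product

def divide(r, s):
    """r / s for r over (A, B) and s over (B,), both sets of tuples.
    Returns the A-values paired in r with every B-value in s."""
    candidates = {(a,) for (a, _) in r}                          # pi_{R-S}(r)
    needed = {(a, b) for (a,), (b,) in product(candidates, s)}   # pi_{R-S}(r) x s
    missing = {(a,) for (a, _) in needed - r}                    # pi_{R-S}(... - r)
    return candidates - missing

# (customer-name, branch-name) pairs, as from Depositor joined with Account.
has_account_at = {("Ram", "Sadar"), ("Ram", "Civil Lines"),
                  ("Shyam", "Sadar"), ("Suleman", "Civil Lines")}
agra_branches = {("Sadar",), ("Civil Lines",)}
customers_at_all_agra = divide(has_account_at, agra_branches)
```

Only Ram is paired with both Agra branches, so he is the only customer in the quotient.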
Lecture-12
Tuple Relational Calculus
• Relational algebra is an example of a procedural language, while tuple relational calculus is a nonprocedural query language.
• A query is specified as:
$\{t \mid P(t)\}$, i.e., the set of all tuples t such that predicate P is true for t.
• The formula P(t) is built from atoms, which use relations, tuples of relations, fields of tuples and the following symbols:
o $\in$ (belongs to), $<$, $>$, $\le$, $\ge$, $\ne$, $=$ (comparison operators)
• These atoms can then be combined into formulas with the following symbols:
o $\forall$ (universal quantifier, read "for all")
o $\exists$ (existential quantifier, read "there exists")
o $\wedge$ (and), $\vee$ (or), $\neg$ (not), $\Rightarrow$ (implies)
• For example, here are some queries and a way to express them in tuple calculus:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
$\{t \mid t \in Loan \wedge t[amount] > 1200\}$
o Find the loan number for each loan of an amount greater than Rs 1200.
$\{t \mid \exists s \in Loan\ (t[loan\text{-}number] = s[loan\text{-}number] \wedge s[amount] > 1200)\}$
o Find the names of all the customers who have a loan from the Sadar branch.
$\{t \mid \exists s \in Borrower\ (t[customer\text{-}name] = s[customer\text{-}name] \wedge \exists u \in Loan\ (u[loan\text{-}number] = s[loan\text{-}number] \wedge u[branch\text{-}name] = \text{"Sadar"}))\}$
o Find all customers who have a loan, an account, or both at the bank.
$\{t \mid \exists s \in Borrower\ (t[customer\text{-}name] = s[customer\text{-}name]) \vee \exists u \in Depositor\ (t[customer\text{-}name] = u[customer\text{-}name])\}$
o Find only those customers who have both an account and a loan.
$\{t \mid \exists s \in Borrower\ (t[customer\text{-}name] = s[customer\text{-}name]) \wedge \exists u \in Depositor\ (t[customer\text{-}name] = u[customer\text{-}name])\}$
o Find all customers who have an account but do not have a loan.
$\{t \mid \exists u \in Depositor\ (t[customer\text{-}name] = u[customer\text{-}name]) \wedge \neg \exists s \in Borrower\ (t[customer\text{-}name] = s[customer\text{-}name])\}$
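o Find all customers who have an account at all branches located in Agra.
$\{t \mid \forall w \in Branch\ (w[branch\text{-}city] = \text{"Agra"} \Rightarrow \exists s \in Depositor\ (t[customer\text{-}name] = s[customer\text{-}name] \wedge \exists u \in Account\ (u[account\text{-}number] = s[account\text{-}number] \wedge u[branch\text{-}name] = w[branch\text{-}name])))\}$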
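Python comprehensions read almost like tuple calculus: `any(...)` plays the role of $\exists$ and `all(...)` the role of $\forall$. A minimal sketch of two of the queries above; all relation contents (including the branches "Civil Lines" and "Hazratganj") are hypothetical, each tuple is a dict so that `s["customer-name"]` mirrors s[customer-name], and, unlike the calculus, candidate values of t are drawn from Depositor so that the result is finite (a safe expression).

```python
loan = [
    {"branch-name": "Sadar", "loan-number": "L-13", "amount": 1000},
    {"branch-name": "Sadar", "loan-number": "L-30", "amount": 20000},
    {"branch-name": "Civil Lines", "loan-number": "L-42", "amount": 40000},
]
borrower = [{"customer-name": "Ram", "loan-number": "L-13"},
            {"customer-name": "Shyam", "loan-number": "L-30"},
            {"customer-name": "Suleman", "loan-number": "L-42"}]
branch = [{"branch-name": "Sadar", "branch-city": "Agra"},
          {"branch-name": "Civil Lines", "branch-city": "Agra"},
          {"branch-name": "Hazratganj", "branch-city": "Lucknow"}]
account = [{"account-number": "A-1", "branch-name": "Sadar"},
           {"account-number": "A-2", "branch-name": "Civil Lines"},
           {"account-number": "A-3", "branch-name": "Sadar"}]
depositor = [{"customer-name": "Ram", "account-number": "A-1"},
             {"customer-name": "Ram", "account-number": "A-2"},
             {"customer-name": "Suleman", "account-number": "A-3"}]

# Customers with a loan from the Sadar branch (nested "there exists").
sadar_borrowers = {
    s["customer-name"] for s in borrower
    if any(u["loan-number"] == s["loan-number"] and u["branch-name"] == "Sadar"
           for u in loan)
}

def has_account_at(name, branch_name):
    """exists s in Depositor, u in Account linking name to branch_name."""
    return any(s["customer-name"] == name and
               any(u["account-number"] == s["account-number"] and
                   u["branch-name"] == branch_name for u in account)
               for s in depositor)

# "For all w in Branch: w is in Agra => customer has an account at w".
at_all_agra = {
    c for c in {s["customer-name"] for s in depositor}
    if all(has_account_at(c, w["branch-name"])
           for w in branch if w["branch-city"] == "Agra")
}
```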
Domain Relational Calculus
Domain relational calculus is another non-procedural language for expressing database
queries.
A query is specified as:
$\{\langle x_1, x_2, \ldots, x_n \rangle \mid P(x_1, x_2, \ldots, x_n)\}$
where $x_1, x_2, \ldots, x_n$ represent domain variables and P represents a predicate formula as in tuple calculus.
Since domain variables are used in place of tuple variables, the formula does not refer to
fields of tuples; it refers to the domain variables directly.
For example, the queries above are expressed in domain calculus as follows:
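o Find the branch-name, loan-number and amount for loans over Rs 1200.
$\{\langle b, l, a \rangle \mid \langle b, l, a \rangle \in \text{Loan} \wedge a > 1200\}$
o Find the loan number for each loan of an amount greater than Rs 1200.
$\{\langle l \rangle \mid \exists b, a\,(\langle b, l, a \rangle \in \text{Loan} \wedge a > 1200)\}$
o Find the names of all the customers who have a loan from the Sadar branch and
find the loan amount
$\{\langle c, a \rangle \mid \exists l\,(\langle c, l \rangle \in \text{Borrower} \wedge \exists b\,(\langle b, l, a \rangle \in \text{Loan} \wedge b = \text{"Sadar"}))\}$
o Find the names of all customers who have a loan, an account, or both at the Sadar
branch
$\{\langle c \rangle \mid \exists l\,(\langle c, l \rangle \in \text{Borrower} \wedge \exists b, a\,(\langle b, l, a \rangle \in \text{Loan} \wedge b = \text{"Sadar"})) \vee \exists a\,(\langle c, a \rangle \in \text{Depositor} \wedge \exists b, n\,(\langle b, a, n \rangle \in \text{Account} \wedge b = \text{"Sadar"}))\}$
o Find only those customers who have both an account and a loan.
$\{\langle c \rangle \mid \exists l\,(\langle c, l \rangle \in \text{Borrower}) \wedge \exists a\,(\langle c, a \rangle \in \text{Depositor})\}$
o Find all customers who have an account but do not have a loan.
$\{\langle c \rangle \mid \exists a\,(\langle c, a \rangle \in \text{Depositor}) \wedge \neg \exists l\,(\langle c, l \rangle \in \text{Borrower})\}$
o Find all customers who have an account at all branches located in Agra
$\{\langle c \rangle \mid \forall x, y, z\,((\langle x, y, z \rangle \in \text{Branch} \wedge y = \text{"Agra"}) \Rightarrow \exists a, b\,(\langle x, a, b \rangle \in \text{Account} \wedge \langle c, a \rangle \in \text{Depositor}))\}$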
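For domain calculus, a convenient sketch is to store each relation as a set of positional Python tuples, so that unpacking a tuple binds one variable per domain variable. The data below is hypothetical and the attribute orders are assumed to be the ones used above (Loan as <b, l, a>, Account as <b, a, n>, Branch as <x, y, z>); only three of the queries are shown.

```python
# Hypothetical instances; attribute order follows the domain calculus above.
loan = {("Sadar", "L-1", 1500), ("Dayalbagh", "L-2", 900)}          # <b, l, a>
borrower = {("Ram", "L-1"), ("Shyam", "L-2")}                       # <c, l>
depositor = {("Ram", "A-1"), ("Ram", "A-3"), ("Sita", "A-2")}       # <c, a>
account = {("Sadar", "A-1", 500), ("Sadar", "A-2", 700),
           ("Dayalbagh", "A-3", 300)}                               # <b, a, n>
branch = {("Sadar", "Agra", 9000), ("Dayalbagh", "Agra", 7000),
          ("Holi Gate", "Mathura", 5000)}                           # <x, y, z>

# {<l> | exists b, a (<b, l, a> in Loan and a > 1200)}
big_loan_numbers = {l for (b, l, a) in loan if a > 1200}

# {<c, a> | exists l (<c, l> in Borrower and exists b (<b, l, a> in Loan and b = "Sadar"))}
sadar_loans = {(c, a) for (c, l) in borrower
               for (b, l2, a) in loan if l2 == l and b == "Sadar"}

# Candidate values for c (keeps the universally quantified query finite).
customers = {c for (c, _) in borrower} | {c for (c, _) in depositor}

# {<c> | forall x, y, z ((<x, y, z> in Branch and y = "Agra")
#        => exists a, b (<x, a, b> in Account and <c, a> in Depositor))}
all_agra = {
    c for c in customers
    if all(any(acc_branch == x and (c, a) in depositor
               for (acc_branch, a, _balance) in account)
           for (x, y, _z) in branch if y == "Agra")
}
```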
Outer Join
The outer join operation is an extension of the join operation that deals with missing information.
Suppose that we have the following relational schemas:
Employee( employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:
Employee:
employee-name  street             city
Ram            M G Road           Agra
Shyam          New Mandi Road     Mathura
Suleman        Bhagat Singh Road  Aligarh
Fulltime-works:
employee-name  branch-name   salary
Ram            Sadar         30000
Shyam          Sanjay Place  20000
Rehman         Dayalbagh     40000
Suppose we want complete information about the full-time employees.
The natural join (Employee ⋈ Fulltime-works) loses the information about Suleman and
Rehman, because each of them has a record in only one of the two tables (the left or the
right relation). The outer join solves this problem.
Three forms of outer join:
o Left outer join (⟕): tuples of the left relation that do not match any tuple of the
right relation in the natural join are also added to the result, with null values in
the attributes that come from the right relation.
o Right outer join (⟖): tuples of the right relation that do not match any tuple of
the left relation are also added to the result, with null values in the attributes
that come from the left relation.
o Full outer join (⟗): combines the left and right outer joins, i.e. adds the
unmatched tuples of both relations and puts null in place of the missing values.
The results for the three forms of outer join are as follows:
Left outer join: Employee ⟕ Fulltime-works =
employee-name  street             city     branch-name   salary
Ram            M G Road           Agra     Sadar         30000
Shyam          New Mandi Road     Mathura  Sanjay Place  20000
Suleman        Bhagat Singh Road  Aligarh  null          null
Right outer join: Employee ⟖ Fulltime-works =
employee-name  street             city     branch-name   salary
Ram            M G Road           Agra     Sadar         30000
Shyam          New Mandi Road     Mathura  Sanjay Place  20000
Rehman         null               null     Dayalbagh     40000
Full outer join: Employee ⟗ Fulltime-works =
employee-name  street             city     branch-name   salary
Ram            M G Road           Agra     Sadar         30000
Shyam          New Mandi Road     Mathura  Sanjay Place  20000
Suleman        Bhagat Singh Road  Aligarh  null          null
Rehman         null               null     Dayalbagh     40000
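The three outer joins can be sketched in Python on the snapshot above, with None standing for null. The helper join_on_name is a hypothetical illustration that joins only on employee-name, the single attribute the two schemas share, so it is a special case rather than a general natural-join implementation.

```python
# Snapshot of the two relations from the notes; None plays the role of null.
employee = [
    ("Ram", "M G Road", "Agra"),
    ("Shyam", "New Mandi Road", "Mathura"),
    ("Suleman", "Bhagat Singh Road", "Aligarh"),
]
fulltime_works = [
    ("Ram", "Sadar", 30000),
    ("Shyam", "Sanjay Place", 20000),
    ("Rehman", "Dayalbagh", 40000),
]

def join_on_name(left, right, keep_left=False, keep_right=False):
    """Natural join on the first column (employee-name); unmatched tuples of
    the left and/or right relation are kept and padded with None."""
    left_pad = (None,) * (len(left[0]) - 1) if left else ()
    right_pad = (None,) * (len(right[0]) - 1) if right else ()
    left_names = {row[0] for row in left}
    result = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        for r in matches:
            result.append(l + r[1:])          # matched: combine both tuples
        if keep_left and not matches:
            result.append(l + right_pad)      # left tuple with nulls on the right
    if keep_right:
        for r in right:
            if r[0] not in left_names:        # right tuple with nulls on the left
                result.append((r[0],) + left_pad + r[1:])
    return result

natural = join_on_name(employee, fulltime_works)
left_outer = join_on_name(employee, fulltime_works, keep_left=True)
right_outer = join_on_name(employee, fulltime_works, keep_right=True)
full_outer = join_on_name(employee, fulltime_works, keep_left=True, keep_right=True)
```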
Aggregate Functions
Aggregate functions are functions that take a collection of values and return a single
value as a result.
Examples are sum, avg, count, max, min.
Find the total balance of all the accounts:
$\mathcal{G}_{\text{sum}(\text{balance})}(\text{Account})$
Find the number of borrowers:
$\mathcal{G}_{\text{count}(\text{customer-name})}(\text{Borrower})$
Find the number of distinct customers who are either borrowers or depositors:
$\mathcal{G}_{\text{count-distinct}(\text{customer-name})}(\Pi_{\text{customer-name}}(\text{Borrower}) \cup \Pi_{\text{customer-name}}(\text{Depositor}))$
Aggregate functions can also be applied to subgroups of the rows of a table, rather than to
all of its rows, by writing the grouping attributes as a left subscript of the symbol $\mathcal{G}$.
For example, to find the total salary of the full-time employees branch-wise, we can write:
${}_{\text{branch-name}}\mathcal{G}_{\text{sum}(\text{salary})}(\text{Fulltime-works})$
Consider the following instance of Fulltime-works:
employee-name  branch-name   salary
Ram            Sadar         30000
Shyam          Sanjay Place  20000
Rehman         Dayalbagh     40000
Suleman        Sadar         25000
The rows fall into three groups: Group 1 (branch-name = Sadar), Group 2 (branch-name =
Sanjay Place) and Group 3 (branch-name = Dayalbagh). The result of the aggregate function
with the grouping specified above is:
branch-name   sum of salary
Sadar         55000
Sanjay Place  20000
Dayalbagh     40000
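A minimal Python sketch of the grouping operation on the instance above: rows are bucketed by branch-name and the salaries in each bucket are summed. The function name group_sum is hypothetical; a DBMS would typically evaluate the same expression by sorting or hashing on the grouping attribute.

```python
from collections import defaultdict

# Instance of Fulltime-works from the notes: (employee-name, branch-name, salary).
fulltime_works = [
    ("Ram", "Sadar", 30000),
    ("Shyam", "Sanjay Place", 20000),
    ("Rehman", "Dayalbagh", 40000),
    ("Suleman", "Sadar", 25000),
]

def group_sum(rows, group_index, value_index):
    """Sketch of (group-attr) G sum(value-attr) (rows): one total per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += row[value_index]
    return dict(totals)

# branch-name G sum(salary) (Fulltime-works)
salary_by_branch = group_sum(fulltime_works, group_index=1, value_index=2)

# G sum(salary) (Fulltime-works): no grouping attributes, so a single value.
total_salary = sum(row[2] for row in fulltime_works)

# G count-distinct(branch-name) (Fulltime-works)
distinct_branches = len({row[1] for row in fulltime_works})
```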
Lecture-13
Structured Query Language (SQL)
Introduction
Commercial database systems use more user-friendly languages to specify queries.
SQL is the most influential commercially marketed query language.
Other commercially used languages are QBE, Quel, and Datalog.
Basic Structure
The basic structure of an SQL query consists of three clauses: select, from and where.
select: it corresponds to the projection operation of relational algebra. Used to list the
attributes desired in the result.
from: corresponds to the Cartesian product operation of relational algebra. Used to list
the relations to be scanned in the evaluation of the expression
where: corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the from clause.
A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to the following relational algebra expression:
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$
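To see the equivalence concretely, this sketch runs one select-from-where query in SQLite (through Python's standard sqlite3 module) and evaluates the corresponding $\Pi(\sigma_P(r_1 \times r_2))$ by hand. The data is hypothetical, hyphenated names are replaced by underscores because a hyphen is not legal in an unquoted SQL identifier, and both results are compared as sets because SQL itself keeps duplicates.

```python
import sqlite3

# Hypothetical data; SQLite identifiers use underscores instead of hyphens.
borrower_rows = [("Ram", "L-1"), ("Shyam", "L-2"), ("Sita", "L-3")]
loan_rows = [("Sadar", "L-1", 1500), ("Sadar", "L-2", 900), ("Dayalbagh", "L-3", 2000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Borrower VALUES (?, ?)", borrower_rows)
conn.executemany("INSERT INTO Loan VALUES (?, ?, ?)", loan_rows)

# select A1, ..., An from r1, ..., rm where P
sql_result = set(conn.execute(
    "SELECT customer_name, amount FROM Borrower, Loan "
    "WHERE Borrower.loan_number = Loan.loan_number AND amount > 1000"
))

# Pi_{customer_name, amount}(sigma_P(Borrower x Loan)), evaluated step by step.
# A product tuple is (customer_name, loan_number, branch_name, loan_number, amount).
product = [b + l for b in borrower_rows for l in loan_rows]        # r1 x r2
selected = [t for t in product if t[1] == t[3] and t[4] > 1000]   # sigma_P
projected = {(t[0], t[4]) for t in selected}                      # Pi
```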
[Note: Words shown in bold in the original notes are keywords of the SQL language. For
example, select, from and where in the paragraph above are shown in bold to indicate that
they are keywords.]
Select Clause
Let us look at some simple queries and at how the select clause is used to express them in SQL.
Find the names of all branches in the Loan relation
select branch-name
from Loan
By default the select clause includes duplicate values. If we want to force the elimination
of duplicates the distinct keyword is used as follows:
select distinct branch-name
from Loan
The all keyword can be used to specify explicitly that duplicates are not removed. Since
retaining duplicates is the default, all is optional in the select clause.
select all branch-name
from Loan
The asterisk * can be used to denote all attributes. The following SQL statement selects
all the attributes of Loan.
select *
from Loan
Arithmetic expressions involving the operators +, -, * and / are also allowed in the select
clause. The following statement returns the amount multiplied by 100 for each row of the
Loan table.
select branch-name, loan-number, amount * 100
from Loan
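The select-clause variants above can be tried in SQLite through Python's standard sqlite3 module. This is a sketch with hypothetical Loan rows; hyphenated names become underscores (branch_name, loan_number) because a hyphen is not legal in an unquoted SQL identifier, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Loan VALUES (?, ?, ?)",
    [("Sadar", "L-1", 1500), ("Sadar", "L-2", 900), ("Dayalbagh", "L-3", 2000)],
)

# Duplicates are kept by default; ALL only makes that explicit.
default_rows = [r[0] for r in conn.execute(
    "SELECT branch_name FROM Loan ORDER BY loan_number")]
all_rows = [r[0] for r in conn.execute(
    "SELECT ALL branch_name FROM Loan ORDER BY loan_number")]

# DISTINCT removes duplicates.
distinct_rows = sorted(r[0] for r in conn.execute("SELECT DISTINCT branch_name FROM Loan"))

# * selects every attribute; arithmetic is allowed in the select list.
star_width = len(conn.execute("SELECT * FROM Loan").description)
scaled = conn.execute(
    "SELECT branch_name, loan_number, amount * 100 FROM Loan ORDER BY loan_number"
).fetchall()
```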
Where Clause
Find all loan numbers for loans made at Sadar branch with loan amounts greater than
Rs 1200.
select loan-number
from Loan
where branch-name = 'Sadar' and amount > 1200
The where clause uses the logical connectives and, or, and not.
The operands of the logical connectives can be expressions involving the comparison
operators <, <=, >, >=, =, and <>.
The between comparison operator can be used to simplify comparisons:
select loan-number
from Loan
where amount between 90000 and 100000
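A short SQLite sketch of the where clause, again with hypothetical rows and underscores in place of hyphens. It also shows that between is inclusive at both ends, so amount between 90000 and 100000 means amount >= 90000 and amount <= 100000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Loan VALUES (?, ?, ?)",
    [("Sadar", "L-1", 1500), ("Sadar", "L-2", 900), ("Sadar", "L-4", 95000),
     ("Dayalbagh", "L-5", 100000), ("Sadar", "L-6", 120000)],
)

sadar_big = [r[0] for r in conn.execute(
    "SELECT loan_number FROM Loan WHERE branch_name = 'Sadar' AND amount > 1200 "
    "ORDER BY loan_number")]

# BETWEEN includes both end points ...
between = [r[0] for r in conn.execute(
    "SELECT loan_number FROM Loan WHERE amount BETWEEN 90000 AND 100000 "
    "ORDER BY loan_number")]
# ... so it is shorthand for two comparisons joined by AND.
explicit = [r[0] for r in conn.execute(
    "SELECT loan_number FROM Loan WHERE amount >= 90000 AND amount <= 100000 "
    "ORDER BY loan_number")]
```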
From Clause
The from clause by itself defines a Cartesian product of the relations in the clause.
When an attribute name appears in more than one relation, it is written as
relation-name.attribute-name to avoid ambiguity.
For all customers who have a loan from the bank, find their names and loan numbers
select distinct customer-name, Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number
The Rename Operation
Used for renaming both relations and attributes in SQL.
It uses the as clause: old-name as new-name
Find the names and loan numbers of the customers who have a loan at the Sadar
branch.
select distinct customer-name, borrower.loan-number as loan-id
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number and
branch-name = 'Sadar'
In the result, the loan-number attribute can now be referred to by the name loan-id.
For all customers who have a loan from the bank, find their names and loan-numbers
select distinct customer-name, T.loan-number
from Borrower as T, Loan as S
where T.loan-number = S.loan-number
Find the names of all branches that have assets greater than at least one branch located in
Mathura.
select distinct T.branch-name
from Branch as T, Branch as S
where T.assets > S.assets and S.branch-city = 'Mathura'
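The following SQLite sketch (hypothetical rows, underscores for hyphens) shows both uses of as: renaming an attribute, visible in the column names reported by cursor.description, and renaming a relation so that Branch can be compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)", [
    ("Sadar", "Agra", 9000),
    ("Dayalbagh", "Agra", 7000),
    ("Holi Gate", "Mathura", 6000),
    ("Krishna Nagar", "Mathura", 8000),
])
conn.execute("CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER)")
conn.execute("INSERT INTO Borrower VALUES ('Ram', 'L-1')")
conn.execute("INSERT INTO Loan VALUES ('Sadar', 'L-1', 1500)")

# Renaming an attribute: the result column is reported as loan_id.
cur = conn.execute(
    "SELECT DISTINCT customer_name, T.loan_number AS loan_id "
    "FROM Borrower AS T, Loan AS S WHERE T.loan_number = S.loan_number")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

# Renaming a relation: T and S are two copies of Branch (a self-join).
richer = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT T.branch_name FROM Branch AS T, Branch AS S "
    "WHERE T.assets > S.assets AND S.branch_city = 'Mathura'"))
```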
String Operation
Two special characters are used for pattern matching in strings:
o Percent (%): the % character matches any substring (including the empty string).
o Underscore (_): the _ character matches any single character.
'%Mandi' matches any string ending with Mandi, e.g. Raja Ki Mandi and
Peepal Mandi.
'___' matches any string of exactly three characters.
Find the names of all customers whose street address includes the substring Main
select customer-name
from Customer
where customer-street like '%Main%'
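To make the pattern rules precise, here is a small Python sketch that translates a like pattern into a regular expression. The function like is hypothetical, treats matching as case-sensitive, and ignores the escape clause that SQL also provides for matching a literal % or _.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Sketch of SQL LIKE: % matches any substring (possibly empty),
    _ matches exactly one character, every other character matches itself."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # The whole value must match, as in SQL (not just a prefix).
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

For example, like("Peepal Mandi", "%Mandi") is true, while like("abcd", "___") is false. Note that SQLite's own LIKE operator is case-insensitive for ASCII letters by default, so results there can differ from this sketch.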
Lecture-14
Set Operations
union, intersect and except operations are set operations available in SQL.
Relations participating in any of the set operations must be compatible, i.e. they must have
the same set of attributes.
Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name
from Depositor )
union
(select customer-name
from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates, union all can be used:
(select customer-name
from Depositor )
union all
(select customer-name
from Borrower )
Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name
from Depositor )
intersect
(select customer-name
from Borrower )
o If we want to retain all the duplicates:
(select customer-name
from Depositor )
intersect all
(select customer-name
from Borrower )
Except Operation
o Find all customers who have an account but no loan at the bank
(select customer-name
from Depositor )
except
(select customer-name
from Borrower )
o If we want to retain the duplicates:
(select customer-name
from Depositor )
except all
(select customer-name
from Borrower )
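The difference between the set forms and the all forms is easiest to see with multisets. In this sketch Python's collections.Counter models a multiset of hypothetical customer names: + gives union all, & gives intersect all (minimum count) and - gives except all (count difference, floored at zero). The set forms are cross-checked in SQLite, which does not accept parentheses around the operands of a compound select and does not implement intersect all or except all.

```python
import sqlite3
from collections import Counter

# Hypothetical multisets of customer names (Ram holds two accounts).
depositor_names = ["Ram", "Ram", "Sita", "Gita"]
borrower_names = ["Ram", "Shyam"]

d, b = Counter(depositor_names), Counter(borrower_names)
union_all = d + b          # counts add:     m + n
intersect_all = d & b      # minimum count:  min(m, n)
except_all = d - b         # difference:     max(0, m - n)

# The plain operators work on sets, i.e. duplicates are eliminated.
union = set(d) | set(b)
intersect = set(d) & set(b)
except_ = set(d) - set(b)

# Cross-check in SQLite (operands written without parentheses).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Depositor (customer_name TEXT)")
conn.execute("CREATE TABLE Borrower (customer_name TEXT)")
conn.executemany("INSERT INTO Depositor VALUES (?)", [(n,) for n in depositor_names])
conn.executemany("INSERT INTO Borrower VALUES (?)", [(n,) for n in borrower_names])

def run(op):
    """Evaluate Depositor <op> Borrower and return the result as a multiset."""
    sql = f"SELECT customer_name FROM Depositor {op} SELECT customer_name FROM Borrower"
    return Counter(r[0] for r in conn.execute(sql))
```

Note that except all keeps one Ram (two accounts minus one loan), whereas the set form except removes Ram entirely.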
Aggregate Functions
Aggregate functions are those functions which take a collection of values as input and
return a single value.
SQL offers five built-in aggregate functions:
o Average: avg
o Minimum: min
o Maximum: max
o Total: sum
o Count: count
The input to sum and avg must be a collection of numbers, but the other functions can also
take collections of non-numeric data types as input.
Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name = 'Sadar'
The result is a table containing a single cell (one row and one column) that holds the
average balance of all accounts at the Sadar branch.
The group by clause is used to form groups; tuples with the same value on all attributes in
the group by clause are placed in one group.
Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name
By default, aggregate functions include duplicates.
The distinct keyword is used to eliminate duplicates in an aggregate function:
Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name
The having clause is used to state a condition that applies to groups rather than to individual tuples.
Find the average account balance at each branch where average account balance is more
than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200
Count the number of tuples in Customer table
select count(*)
from Customer
SQL does not allow distinct with count(*).
When where and having are both present in a statement, where is applied first (before the
groups are formed) and having is then applied to the resulting groups.
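The aggregate queries above, run in SQLite on hypothetical Account and Depositor rows (underscores for hyphens). The last query illustrates the evaluation order: where removes tuples before the groups are formed, and having then filters the groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (branch_name TEXT, account_number TEXT, balance INTEGER)")
conn.execute("CREATE TABLE Depositor (customer_name TEXT, account_number TEXT)")
conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", [
    ("Sadar", "A-1", 500), ("Sadar", "A-2", 2500),
    ("Dayalbagh", "A-3", 1000), ("Dayalbagh", "A-4", 1200),
])
conn.executemany("INSERT INTO Depositor VALUES (?, ?)", [
    ("Ram", "A-1"), ("Ram", "A-2"), ("Sita", "A-3"), ("Gita", "A-4"),
])

sadar_avg = conn.execute(
    "SELECT avg(balance) FROM Account WHERE branch_name = 'Sadar'").fetchone()[0]

avg_by_branch = dict(conn.execute(
    "SELECT branch_name, avg(balance) FROM Account GROUP BY branch_name"))

# count with and without DISTINCT: Ram owns two Sadar accounts.
depositors = conn.execute(
    "SELECT branch_name, count(customer_name), count(DISTINCT customer_name) "
    "FROM Depositor, Account WHERE Depositor.account_number = Account.account_number "
    "GROUP BY branch_name ORDER BY branch_name").fetchall()

# HAVING filters whole groups after they are formed.
rich_branches = conn.execute(
    "SELECT branch_name, avg(balance) FROM Account "
    "GROUP BY branch_name HAVING avg(balance) > 1200").fetchall()

# WHERE filters tuples first, so only balances above 600 are averaged.
filtered = conn.execute(
    "SELECT branch_name, avg(balance) FROM Account WHERE balance > 600 "
    "GROUP BY branch_name HAVING avg(balance) > 1200").fetchall()
```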
Nested Subqueries
A subquery is a select-from-where expression that is nested within another query.
Set Membership
o The in and not in connectives are used for this type of subquery.
o Find all customers who have both a loan and an account at the bank; this query
can be written in nested subquery form as follows:
select distinct customer-name
from Borrower
where customer-name in (select customer-name
from Depositor )
o Select the names of customers who have a loan at the bank, and whose names are
neither Smith nor Jones
select distinct customer-name
from Borrower
where customer-name not in ('Smith', 'Jones')
Set Comparison
o Find the names of all branches that have assets greater than those of at least one
branch located in Mathura
select branch-name
from Branch
where assets > some (select assets
from Branch
where branch-city = 'Mathura')
o Apart from > some, the other comparisons allowed are < some, <= some, >= some,
= some and <> some. (= some is equivalent to in.)
o Find the names of all branches that have assets greater than those of each branch
located in Mathura
select branch-name
from Branch
where assets > all (select assets
from Branch
where branch-city = 'Mathura')
o Apart from > all, the other comparisons allowed are < all, <= all, >= all, = all and
<> all. (<> all is equivalent to not in.)
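SQLite implements in and not in but not the some/all comparisons, so this sketch rewrites > some and > all with exists and not exists. The rewrites are equivalent to the originals as long as assets contains no nulls; the data is hypothetical and hyphens become underscores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (branch_name TEXT, branch_city TEXT, assets INTEGER);
INSERT INTO Branch VALUES ('Sadar', 'Agra', 9000), ('Dayalbagh', 'Agra', 7000),
    ('Holi Gate', 'Mathura', 6000), ('Krishna Nagar', 'Mathura', 8000);
CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT);
INSERT INTO Borrower VALUES ('Ram', 'L-1'), ('Shyam', 'L-2'), ('Smith', 'L-3');
CREATE TABLE Depositor (customer_name TEXT, account_number TEXT);
INSERT INTO Depositor VALUES ('Ram', 'A-1'), ('Jones', 'A-2');
""")

def names(sql):
    return sorted(r[0] for r in conn.execute(sql))

in_both = names("SELECT DISTINCT customer_name FROM Borrower "
                "WHERE customer_name IN (SELECT customer_name FROM Depositor)")
not_smith_jones = names("SELECT DISTINCT customer_name FROM Borrower "
                        "WHERE customer_name NOT IN ('Smith', 'Jones')")

# assets > some (...): at least one Mathura branch has fewer assets than T.
gt_some = names("SELECT branch_name FROM Branch AS T WHERE EXISTS ("
                "SELECT * FROM Branch AS S WHERE S.branch_city = 'Mathura' "
                "AND T.assets > S.assets)")

# assets > all (...): no Mathura branch has assets greater than or equal to T's.
gt_all = names("SELECT branch_name FROM Branch AS T WHERE NOT EXISTS ("
               "SELECT * FROM Branch AS S WHERE S.branch_city = 'Mathura' "
               "AND S.assets >= T.assets)")
```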
Lecture-15
Views
• In SQL, the create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
• Consider a view consisting of branch names and the names of customers who have either an
account or a loan at the branch. It can be defined as follows:
create view All-customer as
(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number = Account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)
• The attribute names may be specified explicitly, within parentheses, after the name of
the view.
• View names may be used as relations in subsequent queries. Using the view All-customer,
find all customers of the Sadar branch:
select customer-name
from All-customer
where branch-name = 'Sadar'
• A create view statement stores the view definition in the database; the definition
remains until the command drop view view-name is executed.
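A SQLite sketch of the All-customer view on hypothetical rows (named All_customer, because hyphens are not legal in unquoted identifiers). Because a view stores its definition rather than a copy of the data, rows added to the base tables later appear in the view immediately; after drop view, queries on it fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depositor (customer_name TEXT, account_number TEXT);
CREATE TABLE Account (branch_name TEXT, account_number TEXT, balance INTEGER);
CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT);
CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER);
INSERT INTO Depositor VALUES ('Ram', 'A-1'), ('Sita', 'A-3');
INSERT INTO Account VALUES ('Sadar', 'A-1', 500), ('Dayalbagh', 'A-3', 700);
INSERT INTO Borrower VALUES ('Shyam', 'L-1');
INSERT INTO Loan VALUES ('Sadar', 'L-1', 1500);

-- SQLite does not allow parentheses around the operands of UNION.
CREATE VIEW All_customer AS
    SELECT branch_name, customer_name FROM Depositor, Account
    WHERE Depositor.account_number = Account.account_number
    UNION
    SELECT branch_name, customer_name FROM Borrower, Loan
    WHERE Borrower.loan_number = Loan.loan_number;
""")

def sadar_customers():
    return sorted(r[0] for r in conn.execute(
        "SELECT customer_name FROM All_customer WHERE branch_name = 'Sadar'"))

before = sadar_customers()
# New base-table rows show up in the view without redefining it.
conn.execute("INSERT INTO Account VALUES ('Sadar', 'A-4', 100)")
conn.execute("INSERT INTO Depositor VALUES ('Gita', 'A-4')")
after = sadar_customers()

conn.execute("DROP VIEW All_customer")
try:
    sadar_customers()
    view_still_exists = True
except sqlite3.OperationalError:   # "no such table" once the view is dropped
    view_still_exists = False
```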

Modification of Database
• Deletion
o In SQL we can delete only whole tuples, not the values of particular
attributes. The command is as follows:
delete from r
where P
where P is a predicate and r is a relation.
o The delete command operates on only one relation at a time. Examples are as follows:
o Delete all tuples from the Loan relation
delete from Loan
o Delete all of Smith's account records
delete from Depositor
where customer-name = 'Smith'
o Delete all loans with loan amounts between Rs 1300 and Rs 1500.
delete from Loan
where amount between 1300 and 1500
o Delete the records of all accounts with balances below the average at the bank
delete from Account
where balance < ( select avg(balance)
from Account)

• Insertion
o In SQL we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted. Examples are as follows:
o Insert an account of account number A-9732 at the Sadar branch having balance
of Rs 1200
insert into Account
values ('Sadar', 'A-9732', 1200)
The values are specified in the order in which the corresponding attributes are
listed in the relation schema.
o SQL allows the attributes to be specified as part of the insert statement:
insert into Account(account-number, branch-name, balance)
values ('A-9732', 'Sadar', 1200)
insert into Account(branch-name, account-number, balance)
values ('Sadar', 'A-9732', 1200)
o For all loan customers of the Sadar branch, provide a new Rs 200 savings account
for each loan they have, with the loan-number serving as the account number of
the new account.
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = 'Sadar'
• Updates
o Used to change a value in a tuple without changing all values in the tuple.
o Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent.
update Account
set balance = balance * 1.05
o Suppose that accounts with balances over Rs10000 receive 6 percent interest,
whereas all others receive 5 percent.
update Account
set balance = balance * 1.06
where balance > 10000

update Account
set balance = balance * 1.05
where balance <= 10000
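The order of the two update statements above matters: if the 5 percent update ran first, an account just below Rs 10000 could cross the limit and then receive the 6 percent increase as well. This SQLite sketch on hypothetical balances demonstrates that, shows a single update with a case expression that avoids the problem, and also runs the delete-below-average example.

```python
import sqlite3

ACCOUNTS = [("Sadar", "A-1", 20000), ("Sadar", "A-2", 9800), ("Dayalbagh", "A-3", 1000)]

def fresh_db():
    """New in-memory database holding the hypothetical accounts above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account (branch_name TEXT, account_number TEXT, balance REAL)")
    conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", ACCOUNTS)
    return conn

def balances(conn):
    return {a: round(b, 2) for a, b in conn.execute(
        "SELECT account_number, balance FROM Account")}

# Order used in the notes: 6 percent first, then 5 percent.
good = fresh_db()
good.execute("UPDATE Account SET balance = balance * 1.06 WHERE balance > 10000")
good.execute("UPDATE Account SET balance = balance * 1.05 WHERE balance <= 10000")

# Reversed order: A-2 crosses 10000 after the 5 percent update and is paid twice.
bad = fresh_db()
bad.execute("UPDATE Account SET balance = balance * 1.05 WHERE balance <= 10000")
bad.execute("UPDATE Account SET balance = balance * 1.06 WHERE balance > 10000")

# A single UPDATE with CASE decides the rate from the original balance.
with_case = fresh_db()
with_case.execute("UPDATE Account SET balance = CASE WHEN balance > 10000 "
                  "THEN balance * 1.06 ELSE balance * 1.05 END")

# Delete accounts below the average balance (about 10266.67 here).
pruned = fresh_db()
pruned.execute("DELETE FROM Account WHERE balance < (SELECT avg(balance) FROM Account)")
```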
Data Definition Language
• Data Types in SQL
o char(n): fixed-length character string of length n.
o varchar(n): variable-length character string with maximum length n.
o int: an integer.
o smallint: a small integer.
o numeric(p,d): fixed-point number with p digits (plus a sign), d of which are to the
right of the decimal point.
o real, double precision: floating-point and double-precision floating-point numbers.
o float(n): a floating-point number with a precision of at least n digits.
o date: calendar date; four digits for the year, two for the month and two for the day of the month.
o time: time of day in hours, minutes and seconds.
• Domains can be defined as
create domain person-name char(20)
The domain name person-name can then be used to define the type of an attribute, just
like a built-in domain.
• Schema Definition in SQL
o The create table command is used to define relations:
create table r (A1 D1, A2 D2, ..., An Dn,
<integrity-constraint1>,
...,
<integrity-constraintk>)
where r is the relation name, each Ai is the name of an attribute, and Di is the domain
type of the values of Ai. Several types of integrity constraints can be defined in SQL.
o The integrity constraints allowed in SQL include
primary key (Aj1, Aj2, ..., Ajm)
and
check (P), where P is a predicate.
o The drop table command is used to remove relations from the database.
o The alter table command is used to add attributes to (or drop attributes from) an
existing relation:
alter table r add A D
adds attribute A of domain type D to relation r.
alter table r drop A
removes attribute A from relation r.
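A sketch of the DDL commands in SQLite. The table and its check (balance >= 0) constraint are hypothetical; SQLite accepts type names such as char(10) and numeric(12,2) but does not enforce their lengths or precision, and it does not support create domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# create table r (A1 D1, ..., An Dn, <integrity constraints>)
conn.execute("""
    CREATE TABLE Account (
        branch_name    VARCHAR(15),
        account_number CHAR(10),
        balance        NUMERIC(12, 2),
        PRIMARY KEY (account_number),
        CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO Account VALUES ('Sadar', 'A-9732', 1200)")

def rejected(sql):
    """True if SQLite refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO Account VALUES ('Dayalbagh', 'A-9732', 50)")
negative_balance = rejected("INSERT INTO Account VALUES ('Sadar', 'A-1', -5)")

# alter table r add A D
conn.execute("ALTER TABLE Account ADD COLUMN opened DATE")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Account)")]
```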
Lecture-16
Integrity Constraints
Integrity Constraints guard against accidental damage to the database.
Integrity constraints are predicates pertaining to the database.
Domain Constraints:
o Predicates defined on domains are called domain constraints.
o The simplest domain constraints are expressed by declaring standard data types for
attributes, such as integer, double, float, etc.
o We can also define domains with the create domain clause and attach constraints to
such domains, as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
So hourly-wage can be used as the data type of any attribute, and the DBMS will
automatically allow only values greater than or equal to 4.00.
o Other examples of domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value is not null)
create domain account-type char(10)
constraint account-type-test
check(value in ('Checking', 'Saving'))
With the latter of the two domains above, the DBMS will allow only the values Checking
and Saving for any attribute of type account-type.
Referential Integrity:
o Foreign Key: Suppose two tables R and S are related to each other, K1 and K2 are
the primary keys of R and S respectively, and K1 is also one of the attributes of S.
If we want every row in S to have a corresponding row in R, we define K1 in S as a
foreign key. For example, in our original library database we had a table for the
relation BORROWEDBY, containing the two fields Card No. and Acc. No.
Every row of the BORROWEDBY relation must have a corresponding row in the USER
table with the same Card No., and a row in the BOOK table with the same Acc. No.
So we define Card No. and Acc. No. in the BORROWEDBY relation as foreign keys.
o In other words, every row of the BORROWEDBY relation must refer to some row in
BOOK and to some row in USER.
o Such a referential requirement from one table to another is called referential
integrity.
o Referential integrity constraints are defined by declaring, as a foreign key, the
attributes of a table that form the primary key of some other table.
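SQLite has no create domain, so this sketch attaches the same checks directly to columns, and it models the library example with foreign keys. The table names Payroll, AccountKind, Users, Book and BorrowedBy, and all rows, are hypothetical; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    -- Domain-style checks written as column constraints.
    CREATE TABLE Payroll (hourly_wage NUMERIC(5, 2) CHECK (hourly_wage >= 4.00));
    CREATE TABLE AccountKind (
        account_number CHAR(10) NOT NULL,
        account_type   CHAR(10) CHECK (account_type IN ('Checking', 'Saving'))
    );

    -- Library example: every BorrowedBy row must refer to an existing user and book.
    CREATE TABLE Users (card_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Book  (acc_no  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE BorrowedBy (
        card_no INTEGER REFERENCES Users(card_no),
        acc_no  INTEGER REFERENCES Book(acc_no)
    );
    INSERT INTO Users VALUES (1, 'Ram');
    INSERT INTO Book  VALUES (101, 'Sample Book');
""")

def rejected(sql):
    """True if SQLite refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

low_wage = rejected("INSERT INTO Payroll VALUES (3.50)")
bad_type = rejected("INSERT INTO AccountKind VALUES ('A-1', 'Current')")
dangling = rejected("INSERT INTO BorrowedBy VALUES (2, 101)")   # no user with card 2
valid = not rejected("INSERT INTO BorrowedBy VALUES (1, 101)")
```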
Functional Dependencies
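o Suppose a relation has schema R, with $\alpha \subseteq R$ and $\beta \subseteq R$. The functional
dependency $\alpha \to \beta$ holds on R if, in any table having schema R, for every two rows
$r_1$ and $r_2$ that have the same values on the attributes $\alpha$, the values on the attributes
$\beta$ are also the same, i.e. $r_1[\alpha] = r_2[\alpha] \Rightarrow r_1[\beta] = r_2[\beta]$.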
o Consider, for example, the following table:
Seq  A   B   C   D
1    a1  b1  c1  d1
2    a1  b2  c1  d2
3    a2  b2  c2  d2
4    a2  b3  c2  d3
5    a3  b3  c2  d4
Check whether $A \to C$ holds: find pairs of rows where the value of A is the same.
Rows 1 and 2: the value of A is the same, and C is also the same.
Rows 3 and 4: the value of A is the same, and C is also the same.
No other two rows have the same value on A, so $A \to C$ holds.
Check whether $C \to A$ holds: find pairs of rows where the value of C is the same.
Rows 1 and 2: the value of C is the same, and A is also the same.
Rows 3 and 4: the value of C is the same, and A is also the same.
Rows 4 and 5: the value of C is the same but A is not, so $C \to A$ does not hold.
We can show that $AB \to D$ also holds: find pairs of rows where the values of A and B
are both the same.
No two rows have the same values on both A and B, so $AB \to D$ holds.
o K is a superkey of a relation schema R if and only if the functional dependency
$K \to R$ holds.
o Armstrong's Rules: Suppose there is a given relation R and a set of functional
dependencies F that holds on R. These rules can be used to derive all of the other
functional dependencies that are logically implied by F on R.
Reflexivity rule: if $\alpha$ is a set of attributes and $\beta \subseteq \alpha$, then $\alpha \to \beta$ holds.
Augmentation rule: if $\alpha \to \beta$ holds and $\gamma$ is a set of attributes, then
$\gamma\alpha \to \gamma\beta$ holds.
Transitivity rule: if $\alpha \to \beta$ holds and $\beta \to \gamma$ holds, then $\alpha \to \gamma$ holds.
o Additional rules are also used to simplify deriving new functional dependencies,
since applying only Armstrong's rules is a lengthy and tiresome task, even though
Armstrong's rules alone are sufficient to generate all of the implied functional dependencies.
Union rule: if $\alpha \to \beta$ holds and $\alpha \to \gamma$ holds, then $\alpha \to \beta\gamma$ holds.
Decomposition rule: if $\alpha \to \beta\gamma$ holds, then $\alpha \to \beta$ holds and $\alpha \to \gamma$ holds.
Pseudotransitivity rule: if $\alpha \to \beta$ holds and $\gamma\beta \to \delta$ holds, then
$\alpha\gamma \to \delta$ holds.
o Closure of Functional Dependencies: Suppose the given set of functional
dependencies for a relation schema R is F. If we apply the rules stated above and
generate all possible new functional dependencies, then the set containing all these
new functional dependencies together with the given set F is called the closure of F
and is denoted $F^+$.
o Consider schema R = (A, B, C, G, H, I) and the set F containing the following
functional dependencies:
1. $A \to B$
2. $A \to C$
3. $CG \to H$
4. $CG \to I$
5. $B \to H$
Find other functional dependencies that can be derived using the rules
given above.
Examples are as follows:
$A \to H$ can be derived from functional dependencies 1 and 5 using the
transitivity rule.
$CG \to HI$ can be derived from functional dependencies 3 and 4 using the union
rule.
$AG \to I$ can be derived from functional dependencies 2 and 4 using the pseudotransitivity rule.
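Two small Python helpers make these ideas executable. fd_holds tests a dependency on one table instance, which can refute an FD but cannot prove that it holds for every legal instance of the schema. attribute_closure is the standard attribute-closure algorithm, swapped in here as a shortcut: instead of generating $F^+$ with the rules above, it decides whether $\alpha \to \beta$ is in $F^+$ by checking that $\beta \subseteq \alpha^+$. Both function names are hypothetical, and attribute sets are written as strings of single-letter names.

```python
def fd_holds(rows, lhs, rhs):
    """Test lhs -> rhs on one table instance: rows that agree on lhs
    must also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def attribute_closure(attrs, fds):
    """Return attrs+, every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)

# The five-row table from the notes.
table = [
    dict(A="a1", B="b1", C="c1", D="d1"),
    dict(A="a1", B="b2", C="c1", D="d2"),
    dict(A="a2", B="b2", C="c2", D="d2"),
    dict(A="a2", B="b3", C="c2", D="d3"),
    dict(A="a3", B="b3", C="c2", D="d4"),
]

# F for R = (A, B, C, G, H, I): A->B, A->C, CG->H, CG->I, B->H
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
```

For example, attribute_closure("AG", F) is all of R, so AG is a superkey of R.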
Lecture-17
Normal Forms
Some of the undesirable properties that a bad database design may have are:
o Repetition of information
o Inability to represent certain information
o Incapability to maintain integrity of data
The normal forms of relational database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies.
The higher the normal form applicable to a table, the less vulnerable it is to
inconsistencies and anomalies.
Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a
table fails to meet the requirements of any normal form higher than its HNF.
The commonly known hierarchy of normal forms is: First Normal Form (1NF), Second Normal
Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF) and Fifth Normal Form
(5NF).
We will discuss only up to 3NF of above hierarchy and another normal form Boyce-Codd
Normal Form(BCNF) in this course.
First Normal Form
According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to
some relation", which means, specifically, that it satisfies the following five conditions:
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the
applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs,
object IDs, or hidden timestamps].
Examples of tables (or views) that would not meet this definition of 1NF are:
o A table that lacks a unique key. Such a table would be able to accommodate
duplicate rows, in violation of condition 3.
o A view whose definition mandates that results be returned in a particular order, so
that the row-ordering is an intrinsic and meaningful aspect of the view. This
violates condition 1. The tuples in true relations are not ordered with respect to
each other.
o A table that has at least one nullable attribute. A nullable attribute would
be in violation of condition 4, which requires every field to contain exactly one
value from its column's domain. It should be noted, however, that this aspect of
condition 4 is controversial. It marks an important departure from Codd's later
vision of the relational model, which made explicit provision for nulls.
Codd states that the "values in the domains on which each relation is defined are required
to be atomic with respect to the DBMS." Codd defines an atomic value as one that
"cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)." In other words, a field should not be divided into parts holding more than
one kind of data, such that what one part means to the DBMS depends on another part of
the same field.
Suppose a novice designer wishes to record the names and telephone numbers of
customers. He defines a customer table which looks like this:
Customer
Customer ID  First Name  Surname    Telephone Number
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659
789          Maria       Fernandez  555-808-9633
The designer then becomes aware of a requirement to record multiple telephone
numbers for some customers. He reasons that the simplest way of doing this is to
allow the "Telephone Number" field in any given record to contain more than one
value:
Customer
Customer ID  First Name  Surname    Telephone Number
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659
                                    555-776-4100
789          Maria       Fernandez  555-808-9633
Assuming, however, that the Telephone Number column is defined on some telephone
number-like domain (e.g. the domain of strings 12 characters in length), the
representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prevents a
single field from containing more than one value from its column's domain.
Repeating groups across columns: The designer might attempt to get around this
restriction by defining multiple Telephone Number columns:
Customer
Customer ID  First Name  Surname    Tel. No. 1    Tel. No. 2    Tel. No. 3
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659  555-776-4100  555-403-1659
789          Maria       Fernandez  555-808-9633
This representation, however, makes use of nullable columns, and therefore does not
conform to Date's definition of 1NF. Even if the view is taken that nullable columns are
allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2 and
Tel. No. 3 share exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems. These
problems include:
o Difficulty in querying the table. Answering such questions as "Which
customers have telephone number X?" and "Which pairs of customers share a
telephone number?" is awkward.
o Inability to enforce uniqueness of Customer-to-Telephone Number links
through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2
value that is exactly the same as her Tel. No. 1 value.
o Restriction of the number of telephone numbers per customer to three. If a
customer with four telephone numbers comes along, we are constrained to
record only three and leave the fourth unrecorded. This means that the
database design is imposing constraints on the business process, rather than
(as should ideally be the case) vice-versa.
Repeating groups within columns: The designer might, alternatively, retain the single Telephone Number column but alter its domain, making it a string of sufficient length to accommodate multiple telephone numbers:
Customer ID | First Name | Surname | Telephone Numbers
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659, 555-776-4100
789 | Maria | Fernandez | 555-808-9633
This design is consistent with 1NF according to Date's definition but not according to Codd's definition. It presents several design issues. The Telephone Numbers heading becomes semantically woolly, as it can now represent either a telephone number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also very difficult to define in the RDBMS with this design.
A design that complies with 1NF: A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name
Customer ID | First Name | Surname
123 | Robert | Ingram
456 | Jane | Wright
789 | Maria | Fernandez
Customer Telephone Number
Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
It is worth noting that this design also meets the additional requirements for second normal form (2NF) and third normal form (3NF).
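The step from the comma-separated Telephone Numbers column to the two 1NF tables can be sketched in Python. This is a minimal illustration, not a prescribed method. It assumes the non-1NF column is stored as one comma-separated string per customer, and the function name `to_1nf` is hypothetical.

```python
# Rows of the non-1NF design: the last column packs several numbers into one string.
customers = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659, 555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]


def to_1nf(rows):
    """Split rows into a Customer Name table and a Customer Telephone Number table.

    Each telephone number gets its own (Customer ID, Telephone Number) record,
    so no column holds a repeating group.
    """
    names, phones = [], []
    for cust_id, first, last, numbers in rows:
        names.append((cust_id, first, last))
        for number in numbers.split(","):
            phones.append((cust_id, number.strip()))
    return names, phones


customer_name, customer_telephone = to_1nf(customers)
for record in customer_telephone:
    print(record)
# (123, '555-861-2025')
# (456, '555-403-1659')
# (456, '555-776-4100')
# (789, '555-808-9633')
```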
Lecture-18
Second Normal Form
2NF was originally defined by E.F. Codd in 1971.
A 1NF table is in 2NF if and only if, given any candidate key K and any attribute A that is not a constituent of a candidate key, A depends upon the whole of K rather than just a part of it.
A 1NF table is in 2NF if and only if all its non-prime attributes are functionally
dependent on the whole of every candidate key. (A non-prime attribute is one that
does not belong to any candidate key.)
Note that when a 1NF table has no composite candidate keys (candidate keys
consisting of more than one attribute), the table is automatically in 2NF.
Consider a table describing employees' skills:
Employees' Skills
Employee | Skill | Current Work Location
Jones | Typing | 114 Main Street
Jones | Shorthand | 114 Main Street
Jones | Whittling | 114 Main Street
Bravo | Light Cleaning | 73 Industrial Way
Ellis | Alchemy | 73 Industrial Way
Ellis | Flying | 73 Industrial Way
Harrison | Light Cleaning | 73 Industrial Way
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a
given Employee might need to appear more than once (he might have multiple
Skills), and a given Skill might need to appear more than once (it might be
possessed by multiple Employees). Only the composite key {Employee, Skill}
qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the
candidate key, namely Employee. Therefore the table is not in 2NF. Note the
redundancy in the way Current Work Locations are represented: we are told three
times that Jones works at 114 Main Street, and twice that Ellis works at 73
Industrial Way. This redundancy makes the table vulnerable to update anomalies:
it is, for example, possible to update Jones' work location on his "Typing" and
"Shorthand" records and not update his "Whittling" record. The resulting data
would imply contradictory answers to the question "What is Jones' current work
location?"
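The 2NF test can be mechanized with attribute closures. Find the candidate keys, then check whether any proper part of a key determines a non-prime attribute. The Python sketch below is a brute-force illustration. Its cost is exponential in the number of attributes, so it only suits small tables. It assumes that $\text{Employee} \to \text{Current Work Location}$ is the only non-trivial FD given on Employees' Skills. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(frozenset(candidate))
    return keys


def partial_dependencies(schema, fds):
    """(proper part of a candidate key, non-prime attributes it determines) pairs."""
    keys = candidate_keys(schema, fds)
    non_prime = schema - set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                determined = closure(set(part), fds) & non_prime
                if determined:
                    found.append((set(part), determined))
    return found


skills = {"Employee", "Skill", "Current Work Location"}
fds = [({"Employee"}, {"Current Work Location"})]
print(candidate_keys(skills, fds))        # only {Employee, Skill}
print(partial_dependencies(skills, fds))  # [({'Employee'}, {'Current Work Location'})]
print(partial_dependencies({"Employee", "Current Work Location"}, fds))  # []
```

An empty list for the Employees table means no partial dependency remains once the work location is moved out.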
A 2NF alternative to this design would represent the same information in two tables:
an "Employees" table with candidate key {Employee}, and an "Employees' Skills"
table with candidate key {Employee, Skill}:
Employees
Employee | Current Work Location
Jones | 114 Main Street
Bravo | 73 Industrial Way
Ellis | 73 Industrial Way
Harrison | 73 Industrial Way
Employees' Skills
Employee | Skill
Jones | Typing
Jones | Shorthand
Jones | Whittling
Bravo | Light Cleaning
Ellis | Alchemy
Ellis | Flying
Harrison | Light Cleaning
Neither of these tables can suffer from update anomalies.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:
Tournament Winners
Tournament | Year | Winner | Winner Date of Birth
Des Moines Masters | 1998 | Chip Masterson | 14 March 1977
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open | 1999 | Bob Albertson | 28 September 1968
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key {Tournament, Year}.
This problem is addressed by third normal form (3NF).
Note: In addition to the primary key, the table may contain other candidate keys; it is
necessary to establish that no non-prime attributes have part-key dependencies on any
of these candidate keys.
Lecture-19
Third Normal Form:
As defined by E.F. Codd in 1971, a table is in 3NF if and only if both of the following conditions hold:
o The relation R (table) is in second normal form (2NF).
o Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate key of R.
o Note:
A non-prime attribute of R is an attribute that does not belong to any candidate key of R.
A transitive dependency is a functional dependency in which $X \to Z$ (X determines Z) indirectly, because $X \to Y$ and $Y \to Z$ (where it is not the case that $Y \to X$).
An equivalent definition of 3NF, given by Carlo Zaniolo in 1982, states that a table is in 3NF if and only if, for each of its functional dependencies $X \to A$, at least one of the following conditions holds:
o X contains A (that is, $X \to A$ is a trivial functional dependency), or
o X is a superkey, or
o each attribute in $A - X$ is a prime attribute (i.e., it is contained within a candidate key).
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce-Codd normal form (BCNF): BCNF simply eliminates the third alternative ("each attribute in $A - X$ is a prime attribute").
The difference between 2NF and 3NF can be summarized as follows. Requiring non-key attributes to depend on "the whole key" ensures that a table is in 2NF. Further requiring them to depend on "nothing but the key" ensures that the table is in 3NF.
Consider again the table given above:
Tournament Winners
Tournament | Year | Winner | Winner Date of Birth
Des Moines Masters | 1998 | Chip Masterson | 14 March 1977
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open | 1999 | Bob Albertson | 28 September 1968
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977
This table is in 2NF but not in 3NF. The breach of 3NF occurs because the non-prime
attribute Winner Date of Birth is transitively dependent on the candidate key
{Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of
Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with
different dates of birth on different records.
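Zaniolo's conditions can be checked mechanically. The sketch below assumes the Tournament Winners FDs are exactly $\{\text{Tournament}, \text{Year}\} \to \text{Winner}$ and $\text{Winner} \to \text{Winner Date of Birth}$, as described above. It tests only the listed FDs. That is the usual textbook shortcut for the schema the FDs were given on, not a check of every FD in the closure. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= schema:
                keys.append(frozenset(candidate))
    return keys


def violations_3nf(schema, fds):
    """Listed FDs X -> A (single attribute A) failing all three of Zaniolo's conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # attributes already in X would make X -> A trivial
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


winners = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
fds = [
    ({"Tournament", "Year"}, {"Winner"}),
    ({"Winner"}, {"Winner Date of Birth"}),
]
print(violations_3nf(winners, fds))  # [({'Winner'}, 'Winner Date of Birth')]
# After the split, each table keeps only its own dependency and passes:
print(violations_3nf({"Tournament", "Year", "Winner"}, fds[:1]))   # []
print(violations_3nf({"Winner", "Winner Date of Birth"}, fds[1:]))  # []
```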
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners
Tournament | Year | Winner
Des Moines Masters | 1998 | Chip Masterson
Indiana Invitational | 1998 | Al Fredrickson
Cleveland Open | 1999 | Bob Albertson
Des Moines Masters | 1999 | Al Fredrickson
Indiana Invitational | 1999 | Chip Masterson
Player Dates of Birth
Player | Date of Birth
Chip Masterson | 14 March 1977
Al Fredrickson | 21 July 1975
Bob Albertson | 28 September 1968
Boyce-Codd Normal Form:
BCNF is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies $X \to Y$, X is a superkey, that is, X is either a candidate key or a superset thereof.
Note that the tables Tournament Winners and Player Dates of Birth, shown above in 3NF, are also in BCNF.
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings
Court | Start Time | End Time | Rate Type
1 | 09:30 | 10:30 | SAVER
1 | 11:00 | 12:00 | SAVER
1 | 14:00 | 15:30 | STANDARD
2 | 10:00 | 11:30 | PREMIUM-B
2 | 11:30 | 13:30 | PREMIUM-B
2 | 15:00 | 16:30 | PREMIUM-A
There are two courts available and there are four distinct rate types:
SAVER, for Court 1 bookings made by members
STANDARD, for Court 1 bookings made by non-members
PREMIUM-A, for Court 2 bookings made by members
PREMIUM-B, for Court 2 bookings made by non-members
So the functional dependency $\text{Rate Type} \to \text{Court}$ holds, because each rate type applies to exactly one court. Apart from the dependencies implied by the candidate keys listed below, it is the only non-trivial functional dependency on the table.
o We can observe that the table's candidate keys are:
{Court, Start Time}
{Court, End Time}
{Rate Type, Start Time}
{Rate Type, End Time}
o In the Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to candidate keys. Therefore the table adheres to both 2NF and 3NF.
o The table does not adhere to BCNF because, in the dependency $\text{Rate Type} \to \text{Court}$, the determining attribute (Rate Type) is not a superkey.
The design can be amended so that it meets BCNF as follows:
Rate Types
Rate Type | Court | Member Flag
SAVER | 1 | Yes
STANDARD | 1 | No
PREMIUM-A | 2 | Yes
PREMIUM-B | 2 | No
Today's Bookings
Rate Type | Start Time | End Time
SAVER | 09:30 | 10:30
SAVER | 11:00 | 12:00
STANDARD | 14:00 | 15:30
PREMIUM-B | 10:00 | 11:30
PREMIUM-B | 11:30 | 13:30
PREMIUM-A | 15:00 | 16:30
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}. The candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF.
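The BCNF test says every non-trivial FD must have a superkey on its left. It can be sketched in Python for Today's Court Bookings and for the amended design. The notes state only $\text{Rate Type} \to \text{Court}$. The other two booking FDs below are assumptions, chosen because they yield exactly the four candidate keys listed above. The FDs for the two amended tables are likewise assumed from their stated candidate keys.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= schema:
                keys.append(frozenset(candidate))
    return keys


def bcnf_violations(schema, fds):
    """Listed non-trivial FDs whose left side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


bookings = {"Court", "Start Time", "End Time", "Rate Type"}
fds = [
    ({"Court", "Start Time"}, {"End Time", "Rate Type"}),  # assumed
    ({"Court", "End Time"}, {"Start Time"}),               # assumed
    ({"Rate Type"}, {"Court"}),                            # stated in the notes
]
keys = candidate_keys(bookings, fds)
print(len(keys), set().union(*keys) == bookings)  # 4 True: every attribute is prime, so 3NF holds
print(bcnf_violations(bookings, fds))             # [({'Rate Type'}, {'Court'})]

# Amended design (FDs assumed from the stated candidate keys): no violations.
rate_types = {"Rate Type", "Court", "Member Flag"}
rate_fds = [({"Rate Type"}, {"Court", "Member Flag"}),
            ({"Court", "Member Flag"}, {"Rate Type"})]
todays = {"Rate Type", "Start Time", "End Time"}
todays_fds = [({"Rate Type", "Start Time"}, {"End Time"}),
              ({"Rate Type", "End Time"}, {"Start Time"})]
print(bcnf_violations(rate_types, rate_fds), bcnf_violations(todays, todays_fds))  # [] []
```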
Lecture-20
Consider the following table:
Lending
branch-name | branch-city | assets | customer-name | loan-number | amount
Sadar | Agra | 200000 | Ram | L-12 | 12000
Sanjay-place | Agra | 100000 | Ram | L-13 | 13000
This table stores information about loans. It has the following problems:
Repetition of data: every branch is going to have several loans, so the table has one row for each loan taken from a branch. Each of these rows repeats the same values for the columns branch-name, branch-city and assets.
Updating the branch-city or assets of a particular branch requires updating every row for that branch, so the operation is costly.
If we fail to update any of these rows, a branch will have more than one value for branch-city or assets, which breaches data integrity.
If a branch has no loans, it has no row in this table, so we cannot represent the information about that branch at all.
Decomposition
The above problems can be solved by decomposing the table. A set of relation schemas $\{R_1, R_2, \ldots, R_n\}$ is a decomposition of relation schema R if $R = R_1 \cup R_2 \cup \cdots \cup R_n$. Every pair $R_i$ and $R_{i+1}$ of this set should have at least one common attribute, so that the relations can be combined back again using the join operation.
However, not every decomposition of this table is free from problems.
Consider, for example, forming two new tables out of our Lending table as follows:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Then the resulting tables with data will be as follows:
Branch-customer
branch-name | branch-city | assets | customer-name
Sadar | Agra | 200000 | Ram
Sanjay-place | Agra | 100000 | Ram
Customer-loan
customer-name | loan-number | amount
Ram | L-12 | 12000
Ram | L-13 | 13000
Now suppose that, to find the branch for loan L-12, we form the natural join of these two tables. We get the following table:
$\text{Branch-customer} \bowtie \text{Customer-loan}$
branch-name | branch-city | assets | customer-name | loan-number | amount
Sadar | Agra | 200000 | Ram | L-12 | 12000
Sadar | Agra | 200000 | Ram | L-13 | 13000
Sanjay-place | Agra | 100000 | Ram | L-12 | 12000
Sanjay-place | Agra | 100000 | Ram | L-13 | 13000
According to this join, both loans were taken from both branches. This is an example of information loss. The join produces spurious tuples, so we can no longer tell which branch each loan belongs to. It occurred because the wrong column (customer-name) was chosen as the common attribute of the two tables after decomposition.
Lossless-Join Decomposition: A decomposition $\{R_1, R_2, \ldots, R_n\}$ of relation schema R is a lossless-join decomposition if, for all legal relations r on schema R,
$r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) \bowtie \cdots \bowtie \Pi_{R_n}(r)$
In other words, when we join all of the decomposed tables with their data, the result should be exactly the original table with the data it held before decomposition.
Otherwise it is called a lossy-join decomposition.
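The definition can be tried out on the sample Lending rows by projecting, joining back and comparing. Note the asymmetry. A single instance that does not come back intact proves that a decomposition is lossy. An instance that does come back intact does not prove that the decomposition is lossless for all legal relations. The Python sketch below represents each tuple as a set of (attribute, value) pairs. This representation and the helper names are assumptions for illustration.

```python
def relation(header, *rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, row)) for row in rows}


def project(r, attrs):
    """Projection onto attrs; duplicates vanish because r is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all common attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def rejoin(r, schemas):
    """Join the projections of r onto each schema in turn."""
    joined = project(r, schemas[0])
    for s in schemas[1:]:
        joined = natural_join(joined, project(r, s))
    return joined


lending = relation(
    ("branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"),
    ("Sadar", "Agra", 200000, "Ram", "L-12", 12000),
    ("Sanjay-place", "Agra", 100000, "Ram", "L-13", 13000),
)
lossy = [("branch-name", "branch-city", "assets", "customer-name"),
         ("customer-name", "loan-number", "amount")]
good = [("branch-name", "branch-city", "assets"),
        ("branch-name", "customer-name", "loan-number", "amount")]

print(len(rejoin(lending, lossy)), rejoin(lending, lossy) == lending)  # 4 False: two spurious tuples
print(len(rejoin(lending, good)), rejoin(lending, good) == lending)    # 2 True
```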
Dependency preservation: This is another desirable property of a decomposition. Suppose a set F of functional dependencies holds on every relation based on schema R. The set of functional dependencies that holds on a subschema $R_1$ is $F_1$. It contains all the functional dependencies in $F^+$ (the closure of F) that involve only attributes of $R_1$. Let the decomposition of R be $\{R_1, R_2, \ldots, R_n\}$, and let the corresponding sets of functional dependencies that hold on the subschemas be $\{F_1, F_2, \ldots, F_n\}$. Then the following should be true:
$F^+ = (F_1 \cup F_2 \cup \cdots \cup F_n)^+$
Such a decomposition is called a dependency-preserving decomposition.
For example:
Consider the schema R = {A, B, C, D} on which the following set of functional dependencies holds: $F = \{A \to B,\ A \to BC,\ C \to D\}$.
Now suppose R is decomposed into $R_1 = \{A, B\}$ and $R_2 = \{B, C, D\}$. The functional dependencies that hold on $R_1$ are $F_1 = \{A \to B\}$. (Note: $F_1$ should contain all the functional dependencies in $F^+$ that involve only attributes of $R_1$.) Those that hold on $R_2$ are $F_2 = \{C \to D\}$. The union $F_1 \cup F_2 = \{A \to B,\ C \to D\}$ does not contain $A \to BC$, and $A \to BC$ does not follow from it either: from it we can derive only $A^+ = \{A, B\}$. So this is not a dependency-preserving decomposition.
Now decompose R into the relation schemas $R_1 = \{A, B, C\}$ and $R_2 = \{C, D\}$. Then $F_1 = \{A \to B,\ A \to BC\}$ and $F_2 = \{C \to D\}$, so $F_1 \cup F_2 = \{A \to B,\ A \to BC,\ C \to D\} = F$, and the decomposition is dependency preserving.
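Dependency preservation can be tested without writing out $F^+$. The sketch below uses a well-known textbook procedure, swapped in here as an illustration rather than taken from these notes. For each FD $\alpha \to \beta$ in F, start from $\alpha$ and repeatedly add whatever each subschema $R_i$ can derive from the attributes gathered so far. The FD is preserved if $\beta$ is eventually reached. The sketch reproduces both conclusions of the example above. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD in fds can be enforced from the subschemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # What Ri can derive from the attributes collected so far, kept within Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [({"A"}, {"B"}), ({"A"}, {"B", "C"}), ({"C"}, {"D"})]
print(preserves_dependencies(F, [{"A", "B"}, {"B", "C", "D"}]))  # False: A -> BC is lost
print(preserves_dependencies(F, [{"A", "B", "C"}, {"C", "D"}]))  # True
```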
Lecture-21
Normalization Using Functional Dependency
Lossless-Join Decomposition using FD:
o Let R be a relation schema and F a set of functional dependencies on R. Let $R_1$ and $R_2$ form a decomposition of R. This decomposition is a lossless-join decomposition if at least one of the following functional dependencies is in $F^+$:
$R_1 \cap R_2 \to R_1$
$R_1 \cap R_2 \to R_2$
o Example: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount), and the FDs that hold on this schema are:
$\text{branch-name} \to \text{assets}\ \text{branch-city}$
$\text{loan-number} \to \text{amount}\ \text{branch-name}$
The decomposition into the two schemas
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
is a lossless-join decomposition because
$\text{Branch-schema} \cap \text{Loan-info-schema} = \{\text{branch-name}\}$
and we have the FD $\text{branch-name} \to \text{assets}\ \text{branch-city}$. Applying the augmentation rule to it (augmenting both sides with branch-name) gives $\text{branch-name} \to \text{branch-name}\ \text{assets}\ \text{branch-city}$, i.e. $\text{branch-name} \to \text{Branch-schema}$.
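The test above needs only the closure of $R_1 \cap R_2$. Below is a minimal Python sketch, assuming the two Lending FDs given above. Because the test is a sufficient condition, a False result only means that it does not establish losslessness. For Branch-customer and Customer-loan, we already know from the join example that the decomposition is lossy. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Sufficient test: (R1 & R2) -> R1 or (R1 & R2) -> R2 is in F+."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


fds = [
    ({"branch-name"}, {"assets", "branch-city"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
print(lossless_binary(branch, loan_info, fds))  # True

branch_customer = {"branch-name", "branch-city", "assets", "customer-name"}
customer_loan = {"customer-name", "loan-number", "amount"}
print(lossless_binary(branch_customer, customer_loan, fds))  # False
```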
Third Normal Form Using FD:
o Let R be a relation schema with F as the minimal set (canonical cover) of functional dependencies that holds on R.
Then do the following:
1. Initially have an empty set of relations.
2. Set i = 1. For each FD $\alpha \to \beta$ in F:
add a relation $R_i = (\alpha, \beta)$ if no other relation already contains all the attributes of $\alpha$ and $\beta$, and increase i by one.
3. After adding all such relations, add another relation $R_i$ = (any candidate key of R) if no relation already contains a candidate key of R.
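The three steps can be sketched in Python. This is an illustration under stated assumptions. The two Lending FDs are taken to be the minimal set, candidate keys are found by brute force, and step 3 uses the first key found (any candidate key would do). The result happens to coincide with the BCNF decomposition of Lending-schema worked out below. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= schema:
                keys.append(frozenset(candidate))
    return keys


def synthesize_3nf(schema, cover):
    """3NF synthesis from a minimal cover, following steps 1-3 above."""
    relations = []                                  # step 1
    for lhs, rhs in cover:                          # step 2
        attrs = lhs | rhs
        if not any(attrs <= existing for existing in relations):
            relations.append(attrs)
    keys = candidate_keys(schema, cover)            # step 3
    if not any(key <= rel for key in keys for rel in relations):
        relations.append(set(keys[0]))
    return relations


lending = {"branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"}
cover = [
    ({"branch-name"}, {"assets", "branch-city"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
for rel in synthesize_3nf(lending, cover):
    print(sorted(rel))
# ['assets', 'branch-city', 'branch-name']
# ['amount', 'branch-name', 'loan-number']
# ['customer-name', 'loan-number']
```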
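Boyce-Codd Normal Form using FD:
1. Let $R_i$ be a relation that is not in BCNF.
2. Let $\alpha \to \beta$ be a non-trivial FD (with $\alpha \cap \beta = \emptyset$) that holds on $R_i$, such that $\alpha \to R_i$ does not hold (i.e. $\alpha$ is not a superkey of $R_i$).
3. Replace relation $R_i$ by two relations $(\alpha, \beta)$ and $(R_i - \beta)$.
4. Now check all the relations present again against all the FDs that hold on them, and go back to step 1. Stop when every relation is in BCNF.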
o Example:
Consider Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount); the FDs that hold on this schema are:
1. $\text{branch-name} \to \text{assets}\ \text{branch-city}$
2. $\text{loan-number} \to \text{amount}\ \text{branch-name}$
We can see that Lending-schema is not in BCNF: in the FD $\text{branch-name} \to \text{assets}\ \text{branch-city}$, branch-name is not a superkey of Lending-schema. So the new set of relations is as follows:
Branch-schema = (branch-name, branch-city, assets)
$\text{branch-name} \to \text{assets}\ \text{branch-city}$
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
$\text{loan-number} \to \text{amount}\ \text{branch-name}$
In the new set of relations, Loan-info-schema is not in BCNF either, as loan-number is not a superkey of Loan-info-schema. We decompose it again, and the set of relations becomes:
Branch-schema = (branch-name, branch-city, assets)
$\text{branch-name} \to \text{assets}\ \text{branch-city}$
Loan-schema = (branch-name, loan-number, amount)
$\text{loan-number} \to \text{amount}\ \text{branch-name}$
Borrower-schema = (customer-name, loan-number)
Now all three relations are in BCNF, so we do not have to decompose any further.
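The decomposition loop can be sketched as follows. Step 2 is implemented by trying every subset $\alpha$ of $R_i$ and computing its closure under the full set F, so dependencies implied on a subschema are not missed. This search order is an assumption, and a different order can give a different (equally valid) BCNF decomposition. On Lending-schema, it reproduces the three relations above. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_bcnf_violation(ri, fds):
    """Return (alpha, beta) where alpha -> beta holds on ri but alpha is not a superkey of ri."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & ri) - alpha  # non-trivial part, disjoint from alpha
            if beta and not ri <= alpha_plus:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    result = [set(schema)]
    while True:
        for i, ri in enumerate(result):
            violation = find_bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                # Step 3: replace Ri by (alpha, beta) and (Ri - beta), then recheck all relations.
                result[i:i + 1] = [alpha | beta, ri - beta]
                break
        else:
            return result  # no relation violates BCNF


lending = {"branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"}
fds = [
    ({"branch-name"}, {"assets", "branch-city"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
for rel in bcnf_decompose(lending, fds):
    print(sorted(rel))
# ['assets', 'branch-city', 'branch-name']
# ['amount', 'branch-name', 'loan-number']
# ['customer-name', 'loan-number']
```

Enumerating subsets is exponential in the number of attributes, which is acceptable for schemas of this size.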
BCNF may not satisfy the dependency-preservation criterion.
o In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table.
o For example, a set of functional dependencies $\{AB \to C,\ C \to B\}$ cannot be represented by a dependency-preserving BCNF schema.
o Unlike the first three normal forms, a lossless, dependency-preserving decomposition into BCNF is not always achievable.
o Consider the following non-BCNF table, whose functional dependencies follow the $\{AB \to C,\ C \to B\}$ pattern:
Nearest Shop
Person | Shop Type | Nearest Shop
Davidson | Optician | Eagle Eye
Davidson | Hairdresser | Snippets
Wright | Bookshop | Merlin Books
Fuller | Bakery | Doughy's
Fuller | Hairdresser | Sweeney Todd's
Fuller | Optician | Eagle Eye
For each Person / Shop Type combination, the table tells us which shop of this
type is geographically nearest to the person's home. We assume for simplicity that
a single shop cannot be of more than one type.
The candidate keys of the table are:
{Person, Shop Type}
{Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the
table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is
functionally dependent on a non-superkey: Nearest Shop.
Decomposing the table so that it meets BCNF gives two tables, Shop Near Person and Shop:
Shop Near Person
Person | Shop
Davidson | Eagle Eye
Davidson | Snippets
Wright | Merlin Books
Fuller | Doughy's
Fuller | Sweeney Todd's
Fuller | Eagle Eye
Shop
Shop | Shop Type
Eagle Eye | Optician
Snippets | Hairdresser
Merlin Books | Bookshop
Doughy's | Bakery
Sweeney Todd's | Hairdresser
The "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate keys do not guarantee that the functional dependency $\{\text{Person}, \text{Shop Type}\} \to \{\text{Shop}\}$ will be respected.
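The lost dependency can be confirmed with a standard dependency-preservation test, a textbook procedure that is not part of these notes. Starting from {Person, Shop Type}, neither table lets us derive Shop, so that FD is not preserved. By contrast, $\text{Shop} \to \text{Shop Type}$ survives inside the Shop table. The FD list is an assumption based on the description above, and it uses the attribute name Shop for Nearest Shop. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD in fds can be enforced from the subschemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                gained = closure(result & ri, fds) & ri  # derivable within Ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


fds = [
    ({"Person", "Shop Type"}, {"Shop"}),
    ({"Shop"}, {"Shop Type"}),
]
shop_near_person = {"Person", "Shop"}
shop = {"Shop", "Shop Type"}
print(preserves_dependencies(fds, [shop_near_person, shop]))      # False
print(preserves_dependencies(fds[1:], [shop_near_person, shop]))  # True: Shop -> Shop Type alone is kept
```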
Lecture 22
Multivalued Dependencies
Let R be a relation schema, let X and Y be disjoint subsets of R (i.e., $X \subseteq R$, $Y \subseteq R$, $X \cap Y = \emptyset$), and let $Z = R - XY$. A relation r(R) satisfies the MVD $X \twoheadrightarrow Y$ if, for any two tuples $t_1$ and $t_2$ in r:
o if $t_1[X] = t_2[X]$, then there exists a tuple $t_3$ in r such that
o $t_3[X] = t_1[X]$, $t_3[Y] = t_1[Y]$ and $t_3[Z] = t_2[Z]$.
o By symmetry, there exists a tuple $t_4$ in r such that
o $t_4[X] = t_1[X]$, $t_4[Y] = t_2[Y]$ and $t_4[Z] = t_1[Z]$.
Tuple | X | Y | Z
t1 | x1 | y1 | z1
t2 | x1 | y2 | z2
t3 | x1 | y1 | z2
t4 | x1 | y2 | z1
The MVD $X \twoheadrightarrow Y$ says that the relationship between X and Y is independent of the relationship between X and $R - Y$.
For example, consider the table Employee:
Employee-name | Project-name | Dependant-name
Smith | X | John
Smith | Y | Ann
Smith | X | Ann
Smith | Y | John
o The MVDs $\text{Employee-name} \twoheadrightarrow \text{Project-name}$ and $\text{Employee-name} \twoheadrightarrow \text{Dependant-name}$ hold in the relation.
o The employee named Smith works on projects X and Y, and has two dependents, John and Ann.
o If we stored only the first two tuples in the relation, it would incorrectly suggest associations between particular projects and particular dependents.
o If we have MVDs in a relation, we may have to repeat values redundantly in the tuples. In the Employee relation, the values X and Y of Project-name are repeated with each value of Dependant-name, which is clearly undesirable.
o Problem: the Employee schema is nevertheless in BCNF, because no non-trivial FDs hold on it.
o Trivial MVD: if the MVD $X \twoheadrightarrow Y$ is satisfied by all relations whose schemas include X and Y, it is called a trivial MVD.
$X \twoheadrightarrow Y$ is trivial whenever $Y \subseteq X$ or $X \cup Y = R$.
o If a relation r fails to satisfy a given MVD, a relation $r'$ that satisfies the MVD can be constructed by adding tuples to r.
An MVD is therefore called a "tuple-generating dependency".
Compare this with an FD, where we need to delete tuples to make the relation satisfy a given FD.
o MVDs can be used in two ways:
to test relations to determine whether they are legal under a given set of FDs and MVDs;
to specify constraints on a set of relations.
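The definition of $X \twoheadrightarrow Y$ translates directly into a check over pairs of tuples. The same check shows the "tuple-generating" behaviour: the tuples it reports as missing are exactly the ones that would have to be added. Below is a minimal Python sketch with hypothetical names, run on the Employee table above.

```python
from itertools import product


def mvd_missing_tuples(rows, header, x, y):
    """Tuples required by X ->> Y (per the definition above) that are absent from rows.

    For every ordered pair t1, t2 agreeing on X, t3 takes X and Y from t1 and
    Z = R - X - Y from t2; ordered pairs also cover the symmetric t4 case.
    """
    pos = {a: i for i, a in enumerate(header)}
    z = [a for a in header if a not in x and a not in y]
    present = set(rows)
    missing = set()
    for t1, t2 in product(rows, repeat=2):
        if all(t1[pos[a]] == t2[pos[a]] for a in x):
            t3 = tuple(t2[pos[a]] if a in z else t1[pos[a]] for a in header)
            if t3 not in present:
                missing.add(t3)
    return missing


header = ("Employee-name", "Project-name", "Dependant-name")
employee = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Ann"),
    ("Smith", "X", "Ann"),
    ("Smith", "Y", "John"),
]
print(mvd_missing_tuples(employee, header, ["Employee-name"], ["Project-name"]))  # set()
# With only the first two tuples, the MVD generates the other two:
print(sorted(mvd_missing_tuples(employee[:2], header, ["Employee-name"], ["Project-name"])))
# [('Smith', 'X', 'Ann'), ('Smith', 'Y', 'John')]
```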
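Let D be a set of FDs and MVDs. Then $D^+$, the closure of D, is the set of all FDs and MVDs logically implied by D.
$D^+$ can be computed using the following set of sound and complete rules:
1. Reflexivity: if $Y \subseteq X$, then $X \to Y$.
2. Augmentation: if $X \to Y$, then $WX \to WY$.
3. Transitivity: if $X \to Y$ and $Y \to Z$, then $X \to Z$.
4. Complementation: if $X \twoheadrightarrow Y$, then $X \twoheadrightarrow R - XY$.
5. MV augmentation: if $X \twoheadrightarrow Y$, $W \subseteq R$ and $V \subseteq W$, then $WX \twoheadrightarrow VY$.
6. MV transitivity: if $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow Z - Y$.
7. Replication: if $X \to Y$, then $X \twoheadrightarrow Y$.
8. Coalescence: if $X \twoheadrightarrow Y$, $Z \subseteq Y$, $W \subseteq R$, $W \cap Y = \emptyset$ and $W \to Z$, then $X \to Z$.
Note: The first three rules are Armstrong's axioms.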
Fourth Normal Form (4NF):
A relation schema R is in 4NF with respect to D if, for every non-trivial MVD $X \twoheadrightarrow Y$ in $D^+$, X is a superkey for R.
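4NF vs BCNF
o 4NF differs from BCNF only in the use of D (FDs and MVDs) instead of F (FDs only).
o Every 4NF schema is also in BCNF, because by the replication rule $X \to Y$ implies $X \twoheadrightarrow Y$.
o If R is not in BCNF, there exists a non-trivial FD $X \to Y$ where X is not a superkey. By replication, $X \twoheadrightarrow Y$ is then a non-trivial MVD whose left side is not a superkey, so R cannot be in 4NF.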
For example, Employee (Employee-name, Project-name, Dependant-name) is not in 4NF, since:
o $\text{Employee-name} \twoheadrightarrow \text{Project-name}$ holds, but Employee-name is not a key.
o Decomposing it into Emp-proj (E-n, P-n) and Emp-dep (E-n, D-n) brings the tables into 4NF.
For example, Borrow (Loan#, C-name, Street, C-city) is in BCNF but not in 4NF, because $\text{C-name} \twoheadrightarrow \text{Loan\#}$ is a non-trivial MVD in which C-name is not a key of this schema.
The decomposition $R_1$ = (C-name, Loan#), $R_2$ = (C-name, Street, C-city) brings them into 4NF.
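The 4NF decomposition of Employee can be checked on the sample data: joining Emp-proj and Emp-dep on Employee-name gives back exactly the original four tuples. The sketch also counts what it takes to record one more dependant. The dependant "Mary" is invented purely for illustration, and plain Python sets stand in for relations.

```python
employee = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Ann"),
    ("Smith", "X", "Ann"),
    ("Smith", "Y", "John"),
}
# Project onto Emp-proj (E-n, P-n) and Emp-dep (E-n, D-n); sets drop duplicate rows.
emp_proj = {(e, p) for e, p, _ in employee}
emp_dep = {(e, d) for e, _, d in employee}

# Natural join on Employee-name rebuilds the original relation: no information lost.
rejoined = {(e, p, d) for e, p in emp_proj for e2, d in emp_dep if e == e2}
print(rejoined == employee)  # True

# A new dependant "Mary" (hypothetical) needs one row in Emp-dep,
# but one row per project of Smith in the undecomposed Employee table.
projects_of_smith = {p for e, p in emp_proj if e == "Smith"}
print(len({("Smith", p, "Mary") for p in projects_of_smith}), "rows vs 1 row")  # 2 rows vs 1 row
```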
Benefits of Fourth Normal Form
o Reduced number of tuples
o No anomalies for insert/delete/update
Comparing FD and MVD
o If we have $(a_1, b_1, c_1, d_1) \in r$ and $(a_1, b_2, c_2, d_2) \in r$, then:
$A \to B$ implies $b_1 = b_2$
$A \twoheadrightarrow B$ implies $(a_1, b_1, c_2, d_2) \in r$ and $(a_1, b_2, c_1, d_1) \in r$
Lecture 23
Join Dependency and Fifth Normal Form (Project-Join Normal Form):
The normal forms discussed so far require that a relation R which is not in the given normal form be decomposed into two relations to meet the requirements of that normal form. In some rare cases, a relation has problems such as redundant information, and the update anomalies that follow from it, but cannot be decomposed into two relations to remove them. In such cases it may be possible to decompose the relation into three or more relations using 5NF.
The fifth normal form deals with join dependencies, which are a generalisation of MVDs. The aim of fifth normal form is to have relations that cannot be decomposed further. A relation in 5NF cannot be losslessly rebuilt by joining smaller relations, except through decompositions whose components are all superkeys, as the definition below makes precise.
A relation R satisfies the join dependency $*(R_1, R_2, \ldots, R_n)$ if and only if R is equal to the join of its projections onto $R_1, R_2, \ldots, R_n$, where each $R_i$ is a subset of the set of attributes of R.
A relation R is in 5NF (or project-join normal form, PJNF) if at least one of the following conditions holds for every join dependency of the form $*(R_1, R_2, \ldots, R_n)$ on R. Here each $R_i$ is a subset of the set of attributes of R, and $R = R_1 \cup R_2 \cup \cdots \cup R_n$.
o $*(R_1, R_2, \ldots, R_n)$ is a trivial join dependency (i.e., one of the $R_i$ is R).
o Every $R_i$ is a superkey for R.
The following example, which deals with departments, subjects and students, illustrates 5NF:
Department | Subject | Student
Comp. Sc. | CP1000 | John Smith
Mathematics | MA1000 | John Smith
Comp. Sc. | CP2000 | Arun Kumar
Comp. Sc. | CP3000 | Reena Rani
Physics | PH1000 | Raymond Chew
Chemistry | CH2000 | Albert Garcia
o The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and
CP3000 which are taken by a variety of students. No student takes all the subjects
and no subject has all students enrolled in it and therefore all three fields are
needed to represent the information.
o The above relation does not show MVDs, since the attributes subject and student are not independent: they are related to each other, and the pairings carry significant information. The relation therefore cannot be decomposed into the two relations (dept, subject) and (dept, student) without losing some important information.
o The relation can, however, be decomposed into the following three relations:
(dept, subject),
(dept, student), and
(subject, student),
and it can be shown that this decomposition is lossless.
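The two claims about the department/subject/student relation can be checked on the sample rows. This shows only that this particular instance behaves as described. It is not a proof that the join dependency holds for every legal instance. Plain Python sets stand in for relations.

```python
rows = {
    ("Comp. Sc.", "CP1000", "John Smith"),
    ("Mathematics", "MA1000", "John Smith"),
    ("Comp. Sc.", "CP2000", "Arun Kumar"),
    ("Comp. Sc.", "CP3000", "Reena Rani"),
    ("Physics", "PH1000", "Raymond Chew"),
    ("Chemistry", "CH2000", "Albert Garcia"),
}
dept_subject = {(d, s) for d, s, _ in rows}
dept_student = {(d, st) for d, _, st in rows}
subject_student = {(s, st) for _, s, st in rows}

# Two-way decomposition: join (dept, subject) with (dept, student) on dept.
two_way = {(d, s, st) for d, s in dept_subject for d2, st in dept_student if d == d2}
# Three-way: additionally join with (subject, student) on both of its attributes.
three_way = {t for t in two_way if (t[1], t[2]) in subject_student}

print(len(rows), len(two_way), len(three_way))  # 6 12 6
print(two_way == rows, three_way == rows)       # False True
```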
Consider the Loan-info-schema discussed earlier. Suppose it is given that the following join dependency holds on Loan-info-schema:
$*((\text{loan-number}, \text{branch-name}), (\text{loan-number}, \text{customer-name}), (\text{loan-number}, \text{amount}))$
Then Loan-info-schema is not in 5th normal form, because not every one of these relation schemas is a superkey of it (for example, (loan-number, branch-name) is not). So we should decompose it into the three relations given by the join dependency. That is, we should have the following three relation schemas in place of Loan-info-schema:
o (loan-number, branch-name),
o (loan-number, customer-name), and
o (loan-number, amount)