You are on page 1of 11

1.

RELATIONAL DATA STRUCTURE

The database and the database structure are defined in the installation process. .

A database that can be perceived as a set of tables and manipulated in accordance


with the relational model of data. Each database includes:

 a set of system catalog tables that describe the logical and physical structure
of the data
 a configuration file containing the parameter values allocated for the database
 a recovery log with ongoing transactions and archivable transactions

Table 1. Database hierarchy


Component Description
Data A repository of information about the application programs,
dictionary databases, logical data models, and authorizations for an
organization.

When you change the data dictionary, the change process


includes edit checks that can prevent the data dictionary from
being corrupted. The only way to recover a data dictionary is to
restore it from a backup.
Container A data storage location, for example, a file, directory, or device that
is used to define a database.
Storage A logical unit of storage in a database such as a collection of
partition containers. Database storage partitions are called table spaces in
DB2®.
Business A tangible entity within an application that users create, access,
object and manipulates while performing a use case. Business objects
within a system are typically stateful, persistent, and long-lived.
Business objects contain business data and model the business
behavior.
Database An object that exists in an installation of a database system, such
object as an instance, a database, a database partition group, a buffer
pool, a table, or an index. A database object holds data and has no
behavior.
Table 1. Database hierarchy
Component Description
Table A database object that holds a collection of data for a specific
topic. Tables consist of rows and columns.
Column The vertical component of a database table. A column has a name
and a particular data type for example, character, decimal, or
integer.
Row The horizontal component of a table, consisting of a sequence of
values, one for each column of the table.
View A logical table that is based on data stored in an underlying set of
tables. The data returned by a view is determined by a SELECT
statement that is run on the underlying tables.
Index A set of pointers that is logically ordered by the values of a key.
Indexes provide quick access to data and can enforce uniqueness
of the key values for the rows in the table.
Relationship A link between one or more objects that is created by specifying a
join statement.
Join An SQL relational operation in which data can be retrieved from
two tables, typically based on a join condition specifying join
columns.

 Data dictionary tables

The structure of a relational database is stored in the data dictionary tables of


the database.

 Integrity checker

The integrity checker is a database configuration utility that you can use to
assesses the health of the base layer data dictionary. The tool compares the
data dictionary with the underlying physical database schema. If errors are
detected, the tool produces error messages detailing how to resolve the
issues.

 Storage partitions
A database storage partition is the location where a database object is stored
on a disk. Database storage partitions are called table spaces in DB2

 Business objects

A business object is an object that has a set of attributes and values,


operations, and relationships to other business objects. Business objects
contain business data and model the business behaviour.

 User-defined objects

Objects can be created in two ways: you can create an object in the database
or an object can be natively defined in the database. User-defined objects are
always created in the Database Configuration application.

 Configuration levels for objects

Levels describe the scope of objects and must be applied to objects.


Depending on the level that you assign to objects, you must create certain
attributes. For users to access an object, an attribute value must exist at the
level to which they have authority. The level that you assign to an object
sometimes depends on the level of the record in the database.

 Database relationships

Database relationships are associations between tables that are created


using join statements to retrieve data.

 Business object attributes

Attributes of business objects contain the data that is associated with a


business object. A persistent attribute represents a database table column or
a database view column. A nonpersistent attribute exists in memory only,
because the data that is associated with the attribute is not stored in the
database.

 Attribute data types

Each database record contains multiple attributes. Every attribute has an


associated data type.

 Database views
A database view is a subset of a database and is based on a query that runs
on one or more database tables. Database views are saved in the database
as named queries and can be used to save frequently used, complex queries.

 Indexes

You can use indexes to optimize performance for fetching data. Indexes
provide pointers to locations of frequently accessed data. You can create an
index on the columns in an object that you frequently query.

 Primary keys

When you assign a primary key to an attribute, the key uniquely identifies the
object that is associated with that attribute. The value in the primary column
determines which attributes are used to create the primary key.

2. RELATIONAL ALGEBRA

Relational algebra is a procedural query language, which takes instances of relations


as input and yields instances of relations as output. It uses operators to perform
queries. An operator can be either unary or binary. They accept relations as their
input and yield relations as their output. Relational algebra is performed recursively
on a relation and intermediate results are also considered relations.

The fundamental operations of relational algebra are as follows −

 Select
 Project
 Union
 Set different
 Cartesian product
 Rename

We will discuss all these operations in the following sections.

Select Operation (σ)

It selects tuples that satisfy the given predicate from a relation.

Notation − σp(r)
Where σ stands for selection predicate and r stands for relation. p is prepositional
logic formula which may use connectors like and, or, and not. These terms may use
relational operators like − =, ≠, ≥, < , >, ≤.

For example −

σsubject = "database"(Books)

Output − Selects tuples from books where subject is 'database'.

σsubject = "database" and price = "450"(Books)

Output − Selects tuples from books where subject is 'database' and 'price' is 450.

σsubject = "database" and price = "450" or year > "2010"(Books)

Output − Selects tuples from books where subject is 'database' and 'price' is 450 or
those books published after 2010.

Project Operation (∏)

It projects column(s) that satisfy a given predicate.

Notation − ∏A1, A2, An (r)

Where A1, A2 , An are attribute names of relation r.

Duplicate rows are automatically eliminated, as relation is a set.

For example −

∏subject, author (Books)

Selects and projects columns named as subject and author from the relation Books.

Union Operation (∪)

It performs binary union between two given relations and is defined as −

r ∪ s = { t | t ∈ r or t ∈ s}

Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).

For a union operation to be valid, the following conditions must hold −

 r, and s must have the same number of attributes.


 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.

∏ author (Books) ∪ ∏ author (Articles)

Output − Projects the names of the authors who have either written a book or an
article or both.

Set Difference (−)

The result of set difference operation is tuples, which are present in one relation but
are not in the second relation.

Notation − r − s

Finds all the tuples that are present in r but not in s.

∏ author (Books) − ∏ author (Articles)

Output − Provides the name of authors who have written books but not articles.

Intersection (∩):

Intersection on two relations R1 and R2 can only be computed if R1 and R2 are


union compatible (These two relation should have same number of attributes and
corresponding attributes in two relations have same domain). Intersection operator
when applied on two relations as R1∩R2 will give a relation with tuples which are in
R1 as well as R2. Syntax: Relation1 ∩ Relation2

3. OPTIMIZE DB DESINGS

Functional Dependency

Functional dependency (FD) is a set of constraints between two attributes in a


relation. Functional dependency says that if two tuples have same values for
attributes A1, A2,..., An, then those two tuples must have to have same values for
attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→) that is, X→Y, where X
functionally determines Y. The left-hand side attributes determine the values of
attributes on the right-hand side.

Trivial Functional Dependency

 Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset


of X, then it is called a trivial FD. Trivial FDs always hold.
 Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is
called a non-trivial FD.
 Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is
said to be a completely non-trivial FD.

First Normal Form

First Normal Form is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains. The values in
an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form

Before we learn about the second normal form, we need to understand the following

 Prime attribute − An attribute, which is a part of the candidate-key, is known


as a prime attribute.
 Non-prime attribute − An attribute, which is not a part of the prime-key, is
said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there
should not be any proper subset Y of X, for which Y → A also holds true.

We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name
must be dependent upon both and not on any of the prime key attribute individually.
But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be
identified by Proj_ID independently. This is called partial dependency, which is not
allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no
partial dependency.

Third Normal Form

For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −

 No non-prime attribute is transitively dependent on prime key attribute.


 For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,
o A is prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and only prime
key attribute. We find that City can be identified by Stu_ID as well as Zip itself.
Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip →
City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations
as follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict
terms. BCNF states that −

 For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip
is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

Which confirms that both the relations are in BCNF.

4. NORMALIZATION PROCESS

Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad
dream for any database administrator. Managing a database with anomalies is next
to impossible.

 Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when we
try to update one data item having its copies scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
 Deletion anomalies − We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere else.
 Insert anomalies − We tried to insert data in a record that does not exist at
all.

Normalization is a method to remove all these anomalies and bring the database to
a consistent state.

You might also like