
Chapter 1

Concurrency
1.1 Introduction
In information technology and computer science, especially in the fields of computer
programming, operating systems, multiprocessors, and databases, concurrency control ensures
that correct results for concurrent operations are generated, while getting those results as quickly
as possible.
Computer systems, both software and hardware, consist of modules, or components. Each
component is designed to operate correctly, i.e., to obey or to meet certain consistency rules.
When components that operate concurrently interact by messaging or by sharing accessed data
(in memory or storage), a certain component's consistency may be violated by another
component. The general area of concurrency control provides rules, methods, design
methodologies, and theories to maintain the consistency of components operating concurrently
while interacting, and thus the consistency and correctness of the whole system. Introducing
concurrency control into a system means applying operation constraints which typically result in
some performance reduction. Operation consistency and correctness should be achieved with as
good as possible efficiency, without reducing performance below reasonable levels. Concurrency
control can require significant additional complexity and overhead in a concurrent algorithm
compared to the simpler sequential algorithm.
For example, a failure in concurrency control can result in data corruption from torn read or
write operations.
Multiple transactions are allowed to run concurrently in the system. The advantages are:

- Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.
- Reduced average response time for transactions: short transactions need not wait behind long ones.

Concurrency control schemes are mechanisms used to achieve isolation, that is, to control the interaction among concurrent transactions in order to prevent them from destroying the consistency of the database.

1.2 Schedule
A schedule is a sequence of instructions that specifies the chronological order in which the instructions of concurrent transactions are executed.

- A schedule for a set of transactions must consist of all instructions of those transactions.
- It must preserve the order in which the instructions appear in each individual transaction.

A transaction that successfully completes its execution will have a commit instruction as its last statement (omitted if it is obvious).
A transaction that fails to successfully complete its execution will have an abort instruction as its last statement (omitted if it is obvious).

1.3 Serializability
For correctness, a common major goal of most concurrency control mechanisms is generating schedules with the serializability property. Without serializability, undesirable phenomena may occur; e.g., money may disappear from accounts or be generated from nowhere. Serializability of a schedule means equivalence (in the resulting database values) to some serial schedule with the same transactions (i.e., one in which the transactions are sequential with no overlap in time and thus completely isolated from each other: no concurrent access by any two transactions to the same data is possible). Serializability is considered the highest level of isolation among database transactions and the major correctness criterion for concurrent transactions. In some cases, compromised, relaxed forms of serializability are allowed for better performance (e.g., the popular snapshot isolation mechanism) or to meet availability requirements in highly distributed systems (see eventual consistency), but only if the application's correctness is not violated by the relaxation (e.g., no relaxation is allowed for money transactions, since with relaxation money could disappear or appear from nowhere).
Almost all implemented concurrency control mechanisms achieve serializability by providing conflict serializability, a broad special case of serializability (i.e., it covers most serializable schedules and does not impose significant additional delay-causing constraints) which can be implemented efficiently.

Basic Assumption: Each transaction preserves database consistency.

- Thus serial execution of a set of transactions preserves database consistency.
- A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
- We ignore operations other than read and write operations, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.
- Our simplified schedules consist of only read and write instructions.

Conflicting Instructions
Instructions Ii and Ij of transactions Ti and Tj, respectively, conflict if and only if there exists some item Q accessed by both Ii and Ij, and at least one of these instructions wrote Q.

1. Ii = read(Q), Ij = read(Q): Ii and Ij don't conflict.
2. Ii = read(Q), Ij = write(Q): They conflict.
3. Ii = write(Q), Ij = read(Q): They conflict.
4. Ii = write(Q), Ij = write(Q): They conflict.


Intuitively, a conflict between Ii and Ij forces a (logical) temporal order between them. If Ii and
Ij are consecutive in a schedule and they do not conflict, their results would remain the same
even if they had been interchanged in the schedule.
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Schedule S3 can be transformed into S6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.
Therefore schedule S3 is conflict serializable.
Schedule S3

T1            T2
read(A)
write(A)
              read(A)
              write(A)
read(B)
write(B)
              read(B)
              write(B)

Schedule S6

T1            T2
read(A)
write(A)
read(B)
write(B)
              read(A)
              write(A)
              read(B)
              write(B)
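The swap-based argument above can be automated: a schedule is conflict serializable exactly when its precedence graph (an edge Ti to Tj for each conflicting pair in which Ti's operation comes first) is acyclic. Below is a minimal sketch of this test; the schedule representation (a list of (transaction, operation, item) tuples) and the function name are illustrative choices, not notation from the text.

from collections import defaultdict

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples in chronological order,
    where op is 'R' or 'W'. Returns True if the precedence graph is acyclic."""
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    # Add an edge Ti -> Tj for every conflicting pair (same item, at least one write)
    for i, (ti, opi, qi) in enumerate(schedule):
        for tj, opj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and 'W' in (opi, opj):
                edges[ti].add(tj)
    # Detect a cycle with a simple depth-first search
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in txns}
    def dfs(t):
        colour[t] = GREY
        for u in edges[t]:
            if colour[u] == GREY or (colour[u] == WHITE and dfs(u)):
                return True          # back edge: cycle found
        colour[t] = BLACK
        return False
    return not any(dfs(t) for t in txns if colour[t] == WHITE)

# Schedule S3 from the text: conflict serializable (equivalent to T1 then T2)
s3 = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A'),
      ('T1', 'R', 'B'), ('T1', 'W', 'B'), ('T2', 'R', 'B'), ('T2', 'W', 'B')]
print(conflict_serializable(s3))     # True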

1.4 Concurrency Control

A database must provide a mechanism that will ensure that all possible schedules are:

- serializable, and
- recoverable and, preferably, cascadeless.

A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency and low throughput. (Are serial schedules recoverable/cascadeless?)
Testing a schedule for serializability after it has executed is a little too late!
Goal: to develop concurrency control protocols that will assure serializability.
In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. There are two approaches used in algorithms to deal with the problems of concurrency control:

- Pessimistic approach
- Optimistic approach
Pessimistic concurrency control protocols can be broadly divided into two categories:

- Lock-based protocols
- Timestamp-based protocols

1.4.1 Lock-Based Protocols

Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. A lock is a mechanism to control concurrent access to a data item. Locks are of two kinds:

- Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
- Shared/exclusive locks: this type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock, because allowing more than one transaction to write the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.

Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted. A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions. Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item. If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released. The lock is then granted.
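The compatibility rule just described can be captured by a small matrix. The sketch below is an illustrative helper, not code from any particular DBMS; the mode names S and X and the function name are assumptions.

# Compatibility matrix for shared (S) and exclusive (X) locks:
# a requested lock is granted only if it is compatible with every lock
# currently held on the item by other transactions.
COMPATIBLE = {
    ('S', 'S'): True,   # any number of readers may share an item
    ('S', 'X'): False,  # a writer must wait for readers
    ('X', 'S'): False,  # a reader must wait for a writer
    ('X', 'X'): False,  # writers exclude each other
}

def can_grant(requested_mode, held_modes):
    """Return True if a lock in requested_mode can be granted while
    held_modes (locks of other transactions) are still held."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant('S', ['S', 'S']))  # True:  shared locks coexist
print(can_grant('X', ['S']))       # False: must wait for the reader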
Example of a transaction with locking:

T2: lock-S(A);
    read(A);
    unlock(A);
    lock-S(B);
    read(B);
    unlock(B);
    display(A+B);
Locking as above is not sufficient to guarantee serializability:

- if A and B get updated between the read of A and the read of B, the displayed sum would be wrong.

A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules. Locking can also be dangerous:

- Danger of deadlock: deadlocks cannot be completely avoided; transactions have to be killed and rolled back.
- Danger of starvation: a transaction is repeatedly rolled back due to deadlocks. The concurrency control manager can be designed to prevent starvation.

There are four types of lock protocols available:

Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the write operation.

Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.

1.4.1.1 Two-Phase Locking Protocol

This locking protocol divides the execution of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it only releases the locks it has acquired.

Two-phase locking thus has two phases: a growing phase, in which the transaction acquires all its locks, and a shrinking phase, in which the locks held by the transaction are released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
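A minimal sketch of the growing/shrinking discipline is shown below, assuming a simplified single-threaded view of one transaction's lock set; the class and method names are illustrative, not from any particular system.

class TwoPhaseLockingError(Exception):
    pass

class Transaction:
    """Tracks the two-phase locking discipline for one transaction:
    once any lock has been released, no new lock may be acquired."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name} tried to lock {item} in its shrinking phase")
        self.held.add(item)      # growing phase: locks may still be acquired

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True    # first release ends the growing phase

t = Transaction("T1")
t.lock("A"); t.lock("B")         # growing phase
t.unlock("A")                    # shrinking phase begins
try:
    t.lock("C")                  # violates 2PL: new lock after a release
except TwoPhaseLockingError as e:
    print(e)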

1.4.1.2 Strict Two-Phase Locking Protocol

The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock as soon as it has finished using it. Strict-2PL holds all the locks until the commit point and releases them all at once.

Unlike 2PL, Strict-2PL does not suffer from cascading aborts.


1.4.2 Timestamp-Based Protocols

The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all transactions that come after it; for example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority is given to the older one.
In addition, every data item carries the latest read-timestamp and write-timestamp. This lets the system know when the last read and write operations were performed on the data item.

1.4.2.1 Timestamp Ordering Protocol

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol that conflicting pairs of operations are executed according to the timestamp values of the transactions.

- The timestamp of transaction Ti is denoted as TS(Ti).
- The read timestamp of data item X is denoted by R-timestamp(X).
- The write timestamp of data item X is denoted by W-timestamp(X).

The timestamp ordering protocol works as follows:

- If a transaction Ti issues a read(X) operation:
  - If TS(Ti) < W-timestamp(X): the operation is rejected.
  - If TS(Ti) >= W-timestamp(X): the operation is executed.
  - All data-item timestamps are updated.
- If a transaction Ti issues a write(X) operation:
  - If TS(Ti) < R-timestamp(X): the operation is rejected.
  - If TS(Ti) < W-timestamp(X): the operation is rejected and Ti is rolled back.
  - Otherwise, the operation is executed.

Thomas' Write Rule

The basic rule above rejects a write and rolls Ti back whenever TS(Ti) < W-timestamp(X). The timestamp-ordering rules can be modified to make the schedule view serializable: instead of rolling Ti back, the obsolete write operation itself is simply ignored.
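A minimal sketch of these read/write rules, including the optional Thomas write rule, is given below. The data-item representation and function names are illustrative assumptions, and rollback is modelled simply by raising an exception.

class Rollback(Exception):
    """Raised when the protocol forces the issuing transaction to roll back."""

class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0   # R-timestamp(X): largest TS of any successful reader
        self.w_ts = 0   # W-timestamp(X): TS of the transaction that wrote X

def read(x, ts):
    # Ti tries to read a value already overwritten by a younger transaction: reject.
    if ts < x.w_ts:
        raise Rollback(f"read rejected: TS={ts} < W-timestamp={x.w_ts}")
    x.r_ts = max(x.r_ts, ts)            # update the read timestamp
    return x.value

def write(x, value, ts, thomas=False):
    # A younger transaction has already read X: this write comes too late.
    if ts < x.r_ts:
        raise Rollback(f"write rejected: TS={ts} < R-timestamp={x.r_ts}")
    if ts < x.w_ts:
        if thomas:
            return                      # Thomas' write rule: ignore the obsolete write
        raise Rollback(f"write rejected: TS={ts} < W-timestamp={x.w_ts}")
    x.value, x.w_ts = value, ts         # otherwise, execute the write

x = DataItem(0)
write(x, 42, ts=5)                      # normal write by a transaction with TS 5
write(x, 10, ts=3, thomas=True)         # obsolete write by an older TS: silently ignored
print(read(x, ts=7))                    # 42; R-timestamp(X) becomes 7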

1.4.3 Optimistic Concurrency Control / Scheduling

The optimistic method of concurrency control is based on the assumption that conflicts between database operations are rare and that it is better to let transactions run to completion and only check for conflicts before they commit. Optimistic concurrency control methods are also known as validation or certification methods. No checking is done while the transaction is executing. The optimistic method does not require locking or timestamping techniques; instead, a transaction is executed without restrictions until it is committed.
Optimistic concurrency control (OCC) is a concurrency control method applied to
transactional systems such as relational database management systems and software transactional
memory. OCC assumes that multiple transactions can frequently complete without interfering
with each other. While running, transactions use data resources without acquiring locks on those
resources. Before committing, each transaction verifies that no other transaction has modified the
data it has read. If the check reveals conflicting modifications, the committing transaction rolls
back and can be restarted.[1] Optimistic concurrency control was first proposed by H.T. Kung.[2]
OCC is generally used in environments with low data contention. When conflicts are rare,
transactions can complete without the expense of managing locks and without having
transactions wait for other transactions' locks to clear, leading to higher throughput than other
concurrency control methods. However, if contention for data resources is frequent, the cost of
repeatedly restarting transactions hurts performance significantly; it is commonly thought that
other concurrency control methods have better performance under these conditions. However,
locking-based ("pessimistic") methods also can deliver poor performance because locking can
drastically limit effective concurrency even when deadlocks are avoided.
It allows transactions to proceed unsynchronized and only check conflicts at the end. This
approach is based on the premise that conflicts are rare.


Optimistic execution: read and compute operations are performed without validation, and validation is performed just before the write phase.

Read -> Compute -> Validate -> Write
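The sketch below illustrates this read/validate/write pattern with a simple version-number check at commit time; the version-counter scheme and the class names are illustrative assumptions rather than the formal backward-validation algorithm.

class ConflictError(Exception):
    pass

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0        # bumped on every committed write

class OptimisticTxn:
    """Read freely, buffer writes locally, validate at commit time."""
    def __init__(self):
        self.read_versions = {} # record -> version observed at read time
        self.writes = {}        # record -> new value (local copy only)

    def read(self, rec):
        self.read_versions[rec] = rec.version
        return self.writes.get(rec, rec.value)

    def write(self, rec, value):
        self.writes[rec] = value        # no lock taken, database untouched

    def commit(self):
        # Validation: abort if anything we read was modified in the meantime.
        for rec, seen in self.read_versions.items():
            if rec.version != seen:
                raise ConflictError("concurrent modification detected; restart")
        # Write phase: install the buffered writes and bump versions.
        for rec, value in self.writes.items():
            rec.value = value
            rec.version += 1

acct = Record(100)
t = OptimisticTxn()
t.write(acct, t.read(acct) - 30)
acct.version += 1               # simulate another transaction committing first
try:
    t.commit()
except ConflictError as e:
    print(e)                    # the optimistic transaction must restart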

1.4.3.1 Advantages of Optimistic Concurrency Control

Optimistic concurrency control has the following advantages:


This technique is very efficient when conflicts are rare. The occasional conflicts result in the
transaction roll back.
The rollback involves only the local copy of data, the database is not involved and thus there
will not be any cascading rollbacks.

1.4.3.2 Problems of Optimistic Concurrency Control

Optimistic concurrency control suffers from the following problems:

- Conflicts are expensive to deal with, since the conflicting transaction must be rolled back.
- Longer transactions are more likely to have conflicts and may be repeatedly rolled back because of conflicts with short transactions.

1.4.3.3 Applications of Optimistic Concurrency Control

- Only suitable for environments where there are few conflicts and no long transactions.
- Acceptable for mostly read-only or query-oriented database systems that require very few update transactions.

1.5 Multiversion Techniques

This concurrency control technique keeps the old values of a data item when the item is updated. These techniques are known as multiversion concurrency control, because several versions (values) of an item are maintained. When a transaction requires access to an item, an appropriate version is chosen to maintain the serializability of the currently executing schedule, if possible. The idea is that some read operations that would be rejected in other techniques can still be accepted by reading an older version of the item to maintain serializability. When a transaction writes an item, it writes a new version and the old version of the item is retained. Some multiversion concurrency control algorithms use the concept of view serializability rather than conflict serializability.
An obvious drawback of multiversion techniques is that more storage is needed to maintain multiple versions of the database items. However, older versions may have to be maintained anyway, for example, for recovery purposes. In addition, some database applications require older versions to be kept to maintain a history of the evolution of data item values. The extreme case is a temporal database, which keeps track of all changes and the times at which they occurred. In such cases, there is no additional storage penalty for multiversion techniques, since older versions are already maintained.
Multiversion Technique Based on Timestamp Ordering
In this method, several versions X1, X2, ..., Xk of each data item X are maintained. For each version Xi, the value of the version and the following two timestamps are kept:
1. read_TS(Xi): the read timestamp of Xi is the largest of all the timestamps of transactions that have successfully read version Xi.
2. write_TS(Xi): the write timestamp of Xi is the timestamp of the transaction that wrote the value of version Xi.
Whenever a transaction T is allowed to execute a write_item(X) operation, a new version Xk+1 of item X is created, with both write_TS(Xk+1) and read_TS(Xk+1) set to TS(T). Correspondingly, when a transaction T is allowed to read the value of version Xi, the value of read_TS(Xi) is set to the larger of the current read_TS(Xi) and TS(T).
To ensure serializability, the following two rules are used:
1. If transaction T issues a write_item(X) operation, and version Xi of X has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T), and read_TS(Xi) > TS(T), then abort and roll back transaction T; otherwise, create a new version Xj of X with read_TS(Xj) = write_TS(Xj) = TS(T).
2. If transaction T issues a read_item(X) operation, find the version Xi of X that has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T); then return the value of Xi to transaction T, and set the value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
As we can see in case 2, a read_item(X) is always successful, since it finds the appropriate version Xi to read based on the write_TS of the various existing versions of X. In case 1, however, transaction T may be aborted and rolled back. This happens if T is attempting to write a version of X that should have been read by another transaction T' whose timestamp is read_TS(Xi); however, T' has already read version Xi, which was written by the transaction with timestamp equal to write_TS(Xi). If this conflict occurs, T is rolled back; otherwise, a new version of X, written by transaction T, is created. Notice that, if T is rolled back, cascading rollback may occur. Hence, to ensure recoverability, a transaction T should not be allowed to commit until after all the transactions that have written some version that T has read have committed.
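A minimal sketch of these two rules appears below, assuming versions are stored in a simple list; names such as MVRollback and relevant_version are illustrative.

class MVRollback(Exception):
    pass

class Version:
    def __init__(self, value, write_ts):
        self.value = value
        self.write_ts = write_ts   # TS of the transaction that wrote this version
        self.read_ts = write_ts    # largest TS of any transaction that read it

def relevant_version(versions, ts):
    """Version with the highest write_TS that is <= TS(T)."""
    return max((v for v in versions if v.write_ts <= ts),
               key=lambda v: v.write_ts)

def mv_read(versions, ts):
    v = relevant_version(versions, ts)
    v.read_ts = max(v.read_ts, ts)       # rule 2: reads always succeed
    return v.value

def mv_write(versions, value, ts):
    v = relevant_version(versions, ts)
    if v.read_ts > ts:                   # rule 1: a younger transaction already read v
        raise MVRollback("write too late: abort and roll back T")
    versions.append(Version(value, ts))  # otherwise create a new version

x_versions = [Version(value=10, write_ts=0)]
print(mv_read(x_versions, ts=5))         # 10, and read_TS of that version becomes 5
try:
    mv_write(x_versions, value=20, ts=3) # older writer after a younger reader
except MVRollback as e:
    print(e)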


Multiversion Two-Phase Locking Using Certify Locks

In this multiple-mode locking scheme, there are three locking modes for an item: read, write, and certify, instead of just the two modes (read, write). Hence, the state of LOCK(X) for an item X can be one of read-locked, write-locked, certify-locked, or unlocked.
In the standard locking scheme, once a transaction obtains a write lock on an item, no other transactions can access that item. The idea behind multiversion 2PL is to allow other transactions T' to read an item X while a single transaction T holds a write lock on X. This is accomplished by allowing two versions for each item X; one version must always have been written by some committed transaction. The second version X' is created when a transaction T acquires a write lock on the item. Other transactions can continue to read the committed version of X while T holds the write lock. Transaction T can write the value of X' as needed, without affecting the value of the committed version X. However, once T is ready to commit, it must obtain a certify lock on all the items on which it currently holds write locks before it can commit. The certify lock is not compatible with read locks, so the transaction may have to delay its commit until all its write-locked items are released by any reading transactions in order to obtain the certify locks. Once the certify locks (which are exclusive locks) are acquired, the committed version X of the data item is set to the value of version X', version X' is discarded, and the certify locks are then released. In this multiversion 2PL scheme, reads can proceed concurrently with a single write operation, an arrangement not permitted under the standard 2PL schemes.
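The read/write/certify compatibility rules described above can be summarised in a small table. The sketch below is an illustrative rendering of that compatibility matrix under the conventions just described, not code from any DBMS; the names are assumptions.

# Lock compatibility for multiversion 2PL with certify locks.
# MV2PL_COMPATIBLE[(held, requested)] is True if the requested mode can be
# granted while another transaction holds 'held' on the same item.
MV2PL_COMPATIBLE = {
    ('read',    'read'):    True,   # many readers of the committed version
    ('read',    'write'):   True,   # one writer may create X' alongside readers
    ('read',    'certify'): False,  # commit must wait until readers finish
    ('write',   'read'):    True,   # readers still see the committed version
    ('write',   'write'):   False,  # only one uncommitted version at a time
    ('write',   'certify'): False,
    ('certify', 'read'):    False,  # certify locks are exclusive
    ('certify', 'write'):   False,
    ('certify', 'certify'): False,
}

def mv2pl_can_grant(requested, held_modes):
    return all(MV2PL_COMPATIBLE[(held, requested)] for held in held_modes)

print(mv2pl_can_grant('write', ['read']))    # True:  readers and one writer coexist
print(mv2pl_can_grant('certify', ['read']))  # False: commit waits for readers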

1.6 Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in a shared resource
environment, where a process indefinitely waits for a resource that is held by another process.
For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to complete
its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is
waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release
resources. In this situation, none of the processes can finish their task. This situation is known as
a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions
involved in the deadlock are either rolled back or restarted.
A transaction is a unit of work, so a database management system will have a number of transactions. There may be situations when two or more transactions are put into a wait state simultaneously. In this position, each would be waiting for the other transaction to release a lock.
Suppose we have two transactions, one and two, both executing simultaneously. In transaction one we update the student table and then update the course table. In transaction two we update the course table and then update the student table. When a table is updated it is locked, and other transactions are prevented from updating it.
So in transaction one the student table is updated and locked, and in transaction two the course table is updated and locked. Since both transactions execute simultaneously, both the student table and the course table become locked, and each transaction waits for the other to release its table. This is the concept of deadlock in a DBMS.
A deadlock is a condition wherein two or more tasks are waiting for each other in order to finish, but none of the tasks is willing to give up the resources that the other tasks need. In this situation no task ever gets finished, and they remain in a waiting state forever.

Coffman conditions
Coffman stated four conditions for deadlock occurrence. A deadlock may occur if all of the following conditions hold true:

- Mutual exclusion condition: there must be at least one resource that cannot be used by more than one process at a time.
- Hold and wait condition: a process that is holding a resource can request additional resources that are being held by other processes in the system.
- No preemption condition: a resource cannot be forcibly taken from a process. Only the process holding a resource can release it.
- Circular wait condition: one process is waiting for a resource that is being held by a second process, the second process is waiting for a third process, and so on, with the last process waiting for the first, thus forming a circular chain of waiting.

1.6.1 Deadlock Handling


Ignore the deadlock (Ostrich algorithm)
Did that make you laugh? You may be wondering how ignoring a deadlock can count as deadlock handling. But the Windows system running on your PC uses this approach, which is one reason it sometimes hangs and you have to reboot it to get it working again. Not only Windows but UNIX also uses this approach.
The question is why? Why, instead of dealing with a deadlock, is it ignored, and why is this called the Ostrich algorithm?
Let me answer the second question first. This is known as the Ostrich algorithm because in this approach we ignore the deadlock and pretend that it will never occur, just like an ostrich sticking its head in the sand and pretending there is no problem.
Now, why do we ignore it? When deadlocks are believed to be very rare and the cost of deadlock handling is high, ignoring the problem is a better solution than handling it. For example, in the operating system case: if the time required to handle the deadlock is higher than the time required to reboot Windows, then rebooting is the preferred choice, given that deadlocks are very rare in Windows.

1.6.2 Deadlock Detection


The resource scheduler keeps track of the resources allocated to and requested by processes. Thus, if there is a deadlock, it is known to the resource scheduler. This is how a deadlock is detected.
Once a deadlock is detected, it is corrected by one of the following methods:

- Terminating processes involved in the deadlock: terminating all the processes involved in the deadlock, or terminating them one by one until the deadlock is resolved, can be a solution, but neither approach is good. Terminating all processes is costly and the partial work done by the processes is lost. Terminating them one by one takes a lot of time, because each time a process is terminated the system must check whether the deadlock has been resolved. Thus, the best approach is to consider process age and priority while terminating processes during a deadlock condition.
- Resource preemption: another approach is to preempt resources and allocate them to other processes until the deadlock is resolved.

1.6.3 Deadlock Prevention


We have learnt that if all four Coffman conditions hold true then a deadlock can occur, so preventing one or more of them can prevent deadlock.

- Removing mutual exclusion: all resources must be sharable, meaning that more than one process can hold a resource at a time. This approach is practically impossible.
- Removing hold and wait: this condition can be removed if the process acquires all the resources it needs before starting. Another way to remove it is to enforce a rule that a process may request resources only when it is holding none.
- Preemption of resources: preempting resources from a process can result in rollback, and thus this needs to be avoided in order to maintain the consistency and stability of the system.
- Avoiding circular wait: this can be avoided if the resources are maintained in a hierarchy and a process can hold resources only in increasing order of precedence. Another way is to enforce a one-resource-per-process rule: a process can request a resource only after it releases the resource it currently holds. Either way, circular wait is avoided.

1.6.4 Deadlock Avoidance


Deadlock can be avoided if resources are allocated in such a way that deadlock never occurs. There are two algorithms for deadlock avoidance:

- Wait/Die
- Wound/Wait

Here is a table representation of resource allocation for each algorithm. Both of these algorithms take process age into consideration while determining the best possible way of allocating resources for deadlock avoidance.

                                                          Wait/Die               Wound/Wait
Older process needs a resource held by younger process    Older process waits    Younger process dies
Younger process needs a resource held by older process    Younger process dies   Younger process waits
One of the most famous deadlock avoidance algorithms is the Banker's algorithm.


Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by another transaction, one of two possibilities may occur:
- If TS(Ti) < TS(Tj), that is, Ti, which is requesting the conflicting lock, is older than Tj, then Ti is allowed to wait until the data item is available.
- If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.

Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by another transaction, one of two possibilities may occur:
- If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back, that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
- If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
In both cases, the transaction that enters the system at a later stage is aborted.
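The two schemes differ only in what happens when the requester is older or younger than the lock holder. A minimal sketch of that decision logic follows, with transactions represented simply by their timestamps (an illustrative simplification).

def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "die (restart later, same timestamp)"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "wound holder (holder restarts)" if requester_ts < holder_ts else "wait"

# A smaller timestamp means an older transaction.
print(wait_die(1, 5))     # wait: the older transaction may wait
print(wait_die(5, 1))     # die: the younger transaction is killed
print(wound_wait(1, 5))   # wound holder: the older transaction preempts the younger one
print(wound_wait(5, 1))   # wait: the younger transaction waits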
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect a deadlock situation in advance. Methods like the wait-for graph are available, but they are suitable only for systems where transactions are lightweight and hold few resource instances. In a bulky system, deadlock prevention techniques may work well.
Wait-for Graph
This is a simple method available to track if any deadlock situation may arise. For each
transaction entering into the system, a node is created. When a transaction Ti requests for a lock
on an item, say X, which is held by some other transaction Tj, a directed edge is created from
Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items held
by others. The system keeps checking if there's any cycle in the graph.
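Deadlock detection then reduces to finding a cycle in this wait-for graph. A minimal sketch is shown below, using an edge Ti -> Tj to mean "Ti waits for Tj"; the dictionary-of-sets representation and function name are illustrative assumptions.

def has_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions it waits for.
    Returns True if the wait-for graph contains a cycle (i.e., a deadlock)."""
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for u in wait_for.get(t, ()):
            if u in visiting:            # back edge: cycle found
                return True
            if u not in done and dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for if t not in done)

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock.
print(has_deadlock({'T1': {'T2'}, 'T2': {'T3'}, 'T3': {'T1'}}))   # True
print(has_deadlock({'T1': {'T2'}, 'T2': set()}))                  # False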

Here, we can use either of the following two approaches:

- First, do not allow any request for an item that is already locked by another transaction. This is not always feasible and may cause starvation, where a transaction waits indefinitely for a data item and can never acquire it.
- The second option is to roll back one of the transactions. It is not always feasible to roll back the younger transaction, as it may be more important than the older one. With the help of a selection algorithm, a transaction is chosen to be aborted. This transaction is known as the victim, and the process is known as victim selection.


Chapter 2
Object Oriented Database Development

2.1 Introduction
In this chapter, we will describe how such conceptual object-oriented models can be transformed into logical schemas that can be directly implemented using an object database management system (ODBMS). As you will learn later, although relational databases are effective for traditional business applications, they have severe limitations (in the amount of programming required and DBMS performance) when it comes to storing and manipulating complex data and relationships. In this chapter, we will show how to implement applications within an object-oriented database environment.
An object database (also object-oriented database management system, OODBMS) is
a database management system in which information is represented in the form of objects as
used in object-oriented programming. Object databases are different from relational
databases which are table-oriented. Object-relational databases are a hybrid of both approaches.
Object databases have been considered since the early 1980s.
An object-oriented database management system (OODBMS), sometimes shortened to ODBMS (for object database management system), is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects. There is currently no widely agreed-upon standard for what constitutes an OODBMS, and OODBMS products are considered to be still in their infancy. In the meantime, the object-relational database management system (ORDBMS), the idea that object-oriented database concepts can be superimposed on relational databases, is more commonly encountered in available products. An object-oriented database interface standard is being developed by an industry group, the Object Data Management Group (ODMG). The Object Management Group (OMG) has already standardized an object-oriented data brokering interface between systems in a network.


Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented programming language capabilities. OODBMSs allow object-oriented programmers to develop the product, store them as objects, and replicate or modify existing objects to make new objects within the OODBMS. Because the database is integrated with the programming language, the programmer can maintain consistency within one environment, in that both the OODBMS and the programming language will use the same model of representation. Relational DBMS projects, by way of contrast, maintain a clearer division between the database model and the application.
As the usage of web-based technology increases with the implementation of Intranets and
extranets, companies have a vested interest in OODBMSs to display their complex data. Using a
DBMS that has been specifically designed to store data as objects gives an advantage to those
companies that are geared towards multimedia presentation or organizations that
utilize computer-aided design (CAD).
Some object-oriented databases are designed to work well with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others, such as JADE, have their own programming languages. OODBMSs use exactly the same model as object-oriented programming languages.

2.2 Object Definition Language


The ODL allows you to specify a logical schema for an object-oriented database. ODL is a programming-language-independent specification language for defining OODB schemas. Just as an SQL DDL schema is portable across SQL-compliant relational DBMSs, an ODL schema is portable across ODMG-compliant ODBMSs.

- Corresponds to SQL's DDL (Data Definition Language).
- Specifies the logical schema for an object-oriented database.
- Based on the specifications of the Object Database Management Group (ODMG).
- class: keyword for defining classes.
- attribute: keyword for attributes.
- Operations: return type, name, and parameters in parentheses.
- relationship: keyword for establishing relationships.
Defining an Attribute
The value of an attribute can be either an object identifier or a literal.
Types of literals:
- Atomic: a constant that cannot be decomposed into components.
- Collection: multiple literals or object types.
- Structure: a fixed number of named elements, each of which can be a literal or an object type.
Attribute ranges:
- Allowable values for an attribute.
- enum: for enumerating the allowable values.

Kinds of Collections
- Set: unordered collection without duplicates.
- Bag: unordered collection that may contain duplicates.
- List: ordered collection, all of the same type.
- Array: dynamically sized ordered collection, locatable by position.
- Dictionary: unordered sequence of key-value pairs without duplicates.

Defining Structures
A structure is a user-defined type with components, defined with the struct keyword.
Example:
struct Address
{
    String street_address;
    String city;
    String state;
    String zip;
};

Defining Operations
- Return type
- Name
- Parentheses following the name
- Arguments within the parentheses

Defining Relationships
- Only unary and binary relationships are allowed.
- Relationships are bi-directional, implemented through use of the inverse keyword.
- ODL relationships are specified as follows:
  - relationship indicates that the class is on the many-side.
  - relationship list indicates that the class is on the one-side and the other class's (many) instances are ordered.

2.3 Creating Object Instances


When a new instance of a class is created, a unique object identifier is assigned. You may specify an object identifier with one or more unique tag names.
For example, we can create a new course object called MBA669 as follows:

MBA669 course();

This creates a new instance of Course. The object tag name, MBA669, can be used to reference this object. We have not specified the attribute values for this object at this point.
Suppose you want to create a new student object and initialize some of its attributes:

Cheryl student(name: "Cheryl Davis", dateOfBirth: 4/5/77);
This creates a new student object with a tag name of Cheryl and initializes the values of two attributes. You can also specify the values for the attributes within a structure, as in the following example:

Jack student(
    name: "Jack Warner", dateOfBirth: 2/12/74,
    address: {street_address "310 College Rd", city "Dayton", state "Ohio", zip "45468"},
    phone: {area_code "937", personal_number "2282252"}
);

For a multivalued attribute, you can specify a set of values. For example, you can specify the skills for an employee called Dan Bellon as follows:

Dan employee(
    emp_id: 3678, name: "Dan Bellon",
    skills: {"Database design", "OO Modeling"}
);

Establishing links between objects for a given relationship is also easy. Suppose you want to store the fact that Cheryl took three courses in fall 1999. You can write:

Cheryl student(
    takes: {OOAD99F, Telecom99F, Java99F}
);

where OOAD99F, Telecom99F, and Java99F are tag names for three course-offering objects. This definition creates three links for the takes relationship, from the object tagged Cheryl to each of the course-offering objects.
Consider another example. To assign Dan to the TQM project, we write:

assignment(start_date: 2/15/2001, allocated_to: Dan, for: TQM);

Notice that we have not specified a tag name for the assignment object. Such objects will be identified by system-generated object identifiers. The assignment object has a link to an employee object (Dan) and another link to a project object (TQM).

When an object is created, it is assigned a lifetime, either transient or persistent. A transient object exists only while some program or session is in operation. A persistent object exists until it is explicitly deleted. Database objects are almost always persistent.

2.4 Object Query Language


We will now describe the Object Query Language (OQL), which is similar to SQL-92 and has been set forth as an ODMG standard for querying OODBs. OQL allows you a lot of flexibility in formulating queries. You can write a simple query such as

Jack.dateOfBirth

which returns Jack's birth date, a literal value, or

Jack.address

which returns a structure with values for street address, city, state, and zip. If instead we want to simply find in which city Jack resides, we can write

Jack.address.city

Like SQL, OQL uses a select-from-where structure to write more complex queries.
Basic Retrieval Command
Suppose we want to find the title and credit hours for MBA 664. Parallel to SQL, those
attributes are specified in the select clause, and the extent of the class that has those attributes is
specified in the from clause. In the where clause, we specify the condition that has to be satisfied.
In the query shown below, we have specified the extent courses of the Course class and bound
the extent to a variable called c in the from clause. We have specified the attributes crse_title and
credit_hrs for the extent (i.e., set of all Course instances in the database) in the select clause, and
stated the condition c.crse_code = "MBA 664" in the where clause.

select c.crse_title, c.credit_hrs
from courses c
where c.crse_code = "MBA 664"
Because we are dealing with only one extent, we could have left out the variable c without any
loss in clarity. However, as with SQL, if you are dealing with multiple classes that have common
attributes, you must bind the extents to variables so that the system can unambiguously identify
the classes for the selected attributes. The result of this query is a bag with two attributes.
Including Operations in Select Clause
We can invoke operations in an OQL query similar to the way we specify attributes. For
example, to find the age of John Marsh, a student, we invoke the age operation in the select
clause.
select s.age
from students s
where s.name = "John Marsh"

The query returns an integer value, assuming that there is only one student with that name. In addition to literal values, a query can also return objects with identity.
For example, the query

select s
from students s
where s.gpa >= 3.0

returns a collection (bag) of student objects for which the gpa is greater than or equal to 3.0. Notice that we have used the gpa operation in the where clause. If we want to formulate the same query, but only for those students who do not reside in Dayton, we can use the not operator as in SQL:

select s
from students s
where s.gpa >= 3.0
and not (s.address.city = "Dayton")

Instead of using not, we could have specified the new condition as follows:

s.address.city != "Dayton"

where != is the inequality operator.
Now suppose that we want to find the ages of all students whose gpa is less than 3.0. This query is

select s.age
from students s
where s.gpa < 3.0


References

1. http://mafisamin.web.ugm.ac.id/materikuliah/sistembasisdata/chap15r.pdf
2. Silberschatz, A., Korth, H. F., & Sudarshan, S. (2007). Database System Concepts.
3. codex.cs.yale.edu/avi/db-book/slide-dir/ch16.ppt
4. https://www.tutorialspoint.com/dbms/dbms_deadlock.htm
5. Wikipedia
6. Elmasri, R., & Navathe, S. Fundamentals of Database Systems.
7. O'Brien, J. A., & Marakas, G. M. (2009). Management Information Systems (9th ed.). New York, NY: McGraw-Hill/Irwin.
8. http://wps.prenhall.com/wps/media/objects/3310/3390076/hoffer_ch15.pdf

