
Case Study: Oracle

Mod III

History
Relational model was introduced in 1970.
Research and development effort was initiated at IBM's San Jose Research
Center.
It led to the announcement of two commercial relational DBMS products by
IBM in the 1980s:
SQL/DS for DOS/VSE (disk operating system/virtual storage extended)
and for VM/CMS (virtual machine/ conversational monitoring system)
environments, introduced in 1981;
DB2 for the MVS operating system, introduced in 1983.
Another relational DBMS, INGRES, was developed at the University of
California, Berkeley, in the early 1970s.
INGRES became a commercial RDBMS marketed by Ingres, Inc., a
subsidiary of ASK, Inc., and is presently marketed by Computer
Associates.
Other popular commercial RDBMSs include
Oracle of Oracle, Inc.; Sybase of Sybase, Inc.; RDB of Digital Equipment
Corp, now owned by Compaq; INFORMIX of Informix, Inc.; and UNIFY
of Unify, Inc.

History
Besides the RDBMSs, many implementations of the
relational data model appeared on the personal
computer (PC) platform in the 1980s.
These include RIM, RBASE 5000, PARADOX, OS/2
Database Manager, DBase IV, XDB, WATCOM SQL, SQL
Server (of Sybase, Inc.), SQL Server (of Microsoft), and
most recently Access (also of Microsoft, Inc.).
They were initially single-user systems, but more
recently they have started offering the client/server
database architecture and are becoming compliant
with Microsoft's Open Database Connectivity (ODBC),
a standard that permits the use of many front-end
tools with these systems.

Basic Structure of the Oracle System
An Oracle server consists of:
an Oracle database: the collection of stored data,
including log and control files, and
the Oracle instance: the processes, including Oracle
(system) processes and user processes taken
together, created for a specific instance of the
database operation.
The Oracle server supports SQL to define and manipulate
data. It has a procedural language, called PL/SQL, to
control the flow of SQL, to use variables, and to
provide error-handling procedures.
Oracle can also be accessed through general-purpose
programming languages such as C or Java.

Oracle Database Structure

The Oracle database has two primary structures:
(1) a physical structure, referring to the actual stored
data, and (2) a logical structure, corresponding to an
abstract representation of the stored data, which
roughly corresponds to the conceptual schema of the
database.

Oracle database
Contains:
One or more data files; these contain the actual data.
Two or more log files called redo logs; these record all changes
made to data and are used in the process of recovery, if
certain changes do not get written to permanent storage.
One or more control files; these contain control information
such as the database name, file names and locations, and a
database creation timestamp. These files are also needed for
recovery purposes.
Trace files and an alert log; background processes have a
trace file associated with them, and the alert log records
major database events.

Oracle Instance
The set of processes that constitute an instance of the server's
operation is called an Oracle instance,
which consists of a System Global Area and a set of
background processes.
It has the following components:
1. System global area (SGA): This area of memory is used for
database information shared by users. Oracle assigns an SGA
area when an instance starts. For optimal performance, the
SGA is generally made as large as possible, while still fitting
in real memory.

SGA
The SGA is divided into several types of memory structures:
1. Database buffer cache: This keeps the most recently
accessed data blocks from the database. By keeping most
frequently accessed data blocks in this cache, the disk I/O
activity can be significantly reduced.
2. Redo log buffer, which is the buffer for the redo log file
and is used for recovery purposes.
3. Shared pool, which contains shared memory constructs;
these include shared SQL areas, which contain parse trees
of SQL queries and execution plans for executing
SQL statements.

Oracle Instance

1. SGA
2. User processes: Each user process corresponds
to the execution of some application program or
some tool.
3. Program global area (PGA): This is a memory
buffer that contains data and control information
for a server process. A PGA is created by Oracle
when a server process is started.
4. Oracle processes: A process is a "thread of
control" or a mechanism in an operating system
that can execute a series of steps. A process has
its own private memory area where it runs. Oracle
processes are divided into server processes and
background processes.

Oracle Processes
Oracle creates server processes to
handle requests from connected user
processes.
The background processes are
created for each instance of Oracle.

Background processes
Database Writer (DBWR): Writes the modified blocks from the buffer cache
to the data files on disk
Log writer (LGWR): Writes from the log buffer area to the on-line disk log
file.
Checkpoint (CKPT): Refers to an event at which all modified buffers in the
SGA since the last checkpoint are written to the data files. The CKPT process
works with DBWR to execute a checkpointing operation.
System monitor (SMON): Performs instance recovery, manages storage
areas by making the space contiguous, and recovers transactions skipped
during recovery.
Process monitor (PMON): Performs process recovery when a user process
fails. It is also responsible for managing the cache and other resources used by
a user process.
Archiver (ARCH): Archives on-line log files to archival storage if configured to
do so.
Recoverer process (RECO): Resolves distributed transactions that are
pending due to a network or system failure in a distributed database.
Dispatchers (Dnnn): In multithreaded server configurations, route requests
from connected user processes to available shared server processes. There is
one dispatcher per standard communication protocol supported.
Lock processes (LCKn): Used for inter-instance locking when Oracle runs in a
parallel server mode.

Oracle Startup and Shutdown

An Oracle database is not available to users until the
Oracle server has been started up and the database
has been opened.
The following steps need to be taken:
Starting an instance of the database
Mounting the database
Opening the database

The reverse of the above operations will shut down an
Oracle instance.

Database Structure and Its Manipulation in Oracle

Oracle was originally designed as an RDBMS, but is
now an ORDBMS.
The features of Oracle, including its relational and
object-relational modeling facilities, are:
Schema objects for tables and views,
synonyms, indexes, sequences
Oracle data dictionary: a read-only set of
tables that keeps the metadata. Users are
rarely given access to the base tables.
SQL in Oracle: Oracle is compliant with
the ANSI/ISO SQL standard.
Methods in Oracle 8 (operations)
Triggers

Storage Organization in Oracle

Storage
Each database is divided into one
or more tablespaces.
As a minimum there are always a
System and a Users tablespace.
One or more data files are
created in each tablespace.
These data files are associated
with only one database.

Physical Storage
Physical Storage is organized in terms of
data blocks, extents and segments.
Data blocks are the finest (smallest) unit
of storage. They are also called logical
blocks, pages, or Oracle blocks.
An Extent is a specific number of
contiguous data blocks.
A Segment is a set of extents for a data
structure.

Data Blocks
A Data Block has the following
components:
Header: Contains the general block
information such as block address &
type of segment.
Table Directory: Contains information
about tables that have data in the data
block.
Row Directory: Contains information
about the actual rows.

Data Blocks (cont.)


Row Data: Uses the bulk of the space
of the Data Block. A row may span
multiple blocks.
Free Space: Space allocated for row
updates and new rows.


Extents
The amount of space initially allocated
to an extent is determined by the
Create command. Incremental
extents are allocated when the initial
one becomes full, and their size is also
determined by the Create command.
All extents allocated to an index exist as
long as the index exists.

Segments
There are four types of Segments:
Data segments: Each non-clustered
table and each cluster has a single
data segment.
Index segments: Each index has a
single index segment.
Temporary segments: Used by SQL
as a temporary work area.
Rollback segments: Used for
undoing transactions.

INDEXING
&
HASHING

Basic Concepts

Indexing mechanisms are used to speed up access
to desired data.
Search key: attribute or set of attributes used to
look up records in a file.
An index file consists of records (called index
entries) of the form

    search-key | pointer

Index files are typically much smaller than the
original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly
across buckets using a hash function.

Index Evaluation Metrics

Access time
Insertion time
Deletion time
Space overhead
Access types supported efficiently. E.g.,
records with a specified value in the attribute
or records with an attribute value falling in a
specified range of values.
This strongly influences the choice of index, and
depends on usage.

Ordered Indices
In an ordered index, index entries are stored sorted
on the search key value. E.g., author catalog in
library.
Primary index: in a sequentially ordered file,
the index whose search key specifies the sequential
order of the file.
Also called clustering index
The search key of a primary index is usually but not
necessarily the primary key.

Secondary index: an index whose search key
specifies an order different from the sequential order
of the file. Also called non-clustering index.
Index-sequential file: ordered sequential file with a
primary index.

Dense Index Files

Dense index: an index record appears
for every search-key value in the file.

Sparse Index Files

Sparse index: contains index records for only
some search-key values.
Only applicable when records are sequentially ordered
on the search key.
To locate a record with search-key value K we:
Find the index record with the largest search-key value < K
Search the file sequentially starting at the record to which
the index record points
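As a sketch, the sparse-index lookup above can be modeled in a few lines of Python. The blocks, keys, and use of `bisect` are assumptions made for illustration, not part of any particular DBMS:

```python
from bisect import bisect_right

# Hypothetical sparse index: one (first search-key, block number) entry per
# block of a file that is stored sequentially ordered on the search key.
index_keys   = ["Brighton", "Downtown", "Perryridge"]  # first key in each block
index_blocks = [0, 1, 2]

blocks = [                       # blocks of the sequential file
    ["Brighton", "Clearview"],
    ["Downtown", "Mianus"],
    ["Perryridge", "Redwood", "Round Hill"],
]

def sparse_lookup(key):
    """Find the index entry with the largest search-key value not exceeding
    the target, then scan the pointed-to block sequentially."""
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None              # key sorts before every indexed block
    for record in blocks[index_blocks[pos]]:
        if record == key:
            return record
    return None
```

Note how only one block is scanned per lookup; the index itself stays small because it holds one entry per block rather than one per record.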

Multilevel Index
If primary index does not fit in memory,
access becomes expensive.
Solution: treat primary index kept on disk as a
sequential file and construct a sparse index
on it.
outer index: a sparse index of the primary
index
inner index: the primary index file
If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on.
Indices at all levels must be updated on
insertion or deletion from the file.
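A minimal sketch of a two-level lookup (sparse outer index over the primary index) follows; the account-number keys and the block size of 3 inner entries are assumptions for the example:

```python
from bisect import bisect_right

# Hypothetical two-level index: the inner (primary) index is itself stored
# in fixed-size blocks; the outer index is sparse, holding the first
# search key of each inner-index block.
inner_index = [("A-101", 0), ("A-110", 1), ("A-201", 2), ("A-215", 3),
               ("A-217", 4), ("A-218", 5), ("A-222", 6), ("A-305", 7)]

BLOCK = 3  # inner-index entries per block (assumed for illustration)
inner_blocks = [inner_index[i:i + BLOCK] for i in range(0, len(inner_index), BLOCK)]
outer_index = [blk[0][0] for blk in inner_blocks]  # sparse outer index

def lookup(key):
    # Step 1: the sparse outer index narrows the search to one inner block.
    b = max(bisect_right(outer_index, key) - 1, 0)
    # Step 2: scan that inner-index block for the exact entry.
    for k, ptr in inner_blocks[b]:
        if k == key:
            return ptr
    return None
```

The same construction can be repeated: if `outer_index` itself grew too large, a further sparse index could be built over it.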

Multilevel Index (Cont.)

Index Update: Deletion

If the deleted record was the only record in the file with its
particular search-key value, the search key is deleted
from the index also.
Single-level index deletion:
Dense indices: deletion of the search key is similar to file record
deletion.
Sparse indices:
If an entry for the search key exists in the index, it is deleted by
replacing the entry in the index with the next search-key value in the
file (in search-key order).
If the next search-key value already has an index entry, the entry is
deleted instead of being replaced.
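The sparse-index deletion rule can be sketched as follows; this is a hypothetical in-memory model (one index entry per block, each block a sorted list of keys), not any DBMS's actual code:

```python
# Sketch of the sparse-index deletion rule (hypothetical in-memory model:
# one index entry per block, each block a sorted list of keys).
def delete_key(blocks, index, key):
    """blocks: list of sorted key lists; index: list of (key, block_no)."""
    for blk in blocks:
        if key in blk:
            blk.remove(key)
            break
    else:
        return                                 # key not in the file
    for i, (k, blkno) in enumerate(index):
        if k == key:                           # deleted key was an index entry
            if blocks[blkno]:
                nxt = blocks[blkno][0]         # next key in search-key order
                if any(e[0] == nxt for e in index):
                    del index[i]               # next key already has an entry
                else:
                    index[i] = (nxt, blkno)    # replace the entry
            else:
                del index[i]                   # block emptied: drop the entry
            break

# "a" is an index entry; after deletion, "b" replaces it in the index.
blocks = [["a", "b"], ["c", "d"]]
index = [("a", 0), ("c", 1)]
delete_key(blocks, index, "a")
```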

Index Update: Insertion

Single-level index insertion:
Perform a lookup using the search-key value appearing
in the record to be inserted.
Dense indices: if the search-key value does not
appear in the index, insert it.
Sparse indices: if the index stores an entry for each block
of the file, no change needs to be made to the index
unless a new block is created.
If a new block is created, the first search-key value appearing in
the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms
are simple extensions of the single-level algorithms.

Non-clustering Indices
Frequently, one wants to find all the records
whose values in a certain field satisfy some
condition, and the file is not ordered on the
field.
Example 1: In the account database stored
sequentially by account number, we may want to find
all accounts in a particular branch.
Example 2: As above, but where we want to find all
accounts with a specified balance or range of
balances.

We can have a non-clustering index with an
index record for each search-key value. The
index record points to a bucket that contains
pointers to all the actual records with that
particular search-key value.

Secondary Index on balance field of account

Index entries (balance values): 350, 400, 500, 600, 700, 750, 900
Each entry points to a bucket of pointers into the account file:

branch_name   account   balance
Brighton      A-217     750
Downtown      A-101     500
Downtown      A-110     600
Miami         A-215     700
Perryridge    A-102     400
Perryridge    A-201     900
Perryridge    A-218     700
Redwood       A-222     700
Round Hill    A-305     350

Clustering and Non-clustering

Non-clustering indices have to be dense.
Indices offer substantial benefits when searching
for records.
When a file is modified, every index on the file
must be updated. Updating indices imposes
overhead on database modification.
Sequential scan using a clustering index is efficient,
but a sequential scan using a non-clustering
index is expensive: each record access may
fetch a new block from disk.

Secondary Indices

Frequently, one wants to find all the records
whose values in a certain field (which is not the
search key of the primary index) satisfy some
condition.
Example 1: In the account relation stored
sequentially by account number, we may want to find
all accounts in a particular branch.
Example 2: As above, but where we want to find all
accounts with a specified balance or range of
balances.
We can have a secondary index with an index
record for each search-key value.

Secondary Indices Example

Secondary index on balance field of account

Index record points to a bucket that contains pointers to all the actual records with that
particular search-key value.
Secondary indices have to be dense, since the file is not sorted by the search key.
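The bucket scheme above can be sketched in Python. Record "pointers" are modeled as list positions, and the data mirrors the account example; both are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical account file (here listed by branch_name, as in the figure);
# record "pointers" are modeled as positions in this list.
account = [
    ("Brighton",   "A-217", 750),
    ("Downtown",   "A-101", 500),
    ("Downtown",   "A-110", 600),
    ("Miami",      "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood",    "A-222", 700),
    ("Round Hill", "A-305", 350),
]

# Dense secondary index on balance: every search-key value gets an index
# record whose bucket holds pointers to all matching records.
balance_index = defaultdict(list)
for ptr, (_, _, balance) in enumerate(account):
    balance_index[balance].append(ptr)

# All accounts with balance 700: three records in three different places.
hits = [account[p] for p in balance_index[700]]
```

The index must be dense (one entry per balance value that occurs) precisely because the file itself gives no help in locating a given balance.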

Primary and Secondary Indices

Indices offer substantial benefits when
searching for records, but updating indices
imposes overhead on database modification:
when a file is modified, every index on the file
must be updated.
Sequential scan using the primary index is efficient,
but a sequential scan using a secondary index
is expensive.
Each record access may fetch a new block from disk.
A block fetch requires about 5 to 10 milliseconds,
versus about 100 nanoseconds for a memory access.

Hashing

Static Hashing
A bucket is a unit of storage containing one or more
records (a bucket is typically a disk block).
In a hash file organization we obtain the bucket of a
record directly from its search-key value using a hash
function.
A hash function h is a function from the set of all search-key
values K to the set of all bucket addresses B.
The hash function is used to locate records for access,
insertion, as well as deletion.
Records with different search-key values may be
mapped to the same bucket; thus the entire bucket has to
be searched sequentially to locate a record.


Example of Hash File Organization

Hash file organization of account file, using
branch_name as key:
There are 10 buckets.
The binary representation of the ith character is
assumed to be the integer i.
The hash function returns the sum of the binary
representations of the characters modulo 10.
E.g. h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
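This toy hash function can be checked directly. The sketch below assumes the "binary representation of the ith character" means alphabet position (a = 1 through z = 26, case-insensitive, other characters such as spaces ignored), which reproduces the bucket numbers quoted above:

```python
# The slide's hash function: each letter is valued by its position in the
# alphabet (a = 1 .. z = 26, case-insensitive; other characters such as
# spaces are ignored), and the bucket is the sum modulo 10.
def h(branch_name, n_buckets=10):
    total = sum(ord(c) - ord('a') + 1 for c in branch_name.lower() if c.isalpha())
    return total % n_buckets

print(h("Perryridge"))   # 5
print(h("Round Hill"))   # 3
print(h("Brighton"))     # 3 -- collides with Round Hill: both names
                         #    land in the same bucket
```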

Hash Functions

The worst hash function maps all search-key values to the
same bucket; this makes access time proportional to the
number of search-key values in the file.
An ideal hash function is uniform, i.e., each bucket is
assigned the same number of search-key values from
the set of all possible values.
An ideal hash function is random, so each bucket will have
the same number of records assigned to it irrespective
of the actual distribution of search-key values in the file.
Typical hash functions perform computation on the
internal binary representation of the search key.
For example, for a string search key, the binary
representations of all the characters in the string could be
added and the sum modulo the number of buckets could be
returned.

Handling of Bucket Overflows

Bucket overflow can occur because of:
Insufficient buckets
Skew in distribution of records. This can
occur due to two reasons:
multiple records have the same search-key value
the chosen hash function produces a non-uniform
distribution of key values
Although the probability of bucket
overflow can be reduced, it cannot be
eliminated; it is handled by using
overflow buckets.

Handling of Bucket Overflows (Cont.)

Overflow chaining: the overflow buckets of a given
bucket are chained together in a linked list.
The above scheme is called closed hashing.
An alternative, called open hashing, which does not use
overflow buckets, is not suitable for database applications.
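Overflow chaining can be sketched as follows. The bucket capacity and the hash function are assumptions for illustration:

```python
# Minimal sketch of closed hashing: each bucket holds at most CAPACITY
# records; further records for the same bucket go to chained overflow
# buckets. CAPACITY and the hash function are assumed for illustration.
CAPACITY = 2

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None            # link to the next overflow bucket

class HashFile:
    def __init__(self, n_buckets, hash_fn):
        self.buckets = [Bucket() for _ in range(n_buckets)]
        self.h = hash_fn

    def insert(self, key, rec):
        b = self.buckets[self.h(key)]
        while len(b.records) >= CAPACITY:   # bucket full: follow the chain
            if b.overflow is None:
                b.overflow = Bucket()
            b = b.overflow
        b.records.append((key, rec))

    def lookup(self, key):
        b, out = self.buckets[self.h(key)], []
        while b is not None:                # scan the bucket and its chain
            out += [r for k, r in b.records if k == key]
            b = b.overflow
        return out

# Three records hashing to one bucket: the third lands in an overflow bucket.
hf = HashFile(4, lambda k: sum(map(ord, k)) % 4)
for rec in (1, 2, 3):
    hf.insert("x", rec)
```

A lookup must always traverse the whole chain, which is exactly why long overflow chains degrade static hashing's performance.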

Hash Indices

Hashing can be used not only for file organization,
but also for index-structure creation.
A hash index organizes the search keys, with their
associated record pointers, into a hash file structure.

Deficiencies of Static Hashing

In static hashing, function h maps search-key values to a
fixed set B of bucket addresses. Databases grow or
shrink with time.
If the initial number of buckets is too small, and the file grows,
performance will degrade due to too many overflows.
If space is allocated for anticipated growth, a significant amount
of space will be wasted initially (and buckets will be underfull).
If the database shrinks, again space will be wasted.
One solution: periodic re-organization of the file with a
new hash function
Expensive, disrupts normal operations
Better solution: allow the number of buckets to be
modified dynamically.

Dynamic Hashing

Good for a database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing: one form of dynamic hashing
The hash function generates values over a large range,
typically b-bit integers, with b = 32 (note that 2^32 is quite
large!)
At any time use only a prefix of the hash function to index
into a table of bucket addresses.
Let the length of the prefix be i bits, 0 <= i <= 32.
Bucket address table size = 2^i. Initially i = 0.
The value of i grows and shrinks as the size of the database grows
and shrinks.
Multiple entries in the bucket address table may point to
the same bucket. Thus, the actual number of buckets is < 2^i.
The number of buckets also changes dynamically due to
coalescing and splitting of buckets.

General Extendable Hash Structure

In this structure, i2 = i3 = i, whereas i1 = i - 1

Use of Extendable Hash Structure

Each bucket j stores a value ij; all the entries that point to
the same bucket have the same values on the first ij bits.
To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high-order bits of X as a displacement into
the bucket address table, and follow the pointer to the
appropriate bucket

To insert a record with search-key value Kj:
Follow the same procedure as look-up and locate the bucket, say
j.
If there is room in bucket j, insert the record in the bucket.
Else the bucket must be split and the insertion re-attempted
Overflow buckets are used instead in some cases

Insertion in Extendable Hash Structure (Cont.)

To split a bucket j when inserting a record with
search-key value Kj:

If i > ij (more than one pointer to bucket j):
allocate a new bucket z, and set ij = iz = (ij + 1)
update the second half of the bucket address table
entries originally pointing to j, to point to z
remove each record in bucket j and reinsert (in j or z)
recompute the new bucket for Kj and insert the record in that
bucket (further splitting is required if the bucket is still
full)

If i = ij (only one pointer to bucket j):
If i reaches some limit b, or too many splits have
happened in this insertion, create an overflow bucket
Else:
increment i and double the size of the bucket address table.
replace each entry in the table by two entries that point to the
same bucket.
recompute the new bucket address table entry for Kj.
Now i > ij, so use the first case above.
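The split-and-double rules above can be sketched as a toy in-memory implementation. The bucket size of 2 and the 8-bit hash are assumptions for illustration (real systems use 32-bit hash values and disk blocks), and the overflow-bucket escape hatch is omitted:

```python
HASH_BITS = 8          # toy hash width; real systems use 32-bit values
BUCKET_SIZE = 2        # records per bucket (a disk block in a real system)

def h(key):
    return sum(map(ord, str(key))) % (1 << HASH_BITS)

class Bucket:
    def __init__(self, local_depth):
        self.ij = local_depth      # entries pointing here share their first ij bits
        self.items = []

class ExtendableHash:
    def __init__(self):
        self.i = 0                             # global prefix length
        self.table = [Bucket(0)]               # bucket address table, size 2^i

    def _slot(self, key):
        return h(key) >> (HASH_BITS - self.i) if self.i else 0

    def lookup(self, key):
        return [r for k, r in self.table[self._slot(key)].items if k == key]

    def insert(self, key, rec):
        b = self.table[self._slot(key)]
        if len(b.items) < BUCKET_SIZE:
            b.items.append((key, rec))
            return
        if b.ij == self.i:                     # only one pointer to b:
            self.i += 1                        # double the bucket address table
            self.table = [bkt for bkt in self.table for _ in (0, 1)]
        z = Bucket(b.ij + 1)                   # split b into b and z
        b.ij += 1
        for s, bkt in enumerate(self.table):   # second half of b's pointers -> z
            if bkt is b and (s >> (self.i - b.ij)) & 1:
                self.table[s] = z
        pending, b.items = b.items + [(key, rec)], []
        for k, r in pending:                   # reinsert; may split again
            self.insert(k, r)

# Usage: insert a few records; table entries may share buckets (ij < i).
eh = ExtendableHash()
for k in "abcde":
    eh.insert(k, k.upper())
```

Because the toy hash clusters these keys onto a common prefix, the table doubles several times here, illustrating the slide's point that splitting can cascade when records share long hash prefixes.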

Deletion in Extendable Hash Structure

To delete a key value:
Locate it in its bucket and remove it.
The bucket itself can be removed if it becomes
empty (with appropriate updates to the bucket
address table).
Coalescing of buckets can be done (can coalesce
only with a "buddy" bucket having the same value of ij
and the same ij - 1 prefix, if it is present).
Decreasing bucket address table size is also possible
Note: decreasing bucket address table size is an expensive
operation and should be done only if the number of buckets
becomes much smaller than the size of the table.

Use of Extendable Hash Structure: Example

Initial Hash structure, bucket size = 2

Example (Cont.)

Hash structure after insertion of one Brighton and
two Downtown records

Example (Cont.)

Hash structure after insertion of Mianus record

Example (Cont.)

Hash structure after insertion of three Perryridge records

Example (Cont.)
Hash structure after insertion of
Redwood and Round Hill records

Extendable Hashing vs. Other Schemes

Benefits of extendable hashing:
Hash performance does not degrade with growth of file
Minimal space overhead
Disadvantages of extendable hashing:
Extra level of indirection to find the desired record
Bucket address table may itself become very big (larger
than memory)
Cannot allocate very large contiguous areas on disk either
Solution: B+-tree structure to locate the desired record in the
bucket address table
Changing the size of the bucket address table is an expensive
operation

Comparison of Ordered Indexing and Hashing

Cost of periodic re-organization
Relative frequency of insertions and deletions
Is it desirable to optimize average access time at the
expense of worst-case access time?
Expected type of queries:
Hashing is generally better at retrieving records having a
specified value of the key.
If range queries are common, ordered indices are to be preferred
Consider e.g. a query with where A >= v1 and A <= v2
In practice:
PostgreSQL supports hash indices, but discourages use due to
poor performance
Oracle supports static hash organization, but not hash indices
SQL Server supports only B+-trees

Index Definition in SQL Standard

Create an index:
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch_name)
Use create unique index to indirectly specify and
enforce the condition that the search key is a candidate
key.
Not really required if the SQL unique integrity constraint is
supported
To drop an index:
drop index <index-name>
Most database systems allow specification of the type of
index, and clustering.

Indexing in Oracle

Oracle supports B+-tree indices as the default
for the create index SQL command.
A new non-null attribute row-id is added to
all indices, so as to guarantee that all search
keys are unique.
Indices are supported on:
attributes and attribute lists,
results of functions over attributes,
or using structures external to Oracle (domain indices)
Bitmap indices are also supported, but for
that an explicit declaration is needed:
create bitmap index <index-name>
on <relation-name> (<attribute-list>)

Hashing in Oracle

Hash indices are not supported.
However, (limited) static hash file organization is supported for
partitions:
create table partition by hash(<attribute-list>)
partitions <N>
store in (<tablespace-list>)
Index files can also be partitioned using a hash function:
create index global partition by hash(<attribute-list>)
partitions <N>
This creates a global index partitioned by the hash function.
(Global) indexing over a hash-partitioned table is also possible.
Hashing may also be used to organize clusters in multi-table
clusters.
