Dynamic Hashing and Indexing

Dynamic Hashing
 Good for databases that grow and shrink in size

 Allows the hash function to be modified dynamically
 Extendible hashing: one form of dynamic hashing
 This hashing scheme takes advantage of the fact that the result of
applying a hashing function is a non-negative integer, which can be
represented as a binary number, i.e., a string of bits.
 A type of directory, i.e., an array of 2^d bucket addresses, is
maintained, where d is called the global depth of the directory.
The first d bits of a hash value are used as an index into this array.
 A local depth d’, stored with each bucket, specifies the number
of bits on which the bucket contents are based.
 The value of d grows and shrinks as the size of the database grows and
shrinks.
 Several directory entries may point to the same bucket, so the
actual number of buckets is ≤ 2^d.
 The number of buckets changes dynamically due to coalescing
and splitting of buckets.
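
The directory lookup described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the 5-bit hash width and the names HASH_BITS and directory_index are assumptions made for the example.

```python
HASH_BITS = 5  # assumed width of a hash value in bits (illustrative)


def directory_index(hash_value, global_depth, hash_bits=HASH_BITS):
    """Return the directory slot chosen by the first `global_depth` bits."""
    if not 0 <= global_depth <= hash_bits:
        raise ValueError("global depth must lie between 0 and hash_bits")
    if not 0 <= hash_value < 2 ** hash_bits:
        raise ValueError("hash value does not fit in hash_bits bits")
    # Dropping the low-order bits leaves the leading global_depth bits.
    return hash_value >> (hash_bits - global_depth)


h = 0b10100
for d in range(4):
    print(f"d={d}: directory size {2 ** d}, slot {directory_index(h, d)}")
```

For the hash value 10100 the selected slot is 1, 2 (binary 10) and 5 (binary 101) at global depths 1, 2 and 3; at depth 0 the directory has a single slot.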
Database System Concepts 12.1 ©Silberschatz, Korth and Sudarshan

Splitting and Coalescing of Buckets
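 Splitting of a bucket is done when an overflow occurs; the local
depth d’ of the two resulting buckets is one more than that of the
original bucket. If the overflowing bucket already had d’ = d, the
directory is first doubled and the value of d is incremented by one.
 For a bucket whose hash values start with 01, after splitting, the
first bucket contains records whose hash values start with 010 and the
other those whose hash values start with 011.
 Coalescing occurs when records are deleted. If d > d’ for all
buckets after a deletion, the directory is halved and the
value of d is decremented by one.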
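
The redistribution step of a split can be sketched as below. This is a hedged example: the function name split_bucket, the 5-bit hash width and the three sample hash values are illustrative assumptions, and the directory update is left out.

```python
def split_bucket(hash_values, local_depth, hash_bits=5):
    """Split one bucket on the bit that follows its current prefix.

    Returns the new local depth and the two lists of hash values.
    """
    new_depth = local_depth + 1
    if new_depth > hash_bits:
        raise ValueError("no hash bits left to split on")
    shift = hash_bits - new_depth  # position of the deciding bit
    zero_side = [h for h in hash_values if (h >> shift) & 1 == 0]
    one_side = [h for h in hash_values if (h >> shift) & 1 == 1]
    return new_depth, zero_side, one_side


# A bucket for prefix 01 (local depth 2) holding three sample hash values.
bucket_01 = [0b01011, 0b01110, 0b01001]
depth, prefix_010, prefix_011 = split_bucket(bucket_01, local_depth=2)
print(depth, [format(h, "05b") for h in prefix_010],
      [format(h, "05b") for h in prefix_011])
```

Here 01011 and 01001 go to the bucket for prefix 010, and 01110 goes to the bucket for prefix 011.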

Extendible Hashing - Example
Record   K      h(K)   h(K) in binary

rec1     2639   1      00001
rec2     3760   16     10000
rec3     4692   20     10100
rec4     4871   7      00111
rec5     5659   27     11011
rec6     1821   29     11101
rec7     1074   18     10010
rec8     2115   11     01011
rec9     1620   20     10100
rec10    2428   28     11100
rec11    3943   7      00111
rec12    4750   14     01110
rec13    6975   31     11111
rec14    4981   21     10101
rec15    9208   24     11000

[Figures: directory and bucket diagrams for the successive insertions, not reproduced here. Notation: d = global depth, d1 = local depth (written d’ in the text above). The diagrams mark bucket overflows and splits on inserting records 3, 5, 7, 8, 10 and 13. The global depth is d = 0 at the start, d = 1 after record 3, d = 2 after record 5, d = 3 after record 7, and d = 4 after record 13.]

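
The insertions in this example can be replayed with a small sketch. Several things here are assumptions rather than facts from the slides: a bucket capacity of two records (inferred from the diagrams), indexing on the leading bits of a 5-bit hash value, and the class and method names. Only the first thirteen hash values are inserted, up to the last overflow marked in the figures.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = []  # (record name, hash value) pairs


class ExtendibleHash:
    def __init__(self, hash_bits=5, capacity=2):  # capacity is an assumption
        self.hash_bits = hash_bits
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _slot(self, h):
        # The leading global_depth bits select the directory entry.
        return h >> (self.hash_bits - self.global_depth)

    def insert(self, name, h):
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.items) < self.capacity:
                bucket.items.append((name, h))
                return
            self._split(bucket)  # overflow: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.hash_bits:
            raise OverflowError("bucket full and no hash bits left")
        if bucket.local_depth == self.global_depth:
            # Double the directory: slot i becomes slots 2i and 2i + 1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        new_depth = bucket.local_depth + 1
        zero, one = Bucket(new_depth), Bucket(new_depth)
        shift = self.hash_bits - new_depth
        for name, h in bucket.items:
            (one if (h >> shift) & 1 else zero).items.append((name, h))
        # Repoint every directory entry that referred to the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                bit = (i >> (self.global_depth - new_depth)) & 1
                self.directory[i] = one if bit else zero


hashes = [1, 16, 20, 7, 27, 29, 18, 11, 20, 28, 7, 14, 31]  # rec1..rec13
table = ExtendibleHash()
for number, h in enumerate(hashes, start=1):
    table.insert(f"rec{number}", h)
print("global depth:", table.global_depth)
```

Under these assumptions the sketch reaches global depth 4 after rec13, in line with the d values shown in the figures.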
Advantages and Disadvantages
 Benefits of extendible hashing:
 Hash performance does not degrade with growth of file
 Minimal space overhead
 Disadvantages of extendible hashing:
 Extra level of indirection to find desired record
 Bucket address table may itself become very big (larger than
memory)
 A tree structure is then needed to locate the desired record in the
bucket address table
 Changing size of bucket address table is an expensive operation

 Linear hashing is an alternative mechanism that avoids these
disadvantages at the possible cost of more bucket overflows.
In particular, it does not need a directory.

Indexing

Indexing : Basic Concepts
 Indexing mechanisms are used to speed up access to desired
data.
 E.g., the catalog of a library.
 Search key: an attribute or set of attributes used to look up
records in a file.
 An index file consists of records (called index entries) of the
form
(search-key, pointer)
 Index files are typically much smaller than the original file
 Two basic kinds of indices:
 Ordered indices: search keys are stored in sorted order
 Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.

Index Evaluation Factors
Indexing techniques are evaluated on the basis of:
 Access types supported efficiently, e.g.,
 records with a specified value in the attribute, or
 records with an attribute value falling in a specified range of
values.
 Access time
 Insertion time
 Deletion time
 Space overhead: additional space occupied by an index
structure.

Ordered Indices
 In an ordered index, index entries are stored sorted on the
search key value. E.g., author catalog in library.
 Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
 Also called clustering index
 The search key of a primary index is usually but not necessarily the
primary key.
 Secondary index: an index whose search key specifies an
order different from the sequential order of the file. Also called
non-clustering index.
 Index-sequential file: ordered sequential file with a primary
index.

Dense Index Files
 Dense index — Index record appears for every search-key value
in the file.

Sparse Index Files
 Sparse Index: contains index records for only some search-key
values.
 Applicable when records are sequentially ordered on search-key
 To locate a record with search-key value K we:
 Find index record with largest search-key value ≤ K
 Search file sequentially starting at the record to which the index
record points
 Less space and less maintenance overhead for insertions and
deletions.
 Generally slower than dense index for locating records.
 Good tradeoff: sparse index with an index entry for every block
in file, corresponding to least search-key value in the block.
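
The sparse-index lookup described above can be sketched as follows, assuming one index entry per block that holds the least search-key value in the block. The sample blocks, the variable names and the lookup function are illustrative, not taken from the slides.

```python
from bisect import bisect_right

# A file of records sorted on the search key, stored in blocks (sample data).
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h")],
]
# Sparse index: the least search-key value of each block, in block order.
sparse_keys = [block[0][0] for block in blocks]


def lookup(key):
    """Return the value stored under key, or None if it is absent."""
    pos = bisect_right(sparse_keys, key) - 1  # largest index key <= key
    if pos < 0:
        return None  # key is smaller than every key in the file
    # Scan the file sequentially from the block the index entry points to.
    for block in blocks[pos:]:
        for k, value in block:
            if k == key:
                return value
            if k > key:
                return None  # passed the place where key would be
    return None


print(lookup(50), lookup(55), lookup(80))
```

Only the three index keys are searched before the sequential scan starts, which is the space and maintenance saving the slide refers to.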

Example of Sparse Index Files

Multilevel Index
 If primary index does not fit in memory, access becomes
expensive.
 To reduce number of disk accesses to index records, treat
primary index kept on disk as a sequential file and construct a
sparse index on it.
 outer index – a sparse index of primary index
 inner index – the primary index file
 If even outer index is too large to fit in main memory, yet another
level of index can be created, and so on.
 Indices at all levels must be updated on insertion or deletion
from the file.
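
A two-level lookup can be sketched as below. This is an illustration under assumptions: the inner index is held as a list of index blocks, each entry pairs a search key with a data-block number, and all names and values are made up for the example.

```python
from bisect import bisect_right

# Inner index: (search key, data block number) entries, stored in index blocks.
inner_blocks = [
    [(10, 0), (40, 1)],
    [(70, 2), (100, 3)],
    [(130, 4)],
]
# Outer index: sparse index holding the first key of each inner index block.
outer_keys = [block[0][0] for block in inner_blocks]


def find_data_block(key):
    """Return the data block that may hold key, or None if key is too small."""
    o = bisect_right(outer_keys, key) - 1  # step 1: search the outer index
    if o < 0:
        return None
    inner = inner_blocks[o]                # step 2: read one inner index block
    inner_keys = [k for k, _ in inner]
    i = bisect_right(inner_keys, key) - 1  # largest inner key <= key
    return inner[i][1]


print(find_data_block(85), find_data_block(40), find_data_block(5))
```

Each lookup reads the outer index and then exactly one inner index block, rather than searching the whole inner index.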

Multilevel Index (Cont.)

Index Update: Insertion
 Single-level index insertion:
 Perform a lookup using the search-key value appearing in the record
to be inserted.
 Dense indices – if the search-key value does not appear in the
index, insert it.
 Sparse indices – if index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is
created. In this case, the first search-key value appearing in the
new block is inserted into the index.
 Multilevel insertion (as well as deletion) algorithms are simple
extensions of the single-level algorithms
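
The single-level insertion rules can be sketched as follows. The block capacity, the sample keys and the function names are assumptions for illustration; records are reduced to their search keys, and a full block is split in half, which is one possible way of creating the new block.

```python
from bisect import bisect_left, bisect_right, insort

BLOCK_CAPACITY = 3  # assumed number of records per block


def dense_insert(index_keys, key):
    """Dense index: add the key only if it does not appear yet."""
    pos = bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        index_keys.insert(pos, key)


def sparse_insert(blocks, index_keys, key):
    """Sparse index with one entry (the least key) per block of the file."""
    pos = max(bisect_right(index_keys, key) - 1, 0)
    block = blocks[pos]
    insort(block, key)
    if len(block) > BLOCK_CAPACITY:
        # A new block is created, so its first key enters the index.
        mid = len(block) // 2
        new_block = block[mid:]
        del block[mid:]
        blocks.insert(pos + 1, new_block)
        index_keys.insert(pos + 1, new_block[0])
    # Keeps the entry equal to the block's least key; this only changes
    # the index when the new key is the smallest in the file.
    index_keys[pos] = block[0]


blocks = [[10, 20, 30], [40, 50]]
sparse_index = [10, 40]
sparse_insert(blocks, sparse_index, 45)  # fits: index unchanged
sparse_insert(blocks, sparse_index, 25)  # overflow: new block, new entry
print(blocks, sparse_index)
```

The first insertion leaves the sparse index untouched; the second creates a block starting at 25 and adds that key to the index.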

Index Update: Deletion
 If deleted record was the only record in the file with its particular
search-key value, the search-key is deleted from the index also.
 Single-level index deletion:
 Dense indices – deletion of search-key is similar to file record
deletion.
 Sparse indices – if an entry for the search key exists in the index, it
is deleted by replacing the entry in the index with the next
search-key value in the file (in search-key order). If the next search-key
value already has an index entry, the entry is deleted instead of
being replaced.
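
The sparse-index deletion rule can be sketched as below. It assumes each search-key value occurs once in the file, so deleting the record removes the value. The function name and sample keys are illustrative, and dropping the entry when no next key exists is an assumption the slide does not cover.

```python
from bisect import bisect_left


def sparse_delete(index_keys, file_keys, key):
    """Delete key from the sorted file keys and fix up the sparse index."""
    f = bisect_left(file_keys, key)
    if f == len(file_keys) or file_keys[f] != key:
        return  # key is not in the file
    next_key = file_keys[f + 1] if f + 1 < len(file_keys) else None
    del file_keys[f]
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        if next_key is None or next_key in index_keys:
            del index_keys[i]          # next key already indexed (or none)
        else:
            index_keys[i] = next_key   # replace entry with next key in file


file_keys = [10, 20, 30, 40, 50]
index_keys = [10, 30, 50]
sparse_delete(index_keys, file_keys, 30)  # entry 30 is replaced by 40
print(index_keys)
sparse_delete(index_keys, file_keys, 40)  # next key 50 is indexed: drop entry
print(index_keys)
```

Deleting 20 afterwards would leave the index as it is, since 20 has no index entry.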

Secondary Indices
 Frequently, one wants to find all the records whose
values in a certain field (which is not the search key of
the primary index) satisfy some condition.
 Example 1: In the account database stored sequentially
by account number, we may want to find all accounts in a
particular branch
 Example 2: as above, but where we want to find all
accounts with a specified balance or range of balances
 We can have a secondary index with an index record
for each search-key value; the index record points to a
bucket that contains pointers to all the actual records
with that particular search-key value.
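
A secondary index with pointer buckets can be sketched as follows. The account rows are illustrative sample data, a list position stands in for a record pointer, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# File stored sequentially by account number (sample data).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-222", "Redwood", 700),
]


def build_secondary_index(records, field):
    """Map each value of the field to a bucket of record pointers."""
    buckets = defaultdict(list)
    for pointer, record in enumerate(records):
        buckets[record[field]].append(pointer)
    return dict(sorted(buckets.items()))  # index entries in search-key order


def find_range(records, index, low, high):
    """Return all records whose indexed value lies in [low, high]."""
    return [records[p]
            for key, bucket in index.items() if low <= key <= high
            for p in bucket]


balance_index = build_secondary_index(accounts, field=2)
print(balance_index[700])
print(find_range(accounts, balance_index, 600, 700))
```

The bucket for balance 700 holds two pointers because the file is ordered by account number, so records with equal balances are not adjacent.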

Secondary Index on balance field of account

That’s all about Indices……
THANK YOU.
