You are on page 1of 18

HASHING

by David Toman University of Waterloo

LIST OF SLIDES

1-1

List of Slides
1 2 Other ways to Implement a Dictionary 3 Example 4 Considerations 5 Hash Functions 6 Collision Resolution 7 Separate Chaining 8 Example Data 9 Example 10 Coalesced Chaining 11 Example 12 Open Addressing 13 Example 14 Double Hashing 15 Example 16 Probe Sequences 17 Summary

Hashing: 2

Other ways to Implement a Dictionary

IDEA1: Store in an array indexed by the keys

really fast lookups/insertions/. . .


PROBLEM: VERY poor utilization of space

IDEA2: Store in an array indexed by results of applying a (hash) function on the keys

requirement:

where is the size of the hash table.

Data Types and Structures

CS 234 University of Waterloo

Hashing: 3

Example
Let be a hash function with range

Keys

Hash Function

Hash Table

. . .

: keys : values associated with the keys


CS 234 University of Waterloo

Data Types and Structures

Hashing: 4

Considerations

We need to solve two problems: 1. how do we design the hash function?

static case (we know all the keys in advance): perfect hash functions every key is mapped to a separate value the table size is number of keys dynamic case (we dont know the keys in advance)

mostly injective

2. what do we do if two or more records hash to the same place (a collision occurs)?

conict (collision) resolution

Data Types and Structures

CS 234 University of Waterloo

Hashing: 5

Hash Functions

Selection-based Hash Function Choose a part of the key to be the hash value

common choices: rst couple of bits xed pattern of bits problems with non-uniformly distributed keys
Division-based Hash Function Take the key modulo the table size

choose table size to be prime number


Folding-based Hash Function Take bit patterns and add them together

may need an adjustment to match the table size


Data Types and Structures CS 234 University of Waterloo

Hashing: 6

Collision Resolution

How good is a particular hashing scheme?

measured by load factor


load factor number of stored items size of table

the number of probes (accesses needed to locate a key) is compared against the load factor

load factor closer to 1 more collisions more probes

Data Types and Structures

CS 234 University of Waterloo

Hashing: 7

Separate Chaining

IDEA: buckets contain lists of elements

use to determine bucket retrieve by searching the bucket (a list!) insert by adding to the list

Advantages: exible size of buckets deletion is easy (how?) Disadvantages: worst case: linear search pointers take space

Data Types and Structures

CS 234 University of Waterloo

Hashing: 8

Example Data

Insert BEAR CAT COW DOG ELEPHANT FOX FROG GAZELLE HAMSTER HORSE JAGUAR KOALA LION RABBIT RAT SNAKE TIGER TOAD

into a 26-bucket Hash table based on a Hash function the 3rd letter of .

Data Types and Structures

CS 234 University of Waterloo

Hashing: 9

Example

TOAD RABBIT

SNAKE

KOALA

BEAR

ELEPHANT

DEER

TIGER

JAGUAR

DOG

HAMSTER


CS 234 University of Waterloo
CAT FROG

LION

HORSE

RAT

COW FOX

GAZELLE

Data Types and Structures

Hashing: 10

Coalesced Chaining

IDEA: Store overow in empty cells of the table and remember where you put it a pointer from the original bucket

Lookup : look at , if different from follow pointers till you nd it or till you nd a nil ( not found)

Insert : similar, if not found store in the next free cell update the last pointer

Problem: deletion is quite difcult.

Data Types and Structures

CS 234 University of Waterloo

Hashing: 11

Example
BEAR ELEPHANT JAGUAR KOALA DEER LION D H K J B

DOG RABBIT RAT SNAKE TIGER TOAD HAMSTER

FROG

HORSE

CAT

COW FOX

GAZELLE

Data Types and Structures

CS 234 University of Waterloo

Hashing: 12

Open Addressing
IDEA dont use pointers

use xed probe sequence is a hash function used for the -th probe should cover the whole table for varying
Most common version: Sequential (linear) probing:

mod

Problem: primary clustering

clogging parts of the table long chains of probes using different offset doesnt help
Another problem: deletion to insert TOAD we need 10 probes and may miss parts of the table one reason to pick prime

only possibility: mark as deleted.


Data Types and Structures CS 234 University of Waterloo

Hashing: 13

Example
BEAR KOALA RABBIT SNAKE DEER ELEPHANT

DOG JAGUAR TIGER TOAD

HAMSTER

FROG LION

HORSE

CAT RAT

COW FOX

GAZELLE

Data Types and Structures

CS 234 University of Waterloo

Hashing: 14

Double Hashing

IDEA: use 2nd hash function to determine the probe sequence for

mod

must be relatively prime to !

In our Example:

if the rst letter of

is

in alphabet.

Data Types and Structures

CS 234 University of Waterloo

Hashing: 15

Example
BEAR RABBIT

DEER



Data Types and Structures

DOG

RAT ELEPHANT TIGER KOALA HAMSTER

FROG

JAGUAR HORSE

CAT TOAD

COW FOX

GAZELLE LION

SNAKE

CS 234 University of Waterloo

Hashing: 16

Probe Sequences

Key BEAR CAT COW DOG ELEPHANT FOX FROG GAZELLE HAMSTER ...

Probes A,C,E,G,. . . T,W,Z,2,. . . W,Z,2,C,. . . G,K,O,S,. . . E,J,O T,. . . X,A,G,M. . . O,U,0,D,. . . Z,D,J,P,. . . M,U,2,H,. . .

Data Types and Structures

CS 234 University of Waterloo

Hashing: 17

Summary

fast array-like search

degenerates for high load factors (


easy implementation

no pointer overhead
many versions

extensible/linear hashing
the table grows/shrinks with inserts/deletes disk (block-based) versions

Data Types and Structures

CS 234 University of Waterloo

You might also like