Chapter 6: Physical Database Design and Performance
Outline
The physical database design process
  Input and output
Indexes and their appropriate use
  Know what columns to index
Know different RAID levels and how to choose RAID levels

The Physical Design Stage of SDLC (Figures 2-4, 2-5 revisited)

SDLC phases: Project Identification and Selection, Project Initiation and Planning, Analysis, Logical Design, Physical Design, Implementation, Maintenance
Physical Design phase
  Purpose: develop technology specs
  Deliverable: program/data structures, technology purchases, organization redesigns
  Database activity: physical database design

Physical Database Design
Purpose: translate the logical description of data into the technical specifications for storing and retrieving data
Goal: create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
Input
  Logical design (normalized relations)
  Statistics about data
    Number of rows
    Number of distinct values for each attribute, ranges
  Information about usage (called workload)
    Ideally, the set of queries (including insert, update, delete, select) and their frequencies
  Requirement for performance
    E.g., average response time, number of queries per minute
Output
  Attribute data types
  Indexes
  Storage (RAID level)
  Denormalization (merge some tables, such as lookup tables)
  Others (not covered)
    Partitioning database into smaller pieces

Physical Design Process
Inputs
  Normalized relations
  Volume estimates
  Attribute definitions
  Response time expectations
  Data security needs
  Backup/recovery needs
  Integrity expectations
  DBMS technology used
Leads to
Decisions
  Attribute data types
  Physical record descriptions (doesn't always match logical design)
  File organizations
  Indexes and database architectures
  Query optimization

Figure 6-1 Composite usage map (Pine Valley Furniture Company)
[Figure not reproduced: the data model annotated with data volumes and access frequencies (per hour)]
Usage analysis, starting from purchased parts:
  140 purchased parts accessed per hour
  80 quotations accessed from these 140 purchased part accesses
  70 suppliers accessed from these 80 quotation accesses
Usage analysis, starting from suppliers:
  75 suppliers accessed per hour
  40 quotations accessed from these 75 supplier accesses
  40 purchased parts accessed from these 40 quotation accesses

Designing Fields
Field: smallest unit of data in a database
Field design
  Choosing data type
  Coding, compression, encryption
  Controlling data integrity

Figure 6-2 Example code look-up table (Pine Valley Furniture Company)
Code saves space, but costs an additional lookup to obtain the actual value

Motivation
Suppose the following query runs very slowly. What can you do to make it faster without buying new hardware?
Return the product ID and quantity of each product in the order with oid = 2:
  Select pid, quantity From order_line Where oid = 2;
If order_line has 10,000 rows and no index, the DBMS has to go through all of them to check whether oid = 2.
We can create an index on order_line.oid. The index is used to find the order_line rows with oid = 2, and only those rows are accessed.

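The effect described above can be observed on a small scale. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) instead of the Oracle system used elsewhere in these slides; the generated rows and the index name idx_orderline_oid are made up for illustration. It prints the query plan before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key is declared on purpose, so that no index exists at first.
conn.execute("CREATE TABLE order_line (oid INTEGER, pid INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(oid, pid, 1) for oid in range(1, 2501) for pid in range(1, 5)],
)  # 2,500 orders x 4 products = 10,000 made-up rows

QUERY = "SELECT pid, quantity FROM order_line WHERE oid = 2"


def plan(sql):
    """Return the 'detail' column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


plan_before = plan(QUERY)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orderline_oid ON order_line(oid)")
plan_after = plan(QUERY)   # with the index: a search that uses the index
result = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, and other DBMSs have their own tools for showing plans, so treat the printed strings as indicative only.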
Basic Concepts
An index is a data structure that speeds up access to a table (so each index is tied to one table)
Can you create an index on two tables?
Components of an index
  Search key
  Index entries
  Index file
Types of indexes
  Primary and secondary

Benefits and Cost of Indexes
Benefits:
  Speed up queries
Cost:
  Whenever data gets updated, the index also needs to be updated
  The index needs disk space as well

Basic Concepts
Search key: attribute or set of attributes used to look up records in a file
  Select pid, quantity From order_line Where oid = 2;
We can create an index on order_line.oid
What is the search key of this index?

Basic Concepts
An index file consists of index entries of the form
  (search-key, pointer)
The pointer points either to a data record or to another index entry (we will talk about this later)
Index files are typically much smaller than the original file because the other columns are not stored

Primary Index
Each table may have one primary index: an index whose entries are stored sorted on the search key value, with the rows of the table stored in the same order
E.g., create an index on orders.oid. If this index and the orders table are both sorted on oid, this index is the primary index of orders
  Table (oid, ...): (1, ...), (2, ...), ...
  Index: (1, ...), (2, ...), ...
How many primary indexes can a table have? Only one, since the table can be sorted in only one way

Secondary Index
Secondary index: an index whose entries are not sorted in the same order as the rows in the table
E.g., suppose we create an index on order_line.pid
  Table (oid, pid, ...): (1, 3, ...), (2, 5, ...), (3, 2, ...), ..., (1000, 1, ...)
  Index (pid, pointer): (1, ...), (2, ...), (3, ...), ...
  The index entry for pid = 1 points to the row with oid = 1000, the entry for pid = 2 to the row with oid = 3, and the entry for pid = 3 to the row with oid = 1
How many secondary indexes can a table have? As many as you want. But a secondary index is less efficient than a primary index because the data is not sorted in the same order as the index (so the disk has to move back and forth if many rows are retrieved)

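To make the (search-key, pointer) idea concrete, here is a small in-memory sketch of a secondary index on pid. It is only an illustration: the rows and quantities are made up, the "pointer" is simply a position in a Python list, and a real DBMS stores both structures on disk.

```python
from bisect import bisect_left, bisect_right

# Table rows (oid, pid, quantity), stored sorted on oid (made-up values).
table = [(1, 3, 10), (2, 5, 4), (3, 2, 7), (4, 3, 1), (5, 1, 2)]

# Secondary index on pid: entries (search key, pointer), sorted on the key.
# The pointer is the position of the row in the table list.
index = sorted((row[1], pos) for pos, row in enumerate(table))
keys = [key for key, _ in index]


def lookup(pid):
    """Return all rows with the given pid, using binary search on the index."""
    lo, hi = bisect_left(keys, pid), bisect_right(keys, pid)
    return [table[pos] for _, pos in index[lo:hi]]


print(index)      # sorted on pid, so pointers jump around the table
print(lookup(3))  # rows whose pid is 3
```

Because the index is sorted on pid while the table is sorted on oid, consecutive index entries point to scattered table positions, which is the "move back and forth" cost mentioned above.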
B+-tree index
The most commonly used type of index
  Q1: Select * from order_line where oid >= 1234 and oid <= 2000;
  Q2: Select pid, quantity From order_line Where oid = 2;
We can create a B+-tree index on order_line.oid to evaluate the range condition in Q1 and the equality condition in Q2
A B+-tree index can answer both equality conditions and range conditions very efficiently
It has a tree structure in which the leaves point to the real data and are sorted on the indexed columns
Intermediate levels help locate the leaves that satisfy a range or equality condition

Figure 6-7b B-tree index
Uses a tree search
Leaves of the tree are all at the same level, which gives consistent access time
Average time to find the desired record is proportional to the depth of the tree

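Two properties above can be illustrated with a short sketch: the depth of the tree grows very slowly with the number of keys, and sorted leaves make range conditions cheap. This is a simplification, not a B+-tree implementation. It assumes every node holds up to fanout entries (the value 100 is an arbitrary assumption) and it models only the sorted leaf level, using binary search in place of the tree descent.

```python
from bisect import bisect_left, bisect_right


def btree_levels(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Leaf level of an index on oid: search keys kept in sorted order.
leaf_keys = list(range(1, 10001))  # oid values 1..10000


def range_lookup(low, high):
    """Keys k with low <= k <= high, located with two binary searches."""
    return leaf_keys[bisect_left(leaf_keys, low):bisect_right(leaf_keys, high)]


print(btree_levels(10_000, 100))          # levels needed for 10,000 keys
print(len(range_lookup(1234, 2000)))      # Q1-style range condition
print(range_lookup(2, 2))                 # Q2-style equality condition
```

Under these assumptions, going from 10,000 to 1,000,000 keys adds only one level, which is why lookup time stays nearly constant as the table grows.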
Index Definition in SQL
Create an index
  create index <index-name> on <relation-name> (<attribute-list>)
  Use commas to separate the attributes if there is more than one
  E.g., create index idx_orderline_pid on order_line(pid);
Drop an index
  drop index <index-name>
  E.g., drop index idx_orderline_pid;
Oracle (like DB2 and SQL Server) uses B+-tree indexes by default
There is no need to create an index on the primary key because one is created automatically when the primary key is defined
Check whether an index exists (Oracle)
  select * from user_indexes;
  Many indexes with odd names (starting with SYS) are system-created indexes for primary keys
Check details of an index (Oracle)
  select * from user_ind_columns;

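The catalog views user_indexes and user_ind_columns are specific to Oracle. As a rough analogue that can be run anywhere, the sketch below uses SQLite through Python's sqlite3 module, where PRAGMA index_list and PRAGMA index_info play a similar role. The table definition (including the composite primary key) is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line (oid INTEGER, pid INTEGER, quantity INTEGER, "
    "PRIMARY KEY (oid, pid))"
)
conn.execute("CREATE INDEX idx_orderline_pid ON order_line(pid)")


def index_names():
    """Names of all indexes on order_line (column 1 of PRAGMA index_list)."""
    return sorted(row[1] for row in conn.execute("PRAGMA index_list(order_line)"))


names_before = index_names()

# Columns covered by one index (column 2 of PRAGMA index_info is the column name).
pid_index_columns = [
    row[2] for row in conn.execute("PRAGMA index_info(idx_orderline_pid)")
]

conn.execute("DROP INDEX idx_orderline_pid")
names_after = index_names()

print(names_before)
print(pid_index_columns)
print(names_after)
```

The index that remains after the drop was created by the system for the primary key, which parallels the system-named primary key indexes that show up in Oracle's user_indexes.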
Rules of Thumb of Index Creation
Indexes are not needed for toy data sets (no more than 100 rows)
Indexes are always needed for any serious application
Whether an index is used in a SQL query is still decided by the DBMS
An index incurs update overhead and needs extra space


Create an index on columns that appear in the where clause
  The index will be used to evaluate the where condition efficiently
Create a multi-column index if there are multiple conditions on the same table and at least one of them is an equality condition
  The column order in a multi-column index starts with the columns that have equality conditions
  E.g., select pid from order_line where pid = 2 and quantity > 1;
  Create an index on order_line(pid, quantity)
  An index on order_line(quantity, pid) is not useful here, because its leading column has only a range condition

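The column-order rule can be checked against a query plan. The sketch below again assumes SQLite via Python's sqlite3 module rather than Oracle; the rows and the index name idx_pid_qty are made up. With the equality column first, the plan shows both conditions being evaluated inside the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_line (oid INTEGER, pid INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(oid, oid % 50, oid % 7) for oid in range(1, 1001)],  # made-up rows
)

# Equality column (pid) first, range column (quantity) second.
conn.execute("CREATE INDEX idx_pid_qty ON order_line(pid, quantity)")

QUERY = "SELECT pid FROM order_line WHERE pid = 2 AND quantity > 1"
plan = "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
rows = conn.execute(QUERY).fetchall()

print(plan)       # the plan names idx_pid_qty and both conditions
print(len(rows))  # number of matching rows
```

A useful experiment is to replace the index with one on (quantity, pid) and compare the plan; what the optimizer does in that case depends on the DBMS and its statistics, so no particular output is claimed here.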
Create an index on columns that appear in a join condition (or on foreign key columns)
  E.g., select * from product p, order_line L where p.pid = L.pid
  Create an index on order_line(pid)
  The index will be used to evaluate the join efficiently
  Do you need to create an index on product.pid?


Create an index on group by and order by columns
  E.g., select sum(quantity) from order_line group by pid;
  Create an index on order_line(pid)
  The index will be used to evaluate the group by more efficiently


Create an index on all columns that appear in the SQL statement if the table has many columns that do not appear in it (a covering index)
  E.g., select pid from order_line;
  Create an index on order_line(pid)
  The query can then be answered from the index alone, without reading the other columns of the table

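The last rule can also be observed in a query plan. This is a minimal sketch that assumes SQLite via Python's sqlite3 module; the extra columns unit_price and note are hypothetical and exist only to make the table wider than the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# unit_price and note are hypothetical columns that the query never touches.
conn.execute(
    "CREATE TABLE order_line (oid INTEGER, pid INTEGER, quantity INTEGER, "
    "unit_price REAL, note TEXT)"
)
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?, ?, ?)",
    [(oid, oid % 50, 1, 9.99, "made-up row") for oid in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_orderline_pid ON order_line(pid)")

QUERY = "SELECT pid FROM order_line"
plan = "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
row_count = len(conn.execute(QUERY).fetchall())

print(plan)       # the plan reports a covering index for this query
print(row_count)
```

SQLite reports this as a covering index scan: every column the query needs is in the index, so the wider table rows are never read.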
Types of Storage Media
Storage hierarchy, from top to bottom: CPU, cache, memory, hard disks
  From top to bottom, price goes down, capacity goes up, and speed goes down
Only the hard disk is non-volatile (meaning its contents survive power-off)
The database is stored on hard disks

RAID
Motivation
  Suppose your application needs a very high read/write rate (e.g., 100 customers visit your website per minute). Buy a faster disk? Or are there better solutions?
  Your application also needs to run 24/7, but what if your disk fails? Is backup the solution?

Redundant Arrays of Independent Disks
RAID solution: buy more disks rather than faster disks
  Speed up reads/writes by reading/writing multiple disks at the same time
  Improve reliability by keeping copies of data at run time (so if one disk fails, the program can still run)
  Cost factor: multiple cheap disks are often cheaper than a single very fast disk

RAID
A set of disk drives that appears to the user as a single disk drive (you only see one disk in your OS)
Data blocks (pages) are arranged in stripes so that data is spread across multiple disks
There are different ways of striping, represented as different RAID levels
We assume data blocks (sectors) are numbered in order: 1, 2, 3, 4, ...

RAID-0
All disks are used for parallel reads and writes
Best I/O performance
No redundant copy, so data may be lost if a disk fails

RAID Levels
RAID Level 1: Mirrored disks
  Keep an identical copy of each disk, which improves reliability
  If each disk has a 1% chance of failing in a given month, what is the chance that both disks fail in the same month?
  Example: blocks 1 2 3 4 on one disk and the same blocks 1 2 3 4 on its mirror

RAID Levels
RAID Level 0+1: when the number of disks >= 4
  Two copies of the data, half of the disks for each copy (mirroring)
  Within each copy, spread the data across all of its disks (striping)
  Benefits of both RAID 0 (parallel read/write) and RAID 1 (reliability)
  But the cost may be high (a minimum of 4 disks, and half of the space goes to the redundant copy)
  Example with 4 disks: one copy stripes blocks 1, 3 and blocks 2, 4 over two disks; the other two disks hold the mirror copy

Parity Code for Error Detection
For a block of data, count the number of bits with value 1
  If the number of 1 bits is odd, parity code = 1
  Otherwise, parity code = 0
E.g., Data 1101: parity = 1; Data 1100: parity = 0
So if one bit gets corrupted, the parity and the data no longer match
E.g., Correct data 1101: parity = 1; Corrupted data 1100: stored parity = 1, but the number of 1 bits is even

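The parity rule above fits in a few lines of code. This is a small sketch in which bit strings such as "1101" stand in for data blocks; the function names are made up for the example.

```python
def parity_bit(bits):
    """Parity as defined above: 1 if the number of 1 bits is odd, else 0."""
    return bits.count("1") % 2


def is_consistent(bits, stored_parity):
    """True if the data still matches the parity stored alongside it."""
    return parity_bit(bits) == stored_parity


print(parity_bit("1101"))        # three 1 bits
print(parity_bit("1100"))        # two 1 bits
print(is_consistent("1101", 1))  # correct data
print(is_consistent("1100", 1))  # one bit corrupted
```

Note that a single parity bit detects any one-bit error but cannot tell which bit changed, and it misses errors that flip two bits at once.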
RAID Levels (Cont.)
RAID Level 5: compute one parity block from the corresponding data blocks on the other disks (one parity bit for each bit position)
  Disk 1: Data   Disk 2: Data   Disk 3: Data   Disk 4: Parity

RAID Levels (Cont.)
RAID Level 5: each disk can detect whether it is functioning correctly
  Disk 1: Data   Disk 2: Data   Disk 3: Data   Disk 4: Parity
What should the data on disk 3 be?
If only one disk fails, we can recover the data on that disk using the data on the other disks and the parity. In the slide's example (bit values not reproduced here), the data must be 0.

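The recovery step can be sketched with XOR, which gives the same result as the odd/even parity rule applied to each bit position. The block contents below are made up, and real RAID controllers work on much larger blocks, so this is only an illustration of the arithmetic.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise XOR of equally sized byte blocks, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up data blocks on three data disks.
data = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04"]
parity = xor_blocks(data)  # stored on the parity disk

# Simulate losing disk 3: rebuild its block from the survivors and the parity.
recovered = xor_blocks([data[0], data[1], parity])

print(parity.hex())
print(recovered == data[2])
```

The same call rebuilds any single lost block, including the parity block itself, because XOR-ing all blocks except one always yields the missing one.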
RAID Levels (Cont.)
RAID Level 5: striping data and parity blocks
E.g., 4 disks, 12 data blocks, 4 parity blocks
Tip: first put the parity blocks on different disks (in a different row for each disk), then put the data blocks into the remaining slots
  Disk 1  Disk 2  Disk 3  Disk 4
  P1      1       2       3
  4       P2      5       6
  7       8       P3      9
  10      11      12      P4
So if only one disk fails, all data stored on that disk can be recovered
What is the size of the copy (parity) given n disks?

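The placement tip can be written as a short routine. This sketch assumes the parity block of row r goes on disk r (wrapping around when there are more rows than disks), which reproduces the 4-disk example from the slide; actual RAID 5 implementations may rotate parity in a different direction.

```python
def raid5_layout(num_disks, num_rows):
    """Return one list of block labels per row (stripe).

    The parity block of row r is placed on disk r % num_disks; data blocks
    fill the remaining slots in order.
    """
    layout, next_block = [], 1
    for row in range(num_rows):
        labels = []
        for disk in range(num_disks):
            if disk == row % num_disks:
                labels.append(f"P{row + 1}")
            else:
                labels.append(str(next_block))
                next_block += 1
        layout.append(labels)
    return layout


for labels in raid5_layout(4, 4):
    print("  ".join(f"{label:>3}" for label in labels))
```

Each row holds num_disks - 1 data blocks and one parity block, so the parity occupies the equivalent of one disk in total.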
Choice of RAID Level
Consider levels 0, 0+1, and 5 (level 1 can be seen as level 0+1 for 2 disks)
Read performance: about the same for levels 0, 0+1, and 5 (all disks can be read at the same time)
Write performance: levels 0 and 0+1 are the best; level 5 is worse because the parity block also needs to be read and updated whenever data is updated
Fault tolerance: level 0 is not fault tolerant; all the others are
Space used for fault tolerance:
  Level 0: wastes no space but is NOT fault tolerant
  Level 0+1: half of the disks (bad)
  Level 5: 1 disk, no matter the total number of disks (good)

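The space comparison can be turned into a small calculation. This is a sketch under the assumptions stated in the slides (all disks are the same size, level 0+1 needs an even number of at least 4 disks); the minimum of 3 disks for level 5 is the usual requirement and is added here as an assumption.

```python
def usable_disks(level, n):
    """Number of disks' worth of space left for data with n equal disks."""
    if level == "0":
        return n                  # no redundancy at all
    if level == "0+1":
        if n < 4 or n % 2 != 0:
            raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
        return n // 2             # half of the disks hold the mirror copy
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return n - 1              # one disk's worth of parity
    raise ValueError(f"unknown RAID level: {level}")


for level in ("0", "0+1", "5"):
    print(level, usable_disks(level, 8))
```

With 8 disks, level 5 gives up one disk to redundancy while level 0+1 gives up four, which is the trade-off against level 5's slower writes.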
Denormalization
Transforming normalized relations into unnormalized physical record specifications
Benefits:
  Can improve performance (speed) by reducing the number of table lookups (i.e., reducing the number of joins needed)
Costs (due to data duplication):
  Wasted storage space
  Data integrity/consistency threats
Common denormalization opportunities
  One-to-one relationship (Fig. 6-3)
  Many-to-many relationship with attributes (Fig. 6-4)
  Reference data (1:N relationship where the 1-side has data not used in any other relationship) (Fig. 6-5)

Figure 6-3 A possible denormalization situation: two entities with one-to-one relationship

Figure 6-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes
  Extra table access required
  Null description possible

Figure 6-5 A possible denormalization situation: reference data
  Extra table access required
  Data duplication
