Databases
CS263 Lecture 16
LECTURE PLAN
Parallel DBMS - What and Why?
What is a Client/Server DBMS?
Why do we need Distributed DBMSs?
Date's rules for a Distributed DBMS
Benefits of a Distributed DBMS
Issues associated with a Distributed DBMS
Disadvantages of a Distributed DBMS
PARALLEL DATABASE SYSTEM
PARALLEL DBMSs
WHY DO WE NEED THEM?
More and More Data!
We have databases that hold very large amounts of
data, on the order of 10^12 bytes:
1,000,000,000,000 bytes!
Faster and Faster Access!
We have data applications that need to process
data at very high speeds:
10,000s transactions per second!
SINGLE-PROCESSOR DBMSs AREN'T UP TO THE JOB!
BENEFITS OF A PARALLEL DBMS
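INTERQUERY PARALLELISM
Different queries run at the same time on different processors.
Improves Throughput.
INTRAQUERY PARALLELISM
The operations of a single query run in parallel on several processors.
Speed-Up.
Scale-up.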
Figure (speed-up): transactions per second against Number of CPUs. Marked rates: 1000/Sec, 1600/Sec and 2000/Sec; the curve is labelled "Sub-linear speed-up".
SCALE-UP
Figure (scale-up): Number of transactions/second for 5 CPUs with a 1 GB Database compared with 10 CPUs with a 2 GB Database.
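The two measures can be put into numbers with a small sketch. The rates 1000/Sec, 1600/Sec and 2000/Sec come from the speed-up graph; reading them as "the CPU count was doubled, ideal is 2000/Sec, observed is 1600/Sec" is an assumption, and the scale-up rates are hypothetical because the slide gives none. The function names are illustrative only.

```python
def speed_up(baseline_tps: float, parallel_tps: float) -> float:
    """Throughput after adding CPUs divided by throughput before (same database)."""
    return parallel_tps / baseline_tps


def scale_up(small_tps: float, large_tps: float) -> float:
    """Throughput ratio when CPUs and database size grow together.

    A value of 1.0 means linear scale-up: the bigger system handles the
    bigger database at the same rate as before.
    """
    return large_tps / small_tps


# Rates read off the speed-up graph. ASSUMPTION: the CPU count was doubled.
baseline_tps = 1000.0   # rate before adding CPUs
ideal_tps = 2000.0      # what linear speed-up would give
observed_tps = 1600.0   # the sub-linear point on the graph

cpu_factor = ideal_tps / baseline_tps          # 2.0 under the assumption above
achieved = speed_up(baseline_tps, observed_tps)
efficiency = achieved / cpu_factor

# HYPOTHETICAL rates for the scale-up case (not taken from the slides).
tps_5cpu_1gb = 1000.0
tps_10cpu_2gb = 1000.0
ratio = scale_up(tps_5cpu_1gb, tps_10cpu_2gb)

print(f"speed-up {achieved:.2f} of an ideal {cpu_factor:.2f} ({efficiency:.0%} efficient)")
print(f"scale-up ratio {ratio:.2f}")
```

Under these assumptions the speed-up is sub-linear because the achieved factor is below the factor by which the CPUs grew. A scale-up ratio below 1.0 would likewise indicate sub-linear scale-up.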
Figure: several CPUs sharing a single MEMORY (a shared-memory arrangement).
Shared Disk Parallel Database Architecture
Figure: several CPUs, each with its own memory (M); the disks are shared by all of them.
Shared Nothing Parallel Database Architecture
Figure: several CPUs, each with its own memory (M) and its own disk; nothing is shared.
MAINFRAME DATABASE SYSTEM
Figure: DUMB TERMINALS linked to a MAINFRAME COMPUTER over a SPECIALISED NETWORK CONNECTION.
CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)
Figure: CLIENT #1, CLIENT #2 and CLIENT #3 each send a Data Request to the SERVER, which holds the D/BASE, and receive a Data Response.
SERVER: DATA LOGIC
CLIENT: PRESENTATION LOGIC, BUSINESS LOGIC
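CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)
Figure: CLIENT #1, CLIENT #2 and CLIENT #3 each send a Data Request to the SERVER, which holds the D/BASE, and receive a Data Response.
SERVER: BUSINESS LOGIC, DATA LOGIC
CLIENT: PRESENTATION LOGIC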
DISTRIBUTED PROCESSING ARCHITECTURE
Figure: four LANs, at Stratford, Leyton, Barking and Leytonstone, each with several CLIENTs, all connected to a single central DBMS.
DISTRIBUTED DATABASE SYSTEM
DISTRIBUTED DATABASES
WHAT IS A DISTRIBUTED DATABASE?
A distributed database system is a collection of
logically related databases that co-operate in a
transparent manner.
Figure: the Stratford, Leyton, Barking and Leytonstone sites each have their own DBMS; CLIENTs reach them over the LANs.
M:N CLIENT/SERVER DBMS ARCHITECTURE
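Figure: CLIENT #1, CLIENT #2 and CLIENT #3 connect to SERVER #1 and SERVER #2, each of which has its own D/BASE.
NOT TRANSPARENT! Each client has to know which server holds the data it needs.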
COMPONENTS OF A DDBMS
Figure: Site 1 and Site 2 are linked by a Computer Network. Site 1 shows DDBMS, DC, LDBMS and GSC components and a DB; Site 2 shows GSC, DDBMS and DC components.
LDBMS = Local DBMS
DC = Data Communications
GSC = Global Systems Catalog
DDBMS = Distributed DBMS
ADVANTAGES
Reduced Communication Overhead
Most data access is local, which is less expensive and performs
better than remote access.
Improved Processing Power
Instead of one server handling the full database, we now
have a collection of machines handling the same database.
Data Allocation
Data Fragmentation
1. Locality of reference
Is the data near to the sites that need it?
3. Performance
Does the strategy result in bottlenecks or under-utilisation of resources?
4. Storage costs
How does the strategy affect the availability and cost of data storage?
5. Communication costs
How much network traffic will result from the strategy?
DATA ALLOCATION STRATEGIES
CENTRALISED: Reliability/Availability Lowest; Performance Unsatisfactory
PARTITIONED/FRAGMENTED: Performance Satisfactory
COMPLETE REPLICATION: Reliability/Availability Highest; Performance High
SELECTIVE REPLICATION: Performance Satisfactory
Usage
Applications are usually interested in views, not whole relations.
Efficiency
It's more efficient if data is stored close to where it is most frequently used.
Parallelism
A query can be split into several sub-queries that run in parallel on different fragments.
Security
Data not required by local applications is not stored at the local
site.
HORIZONTAL DATA FRAGMENTATION
ACCOUNT CUSTOMER BRANCH BALANCE
200 JONES STRATFORD 1000.00
324 GRAY BARKING 200.00
345 SMITH STRATFORD 23.17
350 GREEN BARKING 340.14
400 ONO BARKING 500.00
456 KHAN STRATFORD 333.00
Horizontal Fragmentation: Consists of a Restriction (selection) on a Relation, so each fragment is a subset of the relation's rows.
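As a minimal sketch, the ACCOUNT relation above can be fragmented horizontally by restricting on BRANCH. The rows are taken from the table; the helper name `restrict` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
# ACCOUNT relation from the slide, one dict per row.
ACCOUNT = [
    {"account": 200, "customer": "JONES", "branch": "STRATFORD", "balance": 1000.00},
    {"account": 324, "customer": "GRAY",  "branch": "BARKING",   "balance": 200.00},
    {"account": 345, "customer": "SMITH", "branch": "STRATFORD", "balance": 23.17},
    {"account": 350, "customer": "GREEN", "branch": "BARKING",   "balance": 340.14},
    {"account": 400, "customer": "ONO",   "branch": "BARKING",   "balance": 500.00},
    {"account": 456, "customer": "KHAN",  "branch": "STRATFORD", "balance": 333.00},
]


def restrict(relation, predicate):
    """Relational restriction: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per branch; each could be stored at its own site.
stratford_fragment = restrict(ACCOUNT, lambda r: r["branch"] == "STRATFORD")
barking_fragment = restrict(ACCOUNT, lambda r: r["branch"] == "BARKING")

# The union of the fragments reconstructs the original relation.
reconstructed = sorted(stratford_fragment + barking_fragment,
                       key=lambda r: r["account"])

print([r["account"] for r in stratford_fragment])
print([r["account"] for r in barking_fragment])
print(reconstructed == ACCOUNT)
```

Every row lands in exactly one fragment, so the fragments are disjoint and their union gives back the whole relation.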
NETWORK ADMINISTRATION
S# LOGIN-ID PASSWORD
200 JON200T XXYY22
324 GRA324S ZZEE56
456 KHA456T KJTR78
DISTRIBUTED CATALOG MANAGEMENT
Dispersed Catalog
There is no physical global catalog. Each time a remote
data item is required, the catalogs of ALL other sites
are examined for the item. This carries a severe performance
penalty.
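The cost of a dispersed catalog can be illustrated with a small sketch. The site names come from the earlier slides; the catalog contents, item names and the function `locate_dispersed` are hypothetical, and each remote catalog examined is counted as one probe.

```python
# HYPOTHETICAL per-site catalogs: which data items each site stores.
catalogs = {
    "Stratford":   {"ACCOUNT_STRATFORD"},
    "Barking":     {"ACCOUNT_BARKING"},
    "Leyton":      {"CUSTOMER"},
    "Leytonstone": {"BRANCH"},
}


def locate_dispersed(item, local_site, catalogs):
    """Find the sites holding `item` when there is no global catalog.

    Returns (holders, remote_probes). A local hit needs no probes; a remote
    item means examining the catalog of every other site.
    """
    if item in catalogs[local_site]:
        return [local_site], 0
    holders = []
    remote_probes = 0
    for site, catalog in catalogs.items():
        if site == local_site:
            continue
        remote_probes += 1          # one catalog examined over the network
        if item in catalog:
            holders.append(site)
    return holders, remote_probes


print(locate_dispersed("ACCOUNT_STRATFORD", "Stratford", catalogs))
print(locate_dispersed("CUSTOMER", "Stratford", catalogs))
```

With four sites, every remote lookup examines three catalogs, and the number grows with each site added.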
Local-Master Catalog
Each site maintains both its local system catalog and
a catalog of all of its data items that are replicated at
other sites. This avoids compromising site autonomy, is
fairly efficient, and avoids a single point of failure.
DISTRIBUTED TRANSACTIONS
Figure: a Client submits a Global Transaction that spans three sites, Stratford (a), Barking (b) and Leyton (c), each with its own DBMS and DB.
Global Transaction
(a) Debit Stratford A/C 500
(b) Credit Barking A/C 350
(c) Credit Leyton A/C 150
TWO-PHASE COMMIT (2PC) - OK
TWO-PHASE COMMIT (2PC) - ABORT
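The OK and ABORT outcomes can be sketched for the global transaction above (debit Stratford 500, credit Barking 350, credit Leyton 150). This is a simplified illustration, not a full protocol: the class and function names are hypothetical, the opening balances are invented, the rule that a site votes "no" when its balance would go negative is an assumption, and logging, timeouts and failure recovery are left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One site's DBMS taking part in the global transaction (simplified)."""
    name: str
    balance: float
    pending: float = 0.0

    def prepare(self, delta: float) -> bool:
        """Phase 1: vote yes only if the change can be applied locally.

        ASSUMPTION: a site refuses if its balance would become negative.
        """
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self) -> None:
        """Phase 2 (global commit): make the pending change permanent."""
        self.balance += self.pending
        self.pending = 0.0

    def abort(self) -> None:
        """Phase 2 (global abort): discard the pending change."""
        self.pending = 0.0


def two_phase_commit(work):
    """Coordinator: `work` is a list of (participant, amount) pairs."""
    votes = [site.prepare(delta) for site, delta in work]   # phase 1: voting
    if all(votes):                                          # phase 2: decision
        for site, _ in work:
            site.commit()
        return "COMMIT"
    for site, _ in work:
        site.abort()
    return "ABORT"


def run(stratford_balance: float):
    """Run the slide's global transaction with HYPOTHETICAL opening balances."""
    stratford = Participant("Stratford", stratford_balance)
    barking = Participant("Barking", 200.0)
    leyton = Participant("Leyton", 0.0)
    outcome = two_phase_commit([(stratford, -500.0), (barking, 350.0), (leyton, 150.0)])
    return outcome, (stratford.balance, barking.balance, leyton.balance)


print(run(1000.0))   # every site votes yes
print(run(300.0))    # Stratford cannot cover the debit and votes no
```

The point of the two phases is that no site applies its change until every site has voted, so a single "no" vote leaves all three balances untouched.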
DISADVANTAGES OF DDBMSs
Architectural complexity.
Cost.
Security.
Lack of experience.