Current Trends
Distributed Databases and DBMSs: Concepts and Design
Slide 1/32
12.0 Content
Content
12.1 Objectives
12.2 Overview of Networking
12.3 Introduction to DDBMSs
- Concepts
- Advantages and Disadvantages
- Homogeneous and Heterogeneous
Slide 2/32
12.1 Objectives
Objectives
In this Lecture you will learn:
- Concepts.
- Advantages and disadvantages of distributed databases.
- Functions and architecture for a DDBMS.
- Distributed database design.
- Levels of transparency.
- Comparison criteria for DDBMSs.
Slide 3/32
Overview of Networking
Network: an interconnected collection of autonomous computers capable of exchanging information.
- Local Area Network (LAN): intended for connecting computers at the same site.
- Wide Area Network (WAN): used when computers or LANs need to be connected over long distances.
- WANs are relatively slow and less reliable than LANs.
- A DDBMS using a LAN provides much faster response times than one using a WAN.
Slide 4/32
12.3 Introduction
Concepts
Databases and networks:
1. A centralized DBMS could be physically processed by several computers distributed across a network.
2. There could be several separate DBMSs on several computers distributed across a network.
3. There may be a Distributed DBMS (DDBMS):
- made up of several DBMSs distributed across a network, each with local autonomy.
- Each participates in at least one global DBMS action.
- The DDBMS can therefore operate as a single global DBMS.
Slide 6/32
12.3 Introduction
Concepts
A DDBMS helps to avoid the 'islands of information' problem.
A Distributed Database is a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
A Distributed DBMS (DDBMS) is a software system that permits the management of the distributed database and makes the distribution transparent to users.
Slide 7/32
12.3 Introduction
Concepts
A DDBMS has the following characteristics:
- Collection of logically related shared data.
- Data split into fragments.
- Fragments may be replicated.
- Fragments/replicas allocated to sites.
- Sites linked by a communication network.
- Data at each site is under the control of a DBMS.
- DBMSs handle local applications autonomously.
- Each DBMS participates in at least one global application.
Slide 8/32
12.3 Introduction
Important difference between a DDBMS and distributed processing of a centralised DBMS.
[Figure: DDBMS vs. distributed processing of a centralised DBMS]
Slide 9/32
12.3 Introduction
Distributed Processing
Distributed processing of a centralised DBMS has the following characteristics:
- Much more tightly coupled than a DDBMS.
- Database design is the same as for a standard DBMS.
- No attempt to reflect organizational structure.
- Much simpler than a DDBMS.
- More secure than a DDBMS.
- No local autonomy.
Slide 10/32
12.3 Introduction
Important difference between a DDBMS and a parallel database.
Parallel database architectures:
a) Shared memory
b) Shared disk
c) Shared nothing
Slide 11/32
Functions of a DDBMS
Expect DDBMS to have at least the functionality of a DBMS.
Also expect it to have the following functionality:
- Extended communication services.
- Extended Data Dictionary.
- Distributed query processing.
- Extended concurrency control.
- Extended recovery services.
Slide 17/32
Slide 19/32
Data Allocation
Four alternative strategies regarding placement of data:
- Centralized: a single database and DBMS stored at one site, with users distributed across the network.
- Partitioned: the database is partitioned into disjoint fragments, each fragment assigned to one site.
- Complete Replication: a complete copy of the database is maintained at each site.
- Selective Replication: a combination of partitioning, replication, and centralization.
Comparison of strategies
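The four strategies can be told apart by looking at which sites hold each fragment. The sketch below is only an illustration: the function name `classify_allocation`, the fragment names (F1, F2) and the site names (S1 to S3) are assumptions, not part of the lecture.

```python
from typing import Dict, Set


def classify_allocation(allocation: Dict[str, Set[str]], all_sites: Set[str]) -> str:
    """Name the strategy implied by a fragment -> sites mapping (hypothetical helper)."""
    used_sites = set().union(*allocation.values())
    one_copy_each = all(len(sites) == 1 for sites in allocation.values())
    if one_copy_each and len(used_sites) == 1:
        return "centralized"            # everything lives at a single site
    if one_copy_each:
        return "partitioned"            # disjoint fragments, one site each
    if all(sites == all_sites for sites in allocation.values()):
        return "complete replication"   # every site holds every fragment
    return "selective replication"      # a mixture of the above


sites = {"S1", "S2", "S3"}
print(classify_allocation({"F1": {"S1"}, "F2": {"S2"}}, sites))
print(classify_allocation({"F1": {"S1", "S2"}, "F2": {"S3"}}, sites))
```
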
Slide 20/32
Fragmentation
Why fragment?
- Usage: applications work with views rather than entire relations.
- Efficiency: data is stored close to where it is most frequently used; data not needed by local applications is not stored.
- Security: data not needed by local applications is not stored, and so is not available to unauthorized users.
- Parallelism: with fragments as the unit of distribution, a transaction (T) can be divided into several subqueries that operate on fragments.
- Disadvantages: performance and integrity.
Slide 22/32
Fragmentation
Three correctness rules for fragmentation:
1. Completeness: if relation R is decomposed into fragments R1, R2, ..., Rn, each data item that can be found in R must appear in at least one fragment.
2. Reconstruction: it must be possible to define a relational operation that will reconstruct R from the fragments.
- For horizontal fragmentation: the Union operation.
- For vertical fragmentation: the Join operation.
3. Disjointness: if data item di appears in fragment Ri, then it should not appear in any other fragment.
- Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
- For horizontal fragmentation, a data item is a tuple.
- For vertical fragmentation, a data item is an attribute.
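For horizontal fragments, the three rules can be checked mechanically with set operations. The following is a minimal sketch, assuming a relation is modelled as a set of tuples; the `check_horizontal` helper and the staff rows are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, int]  # (staff_no, branch_no, salary): hypothetical schema


def check_horizontal(relation: Set[Row], fragments: List[Set[Row]]) -> Dict[str, bool]:
    """Check the three correctness rules for a horizontal fragmentation."""
    union = set().union(*fragments)
    total_rows = sum(len(f) for f in fragments)
    return {
        # every tuple of R appears in at least one fragment
        "completeness": relation <= union,
        # R can be rebuilt from the fragments with Union
        "reconstruction": union == relation,
        # no tuple appears in more than one fragment
        "disjointness": total_rows == len(union),
    }


staff = {("S1", "B3", 30000), ("S2", "B5", 12000), ("S3", "B3", 18000)}
at_b3 = {row for row in staff if row[1] == "B3"}
at_b5 = {row for row in staff if row[1] == "B5"}

good = check_horizontal(staff, [at_b3, at_b5])
overlapping = check_horizontal(staff, [at_b3, staff])  # second fragment repeats B3 rows
print(good)
print(overlapping)
```
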
Slide 23/32
Fragmentation
Four types of fragmentation:
1. Horizontal:
- Defined using the Selection operation.
- Determined by looking at the predicates used by transactions (Ts).
- Involves finding a set of minimal (complete and relevant) predicates.
- A set of predicates is complete if, and only if, any two tuples in the same fragment are referenced with the same probability by any application.
- A predicate is relevant if there is at least one application that accesses the fragments differently.
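A horizontal fragment is simply the result of a Selection over the relation. The sketch below assumes a hypothetical property relation with a `type` attribute and two predicates, P1 and P2; none of these names come from the lecture.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def horizontal_fragments(
    relation: List[Row], predicates: Dict[str, Callable[[Row], bool]]
) -> Dict[str, List[Row]]:
    """Apply one Selection per predicate; each result is one horizontal fragment."""
    return {
        name: [row for row in relation if predicate(row)]
        for name, predicate in predicates.items()
    }


# Hypothetical relation and predicates, for illustration only.
properties = [
    {"property_no": "PA14", "type": "House", "rent": 650},
    {"property_no": "PL94", "type": "Flat", "rent": 400},
    {"property_no": "PG4", "type": "Flat", "rent": 350},
]
predicates = {
    "P1": lambda row: row["type"] == "House",
    "P2": lambda row: row["type"] == "Flat",
}

fragments = horizontal_fragments(properties, predicates)
print({name: [r["property_no"] for r in rows] for name, rows in fragments.items()})
```

Because every row here satisfies exactly one of the two predicates, the fragments are complete and disjoint; a predicate set that missed a `type` value would silently drop rows.
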
Slide 24/32
Fragmentation
Four types of fragmentation:
2. Vertical: a subset of the attributes of a relation.
- Defined using the Projection operation.
- Determined by establishing the affinity of one attribute to another.
Other possibility is no fragmentation:
- If a relation is small and not updated frequently, it may be better not to fragment it.
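Vertical fragmentation can be sketched as two Projections that each keep the primary key, so that a Join on that key rebuilds the relation. The `project` and `join_on_key` helpers and the staff rows are assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def project(relation: List[Row], attrs: List[str]) -> List[Row]:
    """Projection: keep only the named attributes of each row."""
    return [{a: row[a] for a in attrs} for row in relation]


def join_on_key(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two vertical fragments on their shared primary key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


# Hypothetical relation; staff_no is the primary key repeated in both fragments.
staff = [
    {"staff_no": "SL21", "name": "John", "position": "Manager", "salary": 30000},
    {"staff_no": "SG37", "name": "Ann", "position": "Assistant", "salary": 12000},
]
payroll = project(staff, ["staff_no", "position", "salary"])
personal = project(staff, ["staff_no", "name"])

rebuilt = join_on_key(payroll, personal, "staff_no")
print(rebuilt == staff)
```
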
Transparency in a DDBMS
Transparency hides implementation details from users.
Overall objective: to the user, a DDBMS should be equivalent to a centralised DBMS.
- Full transparency is not a universally accepted objective.
Four main types:
1. Distribution transparency
2. Transaction transparency
3. Performance transparency
4. DBMS transparency (only applicable to heterogeneous DDBMSs)
Slide 26/32
1. Distribution Transparency
Distribution transparency: allows user to perceive database as single, logical entity.
If a DDBMS exhibits distribution transparency, the user does not need to know:
- Fragmentation transparency: that the data is fragmented.
- Location transparency: the location of data items; otherwise, this is called local mapping transparency.
- Replication transparency: the user is unaware of the replication of fragments.
- Naming transparency: each item in a DDB must have a unique name.
One solution: create a central name server. Drawbacks:
- loss of some local autonomy.
- central site may become a bottleneck.
- low availability if the central site fails.
Alternative solution: prefix each object with the identifier of the site that created it, together with identifiers for each fragment and each of its copies. Each site then uses aliases for these names.
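The alternative naming scheme can be sketched as a structured global name plus a per-site alias table. The `GlobalName` class, the `S1.Branch.F3.C2` layout and the alias `LocalBranch` are illustrative assumptions, not a format prescribed by the lecture.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GlobalName:
    """Globally unique name: creator site, object, fragment number, copy number."""
    site: str
    obj: str
    fragment: int
    copy_no: int

    def __str__(self) -> str:
        return f"{self.site}.{self.obj}.F{self.fragment}.C{self.copy_no}"


# Each site keeps its own alias table, so users never type the full name.
aliases_at_site: Dict[str, GlobalName] = {
    "LocalBranch": GlobalName(site="S1", obj="Branch", fragment=3, copy_no=2),
}


def resolve(alias: str) -> GlobalName:
    """Translate a local alias into the global name (raises KeyError if unknown)."""
    return aliases_at_site[alias]


print(resolve("LocalBranch"))
```

No central name server is consulted here, which is the point of the scheme: uniqueness comes from the creator-site prefix rather than from a single registry.
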
Slide 27/32
2. Transaction Transparency
Transaction transparency: ensures that all distributed transactions (Ts) maintain the distributed database's integrity and consistency.
- A distributed transaction accesses data stored at more than one location.
- Each transaction is divided into a number of subtransactions (subTs), one for each site that has to be accessed.
- The DDBMS must ensure the indivisibility of both the global transaction and each of the subtransactions.
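The division of a global transaction into one subtransaction per site, and the all-or-nothing outcome, can be sketched as follows. This is a deliberately simplified illustration and not a real commit protocol; `split_into_subtransactions`, `run_global` and the `can_commit` callback are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Operation = Tuple[str, str]  # (site, action): hypothetical representation


def split_into_subtransactions(ops: List[Operation]) -> Dict[str, List[str]]:
    """Group the operations of a global transaction by the site they touch."""
    by_site: Dict[str, List[str]] = defaultdict(list)
    for site, action in ops:
        by_site[site].append(action)
    return dict(by_site)


def run_global(ops: List[Operation], can_commit: Callable[[str, List[str]], bool]) -> str:
    """The global transaction commits only if every subtransaction can commit."""
    subtransactions = split_into_subtransactions(ops)
    if all(can_commit(site, actions) for site, actions in subtransactions.items()):
        return "commit"
    return "abort"


transfer = [("S1", "debit account A"), ("S2", "credit account B"), ("S1", "log transfer")]
print(split_into_subtransactions(transfer))
print(run_global(transfer, lambda site, actions: True))
print(run_global(transfer, lambda site, actions: site != "S2"))
```
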
Slide 28/32
2. Transaction Transparency
Concurrency transparency: all transactions (Ts) must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order.
Replication makes concurrency more complex.
3. Performance Transparency
The DDBMS must:
- show no performance degradation due to the distributed architecture.
- determine the most cost-effective strategy to execute a request.
Distributed Query Processor (DQP) maps a data request into an ordered sequence of operations on local databases.
- Must consider fragmentation, replication, and allocation schemas.
DQP has to decide:
1. which fragment to access
2. which copy of a fragment to use
3. which location to use.
- Produces an execution strategy optimized with respect to some cost function.
Typically, costs associated with a distributed request include I/O cost, CPU cost, and communication cost.
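One simple cost function is the sum of the three costs listed above, with the DQP picking the copy that minimizes it. The sketch below assumes made-up cost figures in arbitrary units; the `CopyCost` class and `cheapest_copy` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopyCost:
    """Estimated cost of reading one copy of a fragment (arbitrary units)."""
    site: str
    io: float
    cpu: float
    comm: float

    def total(self) -> float:
        # Simplest possible cost function: an unweighted sum of the three costs.
        return self.io + self.cpu + self.comm


def cheapest_copy(candidates: List[CopyCost]) -> CopyCost:
    """Pick the copy with the lowest total estimated cost."""
    return min(candidates, key=CopyCost.total)


# Illustrative numbers only: a slow local copy vs. a fast copy across a WAN.
copies = [
    CopyCost(site="S1", io=10.0, cpu=2.0, comm=0.0),
    CopyCost(site="S2", io=4.0, cpu=2.0, comm=20.0),
]
best = cheapest_copy(copies)
print(best.site, best.total())
```

With these figures the local copy wins despite its higher I/O cost, because the communication cost dominates; over a fast LAN the balance could shift the other way.
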
Slide 30/32
Ideals:
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
Slide 31/32
12.8 Summary
Summary
12.1 Objectives
12.2 Overview of Networking
12.3 Introduction to DDBMSs
- Concepts
- Advantages and Disadvantages
- Homogeneous and Heterogeneous
- Functions of a DDBMS
- Reference Architecture for a DDBMS / Federated MDBS
- Data Allocation
- Fragmentation
NEXT LECTURE:
III Current Trends Part 2: Distributed DBMSs, Advanced Concepts
- advanced concepts
- protocols for distributed deadlock control
- X/Open Distributed Transaction Processing Model
- Oracle.
Slide 32/32