Unit-5: Database System Concepts, 6 Ed

UNIT- 5
Database System Concepts, 6th Ed.

Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use
Centralized Systems
Run on a single computer system and do not interact with other computer systems.
General-purpose computer system: one to a few CPUs and a number of device controllers, connected through a common bus that provides access to shared memory.
Single-user system (e.g., personal computer or workstation): desktop unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.
Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serves a large number of users who are connected to the system via terminals. Often called a server system.
[Figure: A Centralized Computer System]
Client-Server Systems
Server systems satisfy requests generated at m client systems.
[Figure: general structure of a client-server system]
Client-Server Systems (Cont.)

Database functionality can be divided into:
Back-end: manages access structures, query evaluation and optimization, concurrency control and recovery.
Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities.
The interface between the front-end and the back-end is through SQL or through an application program interface.
Client-Server Systems (Cont.)

Advantages of replacing mainframes with networks of workstations or
personal computers connected to back-end server machines:

  better functionality for the cost
  flexibility in locating resources and expanding facilities
  better user interfaces
  easier maintenance
Server System Architecture

Server systems can be broadly categorized into two kinds:

  transaction servers, which are widely used in relational database systems, and
  data servers, used in object-oriented database systems
Transaction Servers
Also called query server systems or SQL server systems
  Clients send requests to the server
  Transactions are executed at the server
  Results are shipped back to the client
Requests are specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism.
Transactional RPC allows many RPC calls to form a transaction.
Open Database Connectivity (ODBC) is a C language application program interface standard from Microsoft for connecting to a server, sending SQL requests, and receiving results.
The JDBC standard is similar to ODBC, for Java.
Transaction Server Process Structure

A typical transaction server consists of multiple processes accessing data in shared memory.
Server processes
  These receive user queries (transactions), execute them, and send results back
  Processes may be multithreaded, allowing a single process to execute several user queries concurrently
  Typically multiple multithreaded server processes
Lock manager process
  More on this later
Database writer process
  Output modified buffer blocks to disks continually
Transaction Server Processes (Cont.)

Log writer process
  Server processes simply add log records to the log record buffer
  Log writer process outputs log records to stable storage
Checkpoint process
  Performs periodic checkpoints
Process monitor process
  Monitors other processes, and takes recovery actions if any of the other processes fail
  E.g., aborting any transactions being executed by a server process and restarting it
Transaction System Processes (Cont.)

Shared memory contains shared data
  Buffer pool
  Lock table
  Log buffer
  Cached query plans (reused if same query submitted again)
All database processes can access shared memory
To ensure that no two processes are accessing the same data structure at the same time, database systems implement mutual exclusion using either
  Operating system semaphores
  Atomic instructions such as test-and-set
To avoid the overhead of interprocess communication for lock request/grant, each database process operates directly on the lock table instead of sending requests to the lock manager process
Lock manager process still used for deadlock detection
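The mutual-exclusion idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for database processes, a dictionary stands in for the lock table in shared memory, and threading.Lock plays the role of the semaphore or test-and-set latch. The names lock_table, table_mutex, and acquire_shared are hypothetical.

```python
import threading

# Hypothetical lock table: data item -> set of transaction ids holding a shared lock.
lock_table = {}
# Mutex guarding the lock table itself (stand-in for a semaphore or test-and-set latch).
table_mutex = threading.Lock()

def acquire_shared(item, txn_id):
    """Record a shared lock directly in the lock table, under mutual exclusion."""
    with table_mutex:
        lock_table.setdefault(item, set()).add(txn_id)

def worker(first_txn, count):
    # Each worker updates the table itself instead of messaging a lock manager.
    for txn_id in range(first_txn, first_txn + count):
        acquire_shared("account-42", txn_id)

threads = [threading.Thread(target=worker, args=(i * 1000, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

holders = len(lock_table["account-42"])
print(holders)
```

Every update to the table happens while holding table_mutex, so no two workers modify the same structure at the same time. Deadlock detection is not modeled here.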
Data Servers
Used in high-speed LANs, in cases where
  The clients are comparable in processing power to the server
  The tasks to be executed are compute intensive
Data are shipped to clients where processing is performed, and the results are then shipped back to the server.
This architecture requires full back-end functionality at the clients.
Used in many object-oriented database systems
Issues:
  Page-shipping versus item-shipping
  Locking
  Data caching
  Lock caching
Data Servers (Cont.)

Page-shipping versus item-shipping
  Smaller unit of shipping means more messages
  Worth prefetching related items along with requested item
  Page shipping can be thought of as a form of prefetching
Locking
  Overhead of requesting and getting locks from server is high due to message delays
  Can grant locks on requested and prefetched items; with page shipping, transaction is granted lock on whole page.
  Locks on a prefetched item can be called back by the server, and returned by the client transaction if the prefetched item has not been used.
  Locks on the page can be de-escalated to locks on items in the page when there are lock conflicts. Locks on unused items can then be returned to the server.
Data Servers (Cont.)

Data caching
  Data can be cached at client even in between transactions
  But check that data is up-to-date before it is used (cache coherency)
  Check can be done when requesting lock on data item
Lock caching
  Locks can be retained by client system even in between transactions
  Transactions can acquire cached locks locally, without contacting server
  Server calls back locks from clients when it receives conflicting lock request. Client returns lock once no local transaction is using it.
  Similar to de-escalation, but across transactions.
Parallel Systems
Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network.
A coarse-grain parallel machine consists of a small number of powerful processors
A massively parallel or fine-grain parallel machine utilizes thousands of smaller processors.
Two main performance measures:
  throughput: the number of tasks that can be completed in a given time interval
  response time: the amount of time it takes to complete a single task from the time it is submitted
Speed-Up and Scale-Up

Speedup: a fixed-sized problem executing on a small system is given to a system which is N-times larger.
  Measured by: speedup = (small system elapsed time) / (large system elapsed time)
  Speedup is linear if equation equals N.
Scaleup: increase the size of both the problem and the system
  N-times larger system used to perform N-times larger job
  Measured by: scaleup = (small system small problem elapsed time) / (big system big problem elapsed time)
  Scaleup is linear if equation equals 1.
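A small worked example of the two ratios, as a sketch only. The elapsed times below are hypothetical measurements in seconds, chosen to show sublinear behavior for N = 4.

```python
def speedup(small_system_time, large_system_time):
    """Same fixed-size problem, timed on the small and on the N-times larger system."""
    return small_system_time / large_system_time

def scaleup(small_problem_small_system_time, big_problem_big_system_time):
    """Problem and system both grow N-fold; compare the two elapsed times."""
    return small_problem_small_system_time / big_problem_big_system_time

# Hypothetical elapsed times in seconds, with a system that is N = 4 times larger.
N = 4
s = speedup(100.0, 30.0)    # linear speedup would give exactly N
sc = scaleup(100.0, 125.0)  # linear scaleup would give exactly 1

print(f"speedup = {s:.2f} (linear would be {N})")
print(f"scaleup = {sc:.2f} (linear would be 1)")
```

Here the speedup is below N and the scaleup is below 1, so both are sublinear.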
[Figure: Speedup]
[Figure: Scaleup]
Batch and Transaction Scaleup

Batch scaleup:
  A single large job; typical of most decision support queries and scientific simulation.
  Use an N-times larger computer on N-times larger problem.
Transaction scaleup:
  Numerous small queries submitted by independent users to a shared database; typical transaction processing and timesharing systems.
  N-times as many users submitting requests (hence, N-times as many requests) to an N-times larger database, on an N-times larger computer.
  Well-suited to parallel execution.
Factors Limiting Speedup and Scaleup

Speedup and scaleup are often sublinear due to:
Startup costs: Cost of starting up multiple processes may dominate computation time, if the degree of parallelism is high.
Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other, thus spending time waiting on other processes, rather than performing useful work.
Skew: Increasing the degree of parallelism increases the variance in service times of tasks executing in parallel. Overall execution time is determined by the slowest of the tasks executing in parallel.
Parallel Database Architectures

Shared memory -- processors share a common memory
Shared disk -- processors share a common disk
Shared nothing -- processors share neither a common memory nor common disk
Hierarchical -- hybrid of the above architectures
[Figure: Parallel Database Architectures]
Shared Memory
Processors and disks have access to a common memory, typically via a bus or through an interconnection network.
Extremely efficient communication between processors: data in shared memory can be accessed by any processor without having to move it using software.
Downside: architecture is not scalable beyond 32 or 64 processors, since the bus or the interconnection network becomes a bottleneck
Widely used for lower degrees of parallelism (4 to 8).
Shared Disk
All processors can directly access all disks via an interconnection network, but the processors have private memories.
  The memory bus is not a bottleneck
  Architecture provides a degree of fault-tolerance: if a processor fails, the other processors can take over its tasks, since the database is resident on disks that are accessible from all processors.
Examples: IBM Sysplex and DEC clusters (now part of Compaq) running Rdb (now Oracle Rdb) were early commercial users
Downside: bottleneck now occurs at interconnection to the disk subsystem.
Shared-disk systems can scale to a somewhat larger number of processors, but communication between processors is slower.
Shared Nothing
Node consists of a processor, memory, and one or more disks. Processors at one node communicate with another processor at another node using an interconnection network. A node functions as the server for the data on the disk or disks the node owns.
Examples: Teradata, Tandem, Oracle-n CUBE
Data accessed from local disks (and local memory accesses) do not pass through interconnection network, thereby minimizing the interference of resource sharing.
Shared-nothing multiprocessors can be scaled up to thousands of processors without interference.
Main drawback: cost of communication and non-local disk access; sending data involves software interaction at both ends.
Hierarchical
Combines characteristics of shared-memory, shared-disk, and shared-nothing architectures.
Top level is a shared-nothing architecture: nodes are connected by an interconnection network, and do not share disks or memory with each other.
Each node of the system could be a shared-memory system with a few processors.
Alternatively, each node could be a shared-disk system, and each of the systems sharing a set of disks could be a shared-memory system.
Reduce the complexity of programming such systems by distributed virtual-memory architectures
  Also called non-uniform memory architecture (NUMA)
Hybrid architecture
Hybrid architecture includes:
  Non-Uniform Memory Architecture (NUMA), which involves Non-Uniform Memory Access.
  Cluster (shared nothing + shared disk: SAN/NAS), which is formed by a group of connected computers.
Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than nonlocal memory (memory local to another processor or memory shared between processors).
NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP) architectures.
XML
Using an example, explain the distinction between an attribute and a subelement.
Explain the purpose and use of namespaces.
Give the DTD for an XML representation of the following nested-relational schema.
  Emp = (ename, ChildrenSet setof(Children), SkillsSet setof(Skills))
  Children = (name, Birthday)
  Birthday = (day, month, year)
  Skills = (type, ExamsSet setof(Exams))
  Exams = (year, city)
Explain the limitations of DTD. Describe the alternative that overcomes these limitations.
Introduction
XML: Extensible Markup Language
Defined by the WWW Consortium (W3C)
Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
Documents have tags giving extra information about sections of the document
  E.g. <title> XML </title> <slide> Introduction </slide>
Extensible, unlike HTML
  Users can add new tags, and separately specify how the tag should be handled for display
Comparison with Relational Data

Inefficient: tags, which in effect represent schema information, are repeated
Better than relational tuples as a data-exchange format
  Unlike relational tuples, XML data is self-documenting due to presence of tags
  Non-rigid format: tags can be added
  Allows nested structures
  Wide acceptance, not only in database systems, but also in browsers, tools, and applications
Structure of XML Data

Tag: label for a section of data
Element: section of data beginning with <tagname> and ending with matching </tagname>
Elements must be properly nested
  Proper nesting
    <course> <title> . </title> </course>
  Improper nesting
    <course> <title> . </course> </title>
  Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.
Every document must have a single top-level element
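The nesting rules can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser, which rejects documents that are not well-formed. The helper name is_well_formed and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

proper = "<course><title>Computational Biology</title></course>"
improper = "<course><title>Computational Biology</course></title>"  # end tags cross
two_roots = "<course/><course/>"  # more than one top-level element

for label, text in [("proper", proper), ("improper", improper), ("two roots", two_roots)]:
    print(label, is_well_formed(text))
```

Only the first string parses: the second violates proper nesting, and the third has no single top-level element.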
Structure of XML Data (Cont.)

Mixture of text with sub-elements is legal in XML.
  Example:
    <course>
      This course is being offered for the first time in 2009.
      <course_id> BIO-399 </course_id>
      <title> Computational Biology </title>
      <dept_name> Biology </dept_name>
      <credits> 3 </credits>
    </course>
  Useful for document markup, but discouraged for data representation
Attributes
Elements can have attributes
  <course course_id="CS-101">
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci. </dept_name>
    <credits> 4 </credits>
  </course>
Attributes are specified by name=value pairs inside the starting tag of an element
An element may have several attributes, but each attribute name can only occur once
  <course course_id="CS-101" credits="4">
Attributes vs. Subelements

Distinction between subelement and attribute
  In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  In the context of data representation, the difference is unclear and may be confusing
    Same information can be represented in two ways
      <course course_id="CS-101"> </course>
      <course> <course_id>CS-101</course_id> </course>
  Suggestion: use attributes for identifiers of elements, and use subelements for contents
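A short sketch of how the two representations look to a program, using Python's standard xml.etree.ElementTree. The variable names are illustrative assumptions. The last check shows that a repeated attribute name is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# The same identifier, once as an attribute and once as a subelement.
as_attribute = ET.fromstring('<course course_id="CS-101"></course>')
as_subelement = ET.fromstring('<course><course_id>CS-101</course_id></course>')

id_from_attribute = as_attribute.get("course_id")            # read from the start tag
id_from_subelement = as_subelement.findtext("course_id")     # read from child content
print(id_from_attribute, id_from_subelement)

# Each attribute name may occur only once in a start tag.
try:
    ET.fromstring('<course course_id="CS-101" course_id="CS-102"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Both forms carry the same value; the difference is only in where the program has to look for it.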
Namespaces
XML data has to be exchanged between organizations
Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
Specifying a unique string as an element name avoids confusion
Better solution: use unique-name:element-name
Avoid using long unique names all over document by using XML Namespaces
  <university xmlns:yale="http://www.yale.edu">
    <yale:course>
      <yale:course_id> CS-101 </yale:course_id>
      <yale:title> Intro. to Computer Science</yale:title>
      <yale:dept_name> Comp. Sci. </yale:dept_name>
      <yale:credits> 4 </yale:credits>
    </yale:course>
  </university>
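A sketch of what the namespace prefix does when the document is parsed, using Python's standard xml.etree.ElementTree. The prefix is only shorthand: the parser expands it to the full unique name. The prefix map ns is an assumption of this example, and the document is a shortened copy of the Yale example.

```python
import xml.etree.ElementTree as ET

doc = """
<university xmlns:yale="http://www.yale.edu">
  <yale:course>
    <yale:course_id> CS-101 </yale:course_id>
    <yale:title> Intro. to Computer Science</yale:title>
  </yale:course>
</university>
"""

root = ET.fromstring(doc)
# Prefix map used for lookups; the prefix chosen here need not match the document's.
ns = {"yale": "http://www.yale.edu"}

course = root.find("yale:course", ns)
expanded_tag = course.tag  # the prefix is replaced by the full namespace name
title = course.findtext("yale:title", namespaces=ns).strip()

print(expanded_tag)
print(title)
```

A lookup for plain "course" without the namespace would find nothing, which is the point: tags from different organizations do not collide.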
XML Document Schema

Database schemas constrain what information can be stored, and the data types of stored values
XML documents are not required to have an associated schema
However, schemas are very important for XML data exchange
  Otherwise, a site cannot automatically interpret data received from another site
Two mechanisms for specifying XML schema
  Document Type Definition (DTD)
    Widely used
  XML Schema
    Newer, increasing use
Document Type Definition (DTD)

The type of an XML document can be specified using a DTD
DTD constrains structure of XML data
  What elements can occur
  What attributes can/must an element have
  What subelements can/must occur inside each element, and how many times.
DTD does not constrain data types
  All values represented as strings in XML
DTD syntax
  <!ELEMENT element (subelements-specification) >
  <!ATTLIST element (attributes) >
Element Specification in DTD
Subelements can be specified as
  names of elements, or
  #PCDATA (parsed character data), i.e., character strings
  EMPTY (no subelements) or ANY (anything can be a subelement)
Example
  <!ELEMENT department (dept_name, building, budget)>
  <!ELEMENT dept_name (#PCDATA)>
  <!ELEMENT budget (#PCDATA)>
Subelement specification may have regular expressions
  <!ELEMENT university ( ( department | course | instructor | teaches )+)>
  Notation:
    | - alternatives
    + - 1 or more occurrences
    * - 0 or more occurrences
University DTD
<!DOCTYPE university [
  <!ELEMENT university ( (department|course|instructor|teaches)+)>
  <!ELEMENT department ( dept_name, building, budget)>
  <!ELEMENT course ( course_id, title, dept_name, credits)>
  <!ELEMENT instructor (IID, name, dept_name, salary)>
  <!ELEMENT teaches (IID, course_id)>
  <!ELEMENT dept_name ( #PCDATA )>
  <!ELEMENT building ( #PCDATA )>
  <!ELEMENT budget ( #PCDATA )>
  <!ELEMENT course_id ( #PCDATA )>
  <!ELEMENT title ( #PCDATA )>
  <!ELEMENT credits ( #PCDATA )>
  <!ELEMENT IID ( #PCDATA )>
  <!ELEMENT name ( #PCDATA )>
  <!ELEMENT salary ( #PCDATA )>
]>
Attribute Specification in DTD

Attribute specification: for each attribute
  Name
  Type of attribute
    CDATA
    ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      more on this later
  Whether
    mandatory (#REQUIRED)
    has a default value (value),
    or neither (#IMPLIED)
Examples
  <!ATTLIST course course_id CDATA #REQUIRED>, or
  <!ATTLIST course
    course_id ID #REQUIRED
    dept_name IDREF #REQUIRED
    instructors IDREFS #IMPLIED >
IDs and IDREFs
An element can have at most one attribute of type ID
The ID attribute value of each element in an XML document must be distinct
  Thus the ID attribute value is an object identifier
An attribute of type IDREF must contain the ID value of an element in the same document
An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document
University DTD with Attributes

University DTD with ID and IDREF attribute types.
<!DOCTYPE university-3 [
  <!ELEMENT university ( (department|course|instructor)+)>
  <!ELEMENT department ( building, budget )>
  <!ATTLIST department
    dept_name ID #REQUIRED >
  <!ELEMENT course (title, credits )>
  <!ATTLIST course
    course_id ID #REQUIRED
    dept_name IDREF #REQUIRED
    instructors IDREFS #IMPLIED >
  <!ELEMENT instructor ( name, salary )>
  <!ATTLIST instructor
    IID ID #REQUIRED
    dept_name IDREF #REQUIRED >
  declarations for title, credits, building, budget, name and salary
]>
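The ID and IDREF rules can be illustrated with a hand-written check. This is a sketch under stated assumptions, not a DTD validator: Python's xml.etree.ElementTree does not validate against DTDs, so the attribute roles are hard-coded in the hypothetical tables ID_ATTRS, IDREF_ATTRS, and IDREFS_ATTRS to mirror the ATTLIST declarations. The sample document and its values are made up.

```python
import xml.etree.ElementTree as ET

# Attribute roles copied by hand from the ATTLIST declarations (assumption of this sketch).
ID_ATTRS = {"department": "dept_name", "course": "course_id", "instructor": "IID"}
IDREF_ATTRS = {"course": ["dept_name"], "instructor": ["dept_name"]}
IDREFS_ATTRS = {"course": ["instructors"]}

doc = """
<university>
  <department dept_name="Comp.Sci."/>
  <instructor IID="i10101" dept_name="Comp.Sci."/>
  <course course_id="CS-101" dept_name="Comp.Sci." instructors="i10101 i99999"/>
</university>
"""

def check_ids(root):
    """Report duplicate ID values and references to IDs that do not exist."""
    ids, problems = set(), []
    # First pass: collect ID values; they must be distinct across the document.
    for elem in root.iter():
        id_attr = ID_ATTRS.get(elem.tag)
        if id_attr and id_attr in elem.attrib:
            value = elem.attrib[id_attr]
            if value in ids:
                problems.append(f"duplicate ID {value!r}")
            ids.add(value)
    # Second pass: every IDREF, and every token of an IDREFS, must name an existing ID.
    for elem in root.iter():
        refs = [elem.attrib[a] for a in IDREF_ATTRS.get(elem.tag, []) if a in elem.attrib]
        for a in IDREFS_ATTRS.get(elem.tag, []):
            refs.extend(elem.attrib.get(a, "").split())
        for ref in refs:
            if ref not in ids:
                problems.append(f"dangling reference {ref!r}")
    return problems

problems = check_ids(ET.fromstring(doc))
print(problems)
```

The check treats all IDs as one untyped pool, exactly as a DTD does, so it would not notice an instructors value that points at a course.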
Limitations of DTDs
No typing of text elements and attributes
  All values are strings, no integers, reals, etc.
Difficult to specify unordered sets of subelements
  Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  (A | B)* allows specification of an unordered set, but
    Cannot ensure that each of A and B occurs only once
IDs and IDREFs are untyped
  The instructors attribute of a course may contain a reference to another course, which is meaningless
    instructors attribute should ideally be constrained to refer to instructor elements
XML Schema
XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  Typing of values
    E.g. integer, string, etc
    Also, constraints on min/max values
  User-defined, complex types
  Many more features, including uniqueness and foreign key constraints, inheritance
XML Schema is itself specified in XML syntax, unlike DTDs
  More-standard representation, but verbose
XML Schema is integrated with namespaces
BUT: XML Schema is significantly more complicated than DTDs.
Decision Support Systems

Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.
Examples of business decisions:
  What items to stock?
  What insurance premium to change?
  To whom to send advertisements?
Examples of data used for making decisions
  Retail sales transaction details
  Customer profiles (income, age, gender, etc.)
Decision-Support Systems: Overview

Data analysis tasks are simplified by specialized tools and SQL extensions
  Example tasks
    For each product category and each region, what were the total sales in the last quarter and how do they compare with the same quarter last year
    As above, for each product category and each customer category
Statistical analysis packages (e.g., S++) can be interfaced with databases
  Statistical analysis is a large field, but not covered here
Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases.
A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
  Important for large businesses that generate data from multiple divisions, possibly at multiple sites
  Data may also be purchased externally
Data Warehousing
Data sources often store only current data, not historical data
Corporate decision making requires a unified view of all organizational data, including historical data
A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site
  Greatly simplifies querying, permits study of historical trends
  Shifts decision support query load away from transaction processing systems
[Figure: Data Warehousing]
Design Issues
When and how to gather data
  Source driven architecture: data sources transmit new information to warehouse, either continuously or periodically (e.g., at night)
  Destination driven architecture: warehouse periodically requests new information from data sources
  Keeping warehouse exactly synchronized with data sources (e.g., using two-phase commit) is too expensive
    Usually OK to have slightly out-of-date data at warehouse
    Data/updates are periodically downloaded from online transaction processing (OLTP) systems.
What schema to use
  Schema integration
More Warehouse Design Issues

Data cleansing
  E.g., correct mistakes in addresses (misspellings, zip code errors)
  Merge address lists from different sources and purge duplicates
How to propagate updates
  Warehouse schema may be a (materialized) view of schema from data sources
What data to summarize
  Raw data may be too large to store on-line
  Aggregate values (totals/subtotals) often suffice
  Queries on raw data can often be transformed by query optimizer to use aggregate values
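The point about aggregate values can be sketched with Python's built-in sqlite3 module. The table names sales and sales_summary, the columns, and the rows are hypothetical. The rewrite from the raw table to the summary table is done by hand here; in a warehouse the query optimizer may perform it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("clothing", "east", 100.0), ("clothing", "east", 50.0),
     ("clothing", "west", 70.0), ("food", "east", 20.0)],
)

# Summarize once: subtotals per (category, region) instead of every raw sale.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT category, region, SUM(amount) AS total "
    "FROM sales GROUP BY category, region"
)

# The same question answered from raw data and from the stored subtotals.
raw_answer = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE category = 'clothing'"
).fetchone()[0]
summary_answer = conn.execute(
    "SELECT SUM(total) FROM sales_summary WHERE category = 'clothing'"
).fetchone()[0]

print(raw_answer, summary_answer)
```

Both queries return the same total, but the second touches only the subtotals, which is why storing aggregates often suffices when raw data is too large to keep on-line.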
Why Data Mining?

The Explosive Growth of Data
  Data collection and data availability
    Automated data collection tools, database systems, Web, computerized society
  Major sources of abundant data
    Business: Web, e-commerce, transactions, stocks
    Science: Remote sensing, bioinformatics, scientific simulation
    Society and everyone: news, digital cameras
We are drowning in data, but starving for knowledge!
"Necessity is the mother of invention": data mining, the automated analysis of massive data sets
Why Data Mining? Potential Applications

Data analysis and decision support
  Market analysis and management
    Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  Risk analysis and management
    Forecasting, customer retention, improved underwriting, quality control, competitive analysis
  Fraud detection and detection of unusual patterns (outliers)
Other Applications
  Text mining (news group, email, documents) and Web mining
  Stream data mining
  Bioinformatics and bio-data analysis
Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process.
[Figure: the KDD process. Labels: Databases; Data Cleaning; Data Integration; Data Warehouse; Data Selection; Data Preprocessing; Task-relevant Data; Data Mining; Pattern Evaluation]
Steps of a KDD Process

Learning the application domain:
  relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation:
  Find useful features, dimensionality/variable reduction, invariant representation.
Choosing functions of data mining
  summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
  visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge

Data Mining Functionalities

General functionality
  Descriptive data mining
  Predictive data mining
Different views lead to different classifications
  Data view: Kinds of data to be mined
  Knowledge view: Kinds of knowledge to be discovered
  Method view: Kinds of techniques utilized
  Application view: Kinds of applications adapted
Data Mining Functionalities

Multidimensional concept description: Characterization and discrimination
  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association analysis
  Diaper -> Beer [0.5%, 75%] (Correlation or causality?)
Classification and prediction
  Construct models (functions) that describe and distinguish classes or concepts for future prediction
    E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  Predict some unknown or missing numerical values
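The two numbers attached to an association rule are conventionally its support and its confidence, and the sketch below assumes that reading of the bracketed pair. Support is the fraction of all transactions containing both sides of the rule; confidence is the fraction of transactions containing the left side that also contain the right side. The basket data and the function name are hypothetical.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Compute (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical market baskets.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"beer"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence = support_and_confidence(baskets, {"diaper"}, {"beer"})
print(f"diaper -> beer [{support:.1%}, {confidence:.0%}]")
```

In this toy data, 3 of 8 baskets contain both items and 3 of the 4 diaper baskets also contain beer. A high confidence still says nothing about causality.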
Data Mining Functionalities (2)

Cluster analysis
  Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
  Maximizing intra-class similarity & minimizing interclass similarity
Outlier analysis
  Outlier: Data object that does not comply with the general behavior of the data
  Noise or exception? Useful in fraud detection, rare events analysis
Trend and evolution analysis
  Trend and deviation: e.g., regression analysis
  Periodicity analysis
  Similarity-based analysis
Other pattern-directed or statistical analyses

Data Cleaning
Importance
  "Data cleaning is one of the three biggest problems in data warehousing" (Ralph Kimball)
  "Data cleaning is the number one problem in data warehousing" (DCI survey)
Data cleaning tasks
  Fill in missing values
  Identify outliers and smooth out noisy data
  Correct inconsistent data
  Resolve redundancy caused by data integration
Data Mining: Concepts and
December 5, 2013
Missing Data
Data is not always available
  E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
Missing data may be due to
  equipment malfunction
  data inconsistent with other recorded data and thus deleted
  data not entered due to misunderstanding
  certain data may not be considered important at the time of entry
  history or changes of the data not registered
Missing data may need to be inferred.

How to Handle Missing Data?

Ignore the tuple: usually done when class label is missing (assuming the task is classification); not effective when the percentage of missing values per attribute varies considerably.
Fill in the missing value manually: tedious + infeasible?
Fill it in automatically with
  a global constant: e.g., "unknown", a new class?!
  the attribute mean
  the attribute mean for all samples belonging to the same class: smarter
  the most probable value: inference-based such as Bayesian formula or decision tree
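Two of the automatic fill-in strategies, the attribute mean and the per-class attribute mean, can be sketched as follows. The rows, class labels, and function names are hypothetical, and the sketch assumes every class has at least one known value.

```python
from statistics import mean

# Hypothetical tuples: (class label, income); None marks a missing value.
rows = [("gold", 90.0), ("gold", None), ("gold", 110.0),
        ("basic", 30.0), ("basic", 50.0), ("basic", None)]

def fill_with_attribute_mean(rows):
    """Replace each missing value with the mean over all known values."""
    overall = mean(v for _, v in rows if v is not None)
    return [(c, overall if v is None else v) for c, v in rows]

def fill_with_class_mean(rows):
    """Replace each missing value with the mean of known values in the same class."""
    by_class = {}
    for c, v in rows:
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_mean = {c: mean(vs) for c, vs in by_class.items()}
    return [(c, class_mean[c] if v is None else v) for c, v in rows]

print(fill_with_attribute_mean(rows))
print(fill_with_class_mean(rows))
```

The overall mean fills both gaps with the same value, while the class mean gives each class its own estimate, which is why the slide calls it the smarter choice.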
Noisy Data
Noise: random error or variance in a measured variable
Incorrect attribute values may be due to
  faulty data collection instruments
  data entry problems
  data transmission problems
  technology limitation
  inconsistency in naming convention
Other data problems which require data cleaning
  duplicate records
  incomplete data
  inconsistent data
How to Handle Noisy Data?

Binning
  first sort data and partition into (equal-frequency) bins
  then one can smooth by bin means, smooth by bin median, smooth by bin boundaries, etc.
Regression
  smooth by fitting the data into regression functions
Clustering
  detect and remove outliers
Combined computer and human inspection
  detect suspicious values and check by human (e.g., deal with possible outliers)
Simple Discretization Methods: Binning

Equal-width (distance) partitioning
  Divides the range into N intervals of equal size: uniform grid
  if A and B are the lowest and highest values of the attribute, the width of intervals will be: W = (B - A)/N.
  The most straightforward, but outliers may dominate presentation
  Skewed data is not handled well
Equal-depth (frequency) partitioning
  Divides the range into N intervals, each containing approximately same number of samples
  Good data scaling
  Managing categorical attributes can be tricky
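The two partitioning schemes can be compared on a small sample. This is a sketch: the function names are illustrative, the price list is the sample used in this deck's smoothing example, and equal_width_bins assumes the values are not all identical (otherwise W would be zero).

```python
def equal_width_bins(values, n):
    """Split the value range [A, B] into n intervals of width W = (B - A) / n."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    bins = [[] for _ in range(n)]
    for v in values:
        # The maximum value would land at index n, so clamp it into the last bin.
        index = min(int((v - lo) / width), n - 1)
        bins[index].append(v)
    return bins

def equal_depth_bins(values, n):
    """Split the sorted values into n bins holding about the same number of samples."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n)
    bins, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
width_bins = equal_width_bins(prices, 3)
depth_bins = equal_depth_bins(prices, 3)
print(width_bins)
print(depth_bins)
```

With W = (34 - 4)/3 = 10 the equal-width bins hold 3, 3, and 6 values, while the equal-depth bins hold 4 each, which shows how uneven data fills equal-width intervals unevenly.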
Binning Methods for Data Smoothing

Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
  - Bin 1: 4, 8, 9, 15
  - Bin 2: 21, 21, 24, 25
  - Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
  - Bin 1: 9, 9, 9, 9
  - Bin 2: 23, 23, 23, 23
  - Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
  - Bin 1: 4, 4, 4, 15
  - Bin 2: 21, 21, 25, 25
  - Bin 3: 26, 26, 26, 34
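The smoothing steps can be reproduced with a short sketch. The function names are illustrative. Two details are assumptions of this example: bin means are rounded to whole dollars, and a value equally close to both boundaries goes to the lower one.

```python
def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean, rounded to a whole dollar."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        # Ties go to the lower boundary (an assumption of this sketch).
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Equal-frequency bins from the price example.
bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
print(by_means)
print(by_boundaries)
```

The bin means are 9, 22.75, and 29.25 before rounding, which gives the 9, 23, and 29 shown in the example.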
Regression
[Figure: regression line y = x + 1 on x/y axes, with labeled points X1 and Y1]
[Figure: Cluster Analysis]
