
Chapter 2

Database System Concepts and Architecture

2.1 Data Models, Schemas, and Instances


2.2 DBMS Architecture and Data Independence
2.3 Database Languages and Interfaces
2.4 The Database System Environment
2.5 Classification of Database Management Systems
2.6 Summary

2-1 2-1
2.1 Data Models, Schemas, and Instances
Data Model: A set of concepts to describe the structure of a
database, and certain constraints that the database should obey.
‧structure: data types, relationships
‧provides data abstraction

Data Model Operations: Operations for specifying database
retrievals and updates by referring to the concepts of the data
model.
‧generic operations: insert, delete, modify, retrieve
‧user-defined operations
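
The distinction between generic and user-defined operations can be sketched in a few lines of Python. This is only an illustration: the `Student` record, the `StudentFile` class, and the sample names are hypothetical and do not come from any particular DBMS.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """A hypothetical STUDENT record used only for illustration."""
    number: int
    name: str
    major: str


class StudentFile:
    """A toy collection of STUDENT records keyed by student number."""

    def __init__(self):
        self._records = {}

    # --- generic operations: insert, delete, modify, retrieve ---
    def insert(self, student):
        if student.number in self._records:
            raise ValueError("duplicate student number")
        self._records[student.number] = student

    def delete(self, number):
        del self._records[number]

    def modify(self, number, **changes):
        record = self._records[number]
        for field, value in changes.items():
            if not hasattr(record, field):
                raise AttributeError(field)
            setattr(record, field, value)

    def retrieve(self, predicate=lambda s: True):
        return [s for s in self._records.values() if predicate(s)]

    # --- user-defined operation, built on top of the generic ones ---
    def count_by_major(self, major):
        return len(self.retrieve(lambda s: s.major == major))


students = StudentFile()
students.insert(Student(17, "Smith", "CS"))
students.insert(Student(8, "Brown", "CS"))
students.insert(Student(25, "Lee", "MATH"))
students.modify(25, major="CS")
students.delete(8)
cs_count = students.count_by_major("CS")
```

The generic operations apply to any record type, whereas `count_by_major` only makes sense for this particular application.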

2-2 2-2
2.1.1 Categories of Data Models:
- Conceptual (high-level, semantic) data models: Provide concepts that
are close to the way many users perceive data.
(Also called entity-based or object-based data models.)
‧entity ‧attribute ‧relationship

- Physical (low-level, internal) data models: Provide concepts that
describe details of how data is stored in the computer.
‧record formats ‧record ordering ‧access paths

- Implementation (record-oriented) data models: Provide concepts that
fall between the above two, balancing user views with some computer
storage details.
‧relational ‧network ‧hierarchical
2-2 2-3
2.1.2 Schemas, Instances and Database State
cf database
Database Schema (meta-data): The description of a database. Includes
descriptions of the database structure and the constraints that should hold
on the database.

Schema Diagram: A diagrammatic display of (some aspects of) a
database schema. (refer to Fig 2.1, 2-5)

Database Instance: The actual data stored in a database at a particular
moment in time. Also called database state (or occurrence, snapshot).
(refer to Fig 1.2, 2-6)

Each schema construct has its own current set of instances.

The database schema changes very infrequently. The database state
changes every time the database is updated. Schema is also called
intension, whereas state is called extension.
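
A minimal sketch of schema versus state, using Python's built-in sqlite3 module as a stand-in DBMS. The table layout, the helper names `schema` and `state`, and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema (intension): structure plus constraints.
conn.execute("""
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        class_year     INTEGER CHECK (class_year BETWEEN 1 AND 5),
        major          TEXT
    )
""")


def schema(conn):
    """The stored description of the database, read from SQLite's catalog."""
    return [row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'")]


def state(conn):
    """The current database state (extension): the rows stored right now."""
    return conn.execute(
        "SELECT student_number, name, class_year, major FROM student "
        "ORDER BY student_number").fetchall()


schema_before = schema(conn)
empty_state = state(conn)            # right after definition: no data yet

conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [(17, "Smith", 1, "CS"), (8, "Brown", 2, "CS")])
initial_state = state(conn)          # after loading

conn.execute("UPDATE student SET class_year = 2 WHERE student_number = 17")
later_state = state(conn)            # every update yields a new state

# An update that would violate a schema constraint is rejected,
# so the database stays in a valid state.
try:
    conn.execute("INSERT INTO student VALUES (99, NULL, 1, 'CS')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

schema_after = schema(conn)          # unchanged by all of the above
```

The state goes through three different values here while the schema stays the same throughout.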
2-3 2-4
Figure 2.1 Schema diagram for UNIVERSITY database

schema construct

Known data: names of record types and data items

2-4a 2-5
Figure 1.2 UNIVERSITY Database

2-4 2-6
define -> empty state -> load -> initial state -> update -> valid state -> update
valid state: a state that satisfies the database schema

2-3 2-7
2.2 DBMS Architecture and Data Independence

2.2.1 Three-Schema Architecture


Proposed to support DBMS characteristics of:
- Insulation of programs and data, and of programs and operations
(program-data and program-operation independence)
- Support of multiple views of the data.
- Use of catalog (database description)

Defines DBMS schemas at three levels: (see 2-9)
- Internal schema at the internal level to describe data storage structures and access
paths. Typically uses a physical data model.
- Conceptual schema at the conceptual level to describe the structure and constraints
for the whole database. Uses a conceptual or an implementation data model.
- External schema at the external level to describe the various user views. Usually
uses the same data model as the conceptual level or high-level data model.
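
As a rough analogy only, the three levels can be imitated with Python's sqlite3 module: a table stands in for the conceptual schema, an index for an internal-level access path, and a view for an external schema. SQLite does not formally separate the three levels, and the names `section_by_term` and `teaching_schedule` and the sample rows are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: structure and constraints for the whole database.
conn.execute("""
    CREATE TABLE section (
        section_id    INTEGER PRIMARY KEY,
        course_number TEXT NOT NULL,
        semester      TEXT NOT NULL,
        year          INTEGER NOT NULL,
        instructor    TEXT
    )
""")

# Internal level: a storage structure / access path.
conn.execute("CREATE INDEX section_by_term ON section (year, semester)")

# External level: one user view of the data.
conn.execute("""
    CREATE VIEW teaching_schedule AS
    SELECT instructor, course_number, semester, year FROM section
""")

conn.executemany("INSERT INTO section VALUES (?, ?, ?, ?, ?)", [
    (85, "MATH2410", "Fall", 1998, "King"),
    (92, "CS1310", "Fall", 1998, "Anderson"),
    (102, "CS3320", "Spring", 1999, "Knuth"),
])

# A user program refers only to the external schema (the view).
schedule = conn.execute(
    "SELECT instructor, course_number FROM teaching_schedule "
    "WHERE year = 1998 ORDER BY instructor").fetchall()

# All three descriptions are recorded in the catalog.
catalog = dict(conn.execute(
    "SELECT name, type FROM sqlite_master WHERE name NOT LIKE 'sqlite_%'"))
```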

Mappings among schema levels are also needed. Programs refer to an external
schema, and are mapped by the DBMS to the internal schema for execution.
2-5 2-8
Figure 2.2 The three-schema architecture

2-6 2-9
2.2.2 Data Independence
Logical Data Independence: The capacity to change the conceptual schema without
having to change the external schemas and their application programs.
e.g. adding or removing a record type or data item to
· expand the database (2-11)
· reduce the database

Physical Data Independence: The capacity to change the internal schema without
having to change the conceptual schema.
e.g. reorganizing physical files to improve performance; a query such as
"List all sections offered in Fall 1998" need not be changed.
When a schema at a lower level is changed, only the mappings between this
schema and higher-level schemas need to be changed in a DBMS that fully supports
data independence. The higher-level schemas themselves are unchanged. Hence, the
application programs need not be changed since they refer to the external schemas.

Disadvantages of two levels of mappings:
Overhead during compilation or execution of a query or program
2-7 2-10
UNIVERSITY Conceptual Schema
STUDENT (Name, Student Number, Class, Major)
COURSE (Course Name, Course Number, Credit, Dept)
PREREQUISITE (Course Number, Prerequisite Number)
SECTION (Section Id, Course Number, Semester, Year, Instructor)
GRADE_REPORT (Student Number, Section Id, Grade)

UNIVERSITY External Schema
TRANSCRIPT (Student Name, Course Number, Grade, Semester, Year, Section Id)
derived from STUDENT, SECTION, GRADE_REPORT
PREREQUISITES (Course Name, Course Number, Prerequisites)
derived from PREREQUISITE, COURSE

Change GRADE_REPORT Schema Construct
GRADE_REPORT (Student Number, Student Name, Section Id, Course Number, Grade)

Change Mapping (& View Definition)
TRANSCRIPT derived from SECTION, GRADE_REPORT
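
The TRANSCRIPT example can be tried out with Python's sqlite3 module. This is a sketch under assumptions: SQL views stand in for external schemas, column names are lower-cased versions of the attributes above, the sample rows are invented, and `application_query` plays the role of an application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE section (section_id INTEGER PRIMARY KEY, course_number TEXT,
                          semester TEXT, year INTEGER);
    CREATE TABLE grade_report (student_number INTEGER, section_id INTEGER,
                               grade TEXT);
    INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
    INSERT INTO section VALUES (85, 'MATH2410', 'Fall', 1998),
                               (92, 'CS1310', 'Fall', 1998);
    INSERT INTO grade_report VALUES (17, 85, 'A'), (8, 92, 'B');

    -- External schema: TRANSCRIPT derived from STUDENT, SECTION, GRADE_REPORT.
    CREATE VIEW transcript AS
    SELECT st.name AS student_name, se.course_number AS course_number,
           g.grade AS grade, se.semester AS semester, se.year AS year,
           g.section_id AS section_id
    FROM grade_report g
    JOIN student st ON st.student_number = g.student_number
    JOIN section se ON se.section_id = g.section_id;
""")


def application_query(conn):
    """The 'application program': it refers only to the external schema."""
    return conn.execute(
        "SELECT student_name, course_number, grade FROM transcript "
        "ORDER BY student_name").fetchall()


before = application_query(conn)

conn.executescript("""
    -- Conceptual schema change: GRADE_REPORT gains Student Name, Course Number.
    ALTER TABLE grade_report ADD COLUMN student_name TEXT;
    ALTER TABLE grade_report ADD COLUMN course_number TEXT;
    UPDATE grade_report SET
        student_name = (SELECT name FROM student
                        WHERE student.student_number = grade_report.student_number),
        course_number = (SELECT course_number FROM section
                         WHERE section.section_id = grade_report.section_id);

    -- Changed mapping: TRANSCRIPT is now derived from SECTION, GRADE_REPORT only.
    DROP VIEW transcript;
    CREATE VIEW transcript AS
    SELECT g.student_name AS student_name, g.course_number AS course_number,
           g.grade AS grade, se.semester AS semester, se.year AS year,
           g.section_id AS section_id
    FROM grade_report g
    JOIN section se ON se.section_id = g.section_id;
""")

after = application_query(conn)      # same program, same answer
```

Only the view definition (the mapping) was rewritten; `application_query` itself is untouched and returns the same rows before and after the change.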
2-7a 2-11
2.3 Database Languages and Interfaces
The DBMS must provide appropriate languages and interfaces for each category of users.

2.3.1 DBMS Languages


Data Definition Language (DDL): Used by the DBA and database designers to
specify the conceptual schema of a database. In many DBMSs, the DDL is also
used to define internal and external schemas (views). In some DBMSs, a separate
storage definition language (SDL) and view definition language (VDL) are
used to define internal and external schemas.
DDL statements are processed by the DDL compiler.

Data Manipulation Language (DML): Used to specify database retrievals and
updates (insertions, deletions, modifications).

- DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language).

- Alternatively, stand-alone DML commands can be applied directly (query
language).
2-8 2-12
Types of DML

-Procedural DML:
• Also called record-at-a-time (record-oriented) or low-level DML
• Must be embedded in a programming language.
• Searches for and retrieves individual database records and uses looping
and other constructs of the host programming language to retrieve multiple
records.

-Declarative or non-procedural DML:


• Also called set-at-a-time (set-oriented) or high-level DML.
• Can be used as a stand-alone query language or can be embedded in a
programming language.
• Searches for and retrieves information from multiple related database
records in a single command.

- host language: general-purpose programming language, e.g. C++
- data sublanguage: DML
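
The two styles can be contrasted with Python as the host language and SQL (through the sqlite3 module) as the data sublanguage. This is only an imitation of a procedural DML: the first half fetches records one at a time and does the selection in host-language code, while the second half states the whole request in one command. The table and rows are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (section_id INTEGER PRIMARY KEY, "
             "course_number TEXT, semester TEXT, year INTEGER)")
conn.executemany("INSERT INTO section VALUES (?, ?, ?, ?)", [
    (85, "MATH2410", "Fall", 1998),
    (92, "CS1310", "Fall", 1998),
    (102, "CS3320", "Spring", 1999),
])

# Record-at-a-time style: the host program retrieves individual records
# and uses its own loop and tests to pick out the ones it wants.
record_at_a_time = []
cursor = conn.execute(
    "SELECT section_id, semester, year FROM section ORDER BY section_id")
while True:
    record = cursor.fetchone()
    if record is None:
        break
    section_id, semester, year = record
    if semester == "Fall" and year == 1998:
        record_at_a_time.append(section_id)

# Set-at-a-time style: one declarative command says WHAT is wanted;
# the DBMS decides how to find it.
set_at_a_time = [row[0] for row in conn.execute(
    "SELECT section_id FROM section "
    "WHERE semester = ? AND year = ? ORDER BY section_id",
    ("Fall", 1998))]
```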
2-9 2-13
2.3.2 DBMS Interfaces
- Stand-alone query language interfaces. (casual end user)

- Programmer interfaces for embedding DML in programming
languages: (programmer)
-Pre-compiler Approach
-Procedure (Subroutine) Call Approach

- User-friendly interfaces:
-Menu-based Interfaces for Browsing.
-Forms-based Interfaces.
-Graphical User Interfaces.
-Natural language Interfaces
-Combination of the above

- Interfaces for Parametric Users (using function keys)

- Interfaces for the DBA:
-Creating accounts, granting authorizations
-Setting system parameters
-Changing schemas or access paths
2-10 2-14
2.4 The Database System Environment
2.4.1 DBMS Component Modules

Figure 2.3
2-11 2-15
2.4.2 Database System Utilities

To perform certain functions such as:

- Loading data stored in files into a database (includes data conversion tools).
- Backing up the database periodically on storage.
- Reorganizing database file structures.
- Report generation utilities.
- Performance monitoring utilities.
- Other functions, such as sorting, user monitoring,
data compression, etc.
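
A small sketch of what a loading utility and a backup utility do, written with Python's csv and sqlite3 modules. The file contents, the `load_students` helper, and the table layout are assumptions for illustration; real DBMS utilities are separate tools with many more options.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a student file; a real utility would read from disk.
STUDENT_FILE = io.StringIO(
    "student_number,name,major\n"
    "17,Smith,CS\n"
    "8,Brown,CS\n"
)


def load_students(conn, source):
    """Convert each line of the source file into a row of the student table."""
    rows = [(int(r["student_number"]), r["name"], r["major"])
            for r in csv.DictReader(source)]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student "
           "(student_number INTEGER PRIMARY KEY, name TEXT, major TEXT)")
loaded = load_students(db, STUDENT_FILE)

# Backup: copy the whole database into a second connection.
backup_db = sqlite3.connect(":memory:")
db.backup(backup_db)
backup_count = backup_db.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```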

2-12 2-16
2.4.3 Tools, Application Environments, and
Communications Facilities

Data dictionary utility:


- Used to store schema descriptions and other information such as design
decisions, application program descriptions, user information, usage
standards, etc. (comment)
-Active data dictionary is accessed by DBMS software and users/DBA.
-Passive data dictionary is accessed by users/DBA only.
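
As a loose illustration of stored schema descriptions, SQLite keeps its own catalog in the `sqlite_master` table, which both the DBMS software and users can query. The sketch below reads table and column names back out of it; the `describe` helper and the two sample tables are assumptions, and a full data dictionary would hold much more (design decisions, usage standards, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_number TEXT PRIMARY KEY, course_name TEXT,
                         credit INTEGER, dept TEXT);
    CREATE TABLE prerequisite (course_number TEXT, prerequisite_number TEXT);
""")


def describe(conn):
    """Return {table name: [column names]} read from SQLite's own catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


dictionary = describe(conn)
```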

Communications Facilities
- Allow users at locations remote from the database system site to access
the database.
The integrated DBMS and data communications system is called a DB/DC system.

2-12 2-17
2.5 Classification of Database Management Systems
Based on the data model used:
•Data models
-Traditional: Relational, Network (see 2-19), Hierarchical
- Emerging: Object-oriented, Semantic, Entity-Relationship, other.

Other classifications:
•Number of users : Single-user (typically used with personal computers) vs.
multi-user (most DBMSs)

•Number of sites:
Centralized (uses a single computer) vs. distributed (uses multiple computers).
Distributed DBMSs: homogeneous vs. heterogeneous
•Cost of DBMS software: $10,000~100,000 vs. $100~3,000
•Types of access paths used (inverted file structures, …)
•Purpose: general purpose vs. special purpose
e.g. airline reservations, telephone directory, on-line transaction
processing system
2-13 2-18
Figure 2.4 A Network Schema

2-14 2-19
