I
History of Database Systems
A brief history of database systems:
o 1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks such as payroll were automated, with data stored on tapes. Processing of data consisted of reading data from one or more tapes and writing data to a new tape. Tapes could be read only sequentially, and data sizes were much larger than main memory.
o Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed data processing greatly, since hard disks allow direct access to data. Unlike tapes, hard disks support random access to data. This feature was used to develop network and hierarchical databases. Disadvantages:
- Related data had to be stored in consecutive blocks. For example, a student's marks record had to be stored right next to the student details record. This layout answered the originally planned queries efficiently but did not serve new, complex queries.
- Data reorganization was time consuming.
- Every query had to be programmed.
o 1980s: Relational databases were not used in practice during the 1960s and 1970s. With the development of System R, the implementation of relational databases improved drastically. System R, an IBM research prototype based on SEQUEL/2 (the precursor of SQL), was built to validate the feasibility of the relational model. SQL/DS, IBM DB2, Oracle, Ingres, and DEC Rdb are some of the earlier relational database systems. Advantages:
- Easy implementation.
- Programmers are relieved of low-level implementation details.
- Queries need not be coded in a procedural fashion.
- Programmers work only at the logical level.
o 1990s: SQL, which originated in the 1970s and was first standardized in the 1980s, was widely used by the early 1990s for query-intensive applications.
Databases and database systems are an essential component of everyday life in modern society. Most of us encounter several activities every day that involve some interaction with a database. For example, if we go to the bank to deposit or withdraw funds, make a hotel or airline reservation, access a computerized library catalog to search for a bibliographic item, or purchase something online such as a book, toy, or computer, chances are that our activities will involve someone or some computer program accessing a database. Even purchasing items at a supermarket, in many cases, automatically updates the database that holds the inventory of grocery items.
Data vs Information
The terms data and information are important to the design of databases, so it helps to understand the difference between them.
What is Data?
Data are raw facts. The word raw indicates that the facts have not yet been processed to reveal their meaning.
What is Information?
Information is the result of processing raw data to reveal its meaning. To reveal meaning, information requires context. For example, an average temperature reading of 105 degrees does not mean much unless you also know its context: Is this in degrees Fahrenheit or Celsius? Is this a machine temperature, a body temperature, or an outside air temperature? Information can be used as the foundation for decision making.
Data are the foundation of information, which is the bedrock of knowledge, that is, the body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic of knowledge is that new knowledge can be derived from old knowledge.
Let's summarize some key points:
o Data constitute the building blocks of information.
o Information is produced by processing data.
o Information is used to reveal the meaning of data.
o Accurate, relevant, and timely information is the key to good decision making.
o Good decision making is the key to organizational survival in a global environment.
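The step from data to information can be illustrated with a minimal Python sketch. The readings, the unit, and the "machine temperature" label are made-up values chosen only for illustration; the point is that the computed average becomes information only once its context is attached.

```python
from statistics import mean

# Raw data: bare numbers with no meaning attached (hypothetical values).
readings = [104.0, 105.5, 105.5]

# Processing the data: compute an average.
average = mean(readings)

# Context (assumed here) is what turns the processed value into information.
context = {"unit": "degrees Fahrenheit", "source": "machine temperature"}
information = f"Average {context['source']}: {average:.1f} {context['unit']}"
print(information)
```

Without the `context` dictionary, the value 105.0 alone could not support any decision.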
Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. The database definition or descriptive information is also stored in the database in the form of a database catalog or dictionary; it is called meta-data. Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously.
[Figure: Database System. Components shown: Users/Programmers, Application Programs/Queries, DBMS Software, Stored Database]
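The defining, constructing, and manipulating functions described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite stands in for a full DBMS, and the table name, column names, and the updated location value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: data types, structure, and constraints.
conn.execute(
    "CREATE TABLE customer ("
    " cacc INTEGER PRIMARY KEY,"
    " cname TEXT NOT NULL,"
    " loca TEXT NOT NULL)"
)

# Constructing: storing the data under the control of the DBMS.
conn.executemany(
    "INSERT INTO customer (cacc, cname, loca) VALUES (?, ?, ?)",
    [(101, "Suresh", "Chennai"), (102, "Ramu", "Nellore"), (103, "Srinivas", "Chennai")],
)

# Manipulating: updating and querying.
conn.execute("UPDATE customer SET loca = 'Gudur' WHERE cacc = 102")
chennai = [row[0] for row in conn.execute(
    "SELECT cname FROM customer WHERE loca = 'Chennai' ORDER BY cacc")]

# Meta-data: SQLite keeps table definitions in its catalog table sqlite_master.
catalog = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(chennai, catalog)
```

The last query reads the catalog itself, which shows that the definition of the database is stored alongside the data.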
Data Redundancy
In file-based systems, each application has its own set of files. Each application creates and stores data just for its own use. For example, a bank could have three separate applications: one for current accounts, one for savings accounts, and one for loan accounts. If a bank customer has a current account, a savings account, and a loan account, data about that customer, such as name and address, are kept separately in each of these applications. There is unnecessary and uncontrolled duplication of the same data in the bank's computer files. Obviously, data duplication results in wasted storage space.
Current Account File
CAcc   CName      Loca
101    Suresh     Chennai    101
102    Ramu       Nellore    102
103    Srinivas   Chennai    103
Data Inconsistency
Data redundancy, or duplication of data in computer files, can cause serious inconsistency in the data. Suppose a bank customer has both a current account and a savings account. If that person's name, address, or both are different in the two accounts, the data about that customer are inconsistent in the bank's files. Which set of data is correct? It is possible that the name of the customer is correct in one application and the address is correct in the other. Inconsistency of data is a direct result of data duplication.
Savings Account File
CAcc   CName      Loca       LAcc
101    Suresh     Chennai    101
102    Sindhrui   Nellore    102
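A short Python sketch shows how duplication leads to inconsistency, using the values from the two account files above. It assumes that the number in the first column identifies the same customer in both files; the function name is hypothetical.

```python
# Each application keeps its own copy of the customer data.
current_accounts = {
    101: {"cname": "Suresh", "loca": "Chennai"},
    102: {"cname": "Ramu", "loca": "Nellore"},
    103: {"cname": "Srinivas", "loca": "Chennai"},
}
savings_accounts = {
    101: {"cname": "Suresh", "loca": "Chennai"},
    102: {"cname": "Sindhrui", "loca": "Nellore"},
}

def find_inconsistencies(file_a, file_b):
    """Return (key, field, value_a, value_b) for every field that disagrees."""
    problems = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for field in file_a[key]:
            if file_a[key][field] != file_b[key].get(field):
                problems.append((key, field, file_a[key][field], file_b[key].get(field)))
    return problems

# Customers stored twice (redundancy) and fields that disagree (inconsistency).
duplicated = sorted(current_accounts.keys() & savings_accounts.keys())
conflicts = find_inconsistencies(current_accounts, savings_accounts)
print(duplicated)
print(conflicts)
```

The check can report that the two copies disagree, but it cannot tell which copy is correct.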
Data Isolation
Because data are scattered in various files, and the files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
Data Integrity
The data values stored in the database must satisfy certain types of constraints. Suppose a bank maintains savings accounts and requires that the balance never fall below zero. Developers enforce such constraints in the system by adding appropriate code to the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.
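For contrast, a DBMS allows such a rule to be declared once, with the data, instead of being coded in every application program. The following is a minimal sketch using Python's sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance may never fall below zero" is declared once, in the schema.
conn.execute(
    "CREATE TABLE savings_account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance NUMERIC NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO savings_account VALUES (101, 200)")
conn.commit()

try:
    # An overdraft attempted by any program is rejected by the database itself.
    conn.execute("UPDATE savings_account SET balance = balance - 500 WHERE acc_no = 101")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM savings_account WHERE acc_no = 101").fetchone()[0]
print(rejected, balance)
```

Adding a new constraint then means changing the schema in one place rather than revisiting every program that touches the file.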
Data Dependence
The physical structure and storage of the data files and records are defined in the application code. This means that changes to an existing structure are difficult to make. For example, increasing the size of the Customer Address field from 40 to 41 characters sounds like a simple change, but it requires the creation of a one-off program (i.e., a program that is run only once and can then be discarded) that converts the Customer file to the new format. This program has to:
o Open the original Customer file for reading;
o Open a temporary file with the new structure;
o Read a record from the original file, convert the data to conform to the new structure, and write it to the temporary file. Repeat this step for all records in the original file;
o Delete the original Customer file;
o Rename the temporary file as Customer.
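The conversion steps listed above can be sketched in Python as follows. The record layout is an assumption made for illustration: fixed-width text records with a 20-character name field followed by the address field, one record per line.

```python
import os
import tempfile

NAME_WIDTH = 20              # hypothetical width of the name field
OLD_ADDR, NEW_ADDR = 40, 41  # address field width before and after the change

def convert_customer_file(path):
    """One-off conversion of fixed-width records to the wider address field."""
    tmp_path = path + ".tmp"
    # Open the original for reading and a temporary file with the new structure.
    with open(path, "r", newline="") as src, open(tmp_path, "w", newline="") as dst:
        for line in src:  # repeat for every record in the original file
            record = line.rstrip("\n")
            name = record[:NAME_WIDTH]
            address = record[NAME_WIDTH:NAME_WIDTH + OLD_ADDR]
            dst.write(name + address.ljust(NEW_ADDR) + "\n")
    # Delete the original file, then rename the temporary file.
    os.remove(path)
    os.rename(tmp_path, path)

# Usage with a throwaway file.
workdir = tempfile.mkdtemp()
customer_path = os.path.join(workdir, "Customer")
with open(customer_path, "w", newline="") as f:
    f.write("Suresh".ljust(NAME_WIDTH) + "Chennai".ljust(OLD_ADDR) + "\n")
convert_customer_file(customer_path)
with open(customer_path, newline="") as f:
    new_record = f.readline().rstrip("\n")
print(len(new_record))
```

Every application program that reads the Customer file would also have to be changed to expect the new record length, which is the real cost of data dependence.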
Atomicity Problems
A computer system, like any other device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $500 from the account balance of A to the account balance of B. If a system failure occurs during the execution of the program, it is possible that the $500 was removed from the balance of A but was not credited to the balance of B, leaving the data in an inconsistent state. To prevent this, the funds transfer must be atomic: it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
[Figure: Transfer of $500 from account A (balance $1000) to account B (balance $200): -$500 from A, +$500 to B]
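A DBMS addresses this problem with transactions. The sketch below uses Python's sqlite3 module with the balances from the example ($1000 for A, $200 for B); the table name, the `transfer` function, and the simulated failure are hypothetical and serve only to show the all-or-nothing behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated system failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # undo the debit so that no money disappears
        raise

# A failure after the debit leaves both balances unchanged.
try:
    transfer(conn, "A", "B", 500, fail_midway=True)
except RuntimeError:
    pass
after_failure = dict(conn.execute("SELECT name, balance FROM account"))

# A transfer that completes applies both updates.
transfer(conn, "A", "B", 500)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(after_failure, after_success)
```

In a file-processing system, the equivalent of `rollback` would have to be written by hand in every program that updates more than one record.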
[Figure: User 1 and User 2 accessing the same FILE]
Security Problems
Information is a corporate asset and, therefore, must be protected through proper security controls. In file-oriented systems, security controls cannot be established easily. Not every user of the database system should be able to access all the data. For example, in a university, payroll personnel need to see only that part of the database that has financial information. They do not need access to information about academic records. But since application programs are added to the file-processing system in an ad hoc manner, enforcing such security constraints is difficult.
Database Designer
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements.
End Users
End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use.
Database Systems
The problems inherent in file systems make using a database system very desirable. Unlike the file system, with its many separate and unrelated files, the database system consists of logically related data stored in a single logical data repository. The current generation of DBMS software also takes care of defining, storing, and managing all required access paths to those components. Remember that the DBMS is just one of several crucial components of a database system. The DBMS may even be referred to as the database system's heart. The term database system refers to an organization of components that define and regulate the collection, storage, management, and use of data within a database environment. The database system is composed of five major parts:
1. Hardware
2. Software
3. People
4. Procedures
5. Data
[Figure: The database system environment. Labels: DBA, Database Designer, Programmer, Hardware, Procedures, DBMS, Data; actions: writes, manages, designs, use]
Hardware
Hardware refers to all of the system's physical devices. The hardware can range from a single personal computer, to a single mainframe, to a network of computers. Examples include computers, storage devices, printers, network devices (hubs, switches, routers, fiber optics), and other devices.
Software
The software component comprises the DBMS software itself and the application programs, together with the operating system. To make the database system function fully, three types of software are needed:
o Operating system software manages all hardware components and makes it possible for all other software to run on the computers. Examples include Microsoft Windows and Linux.
o DBMS software manages the database within the database system. Examples include Microsoft SQL Server, Oracle Corporation's Oracle, and MySQL.
o Application programs and utility software are used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place. Application programs are most commonly used to access data in the database to generate reports, tabulations, and other information.
People
This component includes all users of the database system. Here we identify four types of users in a database system:
1. Database Administrator (DBA): manages the DBMS and ensures that the database is functioning properly.
2. Database Designer: designs the database structure.
3. System Analysts and Application Programmers: design and implement the application programs. They design and create the data entry screens, reports, and procedures through which end users access and manipulate the database's data.
4. End Users: the people who use the application programs to run the organization's daily operations.
Data
Data is perhaps the most important component of the DBMS environment, certainly from the end user's point of view. We observe that the data act as a bridge between the machine components and the human components. The database contains both the operational data and the metadata.
Procedures
Procedures are the instructions and rules that govern the design and use of the database system. The users of the system and the staff who manage the database require documented procedures on how to use or run the system. These may consist of instructions on how to:
o Log on to the DBMS;
o Use a particular DBMS facility or application program;
o Start and stop the DBMS;
o Make backup copies of the database;
o Handle hardware or software failures. This may include procedures on how to identify the failed component, how to fix the failed component, and how to recover the database;
o Change the structure of a table, reorganize the database across multiple disks, improve performance, or archive data to secondary storage.
Disadvantages of DBMS
Complexity
The provision of the functionality we expect of a good DBMS makes the DBMS an extremely complex piece of software. Database designers and developers, the data and database administrators, and end users must understand this functionality to take full advantage of it. Failure to understand the system can lead to bad design decisions, which can have serious consequences for an organization.
Size
The complexity and breadth of functionality make the DBMS an extremely large piece of software, occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
Cost of DBMSs
The cost of DBMSs varies significantly, depending on the environment and functionality provided. For example, a single-user DBMS for a personal computer may cost only US$100. However, a large mainframe multi-user DBMS servicing hundreds of users can be extremely expensive, perhaps US$100,000 or even US$1,000,000.
AUDISANKARA COLLEGE OF ENGINEERING FOR WOMEN, GUDURU 12
Performance
A file-based system is written for a specific application, such as invoicing. As a result, performance is generally very good. However, the DBMS is written to be more general, to cater for many applications rather than just one. The effect is that some applications may not run as fast as they used to.
*****
Review Questions
1. Discuss each of the following terms:
   a. Data
   b. Field
   c. Record
   d. File
2. What is data redundancy, and which characteristics of the file system can lead to it?
3. What is a DBMS, and what are its functions?
4. Explain the difference between data and information.
5. What is the role of a DBMS, and what are its advantages? What are its disadvantages?
6. What is metadata?
7. Explain why database design is important.
8. Explain the disadvantages of a file-based system.
9. What are the responsibilities of the DBA and the database designers?
10. Define the following terms: data, database, DBMS, database system, DBA, meta-data.
11. Discuss the advantages and disadvantages of a DBMS.
Data Modeling
Database design focuses on how the database structures will be used to store and manage end-user data. Data modeling, the first step in designing a database, refers to the process of creating a specific data model for a determined problem domain. (A problem domain is a clearly defined area within the real-world environment, with well-defined scope and boundaries, that is to be systematically addressed.) A data model is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event. A model's main function is to help you understand the complexities of the real-world environment. Within the database environment, a data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs with the purpose of supporting a specific problem domain.
Data modeling is an iterative, progressive process. You start with a simple understanding of the problem domain, and as your understanding of the problem domain increases, so does the level of detail of the data model. Done properly, the final data model is in effect a blueprint containing all the instructions to build a database that will meet all end-user requirements. This blueprint is narrative and graphical in nature. Traditionally, database designers relied on good judgment to help them develop a good data model.
Data Models
A data model is a collection of concepts that can be used to describe the structure of a database. By structure of a database, we mean the data types, relationships, and constraints that should hold for the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.
Hierarchical Model
The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system's record type. Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it. For example, in the following figure, the root segment is the parent of the level 1 segments. The segments below other segments are the children of the segment above. In short, the hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments.
[Figure: A hierarchical structure. Root segment: Final Assembly. Level 1 segments (root children): Component A, Component B, Component C. Level 2 segments (level 1 children): Assembly A, Assembly B, Assembly C. Lower-level segments: Part B, Part C, Part D, Part E.]
The hierarchical data model yielded many advantages over the file system model. In fact, many of the hierarchical data model's features formed the foundation for current data models. However, the hierarchical model had limitations: it was complex to implement and difficult to manage.
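The parent-child structure of the hierarchical model can be sketched in Python as a tree in which every segment has exactly one parent. The segment names are taken from the figure, but the specific parent-child links below are assumed for illustration, and the helper functions are hypothetical.

```python
# Each key is a parent segment; its list holds the children (a 1:M relationship).
# The links are assumed for illustration only.
hierarchy = {
    "Final Assembly": ["Component A", "Component B", "Component C"],
    "Component A": ["Assembly A", "Assembly B"],
    "Component B": ["Assembly C"],
    "Assembly A": ["Part B", "Part C"],
    "Assembly B": ["Part D", "Part E"],
}

def parent_of(segment):
    """Return the single parent of a segment, or None for the root."""
    for parent, children in hierarchy.items():
        if segment in children:
            return parent
    return None

def path_from_root(segment):
    """Return the chain of segments from the root down to the given segment."""
    path = [segment]
    while (parent := parent_of(path[-1])) is not None:
        path.append(parent)
    return list(reversed(path))

print(path_from_root("Part D"))
```

Reaching any segment requires following one fixed path from the root, which hints at why such structures were hard to manage when new kinds of queries arose.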
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes.
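A minimal sketch with Python's sqlite3 module shows tables representing both data and a relationship. The table names, column names, and rows are hypothetical; the shared `cust_id` column is what ties an account to its customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two relations; the shared cust_id column represents the relationship.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cname TEXT, loca TEXT)")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, cust_id INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Suresh", "Chennai"), (2, "Ramu", "Nellore")])
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(101, 1, 1000), (102, 2, 200), (103, 1, 50)])

# A declarative query combines the two tables through the common column.
rows = conn.execute(
    "SELECT c.cname, COUNT(*), SUM(a.balance)"
    " FROM customer c JOIN account a ON a.cust_id = c.cust_id"
    " GROUP BY c.cust_id ORDER BY c.cust_id"
).fetchall()
print(rows)
```

Each row of a table is a fixed-format record, and the query states what is wanted without describing how to navigate the stored records.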
Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a thing or object in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.
The importance of data modeling cannot be overstated. Data constitute the most basic information units employed by a system. Applications are created to manage data and to help transform data into information. But data are viewed in different ways by different people. The data environment requires an overall database blueprint based on an appropriate data model. Keep in mind that a house blueprint is an abstraction; you cannot live in the blueprint. Similarly, the data model is an abstraction; you cannot draw the required data out of the data model. Just as you are not likely to build a good house without a blueprint, you are equally unlikely to create a good database without first creating an appropriate data model.
Business Rules
A business rule is a policy, procedure, or standard that an organization has adopted. Business rules are very important in database design because they dictate controls that must be placed upon the data. Most business rules can be enforced through manual procedures that employees are directed to follow or through logic placed in the application programs. However, each of these can be circumvented: employees can forget or can choose not to follow a manual procedure, and application logic can be bypassed. Business rules can instead be implemented in the database as constraints, which are formally defined rules that restrict the data values in the database in some way.
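As an illustration of a business rule implemented as a constraint, the sketch below uses Python's sqlite3 module. The rule itself ("every account must belong to an existing customer") and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cname TEXT NOT NULL)")
# Hypothetical rule: every account must belong to an existing customer.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " cust_id INTEGER NOT NULL REFERENCES customer(cust_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Suresh')")
conn.execute("INSERT INTO account VALUES (101, 1)")  # allowed: customer 1 exists

try:
    conn.execute("INSERT INTO account VALUES (102, 99)")  # there is no customer 99
    violation_blocked = False
except sqlite3.IntegrityError:
    violation_blocked = True

account_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(violation_blocked, account_count)
```

Because the rule lives in the database, it applies to every program and every user, and it cannot be skipped by forgetting a manual procedure.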
Generation | Time | Model | Examples | Comments
First | 1960s-1970s | File system | VMS/VSAM | Used mainly on IBM mainframe systems
Second | 1970s | Hierarchical and network | IMS, ADABAS, IDS-II | Early database systems; navigational access
Third | | Relational | DB2, Oracle, MS SQL-Server, MySQL | Conceptual simplicity; Entity Relationship (ER) modeling and support for relational data modeling
Fourth | | Extended relational | Versant, FastObjects.Net, Objectivity/DB, DB/2 UDB, Oracle 10g | Support complex data; extended relational products support objects and data warehousing; Web databases become common
| | XML | dbXML, Tamino, DB2 UDB, Oracle 10g, MS SQL Server | Organization and management of unstructured data; relational and object models add support for XML documents
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users' interactions with the system:
[Figure: The three levels of data abstraction: the view level (several views, up to View N), the logical level, and the physical level]
The objective of the three-level architecture is to separate each user's view of the database from the way the database is physically represented. There are several reasons why this separation is desirable:
o Each user should be able to access the same data, but have a different customized view of the data.
o Each user should be able to change the way he or she views the data, and this change should not affect other users.
o Users should not have to deal directly with physical database storage details. In other words, a user's interaction with the database should be independent of storage considerations.
o The Database Administrator (DBA) should be able to change the database storage structures without affecting the users' views.
Physical Level or Internal Level
The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail. The internal level is concerned with such things as:
o Storage space allocation for data and indexes;
o Record descriptions for storage (with stored sizes for data items);
o Record placement;
o Data compression and data encryption techniques.
Logical Level or Conceptual Level
The middle level in the three-level architecture is the conceptual level, which describes what data are stored in the database and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. The conceptual level represents:
o All entities, their attributes, and their relationships;
o The constraints on the data;
o Semantic information about the data;
o Security and integrity information.
View Level or External Level
The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.
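The three levels can be illustrated with a small sketch using Python's sqlite3 module, in which a table stands for the logical level, a view for the view level, and an index for a change at the physical level. All table, view, index, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical (conceptual) level: what data are stored.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Suresh", "Payroll", 50000), (2, "Ramu", "Library", 40000)])

# View (external) level: one group of users sees only part of the database.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employee")
before = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()

# Physical (internal) level: adding an index changes storage structures only.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
print(before == after, before)
```

The view returns the same rows before and after the index is created, which is the kind of independence between levels that the three-level architecture aims for.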
Storage Manager
1. A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
2. The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system.
3. The storage manager translates the various DML statements into low-level file-system commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
o Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
o Transaction manager, which ensures that the database remains in a consistent (correct) state regardless of system failures, and that concurrent transaction executions proceed without conflicting.
o File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
o Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.
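The role of the buffer manager can be sketched in Python as follows. The least-recently-used (LRU) replacement policy, the class name, and the dictionary standing in for the disk are all assumptions made for illustration; the text does not say which policy a particular DBMS uses, and a real buffer manager must also write modified blocks back to disk.

```python
from collections import OrderedDict

class BufferManager:
    """Keeps at most `capacity` disk blocks in memory (LRU replacement, an assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk           # stands in for disk storage: block id -> contents
        self.capacity = capacity
        self.pool = OrderedDict()  # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)  # cache hit: mark as most recently used
            return self.pool[block_id]
        self.disk_reads += 1                 # cache miss: fetch the block from disk
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)    # evict the least recently used block
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {i: f"block-{i}" for i in range(10)}  # a "database" larger than the buffer
bm = BufferManager(disk, capacity=3)
for block_id in [0, 1, 0, 2, 3, 0, 1]:
    bm.get_block(block_id)
print(bm.disk_reads, list(bm.pool))
```

Seven block requests are served with five disk reads while only three blocks are ever held in memory at once.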
The storage manager implements several data structures as part of the physical system implementation:
o Data files, which store the database itself.
o Data dictionary, which stores metadata about the structure of the database, in particular the schema of the database.
o Indices, which provide fast access to data items that hold particular values.
The Query Processor
The query processor components include:
o DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
o DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. A query can usually be translated into any of a number of alternative evaluation plans that all give the same result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the alternatives.
o Query evaluation engine, which executes low-level instructions generated by the DML compiler.
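The idea that one query can have alternative evaluation plans can be observed with SQLite's EXPLAIN QUERY PLAN statement, used here through Python's sqlite3 module. This is a sketch of one particular system, not a description of every DBMS; the table, index, and helper function names are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: the definition is recorded in the data dictionary (sqlite_master in SQLite).
conn.execute("CREATE TABLE customer (cacc INTEGER PRIMARY KEY, cname TEXT, loca TEXT)")

def plan(sql):
    """Return the textual steps of the evaluation plan SQLite chooses for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT cname FROM customer WHERE loca = 'Chennai'"
plan_without_index = plan(query)

# With an index available, the optimizer can choose a different plan for the same query.
conn.execute("CREATE INDEX idx_customer_loca ON customer (loca)")
plan_with_index = plan(query)
print(plan_without_index)
print(plan_with_index)
```

The query text never changes; only the plan chosen for it does, and both plans give the same result.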
Application Architectures
Most users of a database system today are not present at the site of the database system, but connect to it through a network. We can therefore differentiate between client machines and server machines, on which the database system runs. Database applications are usually partitioned into two or three parts.
In a two-tier architecture, the application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements. Application program interface standards like ODBC and JDBC are used for interaction between the client and the server.
In a three-tier architecture, the client machine acts as simply a front end and does not contain any direct database calls. Instead, the client communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server, instead of being distributed across multiple clients. Three-tier applications are more appropriate for large applications, and for applications that run on the World Wide Web.
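The division of responsibilities in a three-tier application can be sketched in a single Python program. This is only a structural illustration: in a real deployment the three tiers run on separate machines and communicate over a network, whereas here plain function calls stand in for that communication. The function names, the table, and the withdrawal rule are hypothetical.

```python
import sqlite3

# Data tier: the database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
db.execute("INSERT INTO account VALUES ('A', 1000)")
db.commit()

# Middle tier: the application server holds the business logic and all SQL.
def withdraw(name, amount):
    """Hypothetical business rule: reject withdrawals that exceed the balance."""
    row = db.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()
    if row is None:
        return {"ok": False, "error": "unknown account"}
    if amount > row[0]:
        return {"ok": False, "error": "insufficient funds"}
    db.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, name))
    db.commit()
    return {"ok": True, "balance": row[0] - amount}

# Client tier: a front end that only submits form data; it contains no database calls.
def client_request(form):
    return withdraw(form["account"], form["amount"])

first = client_request({"account": "A", "amount": 300})
second = client_request({"account": "A", "amount": 5000})
print(first, second)
```

Because the rule lives only in `withdraw`, changing it does not require touching any client, which is the main argument for placing business logic in the application server.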