
Advanced Database Systems

Assignment No. 5
Analyse and compare the physical storage structures and the types of index available in the latest versions of Oracle, SQL Server, DB2, MySQL, and Teradata. Define a comparative framework. Recommend one product for organizations of around 2000-4000 employees, with sound reasoning based on physical storage structures.

SUBMITTED TO:
DR. FAROOQUE AZAM

SUBMITTED BY:
SHAHID IQBAL
USMAN HUMAYUN
AQSA BAJWA
MADIHA WARIS

DEPARTMENT: COMPUTER SOFTWARE ENGINEERING

COLLEGE OF E&ME, NUST

Advanced Database Systems

Comparison of Latest Database Management Systems

ABSTRACT:
This document presents a detailed analysis of the latest versions of several database management systems. The physical storage structures and the types of index available in Oracle, SQL Server, DB2, MySQL, and Teradata are explained, and a comparative analysis is then made on the basis of the physical storage structure of each system. Based on this comparison, one of the database management systems is proposed as the best fit for organizations with 2000 to 4000 employees.

MS-11

College of E&ME, NUST


1. INTRODUCTION:
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. In addition to simple file management, a database management system provides several functions:

- Allow concurrency
- Control security
- Maintain data integrity
- Provide for backup and recovery
- Control redundancy
- Allow data independence
- Provide a non-procedural query language
- Perform automatic query optimization [1]

A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database. Database management systems are categorized according to their data structures or types; a DBMS is sometimes also known as a database manager [2].

1.1. DATABASE MANAGEMENT SYSTEM SOFTWARE:
Examples of database management systems include:

- Oracle
- DB2
- SQL Server
- MySQL
- Teradata

All of the database management systems listed above should have the following features:

- They must provide a way to structure data as records, tables, or objects.
- They should accept data input from operators and store that data for later retrieval.
- They must provide query languages for searching, sorting, reporting, and other "decision support" activities that help users correlate and make sense of collected data.
- They must provide multiuser access to data, along with security features that prevent some users from viewing and/or changing certain types of information.
- They should provide data integrity features that prevent more than one user from accessing and changing the same information simultaneously.
- They must provide a data dictionary (metadata) that describes the structure of the database, related files, and record information [3].

2. ANALYSIS OF THE PHYSICAL STORAGE STRUCTURES AND TYPES OF AVAILABLE INDEX OF THE LATEST DBMSs:

In this section, the latest versions of the following database management systems are analysed on the basis of their physical storage structure and their available indexes:

a. Oracle
b. SQL Server
c. MySQL
d. DB2
e. Teradata

A. ORACLE:
An Oracle database is made up of physical and logical structures. Physical structures are those that can be seen and operated on from the operating system, such as the physical files that store data on disk. Logical structures are created and recognized by Oracle Database and are not known to the operating system. Only the physical storage structure of Oracle is discussed in this section.

2.1. PHYSICAL STORAGE STRUCTURE IN ORACLE:
The physical structures of an Oracle database include data files on disk that are not directly manipulated by users of the database. Physical structures exist at the operating system level.

Figure 1 below shows the physical database structures of an Oracle database, including data files, redo log files, and control files.


Figure 1: Oracle Database Storage Structure

2.1.1. Data files:

Data files contain all of the database data that users save and retrieve using SELECT and DML statements. A tablespace comprises one or more data files. A single data file is an operating system file on a server's disk drive; the disk may be local to the server or a drive on a shared storage array. Each data file belongs to only one tablespace, but a tablespace can have many data files associated with it. The physical structure shown in Figure 1 contains five data files: one for the SYSTEM tablespace, one for the SYSAUX tablespace, two assigned to Tablespace 1, and the fifth assigned to Tablespace 2.
2.1.2. Redo Log Files:

Redo log files contain a record of all changes made to the data in both tables and indexes, as well as changes to the database structures themselves. These files are used to recover changed data that was in memory at the time of a crash, and they allow Oracle to recover from an instance failure or a media failure. When any change is made to the database, such as an update to data or the creation or dropping of a database object, the change is recorded in the redo log files first.


A database has at least two redo log files, and it is recommended that multiple copies of the redo log files be stored on different disks. If the instance fails, any changed database blocks that were not yet written to the data files are retrieved from the redo log files and written to the data files when the instance is started again. This process is called instance recovery.
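The log-first behaviour and instance recovery described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the dictionaries, the `apply_change` and `instance_recovery` helpers, and the key names are hypothetical and do not reflect Oracle's actual redo format or recovery algorithm.

```python
# Toy model of redo logging and instance recovery (not Oracle's real format).

def apply_change(buffer_cache, redo_log, key, value):
    """Record the change in the redo log first, then modify the in-memory block."""
    redo_log.append((key, value))   # the redo entry is written before anything else
    buffer_cache[key] = value       # the change itself lives only in memory for now


def instance_recovery(data_files, redo_log):
    """Replay every logged change onto the data files after a crash."""
    for key, value in redo_log:
        data_files[key] = value
    return data_files


data_files = {"emp:1": "Alice"}     # what is currently on disk
buffer_cache = dict(data_files)     # in-memory copy of the blocks
redo_log = []

apply_change(buffer_cache, redo_log, "emp:1", "Alicia")
apply_change(buffer_cache, redo_log, "emp:2", "Bob")

buffer_cache = None                 # simulated instance failure: memory is lost
recovered = instance_recovery(data_files, redo_log)
```

After the simulated failure, the data files end up with both changes even though neither had been written to disk before the crash.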
2.1.3. Control Files:

Control files record the physical structure of a database: the database name and the names and locations of the data files and redo log files. The control file maintains the metadata for the physical structure of the entire database. It stores the following metadata:

- The name of the database
- The names and locations of the tablespaces in the database
- The locations of the redo log files
- Information about the last backup of each tablespace in the database

Because of the importance of the control file, Oracle best practices recommend keeping a copy of it on at least three different physical disks. The contents of the control file and redo log files do not map directly to any database objects, but their contents and status are available to the DBA through virtual tables called data dictionary views, which are owned by the SYS schema [4].
2.1.4. Mechanisms for Storing Database Files:

Several mechanisms are available for allocating and managing the storage of these files. The most common mechanisms include:

i. Oracle Automatic Storage Management (Oracle ASM): Oracle ASM includes a file system designed exclusively for use by Oracle Database.

ii. Operating system file system: Most Oracle databases store files in a file system, which is a data structure built inside a contiguous disk address space. All operating systems have file managers that allocate and deallocate disk space into files within a file system. A file system enables disk space to be allocated to many files. Each file has a name and is made to appear as a contiguous address space to applications such as Oracle Database. The database can create, read, write, resize, and delete files. A file system is commonly built on top of a logical volume constructed by a software package called a logical volume manager (LVM). The LVM enables pieces of multiple physical disks to be combined into a single contiguous address space that appears as one disk to higher layers of software.

iii. Raw device: Raw devices are disk partitions or logical volumes not formatted with a file system. The primary benefit of raw devices is the ability to perform direct I/O and to write larger buffers. In direct I/O, applications write to and read from the storage device directly, bypassing the operating system buffer cache.

iv. Cluster file system: A cluster file system is software that enables multiple computers to share file storage while maintaining consistent space allocation and file content. In an Oracle RAC environment, a cluster file system makes shared storage appear as a file system shared by many computers in a clustered environment. With a cluster file system, the failure of a computer in the cluster does not make the file system unavailable. In an operating system file system, however, if a computer sharing files through NFS or other means fails, then the file system is unavailable [5].
2.2. INDEX TYPES IN ORACLE:
Oracle includes numerous data structures to improve the speed of SQL queries. Taking advantage of the low cost of disk storage, Oracle provides many indexing algorithms that can greatly increase the speed with which queries are serviced. The available index types are each designed for certain scenarios:

- B*tree indexes: the most common type (especially in OLTP environments) and the default type
- B*tree cluster indexes: for clusters
- Hash cluster indexes: for hash clusters
- Reverse key indexes: useful in Oracle Real Application Cluster (RAC) applications
- Bitmap indexes: common in data warehouse applications
- Partitioned indexes: also useful for data warehouse applications
- Function-based indexes
- Index-organized tables
- Domain indexes

2.2.1. B*Tree Indexes:
B*tree stands for balanced tree. This means that the height of the index is the same for all values, so retrieving the data for any one value takes approximately the same amount of time as for any other value. Oracle B*tree indexes are best used on columns with high cardinality (each value occurs only a few times), for example primary key indexes or unique indexes. One important point to note is that rows in which every indexed column is NULL are not included in the index. B*tree indexes are the most common type of index in OLTP systems.

B-tree indexes are used to avoid large sorting operations. For example, a SQL query requiring 10,000 rows to be presented in sorted order will often use a b-tree index to avoid the very large sort required to deliver the data to the end user.

Figure 2: Oracle B-tree index
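The way an ordered index avoids a separate sort can be sketched in a few lines. The example below models only the leaf level of a B-tree as a sorted list; the names `leaf_entries`, `index_insert`, and `index_range_scan` are illustrative assumptions, and a real B-tree adds branch blocks above the leaves.

```python
import bisect

# Leaf level of a B-tree modelled as one sorted list of (key, rowid) pairs.
leaf_entries = []


def index_insert(key, rowid):
    """Insert an entry while keeping the list in key order."""
    bisect.insort(leaf_entries, (key, rowid))


def index_range_scan(low, high):
    """Return rowids for low <= key <= high, already in key order (no sort step)."""
    start = bisect.bisect_left(leaf_entries, (low,))
    result = []
    for key, rowid in leaf_entries[start:]:
        if key > high:
            break
        result.append(rowid)
    return result


# Rows arrive in arbitrary key order, as they would in a heap table.
for row_number, key in enumerate([42, 7, 19, 88, 3]):
    index_insert(key, f"row{row_number}")

sorted_rowids = index_range_scan(5, 50)
```

Because the entries are maintained in key order at insert time, the range scan returns rows for keys 7, 19, and 42 in sorted order without sorting at query time.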

2.2.2. B*Tree Cluster Indexes:

These are B*tree indexes defined for clusters. A cluster is a group of two or more tables that share one or more common columns and are usually accessed together (via a join).

CREATE INDEX product_orders_ix ON CLUSTER product_orders;
2.2.3. Hash Cluster Indexes:

In a hash cluster, rows that have the same hash key value (generated by a hash function) are stored together in the Oracle database. Hash clusters are equivalent to indexed clusters, except that the index key is replaced with a hash function. This also means that there is no separate index, as the hash is the index.

CREATE CLUSTER emp_dept_cluster (dept_id NUMBER) HASHKEYS 50;
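The idea that the hash itself locates the rows can be sketched as follows. The modulo hash function, the `hash_cluster` dictionary, and the helper names are assumptions made for illustration; Oracle's internal hash function and block layout are different.

```python
from collections import defaultdict

HASHKEYS = 50   # number of distinct hash values, as in the HASHKEYS clause


def hash_key(dept_id):
    """Illustrative hash function; Oracle's internal function is different."""
    return dept_id % HASHKEYS


hash_cluster = defaultdict(list)   # hash value -> rows stored together


def insert_row(row):
    hash_cluster[hash_key(row["dept_id"])].append(row)


def rows_for_department(dept_id):
    # No index lookup: the hash value itself says where the rows are stored.
    return [r for r in hash_cluster[hash_key(dept_id)] if r["dept_id"] == dept_id]


insert_row({"dept_id": 10, "ename": "Khan"})
insert_row({"dept_id": 60, "ename": "Ali"})    # 60 % 50 == 10: shares a hash value with dept 10
insert_row({"dept_id": 10, "ename": "Raza"})

dept_10 = [r["ename"] for r in rows_for_department(10)]
dept_60 = [r["ename"] for r in rows_for_department(60)]
```

Departments 10 and 60 collide on the same hash value in this sketch, which is why the lookup still filters on the actual key after locating the bucket.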
2.2.4. Reverse Key Indexes:

These are typically used in Oracle Real Application Cluster (RAC) applications. In this type of index the bytes of each indexed column are reversed (but the column order is maintained). This is useful when new data is always inserted at one end of the index, as occurs when a sequence generates the key: reversing the bytes spreads new index entries evenly across the leaf blocks instead of concentrating them in a single block, where contention may in turn affect performance.

CREATE INDEX emp_ix ON emp(emp_id) REVERSE;
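A short sketch shows why reversing the bytes spreads sequential keys. It assumes a 4-byte big-endian integer encoding purely for illustration; Oracle stores NUMBER values in its own internal format, so the actual byte values differ, and `reverse_key` is a hypothetical helper.

```python
def reverse_key(value, width=4):
    """Reverse the bytes of an integer key, standing in for a reverse key index entry."""
    return value.to_bytes(width, "big")[::-1]


sequence_values = [1000, 1001, 1002, 1003]
normal_keys = [v.to_bytes(4, "big") for v in sequence_values]
reversed_keys = [reverse_key(v) for v in sequence_values]

# Normal keys share their leading bytes, so they sort next to each other
# and land in the same (rightmost) leaf block.
normal_prefixes = {k[:3] for k in normal_keys}

# Reversed keys start with different bytes, so they spread across the index.
reversed_first_bytes = {k[0] for k in reversed_keys}
```

In this model the four consecutive sequence values share one 3-byte prefix when stored normally, but begin with four different bytes once reversed.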
2.2.5. Bitmap Indexes:
These are commonly used in data warehouse applications for tables with few updates and whose columns have low cardinality (i.e. there are few distinct values). In this type of index Oracle stores a bitmap for each distinct value in the index, with 1 bit for each row in the table. These bitmaps are expensive to maintain and are therefore not suitable for applications that make a lot of writes to the data. For example, consider a car manufacturer that records information about cars sold, including the colour of each car. Each colour is likely to occur many times, so the column is suitable for a bitmap index.

CREATE BITMAP INDEX car_col ON cars(colour);


Oracle uses a specialized optimizer method called a bitmapped index merge to service queries that filter on several low-cardinality columns at once. In a bitmapped index merge, each Row-ID, or RID, list is built independently by using the bitmaps, and a special merge routine is used to compare the RID lists and find the intersecting values. Using this methodology, Oracle can provide sub-second response time when working against multiple low-cardinality columns.

Figure 3: Oracle Bitmap Merge Join
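The bitmap structure and the merge of two bitmaps can be sketched with Python integers used as bit strings. This is a simplification: the intersection is done with a bitwise AND rather than the RID-list merge routine described above, and the `cars` rows, the `fuel` column, and the helper names are invented for the example.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """One bitmap (a Python int) per distinct value; bit i is set if row i has that value."""
    bitmaps = defaultdict(int)
    for row_number, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << row_number
    return bitmaps


def matching_rows(bitmap):
    """Translate a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


cars = [
    {"colour": "red", "fuel": "petrol"},
    {"colour": "blue", "fuel": "diesel"},
    {"colour": "red", "fuel": "diesel"},
    {"colour": "red", "fuel": "petrol"},
]
colour_ix = build_bitmap_index(cars, "colour")
fuel_ix = build_bitmap_index(cars, "fuel")

# WHERE colour = 'red' AND fuel = 'petrol': intersect the two bitmaps.
red_petrol = matching_rows(colour_ix["red"] & fuel_ix["petrol"])
```

Each additional low-cardinality predicate costs only one more bitwise operation in this model, which is the intuition behind combining several bitmap indexes in one query.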

2.2.6. Partitioned Indexes:
Partitioned indexes are also useful in Oracle data warehouse applications where there is a large amount of data partitioned by a particular dimension such as time. Partitioned indexes can be created as either local or global partitioned indexes. A local partitioned index is partitioned on the same columns and with the same number of partitions as the table. For a global partitioned index, the partitioning is user defined and is not the same as that of the underlying table.

2.2.7. Function-based Indexes:
These are indexes created on the result of a function applied to a column value. For example:

CREATE INDEX upp_ename ON emp(UPPER(ename));

The function must be deterministic (always return the same value for the same input).
2.2.8. Index Organized Tables:

In an index-organized table all the data is stored in the Oracle database in a B*tree index structure defined on the table's primary key. This is ideal when related pieces of data must be stored together or data must be physically stored in a specific order. Index-organized tables are often used for information retrieval, spatial and OLAP applications.
2.2.9. Domain Indexes:

These indexes are created by user-defined indexing routines and enable users to define their own indexes on custom data types (domains) such as pictures, maps, or fingerprints. These types of index require in-depth knowledge of the data and how it will be accessed.

B. SQL SERVER:

SQL Server does not see data and storage in exactly the same way as a DBA or end user does. A DBA sees initialized devices, device fragments allocated to databases, segments defined within databases, tables defined within segments, and rows stored in tables.

2.3. PHYSICAL STORAGE STRUCTURE IN SQL SERVER:
SQL Server views storage at a lower level: as device fragments allocated to databases, pages allocated to tables and indexes within the database, and information stored on pages.

There are two basic types of storage structure in a database: linked data pages and index trees. All information in SQL Server is stored at the page level. When a database is created, all space allocated to it is divided into a number of pages, each 2KB in size (SQL Server 7.0 and later versions use 8KB pages). There are five types of pages within SQL Server:

i. Data and log pages
ii. Index pages
iii. Text/image pages
iv. Allocation pages
v. Distribution pages

All pages in SQL Server contain a page header. The page header is 32 bytes in size and contains the logical page number, the next and previous logical page numbers in the page linkage, the object_id of the object to which the page belongs, the minimum row size, the next available row number within the page, and the byte location of the start of the free space on the page.
2.3.1. Data Pages:

A data page is the basic unit of storage within SQL Server. All the other types of pages within a database are essentially variations of the data page. All data pages contain a 32-byte header, as described earlier. With a 2KB page (2048 bytes) this leaves 2016 bytes for storing data within the data page. In SQL Server, data rows cannot cross page boundaries. The maximum size of a single row is 1962 bytes, including row overhead.

Data pages are linked to one another by the page pointers (prevpg, nextpg) contained in the page header. This page linkage enables SQL Server to locate all rows in a table by scanning all pages in the link. Data page linkage can be thought of as a two-way linked list, which enables SQL Server to link new pages into, or unlink pages from, the page linkage simply by adjusting the page pointers.

In addition to the page header, each data page also contains data rows and a row offset table. The row offset table grows backward from the end of the page and contains the location of each row on the data page. Each entry is 2 bytes wide.

2.3.1.1. Data Rows:

Data is stored on data pages in data rows. The size of each data row is the sum of the sizes of its columns plus the row overhead. Each record in a data page is assigned a row number. A single byte is used within each row to store the row number. Therefore, SQL Server has a maximum limit of 256 rows per page, because a single byte can hold only 256 distinct values (2^8). For a data row containing only fixed-length columns, there are four bytes of overhead per row:

- 1 byte to store the number of variable-length columns (in this case, 0)
- 1 byte to store the row number
- 2 bytes in the row offset table at the end of the page to store the location of the row on the page

If a data row contains variable-length columns, there is additional overhead per row. A data row is variable in size if any column is defined as varchar or varbinary, or allows null values. In addition to the 4 bytes of overhead described previously, the following bytes are required to store the actual row width and the location of columns within the data row:

- 2 bytes to store the total row width
- 1 byte per variable-length column to store the starting location of the column within the row
- 1 byte for the column offset table
- 1 additional byte for each 256-byte boundary passed (adjust table)
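The fixed-length row arithmetic above can be worked through in a short sketch. The constants come from the figures quoted in this section (2KB page, 32-byte header, 4 bytes of row overhead, 256 rows per page); the function names and the example column widths are assumptions chosen for illustration.

```python
PAGE_SIZE = 2048          # 2KB page
PAGE_HEADER = 32          # bytes used by the page header
MAX_ROWS_PER_PAGE = 256   # the row number is stored in a single byte


def fixed_row_size(column_widths):
    """Bytes used per row when all columns are fixed length: data plus 4 bytes of overhead."""
    overhead = 1 + 1 + 2   # variable-column count, row number, row offset table entry
    return sum(column_widths) + overhead


def rows_per_page(column_widths):
    """Rows that fit on one data page, capped by the one-byte row number."""
    usable = PAGE_SIZE - PAGE_HEADER            # 2016 bytes
    return min(usable // fixed_row_size(column_widths), MAX_ROWS_PER_PAGE)


# Example row: a 4-byte integer, a 20-byte character column, and an 8-byte datetime.
example_rows = rows_per_page([4, 20, 8])

# A very narrow row hits the 256-row limit before the page fills up.
narrow_rows = rows_per_page([1])
```

For the example row, 32 bytes of data plus 4 bytes of overhead gives 36 bytes, so 2016 // 36 = 56 rows fit on a page; the one-byte row would fit 403 times by size alone but is capped at 256.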

2.3.2. Allocation Pages:
Space is allocated to a SQL Server database by the create database and alter database commands. The space allocated to a database is divided into a number of 2KB pages. Each page is assigned a logical page number, starting at page 0 and increasing sequentially. The pages are then divided into allocation units of 256 contiguous 2KB pages, or 512KB (1/2 MB) each. The first page of each allocation unit is an allocation page that controls the allocation of all pages within the allocation unit. The allocation pages control the allocation of pages to tables and indexes within the database.

Pages are allocated in contiguous blocks of eight pages called extents. The minimum unit of allocation within a database is an extent. When a table is created, it is initially assigned a single extent, or 16KB of space, even if the table contains no rows. There are 32 extents within an allocation unit (256/8). An allocation page contains 32 extent structures, one for each extent within that allocation unit. Each extent structure is 16 bytes and contains the following information:

- Object ID of the object to which the extent is allocated
- Next extent ID in chain
- Previous extent ID in chain
- Allocation bitmap
- Deallocation bitmap
- Index ID (if any) to which the extent is allocated
- Status
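The allocation arithmetic described above (256-page allocation units, 8-page extents, 2KB pages) can be checked with a small sketch. The `locate_page` function and the dictionary keys it returns are hypothetical names used only for this illustration.

```python
PAGE_SIZE_KB = 2
PAGES_PER_EXTENT = 8
PAGES_PER_ALLOCATION_UNIT = 256

extent_size_kb = PAGES_PER_EXTENT * PAGE_SIZE_KB                   # 16KB
unit_size_kb = PAGES_PER_ALLOCATION_UNIT * PAGE_SIZE_KB            # 512KB
extents_per_unit = PAGES_PER_ALLOCATION_UNIT // PAGES_PER_EXTENT   # 32


def locate_page(logical_page_number):
    """Map a logical page number to its allocation unit, allocation page, and extent."""
    unit = logical_page_number // PAGES_PER_ALLOCATION_UNIT
    # The first page of each allocation unit is its allocation page.
    allocation_page = unit * PAGES_PER_ALLOCATION_UNIT
    offset_in_unit = logical_page_number % PAGES_PER_ALLOCATION_UNIT
    return {
        "allocation_unit": unit,
        "allocation_page": allocation_page,
        "extent_in_unit": offset_in_unit // PAGES_PER_EXTENT,
    }


location = locate_page(600)
```

Logical page 600, for instance, falls in the third allocation unit (unit 2), whose allocation page is page 512, and in extent 11 of that unit.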

The allocation bitmap for each extent structure indicates which pages within the allocated extent are in use by the table. The deallocation bitmap is used to identify pages that have become empty during a transaction that has not yet been completed. The actual marking of the page as unused does not occur until the transaction is committed, to prevent another transaction from allocating the page before the transaction is complete [6].

2.4. INDEX TYPES IN SQL SERVER:
All SQL Server indexes are B-trees. There is a single root page at the top of the tree, branching out into N pages at each intermediate level until it reaches the bottom, or leaf level, of the index. The index tree is traversed by following pointers from the upper-level pages down through the lower-level pages. In addition, each index level is a separate page chain. There may be many intermediate levels in an index. The number of levels depends on the index key width, the type of index, and the number of rows and/or pages in the table, and it is important in relation to index performance.

2.4.1. Nonclustered Index:
A nonclustered index is analogous to an index in a textbook. The data is stored in one place and the index in another, with pointers to the storage location of the data. The items in the index are stored in the order of the index key values, but the information in the table is stored in a different order (which can be dictated by a clustered index). If no clustered index is created on the table, the rows are not guaranteed to be in any particular order.

Similar to the way you use an index in a book, SQL Server searches for a data value by searching the nonclustered index to find the location of the data value in the table and then retrieves the data directly from that location. This makes nonclustered indexes the optimal choice for exact-match queries, because the index contains entries describing the exact location in the table of the data values being searched for. If the underlying table is sorted using a clustered index, the location is the clustering key value; otherwise, the location is the row ID (RID), which comprises the file number, page number, and slot number of the row. For example, to search for an employee ID (emp_id) in a table that has a nonclustered index on the emp_id column, SQL Server looks through the index to find an entry that lists the exact page and row in the table where the matching emp_id can be found, and then goes directly to that page and row.
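The two-step lookup through a nonclustered index on a heap table can be sketched as below. A Python dictionary stands in for the B-tree, and the RID values, employee rows, and the `find_employee` helper are invented for the example; only the idea of "index entry points to (file, page, slot)" is taken from the text.

```python
# Heap table: rows addressed by a row ID (file number, page number, slot number).
heap = {
    (1, 10, 0): {"emp_id": 305, "name": "Khan"},
    (1, 10, 1): {"emp_id": 101, "name": "Ali"},
    (1, 11, 0): {"emp_id": 207, "name": "Raza"},
}

# Nonclustered index on emp_id: key -> RID, stored separately from the data.
# (A dict is used here instead of a B-tree to keep the sketch short.)
emp_id_index = {row["emp_id"]: rid for rid, row in heap.items()}


def find_employee(emp_id):
    """Step 1: look up the RID in the index. Step 2: fetch the row at that location."""
    rid = emp_id_index.get(emp_id)
    if rid is None:
        return None
    return heap[rid]


employee = find_employee(207)
missing = find_employee(999)
```

If the table had a clustered index, the value stored in `emp_id_index` would be the clustering key rather than a RID, and the second step would be a lookup through the clustered index.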


Figure 4: Non Clustered Index

2.4.1.1. Considerations:

Consider using nonclustered indexes for:

- Columns that contain a large number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is usually more efficient.
- Queries that do not return large result sets.
- Columns frequently involved in search conditions of a query (WHERE clause) that return exact matches.
- Decision-support-system applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
- Covering all columns from one table in a given query. This eliminates accessing the table or clustered index altogether.

2.4.2. Clustered Indexes:

A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. However, the index can comprise multiple columns (a composite index), in the same way that a telephone directory is organized by last name and then first name. Clustered indexes are very similar to Oracle's index-organized tables (IOTs).


Figure 5: Clustered Index

A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if an application frequently executes a query to retrieve records within a range of dates, a clustered index can quickly locate the row containing the beginning date and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query. Also, if one or more columns are used frequently to sort the data retrieved from a table, it can be advantageous to cluster (physically sort) the table on those columns to save the cost of a sort each time they are queried.

Clustered indexes are also efficient for finding a specific row when the indexed value is unique. For example, the fastest way to find a particular employee using the unique employee ID column emp_id is to create a clustered index or PRIMARY KEY constraint on the emp_id column.

Note: PRIMARY KEY constraints create clustered indexes automatically if no clustered index already exists on the table and a nonclustered index is not specified when the PRIMARY KEY constraint is created.
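The benefit of physical adjacency for range queries can be made concrete by counting how many pages a query touches. This is a simplified model assuming four rows per page and a made-up insert order for the unclustered case; `pages_touched` and the data are illustrative only.

```python
ROWS_PER_PAGE = 4


def pages_touched(row_order, wanted):
    """Count distinct pages holding the wanted keys when rows are stored in row_order."""
    return len({position // ROWS_PER_PAGE
                for position, key in enumerate(row_order) if key in wanted})


days = list(range(1, 17))            # 16 rows keyed by day number
clustered_order = sorted(days)       # table physically sorted on the key
heap_order = [9, 2, 14, 5, 1, 11, 7, 16, 3, 12, 6, 13, 8, 4, 15, 10]  # arbitrary insert order

wanted = set(range(5, 9))            # WHERE day BETWEEN 5 AND 8
clustered_pages = pages_touched(clustered_order, wanted)
heap_pages = pages_touched(heap_order, wanted)
```

With the table clustered on the key, the four matching rows sit on a single page; in the unordered layout used here, the same four rows are spread over four pages.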
2.4.2.1. Considerations:

It is important to define the clustered index key with as few columns as possible. If a large clustered index key is defined, any nonclustered indexes defined on the same table will be significantly larger, because the nonclustered index entries contain the clustering key. Consider using a clustered index for:

- Columns that contain a large number of distinct values.
- Queries that return a range of values using operators such as BETWEEN, >, >=, <, and <=.
- Columns that are accessed sequentially.
- Queries that return large result sets.
- Columns that are frequently accessed by queries involving join or GROUP BY clauses; typically these are foreign key columns. An index on the column(s) specified in the ORDER BY or GROUP BY clause eliminates the need for SQL Server to sort the data because the rows are already sorted. This improves query performance.
- OLTP-type applications where very fast single-row lookup is required, typically by means of the primary key. Create a clustered index on the primary key.

Clustered indexes are not a good choice for:

- Columns that undergo frequent changes. A change to the key value results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
- Wide keys. The key values from the clustered index are used by all nonclustered indexes as lookup keys and are therefore stored in each nonclustered index leaf entry [7].

In addition to being clustered or nonclustered, an index can be configured in other ways:

Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn't exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.

Unique index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

A unique index is automatically created when you define a primary key or unique constraint:

- Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
- Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.


Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.
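The covering index idea can be tried out with Python's built-in sqlite3 module. Note the assumptions: SQLite is used here only as a readily available stand-in, not SQL Server; the `contacts` table, its `Email` column, the index name, and the sample row are invented; and the query plan wording is SQLite's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (ContactID INTEGER, FirstName TEXT, LastName TEXT, Email TEXT)"
)
# The index holds every column the query below needs, so it "covers" the query.
conn.execute(
    "CREATE INDEX ix_contact_cover ON contacts (ContactID, FirstName, LastName)"
)
conn.execute("INSERT INTO contacts VALUES (1, 'Sara', 'Khan', 'sara@example.com')")

query = "SELECT FirstName, LastName FROM contacts WHERE ContactID = ?"

# Ask SQLite how it would run the query; the last column of each row is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
plan_text = " ".join(row[-1] for row in plan)

result = conn.execute(query, (1,)).fetchall()
conn.close()
```

In SQLite the plan description mentions a covering index when the table itself does not need to be read. In SQL Server the same effect is commonly achieved by adding the non-key columns to the index with an INCLUDE clause.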

C. MySQL:
The MySQL server provides a database management system with querying and connectivity capabilities, well-organized data structures, and integration with many different platforms. It can handle large databases reliably and quickly in demanding production environments. Its connectivity, speed, and security features make it suitable for accessing databases. The MySQL server works in a client/server system. This system includes a multithreaded SQL server that supports varied back ends, different client programs and libraries, administrative tools, and many application programming interfaces (APIs).

2.5. PHYSICAL STORAGE STRUCTURE IN MySQL:
The storage engines supported by MySQL are explained as follows:
2.5.1. InnoDB:

InnoDB is a transaction-safe (ACID-compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser-granularity locks) and Oracle-style consistent non-locking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints. InnoDB is the default storage engine as of MySQL 5.5.5.
2.5.2. MyISAM:

MyISAM is the MySQL storage engine most widely used in Web, data warehousing, and other application environments. It is supported in all MySQL configurations and was the default storage engine prior to MySQL 5.5.5.
2.5.3. Memory:

This storage engine stores all data in RAM for extremely fast access in environments that require quick lookups of reference and other like data. This engine was formerly known as the HEAP engine.

2.5.4. Merge:

The Merge engine enables a MySQL DBA or developer to logically group a series of identical MyISAM tables and reference them as one object. It is good for VLDB environments such as data warehousing.
2.5.5. Archive:

This storage engine is intended for storing and retrieving large amounts of seldom-referenced historical, archived, or security audit information.
2.5.6. Federated:

It offers the ability to link separate MySQL servers to create one logical database from many physical servers. This storage engine is very good for distributed or data mart environments.
2.5.7. CSV:

The CSV storage engine stores data in text files using comma-separated values format. It can be used to easily exchange data between other software and applications that can import and export in CSV format.
2.5.8. Blackhole:

The Blackhole storage engine accepts but does not store data and retrievals always return an empty set. The functionality can be used in distributed database design where data is automatically replicated, but not stored locally. It is important to remember that you are not restricted to using the same storage engine for an entire server or schema: you can use a different storage engine for each table in your schema [9]. The following table provides an overview of some storage engines provided with MySQL: Feature Storage limits Transactions Locking granularity MVCC Geospatial data type support Geospatial indexing support B-tree indexes Hash indexes Full-text search indexes MyISAM Memory InnoDB Archive NDB 256TB No Table No Yes Yes Yes No Yes RAM No Table No No No Yes Yes No 64TB Yes Row Yes Yes No Yes No No None No Table No Yes No No No No 384EB Yes Row No Yes No Yes Yes No


Feature                                MyISAM  Memory  InnoDB  Archive  NDB
Clustered indexes                      No      No      Yes     No       No
Data caches                            No      N/A     Yes     No       Yes
Index caches                           Yes     N/A     Yes     No       Yes
Compressed data                        Yes     No      Yes     Yes      No
Encrypted data                         Yes     Yes     Yes     Yes      Yes
Cluster database support               No      No      No      No       Yes
Replication support                    Yes     Yes     Yes     Yes      Yes
Foreign key support                    No      No      Yes     No       No
Backup / point-in-time recovery        Yes     Yes     Yes     Yes      Yes
Query cache support                    Yes     Yes     Yes     Yes      Yes
Update statistics for data dictionary  Yes     Yes     Yes     Yes      Yes

2.6.INDEX TYPES IN MySQL:

MySQL allows four general types of indexes (keys). An index can be created on a single column or on multiple columns; single-column and multi-column indexes behave somewhat differently. The four types are:

- Primary Indexes (also called clustered indexes)
- Unique Key Indexes
- Normal Indexes
- Full text Indexes

2.6.1. Primary Index:

Primary keys are almost always added when the table is created. However, MySQL provides more than one SQL statement for creating primary key indexes. MySQL includes its standard storage engine as well as the InnoDB storage engine, which is described as a transaction-safe, ACID-compliant engine with some additional features over the standard one. Every InnoDB table has a special index called the clustered index, where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key.

When a PRIMARY KEY is defined on a table, InnoDB uses it as the clustered index. A primary key should be defined for every table. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically. If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order [8].

The syntax to create a primary key index when creating a table is:

    CREATE TABLE tablename ([...], PRIMARY KEY (columns_to_index));

A PRIMARY index is intended as a way to uniquely identify any row in the table, so it should not be used on any columns that allow NULL values. A PRIMARY index should always be on the smallest number of columns that is sufficient to uniquely identify a row. Often, this is just one column containing a unique auto-incremented number, but if there is anything else that can uniquely identify a row, such as "countrycode" in a list of countries, it can be used instead.

2.6.2. Unique Key Index:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values.

If a PRIMARY KEY for a table is not defined, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
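The order in which InnoDB picks its clustered index, as described above, can be written down as a small decision procedure. The following is only a sketch in Python, not MySQL code: the `Index` and `Table` classes, the function name and the example table are hypothetical and exist purely to illustrate the three rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False


@dataclass
class Table:
    name: str
    nullable: Dict[str, bool]          # column name -> True if NULL is allowed
    indexes: List[Index] = field(default_factory=list)


def choose_clustered_index(table: Table) -> str:
    """Return the name of the index that serves as the clustered index."""
    # Rule 1: a declared PRIMARY KEY is always used.
    for idx in table.indexes:
        if idx.primary:
            return idx.name
    # Rule 2: otherwise the first UNIQUE index whose columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and not any(table.nullable[c] for c in idx.columns):
            return idx.name
    # Rule 3: otherwise a hidden index on a synthetic row ID column.
    return "<hidden row ID index>"


# Hypothetical table without a primary key: "name" is nullable, "code" is not.
country = Table(
    "country",
    {"code": False, "name": True},
    [Index("uk_name", ["name"], unique=True),
     Index("uk_code", ["code"], unique=True)],
)
print(choose_clustered_index(country))  # uk_code
```

In this example `uk_name` is skipped because its column allows NULL, so the second unique index becomes the clustered index.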

Because a UNIQUE constraint does not apply to NULL values, MySQL does not let you define a primary key over nullable columns. Since MySQL allows a unique key or index to be created on a single column as well as on multiple columns (a combination of columns) of the table, you can create a multi-column unique index using the CREATE TABLE, ALTER TABLE and CREATE UNIQUE INDEX statements. A unique index means that two rows cannot have the same index value. The syntax to create a unique index on a table is:

    CREATE UNIQUE INDEX index_name ON table_name (column1, column2,...);

2.6.3. Normal Index:

An index without constraints is also known as a "normal index, ordinary index, or non-unique index". KEY or INDEX refers to a normal non-unique index. Non-distinct values for the index are allowed, so the index may contain rows with identical values in all columns of the index. These indexes do not enforce any structure on your data, so they are used only for speeding up queries. Since MySQL allows a normal key or index to be created on a single column as well as on multiple columns (a combination of columns) of the table, you can create a multi-column normal index using the CREATE TABLE, ALTER TABLE and CREATE INDEX statements.

2.6.4. Full text Index:

FULLTEXT indexes are different from all of the above, and their behavior differs more between database systems. Unlike the above three, which are typically b-tree (allowing for selecting, sorting or ranges starting from the leftmost column) or hash (allowing for selection starting from the leftmost column), FULLTEXT indexes are only useful for full text searches done with the MATCH() / AGAINST() clause. A full text index can be created for a TEXT, CHAR or VARCHAR type field, or a combination of fields. FULLTEXT indexes are most often used to search natural language text, such as newspaper articles, web page contents and so on. For this reason MySQL has added a number of features to assist this kind of searching. By default, MySQL does not index any words of 3 characters or fewer, nor does it match any words that appear in 50% or more of the rows. This means that if your table contains two or fewer rows, a search on a FULLTEXT index will never return anything.
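The effect of the word-length and 50% rules on small tables can be checked with a short simulation. This is a sketch in Python, not MySQL's implementation: the tokenizer, the function name and the sample rows are assumptions, and the threshold is read here as "half of the rows or more", which is the reading under which a two-row table can never produce a match.

```python
import re
from typing import List, Set

MIN_WORD_LEN = 4        # words of 3 characters or fewer are ignored
COMMON_FRACTION = 0.5   # words found in half of the rows or more are ignored


def searchable_words(rows: List[str]) -> Set[str]:
    """Return the words that a full text search over `rows` could still match."""
    row_words = [set(re.findall(r"[a-z0-9_']+", row.lower())) for row in rows]
    result = set()
    for word in set().union(*row_words):
        if len(word) < MIN_WORD_LEN:
            continue                       # too short to be indexed
        rows_with_word = sum(1 for words in row_words if word in words)
        if rows_with_word / len(rows) >= COMMON_FRACTION:
            continue                       # too common to be useful
        result.add(word)
    return result


two_rows = ["database storage engine", "index structure"]
three_rows = two_rows + ["the cat sat"]

print(searchable_words(two_rows))    # set(): every word is in 50% of the rows
print(sorted(searchable_words(three_rows)))
```

With two rows, any word that occurs at all is already in half of the table, so nothing remains searchable; adding a third row drops each word to one third of the rows and the longer words become searchable.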

D. DB2:
DB2 Universal Database is a Web-enabled relational database management system that supports data warehousing and transaction processing. It can be scaled from hand-held computers to single processors to clusters of computers and is multimedia-capable with image, audio, video, and text support.

2.7.PHYSICAL STORAGE STRUCTURE IN DB2:

In DB2, the hierarchy of storage units (from least inclusive to most inclusive) is Page, Extent, Container, Table space, and Database, as shown in Figure 6.

Figure 6: Storage Structure of DB2

2.7.1. Page:


In DB2, data is stored in a structure called a Page, which is the equivalent of an Oracle Block. The basic page structure in DB2 consists of a fixed-length page header, a variable-length page trailer, and the space in between, which is used for data or free space. The information stored in a page can be data records (rows or indexes) or system information such as a table space spacemap page, which keeps track of the free space in data pages in a table space. The Page structure is shown in Figure 7.

Figure 7: DB2 database page

Database page size can vary and is tied to the page size of the buffer pool used. Page sizes can be 4, 8, 16 or 32 KB. It is possible to set up a database instance that has mixed page sizes in it, for example 4 KB for system pages and 16 KB for user data pages.
2.7.2. Extent:

An extent in DB2 contains a number of contiguous pages. The page size and extent size in DB2 are set at the time the table space is created and cannot be altered easily (i.e. by an ALTER command); once they are defined, they are set for the life of the database. It is inefficient for a DBMS to work on individual pages, because the I/O overhead eats into performance. Database I/O therefore works with extents, so that a number of contiguous pages are fetched from disk in a single I/O. Table data cannot be mixed within extents, so one extent is for one table only. An extent is a multiple of pages, so the choice of page and extent size depends on the type of application processing being done. Typically an OLTP environment will have small pages (4 KB, 8 KB) and a small number of pages per extent: 4, 6, 16. Decision Support and Data Warehousing applications will have larger pages (16 or 32 KB) and larger extents (16, 32, 64 or 128 pages). The extent size should be chosen based on the table size and anticipated usage, depending on whether the application is query intensive, transaction intensive or a mixture of both. Smaller tables are handled more efficiently with smaller extents.
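The arithmetic behind these choices is simple enough to work through. The sketch below is plain Python, not a DB2 utility; the function names and the example sizes are illustrative assumptions. It computes how many bytes one extent covers and how many extents a table of a given size occupies, given that an extent is never shared between tables.

```python
import math


def extent_bytes(page_size_kb: int, pages_per_extent: int) -> int:
    """Size of one extent in bytes (page size times pages per extent)."""
    return page_size_kb * 1024 * pages_per_extent


def extents_for_table(table_pages: int, pages_per_extent: int) -> int:
    """Whole extents occupied by a table; extents are not shared between tables."""
    return math.ceil(table_pages / pages_per_extent)


def unused_pages(table_pages: int, pages_per_extent: int) -> int:
    """Pages allocated to the table but not holding any of its data."""
    return extents_for_table(table_pages, pages_per_extent) * pages_per_extent - table_pages


oltp_extent = extent_bytes(4, 4)          # 4 KB pages, 4 pages per extent
warehouse_extent = extent_bytes(32, 64)   # 32 KB pages, 64 pages per extent
print(oltp_extent, warehouse_extent)      # 16384 2097152

# A small 10-page table wastes far less space with small extents.
print(unused_pages(10, 4), unused_pages(10, 64))   # 2 54
```

The second result illustrates why smaller tables are better served by smaller extents: with 64-page extents a 10-page table leaves 54 allocated pages unused.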


The extent size is used by DB2 to determine the size of the prefetch block when data is retrieved. It also determines the number of pages that will be written to a container before skipping to the next container. If new data needs to be written and there is insufficient space, the write will physically allocate a full extent of contiguous pages.

2.7.3. Container:

Containers are how DB2 handles the storage of data on the operating system's file system. Data in the database is viewed logically as table spaces, tables, rows and so on. Each table space is stored and managed in one or more containers, and how the table space is defined determines how data in the table space is handled on disk. Containers can be managed by the DBA (Database Managed Storage), by the operating system's file system (System Managed Storage), or by the database (Automatic Storage). Containers, and how they are managed, are more concerned with the efficient usage of storage than with the logical considerations of the database.

2.7.4. Table Space:

A tablespace is a very important structure in DB2. Tablespaces are made up of a number of containers, which map directly to the disk. The containers in the tablespace specify the storage paths or directory structures that will physically hold the data. Tablespaces contain tables, and hence the user data that maps to the user and database applications. When you create a table, it has to go in a tablespace, and you can specify which tablespace to use (if you don't, the system chooses a default location for the table, which might not be where you need it). The tablespace is also what is mapped into RAM. For each tablespace, you have to define the page size, and each tablespace must have access to a bufferpool of the same page size and a system temporary tablespace of the same page size. You have to specify which buffer pool the table space is associated with.


Figure 8: DB2 Extents, Container and Disk
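The statement in Section 2.7.2 that the extent size sets how many pages are written to one container before moving to the next can be pictured as a mapping from page number to container. The sketch below is Python, not DB2 code, and it assumes a plain round-robin order over equally sized containers; the function name and the numbers are illustrative only.

```python
from typing import List


def container_for_page(page_no: int, pages_per_extent: int, n_containers: int) -> int:
    """Container that holds a given page, assuming round-robin extent placement."""
    extent_no = page_no // pages_per_extent   # which extent the page belongs to
    return extent_no % n_containers           # extents rotate over the containers


def layout(n_pages: int, pages_per_extent: int, n_containers: int) -> List[int]:
    """Container number for each of the first `n_pages` pages of a table space."""
    return [container_for_page(p, pages_per_extent, n_containers) for p in range(n_pages)]


# 4 pages per extent, 3 containers: pages 0-3 land in container 0,
# pages 4-7 in container 1, pages 8-11 in container 2, then back to 0.
print(layout(13, 4, 3))   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]
```

Under this assumption a sequential scan touches the containers in turn, which is what lets a prefetch of several extents read from several disks at once.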

2.7.5. Database:

In DB2 (as in Oracle), a database comprises a number of tablespaces. The size of a database is the sum of the tablespaces it contains, and the containers within each tablespace have storage paths that map to the underlying disk system, whether this is managed by the OS, the DBMS or the DBA. When a database is first created, three tablespaces are created:

The Catalog tablespace:

The catalog tablespace is a Regular tablespace called SYSCATSPACE, and it holds the system catalog tables. There is only one catalog table space per database.

Large table space:

DB2 creates one large table space named USERSPACE1 when a database is created. A large table space stores all permanent data just as a regular table space does, including LOBs. This table space type must be DMS.


System temporary table space:

At least one system temporary table space must exist per database, and the default one, named TEMPSPACE1, is created with the database. A system temporary table space stores internal temporary data required during SQL operations such as joins, sorts, index creation, and table reorganization.

At database creation time, the recovery logs for the database are created, and a set of system tables is created in SYSCATSPACE. These system catalog tables contain information about the definitions of the database objects (i.e. tables, views, indexes, and packages) and database security information (i.e. which users can have access to these objects). These tables are updated during the operation of a database; for example, when a table is created. The information in these tables is available through a series of routines and views.

After the database is created, the user can create additional Large Tablespaces, Regular Tablespaces and User Temporary Tablespaces. By default, no User Temporary Tablespaces are created along with the database [10].

2.8.INDEX TYPES IN DB2:

2.8.1. Unique indexes:

When you define a unique index on a DB2 table, you ensure that no duplicate values of the index key exist in the table.

2.8.2. Clustering indexes:

When you define a clustering index on a DB2 table, you direct DB2 to insert rows into the table in the order of the clustering key values. The first index that you define on the table serves implicitly as the clustering index unless you explicitly specify CLUSTER when you create or alter another index. You can specify CLUSTER for any index, whether or not it is a partitioning index.

2.8.3. Partitioning indexes:

Before DB2 Version 8, when you defined a partitioning index on a table in a partitioned table space, you specified the partitioning key and the limit key values in the PART VALUES clause of the CREATE INDEX statement. This type of partitioning is referred to as index-controlled partitioning. A partitioning index is optional; table-controlled partitioning is a replacement for index-controlled partitioning.

2.8.4. Secondary indexes:

A secondary index is any index that is not a partitioning index. You can create an index on a table to enforce a uniqueness constraint, to cluster data, or, most typically, to provide access paths to data for queries.


The usefulness of an index depends on the columns in its key and on the cardinality of the key. Columns that you use frequently in performing selection, join, grouping, and ordering operations are good candidates for keys. In addition, the number of distinct values in an index key for a large table must be sufficient for DB2 to use the index for data retrieval; otherwise, DB2 could choose to perform a table space scan. A secondary index can be partitioned or not. This section discusses the two types of secondary indexes:

- Non partitioned secondary index (NPSI)
- Data-partitioned secondary index (DPSI)

2.8.4.1.Non partitioned secondary index (NPSI):

A non partitioned secondary index is any index that is not defined as a partitioning index or a partitioned index. You can create a non partitioned secondary index on a table that resides in a partitioned table space or a non partitioned table space.

2.8.4.2.Data-partitioned secondary index (DPSI):

A data-partitioned secondary index is any index that is not defined as a partitioning index but is defined as a partitioned index. You can create a partitioned secondary index only on a table that resides in a partitioned table space. The partitioning scheme is the same as that of the data in the underlying table. That is, the index entries that reference data in physical partition 1 of a table reside in physical partition 1 of the index, and so on [11].
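The structural difference between the two secondary index types can be shown with a toy model. This is a sketch in Python and not DB2 code: the row layout, the limit keys and all function names are hypothetical. An NPSI is modelled as one structure covering the whole table, and a DPSI as one structure per data partition that holds entries only for rows of that partition.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]   # (partitioning key, secondary key); hypothetical layout


def partition_of(key: int, limit_keys: List[int]) -> int:
    """Partition number for `key`, given ascending, inclusive limit keys."""
    part = bisect.bisect_left(limit_keys, key)
    if part == len(limit_keys):
        raise ValueError(f"key {key} is above the last limit key")
    return part


def build_npsi(rows: List[Row]) -> Dict[str, List[int]]:
    """One index structure spanning every partition of the table."""
    index: Dict[str, List[int]] = defaultdict(list)
    for part_key, sec_key in rows:
        index[sec_key].append(part_key)
    return dict(index)


def build_dpsi(rows: List[Row], limit_keys: List[int]) -> List[Dict[str, List[int]]]:
    """One index partition per data partition, holding only that partition's rows."""
    index = [defaultdict(list) for _ in limit_keys]
    for part_key, sec_key in rows:
        index[partition_of(part_key, limit_keys)][sec_key].append(part_key)
    return [dict(p) for p in index]


rows = [(50, "ann"), (150, "bob"), (250, "ann"), (120, "ann")]
limits = [100, 200, 300]          # partition 0: <=100, 1: <=200, 2: <=300
npsi = build_npsi(rows)
dpsi = build_dpsi(rows, limits)
print(npsi)   # {'ann': [50, 250, 120], 'bob': [150]}
print(dpsi)   # [{'ann': [50]}, {'bob': [150], 'ann': [120]}, {'ann': [250]}]
```

In the DPSI, the entries for "ann" are spread over three index partitions, each aligned with the data partition that holds the row, which matches the description of the partitioning scheme above.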

E. TERADATA:
2.9.PHYSICAL STORAGE STRUCTURE IN TERADATA:

The physical storage of the database has been virtual for a long time, in the sense that data placement, organization and management are automatically taken care of by the Teradata Database with minimal user input or administration. There are no tablespaces to define or maintain, no buffer storage or temporary workspaces to manage, no requirement to define how to spread data evenly across drives and controllers, and no need to perform periodic table or index reorganizations. The new Teradata Virtual Storage option can enhance storage management. To maintain optimum system balance in a data warehouse from Teradata, this new capability allows the physical storage to include a mix of devices with different capacities and performance levels. Additionally, Teradata Virtual Storage manages the data that gets distributed across those resources.

2.9.1. Virtualization:

Prior to Teradata 13, all storage in the configuration was assumed to have equivalent capacity and performance, and the only parameter used to determine where to store a piece of data was the ID of the owning AMP (Access Module Processor, the database engine that is the only entity in the database that can create, read, lock, unlock, update or delete a particular row of data). Since each drive (or drive pair with RAID1) is reserved for the use of a single AMP, any data from that AMP could be written to any of the drives that were reserved for it. All drives have equal performance, so to help balance the system, all AMPs are given access to an equal number of drives. However, this makes it difficult to alter the amount of storage in the system, for two reasons: any new drives must have the same capacity as the original drives, and enough drives must be added to maintain an equal number for each AMP. The link between each disk drive and an AMP is removed in Teradata Database 13 so that those constraints can be relaxed in Teradata Virtual Storage, allowing storage to be made up of mixed capacities. Even though there might not be enough high-capacity drives for each AMP, Teradata Virtual Storage will ensure that the necessary space is allocated to each AMP to maintain balanced performance.

2.9.2. Allocation:

In addition to changing how mixed storage is handled, Teradata Virtual Storage can also manipulate how it is used by the database. Space in the database can be allocated based on the owning AMP and the performance requirement, or response time, for the data that will be stored. This is accomplished by measuring the expected performance of each drive within the storage arrays. These measurements are used to match the correct storage resources with the type of data, including user data (tables/rows), spool data, indexes, logs, journals, fallback data and so on.
2.9.3. Temperature Monitoring:

In data warehousing terms, temperature represents the relative demand for a particular set of data (i.e., tables, partitions, rows). The data's temperature is described by a few qualitative terms, as opposed to a large range of explicit numerical values. "Hot" refers to data that is accessed frequently by users, such as the last 30 days of data in the promotions redemption table. "Cold" refers to data that is infrequently accessed. While temperature refers to the frequency of data access, terms like "fast" and "slow" are specific to the performance characteristics of storage devices. In other words, the relative performance of each storage device is measured by response time.

2.9.4. Data Migration:

Teradata Virtual Storage can automatically move data to different storage locations that best match its currently measured temperature. This is called migration. If there is a large difference between the data's temperature and the response time of the location where it is currently stored, the data can be moved to a more appropriate location (either faster or slower). This serves two purposes: correcting any initial placement that was based on an inaccurate prediction of the data's temperature, and modifying the placement of data to match its usage as it progresses through different temperature cycles.
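The idea of matching data temperature to device response time can be illustrated with a small placement routine. This is a sketch in Python under loud assumptions: it is not Teradata's algorithm, the greedy "hottest data to fastest free device" rule, the `Device` class, the access counts and the device figures are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    name: str
    response_ms: float   # measured response time; lower means faster
    free_slots: int      # how many data sets still fit on the device


def place_by_temperature(access_counts: Dict[str, int],
                         devices: List[Device]) -> Dict[str, str]:
    """Assign the most frequently accessed data to the fastest device with room."""
    hottest_first = sorted(access_counts, key=access_counts.get, reverse=True)
    fastest_first = sorted(devices, key=lambda d: d.response_ms)
    free = {d.name: d.free_slots for d in fastest_first}
    placement: Dict[str, str] = {}
    for item in hottest_first:
        for dev in fastest_first:
            if free[dev.name] > 0:
                placement[item] = dev.name
                free[dev.name] -= 1
                break
        else:
            raise ValueError(f"no storage left for {item}")
    return placement


devices = [Device("slow_disk", 8.0, 2), Device("fast_disk", 1.0, 1)]
accesses = {"promotions_last_30_days": 900, "orders_2009": 3, "audit_log": 40}
print(place_by_temperature(accesses, devices))
```

Re-running the routine with fresh access counts and comparing the result with the current placement yields the set of moves, which corresponds to the migration step described above.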


2.9.5. Varied Uses:

By more closely integrating the database and storage capabilities, Teradata Virtual Storage uses storage more effectively and efficiently for active and multi-temperature data warehouses. Many organizations will factor Teradata Virtual Storage into their planning efforts to expand their data warehouses with new subject areas and deeper histories, or additional applications and the associated processing and storage requirements. As updated storage solutions and components are released, Teradata Virtual Storage will let customers add them to the data warehouse.

2.10. INDEX TYPES IN TERADATA:

In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows; it can also be used to enforce the primary key and unique constraints for a table. The Teradata RDBMS supports five types of indexes:

- Unique Primary Index (UPI)
- Unique Secondary Index (USI)
- Non-Unique Primary Index (NUPI)
- Non-Unique Secondary Index (NUSI)
- Join Index

2.10.1. Unique Primary Index:

The primary index determines the distribution of table rows on the disks controlled by AMPs. In the Teradata RDBMS, a primary index is required for row distribution and storage. When a new row is inserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of the primary index (as shown in Figure 9). Rows having the same primary index value are stored on the same AMP.

Figure 9: Primary index concept
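Hash-based row distribution can be imitated in a few lines. The sketch below is Python, not Teradata's implementation: MD5 stands in for Teradata's own hashing algorithm, and taking the hash modulo the number of AMPs stands in for its hash map, so both are assumptions made only to show the principle that equal primary index values always land on the same AMP.

```python
import hashlib
from collections import Counter


def row_hash(primary_index_value: str) -> int:
    """Stand-in row hash: first 4 bytes of an MD5 digest (an assumption)."""
    digest = hashlib.md5(primary_index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(primary_index_value: str, n_amps: int) -> int:
    """AMP that owns a row, using hash modulo AMP count as a simplified hash map."""
    return row_hash(primary_index_value) % n_amps


N_AMPS = 4

# Unique primary index: distinct values spread over the AMPs.
unique_ids = [f"cust-{i}" for i in range(1000)]
per_amp = Counter(amp_for(v, N_AMPS) for v in unique_ids)
print(sorted(per_amp.items()))

# Non-unique primary index: every row with the same value goes to one AMP.
same_city_rows = ["Rawalpindi"] * 5
print({amp_for(v, N_AMPS) for v in same_city_rows})   # a single AMP number
```

The second case shows why a non-unique primary index with few distinct values can skew the distribution: all rows sharing a value are stored together on one AMP.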


2.10.2. Secondary Index:

In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined for a table. Compared with primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a sub-table that is stored on all AMPs, but separately from the primary table. The sub-tables, which are built and maintained by the system, contain the following:

- RowIDs of the sub-table rows
- Base table index column values
- RowIDs of the base table rows (pointers)

As shown in Figure 10, the secondary index sub-table on each AMP is associated with the base table by the rowID.

Figure 10: Secondary Index Concept

2.10.3. Join Index:

A join index is an indexing structure containing columns from multiple tables; specifically, the resulting columns from one or more tables. Rather than having to join individual tables each time the join operation is needed, the query can be resolved via a join index, which in most cases dramatically improves performance.

Effects of Join Index

Depending on the complexity of the joins, the join index helps improve the performance of certain types of work. The following need to be considered when manipulating join indexes:


- Load Utilities: Join indexes are not supported by the MultiLoad and FastLoad utilities; they must be dropped and recreated after the table has been loaded.
- Archive and Restore: Archive and Restore cannot be used on the join index itself. During a restore of a base table or database, the join index is marked as invalid. The join index must be dropped and recreated before it can be used again in the execution of queries.
- Fallback Protection: Join index sub-tables cannot be Fallback-protected.
- Permanent Journal Recovery: The join index is not automatically rebuilt during the recovery process. Instead, the join index is marked as invalid and must be dropped and recreated before it can be used again in the execution of queries.
- Triggers: A join index cannot be defined on a table with triggers.
- Collecting Statistics: In general, there is no benefit in collecting statistics on a join index for joining columns specified in the join index definition itself. Statistics related to these columns should be collected on the underlying base table rather than on the join index.


3. COMPARISON:
In this section, the database management systems described above are compared on the basis of their indexing and physical storage structure.

3.1.COMPARISON ON THE BASIS OF INDEXES:

[Table: availability of index types per DBMS. Columns: ORACLE, SQL SERVER, MySQL, DB2, TERADATA. Rows: B-tree index; Primary index (cluster index); Partitioned index; Reverse key index; Unique index; Normal index; Function-based index; Domain index; Non-clustered index; Secondary index; Bitmap index; Hash index; Full-text index; Partial index.]

In the above table, the available index types are compared across the database management systems. An X indicates that the index type is present in the particular DBMS.

3.2.COMPARISON ON THE BASIS OF PHYSICAL STORAGE STRUCTURE:

The comparative framework is attached as Annex A

4. BEST RECOMMENDED TOOL:

Oracle is well known for its industry-leading database engine, Oracle Database. Oracle Database is an extremely reliable, highly scalable, client-server, relational database management system (RDBMS). It serves as the information repository for a wide range of applications in a large variety of industries around the world.

Given its high cost, Oracle is targeted at large organizations. Organizations with 2000 to 4000 employees are considered to be in the large category.

The physical storage structure of an Oracle database includes data files, redo log files, and control files.

It allows the user to define logical storage units on available physical storage spaces and allows for creation and management of data files manually.

Oracle has a powerful clustering mechanism. One of the main advantages of Oracle is that it can be used on the major operating systems, such as Linux, Windows and many others.

Oracle offers many security features, such as user names, passwords, profiles, local authentication, external authentication and advanced security enhancements, whereas a DBMS like MySQL uses only three parameters to authenticate a user: user name, password and location.

Oracle has functionality within the database server to automate the management of its structure.

Oracle Enterprise Manager provides a Web-based graphical user interface to enable easy management and monitoring of the database.


Oracle provides alerts, advisories, and monitoring pages to help make decisions regarding database storage.

Oracle's support for manual management provides a reasonable way to optimize the storage structure in this large-organization scenario.

Oracle is recommended because its storage structure is better than that of SQL Server and Teradata: it offers a large list of index types, and its file structure is also very consistent.

With Oracle, choosing partitioning methods and partitioning keys is a matter of balancing query access paths, performance, and data load requirements. Specifically, the partitioning constraints and their relationship to disk storage are managed.

Teradata parallelism is automatic, pervasive, and database managed.


REFERENCES:
[1] http://www.esp.org/db-fund.pdf
[2] http://www.nou.edu.ng/noun/NOUN_OCL/pdf/pdf2/MBA%20758%20Database%20Management%20System.pdf
[3] http://wiki.answers.com/Q/Discuss_the_capabilities_and_features_of_a_database_management_system_dbms_what_set_of_functions_does_a_dbms_provide_give_some_examples_of_dbms_software
[4] http://churmura.com/technology/computer-science/physical-storage-structures-in-oracle/29192/
[5] http://docs.oracle.com/cd/E11882_01/server.112/e16508/physical.htm
[6] http://technet.microsoft.com/en-us/library/cc966414.aspx
[7] http://www.akadia.com/services/sqlsrv_data_structure.html
[8] http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html#
[9] http://docs.oracle.com/cd/E12151_01/doc.150/e12155/oracle_mysql_compared.htm
[10] http://en.wikibooks.org/wiki/Oracle_and_DB2,_Comparison_and_Compatibility/Storage_Model/Physical_Storage/DB2
[11] http://yanghui8.wordpress.com/2008/06/24/db2-v8-types-of-indexes/
