
Master of Computer Application (MCA)

MC0077 – Advanced Database Systems


Answer all questions. Each question carries TEN marks.
1. Explain the following normal forms with a suitable example demonstrating the reduction of a
sample table into the said normal forms:

A) First Normal Form B) Second Normal Form C) Third Normal Form


Ans: 1NF: A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.

Example: 1NF but not 2NF

FIRST (supplier_no, status, city, part_no, quantity)


Functional Dependencies:
(supplier_no, part_no) → quantity
(supplier_no) → status
(supplier_no) → city
city → status (supplier's status is determined by location)
Comments:

Non-key attributes are not mutually independent (city → status).


Non-key attributes are not fully functionally dependent on the primary key (i.e., status and city are dependent on
just part of the key, namely supplier_no).
Anomalies:
INSERT: We cannot enter the fact that a given supplier is located in a given city until that supplier supplies at
least one part (otherwise, we would have to enter a null value for a column participating in the primary key, a
violation of the definition of a relation).
DELETE: If we delete the last (only) row for a given supplier, we lose the information that the supplier is
located in a particular city.
UPDATE: The city value appears many times for the same supplier. This can lead to inconsistency or the need
to change many values of city if a supplier moves.

Decomposition (into 2NF):

SECOND (supplier_no, status, city)


SUPPLIER_PART (supplier_no, part_no, quantity)

2NF: A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key.
Example (2NF but not 3NF):
SECOND (supplier_no, status, city)
Functional Dependencies:
supplier_no → status
supplier_no → city
city → status
Comments:
Lacks mutual independence among non-key attributes.
Mutual dependence is reflected in the transitive dependencies: supplier_no → city, city → status.
Anomalies:
INSERT: We cannot record that a particular city has a particular status until we have a supplier in that city.
DELETE: If we delete a supplier which happens to be the last row for a given city value, we lose the fact that
the city has the given status.
UPDATE: The status for a given city occurs many times, therefore leading to multiple updates and possible loss
of consistency.
Decomposition (into 3NF):
SUPPLIER_CITY (supplier_no, city)
CITY_STATUS (city, status)
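
To make these decomposition steps concrete, here is a minimal Python sketch (the supplier rows are invented
toy data, not from the question) that splits the unnormalised FIRST relation into the 2NF relations SECOND
and SUPPLIER_PART, and then splits SECOND into the 3NF relations SUPPLIER_CITY and CITY_STATUS:

    # A minimal sketch with invented toy data for FIRST(supplier_no, status, city, part_no, quantity).
    first = [
        {"supplier_no": "S1", "status": 20, "city": "London", "part_no": "P1", "quantity": 300},
        {"supplier_no": "S1", "status": 20, "city": "London", "part_no": "P2", "quantity": 200},
        {"supplier_no": "S2", "status": 10, "city": "Paris",  "part_no": "P1", "quantity": 100},
    ]

    # 1NF -> 2NF: remove the partial dependencies supplier_no -> status and supplier_no -> city.
    second = {r["supplier_no"]: (r["status"], r["city"]) for r in first}               # SECOND
    supplier_part = {(r["supplier_no"], r["part_no"]): r["quantity"] for r in first}   # SUPPLIER_PART

    # 2NF -> 3NF: remove the transitive dependency supplier_no -> city -> status.
    supplier_city = {s: city for s, (status, city) in second.items()}      # SUPPLIER_CITY
    city_status = {city: status for s, (status, city) in second.items()}   # CITY_STATUS

    print(supplier_city)   # {'S1': 'London', 'S2': 'Paris'}
    print(city_status)     # {'London': 20, 'Paris': 10}

Each dictionary key plays the role of that relation's primary key; a supplier's city and a city's status are now
each stored exactly once, so the INSERT, DELETE and UPDATE anomalies described above disappear.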

3NF: A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-
transitively dependent on the primary key. An attribute C is transitively dependent on attribute A if there exists
an attribute B such that: A → B and B → C. Note that 3NF is concerned with transitive dependencies which do
not involve candidate keys. A 3NF relation with more than one candidate key will clearly have transitive
dependencies of the form: primary_key → other_candidate_key → any_non-key_column

An alternative (and equivalent) definition for relations with just one candidate key is:

A relation R having just one candidate key is in third normal form (3NF) if and only if the non-key attributes of
R (if any) are: 1) mutually independent, and 2) fully dependent on the primary key of R. A non-key attribute is
any column which is not part of the primary key. Two or more attributes are mutually independent if none of
the attributes is functionally dependent on any of the others. Attribute Y is fully functionally dependent on
attribute X if X → Y, but Y is not functionally dependent on any proper subset of the (possibly composite)
attribute X

For relations with just one candidate key, this is equivalent to the simpler:

A relation R having just one candidate key is in third normal form (3NF) if and only if no non-key column (or
group of columns) determines another non-key column (or group of columns)

Example (3NF but not BCNF):


SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity)
Functional Dependencies:
We assume that supplier_name values are always unique to each supplier. Thus we have two candidate keys:
(supplier_no, part_no) and (supplier_name, part_no)
Thus we have the following dependencies:
(supplier_no, part_no) → quantity
(supplier_no, part_no) → supplier_name
(supplier_name, part_no) → quantity
(supplier_name, part_no) → supplier_no
supplier_name → supplier_no
supplier_no → supplier_name
Comments:
Although supplier_name → supplier_no (and vice versa), supplier_no is not a non-key column; it is part of the
primary key! Hence this relation technically satisfies the definition(s) of 3NF (and likewise 2NF, again because
supplier_no is not a non-key column).
Anomalies:
INSERT: We cannot record the name of a supplier until that supplier supplies at least one part.
DELETE: If a supplier temporarily stops supplying and we delete the last row for that supplier, we lose the
supplier's name.
UPDATE: If a supplier changes name, that change will have to be made to multiple rows (wasting resources
and risking loss of consistency).

2. Explain the concept of a Query. How does a Query Optimizer work?

Ans: Queries are essentially powerful filters. Queries allow you to decide which fields or expressions are to be
shown and what information is to be sought. Queries are usually based on Tables but can also be based on an
existing Query. Queries allow you to seek anything from very basic information through to much more complicated
specifications. They also allow you to list information in a particular order, such as listing all the resulting
records in Surname order, for example.

Queries can select records that fit certain criteria. If you had a list of people and had a gender field, you could
use a query to select just the males or females in the database. The gender field would have a criterion set to
"male", which means that when the query is run only records with "male" in the Gender field would be listed.
For each record that meets the criteria, you could choose to list other fields that may be in the table, such as first
name, surname, phone number, date of birth or whatever you may have in the database.

Queries can do much more than just list out records. It is also possible to list totals, averages, etc. from
the data and do various other calculations. Queries can also be used to do other tasks, such as deleting records,
updating records, adding new records, creating new tables and creating tabulated reports.
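
As a hedged illustration of these ideas in code (the person table, its columns and the sample rows are invented
for the example), the following Python sketch uses the standard sqlite3 module to run a filtering query, an
aggregate query and an update query:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (first_name TEXT, surname TEXT, gender TEXT, phone TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
        ("Asha", "Rao", "female", "555-0101"),
        ("Ravi", "Iyer", "male", "555-0102"),
        ("John", "Smith", "male", "555-0103"),
    ])

    # A query with a criterion: list only the males, in surname order.
    for row in conn.execute("SELECT first_name, surname, phone FROM person "
                            "WHERE gender = 'male' ORDER BY surname"):
        print(row)

    # Queries can also compute totals and averages from the data.
    print(conn.execute("SELECT gender, COUNT(*) FROM person GROUP BY gender").fetchall())

    # And they can modify data (an 'update query').
    conn.execute("UPDATE person SET phone = '555-0999' WHERE surname = 'Smith'")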

The query optimizer is the component of a database management system that attempts to determine the most
efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and
attempts to determine which of those plans will be the most efficient. Cost-based query optimizers assign an
estimated "cost" to each possible query plan, and choose the plan with the smallest cost. Costs are used to
estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required, the CPU
requirements, and other factors determined from the data dictionary. The set of query plans examined is formed
by examining the possible access paths (e.g. index scan, sequential scan) and join algorithms (e.g. sort-merge
join, hash join, nested loop join). The search space can become quite large depending on the complexity of the
SQL query.

Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database
server and parsed by the parser, they are passed to the query optimizer, where optimization occurs.
However, some database engines allow guiding the query optimizer with hints.

Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single
operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results
flow from the bottom of the tree to the top. Each node has zero or more child nodes—those are nodes whose
output is fed as input to the parent node. For example, a join node will have two child nodes, which represent
the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of
the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a
sequential scan.
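
A minimal Python sketch of this structure (the node classes and the toy rows are invented for illustration and do
not mirror any particular DBMS): each node's execute() returns its rows, and intermediate results flow from the
leaf scan nodes up to the root.

    # Toy plan nodes: each node's execute() returns a list of row dicts.
    class SeqScan:
        def __init__(self, table):             # leaf node: scans a "table" (a list of rows)
            self.table = table
        def execute(self):
            return list(self.table)

    class NestedLoopJoin:
        def __init__(self, left, right, key):  # two children: the join operands
            self.left, self.right, self.key = left, right, key
        def execute(self):
            rows = []
            for l in self.left.execute():
                for r in self.right.execute():
                    if l[self.key] == r[self.key]:
                        rows.append({**l, **r})
            return rows

    class Sort:
        def __init__(self, child, key):        # single child: the input to be sorted
            self.child, self.key = child, key
        def execute(self):
            return sorted(self.child.execute(), key=lambda row: row[self.key])

    customer = [{"cname": "Ann", "ccity": "Port Chester"}, {"cname": "Bob", "ccity": "Delhi"}]
    deposit  = [{"cname": "Ann", "bname": "Downtown"}, {"cname": "Bob", "bname": "Uptown"}]

    plan = Sort(NestedLoopJoin(SeqScan(customer), SeqScan(deposit), key="cname"), key="bname")
    print(plan.execute())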
Join ordering

The performance of a query plan is determined largely by the order in which the tables are joined. For example,
when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that
joins B and C first can take several orders-of-magnitude more time to execute than one that joins A and C first.
Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System
R database project. This algorithm works in two stages:

1. First, all ways to access each relation in the query are computed. Every relation in the query can be
accessed via a sequential scan. If there is an index on a relation that can be used to answer a predicate in
the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to
scan the relation, as well as the cheapest way to scan the relation that produces records in a particular
sorted order.
2. The optimizer then considers combining each pair of relations for which a join condition exists. For each
pair, the optimizer will consider the available join algorithms implemented by the DBMS. It will
preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair
of relations that produces its output according to a particular sort order.
3. Then all three-relation query plans are computed, by joining each two-relation plan produced by the
previous phase with the remaining relations in the query.

In this manner, a query plan is eventually produced that joins all the relations in the query. Note that the
algorithm keeps track of the sort order of the result set produced by a query plan, also called an interesting
order. During dynamic programming, one query plan is considered to beat another query plan that produces the
same result, only if they produce the same sort order. This is done for two reasons. First, a particular sort order
can avoid a redundant sort operation later on in processing the query. Second, a particular sort order can speed
up a subsequent join because it clusters the data in a particular way.
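
The sketch below captures the flavour of this dynamic-programming search in Python (the relation cardinalities
and the single join selectivity are invented, and interesting orders and the choice of join algorithm are ignored).
For every subset of relations it keeps only the cheapest plan found so far, analogous to the three stages above:

    from itertools import combinations

    # Assumed toy cardinalities and a single join selectivity (purely illustrative).
    card = {"A": 10, "B": 10_000, "C": 1_000_000}
    selectivity = 0.001   # assume every join keeps 0.1% of the Cartesian product

    def join_size(left_size, right_size):
        return left_size * right_size * selectivity

    relations = list(card)
    # best[subset] = (estimated_cost, estimated_size, plan_description)
    best = {frozenset([r]): (0, card[r], r) for r in relations}

    for n in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, n)):
            for k in range(1, n):
                for left in map(frozenset, combinations(subset, k)):
                    right = subset - left
                    lcost, lsize, lplan = best[left]
                    rcost, rsize, rplan = best[right]
                    size = join_size(lsize, rsize)
                    cost = lcost + rcost + size   # crude cost model: pay for intermediate sizes
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, size, f"({lplan} JOIN {rplan})")

    print(best[frozenset(relations)])   # cheapest plan found for joining A, B and C

With these invented numbers the search prefers joining the two small relations A and B first, exactly the kind of
ordering decision described above.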

Historically, System-R derived query optimizers would often only consider left-deep query plans, which first
join two base tables together, then join the intermediate result with another base table, and so on. This heuristic
reduces the number of plans that need to be considered (n! instead of 4^n), but may result in not considering the
optimal query plan. This heuristic is drawn from the observation that join algorithms such as nested loops only
require a single tuple (aka row) of the outer relation at a time. Therefore, a left-deep query plan means that
fewer tuples need to be held in memory at any time: the outer relation's join plan need only be executed until a
single tuple is produced, and then the inner base relation can be scanned (this technique is called "pipelining").

Subsequent query optimizers have expanded this plan space to consider "bushy" query plans, where both
operands to a join operator could be intermediate results from other joins. Such bushy plans are especially
important in parallel computers because they allow different portions of the plan to be evaluated independently.

Q.3. Explain the following with respect to Heuristics of Query Optimizations:


A) Equivalence of Expressions B) Selection Operation
C) Projection Operation D) Natural Join Operation

 Equivalent expressions

We often want to replace a complicated expression with a simpler one that means the same thing. For example,
the expression x + 4 + 2 obviously means the same thing as x + 6, since 4 + 2 = 6. More interestingly, the
expression x + x + 4 means the same thing as 2x + 4, because 2x is x + x when you think of multiplication as
repeated addition. (Which of these is simpler depends on your point of view, but usually 2x + 4 is more
convenient in Algebra.)

Two algebraic expressions are equivalent if they always lead to the same result when you evaluate them, no
matter what values you substitute for the variables. For example, if you substitute x := 3 in x + x + 4, then you
get 3 + 3 + 4, which works out to 10; and if you substitute it in 2x + 4, then you get 2(3) + 4, which also works
out to 10. There's nothing special about 3 here; the same thing would happen no matter what value we used, so
x + x + 4 is equivalent to 2x + 4. (That's really what I meant when I said that they mean the same thing.)

When I say that you get the same result, this includes the possibility that the result is undefined. For example,
1/x + 1/x is equivalent to 2/x; even when you substitute x := 0, they both come out the same (in this case,
undefined). In contrast, x²/x is not equivalent to x; they usually come out the same, but they are different when
x := 0. (Then x²/x is undefined, but x is 0.) To deal with this situation, there is a sort of trick you can play,
forcing the second expression to be undefined in certain cases. Just add the words ‘for x ≠ 0’ at the end of the
expression to make a new expression; then the new expression is undefined unless x ≠ 0. (You can put any other
condition you like in place of x ≠ 0, whatever is appropriate in a given situation.) So x²/x is equivalent to x for x ≠ 0.

To symbolise equivalent expressions, people often simply use an equals sign. For example, they might say ‘x +
x + 4 = 2x + 4’. The idea is that this is a statement that is always true, no matter what x is. However, it isn't
really correct to write ‘1/x + 1/x = 2/x’ to indicate an equivalence of expressions, because this statement is not
correct when x := 0. So instead, I will use the symbol ‘≡’, which you can read ‘is equivalent to’ (instead of ‘is
equal to’ for ‘=’). So I'll say, for example,

 x + x + 4 ≡ 2x + 4,
 1/x + 1/x ≡ 2/x, and
 x²/x ≡ x for x ≠ 0.

The textbook, however, just uses ‘=’ for everything, so you can too.

 Selection Operation

1. Consider the query to find the assets and branch-names of all banks who have depositors living in Port
Chester. In relational algebra, this is

   π BNAME, ASSETS (σ CCITY = "Port Chester" (CUSTOMER ⋈ DEPOSIT ⋈ BRANCH))

   o This expression constructs a huge relation,

     CUSTOMER ⋈ DEPOSIT ⋈ BRANCH,

     of which we are only interested in a few tuples.

   o We are also only interested in two attributes of this relation.

   o We can see that we only want tuples for which CCITY = "Port Chester".
   o Thus we can rewrite our query as:

     π BNAME, ASSETS (σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT ⋈ BRANCH)

   o This should considerably reduce the size of the intermediate relation.
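
A small Python sketch of the same heuristic on invented toy data: pushing the selection on CUSTOMER below
the join gives the same answer with a far smaller intermediate relation.

    # Toy relations as lists of dicts (purely illustrative data).
    customer = [{"cname": f"c{i}", "ccity": "Port Chester" if i % 100 == 0 else "Delhi"}
                for i in range(10_000)]
    deposit  = [{"cname": f"c{i}", "bname": f"b{i % 50}", "balance": 100 * i}
                for i in range(10_000)]

    def natural_join(r, s, key):
        index = {}
        for row in s:
            index.setdefault(row[key], []).append(row)
        return [{**a, **b} for a in r for b in index.get(a[key], [])]

    def select(rows, pred):
        return [row for row in rows if pred(row)]

    # Naive order: join first, then select -> large intermediate result.
    big = natural_join(customer, deposit, "cname")
    naive = select(big, lambda row: row["ccity"] == "Port Chester")

    # Heuristic: push the selection down to CUSTOMER before joining.
    small_customer = select(customer, lambda row: row["ccity"] == "Port Chester")
    pushed = natural_join(small_customer, deposit, "cname")

    print(len(big), len(pushed))   # 10000 vs 100 intermediate rows
    assert naive == pushed         # same answer, much smaller intermediate relation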

 Projection Operation

1. Like selection, projection reduces the size of relations.

   It is advantageous to apply projections early. Consider this form of our example query:

     π BNAME, ASSETS ((σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT) ⋈ BRANCH)

2. When we compute the subexpression

     σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT

   we obtain a relation whose scheme is

     (CNAME, CCITY, BNAME, ACCOUNT#, BALANCE)

3. We can eliminate several attributes from this scheme. The only ones we need to retain are those that
   o appear in the result of the query or
   o are needed to process subsequent operations.
4. By eliminating unneeded attributes, we reduce the number of columns of the intermediate result, and
   thus its size.
5. In our example, the only attribute we need is BNAME (to join with BRANCH). So we can rewrite our
   expression as:

     π BNAME, ASSETS (π BNAME (σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT) ⋈ BRANCH)

6. Note that there is no advantage in doing an early projection on a relation before it is needed for some other
   operation:
   o We would access every block of the relation to remove attributes.
   o Then we access every block of the reduced-size relation when it is actually needed.
   o We do more work in total, rather than less!

 Natural Join Operation

1. Another way to reduce the size of temporary results is to choose an optimal ordering of the join
   operations.
2. Natural join is associative:

     (E1 ⋈ E2) ⋈ E3 ≡ E1 ⋈ (E2 ⋈ E3)

3. Although these expressions are equivalent, the costs of computing them may differ.
   o Look again at our expression

     σ CCITY = "Port Chester" (CUSTOMER) ⋈ (DEPOSIT ⋈ BRANCH)

     we see that we can compute DEPOSIT ⋈ BRANCH first and then join the result with the first part.
   o However, DEPOSIT ⋈ BRANCH is likely to be a large relation as it contains one tuple for every
     account.
   o The other part,

     σ CCITY = "Port Chester" (CUSTOMER),

     is probably a small relation (comparatively).
   o So, if we compute

     σ CCITY = "Port Chester" (CUSTOMER) ⋈ DEPOSIT

     first, we get a reasonably small relation.
   o It has one tuple for each account held by a resident of Port Chester.
   o This temporary relation is much smaller than DEPOSIT ⋈ BRANCH.
4. Natural join is commutative:

     E1 ⋈ E2 ≡ E2 ⋈ E1

   o Thus we could rewrite our relational algebra expression as:

     (σ CCITY = "Port Chester" (CUSTOMER) ⋈ BRANCH) ⋈ DEPOSIT

   o But there are no common attributes between CUSTOMER and BRANCH, so this is a Cartesian
     product.
   o Lots of tuples!
   o If a user entered this expression, we would want to use the associativity and commutativity of
     natural join to transform it into the more efficient expression we derived earlier (join with
     DEPOSIT first, then with BRANCH).
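
A rough back-of-envelope sketch in Python (all row counts are invented) of why the two orderings differ so
much in the size of their intermediate results:

    # Invented cardinalities for the banking example above.
    port_chester_customers = 50      # |σ CCITY = "Port Chester" (CUSTOMER)|
    deposit_rows = 100_000           # one tuple per account
    accounts_per_customer = 2        # assumed average

    # Order 1: compute DEPOSIT ⋈ BRANCH first - still one tuple per account.
    intermediate_1 = deposit_rows                                    # 100,000 rows

    # Order 2: compute σ(CUSTOMER) ⋈ DEPOSIT first - only Port Chester accounts.
    intermediate_2 = port_chester_customers * accounts_per_customer  # 100 rows

    print(intermediate_1, intermediate_2)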

4. There are a number of historical, organizational, and technological reasons that explain the lack
of an all-encompassing data management system. Discuss a few of them with appropriate
examples.

Most current data management systems, DMS, have been built on the assumption that the data collection, or
database, to be administered consists of a single media type: structured tables of "fact" data or unstructured
strings of bits representing such media objects as text documents, images, or video. The result is that most
DMSs store and index a specific type of media data and provide a query (data access) language that is
specialized for efficient access to and retrieval of this data type.
A further assumption that has frequently been made is that the information requirements of the system users are
known and can be used for structuring the data collection and tuning the data management system. It has also
been assumed that the users would only infrequently require information/data from some other type of data
management system.

These assumptions have been criticized since the early 1980s by researchers who have pointed out that almost
from the point of creation, a database would not (nor could) contain all of the data required by the user
community (Gligor & Luckenbaugh, 1984; Landers & Rosenberg, 1982; Litwin et al., 1982; among many
others). A number of historical, organizational, and technological reasons explain the lack of an all-
encompassing data management system. Among these are:

 The sensible advice - to build small systems with the plan to extend their scope in later implementation
phases - allows a core system to be implemented relatively quickly, but has led to a proliferation of
relatively small systems.
 Department autonomy has led to construction of department specific rather than organization wide
systems, again leading to many small, overlapping, and often incompatible systems within an
organization.
 The continual evolution of the organization and of its interactions, both internally and with its external
environment, prohibits complete understanding of future information requirements.
 Parallel development of data management systems for particular applications has led to different and
incompatible systems for management of tabular/administrative data, text/document data,
historical/statistical data, spatial/geographic data, and streamed/audio and visual data.

The result is that only a portion of an organization's data is administered by any one data management system
and most organizations have a multitude of special purpose databases, managed by different, and often
incompatible, data management system types. The growing need to retrieve data from multiple databases within
an organization, as well as the rapid dissemination of data through the Internet, has given rise to the requirement
of providing integrated access to both internal and external data of multiple types.

A major challenge and critical practical and research problem for the information, computer, and
communication technology communities is to develop data management systems that can provide efficient
access to the data stored in multiple private and public databases (Brodie, 1993; Hurson & Bright, 1996;
Nordbotten, 1988a, 1988b and Nordbotten, 1994a).

Problems to be resolved include:

1. Interoperability among systems (Fox & Sornil, 1999; Litwin & Abdellatif, 1986),
2. Incorporation of legacy systems (Brodie, 1993) and
3. Integration of management techniques for structured and unstructured data (Stonebraker & Brown,
1999).

Each of the above problems entails an integration of concepts, methods, techniques and tools from separate
research and development communities that have existed in parallel but independently and have had rather
minimal interaction. One consequence is that there exists an overlapping and conflicting terminology
between these communities.

In the previous chapter, a database was defined as a COLLECTION OF RELATED DATA
REPRESENTING SOME LOGICALLY COHERENT ASPECT OF THE REAL WORLD.
With this definition, NO limitations are given as to the type of:
 Data in the collection,
 Model used to structure the collection, or
 Architecture and geographic location of the database

The focus of this text is on on-line - electronic and web accessible - databases containing multiple media data,
thus restricting our interest/focus to multimedia databases stored on one or more computers (DB servers) and
accessible from the Internet. Examples of such databases include the image collections of the Hermitage
Museum, the catalog and full text materials of the ACM digital library, and the customer records for the 7 sites
of Amazon.com.

Electronic databases are important since they contain data recording the products and services, as well as the
economic history and current status, of the owner organization. They are also a source of information for the
organization's employees and customers/users. However, databases cannot be used effectively unless there
exist efficient and secure data management systems, DMS, for the data in the databases.

Q5. Describe the Structural Semantic Data Model (SSM) with relevant examples.

Ans: Modelling Complex and Multimedia Data

Data modelling addresses a need in information system analysis and design to develop a model of the
information requirements as well as a set of viable database structure proposals. The data modelling process
consists of:

1. Identifying and describing the information requirements for an information system,


2. Specifying the data to be maintained by the data management system, and
3. Specifying the data structures to be used for data storage that best support the information requirements.

A fundamental tool used in this process is the data model, which is used both for specification of the
information requirements at the user level and for specification of the data structure for the database. During
implementation of a database, the data model guides construction of the schema or data catalog which contains
the metadata that describe the DB structure and data semantics that are used to support database implementation
and data retrieval.

Data modelling, using a specific data model type, and as a unique activity during information system design, is
commonly attributed to Charles Bachman (1969) who presented the Data Structure Diagram as one of the first,
widely used data models for network database design. Several alternative data model types were proposed
shortly thereafter, the best known of which are the:

 Relational model (Codd, 1970) and the


 Entity-relationship, ER, model (Chen, 1976).

The relational model was quickly criticized for being 'flat' in the sense that all information is represented as a set
of tables with atomic cell values. The definition of well-formed relational models requires that complex
attribute types (hierarchic, composite, multi-valued, and derived) be converted to atomic attributes and that
relations be normalized. Inter-entity (inter-relation) relationships are difficult to visualize in the resulting set of
relations, making control of the completeness and correctness of the model difficult. The relational model maps
easily to the physical characteristics of electronic storage media, and as such, is a good tool for design of the
physical database.
The entity-relationship approach to modelling, proposed by Chen (1976), had two primary objectives: first to
visualize inter-entity relationships and second to separate the DB design process into two phases:

1. Record, in an ER model, the entities and inter-entity relationships required "by the enterprise", i.e. by the
owner/user of the information system or application. This phase and its resulting model should be
independent of the DBMS tool that is to be used for realizing the DB.
2. Translate the ER model to the data model supported by the DBMS to be used for implementation.

This two-phase design supports modification at the physical level without requiring changes to the enterprise or
user view of the DB content.

Chen's ER model also quickly came under criticism, particularly for its lack of ability to model classification
structures. In 1977, Smith & Smith presented a method for modelling generalization and aggregation
hierarchies that underlie the many extended/enhanced entity-relationship, EER, model types proposed and in
use today.

6. What are the differences between Global and Local Transactions in a distributed database system? What
are the roles of the Transaction Manager and Transaction Coordinator in managing transactions in a
distributed database?

Ans: A distributed database system consists of a collection of sites, each of which maintains a local database
system. Each site is able to process local transactions, those transactions that access data only in that single site.
In addition, a site may participate in the execution of global transactions, those transactions that access data in
several sites. The execution of global transactions requires communication among the sites.
The sites in the system can be connected physically in a variety of ways. The various topologies are represented
as graphs whose nodes correspond to sites. An edge from node A to node B corresponds to a direct
connection between the two sites. Some of the most common configurations are depicted in Figure 1. The major
differences among these configurations involve:

 Installation cost. The cost of physically linking the sites in the system.
 Communication cost. The cost in time and money to send a message from site A to site B.
 Reliability. The frequency with which a link or site fails.
 Availability. The degree to which data can be accessed despite the failure of some links or sites.
As we shall see, these differences play an important role in choosing the appropriate
mechanism for handling the distribution of data. The sites of a distributed database system may be distributed
physically either over a large geographical area (such as all the Indian states) or over a small geographical area
(such as a single building or a number of adjacent buildings). The former type of network is referred to as a
long-haul network, while the latter is referred to as a local-area network. Since the sites in long-haul networks
are distributed physically over a large geographical area, the communication links are likely to be relatively
slow and less reliable as compared with local-area networks. Typical long-haul links are telephone lines,
microwave links, and satellite channels. In contrast, since all the sites in local-area networks are close to each
other, communication links are of higher speed and lower error rate than their counterparts in
long-haul networks. The most common links are twisted pair, baseband coaxial, broadband
coaxial, and fiber optics.
Let us illustrate these concepts by considering a banking system consisting of four branches located in four
different cities. Each branch has its own computer with a database consisting of all the accounts maintained at
that branch. Each such installation is thus a site. There also exists one single site which maintains information
about all the branches of the bank. Suppose that the database systems at the various sites are based on the
relational model. Thus, each branch maintains (among others) the relation deposit (Deposit-scheme), where
Deposit-scheme = (branch-name, account-number, customer-name, balance)
The site containing information about the four branches maintains the relation branch (Branch-scheme), where
Branch-scheme = (branch-name, assets, branch-city)
There are other relations maintained at the various sites which are ignored for the purpose of our example.
A local transaction is a transaction that accesses accounts only in the single site at which the transaction was
initiated. A global transaction, on the other hand, is one which either accesses accounts in a site different from
the one at which the transaction was initiated, or accesses accounts in several different sites. To illustrate the
difference between these two types of transactions, consider the transaction to add $50 to account number 177
located at the Delhi branch. If the transaction was initiated at the Delhi branch, then it is considered local;
otherwise, it is considered global. A transaction to transfer $50 from account 177 to account 305, which is
located at the Bombay branch, is a global transaction since accounts in two different sites are accessed as a
result of its execution. What makes the above configuration a distributed database system are the facts that:
The various sites are aware of each other.
Each site provides an environment for executing both local and global transactions.
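
As a hedged sketch (the site names and the helper function are invented for the example), the classification of a
transaction follows directly from comparing the set of sites whose data it touches with the site where it was
initiated:

    # Where each account's data lives (illustrative only).
    site_of_account = {177: "Delhi", 305: "Bombay"}

    def classify(initiating_site, accounts_accessed):
        """Return 'local' or 'global' for a transaction, following the definition above."""
        sites = {site_of_account[a] for a in accounts_accessed}
        if sites == {initiating_site}:
            return "local"    # touches data only at the site where it started
        return "global"       # touches data at another site, or at several sites

    print(classify("Delhi", [177]))        # deposit $50 into account 177 at Delhi -> 'local'
    print(classify("Delhi", [177, 305]))   # transfer from Delhi to Bombay -> 'global'
    print(classify("Bombay", [177]))       # initiated at Bombay but account is at Delhi -> 'global'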

There are several reasons for building distributed database systems, including sharing of data, reliability
and availability, and speedup of query processing. However, along with these advantages come several
disadvantages, including software development cost, greater potential for bugs, and increased processing
overhead.

The primary disadvantage of distributed database systems is the added complexity required to ensure proper
co-ordination among the sites. There are several issues involved in storing a relation in the distributed database,
including replication and fragmentation. It is essential that the system minimise the degree to which a user needs
to be aware of how a relation is stored.

A storage manager is a program module that provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the system. The storage manager is responsible
for the interaction with the file manager. The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system. The storage manager translates the various DML
statements into low-level file-system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database. The storage manager components include:

 Authorization and integrity manager, which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
 Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
 File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
 Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the database
system, since it enables the database to handle data sizes that are much larger than the size of main
memory.

The storage manager implements several data structures as part of the physical system implementation:

1. Data files, which store the database itself.


2. Data dictionary, which stores metadata about the structure of the database, in particular the schema of
the database.
3. Indices, which provide fast access to data items that hold particular values.

-------------------------- xxxxxxxxxxxxxxxxx -------------------------
