
Data model (Database Models)

A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest. According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."

A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.

Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.

Relational model: A database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.

Concept Oriented Model: This is the conceptual structuring of a database. The real structure may differ from it, because it depends heavily on the system and on the database designer, who may conceive the problem differently from the way it is actually implemented.

Star schema: The simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema.
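The star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (fact_sales, dim_product, dim_store), the columns and the sample rows are assumptions and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes (hypothetical names).
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    -- The single fact table references every dimension table.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO dim_store   VALUES (1, 'Delhi'), (2, 'Paris');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
""")

# A typical warehouse query: join the fact table to one dimension and aggregate.
totals = dict(conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
"""))
```

Here the fact table sits at the centre and each dimension is one join away, which is what gives the schema its star shape.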

Properties of Databases (ACID)

Atomicity
Atomicity requires that database modifications follow an all-or-nothing rule. A transaction is atomic if, when one part of the transaction fails, the entire transaction fails and the database state is left unchanged. It is critical that the database management system maintains the atomic nature of transactions in spite of any application, DBMS, operating system or hardware failure. An atomic transaction cannot be subdivided, and must be processed in its entirety or not at all. Atomicity means that users do not have to worry about the effect of incomplete transactions. Transactions can fail for several kinds of reasons:

Hardware failure: A disk drive fails, preventing some of the transaction's database changes from taking effect.

System failure: The user loses their connection to the application before providing all necessary information.

Database failure: E.g., the database runs out of room to hold additional data.

Application failure: The application attempts to post data that violates a rule that the database itself enforces, such as attempting to create a new account without supplying an account number.
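The all-or-nothing rule can be illustrated with a small sketch using Python's built-in sqlite3 module. The account table, the CHECK rule and the transfer function are hypothetical names chosen for this example; the failure simulated here is an application failure of the kind listed above (a change that violates a rule the database enforces).

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # the debit breaks the CHECK rule,
                                    # so the earlier credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the second call the credit to account 2 has been rolled back along with the failed debit, so the balances reflect only the first transfer.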

Consistency
The consistency property ensures that the database remains in a consistent state. More precisely, it says that any transaction will take the database from one consistent state to another consistent state. The consistency rule applies only to integrity rules that are within its scope. Thus, if a DBMS allows fields of a record to act as references to another record, then consistency implies the DBMS must enforce referential integrity: by the time any transaction ends, each and every reference in the database must be valid. If a transaction consisted of an attempt to delete a record referenced by another, each of the following mechanisms would maintain consistency:

Abort the transaction, rolling back to the consistent, prior state.

Delete all records that reference the deleted record (this is known as cascade delete).

Nullify the relevant fields in all records that point to the deleted record.
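The three mechanisms above can be compared side by side in a short sqlite3 sketch. The dept and emp_* tables are invented for illustration; SQLite is used only because it ships with Python, and it enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY);
    -- Default action: a delete that would orphan a row is refused.
    CREATE TABLE emp_abort (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES dept(id));
    -- Cascade delete: referencing rows are deleted with the parent.
    CREATE TABLE emp_cascade (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES dept(id) ON DELETE CASCADE);
    -- Nullify: the referencing field is set to NULL.
    CREATE TABLE emp_nullify (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES dept(id) ON DELETE SET NULL);
    INSERT INTO dept VALUES (1), (2), (3);
    INSERT INTO emp_abort   VALUES (10, 1);
    INSERT INTO emp_cascade VALUES (20, 2);
    INSERT INTO emp_nullify VALUES (30, 3);
""")

def delete_dept(dept_id):
    """Try to delete one department and report what the DBMS did."""
    try:
        with conn:
            conn.execute("DELETE FROM dept WHERE id = ?", (dept_id,))
        return "deleted"
    except sqlite3.IntegrityError:
        return "aborted"

results = [delete_dept(d) for d in (1, 2, 3)]
remaining_depts = [row[0] for row in conn.execute("SELECT id FROM dept")]
cascade_rows = conn.execute("SELECT COUNT(*) FROM emp_cascade").fetchone()[0]
nullified = conn.execute(
    "SELECT dept_id FROM emp_nullify WHERE id = 30"
).fetchone()[0]
```

In every case each remaining reference is valid when the transaction ends, which is the consistency requirement stated above.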

Isolation
Isolation refers to the requirement that other operations cannot access or see data that has been modified during a transaction that has not yet completed. Each transaction must remain unaware of other concurrently executing transactions, except that one transaction may be forced to wait for the completion of another transaction that has modified data that the waiting transaction requires.

Durability
Durability is the DBMS's guarantee that once the user has been notified of a transaction's success, the transaction will not be lost. The transaction's data changes will survive system failure, and all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction. Many DBMSs implement durability by writing transactions into a transaction log that can be reprocessed to recreate the system state right before any later failure. A transaction is deemed committed only after it is entered in the log.

Deeper into Database modeling language

Hierarchical model
A hierarchy can link entities either directly or indirectly, and either vertically or horizontally. The only direct links in a hierarchy, insofar as they are hierarchical, are to one's immediate superior or to one of one's subordinates, although a system that is largely hierarchical can also incorporate alternative hierarchies. Indirect hierarchical links can extend vertically upwards or downwards via multiple links in the same direction, following a path.

Degree of branching
Degree of branching refers to the number of direct subordinates or children an object has (equivalent to the number of outgoing edges of a node in a tree). Hierarchies can be categorized based on the maximum degree, the highest degree present in the system as a whole. Categorization in this way yields two broad classes: linear and branching.

In a linear hierarchy, the maximum degree is 1. In other words, all of the objects can be visualized in a lineup, and each object (excluding the top and bottom ones) has exactly one direct subordinate and one direct superior. Note that this is referring to the objects and not the levels; every hierarchy has this property with respect to levels, but normally each level can hold any number of objects. An example of a linear hierarchy is the hierarchy of life.

In a branching hierarchy, one or more objects have a degree of 2 or more (and therefore the maximum degree is 2 or higher). For many people, the word "hierarchy" automatically evokes an image of a branching hierarchy. Branching hierarchies are present within numerous systems, including organizations and classification schemes. The broad category of branching hierarchies can be further subdivided based on the degree.

A flat hierarchy is a branching hierarchy in which the maximum degree approaches infinity, i.e., with a wide span. Most often, systems intuitively regarded as hierarchical have at most a moderate span. Therefore, a flat hierarchy is often not viewed as a hierarchy at all at first blush. For example, diamond and graphite are flat hierarchies of numerous carbon atoms, which can be further decomposed into subatomic particles.

An overlapping hierarchy is a branching hierarchy in which at least one object has two parent objects. For example, a graduate student can have two co-supervisors to whom they report directly and equally, and who have the same level of authority within the university hierarchy (i.e., they have the same position or tenure status).
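The classification by degree of branching can be made concrete with a few lines of Python. The representation (a dict mapping each object to its list of direct subordinates), the function names and the sample hierarchies are assumptions made for this sketch.

```python
def max_degree(children):
    """Largest number of direct subordinates held by any object."""
    return max((len(subs) for subs in children.values()), default=0)

def classify(children):
    """'linear' if no object has more than one subordinate, else 'branching'."""
    return "linear" if max_degree(children) <= 1 else "branching"

def is_overlapping(children):
    """True if at least one object has two or more parents."""
    parent_count = {}
    for subs in children.values():
        for child in subs:
            parent_count[child] = parent_count.get(child, 0) + 1
    return any(count >= 2 for count in parent_count.values())

# Hypothetical examples: a chain, an organization chart, co-supervision.
chain = {"kingdom": ["phylum"], "phylum": ["class"], "class": []}
org = {"ceo": ["cto", "cfo"], "cto": ["dev"], "cfo": [], "dev": []}
supervision = {"prof_a": ["student"], "prof_b": ["student"], "student": []}
```

With these inputs, classify(chain) is "linear", classify(org) is "branching", and only the supervision example counts as overlapping.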

Network model
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

Object model
A collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system.

Relational model
Its central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.

The purpose of the relational model is to provide a declarative method for specifying data and queries: we directly state what information the database contains and what information we want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for getting queries answered.

Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost). The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.

DBMS Terminologies

Database management system (DBMS): Software for establishing, updating, and querying (i.e., managing) a database.

Database: A collection of files organized into related units that are then viewed as a single store. The data in the database are generally made available to a wide range of users through sharing, with different rights and roles assigned to different classes of users.

SQL (Structured Query Language): The standard query language of relational databases and the common platform through which different database engines interact.

Data warehouse: A physical repository where relational data are organized to provide clean, enterprise-wide data in a standardized format. A data warehouse is a huge database that stores current and historical data of potential interest to decision makers throughout the company. These data originate in different transaction processing systems (TPS) and through other external entry methods.

Data marts: Subsets of a data warehouse in which a summarized and highly focused portion of the organization's data is placed in a separate database for a specific set of users. Companies often build enterprise-wide warehouses where a central data warehouse serves the entire organization; or they create small decentralized warehouses called data marts.

Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. Entities carry attributes by which they are uniquely identified.

Relationship: Two different entities that have some logical association are connected using relationships. Relationships may also have attributes attached to them.

Attributes: The features or uniquely identifiable characteristics of an element (entity or relationship).


Relevance of relational design in DSS

Multidimensional problem solving: In DSS architecture, problem solving requires multiple ways of evaluating the problem and collecting the requisite information for each different evaluation.

Critical queries: DBMS and RDBMS can handle complex queries and information search, which is very useful in DSS.

Referentially integrated inputs: RDBMS and relational structuring of data help in connecting related fields and information of a single item or object.

Data warehousing support: An RDBMS can remotely connect to different servers to fetch data and span across boundaries to create a centralized data access medium, which eventually gives rise to data warehouses.

Data mart support: An RDBMS, through its access rights and different views of the same data, can create data marts for high-involvement decision making.

Sharability and scalability of information: Since a database accepts concurrent access, multiple users can log on to the same screen at different geographical locations or at different decision points. Information stored in the database is highly scalable, offering flexibility at the information searcher's end.

Database Normalization
Normalization is the scientific method of breaking down complex table structures into simple table structures using certain rules. This method is used to reduce redundancy in a table and eliminate the problems of inconsistency and disk space usage. The normalization theory is based on the fundamental notion of functional dependency. Given a relation (table) R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A. E.g., in the table below each Code value is associated with exactly one Name and one City:

Code  Name    City
E1    Mac     Delhi
E2    Sandra  CA
E3    Henry   Paris

Not Normalized Form
The relation is kept without applying any normalization rules or guidelines. E.g.,

ECODE  DEPT     DEPTHEAD  PROJCODE  HOURS
E101   Systems  E901      P27       90
                          P51       101
                          P20       60
E305   Sales    E906      P27       109
                          P22       98
E508   Admin    E908      P51       NULL
                          P27       72

First Normal Form (1NF)
A table is said to be in 1NF if each cell of the table contains precisely one value. E.g.,

ECODE  DEPT     DEPTHEAD  PROJCODE  HOURS
E101   Systems  E901      P27       90
E101   Systems  E901      P51       101
E101   Systems  E901      P20       60
E305   Sales    E906      P27       109
E305   Sales    E906      P22       98
E508   Admin    E908      P51       NULL
E508   Admin    E908      P27       72

Second Normal Form (2NF)
A table is said to be in 2NF when it is in 1NF and every attribute in the row is functionally dependent on the whole key, and not just on a part of the key.

Guidelines to convert a table to 2NF:
Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key. Place them in a different table.
Group the remaining attributes.

E.g.,

ECODE  DEPT     DEPTHEAD
E101   Systems  E901
E305   Sales    E906
E508   Admin    E908

ECODE  PROJCODE  HOURS
E101   P27       90
E101   P51       101
E101   P20       60
E305   P27       109
E305   P22       98
E508   P51       NULL
E508   P27       72
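The 2NF decomposition above can be reproduced in a few lines of Python. The sketch assumes that the key of the 1NF table is the pair (ECODE, PROJCODE), which is how the example reads; the variable names are illustrative only.

```python
# Rows of the 1NF table: (ECODE, DEPT, DEPTHEAD, PROJCODE, HOURS).
first_nf = [
    ("E101", "Systems", "E901", "P27", 90),
    ("E101", "Systems", "E901", "P51", 101),
    ("E101", "Systems", "E901", "P20", 60),
    ("E305", "Sales",   "E906", "P27", 109),
    ("E305", "Sales",   "E906", "P22", 98),
    ("E508", "Admin",   "E908", "P51", None),  # None stands for NULL
    ("E508", "Admin",   "E908", "P27", 72),
]

# DEPT and DEPTHEAD depend on ECODE alone (part of the key), so they move
# to their own table; using a set removes the repeated rows.
employee = sorted({(ecode, dept, head) for ecode, dept, head, _, _ in first_nf})

# HOURS depends on the whole key (ECODE, PROJCODE) and stays with it.
assignment = [(ecode, proj, hours) for ecode, _, _, proj, hours in first_nf]

# Joining the two tables on ECODE gives back the original rows, so the
# decomposition loses no information.
rejoined = [
    (ecode, dept, head, proj, hours)
    for (ecode, proj, hours) in assignment
    for (emp_code, dept, head) in employee
    if ecode == emp_code
]
```

The department details are now stored once per employee instead of once per project assignment, which is the redundancy that 2NF removes.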

Third Normal Form (3NF)
A table is said to be in 3NF when it is in 2NF and every non-key attribute is functionally dependent only on the primary key.

Guidelines to convert a table to 3NF:
Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key. Place them in a different table, together with the attribute that determines them.
Group the remaining attributes.

E.g.,

ECODE  DEPT
E101   Systems
E305   Sales
E402   Finance
E508   Admin
E607   Finance
E608   Finance
E104   Systems

DEPT     DEPTHEAD
Systems  E901
Sales    E906
Admin    E908
Finance  E909

Boyce - Codd Normal Form (BCNF)
A relation is in BCNF only if every determinant is a candidate key.

Guidelines to convert a table to BCNF:
Find and remove the overlapping candidate keys. Place the part of the candidate key and the attribute it is functionally dependent on in another table.
Group the remaining items into a table.

E.g.,

ECODE  NAME      PROJCODE  HOURS
E1     Veronica  P2        48
E2     Anthony   P5        100
E3     Mac       P6        15
E4     Susan     P2        250
E4     Susan     P5        75
E1     Veronica  P5        40

NAME, which is determined by ECODE alone, is placed with ECODE in a separate table. The remaining attributes form the table:

ECODE  PROJCODE  HOURS
E1     P2        48
E2     P5        100
E3     P6        15
E4     P2        250
E4     P5        75
E1     P5        40

DATA WAREHOUSE:

A data warehouse (DW) is a database used for reporting. The data is offloaded from the operational systems for reporting. The data may pass through an Operational Data Store (ODS) for additional operations before it is used in the DW for reporting. A data warehouse maintains its functions in three layers: staging, integration and access. Staging is used to store raw data for use by developers (analysis and support). The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users.

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments.

DATABASE ARCHITECTURE:
Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture, but rather there are multiple architectures that exist to support various environments and situations. The worthiness of the architecture can be judged from how the conceptualization aids in the building, maintenance, and usage of the data warehouse. One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:

1. Operational database layer
The source data for the data warehouse. An organization's Enterprise Resource Planning systems fall into this layer.

2. Data access layer
The interface between the operational and informational access layer. Tools to extract, transform and load data into the warehouse fall into this layer.

3. Metadata layer
The data directory. This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.

4. Informational access layer
The data accessed for reporting and analyzing and the tools for reporting and analyzing data. Business intelligence tools fall into this layer. The Inmon-Kimball differences about design methodology, discussed later in this article, have to do with this layer.

EVOLUTION IN ORGANIZATIONAL USE:
Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished:

1. Offline Operational Data Warehouse
Data warehouses in this initial stage are developed by simply copying the data off of an operational system to another server where the processing load of reporting against the copied data does not impact the operational system's performance.

2. Offline Data Warehouse
Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data are stored in a data structure designed to facilitate reporting.

3. Real Time Data Warehouse
Data warehouses at this stage are updated every time an operational system performs a transaction (e.g. an order or a delivery or a booking).

4. Integrated Data Warehouse

These data warehouses assemble data from different areas of business, so users can look up the information they need across other systems.

Benefits
Some of the benefits that a data warehouse provides are as follows:

• A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that, even if the source system data are purged over time, the information in the warehouse can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
• Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.

Disadvantages
There are also disadvantages to using a data warehouse. Some of them are:

• Data warehouses are not the optimal environment for unstructured data.
• Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.
• Over their life, data warehouses can have high costs.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
• There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems.

RELATIONAL ALGEBRA

Relational algebra is the formal language associated with the relational model. Informally, relational algebra is a (high-level) procedural language. Relational algebra operations work on one or more relations to define another relation without changing the original relations. Both operands and results are relations, so output from one operation can become input to another operation. This allows expressions to be nested, just as in arithmetic. This property is called closure.

The five basic operations in relational algebra are:
Selection
Projection
Cartesian product
Union
Set Difference

These operations perform most of the data retrieval operations needed. There are also the Join, Intersection, and Division operations, which can be expressed in terms of the five basic operations.

GRAPHICAL REPRESENTATION OF RELATIONAL ALGEBRA:

EXPLANATION OF RELATIONAL ALGEBRA OPERATIONS:

1. Selection (or Restriction): Symbolically, it can be represented as σ_{predicate}(R). It works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate). For example, list all staff with a salary greater than £10,000:
σ_{salary > 10000}(Staff)

2. Projection: Symbolically, it can be represented as Π_{col1, ..., coln}(R). It works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. For example, produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details:
Π_{staffNo, fName, lName, salary}(Staff)

3. Union: Symbolically, it can be represented as R ∪ S. The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. The relations R and S must be union-compatible. If R and S have I and J tuples, respectively, the union is obtained by concatenating them into one relation with a maximum of (I + J) tuples. For example, list all cities where there is either a branch office or a property for rent:
Π_{city}(Branch) ∪ Π_{city}(PropertyForRent)

4. Set Difference: Symbolically, it can be represented as R − S. It defines a relation consisting of the tuples that are in relation R, but not in S. The relations R and S must be union-compatible. For example, list all cities where there is a branch office but no properties for rent:
Π_{city}(Branch) − Π_{city}(PropertyForRent)

5. Intersection: Symbolically, it can be represented as R ∩ S. It defines a relation consisting of the set of all tuples that are in both R and S. The relations R and S must be union-compatible. It can be expressed using basic operations:
R ∩ S = R − (R − S)
For example, list all cities where there is both a branch office and at least one property for rent:
Π_{city}(Branch) ∩ Π_{city}(PropertyForRent)

6. Cartesian product: Symbolically, it can be represented as R × S. It defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. For example, list the names and comments of all clients who have viewed a property for rent:
(Π_{clientNo, fName, lName}(Client)) × (Π_{clientNo, propertyNo, comment}(Viewing))
On its own this product pairs every client with every viewing; a Selection on matching clientNo values is still needed to keep only the pairs that belong together.

7. Join: Join is a derivative of Cartesian product. It is equivalent to performing a Selection, using the join predicate as the selection formula, over the Cartesian product of the two operand relations. It is one of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have inherent performance problems. The various forms of join operation are:
Theta join
Equijoin (a particular type of Theta join)
Natural join
Outer join
Semijoin

8. Division: Symbolically, it can be represented as R ÷ S. It defines a relation over the attributes C, where C is the set of attributes of R that are not attributes of S, consisting of the set of tuples from R that match the combination of every tuple in S. For example, identify all clients who have viewed all properties with three rooms:
(Π_{clientNo, propertyNo}(Viewing)) ÷ (Π_{propertyNo}(σ_{rooms = 3}(PropertyForRent)))
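The operations described above can be tried out with a minimal Python sketch in which a relation is a set of tuples and each tuple is a frozenset of (attribute, value) pairs. The helper names (rel, select, project, product, intersect, divide) and all sample data are assumptions made for this illustration; union and set difference map directly onto Python's | and - set operators.

```python
def rel(rows):
    """Build a relation (a set of tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}

def select(r, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(dict(t))}

def project(r, attrs):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def product(r, s):
    """Cartesian product; assumes R and S share no attribute names."""
    return {t | u for t in r for u in s}

def intersect(r, s):
    """Intersection expressed through set difference: R - (R - S)."""
    return r - (r - s)

def divide(r, s):
    """Division: tuples over R's other attributes paired with every tuple of S."""
    s_attrs = {attr for u in s for attr, _ in u}
    candidates = {frozenset(p for p in t if p[0] not in s_attrs) for t in r}
    return {c for c in candidates if all((c | u) in r for u in s)}

# Hypothetical sample data.
staff = rel([{"staffNo": "S1", "salary": 9000},
             {"staffNo": "S2", "salary": 12000}])
branch = rel([{"city": "London"}, {"city": "Glasgow"}, {"city": "Bristol"}])
prop = rel([{"city": "London"}, {"city": "Aberdeen"}, {"city": "Glasgow"}])
viewing = rel([{"clientNo": "C1", "propertyNo": "P1"},
               {"clientNo": "C1", "propertyNo": "P2"},
               {"clientNo": "C2", "propertyNo": "P1"}])
three_room = rel([{"propertyNo": "P1"}, {"propertyNo": "P2"}])

well_paid = select(staff, lambda t: t["salary"] > 10000)
either = branch | prop               # union
branch_only = branch - prop          # set difference
both = intersect(branch, prop)
pairs = product(project(staff, ["staffNo"]), branch)
saw_all = divide(viewing, three_room)
```

Because every function takes relations and returns a relation, the calls can be nested freely, which is the closure property mentioned earlier.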

THE ENTITY-RELATIONSHIP MODEL
An entity-relationship model (ERM) is an abstract and conceptual representation of data. It is a high-level conceptual data model developed by Peter Chen in 1976, after the relational database model. Peter Chen felt that the relational model was not rich enough because every relation was based on a collection of (mathematical) domains. Entity-relationship modeling is a conceptual database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Conceptual modeling is an important phase in designing a successful database.

A conceptual data model is a set of concepts that describe the structure of a database and the associated retrieval and update transactions on the database. An ERM

So to sum up, the Entity-Relationship (E-R) Model is based on a view of the real world that consists of a set of objects called entities and relationships among entity sets, which are basically groups of similar objects. The relationship between entity sets is represented by a named E-R relationship and is of 1:1, 1:N or M:N type, which tells the mapping from one entity set to another.

The E-R model is shown diagrammatically using Entity-Relationship (E-R) diagrams, which represent the elements of the conceptual model and show the meanings of and the relationships between those elements independent of any particular DBMS and implementation details.

Features of the E-R Model:

1. The E-R diagram used for representing the E-R Model can be easily converted into relations (tables) in the relational model.

2. The E-R Model is used by the database developer for the purpose of good database design, so that the resulting data model can be used in various DBMSs.

3. It is helpful as a problem decomposition tool as it shows the entities and the relationships between those entities.

4. It is inherently an iterative process. On later modifications, new entities can be inserted into the model.

5. It is very simple and easy to understand for various types of users and designers because specific standards are used for its representation.

GOALS OF ER MODEL:
Capture semantics of information objects
Capture complexity of relationships between objects

FUNCTIONAL DEPENDENCIES

A dependency occurs in a database when information stored in the same database table uniquely determines other information stored in the same table. You can also describe this as a relationship where knowing the value of one attribute (or a set of attributes) is enough to tell you the value of another attribute (or set of attributes) in the same table.

Saying that there is a dependency between attributes in a table is the same as saying that there is a functional dependency between those attributes. If there is a dependency in a database such that attribute B is dependent upon attribute A, you would write this as "A -> B".

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.

Definition:

A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which would be the same as stating B is functionally dependent upon A.

Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have the same name but different SSNs.
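Whether a functional dependency is violated by the rows currently in a table can be checked mechanically. The following Python sketch uses a hypothetical helper named holds and made-up employee rows; note that sample data can only refute a dependency, it cannot prove that the dependency holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns
        # the stored value on later encounters.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows: two employees share a name but not an SSN.
employees = [
    {"ssn": "111", "name": "Ann"},
    {"ssn": "222", "name": "Bob"},
    {"ssn": "333", "name": "Ann"},
]

ssn_determines_name = holds(employees, ["ssn"], ["name"])
name_determines_ssn = holds(employees, ["name"], ["ssn"])
```

With these rows SSN -> name is not contradicted, while name -> SSN is refuted by the two employees called Ann.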

Trivial Functional Dependencies
A trivial functional dependency occurs when you describe a functional dependency of an attribute on a collection of attributes that includes the original attribute. For example, "{A, B} -> B" is a trivial functional dependency, as is "{name, SSN} -> SSN". This type of functional dependency is called trivial because it can be derived from common sense. It is obvious that if you already know the value of B, then the value of B can be uniquely determined by that knowledge.

Transitive Dependencies
Transitive dependencies occur when there is an indirect relationship that causes a functional dependency. For example, "A -> C" is a transitive dependency when it is true only because both "A -> B" and "B -> C" are true.

Multivalued Dependencies:

Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.

Irreducible Functional Dependency Set
A set S of functional dependencies is irreducible if it has the following three properties:
1. Each right set (right-hand side) of a functional dependency of S contains only one attribute.
2. Each left set (left-hand side) of a functional dependency of S is irreducible. This means that removing any one attribute from the left set will change the content of S (S will lose some information).
3. Removing any functional dependency will change the content of S.
Sets of functional dependencies (FD) with these properties are also called canonical or minimal.

Properties of functional dependencies:

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are Armstrong's axioms, which are used in database normalization:

Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X -> Y
Augmentation (Axiom of Augmentation): If X -> Y, then XZ -> YZ
Transitivity (Axiom of Transitivity): If X -> Y and Y -> Z, then X -> Z

From these rules, we can derive these secondary rules:

Union: If X -> Y and X -> Z, then X -> YZ
Decomposition: If X -> YZ, then X -> Y and X -> Z
Pseudo transitivity: If X -> Y and WY -> Z, then XW -> Z

Equivalent sets of functional dependencies are called covers of each other. Every set of functional dependencies has a canonical cover.
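One common way to apply Armstrong's axioms in practice is to compute the closure of a set of attributes: every attribute that can be derived from it under a given set of functional dependencies. The sketch below is a minimal illustration, assuming single-character attribute names; the function names closure and implies are hypothetical and do not come from the notes.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-character attribute names (an assumption of this sketch).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# Transitivity: A -> B and B -> C give A -> C.
fds = [("A", "B"), ("B", "C")]
print(sorted(closure("A", fds)))   # ['A', 'B', 'C']
print(implies(fds, "A", "C"))      # True
print(implies(fds, "C", "A"))      # False

# Pseudo transitivity: X -> Y and WY -> Z give XW -> Z.
print(implies([("X", "Y"), ("WY", "Z")], "XW", "Z"))  # True
```

Two sets of functional dependencies are covers of each other exactly when every dependency in one is implied by the other, which can be tested by calling implies in both directions.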

FEATURES OF DBMS
A database management system is a system in which related data is stored in an efficient and compact manner. Efficient means that the data stored in the DBMS can be accessed quickly, and compact means that the data takes up very little space in the computer's memory. The phrase "related data" means that the data stored pertains to a particular topic. Features commonly offered by database management systems include:

1. Query ability
Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: How many 2-door cars in Texas are green? A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data.

2. Backup and replication
Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMS usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

3. Rule enforcement
Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.

4. Security
For security reasons, it is desirable to limit who can see or change specific attributes or groups of attributes. This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

5. Computation
Common computations requested on attributes are counting, summing, averaging, sorting, grouping, cross-referencing, and so on. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.

6. Change and access logging
This describes who accessed which attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.

7. Automated optimization
For frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.

ADVANTAGES OF DBMS:
A database management system is a software program used to store, delete, update and retrieve data. A database can be limited to a single desktop computer or can be stored on large server machines, like the IBM Mainframe. There are various database management systems available in the market. Some of them are Sybase, Microsoft SQL Server, Oracle RDBMS, PostgreSQL, MySQL, etc. The advantages of database management systems can be enumerated as under:

1. Warehouse of Information
Database management systems are warehouses of information, where large amounts of data can be stored. Common examples in commercial applications are inventory data, personnel data, etc. People often use a database management system without even realizing it. The best examples are the address book of a cell phone, digital diaries, etc. Both of these devices store data in an internal database.

2. Defining Attributes
The unique data field in a table is assigned a primary key. The primary key helps in the identification of data. It also checks for duplicates within the same table, thereby reducing data redundancy. Some tables have a foreign key in addition to the primary key. The foreign key refers to the primary key of another table, thus establishing a relationship between the two tables.

3. Systematic Storage
The data is stored in the form of tables. The tables consist of rows and columns. The primary and foreign keys help to eliminate data redundancy, enabling systematic storage of data.

4. Changes to Schema
The table schema can be changed and it is not platform dependent. Therefore, the tables in the system can be edited to add new columns and rows without hampering the applications that depend on that particular database.

5. No Language Dependence
Database management systems are not language dependent. Therefore, they can be used with various languages and on various platforms.

6. Table Joins
The data in two or more tables can be combined into a single result. Because related data can be stored once and joined when needed, this helps to reduce the size of the database and also makes retrieval of data easy.

7. Multiple Simultaneous Usage
The database can be used simultaneously by a number of users. Various users can retrieve the same data simultaneously. The data in the database can also be modified, based on the privileges assigned to users.

8. Data Security
Data is the most important asset. Therefore, there is a need for data security. Database management systems help to keep the data secured.

9. Privileges
Different privileges can be given to different users. For example, some users can edit the database, but are not allowed to delete the contents of the database.

10. Abstract View of Data and Easy Retrieval
A DBMS enables easy and convenient retrieval of data. A database user sees only an abstract form of the data; the complexities of the internal structure of the database are hidden from the user. The data fetched is in a user friendly format.

11. Data Consistency
Data consistency ensures a consistent view of data to every user. It includes the accuracy, validity and integrity of related data. The data in the database must satisfy certain consistency constraints; for example, the age of a candidate appearing for an exam should be of number datatype and in the range of 20-25. When the database is updated, these constraints are checked by the database system.

The most commonly used kind of database management system is the relational database management system (RDBMS). The most important advantage of database management systems is the systematic storage of data, by maintaining the relationship between the data members. The data is stored as tuples in an RDBMS. The advent of object oriented programming gave rise to the concept of object oriented database management systems. These systems combine properties like inheritance, encapsulation, polymorphism and abstraction with atomicity, consistency, isolation and durability, also called the ACID properties of a DBMS. Database management systems have brought about systematization in data storage, along with data security.
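Several of the features and advantages listed above (query ability, rule enforcement, computation, primary and foreign keys, joins, and consistency constraints) can be seen together in a small example. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, sample rows and the helper rejected are all made up for illustration, and the CHECK on the number of doors plays the same role as the 20-25 age constraint in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE car (
    car_id   INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES owner(owner_id),
    state    TEXT NOT NULL,
    color    TEXT NOT NULL,
    doors    INTEGER NOT NULL CHECK (doors BETWEEN 2 AND 5)
);
""")

with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO owner VALUES (?, ?)",
                     [(1, "Asha"), (2, "Ben")])
    conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "TX", "green", 2),
        (2, 1, "TX", "green", 4),
        (3, 2, "TX", "green", 2),
        (4, 2, "CA", "green", 2),
        (5, 2, "TX", "red", 2),
    ])

# Query ability: how many 2-door cars in Texas are green?
green_two_door_tx = conn.execute(
    "SELECT COUNT(*) FROM car "
    "WHERE doors = 2 AND state = 'TX' AND color = 'green'"
).fetchone()[0]
print(green_two_door_tx)  # 2


def rejected(sql, params):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


insert_car = "INSERT INTO car VALUES (?, ?, ?, ?, ?)"
print(rejected(insert_car, (1, 1, "TX", "blue", 2)))   # True: duplicate primary key
print(rejected(insert_car, (6, 99, "TX", "blue", 2)))  # True: no such owner
print(rejected(insert_car, (7, 1, "TX", "blue", 9)))   # True: CHECK constraint

# Join plus computation: number of cars per owner.
per_owner = conn.execute(
    "SELECT o.name, COUNT(*) FROM owner o "
    "JOIN car c ON c.owner_id = o.owner_id "
    "GROUP BY o.name ORDER BY o.name"
).fetchall()
print(per_owner)  # [('Asha', 2), ('Ben', 3)]
```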

An Introduction to Data Mining
Discovering hidden value in your data warehouse

Overview
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective (future) analyses offered by data mining move beyond the analyses of past events provided by retrospective (backward-looking) tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They search databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

The Foundations of Data Mining
Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms

In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.

The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehouse environments.

The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data, and quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
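The retail example of products that are often purchased together can be illustrated with a very small counting sketch. This is not how any particular data mining product works; it is a simplified illustration, and the function name frequent_pairs, the sample baskets and the threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_count):
    """Count how often each pair of products appears in the same basket
    and keep the pairs seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical store scanner data: one list of products per purchase.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
    ["bread", "butter", "tea"],
]

print(frequent_pairs(baskets, 3))  # {('bread', 'butter'): 3}
```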

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions. Databases can be larger in both depth and breadth:

More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without preselecting a subset of variables.

More rows. Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.

The most commonly used techniques in data mining are:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
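Of the techniques listed above, the nearest neighbor method is the simplest to sketch. The example below is a minimal illustration that assumes numeric records, Euclidean distance as the similarity measure, and a majority vote as the way of combining classes; the function name knn_classify and the historical records are made up for the example.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify query by majority vote among the k historical records
    closest to it (Euclidean distance)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (features, known class).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.5), "high"),
]

print(knn_classify(history, (1.8, 1.6)))        # low
print(knn_classify(history, (6.4, 6.8), k=1))   # high
```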

Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.

Concurrency Control
When multiple transactions try to access the same sharable resource, many problems can arise if access is not controlled properly. There are some important mechanisms by which access control can be maintained. Earlier we talked about theoretical concepts like serializability; in practice, serializability can be enforced by using Locks and Timestamps. Here we shall discuss some protocols where Locks and Timestamps can be used to provide an environment in which concurrent transactions can preserve their Consistency and Isolation properties.

Lock Based Protocol
A lock is a mechanism that tells the DBMS whether a particular data item is being used by any transaction for read/write purposes. Since there are two types of operations, read and write, whose basic natures are different, the locks for read and write operations may behave differently.

Read operations performed by different transactions on the same data item pose less of a challenge. The value of the data item, as long as it does not change, can be read by any number of transactions at any given time.

The write operation is different. When a transaction writes some value into a data item, the content of that data item remains in an inconsistent state from the moment the writing operation begins up to the moment the writing operation is over. If we allow any other transaction to read/write the value of the data item during the write operation, that transaction will read an inconsistent value or overwrite the value being written by the first transaction. In both cases anomalies will creep into the database.

The simple rule for locking can be derived from here. If a transaction is reading the content of a sharable data item, then any number of other transactions can be allowed to read the content of the same data item. But if any transaction is writing into a sharable data item, then no other transaction will be allowed to read or write that same data item.

Depending upon these rules, we can classify locks into two types.

Shared Lock: A transaction may acquire a shared lock on a data item in order to read its content. The lock is shared in the sense that any other transaction can acquire a shared lock on that same data item for reading purposes.

Exclusive Lock: A transaction may acquire an exclusive lock on a data item in order to both read and write it. The lock is exclusive in the sense that no other transaction can acquire any kind of lock (either shared or exclusive) on that same data item.

The relationship between Shared and Exclusive Locks can be represented by the following table, which is known as the Lock Matrix. Each entry says whether the requested lock can be granted.

                     Locks already existing
    Lock requested   Shared      Exclusive
    Shared           TRUE        FALSE
    Exclusive        FALSE       FALSE

How Should Locks be Used?
In a transaction, a data item which we want to read/write should first be locked before the read/write is done. After the operation is over, the transaction should then unlock the data item so that other transactions can lock that same data item for their respective usage. In the earlier chapter we saw a transaction that transfers Rs 100/- from account A to account B. The transaction should now be written as the following:

    Lock-X (A);    (Exclusive Lock, we want to both read A's value and modify it)
    Read A;
    A = A - 100;
    Write A;
    Unlock (A);    (Unlocking A after the modification is done)
    Lock-X (B);    (Exclusive Lock, we want to both read B's value and modify it)
    Read B;
    B = B + 100;
    Write B;
    Unlock (B);    (Unlocking B after the modification is done)

And the transaction that deposits 10% of the amount in account A into account C should now be written as:

    Lock-S (A);    (Shared Lock, we only want to read A's value)
    Read A;
    Temp = A * 0.1;
    Unlock (A);    (Unlocking A)
    Lock-X (C);    (Exclusive Lock, we want to both read C's value and modify it)
    Read C;
    C = C + Temp;
    Write C;
    Unlock (C);    (Unlocking C after the modification is done)

Let us see how these locking mechanisms help us to create error free schedules. Remember that in the previous chapter we discussed an example of an erroneous schedule:

    T1                  T2
    Read A;
    A = A - 100;
                        Read A;
                        Temp = A * 0.1;
                        Read C;
                        C = C + Temp;
                        Write C;
    Write A;
    Read B;
    B = B + 100;
    Write B;

We detected the error by inspection: the context switch is performed before the new value of A has been written. T2 reads the old value of A, and thus deposits a wrong amount in C. Had we used the locking mechanism, this error could never have occurred. Let us rewrite the schedule using locks.

    T1                  T2
    Lock-X (A)
    Read A;
    A = A - 100;
                        Lock-S (A)
                        Read A;
                        Temp = A * 0.1;
                        Unlock (A)
                        Lock-X (C)
                        Read C;
                        C = C + Temp;
                        Write C;
                        Unlock (C)
    Write A;
    Unlock (A)
    Lock-X (B)
    Read B;
    B = B + 100;
    Write B;
    Unlock (B)

We cannot produce a schedule like the one above even if we want to, provided that we use locks in the transactions. See the first statement in T2, which attempts to acquire a lock on A. This is impossible because T1 has not released the exclusive lock on A, so T2 cannot get the shared lock it wants on A. It must wait until the exclusive lock on A is released by T1, and can begin its execution only after that. So the proper schedule would look like the following:

    T1                  T2
    Lock-X (A)
    Read A;
    A = A - 100;
    Write A;
    Unlock (A)
                        Lock-S (A)
                        Read A;
                        Temp = A * 0.1;
                        Unlock (A)
                        Lock-X (C)
                        Read C;
                        C = C + Temp;
                        Write C;
                        Unlock (C)
    Lock-X (B)
    Read B;
    B = B + 100;
    Write B;
    Unlock (B)

This schedule is correct by construction. We need not apply any manual effort to detect or correct the errors that may creep into a schedule when locks are not used.

Two Phase Locking Protocol
The use of locks has helped us to create neat and clean concurrent schedules. The Two Phase Locking Protocol defines the rules for how to acquire locks on data items and how to release them. The Two Phase Locking Protocol assumes that a transaction can only be in one of two phases.

Growing Phase: In this phase the transaction can only acquire locks, but cannot release any lock. The transaction enters the growing phase as soon as it acquires the first lock it wants. From then on it has no option but to keep acquiring all the locks it will need. It cannot release any lock in this phase even if it has finished working with a locked data item. Ultimately the transaction reaches a point where all the locks it may need have been acquired. This point is called the Lock Point.

Shrinking Phase: After the Lock Point has been reached, the transaction enters the shrinking phase. In this phase the transaction can only release locks, but cannot acquire any new lock. The transaction enters the shrinking phase as soon as it releases the first lock after crossing the Lock Point. From then on it has no option but to keep releasing all the acquired locks.

There are two different versions of the Two Phase Locking Protocol. One is called the Strict Two Phase Locking Protocol and the other is called the Rigorous Two Phase Locking Protocol.

Strict Two Phase Locking Protocol
In this protocol, a transaction may release all its shared locks after the Lock Point has been reached, but it cannot release any of its exclusive locks until the transaction commits. This protocol helps in creating cascadeless schedules.
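The Lock Matrix and the two-phase rule can be sketched in a few lines. This is a simplified, single-threaded illustration rather than a real lock manager: the class names LockTable and TwoPhaseTxn are hypothetical, a refused request returns False instead of making the transaction wait, and lock upgrades, commits and deadlocks are not modelled.

```python
# Lock Matrix: (lock already held, lock requested) -> can it be granted?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class LockTable:
    """Tracks which transactions hold which lock mode on each data item."""

    def __init__(self):
        self.held = {}  # item -> list of (transaction name, mode)

    def acquire(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]


class TwoPhaseTxn:
    """A transaction that refuses to lock anything after its first unlock."""

    def __init__(self, name, table):
        self.name = name
        self.table = table
        self.shrinking = False

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("two-phase rule violated: lock after first unlock")
        return self.table.acquire(self.name, item, mode)

    def unlock(self, item):
        self.shrinking = True  # the shrinking phase starts here
        self.table.release(self.name, item)


table = LockTable()
t1, t2 = TwoPhaseTxn("T1", table), TwoPhaseTxn("T2", table)
print(t1.lock("A", "X"))  # True:  T1 gets the exclusive lock on A
print(t2.lock("A", "S"))  # False: T2 must wait while T1 holds Lock-X (A)
t1.unlock("A")
print(t2.lock("A", "S"))  # True:  A is free again
```

Note that the transfer transaction shown earlier unlocks A before it locks B. Under the two-phase rule that order is not allowed: in this sketch, calling t1.lock("B", "X") after t1.unlock("A") raises RuntimeError, so a two-phase version of the transaction would acquire both locks before releasing either.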
A Cascading Schedule is a typical problem faced while creating concurrent schedules. Consider the following schedule once again.

    T1                  T2
    Lock-X (A)
    Read A;
    A = A - 100;
    Write A;
    Unlock (A)
                        Lock-S (A)
                        Read A;
                        Temp = A * 0.1;
                        Unlock (A)
                        Lock-X (C)
                        Read C;
                        C = C + Temp;
                        Write C;
                        Unlock (C)
    Lock-X (B)
    Read B;
    B = B + 100;
    Write B;
    Unlock (B)

The schedule is theoretically correct, but a subtle problem may arise here. T1 releases the exclusive lock on A, and immediately after that the context switch is made. T2 acquires a shared lock on A to read its value, performs a calculation, updates the content of account C and then issues COMMIT. However, T1 is not finished yet. What if the remaining portion of T1 encounters a problem (power failure, disk failure etc.) and cannot be committed? In that case T1 should be rolled back and the old before-image (BFIM) value of A should be restored. T2, which has read the updated (but not committed) value of A and calculated the value of C based on it, must then also be rolled back. We have to roll back T2 through no fault of T2 itself, but because we proceeded with T2 depending on a value which had not yet been committed. This phenomenon of rolling back a dependent (child) transaction because the transaction it read from (the parent) is rolled back is called Cascading Rollback, and it wastes a great deal of processing power and execution time.

Using the Strict Two Phase Locking Protocol, Cascading Rollback can be prevented. In the Strict Two Phase Locking Protocol a transaction cannot release any of its acquired exclusive locks until the transaction commits. In such a case, T1 would not release the exclusive lock on A until it finally commits, which makes it impossible for T2 to acquire the shared lock on A at a time when A's value has not been committed. This makes it impossible for a schedule to be cascading.

Rigorous Two Phase Locking Protocol
In the Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any lock (either shared or exclusive) until it commits.
This means that until the transaction commits, another transaction may acquire a shared lock on a data item on which the uncommitted transaction has a shared lock, but cannot acquire any lock on a data item on which the uncommitted transaction has an exclusive lock.

Timestamp Ordering Protocol
A timestamp is a tag that can be attached to any transaction or any data item, which denotes a specific time at which the transaction or data item was activated in some way. Anyone who uses computers will be familiar with the "Date Created" or "Last Modified" properties of files and folders. Timestamps are things like that.

A timestamp can be implemented in two ways. The simplest one is to directly assign the current value of the clock to the transaction or the data item. The other policy is to attach the value of a logical counter that keeps incrementing as new timestamps are required.

The timestamp of a transaction denotes the time when it was first activated. The timestamp of a data item can be of the following two types:

W-timestamp (Q): The latest time when the data item Q has been written into.
R-timestamp (Q): The latest time when the data item Q has been read from.

These two timestamps are updated each time a successful read/write operation is performed on the data item Q.

DBMS and its Applications

DBMS: A database management system is a software system. It allows access to the data in a database. It consists of a set of interrelated data together with a set of programs to access those data.

Objective: The objective of a DBMS is to provide a convenient and effective method of defining, storing and retrieving the data in the database.

Purpose of DBMS: Before the arrival of DBMS, data were processed using a file processing system. A file processing system consists of several application programs, and each application program has its own data stored in private files. In this system the same data file cannot be shared. Hence duplication of data is required if two or more application programs have to share the same data.

Disadvantage of file processing system:

The file processing system has the following major disadvantages:

- Data redundancy and inconsistency.
- Integrity problems.
- Security problems.
- Difficulty in accessing data.
- Data isolation.

a) Data redundancy and inconsistency: Data redundancy means duplication of data, and inconsistency means that the duplicated values are different.
b) Integrity problems: Data integrity means that the data values in the database should be accurate, in the sense that the values must satisfy some rules.
c) Security problem: Data security means prevention of data access by unauthorized users.
d) Difficulty in accessing data: Difficulty in accessing data arises whenever there is no application program for a specific task.
e) Data isolation: This problem arises due to the scattering of data in various files with various formats.

Due to the above disadvantages of the earlier data processing system, the need for an effective data processing system arose. The concept of the DBMS emerged at that time, to the rescue of a large number of organizations.

Database system applications:
- Universities: For student information, course details, and grades.
- Airlines: For reservations and schedule information.
- Credit card transactions: For purchases on credit cards and generation of monthly statements.
- Human resources: For information about employees, salaries, payroll taxes, benefits and for generation of paychecks.
- Banking: For customer information, accounts, and banking transactions.

Stages of growth model
The stages of growth model is a theoretical model for the growth of information technology (IT) in a business or similar organization. It was developed by Richard L. Nolan during the 1970s, and published by him in the Harvard Business Review.

Nolan's model concerns the general approach to IT in business. The model proposes that evolution of IT in organizations begins slowly in Stage I, the initiation stage. This stage is marked by hands off user awareness and an emphasis on functional applications to reduce costs. Stage I is followed by further growth of IT in the contagion stage. In this stage there is a proliferation of applications as well as the potential for more problems to arise. During Stage III a need for control arises. Centralized controls are put in place and a shift occurs from management of computers to management of data resources. Next, in Stage IV, integration of diverse technological solutions evolves. Management of data allows development without increasing IT expenditures in Stage V. Finally, in Stage VI, maturity, high control is exercised by using all the information from the previous stages.

Stage I - Initiation
In this stage, information technology is first introduced into the organization. According to Nolan's article in 1973, computers were introduced into companies for two reasons. The first reason deals with the company reaching a size where the administrative processes cannot be accomplished without computers. Also, the success of the business justifies large investment in specialized equipment. The second reason deals with computational needs.
Nolan defined the critical size of the company as the most prevalent reason for computer acquisition. Due to the unfamiliarity of personnel with the technology, users tend to take a hands off approach to new technology. This introductory software is simple to use and cheap to implement, which provides substantial monetary savings to the company. During this stage, the IT department receives little attention from management and works in a carefree atmosphere.

Stage I Key points:

- User awareness is characterized as being hands off.
- IT personnel are specialized for technological learning.
- IT planning and control is not extensive.
- There is an emphasis on functional applications to reduce costs.

Stage II - Contagion
Even though computers are recognized as "change agents" in Stage I, Nolan acknowledged that many users become alienated by computing. Because of this, Stage II is characterized by a managerial need to explain the potential of computer applications to alienated users. This leads to the adoption of computers in a range of different areas. A problem that arises in Stage II is that project and budgetary controls are not developed. Unavoidably, this leads to a saturation of existing computer capacity and more sophisticated computer systems being obtained. System sophistication requires employing specialized professionals. Due to the shortage of qualified individuals, hiring these employees results in high salaries. The budget for the computer organization rises significantly and causes concern for management. Although the price of Stage II is high, it makes evident that planning and control of computer systems is necessary. [1][2]

Stage II Key points:

- There is a proliferation of applications.
- Users are superficially enthusiastic about using data processing.
- Management control is even more relaxed.
- There is a rapid growth of budgets.
- Treatment of the computer by management is primarily as just a machine.
- Rapid growth of computer use occurs throughout the organization's functional areas.
- Computer use is plagued by crisis after crisis.

Sta$e ### 2 Control (tage """ is a reaction against e$cessi%e and !ncontrolled e$pendit!res of time and money spent on comp!ter systems, and the ma/or problem for management is the organi&ation of tas-s for control of comp!ter operating costs. "n this stage, pro/ect management and management report systems are organi&ed, which leads to de%elopment of programming, doc!mentation, and operation standards. D!ring (tage """, a shift occ!rs from management of comp!ters to management of data reso!rces. #his shift is an o!tcome of analysis of how to increase management control and planning in e$pending data processing operations. Also, the shift

pro%ides fle$ibility in data processing that is needed in a case of management<s new controls. #he ma/or characteristic of (tage """ is reconstr!ction of data processing operation.[4\[2\ (tage """ Key points*

• There is no reduction in computer use.
• The IT division's importance to the organization is greater.
• Centralized controls are put in place.
• Applications are often incompatible or inadequate.
• There is use of database and communications, often with negative general management reaction.
• End user frustration is often the outcome.

Stage IV – Integration
Stage IV features the adoption of new technology to integrate systems that were previously separate entities. This creates data processing (IT) expenditure growth rates similar to those of Stage II. In the latter half of Stage IV, exclusive reliance on computer controls leads to inefficiencies. The inefficiencies associated with rapid growth may create another wave of problems simultaneously. This is the last stage that Nolan acknowledged in his initial proposal of the stages of growth in 1973.
Stage IV Key points:

• There is a rise of control by the users.
• A larger data processing budget growth exists.
• There is greater demand for on-line database facilities.
• The data processing department now operates like a computer utility.
• There is formal planning and control within data processing.
• Users are more accountable for their applications.
• The use of steering committees and applications financial planning becomes important.
• Data processing has better management controls and set standards.

Stage V – Data administration

Nolan determined that four stages were not enough to describe the proliferation of IT in an organization and added Stage V in 1979. Stage V features a new emphasis on managing corporate data rather than IT. Like the succeeding Stage VI, it is marked by the development and maturity of the new concept of data administration.
Stage V Key points:

• Data administration is introduced.
• There is identification of data similarities, its usage, and its meanings within the whole organization.
• The applications portfolio is integrated into the organization.
• The data processing department now serves more as an administrator of data resources than of machines.
• A key difference is the use of the term IT/IS rather than data processing.

Stage VI – Maturity
In Stage VI, the application portfolio (tasks like order entry, general ledger, and material requirements planning) is completed, and its structure "mirrors" the organization and information flows in the company. During this stage, tracking sales growth becomes an important aspect. On average, 10% of processing is batch and remote job entry, 60% is dedicated to database and data communications processing, 5% is personal computing, and 25% is minicomputer processing. Management control systems are used the most in Stage VI (40%). There are three aspects of management control: manufacturing, marketing, and financial. Manufacturing control demands forecasting, that is, looking down the road for future needs. Marketing control deals strictly with research. Financial control forecasts cash requirements for the future. Stage VI exercises high control by compiling all of the information from Stages I through V. This allows the organization to function at high levels of efficiency and effectiveness.[1]
Stage VI Key points:

• Systems now reflect the real information needs of the organization.
• Greater use of data resources to develop competitive and opportunistic applications.
• The data processing organization is viewed solely as a data resource function.
• Data processing now emphasizes data resource strategic planning.
• Ultimately, users and the DP department are jointly responsible for the use of data resources within the organization.
• The manager of the IT system takes on the same importance in the organizational hierarchy as, say, the director of finance or director of HR.

Initial reaction
Richard Nolan's Stages of Growth Model seemed ahead of its time when it was first published in the 1970s.

Legacy
Critics agree that Nolan's model presents several shortcomings and is slightly out of date. As time has progressed, Richard Nolan's Stages of Growth Model has revealed some apparent weaknesses. However, many agree that this does not take away from his innovative look into the realm of computing development.

Criticism
One argument concerns the model's main focus on the change in budget, and whether it is "reasonable to assume that a single variable serves as a suitable surrogate for so much."[4] It seems logical that this single variable could be an indicator of other variables, such as the organizational environment or an organization's learning curve, but not that it is the sole driving force of the entire model. Nolan shows little connection that would make his initial point a valid one.

In his model, Richard Nolan states that the force behind the growth of computing through the stages is technological change. King and Kraemer[4] find this to be far too general. As they say, "there are additional factors that should be considered. Most important are the demand-side factors that create a ripe environment for technological changes to be considered and adopted."[4] As proposed, technological change has a multitude of facets that determine its necessity. Change cannot be brought forth unless it is needed under certain circumstances. Unwarranted change would result in excess costs and potential failure of the process.

Last, the stages of growth model assumes straightforward organizational goals that are to be determined through the technological change. This can be viewed as very naïve from the user perspective. King and Kraemer state, "the question of whether organizational goals are uniform and consistent guides for the behavior of organizational actors, as opposed to dynamic and changing targets that result from competition and conflict among organizational actors, has received considerable attention in the literature on computing." Clearly, organizational goals are ever changing and sometimes rigid indicators of direction. They cannot be "uniform" objectives that are not subject to change.

27.1.5 Selected Bibliography for World Wide Web Databases
The idea of the World Wide Web was proposed by Berners-Lee (1992, 1994) and his group at CERN in Geneva. The Informix Web Integration Option is described in Informix (1998a). Manola (1998) discusses the developments of new standards like XML and the document object model for integration of Web technology with object technology. Mendelzon (1997) describes concepts for query processing on the Web; Gravano and Garcia-Molina (1997) propose the notion of ranking answers to free-form queries on the Web; Atzeni et al. propose a data model named ARANEUS and two languages for querying, building hypertextual views, and deriving data. Fraternali (1999) provides a survey of approaches to supporting data intensive web applications. The area of electronic commerce will draw many benefits from and share many problems with the database development on the Web; a good overview is in Dogac (1998). Several white papers are available on the database vendors' Web sites about their products providing Web access to databases (e.g., www.oracle.com, www.informix.com). The www consortium (W3C) has a Web site where up-to-date information on XML may be found: www.w3.org/XML/; protocols are described at www.w3.org/protocols/. For details on the JDBC API specification, consult www.java.sun.com/products/.

27.1 Databases on the World Wide Web
27.1.1 Providing Access to Databases on the World Wide Web
27.1.2 The Web Integration Option of INFORMIX
27.1.3 The ORACLE WebServer
27.1.4 Open Problems with Web Databases
27.1.5 Selected Bibliography for World Wide Web Databases

The World Wide Web (WWW), popularly known as the Web, was originally developed in Switzerland at CERN (Note 1) in early 1990 as a large-scale hypermedia information service system for biological scientists to share information (Note 2). Today this technology allows

universal access to this shared information to anyone having access to the Internet, and the Web contains hundreds of millions of Web pages within the reach of millions of users. In Web technology, a basic client-server architecture underlies all activities. Information is stored on computers designated as Web servers in publicly accessible shared files encoded using HyperText Markup Language (HTML). A number of tools enable users to create Web pages formatted with HTML tags, freely mixed with multimedia content, from graphics to audio and even to video. A page has many interspersed hyperlinks, each literally a link that enables a user to browse or move from one page to another across the Internet. This ability has given tremendous power to end users in searching and navigating related information, often across different continents. Information on the Web is organized according to a Uniform Resource Locator (URL), something similar to an address that provides the complete pathname of a file. The pathname consists of a string of machine and directory names separated by slashes and ends in a filename. Web browsers interpret and present HTML documents to users. Popular Web browsers include the Internet Explorer of Microsoft and the Netscape Navigator. A collection of HTML documents and other files accessible via the URL on a Web server is called a Web site. For example, www.awl.com may be called the Web site of Addison Wesley Publishing.

27.1.1 Providing Access to Databases on the World Wide Web
Today's technology has been moving rapidly from static to dynamic Web pages, where content may be in a constant state of flux. The Web server uses a standard interface called the Common Gateway Interface (CGI) to act as the middleware, the additional software layer between the user interface front-end and the DBMS back-end that facilitates access to heterogeneous databases.
The CGI middleware executes external programs or scripts to obtain the dynamic information, and it returns the information to the server in HTML, which is given back to the browser. As the Web undergoes its latest transformations, it has become necessary to allow users access not only to file systems but to databases and DBMSs to support query processing, report generation, and so forth. The existing approaches may be divided into two categories:

1. Access using CGI scripts: The database server can be made to interact with the Web server via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web using CGI scripts, which are written in languages like PERL, Tcl, or C. The main disadvantage of this approach is that for each user request, the Web server must start a new CGI process: each process makes a new connection with the DBMS, and the Web server must wait until the results are delivered to it. No efficiency is achieved by any grouping of multiple users' requests; moreover, the developer must keep the scripts in the CGI-bin subdirectories only, which opens it to a possible breach of security. The fact that CGI has no language associated with it but requires database developers to learn PERL or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are scattered everywhere.

2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to allow access to relational databases through the execution of SQL statements. It is a way of connecting with databases without any additional processes for each client request. Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base connectivity as many believe. JDBC has the capabilities to connect to a database, send SQL statements to a database, and retrieve the results of a query using the Java classes Connection, Statement, and ResultSet, respectively. With Java's claimed platform independence, an application may run on any Java-capable browser, which loads the Java code from the server and runs it on the client's browser. The Java code is DBMS transparent; the JDBC drivers for individual DBMSs on the server end carry the task of interacting with that DBMS. If the JDBC driver is on the client, the application runs on the client and its requests are communicated to the DBMS directly by the driver. For standard SQL requests, many RDBMSs can be accessed this way.
The drawback of using JDBC is the prospect of executing Java through virtual machines with inherent inefficiency. The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to the RDBMSs.
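Both approaches above follow the same basic pattern: open a connection, execute an SQL statement, read the results, and hand HTML back to the Web server. The minimal sketch below shows that pattern for a CGI-style script. It is an illustration under stated assumptions, not code from the text: it uses Python and the standard sqlite3 module rather than PERL, Tcl, C, or JDBC, and the books table, the database file path, and the function names are all hypothetical.

```python
import html
import sqlite3


def make_demo_db(db_path):
    """Create a tiny hypothetical 'books' table so the sketch can be run."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE books (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("SQL Basics", 20.0), ("Web & Data", 35.5)],
    )
    conn.commit()
    conn.close()


def handle_request(db_path, min_price):
    """Answer one request the way a CGI script would: connect, query, emit HTML."""
    # A CGI process starts fresh for every request, so it opens its own connection.
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()          # loosely analogous to a JDBC Statement
        cur.execute(
            "SELECT title, price FROM books WHERE price >= ? ORDER BY title",
            (min_price,),
        )
        rows = cur.fetchall()        # loosely analogous to a JDBC ResultSet
    finally:
        conn.close()                 # the connection ends with the request

    # Escape values before placing them in HTML.
    items = "".join(
        f"<li>{html.escape(title)}: {price:.2f}</li>" for title, price in rows
    )
    body = f"<html><body><ul>{items}</ul></body></html>"
    # CGI output is a header block, a blank line, then the document itself.
    return "Content-Type: text/html\r\n\r\n" + body
```

Because every call to handle_request opens and closes its own connection, the sketch also makes the cost described above visible: nothing is shared between requests.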

Besides CGI, other Web server vendors are launching their own middleware products for providing multiple database connectivity. These include Internet Server API (ISAPI) from Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web access option provided by Informix. Other DBMS vendors already have, or will have, similar provisions to support database access on the Web.

27.1.2 The Web Integration Option of INFORMIX
Informix has addressed the limitations of CGI and the incompatibilities of CGI, NSAPI, and ISAPI by creating the Web Integration Option (WIO). WIO eliminates the need for scripts. Developers use tools to create intelligent HTML pages called Application Pages (or App Pages) directly within the database. They execute SQL statements dynamically, format the results inside HTML, and return the resulting Web page to the end users. The schematic architecture is shown in Figure 27.02. WIO uses the Web Driver, a lightweight CGI process that is invoked when a URL request is received by the Web server. A unique session identifier is generated for each request, but the WIO application is persistent and does not terminate after each request. When the WIO application receives a request from the Web driver, it connects to the database and executes Web Explode, a function that executes queries within Web pages and formats results as a Web page that goes back to the browser via the Web driver. Informix HTML tag extensions allow Web authors to create applications that can dynamically construct Web page templates from the Informix Dynamic Server and present them to the end users. WIO also lets users create their own customized tags to perform specialized tasks. Thus, without resorting to any programming or script development, powerful applications can be designed.
Another feature of WIO helps transaction-oriented applications by providing an application programming interface (API) that offers a collection of basic services, such as connection and session management, that can be incorporated into Web applications. WIO supports applications developed in C, C++, and Java. This flexibility lets developers port existing applications to the Web or develop new applications in these languages. The WIO is integrated with Web server software and utilizes the native security mechanism of the Informix Dynamic Server. The open architecture of WIO allows the use of various Web browsers and servers.

27.1.3 The ORACLE WebServer
ORACLE supports Web access to databases using the components shown in Figure 27.03. The client requests files that are called static or dynamic files from the Web server. Static files have a fixed content, whereas dynamic files may have content that includes results of queries to the database. There is an HTTP demon (a process that runs continuously) called Web Listener running on the server that listens for the requests originating in the clients. A static file (document) is retrieved from the file system of the server and displayed on the Web browser at the client. A request for a dynamic page is passed by the listener to a Web request broker (WRB), which is a multi-threaded dispatcher that adheres to cartridges. Cartridges are software modules (mentioned earlier in Section 13.2.6) that perform specific functions on specific types of data; they can communicate among themselves. Currently cartridges are provided for PL/SQL, Java, and Live HTML; customized cartridges may be provided as well. WebServer has been fully integrated with PL/SQL, making it efficient and scalable. The cartridges give it additional flexibility, making it possible to work with other languages and software packages. An advanced secure sockets layer may be used for secure communication over the Internet. The Designer 2000 development tool (see Section 16.1) has a Web generator that enables previous applications developed for LANs to be ported to the Internet and Intranet environments.

27.1.4 Open Problems with Web Databases
The Web is an important factor in planning for enterprise-wide computing environments, both for providing external access to the enterprise's systems and information for customers and suppliers and for marketing and advertising purposes.
At the same time, due to security requirements, employees of some organizations are restricted to operate within intranets, subnetworks that cannot be accessed freely from the outside world. Among the prominent applications of the intranet and the WWW are databases to support electronic storefronts, parts and product catalogs, directories and schedules, newsstands, and bookstores. Electronic commerce, the purchasing of products and services electronically on the Internet, is likely to become a major application supported by such databases. The future challenges of managing databases on the Web will be many, among them the following:

• Web technology needs to be integrated with the object technology. Currently, the Web can be viewed as a distributed object system, with HTML pages functioning as objects identified by the URL.
• HTML functionality is too simple to support complex application requirements. As we saw, the Web Integration Option of Informix adds further tags to HTML. In general, additional facilities will be needed to (1) make Web clients function as application front ends, integrating data from multiple heterogeneous databases; (2) make Web clients present different views of the same data to different users; and (3) make Web clients intelligent by providing additional data mining functionality (see Section 26.2).
• Web page content can be made more dynamic by adding more behavior to it as an object (see Chapter 11 for a discussion of object modeling). In this respect (1) client and server objects (HTML pages) can be made to interact; (2) Web pages can be treated as collections of programmable objects; and (3) client-side code can access these objects and manipulate them dynamically.
• The support for a large number of clients coupled with reasonable response times for queries against very large (several tens of gigabytes in size) databases will be major challenges for Web databases. They will have to be addressed both by Web servers and by the underlying DBMSs.
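The third challenge in the list above, treating a Web page as a collection of programmable objects that code can access and manipulate, can be sketched with Python's standard xml.dom.minidom module. This is only an illustration under assumptions: the page markup and the function names are hypothetical, and the page must be well-formed for this parser to accept it (ordinary HTML often is not).

```python
from xml.dom import minidom

# A small, well-formed page used only for illustration.
PAGE = "<html><body><h1>Catalog</h1><ul><li>Disk</li><li>Tape</li></ul></body></html>"


def add_item(page_source, text):
    """Parse the page into a tree of objects, append one list item, and serialize it."""
    doc = minidom.parseString(page_source)
    ul = doc.getElementsByTagName("ul")[0]      # locate the list element as an object
    li = doc.createElement("li")                # build a new element object
    li.appendChild(doc.createTextNode(text))
    ul.appendChild(li)                          # change the page by changing the tree
    return doc.documentElement.toxml()


def count_items(page_source):
    """Return how many list-item objects the page contains."""
    doc = minidom.parseString(page_source)
    return len(doc.getElementsByTagName("li"))
```

Calling add_item(PAGE, "CD-ROM") returns the page with a third list item; the page text is never edited as a string, only through its element objects.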

Efforts are underway to address the limitations of the current data structuring technology, particularly by the World Wide Web Consortium (W3C). The W3C is designing a Web Object Model. W3C is also proposing an Extensible Markup Language (XML) for structured document interchange on the Web. XML defines a subset of SGML (the Standard Generalized Markup Language), allowing customization of markup languages with application-specific tags. XML is rapidly gaining ground due to its extensibility in defining new tags. W3C's Document Object Model (DOM) defines an object-oriented API for HTML or XML documents presented by a Web client. W3C is also defining metadata modeling standards for describing Internet resources. The technology to model information using the standards discussed above and to find information on the Web is undergoing a major evolution. Overall, the Web servers have to gain robustness as a reliable technology to handle production-level databases for supporting 24x7 applications (24 hours a day, 7 days a week). Security remains a critical problem for supporting applications involving financial and medical databases.

27.2 Multimedia Databases
In the years ahead multimedia information systems are expected to dominate our daily lives. Our houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including digital libraries (see Section 27.6) that will distribute vast amounts of multisource multimedia content.

The Nature of Multimedia Data and Applications
In Section 23.3 we discussed the advanced modeling issues related to multimedia data. We also examined the processing of multiple types of data in Chapter 13 in the context of object relational DBMSs (ORDBMSs). DBMSs have been constantly adding to the types of data they support. Today the following types of multimedia data are available in current systems:

• Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.
• Graphics: Examples include drawings and illustrations that are encoded using some descriptive standards (e.g., CGM, PICT, postscript).
• Images: Includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These images are not subdivided into components. Hence querying them by content (e.g., find all images containing circles) is nontrivial.
• Animations: Temporal sequences of image or graphic data.
• Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.
• Structured audio: A sequence of audio components comprising note, tone, duration, and so forth.
• Audio: Sample data generated from aural recordings in a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.
• Composite or mixed multimedia data: A combination of multimedia data types such as audio and video, which may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats. Composite data also contains additional control information describing how the information should be rendered.

Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on appropriate modeling of the structure and content of data and then designing appropriate database schemas for storing and retrieving multimedia information. Multimedia information systems are very complex and embrace a large set of issues, including the following:

• Modeling: This area has the potential for applying database versus information retrieval techniques to the problem. There are problems of dealing with complex objects (see Chapter 11) made up of a wide range of types of data: numeric, text, graphic (computer-generated image), animated graphic image, audio stream, and video sequence. Documents constitute a specialized area and deserve special consideration.
• Design: The conceptual, logical, and physical design of multimedia databases has not been addressed fully, and it remains an area of active research. The design process can be based on the general methodology described in Chapter 16, but the performance and tuning issues at each level are far more complex.
• Storage: Storage of multimedia data on standard disklike devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during the input/output operation. Adhering to standards such as JPEG or MPEG is one way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. Standardized software will be required to deal with synchronization and compression/decompression, and will be coupled with indexing problems, which are still in the research domain.
• Queries and retrieval: The database way of retrieving information is based on query languages and internal index structures. The information retrieval way relies strictly on keywords or predefined index terms. For images, video data, and audio data, this opens up many issues, among them efficient query formulation, query execution, and optimization. The standard optimization techniques we discussed in Chapter 18 need to be modified to work with multimedia data types.

• Performance: For multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate. For instance, video must be delivered at a steady rate of 60 frames per second. Techniques for query optimization may compute expected response time before evaluating the query. The use of parallel processing of data may alleviate some problems, but such efforts are currently subject to further experimentation.
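The BLOB storage and keyword-based retrieval mentioned in the list above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a design from the text: it uses Python's standard sqlite3 module with an in-memory database, and the media and keyword tables and all function names are hypothetical. The media bytes are stored untyped, so the only way to find an item is through the index terms attached to it, which mirrors the information retrieval approach described above.

```python
import sqlite3


def open_media_store():
    """Create an in-memory store: untyped BLOBs plus a separate keyword index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, kind TEXT, data BLOB)")
    conn.execute("CREATE TABLE keyword (media_id INTEGER, term TEXT)")
    conn.execute("CREATE INDEX keyword_term ON keyword (term)")
    return conn


def store(conn, kind, data, terms):
    """Store raw bytes as a BLOB and attach predefined index terms to it."""
    cur = conn.execute("INSERT INTO media (kind, data) VALUES (?, ?)", (kind, data))
    media_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO keyword VALUES (?, ?)",
        [(media_id, term.lower()) for term in terms],
    )
    return media_id


def find_by_keyword(conn, term):
    """Retrieve media by index term; the BLOB content itself is never inspected."""
    return conn.execute(
        "SELECT m.id, m.kind, m.data FROM media m "
        "JOIN keyword k ON k.media_id = m.id "
        "WHERE k.term = ? ORDER BY m.id",
        (term.lower(),),
    ).fetchall()
```

A query such as "find all images containing circles" works here only if someone attached the term "circle" when the image was stored, which is exactly why content-based querying of images is described above as nontrivial.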
