
CERTIFICATE

This is to certify that


Ms. Shaivya Easwaren
Roll no. 115
BE I
has completed the necessary seminar work and prepared the bona fide report
on

XML-Based Servers- Communicating meaningful information over the Web using XML
in a satisfactory manner as partial fulfillment for requirement of the degree
of
B.E (Computer)
Of
University of Pune
in the academic year 2002-2003
Date:
Place:

Prof.
Internal Guide

Prof. G P Potdar
Seminar coordinator

Prof. Dr. C V K Rao


H.O.D

DEPARTMENT OF COMPUTER ENGINEERING


PUNE INSTITUTE OF COMPUTER TECHNOLOGY
PUNE - 43

ACKNOWLEDGEMENT
I would like to express my sincere gratitude to Prof. G P Potdar, our seminar coordinator, Prof. Dr. C V K Rao, Head of the Computer Department, PICT, and Prof.
, our internal guide, for their valuable guidance in the completion and presentation
of the seminar, and the submission of this report.
-Shaivya Easwaren.
Roll No. 115,
BE I
(2002-2003)

INDEX
Chapter 1- Introduction
1.1 What is XML?
1.2 Why extensible?
1.3 Pieces of XML
1.4 Where it is used
1.4.1 Reducing server load
1.4.2 Web site content
1.4.3 Remote procedure calls
1.4.4 E-Commerce
Chapter 2- XML in the Browser
2.1 Example XML document
2.2 Rules for elements
2.3 XML parsers
2.4 Formatting XML documents
2.4.1 Cascading style sheet language
2.4.2 Extensible style sheet language
Chapter 3- XML Interfaces
3.1 Document Object Model (DOM) and XML-DOM
3.2 Simple API for XML (SAX)
3.3 Namespaces
Chapter 4- XML-Data
4.1 Introduction
4.2 XML specific elements
4.3 Example of XML schema
Chapter 5- XML and Databases
5.1 Using XML in an N-tier Application
5.2 Returning XML from a Data Object
5.3 Database Vendors and XML
5.3.1 Microsoft's XML technologies
5.3.2 Oracle's XML technologies
Chapter 6- Applications of XML
Chapter 7- Conclusion
Bibliography

CHAPTER 1
INTRODUCTION
1.1 WHAT IS XML?

Extensible Markup Language (XML) is the latest buzzword on the Internet, but it is also
a rapidly growing and maturing technology with real-world applications, particularly for
the management, display, and organization of data. It is primarily a technology concerned
with the description and structuring of data.
The idea of a universal data format is not new. An early attempt to combine a
universally acceptable data format with rich information storage capabilities was SGML
(Standard Generalized Markup Language). The best known application of SGML is
HTML (Hypertext Markup Language). The idea was that any HTML document (or web
page) would be presentable in any application capable of understanding HTML
(termed a web browser).
Unfortunately, SGML is such a complicated language that it is not well suited to data
interchange over the Web. HTML, too, is limited in scope, in that it is intended only for
displaying documents in a browser. XML was developed to offer SGML's ability to
describe specialized kinds of information in a simpler form. XML is thus
a subset of SGML and fully compatible with it.
It is important to note, however, that XML is not really a language at all, but a standard
for creating languages that meet the XML criteria. It describes a syntax that you
use to create your own languages. XML documents can be viewed in the IE5 web browser,
since IE5 contains a default built-in style sheet for displaying them.
XML is very flexible, and is therefore targeted to be the basis for defining data exchange
languages, especially for communication over the Internet. It makes it easy to work
with data within an application, and equally easy to share that information with
others. The coming chapters highlight some factors in the use of XML in real-world
applications, as well as the reasons why it is becoming the lingua franca for
database applications.

1.2 WHY EXTENSIBLE?


Since we have full control over the creation of the XML document, we can shape the
data any way we wish, so that it makes sense to our particular application. For example,
instead of creating a text file to store the name- John Doe, I might create an XML file like
<name>
<first>John</first>
<last>Doe</last>
</name>

If I do not wish for this level of flexibility, i.e. different tags for each part of the name, I
can also write
<designation>John Fitzgerald Byers</designation>

We are thus free to structure the same data in different applications to suit the
requirements of that application. If we want to create data in a way that only a particular
computer program will use, we can do so. If we want to share data with other programs
we can do so.
This is where the "Extensible" in XML comes from: the freedom to use our own tags
to describe data and make it more comprehensible. Anyone is free to mark up data in any
way using the language, even if others are doing it in totally different ways.
Interchanging information becomes much easier, however, when people use the same
data format. XML therefore also allows us to use industry-standard
vocabularies to describe various types of data. For example, Scalable Vector Graphics
(SVG) is an XML vocabulary for describing 2-dimensional graphics, and MathML is an XML
vocabulary for describing mathematics as a basis for machine-to-machine
communication.

1.2.1 Hierarchies in XML


XML groups information in hierarchies. The items in the document relate to each other in
parent/child and sibling/sibling relationships.
These items, the individual pieces of information in the data, are called elements.
For example, in our name example, the hierarchy is:
<name>
- <first>
  - John
- <middle>
  - Fitzgerald
- <last>
  - Byers

1.3 PIECES OF XML

Here are some of the important technologies that make up the XML family, each
specification covering different aspects of communicating information.

XML 1.0 is the base specification upon which the XML family is built. It
describes the syntax that XML documents have to follow, the rules that XML
parsers follow, and anything else you need to know to write an XML document.

DTDs (Document Type Definitions) and Schemas provide ways to create
templates for our document types.

Namespaces provide a way to distinguish one XML vocabulary from another,
which allows us to create richer documents.

XPath describes a querying language for addressing parts of an XML document.

CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language) are used
to format XML documents for display.

XLink and XPointer are languages used to link XML documents with one
another, in a similar manner to HTML hyperlinks.

DOM (Document Object Model) provides a traditional way to interface with
XML documents, and SAX (Simple API for XML) is an alternative way for
programmers to interface with XML documents from their code.
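
To give a feel for what "addressing parts of an XML document" means for XPath, here is a minimal sketch in Python. It assumes the standard library module xml.etree.ElementTree, which supports only a limited subset of XPath, and the sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
SAMPLE = """
<library>
  <book id="b1"><title>First Book</title><price>10</price></book>
  <book id="b2"><title>Second Book</title><price>25</price></book>
</library>
"""

root = ET.fromstring(SAMPLE)

# Path expression: every <title> child of every <book> child of the root.
all_titles = [t.text for t in root.findall("./book/title")]

# Predicate on an attribute: only the book whose id attribute equals "b2".
second_title = root.find("./book[@id='b2']/title").text

print(all_titles)
print(second_title)
```

The path strings play the role that a file path plays in a file system: they name a location in the element hierarchy rather than a position in the text.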

1.4 WHERE IT IS USED

1.4.1 Reducing Server Load


Web based applications can use XML to reduce the load on web servers. This can be done
by keeping all the information on the client as long as possible and then sending the
information to those servers in one big XML document.

1.4.2 Web Site Content


The W3C (World Wide Web Consortium) uses XML to write its specifications. These
XML documents can then be transformed into HTML for display or transformed into a
number of other presentation formats.
Some web sites also use XML entirely for their content where traditionally HTML would
have been used.
XML is the basis for metadata (information about information, a special type of data)
formats such as Microsoft's Channel Definition Format (CDF) for describing Web push
channels or Netscape's Meta Content Framework (MCF).

1.4.3 Remote Procedure Calls


XML is also used for Remote Procedure Calls (RPCs), which allow objects on one
computer to call objects on another computer to do work, allowing distributed computing.
Using XML and HTTP for these calls allows this to occur even through a firewall,
which would normally block such calls, providing greater opportunities for distributed
computing.
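
As an illustration, XML-RPC is one protocol of this kind. The report does not name a specific protocol, so this choice is an assumption. The sketch below uses Python's standard xmlrpc.client module to show a call turned into an XML document that could be carried in an HTTP request; the method name "add" is hypothetical and no network connection is made.

```python
import xmlrpc.client

# Serialize a call to a hypothetical remote method "add" with two arguments.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# The receiving side decodes the same XML back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Because the payload is ordinary text sent over HTTP, it passes through the same channel as normal web traffic.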

1.4.4 E-Commerce
E-Commerce is one of those buzzwords that you hear all over the place. Companies are
discovering that by communicating via the Internet, instead of by more traditional methods,
they can streamline their processes, decreasing costs and improving response times.
Whenever one company needs to send data to another, XML is the perfect fit for the
exchange format.

CHAPTER 2
XML IN THE BROWSER
2.1 EXAMPLE XML DOCUMENT

Suppose we wish to create an XML document that describes a library of books, recording
for each book its name, author, price and other statistics. We use
XML for such a project since it allows us to create a prototype of the data that can be
used in other files as well.
<?xml version='1.0' encoding='us-ascii'?>
<!DOCTYPE Library>
<Library>
<!-- Book 1 Comments -->
<Book ISBN="8763-343-2343" >
<Title>Professional JINI</Title>
<Author>Sing Li</Author>
<Publisher>Wrox Publications</Publisher>
<Date_Published>22/10/1999</Date_Published>
</Book>
<!-- Book 2 Comments -->
<Book ISBN="6834-423-3434">
<Title>XML Programming</Title>
<Author>Sudhir Ancha</Author>
<Publisher>Mann Publications</Publisher>
<Date_Published/>
</Book>
</Library>

A couple of things to notice in the XML file:
The line "<?xml version='1.0' encoding='us-ascii'?>" is the XML declaration, which opens
the XML prolog. It should appear at the very start of every XML file and must state the
XML version number. The encoding part is optional; it tells the parser which character
encoding our text is in. Style sheet files, if any, are linked through a separate
xml-stylesheet processing instruction.
In the above XML file, after the XML declaration
"<?xml version='1.0' encoding='us-ascii'?>"
we have added one more line:
"<!DOCTYPE Library >"
Here DOCTYPE Library indicates that all the tags inside this XML file will be under the
tag "Library", which means "Library" will be the parent, or root, of all other tags in this
XML file. Each XML file can have only one DOCTYPE.
Also in the XML file we have added comments for Book 1 using the following syntax:
<!-- Book 1 Comments -->
The element called "Book" has both attributes and further tags under it. In the above
XML file, for the Book element, ISBN is an attribute, and Title, Author, Publisher and
Date_Published are sub-tags under the Book element. Whether a given attribute or sub-tag
is compulsory for an element is defined by a DTD (Document Type Definition) file. For
example, ISBN might be compulsory for the Book element if searching by ISBN is supported,
and the Date_Published tag might be optional if there is no facility for retrieving the
most recent books. DTDs are discussed further in later sections.
We have declared an empty tag, <Date_Published/>, under the second book. This
statement is equivalent to writing <Date_Published></Date_Published>. This feature
can reduce the size of the XML file when no data is required between the tags.
The above XML document may be called well-formed XML.
Well-formed XML is an XML document that meets certain grammatical rules outlined in
the XML 1.0 specification. The rules that tags must follow for an XML document to be
well formed are listed in the next section.
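
To make the distinction between attributes, sub-tags and empty tags concrete, the following is a minimal sketch of reading the Library document from a program. It assumes Python's standard xml.etree.ElementTree module, which is my choice for illustration and not something the report prescribes; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<?xml version='1.0' encoding='us-ascii'?>
<!DOCTYPE Library>
<Library>
<!-- Book 1 Comments -->
<Book ISBN="8763-343-2343">
<Title>Professional JINI</Title>
<Author>Sing Li</Author>
<Publisher>Wrox Publications</Publisher>
<Date_Published>22/10/1999</Date_Published>
</Book>
<!-- Book 2 Comments -->
<Book ISBN="6834-423-3434">
<Title>XML Programming</Title>
<Author>Sudhir Ancha</Author>
<Publisher>Mann Publications</Publisher>
<Date_Published/>
</Book>
</Library>"""

# Parse from bytes so that the declared us-ascii encoding is honoured.
root = ET.fromstring(LIBRARY.encode("ascii"))
print(root.tag)  # the root element named in the DOCTYPE

books = []
for book in root.findall("Book"):
    isbn = book.get("ISBN")                  # attribute of <Book>
    title = book.findtext("Title")           # text of a sub-tag
    date = book.findtext("Date_Published")   # empty for the empty tag
    books.append((isbn, title, date))
    print(isbn, title, repr(date))
```

Note that the comments do not show up among the Book elements: by default this parser treats them as markup to be skipped rather than as data.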

2.2 RULES FOR ELEMENTS

XML documents should adhere to the following rules to be well formed.

Every start tag must have a matching end tag.

Tags cannot overlap.

XML documents can have only one root element.

Element names must obey XML naming conventions.

XML is case sensitive with respect to tags.

XML will keep white space in your text.
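
The rules above can be tried out with any conforming parser, since a parser must reject input that is not well formed. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are hypothetical and chosen only to exercise one rule each.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One hypothetical sample per rule listed above.
samples = {
    "matching tags": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "overlapping tags": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Title>text</title>",
    "name starts with digit": "<1st>text</1st>",
}
results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(label, "->", ok)

# White space inside an element's text is kept as written.
kept = ET.fromstring("<a>  two  spaces </a>").text
print(repr(kept))
```

Only the first sample should be accepted; each of the others breaks exactly one of the rules.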

2.3 XML PARSERS

The main reason for creating all these rules about the well-formedness of XML
documents is so that we can create a computer program to read in the data and easily tell
markup from information. An XML processor is more commonly called a parser, since it
simply parses XML and provides the application with any information it needs.
There are quite a number of XML parsers available. Some of them include the Microsoft
Internet Explorer parser, James Clark's Expat, Vivid Creations' ActiveDOM, and, popular
among Java users, JavaSoft's XML Parser and IBM's Xerces Parser.

2.4 FORMATTING XML DOCUMENTS

Viewing an XML document in the IE5 browser will certainly produce some kind of
output, thanks to the default style sheet built into IE5. But this output will not match
what you wanted in terms of color, layout, design, formatting and
general style. This is why we have style sheet languages such as Cascading Style Sheets
(CSS) and XSL (Extensible Stylesheet Language), the style sheet language developed by
the W3C specifically for XML.
The need for formatting arises from the fact that in XML user-defined tags rule the roost.
Thus in our earlier book example, nothing built into the browser knows that the book
title, author, ISBN and so on should appear in
different columns, in different fonts or colors, or even on different lines. With no style
information applied, what we would see is an unending line of words separated by white
space at the appropriate places.
Thus we arrive at the content/presentation paradigm behind style sheet languages,
which essentially embodies the idea that you separate the data from the way that data is
displayed.

2.4.1 CASCADING STYLE SHEETS


CSS is a styling tool that can be used with XML as well as HTML documents. It lets
us style individual tags the way we want: for example,
the book name in bold blue with a font size of 15, and the ISBN in red italics with a
font size of 18.
Since my aim in this seminar is to show the importance of XML in data storage
applications, I will not be delving into the details of how we can make XML documents
look attractive in a browser environment. Nevertheless, an example is presented below.
Consider the following example of an XML document which is used to display the news
of the week on a website:
<?xml version="1.0"?>
<!--File name: first.xml-->
<?xml-stylesheet type="text/css" href="c:\first.css"?>
<nitf>
<head>
<title>The Weekly News</title>
</head>
<news>
<news.head>
<hedline>
<h1>Bush Warns Terrorists</h1>
</hedline>
<byline>
<bytag>By our correspondent</bytag>
</byline>

<dateline>
<location>Beijing, China.</location>
<story.date>Saturday February 23 2002 9:43 IST</story.date>
</dateline>
</news.head>
<news.content>
<p>The President of the United States, George W. Bush, today issued a statement, in which
he came down hard on the abductors, and supposedly the killers, of the kidnapped
Washington Post correspondent Daniel Pearl.</p>
<p>A grim faced Bush told reporters that the agents of terror operating in Asia would not
get away with what they had done to the American journalist.</p>
<p>Although the abductors have not released Pearl's body, American sources said they
have received evidence of Pearl's murder, in the form of a video tape showing him being
stabbed.</p>
<p> The news of Pearl's death comes as no surprise after the arrested Pakistani militant
Omar Sheikh, said that Pearl had been murdered by his abductors. However it has been
received by the world community with a mixture of shock, sadness and outrage. Pearl is
survived by his reporter wife Marianne, who is pregnant with their first child.</p>
</news.content>
</news>
</nitf>

The above is a well-formed XML document. On its own, however, it
will appear on the website as a drab line-by-line account that leaves visitors with a bad
taste for reasons other than the news it displays. This problem can be offset by the
efficient use of a cascading style sheet that attaches a style to each of the tags.
All that needs to be done is to link the CSS file with the above XML document using
the line:
<?xml-stylesheet type="text/css" href="c:\first.css"?>

first.css is the cascading style sheet document. It can be typed in a simple text editor
such as Notepad and saved as a .css file. For example, the CSS file for the above
document can be written as follows:
/*File name-first.css*/

hedline {
display: block;
width: 400px;
border-bottom: 5px double black;
text-align: right;
font-family: Times, serif;
font-size: 36pt;
background-image: url("c:\news.bmp");
}
byline {
display: inline;
width: 200px;
text-align: left;
color: black;
font-family: Times, serif;
font-size: 14pt;
}
dateline {
display: inline;
width: 200px;
text-align: right;
color: black;
font-family: Times, serif;
font-size: 11pt;
font-style: italic;
}
p{
display: block;
width: 400px;
color: black;
font-family: Times, serif;
font-size: 12pt;
}

As can be seen in the above example, each rule in a CSS file names a tag of the
corresponding XML document (in the same case) and then lists properties such as font size,
font family, color, background color, background image and the width of the area the tag
occupies on the web page, each followed by the value of that property. For example,
the statement:
background-image: url("c:\news.bmp");

in the hedline rule styles the document so that the headline of the news item
is presented against the news.bmp image as its background.
Other properties such as alignment, indentation, margins and padding, the position of the
content with respect to the browser window (static, relative, absolute and fixed), the
height and width of the output, and table layout can be specified similarly.

2.4.2 EXTENSIBLE STYLE SHEET LANGUAGE


XSL is a language that can transform XML documents into any text-based format, XML or
otherwise. It is also used to create style sheets, similar to CSS. You can define the layout
of the output document, and where in the source document its data comes from. The
transformation part of XSL is called XSLT (XSL Transformations).
XSLT style sheets are built on structures called templates, which specify what to look
for in the source tree, and what to put into the result tree.
XSLT is especially important in the area of E-Commerce. For instance, consider two
companies that wish to communicate their data. A is a store and B fulfils A's orders.
Three scenarios are then possible: A can use the same structure for its data as B uses, B
can use the same data structure as A, or each can use whatever XML format it wishes
internally, but transform its data to a common format whenever it
communicates the information outside.
With XSLT, this kind of transformation becomes quite easy.
Consider the following example of a database consisting of student records. The XML
document looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="c:\shaivya2.xsl"?>

<rollcall>
<student>
<name>Shaivya Easwaren</name>
<address> 6, Gagangiri Villa, Vidyasagar Colony, Gultekdi, Pune</address>
<number>115</number>
<tel>020-4272832</tel>
<email>shaivya_e@yahoo.com</email>
</student>
<student>
<name>Ranjana Rao</name>
<address> 7/A, Pleasant Park,Bhairoba Nala,Hadapsar, Pune</address>
<number>126</number>
<tel>020-6871580</tel>
<email>rrao@chequemail.com</email>
</student>
<student>
<name>Namita Sane</name>
<address> 4,Center Court, Prabhat Road, Deccan Gymkhana, Pune</address>
<number>132</number>
<tel>020-5673275</tel>
<email>namita_s@hotmail.com</email>
</student>
<student>
<name>Krushna Bagade</name>
<address> 532/2, Adinath Society, Vithoba Chowk , Kothrud, Pune</address>
<number>104</number>
<tel>020-5436119</tel>
<email>krushna_b@yahoo.com</email>
</student>
</rollcall>

The above information needs to be displayed in tabular format, which is not possible if
we simply open the file in the Internet Explorer 5.0 browser. This is where XSLT comes
to our rescue.
The statement in the aforementioned XML document:
<?xml-stylesheet type="text/xsl" href="c:\shaivya2.xsl"?>

tells the browser that the style sheet used for the file is an XSL file, and where to
find the linked .xsl file that contains the formatting information for the XML
document.
The XSL file will look something like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<head>
<title>Student table</title>
</head>
<body bgcolor="#d1d1d1" text="#ff0000">
<h1><center>Student Records</center></h1>
<table border="2">
<tr>
<td><i><h2>Name</h2></i></td>
<td><i><h2>Address</h2></i></td>
<td><i><h2>Roll no</h2></i></td>
<td><i><h2>Tel No.</h2></i></td>
<td><i><h2>E-mail ID</h2></i></td>
</tr>
<xsl:for-each select="rollcall/student">
<tr>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="address"/></td>
<td><xsl:value-of select="number"/></td>
<td><xsl:value-of select="tel"/></td>
<td><xsl:value-of select="email"/></td>
</tr>
</xsl:for-each>

</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

In the above .XSL file the <xsl:stylesheet> statement is necessary. Its xmlns:xsl
attribute identifies the W3C specification the style sheet is written against. The WD-xsl
URI used here is the working-draft namespace understood by IE5; processors that implement
the XSLT 1.0 Recommendation expect http://www.w3.org/1999/XSL/Transform together with a
version attribute instead.
Templates are the heart and soul of XSLT. Style sheets are simply a
collection of these templates, which are applied to the input document to get the output
document. Style sheets may have as many templates as are needed.
The section of the source tree to which the template applies is specified by the match
attribute. In this case match="/" indicates that the template is matched against the
document root.
Special XSLT elements indicate to the processor that it should do some work. In this
case, the element <xsl:for-each> is, in effect, a mini template applied to every XML
element in the source tree that matches its select attribute. The element <xsl:value-of>
puts the value of the selected XML element into the result tree.
When viewed in the browser on its own, an XSL file appears in the form of a hierarchical
tree, much like the XML document it is intended to style.
XSLT can also be used in conjunction with CSS, as the two provide complementary
functionality. XSLT can help you structure your pages in a wide number of formats. CSS
can then balance this with easily modifiable media representations for those browsers that
support it. XSLT's primary domain is to provide transformation (i.e. programming)
services to XML, while CSS takes the results of such transformations and controls how
they are presented.
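
For readers who think more easily in terms of ordinary program code, the work done by the template can be imitated by hand. The sketch below is not XSLT and does not use an XSLT processor (Python's standard library has none); it assumes xml.etree.ElementTree and a trimmed, two-field version of the student records, and simply mirrors the roles of <xsl:for-each> and <xsl:value-of> with a loop and a lookup.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the roll call: two students, two fields each.
ROLLCALL = """<rollcall>
<student><name>Shaivya Easwaren</name><number>115</number></student>
<student><name>Ranjana Rao</name><number>126</number></student>
</rollcall>"""

FIELDS = ["name", "number"]  # shortened field list, for brevity

source = ET.fromstring(ROLLCALL)

# Result tree: a <table> with one header row.
table = ET.Element("table", border="2")
header = ET.SubElement(table, "tr")
for field in FIELDS:
    ET.SubElement(header, "td").text = field

# Counterpart of <xsl:for-each select="rollcall/student">.
for student in source.findall("student"):
    row = ET.SubElement(table, "tr")
    for field in FIELDS:
        # Counterpart of <xsl:value-of select="...">.
        ET.SubElement(row, "td").text = student.findtext(field)

html = ET.tostring(table, encoding="unicode")
print(html)
```

The point of XSLT is that this source-tree-to-result-tree mapping is written declaratively in the style sheet, so it can be changed without touching program code.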


CHAPTER 3
XML INTERFACES
3.1 DOCUMENT OBJECT MODEL AND XML-DOM

The Document Object Model (DOM) provides a means of working with XML documents
(and other types of documents) through code, and a way to interface with that
code in the programs that we write. For instance, the DOM enables us to create documents
and parts of documents, navigate through a document, move, copy and remove parts of
a document, and add or modify attributes.
Working with an object model makes working with information easier. An XML
document is in fact structured very much like an object model, as seen in chapter 1: it is
hierarchical, with nodes potentially having other nodes as children.
The DOM can model any XML document regardless of how it is structured. It is usually
added as a layer between the XML parser and the application that needs information from
the document: the parser reads the data from the XML document and
feeds it into the DOM, which is then used by a higher-level application. The
application can do whatever it desires with this information, including putting it into
another proprietary object model if desired.
The DOM does not really deal with objects that much. It mainly works with interfaces.
An interface is, by definition, a contract to support certain properties and methods, which
can be applied to an object. Different programming languages may or may not use the
term interface, or have a specific mechanism for providing interfaces, but the same
concept can be applied to any language.
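
The operations listed above (creating, navigating, removing and modifying parts of a document) can be sketched with one DOM implementation. The example assumes Python's standard xml.dom.minidom module and reuses the name example from chapter 1; the "lang" attribute is hypothetical and added only to show attribute handling.

```python
from xml.dom.minidom import parseString

doc = parseString("<name><first>John</first><last>Doe</last></name>")
root = doc.documentElement

# Navigate: find <first> and read the text node it contains.
first = root.getElementsByTagName("first")[0]
first_text = first.firstChild.data

# Create: build a new <middle> element and insert it before <last>.
middle = doc.createElement("middle")
middle.appendChild(doc.createTextNode("Fitzgerald"))
last = root.getElementsByTagName("last")[0]
root.insertBefore(middle, last)

# Modify attributes: add a hypothetical "lang" attribute to the root.
root.setAttribute("lang", "en")

# Remove: detach the <first> element from the tree.
root.removeChild(first)

result = root.toxml()
print(first_text)
print(result)
```

Every method used here belongs to a DOM interface (Document, Element, Node), which is why the same calls look almost identical in other languages that implement the DOM.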


3.2 SIMPLE API FOR XML (SAX)

The Simple API for XML, or SAX, was developed to enable more efficient
analysis of large XML documents. The problem with the DOM is that before you can use it
to traverse a document, it has to build a complete in-memory map of it. This takes up
space and time, and is inefficient if you only wish to recover small amounts of information.
If we want to locate only specific parts of the document, a second approach is more
appropriate. This is the way that SAX works: it is event-driven. Rather than parse
the document into the DOM and then use the DOM to navigate around the document, we
tell the parser to raise events whenever it finds something.
SAX interfaces such as DocumentHandler (superseded by ContentHandler in SAX 2.0) can
be used to catch the events passed to us by the parser. We can use these to extract some
simple information from the XML
document. It is also possible to implement error handling by making the
DocumentHandler throw a SAXException whenever an error is detected during parsing,
so that errors are reported as they
are found. The error handling mechanisms in the parser can be supplemented by using the
Locator object, which reports where in the document the current event occurred.
Thus SAX is an excellent API for analyzing and extracting information from large XML
documents without incurring the time and space overheads associated with the DOM. At the
time of writing, the latest version of SAX is SAX 2.0.
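
A minimal sketch of the event-driven style follows. It assumes Python's standard xml.sax module, whose ContentHandler corresponds to the SAX 2.0 handler interface; the class name TitleCollector and the sample documents are hypothetical. The handler keeps only the book titles and never builds a tree.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <Title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.lines = []        # line on which each <Title> started
        self.locator = None
        self._in_title = False
        self._buffer = []

    def setDocumentLocator(self, locator):
        # The parser hands us a Locator before parsing begins.
        self.locator = locator

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self._buffer = []
            self.lines.append(self.locator.getLineNumber())

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

DOC = b"""<Library>
<Book><Title>Professional JINI</Title></Book>
<Book><Title>XML Programming</Title></Book>
</Library>"""

handler = TitleCollector()
xml.sax.parseString(DOC, handler)
print(handler.titles, handler.lines)

# A document that is not well formed makes the parser raise an exception.
try:
    xml.sax.parseString(b"<Library><Book></Library>", TitleCollector())
    error_line = None
except xml.sax.SAXParseException as exc:
    error_line = exc.getLineNumber()
print("parse error on line", error_line)
```

Memory use here depends on the number of titles kept, not on the size of the document, which is the advantage the section describes.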


3.3 NAMESPACES

Namespaces are the means by which we can differentiate the elements, and sometimes the
attributes, of different XML document types from each other when combining them
into other documents, or even when processing multiple documents
simultaneously.
Because of the nature of XML, any individual can create XML document
types which describe the world in their own terms. If company A feels that <order>
should contain a certain set of information, and B feels it should contain a different set of
information, they can both go ahead and create different document types to describe that
information. However, personalized XML vocabularies are bound to collide
sooner or later, because the stock of natural names is limited. How can one
define a <title> element to denote a person's name when XHTML already has a <title>
element, which is used to describe the title of an HTML document? Further, how can one
distinguish these from the <title> of a book? The answer is namespaces.
A traditional namespace is a set of zero or more names, each of which must be unique
within the namespace and constructed according to the rules (if any) of the namespace.
For example, the names of element types in an XML document inhabit a traditional
namespace, as do the names of tables in a relational database and the names of class
variables in a Java class. Traditional namespaces also occur outside the field of computer
science -- for example, the names of people could be thought to inhabit a traditional
namespace, as could the names of species.
Different traditional namespaces are disjoint, i.e. they are not related. Because of this, a
name in one traditional namespace does not collide with the same name in a different
traditional namespace. This property is useful to applications that have multiple sets of
names. By assigning each set of names to a different traditional namespace, they can
allow the same name to occur in each set of names without fear of collision. For example,
in the following XML document, there is no conflict between the three different uses of
the name Value.
<AuctionItem>
<Title Value="486Laptop"/>
<Category Value="Computers"/>
<Value>$100</Value>
</AuctionItem>

This is because an XML document has one traditional namespace for element type names
and, for each element type, one traditional namespace for the names of the attributes that
apply to that element type. Thus, the two Value attribute names don't conflict because
each is assigned to a different traditional namespace -- the first to the attribute namespace
for the Title element type and the second to the attribute namespace for the Category
element type. Furthermore, neither of the Value attribute names conflicts with the Value
element type name because element type names are kept in a traditional namespace that is
separate from the attribute namespaces. The XML namespaces recommendation does not
define anything except a two-part naming system for element names and attributes.
As an example of how XML namespaces are used to resolve naming conflicts in XML
documents that contain element types and attributes from multiple XML languages,
consider the following two XML documents:
<?xml version="1.0" ?>
<Address>
<Street>Wilhelminenstr. 7</Street>
<City>Darmstadt</City>
<State>Hessen</State>
<Country>Germany</Country>
<PostalCode>D-64285</PostalCode>
</Address>

and:

<?xml version="1.0" ?>
<Server>
<Name>OurWebServer</Name>
<Address>123.45.67.8</Address>
</Server>

Each document uses a different XML language and each language defines an Address
element type. Each of these Address element types is different -- that is, each has a
different content model, a different meaning, and is interpreted by an application in a
different way. This is not a problem as long as these element types exist only in separate
documents. But what if they are combined in the same document, such as a list of
departments, their addresses, and their Web servers? How does an application know
which Address element type it is processing?
The answer is to assign each language (including its Address element type) to a different
namespace. This allows us to continue using the Address name in each language, but to
distinguish between the two different element types.
By assigning each Address name to an XML namespace, we actually change the name to
a two-part name consisting of the name of the XML namespace plus the name Address.
This means that any code that recognizes just the name Address will need to be changed
to recognize the new two-part name. However, this only needs to be done once, as the
two-part name is universally unique.
The name of the XML namespace is a URI. This allows XML namespaces to provide a
two-part naming system for element types and attributes. The first part of the name is the
URI used to identify the XML namespace -- the namespace name. The second part is the
element type or attribute name itself -- the local part, also known as the local name.
Together, they form the universal name. For example:
<Department>
<Name>DVS1</Name>
<addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
<addr:Street>Wilhelminenstr. 7</addr:Street>
<addr:City>Darmstadt</addr:City>
<addr:State>Hessen</addr:State>
<addr:Country>Germany</addr:Country>
<addr:PostalCode>D-64285</addr:PostalCode>
</addr:Address>
<serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
<serv:Name>OurWebServer</serv:Name>
<serv:Address>123.45.67.8</serv:Address>
</serv:Server>
</Department>

Thus, each universal name is unique, meeting the requirement that each element type in
an XML document have a unique name.
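
How an application sees these two-part names can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree module, which writes a universal name as {namespace URI}local-name. The document is a shortened form of the Department example, and the address namespace URI is assumed to follow the same pattern as the server one.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, following the pattern of the server URI above.
ADDR = "http://www.tu-darmstadt.de/ito/addresses"
SERV = "http://www.tu-darmstadt.de/ito/servers"

DEPARTMENT = """<Department>
<Name>DVS1</Name>
<addr:Address xmlns:addr="http://www.tu-darmstadt.de/ito/addresses">
<addr:City>Darmstadt</addr:City>
</addr:Address>
<serv:Server xmlns:serv="http://www.tu-darmstadt.de/ito/servers">
<serv:Address>123.45.67.8</serv:Address>
</serv:Server>
</Department>"""

root = ET.fromstring(DEPARTMENT)

# The prefix is only shorthand; the parser reports URI plus local name.
postal = root.find(f"{{{ADDR}}}Address")       # the postal address element
network = root.find(f".//{{{SERV}}}Address")   # the server's network address

print(postal.tag)
print(network.tag, network.text)
```

Both elements have the local name Address, yet the application can tell them apart because the URI part of the name differs.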
We have thus seen the functions of the various XML interfaces, and the modularity and
reusability each of them brings to the language.


CHAPTER 4
XML DATA
4.1 INTRODUCTION

XML documents follow a tree structure. A tree is a natural structure that is richer than a
simple flat list, yet still respects the cognitive and data processing requirements for
economy and simplicity. Valid XML documents belong to classes, called document types, that
determine the tree structure and other properties of their member documents. The
properties of the classes themselves comprise the document type definitions, or DTDs,
which serve the same role for documents that schemas do for databases.
XML-Data is a notation, in the form of an XML document, that is both an alternative to
markup declarations for writing DTDs and a means of augmenting DTDs with
additional capabilities. For instance,

XML-data supports rich data types, allowing for tighter validation of data and
reduced application effort.

Through the namespaces facility, XML-Data improves expressiveness, ensuring


the existence of uniquely qualified names.

XML-Data provides for greater and more efficient semantic capabilities by


incorporating the concept of inheritance, enabling one schema to be based on
another. For instance, a bookstore purchase order schema could be based on a
general- purpose E-Commerce schema.

Other benefits of XML-Data, which uses XML instance syntax, include:

- The same tools that are used to parse XML can be used to parse the XML-Data notation.
- As the syntax is very similar to HTML, it is easy for HTML authors to read and write.
- It is easily extensible.

Schemas define the characteristics of classes of objects. Syntactic schemas are
used for classes that are strictly syntactic, like XML. Conceptual schemas are used for
classes that indicate concepts or relations among concepts, as in an RDBMS. Schemas are
composed of declarations for:

- Element: indicates the containment of a single element type (property).
- Empty, Any, String and Mixed content: the names are self-explanatory. Mixed
  content is a mixture of parsed character data and one or more elements.
- Group: a set or sequence of elements.
- Constraints and additional properties: for example min and max constraints,
  domain and range constraints, etc.

4.2 XML-SPECIFIC ELEMENTS

1) ATTRIBUTES
The XML syntax allows certain properties to be expressed in a form called
attributes. An attribute may be given a default value. For example, the Book schema in
Section 4.3 declares a copyright attribute.
2) ENTITY DECLARATION ELEMENT TYPES
Entities are a shorthand mechanism similar to macros in a programming language.
3) EXTERNAL DECLARATIONS ELEMENT TYPE
The extDcls declaration gives a clean mechanism for importing fragments from other
schemas.

4) DATATYPES
A datatype indicates that the contents of an element can be interpreted not only as a
string but also, more specifically, as an object such as a number or a date. In other
words, the element's contents can be parsed to yield an object more specific than a string.
Some common datatypes, with their parse types, storage types in memory and examples,
are given in Table 4.1.
XML-Data datatypes include the most widely used types and all the built-in types of
popular database and programming languages such as SQL, Visual Basic, C, C++, and
Java.

TABLE 4.1 : SPECIFIC DATATYPES IN XML-Data

| NAME | PARSE TYPE | STORAGE TYPE | EXAMPLE |
|---|---|---|---|
| string | Pcdata | String (Unicode) | Greek letters |
| number | A number, with no limit on its digits, and optional sign, fraction and exponent | String | 15, 3.14, 123.456E+10 |
| int | A number with optional sign, no fraction, no exponent | 32-bit signed binary | 1, 58502, -13 |
| float | Same as for number | 64-bit IEEE 754 | .31415926E+1 |
| fixed.14.4 | Same as number, with fewer than 15 digits to the left of the decimal point and at most 4 to the right | 64-bit signed binary | 12.0044 |
| boolean | 1 or 0 | Bit | 0, 1 (1 == true) |
| char | String | 1 Unicode character (16 bits) | |
| string.ansi | String with only ASCII characters | Unicode or single-byte string | I am Shaivya. |
| bin.hex | Hexadecimal digits representing octets | No specified size | |
| uri | Universal resource identifier | Per W3C spec | http://www.gossamer.org/fluky |
Other datatypes include:

- i1: an 8-bit binary number with optional sign, no fraction, no exponent.
- i2: 16-bit binary.
- i4: 32-bit binary.
- i8: 64-bit binary.
- ui1-ui8: unsigned binary.
- r4: IEEE 754 4-byte float.
- r8: IEEE 754 8-byte float.
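
To show what it means for an element's contents to yield an object more specific than a string, here is a minimal Python sketch. It maps a few of the datatype names listed above onto Python types. The function name parse_typed is hypothetical, the choice of Python types is an assumption, and the sketch does not enforce the storage sizes (for example the 8-bit range of i1).

```python
from decimal import Decimal

def parse_typed(dt, text):
    """Convert element text to a Python object according to its datatype name."""
    text = text.strip()
    if dt in ("string", "string.ansi", "uri"):
        return text
    if dt in ("int", "i1", "i2", "i4", "i8"):
        # Optional sign, no fraction, no exponent.
        return int(text)
    if dt in ("float", "r4", "r8"):
        return float(text)
    if dt in ("number", "fixed.14.4"):
        # Decimal keeps every digit, matching "no limit on its digits".
        return Decimal(text)
    if dt == "boolean":
        if text not in ("0", "1"):
            raise ValueError("boolean must be 0 or 1")
        return text == "1"
    if dt == "bin.hex":
        # Hexadecimal digits representing octets.
        return bytes.fromhex(text)
    raise ValueError("unsupported datatype: " + dt)

print(parse_typed("int", "-13"))
print(parse_typed("fixed.14.4", "12.0044"))
print(parse_typed("boolean", "1"))
```

An application that receives typed values in this way no longer has to convert and check strings by hand, which is the reduced application effort mentioned in Section 4.1.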

4.3 EXAMPLE XML SCHEMA


The Schema element type:
All schema declarations are contained within a schema element type like this:
<?xml version="1.0"?>
<?xml:namespace
  name="urn:uuid:BDC6E-11d1-00AA00CC14822/"
  as="s"/?>
<s:schema id="ExampleSchema">
  <!-- schema goes here -->
</s:schema>

The heart of the XML-Data schema is the elementType declaration which defines a class
of objects (or type of element in XML terminology). The id attribute serves a dual role
of identifying the definition and also naming the specific class.
<elementType id="author">
  <description>The person who wrote the book.</description>
</elementType>

The description subelement may be used to provide a human-readable description of the
element's purpose. In this case it states that the author is the person who wrote the book.

Consider the following schema that describes the Book object. An element may be
required or optional and may occur multiple times, as indicated by the occurs attribute,
which may have the value REQUIRED, OPTIONAL, ZEROORMORE or
ONEORMORE. The default is REQUIRED.
<elementType id="Book">
  <element type="#title" occurs="OPTIONAL"/>
  <element type="#author" occurs="ONEORMORE"/>
  <attribute name="copyright"/>
</elementType>

This describes an instance such as:

<Book copyright="1922">
  <title>Hitchhiker's Guide to the Galaxy</title>
  <author>Douglas Adams</author>
</Book>

Here each instance of book may contain a title and must contain one or more
authors.
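
The effect of the occurs attribute can be sketched in Python. This is not an XML-Data processor: the rules for Book are copied by hand into a dictionary, the names OCCURS, BOOK_RULES and check_occurs are hypothetical, and only the number of child elements is checked.

```python
import xml.etree.ElementTree as ET

# Allowed (minimum, maximum) child counts for each value of the occurs attribute.
OCCURS = {
    "REQUIRED": (1, 1),
    "OPTIONAL": (0, 1),
    "ZEROORMORE": (0, None),
    "ONEORMORE": (1, None),
}

# Hand-written summary of the Book elementType: child name -> occurs value.
BOOK_RULES = {"title": "OPTIONAL", "author": "ONEORMORE"}

def check_occurs(element, rules):
    """Return one message for each child whose count breaks its occurs rule."""
    problems = []
    for name, occurs in rules.items():
        low, high = OCCURS[occurs]
        count = len(element.findall(name))
        if count < low or (high is not None and count > high):
            problems.append(f"{name}: found {count}, expected {occurs}")
    return problems

GOOD = ET.fromstring("<Book copyright='1922'><title>T</title><author>A</author></Book>")
BAD = ET.fromstring("<Book><title>T</title></Book>")

print(check_occurs(GOOD, BOOK_RULES))
print(check_occurs(BAD, BOOK_RULES))
```

The first instance satisfies both rules. The second has no author, so it violates ONEORMORE.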
Consider another example of the schema to make the idea clearer.
<?xml version="1.0"?>
<?xml:namespace
  name="urn:uuid:BDC6E-11d1-00AA00CC14822/"
  as="s"/?>
<s:schema id="ExampleSchema">
  <elementType id="name">
    <string/>
  </elementType>

  <elementType id="Person">
    <any/>
  </elementType>

  <elementType id="author">
    <string/>
  </elementType>

  <elementType id="titlePart">
    <string/>
  </elementType>

  <elementType id="title">
    <mixed><element type="#titlePart"/></mixed>
  </elementType>

  <elementType id="Book">
    <element type="#title" occurs="OPTIONAL"/>
    <element type="#author" occurs="ONEORMORE"/>
  </elementType>
</s:schema>

Consider an instance of the above schema, that is, the information regarding a single
book:
<Book>
  <author>Henry Ford</author>
  <author>Samuel Crowley</author>
  <title>The Unofficial Guide to Intergalactic Travel <titlePart>A Spoof on Martians and
  Everything Otherworldly</titlePart></title>
</Book>

The schema defines an instance of Book to have an optional title and one or more
authors. The Person element has a content model of any, meaning that free text is not
allowed, but any arrangement of subelements is valid. The content model of title is
mixed, allowing a free intermixture of characters and any number of titlePart elements.
The author, name and titlePart elements have a content model of string.
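
Mixed content can be seen by reading the Book instance with a parser. The following Python sketch uses the standard xml.etree.ElementTree module. The variable names are illustrative only, and the instance is written on one logical line so that no extra whitespace enters the title.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring(
    "<Book>"
    "<author>Henry Ford</author>"
    "<author>Samuel Crowley</author>"
    "<title>The Unofficial Guide to Intergalactic Travel "
    "<titlePart>A Spoof on Martians and Everything Otherworldly</titlePart>"
    "</title>"
    "</Book>"
)

title = BOOK.find("title")

# String content: each author holds plain text only.
AUTHORS = [a.text for a in BOOK.findall("author")]

# Mixed content: free characters come first, then a titlePart child element.
TITLE_TEXT = title.text.strip()
TITLE_PARTS = [p.text for p in title.findall("titlePart")]

# itertext() walks characters and child text in document order.
FULL_TITLE = "".join(title.itertext())

print(AUTHORS)
print(TITLE_TEXT)
print(TITLE_PARTS)
```

The title element yields both its own characters and a nested titlePart element, which is exactly what the mixed content model permits.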

Mapping between schemas

Syntactic schemas often have fewer elements compared to explicitly conceptual ones. It
is also easier to design a schema that merely covers syntax than a well thought out
conceptual data model. An effect of this is that many practical schemas will not contain
all the elements that a conceptual schema would, either for reasons of economy or
because the schema was simply syntactic. But it is useful to make the implicit explicit, or
more general over time, so that more generic processors can make use of the data.
Thus we can add mapping information to the syntactic schema using the <mapsTo type..>
statement. It will tell us how to interpolate the implied elements thereby creating a
conceptual or RDBMS data model.
Thus schemas are an alternative way to constrain the nature and structure of data items in
XML, as well as the relationships among those data items. They also provide several
advantages over DTDs.

CHAPTER 5
XML AND DATABASES
5.1 USING XML IN AN N-TIER APPLICATION
The N-tier architecture typically has the following logical layers:

- Data services, where all data for the application is stored (usually a database).
- Data Objects, which handle the communication between the database and
  Business Objects.
- Business Objects, which take care of the business logic in your application, and
  are responsible for communication between the presentation and data layers.
- Presentation, which is responsible for communication between the user and the
  business logic tier.

If XML is to be used in the above client-server environment, the presentation layer
will use XML for its data needs. Our business objects will also use
XML, both to communicate with the presentation tier and with each other. So we
might as well go all the way and have our data objects return XML instead of
recordsets. When updating the database, the Business Object could pass XML to the
data objects, which would parse the XML and pull out the appropriate data to insert
into the database. This means that, potentially, any time one object
communicates with another it would use XML as the common language.
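
The update path described above, in which a business object hands XML to a data object, can be sketched in Python. Everything specific here is an assumption made for illustration: the function save_customer, the customer document layout, the sample name Smith, and the use of an in-memory sqlite3 database as the data services tier.

```python
import sqlite3
import xml.etree.ElementTree as ET

def save_customer(connection, customer_xml):
    """Data object: parse the XML it receives and insert the values it carries."""
    customer = ET.fromstring(customer_xml)
    account = int(customer.findtext("account_number"))
    last_name = customer.findtext("last_name")
    connection.execute(
        "INSERT INTO Customer (account_number, last_name) VALUES (?, ?)",
        (account, last_name),
    )
    connection.commit()

# In-memory database standing in for the data services tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (account_number INTEGER, last_name TEXT)")

# The business object passes XML rather than a recordset.
save_customer(
    conn,
    "<customer><account_number>1952</account_number>"
    "<last_name>Smith</last_name></customer>",
)

ROWS = conn.execute("SELECT account_number, last_name FROM Customer").fetchall()
print(ROWS)
```

The business object never sees SQL, and the data object never sees anything but XML, so either side can be replaced without changing the other.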

5.2 RETURNING XML FROM A DATA OBJECT

We will use Visual Basic to write the data object, ADO to connect to the
database, and Microsoft's XML parser, MSXML (Section 5.3.1), to create an XML
document with the results of the query.

Dim cnnDatabaseConnection As ADODB.Connection
Set cnnDatabaseConnection = New ADODB.Connection
' Here we connect to the database.
Dim strSQL As String
strSQL = "SELECT last_name FROM Customer WHERE account_number=1952"

We are now ready to execute our SQL. We call the execute() method that returns a
recordset object as the result of the query.
Dim rsResult as ADODB.Recordset
Set rsResult=cnnDatabaseConnection.Execute(strSQL)

We then create a quick XML document and populate that object via the DOM with
the values from our SQL.

Dim objXML As MSXML.DOMDocument
Set objXML = New MSXML.DOMDocument
objXML.loadXML "<root><lastname/></root>"

The XML document looks like this:


<root>
<lastname/>
</root>

And finally the last step is to get the value from our recordset and add it to the XML
document.
objXML.selectSingleNode("/root/lastname").Text = rsResult("last_name").Value

MSXML provides a property of the Document object called xml which returns a
string containing the XML document that this DOM is modeling. All the data object
now has to do is to return the text from that property and we are done.
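
For comparison, the same sequence of steps can be sketched in Python. This is not the Visual Basic data object itself: sqlite3 stands in for ADO, xml.etree.ElementTree stands in for MSXML, and the function name last_name_as_xml and the sample row are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def last_name_as_xml(connection, account_number):
    """Run the query, then return the result as an XML string."""
    row = connection.execute(
        "SELECT last_name FROM Customer WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    # Build the skeleton document, as loadXML does in the Visual Basic version.
    root = ET.fromstring("<root><lastname/></root>")
    if row is not None:
        root.find("lastname").text = row[0]
    # Serialise the tree; this plays the role of the xml property.
    return ET.tostring(root, encoding="unicode")

# Sample database so that the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (account_number INTEGER, last_name TEXT)")
conn.execute("INSERT INTO Customer VALUES (1952, 'Smith')")

RESULT = last_name_as_xml(conn, 1952)
print(RESULT)
```

The caller receives a string of XML and needs no knowledge of the database driver or of recordsets.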

5.3 DATABASE VENDORS AND XML

Although XML and databases are both data-centric technologies, they are not in
competition with each other, contrary to common belief. XML is best used to
communicate data and a database is best used to store and retrieve it, which makes the
two complementary rather than competitive. Database vendors realize that XML will
never replace the database but will become more closely integrated with it, and they
have recognized the power and flexibility of XML. They are thus
building support for XML right into their products.

5.3.1 MICROSOFT'S XML TECHNOLOGIES

Microsoft has been a strong supporter of XML since the very beginning. Some of its
technologies providing XML support are:

MSXML

The Internet Explorer browser comes bundled with the MSXML COM-based
parser, which provides a DOM interface. It provides validating and non-validating
modes as well as support for XML namespaces. It also provides
support for XSL transformations.

Visual Basic Code generator

It can read XML Schema documents and produce Visual Basic code to match
the schema. In effect, you can build the basics of an Object Model, based on an
XML document type automatically.

SQL Server

There is XML support built into SQL Server, Microsoft's Relational Database
Management System. SQL Server provides the capability to perform an SQL
query through an HTTP request via an ISAPI filter for Internet Information
Server (Microsoft's Web server). Not only can you get data from SQL
Server using XML, you can also put it in using SQL updategrams. These are
XML files containing, in a certain format, the information you want to put into
the database.

5.3.2 ORACLE'S XML TECHNOLOGIES

XML parsers:

The first tool available from Oracle is the XML parser. Oracle provides parsers
written in Java, C, C++, and PL/SQL. These parsers provide a DOM interface, a
SAX interface, both validating and non-validating support, support for
namespaces and fully compliant support for XSLT.

Code Generators:

Oracle offers Java and C++ class-generating applications similar to the Visual Basic
code generator. However, these generators work from DTDs and not schemas,
meaning they are fully conformant with W3C specifications.

XML SQL Utility for Java

The XDK (XML Developer's Kit) also provides the XML SQL Utility for Java,
which can generate an XML document from an SQL query, either in text form or
as a DOM. It can also take in XML documents and use the information to update
the database, as SQL Server 2000 does.

XSQL Servlet

This servlet takes in an XML document that contains SQL queries, like the XML
templates used by the SQL Server. It can optionally perform XSL
transformations on the results, so the results can potentially be any type of file
that can be returned from an XSLT transformation, including XML and HTML.
Because it is a servlet, it can run on any web server that has a Java virtual
machine and can host servlets.

CHAPTER 6
APPLICATIONS OF XML
WHAT YOU CAN DO WITH XML
The potential areas of application of XML can be classified into three main
ones: three-tier web applications, multi-platform electronic publishing, and
electronic commerce or EDI.
The following are just a few examples of the exciting new technologies
enabled by XML:
1) Internet Search Engines:
Imagine a search engine that understands and uses contextual information when
performing a full-text search. Searching for information about the Java
programming language would no longer yield links to coffee sites or the Island of
Java. This is because searching for the term "Java" is narrowed down to those
fields tagged as a "programming language". As a result, the speed and accuracy of
the search are dramatically improved. Widespread use of XML repository
technology on Web servers will play a vital role in easing the "information
overload" currently suffered by Internet users. Of course all of these benefits
require a sophisticated, scalable and fast repository. This repository must be able
to manage the rich XML links and understand XML structure so that it indexes
text based on its context and use in a document.
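
A very small Python sketch of such a context-aware search follows. The element names programmingLanguage, beverage and island, the example.org URLs and the function search are all hypothetical. A real repository would use an index instead of scanning every page.

```python
import xml.etree.ElementTree as ET

# Three tagged pages that all mention the word "Java" in different contexts.
CATALOG = ET.fromstring("""
<pages>
  <page url="http://example.org/a"><programmingLanguage>Java</programmingLanguage></page>
  <page url="http://example.org/b"><beverage>Java coffee</beverage></page>
  <page url="http://example.org/c"><island>Java</island></page>
</pages>
""")

def search(catalog, term, context):
    """Return URLs of pages where term occurs inside an element named context."""
    hits = []
    for page in catalog.findall("page"):
        for field in page.iter(context):
            if term.lower() in (field.text or "").lower():
                hits.append(page.get("url"))
                break  # one hit per page is enough
    return hits

HITS = search(CATALOG, "Java", "programmingLanguage")
print(HITS)
```

A plain full-text search for "Java" would return all three pages. Restricting the search to one tagged context returns only the page about the programming language.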
2) Electronic Commerce:
The long-expected rise of electronic commerce has been stymied by the difficulty
encountered by consumers in finding the desired product among the myriad of
vendors setting up shop on the Internet, all with different product lines, prices, on-line viewing capabilities, delivery options and so forth. So-called intelligent
agents have not helped because they have an even harder time than humans in
trying to make sense of the digital morass presented by HTML. With XML
repository technology, on-line stores can present product information in a
standard, structured format, independent of page design. Electronic commerce is
obviously focused on financial transactions. Using HTML, the user must
manually wade through HTML information to extract relevant data like price, tax,
etc. And unlike text, numbers have no inherent context. In other words, price
means something, but how do you know whether a number is associated with a
price, a tax, an address or anything else? XML creates this association, making human
and machine interpretation a reality. XML is the catalyst that will finally unleash
the explosive potential of electronic commerce. The XML-aware query facilities
of the repository make it possible to retrieve relevant information directly and
re-purpose it as needed for processing by an automatic agent or a user. By reducing
the time needed to locate a product, a price, or any other relevant information on
the Internet, XML repositories will play an important role in making on-line
shopping more efficient and enjoyable.
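
The association between a number and its meaning can be illustrated with a short Python sketch. The product document, its element names and the function total_cost are hypothetical. The point is only that an agent can find the price and the tax by name instead of guessing from page layout.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# A product description in which every number is labelled by its element name.
PRODUCT = ET.fromstring(
    "<product><name>XML Handbook</name>"
    "<price currency='USD'>40.00</price><tax>3.20</tax></product>"
)

def total_cost(product):
    """Add the price and tax fields, which the markup identifies by name."""
    price = Decimal(product.findtext("price"))
    tax = Decimal(product.findtext("tax"))
    return price + tax

TOTAL = total_cost(PRODUCT)
print(TOTAL)
```

Decimal is used instead of float so that amounts of money are added without rounding error.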
3) Electronic Data Interchange (EDI):
EDI (Electronic Data Interchange) works by providing a collection of standard
message formats and an element dictionary, giving businesses a simple way to
exchange data via any electronic messaging service.
XML/EDI provides a standard framework for exchanging different types of data (for
example, an invoice, a healthcare claim or a project status). The information, whether
it is in a transaction, exchanged via an Application Program Interface (API), or part of
web automation, a database portal, a catalog, a workflow document or a message, can
then be searched, decoded, manipulated, and displayed consistently and correctly. This
is done by first implementing EDI dictionaries and then extending our vocabulary via
on-line repositories to include our business language, rules and objects. Thus, by
combining XML and EDI, we create a powerful new paradigm that is different from
either XML or EDI alone.
4) Data Re-purposing:
By breaking documents into discrete elements, it becomes very easy for
individuals to extract the truly relevant information from several sources and
reassemble it into any format (e.g. web page, document, presentation, whatever).
This helps to address the current information overload, because the user receives
only the relevant information. In fact, the information might even be assembled
by a personal agent. This ability also facilitates the acceleration of learning since
it becomes much easier to assemble the "current" body of work on a particular
subject, and then take it a step further, pushing the development of human
knowledge ever forward.
These are just a few of the exciting technologies enabled by XML. Looking at
these examples, it is easy to understand why XML is creating such excitement in
the Internet community. As software developers begin to implement XML
applications, however, they will have to address the need to turn these ideas into
reality, while keeping up with the ever shortening development cycles
characteristic of Web development. In many cases, developers will find that their
prototypes work fine in the test laboratory but do not scale to address real world
conditions of concurrent usage and data volume. XML's rich interlinking and
hierarchical naming structure introduce a whole new set of requirements that
bring solutions based on the file system or relational architectures to their knees.
An XML-savvy object repository, designed to be embedded in XML applications
of all types, or in the operating system itself, is the only solution that provides the
functionality and scalability required to drive the realization of this vision of a
new generation of networked applications.

CHAPTER 7
CONCLUSION
Although the family of XML specifications is still evolving and there is a need for many
more XML tools, developers have enough information to get started today with
this technology. XML is an efficient way to move data around the Web,
between different Web-enabled applications.

The ultimate promise of XML is that it will allow Web publishers to provide users
with information they find more meaningful. Once the technology has matured
and Web publishers have created rich sets of data described by XML, users will
find it easier to extract the information they want.
When a document is fully tagged with XML, you can do amazing things with it.
XML lets you view a document in whatever form you want to see it, rearranging
its structure on the fly, or just pulling out sections that contain the information
you're looking for. You can break a document into as many fragments of text as
you want and tag them all. In an extreme case, you could apply an XML tag to
every word, although that probably wouldn't make sense. You can even define
your own new XML tags, allowing for infinite tagging possibilities.
Large companies will be among the first to use XML since it can help them
catalog, search, and retrieve information from their vast stores of documents.
Because XML is extensible, it can describe data contained in a wide variety of
applications, from entire collections of Web pages to individual database records.
XML thus truly is a panacea for Web users looking for an efficient way to store,
retrieve, communicate and structure data moving on the World Wide Web.

