
What is XML?

The eXtensible Markup Language (XML) is a universal way of structuring documents and other data.

Markup languages existed for many years before the start of the World Wide Web. WordPerfect and Rich Text Format (RTF) use markup tags to provide special formatting commands that apply to specific words and text. HyperText Markup Language (HTML) is the markup language used for web pages.

HTML has gained widespread use and is easy to understand. Both HTML and XML are derived from the Standard Generalized Markup Language (SGML).

1986 SGML (document markup language)

1992 HTML (web page specific markup language)

1997 XML (web page and general documents markup language)

2001 XML 1.1

Previously, anyone who wanted to create web pages had to learn HTML syntax and write the pages in simple text editors.

More advanced HTML-specific editors appeared that checked the web pages and HTML tags. When applications such as MS FrontPage appeared, people could author web pages without learning all the HTML tags. Many thousands of web pages were created daily, mostly personal homepages or company marketing information. As the use of websites became more sophisticated, the limitations of HTML became apparent. The next section covers the similarities and differences between HTML and XML.

HTML vs. XML


Developers who are familiar with HTML will recognise the syntax used in XML; however, XML describes the data better than HTML does.

Similarities with HTML:

• Firewalls do not need to be reconfigured.
• Known system for security (same web server, firewall, protocols).

Differences between XML and HTML:

• XML is a standard for data interchange.
• HTML has a fixed set of tags, whereas XML allows you to define your own custom elements.
• HTML was designed for rendering information from computer to human.
• XML has a large overhead in tags used to define the document.

To see how XML separates data from the presentation format, consider the following example.

HTML

<b> visualbuilder.com webmaster@visualbuilder.com </b>

After reading the HTML above, you can see that it is not exactly clear what is being displayed. We can guess that it is a web site and an email address, but a computer program will have great difficulty identifying this text reliably. Below is the XML equivalent representing the same text and data:

XML

<site>
  <sitename>visualbuilder.com</sitename>
  <emailaddress>webmaster@visualbuilder.com</emailaddress>
</site>

You can figure out what this means, but more importantly, a computer program can make use of it. XML takes more space, but it defines the information more precisely and robustly.

XML Basics

In order to use XML, the document must comply with certain rules to be well formed.

The well-formedness rules for XML documents are:

1. Tags must be properly nested.

<a>
  <b></b>
</a>

2. You cannot omit end tags.

In HTML many developers leave out closing tags such as </p> (and never close <br>), and most browsers handle this correctly. In XML every element must be closed.

3. New syntax for end tags: an element with no content can be closed with />.

<car engine="3000"></car>

<car engine="3000"/>

These are the same.

4. All elements must be contained in a single root element.

Any document that follows the rules above is said to be well formed.
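The rules above can be tried out directly. The sketch below is not part of the original tutorial: it uses the JAXP DocumentBuilder that ships with the JDK rather than the Xerces DOMParser used later on, and the names WellFormedCheck, isWellFormed and emptyFormsMatch are hypothetical. A non-validating parser rejects a document that breaks one of the rules by throwing a SAXException, which the sketch turns into a true/false answer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Rethrows fatal errors instead of letting the default handler print them to stderr.
    private static final ErrorHandler QUIET = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) { }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Parses the text; a document that breaks a well-formedness rule raises SAXException.
    static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(QUIET);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Rule 3: the long and short forms of an empty element give identical DOM elements.
    static boolean emptyFormsMatch() {
        try {
            Element longForm = parse("<car engine=\"3000\"></car>").getDocumentElement();
            Element shortForm = parse("<car engine=\"3000\"/>").getDocumentElement();
            return longForm.isEqualNode(shortForm);
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b></b></a>", // rule 1: properly nested
            "<a><b></a></b>", // breaks rule 1: tags overlap
            "<p>text",        // breaks rule 2: end tag omitted
            "<a/><b/>"        // breaks rule 4: no single root element
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + (isWellFormed(sample) ? "well formed" : "not well formed"));
        }
        System.out.println("Empty-element forms match: " + emptyFormsMatch());
    }
}
```

Only the first sample passes; the other three are each rejected for breaking one rule, and the two spellings of the car element compare as equal nodes.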

If the document is also to be checked for validity, it uses a Document Type Definition (DTD). The document must then contain a <!DOCTYPE> declaration before its root element and still obey the rules above.

Why use XML in Java?

XML and Java work very well together.

1. Portability: Java is a platform-independent development language, and XML is an architecture- and language-independent data format. Neither Java nor XML cares about the platform.
2. Extensive development support: Java has good API support for XML.

People are using XML for conducting business-to-business transactions and as a standard way for different computers to communicate with each other.

Application areas of XML

• Presentation Oriented Publishing (POP): the same data in a web browser, a mobile phone and a PDA.
• Message Oriented Middleware (MOM): B2B. This is where an application uses XML as a message format for communication between different systems.
• XML used for exchanging database contents.

To illustrate these applications we will use the example of an online shopping system.

1. The owner of the online shop wants to provide an online service that also works on mobile phones and PDAs. Data in the store database is used to generate an XML document. This same XML document can be transformed using XSL (covered in a later section) into HTML, WML or any other format, allowing the information to be displayed on mobile phones and PDAs.

2. The owner wants to automate order processing and fulfilment. Since the 1960s, systems have used Electronic Data Interchange (EDI) for computer-to-computer communication. This allowed orders and invoices to be sent using a messaging standard. The problem with EDI is that an EDI system is very expensive to buy, so only large companies would use it. Two companies that agree on a DTD can send messages over the Internet using XML. Problems remain regarding security and reliability, but these are being addressed. Going back to our example, when a customer orders a product from the online shop, the shop sends a standard message to the delivery agent. The delivery computer system automatically updates itself with the latest orders and sends an acknowledgement back to the shop. Messages could also be sent from the suppliers to the online shop to update the inventory for specific products.

3. The marketing department would like to extract the data from the online shop so they can organise product promotions and sales. However, the marketing database is in MS Access and the shop uses Oracle. There is no specific standard for exchanging data from one database to another. XML allows all the tables to be completely described using custom tags such as <table> and <field>.

Standards

There are several standards bodies involved in Java and XML.

XML specifications

From a specifications perspective, the World Wide Web Consortium (W3C) provides the base specifications for XML.

http://www.w3.org

Developing Applications using XML

The Apache XML project provides open source XML implementation solutions.

You can find the following at the Apache XML site: http://xml.apache.org

Listed below are the Apache projects related to using XML in Java:

Xerces - XML parsers in Java, C++ (with Perl and COM bindings)

Xalan - XSLT stylesheet processors, in Java and C++

Cocoon - XML-based web publishing, in Java

FOP - XSL formatting objects, in Java

Xang - Rapid development of dynamic server pages, in JavaScript

SOAP - Simple Object Access Protocol

Batik - A Java based toolkit for Scalable Vector Graphics (SVG)

Crimson - A Java XML parser derived from the Sun Project X Parser

The Java Community Process (JCP) has also developed a comprehensive set of application programming interfaces (APIs) for developing XML applications in Java.

In this tutorial you will see how to develop XML applications in Java using different methods. All the methods are based on free, downloadable tools and technologies.

Setting up the environment for XML and Java

To use XML you will need an XML parser, but before downloading one you must make sure you have Java (the JDK) installed.

Setting up Java

Download JDK 1.3 from the following URL:

http://java.sun.com/j2se/1.3/

For Windows, the complete download is about 30 MB.

Run through the setup. One of the main problems new Java developers have is setting the PATH and CLASSPATH.

For Windows 95/98/ME you edit the AUTOEXEC.BAT file with the new PATH and CLASSPATH settings and reboot your machine.

For Windows NT/2000 you edit the environment settings.

Both of these changes are described in the Java installation instructions.

Setting up XML

This tutorial uses the Xerces XML parser found on the Apache XML site.

1. Download the latest version of Xerces from the following URL:

http://xml.apache.org/dist/xerces-j/

If you are a Windows user, the file you need is the current download at the time of writing (Xerces-J 1.4.3):

http://xml.apache.org/dist/xerces-j/Xerces-J-bin.1.4.3.zip

This tutorial assumes you copied the file to C:\

2. Extract the contents of the zip file; this copies the files and creates all the subdirectories. If you go to your Xerces-J 1.4.3 directory you should see xerces.jar and xercesSamples.jar.

The next step is to edit the CLASSPATH in your AUTOEXEC.BAT file. You need to tell Java where it can find xerces.jar and xercesSamples.jar, so add the two files to your CLASSPATH.

For example (with no spaces after the semicolons):

set CLASSPATH=%CLASSPATH%;C:\Xerces-J-bin.1.4.3\xerces-1_4_3\xerces.jar;C:\Xerces-J-bin.1.4.3\xerces-1_4_3\xercesSamples.jar
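Before running the samples, you may want to confirm from Java itself that the two jar files are visible. The hypothetical ClasspathCheck class below (it is not part of Xerces) simply tries to load the two class names this tutorial relies on: org.apache.xerces.parsers.DOMParser from xerces.jar and sax.SAXCount from xercesSamples.jar.

```java
public class ClasspathCheck {

    // Tries to load a class by name without initialising it; false means it is not on the CLASSPATH.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names used elsewhere in this tutorial.
        String[] required = { "org.apache.xerces.parsers.DOMParser", "sax.SAXCount" };
        for (String name : required) {
            String status = isOnClasspath(name) ? "found" : "NOT found, check your CLASSPATH";
            System.out.println(name + ": " + status);
        }
    }
}
```

Run it with java ClasspathCheck from any directory; if either line reports NOT found, the CLASSPATH entry is the first thing to check.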

To test your installation, try one of the included samples, SAXCount.

Go to the directory where you have the two jar files, for example:

cd C:\Xerces-J-bin.1.4.3\xerces-1_4_3

(Tip: in Windows 98/ME/2000 and NT you can drag and drop a directory into a command prompt window. This saves you from having to type long directory names.)

Type the following to execute the SAXCount application:

java sax.SAXCount data/personal.xml

You should get output from the application similar to:

data/personal.xml: 280 ms (36 elems, 18 attrs, 140 spaces, 128 chars)

This is a breakdown of the personal.xml file in the data directory; the time taken will vary from machine to machine.

If you do not get this output, then either you are in the wrong directory or, most probably, your CLASSPATH is incorrect. Check your AUTOEXEC.BAT file (Windows 95/98/ME) or your environment settings (Windows NT/2000).

If this is all working, then you have correctly set up the environment for Java and XML.
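SAXCount works by listening to SAX events as the parser reads the file. A rough sketch of the same idea is shown below; it uses the SAX parser bundled with the JDK through JAXP rather than the Xerces sample itself. The class name ElementCounter and the sample document (including its id attribute, added only so there is something to count) are hypothetical, and unlike SAXCount it does not separate ignorable whitespace ("spaces") from other character data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    int elements;
    int attributes;
    int characters;

    // Called once for every start tag (and for every empty-element tag).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
        attributes += atts.getLength();
    }

    // Called for character data; one run of text may arrive in several calls.
    @Override
    public void characters(char[] ch, int start, int length) {
        characters += length;
    }

    static ElementCounter count(String xml) {
        ElementCounter counter = new ElementCounter();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), counter);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return counter;
    }

    public static void main(String[] args) {
        ElementCounter c = count("<site id=\"1\"><sitename>visualbuilder.com</sitename></site>");
        // prints "2 elems, 1 attrs, 17 chars"
        System.out.println(c.elements + " elems, " + c.attributes + " attrs, " + c.characters + " chars");
    }
}
```

The parser drives the program: ElementCounter never asks for the next element, it only reacts when the parser calls its handler methods.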

Well Done

Java XML API

The three main APIs that we shall focus on are:

1. JAXP: Parsing API.
2. JAXM: Messaging API.
3. JAXB: Binding API.

Java API for XML Processing

As a developer, you program to a standard interface. This interface isolates you from specific parsers and coding changes. SAX and DOM are language-independent interfaces that take two different approaches to accessing the information in an XML document: SAX is a low-level API and DOM is a high-level API. The next sections cover SAX and DOM in more detail. In both cases an XML application creates a parser object, passes some XML to the parser and then processes the results.

Simple API for XML (SAX)

SAX is a standard interface for event-based XML parsing. SAX defines a number of events; it's up to you to listen for them and respond to them. The document is accessed serially, and an event is triggered at different parts of the document. The common events are:

1. start of the document
2. start elements
3. characters
4. end elements

5. end of the document

Basically, you write a program that has event handlers. SAX is fast and has a low memory requirement, but SAX parsing is harder to set up.

Document Object Model (DOM)

DOM is designed to be a portable interface for manipulating document structures.

Using DOM, the application builds a tree structure of the XML document in memory. The different parts of the XML file are stored as nodes in the DOM document, and the application can then walk back and forth through the nodes in the tree.
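A minimal sketch of the tree-building step is below. It assumes the JAXP DocumentBuilder from the JDK (the tutorial itself uses the Xerces DOMParser in the next section), and the names SiteList and siteNames are hypothetical. It loads the resource document shown next into a DOM tree and then reads the text of every sitename element back out of the tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiteList {

    // Builds the whole document as an in-memory tree, then reads text back out of it.
    static List<String> siteNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList names = doc.getElementsByTagName("sitename");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                result.add(names.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<resource>"
                + "<site><sitename>visualbuilder.com</sitename>"
                + "<emailaddress>webmaster@visualbuilder.com</emailaddress></site>"
                + "<site><sitename>activepace.com</sitename>"
                + "<emailaddress>info@activepace.com</emailaddress></site>"
                + "</resource>";
        System.out.println(siteNames(xml)); // [visualbuilder.com, activepace.com]
    }
}
```

Because the whole tree stays in memory, the program can query it in any order it likes, which is the trade-off against SAX's lower memory use.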

<?xml version="1.0"?>
<resource>
  <site>
    <sitename>visualbuilder.com</sitename>
    <emailaddress>webmaster@visualbuilder.com</emailaddress>
  </site>
  <site>
    <sitename>activepace.com</sitename>
    <emailaddress>info@activepace.com</emailaddress>
  </site>
</resource>

Basics of programming using DOM

After you have made sure that your environment has been set up correctly (see the Setting up the environment for XML and Java section), you can write your first Java and XML example. For this example we will use the DOM API discussed in the previous section. This simple example reads the text “Hello World” from an XML file called “hello.xml”.

1) Import package org.w3c.dom

The Java interfaces have been defined by the W3C and are contained in the package org.w3c.dom.

import org.w3c.dom.*;

2) Import vendor-dependent parser

The next step is to import a vendor-dependent XML parser. In our case it will be the Xerces DOM parser that we configured.

import org.apache.xerces.parsers.DOMParser;

3) On calling DOMHelloWorld

The main method of DOMHelloWorld.java checks that the filename of the XML file has been provided as an argument.

public static void main(String[] args) {
    if (args.length != 1) {
        System.out.println("usage: java DOMHelloWorld hello.xml");
        System.exit(1);
    }

    String xmlfilename = args[0];
    // The code from steps 4 to 6 below also goes inside main, at this point.
}

4) Parsing the XML document

First we must create an instance of the parser (the vendor-specific parser). This is the same parser we imported earlier in step 2.

DOMParser xmlparser = new DOMParser();

5) Parse the XML file

This is really easy because the parser does it for you. *ll you have to do is call the parse method with the name of the xml file.

xmlparser.parse(xmlfilename);

If you look at the API documentation that comes with the Xerces parser and search for the parse method, you will notice something special: DOMParser and SAXParser are subclasses of XMLParser, and the parse method throws two exceptions, SAXException and java.io.IOException. If you try to compile the source code without catching these exceptions, the compiler reports an error (java.io.IOException must be caught). Under the previous import statements, add the imports for the IOException and SAXException classes:

import java.io.IOException;
import org.xml.sax.SAXException;

Now put a try/catch block around the parse method of DOMParser:

try {
    xmlparser.parse(xmlfilename);
} catch (IOException e) {
    System.out.println("Error reading xml file: " + e.getMessage());
} catch (SAXException e) {
    System.out.println("Error in parsing: " + e.getMessage());
}

6) Accessing the DOM tree

As DOM creates a tree-based structure from the XML file, we need to access the information stored in the tree. To access the tree, call getDocument(), which returns a Document.

Document doc = xmlparser.getDocument();
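Putting steps 3 to 6 together, the sketch below shows one way the finished program could look. It swaps the Xerces-specific DOMParser for the standard JAXP DocumentBuilder in the JDK, so it needs nothing extra on the CLASSPATH; the class name JaxpHelloWorld is hypothetical. The tutorial does not show the contents of hello.xml, so the program simply prints the root element's name and its text.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpHelloWorld {

    // Steps 4 and 5: create a parser and parse; returns null after reporting any error.
    static Document load(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source);
        } catch (IOException e) {
            System.out.println("Error reading xml file: " + e.getMessage());
        } catch (SAXException e) {
            System.out.println("Error in parsing: " + e.getMessage());
        } catch (ParserConfigurationException e) {
            System.out.println("Parser configuration error: " + e.getMessage());
        }
        return null;
    }

    public static void main(String[] args) {
        // Step 3: check that exactly one file name was given.
        if (args.length != 1) {
            System.out.println("usage: java JaxpHelloWorld hello.xml");
            System.exit(1);
        }
        String xmlfilename = args[0];

        // Step 6: the Document is the root of the whole tree.
        Document doc = load(new InputSource(new File(xmlfilename).toURI().toString()));
        if (doc != null) {
            Element root = doc.getDocumentElement();
            System.out.println(root.getTagName() + ": " + root.getTextContent());
        }
    }
}
```

For a file containing <greeting>Hello World</greeting>, it would print greeting: Hello World.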

This Document represents the entire XML document. The data we need to access is stored in nodes, and these nodes can have child nodes. Therefore, what you need to do next is walk through the tree structure (Document) and display the data stored in the nodes. Now the fun starts...

7) Walking the nodes

Next we will write a method to display the data in a node. The method will be called displayNode and will take one parameter, the start node.

This method will start from the first node and walk through all the nodes in the XML using recursion.
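The recursion itself is short: visit a node, then call the same method on each of its children in turn. As a sketch (again building the tree with JAXP; NodeWalker and walk are hypothetical names), the method below records each node's name, indented by its depth, which makes the shape of the tree visible before we look at what each node type contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeWalker {

    // Depth-first: record this node, then recurse into each child in document order.
    static void walk(Node node, int depth, List<String> out) {
        out.add("  ".repeat(depth) + node.getNodeName());
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    static List<String> walk(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc, 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints #document, then a, b, #text and c, each indented by its depth.
        for (String line : walk("<a><b>hi</b><c/></a>")) {
            System.out.println(line);
        }
    }
}
```

The document node reports its name as #document and text nodes as #text. Attributes do not appear in this walk because they are not children of their element; they are reached through getAttributes().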

The nodes in the XML document are of different node types. The node types can be divided into two broad categories: structural nodes and content nodes.

Structural nodes are not actually part of the content in the document but are used to provide syntax structure.

The following is a list of the different types of nodes:

Node Type                      Category
ATTRIBUTE_NODE                 Content
CDATA_SECTION_NODE             Content
COMMENT_NODE                   Content
DOCUMENT_FRAGMENT_NODE         Content
DOCUMENT_NODE                  Structural
DOCUMENT_TYPE_NODE             Structural
ELEMENT_NODE                   Content
ENTITY_NODE                    Structural
ENTITY_REFERENCE_NODE          Content
NOTATION_NODE                  Structural
PROCESSING_INSTRUCTION_NODE    Content
TEXT_NODE                      Content

The main nodes we are interested in are DOCUMENT_NODE, ELEMENT_NODE and ATTRIBUTE_NODE.
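One possible displayNode, handling the three node types named above, is sketched below. It is not the tutorial's own implementation: it builds the tree with JAXP, writes into a StringBuilder instead of printing (a hypothetical choice that makes the result easy to check), also copies TEXT_NODE content so that element text is visible, and skips every other node type. Special characters such as & and < are not re-escaped in the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DisplayNode {

    static void displayNode(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                // Structural: no markup of its own, so continue with the root element.
                displayNode(((Document) node).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                // Attributes are not children; they are reached through getAttributes().
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i); // an ATTRIBUTE_NODE
                    out.append(' ').append(attr.getNodeName())
                       .append("=\"").append(attr.getNodeValue()).append('"');
                }
                out.append('>');
                // Recurse into child elements and text.
                for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                    displayNode(child, out);
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                // Comments, CDATA sections, processing instructions, etc. are skipped here.
                break;
        }
    }

    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            displayNode(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints <site id="1"><sitename>visualbuilder.com</sitename></site>
        System.out.println(render("<site id=\"1\"><sitename>visualbuilder.com</sitename></site>"));
    }
}
```

Switching on getNodeType() keeps each node type's handling in one place; supporting another type from the table above means adding one more case.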
