The eXtensible Markup Language (XML) is a universal way of structuring documents and other data.
Markup languages existed for many years before the start of the World Wide Web. WordPerfect and Rich Text Format (RTF) use markup tags to provide special formatting commands that apply to specific words and text. HyperText Markup Language (HTML) is the markup language used for web pages.
HTML has gained widespread use and is easy to understand. Both HTML and XML are derived from the Standard Generalized Markup Language (SGML).
Previously, anyone who wanted to create web pages had to learn HTML syntax and write the pages in a simple text editor.
More advanced HTML-specific editors then appeared that checked the web pages and their HTML tags. When applications such as MS FrontPage appeared, people could author web pages without learning all the HTML tags. Many thousands of web pages were created daily, mostly personal homepages or company marketing information. As the use of websites became more sophisticated, the limitations of HTML became apparent. The next section covers the similarities and differences between HTML and XML.
To see how XML separates data from the presentation format, consider the following example.

HTML

    <b>visualbuilder.com webmaster@visualbuilder.com</b>

After reading the HTML above, it is not clear what is being displayed. We can guess that it is a web site and an email address, but a computer program will have great difficulty working out what this text means in a reliable way. Below is the XML equivalent, representing the same text and data:

XML

    <site>
      <sitename>visualbuilder.com</sitename>
      <emailaddress>webmaster@visualbuilder.com</emailaddress>
    </site>

You can figure out what this means, but more importantly, a computer program can make use of it. XML takes more space, but it defines the information more precisely and robustly.

XML Basics

In order to use XML, the document must comply with certain rules to be well formed.
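The elements must be properly nested: an element opened inside another element must be closed before its parent is closed.

    <a>
      <b></b>
    </a>

Every start tag must have a matching end tag. In HTML, many developers leave out </br> and </p> tags, and most browsers handle this; an XML parser will reject it.

An empty element can be written either with an explicit end tag or as a single empty-element tag ending in "/>". These two forms are equivalent:

    <car engine="3000"></car>
    <car engine="3000"/>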
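A quick way to see these rules enforced is to hand a few snippets to a parser and see which ones it accepts. The sketch below is an illustration rather than part of this tutorial's code: it assumes the JAXP DocumentBuilder that ships with the JDK instead of calling Xerces directly, and the class name WellFormedCheck is hypothetical. Any fatal parse error (reported as a SAXException) is treated as "not well formed".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: checks well-formedness using the JDK's JAXP parser
public class WellFormedCheck {

    /** Returns true if the text parses as a well-formed XML document, false otherwise. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but stops the parser echoing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness (fatal) error
        } catch (Exception e) {
            // I/O or parser-configuration problems say nothing about the XML itself
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b></b></a>",              // properly nested
            "<a><b></a></b>",              // overlapping tags
            "<car engine=\"3000\"></car>", // explicit end tag
            "<car engine=\"3000\"/>",      // empty-element shorthand
            "<p>text"                      // missing end tag, HTML style
        };
        for (String s : samples) {
            System.out.println((isWellFormed(s) ? "well formed:     " : "NOT well formed: ") + s);
        }
    }
}
```

The same approach also rejects an unquoted attribute value such as <car engine=3000/>, which HTML browsers tolerate but XML does not.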
If the document is to be checked for validity, it must also use a Document Type Definition (DTD). The document declares its DTD with a <!DOCTYPE> declaration before the root element, and it must still follow the well-formedness rules above.

Why use XML in Java?

XML and Java work very well together.
1. Portability - Java is a platform-independent development language, and XML is an architecture- and language-independent data format. Neither Java nor XML cares about the platform.
2. Extensive development support - There is good API support in Java for XML.

People are using XML for conducting business-to-business transactions and as a standard way for different computers to communicate with each other.

Application areas of XML

• Presentation Oriented Publishing (POP) - the same data is presented in a web browser, on a mobile phone and on a PDA.
• Message Oriented Middleware (MOM) - B2B. This is where an application uses XML as a message format for communication between different systems.
• Exchanging database contents using XML.

To illustrate these applications, we will use the example of an online shopping system.

1) The owner of the online shop wants to provide an online service that also works on mobile phones and PDAs. Data in the store database is used to generate an XML document. This same XML document can be transformed using XSL (covered in a later section) into HTML, WML or any other format, which allows the information to be displayed on mobile phones and PDAs.

2) The owner wants to automate order processing and fulfilment. Since the 1960s, systems have used Electronic Data Interchange (EDI) for computer-to-computer communication, which allowed orders and invoices to be sent using a messaging standard. The problem with EDI is that an EDI system is very expensive to buy, so only large companies used it. Two companies that agree on a DTD can instead send messages over the Internet using XML. There are problems regarding security and reliability, but these are being addressed. Going back to our example, when a customer orders a product from the online shop, the shop sends a standard message to the delivery agent. The delivery agent's computer system automatically updates itself with the latest orders and sends an acknowledgement back to the shop. Suppliers could also send messages to the online shop to update the inventory for specific products.

3) The marketing department would like to extract the data from the online shop so they can organise product promotions and sales. However, the marketing database is in MS Access and the shop uses Oracle, and there is no specific standard for exchanging data from one database to another. XML allows all the tables to be fully described using custom tags such as <table> and <field>.

Standards

There are several standards bodies involved in Java and XML.
XML specifications
From a specifications perspective, the World Wide Web Consortium (W3C) provides the base specifications for XML:
http://www.w3.org
The Apache XML Project provides open source XML implementation solutions.
You can find the following at the Apache XML site: http://xml.apache.org
Listed below are the Apache projects related to using XML in Java:
Crimson - A Java XML parser derived from the Sun Project X Parser.
The Java Community Process (JCP) has also developed a comprehensive set of application programming interfaces (APIs) for developing XML applications in Java.
In this tutorial you will see how to develop XML applications in Java using different methods. All the methods are based on free, downloadable tools and technologies.

Setting up the environment for XML and Java

To use XML you will need an XML parser, but before downloading one, you must make sure you have Java (the JDK) installed.

Setting up Java

Download JDK 1.3 from the following URL:
Run through the setup. One of the main problems new Java developers have is setting the PATH and CLASSPATH correctly.
For Windows 95/98/ME, edit the AUTOEXEC.BAT file with the new PATH and CLASSPATH settings and reboot your machine.
#or
Setting up XML
This tutorial will use the Xerces XML parser found on the Apache XML site:
http://xml.apache.org/dist/xerces-j/
If you are a Windows user, this is the file you will need (the current download at the time of writing, Xerces-J 1.4.3):
http://xml.apache.org/dist/xerces-j/Xerces-J-bin.1.4.3.zip
2. Extract the contents of the zip; this will copy the files and create all the subdirectories. If you go to your Xerces-J 1.4.3 directory, you should see xerces.jar and xercesSamples.jar.
The next step is to edit the CLASSPATH in your AUTOEXEC.BAT file. You need to tell Java where it can find xerces.jar and xercesSamples.jar, so add both files to your CLASSPATH.
To test your install, try one of the included samples, SAXCount.
Go to the directory where you have the two jar files, for example:

    cd C:\Xerces-J-bin.1.4.3\xerces-1_4_3

(In Windows 98/ME/2000 and NT, you can drag and drop a directory into a command prompt window. This saves you from having to type in long directory names.)

You should get output from the application like this:

    data/personal.xml: 280 ms (36 elems, 18 attrs, 140 spaces, 128 chars)

This is a breakdown of the personal.xml file in the data directory.
If you do not get this output, then either you are in the wrong directory or, most probably, your CLASSPATH is incorrect. Check your AUTOEXEC.BAT file (Windows 95/98/ME) or your environment settings.
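If the sample still fails, a small diagnostic can show which classpath the JVM actually sees and whether the parser class can be loaded. This is a sketch under two assumptions: that you are using the Xerces 1.x xerces.jar described above, whose DOM parser class is org.apache.xerces.parsers.DOMParser, and that the class name ClasspathCheck (hypothetical) is free to use.

```java
import java.io.File;

// Hypothetical diagnostic: shows what the JVM sees on its classpath
public class ClasspathCheck {

    /** Returns true if the named class can be found by this program's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // List every classpath entry the JVM was actually started with
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(File.pathSeparator)) {
            System.out.println("classpath entry: " + entry);
        }
        // DOM parser class from the Xerces 1.x xerces.jar (assumed version)
        String parser = "org.apache.xerces.parsers.DOMParser";
        if (isOnClasspath(parser)) {
            System.out.println(parser + " found");
        } else {
            System.out.println(parser + " NOT found - check that xerces.jar is on the CLASSPATH");
        }
    }
}
```

Run it from the same command prompt as SAXCount so that it sees the same environment settings. If CLASSPATH does not include the current directory (.), java will not find ClasspathCheck itself, which is another common symptom of a CLASSPATH problem.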
If this is all working, then you have correctly set up the environment for Java and XML.
Java XML APIs

The three main APIs that we shall focus on are:
1. JAXP - Parsing API.
2. JAXM - Messaging API.
3. JAXB - Binding API.

Java API for XML Processing (JAXP)

As a developer, you program to a standard interface. This interface isolates you from specific parsers, so switching parsers does not require coding changes. SAX and DOM are language-independent interfaces. They are two different APIs for accessing the information from the XML parser, and each takes a different approach to accessing the information in the XML document: SAX is a low-level API and DOM is a high-level API. The next sections cover SAX and DOM in more detail. In both cases, an XML application creates a parser object, passes some XML to the parser and then processes the results.

Simple API for XML (SAX)

SAX is a standard interface for event-based XML parsing. SAX defines a number of events; it is up to you to listen for them and respond to them. The document is read serially, and an event is triggered at different parts of the document. The common events are:
1. start of the document
2. start of each element
3. characters
4. end of each element
5. end of the document

Basically, you write a program that has event handlers. SAX is fast and has a low memory requirement, but SAX parsing takes more work to set up.

Document Object Model (DOM)

DOM is designed to be a portable interface for manipulating document structures.
Using DOM, the application builds a tree structure of the XML document in memory. The different parts of the XML file are stored as nodes in the DOM document, and the application can then walk back and forth through the nodes in the tree.
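To make the contrast between the two models concrete, the following sketch counts the elements and attributes of a small document twice, roughly the kind of breakdown SAXCount printed earlier. With SAX, a handler reacts to start-element events as the parser streams through the text and nothing is kept; with DOM, the whole tree is built in memory first and then queried. It uses the standard JAXP factories (SAXParserFactory, DocumentBuilderFactory) that ship with the JDK rather than the Xerces classes. The class name SaxVsDom and the id attribute on <site> are assumptions added for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison class: same counts, two parsing models
public class SaxVsDom {

    /** SAX: count elements and attributes as start-element events arrive; no tree is kept. */
    public static int[] countWithSax(String xml) {
        final int[] counts = new int[2]; // [elements, attributes]
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts[0]++;
                counts[1] += atts.getLength();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(toStream(xml), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
        return counts;
    }

    /** DOM: build the whole tree in memory first, then count by querying it. */
    public static int[] countWithDom(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(toStream(xml));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
        NodeList elements = doc.getElementsByTagName("*"); // "*" matches every element
        int attributes = 0;
        for (int i = 0; i < elements.getLength(); i++) {
            attributes += elements.item(i).getAttributes().getLength();
        }
        return new int[] {elements.getLength(), attributes};
    }

    private static InputStream toStream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Same data as the <site> example; the id attribute is added only so there is one to count
        String xml = "<site id=\"1\"><sitename>activepace.com</sitename>"
                + "<emailaddress>info@activepace.com</emailaddress></site>";
        int[] sax = countWithSax(xml);
        int[] dom = countWithDom(xml);
        System.out.println("SAX: " + sax[0] + " elems, " + sax[1] + " attrs");
        System.out.println("DOM: " + dom[0] + " elems, " + dom[1] + " attrs");
    }
}
```

Both methods produce the same counts. The difference is that the SAX version never holds more than the current event, while the DOM version keeps the whole tree in memory, which is what makes DOM convenient for walking back and forth but more memory-hungry on large files.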
DOM represents the following document as a tree, with site as the root element and sitename and emailaddress as its child elements:

    <site>
      <sitename>activepace.com</sitename>
      <emailaddress>info@activepace.com</emailaddress>
    </site>

Basics of programming using DOM

After you have made sure that your environment has been set up correctly (see the Setting up the environment for XML and Java section), you can write your first Java and XML example. For this example we will use the DOM API discussed in the previous section. This is a simple example that reads the text "Hello World" from an XML file called "hello.xml".

1) Import package org.w3c.dom

    import org.w3c.dom.*;
The next step is to import a vendor-dependent XML parser. In our case it will be the Xerces DOM parser that we configured.

    import org.apache.xerces.parsers.DOMParser;

3) On calling DOMHelloWorld

The main method of DOMHelloWorld.java checks that the name of the XML file has been provided as an argument.
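    public static void main(String[] args) {
        // Expect exactly one argument: the name of the XML file to read
        if (args.length != 1) {
            System.err.println("usage: java DOMHelloWorld hello.xml");
            System.exit(1); // non-zero exit status signals the usage error
        }
        String xmlfilename = args[0];
        // ... the parsing steps that follow go here, inside main
    }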
First we must create an instance of the parser (the vendor-specific parser). This is the same parser we imported earlier in step 2.

    DOMParser xmlparser = new DOMParser();

Parsing the file is really easy because the parser does the work for you. All you have to do is call the parse method with the name of the XML file.
    xmlparser.parse(xmlfilename);

If you look at the API documentation that comes with the Xerces parser and search for the parse method, you will notice something special. DOMParser and SAXParser are subclasses of XMLParser, and the parse method throws two exceptions, SAXException and java.io.IOException. If you try to compile the source code without catching these exceptions, the compiler reports an error saying that java.io.IOException must be caught or declared to be thrown. Under the previous import statements, add the imports for the IOException and SAXException classes:

    import java.io.IOException;
    import org.xml.sax.SAXException;

Now we put a try/catch block around the parse method of DOMParser:

    try {
        xmlparser.parse(xmlfilename);
    } catch (IOException e) {
        // The file could not be found or read
        System.out.println("Error reading xml file: " + e.getMessage());
    } catch (SAXException e) {
        // The file was read but is not well-formed XML
        System.out.println("Error in parsing: " + e.getMessage());
    }

6) Accessing the DOM tree
As DOM creates a tree-based structure from the XML file, we need to access the information stored in the tree. To access the tree, call getDocument(), which returns a Document.
This Document represents the entire XML document. The data we need to access is stored in nodes, and these nodes can have child nodes. Therefore, what you need to do next is walk through the tree structure (the Document) and display the data stored in the nodes.

Now the fun starts...

7) Walking the nodes
Next we will write a method to display the data in a node. The method will be called displayNode and will take one parameter, the start node.
This method will start from the first node and walk through all the nodes in the XML document using recursion.
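A minimal sketch of such a recursive walk follows. It differs from the tutorial's displayNode in a few deliberate ways: it parses with JAXP's DocumentBuilder instead of the Xerces DOMParser, it takes an extra indentation string and an output buffer (rather than printing directly) so the result is easy to check, and it uses an inline stand-in for hello.xml, because the file's real contents are not shown here. The class name DomWalk is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class: recursive DOM walk in the spirit of displayNode
public class DomWalk {

    /** Parses XML text with JAXP (a stand-in for the tutorial's Xerces DOMParser). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Appends this node, then recurses into each child one indentation level deeper. */
    public static void displayNode(Node node, String indent, StringBuilder out) {
        String value = node.getNodeValue(); // null for the document node and for elements
        if (node.getNodeType() == Node.TEXT_NODE && value.trim().isEmpty()) {
            return; // skip whitespace-only text between tags
        }
        out.append(indent).append(node.getNodeName());
        if (value != null) {
            out.append(" = \"").append(value.trim()).append('"');
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            displayNode(children.item(i), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical contents for hello.xml; the tutorial does not show the real file
        Document doc = parse("<hello>Hello World</hello>");
        StringBuilder out = new StringBuilder();
        displayNode(doc, "", out);
        System.out.print(out);
    }
}
```

For the stand-in document, the walk visits the document node (#document), then the hello element, then its text node holding "Hello World", each one level deeper than its parent.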
The nodes in the XML document are of different node types, which can be divided into two broad categories: structural nodes and content nodes.
Structural nodes are not actually part of the content of the document but are used to provide syntactic structure.
Node Type                      Category
ATTRIBUTE_NODE                 Content
CDATA_SECTION_NODE             Content
COMMENT_NODE                   Content
DOCUMENT_FRAGMENT_NODE         Content
DOCUMENT_NODE                  Structural
DOCUMENT_TYPE_NODE             Structural
ELEMENT_NODE                   Content
ENTITY_NODE                    Structural
ENTITY_REFERENCE_NODE          Content
NOTATION_NODE                  Structural
PROCESSING_INSTRUCTION_NODE    Content
TEXT_NODE                      Content
The main nodes we are interested in are DOCUMENT_NODE, ELEMENT_NODE and ATTRIBUTE_NODE.
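The sketch below shows how a walker might treat those node types differently by switching on getNodeType(). It also handles TEXT_NODE so the element content is visible. One detail worth knowing: in DOM, attributes are not children of their element, so ATTRIBUTE_NODE objects are reached through getAttributes() rather than getChildNodes(). The class NodeTypeSwitch and the sample <car> document are hypothetical, and parsing again goes through JAXP rather than Xerces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class: per-node-type handling during a recursive walk
public class NodeTypeSwitch {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Appends a one-line description of the node, then recurses into its children. */
    public static void describe(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("document\n");
                break;
            case Node.ELEMENT_NODE:
                out.append("element ").append(node.getNodeName()).append('\n');
                // Attributes hang off the element as a NamedNodeMap, not as child nodes
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i); // an ATTRIBUTE_NODE
                    out.append("  attribute ").append(attr.getNodeName())
                       .append('=').append(attr.getNodeValue()).append('\n');
                }
                break;
            case Node.TEXT_NODE:
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) { // ignore whitespace-only text
                    out.append("  text ").append(text).append('\n');
                }
                break;
            default:
                // Comments, CDATA, processing instructions etc. are ignored in this sketch
                break;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        describe(parse("<car engine=\"3000\">Fast</car>"), out);
        System.out.print(out);
    }
}
```

Keeping the per-type logic in one switch makes it easy to extend later, for example by adding a case for CDATA_SECTION_NODE if a document uses CDATA sections.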