What Is XML?
3.1
Unit Objectives
After completing this unit, you should be able to:
Describe the basic rules of XML
Describe what it means for an XML document to be well-formed
List the components that make up an XML document
Differentiate between XML and HTML
Describe the internationalization support in XML
Define some best practices for XML
What Is XML?
At its core XML is text formatted to follow a well-defined set of rules.
XML documents consist primarily of tags and text.
If you have ever seen the source of an HTML document, the
XML structure should look familiar.
This text may be stored/represented in:
A normal file stored on disk
A message being sent over HTTP
A character string in a programming language
A CLOB (character large object) in a database
Any other way textual data can be used
XML documents do not need to exist as documents; they may be:
Byte streams sent between applications
Fields in a database record
Collections of XML Infoset information items
For simplicity they will be referred to as though they are
documents and files.
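As a minimal sketch of the point above, the following Python uses the standard library module xml.etree.ElementTree to parse the same XML text held as a character string, as a byte stream, and as a file-like object. The sample text and the variable names are illustrative assumptions, not part of the course material.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample text; any well-formed XML would do.
xml_text = "<book><title>The Right Stuff</title></book>"

# 1. XML held in a character string.
from_string = ET.fromstring(xml_text)

# 2. XML held as bytes, as it might arrive in an HTTP message.
from_bytes = ET.fromstring(xml_text.encode("utf-8"))

# 3. XML read from a file-like object (stands in for a file on disk).
from_file = ET.parse(io.StringIO(xml_text)).getroot()

for root in (from_string, from_bytes, from_file):
    print(root.tag, "->", root.find("title").text)
```

In each case the parser sees the same markup, so the resulting element tree is the same.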
Copyright IBM Corporation 2004
<?xml version="1.0"?>
<book>
<author>
Tom Wolfe
</author>
<title>
The Right Stuff
</title>
<price>
$6.00
</price>
</book>
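The document above can be read as a tree of elements. The sketch below, which assumes Python's standard xml.etree.ElementTree module, parses that document and prints one indented line per element. The helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<book>
<author>
Tom Wolfe
</author>
<title>
The Right Stuff
</title>
<price>
$6.00
</price>
</book>"""


def outline(element, depth=0):
    """Return one indented line per element, with its trimmed text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
print("\n".join(outline(root)))
```

The root element book has three children, each holding a piece of text.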
[Figure: tree view of the document above. ROOT contains <book>, which contains <author> ("Tom Wolfe"), <title> ("The Right Stuff"), and <price> ("$6.00").]
<book>
<title>
Alphabet from A to Z
</title>
<author>
<firstName>Boreng</firstName>
<lastName>Riter</lastName>
</author>
<!-- Comment -->
</book>
For every opening tag "<...>" there must be a matching closing tag "</...>".
The exception is an empty tag (one with no content or body), written "<.../>".
Tag and attribute names must comply with the XML naming rules; attribute
values and character data must escape the reserved characters.
The first line should contain the XML declaration, which identifies
the version of the XML specification to apply. The declaration is
optional in XML 1.0 and required in XML 1.1.
XML 1.0 is currently the most common version.
Legal (a single root element contains all the others):
<?xml version="1.0"?>
<colors>
<color>red</color>
<color>green</color>
</colors>
Not legal (there are two root elements):
<?xml version="1.0"?>
<color>red</color>
<color>green</color>
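One way to check the two documents above is to hand them to a parser. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree module. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


legal = """<?xml version="1.0"?>
<colors>
<color>red</color>
<color>green</color>
</colors>"""

# Two top-level elements: there is no single root.
not_legal = """<?xml version="1.0"?>
<color>red</color>
<color>green</color>"""

print(is_well_formed(legal))      # True
print(is_well_formed(not_legal))  # False
```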
Legal:
<?xml version="1.0"?>
<shirt>
<style>Polo</style>
<color>red</color>
<size>large</size>
</shirt>
Not legal:
<?xml version="1.0"?>
<shirt>
<style>
<size>large
<color>red
Polo
</style>
</size></color>
</shirt>
The element tags overlap: elements must be closed in the reverse
of the order in which they were opened.
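A parser can also report where the nesting breaks. The sketch below assumes Python's standard xml.etree.ElementTree module, whose ParseError carries a position attribute holding the line and column. The helper name first_error and the shortened sample documents are illustrative.

```python
import xml.etree.ElementTree as ET


def first_error(text):
    """Return (message, line, column) for the first parse error, or None."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        line, column = err.position
        return (str(err), line, column)


# Properly nested: each element closes before its parent does.
nested = "<shirt>\n<style>Polo</style>\n<size>large</size>\n</shirt>"

# Overlapping: <style> is closed while <size> is still open.
overlapping = "<shirt>\n<style>\n<size>large\n</style>\n</size>\n</shirt>"

print(first_error(nested))
print(first_error(overlapping))
```

For the overlapping document the parser stops at the first closing tag that does not match the most recently opened element.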
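Legal names:
title, book.isdn, lastName, _street, addrLine1, name:first
Not legal names:
1name, -street, &name
Comment: a name must start with a letter, an underscore, or a colon.
Legal:
<color>
red
</color>
<SIZE>
small
</SIZE>
Not legal:
<color>
red
</COLOR>
<SIZE>
small
<SiZe>
Comment: names are case sensitive, so the opening and closing tags must match exactly.
Legal:
<fname>
John
</fname>
Not legal:
<f name>
John
</f name>
Comment: names cannot contain spaces.
Legal:
<nameXML>
John
</nameXML>
Not legal:
<xmlName>
John
</xmlName>
Comment: names beginning with "xml" (in any mix of case) are reserved.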
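The naming rules illustrated above can be approximated with a regular expression. This is a simplified sketch: it covers ASCII names only, whereas the full XML 1.0 rules also allow many non-ASCII letters. The names NAME_PATTERN and is_legal_name are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML name rules (assumption):
# first character is a letter, underscore, or colon; later characters may
# also be digits, hyphens, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_legal_name(name):
    """Apply the naming rules from this unit to a tag or attribute name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" are reserved
    return NAME_PATTERN.fullmatch(name) is not None


legal = ["title", "book.isdn", "lastName", "_street", "addrLine1",
         "name:first", "fname", "nameXML"]
not_legal = ["1name", "-street", "&name", "f name", "xmlName"]

for name in legal + not_legal:
    print(name, is_legal_name(name))
```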
Entity    Character    Description
&lt;      <            "less than"
&gt;      >            "greater than"
&amp;     &            "ampersand"
&apos;    '            "apostrophe"
&quot;    "            "quote"
Examples:
<range>&gt; 6 &amp; &lt; 20</range>
<quotes characters="&apos;&quot;&apos;"/>
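As a sketch of how the predefined entities behave in practice, the following Python escapes reserved characters with xml.sax.saxutils.escape and then shows that a parser turns the entities back into the original characters. It assumes only the Python standard library. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "> 6 & < 20"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw_text)
print(escaped)  # &gt; 6 &amp; &lt; 20

# The parser converts the entities back to the original characters.
element = ET.fromstring("<range>" + escaped + "</range>")
print(element.text)  # > 6 & < 20

# Entities also work inside attribute values.
quotes = ET.fromstring('<quotes characters="&apos;&quot;&apos;"/>')
print(quotes.get("characters"))  # '"'
```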
<script><![CDATA[
function matchwo(a, b) {
  if (a < b && a < 0) {
    return 1;
  } else {
    return 0;
  }
}
]]></script>
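The effect of a CDATA section can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The shortened script text and the helper name parses are illustrative.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


script_body = "if (a < b && a < 0) { return 1 } else { return 0 }"

with_cdata = "<script><![CDATA[" + script_body + "]]></script>"
without_cdata = "<script>" + script_body + "</script>"

# Inside CDATA the < and & characters are ordinary text.
print(ET.fromstring(with_cdata).text == script_body)  # True

# Without CDATA the same characters are read as markup and rejected.
print(parses(without_cdata))  # False
```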
Notice the different usage of the attribute "type" in the two elements;
semantically they are not the same.
Attributes must have a value.
Values must be quoted with either double or single quotes.
By convention, pick one quoting style and use it consistently.
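The attribute rules above can be checked with a parser. This sketch assumes Python's standard xml.etree.ElementTree module. The element and attribute names (shirt, option, selected) and the helper parses are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Single and double quotes are both accepted.
shirt = ET.fromstring("<shirt size='large' color=\"red\"/>")
print(shirt.attrib)

# An unquoted value is rejected.
print(parses("<shirt size=large/>"))  # False

# An attribute with no value is rejected.
print(parses("<option selected/>"))   # False
```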
Comments
Syntax: <!-- comment text -->
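A few comment rules can be sketched with a parser, assuming Python's standard xml.etree.ElementTree module, whose default parser discards comments. The sample documents and the helper name parses are illustrative.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Comments may appear before the root element and between element tags.
doc = "<!-- before the root --><book><!-- inside --><title>T</title></book>"
root = ET.fromstring(doc)
print([child.tag for child in root])  # ['title']

# A double hyphen is not allowed inside a comment.
print(parses("<book><!-- a -- b --></book>"))  # False

# Nothing, not even a comment, may come before the XML declaration.
print(parses('<!-- first --><?xml version="1.0"?><book/>'))  # False
```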
Processing Instruction
Syntax: <?target arg*?>
Processing Instruction is often abbreviated as PI in
documentation.
A feature inherited from SGML.
Used to embed application-specific instructions in documents.
The target name immediately follows "<?" and is used to
associate the PI with an application.
May include zero or more arguments.
May be preceded by comments.
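As a sketch of how an application sees a processing instruction, the following Python uses the standard xml.dom.minidom module, which keeps PIs as nodes in the document. The xml-stylesheet target is a widely used PI. The file name style.xsl is a hypothetical placeholder.

```python
from xml.dom import minidom, Node

text = (
    '<?xml version="1.0"?>'
    '<!-- a comment may precede the PI -->'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<doc/>'
)

document = minidom.parseString(text)

# Collect the processing instructions found at the top level.
pis = [node for node in document.childNodes
       if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    # target names the application; data holds the arguments as one string.
    print(pi.target, "|", pi.data)
```

The XML declaration on the first line looks like a PI but is not reported as one.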
<course>
<name>Java Programming</name>
<department>EECS</department>
<teacher>
<name>Paul Thompson</name>
</teacher>
<student>
<name>Ron Jones</name>
</student>
<student>
<name>Uma Abingdon</name>
</student>
<student>
<name>Lindsay Garmon</name>
</student>
</course>
XML (above) compared with an HTML page for a course roster (below):
<html>
<title>Course Roster</title>
<body>
<center>
<h1>Course Roster</h1>
<h2>XML Programming</h2>
<h3>Department: EECS</h3>
<p>
<table border=2>
<tr>
<th>Teacher</th>
<td>Paul Thompson</td>
</tr><tr>
<th>Student<br>List</th>
<td>Ron Jones<br>
Uma Abingdon<br>
Lindsay Garmon
</td>
</tr>
</table>
</center>
</body>
</html>
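Because the XML version of the roster labels each piece of data, a program can pick out values by name rather than by position on the page. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

course_xml = """<course>
<name>Java Programming</name>
<department>EECS</department>
<teacher><name>Paul Thompson</name></teacher>
<student><name>Ron Jones</name></student>
<student><name>Uma Abingdon</name></student>
<student><name>Lindsay Garmon</name></student>
</course>"""

course = ET.fromstring(course_xml)

# Each value is addressed by the element names that describe it.
course_name = course.findtext("name")
teacher = course.findtext("teacher/name")
students = [s.findtext("name") for s in course.findall("student")]

print(course_name)
print(teacher)
print(students)
```

Getting the same values out of the HTML page would mean relying on its table layout and line breaks.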
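HTML versus XML:
HTML allows unquoted attribute values and elements without an end tag: <img src=myDog.jpeg>
XML requires quoted attribute values and a closing tag: <book isdn="3432"></book>
HTML browsers tolerate overlapping tags: <H1><center>Hello!</H1></center>
XML requires proper nesting: <H1><center>Hello!</center></H1>
HTML tag names are not case sensitive, so a closing tag in a different case is valid.
XML is case sensitive: <name>test</name>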
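The differences above can be checked by running each fragment through an XML parser. This sketch assumes Python's standard xml.etree.ElementTree module. The mixed-case fragment <name>test</NAME> and the helper name parses are illustrative additions.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


fragments = [
    "<img src=myDog.jpeg>",              # unquoted value, no end tag
    '<book isdn="3432"></book>',         # quoted value, closed
    "<H1><center>Hello!</H1></center>",  # overlapping tags
    "<H1><center>Hello!</center></H1>",  # properly nested
    "<name>test</NAME>",                 # case mismatch
    "<name>test</name>",                 # matching case
]

results = {fragment: parses(fragment) for fragment in fragments}
for fragment, ok in results.items():
    print(ok, fragment)
```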
Checkpoint Questions
1. Basic XML can be described as:
A. A hierarchical structure of tagged elements, attributes and text.
B. All the HTML tags plus a set of new XML only tags.
C. Object-oriented structure of rows and columns.
D. Processing instructions (PIs) for text data.
E. Textual data with tags for visual presentation.
2. Which of these XML fragments is not well-formed?
A. <root><class>XML</class></root>
B. <class><root>XML</root></class>
C. <root><class id="XML"></root>
D. <root>XML<class id="XML"/>XML</root>
E. <root class="XML"><class id="root"/>XML</root>
3. XML Comments are allowed (Select all that apply):
A. Before the XML Declaration
B. Anywhere
C. Between element tags
D. Before the root element
E. All of the Above
4. Which of these XML elements with attributes is not well-formed?
A. <name first='Tony' LAST="Romeo" />
B. <name name="Tony" NAME="ROMEO" />
C. <_name_ first-name="Tony" last-name="Romeo"/>
D. <name="Tony Romeo" />
E. <name name="first='Tony' last='Romeo'" />
F. All of the Above
5. Which of these comments regarding HTML and XML is not true?
A. HTML markup is focused on presentation.
B. XML markup is based on defining the data.
C. XML is based on HTML.
D. HTML tags are not case sensitive.
E. XML tags are case sensitive.
F. Both XML and HTML support attributes.
Unit Summary
Having completed this unit, you should be able to:
Describe the basic rules of XML
Describe what it means for an XML document to be well-formed
List the components that make up an XML document
Describe the differences between XML and HTML
Describe the internationalization support in XML
Describe some best practices in XML