Module 2 PDF

MODULE 2
Introduction to SGML
Markup language refers to the traditional way of marking up a document. It determines
the structure and meaning of textual elements.
The first formal markup language used to specify the structure of documents was created
at IBM in the 1960s.
SGML became a standard for information presentation that was adopted by many
different industries. SGML was adopted as a standard by the International Organization
for Standardization in 1986. SGML was invented by Dr. Charles Goldfarb.
SGML is the Standard Generalized Markup Language.
SGML is a specification for creating markup languages in a standard way.
Every SGML document has three parts:
SGML declaration
The SGML declaration binds SGML processing quantities and syntax token names to specific
values. It specifies which characters and delimiters may appear in an SGML document. For
example, the SGML declaration in the HTML DTD specifies that the string that opens an
end tag is </ and the maximum length of a name is 72 characters.
Prologue
Prologue includes one or more document type declarations (DTDs), which specify the
element types, element relationships and attributes. The document structure is written in a
DTD/SGML application. The HTML 3.0 DTD provides a definitive specification of the
allowed syntax for HTML 3.0 documents.
Document instance
The document instance contains the data (content) and markup of the document. It refers
to the DTD that is used to interpret it.
Features of SGML
1. Text divided into elements, which can be nested
2. Element boundaries marked by tags
3. Elements carry generic type and other attributes
4. Entity references allow string substitution for character set problems, standard
boilerplate text, and document management
5. Consistent use of delimiters, few special characters
6. Descriptive.
7. Data independence.
8. More powerful.
9. Built to last.
10. Ensures that documents can be transferred from one hardware or software platform to
another without loss of information.
SGML is very large and complex, however, and probably overkill for most common
office desktop applications.
HTML is concerned with the presentation of data only. HTML has no provisions for
extending itself in a standard way with new tags or attributes, so it is too limited
(predefined).
XML, however, rather than being a predefined language like HTML, is a predefined way
of defining new languages.
XML is a lightweight, cut-down version of SGML which keeps enough of its
functionality to make it useful but removes all the optional features which made SGML
too complex to program for in a Web environment.
XML bridges the gap between the complex world of SGML and the limited world of
HTML.
XML:
XML stands for EXtensible Markup Language

XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined. We must define our own tags
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
XML with a DTD or XML Schema is designed to be self-descriptive
XML is a W3C Recommendation
XML is a subset of SGML
XML helps to interchange data on the web
Features of XML
XML is for structuring data
Structured data includes things like spreadsheets, address books, configuration parameters,
financial transactions, and technical drawings. XML avoids common pitfalls in language design:
it is extensible, platform-independent, and it supports internationalization and localization. XML
is fully Unicode-compliant.
XML looks a bit like HTML
Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form
name="value").
XML is a family of technologies
Beyond XML 1.0, "the XML family" is a growing set of modules that offer useful services to
accomplish important and frequently demanded tasks. XLink describes a standard way to add
hyperlinks to an XML file. XPointer is a syntax in development for pointing to parts of an XML
document. An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it
points to pieces of data inside an XML file. CSS, the style sheet language, is applicable to XML
as it is to HTML. XSL is the advanced language for expressing style sheets. The DOM is a
standard set of function calls for manipulating XML (and HTML) files from a programming
language.
XML leads HTML to XHTML
There is an important XML application that is a document format: W3C's XHTML, the successor
to HTML. XHTML has many of the same elements as HTML. The syntax has been changed
slightly to conform to the rules of XML. A format that is "XML-based" inherits the syntax from
XML and restricts it in certain ways (e.g., XHTML allows "<p>", but not "<r>"); it also adds
meaning to that syntax (XHTML says that "<p>" stands for "paragraph", and not for "price",
"person", or anything else).
XML is modular
XML allows you to define a new document format by combining and reusing other formats. To
eliminate name confusion when combining formats, XML provides a namespace mechanism.
XSL and RDF are good examples of XML-based formats that use namespaces. (RDF, the Resource
Description Framework, is an XML text format that supports resource description and metadata
applications, such as music playlists, photo collections, and bibliographies.)
XML is license-free, platform-independent and well-supported

Opting for XML is a bit like choosing SQL for databases: you still have to build your own
database and your own programs and procedures that manipulate it, but there are many tools
available and many people who can help you. And since XML is license-free, you can build your
own software around it without paying anybody anything. The large and growing support means
that you are also not tied to a single vendor.
XML as a subset of SGML

XML has some features of SGML. So XML is called a subset of SGML. Some of the SGML
features that XML has:
o Modularity
o Extensibility
o Internationality
As XML is a proper subset of SGML, all XML documents are valid SGML documents, but
not all SGML documents are valid XML documents.
Simple XML Document /Views of an XML Document
Syntax of XML
The syntax of XML is in two distinct levels:
1. The general low-level rules that apply to all XML documents
2. For a particular XML tag set, either a document type definition (DTD) or an XML schema
General XML Syntax
- XML documents have data elements, markup declarations (instructions for the XML parser),
and processing instructions (for the application program that is processing the data in the
document)
- All XML documents begin with an XML declaration:
<?xml version = "1.0"?>
- XML comments are just like HTML comments
- XML names:
- Must begin with a letter or an underscore
- They can include digits, hyphens, and periods
- There is no length limitation
- They are case sensitive (unlike HTML names)
- Every XML document defines a single root element, whose opening tag must appear before any
other element, after the XML declaration and any other prolog content
- Every element that has content must have a closing tag
- Tags must be properly nested
- All attribute values must be quoted
- An XML document that follows all of these rules is well formed
<?xml version = "1.0"?>

<ad>
<year> 1960 </year>
<make> Cessna </make>
<model> Centurian </model>
<color> Yellow with white trim </color>
<location>
<city> Gulfport </city>
<state> Mississippi </state>
</location>
</ad>
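The well-formedness rules listed above can be checked mechanically. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser (Python is not part of these notes; the helper name is_well_formed and the sample strings are illustrative only). The parser rejects any document that breaks one of the rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative documents (not taken from the notes, apart from the element names).
good = '<?xml version = "1.0"?><ad><year>1960</year><make>Cessna</make></ad>'
bad_nesting = "<ad><year>1960</make></year></ad>"   # tags are not properly nested
bad_quotes = "<ad year=1960></ad>"                   # attribute value is not quoted
two_roots = "<ad></ad><ad></ad>"                     # more than one root element

results = [is_well_formed(doc) for doc in (good, bad_nesting, bad_quotes, two_roots)]
print(results)
```

Note that this checks well-formedness only; it says nothing about validity against a DTD or schema.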
In XML, you often define a new nested tag to provide more info about the content of a tag
- Nested tags are better than attributes, because attributes cannot describe structure and the
structural complexity may grow
- Attributes should always be used to identify numbers or names of elements (like HTML id
and name attributes)


<patient name = "Maggie Dee Magpie">
...
</patient>


<patient>
<name> Maggie Dee Magpie </name>
...
</patient>

<patient>
<name>
<first> Maggie </first>
<middle> Dee </middle>
<last> Magpie </last>
</name>
...
</patient>
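The practical difference between the attribute form and the nested form shows up when a program needs one part of the name. The sketch below uses Python's standard-library xml.etree.ElementTree (an assumption; the notes do not prescribe a language) to read the last name from the first and third patient examples, with the "..." placeholders left out.

```python
import xml.etree.ElementTree as ET

# First form: the whole name is one attribute value.
as_attribute = ET.fromstring('<patient name = "Maggie Dee Magpie"></patient>')

# Third form: the name is broken down into nested elements.
as_nested = ET.fromstring(
    "<patient><name><first> Maggie </first><middle> Dee </middle>"
    "<last> Magpie </last></name></patient>"
)

# With an attribute, the program must split the string itself and guess
# which word is the last name.
full_name = as_attribute.get("name")
last_name_from_attribute = full_name.split()[-1]

# With nested tags, the structure already identifies the last name.
last_name_from_element = as_nested.findtext("name/last").strip()

print(last_name_from_attribute, last_name_from_element)
```

Splitting on spaces only works here because the name happens to have three single-word parts, which is why nested tags are preferred when the structure matters.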
XML Document Structure

An XML document often uses two auxiliary files:
- One to specify the structural syntactic rules
- One to provide a style specification
- An XML document has a single root element, but often consists of one or more entities
- Entities range from a single special character to a book chapter
- An XML document has one document entity
- All other entities are referenced in the document entity
- Reasons for entity structure:
1. Large documents are easier to manage
2. Repeated entities need not be literally repeated
3. Binary entities can only be referenced in the document entity (XML is all text!)
When the XML parser encounters a reference to a non-binary entity, the entity is merged in
- Entity names:
- No length limitation
- Must begin with a letter, an underscore, or a colon
- Can include letters, digits, periods, dashes, underscores, or colons
- A reference to an entity has the form:
&entity_name;
- One common use of entities is for special characters that may be used for markup delimiters
- These are predefined (as in XHTML):
&lt; <
&gt; >
&amp; &
&quot; "
&apos; '
- The user can only define entities in a DTD

If several predefined entities must appear near each other in a document, it is better to avoid using
entity references
- Character data section
<![CDATA[ content ]]>
e.g., instead of
Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;
use
<![CDATA[Start >>>> HERE <<<<]]>
- If the CDATA content has an entity reference, it is taken literally
The contents of a CDATA section are not processed as markup by the XML parser.
It is commonly used for scripting code (e.g., JavaScript).
A CDATA section begins with <![CDATA[ and terminates with ]]>
A CDATA section cannot contain the string "]]>"; therefore, nested CDATA sections are not
allowed.
Also make sure there are no spaces or line breaks inside the "]]>" string.
<?xml version = "1.0"?>
<book title = "C++ How to Program" edition = "3">
<sample>
if ( a &gt; b &amp;&amp; a &lt; c )
printf( "a is greater" );
</sample>
<sample>
<![CDATA[
if ( a > b && a > c )
printf( "a is greater" );
]]>
</sample>
</book>
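To see that a CDATA section and entity references are two spellings of the same character data, the following sketch parses one small line of code both ways. It assumes Python's standard-library xml.etree.ElementTree; the sample strings are illustrative and not taken from the book example above.

```python
import xml.etree.ElementTree as ET

# The same line of code, once with entity references and once in a CDATA section.
escaped = ET.fromstring("<sample>if ( a &gt; b &amp;&amp; a &gt; c )</sample>")
cdata = ET.fromstring("<sample><![CDATA[if ( a > b && a > c )]]></sample>")

# Inside CDATA an entity reference is NOT expanded; it is taken literally.
literal = ET.fromstring("<sample><![CDATA[&lt; stays as typed]]></sample>")

print(escaped.text)
print(cdata.text)
print(literal.text)
```

After parsing, the application cannot tell which of the first two spellings was used; only the third differs, because the reference was never expanded.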
Document Type declarations
The document type declaration attaches a DTD (Document Type Definition) to a document. It
is an optional part of the prolog of an XML document and appears at the beginning of the
document, after the XML declaration. It is the line in the document that refers to the DTD and
other external entities. A well-formed document may omit this declaration, but to be valid, the
XML document must include it.
A DTD is a set of structural rules called declarations
- These rules specify a set of elements, along with how and where they can appear in a document
- Purpose: provide a standard form for a collection of XML documents
- Not all XML documents have or need a DTD
- The DTD for a document can be internal or external
- Errors in DTD: Find them early!
- All of the declarations of a DTD are enclosed in the block of a DOCTYPE markup declaration
- DTD declarations have the form:
<!keyword ... >
- There are four possible declaration keywords:
ELEMENT, ATTLIST, ENTITY, and NOTATION
Declaring Elements
- Element declarations are similar to BNF
- An element declaration specifies the name of an element and the element's structure
- If the element is a leaf node of the document tree its structure is in terms of characters
- If it is an internal node, its structure is a list of children elements (either leaf or internal nodes)
- General form:
<!ELEMENT element_name (list of child names)>
e.g.,
<!ELEMENT memo (from, to, date, re, body)>

[Figure: document tree with root memo and children from, to, date, re, body]
- Child elements can have modifiers, +(one or more), *(zero or more), ?(zero or one)
e.g.,
<!ELEMENT person
(parent+, age, spouse?, sibling*)>
- Leaf nodes specify data types, most often PCDATA, which is an acronym for parsed
character data
- Data type could also be EMPTY (no content) and ANY (can have any content)
- Example of a leaf declaration:
<!ELEMENT name (#PCDATA)>
Declaring Attributes
- General form:
<!ATTLIST el_name at_name at_type [default]>
- Attribute types: there are many possible, but we will consider only CDATA
- Default values:
a value
#FIXED value (every element will have this value),
#REQUIRED (every instance of the element must have a value specified), or
#IMPLIED (no default value and need not specify a value)
- e.g.,
<!ATTLIST car doors CDATA "4">
<!ATTLIST car engine_type CDATA #REQUIRED>
<!ATTLIST car price CDATA #IMPLIED>
<!ATTLIST car make CDATA #FIXED "Ford">
<car doors = "2" engine_type = "V8">
...
</car>
Declaring Entities
- Two kinds:
- A general entity can be referenced anywhere in the content of an XML document
- A parameter entity can be referenced only in a markup declaration
- General form of declaration:
<!ENTITY [%] entity_name "entity_value">
e.g., <!ENTITY jfk "John Fitzgerald Kennedy">
- A reference: &jfk;
- If the entity value is longer than a line, define it in a separate file (an external text entity)
<!ENTITY entity_name SYSTEM "file_location">
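The jfk declaration above can be tried out with an internal DTD. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, whose underlying parser expands general entities declared in the internal DTD subset; the root element name speaker is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# An internal DTD declares the general entity; the content references it.
# The element name "speaker" is hypothetical, chosen only for this sketch.
document = """<!DOCTYPE speaker [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<speaker>&jfk;</speaker>"""

root = ET.fromstring(document)

# The parser has already replaced the reference with the entity's value.
print(root.text)
```

A reference to an entity that is not declared anywhere makes the same parser raise a ParseError.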

Types of Entities
1. Predefined Entity
In XML, certain characters (<, >, &) are used specifically for marking up the document.
They cannot be interpreted as character data, so they cannot be used directly as content. You
must use an entity reference (&lt;, &gt;, &amp;, etc.) to insert such a character into the document:
<myelement>7 &gt; 2</myelement>
2. Parsed Entity
It contains text data that becomes part of the XML document once the data is processed. A parsed
entity is intended to be read by the XML processor, which extracts its content. After the
content is extracted, it becomes part of the document at the location of the entity reference.
E.g., a publisher information (PUB1) entity can be declared as
<!ENTITY PUB1 "BPB Publishers">
Whenever the entity is referenced in the document, the reference is replaced by the entity's
content. To write an entity reference, insert an ampersand (&), then the entity name, followed
by a semicolon (;).
<publisher>This book is from &PUB1;</publisher>
3. Unparsed Entity
Its contents may or may not be text. It is often a binary file or image that is not directly
interpreted by the XML processor. An unparsed entity requires a notation, which identifies the
format or type of the resource that the entity refers to.
<!ENTITY myimage SYSTEM "1.gif" NDATA GIF>
Here GIF is the notation. The notation declaration for GIF is
<!NOTATION GIF SYSTEM "utils\gifview.exe">
This declaration tells the processor that whenever it encounters an entity of type GIF, it
should use gifview.exe to process it.
4. External Entity
It refers to a storage unit in its declaration by using a SYSTEM or PUBLIC identifier. It provides a
pointer to a location at which the entity can be found.
<!ENTITY myimage SYSTEM "http://www.abc.com/image/1.gif" NDATA GIF>
SHOW planes.dtd (Refer text)

XML Parsers
- Always check for well formedness
- Some check for validity, relative to a given DTD
- Called validating XML parsers
- You can download a validating XML parser from: http://xml.apache.org/xerces-j/index.html
- Internal DTDs
<!DOCTYPE root_name [
...
]>
- External DTDs
<!DOCTYPE XML_doc_root_name SYSTEM "DTD_file_name">

The benefit of using external DTDs is that they can be shared more easily and efficiently by
more than one XML document, or by many organizations that need to standardize
communications and data. You can write a DTD once and have multiple documents reference it.
Namespaces
A markup vocabulary is the collection of all of the element types and attribute names of a markup
language (a tag set)
- An XML document may define its own tag set and also use that of another tag set -
CONFLICTS!
- An XML namespace is a collection of names used in XML documents as element types and
attribute names
- The name of an XML namespace has the form of a URI
- A namespace declaration has the form:
<element_name xmlns[:prefix] = "URI">
- The prefix is a short name for the namespace, which is attached to names from the
namespace in the XML document
<gmcars xmlns:gm = "http://www.gm.com/names">
- In the document, you can use <gm:pontiac>
- Purposes of the prefix:
1. Shorthand
2. URI includes characters that are illegal in XML names
This XML carries HTML table information:

<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):

<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these two XML fragments were combined, there would be a name conflict, because both contain
a <table> element with different content and meaning. Using a prefix avoids the conflict:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
When using prefixes in XML, a so-called namespace for the prefix must be
defined. The namespace is defined by the xmlns attribute in the start tag of an
element.
The namespace declaration has the following syntax: xmlns:prefix="URI".

<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
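A namespace-aware parser keeps the two kinds of table apart by attaching the namespace URI, not the prefix, to each name. The sketch below assumes Python's standard-library xml.etree.ElementTree and a shortened copy of the document above; the prefix-to-URI mapping passed to the search functions is chosen by the program and need not use the same prefixes as the document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the two-table document from the notes.
document = """<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
</f:table>
</root>"""

root = ET.fromstring(document)

# ElementTree stores each name as {namespace URI}local-name.
tags = [child.tag for child in root]

# Prefix-to-URI mapping used only for the searches below.
ns = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}
cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
furniture_name = root.findtext("f:table/f:name", namespaces=ns)

print(tags)
print(cells, furniture_name)
```

Both children have the local name table, yet their full names differ, which is exactly the conflict the prefixes resolve.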
XML Schemas
Problems with DTDs:
1. Syntax is different from XML - cannot be parsed with an XML parser
2. It is confusing to deal with two different syntactic forms
3. DTDs do not allow specification of particular kinds of data
XML Schema is one of the alternatives to DTDs
- Two purposes:
1. Specify the structure of its instance XML documents
2. Specify the data type of every element and attribute of its instance XML documents
- Schemas are written using a namespace:
http://www.w3.org/2001/XMLSchema
- Every XML schema has a single root, schema
- The schema element must specify the namespace for schemas as its xmlns:xsd attribute
- Every XML schema itself defines a tag set, which must be named
targetNamespace = "http://cs.uccs.edu/planeSchema"
- If we want to include nested elements, we must set the elementFormDefault attribute to
qualified
- The default namespace must also be specified
xmlns = "http://cs.uccs.edu/planeSchema"
- A complete example of a schema element:
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified">
Defining an instance document
- The root element must specify the namespaces it uses
1. The default namespace
2. The standard namespace for instances (XMLSchema-instance)
3. The location where the default namespace is defined, using the schemaLocation attribute,
which is assigned two values: the namespace name and the location of the schema file
<planes
xmlns = "http://cs.uccs.edu/planeSchema"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation =
"http://cs.uccs.edu/planeSchema http://cs.uccs.edu/planeSchema/planes.xsd" >
- Data Type Categories
1. Simple (strings only; no attributes and no nested elements)
2. Complex (can have attributes and nested elements)
- XMLS defines over 40 data types
- Primitive: string, boolean, float, ...
- Derived: byte, integer, positiveInteger, ...
- User-defined (derived) data types specify constraints on an existing type (the base type)
- Constraints are given in terms of facets (totalDigits, maxInclusive, etc.)
- Both simple and complex types can be either named or anonymous
- DTDs define global elements (context is irrelevant)
- With XMLS, context is essential, and elements can be either:
1. Local, which appears inside an element that is a child of schema, or
2. Global, which appears as a child of schema
Defining a simple type:
- Use the element tag and set the name and type attributes
<xsd:element name = "bird" type = "xsd:string" />
- An instance could have:
<bird> Yellow-bellied sap sucker </bird>
- Element values can be constant, specified with the fixed attribute
fixed = "three-toed"
- User-Defined Types
- Defined in a simpleType element, using facets specified in the content of a restriction
element
- Facet values are specified with the value attribute
<xsd:simpleType name = "middleName" >
<xsd:restriction base = "xsd:string" >
<xsd:maxLength value = "20" />
</xsd:restriction>
</xsd:simpleType>
- Categories of Complex Types
1. Element-only elements
2. Text-only elements
3. Mixed-content elements
4. Empty elements
- Element-only elements
- Defined with the complexType element
- Use the sequence tag for nested elements that must be in a particular order
- Use the all tag if the order is not important
<xsd:complexType name = "sports_car" >
<xsd:sequence>
<!-- The types of model, engine, and year are cut off in the source;
     xsd:string is assumed here, as for make -->
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:string" />
</xsd:sequence>
</xsd:complexType>
- Nested elements can include attributes that give the allowed number of occurrences
(minOccurs, maxOccurs, unbounded)
SHOW planes.xsd and planes.xml(refer text for example)
- We can define nested elements elsewhere
<xsd:element name = "year" >

<xsd:simpleType>
<xsd:restriction base = "xsd:decimal" >
<xsd:minInclusive value = "1990" />
<xsd:maxInclusive value = "2003" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
Different forms of markup that can occur in XML documents
There are six kinds of markup that can occur in an XML document: elements, entity references,
comments, processing instructions, marked sections, and document type declarations.
Elements :
These are the most common form of markup. Delimited by angle brackets, most elements identify
the nature of the content they surround. Some elements may be empty in which case they have no
content. If an element is not empty, it begins with a start-tag, <element>, and ends with an end-
tag, </element>.
Attributes :
These are name-value pairs that occur inside start-tags after the element name. For example
<fontdata classtype="bold"> is a fontdata element with the attribute classtype having the value
bold. In XML, all attribute values must be quoted.
Entity References :
The XML specification reserves the use of certain characters such as < and >. In order to insert
these characters into your document as content, there must be an alternative way to represent
them. In XML, entities are used to represent these special characters. Entities are also used to
refer to often repeated or varying text and to include the content of external files.
Every entity must have a unique name. In order to use an entity, you simply reference it by name.
Entity references begin with the ampersand and end with a semicolon.
For example, the lt entity inserts a literal < into a document. So the string <element> can be
represented in an XML document as &lt;element>.
A special form of entity reference, called a character reference, can be used to insert arbitrary
Unicode characters into your document. This is a mechanism for inserting characters that cannot
be typed directly on your keyboard.
Character references take one of two forms: decimal references, &#8478;, and hexadecimal
references, &#x211E;. Both of these refer to character number U+211E from Unicode.
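A quick check that the decimal and hexadecimal forms name the same character, and that the lt and gt entities yield literal angle brackets, can be sketched with Python's standard-library xml.etree.ElementTree (an assumption; the element name rx is made up for the illustration).

```python
import xml.etree.ElementTree as ET

# One decimal reference, one hexadecimal reference, and two predefined entities.
root = ET.fromstring("<rx>&#8478; &#x211E; &lt;element&gt;</rx>")
text = root.text

# Hexadecimal 211E and decimal 8478 are the same code point.
same_code_point = int("211E", 16) == 8478

print(text, same_code_point)
```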
Comments :
These begin with <!-- and end with -->. Comments can contain any data except the literal
string --. You can place comments between markup anywhere in your document.
Comments are not part of the textual content of an XML document and are displayed in Alchemy
CATALYST as locked strings.
Processing Instructions :
Commonly referred to as PIs, they provide an escape hatch used to send raw data to
an XML application. Like comments, they are not textually part of the XML document, but the
XML processor is required to pass them to an application. Processing instructions have the form:
<?name pidata?>. The name, called the PI target, identifies the PI to the application. Applications
should process only the targets they recognize and ignore all other PIs.
E.g.: <?xml version="1.0"?>
Any data that follows the PI target is optional; it is for the application that recognizes the target.
The names used in PIs may be declared as notations in order to formally identify them. PI names
beginning with xml are reserved for XML standardization.
CDATA Sections :
In a document, a CDATA section instructs the parser to ignore most markup characters.
Consider a source code listing in an XML document. It might contain characters that the XML
parser would ordinarily recognize as markup (< and &, for example). In order to prevent this, a
CDATA section can be used.
<![CDATA[*p = &q;b = (i <= 3);]]>
Between the start of the section, <![CDATA[ and the end of the section, ]]>, all character data is
passed directly to the application, without interpretation. Elements, entity references, comments,
and processing instructions are all unrecognized and the characters that comprise them are passed
literally to the application.
Displaying XML Data in HTML browser
Displaying XML data in HTML browser as HTML tables / Storing XML data in HTML
document
To display an XML document in an HTML browser, we have to load the XML document into a
data island.
A Data Island can be used to access the XML file.
To get the XML document "inside" an HTML page, add an XML Data Island to the HTML page:
The <xml> tag is used to embed XML data within HTML.
Syntax:
<xml src="URL of the external XML file" id="name of the data island"></xml>
An XML document such as CDCatalog.xml can be displayed as an HTML table using the following
HTML code.
<html>
<body>
<xml src="CDCatalog.xml" id="xmldso" >
</xml>
<table datasrc="#xmldso" width="100%" border="1">
<tr>
<th>Title</th>
<th>Artist</th>
<th>Year</th>
</tr>
<tr >
<td><span datafld="TITLE"></span></td>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="YEAR"></span></td>
</tr>
</table>
</body>
</html>
We don't have to use the HTML table element to display XML data. Data from a Data Island can
be displayed anywhere on an HTML page. All we have to do is to add some <span> or <div>
elements to our page. Use the datasrc attribute to bind the elements to the Data Island, and the
datafld attribute to bind each element to an XML element, like this:
<br />Title: <span datasrc="#xmldso" datafld="TITLE"></span>
<br />Artist: <span datasrc="#xmldso" datafld="ARTIST"></span>
<br />Year: <span datasrc="#xmldso" datafld="YEAR"></span>
or like this:
<br />Title: <div datasrc="#xmldso" datafld="TITLE"></div>
<br />Artist: <div datasrc="#xmldso" datafld="ARTIST"></div>
<br />Year: <div datasrc="#xmldso" datafld="YEAR"></div>
The functionality within Internet Explorer to bind XML data to HTML is called
the Data Source Object (DSO).
EXTENSIBLE STYLESHEET LANGUAGE

XSL stands for Extensible Stylesheet Language. XSL is similar to Cascading Style Sheet (CSS)
language in that it lets you format the display of XML data. This, however, is where the
similarities stop. With CSS, the structure of the XML data must be identical to the structure of its
display. XSL is a language that can transform XML into HTML, a language that can filter and
sort XML data and a language that can format XML data based on the data value.
XSL consists of three parts:
XSLT - a language for transforming XML documents
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents
How does XSLT transform XML?

XSLT is used to transform an XML document into another XML document, or another type of
document that is recognized by a browser, like HTML and XHTML. Normally XSLT does this
by transforming each XML element into an (X)HTML element. With XSLT you can add/remove
elements and attributes to or from the output file. You can also rearrange and sort elements,
perform tests and make decisions about which elements to hide and display, and a lot more.
In the transformation process, XSLT uses XPath to define parts of the source document that
should match one or more predefined templates. When a match is found, XSLT will transform the
matching part of the source document into the result document.
XSLT Processors
The principal role of an XSLT processor is to apply an XSLT stylesheet to an XML source
document and produce a result document. It is important to note that each of these is an
application of XML and so the underlying structure of each is a tree. So, in fact, the XSLT
processor handles these trees. There are several XSLT processors to choose from, such as Saxon,
xt, and Microsoft MSXML3.
XSLT - Transformation
Example study: How to transform XML into XHTML using XSLT.
Style Sheet Declaration

The correct way to declare an XSL style sheet according to the W3C XSLT
Recommendation is:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
or:
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The xmlns:xsl attribute is an XML namespace declaration, which indicates that the prefix xsl is
going to be used for elements defined in the W3C XSLT specification.
We want to transform the following XML document ("cdcatalog.xml") into XHTML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
. .
</catalog>
Then create an XSL style sheet ("cdcatalog.xsl") with a transformation template:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> <body> <h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32"> <th>Title</th> <th>Artist</th> </tr>
<xsl:for-each select="catalog/cd">
<tr> <td><xsl:value-of select="title"/></td> <td><xsl:value-of select="artist"/></td> </tr>
</xsl:for-each>
</table> </body> </html>
</xsl:template>
</xsl:stylesheet>
Link the XSL Style Sheet to the XML Document
Add the XSL style sheet reference to your XML document ("cdcatalog.xml"):
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog> <cd> <title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country> <company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd> . . </catalog>
Since an XSL style sheet is an XML document, it always begins with the XML
declaration: <?xml version="1.0" encoding="ISO-8859-1"?>.
The next element, <xsl:stylesheet>, defines that this document is an XSLT style sheet
document (along with the version number and XSLT namespace attributes).
The <xsl:template> element defines a template. The match="/" attribute associates the
template with the root of the XML source document.
The content inside the <xsl:template> element defines some HTML to write to the
output.
The last two lines define the end of the template and the end of the style sheet.
The <xsl:template> Element
An XSL style sheet consists of one or more sets of rules that are called templates. A template
contains rules to apply when a specified node is matched. The <xsl:template> element is used to
build templates.
The match attribute is used to associate a template with an XML element. The match attribute
can also be used to define a template for the entire XML document. The value of the match
attribute is an XPath expression (e.g., match="/" defines the whole document).
The <xsl:value-of> Element
The <xsl:value-of> element can be used to extract the value of an XML element and add it to the
output stream of the transformation, e.g., <xsl:value-of select="title"/>.
The <xsl:for-each> Element

The XSL <xsl:for-each> element can be used to select every XML element of a specified
node-set. The <xsl:for-each> element allows you to do looping in XSLT.
E.g.:
<xsl:for-each select="catalog/cd">
<tr> <td><xsl:value-of select="title"/></td> <td><xsl:value-of select="artist"/></td> </tr>
</xsl:for-each>
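For readers more familiar with ordinary loops, the for-each/value-of pair behaves roughly like the following Python sketch. It is an analogue only, not XSLT: it assumes the standard-library xml.etree.ElementTree and xml.sax.saxutils modules and a one-CD copy of the catalog, and it builds the table rows by hand instead of running a style sheet.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# One-CD copy of the catalog from the notes.
catalog = ET.fromstring(
    "<catalog>"
    "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>"
    "</catalog>"
)

rows = []
# Like <xsl:for-each select="catalog/cd">; the parsed root is already
# <catalog>, so the path here is just "cd".
for cd in catalog.findall("cd"):
    title = cd.findtext("title")      # like <xsl:value-of select="title"/>
    artist = cd.findtext("artist")    # like <xsl:value-of select="artist"/>
    rows.append(f"<tr><td>{escape(title)}</td><td>{escape(artist)}</td></tr>")

html = "<table>" + "".join(rows) + "</table>"
print(html)
```

An XSLT processor does the same selection and substitution, but driven by the templates in the style sheet rather than by hand-written code.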
XML Applications
MathML
Mathematical Markup Language (MathML) is an application of XML for describing
mathematical notation and capturing both its structure and content. It aims at integrating
mathematical formulae into World Wide Web documents. It is a recommendation of the W3C
Math Working Group.
XHTML
The Extensible Hypertext Markup Language, or XHTML, is a markup language that has the same
depth of expression as HTML, but also conforms to XML syntax.
CellML
CellML is an XML based markup language for describing mathematical models. Although it
could theoretically describe any mathematical model, it was originally created with the Physiome
Project in mind, and hence used primarily to describe models relevant to the field of biology.
DocBook
DocBook is a semantic markup language for technical documentation. It was originally intended
for writing technical documents related to computer hardware and software but it can be used for
any other sort of documentation.
ebXML
Electronic Business using eXtensible Markup Language, commonly known as e-business XML
or ebXML (pronounced ee-bee-ex-em-el), is a family of XML-based standards sponsored by
OASIS and UN/CEFACT whose mission is to provide an open,
XML-based infrastructure that enables the global use of electronic business information in an
interoperable, secure, and consistent manner by all trading partners.
eLML
The eLesson Markup Language (eLML) is an open source XML framework for creating
electronic lessons.
FicML
FicML (Fiction Markup Language) is an XML format for fictional stories (short stories, novellas,
novels, etc.). Conceived by multiple contributors, it is an ongoing initiative that is still in the
process of forming its first specification.
VoiceXML
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice
dialogues between a human and a computer. It allows voice applications to be developed and
deployed in an analogous way to HTML for visual applications.
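A minimal VoiceXML 2.0 document might look like the sketch below. The prompt text is a placeholder; a voice browser would speak it to the caller.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form is the basic dialogue unit; a block holds executable content -->
  <form>
    <block>
      <prompt>Hello, and welcome.</prompt>
    </block>
  </form>
</vxml>
```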
Wireless Markup Language
Wireless Markup Language (WML), based on XML, is a markup language intended for devices that
implement the Wireless Application Protocol (WAP) specification, such as mobile phones
and PDAs.
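A WML document is organised as a deck of cards, where each card is one screen on the device. The following one-card deck is a sketch using the WML 1.1 document type; the card id, title, and text are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <!-- Each card is a single screen shown to the user -->
  <card id="home" title="Welcome">
    <p>Hello from WML.</p>
  </card>
</wml>
```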
General Applications:
1. XML can separate data from HTML
2. XML is used to exchange data
3. With XML, financial data can be exchanged over the Internet
4. XML can be used to share data
5. XML can be used to create new languages
