
Fundamental XML for Developers

Dr. Timothy M. Chester Texas A&M University

Timothy M. Chester is. . .


Senior IT Manager, Texas A&M University
Application Development, Systems Integration, Developer Tools & Training

Lecturer, Texas A&M College of Business


Courses on Business Programming Fundamentals (VB.NET, C#), XML & Advanced Web Development.

Author
Visual Studio Magazine, Dr. Dobbs Journal, IT Professional

Consultant
President & Principal, eInternet Studios

Contact Information
E-mail: tim-chester@tamu.edu Web: http://tim-chester.tamu.edu


You Are. . .
Software Developers
New to XML, Object Oriented Development
Require basics of XML course

IT Managers
Need familiarity with XML basics and terminology
Interested in how XML can affect both software development and legacy system integration

This session . . .
Assumes you know nothing about XML or XML-based technologies
Provides a basic introduction to XML-based technologies
Demonstrates some of the basics of working with the DOM, XSLT, Schema, WSDL, and SOAP

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Underlying Technologies
XML Is the Glue

Connectivity: Connect the Web
Presentation: Browse the Web
Connecting Applications: Program the Web

Evolution of Web
Generation 1: Static HTML
Generation 2: Web Applications
Generation 3: Web Services
[Figure labels for the technologies involved: HTML, XML]

Web Services Overview


Application Model
[Figure: application model diagram. Labels: End Users; Partner Web Service (two); Other Web Services; Internet + XML; YourCompany.com with Application Business Logic Tier, Data Access and Storage Tier, and Other Applications.]

Introducing XML
XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document. Because it is extensible, XML can be used to create a wide variety of document types.

Introducing XML
XML is a subset of the Standard Generalized Markup Language (SGML), which was introduced in the 1980s. SGML is very complex and can be costly. These reasons led to the creation of Hypertext Markup Language (HTML), a more easily used markup language. XML can be seen as sitting between SGML and HTML: easier to learn than SGML, but more robust than HTML.

The Limits of HTML


HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page. Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document. Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features making HTML more robust, but this has resulted in a confusing mix of different HTML standards.

Introducing XML
HTML cannot be applied consistently. Different browsers require different standards making the final document appear differently on one browser compared with another.

Introduction to XML Markup


XML document (intro.xml)
Marks up message as XML Commonly stored in text files
Extension .xml

1  <?xml version = "1.0"?>
2
3  <!-- Fig. 5.1 : intro.xml -->
4  <!-- Simple introduction to XML markup -->
5
6  <myMessage>
7     <message>Welcome to XML!</message>
8  </myMessage>

Document begins with declaration that specifies XML version 1.0

Element message is child element of root element myMessage

Line numbers are not part of XML document. We include them for clarity.

Introduction to XML Markup (cont.)


XML documents
Must contain exactly one root element
Attempting to create more than one root element is erroneous

Elements must be nested properly


Incorrect: <x><y>hello</x></y>
Correct: <x><y>hello</y></x>

Must be well-formed

XML Parsers
An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax. XML parsers are strict. It is this rigidity built into XML that ensures XML code accepted by the parser will work the same everywhere.
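The strictness described above can be observed with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an illustrative assumption, not something from the slides. It feeds the parser the correctly nested example, the incorrectly nested one, and a document with two root elements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects the whole document on the first violation.
        return False


print(is_well_formed("<x><y>hello</y></x>"))  # properly nested elements
print(is_well_formed("<x><y>hello</x></y>"))  # improperly nested elements
print(is_well_formed("<a/><b/>"))             # more than one root element
```

Only the first call should report a well-formed document; the parser refuses the other two outright rather than guessing at the author's intent.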

XML Parsers
Microsoft's parser is called MSXML and is built directly into IE versions 5.0 and above. Netscape's parser comes from the Mozilla project and is built into Netscape version 6.0 and above.

Parsers and Well-formed XML Documents (cont.)


XML parsers support
Document Object Model (DOM)
Builds tree structure containing document data in memory

Simple API for XML (SAX)


Generates events when tags, comments, etc. are encountered
(Events are notifications to the application)
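To make the contrast between the two parser models concrete, here is a small sketch using Python's standard `xml.dom.minidom` (DOM) and `xml.sax` (SAX) modules on the intro.xml content. The class name `EventLogger` and the variable names are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"

# DOM: the whole document is loaded into an in-memory tree that we navigate.
dom = minidom.parseString(DOC)
message = dom.getElementsByTagName("message")[0]
dom_text = message.firstChild.data


# SAX: the parser notifies our handler as it encounters each piece of markup.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

print(dom_text)
for event in logger.events:
    print(event)
```

The DOM version can revisit any node after parsing, while the SAX version sees each event once and keeps only what the handler chooses to store.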

Parsing an XML Document with MSXML


XML document
Contains data
Does not contain formatting information
Load XML document into Internet Explorer 5.0
Document is parsed by msxml
Places plus (+) or minus (-) signs next to container elements
Plus sign indicates that all child elements are hidden
Clicking plus sign expands container element
Displays children
Minus sign indicates that all child elements are visible
Clicking minus sign collapses container element
Hides children

Error generated, if document is not well formed

XML document shown in IE6.

Character Set
XML documents may contain
Carriage returns Line feeds Unicode characters
Enables computers to process characters for several languages

Characters vs. Markup


XML must differentiate between
Markup text
Enclosed in angle brackets (< and >)
e.g., child elements

Character data
Text between start tag and end tag
e.g., Welcome to XML!

Elements versus Attributes

White Space, Entity References and Built-in Entities


Whitespace characters
Spaces, tabs, line feeds and carriage returns
Significant (preserved by application)
Insignificant (not preserved by application)
Normalization
Whitespace collapsed into single whitespace character
Sometimes whitespace removed entirely
<markup>This   is   character   data</markup>

after normalization, becomes
<markup>This is character data</markup>
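Normalization of this kind is easy to reproduce in application code. The sketch below is one possible implementation in Python, assuming the application wants every run of whitespace collapsed to a single space and leading and trailing whitespace dropped; the function name `normalize_space` is an illustrative choice.

```python
def normalize_space(text: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    # str.split() with no arguments splits on any run of spaces, tabs,
    # line feeds and carriage returns, and discards empty pieces.
    return " ".join(text.split())


raw = "This   is \t character\n   data"
print(normalize_space(raw))
```

Whether whitespace is significant remains the application's decision; a parser hands the character data over unchanged unless told otherwise.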

White Space, Entity References and Built-in Entities (cont.)


XML-reserved characters
Ampersand (&)
Left-angle bracket (<)
Right-angle bracket (>)
Apostrophe (')
Double quote (")

Entity references
Allow XML-reserved characters to be used in character data
Begin with ampersand (&) and end with semicolon (;)

Prevent the parser from misinterpreting character data as markup

White Space, Entity References and Built-in Entities (cont.)


Built-in entities
Ampersand (&amp;)
Left-angle bracket (&lt;)
Right-angle bracket (&gt;)
Apostrophe (&apos;)
Quotation mark (&quot;)
Mark up characters <>& in element message
<message>&lt;&gt;&amp;</message>
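In practice the substitution is rarely done by hand. As a hedged illustration, Python's standard `xml.sax.saxutils` module provides `escape` and `unescape`, which handle `&`, `<` and `>` by default; the two dictionaries and the wrapper names `to_entities` and `from_entities` below are assumptions added here to cover the apostrophe and quotation mark as well.

```python
from xml.sax.saxutils import escape, unescape

# escape() and unescape() handle &, < and > on their own; the extra
# mappings add the two quote characters from the list of built-in entities.
QUOTES = {"'": "&apos;", '"': "&quot;"}
UNQUOTES = {"&apos;": "'", "&quot;": '"'}


def to_entities(text: str) -> str:
    """Replace XML-reserved characters with built-in entity references."""
    return escape(text, QUOTES)


def from_entities(text: str) -> str:
    """Turn built-in entity references back into plain characters."""
    return unescape(text, UNQUOTES)


print("<message>" + to_entities("<>&") + "</message>")
```

For the input `<>&` this reproduces the element content shown on the slide.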

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Introduction
XML Document Object Model (DOM)
Builds tree structure in memory for XML documents
DOM-based parsers parse these structures
Exist in several languages (Java, C, C++, Python, Perl, C#, VB.NET, VB, etc)

Introduction
DOM tree
Each node represents an element, attribute, etc.
<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

Node created for element message
Element message has child node for body element
Element body has child node for text "Hi, Tim!"
Attributes from and to also have nodes in tree
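The node relationships just listed can be inspected directly. This sketch loads the same message document with Python's standard `xml.dom.minidom`; the variable names are illustrative, and other DOM implementations expose the same W3C interfaces under slightly different spellings.

```python
from xml.dom import minidom, Node

DOC = """<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>"""

doc = minidom.parseString(DOC)

message = doc.documentElement                       # node for element message
body = message.getElementsByTagName("body")[0]      # child node for element body
text_node = body.firstChild                         # child node for the text
from_attr = message.getAttributeNode("from")        # attributes are nodes too

print(message.tagName, body.tagName, repr(text_node.data))
print(from_attr.name, "=", from_attr.value)
print(from_attr.nodeType == Node.ATTRIBUTE_NODE)
```

Note that the attribute nodes are reached through the element rather than through its list of children.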

DOM Implementations
DOM-based parsers
Microsoft's msxml
Microsoft .NET System.Xml namespace
Sun Microsystems' JAXP

Creating Nodes
Create XML document at run time
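As one possible illustration of building a document at run time, the sketch below uses Python's standard `xml.dom.minidom` to assemble the intro.xml structure node by node. The element names come from the earlier example; everything else is an assumption made for the sketch.

```python
from xml.dom import minidom

# Start with an empty document and create each node through it.
doc = minidom.Document()

root = doc.createElement("myMessage")
doc.appendChild(root)                      # myMessage becomes the root element

message = doc.createElement("message")
message.appendChild(doc.createTextNode("Welcome to XML!"))
root.appendChild(message)                  # message becomes a child of the root

xml_text = root.toxml()
print(xml_text)
```

Nodes are always created by the owning document and only become part of the tree once they are appended to a parent.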

Traversing the DOM


Use DOM to traverse XML document
Output element nodes
Output attribute nodes
Output text nodes
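A traversal of this kind is usually written as a short recursive function. The following sketch, again using Python's `xml.dom.minidom`, walks the message document and records element, attribute and text nodes; the function name `walk` and the output format are illustrative assumptions.

```python
from xml.dom import minidom, Node

DOC = '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'


def walk(node, depth=0, out=None):
    """Collect one description line per element, attribute and text node."""
    if out is None:
        out = []
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        out.append(f"{indent}element: {node.tagName}")
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append(f"{indent}  attribute: {attr.name} = {attr.value}")
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append(f"{indent}text: {node.data.strip()}")
    # Attributes are not in childNodes, so they are handled separately above.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


lines = walk(minidom.parseString(DOC).documentElement)
print("\n".join(lines))
```

Whitespace-only text nodes are skipped here so that pretty-printed documents produce the same listing as compact ones.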

DOM Components
Manipulate XML document

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Introduction
XML Path Language (XPath)
Syntax for locating information in XML document
e.g., attribute values

String-based language of expressions


Not structural language like XML

Used by other XML technologies


XSLT

Nodes
XML document
Tree structure with nodes
Each node represents part of XML document
Seven types
Root
Element
Attribute
Text
Comment
Processing instruction
Namespace

Attributes and namespaces are not children of their parent node


They describe their parent node

XPath node types


Node type: root
string-value: Determined by concatenating the string-values of all text-node descendants in document order.
expanded-name: None.
Description: Represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processing-instruction children.

Node type: element
string-value: Determined by concatenating the string-values of all text-node descendants in document order.
expanded-name: The element tag, including the namespace prefix (if applicable).
Description: Represents an XML element and may contain element, text, comment or processing-instruction children.

Node type: attribute
string-value: The normalized value of the attribute.
expanded-name: The name of the attribute, including the namespace prefix (if applicable).
Description: Represents an attribute of an element.

XPath node types. (Part 2)

Node type: text
string-value: The character data contained in the text node.
expanded-name: None.
Description: Represents the character data content of an element.

Node type: comment
string-value: The content of the comment (not including <!-- and -->).
expanded-name: None.
Description: Represents an XML comment.

Node type: processing instruction
string-value: The part of the processing instruction that follows the target and any whitespace.
expanded-name: The target of the processing instruction.
Description: Represents an XML processing instruction.

Node type: namespace
string-value: The URI of the namespace.
expanded-name: The namespace prefix.
Description: Represents an XML namespace.

Location Paths
Location path
Expression specifying how to navigate XPath tree
Composed of location steps
Each location step composed of
Axis
Node test
Predicate

Axes
XPath searches are made relative to context node
Axis
Indicates which nodes are included in search
Relative to context node
Dictates node ordering in set
Forward axes select nodes that follow context node
Reverse axes select nodes that precede context node

Node Tests
Node tests
Refine set of nodes selected by axis
Rely upon axis' principal node type
Corresponds to type of node axis can select

Node-set Operators and Functions (cont.)


Location-path expressions
Combine node-set operators and functions
Select all head and body children element nodes
head | body

Select last title element node in head element node
head/title[ last() ]

Select third book element
book[ position() = 3 ]
Or alternatively
book[ 3 ]

Return total number of element-node children of the context node
count( * )

Select all book element nodes in document
//book
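Several of these expressions can be tried out with Python's standard `xml.etree.ElementTree`, with one important caveat: ElementTree implements only a subset of XPath. In the sketch below the positional and descendant expressions are passed to `find`/`findall`, while the union (`head | body`) and `count( * )` are emulated in plain Python because ElementTree does not support them. The sample document and variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The root element <doc> plays the role of the context node.
root = ET.fromstring(
    "<doc>"
    "<head><title>first</title><title>second</title></head>"
    "<body><book>nested</book></body>"
    "<book>one</book><book>two</book><book>three</book>"
    "</doc>"
)

last_title = root.find("head/title[last()]")   # head/title[ last() ]
third_book = root.find("book[3]")              # book[ 3 ]
all_books = root.findall(".//book")            # //book, relative to the root

# Emulated in Python, since ElementTree has no '|' operator or count():
head_or_body = [c for c in root if c.tag in ("head", "body")]   # head | body
child_count = len(list(root))                                   # count( * )

print(last_title.text, third_book.text, child_count)
print([b.text for b in all_books])
```

A full XPath 1.0 engine would accept all five expressions as written on the slide.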

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Introduction
Extensible Stylesheet Language (XSL)
Used to format XML documents
Consists of two parts
XSL Transformation Language (XSLT)
Transform XML document from one form to another
Use XPath to match nodes

XSL formatting objects


Alternative to CSS

Setup
XSLT processor
Microsoft Internet Explorer 6
Java 2 Standard Edition
Microsoft .NET System.Xml namespace

Templates
XSLT document
XML document with root element stylesheet
template element
Matches specific XML document nodes
Uses XPath expression in attribute match

Templates (cont.)
XSLT
Two trees of nodes
Source tree corresponds to original XML document
Result tree contains nodes produced by transformation

Transforms intro.xml into HTML document

Iteration and Sorting


XSLT allows
Iteration through node set
Element for-each

Sorting node set


Element sort
Attribute order = "ascending" (i.e., A-Z)
Attribute order = "descending" (i.e., Z-A)
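Python's standard library does not include an XSLT processor, so the following is not XSLT. It is a plain Python analogue, offered only to show what iterating over a node set and sorting it amounts to: a source tree is read, the selected nodes are sorted, and a separate result tree is built. The sample data, the function name `transform` and the choice of an HTML list as output are all assumptions.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<books>"
    "<book><title>XSLT</title></book>"
    "<book><title>DOM</title></book>"
    "<book><title>SAX</title></book>"
    "</books>"
)


def transform(source_xml: str, descending: bool = False) -> str:
    """Build an HTML list from the book titles, sorted by title."""
    source = ET.fromstring(source_xml)          # source tree
    result = ET.Element("ul")                   # result tree
    books = sorted(
        source.findall("book"),                 # the node set to iterate over
        key=lambda book: book.findtext("title"),
        reverse=descending,                     # descending means Z-A
    )
    for book in books:                          # plays the role of for-each
        ET.SubElement(result, "li").text = book.findtext("title")
    return ET.tostring(result, encoding="unicode")


print(transform(SOURCE))
print(transform(SOURCE, descending=True))
```

In a real stylesheet the same logic is declarative: the for-each element selects the node set and a nested sort element states the ordering.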

Conditional Processing
Perform conditional processing
Such as if statement
Use element choose
Allows alternate conditional statements
Similar to switch statement
Has child elements when and otherwise
when element content used if condition is met
otherwise element content used if no when conditions are met

XSLT and XPath


XPath Expression
locates elements, attributes and text in XML document

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Working with Namespaces


Name collision occurs when elements from two or more documents share the same name. Name collision isn't a problem if you are not concerned with validation. The document content only needs to be well-formed. However, name collision will keep a document from being validated.

Name Collision
This figure shows two documents each with a Name element

Using Namespaces to Avoid Name Collision


This figure shows how to use a namespace to avoid collision

Declaring a Namespace
A namespace is a defined collection of element and attribute names. Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces. Namespaces must be declared before they can be used.

Declaring a Namespace
A namespace can be declared in the prolog or as an element attribute. The syntax to declare a namespace in the prolog is:
<?xml:namespace ns="URI" prefix="prefix"?>
where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace. This prolog form comes from early namespace drafts; the W3C Namespaces in XML recommendation defines only the xmlns attribute form.

Declaring a Namespace
For example,
<?xml:namespace ns="http://uhosp/patients/ns" prefix="pat"?>

declares a namespace with the prefix pat and the URI http://uhosp/patients/ns. The URI does not have to be a working Web address; it only needs to be a unique identifier. A URI identifies a physical or an abstract resource.

<?xml version = "1.0"?>

<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->

<directory xmlns:text = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>
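To see how a namespace-aware parser treats the default-namespace document, here is a short sketch using Python's standard `xml.etree.ElementTree`. ElementTree reports each element name in the form `{URI}localname`. The prefix map `NS` is defined by this sketch, not by the document: prefixes are only local shorthand, so the code is free to call the default namespace `text`.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns = "urn:deitel:textInfo"
           xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>"""

# Prefix-to-URI map used only for lookups in this script.
NS = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

root = ET.fromstring(DOC)
text_file = root.find("text:file", NS)      # the unprefixed <file> element
image_file = root.find("image:file", NS)    # the <image:file> element
size = image_file.find("image:size", NS)

print(root.tag)
print(text_file.tag, text_file.get("filename"))
print(image_file.tag, image_file.get("filename"), size.get("width"))
```

The two file elements share a local name but resolve to different qualified names, which is how the name collision is avoided.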

Schemas
A schema is an XML document that defines the content and structure of one or more XML documents. To avoid confusion, the XML document containing the content is called the instance document. It represents a specific instance of the structure defined in the schema.

Comparing Schemas and DTDs


This figure compares schemas and DTDs

Schema Dialects
There is no single schema form. Several schema dialects have been developed in the XML language. Support for a particular schema depends on the XML parser being used for validation.

Starting a Schema File


A schema is always placed in a separate XML document that is referenced by the instance document.

Schema Types
XML Schema recognizes two categories of element types: complex and simple. A complex type element has one or more attributes, or is the parent to one or more child elements.

A simple type element contains only character data and has no attributes.

Schema Types
This figure shows types of elements

Understanding Data Types


XML Schema supports two data types: built-in and user-derived. A built-in data type is part of the XML Schema specifications and is available to all XML Schema authors.

A user-derived data type is created by the XML Schema author for specific data values in the instance document.

Understanding Data Types


A primitive data type, also called a base type, is one of 19 fundamental data types not defined in terms of other types. A derived data type is one of 25 data types that the XML Schema developers created based on the 19 primitive types.

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

WSDL
Think "TypeLib for SOAP"
WSDL = Web Services Description Language
Uniform representation for services
Transport protocol neutral
Access protocol neutral (not only SOAP)

Describes:
Schema for data types
Call signatures (Messages)
Interfaces (Port Types)
Endpoint mappings (Bindings)
Endpoints (Services)

UDDI
Think "Yahoo!" for Web Services
Universal Description, Discovery and Integration
Web Service-programmable "Yellow Pages"
Advertise sites and services
May point to DISCO resources
Initiative driven by Microsoft, IBM, Ariba

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

SOAP
Overview
A lightweight protocol for exchanging information in a distributed, heterogeneous environment
It enables cross-platform interoperability

Interoperable
OS, object model, programming language neutral Hardware independent Protocol independent

Works over existing Internet infrastructure

SOAP
Overview
Guiding principle: Invent no new technology
Builds on key Internet standards
SOAP ≈ HTTP + XML
Submitted to W3C

The SOAP specification defines:
The SOAP message format
How to send messages
How to receive responses
Data encoding

SOAP
SOAP Is Not
Objects-by-reference
Distributed garbage collection
Bi-directional HTTP
Activation
Complicated: doesn't try to solve every problem in distributed computing, and can be easily implemented

SOAP
The HTTP Aspect
SOAP requests are HTTP POST requests

POST /WebCalculator/Calculator.asmx HTTP/1.1
Content-Type: text/xml
SOAPAction: "http://tempuri.org/Add"
Content-Length: 386

<?xml version="1.0"?>
<soap:Envelope ...>
   ...
</soap:Envelope>

SOAP
Message Structure
SOAP Message: the complete SOAP message
Headers: protocol binding headers
SOAP Envelope: <Envelope> encloses payload
SOAP Header: <Header> encloses headers
Headers: individual headers
SOAP Body: <Body> contains SOAP message name
Message Name & Data: XML-encoded SOAP message name & data

SOAP
SOAP Message Format
An XML document using the SOAP schema:
<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Header ...>
      ...
   </soap:Header>
   <soap:Body>
      <Add xmlns="http://tempuri.org/">
         <n1>12</n1>
         <n2>10</n2>
      </Add>
   </soap:Body>
</soap:Envelope>

SOAP
Server Responses
Server replies with a result message:

HTTP/1.1 200 OK
...
Content-Type: text/xml
Content-Length: 391

<?xml version="1.0"?>
<soap:Envelope ...>
   <soap:Body>
      <AddResult xmlns="http://tempuri.org/">
         <result>28.6</result>
      </AddResult>
   </soap:Body>
</soap:Envelope>
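Because a SOAP exchange is just XML carried over HTTP, both sides can be sketched with an ordinary XML library. The Python example below builds an Add request envelope, prepares the HTTP headers, and extracts the value from a response, without sending anything over the network. Several things here are assumptions: the slides elide the envelope attributes, so the SOAP 1.1 envelope namespace is used; the function names are hypothetical; and the sample response is made up for the sketch rather than copied from the slide.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace (the slides elide it with "...").
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://tempuri.org/"

ET.register_namespace("soap", SOAP_NS)   # serialize with the soap: prefix


def build_add_request(n1, n2) -> str:
    """Return a SOAP envelope whose body carries an Add message."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{APP_NS}}}Add")
    ET.SubElement(add, f"{{{APP_NS}}}n1").text = str(n1)
    ET.SubElement(add, f"{{{APP_NS}}}n2").text = str(n2)
    return ET.tostring(envelope, encoding="unicode")


def http_headers(payload: str) -> dict:
    """Headers for the HTTP POST that would carry the envelope."""
    return {
        "Content-Type": "text/xml",
        "SOAPAction": '"http://tempuri.org/Add"',
        "Content-Length": str(len(payload.encode("utf-8"))),
    }


def parse_add_response(xml_text: str) -> float:
    """Pull the numeric result out of an AddResult response envelope."""
    envelope = ET.fromstring(xml_text)
    path = f"{{{SOAP_NS}}}Body/{{{APP_NS}}}AddResult/{{{APP_NS}}}result"
    return float(envelope.find(path).text)


# Hypothetical response used only to exercise the parser.
SAMPLE_RESPONSE = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>'
    f'<AddResult xmlns="{APP_NS}"><result>22</result></AddResult>'
    "</soap:Body></soap:Envelope>"
)

request = build_add_request(12, 10)
print(request)
print(http_headers(request))
print(parse_add_response(SAMPLE_RESPONSE))
```

A real client would POST the request to the service endpoint and would also need to handle SOAP faults, which this sketch leaves out.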

SOAP
Industry Support
DevelopMentor Inc. Digital Creations IONA Technologies PLC Jetform ObjectSpace Inc. Rockwell Software Inc. SAP Compaq Microsoft Rogue Wave Software Inc. Scriptics Corp. Secret Labs AB UserLand Software Inc. Zveno Pty. Ltd. IBM Hewlett Packard Intel

Agenda
XML Document Object Model (DOM) XPATH XSLT Schema WSDL SOAP Questions

Questions

Bibliography
Harvey Deitel's XML: How to Program, Prentice Hall
XML Reference, Microsoft Academic Resource Kit
