A textbook publishing professional takes basic text from authors on paper and writes markup instructions to tell the typesetter how the document should look on the printed page. eg: insert a paragraph break here, make this word bold, double-space this text, etc.
XML is a proper subset of SGML, intended to make SGML light enough for use on the web.
SGML
XML
Text Encoding Initiative (TEI): encoding of literature
J2008: automotive maintenance
Edgar: financial reports of public companies
HTML: Hyper Text Markup Language
Pinnacles (PCIS): semiconductor data
What is XML?
XML is a framework used to create markup languages that describe data in a structured manner, and an open technology for electronic data exchange and storage.
It adapts to the user's needs, as the user can structure the data and add their own tags.
XML is actually a framework used to create other markup languages.
[Figure: frameworks and application languages. Labels: Frameworks; Application languages; DocBook; Edgar; HTML]
A Simple HTML
<h1>Personal Computers for sale</h1>
<h2>Maker: IBM PC</h2>
<h3>Model: Pentium IV</h3>
<table border="1">
<tr> <td>Storage</td> </tr>
<tr> <td>RAM</td> <td>512 MB</td> </tr>
</table>
Drawback of HTML
A lot of useful information is lost when data is represented using HTML.
HTML has a fixed tag set, like <h1>, <table>, <tr> etc.
HTML is an example of a markup language.
Advantages of XML
XML preserves useful information.
XML liberates information from the shackles of a fixed tag set.
XML provides a standardized framework with which you can make your own tags, or use those defined by others that best fit your needs.
XML is a flexible framework in which you can create your own customized markup language.
XML is a framework for making markup languages.
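As a minimal sketch of what "making your own tags" can look like, the Python snippet below rebuilds the personal computer listing with descriptive element names instead of <h1> or <td>. The tag names (ComputersForSale, PersonalComputer, Maker, Model, RAM) and the unit attribute are hypothetical choices for illustration, not part of any standard.

```python
import xml.etree.ElementTree as ET


def build_pc_listing() -> ET.Element:
    """Build the PC listing with made-up, self-describing tag names."""
    listing = ET.Element("ComputersForSale")          # hypothetical root element
    pc = ET.SubElement(listing, "PersonalComputer")   # hypothetical record element
    ET.SubElement(pc, "Maker").text = "IBM PC"
    ET.SubElement(pc, "Model").text = "Pentium IV"
    # The unit is kept apart from the number so a program can use both.
    ET.SubElement(pc, "RAM", unit="MB").text = "512"
    return listing


listing = build_pc_listing()
xml_text = ET.tostring(listing, encoding="unicode")
print(xml_text)

# Because the tags say what the data is, a program can ask for it by name.
ram = listing.find("PersonalComputer/RAM")
print(ram.text, ram.get("unit"))
```

With the HTML version, a program would have to guess that the second cell of some table row holds the RAM size; here it can look the value up directly.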
Electronic Documents
Electronic documents consist of three distinct components:
Data content: the words themselves
Structure: the document type and the organization of its elements
Presentation: the way the information is presented to the reader
[Figure: an XML document and its three components. Labels: XML Document; Content; Structure; Presentation]
Valid XML
Invalid XML
DSSSL
ISO/IEC 10179 + CSS
In HTML, the <A> element is used for hypertext linking. The element is built directly into the language.
With XML, the XLL standard is used for hypertext linking.
Applications of XML
The World Wide Web and e-commerce cannot survive without XML.
There are many examples where XML is used:
1. Online banking: the Open Financial Exchange initiative.
2.
Online Banking
With the advent of the web, an interchange representation for financial data became necessary.
Users want to be able to purchase goods from anywhere on the web with funds drawn from institutions anywhere in the world.
Users are not concerned with how the financial transactions are represented.
The solution is a web-friendly, internationally agreed standard notation for expressing financial transactions.
<StatementRequest>
<BankAccount>
<BankID>123456</BankID>
<AccountID>9999</AccountID>
<AccountType>CHECKING</AccountType>
<PersonName>Tom Hanks</PersonName>
</BankAccount>
</StatementRequest>
This example is modelled on the Open Financial Exchange (OFX) specification; the element names are simplified for readability.
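To illustrate how receiving software might read such a request, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the simplified element names from the statement request example (written without spaces, since XML names cannot contain them); real OFX messages use different names, and the read_account helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified statement request; element names are illustrative, not real OFX.
STATEMENT_REQUEST = """
<StatementRequest>
  <BankAccount>
    <BankID>123456</BankID>
    <AccountID>9999</AccountID>
    <AccountType>CHECKING</AccountType>
    <PersonName>Tom Hanks</PersonName>
  </BankAccount>
</StatementRequest>
"""


def read_account(xml_text: str) -> dict:
    """Return the fields of the BankAccount element as a dictionary."""
    root = ET.fromstring(xml_text)
    account = root.find("BankAccount")
    return {child.tag: child.text.strip() for child in account}


account = read_account(STATEMENT_REQUEST)
print(account["BankID"], account["AccountType"])
```

Neither side needs to know how the other stores its data internally; both only need to agree on the notation.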
[Figure: OFX data flow. Labels: Client Software (eg. Microsoft Money); OFX Conversion; OFX Documents; Web Server; The Web]
Database Integration
A lot of the world's information is stored in databases: personnel files, health records, football results, stock prices etc.
Databases need to be updated constantly, and their contents often change rapidly.
A mechanism is needed to access and modify the database through a web interface.
The solution is to capture the database-related information (tables, fields, values etc.) from the form, translate the data into XML format and transfer it to the database.
[Figure: a web browser form with Name, Phone Number and Address fields and a Submit button, connected to a relational database]
<PhoneBook>
<Name> James Thomas </Name>
<PhoneNumber> 27598191 </PhoneNumber>
<Address> 20 MG Road Bangalore 560001 </Address>
</PhoneBook>
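The form-to-XML-to-database path can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the form is represented by a plain dictionary, the element names are written without spaces, an in-memory SQLite database stands in for the relational database, and the phone_book table and both helper functions are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def form_to_xml(fields: dict) -> str:
    """Translate submitted form fields into a PhoneBook XML document."""
    entry = ET.Element("PhoneBook")
    for name, value in fields.items():
        ET.SubElement(entry, name).text = value
    return ET.tostring(entry, encoding="unicode")


def store_entry(conn: sqlite3.Connection, xml_text: str) -> None:
    """Read the XML document and insert its values into the database."""
    entry = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO phone_book (name, phone_number, address) VALUES (?, ?, ?)",
        (
            entry.findtext("Name"),
            entry.findtext("PhoneNumber"),
            entry.findtext("Address"),
        ),
    )


# An in-memory database stands in for the real relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (name TEXT, phone_number TEXT, address TEXT)")

form = {
    "Name": "James Thomas",
    "PhoneNumber": "27598191",
    "Address": "20 MG Road Bangalore 560001",
}
xml_text = form_to_xml(form)
store_entry(conn, xml_text)
rows = conn.execute("SELECT name, phone_number, address FROM phone_book").fetchall()
print(rows)
```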
[Figure: an XML document passed through separate formatters to produce HTML, Online Help and Braille output]
Document Creation
When a document is created, it should be specified what the document is, what components it contains and how the components are structured.
When the computer is told what is in a document, it can differentiate between letters, invoices, copyright notices etc.
By using XML, the computer can be told what is in a document.
Name the component parts based on what they are:
<Copyright> <Para>This document is copyright.</Para> </Copyright>
Benefits of XML
XML documents are self-describing.
When documents contain rich structural information, complex queries can be answered precisely and searches give good results.
In industry, information interchange is essential.
Information interchange is very difficult without a commonly accepted standard.
By using an industry-standard interchange notation, the number of converters can be minimized.
[Figure: converters between Format A, Format B, Format C and Format D]
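The saving in converters can be made concrete with a little arithmetic. The sketch below assumes that every converter works in one direction only, so converting directly between every pair of n formats needs n × (n − 1) converters, while going through one shared interchange notation needs 2 × n (one into and one out of the shared notation per format). The function names are hypothetical.

```python
def pairwise_converters(n_formats: int) -> int:
    """One-directional converters needed to link every format to every other."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """Converters needed when every format converts to and from one shared notation."""
    return 2 * n_formats


for n in (4, 10):
    print(n, pairwise_converters(n), hub_converters(n))
```

For the four formats A to D this gives 12 direct converters against 8 through a shared notation, and the gap widens quickly as more formats are added.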
Logical structure
Physical structure
In the logical structure, an XML document is a hierarchy of information.
The character data of the document hangs in individual chunks off a tree.
[Figure: Logical View. Tree labels: IBMPC; brand; supplier id = Compusa; IBM inspire]
[Figure: Physical View. Labels: IBM PC; Entity A (part1.xml); Entity B (part2.xml); Entity A1 (part11.xml); Entity A2 (part12.xml)]
An XML processor that ignores any validity constraints spelled out in the DTD is known as a nonvalidating XML processor.
Ælfred is an example of a nonvalidating XML processor.
Any processor capable of checking for validity is also capable of checking for well-formedness.
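A nonvalidating processor can be tried out with Python's standard library, whose xml.etree.ElementTree module is built on the nonvalidating expat parser. The sketch below only checks well-formedness; it does not check a document against a DTD. The is_well_formed helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the nonvalidating parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<product>IBM PC</product>",    # matching start-tag and end-tag
    "<product>IBM PC</ product>",   # space after the slash in the end-tag
    "<product>IBM PC",              # missing end-tag
    "<product color=red/>",         # attribute value not quoted
]
for sample in samples:
    print(sample, is_well_formed(sample))
```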
</introduction>
<John Mathew> <42>
</ product>
Bad end tag: no space is allowed between the slash and the element type name.
eg: <introduction/>
Attribute Assignments
Attributes are pieces of information associated with XML elements.
In HTML we have the align attribute of the p element and the border attribute of the table element.
Attributes come in a variety of shapes and sizes that are controlled by the DTD.
Attribute assignments always appear within the start-tag of an element.
The normal syntax of an attribute is name="value".
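As a small illustration of attribute assignments, the sketch below parses a start-tag carrying two attributes and reads them back with Python's standard library. The product element and its name and color attributes are illustrative; the values are quoted, as XML requires.

```python
import xml.etree.ElementTree as ET

# Attribute assignments sit inside the start-tag as name="value" pairs.
element = ET.fromstring('<product name="IBM PC" color="red">In stock</product>')

print(element.attrib)        # all attributes of the element as a dictionary
print(element.get("color"))  # one attribute looked up by name
# get() returns None (or a fallback) when the attribute was not assigned.
print(element.get("code", "no code attribute"))
```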
Entity references
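Entities are the physical building blocks of XML documents.
An entity is a unit of text, as simple as a single character or as complex as an entire document.
Consider the following piece of an XML document:
<Document> If a<b and b<c then a<c </Document>
This is not well-formed, because the parser treats each < as the start of markup. Writing the entity reference &lt; in place of < solves the problem:
<Document>
If a &lt; b and b &lt; c then a &lt; c </Document>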
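Predefined entity references:
&lt; stands for <
&gt; stands for >
&amp; stands for &
&apos; stands for '
&quot; stands for "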
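The round trip through the predefined entity references can be sketched with Python's standard library: xml.sax.saxutils.escape replaces the special characters &, < and > with entity references when a document is written, and the parser turns them back into plain characters when it is read. The Document element follows the example in this section.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "If a<b and b<c then a<c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
document = "<Document>" + escape(raw_text) + "</Document>"
print(document)

# The parser expands the entity references back into plain characters.
parsed = ET.fromstring(document)
print(parsed.text)
```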
<Book>
&Chapter1;
&Chapter2;
&Chapter3;
</Book>
The three chapters of a book are stored in separate entities, to be gathered into a single Book element.
Comments
XML comments take exactly the same form as HTML comments.
<!--This is a comment -->
The string -- cannot be used within a comment.
<!--This is -- not a comment -->
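Both points about comments can be checked with Python's standard parser, as a rough sketch: a proper comment is skipped and does not become part of the element's text, while a comment containing -- is rejected as not well-formed. The doc element and the is_well_formed helper are hypothetical names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# By default the parser drops comments; only the character data remains.
doc = ET.fromstring("<doc><!--This is a comment -->Visible text</doc>")
print(doc.text)

good = "<doc><!--This is a comment --></doc>"
bad = "<doc><!--This is -- not a comment --></doc>"
print(is_well_formed(good), is_well_formed(bad))
```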
Processing Instructions
A processing instruction is used to store application-specific information in an XML document.
Processing instructions pass straight through the parser because they are intended for consumption by an application.
eg1. <?rtf \page?> is a processing instruction that forces a particular typesetting device to output a page break at a particular place.
eg2. <?xml version="1.0"?> states the version of XML and conventionally opens every XML document. Strictly speaking it is the XML declaration, which shares the processing-instruction syntax.
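How a processing instruction reaches the application can be sketched with Python's xml.dom.minidom, which keeps processing instructions as nodes in the document tree. The parser hands over the target (rtf) and the data (\page) without interpreting them. The surrounding Document element and its text are made up for the illustration.

```python
from xml.dom import Node, minidom

# A raw string keeps the backslash in \page exactly as written.
source = r"<Document>First page<?rtf \page?>Second page</Document>"
doc = minidom.parseString(source)

# Collect (target, data) for every processing instruction under the root.
instructions = [
    (node.target, node.data)
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

It is then up to the application, for example a typesetting driver, to decide what to do with an instruction addressed to rtf.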
CDATA sections
In some cases a document may contain large numbers of characters that are treated specially by an XML parser, like < and >.
XML allows a block of text to be insulated from the attention of the parser using a CDATA section.
<Document>
<![CDATA[ If a<b and b<c then a<c ]]> </Document>
By prefixing the string <![CDATA[ and appending the string ]]>, the entire section is insulated and passes through the parser without any problem.
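A short sketch with Python's standard parser shows the effect: inside a CDATA section the < characters arrive as ordinary text, and the delimiters must be written exactly, with no spaces inside <![CDATA[ or ]]>.

```python
import xml.etree.ElementTree as ET

document = "<Document><![CDATA[If a<b and b<c then a<c]]></Document>"
text = ET.fromstring(document).text
print(text)

# A space inside the closing delimiter means the section is never closed.
broken = "<Document><![CDATA[If a<b and b<c then a<c]] ></Document>"
try:
    ET.fromstring(broken)
    broken_is_accepted = True
except ET.ParseError:
    broken_is_accepted = False
print(broken_is_accepted)
```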
orange element
4. <!ELEMENT fruit (apple|orange)+>
An element of type fruit contains one or more sub-elements, each of which is either an apple element or an orange element.
eg:
<fruit>
<apple>----------</apple>
<apple>-----------</apple>
<orange>---------</orange>
</fruit>
Or
<fruit>
<orange>--------</orange>
</fruit>
any order.
eg:
<para> Here is my list </para>
Or
<para> XML is a framework </para>
Or
<para> <list>------</list> </para>
8. <!element InStock EMPTY>
This is an error because the keyword ELEMENT must always be in uppercase. The same applies to the other declaration keywords, such as ATTLIST and ENTITY.
<!ATTLIST product name color >
An element of type product has two attributes, known as name and color.
Attribute Types
3. <!ATTLIST product code ID ---->
An element of type product has an attribute known as code. The value of the code attribute must be unique among all attributes of the ID type across the entire XML document.
Eg: <product code="B42">
Attribute Defaults
value can be any string of characters. A value for this attribute must be supplied when it is used in an XML document.
2. <!ATTLIST product name CDATA "IBMPC">
attribute must be either the string red or the string green. In the absence of a value for the attribute in the document, use the default value red.
E.g.: <product color="red">
4. <!ATTLIST product color (red|green) #REQUIRED>
An element of type product has an attribute named color. The color attribute must be either the string red or the string green. A value must be supplied when the element is used in a document.
Eg: <product color="red">
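The effect of an attribute default can be observed with Python's standard parser, as a hedged sketch. The parser is nonvalidating, but it still reads declarations placed in the document's internal DTD subset and fills in a default value when the attribute is absent. It does not enforce the (red|green) restriction or #REQUIRED; that is the job of a validating processor. The document below is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The DTD is embedded in the document itself (internal subset).
DOCUMENT = """<!DOCTYPE product [
  <!ELEMENT product EMPTY>
  <!ATTLIST product color (red|green) "red">
]>
<product/>"""

product = ET.fromstring(DOCUMENT)
# No color was written in the document, so the declared default is supplied.
print(product.get("color"))
```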
Entity declarations
2. <!ENTITY chapter1 SYSTEM "http://www.digitome.com/chap1.xml">
There is an entity known as chapter1. When it is referenced in an XML document, the parser will insert the contents of the file http://www.digitome.com/chap1.xml.
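Entity replacement can be sketched with an internal entity, whose replacement text is given directly in the declaration rather than fetched from a file or URL. The entity name publisher and its text are hypothetical. External entities such as chapter1 are deliberately left out of this sketch, because loading them depends on how the parser is configured.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE Book [
  <!ENTITY publisher "Example Press">
]>
<Book>Published by &publisher;</Book>"""

book = ET.fromstring(DOCUMENT)
# The reference &publisher; has been replaced by the declared text.
print(book.text)
```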
database data and the Web.
Internet Explorer uses an XMLDSO applet to map XML to HTML.
The following is an XML example to be shown in a browser:
<P>
XML catalog displayed in an HTML table using Data Binding
<applet code="com.xml.dso.XMLDSO.class" width="100%" height="25"
id="xmldso">
<PARAM NAME="url" VALUE="cat.xml">
</applet>
[Figure: the XML DSO Applet maps the XML tree (PCS, PC, name, capacity, price) to the HTML table (table, row, cell)]
To allow the applet to access the HTML page, set the MAYSCRIPT attribute to TRUE.
</PC>
<PC>
<NAME> Gonzo PC </NAME> <CAPACITY> 300 </CAPACITY> <PRICE> 5000 </PRICE>
</PC>
</PCS>
</applet>
<!-- table declaration as in Example 1 -->
<table id="table" border="2" width="100%" datasrc="#xmldso">
<thead> <th> Name <th> Capacity <th> Price </thead>
</tr>
</table> </BODY> </html>
An XSL style sheet consists of a set of rules that tell an XSL processor how to
[Figure: a table row with name, capacity and price cells]