
UNIT 1 Introduction to XML

XML document structure – Well formed and valid documents – Namespaces – DTD – XML
Schema – X-Files.

------------------------------------------------------------------------------------------------------

What is meant by Markup?

• Extra information added to a document's text to describe its structure or appearance.
• E.g.: bold, font, underline

What is Markup Language?

• A set of symbols (tags), enclosed in angle brackets (< >), that is added to a document to describe its content is called a markup language.
• Ex: GML, SGML, HTML, XML.

What are the markup languages available nowadays?

1. GML
2. SGML
3. HTML
4. XML

SGML:

• Standard Generalized Markup Language


• The international standard for defining descriptions of structure and content in text
documents
• File extension: .sgml
• Developed by : ISO
• Extended from : GML (from IBM)
• Extended to: HTML, XML
• Standard: ISO 8879:1986 (October)
• In 1996, SGML's extended naming rules allowed markup in arbitrary languages and scripts.
• In 1998, SGML was revised to support the WWW; this revision is called WebSGML.
• Interchangeable: device-independent, system-independent
• Tags are not predefined.
• Uses a DTD to validate the structure of the document.
• Large, powerful and very complex.
• Heavily used in industry and commerce for over a decade.
Drawbacks:
• SGML was too complex: it defines everything you could ever want to know about markup, and more.
• Creating and parsing SGML documents was difficult.

HTML:

• Hyper Text Markup Language.


• Developed by: W3C & WHATWG (World Wide Web Consortium & Web Hypertext Application Technology Working Group)
Prepare by: Dr. A. GNANASEKAR ASP/CSE R.M.D. Engineering College
• File extension: .html or .htm
• Initial release : 1993
• Latest release: HTML 5 / 5.1 (working draft)
• Extended from: SGML
• Extended to: XHTML
• HTML is the standard markup language used to create web pages, and it is also used to create user interfaces for mobile and web applications.
• Device independent, system independent
• Tags are predefined
• Version:
o 1995 - HTML 2.0
o 1997 Jan – HTML 3.2 W3C recommended
o 1997 Dec- HTML 4.0
o 1999 – HTML 4.01
o 2014 – HTML 5 (W3C and WHATWG)
XML

• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• XML is designed to transport and store data, not to display data
• XML tags are not predefined. You must define your own tags
• XML is a W3C Recommendation

EDI:

• Electronic Data Interchange.


• It is the transfer of data from one computer to another using standardized message formats, without human intervention.
• In 1996, the National Institute of Standards and Technology (NIST) defined electronic data interchange as the computer-to-computer interchange of strictly formatted messages.
• In the e-commerce and e-business communities, EDI was used more than SGML.

HTML Vs XML

S.no | HTML | XML
1 | HTML stands for Hyper Text Markup Language. | XML stands for eXtensible Markup Language.
2 | HTML is designed to display data, with the focus on the look and feel of the data. | XML is used to transport and store data, with the focus on what the data is.
3 | HTML is case-insensitive. | XML is case-sensitive.
4 | As HTML is for displaying data, it is static. | As XML is for carrying data, it is dynamic.
5 | HTML has predefined tags. | XML has user-defined tags.
6 | For example: see the HTML example below. | For example: see the XML example below.

HTML example:
<html>
<head>
<title></title>
</head>
<body>
<div> Hai </div>
</body>
</html>

XML example:
<?xml version="1.0"?>
<student>
<roll_No> 1115134001</roll_No>
<Personal_Info>
<Name> Sankar</Name>
<Address>Chennai</Address>
</Personal_Info>
<Dept> CSE</Dept>
<Year> Final </Year>
</student>
Advantage of XML over SGML:

➢ XML allows well-formed documents to be parsed without a DTD, whereas SGML requires a DTD for processing.
➢ XML is simpler and has a more regular syntax than SGML.
➢ XML can be used directly on the Internet.
➢ XML specifications are flexible and easy to implement.

Advantage of XML over HTML:

➢ HTML cannot represent metadata, which is what XML was designed for.


➢ XML is meant for both machine and human consumption.

Uses of XML:

➢ XML is used to describe meta-content, i.e. XML describes the content of documents.
➢ XML is used in exchanging data between the applications.
➢ The data can be extracted from database and can be used in more than one
application.
➢ Different applications can perform different tasks on this data.

Benefit of XML:

➢ XML documents are human-readable, and any XML document can be edited in a simple text editor.
➢ XML documents are language-neutral.
➢ XML files are independent of the operating system.
➢ Every XML document has a tree structure.
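To make the tree structure concrete, here is a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module (neither is part of these notes). It parses the student document from the HTML vs XML comparison and lists every element indented by its depth in the tree; the helper name tree_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# The student document from the HTML vs XML comparison (XML declaration omitted).
DOC = """<student>
<roll_No>1115134001</roll_No>
<Personal_Info>
<Name>Sankar</Name>
<Address>Chennai</Address>
</Personal_Info>
<Dept>CSE</Dept>
<Year>Final</Year>
</student>"""


def tree_lines(element, depth=0):
    """Hypothetical helper: list each element depth-first, indented by its depth."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # children are visited in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(DOC)  # every XML document has exactly one root element
lines = tree_lines(root)
print("\n".join(lines))
```

The printed outline (student at the top, Personal_Info with Name and Address one level deeper) is the same parent/child tree that XPath later navigates.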

Drawback of XML:

➢ XML needs a lot of space to represent data: XML documents can be 3 to 20 times larger than an equivalent binary or plain-text representation.
➢ No intrinsic data type support: XML provides no specific notion of “integer”, “string”,
“Boolean”, “date”, and so on.
➢ XML namespaces are problematic to use.

Application of XML:

➢ E-business and E-commerce


➢ Content Management
➢ Web services and Distributed Computing
➢ Peer-to-peer networking and Instant Messaging.

1. XML Document Structure:
• XML declaration
• The Document Type Declaration
• Element data
• Attribute data
• Entity data
• Character data or XML content
1. XML Declaration:

<?xml
version="version_number"
encoding="encoding_declaration"
standalone="standalone_status"
?>

Where,

<?xml - Marks the start of the XML declaration (it is written like a processing instruction)

version="1.0" - XML standard version declaration
encoding="UTF-8 or UTF-16" - Defines the character encoding used in the XML document. Default: UTF-8.
standalone="yes or no" - If yes, the document does not depend on any external markup declarations (such as an external DTD).
If no, it may depend on them. Default: no.
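A non-validating parser reports these three values as it reads the declaration. The sketch below assumes Python 3's standard pyexpat binding (xml.parsers.expat); the function name read_declaration is hypothetical. Note that expat reports the standalone value as 1, 0 or -1 rather than as the strings yes/no.

```python
import xml.parsers.expat


def read_declaration(data: bytes) -> dict:
    """Hypothetical helper: return the version, encoding and standalone values."""
    result = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (attribute omitted)
        result.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)
    return result


full = read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>Hi</note>')
minimal = read_declaration(b'<?xml version="1.0"?><note>Hi</note>')
print(full)
print(minimal)
```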

2. Document Type Declaration ( DOCTYPE):

➢ Defines the constraints on the structure of an XML document.


➢ Types:
o 1. Internal DTD
<!DOCTYPE root_element [element_declarations]>

o 2. External DTD
1. Private:
<!DOCTYPE root_element SYSTEM "file_name">

2. Public:
<!DOCTYPE root_element PUBLIC "DTD_name" "DTD_location">

DTD_location: relative or absolute URL

3. XML Elements:
1. Matched pair of tags (start tag and end tag): <name> XYZ </name>

2. Single empty-element tag: <name/>

Rules:
1. Names may contain letters, digits and other characters (such as hyphens, underscores and periods), but cannot start with a digit or a special character (an underscore is allowed).
2. Names are case-sensitive.
3. Names cannot contain whitespace.
4. Names cannot contain the < or > symbols.
5. Names cannot start with the letters xml (in any mix of upper and lower case); such names are reserved.
6. Element names have no length limit.

4. XML Attributes:
➢ It provides additional information about the elements.

Rules:

1. The same attribute name must not appear more than once in the same start tag.
2. Attribute values must be quoted, using either single or double quotes.
Syntax:

<element_name attribute_name="attribute_value"></element_name>

Example:

<employee empid="125">

5. Entity References:
➢ An entity reference stands for text defined elsewhere. Entity references are also needed because some characters have a special meaning in XML.

Types:

1. Internal Entity:
• It is defined locally within a DTD.
• Syntax: <!ENTITY entity_name "replacement_text">
• Example: <!ENTITY compname "w3resource.com">

2. External Entity:
• Its replacement text is stored in a separate file, which is referenced from the DTD.
• Syntax: <!ENTITY entity_name SYSTEM "URL/URI">
• Example: <!ENTITY xyz SYSTEM "file:///etc/password">

3. Parameter entity:
• It is defined and used within the DTD itself; its name is preceded by %.
• Syntax: <!ENTITY % entity_name "entity_content">
• Example: <!ENTITY % invoice "(name, street, city, state, zipcode)">

4. Character entity
1. Predefined
• Entities start with & symbol and end with ; symbol.
• Example:
Correct: <message>if salary &lt; 1000 then</message>
Wrong: <message>if salary < 1000 then</message>

2. Numbered
• Decimal entity
• Hexadecimal entity

Character | Predefined entity | Decimal reference | Hexadecimal reference
& | &amp; | &#38; | &#x26;
' | &apos; | &#39; | &#x27;
< | &lt; | &#60; | &#x3c;
> | &gt; | &#62; | &#x3e;
" | &quot; | &#34; | &#x22;
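In practice these references are usually produced and consumed by library code rather than typed by hand. A minimal sketch, assuming Python 3's standard xml.sax.saxutils and xml.etree.ElementTree modules: escape() writes the predefined entities for &, < and >, quoteattr() also deals with quote characters, and the parser turns predefined, decimal and hexadecimal references back into the same characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = "if salary < 1000 & bonus > 0 then"
escaped = escape(raw)  # & -> &amp;, < -> &lt;, > -> &gt;
print(escaped)

# quoteattr() also handles quote characters and adds the surrounding quotes.
attr = quoteattr('say "hi"')
print(attr)

# On parsing, &lt;, &#60; and &#x3c; all become the same "<" character.
message = ET.fromstring("<message>&lt; &#60; &#x3c; &amp; &quot; &apos;</message>")
print(message.text)
```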
XML COMMENTS:

• Comments can be placed anywhere in an XML document (but not inside a tag), except before the XML declaration.
• Syntax: <!-- content -->
• The string -- must not appear inside a comment.

Processing Instructions (PIs):

➢ Like comments, PIs are not part of the textual content of an XML document, but they "provide information to applications regarding how to process the content".
➢ Syntax: <?instruction_name options?>

Where

➢ The instruction name, called the "PI target", is a special identifier that the processing application has to understand.
➢ PI target names should not start with "xml", which is reserved for standardized PIs such as xml-stylesheet.
➢ The options are character data that describe the information for the application to process.
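As an illustration, here is a sketch assuming Python 3's standard xml.dom.minidom module, which keeps processing instructions as nodes. The PI target myapp and its options are hypothetical; the parser only separates the target from the data, and it is up to an application that understands myapp to act on it.

```python
from xml.dom import minidom

# "myapp" is a hypothetical PI target placed before the root element.
DOC = '<?myapp render="fast" cache="off"?><report>Sales</report>'

document = minidom.parseString(DOC)
pis = [node for node in document.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    print("target:", pi.target)  # the PI target
    print("data:", pi.data)      # the options, left for the application to interpret

# The PI is not part of the element content.
print("text:", document.documentElement.firstChild.data)
```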

6. Character Data:

➢ PCDATA:
• PCDATA means parsed character data.
• PCDATA is text that will be parsed by a parser. The text will be examined
by the parser for entities and markup.
• Tags inside the text will be treated as markup and entities will be
expanded.
➢ CDATA:
• CDATA means character data.
• CDATA is text that will not be parsed by the parser. In element content it is written as a CDATA section: <![CDATA[ ... ]]>.
• Tags inside the text will not be treated as markup and entities will not be
expanded.
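The difference shows up directly in what a parser returns. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree: in the PCDATA version the < must be escaped as &lt;, whereas inside a CDATA section it can be written literally and even tag-like text stays plain text.

```python
import xml.etree.ElementTree as ET

# PCDATA: the parser examines the text, so "<" must be written as &lt;.
pcdata = ET.fromstring("<message>if salary &lt; 1000 then</message>")

# CDATA section: the text between <![CDATA[ and ]]> is not parsed.
cdata = ET.fromstring("<message><![CDATA[if salary < 1000 then <b>bold?</b>]]></message>")

print(pcdata.text)  # entity expanded
print(cdata.text)   # markup characters kept as plain text
print(len(cdata))   # number of child elements: <b> was not treated as markup
```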

========================================================

well-formed and valid XML documents

➢ When talking about XML documents, two commonly-used terms are "well formed"
and "valid."

Well-formed Documents:

➢ An XML document with correct syntax is called "Well Formed". This means that it has
no syntax, spelling, punctuation, grammar errors, etc. in its markup.
➢ This means that an XML parser will be able to parse the contents of the XML
document without raising an error.
➢ If an XML document is not well formed, then it is not XML.

Rules of XML structure:

1. All the XML elements must have a closing tag.


2. XML tags are case sensitive
3. All XML elements must have proper nesting.
4. All XML documents must contain single root element.
5. Attributes values must be quoted.
6. An attribute may appear only once in the same start tag.
7. Attribute values cannot contain references to external entities.

[Write all XML Rules here]
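A quick way to check these rules is to hand a document to a non-validating parser and see whether it raises an error. The sketch below assumes Python 3's standard xml.etree.ElementTree; is_well_formed is a hypothetical helper name, and each sample breaks one of the rules above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "well formed": '<student id="1"><name>Anand</name></student>',
    "missing closing tag": "<student><name>Anand</student>",
    "case mismatch": "<Name>Anand</name>",
    "improper nesting": "<b><i>text</b></i>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<student id=1/>",
    "duplicate attribute": '<student id="1" id="2"/>',
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label:20} -> {ok}")
```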

Valid Documents:

➢ An XML document validated against a DTD is both "Well Formed" and "Valid".
➢ A valid XML document is not only well formed but also conforms to an associated DTD or XML Schema. This means a validating parser will be able to parse the contents of the XML document and validate the data against the rules of the specified DTD or XML Schema.
➢ One standard used to validate XML is a DTD (Document Type Definition), although XML Schemas are also used.
➢ In short, checking that a document is well formed is a simple process, while using valid XML documents can improve the quality of document processing.

========================================================

Namespace
Namespaces in XML:
➢ Sometimes we need to create two different elements by the same name.
➢ XML namespaces allow a document to contain different elements that share a common name without a conflict. This technique is known as namespaces.

➢ Example:
<vehicles>
<car>
<price> 100000</price>
</car>

<bike>
<price> 25000</price>
</bike>
</vehicles>

In the above example, both car and bike contain an element named "price", so the parser (or the application using it) cannot tell by name alone which price is which. Such name conflicts in XML can easily be avoided using a name prefix.

➢ Syntax: A namespace is declared using the reserved attribute xmlns, either with a prefix or without one (default namespace):

<element xmlns:prefix="URI">
<element xmlns="URI">

➢ To rewrite the above XML documents without name conflicts using a name
prefix:

<v:vehicles>
<c:car>
<c:price> 100000</c:price>
</c:car>
<b:bike>
<b:price> 25000</b:price>
</b:bike>
</v:vehicles>

➢ To rewrite the above XML documents without name conflicts Using xmlns
Attribute:

• Default Declaration: defines the namespace without a prefix, so there is no need to use prefixes in the child elements.

<vehicles xmlns="http://www.w3.org/vehicles">
<car xmlns="http://www.w3.org/car">
<price> 100000</price>
</car>
<bike xmlns="http://www.w3.org/bike">
<price> 25000</price>
</bike>
</vehicles>

• Explicit Declaration:
When a namespace is defined for an element, all child elements with the
same prefix are associated with the same namespace.

<v:vehicles xmlns:v="http://www.w3.org/vehicles">
<c:car xmlns:c="http://www.w3.org/car">
<c:price> 100000</c:price>
</c:car>
<b:bike xmlns:b="http://www.w3.org/bike">
<b:price> 25000</b:price>
</b:bike>
</v:vehicles>
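A namespace-aware parser replaces each prefix with the namespace URI it is bound to, which is what actually keeps the two price elements apart. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree and parsing the explicit-declaration version of the vehicles document: ElementTree writes qualified names as {URI}localname, and the prefixes used in queries (car and bike below) are chosen by the caller, not taken from the document.

```python
import xml.etree.ElementTree as ET

DOC = """<v:vehicles xmlns:v="http://www.w3.org/vehicles">
<c:car xmlns:c="http://www.w3.org/car">
<c:price>100000</c:price>
</c:car>
<b:bike xmlns:b="http://www.w3.org/bike">
<b:price>25000</b:price>
</b:bike>
</v:vehicles>"""

root = ET.fromstring(DOC)

# Each prefix is expanded into "{namespace URI}localname".
price_tags = [el.tag for el in root.iter() if el.tag.endswith("}price")]
print(price_tags)

# Query prefixes are mapped to URIs; they need not match the prefixes in the file.
ns = {"car": "http://www.w3.org/car", "bike": "http://www.w3.org/bike"}
car_price = root.find("car:car/car:price", ns).text
bike_price = root.find("bike:bike/bike:price", ns).text
print(car_price, bike_price)
```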

➢ Example 2:
<file>
<text fname="input.txt">
<description>It is a text file </description>
</text>
<text fname="flower.jpg">
<description> It is an image file </description>
</text>
</file>
The above XML document has two text elements, but they do not produce a name conflict: both mean the same kind of thing (a file entry), and they are distinguished only by their fname attribute values.

========================================================

The Document Type Definition (DTD):

Why use a DTD?

➢ Use a DTD to verify that XML data is valid.

DTD:

➢ The Document Type Definition (DTD) is used to define the basic building blocks of any XML document.
➢ Using DTD we can specify the various elements types, attributes and their
relationship with one another.
➢ Basically DTD is used to specify the set of rules for structuring data in any XML files.

Types of DTDs:
1. Internal DTD
2. External DTD

Example:

SimpleXml.xml

<?xml version="1.0"?>
<student>
<name>Anand</name>
<address>Chennai</address>
<dept>cse</dept>
<marks>90 </marks>
</student>

1. Internal DTD
• If the DTD is declared inside the XML file, it must be wrapped inside the
<!DOCTYPE> definition.

• Syntax:

<!DOCTYPE root-element [ element-declaration]>

DTDDemo1.xml - Open a suitable text editor (such as Notepad) and type the following code:

<?xml version="1.0" encoding="UTF-8" ?>


<!DOCTYPE student
[
<!ELEMENT student (name, address, dept, marks)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT dept (#PCDATA)>
<!ELEMENT marks (#PCDATA)>
]>

<student>
<name>Anand</name>
<address>Chennai</address>
<dept>cse</dept>
<marks>90</marks>
</student>

2. External DTD
• If the DTD is declared in an external file, the <!DOCTYPE> definition must
contain a reference to the DTD file.
• Syntax:
<!DOCTYPE root-element SYSTEM "URL/URI">

Step 1: Creation of DTD file [Student.dtd]

<!ELEMENT student (name, address, dept, marks)>


<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT dept (#PCDATA)>
<!ELEMENT marks (#PCDATA)>

Step 2: Creation of XML document [DTDdemo2.xml]

<?xml version="1.0" ?>

<!DOCTYPE student SYSTEM "Student.dtd">
<student>
<name>Anand</name>
<address>Chennai</address>
<dept>cse</dept>
<marks>90 </marks>
</student>

Step 3: Open the XML document in a web browser.

Output:

Advantage of DTD:

1. DTDs are relatively simple.
2. DTDs are used to define the structural components of an XML document.

Disadvantage of DTD:

1. DTDs are very basic and cannot express complex constraints.
2. DTDs do not support data types; they support only character data (CDATA, #PCDATA).
3. DTDs are not namespace-aware.
4. Some XML processors do not understand DTDs.

Structure of DTD

The Structure of a DTD consists of

1. Elements
2. Attributes
3. Entities

1. Elements:

➢ Elements are the main building blocks of XML documents.


➢ In a DTD, elements are declared with an ELEMENT keyword.
➢ Elements are processed top-down.
➢ Syntax :
<!ELEMENT element-name content>

➢ DTD element rules:
1. content rule
2. Structure rule

➢ Content Rules: deal with the actual content of the data.


1. EMPTY rule
• Elements that contain no data.
• Syntax: <!ELEMENT element-name EMPTY>
DTD example: <!ELEMENT name EMPTY>
XML example: <name/>

2. #PCDATA Rule (Elements with Parsed Character Data)


• Elements that contain only parsed character data
• Syntax: <!ELEMENT element-name (#PCDATA)>
DTD example:<!ELEMENT name (#PCDATA)>
XML example: <name> xyz</name>

3. ANY rule (Elements with any Contents)


• Elements that can contain other elements and/or normal character data.
• Syntax: <!ELEMENT element-name ANY>
DTD example: <!ELEMENT student ANY>
➢ Structure Rules: deal with how the data may be organized.
1. “Element only” rule
• This rule is used to define elements that can have child elements.
• Syntax:
<!ELEMENT element-name (child1,child2,….)>
• Example:
<!ELEMENT student (name, address, dept, marks)>

2. “Mixed” rule
• This rule is used to define elements that can have both character data and
child elements.
• Syntax:
<!ELEMENT element-name (#PCDATA|child1|child2)*>
• Example:
<!ELEMENT student (#PCDATA|name|address|dept|marks)*>

• This declares a "student" element that can contain zero or more occurrences of parsed character data, "name", "address", "dept" or "marks" elements, in any order.

Element Symbols:

Symbol: ( + )

• It indicates that child elements can occur 1 or more times inside the
parent element.
• Example:<!ELEMENT appliances (phone+)>

Symbol : ( * )

• It indicates that child elements can occur 0 or more times inside the
parent element.
• Example:<!ELEMENT children (name*)>

Symbol : ( ? )

• It indicates that child elements can occur 0 or 1 time inside the parent element.
• Example:<!ELEMENT address (landmark?)>

Symbol: ( | )

• Allows a choice between child elements (exactly one of them must appear).


• Example: <!ELEMENT vehicle (car | bike)>

Symbol: ( , )

• It separates the elements of a sequence; they must appear in the given order.
• Example: <!ELEMENT address (street, city, state, zip)>

Symbol: ( )

• Parentheses are used to group a sequence or a choice of elements.
• Example: <!ELEMENT address (street, city, state, zip)>

Symbol: no symbol

• It specifies that the child element must appear exactly once.
• Example: <!ELEMENT contact (name)>
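Content models built from these symbols behave like regular expressions over the names of the child elements. Python's standard library does not include a validating DTD parser, so the sketch below is only an illustration under that assumption: the content models from this section are translated to regular expressions by hand, and CONTENT_MODELS and matches_content_model are hypothetical names. A real validating parser performs this check for every declared element.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (hypothetical; not a real DTD engine).
# Each child name is followed by a space so quantifiers apply to whole names.
CONTENT_MODELS = {
    "student": r"(name )(address )(dept )(marks )",  # (name, address, dept, marks)
    "appliances": r"(phone )+",                      # (phone+)
    "children": r"(name )*",                         # (name*)
    "address": r"(landmark )?",                      # (landmark?)
    "vehicle": r"(car |bike )",                      # (car | bike)
}


def matches_content_model(xml_text: str) -> bool:
    """Check the sequence of child element names against the element's model."""
    element = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], child_names) is not None


print(matches_content_model("<student><name/><address/><dept/><marks/></student>"))  # True
print(matches_content_model("<student><name/><dept/></student>"))   # False: missing children
print(matches_content_model("<appliances/>"))                       # False: + needs one phone
print(matches_content_model("<children/>"))                         # True: * allows zero
print(matches_content_model("<vehicle><car/><bike/></vehicle>"))    # False: | allows only one
```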

2.Attributes:

• Attributes are name/value pairs that provide additional information about XML elements.
• In a DTD, attributes are declared with the ATTLIST keyword.
• Syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example: <!ATTLIST payment type CDATA #REQUIRED>
XML example: <payment type="check"/>

• Attribute Types:
1. CDATA - the value is character data.
2. ID - the value is a unique id.
3. IDREF - the value is the id of another element.
4. IDREFS - the value is a list of other ids.
5. NMTOKEN - the value is a valid XML name token.
6. NMTOKENS - the value is a list of valid XML name tokens.
7. ENTITY - the value is an entity.
8. ENTITIES - the value is a list of entities.
9. NOTATION - the value is the name of a notation.
10. ENUMERATED - the value must be one of a listed set of values, e.g. (check|cash).

• Attribute default values:

#REQUIRED - the attribute is required
#IMPLIED - the attribute is optional
#FIXED "value" - the attribute value is fixed

#REQUIRED

<!ATTLIST element-name attribute-name attribute-type #REQUIRED>

#IMPLIED

<!ATTLIST element-name attribute-name attribute-type #IMPLIED>

#FIXED

<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
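A non-validating parser still reads attribute declarations in an internal DTD: it supplies #FIXED values for attributes that are missing, but it does not enforce #REQUIRED. The sketch assumes Python 3's standard pyexpat binding; the note and currency attributes and the value INR are made-up examples, and attributes_seen is a hypothetical helper.

```python
import xml.parsers.expat

# Internal DTD for payment; "note" and "currency" are hypothetical attributes.
DTD = b"""<?xml version="1.0"?>
<!DOCTYPE payment [
<!ELEMENT payment EMPTY>
<!ATTLIST payment type CDATA #REQUIRED
                  note CDATA #IMPLIED
                  currency CDATA #FIXED "INR">
]>
"""


def attributes_seen(instance: bytes) -> dict:
    """Return the attributes that expat reports for the payment element."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = attrs  # includes values filled in from the DTD

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(DTD + instance, True)
    return seen["payment"]


with_type = attributes_seen(b'<payment type="check"/>')
without_type = attributes_seen(b"<payment/>")  # accepted: #REQUIRED is not checked
print(with_type)     # the #FIXED value is supplied; the #IMPLIED attribute is simply absent
print(without_type)
```

A validating parser would reject the second document because the required type attribute is missing.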

3. Entities:

➢ Entities are used to define shortcuts to special characters or to reusable text.
➢ Syntax: <!ENTITY entity-name "entity-value">
➢ Type:
1. Internal Entities Declaration
2. External Entities Declaration
3. Parameter Entities Declaration
4. Pre-defined Entities Declaration

1. Internal Entity Declaration:


• Entity should be declared inside the DTD

Syntax: <!ENTITY entity-name "entity-value">

DTD Example: <!ENTITY copyright "Copyright W3Schools.">
XML example: <author>&copyright;</author>
• An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
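A parser replaces each reference with the entity's replacement text. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree and using the copyright entity declared in an internal DTD; a reference to an entity that was never declared is reported as an error. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE author [
<!ENTITY copyright "Copyright W3Schools.">
]>
<author>&copyright;</author>"""

author = ET.fromstring(DOC)
print(author.text)  # the reference has been replaced by its text


def parses(text: str) -> bool:
    """Hypothetical helper: True if the text parses without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<author>&missing;</author>"))  # undeclared entity -> parse error
```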

2. External Entity Declaration


• Entity is not a part of XML file. It is a separate file that should be
embedded in XML file during parsing.

Syntax: <!ENTITY entity-name SYSTEM "URI/URL">

DTD Example: <!ENTITY copyright SYSTEM "entities.dtd">
XML example: <author>&copyright;</author>

3. Parameter Entity:
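• Parameter entities are defined and used within the DTD itself.
• Syntax: <!ENTITY % entity_name "entity_content">

• A parameter entity reference starts with the % sign and ends with ;.

Example:

<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>

• Parameter entity references inside a declaration, as in this example, are allowed only in an external DTD file; in an internal DTD they may appear only between declarations.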

4. Pre-defined Entity:
• They are used to insert characters that have a special meaning in XML, acting as shortcuts for those characters. Each character can also be written as a decimal or hexadecimal character reference.

Character | Entity reference | Decimal reference | Hexadecimal reference
& | &amp; | &#38; | &#x26;
' | &apos; | &#39; | &#x27;
< | &lt; | &#60; | &#x3c;
> | &gt; | &#62; | &#x3e;
" | &quot; | &#34; | &#x22;

DTD Directives:

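IGNORE:
• Used to ignore declarations of elements, entities or attributes.
• Syntax:
<![IGNORE[
<!-- this part of the DTD will be ignored -->
]]>
• Example:
<![IGNORE[
<!ELEMENT student (#PCDATA)>
]]>
<!ELEMENT student (name, address, dept, marks)>
INCLUDE:
• Syntax:
<![INCLUDE[
<!-- this part of the DTD will be included -->
]]>
• Conditional sections such as IGNORE and INCLUDE are allowed only in external DTDs, not in the internal DTD subset.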
========================================================
XML Schema
➢ XML Schemas are used to describe the structure of an XML document.
➢ The purpose of an XML Schema is to define the building blocks of an XML document.
➢ XML Schema is an alternative to the DTD.
➢ The XML Schema language is also called XSD (XML Schema Definition).

Advantage of Schema:
1. It supports data types.
2. It is namespace-aware.
3. It is a W3C Recommendation, so it is supported by many XML validators and XML processors.

Disadvantage:
1. It is complex to design and hard to learn
2. Complex operations sometimes slow down the processing of XML document.

Comparison of XML Schema and DTD

XML Schema | DTD
XML Schemas are complex. | DTDs are basic and cannot express complex constraints.
It supports various data types. | It does not support data types; it supports only CDATA and #PCDATA.
It is namespace-aware. | It is not namespace-aware.
It is a W3C Recommendation, so it is supported by various XML validators and XML processors. | Some XML processors do not support it.

EXAMPLE:
I. Simple XML document
SimpleXml.xml
<?xml version="1.0"?>
<student>
<name>Anand</name>
<dept>Chennai</dept>
<dob>09-12-1992</dob>
<marks>90 </marks>
</student>

II. XML Document with DTD


Student.dtd
<!ELEMENT student (name, dept, dob, marks)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT dept (#PCDATA)>
<!ELEMENT dob (#PCDATA)>
<!ELEMENT marks (#PCDATA)>

DTDdemo.xml
<?xml version="1.0"?>
<!DOCTYPE student SYSTEM "Student.dtd">
<student>
<name>Anand</name>
<dept>Chennai</dept>
<dob>09-12-1992</dob>
<marks>90 </marks>
</student>

III. XML Schema


StudentSchema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

MySchema.xml
<?xml version="1.0" encoding="UTF-8"?>
<student xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="StudentSchema.xsd">
<name>Anand</name>
<dept>Chennai</dept>
<dob>1992-12-09</dob>
<marks>90</marks>
</student>

Note: xsd:date values must be written in the form YYYY-MM-DD.

XML SCHEMA DATA TYPE

XML Schema has a lot of built-in data types. The most common types are:
• xsd:string
• xsd:decimal
• xsd:integer
• xsd:boolean
• xsd:date
• xsd:time

There are two types


1. Simple Element
2. Complex Element
1. Simple Element:

➢ A simple element is an XML element that can contain only text. It cannot contain
any other elements or attributes.
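➢ A simple element is defined as
<xsd:element name="xxx" type="yyy"/>
where xxx is the name of the element and yyy is its data type.
➢ In the schema below, name, dept, dob and marks are simple elements; student, which contains them, is a complex element.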
StudentSchema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

MySchema.xml
<?xml version="1.0" encoding="UTF-8"?>
<student xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="StudentSchema.xsd">
<name>Anand</name>
<dept>Chennai</dept>
<dob>1992-12-09</dob>
<marks>90</marks>
</student>

Restrictions or “facets”

➢ Restrictions are used to define acceptable values for XML elements or attributes.
➢ Restrictions on XML elements are called "facets".
➢ Common facets include:
• length
• minLength
• maxLength
• minInclusive
• maxInclusive
• minExclusive
• maxExclusive
• enumeration

Length: Restrictions on length of Values


• The example below defines an element called "password" with a restriction: the value must be exactly eight characters long.

<xsd:element name="password">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:length value="8"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>

• The example below defines another element called "password" with a restriction: the value must be a minimum of five characters and a maximum of eight characters:

<xsd:element name="password">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:minLength value="5"/>
<xsd:maxLength value="8"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>

Enumeration : Restrictions on a Set of Values

• The example below defines an element called "car" with a restriction. The only
acceptable values are: Audi, Golf, BMW:
<xsd:element name="car">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Audi"/>
<xsd:enumeration value="Golf"/>
<xsd:enumeration value="BMW"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
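Python's standard library has no XML Schema validator, so the following is only a hand-written sketch of what a validator checks for the facets listed above. check_facets and its dictionary format for facets are hypothetical, and details such as whitespace handling and the base type's own rules are ignored.

```python
from decimal import Decimal


def check_facets(value: str, facets: dict) -> list:
    """Hypothetical helper: return the names of the facets that value violates."""
    failed = []
    if "length" in facets and len(value) != facets["length"]:
        failed.append("length")
    if "minLength" in facets and len(value) < facets["minLength"]:
        failed.append("minLength")
    if "maxLength" in facets and len(value) > facets["maxLength"]:
        failed.append("maxLength")
    if "enumeration" in facets and value not in facets["enumeration"]:
        failed.append("enumeration")
    # Range facets compare numerically, so they only make sense for numeric types.
    range_checks = (
        ("minInclusive", lambda v, bound: v >= bound),
        ("maxInclusive", lambda v, bound: v <= bound),
        ("minExclusive", lambda v, bound: v > bound),
        ("maxExclusive", lambda v, bound: v < bound),
    )
    for name, ok in range_checks:
        if name in facets and not ok(Decimal(value), Decimal(str(facets[name]))):
            failed.append(name)
    return failed


print(check_facets("secret12", {"length": 8}))                           # []
print(check_facets("abc", {"minLength": 5, "maxLength": 8}))              # ['minLength']
print(check_facets("Ford", {"enumeration": ["Audi", "Golf", "BMW"]}))     # ['enumeration']
print(check_facets("120", {"minInclusive": 0, "maxInclusive": 100}))      # ['maxInclusive']
```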

2. COMPLEX ELEMENT:

➢ A complex element is an XML element that contains other elements and/or attributes.
➢ Example:

StudentSchema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

MySchema.xml
<?xml version="1.0" encoding="UTF-8"?>
<student xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="StudentSchema.xsd">
<name>Anand</name>
<dept>Chennai</dept>
<dob>1992-12-09</dob>
<marks>90 </marks>
</student>

➢ There are four kinds of complex elements:


• empty elements
• elements that contain only other elements
• elements that contain only text
• elements that contain both other elements and text

XSD Complex Types Indicators

➢ We can control HOW elements are to be used in documents with indicators.


➢ There are seven indicators:

Order indicators:
• All
• Choice
• Sequence

Occurrence indicators:
• maxOccurs
• minOccurs

Group indicators:
• Group name
• attributeGroup name

All Indicator
➢ The <all> indicator specifies that the child elements can appear in any order, and
that each child element must occur only once:
<xsd:element name="student">
<xsd:complexType>
<xsd:all>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:all>
</xsd:complexType>
</xsd:element>

Choice Indicator
➢ The <choice> indicator specifies that either one child element or another can occur:
<xsd:element name="student">
<xsd:complexType>
<xsd:choice>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>

Sequence Indicator
➢ The <sequence> indicator specifies that the child elements must appear in a specific
order:
<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

Occurrence Indicators

➢ Occurrence indicators are used to define how often an element can occur.

1. maxOccurs Indicator
• The <maxOccurs> indicator specifies the maximum number of times an
element can occur:
<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"
maxOccurs="6"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

2. minOccurs Indicator

• The <minOccurs> indicator specifies the minimum number of times an element can occur:

<xsd:element name="student">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="dept" type="xsd:string"/>
<xsd:element name="dob" type="xsd:date"/>
<xsd:element name="marks" type="xsd:integer"
maxOccurs="10" minOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

Group Indicators

➢ Group indicators are used to define related sets of elements.

1. Element Groups
• Element groups are defined with the group declaration, like this:

<xsd:group name="persongroup">
<xsd:sequence>
<xsd:element name="firstname" type="xsd:string"/>
<xsd:element name="lastname" type="xsd:string"/>
</xsd:sequence>
</xsd:group>

2. Attribute Groups
• Attribute groups are defined with the attributeGroup declaration, like this:

<xsd:attributeGroup name="personattrgroup">
<xsd:attribute name="firstname" type="xsd:string"/>
<xsd:attribute name="lastname" type="xsd:string"/>
</xsd:attributeGroup>

XFiles :
1. XPath
2. XLink
3. XPointer

Xpath:

➢ XPath is used to navigate through elements and attributes in an XML document.
➢ XPath uses path expressions to navigate in XML documents.
➢ It is a W3C standard.
➢ XPath is a case-sensitive language.
➢ XPath is a major element in XSLT.

<?xml version="1.0" encoding="UTF-8"?>


<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Selecting Nodes:
➢ XPath uses path expressions to select nodes in an XML document.
Expression | Description
/ | Selects from the root node
// | Selects matching nodes anywhere below the current node (anywhere in the document when used at the start of a path)
. | Selects the current node
.. | Selects the parent of the current node
@ | Selects attributes
* | Matches any element node
@* | Matches any attribute node
: | Namespace separator

Predicates:

➢ Predicates are used to find a specific node or a node that contains a specific value.
Path Expression | Result
/bookstore/book[1] | Selects the first book element
/bookstore/book[last()] | Selects the last book element
/bookstore/book[last()-1] | Selects the last but one book element
/bookstore/book[position()<3] | Selects the first two book elements
//title[@lang] | Selects all the title elements that have an attribute named lang
/bookstore/book[price>35.00] | Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
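Python's standard xml.etree.ElementTree module (assumed here, Python 3) supports only a limited subset of XPath, but it covers several of the predicates in the table. The sketch runs them against the bookstore document; titles is a hypothetical helper, and because ElementTree has no comparison operators such as price>35.00, that query is written as a Python filter instead.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>"""

bookstore = ET.fromstring(DOC)  # the context node is the bookstore element


def titles(path):
    """Hypothetical helper: titles of the book elements matched by an ElementTree path."""
    return [book.find("title").text for book in bookstore.findall(path)]


print(titles("book[1]"))         # XPath: /bookstore/book[1]
print(titles("book[last()]"))    # XPath: /bookstore/book[last()]
print(titles("book[last()-1]"))  # XPath: /bookstore/book[last()-1]

lang_titles = [t.text for t in bookstore.findall(".//title[@lang]")]  # //title[@lang]
print(lang_titles)

# No numeric comparisons in ElementTree paths, so filter in Python instead.
expensive = [b.find("title").text for b in bookstore.findall("book")
             if float(b.find("price").text) > 35.00]
print(expensive)
```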

XPath syntax (one location step):

axis::node_test[predicate]

where

➢ axis
• It identifies the hierarchical relationship of the desired nodes to the current context node
• The axes include:
▪ parent - Select the parent of the current node
▪ child - Selects all children of the current node
▪ attribute - Selects all attributes of the current node
▪ self - Select the current node
▪ ancestor - Selects all ancestors of the current node
▪ descendant - Selects all descendants of the current node
▪ namespace - Selects all namespaces of the current node
▪ preceding-sibling - Select all siblings before the current node

➢ node_test
• Indicates the type of node desired for the result.
• There are seven node types:
▪ root, element, attribute, text, namespace, processing instruction and comment
• List of node tests:
▪ comment() - Selects nodes that are comments
▪ node() - Selects nodes of any type
▪ text() - Selects text nodes
▪ processing-instruction() - Selects nodes that are processing
instructions
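➢ predicate
• A predicate filters the nodes selected by the axis and node test. It contains an
expression that results in a Boolean value; a number such as [1] is treated as a
test of the node's position.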
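To make the three parts of a location step concrete, the sketch below splits an unabbreviated step into axis, node test and predicate with a regular expression. The function `split_step` is a hypothetical teaching helper, not part of any XPath library; it handles only one non-nested predicate and no abbreviations such as `@` or `//`.

```python
import re

# Hypothetical helper pattern for one unabbreviated location step:
#   axis::node_test[predicate]   (the predicate part is optional)
STEP_RE = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<node_test>[^\[\]]+)(?:\[(?P<predicate>[^\[\]]*)\])?$"
)

def split_step(step: str):
    """Hypothetical helper: return (axis, node_test, predicate or None)."""
    match = STEP_RE.match(step.strip())
    if match is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    return match.group("axis"), match.group("node_test"), match.group("predicate")

print(split_step("child::book[1]"))             # ('child', 'book', '1')
print(split_step("attribute::lang"))            # ('attribute', 'lang', None)
print(split_step("preceding-sibling::node()"))  # ('preceding-sibling', 'node()', None)
```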

XPath operators: | (node-set union), +, -, *, div, =, !=, <, <=, >, >=, or, and, mod
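Most of these operators behave as in ordinary arithmetic and logic, but `div` and `mod` have specific XPath 1.0 semantics: `div` is IEEE 754 floating-point division (dividing by zero gives Infinity or NaN rather than an error), and `mod` returns the remainder of a truncating division, so the result takes the sign of the dividend (`-5 mod 2` is `-1`). The sketch below mirrors both in Python; the function names are hypothetical, and `math.fmod` is used because Python's own `%` floors instead of truncating.

```python
import math

def xpath_div(a: float, b: float) -> float:
    """Hypothetical helper mirroring XPath 1.0 'div' (IEEE 754 division)."""
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The sign of the infinity follows both operands (b may be -0.0).
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a: float, b: float) -> float:
    """Hypothetical helper mirroring XPath 1.0 'mod' (truncating remainder)."""
    if b == 0 or math.isinf(a):
        return math.nan
    return math.fmod(a, b)

print(xpath_div(7, 2))   # 3.5
print(xpath_div(1, 0))   # inf
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0 (Python's -5 % 2 gives 1)
```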

Sample Xpath queries and their result:

<?xml version="1.0" encoding="UTF-8"?>


<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Select all titles: /bookstore/book/title

Result: Harry Potter
        Learning XML

Select the title of the first book: /bookstore/book[1]/title

Result: Harry Potter

Select all prices: /bookstore/book/price[text()]

Result: 29.99
        39.95

Select price nodes with price>35: /bookstore/book[price>35]/price

Result: 39.95

Select title nodes with price>35: /bookstore/book[price>35]/title

Result: Learning XML

Select all child nodes of book: /bookstore/book/child::node()

Result:
title
price
(for each book; since node() also matches text nodes, whitespace-only text
between the elements is selected too unless the processor strips it)

Select attribute: /bookstore/book/title/attribute::lang

Result: lang="eng" (once for each title)
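The query results above can be checked with ElementTree, assuming the same bookstore sample. This is a sketch: each XPath query is rewritten relative to the `bookstore` element, and the parts ElementTree lacks (the `text()` test, numeric comparison, text nodes as separate children) are emulated in Python, so `child::node()` yields only the element children here.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='eng'>Harry Potter</title><price>29.99</price></book>"
    "<book><title lang='eng'>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)
books = root.findall("book")

# /bookstore/book/title
all_titles = [t.text for t in root.findall("book/title")]
# /bookstore/book[1]/title
first_title = [t.text for t in root.findall("book[1]/title")]
# /bookstore/book/price[text()]  (no text() test in ElementTree; read .text)
all_prices = [p.text for p in root.findall("book/price")]
# /bookstore/book[price>35]/price and /bookstore/book[price>35]/title,
# with the numeric comparison done in Python
dear = [b for b in books if float(b.findtext("price")) > 35]
dear_prices = [b.findtext("price") for b in dear]
dear_titles = [b.findtext("title") for b in dear]
# /bookstore/book/child::node()  (element children only in ElementTree)
child_tags = [child.tag for b in books for child in b]
# /bookstore/book/title/attribute::lang
langs = [t.get("lang") for t in root.findall("book/title")]

print(all_titles, first_title, all_prices)
print(dear_prices, dear_titles, child_tags, langs)
```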

XLink:

➢ XML Linking Language, or XLink, is used to create hyperlinks (internal and
external links) in XML documents.
➢ Elements that include links are called linking elements.
➢ XLink defines its attributes in the namespace http://www.w3.org/1999/xlink; the
main ones are described below.

1. xlink:type attribute:
• Specifies the type of link.
• Syntax: xlink:type="value"
• Two types of links
1. simple
• It is similar to an HTML hyperlink created with the <a> tag.
• It provides a unidirectional hyperlink from one element to
another through a URI.
• Example:
<?xml version="1.0" encoding="UTF-8"?>
<links xmlns:xlink="http://www.w3.org/1999/xlink">
  <link xlink:type="simple"
        xlink:href="http://www.w3web.com">w3web.com
  </link>
</links>
2. extended
• It allows multiple resources to be linked to an element.
• It consists of:
• locator - locates a remote resource used by the extended link (using a URI)
• resource - supplies a local resource used by the extended link
• arc - defines traversal rules between the resources
• title - human-readable labels for links

<workplan xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <myresource xlink:type="locator"
              xlink:href="students.xml" xlink:label="student"/>
  <myresource xlink:type="locator"
              xlink:href="teachers.xml" xlink:label="teacher"/>
  <mymark xlink:type="resource" xlink:role="marks"
          xlink:label="marks">7.8</mymark>
  <myarc xlink:type="arc"
         xlink:from="student" xlink:to="teacher"/>
  <myarc xlink:type="arc"
         xlink:from="marks" xlink:to="student"/>
</workplan>

2. xlink:href attribute:
• Specifies the URI of the resource to link to.

• Syntax: xlink:href="url"

3. xlink:show attribute:
• Specifies where to open the link. Default is "replace".
• The value of the show attribute is one of "new", "replace", "embed", "other",
or "none":

o new - the application should open the ending resource in a new window
o replace - the application should load the ending resource in the same window
o embed - the application should load the ending resource in place of the
starting resource
o other - the behavior is not constrained by XLink; the application should
look for other markup in the link to decide what to do
o none - the application is not constrained on how to load the
ending resource, and no other markup is present to guide it

• Syntax: xlink:show="value"

4. xlink:actuate attribute:
• Tells the application when to traverse the link (that is, when to show the
linked content).
• The value of the actuate attribute is one of "onLoad", "onRequest", "other", or
"none":
o onLoad - the application should traverse to the ending resource
upon loading of the starting resource.
o onRequest - the application should traverse to the ending resource
after loading of the starting resource, but only when
some explicit event (a mouse click, etc.) initiates the
traversal.
o other - the behavior is not constrained by XLink; the application should
look for other markup in the link to decide when to traverse.
o none - the application is not constrained on how it handles the
traversal.

• Syntax: xlink:actuate="value"

• Example:
<image xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple"
       xlink:href="logo.gif"
       xlink:actuate="onLoad"
       xlink:show="embed"/>
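An application reading these attributes has to use their namespace-qualified names. The sketch below, using Python's ElementTree, reads the attributes of a simple link like the image example and resolves the arcs of the `workplan` extended link by matching `xlink:from` and `xlink:to` against `xlink:label`. The helpers `xl` and `describe` and the arc-resolution loop are hypothetical illustrations; they ignore XLink's rules for arcs that omit `from` or `to`, and they do not actually traverse any link.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # the XLink namespace URI

def xl(name: str) -> str:
    """Hypothetical helper: attribute name as ElementTree stores it, {namespace}local."""
    return f"{{{XLINK}}}{name}"

# A simple link like the image example.
image = ET.fromstring(
    '<image xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" '
    'xlink:href="logo.gif" xlink:actuate="onLoad" xlink:show="embed"/>'
)
simple = {k: image.get(xl(k)) for k in ("type", "href", "show", "actuate")}

# The workplan extended link.
workplan = ET.fromstring(
    '<workplan xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">'
    '<myresource xlink:type="locator" xlink:href="students.xml" xlink:label="student"/>'
    '<myresource xlink:type="locator" xlink:href="teachers.xml" xlink:label="teacher"/>'
    '<mymark xlink:type="resource" xlink:role="marks" xlink:label="marks">7.8</mymark>'
    '<myarc xlink:type="arc" xlink:from="student" xlink:to="teacher"/>'
    '<myarc xlink:type="arc" xlink:from="marks" xlink:to="student"/>'
    '</workplan>'
)

def describe(elem) -> str:
    """Hypothetical helper: a locator is shown by its href, a local resource by its content."""
    if elem.get(xl("type")) == "locator":
        return elem.get(xl("href"))
    return f"local resource {elem.text!r}"

# Group locators and resources by label, then pair them up for every arc.
by_label = {}
for child in workplan:
    if child.get(xl("type")) in ("locator", "resource"):
        by_label.setdefault(child.get(xl("label")), []).append(child)

arcs = []
for arc in workplan:
    if arc.get(xl("type")) == "arc":
        for start in by_label.get(arc.get(xl("from")), []):
            for end in by_label.get(arc.get(xl("to")), []):
                arcs.append((describe(start), describe(end)))

print(simple)
print(arcs)  # [('students.xml', 'teachers.xml'), ("local resource '7.8'", 'students.xml')]
```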

XPointer
➢ XPointer allows hyperlinks to point to more specific parts (fragments) of an XML
document.
➢ It is an extension of XPath.
➢ XPointer provides two more important node tests:
• point()
• range()
• A point represents a location immediately before or after a specific node or
character.
• A range consists of a start point and an end point, and contains all the XML
information between those two points.
➢ Points:
• XPointer can represent two different types of points:
1. Node points:
✓ If the node contains child nodes, then points exist before and after
each of its children.
✓ Example:
▪ There are 8 points located inside the novel element. It has seven
child nodes (the title, author and year elements plus the four
whitespace-only text nodes around them), and n children give n + 1 points.

<novel copyright="public domain">* point 1
point 2 *<title>XML book </title>* point 3
point 4 *<author>Ron</author>* point 5
point 6 *<year>1990</year>* point 7
point 8 *</novel>

2. Character points:
✓ If the node does not contain any child nodes (for example, a text node), then
a point is present before and after each character in the node's string value.
✓ Example:
▪ There are 5 points inside the text node of the year element
(four characters give five points).
<year>1980</year>
✓ Point 1 between <year> and 1
✓ Point 2 between 1 and 9
✓ Point 3 between 9 and 8
✓ Point 4 between 8 and 0
✓ Point 5 between 0 and </year>
• Example:
<novel copyright="public domain">
<title>XML book </title>
<author>Ron</author>
<year>1990</year>
</novel>

xpointer(//title[position()=1]/text()/point()[position()=4])

✓ The expression first finds the first title element (the only one in this
document), then takes the text node of that title. Within this text node,
it selects the fourth point:
<title>XML book </title>
o Point 1 between <title> and X
o Point 2 between X and M
o Point 3 between M and L
o Point 4 between L and space
✓ Applied to this example, the selected point lies immediately after the
string "XML".
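The counting rules for points can be reproduced with Python's `xml.dom.minidom`, which keeps whitespace-only text nodes as separate children. This sketch does not implement XPointer; it only counts node points and character points for the novel example (assuming one element per line, as shown) and locates the fourth point of the title's text. The helper `character_points` is hypothetical.

```python
from xml.dom import minidom

NOVEL = """<novel copyright="public domain">
<title>XML book </title>
<author>Ron</author>
<year>1990</year>
</novel>"""

novel = minidom.parseString(NOVEL).documentElement

# Node points: a container with n child nodes has n + 1 node points.
# The newline text nodes between the elements count as children too.
children = novel.childNodes
node_point_count = len(children) + 1

def character_points(text: str):
    """Hypothetical helper: (text before, text after) for each character point, in order."""
    return [(text[:i], text[i:]) for i in range(len(text) + 1)]

# Character points: a string of n characters has n + 1 points.
year_points = character_points("1980")

# xpointer(//title[position()=1]/text()/point()[position()=4]):
# the 4th point of the title's text node ("XML book ").
title_text = novel.getElementsByTagName("title")[0].firstChild.data
before, after = character_points(title_text)[4 - 1]

print(len(children), node_point_count)  # 7 8
print(len(year_points))                 # 5
print(repr(before), repr(after))        # 'XML' ' book '
```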

➢ Ranges:
• A range contains the XML between a start point and an end point.
• Ranges are created with 4 functions that XPointer adds to XPath:
o range()
o range-inside()
o range-to()
o string-range()
• The range() function takes an XPath expression and, for each location it
selects, returns a range that covers that location exactly.
• Example:
xpointer(range(/novel/*))

• When applied to the example, it returns three ranges, one covering each of
the novel root element's three child elements.
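A range that covers a node exactly starts at the node point just before that node in its parent and ends at the node point just after it. The sketch below, again using `xml.dom.minidom` on the novel example, expresses the three ranges returned by `range(/novel/*)` as (container, start index, end index) triples. The helper `covering_range` is hypothetical and does not implement XPointer itself.

```python
from xml.dom import minidom

NOVEL = """<novel copyright="public domain">
<title>XML book </title>
<author>Ron</author>
<year>1990</year>
</novel>"""

novel = minidom.parseString(NOVEL).documentElement

def covering_range(node):
    """Hypothetical helper: the range covering `node` exactly, as
    (container name, start index, end index) between node points of its parent."""
    parent = node.parentNode
    index = list(parent.childNodes).index(node)
    return parent.tagName, index, index + 1

# range(/novel/*): one range for each element child of <novel>.
element_children = [c for c in novel.childNodes if c.nodeType == c.ELEMENT_NODE]
ranges = [(c.tagName, covering_range(c)) for c in element_children]

for name, rng in ranges:
    print(name, rng)
# title ('novel', 1, 2)
# author ('novel', 3, 4)
# year ('novel', 5, 6)
```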
