
What is XML Schema?

In previous volumes, we discussed well-formed XML documents, valid XML documents that use DTDs, and XML parsers. DTD syntax is characteristically simple, but its features are limited when XML is used for more complex purposes. DTD was traditionally the standard way to define a schema for XML; however, XML usage has expanded dramatically into core application systems and a wide range of purposes that DTD cannot fully support. Given this development, the W3C published "XML Schema" as a Recommendation, providing a schema definition language to replace DTD. That Recommendation has spurred the adoption of XML Schema as a standard schema definition language.

Differences between XML Schema and DTD Definitions


What differences are there between XML Schema and DTD definitions? We will explain these differences using an XML document related to employee information as an example. When defining an XML Schema, the content you wish to put into the XML document must first be summarized. The next step is to create a tree structure.

Content to put into the XML document:

1. The root element is "Employee_Info"
2. As the content of "Employee_Info," "Employee" occurs 0 or more times
3. As the content of "Employee," the "Name," "Department," "Telephone," and "Email" elements each occur once, in that order
4. The content of "Name," "Department," "Telephone," and "Email" is a text string
5. "Employee" has an attribute called "Employee_Number"
6. The value of "Employee_Number" must be of int type

This provides us with an understanding of the hierarchical structure of the XML document. Now we can write a schema definition using an actual schema definition language. LIST1 is a schema definition for the content above written as a DTD, while LIST2 is a schema definition written in XML Schema (employee.xs).

LIST1: Employee Information DTD

<!ELEMENT Employee_Info (Employee)*>
<!ELEMENT Employee (Name, Department, Telephone, Email)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Telephone (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ATTLIST Employee Employee_Number CDATA #REQUIRED>

LIST2: Employee Information XML Schema (employee.xs)

01 <?xml version="1.0"?>
02 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
03
04   <xs:element name="Employee_Info" type="EmployeeInfoType" />
05   <xs:complexType name="EmployeeInfoType">
06     <xs:sequence>
07       <xs:element ref="Employee" minOccurs="0" maxOccurs="unbounded" />
08     </xs:sequence>
09   </xs:complexType>
10
11   <xs:element name="Employee" type="EmployeeType" />
12   <xs:complexType name="EmployeeType">
13     <xs:sequence >
14       <xs:element ref="Name" />
15       <xs:element ref="Department" />
16       <xs:element ref="Telephone" />
17       <xs:element ref="Email" />
18     </xs:sequence>
19     <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
20   </xs:complexType>
21
22   <xs:element name="Name" type="xs:string" />
23   <xs:element name="Department" type="xs:string" />
24   <xs:element name="Telephone" type="xs:string" />
25   <xs:element name="Email" type="xs:string" />
26
27 </xs:schema>

(Line numbers have been added for reference, and are not necessary in the actual code.)

As you can see, the syntax is completely different between the two. The DTD is written in its own syntax, whereas the XML Schema is itself an XML document conforming to XML 1.0 syntax. LIST3 is an example of an XML document that is valid against the LIST2 XML Schema (employee.xml).

LIST3: Valid XML Document for XML Schema (employee.xml)

<?xml version="1.0"?>
<Employee_Info
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="employee.xs">
  <Employee Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
  <Employee Employee_Number="109">
    <Name>Aiko Tanaka</Name>
    <Department>Sales Department</Department>
    <Telephone>03-6459-98764</Telephone>
    <Email>tanaka@xmltr.co.jp</Email>
  </Employee>
</Employee_Info>

With DTD, a DOCTYPE declaration associates the DTD with the XML document. The XML Schema specification, by contrast, does not mandate how a schema is associated with an XML document, so the association depends on the validation tool actually used. The specification does, however, define a way to write a hint that points the XML document to its schema. The following attributes are added to the root element of the XML document:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xs"

XML Schema Structure


From here, using the LIST2 employee.xs file as an example, we will explain how to write an XML Schema.

XML Schema Root Element


The schema element is used as the root element, and the XML Schema "Namespace" is declared on it. Namespaces are a mechanism for avoiding collisions between the element and attribute names defined in different XML vocabularies, and a namespace is normally identified by a URI written in URL format. In LIST2, the xmlns:xs="http://www.w3.org/2001/XMLSchema" section at Line 2 is the Namespace declaration. The "xs" designation is called the "Namespace Prefix"; it can be used on the element that declares it and on that element's descendants to show that a name belongs to the XML Schema namespace. Generally, the "xs" prefix is used most often.
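The prefix itself is arbitrary; what matters is the namespace URI it is bound to. As a minimal sketch, the fragment below binds the same namespace URI as LIST2 to the prefix "xsd" (an alternative chosen here only for illustration) and declares the Name element from LIST2.

```xml
<?xml version="1.0"?>
<!-- Same namespace URI as LIST2, bound to the prefix "xsd" instead of "xs" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The prefix must be used consistently, including inside type references -->
  <xsd:element name="Name" type="xsd:string" />
</xsd:schema>
```

A schema written this way should be equivalent to one that uses "xs", because a validator compares namespace URIs, not prefixes.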

Element Declaration
When declaring an element, an ELEMENT keyword is used under DTD; however, under XML Schema, the element element is used. The declaration method is different depending on whether the element element has a child element or not. When no child element is present, the element name is designated with the name attribute, and the data type is designated using the type attribute.
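For instance, the Name element at Line 22 of LIST2 has no child elements, so it is declared with only a name and a type. The sketch below places that declaration inside a minimal schema element so that it stands alone; the Hire_Date element is a hypothetical addition, not part of LIST2, included only to show a second data type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- name gives the element name; type gives its data type -->
  <xs:element name="Name" type="xs:string" />
  <!-- Hypothetical element (not in LIST2) whose content must be a date -->
  <xs:element name="Hire_Date" type="xs:date" />
</xs:schema>
```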

Under DTD, element content could be described as little more than arbitrary text, written as #PCDATA. Under XML Schema, a variety of data types can be specified. Data types can be designated using the predefined built-in simple types (referred to here as embedded simple types; see Note), including the string, int, and date types shown in the table below, as well as the ID and NMTOKEN types that are compatible with DTD. These types can be combined, extended, or restricted to create new, unique data types.

Note: The XML Schema specification consists of "Part 1: Structures" and "Part 2: Datatypes." The embedded simple types are the data types already defined in Part 2. In addition to the types shown in the table below, there are also the "xs:hexBinary" data type, which represents hexadecimal-encoded binary data, and the "xs:base64Binary" data type, which represents Base64-encoded binary data.

Table: Main XML Schema Data Types

General Data Types

Name                     Explanation
xs:integer               Integers (infinite precision)
xs:positiveInteger       Positive integers (infinite precision)
xs:negativeInteger       Negative integers (infinite precision)
xs:nonPositiveInteger    Negative integers including 0 (infinite precision)
xs:nonNegativeInteger    Positive integers including 0 (infinite precision)
xs:byte                  Integer represented by 8 bits
xs:unsignedByte          Integer represented by 8 bits (unsigned)
xs:short                 Integer represented by 16 bits
xs:unsignedShort         Integer represented by 16 bits (unsigned)
xs:int                   Integer represented by 32 bits
xs:unsignedInt           Integer represented by 32 bits (unsigned)
xs:long                  Integer represented by 64 bits
xs:unsignedLong          Integer represented by 64 bits (unsigned)
xs:decimal               Decimal number (infinite precision)
xs:float                 Single-precision floating-point number (32-bit)
xs:double                Double-precision floating-point number (64-bit)
xs:boolean               Boolean value
xs:string                Arbitrary text string

Types Representing Dates and Times

Name                     Explanation
xs:time                  Time of day
xs:dateTime              Date and time of day
xs:date                  Date
xs:gYear                 Year
xs:gYearMonth            Year and month
xs:gMonth                Month
xs:gMonthDay             Month and day
xs:gDay                  Day

DTD-Compatible Types

Name                     Explanation
xs:ID                    XML 1.0 Specification ID type
xs:IDREF                 XML 1.0 Specification IDREF type
xs:IDREFS                XML 1.0 Specification IDREFS type
xs:ENTITY                XML 1.0 Specification ENTITY type
xs:ENTITIES              XML 1.0 Specification ENTITIES type
xs:NOTATION              XML 1.0 Specification NOTATION type
xs:NMTOKEN               XML 1.0 Specification NMTOKEN type
xs:NMTOKENS              XML 1.0 Specification NMTOKENS type

Meanwhile, if the element has child elements, a new data type must first be designated for the element (Line 11):

<xs:element name="Employee" type="EmployeeType" />

The "EmployeeType" designated by the type attribute is a Complex Type. Lines 12 through 20 are the Complex Type definition. In that definition, the type name (EmployeeType) is designated with the name attribute of the complexType element, and the Model Group (which specifies the order in which the child elements occur) is written as a child element. In the Model Group, use the sequence element to require the child elements to occur in the order written (equivalent to "," in DTD), and use the choice element to require any one of the listed elements to occur (equivalent to "|" in DTD).

Table: Meaning of the Model Group

Meaning                                                           XML Schema          DTD
Elements occur in the written order, each the designated
number of times                                                   sequence element    ,
Any one of the elements occurs, the designated number of times    choice element      |

For element declarations within a Model Group, the most common method is to use the ref attribute of the element element to reference an element declared in a separate location (LIST4).

LIST4: Element Declaration Reference for a Model Group Element

The element reference syntax is as follows:
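As a sketch assembled from LIST2 for illustration, a reference is an element element whose ref attribute names an element declared at the top level of the schema. The fragment below reuses the Name and Department declarations and the EmployeeType name from LIST2, trimmed to two child elements to keep it short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <!-- ref points to an element declared at the top level of the schema -->
      <xs:element ref="Name" />
      <xs:element ref="Department" />
    </xs:sequence>
  </xs:complexType>

  <!-- Top-level declarations that the references above resolve to -->
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Department" type="xs:string" />
</xs:schema>
```

Declaring each element once at the top level and referring to it by name keeps the Complex Type short and lets several types share the same element declaration.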

Attribute Declarations
When declaring an attribute, the ATTLIST keyword is used under DTD, while the attribute element is used under XML Schema. Within a Complex Type definition, attribute declarations are written after the content definition (after the Model Group), as in Line 19:

<xs:attribute name="Employee_Number" type="xs:int" use="required"/>

The attribute name is designated using the name attribute, and the data type is designated using the type attribute. The use, default, and fixed attributes can be designated as options. The use attribute controls whether the attribute must occur, and can be set to "required" (equivalent to #REQUIRED in DTD) or "optional" (equivalent to #IMPLIED in DTD). When use is not written, the setting is "optional." The default attribute designates the value assumed when the attribute is omitted, while the fixed attribute designates a fixed value (equivalent to #FIXED in DTD).

Table: XML Schema and DTD Differences Related to Attribute Declarations

Designation related to occurrences    XML Schema    DTD
Attribute may be omitted              optional      #IMPLIED
Attribute is required                 required      #REQUIRED
Attribute is prohibited               prohibited    None
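The options described above can be sketched together in one Complex Type. Only Employee_Number comes from LIST2; the Office, Status, and Country attributes and their values are hypothetical names introduced purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Name" type="xs:string" />

  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <xs:element ref="Name" />
    </xs:sequence>
    <!-- Must always be written (like #REQUIRED in DTD) -->
    <xs:attribute name="Employee_Number" type="xs:int" use="required" />
    <!-- Hypothetical: may be omitted (like #IMPLIED); same as leaving use out -->
    <xs:attribute name="Office" type="xs:string" use="optional" />
    <!-- Hypothetical: treated as "active" when the attribute is omitted -->
    <xs:attribute name="Status" type="xs:string" default="active" />
    <!-- Hypothetical: if written, the value must be exactly "JP" (like #FIXED) -->
    <xs:attribute name="Country" type="xs:string" fixed="JP" />
  </xs:complexType>
</xs:schema>
```

Note that default is only meaningful for an attribute that may be omitted, so it is not combined with use="required" here.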

Designating Repeat Count

Under DTD, a repeat count could be expressed only with the occurrence indicators (?) for zero or one time, (*) for zero or more times, and (+) for one or more times. Under XML Schema, however, the minOccurs and maxOccurs attributes can be used to designate detailed repeat counts, such as "from one to three" or "three or more, with no upper limit." For an unlimited upper limit, set the maxOccurs attribute to "unbounded." Be sure to remember that if the minOccurs and maxOccurs attributes are omitted, both default to a value of 1. Repeat count designations can be used on element references, on element declarations inside a Model Group, and on the Model Groups themselves. They cannot be used on attribute declarations, whose occurrence is controlled by the use attribute.
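The repeat counts mentioned above can be sketched as follows. The element names come from LIST2, while the ContactType name and the particular counts are assumptions made only for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Telephone" type="xs:string" />
  <xs:element name="Email" type="xs:string" />

  <!-- Hypothetical type used only to show the occurrence attributes -->
  <xs:complexType name="ContactType">
    <xs:sequence>
      <!-- Both attributes omitted: occurs exactly once -->
      <xs:element ref="Name" />
      <!-- From one to three times -->
      <xs:element ref="Telephone" minOccurs="1" maxOccurs="3" />
      <!-- Three or more times, with no upper limit -->
      <xs:element ref="Email" minOccurs="3" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```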

Review Questions
Question 1

Which two of the following are correct regarding Embedded Simple Types representing date and time in XML Schema?

A. xs:date
B. xs:gMonthYear
C. xs:gMonthDay
D. xs:timeDate
E. xs:gDayMonth

Comments

The correct answers are A (xs:date) and C (xs:gMonthDay). These two are embedded simple types representing dates and times under XML Schema. The embedded simple types that are used most often correspond to data types found in common programming languages. Embedded simple types that are difficult to map to the types used in current programming languages and databases tend not to be used very often.

Question 2

Select which of the following is a correct XML Schema description matching the condition below. Select all that apply. Assume the XML Schema Namespace prefix is "xs."

Condition: The "Address" attribute is defined as a string type that may be omitted.

A. <xs:attribute name="Address" type="xs:string" use="optional"/>
B. <xs:attribute name="Address" type="xs:string" optional="true"/>
C. <xs:attribute name="Address" type="xs:string" required="optional"/>
D. <xs:attribute name="Address" type="xs:string" use="required"/>

Comments

Attribute occurrences are designated using the use attribute of the attribute element. Designating the value as "optional" allows the attribute to be omitted. Since this meets the condition of the question, the correct answer is A.

The attribute element has no attribute named optional or required, so the syntax itself contains an error. Accordingly, answers B and C are incorrect.

Designating "required" as the value of the use attribute means that the attribute must be written. The syntax itself is correct, but it does not meet the condition of allowing the Address attribute to be omitted. Accordingly, answer D is incorrect.

Question 3

Select which of the following is a valid XML document with respect to the following XML Schema Document.

XML Schema Document

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
  <xs:element name="Employee" type="EmployeeType" />
  <xs:complexType name="EmployeeType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element ref="Name" />
      <xs:element ref="Department" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Department" type="xs:string" />
</xs:schema>

A.
<Employee></Employee>

B.
<Employee>
  <Name>Masashi Tanaka</Name>
  <Name>Makiko Okamura</Name>
</Employee>

C.
<Employee>
  <Name>Masashi Tanaka</Name>
  <Name>Makiko Okamura</Name>
  <Department>Sales Department</Department>
  <Department>Accounting Department</Department>
</Employee>

D. Neither A, B, nor C follows the definition in the XML Schema Document
