
XML Interview Questions and Answers

What is XML?

XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way. It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a metalanguage (a language for describing other languages) which lets you design your own markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard metalanguage for text document markup (ISO 8879).
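As a rough illustration of what "design your own markup" means in practice, the sketch below parses a small document whose element and attribute names (recipe, ingredient, qty, and so on) are invented for the example. It assumes Python 3 and uses only the standard library's xml.etree.ElementTree; the parser needs no prior knowledge of the vocabulary to read the structure.

```python
import xml.etree.ElementTree as ET

# An invented vocabulary: XML itself defines none of these element or attribute names.
doc = """<recipe name="Pancakes">
  <ingredient qty="200" unit="g">flour</ingredient>
  <ingredient qty="2">eggs</ingredient>
  <step>Mix everything and fry.</step>
</recipe>"""

root = ET.fromstring(doc)

# Any XML parser can read the structure without knowing what a "recipe" is.
ingredients = [(i.text, i.get("qty"), i.get("unit")) for i in root.iter("ingredient")]
print(root.tag, root.get("name"))   # recipe Pancakes
print(ingredients)                  # [('flour', '200', 'g'), ('eggs', '2', None)]
```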
What is a markup language?

A markup language is a set of words and symbols for describing the identity of pieces of a document (for example "this is a paragraph", "this is a heading", "this is a list", "this is the caption of this figure", etc). Programs can use this with a style sheet to create output for screen, print, audio, video, Braille, etc. Some markup languages (e.g. those used in word processors) only describe appearances ("this is italics", "this is bold"), but this method can only be used for display, and is not normally re-usable for anything else.
Where should I use XML?

Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Despite early attempts, browsers never allowed other SGML, only HTML (although there were plugins), and they allowed it (even encouraged it) to be corrupted or broken, which held development back for over a decade by making it impossible to program for it reliably. XML fixes that by making it compulsory to stick to the rules, and by making the rules much simpler than SGML.

But XML is not just for Web pages: in fact it's very rarely used for Web pages on its own because browsers still don't provide reliable support for formatting and transforming it. Common uses for XML include:

• Information identification: because you can define your own markup, you can define meaningful names for all your information items.
• Information storage: because XML is portable and non-proprietary, it can be used to store textual information across any platform. Because it is backed by an international standard, it will remain accessible and processable as a data format.
• Information structure: XML can therefore be used to store and identify any kind of (hierarchical) information structure, especially for long, deep, or complex document sets or data sources, making it ideal for an information-management back-end to serving the Web. This is its most common Web application, with a transformation system to serve it as HTML until such time as browsers are able to handle XML consistently.
• Publishing: the original goal of XML, as stated at the start of this answer. Combining the three previous topics (identity, storage, structure) means it is possible to get all the benefits of robust document management and control (with XML) and publish to the Web (as HTML) as well as to paper (as PDF) and to other formats (e.g. Braille, Audio, etc) from a single source document by using the appropriate style sheets.
• Messaging and data transfer: XML is also very heavily used for enclosing or encapsulating information in order to pass it between different computing systems which would otherwise be unable to communicate. By providing a lingua franca for data identity and structure, it provides a common envelope for inter-process communication (messaging).
• Web services: building on all of these, as well as its use in browsers, machine-processable data can be exchanged between consenting systems, where before it was only comprehensible by humans (HTML). Weather services, e-commerce sites, blog newsfeeds, AJAX sites, and thousands of other data-exchange services use XML for data management and transmission, and the web browser for display and interaction.
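The messaging use can be sketched in a few lines. This is a minimal example, assuming Python 3.9+ and a hypothetical <message> envelope format invented here; to_message and from_message are illustrative helpers, not part of any standard. The receiving side needs nothing but an XML parser to unwrap what the sender produced.

```python
import xml.etree.ElementTree as ET

def to_message(kind: str, fields: dict[str, str]) -> bytes:
    """Wrap a flat dict of strings in a small XML envelope (hypothetical format)."""
    msg = ET.Element("message", type=kind)
    for name, value in fields.items():
        ET.SubElement(msg, name).text = value
    return ET.tostring(msg, encoding="utf-8")

def from_message(data: bytes) -> tuple[str, dict[str, str]]:
    """Unwrap the envelope on the receiving side, whatever system sent it."""
    msg = ET.fromstring(data)
    return msg.get("type"), {child.tag: child.text for child in msg}

wire = to_message("order", {"sku": "A-100", "quantity": "3"})
print(wire)
print(from_message(wire))   # ('order', {'sku': 'A-100', 'quantity': '3'})
```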
Why is XML such an important development?

It removes two constraints which were holding back Web developments: 1. dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for; 2. the complexity of full SGML, whose syntax allows many powerful but hard-to-program options. XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.
Describe the role that XSL can play when dynamically generating HTML pages from a relational database.

Even if candidates have never participated in a project involving this type of architecture, they should recognize it as one of the common uses of XML. Querying a database and then formatting the result set so that it can be validated as an XML document allows developers to translate the data into an HTML table using XSLT rules. Consequently, the format of the resulting HTML table can be modified without changing the database query or application code, since the document rendering logic is isolated to the XSLT rules.
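A minimal sketch of that pipeline, assuming Python 3 with only the standard library: rows from an in-memory SQLite table (staff, with hypothetical data) are turned into an XML document. Python's standard library has no XSLT processor, so a plain function, render_html, stands in for the stylesheet here; the point is that presentation logic lives in one place, and the query does not change when the table layout does.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Ada", "Engineer"), ("Grace", "Admiral")])

def query_to_xml(conn, sql):
    """Turn a result set into <rows><row><column>value</column>...</row></rows>."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    rows = ET.Element("rows")
    for record in cur:
        row = ET.SubElement(rows, "row")
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = str(value)
    return rows

def render_html(rows):
    """Stand-in for the XSLT step: all presentation decisions live only here."""
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            ET.SubElement(tr, "td").text = cell.text
    return ET.tostring(table, encoding="unicode")

xml_doc = query_to_xml(conn, "SELECT name, role FROM staff ORDER BY name")
print(render_html(xml_doc))
# <table><tr><td>Ada</td><td>Engineer</td></tr><tr><td>Grace</td><td>Admiral</td></tr></table>
```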
What is SGML?

SGML is the Standard Generalized Markup Language (ISO 8879:1986), the international standard for defining descriptions of the structure of different types of electronic document. There is an SGML FAQ from David Megginson at http://math.albany.edu:8800/hm/sgml/cts-faq.html; and Robin Cover's SGML Web pages are at http://www.oasis-open.org/cover/general.html. For a little light relief, try Joe English's "Not the SGML FAQ" at http://www.flightlab.com/~joe/sgml/faq-not.txt. SGML is very large, powerful, and complex. It has been in heavy industrial and commercial use for nearly two decades, and there is a significant body of expertise and software to go with it. XML is a lightweight cut-down version of SGML which keeps enough of its functionality to make it useful but removes all the optional features which made SGML too complex to program for in a Web environment.
Aren't XML, SGML, and HTML all the same thing?

Not quite; SGML is the mother tongue, and has been used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients' clinical records to musical notation. SGML is very large and complex, however, and probably overkill for most common office desktop applications. XML is an abbreviated version of SGML, to make it easier to use over the Web, easier for you to define your own document types, and easier for programmers to write programs to handle them. It omits all the complex and less-used options of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be processed in the same way as any other SGML file (see the question on XML software).

HTML is just one of many SGML or XML applications: the one most frequently used on the Web. Technical readers may find it more useful to think of XML as being SGML-- rather than HTML++.
Who is responsible for XML?

XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is supervised by an XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email. XML is a public format: it is not a proprietary development of any company, although the membership of the WG and the SIG represented companies as well as research and academic institutions. The v1.0 specification was accepted by the W3C as a Recommendation on Feb 10, 1998.
Give a few examples of types of applications that can benefit from using XML.

There are literally thousands of applications that can benefit from XML technologies. The point of this question is not to have the candidate rattle off a laundry list of projects that they have worked on, but, rather, to allow the candidate to explain the rationale for choosing XML by citing a few real-world examples. For instance, one appropriate answer is that XML allows content management systems to store documents independently of their format, which thereby reduces data redundancy. Another answer relates to B2B exchanges or supply chain management systems. In these instances, XML provides a mechanism for multiple companies to exchange data according to an agreed-upon set of rules. A third common response involves wireless applications that require WML to render data on handheld devices.
What is DOM and how does it relate to XML?

The Document Object Model (DOM) is an interface specification maintained by the W3C DOM Workgroup that defines an application-independent mechanism to access, parse, or update XML data. In simple terms it is a hierarchical model that allows developers to manipulate XML documents easily. Any developer that has worked extensively with XML should be able to discuss the concept and use of DOM objects freely. Additionally, it is not unreasonable to expect advanced candidates to thoroughly understand its internal workings and be able to explain how DOM differs from an event-based interface like SAX.
What is SOAP and how does it relate to XML?

The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the exchange of information in distributed computing environments. SOAP consists of three components: an envelope, a set of encoding rules, and a convention for representing remote procedure calls. Unless experience with SOAP is a direct requirement for the open position, knowing the specifics of the protocol, or how it can be used in conjunction with HTTP, is not as important as identifying it as a natural application of XML.

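A sketch of what a SOAP 1.1 request envelope looks like when built with Python's xml.etree.ElementTree. The envelope namespace is the standard SOAP 1.1 one; the GetPrice operation, its Symbol parameter, and the urn:example:stock namespace are hypothetical. Sending the request over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "urn:example:stock"                               # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("m", APP_NS)

def build_request(symbol: str) -> str:
    """Envelope > Body > one RPC-style call element carrying its parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")

request = build_request("IBM")
print(request)

# The receiver reads the call back out of the Body.
parsed = ET.fromstring(request)
symbol = parsed.find(f"{{{SOAP_ENV}}}Body/{{{APP_NS}}}GetPrice/{{{APP_NS}}}Symbol").text
print(symbol)   # IBM
```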
Why not just carry on extending HTML?

HTML was already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information. XML allows groups of people or organizations to create their own customized markup applications for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, mathematics, genealogy, etc). HTML is now well beyond the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.
Why should I use XML?

Here are a few reasons for using XML (in no particular order). Not all of these will apply to your own requirements, and you may have additional reasons not mentioned here (if so, please let the editor of the FAQ know!).

• XML can be used to describe and identify information accurately and unambiguously, in a way that computers can be programmed to "understand" (well, at least manipulate as if they could understand).
• XML allows documents which are all the same type to be created consistently and without structural errors, because it provides a standardised way of describing, controlling, or allowing/disallowing particular types of document structure. [Note that this has absolutely nothing whatever to do with formatting, appearance, or the actual text content of your documents, only the structure of them.]
• XML provides a robust and durable format for information storage and transmission. Robust because it is based on a proven standard, and can thus be tested and verified; durable because it uses plain-text file formats which will outlast proprietary binary ones.
• XML provides a common syntax for messaging systems for the exchange of information between applications. Previously, each messaging system had its own format and all were different, which made inter-system messaging unnecessarily messy, complex, and expensive. If everyone uses the same syntax it makes writing these systems much faster and more reliable.
• XML is free. Not just free of charge (free as in beer) but free of legal encumbrances (free as in speech). It doesn't belong to anyone, so it can't be hijacked or pirated. And you don't have to pay a fee to use it (you can of course choose to use commercial software to deal with it, for lots of good reasons, but you don't pay for XML itself).
• XML information can be manipulated programmatically (under machine control), so XML documents can be pieced together from disparate sources, or taken apart and re-used in different ways. They can be converted into almost any other format with no loss of information.
• XML lets you separate form from content. Your XML file contains your document information (text, data) and identifies its structure: your formatting and other processing needs are identified separately in a stylesheet or processing system. The two are combined at output time to apply the required formatting to the text or data identified by its structure (location, position, rank, order, or whatever).
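The point about piecing documents together from disparate sources, and taking them apart again, can be sketched with Python's standard library. The catalog and prices documents below are hypothetical; the example grafts a copy of each price into the book it refers to, then serializes one fragment on its own for re-use.

```python
import copy
import xml.etree.ElementTree as ET

# Two hypothetical sources, each holding part of the information.
catalog = ET.fromstring("<catalog><book id='b1'><title>XML Basics</title></book></catalog>")
prices = ET.fromstring("<prices><price book='b1' currency='EUR'>25.00</price></prices>")

# Piece them together: attach a copy of each price to the book it refers to.
books = {book.get("id"): book for book in catalog.iter("book")}
for price in prices.iter("price"):
    books[price.get("book")].append(copy.deepcopy(price))

# Take the result apart again and re-use one piece on its own.
title_only = ET.tostring(catalog.find("book/title"), encoding="unicode")

print(ET.tostring(catalog, encoding="unicode"))
# <catalog><book id="b1"><title>XML Basics</title><price book="b1" currency="EUR">25.00</price></book></catalog>
print(title_only)   # <title>XML Basics</title>
```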
Can you walk us through the steps necessary to parse XML documents?

Superficially, this is a fairly basic question. However, the point is not to determine whether candidates understand the concept of a parser but rather to have them walk through the process of parsing XML documents step-by-step. Determining whether a non-validating or validating parser is needed, choosing the appropriate parser, and handling errors are all important aspects of this process that should be included in the candidate's response.
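A minimal sketch of those steps in Python, assuming the standard library's expat-based xml.etree.ElementTree parser. That parser checks well-formedness only and does not validate against a DTD; a validating parser (for example the third-party lxml) would be needed for that. parse_document is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def parse_document(text: str):
    """Step 1: choose a parser (here the stdlib's non-validating one).
    Step 2: parse, which checks well-formedness only.
    Step 3: handle errors, reporting where the document broke."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        line, column = err.position
        return None, f"not well-formed at line {line}, column {column}: {err}"

good, _ = parse_document("<order><item>pen</item></order>")
bad, error = parse_document("<order><item>pen</order>")   # <item> is never closed
print(good.find("item").text)   # pen
print(error)
```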

Give some examples of XML DTDs or schemas that you have worked with.

Although XML does not require data to be validated against a DTD, many of the benefits of using the technology are derived from being able to validate XML documents against business or technical architecture rules. Polling for the list of DTDs that developers have worked with provides insight into their general exposure to the technology. The ideal candidate will have knowledge of several of the commonly used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience designing a custom DTD for a particular project where no standard existed.
Using XSL, how would you extract a specific attribute from an element in an XML document?

Successful candidates should recognize this as one of the most basic applications of XSLT. If they are not able to construct a reply similar to the example below, they should at least be able to identify the components necessary for this operation: xsl:template to match the appropriate XML element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates to continue processing the document.

Extract Attributes from XML Data, Example 1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="element-name">
    Attribute Value: <xsl:value-of select="@attribute"/>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
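For comparison, the same extraction written against Python's xml.etree.ElementTree instead of XSLT. This is a sketch only; element-name and attribute are the same placeholder names used in the template above, and the surrounding <catalog> document is invented.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<catalog><element-name attribute="first"/>'
    '<other><element-name attribute="second"/></other></catalog>'
)

# Equivalent of match="element-name" plus value-of select="@attribute":
# visit every matching element in document order and read the attribute.
values = [f"Attribute Value: {el.get('attribute')}" for el in doc.iter("element-name")]
print(values)   # ['Attribute Value: first', 'Attribute Value: second']
```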
When constructing an XML DTD, how do you create an external entity reference in an attribute value?

Every interview session should have at least one trick question. Although possible when using SGML, XML DTDs don't support defining external entity references in attribute values. It's more important for the candidate to respond to this question in a logical way than for the candidate to know the somewhat obscure answer.
How would you build a search engine for large volumes of XML data?

The way candidates answer this question may provide insight into their view of XML data. For those who view XML primarily as a way to denote structure for text files, a common answer is to build a full-text search and handle the data similarly to the way Internet portals handle HTML pages. Others consider XML as a standard way of transferring structured data between disparate systems. These candidates often describe some scheme of importing XML into a relational or object database and relying on the database's engine for searching. Lastly, candidates that have worked with vendors specializing in this area often say that the best way to handle this situation is to use a third-party software package optimized for XML data.
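The "full-text search" answer can be illustrated with a toy inverted index. This sketch assumes Python 3.9+ and a hypothetical <articles> document; it maps each word to the element paths whose text contains it, so a hit points back into the document structure. A real engine would add tokenisation rules, ranking, and persistent storage.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(root: ET.Element) -> dict[str, set[str]]:
    """Map each lower-cased word to the element paths whose own text contains it.
    Tail text (mixed content after a child element) is ignored to keep the sketch short."""
    index: dict[str, set[str]] = defaultdict(set)

    def walk(elem: ET.Element, path: str) -> None:
        for word in re.findall(r"\w+", elem.text or ""):
            index[word.lower()].add(path)
        seen: dict[str, int] = defaultdict(int)
        for child in elem:
            seen[child.tag] += 1            # 1-based position among same-named siblings
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return index

doc = ET.fromstring(
    "<articles><article><title>XML search</title><body>Indexing structured text</body></article>"
    "<article><title>SGML history</title><body>Structured documents</body></article></articles>"
)
index = build_index(doc)
print(sorted(index["structured"]))   # ['/articles/article[1]/body[1]', '/articles/article[2]/body[1]']
print(sorted(index["xml"]))          # ['/articles/article[1]/title[1]']
```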
How does XML handle white-space in my documents?

All white-space, including linebreaks, TAB characters, and normal spaces, even between "structural" elements where no text can ever appear, is passed by the parser unchanged to the application (browser, formatter, viewer, converter, etc), identifying the context in which the white-space was found (element content, data content, or mixed content, if this information is available to the parser, eg from a DTD or Schema). This means it is the application's responsibility to decide what to do with such space, not the parser's:

• insignificant white-space between structural elements (space which occurs where only element content is allowed, ie between other elements, where text data never occurs) will get passed to the application (in SGML this white-space gets suppressed, which is why you can put all that extra space in HTML documents and not worry about it);

• significant white-space (space which occurs within elements which can contain text and markup mixed together, usually mixed content or PCDATA) will still get passed to the application exactly as under SGML. It is the application's responsibility to handle it correctly.

The parser must inform the application that white-space has occurred in element content, if it can detect it. (Users of SGML will recognize that this information is not in the ESIS, but it is in the Grove.)

<chapter>
 <title>
  My title for
  Chapter 1.
 </title>
 <para>
  text
 </para>
</chapter>

In the example above, the application will receive all the pretty-printing linebreaks, TABs, and spaces between the elements as well as those embedded in the chapter title. It is the function of the application, not the parser, to decide which type of white-space to discard and which to retain. Many XML applications have configurable options to allow programmers or users to control how such white-space is handled.
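This behaviour can be observed with Python's standard xml.etree.ElementTree parser, using the same chapter example: the white-space reaches the application untouched, and collapsing it is a decision the application code makes. A sketch, assuming Python 3.

```python
import xml.etree.ElementTree as ET

DOC = """<chapter>
 <title>
  My title for
  Chapter 1.
 </title>
 <para>
  text
 </para>
</chapter>"""

root = ET.fromstring(DOC)
title = root.find("title")

# The parser hands every white-space character to the application unchanged:
print(repr(root.text))    # the newline and space between <chapter> and <title>
print(repr(title.text))   # the linebreaks and indentation inside the title

# ...and the application decides what to discard, e.g. normalising the title.
clean_title = " ".join(title.text.split())
print(clean_title)        # My title for Chapter 1.
```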
Which parts of an XML document are case-sensitive?

All of it, both markup and text. This is significantly different from HTML and most other SGML applications. It was done to allow markup in non-Latin-alphabet languages, and to obviate problems with case-folding in writing systems which are caseless.

• Element type names are case-sensitive: you must follow whatever combination of upper- or lower-case you use to define them (either by first usage or in a DTD or Schema). So you can't say <BODY>…</body>: upper- and lower-case must match; thus <Img/>, <IMG/>, and <img/> are three different element types;
• For well-formed XML documents with no DTD, the first occurrence of an element type name defines the casing;
• Attribute names are also case-sensitive, for example the two width attributes in <PIC width="7in"/> and <PIC WIDTH="6in"/> (if they occurred in the same file) are separate attributes, because of the different case of width and WIDTH;
• Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML") always have been, but NAME types (ID and IDREF attributes, and token list attributes) are now case-sensitive as well;
• All general and parameter entity names, and your data content (text), are case-sensitive as always.
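A quick check of these rules with Python's standard xml.etree.ElementTree. The element and attribute names come from the examples above; the <doc> wrapper is added only so the fragment has a single root element.

```python
import xml.etree.ElementTree as ET

# Element names differing only in case are different element types,
# and attribute names differing only in case are different attributes.
root = ET.fromstring('<doc><img/><IMG/><Img/><PIC width="7in" WIDTH="6in"/></doc>')
tags = [child.tag for child in root]
print(tags)                      # ['img', 'IMG', 'Img', 'PIC']
print(root.find("PIC").attrib)   # {'width': '7in', 'WIDTH': '6in'}

# A start-tag and its end-tag must match exactly, case included.
try:
    ET.fromstring("<BODY>text</body>")
    rejected = False
except ET.ParseError:
    rejected = True
print("mismatched case rejected:", rejected)   # True
```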
How can I make my existing HTML files work in XML?

Either convert them to conform to some new document type (with or without a DTD or Schema) and write a stylesheet to go with them; or edit them to conform to XHTML. It is necessary to convert existing HTML files because XML does not permit end-tag minimisation (missing end-tags, etc), unquoted attribute values, and a number of other SGML shortcuts which have been normal in most HTML DTDs. However, many HTML authoring tools already produce almost (but not quite) well-formed XML. You may be able to convert HTML to XHTML using Dave Raggett's HTML Tidy program, which can clean up some of the formatting mess left behind by inadequate HTML editors, and even separate out some of the formatting to a stylesheet, but there is usually still some hand-editing to do.
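A sketch showing why conversion is needed: Python's standard XML parser rejects the usual HTML shortcuts but accepts the same content written XHTML-style. The samples and the is_well_formed helper are illustrative, not taken from any conversion tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """True if an XML parser accepts the markup as a well-formed document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

samples = {
    "unquoted attribute value": "<p align=center>Hi</p>",
    "omitted end-tags": "<ul><li>one<li>two</ul>",
    "empty element without slash": "<p>line<br>break</p>",
    "XHTML-style equivalent": '<div><ul><li>one</li><li>two</li></ul>'
                              '<p align="center">line<br/>break</p></div>',
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")   # only the XHTML-style sample prints True
```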

Is there an XML version of HTML?

Yes, the W3C recommends using XHTML, which is "a reformulation of HTML 4 in XML 1.0". This specification defines HTML as an XML application, and provides three DTDs corresponding to the ones defined by HTML 4 (Strict, Transitional, and Frameset). The semantics of the elements and their attributes are as defined in the W3C Recommendation for HTML 4. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML browsers is possible by following a small set of guidelines (see the W3C site).
If XML is just a subset of SGML, can I use XML files directly with existing SGML tools?

Yes, provided you use up-to-date SGML software which knows about the WebSGML Adaptations TC to ISO 8879 (the features needed to support XML, such as the variant form for EMPTY elements; some aspects of the SGML Declaration such as NAMECASE GENERAL NO; multiple attribute token list declarations, etc). An alternative is to use an SGML DTD to let you create a fully-normalised SGML file, but one which does not use empty elements; and then remove the DocType Declaration so it becomes a well-formed DTDless XML file. Most SGML tools now handle XML files well, and provide an option switch between the two standards.
Can XML use non-Latin characters?

Yes, the XML Specification explicitly says XML uses ISO 10646, the international standard character repertoire which covers most known languages. Unicode is an identical repertoire, and the two standards track each other. The spec says (2.2): "All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…". There is a Unicode FAQ at http://www.unicode.org/faq/. UTF-8 is an encoding of Unicode into 8-bit characters: the first 128 are the same as ASCII, and higher-order characters are used to encode anything else from Unicode into sequences of between 2 and 4 bytes. UTF-8 in its single-octet form is therefore the same as ISO 646 IRV (ASCII), so you can continue to use ASCII for English or other languages using the Latin alphabet without diacritics. Note that UTF-8 is incompatible with ISO 8859-1 (ISO Latin-1) after code point 127 decimal (the end of ASCII). UTF-16 is an encoding of Unicode into 16-bit characters, which (using surrogate pairs for anything above U+FFFF) lets it represent all 17 planes. UTF-16 is incompatible with ASCII because it uses two 8-bit bytes per character (four bytes above U+FFFF).
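The byte counts described above can be checked with Python's built-in codecs. This is a sketch, assuming Python 3; the sample characters are arbitrary, and 'utf-16-be' is used so that no byte-order mark is included in the count.

```python
import xml.etree.ElementTree as ET

# Bytes per character in each encoding.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]   # A, e-acute, euro sign, G clef
sizes = {ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))) for ch in samples}
for ch, (u8, u16) in sizes.items():
    print(f"U+{ord(ch):04X}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")

# Plain ASCII text is byte-for-byte identical in UTF-8...
assert "Chocolate".encode("utf-8") == "Chocolate".encode("ascii")
# ...but Latin-1 and UTF-8 diverge as soon as a character is above code point 127.
print("\u00e9".encode("latin-1"), "\u00e9".encode("utf-8"))   # b'\xe9' b'\xc3\xa9'

# An XML parser accepts UTF-8 input whatever the script.
root = ET.fromstring("<p lang='ja'>日本語</p>".encode("utf-8"))
print(root.text)   # 日本語
```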
What's a Document Type Definition (DTD) and where do I get one?

A DTD is a description in XML Declaration Syntax of a particular type or class of document. It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together. (A Schema does the same thing in XML Document Syntax, and allows more extensive data-checking.) For example, if you want a document type to be able to describe Lists which contain Items, the relevant part of your DTD might contain something like this:

<!ELEMENT List (Item)+>
<!ELEMENT Item (#PCDATA)>

This defines a list as an element type containing one or more items (that's the plus sign); and it defines items as element types containing just plain text (Parsed Character Data or PCDATA). Validators read the DTD before they read your document so that they can identify where every element type ought to come and how each relates to the others, so that applications which need to know this in advance (most editors, search engines, navigators, and databases) can set themselves up correctly. The example above lets you create lists like:

<List>
  <Item>Chocolate</Item>
  <Item>Music</Item>
  <Item>Surfing</Item>
</List>

(The indentation in the example is just for legibility while editing: it is not required by XML.)

A DTD provides applications with advance notice of what names and structures can be used in a particular document type. Using a DTD and a validating editor means you can be certain that all documents of that particular type will be constructed and named in a consistent and conformant manner. DTDs are not required for processing well-formed documents, but they are needed if you want to take advantage of XML's special attribute types like the built-in ID/IDREF cross-reference mechanism; or the use of default attribute values; or references to external non-XML files ("Notations"); or if you simply want a check on document validity before processing.

There are thousands of DTDs already in existence in all kinds of areas (see the SGML/XML Web pages for pointers). Many of them can be downloaded and used freely; or you can write your own (see the question on creating your own DTD). Old SGML DTDs need to be converted to XML for use with XML systems: read the question on converting SGML DTDs to XML, but most popular SGML DTDs are already available in XML form. The alternatives to a DTD are various forms of Schema. These provide more extensive validation features than DTDs, including character data content validation.
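Python's standard library parsers do not validate against DTDs (expat is non-validating), so, as an illustration only, the sketch below hand-codes the two declarations from the List/Item example as ordinary checks. check_list is a hypothetical helper that knows nothing about DTDs in general; for real validation you would use a validating parser.

```python
import xml.etree.ElementTree as ET

def check_list(root: ET.Element) -> list[str]:
    """Hand-coded equivalent of:
         <!ELEMENT List (Item)+>
         <!ELEMENT Item (#PCDATA)>
    Hypothetical helper for illustration; it is not a general DTD validator."""
    problems = []
    if root.tag != "List":
        problems.append(f"root is <{root.tag}>, expected <List>")
    items = list(root)
    if not items:                                   # the '+' means one or more
        problems.append("<List> must contain at least one <Item>")
    for child in items:
        if child.tag != "Item":
            problems.append(f"<{child.tag}> is not allowed inside <List>")
        elif len(child) > 0:                        # #PCDATA: text only, no child elements
            problems.append("<Item> may contain only text")
    # (Item)+ is element content, so only white-space may appear between the Items.
    stray = [root.text] + [child.tail for child in items]
    if any(t and t.strip() for t in stray):
        problems.append("<List> may not contain text outside <Item>")
    return problems

good = ET.fromstring("<List>\n  <Item>Chocolate</Item>\n  <Item>Music</Item>\n  <Item>Surfing</Item>\n</List>")
print(check_list(good))                                          # []
print(check_list(ET.fromstring("<List></List>")))               # ['<List> must contain at least one <Item>']
print(check_list(ET.fromstring("<List><Item>a<b/></Item></List>")))   # ['<Item> may contain only text']
```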
