Introducing XML

What is XML?

XML, which stands for Extensible Markup Language, is a textual format designed for electronically encoding documents, especially (though not exclusively) for the World Wide Web. You can think of XML as an outgrowth of Hypertext Markup Language (HTML), though in fact XML is broader than HTML in its scope of application.

According to Wikipedia (9-14-09): "XML is beginning to appear as a first-class data type in other languages. The ECMAScript for XML (E4X) extension to the ECMAScript/JavaScript language explicitly defines two specific objects (XML and XMLList) for JavaScript, which support XML document nodes and XML document lists as distinct objects and use a dot-notation specifying parent-child relationships." You'll be learning more about these arcana in a few weeks.

XML is most relevant to us as the "X" in AJAX: a common, standardized method for organizing data in Web services. XML has rivals for this role, notably JSON (JavaScript Object Notation), but because XML is so widely used, we'll work with it exclusively in this course.

Where did XML come from?

XML is in large part a revision of Standard Generalized Markup Language (SGML), a pre-Web convention for tagging electronic documents that was standardized in 1986 (ISO 8879). Around the mid-1990s, the developers of SGML began to propose it as an alternative to the original HTML, which was already understood to have serious limitations as the World Wide Web evolved from display-based publishing toward dynamic documents and online services.

In 1996, the World Wide Web Consortium (W3C), the body responsible for Web standards, started work on an SGML-based successor and complement to HTML. The first Working Draft of the XML specification was released in November 1996. XML 1.0 became a W3C Recommendation in February 1998.

Though the XML specification is well into its second decade, the current version is still 1.0. A second version (1.1) appeared in 2004, but it is used only for specialized applications. A complete revision of XML (2.0) has been discussed in technical circles, and a demonstration project called XML-SW (for "Skunk Works") has been circulated. However, no group is currently working on an official XML 2.0.

Form and structure

To the unsophisticated user (like you and me), XML may seem to HTML what checkers is to chess: a much simpler game played in a similar space with a greatly reduced rule set.

Like HTML, XML has tags delimited by < and > characters (angle brackets). However, the set of HTML tags is strictly defined in the language specification. We cannot invent new ones. Not so in XML, where the names of tags are constrained only by XML's basic naming rules (no spaces, for instance) and by local consistency, and their function is supplied by external logic in an application or script (e.g., JavaScript).

XML preserves from HTML (and SGML) the distinction between markup and content -- the latter meaning any information, usually text, that falls between two matched tags.

So here's a fairly typical instance of XML markup, as we'll be using it:
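A minimal stand-in, written in Python so that it can be run on its own (the course itself works in JavaScript). The tag names stuff and item are borrowed from the discussion further down; how they pair up, and the text inside each item, are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a wrapper element holding three items.
XML_TEXT = """
<stuff>
  <item>apples</item>
  <item>bread</item>
  <item>cheese</item>
</stuff>
"""

root = ET.fromstring(XML_TEXT.strip())

# Markup is the tag name; content is the text between the matched tags.
for child in root:
    print(child.tag, "->", child.text)
```

Each printed line pairs a tag name (markup) with the text it encloses (content).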

Notice that, as in HTML, XML tags fall in open-close pairs. The opening tag consists of a tag name (without spaces) surrounded by angle brackets. The closing tag repeats the tag name, preceded by a forward slash (/).

The names of tags are arbitrary. The following fragment is structurally identical to the one above:
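One way to make "structurally identical" concrete is to compare two fragments while ignoring their tag names. The sketch below is in Python and uses assumed fragments: the names stuff, item, myData and thingy come from this page, but their pairing and the item text are hypothetical. The helper shape is likewise invented for this illustration.

```python
import xml.etree.ElementTree as ET

FIRST = """
<stuff>
  <item>apples</item>
  <item>bread</item>
</stuff>
"""

SECOND = """
<myData>
  <thingy>apples</thingy>
  <thingy>bread</thingy>
</myData>
"""

def shape(element):
    """Describe an element by its nesting and text alone, ignoring tag names."""
    text = (element.text or "").strip()
    return (text, [shape(child) for child in element])

first_root = ET.fromstring(FIRST.strip())
second_root = ET.fromstring(SECOND.strip())

# The names differ, but the pattern of nesting and content is the same.
same_structure = shape(first_root) == shape(second_root)
print(first_root.tag != second_root.tag, same_structure)
```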

Of course, a script or program written with the tag names in the first example will not work with the second example. It would work, though, if the names invoked in the script were changed to match the second version of the XML.
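The dependence of a script on tag names can be sketched in a few lines of Python. The fragments and the helper read_entries are hypothetical; only the names stuff, item, myData and thingy are taken from this page, and their pairing is assumed.

```python
import xml.etree.ElementTree as ET

FIRST = "<stuff><item>apples</item><item>bread</item></stuff>"
SECOND = "<myData><thingy>apples</thingy><thingy>bread</thingy></myData>"

def read_entries(xml_text, entry_tag):
    """Return the text of every child of the root element named entry_tag."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(entry_tag)]

matched = read_entries(FIRST, "item")       # names match the markup
mismatched = read_entries(SECOND, "item")   # old names, new markup: nothing found
updated = read_entries(SECOND, "thingy")    # script changed to the new names

print(matched, mismatched, updated)
```

The script does not fail loudly when the names are wrong; it simply finds nothing, which is why the names in the script and in the XML have to be kept in step.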

For XML, names don't matter, but patterns do. This assumption puts the X (for Extensible) into XML.

Notice also that in both our examples above, our series of items or thingies is wrapped in a larger container, stuff or myData. This outer wrapping is required for AJAX, and indeed for any well-formed XML document, because when we deal with XML, we will almost always be traversing a hierarchy, or node tree. Such a structure needs a single fundamental node, the root, from which every other node descends; the outer wrapper represents it.
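The need for an outer wrapper can be checked directly. The following Python sketch uses hypothetical fragments (the tag names come from this page, the contents are assumed) and an illustrative helper, walk, that traverses the node tree from its top node downward. It then tries to parse the same items without a wrapper.

```python
import xml.etree.ElementTree as ET

WRAPPED = "<stuff><item>apples</item><item>bread</item></stuff>"
UNWRAPPED = "<item>apples</item><item>bread</item>"

def walk(element, depth=0):
    """Yield (depth, tag) pairs for an element and all of its descendants."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

# With a wrapper, the whole hierarchy hangs from a single top node.
tree = list(walk(ET.fromstring(WRAPPED)))
print(tree)

# Without one, the parser rejects the text as malformed.
try:
    ET.fromstring(UNWRAPPED)
    unwrapped_ok = True
except ET.ParseError:
    unwrapped_ok = False
print(unwrapped_ok)
```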

The ability of XML documents to contain structured hierarchies makes them useful for various sorting and arranging tasks. In a way, an XML document is always a database-in-waiting. Consider this example:
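As a stand-in illustration, here is a small Python sketch in which each repeated group is read as one record and the records are then sorted. The library and book markup is entirely hypothetical and chosen only to show the pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the same group of tags repeated once per book.
LIBRARY = """
<library>
  <book><title>Neuromancer</title><year>1984</year></book>
  <book><title>Frankenstein</title><year>1818</year></book>
  <book><title>Dracula</title><year>1897</year></book>
</library>
"""

root = ET.fromstring(LIBRARY.strip())

# Each repeated <book> group becomes one record, like a row in a table.
records = [
    {"title": book.findtext("title"), "year": int(book.findtext("year"))}
    for book in root.findall("book")
]

# Once the groups are records, sorting them is a one-liner.
by_year = sorted(records, key=lambda record: record["year"])
for record in by_year:
    print(record["year"], record["title"])
```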

As you can see, XML supports the repetition of entire groups, as in a database. In fact, this only scratches the surface of XML's potential, but it will do for a first introduction.

Why XML?

Finally, a moment of reflection. Why do we need XML at all, when it appears to accomplish pretty much the same thing as arrays in JavaScript?

The chief difference between XML and JavaScript arrays lies in the fact that XML data is stored in external files. (Technically, you could store array data that way, too, though it would be somewhat harder to access.) Externalizing data brings two significant advantages:

As you will see, these benefits are both substantial.
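The contrast between data written into a script and data kept in an external file can be sketched as follows. This is an illustration in Python rather than the JavaScript used in the course; the file name stuff.xml, its contents, and the helper load_items are all hypothetical, and the file is written to a temporary folder only so the example runs on its own.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Inline data: changing it means editing the script itself.
inline_items = ["apples", "bread", "cheese"]

def load_items(path):
    """Read the text of each <item> from an external XML file."""
    root = ET.parse(str(path)).getroot()
    return [node.text for node in root.findall("item")]

with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "stuff.xml"  # hypothetical file name
    data_file.write_text(
        "<stuff><item>apples</item><item>bread</item><item>cheese</item></stuff>",
        encoding="utf-8",
    )
    external_items = load_items(data_file)

    # Editing the file changes the data; the script stays untouched.
    data_file.write_text("<stuff><item>tea</item></stuff>", encoding="utf-8")
    updated_items = load_items(data_file)

print(external_items == inline_items, updated_items)
```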





Last updated: 09/14/09 16:52:05
Copyright © 2009 School of Information Arts and Technologies