September 9, 2007

XML Basics - Video Tutorials

XML (Extensible Markup Language) is based on SGML, the same parent technology as HTML (XHTML, in turn, is HTML reformulated as XML). XML looks a lot like HTML, complete with tags, attributes and values. But rather than serving as a language just for creating web pages, XML is a language for creating other languages. You use XML to design your own custom markup language and then use that language to format your own documents. Your custom markup language, officially called an XML application, will contain tags that actually describe the data they contain.

If a tag identifies data, that data becomes available for other tasks. A software program can be designed to extract just the information it needs, perhaps join it with data from another source, and finally output the resulting combination in another form for another purpose. Instead of being lost on an HTML-based web page, labeled information can be reused as often as necessary.
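Here is a minimal sketch of that idea in Python, using the standard library's xml.etree.ElementTree module. The <catalog>, <book>, <title> and <price> tags are a made-up markup language invented for this example. The STOCK dictionary stands in for a hypothetical second data source. The program extracts only titles and prices, joins them with the stock counts, and outputs a new combination.

```python
import xml.etree.ElementTree as ET

# A tiny document in a made-up markup language: every tag describes its data.
CATALOG = """<catalog>
  <book isbn="0-111">
    <title>Rivers of Europe</title>
    <price currency="USD">39.95</price>
  </book>
  <book isbn="0-222">
    <title>Tide Pools</title>
    <price currency="USD">24.50</price>
  </book>
</catalog>"""

# Hypothetical second data source: stock counts keyed by ISBN.
STOCK = {"0-111": 12, "0-222": 0}


def in_stock_titles(xml_text, stock):
    """Extract title and price of each book, keeping only books that are in stock."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        if stock.get(book.get("isbn"), 0) > 0:
            result.append((book.findtext("title"), float(book.findtext("price"))))
    return result


print(in_stock_titles(CATALOG, STOCK))  # [('Rivers of Europe', 39.95)]
```

Because the tags say what each value is, the program never has to guess which text is a title and which is a price.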

But, as always, power comes with a price. XML is not nearly as lenient as HTML. To make it easy for XML parsers (software that reads and interprets XML data, either independently or within a browser), XML demands careful attention to case, quotation marks, closing tags and other details.
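Python's built-in parser shows how unforgiving XML is. In this sketch, the <note> and <to> tags are made up for illustration. Each malformed sample differs from the correct one by a single detail: tag case, a missing pair of quotation marks, or a missing closing tag. In each case xml.etree.ElementTree rejects the sample with a ParseError.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as XML, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "correct": '<note priority="high"><to>Ann</to></note>',
    "case mismatch": '<note priority="high"><To>Ann</to></note>',
    "unquoted attribute": '<note priority=high><to>Ann</to></note>',
    "missing closing tag": '<note priority="high"><to>Ann</note>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample prints True. An HTML browser would happily render all four.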

XML Schemas

A DTD, or Document Type Definition, is an old-fashioned system of rules with a peculiar, rather limited syntax. DTDs have several disadvantages compared with schemas written in XML Schema. First, DTDs are written in a syntax that has little to do with XML, so a DTD cannot itself be read as an XML document. Second, all the declarations in a DTD are global, which means you can't define two different elements with the same name. Third, DTDs cannot specify what kind of data (for example, a number or a date) a given element or attribute can contain.

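The XML Schema language, developed by the W3C, attempts to remedy each of these problems. Because XML Schema is written in XML itself, schemas can be processed with ordinary XML tools. It lets you define both global and local elements, and it lets you assign data types such as integers and dates to elements and attributes.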
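Because a schema is itself XML, you can read it with an ordinary XML parser. The sketch below uses Python's xml.etree.ElementTree to list the local element declarations inside each global one. The person/company schema is hypothetical. Note that name is declared twice with different types, which a DTD cannot express. The code only inspects the schema: ElementTree does not validate documents against XML Schema, so real validation needs a schema-aware processor.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace prefix as ElementTree spells it

# Hypothetical schema: "name" is declared locally twice, with different types.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:nonNegativeInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="company">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:token"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def local_declarations(schema_text):
    """Map each global element to the (name, type) pairs declared inside it."""
    root = ET.fromstring(schema_text)
    result = {}
    for top in root.findall(XS + "element"):  # direct children of xs:schema are global
        result[top.get("name")] = [
            (el.get("name"), el.get("type"))
            for el in top.iter(XS + "element")
            if el is not top  # iter() includes the element itself; skip it
        ]
    return result


for parent, children in local_declarations(SCHEMA).items():
    print(parent, children)
```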

XSLT

Perhaps the most powerful tools for working with XML documents are XSLT (Extensible Stylesheet Language Transformations) and XPath. XSLT lets you extract and transform the information into any shape you need. For example, you can use XSLT to create summary and full versions of the same document.
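Python's standard library has no XSLT processor, so this sketch only mimics the idea with xml.etree.ElementTree. The full version is simply the document itself. The summary is a new document that keeps just the headings. The <report>, <section>, <heading> and <body> tags are hypothetical.

```python
import xml.etree.ElementTree as ET

REPORT = """<report>
  <section><heading>Sales</heading><body>Sales rose in the spring.</body></section>
  <section><heading>Staff</heading><body>Two people joined the team.</body></section>
</report>"""


def summary_version(xml_text):
    """Build a new <summary> document holding only the headings of the source."""
    source = ET.fromstring(xml_text)
    summary = ET.Element("summary")
    for heading in source.iter("heading"):  # document order
        ET.SubElement(summary, "heading").text = heading.text
    return ET.tostring(summary, encoding="unicode")


print(summary_version(REPORT))
# <summary><heading>Sales</heading><heading>Staff</heading></summary>
```

In XSLT you would express the same selection declaratively in a stylesheet instead of writing a loop.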

Perhaps most importantly, you can use XSLT to convert XML into HTML. An XSLT stylesheet is made up of templates. When you write a template, you use a pattern to specify the nodes that the template can be applied to. When you apply templates, you use an expression to specify the node set that should be processed. You also use expressions in other instructions to isolate and then further process given node sets.
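The template mechanism can be imitated in a few lines of Python. This is not XSLT, only a rough analogue under stated assumptions. Each entry in the hypothetical TEMPLATES dictionary plays the role of an <xsl:template match="..."> rule, with the element name as the pattern. The apply_templates helper plays the role of <xsl:apply-templates>: it processes whatever node set the calling rule selected, which stands in for the expression.

```python
import xml.etree.ElementTree as ET
from html import escape

DOC = """<catalog>
  <book><title>Rivers of Europe</title><author>A. Writer</author></book>
  <book><title>Tide Pools</title><author>B. Author</author></book>
</catalog>"""


def apply_templates(nodes):
    """Rough analogue of xsl:apply-templates: run the matching rule for each node."""
    return "".join(TEMPLATES[node.tag](node) for node in nodes)


# Hypothetical rules keyed by element name (the "pattern"); each builds HTML.
TEMPLATES = {
    "catalog": lambda n: "<ul>" + apply_templates(n.findall("book")) + "</ul>",
    "book": lambda n: "<li>" + apply_templates([n.find("title")])
    + " by " + escape(n.findtext("author")) + "</li>",
    "title": lambda n: "<b>" + escape(n.text) + "</b>",
}


def to_html(xml_text):
    return apply_templates([ET.fromstring(xml_text)])


print(to_html(DOC))
```

This prints a single line of HTML: an unordered list with one item per book, each showing the bold title followed by the author.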

You write both patterns and expressions using XPath syntax. XPath is a system for identifying the different parts of an XML document. More specifically, it describes node sets by specifying their location in the document.

The main difference between patterns and expressions is that patterns are basically context-free, which means that a pattern like "name" matches any name element in the XML document regardless of its location. Expressions, on the other hand, can only be evaluated by looking at the context in which they appear. An expression such as "name" might refer only to name elements within subspecies elements, depending on where it is used.
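Python's xml.etree.ElementTree supports only a limited subset of XPath, but that is enough to show the difference. In this sketch the sample data is made up. iter("name") behaves like the pattern, finding every name element wherever it sits. findall("name") is a relative expression, so its result depends on the context element it is evaluated from.

```python
import xml.etree.ElementTree as ET

DOC = """<species>
  <name>Species A</name>
  <subspecies><name>Subspecies A1</name></subspecies>
  <subspecies><name>Subspecies A2</name></subspecies>
</species>"""

root = ET.fromstring(DOC)

# Pattern-like: every <name> element in the document, regardless of location.
everywhere = [n.text for n in root.iter("name")]

# Expression "name" with <species> as the context: only its own <name> children.
from_species = [n.text for n in root.findall("name")]

# The same expression evaluated with each <subspecies> as the context.
from_subspecies = [
    n.text for sub in root.findall("subspecies") for n in sub.findall("name")
]

print(everywhere)       # ['Species A', 'Subspecies A1', 'Subspecies A2']
print(from_species)     # ['Species A']
print(from_subspecies)  # ['Subspecies A1', 'Subspecies A2']
```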

In Use Today

Content management software vendors are leading companies into XML by providing systems that enable you to accomplish what you want, and need, to do with your content. Content in XML format is more easily managed than content in a plain text or word processing format because XML is not bogged down with proprietary coding or formatting. Additionally, each piece of XML information is identified by the elements that surround it.

Using a scripting language or stylesheet, you can pull chunks of XML content from one document and use them to produce other documents. Data can also be sorted and retrieved to create custom web pages, database content, or files for handheld devices. When a linked content chunk is changed, every document that contains that chunk can be updated automatically.
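Here is a minimal sketch of that kind of content reuse in Python. It assumes a hypothetical setup: reusable chunks live in one <chunks> library, and documents point at them with <use ref="..."/> elements instead of copying the text. When you change a chunk once, every document that references it picks up the change the next time it is rendered.

```python
import xml.etree.ElementTree as ET

# Hypothetical library: each reusable chunk is stored exactly once.
library = ET.fromstring("""<chunks>
  <chunk id="safety">Unplug the device before cleaning.</chunk>
  <chunk id="warranty">The warranty lasts one year.</chunk>
</chunks>""")

# Two hypothetical documents that reference chunks by id instead of copying them.
manual = ET.fromstring('<doc><use ref="safety"/><use ref="warranty"/></doc>')
quick_guide = ET.fromstring('<doc><use ref="safety"/></doc>')


def render(doc):
    """Replace each <use ref="..."/> with the text of the matching library chunk."""
    texts = []
    for use in doc.iter("use"):
        ref = use.get("ref")
        texts.append(library.find(f"chunk[@id='{ref}']").text)
    return " ".join(texts)


print(render(quick_guide))  # Unplug the device before cleaning.

# Edit the shared chunk once...
library.find("chunk[@id='safety']").text = "Always unplug the device first."

# ...and both documents pick up the change when rendered again.
print(render(manual))       # Always unplug the device first. The warranty lasts one year.
print(render(quick_guide))  # Always unplug the device first.
```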

Once your information is in XML and accessible, stylesheets and XSLT can be used to push it to the web, PDF, cell phones, iPods (podcasts), and some handheld devices.

Tools for Writing XML

XML, like HTML, can be written with any text editor. Specialized editors can also help by providing dedicated views and XSLT integration. A great free tool worth checking out is XML Notepad 2007. There are also many commercial-grade applications, including the very popular ones by Altova, that can do a whole lot more.

No matter what tool you use, XML documents conventionally have an .xml extension and XSLT stylesheets an .xsl extension.

Browser Support

Since most people will first work with XML and XSLT in a browser, it is a good idea to check what the browser support is:

Internet Explorer - Version 6 supports XML, CSS, XSLT, and XPath.

Firefox - Version 1.0.2 supports XML and XSLT (and CSS).

Mozilla - Includes Expat for XML parsing and supports XML + CSS. Limited support for Namespaces.

Netscape - Version 8 uses the Mozilla engine.

Opera - Version 9 supports XML and XSLT (and CSS). Version 8 supports only XML + CSS.

Reference Material

Apart from the many online resources for learning XML, here are two tutorials (.pdf) by Tom Dell'Aringa: Introduction to XML and DTD, and XSLT Basics.

Here are some great videos (.zip) by Joe Marini explaining XML basics and XSLT:

Basics
What is XML?
Describing Information
Advantages and drawbacks of XML
Real Life Examples
Proper XML Syntax
Valid Documents
Namespaces in XML

XSLT
What is XSLT?
Styling XML with XSLT
Simple XSLT styling
Using XSLT with CSS
Repeating items
Conditional logic
Sorting and rearranging XML data