You read that right: “XHTML2 Explorations”. Yes, the fun never stops here at goer.org.
I’ve decided to take a closer look at XHTML2, or more specifically, XHTML2 Working Draft 6. I’ll admit that I haven’t done a good job slogging through the thousands of messages on the W3C lists. I’m just a casual observer.
Fortunately, there’s been plenty of weblog chatter over the <object> replacing <img>, <cite> getting dropped and then added back, the excitement over <blockcode>, the battle over the style attribute, navigation lists, the new <h> and <section> model, the “href on everything” model, and more. The Alphas have been discussing these issues for months, and we Gammas have been well-served by just listening in.
But even with all the healthy public discussion, XHTML2 is a big specification (430 KB and counting). At least for my own edification, it’s time to see how deep the rabbit hole goes.
A Swarm of Attributes
XHTML2 provides a huge number of common attributes that are divided up into collections. Most XHTML2 elements accept attributes from all collections. The most well-known example of this is the “everything is a hyperlink” concept, wherein you can turn any XHTML2 element into a link by applying the href attribute.
But of course there’s much more. Consider the set of “common” attributes in HTML 4.01: there’s id, style, class, dir, lang, title, and the “event” attributes (such as onmouseover). This yields a little over 15 attributes, depending on how you’re counting. In contrast, XHTML2 already provides around 30 common attributes. Let’s take a closer look.
The Edit Collection
The Edit Collection provides an edit attribute with four allowed values: inserted, deleted, changed, and moved. Presumably if something is moved, you can specify where it moved to by using the href attribute. There’s also a datetime attribute for specifying the timestamp, the format of which is defined in XML Schema, which references ISO 8601, which has since been revised. Whew.
The default presentation should be display: none for deleted markup, while the other three types should be displayed as-is. Note that if we assume that the XHTML2 browsers of the future will have solid CSS2 support, then we can write:
*[edit="deleted"] {
  display: inline;
  color: red;
  text-decoration: line-through;
}
I’ll admit I like the idea of having a simple change record facility. However, there doesn’t appear to be an editedby attribute, which seems like a bit of an oversight.
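To make the default presentation rule concrete, here is a minimal Python sketch of how a renderer might collect the visible text of a fragment: anything marked edit="deleted" is skipped, and everything else is shown as-is. The sample markup, the timestamps, and the visible_text helper are all made up for illustration; none of them come from the spec.

```python
import xml.etree.ElementTree as ET


def visible_text(element):
    """Collect text as the default presentation would show it:
    skip elements marked edit="deleted", keep everything else as-is."""
    if element.get("edit") == "deleted":
        return ""
    parts = [element.text or ""]
    for child in element:
        parts.append(visible_text(child))
        # Text after a child's closing tag belongs to the parent, so it
        # stays visible even when the child itself was deleted.
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical fragment; the datetime values are invented examples.
SAMPLE = (
    '<p>The meeting is on '
    '<span edit="deleted" datetime="2003-05-01T09:00:00Z">Monday</span>'
    '<span edit="inserted" datetime="2003-05-01T09:00:00Z">Tuesday</span>'
    '.</p>'
)

print(visible_text(ET.fromstring(SAMPLE)))
```

With the CSS rule shown earlier in play, the deleted span would be displayed struck through in red instead of being dropped.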
The Embedding Collection
Through the Embedding collection, each element can have a src attribute. The browser attempts to replace the element’s content with the embedded file or resource. If the embedding fails for some reason, the browser proceeds to process the contents of the element. The spec provides an example involving a table:
<table src="temperature-graph.png" type="image/png">
<caption>Monthly temperatures</caption>
... (lots of table rows and cells) ...
</table>
The spec also declares,
Note that this behavior makes documents far more robust, and gives much better opportunities for accessible documents than the longdesc attribute present in earlier versions of XHTML, since it allows the description of the resource to be included in the document itself, rather than in a separate document.
I scratched my head over the new src attribute for a while… but if you think of it as a replacement for the longdesc attribute, then it sort of makes sense. A couple of points, though.
First, a quibble with the W3C’s example — I’m not quite sure how replacing a perfectly good XHTML table with a PNG image constitutes a great leap forward under any circumstances.
Second, the W3C recommends:
This collection causes the contents of a remote resource to be embedded in the document in place of the element’s content. If accessing the remote resource fails, for whatever reason (network unavailable, no resource available at the URI given, inability of the user agent to process the type of resource) the content of the element must be processed instead.
Maybe I don’t understand this statement correctly, but I take the “instead” to mean that browsers should not continue to process content if the remote resource is accessible. If so, I certainly hope that the browsers ignore this recommendation. The browser doesn’t need to display the child content directly, but it needs to process the entire document and provide access to the child content somehow. Otherwise, accessibility gets worse, not better.
Finally, this concept opens up all sorts of interesting UI issues. Take the <table> example above. If I use my browser’s “Find Text In This Page” function, will the browser search the text in the table cells? How will it highlight successful matches?
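For what it’s worth, the two readings of “instead” can be sketched in a few lines of Python. This is only a rough model under my own assumptions: render, Rendered, the keep_content flag, and the two fetch functions are hypothetical names, not anything the spec defines.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rendered:
    shown: str                # what the user sees in place of the element
    reachable: Optional[str]  # child content still exposed to search, screen readers, etc.


def render(src: str, content: str, fetch: Callable[[str], str],
           keep_content: bool = True) -> Rendered:
    """Embed the resource at src, falling back to the element's own content."""
    try:
        resource = fetch(src)
    except OSError:
        # Network down, nothing at the URI, unusable type: process the content instead.
        return Rendered(shown=content, reachable=None)
    # Embedding worked. A strict reading of "instead" would discard the content;
    # keep_content=True models the behavior argued for in the text.
    return Rendered(shown=resource, reachable=content if keep_content else None)


def broken_fetch(uri: str) -> str:
    """Hypothetical fetcher that always fails."""
    raise OSError("network unavailable")


def working_fetch(uri: str) -> str:
    """Hypothetical fetcher that always succeeds."""
    return "[image: " + uri + "]"


table = "Monthly temperatures ... (rows and cells) ..."
print(render("temperature-graph.png", table, broken_fetch))
print(render("temperature-graph.png", table, working_fetch))
```

Passing keep_content=False gives the strict reading, where a successful embed leaves the table text unreachable. That is exactly the case where “Find Text In This Page” would have nothing to search.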
The Cite Attribute
The Hypertext collection permits any element to have a cite attribute. At first glance, I thought that this made the <cite> element redundant. But as Mark Pilgrim points out in Semantic Obsolescence:
“No, the cite tag and the cite attribute are not the same thing. The cite attribute is a URL; the cite tag is wrapped around actual names within your text.”
Fair enough. However, I’ve been thinking that this might be an oversight on the part of the W3C, and the cite attribute should be allowed to contain arbitrary text. For example:
<h1>Ask Dr. Science!</h1>
<p>Q: Dr. Science, why is the sky blue? -- Ashley, age 8</p>
<p>A: Glad you asked, Ashley! The answer is simple, really:</p>
<blockquote
cite="Jackson, J.D. 1975. Classical Electrodynamics, 2nd. ed.
New York: John Wiley and Sons">
<p>
The scattering of light by gases, first treated quantitatively by Lord
Rayleigh in his celebrated work on the sunset and blue sky, can be
discussed in the present framework. Since the magnetic moments
of most gas molecules are negligible compared to the electric dipole
moments, the scattering is purely electric dipole in character. In
the previous section we have discussed the angular distribution and
polarization of the individual scatterings (see Figure 9.6). We
therefore confine our attention to the total scattering cross section
and the attenuation of the incident beam. The treatment is in two
parts...
</p>
</blockquote>
The cite attribute provides an elegant way to scope sections of a document as belonging somewhere else, so why limit it only to stuff on the web? As for the <cite> element, it would still be good for explicitly marking up citations (such as the ones found at the end of a journal article). Well, just a thought.
New LinkType Options
The rel attribute, once the sole province of the <link> element and <a> element, is now universal. The allowed values are defined by the LinkType data type. There are three new options:
- parent: You may now specify a link as a parent document. Strangely, you can’t specify your children or siblings. After all, it’s kinda hard to construct a full tree without information about the children. Oh heck, let’s just say it: won’t somebody think of the children?
- meta: The link “provides metadata, for instance in RDF, about the current document.” This allows you to place your metadata directly in the body content of your document. Not sure why you would want to do this as opposed to using the good old-fashioned <meta> tag, but ours is not to question why.
  Speaking of the <meta> element, the spec states that “A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user.” The idea that people will provide accurate keywords in the first place, let alone scope these keywords appropriately according to language, seems a quaint notion at best.
- p3pv1: When I first read this, the first thought that popped into my head was, “What’s the ‘v1’ doing in there?” The second thought that popped into my head was, “What’s the p3p doing in there?” I have no idea why we would want to reference a particular technology here, let alone a particular version of a particular technology. The W3C should rename this one to “privacy” post-haste.
Note: Part II is now available.