HTML House of Horror: Things That Go <BLINK> in the Night

Thanks to Anne Van Kesteren for inspiring the HTML House of Horror. I’m not sure if he really meant to inspire it, but life is funny like that. So read on… if you dare!

Introduction

Several years ago, my roommate Sam was playing around with a language he called “HTML”. The great thing about HTML, I thought, was the power it gave you over the screen. Just reload the page and presto! Words, colors, images, fonts — you name it. It was sooo much cooler than the “real” languages I had struggled with in college. I didn’t know how to turn the screen red in C++, but darned if it weren’t easy in HTML.

I still remember downloading and embedding my first image in a page, a tacky little dancing Santa animated GIF. “Check this out!” I said. Sam seemed kind of impressed, but not sufficiently so. Hmmm, I thought. Okay then, let’s try this BACKGROUND attribute…

Suddenly, the screen flooded with dancing Santas. That was the day I learned: with great power comes great responsibility.

The <blink> tag

Once upon a time, Netscape and Internet Explorer fought for the hearts and minds of HTML coders in what became known as the browser wars. Although the conflict ended long ago, the battlefield is still strewn with landmines, otherwise known as proprietary tags. The theory was that by adding whizzy new features available only in the Netscape (or IE) browser, developers would flock to that “platform”. Sadly, judging by the number of “best viewed in IE 4” messages that still litter the web today, the theory was pretty sound.

The Netscape-proprietary <blink> tag… well, makes text blink. Microsoft never implemented this tag in Internet Explorer. One might attribute this decision to Microsoft’s sensibility and good taste, but given that Microsoft’s answer to <blink> was the <marquee> tag, this explanation seems unlikely. Of the billions of pages on the public web today, there are only two sites in existence that make effective use of the <blink> tag.

Other than that, it’s all crap.

Although the <blink> tag never made it into the HTML standard, it still lives on to this day in Gecko-based browsers such as Netscape 7 and Mozilla. And although the tag itself is forbidden, have no fear! The CSS standard continues to provide the World Wide Web with critical blinking functionality, in the form of the CSS declaration text-decoration: blink. Inquiring minds might wonder — what happens if you declare the following CSS rule?


  blink {
    text-decoration: none;
  } 

On Mozilla/Mac, the result is as you would expect. I haven’t tested this rule on other platforms, but my theoretical model predicts that your system will just, like, explode. Don’t say I didn’t warn you.

The <marquee> tag

The Microsoft-specific <marquee> tag scrolls a selection of markup across the screen. Essentially, the <marquee> tag creates a 100% wide div with text creeping across the screen from right to left, or left to right. There are a large number of attributes for the <marquee> tag: you can change the width, the alignment, the scrolling speed, the scrolling delay, and even the scrolling direction (right to left or left to right).
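As a sketch (using the attribute names from Microsoft’s implementation as I remember them, so treat the details as approximate), a fully tricked-out marquee might look like:

```html
<marquee width="80%" direction="right" behavior="alternate"
    scrollamount="6" scrolldelay="100" loop="3">
  Now with 100% more scrolling!
</marquee>
```

Here scrollamount is roughly “pixels per step” and scrolldelay is the pause between steps, at least as IE interprets them.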

And if that’s not enough shiny animated goodness for you, you can of course style the <marquee> tag with CSS. For some reason IE5/Mac ignores the width property, but you can still muck with the padding, font, and color to your heart’s content. Furthermore, IE4+ allows you to marquee-up arbitrary chunks of markup, not just text. If we need to have a table or bulleted list sliding across the screen, we can build it. We Have The Technology.

Nota bene: nesting <marquee> tags is not recommended.

A Most Horrifying Discovery

And now we come to the most horrifying discovery of all. Mozilla supports the <marquee> tag. Let me repeat that — Mozilla supports the <marquee> tag. In other words, the following markup is now possible:


  <marquee>
      <blink>Night Of The Living Dead</blink>
  </marquee>

Back in 1997, the line of demarcation was clear: Netscape supported <blink>, Internet Explorer supported <marquee>, and that was the end of the story. But today in 2003, we have a browser that supports blinking marquee text. Yes, only in the fiendish laboratory that is the Gecko Rendering Engine is such a crime against humanity even possible.

There are those who believe that there are lines that should never be crossed. There are those who believe that there are Secrets Man Was Not Meant To Know.

And then there are those who forge boldly onward on paths where more timid souls dare not tread. In the interest of Science,[1] I have painstakingly constructed the most horrifying web page ever written. Behold! The Page of the Damned![2] (view source) Those of you with a Gecko-based browser will see the Page of the Damned in all its glory; anyone using a lesser browser will be shielded from the horror. So if you doubt your courage, or your resistance to video-induced epileptic seizures, come no further. Either way, may you forgive me for my crimes, and may the W3C have mercy on my soul.

Happy Halloween!

1. Mad Science!

2. The Page of the Damned uses no JavaScript. It does use a smidge of CSS, but only because the <basefont> tag doesn’t seem to work in Mozilla. Also, to my great disappointment, Mozilla doesn’t support the direction attribute of the <marquee> tag. So in theory the page could have been even more horrifying, but we make do with the tools we have.

Worst. Tag. Ever.

“Rest assured, I was on the Internet within minutes, registering my disgust throughout the world.”

So far, this week has not been a great week, technology-wise.

First, this week’s award for Kookiest Third-Party Documentation goes to JUnit.org, for their rather creative Javadoc description for TestSuite.createTest(). Umm, guys, the traditional we-don’t-give-a-rat’s-ass-about-our-Javadoc Javadoc would have been something like, “TestSuite.createTest(): Creates a test.” Better to be random and pointless than simply pointless, I suppose.

Second, a special shout-out to Dell for designing their hardware such that you can’t buy 3rd-party memory. Let’s see, $210 for two sticks of PC100 RAM.[1] At least now my system can handle amazing feats of computational wizardry… such as running Netscape and FrameMaker at the same time.

But the real winner of this week’s sweepstakes is the <object> tag. The problems with this tag are well-known. Yes, it has terrible support in modern browsers. Yes, it is the only replacement for the <img> tag in XHTML2. And no, the situation is not going to improve significantly until about the year 2006. But you’ve heard about all that. I’d like to share with you my particular episode of <object> tag pain.

The trouble started when I decided to play around with SVG. I dutifully downloaded the Adobe SVG plugin, and soon I was off to the races.

So how do you embed SVG in a web page? In theory, you could do it inline… if you served up a pure XHTML page with the right MIME-type, and a carefully constructed DOCTYPE, to the right browser, on every third Sabbath after Simchat Torah. But even the foolhardy don’t bother with this strategy. No, the accepted method uses the <object> tag, like so:

<object type="image/svg+xml" data="/path/to/image.svg"
    width="400" height="400">
  <img src="/path/to/image.gif" width="400" height="400"
    alt="description" />
</object>

The <object> tag embeds the SVG image, while the old-fashioned <img> tag provides a fallback GIF image for older browsers. Elegant, right? Only problem is that this markup crashes my copy of Safari every single time.

The problem is with the type attribute, which specifies the MIME-type of the SVG file. If the attribute is present, Safari crashes. If you delete it, Safari works just fine, but IE 5.2 for Mac no longer displays the object. Apparently IE5.2 needs the MIME-type explicitly defined. (This might also be the case for IE/Win, but I haven’t tested this yet.) Note that both browsers have interesting and quirky behavior. Safari is perfectly happy to display the SVG image if there is no type attribute at all, or if the value is totally made up (such as “foo/bar”). However, a wrong MIME-type (“text/html”) causes Safari to A) not display the image and B) bring a Finder window to the foreground. (?!) IE5.2, on the other hand, refuses to display the object if the attribute is absent or if the MIME-type is totally made up… but it does display the image if you provide any MIME-type that it understands, such as text/html. Meanwhile, Mozilla displays the image in all circumstances.

But don’t fret! You know what works for all three browsers, every single time, with no crashing or quirks whatsoever?

<embed src="/path/to/image.svg" width="400" height="400">

Yeah, I need a drink too.

1. Seriously, who do these Dell guys think they are? Apple?

A Little Too Fussy

So I finally gave the new Beta W3C Validator a spin. The new validator has a number of new features and bug fixes, including plain-English error explanations (yay!) and support for the application/xhtml+xml MIME-type (double yay!).

One of the most dramatic changes is the addition of “fussy” parsing, wherein the validator dings you for “things that are not strictly forbidden in the HTML Recommendation, but that are known to be problematic in popular browsers.” To my horror, I discovered that while my site passes standard validation, it fails “fussy” validation. Now, here’s where things get tricky. Unlike a purely mechanical validation against a schema or DTD, Fussy Validation is getting into a more complicated and subjective realm. Unfortunately for me, the Fussy Validator is right in my particular case. My error is straightforward to fix, but I’m going to leave it in place for the next few days, just so that you can see that my site doesn’t validate.[1] We’ll come back to this in a minute.

There are two problems with the Fussy Validator as it stands. The first problem is a simple UI problem, which we can easily attribute to the fact that the validator is beta software. When validated, my site yields a big red error message, “This page is not Valid HTML 4.01 Strict!” The problem is that this error message is demonstrably false. My site is HTML 4.01 Strict; it just fails Fussy Validation. Still, I have no doubt that the 1.0 version of the software will no longer conflate Fussy Validation with the official W3C Recommendation itself. One big step in this direction (besides changing the incorrect text) would be to lose the confusing Giant Red Bar of Total Rejection and replace it with something more subtle and tasteful, such as the Medium-Sized Yellow Bar of Wrinkly-Nosed Disapproval. But that’s up to the W3C’s UI gurus, I suppose.

The second problem is a bit more subtle. To go further, we’ll need to delve into A Little History.

A Little History

Ages ago, our ancient web designer ancestors had only the most primitive tools at their disposal. Fragments of web page source recovered from Lascaux, France reveal evidence of only the simplest table tags: <table>, <tr>, <td>, <th>, <caption>. But despite this handicap, our ancestors managed to construct extraordinarily complex tabular structures. In fact, most present-day web designers continue to pay our ancestors homage by creating tables with the exact same set of building blocks.

Of course in our enlightened modern age, we have many more tools available. Chief among these are the <tbody>, <thead>, and <tfoot> tags. These tags allow you to semantically specify the body, head, and foot of a table. And here’s where my site comes in. The Fussy Validator doesn’t like the fact that I have a table on my site (the calendar in the sidebar) that lacks a body and head (or foot). And in my case, the Fussy Validator is right. My calendar table should have a body and head.[2]

But that doesn’t mean the Fussy Validator is off the hook yet.

A cursory examination of the spec might give one the impression that tables must have a <tbody> and a <thead> or <tfoot>. Coincidentally, I reached that conclusion just a few months ago, during Part II of my XHTML2 analysis. At first I did the only rational thing for a Markup Geek to do when faced with such a shocking discovery. I panicked. Fortunately Jacques Distler came to the rescue. “There, there,” he said soothingly. “All is well with the world.”

Jacques pointed out that the relevant text is a little further down in the spec: “The TBODY start tag is always required except when the table contains only one table body and no table head or foot sections.” In other words, if your table doesn’t have a head or foot and you only have one table body, you can just leave well enough alone. And if you think about it, this makes sense. There’s all sorts of gridlike or tabular data that simply doesn’t have a head or foot, and in that case the <tbody> tag is just redundant.
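To illustrate (a hypothetical sketch, not my actual calendar markup), the fussy-approved version of a calendar table would look something like this:

```html
<table>
  <thead>
    <tr>
      <th>Su</th><th>Mo</th><th>Tu</th><th>We</th>
      <th>Th</th><th>Fr</th><th>Sa</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td></td><td></td><td>1</td><td>2</td>
      <td>3</td><td>4</td><td>5</td>
    </tr>
    <!-- ...remaining weeks... -->
  </tbody>
</table>
```

The row of weekday abbreviations goes in the <thead>, and everything else goes in the <tbody>.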

The problem with the Fussy Validator is that it can’t possibly know which tables do need a head, foot, and body, and which ones don’t. That’s a fundamental limitation of Fussy Validation in the first place. Fussy Validation has to somehow understand the meaning of the code you’re writing, not just the structure. Of course, we’ve seen this problem before. Author Joe Clark rails against this very issue in Building Accessible Websites. He’s talking about Bobby, the automated accessibility checker, but the issue is the same:

What we have here is a computer program that threatens to withhold its certification badge (of dubious value in any case) if you didn’t write clearly enough. How does it know the difference, exactly? You probably get enough of that kind of bellyaching at home. Do you also need it at work?

My advice is simple: Do not use Bobby. Do not rely on software as dumb as a dromedary to evaluate accessibility.

I certainly don’t feel that the Fussy Validator is as “dumb as a dromedary,”[3] but Joe’s basic point stands. No software program can truly evaluate something as nebulous as “accessibility” or “proper coding practices”. Fussy Validation is an interesting concept, but it should probably be considered to be an advanced option and turned off by default. If people start conflating Fussy Validation with Real Validation, we’re going to be in for a bumpy ride.

1. And not because I’m feeling too lazy tonight to change and rebuild all my MT templates. I am thinking of nothing but the educational benefits for you, dear reader.

2. The head would be the row of weekday abbreviations, and the body would be everything else.

3. For one thing, I seriously doubt that any publicly-available software has managed to reach the intelligence level or complexity of even a spirochete.

Son of Bulletproof XHTML

I’m pleased to announce that I am the latest guest columnist for A Second Voice, Dave Shea’s collection of articles on standards, markup, accessibility, and more.

For those of you who don’t follow such things (hi Mom!), Dave is the caretaker of the CSS Zen Garden, which has quickly become the repository for elegant and modern website design techniques. For this reason, I’m deeply honored that Dave tapped me to write this piece,1 which grew out of an earlier discussion with Dave and Jacques on creating Bulletproof XHTML. I should note that the previous Second Voice article by Ian Lloyd discusses the importance of promoting web standards out in the workplace, while my article discusses the practical difficulties and considerations in maintaining compliance with these standards. I think the two articles bookend each other nicely, although I get the feeling that Ian is playing Good Cop and I’m playing Bad Cop. Well, we’ll see how this plays out.

1. Particularly considering what a horrible green eyesore my site is. At first I worried that I was going to ruin Dave’s rep as a designer just by association. But then I got over it.

Nobody Beats Up My Little Brother But Me

Deep within the comments of Dave Shea’s recent post on browser dependencies, Jeff Croft summarizes his design methodology. It’s so excellent that I’m going to go right ahead and reprint the thing:

In practice, I often find myself doing a bit of an “outside-in” thing. Since my University job forces me to make sites look reasonable in Netscape 4.7x, I have a general design process that looks something like this:

  1. Mark up content in XHTML. Test in Lynx to ensure proper flow and such.
  2. Link to a basic stylesheet that Netscape 4 will see.
  3. Write styles for basic (NN4) stylesheet. Typically, this is fonts, colors, and not much else.
  4. @import an advanced stylesheet, for modern browsers.
  5. Write styles for advanced stylesheet, taking full advantage of as much CSS as possible, not really caring whether it works in “mid-level” browsers such as IE5 or IE6. At this point, I’m just getting it to look perfect in Safari/Mozilla/Other near-perfect browser.
  6. Revert to a mid-level browser (usually IE5 and IE6) and tweak styles to satisfy them.

Right on, Jeff. Methodical and comprehensive.

I can also sympathize with Jeff and the requirements of his University job. After all, for three-and-a-half years, I had to target Netscape 4.7 as my organization’s primary browser. I was waiting for years for the company to switch to Netscape 6… then Netscape 6.2… then Netscape 7… but it never happened. Be with me here, people. Feel my pain.

So these days I’m of two minds when I hear people ganging up on poor old Netscape 4. On the one hand, Netscape 4 deserves to be bashed. It is truly a lousy piece of software in all respects: standards compliance, rendering speed, user interface, system resources consumed, you name it.1 On the other hand, most of the people doing the bashing don’t really know the horror. Sure, they’ve thrown up their hands in disgust at its CSS bugs.2 Who hasn’t? But have they fought with it for hours? Have they tried to scroll through a styled table with hundreds of cells on an old UltraSparc? Have they had to explain to users that disabling JavaScript also secretly disables style sheets (even though the two options are separate checkboxes that sit right next to each other)? In short, have they bled?

I dunno. I know it’s perverse, but sometimes I feel like I should defend battered, dying old Netscape 4 from the general population. Journeymen! Dilettantes! Feh. If anyone has the right to bash Netscape 4, it ought to be me.

1. The one area where Netscape 4 made significant strides was stability. Early Netscape 4 was horribly crashy, but as we moved up through Netscape 4.71, 4.72, etc., it actually became fairly stable. Go figure.

2. Incidentally, Netscape 4 sorta kinda understands the float property. So with some tweaking, you can produce primitive tableless sites that display (imperfectly) in Netscape 4. This very site is only one example.

3. Regarding the title of this entry: I don’t actually have a little brother, and I wouldn’t beat him up if I did. It’s just an expression.

Bulletproof XHTML

Is your XHTML bulletproof?

Jacques Distler‘s is. So is Yuan-Chung Cheng‘s. Dave Shea‘s working on it.

The day’s just wasting away, isn’t it?

Orthodoxies

I keep my RSS subscriptions fixed at a manageable 20; Jeffrey Zeldman‘s got a permanent spot on my list. Zeldman is a superb designer and a hell of a writer. I love Zeldman bunches.

And yet as often as I find his comments on web standards illuminating, occasionally I’m forcibly reminded that deep down Zeldman is a Technology Evangelist. Evangelists are great at efficiently spreading new ideas throughout the community… but the flip side is that they often have trouble departing from certain orthodoxies.

For a prime example of this, see Zeldman’s Web Design World keynote. There’s plenty of good stuff in the keynote overall, but this particular slide needs addressing:

  1. “XHTML is XML that acts like HTML in browsers.”

    Better to replace “acts like” with “is”, given that nearly all “XHTML” websites are actually parsed as HTML by all currently available browsers. (I suppose there are a rarefied few sites that actually can be parsed as XML by certain highly advanced browsers, but this hardly counts as statistically significant.)

  2. “It also works as expected in most Internet devices (Palm Pilots, web phones, screen readers).”

    There are some who would dispute that. Apparently many real-world mobile devices happily support old-skool HTML cruft (table-based layouts, the <font> tag) while ignoring the products of our more enlightened era (XHTML Basic, CSS). Weird but true.

  3. “It’s as easy to learn and use as HTML.”

    Except you have to teach people about closing tags, proper nesting, encoding all your ampersands, and so on. Meanwhile, people who feel like writing tag soup HTML can just whip out a text editor and go. (Then again, considering the tremendous popularity of tag-soup XHTML on the web today, maybe this is a distinction without a difference.)

  4. “It’s the current recommended markup standard (replaces HTML 4).”

    The closest the spec comes to saying this is, “The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content’s backward and future compatibility.” That’s a fine marketing blurb, but it’s not an official announcement that HTML 4.01 is deprecated.

  5. “Because it’s XML, it works better with existing XML languages and XML-based tools (SVG, SOAP, RSS, SMIL, XSL, XSLT, databases, etc.).”

    This is pretty hand-wavy. On the server side, you can transform your backend data into valid XHTML, but you can transform it into valid HTML 4.01 Strict just as easily. Heck, some people do both. On the client side, there is a tiny, tiny fraction of super-geeks who embed SVG or MathML directly in their valid and properly MIME-typed XHTML pages. These super-geeks are the only people in the world who have to present outward-facing XHTML; the rest of us are just fooling around.

  6. “It brings consistency to web documents that was missing in HTML.”

    I assume this means “more consistency in coding style.” XHTML theoretically enforces more consistency due to its more rigorous syntax… but since the vast majority of people can’t be bothered to produce valid XHTML code, this benefit is somewhat obviated.

  7. “It’s a bridge to more advanced (future) XML languages and applications and perhaps to more advanced versions of itself (XHTML 2?).”

    Heh.

The presentation then goes on to cite nine websites that are using structured XHTML and CSS, including some big names such as ESPN and Wired. None of the sites serve up their pages as application/xhtml+xml to browsers that accept this MIME-type, which means that each site is being treated as… you guessed it, good old fashioned HTML. And that’s a damn good thing, because four of the sites have invalid home pages, and four others have invalid secondary pages just one click away. The only site diligent enough to pass the “Laziness Test” is the CSS Zen Garden. (To his further credit, the creator of the Zen Garden is considering the MIME-type issues as we speak.)

Anyway. The point is not that cutting edge designs are bad; they’re not. The point is not that “Evan hates XHTML”; I don’t. XHTML allows you to do some amazing things that you can’t do with HTML. Unfortunately, due to the dreadfully primitive state of XML browsers and tools, there’s really nobody using XHTML for anything that you can’t do with HTML.[1]

The real problem is that XHTML is being touted as a replacement for HTML. It’s not. XHTML is a different technology that suits different purposes. There are a lot of influential people who are blurring this distinction, and I’d like to think that they know better.

1. Except for a few oddball physicists, mathematicians, and chemists, but who’s counting them?

XHTML2 Explorations, Part II

In the last installment of XHTML2 Explorations, we touched on how XHTML2’s attribute collections provide a rich variety of behaviors for nearly all elements. Today’s installment focuses on the individual elements themselves. As with Part I, this post does not discuss “well-known” XHTML2 concepts.1

New Paragraph Model

XHTML2 gives us a new twist on our old friend the <p> tag. In previous versions of HTML and XHTML, you could not nest other block elements inside a paragraph. Now you can nest any block element except for another paragraph. In other words, you can now consider a list or table to be semantically part of a paragraph.

This leads to an interesting result (first brought to my attention by Jacques Distler). Consider the following (invalid) markup2:

  <p style="color: red">
    Why is Gordon better for Dana than Casey?
    <ul>
      <li>Knows mayor Giuliani</li>
      <li>Can dress self w/o assistance from Wardrobe</li>
      <li>Obvious physical prowess</li>
      <li>Two!! post-graduate degrees</li>
    </ul>
  </p>

Since a paragraph can’t contain lists, a standards-compliant browser would end the paragraph just before the start of the unordered list. Thus the words, “Why is Gordon…” would be colored red while the list items would remain unchanged.

However, under XHTML2’s new paragraph model this code is valid, and everything inside <p> and </p> would be red. I have to admit that from a coding standpoint, I like this. If nothing else, it matches my naive expectations a little better. (“Hey, why isn’t my list red?” the newbie web designer wonders…)

From a semantic standpoint, I’m a little less sure about this. I usually don’t think of my tables as being nested inside my paragraphs. Neither does FrameMaker, for that matter. Is FrameMaker broken? Am I broken? Well, maybe. The good thing about this model is that it’s totally optional — you can nest stuff inside paragraphs, or not. Works for me.

Scripting

You can now nest <script> elements and process them like the <object> element. If the browser understands the parent script, execute it; otherwise go on to the child script(s). It’s a nifty model, albeit one that is not backwards compatible. Whoops — I forgot, it’s not fair for us to harp on that. Sorry. Anyway, the model itself looks good, particularly if you want to script with multiple languages. The spec does take great pains to mention that there are scripting languages besides JavaScript, such as… type="text/x-perl". Hmmm.
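A sketch of the model (adapted from my reading of the draft, so the exact details may be off): the browser tries the outermost script first, falls back to each nested script in turn, and renders the innermost content if nothing works.

```html
<script type="text/x-perl" src="fancy.pl">
  <script type="text/javascript" src="fancy.js">
    <p>Sorry, your browser understands neither Perl nor JavaScript.</p>
  </script>
</script>
```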

There is also a new declare attribute for both scripts and objects. This boolean attribute specifies whether the script or object is a “declaration only”, meaning that it is not to be processed until the user initiates some sort of action. And speaking of actions, there’s a brand-new events model defined by the XMLEVENTS standard (which is pleasantly short and easy to read). XMLEVENTS allows you to set any element as the observer or handler for standard DOM events. It also provides a generic <listener> element that can pass events off to handlers. The XMLEVENTS spec looks to be far more comprehensive and flexible than our current model, which involves slathering our code with onmouseover attributes and whatnot. The only catch is that XMLEVENTS has totally replaced the existing model, which means our current scripts all just went POOF! Argh, there I go again…
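For example (my paraphrase of the sort of markup the XMLEVENTS draft shows; treat the attribute values as approximate):

```html
<input type="button" id="ok" value="OK"/>

<!-- When the "ok" button observes an activation event,
     pass it off to the handler with id "okHandler". -->
<listener event="DOMActivate" observer="ok" handler="#okHandler"/>
```

No onclick attribute slathered on the button itself; the wiring lives in the <listener> element.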

The spec also declares the death of document.write:

Note that because of the XML processing model, where a document is first parsed before being processed, the form of dynamic generation used in earlier versions of HTML, using document.write cannot be used in XHTML2. To dynamically generate content in XHTML you have to add elements to the DOM tree using DOM calls [DOM] rather than using document.write to generate text that then gets parsed.

Well, that’s fair enough. The W3C seems to be saying, “Look guys, this is XML here. Not HTML, not some halfway-step that decays back into friendly tag-soup if you make a mistake — this is the real stuff here.” I suppose the real question is whether they’re going to forbid XHTML2 to be served up as text/html. I can’t find any discussion of MIME-type issues in the current spec, so it’ll be interesting to hear the W3C’s final decision on this.
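For the record, the sanctioned replacement looks something like this (a minimal sketch; it assumes an element with id="target" somewhere in the page):

```html
<script type="text/javascript">
  // The old way: document.write("<p>Hello, world!</p>");
  // The DOM way: build the node, then graft it onto the tree.
  var p = document.createElement("p");
  p.appendChild(document.createTextNode("Hello, world!"));
  document.getElementById("target").appendChild(p);
</script>
```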

Miscellany

  • There’s a new <quote> element for marking up inline quotes. Unlike its ill-fated predecessor <q>, the <quote> element does not automagically insert localized quotation marks. “We give up,” the W3C is saying. “Insert your own damn quotation marks.”

  • Tables are relatively untouched. The one noticeable change is that the summary attribute is now an element, presumably to provide a facility for richer descriptions. The spec makes no recommendations for how to display this content in visual browsers. I think we can assume that the default should be display: none, but you never know.

    I also noticed that the spec requires tables to have one or more <tbody> elements… but as it turns out, this requirement dates all the way back to HTML4! I must say this comes as quite a shock, given that the validator happily accepts pages with <tbody>-free tables as HTML 4.01 Strict. I’m looking at the spec again right now, and I’m still a bit freaked out over this. Do I not know how to read the spec? Am I hallucinating because I skipped lunch in preparation for the big Fourth of July BBQ? We report, you decide. (Edit: it turns out I can’t read the spec. Never post while under the influence of carbohydrate deprivation.)

  • Finally, Ruby text is now a standard module. This is good news for hundreds of millions of Asian-language speaking users… at least, I think. Actually, this raises the question: if Ruby text is a fundamental component of Asian typography, why are we forcing Asian users to go all the way to XHTML2 to use it? It seems like it would be useful and straightforward to retrofit Ruby text onto HTML. Then again, since HTML is officially a dead specification, the point is moot.

Conclusions

Errr… who needs conclusions? It’s barbeque time. Happy Fourth!

1. My definition of “well-known” is “I remember hearing about it vaguely on someone’s blog somewhere.”

2. As for whether the pro-Gordon argument is as invalid as the markup… well, I leave that as an exercise for the reader.

XHTML2 Explorations, Part I

You read that right: “XHTML2 Explorations”. Yes, the fun never stops here at goer.org.

I’ve decided to take a closer look at XHTML2, or more specifically, XHTML2 Working Draft 6. I’ll admit that I haven’t done a good job slogging through the thousands of messages on the W3C lists. I’m just a casual observer.

Fortunately, there’s been plenty of weblog chatter over the <object> replacing <img>, <cite> getting dropped and then added back, the excitement over <blockcode>, the battle over the style attribute, navigation lists, the new <h> and <section> model, the “href on everything” model, and more. The Alphas have been discussing these issues for months, and we Gammas have been well-served by just listening in.

But even with all the healthy public discussion, XHTML2 is a big specification (430 KB and counting). At least for my own edification, it’s time to see how deep the rabbit hole goes.

A Swarm of Attributes

XHTML2 provides a huge number of common attributes that are divided up into collections. Most XHTML2 elements accept attributes from all collections. The most well-known example of this is the “everything is a hyperlink” concept, wherein you can turn any XHTML2 element into a link by applying the href attribute.
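So in XHTML2, this sort of thing becomes legal (a sketch; the filenames are made up):

```html
<ul>
  <li href="blink.html">The &lt;blink&gt; tag</li>
  <li href="marquee.html">The &lt;marquee&gt; tag</li>
</ul>
```

No wrapping each item in an <a> element; the list items are the links.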

But of course there’s much more. Consider the set of “common” attributes in HTML 4.01: there’s id, style, class, dir, lang, title, and the “event” attributes (such as onmouseover). This yields a little over 15 attributes, depending on how you’re counting. In contrast, XHTML2 already provides around 30 common attributes. Let’s take a closer look.

The Edit Collection

The Edit Collection provides an edit attribute with four allowed values: inserted, deleted, changed, and moved. Presumably if something is moved, you can specify where it moved to by using the href attribute. There’s also a datetime attribute for specifying the timestamp, the format of which is defined in XML Schema, which references ISO 8601, which has since been revised. Whew.
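In practice (my own sketch, extrapolated from the draft rather than copied from it):

```html
<p>The meeting is on
  <span edit="deleted" datetime="2003-10-01T09:00:00Z">Tuesday</span>
  <span edit="inserted" datetime="2003-10-01T09:00:00Z">Thursday</span>
  at noon.
</p>
```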

The default presentation should be display: none for deleted markup, while the other three types should be displayed as-is. If the XHTML2 browsers of the future have solid CSS2 support, though, we can override that default:

  *[edit="deleted"] {
    display: inline;
    color: red;
    text-decoration: line-through;
  }

I’ll admit I like the idea of having a simple change record facility. However, there doesn’t appear to be an editedby attribute, which seems like a bit of an oversight.

The Embedding Collection

Through the Embedding collection, each element can have a src attribute. The browser attempts to replace the element’s content with the embedded file or resource. If the embedding fails for some reason, the browser proceeds to process the contents of the element. The spec provides an example involving a table:

  <table src="temperature-graph.png" type="image/png">
    <caption>Monthly temperatures</caption>
    ... (lots of table rows and cells) ...
  </table>

The spec also declares,

Note that this behavior makes documents far more robust, and gives much better opportunities for accessible documents than the longdesc attribute present in earlier versions of XHTML, since it allows the description of the resource to be included in the document itself, rather than in a separate document.

I scratched my head over the new src attribute for a while… but if you think of it as a replacement for the longdesc attribute, then it sort of makes sense. A couple of points, though.

First, a quibble with the W3C’s example — I’m not quite sure how replacing a perfectly good XHTML table with a PNG image constitutes a great leap forward under any circumstances.

Second, the W3C recommends:

This collection causes the contents of a remote resource to be embedded in the document in place of the element’s content. If accessing the remote resource fails, for whatever reason (network unavailable, no resource available at the URI given, inability of the user agent to process the type of resource) the content of the element must be processed instead.

Maybe I don’t understand this statement correctly, but I take the “instead” to mean that browsers should not continue to process content if the remote resource is accessible. If so, I certainly hope that the browsers ignore this recommendation. The browser doesn’t need to display the child content directly, but it needs to process the entire document and provide access to the child content somehow. Otherwise, accessibility gets worse, not better.

Finally, this concept opens up all sorts of interesting UI issues. Take the <table> example above. If I use my browser’s “Find Text In This Page” function, will the browser search the text in the table cells? How will it highlight successful matches?

The Cite Attribute

The Hypertext collection permits any element to have a cite attribute. At first glance, I thought that this made the <cite> element redundant. But as Mark Pilgrim points out in Semantic Obsolescence:

“No, the cite tag and the cite attribute are not the same thing. The cite attribute is a URL; the cite tag is wrapped around actual names within your text.”

Fair enough. However, I’ve been thinking that this might be an oversight on the part of the W3C, and the cite attribute should be allowed to contain arbitrary text. For example:

  <h1>Ask Dr. Science!</h1>

  <p>Q: Dr. Science, why is the sky blue? -- Ashley, age 8</p>

  <p>A: Glad you asked, Ashley!  The answer is simple, really:</p>

  <blockquote
    cite="Jackson, J.D. 1975. Classical Electrodynamics, 2nd. ed. 
          New York: John Wiley and Sons">
  <p>
    The scattering of light by gases, first treated quantitatively by Lord
    Rayleigh in his celebrated work on the sunset and blue sky, can be 
    discussed in the present framework.  Since the magnetic moments 
    of most gas molecules are negligible compared to the electric dipole
    moments, the scattering is purely electric dipole in character.  In 
    the previous section we have discussed the angular distribution and
    polarization of the individual scatterings (see Figure 9.6).  We 
    therefore confine our attention to the total scattering cross section
    and the attenuation of the incident beam.  The treatment is in two 
    parts...
  </p>
  </blockquote>

The cite attribute provides an elegant way to scope sections of a document as belonging somewhere else, so why limit it only to stuff on the web? As for the <cite> element, it would still be good for explicitly marking up citations (such as the ones found at the end of a journal article). Well, just a thought.

New LinkType Options

The rel attribute, once the sole province of the <link> and <a> elements, is now universal. The allowed values are defined by the LinkType data type. There are three new options:

  • parent: You may now specify a link as a parent document. Strangely, you can’t specify your children or siblings. After all, it’s kinda hard to construct a full tree without information about the children. Oh heck, let’s just say it: won’t somebody think of the children?

  • meta: The link “provides metadata, for instance in RDF, about the current document.” This allows you to place your metadata directly in the body content of your document. Not sure why you would want to do this rather than use the good old-fashioned <meta> element, but ours is not to question why.

    Speaking of the <meta> element, the spec states that “A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user.” The idea that people will provide accurate keywords in the first place, let alone scope these keywords appropriately according to language, seems a quaint notion at best.

  • p3pv1: When I first read this, the first thought that popped into my head was, “What’s the ‘v1’ doing in there?” The second thought that popped into my head was, “What’s the p3p doing in there?” I have no idea why we would want to reference a particular technology here, let alone a particular version of a particular technology. The W3C should rename this one to “privacy” post-haste.
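
For what it’s worth, since rel is now universal, a document could declare its parent either in the head or inline in the body. A hypothetical fragment (the file name is my own invention):

  <link rel="parent" href="xhtml2-explorations.html"/>
  ...
  <p>Up to the <a rel="parent" href="xhtml2-explorations.html">series
  index</a>.</p>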

Note: Part II is now available.