So I finally gave the new beta W3C Validator a spin. The new validator has a number of new features and bug fixes, including plain-English error explanations (yay!) and support for the application/xhtml+xml MIME type (double yay!).
One of the most dramatic changes is the addition of “fussy” parsing, wherein the validator dings you for “things that are not strictly forbidden in the HTML Recommendation, but that are known to be problematic in popular browsers.” To my horror, I discovered that while my site passes standard validation, it fails “fussy” validation. Now, here’s where things get tricky. Unlike a purely mechanical validation against a schema or DTD, Fussy Validation is getting into a more complicated and subjective realm. Unfortunately for me, the Fussy Validator is right in my particular case. My error is straightforward to fix, but I’m going to leave it in place for the next few days, just so that you can see that my site doesn’t validate.[1] We’ll come back to this in a minute.
There are two problems with the Fussy Validator as it stands. The first problem is a simple UI problem, which we can easily attribute to the fact that the validator is beta software. When validated, my site yields a big red error message, “This page is not Valid HTML 4.01 Strict!” The problem is that this error message is demonstrably false. My site is HTML 4.01 Strict; it just fails Fussy Validation. Still, I have no doubt that the 1.0 version of the software will no longer conflate Fussy Validation with the official W3C Recommendation itself. One big step in this direction (besides changing the incorrect text) would be to lose the confusing Giant Red Bar of Total Rejection and replace it with something more subtle and tasteful, such as the Medium-Sized Yellow Bar of Wrinkly-Nosed Disapproval. But that’s up to the W3C’s UI gurus, I suppose.
The second problem is a bit more subtle. To go further, we’ll need to delve into A Little History.
A Little History
Ages ago, our ancient web designer ancestors had only the most primitive tools at their disposal. Fragments of web page source recovered from Lascaux, France reveal evidence of only the simplest table tags: <table>, <tr>, <td>, <th>, and <caption>. But despite this handicap, our ancestors managed to construct extraordinarily complex tabular structures. In fact, most present-day web designers continue to pay our ancestors homage by creating tables with the exact same set of building blocks.
Of course in our enlightened modern age, we have many more tools available. Chief among these are the <tbody>, <thead>, and <tfoot> tags. These tags allow you to semantically specify the body, head, and foot of a table. And here’s where my site comes in. The Fussy Validator doesn’t like the fact that I have a table on my site (the calendar in the sidebar) that lacks a body and head (or foot). And in my case, the Fussy Validator is right. My calendar table should have a body and head.[2]
But that doesn’t mean the Fussy Validator is off the hook yet.
A cursory examination of the spec might give one the impression that tables must have a <tbody> and a <thead> or <tfoot>. Coincidentally, I reached that conclusion just a few months ago, during Part II of my XHTML2 analysis. At first I did the only rational thing for a Markup Geek to do when faced with such a shocking discovery. I panicked. Fortunately Jacques Distler came to the rescue. “There, there,” he said soothingly. “All is well with the world.”
Jacques pointed out that the relevant text is a little further down in the spec: “The TBODY start tag is always required except when the table contains only one table body and no table head or foot sections.” In other words, if your table doesn’t have a head or foot and you only have one table body, you can just leave well enough alone. And if you think about it, this makes sense. There’s all sorts of gridlike or tabular data that simply doesn’t have a head or foot, and in that case the <tbody> tag is just redundant.
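So a simple headless data table like this one validates perfectly well as-is: there is exactly one (implicit) table body and no head or foot, so the <tbody> start tag may be omitted. (The data here is invented for illustration.)

```html
<!-- One implicit body, no <thead> or <tfoot>:
     the <tbody> start tag is legitimately optional -->
<table summary="High scores">
  <tr><td>Alice</td><td>4200</td></tr>
  <tr><td>Bob</td><td>3100</td></tr>
</table>
```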
The problem with the Fussy Validator is that it can’t possibly know which tables do need a head, foot, and body, and which ones don’t. That’s a fundamental limitation of Fussy Validation in the first place. Fussy Validation has to somehow understand the meaning of the code you’re writing, not just the structure. Of course, we’ve seen this problem before. Author Joe Clark rails against this very issue in Building Accessible Websites. He’s talking about Bobby, the automated accessibility checker, but the issue is the same:
What we have here is a computer program that threatens to withhold its certification badge (of dubious value in any case) if you didn’t write clearly enough. How does it know the difference, exactly? You probably get enough of that kind of bellyaching at home. Do you also need it at work?
My advice is simple: Do not use Bobby. Do not rely on software as dumb as a dromedary to evaluate accessibility.
I certainly don’t feel that the Fussy Validator is as “dumb as a dromedary,”[3] but Joe’s basic point stands. No software program can truly evaluate something as nebulous as “accessibility” or “proper coding practices.” Fussy Validation is an interesting concept, but it should probably be treated as an advanced option and turned off by default. If people start conflating Fussy Validation with Real Validation, we’re going to be in for a bumpy ride.
1. And not because I’m feeling too lazy tonight to change and rebuild all my MT templates. I am thinking of nothing but the educational benefits for you, dear reader.
2. The head would be the row of weekday abbreviations, and the body would be everything else.
3. For one thing, I seriously doubt that any publicly-available software has managed to reach the intelligence level or complexity of even a spirochete.
Fussy, schmussy!
I don’t see what you’re complaining about. My site doesn’t validate at all in the beta validator:
http://validator.w3.org:8001/check?uri=http%3A%2F%2Fgolem.ph.utexas.edu%2F%7Edistler%2Fblog%2Findex.shtml&verbose=1&fussy=1
Anyway, since all sorts of bozotic constructions
http://www.annevankesteren.nl/archives/2003/09/19/invalid-after-validated
http://golem.ph.utexas.edu/~distler/blog/archives/000223.html
are valid (X)HTML, Fussy Parsing is even less realistic than you make out.
Oy, it’s even worse than I thought. Well, like the incorrect error message, I hope we can chalk that up to the fact that the new validator is beta software. I mean, at the very least the new validator should support the same level of validation functionality as the old one.
As for the bozotic constructions… I suppose this is what Fussy Parsing would theoretically be good for, right? Fussy Parsing enables the validator to “frown on” the crazy stuff that the DTD technically allows. Of course Fussy Parsing is never going to be all *that* smart, so caveat validator and all that.
“I suppose this is what Fussy Parsing would theoretically be good for, right? Fussy Parsing enables the validator to ‘frown on’ the crazy stuff that the DTD technically allows.”
I doubt it. Given your experience with <table>s, the “fussy” parser appears to be stupider than regular validation. The bozotic constructions Anne and I were talking about — <p> can contain <strong> as a child element, <strong> can contain <p> as a child element, ergo <p> can contain <p> as a grandchild (or more distant descendant) — can only be caught if the fussy parser is willing to descend the whole document tree looking for “non-nestable” elements as (arbitrarily distant) descendants.
It’s doable but, for any reasonable efficiency, requires considerably more intelligence than you have shown the Fussy Parser to possess.
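A brute-force version of that whole-tree descent is easy enough to sketch. Here is a toy Python illustration — my own invention, nothing to do with the actual validator’s internals — that keeps a stack of open elements and flags a <p> start tag appearing anywhere inside another open <p>, however deep:

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Flag 'non-nestable' elements that appear as (arbitrarily
    distant) descendants of an open element of the same name."""

    NON_NESTABLE = {"p"}  # a real checker would need the full list

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.errors = []     # (tag, line) for each offending start tag

    def handle_starttag(self, tag, attrs):
        # The key move: check the whole ancestor stack, not just the parent
        if tag in self.NON_NESTABLE and tag in self.open_tags:
            self.errors.append((tag, self.getpos()[0]))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the matching open tag
            while self.open_tags.pop() != tag:
                pass

checker = NestingChecker()
checker.feed("<p><strong><p>a paragraph inside a paragraph</p></strong></p>")
print(checker.errors)  # → [('p', 1)]
```

As written this is linear in document size, since the ancestor check only walks the open-element stack — but as you say, doing it well (with the full set of non-nestable elements, implicit end tags, and sensible error recovery) takes considerably more care than this sketch suggests.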