X-Purgation

I was home working on the novel, minding my own business, when suddenly the door crashed in. A dozen men in black body armor swarmed through the splintered doorframe. “FEDERAL AGENTS! PUT DOWN THE LAPTOP!” I raised my hands, confused and frightened. What on earth had I done?

Then a blue-eyed, middle-aged man in a charcoal suit strode into the room. I gulped. “Oh my God, it’s United States Congressional Representative Michael G. Oxley (R-OH)!”

“That’s right, Evan,” Rep. Oxley said. “You’ve been a bad boy, I hear.”

“Those Sports Night DVDs are entirely legitimate,” I said. “Look, I can find the receipt…”

Oxley waggled his finger. “I’m talking about… the X-Philes.”

My blood ran cold. “The X-Philes?”

“Yes,” he said. “We have reports that substantial numbers of X-Philes sites are dead. Or worse, gone back to tag soup.”

I shifted uncomfortably in my seat. “Well… I was kinda counting on self-policing…” Rep. Oxley looked disgusted. The federal agents looked at each other, shaking their heads. “Keee-rist,” I heard one say.

“Son,” Oxley said, putting his hand on my shoulder, “I’m sure this is all an honest mistake. I’m sure you didn’t mean to be so trusting, and that a thorough audit will set this right. Riiight?”

“Y-Yes, sir,” I said, sinking even lower in my seat.

So at great personal expense, my crack team of PricewaterhouseCoopers auditors worked night and day to compile a full compliance report. Of the three X-Philes tests, the auditors only conducted the first test (“The Simple Validation Test”) and the third test (“The MIME-type Test”). As for validating three secondary pages (“The Laziness Test”), I am sorry to report that we simply could not include this metric in the test suite given current budgetary constraints. Look, when these guys are costing $54,000 a week, you have to cut corners somewhere.

Of the eighty X-Phile sites before the purge,[1] twenty-two (27.5%) were no longer compliant. The twenty-two non-compliant sites broke down as follows:

  • Six former X-Philes (27%) invalid and serving text/html.
  • Eight former X-Philes (36%) valid but back to serving text/html.
  • One former X-Phile (5%) invalid but still well-formed and serving up application/xhtml+xml.
  • Six former X-Philes (27%) dead or hijacked.
  • One former X-Phile (5%) hijacked and turned into a porn search engine / linkfarm node, complete with annoying pop-unders.

Which is actually not as bad as I thought… particularly the number of dead and hijacked sites, which seems on the low side.

Over the last year and a half, some X-Philes have been thoughtful enough to inform me about changes in XHTML compliance, and for that, I thank them. I also don’t mind it when sites go dark — that’s just inevitable. These folks are all welcome back at any time. As for the folks who deliberately switched back to tag soup without bothering to ping me, well, that’s a horse of another color.

I had never planned on doing any maintenance of the X-Philes list, and to be honest, I’m not sure how long the list itself will be around. (I’m pretty slow getting back to people on new submissions already — a sure sign of growing boredom/indifference on my part.) Nevertheless, it seems like a yearly purge is probably a good idea. I’m not the XHTML Police, but I would like the list to stay somewhat current.

By the way, this morning I got a phone call from Rep. Oxley. After exchanging some pleasantries, he informed me that next year, Senator Paul Sarbanes would be checking in on me.

“Senator Sarbanes is coming here?” I asked.

“That is correct, Evan. And he is most displeased with your apparent lack of site maintenance.”

“We shall redouble our efforts,” I said.

“I hope so, for your sake. The Senator is not as forgiving as I am.” Then the line went dead.

1. Not counting Beandizzy.com, which has been known to be dead/hijacked for many months. Beandizzy will always maintain its spot as X-Phile #2, for sentimental reasons.

27 thoughts on “X-Purgation”

  1. Hah, and people still dare to claim XHTML is simple and good for the web.

    /me wishes he had been as smart as Evan and never chosen it in the first place.

  2. Good for the Web? Yes. Simple? No.

    Sadly, I don’t see anyone building the next generation of CMSs, in which generating well-formed XHTML content is simple, straightforward, and all-but-foolproof.

    Until that happens, I don’t think Evan’s list is going to grow much longer than it is. (When it does happen, I suppose it will make Evan’s list superfluous.)

    In any case, I don’t think the future direction of the Web will be decided by advocacy (“Oh, you should use HTML4.” vs. “Oh, you should use XHTML.”). Instead, it’s compelling content that will determine the fate of XHTML.

    Maybe y’all need to start playing with the design possibilities inherent in inline SVG, or the tight integration of Web-Services and XHTML pages. There are definitely cool things you can do with XHTML (besides including MathML content). Finding such a use case will definitely spur adoption of XHTML (and of browsers which support it).

    Since I do think that XHTML is the future, I don’t think your experiments with it are a waste of time. But I sure wish we didn’t have to waste our time “bulletproofing” our CMSs instead of doing something cool.

  3. > Finding such a use case will definitely spur adoption of XHTML (and of browsers which support it).

    Rather, browsers which support it will encourage people to find use cases. There have been noises in the Mozilla community about getting a profile of SVG shipping in some fairly-near-future Firefox releases. Indeed, there has been some recent SVG activity. But even if it goes in, people won’t use it inline in general until IE supports it (of course, SVG in general might be used, since there is an Adobe plugin; but that doesn’t depend on XHTML).

    Even if IE did start to support SVG, the fact that there are apparently few XML-based CMS systems (a few exist – e.g. Syncato, for which you’ll need a high tolerance for XSLT) and, moreover, that little existing content is stored as valid XML means that XML on the web is unlikely to take off in the near future.

  4. I had been hoping that in encouraging people to run up against the wall of XHTML, the X-Philes might have some tiny effect on innovation. People would figure out that XHTML was in fact really hard, and once that happened, they would:

    A) ask themselves why they needed to use XHTML,
    B) start thinking of new, consumer-friendly use cases for XHTML, and
    C) create pressure for a bulletproof, XML-aware CMS, a CMS that would finally put the X-Philes out of business.

    What actually happened was that people figured out that XHTML was really hard… so they decided to label their tag soup “XHTML” anyway, pat themselves on the back, and ship it. People want that “X”, and they’ll use any rationalization they can think of to slap that DOCTYPE at the top of their pages and call it a day.

    In this sense, the X-Philes have been a colossal failure. It has not even remotely slowed all the “elite” designers from promising high-profile clients “forward compatibility” via non-well-formed, text/html “XHTML” designs. My only hope is that when all the use cases actually do come into full flower, and the clients discover that they have to completely rewrite the front end of their CMS again, they will ask said elite designers for their money back.

  5. “But even if it goes in, people won’t use it inline in general until IE supports it”

    If you’re waiting for IE to (natively) support anything, you will wait forever. This isn’t an XHTML issue per se.

    At best, you might hope to convince users to download a plugin. (I’ve had pretty decent success convincing users to download the MathPlayer plugin.) At worst, you can use the fact that, since your content is inline, you can use the DOM to provide IE users with alternative content.

    It’s not easy, but that’s the nature of new technology. Look at the long struggle to get users to migrate away from “Generation-4” browsers.

    BTW, syncato sounds neat. Certainly, an example of what could be done with the right tools.

  6. “In this sense, the X-Philes have been a colossal failure. It has not even remotely slowed all the ‘elite’ designers from promising high-profile clients ‘forward compatibility’ via non-well-formed, text/html ‘XHTML’ designs.”

    If anything, the “pushback” from top-drawer designers like Mike Davidson has been in the opposite direction. Ill-formed tag soup is a badge of honour for him. And I’m afraid his point of view is much more popular than that of anyone who thinks that “forward compatibility” is anything more than a marketing slogan.

  7. I do question why I use XHTML. I have no good reason, but always make sure my pages validate and are served as proper XHTML. It’s really not that hard to automate validation.
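
    For what it’s worth, the well-formedness half of the check only takes a few lines. A rough sketch, assuming PHP 5 and its DOM extension (the function name and sample markup are made up); full validation against the DTD still needs a real validator:

        <?php
        // Feed the generated page to an XML parser before publishing,
        // and refuse to ship it if parsing fails.
        function isWellFormed($markup) {
            libxml_use_internal_errors(true);   // collect parse errors quietly
            $doc = new DOMDocument();
            $ok  = $doc->loadXML($markup);      // false if not well-formed
            libxml_clear_errors();
            return $ok !== false;
        }

        $page = '<p>Hello, <em>world</em></p>'; // whatever the templates produced
        if (!isWellFormed($page)) {
            // bail out: log the error and serve yesterday's copy instead
        }
        ?>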

    Why use XHTML over well-formed HTML (i.e. tags closed in an XML rather than SGML fashion)? No idea.

    I may switch to HTML if XHTML is making my life too difficult, but for now it’s alright.

    Some people will say I’m nuts, but I’m really looking forward to XHTML 2.0. I want section and h tags. Far more flexible in terms of moving around and embedding data fragments without having to think too hard about header depth.

  8. I’d argue that Jacques is the only one of the X-Philes that actually needs XHTML at the moment. Musings is a great showcase for what XHTML was intended to do.

    Like Jacques, however, I believe that XHTML is an important piece of the web future. I started using XHTML in the hope that it would make it easier to use things like SVG somewhere down the line. jessey.net would have been an X-Phile if I had converted my older pages to be served as application/xhtml+xml, but since some of them are HTML 4.01 for a reason, it would have been silly. As it is, I am happy with the soon-to-be-redesigned Keystone Websites being on the list – particularly because of the article about serving XHTML with the correct MIME type, because it promotes the technology.

    I think Anne van Kesteren has the right approach. He advocates using the right markup language for the right job, instead of blindly using XHTML for everything. As a part of that, I think it is important that XHTML is not misused. Strict flavors of XHTML should have to pass the acid tests of both the validator and an XML parser; otherwise it will be necessary for user agents to have the capability of handling XHTML as tag soup HTML.

  9. > BTW, syncato sounds neat. Certainly, an example of what could be done with the right tools.

    Syncato is neat. OTOH, last time I tried to get it running on my machine it would allow access to the admin interface but reliably crash when trying to access the main site. It is also, if not abandoned, then at least neglected. There has been no visible development for months.

    > At worst, you can use the fact that, since your content is inline, you can use the DOM to provide IE users with alternative content.

    True. But there are almost no big sites willing to stick their necks out and provide a better experience for people not using IE. I suppose if Firefox is a continued success that might change. Remember, despite the gloom, it’s only very recently that an XHTML-enabled browser has reached 5% of the market (the reliability of such statistics notwithstanding).

    > It’s really not that hard to automate validation.

    Automating validation isn’t really the issue. In principle, anyone can add a ‘validate the document’ step to their workflow. The issues are:
    a) Understanding the errors validation produces
    b) Making sure it’s impossible for any site content to bypass validation
    c) Converting old, malformed content.

    b) has a technical solution: make an XML parser the heart of your CMS. Have it pass around DOM nodes instead of strings. Finally, serialize the DOM tree to text at the last possible moment. That’s what’s needed from next-generation CMS systems (a sketch follows at the end of this comment).
    a) is a hard human-computer interaction problem. The cleanest solution is to make it impossible for users to input malformed code – one can imagine a content editor itself based on an underlying DOM-like representation of the document (Mozilla Composer does this); that might work to prevent XML-illiterate users from having to deal with validation errors (of course, enforcing validity as well as well-formedness is even harder, and adapting the system to deal with user-specific requirements is worse still).
    c) is the really hard problem. If people want to move to XML they either have to treat their old content as an untouchable archive or have some mechanism for cleaning it up and integrating it with the new system. That means a lot of labour. Such a process can, at best, be semi-automatic. Unless there is a really killer reason to send XML to visitors, this last point may be the death of XML on the web.
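
    To put (b) in concrete terms, the pipeline would be shaped roughly like this (a sketch only, with invented names, assuming PHP 5’s DOM extension); the point is that entries arrive as data, and the angle brackets only appear at the very end:

        <?php
        $doc  = new DOMDocument('1.0', 'UTF-8');
        $html = $doc->createElementNS('http://www.w3.org/1999/xhtml', 'html');
        $body = $doc->createElement('body');
        $doc->appendChild($html);
        $html->appendChild($body);

        // Content is added as DOM nodes, never as raw markup strings,
        // so escaping and nesting are handled by the parser itself.
        function appendEntry(DOMDocument $doc, DOMElement $parent, $title, $text) {
            $div = $doc->createElement('div');
            $div->setAttribute('class', 'entry');
            $h2 = $doc->createElement('h2');
            $h2->appendChild($doc->createTextNode($title));
            $p = $doc->createElement('p');
            $p->appendChild($doc->createTextNode($text));
            $div->appendChild($h2);
            $div->appendChild($p);
            $parent->appendChild($div);
        }

        appendEntry($doc, $body, 'Hello & welcome', 'Well-formedness "for free".');

        // Serialize once, at the last possible moment.
        header('Content-Type: application/xhtml+xml; charset=UTF-8');
        echo $doc->saveXML();
        ?>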

  10. Gary wrote:

    “Why use XHTML over well-formed HTML (i.e. tags closed in an XML rather than SGML fashion)? No idea.”

    What advantage would that confer?

    It’s valid HTML (there’s no such notion of “well-formedness” for SGML), but so is omitting optional closing tags. As far as an SGML parser is concerned, they are identical.

  11. “It’s valid HTML (there’s no such notion of “well-formedness” for SGML), but so is omitting optional closing tags. As far as an SGML parser is concerned, they are identical.”

    This I know. Several reasons for what I said:

    a) I much prefer XML syntax to SGML syntax. I like things being closed and nested properly. I’m a programmer, so it’s something that has been drummed into me for as long as I’ve used computers.
    b) Technical reasons. Well-formed HTML will work with XML parsers in practice. Why not use SGML parsers? Because they are not as prevalent as XML parsers right now, and will become even more marginalised in the coming years. The chances of new platforms having SGML parsers (more readily available or at all) are getting slimmer.
    c) Political. Buzzwords help sell things. Sad, but true. XML is the king of buzzwords right now, so being able to use XML in any form will help.

  12. Gary wrote:

    “a) I much prefer XML syntax to SGML syntax. I like things being closed and nested properly. I’m a programmer, so it’s something that has been drummed into me for as long as I’ve used computers.”

    So use it. And be sure to indent the <li> items in your lists. And comment your code. And so forth.

    These things are geared to humans and make your code more maintainable. That’s good.

    But the machines couldn’t care less.

    “b) Technical reasons. Well-formed HTML will work with XML parsers in practice.”

    You’re actually going to throw HTML4 at an XML parser? Oooh la la! Have fun!

    “Why not use SGML parsers? Because they are not as prevalent as XML parsers right now, and will become even more marginalised in the coming years. The chances of new platforms having SGML parsers (more readily available or at all) are getting slimmer.”

    I don’t think HTML parsers are going to go out of fashion anytime soon.

    While there are lots of cool tools for manipulating XML, I wouldn’t try throwing HTML4 (even “carefully-authored” HTML4) at them.

    “c) Political. Buzzwords help sell things. Sad, but true. XML is the king of buzzwords right now, so being able to use XML in any form will help.”

    I think XML is the bee’s knees. So what? If you really want XML, author XML, and take the requisite steps to ensure well-formedness. Otherwise, you’re just setting yourself (or your clients) up for a disappointment.

  13. Jacques: I have thrown this sort of HTML4 at XML parsers before (a previous boss told me to do so). Worked fine. Not a single problem.

    I’m not saying SGML parsers will die off, I am saying that they might not be written for future platforms. I’ve already seen this for one (admittedly esoteric but very cutting-edge) platform with J2ME support.

    Probably worth mentioning that I’ve fairly convinced myself, by my earlier arguments, to stick with XHTML. That’s what it’s useful for.

  14. I’m not too concerned about standalone SGML parsers. More pertinent to the discussion is the HTML parser, which includes a combination of SGML rules for the good Web Citizens, plus a huge amount of robust error-correction code for everyone else.

    If you were worried about HTML parsers disappearing, fear not, all signs point in the opposite direction. Nowadays the trend is to build in an HTML parser as a fundamental OS component. (Which makes sense, because everyone needs an HTML parser, and by all accounts they are bitchy hard to write.) The point is that the longevity of our old friend the HTML parser is assured, at least until well after everyone stops opening up their favorite text editor, typing angley-brackety things in, and uploading the result to web servers. In other words, a very, very, very long time.

  15. A few other guys and I are currently working on a standards-compliant CMS called “Fidelis”.

    We want to go for 100% valid markup, XHTML sent under the correct content-type, and many other goodies.

    So far, we have registered a site with SourceForge: http://fidelis.sourceforge.net (the site is still empty, though). We have some stuff in the CVS as well (no official releases yet).

    If we could get more people to join the project and participate in the mailing list, contribute code, help design logos, etc., that would be great.

    For interested parties, please see this post.

  16. > A few other guys and I are currently working on a standards-compliant CMS called “Fidelis”.

    Sorry to be boring (and I did scan the page but didn’t see an answer), but what type of architecture are you planning? Will the CMS pass around strings containing angle-brackets internally (as the choice of PHP suggests)? How will the system ensure that everything is valid (particularly if you’re looking to make the CMS extensible)?

  17. Okay, not going into the debate of HTML and XHTML once more, but I know only this: using XHTML sent as application/xhtml+xml is good for the technology itself. Using HTML is easier and more fail-safe. Choose whatever you like; I use both. For my blog, I use XHTML simply because I can. I have no fundamental reasons. No MathML, no pure XML. Just playing around. And that’s still what the web is about on a personal level.

  18. Hmm, I’m sure I’m off the list as I no longer serve application/xhtml+xml… I meant to write you, as I disabled it while testing some stuff out (3rd-party code tests).

    At any rate, for the redesign I’ll likely go with text/html anyway 🙂

  19. One thing that doesn’t seem to have been mentioned so far is that using PHP to detect browsers compatible with the application/xhtml+xml MIME type can make your web pages uncacheable.

    Left to its own devices with a plain text file, a server provides browsers with useful information such as the date the file was last modified. It will handle conditional and partial fetch requests so that redundant traffic to visitors is eliminated, and it provides Content-Length headers that can be used to establish persistent HTTP connections (these reduce the number of round trips to the server needed to load a web page).

    By default, a PHP web page is provided with the bare minimum of HTTP headers, so as soon as you start using PHP on a web page, this functionality is lost. Unless, that is, you program your PHP pages to reproduce all these built-in server functions.
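
    Reproducing them isn’t a huge amount of code, though. A rough sketch (not anyone’s production setup, and it assumes the page has a single source file whose modification time can stand in for the whole page’s):

        <?php
        // Re-create the caching headers a static file would get for free.
        $source       = __FILE__;                 // simplification: one source file
        $lastModified = filemtime($source);
        $etag         = '"' . md5($lastModified . $source) . '"';

        header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
        header('ETag: ' . $etag);

        // Answer conditional requests with 304 Not Modified where possible.
        $since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
                 ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
        $match = isset($_SERVER['HTTP_IF_NONE_MATCH'])
                 ? trim($_SERVER['HTTP_IF_NONE_MATCH']) : false;

        if (($since !== false && $since >= $lastModified) || $match === $etag) {
            header('HTTP/1.1 304 Not Modified');
            exit;
        }

        // Buffer the output so a Content-Length header can be sent too.
        ob_start();
        // ... generate the page here ...
        $output = ob_get_clean();
        header('Content-Length: ' . strlen($output));
        echo $output;
        ?>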

    Perhaps you could add a fourth test for XHTML pages: “Cacheability”. You can check the cacheability of your pages at http://www.ircache.net/cgi-bin/cacheability.py

    (Nice website by the way!)

  20. You don’t have to use PHP to do server-side MIME-type manipulation. If you’re only serving static pages, you can serve application/xhtml+xml to compliant browsers by configuring mod_rewrite.

    So I assume that the reason people use PHP is that they want dynamic features along with their XHTML. If these people were concerned about caching, they could certainly configure their PHP, or even choose to use a templating language with more aggressive default caching. PHP and its caching issues are implementation details, and are not relevant to XHTML per se.
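
    For reference, the PHP side of the MIME-type detection being discussed is only a few lines in the first place. A hedged sketch of the usual Accept-header trick, not anyone’s actual code:

        <?php
        // Send application/xhtml+xml only to browsers that claim to accept it
        // (IE does not), and fall back to text/html for everyone else.
        $accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

        if (stristr($accept, 'application/xhtml+xml')) {
            header('Content-Type: application/xhtml+xml; charset=UTF-8');
        } else {
            header('Content-Type: text/html; charset=UTF-8');
        }

        // Tell caches that the response varies by the Accept header,
        // which also matters for the cacheability issue raised above.
        header('Vary: Accept');
        ?>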

  21. > PHP and its caching issues are implementation details,
    > and are not relevant to XHTML per se.

    True, but there’s a lot of it about, and it’s another reason why XHTML 1.1 is generally a waste of time.

    I’ve still not found a single XHTML 1.1 page that
    a) declares itself as xhtml+xml to all browsers but IE and
    b) doesn’t make the web work slower through lack of cacheability.

    Even W3.org is sticking with XHTML 1.0 served as text/html for the time being.

  22. I dunno. Just tried out Jacques’ site in Safari, and it seems to cache okay, even though his site only rates a “yellow” in that Cacheability tool.

    Of all the sins XHTML has visited upon us, poor caching behavior isn’t one of them. Most hobbyist PHP sites will suffer from this regardless of their DOCTYPE and MIME type. This is partly a PHP problem, but ultimately a PEBKAC problem.

  23. > Just tried out Jacques’ site in Safari, and it seems to cache okay, even though his site only rates a “yellow” in that Cacheability tool

    Odd, it looks red to me. No Last-Modified, no ETag, no Content-Length. Your browser is probably caching it by default, but web caches and proxy servers won’t want to touch it. But at least it can differentiate between GET and HEAD requests 🙂

    Of course you’re right about PHP. It has dismal built-in support for cacheability. But then so do most server scripting languages. I wish I could play around with JSP, but it’s not supported on my server…

  24. “Odd, it looks red to me. No Last-Modified, no ETag, no Content-Length. Your browser is probably caching it by default, but web caches and proxy servers won’t want to touch it. But at least it can differentiate between GET and HEAD requests :-)”

    That has nothing to do with the page being served as application/xhtml+xml to capable browsers and everything to do with it being a *.shtml page.

    If you chose, instead, to look at one of my individual archive pages (served with exactly the same MIME-type logic) in that cacheability tool, you would see that it has Last-Modified, ETag and Content-Length headers, and merits a “green dot.”

    I should note, as well, that all my pages are sent deflate-compressed, which is even more important than cacheability in terms of its bandwidth savings.
