Tutorial Sneak Preview: Elements vs. Tags

I’m working on a new version of my ancient, long-neglected HTML Tutorial, which I originally wrote in 2002. The Official List of reasons why I’m doing this includes:

  • finishing broken and missing sections, such as tables and forms
  • updating the tutorial for the new web environment (IE 6 is now the baseline)
  • migrating to DocBook so that I can provide multiple output formats
  • replacing all copyrighted material in the examples with material in the public domain
  • releasing the tutorial under a much more liberal license

But the real reason I’m doing this is because the tutorial currently says “tags” when it really should be saying “elements”. Arrgh! I cringe every time I read that. I’ve added a section about this to the tutorial so that others can avoid my mistake…

A Digression: What’s a “Tag”?

You’ll often hear people refer to “tags,” as in, “The markup tags tell the Web browser how to display the page.” Almost always, they really mean to say “elements.” Tags are not elements; they define the boundaries of an element. The p element begins with a <p> opening tag and ends with a </p> closing tag, but it is not a tag itself.

  • Incorrect: “You can make a new HTML paragraph with a <p> tag!”
  • Correct: “It’s a good idea to close that open <p> tag.”

Now that you possess this valuable information, you’re in the same position as someone who knows that in the phrase “That’s not my forte,” the word “forte” should be pronounced fort, not for-tay. You get to feel slightly superior to people who say for-tay, but you really shouldn’t go running around correcting them.

Sometimes you’ll hear people say “alt tag,” which is even worse. An “alt tag” is really an alt attribute. This important attribute provides alternative text for images, in case the user can’t see the image for some reason. We’ll talk more about this attribute later.

The element vs. tag confusion is sort of understandable: it’s a common mistake even among professionals, and they both look like angle-brackety things, after all. But attributes are not tags, not even close. If you hear someone say “alt tag,” this is a key indication that the speaker does not understand HTML very well. (You probably shouldn’t invite them to your next birthday party.)

I suppose I shouldn’t beat myself up, since nearly all of the prominent HTML tutorials get this and many other issues wrong, as any search for “html tutorial” makes depressingly obvious. Two notable exceptions are Stephanos Piperoglou’s Webreference.com tutorial and Patrick Griffiths’s outstanding HTML Dog.

How to Convert AuthorIT to DocBook

Because the public demanded it! This is really just an overview of the process, but it should give you a basic idea about what to watch out for.

  1. Convert your AuthorIT book to DITA.

    DITA (Darwin Information Typing Architecture) is one of AuthorIT’s built-in publishing formats. Publishing to DITA results in a folder containing your book’s image files, a collection of *.dita files, and a toc.ditamap file.

    Sadly, you must take this opportunity to wave your index markers a fond farewell. They are apparently too old and frail to survive this stage of the journey.

  2. Download the DITA Open Toolkit.

    The DITA Open Toolkit (DITA-OT) is a collection of Apache Ant scripts, XSL stylesheets, and other goodies that enable you to transform DITA into other formats, including DocBook. For those of you who don’t live in the Java world, Ant is basically make for Java. Newer versions of DITA-OT conveniently include a copy of Ant, so you don’t need to install it separately.

    To install DITA-OT, unzip the toolkit’s files into any directory and run the startcmd.sh script (or startcmd.bat script on Windows) to configure your CLASSPATH and other environment variables. If you forget to set your CLASSPATH, the toolkit will helpfully indicate this to you by bailing out mid-transformation and complaining that the Ant script is broken.

    Before you run any DocBook transformations, edit xsl/docbook/topic2db.xsl and comment out the template that contains “Related links”. The only thing this template does is riddle your DocBook with invalid itemizedlist elements.

    Do not waste time reading the toolkit’s documentation. The manual that ships with DITA-OT 1.3 actually applies to DITA-OT 1.2, so most of the examples are broken. As for grammar and clarity, let’s just say that the manual’s translation from the original Old Frisian leaves much to be desired.

  3. Transform the DITA document into DocBook.

    All the toolkit’s transformations involve running an Ant script:

    ant options targets

    To transform DITA to DocBook, run:

    ant -Dargs.input=path/toc.ditamap dita2docbook

    If the transform fails (and all your environment variables are set correctly), there might be errors lurking in your generated DITA source. This is AuthorIT’s way of telling you, “Don’t let the door hit you on the way out, jerk!”

    • If DITA-OT complains about a missing topic reference, there’s a good chance toc.ditamap is referencing a topic that doesn’t exist. Go back to the original AuthorIT doc and try to identify the missing topic. If all else fails, delete the reference from toc.ditamap and move on. Your readers already knew about the safety hazards of handling lithium deuteride, anyway.
    • If a topic contains an xref with a crazy relative path, this can really confuse DITA-OT. The good news is that the toolkit indicates the path that is causing the problem. The bad news is that AuthorIT dumps its DITA output in UTF-16, which is really annoying to grep through.
    • If you had any “Note” paragraph styles in your AuthorIT doc, these might disappear. Even more strangely, “Warning” paragraphs do make it through.
  4. Clean up the DocBook output with a script.

    Congratulations, your document is now DocBook! Well, more accurately, it’s “DocBook”. Just be happy your tables made it through, sort of.

    Fortunately, you can fix many issues pretty easily by running the document through a cleanup script. This script is particularly important if you’re converting multiple documents. The canonical language for the script is XSLT, but if you’d rather stick it to the W3C Man, Python or Perl would work fine too. Here’s what you’ll want to fix:

    • Remove all id attributes. These generated IDs are duplicated throughout the doc, and nothing points to them. Throw them away and start over.
    • Remove all remap attributes. In theory, these attributes contain useful information about the original DITA element, which in turn could help you design your post-processing script to provide better-quality DocBook markup. In practice… eh, not so much.
    • Remove all sectioninfo elements. They’re often invalid, and always contain nothing useful.
    • Remove empty type attributes. Not sure how those got there.
    • Remove empty para elements.
    • Change sidebar elements to section elements. Like the empty type attributes, these are another mystery guest.
    • Join programlisting elements. If you had any multi-line code samples, you might find that in the transformed DocBook, each line appears in its own programlisting. Join adjacent programlisting elements into a single programlisting (or screen, if appropriate).
    • (Optional) Change the article to a book, if appropriate. Add chapter elements as necessary.
    • (Optional) Try to improve the quality of the markup by changing emphasis role="bold" and literal elements to something more specific. For example, you might define a list of commands that appear in your book and wrap each one in a command element. Creating explicit lists of commands, GUI buttons, and so on is tedious, but it’s still better to do these substitutions in the script.
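
    The cleanup pass above can be sketched in a few dozen lines. Here’s a minimal example using Python’s standard xml.etree.ElementTree; it handles only a subset of the fixes (stripping id and remap attributes, removing empty type attributes, dropping sectioninfo and empty para elements, renaming sidebar to section, and joining adjacent programlisting elements), it assumes simple non-namespaced DocBook, and it ignores tail text when joining listings. Treat it as a starting point, not a finished tool:

    ```python
    import xml.etree.ElementTree as ET

    def clean(root):
        # Pass 1: attribute cleanup and sidebar -> section renames.
        for elem in list(root.iter()):
            elem.attrib.pop("id", None)      # generated IDs: throw away, start over
            elem.attrib.pop("remap", None)   # not useful in practice
            if elem.get("type") == "":       # mystery empty type attributes
                del elem.attrib["type"]
            if elem.tag == "sidebar":        # another mystery guest
                elem.tag = "section"
        # Pass 2: remove sectioninfo and empty para elements.
        # (ElementTree removals need a handle on the parent.)
        for parent in list(root.iter()):
            for child in list(parent):
                empty_para = (child.tag == "para"
                              and not (child.text or "").strip()
                              and len(child) == 0)
                if child.tag == "sectioninfo" or empty_para:
                    parent.remove(child)
        # Pass 3: join adjacent programlisting siblings into one.
        for parent in list(root.iter()):
            prev = None
            for child in list(parent):
                if (prev is not None
                        and prev.tag == child.tag == "programlisting"):
                    prev.text = (prev.text or "") + "\n" + (child.text or "")
                    parent.remove(child)
                else:
                    prev = child
        return root
    ```

    An XSLT identity transform with a handful of overriding templates would do the same job, if you’d rather stay in the canonical language.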

    Finally, there’s the issue of broken IDs and links. Currently, every one of your AuthorIT hyperlinks is now a ulink that falls into one of these categories:

    • The ulink’s url starts with “mailto:”. Convert these to email elements.
    • The ulink’s url starts with “http://”, or “ftp://”, or “gopher://”. Leave these alone.
    • The ulink’s url points to something like “D1228.xml”, a.k.a. nowhere. These are your former internal hyperlinks. They’re all broken.

    But don’t be discouraged, your script can actually “guess” at where many of these links should point. If a given internal ulink contains something like, “Configuring the MIRV Launch Sequence”, there’s an excellent chance that somewhere else in your document there’s a section with a title, “Configuring the MIRV Launch Sequence”! So all you have to do is:

    1. Convert the content of each ulink to a nicely-formatted ID. Replace whitespace with underscores, remove extraneous punctuation, and lowercase everything.
    2. Convert the ulink to an xref, setting the linkend to the new ID.
    3. For each section element, apply the same ID-conversion algorithm to the section’s title. Set this value as the section’s id.

    A healthy fraction of your ids and linkends should now match up, fixing those broken links.
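
    Here’s an illustrative Python sketch of this link-repair pass, covering the three ulink categories and the three steps above. The title_to_id helper is my own name for the ID-conversion function, and the code assumes simple, non-namespaced DocBook:

    ```python
    import re
    import xml.etree.ElementTree as ET

    def title_to_id(title):
        """Lowercase, strip punctuation, and turn whitespace runs into
        underscores -- the ID-conversion algorithm from step 1."""
        title = re.sub(r"[^\w\s]", "", title.lower())
        return re.sub(r"\s+", "_", title.strip())

    def relink(root):
        # Step 3: give every section an id derived from its title.
        for section in root.iter("section"):
            title = section.find("title")
            if title is not None and title.text:
                section.set("id", title_to_id(title.text))
        # Steps 1-2: sort each ulink into its category.
        for ulink in list(root.iter("ulink")):
            url = ulink.get("url", "")
            if url.startswith("mailto:"):
                ulink.tag = "email"   # mailto: links become email elements
                ulink.text = url[len("mailto:"):]
                del ulink.attrib["url"]
            elif url.startswith(("http://", "ftp://", "gopher://")):
                pass                  # real external links: leave alone
            else:
                ulink.tag = "xref"    # broken internal link: guess the target
                ulink.set("linkend", title_to_id(ulink.text or ""))
                del ulink.attrib["url"]
                ulink.text = None     # xref generates its own link text
        return root
    ```

    Anywhere the guessed linkend matches a generated section id, the link is repaired for free; the misses are what step 5 is for.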

  5. Clean up the DocBook output manually.

    Oh, you’re not done yet! Here’s a non-exhaustive list of what’s left:

    • Fix the remaining invalid ids and broken links that your script didn’t catch.
    • Fix any other DocBook validity issues.
    • Add programlisting and screen elements where appropriate. Remove excess carriage returns as necessary.
    • Make your inline markup consistent. For example, all command-line tools should be consistently marked up as commands (assuming your organization chooses to use that element). You can partly script this, but mostly this is a manual job.
    • Remove any mysterious duplicate sections.
    • Rename your images from “898.png” to something more descriptive, such as “mirv_reentry_trajectory.png”. Embed the images in a figure with a proper title and id.
    • Add any missing front matter.
    • Rebuild your index by hand. By hand. Jesus H. Christ.

    Now put your feet up on the desk and pour yourself a well-deserved gin-and-tonic. If anyone asks you why you look so frazzled, do not under any circumstances tell the truth. Otherwise they’ll just respond with, “Well, why don’t you just move it all to the corporate wiki?” And there’s only one rational reaction to that. Don’t get me wrong, it’s not easy to inflict serious blunt force trauma using a 15″ Powerbook, but somehow, you’ll find a way.

You Heard It Here First: Ticketmaster Sucks

So yesterday evening SJSU held a reading and book signing for the incomparably awesome Neil Gaiman. I went to the SJSU website, and discovered the tickets were being sold through Ticketmaster.

Uh-oh.

$15.00 to buy the ticket and hold it at will-call. $5.00 for the Ticketmaster service charge. Then, after you’ve entered your name and email address, another $4.80 “processing fee”. Screw that.[1],[2]

If SJSU can’t figure out how to sell $15.00 tickets without charging another 66% in fees, I guess it’s not really my concern. But perhaps they should take note: this could help explain why there were still tickets available on Thursday afternoon. Or maybe Neil Gaiman just isn’t very popular with the kids these days?

1. The contrast between the fees of Ticketmaster and the fees of other online companies that actually ship physical products is especially striking. Monopolies are awesome.

2. Although I do like the time-pressure aspect. They’re holding the ticket for 2:00 minutes! The clock is ticking… can our hero create a new user account in time? Cut the red wire — no, the blue!

The Spider in the Rearview Mirror

I have a spider in my rearview mirror.

This is not a metaphorical spider in a symbolic mirror; I’m talking about a garden-variety California orb-weaver. I’ve only seen it once, but it lives in the gap between the rearview mirror’s glass and housing. Every night it comes out, spins a little web between the mirror and the window, and retreats back to its lair. Every morning I destroy the web.

Occasionally, I try to root out the spider with a twig, but I can’t seem to get at it. I could probably flush it out with a blast of water from the hose. But I haven’t bothered yet, because what really fascinates me about the spider is its tenacity, its single-mindedness. It doesn’t get discouraged. It doesn’t move its home to a more promising location. It seems to have no ability to process this particular input and react accordingly. The spider and I, we have a failure to communicate.

Fundamentally, I think this is why arachnids and insects are so creepy. Sam raised this idea a while back. If you’re hiking and you step near a snake, it will rear up and hiss at you to warn you off. You scared it, it’s trying to scare you. Message sent, message received. Reptiles, mammals, birds… there’s something comforting about how you can communicate with these creatures, at least at some very basic level. The Brotherhood of the Vertebrates.

But arthropods are alien creatures. Little unfathomable machines. Is it going to bite me? Scuttle away? Ignore me? What is the spider thinking when it fastens those eight beady little eyes on me?

Unhappy Predictions

The flu and some random colds are sweeping through the office. It’s like the plague hit. It’s so bad that on Friday, even I felt like I was getting the sniffles. Fortunately nothing came of it. I think this is how my ancestors managed to survive into the modern era. We weren’t even close to being the biggest or toughest or meanest SOBs around, but we did have a kick-ass immune system. Also we could sprint surprisingly fast when hard-pressed.

Of course, it goes without saying that whenever I start bragging about not getting sick, I get sick. As long as I keep my mouth shut, I can be fine for years. But this blog post has pretty much guaranteed that in short order, I’ll be deathly ill for at least a week.

For another dose of pre-Holiday cheer, I’m going to come right out and predict that the Republicans are going to maintain control of both the Senate and the House. I also predict much jawing by the pundits Wednesday about “the incredible last-minute Republican surge!” Somehow the polls were 3-4 sigma out again! Amazing!

Although the prospect of the Republicans winning again is awful enough, the thing that is orders of magnitude more horrifying is what that victory would prove, once and for all, about our electoral system. I really, really hope I’m wrong about this.

Update: Well, I must say, it has never felt so good to be so colossally wrong. I’m just glad I didn’t have a few hundred extra bucks burning a hole in my pocket, or I would have lost it all betting at tradesports.com.

Have You Hugged Your Local Browser Developer Today?

Sigh.

Look, go ahead and serve up “XHTML” as text/html. Really! So what? Yes, yes, if you actually served up your page with the right mime type so that it actually got parsed as XML instead of invalid HTML, your website would completely fall over. But hey, no worries! Fortunately, you’re not actually using your “XHTML” for anything that HTML 4.01 can’t do, so you can afford to blithely ignore the standard. Unlike, say, Jacques “The Hardest Working Man In Show Business” Distler.

So write some manifestos! Slap those XHTML doctypes at the top of your pages! Go nuts! But before you do so, take some time out to be thankful that there are hundreds of bright, hardworking, underappreciated coders who are designing browsers to clean up after your mess, just as they did for the table/spacer-gif jockeys of yore. I don’t think that’s too much to ask.

Back from VP X

Well, actually I got back from Viable Paradise late last night. Still unpacking, figuratively and literally.

At one of the late night get-togethers, Mur went around with her wellworn microphone and asked us what we had learned at VP X. I think I said something about plotting, which was the best I could come up with after N glasses of wine. That answer is at best incomplete, so let me try again:

  • Well okay, I did learn a great deal about plotting, mostly from sitting down with James Patrick Kelly for 45 minutes. The man is a mad genius.

  • I also learned how to make and use quill pens (or more properly, simply “pens”). And I learned the basics of how to spin yarn, but not how to knit. Next time for sure, Nikki.

  • I learned that if you’re writing fantasy, you need to get medieval on your characters’ asses. If you write a novel about an imaginary Crusader state, you cannot afford to have prose that reeks of “bloodless modernism.” The big question is, how to write about pre-modern people and politics in a manner that doesn’t totally repulse an enlightened 21st century reader? Food for thought. In the meantime, I will be reading Icelandic sagas and pondering what is best in life.

  • Most important of all, I learned that SF is a fundamentally social genre. There is a huge ecosystem of SF readers and writers out there, and you need to be a part of that community to make any headway. Thank you to my amazing classmates and instructors for finally managing to drive this concept through my thick skull after all these years.

I’ll probably be talking about Viable Paradise and other SFnal things for some time to come. But overall, Viable Paradise was an amazing and possibly life-changing experience. I miss my classmates already, and no doubt I will be leaning on them heavily in the future…

X-Purgation Part II

Wait… I’m on vacation! Why am I writing about X-Philes stuff?

Well, before Viable Paradise starts up, I’m staying with my friends Byron and Karin in Boston. Yesterday, I spent all afternoon roaming around Boston on foot on about one hour of sleep. Today, I enjoyed a glorious brunch of French toast made from homemade whole-grain bread, fresh blueberries, maple syrup, and “sausage” patties made from beans and various spices, with gourmet tea and mocha. This delicious meal would have been entirely Vegan, except for the bacon. Mmmm, bacon. Anyway, this afternoon we’ve just been chilling at the house, killing some time, waiting for Karin to get back from work. Byron has been playing with his latest toy, the PhidgetServo 1-Motor board. And I’ve been cleaning out my email inbox. Which leads us back to… the X-Philes.

For many, many months, I’ve felt a gnawing sense of guilt every time I looked in my inbox. That’s because way down at the bottom, there’s been an ever-growing pile of X-Phile email to deal with. Some of the emails I responded to with a lie, saying, “Thanks for this submission! I’ll try to get to this soon.” But most of the emails I just ignored. Terribly, terribly rude of me. At the very least I should have told each person that truthfully, I wasn’t sure if I would ever get to their submission. But the more time went on, the more I neglected the submissions, and the more I neglected the submissions, the more I wanted to neglect them…

Finally, in an effort to regain control of my inbox, I went through the entire queue. But before I announce the new additions, I’d first like to apologize to everybody for sitting on this for so long. I am far, far too embarrassed to contact anyone individually about the status of their submission. Most have probably forgotten or don’t care anymore, but either way, my sincere apologies for not at least getting back to you in a timely manner.

And now, congratulations to our new X-Philes. They are, in order:

  • Schillmania. The blog of fellow Yahoo! Scott Schiller.

  • Sam Kauffmann. Associate professor of film at Boston University.

  • Phonophunk. Website and musical showcase for John Serris.

  • loadaverageZero. Dedicated to the latest standards in Web accessibility, design and programming using client-server, open-source technology.

  • Ether Multimedia. Multimedia consultancy and production house in Sydney, Australia.

  • Plerion Webdesign & Development. Web development and consulting with an emphasis on usability, accessibility, and standards.

  • Simone Deville. The most technologically-advanced dominatrix site, ever.

  • Funky Jah. Coding projects, music, and more (and if I knew more than twenty words of French, I could tell you all about it).

  • SR-71 Online. Thousands of pages about military aircraft, including the awesome SR-71, aircraft of choice for X-Men and X-Philes everywhere.

A special award should go to Scott Schiller, who sent in his site in November 2004. Ouch. You can see why I’m far, far too embarrassed to even contact the owners of the sites that made it, let alone the ones that didn’t.[1] I’d also like to thank Drake Wilson, for helping kick me out of my lethargy and convincing me to go through the old list and purge all the sites that were dead or invalid.

Finally, I would be remiss not to share a fabulous message I received late last year from an anonymous Concerned Citizen:

Well, I can’t imagine why you think this CRAP

http://www.brantfordsymphony.com/

qualifies. There is hardly any XHTML in it – it’s just cut-up images. IMO, it encapsulates all that is WRONG with the web.

You can’t pretend to me that it is valid XHTML 1.1 – there are 177 errors on the main page alone.

So much for your list. How much are you being paid by these people to pretend their sites are what you claim they are?

Wow, XHTML hate mail! The only other piece of hate mail I’ve ever received was from an anonymous person who disagreed with my assessment of Howard Stern. It’s not immediately obvious which guy wins in the IQ department, though. Howard Stern Guy was far more profane and less grammatically correct than XHTML Guy, but I think Howard Stern Guy still pulls ahead if we take into account the key metric of “reading comprehension”. See, Howard Stern Guy managed to correctly assess that I do not like Howard Stern, while XHTML Guy failed to read the text on the main X-Philes page, even though I had helpfully bolded the key part:

Note that I do not check sites for whether they are “Bulletproof” (by, say, stress-testing them with invalid comments and trackbacks). Nor do I continually monitor these sites for validity. Ongoing XHTML maintenance is the site owner’s responsibility, not mine.

Maybe I should bold the whole thing? Set it to text-decoration: blink? Oh, well. And now to go roll around naked on the huge piles of dollar bills that I’ve made off of the X-Philes. Enjoy your weekends, all!

1. If you A) submitted your site, B) are not on the list above, and C) still care about this silliness, please check your referer logs. You’ll see a cluster of validations made in rapid succession — these should indicate which pages I checked, and the last page is the one that either failed validation or wasn’t serving application/xhtml+xml. Back in the old days, I used to send an email thanking people for their submission and mentioning what had gone wrong, but as I said above, I am just way too embarrassed to send out emails after all this time. But please feel free to contact me or resubmit if you like. My apologies again.

Posted in Web

Do Not Push the Red Button!

So this is the time of year when all Nice Jewish Boys (and Girls) should turn their minds to ethical questions. And no field is more fraught with ethical conundrums than… technical writing. For example: is it better to document every API method or option, no matter how obsolete or dangerous? (Otherwise known as the “Give them rope and explain how they can tie their own noose” approach.) Or should you try to hide all the bad stuff for the user’s own good? (Otherwise known as the “Allanon School of Technical Writing.”)

Usually I advocate the first school. Basically I figure that people are grownups, and if you do your best to explain things, hey, it’s their lookout. This is not to say some projects shouldn’t go the other way, but for the most part, I think more information is good. Plus, the first approach means more work to do, which theoretically means more employment for myself and my fellow tech writers! Solidarity, my brothers and sisters!

Anyway, while I do prefer the first school, this approach sometimes leads to amusing results. For example, my group maintains a certain internal tool that has a few dangerous command-line options. Most of these fall into the category of, “only use this if you really know what you’re doing,” which is fine. But there’s one that I had to document like this:

“[blah blah blah descriptive text.] CAUTION: Using this option can completely destroy your system. Do not use this option.”

I ran across this description again just a few days ago, and man, it never fails to crack me up. Trust me, we technical writers are really hilarious once you get to know us.