Tech Pubs Tuesday: Don’t Write Directly in HTML

Since most of the documentation you produce is going to be hosted as HTML anyway, you might be wondering: why not just cut out the middleman and write all your documentation in plain HTML? Instead of learning some weird intermediate format that transforms to HTML, wouldn’t it make sense to handcraft whatever markup, CSS, and JS you need directly?

Actually, since most documentation tools and formats range from mediocre to awful, plain HTML isn’t necessarily a bad choice. I’ve seen some beautiful documentation authored in handcrafted HTML. That said, using a more specialized format will make things easier on you. Here’s why.

First, typing out HTML open and close tags is annoying and breaks your writing flow, even if you have a good text editor or IDE. If you’ve used a lightweight markup language like Markdown or reStructuredText, you know the difference. Creating a new paragraph with newline, newline is more natural than angle-bracket, p, angle-bracket. Creating a bullet list with leading *s is much more natural than typing <ul>s and <li>s.
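To make the difference concrete, here is a minimal sketch in Python of the typing a lightweight markup tool saves you. The function name `render_light_markup` and its two-rule syntax are assumptions made up for illustration; this is not how Markdown or reStructuredText are actually implemented.

```python
import html


def render_light_markup(text: str) -> str:
    """Convert blank-line-separated blocks to HTML.

    Hypothetical two-rule syntax: a block whose lines all start with "* "
    becomes a bullet list; any other block becomes a paragraph.
    """
    out = []
    for block in text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip blocks left empty by extra blank lines
        if all(line.startswith("* ") for line in lines):
            items = "".join(f"<li>{html.escape(line[2:])}</li>" for line in lines)
            out.append(f"<ul>{items}</ul>")
        else:
            out.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return "\n".join(out)


source = """Install the package first.

* Download the archive
* Unpack it"""

print(render_light_markup(source))
```

The author types one blank line and two asterisks; the tool supplies the `<p>`, `<ul>`, and `<li>` pairs, along with the escaping.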

Second, maybe you do want multiple output formats after all! Are you sure you don’t want PDF? What about man pages? If you want man pages, you’re going to have to invent some mechanism for transforming your HTML into TROFF (a venial sin), or you’re going to have to write the same material twice (a mortal sin).
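The alternative to both sins is a single source that renders to every format you need. Below is a rough Python sketch of the idea, under loud assumptions: the `DOC` model, the `frob` command, and both function names are hypothetical, and the man output uses only the standard `.TH`, `.SH`, and `.PP` macros without escaping special characters.

```python
import html

# Hypothetical document model: a list of (kind, text) pairs.
DOC = [
    ("title", "frob"),
    ("section", "DESCRIPTION"),
    ("para", "frob reads a file and writes a report."),
]


def to_html(doc):
    """Render the model as HTML, one element per entry."""
    tags = {"title": "h1", "section": "h2", "para": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>" for kind, text in doc
    )


def to_man(doc):
    """Render the same model with man macros (section 1 is assumed).

    Sketch only: text containing backslashes or leading dots would
    need escaping before it is safe to feed to troff.
    """
    lines = []
    for kind, text in doc:
        if kind == "title":
            lines.append(f".TH {text.upper()} 1")
        elif kind == "section":
            lines.append(f".SH {text}")
        else:
            lines.append(".PP")
            lines.append(text)
    return "\n".join(lines)


print(to_html(DOC))
print(to_man(DOC))
```

The material is written once; adding a third format means adding one more function, not rewriting the content.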

Third, and perhaps most important, HTML is deliberately primitive and general-purpose. It lacks the semantics that you want for describing technical documentation. For example, when writing a sophisticated book, you might want things like:

  • Real cross-references. You want a link to a section, an example, or a table that automatically updates itself when its target moves or changes its title.
  • Fancy admonitions (warnings, dangers, cautions, notes, tips)
  • Fancy titled tables and figures
  • Fancy titled code examples
  • Automatically generated tables of contents, lists of figures, lists of examples…
  • Automatically generated glossaries and indexes. (Okay, who am I kidding? Nobody cares about indexes anymore. Sniff.)
  • File inclusions (raw, interpreted, syntax highlighted or not, with line numbers or not, …)
  • Replaceable text
  • Conditional text (generating different aspects of the book from the same source)

… and so on.
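As an illustration of two items on that list, cross-references and replaceable text, here is a small Python sketch. The `{ref:label}` and `{var:name}` markers are invented for this example and are not syntax from any real tool; `SECTIONS`, `VARIABLES`, and `resolve` are likewise hypothetical names.

```python
import re

# Hypothetical lookup tables that a documentation tool would build
# by scanning the source: section labels to titles, and variables.
SECTIONS = {"install": "Installing the Server", "config": "Configuration"}
VARIABLES = {"version": "2.1"}


def resolve(text, sections, variables):
    """Expand {ref:label} into a titled link and {var:name} into its value.

    An unknown label or variable raises KeyError, so a broken
    reference fails the build instead of shipping silently.
    """

    def replace(match):
        kind, key = match.group(1), match.group(2)
        if kind == "ref":
            return f'<a href="#{key}">{sections[key]}</a>'
        return variables[key]

    return re.sub(r"\{(ref|var):([a-z_]+)\}", replace, text)


line = "For version {var:version}, see {ref:install}."
print(resolve(line, SECTIONS, VARIABLES))
# prints: For version 2.1, see <a href="#install">Installing the Server</a>.
```

Retitle the section in `SECTIONS` and every link to it picks up the new title on the next build. In handwritten HTML, you would have to hunt down each copy of the old title yourself.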

My bottom line: you can get away with writing a small amount of documentation in plain HTML, such as a README or a short install guide. But the larger your book, the harder this gets. The pattern you really want to avoid is:

  1. Start authoring a substantial book in HTML.
  2. Part way through, discover that you need some of the features on the list above.
  3. Start hacking those features in with some kind of special ad-hoc syntax. No worries — it’s not too hard, it’s just one or two “special tags” or “macros”…
  4. Eventually end up re-inventing a bad version of reStructuredText or Pandoc-flavored Markdown. Except with way more angle brackets.

Don’t be that guy.

5 thoughts on “Tech Pubs Tuesday: Don’t Write Directly in HTML”

  1. So what specialized format do you recommend to make things easier? This post was spell checked per your advice in last week’s tech pub 🙂

  2. Oh thank G0d, someone else remembers TROFF.

    Given that it’s unlikely I’ll ever write a book in html, how do you feel about writing blogs directly in, say, WordPress? Does it even matter?

  3. Dave — ha!

    Patrick, Ryan — I recommend reStructuredText, specifically [Sphinx](http://sphinx-doc.org/). Markdown is decent too, though not my favorite. For a book-length technical work, you really want a superset of Markdown, something like MultiMarkdown or Pandoc Markdown.

    Robin — I think writing blogs in WordPress is a great idea. If you ever want to collect your blog posts and turn them into a book, you’ll figure it out without too much trouble. 🙂 I’m sure there are some nice WordPress plugins that can automate this for you.
