Because the public demanded it! This is really just an overview of the process, but it should give you a basic idea about what to watch out for.
- Convert your AuthorIT book to DITA.
DITA (Darwin Information Typing Architecture) is one of AuthorIT’s built-in publishing formats. Publishing to DITA results in a folder containing your book’s image files, a collection of *.dita files, and a toc.ditamap file.

Sadly, you must take this opportunity to wave your index markers a fond farewell. They are apparently too old and frail to survive this stage of the journey.
- Download the DITA Open Toolkit.
The DITA Open Toolkit (DITA-OT) is a collection of Apache Ant scripts, XSL stylesheets, and other goodies that enable you to transform DITA into other formats, including DocBook. For those of you who don’t live in the Java world, Ant is basically make for Java. Newer versions of DITA-OT conveniently include a copy of Ant, so you don’t need to install it separately.

To install DITA-OT, unzip the toolkit’s files into any directory and run the startcmd.sh script (or the startcmd.bat script on Windows) to configure your CLASSPATH and other environment variables. If you forget to set your CLASSPATH, the toolkit will helpfully indicate this by bailing out mid-transformation and complaining that the Ant script is broken.

Before you run any DocBook transformations, edit xsl/docbook/topic2db.xsl and comment out the template that contains “Related links”. The only thing this template does is riddle your DocBook with invalid itemizedlist elements.

Do not waste time reading the toolkit’s documentation. The manual that ships with DITA-OT 1.3 actually applies to DITA-OT 1.2, so most of the examples are broken. As for grammar and clarity, let’s just say that the manual’s translation from the original Old Frisian leaves much to be desired.
- Transform the DITA document into DocBook.
All the toolkit’s transformations involve running an Ant script:
ant [options] [targets]
To transform DITA to DocBook, run:
ant -Dargs.input=path/toc.ditamap dita2docbook
If the transform fails (and all your environment variables are set correctly), there might be errors lurking in your generated DITA source. This is AuthorIT’s way of telling you, “Don’t let the door hit you on the way out, jerk!”
  - If DITA-OT complains about a missing topic reference, there’s a good chance toc.ditamap is referencing a topic that doesn’t exist. Go back to the original AuthorIT doc and try to identify the missing topic. If all else fails, delete the reference from toc.ditamap and move on. Your readers already knew about the safety hazards of handling lithium deuteride, anyway.
  - If a topic contains an xref with a crazy relative path, this can really confuse DITA-OT. The good news is that the toolkit indicates the path that is causing the problem. The bad news is that AuthorIT dumps its DITA output in UTF-16, which is really annoying to grep through.
  - If you had any “Note” paragraph styles in your AuthorIT doc, these might disappear. Even more strangely, “Warning” paragraphs do make it through.
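If you’d rather not fight grep over UTF-16, a few lines of Python can do the hunting for you. Here’s a minimal sketch, assuming the files start with a byte-order mark (as UTF-16 files usually do) and that toc.ditamap points at topics through topicref href attributes. The helper names missing_topicrefs and grep_utf16 are hypothetical, not part of DITA-OT.

```python
import os
import re
import xml.etree.ElementTree as ET


def missing_topicrefs(ditamap_path):
    """Return topicref hrefs in the map that don't point at an existing file."""
    base = os.path.dirname(os.path.abspath(ditamap_path))
    # expat detects UTF-16 from the byte-order mark, so no decoding step is needed.
    tree = ET.parse(ditamap_path)
    missing = []
    for ref in tree.iter("topicref"):
        href = ref.get("href")
        if not href:
            continue
        target = href.split("#", 1)[0]  # ignore any #fragment identifier
        if target and not os.path.exists(os.path.join(base, target)):
            missing.append(href)
    return missing


def grep_utf16(directory, pattern):
    """Yield (filename, line number, line) for each regex match in UTF-16 DITA files."""
    regex = re.compile(pattern)
    for name in sorted(os.listdir(directory)):
        if not name.endswith((".dita", ".ditamap")):
            continue
        # The "utf-16" codec reads the byte-order mark to pick the byte order.
        with open(os.path.join(directory, name), encoding="utf-16") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield name, lineno, line.rstrip("\n")
```

For example, running grep_utf16 over your (hypothetical) output folder with the pattern r"\.\./\.\./" would list every line whose path climbs at least two directories up.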
- Clean up the DocBook output with a script.
Congratulations, your document is now DocBook! Well, more accurately, it’s “DocBook”. Just be happy your tables made it through, sort of.
Fortunately, you can fix many issues pretty easily by running the document through a cleanup script. This script is particularly important if you’re converting multiple documents. The canonical language for the script is XSLT, but if you’d rather stick it to the W3C Man, Python or Perl would work fine too. Here’s what you’ll want to fix:
  - Remove all id attributes. These generated IDs are duplicated throughout the doc, and nothing points to them. Throw them away and start over.
  - Remove all remap attributes. In theory, these attributes contain useful information about the original DITA element, which in turn could help you design your post-processing script to provide better-quality DocBook markup. In practice… eh, not so much.
  - Remove all sectioninfo elements. They’re often invalid and never contain anything useful.
  - Remove empty type attributes. Not sure how those got there.
  - Remove empty para elements.
  - Change sidebar elements to section elements. Like the empty type attributes, these are mystery guests.
  - Join programlisting elements. If you had any multi-line code samples, you might find that in the transformed DocBook, each line appears in its own programlisting. Join adjacent programlisting elements into a single programlisting (or screen, if appropriate).
  - (Optional) Change the article to a book, if appropriate. Add chapter elements as necessary.
  - (Optional) Try to improve the quality of the markup by changing emphasis role="bold" and literal elements to something more specific. For example, you could define a list of commands that appear in your book and wrap each one in a command element. Creating explicit lists of commands, GUI buttons, and so on is tedious, but it’s still better than doing these substitutions by hand.
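Here’s a minimal sketch of that cleanup pass in Python, using only the standard library’s ElementTree. It assumes the DocBook is non-namespaced (DocBook 4.x style), that each programlisting holds plain text only, and that KNOWN_COMMANDS is a hypothetical list you’d replace with the commands from your own book. It skips the article-to-book conversion, since deciding where chapters go is a judgment call. Also note that ElementTree discards the DOCTYPE declaration on output, so you’ll need to add it back before validating.

```python
import xml.etree.ElementTree as ET

# Hypothetical: replace with the commands that actually appear in your book.
KNOWN_COMMANDS = {"mirvctl", "launchd"}


def strip_attributes(root):
    """Drop generated id and remap attributes, plus empty type attributes."""
    for elem in root.iter():
        elem.attrib.pop("id", None)
        elem.attrib.pop("remap", None)
        if elem.get("type") == "":
            del elem.attrib["type"]


def remove_element(parent, elem):
    """Detach elem from parent without losing the text that follows it."""
    index = list(parent).index(elem)
    if elem.tail:
        if index > 0:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + elem.tail
        else:
            parent.text = (parent.text or "") + elem.tail
    parent.remove(elem)


def remove_unwanted_elements(root):
    """Remove every sectioninfo element and every empty para element."""
    parents = {child: parent for parent in root.iter() for child in parent}
    doomed = list(root.iter("sectioninfo"))
    doomed += [p for p in root.iter("para")
               if len(p) == 0 and not (p.text or "").strip()]
    for elem in doomed:
        remove_element(parents[elem], elem)


def rename_sidebars(root):
    """Turn every sidebar into a plain section."""
    for sidebar in root.iter("sidebar"):
        sidebar.tag = "section"


def join_programlistings(root):
    """Merge runs of adjacent programlisting elements into a single listing."""
    for parent in list(root.iter()):
        i = 0
        while i < len(parent) - 1:
            first, second = parent[i], parent[i + 1]
            only_whitespace_between = not (first.tail or "").strip()
            if first.tag == second.tag == "programlisting" and only_whitespace_between:
                first.text = (first.text or "") + "\n" + (second.text or "")
                first.tail = second.tail
                parent.remove(second)
            else:
                i += 1


def retag_known_commands(root):
    """Turn a literal or bold emphasis that wraps a known command into command."""
    for elem in root.iter():
        is_bold = elem.tag == "emphasis" and elem.get("role") == "bold"
        text = (elem.text or "").strip()
        if (elem.tag == "literal" or is_bold) and len(elem) == 0 and text in KNOWN_COMMANDS:
            elem.tag = "command"
            elem.attrib.pop("role", None)


def clean_tree(root):
    strip_attributes(root)
    remove_unwanted_elements(root)  # first, so empty paras don't split listings
    rename_sidebars(root)
    join_programlistings(root)
    retag_known_commands(root)


def clean_file(path_in, path_out):
    tree = ET.parse(path_in)
    clean_tree(tree.getroot())
    tree.write(path_out, encoding="utf-8", xml_declaration=True)
```

Usage would be something like clean_file("book.xml", "book-clean.xml"), with both file names being placeholders.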
Finally, there’s the issue of broken IDs and links. Every one of your AuthorIT hyperlinks is now a ulink that falls into one of these categories:

  - The ulink’s url attribute starts with “mailto:”. Convert these to email elements.
  - The ulink’s url attribute starts with “http://”, “ftp://”, or “gopher://”. Leave these alone.
  - The ulink’s url attribute points to something like “D1228.xml”, a.k.a. nowhere. These are your former internal hyperlinks. They’re all broken.
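The first two categories are easy to handle mechanically. A minimal sketch, again using ElementTree and again assuming non-namespaced DocBook; convert_mailto_links is a hypothetical name, and the sketch also treats https:// as external. Whatever it hands back is the pile of former internal links.

```python
import xml.etree.ElementTree as ET

# http, ftp, and gopher as listed above; https is treated as external too.
EXTERNAL_PREFIXES = ("http://", "https://", "ftp://", "gopher://")


def convert_mailto_links(root):
    """Turn mailto: ulinks into email elements and leave external links alone.

    Returns the remaining ulinks: the former internal hyperlinks.
    """
    internal = []
    for link in list(root.iter("ulink")):
        url = link.get("url", "")
        if url.startswith("mailto:"):
            address = url[len("mailto:"):]
            link.tag = "email"  # DocBook's email element holds the bare address
            link.attrib.clear()
            for child in list(link):
                link.remove(child)
            link.text = address
        elif not url.startswith(EXTERNAL_PREFIXES):
            internal.append(link)
    return internal
```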
But don’t be discouraged: your script can actually “guess” where many of these links should point. If a given internal ulink contains something like “Configuring the MIRV Launch Sequence”, there’s an excellent chance that somewhere else in your document there’s a section with the title “Configuring the MIRV Launch Sequence”! So all you have to do is:

  - Convert the content of each ulink to a nicely formatted ID. Replace whitespace with underscores, remove extraneous punctuation, and lower-case everything.
  - Convert the ulink to an xref, setting the linkend to the new ID.
  - For each section element, apply the same ID-conversion algorithm to the section’s title. Set this value as the section’s id.
A healthy fraction of your ids and linkends should now match up, fixing those broken links.
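The ID-guessing trick might look something like this in Python. make_id and link_internal_ulinks are hypothetical names. The sketch treats any ulink whose url is neither mailto: nor contains “://” as internal, lets the first section win when two sections share a title, and returns the linkends that still don’t match any section, so you know where to start the manual cleanup.

```python
import re
import xml.etree.ElementTree as ET


def make_id(text):
    """Lower-case text, drop punctuation, and turn whitespace into underscores."""
    text = " ".join(text.split()).lower()
    text = re.sub(r"[^\w\s-]", "", text)  # keep letters, digits, _, and -
    ident = re.sub(r"\s+", "_", text.strip())
    # XML IDs can't start with a digit or hyphen, so prefix those.
    if ident and not (ident[0].isalpha() or ident[0] == "_"):
        ident = "id_" + ident
    return ident


def link_internal_ulinks(root):
    """Give sections title-based IDs and turn internal ulinks into xrefs.

    Returns the linkends that did not match any section, for manual follow-up.
    """
    section_ids = set()
    for section in root.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        new_id = make_id("".join(title.itertext()))
        if new_id and new_id not in section_ids:  # first section with a title wins
            section.set("id", new_id)
            section_ids.add(new_id)

    unresolved = []
    for link in list(root.iter("ulink")):
        url = link.get("url", "")
        if url.startswith("mailto:") or "://" in url:
            continue  # email and external links aren't internal cross-references
        target = make_id("".join(link.itertext()))
        link.tag = "xref"  # xref is empty; processors generate text from the target
        link.attrib.clear()
        link.set("linkend", target)
        link.text = None
        for child in list(link):
            link.remove(child)
        if target not in section_ids:
            unresolved.append(target)
    return unresolved
```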
s should now match up, fixing those broken links. - Remove all
- Clean up the DocBook output manually.
Oh, you’re not done yet! Here’s a non-exhaustive list of what’s left:
  - Fix the remaining invalid ids and broken links that your script didn’t catch.
  - Fix any other DocBook validity issues.
  - Add programlisting and screen elements where appropriate. Remove excess carriage returns as necessary.
  - Make your inline markup consistent. For example, all command-line tools should be consistently marked up as command elements (assuming your organization chooses to use that element). You can partly script this, but mostly this is a manual job.
  - Remove any mysterious duplicate sections.
  - Rename your images from “898.png” to something more descriptive, such as “mirv_reentry_trajectory.png”. Embed the images in a figure with a proper title and id.
  - Add any missing front matter.
  - Rebuild your index by hand. By hand. Jesus H. Christ.
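For the ID and link items (and for spotting duplicate sections), a quick audit script can at least tell you where to look. It won’t fix anything; it just produces the to-do list. A sketch, assuming non-namespaced DocBook; audit is a hypothetical name, and its ID check is a simplified ASCII approximation of XML name rules, so it may flag a few IDs that are technically legal.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, ASCII-only approximation of an XML Name; real XML allows more.
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def audit(root):
    """Report ID, link, and duplicate-section problems without changing anything."""
    ids = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    linkends = {e.get("linkend") for e in root.iter() if e.get("linkend")}

    titles = Counter()
    for section in root.iter("section"):
        title = section.find("title")
        if title is not None:
            titles["".join(title.itertext()).strip()] += 1

    return {
        "duplicate_ids": sorted(i for i, n in ids.items() if n > 1),
        "invalid_ids": sorted(i for i in ids if not XML_NAME.match(i)),
        "broken_linkends": sorted(name for name in linkends if name not in ids),
        "duplicate_section_titles": sorted(t for t, n in titles.items() if n > 1),
    }


def audit_file(path):
    return audit(ET.parse(path).getroot())
```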
Now put your feet up on the desk and pour yourself a well-deserved gin-and-tonic. If anyone asks you why you look so frazzled, do not under any circumstances tell the truth. Otherwise they’ll just respond with, “Well, why don’t you just move it all to the corporate wiki?” And there’s only one rational reaction to that. Don’t get me wrong: it’s not easy to inflict serious blunt-force trauma with a 15″ PowerBook, but somehow, you’ll find a way.