Once upon a time, there was a young technical writer who dwelled in an elegant cubicle with high, noise-baffling walls. He was handsome and clever, beloved by engineers and product managers throughout the Valley, and could write like the wind itself.
One dark winter’s evening just before close of business, an ancient crone came tottering over to his desk. In her arms she clutched a tattered collection of printouts of the DR DOS 3.31 manual. “Please, young sir,” the crone asked, “could you trouble yourself to help me edit this poor old woman’s manuscript?”
The young technical writer laughed and refused. “I am far too busy to help the likes of you,” he said, by which he meant too busy reading Reddit. “Begone!”
Once more the crone asked the young man for help with her edits, taking care to warn him that things were not always what they seemed on the surface… and once more the young man refused to help her.
BAMF! With a flash of light and an acrid whiff of toner, the crone’s disguise fell away, revealing her true form: the Goddess of Technical Writing herself.
The young man fell to his knees, sobbing and begging forgiveness, but it was too little, too late. The Goddess faded away, but not before spitting out her most venomous curse: “May all your software manuals forever be written in TWiki!”
So if you’re an unlucky person, you might find yourself stuck with a pile of technical documentation in TWiki or some other baroque “enterprise” wiki. If so, here’s a hacky recipe for getting yourself unstuck.
View a rendered wiki page and select the content div out of the HTML, saving each page as an HTML snippet. In TWiki or Foswiki, the content div typically has an ID of patternMainContents. For a small number of pages, you can use your browser’s element inspector to help you copy and paste the content. For a large number of pages, you can try automating this process with curl and some custom script that strips off everything that isn’t the content div.
Run all resulting HTML snippets through an HTML tidying tool to clean up the markup.
Download and install pandoc.
Run all tidied HTML snippets through pandoc, converting them to Markdown, RST, or the output format of your choice.
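For more than a handful of pages, the pandoc step is easier to run in a loop. Here is a minimal sketch in Python, assuming pandoc is on your PATH and the tidied snippets are sitting in one directory as .html files. The function names and the extension table are made up for illustration; the only pandoc options used are --from, --to, and --output.

```python
import subprocess
from pathlib import Path

# pandoc writer name -> file extension for the converted file.
# Extend this table if you want other output formats.
EXTENSIONS = {"markdown": ".md", "rst": ".rst"}


def pandoc_command(src, to_format="markdown"):
    """Build the pandoc command line for one HTML snippet."""
    src = Path(src)
    dest = src.with_suffix(EXTENSIONS[to_format])
    return ["pandoc", "--from", "html", "--to", to_format,
            "--output", str(dest), str(src)]


def convert_all(snippet_dir, to_format="markdown"):
    """Convert every *.html file in snippet_dir (hypothetical directory)."""
    for src in sorted(Path(snippet_dir).glob("*.html")):
        # check=True stops the run at the first page pandoc cannot convert.
        subprocess.run(pandoc_command(src, to_format), check=True)
```

Calling convert_all("snippets") would write a .md file next to each .html file; pass to_format="rst" for reStructuredText instead.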
The main takeaway is that you don’t want to mess with parsing the horrible native wiki format. Let the wiki do its thing and just get the resulting HTML — that’s something you can work with.
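If you do go the scripted route for grabbing that HTML, the "custom script" can be fairly small. The sketch below uses only Python's standard library to pull out whatever sits inside the element whose ID is patternMainContents. The class and function names are invented for this example, and it assumes the nested div tags in the rendered page are balanced, which is a guess about your wiki's output rather than a guarantee. Fetching the page itself (with curl or otherwise) is left out.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collects the markup inside the first element with a given id."""

    def __init__(self, target_id="patternMainContents"):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.tag = None    # tag name of the target element, once found
        self.depth = 0     # nesting depth of same-named tags; 0 = outside
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if dict(attrs).get("id") == self.target_id:
                self.tag, self.depth = tag, 1
            return
        if tag == self.tag:
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: copy it, depth unchanged.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True   # closing tag of the target itself
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_content(html, target_id="patternMainContents"):
    """Return the inner HTML of the element with the given id."""
    parser = ContentDivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Feed it the text of a page you saved with curl and write the return value out as the snippet file. If it returns an empty string, the page probably uses a different ID for its content div, so check with your browser's element inspector.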