Publishing in Standard Manuscript Format with Sphinx, reST, and Sffms LaTeX

Although it’s the Year of our Lord 2011, and we are blessed with no end of advanced publishing technology, many fiction writers practice their craft something along these lines:

  1. Open Microsoft Word or Libre Office.
  2. Type some stuff.
  3. At the end of the writing session, click “Save As” and save your file as "Title_of_Story Todays_Date.doc"
  4. Occasionally, when you remember, burn all your story files to a CD.

Some writers are a little nerdier. They might use a specialized writing tool such as Scrivener or Ulysses. They might have automated backup set up. Or they might use an online word processor such as Zoho Docs or Google Docs, which provide “free” offsite storage and version control.

But what if you’re a really nerdy writer? Nerdy enough to want to author in an open plain text format? Nerdy enough to want to check your files into a real version control system? Nerdy enough to run diff and sed and perform other text-munging feats of strength?

It all sounds promising, but there’s one major obstacle standing in your way: Standard Manuscript Format. Double spaced, 12pt monospace, one-inch margins, a running header with the author and title, you know the drill. This exacting print-ready format is easy to produce with a word processor, but if you want to stay outside of that world, you’re in for some serious pain. So what to do?

  • Give up and use Word.
  • Wait for the current generation of publishers and editors and all the people they have trained to die.
  • Other.

I choose… “Other!”

Using Sffms

When it comes to exacting typesetting, nothing beats LaTeX. And it just so happens that M. C. DeMarco has designed a LaTeX document class named sffms that outputs Standard Manuscript Format.

DeMarco has documented sffms to the point where you can use sffms without actually knowing much about LaTeX itself. There’s a bunch of config-y header-y stuff at the top of the file. Paragraphs look like paragraphs. Occasionally you need to add a special command like \chapter{In Which Stuff Happens} to create a chapter break, or \emph{Look out!} for emphasis. You run the thing through latex and then pdflatex, and out comes a beautifully typeset manuscript complete with word count. You have to watch out for reserved LaTeX characters, but quite honestly, authoring a story in LaTeX is easier and cleaner than, say, hand authoring the same thing in HTML. It’s all quite civilized.
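Since reserved characters are the main gotcha, here is a minimal Python sketch of escaping them before plain prose goes into a LaTeX file. The function name escape_latex is hypothetical, and the table covers only LaTeX's ten special characters; it does nothing about smart quotes or other Unicode.

```python
# Map each reserved LaTeX character to a safe replacement.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Return text with every reserved LaTeX character escaped.

    Works one character at a time, so the braces introduced by the
    backslash replacement are never escaped a second time.
    """
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_latex("Profits rose 50% at Smith & Sons (#1 in sales)"))
```

A single pass over the characters avoids the classic bug of escaping backslashes last and mangling the escapes you just added.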

The one wrinkle is that the LaTeX toolchain really only works for print. All the TeX to HTML conversion tools I’ve tried are ancient and horrible. They’re particularly bad with sffms, which has some differences from a “normal” LaTeX article that trip up the ordinary HTML converters.

So if all you want to do is submit manuscripts to publishers, you are golden with LaTeX and sffms. But if you want anything even remotely reasonable to post to the web, you’ll need to roll your own converter. Or read on.

Using Sphinx and reST

LaTeX is a great technology. But for a humane plain text format that converts nicely to HTML, we have to move forward in time a couple of decades. reStructuredText (aka reST) is a lightweight markup language developed by the Python community for technical documentation. Sphinx is a builder tool that transforms reStructuredText into different target output formats.

The nice thing about authoring in reST is that the source files are even cleaner-looking than LaTeX (let alone an angle brackety language). Paragraphs are paragraphs. Chapter headings are titles with underlines. Emphasized text is text surrounded by asterisks. Even if the entire Python documentation toolchain disappears, even if you’re looking at your story’s source files decades from now, your work will still be perfectly readable as plain text.
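To make the correspondence with sffms concrete, here is a toy Python converter that turns the three reST constructs just described into sffms-style LaTeX (\chapter{...} for underlined titles, \emph{...} for asterisks). It is an illustrative assumption, not how Sphinx or the sffms extension work: Sphinx parses reST with docutils and handles far more syntax. The function name rest_to_sffms is hypothetical.

```python
import re

# *word* -> emphasis; a line made of one repeated punctuation char -> underline.
EMPHASIS = re.compile(r"\*([^*\n]+)\*")
UNDERLINE = re.compile(r"^([=\-~^])\1+$")


def rest_to_sffms(source: str) -> str:
    """Convert a tiny subset of reST into sffms-style LaTeX body text."""
    output = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        # A one-line title followed by an underline at least as long is a chapter.
        if (len(lines) == 2 and UNDERLINE.match(lines[1])
                and len(lines[1]) >= len(lines[0])):
            output.append(r"\chapter{%s}" % lines[0].strip())
            continue
        # Anything else is a paragraph: join its lines, convert *emphasis*.
        paragraph = " ".join(line.strip() for line in lines)
        output.append(EMPHASIS.sub(r"\\emph{\1}", paragraph))
    return "\n\n".join(output)


if __name__ == "__main__":
    title = "In Which Stuff Happens"
    story = f"{title}\n{'=' * len(title)}\n\nIt was a dark and *stormy* night.\n"
    print(rest_to_sffms(story))
```

A real converter would also need to escape reserved LaTeX characters in the paragraph text; this toy leaves them alone.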

For the fiction writer, what Sphinx brings to the table is solid, easy-to-use HTML production. If you don’t like the built-in HTML templates, it’s straightforward to hack your own. Sphinx also produces EPUB out of the box, as should all writing tools worthy of your consideration. So to recap: reST is a really nice source format, while Sphinx provides you with lots of power and control over how the HTML and EPUB output looks. Looking good so far!

Sphinx also produces LaTeX output. The problem with Sphinx’s LaTeX output is that — surprise! — it’s designed to typeset a nice looking technical manual, not a novel. The results look nothing like Standard Manuscript Format. Ah, well.

But wait a second:

  1. sffms LaTeX is designed to typeset documents into Standard Manuscript Format… but its HTML output is unacceptable.
  2. Sphinx and reST have great HTML facilities… but their LaTeX output is unacceptable.

Hmmm…

Someone Got sffms Peanut Butter in My reST Chocolate!

So it turns out that it’s not terribly difficult to write a Sphinx extension, even if you are a Bear of Very Little Brain who hardly knows a lick of Python. Now available for public consumption: the sffms Sphinx extension. And yes, the source code is available on GitHub.

To use this extension:

  • You must be able to install LaTeX with the sffms document class included.
  • You must be able to install Python and Sphinx.
  • You must be comfortable editing configuration files and running commands on the command line.

There is also very little documentation other than some comments in example files (though I’m working on that). In other words, this extension is for software engineers who also like to write fiction. Preferably software engineers who have used Sphinx before, and who have lots of patience.

You don’t have to be familiar with the sffms LaTeX class, but reading DeMarco’s original documentation might help you wrap your head around what sffms actually can do. My extension currently exposes almost all the knobs that are available in the sffms LaTeX class. For example, you can set sffms_frenchspacing = True in your Sphinx conf.py, which injects a \frenchspacing command into the resulting LaTeX output.
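A minimal conf.py might look something like the following sketch. Only sffms_frenchspacing comes from this post; the extension module name "sffms" is an assumption, so copy the real value from the skeleton story's conf.py.

```python
# conf.py: hypothetical minimal sketch for an sffms story project.

# Load the sffms builder extension (module name assumed, not confirmed).
extensions = ["sffms"]

# Root document of the story; assumed to be the one that ends up as index.tex.
master_doc = "index"

# Ask the extension to emit \frenchspacing in the generated LaTeX.
sffms_frenchspacing = True
```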

If you want to kick the tires, here is a sketchy outline of what to do:

  1. Install LaTeX with the sffms LaTeX class.
  2. Verify that you can run latex successfully on one of DeMarco’s example stories.
  3. Install Python 2.x (if necessary), followed by the Sphinx and sffms Python packages.
  4. Verify that you can do a vanilla Sphinx doc build. You can use sphinx-quickstart to create a test Sphinx document. Try creating a PDF manual of your test document.
  5. Download the skeleton short story and skeleton novel from GitHub. (These artifacts are not currently included in the sffms package itself, but they should be).
  6. cd into one of the skeleton directories and run sphinx-build -b sffms . _build.
  7. cd into _build/sffms and run latex index.tex — twice.
  8. Run pdflatex index.tex. View the resulting PDF.
  9. Go back to the story directory and start playing around with the Sphinx conf.py file. The configuration file contains all the available sffms configuration options with comments. The file also contains some minimal options for Sphinx in general. These other options have no comments, and you can ignore them if you are just working with sffms LaTeX output.
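Steps 6 through 8 are easy to script. The sketch below is a hypothetical helper, assuming sphinx-build, latex, and pdflatex are on your PATH and that the build lands in _build/sffms as described above; the function names are made up for illustration, and dry_run=True just prints the commands.

```python
import subprocess
import sys
from pathlib import Path


def build_steps(story_dir):
    """Return (working directory, command) pairs in the order to run them."""
    story = Path(story_dir)
    out = story / "_build" / "sffms"
    return [
        (story, ["sphinx-build", "-b", "sffms", ".", "_build"]),
        # Step 7 says to run latex twice.
        (out, ["latex", "index.tex"]),
        (out, ["latex", "index.tex"]),
        (out, ["pdflatex", "index.tex"]),
    ]


def build_manuscript(story_dir, dry_run=False):
    for cwd, cmd in build_steps(story_dir):
        print(f"[{cwd}] $ {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the build at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    build_manuscript(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it from anywhere with the skeleton directory as the argument; the PDF should end up next to index.tex in _build/sffms.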

The next order of business is to write some real documentation. Beyond that, I welcome any bug reports or suggestions for feature enhancements. I am neither a professional programmer nor very experienced with Python, so any help you can provide would be very… helpful. Thanks!

White Teeth

Recently my friend Shelly went on a trip to Europe, where she had the opportunity to take a tour of the crypts of Vienna. The tour guide pointed out some skulls, one of which had incredibly poor teeth.

“Do you think this is the skull of a rich person or a poor one?” asked the guide.

A poor one, the tourists guessed.

“Wrong,” said the tour guide. It turns out that the entire upper class in old Vienna suffered from severe tooth decay, because unlike the poor, they had access to all the sugar they could eat.

No reason to bring this up, other than I was reminded of this story while staring at Paul Allen’s horrific visage. All that money to spend on lawyers, and not a penny for dental floss. Sad.

Fever

I’m trying out a new feed reader called Fever. So far, I like it. Even if I didn’t like it, at the very least Fever’s creator Shaun Inman deserves major credit for diving into a moribund marketplace and trying to create something new and interesting.

Fever is a collection of PHP files that you install on your own server, sold for $30/pop. There is no central hosting, so if you don’t have access to a server, you’re out of luck. At first glance this seems like a crazy business decision. On the other hand:

  • running a hosted service is a PITA
  • in 2010, anyone who is actively looking for a new feed reader is almost guaranteed to be a nerd who has their own server

More interesting than Fever’s basic architecture or business model is how Fever treats feeds and entries. There are three basic categories:

  • Kindling — feeds that you want to read on a regular basis. Family, friends, important stuff.
  • Sparks — feeds that you don’t really care about reading regularly. These feeds only really exist as fodder for the Hot list.
  • The Hot list — a list of links that Fever has determined to be relevant based on the contents of your Kindling and Sparks.

For example, one of the top entries in my Hot list right now is “Do you skim?” In NetNewsWire, I would have seen this post anyway, since tor.com is one of my daily reads. However, Fever boosts this post to the top because the link is also referenced by SFSignal (a Spark) and a couple of science fiction author blogs (also Sparks). Another link in the Hot list is a PDF article in National Affairs by Robert Solow, “Science and Ideology in Economics.” I don’t subscribe to National Affairs at all, but that link appeared in a couple of my Sparks, so poof! there it is.
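I don't know how Fever actually ranks links, so treat this as a toy model of the idea only: a Python sketch that ranks links by how many distinct feeds point to them. The function name, feed names, and example.com URLs are all made up.

```python
from collections import defaultdict


def hot_list(feed_links, limit=10):
    """feed_links maps a feed name to the URLs its recent entries link to."""
    mentions = defaultdict(set)
    for feed, links in feed_links.items():
        for url in links:
            # A set, so repeat links from one feed count only once.
            mentions[url].add(feed)
    # Most-mentioned links first; ties broken alphabetically for stable output.
    ranked = sorted(mentions, key=lambda url: (-len(mentions[url]), url))
    return [(url, len(mentions[url])) for url in ranked[:limit]]


if __name__ == "__main__":
    feeds = {
        "tor.com": ["https://example.com/skim"],
        "SFSignal": ["https://example.com/skim", "https://example.com/other"],
        "Author blog": ["https://example.com/skim"],
    }
    for url, count in hot_list(feeds):
        print(count, url)
```

Counting distinct feeds rather than raw mentions keeps one chatty feed from dominating, though a lopsided mix of feeds will still tilt the results toward one group.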

What I love about this design is that it mirrors how I actually want to think about feeds. There are my actual friends and colleagues who I want to pay attention to, plus a small number of pro blogs that are consistently interesting. And then there are feeds that are sort of interesting, but just not worth the cost of adding to the feed reader. But in Fever, you can just throw lower-priority feeds into Sparks and never have to think about them again. If anything interesting happens, Fever will bubble it up. In traditional feed readers, this kind of thing is a chore — I know I can only keep up with so many feeds, so every new feed is a costly decision. Fever solves this problem elegantly. It’s actually kind of liberating to race around the web, adding feeds again.

The other brilliant thing about Fever is that unread counts are not shown by default.

What are my issues with Fever?

  • The only big one is, reading authenticated feeds does not seem to work, at least not with protected LiveJournals. This only affects a couple of my feeds, but it’s something I really need to figure out before I’ll be able to wean myself from NetNewsWire entirely.
  • This might be pilot error, but as far as I can tell, links in the Hot list don’t seem to have the concept of “Read / Unread”. Instead, they have a “Blacklist” button that nukes the link from view. There’s something a little off about having only this metaphor for clearing up the Hot list.
  • Building up Sparks is pretty fun, but you have to be careful to keep things balanced: if you add a bunch of SF writer blogs, you need to add a roughly equal number of econ blogs, and so on. Otherwise one group will start to dominate. (SF nerds vs. econ nerds: FIGHT!)
  • So far, Twitter appears to be useless. I’ve added a few of the people I follow on Twitter to Sparks, and either these test feeds don’t include enough overlapping links to make a difference, or Fever is getting confused by Twitter’s link shorteners. Screw link shorteners and screw Twitter.

So next up, figuring out the auth problem. After that, pulling in my Facebook activity and turning it into a dedicated private feed for Fever to consume. Thanks to Facebook’s EVIL PRIVACY-DESTROYING BABY-KILLING Graph API, I think that project might be easy enough even for me.

The Greatest Generation Really Was Pretty Great!

I’ve been reading Steve Blank’s outstanding series, The Secret History of Silicon Valley. Blank makes the case that much of the valley’s history has been simply forgotten, and the true starting point is at least 100 years ago:

I read all the popular books about the valley and they all told a variant of the same story; entrepreneurs as heroes building the Semiconductor and Personal Computer companies: Bill Hewlett and David Packard at HP, Bob Taylor and the team at Xerox PARC, Steve Jobs and Wozniak at Apple, Gordon Moore and Bob Noyce at Intel, etc. These were inspiring stories, but I realized that, no surprise, the popular press were writing books that had mass appeal. They were all fun reads about plucky entrepreneurs who start from nothing and against all odds, build a successful company.

To my surprise, I discovered that yes, Silicon Valley did start in a garage in Palo Alto, but it didn’t start in the Hewlett Packard garage. The first electronics company in Silicon Valley was Federal Telegraph, a tube company started in 1909 in Palo Alto as Poulsen Wireless. (This October is the 100th anniversary of Silicon Valley, unnoticed and unmentioned by anyone.) By 1912, Lee Deforest working at Federal Telegraph would invent the Triode, (a tube amplifier) and would go on to become the Steve Jobs of his day — visionary, charismatic and controversial… By 1937, when Bill Hewlett and David Packard left Stanford to start HP, the agricultural fields outside of Stanford had already become “Vacuum Tube Valley.”

The part that really struck me was the section about World War II, where Fred Terman and his colleagues were tasked with defeating Germany’s very sophisticated and secret electronic air defense system, which was responsible for inflicting unsustainable losses on Allied bomber crews. In an incredibly short period of time, these engineers completely transformed the nature of electronic warfare. Or as Blank puts it,

Just to give you a sense of scale of how big this electronic warfare effort was, we built over 30,000 jammers, with entire factories running 24/7 in the U.S. making nothing but jammers to put on our bombers.

By the end of World War II, over Europe, a bomber stream no longer consisted of just planes with bombs. Now the bombers were accompanied by electronics intelligence planes looking for new radar signals, escort bombers just full of jammers and others full of chaff, as well as P-51 fighter planes patrolling alongside our bomber stream.

Unbelievably, in less than two years, Terman’s Radio Research lab invented an industry and had turned out a flurry of new electronic devices the likes of which had never been seen.

Aside from catching up on my history, the other thing I’ve been doing is moving the HTML tutorial out of WordPress completely and into the new template. This also gave me the opportunity to do some cleanup — fixing typos, outdated sections, broken links, and so on.

One section of the tutorial discusses abusing HTML borders to do dotted underlines and other fancy decorations. Originally, I had a link to a 2003 version of the CSS 3 spec, which included the possibility of doing dotted underlines natively, using CSS text-decoration. As I was editing, I thought it would be good to update the link to the latest version of the draft. To my surprise, the 2007 version of the section now says in red,

Paul and I have agreed that we want to simplify the set of properties introduced in the previous CSS3 Text Candidate Recommendation. We’re not sure how yet, though, and would like to solicit input from the www-style community.
So far, we think that the following capabilities should be sufficient…

Hmmm. Okay, so to recap:

  • In the early 1940s, Fred Terman’s Radio Research Lab spawned an entire new industry in a couple of years, based on far-out science-fictional technology, shipped product, and helped win the war against fascism.

  • Meanwhile in the 2000s, after nearly a decade, we still can’t figure out how to do fancy underlines.

HELLOSKI!

I’m done with “Hello World”.

It’s just so soggy and uninspired. Whenever you see output from a new test program, you should be happy. Huzzah, something is working! But all “Hello World” makes me think of is a sad, wrinkled turtle, peering out of his shell, looking around timidly before speaking. If there’s one fundamental principle of software engineering we can all agree on, it should be this: whatever your program outputs on success, it should never be something that you could imagine being warbled by a turtle.

Just the other day, I installed Apache2 on my home machine via MacPorts, fired it up, and the test page said, “It works!” Sweet! Now that’s what I’m talking about.

“It works!” is pretty good, but I think we can do even better. I hereby declare that starting today, the official replacement for “Hello World” is… “HELLOSKI”. When you see “HELLOSKI”, you think of — a cheerful Russian! Who is going to slap you on the back and buy you a drink! Because you are writing awesome, revolutionary software!

Seriously, try it for yourself. The next time you’re writing a “Hello World” program, make it say “HELLOSKI” instead. (If you’re in a web context, be sure to add H1s for full effect.) You’ll be glad you did.
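For the web context, a minimal sketch using only Python's standard library might look like this: a bare WSGI app (the name helloski_app is made up) that serves the recommended H1, plus a tiny server for local testing.

```python
from wsgiref.simple_server import make_server


def helloski_app(environ, start_response):
    # The whole page: a title and the all-important H1.
    body = b"<!DOCTYPE html><title>HELLOSKI</title><h1>HELLOSKI</h1>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Serve on http://localhost:8000/ until interrupted.
    with make_server("", 8000, helloski_app) as server:
        server.serve_forever()
```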

Prying Up Rocks, Shining Flashlights

“I can’t believe you like money too! We should hang out.” – Frito, Idiocracy

With Microsoft’s launch of Bing, we finally have someone in the marketplace who is able to match Google dollar for dollar, both on the tech side and on the business side. This is having plenty of ripple effects [cough]. Anyway, one interesting facet of this story is that it is forcing the tech press to pay more attention to how search market share works. Heck, with Microsoft moving so aggressively to win back deals from Google, some of these stories are even turning out to be kind of sexy!

The tech press definitely deserves some sympathy here, because distribution deals are fundamentally a hard story to report.

  • They’re about boring sales guys, not exciting new technology. It’s not a very heroic narrative.
  • The effects of these deals are shifted in time. Deals made two, three, four years ago are just having their major effects felt now.
  • The effects of these deals are subtle and difficult for outsiders to track. The press only has access to aggregate data from third parties. Only companies with search engines have access to the real, raw data, and we don’t share it.
  • Understanding the real importance of distribution deals requires the reader (and reporter!) to make a huge cognitive leap: namely, that almost everyone else in the world is pretty vague on the concept of a browser, or a website, or a search engine. For most internet users, these concepts are all mushed together. Which is why unlike you, 98% of the population can’t or won’t change their search engine preferences.

I think any techie can relate to that last issue. We’ve all had our own proverbial “Aunt Ida”, a happy, intelligent, fully functional member of society who nevertheless has major trouble getting online, reading her email, etc. So we sort of get that first bit at least. Here’s the thing that we all have a hard time understanding at a deep, fundamental level, because we live in the tech echo chamber: everybody is Aunt Ida. We are a rounding error.

So with that in mind, it’s good to see more light being shined on this corner of the industry, and it’s very interesting to watch the press starting to wrap their heads around this stuff. Still, this process is happening in fits and starts, and in some instances I think we have… a ways to go. Here, for example, is CNBC on the Bing/Apple discussions:

“Got an intriguing email from a knowledgeable source very familiar with search dynamics involving Apple, Microsoft and Yahoo for that matter.

Third point: Every time you do a Google search from Apple’s iPhone Safari and a user clicks an ad, Apple gets a payment. Microsoft, this source tells me, is willing to throw much more money to Apple to ensure that they displace Google as the default engine…”

CNBC needed secret inside sources to let them in on this? Incredible. But even more jaw-dropping was the report from Search Engine Land. When reading the following passage, keep in mind that Search Engine Land is widely considered to be the premier source for news about the search industry:

“… Becoming the default search provider on 70 million (roughly) iPhone OS devices would be an enormous boost for Bing. (One question: is Microsoft offering Apple money?)”

Headdesk. Oh, well. Until those guys catch up, there’s always Kara Swisher.

At Least I Minored in PPOWER

Via Timothy Burke, I ran across Course Hero, a Web 2.0 startup whose mission is:

Accelerating and maximizing educational breakthroughs (“Ah-Ha” moments) of students from inquiry to Course Hero Responses via an open, best-of-breed content sharing model.

Or in other words, a site for collecting student notes and papers. Now, Prof. Burke isn’t particularly worried about term paper download sites in general. (He has an excellent defense strategy — don’t hand out boring, easily-copied assignments.) However, in the case of Course Hero he observes,

“What you find in the folders for Swarthmore is a bunch of junk pulled straight out of specific folders on the server, with the server folder titles on it, most of them connected to the oldest layers of our web presence. Almost none of the stuff in there has got anything to do with actual courses taught here: it’s some old .pdf handouts, some faculty c.v.s, a few papers or publications by faculty. Useless to anyone, especially to some would-be plagiariser at another college who is hunting for a paper to rip off. It’s a lot of noise. But seriously, don’t even try to pretend that this is all coming from user submissions, that’s laughable.”

It seems like a bad idea In These Economic Times (TM) to launch a site whose business model is obviated by typing site:swarthmore.edu {query} into Yahoo! or Google or Bing. But I’m not an MBA or a VC, so what do I know.

Anyway, I was particularly tickled not by the bad content, but by the bad metadata. Here’s the landing page for my alma mater:

Harvey Mudd College) is a private university in California. Harvey Mudd College has over 738 undergraduate students. The top 10 departments are CS, ENG, MATH, LIT, E, FOOL, PPOWER, WIN, WMF, and WINW.

What an idiot I was to major in FOOL!

Videos I’d Like to See

Earlier this summer, the Google Chrome team produced a video where they went around asking regular citizens what a browser was. Turns out that about eight percent of the people know the answer, while the rest have no idea.

Cue mockery, laughter, sadness, feigned outrage, and even the occasional reasonable response.

I’ve never liked Jay Leno-style man-on-the-street video interviews; it’s easy to make regular people with no TV experience look bad. But one thing is clear: this video has legs, and the “8 percent” figure will probably be cited in blogs and articles and conferences for years to come. If this ends up driving home the point that hectoring ordinary people to “get a better browser” is a waste of time, then it’s hard to argue with the overall merit of the project. Remember, kids: users do not change their defaults. That’s why it’s all about the Benjamins, er, distribution deals.

Here is the video I would like to see. The scene: a succession of sleek Silicon Valley or NYC webdev offices filled (in no particular order) with beanbags, contemporary art, and twentysomethings with messenger bags. The questions:

  1. Why do we have seasons?
  2. Why is the sky blue?

Okay, I’ll admit that I’m cheating a little on Question #1, since I’ve already seen the famous 90s era video of new Harvard graduates flubbing that question. Unfortunately I can’t find that clip on YouTube or any other major video sites. Chalk this one up to the Vast Harvard Conspiracy (Truth Suppression Division).

As for Question #2, I only have my instinct to go on, but I suspect the results would be equally dismal. Note that we’re looking for layman-friendly answers here. References to Rayleigh scattering are admired but not required.

Sorry, My Amicus Briefs Only Work Against Chaotic Evil

Commenter Harry Lewis, on the Google Books settlement:

The proposed settlement includes a “most favored nation” provision. The parties agree that IF the Authors and Publishers ever come to terms with another party who is scanning books, Google has to get the same deal. That is an anti-competitive provision that will make it impossible for anyone else ever to underprice Google. If the Court adds its signature to the deal, it is sanctifying the creation of a monopoly.

Driven by despair, or perhaps fragile hope, my old classmate Sam Mikes responds with poetry:

The law condemns the man or woman
who steals the goose from off the common
but lets the greater villain loose
who steals the common from the goose.

One thing is clear: Brewster Kahle is going to need all the help he can get if he’s going to slug it out with Google. So what are our most prominent knights of the commons doing to assist us in our hour of need? I sauntered on over to Larry Lessig’s place to see what he thought about the original settlement in October.

Oops, looks like Lessig’s in the tank.

Maybe the EFF… hmm, no. They’re a little more measured, but they don’t seem all fired up to go after Google either.

Being a Paladin of the law is tough work, I guess.