The Advantages of setting all books into a common primary format
by NH

I try to render all my books into a common primary format, which makes rendering into other formats extremely easy. Two examples can be given. First, I convert all quotes to the state where the double quote, or “dog-ear”, is the primary one. Most nineteenth-century books are in this state in any case. From the twenties onwards, and occasionally even earlier, the single quote was sometimes used as the primary one. However, it is a fact that it is far easier to convert from the primary double-quote to the primary single-quote than the reverse, because the primary single-quote format gives rise to too many uses of the right single-quote, which also serves as the apostrophe (over-typing, as a programmer would say). In other words, the primary single-quote format can easily be attained from the primary double-quote format, but the other way round is not so easy.
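The easy direction can be sketched in a few lines of Python. This is only an illustration, not the conversion actually used for these books: the function name is hypothetical, the text is assumed to use curly quotes, and nested quotations are assumed to use ‘ and ’. The rule for closing a nested quotation is a heuristic (a ’ counts as the close only when a nested quotation is open and no letter follows it), so a word-final apostrophe inside a nested quotation would be misread.

```python
def double_to_single_primary(text: str) -> str:
    """Swap quote levels: outer “…” becomes ‘…’, nested ‘…’ becomes “…”."""
    out = []
    nested_open = False  # True while inside a nested ‘…’ quotation
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\u201c":            # “ is always an opening primary quote
            out.append("\u2018")
        elif ch == "\u201d":          # ” is always a closing primary quote
            out.append("\u2019")
        elif ch == "\u2018":          # ‘ opens a nested quotation
            nested_open = True
            out.append("\u201c")
        elif ch == "\u2019" and nested_open and not nxt.isalpha():
            nested_open = False       # heuristic: treat as the nested close
            out.append("\u201d")
        else:
            out.append(ch)            # includes ’ used as an apostrophe
    return "".join(out)


sample = "\u201cI don\u2019t know,\u201d she said. \u201cHe called it \u2018odd\u2019 twice.\u201d"
print(double_to_single_primary(sample))
```

In the double-quote source every ” is unambiguously a closing quote, which is what makes this direction simple. In the converted text the closing quotes and the apostrophe in “don’t” are the same character, so a routine going the other way would have to guess which is which.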

The other example concerns what I call punctuation tokens. This phrase refers to a set of symbols appearing between two strings of alphabetic characters. There are a few hundred of these that are definitely permitted. Occasionally we find a token that appears not to be permitted. We can either add it to the list of permitted tokens, or investigate whether there has been a misread or a typo.
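A punctuation token test along these lines might look like the following Python sketch. The regular expression, the function name and the short permitted list are all assumptions made for illustration; the real list runs to a few hundred tokens, and this sketch treats digits as part of a token, which may not match the actual test.

```python
import re
from collections import Counter

# A token is a run of non-letter characters with a letter on each side.
TOKEN_RE = re.compile(r"(?<=[^\W\d_])[\W\d_]+(?=[^\W\d_])")

# Hypothetical, much-shortened list of permitted tokens.
PERMITTED = {" ", ", ", ". ", "; ", ": ", "? ", "! ", "-", "\u2019"}


def unpermitted_tokens(text, permitted=PERMITTED):
    """Return each token not in the permitted list, with its count."""
    counts = Counter(m.group(0) for m in TOKEN_RE.finditer(text))
    return {tok: n for tok, n in counts.items() if tok not in permitted}


page = "Well, it isn\u2019t far ,but we walked. Then home."
for token, n in unpermitted_tokens(page).items():
    print(repr(token), n)
```

Each token reported is then either added to the permitted list or traced back to the page to see whether it came from a misread or a typo.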

I actually perform two punctuation token tests. One of these is performed on the primary files, and the other is performed on the Gutenberg-style files that I produce. The latter sometimes finds things that had been missed or glossed over at the primary stage. The first test is one of the first things that we do in working on a book, and the Gutenberg-style test is one of the last. To be quite truthful, there is another punctuation token test that can be used, and sometimes is, and that is when we produce a stripped-down version of the book for playing aloud on an iPaq.

There are other things that I do to harmonise the format of a book with an ideal. One is to make sure that spelling is consistent, including accents where required, and the other is to make sure that hyphenation is consistent. Neither of these is particularly difficult, once the process has been set up. These processes constitute what I call watermarking, which became important to me once I realised that pirates were taking my work and pretending that it was their own, for their own commercial benefit.
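As an illustration of the hyphenation check, the Python sketch below lists words that occur in a text both with and without their hyphens. It is an assumption about how such a check could be done, not a description of the process actually used; the function name and the word pattern are hypothetical, and it does not catch a hyphenated word that also appears as two separate words.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined to further runs by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def hyphenation_conflicts(text):
    """Map each hyphenated word to (its count, joined form, joined count)."""
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    conflicts = {}
    for word, n in counts.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in counts:  # both spellings occur in the same text
                conflicts[word] = (n, joined, counts[joined])
    return conflicts


text = "To-day we sail. He said today was fine, and to-day it is."
print(hyphenation_conflicts(text))
```

The counts suggest which form the book prefers, so the minority form can be checked against the printed page before anything is changed.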

All this having been said, I try to deviate from the way the actual book was presented only in the case of typos and misreads. Any other deviation needs a serious reason.

From the primary format we produce the following:

  1. The book in xhtml/css format by chapters, as published on my website.
  2. The book in xhtml/css format, as formerly published on my website; this consumed too much bandwidth and so was discontinued, except when requested. This can have pointers to the original pages, as well as to the chapter headings.
  3. The book in OEBPS format (Open e-Book Publication Structure as defined on September 21, 1999, and later refined on July 2, 2001).
  4. The book in Microsoft e-Book format (.LIT), which was created to conform with the OEBPS format.
  5. The book in Rocket e-Book format, now deprecated.
  6. The book in TextAloud MP3 format.
  7. The book in Fonix iSpeak format.
  8. The book in a no-frills format that can be played aloud by the TTS function in an iPaq. This can sometimes disclose a hitherto unseen error.
  9. The book in a Plain Vanilla ASCII format as required by Gutenberg.
  10. The book in an MP3 format produced using TextAloud MP3.
  11. The book in an MP3 format produced using Fonix iSpeak.

NH, Monday, 22nd May 2006.