Thoughts on Using the Athelstane e-Book System
by NH

Working with XP

For the XP operating system, use SET_MODE.BAT.

This will set the screen size to be the same as for Windows 98.

Note that in the old DOS there were eight screens, such that you could load each screen and then display any of them during a run. This was useful for Help screens, or for other different levels of activity.

It seems that none of this works any more, and that a single fixed area of memory is allocated to the screen. With the old system there were 25 rows of 80 characters, namely 2000 bytes, and each byte had an associated byte that contained its attribute (colour). To make the cursor disappear you would move it just off the screen, to row 25, column 0. This does not work any more.
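
For reference, the classic text-mode layout interleaves each character byte with its attribute byte, so the 25 x 80 screen occupies 4000 bytes in all. A minimal sketch of the addressing arithmetic, in Python purely for illustration:

    # Classic DOS text mode: 80 columns by 25 rows, one character byte
    # plus one attribute (colour) byte per cell, stored interleaved.
    COLS, ROWS = 80, 25

    def cell_offset(row, col):
        # Byte offset of the character at (row, col); its attribute sits at offset + 1.
        return (row * COLS + col) * 2

    print(cell_offset(0, 0))     # 0    - top left character
    print(cell_offset(24, 79))   # 3998 - bottom right character
    print(ROWS * COLS)           # 2000 - character bytes; attributes double the total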

Displaying an image, as opposed to text, is quick and easy if each row can be displayed as a horizontal raster, but the computer is very noticeably slower if the rasters are vertical, presumably because the image is stored in memory row by row, so a vertical raster touches widely scattered addresses. This affects the manual cropping program.

This may well be the only point in the whole Athelstane system that is still affected, since all the programs that parked their cursor off-screen at (25,0) have been amended to park it at the bottom left of the screen, at (24,0).

Permanent Settings

Before you start any book you need to set, once and only once, such items as the name to which you are ascribing your copyrights. This is done by editing thus:-

edit \wavefile\crite.dat

In our case this contains the one word “Athelstane”.

You can also set, and update, your own version of the copyright message, which you do by editing thus:

edit \wavefile\copyrite.dat

With these small changes you have identified yourself as the owner of the output from your work with this program.

There are a few minor problems to be ironed out.

The token analyser currently holds all valid tokens in memory, within the program itself. There is another token analyser, which works on the ASCII output, and which holds all valid tokens in the form of a sorted file. That file is then “compiled” into a database format, so that each token is looked up as it is encountered and found to be valid, or otherwise. The obvious advantage is that this database can be extended at will, without the need to recompile the token analyser. We will probably take this step forward in the near future, but meanwhile what we have works very well.
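
The lookup itself needs nothing more elaborate than a binary search over the sorted token file. A minimal sketch of the idea in Python; the file name tokens.dat and its one-token-per-line format are assumptions for illustration:

    import bisect

    # Load the sorted token file once; this stands in for the "compiled" database.
    with open("tokens.dat") as f:
        TOKENS = [line.strip() for line in f if line.strip()]

    def is_valid(token):
        # Binary search: to add new tokens you extend the file, not the program.
        i = bisect.bisect_left(TOKENS, token)
        return i < len(TOKENS) and TOKENS[i] == token

    print(is_valid("the"))   # True, provided "the" appears in tokens.dat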

Text other than numbered chapters 1, 2, 3, ...

Bear in mind that in the full version of the Athelstane system, as opposed to the “lite” version, there do exist other “chapters” than the numbered ones, for instance:-

_P1 and -0   Cover blurb for the book;
_P2 and -1   Publisher’s blurb for the book;
PIC and -2   List of illustrations for the book;
INT and -3   Introduction to the book;
PRE and -4   Preface to the book;
FWD and -5   Foreword to the book;
DED and -6   Dedication;
PRO and -7   Prologue.

All the above precede the text of the book, and these four follow it:

POS and @0   Postscript;
APX and @1   Appendix;
BIO and @2   Biography of the author;
EPI and @3   Epilogue.
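
For anyone scripting around the system, here is the same scheme restated as data; the Python below merely rewrites the table above, and the variable names are mine:

    # Three-letter chapter codes and their ordering keys, per the table above.
    FRONT_MATTER = {   # these precede the numbered chapters
        "_P1": "-0",   # Cover blurb
        "_P2": "-1",   # Publisher's blurb
        "PIC": "-2",   # List of illustrations
        "INT": "-3",   # Introduction
        "PRE": "-4",   # Preface
        "FWD": "-5",   # Foreword
        "DED": "-6",   # Dedication
        "PRO": "-7",   # Prologue
    }
    BACK_MATTER = {    # these follow the numbered chapters
        "POS": "@0",   # Postscript
        "APX": "@1",   # Appendix
        "BIO": "@2",   # Biography of the author
        "EPI": "@3",   # Epilogue
    }
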
Scanning double pages

A most important issue is how we deal with books that have been scanned with a conventional scanner, as opposed to a page-at-a-time scanner such as the Plustek OpticBook.

We advise that you use the generic software that came with the computer’s operating system, rather than the software that came with the scanner, for the latter may well try to do unhelpful things, and may even fail while doing them. All we want is a good TIFF image of each double page.

There are a number of recommendations.

1. The scans should be numbered sequentially in the simplest possible manner, for instance u001.tif, u002.tif, u003.tif, and so forth. Thus a 400-page book would require 201 scans: u001.tif would have a (usually) blank page on the left and the first text page on the right; u200.tif would have page 398 on the left and page 399 on the right; and u201.tif would have page 400 on the left and nothing on the right (the sketch after this list restates the arithmetic). These should be packed black-and-white TIFFs.

2. It is very desirable that the centre of the gutter, the fold line down the middle of the double page, should always be at the same place in the scan, preferably the middle. You can achieve this by sticking a piece of paper with a line drawn on it onto the side of the scanner, pointing to the middle of the scan, the fold of the book.

3. The top of the page should be towards the right of the scanner, so that in the actual scan image it appears to the left. The other way round if you like, but be consistent.

4. Do not make any special effort to keep the scans perfectly unskewed; just do your best. A little skew on either page will be taken out automatically, in due course.

5. The one thing that you must be sure of is that you are producing good clear images. For instance, a ‘t’ followed by a full stop ought not to look like a ‘t’ with no full stop. Of course we remember that with a nineteenth-century book there may be considerable variation of print quality through the book, and even within a single page.
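
The numbering rule in recommendation 1 can be stated simply: scan n carries printed page 2n-2 on the left and page 2n-1 on the right, with page 0 standing for the blank opener. A minimal sketch in Python:

    def pages_of_scan(n):
        # Printed pages on double-page scan n, where u001.tif is n = 1;
        # page 0 means the (usually) blank page before the first text page.
        return 2 * n - 2, 2 * n - 1

    print(pages_of_scan(1))     # (0, 1)     - blank left, first text page right
    print(pages_of_scan(200))   # (398, 399)
    print(pages_of_scan(201))   # (400, 401) - page 400 left, nothing right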

The total process we have provided to give such scans an initial processing is called “Stage_A”, as opposed to “Stage_1” for those scans done with the OpticBook.

What “Stage_A” does is to chop each double-page scan into two single-page scans, each correctly numbered but, at this stage, not correctly oriented.
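
A minimal sketch of that chop, assuming (per recommendation 2) that the gutter sits at the horizontal midline; the real Stage_A is a program of its own, and Pillow plus the output names here are merely illustrative stand-ins:

    from PIL import Image   # pip install Pillow

    def chop(scan_path, n, out_dir="."):
        # Split double-page scan n into two single-page images, numbered
        # by the 2n-2 / 2n-1 rule; orientation is deliberately left as-is,
        # since rotation happens at a later step.
        img = Image.open(scan_path)
        w, h = img.size
        left = img.crop((0, 0, w // 2, h))     # gutter assumed at the midline
        right = img.crop((w // 2, 0, w, h))
        left.save(f"{out_dir}/p{2 * n - 2:03d}.tif")
        right.save(f"{out_dir}/p{2 * n - 1:03d}.tif")

    chop("u001.tif", 1)   # writes p000.tif and p001.tif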

All these page scans are loaded into ABBYY FineReader, and the whole collection of them is highlighted in the usual way, by clicking on one of them and then hitting Control-A. They are then rotated with Image/Rotate Right, despeckled with Image/Despeckle, and finally saved in the subfolder named “straight” as the “ppp” series.

These images will possibly be a little black to the left and right of the page, but the Stage 2 element “left and right polish” will try to deal with this.

So, “Stage_2” having been invoked as usual, the cropping will be carried out exactly as before, and from now on the processes are identical.

Doing the Editing of the Chapters

Part of the proofing process is listening to the books, played in the background. Obviously this would not work if there were so many errors in what you hear that your mind blanked it all out, rather than subconsciously listening to the whole thing and understanding it. When editing we go through the chapters from first to last, preparing them for the final edit. In this first pass we attend to the top and bottom of each page, then to errors thrown up by the spelling checker, and then to the punctuation. We know there are going to be a few errors left, and we have techniques for finding many of them, notably the first two T-tests, but we leave those out for now.

When we complete the processing of each chapter it is automatically used to generate an extensively marked-up text which can be played aloud by Fonix ISpeak.

As a rule of thumb you can do this fairly good (but not complete) processing of each chapter, plus the dozen or so tests applied to the whole book, while the chapters are in turn each read out aloud. It is remarkable how these two lengthy processes seem to take almost the same amount of time.

We can now go back and run through each chapter, applying the two T-tests, and probably the R-test, again. After each chapter is processed, an improved text-to-speech file is available. Remember that in doing this we have made use of a database of nearly 20,000 words for which we have an improved pronunciation markup. But we can do two further things to improve matters. We can install markup for words that have two possible pronunciations, such as “wind”, “row”, “lead” and “bow”. With some software tricks that we have installed we can make sure that these markups are correct, or as near as we can make them. We actually provide for over 120 of these words with two different pronunciations, and occasionally we encounter a new one, which always causes us to pause and think about it.

And then we can apply a test that emerges with a list, usually fairly short, of words that have appeared in the book but for which we do not yet know whether ISpeak can pronounce them correctly. On average it cannot correctly pronounce one new word in four. So we need a way of identifying such words and installing them, with their correct markup, in the appropriate database. This will be found to work very smoothly.
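
At heart that test is a set difference: any word in the book that appears in neither the “speaks correctly unaided” database nor the marked-up database is a candidate for checking. A sketch in Python, with all the file names invented for illustration:

    import re

    def load_words(path):
        # One word per line; these file names are illustrative, not the real ones.
        with open(path) as f:
            return {line.strip().lower() for line in f if line.strip()}

    known_good = load_words("speaks_ok.dat")    # ISpeak manages these unaided
    marked_up = load_words("markup_words.dat")  # these already carry markup

    with open("chapter01.txt") as f:
        book_words = {w.lower() for w in re.findall(r"[A-Za-z']+", f.read())}

    # Words never yet tested against ISpeak: usually a fairly short list.
    unknown = sorted(book_words - known_good - marked_up)
    print("\n".join(unknown))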

When these things have been done - and we do provide very simple and effective ways of doing them - the text played aloud is probably going to sound very good indeed.

When we have heard the book played through a second time, albeit in our subconscious background, and amended anything we hear to be wrong, we have probably arrived at quite a good version of the book, almost certainly better than the original book as we received it in printed form.

As we developed and improved the implementations of the above ideas, which took a number of years, so we raised the nominal “level” to which books were being spoken.

At Level 1, no work was done to improve the text by adding markup. This is how we were working in 2002, but very quickly it became obvious that the two forward steps described above were needed. At Level 2, all words known to be mispronounced by ISpeak have had markup added to improve them: this is the level at which the first reading of a book is carried out. It is pretty good, but not as good as we can make it.

When we have identified any new words to be added to the databases of words that ISpeak can, and cannot, speak correctly unaided, and applied them, we are at Level 3.0. Add the raw markup for the bi-phone words and we reach Level 3.1. There may then be one or two changes needed per chapter; identify and carry out these changes and we are at Level 3.2. It has taken a long time to explain all this, but the processes for getting a book up to Level 3.2 are simple and easy to carry out, once you know how.

By now every test that we have devised has been applied to catch out word and punctuation misreads, and typos. So a CD made at a level below 3.2 really needs to be made again once the book has reached 3.2, and the CDs for earlier levels thrown out.

One of the skills you will need is to have the set of Fonix ISpeak phonetics in your personal memory. These are given in its help file. You will notice that the sound set is American and is defective over many sounds in the English language, so we have to use American approximations to King’s English sounds.

Listening with NeoSpeech Paul

Having said all this about our recommended Fonix ISpeak, it will have occurred to you that there ought by now to be TTS programs that need little or no input from the user.

There are indeed, and NeoSpeech Paul is one of them. It is a fairly big program, and it does an acceptable job on the entire book without too much markup. There does have to be some markup, because he speaks American, which is occasionally incomprehensible to the English ear. For example he says “wrack” when he means “rock”, which leads to some odd results when the action is taking place on a beach.

This program is available through TextAloud MP3.

We prepare this set of files last of all. We start off by preparing quite unmarked-up chapter files for use on the iPAQ, in which there resides a version of ISpeak that is very small but does the trick, except that it does not allow markup such as that needed for delays at the ends of paragraphs and sentences. So, having produced these iPAQ files, a further set is produced for NeoSpeech Paul by adding markup for full stops and for the ends of paragraphs.
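
A sketch of that second pass in Python, inserting pause markup after sentence ends and paragraph ends; the tag strings below are placeholders for whatever markup the TTS engine actually accepts, which you would substitute in:

    import re

    # Placeholder tags: substitute the real pause markup your TTS engine uses.
    SENTENCE_PAUSE = "[pause-short]"
    PARAGRAPH_PAUSE = "[pause-long]"

    def add_pause_markup(text):
        # A full stop (or ? or !) followed by white space gets a short pause...
        text = re.sub(r"([.?!])(\s)", r"\1" + SENTENCE_PAUSE + r"\2", text)
        # ...and a blank line (the end of a paragraph) gets a longer one.
        text = re.sub(r"\n\s*\n", PARAGRAPH_PAUSE + "\n\n", text)
        return text

    with open("chapter01.txt") as f:
        print(add_pause_markup(f.read()))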

The Audiobook - Making CDs of a book

When making CDs of a book you need to place the speech-text files in a directory called “bookname”, the 8-character short name that you chose for the book, and to have a further subdirectory with the same name, into which you will get the TTS program, either ISpeak or TextAloud MP3, to place the MP3 files as they are created.

When the MP3 file creation is complete, we run a program called “how_long” in that MP3 directory. This utilises a very small file called TAG, which was delivered with the marked-up speech files, and adds identity data to the end of each chapter MP3 file. This data appears on the screen of your MP3 player when you are playing the book.

Alternatively you might be listening to the book using something like Windows Media Player, in which case you should note that “how_long” also creates a playlist file called “bookname.m3u”. You just need to load this into Windows Media Player, or whatever other program you use to play your audiobooks, and the chapters and other parts, such as the Preface and the Epilogue, will be played in the correct order. Of course, if you are playing them off a CD player, including the CD type of MP3 player, the player has software in it which gets the chapters into the right order without using the m3u playlist files at all.
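
Identity data appended to the end of an MP3 is the classic ID3v1 scheme: a fixed 128-byte block beginning with the ASCII letters “TAG”, which is presumably what the small TAG file supplies. A sketch of both jobs in Python; how how_long actually fills in the fields is an assumption here, and real chapters would of course get individual titles:

    import glob
    import struct

    def id3v1(title, artist, album, year="", comment="", genre=255):
        # Build a 128-byte ID3v1 block: 'TAG' plus fixed-width text fields.
        return struct.pack(
            "3s30s30s30s4s30sB", b"TAG",
            title.encode()[:30], artist.encode()[:30], album.encode()[:30],
            year.encode()[:4], comment.encode()[:30], genre)

    chapters = sorted(glob.glob("*.mp3"))   # assumes names sort into play order

    for ch in chapters:
        with open(ch, "ab") as f:           # the identity data goes at the end
            f.write(id3v1(ch, "Author", "Bookname"))

    # The playlist is simply the file names, one per line, in play order.
    with open("bookname.m3u", "w") as f:
        f.write("\n".join(chapters) + "\n")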

We have no special preferences about mobile CD MP3 players. At the present time we use Philips ones, which seem to do the trick. For some strange reason, although MP3 is a widely used format, there do not seem to be all that many CD MP3 players on the market. DVD MP3 players are just beginning to make their appearance (November 2005).

MP3 players with their own built-in memory, often very large, are also on the market. Until recently the memory available was only 128 or 256 megabytes; the latter is sufficient to hold many books, but not the former. We now possess an MP3 player with no less than two gigabytes.

Perhaps the most important piece of information about an MP3 CD player is the range of Hertz and bits-per-second settings that it will play. Believe it or not, we have actually bought in good faith an MP3 player that could not play MP3 files in the speech range. In other words, the audiobooks we create with Athelstane software would not play on it. Not good.

An Essay by the Webmaster of Athelstane E-Books