Fonix iSpeak uses a playlist approach, so you can keep improving the components of the playlist even while a book is being spoken aloud. As soon as the initial OCR process has completed, creating the pages in text form and, from them, the chapters in text form, you can automatically create a text prepared for reading aloud for each chapter, and then the playlist for the book. This is before you have even had a look at anything that came out of the OCR process. The result will initially not be very good, however, and you need to put each chapter in turn through two manual processes: (1) P, the top-and-tailing of each page, and (2) S, the spelling check. There are plenty more errors to be found, but you are probably within an initial target of 98% correct once P and S have been done. You therefore only need to do P and S on the first chapter before you can start listening to the audiobook playlist. When iSpeak comes to the end of a chapter it simply loads the next chapter, in whatever state that might be. You can do P and S at about three times the reading rate, so once you have done the first chapter you stay ahead of what is being read.
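The claim about staying ahead can be checked with a little arithmetic. The sketch below is only an illustration: the function name, the chapter lengths and the assumption that listening starts the moment the first chapter has been through P and S are mine, not part of the iSpeak software.

```python
def editing_lead(chapter_minutes, edit_speedup=3.0):
    """For each chapter, return how many minutes before playback reaches it
    the P and S editing of that chapter is finished (negative = too late)."""
    leads = []
    edit_done = 0.0
    # Assumption: listening starts as soon as chapter 1 has been edited.
    play_start = chapter_minutes[0] / edit_speedup
    for minutes in chapter_minutes:
        edit_done += minutes / edit_speedup   # editing runs 3x faster than reading
        leads.append(play_start - edit_done)  # slack before playback gets here
        play_start += minutes                 # the chapter is then read aloud in full
    return leads


if __name__ == "__main__":
    # Hypothetical book: three chapters of 30 minutes reading time each.
    for number, lead in enumerate(editing_lead([30, 30, 30]), start=1):
        print(f"chapter {number}: edited {lead:.0f} min before it is read")
```

With chapters of broadly similar length the lead grows with every chapter. A second chapter more than three times as long as the first would be the one case in this model where the reader catches up with the editor.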
By contrast, TextAloud MP3 loads the chapters themselves (rather than their names) and plays them in numerical order. It is perfectly possible to load one chapter at a time while you are working on the rest of the book, but this can be a nuisance. You can also load any number of chapters, for instance half the book, or the whole book. So there is a constant need to supervise the process, whereas iSpeak supervises itself once you have told it to read whatever it finds indirectly addressed in the playlist.
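The difference between holding chapter names and holding chapter contents can be sketched in a few lines. This is not iSpeak's actual playlist format, which is not described here; the file names, the one-name-per-line list and the `play` function are hypothetical stand-ins chosen to show the indirection.

```python
import tempfile
from pathlib import Path


def play(playlist_path, on_chapter):
    """Walk a playlist of file names, opening each chapter only when its turn comes."""
    playlist = Path(playlist_path)
    spoken = []
    for name in playlist.read_text(encoding="utf-8").splitlines():
        name = name.strip()
        if not name:
            continue
        # The chapter text is read only now, so it is in whatever state it has reached.
        spoken.append((playlist.parent / name).read_text(encoding="utf-8"))
        on_chapter(name)  # stand-in for the time spent reading the chapter aloud
    return spoken


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp)
        (book / "ch01.txt").write_text("Chapter one, corrected.", encoding="utf-8")
        (book / "ch02.txt").write_text("Chapter two, raw OCR.", encoding="utf-8")
        (book / "book.lst").write_text("ch01.txt\nch02.txt\n", encoding="utf-8")

        def edit_next(name):
            # While chapter 1 is being read, the editor finishes chapter 2.
            if name == "ch01.txt":
                (book / "ch02.txt").write_text("Chapter two, corrected.", encoding="utf-8")

        return play(book / "book.lst", edit_next)


if __name__ == "__main__":
    print(demo())
```

Because the playlist stores only names, the correction made to the second chapter during playback of the first is what gets spoken. A program that loads the chapter texts up front would still hold the raw version.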
Another problem with TextAloud MP3 is that its speed control is not very good: it sometimes speeds up very slightly. If you are doing two jobs at once, you need the audiobook speaker to go steadily and not too fast. If it starts going too fast you have to switch your entire attention to it. This does not occur when it is recording onto disk, only when it is “reading aloud”.
We initially found iSpeak to be not very good at pronouncing some of the words. If you list all the words that you find in a book, it will pronounce two-thirds of them correctly, and those will be the most common words. But one-third of the words need assistance. Luckily there is a very good feature of iSpeak which allows you to insert markup just before one of these dodgy words, so that iSpeak pronounces it acceptably, if not totally correctly. We have provided a database of some 18,000 words and their markups, together with software that inserts them into the text. Thus as soon as you have finished editing a chapter, a version of the text is automatically created that is referred to in the playlist, and that is what iSpeak will be playing back to you. We also provide the software for testing words and adding them to the database.
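The insertion step can be pictured roughly as follows. This is a minimal sketch, not the software we supply: the `insert_markup` function, the sample dictionary and in particular the `<say:...>` tag are invented for illustration, and iSpeak's real markup syntax is not shown here.

```python
import re


def insert_markup(text, markups):
    """Put the stored markup immediately before every word found in the database.

    `markups` maps a lower-case word to its markup string (hypothetical format).
    """
    def fix(match):
        word = match.group(0)
        markup = markups.get(word.lower())
        return f"{markup}{word}" if markup else word

    # A "word" here is a run of letters and apostrophes; real text may need more care.
    return re.sub(r"[A-Za-z']+", fix, text)


if __name__ == "__main__":
    # Hypothetical database entry; the real database holds some 18,000 words.
    database = {"worcester": "<say:WUSS-ter>"}
    print(insert_markup("He rode to Worcester.", database))
```

Words not in the database pass through unchanged, so the same routine can be run over every chapter each time it is edited.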
Fonix iSpeak allows the use of several different voices, including “Basil”, who allegedly speaks UK-English, but he is not very good at it, and you end up being annoyed. So I stick to “Roger”, who is iSpeak's own voice. There are a couple of annoying bugs that you need to be aware of when you are using iSpeak. One of them can stop it working at all if you have deleted the location you previously wrote your MP3 files to. You can get out of this impasse by directly editing its ini file. The other bug concerns its temporary files, which it does not clear down when it has finished. Eventually the operating system sorts this out, but if you are short of space on your computer it can be very bewildering to run out of space for no apparent reason.
TextAloud MP3 allows the use of several speakers, some of which are acceptable and some not. Cepstral “Duncan”, a well-spoken Scotsman, would be very good if he spoke louder. NeoSpeech “Paul” is excellent, speaking US English. In December 2005 ScanSoft released a UK-English voice that is very good, so good that you can happily use it with a minimum of tweaks, or even none at all; the tweaks it does need are mainly to ensure the correct pronunciation of names. But because of the differences between iSpeak and TextAloud MP3 it is desirable to use the former when developing a good version of a book, and the latter when creating a final audiobook. There are no problems with creating MP3 files with TextAloud, or at any rate none as serious as the two you get with iSpeak.
As far as third party opinion goes, I am the only person I know who likes the iSpeak voice, “Roger”. This is probably because for many years it was the only sensible voice available, and I used it for every one of my audiobooks. Microsoft voices are no good at all. Other people seem to prefer the better quality TextAloud voices.
Finally, we provide the tags necessary to have identification (author, book-name, chapter number) come up on the screen when playing an audiobook, either on an MP3 player or on the computer. The above three entities may be referred to as “artist”, “album” and “song”. This is a great convenience and is very easy to use. You only have to type “how_long” and the job is done, tagging each chapter-file, producing a duration report, and producing an m3u playlist file.
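What such a command has to produce can be sketched as below. This is not the “how_long” software itself, whose internals are not described here: the function names and sample figures are hypothetical, the chapter durations are simply passed in, and writing the tags into the MP3 files (which needs a tagging library) is left out. The playlist uses the common extended M3U layout of an `#EXTINF` line followed by the file name.

```python
def format_duration(seconds):
    """Render a number of seconds as h:mm:ss."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_outputs(author, book, chapters):
    """chapters is a list of (file_name, seconds) pairs, in playing order.

    Returns the tag fields per chapter, the m3u text and the duration report.
    """
    tags, playlist, report = [], ["#EXTM3U"], []
    for number, (file_name, seconds) in enumerate(chapters, start=1):
        song = f"Chapter {number:02d}"
        # author -> "artist", book-name -> "album", chapter number -> "song"
        tags.append({"file": file_name, "artist": author, "album": book, "song": song})
        playlist.append(f"#EXTINF:{int(seconds)},{author} - {song}")
        playlist.append(file_name)
        report.append(f"{song}  {format_duration(seconds)}")
    total = sum(seconds for _, seconds in chapters)
    report.append(f"Total  {format_duration(total)}")
    return tags, "\n".join(playlist) + "\n", "\n".join(report) + "\n"


if __name__ == "__main__":
    # Hypothetical two-chapter book with made-up durations.
    tags, m3u, report = build_outputs(
        "An Author", "A Book", [("ch01.mp3", 1805), ("ch02.mp3", 2400)]
    )
    print(m3u)
    print(report)
```

Keeping the three outputs in one pass means the playlist, the report and the tags cannot drift out of step with one another.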
For most books you can fit both the TextAloud audiobook and the iSpeak one on a CD, leaving room on it for the html version, the Microsoft eBook version, and the Gutenberg-style version, as well as for the pdf of the original scans.
NH, Sunday, 21st May 2006.