What is the Athelstane Website?

I shall attempt to answer this question by expanding it into several other questions: What? When? What books? How? What software? What is produced?

What? The Athelstane website is a collection of e-books, today numbering no fewer than 300. I will try here to tell you how it came about, and how it is managed.

When? The collection was begun in about 1994, when a friend who had parted from her bookseller husband gave me some old out-of-copyright books that were in unsaleable condition. Previously I had scanned in articles intended for a sports magazine that I was editor of, from their typescripts. I had also scanned in a whole unpublished book that was on flimsies dating from 1928 to 1943. This was Kenneth Tindall’s “Life of Christ”, which you can see on the Athelstane website.

I scanned in one or two of her books, but quickly realised that I was not making as good a job of it as I would have liked. So even now these books are not on the website. One of these was Conrad’s “Chance”, a very difficult book for a beginner to do, because of the deep nesting of its speech paragraphs. That means a conversation, within a conversation, within a conversation, etcetera. Most books have instances of a conversation within a conversation, and possibly an instance or two of the next level of nesting. Here’s an example. “He was telling me about his plans. ‘You’ll find me,’ he said, ‘at “The George” in Oxford.’”
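To make the idea of nesting depth concrete, here is a minimal sketch in C of how the deepest level of speech in a passage might be counted. It is an illustration only, not one of the Athelstane modules: the name max_speech_depth is hypothetical, the text is assumed to be UTF-8 with curly quotation marks, and a right single quote standing between two letters is assumed to be an apostrophe rather than the end of a speech.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 byte sequences for the four curly quotation marks. */
#define LSQUO "\xE2\x80\x98" /* left single quote */
#define RSQUO "\xE2\x80\x99" /* right single quote, also used as apostrophe */
#define LDQUO "\xE2\x80\x9C" /* left double quote */
#define RDQUO "\xE2\x80\x9D" /* right double quote */

/* Hypothetical helper: return the deepest level of nested speech
   found in a NUL-terminated UTF-8 string. */
int max_speech_depth(const char *text)
{
    int depth = 0;
    int deepest = 0;
    size_t i = 0;

    while (text[i] != '\0') {
        if (strncmp(text + i, LDQUO, 3) == 0 ||
            strncmp(text + i, LSQUO, 3) == 0) {
            /* An opening mark of either kind starts a deeper level. */
            depth++;
            if (depth > deepest)
                deepest = depth;
            i += 3;
        } else if (strncmp(text + i, RDQUO, 3) == 0) {
            if (depth > 0)
                depth--;
            i += 3;
        } else if (strncmp(text + i, RSQUO, 3) == 0) {
            /* Assumption: a mark with a letter on both sides is an
               apostrophe (as in "You'll"), not a closing quote. */
            int letter_before = i > 0 && isalpha((unsigned char)text[i - 1]);
            int letter_after = isalpha((unsigned char)text[i + 3]);
            if (!(letter_before && letter_after) && depth > 0)
                depth--;
            i += 3;
        } else {
            i++;
        }
    }
    return deepest;
}
```

Applied to the example sentence above, this sketch reports a depth of three. Its apostrophe rule is crude: a plural possessive such as “boys’” would be taken for the end of a speech, so real texts need more careful handling.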

In 1997 I began scanning books in seriously, as I had moved into my parents’ home to look after them, now in their eighties and nineties. On discovering that Captain Marryat, one of England’s best authors, was unrepresented on the web, I determined to obtain a copy of every one of his books, scan it, OCR it, check it through and post the book on a very limited website I had at the time with Compuserve. Very quickly I realised that I would need to write a great many software modules, from tiff handling to error trapping in the raw scanned texts.

What books? When I had finished Marryat I went to a second-hand shop and asked if they had any old nineteenth-century books. They produced “Saved by the Lifeboat” by R.M. Ballantyne. Again, I decided to present the world with as complete a collection of his works as I could manage. The total number of books expected was in the mid-eighties.

When the supply of Ballantyne books began to die down I went to another second-hand shop, and simply chose from what I could see some of the authors whose works were now in the public domain, that is, out of copyright. That shop was in Barking, and has since disappeared. Here I found Richard Archer and Lewis Hough. I also bought several other old books which I did not wish to scan in, at least not at that time.

Then there was a bookshop in the covered market in Oxford. The owner has since retired, and the shop is no longer there. Browsing this shop brought Amy Walton and Mrs. George de Horne Vaizey onto my scene. I was becoming quite anxious to have more girls’ books on the website. I think that George Manville Fenn came to my knowledge here as well, and certainly I was re-acquainted with Frederic Farrar here.

Another old bookshop, in Winchester this time, put me on the track of John Conroy Hutcheson and Charlotte M. Yonge.

Kingston came in because of an email received from a reader, advising me to start using eBay, and to build up a collection of his books that way. Today that collection tops 70 volumes, 50 of them already online. And I have bought several hundred books online through eBay and Abebooks.

Another email from a reader, received a few months ago, put me on the track of Captain Mayne Reid, the first writer of books about the wild life in the Southern States and Mexico. He wrote about 20 books, of which three are already on the website.

And very recently an email from a reader asked if I could put on some books by Talbot Baines Reed, some of which are “school stories”. As many of his books are available cheaply I bought a dozen, and we are working through these as well.

How? Of course at first I had to scan the books by pressing them down onto a flat-bed scanner, which did some of the books a great deal of harm. Even to this day some people scanning books cut the back off the book, and scan it as a sequence of loose sheets, which they then throw away. I have to confess that I even did that myself for at least one book, which in the end I abandoned.

In 2004 I bought a good camera and found that it was easy to make usable images of the pages of a book with it. But during that year my old scanner gave up the ghost, and I bought a new one. This scanner was a great mistake, because of the software that was bundled with it, and in the autumn of that year I heard about the Plustek OpticBook 3600 scanner. It sounded ideal for the job and I had one working by the start of December 2004, since when I have used no other.

But it wasn’t until January 2006 that I increased the scanning resolution to 600 dpi. Scanning is a little slower as a result, but the quality of the resulting raw OCRed texts is much better.

What software? At first I used the Hungarian Recognita for scanning, but when I started on the Marryat books in 1997 I began to use TextBridge, moving to TextBridge Pro 97 when that became available. Since 2004, however, I have used ABBYY FineReader.

All the software used to edit and proof the books was written by myself, in the C language. Some of the modules are by now perhaps redundant, but about 400 are in use; the user does not need to know the names of any of them.

For program writing I use a popup text editor called “Simon”, the author of which would be surprised if he knew how much I rely on it, as he no longer uses it himself.

For creating viewable collections of the scans of a book I assemble them using “ImageToPdf” into pdf files viewable by Adobe software. This is a comparatively inexpensive but commercial program that does not use any of the Adobe software that puts up the price of other pdf-creating programs so steeply.

For batch conversion of the tiff scans into Group 4 images I use “Advanced Batch Converter”. The packing of the scans is very efficient in this format. This program is also used to reduce the images to the size in which they can be used by the editor. Until I started to use this program in 2004 I merely scaled the images by simple factors, using a program I had written for the purpose. This worked, but needed improving upon.
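Scaling by a simple factor can be sketched in C as below. This is an assumption about the general approach, not a reconstruction of the original program: it supposes the scan has already been decoded into an 8-bit greyscale array in memory (reading the tiff file is left out), and the name shrink_image and its parameters are invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: shrink an 8-bit greyscale image by a whole-number
   factor. Each output pixel is the mean of a factor-by-factor block of
   input pixels; any partial block at the right or bottom edge is dropped.
   dst must have room for (width / factor) * (height / factor) bytes.
   Returns 0 on success, -1 if the arguments cannot be used. */
int shrink_image(const unsigned char *src, size_t width, size_t height,
                 size_t factor, unsigned char *dst)
{
    if (src == NULL || dst == NULL || factor == 0)
        return -1;

    size_t out_w = width / factor;
    size_t out_h = height / factor;
    if (out_w == 0 || out_h == 0)
        return -1;

    for (size_t oy = 0; oy < out_h; oy++) {
        for (size_t ox = 0; ox < out_w; ox++) {
            unsigned long sum = 0;

            /* Add up every input pixel in this block. */
            for (size_t dy = 0; dy < factor; dy++) {
                for (size_t dx = 0; dx < factor; dx++) {
                    size_t y = oy * factor + dy;
                    size_t x = ox * factor + dx;
                    sum += src[y * width + x];
                }
            }
            dst[oy * out_w + ox] = (unsigned char)(sum / (factor * factor));
        }
    }
    return 0;
}
```

Averaging each block is only one choice; simply keeping one pixel from each block would be faster but tends to break up thin strokes of type, which matters when the reduced image is meant to be read on screen.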

What is produced? What we put on the website is a set of xhtml (an orderly form of html) files, one for each chapter of the book. There is an index file letting you choose which chapter to read, and there is a short essay by myself on the substance of the book.

But we do in fact maintain the software for several other kinds of output. We used to put the books out as a single xhtml file for each book, but now because of the bandwidth used up by people who really only want a momentary glimpse of the book, we keep these in a place on the website that only approved users have access to.

We create a Microsoft e-book formatted version, known as a lit file. This is in fact a very efficient way of publishing a book, because it is quite tightly packed, and can be read aloud by nearly all of the commercial “voices.” It also looks nice on the screen, if you want to read it as a book instead of listening to it, and furthermore can be imported easily into a hand-held PC.

But most important of all we produce a file that can be read aloud very well by Fonix ISpeak. This can be made available to anyone that asks for it, but we do not put it directly on the website.

Very recently a most excellent “voice” for reading books in the best English has come onto the market. This is “Daniel” by Scansoft. It is so good that very little preparation of a book is required before the voice can be used to read it. This voice, and many others, is available through NextUp’s “TextAloud MP3”. With this program you can easily download a text from Gutenberg, have it chopped up automatically into its individual chapters, and have those chapters transcribed into well-spoken English MP3 files. It is then easy to burn these onto CDs or DVDs, to copy them onto an MP3 player, or even to record them onto a MiniDisc and listen to them with a MiniDisc player. I find this last development very exciting, and take a pile of CDs and MiniDiscs with me when I go on holiday. To me, that’s what making e-books is all about: audiobooks.

Nick Hodson, 15th March 2006.