Using the Athelstane system to check other people's texts

by NH

The Athelstane system can be made to analyse books for which you only have a plain text version, such as texts downloaded from Gutenberg.

In the first place create a folder for the text. The name for this should have six, seven or eight letters. You also need a name for the author, again of six, seven or eight letters. We normally create an author-folder of this name, and make the book-folder or folders subsidiary to it, though this isn't really necessary.

You now need a unique five-character name for the book. We usually choose a name that is unlikely to be chosen for any other book, while still being memorable enough for the book we are working on. For instance, as we write, we named “A Houseful of Girls” by Mrs. G de Horne Vaizey “hgirl” rather than “girls”, because the latter might have been chosen for some other book. There is a very good reason for doing this: when you have many books on the website you need to be able to refer to each of them uniquely by its five-character name.

So now you are in the folder you have created for the book. Let us pretend you have called the author “myauthor”, the book “the_book” and given a five character name “abook” to the book.

Enter the following:

    set_book myauthor abook the_book

Of course you should use the names you actually gave to the book and its author.

Make sure the text file you are going to be working on is in the folder, and that its name is different from “the_book”; otherwise you might find it gets overwritten. In any case it is as well to have a backup copy of the original text somewhere else on the computer, or on a CD. Let us pretend that its name is “actual_name.txt”.

Now you need to split the book up into chapters. So enter the following:

    txtsplit no actual_name

The “no” in this command merely tells the program that we do not want to produce chapters in a form that can be read immediately by a text-to-speech program.

Of course this assumes that there are lines that start “Chapter” or “CHAPTER” in the book. The program also looks for the following: “Preface”, “Introduction” and “Prologue”, or their capitalised forms.

Each time it finds one of these lines it starts a new chapter. It produces files called “chapt01.txt”, “chapt02.txt” and so on, numbered in the order in which they are found. Since a Preface or an Introduction takes a number just as a chapter does, you might sometimes find that the file for “Chapter One” is not called “chapt01.txt”, but, say, “chapt03.txt”.
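For readers who want to see why the numbering can drift, here is a minimal sketch in Python of the splitting rule just described. It is not the txtsplit program itself, whose inner workings are not shown here. The function names are our own invention, and we have assumed that any text before the first heading gets a file of its own.

```python
# Heading words taken from the description above, plus their capitalised forms.
HEADINGS = ("Chapter", "Preface", "Introduction", "Prologue")
STARTS = HEADINGS + tuple(word.upper() for word in HEADINGS)


def split_chapters(text):
    """Return a list of chunks, starting a new chunk at each heading line."""
    chunks = []
    current = []
    for line in text.splitlines(keepends=True):
        # Assumption: text before the first heading becomes a chunk too.
        if line.startswith(STARTS) and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def chunk_names(chunks):
    """Name the chunks chapt01.txt, chapt02.txt, ... in order of appearance."""
    return ["chapt%02d.txt" % n for n, _ in enumerate(chunks, start=1)]


sample = (
    "Title page\n"
    "Preface\nSome words.\n"
    "CHAPTER ONE\nIt began.\n"
    "Chapter Two\nIt ended.\n"
)
chunks = split_chapters(sample)
names = chunk_names(chunks)
```

In this sample the title page and the Preface take the first two numbers, so the chunk that begins “CHAPTER ONE” is the third one.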

Now enter

    REN chapt??.txt abook??.txt

Remember that “abook” stands for the name you gave the book.
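If you are not working at a DOS prompt, the same renaming can be sketched in Python. This is only an illustration of what the REN command above does to the file names. The function names are hypothetical, and “abook” again stands for your own five-character name.

```python
from pathlib import Path


def new_name(old_name, short_name):
    """Map 'chapt03.txt' to '<short_name>03.txt'; return None for other names."""
    stem, _, ext = old_name.rpartition(".")
    # Only names of the form chapt??.txt qualify (five letters plus two characters).
    if ext != "txt" or len(stem) != 7 or not stem.startswith("chapt"):
        return None
    return short_name + stem[5:] + ".txt"


def rename_chapters(folder, short_name):
    """Rename every chapt??.txt file in folder; return the new names in order."""
    renamed = []
    # sorted() builds the full list first, so renaming inside the loop is safe.
    for path in sorted(Path(folder).iterdir()):
        target = new_name(path.name, short_name)
        if target is not None:
            path.rename(path.with_name(target))
            renamed.append(target)
    return renamed
```

Calling rename_chapters(".", "abook") in the book folder would leave “actual_name.txt” alone and rename only the chapter files.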

Give the following commands:

    create short

    copy C:\wavefile\names.ini

    call ro names

Now we are ready to start working through the book, except for one thing, which is that we prefer the main character for speech to be the double quote sign " and not the single quote sign ‘, and certainly not the apostrophe sign ’.

If you need to convert these single-quote signs to double-quote signs and vice-versa, just enter the single command

    dogear (Enter).

This will do the conversion as well as it can, though sometimes there may be places where it has left the character unchanged (you will see why, if you encounter any of these places).
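To show why some characters have to be left alone, here is a rough sketch in Python of one way such a conversion might be attempted. It is not the dogear program, and its rules are our own assumptions: an opening ‘ is always treated as speech, a ’ between two letters is treated as an apostrophe, a ’ straight after punctuation is treated as the end of speech, and anything else is left for a human to decide. It only goes from single to double, not the other way.

```python
OPEN_SINGLE = "\u2018"   # the sign ‘
CLOSE_SINGLE = "\u2019"  # the sign ’, also used as the apostrophe


def single_to_double(text):
    """Turn single-quote speech marks into " where the case is clear."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if ch == OPEN_SINGLE:
            out.append('"')          # assumed to open speech
        elif ch == CLOSE_SINGLE:
            if prev.isalpha() and nxt.isalpha():
                out.append(ch)       # apostrophe inside a word
            elif prev in ".,!?;:-":
                out.append('"')      # closes speech after punctuation
            else:
                out.append(ch)       # ambiguous, so left unchanged
        else:
            out.append(ch)
    return "".join(out)


speech = single_to_double("\u2018Don\u2019t go,\u2019 she said.")
possessive = single_to_double("the girls\u2019 house")
```

The second example is the awkward kind: the ’ after “girls” could be a possessive or the end of a quotation, so the sketch leaves it as it is.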

You also need to make sure there are no hash signs in the book. If, for example, ## is being used in the book to mark a break, then convert every instance of it to <hr>. You could try the following:

    allrepla   ##   ≤hr≥

Obviously you can’t put < or > on a command line like this one, so we use the characters ≤ (alt-numpad-243) and ≥ (alt-numpad-242) instead. To get these characters hold down the alt key, and enter the numbers on the number keypad on the right of the keyboard.
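The effect of that replacement on a single chapter can be sketched in Python as follows. This is not the allrepla program, just an illustration under our own assumptions. The function name is hypothetical, and the list of leftover lines is an extra we have added so that any stray hash signs can be found afterwards.

```python
def replace_break_marks(text, mark="##", replacement="<hr>"):
    """Replace each break mark, then list the lines that still hold a hash sign."""
    replaced = text.replace(mark, replacement)
    leftover = [
        number
        for number, line in enumerate(replaced.splitlines(), start=1)
        if "#" in line
    ]
    return replaced, leftover


chapter = "One.\n##\nTwo # three.\n"
new_text, leftover_lines = replace_break_marks(chapter)
```

Here the break on the second line is converted, while the lone hash sign on the third line is reported so that you can deal with it by hand.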

Now we are ready to roll.

Enter PCN which will get you to the program that we use to edit books.

There are various actions that are to be used when we are checking over books that have come in from another source. The first is 2A, and we will use this for every chapter just to do the following:

A spelling check, a punctuation check, and a check of the signs that should come in pairs. We'll come back to how this is done in a few more paragraphs.

Then we will do the following actions on the whole book:

2B  Find missing stop after T
2C  Find "I" anomalies
2D  Find "V" anomalies
2E  Find "Funnies"
2H  Over-all Spelling Check
2J  Numerals report
2K  Test all punctuation tokens
2L  Rare word test
6K  Some frequent misreads
7A  Find missing comma before quote
7B  "He" that might possibly be "be"
7D  Check for single characters
7K  Check HTML codes balance

Each of the above is fully described in the help file that you can find on the corresponding pages 2, 6 or 7 of PCN.

Now for the chapter-by-chapter checking that we mentioned before:

control-up    spelling checker from the start of chapter
control-down  spelling checker from current position
alt-‘         visits all instances of ‘ in the chapter
alt-’         visits all instances of ’ in the book
alt-r         checks punctuation
alt-q         checks paired punctuation

In each case control-z exits the checking temporarily, if you want, and in the case of the commands initiated by alt key the following conventions apply.

1. Either alt-n or the original key-stroke restarts the checking.
2. If the key-stroke is given at the bottom of the file, then the checking will start again from the top of it. The reason for this is obvious.
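To give an idea of what a check on paired signs involves, here is a small sketch in Python. It is not how alt-q works inside PCN, which we do not describe here. It rests on our own assumptions: paragraphs are separated by blank lines, speech uses the " sign, and a paragraph is worth a look if it has an odd number of " signs or brackets that do not match up.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def brackets_balanced(paragraph):
    """Return True if every bracket in the paragraph is properly closed."""
    expected = []
    for ch in paragraph:
        if ch in PAIRS:
            expected.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not expected or expected.pop() != ch:
                return False
    return not expected


def suspicious_paragraphs(text):
    """Return (paragraph number, reason) for each paragraph worth a look."""
    problems = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        if paragraph.count('"') % 2 != 0:
            problems.append((number, "quotes"))
        if not brackets_balanced(paragraph):
            problems.append((number, "brackets"))
    return problems


sample = '"Hello," he said (quietly.\n\n"All is well."\n\nShe said, "Go.'
found = suspicious_paragraphs(sample)
```

An odd count is only a warning. A speech that runs over several paragraphs is often printed with an opening quote on each paragraph and a closing quote only on the last, so a human still has to make the decision.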

The help file available while you are in the editor will get you out of difficulties in all cases, we hope, so don’t be afraid to use it.

So now you are at the end of all the checking, and you have found loads of errors in that book. Since you have corrected the errors in your own derived version of the book, you need to have made a note of each error as you found it. Then you can send this list back to the author, or whoever you got the book from in the first place.

NH, 5th September 2005.

An Essay by the Webmaster of Athelstane E-Texts