My friends at the Paperback Warrior blog had high praise for this Harry Whittington story, THE SEARCHING RIDER, which made up half of Ace Double D-510. I was able to find a really good scan of the story online, and I converted the images to text using an OCR program called Tesseract. After running the text through a Perl script to remove page numbers, titles and linefeeds, I copied the text into MS Word, did a few Replace All changes to fix some oddball characters, and ended up with a pretty clean document. I'll still have to go through the painstaking process of comparing the source scan to the Word document and fixing a lot of other errors. The best part is that I get to read the book, and it will be digitally preserved in a true book format. Anyway, the screenshot above shows the source image on the left and the target document on the right. I'm a bit constrained on time, so it might take a while.
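The cleanup pass described above might look something like the following minimal sketch. It's written in Python rather than Perl and is not the actual script used here. The function name `clean_ocr_text` and the idea of passing in the running titles up front are assumptions for illustration. It drops lines that contain only a page number, drops the running titles, and joins each paragraph's wrapped lines back together.

```python
import re


def clean_ocr_text(raw: str, running_titles: set[str]) -> str:
    """Hypothetical OCR cleanup: strip page numbers and running titles,
    then rejoin hard-wrapped lines into paragraphs."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip lines that are nothing but a page number, e.g. "42".
        if re.fullmatch(r"\d+", stripped):
            continue
        # Skip running titles; compared in upper case, so pass them in upper case.
        if stripped.upper() in running_titles:
            continue
        kept.append(stripped)

    # Blank lines separate paragraphs; single line breaks inside a
    # paragraph are replaced with spaces.
    paragraphs = []
    current = []
    for line in kept:
        if line:
            current.append(line)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


raw = "THE SEARCHING RIDER\nHe rode into\ntown at dusk.\n\n12\n\nShe waited."
print(clean_ocr_text(raw, {"THE SEARCHING RIDER"}))
```

Real scans need more care than this. A page number sitting between blank lines in the middle of a paragraph still leaves a paragraph break behind, and words hyphenated across lines come out as "hyphen- ated". Those are exactly the kinds of leftovers the Replace All pass and the line-by-line comparison against the scan have to catch.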