Friday, January 24, 2020

A new ebook project


My friends at the Paperback Warrior blog had high praise for this Harry Whittington story, THE SEARCHING RIDER, which made up half of Ace Double D-510. I was able to find a really good scan of the story online, and I converted the images to text using an OCR program called Tesseract. After running the text through a Perl script to remove page numbers, titles and linefeeds, I copied the text into MS Word, did a few Replace All changes to fix some oddball characters, and ended up with a pretty clean document. I'll still have to go through the painstaking process of comparing the source scan to the Word document and fixing the remaining errors. The best part is that I get to read the book, plus it will be digitally preserved in a true book format. Anyway, the screenshot above shows the source image on the left and the target document on the right. I'm a bit constrained on time, so it might take a while.
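For anyone curious what the cleanup step might look like, here is a rough sketch in Python rather than Perl. It is not the actual script used for this project, just an illustration of the same idea: drop page-number lines, drop running titles, and join the hard line breaks back into paragraphs. The `RUNNING_TITLES` set and the function name `clean_ocr_text` are made up for the example, and the real headers would depend on the scan.

```python
import re

# Hypothetical running headers; the real ones depend on the scanned pages.
RUNNING_TITLES = {"THE SEARCHING RIDER", "HARRY WHITTINGTON"}


def clean_ocr_text(raw: str, running_titles=RUNNING_TITLES) -> str:
    """Remove page numbers and running titles, then unwrap paragraphs."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # A line holding only 1-3 digits is assumed to be a page number.
        if re.fullmatch(r"\d{1,3}", stripped):
            continue
        # A line that exactly matches a known running title is dropped.
        if stripped.upper() in running_titles:
            continue
        kept.append(stripped)

    text = "\n".join(kept)
    # Rejoin words split by a hyphen at the end of a line ("sun-\ndown").
    # Caveat: this also merges genuine compounds that happened to break there.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines separate paragraphs; inside a paragraph, linefeeds become spaces.
    paragraphs = re.split(r"\n\s*\n+", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


if __name__ == "__main__":
    sample = (
        "THE SEARCHING RIDER\n\n"
        "He rode into town at sun-\ndown, tired and\nthirsty.\n\n"
        "12\n\n"
        "The saloon was empty.\n"
    )
    print(clean_ocr_text(sample))
```

A script like this only gets the text most of the way there. A paragraph that runs across a page break can still end up split in two, which is one more reason the side-by-side comparison with the scan is needed.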

1 comment:

  1. Hi. I found this post from the book review on the Paperback Warrior website. I'm also a member of the Men's Adventure Paperbacks Of The 20th Century and just recently got a copy of the Ace Double and will start reading it tonight. I was intrigued to read about your project to convert the book to ebook format. I'm currently editing manuscripts for Piccadilly Publishing so I'm quite familiar with the whole editing process. I was wondering if you ever finished the conversion?
