UBUNTU APP: gscan2pdf

A number of times now I've needed some OCR (Optical Character Recognition) software in Ubuntu. A document I need to have transcribed into OpenOffice, a magazine article, whatever it is, the need keeps coming up. There are command line programs like "gocr", but for those of us who have used the command line for years and still can't figure it out completely, in comes gscan2pdf. It's a GUI mainly for scanning in documents and converting those images to PDF. Their website says this:

"gscan2pdf - A GUI to produce a multipage PDF or DjVu from a scan."

But it has a clever feature that allows you to import an image (or to scan it in) and then run the OCR software. gscan2pdf makes the OCR process very easy. 

- Open gscan2pdf
- Scan in your image or import it
- Go to "Tools" and then select OCR
- Pick GOCR or Tesseract as your OCR engine
- Click the "Start OCR" button and let it do its thing

The output will appear in the lower half of the gscan2pdf window. I have had better luck using the Tesseract OCR engine. Your mileage may vary, of course!
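If you ever want to script the same thing outside the GUI, here is a rough sketch of calling Tesseract directly from Python. It assumes the `tesseract` command line program is installed and on your PATH, and it relies on Tesseract's basic usage of `tesseract IMAGE OUTPUTBASE -l LANG`, which writes the text to OUTPUTBASE.txt. The function names, the `scan.png` file, and the `ocr_output` base name are made up for illustration; this is not how gscan2pdf itself does it.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base="ocr_output", lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    # Fail early with a readable message if Tesseract is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python ocr_sketch.py scan.png
    print(ocr_image(sys.argv[1]))
```

Run it with an image path as the only argument and it prints whatever text Tesseract recognized. As with the GUI, results depend a lot on the quality of the scan.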

About jake

2 comments:

  1. Anonymous 7:35 AM

    Here is a good article which compares the various OCR software:

  2. Hi Nina, and thanks for the reply! I don't see an article link though.

    In any event, there is lots of stuff on the net about OCR in Linux, but not a lot of good info.

    XPDF has a command line utility for exporting PDF to text, but it doesn't work well at all.

    The KOOKA scanning utility for KDE supports GOCR, but you need a scanner hooked up to even open KOOKA.

    Abbyy Finereader in Windows is AWESOME (costs money) and they have a trial version available for Linux...
    http://www.abbyy.com/ocr_sdk_linux/key_features/

    I'll try it out, but I really don't want to pay for anything :)

    Again, the best I've found for Ubuntu users is gscan2pdf and using the OCR feature within.