
Help: OCR (Optical Character Recognition)

OCR (Optical Character Recognition) is the process of converting a bitmap image of text (like a scanned document) into text that can be selected, copied and searched by PDFpen and other text editing software.

OCR technology will not produce a perfect rendering of the bitmapped text. You will need to proofread and edit the text that results from OCR.

Using OCR in PDFpen

  1. Open a scanned PDF in PDFpen.
  2. An alert box opens with the message "This document appears to be scanned. Would you like to perform optical character recognition (OCR) on it? OCR will allow you to select the text." You have three options:
    • Cancel
      No OCR will be performed.
    • OCR Page
      OCR will be performed on the current page only.
    • OCR Document
      If your document has multiple pages, OCR will be performed on all of the pages.

While PDFpen is performing the OCR, a progress bar will appear. The operation can take a few seconds or much longer, depending on the size and the contents of the scanned document.

To perform OCR manually, choose Edit > OCR... from the menu. PDFpen starts the OCR operation and the progress bar appears.

Selecting, copying and correcting OCR Text

The text generated by the OCR operation can be selected, copied and edited like any other text. See Working with Text.

Searching OCR Text

The text generated by the OCR operation can be searched like any other text. See Searching Within A PDF.

Tips to Improve the OCR Results of Your Document

  • The quality of the original document affects the quality of the OCR results. Crisp, clean originals with clear text will produce much better results than crumpled, faded photocopies.
  • Place your original document on the scanner as straight as possible. If you have a scanned PDF that is not straight, use software to straighten (or "deskew") the image before opening it in PDFpen.
  • Increase the contrast of your scanned document so that the background is as white as possible.
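
The contrast tip can be illustrated with a small sketch. It is not part of PDFpen. It assumes the scan is available as a grayscale image held as a list of rows, with pixel values from 0 (black) to 255 (white). The function name stretch_contrast and its black_point and white_point parameters are hypothetical. The sketch linearly rescales the pixel values so that anything at or above the white point becomes pure white and anything at or below the black point becomes pure black.

```python
def stretch_contrast(pixels, black_point=50, white_point=200):
    """Linearly rescale grayscale values so the background moves toward white.

    pixels: list of rows, each a list of ints in 0..255 (assumed layout).
    Values <= black_point map to 0; values >= white_point map to 255.
    The default thresholds are illustrative, not recommended settings.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            # Clamp to the chosen range, then scale to the full 0..255 range.
            clamped = min(max(value, black_point), white_point)
            new_row.append(round((clamped - black_point) * 255 / span))
        result.append(new_row)
    return result


# A light-gray background (210) with dark-gray text (60).
page = [[210, 210, 60], [210, 60, 210]]
cleaned = stretch_contrast(page)
print(cleaned)
```

In practice an image editor or scanner software would apply an adjustment like this before the PDF is opened in PDFpen. The thresholds depend on the scan and usually need to be chosen by eye.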

How to Force PDFpen to Perform OCR

PDFpen examines the document, and if it finds a single image the size of a page, it assumes that the document is a scan and automatically offers to perform OCR. In some cases, PDFpen may not recognize a scanned document. The OCR... item in the Edit menu will then be grayed out and unavailable. To force OCR in that case:

  1. Hold down the Command and Option keys together.
  2. Choose Edit > OCR... from the menu.
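
The detection rule described in this section can be sketched as follows. This is an illustration only, not PDFpen's actual code. The looks_scanned function, the (width, height) tuples measured in points, and the 5% tolerance are all assumptions made for the example. Under this reading of the rule, a page whose scan is stored as several smaller images would not be recognized, which is one situation where forcing OCR could be needed.

```python
def looks_scanned(page_size, image_sizes, tolerance=0.05):
    """Return True if the page holds exactly one image about the size of the page.

    page_size: (width, height) of the page, in points (assumed units).
    image_sizes: list of (width, height), one per image placed on the page.
    tolerance: allowed relative size difference (hypothetical value).
    """
    if len(image_sizes) != 1:
        return False
    page_w, page_h = page_size
    img_w, img_h = image_sizes[0]
    return (abs(img_w - page_w) <= tolerance * page_w
            and abs(img_h - page_h) <= tolerance * page_h)


letter = (612, 792)  # US Letter page size in points
print(looks_scanned(letter, [(610, 790)]))              # one full-page image
print(looks_scanned(letter, [(612, 396), (612, 396)]))  # two half-page strips
```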

 

 

 
 
© 2003-2009 SmileOnMyMac, LLC. All rights reserved.
SmileOnMyMac, PDFpen and PDFpenPro are trademarks of SmileOnMyMac, LLC.