UNDERSTANDING OPTICAL CHARACTER RECOGNITION (OCR) | Special Edition Using Adobe Creative Suite 2

Optical Character Recognition (OCR) is the ability to "read" text from a rasterized image and create editable text from it. There are two ways to use the OCR engine that is built into Acrobat, and which one you use depends on whether the page that contains the text has already been scanned.
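To make the idea of "reading" a raster concrete, here is a toy sketch in Python. It is not how Acrobat's OCR engine works. The 3x5 glyph templates, the fixed-width layout, and the function names are all assumptions made for illustration. The sketch cuts a tiny bitmap into glyph-sized cells and picks, for each cell, the template with the fewest mismatched pixels.

```python
# Toy illustration only: real OCR engines (Acrobat's included) are far more
# sophisticated. Templates and layout below are hypothetical.
# '#' is an ink pixel, '.' is background.
TEMPLATES = {
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "R": ["##.", "#.#", "##.", "#.#", "#.#"],
}

GLYPH_WIDTH = 3  # every glyph is 3 pixels wide in this sketch
GAP = 1          # one blank column separates neighboring glyphs


def split_glyphs(image):
    """Cut a raster of fixed-width glyphs into one small bitmap per glyph."""
    width = len(image[0])
    glyphs = []
    for x in range(0, width, GLYPH_WIDTH + GAP):
        glyphs.append([row[x:x + GLYPH_WIDTH] for row in image])
    return glyphs


def match_glyph(glyph):
    """Return the template character with the fewest mismatched pixels."""
    def distance(template):
        return sum(
            a != b
            for trow, grow in zip(template, glyph)
            for a, b in zip(trow, grow)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def recognize(image):
    """Turn the raster into an editable string, one glyph at a time."""
    return "".join(match_glyph(g) for g in split_glyphs(image))


# A "scanned" raster of the letters O, C, R; the fourth row contains one
# stray ink pixel inside the C to mimic scanner noise.
scanned = [
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#.#.#.#",
    "###.###.#.#",
]

print(recognize(scanned))
```

Because each cell is matched to its nearest template rather than requiring an exact match, the stray pixel does not change the result. The same tolerance is why scan quality and resolution affect how accurately real OCR software recognizes text.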

If the image is already part of a PDF document, open the OCR engine by choosing Document, Recognize Text Using OCR, Start. In the Recognize Text dialog you can specify which pages of the document to process and adjust the language, the resolution to which images are downsampled, and the output method. Acrobat recognizes not only the text but also the font and other formatting, and presents the converted text so that it looks as close to the original as possible (see Figure 44.1). If you need to process only one scanned image in a document, click it with the Select tool to select it. Then right-click (in Windows) or Control-click (on the Mac) the image and choose Recognize Text Using OCR from the context menu.

Figure 44.1. Acrobat makes it easy to use OCR to generate editable text.

If you have a printed page, or several pages, with text that you need to convert to editable text, you need to bring it into Acrobat first. To do this, choose File, Create PDF, From Scanner. A dialog prompts you to select your scanner and edit the OCR settings, which are the same as in the previous method. If your scanner has an auto-feeder, you can use this feature to scan long paper documents. Acrobat generates a PDF with the same number of pages as the original document.