Scanning Documents



If you have a scanner, you can scan documents in Linux. A scan can produce either of two types of output:

  • Images: A document is scanned and stored in a graphics file format. The image must be viewed with software that can display graphics. The image is a picture of the document.

  • Text: A document that contains text can be converted into a text document. Using a process called optical character recognition (OCR), the characters that are scanned are read as letters and stored in a text file. OCR doesn't get the characters 100% correct. Its accuracy depends on the quality of the document being scanned. However, if you need to edit a document, it's often faster to scan it and edit the OCR errors than to type the document from scratch.

On Linux, you can scan with Kooka, an open source raster image scan program that is an official part of the KDE Graphics Package. Kooka uses the SANE (Scanner Access Now Easy) library to communicate with the scanner. Kooka can save a scan as an image and can also run OCR on it.

The SANE Web site allows you to search a database of scanners. You can enter a manufacturer and model to determine whether the scanner is supported (www.sane-project.org/cgi-bin/driver.pl). If possible, check for scanner support before purchasing a scanner.

If you want to use OCR on your scanned image, additional software is required. You may need to install it. Check your distribution software for a package called gocr. If you can't find it on your system or your installation CDs, you can install a package from jocr.sourceforge.net. See Chapter 10 for information on installing packages.

Start Kooka from the main menu, in the graphics or multimedia submenu, or select Run and type kooka. When Kooka starts, it scans your system for scanners and provides a list of scanners found. Select the scanner you want to use.
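Because Kooka finds scanners through SANE, you can run a similar check outside Kooka. The following is a minimal sketch, not Kooka's own code. It assumes that the scanimage command-line tool from the SANE package is installed and that each line of its -L output has the form "device `name' is a description". The function names are made up for this illustration.

```python
import re
import shutil
import subprocess

# Assumed shape of one line printed by `scanimage -L`:
#   device `backend:address' is a Vendor Model flatbed scanner
DEVICE_LINE = re.compile(r"^device [`'](?P<name>[^']+)' is an? (?P<description>.+)$")


def parse_device_list(output):
    """Return (device_name, description) pairs found in scanimage -L output."""
    devices = []
    for line in output.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices.append((match.group("name"), match.group("description")))
    return devices


def list_scanners():
    """Ask SANE for attached scanners; return [] if scanimage is not installed."""
    if shutil.which("scanimage") is None:
        return []
    result = subprocess.run(
        ["scanimage", "-L"], capture_output=True, text=True, check=False
    )
    return parse_device_list(result.stdout)
```

Calling list_scanners() returns a list you can print or inspect. An empty list means that scanimage is missing or that SANE found no scanner, in which case Kooka is unlikely to find one either.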

When Kooka is open, look at the bottom-left section for your scanner settings. Which settings you can change depends on the scanner you are using. You can usually set color and resolution, and often brightness and other options. You may have to experiment with settings to obtain the best scan. Higher resolution can improve the image, but can also result in huge image files. The default is often 72 dpi (dots per inch), which suits screen display and Web images. However, if the document is text that you want to transcribe using OCR, choose a higher resolution.
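To see why higher resolution produces much larger files, you can work out the size of an uncompressed scan. The sketch below is only arithmetic. The page size (US Letter, 8.5 x 11 inches), the 300 dpi setting, and the 24 bits per pixel are assumptions chosen for illustration, and the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Convert a page size in inches to pixels at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Size of the raw image data, before any file-format compression."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bits_per_pixel // 8


# A US Letter page in 24-bit color at two resolutions.
screen = uncompressed_size_bytes(8.5, 11, 72)    # 1,454,112 bytes
detail = uncompressed_size_bytes(8.5, 11, 300)   # 25,245,000 bytes
print(screen, detail)
```

The size grows with the square of the resolution, so going from 72 to 300 dpi multiplies the raw data by about 17. Compressed formats produce smaller files, but the trend is the same.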

Preview the picture first. Click Preview Scan when the document is in the scanner. When it's been scanned, click the Preview tab. In the image preview section, you can select a section of the picture to scan, rather than the whole picture. Select the section to scan with the mouse. Click ImageCanvas and select Create from selection. When you are satisfied with the preview, click Final Scan.

When the final scan is complete, a window (shown on the right) opens where you can select the graphics format for the image. A description of a format displays when you highlight a format from the list box.

If you check the "Don't ask again" box below the list box, Kooka will save all future images in the selected format. To change the format in the future, click Settings and select Configure Kooka->Save Image->Always show memory assistant.

Click OK when you have selected the desired format.

The final image displays in the right section of the Kooka window. Right-click the image to see some View options, such as Scale to width, Zoom, or Rotate Image.

You can click the Gallery tab to see a list of the images you have scanned. When you are ready to save a scanned image, right-click its name in the gallery and select Save Image.

If the image you scanned is a document that you want to transcribe using OCR, click ImageCanvas and select OCR image. A window opens in which you can change some settings. You can start with the defaults and experiment later to achieve the best results. Click Start OCR. The process may take a little time. When it's finished, a window opens showing the output from the OCR process, with an editing window below. If the output contains many errors, try different settings and run the OCR again.
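The gocr program can also be run on a saved image outside Kooka. The following is a minimal sketch under stated assumptions: gocr is installed and on your PATH, the image has been saved in a format gocr can read (such as PNM), and the file names are placeholders. It uses gocr's -i option for the input image and -o for the output text file. It does not show how Kooka itself calls the OCR engine.

```python
import shutil
import subprocess


def build_gocr_command(image_path, text_path):
    """Build a gocr command line: -i names the input image, -o the output file."""
    return ["gocr", "-i", image_path, "-o", text_path]


def run_ocr(image_path, text_path):
    """Run gocr if it is installed.

    Returns True if gocr ran and exited successfully, False if gocr is
    missing or reported an error.
    """
    if shutil.which("gocr") is None:
        return False
    result = subprocess.run(build_gocr_command(image_path, text_path), check=False)
    return result.returncode == 0
```

For example, run_ocr("page.pnm", "page.txt") would write the recognized text to page.txt, where both names are hypothetical. A return value of False is a hint to check that the gocr package is installed.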

When you are satisfied with the OCR output, click Open in Kate. Kate is a KDE text editor, described in Chapter 18. You can use Kate to edit the file if necessary. Its spell checker is useful for finding the OCR errors in the file. You can save the file from Kate as a text file and then open it in any text editor or word processor for further use or editing.
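A spell checker catches most OCR errors, but a quick script can point out one common kind before you start editing: words in which a digit was read in place of a letter, such as "c0mputer". The sketch below is a simple heuristic written for this illustration, not a feature of Kate or Kooka. It flags any token that mixes letters and digits, so it also flags legitimate names such as "MP3".

```python
import re

# A token is a run of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def suspect_tokens(text):
    """Return tokens that contain both a letter and a digit, in order of appearance."""
    suspects = []
    for token in TOKEN.findall(text):
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        if has_letter and has_digit:
            suspects.append(token)
    return suspects


print(suspect_tokens("The c0mputer is 0n the desk in r00m 12."))
# prints ['c0mputer', '0n', 'r00m']
```

Plain numbers such as "12" are not flagged, because they contain no letters.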




    Spring Into Linux
    ISBN: 0131853546
    Year: 2005
    Pages: 362
    Authors: Janet Valade
