Optimizing PDF Documents


Search engines have become increasingly efficient at indexing different types of documents. Google, for example, can index 12 types of documents, including Microsoft Word, Microsoft Excel, Microsoft PowerPoint, RTF (rich text format), and Adobe PDF documents. Many other search engines can index PDF documents as well.

PDF stands for portable document format, which is a universal file format that preserves fonts, colors, graphic images, and formatting of any source document. Many web site owners like to create marketing brochures, media kits, and how-to manuals in PDF format and make them available on the web. Figure 3.18 shows a typical web page highlighting media kits.

Figure 3.18. Position Technologies created its media kit documents in PDF format.


Many web site owners like to have PDF documents on their web sites because they want to preserve the exact look and feel of a printed piece. For example, let's say you would like your online brochure text to display in the Avant Garde typeface. For an HTML version of the brochure to actually appear in this typeface, your site's visitors must have Avant Garde installed on their computers. If your visitors do not have this typeface installed, your online brochure will look completely different from what you intended. A PDF document can embed the fonts it uses, so it displays as designed no matter which fonts a visitor has installed. Therefore, many online brochures are formatted as PDF documents.

PDF documents can achieve top search engine visibility when formatted correctly. In fact, some top search engine results are PDF documents, as shown in Figure 3.19.

Figure 3.19. A PDF document displays in the top search result in Google for the keyword phrase "chromatography manuals."


To be search friendly, your PDF documents must contain actual text, not a picture of text. One way to determine whether a PDF document contains text the search engines can index is to check the Document Properties dialog box. If no fonts are listed there, the PDF document does not contain any text.

To check for fonts in your PDF files, follow these steps:

  1. Open the PDF document in Acrobat 5.0.

  2. Select File > Document Properties > Fonts. The Document Fonts dialog box should appear, as shown in Figure 3.20. If any fonts appear in this dialog box, the PDF document contains text the search engines can index.

    Figure 3.20. The Document Fonts dialog box for this PDF document displays four fonts, which means that search engines are able to index the text in this document.

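If you have many PDF files to check, opening each one in Acrobat gets tedious. The Python sketch below approximates the Document Fonts check from a script. It looks for font dictionaries (`/Type /Font`) in the raw file and inside any Flate-compressed streams. It is a rough heuristic, not a PDF parser. The function name `pdf_has_fonts` is hypothetical, and the sketch assumes the file is unencrypted and uses no compression filter other than Flate.

```python
import re
import zlib

# Each "stream ... endstream" body in the file (heuristic, not a full parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A font dictionary entry such as "/Type /Font" or "/Type/Font".
FONT_RE = re.compile(rb"/Type\s*/Font\b")


def pdf_has_fonts(data: bytes) -> bool:
    """Return True if the PDF bytes appear to declare at least one font."""
    if FONT_RE.search(data):
        return True
    # Font dictionaries may also sit inside Flate-compressed object streams
    # (PDF 1.5 and later), so inflate each stream and search it too.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. an uncompressed or JPEG stream)
        if FONT_RE.search(inflated):
            return True
    return False
```

To try it on a file, pass the raw bytes, for example `pdf_has_fonts(open("brochure.pdf", "rb").read())`. The file name is only an illustration. A positive result only shows that the file declares at least one font, just as the Document Fonts dialog box does. A scanned page with a single typed caption would still pass, so the Text Select check that follows is still worth doing.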

To see the specific text the search engines are able to index, use the Text Select tool, which is highlighted in Figure 3.21.

Figure 3.21. The Text Select tool in Adobe Acrobat 5.0.


Try to highlight the text in the PDF document, as shown in Figure 3.22. The text you are able to highlight is the text that the search engines can index.

Figure 3.22. In this PDF example, the text in the main paragraphs can be highlighted, but the text in the logo cannot. Therefore, the search engines are not able to index the text in this logo.

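To approximate the Text Select check across many files, the Python sketch below extracts the literal strings drawn with the PDF `Tj` text operator in each content stream. This is a deliberate simplification. Real PDFs also draw text with `TJ` arrays and hex strings, and they often use custom font encodings. The hypothetical `selectable_text` helper ignores all of these cases. An empty result is therefore a prompt to open the file in Acrobat, not proof that the text cannot be indexed.

```python
import re
import zlib

# Each "stream ... endstream" body in the file (heuristic, not a full parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string "(...)" shown with the Tj operator. Escaped parentheses
# such as "\(" are allowed inside; nested unescaped parentheses are not.
TJ_RE = re.compile(rb"\(((?:\\.|[^()\\])*)\)\s*Tj")
# The simple escapes \( \) and \\ ; other escapes (\n, octal) are left alone.
ESCAPE_RE = re.compile(rb"\\([()\\])")


def selectable_text(data: bytes) -> list[str]:
    """Rough approximation of the text a reader could highlight in a PDF."""
    found = []
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # keep the raw bytes: probably an uncompressed content stream
        for literal in TJ_RE.findall(body):
            # Assumes a simple single-byte font encoding (decoded as Latin-1).
            found.append(ESCAPE_RE.sub(rb"\1", literal).decode("latin-1"))
    return found
```

As with highlighting in Acrobat, text inside a logo that was placed as an image produces no strings here, because that text is stored as an image, not as text operators.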

General Guidelines

The same optimization guidelines that apply to HTML documents also apply to PDF documents.

  • Make sure your PDF documents contain text that the search engines can index.

    Search engines are unable to index Image Only PDF documents. If you create a PDF document by scanning a printed page with a flatbed scanner, the result is a picture of the text, so the search engines will not be able to extract that text.

  • Use keyword-rich text in your PDF documents.

    Important


    Site visitors find it useful to know they will be viewing a PDF document before they click on a link. Because many PDF documents are larger than 100K, visitors also like to know the file size.

    For example, on the fictional TranquiliTeas web site, a simple way to let visitors know they will be viewing a PDF document is to make the hypertext link look like the following:

    View the TranquiliTeas Organic Tea Brochure PDF (360K)


  • For PDF documents with multiple pages, the text on the first page is the most important.

    Be sure that the titles, headlines, and text on the first page of your PDF documents contain your most important keywords.

  • Minimize download time.

    In general, search engine representatives recommend keeping document file sizes under 100K. If your PDF documents are larger than 100K, consider creating abstracts.

  • Create optimized HTML pages with abstracts of PDF documents.

    If your PDF documents are very large, such as a manual or a catalog, consider creating HTML pages that summarize them. Each abstract page should contain at least 200 to 250 words of quality content between the <body> and </body> tags, and its title tag and meta tags should also contain keywords.

    Whenever possible, the anchor text to the PDF file should contain keywords. Be sure to have links to your PDF documents on your Site Map page as well.
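Several of the guidelines above are easy to check with a script before publishing. The Python sketch below builds link text that names the format and file size. It also flags PDFs over the 100K guideline as candidates for an HTML abstract, and it counts the words between an abstract page's <body> and </body> tags. The function names are hypothetical. The sketch assumes 1K = 1,024 bytes, with sizes rounded up to whole kilobytes. The word count is a crude tag-stripping approximation, not an HTML parser.

```python
import re

SIZE_LIMIT_K = 100          # guideline from the text: keep files under 100K
MIN_ABSTRACT_WORDS = 200    # abstract pages: at least 200 to 250 words


def size_in_k(num_bytes: int) -> int:
    """Whole kilobytes, rounded up (assumes 1K = 1,024 bytes)."""
    return -(-num_bytes // 1024)


def pdf_link_text(title: str, num_bytes: int) -> str:
    """Anchor text that tells visitors the format and file size up front."""
    return f"View the {title} PDF ({size_in_k(num_bytes)}K)"


def needs_abstract(num_bytes: int) -> bool:
    """True when the PDF is larger than the 100K guideline."""
    return size_in_k(num_bytes) > SIZE_LIMIT_K


def body_word_count(html: str) -> int:
    """Count words between <body> and </body>, replacing tags with spaces."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    if not match:
        return 0
    text = re.sub(r"<[^>]+>", " ", match.group(1))
    return len(text.split())
```

For example, `pdf_link_text("TranquiliTeas Organic Tea Brochure", 368640)` produces the TranquiliTeas link text shown earlier in this section. A 360K file is over the guideline, so `needs_abstract(368640)` returns True.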



Search Engine Visibility
Search Engine Visibility (2nd Edition)
ISBN: 0321503244
Year: 2003
Pages: 111
Authors: Shari Thurow
