Hack 64 Get and Set PDF Metadata



Add document information to your PDF, even without using Acrobat.

Traditional metadata includes things such as your document's title, authors, and ISBN. But you can add anything you want, such as the document's revision number, category, internal ID, or expiration date. PDF can store this information in two different ways: using the PDF's Info dictionary [Hack #80] or using an embedded Extensible Metadata Platform (XMP) stream. When you change the PDF's title, authors, subject, or keywords using Acrobat, as shown in Figure 5-13, it updates both of these resources. Acrobat 6 also enables you to export or import PDF XMP datafiles. Visit http://www.adobe.com/products/xmp/ to learn about Adobe's XMP.

Figure 5-13. Viewing or changing a PDF's basic metadata in Acrobat

In Acrobat 6, you can view and update metadata using either File → Document Properties... → Description or Advanced → Document Metadata.... In Acrobat 5, select File → Document Properties → Summary. Save your PDF after making changes to the metadata.

Our pdftk [Hack #79] currently reads and writes only the metadata in a PDF's Info dictionary, not its XMP stream. However, it does not restrict you to the title, authors, subject, and keywords: you can add custom metadata fields as needed, which solves the basic problem of embedding your own information in a PDF document. pdftk is free software.

The pdfinfo tool from Xpdf (http://www.foolabs.com/xpdf/) reports a PDF's Info dictionary contents, its XMP stream, and other document data. pdfinfo is free software.

5.15.1 Get Document Metadata

To create a plain-text report of PDF metadata, use pdftk's dump_data operation. It will also report PDF bookmarks and page labels, among other things. The command looks like this:

  pdftk mydoc.pdf dump_data output mydoc.data.txt

Metadata will be represented as key/value pairs, like so:

  InfoKey: Creator
  InfoValue: Acrobat PDFMaker 6.0 for Word
  InfoKey: Title
  InfoValue: Brian Eno: His Music and the Vertical Color of Sound
  InfoKey: Author
  InfoValue: Eric Tamm
  InfoKey: Producer
  InfoValue: Acrobat Distiller 6.0.1 (Windows)
  InfoKey: ModDate
  InfoValue: D:20040420234132-07'00'
  InfoKey: CreationDate
  InfoValue: D:20040420234045-07'00'
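When a script needs this metadata, it only has to pick out the InfoKey/InfoValue lines. The Python sketch below collects them into a dictionary and skips every other line of the report. It assumes that pdftk is on your PATH and that dump_data, when it is given no output argument, writes its report to standard output. The function names parse_info_pairs and read_info are hypothetical and are not part of pdftk.

```python
import subprocess


def parse_info_pairs(report: str) -> dict[str, str]:
    """Collect InfoKey/InfoValue pairs from pdftk dump_data text."""
    info: dict[str, str] = {}
    key = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            # Pair this value with the most recent InfoKey line.
            info[key] = line[len("InfoValue:"):].strip()
            key = None
        # Bookmark, page label, and other report lines are ignored.
    return info


def read_info(pdf_path: str) -> dict[str, str]:
    """Run pdftk dump_data (assumed to be on PATH) and parse its stdout."""
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data"],
        capture_output=True, text=True, check=True,
    )
    return parse_info_pairs(result.stdout)
```

For example, read_info("mydoc.pdf")["Author"] would return "Eric Tamm" for the document reported above.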

Another tool for reporting PDF metadata is pdfinfo, which is part of the Xpdf project (http://www.foolabs.com/xpdf/). In addition to metadata, it also reports page sizes, page count, and PDF permissions [Hack #52]. Running pdfinfo mydoc.pdf yields a report such as this:

  Title:          Brian Eno: His Music and the Vertical Color of Sound
  Author:         Eric Tamm
  Creator:        Acrobat PDFMaker 6.0 for Word
  Producer:       Acrobat Distiller 6.0.1 (Windows)
  CreationDate:   04/20/04 23:40:45
  ModDate:        04/22/04 14:39:30
  Tagged:         no
  Pages:          216
  Encrypted:      no
  Page size:      522 x 756 pts
  File size:      1126904 bytes
  Optimized:      yes
  PDF version:    1.4
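Each line of this report is a field name, a colon, and a value. Values such as the title and the dates can contain colons of their own, so a parser should split each line only at its first colon. The Python sketch below does this. The function name parse_pdfinfo is hypothetical, and the sketch assumes a plain report produced without the -meta option, because XMP lines would not fit this layout.

```python
def parse_pdfinfo(report: str) -> dict[str, str]:
    """Split each 'Field: value' line of a pdfinfo report at its first colon."""
    fields: dict[str, str] = {}
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            # Only the first colon separates the name; later colons
            # (in titles or times) stay in the value.
            fields[name.strip()] = value.strip()
    return fields
```

After parsing, int(fields["Pages"]) gives the page count as a number. For the report above, that is 216.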

pdfinfo has several options that change what it reports. Run pdfinfo without arguments to list them. For example, its -meta option prints a PDF's XMP stream.
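The XMP stream is RDF/XML, not key/value text. Array-valued Dublin Core properties such as dc:title and dc:creator keep their values in rdf:li items. The Python sketch below reads those items with the standard library's ElementTree. It assumes you have already separated the XML packet from the rest of the pdfinfo -meta output, and that each property is written as an element rather than as an attribute. The function name xmp_dc_values is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by RDF and Dublin Core, used by XMP.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_dc_values(xmp_text: str, prop: str) -> list[str]:
    """Return the rdf:li values of a Dublin Core property, e.g. 'title'."""
    root = ET.fromstring(xmp_text)
    values: list[str] = []
    # Search the whole packet for dc:<prop> elements...
    for elem in root.iterfind(f".//dc:{prop}", NS):
        # ...and gather the items in their rdf:Alt, rdf:Seq, or rdf:Bag.
        for li in elem.iterfind(".//rdf:li", NS):
            values.append((li.text or "").strip())
    return values
```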

5.15.2 Set Document Metadata

pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream. The command would look like this:

  pdftk mydoc.pdf update_info new_info.txt output mydoc.updated.pdf

This adds or modifies the Info keys given in new_info.txt. Note that the output PDF filename must be different from the input. To remove a key/value pair, pass in the key with an empty value, like so:

  InfoKey: MyDataKey
  InfoValue: 
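You can also generate the update file from a script, including custom fields such as a revision number. The Python sketch below writes fields in the same InfoKey/InfoValue layout. An empty value produces the removal form shown above. The sketch also calls pdftk and refuses to run when the input and output paths are the same. The function names are hypothetical. The sketch assumes pdftk is on your PATH, and it assumes single-line, plain ASCII keys and values so that character encoding does not become an issue.

```python
import os
import subprocess


def format_info_update(fields: dict[str, str]) -> str:
    """Render fields in the InfoKey/InfoValue layout read by update_info.

    An empty value asks pdftk to remove that key from the Info dictionary.
    """
    lines = []
    for key, value in fields.items():
        # Assumption: one-line ASCII text only.
        if "\n" in key or "\n" in value or not (key + value).isascii():
            raise ValueError("keys and values must be single-line ASCII")
        lines.append(f"InfoKey: {key}")
        lines.append(f"InfoValue: {value}")
    return "\n".join(lines) + "\n"


def run_update_info(pdf_in: str, info_txt: str, pdf_out: str) -> None:
    """Apply info_txt to pdf_in with pdftk (assumed to be on PATH)."""
    # pdftk needs an output filename that differs from the input.
    if os.path.abspath(pdf_in) == os.path.abspath(pdf_out):
        raise ValueError("output PDF must differ from the input PDF")
    subprocess.run(
        ["pdftk", pdf_in, "update_info", info_txt, "output", pdf_out],
        check=True,
    )


# Example: set a custom Revision key and remove MyDataKey.
# with open("new_info.txt", "w", encoding="ascii") as f:
#     f.write(format_info_update({"Revision": "3", "MyDataKey": ""}))
# run_update_info("mydoc.pdf", "new_info.txt", "mydoc.updated.pdf")
```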

Use pdftk to strip all Info and XMP metadata from a document by copying its pages into a new PDF, like so:

  pdftk A=mydoc.pdf cat A output mydoc.no_metadata.pdf


The PDF specification defines several Info fields. Be careful to use these only as described in the specification. They are Title , Author , Subject , Keywords , Creator , Producer , CreationDate , ModDate , and Trapped .
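CreationDate and ModDate are a good example of fields with a required format. They hold PDF date strings of the form D:YYYYMMDDHHmmSSOHH'mm', such as the D:20040420234132-07'00' values in the dump_data report earlier in this hack. In this form, O is +, -, or Z, and the fields after the year may be omitted. The Python sketch below converts between that format and datetime, so that dates you write with update_info stay well formed. The function names are hypothetical. The parser accepts the string with or without the D: prefix, and the formatter writes UTC as +00'00' rather than Z.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' with every field after the year optional.
_PDF_DATE = re.compile(
    r"(?:D:)?(?P<Y>\d{4})(?P<m>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<M>\d{2})?(?P<S>\d{2})?"
    r"(?:(?P<sign>[-+Z])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?"
)


def parse_pdf_date(text: str) -> datetime:
    """Parse a PDF date string; missing fields default to their minimum."""
    match = _PDF_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    g = match.groupdict()
    tz = None  # no offset given: return a naive datetime
    if g["sign"] == "Z":
        tz = timezone.utc
    elif g["sign"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(offset if g["sign"] == "+" else -offset)
    return datetime(
        int(g["Y"]), int(g["m"] or 1), int(g["d"] or 1),
        int(g["H"] or 0), int(g["M"] or 0), int(g["S"] or 0),
        tzinfo=tz,
    )


def format_pdf_date(dt: datetime) -> str:
    """Format a datetime as a PDF date string, with an offset if known."""
    text = dt.strftime("D:%Y%m%d%H%M%S")
    off = dt.utcoffset()
    if off is None:
        return text
    minutes = round(off.total_seconds() / 60)
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    return f"{text}{sign}{hh:02d}'{mm:02d}'"
```

Parsing D:20040420234132-07'00' yields 23:41:32 on April 20, 2004, at UTC-7, and formatting that result gives back the original string.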



PDF Hacks.
PDF Hacks: 100 Industrial-Strength Tips & Tools
ISBN: 0596006551
EAN: 2147483647
Year: N/A
Pages: 158
Authors: Sid Steward
