Hack 81 Edit PDF Code Freely

Take control of PDF code by mastering its XREF table.

[Hack #80] revealed the hackable plain text behind PDF. Here we edit this PDF text and then use pdftk [Hack #79] to cover our tracks. pdftk can also compress the page streams when we're done.

An unsuitable text editor can quietly damage your PDF. Test your text editor by opening a PDF, saving it to a new file, and then opening this new file in Acrobat or Reader. If your editor corrupted the PDF's data, Acrobat or Reader should display a brief warning before displaying the PDF. Sometimes, however, this warning flashes by too quickly to notice. Once it has repaired the PDF, Acrobat or Reader displays it as if nothing happened.

Since Acrobat and Reader aren't the most reliable tools for testing PDFs, you should consider some alternatives. The free command-line pdfinfo program from the Xpdf project (http://www.foolabs.com/xpdf/) can tell you whether a PDF is damaged. The Multivalent Tools (http://multivalent.sourceforge.net/Tools/index.html) also provide a free PDF validator.
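A byte-for-byte comparison is another quick way to run the editor test described above. The following is a minimal Python sketch, not part of the original hack; the function names and file paths are our own placeholders.

```python
from pathlib import Path
from typing import Optional


def first_difference(original: bytes, resaved: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, resaved)):
        if a != b:
            return offset
    if len(original) != len(resaved):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return min(len(original), len(resaved))
    return None


def editor_preserved_file(original_path: str, resaved_path: str) -> bool:
    """True if the editor saved the file without changing a single byte."""
    original = Path(original_path).read_bytes()
    resaved = Path(resaved_path).read_bytes()
    return first_difference(original, resaved) is None
```

If editor_preserved_file("mydoc.pdf", "mydoc.resaved.pdf") returns False, the offset from first_difference shows where to look. Changed line endings and altered high-bit bytes in binary streams are the usual suspects.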

If you need a good text editor, try gVim [Hack #82].


First, uncompress your PDF's page streams [Hack #80]:

  pdftk mydoc.pdf output mydoc.uncompressed.pdf uncompress

Then, open this new PDF in your text editor. Locate your page of interest by searching for the text /pdftk_pageNum N, where N is the number of your page (the first page is 1, not 0). This text was added to the page dictionaries by pdftk.
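If you would rather locate the marker with a script than with your editor's search, a sketch along these lines might do. It assumes the marker appears as /pdftk_pageNum followed by whitespace and the page number; the function name is hypothetical.

```python
import re


def find_page_marker(pdf_bytes: bytes, page_number: int) -> int:
    """Return the byte offset of '/pdftk_pageNum N', or -1 if it is absent."""
    number = str(page_number).encode("ascii")
    # (?!\d) keeps a search for page 1 from matching pages 10, 11, and so on.
    pattern = re.compile(rb"/pdftk_pageNum\s+" + number + rb"(?!\d)")
    match = pattern.search(pdf_bytes)
    return match.start() if match else -1
```

The negative lookahead matters in an editor search, too: a plain search for /pdftk_pageNum 1 also stops at pages 10 through 19.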

Find the /Contents key in your page's dictionary. It is probably mapped to an indirect object reference: m n R. Locate this indirect object by searching for the text m n obj. This will take you to a stream or to an array of streams. If it is an array, look up any of its referenced streams the same way.
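The two-step lookup from /Contents m n R to m n obj can be sketched in Python as follows. This is a plain text search under simplifying assumptions, not a PDF parser: it handles only a /Contents key that maps to a single indirect reference, and both function names are our own.

```python
import re
from typing import Optional, Tuple


def contents_reference(page_dict: bytes) -> Optional[Tuple[int, int]]:
    """Extract (m, n) from '/Contents m n R', or None if no such reference."""
    match = re.search(rb"/Contents\s+(\d+)\s+(\d+)\s+R", page_dict)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_object(pdf_bytes: bytes, obj_num: int, gen_num: int) -> int:
    """Return the byte offset of the 'm n obj' header, or -1 if not found."""
    # (?<!\d) stops a search for object 5 from matching '15 0 obj'.
    pattern = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (obj_num, gen_num))
    match = pattern.search(pdf_bytes)
    return match.start() if match else -1
```

Pass contents_reference the text of your page's dictionary, then feed the resulting pair to find_object along with the whole file.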

Now you should be looking at a stream of PDF drawing operations that describe your page. These operations and their interactions are best understood by studying the PDF Reference [Hack #98]. However, if your page has a lot of text on it, you can probably pick out the text strings. An example of a legal change in page text is changing [(gr)17.7(oup)] to [(grip)], or (storey) to (story). Anything inside parentheses this way is fair game. So, change something and save your work.
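To survey the editable strings in a page stream before you change one, you could list everything that sits inside parentheses. The sketch below is a simplification of our own: it understands backslash escapes but not the balanced nested parentheses that PDF strings also permit, and the function name is hypothetical.

```python
import re
from typing import List

# A '(' ... ')' pair whose body holds escaped characters or non-parenthesis bytes.
STRING_LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def list_text_strings(stream: bytes) -> List[bytes]:
    """Return the bodies of the parenthesized strings found in a page stream."""
    return [match.group(1) for match in STRING_LITERAL.finditer(stream)]
```

For the array [(gr)17.7(oup)] this returns the pieces gr and oup. The number between them is a spacing adjustment, which is why one word can be split across several strings.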

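Editing PDF at the text level typically corrupts the XREF lookup table at the end of the file, because the table records each object's byte offset and any edit that changes the file's length shifts the objects that follow it. Repair your edited PDF using pdftk like so: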

  pdftk mydoc.edited.pdf output mydoc.fixed.pdf

Or, if you want to compress the output and remove the /pdftk_pageNum entries, add compress to the end like so:

  pdftk mydoc.edited.pdf output mydoc.fixed.pdf compress
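As for why a small edit upsets the XREF table in the first place: the table stores byte offsets, so removing even one character moves every later object. The toy example below illustrates this on two made-up objects rather than a complete PDF; object_offsets is a hypothetical helper that finds object headers by text search, a heuristic that could be fooled by stream data containing similar text.

```python
import re
from typing import Dict, Tuple


def object_offsets(pdf_bytes: bytes) -> Dict[Tuple[int, int], int]:
    """Map (object number, generation) to the offset of its 'm n obj' header."""
    header = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
    return {
        (int(m.group(1)), int(m.group(2))): m.start()
        for m in header.finditer(pdf_bytes)
    }


# Toy data: two objects, not a complete PDF.
before = b"1 0 obj\n(storey)\nendobj\n2 0 obj\n(next)\nendobj\n"
after = before.replace(b"(storey)", b"(story)")

# Object 2 now starts one byte earlier than an unrepaired table would claim.
shift = object_offsets(before)[(2, 0)] - object_offsets(after)[(2, 0)]
print(shift)
```

Object 1 keeps its offset because it begins before the edit. Only objects after the change move, which is why the damage is easy to miss.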

Open your new PDF in Reader and view your page. Do you see the change you made? If it was in the middle of a paragraph, you might be surprised to find that the paragraph hasn't rewrapped to fit your altered word. Most PDFs have no concept of a paragraph, so how could it?

This procedure is an unlikely way to fix typos. We put it to better use in [Hack #82].

PDF Hacks: 100 Industrial-Strength Tips & Tools
ISBN: 0596006551
Pages: 158
Authors: Sid Steward