Hack 95 Script Acrobat Using Perl on Windows


Install Perl and use it instead of Visual Basic to drive Acrobat.

Depending on your tastes or requirements, you might want to use the Perl scripting language instead of Visual Basic [Hack #94] to program Acrobat. Perl can access the same Acrobat OLE interface used by Visual Basic to manipulate PDFs. Perl is well documented, is widely supported, and has been extended with an impressive collection of modules. A Perl installer for Windows is freely available from ActiveState.

We'll describe how to install the ActivePerl package from ActiveState, and then we'll use an example to show how to access Acrobat's OLE interface using Perl.

Acrobat OLE documentation comes with the Acrobat SDK [Hack #98]. Look for IACOverview.pdf and IACReference.pdf. Acrobat Distiller also has an OLE interface. It is documented in DistillerAPIReference.pdf.
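
Before scripting anything larger, it can help to confirm that Perl can reach Acrobat's OLE server at all. The following is a minimal sketch, not part of the original hack: connect_acrobat is a hypothetical helper name, and the only Acrobat calls it makes are the AcroExch.App ProgID and GetNumAVDocs, both of which appear in the script later in this hack. It loads Win32::OLE at run time so that it fails with a warning, rather than a compile error, on a machine where the module is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: try to connect to Acrobat's OLE server.
# Returns the application object on success, or undef (after a
# warning) on failure.
sub connect_acrobat {
    # Win32::OLE exists only on Windows, so load it at run time and
    # fail gently everywhere else.
    unless ( eval { require Win32::OLE; 1 } ) {
        warn "Win32::OLE is not available: $@";
        return;
    }

    # new() returns undef if the OLE server cannot be started.
    my $app = Win32::OLE->new('AcroExch.App');
    unless ($app) {
        warn "Could not start Acrobat: " . Win32::OLE->LastError() . "\n";
        return;
    }
    return $app;
}

if ( my $app = connect_acrobat() ) {
    print "Acrobat has ", $app->GetNumAVDocs, " document(s) open\n";
}
```

If the connection fails, the text of Win32::OLE->LastError() is usually the quickest clue as to whether Acrobat is missing or its OLE server is not registered.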


7.4.1 Install Perl on Windows

The ActivePerl installer for Windows is freely available from http://www.ActiveState.com/Products/ActivePerl/. Download and install it. It comes with excellent documentation, which you can access by selecting Start → Programs → ActiveState ActivePerl 5.8 → Documentation.

ActivePerl also includes the OLE Browser, shown in Figure 7-8, which enables you to browse the OLE servers available on your machine (Start → Programs → ActiveState ActivePerl 5.8 → OLE-Browser). The OLE Browser is an HTML file that must be opened in Internet Explorer to work properly.

Figure 7-8. The OLE Browser, which you can use to discover OLE servers available on your machine

7.4.2 The Code

In this example, the Perl script will use Acrobat to read annotation (e.g., sticky notes) data from the currently open PDF. The script will format this data using HTML and then output it to stdout.

Copy the script in Example 7-2 into a file named SummarizeComments.pl. You can download this code from http://www.pdfhacks.com/summarize/.

Example 7-2. Perl code for summarizing comments
    # SummarizeComments.pl ver. 1.0
    use strict;
    use Win32::OLE;

    my $app = Win32::OLE->new("AcroExch.App");

    if( 0< $app->GetNumAVDocs ) { # a PDF is open in Acrobat

      # open the HTML document
      print "<html>\n<head>\n<title>PDF Comments Summary</title>\n</head>\n<body>\n";
      my $found_notes_b= 0;

      # get the active PDF and drill down to its PDDoc
      my $avdoc= $app->GetActiveDoc;
      my $pddoc= $avdoc->GetPDDoc;

      # iterate over pages
      my $num_pages= $pddoc->GetNumPages;
      for( my $ii= 0; $ii< $num_pages; ++$ii ) {
        my $pdpage= $pddoc->AcquirePage( $ii );
        if( $pdpage ) {

          # iterate over annotations (e.g., sticky notes)
          my $page_head_b= 0;
          my $num_annots= $pdpage->GetNumAnnots;
          for( my $jj= 0; $jj< $num_annots; ++$jj ) {
            my $annot= $pdpage->GetAnnot( $jj );

            # Pop-up annots give us duplicate contents
            if( $annot->GetContents ne '' and
                $annot->GetSubtype ne 'Popup' ) {

              if( !$page_head_b ) { # output the page number
                print "<h2>Page: " . ($ii+ 1) . "</h2>\n";
                $page_head_b= 1;
              }

              # output the annotation title and format it a little
              print "<p><i>" . $annot->GetTitle . "</i></p>\n";

              # output the note text; replace carriage returns
              # with paragraph breaks
              my $comment= $annot->GetContents;
              $comment =~ s/\r/<\/p>\n<p>/g;
              print "<p>" . $comment . "</p>\n";

              $found_notes_b= 1;
            }
          }
        }
      }

      if( !$found_notes_b ) {
        print "<h3>No Notes Found in PDF</h3>\n";
      }

      # close the HTML document
      print "</body>\n</html>\n";
    }
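
The script in Example 7-2 prints annotation text directly into the HTML, so a comment containing characters such as < or & could produce broken markup. One possible refinement, sketched below as an assumption rather than as part of the original hack, is to escape the text before applying the same carriage-return-to-paragraph substitution the script uses. The helper names escape_html and comment_to_html are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
# The ampersand must be handled first so that the entities added by
# the later substitutions are not escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: escape a comment, then turn each carriage
# return into a paragraph break, as Example 7-2 does.
sub comment_to_html {
    my ($comment) = @_;
    my $html = escape_html($comment);
    $html =~ s/\r/<\/p>\n<p>/g;
    return "<p>$html</p>\n";
}

# Sample comment text, invented for illustration.
print comment_to_html("Is x < y & y > z?\rSee \"Table 2\".");
```

In the script, the three lines that build and print $comment could then be replaced with a single print comment_to_html( $annot->GetContents ), and the annotation title could be passed through escape_html in the same way.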

7.4.3 Running the Hack

Open a PDF in Acrobat, as shown in Figure 7-6, and then run this script from the command line by typing:

    C:\> perl SummarizeComments.pl > comments.html

It will take a few seconds to complete. When it is done, you can open comments.html in your browser to see a summary of the PDF's comments, as shown in Figure 7-9.

Figure 7-9. The PDF Comments in Mozilla after extraction via SummarizeComments.pl

As noted in [Hack #94], this example demonstrates the relationships between several fundamental PDF objects.
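
To make those relationships easier to see, the walk from the application down to individual annotations (App, AVDoc, PDDoc, PDPage, Annot) can be separated from the HTML output. The sketch below is one possible arrangement, not the author's code: collect_annots and the page, title, and text hash keys are hypothetical names, while every Acrobat method it calls is taken from Example 7-2. The OLE portion runs only on Windows with Win32::OLE installed.

```perl
use strict;
use warnings;

# Hypothetical helper: walk PDDoc -> PDPage -> Annot and return the
# non-empty, non-Popup annotations as a list of hash references.
sub collect_annots {
    my ($pddoc) = @_;
    my @notes;
    my $num_pages = $pddoc->GetNumPages;
    for my $page_index ( 0 .. $num_pages - 1 ) {
        my $pdpage = $pddoc->AcquirePage($page_index) or next;
        my $num_annots = $pdpage->GetNumAnnots;
        for my $annot_index ( 0 .. $num_annots - 1 ) {
            my $annot = $pdpage->GetAnnot($annot_index);
            # Pop-up annotations duplicate their parent's contents.
            next if $annot->GetContents eq ''
                 or $annot->GetSubtype eq 'Popup';
            push @notes, {
                page  => $page_index + 1,    # 1-based, for display
                title => $annot->GetTitle,
                text  => $annot->GetContents,
            };
        }
    }
    return @notes;
}

# App -> AVDoc -> PDDoc: attempted only where Win32::OLE can load.
if ( $^O eq 'MSWin32' && eval { require Win32::OLE; 1 } ) {
    my $app = Win32::OLE->new('AcroExch.App');
    if ( $app && $app->GetNumAVDocs > 0 ) {
        my $pddoc = $app->GetActiveDoc->GetPDDoc;
        for my $note ( collect_annots($pddoc) ) {
            print "Page $note->{page}, $note->{title}: $note->{text}\n";
        }
    }
}
```

Keeping the traversal in its own function means the same list of annotations could feed HTML, plain text, or any other report format.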

PDF Hacks: 100 Industrial-Strength Tips & Tools
ISBN: 0596006551
Pages: 158
Authors: Sid Steward
