| < Day Day Up > |
Hack 56 Add PDF Attachment Actions to
If you invoke one of these commands on a selection of multiple PDFs, you will get one command prompt for each PDF.
Hack 57 Create a Traditional Index Section from Keywords
Add a search feature to your print edition.
Creating a good document Index section is a difficult job performed by professionals. However, an automatically generated index can still be very helpful. Use automatic keywords [Hack #19] or select your own keywords. This hack will locate their pages, build a reference, and then create PDF pages that you can append to your document, as shown in Figure 5-5. It even uses your PDF's page labels (also known as logical page numbering) to ensure trouble-free lookup.
Figure 5-5. Turning document keywords into a PDF Index section
5.8.1 Tool Up
Download and install pdftotext [Hack #19], our kw_index [Hack #19], and pdftk [Hack #79]. You must also have enscript (Windows users visit http://gnuwin32.sf.net/packages/enscript.htm) and ps2pdf. ps2pdf comes with Ghostscript [Hack #39]. Our kw_index package includes the kw_catcher and page_refs programs (and source code) that we use in the following sections.
5.8.2 The Procedure
First, set your PDF's logical page numbering [Hack #62] to match your document's page numbering. Then, use pdftk to dump this information into a text file, like so:
pdftk mydoc.pdf dump_data output mydoc.data.txt
Next, convert your PDF to plain text with pdftotext:
pdftotext mydoc.pdf mydoc.txt
Create a keyword list [Hack #19] from mydoc.txt using kw_catcher, like so:
kw_catcher 12 keywords_only mydoc.txt > mydoc.kw.txt
Edit mydoc.kw.txt to remove duds and add missing keywords. At present, only one keyword is allowed per line. If two or more keywords are adjacent in mydoc.txt, our page_refs program will assemble them into phrases.
Now pull all these together to create a text index using page_refs:
page_refs mydoc.txt mydoc.kw.txt mydoc.data.txt > mydoc.index.txt
Finally, create a PDF from mydoc.index.txt by piping the output of enscript into ps2pdf:
enscript --columns 2 --font 'Times-Roman@10' \
--header 'INDEX' --header-font 'Times-Bold@14' \
--margins 54:54:36:54 --word-wrap --output - mydoc.index.txt \
| ps2pdf - mydoc.index.pdf
5.8.3 The Code
Of course, the thing to do is to wrap this procedure into a tidy script. Copy the following Bourne shell script into a file named make_index.sh, and make it executable by applying chmod 700. Windows users can get a Bourne shell by installing MSYS [Hack #97].
#!/bin/sh
# make_index.sh, version 1.0
# usage: make_index.sh <PDF filename> <page window>
# requires: pdftk, kw_catcher, page_refs,
# pdftotext, enscript, ps2pdf
#
# by Ross Presser, Imtek.com
# adapted by Sid Steward
# http://www.pdfhacks.com/kw_index/
# $1 is the PDF filename; strip its directory and .pdf extension.
# $2 is the page window handed to kw_catcher.
fname=`basename "$1" .pdf`
pdftk ${fname}.pdf dump_data output ${fname}.data.txt && \
pdftotext ${fname}.pdf ${fname}.txt && \
kw_catcher "$2" keywords_only ${fname}.txt \
| page_refs ${fname}.txt - ${fname}.data.txt \
| enscript --columns 2 --font 'Times-Roman@10' \
  --header 'INDEX' --header-font 'Times-Bold@14' \
  --margins 54:54:36:54 --word-wrap --output - \
| ps2pdf - ${fname}.index.pdf
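The script hands kw_catcher's keywords straight to page_refs, so the manual editing step from the procedure is skipped. If you run the steps by hand and edit mydoc.kw.txt yourself, a quick check for the one-keyword-per-line rule might look like the sketch below. It assumes that a keyword is a single whitespace-delimited word. The helper name multiword_lines is hypothetical and not part of the kw_index package.

```shell
#!/bin/sh
# Sketch: list lines of a keyword file that hold more than one word.
# multiword_lines is a hypothetical helper, not part of kw_index.

multiword_lines() {
    # awk splits each line on whitespace; NF is the number of fields.
    # Lines with more than one field are printed with their line number.
    awk 'NF > 1 { printf "%d: %s\n", NR, $0 }' "$1"
}

# Example: report offending lines in mydoc.kw.txt, if that file exists.
if [ -f mydoc.kw.txt ]; then
    multiword_lines mydoc.kw.txt
fi
```

Any line it reports can be split into separate lines before you run page_refs.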
5.8.4 Running the Hack
Pass the PDF filename and the page window to make_index.sh, like so:
make_index.sh mydoc.pdf 12
The script will create a document index named mydoc.index.pdf. Review this index and append it to your PDF document [Hack #51] if you wish.
The second argument to make_index.sh controls the keyword detection sensitivity. Smaller numbers yield fewer keywords at the risk of omitting some keywords; larger numbers yield more keywords.
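Before running the script, it can help to confirm that the arguments look sane and that the required tools are installed. The following is a minimal sketch of such a preflight check, not part of the original make_index.sh. The helper names missing_tools and valid_args are hypothetical, and treating the second argument as a positive integer is an assumption based on the example value 12.

```shell
#!/bin/sh
# Sketch: preflight checks for make_index.sh.
# Both helpers are hypothetical and not part of the original script.

# Print the name of each listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed only if $1 is an existing file and $2 is a positive integer
# (assumption: the page window is always a whole number such as 12).
valid_args() {
    [ $# -eq 2 ] || return 1
    [ -f "$1" ] || return 1
    case "$2" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$2" -gt 0 ]
}

# Report any tools from the script's "requires" list that are absent.
missing=`missing_tools pdftk pdftotext kw_catcher page_refs enscript ps2pdf`
if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
fi
```

A wrapper could call valid_args "$1" "$2" first and print the usage line from the script header when it fails.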