Recipe 1.17 Reformatting Paragraphs

1.17.1 Problem

Your string is too long to fit on the screen, and you want to break it up into lines of words, without splitting a word between lines. For instance, a style correction script might read a text file a paragraph at a time, replacing bad phrases with good ones. Replacing a phrase like "utilizes the inherent functionality of" with "uses" will change the length of lines, so the script must somehow reformat the paragraphs when they're output.

1.17.2 Solution

Use the standard Text::Wrap module to put line breaks at the right place:

    use Text::Wrap;
    @output = wrap($leadtab, $nexttab, @para);

Or use the more discerning CPAN module, Text::Autoformat, instead:

    use Text::Autoformat;
    $formatted = autoformat $rawtext;

1.17.3 Discussion

The Text::Wrap module provides the wrap function, shown in Example 1-3. It takes a list of lines and reformats them into a paragraph with no line more than $Text::Wrap::columns characters long (in practice, Text::Wrap keeps every line to at most $columns - 1 characters). We set $columns to 20, ensuring that no line will be longer than 20 characters. We pass wrap two arguments before the list of lines: the first is the indent for the first line of output, the second the indent for every subsequent line.

Example 1-3. wrapdemo
    #!/usr/bin/perl -w
    # wrapdemo - show how Text::Wrap works
    @input = ("Folding and splicing is the work of an editor,",
              "not a mere collection of silicon",
              "and",
              "mobile electrons!");
    use Text::Wrap qw($columns &wrap);
    $columns = 20;
    print "0123456789" x 2, "\n";
    print wrap("    ", "  ", @input), "\n";

The result of this program is:

    01234567890123456789
        Folding and
      splicing is the
      work of an
      editor, not a
      mere collection
      of silicon and
      mobile electrons!

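We get back a single string, with newlines ending each line but the last. To rewrap text that already contains line breaks, read it all in, split it at the newlines, and let wrap rejoin the pieces: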

    # merge multiple lines into one, then wrap one long line
    use Text::Wrap;
    undef $/;
    print wrap('', '', split(/\s*\n\s*/, <>));
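Because wrap hands back one string rather than a list of lines, any per-line processing has to split the result first. The following is a minimal sketch of that step, reusing the sample text from Example 1-3; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Sample text reused from Example 1-3.
my @input = ("Folding and splicing is the work of an editor,",
             "not a mere collection of silicon",
             "and",
             "mobile electrons!");

$Text::Wrap::columns = 20;

my $wrapped = wrap("    ", "  ", @input);   # a single string, not a list
my @lines   = split /\n/, $wrapped;         # break it back into lines

# Find the longest line to confirm it stays under $columns.
my $longest = 0;
for my $line (@lines) {
    $longest = length($line) if length($line) > $longest;
}
printf "%d lines, longest is %d characters\n", scalar(@lines), $longest;
```

Splitting on "\n" works here because the only newlines in the result are the ones wrap inserted between output lines.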

If you have the Term::ReadKey module (available from CPAN) on your system, you can determine your window size so you can wrap lines to fit the current screen size. If you don't have the module, sometimes the screen size can be found in $ENV{COLUMNS} or by parsing the output of the stty(1) command.
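One way to combine these sources is to try each in turn and fall back to a fixed width. The sketch below assumes a Unix-like system where `stty size` prints "rows columns"; the terminal_width helper and its default of 80 are hypothetical, not part of any module.

```perl
use strict;
use warnings;

# terminal_width is a hypothetical helper, not part of Term::ReadKey.
sub terminal_width {
    my $default = @_ ? shift : 80;

    # 1. Term::ReadKey, if it is installed and output is a terminal.
    #    The eval traps both a missing module and a failed size query.
    my ($cols) = eval {
        require Term::ReadKey;
        Term::ReadKey::GetTerminalSize();
    };
    return $cols if defined $cols && $cols =~ /^[1-9]\d*$/;

    # 2. The COLUMNS environment variable, when the shell exports it.
    return $ENV{COLUMNS}
        if defined $ENV{COLUMNS} && $ENV{COLUMNS} =~ /^[1-9]\d*$/;

    # 3. stty(1): assumed to print "rows columns" for "stty size".
    my $size = `stty size 2>/dev/null`;
    return $1 if defined $size && $size =~ /^\d+\s+([1-9]\d*)/;

    # 4. Nothing worked, so use the caller's default.
    return $default;
}

print "Wrapping to ", terminal_width(), " columns\n";
```

The result can be assigned to $Text::Wrap::columns before calling wrap.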

The following program tries to reformat both short and long lines within a paragraph, similar to the fmt(1) program, by setting the input record separator $/ to the empty string (causing <> to read paragraphs) and the output record separator $\ to two newlines. Then the paragraph is converted into one long line by changing all newlines and any surrounding whitespace to single spaces. Finally, we call the wrap function with leading and subsequent tab strings set to the empty string so we can have block paragraphs.

    use Text::Wrap      qw(&wrap $columns);
    use Term::ReadKey   qw(GetTerminalSize);
    ($columns) = GetTerminalSize();
    ($/, $\)  = ('', "\n\n");   # read by paragraph, output 2 newlines
    while (<>) {                # grab a full paragraph
        s/\s*\n\s*/ /g;         # convert intervening newlines to spaces
        print wrap('', '', $_); # and format
    }
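Empty indent strings give block paragraphs; non-empty ones give other shapes. As a contrast, here is a small sketch that uses the two indent arguments to produce bulleted items with hanging indents. The sample items and the width of 40 are arbitrary choices for the demonstration.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns = 40;   # arbitrary width chosen for this demo

# Hypothetical list items to format as bullets.
my @items = (
    "Text::Wrap ships with Perl, so it needs no installation.",
    "Text::Autoformat comes from CPAN and handles quoted text.",
);

my @bullets;
for my $item (@items) {
    # "* " starts the first line; two spaces align continuation lines.
    push @bullets, wrap("* ", "  ", $item);
}
print join("\n", @bullets), "\n";
```

Each call to wrap formats one item, so the bullet marker appears only on that item's first line.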

The CPAN module Text::Autoformat is much more clever. For one thing, it tries to avoid "widows," that is, very short lines at the end. More remarkably, it correctly copes with reformatting paragraphs that have multiple, deeply nested citations. An example from that module's manpage shows how the module can painlessly convert:

    In comp.lang.perl.misc you wrote:
    : > <CN = Clooless Noobie> writes:
    : > CN> PERL sux because:
    : > CN>    * It doesn't have a switch statement and you have to put $
    : > CN>signs in front of everything
    : > CN>    * There are too many OR operators: having |, || and 'or'
    : > CN>operators is confusing
    : > CN>    * VB rools, yeah!!!!!!!!!
    : > CN> So anyway, how can I stop reloads on a web page?
    : > CN> Email replies only, thanks - I don't read this newsgroup.
    : >
    : > Begone, sirrah! You are a pathetic, Bill-loving, microcephalic
    : > script-infant.
    : Sheesh, what's with this group - ask a question, get toasted! And how
    : *dare* you accuse me of Ianuphilia!

into:

    In comp.lang.perl.misc you wrote:
    : > <CN = Clooless Noobie> writes:
    : > CN> PERL sux because:
    : > CN>    * It doesn't have a switch statement and you
    : > CN>      have to put $ signs in front of everything
    : > CN>    * There are too many OR operators: having |, ||
    : > CN>      and 'or' operators is confusing
    : > CN>    * VB rools, yeah!!!!!!!!! So anyway, how can I
    : > CN>      stop reloads on a web page? Email replies
    : > CN>      only, thanks - I don't read this newsgroup.
    : >
    : > Begone, sirrah! You are a pathetic, Bill-loving,
    : > microcephalic script-infant.
    : Sheesh, what's with this group - ask a question, get toasted!
    : And how *dare* you accuse me of Ianuphilia!

simply via print autoformat($badparagraph). Pretty impressive, eh?

Here's a miniprogram that uses that module to reformat each paragraph of its input stream:

    use Text::Autoformat;
    $/ = '';
    while (<>) {
        print autoformat($_, {squeeze => 0, all => 1}), "\n";
    }

1.17.4 See Also

The split and join functions in perlfunc(1) and Chapter 29 of Programming Perl; the manpage for the standard Text::Wrap module; the CPAN module Term::ReadKey and its use in Recipe 15.6; and the CPAN module Text::Autoformat.



Perl Cookbook
Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003
Pages: 501
