Recipe 20.5 Converting HTML to ASCII

20.5.1 Problem

You want to convert an HTML file into formatted, plain ASCII. For example, you want to mail a web document to someone.

20.5.2 Solution

If you have an external formatter such as lynx installed, call it as an external program:

$ascii = `lynx -dump $filename`;

If you want to do the conversion within your program and don't care about the things that the HTML::FormatText formatter doesn't yet handle well (tables and frames):

use HTML::FormatText 3;
$ascii = HTML::FormatText->format_file(
    $filename,
    leftmargin => 0, rightmargin => 50
);
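The class-method call hides the parsing step. As a rough sketch of the longer form, assuming the HTML::TreeBuilder module is installed alongside HTML::FormatText, you can build the parse tree yourself and hand it to a formatter object. The subroutine name html_file_to_ascii is hypothetical, chosen only for illustration, and the result may differ in small details from what format_file produces.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: parse an HTML file into a tree, then format the tree.
sub html_file_to_ascii {
    my ($filename) = @_;

    # Build the parse tree from the file.
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($filename)
        or die "Can't parse $filename: $!";

    # Same margins as the one-line solution.
    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 50,
    );
    my $ascii = $formatter->format($tree);

    # Trees contain circular references, so free them explicitly.
    $tree->delete;
    return $ascii;
}
```

Keeping the tree in your own hands is useful if you want to prune or modify elements before formatting.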

20.5.3 Discussion

These examples both assume the HTML is in a file. If your HTML is in a variable, you need to write it to a file for lynx to read. With HTML::FormatText, use the format_string( ) method:

use HTML::FormatText 3;
$ascii = HTML::FormatText->format_string(
    $html,
    leftmargin => 0, rightmargin => 50
);
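For the lynx route with HTML held in a variable, one way to create the file is the core File::Temp module. The following is a minimal sketch, not a definitive implementation: the helper names html_to_tempfile and lynx_dump_string are hypothetical, and it assumes that lynx is on your PATH and that it recognizes the file as HTML from its .html suffix.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: save an HTML string to a temporary .html file
# and return the file's name. UNLINK => 1 removes the file at program exit.
sub html_to_tempfile {
    my ($html) = @_;
    my ($fh, $tmpname) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$fh} $html;
    close($fh) or die "Can't close $tmpname: $!";
    return $tmpname;
}

# Hypothetical helper: run "lynx -dump" on an HTML string.
sub lynx_dump_string {
    my ($html) = @_;
    my $tmpname = html_to_tempfile($html);

    # The list form of open bypasses the shell, so the filename
    # needs no quoting.
    open(my $lynx, '-|', 'lynx', '-dump', $tmpname)
        or die "Can't run lynx: $!";
    my $ascii = do { local $/; <$lynx> };    # slurp all of lynx's output
    close($lynx) or die "lynx exited with status $?";
    return $ascii;
}
```

With these in place, $ascii = lynx_dump_string($html) plays the same role as the backtick call in the Solution.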

If you use Netscape, its "Save as" option with the type set to "Text" does the best job with tables.

20.5.4 See Also

The documentation for the CPAN modules HTML::TreeBuilder and HTML::FormatText; your system's lynx(1) manpage; Recipe 20.6



Perl Cookbook
Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003
Pages: 501
