Recipe 20.4 Converting ASCII to HTML

20.4.1 Problem

You want to convert ASCII text to HTML. For example, you have mail you want to display intelligently on a web page.

20.4.2 Solution

Use the simple little encoding filter in Example 20-3.

Example 20-3. text2html
  #!/usr/bin/perl -w -p00
  # text2html - trivial html encoding of normal text
  # -p means apply this script to each record.
  # -00 means that a record is now a paragraph

  use HTML::Entities;
  $_ = encode_entities($_, "\200-\377");

  if (/^\s/) {
      # Paragraphs beginning with whitespace are wrapped in <PRE>
      s{(.*)$}        {<PRE>\n$1</PRE>\n}s;           # indented verbatim
  } else {
      s{^(>.*)}       {$1<BR>}gm;                     # quoted text
      s{<URL:(.*?)>}  {<A HREF="$1">$1</A>}gs         # embedded URL  (good)
                      ||
      s{(http:\S+)}   {<A HREF="$1">$1</A>}gs;        # guessed URL   (bad)
      s{\*(\S+)\*}    {<STRONG>$1</STRONG>}g;         # this is *bold* here
      s{\b_(\S+)\_\b} {<EM>$1</EM>}g;                 # this is _italics_ here
      s{^}            {<P>\n};                        # add paragraph tag
  }
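To see what the filter does to individual paragraphs, the same substitutions can be wrapped in a subroutine and called on sample strings. The following is only a sketch: the name markup_paragraph is hypothetical, and it assumes pure-ASCII input, for which the encode_entities call with the "\200-\377" range changes nothing and is therefore left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the Example 20-3 substitutions to one paragraph.
# Assumption: input is pure ASCII, so the encode_entities step is omitted.
sub markup_paragraph {
    my ($para) = @_;
    if ($para =~ /^\s/) {
        # Paragraph starts with whitespace: keep it verbatim inside <PRE>
        $para =~ s{(.*)$}{<PRE>\n$1</PRE>\n}s;
    } else {
        $para =~ s{^(>.*)}{$1<BR>}gm;                          # quoted text
        $para =~ s{<URL:(.*?)>}{<A HREF="$1">$1</A>}gs         # embedded URL
            || $para =~ s{(http:\S+)}{<A HREF="$1">$1</A>}gs;  # guessed URL
        $para =~ s{\*(\S+)\*}{<STRONG>$1</STRONG>}g;           # *bold*
        $para =~ s{\b_(\S+)\_\b}{<EM>$1</EM>}g;                # _italics_
        $para =~ s{^}{<P>\n};                                  # paragraph tag
    }
    return $para;
}

# Sample paragraphs, each ending in a blank line as paragraph mode delivers them
print markup_paragraph($_) for (
    "this is *bold* and _italic_ text\n\n",
    "see <URL:http://www.perl.com/> for more\n\n",
    "    indented code\n\n",
);
```

Because the two URL substitutions are joined with ||, the guessing pattern runs only when the paragraph contains no explicit <URL:...> marker.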

20.4.3 Discussion

Converting arbitrary plain text to HTML has no general solution because there are too many conflicting ways to represent formatting information. The more you know about the input, the better you can format it.

For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:

  BEGIN {
      print "<TABLE>";
      $_ = encode_entities(scalar <>);
      s/\n\s+/ /g;  # continuation lines
      while ( /^(\S+?:)\s*(.*)$/gm ) {                # parse heading
          print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
      }
      print "</TABLE><HR>";
  }
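The header handling can also be tried in isolation on a sample header. This is a sketch under stated assumptions: headers_to_table and escape_html are hypothetical names, and escape_html is a small stand-in for encode_entities (covering only &, <, >, and ") so that the example runs without HTML::Entities installed. The address in the sample is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for HTML::Entities::encode_entities (assumption: escaping
# these four characters is enough for the sample input).
sub escape_html {
    my ($text) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$ent{$1}/g;
    return $text;
}

# Hypothetical helper: turns one mail header paragraph into an HTML table,
# following the same steps as the BEGIN block.
sub headers_to_table {
    my ($header) = @_;
    my $html = "<TABLE>";
    $header = escape_html($header);
    $header =~ s/\n\s+/ /g;                       # join continuation lines
    while ($header =~ /^(\S+?:)\s*(.*)$/gm) {     # one row per "Name: value"
        $html .= "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
    }
    return $html . "</TABLE><HR>";
}

# Made-up header with a continuation line, ending in the blank line
# that separates header from body
my $sample = "From: Tom <tom\@example.com>\nSubject: a long\n  subject line\n\n";
print headers_to_table($sample), "\n";
```

The s/\n\s+/ /g step folds an indented continuation line into the line before it, so the two-line Subject becomes a single table row. It also swallows the blank line that ends the header, which leaves a trailing space in the last cell.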

The CPAN module HTML::TextToHTML has options for headers, footers, indentation, tables, and more.

20.4.4 See Also

The documentation for the CPAN modules HTML::Entities and HTML::TextToHTML



Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003
Pages: 501
