Recipe 6.21 Program: urlify

This program wraps HTML links around URLs in files. It doesn't handle every possible URL, but it does catch the most common ones. It also tries to avoid including end-of-sentence punctuation in the marked-up URL.

It is a typical Perl filter, so it can be fed input from a pipe:

% gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

% urlify ~/mail/*.inbox > ~/allmail.urlified

The program is shown in Example 6-10.

Example 6-10. urlify
  #!/usr/bin/perl
  # urlify - wrap HTML links around URL-like constructs

  $protos = '(http|telnet|gopher|file|wais|ftp)';
  $ltrs   = '\w';
  $gunk   = ';/#~:.?+=&%@!\-';
  $punc   = '.:?\-';
  $any    = "${ltrs}${gunk}${punc}";

  while (<>) {
      s{
        \b                    # start at word boundary
        (                     # begin $1  {
         $protos   :          # need resource and a colon
         [$any] +?            # followed by one or more
                              #  of any valid character, but
                              #  be conservative and take only
                              #  what you need to....
        )                     # end   $1  }
        (?=                   # look-ahead non-consumptive assertion
         [$punc]*             # either 0 or more punctuation
         [^$any]              #   followed by a non-url char
         |                    # or else
         $                    #   then end of the string
        )
      }{<A HREF="$1">$1</A>}igox;
      print;
  }
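
To see how the lazy quantifier and the look-ahead cooperate to leave sentence punctuation outside the link, the substitution can be tried on a couple of fixed strings. The following is only a sketch: the subroutine name urlify_line and the sample sentences are made up for illustration, ftp.example.org is a placeholder host, and the variables are declared with my so that the snippet runs under use strict. The /o flag is left out because the variables are lexical here; it has no effect on what the pattern matches.

```perl
#!/usr/bin/perl
# Sketch: the urlify substitution wrapped in a hypothetical helper
# so that it can be applied to individual strings.
use strict;
use warnings;

# Same building blocks as in the urlify listing.
my $protos = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs   = '\w';
my $gunk   = ';/#~:.?+=&%@!\-';
my $punc   = '.:?\-';
my $any    = "${ltrs}${gunk}${punc}";

# urlify_line (hypothetical name): return a copy of the line with
# each URL-like construct wrapped in an anchor tag.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                # start at a word boundary
        (                 # capture the URL in $1
         $protos   :      # scheme and colon
         [$any] +?        # as few URL characters as possible
        )
        (?=               # stop where what follows is
         [$punc]*         #   optional punctuation
         [^$any]          #   and then a non-URL character
         |                # or
         $                #   the end of the string
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

# Invented sample input: one URL followed by a period, one by a comma.
for my $sample (
    "See http://www.perl.com/CPAN/. It is great.\n",
    "Mirror: ftp://ftp.example.org/pub/perl, updated daily.\n",
) {
    print urlify_line($sample);
}
# The first line prints as:
# See <A HREF="http://www.perl.com/CPAN/">http://www.perl.com/CPAN/</A>. It is great.
```

In the first sample, the period after CPAN/ stays outside the anchor: the look-ahead succeeds as soon as the remaining text is punctuation followed by a space, and the minimal quantifier stops there rather than swallowing the period.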


Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003