Recipe 20.21 Program: hrefsub

hrefsub makes substitutions in HTML files so that the changes apply only to the HREF values of <A HREF="..."> tags. For instance, if you had the scooby.html file from the previous recipe and had since renamed shergold.html to cards.html, you need but say:

  % hrefsub shergold.html cards.html scooby.html
  <HTML><HEAD><TITLE>Hi!</TITLE></HEAD>
  <BODY><H1>Welcome to Scooby World!</H1>
  I have <A HREF="pictures.html">pictures</A> of the crazy dog
  himself.  Here's one!<P>
  <IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
  <BLINK>He's my hero!</BLINK>  I would like to meet him some day,
  and get my picture taken with him.<P>
  P.S. I am deathly ill.  <a href="cards.html">Please send
  cards</A>.
  </BODY></HTML>

The HTML::Filter manual page has a BUGS section that says:

Comments in declarations are removed from the declarations and then inserted as separate comments after the declaration. If you turn on strict_comment( ), then comments with embedded "--" are split into multiple comments.

This version of hrefsub (shown in Example 20-13) always lowercases the <A> tag name and the attribute names within this tag when substitution occurs. If $foo is a multiword string, then the text given to MyFilter->text may be broken such that these words do not come together; i.e., the substitution does not work. There should probably be a new option to HTML::Parser to make it not return text until the whole segment has been seen. Also, some people may not be happy with having their 8-bit Latin-1 characters replaced by ugly entities, so htmlsub does that, too.
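The multiword caveat concerns text events rather than attribute values, and it can be seen without any HTML module at all. The sketch below is an illustration only: it simulates a parser delivering one run of text in two pieces (the chunk boundary and the sample strings are made up) and compares substituting chunk by chunk with substituting after the pieces have been joined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical search and replacement strings, chosen for illustration.
my $from = "crazy dog";
my $to   = "clever hound";

# Pretend the parser handed us one text run as two separate chunks,
# with the boundary falling in the middle of the phrase.
my @chunks = ("pictures of the crazy", " dog himself");

# Substitute chunk by chunk: neither piece contains the whole phrase,
# so nothing is replaced.
my @fixed;
for my $chunk (@chunks) {
    (my $copy = $chunk) =~ s/\Q$from/$to/g;
    push @fixed, $copy;
}
my $per_chunk = join "", @fixed;

# Substitute after joining the chunks: the phrase is seen intact.
(my $joined = join "", @chunks) =~ s/\Q$from/$to/g;

print "per chunk: $per_chunk\n";
print "joined:    $joined\n";
```

Buffering text until the next non-text event is the behavior the paragraph above wishes for. Later releases of HTML::Parser document an unbroken_text option along these lines; check the documentation of your installed version before relying on it.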

Example 20-13. hrefsub
  #!/usr/bin/perl -w
  # hrefsub - make substitutions in <A HREF="..."> fields of HTML files
  # from Gisle Aas <gisle@aas.no>

  sub usage { die "Usage: $0 <from> <to> <file>...\n" }

  my $from = shift or usage;
  my $to   = shift or usage;
  usage unless @ARGV;

  # The HTML::Filter subclass to do the substitution.

  package MyFilter;
  use HTML::Filter;
  @ISA=qw(HTML::Filter);
  use HTML::Entities qw(encode_entities);

  sub start {
      my($self, $tag, $attr, $attrseq, $orig) = @_;
      if ($tag eq 'a' && exists $attr->{href}) {
          # \Q keeps metacharacters in $from (such as ".") literal
          if ($attr->{href} =~ s/\Q$from/$to/g) {
              # must reconstruct the start tag based on $tag and $attr.
              # wish we instead were told the extent of the 'href' value
              # in $orig.
              my $tmp = "<$tag";
              for (@$attrseq) {
                  my $encoded = encode_entities($attr->{$_});
                  # no space before the closing quote, or every
                  # attribute value would gain a trailing blank
                  $tmp .= qq( $_="$encoded");
              }
              $tmp .= ">";
              $self->output($tmp);
              return;
          }
      }
      # anything else passes through untouched
      $self->output($orig);
  }

  # Now use the class.

  package main;
  foreach (@ARGV) {
      MyFilter->new->parse_file($_);
  }
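To see why a rewritten tag comes out lowercased, it can help to run the reconstruction step on its own. What follows is a minimal sketch, not the program's actual code path: rebuild_start_tag and escape_attr are hypothetical helpers, escape_attr is a crude stand-in for encode_entities that handles only four characters, and the hand-built $attr and $attrseq values assume the parser reports tag and attribute names in lowercase, as described earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Crude stand-in for encode_entities: escapes only & < > and ".
sub escape_attr {
    my ($value) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $value =~ s/([&<>"])/$entity{$1}/g;
    return $value;
}

# Rebuild a start tag from its name, attribute hash, and attribute order.
sub rebuild_start_tag {
    my ($tag, $attr, $attrseq) = @_;
    my $out = "<$tag";
    for my $name (@$attrseq) {
        $out .= sprintf ' %s="%s"', $name, escape_attr($attr->{$name});
    }
    return "$out>";
}

# What a parser might report for <A HREF="shergold.html" TITLE="Cards & more">
# (assumed values, lowercased names).
my $attr    = { href => "shergold.html", title => "Cards & more" };
my $attrseq = [ "href", "title" ];

my ($from, $to) = ("shergold.html", "cards.html");

# \Q quotes the "." in $from so it matches a literal dot only.
my $changed = $attr->{href} =~ s/\Q$from/$to/g;
my $tag     = rebuild_start_tag("a", $attr, $attrseq);

print "$tag\n" if $changed;   # prints <a href="cards.html" title="Cards &amp; more">
```

Because the tag is rebuilt from the parsed pieces rather than patched inside the original text, the original capitalization and quoting are lost whenever a substitution happens; tags that are not touched are emitted from $orig and keep their original form.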


Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003
Pages: 501
