Hack 18. Rewrite the Web


Use the power of Perl to rewrite the web.

The Greasemonkey extension for Mozilla Firefox and related browsers is a powerful way to modify web pages to your liking. In fact, the Mozilla family projects are customizable in many ways, as long as you like writing C++, JavaScript, or XUL.

If your network doesn't run only Firefox, or if you just prefer to customize the Web with Perl instead of any other language, HTTP::Proxy can help.

The Hack

For whatever reason (registrar greed, mostly), plenty of useful sites such as Perl Monks have both .com and .org domain names. One visitor might use http://www.perlmonks.com/, while the truly blessed saints prefer http://perlmonks.org/. That's all well and good except when you have logged in to the site through one domain name but not the others. Your browser ties the HTTP cookie to the specific domain name that set it, so a login under one name does not identify you under another.

Thus you may follow a link from somewhere that leads to the correct site with the incorrect domain name. How annoying!

Fixing this with HTTP::Proxy is easy though:

use strict;
use warnings;

use HTTP::Proxy ':log';
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Response;

# start the proxy with the given command-line parameters
my $proxy = HTTP::Proxy->new( @ARGV );

# each DATA line is "pattern|destination"
for my $redirect (<DATA>)
{
    chomp $redirect;
    my ($pattern, $destination) = split( /\|/, $redirect );
    my $filter                  = get_filter( $destination );
    $proxy->push_filter( host => $pattern, request => $filter );
}

$proxy->start( );

my %filters;

sub get_filter
{
    my $site = shift;

    # return the cached filter for this site, or build and cache a new one
    return $filters{ $site } ||= HTTP::Proxy::HeaderFilter::simple->new(
        sub
        {
            my ( $self, $headers, $message ) = @_;

            # modify the host part of the request only
            $message->uri( )->host( $site );

            # create a new redirect response
            my $res = HTTP::Response->new(
                301,
                "Moved to $site",
                [ Location => $message->uri( ) ]
            );

            # and make the proxy send it back to the client
            $self->proxy( )->response( $res );
        }
    );
}

__DATA__
perlmonks.com|perlmonks.org
www.perlmonks.org|perlmonks.org

The program creates a new HTTP::Proxy object, then reads all of the data at the end of the program to create header filters. When a request comes in, the proxy runs all header filters that match the request. These filters can manipulate the request as appropriate.

In this example, if the host of a request matches perlmonks.com, the filter sends back an HTTP 301 status code redirecting the request to perlmonks.org. A well-behaved client will repeat the request with the new host (this time, sending along the proper cookie).
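The rule lookup itself is easy to try without a running proxy. The following is only a sketch in core Perl: parse_rules and destination_for are hypothetical helpers, not part of HTTP::Proxy, and the sketch assumes that a host pattern is treated as a regular expression matched against the request's host name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "pattern|destination" lines into rule pairs.
sub parse_rules {
    my @lines = @_;
    my @rules;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;

        # the pipe is a regex metacharacter, so it needs the backslash
        my ( $pattern, $destination ) = split /\|/, $line, 2;
        push @rules, [ $pattern, $destination ];
    }
    return @rules;
}

# Hypothetical helper: destination of the first rule whose pattern
# matches the host, or nothing if no rule applies.
sub destination_for {
    my ( $host, @rules ) = @_;
    for my $rule (@rules) {
        my ( $pattern, $destination ) = @$rule;
        return $destination if $host =~ /$pattern/;
    }
    return;
}

my @rules = parse_rules(
    "perlmonks.com|perlmonks.org\n",
    "www.perlmonks.org|perlmonks.org\n",
);

for my $host ( 'www.perlmonks.com', 'www.perlmonks.org', 'example.net' ) {
    my $destination = destination_for( $host, @rules );
    printf "%-20s => %s\n", $host, defined $destination ? $destination : '(no redirect)';
}
```

Under that assumption the unescaped dots in each pattern match any character, which is harmless for these two rules but worth tightening if you add more.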

The use of the %filters lexical is the Orcish Maneuver. Read the line in get_filter( ) as "return the cached object or cache a new one".
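To see the idiom in isolation, here is a minimal sketch. build_filter is a hypothetical stand-in for an expensive constructor and $builds is only there to count the calls.

```perl
use strict;
use warnings;

my %cache;
my $builds = 0;

# Hypothetical stand-in for an expensive constructor.
sub build_filter {
    my $site = shift;
    $builds++;
    return { site => $site };
}

# The Orcish Maneuver: "or cache". ||= only evaluates its right-hand
# side when the cached slot is false, so the constructor runs once per key.
sub get_cached {
    my $site = shift;
    return $cache{$site} ||= build_filter($site);
}

my $first  = get_cached('perlmonks.org');
my $second = get_cached('perlmonks.org');

print "same object\n" if $first == $second;
print "built $builds time(s)\n";
```

The idiom works cleanly here because the cached values are references, which are always true. If a legitimate cached value could be false (0 or the empty string), ||= would rebuild it on every call.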


Running the Hack

Run the program from the command line. The program passes its arguments straight to the HTTP::Proxy constructor, so supply any options as name and value pairs, perhaps to run on a different port:

$ perl memoryproxy.pl port 5000

Configure your browser to use this proxy and try to visit http://perlmonks.com/. You'll end up at http://perlmonks.org/.

Hacking the Hack

There are countless uses for HTTP::Proxy, even beyond rewriting both request and response headers and bodies. Try:

  • Restricting a browsing session to no more than ten minutes at a time during working hours.

  • Maintaining a list (or graph or tree) of relationships between sites.

  • Forbidding yourself from wasting time reading certain sites during working hours.

  • Creating shortcuts for URLs [Hack #1] across multiple browsers without manipulating local DNS records.
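As a starting point for the time-wasting idea, the decision logic might look like the sketch below. Everything in it is an assumption for illustration: the host patterns, the definition of working hours (Monday to Friday, 9:00 to 16:59 local time), and the helper names is_working_hours and should_block.

```perl
use strict;
use warnings;

# Hypothetical list of hosts to avoid during working hours.
my @time_wasters = (
    qr/(?:^|\.)example\.com$/,
    qr/(?:^|\.)example\.org$/,
);

# $wday follows localtime: 0 is Sunday, 6 is Saturday.
sub is_working_hours {
    my ( $hour, $wday ) = @_;
    return ( $wday >= 1 && $wday <= 5 && $hour >= 9 && $hour < 17 ) ? 1 : 0;
}

# True if the host is on the list and the given epoch time falls
# within working hours in the local time zone.
sub should_block {
    my ( $host, $time ) = @_;
    my ( $hour, $wday ) = ( localtime($time) )[ 2, 6 ];
    return 0 unless is_working_hours( $hour, $wday );
    return ( grep { $host =~ $_ } @time_wasters ) ? 1 : 0;
}

print should_block( 'news.example.com', time ) ? "blocked\n" : "allowed\n";
```

A request header filter like the one in the hack could call should_block( ) with the host from $message->uri( ) and, when it returns true, hand back a refusal of its own through $self->proxy( )->response( ) instead of a redirect.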



Perl Hacks
Perl Hacks: Tips & Tools for Programming, Debugging, and Surviving
ISBN: 0596526741
Year: 2004
Pages: 141