Section 7.2. Top-Level Pieces: Components | Advanced Perl Programming

7.2. Top-Level Pieces: Components

The examples we've seen so far in this chapter go part of the way to abstracting out the I/O logic in a program, but not all of it; and they certainly don't relieve us of some of the problems of higher layers of program design, such as the protocol layer. If POE is going to help us concentrate purely on the logic of our particular application, we need another layer of abstraction on top. Fortunately, we have such a layer, and it's provided by POE's Components.

Components are modules, usually in the POE::Component:: namespace (often abbreviated to PoCo:: in POE documentation), that provide very high-level functionality to an application. There are components that act as SOAP or XML/RPC servers, provide the basics of a mail server, speak Jabber or Yahoo! IM, receive syslog messages, play MP3s, and many other things. We'll start by looking at some of the protocol-level components, such as PoCo::Client::TCP and PoCo::Server::TCP, and then move up to look at components that provide the whole core of an application for us.

7.2.1. Medium-Level Components

One of the ideas behind POE components is to hide the more repetitive parts of setting up I/O from the user, abstracting away even wheels. (One reason wheels are called wheels is that they so often get reinvented.)

The most-used components are those that deal with TCP clients and servers; the server component knows how to bind to sockets, accept connections, talk to clients, and so on. Let's convert our port forwarder to use PoCo::Client::TCP and PoCo::Server::TCP instead of doing the work ourselves.

First, we have the same idea of a server where we're listening for connections, but this is handled somewhat differently:

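     use strict;
     use warnings;
     use POE qw(Component::Server::TCP Component::Client::TCP);

     POE::Component::Server::TCP->new(
         Port            => 6667,
         ClientConnected => \&spawn_client_side,
         # Input from the local user: hand it to our client session.
         ClientInput     => sub {
             my ( $kernel, $heap, $input ) = @_[ KERNEL, HEAP, ARG0 ];
             $kernel->post( $heap->{client_id} => send_stuff => $input );
         },
         InlineStates => {
             # Remember the ID of the client session we spawned.
             _child => sub {
                 my ( $heap, $child_op, $child ) = @_[ HEAP, ARG0, ARG1 ];
                 $heap->{client_id} = $child->ID
                     if $child_op eq "create";
             },
             # Data from the remote end: write it back to the local user.
             send_stuff => sub {
                 my ( $heap, $stuff ) = @_[ HEAP, ARG0 ];
                 $heap->{client}->put($stuff);
             }
         },
     );

     POE::Kernel->run;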

We start by saying we want to listen on port 6667 and that, once a client has connected, spawn_client_side should set up the client component. The ClientInput state says that when the client sends us something, we post a send_stuff event to the client session, which sends it off to the other side of the tunnel.

But wait! How do we know which session is the client? That is what the _child state is for. Whenever a child session is created or destroyed, POE automatically sends the parent session a _child event: ARG0 says what happened ("create" for a newly created child), and ARG1 is the child session itself. So in the _child state we store the client session's ID, so we can talk to it later.

And that's all we need to do for that part of the session. Now what happens to spawn the client?

     sub spawn_client_side {
         POE::Component::Client::TCP->new(
             RemoteAddress => 'mybox.real-world.int',
             RemotePort    => 80,
             # SENDER in _start is the session that created us:
             # the server-side session for this connection.
             Started => sub { $_[HEAP]->{server_id} = $_[SENDER]->ID; },
             # Data from the remote host: pass it back to the server session.
             ServerInput => sub {
                 my ( $kernel, $heap, $input ) = @_[ KERNEL, HEAP, ARG0 ];
                 $kernel->post( $heap->{server_id} => send_stuff => $input );
             },
             InlineStates => {
                 # Data from the local user: write it to the remote host.
                 send_stuff => sub {
                     my ( $heap, $stuff ) = @_[ HEAP, ARG0 ];
                     $heap->{server}->put($stuff);
                 },
             },
         );
     }

This session is a POE::Component::Client::TCP, and the first two parameters say which host and port it connects to. In the Started callback, we store the ID of the session that spawned us (the server side of this connection), so we can send it stuff.

Now, things are about to get a little tricky to describe, because we have a server that's just spawned a client, but that client opens a TCP connection to a completely different server. So let's have a quick look at a diagram in Figure 7-4 to explain what's going on here.

Figure 7-4. Port forwarder, mark 2

When we receive something from the other end of the tunnel (port 80 of the remote host), we post it as a send_stuff event to the server component, which, as we've seen, sends it to the end user. Conversely, when the server component tells us to send stuff arriving on port 6667 of the local host, we want to send it down the POE::Wheel::ReadWrite connection to port 80 of the remote host. PoCo::Client::TCP stores the wheel in the heap as $heap->{server}, so we just call put on that to send the data across. And that's all there is to it--50 lines of code, all told.

Using components has greatly simplified the process of handling network servers and clients, but we can go much further even than this.

7.2.2. A POE Web Server

The POE component POE::Component::Server::HTTP implements the business end of a web server in POE; it handles all the network and protocol layers and leaves us a callback to provide content in response to a request. This couldn't be simpler: we get an HTTP::Request object, and we have to send back an HTTP::Response object. This is how programming is meant to be: all we need to do is decide how we're going to create our content.

We could write an extremely simple server using PoCo::Server::HTTP, but we'll be slightly more advanced and create a file server that serves up files under a given directory. Here's all it takes to fire up our web server:

     use strict;
     use POE::Component::Server::HTTP;
     use POE;

     my $datadir = "/Users/simon/";

     POE::Component::Server::HTTP->new(
         ContentHandler => { '/' => \&callback },
         Port           => 8000
     );
     $poe_kernel->run;

Next comes the actual callback that responds to the request:

     use URI::Escape;
     use HTTP::Status;    # RC_* constants and status_message()
     use File::Spec::Functions;
     use File::MMagic;

     sub callback {
         my ($request, $response) = @_;
         # Map the request URI onto a file under $datadir.
         my $path = catfile($datadir, canonpath(uri_unescape($request->uri->path)));
         return error($response, $request, RC_NOT_FOUND) unless -e $path;
         return error($response, $request, RC_FORBIDDEN) unless open(OUT, '<', $path);
         binmode OUT;         # serve images and other binary files untouched
         $response->code(RC_OK);
         # Guess the MIME type from the file's contents.
         my $magic = File::MMagic->new();
         $response->push_header("Content-type", $magic->checktype_filename($path));
         local $/;            # slurp mode: read the whole file at once
         $response->content(scalar <OUT>);
         close OUT;
         return $response;
     }

Let's briefly pause to examine this function. Most of the magic is done in the second line:

     my $path = catfile($datadir,canonpath(uri_unescape($request->uri->path)));

This first extracts the path part of the request URI, turning http://www.foo.int:8000/some/file into /some/file. Then, as this is a URI, it may contain characters encoded in the percent-encoding scheme, so we unescape those using the uri_unescape function from URI::Escape.
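To see what the unescaping step does, here is a minimal core-Perl sketch of percent-decoding. As far as we know, URI::Escape's uri_unescape performs essentially this substitution; the helper name `percent_decode` is hypothetical, used here only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes in the style of uri_unescape.
# A '%' that isn't followed by two hex digits is left untouched.
sub percent_decode {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $str;
}

print percent_decode("/my%20docs/read%2Dme.txt"), "\n";   # /my docs/read-me.txt
print percent_decode("/%2E%2E/etc/passwd"), "\n";         # /../etc/passwd
```

Note the second call: an encoded %2E%2E decodes to .., which is why any check for parent-directory components has to happen after unescaping, never before.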

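Now we have a valid local path; however, we have to be careful at this point. If we blindly tack this onto the end of our data directory, /Users/simon, some joker will come along and request /../../etc/passwd.^[*] The canonpath function, from File::Spec::Functions, will tidy this up as though it were an absolute path and remove leading .. sequences. That is all it does, though: canonpath deliberately leaves x/../y sequences alone, so a request such as /some/../../etc/passwd passes through unchanged and can still climb out of the document root.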

^[*] And, of course, he'll find that since this is a Macintosh, that information won't help him much. But it's the principle of the thing.

Once we add our document root to the beginning of this path, we've got something that turns http://www.foo.int:8000/some/file into /Users/simon/some/file; this one line has done the rough equivalent of Apache's URL-mapping phase.
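Because canonpath only strips leading .. components, a stricter approach is to split the decoded path into components and refuse any request that contains a .. component before joining it to the document root. The sketch below uses only the core File::Spec::Functions module; `map_request_path` is a hypothetical helper name, and /srv/www is just an example document root.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile canonpath);

# Hypothetical helper: map an already-unescaped URI path onto a file under
# $docroot. Returns nothing (undef in scalar context) if the path contains
# a NUL byte or any '..' component.
sub map_request_path {
    my ($docroot, $uri_path) = @_;
    return if $uri_path =~ /\0/;
    # Split into components, dropping empty ones (from '//') and '.'.
    my @parts = grep { length($_) && $_ ne '.' } split m{/}, $uri_path;
    return if grep { $_ eq '..' } @parts;
    return catfile($docroot, @parts);
}

# canonpath only strips *leading* '..' components:
print canonpath("/../../etc/passwd"), "\n";        # /etc/passwd
print canonpath("/some/../../etc/passwd"), "\n";   # /some/../../etc/passwd

my $ok  = map_request_path("/srv/www", "/some/file");
my $bad = map_request_path("/srv/www", "/some/../../etc/passwd");
print defined($bad) ? "allowed\n" : "rejected\n";  # rejected
print "$ok\n";                                     # /srv/www/some/file
```

Run on the decoded path, the check also catches an encoded %2E%2E. It does not deal with symbolic links inside the document root; if those matter, resolving the final path (for example with Cwd's realpath) and checking its prefix would be a further step.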

We must now check whether our file actually exists and is readable:

     return error($response, $request, RC_NOT_FOUND) unless -e $path;
     return error($response, $request, RC_FORBIDDEN) unless open(OUT, '<', $path);

We'll define the error routine in a moment; we use the constants exported by HTTP::Status to represent the 404 (Not Found) and 403 (Forbidden) status codes. If we get past these two statements, we have a readable file and an open filehandle, so we can set a 200 (OK) status code. The next stage is to establish the MIME type of the file, which we do using a trick similar to Apache's mod_mime_magic: the File::MMagic module gives us a method that looks at the first few bytes of a file to determine its content type.

     $response->push_header("Content-type", $magic->checktype_filename($path));
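File::MMagic consults a large database of "magic" signatures. To illustrate the underlying idea only, here is a deliberately tiny core-Perl sketch that recognizes a handful of well-known leading byte sequences; the `sniff_type` name and its short signature table are our own assumptions, and anything unrecognized falls back to application/octet-stream.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical, deliberately tiny signature table: leading bytes => MIME type.
# File::MMagic uses a far larger magic database; this only shows the idea.
my @SIGNATURES = (
    [ "\x89PNG\r\n\x1a\n" => 'image/png'       ],
    [ "GIF87a"            => 'image/gif'       ],
    [ "GIF89a"            => 'image/gif'       ],
    [ "\xFF\xD8\xFF"      => 'image/jpeg'      ],
    [ "%PDF-"             => 'application/pdf' ],
);

# Read the first 16 bytes of a file and compare them against each signature.
sub sniff_type {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return;
    my $got = read($fh, my $head, 16);
    close $fh;
    return unless defined $got;
    for my $sig (@SIGNATURES) {
        my ($magic, $type) = @$sig;
        return $type if substr($head, 0, length $magic) eq $magic;
    }
    return 'application/octet-stream';    # unknown: send as opaque bytes
}

# Demo: write a file that starts with the PNG signature and sniff it.
my ($out, $name) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\x89PNG\r\n\x1a\n", "\0" x 8;
close $out;
print sniff_type($name), "\n";    # image/png
```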

To complete the request, we spit out the contents of the file in a relatively straightforward way:

     local $/; $response->content(scalar <OUT>);
     close OUT;
     return $response;
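In the callback, `local $/` stays in force until the sub returns, which is harmless here because nothing else is read afterwards. Wrapping it in a `do` block is a common way to keep slurp mode from leaking into later reads. Here is a self-contained sketch of the same idiom with a lexical filehandle and three-argument open; `slurp_file` is a hypothetical name, and the temporary file exists only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a file's entire contents as a single string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    # Undefining $/ makes <$fh> read to end-of-file in one go; because it is
    # localized inside the do block, the old value returns as the block ends.
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}

my ($out, $name) = tempfile(UNLINK => 1);
print {$out} "line one\nline two\n";
close $out;

my $text = slurp_file($name);
print length($text), "\n";    # 18
```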

And, finally, the error response subroutine is equally straightforward:

     sub error {
         my ($response, $request, $code) = @_;
         my $uri         = $request->uri;
         my $message     = status_message($code);    # e.g. 404 -> "Not Found"
         my $explanation = $code == RC_FORBIDDEN ? "accessible" : "found";
         $response->code($code);
         $response->push_header("Content-type", "text/html");
         $response->content(<<EOF);
     <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
     <HTML><HEAD>
     <TITLE>$code $message</TITLE>
     </HEAD><BODY>
     <H1>$message</H1>
     The requested URL $uri was not $explanation on this server.<P>
     </BODY></HTML>
     EOF
         return $response;
     }

The key to this is the status_message routine provided by HTTP::Status, which turns a numeric status code (404) into a message (Not Found).

When we put this all together, we have a very simple file server in fewer than 50 lines of Perl code. The vast majority of these lines are actually taken up with error handling; perhaps that's the way it should be.

I hope you've noticed that when we've been looking at this web server, we've not really talked about POE at all. This is deliberate; the idea of POE components is to make the POE part almost invisible and allow you to concentrate on the program logic.

7.2.3. Highest-Level Components

As we mentioned at the beginning of this section, there is a wealth of components out there, and after a while one can begin to think that most programming with POE is just a matter of sticking the appropriate bits together.

In Chapter 2, we looked at several implementations of an RSS aggregator and renderer. Now we'll look at a related problem: a realtime RSS newswire, which periodically checks a bunch of RSS sources and informs us of any new headlines.

How would you go about this without POE? Maybe use LWP to fetch a list of URLs, determine which have changed since the last fetch, parse with XML::RSS, work out the new articles, report these to the user, then go back to sleep for a while. Sounds easy, but when you get down to the details of working out the changed feeds and new headlines, you're probably looking at about 200 lines of code, at least. If you're lucky, you might find XML::RSS::Feed, which does some of this, but it's still not a 10-minute job.
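For comparison, here is what just one piece of such a hand-rolled version might look like: deciding which feeds are due for a refetch and how long to sleep until the next one. It's a core-Perl sketch under our own assumptions; the record layout (`name`, `delay`, `last_fetch`) and both helper names are hypothetical, with `delay` meant to echo the per-feed delay passed to feed_factory later in this section.

```perl
use strict;
use warnings;

# Hypothetical feed records: a name, a delay in seconds, and the epoch time
# of the last fetch (undef if the feed has never been fetched).
my @feeds = (
    { name => 'Slashdot',  delay => 60,  last_fetch => undef },
    { name => 'Perl news', delay => 300, last_fetch => 1_000 },
);

# Feeds whose delay has elapsed since they were last fetched.
sub due_feeds {
    my ($now, @candidates) = @_;
    return grep {
        !defined $_->{last_fetch} || $now - $_->{last_fetch} >= $_->{delay}
    } @candidates;
}

# How long to sleep before the next feed becomes due (0 if one is due now).
sub seconds_until_next {
    my ($now, @candidates) = @_;
    return unless @candidates;
    my @waits = map {
        defined $_->{last_fetch} ? $_->{last_fetch} + $_->{delay} - $now : 0
    } @candidates;
    my ($soonest) = sort { $a <=> $b } @waits;
    return $soonest < 0 ? 0 : $soonest;
}

my $now = 1_100;
print join(", ", map { $_->{name} } due_feeds($now, @feeds)), "\n";   # Slashdot
print seconds_until_next($now, @feeds), "\n";                        # 0
```

Even this fragment ignores fetching, parsing, and change detection, which gives some feel for why the full job runs to a couple of hundred lines.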

Now that you know about POE, you might think you can use POE::Component::Client::HTTP to handle queuing and scheduling the HTTP fetches, and have a response state grab the responses and parse them. That takes some of the pressure away, but it's still way too much work. Can't we get a component to do this?

Here's a simple RSS newswire using POE::Component::RSSAggregator. We'll start by setting up our array of feeds using XML::RSS::Feed::Factory:

     use XML::RSS::Feed::Factory;

     my @feeds = feed_factory(
         {   url   => "http://slashdot.org/slashdot.rss",
             name  => "Slashdot",
             delay => 60 },
         {   url   => "http://blog.simon-cozens.org/blosxom.cgi/xml",
             name  => "Simon Cozens",
             delay => 60 },
         {   url   => "http://use.perl.org/perl-news-short.rdf",
             name  => "Perl news",
             delay => 60 }
     );

Now we can simply pass this array of feeds to POE::Component::RSSAggregator, and most of the work is done:

     use POE;
     use POE::Component::RSSAggregator;

     my $aggie = POE::Component::RSSAggregator->new(
         feeds    => \@feeds,
         callback => \&new_headlines,    # called each time some RSS arrives
     );
     POE::Kernel->run;

This sets up the relevant sessions to take care of getting the summaries from the feeds; all that's left is to decide what to do each time some RSS arrives:

     sub new_headlines {
         my ($feed) = shift;
         return unless my @newnews = $feed->late_breaking_news;
         for my $headline (@newnews) {
             print $headline->headline . ": " . $headline->url . "\n";
         }
     }

XML::RSS::Feed automatically keeps track of which headlines we've seen, so we can return immediately unless there's something new for us. When there is something new, late_breaking_news gives us a list of XML::RSS::Headline objects we can interrogate.
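As an illustration of the idea only, not of XML::RSS::Feed's actual implementation, here is a core-Perl sketch that remembers which headline URLs it has already reported and returns just the new ones. Keying on the URL, the `unseen_headlines` name, and the hashref headline records are all our own assumptions.

```perl
use strict;
use warnings;

# Hypothetical tracker: the set of headline URLs already reported.
my %seen;

# Given headlines from the latest fetch (hashrefs with 'headline' and 'url'),
# return only those not seen before, and remember them for next time.
sub unseen_headlines {
    return grep { !$seen{ $_->{url} }++ } @_;
}

my @first = unseen_headlines(
    { headline => 'First story',  url => 'http://example.com/1' },
    { headline => 'Second story', url => 'http://example.com/2' },
);
my @second = unseen_headlines(
    { headline => 'Second story', url => 'http://example.com/2' },
    { headline => 'Third story',  url => 'http://example.com/3' },
);
print "$_->{headline}: $_->{url}\n" for @second;   # Third story: http://example.com/3
```

As written, %seen only ever grows; a long-running newswire would probably want to forget URLs once they drop out of the feed.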

Again, POE components have abstracted away the generic pieces of our application (fetching and parsing feeds, and keeping track of which headlines we've seen) and allowed us to concentrate on the specific part: what we want to do with new headlines.