Section 3.4. HTML::Mason | Advanced Perl Programming

3.4. HTML::Mason

One of the big drawbacks of HTML::Template is that it forces us, to some degree, to mix program logic and presentation, something that we sought to avoid by using templates. For instance, that last template got a little difficult to follow, with variable and HTML tags crowding up the template and obscuring what was actually going on. What we would prefer, then, is a system that allows us to further abstract out the individual elements of what we expect our templates to do, and this is where HTML::Mason comes in.

As we've mentioned, HTML::Mason is an inside-out templating system. As well as templating, it could also be described as a component abstraction system for building HTML web pages out of smaller, reusable pieces of logic. Here's a brief overview of how to use it, before we go on to implement the same RSS aggregator application.

3.4.1. Basic Components

In Mason, everything is a component. Here's a simple example of using components. Suppose we have three files: test.html in Example 3-1, Header in Example 3-2, and Footer in Example 3-3.

Example 3-1. test.html

    <& /Header &>
    <p>
      Hello World
    </p>
    <& /Footer &>

Example 3-2. Header

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
      <head>
          <title>Some Web Application</title>
          <link rel=stylesheet type="text/css" href="nt.css">
      </head>
    <body>

Example 3-3. Footer

        <hr>
        <div>
          <address>
             <a href="mailto:webmaster@yourcompany.com">webmaster@yourcompany.com</a>
          </address>
        </div>
      </body>
    </html>

HTML::Mason builds up the page by including the components specified inside <& and &> tags. When creating test.html, Mason first includes the Header component found at the document root, then the rest of the HTML, then the Footer component.

Components may call other components. So far, we've done nothing outside the scope of server-side includes.

3.4.2. Basic Dynamism

So where does the templating come in? There are three basic ways of adding templates to Mason pages. Here's the first, a simple modification to our Footer component.

        <hr>
        <div>
          <address>
             <a href="mailto:webmaster@yourcompany.com">webmaster@yourcompany.com</a>
          </address>
          Generated: <% scalar localtime %>
        </div>
      </body>
    </html>

If you wrap some Perl code in <% ... %> tags, the result of the Perl expression is inserted into the resulting HTML.

That's all very well for simple expressions, but what about actual Perl logic? For this, Mason has an ugly hack: a single % at the beginning of a line is interpreted as Perl code. This lets you do things like Example 3-4, to dump out the contents of a hash.

Example 3-4. Hashdump

    <table>
      <tr>
         <th> key </th>
         <th>value</th>
      </tr>
    % for (keys %hash) {
       <tr>
         <td> <% $_ %> </td>
         <td> <% $hash{$_} %> </td>
       </tr>
    % }
    </table>
    <%ARGS>
    %hash => undef
    </%ARGS>

There are a few things to notice in this example. First, see how we intersperse ordinary HTML with logic, using % ..., and evaluated Perl expressions, using <% ... %>. The only places % is special are at the start of a line and as part of the <% ... %> tag; the % of %hash is plain Perl.

The second thing to notice in the example is how we get the hash into the component in the first place. That's the purpose of the <%ARGS> section: it declares the arguments to pass to the component. And how do we pass in those arguments? Here's something that might call Hashdump:

    % my %foo = ( one => 1, two => 2 );
    <& /Hashdump, hash => \%foo &>

So altogether, we have an example of declaring my variables inside a component, passing a named parameter to another component, and having that component receive the parameter and make use of it. Mason will try to do something sensible if you pass parameters of different types than the types you've declared in the <%ARGS> section of the receiving component (here we passed a hash reference to fill in the %hash parameter, for instance), but life is easier if you stick to the same types.

3.4.3. Perl Blocks

There's a final way of adding Perl logic to your components, but it's not used much in the form we're about to describe. If you've got long Perl sections, you won't want to put a % at the beginning of every line. Instead, you can wrap the whole thing up in a <%PERL>...</%PERL> block.

However, something you will see quite often in real-life components is the <%INIT>...</%INIT> block. This can be placed anywhere in the component, although typically it's placed at the end to keep it away from all the HTML. No matter where it's placed, it always runs first, before anything else in the component. It's a good place to declare and initialize any variables you're going to use (by the way, Mason forces use strict...) and do any heavy computation that needs to happen before you do the displaying.

Another vaguely useful thing to know about is the <%ONCE>...</%ONCE> block, which is executed only at startup; think of it as the Mason equivalent of a Perl BEGIN block.
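As a rough sketch of how these blocks fit together, here is a hypothetical component (the $name argument, the visit counter, and the markup are invented for illustration and are not part of the portal). It assumes the usual Mason behavior that variables declared in <%ONCE> stay visible to the rest of the component and persist between requests in the same server process.

```mason
<%ARGS>
$name => "stranger"
</%ARGS>
<p>Hello, <% $name %>. You are visitor number <% $visits %> since <% $loaded %>.</p>
<%PERL>
# Runs inline, at this point in the component's output.
my $parity = $visits % 2 ? "odd" : "even";
</%PERL>
<p>That is an <% $parity %> number.</p>
<%ONCE>
# Runs once, when the component is loaded.
my $loaded = localtime;
my $visits = 0;
</%ONCE>
<%INIT>
# Runs at the start of every request, before any of the HTML above.
$visits++;
</%INIT>
```

Although <%INIT> sits at the bottom of the file, the counter has already been incremented by the time the first paragraph is rendered.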

3.4.4. Our RSS Aggregator

We're now in a position where we can start putting together our RSS aggregator. The example in this section is taken from some code I wrote for a portal site. It's worth noting that I threw it together in a matter of around two or three hours. The intention was to support logins, personalized lists of feeds, personalized ordering, and so on. Although I didn't get that far, what I had after those two or three hours is worth looking at.^[*]

^[*] Feel free, of course, to implement all these things as an exercise in HTML::Mason programming.

Let's start by thinking of what we want on the front page. I opted for a two-column design, shown in Figure 3-1, with the left column containing an invitation to log in to the portal and a list of the feeds available. As an additional flourish, the list of feeds is categorized into folders, represented by directories in the filesystem. The right column contains the logged-in user's favorite feeds, the feeds from a given folder if a folder has been clicked, or a default set of feeds in all other cases.

Figure 3-1. The RSS aggregator

Let's begin to build the site. First, we'll want a header and a footer to take away most of the boring parts of the HTML generation, as in Examples 3-5 and 3-6.

Example 3-5. Header

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html lang="en">
    <head>
    <title> My Portal </title>
    <link rel="stylesheet" type="text/css" href="/stylesheets/portal.css">
    </head>
    <body>
    <img src="/books/2/171/1/html/2//images/portal-logo.gif">
    <h1>My Portal</h1>

Example 3-6. Footer

    </body>
    </html>

Now we're going to use a slight Mason trick: instead of wrapping every page in the header and footer manually, we use an autohandler, a component that is applied to all pages, as in Example 3-7.

Example 3-7. Autohandler

    <& /Header &>
    <% $m->call_next %>
    <& /Footer &>

Behind the scenes, Mason pages are processed by one or more handlers, reminiscent of Apache mod_perl handlers. Indeed, $m in our code is the Mason request object, which is similar to the Apache request object.^[*]

^[*] If you need the actual Apache request object in Mason, it's available as $r.

In the lineup of Mason handlers, first come the autohandlers, which handle every request; then come dhandlers, default handlers that take over when a URI doesn't correspond to an existing component; and finally comes the ordinary Mason handler for the page you're trying to process. Our example shows the simplest but most common autohandler: call a header component, then pass this request on to the next handler in the Mason handler chain, and finally call a footer component. This ensures that every page has its header and footer.
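To make the dhandler idea a little more concrete, here is a minimal sketch of a hypothetical dhandler placed in a /feeds directory (the directory name and the message are assumptions for illustration; the portal in this section does not use one). When a request such as /feeds/News/today matches no component, Mason falls back to the nearest dhandler, and the request object's dhandler_arg method returns the part of the path that was left unmatched.

```mason
<p>No page called "<% $path |h %>" exists here.</p>
<%INIT>
# For a request to /feeds/News/today served by /feeds/dhandler,
# dhandler_arg returns "News/today".
my $path = $m->dhandler_arg;
</%INIT>
```

The autohandler still wraps this output, so the fallback page gets the same header and footer as every other page.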

Next, we'll think about what the index has to be. As we've said, we're going for a two-column design, something like Example 3-8.

Example 3-8. index.html

    <table>
    <tr>
    <td valign="top">
    <& /LoginBox &>
    <& /Directories &>
    <%INIT>
    $open = ($open =~ /(\w+)/) ? $1 : '';
    </%INIT>
    </td>
    <td width=4>&nbsp;</td>
    <td width='100%'>
    %# Am I logged in ?
    % if (0) {
    <& /LoggedInPane &>
    %} elsif ($open) {
    <& /DirectoryPane, open => $open &>
    %} else {
    <& /StandardPane &>
    %}
    </td>
    </tr>
    </table>
    <%ARGS>
    $open => undef
    </%ARGS>

As promised, the column on the left contains a login box and the directory of feeds. The right-hand side has three states: one pane for those who are logged in (which is ifdef'ed out since user control is left for future expansion), one if a particular directory has been opened, and one if the user has just come to the site's front page.^[*]

^[*] Therefore, as it happens, all requests will go through index.html, and we could have put our header and footer code in there, but using an autohandler is cleaner and actually more conventional.

What about the value of $open? Mason allows components to take arguments, either via CGI or by being passed in from other components. In this case, index.html is a top-level component and will receive its arguments via CGI; that is, if we request the URL http://www.oursite.com/rss/index.html?open=News, then $open will be set to News. The directory pane component receives its arguments from index.html, and so we pass it the value of $open we received.

Because $open later names a directory on the web server, we sanitize its value to avoid directory-traversal attacks such as passing in a query of open=../../... We do this in the <%INIT> block by replacing the parameter passed in with the first word in the string. If the parameter has no word characters, we set it to an empty string so the remainder of the code acts as if no directory was selected.
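The effect of that check is easy to try outside Mason. The following is a small sketch in plain Perl; the sanitize_open helper is a hypothetical name, and the test for an undefined value is an addition that the component itself does not make.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the check in index.html's <%INIT> block.
sub sanitize_open {
    my ($open) = @_;
    return '' unless defined $open;          # no parameter supplied at all
    return ($open =~ /(\w+)/) ? $1 : '';     # keep the first run of word characters
}

for my $query ('News', '../../etc/passwd', '../..') {
    printf "%-18s => '%s'\n", $query, sanitize_open($query);
}
```

News passes through unchanged, ../../etc/passwd is reduced to etc (a name that can only refer to a directory directly under the portal root), and ../.. becomes the empty string.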

Now, our site is going to be made up of a load of boxes of various titles and different colors, so let's have a couple of helper components to draw boxes for us. We're going to allow the box to have a user-defined color, title, and optional title link. Experience has shown that the best way to do this is to create components for the start of the box and the end of the box. The start of the box, shown in Example 3-9, creates a table inside a table.

Example 3-9. BoxTop

    <table bgcolor="#777777" cellspacing=0 border=0 cellpadding=0>
    <tr><td rowspan=2></td>
    <td valign=middle align=left bgcolor="<%$color%>">
    &nbsp;
    <font size=-1 color="#ffffff">
    <b> <% $title_href && "<a  href=\"$title_href\">"|n %> <%$title |n %> <%  $title_href && "</a>" |n %> </b></font></td>
    <td rowspan=2>&nbsp;</td></tr>
    <tr><td colspan=2 bgcolor="#eeeeee" valign=top align=left width=100%>
    <table cellpadding=2 width=100%><tr><td>
    <%ARGS>
    $title_href => undef
    $title => undef
    $color => "#000099"
    </%ARGS>

One thing to notice from this is the |n directive that appears at the end of some of the interpolated Perl sections. The reason for these is to turn off Mason's default HTML entity escaping code. For instance, if we had passed in a value for $title_href, then this line:

     <%  $title_href && "</a>" %>

would want to output </a>. However, as Mason tries to escape HTML entities for you, this would become &lt;/a&gt;, so we need to turn that off.
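To see what entity escaping does to markup, here is a minimal sketch in plain Perl. The escape_html function is a hand-rolled stand-in written for this illustration; it is not Mason's own escaping code and covers only four characters.

```perl
use strict;
use warnings;

# Characters that have special meaning in HTML, and their entities.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Illustrative stand-in for an HTML-escaping filter.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_html('</a>'), "\n";    # prints &lt;/a&gt;
```

A browser shows the escaped string as literal text rather than treating it as a closing tag, which is why the markup emitted by BoxTop has to bypass the filter with |n.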

The box ending code, shown in Example 3-10, is much simpler and merely ends the two tables we opened.

Example 3-10. BoxEnd

    </td></tr></table>
    </td></tr>
    <tr><td colspan=4>&nbsp;</td></tr>
    </table>

As an example of these box drawing components, let's first dispatch the dummy login box for completeness, as in Example 3-11.

Example 3-11. LoginBox

    <& BoxTop, title=>"Login" &>
    <small>Log in to Your Portal:</small><br/>
    <form>
    <ul>
    <li> Barcode: <input name="barcode">
    <li> Password: <input name="password">
    </ul>
    </form>
    <& BoxEnd &>

When Mason processes that component, it produces HTML that looks like this:

    <table bgcolor="#777777" cellspacing=0 border=0 cellpadding=0>
    <tr><td rowspan=2></td>
    <td valign=middle align=left bgcolor="#000099">
    &nbsp;
    <font size=-1 color="#ffffff">
    <b> Login </b></font></td>
    <td rowspan=2>&nbsp;</td></tr>
    <tr><td colspan=2 bgcolor="#eeeeee" valign=top align=left width=100%>
    <table cellpadding=2 width=100%><tr><td>
    <small>Log in to Your Portal:</small><br/>
    <form>
    <ul>
    <li> Barcode: <input name="barcode">
    <li> Password: <input name="password">
    </ul>
    </form>
    </td></tr></table>
    </td></tr>
    <tr><td colspan=4>&nbsp;</td></tr>
    </table>

Now we need to make some decisions about our site's layout. As we've mentioned, we're going to put our feeds in the filesystem, categorized by directory. We'll actually have each individual feed be a Mason component, drawing on a library component we'll call RSSBox. Our Directories component is a box containing a list of categories; clicking on a category displays all the feeds in that category. As each category is a directory, we can create the list, as in Example 3-12.

Example 3-12. Directories

    <& /BoxTop, title=> "Resources" &>
    <ul>
    <% $Portal::dirs |n %>
    </ul>
    <& /BoxEnd &>
    <%ONCE>
        my $root = "/var/portal/";
        for my $top (grep { -d $_ } glob("$root*")) {
            $top =~ s/$root//;
            $Portal::dirs .= qq{
                <li><a href="/?open=$top">$top</a>
            } unless $top =~ /\W/;
        }
    </%ONCE>

What's happening here is that when the server starts up, it looks at all the subdirectories of our portal directory and strips them of their root (in this instance, /var/portal/) to turn them into a link for the purposes of our application. For instance, a directory called /var/portal/News would turn into a link /?open=News with the heading News. This link leads back to our home page, where the open parameter causes the DirectoryPane to be presented, showing the feeds in the selected directory. The code skips any directories with non-word characters in the name, so it only generates links that will pass the parameter check on open.
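The same directory scan can be tried as plain Perl, which makes it easier to test away from the web server. This is a sketch only: category_links is a hypothetical helper name, and unlike the component it quotes the root with \Q...\E so that any regex metacharacters in the path are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: build the <li> links for every subdirectory of $root
# whose name consists only of word characters.
sub category_links {
    my ($root) = @_;
    my $html = '';
    for my $path (grep { -d $_ } glob("$root/*")) {
        (my $name = $path) =~ s{^\Q$root\E/}{};   # strip the root, keep the name
        next if $name =~ /\W/;                    # same rule as the open check
        $html .= qq{<li><a href="/?open=$name">$name</a></li>\n};
    }
    return $html;
}

# Prints nothing if the directory does not exist on this machine.
print category_links('/var/portal');
```

Keeping the scan in a function like this would also make it simple to move the root directory into a configuration file later.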

Let's think about how that pane is implemented. We know that we open a directory and find it full of Mason component files. We want to then dynamically include each of those component files in turn, to build up our directory of feeds.

The trick to dynamically calling a component is the comp method on the Mason request object $m; this is the Perl-side version of the <& comp &> component include tag. Hence, our directory pane ends up looking like Example 3-13.

Example 3-13. DirectoryPane

    <%ARGS>
    $open
    </%ARGS>
    % for (grep {-f $_} glob( "/var/portal/$open/*") ) {
    % s|/var/portal/||;
    <% $m->comp($_) %>
    % }

We first receive the name of the directory we're trying to open. Next we look at each file in that directory, strip off the name of the root directory (ideally this would all be provided by a configuration file), and then call the component with that name. This means that if we have a directory called Technology containing the following files:

    01-Register
    02-Slashdot
    03-MacNews
    04-LinuxToday
    05-PerlDotCom

then calling <& /DirectoryPane, open =>"Technology"&> would have the effect of saying:

    <& /Technology/01-Register   &>
    <& /Technology/02-Slashdot   &>
    <& /Technology/03-MacNews    &>
    <& /Technology/04-LinuxToday &>
    <& /Technology/05-PerlDotCom &>
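The comp method writes the called component's output straight into the page. If you would rather capture that output first, the request object also provides scomp, which returns it as a string. The following variation on DirectoryPane is a sketch under the same /var/portal layout; the check for empty output is only there to show something being done with the captured string.

```mason
<%ARGS>
$open
</%ARGS>
% for my $file (grep { -f $_ } glob("/var/portal/$open/*")) {
%   (my $comp = $file) =~ s|^/var/portal||;   # e.g. /Technology/02-Slashdot
%   my $html = $m->scomp($comp);              # capture instead of printing
%   next unless length $html;                 # skip feeds that produced nothing
<% $html |n %>
% }
```

The |n flag matters here because the captured string is already HTML and should not be escaped a second time.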

The standard pane, shown in Example 3-14, appears when no directory is open. It consists of whatever feeds we choose to make default.

Example 3-14. StandardPane

    <& /BoxTop, title=> "Hello!", color => "#dd2222"&>
    Welcome to your portal! From here you can subscribe to a wide range of
    news and alerting services; if you log in, you can customize this home
    page.
    <& /BoxEnd &>
    <& /Weather/01-Oxford &>
    <& /Technology/02-Slashdot &>
    <& /News/01-BBC &>
    <& /People/03-Rael &>
    ...

So what's in the individual files? As we've mentioned, they make use of an RSSBox component, and they simply pass in the URL for the feed and optionally a color, a maximum number of items, and a name for the feed. They also pass in a parameter to say whether we want to display just the titles and links for each RSS item, or the description as well. For instance, /News/01-BBC looks like this:

    <& /RSSBox, URL =>"http://www.newsisfree.com/HPE/xml/feeds/60/60.xml",
    Color =>"#dd0000" &>

whereas Rael Dornfest's blog looks like this:

    <& /RSSBox, URL => "http://www.oreillynet.com/~rael/index.rss",
    Color=> "#cccc00", Title => "Rael Dornfest", Full => 0 &>

As we'll see in a moment, the beauty of this modular system is that we can have components that do things other than fire off RSS feeds if we want.

But first, let's complete our portal by writing the RSSBox library that all these sources use. First, we want a ONCE block to load up the modules we need:

    <%ONCE>
    use XML::RSS;
    use LWP::Simple;
    </%ONCE>

Next we take our arguments, setting appropriate defaults:

    <%ARGS>
    $URL
    $Color => "#0000aa"
    $Max => 5
    $Full => 1
    $Title => undef
    </%ARGS>

Before we start outputting any content, we load up the feed in question and parse it with the XML::RSS module. We call Mason's cache_self method to have this component handle caching its output; if the same URL is accessed within 10 minutes, the cached copy will be presented instead:

    <%INIT>
    return if $m->cache_self(key => $URL, expires_in => '10 minutes');
    my $rss = new XML::RSS;
    eval { $rss->parse(get($URL));};
    my $title = $Title || $rss->channel('title');
    </%INIT>

And now we are ready to go. So let's look at this all together in Example 3-15.

Example 3-15. RSSBox

    <%ONCE>
    use XML::RSS;
    use LWP::Simple;
    </%ONCE>
    <%ARGS>
    $URL
    $Color => "#0000aa"
    $Max => 5
    $Full => 1
    $Title => undef
    </%ARGS>
    <%INIT>
    return if $m->cache_self(key => $URL, expires_in => '10 minutes');
    my $rss = new XML::RSS;
    eval { $rss->parse(get($URL));};
    my $title = $Title || $rss->channel('title');
    my $site = $rss->channel('link');
    </%INIT>
    <BR>
    <& BoxTop, color => $Color, title => $title, title_href => $site &>
        <dl>
    % my $count = 0;
    % for (@{$rss->{items}}) {
        <dt>
        <a href="<% $_->{link} %>"> <% $_->{title} %> </a>
        </dt>
    % if ($Full) {
        <dd> <% $_->{description} %> </dd>
    % }
    %   last if ++$count >= $Max;
    % }
        </dl>
    <& /BoxEnd &>

There isn't much to it; for each item in the feed, we want to provide a link, the item's title, and, optionally, the description. We stop if we have more items than we want.
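The counter in the loop is one way to cap the number of items. As a sketch of an alternative, the list could be trimmed before the loop starts; first_items is a hypothetical helper, and the sample items below are made up rather than taken from a real feed.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $max items from an array reference.
sub first_items {
    my ($items, $max) = @_;
    my $last = $#{$items} < $max - 1 ? $#{$items} : $max - 1;
    return @{$items}[0 .. $last];
}

# Seven made-up items standing in for @{ $rss->{items} }.
my @items = map { +{ title => "Story $_" } } 1 .. 7;
print "$_->{title}\n" for first_items(\@items, 5);
```

With seven items and a limit of five, this prints Story 1 through Story 5; a feed with fewer items than the limit is returned whole.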

This demonstrates how powerful Mason can be; as I said, the total development time for this site was a couple of hours at most. The entire site takes considerably fewer than 200 lines of code. And, as we mentioned, we have the flexibility to include components that are not RSS. For instance, we don't actually have an RSS feed of the Oxford weather. However, there is a web page that spits out a weather report in a well-known format. This means that Weather/01-Oxford does not call RSSBox at all, but is in fact the following:

    <%INIT>
    use LWP::Simple;
    my @lines = grep /Temperature|Pressure|humidity|^Sun|Rain/,
                split /\n/,
                get('http://www-atm.physics.ox.ac.uk/user/cfinlay/now.htm');
    </%INIT>
    <br>
    <& /BoxTop, title => "Oxford Weather", color => "#dd00dd" &>
    <ul>
    % for (@lines) {
     <li> <% $_ %> </li>
    % }
    </ul>
    <& /BoxEnd &>
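The filtering step can be checked without fetching anything. In this sketch the weather_lines function wraps the same grep and split, and the sample report is invented text standing in for the real page, whose exact wording is not shown in this section.

```perl
use strict;
use warnings;

# Same filter as the component's <%INIT> block, applied to a string.
sub weather_lines {
    my ($page) = @_;
    return grep { /Temperature|Pressure|humidity|^Sun|Rain/ } split /\n/, $page;
}

# Invented sample text standing in for the fetched page.
my $sample = join "\n",
    'Oxford weather station',
    'Temperature: 12.3 C',
    'Relative humidity: 81%',
    'Wind: 5 kt',
    'Sunshine: 0.0 h',
    'Rain: 0.2 mm';

print "$_\n" for weather_lines($sample);
```

Only the temperature, humidity, sunshine, and rain lines survive the filter; the heading and the wind line are dropped.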

And that sums up Mason: simple, extensible, and highly powerful.

Of course, there are many other Mason tricks for you to learn, too many to cover here. Dave Rolsky and Ken Williams's fantastic book Embedding Perl in HTML with Mason (http://www.masonbook.com/) covers many of them, including more details about getting Mason up and running in your web server. Also check out the Mason home page (http://www.masonhq.com).