
8.8. Additional Examples

8.8.1. Adding Width and Height Attributes to Image Tags

This section presents a somewhat advanced example of in-place search and replace that updates HTML to ensure that all image tags have both WIDTH and HEIGHT attributes. (The HTML must be in a StringBuilder, StringBuffer, or other writable CharSequence.)

Having even one image on a web page without both size attributes can make the page appear to load slowly, since the browser must actually fetch such images before it can position items on the page. Having the size within the HTML itself means that the text and other content can be properly positioned immediately, which makes the page-loading experience seem faster to the user. [*]

[*] "All images must have size attributes!" was a mantra at Yahoo!, even in its early days. Remarkably, there are still plenty of major web sites today sending out high-traffic pages with sizeless <img> tags.

When an image tag is found, the program looks within the tag for SRC, WIDTH, and HEIGHT attributes, extracting their values when present. If either the WIDTH or the HEIGHT is missing, the image is fetched to determine its size, which is then used to construct the missing attribute(s).

If neither WIDTH nor HEIGHT is present in the original tag, the image's true size is used in creating both attributes. However, if one of the size attributes is already present in the tag, only the other is inserted, with a value that maintains the image's proper aspect ratio. (For example, if a WIDTH that's half the true size of the image is present in the HTML, the added HEIGHT attribute will be half the true height; this solution mimics how modern browsers deal with this situation.)
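The aspect-ratio arithmetic described above can be sketched on its own. This is only an illustration: the class name AspectRatio and its two methods are hypothetical helpers, not part of the program in this section, and the 640x480 image is an assumed example.

```java
/** Hypothetical helper (not from the original program): derives a missing image dimension. */
final class AspectRatio {
    /** Height to pair with a width already declared in the tag, given the image's true size. */
    static int heightFor(int declaredWidth, int trueWidth, int trueHeight) {
        // Widen to long so the intermediate product cannot overflow an int
        return (int) ((long) declaredWidth * trueHeight / trueWidth);
    }

    /** Width to pair with a height already declared in the tag, given the image's true size. */
    static int widthFor(int declaredHeight, int trueWidth, int trueHeight) {
        return (int) ((long) declaredHeight * trueWidth / trueHeight);
    }

    public static void main(String[] args) {
        // An assumed 640x480 image declared at half its true width gets half its true height
        System.out.println(heightFor(320, 640, 480)); // prints 240
        System.out.println(widthFor(240, 640, 480));  // prints 320
    }
}
```

Multiplying before dividing keeps the integer result as precise as possible; dividing first would truncate the ratio too early.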

This example manually maintains a match pointer, as we did in the section starting on page 383. It makes use of regions (see page 384) and method chaining (see page 389) as well. Here's the code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Matcher for isolating <img> tags
    Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);

    // Matchers that isolate the SRC, WIDTH, and HEIGHT attributes within a tag
    // (with very naïve regexes)
    Matcher mSrc    = Pattern.compile("(?ix)\\bSRC   =(\\S+)").matcher(html);
    Matcher mWidth  = Pattern.compile("(?ix)\\bWIDTH =(\\S+)").matcher(html);
    Matcher mHeight = Pattern.compile("(?ix)\\bHEIGHT=(\\S+)").matcher(html);

    int imgMatchPointer = 0; // The first search begins at the start of the string
    while (mImg.find(imgMatchPointer))
    {
        imgMatchPointer = mImg.end(); // Next image search starts from where this one ended

        // Look for our attributes within the body of the just-found image tag
        boolean hasSrc    = mSrc.region(   mImg.start(1), mImg.end(1)).find();
        boolean hasHeight = mHeight.region(mImg.start(1), mImg.end(1)).find();
        boolean hasWidth  = mWidth.region( mImg.start(1), mImg.end(1)).find();

        // If we have a SRC attribute, but are missing WIDTH and/or HEIGHT ...
        if (hasSrc && (!hasWidth || !hasHeight))
        {
            java.awt.image.BufferedImage i = // this fetches the image
                javax.imageio.ImageIO.read(new java.net.URL(mSrc.group(1)));

            String size; // Will hold the missing WIDTH and/or HEIGHT attributes
            if (hasWidth)
                // We're told the width, so compute the height that maintains the proper aspect ratio
                size = "height='" + (int)(Integer.parseInt(mWidth.group(1)) * i.getHeight() / i.getWidth()) + "' ";
            else if (hasHeight)
                // We're told the height, so compute the width that maintains the proper aspect ratio
                size = "width='" + (int)(Integer.parseInt(mHeight.group(1)) * i.getWidth() / i.getHeight()) + "' ";
            else
                // We're told neither, so just insert the actual size
                size = "width='"  + i.getWidth()  + "' " +
                       "height='" + i.getHeight() + "' ";

            html.insert(mImg.start(1), size); // Update the HTML in place
            imgMatchPointer += size.length(); // Account for the new text in mImg's eyes
        }
    }
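To watch the match-pointer bookkeeping in isolation, here is a minimal sketch that needs no network access. It is a simplification, not the program above: the class InPlaceInsertDemo and its addSize method are hypothetical, only WIDTH is checked, and the inserted attribute text is passed in rather than computed from a fetched image.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo: inserts text into a StringBuilder while a Matcher is scanning it. */
final class InPlaceInsertDemo {
    /** Adds the given attribute text to every img tag that lacks a WIDTH attribute. */
    static String addSize(String input, String size) {
        StringBuilder html = new StringBuilder(input); // writable CharSequence
        Matcher mImg   = Pattern.compile("(?i)<IMG\\s+(.*?)/?>").matcher(html);
        Matcher mWidth = Pattern.compile("(?i)\\bWIDTH=(\\S+)").matcher(html);

        int pointer = 0; // where the next tag search begins
        while (mImg.find(pointer)) {
            pointer = mImg.end();          // resume after this tag
            int bodyStart = mImg.start(1); // read offsets before editing the text
            if (!mWidth.region(bodyStart, mImg.end(1)).find()) {
                html.insert(bodyStart, size); // everything after bodyStart shifts right
                pointer += size.length();     // so the resume point must shift as well
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(addSize("<p><img src=a.gif><img src=b.gif width=10></p>", "width='5' "));
        // prints <p><img width='5' src=a.gif><img src=b.gif width=10></p>
    }
}
```

Without the pointer adjustment, the next find would start inside the tag that was just lengthened rather than immediately after it.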

Although it's an instructive example, a few disclaimers are in order. Because the focus of the example is on in-place search and replace, I've kept some unrelated aspects of it simple by allowing it to make fairly naïve assumptions about the HTML it will be passed. For example, the regular expressions don't allow whitespace around the attribute's equal sign, nor quotes around the attribute's value. (See the Perl regex on page 202 for a real-world, Java-applicable approach to matching a tag attribute.) The program doesn't handle relative URLs, nor any ill-formatted URLs for that matter, as it doesn't handle any of the exceptions that the image-fetching code might throw.
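As a rough illustration of relaxing the first two assumptions, the sketch below accepts whitespace around the equal sign and single-quoted, double-quoted, or bare values. It is not the regex from page 202; the class TagAttribute and its valueOf method are hypothetical, and the pattern is still naïve in that it can be fooled by attribute-like text inside another attribute's quoted value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts an attribute value from the body of a tag. */
final class TagAttribute {
    /** Returns the value of the named attribute, or null if it is not present. */
    static String valueOf(String tagBody, String name) {
        Pattern p = Pattern.compile(
            "\\b" + Pattern.quote(name) + "\\s*=\\s*" // name, optional whitespace around '='
            + "(?:\"([^\"]*)\""                       // group 1: double-quoted value
            + "|'([^']*)'"                            // group 2: single-quoted value
            + "|([^\\s>\"']+))",                      // group 3: bare value
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tagBody);
        if (!m.find())
            return null;
        // Exactly one of the three alternatives took part in the match
        for (int g = 1; g <= 3; g++)
            if (m.group(g) != null)
                return m.group(g);
        return null;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("src=a.gif WIDTH = \"120\"", "width")); // prints 120
        System.out.println(valueOf("height='40' src=a.gif", "height"));    // prints 40
    }
}
```

Returning the value without its quotes means the result can be handed directly to Integer.parseInt, as the program above does with its captured group.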

Still, it's an interesting example illustrating a number of important concepts.



Mastering Regular Expressions
ISBN: 0596528124
Year: 2004
Pages: 113
