Recipe 13.9. Marking Up a Web Page | PHP Cookbook: Solutions and Examples for PHP Programmers

13.9.1. Problem

You want to display a web page (for example, a search result) with certain words highlighted.

13.9.2. Solution

Build an array of replacements, one for each word you want to highlight. Then split the page into HTML elements and the text between them, and apply the replacements only to the text between elements. Example 13-42 highlights the words in $words wherever they appear as text in the HTML in $body.

Example 13-42. Marking up a web page

<?php
$body = '
<p>I like pickles and herring.</p>
<a href="pickle.php"><img src="/books/3/131/1/html/2/pickle.jpg"/>A pickle picture</a>
I have a herringbone-patterned toaster cozy.
<herring>Herring is not a real HTML element!</herring>
';
$words = array('pickle','herring');
$replacements = array();
foreach ($words as $i => $word) {
    $replacements[] = "<span class='word-$i'>$word</span>";
}
// Split up the page into chunks delimited by a
// reasonable approximation of what an HTML element
// looks like.
$parts = preg_split("{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}",
                    $body,
                    -1,  // Unlimited number of chunks
                    PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $i => $part) {
    // Skip if this part is an HTML element
    if (isset($part[0]) && ($part[0] == '<')) { continue; }
    // Wrap the words with <span/>s
    $parts[$i] = str_replace($words, $replacements, $part);
}
// Reconstruct the body
$body = implode('',$parts);
print $body;
?>

13.9.3. Discussion

Example 13-42 prints:

<p>I like <span class='word-0'>pickle</span>s and <span class='word-1'>herring</span>.</p>
<a href="pickle.php"><img src="/books/3/131/1/html/2/pickle.jpg"/>A <span class='word-0'>pickle</span> picture</a>
I have a <span class='word-1'>herring</span>bone-patterned toaster cozy.
<herring>Herring is not a real HTML element!</herring>

Each of the words in $words (pickle and herring) has been wrapped with a <span/> that has a specific class attribute. Use a CSS stylesheet to attach particular display attributes to these classes, such as a bright yellow background or a border.
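As a rough sketch of that last step, the stylesheet rules can be generated from the same $words array so that the class names stay in sync with the markup. The helper name build_highlight_css( ) and the color values are illustrative assumptions, not part of the recipe.

```php
<?php
// Hypothetical helper: builds one CSS rule per highlighted word,
// using the same word-N class names as Example 13-42.
function build_highlight_css(array $words, array $colors)
{
    $rules = array();
    foreach ($words as $i => $word) {
        // Cycle through the colors if there are more words than colors.
        $color = $colors[$i % count($colors)];
        $rules[] = ".word-$i { background-color: $color; }";
    }
    return implode("\n", $rules);
}

$words = array('pickle', 'herring');
$css = build_highlight_css($words, array('yellow', 'lightgreen'));
print "<style>\n$css\n</style>\n";
```

This assumes $words is a plain list with keys 0, 1, 2, and so on, as it is in Example 13-42.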

The regular expression in Example 13-42 chops up $body into a series of chunks delimited by HTML elements. Because PREG_SPLIT_DELIM_CAPTURE is set, the elements themselves are returned as chunks too, which is what lets the loop recognize and skip them. Replacements are therefore applied only to the text between elements, and element names and attribute values that happen to contain a search term are left alone. The regular expression does a good job of matching HTML elements, but badly malformed markup with mismatched or unescaped quotes can confuse it.
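To see what the split produces, here is a small sketch that runs the same pattern over a short, made-up fragment. The input string is an assumption chosen so that an attribute value contains both a search word and a > character.

```php
<?php
// Same element-matching pattern as in Example 13-42.
$pattern = "{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}";

// Hypothetical input: the title attribute contains a ">" and a search word.
$html = '<a title="a > pickle">pickle</a> jar';

$parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $i => $part) {
    // Same test the recipe uses to tell elements from text.
    $kind = (isset($part[0]) && $part[0] == '<') ? 'element' : 'text';
    print "$i $kind [$part]\n";
}
```

The result has five chunks: an empty string (the text before the first element), the whole opening tag including its quoted >, the text pickle, the closing tag, and the trailing text. The empty first chunk is why the loop checks isset($part[0]) before looking at the first character.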

Because str_replace( ) is case sensitive, only strings that exactly match words in $words are replaced. The last Herring in Example 13-42 doesn't get highlighted because it begins with a capital letter. To do case-insensitive matching, we need to switch from str_replace( ) to regular expressions. (We can't use str_ireplace( ) because the replacement has to preserve the case of what matched.) Example 13-43 shows the altered code that uses regular expressions to do the replacement.

Example 13-43. Marking up a web page with regular expressions

<?php
$body = '
<p>I like pickles and herring.</p>
<a href="pickle.php"><img src="/books/3/131/1/html/2/pickle.jpg"/>A pickle picture</a>
I have a herringbone-patterned toaster cozy.
<herring>Herring is not a real HTML element!</herring>
';
$words = array('pickle','herring');
$patterns = array();
$replacements = array();
foreach ($words as $i => $word) {
    $patterns[] = '/' . preg_quote($word) .'/i';
    $replacements[] = "<span class='word-$i'>\\0</span>";
}
// Split up the page into chunks delimited by a
// reasonable approximation of what an HTML element
// looks like.
$parts = preg_split("{(<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>)}",
                    $body,
                    -1,  // Unlimited number of chunks
                    PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $i => $part) {
    // Skip if this part is an HTML element
    if (isset($part[0]) && ($part[0] == '<')) { continue; }
    // Wrap the words with <span/>s
    $parts[$i] = preg_replace($patterns, $replacements, $part);
}
// Reconstruct the body
$body = implode('',$parts);
print $body;
?>

The two differences in Example 13-43 are that it builds a $patterns array in the loop at the top and that it calls preg_replace( ) (with the $patterns array) instead of str_replace( ). The i at the end of each element in $patterns makes the match case insensitive. The \\0 in the replacement inserts the text that the pattern actually matched, so the original capitalization is preserved.
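The difference between a fixed replacement string and \\0 is easiest to see side by side. This is a minimal sketch on a made-up sentence; the variable names are illustrative only.

```php
<?php
$text = 'Herring and herring';

// str_ireplace() matches either case but always inserts the fixed
// replacement string, so "Herring" comes out as "herring".
$fixed = str_ireplace('herring', "<span class='word-1'>herring</span>", $text);

// preg_replace() with \0 re-inserts whatever text actually matched,
// so the capital H survives.
$kept = preg_replace('/herring/i', "<span class='word-1'>\\0</span>", $text);

print "$fixed\n$kept\n";
```

In $fixed both words end up lowercase, while $kept keeps Herring capitalized inside its span.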

Switching to regular expressions also makes it easy to prevent substring matching. In both Example 13-42 and Example 13-43, the herring in herringbone gets highlighted. To prevent this, change $patterns[] = '/' . preg_quote($word) .'/i'; in Example 13-43 to $patterns[] = '/\b' . preg_quote($word) .'\b/i';. Each \b in the pattern matches only at a word boundary, so preg_replace( ) highlights a word only when it stands on its own. Note that this also stops pickle from being highlighted inside pickles.
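A short sketch of the word-boundary version, run on a made-up string, shows which words are affected. One addition that is not in the recipe: preg_quote( ) is given '/' as its second argument so that a search word containing a slash cannot break the pattern delimiter.

```php
<?php
$words = array('pickle', 'herring');
$patterns = array();
$replacements = array();
foreach ($words as $i => $word) {
    // Passing '/' as the second argument also escapes the delimiter
    // (an addition to Example 13-43, not part of the original code).
    $patterns[] = '/\b' . preg_quote($word, '/') . '\b/i';
    $replacements[] = "<span class='word-$i'>\\0</span>";
}

// Hypothetical test text mixing whole words and longer words.
$text = 'Herring, herringbone, pickle, pickles';
$result = preg_replace($patterns, $replacements, $text);
print $result . "\n";
```

Only Herring and pickle are wrapped; herringbone and pickles are left untouched because the search word is followed by another word character there.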

13.9.4. See Also

Documentation on str_replace( ) at http://www.php.net/str_replace, on str_ireplace( ) at http://www.php.net/str_ireplace, on preg_replace( ) at http://www.php.net/preg_replace, and on preg_split( ) at http://www.php.net/preg_split.