Problem

You need to turn complex content management system, blog, or shopping cart URLs into easy-to-remember URLs.

Solution

Use mod_rewrite rules in an .htaccess file to invisibly turn simple URLs into complex query strings that return dynamic pages to the visitor's browser.

For example, an e-commerce site that sells men's and women's clothes might offer a variety of men's shoes, such as boots, oxfords, sandals, and loafers. A URL for the list of loafers might look like this:
Using rewrite rules, you can tidy up the URL to something like this:
A rewrite rule in the .htaccess file that you create or modify in the /store directory takes care of converting the clean URL to the more complex query string that the store template (list.php) needs to generate the list of loafers from the store database. Here's the code for the rewrite rule:

    RewriteEngine On
    Options +FollowSymLinks
    RewriteRule ^(.*)/(.*)/(.*)/$ /store/list.php?type=$1&cat=$2&subcat=$3

Assuming the mod_rewrite module has been compiled into your installation of Apache (typically, it has), the first line (RewriteEngine On) prepares the module for the rewrite rule or rules to follow. The second line (Options +FollowSymLinks) can be left out if it's already in the main Apache configuration file (typically, httpd.conf).

The third line contains the rule. Three consecutive wildcard search patterns, each followed by a slash, (.*)/, match the structure of the simple URL, and the text captured by each pattern is passed to list.php as $1, $2, and $3. The patterns would match other clean URLs, too, such as …womens/skirts/mini/ or …mens/hats/stetsons/. Note that the URL has to end with a slash for a successful match: the search pattern ends with a slash followed by $, which marks the end of the requested path.

Discussion

Investing the time to create rewrite rules that turn simple URLs into complex, behind-the-scenes queries pays off in several ways for you and your site's visitors.

First off, because the resultant URLs are generally shorter and follow the "directory-slash-directory" model of URL construction, they're easier to print in offline materials, recite over the phone, and remember. Web surfers, especially novices, shouldn't be expected to remember the arcane syntax of complex query strings involving question marks, ampersands, and equals signs.

Clean web page addresses also encourage power users to go "URL fishing." If …mens/shoes/loafers/ works, then sophisticated visitors to your site may conclude that …mens/shoes/boots/, …/mens/shoes/sandals/, and a variety of other permutations will work, too.
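URL fishing works because the rule's pattern is permissive: any three slash-separated segments match. The following is a minimal sketch of a stricter variant, assuming the same /store/.htaccess file and list.php template used in the Solution. The ([^/]+) segments and the [L,QSA] flags are additions made here for illustration and are not part of the original rule.

```apache
# /store/.htaccess -- stricter variant of the Solution's rule (illustrative assumption)
RewriteEngine On
Options +FollowSymLinks

# ([^/]+) matches exactly one non-empty path segment containing no slash,
# so mens/shoes/boots/ and mens/shoes/sandals/ match, while a path with
# four or more segments does not.
# L   = stop processing further rewrite rules once this one matches
# QSA = append any query string the visitor supplied to the rewritten one
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/$ /store/list.php?type=$1&cat=$2&subcat=$3 [L,QSA]
```

With either version of the rule, a request for …mens/shoes/boots/ is handed to list.php with type=mens, cat=shoes, and subcat=boots. Whether that page lists any products depends on what the store database holds for that combination.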
This approach to improving your site's user-friendliness can be applied to as many "aliases" to the various pages of your site as you can think of, both static and dynamic. For example, if you keep your news releases in a directory called /news, set up rewrite rules (or a redirect) for /pr and /press. That way, you'll always have an answer for web surfers who "guess" their way to your site.
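As a rough sketch of the /pr and /press aliases, assuming an .htaccess file in the site's document root with mod_rewrite available there, a single rule could map both names onto /news. The pattern and the choice of an internal rewrite over a visible redirect are illustrative assumptions.

```apache
# .htaccess in the document root (assumed location)
RewriteEngine On

# Serve /pr, /pr/, /press/2006/ and so on from the /news directory.
# (/(.*))? makes the rest of the path optional; $3 is whatever followed the slash.
RewriteRule ^(pr|press)(/(.*))?$ /news/$3 [L]

# To send the browser to the /news address instead (a visible redirect),
# replace [L] with [R=301,L].
```

Anchoring the pattern with ^ and $ keeps unrelated paths that merely begin with "pr", such as a hypothetical /products directory, from being caught by the alias.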
Clean URLs also make your life as a web designer easier as your site grows and changes. By providing a layer of abstraction between your visitors and your site's backend technology, clean URLs allow you to rearrange and even re-engineer your site without too much trouble. For example, you might want to move your store's shopping cart from PHP to a proprietary ecommerce platform that ties into your point-of-sale and inventory system. That probably means different server-side script names (possibly not ending in .php) and different query strings to get the same product lists. But with a few changes to the rewrite rules, the clean URLs in your site's links and visitors' bookmarks can stay the same.

It might seem that clean URLs generated with rewrite rules can do everything but slice your crusty loaf of artisan bread. One area where they fall short, though, is in improving your site's search engine "indexability." That's because dynamic pages are, ultimately, still generated by the query string that the clean URL hides.

The pages of thousands, if not millions, of sites lie beyond the reach of search engines, an area dubbed the deep Web. Search engines were originally designed to index static pages found through links. A dynamic page on your site accessible through a hardcoded link in your HTML code, whether it's clean or complex, will be found using this method. But the automated spiders and robots crawling your site are not designed to uncover the bulk of your dynamic content by guessing at other permutations of the same query string, much less enter terms in a site search form or negotiate a subscribers-only login form. Google, Yahoo!, and others are at work on the problem of indexing this hidden content, which may prove to be up to one hundred times larger than what's available through search engines today.

See Also

For other techniques that use Apache's rewriting engine, see Recipes 1.6 and 9.1.
For more on the issues surrounding permanent web page addresses, read World Wide Web inventor Tim Berners-Lee's article "Cool URIs don't change" at http://www.w3.org/Provider/Style/URI. For more about the problem of the deep Web, see http://deepweb.com.