The backend infrastructure of most Web sites today is a hodgepodge of code in a number of languages. Typically there are static HTML files, graphic images, dynamic scripted templates, DHTML scripts for client-side interactivity, business objects, databases, legacy systems, middleware adapters, and content management systems, all thrown together to form a collection of interlinked Web pages.
In any Web system most of these pages usually have a number of common elements: content, navigation components, and graphic design (branding). Across multiple pages of a single Web site, there is usually a somewhat uniform application of design elements. Navigation and page layout are uniformly structured from page to page and section to section, but the actual menu choices tend to change depending on your current location. Content changes from page to page, and often a dynamically generated page will be different every time it is loaded.
When you build Web sites in HTML, you risk an eternity of "painful" code maintenance.
This dynamism is usually produced by scripting templates, written in any number of programming languages and environments such as JSP, ASP, PHP, and Cold Fusion. With a traditional Web architecture, these scripts call on objects in the business logic tier to retrieve content (from files, content management systems, legacy system middleware adapters, or databases). Once they have retrieved the data to be presented, they format it for display in a Web page by churning through it and wrapping HTML tags around the various fields. This is an effective process that has worked well for a number of years. Nevertheless, it is not without its drawbacks.
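To make the pattern concrete, here is a minimal sketch of the kind of template logic described above, written as plain Java rather than JSP. The `ProductTableTemplate` class, the `Product` type, and the markup are all hypothetical and chosen only for illustration; a real template would fetch its data from the business logic tier.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical example: class names, fields, and markup are illustrative only.
class ProductTableTemplate {

    // Stand-in for a business-tier object that a template would retrieve.
    static final class Product {
        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    // Churns through the data and wraps HTML tags around each field.
    // Note that column order, bolding, and alignment all live in Java code.
    static String render(List<Product> products) {
        StringBuilder html = new StringBuilder();
        html.append("<table border=\"1\">\n");
        html.append("<tr><th>Name</th><th>Price</th></tr>\n");
        for (Product p : products) {
            html.append("<tr><td><b>").append(escape(p.name)).append("</b></td>")
                .append("<td align=\"right\">")
                .append(String.format(Locale.US, "%.2f", p.price))
                .append("</td></tr>\n");
        }
        html.append("</table>\n");
        return html.toString();
    }

    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product("Widget", 9.5),
            new Product("Nuts & Bolts", 12.0));
        System.out.print(render(products));
    }
}
```

Even in this small case, data access, escaping, and visual layout sit in the same method, so any change to the look of the table means editing and recompiling program code.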
The first problem with the approach just described is that a programming language such as Java, VB, or Perl is used to specify formatting, design, and layout. This means that programmers who aren't good at or interested in graphic design spend a lot of time on frustrating and annoying details of HTML page layout. It also means that visual-design-oriented developers who don't consider themselves hardcore programmers need to learn a scripting language (or several if they want to develop in more than one environment) and muck around with data access and intermediary APIs. A number of talented developers manage to enjoy and excel at both application programming and graphic design, but they are few and far between.
For programmers the issue isn't that page layout in HTML is hard; it is that it is time-consuming work focused on visual details that most programmers couldn't care less about. For them, nothing is more frustrating and disheartening than completing an important piece of functionality with an elegant, scalable, robust, and flawlessly working code structure, only to have some nitwit from the customer's marketing department return with a five-page list of "bugs" such as "Subsection heading font too small," "Buttons should be aligned with right edge of paragraph," "Table columns in wrong order," and the dreaded "Page looks too cluttered." Not only is the work tedious and dull but it also represents time that a highly paid programmer could spend on more complex and important tasks.
These problems are exacerbated when it is time for a visual redesign of a site. If all of the HTML formatting is intermingled with programming code in many different modules, it becomes a daunting task to make the necessary changes. Usually it is possible to do this only by bringing programmers back into the project because they are the only ones who can decipher the code.
Yet another problem is that effective automated unit testing is at best limited on HTML pages. The final output of a JSP script is essentially a long string, which in a browser is parsed into tags, text, links, and scripts. But outside of the browser, you need an HTML parser to look inside the string to verify the contents of the Web page. Generically parsing Web pages is tricky because many contain sloppy HTML and errors such as improperly nested tags. Many browsers do the best they can with bad HTML and usually display something on the screen, albeit in sometimes unpredictable ways.
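The difficulty can be illustrated with a small sketch. It assumes a strict XML parser (the JAXP `DocumentBuilder` that ships with the JDK) as a stand-in for "looking inside the string"; the `WellFormedCheck` class and the sample markup are hypothetical. A browser would render both fragments, but the strict parser accepts only the properly nested one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether page output can be read by an XML parser.
class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Improper nesting, unclosed tags, and similar errors end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<p><b>bold <i>both</i></b></p>";
        String sloppy = "<p><b>bold <i>both</b> italic</i></p>"; // tags overlap
        System.out.println("clean:  " + isWellFormed(clean));
        System.out.println("sloppy: " + isWellFormed(sloppy));
    }
}
```

A test harness built on a strict parser therefore cannot even open a page that contains the kind of markup errors browsers silently tolerate.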
HttpUnit does a decent job of addressing the testing problem. However, the API gives you only limited functionality to test tables and forms, providing no advanced testing of the page structure and layout. You can use HttpUnit to convert a Web page into XHTML (an XML-compliant variation of HTML) and process it as an XML document, but in order to do that HttpUnit must run your original page through JTidy.
JTidy is a Java version of Tidy, a module that cleans up messy HTML and optionally outputs XHTML. It often makes guesses as to what was meant when moderate HTML problems are encountered, and it does as good a job as can be expected under such circumstances. Tidy is a great thing to have when you need it, but the bottom line is that it changes your original document, so that tests you execute on the cleaned-up XHTML do not actually reflect the original document. This could have the unwanted effect of hiding errors from your test script. And, besides, if you are going to use an XML parser for unit testing on your final HTML output, why not simply create your final output as XHTML in the first place? We will show how much more efficiently all this can be done with XML and XSLT.
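As a rough sketch of what testing well-formed output directly could look like, the following uses the standard `javax.xml.xpath` API to pull values out of a page with no cleanup step in between. The `XhtmlPageCheck` class and the sample page are hypothetical, and the page deliberately omits the DOCTYPE and the XHTML namespace declaration to keep the XPath expressions short; real XHTML would need a namespace-aware setup.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression against page output.
class XhtmlPageCheck {

    // Simplified sample page (assumption: no DOCTYPE, no namespace).
    static final String PAGE =
        "<html><head><title>Catalog</title></head><body>"
        + "<h1>Products</h1>"
        + "<table id=\"items\"><tr><td>Widget</td><td>9.50</td></tr></table>"
        + "</body></html>";

    // Returns the string value of the first node the expression selects.
    static String select(String xhtml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xhtml)));
        } catch (XPathExpressionException e) {
            // Raised for a bad expression or for markup that is not well formed.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(PAGE, "/html/head/title"));
        System.out.println(select(PAGE, "/html/body/table[@id='items']/tr[1]/td[1]"));
    }
}
```

Because the test reads the same bytes the server produced, a failure here points at the page itself rather than at a repaired copy of it.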
A corollary to this is the fact that HTML Web pages contain a mess of tags, text, and script code. This often makes it difficult to locate various page elements in the source code. Writing scripted unit tests to make sure that various elements in a design-rich page are present, in the right place, laid out properly, and contain the right content is not at all easy. There is simply no practical, standard, programmatic way to do it.
HTML Web sites are tangled messes of interfaces, content, and code.