Web Technology Overview


Developing a Web site might seem straightforward, or at least easier than developing a full-blown cross-platform networked application. For better or worse, Web technology has evolved to the point that developing a Web application is almost as complex as developing other networked services. The following passage is from the documentation for a popular open-source Web framework, Apache Struts:

The core of the Struts framework is a flexible control layer based on standard technologies like Java Servlets, JavaBeans, ResourceBundles, and XML, as well as various Jakarta Commons packages. Struts encourages application architectures based on the Model 2 approach, a variation of the classic Model-View-Controller (MVC) design paradigm.

Struts provides its own Controller component and integrates with other technologies to provide the Model and the View. For the Model, Struts can interact with standard data access technologies, like JDBC and EJB, as well as most any third-party packages, like Hibernate, iBATIS, or Object Relational Bridge. For the View, Struts works well with JavaServer Pages, including JSTL and JSF, as well as Velocity Templates, XSLT, and other presentation systems.

If you understand all that, you can probably skip the first half of this chapter. If you don't, this chapter and the next cover enough ground that you'll be able to at least approach it. Struts is far from the only Web framework that is this complex and this hard to approach. The point is that you need to consider these details when reviewing enterprise-class Web applications. You need to budget a good deal of preparation time or find a strategy for dealing with unfamiliar and complex technology. The remainder of this section provides an overview of the general principles and common elements of the most popular Web technologies.

The Basics

The World Wide Web (WWW) is a distributed global network of servers that publish documents over various protocols, such as Gopher, FTP, and HTTP. A document, or resource, is identified by a Uniform Resource Identifier (URI), such as http://www.neohapsis.com/index.html. This URI is the identifier for the HTML document located on the www.neohapsis.com Web server at /index.html, which can be retrieved via an HTTP request.
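To make the anatomy of a URI concrete, here is a minimal Python sketch that splits the example URI into the pieces a client needs. The function name describe_uri is made up for this illustration, and the default ports are the conventional ones for HTTP and HTTPS.

```python
from urllib.parse import urlsplit

# Conventional default ports; a URI can override them explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_uri(uri):
    """Break a URI into the pieces a Web client needs to retrieve it."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,                       # protocol to speak
        "host": parts.hostname,                       # server to contact
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",                    # resource on that server
    }


print(describe_uri("http://www.neohapsis.com/index.html"))
```

The scheme selects the protocol, the host selects the server, and the path is what the client asks that server for.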

Hypertext Markup Language (HTML) is a simple language for marking up text documents with tags that identify semantic structure and visual presentation. HTML is a Standard Generalized Markup Language (SGML) application; that is, a markup language defined in SGML. A key concept in HTML is the hyperlink, which is a reference to another resource, possibly on another server (given as a URI). One of the defining characteristics of the Web is that it's composed largely of hypertext: interconnected documents that reference each other via hyperlinks.

Hypertext Transfer Protocol (HTTP) is a simple protocol that Web servers use to make documents available to clients (discussed in more detail in "HTTP" later in this chapter). A Web client, or Web browser, connects to a Web server by using a TCP connection and issues a simple request for a URI path, such as /index.html. The server then returns this document over the connection or notifies the client if there has been an error condition. Web servers typically listen on port 80. SSL-wrapped HTTP (known as HTTPS) is typically available on port 443.
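The request-and-response exchange described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full HTTP client: the helper names are invented for the example, the request carries only the headers a typical HTTP/1.1 server needs, and fetch is defined but not called because it requires network access.

```python
import socket


def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request for the given path."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
        "Connection: close",   # ask the server to close when it is done
        "",
        "",                    # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Return (version, status code, reason) from a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    fields = status_line.split(" ", 2)
    reason = fields[2] if len(fields) > 2 else ""
    return fields[0], int(fields[1]), reason


def fetch(host, path="/", port=80, timeout=5.0):
    """Open a TCP connection, send the request, and read until close."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_get_request("www.neohapsis.com", "/index.html").decode("ascii"))
```

A status code in the 200 range means the document follows the headers; codes such as 404 are how the server reports an error condition.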

Static Content

The most straightforward request a Web server can broker is for a file sitting on its local file system or in memory. The Web server simply retrieves the file and sends it to the network as the HTTP response. This process is known as serving static content because the document is the same for every user every time it's served.
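As a rough sketch of what serving static content involves, the following Python function maps a request path onto a document root and returns the file's bytes unchanged. The function name and the tuple it returns are assumptions made for this example, and the containment check is a simplification of what a real Web server does.

```python
import tempfile
from pathlib import Path


def serve_static(doc_root, url_path):
    """Return (status, body) for a request path under the document root."""
    root = Path(doc_root).resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse anything that resolves outside the document root or is not a file.
    if root not in target.parents or not target.is_file():
        return 404, b"Not Found"
    # The same bytes go to every user on every request.
    return 200, target.read_bytes()


# Demonstration with a throwaway document root.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "index.html").write_bytes(b"<html><body>hi!</body></html>")
print(serve_static(demo_root, "/index.html"))
print(serve_static(demo_root, "/missing.html"))
```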

Static content is great for data that doesn't change often, like your Star Trek Web site or pictures of your extensive collection of potted meat products. However, more complex Web sites need to be able to control the Web server's output programmatically. The Web server needs to create content on the fly that reacts to users' actions so that it can exhibit the behavior of an application. Naturally, there are myriad ways a programmer can interface with a Web server to create this dynamic content.

CGI

Common Gateway Interface (CGI) is one of the oldest mechanisms for creating dynamic Web content. A CGI program simply takes input from the Web server via environment variables, the command line, and standard input. This input describes the request the user made to the Web server. The CGI program performs some processing on this input, and then writes its output (usually an HTML document) to standard output. When a Web server receives a request for a CGI program, it simply forks and runs that program as a new process, and then relays the program's output back to the user.

CGI programs can be written in almost any language, as the only real requirement is the ability to write to STDOUT. Perl is a popular choice because of its string manipulation features, as are Python and Ruby. Here is a bare-bones CGI program in Perl:

#!/usr/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<html><body>hi!</body></html>\r\n";


The primary disadvantage of the CGI model is that it requires a separate process for each Web request, which means it isn't well suited to handling heavy traffic. Modified interfaces are available, such as FastCGI, that allow a more lightweight request-handling process, but CGI-style programs are typically used for low-traffic applications.
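The server's side of the CGI exchange can be sketched in Python as follows. This is an illustration under assumptions, not how any particular Web server implements CGI: the stand-in CGI program is itself a small Python script passed on the command line, and run_cgi is a name invented for the example. The environment variable names are the standard CGI ones.

```python
import os
import subprocess
import sys

# Stand-in CGI program (hypothetical): it reads the request from its
# environment and writes headers plus an HTML body to standard output.
CGI_PROGRAM = r'''
import html, os, sys
query = html.escape(os.environ.get("QUERY_STRING", ""))
sys.stdout.write("Content-type: text/html\r\n\r\n")
sys.stdout.write("<html><body>query was: " + query + "</body></html>\r\n")
'''


def run_cgi(query_string, body=b""):
    """Run the CGI program as a new process and return its raw output."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "POST" if body else "GET",
        "QUERY_STRING": query_string,
        "CONTENT_LENGTH": str(len(body)),
    })
    # One new process per request: this is the cost the CGI model pays.
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        input=body, env=env, capture_output=True, check=True,
    )
    return result.stdout   # relayed to the client by the Web server


print(run_cgi("a=1").decode("ascii"))
```

Every call to run_cgi starts and tears down an interpreter, which is why interfaces such as FastCGI keep a process alive across requests.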

Web Server APIs

Most Web servers provide an API that enables developers to customize the server's behavior. Developers use these APIs by building a shared library or dynamic link library (DLL) in C or C++ that's loaded into the Web server at runtime. These Web server extensions can be used for creating dynamic content, as Web requests can be passed to developer-supplied functions that process them and generate responses. These extensions also allow global modification of the server, so developers can perform analysis or processing of every request the server handles. These APIs allow far more customization than an interface such as CGI because Web developers can alter the behavior of the Web server at a very granular level by manipulating shared data structures and using control APIs and callbacks. Here are the common interfaces:

  • Internet Server Application Programming Interface (ISAPI): Microsoft provides this API for extending the functionality of its Internet Information Services (IIS) Web server. ISAPI filters and DLLs are often found in older Microsoft-based Web applications, particularly in Web interfaces to commercial software packages.

  • Netscape Server Application Programming Interface (NSAPI): Netscape's Web server control API can be used to extend Netscape's line of servers and Web proxies. It's occasionally used in older enterprise applications for global input validation as a first line of defense.

  • Apache API: This API supports extension of the Apache open-source Web server via modules and filters.

Many of the other Web programming technologies discussed in this chapter are implemented on top of these Web server APIs. Modern Web servers are usually constructed in an open, modular fashion. Therefore, these extension APIs can be used to make changes commensurate with what you'd expect from full source-code-based modifications of the Web server.
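The callback style these APIs share can be illustrated with a toy model in Python. Nothing here corresponds to the real ISAPI, NSAPI, or Apache interfaces, which are C and C++ APIs; TinyServer, add_filter, and add_handler are hypothetical names used only to show how filters see every request while handlers generate content for particular paths.

```python
class TinyServer:
    """Toy server core that exposes two extension points (hypothetical)."""

    def __init__(self):
        self._filters = []    # run on every request, may reject it
        self._handlers = {}   # path prefix -> content generator

    def add_filter(self, func):
        self._filters.append(func)

    def add_handler(self, prefix, func):
        self._handlers[prefix] = func

    def handle(self, request):
        # Global processing: every filter inspects every request.
        for check in self._filters:
            rejection = check(request)
            if rejection is not None:
                return "400 " + rejection
        # Dynamic content: the first matching handler builds the response.
        for prefix, handler in self._handlers.items():
            if request["path"].startswith(prefix):
                return "200 " + handler(request)
        return "404 Not Found"


def block_traversal(request):
    """Example filter: crude global input validation."""
    return "Bad Request" if ".." in request["path"] else None


server = TinyServer()
server.add_filter(block_traversal)
server.add_handler("/hello", lambda request: "hello, " + request["query"])

print(server.handle({"path": "/hello", "query": "Zoe"}))
print(server.handle({"path": "/hello/../secret", "query": ""}))
```

The filter plays the role described for NSAPI above: a single piece of code that validates input before any application logic runs.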

Server-Side Includes

A Web server doesn't examine a typical static HTML document when presenting it to a Web browser. The server simply reads the document from memory or disk and sends it out over the network without looking at the document's contents. Several technologies are based on slightly altering this design so that the Web server inspects and processes the document while it serves it to the client. These technologies range in complexity from simple directives to the Web server, to full programming language interpreters embedded in the Web server.

The simplest and oldest form of server-side document processing is server-side includes (SSIs), which are specially formatted tags placed in HTML pages. These tags are simple directives to the Web server that are followed as a document is presented to a user. As the Web server outputs the document, it pulls out the SSI tags and performs the appropriate actions. These tags provide basic functionality and can be used to create simple dynamic content. Most Web servers support them in some fashion. Take a look at a few examples of SSIs. The following command prints the value of the Web server variable DOCUMENT_NAME, which is the name of the requested document:

<p>The current page is <!--#echo var="DOCUMENT_NAME" --></p>


The following SSI directs the server to retrieve the file /footer.html and replace the #include tag with the contents of that file:

<!--#include virtual="/footer.html" -->


When the Web server parses the following tag, it runs the ls command and replaces the #exec tag with its results:

<!--#exec cmd="ls" -->


If you're a security reviewer, SSI functionality should make your ears perk up a little. You learn more about some handling issues with SSI in "Programmatic SSI" later in this chapter.
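To show how a server might expand these tags while it outputs a document, here is a minimal Python sketch. It is an assumption-laden simplification: it handles only the #echo and #include forms shown above, it takes variables and includable files from in-memory dictionaries rather than the server environment and file system, and it deliberately leaves #exec unexpanded.

```python
import re

# Matches tags of the form <!--#directive attribute="value" -->
SSI_TAG = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def process_ssi(document, variables, files):
    """Replace supported SSI tags in a document with their results."""

    def expand(match):
        directive, attribute, value = match.groups()
        if directive == "echo" and attribute == "var":
            return variables.get(value, "(none)")
        if directive == "include" and attribute == "virtual":
            return files.get(value, "[include failed]")
        # Unsupported directives, including #exec, are passed through as-is.
        return match.group(0)

    return SSI_TAG.sub(expand, document)


page = (
    '<p>The current page is <!--#echo var="DOCUMENT_NAME" --></p>\n'
    '<!--#include virtual="/footer.html" -->'
)
print(process_ssi(
    page,
    {"DOCUMENT_NAME": "index.shtml"},
    {"/footer.html": "<p>footer</p>"},
))
```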

Server-Side Transformation

Storing the content of a Web site in a format other than HTML is often advantageous. This content might be generated by another program or tool in a common format such as XML, or it might reside on a live resource, such as a database server. Web developers can use server-side parsing technologies to instruct the Web server to automatically transform content into HTML on the fly. These technologies are more involved than server-side includes, but they aren't as sophisticated as the more popular full server-side scripting implementations.

Extensible Stylesheet Language Transformations (XSLT) is a general language that describes how to turn one XML document into another XML document. Web developers can use XSLT to tell a Web server how to transform an XML document containing a page's content into an HTML document that's presented to users. Say you have the following simple XML document describing a person:

<person>
    <name>Zoe</name>
    <age>1</age>
</person>


An XSLT style sheet that describes how to turn this XML document into HTML could look something like this:

<xsl:stylesheet version = '1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
    <html>
        <body>
            <p>Name: <xsl:value-of select="person/name"/></p>
            <p>Age: <xsl:value-of select="person/age"/></p>
        </body>
    </html>
</xsl:template>
</xsl:stylesheet>


The result of transforming the XML content into HTML is this document:

<html>
<body>
<p>Name: Zoe</p>
<p>Age: 1</p>
</body>
</html>


Internet Database Connection (IDC) is an older, now unsupported, Microsoft Web programming technology for binding an HTML page to a data source (such as a database) and populating fields in the page with dynamic data. It has strong similarities to XSLT. Web developers create a template, known as an .htx file, which is basically an HTML document with special tags that indicate where data from the database should be inserted. They then create an .idc file that tells the Web server which template file to use and what database query to run to get the values needed to fill in the template.
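The template-plus-query idea behind IDC can be approximated in Python. This sketch is not IDC itself: the marker syntax is only loosely modeled on .htx tags and should be treated as an assumption, the render function is invented for the example, and an in-memory SQLite database stands in for the data source.

```python
import re
import sqlite3
from html import escape

# Template in the spirit of an .htx file (marker syntax is an assumption):
# the detail section repeats once per row, and <%column%> markers are
# replaced with that row's values.
TEMPLATE = """<html><body>
<%begindetail%><p><%name%> is <%age%></p>
<%enddetail%></body></html>"""

# The query plays the role of the .idc file's database statement.
QUERY = "SELECT name, age FROM person ORDER BY name"


def render(template, rows):
    """Fill the repeating detail section of the template from query rows."""

    def fill(section, row):
        return re.sub(
            r"<%(\w+)%>",
            lambda m: escape(str(row[m.group(1)])),
            section,
        )

    def repeat(match):
        return "".join(fill(match.group(1), row) for row in rows)

    return re.sub(
        r"<%begindetail%>(.*?)<%enddetail%>", repeat, template, flags=re.S
    )


db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row          # lets rows be indexed by column name
db.execute("CREATE TABLE person (name TEXT, age INTEGER)")
db.execute("INSERT INTO person VALUES ('Zoe', 1)")
page = render(TEMPLATE, db.execute(QUERY).fetchall())
print(page)
```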

Server-Side Scripting

Server-side scripting technology is essentially server-side document processing taken to the next level. Instead of embedding simple directives or providing transformation templates, server-side scripting technologies enable Web developers to embed actual program code in HTML documents. When the Web server encounters these embedded programs, it runs them through an internal program interpreter. This model is popular for small- to medium-scale Web development because it offers good performance, and Web sites that use it are typically simple to develop. Here are the popular server-side scripting technologies:

  • PHP: Hypertext Preprocessor (PHP): Because PHP is a recursive acronym, you can probably guess that it's a UNIX-oriented, open-source technology. It's currently a popular language for Web development, especially for small to medium applications. PHP is a scripting language designed from the ground up to be embedded in HTML files and interpreted by a Web server. It's a fairly easy language to pick up because it has much overlap with Perl, C, and Java.

  • Active Server Pages (ASP): ASP is Microsoft's popular server-side scripting technology. ASP pages can contain code written in a variety of languages, although most developers use VBScript or JScript (Microsoft's JavaScript implementation). It's also relatively easy to develop for because the ASP framework is fairly straightforward, and pages can call Component Object Model (COM) objects for involved processing.

  • ColdFusion Markup Language (CFML): This server-side scripting language is used by the Adobe (formerly Macromedia) ColdFusion framework. ColdFusion is another popular technology that has retained a core set of developers over many years.

  • JavaServer Pages (JSP): JSP is ostensibly a server-side scripting language in the same vein as PHP and ASP. It does allow Web developers to embed Java code in HTML documents, but it isn't typically used in the same fashion as other server-side scripting languages. JSP pages are a component of Java servlet technology, explained in the next bulleted list.
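The common idea behind these technologies, program code embedded in an HTML document and run by an interpreter inside the server, can be sketched in Python. The <?= ... ?> tag syntax and the render_page function are made up for this illustration and do not match any of the products listed above. The sketch uses eval, which is not a sandbox and must never be given untrusted page source.

```python
import re
from html import escape

# Hypothetical embedded-code tag: <?= expression ?>
EMBEDDED = re.compile(r"<\?=\s*(.*?)\s*\?>", re.S)


def render_page(source, request):
    """Evaluate each embedded expression and splice its value into the page."""
    scope = {"request": request}

    def run(match):
        # The page author's code runs inside the server process.
        value = eval(match.group(1), {"__builtins__": {}}, scope)
        return escape(str(value))

    return EMBEDDED.sub(run, source)


page_source = '<p>Hello, <?= request["name"].upper() ?>! 2 + 2 = <?= 2 + 2 ?></p>'
print(render_page(page_source, {"name": "Zoe"}))
```

Everything outside the tags is passed through unchanged, which is what makes this model feel like writing HTML with a little code sprinkled in.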

Over time, server-side scripting solutions have evolved away from an interpreted model. Instead of running a page through an interpreter for each request, a Web server can compile the page down to a more efficient representation, such as bytecode. The Web server needs to do this compilation only once, as it can keep the compiled program in a cache. The virtual machine that interprets the bytecode can then cache the corresponding machine code, resulting in performance similar to a normal compiled language, such as straight C/C++. Here are some popular technologies of this nature:

  • Java servlets: Java is probably responsible for much of the evolution in server-side scripting, as it was originally designed with a compiled model. Java servlets are simply classes that are instantiated by and interact with the Web server through a common interface. JSP pages are actually compiled into Java servlets by the Web server.

  • ASP.NET: ASP.NET is Microsoft's revamping of ASP. ASP.NET page code can be written in any .NET language, such as C# or VB.NET. The pages are compiled down to intermediate language (IL) and cached by the Web server. The .NET framework handles just-in-time (JIT) compilation of the IL.

  • ColdFusion MX: ColdFusion MX compiles CFML pages down to Java bytecode instead of running an interpreter.

Note

Even pure scripting technologies are often compiled to bytecode when a script is requested for the first time. The bytecode is then cached to accelerate later requests for the same unmodified script.
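The compile-once-and-cache behavior can be sketched in Python, whose built-in compile function produces a bytecode object that can be run many times. The ScriptCache class, its method names, and the convention that a page appends to an output list are assumptions made for this example rather than any real server's design.

```python
import hashlib


class ScriptCache:
    """Compile page source to bytecode once and reuse it until it changes."""

    def __init__(self):
        self._cache = {}        # page name -> (source digest, code object)
        self.compilations = 0   # counts how often compilation really happens

    def load(self, name, source):
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        entry = self._cache.get(name)
        if entry is None or entry[0] != digest:
            # First request, or the script was modified: compile and cache.
            code = compile(source, name, "exec")
            self.compilations += 1
            self._cache[name] = (digest, code)
        return self._cache[name][1]

    def run(self, name, source, request):
        scope = {"request": request, "output": []}
        exec(self.load(name, source), scope)   # runs the cached bytecode
        return "".join(scope["output"])


cache = ScriptCache()
page = 'output.append("<p>Hello, " + request["name"] + "</p>")'
print(cache.run("hello.page", page, {"name": "Zoe"}))
print(cache.run("hello.page", page, {"name": "Max"}))
print("compilations:", cache.compilations)
```

The second request reuses the cached code object, so the compilation counter stays at one until the page source changes.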





The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities
ISBN: 0321444426
Year: 2004
Pages: 194
