Languages of the Web

Before the first two computers could ever start talking to each other, they had to be programmed to speak the same language. A number of popular Web languages exist and each of them has strengths and weaknesses. On the one hand, for a lightweight, low-impact language, HTML is by far your best choice; it is simple, straightforward, and involves few difficult concepts. On the other hand, if you need interactivity, dynamic data updating, and complex graphical displays, Java is your best choice. Either way, to understand how a Web server talks to a client browser (and therefore find a weakness in its implementation), you must understand the underlying technologies.

In this chapter we discuss the most popular Web languages available. It certainly isn't an exhaustive list of languages, but it gets you well on your way to understanding the major languages on the Web. The following languages comprise the majority of important Web infrastructure technologies, and therefore, an understanding of them is essential.

HTML

File extension(s): .html, .htm, .html4

HyperText Markup Language (HTML), invented by Tim Berners-Lee in 1989, is the undergirding framework of the Internet. Almost every Web site in existence uses the HTML language to display text, graphics, sounds, and animations. The caretaker of the specification is the World Wide Web Consortium (W3C) (http://www.w3.org).

Although the language itself is new, it is derived from an older language called Standard Generalized Markup Language (SGML). The impetus for HTML was fostered almost ten years before it was introduced when Bill Atkinson invented HyperCard for the Macintosh computer and the MacOS operating system. The premise behind HyperCard is precisely what supports the HTML language: hyperlinking, or the ability to link from one word or area of a page to another area or page. The concept is simple, but before HyperCard, it had never really been put into widespread practice.

HTML is made up of a series of "elements." They act as a language which tells the receiving browser to display certain elements on the screen. In our work, we discovered that simple, seemingly innocuous HTML elements can be used to gain unauthorized access on Web servers. Table 1-1 contains our list of HTML elements and their security implications.

Other HTML elements may pose a security risk, but the list in Table 1-1 provides a good start toward understanding the security implications of HTML. For a complete list of HTML elements and their uses, see the W3C's HTML 4.01 specification at http://www.w3.org/TR/html401/.

For tips on making your Web site more accessible to multiple browsers, check out the online W3C HTML Validation Service (http://validator.w3.org) or Dave Raggett's HTML Tidy program (http://www.w3.org/People/Raggett/tidy/).

Table 1-1. HTML Elements: Their Attributes and Security Implications
Element/Attribute	Security Implication
<form>	Form for user input. Whenever a program accepts user input, a security risk exists. In fact, that is how most attacks occur: characters are submitted to a program that isn't expecting them, producing abnormal results.
<form action>	Action attribute. This <form> attribute defines the executing program on the Web server. By knowing the name of the program processing the user-supplied data, an attacker can learn valuable information about the Web server and potentially find backup or older versions of the program in the same or other directories.
<form method>	Method attribute. This <form> attribute defines the mechanism for sending user-supplied information to the Web server's processing program. Two methods exist for submitting information to the program: POST and GET. By understanding the method of submission, an attacker can eavesdrop on the information (which may be sensitive in nature), or worse, alter the information being sent and produce abnormal results. We discuss GETs and POSTs further in Chapter 3.
<script language=<variable>>	Scripting. The <script> element, used in conjunction with the "language" attribute, marks the client-side scripting sent to the browser, which an attacker can modify. When an attacker can modify the client-side scripting, she can then bypass certain filtering or sanitization scripts. Client-side scripting languages include JavaScript, VBScript, JScript, and XML.
<input>	Input form control. The <input> element provides an input control for a form. Specific attributes can be altered to send undesirable data to the Web server.
<input type=hidden>	Type attribute. The "type" attribute, when assigned the value of "hidden," can allow an attacker to change the "value" attribute to something undesirable. For example, some Web sites use a hidden field to store the price of an item in a shopping cart, which allows an attacker to change the price of that item manually to whatever she wants. If there is no server-side processing and validation of the price, an attacker can purchase items online at significantly reduced prices.
<input maxlength=<variable>>	Maxlength attribute. The "maxlength" attribute can be altered by an attacker, allowing her to submit large strings that can disable a Web server if they are not preprocessed appropriately.
<input size=<variable>>	Size attribute. Similar to the "maxlength" attribute, the "size" attribute can be altered by an attacker, allowing her to submit large strings that can disable a Web server if they are not preprocessed appropriately.
<applet>	Java applet. This element is used to display or run a Java applet. Because Java is transmitted in the clear and uses a known byte-code for execution, it can be seen on the wire by using a packet analyzer such as Snort or EtherPeek. For more information about decompiling Java applets and the <applet> tag, see the Java section later in this chapter.
<object>	This element typically is used for displaying an ActiveX control, but it can also be used for Java applets. An attacker can send an e-mail with HTML embedded and have the reader execute an ActiveX control, which can take over the system. The <object> element is among the best ways to propagate an e-mail virus.
<embed>	This element typically is used in conjunction with the <object> tag to display ActiveX controls and Netscape plug-ins.
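
The hidden-field and maxlength entries in Table 1-1 share one lesson: anything the browser sends can be rewritten by an attacker, so the server has to check it again. The following is a minimal sketch of that idea in Python, not code from any particular site. The catalog, the field names (item_id, price, comment), and the 500-character limit are all hypothetical.

```python
# Hypothetical server-side price list; the client never gets to set prices.
CATALOG = {"sku-1001": 19.99, "sku-1002": 249.00}

MAX_COMMENT_LENGTH = 500  # assumed limit, enforced on the server


def validate_order(form):
    """Return (price, comment) for a submitted form, or raise ValueError.

    `form` is a plain dict of field names to submitted string values.
    """
    item_id = form.get("item_id", "")
    if item_id not in CATALOG:
        raise ValueError("unknown item")

    comment = form.get("comment", "")
    # The maxlength attribute in the HTML is only advice to the browser,
    # so the length is checked again here.
    if len(comment) > MAX_COMMENT_LENGTH:
        raise ValueError("comment too long")

    # Any "price" field sent by the client (hidden or not) is ignored;
    # the price always comes from the server's own catalog.
    return CATALOG[item_id], comment
```

The design choice here is that the price is looked up on the server rather than read from the form, so a tampered hidden field has no effect.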

Dynamic HTML (DHTML)

File extension(s): .dhtml

DHTML is often considered the object version of HTML. This language extends the HTML language to allow for increased control over page elements by allowing them to be accessed and modified by a scripting language such as JavaScript or VBScript. As a result, an image tag may have an "OnMouseOver" event triggered when a user places his mouse over the image, creating an object that comes to life with animation. Briefly, DHTML allows a Web developer to rapidly develop enhanced animations and effects such as mouseover effects, animated text, and dynamic color changes. Because DHTML is based on HTML, its security implications are similar to those for HTML.

XML

File extension(s): .xml

A derivative of SGML, eXtensible Markup Language (XML) is less restrictive than HTML's well-defined, standardized elements. XML allows anyone to create her own elements and extend the language herself.

At the heart of this element extension are the Document Type Definitions (DTDs), which are similar in function to the data definition library of a relational database. A DTD defines the beginning and ending tags of an XML file, allowing the viewer to make sense of the data. For example, to define the data structure of a car dealership, we might have the following DTD:

<!ELEMENT Inventory (Car*)>

<!ELEMENT Car ( Make, Model, Color, Owner*)>

<!ELEMENT Make (#PCDATA)>

<!ELEMENT Model (#PCDATA)>

<!ELEMENT Color (#PCDATA)>

<!ELEMENT Owner (#PCDATA)>

The XML file displaying this data reads:

<?xml version="1.0" ?>

<!DOCTYPE Inventory PUBLIC "." "Inventory.dtd" >

<Inventory>

  <Car>

  <Make>Honda</Make>

  <Model>Civic</Model>

  <Color>Red</Color>

  <Owner>Jack Scanly</Owner>

  <Owner>Jane Scanly</Owner>

  </Car>

  <Car>

  <Make>Nissan</Make>

  <Model>Maxima</Model>

  <Color>Black</Color>

  <Owner>Mike Smith</Owner>

  </Car>

</Inventory>

The preceding code shows the current car inventory for a very small car dealership. Two cars populate the dataset, one red Honda Civic and one black Nissan Maxima. Note that the Civic has two owners in the database. The DTD specification allows for multiple owners by including an asterisk (*) next to the "Owner" name:

<!ELEMENT Car ( Make, Model, Color, Owner*)>

The preceding example gives an overview of the XML language and how it looks and functions, or at least enough information that you will know it when you see it. For the most part, the XML language is too new for us to cite specific security risks, but you should be prepared to identify the risks as the language matures and the hackers of the world begin to attack it.
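
To see how a program might consume such a file, here is a short sketch in Python using the standard xml.etree.ElementTree module. It embeds the inventory data from the example, with the XML declaration and DOCTYPE left out. The function name list_cars is our own, and note that ElementTree only checks that the document is well formed; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

INVENTORY_XML = """<Inventory>
  <Car>
    <Make>Honda</Make>
    <Model>Civic</Model>
    <Color>Red</Color>
    <Owner>Jack Scanly</Owner>
    <Owner>Jane Scanly</Owner>
  </Car>
  <Car>
    <Make>Nissan</Make>
    <Model>Maxima</Model>
    <Color>Black</Color>
    <Owner>Mike Smith</Owner>
  </Car>
</Inventory>"""


def list_cars(xml_text):
    """Return one (make, model, color, owners) tuple per <Car> element."""
    root = ET.fromstring(xml_text)
    cars = []
    for car in root.findall("Car"):
        # Owner* in the DTD means zero or more owners, so collect a list.
        owners = [owner.text for owner in car.findall("Owner")]
        cars.append((car.findtext("Make"), car.findtext("Model"),
                     car.findtext("Color"), owners))
    return cars


for make, model, color, owners in list_cars(INVENTORY_XML):
    print(f"{color} {make} {model}: {', '.join(owners)}")
```

A parser of this kind raises an error on a mismatched closing tag, which is a quick way to catch malformed XML before it reaches the rest of an application.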

XHTML

File extension: .xhtml

According to the W3C, HTML 4 is the final release of the ubiquitous language as we know it. The next version of HTML is being reformulated to include the XML language's definition and structure. In other words, HTML and XML are being combined to form the XHTML language and give the W3C continued control over its design. The complete XHTML 1.1 specification can be found at http://www.w3.org/TR/xhtml11/. As with XML, XHTML is in its infancy, and its security hasn't been tested to any significant degree.

Perl

File extension(s): .pl or anything

The Practical Extraction and Report Language (Perl) is a high-level programming language, often considered a scripting language, written by Larry Wall in 1987. It is arguably the most ported scripting language to date with versions for AS/400, Windows 9x/NT/2000/XP, OS/2, Novell Netware, IBM's MVS, Cray supercomputers, Digital's VMS, Linux, Tandem, HP's MPE/ix, MacOS, and all versions of Unix. The portability of the Perl language, combined with its low price (it's free!) and robustness, has created a truly ubiquitous language for the Internet and is largely responsible for the Internet's tremendous growth.

Perl is remarkably robust and flexible, because it can be written to accommodate server-side actions, scripted to perform functions locally on a system, or used to create entire standalone applications such as majordomo, the universal mail list manager. However, its primary use is handling the server-side scripting of Web sites. Security never has been a fundamental component of the language. As a result, many security vulnerabilities are present in Web sites that utilize the Perl language. If you use Perl, however, there are ways to harden it.

Perl code can range from simple to overly complex in its design. To familiarize you with the look and feel of Perl, we present a script to run on the command line and display the words "We love Perl!" There are many ways to accomplish this task, but here is a simple one:

# Perl script to pass a message to the script

# and print it to the screen.

@parms = @ARGV;

$msg = $parms[0];

print "$msg!\n";

Let's break this code snippet down:

@parms = @ARGV;

This line takes the parameters from the command line and stores them in the @parms array.

$msg = $parms[0];

This line assigns the first element of the @parms array to the variable $msg.

print "$msg!\n";

This line prints the contents of the variable $msg to the screen, followed by an exclamation mark and a newline character. Running the program from the command line gives the following output:

C:\temp>perl love-perl.pl "We love Perl"

We love Perl!

C:\temp>

In the preceding example, Perl is run locally on the system and the script is run at the command line. However, on the Web this technique is highly unusual as most Perl execution is performed with the Common Gateway Interface (CGI) as the mechanism for outputting the text to a Web browser.

The most common use of Perl is the handling of input from a user form and then the processing of it. A good example is an evaluation or feedback form which accepts user input through HTML <input> and <textarea> fields:

<html>

<body>

<form method=POST action="/cgi-bin/mailto.pl">

Email Address:

<input value="" size=60 maxlength=60 name="from">

Your message:

<textarea name="body" rows=10 cols=50></textarea>

<input type=submit value="send mail">

</form>

</body>

</html>

The HTML file then submits the collected data to the Perl program (i.e., mailto.pl) with a <form> POST action:

<form method=POST action="/cgi-bin/mailto.pl">

The mailto.pl script then performs the desired action and returns the intended output, a fairly simple and straightforward procedure.

If you currently use Perl as your server-side scripting language, or are considering using it, you should be aware of the following security risks and their countermeasures.

Be sure that your Web servers aren't running as an administrative user such as "root" (Unix) or "administrator" (Windows). By running your Web servers as an administrator, you leave open the opportunity for someone to execute commands with privilege.

Always preprocess field values. Develop a list of characters that are legal in your application and then filter out any characters that don't belong to that set. For example, if you're receiving an e-mail field, you can use regular expression pattern matching to detect when the field is tainted with bad data, then report an error and prompt the user to "fix" his input. Only the following characters should be allowed in any e-mail address: "a..z," "A..Z," "0..9," a single "@," hyphen "-," underscore "_," and period ".". Here is a simple regular expression that you can use for discovering nefarious intentions in submitted e-mail addresses:

                 if ($email !~ /^[\w.-]+\@[\w.-]+$/) {

                  print "<br><br>#Warning: an error in your email address has been found. Please try again.<br>";

                 }  else {

                  # Process the rest of your Perl script

                 }

If you're unfamiliar with this regular expression statement, we can break it down for you:

/  # Beginning of the regular expression

  ^  # Matches from the beginning of the string

  [  # Designates the beginning of a set of characters

  \w  # Matches an alphanumeric character (including "_")

  .  # Matches a period "."

  -  # Matches a dash or hyphen "-"

  ]  # Designates the end of the set of characters

  +  # Matches 1 or more of the prior set of characters

  \@  # Matches an at sign "@"

  [  # Designates the beginning of a set of characters

  \w  # Matches an alphanumeric character (including "_")

  .  # Matches a period "."

  -  # Matches a dash or hyphen "-"

  ]  # Designates the end of the set of characters

  +  # Matches 1 or more of the prior set of characters

  $  # Anchors the match at the end of the string

/  # End of the regular expression

If the address fails to match the pattern, which is what the Perl "!~" operator tests for, the script reports an error and the user has to correct her entry.
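
For comparison, the same whitelist check can be sketched in Python with the standard re module. This is an illustration of the pattern above, not a complete e-mail validator, and the function name is_acceptable_email is our own.

```python
import re

# Same character sets as the Perl pattern. re.ASCII keeps \w limited to
# a-z, A-Z, 0-9, and "_" (Python's \w otherwise also matches non-ASCII
# letters and digits).
EMAIL_PATTERN = re.compile(r"[\w.-]+@[\w.-]+", re.ASCII)


def is_acceptable_email(value):
    """Return True only if the whole string matches the allowed pattern."""
    # fullmatch() anchors the pattern at both ends of the string, so a
    # trailing newline or extra text after the address is rejected.
    return EMAIL_PATTERN.fullmatch(value) is not None
```

Because "@" is not in either character set, an address containing a second "@" or any shell metacharacter fails the check.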

Restrict the use of local system commands that shell out to the operating system. Functions such as open(), system(), and exec(), as well as backticks, can be deadly when passed user-supplied variables, allowing an attacker to execute commands. If you must use these functions, be sure to sanitize the input variables as noted previously.
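
One way to reduce this risk, in addition to sanitizing input, is to start the external program without involving a shell at all. The sketch below shows the idea in Python with the standard subprocess module. The function name echo_via_child is hypothetical, and the Python interpreter itself stands in for whatever helper program a real script would call.

```python
import subprocess
import sys


def echo_via_child(user_value):
    """Run a child process that prints user_value, without using a shell."""
    # The command is a list, so user_value is passed as one literal argument.
    # Characters such as ";" or "|" are never interpreted by a shell.
    # sys.executable is used only as a stand-in for a real helper program.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

With this approach a value such as "hello; echo injected" reaches the child program as a single string instead of being split into two commands.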

On Unix systems, don't inherently trust your environment variables. Be sure to set the $PATH and $IFS variables explicitly in your scripts:

                 $ENV{"PATH"}  = "/bin:/usr/bin:/usr/local/bin:/opt";

                 $ENV{"IFS"}  = "";

If you don't explicitly control your $PATH and $IFS variables, an attacker may edit them and get your program to execute an alternative program rather than the intended one.

Confirm the input size of variables and form field submissions. Either confirm the length of variables received by your program or use the $ENV{CONTENT_LENGTH} field to confirm the length of the data on POST (and sometimes GET) requests. If you don't confirm the length of user input, an attacker can send large amounts of data in the variable, crashing the Web server or system, or worse, finding a buffer overflow condition and remotely executing arbitrary commands.
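
The length check described above can be sketched as follows in Python. The 8192-byte limit and the function name read_limited_body are assumptions made for illustration; the CONTENT_LENGTH variable is the one set by the Web server for CGI programs.

```python
import io

MAX_BODY_BYTES = 8192  # assumed limit; choose one that fits your forms


def read_limited_body(environ, stream):
    """Read a POST body, refusing anything longer than MAX_BODY_BYTES.

    `environ` is a mapping such as os.environ and `stream` is a binary
    file-like object such as sys.stdin.buffer.
    """
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        raise ValueError("CONTENT_LENGTH is not a number")
    if length < 0 or length > MAX_BODY_BYTES:
        raise ValueError("request body too large")
    # Read only the declared number of bytes, never "until end of input".
    return stream.read(length)


# Usage with an in-memory stream standing in for the request body.
body = read_limited_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"hello world"))
print(body)  # b'hello'
```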

Try not to accept paths in your fields. But if you must, be sure that they are relative and not absolute. Also, strip out any dots (..) or slashes (/ or \). If you don't, an attacker may submit a request for the password file on a Unix system:

                 ../../../../../../../etc/passwd

or request the backup SAM file on a Windows system:

../../../../../../../winnt/repair/sam._
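
Stripping characters is one defense. A complementary one, sketched below in Python, is to resolve the requested path and confirm that the result still lies inside the directory you intend to serve. Note that this is a different technique from stripping dots and slashes. The base directory /var/www/data and the function name resolve_under_base are hypothetical.

```python
import os

BASE_DIR = "/var/www/data"  # hypothetical directory that may be served


def resolve_under_base(user_path, base_dir=BASE_DIR):
    """Return an absolute path inside base_dir, or raise ValueError."""
    base = os.path.realpath(base_dir)
    # Joining and then resolving collapses any ".." components and symlinks.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # The resolved path must still sit inside the base directory.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate
```

Both the relative request ../../../../../../../etc/passwd and the absolute request /etc/passwd resolve to a location outside the base directory and are rejected.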

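Whenever possible, use taint checks. Perl provides a facility called taint checking, enabled with the -T switch, which tells Perl to track variables that originate outside the program and to refuse to use them in risky operations until they have been validated.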

By default, Perl scripts are stored in clear text. Thus, if an attacker compromises your Web server, he may be able to read your Perl files and extract valuable information such as usernames and passwords to your database. A couple of programs, such as Perl2exe (http://www.perl2exe.com), allow you to obfuscate the Perl code. They create an independent executable file (.exe), eliminating the need for both the Perl source and the interpreter.

Overall, Perl sanitization is crucial, so find or develop a good input sanitization function and use it on every field accepted from a user. For more information about securing Perl scripts, check out http://www.w3.org/Security/Faq/.

PHP

File extension(s): .php, .php3, and potentially no extension

Although there have been a number of PHP language authors, the origins of PHP start with Rasmus Lerdorf. He originally wrote the first PHP parsing engine in 1995 as a Perl CGI program, which he called "Personal Home Page," or simply PHP. His original purpose was to log visitors to his résumé page on the Web. He later rewrote the whole thing in C and made it much larger, enhancing the program with greater parsing capabilities and adding database connectivity. Over the years, many other programmers have contributed to the development of PHP, including Zeev Suraski and Andi Gutmans, who rewrote the parsing engine to create PHP version 3.

Besides Active Server Pages (ASP) and Perl, PHP is among the most ubiquitous server-side scripting languages in existence. The PHP language is incredibly diverse and can be used for standalone applications not tied to the Web. However, the language is most often used on Unix Web servers (typically, Apache; see http://www.apache.org) as its dynamic server-side processing engine. In fact, it is the most frequently used Apache module (http://www.securityspace.com/s_survey/data/man.200111/apachemods.html). PHP files can be named anything, but they typically run with the extensions:

.php

.php3

To familiarize you with PHP code, we present the following code snippet, which uses the PHP command echo to display the string "Hello World!" in the user's Web browser:

<!--  PHP Example in HTML -->

<!--  Prints "PHP Example: Hello World!" to the browser -->

<html><head><title> PHP Example</title></head>

<?php

echo "<br><h1>Hello World!<br></h1>";

?>

</html>

Note that PHP is much like Perl in that you can use the language in line with the HTML. Lines 1 and 2 are HTML comments, indicated by the <!-- tags. Line 3 is strictly an HTML series of tags, including the <html>, <head>, and <title> tags. Lines 4, 5, and 6 are PHP code. Line 4 begins the PHP code, as indicated by the (<?php) opening tag. Line 5 is PHP's code for outputting the string "Hello World!" on the screen. Line 6 is the (?>) closing tag for the fourth line's opening PHP tag. The following is a simple example of PHP code and how you can quickly surmise the technologies in play on a Web site:

<?

// Open the SQL connection

$conn = mysql_connect("10.1.1.1", "sa", "guessme") or die(mysql_error());

@mysql_select_db("inventory") or die(mysql_error());

// SQL query

$data = mysql_query("SELECT * FROM autos") or die(mysql_error());

// Print the data in HTML

print "<table>\n";

while ($row = mysql_fetch_row ($data)) {

  print "<tr>\n";

  print "<td>$row[0]</td>\n";

  print "<td>$row[1]</td>\n";

  print "</tr>\n";

}

print "</table>\n";

// Close the SQL connection

mysql_close($conn);

?>

Noteworthy about PHP is that its weaknesses are very similar to those of Perl. If the script passes input from the Web browser to database queries, to system(), passthru(), shell_exec(), or exec() calls, or to server-side includes (SSI), an attacker can take advantage of poor input sanitization and get the PHP engine to perform nefarious activities, including arbitrary command execution. To limit your exposure, be sure to include a solid input sanitization routine in your parsing routines. As with Perl, you can (and should) use regular expressions in your PHP code to detect malicious data in your input fields and reject it. The following code uses the preg_match() function to compare the $string received with a regular expression; it can be used for validating a number field:

if (!preg_match("/^[0-9]+$/", $string)) {

  echo "Error discovered in your number field\n";

  return 1;

}

The following code can be used to parse a string field:

if (!preg_match("/^[a-z0-9]+$/i", $string)) {

  return 1;

}

For more information about PHP and to download the program, check out http://www.php.net.

ColdFusion

File extension: .cfm

ColdFusion is Allaire's (http://www.allaire.com) Web application development system; the latest version is ColdFusion 5. ColdFusion has three major components: Application Server, Markup Language, and Studio. You need to understand them to increase your hacking prevention capabilities.

ColdFusion Application Server

The Application Server is really the brains behind the ColdFusion operation and runs on both Windows and Unix platforms. The Application Server runs on the Web server and processes ColdFusion page requests. Any specific ColdFusion tag requested is pulled by the Application Server and processed accordingly.

ColdFusion Markup Language

ColdFusion Markup Language (CFML) is the server-side language that powers ColdFusion. CFML follows the HTML calling conventions of using tags and parameters passed as tag attributes. The CFML language is used in conjunction with the Application Server to create Web applications such as shopping carts, online bank accounts, and the like. The extension recognized by the Application Server is .CFM.

Like PHP and Perl, CFM files are stored in plain text by default. That is, anyone with the right credentials, including anyone who has gained those credentials in an unauthorized manner, is able to view the files. An attacker viewing these files can uncover sensitive information, such as database connectivity usernames and passwords, giving her access to all customer data (e.g., credit card numbers, social security numbers, and mother's maiden name).

CFM is similar to HTML in that it uses tags that provide enormous functionality, such as database connectivity, Post Office Protocol (POP) and Simple Mail Transfer Protocol (SMTP) support, and Component Object Model (COM) support. In addition, third-party add-ons, which provide enhanced functionality, can be purchased. For example, to perform a SELECT query statement against a database, you can use the following CFM code snippet:

<CFQUERY DATASOURCE="inventory" NAME="automobiles">

  SELECT auto_id, auto_make, auto_model, auto_color

  FROM autos WHERE auto_id = #URL.auto_id#

</CFQUERY>

Note that the CFM tag <CFQUERY> has an attribute called DATASOURCE, which specifies the ODBC data source to query. The NAME attribute names the query so that its results can be referenced later. An attacker who knows how this query is generally structured could attempt to attack the database by submitting a nonstandard request in variables. For example, the URL auto_id variable is used to pass the auto identification number from the URL (see Chapter 3 for more details on GET and POST commands). As a result, the URL that instigated the query looks something like the following and pulls up the first auto_id in the database:

http://www.example.com/cfm/get.cfm?auto_id=1

As you will learn throughout this book, if proper input sanitization isn't used on the server side, an attacker can potentially pull up all kinds of data.
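
To make the idea of server-side sanitization concrete, here is a minimal sketch in Python that treats the auto_id value as untrusted text, checks that it is a plain number, and passes it to the database as a bound parameter instead of pasting it into the SQL string. It uses an in-memory SQLite database as a stand-in for the "inventory" data source; the function name find_auto and the sample row are our own.

```python
import sqlite3


def find_auto(conn, auto_id_text):
    """Look up one auto by id, treating the URL value as untrusted text."""
    # Reject anything that is not a plain non-negative integer.
    if not (auto_id_text.isascii() and auto_id_text.isdigit()):
        raise ValueError("auto_id must be a number")
    # The "?" placeholder sends the value separately from the SQL text,
    # so it can never change the structure of the query.
    cursor = conn.execute(
        "SELECT auto_id, auto_make, auto_model, auto_color "
        "FROM autos WHERE auto_id = ?",
        (int(auto_id_text),),
    )
    return cursor.fetchone()


# In-memory stand-in for the "inventory" data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE autos (auto_id INTEGER, auto_make TEXT, "
             "auto_model TEXT, auto_color TEXT)")
conn.execute("INSERT INTO autos VALUES (1, 'Honda', 'Civic', 'Red')")
print(find_auto(conn, "1"))  # (1, 'Honda', 'Civic', 'Red')
```

A request such as auto_id=1 OR 1=1 fails the numeric check and never reaches the database.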

Now, to display the information retrieved from the preceding SQL query, we must reference the "automobiles" query:

<HEAD>Automobile Inventory</HEAD>

  <CFOUTPUT QUERY="automobiles">

  <LI>#auto_make#, #auto_model#, #auto_color#, (#auto_id#)</LI><BR>

  </CFOUTPUT>

Assuming that data is in the "inventory" database, the HTML output of our request looks something like this:

Automobile Inventory

    Honda, Civic, Red, (1)

    Honda, Accord, Blue, (2)

    Nissan, Sentra, Black, (3)

...

Note that the output from the <CFOUTPUT> tag is almost always purely HTML. We say "almost" because there is always the potential for tricking the Application Server into bringing up CFML source code, but that isn't typical.

ColdFusion Studio

Built loosely on Homesite, Studio is the Integrated Development Environment (IDE) for designing Web pages and applications.

Together, ColdFusion Application Server, Markup Language, and Studio provide an environment ripe for hacking. The main problems with CFM are the sample files and unsanitized input. Most of the early public attacks on ColdFusion servers were due to poorly written sample scripts that allowed attackers to upload files, view a file's contents and source, control the Web service, or execute arbitrary commands. The simple solution to most of these problems is to remove the default sample scripts and sanitize your input fields.

Active Server Pages

File extension: .asp

Active Server Pages (ASP) is Microsoft's version of a server-side scripting environment. Produced mainly for Microsoft's Internet Information Server (IIS) Web server, ASP allows you to combine HTML, scripting code, and server-side ActiveX components to create dynamic content. As with Perl, PHP, and CFM, the ASP language allows you to execute code on the Web server itself without the end-user knowing about it. Much like the server-side scripting languages discussed previously, ASP can be used to execute requests against databases, COM, and local system execution commands.

The default ASP language is VBScript, a scripting version of the popular Visual Basic language used in nearly all of Microsoft's products. This language isn't as robust as its parent but manages to provide the necessary tools to create dynamic Web pages.

VBScript can run in two places: on the server and on the client. The server-side form is enclosed in ASP code by the <% and %> delimiters, and the client-side form is placed in HTML <script> tags. For example, to have your IIS Web server display the current date and time on a Web page, you can use the following syntax:

<%@ language="VBScript" %>

<html>

<body>

<h1>Welcome to server-side date:</h1>

<% =date %>

<h1>And to server-side time:</h1>

<% =time %>

</body>

</html>

Let's break this code snippet down. The first line, <%@ language="VBScript" %>, is a processing directive that specifies the server-side language used in the code that follows. The heart of the ASP in this file is <% =date %>, which tells the server to run the VBScript date() function and return the output to the client. The output is displayed in a client's Web browser as shown in Figure 1-1.

Figure 1-1. Output of the VBScript date() function


Remember, the preceding code snippet is all server side. To allow the client to perform the same function you can use the following code snippet:

<html>

<body>

<script type="text/vbscript">

document.write("<h1>Welcome to client-side date: </h1>")

document.write("<br>" & date() & "<br>")

document.write("<h1>And to client-side time: </h1>")

document.write("<br>" & time() & "<br>")

</script>

</body>

</html>

Note that in client-side scripting you wouldn't use the <%@ designation or the runat attribute of <script>. You would use the specific VBScript functions to write the date to the screen (date() and time(), respectively). The downside to writing your Web applications with client-side processing is that the scripts can be altered for nefarious purposes, as you will soon learn.

Database Connectivity

Database connectivity with ASP isn't quite as trivial as with ColdFusion, but you should note a couple of things about how connectivity occurs. First, connection information is essential to security, because it holds the Data Source Names (DSN), usernames, and passwords for the data. Second, sensitive data can be stored either in the ASP file itself or in a global.asa file.

ConnectionString

The following code snippet shows how a database connection occurs (as described in the first case in this subsection):

<object runat="Server" id="Conn" progid="ADODB.Connection">

</object>

<%@ language = VBScript %>

<html>

<body>

<%

Session("ConnectionString") = "dsn=Autos;uid=sa;pwd=guessme;APP=ASP Script"

Conn.Open Session("ConnectionString")

SQL = "SELECT * FROM Autos WHERE auto_color='black'"

Set RS = Conn.Execute(SQL)

%>

<table bordercolor=black border=2>

<%

Do While Not RS.EOF

%>

<tr>

<td>

<%=RS("auto_id")%>

</td><td>

<%=RS("auto_make")%>

</td><td>

<%=RS("auto_model")%>

</td><td>

<%=RS("auto_color")%>

</td>

</tr>

<%

RS.MoveNext

Loop

%>

</table>

</body>

</html>

The Session("ConnectionString") line indicates that the DSN, username, and password are stored in clear text in the ASP file itself. If an attacker can access this file somehow or get the Web server to bring up the source code of the ASP file, she can potentially gain direct access to the database. The output of the preceding code appears in the browser shown in Figure 1-2.

Figure 1-2. Output of automobile database using the database.asp file


The second scenario for database connectivity removes the need to put the connection string information in clear text in the actual ASP file. In this case, the connection string is moved into a global file called global.asa. This optional global.asa file must reside in the root directory of the ASP application (e.g., c:\inetpub\wwwroot or c:\inetpub\scripts). Changes to the global.asa file typically require a Web server stop/start for the server to activate the changes.

The global.asa file contains declarations of objects, variables, and methods that can be used by the scripts in the ASP application, whichever supported scripting language they are written in, including JavaScript, VBScript, JScript, and PerlScript. Specifically, the global.asa file typically holds application events, session events, <object> declarations, and TypeLibrary declarations.

To avoid putting the ConnectionString information in the ASP file, you can put the information in a global.asa file in your Web root c:\inetpub\wwwroot and effectively "hide" it from prying eyes:

<SCRIPT LANGUAGE=VBScript RUNAT=Server>

Sub Session_OnStart

Session("ConnectionString") = "DSN=Autos;UID=sa;PWD=guessme;APP=ASP script"

Session("ConnectionTimeout") = 15

Session("CommandTimeout") = 30

End Sub

</SCRIPT>

Note that the entire Session("ConnectionString") line is inserted in global.asa. When that is done, the same line in the ASP file (and every ASP file that references the same DSN) can be removed.

ActiveX

ActiveX is considered to be the Internet portion of the Component Object Model (COM) and is another Microsoft mechanism for delivering dynamic content to the Web. Netscape's equivalent is the plug-in.

Microsoft allows you to create ActiveX controls with a number of languages including C++, Visual Basic, and Java. With ActiveX, you can provide dynamic content such as online clocks, animated graphics, and database connectivity. The real danger with ActiveX is that it resides in container programs such as Microsoft Office.

Unlike most of the Web languages discussed so far, you can't create ActiveX controls with simple text editors; they have to be compiled, typically in integrated development environments (IDEs) such as C++/MFC or Visual Basic, and packaged in .CAB files. Once created, the CAB file and CLASSID must be referenced in the HTML file, which loads the ActiveX control. The link is accomplished by using the HTML <object> tag and its CLASSID and CODEBASE attributes. The following code snippet runs an ActiveX control which plays a QuickTime movie:

<OBJECT CLASSID="clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B"

WIDTH="160" HEIGHT="144"

CODEBASE="http://www.example.com/activex/movie-plugin.cab">

<PARAM name="SRC" VALUE="sample.mov">

<PARAM name="AUTOPLAY" VALUE="true">

<PARAM name="CONTROLLER" VALUE="false">

<EMBED SRC="sample.mov" WIDTH="160" HEIGHT="144" AUTOPLAY="true"

CONTROLLER="false" PLUGINSPAGE="http://www.example.com/quicktime/download/">

</EMBED>

</OBJECT>

The CLASSID attribute in the <object> tag holds the identifier of the ActiveX control in the movie-plugin.cab file. The CODEBASE attribute in the <object> tag isn't necessary, but when used it holds the location of the compiled .CAB file. The security risk with ActiveX is twofold:

1. ActiveX controls can perform elaborate functions, including reading/writing files or executing code on your system. An attacker could create a nefarious ActiveX control, and when a user browsed to the attacker's Web site, the control could perform functions such as reading files from the user's hard drive and sending them to the attacker.

2. ActiveX controls typically require the location of the .CAB file to be disclosed, so an attacker could perhaps discover a location on the Web site that is less noticed or less secured and attack it.

For security-conscious developers, we generally recommend that the use of ActiveX be limited, if not avoided, in Web sites.
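The second risk can be illustrated with a short Python sketch. It is not part of the original toolset; the class name ObjectTagFinder, the function find_activex_controls, and the SAMPLE_PAGE string are hypothetical names introduced here. Assuming the page source is already available as text, it shows how easily the CLASSID and the CODEBASE location of a .CAB file can be read out of the HTML:

```python
from html.parser import HTMLParser


class ObjectTagFinder(HTMLParser):
    """Collects the CLASSID and CODEBASE attributes of every <object> tag."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "object":
            attr_map = dict(attrs)
            self.controls.append(
                {
                    "classid": attr_map.get("classid"),
                    "codebase": attr_map.get("codebase"),
                }
            )


def find_activex_controls(html_text):
    """Return one dict per <object> tag found in html_text."""
    finder = ObjectTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.controls


# Hypothetical page fragment modeled on the QuickTime example above.
SAMPLE_PAGE = """
<OBJECT CLASSID="clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B"
 WIDTH="160" HEIGHT="144"
 CODEBASE="http://www.example.com/activex/movie-plugin.cab">
<PARAM name="SRC" VALUE="sample.mov">
</OBJECT>
"""

if __name__ == "__main__":
    for control in find_activex_controls(SAMPLE_PAGE):
        print(control["classid"], control["codebase"])
```

Anything a script like this can extract, an attacker can extract as well, which is why the directory holding the .CAB files deserves the same scrutiny as the rest of the site.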

ASP Summary

The preceding discussion merely skims the surface of ASP and its functionality, but it should give you a general idea of its typical use and features. Two basic solutions to ASP security problems are similar to those for ColdFusion:

1. Remove the sample files from the default installed directories.

2. Sanitize all your input fields.

However, these solutions represent just the beginning when it comes to overcoming ASP insecurity, as we discuss in later chapters.
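As a minimal illustration of the second point, the following Python sketch validates a field against an allowlist before the value is used anywhere else. The field (a user name), its pattern, and the function name is_valid_username are assumptions made for this example only; real applications need rules tailored to each input field.

```python
import re

# Hypothetical rule: a user name is 1 to 32 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_username(value):
    """Return True only if the whole value matches the allowlist pattern."""
    return USERNAME_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("stuart_01", "sa'; DROP TABLE autos--"):
        print(repr(candidate), is_valid_username(candidate))
```

Accepting only known-good characters is generally safer than trying to strip out known-bad ones, because the list of dangerous characters differs for each back-end system.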

CGI

File extension(s): .cgi, .pl

Common Gateway Interface (CGI) is one of the oldest and most mature standards on the Internet for passing information from a Web server to a program (such as one written in Perl) and back to the Web browser in the proper format. Combined with languages such as Perl, CGI offered one of the first platforms for delivering server-side, dynamic content to the Web.

Unlike ASP or PHP, CGI is not a language per se but rather a set of guidelines to be used for other languages. In fact, numerous languages can be used to create a CGI program, including

Perl

C/C++

Java

Shell scripting languages such as sh, csh, and ksh (Unix)

Visual Basic (Windows)

AppleScript (MacOS)

To help you become familiar with CGI code, we use the following trivial Perl/CGI program to display the message "Hello World!" in a Web browser:

#!/usr/bin/perl

# Perl program to show general CGI functionality (adjust the interpreter path above for your system)

# Prints "Hello World!" to the browser

use strict; use warnings;

print "Content-type:text/html\n\n";

print "<html><head><title>Simple CGI Program</title></head>";

print "<body><h1>Hello World!</h1></body></html>";

At its heart, this code is Perl. The first print statement, which sends Content-type:text/html\n\n, is typically required because CGI doesn't automatically send the HTTP headers for each request; it leaves that responsibility to the Perl script. Without the Content-type header, the CGI program doesn't display the "Hello World!" text but typically displays an error message to the effect that:

The specified CGI application misbehaved by not returning a complete set of HTTP headers.

The second print statement simply sets up the HTML file with the <html>, <head>, and <title> tags. The final print statement prints the text "Hello World!" to the browser.
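Because CGI is a set of guidelines rather than a language, the same program can be written in other languages. The following Python sketch is our own illustration, not part of the original listing, and the function name build_cgi_response is hypothetical. It makes the same point as the Perl version: the script itself must emit the Content-type header and a blank line before any HTML.

```python
import sys


def build_cgi_response():
    """Return the full CGI output: header line, blank line, then the HTML body."""
    # The header must come first and must be followed by an empty line.
    header = "Content-type: text/html\n\n"
    body = (
        "<html><head><title>Simple CGI Program</title></head>"
        "<body><h1>Hello World!</h1></body></html>"
    )
    return header + body


if __name__ == "__main__":
    # The Web server reads the program's standard output and relays it.
    sys.stdout.write(build_cgi_response())
```

Keeping the response in a function makes it easy to check that the header and the blank line are present before the script is ever placed on a server.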

Environment Variables

Finally, a discussion of CGI isn't complete without covering environment variables. As we discuss in more detail in Chapter 3, HTTP requests contain common headers that can be used as CGI variables. They are the conduit between the form being used by the Web browser and the back-end CGI program in that they allow certain information to pass to the running CGI program. The purpose of these variables is to better interpret the environment of the running browser. A list of the most common CGI environment variables is provided in Table 1-2.

The list of environment variables in Table 1-2 isn't complete, but it gives you a good idea of the large number of avenues potentially open to attack on a Web server. For more detail on HTTP_USER_AGENT variable assignments, check out http://www.siteware.ch/webresources/useragents/db.html.
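To make the mechanism concrete, here is a short Python sketch of how a CGI program might read a few of these variables from its process environment. The function name read_cgi_request and the simulated environment are assumptions for illustration; the variable names and the fname/lname query string come from Table 1-2.

```python
import os
from urllib.parse import parse_qs


def read_cgi_request(environ=None):
    """Collect a few CGI variables and parse QUERY_STRING into a dict."""
    if environ is None:
        # Under a real Web server the variables arrive in the process environment.
        environ = os.environ
    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "script": environ.get("SCRIPT_NAME", ""),
        "client": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        # parse_qs maps each parameter name to a list of values.
        "params": parse_qs(environ.get("QUERY_STRING", "")),
    }


if __name__ == "__main__":
    # Simulated environment, mirroring the test.cgi example in Table 1-2.
    fake_environ = {
        "REQUEST_METHOD": "GET",
        "SCRIPT_NAME": "/test.cgi",
        "QUERY_STRING": "fname=Stuart&lname=McClure",
    }
    print(read_cgi_request(fake_environ))
```

Note that several of these values, such as QUERY_STRING and HTTP_USER_AGENT, are supplied by the client, so a CGI program should treat them as untrusted input.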

Note: With PHP you can dump all the variables by using the phpinfo() function. For example, if an attacker writes a file to a Web server and the server has PHP running, the attacker can most likely insert the following single line in the file to dump all the variables on the system:

<?php phpinfo(); ?>

Server-Side Includes (SSI): HTML and SHTML

File extension(s): .shtml, .shtm, .stm

SHTML is considered a CGI file because it typically uses server-side includes (SSI) for server-side processing. SHTML is considered the grandfather of server-side processing because it has been around since the beginning of the Web and is still in use today (but sparingly). Like most of the languages we have discussed so far, SSI directives are embedded in the HTML and get picked up by the application mappings of the Web server. The server acts on the directives and sends their output to the browser. A simple example of an SSI is one that returns the server's date in GMT:

<html>

<body>

<!--#echo var="DATE_GMT" -->

</body>

</html>
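What the server does with such a page can be approximated in a few lines of Python. This is a toy sketch under our own assumptions, not how any particular Web server is implemented: the names ECHO_DIRECTIVE and expand_echo_directives are hypothetical, only the echo directive is handled, and the date format and the "(none)" placeholder for unknown variables are choices made for this example.

```python
import re
from datetime import datetime, timezone

# Matches directives of the form <!--#echo var="NAME" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([A-Z_]+)"\s*-->')


def expand_echo_directives(page, variables):
    """Replace each echo directive with the value of the named variable."""

    def substitute(match):
        # Unknown variables expand to a placeholder in this sketch.
        return variables.get(match.group(1), "(none)")

    return ECHO_DIRECTIVE.sub(substitute, page)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    variables = {"DATE_GMT": now.strftime("%A, %d-%b-%Y %H:%M:%S GMT")}
    page = '<html><body><!--#echo var="DATE_GMT" --></body></html>'
    print(expand_echo_directives(page, variables))
```

The browser never sees the directive itself, only the text that replaced it, which is why SSI processing is invisible in the delivered page source.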

Table 1-2. Common CGI Environment Variables
Variable	Description
GATEWAY_INTERFACE	Holds the CGI version number supported on the server. Format: CGI/version (e.g., CGI/1.1)
SERVER_NAME	Holds the server's DNS name, hostname, or IP address that is running the CGI program.
SERVER_SOFTWARE	Holds the name and version of the software that is running on the server. Format: name/version (e.g., Microsoft-IIS/4.0)
QUERY_STRING	Anything that follows the question mark "?" on the URL gets passed to this variable. For example, if two parameters are being passed to the test.cgi script, `http://www.example.com/test.cgi?fname=Stuart&lname=McClure` the QUERY_STRING variable would be `fname=Stuart&lname=McClure`
PATH_INFO	Holds extra information embedded in the URL that may be needed on the server to process a form. The best example is when a program needs to pass a file's location to the CGI program.
SERVER_PROTOCOL	Holds the name and revision number of the protocol being used. Format: protocol/revision (e.g., HTTP/1.1)
SERVER_PORT	Holds the port number the request was sent over; typically, TCP port 80.
REQUEST_METHOD	The HTTP method used to pass information, typically, GET, PUT, POST, or HEAD. For more information about HTTP request methods, see Chapter 3.
CONTENT_TYPE	Holds the content type of the data being passed. For example, CONTENT_TYPE:text/html
CONTENT_LENGTH	Holds the length of the content (POST or PUT) being sent by the client, typically given in bytes.
SCRIPT_NAME	Holds the virtual path to the running script; typically used for self-referencing URLs.
REMOTE_USER	Holds the user name under which the user authenticated to the server.
AUTH_TYPE	Provides the protocol-specific authentication method used to authenticate the user.
PATH_TRANSLATED	Holds the translated version of PATH_INFO sent by the server. This variable takes the path and performs any virtual-to-physical mapping.
REMOTE_HOST	Holds the hostname of the remote system making the request, typically the client.
REMOTE_ADDR	Holds the IP address of the remote system making the request, typically the client.
REMOTE_IDENT	If the Web server supports RFC 931, the REMOTE_IDENT variable will be set to the remote user name retrieved by the server, typically limited to logging.
HTTP_ACCEPT or ACCEPT	Holds the MIME types that will be accepted by the client. For example, text/plain, text/html, www/mime, and application/html.
HTTP_USER_AGENT	Holds the brand and version of Web browser being used by the client.

The distinctive feature of an SSI directive is its opening characters (<!--#). In the preceding example, we used the echo directive with the var attribute to display the variable named DATE_GMT. However, we can perform many functions within an SSI, including those displayed in Table 1-3.

Microsoft's IIS Web Server and SSI

By default, IIS does not allow the CMD command within the EXEC designation. To enable your IIS Web server to allow CMD to work, you must add a registry key and reboot your system:

Key	HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W3SVC\Parameters
Value Name	SSIEnableCmdDirective
Data Type	DWORD
Value	1

Note: If you have this registry key set to a value of 1, be sure to remove it immediately (provided of course that key functionality isn't dependent on it).

Although SSI aren't frequently used today in Web applications, they have been known to pop up from time to time. Be on the lookout for servers and applications that take advantage of these features because a wily hacker may be the next to take advantage.
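One way to be on the lookout is to search page sources for the exec directive. The Python sketch below is a rough illustration under stated assumptions: the names EXEC_DIRECTIVE and find_exec_directives are hypothetical, the page text is assumed to be already loaded into a string, and the pattern only recognizes double-quoted cmd and cgi arguments.

```python
import re

# Matches <!--#exec cmd="..." --> and <!--#exec cgi="..." -->, ignoring case.
EXEC_DIRECTIVE = re.compile(
    r'<!--#exec\s+(cmd|cgi)\s*=\s*"([^"]*)"\s*-->', re.IGNORECASE
)


def find_exec_directives(page):
    """Return (kind, target) pairs for every exec directive in the page."""
    return [
        (match.group(1).lower(), match.group(2))
        for match in EXEC_DIRECTIVE.finditer(page)
    ]


if __name__ == "__main__":
    sample = (
        '<html><body><!--#exec cmd="/bin/ls" -->'
        '<!--#echo var="DATE_GMT" --></body></html>'
    )
    for kind, target in find_exec_directives(sample):
        print(kind, target)
```

Directives are normally consumed by the server before a page is delivered, so a check like this is most useful against the files on the server itself rather than against what a browser receives.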

Table 1-3. Directives and Their Corresponding Functions in SSI
Directive	Function
#echo var=<variable>	Displays the value of the named variable. Many variables that can be accessed with SSI exist, including: DATE_LOCAL displays the local date and time. HTTP_REFERER displays where the user came from. DATE_GMT displays the date and time, according to GMT. REMOTE_ADDR or REMOTE_HOST displays the source IP address or hostname respectively. DOCUMENT_NAME or DOCUMENT_URI displays the Web page's name. HTTP_USER_AGENT displays the Web browser's name.
#exec cmd=<command>	Executes arbitrary commands on the remote system. Typically the cmd command is run on Unix systems only, but Windows systems have been known to allow it as well. This function obviously is a serious security risk and therefore isn't enabled by default in most Web application servers today. However, it was once the source of many security compromises.
#exec cgi=<filename>	Activates a CGI program within the HTML Web page. It often comes in handy with header or footer files that call files to display text.
#fsize file=<filename>	Displays the <filename>'s file size.
#flastmod file=<filename>	Displays the <filename>'s "last modified" date.
#include file=<filename>	Includes the text from the <filename> and displays it in the Web browser.
#include virtual=<filename>	Includes the text from a <filename> given as a virtual path. To include an .shtml file from another directory, the "virtual" form must be used.
#comment	Protects whatever is written after this command because it is unprocessed by the server. This location is also where you can find some juicy information.
