Section 6.3. The Fragile Web | Computer Security Basics

6.3. The Fragile Web

Of all the languages in the world, it is somehow fitting that the one that unifies most people in a common experience is a product of the Electronic Age. It's not English, Chinese, or even Japanese; rather, it is Hypertext Markup Language (HTML). One of the most important standards on the Internet, HTML is a markup language (not a protocol) used to format the text and images people place in their web pages.

The actual heavy lifter for web work on the Internet is Hypertext Transfer Protocol (HTTP), the protocol that carries HTML and other information between web servers and browsers.

It would follow that anything so useful as HTML and HTTP would be subject to special attacks, and this is precisely the case. A whole family of attacks can be perpetrated against these formats, and when web pages become corrupted, they can actually become instruments of attack and destruction themselves. Attacks can take place on the user's end, in which case they are called client-side attacks, or on the servers that send out the web pages that clients view, in which case they are called server-side attacks. This section describes the most important client- and server-side attacks.

6.3.1. How HTML Formats the Web

HTTP is a sparse protocol with a few terse commands such as "GET." What HTTP "gets" is often a small file of HTML code.
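To make the terseness of a "GET" concrete, here is a minimal sketch in Python of what such a request looks like on the wire and how a reply breaks down into a status line, headers, and HTML. The host name www.example.com, the path /index.html, and the canned reply are assumptions chosen for illustration; no real server is contacted.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # the terse command itself
        f"Host: {host}",          # required header in HTTP/1.1
        "Connection: close",
        "",                       # a blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines)


def split_response(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw HTTP response into status line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A canned reply stands in for a real server so the sketch runs offline.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><b>Hello</b></body></html>"
)

if __name__ == "__main__":
    print(build_get_request("www.example.com", "/index.html"))
    status, headers, body = split_response(SAMPLE_RESPONSE)
    print(status)
    print(headers["content-type"])
    print(body)
```

The body of the reply is the HTML that the browser then formats for display.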

HTML consists of blocks of text surrounded by commands called tags. Each tag is enclosed in less-than and greater-than signs (angle brackets), and most tags come in pairs: a start tag (<command>) and an end tag (</command>). The start tag instructs the user's web browser to do something, say turn on the bold text function, and the end tag tells the browser to stop doing whatever the start tag kicked off. Thus the command that makes text bold applies to all the text between <b> and </b>. Given the many tags available in HTML, lots of effects are possible.
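The pairing of start and end tags can be seen by walking through a fragment of HTML with the parser in Python's standard library. This is only a sketch: the class name TagLister and the sample fragment are assumptions made for illustration, and the fragment uses the standard <b> element for bold text.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record each start tag, end tag, and run of text in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip runs that are only whitespace.
        if data.strip():
            self.events.append(("text", data.strip()))


def list_events(html_text: str) -> list[tuple[str, str]]:
    """Parse html_text and return the recorded events."""
    parser = TagLister()
    parser.feed(html_text)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, value in list_events("<p>Plain and <b>bold</b> text</p>"):
        print(kind, value)
```

Only the word between the "start b" and "end b" events would be rendered in bold; the text before and after it belongs to the enclosing paragraph alone.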

More advanced commands deal with inserting images on the page. Image commands are actually a call back to a server to download a picture, which the HTML code positions on a user's screen.

Hyperlinks are also important. A user activates a hyperlink by clicking on it. This calls a new file from the server, which may be a whole new web page. Hyperlinks greatly increase the interactivity of a web site, while making it easy to organize a lot of information with a point-and-click interface.

You can view HTML code simply by calling up a web site and choosing the browser's View Source command. It is usually located in the menus at the top of the browser window. If you cannot find it, use the browser's Help function to look it up.

6.3.2. Advanced Web Services

Merely displaying text, graphics, and photos might make an attractive web site, but is unlikely to make much money. With increased advertising, the Web has become commercialized, and innovation, not all of it strictly beneficial, has become commonplace. In the first place, ads, instead of images, can be downloaded to users. Banner ads, for instance, rotate among a variety of sponsors, who pay for page views, knowing that some percentage of those who view a page will be exposed to the ad. In order to make the ads change from time to time, the embedded commands needed to be a little bit more sophisticated. Various scripting languages have been adapted to this purpose.

A web page script is a little bit of computer code that executes either on the server as the page is being prepared for delivery or on the user's computer as the page is displayed. The scripts that execute on the server are called server-side scripts; scripts that run on the user's machine are called client-side scripts. Using a script, complex functions and displays can be generated programmatically, greatly increasing the functionality and interactivity of web-based applications.

With the power of scripting comes also the opportunity for abuse. It is possible, for instance, to use scripting languages to force disallowed states on either the client or server machines, which can result in unpredictable behavior and crashes. Some scripts can even exploit vulnerabilities in the underlying operating systems. In this case, the scripts become exploits, and can cause serious trouble or damage. A good deal of work has now gone into trying to ensure that scripting languages on the Web are secure, and for the most part they are. However, every now and again, somebody somewhere discovers a new code sequence that can get around the controls and checks. Most browsers enable users to disable scripts and applets, or to run them in a restrictive fashion. For users who stick to mainstream, reputable web sites, malicious scripts are less likely to be a problem.

6.3.2.1. What is a script?

The term script in a computing context dates back to the early 1970s and comes from the Unix operating system term shell script. A shell script is a sequence of commands that are read by the computer from a file. Using scripts, multistep commands can be embedded in a web page. Scripting languages have increased in flexibility and robustness; today, scripts can be executed either at the server as the page is called from memory or at the user's computer as the material is called up for display.

The advantage to client-side scripting is that it allows the local computer to check against data entry errors such as proper abbreviation of a state name, or the format of something such as a phone number, before the data crosses the network and ties up the server, or worse, creates an entry in the server which has to be corrected. The advantage to scripting on the server side is that it helps preserve confidentiality. Processes that involve company secrets are executed in the safety of the server room rather than distributed out to where they can be captured and analyzed. In addition, some processes are so intensive that it might bog down the user's machine to execute them on the client side. Anything that involves lots of tables or uses a database is usually best kept server side.
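The kind of data entry check described above can be sketched as follows. In a browser the check would normally be written in a client-side language such as JavaScript; Python is used here only to keep the examples in one language. The short list of state abbreviations and the NNN-NNN-NNNN phone format are assumptions for illustration.

```python
import re

# Hypothetical subset of valid abbreviations; a real form would list them all.
STATE_ABBREVIATIONS = {"CA", "NY", "TX", "WA"}

# Assumed phone format for this sketch: 555-123-4567
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def validate_entry(state: str, phone: str) -> list[str]:
    """Return a list of problems found; an empty list means the entry passed."""
    problems = []
    if state.strip().upper() not in STATE_ABBREVIATIONS:
        problems.append(f"unknown state abbreviation: {state!r}")
    if not PHONE_PATTERN.fullmatch(phone.strip()):
        problems.append(f"phone number not in NNN-NNN-NNNN form: {phone!r}")
    return problems


if __name__ == "__main__":
    print(validate_entry("ca", "555-123-4567"))
    print(validate_entry("Calif", "5551234567"))
```

A check like this saves a round trip to the server for honest mistakes. It is a convenience rather than a security control, because anyone can send data to the server without running the client-side script, so the server should repeat the same checks.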

Scripting allows a lot of interaction between the client and the server. Responses from the field may be organized and checked by a client-side script, which transmits them to the server. On the server side, a script may use the information contained in a client's response to modify on the fly what the server sends back to the user. This way the application shows the user exactly what is required or what has changed as a result of the user's entry. A raft of conventional business applications can be operated in real time on the Web using judicious scripting. Focusing demands for computational power on the server also allows for the use of simpler clients. This means that cell phones, video games, and personal digital assistants (PDAs) can be viable terminals in complex interactions.

6.3.2.2. Client-side scripting languages

There are several different scripting languages. While a few are used both client side and server side, most languages are used on one or the other. Language choice is generally based on the specific features or characteristics of the language, adherence to company- or project-wide standards, or simply personal preference.

Occasionally the native browser does not have the program code required to execute everything that is asked of it. Certain media files, for instance, may require special code to play back properly. A plug-in is a software component that loads automatically to extend the range of file types that a web browser can understand and execute.

These are the most popular scripting languages for client-side applications:

JavaScript
VBScript
Java (requires plug-ins on the receiving machine)
ActiveX Controls
Macromedia Flash (requires a plug-in on the receiving machine)

Multiple scripts can run on the same web page, but languages cannot be mixed within the same script.

JavaScript (not to be confused with programs written in Java) is one of the earliest languages and is supported by both major browsers. It was introduced first for Netscape Navigator and, when it was adopted by Microsoft for Internet Explorer, it was called JScript.

JavaScript is based on computer languages such as C, C++, and Perl. VBScript, on the other hand, is based on Microsoft's Visual Basic. The major difference between JavaScript and VBScript is the command syntax. However, only Internet Explorer natively supports VBScript. To make VBScript work on Netscape Navigator requires a plug-in.

Java may require the use of a browser plug-in, but it is a popular language because it can operate on any computer that can host a Java Runtime Environment (JRE). The programmer must write the Java code and compile it, but thereafter it will run on any type of computer or device that has a JRE, with little extra programming effort. Java programs for the client side are usually called applets. When they are built to run on the server side, they are called servlets.

Java has lots of power, but it has certain built-in security features that are designed to keep it from doing anything damaging to the machine on which it is running. These features create a virtual environment for the Java code to run in, called a sandbox. The Java security manager keeps the Java code from crawling out of the sandbox. A Java applet that tries to escape the sandbox and do harm to the client is called a malicious applet.

ActiveX is a powerhouse technology that features a lot of capabilities and a fair amount of danger. ActiveX controls can be integrated into web pages and provide numerous functions such as buttons, drop-down list boxes, text entry, and display fields. ActiveX controls can take nearly complete control of the host machine, unlike Java, which is at least partially constrained by the security manager. ActiveX can cause real trouble if malicious code is included. On the other hand, ActiveX is powerful and flexible, making it highly attractive to programmers and developers.

6.3.2.3. Server-side scripting languages

When a server-side script runs, it is often because a user has submitted answers to an application or form. The user then downloads a page that shows the effect of the user's submissions. A general property of all server-side processing is that when a user submits a form, the server processes it and creates a new HTML page. The user downloads this page to observe the effect of what they have requested, in effect, the completed form.

The use of scripts on the server is divided between two main paths. Some setups communicate with the server directly; others communicate via the Common Gateway Interface (CGI).

The Common Gateway Interface (CGI) is a middleman between the web server and the programs that process client requests. CGI establishes the rules under which the web server and these programs talk to each other: the server passes the request data to the program, and the program returns a response for the server to send back to the client. This lets unlike languages and systems work together, so CGI programs can be written in many languages. C, C++, Java, Perl, Visual Basic, or any other language that can accept user input, process it, and output a response can be used in CGI scripts.
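As a rough illustration of the CGI conventions, the following Python sketch reads form data from the QUERY_STRING environment variable, which is where a web server places the data for a GET request, and writes a header, a blank line, and an HTML page to standard output. The form field "name" and the function name handle_request are assumptions made for this example.

```python
import html
import os
from urllib.parse import parse_qs


def handle_request(environ) -> str:
    """Build a complete CGI response: header, blank line, then the HTML body."""
    # For a GET request the web server places the form data in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field used only for this sketch.
    name = fields.get("name", ["stranger"])[0]
    # Escape the value so that any tags the user typed are shown, not executed.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A CGI program writes its response to standard output.
    print(handle_request(os.environ), end="")
```

Because the contract is nothing more than environment variables, standard input, and standard output, the same program could be rewritten in any of the languages listed above without the web server noticing the difference.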

If the scripting language is portable between platforms, such as Java, the script can play in multiple environments, and CGI might not be employed.

Scripting languages for server-side applications include:

Perl
Active Server Pages (ASP)
PHP
ColdFusion
Java Server Pages (JSP)
Python

Which programming language to use is a matter of preference and performance. Perl, PHP, and Ruby tend to look like code written in C, C++, or Java. They bristle with curly braces. ColdFusion code, on the other hand, is stuffed with HTML-like tags. This bothers some programmers because the resulting code seems to lack precision. Metrics that matter include reliability, flexibility, ease of making changes, speedy performance, and an ability to wring the most performance out of the computer it runs on.

Support for the application is also important. A firm that specializes in Windows applications but also uses some Unix or Linux merely because one of its employees is a whiz at it may find itself in a difficult position if that employee leaves.

Perl is a general-purpose language that has been widely used for the Web and web applications. It has a long history of service. Veteran programmers swear by it. Perl is also an open source language, which means it is produced and supported by a community that allows members to access the internal workings of the language. This may increase its inherent security because any changes will be seen by many eyes, rather than a few employees working under the cover of confidentiality agreements. Combining Perl with an open source web server, such as Apache, creates a powerful combination for web servers.

Active Server Pages can be written in JavaScript, Perl, Python, and VBScript. ASP has the advantage of being able to work with COM objects. These small, portable code objects (often invoked with a single line) can provide similar capabilities in different applications. By reusing previously developed COM objects, developers can reduce the amount of code they must write and can more easily import advanced or complicated functions into the web server.

PHP is open source. Because PHP resembles C, Java, and Perl, this can shorten the time it takes to learn it if you are already an experienced programmer. PHP features easy interface with numerous databases and allows you to program on both Windows and Unix machines.

ColdFusion is a popular web development tool because it uses a system of tags called ColdFusion Markup Language (CFML), which resembles HTML. The ColdFusion development environment tries to automate the coding process as much as possible. This may limit coding errors, since less is done by hand.

Java Server Pages (JSP), or servlets, are a set of Java classes optimized for client server interactions. JSP operates with true Java, which brings portability, multithreading, extensive class libraries, strong safety features, robust security measures, and several other of Java's advantages. Java servlets are efficient because they are persistent. Once created, usually at the time of the first request, the servlets stay resident in memory as compact Java objects. When a subsequent request comes by, the servlet can quickly build the HTML page the client requires. With ASP, the code may need to be reinterpreted for every client request, slowing down the process of page generation.

Python is an interpreted, interactive, object-oriented programming language that resembles Perl and Java. The syntax is clear and avoids nested curly braces. Python is copyrighted but freely usable and distributable, even for commercial use. Python can run on many brands of Unix, Windows, Macintosh, and others.

Each of these programming languages can make the web experience better for the user and more efficient for the provider. However, each of these environments can be a source of trouble. Every language or technology has its own set of bugs and exploits that server operators must track and patch against. It is important to monitor language-specific newsgroups and bulletins to learn about the problems and solutions.

6.3.3. Web Attacks and Preventions

The following sections describe both client-side and server-side web attacks, and what you can do about them.

6.3.3.1. Client-side web attacks

Client-side web attacks include the following:

Malicious HTML tags in web requests

Malicious code in a form window can cause the server to generate pages that are unpredictable or dangerous if run on the server. Malformed pages sent back to the client for execution may cause further problems.

Curative: Webmasters must not allow nonvalidated input. Client-side scripting can clean up form data before it is transmitted, but the server must validate the data again, because client-side checks can be bypassed.

Malicious code from other clients

A web site with a discussion group may be open to attacks of the form:

 Hello Group- Here is my message! <SCRIPT>malicious code</SCRIPT> That is all!

If a victim client has scripting enabled, their browser may run this code unexpectedly.

Curative: Users should turn off script functions; web servers should screen for embedded tags that show a script may be present.

Clients sending malicious code to themselves

An attacker can slip a client a message or file and encourage them to post it to the server. When the server echoes or displays the posting, the client's machine may execute it.

Curative: Webmasters should screen data, even if the intended recipient is the client that sent it.

Abuse of tags

Tags such as <FORM>, normally harmless enough, can cause trouble if they're embedded at the wrong place. An intruder can trick users into revealing sensitive information by modifying the behavior of an existing form or can display information that may have been held in the form of a previous user.

Other HTML tags can alter the appearance of a page, insert unwanted or offensive images or sounds, break things, and otherwise disturb the peace by interfering with the page's intended appearance and behavior.

Curative: Set browser security to high and lower it only for those users you are sure will not violate that trust.

Poisoned cookies

While visiting a web site, a simple text file called a cookie is often placed in the user's computer. At the next visit, the web server scans for cookies, and if it locates one, can use the cookie data to recall the previous conversation. A poisoned cookie is one that has been altered to trigger the download of malicious code.

Curative: Keep security settings high until trust is earned. Scan all incoming files (cookies included) for viruses to prevent the injection of malicious code.

Using the wrong character set

Browsers interpret the information they receive according to a character set. If the web page does not specify one, the browser falls back on a default or on the character set chosen by the user, which can result in garbled displays or unintended meanings.

Curative: Web pages should explicitly declare their character set. Users should also set a sensible default character set when configuring their browsers.
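Several of the curatives above call for screening data before it is sent back to a browser. One common way to do this is to escape the characters that give HTML tags their meaning, so that a posted <SCRIPT> tag is displayed as text instead of being run. The sketch below uses html.escape from Python's standard library; the function name render_post and the surrounding <div> are assumptions for illustration.

```python
import html


def render_post(message: str) -> str:
    """Return an HTML fragment that displays message as plain text."""
    # html.escape replaces &, <, >, and quote characters with entities,
    # so the browser shows any embedded tags instead of interpreting them.
    safe = html.escape(message)
    return f'<div class="post">{safe}</div>'


if __name__ == "__main__":
    posting = ("Hello Group- Here is my message! "
               "<SCRIPT>malicious code</SCRIPT> That is all!")
    print(render_post(posting))
```

Escaping on output protects every reader of the page, including the client that submitted the posting, regardless of whether their browsers have scripting enabled.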

6.3.3.1.1. General client-side attack preventatives

General measures that can be taken to help prevent client-side attacks include the following:

There are a host of security options built into most browsers, but it is up to the user to tailor them to her specific situation and needs. Web page developers should filter their page output to eliminate these types of problems.
Users should set security high and set scripting off. This may disable some web functionality, so users should know how to make changes to browser configurations as required.
Remember the client may not be the intended victim. A carefully constructed attack may execute script on the client machine that is designed not to hurt the client, but rather some other computer or network to which the client is connected. For instance, the client may have cached security information when it last connected to a particular server, and this authorization information may be co-opted by the attacker in order to attack the server via the client.

6.3.3.2. Server-side web attacks

Servers can be attacked just as easily as clients, or perhaps more readily. Servers have the dual disadvantage of having to be exposed to many users, and possibly also to the Internet.

Buffer overflows

One of the most serious attacks against a server involves causing an intentional buffer overflow. Although the arrangement varies slightly from computer to computer and from operating system to operating system, in most computers, memory (RAM) is organized by roping off a piece for the operating system, then roping off a section to be used for temporary variable storage called the stack. Above the stack is cordoned off yet another section of memory, this one called the heap, after which is the memory storage spot for code waiting for execution.

If one of these areas, often the stack, suddenly grows too large, it may overwrite the area above it. This is called smashing the stack. When this happens the values that were stored in those regions are changed to whatever was being written into memory at the time the overwriting occurred. This may cause the computer to behave erratically or to crash.

If the values designed to be overwritten are chosen with extreme care, they may actually end up being stored, as if they were instructions. They may execute the next time the computer reads those memory locations. This is one way to inject arbitrary code into the server; such code could be instructions that allow an attacker to take over the computer.

One popular approach is to install back door code that allows an intruder to enter the machine without going through security first. Another approach is to install a rootkit, code that promotes the privileges of the attacker to those of the system administrator. Either would be a home run. It would be enough to install a code stub to facilitate installing more code later on. This can be as simple as telling the computer to go to a previously prepared web address or URL, where the malicious code is waiting.

Curative: The defense against buffer overflows is good programming practice. No user input should ever be permitted without first verifying that it is of the correct length and that it contains no characters that may be invalid or that may be misinterpreted.
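The overwrite and its cure can be imitated in a few lines of Python. This is a simulation only: Python itself checks array bounds, so a bytearray stands in for raw memory, with an 8-byte input buffer followed by a 4-byte value that should never change. The layout, the sizes, and the function names are assumptions made for illustration.

```python
BUFFER_SIZE = 8  # assumed size of the input buffer in this simulation


def make_memory() -> bytearray:
    """Simulated memory: an 8-byte input buffer followed by a 4-byte saved value."""
    return bytearray(b"\x00" * BUFFER_SIZE + b"SAFE")


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy input without checking its length, as careless code might."""
    for i, byte in enumerate(data):
        memory[i] = byte  # bytes past BUFFER_SIZE land on the saved value


def checked_copy(memory: bytearray, data: bytes) -> None:
    """Copy input only after verifying its length and its characters."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input longer than buffer")
    if not all(32 <= byte < 127 for byte in data):
        raise ValueError("input contains non-printable characters")
    for i, byte in enumerate(data):
        memory[i] = byte


if __name__ == "__main__":
    attack = b"A" * BUFFER_SIZE + b"EVIL"

    memory = make_memory()
    unchecked_copy(memory, attack)
    print(bytes(memory[BUFFER_SIZE:]))  # the saved value has been replaced

    memory = make_memory()
    try:
        checked_copy(memory, attack)
    except ValueError as error:
        print("rejected:", error)
    print(bytes(memory[BUFFER_SIZE:]))  # the saved value is untouched
```

In a real attack the overwritten region would hold something the processor later acts on, such as a return address, which is how carefully chosen input ends up being executed.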

Stolen cookies

Because most Internet transactions last only as long as you are logged on, all the information about that session usually dissolves as soon as the link is broken. Cookies remind the server who you are and what you talked about last time. If someone copies your cookies, the potential exists for them to trick the server into revealing what it knows about you. This may not be critical, but if the server has a credit card number on file, or personal information, this can expose too much information about you.

This becomes a more powerful attack if a really complex business process is involved. For instance, if one server hosts your credit card information and address information, and a second holds your academic records, while a third holds your current schedules, it may be possible to use minimal access to one server to trick the others into revealing a great deal. This exploits the server-to-server trust. These attacks are called cross-site scripting (XSS, an abbreviation now preferred over CSS to avoid confusion with Cascading Style Sheets).

Curative: Cross-site scripting is a complicated exploit. Once the attacker is communicating from server-to-server as a peer, many unpredictable things can happen. The strongest defense is for each server to be properly protected in the first place.
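Why a copied cookie is so valuable can be seen in a small sketch of how a server might use one. The server keeps a table of sessions and trusts whoever presents a matching cookie value. The table, the cookie name "session", and the function names are assumptions for illustration. The HttpOnly and Secure attributes shown are standard cookie attributes that are not discussed in this section; they make a cookie harder to copy but do not make it impossible.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side table: session id -> user name.
SESSIONS: dict[str, str] = {}


def start_session(user: str) -> str:
    """Create a session and return the Set-Cookie value the server would send."""
    session_id = secrets.token_hex(16)  # unguessable random identifier
    SESSIONS[session_id] = user
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # hide the cookie from page scripts
    cookie["session"]["secure"] = True    # send it only over encrypted links
    return cookie["session"].OutputString()


def user_for_request(cookie_header: str):
    """Look up the user for the Cookie header value a browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel else None


if __name__ == "__main__":
    set_cookie = start_session("alice")
    print(set_cookie)
    # The browser returns only the name=value pair on later requests.
    returned = set_cookie.split(";")[0]
    print(user_for_request(returned))
    print(user_for_request("session=guessed-value"))
```

The lookup has no way of telling the rightful owner from a thief: anyone who presents the same name=value pair is treated as the same user.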

Malicious ActiveX controls

ActiveX, which is built on Microsoft's Component Object Model (COM), allows powerful interactions between computers. An ActiveX control is downloaded and run by the browser much as a Java applet is, but ActiveX does not use an isolating area of memory as does Java. Java's fine-grained access controls prevent a lot of problems because they confine the code to allowable portions of the machine it is running on. Malicious code using ActiveX, on the other hand, can have much more power. Such code can facilitate many computers working together, or it can be exploited, with devastating consequences.

TIME (XNTP) exploits

Many network processes require that servers and clients keep track of time, and that there is no significant variance between elements in the network. Time is kept consistent across the network using the Network Time Protocol (NTP), often implemented by the xntpd daemon. If an attacker can assign legitimate times to fake messages, the messages could masquerade as real ones. Alternatively, a legitimate message can be rendered suspect by shifting its time component to an illegitimate value.