6.3. The Fragile Web
Of all the languages in the world, it is somehow fitting that the one that unifies most people in a common experience is a product of the Electronic Age. It's not English, Chinese, or even Japanese; rather it is Hypertext Markup Language (HTML). One of the most important protocols on the Internet, HTML is used to format the text and images people place in their web pages.
The actual heavy lifter for web work on the Internet is Hypertext Transfer Protocol, a special protocol that carries HTML codes and other information over the Internet efficiently.
It would follow that anything so useful as HTML and HTTP would be subject to special attacks, and this is precisely the case. A whole family of attacks can be perpetrated against these formats, and when web pages become corrupted, they can actually become instruments of attack and destruction themselves. Attacks can take place on the user's end, in which case they are called client-side attacks, or on the servers that send out the web pages that clients view, in which case they are called server-side attacks. This section describes the most important client- and server-side attacks.
6.3.1. How HTML Formats the Web
HTTP is a sparse language with a few terse commands such as "GET." What HTTP "gets" is a small file of HTML code.
HTML is blocks of text surrounded by commands. The commands are generally embedded within greater than and less than signs, which are called start tags (<command>) and end tags (</command>), respectively. The start tag instructs the user's web browser to do something, say turn on the bold text function, and the end tag tells the browser to stop doing whatever the start tag kicked off. Thus the command that makes text bold applies to all the text between <bold> and </bold>. Given the many available text and commands in the HTML language, lots of effects are possible.
More advanced commands deal with inserting images on the page. Image commands are actually a call back to a server to download a picture, which the HTML code positions on a user's screen.
Hyperlinks are also important. A user activates a hyperlink by clicking on it. This calls a new file from the server, which may be a whole new web page. Hyperlinks greatly increase the interactivity of a web site, while making it easy to organize a lot of information with a point-and-click interface.
You can view HTML code simply by calling up a web site and clicking on the browser's View Source command. It is usually located on the command line at the top of the browser window. If you cannot find it, use the browser's Help function to look it up.
6.3.2. Advanced Web Services
Merely displaying text, graphics, and photos, might make an attractive web site, but is unlikely to make much money. With increased advertising, the Web has become commercialized, and innovation, not all of it strictly beneficial, has become commonplace. In the first place, ads, instead of images, can be downloaded to users. Banner ads, for instance rotate among a variety of sponsors, who pay for page views, knowing that some percentage of those who view a page will be exposed to the ad. In order to make the ads change from time to time, the embedded commands needed to be a little bit more sophisticated. Various scripting languages have been adapted to this purpose.
A web page script is a little bit of computer code that executes either as the page loads into the computer from the server or on the user's computer as the page is displayed. The scripts that execute on the server are called server-side scripts; scripts that run on the user's machine are called client-side scripts. Using a script, complex functions and displays can be generated programmatically, greatly increasing the functionality and interactivity of web-based applications.
With the power of scripting comes also the opportunity for abuse. It is possible, for instance, to use scripting languages to force disallowed states on either the client or server machines, which can result in unpredictable results and crashes. Some scripts can even exploit vulnerabilities in the underlying operating systems. In this case, the scripts become exploits, and can cause serious trouble or damages. A good deal of work has now gone into trying to ensure that scripting languages on the Web are secure, and for the most part they are. However, every now and again, somebody somewhere discovers a new code sequence that can get around the controls and checks. Most browsers enable users to disable scripts and applets, or to run them in a restrictive fashion. Assuming users stay in the mainstream in terms of web sites, it should not be an issue.
220.127.116.11. What is a script?
The term script in a computing context dates back to the early 1970s and comes from the Unix operating system term shell script. A shell script is a sequence of commands that are read by the computer from a file. Using scripts, multistep commands can be embedded in a web page. Scripting languages have increased in flexibility and robustness; today, scripts can be executed either at the server as the page is called from memory or at the user's computer as the material is called up for display.
The advantage to client-side scripting is that it allows the local computer to check against data entry errors such as proper abbreviation of a state name, or the format of something such as a phone number, before the data crosses the network and ties up the server, or worse, creates an entry in the server which has to be corrected. The advantage to scripting on the server side is that it helps preserve confidentiality. Processes that involve company secrets are executed in the safety of the server room rather than distributed out to where they can be captured and analyzed. In addition, some processes are so intensive that it might bog down the user's machine to execute them on the client side. Anything that involves lots of tables or uses a database is usually best kept server side.
Scripting allows a lot of interaction between the client and the server. Responses from the field may be organized and checked by a client-side script, which transmits them to the server. On the server side, a script may use the information contained in a client's response to modify on the fly what the server sends back to the user. This way the application shows the user exactly what is required or what has changed as a result of the user's entry. A raft of conventional business applications can be operated in real time on the Web using judicious scripting. Focusing demands for computational power on the server also allows for the use of simpler clients. This means that cell phones, video games, and personal digital assistants (PDAs) can be viable terminals in complex interactions.
18.104.22.168. Client-side scripting languages
There are several different scripting languages. While a few are used both client side and server side, most languages are used on one or the other. Language choice is generally based on the specific features or characteristics of the language, adherence to company- or project-wide standards, or simply personal preference.
Occasionally the native browser does not have the program code required to execute everything that is asked of it. Certain media files, for instance, may require a special code to play back properly. A plug-in is a software component that loads automatically to extend the range of files types that a web browser can understand and execute.
These are the most popular scripting languages for client-side applications:
Multiple scripts can run on the same web page, but languages cannot be mixed within the same script.
Java may require the use of a browser plug-in, but it is a popular language because it can operate on any computer that can host a Java Runtime Engine (JRE). The programmer must write the Java code and compile it, but thereafter it will run on any type of computer or device that has a JRE, with little extra programming effort. Java scripts for the client side are usually called applets. When they are constructed to use as server-side, they are called servlets.
Java has lots of power, but it has certain built-in security features that are designed to keep it from doing anything damaging to the machine on which it is running. These features create a virtual environment for the Java code to run in, formerly called a sandbox. The Java security manager keeps the Java code from crawling out of the sandbox. A Java that tries to escape the sandbox and do harm to the client is called a malicious applet.
ActiveX is a powerhouse language that features a lot of capabilities and a fair amount of danger. ActiveX controls can be integrated into web pages and use numerous functions such as buttons, drop-down list boxes, text entry, and display fields. ActiveX controls can take nearly complete control of the host machine, unlike Java, which is at least partially constrained by the security manager. ActiveX can cause real trouble if malicious code is included. On the other hand, ActiveX is powerful and flexible, making it highly attractive to programmers and developers.
22.214.171.124. Server-side scripting languages
When a server-side script runs, it is often because a user has submitted answers to an application or form. The user then downloads a page that shows the effect of the user's submissions. A general property of all server-side processing is that when a user submits a form, the server processes it and creates a new HTML page. The user downloads this page to observe the effect of what they have requested, in effect, the completed form.
The use of scripts on the server is divided between two main paths. Some set ups communicate with the server directly; others communicate via the Common Gateway Interface (CGI).
The Common Gateway Interface (CGI) is a middleman between the web server and the clients. CGI establishes the rules under which various programs will talk to each other. The purpose of the CGI is to translate between unlike languages and systems. CGI programs can be written in many languages. C, C++, Java, Perl, Visual Basic, or any other language that can accept user input, process it, and output a response, can be used in CGI scripts.
If the scripting language is portable between platforms, such as Java, the script can play in multiple environments, and CGI might not be employed.
Scripting languages for server-side applications include:
Which programming language to use is a matter of preference and performance. Perl, PHP, and Ruby tend to look like code written in C, C++, or Java. They bristle with curly braces. ColdFusion code, on the other hand, is stuffed with HTML-like tags. This bothers some programmers because the resulting code seems to lack precision. Metrics that matter include reliability, flexibility, ease of making changes, speedy performance, and an ability to wring the most performance out of the computer it runs on.
Support for the application is also important. A firm that specializes in Windows applications that also uses some Unix or Linux merely because one of the employees is a whiz at it may find themselves in a difficult position if that employee leaves.
Perl is a language tightly focused on the Web and web applications. It has a long history of service. Veteran programmers swear by it. Perl is also an open source language, which means it is produced and supported by a community that allows members to access the internal workings of the language. This may increase its inherent security because any changes will be seen by many eyes, rather than a few employees working under the cover of confidentiality agreements. Combining Perl with an open source web server, such as Apache, creates a powerful combination for web servers.
PHP is open source. Because PHP resembles C, Java, and Perl, this can shorten the time it takes to learn it if you are already an experienced programmer. PHP features easy interface with numerous databases and allows you to program on both Windows and Unix machines.
ColdFusion is a popular web development tool because it uses a system of tags called ColdFusion Markup Language (CFML), which resembles HTML. The ColdFusion development environment tries to automate the coding process as much as possible. This may limit coding errors, since less is done by hand.
Java Server Pages (JSP), or servlets, are a set of Java classes optimized for client server interactions. JSP operates with true Java, which brings portability, multithreading, extensive class libraries, strong safety features, robust security measures, and several other of Java's advantages. Java servlets are efficient because they are persistent. Once created, usually at the time of the first request, the servlets stay resident in memory as compact Java objects. When a subsequent request comes by, the servlet can quickly build the HTML page the client requires. With ASP, the code may need to be reinterpreted for every client request, slowing down the process of page generation.
Python is an interpreted, interactive, object-oriented programming language that resembles Perl and Java. The syntax is clear and avoids nested curly braces. Python is copyrighted but freely usable and distributable, even for commercial use. Python can run on many brands of Unix, Windows, Macintosh, and others.
Each of these programming languages can make the web experience better for the user and more efficient for the provider. However, each of these environments can be a source of trouble. Every language or technology has its own set of bugs and exploits that server operators must track and patch against. It is important to monitor language-specific newsgroups and bulletins to learn about the problems and solutions.
6.3.3. Web Attacks and Preventions
The following sections describe both client-side and server-side web attacks, and what you can do about them.
126.96.36.199. Client-side web attacks
Client-side web attacks include the following:
188.8.131.52.1. General client-side attack preventatives
General measures that can be taken to help prevent client-side attacks include the following:
184.108.40.206. Server-side web attacks
Servers can be attacked just as easily as clients, or perhaps more readily. Servers have the dual disadvantage of having to be exposed to many users, and possibly also to the Internet.