Step 3: Analyzing Each Web Resource

Our ultimate goal in site linkage analysis is to classify the resources within a Web site or a Web application in an organized way. From this classification, we should be able to take a quick look at each resource and determine the role it plays in the overall application. In our fieldwork we often create a matrix of Web resources and check off each element in the matrix as we go along. To illustrate how such a matrix is built, we present the techniques that we use successfully in our analytic work: the seven techniques associated with Step 3 in Figure 8-1.

1. Extension Analysis

Looking at file extensions in the URL path gives us an idea about the technologies being used. We can determine, for example, whether a resource has static content or is a program that generates dynamic content. We can determine the underlying technology used to host this resource, especially in the case of application servers. Let's consider again the four Web resources from http://www.example.com/:

http://www.example.com/index.asp

http://www.example.com/login/login.asp

http://www.example.com/scripts/search.cgi

http://www.example.com/scripts/mail.pl

Note the three types of extensions: .asp, .cgi, and .pl. They indicate that technologies such as Microsoft's Active Server Pages and Perl are in use. The extension .cgi is too generic to tell us anything specific. Extensions such as .jsp, .cfm, and .php represent server-side scripts or programs. Other extensions, such as .html, .htm, and .txt, signify static HTML or ASCII content. If you've forgotten what these extensions represent, take a quick look at Chapter 1. In this way you can identify the different technologies in use just by inspecting the final extension. Hence the file or resource extension provides essential information about the resource's type and the technology used to host it.
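
To make the idea concrete, here is a minimal Python sketch of extension analysis. The EXTENSION_HINTS table and the classify_by_extension function are our own illustrative names, not part of any tool; the mapping only restates the extensions discussed above, so treat its output as a first guess to be confirmed by the other techniques in this step.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup table assembled from the extensions discussed above;
# extend it with whatever else turns up during an assessment.
EXTENSION_HINTS = {
    ".asp": "dynamic: Microsoft Active Server Pages",
    ".jsp": "dynamic: JavaServer Pages",
    ".cfm": "dynamic: ColdFusion",
    ".php": "dynamic: PHP",
    ".pl": "dynamic: Perl",
    ".cgi": "dynamic: CGI program (language not evident)",
    ".html": "static: HTML",
    ".htm": "static: HTML",
    ".txt": "static: ASCII text",
}


def classify_by_extension(url: str) -> str:
    """Guess what kind of resource a URL names from its final extension."""
    path = urlparse(url).path                    # ignore host and query string
    suffix = PurePosixPath(path).suffix.lower()  # "/login/login.asp" -> ".asp"
    if not suffix:
        return "unknown: no extension"
    return EXTENSION_HINTS.get(suffix, f"unknown: {suffix}")


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.asp",
        "http://www.example.com/login/login.asp",
        "http://www.example.com/scripts/search.cgi",
        "http://www.example.com/scripts/mail.pl",
    ):
        print(f"{url:45} {classify_by_extension(url)}")
```

Resources with no extension at all, such as the servlets discussed next, come back as unknown, which is exactly the case where URL path analysis takes over.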

2. URL Path Analysis

If extension analysis doesn't fully reveal the technologies at work, the next step is to analyze the entire URL path. Strings such as /scripts/, /cgi-bin/, or /servlet/ within a URL path help reveal the underlying application server technology. Many URLs also have a query string component, and the parameters in the query string give further clues about the resource's functionality. For example, the /servlet/ URL prefix in the following resources found on http://www.example.com suggests that "showtrans" and "setcustinfo" are Java servlets:

http://www.example.com/servlet/showtrans?id=13445466

http://www.example.com/servlet/setcustinfo
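
A similar sketch for URL path analysis, again in Python and again hypothetical: analyze_url_path splits a URL into its final path segment, any telltale directory markers from the assumed PATH_HINTS table, and its query string parameters, using only urllib.parse from the standard library.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical path markers and the technology each one usually hints at.
PATH_HINTS = {
    "/servlet/": "Java servlet",
    "/cgi-bin/": "CGI program",
    "/scripts/": "server-side script directory",
}


def analyze_url_path(url: str) -> dict:
    """Split a URL into the clues used in URL path analysis."""
    parts = urlparse(url)
    path = parts.path.lower()
    return {
        # last path segment, e.g. "showtrans"
        "resource": parts.path.rstrip("/").rsplit("/", 1)[-1],
        # every known marker found anywhere in the path
        "hints": [tech for marker, tech in PATH_HINTS.items() if marker in path],
        # query string parameters, e.g. {"id": ["13445466"]}
        "params": parse_qs(parts.query),
    }


if __name__ == "__main__":
    for url in (
        "http://www.example.com/servlet/showtrans?id=13445466",
        "http://www.example.com/servlet/setcustinfo",
    ):
        print(analyze_url_path(url))
```

For the two servlet URLs above, the sketch reports the "Java servlet" hint for both and the id parameter for showtrans only.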

3. Session Analysis

In applications where maintaining session state is essential, we usually come across one of two proven methods for doing so. HTTP application sessions are maintained either by passing a session identifier as a hidden field in HTML forms or by using cookies. Sometimes the session identifier is passed as part of the URL itself, which has the same effect as passing it in a hidden field.

Close inspection of the following URL on www.example.com reveals the field name "id" with an alphanumeric string:

http://www.example.com/login/secure/transaction.asp?id=AD678GWRT67344&user=bob&doit=add

The parameter "id" is responsible for maintaining the application session. The value of "id" is the session identifier string, which is generated by some specific algorithm. The session identifier allows the application to maintain state information about all current client sessions.
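
Spotting a session identifier in a query string is largely a matter of judgment, but a simple heuristic helps triage long URL lists. The Python sketch below flags any parameter whose value is a long mix of letters and digits; the 12-character threshold and the function names are assumptions for illustration, not a rule.

```python
from urllib.parse import parse_qsl, urlparse

MIN_TOKEN_LENGTH = 12  # assumed threshold; tune it per application


def looks_like_token(value: str) -> bool:
    """Heuristic: a long alphanumeric string that mixes letters and digits."""
    return (
        len(value) >= MIN_TOKEN_LENGTH
        and value.isalnum()
        and any(c.isdigit() for c in value)
        and any(c.isalpha() for c in value)
    )


def find_session_candidates(url: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs from the query string that may be session IDs."""
    return [
        (name, value)
        for name, value in parse_qsl(urlparse(url).query)
        if looks_like_token(value)
    ]


if __name__ == "__main__":
    print(find_session_candidates(
        "http://www.example.com/login/secure/transaction.asp"
        "?id=AD678GWRT67344&user=bob&doit=add"))
```

Applied to the transaction.asp URL above, it flags only id=AD678GWRT67344; the purely numeric id of /servlet/showtrans falls outside the heuristic and would still need a manual look.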

Session management can also be performed with HTTP cookies. Some newer Web application servers use HTTP cookies to provide built-in support for session management. On www.example.com, if we send an HTTP GET request for /login.asp, we get the following response:

nc www.example.com 80
GET /login.asp HTTP/1.0
 
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Mon, 18 Mar 2002 11:53:33 GMT
Connection: Keep-Alive
Content-Length: 496
Content-Type: text/html
Set-Cookie: examplelogin=3D23523ADAF34546124AE8; path=/
Cache-control: private

The HTTP response header instructs the browser to set a cookie "examplelogin," which has a hexadecimal encoded string as a session identifier. The browser sends this cookie with each HTTP request to the server for the duration of the browsing session.
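
The cookie in this response can also be picked apart programmatically. This sketch, using Python's standard http.cookies module, pastes in the headers captured above and extracts each Set-Cookie value and path. The session_only flag reflects the fact that a cookie with no Expires or Max-Age attribute lasts only for the browsing session. extract_cookies is an illustrative name, and the parser assumes a well-formed header block.

```python
from http.cookies import SimpleCookie

# The response headers captured above, pasted in as a string.
RAW_RESPONSE = """HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Mon, 18 Mar 2002 11:53:33 GMT
Connection: Keep-Alive
Content-Length: 496
Content-Type: text/html
Set-Cookie: examplelogin=3D23523ADAF34546124AE8; path=/
Cache-control: private
"""


def extract_cookies(raw_response: str) -> dict[str, dict]:
    """Pull every Set-Cookie header out of a raw HTTP response header block."""
    cookies = {}
    for line in raw_response.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        if name.strip().lower() != "set-cookie":
            continue
        jar = SimpleCookie()
        jar.load(value.strip())
        for key, morsel in jar.items():
            cookies[key] = {
                "value": morsel.value,
                "path": morsel["path"],
                # no Expires/Max-Age: the browser drops it when the session ends
                "session_only": not morsel["expires"] and not morsel["max-age"],
            }
    return cookies


if __name__ == "__main__":
    print(extract_cookies(RAW_RESPONSE))
```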

4. Form Determination

HTML forms are the most common means of allowing the user to interact with a Web application. HTML forms capture input from the user and pass it to the application. Neither HTML nor HTTP is designed with any way to validate the input being sent from the browser to the Web server. The burden of inspecting, validating, and cleaning up the input received from the browser lies with the Web application. HTML forms constitute a crucial entry point to the Web application, from a security point of view.

We can fairly easily determine which Web resources have HTML forms embedded in them: a simple substring search for the <FORM> tag reveals all of them. Along with looking for the presence of HTML forms, we also look at the fields passed within each form. Many forms contain hidden fields that carry critical information. Each form has its own purpose and validation technique, so it is important to understand the purpose of each form and link it to the set of attacks that could be carried out against it. Web resources with forms should be identified and given special consideration during security analysis.
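
The <FORM> search can go one step further and list the fields of each form. The following Python sketch uses the standard html.parser module, which reports tag and attribute names in lowercase, so <FORM> and <form> are treated alike. FormFinder and hidden_fields are hypothetical helpers, and SAMPLE_HTML is made-up markup rather than a page from www.example.com.

```python
from html.parser import HTMLParser

# Hypothetical page content, used only to exercise the parser.
SAMPLE_HTML = """
<FORM action="/login/login.asp" method="post">
  <INPUT type="text" name="user">
  <INPUT type="password" name="pass">
  <INPUT type="hidden" name="id" value="AD678GWRT67344">
</FORM>
"""


class FormFinder(HTMLParser):
    """Collect each <form> together with the fields it submits."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "GET").upper(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            default_type = "text" if tag == "input" else tag
            self._current["fields"].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or default_type).lower(),
                "value": attrs.get("value"),
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def hidden_fields(form: dict) -> list[dict]:
    """Return only the hidden fields of one parsed form."""
    return [f for f in form["fields"] if f["type"] == "hidden"]


if __name__ == "__main__":
    finder = FormFinder()
    finder.feed(SAMPLE_HTML)
    finder.close()
    for form in finder.forms:
        print(form["method"], form["action"], "hidden:", hidden_fields(form))
```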

5. Applet and Object Identification

Web resources often contain active components embedded as objects in a document. Java applets and Microsoft ActiveX components are well-known examples of embedded objects. Some Web applications are designed so that an applet running in the client's browser provides a richer visual interface than HTML forms can. In essence, the entire application becomes a client-server application, with the client part handled by the applet and the server part handled by a component that communicates directly with the applet. This direct communication doesn't have to be via HTTP. Such objects can open security vulnerabilities at the application layer. Reverse engineering an applet and reconstructing its source code is quite easy, and doing so gives immediate insight into the inner workings of the application. Any weaknesses in design or implementation can be uncovered this way.

Looking for applets is quite easy: All you have to do is scan the HTML code for the <APPLET> tag. Consider the Java applet "db.class," which perhaps was designed to interact with a database located somewhere on the server; the following HTML code snippet is from http://172.16.12.1/catalog.html:

<applet codebase="./database" code="db.class" width=400 height=150>
<param name="db_server" value="172.16.12.3">
<param name="db_instance" value="catalog">
</applet>

The <APPLET> tag reveals that the Java applet is located at http://172.16.12.1/database/db.class. Two parameters, db_server and db_instance, get passed to the db.class applet.

Microsoft ActiveX components get embedded in HTML similarly. Now consider use of the <OBJECT> tag instead of the <APPLET> tag; the following HTML code snippet is from http://172.16.12.1/xml_view.html.

<OBJECT id=xmlFormat style="WIDTH:500px; HEIGHT:500px"
codeBase=http://172.16.12.1/xmlFormat.CAB>
<PARAM NAME="Data" VALUE="http://172.16.12.1/myData.xml">
</OBJECT>
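
Both snippets can be processed the same way. The sketch below, built on Python's html.parser and urllib.parse, records each <applet> or <object>, resolves where its code is fetched from, and collects its <param> values. EmbeddedObjectFinder is a hypothetical helper; it assumes, as in the applet snippet, that an applet's codebase names a directory, and, as in the object snippet, that an object's codebase points directly at a downloadable package.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The two snippets quoted above.
APPLET_HTML = """<applet codebase="./database" code="db.class" width=400 height=150>
<param name="db_server" value="172.16.12.3">
<param name="db_instance" value="catalog">
</applet>"""

OBJECT_HTML = """<OBJECT id=xmlFormat style="WIDTH:500px; HEIGHT:500px"
codeBase=http://172.16.12.1/xmlFormat.CAB>
<PARAM NAME="Data" VALUE="http://172.16.12.1/myData.xml">
</OBJECT>"""


class EmbeddedObjectFinder(HTMLParser):
    """Record each <applet>/<object>, where its code lives, and its <param> values."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.objects = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "applet":
            # codebase is a directory relative to the page; code is the class in it
            codebase = (attrs.get("codebase") or ".").rstrip("/") + "/"
            location = urljoin(urljoin(self.page_url, codebase), attrs.get("code") or "")
        elif tag == "object":
            # in the snippet above, codebase points at the .CAB package itself
            codebase = attrs.get("codebase")
            location = urljoin(self.page_url, codebase) if codebase else None
        elif tag == "param" and self._current is not None:
            self._current["params"][attrs.get("name")] = attrs.get("value")
            return
        else:
            return
        self._current = {"tag": tag, "location": location, "params": {}}
        self.objects.append(self._current)

    def handle_endtag(self, tag):
        if tag in ("applet", "object"):
            self._current = None


if __name__ == "__main__":
    for page, html in (
        ("http://172.16.12.1/catalog.html", APPLET_HTML),
        ("http://172.16.12.1/xml_view.html", OBJECT_HTML),
    ):
        finder = EmbeddedObjectFinder(page)
        finder.feed(html)
        finder.close()
        print(finder.objects)
```

For catalog.html the sketch resolves the applet to http://172.16.12.1/database/db.class and reports db_server and db_instance; for xml_view.html it reports the .CAB location and the Data parameter.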

6. Client-Side Script Evaluation

Client-side scripts such as JavaScript or VBScript are embedded in HTML by using the <SCRIPT> tag. While performing site linkage analysis, we need to identify and evaluate the Web resources that contain client-side scripts, because they may need more attention than regular HTML content does. At times, Web developers leave the burden of input validation to client-side scripts. However, bypassing such checks, either by altering the HTML source received in the browser or by entirely disabling client-side scripts, is trivial and quickly done.
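
Finding such resources can be automated in the same way as the form search. This Python sketch records inline <script> blocks, external script references, and inline event-handler attributes such as onsubmit, which is where client-side validation is often wired in. ScriptFinder and the sample markup (including /js/common.js) are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content with client-side validation hooked to a form.
SAMPLE_HTML = """
<SCRIPT language="JavaScript">
function checkInput() { return document.f.q.value != ''; }
</SCRIPT>
<SCRIPT src="/js/common.js"></SCRIPT>
<FORM name="f" action="/scripts/search.cgi" onsubmit="return checkInput()">
<INPUT name="q">
</FORM>
"""


class ScriptFinder(HTMLParser):
    """Record <script> blocks (inline or external) and inline event handlers."""

    def __init__(self):
        super().__init__()
        self.external = []   # src URLs of external scripts
        self.inline = []     # bodies of inline scripts
        self.handlers = []   # (tag, attribute) pairs such as ("form", "onsubmit")
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        for name, _ in attrs:
            if name.startswith("on"):
                self.handlers.append((tag, name))
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            else:
                self._in_script = True
                self.inline.append("")

    def handle_data(self, data):
        if self._in_script:
            self.inline[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False


if __name__ == "__main__":
    finder = ScriptFinder()
    finder.feed(SAMPLE_HTML)
    finder.close()
    print("external:", finder.external)
    print("handlers:", finder.handlers)
```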

7. Comment and E-Mail Address Analysis

Web resources are written in higher-level languages such as HTML, and, as with most programming languages, HTML supports comments embedded in the document code. We dealt with HTML comments in Chapter 7. The same techniques should be used to extract and analyze HTML comments for linkage analysis.
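
A minimal Python sketch that covers both halves of this technique: html.parser's handle_comment callback collects HTML comments, and a deliberately simple regular expression picks out e-mail addresses. The pattern does not cover every valid address form, and the sample markup and function names are invented for illustration.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern; real-world address syntax is broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page content.
SAMPLE_HTML = """<HTML><!-- TODO: remove debug link to /admin/administrator.asp -->
<A href="mailto:webmaster@example.com">Contact</A></HTML>"""


class CommentCollector(HTMLParser):
    """Keep the text of every <!-- ... --> comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def analyze_comments_and_emails(html: str) -> tuple[list[str], list[str]]:
    """Return (comments, unique e-mail addresses) found in one page."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments, sorted(set(EMAIL_PATTERN.findall(html)))


if __name__ == "__main__":
    print(analyze_comments_and_emails(SAMPLE_HTML))
```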

Step-3 Wrap-Up

By now, we have enough material to start building a Web resources inventory matrix. Continuing with http://www.example.com, we get the following matrix, filled in with information gathered from Steps 1 through 3.

Resource                                Applet/Form  Object  Cookie  Client Script  Authentication  Resource Type
/login/login.asp                        X                    X                                      .asp
/login/secure/transaction.asp?id=                            X       X                              .asp
  AD678GWRT67344&user=bob&doit=add
/download/download.cgi?file=tools.exe                                                               .cgi
/download/download.cgi?file=setup.exe                                                               .cgi
/scripts/search.cgi                     X                            X                              .cgi
/scripts/mail.pl                        X            X                                              .pl
/admin/administrator.asp                                                                            .asp
/news/index.asp                                                      X                              .asp
/public/default.asp                                                                                 .asp
/servlet/showtrans?id=13445466                                                                      servlet
/servlet/setcustinfo                                                                                servlet
/contacts/
/profile/                                            X       X
/private/                                                            X              Basic
/logoff/logoff.asp                                           X                                      .asp
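
Keeping the matrix in code makes it easy to sort, filter, and re-render as the analysis continues. The Python sketch below is one possible layout, not a prescribed format: ResourceEntry, resources_with, and render_matrix are hypothetical names, and ENTRIES transcribes only a few rows of the matrix above.

```python
from dataclasses import dataclass


@dataclass
class ResourceEntry:
    """One row of the Web resources inventory matrix."""
    resource: str
    applet_form: bool = False
    embedded_object: bool = False
    cookie: bool = False
    client_script: bool = False
    authentication: str = ""   # e.g. "Basic"
    resource_type: str = ""    # e.g. ".asp" or "servlet"


# A few rows transcribed from the matrix for www.example.com.
ENTRIES = [
    ResourceEntry("/login/login.asp", applet_form=True, cookie=True, resource_type=".asp"),
    ResourceEntry("/login/secure/transaction.asp?id=AD678GWRT67344&user=bob&doit=add",
                  cookie=True, client_script=True, resource_type=".asp"),
    ResourceEntry("/scripts/search.cgi", applet_form=True, client_script=True,
                  resource_type=".cgi"),
    ResourceEntry("/private/", client_script=True, authentication="Basic"),
    ResourceEntry("/logoff/logoff.asp", cookie=True, resource_type=".asp"),
]

HEADERS = ["Applet/Form", "Object", "Cookie", "Client Script",
           "Authentication", "Resource Type"]


def resources_with(entries, attribute: str) -> list[str]:
    """List the resources whose given column is marked, e.g. attribute="cookie"."""
    return [e.resource for e in entries if getattr(e, attribute)]


def render_matrix(entries) -> str:
    """Lay the matrix out as fixed-width text, marking flags with X."""
    def mark(flag: bool) -> str:
        return "X" if flag else ""

    width = max(len(e.resource) for e in entries) + 2
    lines = [(f"{'Resource':{width}}" + "".join(f"{h:16}" for h in HEADERS)).rstrip()]
    for e in entries:
        cells = [mark(e.applet_form), mark(e.embedded_object), mark(e.cookie),
                 mark(e.client_script), e.authentication, e.resource_type]
        lines.append((f"{e.resource:{width}}" + "".join(f"{c:16}" for c in cells)).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_matrix(ENTRIES))
    print("Cookie column:", resources_with(ENTRIES, "cookie"))
```

For example, resources_with(ENTRIES, "cookie") lists the sampled resources marked in the Cookie column, a quick way to pick out the pages that deserve session-handling tests.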



Web Hacking: Attacks and Defense
ISBN: 0201761769
EAN: 2147483647
Year: 2005
Pages: 156
