8.3 Web Languages

Malicious Mobile Code: Virus Protection for Windows
By Roger A. Grimes
Chapter 8.  Internet Browser Technologies


The World Wide Web runs on HTML, scripting languages, and object files (such as audio, video, and graphics). Most of the code is stored on the web server and downloaded as needed. Scripts can run either on the web server (as with CGI or ASP) or within the confines of the browser (as with JavaScript). This section focuses on web languages that download and operate within the browser (client-side).

8.3.1 HTML

Web site documents are made up of ordinary text files conforming to the Hypertext Markup Language (HTML) standard. HTML is an application of the larger Standard Generalized Markup Language (SGML) document specification, which was in use long before HTML. A simple HTML file contains four components:

  • Text

  • Tags

  • Links

  • Other nontextual content

The text is the plain ASCII text you see displayed in your browser, or it can be used inside tags as undisplayed code. Tags are enclosed in angle brackets (<>) and are used to mark actions and format portions of the text. Formatting tags often come in pairs: one to turn on a particular attribute and one to turn it off. For example, <B>Malicious Mobile Code</B> in a web document displays the text Malicious Mobile Code in bold. Most HTML pages contain a fair share of formatting tags to display even the simplest text. Example 8-1 shows the HTML source code for a very simple web page.

Example 8-1. Example of a small HTML document
 <HTML>
    <HEAD>
       <TITLE>Tiny HTML document</TITLE>
    </HEAD>
    <BODY>
       <P>Hello world!
    </BODY>
 </HTML>

A few tags are required in all web documents. For example, all HTML documents should begin with <HTML> and end with </HTML>. Others are used only where necessary. Tags are also used to link to other types of content, such as script files and audio and video files. Table 8-3 shows some of the legitimate HTML tags that can bring malicious mobile code our way.

Table 8-3. Common HTML 4.0 tags

| Tag | Explanation | Example |
|---|---|---|
| <Img Src> | References content, usually an image, into the downloaded web document | <img src="graphics/picture.gif"> (downloads a graphics file called PICTURE.GIF from the web server's graphics subdirectory into the browser) |
| <A Href> | Anchor reference creating a link to another document or object | <a href="http://www.myexample.com/index.html"> |
| <Frameset> | Defines the attributes and boundaries of frames within a browser | <Frameset cols="50%,50%" rows="75%,25%"> |
| <Script> | Defines where a script is located within a page | <SCRIPT type="text/vbscript" src="http://www.example.com/vbcalc"> </SCRIPT> |
| <Applet code=> | Calls a Java applet; being phased out in favor of the <object> tag | <Applet code="Sample.class"> |
| <Object> | Embeds an image, Java applet, ActiveX control, video clip, audio clip, or another HTML document | <OBJECT CODETYPE="application/java-archive" CLASSID="java:Sample.class" HEIGHT="101" WIDTH="101"> |

HTML-coded documents can be written manually or generated from within an HTML tool. Most web sites are a combination of both. Reading and understanding HTML code from a live web site can be challenging if you are not intimately familiar with its syntax. Example 8-2 is the source code taken from O'Reilly & Associates' web site.

Example 8-2. Example of HTML source code from http://www.oreilly.com
 <html>
 <head>
 <TITLE>www.oreilly.com - Welcome to O'Reilly &amp; Associates!-computer
 books, conferences, software, online publishing</TITLE>
 <META name="keywords" content="O'Reilly, computer books, technical
 books, UNIX, unix, Perl, Java, Linux, Internet, Web, C, C++, Windows, Windows
 NT, Security, Sys Admin, System Administration, Oracle, PL/SQL, online books,
 books online, computer book online,e-books, Perl Conference, Open
 Source Conference, Java Conference, open source, free software, XML,
 php, PHP, CGI, cgi, VB, vb, VB Script, Java Script, javascript, Windows 2000, p2p, peer to peer, peer-to-peer">
 <link href="style/style2.css" type="text/css" rel="stylesheet">
 </HEAD>
 <BODY BGCOLOR="#FFFFFF" VLINK="#0000CC" LINK="#990000" TEXT="#000000">
 <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="700">
 <TR>
 <TD NOWRAP COLSPAN=2><img src="graphicsnew/header_main.gif" width="700"
 height="75" border="0" alt="Welcome to O'Reilly &amp;
 Associates"><br><a href="index.html"><img src="graphicsnew/hometab.gif"
 width="79" height="18" border="0" alt="O'Reilly
 Home"></a><a href="http://www.oreillynet.com"><img
 src="graphics_new/orn_tab.gif" width="91" height="18" border="0"
 alt="O'Reilly Network"></a><img src="graphics_new/header_tag.gif"
 width="530" height="18" border="0"></TD>
 </TR>
 <td valign="middle" bgcolor="#990000" align="right" height="30" NOWRAP>
 <font class="tiny">
 <FORM METHOD="get" ACTION="http://search.oreilly.com/cgi-bin/search">
 <INPUT TYPE="text" NAME="term" SIZE="20">
 <SELECT NAME="category">
 <OPTION VALUE="All">All of oreilly.com</OPTION>
 <OPTION VALUE="Books">Books</OPTION>
 <OPTION VALUE="Conferences">Conferences</OPTION>
 </SELECT>
 <INPUT CLASS="tiny" TYPE="submit" VALUE="Search">
 <img src="/graphics_new/dotclear.gif" width="2" height="1">
 </TD></FORM>
 </TR>
 </TABLE>
 <TD VALIGN="TOP"><a href="cat/search.html" CLASS="nav2">Search</a></TD>
 </TR>

As you can see, there is significantly more to HTML code than I can even begin to explain in this book. When I review HTML source code for possible malicious code, I breeze through most of it, looking for tags that can load potentially malicious links and content. Most of those are listed in Table 8-3 and are covered in more detail later in this chapter.
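The skimming approach described above can be roughly automated. The following is a minimal sketch in Python, not a tool from this book: it uses the standard library's html.parser module to list the tags from Table 8-3 that load outside content. The names RISKY_TAGS, RiskyTagFinder, and find_risky_tags are made up for illustration, and the FRAME tag (which carries the SRC attribute inside a FRAMESET) is included on the assumption that you want to see what each frame loads.

```python
from html.parser import HTMLParser

# Tags from Table 8-3 and the attributes that name the content each one loads.
# FRAME is an assumption added here: it is the tag inside a FRAMESET that
# actually carries a SRC attribute.
RISKY_TAGS = {
    "img": ("src",),
    "a": ("href",),
    "frame": ("src",),
    "script": ("src",),
    "applet": ("code",),
    "object": ("classid", "data"),
}


class RiskyTagFinder(HTMLParser):
    """Collects (line, tag, attribute, value) for tags that pull in content."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        wanted = RISKY_TAGS.get(tag)
        if wanted is None:
            return
        attr_map = dict(attrs)
        line, _ = self.getpos()
        hits = [(name, attr_map[name]) for name in wanted if attr_map.get(name)]
        if tag == "script" and not hits:
            # A SCRIPT tag with no SRC holds its code inside the page itself.
            hits = [("inline", "")]
        for name, value in hits:
            self.findings.append((line, tag, name, value))


def find_risky_tags(html_text):
    """Return every finding in html_text, in document order."""
    finder = RiskyTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


sample = """<HTML><BODY>
<img src="graphics/picture.gif">
<SCRIPT type="text/vbscript" src="http://www.example.com/vbcalc"></SCRIPT>
<P>Hello world!
</BODY></HTML>"""

for finding in find_risky_tags(sample):
    print(finding)
```

A listing like this only tells you where to look. Whether a given link or script is actually hostile still takes a human reading of what it loads.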

8.3.1.1 Viewing HTML source code

You can view a web page's HTML source code within your browser. In Internet Explorer, choose View → Source (see Figure 8-2). You can view, copy, or print out the code. In many cases, not all the code related to a page will be contained within the page itself. The page can contain links to other pages, objects, and scripts.

Figure 8-2. Viewing HTML source code

8.3.1.2 HTML versions

There have been several versions of HTML: 0.90, 2.0, 3.2, 4.0, and now 4.01. Each version adds more functionality while remaining mostly backward-compatible. Most browsers support HTML 4.0, which was released in 1997. The version currently recommended by the World Wide Web Consortium (W3C) is XHTML 1.0, which recasts HTML 4.0 in Extensible Markup Language (XML) syntax.

8.3.2 XML

XML is probably the biggest innovation on the Web since HTML. You will see it mentioned in almost every major product upgrade. XML promises to deliver on the long-talked-about goal of allowing different systems to communicate, and most new interfaces between systems are built using an XML standard. HTML, an application of SGML, was designed to transmit information across the Web into computers. As the Web matures, it's starting to reach into every electronic device (for example, televisions, radios, pagers, cell phones, and fax machines). The Internet isn't just computers anymore, and HTML is not flexible enough. Enter XML.

XML isn't a language; it's a way of describing information. The extensible nature of XML means that documents and information can be transmitted to all types of devices. Using XML, anyone can create a new set of communication standards (that is, create their own markup language) that adheres to a few basic tenets, with the larger goal of interoperability. Many of the upcoming web standards will be based on XML. To date, there have been no documented cases of XML exploits, but as its popularity rises, you can be assured it will become an avenue of attack.

XHTML implements HTML 4.0 with XML formatting and syntax, instituting, among other things, stricter coding requirements. HTML allows sloppy coding. Programmers can leave out mandatory tags and mangle the syntax in a web page, and HTML browsers will ignore it. XHTML is different: it enforces a normal amount of discipline on its programmers.
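To make the difference in strictness concrete, here is a small sketch in Python that feeds two versions of the same page to an XML parser from the standard library (xml.etree.ElementTree). The function name is_well_formed is invented for this illustration. Note that it checks only XML well-formedness, which is one part of what XHTML demands; it does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Modeled on Example 8-1: the <P> tag is never closed, which HTML tolerates.
sloppy_html = "<HTML><BODY><P>Hello world!</BODY></HTML>"

# The same content written the XHTML way: lowercase tags, every tag closed.
strict_xhtml = "<html><body><p>Hello world!</p></body></html>"

print(is_well_formed(sloppy_html))   # False: </BODY> arrives while <P> is open
print(is_well_formed(strict_xhtml))  # True
```

An HTML browser would render both strings the same way. An XML parser rejects the first one outright, which is the kind of discipline XHTML imposes.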

8.3.3 DHTML

Dynamic HTML (DHTML) is the group name for a set of new HTML tags and features. Created to overcome the static nature of HTML, DHTML makes web pages more animated and interactive. Text can change colors, flash, or change size when the mouse cursor passes over it. Forms can present themselves in a pop-up fashion, page headings can change automatically after some predefined event, and objects can be dragged and dropped within the web page. Most popular browsers, including Internet Explorer, Netscape, Mozilla, and Opera, implement various versions of these HTML tag extensions. DHTML relies heavily on cascading style sheets, which are covered later in this chapter.

8.3.4 Scripting Languages

Scripting languages allow HTML to be dynamic and flashy. Prior to scripting, HTML downloaded static text and graphics into the browser with very little creativity. Scripts allow a web site to interact with the user, responding to pages loading, mouse movements, and button clicks. Scripting can be used to deliver customized forms to the end user or to allow the web site to react differently based upon the user's choices. Probably its most dangerous attribute is the ability to invoke other programs, applets, and controls. A script can take a program designed with no harmful intent and use it to cause serious data loss. Scripting is the malicious hacker's best tool for browser malevolence.

HTML is script language-independent, meaning that as long as the proper syntax is used to load the script, any scripting language (for example, VBScript, JavaScript, JScript, PHP, CGI, Python, or Tcl) can be used. The choice of language is limited only by what the browser supports. The script can be located within the HTML page itself, or pulled in on demand. The latter is easier and more common. The three most common scripting languages for the Windows browser environment are:

  • JavaScript

  • VBScript

  • JScript

8.3.4.1 JavaScript

Developed by Netscape, originally under the name LiveScript, and renamed in conjunction with Sun Microsystems, JavaScript has little to do with the Java language besides the name. JavaScript is widely supported by most Internet browsers, including Navigator (versions 2.0 and higher), Internet Explorer (versions 3.0 and higher), and Opera™. Because of this, it is the scripting language of choice for most web sites. Besides being able to manipulate the browser window, JavaScript can, through the objects and controls the browser exposes to it, access the local file system and environment, modify the registry, launch other programs, and create new files and processes. All in all, JavaScript is pretty powerful stuff. HTML was boring and plain, and relatively safe, until JavaScript showed up.

All script blocks must be defined on a web page using the <SCRIPT> tag. The <SCRIPT> tag was added as an extension in HTML version 3.2, and as such, has been supported since Internet Explorer version 3.0. JavaScript scripts begin with <SCRIPT LANGUAGE="JavaScript"> or <SCRIPT TYPE="text/javascript">, and hand back control to the HTML page with the closing </SCRIPT> tag.

8.3.4.2 VBScript

VBScript, or Visual Basic Scripting Edition, is a Microsoft scripting language with deep roots in Visual Basic programming. Like JScript, it comes free with either Internet Explorer or Microsoft's Internet Information Server. Unfortunately, it is supported only by Microsoft browsers version 3.0 and higher, and because of this, has attracted more attention in the intranet space. VBScript scripts are called by the <SCRIPT LANGUAGE="VBScript"> or <SCRIPT TYPE="text/vbscript"> tags. The VBScript code in Example 8-3 presents a link that displays "Hello World" when clicked.

Example 8-3. Example VBScript
 <A HREF="" language=VBScript onclick='alert "Hello World"'>
    <IMG SRC=example.gif>
 </A>

or use VBScript as...

 <A HREF="" onclick="DoBegin">Click here!</A>
 <SCRIPT Language="VBScript">
 <!-- DoBegin subprocedure
 SUB DoBegin
    alert "Hello World"
 END SUB
 -->
 </SCRIPT>

VBScript also has the distinction of being used in the largest number of malicious web objects. Whereas JavaScript and HTML have generated a few dozen malicious scripts each, malicious VBScripts number in the hundreds.

8.3.4.3 JScript

JScript is a JavaScript clone developed by Microsoft to compete against Netscape's popular scripting language. Its claim to fame is near 100 percent compatibility with JavaScript and full support of the open scripting standard, ECMAScript™. Since JScript strives to maintain close compatibility with its competitor, JScript scripts are called with either the <SCRIPT LANGUAGE="JavaScript"> or the <SCRIPT TYPE="text/javascript"> tag.
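Because all three languages are announced through the same <SCRIPT> attributes, a quick way to see what a page asks the browser to run is to read the LANGUAGE and TYPE values. The sketch below, written in Python with the standard html.parser module, is one possible way to do that. The function and class names are hypothetical, and the matching rules are a simplification: a browser's real language selection has more cases than are covered here.

```python
from html.parser import HTMLParser


def script_language(attrs):
    """Guess the scripting language from a <SCRIPT> tag's attributes."""
    # attrs is a list of (name, value) pairs; names arrive lowercased.
    attr_map = {name: (value or "").strip().lower() for name, value in attrs}
    declared = attr_map.get("type") or attr_map.get("language") or ""
    if "vbscript" in declared:
        return "VBScript"
    if "javascript" in declared or "jscript" in declared:
        # JScript is invoked with the JavaScript labels, so the two
        # cannot be told apart from the tag alone.
        return "JavaScript/JScript"
    if not declared:
        return "unspecified (browser default)"
    return "other: " + declared


class ScriptLister(HTMLParser):
    """Records the declared language of every <SCRIPT> tag it sees."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.languages.append(script_language(attrs))


def list_script_languages(html_text):
    lister = ScriptLister()
    lister.feed(html_text)
    lister.close()
    return lister.languages


page = (
    '<SCRIPT LANGUAGE="VBScript"></SCRIPT>'
    '<SCRIPT TYPE="text/javascript"></SCRIPT>'
)
print(list_script_languages(page))
```

This looks only at <SCRIPT> tags. Script placed directly in an event attribute, such as the onclick handler in Example 8-3, would need a separate check.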

8.3.4.4 Remote scripting calls

Microsoft is promoting a new type of scripting called Remote Scripting™ (http://msdn.microsoft.com/workshop/languages/clinic/scripting041299.asp). It uses client-side JScript, a client-side Java applet, and server-side JScript or VBScript to handle forms and end-user interaction more efficiently. Remote scripting works with Internet Explorer and Netscape, and provides a way for a client browser to call code on the server before submitting the entire form or user response, as is normally the case. For instance, with typical HTML forms, you fill in the entire form and hit the Submit button. The form does not get server-side scrutiny until the user sends it. A remote script call can allow the server to query the user's form as it is being filled in, offer suggestions, and proofread it before it is submitted in whole. Great idea, although I'm sure it would allow a creative hacker to introduce new security holes.
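The contrast between the two flows can be sketched in a few lines. The example below is written in Python purely for illustration; Remote Scripting itself runs JScript or VBScript on the server, and nothing here reflects Microsoft's actual interfaces. The field names, the rules in FIELD_RULES, and both function names are hypothetical.

```python
import re

# Hypothetical per-field rules; a real form would define its own.
FIELD_RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip": re.compile(r"\d{5}"),
}


def check_field(name, value):
    """Server-side check of one field, as a remote script call might request
    while the user is still filling in the rest of the form."""
    rule = FIELD_RULES.get(name)
    if rule is None:
        return "unknown field"
    return "ok" if rule.fullmatch(value) else "invalid " + name


def check_form(fields):
    """Conventional flow: nothing is checked until the whole form arrives."""
    return {name: check_field(name, value) for name, value in fields.items()}


# Remote-scripting style: one field at a time, before submission.
print(check_field("zip", "1234"))

# Conventional style: everything at once, after the user hits Submit.
print(check_form({"email": "reader@example.com", "zip": "12345"}))
```

The security concern raised above follows from the design: every per-field call is another server entry point that accepts input from the client.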

8.3.4.5 Hypertext preprocessor script

PHP, short for PHP: Hypertext Preprocessor, is an open-source, server-side scripting language (http://www.php.net) gaining popularity in the Windows and Linux world. The name PHP comes from its earliest name, Personal Home Page Tools, and PHP-enabled web pages end in .PHP, .PHP3, or .PHTML. It was developed as an open-source, cross-platform alternative for web pages. Many small web sites use it to collect data and to interface with a backend database. PHP files and tools are commonly shared over the Internet, and a few viruses have been written in it. Fortunately, because it is a server-side tool, PHP viruses will not affect most PCs. I mention it because PHP is commonly used in personal web servers, and thus, on end-user workstations.

There are many other browser scripting languages, like Perl, that are just as powerful, with varying levels of support among the different browsers. Some require add-ons (for example, Internet Explorer using ActiveState's PerlScript) to launch, and others are directly supported by the browser environment. Other scripting languages, like Python, can be used to build client applications, but are only beginning to find widespread support. Most web sites use JavaScript or JScript.

8.3.5 HTML Applications

HTML Applications (HTAs) were introduced with Microsoft's Internet Explorer 5.0 to allow programmers to write local applications using all the conventional HTML tools. They can include JavaScript and VBScript commands. An HTA source code file is identical to an HTML file, except that the file extension is .HTA instead of .HTM or .HTML. HTAs work only with Microsoft's latest operating systems: Windows 9x, NT, 2000, and ME. HTAs display themselves in a plain-looking window that can be customized, with little resemblance to Internet Explorer.

When an HTA is run, Windows starts MSHTA.EXE, which is sometimes called "IE-lite." The first time an HTA is executed, it will prompt the user to Open or Save the file. After being launched, the HTA program will download any necessary components. Further launches will not result in the user being asked to Open or Save.

HTAs aren't subject to the same security limitations that other types of browser content are and can do pretty much anything they want to the local file system. This, of course, is something malicious code writers love. Microsoft has taken a few security steps to limit foreign web pages or code from calling a locally trusted HTA, but there have been several exploits. Some of the most popular viruses and Trojans launched in 1999 and 2000 were written as HTAs.
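Since an HTA differs from an ordinary page only by its file extension, the extension is the first thing worth checking when a link looks suspicious. The following Python sketch sorts links by the extensions named in this section. The function name describe_link and the wording of the notes are invented for illustration, and an extension is only a hint: a server can deliver any content under any name.

```python
import posixpath
from urllib.parse import urlparse

# Extensions mentioned in this section, with a short note on each.
EXTENSION_NOTES = {
    ".hta": "HTML Application: runs outside the browser's security limits",
    ".php": "PHP page: script executes on the server",
    ".php3": "PHP page: script executes on the server",
    ".phtml": "PHP page: script executes on the server",
    ".htm": "ordinary HTML page",
    ".html": "ordinary HTML page",
}


def describe_link(url):
    """Return a note about a link based on the extension of its path."""
    path = urlparse(url).path  # drops any ?query or #fragment
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_NOTES.get(extension, "unrecognized extension")


for link in (
    "http://www.example.com/tools/setup.HTA",
    "http://www.example.com/page.php3?id=1",
    "index.html",
):
    print(link, "->", describe_link(link))
```

The example.com addresses are placeholders, not links taken from a real page.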


Malicious Mobile Code: Virus Protection for Windows (OReilly Computer Security)
ISBN: 156592682X
Year: 2001
Pages: 176
