Using Obfuscation to Protect Your Algorithms | Coder to Developer: Tools and Strategies for Delivering Your Software


I mentioned code obfuscation briefly in Chapter 7, “Digging Into Source Code.” Now I want to dig into this topic a bit more, and show you an example of obfuscation in action.

Why Obfuscation?

Any time you’re delivering software without selling the source code (or the rights to the source code), you need to worry about reverse-engineering: obtaining the original code by inspecting the compiled version that you deliver. This is a special concern when you’re working on a platform such as Java or .NET, whose compilers emit intermediate code together with rich metadata (the same metadata that makes reflection possible); that combination makes decompilation a trivial affair. You saw in Chapter 7 how easy it is to get the source code back from a .NET application.

This is where obfuscation comes in. Code obfuscators are programs that take your application’s executable code and remove some of the information that the computer doesn’t really need. For instance, you might have a method named MasterLicenseCheck. The computer doesn’t care if that’s renamed to Plergb, as long as the change is made everywhere in the application. An obfuscator will go through your compiled code, changing all of the identifiers to remove any clues that a human with a decompiler can use to make sense of your code.
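To make the renaming idea concrete, here is a minimal sketch of consistent identifier renaming. It works on Python source through the standard ast module (Python 3.9 or later, for ast.unparse) rather than on compiled MSIL, so it is only an analogy for what a .NET obfuscator does to an assembly. The Renamer class, the obfuscate function, and the sample MasterLicenseCheck/IsValid code are hypothetical. The sketch handles only function names, parameters, and plain variable names (not attributes or keyword arguments), and it leaves built-in names such as len alone, because renaming them would break the program.

```python
import ast
import builtins
import itertools
import keyword
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... without end."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


class Renamer(ast.NodeTransformer):
    """Hypothetical renamer: maps every user-defined name to a short, meaningless
    one, using the same replacement everywhere the name appears."""

    def __init__(self):
        self.protected = set(dir(builtins))  # len, print, ... must keep their meaning
        self.mapping = {}
        self._fresh = short_names()

    def _rename(self, name):
        if name in self.protected:
            return name
        if name not in self.mapping:
            new = next(self._fresh)
            # Skip candidates that are Python keywords ("as", "if", ...) or builtins.
            while keyword.iskeyword(new) or new in self.protected:
                new = next(self._fresh)
            self.mapping[name] = new
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        self.generic_visit(node)  # then rename the parameters and the body
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def obfuscate(source):
    """Return (renamed_source, mapping) for a small piece of Python code."""
    renamer = Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping


SOURCE = '''
def MasterLicenseCheck(license_key):
    expected = len(license_key) * 7
    return IsValid(expected)

def IsValid(value):
    return value % 3 == 0
'''

if __name__ == "__main__":
    renamed, mapping = obfuscate(SOURCE)
    print(mapping)
    print(renamed)
```

Running the renamed text with exec and calling the renamed function gives the same results as the original; only the names, and therefore the clues for a human reader, have changed.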

The goal of an obfuscator is to make recovering the source code from the compiled version as difficult as possible. In practice, there’s an ongoing arms race between the makers of obfuscators and the makers of decompilers, but a good obfuscator can certainly help you attain this goal.

Approaches to Obfuscation

Obfuscators can use many sneaky tricks to make it harder to understand your compiled code. This isn’t an exhaustive catalog (and new methods are invented all the time), but I’ll show you some of the cleverness that can go into this effort.

Identifier renaming By changing the names of classes and members such as Download, Engine, and GetDownload to a, b, and c, an obfuscator can make it difficult to guess what code does by looking for keywords. In addition, the shorter names make the compiled assembly somewhat smaller, because each identifier is stored as a string in the assembly’s metadata.
Method overloading The .NET framework allows two methods to have the same name so long as they have different parameters. Obfuscators can thus rename two methods to the same nonsense identifier, even if the two methods have nothing to do with each other, as long as they take different parameters.
Metadata removal .NET assemblies carry metadata alongside their Microsoft Intermediate Language (MSIL) code, and some of that metadata isn’t needed by the application at runtime. Removing it can make it harder to determine what the assembly is meant to do.
String encryption Encrypting hard-coded string constants can remove another layer of information from your compiled code.
Resource obfuscation Some obfuscators may also scramble, mask, or otherwise hide bitmaps, animations, and other resources embedded in your executable.
Control flow obfuscation There’s more than one way to write many bits of code. For example, you probably know that a For Each loop can be changed to a Do While loop with the introduction of a variable and a logical test. Some obfuscators will perform such transformations to turn your application into spaghetti code while maintaining the logic.
Use of MSIL-only features C#, Visual Basic .NET, and other languages are all compiled to MSIL, but MSIL has features that some of these source languages can’t express. If an obfuscator can work these features into the MSIL, it can prevent the code from being decompiled back to source code in the affected language.
Unused code removal You may have methods or properties in your code for future use. If an obfuscator can detect this code and remove it, you’ll have an application that loads faster and that is harder to understand.
Cross-assembly obfuscation Even public members can be renamed if you also fix up calls to them from other assemblies at the same time.
Decompiler protection Some obfuscators inject code sequences that will crash or confuse existing decompilers.
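The string encryption item can be sketched as a build-time step that replaces each literal with an opaque byte array, plus a small runtime helper that restores it. This is a toy scheme assumed purely for illustration (XOR with a keystream from a simple linear congruential generator, applied to UTF-16 text as .NET stores strings); it is not the algorithm any particular obfuscator uses, and the function names are hypothetical. Because the key has to ship inside the program, this kind of encryption only hides strings from casual browsing in a disassembler, not from someone willing to run the decryption helper.

```python
def _keystream(key, length):
    """Derive `length` pseudo-random bytes from an integer key (toy scheme, not secure)."""
    state = key & 0xFFFFFFFF
    out = bytearray()
    for _ in range(length):
        # One step of a common 32-bit linear congruential generator.
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        out.append(state >> 24)  # keep the high byte, which varies more than the low bits
    return bytes(out)


def encrypt_string(text, key):
    """Build-time step: turn a string literal into an opaque byte array."""
    data = text.encode("utf-16-le")  # .NET strings are UTF-16
    return bytes(b ^ k for b, k in zip(data, _keystream(key, len(data))))


def decrypt_string(blob, key):
    """Runtime step: what the obfuscated code would call wherever the literal used to be."""
    data = bytes(b ^ k for b, k in zip(blob, _keystream(key, len(blob))))
    return data.decode("utf-16-le")


if __name__ == "__main__":
    blob = encrypt_string("Enter the Description:", 6)
    print(blob.hex(" "))            # roughly what a disassembler would show
    print(decrypt_string(blob, 6))  # Enter the Description:
```

XOR is its own inverse, so the same keystream serves for both directions; the per-position keystream means repeated characters don’t produce repeated bytes.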

Not all obfuscators implement all of these forms of obfuscation, nor do they all perform equally. If you’re thinking of using obfuscation for your product, you should evaluate the available obfuscators to determine which one works best for you. Here are some of the .NET possibilities:

Demeanor for .NET (www.wiseowl.com/products/Products.aspx)
Dotfuscator (www.preemptive.com/dotfuscator/index.html)
Salamander .NET Obfuscator (www.remotesoft.com/salamander/obfuscator.html)
Spices .NET (www.9rays.net/cgi-bin/components.cgi?act=1&cid=86)

Obfuscation in Action

To give you a feel for the obfuscation process, I’ll run through obfuscating Download Tracker using the Professional Edition of Dotfuscator. To start, take a look at the information that Ildasm can extract from the unobfuscated assembly. Figure 14.1 shows a bit of the Ildasm interface; you can see all of the control and method names here, among other things.

Figure 14.1: Looking at an unobfuscated assembly

Ildasm can go further than just showing you the classes and members in your code. Here’s a small piece of the IL code from DownloadTracker.exe:

   IL_008c:  callvirt   instance void
     [DownloadEngine]DownloadTracker.DownloadEngine.Download::
     set_ProductName(string)
   IL_0091:  ldloc.2
   IL_0092:  ldstr      "Enter the Description:"
   IL_0097:  ldstr      "Download Info"
   IL_009c:  ldloc.2
   IL_009d:  callvirt   instance string
     [DownloadEngine]DownloadTracker.DownloadEngine.Download::
     get_Description()
   IL_00a2:  ldc.i4.0
   IL_00a3:  ldc.i4.0
   IL_00a4:  call       string
     [Microsoft.VisualBasic]Microsoft.VisualBasic.Interaction::
     InputBox(string, string, string, int32, int32)
   IL_00a9:  callvirt   instance void
     [DownloadEngine]DownloadTracker.DownloadEngine.Download::
     set_Description(string)
   IL_00ae:  ldsfld     class [System]System.Diagnostics.TraceSwitch
     DownloadTracker.Form1::ts
   IL_00b3:  callvirt   instance bool
     [System]System.Diagnostics.TraceSwitch::get_TraceError()
   IL_00b8:  brfalse.s  IL_00d4
   IL_00ba:  ldstr      "Downloading "
   IL_00bf:  ldloc.2
   IL_00c0:  callvirt   instance string
     [DownloadEngine]DownloadTracker.DownloadEngine.Download::
     get_ProductName()
   IL_00c5:  call       string [mscorlib]System.String::Concat(string, string)
   IL_00ca:  ldstr      "DTAction"
   IL_00cf:  call       void
     [System]System.Diagnostics.Trace::WriteLine(string, string)
   IL_00d4:  ldloc.1
   IL_00d5:  ldloc.2
   IL_00d6:  callvirt   instance bool
     [DTLogic]DownloadTracker.DTLogic::UpdateDownload(
     class [DownloadEngine]DownloadTracker.DownloadEngine.Download)
   IL_00db:  pop
   IL_00dc:  ret
 } // end of method Form1::btnGo_Click

It’s not the easiest thing in the world to read without practice, but the MSIL does contain all of the logic and identifiers from the original source code.
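The same property is easy to see on another intermediate-code platform. As an analogy (Python bytecode rather than MSIL), this sketch compiles a hypothetical stand-in for btnGo_Click and inspects the resulting code object: the function, attribute, and parameter names and both string literals survive compilation intact, which is exactly the raw material a decompiler works from. The function and every name it uses are invented for illustration; it is never called, so input_box and update_download don’t need to exist.

```python
import dis


def btn_go_click(download):
    # Hypothetical Python stand-in for Form1.btnGo_Click; never called here,
    # so input_box and update_download do not need to exist.
    download.Description = input_box("Enter the Description:", "Download Info")
    return update_download(download)


code = btn_go_click.__code__
print("names:     ", code.co_names)     # global and attribute names
print("locals:    ", code.co_varnames)  # parameters and local variables
print("constants: ", [c for c in code.co_consts if isinstance(c, str)])
dis.dis(btn_go_click)  # bytecode listing, loosely analogous to Ildasm's view
```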

The first step in obfuscating this code was to add a new Dotfuscator project to my Visual Studio .NET solution (one reason I like Dotfuscator is that it’s integrated with the Visual Studio .NET IDE). Figure 14.2 shows the solution open in the IDE. The various nodes of the project specify the assemblies that should be obfuscated and the options that I’ve chosen for obfuscation. The Output node brings up a report on the obfuscator’s actions.

Figure 14.2: The Dotfuscator project in the Visual Studio .NET IDE

Obfuscating the assembly is then just a matter of building the solution. When all of the other projects have been built, Dotfuscator builds its own project, turning out obfuscated versions of the libraries and executables in its own folder. Figure 14.3 shows the obfuscated assembly loaded into Ildasm.

Figure 14.3: Looking at an obfuscated assembly

As you can see, the real names of all of the classes and members have vanished, replaced by single letters. And these single letters are overloaded, referring to two, three, or more different members. For comparison, here’s the same section of code that I disassembled earlier, after obfuscation:

   IL_00d7:  callvirt   instance void [DownloadEngine]f::a(string)
   IL_00dc:  ldloc.2
   IL_00dd:  ldstr      bytearray (62 28 47 2A 5F 2C 48 2E 5D 30 11 32 47 34 5D 36   // b(G*_,H.]0.2G4]6
                                   52 38 19 3A 7F 3C 58 3E 4C 40 22 42 31 44 2C 46   // R8.:.<X>L@"B1D,F
                                   37 48 3D 4A 22 4C 22 4E 21 50 6B 52 )             // 7H=J"L"N!PkR
   IL_00e2:  ldloc      V_3
   IL_00e6:  call       string a$PST06000001(string, int32)
   IL_00eb:  ldstr      bytearray (63 28 46 2A 5C 2C 43 2E 43 30 5E 32 52 34 51 36   // c(F*\,C.C0^2R4Q6
                                   17 38 70 3A 55 3C 5B 3E 50 40 )                   // .8p:U<[>P@
   IL_00f0:  ldloc      V_3
   IL_00f4:  call       string a$PST06000001(string, int32)
   IL_00f9:  ldloc.2
   IL_00fa:  callvirt   instance string [DownloadEngine]f::e()
   IL_00ff:  ldc.i4.0
   IL_0100:  ldc.i4.0
   IL_0101:  call       string
     [Microsoft.VisualBasic]Microsoft.VisualBasic.Interaction::
     InputBox(string, string, string, int32, int32)
   IL_0106:  callvirt   instance void [DownloadEngine]f::b(string)
   IL_010b:  ldsfld     class [System]System.Diagnostics.TraceSwitch a::a
   IL_0110:  callvirt   instance bool
     [System]System.Diagnostics.TraceSwitch::get_TraceError()
   IL_0115:  brfalse.s  IL_0142
   IL_0117:  br         IL_001a
   IL_011c:  ldstr      bytearray (72 28 5A 2A 4E 2C 5F 2E 0F 30 52 32 5F 34 5C 36   // r(Z*N,_..0R2_4\6
                                   54 38 52 3A 5E 3C 59 3E 1F 40 35 42 2B 44 20 46   // T8R:^<Y>.@5B+D F
                                   67 48 0E 4A 24 4C 6D 4E 2D 50 24 52 27 54 21 56   // gH.J$LmN-P$R'T!V
                                   38 58 37 5A )                                     // 8X7Z
   IL_0121:  ldloc      V_3
   IL_0125:  call       string a$PST06000001(string, int32)
   IL_012a:  ldstr      bytearray (63 28 7D 2A 62 2C 43 2E 49 30 5E 32 41 34 58 36   // c(}*b,C.I0^2A4X6
                                   56 38 4D 3A 52 3C 52 3E 51 40 )                   // V8M:R<R>Q@
   IL_012f:  ldloc      V_3
   IL_0133:  call       string a$PST06000001(string, int32)
   IL_0138:  call       void [System]System.Diagnostics.Trace::
     WriteLine(string, string)
   IL_013d:  br         IL_004b
   IL_0142:  ldloc.1
   IL_0143:  ldloc.2
   IL_0144:  callvirt   instance bool [DTLogic]e::
     a(class [DownloadEngine]f)
   IL_0149:  pop
   IL_014a:  ret
 } // end of method a::c
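The overloaded single-letter names in this listing come from the technique described earlier under Method overloading. A minimal sketch of the naming step, assuming methods are distinguished only by name and parameter types, gives each method the shortest name not already used by another method with the same parameter list. The function and the sample method list (the Download class’s property accessors) are illustrative; the letters it produces won’t match Figure 14.3 or the listing exactly, because a real obfuscator works across every member of the type and applies its own rules.

```python
import itertools
import string
from collections import defaultdict


def short_names():
    """Yield a, b, ..., z, aa, ab, ... without end."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def assign_overloaded_names(methods):
    """Hypothetical overload-style renaming for the methods of one class.

    methods: list of (original_name, param_types) pairs, param_types a tuple.
    Returns {(original_name, param_types): new_name}. A name may be reused as
    long as no other method with the same parameter types already has it.
    """
    taken = defaultdict(set)  # param_types -> names already given to that signature
    renamed = {}
    for original, params in methods:
        for candidate in short_names():
            if candidate not in taken[params]:
                taken[params].add(candidate)
                renamed[(original, params)] = candidate
                break
    return renamed


DOWNLOAD_METHODS = [
    ("set_ProductName", ("string",)),
    ("get_ProductName", ()),
    ("set_Description", ("string",)),
    ("get_Description", ()),
]

if __name__ == "__main__":
    for (original, params), new in assign_overloaded_names(DOWNLOAD_METHODS).items():
        print(f"{original}({', '.join(params)}) -> {new}({', '.join(params)})")
```

The two setters both take a string, so they must get different names (a and b), just as f::a(string) and f::b(string) do in the listing; each parameterless getter, however, can reuse one of those same letters.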

While you can still extract some meaning from the obfuscated listing (for example, it’s still clear that the code calls the Visual Basic InputBox function), the actual logic flow of the application is a good deal more obscure than it was in the original.

TECHNOLOGY TRAP: If It’s on Their Machine, They Own It

Obfuscation is not a panacea. It will not prevent someone from decompiling MSIL into a higher-level language, though the resulting code might not make a lot of sense. You can’t prevent people from decompiling Java or .NET code, but you can go a long way toward ensuring that what they get as a result is a stew of confusing jumps and meaningless names. The bottom line, though, is that code that runs on someone’s computer can be analyzed, and potentially understood, by that person. For example, instead of trying to understand the code only from static disassembly, a determined attacker who can run the application can hook up a debugger and monitor the process memory, learning what variables are stored and how they change during execution. From this and other information, and lots of patience, even obfuscated code can give up its original algorithms.

Sometimes this state of affairs is not satisfactory. If you’re dealing with encryption algorithms, or other extremely sensitive code, you may want to implement an even higher level of protection than obfuscation can offer. How can you do this? The answer is to never run the sensitive code on the user’s machine. Instead of implementing your sensitive algorithms in a library that resides on the user’s computer, implement them in a library that runs only on your own server. Then provide a remote interface, through web services or some other API, to allow users to invoke the library and get back results. Obviously there are drawbacks to this scenario (you need to worry about the server’s reliability and ability to handle the load, and remote method calls over the Internet are likely to be slow), but you can be sure that no one is reverse-engineering your actual code.
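The server-side approach from the Technology Trap can be sketched with nothing more than the Python standard library. In this minimal, hypothetical example, secret_algorithm stands in for your sensitive code and lives only in the server process; the /compute endpoint and the JSON request format are assumptions standing in for “web services or some other API”; and remote_compute is the only piece that would ship to users. For convenience the demo runs both halves in one process on the loopback interface; a real deployment would add TLS, authentication, input validation, and error handling.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def secret_algorithm(values):
    """Hypothetical sensitive routine; it exists only in the server process."""
    return sum(v * v for v in values) % 97


class AlgorithmHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/compute":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"result": secret_algorithm(request["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def remote_compute(base_url, values):
    """Client-side stub: all the user's machine ever sees is this HTTP call."""
    data = json.dumps({"values": values}).encode()
    req = urllib.request.Request(base_url + "/compute", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["result"]


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), AlgorithmHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(remote_compute(url, [3, 4, 5]))  # 9 + 16 + 25 = 50, and 50 % 97 = 50
    server.shutdown()
    server.server_close()
```

Disassembling the client reveals only the URL and the shape of the request; the body of secret_algorithm never leaves the server, which is the whole point of the design, at the cost of a network round trip per call.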
