
Security Response

Contents
Abstract
File Image vs. Memory Image
API Analysis
Generating Memory Dumps
Runtime API Address Resolution
Basic API Obfuscation
Advanced API Obfuscation
Conclusion
References
Abstract
Antivirus software vendors attempt to identify threats by unpack-
ing suspicious samples and hence aim to produce as many unpackers
as possible. When characteristic portions of a successfully unpacked
sample are identified, the sample can be tagged and detection add-
ed. This procedure is commonly used for variants of well-known mal-
ware families and does not require the analysis of Windows API calls
made by the sample. In contrast, in-depth analysis of packed threats
requires the knowledge of the API functions called during execution.
When a sample cannot be unpacked, memory dumps may be used
to provide insight into its behavior. Such dumps are frequently de-
tailed enough to allow analysts to ascertain whether a particular
sample is malicious, although they may not be suitable for analy-
sis using software disassemblers; executable section headers need
to be adjusted in order for the software to resolve API calls and
avoid problems that may be caused by incorrect file alignments.
API calls may be obfuscated in a number of ways, with some examples
being:
• Setting up multiple memory segments containing jump relays
• Copying initial instructions from API functions into malware modules and subsequently jumping into the body of the API
• Copying API instructions interleaved with redundant instructions
Analysis may be made more difficult when these methods are mixed
or of a scope that makes the use of manual tools unwieldy. This paper
seeks to detail methods that may be used to obfuscate API calls and
the tools and techniques that may be used to resolve them.
Masaki Suenaga
Senior Software Engineer
A Museum of API
Obfuscation on Win32
File Image vs. Memory Image
The Portable Executable (PE) format is the file format for executables used in 32-bit and 64-bit versions of Win-
dows (Windows 95, 98, Me, NT 4.0, 2000, Server 2003, XP, Vista, and so on). PE files are not loaded into memory
as they appear on disk, instead being loaded according to the information in the PE file header. Unlike MS-DOS,
segment registers are not used to relocate code and data, with sections instead being used to hold program
code, variable, constant and resource data. Where possible, the PE file format stores these sections in a packed
state, holding information as to how much memory will be required as opposed to the actual data itself. Also un-
like MS-DOS, Windows assigns a 2GB address space to each newly loaded program.
The Role of the Loader
When a PE-format file is to be executed, its path is passed to the CreateProcess() API function and statically
imported DLL files are loaded (EXE and DLL files are both in PE format and are identical apart from a single-bit
flag). This initialization is performed by the OS component called the loader, detailed in this section.
The loader reads the PE file header and copies it to the 32-bit ImageBase address taken from the header; if this
address is in use the loader determines an alternative address. It then reads each section and copies each one
to memory. Two distinct alignments may be specified: file alignment and section alignment. If these alignments
are different, the loader copies each section to memory according to the section alignment, which in most cases
results in the memory layout being different from that of the PE file.
If the loader uses a different ImageBase address from that specified in the header, as is often the case with
.dll files, address relocation must be performed. Relative jumps and calls do not need to be changed, but all
absolute addresses (calculated during the linking stage of program compilation) must be adjusted by adding the
difference between the chosen ImageBase and the preferred ImageBase.
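The adjustment can be sketched in a few lines of Python. This is a minimal illustration, not the loader's actual implementation: the function name, the tiny 8-byte image and the list of fix-up offsets are assumptions made for the example (a real loader obtains the offsets from the base relocation data in the PE file).

```python
import struct

def apply_relocation(image, fixup_offsets, preferred_base, actual_base):
    """Add the load delta to each 32-bit absolute address listed in fixup_offsets."""
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)

# Hypothetical image: a single absolute pointer (10001234h) stored at offset 4.
image = bytearray(8)
struct.pack_into("<I", image, 4, 0x10001234)

# The DLL preferred 10000000h but was loaded at 12000000h.
apply_relocation(image, [4], preferred_base=0x10000000, actual_base=0x12000000)
relocated = struct.unpack_from("<I", image, 4)[0]
print(hex(relocated))
```

Relative jumps and calls never appear in the fix-up list, which is why they survive relocation unchanged.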
Nearly all programs import DLLs of some kind. Regular Windows executables cannot request OS services by executing int instructions
(as was possible with MS-DOS) and must make API calls in order to perform useful work. The loader loads the
statically imported DLLs, resolves the APIs and writes the addresses of API calls to memory. This process is
described in more detail in the following section.
With these processes complete, the process memory layout differs from that of the PE file in terms of alignment,
section length, address relocation and API resolution. If program execution has begun the variable data section
will also be tainted.
API Address Resolution in the Loader
API calls in programs built with Microsoft Visual Studio are compiled to call [offset32], or to mov reg32, [offset32]
followed by call reg32. For example (note that addresses will vary):
TranslateMessage(&msg);
will be compiled and linked as:
lea eax, [ebp-20h]
push eax
call [01001270h]
or:
mov edi, [01001270h]
lea eax, [EBP-20h]
push eax
call edi
The address of TranslateMessage() in the example above is stored at the virtual address of 01001270h. When
program execution begins, the correct address of TranslateMessage() is written to these four bytes of memory;
the PE file contains dummy values.
This API resolution is done by the loader before starting the program's code. In order to achieve this, the loader
traverses the import directory table (pointed to by the PE header) which specifies which DLLs to load, what API
calls to search for and where to write the API function addresses.
IDA
Interactive Disassembler (IDA) is a software tool used by many virus analysts. IDA is able to report the names of
API functions when it finds call instructions leading to Import Address Table (IAT) entries. Conversely, IDA is
unable to resolve API names if the IAT is not found where the PE header says it should be, which is why memory dumps
are often unsuitable for analysis with IDA.
Development Environments and API Calls
Differences in the PE files produced by different development environments can aid the virus analyst.
PE files built with Microsoft Visual C usually make API calls using call dword ptr [IAT entry]. Programs written in
C++ do the same when calling Windows APIs, but some encoding is applied when calling C++ runtime libraries,
for example the new operator will be compiled to call dword ptr [??2@YAPAXI@Z], where ??2@YAPAXI@Z maps to
void * __cdecl operator new (unsigned int). The name of the function is difficult to ascertain by eye, but IDA is able
to perform the decoding and reports operator new when this entry is encountered.
Unfortunately, calls to MFC methods such as the destructor void CWnd::~CWnd() (which may be expressed as
??1CWnd@@UAE@XZ) are not imported by name. Instead, a predetermined ordinal number is used, which is 818
within mfc42.dll in this case. Because the program simply calls the 818th function of mfc42.dll, it is impossible to
retrieve the function name from this information alone. When IDA detects mfc42.dll and imports by ordinal num-
bers it is able to display the API function names using a preconfigured internal list; however, IDA cannot resolve
the virtual address 0x73D31828 in a memory dump unless it is linked to the 818th export of mfc42.dll.
Programs written in C and Delphi and built using Borland products call inside the runtime libraries and then
jump to API functions. For example, a call to GetSystemMetrics() will be compiled to call near ptr j_GetSystemMetrics,
where j_GetSystemMetrics is jmp dword ptr [GetSystemMetrics], whose operand resides in the IAT and should
be resolved by the loader. IDA labels the entry in the IAT as __imp_GetSystemMetrics and the called address as
GetSystemMetrics(), thus aiding the process of analysis.
Programs written in Visual Basic import VB runtime functions via the IAT, but Windows APIs are imported in a
slightly different way. The typical method is as follows:
00408094: db 'urlmon',0
004080A0: db 'URLDownloadToFileA',0
004080B4: dd 408094 ; offset of urlmon
004080B8: dd 4080A0 ; offset of URLDownloadToFileA
004080BC: dd 040000
004080C0: dd 4092D8 ; offset of a structure
. . .
004080CC: URLDownloadToFileA proc near
mov eax, dword_4092E0 ; initially zero
or eax, eax
jz short 4080D7 ; If not yet resolved, call VB Runtime library.
jmp eax
004080D7: push 4080B4
mov eax, offset DllFunctionCall ; jmp [__imp_DllFunctionCall]
jmp eax
When URLDownloadToFileA() is first called, a check is made as to whether URLDownloadToFileA() has been
already resolved. If not, control is deferred to DllFunctionCall() in the VB Runtime library which resolves the
address of the API function (caching it for the next time URLDownloadToFileA() is called) and calls the function
itself.
As API references will not be resolved until call-time, memory dumps of programs using the above method for
API calls are likely to be incomplete. Fortunately for the virus analyst, the names of the API functions appear in
ASCII text.
API Analysis
Analysis of API calls may not be necessary when analyzing variants of familiar threat families, as core functional-
ity is likely to remain chiefly the same across the board, with differences being the network addresses, URLs, file
names and various other strings compiled into the code.
The knowledge of which API functions are called by an unfamiliar sample, though, is important when attempt-
ing to decide whether or not it is malicious. If the API functions called remain a mystery, the presence of cer-
tain email addresses, URLs, game-related strings and keystroke logging routines would tend to indicate that a
particular sample might be a Trojan horse that attempts to steal information related to online games; however,
without the keystroke logging code, such a sample could itself be an online game, and as such API analysis is
often required.
API Call Analysis vs. API Call Monitoring
Some security products monitor the API calls made while a program is executing; by evaluating the call sequenc-
es the software can attempt to determine whether a process is malicious or benign. Products such as these often
hook the API entry points; some techniques used to avoid detection are detailed in Operating System Interface
Obfuscation and the Revealing of Hidden Operations [1].
In contrast, the virus analyst often has only a memory dump to work from, and needs to know what API functions
may be called by a sample, i.e. a superset of those that have been called. API call monitoring techniques can only
provide the latter.
Avoiding Guesswork
In addition to the classification of samples into malicious and non-malicious groups, it is also the job of the
malware analyst to come to understand, by whatever means, the behavior of a sample. If the API functions called
by a sample cannot be deduced, it may be possible to make educated guesses based on the presence of certain
parameters and strings. Some API functions such as RegOpenKey() require obvious parameters (for instance,
0x80000001, which is HKEY_CURRENT_USER), but others such as GetLocalTime() do not. String parameters
may also be good hints as to which API function is being called; for example, RegOpenKey() may take
Software\Microsoft\Windows\CurrentVersion. API functions taking arguments other than strings are harder to
guess, and ambiguity is introduced. When possible, guesswork is to be avoided during detailed analysis.
Following the operation of the loader, the IAT will contain the virtual addresses of all imported API functions; for
example, the address 0x77D16017 in the IAT refers to the GetSystemMetrics() API function in certain versions of
Windows XP. The obfuscation of API calls hides this information and makes analysis much more difficult.
Motivation for API Obfuscation
Ambiguity is introduced if strings are encrypted and API calls are obfuscated; the analysis of such a sample is
likely to be a more complex and lengthy task than if no such obfuscation were present. Even if an unknown pro-
prietary packer is used (with no corresponding unpacker available to the analyst), the memory dump of a running
executable can be used during analysis. Just-in-time string decryption, code obfuscation and API call obfuscation
are all techniques used by malware authors to hide their intentions, but API call obfuscation also has legitimate
uses.
An online game in which experience points are calculated by the client application as opposed to on the server
is one example of a legitimate use of API obfuscation. Underhanded players may seek to analyze the client
program and overwrite certain memory locations in order to change values or inject custom code. The game pro-
vider may decide to obfuscate the API calls made by the client program in order to prevent this kind of analysis
and hence minimize cheating.
API call obfuscation may also be used to shield code from scrutiny when proprietary algorithms are used, for
instance in the case of product registration keys, encryption algorithms and routines that are trade secrets.
Meaningless and/or redundant API calls may also be inserted in order to make analysis more difficult.
Although intended for legitimate use, commercial packers used to obfuscate program code may also be used by
malware authors.
Motivation for Overcoming API Obfuscation
It is the job of the malware analyst to provide his or her customers with information that may be used to mitigate
threats; sample analysis can provide mitigation information such as TCP/IP ports to close, IP addresses
or domains to block at the firewall, registry entries to set or other system changes to make.
Malware analysts are also called upon to create free removal tools for certain viruses. Deep analysis of a sample
is required for tools such as these to comprehensively and effectively clean all traces of a threat from a compro-
mised computer for all possible differences in system setup and locale.
Tools for Overcoming API Obfuscation
As previously mentioned, IDA is a disassembly tool used by many malware analysts that supports a C-like inter-
nal scripting language, IDC script. IDC script allows the user to automate repetitive tasks, perform lookups and
change the way in which information is displayed, including changing the label names on specific addresses.
The following sections will discuss the creation of a tool to de-obfuscate the API calls made by a program and
generate an IDC script to perform the address renaming within the disassembler environment.
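As a rough sketch of the second half of that workflow, the following Python function turns a table of resolved addresses into IDC source. It assumes the classic IDC function MakeName(ea, name); newer IDA releases expose the same operation under a different name, so the emitted call may need adjusting. The function name make_idc_script and the sample table are illustrative only.

```python
def make_idc_script(names):
    """Build IDC source that renames each address to its resolved API name.

    names maps a virtual address (int) to the label it should receive.
    """
    lines = ["#include <idc.idc>", "", "static main() {"]
    for ea in sorted(names):
        # MakeName is assumed to be available in the target IDA version.
        lines.append('    MakeName(0x%08X, "%s");' % (ea, names[ea]))
    lines.append("}")
    return "\n".join(lines) + "\n"

# Hypothetical result of de-obfuscation: one resolved IAT-style slot.
script = make_idc_script({0x01001480: "GetSystemMetrics"})
print(script)
```

Generating a script rather than patching the dump keeps the renaming step repeatable and easy to inspect before it is run inside the disassembler.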
Generating Memory Dumps
Resolving API Function Names Without the Import Table
Provided that the addresses of the API functions called by a program have been resolved (either by the loader or the
program itself), the function names are relatively easy to look up and it is not necessary to use the import table.
For example, if the instruction call [01001480h] appears in a memory dump, the tool may read 77D16017h from
this address and then search all memory blocks for the module in which this address appears; it will soon find a
match in user32.dll. After parsing the export table of user32.dll the tool can resolve this address to the GetSys-
temMetrics() API function.
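The lookup described above can be sketched as follows. The module list is a stand-in for what a real tool would build by walking the memory blocks of the dumped process and parsing each module's export table; the base addresses, sizes and the single export entry are hypothetical values chosen so that 77D16017h falls inside user32.dll.

```python
def resolve_address(va, modules):
    """Return (module name, API name) for a virtual address, or (None, None).

    modules is a list of (name, base, size, exports) tuples, where exports
    maps an RVA to the exported function name.
    """
    for name, base, size, exports in modules:
        if base <= va < base + size:
            return name, exports.get(va - base)
    return None, None

# Hypothetical module map for one dumped process.
modules = [
    ("kernel32.dll", 0x77E60000, 0xE6000, {}),
    ("user32.dll", 0x77D10000, 0x8D000, {0x6017: "GetSystemMetrics"}),
]

print(resolve_address(0x77D16017, modules))
```

An address that lies inside a module but matches no export returns the module name with None, which is a useful hint that the call lands in the middle of a function.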
Trusting the Import Table
The import table from a given memory dump cannot necessarily be trusted; malware authors may remove the
import table from memory or may even deliberately construct a fake import table to mislead the analyst (and his
or her tools) and hence make the task at hand more difficult. The import table should therefore be erased from
the memory dump.
Adjusting Image Base and Section Tables
IDA is designed to disassemble PE programs in file format, and as such considers information written in the PE
header to be correct. If the ImageBase in the header appears as 10000000h, IDA shows the assembler code at
this location without taking into account any relocation that might have occurred at load-time, which is espe-
cially common with DLLs. If a DLL whose ImageBase is 10000000h is relocated to 12000000h, all operands and
data structures that use absolute addresses will be updated to point to the correct virtual addresses. The relocation
is performed by the loader but the ImageBase information in the PE header is not updated in memory. This
discrepancy will cause IDA to function incorrectly, especially when strings are involved. Any tool for resolving API
names from memory dumps therefore must take relocation into account.
Differences in alignment between file and memory must also be considered; for example, a given executable may
have file alignment of 200h and memory alignment of 1000h, meaning that the first section can begin at file
offset of 200h (or 400h, 600h etc.), but it is loaded to 1000h (or 2000h, 3000h, etc.) from the header position
in memory. When IDA reads a PE file as input it calculates addresses and offsets based on file alignment, which
results in a difference of 0E00h (or 0C00h, 0A00h etc.) in addresses for this example.
IDA also examines the raw data offsets in each section, which reflects the file alignment. For example, if the raw
data offset of a section is 1200h and the virtual address of the section is 2000h, IDA adjusts every address in
the section by 0E00h, effectively copying the 1200h to 2000h portion of the section. When a memory dump is
being used, however, this section data will already have been copied by the loader; a further addition of 0E00h
will produce anomalous results. In order to avoid this the tool must adjust each section's raw address to match
the virtual address of the section.
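A minimal sketch of that adjustment, operating on a single 40-byte section header (the IMAGE_SECTION_HEADER layout from the PE format), is shown below. The helper name and the sample header values are assumptions for illustration; the example reuses the 1200h raw offset and 2000h virtual address from the text, and only the raw data pointer is changed.

```python
import struct

# IMAGE_SECTION_HEADER: Name, VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes).
SECTION_FMT = "<8sIIIIIIHHI"

def fix_section_for_dump(header):
    """Return a copy of one section header whose raw offset equals its virtual address."""
    fields = list(struct.unpack(SECTION_FMT, header))
    fields[4] = fields[2]  # PointerToRawData := VirtualAddress
    return struct.pack(SECTION_FMT, *fields)

# Hypothetical header: section stored at file offset 1200h, mapped at RVA 2000h.
original = struct.pack(SECTION_FMT, b".data", 0x800, 0x2000, 0x800, 0x1200,
                       0, 0, 0, 0, 0xC0000040)
fixed = fix_section_for_dump(original)
print(hex(struct.unpack(SECTION_FMT, fixed)[4]))
```

With the raw offset rewritten, a disassembler that trusts the header no longer applies the 0E00h shift a second time.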
Recreating a Missing Header
Once an executable is running, the PE header of the .exe file is no longer required; indeed, some programs overwrite
their own PE header structure in memory. In some cases the section is truncated, while in others it is overwritten
with data that may cause IDA to fail.
Even if the PE header is zeroed out or filled with junk data, the tool should be able to create a new catch-all PE
header with a single flat section table that covers the entire block of memory used by the executable. This means
that section information will no longer be required to analyze a memory dump.
Searching Hidden Modules
We refer to EXE and DLL programs as modules. An EXE is loaded first, followed by the DLLs specified in its
import table; if one of the DLLs requires another DLL to run, this will also be loaded, and so on. The modules are
loaded to the appropriate memory block positions, with each block holding a PE header or a section. The space
for the process stack also occupies a memory block. When an API function to allocate memory is called, new
blocks are assigned as required.
During analysis, certain modules can be enumerated using the EnumProcessModules() and GetModuleInforma-
tion() API calls; only modules managed by the OS (using a reference counter that is incremented on load and
decremented when freed) can be examined in this way.
Except for those OSs supporting the NX (No eXecute) bit, Windows does not require a memory block containing
executable code to be registered as executable with the OS. This means that code in any portion of memory
can execute if pointed to by the instruction pointer. This system provides a great deal of flexibility when design-
ing program architecture but also room to exploit stack and buffer overflows (as the instruction pointer can be
manipulated to point even to a block of memory that has been allocated from the heap).
Traditional packers overwrite the packed program code with the unpacked code in place during the unpack op-
eration; many others call the VirtualAlloc() or GlobalAlloc() API functions to allocate memory, store the unpacked
code in the newly-allocated block and perform a jump to run the code from there. VirtualAlloc() is generally pre-
ferred to GlobalAlloc() because the memory address can be specified by way of a parameter, which means that
the need to perform address relocation on the unpacked code is obviated if a memory block can be allocated at
the desired address.
Occasionally it is necessary to search all memory blocks to find a hidden module.
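One simple way to perform such a search is to scan each dumped memory block for an MZ signature whose header field at offset 3Ch points to a PE signature. The sketch below assumes the header is still intact; as noted earlier, a program may wipe its own header, in which case this check finds nothing and other heuristics are needed. The function name and the synthetic block are illustrative.

```python
import struct

def find_pe_headers(block):
    """Yield offsets in a memory block where an MZ header leads to a PE signature."""
    pos = block.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(block):
            # The DWORD at offset 3Ch of the MZ header locates the PE header.
            (e_lfanew,) = struct.unpack_from("<I", block, pos + 0x3C)
            pe = pos + e_lfanew
            if block[pe:pe + 4] == b"PE\0\0":
                yield pos
        pos = block.find(b"MZ", pos + 1)

# Synthetic block: a stray "MZ" at 10h and a plausible module header at 100h.
buf = bytearray(0x200)
buf[0x10:0x12] = b"MZ"
buf[0x100:0x102] = b"MZ"
struct.pack_into("<I", buf, 0x13C, 0x80)
buf[0x180:0x184] = b"PE\0\0"

hits = list(find_pe_headers(bytes(buf)))
print([hex(h) for h in hits])
```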
Resolving Names of API Calls made by Injected Threads
A program (or more strictly, process) cannot usually access the address space of another, as was possible with
MS-DOS; a process cannot simply write to an area of memory in which, say, Internet Explorer is running, as the
virtual address mappings used by each process will differ.
The API function VirtualAllocEx(), however, allows a block of memory belonging to a different process to be al-
located; code can then be copied into this area and executed by way of the CreateRemoteThread() function. Code
injected into Internet Explorer in this way is, to all intents and purposes, part of Internet Explorer itself, and as
such is able to operate with the same privileges as the original program. In the case of Internet Explorer this is
likely to mean being able to bypass any firewall present and access the Internet on port 80.
Injected threads generally do not have an import table unless they are DLLs; they may instead have their own API
resolution routines. Alternatively the calling thread may pass the API function addresses to the injected thread
by way of a parameter. It is not a trivial task to write a generic tool to resolve the addresses of API calls made by
injected threads.
Other Memory Blocks
As previously discussed, program code can reside in allocated memory blocks and on the stack as well as in EXE
and DLL modules. Other blocks may also be used to hold data or code to relay instructions from the application
program to an OS module in order to obfuscate the API calls made. This means that the entirety of memory ac-
cessible by a program is required during analysis.
Runtime API Address Resolution
There are two main types of API obfuscation. In the first type, all API function addresses are resolved before the
main routine of the program begins. In the second, API function addresses are resolved individually at call-time.
The remainder of this paper will focus on the first kind of obfuscation, but the second kind will also be covered to
some extent.
Decoding API Function Names using Hashing
Shell code in document files deliberately crafted to exploit vulnerabilities usually encodes API functions by
hashing API names. A typical algorithm is to add each ASCII character of an API function name to a 32-bit value,
performing a bitwise rotation right 13 places for each character. This produces a hash with no collisions in any
major system DLLs, making it an easy and safe method of obfuscation. The parameters of the hashing algorithm
may also be modified, for example adding XOR operations or altering the number of bitwise rotations applied to
each character. Modifications to hashing algorithms result in analysis taking longer to complete as tools/scripts
may need to be altered. The following is an example of hashed API address resolution, taken from Trojan.Anicmoo:
0000016F GetAPIaddress proc near
0000016F arg_0 = dword ptr 14h ; DWORD checksum value
0000016F arg_4 = dword ptr 18h ; virtual address of module (DLL)
0000016F
0000016F push ebx
00000170 push ebp
00000171 push esi
00000172 push edi
00000173 mov ebp, [esp+arg_4] ; module handle (== VA of DLL image base)
00000177 mov eax, [ebp+3Ch] ; position of PE header
0000017A mov edx, [ebp+eax+78h] ; Export Directory Table
0000017E add edx, ebp ; convert RVA to VA
00000180 mov ecx, [edx+18h] ; number of Name Pointers
00000183 mov ebx, [edx+20h] ; Name Pointer RVA
00000186 add ebx, ebp ; convert RVA to VA
00000188
00000188 LOOP_NEXT_API:
00000188 jecxz short NOT_FOUND
0000018A dec ecx
0000018B mov esi, [ebx+ecx*4] ; RVA of the export name
0000018E add esi, ebp ; convert RVA to VA
00000190 xor edi, edi ; clear the checksum
00000192 cld
00000193
00000193 LOOP_NEXT_CHARACTER:
00000193 xor eax, eax
00000195 lodsb ; al <-- [esi] , then esi++
00000196 cmp al, ah ; is it zero (null-terminator)?
00000198 jz short END_OF_API_NAME
0000019A ror edi, 13
0000019D add edi, eax
0000019F jmp short LOOP_NEXT_CHARACTER
000001A1
000001A1 END_OF_API_NAME:
000001A1 cmp edi, [esp+arg_0] ; compare with the parameter checksum
000001A5 jnz short LOOP_NEXT_API
000001A7 mov ebx, [edx+24h] ; Ordinal Table RVA
000001AA add ebx, ebp ; convert RVA to VA
000001AC mov cx, [ebx+ecx*2] ; get the ordinal number
000001B0 mov ebx, [edx+1Ch] ; Export Address Table RVA
000001B3 add ebx, ebp ; convert RVA to VA
000001B5 mov eax, [ebx+ecx*4] ; get RVA of the API via the ordinal number
000001B8 add eax, ebp ; convert RVA to VA
000001BA jmp RETURN
000001BF
000001BF NOT_FOUND:
000001BF xor eax, eax
000001C1
000001C1 RETURN:
000001C1 mov edx, ebp
000001C3 pop edi
000001C4 pop esi
000001C5 pop ebp
000001C6 pop ebx
000001C7 retn
000001C7 GetAPIaddress endp
. . .
0000011E push eax ; HMODULE (== virtual address) of urlmon.dll
0000011F push 702F1A36h ; checksum of URLDownloadToFileA
00000124 call GetAPIaddress
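The checksum loop in the listing (rotate right by 13, then add the next character) is easy to reproduce outside the sample, which lets an analyst map a pushed constant back to an API name by hashing every export of the target DLL. The following Python sketch mirrors that loop; the function names and the short list of candidate names are illustrative, and a real tool would take the names from the DLL's export table.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def api_hash(name):
    """Checksum as computed in the listing: ror edi, 13 followed by add edi, eax."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def resolve_hash(target, export_names):
    """Return the first export name whose checksum equals target, or None."""
    for name in export_names:
        if api_hash(name) == target:
            return name
    return None

# Candidate names as they might be read from an export table.
candidates = ["URLOpenStreamA", "URLDownloadToFileA", "URLDownloadToCacheFileA"]
print(resolve_hash(0x702F1A36, candidates))
```

If a sample alters the rotation count or mixes in XOR operations, only api_hash() needs to change; the lookup loop stays the same.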
The Backdoor.Darkmoon Trojan horse uses a more complex algorithm to hash API function names. Malware that
uses hashes to encode/decode API functions often includes a routine to store the addresses in allocated or stack
memory. It is difficult to develop a tool to resolve these API addresses from memory dumps because the memory
locations at which the resolved API function addresses are held will vary and where the structures begin may not
be clear (often referenced by a register plus an offset such as [ESI + 24h]).
The Use of LoadLibrary() and GetProcAddress()
Strings are commonly encrypted in malicious code to make it more difficult to analyze. In addition, the presence
of the strings bind, listen, send, recv, RegSetValue, CreateRemoteThread and SetWindowsHook would
immediately arouse the suspicions of a malware analyst and increase the risk of the program being detected by
antivirus software. Because it is easy to discover which API functions a program calls by examining the import
table, the LoadLibrary() and GetProcAddress() API functions are used by malware such as the Spybot family of
worms to resolve the addresses of the API calls made by the body of the threat (as opposed to making the calls
directly).
Although GetProcAddress() is called after the Spybot worm has started, the presence of this call in conjunction
with a suspicious string parameter such as "send" would serve as an obvious indicator of a program's
malicious intent; the string parameter therefore must also be encrypted if this technique is to be of any
use. W32.Stration, prevalent in 2006 through 2007, is an example of a worm that decrypts Windows API function
name parameters to be passed to GetProcAddress() the first time they are called. The addresses of resolved API
functions are saved in global variables but not all API calls will be resolved unless every code path is executed,
making analysis using memory dumps difficult. When this technique is used, static decryption tools may be more
effective.
The following example is a routine for API address resolution taken from W32.Stration.CX@mm:
00401EE0 sub_401EE0 proc near
00401EE0
00401EE0 var_18 = dword ptr -18h
00401EE0 var_14 = dword ptr -14h
00401EE0 var_10 = dword ptr -10h
00401EE0 var_C = dword ptr -0Ch
00401EE0 var_8 = dword ptr -8
00401EE0 var_4 = byte ptr -4
00401EE0 arg_0 = dword ptr 4
00401EE0 arg_4 = dword ptr 8
00401EE0
00401EE0 mov eax, dword_404118 ; saved API address
00401EE5 or byte ptr word_40401C, 3Dh
00401EEC sub esp, 18h
00401EEF test eax, eax
00401EF1 jnz short loc_401F48
00401EF3 mov eax, ds:dword_4010C0 ; 637E7640h
00401EF8 mov ecx, ds:dword_4010C4 ; 44657851h
00401EFE mov edx, ds:dword_4010C8 ; 7B70797Eh
00401F04 mov [esp+18h+var_18], eax
00401F07 mov eax, ds:dword_4010CC ; 7D755872h
00401F0C mov [esp+18h+var_14], ecx
00401F10 mov ecx, ds:dword_4010D0 ; 17637472h
00401F16 mov [esp+18h+var_10], edx
00401F1A mov dl, ds:byte_4010D4 ; 0
00401F20 mov [esp+18h+var_C], eax
00401F24 mov [esp+18h+var_8], ecx
00401F28 mov [esp+18h+var_4], dl
00401F2C xor eax, eax
00401F2E mov edi, edi
00401F30
00401F30 loc_401F30:
00401F30 xor byte ptr [esp+eax+18h+var_18], 17h ; decrypting
00401F34 inc eax
00401F35 cmp eax, 14h
00401F38 jl short loc_401F30
00401F3A lea eax, [esp+18h+var_18]
00401F3D push eax ; WaitForSingleObject
00401F3E call sub_401E40 ; get the API address
00401F43 mov dword_404118, eax ; save the API address for the next time
00401F48
00401F48 loc_401F48:
00401F48 mov ecx, [esp+18h+arg_4]
00401F4C mov edx, [esp+18h+arg_0]
00401F50 push ecx
00401F51 push edx
00401F52 call eax ; call the API
00401F54 add esp, 18h
00401F57 retn 8
00401F57 sub_401EE0 endp
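The decryption loop in this routine XORs 14h bytes with the constant 17h, so the string can be recovered statically from the five DWORD values shown in the listing comments. The Python sketch below does exactly that; the helper name is illustrative, and the key and data are copied from the listing.

```python
import struct

# Encrypted DWORDs as loaded by the routine (little-endian in memory).
ENCRYPTED_DWORDS = [0x637E7640, 0x44657851, 0x7B70797E, 0x7D755872, 0x17637472]

def decrypt_api_name(dwords, key=0x17):
    """XOR every byte with the key and return the text up to the null terminator."""
    raw = b"".join(struct.pack("<I", d) for d in dwords)
    plain = bytes(b ^ key for b in raw)
    return plain.split(b"\0", 1)[0].decode("ascii")

name = decrypt_api_name(ENCRYPTED_DWORDS)
print(name)
```

The result agrees with the WaitForSingleObject comment in the listing. Because the key is a single byte, the same helper can be pointed at other encrypted strings in the sample without executing any of its code.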
Basic API Obfuscation
Whereas most MS-DOS viruses were written in assembly language, malware for Windows is more often written in high-level
languages. W32.Stration variant worms are written in C and use the method described
above to obfuscate API calls (decrypting the API name string when the API function is first called). This requires
the malware author to write code to encrypt and decrypt strings and to use these routines to call API functions.
Although these tasks can be time-consuming, a skilled analyst can decrypt the strings with relative ease, mean-
ing that home-grown implementations are not particularly prevalent.
In many cases API obfuscation can be achieved without writing custom code in C or Delphi by way of the use of
software libraries or existing packers. To make use of the former, an application is linked not against the regular
import library but instead against another library that adds a layer of misdirection or redundant code before
calling regular API functions. Packers, in contrast, typically redirect API calls to custom code following the un-
packing operation but before the execution of the program proper has begun. The commonly seen packer UPX
operates in this way. Usually the loader resolves the addresses of all the API functions called by a program, but
in this case the loader only resolves those used during the unpacking operation; the original program's imported
API information is also packed and is resolved by the unpacker instead of by the loader itself.
The remainder of this chapter provides some examples of methods used to obfuscate API calls.
Staged API Obfuscation
A regular API call consists of a single call to the target function. In code compiled from C:
call ds:GetSystemTime
Or from Delphi:
call j_GetSystemTime
. . .
j_GetSystemTime: jmp ds:GetSystemTime
Not usually seen in regular code, a layer of misdirection can be introduced by constructing a call instruction to
call another function, which in turn calls the target API:
call call_GetSystemTime
. . .
call_GetSystemTime:
mov EAX, ds:GetSystemTime
jmp EAX
This redirection procedure is termed a stage; the example above is a one-stage API obfuscation because it
requires a single set of instructions to redirect to the target API function. When many redirections are used the
technique is termed multi-stage API obfuscation.
Although multi-stage API obfuscation can easily be resolved by a (human) malware analyst, this is not the case
for IDA. IDA may display the aforementioned code as:
call sub_412458
. . .
sub_412458:
mov EAX, ds:GetSystemTime
jmp EAX
A de-obfuscation tool must rename the label sub_412458 to GetSystemTime if it is to be of maximum possible
use.
If the redirection stage exists in the same module as the one currently being analyzed, finding the address of the
system call is no problem for the analyst; there are, however, some cases where the stage component resides in
an allocated memory block and therefore out of the scope of the current module under analysis. This may look
like:
4010C0: call ds:[402780h]
. . .
402780: dd 00370000h
It is impossible to know what the above code calls without examining the contents of memory at 370000h; this is
why access to all memory blocks used by a program is essential during in-depth analysis. Locating the
memory block that contains 370000h yields:
370000: mov EAX, 77E7B476h
370005: jmp EAX
77E7B476h appears to be the Virtual Address (VA) of an API function. Searching memory dumps for a DLL
containing 77E7B476h and subsequently examining the export table reveals that the function in question is
CreateFileA(). Since most DLLs are relocatable it is important that the memory dumps of the same process are
searched (as opposed to the files in the %System% directory) in order for the VAs to be correct.
When the redirection stage is located outside of the current module, as in the example above, we term it extra-
modular one-stage API obfuscation.
If an API address de-obfuscation tool is able to add the label CreateFileA at VA 402780h, IDA correctly displays:
4010C0: call ds:[CreateFileA]
. . .
402780: CreateFileA: dd 00370000h
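The lookup chain in this example can be sketched in Python. This is a minimal illustration, not the tool described in this paper: the memory dictionary stands in for the memory dumps of one process, the exports table is a hypothetical one-entry export list, and only the mov eax, imm32 / jmp eax stub shape shown above is recognized.

```python
import struct

# Hypothetical stand-ins for the memory dumps of a single process.
memory = {
    0x402780: struct.pack("<I", 0x370000),            # pointer inside the module
    0x370000: bytes.fromhex("B876B4E777" "FFE0"),     # mov eax, 77E7B476h / jmp eax
}
exports = {0x77E7B476: "CreateFileA"}                 # would be built from dumped DLLs


def resolve_one_stage(pointer_va):
    """Follow a pointer to a 'mov eax, imm32; jmp eax' stub and name its target."""
    (stub_va,) = struct.unpack("<I", memory[pointer_va])
    stub = memory.get(stub_va, b"")
    # B8 = mov eax, imm32; FF E0 = jmp eax
    if len(stub) >= 7 and stub[0] == 0xB8 and stub[5:7] == b"\xFF\xE0":
        (target,) = struct.unpack("<I", stub[1:5])
        return exports.get(target)
    return None


print(resolve_one_stage(0x402780))
```

A name returned for 402780h is what the tool would apply as a label at that address.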
Extra-Modular Function Tables
As discussed in the previous section, an allocated memory block can be used to store a staged program that re-
directs the flow of execution. Memory blocks of this type can also be used to store API addresses. As an example:
00404A13 call dword ptr ds:0B5A068h ; references a memory block outside the module
. . .
00B5A068 dd 7743DE3Ah; SHFileOperationA
It appears that the IAT, which should reside in the same module, is located in a different memory block outside
of the module. This presents a problem for a de-obfuscation tool; it cannot add a label SHFileOperationA to the
address 00B5A068h because it is outside of the range of the module currently being displayed in IDA. The only
thing that can be done is to add the comment SHFileOperationA to the address 00404A13h.
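The choice between a label and a comment can be expressed as a simple range check. The sketch below is illustrative only; the module bounds 400000h to 420000h are an assumption and the function name is hypothetical.

```python
def annotate(call_va, slot_va, api_name, module_start, module_end):
    """Return the (kind, address, text) annotation a tool could apply."""
    if module_start <= slot_va < module_end:
        # The address slot is inside the module shown in the disassembler.
        return ("label", slot_va, api_name)
    # The slot lives in another memory block: comment the call site instead.
    return ("comment", call_va, api_name)


# Assumed module range; the addresses come from the example above.
print(annotate(0x00404A13, 0x00B5A068, "SHFileOperationA", 0x400000, 0x420000))
```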
Immediate Jumps
As mentioned above, API calls made in Delphi compile down to a call to an address followed by a jmp to the tar-
get API function address. These two steps are known as a thunk and can appear in code other than that gener-
ated by Delphi build systems. For example, if a thunk is created in an allocated memory block it can be used for
API call obfuscation, as in:
004023A8 call ds:label_4130B4
. . .
004130B4 label_4130B4 dd 972030h ; This is outside the current module.
. . .
00972030 jmp near ptr 77E5B476h ; CreateFileA
Since this example has a label at 004130B4h, which is inside the current module, the de-obfuscation tool can
rename label_4130B4 to CreateFileA which results in the instruction at 004023A8h correctly being displayed
as call ds:CreateFileA.
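Resolving a thunk of this kind requires decoding the relative displacement of the jmp. The following Python sketch assumes the jump at 00972030h is encoded as a five-byte E9 rel32 instruction (the raw bytes are not shown in the listing, so they are reconstructed here).

```python
import struct


def near_jmp_target(va, code):
    """Return the destination of an 'E9 rel32' near jump located at va."""
    if len(code) < 5 or code[0] != 0xE9:
        raise ValueError("not a near jmp")
    (rel,) = struct.unpack("<i", code[1:5])      # signed 32-bit displacement
    return (va + 5 + rel) & 0xFFFFFFFF           # relative to the next instruction


# Assumed encoding of the jmp at 00972030h.
rel32 = (0x77E5B476 - (0x00972030 + 5)) & 0xFFFFFFFF
thunk = b"\xE9" + struct.pack("<I", rel32)
print(hex(near_jmp_target(0x00972030, thunk)))
```

The resulting address can then be looked up in the export tables of the dumped DLLs.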
Jump-in
Regardless of the tricks used along the way, if a call eventually reaches the address of an API function the ana-
lyst can resolve the obfuscation. For example, if a call of a one-stage API obfuscation reaches 77E7B476h and
77E7B476h is the entry address of CreateFileA(), the obfuscation is resolved. This means that if a full list of all
the API function addresses in all the DLLs loaded by a given process is generated, an automated de-obfuscation
tool can trace all call and jmp instructions in the current process until one of the API addresses in the list is
reached.
In order to prevent the use of this method of de-obfuscation a technique termed jump-in API obfuscation is
used; the target of a thunk operation is altered to be several instructions after the entry address of the target
API function, as in:
00401922 call sub_403C08
. . .
00403C08 jmp ds:off_404090
. . .
00404090 off_404090 dd offset unk_40C4A1
. . .
0040C4A1 unk_40C4A1 jmp near ptr 0040C4A4h
0040C4A3 db 0EAh ; dummy byte to distract
0040C4A4 push 0
0040C4A6 jmp near ptr 77E41BECh ; API entry (Sleep()) + 2
This example of one-stage API obfuscation does not hit any API function entry points; the Sleep() API function
starts at 77E41BEAh, but this is not the address reached by the final jmp instruction. A push 0 instruction can
be seen before the final jmp to 77E41BECh; subtracting 2 from this address (i.e. the length of the push 0 instruc-
tion) yields the entry address of the API function Sleep(). The first instruction in the Sleep() function is also
push 0; the instruction has been copied to ensure that no functionality is lost or skipped when jumping into the
middle of the Sleep() routine proper.
It may be possible to de-obfuscate API calls made in the manner above by simply subtracting the length of the
instructions executed before the jump and adjusting the target address appropriately, but redundant or dummy
instructions may have been inserted (such as jmp $+1 or db 0EAh, as seen in the example). The redundant code
here is three bytes long, but it is non-trivial to determine whether an arbitrary sequence of instructions is
redundant or not, and an over- or under-estimation will result in the calculated address of the target API function
entry point being incorrect. One way around this problem is simply to select the nearest API entry address that
precedes the jump target; if a tool can suggest an approximate adjustment this may be enough.
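A nearest-address lookup of this kind might be sketched as follows. The two export addresses are taken from the examples in this paper; the max_offset cut-off of 16 bytes is an arbitrary assumption, and a real export list would contain every function of every loaded DLL.

```python
import bisect

# Small illustrative export list (addresses as seen in this paper's examples).
exports = sorted([(0x77E41BEA, "Sleep"), (0x77E5A325, "SleepEx")])
addresses = [va for va, _ in exports]


def nearest_entry(target_va, max_offset=16):
    """Return (name, offset) of the closest export at or below target_va."""
    i = bisect.bisect_right(addresses, target_va) - 1
    if i < 0:
        return None
    va, name = exports[i]
    offset = target_va - va
    # Reject matches that are too far into the function to be plausible.
    return (name, offset) if offset <= max_offset else None


print(nearest_entry(0x77E41BEC))
```

The reported offset should agree with the total length of the copied instructions, which gives a cheap sanity check on the suggestion.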
Jump-in obfuscation may originally have been developed to prevent API call monitoring tools from functioning;
in the above example the first address of the Sleep() API function (as exported by kernel32.dll) is never executed
and as such any monitoring tool will never trigger if only the first instruction is hooked.
Advanced API Obfuscation
API obfuscation has evolved to such an extent that a simple tool can no longer fully resolve all obfuscated API
calls. Commercial packers such as ASProtect, Enigma, Themida and Obsidium are being armored with ever-
more sophisticated API obfuscation techniques, and in many cases make use of several obfuscation techniques
simultaneously. To de-obfuscate methods such as these, many CPU instructions have to be emulated; it would
take many months for an analyst to perform the task by hand. As such a timescale is not acceptable, automated
emulation tools can be used, but this approach also has drawbacks. Some examples of advanced API obfuscation
techniques appear in the following sections.
Logic Stage and Skipper Stage Obfuscation
As detailed above, staged calls can obfuscate API calls to some extent, but these can usually be resolved as there
are clear patterns involved. A more complex technique is termed logic stage API obfuscation; although a pattern
is still evident, there is a sequence of instructions that must be executed in order to reach the target address. A
logic stage does not necessarily require emulation of code to de-obfuscate the call as the code is redundant and
does not affect the flow of execution of the program proper.
Logic stages are sometimes armored with a return address skipper: while calculating the target address to jump
to, the logic also rewrites the return address on the stack, usually adding one to it. This is termed skipper stage
API obfuscation. In many cases a logic component and a skipper component share a stage, as in:
00D50000 sbb ecx,61h ; meaningless instruction
00D50003 jmp short 00D50006h
00D50005 db 0E9h ; placed to obfuscate in disassembler
00D50006 mov ecx, 486366h ; meaningless instruction
00D5000B pop eax ; return address
00D5000C lea eax,[eax+1] ; return address += 1
00D5000F push eax ; return address is now incremented (skipper)
00D50010 push 0D40000h ; the address of the next stage
00D50015 retn ; jump to the next stage
This type of obfuscation can be seen in Backdoor.Graybird, which contains code as below:
00411000 push eax
00411001 call ds:[420008h] ; points to logic and skipper stage
00411007 db 0E9h ; a skipped byte
00411008 or eax, eax ; instruction pointer returns here from the call
The instructions from 411007h onwards are shown correctly because the skipper stage was noticed and the addresses
adjusted accordingly. This adjustment could not have been made by a disassembler alone; the byte 0E9h at
411007h would likely have been interpreted as the start of jmp XXXXXXh (with XXXXXX being a meaningless address),
meaning that the flow of the program would appear incorrectly. A de-obfuscation tool must detect the skipper stage
and return address adjustment, recognize which API function is being called and add a label to 420008h; it must
then undo its analysis from 411007h and reanalyze from 411008h onwards.
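The effect of a skipper stage on the return address can be reproduced with a tiny stack simulation. The sketch below assumes that the stage reached through 420008h has the same pop / lea / push / push / retn shape as the stage listed at 00D50000h; the function name and parameters are hypothetical.

```python
def run_skipper_stage(call_va, call_len, skip=1, next_stage=0x0D40000):
    """Mimic pop/lea/push/push/retn and report where execution goes."""
    stack = [call_va + call_len]     # the call pushes its return address
    eax = stack.pop()                # pop eax
    eax += skip                      # lea eax, [eax+1]
    stack.append(eax)                # push eax (adjusted return address)
    stack.append(next_stage)         # push 0D40000h
    jump_to = stack.pop()            # retn transfers control to the next stage
    return jump_to, stack[-1]        # (next stage, address the API returns to)


# call ds:[420008h] at 00411001h is six bytes long.
print([hex(v) for v in run_skipper_stage(0x00411001, 6)])
```

The second value is the address from which disassembly should resume.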
Copied and Substituted Obfuscation
ASProtect is a packer that is often used to obfuscate API calls. One of its obfuscation methods is to copy the
whole body of the target API function into memory owned by the process. This is termed copied API obfusca-
tion, for example:
01230000 mov eax,fs:[18h]
01230006 mov ecx,[eax+30h]
01230009 mov eax, dword ptr [ecx+0B0h]
0123000F movzx edx, word ptr [ecx+0ACh]
01230016 xor eax,0FFFFFFFEh
01230019 shl eax,0Eh
0123001C or eax,edx
0123001E shl eax,8
01230021 or eax,[ecx+0A8h]
01230027 shl eax,8
0123002A or eax,[ecx+0A4h]
01230030 ret
The above code is taken from an allocated memory block and is called from the main program. Since it does
not reference any system DLL it is impossible to tell directly from the address what API function was called, or
indeed whether it represents an API call at all. Searching kernel32.dll for this block of code yields the result that
the entire block matches the GetVersion() API function.
Although it can be time consuming to search existing DLLs for identical snippets of code, it is not difficult; the
code can be found verbatim. A more sophisticated approach to this method of API obfuscation is to copy the
whole or partial API routine and insert some additional code to fix up the address displacement. For example,
the jz conditional jump instruction takes two bytes of memory if the distance to jump is between -128 and +127
bytes of the current operation. If a routine containing this instruction is copied, this distance is likely to increase
and as such the jz instruction will require six bytes of memory; the obfuscation routine is able to adjust the
code accordingly. This means that any tool capable of de-obfuscating techniques such as these must be able
to compare blocks of code in terms of underlying logic as opposed to surface structure. The following example
illustrates this necessity:
(Code copied from kernel32.dll into the main program):
00402000 6A 00 push 0
00402002 FF 74 24 08 push [esp+8]
00402006 E8 1A 83 A5 77 call SleepEx @ 77E5A325
0040200B C2 04 00 ret 4
(Code as in kernel32.dll):
77E41BEA 6A 00 push 0
77E41BEC FF 74 24 08 push [esp+8]
77E41BF0 E8 30 87 01 00 call SleepEx @ 77E5A325
77E41BF5 C2 04 00 ret 4
The above examples are both from the Sleep() API function but a simple binary comparison will fail to match
because the machine code for call SleepEx differs.
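One way to compare such blocks is to replace each relative call displacement with its absolute target before comparing. The Python sketch below does this for the two listings above; the insn_len helper covers only the four opcodes that occur in this example and is an assumption standing in for a full x86 length decoder.

```python
import struct


def insn_len(code, i):
    """Instruction length for the few opcodes used in this example only."""
    op = code[i]
    if op == 0x6A:
        return 2                                  # push imm8
    if op == 0xE8:
        return 5                                  # call rel32
    if op == 0xC2:
        return 3                                  # ret imm16
    if code[i:i + 3] == b"\xFF\x74\x24":
        return 4                                  # push [esp+disp8]
    raise ValueError("opcode not covered by this sketch")


def normalize(va, code):
    """Replace each call's rel32 with its absolute target address."""
    out, i = [], 0
    while i < len(code):
        n = insn_len(code, i)
        if code[i] == 0xE8:
            (rel,) = struct.unpack("<i", code[i + 1:i + 5])
            out.append(("call", (va + i + 5 + rel) & 0xFFFFFFFF))
        else:
            out.append(code[i:i + n])
        i += n
    return out


copied = bytes.fromhex("6A00 FF742408 E81A83A577 C20400")
original = bytes.fromhex("6A00 FF742408 E830870100 C20400")
print(copied == original)   # raw bytes differ
print(normalize(0x00402000, copied) == normalize(0x77E41BEA, original))
```

Both calls normalize to 77E5A325h (SleepEx), so the two blocks compare equal once position-dependent bytes are removed.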
The semantic comparison of code can be computationally expensive and hence time-consuming; an optimization
to improve performance is to enumerate system DLLs, ordering on frequency of use (for example, kernel32.dll >
advapi32.dll > user32.dll > shell32.dll), and perform the search based on this ranking.
Another example of copied API code is as follows, termed substituted API obfuscation:
00F40000 mov eax,fs:[18h]
00F40006 mov eax,[eax+34h]
00F40009 ret
The above code can be found in the function RtlGetLastWin32Error() in ntdll.dll; in fact, the often-used
GetLastError() API function in kernel32.dll simply redirects to this function. It would be optimal if a
de-obfuscation tool were to be aware of this substitution and insert the more commonly used API call as a label
in the appropriate place.
Push-ret and Push-calc-ret Obfuscation
The simple technique of pushing the target API address to the stack and executing a ret instruction is termed
push-ret API obfuscation; however, some enhanced versions are as follows:
003C80B0 call dword ptr [3E82B8h] ; calls 17B000Dh
. . .
003E83B8 dd 17B000Dh ; in another memory block
. . .
017B000D push 3E62B8CDh
017B0012 sub dword ptr [esp], 0CCC079FFh ; = 71A23ECEh (bind())
017B0019 ret
The code in the example above pushes an immediate value of 3E62B8CDh on to the stack, then subtracts
0CCC079FFh to yield 71A23ECEh; this is the address of the bind() function from the Windows implementation
of the Berkeley sockets API. The de-obfuscation tool should be aware of this technique and calculate the value on
the top of the stack just before the ret instruction is executed. Since this technique requires calculation between
push and ret it is termed the push-calc-ret API obfuscation.
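The calculation is ordinary 32-bit modular arithmetic, as this short Python check shows (the function name is illustrative):

```python
MASK32 = 0xFFFFFFFF


def push_calc_ret_target(pushed, subtrahend):
    """Value on top of the stack when ret executes (32-bit wraparound)."""
    return (pushed - subtrahend) & MASK32


print(hex(push_calc_ret_target(0x3E62B8CD, 0xCCC079FF)))
```

The subtraction wraps around because the subtrahend is larger than the pushed value, which is why the mask is needed.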
An example of further enhanced push-ret API obfuscation, seen in certain Trojan horse programs with back door
functionality, is as follows:
004014DA mov esi, offset unk_404907 ; stores DWORD-value list
004014DF push dword ptr [esi+30h] ; pushes 8DC82618h
004014E2 push loc_4014ED ; return address
004014E7 push loc_4010A4 ; call destination
004014EC ret ; calls 4010A4h
004014ED <next instruction> ; returns here
. . .
004010A4 mov edx, [esp+4] ; 8DC82618h <-- came from [esi+30h]
004010A8 mov ecx, [esp+0] ; 004014EDh (return address)
004010AB add esp, 8
004010AE ror edx, 0FAh
004010B1 sub edx, dword_404027 ; == 0FA23D1ADh
004010B7 push ecx ; returning address
004010B8 push edx ; API address of CreateFileA
004010B9 ret ; jumps to CreateFileA
The block of code from 4010A4h onwards is common to all API calls made by the program. The code from
4014DAh has clearly not been compiled from a high-level language, and as it does not explicitly call anything it
initially appears as though no API calls are made. The offset from the ESI register, however, is actually used to
select the API, with [esi+30h] holding the encoded address of CreateFileA() and [esi+34h] referring to another API
function. A de-obfuscation tool must be able to recognize the following sequence of instructions:
mov esi, xxxx
push dword ptr [esi+xx]
push xxxx
push xxxx
ret
Some simple address arithmetic must then be performed to calculate the address of the API function. The tool
can then insert an IDA comment containing the function's name.
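For the sample above, the arithmetic consists of the ror and sub instructions at 4010AEh and 4010B1h. A minimal Python sketch follows; the function names are illustrative, and the only processor behaviour relied upon is that x86 masks a 32-bit rotate count to its low five bits, so a count of 0FAh rotates by 26.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, count):
    """Rotate a 32-bit value right; x86 masks the count to 5 bits."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32


def decode_api_address(encoded, key, rotation=0xFA):
    """Mirror the ror/sub pair applied to the value pushed from the table."""
    return (ror32(encoded, rotation) - key) & MASK32


print(hex(decode_api_address(0x8DC82618, 0xFA23D1AD)))
```

The result, 77E5B476h, is the address annotated as CreateFileA in the immediate-jump example earlier in this paper.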
Padded and Copied API Obfuscation (Themida)
Certain obfuscating packers, such as Themida and Enigma, copy code from API functions and in addition interleave
redundant instructions with the copied code. They may also replace blocks of code with equivalent but
longer sequences of instructions. It is not known where this technique originated but its use was observed in the
Themida packer between 2005 and 2006.
The API obfuscation techniques used by Themida are explained in the presentation Analysis and Visualization of
Common Packers [2].
The following is an example of these methods:
00401B77 call 2930000h
...
02930000 push edx ; making room for EBP
02930001 push eax ; save EAX
02930002 push edx ; save EDX
02930003 jmp 293000Eh
...
0293000E rdtsc ; destroys EDX:EAX
02930010 jmp 2930029h
...
02930029 pop edx ; restore EDX
0293002A pop eax ; restore EAX
0293002B mov [esp],ebp
The code block starting at 02930000h contains junk code from 02930001h to 0293002Ah. Removing this block
yields the following:
02930000 push edx ; making room for EBP
0293002B mov [esp],ebp
Patterns of code replacement can be exploited during analysis. A de-obfuscation tool may have an internal
replacement table which maps obfuscated sequences of instructions to their more concise counterparts, for
example mapping push reg32(1), mov [esp], reg32(2) to push reg32(2). The example above can be replaced with
the simpler push ebp, with the remaining task being to follow the steps necessary to resolve the jump-in
obfuscation as previously detailed.
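A single entry of such a replacement table could be sketched as a peephole rewrite over instruction text. The sketch assumes that junk code has already been removed and that instructions are available as lower-case strings; both the representation and the function name are assumptions made for illustration.

```python
import re

PUSH = re.compile(r"push (e[a-z]{2})$")
STORE = re.compile(r"mov \[esp\],\s*(e[a-z]{2})$")


def simplify(instructions):
    """Rewrite 'push r1; mov [esp], r2' as the equivalent 'push r2'."""
    out = []
    for insn in instructions:
        store = STORE.match(insn)
        if store and out and PUSH.match(out[-1]):
            # The pushed value is overwritten immediately, so only r2 matters.
            out[-1] = f"push {store.group(1)}"
        else:
            out.append(insn)
    return out


print(simplify(["push edx", "mov [esp],ebp"]))
```

A fuller table would hold many such patterns and apply them repeatedly until no further rewrite is possible.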
Padded and Copied API Obfuscation (Enigma)
The Enigma Protector obfuscates API calls in a similar way to Themida but introduces more complexity:
00401753 call dword ptr ds:973245h ; it points to 974819h
...
00974819 call 97481Fh ;
0097481E push esi ; dummy instruction
0097481F call 974827h ;
00974824 jmp 97482Ah
00974826 db 15h ; dummy code
00974827 ret 4 ;
0097482A add esp, 5C9099Bh
00974830 mov [esp-5C9099Fh],esi ; mov [esp-4], esi
00974837 add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
0097483D call 974846h ;
00974842 db 80h, 0DEh, 9Dh, 70h ; dummy code
00974846 add esp,4
It is necessary to examine the code carefully to observe how the stack pointer is manipulated. Following the
second call instruction, the stack pointer is decreased by 8 to store the two return addresses. The ret instruction
at 00974827h not only returns, however, but also adds 4 to the stack pointer, resetting it to its original state. The
block of code from 00974819h to 00974827h is redundant and can be deleted. The block from 0097483Dh to
00974846h can also be removed. Removing these blocks yields:
0097482A add esp, 5C9099Bh
00974830 mov [esp-5C9099Fh],esi ; mov [esp-4], esi
00974837 add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
The adding of such a large number to the stack pointer may cause alarm. This sequence of instructions is likely
to cause problems in an MS-DOS environment as an interrupt may occur before the stack pointer is restored,
resulting in stack corruption. This, however, is not a problem for user-space executables in the Windows environ-
ment, and as such these three instructions are the equivalent of the shorter push esi.
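The equivalence can be checked by simulating the stack pointer with 32-bit wraparound. In the sketch below the displacement of the mov is taken to be 5C9099Fh, the value for which the listing's own comment (mov [esp-4], esi) holds; the initial ESP and ESI values are arbitrary assumptions.

```python
MASK32 = 0xFFFFFFFF


def run_sequence(esp, esi):
    """Apply the three instructions; return (final esp, {address: value})."""
    writes = {}
    esp = (esp + 0x05C9099B) & MASK32              # add esp, 5C9099Bh
    writes[(esp - 0x05C9099F) & MASK32] = esi      # mov [esp-5C9099Fh], esi
    esp = (esp + 0xFA36F661) & MASK32              # add esp, 0FA36F661h
    return esp, writes


def run_push(esp, esi):
    """Reference behaviour of 'push esi'."""
    esp = (esp - 4) & MASK32
    return esp, {esp: esi}


print(run_sequence(0x0012FF80, 0xDEADBEEF) == run_push(0x0012FF80, 0xDEADBEEF))
```

Both versions leave ESP four bytes lower and store ESI at the new top of the stack.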
As with Themida, patterns of junk code and code replacement can be observed. If a list of possible patterns is
available, an automated tool should be able to reconstruct the (possible) original code and compare it with rou-
tines in system DLLs, thus locating the called API function.
Splicing Intensive Instructions to Provide Obfuscation (Obsidium)
Sometimes emulation of code is required to obtain target API addresses. One such situation is the obfuscation
technique in which all API calls are replaced by jumps to a common routine which dispatches the flow of control
to the various target API functions. Disambiguation is performed using the EDX register, which is set prior to the
jump to the dispatch routine and, following some calculations, is used to index into a table of API function
addresses. Another similar obfuscation technique is to determine the address of the target API function by way of
the address from which the common dispatch routine is called.
Emulation is often effective in dealing with these kinds of API obfuscation, but can be time-consuming and
occasionally yields wrong answers. This means that it is best considered to be a last resort, employed only when the
use of conventional analytical techniques proves impossible.
Obsidium is a packer that requires emulation to resolve its API obfuscation. It installs a custom Structured
Exception Handler (SEH) and intentionally executes erroneous instructions in order to jump to this code, as shown
in the following example:
008B6037 55 push ebp
008B6038 8B EC mov ebp, esp
008B603A 81 EC 30 01 00 00 sub esp, 130h
008B6040 EB 04 jmp short 008B6046
008B6046 60 pusha
008B6047 EB 04 jmp short 008B604D
008B604D 9C pushf
008B604E EB 03 jmp short 008B6053
008B6053 EB 04 jmp short 008B6059
008B6059 E8 00 00 00 00 call $+5 (008B605E)
008B605E EB 01 jmp short 008B6061
008B6061 5E pop esi
008B6062 EB 03 jmp short 008B6067
008B6067 EB 01 jmp short 008B606A
008B606A 8B 96 64 03 00 00 lea edx, [esi+364h]
008B6070 EB 04 jmp short 008B6076
008B6076 33 C0 xor eax, eax
008B6078 EB 03 jmp short 008B607D
008B607D 52 push edx
008B607E EB 01 jmp short 008B6081
008B6081 64 FF 30 push dword ptr fs:[eax]
008B6084 EB 01 jmp short 008B6087
008B6087 64 89 20 mov fs:[eax], esp
008B608A EB 01 jmp short 008B608D
008B608D EB 03 jmp short 008B6092
008B6092 EB 02 jmp short 008B6096
008B6096 EB 36 jmp short 008B60CE
008B60CE EB 01 jmp short 008B60D1
008B60D1 8B 54 24 30 mov edx, [esp+30h]
008B60D5 EB 01 jmp short 008B60D8
008B60D8 EB C1 jmp short 008B609B
008B609B EB 02 jmp short 008B609F
008B609F F7 C2 01 00 00 00 test edx, 1
008B60A5 EB 04 jmp short 008B60AB
008B60AB 74 0C jz 008B60B9
008B60AD EB 04 jmp short 008B60B3
008B60B3 0F 0B ud2 ; undefined opcode
008B60B5 EB 02 jmp short 008B60B9
008B60B9 EB 03 jmp short 008B60BE
008B60BE F7 F0 div eax ; division by zero
Removing short jumps, the above code is as follows:
008B6037 55 push ebp
008B6038 8B EC mov ebp, esp
008B603A 81 EC 30 01 00 00 sub esp, 130h
008B6046 60 pusha ; push EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI
008B604D 9C pushf ; push EFLAGS
008B6059 E8 00 00 00 00 call $+5 (008B605E)
008B6061 5E pop esi ; esi = 008B6061h
008B606A 8B 96 64 03 00 00 lea edx, [esi+364h] ; edx = 008B63C5h
008B6076 33 C0 xor eax, eax
008B607D 52 push edx ; exception handler address (008B63C5h)
008B6081 64 FF 30 push dword ptr fs:[eax]
008B6087 64 89 20 mov fs:[eax], esp
008B60D1 8B 54 24 30 mov edx, [esp+30h] ; value from uninitialized stack variable
008B609F F7 C2 01 00 00 00 test edx, 1
008B60AB 74 0C jz 008B60BE
008B60B3 0F 0B ud2 ; undefined opcode
008B60BE F7 F0 div eax ; division by zero
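Removing the short jumps amounts to following each EB rel8 instruction to its destination until a different instruction is reached. A minimal Python sketch, using a fragment of the bytes from the first listing; the dictionary representation and the function names are assumptions made for illustration.

```python
def short_jmp_target(va, rel8):
    """Destination of a two-byte 'EB rel8' jump located at va."""
    if rel8 >= 0x80:
        rel8 -= 0x100                    # rel8 is a signed byte
    return (va + 2 + rel8) & 0xFFFFFFFF


def follow(va, code_at):
    """Skip consecutive short jumps; code_at maps VA -> instruction bytes."""
    seen = set()
    while va in code_at and code_at[va][0] == 0xEB and va not in seen:
        seen.add(va)                     # guard against jump loops
        va = short_jmp_target(va, code_at[va][1])
    return va


# Fragment of the listing: five chained jumps leading to mov edx, [esp+30h].
listing = {
    0x008B608A: bytes.fromhex("EB01"),
    0x008B608D: bytes.fromhex("EB03"),
    0x008B6092: bytes.fromhex("EB02"),
    0x008B6096: bytes.fromhex("EB36"),
    0x008B60CE: bytes.fromhex("EB01"),
    0x008B60D1: bytes.fromhex("8B542430"),
}
print(hex(follow(0x008B608A, listing)))
```

Backward jumps such as EB C1 at 008B60D8h are handled by the sign adjustment in short_jmp_target.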
A CPU exception occurs when execution reaches either 008B60B3h or 008B60BEh. The exception is handled by
the SEH at 008B63C5h, shown here:
008B63C5 EB 03 jmp short 008B63CA
008B63CA E8 00 00 00 00 call $+5 (008B63CF)
008B63CF EB 02 jmp short 008B63D3
008B63D3 5A pop edx
008B63D4 EB 01 jmp short 008B63D7
008B63D7 8B 8A 95 FB FF FF mov ecx, [edx-46Bh]
008B63DD EB 04 jmp short 008B63E3
. . .
The exceptions that will be encountered when debugging the above code are likely to be extremely distracting.
This, coupled with the jumps to random locations and the fact that some 100,000 instructions must be emulated
before the target API function is reached, makes the analysis of the above code a troublesome and difficult task.
A further complication exists in that the obfuscated code deliberately calls a ret instruction located inside an OS
DLL, thus rendering useless the technique of checking all instructions for references to system libraries. A
de-obfuscation tool must be able to recognize these kinds of dummy call-ret sequences and avoid flagging them as
significant.
16-bit Addressing Obfuscation
A de-obfuscation tool that includes emulation of instructions must support 16-bit addressing. This initially
seems counter-intuitive given the 32-bit Win32 environment; the 0 to 0FFFFh address range is not normally
accessible from user-space, and indeed access to address 0 will cause a page fault exception. In SEH code, though,
offset 0 is accessed through the FS segment (for example as fs:[eax] with EAX set to 0), which maps the thread
information block (TIB). The following code is an example of this operation:
64 67 FF 36 00 00 push dword ptr fs:[0] ; 67 is the address-size prefix, selecting 16-bit addressing
64 67 89 26 00 00 mov fs:[0], esp
This sequence of instructions is not seen very often in user-space code given that 32-bit addressing is central to
the architecture and operation of the underlying OS. A de-obfuscation tool must support 16-bit addressing in order
to avoid missing API calls made in this way.
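The effect of the 67h prefix on operand decoding can be illustrated with a small Python sketch. It handles only the push r/m32 (FF /6) form with a ModRM byte of 36h, as in the example above; the function name is hypothetical and every other addressing form is deliberately rejected.

```python
import struct


def decode_push_operand(code):
    """Describe the memory operand of 'FF /6' with optional 64h/67h prefixes."""
    i, segment, addr16 = 0, "ds", False
    while code[i] in (0x64, 0x67):
        if code[i] == 0x64:
            segment = "fs"                # FS segment override
        else:
            addr16 = True                 # address-size override: 16-bit addressing
        i += 1
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if code[i] != 0xFF or reg != 6:
        raise ValueError("not a push r/m32")
    if mod == 0 and rm == 6:
        if addr16:                        # 16-bit form: a bare 16-bit displacement
            (disp,) = struct.unpack("<H", code[i + 2:i + 4])
            return f"{segment}:[{disp:04X}h]"
        return f"{segment}:[esi]"         # same ModRM byte in 32-bit addressing
    raise ValueError("addressing form not covered by this sketch")


print(decode_push_operand(bytes.fromhex("6467FF360000")))
```

An emulator that ignores the prefix would read the operand as fs:[esi] and misinterpret the two displacement bytes as the start of the next instruction.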
Conclusion
Uncovering obfuscated API calls is a difficult task given the wide range of obfuscation techniques that can be
used and combined to hide a program's functionality. Emulation centered on the call instruction may initially
seem to be an effective method of de-obfuscation but suffers from the disadvantage of being defeated by the
copying of API code and may yield false positives when the emulated instruction pointer reaches an OS module.
Emulation can also be time-consuming and as such may not be the best choice in situations in which results are
required in a timely manner. It is therefore necessary to design a modular de-obfuscation tool able to deal with
the myriad of techniques described in this paper.
References
1. Operating System Interface Obfuscation and the Revealing of Hidden Operations. Abhinav Srivastava, Andrea Lanzi, Jonathon Giffin.
www.cc.gatech.edu/research/reports/GT-CS-08-09.pdf
2. Analysis and Visualization of Common Packers. Ero Carrera.
http://nchovy.kr/uploads/3/301/D1T1%20-%20Ero%20Carrera%20-%20Analysis%20and%20Visualization%20of%20Common%20Packers.pdf
Copyright 2009 Symantec Corporation. All rights reserved.
Symantec and the Symantec logo are trademarks or registered
trademarks of Symantec Corporation or its affiliates in the
U.S. and other countries. Other names may be trademarks of
their respective owners.
About the author
Masaki Suenaga is a Senior Software
Engineer with Symantec Security Response.
This paper was originally presented at the AVAR2009 Conference. For more information on AVAR, please visit https://www.aavar.org/.
NO WARRANTY . The technical information is being delivered to you as is and Symantec Corporation makes no warranty as to its accuracy or use. Any use of the
technical documentation or the information contained herein is at the risk of the user. Documentation may include technical or other inaccuracies or typographical
errors. Symantec reserves the right to make changes without prior notice.