
Portable Executable

From Wikipedia, the free encyclopedia


Not to be confused with Portable application.
Portable Executable
Filename extension: .cpl, .exe, .dll, .ocx, .sys, .scr, .drv, .tlb
Developed by: Microsoft
Type of format: Binary, executable, object, shared libraries
Extended from: DOS MZ executable, COFF

The Portable Executable (PE) format is a file format for executables, object code and DLLs,
used in 32-bit and 64-bit versions of the Windows operating system. The term "portable" refers to
the format's versatility across numerous operating-system environments and architectures. The
PE format is a data structure that encapsulates the information necessary for the Windows OS
loader to manage the wrapped executable code. This includes dynamic library references for
linking, API export and import tables, resource management data and thread-local storage (TLS)
data. On NT operating systems, the PE format is used for EXE, DLL, SYS (device driver), and
other file types. The Extensible Firmware Interface (EFI) specification states that PE is the
standard executable format in EFI environments.

PE is a modified version of the Unix COFF file format. PE/COFF is an alternative term in
Windows development.

On Windows NT operating systems, PE currently supports the IA-32, IA-64, and x86-64
(AMD64/Intel64) instruction set architectures (ISAs). Prior to Windows 2000, Windows NT
(and thus PE) supported the MIPS, Alpha, and PowerPC ISAs. Because PE is used on Windows
CE, it continues to support several variants of the MIPS, ARM (including Thumb), and SuperH
ISAs.

Contents

• 1 Brief history
• 2 Technical details
o 2.1 Layout
o 2.2 Import Table
o 2.3 Relocations
• 3 .NET, metadata, and the PE format
• 4 Use on other operating systems
• 5 See also
• 6 References

• 7 External links

Brief history


Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating
system. All later versions of Windows, including Windows 95/98/ME, support the file structure.
The format has retained limited legacy support to bridge the gap between DOS-based and NT
systems. For example, PE/COFF headers still include an MS-DOS executable program, which is
by default a stub that displays the simple message "This program cannot be run in DOS mode"
(or similar). PE also continues to serve the changing Windows platform. Some extensions
include the .NET PE format (see below), a 64-bit version called PE32+ (sometimes PE+), and a
specification for Windows CE.

Technical details


Layout

A PE file consists of a number of headers and sections that tell the dynamic linker how to map
the file into memory. An executable image consists of several different regions, each of which
requires different memory protection, so the start of each section must be aligned to a page
boundary. For instance, typically the .text section (which holds program code) is mapped as
execute/readonly, and the .data section (holding global variables) is mapped as no-
execute/readwrite. However, to avoid wasting space, the different sections are not page aligned
on disk. Part of the job of the dynamic linker is to map each section to memory individually and
assign the correct permissions to the resulting regions, according to the instructions found in the
headers.
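The chain from headers to mapped sections starts with the DOS ("MZ") header: its 32-bit field at offset 0x3C (e_lfanew) points to the "PE\0\0" signature that begins the NT headers. A minimal sketch in Python (the synthetic image below is illustrative, not a loadable executable):

```python
import struct

def pe_header_offset(data: bytes) -> int:
    """Return the file offset of the "PE\\0\\0" signature.

    Every PE file begins with a DOS (MZ) header; the little-endian
    32-bit value at offset 0x3C (e_lfanew) locates the NT headers.
    """
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return e_lfanew

# Minimal synthetic image: MZ magic, zero padding up to offset 0x3C,
# e_lfanew = 0x40, then the PE signature at offset 0x40.
image = b"MZ" + b"\0" * 0x3A + struct.pack("<I", 0x40) + b"PE\0\0"
print(pe_header_offset(image))  # 64
```

Real loaders go on to parse the optional header and the section table found immediately after the signature.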

Import Table

One section of note is the import address table (IAT), which is used as a lookup table when the
application calls a function in a different module. Imports can be resolved either by ordinal
or by name. Because a compiled program cannot know the memory location of the
libraries it depends upon, an indirect jump is required whenever an API call is made. As the
dynamic linker loads modules and joins them together, it writes jump instructions into the IAT
slots, so that they point to the memory locations of the corresponding library functions. Though
this adds an extra jump over the cost of an intra-module call, resulting in a performance penalty,
it provides a key benefit: dynamic libraries are much more flexible and reduce code redundancy
(which would occur if common libraries had to be linked statically to each program). If the
compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can
produce more optimized code that simply results in an indirect call opcode.
Texe and LordPE are tools that can be used to view the Import and Export tables of PE Files.
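The loader's IAT fix-up described above can be modeled in miniature: the program always calls through a slot, and the loader overwrites the slot at load time. A toy Python sketch (the slot, helper names, and the "MessageBox" stand-in are illustrative, not the Windows API):

```python
# A toy model of import-address-table (IAT) binding: the "program" calls
# indirectly through a slot; the "loader" writes the real function into
# the slot once the dependent module is mapped.

def make_import_slot():
    def unresolved(*args, **kwargs):
        raise RuntimeError("import not yet bound by loader")
    return [unresolved]          # one mutable IAT slot

iat_messagebox = make_import_slot()

def program():
    # Every inter-module call pays one extra hop through the slot.
    return iat_messagebox[0]("hello")

def loader_bind():
    # The real export, located when the "DLL" is loaded.
    iat_messagebox[0] = lambda text: f"MessageBox: {text}"

loader_bind()
print(program())  # MessageBox: hello
```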

Relocations

PE files do not contain position-independent code. Instead they are compiled to a preferred base
address, and all addresses emitted by the compiler/linker are fixed ahead of time. If a PE file
cannot be loaded at its preferred address (because it's already taken by something else), the
operating system will rebase it. This involves recalculating every absolute address and
modifying the code to use the new values. The loader does this by comparing the preferred and
actual load addresses and calculating a delta value, which is then added to each absolute address
recorded in the base-relocation list to produce the correct address. The resulting code is now private to the
process and no longer shareable, so many of the memory saving benefits of DLLs are lost in this
scenario. It also slows down loading of the module significantly. For this reason rebasing is to be
avoided wherever possible, and the DLLs shipped by Microsoft have base addresses pre-
computed so as not to overlap. In the no rebase case PE therefore has the advantage of very
efficient code, but in the presence of rebasing the memory usage hit can be expensive. This
contrasts with ELF, which typically uses fully position-independent code and a global offset
table, trading execution time for lower memory usage.
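The delta computation is simple arithmetic, sketched below (the addresses are illustrative):

```python
def rebase(absolute_addrs, preferred_base, actual_base):
    """Apply a base relocation: add (actual - preferred) to each
    absolute address recorded in the relocation list."""
    delta = actual_base - preferred_base
    return [addr + delta for addr in absolute_addrs]

# A module preferring base 0x10000000 but loaded at 0x20000000:
fixed = rebase([0x10001000, 0x10002040], 0x10000000, 0x20000000)
print([hex(a) for a in fixed])  # ['0x20001000', '0x20002040']
```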

.NET, metadata, and the PE format


Microsoft's .NET Framework has extended the PE format with features which support the
Common Language Runtime. Among the additions are a CLR Header and CLR Data section.
Upon loading a binary, the OS loader yields execution to the CLR via a reference in the
PE/COFF IMPORT table. The CLR then loads the CLR Header and Data sections.

The CLR Data section contains two important segments: Metadata and Intermediate Language
(IL) code:

• Metadata contains information relevant to the assembly, including the assembly manifest.
A manifest describes the assembly in detail including unique identification (via a hash,
version number, etc.), data on exported components, extensive type information
(supported by the Common Type System (CTS)), external references, and a list of files
within the assembly. The CLR environment makes extensive use of metadata.
• Intermediate Language (IL) code is abstracted, language independent code that satisfies
the .NET CLR's Common Intermediate Language (CIL) requirement. The term
"Intermediate" refers to the nature of IL code being cross-language and cross-platform
compatible. This intermediate language, similar to Java bytecode, allows platforms and
languages to support the common .NET CLR. IL supports object-oriented programming
(polymorphism, inheritance, abstract types, etc.), exceptions, events, and various data
structures. IL code is assembled into a .NET PE for execution by the CLR.

Use on other operating systems


The PE format is also used by ReactOS, as ReactOS is intended to be binary-compatible with
Windows. It has also historically been used by a number of other operating systems, including
SkyOS and BeOS R3. However, both SkyOS and BeOS eventually moved to ELF.

As the Mono development platform intends to be binary compatible with Microsoft .NET, it uses
the same PE format as the Microsoft implementation.

On x86, Unix-like operating systems, some Windows binaries (in PE format) can be executed
with Wine. The HX DOS Extender also uses the PE format for native 32-bit DOS binaries, and it
can, to some degree, execute existing Windows binaries in DOS, thus acting as a Wine equivalent for
DOS.

Mac OS X 10.5 has the ability to load and parse PE files, but is not binary compatible with
Windows.[1]

See also


• EXE
• a.out
• Comparison of executable file formats
• Executable compression
• Application virtualization

References
1. ^ Chartier, David (2007-11-30). "Uncovered: Evidence that Mac OS X could run Windows apps soon". Ars Technica. http://arstechnica.com/journals/apple.ars/2007/11/30/uncovered-evidence-that-mac-os-x-could-run-windows-apps-soon. Retrieved 2007-12-03. "... Steven Edwards describes the discovery that Leopard apparently contains an undocumented loader for Portable Executables, a type of file used in 32-bit and 64-bit versions of Windows. More poking around revealed that Leopard's own loader tries to find Windows DLL files when attempting to load a Windows binary."

External links


• Microsoft Portable Executable and Common Object File Format Specification (latest
edition, OOXML format)
• Microsoft Portable Executable and Common Object File Format Specification (latest
edition, HTML format)
• Microsoft Portable Executable and Common Object File Format Specification (1999
edition, .doc format)
• The original Portable Executable article by Matt Pietrek (MSDN Magazine, March 1994)
• Part I. An In-Depth Look into the Win32 Portable Executable File Format by Matt
Pietrek (MSDN Magazine, February 2002)
• Part II. An In-Depth Look into the Win32 Portable Executable File Format by Matt
Pietrek (MSDN Magazine, March 2002)
• The .NET File Format by Daniel Pistelli
• Creating the smallest possible PE executable (97 bytes)
• Detailed description of the PE format by Johannes Plachy
• Windows Authenticode Portable Executable Signature Format
• LUEVELSMEYER's description about PE file format Mirror
• A tool to inspect the content of any PE File

Executable compression

Executable compression is any means of compressing an executable file and combining the
compressed data with the decompression code it needs into a single executable.

Running a compressed executable essentially unpacks the original executable code, then
transfers control to it. The effect is the same as if the original uncompressed executable had been
run, so compressed and uncompressed executables are indistinguishable to the casual user.

A compressed executable is one variety of self-extracting archive, where compressed data is
packaged along with the relevant decompression code in an executable file. It is often possible to
decompress a compressed executable without directly executing it (two such programs are
CUP386 and UNP).

Most packed executables decompress directly into the memory and need no free file system
space to start. However, some decompressor stubs are known to write the uncompressed
executable to the file system in order to start it.
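The unpack-then-transfer-control idea can be sketched in Python, using zlib compression and Python source as a stand-in for machine code (a toy model under those assumptions, not a real packer):

```python
import zlib

# A toy "packer": compress a payload and pair it with a stub that
# decompresses it in memory and then transfers control to it.
payload = b"result = 6 * 7\n"        # stands in for the original code
packed = zlib.compress(payload)

def stub(compressed: bytes) -> dict:
    # Unpack entirely in memory (no file-system space needed),
    # then "run" the original code.
    namespace: dict = {}
    exec(zlib.decompress(compressed), namespace)
    return namespace

print(stub(packed)["result"])  # 42
```

A real decompressor stub does the same thing at the machine-code level: restore the original image in memory, fix it up, and jump to its entry point.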

Contents

• 1 Advantages and disadvantages


• 2 List of packers
• 3 See also

• 4 References

Advantages and disadvantages


Software distributors use executable compression for a variety of reasons, primarily to reduce the
secondary storage requirements of their software; as executable compressors are specifically
designed to compress executable code, they often achieve a better compression ratio than standard
data compression facilities such as gzip, zip or bzip2[citation needed]. This allows software distributors
to stay within the constraints of their chosen distribution media (such as CD-ROM, DVD-ROM,
or Floppy disk), or to reduce the time and bandwidth customers require to access software
distributed via the Internet.

Executable compression is also frequently used to deter reverse engineering or to obfuscate the
contents of the executable (for example, to hide the presence of malware from antivirus
scanners) by proprietary methods of compression and/or added encryption. Executable
compression can be used to prevent direct disassembly, mask string literals and modify
signatures. Although this does not eliminate the chance of reverse engineering, it can make the
process more costly.

A compressed executable requires less storage space in the file system, thus less time to transfer
data from the file system into memory. On the other hand, it requires some time to decompress
the data before execution begins. However, the speed of various storage media has not kept up
with average processor speeds, so the storage is very often the bottleneck. Thus the compressed
executable will load faster on most common systems. On modern desktop computers, this is
rarely noticeable unless the executable is unusually big, so loading speed is not a primary reason
for or against compressing an executable.

On operating systems which read executable images on demand from the disk (see virtual
memory), compressed executables make this process less efficient. The decompressor stub
allocates a block of memory to hold the decompressed data, which stays allocated as long as the
executable stays loaded, whether it is used or not, competing for memory resources with other
applications all along. If the operating system uses a swap file, the decompressed data has to be
written to it to free up the memory instead of simply discarding unused data blocks and reloading
them from the executable image if needed again. This is usually not noticeable, but it becomes a
problem when an executable is loaded more than once at the same time—the operating system
cannot reuse data blocks it has already loaded, the data has to be decompressed into a new
memory block, and will be swapped out independently if not used. The additional storage and
time requirements mean that it has to be weighed carefully whether to compress executables
which are typically run more than once at the same time.

Another disadvantage is that some utilities can no longer identify run-time library dependencies,
as only the statically linked extractor stub is visible.

Also, some older virus scanners simply report all compressed executables as viruses because the
decompressor stubs share some characteristics with those of viruses. Most modern virus scanners can
unpack several different executable compression layers to check the actual executable inside, but
some popular anti-virus and anti-malware scanners have had troubles with false alarms on
compressed executables.

Executable compression used to be more popular when computers were limited to the storage
capacity of floppy disks and small hard drives; it allowed the computer to store more software in
the same amount of space, without the inconvenience of having to manually unpack an archive
file every time the user wanted to use the software. However, executable compression has
become less popular because of increased storage capacity on computers.

List of packers


For Portable Executable (Windows) files:

• ASPack
• ASPR (ASProtect)
• Armadillo Packer
• AxProtector
• BeRoEXEPacker
• BoxedApp Packer
• CExe
• exe32pack
• EXE Bundle
• EXECryptor
• EXE Stealth
• eXPressor
• FSG (Fast Small Good)
• HASP Envelope
• kkrunchy – Freeware
• MEW – development stopped
• MPRESS – Freeware
• NeoLite
• Obsidium
• PECompact
• PELock
• PEPack
• PESpin
• PEtite
• PKLite32
• Privilege Shell
• RLPack
• Sentinel CodeCover (Sentinel Shell)
• Shrinker32
• Smart Packer Pro
• SmartKey GSS
• tElock
• Themida
• UniKey Enveloper
• Upack (software) – Freeware
• UPX – free software
• VMProtect
• WWPack
• XComp/XPack – Freeware

For New Executable (Windows) files:

• PackWin
• WinLite
• PKLite 2.01

For OS/2 executables only:

• NeLite
• LxLite

For DOS executables only:

• 32LiTE
• 624
• AINEXE
• aPACK
• DIET
• HASP Envelope
• LGLZ
• LZEXE – First widely publicly used executable compressor for microcomputers.
• PKLite
• PMWLITE
• UCEXE
• UPX
• WDOSX
• WWpack
• XE

For ELF files:

• gzexe
• HASP Envelope
• UPX

For .NET assembly files:

• .NETZ
• NsPack
• HASP Envelope

For Mach-O (Apple Mac OS X) files:

• HASP Envelope
• UPX

For Java JAR files:

• HASP Envelope
• pack200

For Java WAR files:

• HASP Envelope

See also


• Data compression
• Disk compression
• Executable
• Kolmogorov complexity
• UPX
• Self-extracting archive

File format



A file format is a particular way that information is encoded for storage in a computer file.

Since a disk drive, or indeed any computer storage, can store only bits, the computer must have
some way of converting information to 0s and 1s and vice-versa. There are different kinds of
formats for different kinds of information. Within any format type, e.g., word processor
documents, there will typically be several different formats. Sometimes these formats compete
with each other.

File formats are divided into proprietary and open formats.

Contents

• 1 Generality
• 2 Specifications
• 3 Identifying the type of a file
o 3.1 Filename extension
o 3.2 Internal metadata
 3.2.1 File header
 3.2.2 Magic number
o 3.3 External metadata
 3.3.1 Mac OS type-codes
 3.3.2 Mac OS X Uniform Type Identifiers (UTIs)
 3.3.3 OS/2 Extended Attributes
 3.3.4 POSIX extended attributes
 3.3.5 PRONOM Unique Identifiers (PUIDs)
 3.3.6 MIME types
 3.3.7 File format identifiers (FFIDs)
 3.3.8 File content based format identification
• 4 File structure
o 4.1 Unstructured formats (raw memory dumps)
o 4.2 Chunk-based formats
o 4.3 Directory-based formats
• 5 See also
• 6 References

• 7 External links

Generality
Some file formats are designed for very particular sorts of data: PNG files, for example, store
bitmapped images using lossless data compression. Other file formats, however, are designed for
storage of several different types of data: the Ogg format can act as a container for many
different types of multimedia, including any combination of audio and/or video, with or without
text (such as subtitles), and metadata. A text file can contain any stream of characters, encoded
for example as ASCII or Unicode, including possible control characters. Some file formats, such
as HTML, Scalable Vector Graphics and the source code of computer software, are also text files
with defined syntaxes that allow them to be used for specific purposes.

Specifications
Many file formats, including some of the most well-known file formats, have a published
specification document (often with a reference implementation) that describes exactly how the
data is to be encoded, and which can be used to determine whether or not a particular program
treats a particular file format correctly. There are, however, two reasons why this is not always
the case. First, some file format developers view their specification documents as trade secrets,
and therefore do not release them to the public. Second, some file format developers never spend
time writing a separate specification document; rather, the format is defined only implicitly,
through the program(s) that manipulate data in the format.

Using file formats without a publicly available specification can be costly. Learning how the
format works will require either reverse engineering it from a reference implementation or
acquiring the specification document for a fee from the format developers. This second approach
is possible only when there is a specification document, and typically requires the signing of a
non-disclosure agreement. Both strategies require significant time, money, or both. Therefore, as
a general rule, file formats with publicly available specifications are supported by a large number
of programs, while non-public formats are supported by only a few programs.

Patent law, rather than copyright, is more often used to protect a file format. Although patents for
file formats are not directly permitted under US law, some formats require the encoding of data
with patented algorithms. For example, using compression with the GIF file format requires the
use of a patented algorithm, and although initially the patent owner did not enforce it, they later
began collecting fees for use of the algorithm. This has resulted in a significant decrease in the
use of GIFs, and is partly responsible for the development of the alternative PNG format.
However, the patent expired in the US in mid-2003, and worldwide in mid-2004. Algorithms are
usually held not to be patentable under current European law, which also includes a provision
that members "shall ensure that, wherever the use of a patented technique is needed for a
significant purpose such as ensuring conversion of the conventions used in two different
computer systems or networks so as to allow communication and exchange of data content
between them, such use is not considered to be a patent infringement", which would apparently
allow implementation of a patented file format where necessary to allow two different computers
to interoperate.[1]

Identifying the type of a file


A method is required to determine the format of a particular file within the filesystem—an
example of metadata. Different operating systems have traditionally taken different approaches
to this problem, with each approach having its own advantages and disadvantages.

Of course, most modern operating systems, and individual applications, need to use all of these
approaches to process various files, at least to be able to read 'foreign' file formats, if not work
with them completely.

Filename extension

Main article: Filename extension

One popular method in use by several operating systems, including Windows, Mac OS X, CP/M,
DOS, VMS, and VM/CMS, is to determine the format of a file based on the section of its name
following the final period. This portion of the filename is known as the filename extension. For
example, HTML documents are identified by names that end with .htm (or .html), and GIF
images by .gif. In the original FAT filesystem, filenames were limited to an eight-character
identifier and a three-character extension, known as an 8.3 filename. Many formats thus
still use three-character extensions, even though modern operating systems and application
programs no longer have this limitation. Since there is no standard list of extensions, more than
one format can use the same extension, which can confuse the operating system and
consequently users.

One artifact of this approach is that the system can easily be tricked into treating a file as a
different format simply by renaming it—an HTML file can, for instance, be easily treated as
plain text by renaming it from filename.html to filename.txt. Although this strategy was
useful to expert users who could easily understand and manipulate this information, it was
frequently confusing to less technical users, who might accidentally make a file unusable (or
'lose' it) by renaming it incorrectly.

This led more recent operating system shells, such as Windows 95 and Mac OS X, to hide the
extension when displaying lists of recognized files. This separates the user from the complete
filename, preventing the accidental changing of a file type, while allowing expert users to still
retain the original functionality through enabling the displaying of file extensions.

A downside of hiding the extension is that it then becomes possible to have what appear to be
two or more identical filenames in the same folder. This is especially true when image files are
needed in more than one format for different applications. For example, a company logo may be
needed both in .tif format (for publishing) and .gif format (for web sites). With the extensions
visible, these would appear as the unique filenames "CompanyLogo.tif" and
"CompanyLogo.gif". With the extensions hidden, these would both appear to have the identical
filename "CompanyLogo", making it more difficult to determine which to select for a particular
application.

A further downside is that hiding such information can become a security risk[2]. This is because
on a filename extensions reliant system all usable files will have such an extension (for example
all JPEG images will have ".jpg" or ".jpeg" at the end of their name), so seeing file extensions
would be a common occurrence and users may depend on them when looking for a file's format.
By having file extensions hidden a malicious user can create an executable program with an
innocent name such as "Holiday photo.jpg.exe". In this case the ".exe" will be hidden and a
user will see this file as "Holiday photo.jpg", which appears to be a JPEG image, unable to
harm the machine save for bugs in the application used to view it. However, the operating system
will still see the ".exe" extension and thus will run the program, which is then able to cause
harm and presents a security issue. To further trick users, it is possible to store an icon inside the
program, as done on Microsoft Windows, overriding the operating system's icon assignment
with an icon commonly used to represent JPEG images, so that the program both looks like and
appears to be named as an image until it is opened. This issue requires
users with extensions hidden to be vigilant, and never open files which seem to have a known
extension displayed despite the hidden option being enabled (since it must therefore have 2
extensions, the real one being unknown until hiding is disabled). This presents a practical
problem for Windows systems where extension hiding is turned on by default.
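The double-extension trick can be modeled in a few lines of Python (the hidden-extension list and helper names are hypothetical, mimicking a shell that hides known extensions):

```python
import os

def displayed_name(filename: str, hide_known=("exe", "jpg", "gif", "txt")) -> str:
    """Mimic an Explorer-style shell that hides known extensions."""
    root, ext = os.path.splitext(filename)
    return root if ext.lstrip(".").lower() in hide_known else filename

def looks_disguised(filename: str) -> bool:
    """Flag names whose *displayed* form still appears to carry an
    extension after the real one has been hidden."""
    shown = displayed_name(filename)
    return shown != filename and "." in shown

print(displayed_name("Holiday photo.jpg.exe"))   # Holiday photo.jpg
print(looks_disguised("Holiday photo.jpg.exe"))  # True
print(looks_disguised("CompanyLogo.gif"))        # False
```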

Internal metadata

A second way to identify a file format is to store information regarding the format inside the file
itself. Usually, such information is written in one (or more) binary string(s), tagged or raw texts
placed in fixed, specific locations within the file. Since the easiest place to locate such
information is at the beginning of the file, the area is usually called a file header when it is larger
than a few bytes, or a magic number if it is just a few bytes long.

File header

First of all, the meta-data contained in a file header are not necessarily stored only at the
beginning of it, but might be present in other areas too, often including the end of the file; that
depends on the file format or the type of data it contains. Character-based (text) files have
character-based human-readable headers, whereas binary formats usually feature binary headers,
although that is not a rule: a human-readable file header may require more bytes, but is easily
discernable with simple text or hexadecimal editors. File headers may not only contain the
information required by algorithms to identify the file format alone, but also real metadata about
the file and its contents. For example, most image file formats store information about image size,
resolution, colour space/format, and optionally other authoring information, such as who made it,
when and where, and which camera model and shooting parameters it was taken with (if any;
cf. Exif). Such metadata may be used by a program reading or interpreting the file both
during the loading process and after that, but can also be used by the operating system to quickly
capture information about the file itself without loading it all into memory.
A file header has at least two downsides as a file-format identification method. First, at least
a few (initial) blocks of the file need to be read in order to obtain such information; those could be
fragmented in different locations of the same storage medium, thus requiring more seek and I/O
time, which is particularly bad for the identification of large quantities of files altogether (like a
GUI browsing inside a folder with thousands or more files and discerning file icons or
thumbnails for all of them to visualize). Second, if the header is binary hard-coded (i.e. the
header itself is subject to a non-trivial interpretation in order to be recognized), especially for
metadata content protection's sake, there is some risk that the file format is misinterpreted at first
sight, or even badly written at the source, often resulting in corrupt metadata (which, in
extremely pathological cases, might even render the file unreadable).

A more logically sophisticated example of file header is that used in wrapper (or container) file
formats.

Magic number


See also: Magic number (programming)

One way to incorporate such metadata, often associated with Unix and its derivatives, is just to
store a "magic number" inside the file itself. Originally, this term was used for a specific set of 2-
byte identifiers at the beginning of a file, but since any undecoded binary sequence can be
regarded as a number, any feature of a file format which uniquely distinguishes it can be used for
identification. GIF images, for instance, always begin with the ASCII representation of either
GIF87a or GIF89a, depending upon the standard to which they adhere. Many file types, most
especially plain-text files, are harder to spot by this method. HTML files, for example, might
begin with the string <html> (which is not case sensitive), or an appropriate document type
definition that starts with <!DOCTYPE, or, for XHTML, the XML identifier, which begins with
<?xml. The files can also begin with HTML comments, random text, or several empty lines, but
still be usable HTML.
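A simplified magic-number sniffer might look like the following (the format set is illustrative; real tools such as the Unix file utility consult a large magic database):

```python
def sniff(data: bytes) -> str:
    """Identify a handful of formats by their leading magic bytes."""
    if data[:2] == b"MZ":
        return "PE/DOS executable"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF image"
    # Text formats are fuzzier: skip leading whitespace, ignore case.
    if data.lstrip().lower().startswith((b"<html", b"<!doctype")):
        return "HTML (probable)"
    return "unknown"

print(sniff(b"GIF89a\x01\x00"))     # GIF image
print(sniff(b"MZ\x90\x00"))         # PE/DOS executable
print(sniff(b"  <!DOCTYPE html>"))  # HTML (probable)
```

Note how the HTML branch is only "probable": as the text above explains, valid HTML need not start with any fixed byte sequence.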

The magic number approach offers better guarantees that the format will be identified correctly,
and can often determine more precise information about the file. Since reasonably reliable
"magic number" tests can be fairly complex, and each file must effectively be tested against
every possibility in the magic database, this approach is relatively inefficient, especially for
displaying large lists of files (in contrast, filename and metadata-based methods need check only
one piece of data, and match it against a sorted index). Also, data must be read from the file
itself, increasing latency as opposed to metadata stored in the directory. Where filetypes don't
lend themselves to recognition in this way, the system must fall back to metadata. It is, however,
the best way for a program to check if a file it has been told to process is of the correct format:
while the file's name or metadata may be altered independently of its content, failing a well-
designed magic number test is a pretty sure sign that the file is either corrupt or of the wrong
type. On the other hand a valid magic number does not guarantee that the file is not corrupt or of
a wrong type.

So-called shebang lines in script files are a special case of magic numbers. Here, the magic
number is human-readable text that identifies a specific command interpreter and options to be
passed to the command interpreter.
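Parsing a shebang line is straightforward (a sketch; real kernels additionally impose limits on line length and argument handling):

```python
def parse_shebang(first_line: bytes):
    """Split a script's shebang line into interpreter and options,
    e.g. b"#!/bin/sh -e\\n" -> ("/bin/sh", ["-e"]); None if absent."""
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].strip().split()
    return parts[0].decode(), [p.decode() for p in parts[1:]]

print(parse_shebang(b"#!/usr/bin/env python3\n"))  # ('/usr/bin/env', ['python3'])
```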
Another operating system using magic numbers is AmigaOS, where magic numbers were called
"Magic Cookies" and were adopted as a standard system to recognize executables in Hunk
executable file format and also to let single programs, tools and utilities deal automatically with
their saved data files, or any other kind of file types when saving and loading data. This system
was then enhanced with the Amiga standard Datatype recognition system. Another method was
the FourCC method, originating in OSType on Macintosh, later adapted by Interchange File
Format (IFF) and derivatives.

External metadata

A final way of storing the format of a file is to explicitly store information about the format in
the file system, rather than within the file itself.

This approach keeps the metadata separate from both the main data and the name, but is also less
portable than either file extensions or "magic numbers", since the format has to be converted
from filesystem to filesystem. While this is also true to an extent with filename extensions — for
instance, for compatibility with MS-DOS's three character limit — most forms of storage have a
roughly equivalent definition of a file's data and name, but may have varying or no
representation of further metadata.

Zip files and other archive files sidestep the problem of transporting metadata. A utility program
collects multiple files, together with metadata about each file and the folders/directories
they came from, into one new file (e.g. a zip file with the extension .zip). The new file is also
compressed, and possibly encrypted, and can be transmitted as a single file across operating
systems, by FTP or as an e-mail attachment. At the destination it must be unpacked by a
compatible utility to be useful, but the problems of transmission are solved this way.

[edit] Mac OS type-codes

The Mac OS' Hierarchical File System stores codes for creator and type as part of the directory
entry for each file. These codes are referred to as OSTypes, and for instance a HyperCard "stack"
file has a creator of WILD (from Hypercard's previous name, "WildCard") and a type of STAK.
The type code specifies the format of the file, while the creator code specifies the default
program to open it with when double-clicked by the user. For example, the user could have
several text files all with the type code of TEXT, but which each open in a different program, due
to having differing creator codes. RISC OS uses a similar system, consisting of a 12-bit number
which can be looked up in a table of descriptions — e.g. the hexadecimal number FF5 is
"aliased" to PoScript, representing a PostScript file.

[edit] Mac OS X Uniform Type Identifiers (UTIs)


Main article: Uniform Type Identifier

A Uniform Type Identifier (UTI) is a method used in Mac OS X for uniquely identifying "typed"
classes of entity, such as file formats. It was developed by Apple as a replacement for OSType
(type & creator codes).
The UTI is a Core Foundation string, which uses a reverse-DNS string. Common or standard
types use the public domain (e.g. public.png for a Portable Network Graphics image), while
other domains can be used for third-party types (e.g. com.adobe.pdf for Portable Document
Format). UTIs can be defined within a hierarchical structure, known as a conformance hierarchy.
Thus, public.png conforms to a supertype of public.image, which itself conforms to a
supertype of public.data. A UTI can exist in multiple hierarchies, which provides great
flexibility.
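The conformance hierarchy can be modelled as a walk up a table of supertypes (a toy sketch; the real hierarchy is declared to the system, and queried through APIs such as UTTypeConformsTo, rather than hard-coded like this):

```python
# Toy conformance table mirroring the examples in the text;
# the real table is assembled by the OS from declared types.
CONFORMS_TO = {
    "public.png": ["public.image"],
    "public.jpeg": ["public.image"],
    "public.image": ["public.data"],
    "public.data": ["public.item"],
}

def conforms(uti: str, ancestor: str) -> bool:
    """True if `uti` equals `ancestor` or conforms to it, directly or transitively."""
    if uti == ancestor:
        return True
    return any(conforms(parent, ancestor) for parent in CONFORMS_TO.get(uti, []))
```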

In addition to file formats, UTIs can also be used for other entities which can exist in OS X,
including:

• Pasteboard data
• Folders (directories)
• Translatable types (as handled by the Translation Manager)
• Bundles
• Frameworks
• Streaming data
• Aliases and symlinks

[edit] OS/2 Extended Attributes

The HPFS, FAT12 and FAT16 (but not FAT32) filesystems allow the storage of "extended
attributes" with files. These comprise an arbitrary set of triplets with a name, a coded type for the
value and a value, where the names are unique and values can be up to 64 KB long. There are
standardized meanings for certain types and names (under OS/2). One such is that the ".TYPE"
extended attribute is used to determine the file type. Its value comprises a list of one or more file
types associated with the file, each of which is a string, such as "Plain Text" or "HTML
document". Thus a file may have several types.

The NTFS filesystem can also store OS/2 extended attributes, as one of its file forks, but this
feature is present merely to support the OS/2 subsystem (not present in XP), so the Win32
subsystem treats this information as an opaque block of data and does not use it. Instead, it relies
on other file forks to store meta-information in Win32-specific formats. OS/2 extended attributes
can still be read and written by Win32 programs, but the data must be entirely parsed by
applications.

[edit] POSIX extended attributes

On Unix and Unix-like systems, the ext2, ext3, ReiserFS version 3, XFS, JFS, FFS, and HFS+
filesystems allow the storage of extended attributes with files. These include an arbitrary list of
"name=value" strings, where the names are unique and a value can be accessed through its
related name.
[edit] PRONOM Unique Identifiers (PUIDs)

The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique
and unambiguous identifiers for file formats, which has been developed by The National
Archives of the UK as part of its PRONOM technical registry service. PUIDs can be expressed
as Uniform Resource Identifiers using the info:pronom/ namespace. Although not yet widely
used outside of UK government and some digital preservation programmes, the PUID scheme
does provide greater granularity than most alternative schemes.

[edit] MIME types

MIME types are widely used in many Internet-related applications, and increasingly elsewhere,
although their usage for on-disc type information is rare. These consist of a standardised system
of identifiers (managed by IANA) consisting of a type and a sub-type, separated by a slash — for
instance, text/html or image/gif. These were originally intended as a way of identifying what
type of file was attached to an e-mail, independent of the source and target operating systems.
MIME types identify files on BeOS, AmigaOS 4.0 and MorphOS, and also store unique
application signatures for application launching. In AmigaOS and MorphOS, the MIME type
system works in parallel with the Amiga-specific Datatype system.

There are problems with the MIME types though; several organisations and people have created
their own MIME types without registering them properly with IANA, which makes the use of
this standard awkward in some cases.
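In Python, for example, the standard mimetypes module maps filename extensions to MIME types, whose two-part type/sub-type structure can then be split on the slash:

```python
import mimetypes

# guess_type maps a filename (by its extension) to a
# (type, encoding) pair using a built-in table of registered types.
mime, encoding = mimetypes.guess_type("page.html")

# A MIME type is a type and a sub-type separated by a slash.
main_type, sub_type = mime.split("/")
```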

[edit] File format identifiers (FFIDs)

File format identifiers (FFIDs) are another, not widely used, way to identify file formats according
to their origin and file category. The scheme was created for the Description Explorer suite of
software. An FFID is
composed of several digits of the form NNNNNNNNN-XX-YYYYYYY. The first part indicates the
organisation origin/maintainer (this number represents a value in a company/standards
organisation database), the 2 following digits categorize the type of file in hexadecimal. The final
part is composed of the usual file extension of the file or the international standard number of the
file, padded left with zeros. For example, the PNG file specification has the FFID of 000000001-
31-0015948 where 31 indicates an image file, 0015948 is the standard number and 000000001
indicates the ISO Organisation.
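A sketch of splitting an FFID into its three parts, following the structure described above (the field names are illustrative):

```python
def parse_ffid(ffid: str) -> dict:
    """Split an FFID of the form NNNNNNNNN-XX-YYYYYYY into its parts."""
    org, category, tail = ffid.split("-")
    return {
        "organisation": int(org),        # entry in the maintainer database
        "category": int(category, 16),   # file category, in hexadecimal
        "identifier": tail.lstrip("0"),  # extension or standard number, zero-padded
    }
```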

[edit] File content based format identification

Another, less popular, way to identify a file format is to examine the file contents for
distinguishable patterns among file types. The contents of a file are a sequence of bytes, and a
byte can take 256 distinct values (0–255). Counting the occurrences of each byte value, often
referred to as the byte frequency distribution, therefore yields a distinguishable pattern that can
identify file types. Many content-based file type identification schemes use the byte frequency
distribution to build representative models for file types, and apply statistical and data
mining techniques to identify file types.[3]
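A minimal sketch of the byte-frequency idea: build a normalised 256-bin distribution and compare two distributions with a simple closeness score (real schemes use stronger statistical models; the similarity measure below is only for illustration):

```python
from collections import Counter

def byte_frequency(data: bytes) -> list:
    """Normalised 256-bin byte-frequency distribution of `data`."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def similarity(p: list, q: list) -> float:
    """Crude closeness score: 1 minus half the total variation distance."""
    return 1 - sum(abs(a - b) for a, b in zip(p, q)) / 2
```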
[edit] File structure
There are several ways to structure data in a file. The most common ones are described
below.

[edit] Unstructured formats (raw memory dumps)

Early file formats used raw data formats that consisted of directly dumping the memory images
of one or more structures into the file.

This has several drawbacks. Unless the memory images also reserve space for future
extensions, extending and improving this type of file is very difficult. It also creates
files that may be specific to one platform or programming language (for example, a structure
containing a Pascal string is not recognized as such in C). On the other hand, developing tools
for reading and writing these types of files is very simple.

The limitations of the unstructured formats led to the development of other types of file formats
that could be easily extended and be backward compatible at the same time.
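The language-specificity pitfall can be illustrated with length-prefixed ("Pascal") versus NUL-terminated ("C") strings: a raw dump written under one convention is misread by a reader expecting the other. A minimal Python sketch:

```python
name = b"README"

# A "Pascal" string: one length byte followed by the characters.
pascal_record = bytes([len(name)]) + name

# A "C" string: the characters followed by a terminating NUL byte.
c_record = name + b"\x00"

def read_pascal(buf: bytes) -> bytes:
    # Interpret the first byte as the string length.
    length = buf[0]
    return buf[1:1 + length]

def read_c(buf: bytes) -> bytes:
    # Read up to the first NUL byte.
    return buf[:buf.index(b"\x00")]
```

Reading a record with the wrong routine silently yields garbage, which is why such dumps tie a file to one language's in-memory layout.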

[edit] Chunk-based formats

Electronic Arts and Commodore-Amiga pioneered this file format in 1985, with their IFF
(Interchange File Format) file format. In this kind of file structure, each piece of data is
embedded in a container that contains a signature identifying the data, as well the length of the
data (for binary encoded files). This type of container is called a "chunk". The signature is
usually called a chunk id, chunk identifier, or tag identifier.

With this type of file structure, tools that do not know certain chunk identifiers simply skip those
that they do not understand.

This concept has been taken again and again by RIFF (Microsoft-IBM equivalent of IFF), PNG,
JPEG storage, DER (Distinguished Encoding Rules) encoded streams and files (which were
originally described in CCITT X.409:1984 and therefore predate IFF), and Structured Data
Exchange Format (SDXF). Even XML can be considered a kind of chunk based format, since
each data element is surrounded by tags which are akin to chunk identifiers.
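A minimal sketch of a chunk reader, assuming IFF-style chunks of a 4-byte tag followed by a 4-byte big-endian length (real IFF also pads odd-sized chunks to even boundaries, which is omitted here). Unknown tags are skipped simply by advancing past them:

```python
import struct

def read_chunks(data: bytes):
    """Yield (tag, payload) pairs from a stream of tag/length/payload chunks."""
    pos = 0
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield tag, data[pos + 8:pos + 8 + length]
        pos += 8 + length  # skipping a chunk is just advancing the offset

def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Build one chunk: 4-byte tag, 4-byte big-endian length, payload."""
    return tag + struct.pack(">I", len(payload)) + payload
```

A tool that only understands `NAME` and `BODY` chunks can process a file containing an unfamiliar `XTRA` chunk without error, which is the extensibility property described above.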

[edit] Directory-based formats

This is another extensible format, that closely resembles a file system (OLE Documents are
actual filesystems), where the file is composed of 'directory entries' that contain the location of
the data within the file itself as well as its signatures (and in certain cases its type). Good
examples of these types of file structures are disk images, OLE documents and TIFF images.

[edit] See also


• Audio file format
• Chemical file format
• Container format (digital)
• Document file format
• DROID file format identification utility
• File (command), a file type identification utility
• File Formats, Transformation, and Migration (related wikiversity article)
• FormatFactory, a free omni file format converter.
• Future proofing
• Graphics file format summary
• List of archive formats
• Image file formats
• List of file formats
• List of free file formats
• List of motion and gesture file formats
• Magic number (programming)
• List of file signatures, or "magic numbers"
• Object file
• Open format
• TrID, a freeware file type identification utility
• Windows file types

[edit] References
1. ^ Foundation for a Free Information Infrastructure. "Europarl 2003-09-24:
Amended Software Patent Directive".
http://swpat.ffii.org/papers/europarl0309/index.en.html. Retrieved 2007-01-
07.
2. ^ PC World. "Windows Tips: For Security Reasons, It Pays To Know Your File
Extensions". http://www.pcworld.com/article/id,113758-page,1/article.html.
Retrieved 2008-06-20.
3. ^ "File Format Identification".
http://www.forensicswiki.org/wiki/File_Format_Identification.

• "Extended Attribute Data Types". REXX Tips & Tricks, Version 2.80.
http://markcrocker.com/rexxtipsntricks/rxtt28.2.0301.html. Retrieved
February 9, 2005.
• "Extended Attributes used by the WPS". REXX Tips & Tricks, Version 2.80.
http://markcrocker.com/rexxtipsntricks/rxtt28.2.0300.html. Retrieved
February 9, 2005.
• "Extended Attributes - what are they and how can you use them ?". Roger
Orr. http://www.howzatt.demon.co.uk/articles/06may93.html. Retrieved
February 9, 2005.

Dynamic-link library
From Wikipedia, the free encyclopedia
Jump to: navigation, search
Dynamic link library

Filename extension .dll
Internet media type application/x-msdownload
Uniform Type Identifier com.microsoft.windows-dynamic-link-library
Magic number MZ
Developed by Microsoft
Container for Shared library

Dynamic-link library (also written without the hyphen), or DLL, is Microsoft's implementation
of the shared library concept in the Microsoft Windows and OS/2 operating systems. These
libraries usually have the file extension DLL, OCX (for libraries containing ActiveX controls), or
DRV (for legacy system drivers). The file formats for DLLs are the same as for Windows EXE
files — that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable
(NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any
combination.

In the broader sense of the term, any data file with the same file format can be called a resource
DLL. Examples of such DLLs include icon libraries, sometimes having the extension ICL, and
font files, having the extensions FON and FOT.[citation needed]

Contents
[hide]

• 1 Background for DLL


• 2 Features of DLL
o 2.1 Memory management
o 2.2 Import libraries
o 2.3 Symbol resolution and binding
o 2.4 Explicit run-time linking
o 2.5 Delayed loading
• 3 Compiler and language considerations
o 3.1 Delphi
o 3.2 Microsoft Visual Basic
o 3.3 C and C++
• 4 Programming examples
o 4.1 Creating DLL exports
o 4.2 Using DLL imports
o 4.3 Using explicit run-time linking
 4.3.1 Microsoft Visual Basic
 4.3.2 Delphi
 4.3.3 C and C++
 4.3.4 Python
• 5 Component Object Model
• 6 DLL Hijacking
• 7 See also
• 8 External links

• 9 References

[edit] Background for DLL


The first versions of Microsoft Windows ran every program in a single address space. Every
program was meant to co-operate by yielding the CPU to other programs so that the GUI was
capable of multitasking and could be as responsive as possible. All operating-system-level
operations were provided by the underlying operating system: MS-DOS. All higher-level
services were provided by Windows libraries, known as "Dynamic Link Libraries". The drawing API,
GDI, was implemented in a DLL called GDI.EXE, the user interface in USER.EXE. These extra
layers on top of DOS had to be shared across all running windows programs, not just to enable
Windows to work in a machine with less than a megabyte of RAM, but to enable the programs to
co-operate amongst each other. The Graphics Device Interface code in GDI needed to translate
drawing commands to operations on specific devices. On the display, it had to manipulate pixels
in the frame buffer. When drawing to a printer, the API calls had to be transformed into requests
to a printer. Although it could have been possible to provide hard-coded support for a limited set
of devices (like the Color Graphics Adapter display, the HP LaserJet Printer Command
Language), Microsoft chose a different approach. GDI would work by loading different pieces of
code to work with different output devices—pieces of code called 'Device Drivers'.

The same architectural concept that allowed GDI to load different device drivers is that which
allowed the Windows shell to load different windows programs, and for these programs to
invoke API calls from the shared USER and GDI libraries. That concept was Dynamic Linking.

In a conventional non-shared, static library, sections of code are simply added to the calling
program when its executable is built at the linking phase; if two programs use the same routine,
the code has to be included in both. With dynamic linking, shared code is placed into a single,
separate file. The programs that call this file are connected to it at run time, with the operating
system (or, in the case of early versions of Windows, the OS-extension), performing the binding.
For those early versions of Windows (1.0 to 3.11), the DLLs were the foundation for the entire
GUI.

• Display drivers were merely DLLs with a .DRV extension that provided custom
implementations of the same drawing API through a unified Device Driver Interface
(DDI).
• The Drawing (GDI) and GUI (USER) APIs were merely the function calls exported by
GDI and USER, system DLLs with the .EXE extension.

This notion of building up the operating system from a collection of dynamically loaded libraries
is a core concept of Windows that persists even today. DLLs provide the standard benefits of
shared libraries, such as modularity. Modularity allows changes to be made to code and data in a
single self-contained DLL shared by several applications without any change to the applications
themselves.

Another benefit of the modularity is the use of generic interfaces for plug-ins. A single interface
may be developed which allows old as well as new modules to be integrated seamlessly at run-
time into pre-existing applications, without any modification to the application itself. This
concept of dynamic extensibility is taken to the extreme with the Component Object Model, the
underpinnings of ActiveX.

In Windows 1.x, 2.x and 3.x, all windows applications shared the same address space, as well as
the same memory. A DLL was only loaded once into this address space; from then on all
programs using the library accessed it. The library's data was shared across all the programs.
This could be used as an indirect form of Inter-process communication, or it could accidentally
corrupt the different programs. With Windows 95 and successors every process runs in its own
address space. While the DLL code may be shared, the data is private except where shared data
is explicitly requested by the library. That said, large swathes of Windows 95, Windows 98 and
Windows Me were built from 16-bit libraries, a feature which limited the performance of the
Pentium Pro microprocessor when launched, and ultimately limited the stability and scalability
of the DOS-based versions of Windows.

While DLLs are the core of the Windows architecture, they have a number of drawbacks,
collectively called "DLL hell".[1] Currently, Microsoft promotes Microsoft .NET as one solution
to the problems of DLL hell, although they now promote Virtualization based solutions such as
Microsoft Virtual PC and Microsoft Application Virtualization, because they offer superior
isolation between applications. An alternative mitigating solution to DLL hell has been the
implementation of Side-by-Side Assembly.

[edit] Features of DLL


[edit] Memory management

In Win32, the DLL files are organized into sections. Each section has its own set of attributes,
such as being writable or read-only, executable (for code) or non-executable (for data), and so
on.
The code in a DLL is usually shared among all the processes that use the DLL; that is, they
occupy a single place in physical memory, and do not take up space in the page file. If the
physical memory occupied by a code section is to be reclaimed, its contents are discarded, and
later reloaded directly from the DLL file as necessary.

In contrast to code sections, the data sections of a DLL are usually private; that is, each process
using the DLL has its own copy of all the DLL's data. Optionally, data sections can be made
shared, allowing inter-process communication via this shared memory area. However, because
user restrictions do not apply to the use of shared DLL memory, this creates a security hole;
namely, one process can corrupt the shared data, which will likely cause all other sharing
processes to behave undesirably. For example, a process running under a guest account can in
this way corrupt another process running under a privileged account. This is an important reason
to avoid the use of shared sections in DLLs.

If a DLL is compressed by certain executable packers (e.g. UPX), all of its code sections are
marked as read-and-write, and will be unshared. Read-and-write code sections, much like private
data sections, are private to each process. Thus DLLs with shared data sections should not be
compressed if they are intended to be used simultaneously by multiple programs, since each
program instance would have to carry its own copy of the DLL, resulting in increased memory
consumption.

[edit] Import libraries

Linking to dynamic libraries is usually handled by linking to an import library when building or
linking to create an executable file. The created executable then contains an import address table
(IAT) by which all DLL function calls are referenced (each referenced DLL function contains its
own entry in the IAT). At run-time, the IAT is filled with appropriate addresses that point
directly to a function in the separately-loaded DLL.

Like static libraries, import libraries for DLLs are noted by the .lib file extension. For example,
kernel32.dll, the primary dynamic library for Windows' base functions such as file creation and
memory management, is linked via kernel32.lib.

[edit] Symbol resolution and binding

Each function exported by a DLL is identified by a numeric ordinal and optionally a name.
Likewise, functions can be imported from a DLL either by ordinal or by name. The ordinal
represents the position of the function's address pointer in the DLL Export Address table. It is
common for internal functions to be exported by ordinal only. For most Windows API functions
only the names are preserved across different Windows releases; the ordinals are subject to
change. Thus, one cannot reliably import Windows API functions by their ordinals.

Importing functions by ordinal provides only slightly better performance than importing them by
name: export tables of DLLs are ordered by name, so a binary search can be used to find a
function. The index of the found name is then used to look up the ordinal in the Export Ordinal
table. In 16-bit Windows, the name table was not sorted, so the name lookup overhead was much
more noticeable.
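The name lookup described above can be sketched in Python (a toy model: the names and ordinals below are hypothetical examples, not any real DLL's export table):

```python
import bisect

# Export name table, kept sorted as in a PE export directory,
# with a parallel table mapping each name's index to an ordinal.
export_names = ["CloseHandle", "CreateFileW", "ReadFile", "WriteFile"]
name_ordinals = [3, 1, 7, 8]  # hypothetical ordinals

def lookup_ordinal(name: str):
    """Binary-search the sorted name table, then read the ordinal at the
    matching index, as the loader does when importing by name."""
    i = bisect.bisect_left(export_names, name)
    if i < len(export_names) and export_names[i] == name:
        return name_ordinals[i]
    return None  # name not exported
```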

It is also possible to bind an executable to a specific version of a DLL, that is, to resolve the
addresses of imported functions at compile-time. For bound imports, the linker saves the
timestamp and checksum of the DLL to which the import is bound. At run-time Windows checks
to see if the same version of library is being used, and if so, Windows bypasses processing the
imports. Otherwise, if the library is different from the one which was bound to, Windows
processes the imports in a normal way.

Bound executables load somewhat faster if they are run in the same environment that they were
compiled for, and take exactly the same time to load if run in a different environment, so there
is no drawback to binding the imports. For example, all the standard Windows applications are bound
to the system DLLs of their respective Windows release. A good opportunity to bind an
application's imports to its target environment is during the application's installation. This keeps
the libraries 'bound' until the next OS update. It does, however, change the checksum of the
executable, so it is not something that can be done with signed programs, or programs that are
managed by a configuration management tool that uses checksums (such as MD5 checksums) to
manage file versions. As more recent Windows versions have moved away from having fixed
addresses for every loaded library (for security reasons), the opportunity and value of binding an
executable is decreasing.

[edit] Explicit run-time linking

DLL files may be explicitly loaded at run-time, a process referred to simply as run-time dynamic
linking by Microsoft, by using the LoadLibrary (or LoadLibraryEx) API function. The
GetProcAddress API function is used to look up exported symbols by name, and FreeLibrary is
used to unload the DLL. These functions are analogous to dlopen, dlsym, and dlclose in the
POSIX standard API.

// Draw using the OLE2 function if it is available on the client
HINSTANCE hOle2Dll;

hOle2Dll = LoadLibrary("OLE2.DLL");

if (hOle2Dll != NULL)
{
    FARPROC lpOleDraw;

    lpOleDraw = GetProcAddress(hOle2Dll, "OleDraw");

    if (lpOleDraw != (FARPROC) NULL)
    {
        (*lpOleDraw)(pUnknown, dwAspect, hdcDraw, lprcBounds);
    }
    FreeLibrary(hOle2Dll);
}
The procedure for explicit run-time linking is the same in any language that supports pointers to
functions, since it depends on the Windows API rather than language constructs.

[edit] Delayed loading

Normally, an application that was linked against a DLL’s import library will fail to start if the
DLL cannot be found, because Windows will not run the application unless it can find all of the
DLLs that the application may require. However an application may be linked against an import
library to allow delayed loading of the dynamic library.[2] In this case the operating system will
not try to find or load the DLL when the application starts; instead, it will only try to find and
load the DLL when one of its functions is called. If the DLL cannot be found or loaded, or the
called function does not exist, the operating system will generate an exception, which the
application can catch and handle appropriately. If the application does not handle the exception,
it will be caught by the operating system, which will terminate the program with an error
message.

The delay-loading mechanism also provides notification hooks, allowing the application to
perform additional processing or error handling when the DLL is loaded and/or any DLL
function is called.
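As a rough, platform-neutral analogue of delay loading (Windows implements this in linker-generated thunks, not in application code), a proxy can defer loading until the first call and surface a missing symbol as a catchable error:

```python
class DelayLoaded:
    """Resolve a library on first use rather than at start-up (illustrative
    sketch only; the loader callable stands in for the real DLL load)."""

    def __init__(self, loader):
        self._loader = loader  # callable that returns the library/module
        self._lib = None

    def __getattr__(self, name):
        if self._lib is None:
            self._lib = self._loader()  # first use triggers the load
        try:
            return getattr(self._lib, name)
        except AttributeError:
            # analogous to the exception raised when a delayed import fails
            raise RuntimeError("function %r not found" % name)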

[edit] Compiler and language considerations


[edit] Delphi

In the heading of a source file, the keyword library is used instead of program. At the end of
the file, the functions to be exported are listed in exports clause.

Delphi does not require LIB files to import functions from DLLs; to link to a DLL, the external
keyword is used in the function declaration.

[edit] Microsoft Visual Basic

In Visual Basic (VB), only run-time linking is supported; but in addition to using LoadLibrary
and GetProcAddress API functions, declarations of imported functions are allowed.

When importing DLL functions through declarations, VB will generate a run-time error if the
DLL file cannot be found. The developer can catch the error and handle it appropriately.

When creating DLLs in VB, the IDE will only allow the creation of ActiveX DLLs; however,
methods have been created[3] to allow the user to explicitly tell the linker to include a .DEF file
which defines the ordinal position and name of each exported function. This allows the user to
create a standard Windows DLL using Visual Basic (version 6 or lower) which can be
referenced through a "Declare" statement.

[edit] C and C++


Microsoft Visual C++ (MSVC) provides a number of extensions to standard C++ which allow
functions to be specified as imported or exported directly in the C++ code; these have been
adopted by other Windows C and C++ compilers, including Windows versions of GCC. These
extensions use the attribute __declspec before a function declaration. Note that when C
functions are accessed from C++, they must also be declared as extern "C" in C++ code, to
inform the compiler that the C linkage should be used.[4]

Besides specifying imported or exported functions using __declspec attributes, they may be
listed in the IMPORTS or EXPORTS section of the DEF file used by the project. The DEF file is
processed by the linker, rather than the compiler, and thus it is not specific to C++.

DLL compilation will produce both DLL and LIB files. The LIB file is used to link against a DLL
at compile-time; it is not necessary for run-time linking. Unless the DLL is a COM server, the
DLL file must be placed in one of the directories listed in the PATH environment variable, in the
default system directory, or in the same directory as the program using it. COM server DLLs are
registered using regsvr32.exe, which places the DLL's location and its globally unique ID
(GUID) in the registry. Programs can then use the DLL by looking up its GUID in the registry to
find its location.

[edit] Programming examples


[edit] Creating DLL exports

The following examples show language-specific bindings for exporting symbols from DLLs.

Delphi

library Example;

// function that adds two numbers
function AddNumbers(a, b : Double): Double;
begin
  Result := a + b;
end;

// export this function
exports AddNumbers;

// DLL initialization code: no special handling needed
begin
end.

C and C++

#include <windows.h>

// DLL entry function (called on load, unload, ...)
BOOL APIENTRY DllMain(HANDLE hModule, DWORD dwReason, LPVOID lpReserved)
{
    return TRUE;
}

// Exported function - adds two numbers
extern "C" __declspec(dllexport) double AddNumbers(double a, double b)
{
    return a + b;
}

[edit] Using DLL imports

The following examples show how to use language-specific bindings to import symbols for
linking against a DLL at compile-time.

Delphi

program Example;

{$APPTYPE CONSOLE}

// import the function that adds two numbers
function AddNumbers(a, b : Double): Double; external 'Example.dll';

// main program
var
  R : Double;

begin
  R := AddNumbers(1, 2);
  Writeln('The result was: ', R);
end.

C and C++

Make sure the file Example.lib (assuming that Example.dll is generated) is included in the project
(via the Add Existing Item option) before static linking. The file Example.lib is automatically
generated by the compiler when compiling the DLL. Omitting this step causes a linking error, as
the linker would not know where to find the definition of AddNumbers. The DLL Example.dll must
also be copied to the location where the .exe file produced by the following code is generated.

#include <windows.h>
#include <stdio.h>

// Import the function that adds two numbers
extern "C" __declspec(dllimport) double AddNumbers(double a, double b);

int main(int argc, char *argv[])
{
    double result = AddNumbers(1, 2);
    printf("The result was: %f\n", result);
    return 0;
}

[edit] Using explicit run-time linking

The following examples show how to use the run-time loading and linking facilities using
language-specific WIN32 API bindings.

[edit] Microsoft Visual Basic

Option Explicit
Declare Function AddNumbers Lib "Example.dll" _
    (ByVal a As Double, ByVal b As Double) As Double

Sub Main()
    Dim Result As Double
    Result = AddNumbers(1, 2)
    Debug.Print "The result was: " & Result
End Sub

[edit] Delphi

program Example;

{$APPTYPE CONSOLE}

uses Windows;

var
  AddNumbers : function (a, b : Double): Double;
  LibHandle  : HMODULE;

begin
  LibHandle := LoadLibrary('example.dll');

  if LibHandle = 0 then
    Exit;

  AddNumbers := GetProcAddress(LibHandle, 'AddNumbers');

  if Assigned(AddNumbers) then
    Writeln('1 + 2 = ', AddNumbers(1, 2))
  else
    Writeln('Error: unable to find DLL function');

  FreeLibrary(LibHandle);
end.

[edit] C and C++

#include <windows.h>
#include <stdio.h>

// DLL function signature
typedef double (*importFunction)(double, double);

int main(int argc, char **argv)
{
    importFunction addNumbers;
    double result;

    // Load the DLL file
    HINSTANCE hinstLib = LoadLibrary(TEXT("Example.dll"));
    if (hinstLib == NULL) {
        printf("ERROR: unable to load DLL\n");
        return 1;
    }

    // Get the function pointer
    addNumbers = (importFunction)GetProcAddress(hinstLib, "AddNumbers");
    if (addNumbers == NULL) {
        printf("ERROR: unable to find DLL function\n");
        FreeLibrary(hinstLib);
        return 1;
    }

    // Call the function
    result = addNumbers(1, 2);

    // Unload the DLL file
    FreeLibrary(hinstLib);

    // Display the result
    printf("The result was: %f\n", result);

    return 0;
}

[edit] Python

import ctypes

my_dll = ctypes.cdll.LoadLibrary("Example.dll")

# The following "restype" specification is needed so that Python
# knows what type is returned by the function.
my_dll.AddNumbers.restype = ctypes.c_double

p = my_dll.AddNumbers(ctypes.c_double(1.0), ctypes.c_double(2.0))

print("The result was:", p)

[edit] Component Object Model


The Component Object Model (COM) extends the DLL concept to object-oriented
programming. Objects can be called from another process or hosted on another machine. COM
objects have unique GUIDs and can be used to implement powerful back-ends to simple GUI
front ends such as Visual Basic and ASP. They can also be programmed from scripting
languages. COM objects are more complex to create and use than DLLs.

[edit] DLL Hijacking


Due to a vulnerability commonly known as DLL hijacking, many programs will load and
execute a malicious DLL placed in the same folder as a data file that the program opens,
including folders on remote systems. The vulnerability was discovered by security researcher
HD Moore, who published an exploit for the open-source penetration testing framework
Metasploit.[5]

[edit] See also


• Dependency Walker, a utility which displays exported and imported functions of DLL
and EXE files.
• DLL Hijacking
• Dynamic library
• Library (computing)
• Linker (computing)
• Loader (computing)
• Object file
• Shared library
• Static library

[edit] External links


• dllexport, dllimport on MSDN
• Dynamic-Link Libraries on MSDN
• What is a DLL? on Microsoft support site
• Dynamic-Link Library Functions on MSDN
• Microsoft Portable Executable and Common Object File Format Specification
• Microsoft specification for dll files

[edit] References
1. ^ "The End of DLL Hell". Microsoft Corporation. http://msdn.microsoft.com/en-us/library/ms811694.aspx. Retrieved 2009-07-11.
2. ^ "Linker Support for Delay-Loaded DLLs". Microsoft Corporation. http://msdn.microsoft.com/en-us/library/151kt790.aspx. Retrieved 2009-07-11.
3. ^ Petrusha, Ron (2005-04-26). "Creating a Windows DLL with Visual Basic". O'Reilly Media. http://www.windowsdevcenter.com/pub/a/windows/2005/04/26/create_dll.html?page=1. Retrieved 2009-07-11.
4. ^ MSDN, Using extern to Specify Linkage
5. ^ TechWorld: Hacking toolkit publishes DLL hijacking exploit
• Hart, Johnson. Windows System Programming, Third Edition. Addison-Wesley, 2005. ISBN 0-321-25619-0.
• Rector, Brent et al. Win32 Programming. Addison-Wesley Developers Press, 1997. ISBN 0-201-63492-9.


DLL hijacking
From Wikipedia, the free encyclopedia
(Redirected from DLL Hijacking)
Jump to: navigation, search

DLL hijacking is a vulnerability that is triggered when a vulnerable file type is opened
from within a directory controlled by the attacker. The directory can be a USB drive, an
extracted archive, or a remote network share. In most cases, the user has to browse to the
directory and then open the target file type for the exploit to work. The file opened by
the user can be completely harmless; the flaw is that the application launched to handle
the file type will inadvertently load a DLL from the working directory.[1]

In practice, this flaw can be exploited by sending the target user a link to a network share
containing a file they perceive as safe. For example, iTunes, which was affected by this flaw
until patched, is associated with a number of media file types, and each of these would cause
a specific DLL to be loaded from the same directory as the opened file.[2] The user would be
presented with a link of the form \\server\movies\ and a number of media files would be
present in this directory. If the user tried to open any of these files, iTunes would search
the remote directory for one or more DLLs and load them into the process. If the attacker
supplied a malicious DLL containing malware or shellcode, the user would be left open to
further exploits.[3]

The vulnerability was discovered by HD Moore, who published an exploit for the open-source
penetration testing framework Metasploit.[4]
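The root cause is the by-name library search: when a DLL is requested by file name only, the loader probes a list of directories in order, and historically that list included the directory of the opened document. The sketch below is a simplified illustration of that search, not the exact Windows search order:

```python
import os

def find_dll(name, search_dirs):
    """Return the first path in search_dirs containing name, or None.

    Mimics a by-name library search: the first match wins, so an
    attacker-controlled directory placed early in the list shadows
    the legitimate copy further down.
    """
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

If the document's directory (for example the remote share) is probed before the system directory, the attacker's copy wins. Mitigations therefore either remove that entry from the list (e.g. the Windows SetDllDirectory("") call) or bypass the search entirely by loading with a fully qualified path.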

Making DLLs easy to build and use


By bnn3nasdfasdfa | 8 Mar 2004
How to quickly build a DLL file from an existing class and how to easily use it.

Introduction
Working out how to build a DLL can be complicated, for a beginner and even for an
experienced programmer. This article presents a simple method of building a DLL and then
using it in a project with no effort beyond including the header file. All you need is a
class already designed and ready to become a DLL. Although this is aimed at novices, some
experience with MSVS projects is assumed, and the more advanced reader may still find some
finer points here that could be useful.

Background
It can be hard to find, on the Internet, a genuinely simple way of building DLLs. The
information in this document is a collection of material found on this site and others, so
it is nothing new (it only pulls their efforts together). A few articles explain in detail
how to build DLLs and highlight the finer points of what a DLL actually is and how to use
it; I suggest reading them first for broader definitions.

However, much of what is available to the novice "DLL builder" is either lacking or too
complicated to understand. You may have heard that building DLLs is easy, but figuring out
how it all fits together can be an effort in its own right. Some articles also fail to
highlight the importance of directory locations. A DLL, its import library and its header
file(s) must all live in directories the compiler and linker can find: a Windows directory,
or somewhere added to the search paths. Otherwise you may hit frustrating compiler or
linker errors when one or more of the DLL's files are not found.

Using the code

There are no code attachments other than comment areas. This should be simple enough to
follow by working alongside in your MSVS environment. What you will need to do is:

1. Have a class already prepared that needs to become a DLL file.
2. Start a new DLL project.
3. Insert macro definitions in your "StdAfx.h" file.
4. Add automatic inclusion of the DLL libraries in the main "foo.h" file.
5. Add some project preprocessor macros.
6. Make batch files for the "Post Builds", to copy the library, header and DLL
   files to findable directories.
Details
1. It is assumed that you have already created your class, so this step is bypassed.
2. First is the easy part. Start up Microsoft Visual Studio Visual C++. Select
   "File->New" to pull up the creation dialog. Under the "Projects" tab, select "MFC
   AppWizard (DLL)". Enter the project name and directory to be used, then click
   "OK". The rest of the options are not necessary, so select what you want in the
   rest of the Wizard and click "Finish".

   What typically happens after you start the new project is that a source file and a
   header file are created for you. These are not really needed if you already have the
   code you are going to build; only the preset definitions for the project are needed
   from this step. It is suggested that you simply empty the source and header files
   created by the Wizard by highlighting and deleting their contents. Do not edit the
   "StdAfx" files yet, as these are needed.

   The point of this step is simply to copy and paste your class ("foo") into the header
   and source files created. It should be a simple concept to grasp, but if you are
   skeptical you can instead add your own "foo" source and header files to the project so
   that they are built.

   Next comes the editing. You must make the build compile correctly as a DLL for use.
   DLL files use a combination of exporting and importing: when this is the build project
   you need to export; when the header is included in another project that does not use
   the source code, you need to import. This is probably the hardest part of understanding
   DLLs, as it is not implemented for you and requires a little effort to understand. As a
   side note, "resource.h" is only required by DLLs that contain dialogs. In that case,
   remember that you may then have several "resource.h" files and will have to include a
   full path to each one. The following example shows what is probably the best method
   (keeping in mind that if a project needs this resource file, it must be added to the
   project manually, including the full path to it).

3. Edit the "StdAfx.h" file and include the following macros and the header file
   "resource.h".

StdAfx.h
//
// MSVS included headers, definitions, etc.
//

// Somewhere near the bottom

// This is the project preprocessor macro definition
// you will be adding shortly.
#ifdef DLL_EXPORTS
// This is to be used for the class header file.
#define DLL_BUILD_MACRO __declspec(dllexport)
#else
#define DLL_BUILD_MACRO __declspec(dllimport)
#endif

#ifndef _DLL_BUILD_ // Why do this? Is it necessary? Yes.
#define _DLL_BUILD_ DLL_BUILD_MACRO
#endif

// Make sure resources are included here, if desired, to prevent
// ambiguous references to different resource.h files.
#include "resource.h"

//
// Rest of file
//

4. Next, edit the main header file of your DLL code. Only a few simple lines need to be
   added to support accurate DLL building and usage:

foo.h
#ifndef _FOO_H_
#define _FOO_H_

//
// Miscellaneous here
//

/* This part automatically includes the right library when the
   header is used. When this file is built within the DLL project,
   it is skipped because of our preprocessor macro definition
   _FOO_DLL_. However, when this file is included from another
   project that is not part of this build, it chooses the correct
   library and links it for you. This means you do not have to add
   the library to the link settings of every project that uses this
   DLL, which avoids the tedious task of linking to several custom
   DLLs.

   Note that there are two different libraries here; this is not
   strictly necessary, but it shows how to separate debug builds
   from release builds. */

#ifndef _FOO_DLL_
#ifdef _DEBUG
// A debug program that uses this file wants the debug library.
#pragma comment( lib, "food.lib" )
#else
// A release program that uses this file wants the release library.
#pragma comment( lib, "foo.lib" )
#endif
#endif

#ifndef _DLL_BUILD_
#define _DLL_BUILD_ // Makes sure there are no compiler errors.
#endif

class _DLL_BUILD_ foo
{
private:
    //
    // Your members
    //

public:
    //
    // Your functions
    //
};

#endif

   To continue, edit the "foo.cpp" file and make sure you include the appropriate
   headers.

foo.cpp
#include "stdafx.h" // place first
#include "foo.h"

//
// Your code
//

5. This next step requires that you actually add the macros to the project settings. In
   the menu bar, go to "Project->Settings..." or press Alt+F7. Click the "C/C++" tab.
   Under "Preprocessor definitions:" add the macro definitions _FOO_DLL_ and DLL_EXPORTS
   at the end of the list of other macro definitions. Be sure to separate each new macro
   with a comma (","). Make sure to do this for the release configuration too, as you
   will have to redo these next steps for each type of build. Build with "debug info"
   and/or "browse info" if this is a debug version and you want to debug the DLL later
   using the MSVS debugger.

   Next, prepare for the last step. Go to the "Post-build step" tab in the same dialog.
   Under "Post-build command(s):" click an empty space and enter "Debug.bat". For the
   release configuration, select "Win32 Release" in the "Settings For:" combo box in the
   left pane, and enter "Release.bat" instead of "Debug.bat" in "Post-build command(s)".
   That is it for the project settings.
6. Now to build the batch files. Create a blank file and add command lines that copy your
   files to a findable directory. The point here is that you may have a ton of DLLs, and
   it is easier to keep all the needed components in one directory than to link to
   several directories. Below are the suggestions used in this article; you may want to
   add other options:

Debug.bat
Copy "Debug\foo.lib" "c:\<libraries dir>\food.lib"
REM Copy the DLL file to the Windows system32 directory
REM to make the DLL easily accessible. Note that you will
REM have to install or include the DLL file in the
REM distribution package of any program that uses it.
Copy "Debug\foo.dll" "c:\%system dir%\system32"
Copy "foo.h" "c:\<headers dir>"

Release.bat
Copy "Release\foo.lib" "c:\<libraries dir>"
REM Copy the DLL file to the Windows system32 directory
REM to make the DLL easily accessible.
REM Note that you will have to install or include the DLL
REM file in your distribution package.
REM Unfortunately this will overwrite any other DLL file of
REM the same name, such as the Release/Debug version, so keep
REM your builds up to date and note this yourself before
REM distributing to the public.
Copy "Release\foo.dll" "c:\%system dir%\system32"
Copy "foo.h" "c:\<headers dir>"
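The same post-build copying can be sketched as a small script. This is an illustrative alternative to the batch files above, not part of the original article; every directory argument is a placeholder for a real path:

```python
import shutil
from pathlib import Path

def post_build(config, project_dir, lib_dir, include_dir, system_dir):
    """Copy the import library, DLL and header to findable directories.

    Mirrors Debug.bat / Release.bat above: the Debug import library
    is renamed food.lib so that both configurations can coexist in
    the same library directory. All directory arguments are
    placeholders (assumptions), not fixed locations.
    """
    build = Path(project_dir) / config
    lib_name = "food.lib" if config == "Debug" else "foo.lib"
    shutil.copy(build / "foo.lib", Path(lib_dir) / lib_name)
    shutil.copy(build / "foo.dll", Path(system_dir))
    shutil.copy(Path(project_dir) / "foo.h", Path(include_dir))
```

A script like this has the same caveat as the batch files: copying into a shared system directory overwrites any existing DLL of the same name.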

You are now completely finished. Provided you have already tested your class and thought
about where your library, header and DLL files are being copied, you should run into no
problems at all. Your class is now a DLL file that can easily be used in future projects
without much work. All you need to remember is that you only have to include the header
file and everything else is done for you. Everything is made as easy and simple as
possible from here on out.

Notes
Remember that the DLL file must be accessible. This article uses the
"c:\windows\system32" directory, which may be a bad idea if you later need to retrieve
that DLL from the large collection of DLLs that probably already exists there. It can also
be annoying if you later rename the project and build under the different name: you will
then have to find the old DLL file and manually remove it, or just leave it there.

Alternatively, you can copy the DLL file(s), library file(s) and header file(s) into the
directory of each project that uses them. However, if you decide to use the same DLL in
another project, you will have to go back, add the extra copy command line(s) to both
"Debug.bat" and "Release.bat", and rebuild the project to have them copied.

Also note that in MSVS it is easy to add custom directories to search for headers and
libraries, but unfortunately not for DLLs. Go to "Tools->Options" in the menu bar and then
to the "Directories" tab. The "<libraries dir>" used in the batch files above should be
listed under "Library files" in the "Show directories for" combo box, and "<headers dir>"
under "Include files". Some macros in this article may seem repetitive or unused; however,
the compiler requires this style for both the DLL project and the projects using the DLL.
If you find that some are not needed you can remove them yourself; the code is written so
that they are there and readably accessible. Testing shows that both the DLL build and the
projects that use it require this kind of macro setup. Other possibilities and locations
exist.

Points of Interest
Figuring out Firewalls, UNIX and why mail programs show the contents automatically in
Windows.

History
• No reformatting necessary.


DLL Files in Windows: What Are They?

Dynamic Link Library (DLL) files are an essential part of the Windows operating system.
Although they are ubiquitous, most PC users neither know nor care what these files do.
Nonetheless, a little understanding of the role that DLL files play can make the computer a little
less of a mystery box. Only programmers and computer technicians need to know any of the gory
details of the structure and function of a DLL, but these files are so important that all of us should
know a few simple facts about them. Here is some information for the non-technical PC user.

What Do DLL Files Do?

A DLL file is indicated by the extension DLL in its name. Microsoft gives several definitions of a
DLL but the one that I think has the least jargon is this:

"A Dynamic Link Library (DLL) is a file of code containing functions that can be called from
other executable code (either an application or another DLL). Programmers use DLLs to provide
code that they can reuse and to parcel out distinct jobs. Unlike an executable (EXE) file, a DLL
cannot be directly run. DLLs must be called from other code that is already executing."

Another way of putting it is that DLL files are like modules that can be plugged into different
programs whenever a program needs the particular features that come with the DLL. The original
concept behind DLL files was to simplify things. It was recognized that there were many
functions common to a lot of software. For example, most programs need to create the graphical
interface that appears on the screen. Instead of having to contain the code to create the interface
themselves, programs call on a DLL for that function. The idea is to have a central
library from which everyone can obtain the commonly used functions as they are needed.
This cuts down on code, speeds things up, and is generally more efficient. The linking is
called dynamic because a library is loaded only when a program calls on it, and it runs in
the program's own memory space. More than one program can use the functions of a
particular DLL at the same time.

Parenthetically, I have to say that the software developers (not least of all, Microsoft) have strayed
from the path of keeping things simple. A computer today may contain a thousand or more
different DLL files. Also, Microsoft seems to tinker endlessly with DLL files, giving rise to many
different versions of a file with the same name, not all compatible. Microsoft maintains a database
with information about various DLLs to help with version conflicts.

There are several very important DLLs that contain a large number of the basic Windows
functions. Since they figure so importantly in the workings of Windows, it is worth noting their
names.

Examples of Important DLL files

COMDLG32.DLL
Controls the dialog boxes
GDI32.DLL
Contains numerous functions for drawing graphics, displaying text, and managing fonts
KERNEL32.DLL
Contains hundreds of functions for the management of memory and various processes
USER32.DLL
Contains numerous user interface functions. Involved in the creation of program windows
and their interactions with each other
It is the common use of these types of DLLs by most programs that ensures that all applications
written for Windows will have a standard appearance and behavior. This standardization was a big
factor in the rise of Windows to dominance of the desktop computer. Anyone who was working
with computers in the days of DOS will remember that every program had its own interface and
menus.

Error Messages involving DLLs

PC users often see DLLs (especially the ones mentioned above) mentioned in error messages. One
might conclude, therefore, that something is always going wrong with DLLs. Very often,
however, it is not the DLL itself that is at fault. DLL files figure prominently in the error messages
when something in the system goes awry because they are involved in the most basic processes of
Windows. They are in effect the messenger of trouble, not the actual trouble. It is beyond our
scope to discuss any details of error messages but there are substantial references on interpreting
them. One is at James Eshelman's site.

Using Regsvr32.exe to Register DLLs

First, let it be clear that the important system file regsvr32.exe should not be confused
with similarly named files such as regsrv.exe that are used by certain worms and Trojans.

In order for a DLL to be used, it has to be registered by having appropriate references entered in
the Registry. It sometimes happens that a Registry reference gets corrupted and the functions of
the DLL cannot be used anymore. The DLL can be re-registered by opening Start-Run and
entering the command
regsvr32 somefile.dll
This command assumes that somefile.dll is in a directory or folder that is in the path. Otherwise,
the full path for the DLL must be used. A DLL file can also be unregistered by using the switch
"/u" as shown below.
regsvr32 /u somefile.dll
Registering and unregistering in this way can be used to toggle a DLL-based component on and off.
