It would be wonderful if we could write programs that were guaranteed to work correctly and never needed to be debugged. Until that halcyon day, the normal programming cycle is going to involve writing a program, compiling it, executing it, and then the (somewhat) dreaded scourge of debugging it. And then repeat until the program works as expected.

It is possible to debug programs by inserting code that prints values of selected interesting variables. Indeed, in some situations, such as debugging kernel drivers, this may be the preferred method. There are low-level debuggers that allow you to step through the executable program, instruction by instruction, displaying registers and memory contents in binary.

But it is much easier to use a source-level debugger which allows you to step through a program's source, set breakpoints, print variable values, and perhaps a few other functions such as allowing you to call a function in your program while in the debugger. The problem is how to coordinate two completely different programs, the compiler and the debugger, so that the program can be debugged.

Translating from Source to Executable

The process of compiling a program from human-readable form into the binary form that a processor executes is quite complex, but it essentially involves successively recasting the source into simpler and simpler forms, discarding information at each step until, eventually, the result is the sequence of simple operations, registers, memory addresses, and binary values which the processor actually understands. After all, the processor really doesn't care whether you used object oriented programming, templates, or smart pointers; it only understands a very simple set of operations on a limited number of registers and memory locations containing binary values.

As a compiler reads and parses the source of a program, it collects a variety of information about the program, such as the line numbers where a variable or function is declared or used. Semantic analysis extends this information to fill in details such as the types of variables and arguments of functions. Optimizations may move parts of the program around, combine similar pieces, expand inline functions, or remove parts which are unneeded. Finally, code generation takes this internal representation of the program and generates the actual machine instructions. Often, there is another pass over the machine code to perform what are called "peephole" optimizations that may further rearrange or modify the code, for example, to eliminate duplicate instructions.

All-in-all, the compiler's task is to take the well-crafted and understandable source code and convert it into efficient but essentially unintelligible machine language. The better the compiler achieves the goal of creating tight and fast code, the more likely it is that the result will be difficult to understand.

During this translation process, the compiler collects information about the program which will be useful later when the program is debugged. There are two challenges to doing this well. The first is that in the later parts of this process, it may be difficult for the compiler to relate the changes it is making to the program to the original source code that the programmer wrote. For example, the peephole optimizer may remove an instruction because it was able to switch around the order of a test in code that was generated by an inline function in the instantiation of a C++ template. By the time it gets its metaphorical hands on the program, the optimizer may have a difficult time connecting its manipulations of low-level code to the original source which generated it.

The second challenge is how to describe the executable program and its relationship to the original source with enough detail to allow a debugger to provide the programmer useful information. At the same time, the description has to be concise enough so that it does not take up an extreme amount of space or require significant processor time to interpret. This is where the DWARF Debugging Format comes in: it is a compact representation of the relationship between the executable program and the source in a way that is reasonably efficient for a debugger to process.

The Debugging Process

When a programmer runs a program under a debugger, there are some common operations which he or she may want to do. The most common of these are setting a breakpoint to stop the debugger at a particular point in the source, either by specifying the line number or a function name. When this breakpoint is hit, the programmer usually would like to display the values of local or global variables, or the arguments to the function. Displaying the call stack lets the programmer know how the program arrived at the breakpoint in cases where there are multiple execution paths. After reviewing this information, the programmer can ask the debugger to continue execution of the program under test.

There are a number of additional operations that are useful in debugging. For example, it may be helpful to be able to step through a program line by line, either entering or stepping over called functions. Setting a breakpoint at every instance of a template or inline function can be important for debugging C++ programs. It can be helpful to stop just before the end of a function so that the return value can be displayed or changed. Sometimes the programmer may want to bypass execution of a function, returning a known value instead of what the function would have (possibly incorrectly) computed.

Michael Eager is Principal Consultant at Eager Consulting (www.eagercon.com), specializing in development tools for embedded systems. He was a member of PLSIG's DWARF standardization committee and has been Chair of the DWARF Standards Committee since 1999. Michael can be contacted at eager@eagercon.com.
© Eager Consulting, 2006, 2007, 2012
There are also data related operations that are useful. For example, displaying the type of a variable can avoid having to look up the type in the source files. Displaying the value of a variable in different formats, or displaying a memory or register in a specified format is helpful.

There are some operations which might be called advanced debugging functions: for example, being able to debug multi-threaded programs or programs stored in read-only memory. One might want a debugger (or some other program analysis tool) to keep track of whether certain sections of code had been executed or not. Some debuggers allow the programmer to call functions in the program being tested. In the not-so-distant past, debugging programs that had been optimized would have been considered an advanced feature.

The task of a debugger is to provide the programmer with a view of the executing program in as natural and understandable fashion as possible, while permitting a wide range of control over its execution. This means that the debugger has to essentially reverse much of the compiler’s carefully crafted transformations, converting the program’s data and state back into the terms that the programmer originally used in the program’s source.

The challenge of a debugging data format, like DWARF, is to make this possible and even easy.

Debugging Formats

There are several debugging formats: stabs, COFF, PECOFF, OMF, IEEE-695, and two variants¹ of DWARF, to name some common ones. I’m not going to describe these in any detail. The intent here is only to mention them to place the DWARF Debugging Format in context.

The name stabs comes from symbol table strings, since the debugging data were originally saved as strings in Unix’s a.out object file’s symbol table. Stabs encodes the information about a program in text strings. Initially quite simple, stabs has evolved over time into a quite complex, occasionally cryptic and less-than-consistent debugging format. Stabs is neither standardized nor well documented². Sun Microsystems has made a number of extensions to stabs. GCC has made other extensions, while attempting to reverse engineer the Sun extensions. Nonetheless, stabs is still widely used.

COFF stands for Common Object File Format and originated with Unix System V Release 3. Rudimentary debugging information was defined with the COFF format, but since COFF includes support for named sections, a variety of different debugging formats such as stabs have been used with COFF. The most significant problem with COFF is that despite the Common in its name, it isn’t the same in each architecture which uses the format. There are many variations in COFF, including XCOFF (used on IBM RS/6000), ECOFF (used on MIPS and Alpha), and Windows PECOFF. Documentation of these variants is available to varying degrees but neither the object module format nor the debugging information is standardized.

PECOFF is the object module format used by Microsoft Windows beginning with Windows 95. It is based on the COFF format and contains both COFF debugging data and Microsoft’s own proprietary CodeView or CV4 debugging data format. Documentation on the debugging format is both sketchy and difficult to obtain.

OMF stands for Object Module Format and is the object file format used in CP/M, DOS and OS/2 systems, as well as a small number of embedded systems. OMF defines public name and line number information for debuggers and can also contain Microsoft CV, IBM PM, or AIX format debugging data. OMF only provides the most rudimentary support for debuggers.

IEEE-695 is a standard object file and debugging format developed jointly by Microtec Research and HP in the late 1980’s for embedded environments. It became an IEEE standard in 1990. It is a very flexible specification, intended to be usable with almost any machine architecture. The debugging format is block structured, which corresponds to the organization of the source better than other formats. Although it is an IEEE standard, in many ways IEEE-695 is more like the proprietary formats. Although the original standard is readily available from IEEE, Microtec Research made extensions to support C++ and optimized code which are poorly documented. The IEEE standard was never revised to incorporate the Microtec Research or other changes. Despite being an IEEE standard, its use is limited to a few small processors.

A Brief History of DWARF

DWARF 1 ─ Unix SVR4 sdb and PLSIG

DWARF³ was developed by Brian Russell, Ph.D., at Bell Labs in 1988 for use with the C compiler and sdb debugger in Unix System V Release 4 (SVR4). The Programming Languages Special Interest Group (PLSIG), part of Unix International (UI), documented the DWARF generated by SVR4 as DWARF Version 1 in 1992. Although the original DWARF had several clear shortcomings, most notably that it was not very compact, the PLSIG decided to standardize the SVR4 format with only minimal modification. It was widely adopted within the embedded sector where it continues to be used today, especially for small processors.

DWARF 2 ─ PLSIG

The PLSIG continued to develop and document extensions to DWARF to address several issues, the most important of which was to reduce the size of debugging data that were generated. There were also additions to support new languages such as the up-and-coming C++ language. DWARF Version 2 was released as a draft standard in 1993.

In an example of the domino theory in action, shortly after PLSIG released the draft standard, fatal flaws were discovered in Motorola's 88000 microprocessor. Motorola pulled the plug on the processor, which in turn resulted in the demise of Open88, a consortium of companies that were developing computers using the 88000. Open88 in turn was a supporter of Unix International, sponsor of PLSIG, which resulted in UI being disbanded. When UI folded, all that remained of the PLSIG was a mailing list and a variety of ftp sites that had various versions of the DWARF 2 draft standard. A final standard was never released.

Since Unix International had disappeared and PLSIG disbanded, several organizations independently decided to extend DWARF 1 and 2. Some of these extensions were specific to a single architecture, but others might be applicable to any architecture. Unfortunately, the different organizations didn’t work together on these extensions. Documentation on the extensions is [...]

¹ DWARF Version 1 is significantly different from Versions 2 and later.
² In 1992, the author wrote an extensive document describing the stabs generated by Sun Microsystems' compilers. Unfortunately, it was never widely distributed.
³ The name DWARF is something of a pun, since it was developed along with the ELF object file format. The name is an acronym for “Debugging With Arbitrary Record Formats”.
DWARF 3 ─ Free Standards Group

Despite several online discussions about DWARF on the PLSIG email list (which survived under X/Open [later Open Group] sponsorship after UI’s demise), there was little impetus to revise (or even finalize) the document until the end of 1999. At that time, there was interest in extending DWARF to have better support for the HP/Intel IA-64 architecture as well as better documentation of the ABI used by C++ programs. These two efforts separated, and the author took over as Chair for the revived DWARF Committee.

Following more than 18 months of development work and creation of a draft of the DWARF 3 specification, the standardization effort hit what might be called a soft patch. The committee (and this author, in particular) wanted to ensure that the DWARF standard was readily available and to avoid the possible divergence caused by multiple sources for the standard. The DWARF Committee became the DWARF Workgroup of the Free Standards Group in 2003. Active development and clarification of the DWARF 3 Standard resumed early in 2005 with the goal to resolve any open issues in the standard. A public review draft was released to solicit public comments in October and the final version of the DWARF 3 Standard was released in December, 2005.

DWARF 4 ─ DWARF Debugging Format Committee

After the Free Standards Group merged with Open Source Development Labs (OSDL) in 2007 to form the Linux Foundation, the DWARF Committee returned to independent status and created its own web site at dwarfstd.org. Work began on Version 4 of DWARF in 2007. This version clarified DWARF expressions, added support for VLIW architectures, improved language support, generalized support for packed data, added a new technique for compressing the debug data by eliminating duplicate type descriptions, and added support for profile-based compiler optimizations, as well as extensive editing of the documentation. The DWARF Version 4 [...]

Most modern programming languages are block structured: each entity (a class definition or a function, for example) is contained within another entity. Each file in a C program may contain multiple data definitions, multiple variable definitions, and multiple functions. Within each C function there may be several data definitions followed by executable statements. A statement may be a compound statement that in turn can contain data definitions and executable statements. This creates lexical scopes, where names are known only within the scope in which they are defined. To find the definition of a particular symbol in a program, you first look in the current scope, then in successive enclosing scopes until you find the symbol. There may be multiple definitions of the same name in different scopes. Compilers very naturally represent a program internally as a tree.

DWARF follows this model in that it is also block structured. Each descriptive entity in DWARF (except for the topmost entry which describes the source file) is contained within a parent entry and may contain children entities. If a node contains multiple entities, they are all siblings, related to each other. The DWARF description of a program is a tree structure, similar to the compiler’s internal tree, where each node can have children or siblings. The nodes may represent types, variables, or functions. This is a compact format where only the information that is needed to describe an aspect of a program is provided. The format is extensible in a uniform fashion, so that a debugger can recognize and ignore an extension, even if it might not understand its meaning. (This is much better than the situation with most other debugging formats where the debugger gets fatally confused attempting to read unrecognized data.) DWARF is also designed to be extensible to describe virtually any procedural programming language on any machine architecture, rather than being bound to only describing one language or one version of a language on a limited range of architectures.

[...] cate information that is contained in the object file, such as identifying the processor architecture or whether the file is written in big-endian or little-endian format.

Debugging Information Entry (DIE)

Tags and Attributes

The basic descriptive entity in DWARF is the Debugging Information Entry (DIE). A DIE has a tag, which specifies what the DIE describes, and a list of attributes which fill in details and further describe the entity. A DIE (except for the topmost) is contained in or owned by a parent DIE and may have sibling DIEs or children DIEs. Attributes may contain a variety of values: constants (such as a function name), variables (such as the start address for a function), or references to another DIE (such as for the type of a function’s return value).

Figure 1 shows C's classic hello.c program with a simplified graphical representation of its DWARF description. The topmost DIE represents the compilation unit. It has two “children”: the first is the DIE describing main and the second describes the base type int, which is the type of the value returned by main. The subprogram DIE is a child of the compilation unit DIE, while the base type DIE is referenced by the Type attribute in the subprogram DIE. We also talk about a DIE “owning” or “containing” the children DIEs.

Types of DIEs

DIEs can be split into two general types: those that describe data, including data types, and those that describe functions and other executable code.

Describing Data and Types

Most programming languages have sophisticated descriptions of data. There are a number of built-in data types, pointers, various data structures, and usually ways of creating new data types. Since DWARF is intended to be used with a vari [...]

⁴ In the remainder of this paper, we will be discussing DWARF Version 2 and later versions. Unless otherwise noted, all descriptions apply to DWARF Versions 2 through 4.
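The DIE structure described under Tags and Attributes can be made concrete with a small in-memory model. What follows is only a sketch under stated assumptions: the `DIE` class, its `add` method, and the `lookup` helper are hypothetical names invented for this illustration, not part of DWARF or of any DWARF library. The tag and attribute names are standard DWARF names, and the tree mirrors the hello.c description (a compilation unit owning main and the base type int).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False, repr=False)
class DIE:
    """Hypothetical model of one DIE: a tag, attributes, and tree links."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["DIE"] = None
    children: list = field(default_factory=list)

    def add(self, child: "DIE") -> "DIE":
        """Make `child` owned by this DIE and return it."""
        child.parent = self
        self.children.append(child)
        return child


def lookup(scope: Optional[DIE], name: str) -> Optional[DIE]:
    """Search the current scope, then each enclosing scope, for `name`."""
    while scope is not None:
        for child in scope.children:
            if child.attrs.get("DW_AT_name") == name:
                return child
        scope = scope.parent
    return None


# The compilation unit DIE owns two children: main and the base type int.
cu = DIE("DW_TAG_compile_unit", {"DW_AT_name": "hello.c"})
int_type = DIE("DW_TAG_base_type", {"DW_AT_name": "int"})
# The type attribute is a reference to another DIE, not a copy of it.
main = cu.add(DIE("DW_TAG_subprogram",
                  {"DW_AT_name": "main", "DW_AT_type": int_type}))
cu.add(int_type)

# Starting inside main, the name "int" is found in the enclosing scope.
found = lookup(main, "int")
print(found.tag)
```

The `lookup` loop is the scope search described earlier: look in the current scope first, then in successive enclosing scopes. Ownership (parent and children) and reference (the type attribute) are kept as separate relationships, as in the hello.c example.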
[...] mented, make it difficult to have compatibility between different compilers or debuggers, or even between different versions of the same tools.

DWARF base types provide the lowest level mapping between the simple data types and how they are implemented on the target machine's hardware. This makes the definition of int explicit for both Java and C and allows different definitions to be [...]

    DW_TAG_base_type
        DW_AT_name = int
        DW_AT_byte_size = 4
        DW_AT_encoding = signed

Figure 2a. int base type on 32-bit processor.

    DW_TAG_base_type
        DW_AT_name = int
        DW_AT_byte_size = 2
        DW_AT_encoding = signed

Figure 2b. int base type on 16-bit processor.

A named variable is described by a DIE which has a variety of attributes, one of which is a reference to a type definition. Figure 4 describes an integer variable named x. (For the moment we will ignore the other information that is usually contained in a DIE describing a variable.)

The base type for int describes it as a signed binary integer occupying four bytes.

⁵ This is a real-life example taken from an implementation of Pascal that passed 16-bit integers in the top half of a word on the stack.
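As an illustration of how a debugger might use the base type descriptions in Figures 2a and 2b, the sketch below interprets raw bytes from the target according to a base type's byte size and encoding. The `BaseType` class and `decode` function are hypothetical names for this example only, and the little-endian default is an assumption; a real debugger would take the byte order from the object file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseType:
    """Hypothetical mirror of a DW_TAG_base_type DIE."""
    name: str        # DW_AT_name
    byte_size: int   # DW_AT_byte_size
    encoding: str    # DW_AT_encoding: "signed" or "unsigned" in this sketch


def decode(base: BaseType, raw: bytes, byteorder: str = "little") -> int:
    """Interpret `raw` target memory as a value of the given base type."""
    if len(raw) != base.byte_size:
        raise ValueError(f"{base.name} needs {base.byte_size} bytes")
    return int.from_bytes(raw, byteorder, signed=(base.encoding == "signed"))


INT_32BIT = BaseType("int", 4, "signed")   # Figure 2a
INT_16BIT = BaseType("int", 2, "signed")   # Figure 2b

# The same source-level type name maps to different storage on each target.
print(decode(INT_32BIT, b"\xfe\xff\xff\xff"))
print(decode(INT_16BIT, b"\xfe\xff"))
```

Both calls decode the value -2. Only the number of bytes read from the target differs, which is the information the base type DIE makes explicit.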
Array types are described by a DIE which defines whether the data is [...]

[...] bly other attributes. If the size of an instance is known at compile time, then it will have a byte size attribute. Each of these descriptions looks very much like the description of a simple variable, although there may be some additional attributes. For example, C++ allows the programmer to specify whether a member is public, pri [...]

[...] (file, line, column) triplet.

DWARF splits variables into three categories: constants, formal parameters, and variables. A constant is used with languages that have true named constants as part of the language, such as Ada parameters. (C [...]

⁶ Some compilers define a common set of type definitions at the start of every compilation unit. Others only generate the definitions for the types which are actually referenced in the program. Either is valid.
DWARF treats functions that return values and subroutines that do not as variations of the same thing. Drifting slightly away from its roots in C terminology, DWARF describes both with a subprogram [...]

[...] cation attributes are not shown.

In Figure 8b, DIE <2> shows the definition of size_t which is a typedef of unsigned int. This allows a debugger to [...] size_t, while displaying its value as an unsigned integer. DIE <5> describes the function strndup. This has a pointer to its sibling, DIE <10>; all of the following DIEs are children of the Subprogram DIE. The function returns a pointer to char, described in DIE <10>. DIE <5> also describes the subroutine as external and prototyped and gives the low and high PC values for the routine. The formal parameters and local variables of the routine are described in DIEs <6> to <9>.

    strndup.c:

    #include "ansidecl.h"
    #include <stddef.h>

    extern size_t strlen (const char*);
    extern PTR malloc (size_t);
    extern PTR memcpy (PTR, const PTR, size_t);

    char *
    strndup (const char *s, size_t n)
    {
      char *result;
      size_t len = strlen (s);

      if (n < len)
        len = n;

      result = (char *) malloc (len + 1);
      if (!result)
        return 0;

      result[len] = '\0';
      return (char *) memcpy (result, s, len);
    }

Figure 8a. Source for strndup.c.

    <1>:  DW_TAG_base_type
            DW_AT_name = int
            DW_AT_byte_size = 4
            DW_AT_encoding = signed
    <2>:  DW_TAG_typedef
            DW_AT_name = size_t
            DW_AT_type = <3>
    <3>:  DW_TAG_base_type
            DW_AT_name = unsigned int
            DW_AT_byte_size = 4
            DW_AT_encoding = unsigned
    <4>:  DW_TAG_base_type
            DW_AT_name = long int
            DW_AT_byte_size = 4
            DW_AT_encoding = signed
    <5>:  DW_TAG_subprogram
            DW_AT_sibling = <10>
            DW_AT_external = 1
            DW_AT_name = strndup
            DW_AT_prototyped = 1
            DW_AT_type = <10>
            DW_AT_low_pc = 0
            DW_AT_high_pc = 0x7b
    <6>:  DW_TAG_formal_parameter
            DW_AT_name = s
            DW_AT_type = <12>
            DW_AT_location = (DW_OP_fbreg: 0)
    <7>:  DW_TAG_formal_parameter
            DW_AT_name = n
            DW_AT_type = <2>
            DW_AT_location = (DW_OP_fbreg: 4)
    <8>:  DW_TAG_variable
            DW_AT_name = result
            DW_AT_type = <10>
            DW_AT_location = (DW_OP_fbreg: -28)
    <9>:  DW_TAG_variable
            DW_AT_name = len
            DW_AT_type = <2>
            DW_AT_location = (DW_OP_fbreg: -24)
    <10>: DW_TAG_pointer_type
            DW_AT_byte_size = 4
            DW_AT_type = <11>
    <11>: DW_TAG_base_type
            DW_AT_name = char
            DW_AT_byte_size = 1
            DW_AT_encoding = signed char
    <12>: DW_TAG_pointer_type
            DW_AT_byte_size = 4
            DW_AT_type = <13>
    <13>: DW_TAG_const_type
            DW_AT_type = <11>

Figure 8b. DWARF description for strndup.c.

Most interesting programs consist of more than a single file. Each source file that makes up a program is compiled independently and then linked together with system libraries to make up the program. DWARF calls each separately compiled source file a compilation unit.

The DWARF data for each compilation unit starts with a Compilation Unit DIE. This DIE contains general information about the compilation, including the directory and name of the source file, the programming language used, a string which identifies the producer of the DWARF data, and offsets into the DWARF data sections to help locate the line number and macro information.

If the compilation unit is contiguous (i.e., it is loaded into memory in one piece) then there are values for the low and high memory addresses for the unit. This makes it easier for a debugger to identify which compilation unit created the code at a particular memory address. If the compilation unit is not contiguous, then a list of the memory addresses that the code occupies is provided by the compiler and linker.

The Compilation Unit DIE is the parent of all of the DIEs that describe the compilation unit. Generally, the first DIEs will describe data types, followed by global data, then the functions that make up the source file. The DIEs for variables and functions are in the same order in which they appear in the source file.

Data encoding

Conceptually, the DWARF data that describes a program is a tree. Each DIE may have a sibling and maybe several children DIEs. Each of the DIEs has a type (called its TAG) and a number of attributes. Each attribute is represented by an attribute type and a value. Unfortunately, this is not a very dense encoding. Without compression, the DWARF data is unwieldy.

DWARF offers several ways to reduce the size of the data which needs to be saved with the object file. The first is to "flatten" the tree by saving it in prefix order. Each type of DIE is defined to either have children or not. If the DIE cannot have children, the next DIE is its sibling. If the DIE can have children, then the next DIE is its first child. The remaining children are represented as the siblings of this first child. This way, links to the sibling or child DIEs can be eliminated. If the compiler writer thinks that it might be useful to be able to jump from one DIE to its sibling without stepping through each of its children DIEs (for example, to jump to the next function in a compilation) then a sibling attribute can be added to the DIE.

A second scheme to compress the data is to use abbreviations. Although DWARF allows great flexibility in which DIEs and attributes it may generate, most compilers only generate a limited set of DIEs, all of which have the same set of attributes. Instead of storing the value of the TAG and the attribute-value pairs, only an index into a table of abbreviations is stored, followed by the attribute values. Each abbreviation gives the TAG value, a flag indicating whether the DIE has children, and a list of attributes with the type of value it expects. Figure 9 shows the abbreviation for the formal parameter DIE used in Figure 8b. DIE <6> in Figure 8b is actually encoded as shown⁸. This is a significant reduction in the amount of data that needs to be saved at some expense in added complexity.

⁷ [...] a fixed offset from where the executable is loaded. The loader relocates references to addresses within an executable so that at runtime the location attribute contains the actual memory address. In an object file, the location attribute is the offset, along with an appropriate relocation table entry.
⁸ The encoded entry also includes the file and line values which are not shown in Fig. 8b.
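The two size reductions described under Data encoding, prefix-order flattening and abbreviations, can be sketched together. This is a simplified model, not the actual DWARF byte encoding: the tuple layout for a DIE and the `flatten` function are hypothetical, and attribute value forms are omitted. One detail is added beyond the text above: in DWARF, a null entry (abbreviation code 0) marks the end of a list of children, which is what lets the reader find its way back up the tree without explicit links.

```python
def flatten(die, abbrevs, out):
    """Append `die` and its subtree to `out` in prefix order.

    A DIE is modeled as (tag, attrs, children). Each emitted entry is
    (abbreviation code, attribute values); the tag, the has-children flag,
    and the attribute names live once in the `abbrevs` table.
    """
    tag, attrs, children = die
    key = (tag, bool(children), tuple(attrs))        # what an abbreviation records
    code = abbrevs.setdefault(key, len(abbrevs) + 1)  # reuse or allocate a code
    out.append((code, list(attrs.values())))
    if children:
        for child in children:
            flatten(child, abbrevs, out)
        out.append((0, []))   # null entry: end of this DIE's children
    return out


# Part of Figure 8b: strndup with its formal parameters and local variables.
strndup = ("DW_TAG_subprogram", {"DW_AT_name": "strndup"}, [
    ("DW_TAG_formal_parameter", {"DW_AT_name": "s"}, []),
    ("DW_TAG_formal_parameter", {"DW_AT_name": "n"}, []),
    ("DW_TAG_variable", {"DW_AT_name": "result"}, []),
    ("DW_TAG_variable", {"DW_AT_name": "len"}, []),
])

abbrevs = {}
entries = flatten(strndup, abbrevs, [])
for entry in entries:
    print(entry)
```

Five DIEs need only three abbreviations here, because the two formal parameters share one entry in the table and the two variables share another. Each DIE then costs one small code plus its attribute values.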
The DWARF line table contains the mapping between memory addresses that contain the executable code of a program and the source lines that correspond to these addresses. In the simplest form, this can be looked at as a matrix with one column containing the memory addresses and another column containing the source triplet (file, line, and column) for that address. If you want to set a breakpoint at a particular line, the table gives you the memory address to store the breakpoint instruction. Conversely, if your program has a fault (say, using a bad pointer) at some location in memory, you can look for the source line that is closest to the memory address.

DWARF has extended this with added columns to convey additional information about a program. As a compiler optimizes the program, it may move instructions around or remove them. The code for a given source statement may not be stored as a sequence of machine instructions, but may be scattered and interleaved with the instructions for other nearby source statements. It may be useful to identify the end of the code which represents the prolog of a function or the beginning of the epilog, so that the debugger can stop after all of the [...]

[...] instruction, it would be huge. DWARF compresses this data by encoding it as a sequence of instructions called a line number program⁹. These instructions are interpreted by a simple finite state machine to recreate the complete line number table.

The finite state machine is initialized with a set of default values. Each row in the line number table is generated by executing one or more of the opcodes of the line number program. The opcodes are generally quite simple: for example, add a value to either the machine address or to the line number, set the column number, or set a flag which indicates that the memory address represents the start of a source statement [...]

    Address  File  Line  Col  Stmt  Block  End  Prolog  Epilog  ISA
    0x41     0     50    0    yes   no     no   no      no      0
    0x47     0     51    0    yes   no     no   no      no      0
    0x50     0     53    0    yes   no     no   no      no      0
    0x59     0     54    0    yes   no     no   no      no      0
    0x6a     0     54    0    yes   no     no   no      no      0
    0x73     0     55    0    yes   no     no   no      no      0
    0x7b     0     56    0    yes   no     yes  no      no      0

    File 0: strndup.c
    File 1: stddef.h

Figure 10. Line Number Table for strndup.c.

Macro Information

Most debuggers have a very difficult time displaying and debugging code which has macros. The user sees the original source file, with the macros, while the code corresponds to whatever the macros generated.

DWARF includes the description of the macros defined in the program. This is quite rudimentary information, but can be used by a debugger to display the values for a macro or possibly translate the macro into the corresponding source language.

Call Frame Information

Every processor has a certain way of calling functions and passing arguments [...]

⁹ Calling this a line number program is something of a misnomer. The program describes much more than just line numbers, such as instruction set, beginning of basic blocks, end of function prolog, etc.
[...] return address for the function.

For some processors, there may be different calling sequences depending on how the function is written, for example, if there are more than a certain number of arguments. There may be different calling sequences depending on operating systems. Compilers will try to optimize the calling sequence to make code both smaller and faster. One common optimization is having a simple function which doesn't call any others (a leaf function) use its caller's stack frame instead of creating its own. Another optimization may be to eliminate a register which points to the current call frame. Some registers may be preserved across the call while others are not. While it may be possible for the debugger to puzzle out all the possible permutations in calling sequence or optimizations, it is both tedious and error-prone. A small change in the optimizations and the debugger may no longer be able to walk the stack to the calling function.

The DWARF Call Frame Information (CFI) provides the debugger with enough information about how a function is called so that it can locate each of the arguments to the function, locate the current call frame, and locate the call frame for the calling function. This information is used by the debugger to "unwind the stack," locating the previous function, the location where the function was called, and the values passed.

Like the line number table, the CFI is encoded as a sequence of instructions that are interpreted to generate a table. There is one row in this table for each address that contains code. The first column contains the machine address while the subsequent [...]

[...] or more commonly ULEB for unsigned values and SLEB for signed values), which compresses these integer values. Since the low-order bits contain the data and high-order bits consist of all zeros or ones, LEB values chop off the low-order seven bits of the value. If the remaining bits are all zero or one (sign-extension bits), this is the encoded value. Otherwise, set the high-order bit to one, output this byte, and go on to the next seven low-order bits.

Shrinking DWARF data

The encoding schemes used by DWARF significantly reduce the size of the debugging information compared to an unencoded format like DWARF Version 1. Unfortunately, with many programs the amount of debugging data generated by the compiler can become quite large, frequently much larger than the executable code and data.

DWARF offers ways to further reduce the size of the debugging data. Most strings in the DWARF debugging data are actually references into a separate .debug_str section. Duplicate strings can be eliminated when generating this section. Potentially, a linker can merge the .debug_str sections from several compilations into a single, smaller string section.

Many programs contain declarations which are duplicated in each compilation unit. For example, debugging data describing many (perhaps thousands of) declarations of C++ template functions may be repeated in each compilation. These repeated descriptions can be saved in separate compilation units in uniquely named sections. The linker can use COMDAT (common data) techniques to eliminate the duplicate sec [...]

While DWARF is defined in a way that allows it to be used with any object file format, it's most often used with ELF. Each of the different kinds of DWARF data is stored in its own section. The names of these sections all start with ".debug_". For improved efficiency, most references to DWARF data use an offset from the start of the data for the current compilation. This avoids the need to relocate the debugging data, which speeds up program loading and debugging.

The ELF sections and their contents are

    .debug_abbrev     Abbreviations used in the .debug_info section
    .debug_aranges    A mapping between memory address and compilation
    .debug_frame      Call Frame Information
    .debug_info       The core DWARF data containing DIEs
    .debug_line       Line Number Program
    .debug_loc        Location descriptions
    .debug_macinfo    Macro descriptions
    .debug_pubnames   A lookup table for global objects and functions
    .debug_pubtypes   A lookup table for global types
    .debug_ranges     Address ranges referenced by DIEs
    .debug_str        String table used by .debug_info
    .debug_types      Type descriptions
columns contain the values of the machine tions.
registers when the instruction at that ad Many programs reference a large num
dress is executed. Like the line number ta ber of include files which contain many
ble, if this table were actually created it type definitions, resulting in DWARF data Summary
S
would be huge. Luckily, very little changes which contains thousands of DIEs for these
between two machine instructions, so the o there you have it ─ DWARF in a nut
types. A compiler can reduce the size of
CFI encoding is quite compact. shell. Well, not quite a nutshell. The ba
this data by only generating DWARF for the sic concepts for the DWARF debug informa
types which are actually used in the compi tion are straightforward. A program is de
Variable length data lation. With DWARF Version 4, type defini scribed as a tree with nodes representing
Integer values are used throughout tions can be saved into a separate the various functions, data and types in the
DWARF to represent everything from off .debug_types section. The compilation source in a compact language and ma
sets into data sections to sizes of arrays or unit contains a DIE which references this
chineindependent fashion. The line table
structures. In most cases, it isn't possible to separate type unit and a unique 64bit sig provides the mapping between the exe
place a bound on the size of these values. nature for these types. A linker can recog cutable instructions and the source that
In a classic data structure each of these val 10
An example of this may be seen in the reloca generated them. The CFI describes how to
ues would be represented using the default tion directory in an object file, where file offset unwind the stack.
integer size. Since most values can be rep and relocation values are represented by inte
gers. Most values are have leading zeros.
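The variable-length encoding described under "Variable length data" can be sketched in a few lines. The following is a minimal illustration, not code from any DWARF producer or consumer: the function names encode_uleb128 and encode_sleb128 are hypothetical, and the logic simply follows the steps given in the text (take the low-order seven bits, stop if only zero or sign-extension bits remain, otherwise set the high-order bit and continue).

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 (ULEB)."""
    if value < 0:
        raise ValueError("ULEB cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # chop off the low-order seven bits
        value >>= 7
        if value == 0:               # remaining bits are all zero: last byte
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)      # high-order bit set: more bytes follow


def encode_sleb128(value):
    """Encode a signed integer as signed LEB128 (SLEB)."""
    out = bytearray()
    while True:
        byte = value & 0x7F          # low-order seven bits (two's complement)
        value >>= 7                  # Python's shift preserves the sign
        sign_bit_set = bool(byte & 0x40)
        # Done when what remains is pure sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)      # high-order bit set: more bytes follow


if __name__ == "__main__":
    for n in (0, 127, 128, 624485):
        print(n, encode_uleb128(n).hex())
    for n in (-1, 63, 64, -123456):
        print(n, encode_sleb128(n).hex())
```

Small values occupy a single byte, so a table of offsets and sizes that are mostly small shrinks considerably compared with storing each one in a fixed four- or eight-byte integer.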
Sections:
Idx Name             Size      VMA       LMA       File off  Algn
  0 .text            0000007b  00000000  00000000  00000034  2**2
                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data            00000000  00000000  00000000  000000b0  2**2
                     CONTENTS, ALLOC, LOAD, DATA
  2 .bss             00000000  00000000  00000000  000000b0  2**2
                     ALLOC
  3 .debug_abbrev    00000073  00000000  00000000  000000b0  2**0
                     CONTENTS, READONLY, DEBUGGING
  4 .debug_info      00000118  00000000  00000000  00000123  2**0
                     CONTENTS, RELOC, READONLY, DEBUGGING
  5 .debug_line      00000080  00000000  00000000  0000023b  2**0
                     CONTENTS, RELOC, READONLY, DEBUGGING
  6 .debug_frame     00000034  00000000  00000000  000002bc  2**2
                     CONTENTS, RELOC, READONLY, DEBUGGING
  7 .debug_loc       0000002c  00000000  00000000  000002f0  2**0
                     CONTENTS, READONLY, DEBUGGING
  8 .debug_pubnames  0000001e  00000000  00000000  0000031c  2**0
                     CONTENTS, RELOC, READONLY, DEBUGGING
  9 .debug_aranges   00000020  00000000  00000000  0000033a  2**0
                     CONTENTS, RELOC, READONLY, DEBUGGING
 10 .comment         0000002a  00000000  00000000  0000035a  2**0
                     CONTENTS, READONLY
 11 .note.GNU-stack  00000000  00000000  00000000  00000384  2**0
                     CONTENTS, READONLY
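To see how the debugging sections compare in size with the code, the Size column of a listing like the one above can be totalled. The sketch below is illustrative only: the helper name section_sizes and the regular expression are assumptions of this example, and it relies on the listing keeping the "index, name, hexadecimal size" layout shown here.

```python
import re

# A section header line starts with an index, then the name, then the hex size.
# Flag lines ("CONTENTS, ALLOC, ...") and the column titles do not match.
SECTION_LINE = re.compile(r"^\s*\d+\s+(\S+)\s+([0-9a-fA-F]+)\s")


def section_sizes(listing):
    """Return {section name: size in bytes} parsed from a section listing."""
    sizes = {}
    for line in listing.splitlines():
        match = SECTION_LINE.match(line)
        if match:
            sizes[match.group(1)] = int(match.group(2), 16)
    return sizes


# The listing above, copied as-is.
LISTING = """\
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000007b 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 00000000 00000000 000000b0 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 000000b0 2**2
ALLOC
3 .debug_abbrev 00000073 00000000 00000000 000000b0 2**0
CONTENTS, READONLY, DEBUGGING
4 .debug_info 00000118 00000000 00000000 00000123 2**0
CONTENTS, RELOC, READONLY, DEBUGGING
5 .debug_line 00000080 00000000 00000000 0000023b 2**0
CONTENTS, RELOC, READONLY, DEBUGGING
6 .debug_frame 00000034 00000000 00000000 000002bc 2**2
CONTENTS, RELOC, READONLY, DEBUGGING
7 .debug_loc 0000002c 00000000 00000000 000002f0 2**0
CONTENTS, READONLY, DEBUGGING
8 .debug_pubnames 0000001e 00000000 00000000 0000031c 2**0
CONTENTS, RELOC, READONLY, DEBUGGING
9 .debug_aranges 00000020 00000000 00000000 0000033a 2**0
CONTENTS, RELOC, READONLY, DEBUGGING
10 .comment 0000002a 00000000 00000000 0000035a 2**0
CONTENTS, READONLY
11 .note.GNU-stack 00000000 00000000 00000000 00000384 2**0
CONTENTS, READONLY
"""

sizes = section_sizes(LISTING)
debug_bytes = sum(size for name, size in sizes.items()
                  if name.startswith(".debug_"))
print("code (.text):", sizes[".text"], "bytes")
print("debug (.debug_*):", debug_bytes, "bytes")
```

For this listing the total comes to 681 bytes of .debug_ data against 123 bytes of .text, in line with the earlier remark that the debugging data is frequently much larger than the executable code.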
The DWARF listing for all but the smallest programs is quite voluminous, so it would be a good idea to direct readelf’s output to a file
and then browse the file with less or an editor such as vi.