13 Programming Model
Although the PowerPC processor does not enforce any particular programming
model, a variety of programming conventions that IBM devised for the POWER
architecture have become the standard for the PowerPC. These conventions are
part of the PowerOpen Application Binary Interface (ABI) and Application
Programming Interface (API), which formalize the standard so that compliant
systems are binary and source-code compatible. These same conventions are also
used on Apple's PowerPC-based Macintosh computers.
Programming Model 25 1
§13.1 Register Usage Conventions
GPR Usage
The PowerPC has 32 GPRs, which may be either 32 or 64 bits (depending on the
implementation). Table 13-1 summarizes the standard register usage conventions.
Table 13-1 GPR Register Usage Conventions
Two of the GPRs (r1 and r2) are dedicated to OS-related tasks, three
(r0, r11, and r12) are used by compiler, linkage, or glue routines, and eight
more (r3 through r10) are allocated for passing parameters into a function (see
§13.5, "Subroutine Calling Conventions," for more information). This leaves 19
GPRs (r13 through r31), which are available for general use but must be
saved and restored if used.
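As a quick cross-check of the counts above (two dedicated, three linkage, eight parameter, and 19 non-volatile registers), the following C sketch classifies a GPR number under these conventions. The enum and function names are hypothetical and were chosen only for illustration.

```c
/* Hypothetical names; the roles follow the GPR conventions described above. */
enum gpr_role {
    GPR_SCRATCH,        /* r0, r11, r12: compiler, linkage, or glue code */
    GPR_STACK_POINTER,  /* r1 */
    GPR_TOC,            /* r2 */
    GPR_PARAMETER,      /* r3-r10: parameter passing (volatile) */
    GPR_NONVOLATILE,    /* r13-r31: must be saved and restored if used */
    GPR_INVALID
};

enum gpr_role classify_gpr(int r)
{
    if (r < 0 || r > 31)   return GPR_INVALID;
    if (r == 1)            return GPR_STACK_POINTER;
    if (r == 2)            return GPR_TOC;
    if (r >= 3 && r <= 10) return GPR_PARAMETER;
    if (r >= 13)           return GPR_NONVOLATILE;
    return GPR_SCRATCH;    /* r0, r11, r12 */
}

/* Volatile: a routine may use the register without saving it first. */
int gpr_is_volatile(int r)
{
    enum gpr_role role = classify_gpr(r);
    return role == GPR_SCRATCH || role == GPR_PARAMETER;
}
```

Counting the GPR_NONVOLATILE results for r0 through r31 gives 19, which matches the text.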
By convention, a routine should use the volatile registers first because they do
not need to be saved and restored. Thus, a routine should first use GPR0 and any
of the registers GPR3 through GPR12 that are not already being used by the
routine for parameters.
If a routine needs to use still more registers, the non-volatile GPRs should be
used from highest numbered to lowest numbered. That is, GPR31 is used first,
followed by GPR30, and so on. Using the non-volatile registers in this fashion
allows the stmw and lmw instructions to be used to save and restore the registers
in the function prolog/epilog code. However, some important issues are
involved with using the load and store multiple instructions. These issues are
discussed later in §13.8, "Saving Registers on the Stack."
252 Chapter 13
FPR Usage
The PowerPC defines thirty-two 64-bit floating-point registers. Of these
registers, one (fr0) is set aside as a scratch register, 13 (fr1 through fr13)
are used for passing parameters to subroutines, and the remaining 18 (fr14
through fr31) are available for general use. Table 13-2 summarizes the
floating-point register usage conventions.
Table 13-2 FPR Register Usage Conventions
As with the GPRs, a routine that needs to use floating-point registers should first
use the volatile registers: fr0 and any of the registers fr1 through fr13 that are
not being used to hold parameters. The remaining 18 FPRs can be used if there
are not enough volatile registers to hold the required values.
The non-volatile registers should be used from the highest to the lowest (that is,
from fr31 down to fr14) so that the FPR and GPR register save methods
follow the same basic conventions. (See §13.8, "Saving Registers on the Stack,"
later in this chapter.)
SPR Usage
Table 13-3 summarizes the standard usage conventions for the common SPRs
available on PowerPC implementations. In general, the system registers do not
need to be preserved across function calls. The only exceptions are that some of
the fields within the CR must always be preserved, and the FPSCR should be
preserved under certain circumstances.
SPR    Field  Type          Must be preserved?  Usage
CR     0      volatile      no                  general use; implicitly used by integer instructions with the Record bit set
       1      volatile      no                  general use; implicitly used by floating-point instructions with the Record bit set
       2-4    non-volatile  yes                 general use; must be preserved
       5-7    volatile      no                  general use
LR            volatile      no                  branch target address; subroutine return address
CTR           volatile      no                  loop counter; branch address (goto, case, system glue)
XER           volatile      no                  fixed-point exceptions
FPSCR         volatile      no                  floating-point exceptions
MQ            volatile      no                  obsolete; exists on the 601 only
Note that even though the FPSCR is listed as a volatile register that doesn't need
to be preserved, it is considered rude for a routine to change the floating-point
exception enable bits in the FPSCR without restoring them to their original state.
The only exceptions are routines defined to modify the floating-point execution
state.
It is unnecessary to restore the FPSCR if it was only used to record the exception
information as set by the standard arithmetic floating-point instructions.
[Figure: stack frames between low addresses (top) and high addresses (bottom); each stack frame contains a pointer to the stack frame of its caller]
It's nice to be able to assume that the stack pointer always points to a valid stack
frame, because then interrupt-level code doesn't need to worry about the stack being
in an inconsistent state. However, it does require a little effort on the part
of the program to ensure that the stack pointer does, indeed, always identify a
valid stack frame.
Basically, this effort amounts to making sure that the stack pointer update
operation is atomic (that is, it is accomplished in one instruction). This means that the
stack frame cannot be built in a series of small steps that each allocate a
small area on the stack; the entire stack frame must be allocated at once,
and the offsets to each area within the frame must be calculated. These
calculations can become quite cumbersome because a stack frame can contain multiple
variable-sized areas. Efforts to simplify these calculations have led to some rather
bizarre stack frame building conventions, which are covered later in "Building
Stack Frames."
1. Actually, the C library routine alloca(), which dynamically allocates storage on the stack,
also updates the stack pointer. This special case is discussed later in §13.12, "Stack Frames
and alloca()."
Stack Pointer Maintenance on Function Entry
When a function is entered, it needs to create a new stack frame above the
current one on the stack and update r1 to point to the new frame, saving the
address of the previous stack frame in the process. When a function exits, it
simply needs to restore the previous stack frame.
Figure 13-2 shows this for the simple case of the routine foo() calling the
routine bar(). At point (A), foo() has initialized itself but has not yet called
another routine. During (B), foo() has called bar(), and bar() has set itself
up with its own stack frame. A pointer back to foo()'s stack frame is recorded
as part of bar()'s initialization process. After the call to bar() is complete (C),
the stack pointer once again points to foo()'s stack frame.
Figure 13-2 Stack When foo() Calls bar()
The three states shown in Figure 13-2 are the only states that the stack pointer
has during the subroutine call. There are no intermediate states where bar()'s
stack frame is only partially built.
Building Stack Frames
Which of the two methods is more efficient depends on the particular routine,
since the three-instruction sequence provides more opportunities for
scheduling the instructions.
[Figure: bar's stack frame (under construction) above foo's stack frame]
Before the frame is built, r1 points to the stack frame of the previous routine,
which is conveniently located immediately below where the new stack frame is
going to be built. The new stack frame areas can be initialized at this point by
using a negative offset from r1 to write values above the top of the stack.
Writing Above the Stack Pointer: Ick!
The calling conventions for most well-designed systems involve initializing the
stack frame values after the frame has been built and consider it a bad idea to
write values above the current stack pointer. There's a very good reason for this:
if an interrupt comes along, it may need to allocate some temporary storage on
the stack for itself. Although it will free the space and return the stack pointer to
its original value, any data that was above the stack pointer is likely to be
trashed.
The PowerPC calling conventions are no exception: it is still considered bad
form to write data above the stack pointer, with the one exception of building
stack frames. Two things make this exception acceptable. First, the only values
written above the stack are the GPR and FPR save areas, which are guaranteed to be no
larger than a certain maximum size (because only a certain number of registers
will ever need to be saved). Second, interrupts and other system-level code are
aware of this maximum size and skip over that many bytes before they allocate
any space on the stack.
Now, this may seem like too much effort just to write values safely above the
stack, and, to a certain extent, it is. The GPR and FPR save areas could have been
written using offsets from the new stack frame after it had been built. However,
because many areas in the stack frame are of variable width and need to be
properly aligned, the formula to determine the offset to these two areas from the
frame pointer is relatively complicated.
So, it's basically a choice between complicating the interrupt handlers (which
very few people write) or complicating the formula for calculating the offsets to
these areas (which would affect more people). It's important to note that neither
option affects performance. The interrupt handlers simply add the maximum
save area size to the amount of space that they're allocating on the stack, and the
"more complex offset formula" is computed statically at assembly time and doesn't
generate any extra code.
The end result is that it doesn't really matter. The standard calling convention
involves writing above the stack, and the system-level code is designed to
handle this. If writing above the stack offends you as a programmer, you can simply
not do it. It is perfectly acceptable to save the GPR and FPR values after the stack
frame has been built, because the only code that uses them is the function that
owns the stack frame (no one else can use them, since the number of registers
saved isn't even recorded unless debugging information, such as a traceback table,
is present).
It's interesting to note that because the size of the GPR and FPR save areas
depends on the size and number of registers being saved, the maximum save
area size will change for 64-bit PowerPC implementations. In order to make
room for the nineteen 64-bit GPRs and eighteen 64-bit FPRs, 296 bytes (instead
of the 220 required for nineteen 32-bit GPRs and eighteen 64-bit FPRs) will need
to be reserved above the stack.
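The following C sketch shows the arithmetic behind the 220-byte and 296-byte figures. It also shows the adjustment an interrupt handler makes before it allocates its own space. The helper names are hypothetical. The 16-byte rounding reflects the quadword stack alignment that this chapter requires of stack frames; an actual handler's policy may differ.

```c
#include <stdint.h>

#define NONVOLATILE_GPRS 19   /* r13-r31 */
#define NONVOLATILE_FPRS 18   /* fr14-fr31, always 64 bits wide */

/* Largest possible GPR + FPR save area for a given GPR width in bytes. */
unsigned max_save_area(unsigned gpr_bytes)
{
    return NONVOLATILE_GPRS * gpr_bytes + NONVOLATILE_FPRS * 8;
}

/* Hypothetical helper: base address for an interrupt handler's n-byte
   scratch block that skips the whole save area a routine may have
   written above (at lower addresses than) the stack pointer, rounded
   down to a 16-byte boundary. */
uint32_t interrupt_scratch_base(uint32_t sp, uint32_t n, unsigned gpr_bytes)
{
    uint32_t base = sp - max_save_area(gpr_bytes) - n;
    return base & ~(uint32_t)15;
}
```

For example, max_save_area(4) gives 76 + 144 = 220 bytes and max_save_area(8) gives 152 + 144 = 296 bytes.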
§ 13.6 A Simple Subroutine Call
Function Prologs
When a function is called, it must set up the stack properly so that it creates room
for all local and register-save storage, and so that it sets up a proper back chain.
The portion of a function that does this is known as the function prolog.
Function Epilogs
The function epilog undoes the work of the prolog. It restores the registers that
were saved and makes sure that the stack pointer once again points to the stack
frame of the routine that' originally called this routine.
Prolog for bar()
[Figure: stack frame areas: Link Area, Argument Area, GPR Save, FPR Save]
The argument area is where the arguments are placed if there isn't enough
room to store all of them in the registers. This area is always at least eight
words in size and is left unused most of the time (because the arguments
are stored in registers if possible). The arguments stored here are the
arguments that foo() (in this example) is sending to another routine
(bar()). These are not the arguments that were passed to foo().
The local storage area is where foo() stores whatever it likes. Its size is
determined by foo() when it creates the stack frame.
The register save area is where foo() stores the original contents of any of
the non-volatile GPRs or FPRs that it needs to use. This way it can restore
them to their original values before returning. If no non-volatile registers
are used by foo(), then this area will be zero bytes in size.
Figure 13-5 The Stack after bar() Has Built Its Stack Frame
Saving the Non-volatile Registers
Figure 13-6 Saving the GPRs and FPRs above the Stack Pointer
Because of the register usage convention of starting from register 31 and
working down, each routine will have a contiguous range of registers (rN through
r31) that need to be saved. This simplifies the save/restore code and allows the
Load and Store Multiple instructions to be used for this purpose. In reality, the
process of saving and restoring registers ends up being a little more
complicated, but these complications don't affect the fact that the registers are
stored in the register save area in ascending order. Register saving is treated in
more detail later in §13.8, "Saving Registers on the Stack."
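The following C sketch shows the bookkeeping. If a routine needs n non-volatile GPRs, allocating them from r31 downward means it saves rN through r31, where N = 32 - n. A single stmw rN can store exactly this range. The helper names are hypothetical, and the sizes assume 4-byte GPRs and 8-byte FPRs.

```c
/* Hypothetical helpers. n is the number of non-volatile registers used:
   0..19 for GPRs, 0..18 for FPRs. */
int first_saved_reg(int n)
{
    return 32 - n;               /* n == 0 yields 32: nothing to save */
}

unsigned gpr_save_size(int n) { return 4u * (unsigned)n; }   /* 32-bit GPRs */
unsigned fpr_save_size(int n) { return 8u * (unsigned)n; }   /* 64-bit FPRs */

/* Byte offset of GPR r from the start (lowest address) of its save area,
   given that registers are stored in ascending order. */
unsigned gpr_slot(int r, int n)
{
    return 4u * (unsigned)(r - first_saved_reg(n));
}
```

With n = 6, the routine saves r26 through r31. r26 occupies the lowest word of the 24-byte area, and r31 sits at offset 20.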
For the CR and LR, storage space is set aside in foo()'s stack frame to save
these values. The CR is saved at offset 4 from the start of foo()'s stack frame,
and the LR is stored at offset 8. Figure 13-7 shows this operation. In this figure,
the highlighted area in foo()'s stack frame is the area in the Link Area where
the register is being stored.
The code to save the CR and LR is:
# save the Link Register
mflr r0
stw r0,8(r1)
# save the Condition Register
mfcr r0
stw r0,4(r1)
One additional register, the FPSCR, is a special case because it generally doesn't
need to be saved and restored, but there are situations where it should be. If any
of the enable (VE, OE, UE, ZE, or XE) or mode (NI or RN) bits of the FPSCR are
changed, then the routine should save and restore the FPSCR, because it is
impolite for a routine to globally change the floating-point model. The other bits of
the FPSCR, which may be set as a side effect of executing floating-point
instructions, are volatile, and the FPSCR does not need to be saved if only they are modified.
No special storage location is set aside for the FPSCR. If a routine needs to save
and restore it, the routine must allocate space in its local storage area.
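The enable and mode bits named above occupy the low-order byte of the 32-bit FPSCR (IBM bit numbers 24 through 31). The following C sketch shows the resulting test. A routine that changes any of these bits should restore the caller's FPSCR, but changes to the exception status bits do not require a restore. The macro and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR control fields, with IBM bit numbers (bit 0 = most significant)
   converted to C masks. */
#define FPSCR_VE 0x00000080u   /* bit 24: invalid-operation exception enable */
#define FPSCR_OE 0x00000040u   /* bit 25: overflow exception enable */
#define FPSCR_UE 0x00000020u   /* bit 26: underflow exception enable */
#define FPSCR_ZE 0x00000010u   /* bit 27: zero-divide exception enable */
#define FPSCR_XE 0x00000008u   /* bit 28: inexact exception enable */
#define FPSCR_NI 0x00000004u   /* bit 29: non-IEEE mode */
#define FPSCR_RN 0x00000003u   /* bits 30-31: rounding mode */

#define FPSCR_CONTROL (FPSCR_VE | FPSCR_OE | FPSCR_UE | FPSCR_ZE | \
                       FPSCR_XE | FPSCR_NI | FPSCR_RN)

/* True if any enable or mode bit differs between entry and exit, meaning
   the routine should restore the caller's FPSCR before returning. */
bool fpscr_needs_restore(uint32_t on_entry, uint32_t on_exit)
{
    return ((on_entry ^ on_exit) & FPSCR_CONTROL) != 0;
}
```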
into bar(). This area is for the arguments that bar() will (possibly) pass to
another routine. This is always at least eight words (32 bytes) in length.
local_size is the size of the local storage area that bar() needs, or 0 if it
doesn't need any local storage.
gpr_size is large enough to hold all of the GPRs that bar() is saving and
restoring. This can range from 0 to 19 words (76 bytes).
fpr_size is large enough to hold all of the FPRs that bar() is saving and
restoring. This can range from 0 to 18 doublewords (144 bytes).
padding is the number of extra bytes needed to ensure that the stack pointer
is always quadword (16-byte) aligned.
Because the link, argument, and local storage areas are allocated from the top of the
stack frame, and the FPR and GPR save areas are allocated from the bottom, the
padding bytes fall between the local storage area and the GPR save area.
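Putting the pieces together, the frame size is the sum of the areas rounded up to a multiple of 16. The padding is whatever that rounding adds. The following C sketch assumes a 32-bit implementation with a 24-byte Link Area, which is the size given for the Link Area later in this chapter. The struct and function names are hypothetical.

```c
/* Hypothetical frame-size calculation for a 32-bit implementation.
   All sizes are in bytes; offsets are from the new r1. */
struct frame_layout {
    unsigned link, args, local, padding, gpr, fpr, total;
    unsigned gpr_off, fpr_off;   /* start of the GPR and FPR save areas */
};

struct frame_layout compute_frame(unsigned arg_size, unsigned local_size,
                                  unsigned ngpr, unsigned nfpr)
{
    struct frame_layout f;
    f.link  = 24;                              /* Link Area */
    f.args  = arg_size < 32 ? 32 : arg_size;   /* at least eight words */
    f.local = local_size;
    f.gpr   = 4 * ngpr;                        /* 0..19 words */
    f.fpr   = 8 * nfpr;                        /* 0..18 doublewords */
    unsigned used = f.link + f.args + f.local + f.gpr + f.fpr;
    f.total   = (used + 15) & ~15u;            /* keep r1 quadword aligned */
    f.padding = f.total - used;                /* between locals and GPR save */
    f.gpr_off = f.total - f.fpr - f.gpr;       /* save areas sit at the bottom */
    f.fpr_off = f.total - f.fpr;
    return f;
}
```

For example, a routine with a 32-byte argument area, 20 bytes of locals, three GPRs, and two FPRs uses 104 bytes. The sketch rounds this up to a 112-byte frame with 8 bytes of padding.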
It is important to note that this one Store with Update instruction performs two
critical functions: it allocates the new stack frame on the stack, and it saves the
back chain (pointer to the previous stack frame) at offset 0 into the newly created
stack frame.
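The following C toy model shows why a single stwu keeps the stack consistent. One call both writes the back chain and moves r1, so r1 never points at a partly built frame. The names are hypothetical, a byte array stands in for memory, and word stores are big-endian as on the PowerPC. The epilog here follows the back chain, which is one way to restore the previous frame.

```c
#include <stdint.h>

#define MEM_SIZE 4096
static uint8_t mem[MEM_SIZE];          /* toy memory; addresses are indices */

static void store_word(uint32_t addr, uint32_t v)    /* big-endian store */
{
    mem[addr]     = (uint8_t)(v >> 24);
    mem[addr + 1] = (uint8_t)(v >> 16);
    mem[addr + 2] = (uint8_t)(v >> 8);
    mem[addr + 3] = (uint8_t)v;
}

static uint32_t load_word(uint32_t addr)             /* big-endian load */
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | mem[addr + 3];
}

/* Models stwu r1,-size(r1): store the old r1 (the back chain) at offset 0
   of the new frame and point r1 at the new frame, as one operation. */
static void stwu_r1(uint32_t *r1, uint32_t size)
{
    uint32_t ea = *r1 - size;
    store_word(ea, *r1);
    *r1 = ea;
}

/* Models lwz r1,0(r1): return to the previous frame via the back chain. */
static void pop_frame(uint32_t *r1)
{
    *r1 = load_word(*r1);
}
```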
Execution of bar()
Finally, bar() can execute its code and accomplish the tasks that it needs to,
including calling other routines.
Epilog of bar()
During the epilog, bar() must restore the registers, restore foo()'s stack
frame, and then return control to foo().
# restore the Link Register
lwz r0,frame_size+8(r1)
mtlr r0
# restore the Condition Register
lwz r0,frame_size+4(r1)
mtcr r0
Technically only the CR fields that have changed need to be restored, but some
PowerPC implementations may execute a complete CR restore instruction sig-
nificantly faster than they would execute a partial CR restore.
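A partial restore uses mtcrf. In its 8-bit field mask, the most significant bit selects CR0 and the least significant bit selects CR7. The following C sketch derives the mask for the non-volatile fields, CR2 through CR4, as listed in Table 13-3. The function names are hypothetical.

```c
#include <stdbool.h>

/* Per Table 13-3, only CR fields 2, 3, and 4 must be preserved. */
bool cr_field_is_nonvolatile(int field)
{
    return field >= 2 && field <= 4;
}

/* Build an mtcrf field mask (FXM) that selects only the non-volatile
   fields: bit 0x80 selects CR0, bit 0x01 selects CR7. */
unsigned cr_restore_mask(void)
{
    unsigned mask = 0;
    for (int f = 0; f < 8; f++)
        if (cr_field_is_nonvolatile(f))
            mask |= 0x80u >> f;
    return mask;
}
```

The resulting mask is 0x38. Restoring with mtcrf 0x38,r0 would touch only CR2 through CR4, while mtcr r0 restores all eight fields.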
Returning Control to foo()
Because the LR has been restored, it now holds the return address in foo().
This means that control can be returned to foo() by simply executing a Branch
to Link Register instruction:
blr
Return to foo()
At this point, control has been returned to foo(), and the stack and all of the
non-volatile registers have been restored. Execution in foo() continues.
space on the stack. If bar() can fit its register save area and its local
storage area into 220 bytes, then it can get away without having a
stack frame.
If a routine doesn't require a stack frame, then there's no sense in creating one.
This section steps through the subroutine call, assuming that bar() does not
need a stack frame.
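The condition can be written as a small predicate. This C sketch is for a 32-bit implementation and assumes 4-byte GPR save slots and 8-byte FPR save slots. The function name is hypothetical. It also folds in the rule, stated below, that a routine without a frame may not call other routines.

```c
#include <stdbool.h>

#define MAX_ABOVE_STACK 220   /* 19 x 4-byte GPRs + 18 x 8-byte FPRs */

/* Hypothetical predicate: a routine can skip building a stack frame only
   if it calls no other routine and its locals plus saved registers fit in
   the space reserved above the stack pointer. */
bool needs_stack_frame(bool calls_other_routines, unsigned local_size,
                       unsigned ngpr, unsigned nfpr)
{
    if (calls_other_routines)
        return true;
    unsigned bytes = local_size + 4 * ngpr + 8 * nfpr;
    return bytes > MAX_ABOVE_STACK;
}
```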
Prolog of bar()
The only task that the prolog needs to perform is saving the non-volatile
registers that bar() needs to use. This is done above the stack pointer (where
bar()'s stack frame would be if it were to build one). §13.8, "Saving Registers
on the Stack," discusses in detail how this is done.
The CR and LR should also be saved. Because space has been allocated for them
in foo()'s stack frame, they can be saved using the same code given in the
previous section:
# save the Link Register
mflr r0
stw r0,8(r1)
# save the Condition Register
mfcr r0
stw r0,4(r1)
Execution of bar()
As in the previous description, the execution of bar() continues as it normally
would. The only differences are that bar() is not allowed to call any other
routines, and any of bar()'s local variables must be accessed using a negative
offset from r1 (that is, they live above the top of the stack).
Epilog of bar()
Because there is no stack frame to restore, the epilog code just restores the
non-volatile registers that were used by bar() and then returns control to foo().
For the GPRs and FPRs, the original values are pulled from above the stack and
stuffed into the appropriate registers. See §13.8, "Saving Registers on the Stack,"
for a full discussion of how this should be done.
For the CR and LR, the original values were stored at offsets 4 and 8 of foo()'s
stack frame. This is the same code used in the previous section, with the
simplification that frame_size is known to be 0.
# restore the Link Register
lwz r0,8(r1)
mtlr r0
# restore the Condition Register
lwz r0,4(r1)
mtcr r0
mean that a series of loads or stores will always be faster than the analogous Load
or Store Multiple instruction. On processors with unified caches (like the 601), the
instruction fetches from the series of loads/stores can collide with the data cache
accesses and thus be less efficient than the Load or Store Multiple.
The second reason is that equivalent instructions do not exist for the FPRs, and
there are no planned instructions to support 64-bit GPRs. This means that the
only time these instructions can be used is with the GPRs on 32-bit PowerPC
implementations, and they might be horribly inefficient.
So, because a mechanism is needed for the FPRs anyway, it might as well be
generalized to handle the GPRs.
[Figure: GPR Save Area written above foo's stack frame while bar's stack frame is under construction]
There are three common ways of accomplishing this. The first is to use the Store
Multiple instruction, which has already been presented as a potentially bad idea.
However, it doesn't hurt to show how it would be done. The second and third
methods both involve using a series of Store instructions. The second method
has these instructions inline, and the third branches to a system routine that per-
forms the appropriate register saves.
§13.8 Saving Registers on the Stack
As with the GPR case, the LR must be saved before the save routine is called.
Figure 13-9 The GPR Save Area Is Immediately above the FPR Save Area
[Figure: new frame under construction, with the GPR Save area immediately above the FPR Save area; r12 and r1 (old frame) are marked]
If the standard GPR save routines are rewritten to use r12 instead of r1, then
this code can be used to save six GPRs and five FPRs:
mflr r0
subi r12,r1,8*5
bla _savegpr26
bla _savefpr27
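With five FPRs saved first, r12 ends up 40 bytes above r1, and the six GPRs sit immediately above that. The following C sketch computes each register's offset from r1 under that layout. The offsets are negative because "above the stack" means lower addresses. The function names are hypothetical, and the layout follows Figure 13-9.

```c
/* Hypothetical helpers: offsets from r1 when nfpr FPRs (frN..fr31) are
   saved directly above the stack pointer and ngpr GPRs (rN..r31) are
   saved directly above those, each area in ascending register order. */
int r12_offset_from_r1(int nfpr)
{
    return -8 * nfpr;                      /* subi r12,r1,8*nfpr */
}

int fpr_offset_from_r1(int fr, int nfpr)
{
    int first = 32 - nfpr;
    return r12_offset_from_r1(nfpr) + 8 * (fr - first);
}

int gpr_offset_from_r1(int r, int ngpr, int nfpr)
{
    int first = 32 - ngpr;
    return r12_offset_from_r1(nfpr) - 4 * ngpr + 4 * (r - first);
}
```

For six GPRs and five FPRs, fr27 through fr31 occupy offsets -40 through -8 from r1, and r26 through r31 occupy offsets -64 through -44.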
Multiple GPR Save/Restore Routines
Additional Functionality
Because these routines are always part of a function's prolog or epilog, it makes
sense to include some other prolog/epilog tasks in their code.
Additional Functionality for Save Routines
For the save routines, it's easy to add the store of the original LR into the caller's
stack frame because it is something that the function prolog needs to do anyway.
If the LR is saved in r0 before calling the save routine (as it should be), the LR
can be saved in the proper place using:
stw r0,8(r1)
For the GPR save routine, this instruction only needs to be added to the '0'
variant because the '1' variant is always used with the FPR save routine, which
will presumably handle the LR save.
The FPR save routine that includes an LR save is differentiated from the
standard FPR save routine by adding an underscore between the 'fpr' and the
register number. Thus, _savefpr_29 would be used instead of _savefpr29.
Additional Functionality for Restore Routines
The restore routines can restore the saved LR value and return directly to the
address stored there, eliminating the need to return to the function performing
the restore. This is done by adding code to the end of the restore routine:
lwz r0,8(rl)
mtlr r0
blr
When this version of a restore routine is used, it isn't necessary to use the Branch
with Link form to call the restore routine (no damage will occur if it is used). Only
the non-Link branch form is needed because the restore routine will return
directly to the caller.
Rewriting the example that restores six GPRs and five FPRs, the restore routines
would be called as:
subi r12,r1,8*5
bla _restgpr1_26
ba _restfpr_27
As with the save routines, this extra functionality only needs to be added to the
'0' variant of the GPR restore routines. The FPR restore routine that restores the LR
adds an underscore to the name just like the FPR save routine does. Thus,
_restfpr_27 is used in the above example instead of _restfpr27.
Link Area
The Link Area is a 24-byte area that is used by both foo() (the stack frame
owner) and bar(). Table 13-4 lists the area's fields.
Table 13-4 Link Area fields
The first word (at offset 0) contains a pointer to the stack frame of the routine
that called foo(), in this case sna(). This field is initialized by foo() as part
of its function prolog and is used by foo()'s epilog to restore the original stack
frame when the routine is complete.
The next two fields are used by routines that foo() calls, in this case bar(), so
that they have a place to store the LR and CR. These values are stored here
because it is possible that bar() will not need to create a stack frame, and it is
convenient to always save these register values in the same place.
The next two fields are reserved for use by compilers and binders, but they are
generally left unused.
The last field is a storage space set aside for bar() to save its TOC pointer when
it makes out-of-module function calls. Like the LR and CR storage areas, this
area is part of foo()'s stack frame so that all routines have a common place to
store their TOC value, whether or not they create a stack frame.
Because the Saved CR, LR, and TOC fields are presumed to be available to any
routine that foo() calls, foo() must create a stack frame if it calls any other
routine.
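The field layout described above can be captured as a C struct. This sketch uses fixed-width 32-bit fields so that the offsets match a 32-bit implementation on any host. The field names are hypothetical, and C11 _Static_assert checks the documented offsets at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 24-byte Link Area at the start of foo()'s frame. */
struct link_area {
    uint32_t back_chain;    /*  0: sna()'s stack frame, set by foo()'s prolog */
    uint32_t saved_cr;      /*  4: CR saved here by a routine foo() calls */
    uint32_t saved_lr;      /*  8: LR saved here by a routine foo() calls */
    uint32_t reserved[2];   /* 12, 16: reserved for compilers and binders */
    uint32_t saved_toc;     /* 20: callee's TOC for out-of-module calls */
};

_Static_assert(offsetof(struct link_area, saved_cr) == 4, "CR at offset 4");
_Static_assert(offsetof(struct link_area, saved_lr) == 8, "LR at offset 8");
_Static_assert(sizeof(struct link_area) == 24, "Link Area is 24 bytes");
```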
Argument Area
The Argument Area is a storage place that foo() can use to hold arguments that
are being passed to bar(). Table 13-5 lists the area's fields. Note that this is not
where the arguments to foo() are stored. The arguments being passed to
foo() are stored in the Argument Area owned by sna().
Because the Argument Area is used to hold the arguments that foo() is passing
to bar(), it must be large enough to hold all the arguments that bar() expects.
If foo() calls multiple routines, then this area must be large enough to hold all
the arguments for the routine that requires the most argument space.
One restriction on the size of the Argument Area is that it is always at least eight
words in size. The first eight words of arguments would be placed in these eight
words if they weren't placed in GPRs 3 through 10. These eight words are
always allocated because it may be necessary for bar ( ) to take the address of
one of the arguments. In this case, the value would be written from the GPR into
the analogous slot in the Argument Area and the address into the Argument
Area would be used.
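The sizing rule can be sketched in C as follows. Argument counts are in 4-byte words, and the function name is hypothetical.

```c
/* Hypothetical helper: size in bytes of foo()'s Argument Area, given the
   argument space (in words) needed by each routine that foo() calls. */
unsigned argument_area_bytes(const unsigned *callee_arg_words, int ncallees)
{
    unsigned words = 8;                    /* minimum: one word per r3-r10 */
    for (int i = 0; i < ncallees; i++)
        if (callee_arg_words[i] > words)
            words = callee_arg_words[i];   /* the largest callee wins */
    return 4 * words;
}
```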
§13.10 Passing Arguments to Routines
Arg     1     2     3      4
Type    int   char  short  long
Offset  (0)   (4)   (8)    (12)
GPR     3     4     5      6
FPR     -     -     -      -
Notes: arg 2 (char) is in the low-order byte of its register; arg 3 (short) is in the low-order halfword of its register.
Because this table format will be used throughout this section to describe the
argument-passing conventions, it is worthwhile to spend a few paragraphs to
describe how the data is arranged in the table.
First of all, each argument is described in a separate column. Since there are four
arguments, there are four columns numbered 1 through 4. This numbering is
useful because it is convenient to refer to arguments by number instead of name
or type.
For each argument, the argument type, an offset, and a GPR/FPR allocation will
be given. In addition, notes at the bottom sometimes describe some special
feature of the argument.
The argument type is simply the type originally defined for the variable,
typically something like int, char, or long.
The offset for each argument is the offset into the Argument Area that is being
used to hold the arguments. In many cases, the arguments are passed in registers
and the space left in the Argument Area is unused. In these cases, the offset will
be displayed in parentheses.
The last two rows are the GPR or FPR assignment for this argument. For GPRs,
this ranges between 3 and 10, and for FPRs it ranges from 1 to 13. Not all
arguments have register assignments, and it is possible for an argument to be
assigned to both a GPR and an FPR.
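The allocation rules can be modeled in a few lines of C. This is a simplified sketch for a 32-bit implementation under the following assumptions:
- every argument gets word-aligned space in the Argument Area;
- the words at offsets 0 through 28 shadow r3 through r10;
- float and double arguments also take the next FPR from fr1 through fr13.

The sketch does not model aggregates or argument promotion, and it treats a float as a single word, as the tables in this section do. The type and function names are hypothetical.

```c
/* Hypothetical, simplified model of argument allocation. char, short,
   int, long, and pointers are all ARG_INT here. */
enum arg_kind { ARG_INT, ARG_FLOAT, ARG_DOUBLE };

struct arg_slot {
    unsigned offset;   /* byte offset into the Argument Area */
    int gpr_first;     /* first GPR covering this slot, or 0 if none */
    int gpr_last;      /* last GPR covering this slot, or 0 if none */
    int fpr;           /* FPR assigned, or 0 if none */
};

void assign_args(const enum arg_kind *kinds, int n, struct arg_slot *out)
{
    unsigned offset = 0;
    int next_fpr = 1;
    for (int i = 0; i < n; i++) {
        unsigned size = (kinds[i] == ARG_DOUBLE) ? 8 : 4;
        struct arg_slot s = { offset, 0, 0, 0 };
        /* Each word of the slot maps to r3 + offset/4 while that is <= r10. */
        for (unsigned w = offset; w < offset + size; w += 4) {
            int r = 3 + (int)(w / 4);
            if (r <= 10) {
                if (s.gpr_first == 0)
                    s.gpr_first = r;
                s.gpr_last = r;
            }
        }
        if (kinds[i] != ARG_INT && next_fpr <= 13)
            s.fpr = next_fpr++;
        out[i] = s;
        offset += size;
    }
}
```

For the int, char, short, long example above, this yields offsets 0, 4, 8, and 12 in r3 through r6, with no FPRs assigned.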
Table 13-8 Arguments for void f3(int a, double b, char c, float d, int e)
Another interesting aspect of this example is that even though the GPRs have all
been expended and the Argument Area has been partially used, the FPRs can
continue to be allocated as long as they are available. In this case, the floating-
point values must still be passed in the Argument Area, but they may also be
passed in the appropriate FPRs.
Routines with an Ellipsis
Arg     1       2       3      4
Type    double  double  float  double
Offset  (0)     (8)     (16)   (20)
GPR     3,4     5,6     7      8,9
FPR     1       2       3      4
Arg     1     2      3       4       5
Type    int   short  double  double  int
Offset  (0)   (4)    (8)     (16)    (24)
Integer Results
Integer function results are returned in registers r3 and r4. The rules governing
the use of these registers are:
int, long, and pointer values are returned in r3.
Unsigned char and short values are returned in r3, where the value is
right justified in the register and zero-extended.
Signed char and short values are returned in r3, where the value is
right justified in the register and sign-extended.
Bit fields of 32 bits or less are returned right justified in r3.
64-bit fixed-point values are returned in r3:r4, where r3 contains the
high-order portion of the value and r4 contains the low-order
portion.
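These rules translate directly into C, with uint32_t standing in for a 32-bit GPR. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign extension: convert through int32_t before reinterpreting as a GPR. */
uint32_t r3_from_signed_char(int8_t v)    { return (uint32_t)(int32_t)v; }
uint32_t r3_from_signed_short(int16_t v)  { return (uint32_t)(int32_t)v; }

/* Zero extension: unsigned values widen with zeros. */
uint32_t r3_from_unsigned_char(uint8_t v) { return (uint32_t)v; }

/* 64-bit result: high-order word in r3, low-order word in r4. */
void r3_r4_from_u64(uint64_t v, uint32_t *r3, uint32_t *r4)
{
    *r3 = (uint32_t)(v >> 32);
    *r4 = (uint32_t)v;
}
```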
Floating-Point Results
Floating-point function results are returned in registers fr1 through fr4,
according to these rules:
Single-precision (32-bit) values are returned in fr1.
Double-precision (64-bit) values are returned in fr1.
Long double-precision (128-bit) values are returned in fr1:fr2,
where fr1 contains the high-order 64 bits and fr2 contains the
low-order 64 bits.
Single- or double-precision complex values are returned in fr1:fr2,
where fr1 contains the real single- or double-precision portion, and
fr2 contains the imaginary single- or double-precision portion.
Long double-precision complex values are returned in fr1:fr4,
where fr1:fr2 contain the high- and low-order bits of the real
portion, and fr3:fr4 contain the high- and low-order bits of the
imaginary portion.
Complex Results
When complex data types like structures or strings (of greater than four
characters) need to be returned as a function result, the caller must first allocate
a buffer large enough to hold the result. A pointer to this buffer is passed to the
routine as its first argument and occupies GPR 3 (and the first four bytes of the
Argument Area).
The first user-visible argument to the routine is passed in GPR 4.
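At the C level, this convention behaves as if the function were rewritten to take a hidden result pointer. The following sketch writes that lowered form by hand. struct triple and the function names are hypothetical.

```c
/* A structure too large to be returned in registers. */
struct triple { int a, b, c; };

/* Source-level view:  struct triple make_triple(int base);
   Lowered view: the caller's buffer address arrives first (in r3), so the
   first user-visible argument, base, arrives second (in r4). */
void make_triple_lowered(struct triple *result, int base)
{
    result->a = base;
    result->b = base + 1;
    result->c = base + 2;
}

/* Caller side: allocate the buffer (for example, in local storage) and
   pass its address as the hidden first argument. */
struct triple call_make_triple(int base)
{
    struct triple buffer;
    make_triple_lowered(&buffer, base);
    return buffer;
}
```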
[Figure: Module 1 (TOC = 1000) and Module 2 (TOC = 2000)]
The kyoko() routine is located in a module that is foreign to foo()'s module,
so kyoko() has a different TOC environment. If foo() were to simply Branch
with Link to the address of kyoko(), kyoko() would inherit the wrong TOC
environment, which would probably cause bad things to happen.
To prevent this, there must be some mechanism for switching
TOC environments whenever an "out-of-module" routine is called.
Function Descriptors
One way of handling the necessary TOC context switch is to use a structure
called a function descriptor (sometimes called a transition vector) instead of just a
simple pointer to the routine's code. A function pointer then points to this
structure, which is described in Table 13-13.
Table 13-13 Structure of a Function Descriptor
Offset  Description
0       routine address
4       TOC
8       environment pointer
The first word of this structure contains the address of the routine, and the sec-
ond word contains the TOC pointer. Following words can contain data such as
an environment pointer, but anything beyond the TOC pointer is optional and
depends on the development environment originally used to create the routine.
In many cases, these following words are zero or are not present.
Using this structure, an external routine can be called by passing the function
descriptor to a special routine provided to take the information in the descriptor
and properly pass control to the target routine.
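The following toy C model illustrates the TOC switch. A global variable stands in for r2, and a C function pointer stands in for the routine address. The struct follows the field order of Table 13-13, but on a 64-bit host its fields are not at offsets 0, 4, and 8. Apart from kyoko(), all names are hypothetical. The TOC values 1000 and 2000 come from the two-module example earlier in this section.

```c
#include <stdint.h>

typedef int (*entry_fn)(int);

/* Hypothetical model of a function descriptor (field order per Table 13-13). */
struct func_desc {
    entry_fn entry;     /* routine address */
    uint32_t toc;       /* routine's TOC pointer */
    uint32_t env;       /* optional environment pointer, often 0 */
};

static uint32_t current_toc;            /* models r2 */

/* Rough analogue of an out-of-module call: save the caller's TOC, switch
   to the callee's TOC, call, then reload the caller's TOC. In the real
   sequence, the TOC is saved at 20(r1) and reloaded by a load after the call. */
int call_through_descriptor(const struct func_desc *d, int arg)
{
    uint32_t saved_toc = current_toc;
    current_toc = d->toc;
    int result = d->entry(arg);
    current_toc = saved_toc;
    return result;
}

/* Example callee in "Module 2": reports the TOC it sees plus its argument. */
int kyoko(int x)
{
    return (int)current_toc + x;
}
```

If the call starts with current_toc set to 1000 and the descriptor carries TOC 2000, kyoko(5) sees 2000 and returns 2005. The caller's TOC of 1000 is back in place afterward.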
The "20" value is the magic offset from the top of the stack frame to the TOC
storage area in the Link Area. This is where the .ptrgl routine will save the
current TOC.
If it is known that the source and target routines both share the same TOC
environment, then the lwz instruction is not necessary. In this case, a standard no-op
instruction will be added in place of the load. This no-op is commonly encoded
as ori r0,r0,0, but some older development environments used cror
31,31,31 or cror 15,15,15.
This no-op reflects the fact that the compiler is generally unable to tell if the tar-
get routine will be in the same module as the source (the calling) routine. The
compiler cannot figure this out because module assignments are not made until
the object code is linked by the linker. When the compiler encounters a subroutine
call to a routine that is not defined in the same source file, it must assume
that the linker could place the routine in another module. To support this poten-
tial out-of-module call, a no-op placeholder instruction is placed after the sub-
routine call. This gives the linker a place to write the required load instruction. If
the two routines are in the same module, the linker doesn't add the load instruc-
tion and the no-op instruction remains.
Of course, if the compiler has enough information available to determine if the
called routine is in the same TOC environment, then the placeholder no-op
instruction is unnecessary and is not generated.
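The three no-op encodings, and the load that the linker substitutes for them, can be checked by assembling them by hand. This C sketch uses the standard PowerPC D-form (ori, lwz) and XL-form (cror) field layouts. It assumes that the TOC-restore load is lwz r2,20(r1), which matches the offset-20 TOC slot described above. The patch function is a hypothetical stand-in for what a linker does.

```c
#include <stdbool.h>
#include <stdint.h>

uint32_t enc_ori(int rs, int ra, uint16_t ui)      /* D-form, opcode 24 */
{
    return (24u << 26) | ((uint32_t)rs << 21) | ((uint32_t)ra << 16) | ui;
}

uint32_t enc_cror(int bt, int ba, int bb)          /* XL-form, opcode 19, XO 449 */
{
    return (19u << 26) | ((uint32_t)bt << 21) | ((uint32_t)ba << 16) |
           ((uint32_t)bb << 11) | (449u << 1);
}

uint32_t enc_lwz(int rt, int ra, int16_t d)        /* D-form, opcode 32 */
{
    return (32u << 26) | ((uint32_t)rt << 21) | ((uint32_t)ra << 16) |
           (uint16_t)d;
}

/* Recognize any of the placeholder no-ops mentioned in the text. */
bool is_toc_nop(uint32_t insn)
{
    return insn == enc_ori(0, 0, 0) || insn == enc_cror(31, 31, 31) ||
           insn == enc_cror(15, 15, 15);
}

/* Hypothetical linker step: if the call target ended up in another module,
   replace the placeholder after the call with lwz r2,20(r1). */
void patch_after_call(uint32_t *slot, bool target_in_other_module)
{
    if (target_in_other_module && is_toc_nop(*slot))
        *slot = enc_lwz(2, 1, 20);
}
```

These encoders produce 0x60000000 for ori r0,r0,0, 0x4FFFFB82 for cror 31,31,31, 0x4DEF7B82 for cror 15,15,15, and 0x80410014 for lwz r2,20(r1).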
§13.13 Linking with Global Routines
the original routine) where control should return. Since the LR hasn't been
modified, the target routine can return control using the standard blr instruction.