GCC-Inline-Assembly-HOWTO
Sandeep.S
v0.1, 01 March 2003. This HOWTO explains the use and usage of the inline assembly feature provided by GCC. There are only two prerequisites for reading this article: a basic knowledge of x86 assembly language and of C.

1. Introduction.
   1.1 Copyright and License.
   1.2 Feedback and Corrections.
   1.3 Acknowledgments.
2. Overview of the whole thing.
3. GCC Assembler Syntax.
4. Basic Inline.
5. Extended Asm.
   5.1 Assembler Template.
   5.2 Operands.
   5.3 Clobber List.
   5.4 Volatile ...?
6. More about constraints.
   6.1 Commonly used constraints.
   6.2 Constraint Modifiers.
7. Some Useful Recipes.
8. Concluding Remarks.
9. References.

1. Introduction.

1.1 Copyright and License.
Copyright (C)2003 Sandeep S.

http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

6/25/2010


This document is free; you can redistribute and/or modify this under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

1.2 Feedback and Corrections.


Kindly forward feedback and criticism to Sandeep.S. I will be indebted to anybody who points out errors and inaccuracies in this document; I shall rectify them as soon as I am informed.

1.3 Acknowledgments.
I express my sincere appreciation to the GNU people for providing such a great feature. Thanks to Mr. Pramode C E for all the help he gave. Thanks to friends at the Govt Engineering College, Trichur for their moral support and cooperation, especially to Nisha Kurur and Sakeeb S. Thanks to my dear teachers at Govt Engineering College, Trichur for their cooperation. Additionally, thanks to Phillip, Brennan Underwood and colin@nyx.net; many things here are shamelessly stolen from their works.

2. Overview of the whole thing.


We are here to learn about GCC inline assembly. What does this "inline" stand for?

We can instruct the compiler to insert the code of a function into the code of its callers, at the point where the call would otherwise be made. Such functions are inline functions. Sounds similar to a macro? Indeed there are similarities.

What is the benefit of inline functions? Inlining removes the function-call overhead. And if any of the actual argument values are constant, their known values may permit simplifications at compile time, so that not all of the inline function's code needs to be included. The effect on code size is less predictable; it depends on the particular case. To declare an inline function, we use the keyword inline in its declaration.

Now we are in a position to guess what inline assembly is. It is just some assembly routines written as inline functions. They are handy, speedy and very useful in system programming. Our main focus is to study the basic format and usage of (GCC) inline assembly functions. To declare inline assembly functions, we use the keyword asm.

Inline assembly is important primarily because of its ability to operate on C variables and to make its output visible in them. Because of this capability, "asm" works as an interface between the assembly instructions and the "C" program that contains them.
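To make the idea of an inline function concrete, here is a minimal sketch in plain C. The names cube and cube_of_three are hypothetical and exist only for illustration; whether GCC actually expands the call depends on the optimization level.

```c
/* Hypothetical helper: "static inline" asks GCC to expand the body at each call site. */
static inline int cube(int x)
{
    return x * x * x;
}

/* With a constant argument the compiler may reduce the whole call to the constant 27. */
int cube_of_three(void)
{
    return cube(3);
}
```

Calling cube_of_three() returns 27 whether or not the call was inlined; inlining only changes the generated code, not the result.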

3. GCC Assembler Syntax.


GCC, the GNU C Compiler for Linux, uses AT&T/UNIX assembly syntax. Here we'll be using AT&T syntax for assembly coding. Don't worry if you are not familiar with AT&T syntax; I will teach you. It is quite different from Intel syntax. I shall give the major differences.

1. Source-Destination Ordering.
The direction of the operands in AT&T syntax is opposite to that of Intel. In Intel syntax the first operand is the destination and the second operand is the source, whereas in AT&T syntax the first operand is the source and the second operand is the destination. ie, "Op-code dst, src" in Intel syntax changes to "Op-code src, dst" in AT&T syntax.

2. Register Naming.
Register names are prefixed by %, ie, if eax is to be used, write %eax.

3. Immediate Operand.
AT&T immediate operands are preceded by $. For static "C" variables also prefix a $. In Intel syntax, hexadecimal constants take an h suffix; here we prefix 0x to the constant instead. So, for hexadecimals, we first see a $, then 0x and finally the constant.

4. Operand Size.
In AT&T syntax the size of memory operands is determined from the last character of the op-code name. Op-code suffixes of b, w, and l specify byte (8-bit), word (16-bit), and long (32-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the op-codes) with byte ptr, word ptr, and dword ptr. Thus, Intel "mov al, byte ptr foo" is "movb foo, %al" in AT&T syntax.

5. Memory Operands.
In Intel syntax the base register is enclosed in [ and ], whereas in AT&T syntax it is enclosed in ( and ). Additionally, in Intel syntax an indirect memory reference is like section:[base + index*scale + disp], which changes to section:disp(base, index, scale) in AT&T. One point to bear in mind is that, when a constant is used for disp/scale, $ shouldn't be prefixed.

We have now seen some of the major differences between Intel syntax and AT&T syntax. I've written about only a few of them. For complete information, refer to the GNU Assembler documentation. Now we'll look at some examples for better understanding.
+------------------------------+------------------------------------+
|       Intel Code             |      AT&T Code                     |
+------------------------------+------------------------------------+
| mov     eax,1                |  movl    $1,%eax                   |
| mov     ebx,0ffh             |  movl    $0xff,%ebx                |
| int     80h                  |  int     $0x80                     |
| mov     ebx, eax             |  movl    %eax, %ebx                |
| mov     eax,[ecx]            |  movl    (%ecx),%eax               |
| mov     eax,[ebx+3]          |  movl    3(%ebx),%eax              |
| mov     eax,[ebx+20h]        |  movl    0x20(%ebx),%eax           |
| add     eax,[ebx+ecx*2h]     |  addl    (%ebx,%ecx,0x2),%eax      |
| lea     eax,[ebx+ecx]        |  leal    (%ebx,%ecx),%eax          |
| sub     eax,[ebx+ecx*4h-20h] |  subl    -0x20(%ebx,%ecx,0x4),%eax |
+------------------------------+------------------------------------+
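The AT&T memory-operand form from the table can be tried from C. The following is a minimal sketch, assuming an x86 target; it already uses the extended asm form described in section 5, and the function name load_element is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: loads base[index] using the AT&T form disp(base,index,scale). */
int load_element(const int *base, size_t index)
{
    int value;
    __asm__ ("movl (%1,%2,4), %0"      /* Intel: mov value, [base + index*4] */
             : "=r" (value)            /* %0: the destination comes last in AT&T order */
             : "r" (base), "r" (index) /* %1 and %2: base and index registers */
             : "memory");              /* the asm reads memory GCC cannot see */
    return value;
}
```

With int a[4] = {10, 20, 30, 40}, the call load_element(a, 2) reads the third element. Note that the scale 4 is written without a $ prefix, as point 5 above requires.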


4. Basic Inline.
The format of basic inline assembly is very straightforward. Its basic form is
asm("assembly code");

Example.

asm("movl %ecx, %eax");      /* moves the contents of ecx to eax */
__asm__("movb %bh, (%eax)"); /* moves the byte from bh to the memory pointed to by eax */

You might have noticed that here I've used asm and __asm__. Both are valid. We can use __asm__ if the keyword asm conflicts with something in our program. If we have more than one instruction, we write one per line in double quotes, and also suffix a \n and \t to the instruction. This is because gcc sends each instruction as a string to as (GAS), and by using the newline/tab we send correctly formatted lines to the assembler. Example.

__asm__ ("movl %eax, %ebx\n\t"
         "movl $56, %esi\n\t"
         "movl %ecx, label(%edx,%ebx,4)\n\t"
         "movb %ah, (%ebx)");

If in our code we touch (ie, change the contents of) some registers and return from asm without fixing those changes, something bad is going to happen. This is because GCC has no idea about the changes in the register contents, and this leads us to trouble, especially when the compiler makes some optimizations. It will suppose that some register contains the value of some variable that we might have changed without informing GCC, and it continues as if nothing happened. What we can do is either use only instructions having no side effects, or fix things when we quit, or wait for something to crash. This is where we want some extended functionality. Extended asm provides us with that functionality.

5. Extended Asm.
In basic inline assembly, we had only instructions. In extended assembly, we can also specify the operands. It allows us to specify the input registers, output registers and a list of clobbered registers. It is not mandatory to specify the registers to use; we can leave that headache to GCC, and that probably fits into GCC's optimization scheme better. Anyway, the basic format is:

asm ( assembler template
    : output operands                  /* optional */
    : input operands                   /* optional */
    : list of clobbered registers      /* optional */
    );


The assembler template consists of assembly instructions. Each operand is described by an operand-constraint string followed by the C expression in parentheses. A colon separates the assembler template from the first output operand and another separates the last output operand from the first input, if any. Commas separate the operands within each group. The total number of operands is limited to ten or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater. If there are no output operands but there are input operands, you must place two consecutive colons surrounding the place where the output operands would go. Example:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi"
     );

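Now, what does this code do? The above inline stores fill_value count times, starting at the location pointed to by the register edi. It also tells gcc that the contents of registers ecx and edi are no longer valid. (The GCC manual for current releases states that clobbers may not overlap input or output operands, so a newer compiler is likely to reject this clobber list; there, registers that are both read and changed are described as operands instead.) Let us see one more example to make things clearer.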

int a=10, b;
asm ("movl %1, %%eax; movl %%eax, %0;"
     :"=r"(b)        /* output */
     :"r"(a)         /* input */
     :"%eax"         /* clobbered register */
     );

Here we made the value of b equal to that of a using assembly instructions. Some points of interest are:

"b" is the output operand, referred to by %0, and "a" is the input operand, referred to by %1.

"r" is a constraint on the operands. We'll see constraints in detail later. For the time being, "r" tells GCC to use any register for storing the operands. An output operand constraint should have a constraint modifier "=". This modifier says that it is the output operand and is write-only.

There are two %'s prefixed to the register name. This helps GCC to distinguish between the operands and registers. Operands have a single % as prefix.

The clobbered register %eax after the third colon tells GCC that the value of %eax is modified inside "asm", so GCC won't use this register to store any other value.

When the execution of "asm" is complete, "b" will reflect the updated value, as it is specified as an output operand. In other words, the change made to "b" inside "asm" is supposed to be reflected outside the "asm". Now we may look at each field in detail.
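The copy example discussed above can be wrapped in a function so that it compiles on its own. This is only a sketch for an x86 target; the function name copy_via_eax and the parameter names are hypothetical.

```c
/* Hypothetical wrapper: copies src into dst through %eax, as in the example above. */
int copy_via_eax(int src)
{
    int dst;
    __asm__ ("movl %1, %%eax\n\t"   /* %1 is the input operand (src) */
             "movl %%eax, %0"       /* %0 is the output operand (dst) */
             : "=r" (dst)           /* "=" marks a write-only output */
             : "r" (src)            /* any general purpose register */
             : "%eax");             /* eax is overwritten by our own code */
    return dst;
}
```

Because %eax is in the clobber list, GCC places neither src nor dst in that register.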

5.1 Assembler Template.


The assembler template contains the set of assembly instructions that gets inserted inside the C program. The format is: either each instruction is enclosed within double quotes, or the entire group of instructions is within double quotes. Each instruction should also end with a delimiter. The valid delimiters are newline (\n) and semicolon (;). \n may be followed by a tab (\t). We know the reason for the newline/tab, right? Operands corresponding to the C expressions are represented by %0, %1 ... etc.
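The two delimiter styles can be compared side by side. The sketch below assumes an x86 target and uses hypothetical function names; it also uses the matching constraint "0", which is explained in section 5.2, to keep the value in one register.

```c
/* Style 1: one quoted string per instruction, each ended by "\n\t". */
int twice_plus_one_newline(int x)
{
    __asm__ ("addl %0, %0\n\t"     /* x = x + x */
             "incl %0"             /* x = x + 1 */
             : "=r" (x)
             : "0" (x)             /* input shares the register of operand 0 */
             : "cc");
    return x;
}

/* Style 2: the whole group in one string, instructions separated by ';'. */
int twice_plus_one_semicolon(int x)
{
    __asm__ ("addl %0, %0; incl %0"
             : "=r" (x)
             : "0" (x)
             : "cc");
    return x;
}
```

Both functions hand the same two instructions to the assembler; only the layout of the template string differs.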

5.2 Operands.
C expressions serve as operands for the assembly instructions inside "asm". Each operand is written as an operand constraint in double quotes, followed by the C expression which stands for the operand, in parentheses. For output operands, there'll be a constraint modifier also within the quotes. ie,

"constraint" (C expression)

is the general form. For output operands an additional modifier will be there. Constraints are primarily used to decide the addressing modes for operands. They are also used in specifying the registers to be used.

If we use more than one operand, they are separated by commas.

In the assembler template, each operand is referenced by number. Numbering is done as follows. If there are a total of n operands (both input and output inclusive), then the first output operand is numbered 0, continuing in increasing order, and the last input operand is numbered n-1. The maximum number of operands is as we saw in the previous section.

Output operand expressions must be lvalues. The input operands are not restricted like this. They may be expressions. The extended asm feature is most often used for machine instructions the compiler itself does not know as existing ;). If the output expression cannot be directly addressed (for example, it is a bit-field), our constraint must allow a register. In that case, GCC will use the register as the output of the asm, and then store that register's contents into the output.

As stated above, ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm also supports input-output or read-write operands.

So now we concentrate on some examples. We want to multiply a number by 5. For that we use the instruction lea.

asm ("leal (%1,%1,4), %0" : "=r" (five_times_x) : "r" (x) );

Here our input is in x. We didn't specify the register to be used. GCC will choose some register for input and one for output, and do what we desired. If we want the input and output to reside in the same register, we can instruct GCC to do so. Here we use those types of read-write operands. By specifying proper constraints, here we do it.

asm ("leal (%0,%0,4), %0" : "=r" (five_times_x) : "0" (x) );

Now the input and output operands are in the same register. But we don't know which register. Now if we want to specify that also, there is a way.

asm ("leal (%%ecx,%%ecx,4), %%ecx" : "=c" (x) : "c" (x) );

In all three examples above, we didn't put any register in the clobber list. Why? In the first two examples, GCC decides the registers and it knows what changes happen. In the last one, we don't have to put ecx in the clobber list; gcc knows it goes into x. Therefore, since it can know the value of ecx, it isn't considered clobbered.
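The three variants of the multiply-by-five example can be placed in functions and compared. This is a sketch for an x86 target; the function names are hypothetical.

```c
/* Variant 1: GCC chooses the registers for input and output. */
int times_five(int x)
{
    int five_times_x;
    __asm__ ("leal (%1,%1,4), %0" : "=r" (five_times_x) : "r" (x));
    return five_times_x;
}

/* Variant 2: the matching constraint "0" puts input and output in one register. */
int times_five_same_reg(int x)
{
    int five_times_x;
    __asm__ ("leal (%0,%0,4), %0" : "=r" (five_times_x) : "0" (x));
    return five_times_x;
}

/* Variant 3: the "c" constraint pins both operands to ecx. */
int times_five_ecx(int x)
{
    __asm__ ("leal (%%ecx,%%ecx,4), %%ecx" : "=c" (x) : "c" (x));
    return x;
}
```

None of the three needs a clobber list, for the reasons given above: every register that changes is already described as an operand.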

5.3 Clobber List.


Some instructions clobber some hardware registers. We have to list those registers in the clobber list, ie the field after the third : in the asm function. This is to inform gcc that we will use and modify them ourselves, so gcc will not assume that the values it loads into these registers will be valid. We shouldn't list the input and output registers in this list, because gcc knows that "asm" uses them (they are specified explicitly as constraints). If the instructions use any other registers, implicitly or explicitly (and the registers are not present either in the input or in the output constraint list), then those registers have to be specified in the clobber list.

If our instruction can alter the condition code register, we have to add "cc" to the list of clobbered registers.

If our instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm.

We can read and write the clobbered registers as many times as we like. Consider the example of multiple instructions in a template; it assumes the subroutine _foo accepts arguments in registers eax and ecx.

asm ("movl %0,%%eax\n\t"
     "movl %1,%%ecx\n\t"
     "call _foo"
     : /* no outputs */
     : "g" (from), "g" (to)
     : "eax", "ecx"
     );
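The "memory" and "cc" clobbers and the rule about not listing operand registers can be seen together in a fill routine built on stosl. This is a sketch, assuming an x86 target and a GCC release that accepts the "+" (read-write) constraint modifier; the function name fill_words is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores fill_value into count consecutive 32-bit words. */
void fill_words(uint32_t *dest, uint32_t fill_value, size_t count)
{
    assert(dest != NULL || count == 0);
    __asm__ __volatile__ ("cld\n\t"
                          "rep\n\t"
                          "stosl"
                          : "+D" (dest), "+c" (count) /* both are read and changed */
                          : "a" (fill_value)          /* read only, left unchanged */
                          : "memory", "cc");          /* writes memory GCC cannot see */
}
```

The registers edi and ecx are changed by the instruction, but since they are operands they do not appear in the clobber list. Only the effects that no operand describes, the memory writes and the flags, are listed there.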

5.4 Volatile ...?


If you are familiar with kernel sources or some beautiful code like that, you must have seen many functions declared as volatile or __volatile__, which follows an asm or __asm__. I mentioned the keywords asm and __asm__ earlier. So what is this volatile?

If our assembly statement must execute where we put it (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. So to keep it from being moved, deleted and so on, we declare it as

asm volatile ( ... : ... : ... : ...);

Use __volatile__ when we have to be very careful, for example when the plain keyword could conflict with something in our program (as with __asm__).


If our assembly is just for doing some calculations and doesn't have any side effects, it's better not to use the keyword volatile. Avoiding it helps gcc in optimizing the code and making it more beautiful.

In the section Some Useful Recipes, I have provided many examples of inline asm functions. There we can see the clobber list in detail.
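A case where volatile matters is an instruction whose result differs on every execution. The sketch below reads the x86 time-stamp counter with rdtsc; this instruction is not part of the original text and is used here only as an illustration, and the function name read_tsc is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 64-bit time-stamp counter of an x86 CPU. */
uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    /* volatile: every call must really execute rdtsc, even though the asm
       has no inputs and would otherwise look like a repeatable calculation. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}
```

Without volatile, GCC would be free to treat two such statements as the same computation and reuse the first result.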

6. More about constraints.


By this time, you might have understood that constraints have a lot to do with inline assembly. But we've said little about constraints. Constraints can say whether an operand may be in a register, and which kinds of register; whether the operand can be a memory reference, and which kinds of address; whether the operand may be an immediate constant, and which possible values (ie range of values) it may have, etc.

6.1 Commonly used constraints.


There are a number of constraints, of which only a few are used frequently. We'll have a look at those constraints.

1. Register operand constraint(r)
When operands are specified using this constraint, they get stored in General Purpose Registers (GPR). Take the following example:

asm ("movl %%eax, %0\n" :"=r"(myval));

Here the variable myval is kept in a register, the value in register eax is copied onto that register, and the value of myval is updated into the memory from this register. When the "r" constraint is specified, gcc may keep the variable in any of the available GPRs. To specify the register, you must directly specify the register names by using specific register constraints. They are:

+---+--------------------+
| r |    Register(s)     |
+---+--------------------+
| a |   %eax, %ax, %al   |
| b |   %ebx, %bx, %bl   |
| c |   %ecx, %cx, %cl   |
| d |   %edx, %dx, %dl   |
| S |   %esi, %si        |
| D |   %edi, %di        |
+---+--------------------+

2. Memory operand constraint(m)
When the operands are in memory, any operations performed on them will occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location. But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don't want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc:

asm("sidt %0\n" : "=m"(loc));

3. Matching(Digit) constraints
In some cases, a single variable may serve as both the input and the output operand. Such cases may be specified in "asm" by using matching constraints.


asm ("incl %0" :"=a"(var):"0"(var));

We saw similar examples in the operands subsection also. In this example for matching constraints, the register %eax is used as both the input and the output variable. The var input is read into %eax, and the updated %eax is stored in var again after the increment. "0" here specifies the same constraint as the 0th output variable. That is, it specifies that the output instance of var should be stored in %eax only. This constraint can be used:

In cases where input is read from a variable, or the variable is modified and the modification is written back to the same variable.
In cases where separate instances of input and output operands are not necessary.

The most important effect of using matching constraints is that they lead to the efficient use of available registers.

Some other constraints used are:

1. "m" : A memory operand is allowed, with any kind of address that the machine supports in general.
2. "o" : A memory operand is allowed, but only if the address is offsettable. ie, adding a small offset to the address gives a valid address.
3. "V" : A memory operand that is not offsettable. In other words, anything that would fit the "m" constraint but not the "o" constraint.
4. "i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
5. "n" : An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use "n" rather than "i".
6. "g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.

The following constraints are x86 specific.

1. "r" : Register operand constraint, look at the table given above.
2. "q" : Registers a, b, c or d.
3. "I" : Constant in range 0 to 31 (for 32-bit shifts).
4. "J" : Constant in range 0 to 63 (for 64-bit shifts).
5. "K" : 0xff.
6. "L" : 0xffff.
7. "M" : 0, 1, 2, or 3 (shifts for lea instruction).
8. "N" : Constant in range 0 to 255 (for out instruction).
9. "f" : Floating point register
10. "t" : First (top of stack) floating point register
11. "u" : Second floating point register
12. "A" : Specifies the "a" or "d" registers. This is primarily useful for 64-bit integer values intended to be returned with the "d" register holding the most significant bits and the "a" register holding the least significant bits.
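Several of the constraints listed in this section can be exercised in small functions. The following is a sketch for an x86 target; all function names are hypothetical and the instructions were chosen only to fit the constraints being shown.

```c
#include <assert.h>
#include <stdint.h>

/* "a" and "d": divl expects edx:eax as dividend and leaves its results there. */
uint32_t div_rem(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient, rem;
    assert(divisor != 0);                     /* divl traps on a zero divisor */
    __asm__ ("divl %4"
             : "=a" (quotient), "=d" (rem)    /* %0 = eax, %1 = edx */
             : "a" (dividend), "d" (0u),      /* edx:eax holds 0:dividend */
               "r" (divisor)                  /* %4: any other register */
             : "cc");
    *remainder = rem;
    return quotient;
}

/* "m": negl works on the memory location itself, no register copy of *p. */
void negate_in_place(int *p)
{
    __asm__ ("negl %0" : "=m" (*p) : "m" (*p) : "cc");
}

/* "I" and "0": immediate shift count plus a matching constraint. */
unsigned int times_eight(unsigned int x)
{
    __asm__ ("shll %1, %0"
             : "=r" (x)
             : "I" (3),                       /* constant in 0..31 */
               "0" (x)                        /* same register as operand 0 */
             : "cc");
    return x;
}

/* The matching-constraint increment from the text, wrapped in a function. */
int increment(int var)
{
    __asm__ ("incl %0" : "=a" (var) : "0" (var) : "cc");
    return var;
}
```

In div_rem the specific-register constraints are required by the instruction itself, while in times_eight the choice of register is left to GCC and only the relation between input and output is fixed.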

6.2 Constraint Modifiers.


While using constraints, for more precise control over the effects of constraints, GCC provides us with constraint modifiers. The most commonly used constraint modifiers are

1. "=" : Means that this operand is write-only for this instruction; the previous value is discarded and replaced by output data.
2. "&" : Means that this operand is an earlyclobber operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part of any memory address. An input operand can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written.
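The "&" modifier matters as soon as a template writes its output before it has read all of its inputs. A minimal sketch for an x86 target, with a hypothetical function name:

```c
/* Hypothetical helper: computes a + b, writing the output before b is read. */
int add_early(int a, int b)
{
    int sum;
    __asm__ ("movl %1, %0\n\t"   /* sum is written here ... */
             "addl %2, %0"       /* ... while b (%2) is still needed */
             : "=&r" (sum)       /* "&" keeps sum out of the registers of a and b */
             : "r" (a), "r" (b)
             : "cc");
    return sum;
}
```

Without "&", GCC would be allowed to place sum in the same register as b, and the first movl would then destroy b before the addl uses it.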


The list and explanation of constraints is by no means complete. Examples can give a better understanding of the use and usage of inline asm. In the next section we'll see some examples; there we'll find more about clobber lists and constraints.

7. Some Useful Recipes.


Now that we have covered the basic theory of GCC inline assembly, we shall concentrate on some simple examples. It is always handy to write inline asm functions as macros. We can see many asm functions in the kernel code (/usr/src/linux/include/asm/*.h).

1. First we start with a simple example. We'll write a program to add two numbers.

#include <stdio.h>

int main(void)
{
        int foo = 10, bar = 15;
        __asm__ __volatile__("addl  %%ebx,%%eax"
                             :"=a"(foo)
                             :"a"(foo), "b"(bar)
                             );
        printf("foo+bar=%d\n", foo);
        return 0;
}

Here we ask GCC to store foo in %eax and bar in %ebx, and we also want the result in %eax. The = sign shows that it is an output register. Now we can add an integer to a variable in some other way.

__asm__ __volatile__(
                     "   lock       ;\n"
                     "   addl %1,%0 ;\n"
                     : "=m"  (my_var)
                     : "ir"  (my_int), "m" (my_var)
                     :                                 /* no clobber-list */
                     );

This is an atomic addition. We can remove the instruction lock to remove the atomicity. In the output field, "=m" says that my_var is an output and it is in memory. Similarly, "ir" says that my_int is an integer and may be given either as an immediate constant or in some register (recall the table we saw above). No registers are in the clobber list.

2. Now we'll perform some action on some registers/variables and compare the value.

__asm__ __volatile__(  "decl %0; sete %1"
                       : "=m" (my_var), "=q" (cond)
                       : "m" (my_var)
                       : "memory"
                       );


Here, the value of my_var is decremented by one, and if the resulting value is 0 then the variable cond is set. We can add atomicity by adding an instruction "lock;\n\t" as the first instruction in the assembler template.

In a similar way we can use "incl %0" instead of "decl %0", so as to increment my_var.

Points to note here are that (i) my_var is a variable residing in memory. (ii) cond is in any of the registers eax, ebx, ecx and edx. The constraint "=q" guarantees it. (iii) And we can see that memory is there in the clobber list. ie, the code is changing the contents of memory.

3. How to set/clear a bit in a register? As the next recipe, we are going to see it.

__asm__ __volatile__(   "btsl %1,%0"
                        : "=m" (ADDR)
                        : "Ir" (pos)
                        : "cc"
                        );

Here, the bit at the position pos of the variable at ADDR (a memory variable) is set to 1. We can use btrl instead of btsl to clear the bit. The constraint "Ir" of pos says that pos is either an immediate constant in the range 0-31 (an x86 dependent constraint) or a value in a register. ie, we can set/clear any bit from the 0th to the 31st of the variable at ADDR. As the condition codes will be changed, we are adding "cc" to the clobber list.

4. Now we look at a more complicated but useful function. String copy.

static inline char * strcpy(char * dest,const char *src)
{
        int d0, d1, d2;
        __asm__ __volatile__(  "1:\tlodsb\n\t"
                               "stosb\n\t"
                               "testb %%al,%%al\n\t"
                               "jne 1b"
                               : "=&S" (d0), "=&D" (d1), "=&a" (d2)
                               : "0" (src),"1" (dest)
                               : "memory");
        return dest;
}

The source address is stored in esi and the destination in edi, and then the copy starts; when we reach 0, copying is complete. Constraints "&S", "&D", "&a" say that the registers esi, edi and eax are early clobber registers, ie, their contents will change before the completion of the function. Here also it is clear why memory is in the clobber list.

We can see a similar function which moves a block of double words. Notice that the function is declared as a macro.

#define mov_blk(src, dest, numwords)                            \
__asm__ __volatile__ (                                          \
                       "cld\n\t"                                \
                       "rep\n\t"                                \
                       "movsl"                                  \
                       :                                        \
                       : "S" (src), "D" (dest), "c" (numwords)  \
                       : "%ecx", "%esi", "%edi"                 \
                       )

Here we have no outputs, so the changes that happen to the contents of the registers ecx, esi and edi are side effects of the block movement. So we have to add them to the clobber list. (Compilers that follow the current GCC manual do not allow clobbers to overlap input operands; with those, the same information is given through dummy outputs, as in the strcpy example above.)

5. In Linux, system calls are implemented using GCC inline assembly. Let us look at how a system call is implemented. All the system calls are written as macros (linux/unistd.h). For example, a system call with three arguments is defined as a macro as shown below.

#define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
type name(type1 arg1,type2 arg2,type3 arg3) \
{ \
long __res; \
__asm__ volatile (  "int $0x80" \
                  : "=a" (__res) \
                  : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
                    "d" ((long)(arg3))); \
__syscall_return(type,__res); \
}

Whenever a system call with three arguments is made, the macro shown above is used to make the call. The syscall number is placed in eax, then the parameters in ebx, ecx and edx. And finally "int 0x80" is the instruction which makes the system call work. The return value can be collected from eax.

Every system call is implemented in a similar way. Exit is a single-parameter syscall; let's see what its code looks like. It is as shown below.

{
        asm("movl $1, %eax\n\t"       /* SYS_exit is 1 */
            "xorl %ebx, %ebx\n\t"     /* Argument is in ebx, it is 0 */
            "int  $0x80");            /* Enter kernel mode */
}

The number of exit is "1" and here, its parameter is 0. So we arrange eax to contain 1 and ebx to contain 0 and by int $0x80, the exit(0) is executed. This is how exit works.
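Returning to recipes 1 to 3 of this section, the fragments shown there can be wrapped in functions so that they compile and can be tried out. This is a sketch for an x86 target; the function names are hypothetical, the variable names follow the recipes, and "cc" has been added to the clobber lists because all three instructions change the flags.

```c
#include <assert.h>

/* Recipe 1: atomic addition of my_int to *my_var. */
void atomic_add(int *my_var, int my_int)
{
    __asm__ __volatile__ ("lock; addl %1, %0"
                          : "=m" (*my_var)
                          : "ir" (my_int), "m" (*my_var)
                          : "cc");
}

/* Recipe 2: decrement *my_var and report whether it reached zero. */
unsigned char dec_and_test(int *my_var)
{
    unsigned char cond;
    __asm__ __volatile__ ("decl %0; sete %1"
                          : "=m" (*my_var), "=q" (cond) /* cond in a, b, c or d */
                          : "m" (*my_var)
                          : "memory", "cc");
    return cond;
}

/* Recipe 3: set bit number pos (0..31) of the word at addr. */
void set_bit(unsigned int *addr, int pos)
{
    assert(pos >= 0 && pos < 32);
    __asm__ __volatile__ ("btsl %1, %0"
                          : "=m" (*addr)
                          : "Ir" (pos), "m" (*addr)
                          : "cc");
}
```

A caller might start with a counter of 2 and call dec_and_test twice; the second call is the one that reports that zero was reached.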

8. Concluding Remarks.
This document has gone through the basics of GCC Inline Assembly. Once you have understood the basic concept it is not difficult to take steps on your own. We saw some examples which are helpful in understanding the frequently used features of GCC Inline Assembly.

GCC Inlining is a vast subject and this article is by no means complete. More details about the syntax we discussed are available in the official documentation for the GNU Assembler. Similarly, for a complete list of the constraints refer to the official documentation of GCC.

And of course, the Linux kernel uses GCC inline assembly on a large scale. So we can find many examples of various kinds in the kernel sources. They can help us a lot.

If you have found any glaring typos or outdated info in this document, please let us know.

9. References.
1. Brennan's Guide to Inline Assembly
2. Using Assembly Language in Linux
3. Using as, The GNU Assembler
4. Using and Porting the GNU Compiler Collection (GCC)
5. Linux Kernel Source


Using Inline Assembly With gcc


Clark L. Coleman (plagiarist/researcher)

1.0 Overview
This is a compilation in FrameMaker of three public domain documents written by others. There is no original content added by myself. The three documents are:

1. A portion of the gcc info page for gcc 2.8.1, dealing with the subject of inline assembly language.
2. A tutorial by Brennan Underwood.
3. A tutorial by colin@nyx.net.

2.0 Information from the gcc info pages


2.1 General and Copyright Information
This is Info file gcc.info, produced by Makeinfo-1.55 from the input file gcc.texi.

This file documents the use and the internals of the GNU compiler.

Published by the Free Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307 USA

Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled "GNU General Public License," "Funding for Free Software," and "Protect Your Freedom--Fight 'Look And Feel'" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

File: gcc.info, Node: Extended Asm, Next: Asm Labels, Prev: Inline, Up: C Extensions

2.2 Assembler Instructions with C Expression Operands


In an assembler instruction using asm, you can now specify the operands of the instruction using C expressions. This means no more guessing which registers or memory locations will contain the data you want to use.
Using Inline Assembly With gcc January 11, 2000 1

You must specify an assembler instruction template much like what appears in a machine description, plus an operand constraint string for each operand. For example, here is how to use the 68881's fsinx instruction:

asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Here angle is the C expression for the input operand while result is that of the output operand. Each has "f" as its operand constraint, saying that a floating point register is required. The = in "=f" indicates that the operand is an output; all output operands' constraints must use =. The constraints use the same language used in the machine description (*note Constraints::.).

Each operand is described by an operand-constraint string followed by the C expression in parentheses. A colon separates the assembler template from the first output operand, and another separates the last output operand from the first input, if any. Commas separate output operands and separate inputs. The total number of operands is limited to ten or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater.

If there are no output operands, and there are input operands, then there must be two consecutive colons surrounding the place where the output operands would go.

Output operand expressions must be lvalues; the compiler can check this. The input operands need not be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being executed. It does not parse the assembler instruction template and does not know what it means, or whether it is valid assembler input. The extended asm feature is most often used for machine instructions that the compiler itself does not know exist. If the output expression cannot be directly addressed (for example, it is a bit field), your constraint must allow a register. In that case, GNU CC will use the register as the output of the asm, and then store that register into the output.

The output operands must be write-only; GNU CC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm does not support input-output or read-write operands. For this reason, the constraint character +, which indicates such an operand, may not be used.

When the assembler instruction has a read-write operand, or an operand in which only some of the bits are to be changed, you must logically split its function into two separate operands, one input operand and one write-only output operand. The connection between them is expressed by constraints which say they need to be in the same location when the instruction executes. You can use the same C expression for both operands, or different expressions. For example, here we write the (fictitious) combine instruction with bar as its read-only source operand and foo as its read-write destination:

asm ("combine %2,%0" : "=r" (foo) : "0" (foo), "g" (bar));

Using Inline Assembly With gcc

January 11, 2000

The constraint "0" for operand 1 says that it must occupy the same location as operand 0. A digit in a constraint is allowed only in an input operand, and it must refer to an output operand.

Only a digit in the constraint can guarantee that one operand will be in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they will be in the same place in the generated assembler code. The following would not work:
asm ("combine %2,%0" : "=r" (foo) : "r" (foo), "g" (bar));

Various optimizations or reloading could cause operands 0 and 1 to be in different registers; GNU CC knows no reason not to do so. For example, the compiler might find a copy of the value of foo in one register and use it for operand 1, but generate the output operand 0 in a different register (copying it afterward to foo's own address). Of course, since the register for operand 1 is not even mentioned in the assembler code, the result will not work, but GNU CC can't tell that.

Some instructions clobber specific hard registers. To describe this, write a third colon after the input operands, followed by the names of the clobbered hard registers (given as strings). Here is a realistic example for the Vax:
asm volatile ("movc3 %0,%1,%2"
              : /* no outputs */
              : "g" (from), "g" (to), "g" (count)
              : "r0", "r1", "r2", "r3", "r4", "r5");

If you refer to a particular hardware register from the assembler code, then you will probably have to list the register after the third colon to tell the compiler that the register's value is modified. In many assemblers, the register names begin with %; to produce one % in the assembler code, you must write %% in the input.

If your assembler instruction can alter the condition code register, add "cc" to the list of clobbered registers. GNU CC on some machines represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, the condition code is handled differently, and specifying "cc" has no effect. But it is valid no matter what the machine.

If your assembler instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GNU CC to not keep memory values cached in registers across the assembler instruction.

You can put multiple assembler instructions together in a single asm template, separated either with newlines (written as \n) or with semicolons if the assembler allows such semicolons. The GNU assembler allows semicolons and all Unix assemblers seem to do so. The input operands are guaranteed not to use any of the clobbered registers, and neither will the output operands' addresses, so you can read and write the clobbered registers as many times as you like. Here is an example of multiple instructions in a template; it assumes that the subroutine _foo accepts arguments in registers 9 and 10:
asm ("movl %0,r9;movl %1,r10;call _foo"
     : /* no outputs */
     : "g" (from), "g" (to)
     : "r9", "r10");
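As an x86 counterpart to the examples above, here is a minimal sketch of a two-instruction template that names a scratch register directly and therefore lists it as clobbered. The function name add_via_ecx is made up for illustration, and the plain C branch is only a fallback so the sketch still compiles on non-x86 targets.

```c
/* Sketch only: add_via_ecx is a hypothetical name, not from the GCC manual. */
static int add_via_ecx(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %2,%%ecx\n\t"   /* %% produces a single % in the output */
             "addl %%ecx,%0"       /* a += ecx; this changes the flags */
             : "=r" (a)            /* operand 0: write-only result */
             : "0" (a), "g" (b)    /* operand 1 must share operand 0's register */
             : "ecx", "cc");       /* ecx and the condition codes are overwritten */
#else
    a += b;                        /* fallback for other targets */
#endif
    return a;
}
```

Because ecx is on the clobber list, the compiler will not place a or b there, so the template can overwrite it freely. A call such as add_via_ecx(2, 3) yields 5.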

Unless an output operand has the & constraint modifier, GNU CC may allocate it in the same register as an unrelated input operand, on the assumption that the inputs are consumed before the outputs are produced. This assumption may be false if the assembler code actually consists of more than one instruction. In such a case, use & for each output operand that may not overlap an input. *Note Modifiers::.

If you want to test the condition code produced by an assembler instruction, you must include a branch and a label in the asm construct, as follows:
asm ("clr %0;frob %1;beq 0f;mov #1,%0;0:"
     : "=g" (result)
     : "g" (input));
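The same shape can be sketched for the x86, where testl stands in for the made-up frob instruction. This is an illustration under assumptions: is_nonzero is a hypothetical name, and the C branch exists only so the sketch compiles elsewhere. Since the output is written before the input is read, the sketch marks it with &.

```c
/* Sketch only: is_nonzero is a hypothetical helper. */
static int is_nonzero(int input)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl $0,%0\n\t"      /* result = 0, written before %1 is read */
             "testl %1,%1\n\t"     /* sets the zero flag when input == 0 */
             "je 0f\n\t"           /* forward branch to local label 0 */
             "movl $1,%0\n"
             "0:"
             : "=&r" (result)      /* & keeps result out of input's register */
             : "r" (input)
             : "cc");
#else
    result = (input != 0);         /* fallback for other targets */
#endif
    return result;
}
```

Here is_nonzero(0) gives 0 and any other argument gives 1.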

This assumes your assembler supports local labels, as the GNU assembler and most Unix assemblers do.

Speaking of labels, jumps from one asm to another are not supported. The compiler's optimizers do not know about these jumps, and therefore they cannot take account of them when deciding how to optimize.

Usually the most convenient way to use these asm instructions is to encapsulate them in macros that look like functions. For example,
#define sin(x) \
({ double __value, __arg = (x); \
   asm ("fsinx %1,%0" \
        : "=f" (__value) \
        : "f" (__arg)); \
   __value; })
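For readers without a 68881, the same macro pattern might look as follows on the x86. This is a sketch under assumptions: SWAP32 is a hypothetical macro name, bswap needs a 486 or later, and the second definition is a plain C fallback for other targets. It keeps the __value and __arg naming of the macro above.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Byte-swap a 32-bit value with the x86 bswap instruction. */
#define SWAP32(x)                                        \
({ uint32_t __value, __arg = (x);                        \
   __asm__ ("bswap %0" : "=r" (__value) : "0" (__arg));  \
   __value; })
#else
/* Fallback: the same result computed with shifts and masks. */
#define SWAP32(x)                                        \
({ uint32_t __arg = (x);                                 \
   (__arg >> 24) | ((__arg >> 8) & 0xff00u) |            \
   ((__arg << 8) & 0xff0000u) | (__arg << 24); })
#endif
```

SWAP32(0x11223344u) evaluates to 0x44332211.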

Here the variable __arg is used to make sure that the instruction operates on a proper double value, and to accept only those arguments x which can convert automatically to a double.

Another way to make sure the instruction operates on the correct data type is to use a cast in the asm. This is different from using a variable __arg in that it converts more different types. For example, if the desired type were int, casting the argument to int would accept a pointer with no complaint, while assigning the argument to an int variable named __arg would warn about using a pointer unless the caller explicitly casts it.

If an asm has output operands, GNU CC assumes for optimization purposes that the instruction has no side effects except to change the output operands. This does not mean that instructions with a side effect cannot be used, but you must be careful, because the compiler may eliminate them if the output operands aren't used, or move them out of loops, or replace two with one if they constitute a common subexpression. Also, if your instruction does have a side effect on a variable that otherwise appears not to change, the old value of the variable may be reused later if it happens to be found in a register.

You can prevent an asm instruction from being deleted, moved significantly, or combined, by writing the keyword volatile after the asm. For example:
#define set_priority(x) \
asm volatile ("set_priority %0" \
              : /* no outputs */ \
              : "g" (x))

An instruction without output operands will not be deleted or moved significantly, regardless, unless it is unreachable.

Note that even a volatile asm instruction can be moved in ways that appear insignificant to the compiler, such as across jump instructions. You can't expect a sequence of volatile asm instructions to remain perfectly consecutive. If you want consecutive output, use a single asm.

It is a natural idea to look for a way to give access to the condition code left by the assembler instruction. However, when we attempted to implement this, we found no way to make it work reliably. The problem is that output operands might need reloading, which would result in additional following store instructions. On most machines, these instructions would alter the condition code before there was time to test it. This problem doesn't arise for ordinary test and compare instructions because they don't have any output operands.

If you are writing a header file that should be includable in ANSI C programs, write __asm__ instead of asm. *Note Alternate Keywords::.
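One common use of volatile together with the "memory" clobber is an asm that emits no instruction at all and serves only as a fence for the optimizer. The sketch below is an illustration, not part of the GCC manual; compiler_barrier, shared_value and store_then_reload are hypothetical names. It uses the __asm__ and __volatile__ spellings so that it also compiles in strict ANSI mode.

```c
int shared_value;   /* hypothetical variable, for illustration only */

/* Empty template: no instruction is emitted, but the "memory" clobber
   tells the compiler not to keep memory values cached in registers
   across this point. */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

static int store_then_reload(int v)
{
    shared_value = v;
    compiler_barrier();      /* the store is completed before this point */
    return shared_value;     /* read again from memory after the barrier */
}
```

This constrains only the compiler; it is not a hardware memory barrier.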


3.0 Brennan's Guide to Inline Assembly

by Brennan "Bas" Underwood
Document version 1.1.2.2

3.1 Introduction
Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times. Hopefully if you already know Intel syntax, the examples will be helpful to you.

3.2 The Syntax


So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

* Register naming: Register names are prefixed with %. To reference eax:
AT&T: %eax
Intel: eax

* Source/Destination Ordering: In AT&T syntax (which is the UNIX standard, BTW) the source is always on the left, and the destination is always on the right. So let's load ebx with the value in eax:
AT&T: movl %eax, %ebx
Intel: mov ebx, eax

* Constant value/immediate value format: You must prefix all constant/immediate values with $. Let's load eax with the address of the C variable booga, which is static.
AT&T: movl $_booga, %eax
Intel: mov eax, _booga
Now let's load ebx with 0xd00d:
AT&T: movl $0xd00d, %ebx
Intel: mov ebx, d00dh

* Operator size specification: You must suffix the instruction with one of b, w, or l to specify the width of the destination register as a byte, word or longword. If you omit this, GAS (GNU assembler) will attempt to guess. You don't want GAS to guess, and guess wrong! Don't forget it.
AT&T: movw %ax, %bx
Intel: mov bx, ax

The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...


* Referencing memory: DJGPP uses 386-protected mode, so you can forget all that real-mode addressing junk, including the restrictions on which register has what default segment, which registers can be base or index pointers. Now, we just get 6 general purpose registers. (7 if you use ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.)

Here is the canonical format for 32-bit addressing:
AT&T: immed32(basepointer,indexpointer,indexscale)
Intel: [basepointer + indexpointer*indexscale + immed32]

You could think of the formula to calculate the address as:
immed32 + basepointer + indexpointer * indexscale

You don't have to use all those fields, but you do have to have at least 1 of immed32, basepointer and you MUST add the size suffix to the operator! Let's see some simple forms of memory addressing:

o Addressing a particular C variable:
AT&T: _booga
Intel: [_booga]

Note: the underscore (_) is how you get at static (global) C variables from assembler. This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down.

o Addressing what a register points to:
AT&T: (%eax)
Intel: [eax]

o Addressing a variable offset by a value in a register:
AT&T: _variable(%eax)
Intel: [eax + _variable]

o Addressing a value in an array of integers (scaling up by 4):
AT&T: _array(,%eax,4)
Intel: [eax*4 + _array]

o You can also do offsets with the immediate value:
C code: *(p+1) where p is a char *
AT&T: 1(%eax) where eax has the value of p
Intel: [eax + 1]

o You can do some simple math on the immediate value:
AT&T: _struct_pointer+8
I assume you can do that with Intel format as well.

o Addressing a particular char in an array of 8-character records: eax holds the number of the record desired. ebx has the wanted char's offset within the record.
AT&T: _array(%ebx,%eax,8)
Intel: [ebx + eax*8 + _array]

Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp into the address, but only as the base register.
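To see the base/index/scale form in a compilable setting, here is a small sketch that reads one element of an int array with a scaled index. It uses extended asm (covered further down) so that GCC picks the registers; table and table_entry are hypothetical names, and the C branch is a fallback for non-x86 targets.

```c
#include <stddef.h>

static const int table[4] = { 2, 3, 5, 7 };   /* hypothetical sample data */

static int table_entry(size_t index)
{
    int value;
#if defined(__i386__) || defined(__x86_64__)
    /* (base,index,4) is the AT&T spelling of Intel's [base + index*4] */
    __asm__ ("movl (%1,%2,4),%0"
             : "=r" (value)
             : "r" (table), "r" (index)
             : "memory");          /* the template reads memory behind GCC's back */
#else
    value = table[index];          /* fallback for other targets */
#endif
    return value;
}
```

The index is a size_t so that it has the same width as the pointer, which the addressing mode requires. table_entry(3) returns 7.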

3.3 Basic inline assembly


The format for basic inline assembly is very simple, and much like Borland's method.
asm ("statements");
Pretty simple, no? So
asm ("nop");
will do nothing of course, and
asm ("cli");
will stop interrupts, with
asm ("sti");
of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back.
asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)

It's really meant for issuing instructions for which there is no equivalent in C and which don't touch the registers. But if you do touch the registers, and don't fix things at the end of your asm statement, like so:
asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");
then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register, and might plan on using later. For that, you need:


3.4 Extended inline assembly


The basic format of the inline assembly stays much the same, but now gets Watcom-like extensions to allow input arguments and output arguments. Here is the basic format:
asm ( "statements" : output_registers : input_registers : clobbered_registers);

Let's just jump straight to a nifty example, which I'll then explain:
asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest. Let's look at this bit by bit.

asm ("cld\n\t"
We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.

"rep\n\t"
"stosl"
Notice that GAS requires the rep prefix to occupy a line of its own. Notice also that stos has the l suffix to make it move longwords.

: /* no output registers */
Well, there aren't any in this function.

: "c" (count), "a" (fill_value), "D" (dest)
Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax thru the loop, and save a movl once per loop.

: "%ecx", "%edi" );
And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist. (Current GCC releases reject a clobber that names a register also used by an input operand; there, registers the asm changes are declared as outputs instead.)

Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.

Here's the list of register loading codes that you'll be likely to use:
a     eax
b     ebx
c     ecx
d     edx
S     esi
D     edi
I     constant value (0 to 31)
q,r   dynamically allocated register (see below)
g     eax, ebx, ecx, edx or variable in memory
A     eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a variable, you must include "memory" as one of The Clobbered. This is in case you wrote to a variable that GCC thought it had in a register. This is the same as clobbering all registers. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx, and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values. It's possible to let GCC pick the register(s). You do this:
asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r" that it would be possible to use esi or edi in that instruction. If not, use "q".

Now, you might wonder, how to determine how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're messing with it. Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above.
asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

This also works, by the way:
asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:
* Note that we don't have to put ebx on the clobberlist, GCC knows it goes into x. Therefore, since it can know the value of ebx, it isn't considered clobbered.
* Notice that in extended asm, you must prefix registers with %% instead of just %. Why, you ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a %e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you meant anyway.

Important note: If your assembly statement must execute where you put it, (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. To be ultra-careful, use
__asm__ __volatile__ (...whatever...);
However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.
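Wrapped in a function, the two-operand form above becomes something you can compile and call. This is a minimal sketch; times5 as a function name is an assumption of this example, and the C branch is a fallback for non-x86 targets.

```c
/* Sketch only: a function wrapper around the leal example. */
static int times5(int x)
{
    int x_times_5;
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is the first operand listed (the output), %1 the second (the input) */
    __asm__ ("leal (%1,%1,4), %0"
             : "=r" (x_times_5)
             : "r" (x));
#else
    x_times_5 = x * 5;             /* fallback for other targets */
#endif
    return x_times_5;
}
```

No volatile and no clobbers are needed here: lea has no side effects and leaves the flags alone, so GCC is free to optimize around it. times5(7) returns 35.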

3.5 Some useful examples


#define disable() __asm__ __volatile__ ("cli");
#define enable() __asm__ __volatile__ ("sti");

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

These multiply arg1 by 3, 5, or 9 and put them in arg2. You should be ok to do: times5(x,x); as well.
#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "movsl" \
  : : "S" (src), "D" (dest), "c" (numwords) \
  : "%ecx", "%esi", "%edi" )

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.

#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "stosl" \
  : : "a" (value), "D" (dest), "c" (numwords) \
  : "%ecx", "%edi" )

Same as above but for memset(), which doesn't get inlined no matter what (for now.)
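Current GCC releases do not accept a clobber that names a register also used for an input, so a present-day version of rep_stosl has to describe ecx and edi differently. The sketch below is one way to do that, not the author's macro: it declares both as read-write operands with +, puts rep on the same line as stosl (which current assemblers accept), and adds a "memory" clobber because the buffer is written. fill32 and fill32_selftest are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Store value into numwords consecutive 32-bit words starting at dest. */
static void fill32(uint32_t *dest, uint32_t value, size_t numwords)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"
                          "rep stosl"
                          : "+D" (dest), "+c" (numwords) /* both changed by rep stosl */
                          : "a" (value)
                          : "memory", "cc");
#else
    while (numwords--)             /* fallback for other targets */
        *dest++ = value;
#endif
}

/* Hypothetical helper: fills a small buffer and counts matching words. */
static size_t fill32_selftest(uint32_t value)
{
    uint32_t buf[8] = { 0 };
    size_t i, hits = 0;

    fill32(buf, value, 8);
    for (i = 0; i < 8; i++)
        hits += (buf[i] == value);
    return hits;
}
```

fill32_selftest(0xd00d) returns 8 when every word was written.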
#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
  ".byte 0x0f; .byte 0x31" \
  : "=A" (llptr) \
  : : "eax", "edx"); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr.
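On current toolchains the instruction can be written by name, and the two halves are easier to handle as separate outputs, since "A" does not describe the edx:eax pair in 64-bit code. The following is a sketch under those assumptions; read_tsc is a hypothetical name, and on non-x86 targets it simply returns 0.

```c
#include <stdint.h>

/* Sketch only: returns the time-stamp counter, or 0 where there is none. */
static uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* rdtsc leaves the low half in eax and the high half in edx */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
#else
    return 0;                      /* no time-stamp counter assumed here */
#endif
}
```

volatile matters here: without it, GCC could merge two calls into one because the asm has outputs and no inputs.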

3.6 The End


The End?! Yah, I guess so. If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a decent number of general registers.) It might seem weird to you at first, but it's really more logical than Intel format, and has no ambiguities.

If I still haven't answered a question of yours, look in the Info pages for more information, particularly on the input/output registers. You can do some funky stuff like use "A" to allocate two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that aren't really used as much as "q" and "r".

Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please, e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck, mail me to say "boogabooga." It's the least you can do.

Thanks to Eric J. Korpela <korpela@ssl.Berkeley.EDU> for some corrections.

Page written and provided by Brennan Underwood. Copyright 1996 Brennan Underwood. Share and enjoy! Page created with vi, God's own editor.


4.0 A Brief Tutorial on GCC inline asm (x86 biased)


colin@nyx.net, 20 April 1998

I am a great fan of GCC's inline asm feature, because there is no need to second-guess or outsmart the compiler. You can tell the compiler what you are doing and what you expect of it, and it can work with it and optimize your code. However, on a convoluted processor like the x86, describing just what is going on can be quite a complex job. In the interest of a faster kernel through appropriate usage of this powerful tool, here is an introduction to its use.

4.1 Extended asm, an introduction.


In a nice clean register-register RISC architecture, accessing an occasional foo instruction is quite simple. You just write:
asm("foo %1,%2,%0" : "=r" (output) : "r" (input1), "r" (input2));

The part before the first colon is very much like the semi-standard asm() feature that has been in many C compilers since the K&R days. The string is pasted into the compiler's assembly output at the current location. However, GCC is rather cleverer. What will actually appear in the output of "gcc -O -S foo.c" (a file named foo.s) is:
#APP
foo r17,r5,r9
#NO_APP

The "#APP" and "#NO_APP" parts are instructions to the assembler that briefly put it into normal operating mode, as opposed to the special high-speed "compiler output" mode that turns off every feature that the compiler doesn't use as well as a lot of error-checking. For our purposes, it's convenient because it highlights the part of the code we're interested in. Between, you will see that the "%1" and so forth have turned into registers. This is because GCC replaced "%0", "%1" and "%2" with registers holding the first three arguments after the colon.

That is, r17 holds input1, r5 holds input2, and r9 holds output.

It's perfectly legal to use more complex expressions like:
asm("foo %1,%2,%0" : "=r" (ptr->vtable[3](a,b,c)->foo.bar[baz]) : "r" (gcc(is) + really(damn->cool)), "r" (42));


GCC will treat this just like:


register int t0, t1, t2;
t1 = gcc(is) + really(damn->cool);
t2 = 42;
asm("foo %1,%2,%0" : "=r" (t0) : "r" (t1), "r" (t2));
ptr->vtable[3](a,b,c)->foo.bar[baz] = t0;

The general form of an asm() is

asm( "code" : outputs : inputs : clobbers);

Within the "code", %0 refers to the first argument (usually an output, unless there are no outputs), %1 to the second, and so forth. It only goes up to %9. Note that GCC prepends a tab and appends a newline to the code, so if you want to include multi-line asm (which is legal) and you want it to look nice in the asm output, you should separate lines with "\n\t". (You'll see lots of examples of this in the Linux source.) It's also legal to use ";" as a separator to put more than one asm statement on a line.

There are option letters that you can put between the % and the digit to print the operand specially; more on this later.

Each output or input in the comma-separated list has two parts, "constraints" and (value). The (value) part is pretty straightforward. It's an expression. For outputs, it must be an lvalue, i.e. something that is legal to have on the left side of an assignment.

The constraints are more interesting. All outputs must be marked with "=", which says that this operand is assigned to. I'm not sure why this is necessary, since you also have to divide up outputs and inputs with the colon, but I'm not inclined to make a fuss about it, since it's easy to do once you know.

The letters that come after that give permitted operands. There are more choices than you might think. Some depend on the processor, but there are a few that are generic.

"r", as in the examples above, means a register. "rm" means a register or memory. "ri" means a register or an immediate value. "g" is "general"; it can be anything at all. It's usually equivalent to "rim", but your processor may have even more options that are included. "o" is like "m", but "offsettable", meaning that you can add a small offset to it. On the x86, all memory operands are offsettable, but some machines don't support indexing and displacement at the same time, or have something like the 680x0's autoincrement addressing mode that doesn't support a displacement.

Capital letters starting with "I" are usually assigned to immediate values in a certain range. For example, a lot of RISC machines allow either a register or a short immediate value. If our machine is like the DEC Alpha, and allows a register or a 16-bit immediate, you could write
asm("foo %1,%2,%0" : "=r" (output) : "r" (input1), "rI" (input2));


and if input2 were, say, 42, the compiler would use an immediate constant in the instruction. The x86-specific constraints are defined later.

4.2 A few notes about inputs


An input may be a temporary copy, but it may not be. Unless you tell GCC that you are going to modify that location (described later in "equivalence constraints"), you must not alter any inputs. GCC may, however, elect to place an output in the same register as an input if it doesn't need the input value any more. You must not make assumptions either way. If you need to have it one way or the other, there are ways (described later) to tell GCC what you need. The rule in GCC's inline asm is, say what you need and then get out of the optimizer's way.

4.3 x86 assembly code


The GNU tools used in Linux use an AT&T-developed assembly syntax that is different from the Intel-developed one that you see in a lot of example code. It's a lot simpler, actually. It doesn't have any of the DWORD PTR stuff that the Intel syntax requires.

The most significant difference, however, is a major one and easy to get confused by. While Intel uses "op dest,src", AT&T syntax uses "op src,dest". DON'T FORGET THIS. If you're used to Intel syntax, this can take quite a while to get used to.

The easy way to know which flavour of asm syntax you're reading is to look for all the % symbols. AT&T names the registers %eax, %ebx, etc. This avoids the need for a kludge like putting _ in front of all the function and variable names to avoid using perfectly good C names like esp. It's easy enough to read, but don't forget it when writing.

The other major difference is that the operand size is clear from the instruction. You don't have just "inc", you have "incb", "incw" and "incl" to increment 8, 16 or 32 bits. If the size is clear from the operands, you can just write "inc" (e.g. "inc %eax"), but if it's a memory operand, rather than writing "inc DWORD PTR foo" you just write "incl foo". "inc foo" is an error; the assembler doesn't try to keep track of the type of anything. Writing "incl %al" is an error which the assembler catches.

Immediate values are written with a leading $. Thus, "movl foo,%eax" copies the contents of memory location foo into %eax. "movl $foo,%eax" copies the address of foo. "movl 42,%eax" is a fetch from an absolute address. "movl $42,%eax" is an immediate load.


Addressing modes are written offset(base,index,scale). You may leave out anything irrelevant. So (%ebx) is legal, as is -44(%ebx,%eax), which is equivalent to -44(%ebx,%eax,1). Legal scales are 1, 2, 4 and 8.
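The difference between an immediate and a memory operand can be tried out with two tiny functions. This is only a sketch: answer_cell, load_immediate and load_from_memory are hypothetical names, and the memory operand is passed through an "m" constraint so that GCC writes out the address itself. The C branches are fallbacks for non-x86 targets.

```c
int answer_cell = 42;   /* hypothetical global used as the memory operand */

static int load_immediate(void)
{
    int v;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl $42,%0" : "=r" (v));   /* $42 is the constant 42 */
#else
    v = 42;
#endif
    return v;
}

static int load_from_memory(void)
{
    int v;
#if defined(__i386__) || defined(__x86_64__)
    /* %1 expands to a memory reference, so this fetches the contents */
    __asm__ ("movl %1,%0" : "=r" (v) : "m" (answer_cell));
#else
    v = answer_cell;
#endif
    return v;
}
```

Both return 42, the first from the instruction stream and the second from the variable.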

4.4 Equivalence constraints


Sometimes, especially on two-address machines like the x86, you need to use the same register for output and for input. Although if you look into the GCC documentation, you'll see a useful-looking "+" constraint character, this isn't available to inline asm in the GCC this tutorial was written for. What you have to do instead is to use a special constraint like "0":
asm("foo %1,%0" : "=r" (output) : "r" (input1), "0" (input2));

This says that input2 has to go in the same place as the output, so %2 and %0 are the same thing. (Which is why %2 isn't actually mentioned anywhere.) Note that it is perfectly legal to have different variables for input and output even though they both use the same register. GCC will do any necessary copying to temporary registers for you.
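With a real two-address instruction in place of foo, the pattern might look like the sketch below. The function and variable names are made up for this example, and the C branch is a fallback for non-x86 targets.

```c
/* Sketch only: subl computes %0 = %0 - %1, so the minuend must start in %0. */
static int subtract(int minuend, int subtrahend)
{
    int difference;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("subl %1,%0"
             : "=r" (difference)
             : "r" (subtrahend), "0" (minuend)  /* operand 2 shares %0's register */
             : "cc");                           /* subl changes the flags */
#else
    difference = minuend - subtrahend;          /* fallback for other targets */
#endif
    return difference;
}
```

The output variable and the matched input are different C variables here; GCC copies minuend into whatever register it chooses for %0 before the instruction runs. subtract(10, 3) returns 7.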

4.5 Constraints on the x86


The i386 has *lots* of register classes, designed for anything remotely useful. Common ones are defined in the "constraints" section of the GCC manual. Here are the most useful:
g - general effective address
m - memory effective address
r - register
i - immediate value, 0..0xffffffff
n - immediate value known at compile time. ("i" would allow an address known only at link time)

But there are some i386-specific ones described in the processor-specific part of the manual and in more detail in GCC's "i386.h":
q - byte-addressable register (eax, ebx, ecx, edx)
A - eax or edx
a, b, c, d, S, D - eax, ebx, ..., esi, edi only
I - immediate 0..31
J - immediate 0..63
K - immediate 255
L - immediate 65535
M - immediate 0..3 (shifts that can be done with lea)
N - immediate 0..255 (one-byte immediate value)
O - immediate 0..32

There are some more for floating-point registers, but I won't go into those. The very special cases like "K" are mostly used inside GCC in alternative code sequences, providing a special-case way to do something like ANDing with 255.

But something like "I" is useful, for example the x86 rotate left:
asm("roll %1,%0" : "=g" (result) : "cI" (rotate), "0" (input));


(See the section on x86 assembly syntax if you wonder why the extra l is on rol.)
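A compilable variant of the rotate is sketched below. Two details are assumptions of this sketch rather than part of the tutorial: the count is given an 8-bit type so that a register operand prints as %cl (the only register roll accepts for its count), and the output uses "=r" for simplicity. rotate_left is a hypothetical name, and the C branch is a fallback for non-x86 targets.

```c
#include <stdint.h>

/* Sketch only: rotate a 32-bit value left by count bits (taken modulo 32). */
static uint32_t rotate_left(uint32_t input, unsigned char count)
{
    uint32_t result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("roll %1,%0"
             : "=r" (result)
             : "cI" (count), "0" (input)  /* count in %cl, or a constant 0..31 */
             : "cc");
#else
    count &= 31;                          /* fallback for other targets */
    result = count ? (input << count) | (input >> (32 - count)) : input;
#endif
    return result;
}
```

rotate_left(0x80000001, 1) returns 3: the top bit wraps around to bit 0.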

4.6 Advanced constraints


In the GCC manual, constraints and so on are described in most detail in the section on writing machine descriptions for ports. GCC, not surprisingly, uses the same constaints mechanism internally to compile C code. Heres a summary. = has already been discussed, to mark an output. No, I dont know why its needed in inline asm, but its not worth xing. + is described in the gcc manual, but is not legal in inline asm. Sorry. % says that this operand and the next one may be switched at the compilers convenience; the arguments are commutative. Many operations (+, *, &, |, ^) have this property, but the options permitted in the instruction set may not be as general. For example, on a RISC machine which lets the second operand be an immediate value (in the I range), you could specify an add instruction like:
asm(add %1,%2,%0 : =r (output) : %r (input1), rI (input2));

, separates a list of alternative constraints. Each input and output must have the same length list of alternatives, and one element of the list is chosen. For example, the x86 permits register-memory and memory-register operations, but not memory-memory. So an add could be written as:
asm(add %1,%0 : =r,rm (output) : %g,ri (input1), 0,0 (input2));

This says that if the output is a register, input1 may be anything, but if the output is memory, the input may only be a register or an immediate value. And input2 must be in the same place as the output, although you can swap things and place input1 there instead. If there are multiple options listed and the compiler has no preference, it will choose the first one. Thus, if there's a minor difference in timing or some such, list the faster one first.

? in one alternative says that an alternative is discouraged. This is important for compiler-writers who want to encourage the fastest code, but is getting pretty esoteric for inline asm.

& says that an output operand is written to before the inputs are read, so this output must not be the same register as any input. Without this, gcc may place an output and an input in the same register even if not required by a "0" constraint. This is very useful, but is mentioned here because it's specific to an alternative. Unlike = and %, but like ?, you have to include it with each alternative to which it applies.


Note that there is no way to encode more complex information, like "this output may not be in the same place as *that* input, but may share a register with that *other* input". Each output either may share a register with any input, or with none. In inline asm, you usually specify this with every alternative, since you can't change the order of operations depending on the option selected. In GCC's internal code generation, there are provisions for producing different code depending on the register alternative chosen, but you can't do that with inline asm. One place you might use it is when you have the possibility of the output overlapping with input two, but not input one. E.g.
asm("foo %1,%0; bar %2,%0" : "=r,&r" (out) : "r,r" (in1), "0,r" (in2));

This says that either in2 is in the same register as out, or nothing is. However, with more operands, the number of possibilities quickly mushrooms and GCC doesn't cope gracefully with large numbers of alternatives.
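To make the & modifier concrete, here is a minimal sketch with a single alternative. The function name add_early and the fallback branch are assumptions made for the example. The output is written by the first instruction while the second input still has to be read, so without & GCC would be free to put out and b in the same register and the addition would then use the wrong value.

```c
/* Returns a + b; the output register is written before input b is read. */
static int add_early(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    __asm__("movl %1,%0\n\t"   /* out = a: the output is written here ...      */
            "addl %2,%0"       /* ... and input b is only read afterwards      */
            : "=&r" (out)      /* & keeps out away from every input register   */
            : "r" (a), "r" (b)
            : "cc");
    return out;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return a + b;
#endif
}
```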

4.7 Clobbers
Sometimes an instruction knocks out certain specific registers. The most common example of this is a function call, where the called function is allowed to do whatever it likes with some registers. If this is the case, you can list specific registers that get clobbered by an operation after the inputs. The syntax is not like constraints, you just provide a comma-separated list of registers in string form. On the 80x86, they're ax, bx, si, di, etc.

There are two special cases for clobbered values. One is "memory", meaning that this instruction writes to some memory (other than a listed output) and GCC shouldn't cache memory values in registers across this asm. An asm memcpy() implementation would need this. You do *not* need to list "memory" just because outputs are in memory; gcc understands that. The second is "cc". It's not necessary on all machines, and I haven't figured it out for the x86 (I don't think it is), but it's always legal to specify, and means that the instructions mess up the condition codes.

Note that GCC will not use a clobbered register for inputs or outputs. GCC 2.7 would let you do it anyway, specifying an input in class "a" and saying that "ax" is clobbered. GCC 2.8 and egcs are getting pickier, and complaining that there are no free registers in class "a" available. This is not the way to do it. If you corrupt an input register, include a dummy output in the same register, the value of which is never used. E.g.
int dummy;
asm("munge %0" : "=r" (dummy) : "0" (input));
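Both points (the "memory" clobber and dummy outputs for corrupted inputs) show up in an asm byte copy. The following is a sketch under assumptions: the name copy_bytes is invented, the direction flag is assumed to be clear as the usual x86 calling conventions require, and the non-x86 branch is a plain C loop added only so the example builds elsewhere.

```c
#include <stddef.h>

/* Copy n bytes from src to dst with "rep movsb". */
static void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    void *d;            /* dummy outputs: rep movsb changes the D, S and c   */
    const void *s;      /* registers, so each input is tied to an output     */
    size_t c;           /* whose value is never used                         */
    __asm__ __volatile__("rep movsb"
            : "=D" (d), "=S" (s), "=c" (c)
            : "0" (dst), "1" (src), "2" (n)
            : "memory");   /* writes memory that is not a listed output */
    (void)d; (void)s; (void)c;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    unsigned char *dp = dst;
    const unsigned char *sp = src;
    while (n--)
        *dp++ = *sp++;
#endif
}
```

Listing the three registers as clobbers instead would not work, for the reason given above: GCC will not use a clobbered register for an input.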


4.8 Temporary registers


People also sometimes erroneously use clobbers for temporary registers. The right way is to make up a dummy output, and use "=r" or "=&r" depending on the permitted overlap with the inputs. GCC allocates a register for the dummy value. The difference is that GCC can pick a convenient register, so it has more flexibility.
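A small sketch of the dummy-output technique, with the invented name times3 and an added fallback branch as assumptions. The scratch register tmp is written before the input is read for the last time, so it needs "=&r"; the real output is written only by the final instruction, so plain "=r" is enough and GCC may let it share a register with x.

```c
/* Returns 3 * x, using a compiler-chosen scratch register. */
static int times3(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    int out, tmp;              /* tmp is the dummy output used as a temporary */
    __asm__("movl %2,%1\n\t"   /* tmp = x                                      */
            "addl %1,%1\n\t"   /* tmp = 2*x                                    */
            "addl %2,%1\n\t"   /* tmp = 3*x (x is still read after tmp was written) */
            "movl %1,%0"       /* out = tmp                                    */
            : "=r" (out), "=&r" (tmp)
            : "r" (x)
            : "cc");
    return out;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return 3 * x;
#endif
}
```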

4.9 const and volatile


There are two optimization hints that you can give to an asm statement.

asm volatile(...) statements may not be deleted or significantly reordered; the volatile keyword says that they do something magic that the compiler shouldn't play with too much. GCC will delete ordinary asm() blocks if the outputs are not used, and will reorder them slightly to be convenient to where the outputs are. (asm blocks with no outputs are assumed to be volatile by default.)

asm const() statements are assumed to produce outputs that depend only on the inputs, and thus can be subject to common subexpression optimization and can be hoisted out of loops. The most common example of an output that does *not* depend only on an input is a pointer that is fetched. *p may change from time to time even if p does not change. Thus, an asm block that fetches from a pointer should not include a const. An example of something that is good is a coprocessor instruction to compute sin(x). If GCC knows that two calls have the same value of x, it can compute sin(x) only once. For example, compare:
int foo(int x)
{
        int i, y, total;

        total = 0;
        for (i = 0; i < 100; i++) {
                asm volatile("foo %1,%0" : "=r" (y) : "g" (x));
                total += y;
        }
        return total;
}


then try changing that to const after the asm. The code (on an x86) looks like:
func1:
        xorl %ecx,%ecx
        pushl %ebx
        movl %ecx,%edx
        movl 8(%esp),%ebx
        .align 4
.L7:
#APP
        foo %ebx,%eax
#NO_APP
        addl %eax,%ecx
        incl %edx
        cmpl $99,%edx
        jle .L7
        movl %ecx,%eax
        popl %ebx
        ret

which then changes to (in the const case):


func2:
        xorl %edx,%edx
#APP
        foo 4(%esp),%ecx
#NO_APP
        movl %edx,%eax
        .align 4
.L13:
        addl %ecx,%edx
        incl %eax
        cmpl $99,%eax
        jle .L13
        movl %edx,%eax
        ret

I'm still not completely thrilled with the code (why put the loop counter in %eax instead of total, which gets returned), but you can see how it improves.

4.10 Alternate keywords


__asm__() is a legal alias for asm(), and it is legal (and produces no warnings) even when in strict-ANSI mode or when warning about non-portable constructs. Otherwise, it is equivalent.

4.11 Output substitutions


Sometimes you want to include a value in an asm statement in an unusual way. For example, you could use the lea instruction to do something hairy like
asm("lea %1(%2,%3,1<<%4),%0" : "=r" (out) : "%i" (in1), "r" (in2), "r" (in3), "M" (logscale));

This looks like a way to generate a legal lea instruction with all the possible bells and whistles. There's only one problem. When GCC substitutes the immediates in1 and logscale, it's going to produce something like:
lea $-44(%ebx,%eax,1<<$2),%ecx


which is a syntax error. The $ signs on the constants are not useful in this context. So there are modifier characters. The one applicable in this context is c, which means to omit the usual immediate value information. The correct asm is
asm("lea %c1(%2,%3,1<<%c4),%0" : "=r" (out) : "%i" (in1), "r" (in2), "r" (in3), "M" (logscale));

which will produce


lea -44(%ebx,%eax,1<<2),%ecx

as desired. There are a few others mentioned in the GCC manual as generic:

%c0 substitutes the immediate value %0, but without the immediate syntax.
%n0 substitutes like %c0, but the negated value.
%l0 substitutes like %c0, but with the syntax expected of a jump target. (This is usually the same as %c0.)

And then there are the x86-specific ones. These are, unfortunately, only listed in the i386.h header file in the GCC source (config/i386/i386.h), so you have to dig a bit for them.

%k0 prints the 32-bit form of an operand. %eax, etc.
%w0 prints the 16-bit form of an operand. %ax, etc.
%b0 prints the 8-bit form of an operand. %al, etc.
%h0 prints the high 8-bit form of a register. %ah, etc.
%z0 prints the opcode suffix corresponding to the operand type: b, w or l.

By default, %0 prints a register in the form corresponding to the argument size. E.g. asm("inc %0" : "=r" (out) : "0" (in)) will print as inc %al, inc %ax or inc %eax depending on the type of out. For example, byte-swapping on a non-486:
asm("xchg %b0,%h0; roll $16,%0; xchg %b0,%h0" : "=q" (x) : "0" (x));

This says that x must be in a byte-addressable register and proceeds to swap the bytes to big-endian form. It's legal to use the %w and %b forms on objects that aren't registers, it just makes no difference. Using %b and %h on non-byte-addressable registers tends to make the compiler abort, so don't do that.
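Here is the byte swap as a complete function, offered as a sketch. The name bswap32 and the fallback branch are assumptions. The sketch also swaps in the "Q" constraint (registers a, b, c, d) for the "q" used above, because on x86-64 "q" also admits registers that have no high-byte half and %h would then fail.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using %b0 and %h0. */
static uint32_t bswap32(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("xchgb %b0,%h0\n\t"   /* swap the two low bytes                  */
            "rorl $16,%0\n\t"     /* bring the two high bytes down           */
            "xchgb %b0,%h0"       /* swap them as well                       */
            : "=Q" (x)            /* a, b, c or d: registers with an %?h half */
            : "0" (x)
            : "cc");
    return x;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
#endif
}
```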


%z is rather cool. For example, consider the following code:


#define xchg(m, in, out) \
        asm("xchg%z0 %2,%0" \
            : "=g" (*(m)), "=r" (out) : "1" (in))

int bar(void *m, int x)
{
        xchg((char *)m, (char)x, x);
        xchg((short *)m, (short)x, x);
        xchg((int *)m, (int)x, x);
        return x;
}

This produces, as assembly output,


.globl bar
        .type bar,@function
bar:
        movl 4(%esp),%eax
        movb 8(%esp),%dl
#APP
        xchgb %dl,(%eax)
        xchgw %dx,(%eax)
        xchgl %edx,(%eax)
#NO_APP
        movl %edx,%eax
        ret

(Re-using x is a way to make sure that nothing got optimized away.) It's not really needed here because the size of the %2 register lets you get away with just xchg, but there are situations where it's nice to have an operand size.

4.12 Extra % patterns


Some % substitutions don't specify an argument. The most common one is %%, which comes out as a single %. The second is %=, which generates a unique number for each asm() block. (Each time it is used if inlined or used in a macro.) This can be used for temporary labels and so on.
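As an illustration of %= for temporary labels, the sketch below computes a signed maximum with a conditional jump. The function name max_int, the label prefix .Lmax_done and the fallback branch are assumptions made for the example. Because %= expands to a different number in every asm instance, the label stays unique even if the function is inlined several times into one caller.

```c
/* Returns the larger of a and b, using a local label built with %=. */
static int max_int(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    __asm__("movl %1,%0\n\t"          /* out = a                              */
            "cmpl %2,%0\n\t"          /* compare out with b                   */
            "jge .Lmax_done%=\n\t"    /* out >= b: keep it                    */
            "movl %2,%0\n"            /* otherwise out = b                    */
            ".Lmax_done%=:"           /* unique label for this asm instance   */
            : "=&r" (out)             /* written before b is read, hence &    */
            : "r" (a), "r" (b)
            : "cc");
    return out;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return a > b ? a : b;
#endif
}
```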

4.13 Examples
Some code that was in include/asm-i386/system.h:
#define _set_tssldt_desc(n,addr,limit,type) \
__asm__ __volatile__ ("movw %3,0(%2)\n\t" \
        "movw %%ax,2(%2)\n\t" \
        "rorl $16,%%eax\n\t" \
        "movb %%al,4(%2)\n\t" \
        "movb %4,5(%2)\n\t" \
        "movb $0,6(%2)\n\t" \
        "movb %%ah,7(%2)\n\t" \
        "rorl $16,%%eax" \
        : "=m"(*(n)) \
        : "a" (addr), "r"(n), "ri"(limit), "i"(type))


It's obvious that the writer didn't know how to take optimal advantage of this (admittedly complex, but x86 addressing *is* complex) facility. This could be rewritten to use any register instead of %eax:
#define _set_tssldt_desc(n,addr,limit,type) \
__asm__ __volatile__ ("movw %w3,0(%2)\n\t" \
        "movw %w1,2(%2)\n\t" \
        "rorl $16,%1\n\t" \
        "movb %b1,4(%2)\n\t" \
        "movb %4,5(%2)\n\t" \
        "movb $0,6(%2)\n\t" \
        "movb %h1,7(%2)\n\t" \
        "rorl $16,%1" \
        : "=m"(*(n)) : "q" (addr), "r"(n), "ri"(limit), "ri"(type))

You notice here that *n is listed as an output, so GCC knows that it's modified, but actually addressing it is done relative to n as an input register everywhere because of the need to compute an offset. The problem is that there is no syntactic way to encode an offset from a given address. If the address is 40(%eax) then an offset of 2 can be made by prepending 2+ to it. But if the address is (%eax) then 2+(%eax) is not valid. Tricks like 2+0 fall flat because 040 is taken as octal and gets translated into 32.

BUT THERE'S NEWS (19 April 1998): gas will actually Do The Right Thing with 2+(%eax), just emit a warning. Having seen this, a gas maintainer (Alan Modra) decided to make the warning go away in this case, so in some near future version you will be able to do it. With this fix (or putting up with the warning), you could write the above as:
#define _set_tssldt_desc(n,addr,limit,type) \
__asm__ __volatile__ ("movw %w2,%0\n\t" \
        "movw %w1,2+%0\n\t" \
        "rorl $16,%1\n\t" \
        "movb %b1,4+%0\n\t" \
        "movb %3,5+%0\n\t" \
        "movb $0,6+%0\n\t" \
        "movb %h1,7+%0\n\t" \
        "rorl $16,%1" \
        : "=o"(*(n)) : "q" (addr), "ri"(limit), "i"(type))

The "o" constraint is just like "m", except that it's "offsettable"; adding a small value to it leaves a valid address. On the x86, there is no distinction, so it's not really necessary, but on the 68000, for example, you can't add an offset to a postincrement addressing mode.


If neither the warning nor waiting is acceptable, a fix is to list each possible offset as a different output (here we're using the fact that n is a char *):
__asm__ __volatile__ ("movw %w7,%0\n\t" \
        "movw %w6,%1\n\t" \
        "rorl $16,%6\n\t" \
        "movb %b6,%2\n\t" \
        "movb %b8,%3\n\t" \
        "movb $0,%4\n\t" \
        "movb %h6,%5\n\t" \
        "rorl $16,%6" \
        : "=m"(*(n)), \
          "=m"((n)[2]), \
          "=m"((n)[4]), \
          "=m"((n)[5]), \
          "=m"((n)[6]), \
          "=m"((n)[7]) \
        : "q" (addr), "g"(limit), "iqm"(type))

As you can see, this gets a bit ugly when you have lots of offsets, but it works just the same.

4.14 Conclusion
I hope this has been of use to some folks. GCC's inline asm features are really cool because you can just do the little bit that you want and let the compiler optimize the rest. This has the unfortunate side effect that you have to learn how to explain to the compiler what's going on. But it's worth it, really!


Using Assembly Language in Linux.


by Phillip
phillip@ussrback.com
Last updated: Monday 8th January 2001

Contents:
Introduction
Intel and AT&T Syntax
    Prefixes
    Direction of Operands
    Memory Operands
    Suffixes
Syscalls
    Syscalls with < 6 args
    Syscalls with > 5 args
    Socket syscalls
Command Line Arguments
GCC Inline ASM
Compiling
Further reference/Links
Example Code

Introduction.
This article will describe assembly language programming under Linux. Contained within the bounds of the article is a comparison between Intel and AT&T syntax asm, a guide to using syscalls and an introductory guide to using inline asm in gcc. This article was written due to the lack of (good) info on this field of programming (inline asm section in particular), in which case I should remind thee that this is not a shellcode writing tutorial because there is no lack of info in this field. Various parts of this text I have learnt about through experimentation and hence may be prone to error. Should you find any of these errors on my part, do not hesitate to notify me via email and enlighten me on the given issue. There is only one prerequisite for reading this article, and that's obviously a basic knowledge of x86 assembly language and C.

Intel and AT&T Syntax.


Intel and AT&T syntax assembly language are very different from each other in appearance, and this will lead to confusion when one first comes across AT&T syntax after having learnt Intel syntax first, or vice versa. So let's start with the basics.

Prefixes.
In Intel syntax there are no register prefixes or immed prefixes. In AT&T however registers are prefixed with a % and immeds are prefixed with a $. Intel syntax hexadecimal or binary immed data are suffixed with h and b respectively. Also if the first hexadecimal digit is a letter then the value is prefixed by a 0. Example:

Intel Syntax            AT&T Syntax

mov     eax,1           movl    $1,%eax
mov     ebx,0ffh        movl    $0xff,%ebx
int     80h             int     $0x80

Direction of Operands.
The direction of the operands in Intel syntax is opposite from that of AT&T syntax. In Intel syntax the first operand is the destination, and the second operand is the source whereas in AT&T syntax the first operand is the source and the second operand is the destination. The advantage of AT&T syntax in this situation is obvious. We read from left to right, we write from left to right, so this way is only natural. Example:

Intel Syntax            AT&T Syntax

instr   dest,source     instr   source,dest
mov     eax,[ecx]       movl    (%ecx),%eax

Memory Operands.
Memory operands as seen above are different also. In Intel syntax the base register is enclosed in [ and ] whereas in AT&T syntax it is enclosed in ( and ). Example:

Intel Syntax            AT&T Syntax

mov     eax,[ebx]       movl    (%ebx),%eax
mov     eax,[ebx+3]     movl    3(%ebx),%eax

The AT&T form for instructions involving complex operations is very obscure compared to Intel syntax. The Intel syntax form of these is segreg:[base+index*scale+disp]. The AT&T syntax form is %segreg:disp(base,index,scale). Index/scale/disp/segreg are all optional and can simply be left out. Scale, if not specified and index is specified, defaults to 1. Segreg depends on the instruction and whether the app is being run in real mode or pmode. In real mode it depends on the instruction whereas in pmode it's unnecessary. Immediate data should not be $ prefixed in AT&T when used for scale/disp. Example:

Intel Syntax

instr   foo,segreg:[base+index*scale+disp]
mov     eax,[ebx+20h]
add     eax,[ebx+ecx*2h]
lea     eax,[ebx+ecx]
sub     eax,[ebx+ecx*4h-20h]

AT&T Syntax

instr   %segreg:disp(base,index,scale),foo
movl    0x20(%ebx),%eax
addl    (%ebx,%ecx,0x2),%eax
leal    (%ebx,%ecx),%eax
subl    -0x20(%ebx,%ecx,0x4),%eax

As you can see, AT&T is very obscure. [base+index*scale+disp] makes more sense at a glance than disp(base,index,scale).
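The disp(base,index,scale) form can be tried out from C with a single lea. This is only a sketch: the function name lea_demo, the constants 4 and 8, and the fallback branch are assumptions chosen for the example. Note that neither the displacement nor the scale carries a $ prefix.

```c
/* Computes 4 + base + index*8 with one AT&T-syntax lea. */
static int lea_demo(int base, int index)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    /* disp(base,index,scale): disp = 4, scale = 8, both written without $ */
    __asm__("leal 4(%1,%2,8),%0"
            : "=r" (out)
            : "r" (base), "r" (index));
    return out;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return 4 + base + index * 8;
#endif
}
```

In Intel syntax the same operand would read [base+index*8+4].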

Suffixes.
As you may have noticed, the AT&T syntax mnemonics have a suffix. The significance of this suffix is that of operand size. l is for long, w is for word, and b is for byte. Intel syntax has similar directives for use with memory operands, i.e. byte ptr, word ptr, dword ptr. "dword" of course corresponding to "long". This is similar to type casting in C but it doesn't seem to be necessary since the size of registers used is the assumed datatype. Example:

Intel Syntax

mov     al,bl
mov     ax,bx
mov     eax,ebx
mov     eax, dword ptr [ebx]

AT&T Syntax

movb    %bl,%al
movw    %bx,%ax
movl    %ebx,%eax
movl    (%ebx),%eax

**NOTE: ALL EXAMPLES FROM HERE WILL BE IN AT&T SYNTAX**

Syscalls.
This section will outline the use of linux syscalls in assembly language. Syscalls consist of all the functions in the second section of the manual pages located in /usr/man/man2. They are also listed in: /usr/include/sys/syscall.h. A great list is at http://www.linuxassembly.org/syscall.html. These functions can be executed via the linux interrupt service: int $0x80.

Syscalls with < 6 args.


For all syscalls, the syscall number goes in %eax. For syscalls that have less than six args, the args go in %ebx,%ecx,%edx,%esi,%edi in order. The return value of the syscall is stored in %eax. The syscall number can be found in /usr/include/sys/syscall.h. The macros are defined as SYS_<syscall name> i.e. SYS_exit, SYS_close, etc. Example: (Hello world program - it had to be done) According to the write(2) man page, write is declared as: ssize_t write(int fd, const void *buf, size_t count); Hence fd goes in %ebx, buf goes in %ecx, count goes in %edx and SYS_write goes in %eax. This is followed by an int $0x80 which executes the syscall. The return value of the syscall is stored in %eax.
$ cat write.s
.include "defines.h"
.data
hello:
        .string "hello world\n"

.globl main
main:
        movl    $SYS_write,%eax
        movl    $STDOUT,%ebx
        movl    $hello,%ecx
        movl    $12,%edx
        int     $0x80

        ret
$

The same process applies to syscalls which have less than five args. Just leave the un-used registers unchanged. Syscalls such as open or fcntl which have an optional extra arg will know what to use.
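The same register convention can be exercised from C with inline asm. The sketch below is an assumption-laden illustration rather than part of the original article: the name raw_write is invented, the i386 branch follows the int $0x80 convention described above, the x86-64 branch substitutes that platform's own syscall instruction and register assignment (int $0x80 there goes through the 32-bit interface), and any other platform simply calls write(2).

```c
#include <stddef.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>   /* SYS_write */
#endif

/* Invoke the write syscall directly; returns the raw kernel result. */
static long raw_write(int fd, const void *buf, size_t count)
{
#if defined(__linux__) && defined(__i386__)
    long ret;
    /* number in %eax, args in %ebx, %ecx, %edx, result back in %eax */
    __asm__ __volatile__("int $0x80"
            : "=a" (ret)
            : "0" ((long)SYS_write), "b" (fd), "c" (buf), "d" (count)
            : "memory");
    return ret;
#elif defined(__linux__) && defined(__x86_64__)
    long ret;
    /* x86-64 variant: args in %rdi, %rsi, %rdx; syscall trashes %rcx, %r11 */
    __asm__ __volatile__("syscall"
            : "=a" (ret)
            : "0" ((long)SYS_write), "D" ((long)fd), "S" (buf), "d" (count)
            : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for other platforms (an addition for this sketch). */
    return (long)write(fd, buf, count);
#endif
}
```

A call such as raw_write(1, "hello world\n", 12) mirrors write.s above. On the raw paths an error comes back as a negative errno value in the return register rather than through errno.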

Syscalls with > 5 args.


Syscalls whose number of args is greater than five still expect the syscall number to be in %eax, but the args are arranged in memory and the pointer to the first arg is stored in %ebx. If you are using the stack, args must be pushed onto it backwards, i.e. from the last arg to the first arg. Then the stack pointer should be copied to %ebx. Otherwise copy args to an allocated area of memory and store the address of the first arg in %ebx. Example: (mmap being the example syscall). Using mmap() in C:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define STDOUT 1

int main(void)
{
        char file[]="mmap.s";
        char *mappedptr;
        int fd,filelen;

        fd=open(file, O_RDONLY);        /* open(2) returns the descriptor mmap needs */
        filelen=lseek(fd,0,SEEK_END);
        mappedptr=mmap(NULL,filelen,PROT_READ,MAP_SHARED,fd,0);
        write(STDOUT, mappedptr, filelen);
        munmap(mappedptr, filelen);
        close(fd);
        return 0;
}

Arrangement of mmap() args in memory:

%esp      %esp+4    %esp+8    %esp+12   %esp+16   %esp+20
00000000  filelen   00000001  00000001  fd        00000000

ASM Equivalent:

$ cat mmap.s
.include "defines.h"

.data
file:
        .string "mmap.s"
fd:
        .long   0
filelen:
        .long   0
mappedptr:
        .long   0

.globl main
main:
        push    %ebp
        movl    %esp,%ebp
        subl    $24,%esp

//      open($file, $O_RDONLY);

        movl    $fd,%ebx                // save fd
        movl    %eax,(%ebx)

//      lseek($fd,0,$SEEK_END);

        movl    $filelen,%ebx           // save file length
        movl    %eax,(%ebx)

        xorl    %edx,%edx

//      mmap(NULL,$filelen,PROT_READ,MAP_SHARED,$fd,0);
        movl    %edx,(%esp)
        movl    %eax,4(%esp)            // file length still in %eax
        movl    $PROT_READ,8(%esp)
        movl    $MAP_SHARED,12(%esp)
        movl    $fd,%ebx                // load file descriptor
        movl    (%ebx),%eax
        movl    %eax,16(%esp)
        movl    %edx,20(%esp)
        movl    $SYS_mmap,%eax
        movl    %esp,%ebx
        int     $0x80

        movl    $mappedptr,%ebx         // save ptr
        movl    %eax,(%ebx)

//      write($stdout, $mappedptr, $filelen);
//      munmap($mappedptr, $filelen);
//      close($fd);

        movl    %ebp,%esp
        popl    %ebp

        ret
$

**NOTE: The above source listing differs from the example source code found at the end of the article. The code listed above does not show the other syscalls, as they are not the focus of this section. The source above also only opens mmap.s, whereas the example source reads the command line arguments. The mmap example also uses lseek to get the filesize.**

Socket Syscalls.
Socket syscalls make use of only one syscall number: SYS_socketcall, which goes in %eax. The socket functions are identified via subfunction numbers located in /usr/include/linux/net.h, which are stored in %ebx. A pointer to the syscall args is stored in %ecx. Socket syscalls are also executed with int $0x80.
$ cat socket.s
.include "defines.h"

.globl  _start
_start:
        pushl   %ebp
        movl    %esp,%ebp
        sub     $12,%esp

//      socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
        movl    $AF_INET,(%esp)
        movl    $SOCK_STREAM,4(%esp)
        movl    $IPPROTO_TCP,8(%esp)

        movl    $SYS_socketcall,%eax
        movl    $SYS_socketcall_socket,%ebx
        movl    %esp,%ecx
        int     $0x80

        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80

        movl    %ebp,%esp
        popl    %ebp
        ret
$

Command Line Arguments.


Command line arguments in linux executables are arranged on the stack. argc comes first, followed by an array of pointers (**argv) to the strings on the command line followed by a NULL pointer. Next comes an array of pointers to the environment (**envp). These are very simply obtained in asm, and this is demonstrated in the example code (args.s).
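The layout just described can be checked from C as well. The following sketch assumes, as the paragraph above states, that the environment pointers start right after the NULL that terminates argv; the helper names env_from_args and count_entries are invented for the example.

```c
#include <stddef.h>

/* Given main's argc and argv, return the start of the environment array. */
static char **env_from_args(int argc, char **argv)
{
    return argv + argc + 1;   /* skip argc argument pointers and the NULL */
}

/* Count the entries of a NULL-terminated pointer array. */
static int count_entries(char **vec)
{
    int n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

Inside main(int argc, char **argv), env_from_args(argc, argv) should then point at the same strings as the third envp parameter that Linux C runtimes pass to main.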

GCC Inline ASM.


This section on GCC inline asm will only cover the x86 applications. Operand constraints will differ on other processors. The location of the listing will be at the end of this article. Basic inline assembly in gcc is very straightforward. In its basic form it looks like this:
__asm__("movl %esp,%eax"); // look familiar ?

or
__asm__("movl $1,%eax\n\t"      // SYS_exit
        "xor %ebx,%ebx\n\t"
        "int $0x80");

It is possible to use it more effectively by specifying the data that will be used as input, output for the asm as well as which registers will be modified. No particular input/output/modify field is compulsory. It is of the format:
__asm__("<asm routine>" : output : input : modify);

The output and input fields must consist of an operand constraint string followed by a C expression enclosed in parentheses. The output operand constraints must be preceded by an = which indicates that it is an output. There may be multiple outputs, inputs, and modified registers. Each "entry" should be separated by commas (,) and there should be no more than 10 entries total. The operand constraint string may either contain the full register name, or an abbreviation.

Abbrev Table

Abbrev    Register
a         %eax/%ax/%al
b         %ebx/%bx/%bl
c         %ecx/%cx/%cl
d         %edx/%dx/%dl
S         %esi/%si
D         %edi/%di
m         memory

Example:

__asm__("test %%eax,%%eax" : /* no output */ : "a"(foo));

OR

__asm__("test %%eax,%%eax" : /* no output */ : "eax"(foo));

You can also use the keyword __volatile__ after __asm__: "You can prevent an asm instruction from being deleted, moved significantly, or combined, by writing the keyword volatile after the asm." (Quoted from the "Assembler Instructions with C Expression Operands" section in the gcc info files.)
$ cat inline1.c
#include <stdio.h>

int main(void) {
        int foo=10,bar=15;

        __asm__ __volatile__ ("addl %%ebx,%%eax"
                : "=eax"(foo)                   // output
                : "eax"(foo), "ebx"(bar)        // input
                : "eax"                         // modify
        );
        printf("foo+bar=%d\n", foo);
        return 0;
}
$

You may have noticed that registers are now prefixed with "%%" rather than %. This is necessary when using the output/input/modify fields because register aliases based on the extra fields can also be used. I will discuss these shortly. Instead of writing "eax" and forcing the use of a particular register such as "eax" or "ax" or "al", you can simply specify "a". The same goes for the other general purpose registers (as shown in the Abbrev table). This seems useless when within the actual code you are using specific registers and hence gcc provides you with register aliases. There is a max of 10 (%0-%9) which is also the reason why only 10 inputs/outputs are allowed.
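For comparison, here is a sketch of the same addition written with only the single-letter abbreviations from the Abbrev table. It is not the author's listing: the function name add_regs and the fallback branch are assumptions, the input is tied to the output with a "0" constraint, and the modify field lists "cc" rather than the output register, since GCC will not accept a clobber of a register that is also an output.

```c
/* Returns foo + bar, forcing foo into %eax and bar into %ebx. */
static int add_regs(int foo, int bar)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("addl %%ebx,%%eax"   /* %% because operands are present */
            : "=a" (foo)                      /* output: %eax                    */
            : "0" (foo), "b" (bar)            /* inputs: same place as %0, %ebx  */
            : "cc");                          /* modify: condition codes         */
    return foo;
#else
    /* Portable fallback for non-x86 targets (an addition for this sketch). */
    return foo + bar;
#endif
}
```

With foo=10 and bar=15, as in inline1.c, the function returns 25.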
$ cat inline2.c
int main(void) {
        long eax;
        short bx;
        char cl;

        __asm__("nop;nop;nop"); // to separate inline asm from the rest of
                                // the code
        __asm__ __volatile__("test %0,%0\n\t"
                             "test %1,%1\n\t"
                             "test %2,%2"
                : /* no outputs */
                : "a"((long)eax), "b"((short)bx), "c"((char)cl)
        );
        __asm__("nop;nop;nop");
        return 0;
}
$ gcc -o inline2 inline2.c
$ gdb ./inline2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnulibc1"...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
... start: inline asm ...
0x8048427 : nop
0x8048428 : nop
0x8048429 : nop
0x804842a : mov 0xfffffffc(%ebp),%eax
0x804842d : mov 0xfffffffa(%ebp),%bx
0x8048431 : mov 0xfffffff9(%ebp),%cl
0x8048434 : test %eax,%eax
0x8048436 : test %bx,%bx
0x8048439 : test %cl,%cl
0x804843b : nop
0x804843c : nop
0x804843d : nop
... end: inline asm ...
End of assembler dump.
$

As you can see, the code that was generated from the inline asm loads the values of the variables into the registers they were assigned to in the input field and then proceeds to carry out the actual code. The compiler auto detects operand size from the size of the variables and so the corresponding registers are represented by the aliases %0, %1 and %2. (Specifying the operand size in the mnemonic when using the register aliases may cause errors while compiling.) The aliases may also be used in the operand constraints. This does not allow you to specify more than 10 entries in the input/output fields. The only use for this I can think of is when you specify the operand constraint as "q", which allows the compiler to choose between the a, b, c and d registers. When this register is modified we will not know which register has been chosen and consequently cannot specify it in the modify field. In which case you can simply specify "<number>". Example:

$ cat inline3.c
#include <stdio.h>

int main(void) {
        long eax=1,ebx=2;

        __asm__ __volatile__ ("add %0,%2"
                : "=b"((long)ebx)
                : "a"((long)eax), "q"(ebx)
                : "2"
        );
        printf("ebx=%lx\n", ebx);
        return 0;
}
$

Compiling
Compiling assembly language programs is much like compiling normal C programs. If your program looks like Listing 1, then you would compile it like you would a C app. If you use _start instead of main, like in Listing 2, you would compile the app slightly differently:

Listing 1

$ cat write.s
.data
hw:
        .string "hello world\n"
.text
.globl main
main:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80
        ret
$ gcc -o write write.s
$ wc -c ./write
4790 ./write
$ strip ./write
$ wc -c ./write
2556 ./write

Listing 2

$ cat write.s
.data
hw:
        .string "hello world\n"
.text
.globl _start
_start:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80
$ gcc -c write.s
$ ld -s -o write write.o
$ wc -c ./write
408 ./write

The -s switch is optional; it just creates a stripped ELF executable which is smaller than a non-stripped one. This method (Listing 2) also creates smaller executables, since the compiler isn't adding extra entry and exit routines as would normally be the case.

Links.
Further reference.
http://www.linuxassembly.org
GNU Assembler Manual
GNU C Compiler Manual
GNU Debugger Manual
Operand Constraint Reference
AT&T Syntax Reference

Example Code
args.s            Reads command line arguments passed to the prog
daemon.s          Binds a shell to a port (backdoor style)
mmap.s            Maps a file to memory, and dumps its contents
socket.s          Creates a socket
write.s           Hello world !
linasm-src.tgz    Makefile defines.h args.s daemon.s socket.s write.s

========================================================================
LINUX ASSEMBLER TUTORIAL
by Robin Miyagi @ http://www.geocities.com/SiliconValley/Ridge/2544/
========================================================================
start@: Thu Feb 03 02:14:37 UTC 2000
update: Fri Jul 30 23:52:23 UTC 2000
update: Fri Sep 15 22:39:17 UTC 2000 :
        - This tutorial now explains Linux assembler in terms of the GNU
          assembler as.
        - Information about Binutils programs such as Objdump, and ld.
update: Thu Jan 11 20:13:06 UTC 2001 :
        - Discussion on Debugging and gdb is added.
========================================================================

* Introduction
------------------------------------------------------------------------
When programming in assembler for Linux (or any other Unix variant for that matter), it is important to remember that Linux is a protected mode operating system (on i386 machines, Linux operates the CPU in protected mode). This means that ordinary user mode processes are not allowed to do certain things, such as access DMA, or access IO ports. Linux kernel modules on the other hand (which operate in kernel mode) are allowed to access hardware directly (read the Assembler-HOWTO on my assembler page for more information on this issue). User mode processes may access hardware using device files. Device files actually access kernel modules which access hardware directly. This file will be restricted to user mode operation. See my pages on kernel module programming.

Please email me comments and suggestions regarding this tutorial at penguin@dccnet.com.

* System Calls
------------------------------------------------------------------------
In programming in assembler for DOS you probably made use of software interrupts, especially the int 0x21 functions which were the DOS system calls. In Linux, system calls are made via int 0x80. The system call number is passed via register EAX, and the parameters to the system call are passed via the remaining registers. This discussion only applies if there are no more than five parameters passed to the system call. If there are more than 5 parameters, the parameters must be located in memory (e.g. on the stack), and EBX must contain the address of the beginning of the parameters.

If you would like a list of the system call numbers, look at the contents of /usr/include/asm/unistd.h. If you would like information about a specific system call (e.g. write ()), type man 2 write at the prompt. Section 2 of the linux man pages covers system calls.

If you look at the contents of /usr/include/asm/unistd.h, you will see the following line near the top of the file;

        #define __NR_write 4

This indicates that register EAX must be set to 4 in order to call the write () system call. Now, if you execute the following command;

        $ man 2 write

you get the following function description (under the SYNOPSIS heading).

        ssize_t write(int fd, const void *buf, size_t count);

This indicates that ebx is equal to the file descriptor of the file you want to write to, ecx is a pointer to the string you want to write, and edx contains the length of the string. If there were 2 more parameters to this system call, they would be placed in esi, and edi respectively.

How do I know the file descriptor for stdout is 1? If you look at your /dev directory, you will notice that /dev/stdout is a symbolic link that points to /proc/self/fd/1. Therefore stdout is file descriptor 1.

I leave looking up the _exit system call as an exercise.

In linux, system calls are processed by the kernel.

* GNU Assembler
------------------------------------------------------------------------
On most Linux systems, you will usually find the GNU C compiler (gcc). This compiler uses an assembler called as as a back-end. This means that the C compiler translates the C code into assembler, which in turn is assembled by as to an object file (*.o).

As uses the AT&T syntax. Experienced intel syntax assembler programmers find AT&T really weird. It is really no more or no less difficult than intel syntax. I switched over to as because there is less ambiguity, and because it works better with the standard GNU/Linux programs such as gdb (supports the gstabs format) and objdump (objdump disassembles code in as syntax). In short, it is a standard component of a GNU Linux system with programming tools installed. I will explain debugging and objdump later in this tutorial.

If you would like more information about as look in the info documentation under as (e.g. type info as at the shell prompt). Also look in the info documentation on the Binutils package (this package contains such programming tools as objdump, ld, etc.).

** GNU assembler v.s. Intel Syntax
------------------------------------------------------------------------

Since most assembler documentation for the i386 platform is written
using intel syntax, some comparison between the 2 formats is in order.
Here is a summarized list of the differences;

- In as the source comes before the destination, opposite to the
  intel syntax.

- The opcodes are suffixed with a letter indicating the size of the
  operands (e.g. l for dword, w for word, b for byte).

- Immediate values must be prefixed with a $, and registers must be
  prefixed with a %.

- Effective addresses use the general syntax DISP(BASE,INDEX,SCALE).
  A concrete example would be;

      movl mem_location(%ebx,%ecx,4), %eax

  Which is equivalent to the following in intel syntax;

      mov eax, [ebx + ecx*4 + mem_location]

Now for an example illustrating the difference in syntax (intel
version in comments);

    movl %eax, %ebx                 # mov ebx, eax
    movw $0x3c4a, %ax               # mov ax, 0x3c4a

Now for our little program;
------------------------------------------------------------------------
## hello-world.s

## by Robin Miyagi
## http://www.geocities.com/SiliconValley/Ridge/2544/

## Compile Instructions:
## -------------------------------------------------------------
## as -o hello-world.o hello-world.s
## ld -O0 -o hello-world hello-world.o

## This file is a basic demonstration of the GNU assembler,
## as.

## This program displays a friendly string on the screen using
## the write () system call

########################################################################
        .section .data
hello:
        .ascii  "Hello, world!\n"
hello_len:
        .long   . - hello
########################################################################
        .section .text
        .globl _start

_start:
        ## display string using write () system call
        movl    $4, %eax                # write () system call
        xorl    %ebx, %ebx              # %ebx = 0
        incl    %ebx                    # %ebx = 1, fd = stdout
        leal    hello, %ecx             # %ecx ---> hello
        movl    hello_len, %edx         # %edx = count
        int     $0x80                   # execute write () system call

        ## terminate program via _exit () system call
        xorl    %eax, %eax              # %eax = 0
        incl    %eax                    # %eax = 1 system call _exit ()
        xorl    %ebx, %ebx              # %ebx = 0 normal program return code
        int     $0x80                   # execute system call _exit ()
------------------------------------------------------------------------
In the above program, notice the use of # to start comments.  As also
supports the /* C comment */ syntax.  If you use the C comment syntax,
it works exactly the same as for C (multiple lines, as well as inline
commenting).  I always use the # comment syntax, as this works better
with emacs asm-mode.  The double ## is allowed but not necessary (this
is only because of a quirk of emacs asm-mode).

Notice the names of the sections .text, and .data.  These are used in
ELF files to tell the linker where the code and data segments are.
There is also the .bss section to store uninitialized data.  It is
only these sections that occupy memory during program execution.

* Accessing Command Line Arguments and Environment Variables
------------------------------------------------------------------------
When an ELF executable starts running, the command line arguments and
environment variables are available on the stack.  In assembler this
means that you may access these via the pointer stored in ESP when the
program starts execution.  See the documentation on my assembler
programming page relating to the ELF binary format.

So how is this data arranged on the stack?  Quite simple really.  The
number of command line arguments (including the name of the program)
is stored as an integer at [esp].  Then, at [esp+4] a pointer to the
first command line argument (which is the name of the program) is
stored.  If there were any additional command line parameters, their
pointers would be stored in [esp+8], [esp+12], etc.  After all the
command line argument pointers comes a NULL pointer.  After the NULL
pointer are all the pointers to the environment variables, and then
finally a NULL pointer to indicate that the end of the environment
variables has been reached.
A summary of the initial ELF stack is shown below;

    (%esp)      argc, count of arguments (integer)
    4(%esp)     char *argv (pointer to first command line argument)
    ...         pointers to the rest of the command line arguments
    ?(%esp)     NULL pointer
    ...         pointers to environment variables
    ??(%esp)    NULL pointer

Now for our little program;
------------------------------------------------------------------------
## stack-param.s ###############################################
## Robin Miyagi ################################################
## http://www.geocities.com/SiliconValley/Ridge/2544/ ##########

## This file shows how one can access command line parameters
## via the stack at process start up.  This behavior is defined
## in the ELF specification.

## Compile Instructions:
## -------------------------------------------------------------
## as -o stack-param.o stack-param.s
## ld -O0 -o stack-param stack-param.o

########################################################################
        .section .data

new_line_char:
        .byte 0x0a

########################################################################
        .section .text

        .globl _start
        .align 4
_start:
        movl    %esp, %ebp      # store %esp in %ebp
again:
        addl    $4, %esp        # %esp ---> next parameter on stack
        movl    (%esp), %eax    # move next parameter into %eax
        testl   %eax, %eax      # %eax (parameter) == NULL pointer?
        jz      end_again       # get out of loop if yes
        call    putstring       # output parameter to stdout.
        jmp     again           # repeat loop
end_again:
        xorl    %eax, %eax      # %eax = 0
        incl    %eax            # %eax = 1, system call _exit ()
        xorl    %ebx, %ebx      # %ebx = 0, normal program exit.
        int     $0x80           # execute _exit () system call

## prints string to stdout
putstring:      .type putstring, @function
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx   # %ecx ---> string (parameter on stack)
        xorl    %edx, %edx      # %edx = 0, character count
count_chars:
        movb    (%ecx,%edx,1), %al
        testb   %al, %al        # end of string?
        jz      done_count_chars
        incl    %edx
        jmp     count_chars
done_count_chars:
        movl    $4, %eax        # write () system call
        xorl    %ebx, %ebx
        incl    %ebx            # %ebx = 1, fd = stdout
        int     $0x80
        movl    $4, %eax        # write () again, for the newline
        leal    new_line_char, %ecx
        xorl    %edx, %edx
        incl    %edx            # %edx = 1, count
        int     $0x80
        movl    %ebp, %esp
        popl    %ebp
        ret
------------------------------------------------------------------------

* The Binutils Package
------------------------------------------------------------------------
Binutils stands for binary utilities, and includes a lot of tools
useful to programmers, especially during debugging.  I will now
address some of these utilities.

** Objdump
------------------------------------------------------------------------
Objdump displays information about 1 or more object files.  For
example, to see information about stack-param, type the following
command at the shell prompt (be sure the working directory contains
stack-param);

    objdump -x stack-param | less

Since the information output of objdump is likely to span more than
one screen, the output is piped to the standard input of the paging
command less.  The option -x tells objdump to display all available
header information, including the program header, the section table
and the symbol table.  Here is the output of the above command;

----------------------------------------------------------------
stack-param:     file format elf32-i386
stack-param
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048074

Program Header:
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x000000be memsz 0x000000be flags r-x
    LOAD off    0x000000c0 vaddr 0x080490c0 paddr 0x080490c0 align 2**12
         filesz 0x00000001 memsz 0x00000004 flags rw-

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000004a  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000001  080490c0  080490c0  000000c0  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  080490c4  080490c4  000000c4  2**2
                  ALLOC
SYMBOL TABLE:
08048074 l    d  .text  00000000
080490c0 l    d  .data  00000000
080490c4 l    d  .bss   00000000
00000000 l    d  *ABS*  00000000
00000000 l    d  *ABS*  00000000
00000000 l    d  *ABS*  00000000
080490c0 l       .data  00000000 new_line_char
08048076 l       .text  00000000 again
08048087 l       .text  00000000 end_again
0804808e l       .text  00000000 putstring
08048096 l       .text  00000000 count_chars
080480a0 l       .text  00000000 done_count_chars
00000000       F *UND*  00000000
080480be g     O *ABS*  00000000 _etext
08048074 g       .text  00000000 _start
080490c1 g     O *ABS*  00000000 __bss_start
080490c1 g     O *ABS*  00000000 _edata
080490c4 g     O *ABS*  00000000 _end

----------------------------------------------------------------
Notice the information provided from the program header (ELF files
have header information at the beginning of the file giving
information to the kernel on how to load the file into memory etc.).

ELF files also contain information about the sections (contained in
section tables).  Notice that the .text section contains 0x4a bytes of
information, is located 0x74 bytes into the file, is aligned at a
4 byte boundary (4 == 2 ** 2), has memory allocated to it (ALLOC), is
readonly, and contains code (the segment selector cs for this process
points to this section (handled by the operating system)).

Information about the symbols is also provided.  All this information
is used by debuggers and other programming tools to examine binary
files.

Objdump can also be used to disassemble binary executables.  Typing
the following command will disassemble the file to standard output
(this does nothing to the actual file, as objdump only reads from the
file);

    objdump -d stack-param | less

Here is the output of the above command;
----------------------------------------------------------------
stack-param:     file format elf32-i386

Disassembly of section .text:

08048074 <_start>:
 8048074:       89 e5                   movl   %esp,%ebp

08048076 <again>:
 8048076:       83 c4 04                addl   $0x4,%esp
 8048079:       8b 04 24                movl   (%esp,1),%eax
 804807c:       85 c0                   testl  %eax,%eax
 804807e:       74 07                   je     8048087 <end_again>
 8048080:       e8 09 00 00 00          call   804808e <putstring>
 8048085:       eb ef                   jmp    8048076 <again>

08048087 <end_again>:
 8048087:       31 c0                   xorl   %eax,%eax
 8048089:       40                      incl   %eax
 804808a:       31 db                   xorl   %ebx,%ebx
 804808c:       cd 80                   int    $0x80

0804808e <putstring>:
 804808e:       55                      pushl  %ebp
 804808f:       89 e5                   movl   %esp,%ebp
 8048091:       8b 4d 08                movl   0x8(%ebp),%ecx
 8048094:       31 d2                   xorl   %edx,%edx

08048096 <count_chars>:
 8048096:       8a 04 11                movb   (%ecx,%edx,1),%al
 8048099:       84 c0                   testb  %al,%al
 804809b:       74 03                   je     80480a0 <done_count_chars>
 804809d:       42                      incl   %edx
 804809e:       eb f6                   jmp    8048096 <count_chars>

080480a0 <done_count_chars>:
 80480a0:       b8 04 00 00 00          movl   $0x4,%eax
 80480a5:       31 db                   xorl   %ebx,%ebx
 80480a7:       43                      incl   %ebx
 80480a8:       cd 80                   int    $0x80
 80480aa:       b8 04 00 00 00          movl   $0x4,%eax
 80480af:       8d 0d c0 90 04 08       leal   0x80490c0,%ecx
 80480b5:       31 d2                   xorl   %edx,%edx
 80480b7:       42                      incl   %edx
 80480b8:       cd 80                   int    $0x80
 80480ba:       89 ec                   movl   %ebp,%esp
 80480bc:       5d                      popl   %ebp
 80480bd:       c3                      ret
----------------------------------------------------------------
The -d tells objdump to disassemble sections that are expected to
contain code (usually the .text section).  Using the -D option will
disassemble all sections.  Objdump was able to give the names of
labels in the code because of the information contained in the symbol
table.

The first column displays the virtual memory address for each line of
code.  The second column displays the machine code corresponding to
its respective assembler line of code, and finally the code in
assembler is contained in the 3rd column.

For more information look in the info documentation system.

** Getting the amount of memory used with size
------------------------------------------------------------------------
If you do an ls -l stack-param you get the following

    -rwxrwxr-x   1 robin    robin         932 Sep 15 18:21 stack-param

This tells you that the file is 932 bytes long.  However this file
also contains header tables, section tables, symbol tables etc.  The
amount of memory that this program will use during run time will be
less than this.  To find out actual memory use, type the following;

    size stack-param

The above will result in the following output;

       text    data     bss     dec     hex filename
         74       1       0      75      4b stack-param

This tells you that .text occupies 74 bytes, and .data occupies one
byte, for a total of 75 bytes memory use.

** Getting rid of symbol information with strip
------------------------------------------------------------------------
The strip command can be used to get rid of the symbol information.
With no options (the same as giving --strip-all), this command strips
all symbol information, including that used for debugging.  With the
--strip-debug option provided, it strips only the debugging symbols.
I recommend not stripping everything, as this makes the files harder
to analyse with the standard programming tools.  This command is used
only if file size is of paramount importance.

* debugging and gdb
------------------------------------------------------------------------
Perhaps the most difficult aspect of programming is debugging.  Quite
often the error that caused the program to terminate abnormally is not
at the line where the program terminated (the example later on will
show this).

Program that exits with SIGSEGV
------------------------------------------------------------------------
## stack-param-error.s #########################################
## Robin Miyagi ################################################
## http://www.geocities.com/SiliconValley/Ridge/2544/ ##########

## This file shows how one can access command line parameters
## via the stack at process start up.  This behavior is defined
## in the ELF specification.

## Compile Instructions:
## -------------------------------------------------------------
## as --gstabs -o stack-param-error.o stack-param-error.s
## ld -O0 -o stack-param-error stack-param-error.o

########################################################################
        .section .data

new_line_char:
        .byte 0x0a

########################################################################
        .section .text

        .globl _start
        .align 4
_start:
        movl    %esp, %ebp      # store %esp in %ebp
again:
        addl    $4, %esp        # %esp ---> next parameter on stack
        leal    (%esp), %eax    # move next parameter into %eax
        testl   %eax, %eax      # %eax (parameter) == NULL pointer?
        jz      end_again       # get out of loop if yes
        call    putstring       # output parameter to stdout.
        jmp     again           # repeat loop
end_again:
        xorl    %eax, %eax      # %eax = 0
        incl    %eax            # %eax = 1, system call _exit ()
        xorl    %ebx, %ebx      # %ebx = 0, normal program exit.
        int     $0x80           # execute _exit () system call

## prints string to stdout
putstring:      .type putstring, @function
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        xorl    %edx, %edx
count_chars:
        movb    (%ecx,%edx,1), %al
        testb   %al, %al
        jz      done_count_chars
        incl    %edx
        jmp     count_chars
done_count_chars:
        movl    $4, %eax
        xorl    %ebx, %ebx
        incl    %ebx
        int     $0x80
        movl    $4, %eax
        leal    new_line_char, %ecx
        xorl    %edx, %edx
        incl    %edx
        int     $0x80
        movl    %ebp, %esp
        popl    %ebp
        ret
------------------------------------------------------------------------
Notice that the above program is assembled with the --gstabs option of
as.  This makes as put debugging information in the output file, such
as the original source file, debugging symbols etc.  Using objdump -x
stack-param-error | less will show you the inclusion of debugging
symbols.

Now to find out where our error occurred type the following command;

    gdb stack-param-error

this will get you to the gdb prompt (gdb);

    (gdb) run eat my shorts
    /home/robin/programming/asm-tut/stack-param-error
    eat
    my
    shorts

    Program received SIGSEGV, segmentation fault
    count_chars () at stack-param-error.s:47
    47      movb (%ecx,%edx,1), %al
    Current language: auto; currently asm
    (gdb) q
    [~]$ _

(gdb will output more than this, I just wanted to highlight what is
important).  This tells us that the segmentation fault occurred at
line 47 of stack-param-error.s.  However the problem was caused in
line 29.  If you look at line 29 of stack-param.s, you will see that
this line reads movl (%esp), %eax, whereas stack-param-error.s has
leal (%esp), %eax there.  The i386 opcode lea loads the address itself
(here the value of %esp) rather than the contents of memory at that
address, so EAX was never loaded with 0 on a null pointer (just some
non-zero stack address).  The loop in _start () never stopped
normally, as the condition for breaking out of the loop is eax being
0, which never happened.  putstring was therefore called with the NULL
pointer that ends the argument list, which caused line 47 to access an
area of memory not available to this process (hence the segmentation
fault).

Debugging is an art that comes with practice.  For more information
about gdb, look in the info pages (e.g. info gdb).  You can also type
help at the (gdb) prompt.  The only reason gdb was able to tell you at
what line number in the source code the error occurred is that the
debugging symbols and source code information were included in the
output file (recall that we used the --gstabs option).

--------------------------------------------------------------------
Comments and suggestions <penguin@dccnet.com>
========================================================================
You are free to make verbatim copies of this file, providing that this
notice is preserved.

------------------------------------------------------------------------
Introduction to GCC Inline Asm
By Robin Miyagi
http://www.geocities.com/SiliconValley/Ridge/2544/
Wed Sep 13 19:18:50 UTC
------------------------------------------------------------------------

* as and AT&T Syntax
------------------------------------------------------------------------
The GNU C Compiler uses the assembler as as a backend.  This assembler
uses AT&T syntax.  Here is a brief overview of the syntax.  For more
information about as, look in the system info documentation.

- as uses the form;

      mnemonic source, destination

  (opposite to intel syntax)

- as prefixes registers with %, and prefixes numeric constants
  with $.

- Effective addresses use the following general syntax;

      SECTION:DISP(BASE, INDEX, SCALE)

  As in other assemblers, any one or more of these components may be
  omitted, within constraints of valid intel instruction syntax.  The
  above syntax was shamelessly copied from the info pages under the
  i386 dependent features of as.

- As suffixes the assembler mnemonics with a letter indicating the
  operand sizes (b for byte, w for word, l for long word).  Read the
  info pages for more information such as suffixes for floating point
  registers etc.

Example code (raw asm, not gcc inline)
--------------------------------------------------------------------
movl %eax, %ebx                 /* intel: mov ebx, eax */
movl $56, %esi                  /* intel: mov esi, 56 */
movl %ecx, label(%edx,%ebx,4)   /* intel: mov [edx+ebx*4+label], ecx */
movb %ah, (%ebx)                /* intel: mov [ebx], ah */
--------------------------------------------------------------------
Notice that as uses C comment syntax.  As can also use # that works
the same way as ; in most other intel assemblers.

Above code in inline asm
--------------------------------------------------------------------
__asm__ ("movl %eax, %ebx\n\t"
         "movl $56, %esi\n\t"
         "movl %ecx, label(%edx,%ebx,4)\n\t"
         "movb %ah, (%ebx)");
--------------------------------------------------------------------
Notice that in the above example, the __ prefixing and suffixing asm
are not necessary, but may prevent name conflicts in your program.
You can read more about this in [C extensions : Extended Asm] under
the info documentation for gcc.

Also notice the \n\t at the end of each line except the last, and that
each line is enclosed in quotes.  This is because gcc sends each as
instruction to as as a string.  The newline/tab combination is
required so that the lines are fed to as according to the correct
format (recall that each line in assembler is indented one tab stop,
generally 8 characters).

You can also use labels from your C code (variable names and such).
The variable must be global or static, since a local variable has no
symbol for as to refer to.  In Linux, underscores prefixing C
variables are not necessary in your code; e.g.

    int Cvariable;

    int main (void) {
        __asm__ ("movl Cvariable, %eax");   /* Cvariable contents ---> eax */
        __asm__ ("movl $Cvariable, %ebx");  /* ebx ---> Cvariable */
        return 0;
    }

Notice that in the documentation for DJGPP, it will say that the
underscore is necessary.  The difference is due to the differences
between the COFF object format used by DJGPP and Linux's ELF format.
I am not certain, but I think that the old Linux a.out object files
also use underscores (please contact me if you have comments on this).

* Extended Asm
------------------------------------------------------------------------
The code in the above example will most probably cause conflicts with
the rest of your C code, especially with compiler optimizations
(recall that gcc is an optimizing compiler).  Any registers used in
your code may be used to hold C variable data from the rest of your
program.  You would not want to inadvertently modify the register
without telling gcc to take this into account when compiling.  This is
where extended asm comes into play.

Extended asm allows you to specify input registers, output registers,
and clobbered registers as interface information to your block of asm
code.  You can even allow gcc to choose actual physical CPU registers
automatically, which probably fits into gcc's optimization scheme
better.  An example will demonstrate extended asm better.
Example code
--------------------------------------------------------------------
#include <stdlib.h>

int main (void) {
    int operand1, operand2, sum, accumulator;

    operand1 = rand ();
    operand2 = rand ();

    __asm__ ("movl %1, %0\n\t"
             "addl %2, %0"
             : "=&r" (sum)                      /* output operands */
             : "r" (operand1), "r" (operand2)   /* input operands */
             : "cc");                           /* clobbers */

    accumulator = sum;

    __asm__ ("addl %2, %0\n\t"
             "addl %3, %0"
             : "=&r" (accumulator)
             : "0" (accumulator), "g" (operand1), "r" (operand2)
             : "cc");

    return accumulator;
}
--------------------------------------------------------------------
The first line that begins with : specifies the output operands, the
second indicates the input operands, and the last lists what the asm
code clobbers.  The "r", "g", and "0" are examples of constraints.
Output constraints must be prefixed with an =, as in "=r" (= is a
constraint modifier, indicating write only).  The & in "=&r" is
another modifier; it marks an output that is written before all of the
inputs have been read, so gcc will not place it in the same register
as an input.

Input and output constraints must have their corresponding C argument
included, enclosed in parentheses.  This must not be done on the
clobber line (I figured this out after an hour of frustration).  The
clobber list holds only register names such as "%eax", plus the
special names "cc" (the flags are changed, as addl does here) and
"memory".  Registers that gcc itself assigned to operands are never
listed there.

"r" means assign a general register for the argument, "g" means to
assign any register, memory or immediate integer for this.  Notice the
use of "0" as an input constraint.  Digits are used to ensure that
when the same variable is indicated in more than one place in the
extended asm, that variable is only mapped to one register.  If you
had merely used another "r" for example, the compiler may or may not
assign this variable to the same register as before.  You can surmise
from this that "0" refers to the register assigned to the first
operand, "1" to the second etc.  When the operands are used in the asm
code, they are referred to as %0, %1 etc., numbered in order of
appearance with the outputs first.

Summary of constraints.
(copied from the system info documentation for gcc)
--------------------------------------------------------------------
m
     A memory operand is allowed, with any kind of address that the
     machine supports in general.

o
     A memory operand is allowed, but only if the address is
     "offsettable".  This means that adding a small integer (actually,
     the width in bytes of the operand, as determined by its machine
     mode) may be added to the address and the result is also a valid
     memory address.

     For example, an address which is constant is offsettable; so is
     an address that is the sum of a register and a constant (as long
     as a slightly larger constant is also within the range of
     address-offsets supported by the machine); but an autoincrement
     or autodecrement address is not offsettable.  More complicated
     indirect/indexed addresses may or may not be offsettable
     depending on the other addressing modes that the machine
     supports.

     Note that in an output operand which can be matched by another
     operand, the constraint letter o is valid only when accompanied
     by both < (if the target machine has predecrement addressing)
     and > (if the target machine has preincrement addressing).

V
     A memory operand that is not offsettable.  In other words,
     anything that would fit the m constraint but not the o
     constraint.

<
     A memory operand with autodecrement addressing (either
     predecrement or postdecrement) is allowed.

>
     A memory operand with autoincrement addressing (either
     preincrement or postincrement) is allowed.

r
     A register operand is allowed provided that it is in a general
     register.

d, a, f, ...
     Other letters can be defined in machine-dependent fashion to
     stand for particular classes of registers.  d, a and f are
     defined on the 68000/68020 to stand for data, address and
     floating point registers.

i
     An immediate integer operand (one with constant value) is
     allowed.  This includes symbolic constants whose values will be
     known only at assembly time.

n
     An immediate integer operand with a known numeric value is
     allowed.  Many systems cannot support assembly-time constants
     for operands less than a word wide.  Constraints for these
     operands should use n rather than i.

I, J, K, ... P
     Other letters in the range I through P may be defined in a
     machine-dependent fashion to permit immediate integer operands
     with explicit integer values in specified ranges.  For example,
     on the 68000, I is defined to stand for the range of values 1 to
     8.  This is the range permitted as a shift count in the shift
     instructions.

E
     An immediate floating operand (expression code const_double) is
     allowed, but only if the target floating point format is the
     same as that of the host machine (on which the compiler is
     running).

F
     An immediate floating operand (expression code const_double) is
     allowed.

G, H
     G and H may be defined in a machine-dependent fashion to permit
     immediate floating operands in particular ranges of values.

s
     An immediate integer operand whose value is not an explicit
     integer is allowed.

     This might appear strange; if an insn allows a constant operand
     with a value not known at compile time, it certainly must allow
     any known value.  So why use s instead of i?  Sometimes it
     allows better code to be generated.

     For example, on the 68000 in a fullword instruction it is
     possible to use an immediate operand; but if the immediate value
     is between -128 and 127, better code results from loading the
     value into a register and using the register.  This is because
     the load into the register can be done with a moveq instruction.
     We arrange for this to happen by defining the letter K to mean
     "any integer outside the range -128 to 127", and then specifying
     Ks in the operand constraints.

g
     Any register, memory or immediate integer operand is allowed,
     except for registers that are not general registers.

X
     Any operand whatsoever is allowed, even if it does not satisfy
     general_operand.  This is normally used in the constraint of a
     match_scratch when certain alternatives will not actually
     require a scratch register.

0, 1, 2, ... 9
     An operand that matches the specified operand number is allowed.
     If a digit is used together with letters within the same
     alternative, the digit should come last.

     This is called a "matching constraint" and what it really means
     is that the assembler has only a single operand that fills two
     roles considered separate in the RTL insn.  For example, an add
     insn has two input operands and one output operand in the RTL,
     but on most CISC machines an add instruction really has only two
     operands, one of them an input-output operand:

          addl #35,r12

     Matching constraints are used in these circumstances.  More
     precisely, the two operands that match must include one
     input-only operand and one output-only operand.  Moreover, the
     digit must be a smaller number than the number of the operand
     that uses it in the constraint.

     For operands to match in a particular case usually means that
     they are identical-looking RTL expressions.  But in a few
     special cases specific kinds of dissimilarity are allowed.  For
     example, *x as an input operand will match *x++ as an output
     operand.  For proper results in such cases, the output template
     should always use the output-operand's number when printing the
     operand.

p
     An operand that is a valid memory address is allowed.  This is
     for "load address" and "push address" instructions.

     p in the constraint must be accompanied by address_operand as
     the predicate in the match_operand.  This predicate interprets
     the mode specified in the match_operand as the mode of the
     memory reference for which the address would be valid.

Q, R, S, ... U
     Letters in the range Q through U may be defined in a
     machine-dependent fashion to stand for arbitrary operand types.
     The machine description macro EXTRA_CONSTRAINT is passed the
     operand as its first argument and the constraint letter as its
     second operand.

     A typical use for this would be to distinguish certain types of
     memory references that affect other insn operands.

     Do not define these constraint letters to accept register
     references (reg); the reload pass does not expect this and would
     not handle it properly.

In order to have valid assembler code, each operand must satisfy its
constraint.  But a failure to do so does not prevent the pattern from
applying to an insn.  Instead, it directs the compiler to modify the
code so that the constraint will be satisfied.  Usually this is done
by copying an operand into a register.

Contrast, therefore, the two instruction patterns that follow:

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r")
             (plus:SI (match_dup 0)
                      (match_operand:SI 1 "general_operand" "r")))]
       ""
       "...")

which has two operands, one of which must appear in two places, and

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r")
             (plus:SI (match_operand:SI 1 "general_operand" "0")
                      (match_operand:SI 2 "general_operand" "r")))]
       ""
       "...")

which has three operands, two of which are required by a constraint to
be identical.  If we are considering an insn of the form

     (insn N PREV NEXT
       (set (reg:SI 3)
            (plus:SI (reg:SI 6) (reg:SI 109)))
       ...)

the first pattern would not apply at all, because this insn does not
contain two identical subexpressions in the right place.  The pattern
would say, "That does not look like an add instruction; try other
patterns."  The second pattern would say, "Yes, that's an add
instruction, but there is something wrong with it."  It would direct
the reload pass of the compiler to generate additional insns to make
the constraint true.  The results might look like this:

     (insn N2 PREV N
       (set (reg:SI 3) (reg:SI 6))
       ...)

     (insn N N2 NEXT
       (set (reg:SI 3)
            (plus:SI (reg:SI 3) (reg:SI 109)))
       ...)

It is up to you to make sure that each operand, in each pattern, has
constraints that can handle any RTL expression that could be present
for that operand.  (When multiple alternatives are in use, each
pattern must, for each possible combination of operand expressions,
have at least one alternative which can handle that combination of
operands.)  The constraints don't need to *allow* any possible
operand--when this is the case, they do not constrain--but they must
at least point the way to reloading any possible operand so that it
will fit.

   * If the constraint accepts whatever operands the predicate
     permits, there is no problem: reloading is never necessary for
     this operand.

     For example, an operand whose constraints permit everything
     except registers is safe provided its predicate rejects
     registers.

     An operand whose predicate accepts only constant values is safe
     provided its constraints include the letter i.  If any possible
     constant value is accepted, then nothing less than i will do; if
     the predicate is more selective, then the constraints may also
     be more selective.

   * Any operand expression can be reloaded by copying it into a
     register.  So if an operand's constraints allow some kind of
     register, it is certain to be safe.
     It need not permit all classes of registers; the compiler knows
     how to copy a register into another register of the proper class
     in order to make an instruction valid.

   * A nonoffsettable memory reference can be reloaded by copying the
     address into a register.  So if the constraint uses the letter
     o, all memory references are taken care of.

   * A constant operand can be reloaded by allocating space in memory
     to hold it as preinitialized data.  Then the memory reference
     can be used in place of the constant.  So if the constraint uses
     the letters o or m, constant operands are not a problem.

   * If the constraint permits a constant and a pseudo register used
     in an insn was not allocated to a hard register and is
     equivalent to a constant, the register will be replaced with the
     constant.  If the predicate does not permit a constant and the
     insn is re-recognized for some reason, the compiler will crash.
     Thus the predicate must always recognize any objects allowed by
     the constraint.

If the operand's predicate can recognize registers, but the constraint
does not permit them, it can make the compiler crash.  When this
operand happens to be a register, the reload pass will be stymied,
because it does not know how to copy a register temporarily into
memory.

If the predicate accepts a unary operator, the constraint applies to
the operand.  For example, the MIPS processor at ISA level 3 supports
an instruction which adds two registers in SImode to produce a DImode
result, but only if the registers are correctly sign extended.  This
predicate for the input operands accepts a sign_extend of an SImode
register.  Write the constraint to indicate the type of register that
is required for the operand of the sign_extend.
------------------------------------------------------------------------
The = in the "=r" is a constraint modifier.  You can find more
information about constraint modifiers in the gcc info under Machine
Descriptions : Constraints : Modifiers.

I strongly recommend reading more in the system info documentation.
If you haven't had much experience with the info reader (also
accessible through emacs), learn it; it is an excellent source of
information.

The gcc info documentation also explains how to use a specific CPU
register for a constraint for various hardware including the i386.
You can find this information under [gcc : Machine Desc : Constraints
: Machine Constraints] in the info documentation.  On the i386 a
specific register is selected with a single-letter constraint, e.g.
"a" for %eax, "b" for %ebx, "c" for %ecx, "d" for %edx, "S" for %esi
and "D" for %edi.  Register names such as "%eax" belong in the clobber
list.
* __asm__ __volatile__
------------------------------------------------------------------------
Because of the compiler's optimization mechanism, your code may not appear at exactly the location specified by the programmer. It may even be interspersed with the rest of the code. To prevent this, you can use __asm__ __volatile__ instead. As with asm, the __ around volatile are not required, but they can prevent name conflicts.
========================================================================
comments and suggestions <deltak@telus.net>

Linux Assembly Tutorial, CS 200


by Bjorn Chambless
Introduction

The following is designed to be a Linux equivalent to "Developing Assembly Language Programs on a PC" by Douglas V. Hall. This tutorial requires the following:

* an i386 family PC running Linux
* as, the GNU assembler (included with any gcc installation)
* ld, the GNU linker (also included with gcc)
* gdb, the GNU debugger

The tutorial was developed on a 5.1 Redhat Linux installation running a 2.0.34 version kernel and the version 5 and 6 C language libraries with ELF file format. But I have tried to make the tutorial as general as possible with respect to Linux systems. I highly recommend working through this tutorial with "as" and "gdb" documentation close at hand.

Overview

The process of developing an assembly program under Linux is somewhat different from development under NT. In order to accommodate object oriented languages, which require the compiler to create constructor and destructor methods that execute before and after the execution of "main", the GNU development model embeds user code within a wrapper of system code. In other words, the user's "main" is treated as a function call. An advantage of this is that the user is not required to initialize segment registers, though user code must obey some function requirements.

The Code

The following is the Linux version of the average temperature program. It will be referred to as "average.s". Note: Assembly language programs should use the ".s" suffix.
/* linux version of AVTEMP.ASM CS 200, fall 1998 */
.data                           /* beginning of data segment */

/* hi_temp data item */
        .type hi_temp,@object   /* declare as data object */
        .size hi_temp,1         /* declare size in bytes */
hi_temp:
        .byte 0x92              /* set value */

/* lo_temp data item */
        .type lo_temp,@object
        .size lo_temp,1
lo_temp:
        .byte 0x52

/* av_temp data item */
        .type av_temp,@object
        .size av_temp,1
av_temp:
        .byte 0

/* segment registers set up by linked code */
/* beginning of text(code) segment */
.text
        .align 4                /* align on a 4-byte (double-word) boundary */
.globl main                     /* make main global for linker */
        .type main,@function    /* declare main as a function */
main:
        pushl %ebp              /* function requirement */
        movl %esp,%ebp          /* function requirement */
        movb hi_temp,%al
        addb lo_temp,%al
        movb $0,%ah
        adcb $0,%ah
        movb $2,%bl
        idivb %bl
        movb %al,av_temp
        leave                   /* function requirement */
        ret                     /* function requirement */

assembly instructions

This code may be assembled with the following command:

    as -a --gstabs -o average.o average.s

The "-a" option prints a memory listing during assembly. This output gives the locations of variables and code with respect to the beginnings of the data and code segments. "--gstabs" places debugging information in the executable (used by gdb). "-o" specifies average.o as the output file name (the default is a.out, which is confusing since the file is not executable).

The object file (average.o) can then be linked to the Linux wrapper code in order to create an executable. These files are crt1.o, crti.o and crtn.o. crt1.o and crti.o provide initialization code and crtn.o does cleanup. These should all be located in "/usr/lib" but may be elsewhere on some systems. They, and their source, might be located by executing the following find command:

    find / -name "crt*" -print

The link command is the following:

    ld -m elf_i386 -static /usr/lib/crt1.o /usr/lib/crti.o -lc average.o /usr/lib/crtn.o

"-m elf_i386" instructs the linker to use the ELF file format. "-static" causes static rather than dynamic linking to occur. And "-lc" links in the standard C library (libc.a). It might be necessary to include "-L/libdirectory" in the invocation for ld to find the C library. It may also be necessary to change the mode of the resulting file with "chmod +x ./a.out".

It should now be possible to execute the file. But, of course, there will be no output. I recommend placing the above commands in a makefile.

debugging

The "--gstabs" option given to the assembler allows the assembly program to be debugged under gdb. The first step is to invoke gdb:

    gdb ./a.out

gdb should start with the following message:
    [bjorn@pomade src]$ gdb ./a.out
    GNU gdb 4.17
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    (gdb)

The "l" command will list the program sourcecode.

    (gdb) l
    1       /* linux version of AVTEMP.ASM CS 200, fall 1998 */
    2       .data /* beginning of data segment */
    3
    4       /* hi_temp data item */
    5       .type hi_temp,@object /* declare as data object */
    6       .size hi_temp,1 /* declare size in bytes */
    7       hi_temp:
    8       .byte 0x92 /* set value */
    9
    10      /* lo_temp data item */
    (gdb)

The first thing to do is set a breakpoint so it will be possible to step through the code.

    (gdb) break main
    Breakpoint 1 at 0x80480f7
    (gdb)

This sets a breakpoint at the beginning of main. Now run the program.

    (gdb) run
    Starting program: /home/bjorn/src/./a.out

    Breakpoint 1, main () at average.s:31
    31      movb hi_temp,%al
    Current language: auto; currently asm
    (gdb)

Values in registers can be checked with either "info registers"

    (gdb) info registers
    eax            0x8059200        134582784
    ecx            0xbffffd94       -1073742444
    edx            0x0              0
    ebx            0x8097bf0        134839280
    esp            0xbffffdd8       0xbffffdd8
    ebp            0xbffffdd8       0xbffffdd8
    esi            0x1              1
    edi            0x8097088        134836360
    eip            0x80480f7        0x80480f7
    eflags         0x246            582
    cs             0x23             35
    ss             0x2b             43
    ds             0x2b             43
    es             0x2b             43
    fs             0x2b             43
    gs             0x2b             43
    (gdb)

...or "p/x $eax", which prints the value in the EAX register in hex. The "e" in front of the register name indicates a 32 bit register. The Intel x86 family has included "extended" 32 bit registers since the 80386. These E registers are to the X registers as the X registers are to the L and H registers. Linux also uses a "flat" and protected memory model rather than segmentation, thus the EIP stores the entire current address.

    (gdb) p/x $eax
    $4 = 0x8059200
    (gdb)

The "p" command prints, "/x" indicates the output should be in hexadecimal. Type "s" or "step" to step to the next instruction.

    (gdb) step
    32      addb lo_temp,%al
    (gdb)

Notice that 92H has been loaded into the least significant byte of the EAX register (i.e. the AL register) by the movb instruction.

    (gdb) p/x $eax
    $6 = 0x8059292
    (gdb)

And we continue stepping through the program....

    (gdb) s
    33      movb $0,%ah
    (gdb) s
    34      adcb $0,%ah
    (gdb) s
    35      movb $2,%bl
    (gdb) s
    36      idivb %bl
    (gdb) s
    37      movb %al,av_temp
    (gdb) s
    38      leave

And if we examine the EAX register and the variable av_temp after the final movb instruction, we see that they are set to the correct value, 72H.

    (gdb) p/x $eax
    $9 = 0x8050072
    (gdb) p/x av_temp
    $10 = 0x72
    (gdb)

Note that during stepping the listed instruction is the one about to be executed.
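The same arithmetic can also be tried from C using GCC inline assembly. The following is only a minimal sketch and is not part of the original tutorial: the function name average_u8 is hypothetical, the %b0 and %h0 operand modifiers (which print the low and high byte registers of operand 0) are an x86-specific GCC feature, and a plain C fallback is used on other targets. It uses divb rather than idivb because the sum is treated as unsigned here.

```c
/* Sketch: average two unsigned bytes the way average.s does.
   average_u8 is a hypothetical name, not from the tutorial. */
unsigned char average_u8(unsigned char hi, unsigned char lo)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned short ax = hi;          /* AL = hi, AH = 0 */
    unsigned char two = 2;

    __asm__ ("addb %2, %b0\n\t"      /* AL = AL + lo, carry set on overflow */
             "adcb $0, %h0\n\t"      /* AH = AH + carry */
             "divb %1"               /* AL = AX / 2, AH = remainder */
             : "+a" (ax)             /* AX is read and written */
             : "Q" (two), "Q" (lo)   /* byte registers from a, b, c, d */
             : "cc");
    return (unsigned char)ax;        /* the quotient is in the low byte */
#else
    return (unsigned char)(((unsigned)hi + lo) / 2);   /* portable fallback */
#endif
}
```

Calling average_u8(0x92, 0x52) follows the same steps as the gdb session above and yields 0x72.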


DJGPP QuickAsm Programming Guide


Okay, so this tutorial has long been overdue. I've been putting it off for a couple of months, but right now I'm in the mood to write it, so here it is. This is just a short tutorial on doing assembly code using DJGPP. I am not teaching how to code x86 asm (get another tutorial or book), but I'll try to show how to do both inline and external asm in DJGPP. I assume you are already familiar with "standard" Intel asm, as used in TASM, MASM, etc.

I highly suggest reading the FAQ lists first, faq102.zip and faq211b.zip, and the online documentation inside the txi*.zip package. There is also a newsgroup, comp.os.msdos.djgpp. The main site is at Delorie Software, where the most up-to-date information on DJGPP is available, and where the mail archives are kept. I find many helpful articles in the mail archives.

AT&T x86 Asm Syntax


DJGPP uses AT&T asm syntax. This is a little different from the regular Intel format. The main differences are:

* AT&T syntax uses the opposite order for source and destination operands: source followed by destination.
* Register operands are preceded by the % character, including sections.
* Immediate operands are preceded by the $ character.
* The size of memory operands is specified using the last character of the opcode. These are b (8-bit), w (16-bit), and l (32-bit).

Here are some examples. Intel equivalents, if any, are provided in C++-style comments.
    movw %bx, %ax           // mov ax, bx
    xorl %eax, %eax         // xor eax, eax
    movw $1, %ax            // mov ax, 1
    movb X, %ah             // mov ah, byte ptr X
    movw X, %ax             // mov ax, word ptr X
    movl X, %eax            // mov eax, X

Most opcodes are identical between AT&T and Intel format, except for these:
    movsSD                  // movsx
    movzSD                  // movzx

where S and D are the source and destination operand size suffixes, respectively.

    movswl %ax, %ecx        // movsx ecx, ax
    cbtw                    // cbw
    cwtl                    // cwde
    cwtd                    // cwd
    cltd                    // cdq
    lcall $S,$O             // call far S:O
    ljmp $S,$O              // jmp far S:O
    lret $V                 // ret far V

Opcode prefixes should not be written on the same line as the instruction they act upon. For example, rep and stosd should be two separate instructions, with the latter immediately following the former.

Memory references are a little different too. The usual Intel memory reference of the form:

    section:[base + index*scale + disp]

is written as:

    section:disp(base, index, scale)

Here are some examples:
    movl 4(%ebp), %eax                  // mov eax, [ebp + 4]
    addl (%eax,%eax,4), %ecx            // add ecx, [eax + eax*4]
    movb $4, %fs:(%eax)                 // mov byte ptr fs:[eax], 4
    movl _array(,%eax,4), %eax          // mov eax, [4*eax + array]
    movw _array(%ebx,%eax,4), %cx       // mov cx, [ebx + 4*eax + array]
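To see the disp(base, index, scale) form in action from C, here is a minimal sketch using GCC inline assembly (the syntax itself is covered in the Inline Asm section). The function name load_element is hypothetical, the code assumes a 4-byte int, and non-x86 targets fall back to plain C.

```c
#include <stddef.h>

/* Sketch: read array[index] with a base + index*scale memory reference.
   load_element is a hypothetical name; assumes sizeof(int) == 4. */
int load_element(const int *array, size_t index)
{
    int value;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl (%1,%2,4), %0"    /* no disp, base = %1, index = %2, scale = 4 */
             : "=r" (value)
             : "r" (array), "r" (index)
             : "memory");            /* the asm reads memory gcc cannot see */
#else
    value = array[index];            /* portable fallback */
#endif
    return value;
}
```

The Intel equivalent of the single instruction would be mov value, [array + index*4]. Both the pointer and the index are pointer-sized here, so the same template should assemble in 32-bit and 64-bit mode.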

Jump instructions always use the smallest displacements. However, the following instructions always work in byte displacements only: jcxz, jecxz, loop, loopz, loope, loopnz and loopne. As suggested in the online documentation, a jcxz foo could be expanded to work:
            jcxz cx_zero
            jmp cx_nonzero
    cx_zero:
            jmp foo
    cx_nonzero:

The documentation also cautions about the mul and imul instructions. The expanding multiply instructions take a single operand. For example, imul %ebx, %ebx will not put the result in edx:eax. Use the single operand form imul %ebx to get the expanded result.
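As an illustration of the single operand form, the sketch below uses the unsigned counterpart mull from GCC inline assembly (the syntax is explained in the Inline Asm section that follows) to collect the full 64-bit product from edx:eax. The function name widening_mul_u32 is hypothetical, and targets other than x86 fall back to plain C.

```c
#include <stdint.h>

/* Sketch: 32 x 32 -> 64 bit unsigned multiply via one-operand mull.
   widening_mul_u32 is a hypothetical name. */
uint64_t widening_mul_u32(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;

    __asm__ ("mull %3"               /* edx:eax = eax * operand */
             : "=a" (lo), "=d" (hi)  /* low half in eax, high half in edx */
             : "a" (a), "r" (b)      /* a must start in eax */
             : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;          /* portable fallback */
#endif
}
```

With the two operand form the high half in edx would simply be lost, which is the situation the caution above describes.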

Inline Asm
I'll start with inline asm first, because it seems to be the more frequently asked question. This is the basic syntax, as described in the online help:

    __asm__(asm statements : outputs : inputs : registers-modified);

The four fields are:

* asm statements - AT&T form, separated by newline
* outputs - constraint followed by name in parentheses, separated by comma
* inputs - constraint followed by name in parentheses, separated by comma
* registers-modified - names separated by comma

A simple example:
    __asm__("pushl %eax\n"
            "movl $1, %eax\n"
            "popl %eax"
    );

You do not always have to use the other three fields, as long as you do not want to specify any input or output variables and you're not accidentally clobbering any registers. Let's spice it up with input variables.
    int i = 0;

    __asm__("pushl %%eax\n"
            "movl %0, %%eax\n"
            "addl $1, %%eax\n"
            "movl %%eax, %0\n"
            "popl %%eax"
            :
            : "g" (i)
    );      // increment i

Don't panic yet! I'll try to explain first. Our input variable is i and we want to increment it by 1. We don't have any output variables, nor clobbered registers (we save eax ourselves). Therefore the second and last fields are empty. Since the input field is specified, we need to leave a blank colon for the output field, but none for the last field, since it isn't used. Leave a newline or at least one space between two blank colons.

Let's move on to the input field. Constraints are just instructions you give to the compiler on how to handle the variables they act upon. They are enclosed in double quotes. So what does the "g" mean? "g" lets the compiler decide where to load the value of i into, as long as the asm instructions accept it. In general, most of your input variables can be constrained as "g", letting the compiler decide how to load them (gcc might even optimize it!). Other commonly used constraints are "r" (load into any available register), "a" (ax/eax), "b" (bx/ebx), "c" (cx/ecx), "d" (dx/edx), "D" (di/edi), "S" (si/esi), etc.

One caution: gcc assumes that an operand listed only as an input is not changed by the asm statements, so writing to %0 as this example does is not reliable in general. Whenever the asm modifies a variable, list it as an output, as the later examples do.

The one input we have will be referred to as %0 inside the asm statements. If we have two inputs, they will be %0 and %1, in the order listed in the input fields (see next example). For N inputs and no outputs, %0 through %N-1 will correspond to the inputs, in the order they are listed.

If any of the input, output, or registers-modified fields are used, register names inside the asm statements must be preceded with two percent (%) characters, instead of one. Contrast this example with the first one, which didn't use any of the last three fields.

Let's do two inputs and introduce "volatile":
    int i=0, j=1;

    __asm__ __volatile__("pushl %%eax\n"
                         "movl %0, %%eax\n"
                         "addl %1, %%eax\n"
                         "movl %%eax, %0\n"
                         "popl %%eax"
                         :
                         : "g" (i), "g" (j)
    );      // increment i by j

Okay, this time around we have two inputs. No problem, the only thing we have to remember is that %0 corresponds to the first input (i in this case), and %1 to j, which is listed after i. Oh yeah, what exactly is this volatile thing? It just prevents the compiler from modifying your asm statements (reordering, deleting, combining, etc.), so that they are assembled as they are (yes, gcc will optimize if it feels like it!). I suggest using volatile most of the time, and from here on we'll be using it. Let's do one which uses the output field this time.
    int i=0;

    __asm__ __volatile__("pushl %%eax\n"
                         "movl $1, %%eax\n"
                         "movl %%eax, %0\n"
                         "popl %%eax"
                         : "=g" (i)
    );      // assign 1 to i

This looks almost exactly like one of our previous input field examples, and it is really not very different. All output constraints are preceded by an equal (=) character. They are also referred to as %0 through %N-1 inside the asm statements, in the order they are listed in the output field. You might ask what happens if one uses both the input and output fields? Well, the next example will show you how to use them together.
    int i=0, j=1, k=0;

    __asm__ __volatile__("pushl %%eax\n"
                         "movl %1, %%eax\n"
                         "addl %2, %%eax\n"
                         "movl %%eax, %0\n"
                         "popl %%eax"
                         : "=g" (k)
                         : "g" (i), "g" (j)
    );      // k = i + j

Okay, the only unclear part is just the numbering of the variables inside the asm statements. I'll explain. When using both input and output fields:

    %0 ... %K are the outputs
    %K+1 ... %N are the inputs

In our example, %0 refers to k, %1 to i, and %2 to j. Simple, no?

So far we haven't used the last field, registers-modified, at all. If we need to use any register inside our asm statements, we either have to push and pop them explicitly, or list them in this field and let gcc take care of that. Here's the previous example, without the explicit eax save and restore.
    int i=0, j=1, k=0;

    __asm__ __volatile__("movl %1, %%eax\n"
                         "addl %2, %%eax\n"
                         "movl %%eax, %0"
                         : "=g" (k)
                         : "g" (i), "g" (j)
                         : "ax", "memory"
    );      // k = i + j

We tell gcc that we're using register eax in the registers-modified field and it will take care of saving or restoring, if necessary. A 16-bit register name covers 32-, 16- or 8-bit registers. If we are also touching memory (writing to variables, etc.), it's recommended to specify "memory" in the registers-modified field. This means all our examples here should have had this specified (well, except the very first one), but I chose not to bring this up until now, just for simplicity.

Local labels inside your inline asm are numeric, and references to them should be terminated with either b or f, for backward and forward references, respectively. For example,
    __asm__ __volatile__("
    0:\n
        ...
        jmp 0b\n
        ...
        jmp 1f\n
        ...
    1:\n
        ...
    ");
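To make the skeleton above concrete, here is a small sketch of a loop built from one backward and one forward local label. The function name sum_to_n is hypothetical, the "+r" constraint (an operand that is both read and written) is standard GCC but is not introduced in this tutorial, and targets other than x86 fall back to plain C.

```c
/* Sketch: add up n + (n-1) + ... + 1 using local labels 1 and 2.
   sum_to_n is a hypothetical name. */
unsigned sum_to_n(unsigned n)
{
    unsigned total = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ (
        "testl %1, %1\n\t"           /* is n zero? */
        "jz 2f\n"                    /* forward reference to label 2 */
        "1:\n\t"
        "addl %1, %0\n\t"            /* total += n */
        "decl %1\n\t"                /* n-- */
        "jnz 1b\n"                   /* backward reference to label 1 */
        "2:"
        : "+r" (total), "+r" (n)
        :
        : "cc");
#else
    while (n != 0)                   /* portable fallback */
        total += n--;
#endif
    return total;
}
```

Because the labels are numeric, the same asm can be expanded several times in one file (for example after inlining) without producing duplicate symbol names.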

Here's an example on mixing C code with inline asm jumps (thanks to Srikanth B.R for this tip).
    void MyFunction( int x, int y )
    {
        __asm__( "Start:" );
        __asm__( ...do some comparison... );
        __asm__( "jl Label_1" );

        CallFunction( &x, &y );
        __asm__("jmp Start");

    Label_1:
        return;
    }
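A C label such as Label_1 is not guaranteed to be visible to the assembler under that name, so the pattern above may fail to assemble with current compilers. GCC later added the asm goto form for this purpose; it is not covered in the original text, so treat the following as a hedged sketch only. The function name is_negative is hypothetical, and the fallback branch is used on targets other than x86.

```c
/* Sketch: jump from inline asm to a C label with asm goto.
   Returns 1 if x is negative, 0 otherwise. is_negative is a hypothetical name. */
int is_negative(int x)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __asm__ goto ("testl %0, %0\n\t"
                  "js %l[negative]"      /* jump to the C label if the sign flag is set */
                  :                      /* no outputs in this form */
                  : "r" (x)
                  : "cc"
                  : negative);           /* C labels the asm may jump to */
    return 0;
negative:
    return 1;
#else
    return x < 0;                        /* portable fallback */
#endif
}
```

The fifth field lists the C labels, and %l[name] inside the template is replaced by whatever assembler label the compiler generated for that C label.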

External Asm
Blah... Okay fine. Here's a clue: Get some of your C/C++ files, and compile them as gcc -S file.c. Then inspect file.s. The basic layout is:
            .file "myasm.S"

    .data
    somedata: .word 0
            ...

    .text
            .globl __myasmfunc
    __myasmfunc:
            ...
            ret

Macros, macros! There's a header file libc/asmdefs.h that is convenient for writing external asm. Just include it on top of your asm source and use the macros accordingly. For example, here's myasm.S:
    #include <libc/asmdefs.h>

            .file "myasm.S"

    .data
            .align 2
    somedata: .word ...

    .text
            .align 4
    FUNC(__MyExternalAsmFunc)
            ENTER
            movl ARG1, %eax
            ...
            jmp mylabel
            ...
    mylabel:
            ...
            LEAVE

That should be a good skeleton for your external asm code.

Other Resources

The best way to learn all these is to look at others' code. There's some inline asm code in sys/farptr.h. Also, if you run Linux, FreeBSD, etc., somewhere in the kernel source tree (i386/ or something), there are plenty of asm sources. Check the djgpp2/ directory at x2ftp.oulu.fi for graphics and gaming libraries that have sources.

If you have asm code that needs to be converted from Intel to AT&T syntax, or just want to stick with regular Intel syntax, you can:

* Get NASM, a free assembler which takes Intel asm format and produces COFF binaries compatible with DJGPP
* Get MASM and compile your sources to COFF format (object file format used by DJGPP)
* Get ta2asv08.zip, a TASM to AT&T asm converter
* Get o2cv10.arj to convert .OBJ/.LIB between TASM and DJGPP
* Search the mail archives for a sed script that converts Intel to AT&T syntax

July 28, 1999 - icbmX2 on EFNET IRC. Send e-mail for corrections or suggestions to avly@castle.net.

Copyright 1995-1999 icbmX2. All rights reserved. Standard Disclaimer: All trademarks mentioned are owned by their respective companies. There are absolutely no guarantees, expressed or implied, on anything that you find in this document. I cannot be held responsible for anything that results from the use or misuse of this document.


2.4. Inline Assembly


Another form of coding allowed with the gcc compiler is inline assembly. As its name implies, inline assembly does not require a call to a separately compiled assembler program. By using certain constructs, we can tell the compiler that code blocks are to be assembled rather than compiled. Although this makes for an architecture-specific file, the readability and efficiency of a C function can be greatly increased. Here is the inline assembler construct:

-----------------------------------------------------------------------
1 asm (assembler instruction(s)
2      : output operands (optional)
3      : input operands (optional)
4      : clobbered registers (optional)
5 );
-----------------------------------------------------------------------

For example, in its most basic form,

    asm ("movl %eax, %ebx");

could also be written as

    asm ("movl %%eax, %%ebx" :::);

(once any colon is present, register names in the template need a doubled %). In both cases we would be lying to the compiler, because we are indeed clobbering ebx. Read on.

What makes this form of inline assembly so versatile is the ability to take in C expressions, modify them, and return them to the program, all the while making sure that the compiler is aware of our changes. Let's further explore the passing of parameters.

2.4.1. Output Operands


On line 2, following the colon, the output operands are a list of C expressions in parentheses, each preceded by a "constraint." For output operands, the constraint usually has the = modifier, which indicates that the operand is write-only. The & modifier marks an "early clobber" operand, which means that the operand is written before the instruction has finished using its inputs. Operands are separated by commas.

2.4.2. Input Operands


The input operands on line 3 follow the same syntax as the output operands except for the write-only modifier.

http://book.opensourceproject.org.cn/kernel/kernelpri/opensource/0131181637/ch02lev1se... 6/25/2010


2.4.3. Clobbered Registers (or Clobber List)


In our assembly statements, we can modify various registers and memory. For gcc to know that these items have been modified, we list them here.

2.4.4. Parameter Numbering


Each parameter is given a positional number starting with 0. For example, if we have one output parameter and two input parameters, %0 references the output parameter and %1 and %2 reference the input parameters.
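The numbering rule can be checked with a short sketch that has exactly one output and two inputs. The function name add_numbered is hypothetical, eax is used as a scratch register purely for illustration and is therefore listed as clobbered, and targets other than x86 fall back to plain C.

```c
/* Sketch: one output (%0) and two inputs (%1, %2).
   add_numbered is a hypothetical name. */
int add_numbered(int i, int j)
{
    int k;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"    /* %1 is the first input, i */
             "addl %2, %%eax\n\t"    /* %2 is the second input, j */
             "movl %%eax, %0"        /* %0 is the output, k */
             : "=r" (k)
             : "r" (i), "r" (j)
             : "eax", "cc");         /* scratch register and flags are modified */
#else
    k = i + j;                       /* portable fallback */
#endif
    return k;
}
```

Outputs are always counted first, so adding a second output would shift the inputs to %2 and %3.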

2.4.5. Constraints
Constraints indicate how an operand can be used. The GNU documentation has the complete listing of simple constraints and machine constraints. Table 2.4 lists the most common constraints for the x86.

Table 2.4. Simple and Machine Constraints for x86


Constraint  Function
a           eax register.
b           ebx register.
c           ecx register.
d           edx register.
S           esi register.
D           edi register.
I           Constant value (0 to 31).
q           Dynamically allocates a register from eax, ebx, ecx, edx.
r           Same as q + esi, edi.
m           Memory location.
A           Same as a + d. eax and edx are allocated together to form a 64-bit register.
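As a small illustration of two rows of the table, the sketch below combines the r constraint with the I constraint, whose 0 to 31 range matches the legal shift counts of a 32-bit shift. The function name shift_left_by_3 is hypothetical, the + modifier (operand is both read and written) is standard GCC but not listed in the table, and targets other than x86 fall back to plain C.

```c
/* Sketch: the "I" constraint supplies a constant in the range 0..31.
   shift_left_by_3 is a hypothetical name. */
unsigned shift_left_by_3(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("shll %1, %0"           /* %1 is printed as the immediate $3 */
             : "+r" (x)              /* any general register, read and written */
             : "I" (3)
             : "cc");
#else
    x <<= 3;                         /* portable fallback */
#endif
    return x;
}
```

Passing a value outside 0 to 31 through "I" should be rejected at compile time, which is the point of using a machine constraint instead of a plain immediate.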

2.4.6. asm
In practice (especially in the Linux kernel), the keyword asm might cause errors at compile time because of other constructs of the same name. You often see this expression written as __asm__, which has the same meaning.

2.4.7. __volatile__
Another commonly used modifier is __volatile__. This modifier is important to assembly code. It tells the compiler not to optimize the inline assembly routine. Often, with hardware-level software, the compiler thinks we are being redundant and wasteful and attempts to rewrite our code to be as efficient as possible. This is useful for application-level programming, but at the hardware level, it can be counterproductive. For example, say we are writing to a memory-mapped register represented by the reg variable. Next, we initiate some action that requires us to poll reg. The compiler simply sees this as consecutive reads to the same memory location and eliminates the apparent redundancy. Using __volatile__, the compiler now knows not to optimize accesses using this variable. Likewise, when you see asm volatile (...) in a block of inline assembler code, the compiler should not optimize this block.

Now that we have the basics of assembly and gcc inline assembly, we can turn our attention to some actual inline assembly code. Using what we just learned, we first explore a simple example and then a slightly more complex code block. Here's the first code example in which we pass variables to an inline block of code:

-----------------------------------------------------------------------
6 int foo(void)
7 {
8     int ee = 0x4000, ce = 0x8000, reg;
9     __asm__ __volatile__("movl %1, %%eax\n\t"
10                         "movl %2, %%ebx\n\t"
11                         "call setbits\n\t"
12                         "movl %%eax, %0"
13         : "=r" (reg)            // reg [param %0] is output
14         : "r" (ce), "r" (ee)    // ce [param %1], ee [param %2] are inputs
15         : "%eax", "%ebx"        // %eax and %ebx got clobbered
16     );
17     printf("reg=%x",reg);
18 }
-----------------------------------------------------------------------

Line 6
This line is the beginning of the C routine.

Line 8
ee, ce, and reg are local variables that will be passed as parameters to the inline assembler.

Line 9
This line is the beginning of the inline assembler routine. Move ce into eax.

Line 10
Move ee into ebx.

Line 11
Call some function from assembler.

Line 12
Return value in eax, and copy it to reg.

Line 13
This line holds the output parameter list. The parm reg is write only.

Line 14
This line is the input parameter list. The parms ce and ee are register variables.


Line 15
This line is the clobber list. The regs eax and ebx are changed by this routine. The compiler knows not to use the values after this routine.

Line 16
This line marks the end of the inline assembler routine.

This second example uses the switch_to() function from include/asm-i386/system.h. This function is the heart of the Linux context switch. We explore only the mechanics of its inline assembly in this chapter. Chapter 9, "Building the Linux Kernel," covers how switch_to() is used:

-----------------------------------------------------------------------
include/asm-i386/system.h
012 extern struct task_struct * FASTCALL(__switch_to(struct task_struct *prev,
        struct task_struct *next));
...
015 #define switch_to(prev,next,last) do {                              \
016     unsigned long esi,edi;                                          \
017     asm volatile("pushfl\n\t"                                       \
018                  "pushl %%ebp\n\t"                                  \
019                  "movl %%esp,%0\n\t"    /* save ESP */              \
020                  "movl %5,%%esp\n\t"    /* restore ESP */           \
021                  "movl $1f,%1\n\t"      /* save EIP */              \
022                  "pushl %6\n\t"         /* restore EIP */           \
023                  "jmp __switch_to\n"                                \
023                  "1:\t"                                             \
024                  "popl %%ebp\n\t"                                   \
025                  "popfl"                                            \
026                  :"=m" (prev->thread.esp),"=m" (prev->thread.eip),  \
027                   "=a" (last),"=S" (esi),"=D" (edi)                 \
028                  :"m" (next->thread.esp),"m" (next->thread.eip),    \
029                   "2" (prev), "d" (next));                          \
030 } while (0)
-----------------------------------------------------------------------

Line 12
FASTCALL tells the compiler to pass parameters in registers.

The asmlinkage tag tells the compiler to pass parameters on the stack.

Line 15
do { statements...} while(0) is a coding method to allow a macro to appear more like a function to the compiler. In this case, it allows the use of local variables.

Line 16
Don't be confused; these are just local variable names.

Line 17
This is the inline assembler; do not optimize.


Line 23
Parameter 1 is used as a return address.

Lines 17-24
\n\t has to do with the compiler/assembler interface. Each assembler instruction should be on its own line.

Line 26
prev->thread.esp and prev->thread.eip are the output parameters:

[%0]= (prev->thread.esp), is write-only memory
[%1]= (prev->thread.eip), is write-only memory

Line 27
[%2]= (last), is write-only to register eax
[%3]= (esi), is write-only to register esi
[%4]= (edi), is write-only to register edi

Line 28
Here are the input parameters:

[%5]= (next->thread.esp), is memory
[%6]= (next->thread.eip), is memory

Line 29
[%7]= (prev), reuse parameter "2" (register eax) as an input
[%8]= (next), is an input assigned to register edx

Note that there is no clobber list.

The inline assembler for PowerPC is nearly identical in construct to x86. The simple constraints, such as "m" and "r," are used along with a PowerPC set of machine constraints. Here is a routine to exchange a 32-bit pointer. Note how similar the inline assembler syntax is to x86:

-----------------------------------------------------------------------
include/asm-ppc/system.h
103 static __inline__ unsigned long
104 xchg_u32(volatile void *p, unsigned long val)
105 {
106     unsigned long prev;
107
108     __asm__ __volatile__ ("\n\
109 1:  lwarx %0,0,%2 \n"
110
111 "   stwcx. %3,0,%2 \n\
112     bne- 1b"
113     : "=&r" (prev), "=m" (*(volatile unsigned long *)p)
114     : "r" (p), "r" (val), "m" (*(volatile unsigned long *)p)
115     : "cc", "memory");
116
117     return prev;
118 }
-----------------------------------------------------------------------

Line 103
This subroutine is expanded in place; it will not be called.

Line 104
Routine names with parameters p and val.

Line 106
This is the local variable prev.

Line 108
This is the inline assembler. Do not optimize.

Lines 109-111
lwarx, along with stwcx., forms an "atomic swap." lwarx loads a word from memory and "reserves" the address for a subsequent store from stwcx.

Line 112
Branch if not equal to label 1 (b = backward).

Line 113
Here are the output operands:

[%0]= (prev), write-only, early clobber
[%1]= (*(volatile unsigned long *)p), write-only memory operand

Line 114
Here are the input operands:

[%2]= (p), register operand
[%3]= (val), register operand
[%4]= (*(volatile unsigned long *)p), memory operand

Line 115
Here is the clobber list (clobbers are not numbered operands and cannot be referenced from the template):

"cc": the condition code register is altered
"memory": memory is clobbered


This closes our discussion on assembly language and how the Linux 2.6 kernel uses it. We have seen how the PPC and x86 architectures differ and how general ASM programming techniques are used regardless of platform. We now turn our attention to the programming language C, in which the majority of the Linux kernel is written, and examine some common problems programmers encounter when using C.

Copyright @ 2007 OpenSourceProject.org.cn. chinaperl@gmail.com


Inline assembly for x86 in Linux

http://www-128.ibm.com/developerworks/linux/library/l-ia.html?dwzone...



Putting the pieces together

Level: Advanced

Bharata Rao (rbharata@in.ibm.com), IBM Linux Technology Center, IBM Software Labs, India

01 Mar 2001

Bharata B. Rao offers a guide to the overall use and structure of inline assembly for x86 on the Linux platform. He covers the basics of inline assembly and its various usages, gives some basic inline assembly coding guidelines, and explains the instances of inline assembly code in the Linux kernel.

If you're a Linux kernel developer, you probably find yourself coding highly architecture-dependent functions or optimizing a code path pretty often. And you probably do this by inserting assembly language instructions into the middle of C statements (a method otherwise known as inline assembly). Let's take a look at the specific usage of inline assembly in Linux. (We'll limit our discussion to the IA32 assembly.)

GNU assembler syntax in brief


Let's first look at the basic assembler syntax used in Linux. GCC, the GNU C Compiler for Linux, uses AT&T assembly syntax. Some of the basic rules of this syntax are listed below. (The list is by no means complete; I've included only those rules pertinent to inline assembly.)

Register naming
Register names are prefixed by %. That is, if eax has to be used, it should be used as %eax.

Source and destination ordering
In any instruction, source comes first and destination follows. This differs from Intel syntax, where source comes after destination.

    mov %eax, %ebx -- transfers the contents of eax to ebx.

Size of operand
The instructions are suffixed by b, w, or l, depending on whether the operand is a byte, word, or long. This is not mandatory; GCC tries to provide the appropriate suffix by reading the operands. But specifying the suffixes manually improves the code readability and eliminates the possibility of the compiler's guessing incorrectly.

    movb %al, %bl -- Byte move
    movw %ax, %bx -- Word move
    movl %eax, %ebx -- Longword move

Immediate operand
An immediate operand is specified by using $.

    movl $0xffff, %eax -- will move the value of 0xffff into eax register.

Indirect memory reference
Any indirect references to memory are done by using ( ).


movb (%esi), %al -- will transfer the byte in the memory pointed by esi into al register

Inline assembly
GCC provides the special construct "asm" for inline assembly, which has the following format:

asm ( assembler template
    : output operands                  (optional)
    : input operands                   (optional)
    : list of clobbered registers      (optional)
    );

In this format, the assembler template consists of assembly instructions. The input operands are the C expressions that serve as input operands to the instructions. The output operands are the C expressions to which the results of the assembly instructions are written.

asm ("movl %%cr3, %0\n" :"=r"(cr3val));
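Reading %cr3 only works in kernel mode, so the statement above cannot be run from an ordinary program. As a user-space illustration of the same four-part layout, here is a hedged sketch, assuming GCC or Clang on x86. The function name mul32 is hypothetical, the #else branch is a plain C fallback for other targets, and the register letters a and d are the ones listed in the table further down.

```c
#include <stdint.h>

/* Multiply two 32-bit values; mull leaves the 64-bit product in %edx:%eax. */
uint64_t mul32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__ ("mull %3"                 /* template: %eax * operand 3        */
             : "=a"(lo), "=d"(hi)      /* outputs: low half, high half      */
             : "0"(a), "r"(b)          /* inputs: a starts in %eax, b in any register */
             : "cc");                  /* clobbers: mull changes the flags  */
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;            /* portable fallback, not inline assembly */
#endif
}
```

All three optional sections are present here: two outputs, two inputs, and one clobber.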

Constraint    Register
a             %eax
b             %ebx
c             %ecx
d             %edx
S             %esi
D             %edi

Memory operand constraint (m)
When the operands are in memory, any operations performed on them occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location. But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don't want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc:

asm ("sidt %0\n" : "=m"(loc));

Matching (digit) constraints
In some cases, a single variable may serve as both the input and the output operand. Such cases may be specified in "asm" by using matching constraints.

asm ("incl %0" :"=a"(var):"0"(var));

In our example for matching constraints, the register %eax is used as both the input and the output variable. The input var is read into %eax, and the updated %eax is stored in var again after the increment. "0" here specifies the same constraint as the 0th output variable. That is, it specifies that the output instance of var should be stored in %eax only. This constraint can be used:

- In cases where input is read from a variable, or the variable is modified and the modification is written back to the same variable
- In cases where separate instances of input and output operands are not necessary

The most important effect of using matching constraints is that they lead to the efficient use of available registers.
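The matching-constraint example can be wrapped in a function so that it can be compiled and checked. This is a minimal sketch, assuming GCC or Clang on x86; the function name increment is hypothetical, the "cc" clobber is an addition (incl changes the flags), and the #else branch is a plain C fallback for other targets.

```c
/* Increment var in %eax, using "0" to tie the input to output operand 0. */
int increment(int var)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("incl %0"
             : "=a"(var)    /* operand 0: the result is taken from %eax      */
             : "0"(var)     /* same location as operand 0: var goes in %eax  */
             : "cc");       /* incl modifies the condition flags             */
#else
    var += 1;               /* portable fallback, not inline assembly */
#endif
    return var;
}
```

Because input and output share one register, no second register is needed for the updated value.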

Examples of common inline assembly usage


The following examples illustrate usage through different operand constraints. There are too many constraints to give examples for each one, but these are the most frequently used constraint types.

"asm" and the register constraint "r"
Let's first take a look at "asm" with the register constraint r. Our example shows how GCC allocates registers, and how it updates the value of output variables.

int main(void)
{
    int x = 10, y;

    asm ("movl %1, %%eax;"
         "movl %%eax, %0;"
         :"=r"(y)     /* y is output operand */
         :"r"(x)      /* x is input operand */
         :"%eax");    /* %eax is clobbered register */
}

In this example, the value of x is copied to y inside "asm". x and y are passed to "asm" by being stored in registers. The assembly code generated for this example looks like this:

main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        movl $10,-4(%ebp)
        movl -4(%ebp),%edx      /* x=10 is stored in %edx */
#APP    /* asm starts here */
        movl %edx, %eax         /* x is moved to %eax */
        movl %eax, %edx         /* y is allocated in edx and updated */
#NO_APP /* asm ends here */
        movl %edx,-8(%ebp)      /* value of y in stack is updated with the value in %edx */

GCC is free here to allocate any register when the "r" constraint is used. In our example it chose %edx for storing x. After reading the value of x in %edx, it allocated the same register for y.

Since y is specified in the output operand section, the updated value in %edx is stored in -8(%ebp), the location of y on the stack. If y were specified in the input section, the value of y on the stack would not be updated, even though it does get updated in the temporary register storage of y (%edx). And since %eax is specified in the clobbered list, GCC doesn't use it anywhere else to store data.

Both input x and output y were allocated in the same %edx register, assuming that inputs are consumed before outputs are produced. Note that if you have a lot of instructions, this may not be the case. To make sure that input and output are allocated in different registers, we can specify the & constraint modifier. Here is our example with the constraint modifier added.

int main(void)
{
    int x = 10, y;

    asm ("movl %1, %%eax;"
         "movl %%eax, %0;"
         :"=&r"(y)    /* y is output operand, note the & constraint modifier. */
         :"r"(x)      /* x is input operand */
         :"%eax");    /* %eax is clobbered register */
}

And here is the assembly code generated for this example, from which it is evident that x and y have been stored in different registers across "asm".

main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        movl $10,-4(%ebp)
        movl -4(%ebp),%ecx      /* x, the input is in %ecx */
#APP
        movl %ecx, %eax
        movl %eax, %edx         /* y, the output is in %edx */
#NO_APP
        movl %edx,-8(%ebp)
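The & modifier matters most when the template writes the output before it has finished reading an input. The following is a hedged sketch of such a case, assuming GCC or Clang on x86; the function name twice is hypothetical and the #else branch is a plain C fallback for other targets.

```c
/* Compute 2*x. The output is zeroed first and the input is read afterwards,
 * so the two must not share a register: "=&r" asks for exactly that. */
int twice(int x)
{
    int out;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movl $0, %0\n\t"      /* output written before the input is used */
             "addl %1, %0\n\t"      /* input still needed here ...             */
             "addl %1, %0"          /* ... and here                            */
             : "=&r"(out)
             : "r"(x)
             : "cc");               /* addl changes the flags */
#else
    out = x + x;                    /* portable fallback, not inline assembly */
#endif
    return out;
}
```

With a plain "=r", the compiler would be allowed to place out and x in the same register, in which case the first movl would destroy x and the result would be 0.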

Use of specific register constraints


Now let's take a look at how to specify individual registers as constraints for the operands. In the following example, the cpuid instruction takes the input in the %eax register and gives output in four registers: %eax, %ebx, %ecx, %edx. The input to cpuid (the variable "op") is passed to "asm" in the eax register, as cpuid expects. The a, b, c, and d constraints are used in the output to collect the values in the four registers, respectively.

asm ("cpuid" : "=a" (_eax), "=b" (_ebx), "=c" (_ecx), "=d" (_edx) : "a" (op));

And below you can see the generated assembly code for this (assuming the _eax, _ebx, etc. variables are stored on the stack):

        movl -20(%ebp),%eax     /* store op in %eax -- input */
#APP
        cpuid
#NO_APP
        movl %eax,-4(%ebp)      /* store %eax in _eax -- output */
        movl %ebx,-8(%ebp)      /* store other registers in respective output variables */
        movl %ecx,-12(%ebp)
        movl %edx,-16(%ebp)
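The cpuid statement can be exercised from user space. Below is a minimal sketch, assuming GCC or Clang on x86; the function name cpu_vendor is hypothetical, and the #else branch returns an empty string on other targets. One caveat stated as an assumption: older GCC releases building 32-bit position-independent code may refuse the "b" constraint because %ebx is reserved there.

```c
#include <stdint.h>
#include <string.h>

/* Returns the CPU vendor string (12 characters), or "" where cpuid is unavailable. */
const char *cpu_vendor(void)
{
    static char vendor[13];
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t eax, ebx, ecx, edx;
    __asm__ __volatile__ ("cpuid"
                          : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                          : "a"(0u));   /* leaf 0: vendor identification */
    (void)eax;                          /* highest supported leaf, unused here */
    memcpy(vendor + 0, &ebx, 4);        /* the vendor text is laid out */
    memcpy(vendor + 4, &edx, 4);        /* in EBX, EDX, ECX order      */
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
#else
    vendor[0] = '\0';                   /* fallback: no cpuid on this target */
#endif
    return vendor;
}
```

The single-register constraints do all the work here: the compiler loads 0 into %eax before the instruction and copies the four result registers out afterwards, just as in the generated code above.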

A strcpy-style copy (here with the byte count known in advance) can be implemented using the "S" and "D" constraints in the following manner:

asm ("cld\n rep\n movsb"
     : /* no output */
     :"S"(src), "D"(dst), "c"(count));
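The statement above lists %esi, %edi and %ecx only as inputs, although rep movsb changes all three and writes to memory. A more defensive variant, offered as a sketch rather than as the article's method, declares them as read-write operands and adds a "memory" clobber. It assumes GCC or Clang on x86; the function name copy_bytes is hypothetical and the #else branch is a plain C fallback for other targets.

```c
#include <stddef.h>

/* Copy count bytes from src to dst with rep movsb; returns the original dst. */
void *copy_bytes(void *dst, const void *src, size_t count)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ ("cld\n\t"
                          "rep movsb"
                          : "+S"(src), "+D"(dst), "+c"(count) /* all three are modified */
                          :
                          : "memory", "cc");  /* memory is written; cld touches a flag */
#else
    unsigned char *d = dst;                   /* portable fallback */
    const unsigned char *s = src;
    while (count--)
        *d++ = *s++;
#endif
    return ret;
}
```

The "+" modifier tells the compiler that the registers hold different values after the statement, so it will not reuse src, dst or count from them.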

The source pointer src is put into %esi by using the "S" constraint, and the destination pointer dst is put into %edi using the "D" constraint. The count value is put into %ecx as it is needed by the rep prefix.

And here you can see another constraint that uses the two registers %eax and %edx to combine two 32-bit values and generate a 64-bit value:

#define rdtscll(val) \
    __asm__ __volatile__ ("rdtsc" : "=A" (val))

The generated assembly looks like this (if val has a 64 bit memory space).

#APP
        rdtsc
#NO_APP
        movl %eax,-8(%ebp)      /* As a result of A constraint %eax and %edx serve as outputs */
        movl %edx,-4(%ebp)

Note here that the values in %edx:%eax serve as 64 bit output.
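The "=A" form is specific to 32-bit x86. In 64-bit code a 64-bit value fits in a single register, so "A" no longer denotes the %edx:%eax pair there. A sketch that behaves the same on both, assuming GCC or Clang on x86, collects the two halves separately and combines them in C. The names read_tsc and combine_halves are hypothetical, and the #else branch simply returns 0 on targets without rdtsc.

```c
#include <stdint.h>

/* Join the high and low 32-bit halves into one 64-bit value. */
uint64_t combine_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Read the time stamp counter: rdtsc returns it in %edx:%eax. */
uint64_t read_tsc(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_halves(hi, lo);
#else
    return 0;                   /* fallback: no rdtsc on this target */
#endif
}
```

The explicit shift-and-or in combine_halves is what the "=A" constraint does implicitly on IA32.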

Using matching constraints


Here you can see the code for the system call, with four parameters:

#define _syscall4(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4) \
type name (type1 arg1, type2 arg2, type3 arg3, type4 arg4) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
          "d" ((long)(arg3)),"S" ((long)(arg4))); \
__syscall_return(type,__res); \
}

In the above example, four arguments to the system call are put into %ebx, %ecx, %edx, and %esi by using the constraints b, c, d, and S. Note that the "=a" constraint is used in the output so that the return value of the system call, which is in %eax, is put into the variable __res. By using the matching constraint "0" as the first operand constraint in the input section, the syscall number __NR_##name is put into %eax and serves as the input to the system call. Thus %eax serves here as both input and output register. No separate registers are used for this purpose. Note also that the input (syscall number) is consumed (used) before the output (the return value of syscall) is produced.
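The same matching-constraint pattern can be tried with a system call that takes no arguments. This is a hedged sketch, assuming Linux with GCC or Clang. The function name raw_getpid is hypothetical. The int $0x80 branch follows the article; the x86-64 branch, which uses the syscall instruction and clobbers %rcx and %r11, is an addition that goes beyond the article, and every other target falls back to the libc getpid().

```c
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>    /* SYS_getpid */
#endif

/* Ask the kernel for the process ID without going through the libc wrapper. */
long raw_getpid(void)
{
#if defined(__linux__) && defined(__GNUC__) && defined(__i386__)
    long res;
    __asm__ __volatile__ ("int $0x80"
                          : "=a"(res)                 /* return value comes back in %eax */
                          : "0"((long)SYS_getpid)     /* syscall number goes in via %eax */
                          : "memory");
    return res;
#elif defined(__linux__) && defined(__GNUC__) && defined(__x86_64__)
    long res;
    __asm__ __volatile__ ("syscall"
                          : "=a"(res)                 /* return value comes back in %rax */
                          : "0"((long)SYS_getpid)     /* syscall number goes in via %rax */
                          : "rcx", "r11", "memory");  /* syscall overwrites %rcx and %r11 */
    return res;
#else
    return (long)getpid();                            /* portable fallback */
#endif
}
```

In both assembly branches one register carries the syscall number in and the result out, which is the situation the "0" constraint is meant for.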

Use of memory operand constraint


Consider the following atomic decrement operation:

__asm__ __volatile__( "lock; decl %0" :"=m" (counter) :"m" (counter));

The generated assembly for this would look something like this:

#APP
        lock
        decl -24(%ebp)  /* counter is modified on its memory location */
#NO_APP

You might think of using the register constraint here for the counter. If you do, the value of the counter must first be copied into a register, decremented, and then written back to its memory location. But then you lose the whole purpose of locking and atomicity, which clearly shows the necessity of using the memory constraint.
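A compilable form of the locked decrement is sketched below, assuming GCC or Clang on x86. The function name decrement_n is hypothetical, the single "+m" operand is used as a shorthand for the "=m"/"m" pair shown above, and the #else branch is a plain (non-atomic) C fallback for other targets.

```c
/* Decrement a counter n times, operating directly on its memory location. */
int decrement_n(int start, int n)
{
    int counter = start;
    int i;

    for (i = 0; i < n; i++) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        __asm__ __volatile__ ("lock; decl %0"
                              : "+m"(counter)   /* read and written in memory */
                              :
                              : "cc");          /* decl changes the flags */
#else
        counter--;                              /* fallback, not atomic */
#endif
    }
    return counter;
}
```

A local variable is used only to keep the sketch self-contained; the lock prefix becomes meaningful when the counter is shared between processors.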

Using clobbered registers


Consider an elementary implementation of memory copy.

asm ("movl $count, %%ecx; "
     "up: lodsl; "
     "stosl; "
     "loop up;"
     :                       /* no output */
     :"S"(src), "D"(dst)     /* input */
     :"%ecx", "%eax" );      /* clobbered list */

The lodsl and stosl instructions use %eax implicitly: lodsl loads each longword into it and stosl stores from it. The %ecx register is loaded explicitly with the count. But GCC won't know this unless we inform it, which is exactly what we do by including %eax and %ecx in the clobbered register set. Unless this is done, GCC assumes that %eax and %ecx are free, and it may decide to use them for storing other data. Note here that %esi and %edi are used by "asm", and are not in the clobbered list. This is because it has been declared that "asm" will use them in the input operand list. The bottom line here is that if a register is used inside "asm" (implicitly or explicitly), and it is not present in either the input or output operand list, you must list it as a clobbered register.
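The copy loop can be made self-contained by passing the count as an operand instead of referring to the symbol count from inside the template. The following is a sketch under stated assumptions, not the article's code: it assumes GCC or Clang on x86, the function name copy_words is hypothetical, the label is a numeric local label (1:) so the function can be inlined more than once, and the #else branch is a plain C fallback for other targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit words from src to dst; returns the original dst. */
uint32_t *copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    uint32_t *ret = dst;

    if (n == 0)                 /* loop would otherwise wrap around from 0 */
        return ret;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ ("cld\n"
                          "1:\n\t"
                          "lodsl\n\t"           /* load a word into %eax  */
                          "stosl\n\t"           /* store %eax to the dest */
                          "loop 1b"             /* decrement count, repeat */
                          : "+S"(src), "+D"(dst), "+c"(n)
                          :
                          : "eax", "memory", "cc");  /* %eax is used only inside the asm */
#else
    while (n--)                 /* portable fallback */
        *dst++ = *src++;
#endif
    return ret;
}
```

Here %eax appears in no operand, so it has to be named in the clobber list, while the pointer and count registers are covered by the read-write operands.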

Conclusion
On the whole, inline assembly is huge and provides a lot of features that we did not even touch on here. But with a basic grasp of the material in this article, you should be able to start coding inline assembly on your own.

About the author


Bharata B. Rao has a bachelor of Engineering in Electronics and Communication from Mysore University, India. He has been working for IBM Global Services, India since 1999. He is a member of the IBM Linux Technology Center, where he concentrates primarily on Linux RAS (Reliability, Availability, and Serviceability). Other areas of interest are operating system internals and processor architecture. He can be reached at rbharata@in.ibm.com.

Linux Assembly and Disassembly the Basics

phiral.net
|=---------------=[ Linux Assembly and Disassembly the Basics ]=--------------=|
|=----------------------------------------------------------------------------=|

---[ Introduction to as, ld and writing your own asm from scratch.

First off you have to know what a system call is. A system call, or software interrupt, is the mechanism used by an application program to request a service from the operating system. System calls often use a special machine code instruction which causes the processor to change mode or context (e.g. from "user mode" to "supervisor mode" or "protected mode"). This switch is known as a context switch, for obvious reasons. A context is the protection and access mode that a piece of code is executing in; it's determined by a hardware mediated flag. If you have ever heard people talking about ring zero, they are referring to code that executes in supervisor mode, such as all kernel code. A context switch allows the OS to perform restricted actions such as accessing hardware devices or the memory management unit.

Generally, operating systems provide a library that sits between normal programs and the rest of the operating system, usually the C library (libc), such as Glibc, or the Windows API. This library handles the low-level details of passing information to the kernel and switching to supervisor mode. These APIs give you access functions that make your job easier, for instance printf to print a formatted string or the *alloc family to get more memory.

In linux the system calls are defined in the file /usr/include/asm/unistd.h.

entropy@phalaris entropy $ cat /usr/include/asm/unistd.h
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/*
 * This file contains the system call numbers.
 */

#define __NR_restart_syscall      0
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
#define __NR_write                4
#define __NR_open                 5
#define __NR_close                6

[...snip...]

Each system call is shown as the system call name preceded by __NR_ and then followed by the system call number. The system call number is very important for writing asm programs that don't use gcc, a compiler or libc. The system call and fault low-level handling routines are contained in the file /usr/src/linux/arch/i386/kernel/entry.S, although this is over our head for now.

The text that you type for the instructions of the program is known as the source code. In order to transform source code into an executable program you must assemble and link it. These steps are done for you by a compiler, but we will do them separately. Assembling is the process that transforms your source code into instructions for the machine. The machine itself only reads numbers, but humans work much better with words. An assembly language is a human readable form of machine code.

The linux assembler's name is `as`; you can type `as -h` to see its arguments. `as` generates an object file out of a source file. An object file is machine code that has not been fully put together yet. Object files contain compact, pre-parsed code, often called binaries, that can be linked with other object files to generate a final executable or code library. An object file is mostly machine code. The linker is the program responsible for putting all the object files together and adding information so the kernel knows how to load and run it. `ld` is the name of the linker on linux.
http://www.phiral.net/linuxasmtwo.htm (1 of 7) [6/25/2010 2:03:56 PM]

So to summarize, source code must be assembled and linked in order to produce an executable program. On linux x86 this is accomplished with

as source.s -o object.o
ld object.o -o executable

where "source.s" is your assembly code, "object.o" is the object file produced by `as` and named with the output option (-o), and "executable" is the final executable produced when the object file has been linked.

In the last tutorial we used gcc to generate the asm and then to compile the program. When we called write we pushed the length of the string, the address of the string, and the file descriptor onto the stack and then issued the instruction "call write". This needs some explanation because how we do it now is totally different. We pushed values onto the stack because we were using C (hello.c), and gcc generates code that follows the C calling convention. A calling convention is the way that variables are stored and the parameters and return values are transferred. On 32-bit x86, C passes its parameters in a stack frame (eg pushl $14, pushl $.LC0, pushl $1, call write) and hands the return value back in %eax. A stack frame is a piece of the stack that holds all the info needed to call a function. So when we issued the "call write" instruction we were using the C library (libc), and the write there was really the system call write: same name, but wrapped in libc (eg. getpid() is a wrapper for syscall(SYS_getpid)).

Now when we write our own asm we will not be using libc. Even though that way is easier, it's not always possible to use, and it's good to know what's happening on a lower level. Here's our first program.

entropy@phalaris asm $ cat hello.s
.section .data
hello:
    .ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
    movl $4, %eax
    movl $14, %edx
    movl $hello, %ecx
    movl $1, %ebx
    int $0x80
    movl $1, %eax
    movl $0, %ebx
    int $0x80
entropy@phalaris asm $ as hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello
entropy@phalaris asm $ ./hello
Hello, World!
Same output as before and it accomplished the same thing, but done very differently.

.section .data

Starts the section .data where all our data goes. We could just as easily have done .section .rodata like what gcc generated in the intro, and then the string would have been read only, but it's much more common to put initialized data into the .data section. The .rodata section is more like doing a #define hello "Hello, World!\n" in C; the .data section is more similar to char hello[] = "Hello, World!\n".

hello:

The label hello, which remember is a symbol (a symbol being a string representation for an address), followed by a colon. A label says: when you assemble, take the address of the next instruction or data following the colon and make that the label's value.

.ascii "Hello, World!\n\0"

And here is what the label hello is going to point to: the first character of the string (.ascii defines a string) "Hello, World!\n\0".

.section .text

Here we start our code section.

.globl _start

`ld` looks for _start as the default entry point, while a program built with `gcc` begins, as far as our own code is concerned, at main. Again, .globl tells the assembler that it shouldn't get rid of the symbol after assembly because the linker needs it.

_start:

_start is a symbol that is going to be replaced by an address during either assembly or linking. _start here is where our program will start to execute when loaded by the kernel.

movl $4, %eax

When making a system call, the number of the system call you want is put into the register eax. As we saw above in the file /usr/include/asm/unistd.h, the write system call was defined as "#define __NR_write 4". So here we are moving the immediate value 4 into eax, so when we call the kernel to do its work it will know we want write.

movl $14, %edx
movl $hello, %ecx
movl $1, %ebx

The write system call expects three arguments: the file descriptor to write to, the address of the string to write, and the length of the string to write. When making system calls, arguments are passed in registers, which differs from the C library (libc) convention, which expects function arguments to be pushed onto the stack. So the system call number goes into eax, the first argument goes into ebx, the second into ecx, and the third into edx. There can be up to six arguments, in ebx, ecx, edx, esi, edi and ebp respectively. If there are more arguments, they are passed through a structure whose address is the first argument.
So we fill in the registers that write needs to do its job: we move 1, which is STDOUT, into ebx, we put the label hello's value (which is the address of the string "Hello, World!\n\0") into ecx, and we put the length of the string, 14, into edx.

int $0x80

This instruction int(errupts) the kernel ($0x80) and asks it to do the system function whose index is in eax. An interrupt interrupts the program's flow and asks the kernel to do something for us. The kernel will then perform the system function and then return control to our program. Before the interrupt we were executing in a user mode context, during the system call we were executing in a protected mode context, and when the kernel is done and returns control to our program we are again executing in a user mode context. So the kernel reads eax, does a write of our string and returns.

movl $1, %eax

Now we're done and we need to exit, so what number do we use to execute exit? Look back at unistd.h and we see that exit is "#define __NR_exit 1".

movl $0, %ebx

exit expects one argument, namely the return code (0 means no errors), so we put that into ebx.

int $0x80

Call the kernel to execute exit with return code 0 and we're done.

Onto the disassembly. Assemble with debugging symbols; `as` uses the same -g or -gstabs that `gcc` does.

entropy@phalaris asm $ as -g hello.s -o hello.o

And link it.

entropy@phalaris asm $ ld hello.o -o hello

Start gdb.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Set a breakpoint at the address of _start so we can step through it.

(gdb) break *_start
Breakpoint 1 at 0x8048094: file hello.s, line 7.
(gdb) run
Starting program: /home/entropy/asm/hello
Hello, World!

Program exited normally.
Current language: auto; currently asm
(gdb)

The breakpoint didn't work. I'm not sure why this happens, but we can do a quick fix. Here is the fixed asm.

entropy@phalaris asm $ cat hello.s
.section .data
hello:
    .ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
    nop
    movl $4, %eax
    movl $14, %edx
    movl $hello, %ecx
    movl $1, %ebx
    int $0x80
    movl $1, %eax
    movl $0, %ebx
    int $0x80

The only difference is the nop, or no operation, right after _start. Now we can set our breakpoint and it will work. Reassemble and link.

entropy@phalaris asm $ as -g hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello

Start gdb.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

List our assembly. I put in comments so it would be easier to follow.

(gdb) list _start
2       hello:                          # label hello, address of the first char
3           .ascii "Hello, World!\n\0"  # .ascii defines a string
4       .section .text                  # our code start
5       .globl _start                   # the start symbol defined as .globl
6       _start:                         # the start label
7           nop                         # no operation for debugging with gdb
8           movl $4, %eax               # mov 4 into %eax, 4 is write(fd, buf, len)
9           movl $14, %edx              # 14 is the length of our string
10          movl $hello, %ecx           # the address of our string
11          movl $1, %ebx               # 1 is STDOUT, to the screen
(gdb) <hit enter>
12          int $0x80                   # call the kernel
13          movl $1, %eax               # move 1 into %eax, 1 is syscall exit()
14          movl $0, %ebx               # move 0 into %ebx, exit's return value
15          int $0x80                   # call kernel

Set a break point at our nop.

(gdb) break *_start+1
Breakpoint 1 at 0x8048095: file hello.s, line 8.

And run it.

(gdb) run
Starting program: /home/entropy/asm/hello

Breakpoint 1, _start () at hello.s:8
8           movl $4, %eax               # mov 4 into %eax, 4 is write(fd, buf, len)
Current language: auto; currently asm

Now the breakpoint works.
(gdb) step
_start () at hello.s:9
9           movl $14, %edx              # length for write 14 is the length of our string
(gdb) step
_start () at hello.s:10
10          movl $hello, %ecx           # the address of our string
(gdb) step
_start () at hello.s:11
11          movl $1, %ebx               # 1 is STDOUT, to the screen
(gdb) step
_start () at hello.s:12
12          int $0x80                   # call the kernel

Check the registers to see if they have the correct information in them.

(gdb) print $edx
$1 = 14
(gdb) x/s $ecx
0x80490b8 <hello>:       "Hello, World!\n"
(gdb) print $ebx
$2 = 1
(gdb) print $eax
$3 = 4
(gdb)

Looks good, so let the kernel do its work.

(gdb) step
Hello, World!
_start () at hello.s:13
13          movl $1, %eax               # move 1 into %eax, 1 is syscall exit()

It has executed the write system call; you can see the printed string, and control has returned to gdb. Now we call exit.

(gdb) step
_start () at hello.s:14
14          movl $0, %ebx               # move 0 into %ebx, exit's return value
(gdb) step
_start () at hello.s:15
15          int $0x80                   # call kernel
(gdb) step

Program exited normally.

And it's done.

(gdb) q
entropy@phalaris asm $

Check out the objdump output.

entropy@phalaris asm $ objdump -d hello

hello:     file format elf32-i386

Disassembly of section .text:

08048094 <_start>:
 8048094:       90                      nop
 8048095:       b8 04 00 00 00          mov    $0x4,%eax
 804809a:       ba 0e 00 00 00          mov    $0xe,%edx
 804809f:       b9 b8 90 04 08          mov    $0x80490b8,%ecx
 80480a4:       bb 01 00 00 00          mov    $0x1,%ebx
 80480a9:       cd 80                   int    $0x80
 80480ab:       b8 01 00 00 00          mov    $0x1,%eax
 80480b0:       bb 00 00 00 00          mov    $0x0,%ebx
 80480b5:       cd 80                   int    $0x80
entropy@phalaris asm $

Notice the difference from the objdump output in the last tutorial: no snipping of tons of lines of extra sections and such. It's only the code we wrote, which is so much cleaner. Notice how easy it would be to take the opcodes and make some tiny shellcode out of them, again without the nulls. What? You want some shellcode to print "Hello, World!\n" for your next 0day? Next time my friend, next time.
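The register convention used by hello.s (number in eax, arguments in ebx, ecx, edx) can also be exercised from C with inline assembly. This is a hedged sketch, assuming Linux with GCC or Clang. The function name raw_write is hypothetical. The 32-bit branch mirrors hello.s; the x86-64 branch uses that ABI's own convention (the syscall instruction with arguments in rdi, rsi, rdx), which this tutorial does not cover, and all other targets fall back to the libc write().

```c
#include <stddef.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>    /* SYS_write */
#endif

/* Issue the write system call directly; returns the kernel's result. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__linux__) && defined(__GNUC__) && defined(__i386__)
    long res;
    __asm__ __volatile__ ("int $0x80"
                          : "=a"(res)
                          : "0"((long)SYS_write),  /* eax: system call number */
                            "b"((long)fd),         /* ebx: first argument     */
                            "c"(buf),              /* ecx: second argument    */
                            "d"((long)len)         /* edx: third argument     */
                          : "memory");
    return res;
#elif defined(__linux__) && defined(__GNUC__) && defined(__x86_64__)
    long res;
    __asm__ __volatile__ ("syscall"
                          : "=a"(res)
                          : "0"((long)SYS_write),  /* rax: system call number */
                            "D"((long)fd),         /* rdi: first argument     */
                            "S"(buf),              /* rsi: second argument    */
                            "d"(len)               /* rdx: third argument     */
                          : "rcx", "r11", "memory");
    return res;
#else
    return (long)write(fd, buf, len);              /* portable fallback */
#endif
}
```

Calling raw_write(1, "Hello, World!\n", 14) performs the same system call as the first half of hello.s.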

Inline x86 Assembly with GCC


Assembler Instructions with C Expression Operands

asm ( CODE : OUTPUTS : INPUTS : CLOBBERED );

Example:

asm ( "add %1, %0"
    : "=r" (sum)
    : "r" (x), "0" (y) );

Code: add %1 to %0 and store the result in %0
Outputs: generic register, stored into local variable sum right after the execution of assembly code.
Inputs: generic registers, initialized from local variables x and y right before the execution of assembly code.
Clobbers: nothing except input/output registers

Input and output substitutions (x86)

m - memory operand
r - generic register operand
i - immediate operand
f - floating-point register
t - top of FP stack
u - next-to-top of FP stack
a,b,c,d - A, B, C, D registers (EAX / AX / AL)
D - DI register
S - SI register
A - two 32-bit registers combined to form a 64-bit register
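A few of the substitutions listed above can be seen side by side in a short sketch, assuming GCC or Clang on x86. The function names add_registers and add_const_to_memory are hypothetical, the "cc" clobbers are additions (add changes the flags), and the #else branches are plain C fallbacks for other targets.

```c
/* r and a matching digit: both inputs in generic registers, result in sum. */
int add_registers(int x, int y)
{
    int sum;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("add %1, %0"
             : "=r"(sum)            /* r: any generic register             */
             : "r"(x), "0"(y)       /* 0: y shares the register of sum     */
             : "cc");
#else
    sum = x + y;                    /* portable fallback */
#endif
    return sum;
}

/* m and i: add an immediate constant directly to a variable in memory. */
int add_const_to_memory(int start)
{
    int value = start;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"
             : "+m"(value)          /* m: operate on the memory location   */
             : "i"(5)               /* i: encoded in the instruction as $5 */
             : "cc");
#else
    value += 5;                     /* portable fallback */
#endif
    return value;
}
```

The first function is the add example shown earlier wrapped in a function; the second swaps the register operands for a memory operand and an immediate.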

[Figure: SSE example on 16-byte operands x1 and x2, labeled with the instructions MOVAPS, MULPS, ADDPS (+=) and MOVUPS and the registers XMM0 and XMM1.]

