The following are defined by the APCS:
Hand written code is expected to adhere to the APCS when making calls to externally visible functions. Such code is said to be conforming.
The ARM Procedure Call Standard comprises a family of variants. The following independent choices need to be made to fix the variant of the APCS required:
For the full specification of the APCS see ARM Procedure Call Standard.
--------------------------------------------------------
Name   |Register |APCS Role
--------------------------------------------------------
a1     |r0       |argument 1 / integer result
--------------------------------------------------------
a2     |r1       |argument 2
--------------------------------------------------------
a3     |r2       |argument 3
--------------------------------------------------------
a4     |r3       |argument 4
--------------------------------------------------------
v1-v5  |r4-r8    |register variables
--------------------------------------------------------
sb     |r9       |static base
--------------------------------------------------------
sl     |r10      |stack limit / stack chunk handle
--------------------------------------------------------
fp     |r11      |frame pointer
--------------------------------------------------------
ip     |r12      |new-static base in inter-link-unit
       |         |calls
--------------------------------------------------------
sp     |r13      |lower end of current stack frame
--------------------------------------------------------
lr     |r14      |link address
--------------------------------------------------------
pc     |r15      |program counter
--------------------------------------------------------
f0     |f0       |FP argument 1 / FP result
--------------------------------------------------------
f1     |f1       |FP argument 2
--------------------------------------------------------
f2     |f2       |FP argument 3
--------------------------------------------------------
f3     |f3       |FP argument 4
--------------------------------------------------------
f4-f7  |f4-f7    |FP register variables
--------------------------------------------------------
Simplistically:
-------------------------------------------------------
Register      |Use
-------------------------------------------------------
a1-a4, f0-f3  |Used to pass arguments to functions. a1
              |is also used to return integer results,
              |and f0 to return FP results. These
              |registers can be corrupted by a called
              |function.
-------------------------------------------------------
v1-v5, f4-f7  |Used as register variables. They must
              |be preserved by called functions.
-------------------------------------------------------
sb, sl, fp,   |Have a dedicated role in some APCS
ip, sp, lr, pc|variants, some of the time. ie. there
              |are times when some of these registers
              |can be used for other purposes even when
              |strictly conforming to the APCS. In
              |some variants of the APCS sb and sl are
              |available as additional variable
              |registers v6 and v7 respectively.
-------------------------------------------------------
As stated previously, hand coded assembler routines need not conform strictly to the APCS, but need only conform. This means that all registers which do not need to be used in their APCS role by an assembler routine (eg. fp) can be used as working registers as long as their value on entry is restored before returning.
64 Bit integer addition
The purpose of this example is to examine coding a small function in ARM
Assembly Language, in a way which will enable it to be used from C
modules. First, however, the function is coded in C, and the compiler's
output examined.
Let us consider writing a 64 bit integer addition routine in C, where the data structure used to store 64 bit integers is a two word structure. The obvious way to code the addition of these double length integers in assembler is to make use of the Carry flag from the low word addition in the high word addition. However, there is no way to specify this in C.
A possible way to code around this in C is as follows:
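/* int64 is assumed to be a two word structure with unsigned fields,
   low word first: typedef struct { unsigned lo, hi; } int64; */
void add_64(int64 *dest, int64 *src1, int64 *src2)
{ unsigned hibit1=src1->lo >> 31, hibit2=src2->lo >> 31, hibit3;
  dest->lo=src1->lo + src2->lo;
  hibit3=dest->lo >> 31;
  /* Carry out of the low word: both top bits set, or exactly one set
     and the top bit of the sum clear. */
  dest->hi=src1->hi + src2->hi +
           ((hibit1 & hibit2) || ((hibit1 != hibit2) && !hibit3));
  return;
}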
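As a cross-check, the carry out of a 32 bit addition can be computed from the top bits alone and compared against a widened 64 bit sum. The following is a minimal sketch only: the helper names carry_from_hibits and carry_reference are hypothetical and are not part of the examples directory, and the fixed-width types from stdint.h are an assumption (the original code uses plain unsigned).

```c
#include <stdint.h>

/* Carry out of a 32 bit addition, derived only from the top bits of the
   operands and of the sum (hypothetical helper for illustration). */
unsigned carry_from_hibits(uint32_t a, uint32_t b)
{
    unsigned hibit1 = a >> 31, hibit2 = b >> 31;
    unsigned hibit3 = (uint32_t)(a + b) >> 31;
    /* Both top bits set always carries; exactly one set carries only
       if the top bit of the sum came out clear. */
    return (hibit1 & hibit2) | ((hibit1 ^ hibit2) & (hibit3 ^ 1u));
}

/* Reference: widen to 64 bits and read bit 32 of the exact sum. */
unsigned carry_reference(uint32_t a, uint32_t b)
{
    return (unsigned)(((uint64_t)a + b) >> 32);
}
```

A case worth trying is 0x40000000 + 0x40000000: the sum has its top bit set although neither operand does, yet no carry is produced, so a rule that only compares one operand's top bit with the sum's top bit would report a carry incorrectly.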
Set the current directory to examples. The above code can be found in add64_1.c, which we can compile to ARM Assembly Language source as follows:
armcc -li -apcs 3/32bit -S add64_1.c
The -S flag tells armcc to produce ARM Assembly Language source
(suitable for armasm) rather than producing object code. The -li
flag tells armcc to compile for a little-endian memory and the
-apcs option specifies that the 32 bit version of APCS 3 should be used.
Looking at the output file, add64_1.s, we can see that this is indeed an inefficient implementation.
A more productive approach is to code the addition in C without the carry, and then correct the compiler's output by hand. The carry-free version, add64_2.c, is:
void add_64(int64 *dest, int64 *src1, int64 *src2)
{ dest->lo=src1->lo + src2->lo;
  dest->hi=src1->hi + src2->hi;
  return;
}
To compile this to give assembler suitable for use with armasm,
first set the current directory to examples, and issue this command
(the options used are described above):
armcc -li -apcs 3/32bit -S add64_2.c
This will produce the source in add64_2.s, which will include the
following code:
add_64
        LDR     a4,[a2,#0]
        LDR     ip,[a3,#0]
        ADD     a4,a4,ip
        STR     a4,[a1,#0]
        LDR     a2,[a2,#4]
        LDR     a3,[a3,#4]
        ADD     a2,a2,a3
        STR     a2,[a1,#4]
        MOV     pc,lr
Comparing this carefully with the C source, we can see that the
first ADD instruction produces the low order word, and the second produces
the high order word. All we need to do to get the carry from the low to
high word right is change the first ADD to ADDS (add and set flags), and
the second ADD to an ADC (add with carry). This modified code is
available in the examples directory as add64_3.s.
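The effect of the ADDS and ADC pair can be modelled in C by passing the carry flag explicitly. The sketch below is an illustration only: the helper names adds, adc and add_64_model are hypothetical, the int64 layout (low word at offset 0, high word at offset 4) is assumed from the offsets in the compiler output, and the unsigned comparison that detects wrap-around stands in for the processor's Carry flag.

```c
#include <stdint.h>

/* Assumed layout: low word at offset 0, high word at offset 4. */
typedef struct { uint32_t lo, hi; } int64;

/* Model of ADDS: 32 bit add which also reports the Carry flag. */
static uint32_t adds(uint32_t a, uint32_t b, unsigned *carry)
{
    uint32_t sum = a + b;
    *carry = sum < a;            /* the sum wrapped round, so Carry is set */
    return sum;
}

/* Model of ADC: 32 bit add which includes the incoming Carry flag. */
static uint32_t adc(uint32_t a, uint32_t b, unsigned carry)
{
    return a + b + carry;
}

void add_64_model(int64 *dest, const int64 *src1, const int64 *src2)
{
    unsigned carry;
    uint32_t lo = adds(src1->lo, src2->lo, &carry);  /* ADDS a4,a4,ip */
    uint32_t hi = adc(src1->hi, src2->hi, carry);    /* ADC  a2,a2,a3 */
    dest->lo = lo;
    dest->hi = hi;
}
```

For example, adding the pair (lo=0xFFFFFFFF, hi=0) to (lo=1, hi=0) sets the modelled carry in the low word addition, and the high word addition then picks it up.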
a1 clearly holds a pointer to the destination structure, a2 and a3 pointers to the operand structures. Both a4 and ip are used as temporary registers, which are not preserved. The conditions under which ip can be corrupted will be discussed later in this recipe.
This is a simple leaf function, which uses few temporary registers. Therefore no registers are saved to the stack, and none need to be restored on exit. Thus a simple "MOV pc,lr" can be used to return.
If we had wished to return a result, perhaps the carry out from this addition, then it would be loaded into a1 prior to exit. In this example, this could be done by changing the second ADD to ADCS (add with carry and set flags), and adding the following instructions to load a1 with 1 or 0 depending on the carry out from the high order addition.
        MOV     a1, #0
        ADC     a1, a1, #0
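The same idea can be expressed in C for comparison. This is a hedged sketch rather than the contents of any file in the examples directory: the function name add_64_carry is hypothetical, the int64 layout is assumed to be low word first, and widening each word addition to 64 bits is used here in place of the Carry flag.

```c
#include <stdint.h>

/* Assumed layout: low word at offset 0, high word at offset 4. */
typedef struct { uint32_t lo, hi; } int64;

/* Adds *src1 and *src2 into *dest and returns the carry out of the
   high word addition (1 or 0), as a1 would hold on exit. */
unsigned add_64_carry(int64 *dest, const int64 *src1, const int64 *src2)
{
    /* Bit 32 of each widened sum plays the part of the Carry flag. */
    uint64_t lo = (uint64_t)src1->lo + src2->lo;               /* ADDS */
    uint64_t hi = (uint64_t)src1->hi + src2->hi + (lo >> 32);  /* ADCS */
    dest->lo = (uint32_t)lo;
    dest->hi = (uint32_t)hi;
    return (unsigned)(hi >> 32);        /* MOV a1,#0 then ADC a1,a1,#0 */
}
```

A caller could use the returned value to detect that the true sum did not fit in 64 bits.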
Back to the first inefficient implementation
Although the first C implementation was inefficient, it shows us more
about the APCS than the more efficient hand modified version.
We have already seen a4 and ip being used as non-preserved temporary registers. However, here v1 and lr are also used as temporary registers. v1 is preserved by storing it (together with lr) on the stack on entry. lr is corrupted, but a copy is saved on the stack and is reloaded into pc at the same time that v1 is restored.
Thus there is still only a single exit instruction, but now it is:
        LDMIA   sp!,{v1,pc}
--------------------------------------------------------
Register|Description
--------------------------------------------------------
ip      |This register is used only during function call.
        |It is conventionally used as a local code
        |generation temporary register. At other times
        |it can be used as a corruptible temporary
        |register.
--------------------------------------------------------
lr      |This register holds the address to which control
        |must return on function exit. It can be (and
        |often is) used as a temporary register after
        |pushing its contents onto the stack. This value
        |can then be reloaded straight into the PC.
--------------------------------------------------------
sp      |This is the stack pointer, which is always valid
        |in strictly conforming code, but need only be
        |preserved in hand written code. Note, however,
        |that if any use of the stack is to be made by
        |hand written code, sp must be available.
--------------------------------------------------------
sl      |This is the stack limit register. If stack
        |limit checking is explicit (ie. it is performed
        |by code when stack pushes occur, rather than by
        |memory management hardware causing a trap when
        |stack overflow occurs), then sl must be valid
        |whenever sp is valid. If stack checking is
        |implicit sl is instead treated as v7, an
        |additional register variable (which must be
        |preserved by called functions).
--------------------------------------------------------
fp      |This is the frame pointer register. It contains
        |either zero, or a pointer to the most recently
        |created stack backtrace data structure. As with
        |the stack pointer, this must be preserved, but
        |in hand written code need not be available at
        |all instants. It should, however, be valid
        |whenever any strictly conforming functions are
        |called.
--------------------------------------------------------
sb      |This is the static base register. If the
        |variant of the APCS being used is reentrant,
        |then this register is used to access an array of
        |static data pointers to allow code to access
        |data reentrantly. However, if the variant of
        |the APCS being used is not reentrant then sb is
        |instead available as an additional register
        |variable, v6 (which must be preserved by called
        |functions).
--------------------------------------------------------
Thus sp, sl, fp and sb must all be preserved on function exit for APCS conforming code.