Passing and returning structs

About this recipe

In this recipe you will learn about:

the way structs are normally passed to and from functions;
cases when this is automatically optimised;
how to tell the compiler to return a struct value using several registers.

The default way to pass and return a struct

Unless special conditions apply (detailed in following sections), C structures are:

passed in registers, overflowing onto the stack if necessary;
returned via a pointer to the memory location of the result.

For struct-valued functions, a pointer to the location where the struct result is to be placed is passed in a1 (the first argument register). The first argument is then passed in a2, the second in a3, and so on.

It is as if:

struct s f(int x)

were compiled as:

void f(struct s *result, int x)
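The equivalence can be sketched in portable C. This is only an illustration of the idea, not compiler output: the contents of struct s and the name f_lowered are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical two-field struct, used only for illustration. */
struct s { int lo; int hi; };

/* What the programmer writes: a struct-valued function. */
struct s f(int x)
{
    struct s r;
    r.lo = x;
    r.hi = x * 2;
    return r;
}

/* Roughly what the default convention amounts to: the caller
   supplies the address at which the result is to be stored. */
void f_lowered(struct s *result, int x)
{
    result->lo = x;
    result->hi = x * 2;
}

/* Returns 1 when both forms leave the same values in the caller's struct. */
int same_result(int x)
{
    struct s by_value = f(x);
    struct s by_pointer;

    f_lowered(&by_pointer, x);
    return by_value.lo == by_pointer.lo && by_value.hi == by_pointer.hi;
}

/* Small self-check of the sketch. */
void check_lowering(void)
{
    assert(same_result(21));
}
```

In the lowered form the result pointer occupies the first argument slot, which is why the real first argument moves from a1 to a2.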

As a demonstration of the default way in which structures are passed and returned consider the following code:

typedef struct two_ch_struct
{ char ch1;
  char ch2;
} two_ch;

two_ch max( two_ch a, two_ch b )
{ return (a.ch1>b.ch1) ? a : b;
}

This code is available in the examples directory as two_ch.c. It can be compiled to produce Assembly Language source by using the following command:

armcc -S two_ch.c -li -apcs 3/32bit

The -li and -apcs 3/32bit options can be omitted if armcc has already been configured appropriately.

Here is the code which armcc produces:

max
    MOV    ip,sp
    STMDB  sp!,{a1-a3,fp,ip,lr,pc}
    SUB    fp,ip,#4
    LDRB   a3,[fp,#-&14]
    LDRB   a2,[fp,#-&10]
    CMP    a3,a2
    SUBLE  a2,fp,#&10
    SUBGT  a2,fp,#&14
    LDR    a2,[a2,#0]
    STR    a2,[a1,#0]
    LDMDB  fp,{fp,sp,pc}

The STMDB instruction saves the arguments onto the stack, together with the frame pointer, stack pointer, link register and current pc value (this sequence of values is the stack backtrace data structure).

a2 and a3 are then used as temporary registers to hold the required parts of the structures passed, and a1 as a pointer to an area in memory in which the resulting struct is placed - all as expected.

For a basic explanation of register naming and usage under the APCS, see Register usage under the ARM procedure call standard. Detailed information can be found in C language calling conventions.

The optimisation of integer-like structures

The ARM Procedure Call Standard specifies different rules for returning integer-like structs. An integer-like struct is one which has the following properties:

The size of the struct is no larger than one word;
The byte offset of each addressable sub-field is 0 (bit-fields are not addressable).

Thus the following structs are integer-like:

struct
{ unsigned a:8, b:8, c:8, d:8;
}

union polymorphic_ptr
{ struct A *a;
  struct B *b;
  int      *i;
}

Whereas the structure used in the previous example is not integer-like:

struct { char ch1, ch2; }
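The two properties can be checked mechanically with sizeof and offsetof. The sketch below is a rough test under stated assumptions: a word is taken to be 4 bytes, and the names WORD_BYTES, word_view and the two helper functions are hypothetical. Bit-fields cannot be given to offsetof, which matches their being non-addressable.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#define WORD_BYTES 4u  /* assumption: one ARM word is 32 bits */

/* The struct from the first example: ch2 is addressable at byte offset 1. */
struct two_ch { char ch1; char ch2; };

/* Hypothetical union: both members are one word wide and start at offset 0. */
union word_view { uint32_t as_unsigned; int32_t as_signed; };

/* 1 if two_ch satisfies both integer-like properties, else 0. */
int two_ch_is_integer_like(void)
{
    return sizeof(struct two_ch) <= WORD_BYTES
        && offsetof(struct two_ch, ch1) == 0
        && offsetof(struct two_ch, ch2) == 0;   /* fails: offset is 1 */
}

/* 1 if word_view satisfies both integer-like properties, else 0. */
int word_view_is_integer_like(void)
{
    return sizeof(union word_view) <= WORD_BYTES
        && offsetof(union word_view, as_unsigned) == 0
        && offsetof(union word_view, as_signed) == 0;
}

/* Small self-check of the sketch. */
void check_integer_like(void)
{
    assert(!two_ch_is_integer_like());
    assert(word_view_is_integer_like());
}
```

Note that two_ch fits in a word; it fails only because its second member does not sit at offset 0.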

Integer-like structs are returned by placing the struct's contents in a1, rather than a pointer to the struct's contents. Thus a1 is not needed to pass a pointer to a result struct in memory, and is instead used to pass the first argument.

For example, consider the following code:

typedef struct half_words_struct
{ unsigned field1:16;
  unsigned field2:16;
} half_words;

half_words max( half_words a, half_words b )
{ half_words x;
  x= (a.field1>b.field1) ? a : b;
  return x;
}

We would expect arguments a and b to be passed in registers a1 and a2, and since half_words_struct is integer-like we expect the result structure to be passed back directly in a1 (rather than a1 being used to return a pointer to the result half_words_struct).

The above code is available in the examples directory as half_str.c. It can be compiled to produce Assembly Language source by using the following command:

armcc -S half_str.c -li -apcs 3/32bit

The -li and -apcs 3/32bit options can be omitted if armcc has already been configured appropriately.

Here is the code which armcc produces:

max
    MOV    a3,a1,LSL #16
    MOV    a3,a3,LSR #16
    MOV    a4,a2,LSL #16
    MOV    a4,a4,LSR #16
    CMP    a3,a4
    MOVLE  a1,a2
    MOV    pc,lr

Clearly the contents of the half_words structure are returned directly in a1 as expected.
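To make the listing easier to follow, here is a rough C model of what the generated instructions compute when each struct is viewed as a single 32-bit word. It assumes field1 occupies bits 0-15, as the shift pairs in the listing imply; the name max_packed is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the generated code: each argument is one whole struct held in a
   32-bit word. Assumption: field1 lives in bits 0-15 of that word. */
uint32_t max_packed(uint32_t a, uint32_t b)
{
    uint32_t a_field1 = (a << 16) >> 16;   /* MOV a3,a1,LSL #16 ; MOV a3,a3,LSR #16 */
    uint32_t b_field1 = (b << 16) >> 16;   /* MOV a4,a2,LSL #16 ; MOV a4,a4,LSR #16 */

    /* CMP a3,a4 ; MOVLE a1,a2 : keep a only when its field1 is strictly larger. */
    return (a_field1 > b_field1) ? a : b;
}

/* Small self-check of the sketch. */
void check_max_packed(void)
{
    assert(max_packed(0x00010005u, 0x00020003u) == 0x00010005u);
    assert(max_packed(0x00090001u, 0x00000002u) == 0x00000002u);
}
```

The whole word is returned, so field2 travels along with the winning field1 without any extra work.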

Returning non integer-like structs in registers

There are occasions when a function needs to return more than one value. The normal way to achieve this is to define a structure which holds all the values to be returned, and return this.

As we have seen, this will result in a pointer to the structure being passed in a1, which will then be dereferenced to store the values returned.

For some applications in which such a function is time critical, the overhead involved in "wrapping" and then "unwrapping" this structure can be significant. However, there is a way to tell the compiler that a structure should be returned in the argument registers a1 - a4. Clearly this is only useful for returning structures which are no larger than 4 words.

The way to tell the compiler to return a structure in the argument registers is to use the keyword "__value_in_regs".

Multiplication - Returning a 64-bit result

To illustrate how to use __value_in_regs, let us consider writing a function which multiplies two 32-bit integers together and returns the 64-bit result.

The way this function must work is to split the two 32-bit numbers (a, b) into high and low 16-bit parts (a_hi, a_lo, b_hi, b_lo). The four multiplications a_lo * b_lo, a_hi * b_lo, a_lo * b_hi and a_hi * b_hi must be performed, and the results added together, taking care to deal with carry correctly.
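As a reference model for checking the arithmetic, the steps can be sketched in portable C. This is an illustration only and is not meant to replace a hand-coded routine; the names u64_pair and mul64_model are hypothetical, and carries are detected by testing for unsigned wrap-around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: low and high words of the 64-bit product. */
typedef struct { uint32_t lo; uint32_t hi; } u64_pair;

u64_pair mul64_model(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;

    /* Four 16x16 partial products; each fits in 32 bits. */
    uint32_t m_lo   = a_lo * b_lo;
    uint32_t m_mid1 = a_hi * b_lo;
    uint32_t m_mid2 = a_lo * b_hi;
    uint32_t m_hi   = a_hi * b_hi;
    u64_pair r;

    /* The middle sum may wrap. A carry out of it is worth 2^48,
       which is bit 16 of the high word. */
    uint32_t m_mid = m_mid1 + m_mid2;
    if (m_mid < m_mid1)
        m_hi += 0x10000u;

    /* Add the middle term, shifted by 16 bits, across both words. */
    r.lo = m_lo + (m_mid << 16);
    r.hi = m_hi + (m_mid >> 16) + (r.lo < m_lo ? 1u : 0u);
    return r;
}

/* Self-check against the compiler's own 64-bit multiply. */
void check_mul64_model(void)
{
    uint64_t expect = (uint64_t)0xDEADBEEFu * 0xCAFEBABEu;
    u64_pair r = mul64_model(0xDEADBEEFu, 0xCAFEBABEu);

    assert(r.lo == (uint32_t)expect);
    assert(r.hi == (uint32_t)(expect >> 32));
}
```

The check compares the model with a native 64-bit multiplication, so any mistake in the carry handling shows up as a failed assertion.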

Since the problem involves dealing with carry correctly, coding this function in C will not produce optimal code (see 64 Bit integer addition for more details). Therefore we will want to code the function in ARM Assembly Language. The following code performs the algorithm just described:

; On entry a1 and a2 contain the 32-bit integers to be multiplied (a, b)
; On exit a1 and a2 contain the result (a1 bits 0-31, a2 bits 32-63)
mul64
    MOV    ip, a1, LSR #16                      ; ip = a_hi
    MOV    a4, a2, LSR #16                      ; a4 = b_hi
    BIC    a1, a1, ip, LSL #16                  ; a1 = a_lo
    BIC    a2, a2, a4, LSL #16                  ; a2 = b_lo
    MUL    a3, a1, a2                           ; a3 = a_lo * b_lo    (m_lo)
    MUL    a2, ip, a2                           ; a2 = a_hi * b_lo    (m_mid1)
    MUL    a1, a4, a1                           ; a1 = a_lo * b_hi    (m_mid2)
    MUL    a4, ip, a4                           ; a4 = a_hi * b_hi    (m_hi)
    ADDS   ip, a2, a1                           ; ip = m_mid1 + m_mid2 (m_mid)
    ADDCS  a4, a4, #&10000                      ; a4 = m_hi + carry       (m_hi')
    ADDS   a1, a3, ip, LSL #16                  ; a1 = m_lo + (m_mid<<16)
    ADC    a2, a4, ip, LSR #16                  ; a2 = m_hi' + (m_mid>>16) + carry
    MOV    pc, lr

Clearly this code is fine for use with Assembly Language modules, but in order to use it from C we need to be able to tell the compiler that this routine returns its 64-bit result in registers. This can be done by making the following declarations in a header file:

typedef struct int64_struct
{ unsigned int lo;
  unsigned int hi;
} int64;

__value_in_regs extern int64 mul64(unsigned a, unsigned b);

The Assembly Language code and the declarations above, together with a test program, are all in the examples directory as the files mul64.s, mul64.h, int64.h and multest.c. To compile, assemble and link these to produce an executable image suitable for armsd, first set your current directory to examples, and then execute the following commands:

armasm mul64.s -o mul64.o -li
armcc -c multest.c -li -apcs 3/32bit
armlink mul64.o multest.o

Where somewhere is the directory in which the semi-hosted C libraries reside (e.g. the lib directory of the ARM Software Tools Release). Note also that -li and -apcs 3/32bit can be omitted if armcc and armasm (and armsd below) have been configured appropriately.

multest can then be run under armsd as follows:

> armsd -li multest
A.R.M. Source-level Debugger, version 4.10 (A.R.M.) [Aug 26 1992]
ARMulator V1.20, 512 Kb RAM, MMU present, Demon 1.01, FPE, Little endian.
Object program file multest
armsd: go
Enter two unsigned 32-bit numbers in hex eg.(100 FF43D)

12345678 10000001
Least significant word of result is 92345678
Most  significant word of result is  1234567
Program terminated normally at PC = 0x00008418
      0x00008418: 0xef000011 .... : >  swi     0x11
armsd: quit
Quitting
>