In the C-library build directory (eg. directory semi for the semi-hosted library), the file options is used to select variants of the C-library.
The supplied file contains the following:
The default divide implementation 'unrolled' is fast, but occupies a total of 416 bytes (55 instructions for the signed version plus 49 instructions for the unsigned version). This is an appropriate default for most toolkit users who are interested in obtaining maximum performance.
memcpy = fast
divide = unrolled
stdfile_redirection = off
fp_type = module
stack = contiguous
backtrace = off
Alternatively you can change this file to select 'small' divide which is more compact at 136 bytes(20 instructions for signed plus 14 instructions for unsigned) but somewhat slower as there is considerable looping overhead.
For a comparison of the speed difference between these two routines, consult the following table (the speed of divide is data-dependent):
--------------------------------------------------------- Calc |unrolled |rolled cycles |cycles | --------------------------------------------------------- 0/100 |22 |19 --------------------------------------------------------- 9/7 |22 |19 --------------------------------------------------------- 4096/2 |70 |136 --------------------------------------------------------- 1000000/3 |99 |240 --------------------------------------------------------- 1000000000/1|148 |370 ---------------------------------------------------------
If you have a specific requirement you can modify the supplied routines to suit your application better. For instance, you could write an unrolled-2-times version or you could combine the signed and unsigned versions to save more space.
The supplied routines implement signed division only; it would be possible to modify them for unsigned division if required. The routines are described by the standard header file "stdlib.h" which contains the following declarations:
typedef struct __sdiv32by16 { int quot, rem; } __sdiv32by16;
/* used int so that values return in regs, although 16-bit */
typedef struct __sdiv64by32 { int rem, quot; } __sdiv64by32;
__value_in_regs extern __sdiv32by16 __rt_sdiv32by16(
int /*numer*/,
short int /*denom*/);
/*
* Signed div: (16-bit quot), (16-bit rem) = (32-bit) / (16-bit)
*/
__value_in_regs extern __sdiv64by32 __rt_sdiv64by32(
int /*numer_h*/, unsigned int /*numer_l*/,
int /*denom*/);
/*
* Signed div: (32-bit quot), (32-bit rem) = (64-bit) / (32-bit)
*/
These routines both have guaranteed performance:108 cycles for __rt_sdiv64by32 (excluding call & return)
44 cycles for __rt_sdiv32by16
Currently the C-compiler does not automatically use these routines, as the default routines have early-termination which yields good performance for small values.
In order to use these new divide routines, you should explicitly call the relevant function. The __rt_div64by32 function is complicated by the fact that our C-compiler does not currently support 64-bit integers, as you have to split a 64-bit value between two 32-bit variables.
The following example program shows how to use these routines. This program is available as dspdiv.c in the examples directory. Once the program has been compiled and linked, type
to calculate 1000/3
armsd dspdiv 1000 3
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
if (argc != 3)
puts("needs 2 numeric args");
else
{
__sdiv32by16 result;
result = __rt_sdiv32by16(atoi(argv[1]), atoi(argv[2]));
printf("quotient %d\n", result.quot);
printf("remainder %d\n", result.rem);
}
return 0;
}
For real-time applications, the maximum division time must be as short as possible to ensure that the calculation can complete in time. In this case, the functions __rt_sdiv64by32 and __rt_sdiv32by16 are useful.