This recipe is intended to show that the ARM is quite capable of handling 16-bit data efficiently, and in several different ways, depending on what is needed for a particular application.
Clearly any unsigned 16-bit value can be held as a 32-bit value in which the top 16 bits are all zero. Similarly any signed 16-bit value can be held as a 32-bit value with the top 16 bits sign extended (ie. copied from the top bit of the 16-bit value).
The main disadvantage of storing 16-bit data as 32-bit data in this way for ARM based systems is that it takes up twice as much space in memory or on disk. If the amount of memory taken up by the 16-bit data is small, then the easiest and most efficient technique is likely to be simply converting the data to 32-bit format and treating it as 32-bit data from then on.
When the space taken by 16-bit data in memory or on disk is not small, an alternative method can be used: the 16-bit data is loaded and converted to 32-bit data for use within the ARM and, once processed, can be output as either 32-bit or 16-bit data. Useful code fragments to perform the necessary conversions for this approach are given in the sections from Little endian loading recipes to Big endian storing recipes.
An issue which may arise when 16-bit data is converted to 32-bit data for use in the ARM and then stored back out as 16-bit data is detecting whether the data is still 16-bit data, ie. whether it has 'overflowed' into the top 16 bits of the ARM register. Code fragments which detect this are given in the section Detecting overflow into the top 16 bits.
Another approach, which avoids having to use explicit code to check whether results have overflowed into the top 16 bits, is to keep 16-bit data as 16-bit data all the time, by loading it into the top half of ARM registers and ensuring that the bottom 16 bits are always 0. Useful code sequences, and the issues involved when taking this approach, are described in Using ARM registers as 16-bit registers.
Little endian loading recipes
Code fragments in this section which transfer a single 16-bit data item
transfer it to the least significant 16 bits of an ARM register. The
byte offset referred to is the byte offset within a word at the
load address. eg. the address 0x4321 has a byte offset of 1.
This code is also optimal for the common case where the 16-bit data is half word aligned, ie. at either byte offset 0 or 2 (but the same code is required to deal with both cases). Optimisations can be made when it is known that the data is at byte offset 0, and also when it is known to be at byte offset 2 (but not when it could be at either offset).
The two MOV instructions are only required if the 16-bit value is signed,
and it may be possible to combine the second MOV with another data
processing operation by specifying the second argument as "R0, ASR #16"
rather than just R0.
LDRB R0, [R2, #0] ; 16-bit value is loaded from the
LDRB R1, [R2, #1] ; address in R2, and put in R0
ORR R0, R0, R1, LSL #8; R1 is required as a
; MOV R0, R0, LSL #16; temporary register
; MOV R0, R0, ASR #16
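As a cross-check on a host machine, the byte-by-byte load above can be modelled in C. This is only a sketch: the function names are hypothetical and not part of the original recipe, and the signed variant stands in for the optional LSL #16 / ASR #16 pair.

```c
#include <stdint.h>

/* Hypothetical helper: models LDRB / LDRB / ORR R0, R0, R1, LSL #8.
   The byte at the lower address becomes the low byte of the result,
   so this works at any byte offset. */
static uint32_t load_le16_unsigned(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Hypothetical helper: models the extra LSL #16 / ASR #16 pair by
   copying bit 15 into bits 16..31 without relying on signed shifts. */
static int32_t load_le16_signed(const uint8_t *p)
{
    uint32_t v = load_le16_unsigned(p);
    return (int32_t)(v ^ 0x8000u) - 0x8000;
}
```

Because the value is assembled one byte at a time, the same model covers all four byte offsets, which is the property the assembler fragment relies on.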
The "LSR" should be replaced with "ASR" if the data is signed. Note that
as in the previous example it may be possible to combine the MOV with
another data processing operation.
LDR R0, [R2, #-2]; 16-bit data is loaded from address in
MOV R0, R0, LSR #16; R2 into R0 (R2 has byte offset 2)
As before, "LSR" should be replaced with "ASR" if the data is signed.
Also, it may be possible to combine the second MOV with another data
processing operation.
LDR R0, [R2, #0]; 16-bit value is loaded from the word
MOV R0, R0, LSL #16; aligned address in R2 into R0.
MOV R0, R0, LSR #16
This code can be further optimised if non-word-aligned word-loads are permitted (ie. Alignment faults are not enabled). This makes use of the way ARM rotates data into a register for non-word-aligned word-loads, see the appropriate ARM Datasheet for more information:
LDR R0, [R2, #2] ; 16-bit value is loaded from the word
MOV R0, R0, LSR #16; aligned address in R2 into R0.
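The rotation trick above can be sketched in C under one stated assumption: that a word load from a non-word-aligned address returns the aligned word rotated right by 8 bits per byte of offset, as on the older ARM cores the datasheet reference points to. The helper names are hypothetical, and the behaviour should be confirmed against the datasheet for the core in use.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Assumed model of LDR from a non-word-aligned address (little endian):
   the aligned word is rotated right by 8 * byte_offset bits. */
static uint32_t ldr_rotated(uint32_t aligned_word, unsigned byte_offset)
{
    return ror32(aligned_word, 8u * (byte_offset & 3u));
}

/* Models LDR R0, [R2, #2] followed by MOV R0, R0, LSR #16, with R2
   word aligned: the half word in bytes 0-1 is rotated into the top
   half and then shifted down. */
static uint32_t load_low_half_via_rotate(uint32_t aligned_word)
{
    return ldr_rotated(aligned_word, 2) >> 16;
}
```

Under that assumption the rotation does the work of the LSL #16 in the three-instruction version, which is why one instruction is saved.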
The address in R2 should be word aligned (byte offset 0), in which case
these code fragments load the data item in bytes 0-1 into R0, and the data
item in bytes 2-3 into R1.
LDR R0, [R2, #0] ; 2 unsigned 16-bit values are loaded
MOV R1, R0, LSR #16 ; from one word of memory [R2]. The
BIC R0, R0, R1, LSL #16 ; 1st is put in R0, 2nd in R1.
The version of this for signed data is:
LDR R0, [R2, #0] ; 2 signed 16-bit values are loaded
MOV R1, R0, ASR #16 ; from one word of memory [R2]. The
MOV R0, R0, LSL #16 ; 1st is put in R0, 2nd in R1.
MOV R0, R0, ASR #16
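A C model of the two-item load may help when checking the unsigned and signed variants against each other. It is a sketch only: the function names are hypothetical, and the word is assumed to have been loaded already from a word aligned address in little endian configuration.

```c
#include <stdint.h>

/* Hypothetical helper: models MOV R1, R0, LSR #16 and
   BIC R0, R0, R1, LSL #16 for two unsigned 16-bit values. */
static void unpack_unsigned(uint32_t word, uint32_t *first, uint32_t *second)
{
    *second = word >> 16;               /* top half word */
    *first  = word & ~(*second << 16);  /* clear the bits just extracted */
}

/* Hypothetical helper: models the ASR-based sequence for two signed
   16-bit values, sign extending each half into 32 bits. */
static void unpack_signed(uint32_t word, int32_t *first, int32_t *second)
{
    uint32_t hi = word >> 16;
    uint32_t lo = word & 0xFFFFu;
    *second = (int32_t)(hi ^ 0x8000u) - 0x8000;
    *first  = (int32_t)(lo ^ 0x8000u) - 0x8000;
}
```

The BIC step in the unsigned case clears exactly the bits that were copied into the second register, so no separate mask constant is needed.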
Little endian storing recipes
The second MOV instruction is needed only if the value in R0 must be
preserved after the store.
STRB R0, [R2, #0] ; 16-bit value is stored to the address
MOV R0, R0, ROR #8 ; in R2.
STRB R0, [R2, #1]
; MOV R0, R0, ROR #24
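The store sequence above can be modelled in C to show why the second rotate restores the register. This is a sketch with hypothetical names; the byte stores stand in for STRB, which writes only bits 0 to 7 of the register.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical helper: stores the low 16 bits of r0 in little endian
   byte order and returns the final register value. */
static uint32_t store_le16_rotating(uint8_t *p, uint32_t r0)
{
    p[0] = (uint8_t)r0;   /* STRB R0, [R2, #0]: low byte first  */
    r0 = ror32(r0, 8);    /* MOV R0, R0, ROR #8                 */
    p[1] = (uint8_t)r0;   /* STRB R0, [R2, #1]: next byte up    */
    r0 = ror32(r0, 24);   /* optional MOV: 8 + 24 = 32, so R0   */
    return r0;            /* is back to its original value      */
}
```

Using rotates rather than shifts means no bits are lost, so the original value can be recovered with a single further instruction when it is still needed.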
Unlike load operations, knowing the alignment of the destination address does not make optimisations possible.
If the values in R0 and R1 are not needed after they are saved, then R3
need not be used as a temporary register (one of R0 or R1 can be used
instead).
ORR R3, R0, R1, LSL #16 ;Two unsigned 16-bit values
STR R3, [R2, #0] ;in R0 and R1 are packed into
;the word addressed by R2
;R3 is used as a temporary register
The version for signed data is:
Again, if the values in R0 and R1 are not needed after they are saved,
then R3 need not be used as a temporary register (R0 can be used
instead).
MOV R3, R0, LSL #16 ; Two signed 16-bit values
MOV R3, R3, LSR #16 ; in R0 and R1 are packed into
ORR R3, R3, R1, LSL #16 ; the word addressed by R2
STR R3, [R2, #0] ; R3 is a temporary register
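The signed packing sequence above can be checked with a small C model. It is a sketch with a hypothetical function name; the mask plays the role of the LSL #16 / LSR #16 pair, which removes the sign extension bits of the first value before the two halves are combined.

```c
#include <stdint.h>

/* Hypothetical helper: packs two sign extended 16-bit values into one
   word, the first in bits 0..15 and the second in bits 16..31. */
static uint32_t pack16x2(int32_t first, int32_t second)
{
    uint32_t lo = (uint32_t)first & 0xFFFFu;  /* LSL #16 then LSR #16 */
    uint32_t hi = (uint32_t)second << 16;     /* R1, LSL #16          */
    return lo | hi;                           /* ORR                  */
}
```

Only the first value needs its upper bits cleared: the shift applied to the second value discards its sign extension bits automatically.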
Big endian loading recipes
This code is also optimal for the common case where the 16-bit data is half word aligned, ie. at either byte offset 0 or 2 (but the same code is required to deal with both cases). Optimisations can be made when it is known that the data is at byte offset 0, and also when it is known to be at byte offset 2 (but not when it could be at either offset).
The two MOV instructions are only required if the 16-bit value is signed,
and it may be possible to combine the second MOV with another data
processing operation by specifying the second argument as "R0, ASR #16"
rather than just R0.
LDRB R0, [R2, #0] ; 16-bit value is loaded from the
LDRB R1, [R2, #1] ; address in R2, and put in R0
ORR R0, R1, R0, LSL #8 ; R1 is a temporary register
; MOV R0, R0, LSL #16
; MOV R0, R0, ASR #16
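In the fragment above the byte at the lower address is shifted into the high half of the result (ORR R0, R1, R0, LSL #8), which is big endian order. A minimal C model, with hypothetical function names, is:

```c
#include <stdint.h>

/* Hypothetical helper: models LDRB / LDRB / ORR R0, R1, R0, LSL #8.
   The byte at the lower address becomes the high byte of the result. */
static uint32_t load_be16_unsigned(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Hypothetical helper: models the optional LSL #16 / ASR #16 pair. */
static int32_t load_be16_signed(const uint8_t *p)
{
    uint32_t v = load_be16_unsigned(p);
    return (int32_t)(v ^ 0x8000u) - 0x8000;
}
```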
The "LSR" should be replaced with "ASR" if the data is signed. Note that
as in the previous example it may be possible to combine the MOV with
another data processing operation.
LDR R0, [R2, #0] ; 16-bit value is loaded from the word
MOV R0, R0, LSR #16 ; aligned address in R2 into R0.
As before, "LSR" should be replaced with "ASR" if the data is signed.
Also, it may be possible to combine the second MOV with another data
processing operation.
LDR R0, [R2, #-2] ; 16-bit value is loaded from the
MOV R0, R0, LSL #16 ; address in R2 into R0. R2 is
MOV R0, R0, LSR #16 ; aligned to byte offset 2
This code can be further optimised if non-word-aligned word-loads are permitted (ie. Alignment faults are not enabled). This makes use of the way ARM rotates data into a register for non-word-aligned word-loads, see the appropriate ARM Datasheet for more information:
LDR R0, [R2, #0] ; 16-bit value is loaded from the
MOV R0, R0, LSR #16 ; address in R2 into R0. R2 is
; aligned to byte offset 2
LDR R0, [R2, #0] ; 2 unsigned 16-bit values are
MOV R1, R0, LSR #16 ; loaded from one word of memory
BIC R0, R0, R1, LSL #16 ; 1st goes in R0, the 2nd in R1.
The version of this for signed data is:
LDR R0, [R2, #0] ;2 signed 16-bit values are loaded
MOV R1, R0, ASR #16 ;from one word of memory (address
MOV R0, R0, LSL #16 ;in R2). The first is put in R0, and
MOV R0, R0, ASR #16 ;the second into R1.
Big endian storing recipes
The code fragment in this section which transfers a single 16-bit data
item transfers it from the least significant 16 bits of an ARM register.
The byte offset referred to is the byte offset from a word address of the
store address. eg. the address 0x4321 has a byte offset of 1.
The second MOV instruction is needed only if the value in R0 must be
preserved after the store.
STRB R0, [R2, #1] ; 16-bit value is stored to the
MOV R0, R0, ROR #8 ; address in R2.
STRB R0, [R2, #0]
; MOV R0, R0, ROR #24
Unlike load operations, knowing the alignment of the destination address does not make optimisations possible.
If the values in R0 and R1 are not needed after they are saved, then R3
need not be used as a temporary register (one of R0 or R1 can be used
instead).
ORR R3, R0, R1, LSL #16 ; Two unsigned 16-bit values in
STR R3, [R2, #0] ; R0 and R1 are packed into the
; word addressed by R2
; R3 is a temporary register
The version for signed data is:
Again, if the values in R0 and R1 are not needed after they are saved,
then R3 need not be used as a temporary register (R0 can be used
instead).
MOV R3, R0, LSL #16 ; Two signed 16-bit values in
MOV R3, R3, LSR #16 ; R0 and R1 are packed into the
ORR R3, R3, R1, LSL #16 ; word addressed by R2.
STR R3, [R2, #0] ; R3 is a temporary register
Detecting overflow into the top 16 bits
The following instruction sets the Z flag if the value in R0 is a 16-bit unsigned value. R1 is used as a temporary register.
MOVS R1, R0, LSR #16
The following instructions set the Z flag if the value in R0 is a valid
16-bit signed value (ie. bit 15 is the same as the sign extended bits). R1
is used as a temporary register.
MOVS R1, R0, ASR #15
CMNNE R1, #1
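The two overflow tests can be expressed as C predicates that return non-zero exactly when the assembler leaves the Z flag set. This is a sketch with hypothetical function names. The signed test is written as a range check because shifting R0 right arithmetically by 15 gives 0 for values 0 to 32767 and -1 for values -32768 to -1, and anything else means the value has overflowed.

```c
#include <stdint.h>

/* Hypothetical helper: models MOVS R1, R0, LSR #16.
   Z is set when bits 16..31 are all zero. */
static int fits_unsigned16(uint32_t r0)
{
    return (r0 >> 16) == 0;
}

/* Hypothetical helper: models MOVS R1, R0, ASR #15 followed by a
   conditional compare of the result with -1. Z is set when the shifted
   value is 0 or -1, ie. when bits 15..31 are all the same. */
static int fits_signed16(int32_t r0)
{
    return r0 >= -32768 && r0 <= 32767;
}
```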
Using ARM registers as 16-bit registers
Instead of holding 16-bit data as 32-bit data within the ARM it can be
held as 16-bit data. This is done by positioning it in the top 16 bits of
the ARM registers as opposed to the bottom 16 bits as has been described
so far.
The advantages of this approach are:
ADC R0, R0, #0
ADDCS R0, R0, #&10000
Having to use this form of instruction reduces the chance of being able to
combine several data processing operations into one by making use of the
barrel shifter.
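A C model may make the top-half representation more concrete. It is a sketch under stated assumptions: the type and function names are hypothetical, and the two instructions above are read as meaning that an incoming carry must be added as &10000 (bit 16) rather than as 1, because the 16-bit value occupies bits 16 to 31.

```c
#include <stdint.h>

/* Hypothetical result type: register value plus the carry flag. */
typedef struct {
    uint32_t reg;
    int carry;
} add16_result;

/* Adds two 16-bit values held in the top halves of 32-bit registers.
   The 32-bit addition wraps exactly where a 16-bit addition would, so
   the carry out of bit 31 is the 16-bit carry. */
static add16_result add16_top(uint32_t a_top, uint32_t b_top)
{
    add16_result r;
    r.reg = a_top + b_top;     /* wraps modulo 2^32 */
    r.carry = r.reg < a_top;   /* carry out of the 32-bit add */
    return r;
}

/* Adds an incoming carry to a top-half value: the carry has to land in
   bit 16, hence the constant 0x10000 instead of 1. */
static uint32_t add_carry_top(uint32_t a_top, int carry)
{
    return a_top + (carry ? 0x10000u : 0u);
}
```

With this layout no explicit overflow check is needed after an addition, since the processor flags already describe the 16-bit result.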