[arm-gnu] Code generation for Cortex M3



Hi.
 
I was looking at the code gcc 4.4.4 generates for accessing HW peripherals and was surprised to see it build the address with two 16-bit immediate moves (movw/movt) instead of one 32-bit load.
 
C source:
   volatile UWord * puwTmp;
 
   puwTmp = (volatile UWord *)0x40000130;
 
   *puwTmp = 0x00;
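
For anyone who wants to compile the snippet on its own, here is a minimal sketch. It assumes UWord is a 32-bit unsigned type (its definition is not shown above), and the names PERIPH_REG_ADDR and write_reg are made up for illustration.

```c
#include <stdint.h>

/* Assumption: UWord is a 32-bit unsigned type; its real definition
   is not shown in the post. */
typedef uint32_t UWord;

/* Peripheral address taken from the post. */
#define PERIPH_REG_ADDR 0x40000130u

/* Hypothetical helper: store through a volatile pointer so the
   compiler cannot optimize the write away. */
static void write_reg(volatile UWord *reg, UWord value)
{
    *reg = value;
}

/* On the target, the three lines from the post amount to:
       write_reg((volatile UWord *)PERIPH_REG_ADDR, 0x00);
   That address is only meaningful on the MCU, so do not run
   that call on a host PC. */
```

Passing the register pointer as a parameter is only a convenience so the same helper can be exercised against an ordinary variable on a host.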
 
The code generated is:
    F2401330    movw r3, #0x130
    F2C40300    movt r3, #0x4000
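
As a cross-check on the listing above, here is a small sketch that pulls the 16-bit immediate out of the two Thumb-2 encodings and combines them. It assumes each instruction is passed as (first halfword << 16) | second halfword, exactly as printed in the listing; the function names are made up for illustration.

```c
#include <stdint.h>

/* Extract imm16 from a 32-bit Thumb-2 MOVW/MOVT encoding.
   The immediate is split across four fields: imm4:i:imm3:imm8. */
static uint16_t thumb2_mov_imm16(uint32_t insn)
{
    uint32_t imm4 = (insn >> 16) & 0xFu;  /* bits 19..16 */
    uint32_t i    = (insn >> 26) & 0x1u;  /* bit 26      */
    uint32_t imm3 = (insn >> 12) & 0x7u;  /* bits 14..12 */
    uint32_t imm8 = insn & 0xFFu;         /* bits 7..0   */
    return (uint16_t)((imm4 << 12) | (i << 11) | (imm3 << 8) | imm8);
}

/* Destination register Rd lives in bits 11..8. */
static unsigned thumb2_mov_rd(uint32_t insn)
{
    return (unsigned)((insn >> 8) & 0xFu);
}

/* movw sets the low halfword (and clears the top);
   movt then sets the top halfword. */
static uint32_t movw_movt_value(uint32_t movw, uint32_t movt)
{
    return ((uint32_t)thumb2_mov_imm16(movt) << 16) | thumb2_mov_imm16(movw);
}
```

Feeding it the two words from the listing, movw_movt_value(0xF2401330, 0xF2C40300) gives 0x40000130 with r3 as the destination in both cases, so the pair does build the full peripheral address.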
 
This code is eight bytes and takes 6 cycles to execute (actually, I see two and a half of the two-instruction sequences above execute per ten clock cycles, so 6.6 cycles is more accurate). I'm not sure whether this is due to pipelining. I'm running this code on an STM32F103 at 72 MHz, executing out of RAM, where there are no wait states.
 
I think the following would be better:
    4899        ldr r0, [pc, #0x264]
 
and a little later in memory we have the halfwords 0x0130 and 0x4000, i.e. the 32-bit literal 0x40000130.
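
To show how the PC-relative form finds its literal, here is a sketch that decodes the 16-bit Thumb LDR (literal) encoding. The function names are made up, and the instruction address used in the note after the code is a hypothetical RAM address, not one from my build.

```c
#include <stdint.h>

/* 16-bit Thumb LDR (literal): 01001 Rt imm8.
   The byte offset is imm8 scaled by 4. */
static uint32_t thumb_ldr_lit_offset(uint16_t insn)
{
    return (uint32_t)(insn & 0xFFu) * 4u;
}

/* Target register Rt lives in bits 10..8. */
static unsigned thumb_ldr_lit_rt(uint16_t insn)
{
    return (unsigned)((insn >> 8) & 0x7u);
}

/* Address of the literal: the base is the instruction address plus 4
   (the PC value seen by a Thumb instruction), rounded down to a
   word boundary, then the offset is added. */
static uint32_t thumb_ldr_lit_addr(uint32_t insn_addr, uint16_t insn)
{
    uint32_t base = (insn_addr + 4u) & ~3u;
    return base + thumb_ldr_lit_offset(insn);
}
```

For 0x4899 this gives r0 and an offset of 0x264, matching the disassembly. If the instruction sat at a hypothetical address 0x20000000, the literal would be read from 0x20000268.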
 
This code sequence is six bytes (a two-byte instruction plus the four-byte literal) and takes four cycles to execute.
 
Any suggestions on how to configure gcc to generate the better code? I've tried setting the optimization level to none, optimize for size, maximum optimization, etc., but to no avail.
 
Thanks,
Harjit
PS: Tangent alert - I'm surprised there is no 32-bit immediate load given that this is a 32-bit processor.