[arm-gnu] Using ARM Neon Intrinsics to load a constant




I want to set all of the elements of a quad-word vector to a legal floating-point constant (usually 0.0).
Using the intrinsics, it seems that I need to code...

float32x4_t vec4;

...

vec4 = vdupq_n_f32 (0.0);
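
For reference, here is a minimal self-contained sketch of the same call. The function name splat_zero and the scalar fallback branch are placeholders of mine, added only so the file also builds on a host without NEON. On an ARM target the NEON branch is the one that matters.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Set all four lanes of a quad-word vector to 0.0 and store them to out[]. */
static void splat_zero(float out[4])
{
    float32x4_t vec4 = vdupq_n_f32(0.0f);  /* the intrinsic in question */
    vst1q_f32(out, vec4);                  /* write the four lanes to memory */
}
#else
/* Hypothetical scalar fallback (not NEON): same result, one lane at a time. */
static void splat_zero(float out[4])
{
    int i;
    for (i = 0; i < 4; ++i)
        out[i] = 0.0f;
}
#endif
```

Building this with something like `-O2 -mfpu=neon -mfloat-abi=softfp -S` (the exact flags for a given toolchain are an assumption on my part) and reading the assembly output should show which instruction is used to fill the register.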


When I do that, the 2009q3 compiler generates:

mov r7,#0

vdupq.32 qx,r7


The above uses an ARM core register unnecessarily. How do I get the compiler to generate:

vmov.32 qx,#0

I wasn't sure that I was using the intrinsic properly, so I posed the question to the ARM software support engineers. They confirmed my understanding. This was their response...
RVCT 4.0 build 650 generates the code you expect. When I compile this
function:

float32x4_t Foo(void)
{
    float32x4_t vec4 = vdupq_n_f32(0.0);
    return vec4;
}

It generates:
    Foo
        0x00000000:    f2800050    P...    VMOV.I32 q0,#0
        0x00000004:    e12fff1e    ../.    BX       lr
What do I need to code to have the Code Sourcery compiler generate similar code?

Regards,
Bob