[arm-gnu] NEON usage
- To: arm-gnu <arm-gnu@xxxxxxxxxxxxxxxx>
- Subject: [arm-gnu] NEON usage
- From: James <jamessteward@xxxxxxxxxxxxxxx>
- Date: Tue, 11 May 2010 13:20:10 +1000
Hi,
I have a question regarding vectorized loops. With this code:
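#include <stdio.h>
#include <arm_neon.h>
#define N 1024
int main(int argc, char **argv)
{
    int32_t x[N], y[N];   /* sample data would be filled in here */
    int i;
    int64_t sum = 0;
    for (i = 0; i < N; i++) {
        /* widen before multiplying so the product keeps all 64 bits */
        sum += (int64_t)x[i] * y[i];
    }
    /* print the upper 32 bits of the accumulator */
    printf("sum is %d\n", (int)(sum >> 32));
    return 0;
}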
Compiled with...
$ arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize
-ftree-vectorizer-verbose=5 -ffast-math -fvect-cost-model -o neon
test.c
test.c:27: note: not vectorized: unsupported data-type int64_t
test.c:21: note: vectorized 0 loops in function.
But the ARM NEON intrinsics documentation here:
http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html
seems to suggest that
int64x2_t vmlal_s32 (int64x2_t, int32x2_t, int32x2_t)
could be used?
Have I just done something silly, or does the compiler really not
support the 32x32=64 bit result?
I have 24-bit ADC data and need to build a 32-bit FIR filter to process it.
Do I have to write the code using the intrinsics and not rely on the
compiler to magic it?
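For reference, a hand-written version of the multiply-accumulate loop using vmlal_s32 might look like the sketch below. It is only an illustration under stated assumptions: dot_s32_wide is a made-up helper name, the NEON path is taken only when the compiler defines __ARM_NEON (or the older __ARM_NEON__), and a plain scalar loop handles the leftover element or builds without NEON. It has not been measured against the vectorizer's output.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* dot_s32_wide is a hypothetical helper name, not part of any library.
 * It returns the sum of x[i] * y[i], with each product widened to 64 bits. */
int64_t dot_s32_wide(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t sum = 0;
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Two 64-bit accumulator lanes, both starting at zero. */
    int64x2_t acc = vdupq_n_s64(0);
    for (; i + 2 <= n; i += 2) {
        int32x2_t vx = vld1_s32(&x[i]);  /* load two 32-bit samples */
        int32x2_t vy = vld1_s32(&y[i]);
        acc = vmlal_s32(acc, vx, vy);    /* per lane: acc += 64-bit product */
    }
    /* Fold the two lanes into the scalar result. */
    sum = vgetq_lane_s64(acc, 0) + vgetq_lane_s64(acc, 1);
#endif

    /* Scalar path: the leftover element on NEON builds, or the whole
     * array when NEON is not available. */
    for (; i < n; i++) {
        sum += (int64_t)x[i] * y[i];
    }
    return sum;
}
```

In the test program above, the loop body would then reduce to a single call such as sum = dot_s32_wide(x, y, N). Keeping the scalar tail means the same function also works when N is odd.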
Regards,
James.