[arm-gnu] NEON usage

Hi,

I have a question regarding vectorized loops.  Given this code:

#include <stdio.h>	/* for printf */
#include <arm_neon.h>

#define N 1024

int main(int argc, char **argv)
{
	int32_t x[N], y[N];
	int i;
	int64_t sum = 0;

	for (i = 0; i < N; i++) {
		sum += x[i] * y[i];
	}

	/* sum is 64-bit, so use a matching format for the upper word */
	printf("sum is %lld\n", (long long)(sum >> 32));
	return 0;
}

Compiled with...

$ arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize
-ftree-vectorizer-verbose=5 -ffast-math -fvect-cost-model  -o neon
test.c

test.c:27: note: not vectorized: unsupported data-type int64_t
test.c:21: note: vectorized 0 loops in function.
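
Separately from the vectorizer message, it may be worth noting that in C the expression x[i] * y[i] is evaluated in 32 bits when both operands are int32_t; the result is only widened to 64 bits when it is added to sum. A minimal scalar sketch of a true 32x32=64 accumulate is below. The helper name dot_widened is hypothetical, and whether this form changes what the vectorizer reports is not something the output above tells us.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): scalar reference
 * that forms each product in 64 bits before accumulating. */
static int64_t dot_widened(const int32_t *x, const int32_t *y, int n)
{
    int64_t sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* Cast one operand first so the multiply itself is 64-bit. */
        sum += (int64_t)x[i] * y[i];
    }
    return sum;
}
```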

But the ARM NEON intrinsics documented here:
http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

seem to suggest that

int64x2_t vmlal_s32 (int64x2_t, int32x2_t, int32x2_t)

could be used?
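
For illustration, a rough sketch of how that intrinsic might be used by hand, two elements at a time, with a scalar loop for any leftover element. The helper name dot_s32 and the HAVE_NEON guard are assumptions made for this sketch; the guard is only there so the same file still builds on a target without NEON, where it falls back to the scalar path. This is not tuned for the Cortex-A8.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: 64-bit dot product of two int32_t arrays. */
static int64_t dot_s32(const int32_t *x, const int32_t *y, int n)
{
    int64_t sum = 0;
    int i = 0;

#if HAVE_NEON
    /* Two 64-bit accumulator lanes, both starting at zero. */
    int64x2_t acc = vdupq_n_s64(0);

    for (; i + 2 <= n; i += 2) {
        int32x2_t a = vld1_s32(x + i);   /* load x[i], x[i+1] */
        int32x2_t b = vld1_s32(y + i);   /* load y[i], y[i+1] */
        acc = vmlal_s32(acc, a, b);      /* acc += widen(a * b) */
    }
    /* Fold the two lanes into a single scalar. */
    sum = vgetq_lane_s64(acc, 0) + vgetq_lane_s64(acc, 1);
#endif

    /* Scalar tail (or the whole loop when NEON is not available). */
    for (; i < n; i++)
        sum += (int64_t)x[i] * y[i];

    return sum;
}
```

Calling dot_s32(x, y, N) would stand in for the loop in main() above, provided x and y have been filled with data first.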

Have I just done something silly, or does the compiler really not
support the 32x32=64 bit result?

I have 24-bit ADC data and need to build a 32-bit FIR filter to process it.
Do I have to write the code using the intrinsics and not rely on the
compiler to magic it?

Regards,
James.