Re: [arm-gnu] how to compile C code to NEON instructions



On 07/08/2011 03:33 AM, David Brown wrote:

> As for automatic vectorisation being poor, it's actually a bit of a
> mixed bag.  It has definitely been improving with newer versions of gcc
> - you have to be precise about the version number when asking about
> this, or when looking up the gcc manuals, as it's a part of gcc that has
> been under heavy development.  The quality of the automatic
> vectorisation code varies a lot - different arrangements of the source
> code can have a heavy influence in how well the compiler can understand
> and optimise it.  It's important to give the compiler as much
> information as you can - for example, it is better to use arrays with
> fixed sizes rather than pointers, and the ordering of loops is vital.
> The compiler flags will also have a big effect - many of the loop and
> vectorisation optimisations are not enabled by any -O flags, but must be
> specified explicitly.

- Last time I looked at this, GCC could vectorize either float or double
within a single source file, but not both.

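- NEON floating point is considered "unsafe" because the hardware is not
fully IEEE 754 compliant (denormals are flushed to zero), so GCC will not
auto-vectorize floating-point code for NEON unless you also pass
-funsafe-math-optimizations.  Read the gcc manual (under -mfpu) for
details.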

- Getting code to vectorize can be tricky.  For example, in C
	*a++ = *b++ + *c++;
won't vectorize, because the compiler has to assume the pointers may
overlap (what if a == b+1?), unless you tell it that a, b and c aren't
aliased.

- You might have more success if you wrote your code to use (C)BLAS.
This would mean you would have to find a vectorized CBLAS library for
NEON.  I'm not sure one exists yet.  The latest Ubuntu release has
ATLAS, but I'm not sure how they built it, since I couldn't get it to
build out of the box.  Regardless, using a BLAS library forces a little
discipline on you while cutting down on the number of places where you
need to vectorize (maybe by hand).
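
To make the aliasing point concrete, here is a minimal sketch of the
same loop written so the compiler is told the pointers don't overlap.
The function name add_arrays and its parameters are made up for
illustration; the only real ingredient is the C99 "restrict" qualifier.
Whether a given GCC actually emits NEON code for it still depends on
the version and flags; something along the lines of
-std=c99 -O3 -mfpu=neon -funsafe-math-optimizations is a reasonable
starting point, but check the manual for your release.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original post: element-wise sum.
 * "restrict" (C99) is a promise from the caller that dst, x and y never
 * overlap, so the compiler is free to load and store several elements
 * per iteration.  Breaking that promise is undefined behaviour. */
void add_arrays(float *restrict dst, const float *restrict x,
                const float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

Indexing with a counted loop rather than bumping three pointers is a
style choice here; it tends to make the trip count easier for the
compiler to see, but the restrict qualifiers are what address the
a == b+1 problem.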