Re: [arm-gnu] how to compile C code to NEON instructions
- To: arm-gnu@xxxxxxxxxxxxxxxx
- Subject: Re: [arm-gnu] how to compile C code to NEON instructions
- From: Allen McIntosh <mcintosh@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 08 Jul 2011 07:18:02 -0400
On 07/08/2011 03:33 AM, David Brown wrote:
> As for automatic vectorisation being poor, it's actually a bit of a
> mixed bag. It has definitely been improving with newer versions of gcc
> - you have to be precise about the version number when asking about
> this, or when looking up the gcc manuals, as it's a part of gcc that has
> been under heavy development. The quality of the automatic
> vectorisation code varies a lot - different arrangements of the source
> code can have a heavy influence in how well the compiler can understand
> and optimise it. It's important to give the compiler as much
> information as you can - for example, it is better to use arrays with
> fixed sizes rather than pointers, and the ordering of loops is vital.
> The compiler flags will also have a big effect - many of the loop and
> vectorisation optimisations are not enabled by any -O flags, but must be
> specified explicitly.
- Last time I looked at this, GCC could vectorize either float or double
code within a single source file, but not both.
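- Neon floating point is considered "unsafe": it is not fully IEEE 754
compliant (denormals are flushed to zero), so GCC will not auto-vectorize
floating-point code for Neon unless you also pass
-funsafe-math-optimizations. Read the gcc manual (-mfpu) for details.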
- Getting code to vectorize can be tricky. For example, in C, a loop around
    *a++ = *b++ + *c++;
won't vectorize (what if a == b+1?) unless you tell the compiler that a, b
and c don't alias.
- You might have more success if you wrote your code to use (C)BLAS.
This would mean you would have to find a vectorized CBLAS library for
Neon. I'm not sure one exists yet. The latest Ubuntu release has
ATLAS, but I'm not sure how they did it since I couldn't get it to build
out of the box. Regardless, using a BLAS library forces you to have a
little discipline while cutting down on the places where you need to
vectorize (maybe by hand).
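
To make the aliasing point concrete, here is a minimal sketch of the kind
of loop the vectorizer has a chance with. The function name add_arrays and
its parameters are made up for illustration. The C99 restrict qualifier is
the promise that the three arrays never overlap; whether a given gcc
version actually emits Neon code for it is something you have to check in
the generated assembly.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise sum of two float arrays.
 * "restrict" tells the compiler that dst, x and y never overlap,
 * so a store through dst cannot change a later load from x or y. */
void add_arrays(float *restrict dst, const float *restrict x,
                const float *restrict y, size_t n)
{
    /* Indexed form with a simple trip count, rather than *a++ = *b++ + *c++ */
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

Something along the lines of
    gcc -std=c99 -O3 -mfpu=neon -mfloat-abi=softfp -funsafe-math-optimizations -S add.c
should be a reasonable starting point on a 32-bit ARM target (the file name
is an assumption, and the right float ABI depends on your toolchain). Look
for q-register instructions such as vadd.f32 in the resulting .s file.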