Re: [arm-gnu] how to compile C code to NEON instructions

On 08/07/2011 08:45, vandung.tran@xxxxxxxxxxxxxxxx wrote:

Hi Bob
Thank you very much for your information
 > Yes, but the automatic vectorization is poor.
 > Check out the Neon Intrinsics and using inline assembly in your C programs.

Can you provide me some documents that explain why automatic
vectorization is poor?

I want to know which is better: compiling with -mfpu=neon or without it?
So I am looking for some C source code (with no assembly inside) that can be
compiled to NEON instructions.
However, I can't find any.

Any information would be helpful.

Best Regards,
=====================
Tran Van Dung


What are you actually trying to do here? It sounds like you want to generate Neon instructions just so that you can say "the compiler can generate Neon instructions".

First, you'll have to learn about Neon - its programming model, registers, and instructions. Then /you/ have to write some C code that is relevant to /your/ application's needs, and which you are confident would be best implemented using Neon. Then you compile it using various selections of compiler flags, studying the generated assembly code. Run the code and measure real world timings - for both Neon and non-Neon variants. Then re-implement the code with explicit Neon intrinsics, and compare that to the compiler-generated code for speed and size.
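
As a rough illustration of the kind of pair you would end up comparing, here is a minimal sketch of one trivial operation (adding two float arrays) written once as plain C and once with Neon intrinsics. The function names add_plain and add_intrinsics are made up for this example, and the operation is only a stand-in for whatever your application really does. The intrinsics (vld1q_f32, vaddq_f32, vst1q_f32) come from arm_neon.h; the #if guard is there only so the file still builds, falling back to scalar code, on a target without Neon.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Plain C version: it is up to the compiler whether this gets vectorised. */
void add_plain(float *dst, const float *a, const float *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Hand-written version: four floats per iteration in a 128-bit register. */
void add_intrinsics(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load a[i..i+3] */
        float32x4_t vb = vld1q_f32(b + i);      /* load b[i..i+3] */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* store the four sums */
    }
#endif
    /* Scalar tail; this is the whole loop when Neon is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Compile both with -S and read the assembly for each function, then time them on the real hardware with realistic data sizes. A loop this simple may well come out the same either way, which is itself useful to know.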

/Then/, and only then, will you have a good understanding of how to work with the compiler to get the fastest possible code.


As for automatic vectorisation being poor, it's actually a bit of a mixed bag. It has definitely been improving with newer versions of gcc - you have to be precise about the version number when asking about this, or when looking up the gcc manuals, as it's a part of gcc that has been under heavy development. The quality of the automatically vectorised code varies a lot - different arrangements of the source code can have a heavy influence on how well the compiler can understand and optimise it. It's important to give the compiler as much information as you can - for example, it is better to use arrays with fixed sizes rather than pointers (so that the compiler knows the loop count and can see that the arrays do not overlap), and the ordering of loops is vital. The compiler flags will also have a big effect - many of the loop and vectorisation optimisations are not enabled by any -O flags, but must be specified explicitly.
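
To make the "fixed-size arrays rather than pointers" point concrete, here is a small sketch of the same loop written both ways. The names (saxpy_fixed, saxpy_ptr, fill_inputs, N) and the file name below are invented for this example. Whether either loop actually gets vectorised depends on your gcc version and flags, so treat it as something to experiment with rather than a guaranteed result.

```c
#include <stddef.h>

#define N 1024                /* size known at compile time */

static float x[N];
static float y[N];

/* Put some known values in the arrays so results can be checked. */
void fill_inputs(void)
{
    size_t i;
    for (i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }
}

/* Fixed-size arrays: the compiler can see the trip count, and that
 * x and y are distinct objects. */
void saxpy_fixed(float a)
{
    size_t i;
    for (i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];
}

/* Same arithmetic through pointers: the length is unknown and px and py
 * might overlap, so the compiler has less to work with. */
void saxpy_ptr(float a, const float *px, float *py, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        py[i] = a * px[i] + py[i];
}
```

A starting point for the flags, assuming an ARM gcc and a source file called saxpy.c, would be something like "gcc -O2 -ftree-vectorize -mfpu=neon -mfloat-abi=softfp -funsafe-math-optimizations -S saxpy.c". As far as I understand the gcc manual, floating-point code is not auto-vectorised for Neon unless -funsafe-math-optimizations is given, because Neon arithmetic is not fully IEEE 754 compliant. Check the manual for your exact gcc version before relying on any of this.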