Re: [arm-gnu] how to compile C code to NEON instructions

To: vandung.tran@xxxxxxxxxxxxxxxx
Subject: Re: [arm-gnu] how to compile C code to NEON instructions
From: Bob Feretich <bob.feretich@xxxxxxxxxxxxxxx>
Date: Fri, 08 Jul 2011 12:32:36 -0700

David is right. You need to determine how the Neon can be used to best implement your algorithm. This involves structuring your algorithm for SIMD execution. Once structured for SIMD, then if you write the "C" code to follow this structure, there is a chance that the compiler will generate good code. However, learning the idiosyncrasies of the compiler's code generation will be difficult. I haven't found any documentation on how to write C for optimum SIMD vectorization. Trial and error coding or reading the compiler source code may be the best methods.

In determining the SIMD implementation of my algorithms, I found myself roughly coding the algorithm in Neon instructions. At that point it was easier for me to translate the algorithm to Neon Intrinsics than to figure out how to get the C compiler to produce similar results. The Intrinsics often generated unneeded register-to-register move instructions. If eliminating these extra instructions was important, I would implement the segment in inline assembly.
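As a rough illustration of what "translating the algorithm to Neon Intrinsics" can look like, here is a minimal sketch of a multiply-accumulate loop. The function name `madd_f32` and its parameters are hypothetical and not from the post. The intrinsics used (`vld1q_f32`, `vmlaq_f32`, `vst1q_f32`) come from `arm_neon.h`; the scalar path is only there so the same file still builds when NEON is not enabled.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] * b[i] + c[i] for i in [0, n). */
void madd_f32(float *dst, const float *a, const float *b,
              const float *c, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL && c != NULL);

    size_t i = 0;
#if HAVE_NEON
    /* Process four floats per iteration in 128-bit Q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        /* vmlaq_f32(acc, x, y) computes acc + x * y per lane. */
        vst1q_f32(dst + i, vmlaq_f32(vc, va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = a[i] * b[i] + c[i];
}
```

Comparing the assembly generated for the intrinsic loop against a hand-written version is one way to spot the extra register moves mentioned above.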


The link below provides some Neon coding examples.
http://www.arm.com/files/pdf/NEON_Support_in_the_ARM_Compiler.pdf

There is also a project to create a Neon optimized Math library. Many good coding samples can be found in its source.

For some things like in-place matrix transpose, only inline assembly will achieve optimal results. Neon can transpose a 4x4 matrix of 32-bit elements in four instructions. I have not seen a 'C' implementation that comes close to this.
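For comparison, here is a sketch of the same 4x4 transpose written with intrinsics rather than inline assembly. The function name `transpose4x4_u32` is hypothetical. The post does not spell out the four-instruction sequence; this version assumes the usual approach of two `vtrnq_u32` operations followed by swapping the 64-bit halves. How many instructions the compiler actually emits for it depends on the compiler and version, so it should not be read as matching the hand-written result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: transpose a row-major 4x4 matrix in place. */
void transpose4x4_u32(uint32_t m[16])
{
    assert(m != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t r0 = vld1q_u32(m + 0);
    uint32x4_t r1 = vld1q_u32(m + 4);
    uint32x4_t r2 = vld1q_u32(m + 8);
    uint32x4_t r3 = vld1q_u32(m + 12);

    /* Transpose the 2x2 blocks inside each pair of rows. */
    uint32x4x2_t t01 = vtrnq_u32(r0, r1); /* {a0 b0 a2 b2}, {a1 b1 a3 b3} */
    uint32x4x2_t t23 = vtrnq_u32(r2, r3); /* {c0 d0 c2 d2}, {c1 d1 c3 d3} */

    /* Exchange the 64-bit halves to finish the transpose. */
    vst1q_u32(m + 0,  vcombine_u32(vget_low_u32(t01.val[0]),
                                   vget_low_u32(t23.val[0])));
    vst1q_u32(m + 4,  vcombine_u32(vget_low_u32(t01.val[1]),
                                   vget_low_u32(t23.val[1])));
    vst1q_u32(m + 8,  vcombine_u32(vget_high_u32(t01.val[0]),
                                   vget_high_u32(t23.val[0])));
    vst1q_u32(m + 12, vcombine_u32(vget_high_u32(t01.val[1]),
                                   vget_high_u32(t23.val[1])));
#else
    /* Portable fallback: swap elements across the diagonal. */
    for (int r = 0; r < 4; ++r) {
        for (int c = r + 1; c < 4; ++c) {
            uint32_t tmp = m[r * 4 + c];
            m[r * 4 + c] = m[c * 4 + r];
            m[c * 4 + r] = tmp;
        }
    }
#endif
}
```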


Regards,
Bob

On 7/8/2011 12:33 AM, David Brown wrote:

On 08/07/2011 08:45, vandung.tran@xxxxxxxxxxxxxxxx wrote:
Hi Bob
Thank you very much for your information
> Yes, but the automatic vectorization is poor.
> Check out the Neon Intrinsics and using inline assembly in your C
> programs.

Can you provide me some documents that explain why automatic
vectorization is poor?

I want to know which compiler option is better: using -mfpu=neon or not?
So I am looking for some C source code (no assembly inside) that can be
compiled to NEON instructions.
However, I can't find any.

Any information would be helpful.

Best Regards,
=====================
Tran Van Dung

What are you actually trying to do here? It sounds like you want to generate Neon instructions just so that you can say "the compiler can generate Neon instructions".
First, you'll have to learn about Neon - its programming model, registers, and instructions. Then /you/ have to write some C code that is relevant for /your/ application needs, and which you are confident would be best implemented using Neon. Then you compile it using various selections of compiler flags, studying the generated assembly code. Run the code and measure real world timings - for both Neon and non-Neon variants. Then re-implement the code with explicit Neon intrinsics, and compare that to the compiler-generated code for speed and size.
/Then/, and only then, will you have a good understanding about how to work together with the compiler to get the fastest possible code.
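The "measure real world timings" step could be sketched with a small harness like the one below. The names `seconds_per_call`, `sample_kernel` and `kernel_calls` are hypothetical, and `sample_kernel` is only a stand-in for the routine being compared. It uses the standard `clock()` call, which reports CPU time at fairly coarse resolution, so the repetition count needs to be large enough for the result to be meaningful.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the routine under test. */
static volatile unsigned long kernel_calls = 0;
void sample_kernel(void) { ++kernel_calls; }

/* Average CPU seconds per call of `kernel` over `reps` repetitions. */
double seconds_per_call(void (*kernel)(void), unsigned long reps)
{
    assert(kernel != NULL && reps > 0);

    clock_t start = clock();
    for (unsigned long i = 0; i < reps; ++i)
        kernel();
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC / (double)reps;
}
```

Building the same source once with and once without the Neon-related flags, and calling `seconds_per_call` on each build's routine, gives the Neon versus non-Neon comparison described above.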
As for automatic vectorisation being poor, it's actually a bit of a mixed bag. It has definitely been improving with newer versions of gcc - you have to be precise about the version number when asking about this, or when looking up the gcc manuals, as it's a part of gcc that has been under heavy development. The quality of the automatic vectorisation code varies a lot - different arrangements of the source code can have a heavy influence in how well the compiler can understand and optimise it. It's important to give the compiler as much information as you can - for example, it is better to use arrays with fixed sizes rather than pointers, and the ordering of loops is vital. The compiler flags will also have a big effect - many of the loop and vectorisation optimisations are not enabled by any -O flags, but must be specified explicitly.
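A minimal sketch of C that follows the advice above (fixed-size arrays, inner loop over the contiguous index) might look like this. The array names, sizes and the `brighten` function are hypothetical. Integer data is used because, on 32-bit ARM, gcc documents that it will not auto-vectorise floating-point code for NEON unless `-funsafe-math-optimizations` is also given. A command line along the lines of `gcc -O2 -ftree-vectorize -mfpu=neon -mfloat-abi=softfp -S` is one plausible starting point for inspecting the generated assembly, but the exact flags and their defaults depend on the gcc version.

```c
#include <assert.h>
#include <stdint.h>

enum { ROWS = 64, COLS = 64 }; /* hypothetical sizes, known at compile time */

/* The inner loop below assumes whole groups of four 32-bit lanes per row. */
static_assert(COLS % 4 == 0, "row length should be a multiple of four");

/* Fixed-size arrays: the compiler can see the bounds and that the
   input and output do not overlap. */
int32_t image_in[ROWS][COLS];
int32_t image_out[ROWS][COLS];

/* Hypothetical kernel: out = in * gain + offset for every element. */
void brighten(int32_t gain, int32_t offset)
{
    for (int r = 0; r < ROWS; ++r) {
        /* Inner loop walks the contiguous dimension with unit stride. */
        for (int c = 0; c < COLS; ++c)
            image_out[r][c] = image_in[r][c] * gain + offset;
    }
}
```

Swapping the two loops so that the inner one strides across rows is an easy experiment for seeing how much loop ordering changes the compiler's output.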
