madtom1999 wrote:Is there anywhere there is a list of (raspbian) compiler switches suitable for maximum performance or possible optimisations for each version of the pi? These would be really helpful for a lot of us for getting other packages working.
This is what I use:
Pi Zero (armv6)
-mcpu=arm1176jzf-s -mfpu=vfp
Pi2 (armv7)
-mcpu=cortex-a7 -mfpu=neon-vfpv4 -mneon-for-64bits
(add -mthumb for a much smaller executable).
Pi3 and Pi2 V1.2 (armv8)
-mcpu=cortex-a53 -mfpu=neon-fp-armv8 -mneon-for-64bits
(change to "-march=armv8-a+crc -mfpu=neon-fp-armv8 -mtune=cortex-a53" if you need the crc extensions).
Pi3 (armv8 in 64-bit mode)
-mcpu=cortex-a53
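If you do build with the crc extensions, something along these lines should pick them up. This is only a minimal sketch: crc32_step and crc32_buf are made-up names for illustration, and it assumes the compiler defines the ACLE macro __ARM_FEATURE_CRC32 and ships arm_acle.h when the extension is enabled. Without the extension it falls back to a plain bitwise loop, so the same source builds for every Pi.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* provides __crc32b when the CRC extension is enabled */
#endif

/* One CRC-32 step over a single byte (reflected polynomial 0xEDB88320). */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32b(crc, byte);   /* hardware CRC32B instruction */
#else
    crc ^= byte;                  /* portable bitwise fallback */
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* Hypothetical helper: standard CRC-32 (as used by zlib) of a buffer. */
uint32_t crc32_buf(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;   /* conventional initial value */
    while (len--)
        crc = crc32_step(crc, *p++);
    return ~crc;                  /* conventional final inversion */
}
```

The standard check string "123456789" should give 0xCBF43926 on either path, which is a quick way to confirm the hardware and fallback versions agree.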
The neon-for-64bits flag "encourages" the compiler to make more use of NEON for 64-bit integer arithmetic. It will use NEON anyway for hard things like 64-bit shifts, and also if it happens to have the value already in a NEON register.
There are no fpu flags allowed for aarch64 because the presence of NEON is guaranteed (and obviously "neon-for-64bits" is pointless!). NEON in 64-bit mode is fully IEEE 754 compliant.
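To see what neon-for-64bits actually changes in 32-bit mode, one option is to compile a small 64-bit integer loop with and without the flag, using gcc -S, and compare the assembly. The function below is only a hypothetical test kernel (shift_add_u64 is a made-up name), and whether the compiler really moves the work to NEON will depend on the GCC version and optimisation level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: out[i] = (a[i] << shift) + b[i] on 64-bit integers.
 * In 32-bit mode each 64-bit shift and add needs several core instructions,
 * which is the kind of code the flag is meant to influence. */
void shift_add_u64(uint64_t *out, const uint64_t *a, const uint64_t *b,
                   size_t n, unsigned shift)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] << (shift & 63u)) + b[i];   /* mask keeps the shift defined */
}
```

The result is the same either way; only the generated instructions should differ.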
Armv6 only supports thumb1 (16-bit instructions only) which is too limited to be useful. Armv7 supports thumb2 which is both 16 and 32 bits and is complete (some ARM processors only accept thumb2). Armv8 partially deprecates thumb2, so I avoid it. Aarch64 instructions are always 32 bits.
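If you are not sure which instruction set a particular combination of flags ended up selecting, the compiler's predefined macros will tell you. A rough sketch, assuming GCC's usual __aarch64__, __thumb2__, __thumb__, __arm__ and __ARM_NEON macros (target_isa and target_has_neon are made-up names):

```c
/* Returns a short name for the instruction set this file was compiled for. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__thumb2__)
    return "thumb2";
#elif defined(__thumb__)
    return "thumb1";
#elif defined(__arm__)
    return "arm";
#else
    return "not an ARM target";
#endif
}

/* Returns 1 if the compiler was told NEON is available, otherwise 0.
 * __ARM_NEON__ is the older spelling used by earlier GCC releases. */
int target_has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Printing both values from main with each set of flags (with and without -mthumb, for example) shows what the compiler thinks it is targeting.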
There is no need for separate -march or -mtune flags, because -mcpu sets both (the crc variant above is the exception).
A good optimization is to use a recent version of GCC because much work has been done for ARM recently. At the time of writing, the current version is 6.2.
Perhaps something like this should be in a sticky?