Discussion:
-O2 faster than -O3
Edward Nevill
2015-04-02 10:19:59 UTC
Hi,

I did some tests on the following function

--- CUT HERE ---
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---

and I discovered that it is faster at -O2 than at -O3. This is with gcc 4.9.2.
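
For anyone wanting to reproduce the comparison without the tarball, a
minimal timing helper might look like the following. This is only a
sketch and not the test program from the tarball: time_fibo is a
hypothetical name, and clock() measures CPU time rather than wall time.

```c
#include <time.h>

/* Same function as above. */
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}

/* Hypothetical helper: returns the CPU seconds spent in one fibo(n)
   call. The result is stored through *result so that the caller can
   check or print it. */
double time_fibo(int n, int *result)
{
    clock_t start = clock();
    *result = fibo(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_fibo from main with a reasonably large argument (40, say,
which is my assumption and not a value from the test program) and
building the file once per set of flags gives the numbers to compare.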

Looking at the disassembly I see it is using FP registers to hold
integer values. The following is a small extract.

.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2

Recompiling with -mgeneral-regs-only generates a huge improvement.

The following are the times I get on various partner HW. I have
normalised the -O2 times to 1 second so that I do not disclose actual
partner performance data:

Partner 1: -O2 = 1sec, -O3 = 1.13sec, -O3 -mgeneral-regs-only = 0.72sec
Partner 2: -O2 = 1sec, -O3 = 0.68sec, -O3 -mgeneral-regs-only = 0.60sec
Partner 3: -O2 = 1sec, -O3 = 0.73sec, -O3 -mgeneral-regs-only = 0.68sec
Partner 4: -O2 = 1sec, -O3 = 0.83sec, -O3 -mgeneral-regs-only = 0.84sec

So, in general, -O3 does actually do better than -O2, and in all cases
except Partner 4 (where the two times are within 0.01sec of each other)
performance is better if I stop it using FP registers for int values.
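
To compare the last two columns directly, the relative change can be
worked out from the normalised times above. The snippet below is just
that arithmetic; rel_change and the times array are names made up for
illustration.

```c
/* Relative change of `after` versus `before`; negative means faster. */
double rel_change(double before, double after)
{
    return (after - before) / before;
}

/* Normalised times from the table above, one row per partner:
   { -O3, -O3 -mgeneral-regs-only }. */
static const double times[4][2] = {
    {1.13, 0.72},   /* Partner 1 */
    {0.68, 0.60},   /* Partner 2 */
    {0.73, 0.68},   /* Partner 3 */
    {0.83, 0.84},   /* Partner 4 */
};
```

rel_change(times[0][0], times[0][1]) comes out at roughly -36% for
Partner 1, while Partner 4 comes out at roughly +1%, which is probably
within measurement noise.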

I have put a tarball of the test program along with 3 binaries and 3
disassemblies here:-

http://people.linaro.org/~edward.nevill/fibo.tar

All the best,
Ed.
Jim Wilson
2015-04-02 16:31:48 UTC
Post by Edward Nevill
> Looking at the disassembly I see it is using FP registers to hold
> integer values. The following is a small extract.

-O3 turns on -finline-functions, which causes a lot of code expansion.
That, combined with the fact that we end up with a call in the middle
of a loop, and most values have lifetimes that cross the call, means
that it runs out of registers, and needs to spill. The compiler is
then choosing to use FP registers for spills instead of
storing/loading to/from the stack. I'm not seeing the same behaviour
from the FSF GCC mainline, nor from the linaro-4.9 branch. These are
both spilling to stack instead of FP registers.
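
To illustrate the point about lifetimes crossing the call, here is a
hand-written sketch of the shape the function takes once the fibo(n-1)
call has been turned into a loop. It is an approximation for
illustration, not actual GCC output, and fibo_loop is a hypothetical
name.

```c
/* Function from the original post. */
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}

/* Loop form: the fibo(n-1) call becomes the next loop iteration,
   leaving one real call in the middle of the loop. */
int fibo_loop(int n)
{
    int acc = 0;                /* live across every call below */
    while (n >= 2) {
        acc += fibo(n - 2);     /* the call in the middle of the loop */
        n -= 1;                 /* n is live across the call too */
    }
    return acc + 1;             /* base case: fibo(0) == fibo(1) == 1 */
}
```

Both acc and n have to survive each call, so they need callee-saved
registers or a spill slot. Presumably each level of inlining of the
remaining call adds another accumulator and counter like these, which
would be how the register pressure builds up.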

This appears to be FSF GCC Bug 61915
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
which was fixed in November and backported to our Linaro branch last month.

Jim
