Edward Nevill
2015-04-02 10:19:59 UTC
Hi,
I did some tests on the following function
--- CUT HERE ---
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---
and I discovered that on some hardware it is faster at -O2 than at -O3.
This is with gcc 4.9.2.
Looking at the disassembly I see it is using FP registers to hold
integer values. The following is a small extract.
.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2
Recompiling with -mgeneral-regs-only generates a huge improvement.
The following are the times I get on various partner HW. I have
normalised the -O2 times to 1 second so that I do not disclose actual
partner performance data:
Partner 1: -O2 = 1sec, -O3 = 1.13sec, -O3 -mgeneral-regs-only = 0.72sec
Partner 2: -O2 = 1sec, -O3 = 0.68sec, -O3 -mgeneral-regs-only = 0.60sec
Partner 3: -O2 = 1sec, -O3 = 0.73sec, -O3 -mgeneral-regs-only = 0.68sec
Partner 4: -O2 = 1sec, -O3 = 0.83sec, -O3 -mgeneral-regs-only = 0.84sec
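So, in general, -O3 does actually do better than -O2 (Partner 1 is the
exception), and in three of the four cases performance is better still if
I stop it using FP registers for int values. On Partner 4 the two -O3
builds are within 0.01sec of each other.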
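For anyone who wants to reproduce the comparison without the tarball, the
following is a minimal sketch of a timing harness. It is not the test
program from the tarball. The helper names time_fibo_ns and
normalised_milli are hypothetical, as is the choice of clock_gettime with
CLOCK_MONOTONIC. The helpers are kept integer-only on the assumption that
code using floating-point types will not build with -mgeneral-regs-only.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <time.h>

/* The function under test, as given in the report. */
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}

/* Hypothetical helper: wall-clock nanoseconds taken by one fibo(n) call.
   Uses only integer arithmetic (assumption: needed for
   -mgeneral-regs-only builds). */
long long time_fibo_ns(int n)
{
    struct timespec start, end;
    volatile int sink;  /* keeps the call from being optimised away */

    clock_gettime(CLOCK_MONOTONIC, &start);
    sink = fibo(n);
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;

    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* Hypothetical helper: express t_ns relative to base_ns in thousandths,
   so a result of 1130 corresponds to "1.13sec" when the base (-O2)
   time is normalised to 1 second. */
long long normalised_milli(long long t_ns, long long base_ns)
{
    return t_ns * 1000 / base_ns;
}
```

Calling time_fibo_ns from a small main in each of the three builds (-O2,
-O3, and -O3 -mgeneral-regs-only) and passing each result to
normalised_milli with the -O2 time as the base gives figures in the same
form as the table above.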
I have put a tarball of the test program along with 3 binaries and 3
disassemblies here:-
http://people.linaro.org/~edward.nevill/fibo.tar
All the best,
Ed.