Discussion:
Linaro GCC vs Valgrind
William Mills
2016-06-09 21:22:13 UTC
Hello,

We have been using Linaro GCC 5.x[1] and valgrind.

When the optimizer is turned on valgrind complains about writes beyond
the current stack pointer. With the optimizer off, the problem report
goes away.

I have my own conclusion about what is going on but I won't bias you
with it. Here are the facts:

All files and logs are attached as a 10K tar.gz, if it survives this mailing list.

test.c:
#include <stdio.h>

int main(int argc, char** argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        printf("argument is %s\n", argv[i]);
    }

    return 0;
}

$ arm-linux-gnueabihf-gcc -march=armv7ve -marm -mfpu=neon \
-mfloat-abi=hard -mcpu=cortex-a15 -O2 -g \
-o test-fail test.c


$ valgrind --leak-resolution=high --track-origins=yes \
--trace-children=yes --leak-check=full --error-limit=no \
./test-fail arg1 arg2 arg3

==20011== Memcheck, a memory error detector
==20011== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==20011== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==20011== Command: ./test-fail arg1 arg2 arg3
==20011==
==20011== Invalid write of size 4
==20011== at 0x10300: main (test.c:4)
==20011== Address 0xbdbfcb58 is on thread 1's stack
==20011== 24 bytes below stack pointer
==20011==

000102f8 <main>:
102f8: e3500001 cmp r0, #1
102fc: da000014 ble 10354 <main+0x5c>
10300: e16d41f8 strd r4, [sp, #-24]! ; 0xffffffe8
^^^^^^^^ Complaint is here

10304: e1a05001 mov r5, r1
10308: e3a04001 mov r4, #1
1030c: e1cd60f8 strd r6, [sp, #8]
10310: e300748c movw r7, #1164 ; 0x48c
10314: e1a06000 mov r6, r0
10318: e3407001 movt r7, #1
1031c: e58d8010 str r8, [sp, #16]
10320: e58de014 str lr, [sp, #20]
10324: e2844001 add r4, r4, #1
10328: e5b51004 ldr r1, [r5, #4]!
1032c: e1a00007 mov r0, r7
10330: ebffffe4 bl 102c8 <printf@plt>
10334: e1560004 cmp r6, r4
10338: 1afffff9 bne 10324 <main+0x2c>
1033c: e1cd40d0 ldrd r4, [sp]
10340: e3a00000 mov r0, #0
10344: e1cd60d8 ldrd r6, [sp, #8]
10348: e59d8010 ldr r8, [sp, #16]
1034c: e28dd014 add sp, sp, #20
10350: e49df004 pop {pc} ; (ldr pc, [sp], #4)
10354: e3a00000 mov r0, #0
10358: e12fff1e bx lr

Without the optimizer, the code looks different and valgrind does not
issue any errors.

000103d8 <main>:
103d8: e52db008 str fp, [sp, #-8]!
^^^^^^^ Valgrind does not complain about this

103dc: e58de004 str lr, [sp, #4]
103e0: e28db004 add fp, sp, #4
103e4: e24dd010 sub sp, sp, #16
103e8: e50b0010 str r0, [fp, #-16]
103ec: e50b1014 str r1, [fp, #-20] ; 0xffffffec
103f0: e3a03001 mov r3, #1
103f4: e50b3008 str r3, [fp, #-8]
103f8: ea00000b b 1042c <main+0x54>
103fc: e51b3008 ldr r3, [fp, #-8]
10400: e1a03103 lsl r3, r3, #2
10404: e51b2014 ldr r2, [fp, #-20] ; 0xffffffec
10408: e0823003 add r3, r2, r3
1040c: e5933000 ldr r3, [r3]
10410: e1a01003 mov r1, r3
10414: e30004a4 movw r0, #1188 ; 0x4a4
10418: e3400001 movt r0, #1
1041c: ebffffa9 bl 102c8 <printf@plt>
10420: e51b3008 ldr r3, [fp, #-8]
10424: e2833001 add r3, r3, #1
10428: e50b3008 str r3, [fp, #-8]
1042c: e51b2008 ldr r2, [fp, #-8]
10430: e51b3010 ldr r3, [fp, #-16]
10434: e1520003 cmp r2, r3
10438: baffffef blt 103fc <main+0x24>
1043c: e3a03000 mov r3, #0
10440: e1a00003 mov r0, r3
10444: e24bd004 sub sp, fp, #4
10448: e59db000 ldr fp, [sp]
1044c: e28dd004 add sp, sp, #4
10450: e49df004 pop {pc} ; (ldr pc, [sp], #4)


[1] 5.3-2016.02 for Yocto-project and cross-compile
5.2 on the ARM target "since Linaro hasn’t yet fixed building 5.3 from
recipes yet."
Both versions give the same results for this test program.

----------------
William A. Mills
Chief Technologist, Open Solutions, SDO
Texas Instruments, Inc.
20450 Century Blvd
Germantown MD 20878
240-643-0836
Jim Wilson
2016-06-09 23:46:53 UTC
Post by William Mills
When the optimizer is turned on valgrind complains about writes beyond
the current stack pointer. With the optimizer off, the problem report
goes away.
102f8: e3500001 cmp r0, #1
102fc: da000014 ble 10354 <main+0x5c>
10300: e16d41f8 strd r4, [sp, #-24]! ; 0xffffffe8
^^^^^^^^ Complaint is here
This optimization is called shrink-wrapping. It involves moving the
function prologue/epilogue inside an outermost if statement, so that
we can avoid allocating a stack frame when we don't need it. It
can be disabled with -fno-shrink-wrap. Perhaps valgrind has special
support to detect stack writes inside a prologue, and this support is
failing when a function is shrink-wrapped because it can't identify
where the prologue is.
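
As a rough illustration (a sketch, not taken from the GCC sources), a
function of the following shape is a typical shrink-wrapping candidate.
The name total_arg_length is hypothetical. Whether a given compiler
version actually shrink-wraps it depends on the target and options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: total length of argv[1..argc-1].
 * The early return needs no callee-saved registers, so with
 * shrink-wrapping enabled the compiler may place the register saves
 * after the argc check. That is the same shape as the cmp/ble/strd
 * sequence in the -O2 listing quoted above. */
size_t total_arg_length(int argc, char **argv)
{
    size_t total = 0;
    int i;

    if (argc <= 1)
        return 0;               /* fast path: no stack frame required */

    for (i = 1; i < argc; i++)  /* slow path: calls strlen in a loop */
        total += strlen(argv[i]);

    return total;
}
```

Comparing the disassembly built with and without -fno-shrink-wrap
should show whether the register saves move back to the function entry.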

Jim
Charles Baylis
2016-06-10 01:07:50 UTC
This looks like a valgrind bug to me.

I can reproduce the problem with this simple program, which shows the
issue at any optimisation level.

int main ()
{
    asm volatile ("" : : : "r4", "r5");
    return 0;
}

[on my raspberry pi, with the system gcc]
$ gcc test.c -mtune=cortex-a15 -marm
$ valgrind ./a.out
==15850== Memcheck, a memory error detector
==15850== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15850== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15850== Command: ./a.out
==15850==
==15850== Invalid write of size 4
==15850== at 0x103E8: main (in /home/cgb23/a.out)
==15850== Address 0xbdcf34a4 is just below the stack ptr. To suppress, use: --workaround-gcc296-bugs=yes
...

000103e8 <main>:
103e8: e16d40fc strd r4, [sp, #-12]!
103ec: e58db008 str fp, [sp, #8]
103f0: e28db008 add fp, sp, #8
103f4: e3a03000 mov r3, #0
103f8: e1a00003 mov r0, r3
103fc: e24bd008 sub sp, fp, #8
10400: e1cd40d0 ldrd r4, [sp]
10404: e59db008 ldr fp, [sp, #8]
10408: e28dd00c add sp, sp, #12
1040c: e12fff1e bx lr

Without looking at the valgrind sources, I'd guess that valgrind isn't
handling the strd instruction correctly. "size 4" obviously isn't
correct for the strd, and it also may not be accounting for the
writeback of the stack pointer correctly. Looking at google, I found
this bug report to the valgrind mailing list:
https://sourceforge.net/p/valgrind/mailman/message/34632852/. It seems
to relate to the same issue, but did not attract any attention. A
brief look at the attached patch suggests that the problem is related
to the way valgrind handles writes to the stack with negative offsets
and writeback.
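
To make the negative-offset-plus-writeback arithmetic concrete, here is
a minimal model in C. It is only a sketch of the instruction's effect on
addresses, not Valgrind's implementation, and the struct and function
names are hypothetical. The starting stack pointer 0xbdbfcb70 is not in
the log; it is inferred from the reported address 0xbdbfcb58 being
"24 bytes below stack pointer".

```c
#include <stdbool.h>
#include <stdint.h>

/* Effect of a pre-indexed store with writeback,
 * e.g. strd r4, [sp, #-24]! (hypothetical model, addresses only). */
struct preindexed_store {
    uint32_t first_word;   /* address receiving the first register */
    uint32_t second_word;  /* address receiving the second register */
    uint32_t new_sp;       /* stack pointer after writeback */
};

struct preindexed_store strd_preindexed(uint32_t sp, int32_t offset)
{
    struct preindexed_store s;

    s.first_word = sp + (uint32_t)offset;  /* wraps modulo 2^32 */
    s.second_word = s.first_word + 4;      /* strd stores 8 bytes in total */
    s.new_sp = s.first_word;               /* writeback: sp = sp + offset */
    return s;
}

/* On a descending stack, addresses below sp are outside the live stack. */
bool is_below_sp(uint32_t addr, uint32_t sp)
{
    return addr < sp;
}
```

With sp = 0xbdbfcb70 and offset -24, the first word lands at 0xbdbfcb58.
That address is below the old stack pointer but not below the updated
one, so a checker that compares the store against the pre-writeback
stack pointer would flag it.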

The suggested --workaround-gcc296-bugs=yes option does seem to
suppress the error. Alternatively, since the compiler will only use
STRD/LDRD in the prologue and epilogue when compiling for cores with
an out-of-order microarchitecture, you can work around the problem by
compiling with -mcpu=cortex-a7, in which case it will use PUSH and POP
instead.
William Mills
2016-06-10 16:37:09 UTC