Discussion:
[TCWG] clang/llvm build - tips and advice
Adhemerval Zanella
2015-06-24 13:50:03 UTC
Recently I came across two excellent posts about accelerating clang/llvm
builds with different compilers/optimizations [1] [2].

I tried some of the authors' advice and got very good results. Basically I
moved to an optimized clang build, changed to the gold linker and used a
memory allocator other than the system glibc one. Build times for the whole
clang/llvm toolchain are summarized below (my machine is an i7-4510U,
2C/4T, 8GB RAM, 256GB SSD):

GCC 4.8.4 + gold (Ubuntu 14.04)

real 85m17.640s
user 257m1.976s
sys 11m35.284s

LLVM 3.6 + gold (Ubuntu 14.04)

real 34m4.909s
user 128m43.382s
sys 3m51.643s

LLVM 3.7 + gold + tcmalloc

real 32m56.707s
user 121m40.562s
sys 3m52.358s

The gold linker also shows *much* lower RSS usage; I am able to run make -j4
while linking in 8GB without any swapping.
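For reference, a minimal sketch of a configure step matching the setup above (an optimized, no-assertions build compiled with clang and linked with gold; the source path and the exact flag set are assumptions, not the invocation actually used here):

```shell
# Sketch: optimized clang/llvm build that compiles with clang and links
# with gold. Assumes clang and gold are installed; paths are hypothetical.
cmake ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=OFF \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=gold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=gold"
make -j4
```

Passing -fuse-ld=gold through the linker flags avoids having to change the system default ld symlink.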

Two things I would add to the posts:

1. Changing from glibc malloc to tcmalloc showed a 3-4% improvement. I tried
jemalloc as well, but tcmalloc is faster. I am currently using the system
version 2.2, but I have pushed a patch to enable aggressive decommit by
default in 2.4, which might show lower RSS and latency (I will check it later).
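One non-invasive way to try the allocator swap, assuming libtcmalloc is installed (the library path varies by distribution, so treat it as an assumption):

```shell
# Preload tcmalloc so the compiler and linker processes use it instead of
# glibc malloc; no rebuild of the toolchain itself is required.
LD_PRELOAD=/usr/lib/libtcmalloc.so.4 make -j4
```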

2. I first tried to accelerate my build by offloading compilation using distcc.
Results were good, although the other machine (i7, 4C/8T, 8GB) showed
mixed CPU utilization. The problem was link-time memory usage with
ld.bfd, which generates a lot of swapping at higher job counts. I
will try using distcc with clang.
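A basic distcc setup for this kind of offloading might look like the following (the host name and job counts are hypothetical; a common rule of thumb is a -j value around twice the total remote+local core count):

```shell
# Sketch: distribute compile jobs to a remote box while keeping
# preprocessing and linking local. "remote-box/8" means up to 8 jobs there.
export DISTCC_HOSTS="localhost/4 remote-box/8"
make -j12 CC="distcc gcc" CXX="distcc g++"
```

Note that only compilation is distributed; linking still runs locally, which is why ld.bfd memory pressure remains the bottleneck.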


[1] http://blogs.s-osg.org/an-introduction-to-accelerating-your-build-with-clang/
[2] http://blogs.s-osg.org/a-conclusion-to-accelerating-your-build-with-clang/
Renato Golin
2015-06-24 14:15:25 UTC
On 24 June 2015 at 14:50, Adhemerval Zanella
Post by Adhemerval Zanella
I tried some of the authors' advice and got very good results. Basically I
moved to an optimized clang build, changed to the gold linker and used a
memory allocator other than the system glibc one. Build times for the whole
clang/llvm toolchain are summarized below (my machine is an i7-4510U,
Optimised + no-assertion builds of clang are in general 2/3 of gcc's
build times.
Post by Adhemerval Zanella
The gold linker also shows *much* lower RSS usage; I am able to run make -j4
while linking in 8GB without any swapping.
BFD uses more than 2GB of RAM per process when statically linking
debug versions of LLVM+Clang.

What I did was to use gold and enable shared libraries in the debug version.
Post by Adhemerval Zanella
1. Changing from glibc malloc to tcmalloc showed a 3-4% improvement. I tried
jemalloc as well, but tcmalloc is faster. I am currently using the system
version 2.2, but I have pushed a patch to enable aggressive decommit by
default in 2.4, which might show lower RSS and latency (I will check it later).
Using Ninja generally makes that edge disappear, because it builds a
lot fewer files than make would.

I also recommend ccache if you're using gcc, but with Clang it tends
to generate some bogus warnings.
Post by Adhemerval Zanella
2. I first tried to accelerate my build by offloading compilation using distcc.
Results were good, although the other machine (i7, 4C/8T, 8GB) showed
mixed CPU utilization. The problem was link-time memory usage with
ld.bfd, which generates a lot of swapping at higher job counts. I
will try using distcc with clang.
Distcc only helps if you use the Ninja "pool" feature on the linking jobs.

http://www.systemcall.eu/blog/2013/02/distributed-compilation-on-a-pandaboard-cluster/
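The Ninja "pool" feature mentioned above caps concurrency for one class of jobs (typically links) while compiles run at full width. A sketch of both forms; note the CMake option name comes from later LLVM releases, so treat it as an assumption for 3.6/3.7:

```shell
# Hand-written build.ninja files declare a pool directly:
#   pool link_pool
#     depth = 2
#   ...then set "pool = link_pool" inside the link rule.
# LLVM's CMake+Ninja build exposes the same mechanism via an option
# (available in later releases):
cmake -G Ninja ../llvm -DLLVM_PARALLEL_LINK_JOBS=2
ninja
```

Capping link jobs at 2 keeps the memory-hungry link steps from swapping while distcc saturates the compile jobs.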

Also, I don't want to depend on having a desktop near me, nor
distributing jobs across the Internet, so distcc has very limited
value.

If you have a powerful desktop, I recommend moving your tree there,
maybe using your laptop as the distcc slave, and exporting the
source/build trees via NFS, Samba or SSHFS.
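For the SSHFS variant, a sketch of mounting the desktop's tree on the laptop (host name and paths are hypothetical):

```shell
# Mount the desktop's build tree locally over SSH; -o reconnect makes the
# mount survive brief network drops.
sshfs desktop:/home/user/llvm ~/llvm -o reconnect

# Unmount when done:
fusermount -u ~/llvm
```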

cheers,
--renato
Adhemerval Zanella
2015-06-24 14:28:21 UTC
Post by Renato Golin
On 24 June 2015 at 14:50, Adhemerval Zanella
Post by Adhemerval Zanella
I tried some of the authors' advice and got very good results. Basically I
moved to an optimized clang build, changed to the gold linker and used a
memory allocator other than the system glibc one. Build times for the whole
clang/llvm toolchain are summarized below (my machine is an i7-4510U,
Optimised + no-assertion builds of clang are in general 2/3 of gcc's
build times.
Post by Adhemerval Zanella
The gold linker also shows *much* lower RSS usage; I am able to run make -j4
while linking in 8GB without any swapping.
BFD uses more than 2GB of RAM per process when statically linking
debug versions of LLVM+Clang.
What I did was to use gold and enable shared libraries in the debug version.
I am using the default configuration options, which I believe build with shared libraries.
Post by Renato Golin
Post by Adhemerval Zanella
1. Changing from glibc malloc to tcmalloc showed a 3-4% improvement. I tried
jemalloc as well, but tcmalloc is faster. I am currently using the system
version 2.2, but I have pushed a patch to enable aggressive decommit by
default in 2.4, which might show lower RSS and latency (I will check it later).
Using Ninja generally makes that edge disappear, because it builds a
lot fewer files than make would.
I also recommend ccache if you're using gcc, but with Clang it tends
to generate some bogus warnings.
The memory allocator change will help with either build system (gnu make
or ninja). I got this idea from observing the 'perf top' profile of
a clang/llvm build.

About ninja, as the posts reported, I also did not notice much difference
in build time. I am also not very fond of out-of-tree/experimental tools.

I also checked ccache, but most of the builds I am doing lately do not
hit the cache. Usually I update my tree daily, and since llvm tends
to refactor code a lot, it ends up recompiling a lot of objects (and
thus invalidating the cache...).

For clang you can use 'export CCACHE_CPP2=yes' to make the warnings go away.
The only issue is that it does not work with the optimized tblgen build option
(I got weird warnings mixing ccache and that option).
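Putting the ccache pieces together, a sketch of a clang build through ccache with the workaround above (compiler names assumed to be on PATH):

```shell
# CCACHE_CPP2=yes makes ccache compile the original source a second time
# instead of feeding clang already-preprocessed output, which is what
# triggers the bogus warnings.
export CCACHE_CPP2=yes
make -j4 CC="ccache clang" CXX="ccache clang++"
```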
Post by Renato Golin
Post by Adhemerval Zanella
2. I first tried to accelerate my build by offloading compilation using distcc.
Results were good, although the other machine (i7, 4C/8T, 8GB) showed
mixed CPU utilization. The problem was link-time memory usage with
ld.bfd, which generates a lot of swapping at higher job counts. I
will try using distcc with clang.
Distcc only helps if you use the Ninja "pool" feature on the linking jobs.
http://www.systemcall.eu/blog/2013/02/distributed-compilation-on-a-pandaboard-cluster/
Also, I don't want to depend on having a desktop near me, nor
distributing jobs across the Internet, so distcc has very limited
value.
If you have a powerful desktop, I recommend that you move your tree in
there, maybe use your laptop as the distcc slave, and export the
source/build trees via NFS, Samba or SSHFS.
Distcc in fact helped a lot with my early builds with GCC+ld.bfd: build time
went from roughly 85m to 40m. The only issue with distcc is that I need
to lower the timeout factor a bit so it does not take long to start the job
locally if the remote machine is not accessible.

My desktop has more cores, but does not have an SSD. Using GCC+ld.bfd in debug
mode the total build time is roughly the same.
