setting loop buffer size in the gcc (aarch64)

Discussion:

Virendra Kumar Pathak

2016-02-17 14:51:28 UTC

Hi Toolchain Group,

I am trying to study the effect of loop buffer size on loop unrolling & the
way gcc (aarch64) handles this.

To my understanding, a loop buffer is like an i-cache that holds pre-decoded
instructions, which can be re-used if a branch instruction loops back to an
instruction that is still present in the buffer. For example, in Intel's
Nehalem the loop buffer size is 28 u-ops. In the LLVM compiler, it seems
LoopMicroOpBufferSize serves the same purpose.
However, I could not find any parameter/variable inside config/aarch64
representing loop buffer size. I am using Linaro gcc 5.2.1.

[Question]
1. Is there any example inside aarch64 (or in general) which uses the loop
buffer size in loop unrolling decision? If yes, could you please mention
the relevant files or code section?
2. Otherwise, is there any guidance/input on adding this support to the
aarch64 backend, assuming the architecture has loop buffer support?

[My Experiments/Code Browsing]
I have collected the following information from code browsing. Please
correct me if I missed or misunderstood something.

TARGET_LOOP_UNROLL_ADJUST - This target hook returns the number of times a
loop can be unrolled.
It can be used to handle architecture constraints such as the number of
memory references inside a loop, e.g. ix86_loop_unroll_adjust() &
s390_loop_unroll_adjust().
On the same note, can this be used to handle loop buffer size too?

Without the above hook, parameters in loop-unroll.c such as
PARAM_MAX_UNROLLED_INSNS (default 200) and PARAM_MAX_AVERAGE_UNROLLED_INSNS
(default 80) decide the unrolling factor, e.g. nunroll = PARAM_VALUE
(PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
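
To make the arithmetic above concrete, here is a minimal standalone sketch of
how the two limits could combine. It is a simplified model, not GCC source:
the function name and the av_ninsns argument (average instructions executed
per iteration) are assumptions made for illustration, and the defaults are
the ones quoted in this message. The real loop-unroll.c applies further
limits that are left out here.

```c
#include <assert.h>

/* Defaults as quoted in the thread for this GCC version; treat them as
   assumptions, since they can be changed with --param.  */
#define MAX_UNROLLED_INSNS 200
#define MAX_AVERAGE_UNROLLED_INSNS 80

/* Simplified model of the default decision (hypothetical helper).
   ninsns:    static instruction count of the loop body.
   av_ninsns: average number of instructions executed per iteration.
   Returns the smaller of the two size-based unroll limits.  */
unsigned
model_default_unroll (unsigned ninsns, unsigned av_ninsns)
{
  assert (ninsns > 0 && av_ninsns > 0);

  unsigned nunroll = MAX_UNROLLED_INSNS / ninsns;
  unsigned nunroll_by_av = MAX_AVERAGE_UNROLLED_INSNS / av_ninsns;

  return nunroll < nunroll_by_av ? nunroll : nunroll_by_av;
}
```

For a 20-instruction body that executes 10 instructions per iteration on
average, the two limits are 200 / 20 = 10 and 80 / 10 = 8, so this model
picks 8.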

In config/aarch64.c, I found the align_loops variable in the
aarch64_override_options_after_change() function.
I guess this is an alignment applied before the loop header in the
executable. This should not play any role in loop unrolling. Right?

So is there any guidance on how we can instruct the aarch64 backend to use
the loop buffer size in deciding the loop unrolling factor?

Thanks in advance for your time.

--
with regards,
Virendra Kumar Pathak

Kugan

2016-02-17 21:45:09 UTC

Hi,

Post by Virendra Kumar Pathak
Hi Toolchain Group,
I am trying to study the effect of loop buffer size on loop unrolling &
the way gcc (aarch64) handles this.

It depends on the micro-architecture. Usually, a loop buffer holds the
loop completely and supplies the instruction fetch unit from there.
The main benefit used to be the dynamic energy reduction, i.e., you
don't access the main (L1) cache for the loop iterations.

Loop unrolling, on the other hand, can remove the control instructions and
allow the compiler to optimize across loop iterations.

Post by Virendra Kumar Pathak
To my understanding, Loop Buffer is like i-cache which contains
pre-decoded instruction that can be re-used if branch instruction
loopbacks to an instruction
which is still present in the buffer. For example, in Intel’s Nehalem
loop buffer size is 28 u-ops. In LLVM compiler, it seems
LoopMicroOpBufferSize is for the same purpose.
However, I could not find any parameter/variable inside config/aarch64
representing loop buffer size. I am using Linaro gcc 5.2.1
[Question]
1. Is there any example inside aarch64 (or in general) which uses the
loop buffer size in loop unrolling decision? If yes, could you please
mention the relevant files or code section?

Look at this patch for x86:
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02567.html

This is implemented using TARGET_LOOP_UNROLL_ADJUST as you have found out.
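
As a rough illustration of how such a hook could take a loop buffer into
account, here is a standalone sketch. It is not code from the patch or from
any GCC backend: struct loop_model, LOOP_BUFFER_INSNS and
model_loop_unroll_adjust are hypothetical names, and the buffer capacity of
32 instructions is an arbitrary assumption. Only the general shape (receive
the proposed unroll count and the loop, return an adjusted count) follows
the hook.

```c
#include <assert.h>

/* Hypothetical stand-in for the loop information a hook would receive;
   GCC's real loop structure has many more fields.  */
struct loop_model
{
  unsigned ninsns;   /* instructions in the loop body */
};

/* Hypothetical loop buffer capacity in instructions.  This is not an
   existing GCC parameter.  */
#define LOOP_BUFFER_INSNS 32

/* Clamp the proposed unroll count so that the unrolled body still fits
   in the assumed loop buffer.  If the body does not fit even once, the
   buffer cannot help, so the proposed count is returned unchanged.  */
unsigned
model_loop_unroll_adjust (unsigned nunroll, const struct loop_model *loop)
{
  assert (loop != 0 && loop->ninsns > 0);

  unsigned fit = LOOP_BUFFER_INSNS / loop->ninsns;
  if (fit == 0)
    return nunroll;

  return nunroll < fit ? nunroll : fit;
}
```

With an 8-instruction body and a proposed count of 8, this sketch returns 4,
because only four copies fit in the assumed 32-instruction buffer.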

Thanks,
Kugan

_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Kumar, Venkataramanan

2016-02-18 03:25:30 UTC

We have a restriction in x86_64 targets on the number of memory references a loop may contain if it is to be held in the loop buffer and benefit from it.
The patch tries to find an unroll count for which the number of memory references after unrolling still satisfies that restriction, so the unrolled loop can be held in the loop buffer.
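
The idea can be sketched as follows. This is only an illustration under
stated assumptions, not the code of the patch: the function name is
hypothetical, the limit on memory references is passed in as a parameter
rather than taken from any real target, and capping the result at the
proposed count is a choice made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: pick the largest unroll count, not exceeding the
   proposed nunroll, for which mem_refs * count stays within max_mem_refs.
   mem_refs is the number of memory references in one iteration.
   If the loop has no memory references, or already exceeds the limit
   without unrolling, the proposed count is returned unchanged.  */
unsigned
model_mem_ref_unroll (unsigned nunroll, unsigned mem_refs,
                      unsigned max_mem_refs)
{
  assert (max_mem_refs > 0);

  if (mem_refs == 0 || mem_refs > max_mem_refs)
    return nunroll;

  unsigned fit = max_mem_refs / mem_refs;
  return nunroll < fit ? nunroll : fit;
}
```

For example, with an assumed limit of 32 memory references, a loop with 10
references per iteration and a proposed count of 8 would be unrolled 3 times
in this model, since 3 * 10 = 30 still fits and 4 * 10 = 40 does not.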
Regards,
Venkat.
