Discussion:
[hpc-sig-devel] GCC extensions for `hcqc'
Renato Golin
2017-08-25 18:04:59 UTC
Permalink
+linaro-toolchain, hoping to get more eyes into it.

cheers,
--renato
Hi,
I extended GCC 7.1(or GCC 7.2) for `hcqc'.
I would be grateful if you could give me a comment about whether
this extension is acceptable and whether this extension should be
pushed upstream.
The extended GCC's output using the option ` -fverbose-asm' is
ldr w0, [x29,48] // tmp433, j(8-byte Folded Spill)
^^^^^^^^^^^^^^^^^^^
This code shows that this instruction accesses a memory area
for spill codes.
I made the following changes to GCC 7.1(or GCC 7.2).
The related files are under `hcqc/patch/gcc-7.1.0-add'.
(1) rtl.h
I added flag information to `struct mem_attrs' that means whether
it is a spill memory area or not.
+
+ /* True if the MEM is for spill. */
+ bool for_spill_p;
Also, I added an access macro for this additional field.
+ /* For a MEM rtx, true if its MEM is for spill. */
+ #define MEM_FOR_SPILL_P(RTX) (get_mem_attrs (RTX)->for_spill_p)
+
(2) emit-rtl.c
I added a code to turn on flags for spill memory area in function
`set_mem_attrs_for_spill'.
+ attrs.for_spill_p = true;
(3) final.c
I added code to print that information in function
`output_asm_operand_names'
if the memory is a spill memory area,
+
+ if (MEM_P (op) && MEM_FOR_SPILL_P (op))
+ {
+ HOST_WIDE_INT size = MEM_SIZE (op);
+ fprintf (asm_out_file, " (" HOST_WIDE_INT_PRINT_DEC "-byte Folded
Spill)", size);
+ }
The above changes are implemented similarly as Clang/LLVM.
Unfortunately, it is difficult for GCC to print the above "(?-byte Folded
Spill)"
for memory access instructions only in the same manner as Clang/LLVM.
The reason is that GCC executes the above `output_asm_operand_names'
even in situations where any instruction object(insn) does not exist when
outputting assembly code.
For example, GCC calls `output_asm_insn' directly from the `define_insn'
definition in the aarch64.md file without an insn object(`output_asm_insn'
calls `output_asm_operand_names').
This occurs in "*cb<optab><mode>1" and
"*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult".
From this fact, `hcqc' extracts and accumulates memory access
instructions from the assembly code with the comment "(?-byte Folded
Spill)".
The above extensions are commonly available on almost any architecture.
Also, these extensions do not affect the execution of the resulting assembly
code since additional outputs are only in comments.
Best regards,
--
--------------------------------------
Masaki Arai
Kugan Vivekanandarajah
2017-08-26 07:40:00 UTC
Permalink
Hi,
Post by Renato Golin
+linaro-toolchain, hoping to get more eyes into it.
cheers,
--renato
Hi,
I extended GCC 7.1(or GCC 7.2) for `hcqc'.
I would be grateful if you could give me a comment about whether
this extension is acceptable and whether this extension should be
pushed upstream.
I think this is a useful info. However there might be pushback from
upstream maintainers as this makes the structure bigger by adding a
field. This could have implications for memory usage of the compiler.
Alternatively, we maybe able to get this info from dwarf info when we
compile with -g ? Jim may have some input here (cc ing him).

Thanks,
Kugan
Post by Renato Golin
The extended GCC's output using the option ` -fverbose-asm' is
ldr w0, [x29,48] // tmp433, j(8-byte Folded Spill)
^^^^^^^^^^^^^^^^^^^
This code shows that this instruction accesses a memory area
for spill codes.
I made the following changes to GCC 7.1(or GCC 7.2).
The related files are under `hcqc/patch/gcc-7.1.0-add'.
(1) rtl.h
I added flag information to `struct mem_attrs' that means whether
it is a spill memory area or not.
+
+ /* True if the MEM is for spill. */
+ bool for_spill_p;
Also, I added an access macro for this additional field.
+ /* For a MEM rtx, true if its MEM is for spill. */
+ #define MEM_FOR_SPILL_P(RTX) (get_mem_attrs (RTX)->for_spill_p)
+
(2) emit-rtl.c
I added a code to turn on flags for spill memory area in function
`set_mem_attrs_for_spill'.
+ attrs.for_spill_p = true;
(3) final.c
I added code to print that information in function
`output_asm_operand_names'
if the memory is a spill memory area,
+
+ if (MEM_P (op) && MEM_FOR_SPILL_P (op))
+ {
+ HOST_WIDE_INT size = MEM_SIZE (op);
+ fprintf (asm_out_file, " (" HOST_WIDE_INT_PRINT_DEC "-byte Folded
Spill)", size);
+ }
The above changes are implemented similarly as Clang/LLVM.
Unfortunately, it is difficult for GCC to print the above "(?-byte Folded
Spill)"
for memory access instructions only in the same manner as Clang/LLVM.
The reason is that GCC executes the above `output_asm_operand_names'
even in situations where any instruction object(insn) does not exist when
outputting assembly code.
For example, GCC calls `output_asm_insn' directly from the `define_insn'
definition in the aarch64.md file without an insn object(`output_asm_insn'
calls `output_asm_operand_names').
This occurs in "*cb<optab><mode>1" and
"*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult".
From this fact, `hcqc' extracts and accumulates memory access
instructions from the assembly code with the comment "(?-byte Folded
Spill)".
The above extensions are commonly available on almost any architecture.
Also, these extensions do not affect the execution of the resulting assembly
code since additional outputs are only in comments.
Best regards,
--
--------------------------------------
Masaki Arai
_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Kugan Vivekanandarajah
2017-08-26 20:53:21 UTC
Permalink
Hi,
However there might be pushback from upstream maintainers as this makes the structure bigger
by adding a field. This could have implications for memory usage of the compiler.
I looked into the structure, adding this field is not going to make the structure bigger for either ILP32 or LP64 targets. If you want, you use bit-fields; there is one bool already there which means you can fit 8 bits in the same area as currently taken up by that one.
Yes. I should have checked the mem_attrs structure. This does have at
least a byte left unlike some other tightly packed structures (gimple
and some tree structures in gcc).

Thanks,
Kugan
Alternatively, we maybe able to get this info from dwarf info when we compile with -g ?
I doubt you can. He wants to know if an instruction is a spill location. The location of a variable might be recorded in -g (if it was an user variable) but not that does present the data for all temps being spilled.
I think the patch is actually a good one in general just needs some cleanup.
For example, GCC calls `output_asm_insn' directly from the `define_insn'
definition in the aarch64.md file without an insn object(`output_asm_insn'
calls `output_asm_operand_names').
This occurs in "*cb<optab><mode>1" and
"*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult".
Spills in GCC will always be via the mov* patterns (they are special).
{
operands[2] = GEN_INT (aarch64_fpconst_pow_of_2 (operands[2]));
return "fcvtz<su>\t%<GPI:w>0, %<GPF:s>1, %2";
}
Thanks,
Andrew
-----Original Message-----
Sent: Saturday, August 26, 2017 12:40 AM
Subject: Re: [hpc-sig-devel] GCC extensions for `hcqc'
Hi,
+linaro-toolchain, hoping to get more eyes into it.
cheers,
--renato
Hi,
I extended GCC 7.1(or GCC 7.2) for `hcqc'.
I would be grateful if you could give me a comment about whether this
extension is acceptable and whether this extension should be pushed
upstream.
I think this is a useful info. However there might be pushback from upstream maintainers as this makes the structure bigger by adding a field. This could have implications for memory usage of the compiler.
Alternatively, we maybe able to get this info from dwarf info when we compile with -g ? Jim may have some input here (cc ing him).
Thanks,
Kugan
The extended GCC's output using the option ` -fverbose-asm' is as
ldr w0, [x29,48] // tmp433, j(8-byte Folded Spill)
^^^^^^^^^^^^^^^^^^^ This
code shows that this instruction accesses a memory area for spill
codes.
I made the following changes to GCC 7.1(or GCC 7.2).
The related files are under `hcqc/patch/gcc-7.1.0-add'.
(1) rtl.h
I added flag information to `struct mem_attrs' that means whether it
is a spill memory area or not.
+
+ /* True if the MEM is for spill. */
+ bool for_spill_p;
Also, I added an access macro for this additional field.
+ /* For a MEM rtx, true if its MEM is for spill. */ #define
+ MEM_FOR_SPILL_P(RTX) (get_mem_attrs (RTX)->for_spill_p)
+
(2) emit-rtl.c
I added a code to turn on flags for spill memory area in function
`set_mem_attrs_for_spill'.
+ attrs.for_spill_p = true;
(3) final.c
I added code to print that information in function
`output_asm_operand_names'
if the memory is a spill memory area,
+
+ if (MEM_P (op) && MEM_FOR_SPILL_P (op))
+ {
+ HOST_WIDE_INT size = MEM_SIZE (op);
+ fprintf (asm_out_file, " (" HOST_WIDE_INT_PRINT_DEC "-byte
+ Folded
Spill)", size);
+ }
The above changes are implemented similarly as Clang/LLVM.
Unfortunately, it is difficult for GCC to print the above "(?-byte
Folded Spill)"
for memory access instructions only in the same manner as Clang/LLVM.
The reason is that GCC executes the above `output_asm_operand_names'
even in situations where any instruction object(insn) does not exist
when outputting assembly code.
For example, GCC calls `output_asm_insn' directly from the `define_insn'
definition in the aarch64.md file without an insn object(`output_asm_insn'
calls `output_asm_operand_names').
This occurs in "*cb<optab><mode>1" and
"*aarch64_fcvt<su_optab><GPF:mode><GPI:mode>2_mult".
From this fact, `hcqc' extracts and accumulates memory access
instructions from the assembly code with the comment "(?-byte Folded
Spill)".
The above extensions are commonly available on almost any architecture.
Also, these extensions do not affect the execution of the resulting
assembly code since additional outputs are only in comments.
Best regards,
--
--------------------------------------
Masaki Arai
_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Masaki Arai
2017-08-28 18:54:01 UTC
Permalink
Hi,

Thank you very much for your quick check and reply.
I looked into the structure, adding this field is not going to make the
s=
tructure bigger for either ILP32 or LP64 targets. If you want, you use
bit=
-fields; there is one bool already there which means you can fit 8 bits
in =
the same area as currently taken up by that one.
Yes. I should have checked the mem_attrs structure. This does have at
least a byte left unlike some other tightly packed structures (gimple
and some tree structures in gcc).
Even though memory usage does not increase, I understand the policy of
wanting to make the data structure simple.

Another way to implement this feature is to use the `addrspace' field
in `struct mem_attrs' without adding any fields.
I think that this implementation may be more decent.
However, since this field holds information specific to the target
machine, changing this will affect many files.
Post by Kugan Vivekanandarajah
Alternatively, we maybe able to get this info from dwarf info when we
co=
mpile with -g ?
I doubt you can. He wants to know if an instruction is a spill
location.=
The location of a variable might be recorded in -g (if it was an user
var=
iable) but not that does present the data for all temps being spilled.
I think the patch is actually a good one in general just needs some
clean=
up.
I was not thinking about implementation using DWARF.
About 2013, I have created a tool to extract information from DWARF
data in binary files generated by GCC.
At that time, there were some shortages in the DWARF information, and
as a result, it did not go very well.

Best regards,
--
--------------------------------------
Masaki Arai
Kugan Vivekanandarajah
2017-08-29 03:48:29 UTC
Permalink
Hi,
Post by Masaki Arai
Hi,
Thank you very much for your quick check and reply.
I looked into the structure, adding this field is not going to make the
s=
tructure bigger for either ILP32 or LP64 targets. If you want, you use
bit=
-fields; there is one bool already there which means you can fit 8 bits
in =
the same area as currently taken up by that one.
Yes. I should have checked the mem_attrs structure. This does have at
least a byte left unlike some other tightly packed structures (gimple
and some tree structures in gcc).
Even though memory usage does not increase, I understand the policy of
wanting to make the data structure simple.
Another way to implement this feature is to use the `addrspace' field
in `struct mem_attrs' without adding any fields.
I think that this implementation may be more decent.
However, since this field holds information specific to the target
machine, changing this will affect many files.
I think your current approach is reasonable. It is better to discuss
your alternate approach upstream. I would suggest you post your patch
upstream and initiate a discussion there. Please refer to
https://gcc.gnu.org/contribute.html (submitting patches part) if you
already haven't.
Post by Masaki Arai
Post by Kugan Vivekanandarajah
Alternatively, we maybe able to get this info from dwarf info when we
co=
mpile with -g ?
I doubt you can. He wants to know if an instruction is a spill
location.=
The location of a variable might be recorded in -g (if it was an user
var=
iable) but not that does present the data for all temps being spilled.
I think the patch is actually a good one in general just needs some
clean=
up.
I was not thinking about implementation using DWARF.
About 2013, I have created a tool to extract information from DWARF
data in binary files generated by GCC.
At that time, there were some shortages in the DWARF information, and
as a result, it did not go very well.
I am not an expert in DWARF but I can understand that this can be a problem.

Thanks,
Kugan
Post by Masaki Arai
Best regards,
--
--------------------------------------
Masaki Arai
_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Masaki Arai
2017-08-29 09:15:47 UTC
Permalink
Hi,
Post by Kugan Vivekanandarajah
Post by Masaki Arai
Even though memory usage does not increase, I understand the policy of
wanting to make the data structure simple.
Another way to implement this feature is to use the `addrspace' field
in `struct mem_attrs' without adding any fields.
I think that this implementation may be more decent.
However, since this field holds information specific to the target
machine, changing this will affect many files.
I think your current approach is reasonable. It is better to discuss
your alternate approach upstream. I would suggest you post your patch
upstream and initiate a discussion there. Please refer to
https://gcc.gnu.org/contribute.html (submitting patches part) if you
already haven't.
Thank you very much for your comments and advice.
I checked the URL.
If there is no objection, I will consider proposing this function(and
alternative
appraoch) on the GCC mailing list<***@gcc.gnu.org> (after improving print
strings and actual patch files).

Best regards,
--
--------------------------------------
Masaki Arai
Renato Golin
2017-08-29 09:22:31 UTC
Permalink
Post by Masaki Arai
Thank you very much for your comments and advice.
I checked the URL.
If there is no objection, I will consider proposing this function(and
alternative
strings and actual patch files).
I think that'll be the best way forward, thanks!

The list to send the patches is gcc-patches@, according to this page:
https://gcc.gnu.org/lists.html

Also, make sure you have enough tests. Usually, you can find tests for
similar features and replicate them for yours.

cheers,
--renato
Masaki Arai
2017-08-29 15:36:14 UTC
Permalink
Hi,
Post by Renato Golin
Post by Masaki Arai
If there is no objection, I will consider proposing this function(and
alternative
strings and actual patch files).
I think that'll be the best way forward, thanks!
https://gcc.gnu.org/lists.html
Also, make sure you have enough tests. Usually, you can find tests for
similar features and replicate them for yours.
Thank you very much for your advice.

There are many examples in the archive of the mailing list(gcc and
gcc-patches), so I will prepare a proposal using them as a sample.
In addition to modifying some source codes, I think that it is necessary to
check about test programs related to `-fverbose-asm' under `gcc/testsuite'.

Best regards,
--
--------------------------------------
Masaki Arai
Christophe Lyon
2017-08-29 15:46:36 UTC
Permalink
Post by Masaki Arai
Hi,
Post by Renato Golin
Post by Masaki Arai
If there is no objection, I will consider proposing this function(and
alternative
strings and actual patch files).
I think that'll be the best way forward, thanks!
https://gcc.gnu.org/lists.html
Also, make sure you have enough tests. Usually, you can find tests for
similar features and replicate them for yours.
Thank you very much for your advice.
There are many examples in the archive of the mailing list(gcc and
gcc-patches), so I will prepare a proposal using them as a sample.
In addition to modifying some source codes, I think that it is necessary to
check about test programs related to `-fverbose-asm' under `gcc/testsuite'.
Yes, GCC tests are located under gcc/testsuite. There you can grep for
verbose-asm, and see if there is something helpful.

Thanks,

Christophe
Post by Masaki Arai
Best regards,
--
--------------------------------------
Masaki Arai
_______________________________________________
linaro-toolchain mailing list
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Richard Sandiford
2017-08-29 21:14:16 UTC
Permalink
Sorry for the delayed response.
Post by Masaki Arai
Hi,
Thank you very much for your quick check and reply.
I looked into the structure, adding this field is not going to make the
s=
tructure bigger for either ILP32 or LP64 targets. If you want, you use
bit=
-fields; there is one bool already there which means you can fit 8 bits
in =
the same area as currently taken up by that one.
Yes. I should have checked the mem_attrs structure. This does have at
least a byte left unlike some other tightly packed structures (gimple
and some tree structures in gcc).
Even though memory usage does not increase, I understand the policy of
wanting to make the data structure simple.
FWIW, I think adding a field for this should be fine. It's very much in
the spirit of ORIGINAL_REGNO, which also exists to record the effects of
register allocation and reloading.

Perhaps one question is how the flag should interact with mem_attrs_eq_p,
but I suppose in practice, if everything else about two mem_attrs is the
same, the spill flag should be too, and so there should be no need to
check the spill flag explicitly.

An alternative to adding a new flag might be to check:

MEM_EXPR (mem) == get_spill_slot_decl (false)

But this would bake in the assumption that everything we want to mark
as a spill slot will use set_mem_attrs_for_spill, whereas the flag
would allow other MEMs to be marked as spill slots too.

Thanks,
Richard
Masaki Arai
2017-08-31 09:22:41 UTC
Permalink
Hi,

Thank you very much, everyone, who gave me useful advice.
I might conclude that extension of GCC for HCQC is
completely unnecessary.
Post by Richard Sandiford
MEM_EXPR (mem) == get_spill_slot_decl (false)
I think that this method is far better than introducing the flag
`for_spill_p'!
For this approach, the additional code can only be the following 6
lines in final.c.

+
+ if (MEM_P (op) && MEM_EXPR (op) && MEM_EXPR (op) ==
get_spill_slot_decl (false))
+ {
+ HOST_WIDE_INT size = MEM_SIZE (op);
+ fprintf (asm_out_file, " (" HOST_WIDE_INT_PRINT_DEC "-byte spill
slot)", size);
+ }

I implemented this, and it seems to work fine both on AArch64

ldr x0, [x29, 192] // tmp1241, %sfp (8-byte spill slot)
str w0, [x29, 204] // ivtmp_805, %sfp (4-byte spill slot)

and on x86_64

movsd 8(%rsp), %xmm1 # %sfp (8-byte spill slot), prephitmp_811
movsd %xmm0, 16(%rsp) #, %sfp (8-byte spill slot)

The important point here is that "spill slot" is always displayed with
"%sfp" attached.
This is because the object returned by `get_spill_slot_decl' always
points to the following decl tree `d'.

d = build_decl (DECL_SOURCE_LOCATION (current_function_decl),
VAR_DECL, get_identifier ("%sfp"), void_type_node);

This means that "%sfp" is "spill slot" on all target machines.
I checked the source code of GCC, and it seems that only this part of
code uses "%sfp".
This "%sfp" does not express data size, but HCQC can get data size
information from assembly codes in the same line with "%sfp".
Therefore, the conclusion is that it is enough for HCQC to detect
"%sfp" without any extension of GCC.
Post by Richard Sandiford
But this would bake in the assumption that everything we want to mark
as a spill slot will use set_mem_attrs_for_spill, whereas the flag
would allow other MEMs to be marked as spill slots too.
I guess that the effect of the flag can also be realized with the
following code:

MEM_EXPR (mem) = get_spill_slot_decl (false);

instead of

MEM_ATTRS (mem).for_spill_p = true;

The only case where `for_spill_p' seems to be useful is where some
optimizations combine a memory area V on the stack for a local
variable and a memory area S for the spill code together and make it
the V only using the result of live range analysis for both memory areas.
In that case, the code

MEM_ATTRS (V).for_spill_p = true;

holds the original memory expression with this additional flag.
However, it may be a matter of opinion whether this memory region
should be considered spill areas or not.

Best regards,
--
--------------------------------------
Masaki Arai

Loading...