[ACTIVITY] 22-26 June 2015

Christophe Lyon

2015-06-26 14:51:19 UTC

* One day off (Wed) (2/10)

== Progress ==
* linaro-5.1-2015.06 snapshot (1/10)
- dealt with tags, release notes
- shared it with B&B

* 4.8-2015.06 branch merge (1/10)
- investigated regression: incorrect automatic merge
- fixed, validation on-going

* 4.9 branch (2/10)
- updated our git linaro-4.9-branch to match the svn one
- ready for branch merge, will be done right after fsf release

* Misc (4/10)
- meetings, conf-calls, emails, reviews (GCC backports, ABE, backflip)

== Next ==
* more reviews for new backports
* backports, release, validation: update doc
* hopefully upstream work

Yao Qi

2015-06-26 16:04:00 UTC

* One day off on Thu [2/10]

# Progress #
* Linaro GDB [4/10]
** TCWG-805, aarch64 native debugging multi-arch support.
Preparing the patches for submission.
It is a big patch series, so thinking about how best to upstream it.
Writing commit logs that include the rationale for the changes.

* FSF GDB [2/10]
** FSF GDB 7.10 release. Audited some GDB regressions caused by the
Intel MPX changes.
** PR 18605. Wrote a patch; it is in testing.
** Reviewed other patches.

* Misc [2/10]
** File expense report for Grenoble travel.
** Some discussions on aarch64 tracepoint.

# Plan #
* TCWG-805, upstream some patches on multi-arch debugging.

--
Yao

Prathamesh Kulkarni

2015-06-28 15:38:11 UTC

* TCWG-830 (4/10)
- Observing tree dumps

- Peeling for alignment happens at -O3 but not at -O2 -ftree-vectorize
Reason: in vect_enhance_data_refs_alignment() for:
a) -O2 -ftree-vectorize: max_allowed_peel == 0
b) -O3: max_allowed_peel == (unsigned) -1;
which equals UINT_MAX and therefore peeling gets allowed.
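
The UINT_MAX remark above can be checked with a few lines of C. This is only a sketch: unlimited_peel and peel_allowed are hypothetical names made up for illustration, and the comparison is an assumed shape for the limit check, not the vectorizer's actual code.

```c
#include <limits.h>
#include <stdbool.h>

/* Converting -1 to unsigned wraps modulo UINT_MAX + 1, giving UINT_MAX. */
unsigned unlimited_peel(void)
{
    return (unsigned) -1;
}

/* Hypothetical limit check (illustrative only, not GCC's code): peeling
   npeel iterations is acceptable when it does not exceed the limit. */
bool peel_allowed(unsigned npeel, unsigned max_allowed_peel)
{
    return npeel <= max_allowed_peel;
}
```

With a limit of 0 any non-zero peel count is rejected, while a limit of (unsigned) -1 accepts every count, which matches the -O2 -ftree-vectorize versus -O3 behaviour described above.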

- Workaround: Pass --param vect-max-peeling-for-alignment=0

- Peeling for alignment at -O2 can be enabled by passing
-fvect-cost-model (we don't want this!)
Reason:
opts.c:
  /* Tune vectorization related parametees according to cost model.  */
  if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
    {
      maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS,
                             6, opts->x_param_values, opts_set->x_param_values);
      maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS,
                             0, opts->x_param_values, opts_set->x_param_values);
      maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
                             0, opts->x_param_values, opts_set->x_param_values);
    }
The above if condition becomes false when -fvect-cost-model is passed.
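
A toy model of the gating above, to make the effect on the peeling limit concrete. Everything here is a simplified stand-in: the toy_* names, the two-value enum and the struct are hypothetical, and the "only set if the user did not" behaviour of maybe_set_param_value is my understanding of it, not a copy of the GCC implementation. The untuned limit is assumed to be UINT_MAX, as observed at -O3.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy stand-ins for the cost models (hypothetical, not GCC's enum). */
enum toy_cost_model { TOY_MODEL_CHEAP, TOY_MODEL_DYNAMIC };

struct toy_opts {
    enum toy_cost_model model;
    unsigned max_peel;           /* stands in for vect-max-peeling-for-alignment */
    bool max_peel_set_by_user;   /* true if given explicitly via --param */
};

/* Assumed "maybe set" behaviour: apply the value only when the user
   has not set the parameter explicitly. */
void toy_maybe_set(struct toy_opts *o, unsigned value)
{
    if (!o->max_peel_set_by_user)
        o->max_peel = value;
}

/* Same shape as the quoted opts.c condition: tune only for the cheap model. */
void toy_tune(struct toy_opts *o)
{
    if (o->model == TOY_MODEL_CHEAP)
        toy_maybe_set(o, 0);
}

/* Effective limit for a given model when the user passed no --param. */
unsigned toy_effective_max_peel(enum toy_cost_model model)
{
    struct toy_opts o = { model, UINT_MAX, false };
    toy_tune(&o);
    return o.max_peel;
}
```

In this model the cheap cost model yields a limit of 0 and any other model leaves the limit at UINT_MAX, which is why selecting a different model with -fvect-cost-model re-enables peeling.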

- Proposed patch (untested): http://pastebin.com/ftp0mrwH
The patch follows the workaround and passes
--param vect-max-peeling-for-alignment=0 if unaligned access is supported.

* TCWG-777 (4/10)
- Observing tree and rtl dumps

- Workaround: for -O1 pass -fno-tree-fre -fno-tree-dominator-opts
Test-case: http://pastebin.com/cjBcSpiT
Generated assembly at -O1 without workaround: http://pastebin.com/jmQGZhN9
Generated assembly at -O1 with workaround: http://pastebin.com/JGj05z66
Is the assembly with the workaround the expected output, i.e. with no
unnecessary temps?
Is it profitable over the assembly generated without the workaround?

- Approach currently taken:
a) New pass "remove-temps" (for lack of better name), after nrv (added
as last gimple pass).

b) Transforms:
if (ssa_var != 0)
to
new_ssa_var = SSA_NAME_DEF_STMT (ssa_var)
if (new_ssa_var != 0)

This "unfolds" the CSE on expressions within the if condition, which was
done by fre (or by the dom pass if fre was disabled).

c) However this approach results in dead stores.
eg:
_8 = flags_7(D) & 1;
if (_8 != 0)
...
is transformed to:
_8 = flags_7(D) & 1;
_32 = flags_7(D) & 1;
if (_32 != 0)
...
so the store to _8 is a dead store.
I tried running dse after remove-temps, but that didn't work.
RTL 194r.jump eliminates the above dead store as a "trivially dead insn".
However, I don't think it's a good idea to leave dead stores like these
in gimple and rely on RTL to eliminate them. I could try to make the pass
a bit smarter so that it does not generate redundant stores such as the
one feeding _32 != 0 in the above case.
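
For readers less used to gimple, the same before/after shape can be written as plain C. This is only an illustration of the dead temporary; the function and variable names are hypothetical and this is not the pastebin test-case.

```c
/* Shape before the transform: one temporary feeds the condition. */
int before_transform(unsigned flags)
{
    unsigned t8 = flags & 1;
    if (t8 != 0)
        return 1;
    return 0;
}

/* Shape after the transform: the expression is recomputed into a new
   temporary for the condition, so t8 is no longer used (dead). */
int after_transform(unsigned flags)
{
    unsigned t8 = flags & 1;     /* dead: nothing reads it any more */
    unsigned t32 = flags & 1;    /* duplicate that feeds the condition */
    (void) t8;                   /* silence unused-variable warnings */
    if (t32 != 0)
        return 1;
    return 0;
}
```

Both functions return the same result for every input; the only difference is the leftover computation of t8, which is what a later dead-code pass has to clean up.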

d) Patch (no intent to commit as-is): http://pastebin.com/AGXnSkrZ
Generated assembly at -O1 with the patch: http://pastebin.com/VmHCVpGC
The patch eliminates temporaries at -O1 but not at -O2.
I have not yet figured out the reason for that.
For if (flags & 1): in the dfinish pass at -O1 the generated RTL comes from
zeroextractsi_compare0_scratch, while at -O2 it comes from andsi3_compare0.
e) Is this also a problem on x86?
x86 generated assembly with -O1: http://pastebin.com/XMeTXXwK

* Misc (2/10)
- Getting familiar with vectorizer and NEON gcc intrinsics
- Reviewed git tutorials and started preparing the git doc
- Conference calls

== Next Week ==
- Continue working on TCWG-830 and TCWG-777
- Header file flattening
- Travel to Mumbai on 2nd July (Thursday) for US Visa OFC appointment.

Kugan

2015-06-29 04:02:55 UTC

== Progress ==

* TCWG-849 (1/10)
- Committed improvement for VRP
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225108

* Add REG_EQUAL for arm_emit_movpair (4/10)
- Posted patches for review

* TACT - TCWG-851 (3/10)
- Started with the small examples.
- Ran into an error while tuning; looking into it

* Git workflow for upstream patches - TCWG-848 (1/10)
- Had a chat with Michael and Prathamesh
- Tried the workflow and have started documenting it

* Misc (1/10)
- gcc-patches, gcc-bugs list
- Meetings

== Plan ==

- GCC Bugs
- TACT driven optimization exploration for gcc