question on bfd - arch & mach

Discussion:

Virendra Kumar Pathak

2015-05-27 08:21:39 UTC

Hi Linaro Toolchain Group,

I am going through the binutils code base specific to arm & aarch64. Please
give some insight on below questions.

1. In struct bfd_arch_info {...} (in bfd/bfd-in2.h) there are two
fields, 'enum bfd_architecture arch' and 'unsigned long mach'.
I went through the binutils porting guide (by ***@nsc.com), which
says 'arch' is for the architecture and 'mach' is for the machine value.
At present in bfd/bfd-in2.h, arch = bfd_arch_aarch64 and mach =
bfd_mach_aarch64 or bfd_mach_aarch64_ilp32.
But what do these fields really mean? What is the difference between 'arch'
and 'mach'?

Let's say the instruction set architecture is ARMv8 (also known as aarch64
in its 64-bit form, if I am not wrong). Then we have specific implementations
of it, like Cortex-A53, Cortex-A57, Cavium ThunderX, etc. For these, what
would the values of arch and mach be?

2. In include/opcode/arm.h, 'arm_feature_set' is defined as a
structure, whereas in include/opcode/aarch64.h 'aarch64_feature_set' is
defined as unsigned long. Is there any specific reason for this? Why
was the structure definition not followed in aarch64?

typedef struct
{
  unsigned long core;
  unsigned long coproc;
} arm_feature_set;

typedef unsigned long aarch64_feature_set;

3. Also, I see that in the case of arm, 'mach' values are derived from the cpu
extension value specified in that 'arm_feature_set' structure.
For example:
  if (ARM_CPU_HAS_FEATURE (cpu_variant, arm_cext_iwmmxt2))
    mach = bfd_mach_arm_iWMMXt2;
Whereas in aarch64, mach is derived from the ABI type (64 or 32 bit). Any
reason for this?
  mach = ilp32_p ? bfd_mach_aarch64_ilp32 : bfd_mach_aarch64;

Thanks in advance.

--
with regards,
Virendra Kumar Pathak

Jim Wilson

2015-05-27 16:07:07 UTC

On Wed, May 27, 2015 at 1:21 AM, Virendra Kumar Pathak wrote:

1. In struct bfd_arch_info {...} (in bfd/bfd-in2.h) there are two fields,
'enum bfd_architecture arch' and 'unsigned long mach'.
I went through the binutils porting guide (by ***@nsc.com), which
says 'arch' is for the architecture and 'mach' is for the machine value.
At present in bfd/bfd-in2.h, arch = bfd_arch_aarch64 and mach =
bfd_mach_aarch64 or bfd_mach_aarch64_ilp32.
But what do these fields really mean? What is the difference between 'arch'
and 'mach'?

arch is for different incompatible architectures, e.g. sparc versus
mips versus arm. mach is for different incompatible machines within
an architecture. So for arm we have for instance bfd_mach_arm_2 for
armv2 and bfd_mach_arm_5T for armv5t, etc. These fields have little
meaning outside what the rest of the binutils code gives to them, so
the author of a port can use them however he sees fit, and sometimes
different ports use them slightly differently. Practical
considerations will sometimes force particular choices in order to get
a working Linux system.
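
The two-level split described above can be sketched in C. This is only an illustration: the sketch_ names and constant values are hypothetical stand-ins, not the real definitions from bfd/bfd-in2.h, and the strict equality test is an assumption. Real ports decide for themselves which machines within an architecture count as compatible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BFD enums; the real ones are in
   bfd/bfd-in2.h and have different values. */
enum sketch_arch { SKETCH_ARCH_ARM, SKETCH_ARCH_AARCH64, SKETCH_ARCH_SPARC };

/* Machine codes are only meaningful relative to one architecture. */
enum { SKETCH_MACH_ARM_2 = 1, SKETCH_MACH_ARM_5T = 2 };
enum { SKETCH_MACH_AARCH64 = 0 };

struct sketch_arch_info {
    enum sketch_arch arch;  /* coarse level: sparc vs mips vs arm ... */
    unsigned long mach;     /* fine level: machine within that architecture */
};

/* Strictest possible rule (an assumption for this sketch): two objects
   match only if both the architecture and the machine agree. */
bool sketch_same_target(const struct sketch_arch_info *a,
                        const struct sketch_arch_info *b)
{
    assert(a != NULL && b != NULL);
    if (a->arch != b->arch)
        return false;           /* different architectures never mix */
    return a->mach == b->mach;  /* same architecture: compare machines */
}
```

A port that wants armv5t code to accept armv2 objects would relax the second comparison; that choice is exactly the freedom the port author has.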

Let's say the instruction set architecture is ARMv8 (also known as aarch64
in its 64-bit form, if I am not wrong). Then we have specific implementations
of it, like Cortex-A53, Cortex-A57, Cavium ThunderX, etc. For these, what
would the values of arch and mach be?

All of the announced aarch64 parts implement the same instruction set
(more or less), so they all use the same mach value, bfd_mach_aarch64.

2. In include/opcode/arm.h, 'arm_feature_set' is defined as a
structure, whereas in include/opcode/aarch64.h 'aarch64_feature_set' is
defined as unsigned long. Is there any specific reason for this? Why
was the structure definition not followed in aarch64?
typedef struct
{
  unsigned long core;
  unsigned long coproc;
} arm_feature_set;
typedef unsigned long aarch64_feature_set;

Ports are free to implement this as they see fit. Often different
people will do it slightly differently. There is no requirement to do
it exactly the same way as some other port. So no requirement that
aarch64 do anything exactly the same as how the arm port did it.

On the practical side, arm is an old architecture, which has many
variants, and has a definite need to express different feature sets.
Whereas aarch64 is new, and as yet does not have any specific need for
different feature sets, since all of the announced parts implement
mostly the same feature sets. So aarch64 has a simple definition as
it doesn't need anything complicated here. And arm has a complicated
definition, as this was necessary to get correct behaviour from the
arm port.
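
To make the contrast concrete, here is a minimal sketch of how a feature test looks with each representation. The typedefs mirror the shapes quoted above; the sketch_ names, the bit values, and the "any requested bit is present" rule are assumptions made for illustration, not the actual binutils macros.

```c
#include <stdbool.h>

/* Two-word feature set, shaped like the arm_feature_set quoted above. */
typedef struct {
    unsigned long core;    /* core architecture feature bits */
    unsigned long coproc;  /* coprocessor/extension feature bits */
} sketch_arm_feature_set;

/* Single-word feature set, shaped like aarch64_feature_set. */
typedef unsigned long sketch_aarch64_feature_set;

/* Hypothetical feature bits, not the real binutils values. */
#define SKETCH_EXT_V5T      0x1UL
#define SKETCH_CEXT_IWMMXT  0x2UL
#define SKETCH_CEXT_OTHER   0x4UL
#define SKETCH_A64_SIMD     0x1UL
#define SKETCH_A64_CRYPTO   0x2UL

/* True if cpu has at least one of the bits requested in feat,
   looking at both words. */
bool sketch_arm_has_feature(sketch_arm_feature_set cpu,
                            sketch_arm_feature_set feat)
{
    return (cpu.core & feat.core) != 0 || (cpu.coproc & feat.coproc) != 0;
}

/* The single-word case needs only one mask test. */
bool sketch_aarch64_has_feature(sketch_aarch64_feature_set cpu,
                                sketch_aarch64_feature_set feat)
{
    return (cpu & feat) != 0;
}
```

The second word buys room for more feature bits at the cost of a slightly more involved test, which is only worth paying once a port has that many features to track.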

3. Also, I see that in the case of arm, 'mach' values are derived from the cpu
extension value specified in that 'arm_feature_set' structure.
For example:
  if (ARM_CPU_HAS_FEATURE (cpu_variant, arm_cext_iwmmxt2))
    mach = bfd_mach_arm_iWMMXt2;
Whereas in aarch64, mach is derived from the ABI type (64 or 32 bit). Any
reason for this?
  mach = ilp32_p ? bfd_mach_aarch64_ilp32 : bfd_mach_aarch64;

These are effectively working the same way. The only difference is
that there are many arm variants, but only one aarch64 variant, which
is why there are many bfd_mach_arm* codes and only one
bfd_mach_aarch64* code.

As for the ILP32 ABI, it is incompatible with the default LP64 ABI,
and traditionally ILP32 and LP64 use different ELF formats (ELF32
versus ELF64), so it is convenient to give the ILP32 ABI its own
machine code so that we can use the machine code to select the ELF
format. This is also done for x86, where 32-bit, 64-bit, and x32 ABI
are 3 different machine codes.
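
As a rough sketch of why a separate machine code is convenient, the mapping from ABI to machine code to ELF class can be written as two small functions. The sketch_ names and enum values are hypothetical, and the real selection logic is spread across the assembler and the BFD back ends rather than sitting in one place like this.

```c
/* Hypothetical stand-ins for bfd_mach_aarch64 and bfd_mach_aarch64_ilp32. */
enum sketch_mach {
    SKETCH_MACH_AARCH64,
    SKETCH_MACH_AARCH64_ILP32
};

/* Hypothetical stand-ins for the two ELF container formats. */
enum sketch_elf_class {
    SKETCH_ELFCLASS32,
    SKETCH_ELFCLASS64
};

/* Same shape as the expression quoted in the question: the ABI flag
   picks the machine code. */
enum sketch_mach sketch_select_mach(int ilp32_p)
{
    return ilp32_p ? SKETCH_MACH_AARCH64_ILP32 : SKETCH_MACH_AARCH64;
}

/* Once the ABI is encoded in the machine code, the ELF format can be
   chosen from the machine code alone. */
enum sketch_elf_class sketch_elf_class_for(enum sketch_mach mach)
{
    return mach == SKETCH_MACH_AARCH64_ILP32 ? SKETCH_ELFCLASS32
                                             : SKETCH_ELFCLASS64;
}
```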

There is a practical consideration here that if you are using mach
codes for ABIs, and have x ABIs, and are using mach codes for
implementations, and have y implementations, then you would need x*y
mach codes to represent every combination of ABIs and implementation,
which would quickly get impractical. So for instance in the x86 port,
they only have a few mach codes for implementations, even though there
are dozens of variants of the x86 architecture. A mach code is not
the only way to express a different implementation. It is an
implementer's choice whether to use a mach code or some other
mechanism.
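
The x*y growth is easy to see with a little arithmetic. The sketch below compares the number of mach codes needed when one code must name every (ABI, implementation) pair against the number of values needed when the two are recorded separately. The function names are hypothetical, and the count of 24 implementations in the note that follows is a made-up figure for illustration.

```c
#include <stddef.h>

/* One mach code per (ABI, implementation) pair: the count multiplies. */
size_t sketch_combined_codes(size_t n_abis, size_t n_impls)
{
    return n_abis * n_impls;
}

/* ABI and implementation recorded in separate fields: the count adds. */
size_t sketch_separate_values(size_t n_abis, size_t n_impls)
{
    return n_abis + n_impls;
}
```

With 3 ABIs and 24 implementations, the combined scheme needs 3 * 24 = 72 codes, while separate fields need only 3 + 24 = 27 values.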

In the arm port for instance, the mach codes only go up to armv5te.
Beyond that, the toolchain started using a .ARM.attributes section to
describe machine info, and we stopped adding machine codes for each
arm architecture variant. The .ARM.attributes section can contain a
lot more info than can be represented in the mach field. This is
probably why there is no bfd_mach_arm_6 and bfd_mach_arm_7, as they
weren't needed once we started using .ARM.attributes sections. A
similar mechanism will likely be added to the aarch64 port once that
architecture has enough variants that it would benefit from having an
attribute section.
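
To illustrate why an attribute section scales better than a single mach value, here is a minimal sketch of a tag/value attribute list with a lookup function. This is not the real on-disk encoding of the ARM attribute section; the struct, the tag numbers, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One attribute: a tag naming the property and its value. */
struct sketch_attr {
    unsigned tag;
    unsigned value;
};

/* Hypothetical tags; an object can carry any subset of them. */
enum {
    SKETCH_TAG_CPU_ARCH = 1,
    SKETCH_TAG_FP_ARCH  = 2,
    SKETCH_TAG_SIMD     = 3
};

/* Return the value recorded for tag, or dflt if the object does not
   carry that attribute. */
unsigned sketch_attr_get(const struct sketch_attr *attrs, size_t n,
                         unsigned tag, unsigned dflt)
{
    assert(attrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (attrs[i].tag == tag)
            return attrs[i].value;
    return dflt;
}
```

Adding a new property means adding a new tag, without multiplying the number of machine codes.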

Jim

Virendra Kumar Pathak

2015-05-31 14:48:55 UTC

Hi Jim,

Thanks for giving the insight.

--
with regards,
Virendra Kumar Pathak