Discussion:
Runtime inlining
Zoltan Kiss
2015-11-06 14:48:48 UTC
Hi,

We have a packaging/linking/optimization problem at LNG, I hope you guys
can give us some advice on that. (Cc'ing the ODP list in case someone
wants to add something.)
We have OpenDataPlane (ODP), an API stretching between userspace
applications and hardware SDKs. It's defined in the form of C headers,
and we already have several implementations to face SDKs (or whatever
is actually controlling the hardware), e.g. linux-generic, a DPDK one etc.
And we have applications, like Open vSwitch (OVS), which can now work
with any ODP platform implementation that implements this API.
When it comes to packaging, the ideal scenario would be to create one
package for the application, e.g. openvswitch.deb, and one for each
platform, e.g. odp-generic.deb, odp-dpdk.deb. The latter would contain
the implementation in the form of a libodp.so file, so the application
can dynamically load the installed platform's library at runtime,
with all the benefits of dynamic linking.
The trouble is that we have several accessor functions in the API which
are very short and __very__ frequently used. The best example is
"uint32_t odp_packet_len(odp_packet_t pkt)", which returns the length of
the packet. odp_packet_t is an opaque type defined by the
implementation, often a pointer to the packet's actual metadata, so the
function call boils down to a simple load from that metadata pointer
(+offset). Having it wrapped into a function call brings a significant
performance decrease: when forwarding 64 byte packets at 10 Gbps, I got
13.2 Mpps with function calls. When I inlined that function, I got
13.8 Mpps, a ~5% difference. And there are a lot of other frequently
used short accessor functions with the same problem.
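
To illustrate, the accessor is roughly this shape (just a sketch; the
metadata layout and the 12-byte offset are made up here, each
implementation has its own):

#include <stdint.h>

/* Opaque handle, as the application sees it. */
typedef struct odp_packet_hdr *odp_packet_t;

/* Portable build: the header only declares the function, every call
 * goes out-of-line into libodp.so. */
uint32_t odp_packet_len(odp_packet_t pkt);

/* Platform build: the header defines it inline instead (same name in
 * reality, suffixed here so the sketch stays self-contained), and the
 * call collapses to a single load -- at the cost of baking the
 * platform's metadata layout into the application binary. */
static inline uint32_t odp_packet_len_inlined(odp_packet_t pkt)
{
        return *(const uint32_t *)((const char *)pkt + 12);
}
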
But obviously if I inline these functions I break the ABI, and I need to
compile the application for each platform (and create packages like
openvswitch-odp-dpdk.deb, containing the platform statically linked).
I've looked around on Google and in the gcc manual, but I couldn't
find a good solution for this kind of problem.
I've checked link time optimization (-flto), but it only helps with
static linking. Is there any way to keep the ODP application and
platform implementation binaries in separate files while having the
performance benefit of inlining?

Regards,

Zoltan
Jim Wilson
2015-11-09 22:33:56 UTC
On Fri, Nov 6, 2015 at 6:48 AM, Zoltan Kiss <***@linaro.org> wrote:
> I've checked link time optimization (-flto), but it only helps with static
> linking. Is there any way to keep the ODP application and platform
> implementation binaries in separate files while having the performance
> benefit of inlining?

I haven't been able to think of a good way to do this, and apparently
no one else has either.

There is a not-so-good way to do it. You could distribute relocatable
link (i.e. ld -r) output instead of executables and shared libraries,
and then do the final LTO compile and link at run-time. This just
creates a new set of problems though. There would be a long delay for
the LTO compile and link before you start routing packets, which would
be inconvenient. It would be better to do the LTO compile and link
just once and reuse the binary, but then you have the problem of where
to put the binary and how to give it proper owner and group
permissions. There may also be issues with using ld -r with LTO. You
probably don't want this mess.

Otherwise, you need some kind of JIT or rewritable code scheme to redo
compiler optimizations at run-time, and we don't have that technology,
at least not with gcc. I don't know if LLVM has any useful feature
here.

Jim
Bill Fischofer
2015-11-09 22:39:20 UTC
The IO Visor <https://www.iovisor.org/> project appears to be doing
something like this with LLVM and JIT constructs to dynamically insert code
into the kernel in a platform-independent manner. Perhaps we can leverage
that technology?

Bill

Jim Wilson
2015-11-09 22:48:48 UTC
On Mon, Nov 9, 2015 at 2:39 PM, Bill Fischofer
<***@linaro.org> wrote:
> The IO Visor project appears to be doing something like this with LLVM and
> JIT constructs to dynamically insert code into the kernel in a
> platform-independent manner. Perhaps we can leverage that technology?

GCC has some experimental JIT support, but I think it would be a lot
of work to use it, and I don't know how stable it is.
https://gcc.gnu.org/wiki/JIT
The LLVM support is probably more advanced.

Jim
Bill Fischofer
2015-11-09 23:50:51 UTC
Adding Grant Likely to this chain as it relates to the broader subject of
portable ABIs that we've been discussing.

Maxim Uvarov
2015-11-10 07:39:28 UTC
A JIT (like Lua's) might also not work, because you would need to rewrite
OVS to support it. I don't think that would be accepted.

And it looks like the problem is in OVS, not in ODP. I.e. OVS should allow
using library functions for the fast path (where inlines are critical),
i.e. not just call odp_packet_len(), but move the whole OVS function into
the dynamic library.

regards,
Maxim.

Zoltan Kiss
2015-11-10 10:41:19 UTC
On 10/11/15 07:39, Maxim Uvarov wrote:
> A JIT (like Lua's) might also not work, because you would need to
> rewrite OVS to support it. I don't think that would be accepted.
>
> And it looks like the problem is in OVS, not in ODP. I.e. OVS should
> allow using library functions for the fast path (where inlines are
> critical), i.e. not just call odp_packet_len(), but move the whole OVS
> function into the dynamic library.

I'm not sure I get your point here, but OVS already allows using dynamic
library functions on the fast path. The problem is that it's slow, because
of the function call overhead.

Maxim Uvarov
2015-11-10 11:08:27 UTC
On 10 November 2015 at 13:41, Zoltan Kiss <***@linaro.org> wrote:

> I'm not sure I get your point here, but OVS already allows using dynamic
> library functions on the fast path. The problem is that it's slow,
> because of the function call overhead.

I'm not familiar with the OVS code. But, for example, OVS has something like:

ovs_get_and_packet_process()
{
        // here you use some inlines:
        pkt = odp_recv();
        len = odp_packet_len(pkt);

        // ... etc.
}

So clearly each target arch needs its own variant of the
ovs_get_and_packet_process() function. That function should move from OVS
into the dynamic library.

Maxim.
Grant Likely
2015-11-10 12:04:17 UTC
On Tue, Nov 10, 2015 at 11:08 AM, Maxim Uvarov <***@linaro.org> wrote:
> So clearly each target arch needs its own variant of the
> ovs_get_and_packet_process() function. That function should move from
> OVS into the dynamic library.

Which library? A library specific to OVS? Or some common ODP library
that everyone uses? In either case the solution is not scalable. In
the first case it still requires the app vendor to have a separate
build for each and every supported target. In the second, it
basically argues for all fast-path application-specific code to go
into a non-app-specific library. That really won't fly.

I have two answers to this question. One for the short term, and one
for the long.

In the short term we have no choice. If we're going to support
portable application binaries, then we cannot do inlines. ODP simply
isn't set up to support that. Portable binaries will have to take the
hit of doing a function call each and every time. It's not fast, but
it *works*, which at least sets a lowest common denominator. To
mitigate the problem we could encourage application packages to
include a generic version (no inlines, but works everywhere) plus one
or more optimized builds (with inlines), with the correct binary
selected at runtime. Not great, but it is a reasonable answer for the
short term.

For the long term, to get away from per-platform builds, I see two
viable options. Bill suggested the first: use LLVM to optimize at
runtime so that things like inlines get picked up when linked to the
platform library. There is some precedent of other projects already
doing this, so it isn't as far-fetched as it may seem. The second is
to do what we already do in the kernel for ftrace: instrument the
function calls and runtime-patch them with optimized inlines. Not
pretty, probably fragile, but we do have the knowledge from the kernel
of how to do it. All said, I would prefer an LLVM-based solution, but
investigation is needed to figure out how to make it work.

g.

Adhemerval Zanella
2015-11-10 13:22:57 UTC
On 10-11-2015 10:04, Grant Likely wrote:
> For the long term, to get away from per-platform builds, I see two
> viable options. Bill suggested the first: use LLVM to optimize at
> runtime so that things like inlines get picked up when linked to the
> platform library. There is some precedent of other projects already
> doing this, so it isn't as far-fetched as it may seem. The second is
> to do what we already do in the kernel for ftrace: instrument the
> function calls and runtime-patch them with optimized inlines. Not
> pretty, probably fragile, but we do have the knowledge from the kernel
> of how to do it. All said, I would prefer an LLVM-based solution, but
> investigation is needed to figure out how to make it work.

The LLVM JIT approach would require a lot of engineering work on the ODP
side. Currently LLVM provides two JIT engines: MCJIT and ORC
(which is new in LLVM 3.7).

MCJIT works on 'modules': programs can either pass a C or IR file or
use the API to create a module with multiple functions. The JIT engine
then builds an ELF object that is loaded into the process address
space. It is essentially an AOT JIT.

ORC stands for 'On Request Compilation'; it differs from MCJIT in that
it aims at lazy compilation using indirection hooks: a function won't be
JITted until it is called. [1]
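
For reference, a minimal sketch of driving MCJIT through the 3.7-era
LLVM C API (the JITted accessor and its 12-byte offset are invented for
illustration; build with something like
cc demo.c $(llvm-config --cflags --ldflags --libs core executionengine mcjit native)):

#include <llvm-c/Core.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        LLVMLinkInMCJIT();
        LLVMInitializeNativeTarget();
        LLVMInitializeNativeAsmPrinter();

        /* Build IR for: uint32_t pkt_len(uint8_t *m)
         *               { return *(uint32_t *)(m + 12); } */
        LLVMModuleRef mod = LLVMModuleCreateWithName("odp_jit");
        LLVMTypeRef param = LLVMPointerType(LLVMInt8Type(), 0);
        LLVMValueRef fn = LLVMAddFunction(mod, "pkt_len",
                LLVMFunctionType(LLVMInt32Type(), &param, 1, 0));
        LLVMBuilderRef b = LLVMCreateBuilder();
        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
        LLVMValueRef off = LLVMConstInt(LLVMInt32Type(), 12, 0);
        LLVMValueRef p = LLVMBuildGEP(b, LLVMGetParam(fn, 0), &off, 1, "p");
        p = LLVMBuildBitCast(b, p, LLVMPointerType(LLVMInt32Type(), 0), "p32");
        LLVMBuildRet(b, LLVMBuildLoad(b, p, "len"));

        /* Compile the module in-process and fetch a callable address. */
        LLVMExecutionEngineRef ee;
        char *err = NULL;
        if (LLVMCreateExecutionEngineForModule(&ee, mod, &err)) {
                fprintf(stderr, "%s\n", err);
                return 1;
        }
        uint32_t (*pkt_len)(uint8_t *) = (uint32_t (*)(uint8_t *))
                (uintptr_t)LLVMGetFunctionAddress(ee, "pkt_len");

        uint32_t meta[4] = { 0, 0, 0, 64 }; /* len at byte offset 12 */
        printf("len = %u\n", pkt_len((uint8_t *)meta)); /* prints len = 64 */
        return 0;
}
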

In any case you won't get inline speed if you decide to just JIT the
inline calls; they will still be indirect calls to the JITted functions.
Neither engine supports patchpoints, which is what the kernel uses to
dynamically patch the code with specific instructions.

If you want to actually change the code dynamically you can try the
DynamoRIO [2] project, which aims to provide an API to do so. However
it is aimed at instrumentation, so I am not sure how well it plays with
performance-minded projects.

Instead of focusing on dynamic code generation for such inlines, I would
suggest working on the more general functions that are actually called
through the PLT or other indirection, and creating runtime dispatch for
them.

You can follow the GCC strategy of indirect calls
(__builtin_cpu_supports(), which OpenSSL emulates as well) or, since
this is a library, use IFUNC on the PLT calls (like glibc does with
memory and math operations). With current GCC you can build different
versions of the same function and add an IFUNC dispatcher to select the
best one at runtime.
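
A minimal sketch of that IFUNC pattern, assuming GCC and glibc on x86-64
(the function names, the 12-byte offset and the "avx2" test are
placeholders, not real ODP code):

#include <stdint.h>

typedef void *odp_packet_t;

/* Portable fallback. */
static uint32_t pkt_len_generic(odp_packet_t pkt)
{
        return *(const uint32_t *)((const char *)pkt + 12);
}

/* Hypothetical platform-tuned variant. */
static uint32_t pkt_len_avx2(odp_packet_t pkt)
{
        return *(const uint32_t *)((const char *)pkt + 12);
}

/* The resolver runs once, when the dynamic linker binds the symbol. */
static uint32_t (*resolve_pkt_len(void))(odp_packet_t)
{
        __builtin_cpu_init();
        return __builtin_cpu_supports("avx2") ? pkt_len_avx2
                                              : pkt_len_generic;
}

uint32_t odp_packet_len(odp_packet_t pkt)
        __attribute__((ifunc("resolve_pkt_len")));

Note that every call still goes through the PLT: this removes a per-call
dispatch branch, not the call overhead itself.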

[1] http://article.gmane.org/gmane.comp.compilers.llvm.devel/80639
[2] http://www.dynamorio.org/

Zoltan Kiss
2015-11-10 15:04:24 UTC
On 10/11/15 12:04, Grant Likely wrote:
> In the short term we have no choice. If we're going to support
> portable application binaries, then we cannot do inlines. ODP simply
> isn't set up to support that. Portable binaries will have to take the
> hit of doing a function call each and every time. It's not fast, but
> it *works*, which at least sets a lowest common denominator. To
> mitigate the problem we could encourage application packages to
> include a generic version (no inlines, but works everywhere) plus one
> or more optimized builds (with inlines), with the correct binary
> selected at runtime. Not great, but it is a reasonable answer for the
> short term.

For the short term I would argue for producing platform-specific packages
as well, at least for ODP-OVS. As ODP-OVS is not upstream, we need to
produce an openvswitch-odp package anyway (which would be set to conflict
with the normal openvswitch package). My idea is to create
openvswitch-odp-[platform] packages, though I don't know whether you can
set a wildcard conflict rule during packaging to make sure only one of
them is installed at a time.

> For the long term, to get away from per-platform builds, I see two
> viable options. Bill suggested the first: use LLVM to optimize at
> runtime so that things like inlines get picked up when linked to the
> platform library. There is some precedent of other projects already
> doing this, so it isn't as far-fetched as it may seem.

But wouldn't that tie us down to LLVM?

> The second is
> to do what we already do in the kernel for ftrace: instrument the
> function calls and runtime-patch them with optimized inlines. Not
> pretty, probably fragile, but we do have the knowledge from the kernel
> of how to do it.

Yes, I was also thinking about the ftrace way, but I'm not familiar
enough with ld.so to judge how hard it would be.
Grant Likely
2015-11-10 15:08:32 UTC
On Tue, Nov 10, 2015 at 3:04 PM, Zoltan Kiss <***@linaro.org> wrote:
> But wouldn't that tie us down to LLVM?

Does that worry you? LLVM is a mature, open-source project with a lot of
momentum behind it. There are worse things we could do than align with
LLVM when it brings a capability that we cannot get anywhere else.

g.
Zoltan Kiss
2015-11-10 15:27:10 UTC
On 10/11/15 15:08, Grant Likely wrote:
> On Tue, Nov 10, 2015 at 3:04 PM, Zoltan Kiss <***@linaro.org> wrote:
>> But wouldn't that tie us down to LLVM?
>
> Does that worry you?

Only that we would then require our applications to use LLVM if they want
performance. I don't know the impact of that.

Pinski, Andrew
2015-11-10 18:54:36 UTC
> On Nov 10, 2015, at 7:28 AM, Zoltan Kiss <***@linaro.org> wrote:
>
> Only that we would then require our applications to use LLVM if they
> want performance. I don't know the impact of that.

Or they recompile the programs to get the speed. I am sorry, but this is
not a new problem; most embedded folks are used to it. What an ODP vendor
could do is provide optimized versions of the programs they think are
important.

Thanks,
Andrew


Zoltan Kiss
2015-11-10 12:09:49 UTC
On 10/11/15 11:08, Maxim Uvarov wrote:
> I'm not familiar with the OVS code. But, for example, OVS has something
> like:
>
> ovs_get_and_packet_process()
> {
>         // here you use some inlines:
>         pkt = odp_recv();
>         len = odp_packet_len(pkt);
>
>         // ... etc.
> }
>
> So clearly each target arch needs its own variant of the
> ovs_get_and_packet_process() function. That function should move from
> OVS into the dynamic library.

I see. That would mitigate some of the problems, but unfortunately the
use of these accessor functions can't be narrowed down to a particular
piece of fast-path code. Packet length is a good example: you need it
very often during processing, in different parts of the code.

Ola Liljedahl
2015-11-10 14:57:02 UTC
On 6 November 2015 at 15:48, Zoltan Kiss <***@linaro.org> wrote:

> When it comes to packaging, the ideal scenario would be to create one
> package for the application, e.g. openvswitch.deb, and one for each
> platform, e.g. odp-generic.deb, odp-dpdk.deb. The latter would contain
> the implementation in the form of a libodp.so file, so the application
> can dynamically load the installed platform's library at runtime, with
> all the benefits of dynamic linking.
We also need binary compatibility between different ODP implementations,
binary compatibility that goes beyond an ABI.

I would be happy if, for a start, we could prove that we actually have
source-code compatibility, e.g. compile the exact same app against
different ODP implementations and run it on the respective platforms with
the expected behaviour (including performance).

Mike Holmes
2015-11-11 11:46:57 UTC
On 11 November 2015 at 00:45, Savolainen, Petri (Nokia - FI/Espoo)
<***@nokia.com> wrote:

>
>
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-***@lists.linaro.org] On Behalf Of
> > EXT Nicolas Morey-Chaisemartin
> > Sent: Tuesday, November 10, 2015 5:13 PM
> > To: Zoltan Kiss; linaro-***@lists.linaro.org
> > Cc: lng-odp
> > Subject: Re: [lng-odp] Runtime inlining
> >
> > As I said in the call last week, the problem is wider than that.
> >
> > ODP specifies a lot of types but not their sizes, and a lot of
> > enums/defines (things like ODP_PKTIO_INVALID) but not their values
> > either.
> > For our port a lot of those values were changed for
> > performance/implementation reasons. So I'm not even compatible between
> > one version of our ODP port and another.
> >
> > The only way I can see to solve this is for ODP to fix the sizes of
> > all these types.
> > Default/invalid values are not that easy, as a pointer would have a
> > completely different behaviour from structs/bitfields.
> >
> > Nicolas
> >
>
> Type sizes do not need to be fixed in general, but only when an
> application is built for binary compatibility (the use case we are
> talking about here). Binary compatibility, and thus the fixed type
> sizes, are defined per ISA.
>
> We can e.g. define a configure target (for our reference implementation ==
> linux-generic) "--binary-compatible=armv8.x" or
> "--binary-compatible=x86_64". When you build your application with that
> option, "platform dependent" types and constants would be fixed to
> pre-defined values specified in (new) ODP API arch files.
>
> So instead of building against
> odp/platform/linux-generic/include/odp/plat/queue_types.h ...
>
> typedef ODP_HANDLE_T(odp_queue_t);
> #define ODP_QUEUE_INVALID _odp_cast_scalar(odp_queue_t, 0)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> ... you'd build against odp/arch/armv8.x/include/odp/queue_types.h ...
>


With the introduction of odp/arch at the top level, I think we should also
move platform/linux-generic/arch to the same location.


> typedef uintptr_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uintptr_t)0)
> #define ODP_QUEUE_NAME_LEN 64
>
>
> ... or odp/arch/x86_64/include/odp/queue_types.h
>
> typedef uint64_t odp_queue_t;
> #define ODP_QUEUE_INVALID ((uint64_t)0xffffffffffffffff)
> #define ODP_QUEUE_NAME_LEN 32
>
>
> For highest performance on a fixed target platform, you'd still build
> against the platform directly
>
> odp/platform/<soc_vendor_xyz>/include/odp/plat/queue_types.h
>
> typedef xyz_queue_desc_t * odp_queue_t;
> #define ODP_QUEUE_INVALID ((xyz_queue_desc_t *)0xdeadbeef)
> #define ODP_QUEUE_NAME_LEN 20
>
>
> -Petri



--
Mike Holmes
Technical Manager - Linaro Networking Group
Linaro.org │ Open source software for ARM SoCs
Gary Robertson
2015-11-13 18:15:21 UTC
My information is admittedly a bit dated now, but the last time I looked
into it, LLVM still had some performance deficits versus gcc.
Have they caught up now? Otherwise any performance gains from their JIT
inlining technology may be offset by diminished performance overall.

On Fri, Nov 13, 2015 at 10:35 AM, Zoltan Kiss <***@linaro.org>
wrote:

> On 13/11/15 16:19, Nikolai Bozhenov wrote:
>
>> If all you need is to have a fast and portable binary, I wonder if you
>> could use relocations to attain your goal. I mean, to make the dynamic
>> linker overwrite at startup time the call instructions with some
>> machine-specific absolute values, e.g. with 0xe590000c, which is the
>> binary representation of the 'ldr r0, [r0, #12]' instruction and which
>> seems to be fully equivalent to the call to odp_packet_len.
>
> Something like that would be the best, yes, but it seems gcc and friends
> don't support that. As others said, LLVM probably has a JIT which can do
> that.
>
>> Another slightly different option is to inline the call at build time,
>> but to have the dynamic linker patch some inlined instructions at
>> startup time. For example, to write correct offsets into the load
>> instructions.
>>
>> Nikolai
Nikolai Bozhenov
2015-11-14 11:55:56 UTC
On 11/13/2015 07:35 PM, Zoltan Kiss wrote:
>
> Something like that would be the best, yes, but it seems gcc and
> friends don't support that. As others said, LLVM probably has a JIT
> which can do that.
>

I don't think you need a JIT; a JIT is obviously overkill for that. All
you need is to reserve some space at compile time (e.g. with inline
assembly) and then patch the space at startup. The latter sounds like
a task for the loader.

Though I don't think there is support for that in any toolchain. It
is not typical to have such hot, small functions in shared libraries.
So you would have to do some development in the toolchain anyway to
support the suggested optimization.
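
To give a feel for the patching step itself, a rough x86-64-Linux-only
sketch (the metadata layout and the instruction bytes are invented; a
real scheme would live in the loader and handle caches, threads and CFI
properly):

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct { char pad[12]; uint32_t len; } pkt_meta;

/* Placeholder body shipped in the binary; deliberately "wrong" so we
 * can see that the patch below took effect. */
__attribute__((noinline))
static uint32_t packet_len(void *pkt)
{
        return ((pkt_meta *)pkt)->len + 1;
}

int main(void)
{
        /* mov eax, dword ptr [rdi+12]; ret -- the "inlined" load */
        static const unsigned char fast[] = { 0x8b, 0x47, 0x0c, 0xc3 };
        long page = sysconf(_SC_PAGESIZE);
        uintptr_t start = (uintptr_t)packet_len & ~((uintptr_t)page - 1);

        /* Make the code writable, overwrite the entry point, restore. */
        if (mprotect((void *)start, 2 * page,
                     PROT_READ | PROT_WRITE | PROT_EXEC))
                return 1;
        memcpy((void *)packet_len, fast, sizeof(fast));
        mprotect((void *)start, 2 * page, PROT_READ | PROT_EXEC);

        pkt_meta m = { .len = 64 };
        printf("%u\n", packet_len(&m)); /* prints 64 once patched */
        return 0;
}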

Nikolai
Ola Liljedahl
2015-11-16 14:16:32 UTC
I think there are many issues with binary compatibility beyond
function inlining. An ODP application cannot expect all ODP
implementations to support the same number of ODP queues or
classification rules, or even which classification terms (fields) are
supported (efficiently/in HW), etc. Is there some kind of lowest common
denominator an application should expect? Do we want to make the
guarantees of an ODP implementation stricter? What are the
consequences of such strict functional guarantees?

I think an application that requires binary compatibility across ARMv8.1
platforms should compile and link against a specific ODP SW
implementation (possibly with some well-defined HW offloads where the
underlying platform can provide the relevant drivers), i.e. more of a
(user-space) Linux architecture than standard ODP (as influenced by
OpenGL). The important binary interfaces then become the interfaces
to these offloads/drivers.

On 16 November 2015 at 14:23, Nicolas Morey-Chaisemartin
<***@kalray.eu> wrote:
> On 11/11/2015 09:45 AM, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>> We can e.g. define a configure target (for our reference implementation ==
>> linux-generic) "--binary-compatible=armv8.x" or
>> "--binary-compatible=x86_64". When you build your application with that
>> option, "platform dependent" types and constants would be fixed to
>> pre-defined values specified in (new) ODP API arch files.
>
> It still means that you need to enforce a type for all ODP
> implementations on a given arch, which could be problematic.
> As a precise example: the way handles are now used for odp_packet_t
> brings some useful features for checks and memory savings, but
> performance-wise they are a "disaster". One of the first things I did
> was to switch them to pointers. And if I wanted a high-perf Linux
> x86_64 implementation, I'd probably do the same.
>
> Nicolas