Replacing boost::any arguments with std::variant
Hi all,

I've been wrestling with long compile times in graph-tool while doing some development work, and I noticed that template instantiations seem to be an important factor. Even functions of modest compile time can be very slow to build if they are multiplied by the product of several type alternatives, each of which must be instantiated. For example, a single function in Python may dispatch to N graph types x M degree map types x P weight map types, which can mean hundreds of instantiations.

At the same time I noticed a pattern in the code where a set of types (represented by a Boost.MPL typelist) accompanies a boost::any argument. At dispatch time the "any" object is interrogated to find out which of the types it stores, and the correct instantiation is called. Dispatching this way requires a linear search through the typelist for each argument.

It seems to me that replacing (MPL typelist + boost::any) with std::variant would improve both of those factors, giving:

1. Constant-time dispatch to the appropriate instantiation via std::visit

2. The ability to reduce compile time by shrinking the N x M x ... type product, depending on how well the argument use can be refactored.

I think the approach described in http://jefftrull.github.io/c++/boost/python/2020/01/30/variants-in-boost-pyt... could be applied here.

Would there be any interest in exploring this kind of refactoring? I think there could be substantial benefits in compile time, as well as some runtime improvement (depending on how often the Python/C++ boundary is crossed).

Thanks,
Jeff
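P.S. To make the comparison concrete, here is a minimal sketch of the two dispatch styles. The typelist, the vector "property maps", and the function names are made-up stand-ins, not graph-tool's actual types or dispatch code:

    #include <boost/any.hpp>
    #include <boost/mpl/for_each.hpp>
    #include <boost/mpl/vector.hpp>
    #include <cstdio>
    #include <variant>
    #include <vector>

    // stand-ins for a small set of property-map types
    using weight_types = boost::mpl::vector<std::vector<int>, std::vector<double>>;

    // current style: linear search through the typelist, trying an any_cast
    // against each alternative until one succeeds
    template <class F>
    void dispatch_any(const boost::any& a, F f)
    {
        boost::mpl::for_each<weight_types>(
            [&](auto probe)
            {
                using T = decltype(probe);
                if (auto* p = boost::any_cast<T>(&a))   // O(N) trial casts
                    f(*p);
            });
    }

    // proposed style: the variant already records which alternative it holds,
    // so std::visit selects the matching instantiation directly
    using weight_variant = std::variant<std::vector<int>, std::vector<double>>;

    template <class F>
    void dispatch_variant(const weight_variant& v, F f)
    {
        std::visit(f, v);                               // O(1) dispatch
    }

    int main()
    {
        auto print_size = [](const auto& w) { std::printf("%zu\n", w.size()); };
        dispatch_any(boost::any(std::vector<double>{1.0, 2.0}), print_size);
        dispatch_variant(std::vector<int>{1, 2, 3}, print_size);
    }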
On 06.02.20 at 07:12, Jeff Trull wrote:
Would there be any interest in exploring this kind of refactoring? I think there could be substantial benefits in compile time, as well as some runtime improvement (depending on how often the Python/C++ boundary is crossed).
I don't think the important issue is whether we use boost::any or std::variant. The high compile times stem simply from the fact we have to cycle through the Cartesian product of the set of types of each parameter. I don't see how using std::variant changes this in any way.

--
Tiago de Paula Peixoto <tiago@skewed.de>
I don't think the important issue is whether we use boost::any or std::variant. The high compile times stem simply from the fact we have to cycle through the Cartesian product of the set of types of each parameter.
I absolutely agree on that diagnosis, but I think std::variant can play a role. The key is to postpone dispatch, where possible, to refactor out one or more of the product terms.

At the moment the dispatch mechanism identifies all the concrete types first, *then* runs the correct instantiated function. If there were more flexibility in this process, dispatching to selected type-dependent code could happen later and cover less code. Consider:

    template<class A, class B, class C>
    void foo(A a, B b, C c)
    {
        // lots of A and B code
        // a single use of C
        // lots more A and B
    }

For the sake of a small amount of code involving C, the entire function gets rebuilt as many times as there are C types. Now consider an approach that postpones determining C's concrete type:

    template<class A, class B>
    void foo(A a, B b, std::variant<C1, C2, ...> c_var)
    {
        // lots of A and B
        std::visit([](auto const& c)
                   {
                       // use of C
                   },
                   c_var);
        // lots more A and B
    }

If C was something based on scalar_types, this would mean a factor of 6 reduction in instantiations!

It's true that you could do the same trick - though IMO less cleanly - with boost::any and a typelist. But this would leave out the other advantage of variant: constant-time dispatch. I don't know what the runtime impact of the current iterative approach is, but it seems like it could be particularly important in filtered graphs, or in algorithms with a lot of callbacks (like my personal favorite, astar_search).
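Here is a compilable toy version of that second form, with made-up stand-ins for the graph, the degree map and the scalar-typed weight maps (none of these are actual graph-tool types):

    #include <cstdint>
    #include <cstdio>
    #include <variant>
    #include <vector>

    // stand-in for an edge weight map over a small set of scalar types
    using weight_variant = std::variant<std::vector<std::int32_t>,
                                        std::vector<std::int64_t>,
                                        std::vector<double>>;

    // instantiated once per (Graph, DegreeMap) pair; the number of weight
    // types no longer multiplies the compile cost of the A/B-only parts
    template <class Graph, class DegreeMap>
    double foo(const Graph& g, const DegreeMap& deg, const weight_variant& w_var)
    {
        double acc = 0;
        for (auto d : deg)                 // "lots of A and B" work
            acc += d;

        std::visit([&](const auto& w)      // only this lambda is stamped out
                   {                       // once per weight type
                       for (auto x : w)
                           acc += x;
                   },
                   w_var);

        return acc + static_cast<double>(g.size());   // more A/B-only work
    }

    int main()
    {
        std::vector<int> graph_stub{0, 1, 2};   // fake "graph"
        std::vector<int> degrees{1, 2, 1};
        weight_variant weights = std::vector<double>{0.5, 1.5};
        std::printf("%g\n", foo(graph_stub, degrees, weights));
    }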
On 06.02.20 at 18:59, Jeff Trull wrote:
I don't think the important issue is whether we use boost::any or std::variant. The high compile times stem simply from the fact we have to cycle through the Cartesian product of the set of types of each parameter.
I absolutely agree on that diagnosis, but I think std::variant can play a role. The key is to postpone dispatch, where possible, to refactor out one or more of the product terms.
At the moment the dispatch mechanism identifies all the concrete types first, *then* runs the correct instantiated function. If there were more flexibility in this process, dispatching to selected type-dependent code could happen later and cover less code. Consider:
    template<class A, class B, class C>
    void foo(A a, B b, C c)
    {
        // lots of A and B code
        // a single use of C
        // lots more A and B
    }
For the sake of a small amount of code involving C the entire function gets rebuilt as many times as there are C types. Now consider an approach that postponed determining C's concrete type:
    template<class A, class B>
    void foo(A a, B b, std::variant<C1, C2, ...> c_var)
    {
        // lots of A and B
        std::visit([](auto const& c)
                   {
                       // use of C
                   },
                   c_var);
        // lots more A and B
    }
If C was something based on scalar_types this would mean a factor of 6 reduction in instantiations!
But that is not the relevant pattern. What we require is the *joint* instantiation of the A, B and C types, and a dispatch that is specialized for this joint combination, and hence is as fast as possible. What you are describing (independent dispatching for every type) is not very different from dynamic typing, unless the code can be divided into clear disjoint blocks as in your example, which is not the case for most algorithms in graph-tool.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
But that is not the relevant pattern. What we require is the *joint* instantiation of A B and C types, and a dispatch that is specialized for this joint combination, and hence is as fast as possible.
The joint combination is straightforward:

    template<class A, class B, class C>
    void fooImpl(A a, B b, C c);

    void foo(std::variant<A1, ...> avar,
             std::variant<B1, ...> bvar,
             std::variant<C1, ...> cvar)
    {
        std::visit(
            [](auto a, auto b, auto c) { fooImpl(a, b, c); },
            avar, bvar, cvar);
    }

but in this case, as you pointed out, we don't get much compile time advantage - but we do still enjoy the fast dispatch.
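(For reference, here is a self-contained toy of the multi-variant visit, with placeholder alternatives in place of the real type lists:)

    #include <cstdio>
    #include <variant>

    using AVar = std::variant<int, double>;
    using BVar = std::variant<char, unsigned>;
    using CVar = std::variant<float, long>;

    // one instantiation exists per (A, B, C) combination: 2 x 2 x 2 = 8 here
    template <class A, class B, class C>
    void fooImpl(A a, B b, C c)
    {
        std::printf("%g %g %g\n", double(a), double(b), double(c));
    }

    void foo(const AVar& avar, const BVar& bvar, const CVar& cvar)
    {
        // multi-argument visit jumps directly to the matching fooImpl<A, B, C>
        std::visit([](auto a, auto b, auto c) { fooImpl(a, b, c); },
                   avar, bvar, cvar);
    }

    int main()
    {
        foo(1, 'x', 2.0f);   // calls fooImpl<int, char, float>
    }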
What you are describing (independent dispatching for every type) is not very different from dynamic typing, unless the code can be divided into clear disjoint blocks as in your example, which is not the case for most algorithms in graph-tool
I will defer to your judgment on that one, though perhaps surprisingly I found this worked in the first (only) algorithm I tried applying it to: assortativity. I selected it based on how long it took to build. The results were:

    boost::any + typelist:              177s, 4.5GB memory
    std::variant for edge weights only:  37s, 1.74GB memory

The memory reduction is very useful in that it enables parallel builds.

The prototype can be found here:
https://git.skewed.de/jaafar/graph-tool/compare/master...feature%2Fvariant-c...

Best,
Jeff
On 07.02.20 at 20:28, Jeff Trull wrote:
But that is not the relevant pattern. What we require is the *joint* instantiation of A B and C types, and a dispatch that is specialized for this joint combination, and hence is as fast as possible.
The joint combination is straightforward:
    template<class A, class B, class C>
    void fooImpl(A a, B b, C c);

    void foo(std::variant<A1, ...> avar,
             std::variant<B1, ...> bvar,
             std::variant<C1, ...> cvar)
    {
        std::visit(
            [](auto a, auto b, auto c) { fooImpl(a, b, c); },
            avar, bvar, cvar);
    }
but in this case, as you pointed out, we don't get much compile time advantage - but we do still enjoy the fast dispatch.
It's possible that the compiler is able to compile this in less time and memory than what is currently done with MPL + boost::any (which predates the existence of std::variant). It's something to be considered.
What you are describing (independent dispatching for every type) is not very different from dynamic typing, unless the code can be divided into clear disjoint blocks as in your example, which is not the case for most algorithms in graph-tool
I will defer to your judgment on that one, though perhaps surprisingly I found this worked in the first (only) algorithm I tried applying it to: assortativity. I selected it based on how long it took to build. The results were:
    boost::any + typelist:              177s, 4.5GB memory
    std::variant for edge weights only:  37s, 1.74GB memory
The memory reduction is very useful in that it enables parallel builds.
The prototype can be found here: https://git.skewed.de/jaafar/graph-tool/compare/master...feature%2Fvariant-c...
I don't find this kind of modification interesting, because the trade-off goes precisely in the opposite direction of what is the intention of the library: you're sacrificing run time for compile time. In particular you replace _every_ lookup of property maps with a variant visit, including type conversion. Sure, that compiles faster because it reduces the type combinations, but it just shifts the burden to run time. This could also be achieved in the traditional way of doing run time polymorphism, virtual functions, etc.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
It's possible that the compiler is able to compile this in less time and memory than what is currently done with MPL + boost::any (which predates the existence of std::variant). It's something to be considered.
I'm sorry to say that in my experiments it made only a small difference. I think the main benefit would be felt at runtime. It may be worth it for that, though.
I don't find this kind of modification interesting, because the trade-off goes precisely in the opposite direction of what is the intention of the library: you're sacrificing run time for compile time. In particular you replace _every_ lookup of property maps with a variant visit, including type conversion. Sure, that compiles faster because it reduces the type combinations, but it just shifts the burden to run time.
Quite true! Thanks for considering it, though.

Best,
Jeff
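P.S. To spell out the trade-off for anyone following along, here is a toy comparison (made-up types, not graph-tool code) of dispatching once per weight lookup, roughly what my prototype does, versus dispatching once for the whole loop:

    #include <cstddef>
    #include <cstdio>
    #include <variant>
    #include <vector>

    using weight_variant = std::variant<std::vector<int>, std::vector<double>>;

    // per-lookup dispatch: a variant visit (plus a conversion to double)
    // inside the hot loop, once for every edge
    double sum_per_lookup(const weight_variant& w, std::size_t n_edges)
    {
        double acc = 0;
        for (std::size_t e = 0; e < n_edges; ++e)
            acc += std::visit([&](const auto& m) { return double(m[e]); }, w);
        return acc;
    }

    // joint instantiation: the visit happens once, and the whole loop body
    // is compiled against the concrete weight type
    double sum_joint(const weight_variant& w)
    {
        return std::visit([](const auto& m)
                          {
                              double acc = 0;
                              for (auto x : m)
                                  acc += x;
                              return acc;
                          },
                          w);
    }

    int main()
    {
        weight_variant w = std::vector<double>{1.0, 2.0, 3.0};
        std::printf("%g %g\n", sum_per_lookup(w, 3), sum_joint(w));
    }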