On 03.07.2017 12:16, François Kawala wrote:
I cProfiled old versus your last commit. The graph has 7 953 158 vertices, I sample 100 vertices and do shortest_distance from each one. I specify no target, and set the max_dist parameter. After each call to shortest_distance, I use the reached array to reset the predecessor map. In average each search reaches 120177 vertices.
* 2.22 : 9.101 seconds * commit 26404ef4 : 4.536 seconds * commit 26404ef4 + no dist map init : 4.141 seconds
Well, the last improvement is only about 9%... We're clearly seeing diminishing returns. As I had expected, numpy initialization is pretty fast.
About the third configuration (no dist map init), I removed the distance map initialization (graph_tool/topology/__init__.py L1779 <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1779>) and used the reached array to reset dist_map to infinity.
Did you do the same with the predecessor map as well?
We could have a do_initialization parameter, I think it would be more explicit, see my proposal <https://git.skewed.de/francois/graph-tool/commit/946826546a80f1f080681a4eaaa5fb2590b5abd4>.
I don't like this very much. The list of reached vertices might be useful even if the user is not interested in the no_init optimization, so I think it is better decoupled. Furthermore, you are ignoring the predecessor map, which will give a cryptic error if the user chooses do_initialization=False and forgets to pass pred_map, and will ignore the value passed by the user otherwise. -- Tiago de Paula Peixoto <tiago@skewed.de>