On 28.06.2017 15:00, François Kawala wrote:
I try to dig up an old idea that we've discussed previously (the thread is here <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>).
summary: when one runs shortest distance on a graph with several million vertices the distance initialization cost is high with respect to the actual Dijsktra cost. We want to circumvent the initialization [see L326 of src/graph/topology/graph_distance.cc <https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>]
The proposed approach (as per this boost thread <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is to use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance initialization is done "on the fly" by the `examine_edge` function. More precisely, the Dijkstra visitor maintains a set of discovered vertices. This set is read in the `examine_edge` function to set the distance to infinity when a the target vertex does not belong to the set of discovered vertices. The `examine_edge` function is also responsible to update the set of discovered vertices.
I've pushed a commit: https://git.skewed.de/count0/graph-tool/commit/bf174831 that replaces the search functions by no_init ones. The initialization is still done, but only once via a single numpy call in the Python side of things. The speed of the numpy call should be comparable to what we get with allocation, which is unavoidable. Please see if the performance improvements you observe with this is acceptable. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>