Hi, I stumbled on a bit of a performance bottleneck when trying bfs_search on a big graph. The problem is with initialize_vertex function, which is called for every vertex of the graph. Even if the function itself is empty, there's a bunch of work that's done for each vertex in VisitorWrapper.__getattr__, before that empty function is even called. Just doing: g = Graph() g.add_vertex(10000000) %timeit bfs_search(g, 0, BFSVisitor()) 1 loops, best of 3: 37.6 s per loop Now I have a graph with 350M vertices and some 550M edges. It takes 21 minutes of initialize_vertex calls that I don't need, before doing a few millisecond search :-/ Seems like my only option is to compile a custom version, with initialize_vertex disabled from all search functions, or implement BFS myself in python. Hopefully I'm doing something terribly wrong? What I'm trying to do, is find all vertices reachable from a given root vertex. Graph is undirected and not fully connected. I thought BFS would be a convenient way to do that, but maybe there's a better way to get that information? I'm not very familiar with graphs :-) - Ilkka
On 15.05.2015 09:11, Ilkka Rajala wrote:
I stumbled on a bit of a performance bottleneck when trying bfs_search on a big graph. The problem is with initialize_vertex function, which is called for every vertex of the graph. Even if the function itself is empty, there's a bunch of work that's done for each vertex in VisitorWrapper.__getattr__, before that empty function is even called.
Indeed. In bfs_search the visitor is a Python class, and hence can be very slow.
What I'm trying to do, is find all vertices reachable from a given root vertex. Graph is undirected and not fully connected. I thought BFS would be a convenient way to do that, but maybe there's a better way to get that information?
BFS is correct, but the bfs_search() function is intended to be used when one wants to do something special with the visitor class to implement some more elaborate algorithm. If you just want to obtain reachability, you can use the shortest_distance() function. Any vertex with a distance larger than N is not reachable... For example: dist = shortest_distance(g, source=g.vertex(10)) u = GraphView(g, vfilt=dist.a < g.num_vertices()) The graph view u contains only the vertices reachable by the source vertex 10. One can also use the label_out_component() function, which is algorithmically equivalent (and maybe even slightly faster): comp = label_out_component(g, g.vertex(10)) u = GraphView(g, vfilt=comp) Both of these approaches should be much faster than bfs_search(). (Note that both these functions use BFS internally, but without Python interference.) Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
On Fri, May 15, 2015 at 10:28 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
One can also use the label_out_component() function, which is algorithmically equivalent (and maybe even slightly faster):
comp = label_out_component(g, g.vertex(10)) u = GraphView(g, vfilt=comp)
Both of these approaches should be much faster than bfs_search(). (Note that both these functions use BFS internally, but without Python interference.)
Thank you. Component labeling is exactly what I was trying to do, but didn't know what to look for. What would be nice, is parallel version of label_components(), but I think I can live with this :-) - Ilkka
participants (2)
-
Ilkka Rajala -
Tiago de Paula Peixoto