Update: I realized that GraphView has an efilt as well, so I've added an edge_property to keep track of which edges were part of the original graph. Oddly enough, this didn't improve things as it seems like the bottle neck wasn't in edge removal (which is supposed to be O(1) with set_fast_edge_removal(True)), but rather with constructing the edge object using g.edge(src, tgt). I tried using gv.add_edge() in a loop and saving the result (instead of using add_edge_list). That made setting the edge properties at the end fast, but adding the edges one by one ends up being as slow as reconstructing the edge object. What seems to have worked is to instead add another edge property map to keep track of the new edges, and set that with add_edge_list, then copy it over to the property map that gets used to the filtering. With all of this in place this has pushed the graph "copying" down to 5% of the routine, which seems reasonable for now. If someone has other suggestions I'm all ears :) Thanks again, -z Here's the MWE using efilt implementation: ``` import graph_tool.all as gt import numpy as np g = gt.Graph(directed=False) g.add_vertex(10) original_edges = [[0, 1], [0, 4]] g.add_edge_list(original_edges) g.edge_properties['orig'] = g.new_edge_property('bool', val=True) g.edge_properties['oorig'] = g.new_edge_property('bool', val=True) g.set_fast_edge_removal(True) n_orig = g.num_vertices() def foo(g, m): n_act = g.num_vertices() n_extra = n_act - (n_orig+m) if n_extra < 0: print(f'adding {-n_extra} vertices ...') g.add_vertex(-n_extra) n_extra = 0 n_act = g.num_vertices() filt = np.ones(n_act) if n_extra > 0: filt[(-n_extra):] = 0 gv = gt.GraphView(g, vfilt=filt, efilt=g.edge_properties['orig']) #print(f'filt = {filt}') new_vtx_ids = np.arange(n, n+m) print(f'indices of new vertices = {new_vtx_ids}') #Check that all edges are expected for e in list(gv.edges()): if [e.source(), e.target()] not in original_edges: raise Exception(f'{e.source()} -> {e.target()} was not in original graph') #Add a bunch of random edges connecting to new candidates edge_list = [] for s, t in zip(np.random.choice(range(n_orig), m), np.random.choice(new_vtx_ids, m)): edge_list.append([s,t,False]) #Can't set this directly to 'orig' since we are filtering on it, and if it's false #the edge won't be added to this view! gv.add_edge_list(edge_list, eprops = [gv.edge_properties['oorig']]) print('Current edges:') for e in list(gv.edges()): print(f'{e.source()} -> {e.target()}') #computation on new graph goes here ... pass g.edge_properties['orig'] = g.edge_properties['oorig'] foo(g, 1) foo(g, 4) foo(g, 3) foo(g, 5) foo(g, 3) foo(g, 4) ``` On Tue, Jan 21, 2020 at 9:54 PM Zouhair <zoohair@gmail.com> wrote:
How are you doing steps 3.a and 3.e exactly?
Are you calling g.add_vertex() and g.remove_vertex() repeatedly?
Yes I am. I assumed that's not the best thing but compared to copying the graph, adding then removing the vertices was performing better.
I investigated a different approach whereas I allow additional vertices to exist as long as I keep track of the original number of vertices (n_orig): For a set of candidate vertices of size m - if graph size is not large enough, add new vertices - create a graph view that only keeps (n_orig + m) - add edges as needed (this relies on vertex properties) - do graph computation - when done, remove edges (but not vertices)
When finished with all candidate sets, remove the extra vertices.
Now I am only spending ~25% of the time in resizing the graph (around 20% in remove_edges, despite having set_fast_edge_removal(True)). That's still more than I'd like, but it's an improvement from the 80% I had with adding / removing vertices. If you have other suggestions on how to tackle this that'd be great (for example, I tried without remove_edges, but the edges that get added to the graphview end up showing up in the original graph. Is there a way to avoid that or to filter such that only the original edges are kept in the graphview?)
as a minimal working example: ``` import graph_tool.all as gt import numpy as np
g = gt.Graph(directed=False) g.add_vertex(10) original_edges = [[0, 1], [0, 4]] g.add_edge_list(original_edges)
g.set_fast_edge_removal(True) n_orig = g.num_vertices()
def foo(g, m, skipEdgeRemoval = False): n_act = g.num_vertices() n_extra = n_act - (n_orig+m)
if n_extra < 0: print(f'adding {-n_extra} vertices ...') g.add_vertex(-n_extra) n_extra = 0 n_act = g.num_vertices()
filt = np.ones(n_act) if n_extra > 0: filt[(-n_extra):] = 0
gv = gt.GraphView(g, filt) #print(f'filt = {filt}')
new_vtx_ids = np.arange(n, n+m) print(f'indices of new vertices = {new_vtx_ids}')
#Check that all edges are expected for e in list(gv.edges()): if [e.source(), e.target()] not in original_edges: raise Exception(f'{e.source()} -> {e.target()} was not in original graph')
#Add a bunch of random edges connecting to new candidates edge_list = [] for s, t in zip(np.random.choice(range(n_orig), m), np.random.choice(new_vtx_ids, m)): edge_list.append([s,t]) #also updates some vertex properties, but that's easy/fast to undo, not done in this MWE
gv.add_edge_list(edge_list)
print('Current edges:') for e in list(gv.edges()): print(f'{e.source()} -> {e.target()}')
#computation on new graph goes here ... pass
#print('Cleanup ...') #If cleanup is not done, this fails as the edges that were added to the graphview #end up being added to the original graph as well :s if not skipEdgeRemoval: for e in edge_list: gv.remove_edge(gv.edge(e[0], e[1]))
foo(g, 1) foo(g, 4) foo(g, 3) foo(g, 5) #above is all good foo(g, 3, True) #skip cleanup foo(g, 4) #fails ```
Message: 3 Date: Tue, 21 Jan 2020 15:01:53 +0100 From: Tiago de Paula Peixoto <tiago@skewed.de> To: Main discussion list for the graph-tool project <graph-tool@skewed.de> Subject: Re: [graph-tool] Speeding up graph copy? Message-ID: <1eb5aeab-0a3d-d3fa-84e3-60734afdd360@skewed.de> Content-Type: text/plain; charset="utf-8"
Am 20.01.20 um 19:43 schrieb Zouhair:
Hi,
I'm using graph-tool in my project, and my use case is as follows: 1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6 edges) 2) attach some properties to the vertices 3) for about ~5000 sets of *new* candidate vertices (each set has about 2-30 vertices) ? 3.a - add new vertices to graph ? 3.b - add edges as necessary between new vertices and old ones ? 3.c - update the properties ? 3.d - run an algorithm which depends on properties and edges, save result to a list ? 3.e - remove new vertices from graph and restore properties (i.e. undo 3.a-c)
Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing is 3.a and 3.e (about 80% of runtime is spent there)
iii- use GraphView but only filter on the original vertices in the graph: this avoids having to delete the vertices, but making GraphView with a filter function is still slow (doing `gt.GraphView(g, lambda v: v < n_nodes_orig)` takes 75% of runtime, and the number of vertices in the original graph keeps growing...
This can be improved by passing a boolean-valued vertex property map instead of a lambda function. In this way the GraphView creation becomes O(1) instead of O(N).
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>