Re: [graph-tool] Speeding up graph copy?

22 Jan 2020

      Update:
I realized that GraphView has an efilt as well, so I've added an
edge_property to keep track of which edges were part of the original graph.

Oddly enough, this didn't improve things as it seems like the bottle neck
wasn't in edge removal (which is supposed to be O(1) with
set_fast_edge_removal(True)), but rather with constructing the edge object
using g.edge(src, tgt).
I tried using gv.add_edge() in a loop and saving the result (instead of
using add_edge_list). That made setting the edge properties at the end
fast, but adding the edges one by one ends up being as slow as
reconstructing the edge object.
What seems to have worked is to instead add another edge property map to
keep track of the new edges, and set that with add_edge_list, then copy it
over to the property map that gets used to the filtering.

With all of this in place this has pushed the graph "copying" down to 5% of
the routine, which seems reasonable for now. If someone has other
suggestions I'm all ears :)

Thanks again,
-z

Here's the MWE using efilt implementation:
```
import graph_tool.all as gt
import numpy as np
g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)
g.edge_properties['orig'] = g.new_edge_property('bool', val=True)
g.edge_properties['oorig'] = g.new_edge_property('bool', val=True)

g.set_fast_edge_removal(True)
n_orig = g.num_vertices()

def foo(g, m):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig+m)

    if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()

    filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0

    gv = gt.GraphView(g, vfilt=filt, efilt=g.edge_properties['orig'])
    #print(f'filt = {filt}')

    new_vtx_ids = np.arange(n, n+m)
    print(f'indices of new vertices = {new_vtx_ids}')

    #Check that all edges are expected
    for e in list(gv.edges()):
        if [e.source(), e.target()] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in
original graph')

    #Add a bunch of random edges connecting to new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s,t,False])

    #Can't set this directly to 'orig' since we are filtering on it, and if
it's false
    #the edge won't be added to this view!
    gv.add_edge_list(edge_list, eprops = [gv.edge_properties['oorig']])

    print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')

    #computation on new graph goes here ...
    pass

    g.edge_properties['orig'] = g.edge_properties['oorig']

foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
foo(g, 3)
foo(g, 4)
```

On Tue, Jan 21, 2020 at 9:54 PM Zouhair <zoohair@gmail.com> wrote:
...
...
How are you doing steps 3.a and 3.e exactly?
Are you calling g.add_vertex() and g.remove_vertex() repeatedly?
Yes I am. I assumed that's not the best thing but compared to copying the
graph, adding then removing the vertices was performing better.
I investigated a different approach whereas I allow additional vertices to
exist as long as I keep track of the original number of vertices (n_orig):
For a set of candidate vertices of size m
- if graph size is not large enough, add new vertices
- create a graph view that only keeps (n_orig + m)
- add edges as needed (this relies on vertex properties)
- do graph computation
- when done, remove edges (but not vertices)
When finished with all candidate sets, remove the extra vertices.
Now I am only spending ~25% of the time in resizing the graph (around 20%
in remove_edges, despite having set_fast_edge_removal(True)). That's still
more than I'd like, but it's an improvement from the 80% I had with adding
/ removing vertices. If you have other suggestions on how to tackle this
that'd be great (for example, I tried without remove_edges, but the edges
that get added to the graphview end up showing up in the original graph. Is
there a way to avoid that or to filter such that only the original edges
are kept in the graphview?)
as a minimal working example:
```
import graph_tool.all as gt
import numpy as np
g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)
g.set_fast_edge_removal(True)
n_orig = g.num_vertices()
def foo(g, m, skipEdgeRemoval = False):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig+m)
if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()
filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0
gv = gt.GraphView(g, filt)
    #print(f'filt = {filt}')
new_vtx_ids = np.arange(n, n+m)
    print(f'indices of new vertices = {new_vtx_ids}')
#Check that all edges are expected
    for e in list(gv.edges()):
        if [e.source(), e.target()] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in
original graph')
#Add a bunch of random edges connecting to new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s,t])
        #also updates some vertex properties, but that's easy/fast to
undo, not done in this MWE
gv.add_edge_list(edge_list)
print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')
#computation on new graph goes here ...
    pass
#print('Cleanup ...')
    #If cleanup is not done, this fails as the edges that were added to
the graphview
    #end up being added to the original graph as well :s
    if not skipEdgeRemoval:
        for e in edge_list:
            gv.remove_edge(gv.edge(e[0], e[1]))
foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
#above is all good
foo(g, 3, True) #skip cleanup
foo(g, 4) #fails
```
...
Message: 3
Date: Tue, 21 Jan 2020 15:01:53 +0100
From: Tiago de Paula Peixoto <tiago@skewed.de>
To: Main discussion list for the graph-tool project
        <graph-tool@skewed.de>
Subject: Re: [graph-tool] Speeding up graph copy?
Message-ID: <1eb5aeab-0a3d-d3fa-84e3-60734afdd360@skewed.de>
Content-Type: text/plain; charset="utf-8"
Am 20.01.20 um 19:43 schrieb Zouhair:
...
Hi,
I'm using graph-tool in my project, and my use case is as follows:
1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
edges)
2) attach some properties to the vertices
3) for about ~5000 sets of *new* candidate vertices (each set has about
2-30 vertices)
? 3.a - add new vertices to graph
? 3.b - add edges as necessary between new vertices and old ones
? 3.c - update the properties
? 3.d - run an algorithm which depends on properties and edges, save
result to a list
? 3.e - remove new vertices from graph and restore properties (i.e. undo
3.a-c)
Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
is 3.a and 3.e (about 80% of runtime is spent there)
...
...
iii- use GraphView but only filter on the original vertices in the
graph: this avoids having to delete the vertices, but making GraphView
with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
< n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
original graph keeps growing...
This can be improved by passing a boolean-valued vertex property map
instead of a lambda function. In this way the GraphView creation becomes
O(1) instead of O(N).
Best,
Tiago
--
Tiago de Paula Peixoto <tiago@skewed.de>