[graph-tool] Fastest way to iterate through edges

6 Aug 2014

Hi all,

I haven't been on the list very long, but I see this question keeps coming
up, so I thought I'd put some numbers out so people know.  graph-tool is
super fast at the C++ work it does, and super convenient for picking
through the data at the Python level too, but it doesn't make Python fast.

So, here are some examples of working with a mask in Python.  Notice that
NumPy is not fast just because it is NumPy.  I chose an Nx3 array because
NumPy masking is extra slow with shapes other than Nx1.

================

import numpy as np
a_generator = ((x,x+2,x*x) for x in range(1000))
a_list =[(x,x+2,x*x) for x in range(1000)]
a_array = np.array(a_list)
mask = np.ones(len(a_array), dtype=bool)  # plain bool works on every NumPy version
mask[::3] = False

def tupled():
    for x,b in zip(a_generator,mask):
        if b:
            c,d,e =x

def looped():
    for x,b in zip(a_list,mask):
        if b:
            c,d,e =x

def masked():
    for x in a_array[mask]:
        c,d,e = x

#IPython magic function: timeit

%timeit tupled()
1000000 loops, best of 3: 510 ns per loop

%timeit looped()
10000 loops, best of 3: 76.6 µs per loop

%timeit masked()
1000 loops, best of 3: 445 µs per loop

================

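Notice the nanoseconds vs. microseconds.  The generator looks like the clear
winner, but that number is misleading: a_generator is created once and is
used up by the first call to tupled(), so every later call that timeit makes
loops over an empty generator.  The 510 ns is mostly call overhead, not the
cost of 1000 iterations, and looped() is the fastest version here that
actually does the work.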
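
One thing worth checking when timing a generator is how many items each
timed call actually sees.  The sketch below reuses the same tuples as the
example above; count_shared and count_fresh are hypothetical helper names,
not part of the original benchmark.  It only counts items, it does not time
anything.

```python
a_list = [(x, x + 2, x * x) for x in range(1000)]
a_generator = ((x, x + 2, x * x) for x in range(1000))  # built once, at module level


def count_shared():
    # Iterates the module-level generator; only the first call sees any items.
    return sum(1 for _ in a_generator)


def count_fresh():
    # Builds a new generator on every call, so every call sees all 1000 items.
    return sum(1 for _ in (t for t in a_list))


shared_counts = [count_shared() for _ in range(3)]
fresh_counts = [count_fresh() for _ in range(3)]

print(shared_counts)  # [1000, 0, 0]
print(fresh_counts)   # [1000, 1000, 1000]
```

Since timeit calls the function many times, a generator built outside the
function contributes real work to the first call only.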

Now, here are some graph-tool-specific examples.  I left out the mask, but
clearly you can create and manipulate a mask as you see fit.

=================
...
...
...
graph
    <Graph object, directed, reversed, with 32183 vertices and 199381 edges at 0x7f62842ee9d0>
...
...
...
def graph_loop0():
    for i in range(10):  #Only 10 times because this is soooo slooow
        for e in graph.edges():
            v1,v2 = e.source(),e.target()
...
...
...
def graph_loop1():
    edges = [[e.source(),e.target()] for e in graph.edges()]
    for i in range(1000):
        for e in edges:
            v1,v2 = e[0],e[1]
...
...
...
def graph_loop2():
    edges = [[e.source(),e.target()] for e in graph.edges()]
    gen = (e for e in edges)
    for x in range(1000):
        for e in gen:
            v1,v2 = e[0],e[1]
...
...
...
from time import time
...
...
...
a=time(); graph_loop0(); print(time()-a)
23.1095559597
...
...
...
a=time(); graph_loop1(); print(time()-a)
15.721350193
...
...
...
a=time(); graph_loop2(); print(time()-a)
2.80044198036
=======================

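loop1 and loop2 each make 100x as many passes as loop0.  Per pass, looping
over the plain list (loop1) comes out roughly 150x faster than looping over
graph.edges() (about 2.3 s vs. about 0.016 s).  The loop2 number needs care,
though: gen is created once and is exhausted after the first pass, so the
other 999 passes do nothing, and the 2.8 s is mostly the cost of building the
edge list.  It does not show the generator to be 1000x faster.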
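
To compare the loops fairly, the totals have to be divided by the number of
passes each one made.  A small sketch of that arithmetic, using only the
totals printed above; graph_loop2 is left out because its generator is
consumed during the first pass, so its total does not cover 1000 real
passes.  Note that the graph_loop1 total also includes building the edge
list once.

```python
# Totals reported above (seconds) and the number of passes each loop made.
reported = {
    "graph_loop0": (23.1095559597, 10),
    "graph_loop1": (15.721350193, 1000),
}

# Seconds spent per pass over all edges.
per_pass = {name: total / passes for name, (total, passes) in reported.items()}

# How much faster one pass over the plain list is than one pass over graph.edges().
speedup = per_pass["graph_loop0"] / per_pass["graph_loop1"]

for name in sorted(per_pass):
    print("%s: %.4f s per pass" % (name, per_pass[name]))
print("list vs graph.edges(): about %.0fx faster per pass" % speedup)
```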

If you're just looping through once, it makes sense to use the convenience
graph_tool provides.  But if you are implementing a graph algorithm in
Python, just grab what you need from the graph_tool graph into a list or
whatever Python object makes sense for what you're doing.
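
A minimal sketch of that advice.  It assumes only what the loops above use:
a graph with edges(), and edges with source() and target().  edge_pairs is a
hypothetical helper name, and _FakeGraph and _FakeEdge are stand-ins so the
snippet runs without graph-tool installed; with a real graph you would pass
the graph_tool graph instead.  Converting with int() assumes the vertex
descriptors convert to their integer index.

```python
def edge_pairs(graph):
    """Copy every edge into a plain list of (source, target) int tuples."""
    return [(int(e.source()), int(e.target())) for e in graph.edges()]


class _FakeEdge(object):
    """Stand-in for an edge descriptor (illustration only)."""

    def __init__(self, s, t):
        self._s, self._t = s, t

    def source(self):
        return self._s

    def target(self):
        return self._t


class _FakeGraph(object):
    """Stand-in for a graph object (illustration only)."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def edges(self):
        return (_FakeEdge(s, t) for s, t in self._pairs)


graph = _FakeGraph([(0, 1), (1, 2), (2, 0)])

# Pay the cost of walking the graph object once...
edges = edge_pairs(graph)

# ...then make as many passes as the algorithm needs over the plain list.
for _ in range(3):
    out_degree = {}
    for v1, v2 in edges:
        out_degree[v1] = out_degree.get(v1, 0) + 1

print(edges)
print(out_degree)
```

A list is used here rather than a generator because a list can be iterated
any number of times.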

-Elliot