Hi all,

I haven't been on the list very long, but I see this question keeps coming up, so I thought I'd put some numbers out so people know.

graph-tool is super fast at the C++ work it does, and super convenient for picking through the data at the Python level too, but it doesn't make Python fast.

So, here are some examples of working with a mask in Python. Notice that NumPy is not fast just because it is NumPy. I chose an Nx3 array because NumPy masking is extra slow with shapes other than Nx1.

================
import numpy as np

# NOTE: a generator can be consumed only once
a_generator = ((x, x+2, x*x) for x in range(1000))
a_list = [(x, x+2, x*x) for x in range(1000)]
a_array = np.array(a_list)

# plain bool works across NumPy versions (np.bool was removed in NumPy 1.24)
mask = np.ones(len(a_array), dtype=bool)
mask[::3] = False

def tupled():
    for x, b in zip(a_generator, mask):
        if b:
            c, d, e = x

def looped():
    for x, b in zip(a_list, mask):
        if b:
            c, d, e = x

def masked():
    for x in a_array[mask]:
        c, d, e = x

# IPython magic function: timeit
%timeit tupled()
1000000 loops, best of 3: 510 ns per loop

%timeit looped()
10000 loops, best of 3: 76.6 µs per loop

%timeit masked()
1000 loops, best of 3: 445 µs per loop
================

Notice the nanoseconds vs. microseconds, but don't read too much into the tupled() number: a_generator can only be consumed once, so after the very first call it is exhausted and timeit is mostly timing an empty loop. The fair comparison is looped() vs. masked(), where the plain list is about 6x faster than the NumPy mask.

Now, here are some graph-tool specific examples. I left out the mask, but clearly you can create and manipulate a mask as you see fit.

=================
graph
<Graph object, directed, reversed, with 32183 vertices and 199381 edges at 0x7f62842ee9d0>

def graph_loop0():
    # Only 10 passes because iterating graph.edges() directly is very slow
    for i in range(10):
        for e in graph.edges():
            v1, v2 = e.source(), e.target()

def graph_loop1():
    # Copy the endpoints into a plain list once, then loop over the list
    edges = [[e.source(), e.target()] for e in graph.edges()]
    for i in range(1000):
        for e in edges:
            v1, v2 = e[0], e[1]

def graph_loop2():
    edges = [[e.source(), e.target()] for e in graph.edges()]
    gen = (e for e in edges)
    # NOTE: gen is exhausted after the first pass, so the inner loop
    # only does work when x == 0
    for x in range(1000):
        for e in gen:
            v1, v2 = e[0], e[1]

from time import time

a = time(); graph_loop0(); print(time() - a)
23.1095559597

a = time(); graph_loop1(); print(time() - a)
15.721350193

a = time(); graph_loop2(); print(time() - a)
2.80044198036
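=======================

graph_loop1 does 100x more passes than graph_loop0 in about two-thirds of the time, so iterating over the plain list is roughly 150x faster per pass, even counting the time to build the list. graph_loop2 looks faster still, but its generator is exhausted after the first pass, so its time is essentially one trip through graph.edges() to build the list.

If you're just looping through once, it makes sense to use the convenience graph_tool provides. But if you are implementing a graph algorithm, just grab what you need from the graph_tool graph into a list or whatever Python object makes sense for what you're doing.

-Elliot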
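
P.S. One thing to watch when timing generators like a_generator or gen above: a generator can be consumed only once, so every later pass over the same generator object does no work. Here is a minimal sketch of that, separate from the timings above. The helper names make_rows and count_rows are made up for illustration, and I'm assuming Python 3 for the print calls.

```python
import timeit

def make_rows(n=1000):
    """Return a fresh generator of (x, x+2, x*x) tuples."""
    return ((x, x + 2, x * x) for x in range(n))

def count_rows(rows):
    """Unpack every 3-tuple in rows and return how many there were."""
    count = 0
    for c, d, e in rows:
        count += 1
    return count

gen = make_rows()
first_pass = count_rows(gen)     # consumes the generator
second_pass = count_rows(gen)    # generator is exhausted: zero iterations

rows_list = list(make_rows())
list_first = count_rows(rows_list)
list_second = count_rows(rows_list)   # a list can be iterated again

print(first_pass, second_pass, list_first, list_second)

# For a fair timing, build a fresh generator inside the timed call
fresh = timeit.timeit(lambda: count_rows(make_rows()), number=100)
reused = timeit.timeit(lambda: count_rows(gen), number=100)
print("fresh generator: %.6f s, exhausted generator: %.6f s" % (fresh, reused))
```

The first print shows that the second pass over the same generator sees nothing, while the list gives the same count both times. If you need many passes, a list (or rebuilding the generator each time) is the thing to time.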