On 03/20/2014 11:30 AM, Pietro Battiston wrote:
Hello,
I tried graph-tool after some experience with networkx, in search for better efficiency (attracted by [1]) ... but unfortunately, it seems like it all depends on how data is stored!
This is true. If your code essentially depends on python loops, graph-tool will not provide any advantage.
The (several) networks I play with are dense weighted networks of 1435 nodes. They are defined in terms of their adjacency matrices, which are pandas dataframes stored in a HDF5 store.
In order to get the relative network with networkx, I would do dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(), create_using=nx.DiGraph())
which took around 18 seconds. In graph-tool, the best alternative I found, based on [2] and [3], is:
# Since the bottleneck is given by the calls to g.add_edge(), # numba actually slows down things a bit. #from numba import autojit # #@autojit def graph_from_adj_matrix(m): g = Graph()
n = m.shape[0]
edges = list(g.add_vertex(n=n))
c = Counter(every=10, until=n) for i in xrange(n): for j in xrange(n): if m[i,j]: g.add_edge(edges[i], edges[j]) c.count()
return g
g = graph_from_adj_matrix(hstore['my_df'].as_matrix())
... which takes around 200 seconds (and is still not even weighted!).
Is there anything trivial (or simply new) I'm missing (apart from the possibility, mentioned in [3], of writing my code in C++)?
The ability of inserting many edges without calling add_edge() several times is indeed lacking... Because of this, I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply: g.add_edge_list(transpose(nonzero(a))) This should be much faster than the Python loop above. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>