Building graph from adjacency matrix
Hello, I tried graph-tool after some experience with networkx, in search for better efficiency (attracted by [1]) ... but unfortunately, it seems like it all depends on how data is stored! The (several) networks I play with are dense weighted networks of 1435 nodes. They are defined in terms of their adjacency matrices, which are pandas dataframes stored in a HDF5 store. In order to get the relative network with networkx, I would do dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(), create_using=nx.DiGraph()) which took around 18 seconds. In graph-tool, the best alternative I found, based on [2] and [3], is: # Since the bottleneck is given by the calls to g.add_edge(), # numba actually slows down things a bit. #from numba import autojit # #@autojit def graph_from_adj_matrix(m): g = Graph() n = m.shape[0] edges = list(g.add_vertex(n=n)) c = Counter(every=10, until=n) for i in xrange(n): for j in xrange(n): if m[i,j]: g.add_edge(edges[i], edges[j]) c.count() return g g = graph_from_adj_matrix(hstore['my_df'].as_matrix()) ... which takes around 200 seconds (and is still not even weighted!). Is there anything trivial (or simply new) I'm missing (apart from the possibility, mentioned in [3], of writing my code in C++)? [1] : http://graph-tool.skewed.de/performance [2] : http://lists.skewed.de/pipermail/graph-tool/2011-September/000438.html [3] : http://graph-tool.skewed.de/trac/ticket/106 Thanks, Pietro
On 03/20/2014 11:30 AM, Pietro Battiston wrote:
Hello,
I tried graph-tool after some experience with networkx, in search for better efficiency (attracted by [1]) ... but unfortunately, it seems like it all depends on how data is stored!
This is true. If your code essentially depends on python loops, graph-tool will not provide any advantage.
The (several) networks I play with are dense weighted networks of 1435 nodes. They are defined in terms of their adjacency matrices, which are pandas dataframes stored in a HDF5 store.
In order to get the relative network with networkx, I would do dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(), create_using=nx.DiGraph())
which took around 18 seconds. In graph-tool, the best alternative I found, based on [2] and [3], is:
# Since the bottleneck is given by the calls to g.add_edge(), # numba actually slows down things a bit. #from numba import autojit # #@autojit def graph_from_adj_matrix(m): g = Graph()
n = m.shape[0]
edges = list(g.add_vertex(n=n))
c = Counter(every=10, until=n) for i in xrange(n): for j in xrange(n): if m[i,j]: g.add_edge(edges[i], edges[j]) c.count()
return g
g = graph_from_adj_matrix(hstore['my_df'].as_matrix())
... which takes around 200 seconds (and is still not even weighted!).
Is there anything trivial (or simply new) I'm missing (apart from the possibility, mentioned in [3], of writing my code in C++)?
The ability of inserting many edges without calling add_edge() several times is indeed lacking... Because of this, I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply: g.add_edge_list(transpose(nonzero(a))) This should be much faster than the Python loop above. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:
[...] I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:
g.add_edge_list(transpose(nonzero(a)))
This should be much faster than the Python loop above.
This is really great, I will try the git version as soon as possible. Thank you very much. Pietro
Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:
[...]I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:
g.add_edge_list(transpose(nonzero(a)))
This should be much faster than the Python loop above.
Do you think it would make sense then to have something analogous for PropertyMaps? Like: pm.set_from_list(a) in place of: for v in g.vertices(): pm[v] = a[i] if pm is a property for vertices, and for e in g.edges(): pm[e] = a[i] if it is for edges? Pietro
That's all ready simpler than that, as you can assign values of the propertymap like that: pm.a = a Le 20/03/2014 18:25, Pietro Battiston a écrit :
Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:
[...]I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:
g.add_edge_list(transpose(nonzero(a)))
This should be much faster than the Python loop above.
Do you think it would make sense then to have something analogous for PropertyMaps?
Like:
pm.set_from_list(a)
in place of:
for v in g.vertices(): pm[v] = a[i]
if pm is a property for vertices, and
for e in g.edges(): pm[e] = a[i]
if it is for edges?
Pietro
_______________________________________________ graph-tool mailing list graph-tool@skewed.de http://lists.skewed.de/mailman/listinfo/graph-tool
Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:
[...] I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:
g.add_edge_list(transpose(nonzero(a)))
This should be much faster than the Python loop above.
I think there is some problem with add_edge_list. The following code: --------------------------------------------------------- from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g) --------------------------------------------------------- gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories. By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it! Cheers, Pietro
On 04/03/2014 10:44 AM, Pietro Battiston wrote:
I think there is some problem with add_edge_list.
The following code:
---------------------------------------------------------
from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw
def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g)
---------------------------------------------------------
gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories.
This is a bug. The strides of the numpy array are not propagated properly to the C++ side of things. I have fixed this now in the git version.
By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it!
The only grammatically correct variations I see are "add edge list" or "add list of edges". The phrase "add edges list" does not seem right. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Il giorno gio, 03/04/2014 alle 20.35 +0200, Tiago de Paula Peixoto ha scritto:
On 04/03/2014 10:44 AM, Pietro Battiston wrote:
I think there is some problem with add_edge_list.
The following code:
---------------------------------------------------------
from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw
def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g)
---------------------------------------------------------
gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories.
This is a bug. The strides of the numpy array are not propagated properly to the C++ side of things. I have fixed this now in the git version.
Great!
By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it!
The only grammatically correct variations I see are "add edge list" or "add list of edges". The phrase "add edges list" does not seem right.
It seems that you are perfectly right. Pietro
participants (3)
-
Guillaume Gay -
Pietro Battiston -
Tiago de Paula Peixoto