Building graph from adjacency matrix

newer
How can I generate an Erdos-Renyi...

Pietro Battiston

20 Mar 2014 20 Mar '14

10:30 a.m.

Hello, I tried graph-tool after some experience with networkx, in search for better efficiency (attracted by [1]) ... but unfortunately, it seems like it all depends on how data is stored! The (several) networks I play with are dense weighted networks of 1435 nodes. They are defined in terms of their adjacency matrices, which are pandas dataframes stored in a HDF5 store. In order to get the relative network with networkx, I would do dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(), create_using=nx.DiGraph()) which took around 18 seconds. In graph-tool, the best alternative I found, based on [2] and [3], is: # Since the bottleneck is given by the calls to g.add_edge(), # numba actually slows down things a bit. #from numba import autojit # #@autojit def graph_from_adj_matrix(m): g = Graph() n = m.shape[0] edges = list(g.add_vertex(n=n)) c = Counter(every=10, until=n) for i in xrange(n): for j in xrange(n): if m[i,j]: g.add_edge(edges[i], edges[j]) c.count() return g g = graph_from_adj_matrix(hstore['my_df'].as_matrix()) ... which takes around 200 seconds (and is still not even weighted!). Is there anything trivial (or simply new) I'm missing (apart from the possibility, mentioned in [3], of writing my code in C++)? [1] : http://graph-tool.skewed.de/performance [2] : http://lists.skewed.de/pipermail/graph-tool/2011-September/000438.html [3] : http://graph-tool.skewed.de/trac/ticket/106 Thanks, Pietro

Show replies by date

Tiago de Paula Peixoto

20 Mar 20 Mar

11:52 a.m.

On 03/20/2014 11:30 AM, Pietro Battiston wrote:

...

Hello,

I tried graph-tool after some experience with networkx, in search for better efficiency (attracted by [1]) ... but unfortunately, it seems like it all depends on how data is stored!

This is true. If your code essentially depends on python loops, graph-tool will not provide any advantage.

...

The (several) networks I play with are dense weighted networks of 1435 nodes. They are defined in terms of their adjacency matrices, which are pandas dataframes stored in a HDF5 store.

In order to get the relative network with networkx, I would do dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(), create_using=nx.DiGraph())

which took around 18 seconds. In graph-tool, the best alternative I found, based on [2] and [3], is:

# Since the bottleneck is given by the calls to g.add_edge(), # numba actually slows down things a bit. #from numba import autojit # #@autojit def graph_from_adj_matrix(m): g = Graph()

n = m.shape[0]

edges = list(g.add_vertex(n=n))

c = Counter(every=10, until=n) for i in xrange(n): for j in xrange(n): if m[i,j]: g.add_edge(edges[i], edges[j]) c.count()

return g

g = graph_from_adj_matrix(hstore['my_df'].as_matrix())

... which takes around 200 seconds (and is still not even weighted!).

Is there anything trivial (or simply new) I'm missing (apart from the possibility, mentioned in [3], of writing my code in C++)?

The ability of inserting many edges without calling add_edge() several times is indeed lacking... Because of this, I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply: g.add_edge_list(transpose(nonzero(a))) This should be much faster than the Python loop above. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>

Pietro Battiston

12:10 p.m.

Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:

...

[...] I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:

g.add_edge_list(transpose(nonzero(a)))

This should be much faster than the Python loop above.

This is really great, I will try the git version as soon as possible. Thank you very much. Pietro

Pietro Battiston

5:25 p.m.

Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:

...

[...]I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:

g.add_edge_list(transpose(nonzero(a)))

This should be much faster than the Python loop above.

Do you think it would make sense then to have something analogous for PropertyMaps? Like: pm.set_from_list(a) in place of: for v in g.vertices(): pm[v] = a[i] if pm is a property for vertices, and for e in g.edges(): pm[e] = a[i] if it is for edges? Pietro

Guillaume Gay

5:29 p.m.

That's all ready simpler than that, as you can assign values of the propertymap like that: pm.a = a Le 20/03/2014 18:25, Pietro Battiston a écrit :

...

Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:

...
[...]I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:

g.add_edge_list(transpose(nonzero(a)))

This should be much faster than the Python loop above.

Do you think it would make sense then to have something analogous for PropertyMaps?

Like:

pm.set_from_list(a)

in place of:

for v in g.vertices(): pm[v] = a[i]

if pm is a property for vertices, and

for e in g.edges(): pm[e] = a[i]

if it is for edges?

Pietro

_______________________________________________ graph-tool mailing list graph-tool@skewed.de http://lists.skewed.de/mailman/listinfo/graph-tool

Pietro Battiston

9:07 p.m.

Il giorno gio, 20/03/2014 alle 18.29 +0100, Guillaume Gay ha scritto:

...

That's all ready simpler than that, as you can assign values of the propertymap like that:

pm.a = a

Great! Thanks Pietro

Pietro Battiston

3 Apr 3 Apr

8:44 a.m.

Il giorno gio, 20/03/2014 alle 12.52 +0100, Tiago de Paula Peixoto ha scritto:

...

[...] I have now just added a Graph.add_edge_list() to the git version, which takes a list of edges to be added, which can be a numpy array. If you have a full adjacency matrix instead of an edge list, you can do simply:

g.add_edge_list(transpose(nonzero(a)))

This should be much faster than the Python loop above.

I think there is some problem with add_edge_list. The following code: --------------------------------------------------------- from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g) --------------------------------------------------------- gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories. By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it! Cheers, Pietro

Tiago de Paula Peixoto

6:35 p.m.

On 04/03/2014 10:44 AM, Pietro Battiston wrote:

...

I think there is some problem with add_edge_list.

The following code:

---------------------------------------------------------

from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw

def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g)

---------------------------------------------------------

gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories.

This is a bug. The strides of the numpy array are not propagated properly to the C++ side of things. I have fixed this now in the git version.

...

By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it!

The only grammatically correct variations I see are "add edge list" or "add list of edges". The phrase "add edges list" does not seem right. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>

Pietro Battiston

5 Apr 5 Apr

6:49 a.m.

Il giorno gio, 03/04/2014 alle 20.35 +0200, Tiago de Paula Peixoto ha scritto:

...

On 04/03/2014 10:44 AM, Pietro Battiston wrote:

...
I think there is some problem with add_edge_list.

The following code:

---------------------------------------------------------

from scipy import array from graph_tool import Graph from graph_tool.draw import graph_draw

def blocks_star(transpose=False): g = Graph() members = array([0,1,2]) edges_list = array([[0,1,2], [2,0,1]]).transpose() edges_list_2 = array([[0,2],[1,0],[2,1]]) print edges_list print edges_list_2 print all(edges_list == edges_list_2) if transpose: g.add_edge_list(edges_list) else: g.add_edge_list(edges_list_2) graph_draw(g)

---------------------------------------------------------

gives me a different result if called with True or False as argument (with True, the resulting graph is disconnected), while the edges lists are clearly identical. I guess this has to do with the fact that transposing, for numpy, is just a change of view, not a relocation of elements... but I see your code is based on boost, which I have no experience with. I am using package python-graph-tool 2.2.31-1 from your Debian sid repositories.

This is a bug. The strides of the numpy array are not propagated properly to the C++ side of things. I have fixed this now in the git version.

Great!

...

...
By the way: why "add_edge_list" rather than "add_edges_list"? True, "add_vertex" can add multiple vertices, but then there is "clear_edges". Not really crucial issue, but... later it would be too late to raise it!

The only grammatically correct variations I see are "add edge list" or "add list of edges". The phrase "add edges list" does not seem right.

It seems that you are perfectly right. Pietro

4274

Age (days ago)

4290

Last active (days ago)

List overview

Download

8 comments

3 participants

participants (3)

Guillaume Gay
Pietro Battiston
Tiago de Paula Peixoto