Extracting vertex subgraphs (based on property map)?

Uldis Bojars

24 Jan 2013 24 Jan '13

8:55 p.m.

Hi, Could you suggest what is the most efficient way for: 1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components 2) saving extracted subgraphs to a file? Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way? Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used. [1] https://gist.github.com/4627494 - function save_all_graphs is called to save all the subgraphs Many thanks, Uldis

Show replies by date

Ronnie Ghose

24 Jan 24 Jan

9:50 p.m.

New subject: Extracting vertex subgraphs (based on property map)?

can you release the data / a part of it, let me see if i can optimize it. I personally don't know of anyway besides property map iteration .-. On Thu, Jan 24, 2013 at 3:55 PM, Uldis Bojars <captsolo@gmail.com> wrote:

...

Hi,

Could you suggest what is the most efficient way for:

1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components

2) saving extracted subgraphs to a file?

Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way?

Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used.

[1] https://gist.github.com/4627494 - function save_all_graphs is called to save all the subgraphs

Many thanks, Uldis _______________________________________________ graph-tool mailing list graph-tool@skewed.de http://lists.skewed.de/mailman/listinfo/graph-tool

Tiago de Paula Peixoto

10:56 p.m.

New subject: Extracting vertex subgraphs (based on property map)?

On 01/24/2013 09:55 PM, Uldis Bojars wrote:

...

Hi,

Could you suggest what is the most efficient way for:

1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components

You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing: gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num) you could do: gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num) The second option should be significantly faster for larger graphs.

...

2) saving extracted subgraphs to a file?

Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way?

The problem with this approach is that since you have a number C of clusters, and the creation / iteration of each filtered graph is O(N + E), where N and E correspond to the size of the unfiltered graph, the whole thing becomes O(C(N + E)), which can be very slow if C is in the order of tens of thousands, as is your case. In this case the best approach would be not to create filtered graphs, but rather, to create your cluster graphs individually by hand, i.e. clusters = defaultdict(list) for v in g.vertices(): clusters[v_prop[v]].append(v) for i, (c, vs) in enumerate(clusters.items()): cg = Graph() vmap = defaultdict(lambda: cg.add_vertex()) for v in vs: u = vmap[v] for w in v.out_neighbours(): cg.add_edge(u, vmap[w]) cg.save("cluster-%d.xml.gz" % i) This approach is O(N + E), which would be much faster than the previous one if C is sufficiently large, even though it is pure python.

...

Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used.

This returns a property map which can be used to mask a subgraph, if it is represented as a graph object. I does in essence the opposite of what you want to achieve. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>

Uldis Bojars

6 Feb 6 Feb

8:40 p.m.

New subject: Extracting vertex subgraphs (based on property map)?

On Fri, Jan 25, 2013 at 12:56 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:

...

...
1) extracting vertex clusters based on values in vertex property map?

You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing:

gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num)

you could do:

gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num)

The second option should be significantly faster for larger graphs.

Awesome! Thanks a lot, this solved the problem. Where could I find more information about the v_prop.a == cl_num part? It might be useful in other cases of working with graph-tool, but first need to understand it. Is it a part of NumPy? Cheers, Uldis

Tiago de Paula Peixoto

9:24 p.m.

New subject: Extracting vertex subgraphs (based on property map)?

On 02/06/2013 09:40 PM, Uldis Bojars wrote:

...

On Fri, Jan 25, 2013 at 12:56 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:

...
...
1) extracting vertex clusters based on values in vertex property map?

You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing:

gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num)

you could do:

gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num)

The second option should be significantly faster for larger graphs.

Awesome! Thanks a lot, this solved the problem.

Where could I find more information about the v_prop.a == cl_num part? It might be useful in other cases of working with graph-tool, but first need to understand it. Is it a part of NumPy?

Yes, v_prop.a returns a view of the property map as a Numpy array. Numpy supports so-called "array expressions", which translates to simple C loops. So when you do v_prop.a == cl_num, this returns an array of boolean values, which is computed in a tight C loop. This array is used as-is by the GraphView constructor, which uses it as a filter. Thus, no python loop is ever performed. In the first version, however, a function is passed to the constructor, which is called on each vertex to build a filter, resulting in a much slower performance. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>

4697

Age (days ago)

4710

Last active (days ago)

List overview

Download

4 comments

3 participants

participants (3)

Ronnie Ghose
Tiago de Paula Peixoto
Uldis Bojars