Extracting vertex subgraphs (based on property map)?
Hi, Could you suggest what is the most efficient way for: 1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components 2) saving extracted subgraphs to a file? Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way? Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used. [1] https://gist.github.com/4627494 - function save_all_graphs is called to save all the subgraphs Many thanks, Uldis
can you release the data / a part of it, let me see if i can optimize it. I personally don't know of anyway besides property map iteration .-. On Thu, Jan 24, 2013 at 3:55 PM, Uldis Bojars <captsolo@gmail.com> wrote:
Hi,
Could you suggest what is the most efficient way for:
1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components
2) saving extracted subgraphs to a file?
Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way?
Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used.
[1] https://gist.github.com/4627494 - function save_all_graphs is called to save all the subgraphs
Many thanks, Uldis _______________________________________________ graph-tool mailing list graph-tool@skewed.de http://lists.skewed.de/mailman/listinfo/graph-tool
On 01/24/2013 09:55 PM, Uldis Bojars wrote:
Hi,
Could you suggest what is the most efficient way for:
1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components
You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing: gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num) you could do: gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num) The second option should be significantly faster for larger graphs.
2) saving extracted subgraphs to a file?
Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way?
The problem with this approach is that since you have a number C of clusters, and the creation / iteration of each filtered graph is O(N + E), where N and E correspond to the size of the unfiltered graph, the whole thing becomes O(C(N + E)), which can be very slow if C is in the order of tens of thousands, as is your case. In this case the best approach would be not to create filtered graphs, but rather, to create your cluster graphs individually by hand, i.e. clusters = defaultdict(list) for v in g.vertices(): clusters[v_prop[v]].append(v) for i, (c, vs) in enumerate(clusters.items()): cg = Graph() vmap = defaultdict(lambda: cg.add_vertex()) for v in vs: u = vmap[v] for w in v.out_neighbours(): cg.add_edge(u, vmap[w]) cg.save("cluster-%d.xml.gz" % i) This approach is O(N + E), which would be much faster than the previous one if C is sufficiently large, even though it is pure python.
Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used.
This returns a property map which can be used to mask a subgraph, if it is represented as a graph object. I does in essence the opposite of what you want to achieve. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
On Fri, Jan 25, 2013 at 12:56 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
1) extracting vertex clusters based on values in vertex property map?
You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing:
gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num)
you could do:
gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num)
The second option should be significantly faster for larger graphs.
Awesome! Thanks a lot, this solved the problem. Where could I find more information about the v_prop.a == cl_num part? It might be useful in other cases of working with graph-tool, but first need to understand it. Is it a part of NumPy? Cheers, Uldis
On 02/06/2013 09:40 PM, Uldis Bojars wrote:
On Fri, Jan 25, 2013 at 12:56 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
1) extracting vertex clusters based on values in vertex property map?
You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing:
gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num)
you could do:
gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num)
The second option should be significantly faster for larger graphs.
Awesome! Thanks a lot, this solved the problem.
Where could I find more information about the v_prop.a == cl_num part? It might be useful in other cases of working with graph-tool, but first need to understand it. Is it a part of NumPy?
Yes, v_prop.a returns a view of the property map as a Numpy array. Numpy supports so-called "array expressions", which translates to simple C loops. So when you do v_prop.a == cl_num, this returns an array of boolean values, which is computed in a tight C loop. This array is used as-is by the GraphView constructor, which uses it as a filter. Thus, no python loop is ever performed. In the first version, however, a function is passed to the constructor, which is called on each vertex to build a filter, resulting in a much slower performance. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (3)
-
Ronnie Ghose -
Tiago de Paula Peixoto -
Uldis Bojars