On 01/24/2013 09:55 PM, Uldis Bojars wrote:
Hi,
Could you suggest what is the most efficient way for:
1) extracting vertex clusters based on values in vertex property map? - e.g., for iterating through all connected components in a graph, based on result of label_components
You can improve the performance by avoiding repeated python function calls for each vertex. One way to do this is to use array expressions. For instance, instead of doing: gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num) you could do: gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num) The second option should be significantly faster for larger graphs.
2) saving extracted subgraphs to a file?
Code that I am using is in [1] but for a large graph (~1M edges, 12k clusters) it is taking days. Is there a more optimal way?
The problem with this approach is that since you have a number C of clusters, and the creation / iteration of each filtered graph is O(N + E), where N and E correspond to the size of the unfiltered graph, the whole thing becomes O(C(N + E)), which can be very slow if C is in the order of tens of thousands, as is your case. In this case the best approach would be not to create filtered graphs, but rather, to create your cluster graphs individually by hand, i.e. clusters = defaultdict(list) for v in g.vertices(): clusters[v_prop[v]].append(v) for i, (c, vs) in enumerate(clusters.items()): cg = Graph() vmap = defaultdict(lambda: cg.add_vertex()) for v in vs: u = vmap[v] for w in v.out_neighbours(): cg.add_edge(u, vmap[w]) cg.save("cluster-%d.xml.gz" % i) This approach is O(N + E), which would be much faster than the previous one if C is sufficiently large, even though it is pure python.
Can the function topology.mark_subgraph be of help here? - the name seems relevant but I do not know how it is used.
This returns a property map which can be used to mask a subgraph, if it is represented as a graph object. I does in essence the opposite of what you want to achieve. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>