Hello, I'm using a cvsreader to parse a very big file and create an undirected graph accordingly. The file can contain duplicated edges (i.e. A B in one row, B A in another one), so I'm checking if g.edge(v1,v2)==None: e = g.add_edge(v1,v2) in order to discard them (v1 and v2 are vertices created from what it's read from the file). However the graph contains a lot of edges (few millions) and vertices (many thousands), with a potentially high degree for the vertices, and it takes a lot to process the data. As far as I read in the soruce code, the Graph.edge() method checks all the outgoing edges of the source node, but even if I check which node has the highest degree, it takes a lot of time to build the graph.
Is there any other way to remove the duplicated edges? Maybe an edge filter based on some lambda wizardry?
Thanks in advance, Giuseppe
Hi Giuseppe,
On 09/20/2011 04:36 PM, Giuseppe Profiti wrote:
Hello, I'm using a cvsreader to parse a very big file and create an undirected graph accordingly. The file can contain duplicated edges (i.e. A B in one row, B A in another one), so I'm checking if g.edge(v1,v2)==None: e = g.add_edge(v1,v2)
in order to discard them (v1 and v2 are vertices created from what it's read from the file). However the graph contains a lot of edges (few millions) and vertices (many thousands), with a potentially high degree for the vertices, and it takes a lot to process the data. As far as I read in the soruce code, the Graph.edge() method checks all the outgoing edges of the source node, but even if I check which node has the highest degree, it takes a lot of time to build the graph.
Is there any other way to remove the duplicated edges? Maybe an edge filter based on some lambda wizardry?
The library includes a 'remove_parallel_edges()' function which does what you want, and is fast.
If you want to mask the parallel edges temporarily, you can use filtering, as such:
l = label_parallel_edges(g) g = GraphView(g, efilt=lambda e: l[e] == 0)
Cheers, Tiago
Hi Tiago,
2011/9/20 Tiago de Paula Peixoto tiago@skewed.de
Hi Giuseppe,
[cut]
The library includes a 'remove_parallel_edges()' function which does what you want, and is fast.
thanks for the hint, postponing the check after loading the data speeded up the process by 10 times, however the remove_parallel_edges() does not work in parallel (at least on my machine) and then it's a bit too slow for my purposes. I'm trying a different approach in order to speed things up.
If you want to mask the parallel edges temporarily, you can use filtering, as such:
l = label_parallel_edges(g) g = GraphView(g, efilt=lambda e: l[e] == 0)
Has label_parallel_edges the same complexity of remove_parallel_edges?
Thanks again for your help, Giuseppe