Fast method for duplicated edge removal

20 Sep 2011

Hello,
I'm using a csv reader to parse a very big file and create an undirected
graph accordingly.
The file can contain duplicated edges (i.e. A B in one row, B A in another
one), so I'm checking
    if g.edge(v1, v2) is None:
        e = g.add_edge(v1, v2)
in order to discard them (v1 and v2 are vertices created from what is read
from the file).
However, the graph contains a lot of edges (a few million) and vertices (many
thousands), with a potentially high degree for the vertices, and it takes a
long time to process the data.
As far as I can tell from the source code, the Graph.edge() method checks all
the outgoing edges of the source node, but even if I first check which of the
two nodes has the higher degree, it takes a long time to build the graph.

Is there any other way to remove the duplicated edges? Maybe an edge filter
based on some lambda wizardry?
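
One possible direction, sketched here under assumptions rather than taken from the thread: drop the duplicates before they ever reach the graph by keeping a set of canonically ordered endpoint pairs, so each membership test is a hash lookup instead of a scan over a vertex's outgoing edges. The helper name unique_undirected_edges, the space delimiter, and the in-memory sample are all hypothetical stand-ins for the real file.

```python
import csv
import io


def unique_undirected_edges(rows):
    """Yield each undirected edge once, in first-seen order."""
    seen = set()
    for a, b in rows:
        # Canonical order, so that (A, B) and (B, A) map to the same key.
        key = (a, b) if a <= b else (b, a)
        if key not in seen:
            seen.add(key)
            yield key


# Hypothetical sample standing in for the large input file.
sample = io.StringIO("A B\nB A\nA C\nC A\nB C\n")
edges = list(unique_undirected_edges(csv.reader(sample, delimiter=" ")))
print(edges)
```

Each surviving pair could then be passed to g.add_edge() without calling g.edge() first, assuming a dict that maps the labels to the vertices already created. The trade-off is memory: the set holds one tuple per distinct edge, which should be manageable for a few million edges but is worth measuring on the real data.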

Thanks in advance,
Giuseppe
-- 
Researcher at University of Bologna, Italy

Giuseppe Profiti

Tiago de Paula Peixoto