Hi, I am still fighting with the same problem:
Ioana K-Hulpus wrote
I am trying to compute a weight, which I call "Exclusivity", for each relation. Since the graph has typed edges (with the type of each edge stored as an int in the edge property map label_id), the Exclusivity of an edge is defined as 1 / (number of edges of the same type outgoing from its source node + number of edges of the same type incoming to its target node - 1).
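For example, with hypothetical counts (note that the edge itself appears in both tallies, which is what the -1 corrects for):

```python
# Hypothetical edge e of type t:
out_same = 3  # out-edges of type t from e's source, including e itself
in_same = 2   # in-edges of type t into e's target, including e itself

# e is counted twice above (once per tally), so subtract 1
exclusivity = 1 / (out_same + in_same - 1)
print(exclusivity)  # 0.25
```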
As advised by Tiago, I rewrote the code to:
1. not do property map lookup inside the loop
2. not lookup edge descriptors with g.edge(source, target)
The new code:
    def computeexclusivities(g):
        count = 0
        excl = g.new_edge_property("double")
        g.edge_properties["Exclusivity"] = excl
        lbl_id = g.edge_properties["label_id"]
        # columns: source, target, edge index, label_id
        edges = g.get_edges([g.edge_index, lbl_id])
        for source, target, idx, lbl in edges:
            # out-edges of the source; columns: source, target, label_id
            out_edges = g.get_out_edges(source, [lbl_id])
            n_out_same = np.count_nonzero(out_edges[:, 2] == lbl)
            # in-edges of the target; columns: source, target, label_id
            in_edges = g.get_in_edges(target, [lbl_id])
            n_in_same = np.count_nonzero(in_edges[:, 2] == lbl)
            # the edge itself is in both counts, hence the -1
            excl.a[int(idx)] = 1.0 / (n_out_same + n_in_same - 1)
            count += 1
            if count % 1000 == 0:
                print(count, "exclusivities computed")
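One direction I am considering is dropping the per-edge loop entirely: the whole computation reduces to counting (vertex, label) pairs, which numpy can do in one pass. Below is a sketch of that counting core on a plain (source, target, label_id) edge array; wiring it back into graph-tool (getting the array via g.get_edges([lbl_id]) and writing the result through excl.a in matching order) is my assumption about the API, so please correct me if that ordering is not guaranteed.

```python
import numpy as np

def _pair_counts(a, b):
    """For each i, count how many j satisfy (a[j], b[j]) == (a[i], b[i])."""
    # Encode the pairs as single integers so np.unique can work in 1-D.
    key = a.astype(np.int64) * (int(b.max()) + 1) + b
    _, inv, cnt = np.unique(key, return_inverse=True, return_counts=True)
    return cnt[inv]

def exclusivities(edges):
    """edges: (E, 3) int array with columns (source, target, label_id)."""
    src, tgt, lbl = edges[:, 0], edges[:, 1], edges[:, 2]
    out_same = _pair_counts(src, lbl)  # same-type out-edges of each source
    in_same = _pair_counts(tgt, lbl)   # same-type in-edges of each target
    # each edge appears in both of its own counts, hence the -1
    return 1.0 / (out_same + in_same - 1)
```

This replaces 50 million Python-level iterations with a handful of vectorized passes, so it should run in seconds rather than weeks.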
Now, I am not sure that writing through excl.get_array() is the best solution. Is there another way to set the property map without providing an edge descriptor, or an O(1) method for retrieving an edge descriptor?
In any case, the code has now been running for a week and has only processed 17 million of the 50 million edges; at the moment it handles about 2000 edges per minute.
Can anyone help me find a faster solution?
In my graph processing pipeline this is just an intermediate stage: I will use the computed exclusivities to derive transfer probabilities for Personalised PageRank, so I cannot afford such long computing times.
Thanks a lot!