Ni! Hi elastica,

In graph-tool you can efficiently add edges with properties using the `add_edge_list` function:


I'd also suggest that you benchmark python code with the `cProfile` module, it is much easier and more informative:


Cheers,
ale
.~´

On Sat, Nov 24, 2018 at 7:36 PM <elastica@laposte.net> wrote:
Hi

I was experimenting Kruskal performance against a data file for 100 graphs (n, m<20000) :


# =================================================
from time import clock
from sys import stderr, stdin

import graph_tool as gt
from graph_tool.topology import min_spanning_tree 

import networkx as nx


# ------------ NetworkX Code ---------------------------

def kruskal_networkX(edges, n):
    G=nx.Graph()
    for a, b, w in edges:
        G.add_edge(a,b,weight=w)
    return sum(d['weight'] for (_,_,d) in nx.minimum_spanning_tree(G).edges(data=True))



# ------------ Graph_tool Code ---------------------------

def kruskal_gt(wedges,n):
    g= gt.Graph(directed=False)
    edges, weights= zip(*[((a,b),w) for (a,b,w) in wedges])
    weight = g.new_edge_property("long long")
    for a,b,w in wedges:
        e=g.add_edge(a,b)
        weight[e]=w
    tree=min_spanning_tree(g, weights=weight)
    return sum(b*weight[e] for (e,b) in zip(g.edges(), tree))

# ------------ Benchmark ---------------------------

def go(solver, L):
    global duration

    N=len(L)
    k=1

    while k<N:
        edges=[]
        n=L[k]
        k+=1

        for a in range(n):
            d=L[k]
            k+=1
            for j in range(d):
                b=L[k]
                k+=1
                w=L[k]
                k+=1
                if a<b:
                    edges.append([a,b-1,w])
        start=clock()
        print(solver(edges, n))
        print("----------------------------")
        duration+=clock()-start

# data
L=list(int(z) for z in stdin.read().split() if z.isdigit())

for solver in [kruskal_networkX, kruskal_gt]:
    duration=0
    go(solver, L)
    print("%s : %.3f" %(solver.__name__, duration), file=stderr)

# =================================================



Outputs are in agreement but gt execution is very slow:

$ python3 gt.py < big.in > bench.out
kruskal_networkX : 15.721
kruskal_gt : 134.839


the bottleneck being within the following loop-filling the graph:


# --------------------------------------------
    for a,b,w in wedges:
        e=g.add_edge(a,b)
        weight[e]=w
# --------------------------------------------


Probably, i'm filling the graph in the wrong way.

You can dowload the data file here:
https://drive.google.com/file/d/1y0fX1kopgzQEWgmRHRnouWHnMkuliwfw/view?usp=sharing
_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool