On 11/30/2012 05:45 PM, alepulver wrote:
Hello,
I've started using graph-tool mainly because it's faster than networkx. Recently I tried analyzing WordNet by building a graph from all synsets and the polysemy relations between them; it has 117659 nodes and 549414 edges.
When I compute statistics such as the average degree, the average local clustering coefficient and the pseudo-diameter, each one takes only a few seconds. Drawing the graph is another story: graph_draw takes 6 hours (using 4 cores, about 24 hours of combined CPU time), and graphviz_draw had not finished after an hour, at which point I stopped it.
At first I thought the bottleneck was the labels (which I then removed), then the layout algorithm (until I confirmed that a random layout takes only a few seconds), and finally rasterizing the output (so I tried SVG instead of PNG), but none of these turned out to be the cause. Saving the graph also takes just a few seconds.
I have uploaded it in xml.gz format here (it's 6.4MB): http://www.sendspace.com/file/zc7yrh
The time is certainly being spent in the layout code. Random layout is always very fast, but it is not what graph_draw() uses internally; it uses sfdp_layout() instead.

The graph you supplied has exactly 64835 connected components. One of them is giant (~31% of the vertices), and the rest are very small (mostly of size 1 or 2), consisting mostly of single vertices with self-loops. The multilevel layout algorithm in sfdp_layout() does not handle this very well, since it does not collapse multiple components together during the coarsening phase.

I recommend simply drawing the largest component, like this:

    from graph_tool.all import *

    # g is the WordNet graph you already loaded
    g = GraphView(g, vfilt=label_largest_component(g))
    pos = sfdp_layout(g, verbose=True)  # verbose=True gives more information
                                        # about the layout progress
    graph_draw(g, pos=pos, output="output.png")

In any case, with ~70% of the network consisting of isolated vertices, a drawing of the whole graph would not be very informative.

When drawing large graphs, it is worth looking closely at all the options that can be given to sfdp_layout(), since they can dramatically change both the speed and the quality of the result (although the default behavior should be good enough in most cases).

Cheers,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
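To see the kind of diagnosis described above (number of components, size of the giant one, and how many tiny components there are), here is a minimal pure-Python sketch. It is not graph-tool's implementation: it assumes the graph is given as a plain edge list over vertices 0..n-1, uses union-find to label components, and the function names (connected_components, largest_component_mask, component_report) are hypothetical. In graph-tool itself, label_largest_component() already produces the vertex filter for you.

```python
from collections import Counter


def connected_components(num_vertices, edges):
    """Label each vertex with a component id using union-find
    (path halving + union by size). Self-loops are simply skipped."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # self-loop, or edge inside an existing component
        if size[ru] < size[rv]:
            ru, rv = rv, ru  # attach the smaller tree under the larger one
        parent[rv] = ru
        size[ru] += size[rv]

    return [find(v) for v in range(num_vertices)]


def largest_component_mask(labels):
    """True for vertices in the largest component; similar in spirit
    to the boolean vertex filter passed to GraphView(vfilt=...)."""
    giant, _ = Counter(labels).most_common(1)[0]
    return [lab == giant for lab in labels]


def component_report(num_vertices, edges):
    """Summarize the component structure of an edge-list graph."""
    labels = connected_components(num_vertices, edges)
    counts = Counter(labels)                 # component id -> size
    histogram = Counter(counts.values())     # size -> number of components
    return {
        "num_components": len(counts),
        "giant_fraction": max(counts.values()) / num_vertices,
        "size_histogram": dict(sorted(histogram.items())),
    }


if __name__ == "__main__":
    # Toy graph: a 4-cycle, two isolated vertices with self-loops, one pair.
    toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 4), (5, 5), (6, 7)]
    print(component_report(8, toy_edges))
    print(largest_component_mask(connected_components(8, toy_edges)))
```

On the toy graph this reports 4 components with the giant one holding half the vertices, and a size histogram dominated by singletons; on a graph shaped like the WordNet one, the same kind of histogram would reveal the tens of thousands of tiny components that make the full drawing both slow and uninformative.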