How BIG is your graph
Dear graphErs,

I am about to start a project working on a graph with about 50M nodes and 1B edges, and I want your opinion regarding the feasibility of this endeavor with graph_tool. Can you please share your experience with graph examples that were comparable in size? I mostly want to calculate centrality measures, and I will need to apply (several) filters to isolate nodes with particular attributes. On top of that, I am running on a super-computer (so memory is NOT an issue), and if I am lucky they have installed/enabled the parallel version of the library.

Thank you very much,
Helen
On 07/14/2014 07:11 AM, Helen Lampesis wrote:
Dear graphErs,
I am about to start a project working on a graph with about 50M Nodes and 1B edges and I want your opinion regarding the feasibility of this endeavor with graph_tool.
Can you please share your experience with graph examples that were comparable in size?
I mostly want to calculate centrality measures and I will need to apply (several) filters to isolate nodes with particular attributes.
On top of it, I am running on a super-computer (so memory is NOT an issue) and if I am lucky they have installed/enabled the parallel version of the library.
You should be able to tackle graphs of this size, if you have enough memory. For centrality calculations, graph-tool has pure-C++ parallel code, so you should see good performance. Graph filtering can also be done without involving Python loops, so it should scale well too.

Just as an illustration, for the graph size you suggested:

In [1]: g = random_graph(50000000, lambda: poisson(40), random=False, directed=False)

In [2]: %time pagerank(g)
CPU times: user 3min 26s, sys: 44 s, total: 4min 10s
Wall time: 11.3 s

So, pagerank takes about 11 seconds on a machine with 32 cores (it would have taken around 3-4 minutes in a single thread), and it takes about 50 GB of RAM to store the graph.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
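To make the loop-free filtering concrete, here is a minimal sketch of one way to combine property maps, GraphView and a centrality call in graph-tool. The "total degree >= 15" criterion, the "keep" property name, and the small graph size are made up for illustration, standing in for whatever attribute actually marks the nodes of interest; GraphView, degree_property_map and pagerank are regular graph-tool calls.

import numpy as np
from graph_tool.all import GraphView, random_graph, pagerank

# Small random graph so the example runs in seconds; scale N up for a real test.
g = random_graph(100000, lambda: np.random.poisson(10), random=False, directed=False)

# Build a boolean vertex filter without any Python loops: the comparison is
# vectorized over the whole property array ("total degree >= 15" is a
# hypothetical stand-in for a real node attribute).
deg = g.degree_property_map("total")
keep = g.new_vertex_property("bool")
keep.a = deg.a >= 15

# GraphView applies the filter lazily; the underlying graph is not copied.
gv = GraphView(g, vfilt=keep)

# Centrality on the filtered view; pagerank runs in parallel C++ when
# graph-tool is built with OpenMP support.
pr = pagerank(gv)
print(gv.num_vertices(), pr.fa.max())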
Fantastic! Thanks a lot!

Sent from mobile device. Please excuse typos-terseness.
participants (3)

- Helen Lampesis
- Panagiotis Achlioptas
- Tiago de Paula Peixoto