Memory requirement estimate for a large graph
Dear Tiago,

Is it possible to get an estimate for the memory requirement of a graph in graph-tool? I know that graph-tool is built upon C++ and Boost, and the adjacency list is stored via a hash-map. Apart from the cost of storing the vertex and edge indices as `unsigned long`, what is the memory overhead of the structures used in storing the graph?

For example, for a network of 1M vertices and 100M links without attributes, how much real memory should I plan to use, excluding temporaries?

Sorry if this question has been asked before, but I could not find it in the previous mailing list posts.

Regards,
Carlo
On 06.07.20 at 13:00, Carlo Nicolini wrote:
Dear Tiago,
Is it possible to get an estimate for the memory requirement of a graph in graph-tool?
Yes, it is, and I should put this in the documentation somewhere.
I know that graph-tool is built upon C++ and Boost, and the adjacency list is stored via a hash-map.
Not quite, we use an adjacency list using std::vector<>.
Apart from the cost of storing the vertex and edge indices as `unsigned long`, what is the memory overhead of the structures used in storing the graph?
We use a vector-based bidirectional adjacency list, so each edge appears twice. Each edge is comprised of two size_t (uint64_t) values, for the target/source and the edge index, so we need 32 bytes per edge.

For each vertex we need a std::vector<>, which is 24 bytes, and a uint64_t to separate the in-/out-lists, so we also need 32 bytes per node.

Therefore we need in total:

(N + E) * 32 bytes
For example, for a network of 1M vertices and 100M links without attributes, how much real memory should I plan to use, excluding temporaries?
That would be:

(1,000,000 + 100,000,000) * 32 bytes = 3,232,000,000 bytes ≈ 3.01 GiB

In practice you will need a little more, since std::vector<> tends to over-commit.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
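For reference, the estimate above works out to a few lines of Python (a sketch only; the function and variable names are illustrative, and property maps or allocator over-commit are not included):

    def estimate_graph_memory(num_vertices, num_edges):
        # Bidirectional adjacency list: each edge is stored twice,
        # with two uint64_t values (16 bytes) per entry -> 32 bytes per edge.
        # Each vertex: 24-byte std::vector<> header plus an 8-byte separator
        # between the in-/out-lists -> 32 bytes per vertex.
        return (num_vertices + num_edges) * 32

    # The example from this thread: 1M vertices, 100M edges
    nbytes = estimate_graph_memory(1_000_000, 100_000_000)
    print(nbytes, nbytes / 1024**3)  # 3232000000 bytes, ~3.01 GiB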
Many thanks Tiago for the quick answer.

I've tried (using the `htop` command) to measure the RES memory requirement for such a graph with 1M nodes and 100M links, but the result is almost twice the size, a total of 6.2 GB. Is there a reason why I get that figure? I am on macOS 10.15.6 Catalina.

On Mon, 6 Jul 2020 at 14:41, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
On 06.07.20 at 13:00, Carlo Nicolini wrote:
Dear Tiago,
Is it possible to get an estimate for the memory requirement of a graph in graph-tool?
Yes, it is, and I should put this in the documentation somewhere.
I know that graph-tool is built upon C++ and Boost, and the adjacency list is stored via a hash-map.
Not quite, we use an adjacency list using std::vector<>.
Apart from the cost of storing the vertex and edge indices as `unsigned long`, what is the memory overhead of the structures used in storing the graph?
We use a vector-based bidirectional adjacency list, so each edge appears twice. Each edge is comprised of two size_t (uint64_t) values, for the target/source and the edge index, so we need 32 bytes per edge.
For each vertex we need a std::vector<>, which is 24 bytes, and a uint64_t to separate the in-/out-lists, so we also need 32 bytes per node.
Therefore we need in total:
(N + E) * 32 bytes
For example, for a network of 1M vertices and 100M links without attributes, how much real memory should I plan to use, excluding temporaries?
That would be:
(1,000,000 + 100,000,000) * 32 bytes = 3,232,000,000 bytes ≈ 3.01 GiB
In practice you will need a little more, since std::vector<> tends to over-commit.
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>
On 06.07.20 at 15:53, Carlo Nicolini wrote:
Many thanks Tiago for the quick answer.
I've tried (using the `htop` command) to measure the RES memory requirement for such a graph with 1M nodes and 100M links, but the result is almost twice the size, a total of 6.2 GB. Is there a reason why I get that figure?
Hard to say without a minimal working example. Try saving the network to disk, and loading it again from a newly started interpreter. I tried on my machine, and I got 4.5 GB. As I explained, std::vector<> may over-commit the allocations.

--
Tiago de Paula Peixoto <tiago@skewed.de>
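A minimal sketch of the save/reload check suggested above, assuming graph-tool and NumPy are installed (the file name and the random edge list are only placeholders, and the edge array itself is a sizeable temporary):

    import resource
    import numpy as np
    from graph_tool.all import Graph, load_graph

    N, E = 1_000_000, 100_000_000

    # First session: build a graph of the right size and save it to disk.
    g = Graph(directed=True)
    g.add_vertex(N)
    edges = np.random.randint(0, N, size=(E, 2))  # arbitrary endpoints, just for sizing
    g.add_edge_list(edges)
    g.save("test.gt")

    # Then, from a newly started interpreter, load it back and check resident memory.
    g = load_graph("test.gt")
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(rss)  # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS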
participants (2)
- Carlo Nicolini
- Tiago de Paula Peixoto