Hi guys, when saving a graph with .save(), I noticed that floats are written in hexadecimal exponent format. I was wondering what the reason for that is. The background to my question is that I built a graph with 5M vertices and 50M edges, and storing it as an XML file produces a 5GB file. So if there is a way to reduce the size, I would be very happy. Best, F.
That doesn't seem like an unreasonably large file for the number of edges you have (I'm assuming with float or double properties as well). That's only 100 bytes per edge. It should compress nicely for storage. The values you are storing are literally binary, and a hexadecimal representation is the most compact AFAIK. I am not certain, but maybe you can round your floating-point values to sums of a few (negative) powers of two to make them shorter. You know, .3 is not a simple number in hex: http://www.binaryconvert.com/result_double.html?decimal=046051 but .75 is, because it is a sum of negative powers of two (1/2 + 1/4). That said, I'd bet the XML writes out all the trailing zeros for round numbers anyway. You'd have to check. Elliot
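For what it's worth, the numbers above can be checked with a few lines of plain Python. This is only a rough sketch: it assumes "5GB" means 5 * 10^9 bytes, it ignores the 5M vertices, and it uses Python's own float.hex(), which is not necessarily how graph-tool formats the values in the XML.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
file_size_bytes = 5 * 10**9   # "5GB", taken here as 5 * 10^9 bytes (assumption)
n_edges = 50 * 10**6          # 50M edges

# Rough cost per edge; the 5M vertices are ignored for simplicity.
bytes_per_edge = file_size_bytes / n_edges

# float.hex() returns the hexadecimal exponent form of a double.
hex_three_quarters = (0.75).hex()  # 1/2 + 1/4: a single nonzero hex digit
hex_three_tenths = (0.3).hex()     # no finite binary expansion

print(bytes_per_edge)
print(hex_three_quarters)
print(hex_three_tenths)
```

Note that Python's float.hex() always pads the mantissa to 13 hex digits, so both strings come out the same length here. Whether rounding to powers of two actually shortens the file depends on whether the writer trims trailing zeros, which, as said above, would have to be checked.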
_______________________________________________ graph-tool mailing list graph-tool@skewed.de http://lists.skewed.de/mailman/listinfo/graph-tool
On 07/08/2014 04:48 AM, Flavien Lambert wrote:
Hi guys, when saving a graph with .save(), I understood that float are represented in hexadecimal exponentiation format. I was wondering the reason for that.
The hexadecimal format is to guarantee exact representation, so that you load exactly the same number you saved, without any loss. This is not possible with a decimal format.
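As a small illustration of the round-trip point, here is a sketch in plain Python. It uses float.hex() and float.fromhex() from the standard library, not graph-tool's own serialization code, and the six-digit decimal format is just an assumed example of a short decimal representation.

```python
x = 0.1 + 0.2  # a double with no short decimal form

# Hexadecimal exponent form: every bit of the mantissa is written out.
hex_text = x.hex()
from_hex = float.fromhex(hex_text)

# A fixed six-digit decimal form, chosen here purely for illustration.
dec_text = "%.6f" % x
from_dec = float(dec_text)

print(hex_text, from_hex == x)  # the hex text reloads to the same double
print(dec_text, from_dec == x)  # the short decimal text does not
```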
The thing behind my question is that I built a graph with 5M vertices and 50M edges so storing it as an XML file leads to a 5GB one. So, if I can find a way to reduce the size, I will be very happy.
Just use gzip or bzip2.
Note that in graph-tool you can save directly in a compressed format by doing:
g.save("graph.xml.gz") # for gzip
or
g.save("graph.xml.bz2") # for bzip2
The same thing works for load.
This should lead to a strong reduction of your file size.
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>
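To get a rough feel for how well this kind of text compresses, here is a sketch using only Python's standard gzip and bz2 modules. The data is synthetic (random doubles written in hex, one per line) and is only a hypothetical stand-in for the property values in a real graph file. It does not call graph-tool at all.

```python
import bz2
import gzip
import random

random.seed(0)  # reproducible synthetic data

# Hypothetical stand-in for float property values stored as hex text.
lines = [random.random().hex() for _ in range(10000)]
raw = "\n".join(lines).encode("ascii")

gz = gzip.compress(raw)
bz = bz2.compress(raw)

# Sizes in bytes: uncompressed, gzip, bzip2.
print(len(raw), len(gz), len(bz))

# Compression is lossless, so the exact text (and thus the exact floats) survives.
restored = gzip.decompress(gz)
```

A real XML file also contains a lot of repeated tag and attribute text, so it will likely compress better than this random sample suggests, but the actual ratio depends on the data.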
Thanks for the explanation! I was already playing with the compression. Best, F.