On 02/09/2013 06:02 PM, tcb wrote:
> Hi,
> I am trying to read graph-tool's graphml output using networkx.
> https://github.com/networkx/networkx/issues/843
> Unfortunately this does not work with any of the vector_* type property maps which graph-tool uses. Have you encountered this issue before?
Yes, this is expected, because the graphml specification only defines the following types: boolean, int, long, float, double, or string. If you want another type, you are out of luck.
> It seems the right thing to do might be to extend your graphml to hold the vector_* attributes as detailed here:
> http://graphml.graphdrawing.org/primer/graphml-primer.html#EXT
> Is there some reason why it was done the way it is? How do you manage reading and writing graphml data to and from other tools?
Extending it this way would be the strictly "correct" approach. However, it has two downsides. Firstly, it is much more cumbersome to implement. Essentially, the reader must be aware of this whole xml schema extension stuff, which currently it simply ignores. Secondly, it does not really fix the problem of interoperability, it only punts it. Two pieces of software would still need to agree on and know about the extension for it to work. In other words, you still would not be able to make networkx read the vector types unless they modify their reader. It seems to me that simply adding a nonstandard type is much more straightforward, albeit "unclean" from the point of view of XML validity.

Regarding reading data from other tools, there is no issue, since the standard types are fully supported. If the user wants to feed graphml data produced with graph-tool to other programs, then only the standard types should be used.
> In the meantime, it might be possible to hack some read support for graph-tool's xml into networkx. To this end, could you please advise how to parse the 'key1' data (should be two floats)?
>
> <node id="n1">
>   <data key="key0">6</data>
>   <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
> </node>
The delimiter is a comma, and spaces should be ignored. The individual values are encoded according to the %a format from C99. This is to ensure exact binary representation. From the printf manpage:

    a, A   (C99; not in SUSv2) For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp±; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. There is one hexadecimal digit before the decimal point, and the number of digits after it is equal to the precision. The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double. The digit before the decimal point is unspecified for nonnormalized numbers, and nonzero but otherwise unspecified for normalized numbers.

I'm not sure there is any python function which can read this automatically, although float.fromhex() (available since Python 2.6) appears to accept this notation. You can also do it with ctypes:

    >>> from ctypes import *
    >>> libc = cdll.LoadLibrary("libc.so.6")
    >>> d = c_double()
    >>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%la", byref(d))
    1
    >>> round(d.value, 4)
    10.8889

Note the %la conversion: with a plain %a, sscanf stores a float rather than a double, which leaves garbage in the c_double. In any case, this would not be the most portable approach... Otherwise you can write a simple parser based on the format description above.

Please keep me informed on any progress on this. Interoperability with other programs is important, so if there is anything I can do to help, I'd be glad to do it. If the networkx people would like to consider a common approach, I'm open for discussion.

Cheers,
Tiago

-- 
Tiago de Paula Peixoto <tiago@skewed.de>
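
As a portable alternative to the ctypes call, the comma-separated %a values described above could probably be parsed with Python's built-in float.fromhex(). The following is only a minimal sketch under that assumption, not the actual reader of graph-tool or networkx; the helper name parse_hex_float_vector is hypothetical.

```python
def parse_hex_float_vector(text):
    """Parse a comma-separated list of C99 %a hex floats into Python floats.

    Hypothetical helper for illustration; it assumes the vector format
    described above (comma delimiter, surrounding spaces ignored).
    """
    # Split on the comma delimiter and drop the whitespace around each field.
    fields = [field.strip() for field in text.split(",")]
    # float.fromhex reads the [-]0xh.hhhhp±d notation exactly, without rounding.
    # Empty fields (e.g. from an empty vector) are skipped.
    return [float.fromhex(field) for field in fields if field]


values = parse_hex_float_vector("0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3")
print(values)  # two floats, roughly 10.89 and 11.53
```

Because float.hex() is the inverse operation, the same pair of functions could also be used to write such values back without losing precision.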