graph-tool graphml and networkx

tcb

9 Feb 2013

5:02 p.m.

Hi,

I am trying to read the graphml output of graph-tool using networkx.

https://github.com/networkx/networkx/issues/843

Unfortunately this does not work with any of the vector_* type property maps which graph-tool uses. Have you encountered this issue before?

It seems the right thing to do might be to extend your graphml to hold the vector_* attributes as detailed:

http://graphml.graphdrawing.org/primer/graphml-primer.html#EXT

Is there some reason why it was done the way it is? How do you manage reading/writing graphml data to other tools?

In the meantime, it might be possible to hack some read support for graph-tool's xml into networkx. To this end, could you please advise how to parse the 'key1' data (should be two floats)?

<node id="n1">
  <data key="key0">6</data>
  <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>

thanks


Tiago de Paula Peixoto

9 Feb

6 p.m.

On 02/09/2013 06:02 PM, tcb wrote:

...

Hi,

I am trying to read the graphml output of graph-tool's graphml using networkx.

https://github.com/networkx/networkx/issues/843

Unfortunately this does not work with any of the vector_* type property maps which graph-tool uses. Have you encountered this issue before?

Yes, this is expected, because the graphml specification only defines the following types: boolean, int, long, float, double, or string. If you want another type, you are out of luck.

...

It seems the right thing to do might be to extend your graphml to hold the vector_* attributes as detailed:

http://graphml.graphdrawing.org/primer/graphml-primer.html#EXT

Is there some reason why it was done the way it is? How do you manage read/writing graphml data to other tools?

Extending it this way would be the strictly "correct" approach. However, it has two downsides:

Firstly, it is much more cumbersome to implement. Essentially, the reader must be aware of this whole xml schema extension stuff, which currently it simply ignores.

Secondly, it does not really fix the problem of interoperability, it only punts it. Two pieces of software would still need to agree and know about the extension for it to work. In other words, you still would not be able to make networkx read the vector types, unless they modify their reader.

It seems to me that simply adding a nonstandard type is much more straightforward, albeit "unclean" from the point of view of XML validity.

Regarding reading data from other tools, there is no issue, since the standard types are fully supported. If the user wants to feed graphml data produced with graph-tool to other programs, then only the standard types should be used.

...

In the meantime, it might be possible to hack some read support for graph-tool's xml into networkx. To this end, could you please advise how to parse the 'key1' data (should be two floats)

<node id="n1">
  <data key="key0">6</data>
  <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>

The delimiter is a comma, and spaces should be ignored. The individual values are encoded according to the %a format from C99. This is to ensure exact binary representation. From the printf manpage:

a, A (C99; not in SUSv2) For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp±d; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. There is one hexadecimal digit before the decimal point, and the number of digits after it is equal to the precision. The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double. The digit before the decimal point is unspecified for nonnormalized numbers, and nonzero but otherwise unspecified for normalized numbers.

I'm not sure there is any python function which can read this automatically. You can do it with ctypes:

>>> from ctypes import *
>>> libc = cdll.LoadLibrary("libc.so.6")
>>> d = c_double()
>>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%a", byref(d))
1
>>> print(d)
c_double(5.402846293e-315)

But this would not be the most portable approach... Otherwise you can write a simple parser based on the format description above.

Please keep me informed on any progress on this. Interoperability with other programs is important, so if there is anything I can do to help, I'd be glad to do it. If the networkx people would like to consider a common approach, I'm open for discussion.

Cheers,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
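For anyone following the "write a simple parser" route, here is a minimal sketch in Python based only on the format description quoted above. It assumes values shaped like the sample (optional sign, `0x` prefix, hex digits with an optional fractional part, and a `p` exponent) and does not handle special values such as inf or nan. The names `parse_hex_double` and `parse_vector` are made up for illustration.

```python
import math
import re

# Assumed input shape, following the C99 "%a" description: [-]0xh.hhhhp±d
_HEX_FLOAT = re.compile(
    r"^([+-]?)0[xX]([0-9a-fA-F]+)(?:\.([0-9a-fA-F]*))?[pP]([+-]?\d+)$"
)


def parse_hex_double(text):
    """Parse one hexadecimal floating-point literal (hypothetical helper)."""
    match = _HEX_FLOAT.match(text.strip())
    if match is None:
        raise ValueError("not a hex float: %r" % text)
    sign, int_digits, frac_digits, exponent = match.groups()
    frac_digits = frac_digits or ""
    # Read all hex digits as one integer, then undo the 4 bits contributed
    # by each fractional digit while applying the binary exponent.
    mantissa = int(int_digits + frac_digits, 16)
    value = math.ldexp(mantissa, int(exponent) - 4 * len(frac_digits))
    return -value if sign == "-" else value


def parse_vector(text):
    """Split on commas, ignore surrounding spaces, parse each item."""
    return [parse_hex_double(item) for item in text.split(",")]


vec = parse_vector("0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3")
print(vec)
```

The sketch relies on the mantissa fitting in a double exactly, which should hold for values written with the default %a precision.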

Tiago de Paula Peixoto

6:07 p.m.

On 02/09/2013 07:00 PM, Tiago de Paula Peixoto wrote:


>>> from ctypes import *
>>> libc = cdll.LoadLibrary("libc.so.6")
>>> d = c_double()
>>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%a", byref(d))
1
>>> print(d)
c_double(5.402846293e-315)

This should have been:

>>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%la", byref(d))
1
>>> print(d)
c_double(10.888893506588493)

Cheers,
Tiago
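The tiny number from the first attempt is consistent with `%a` storing a 4-byte float into the first half of the 8-byte `c_double`, whereas `%la` stores a full double. The effect can be reproduced without libc. This is only a sketch, and it assumes a little-endian layout with the remaining four bytes left at zero.

```python
import struct

value = float.fromhex("0x1.5c71d0cb8d943p+3")

# What "%a" would store: the value rounded to a 4-byte IEEE 754 float.
as_float_bytes = struct.pack("<f", value)

# Reinterpret those 4 bytes, followed by 4 zero bytes, as an 8-byte double
# (assumed little-endian, assumed zero-initialised target).
misread = struct.unpack("<d", as_float_bytes + b"\x00" * 4)[0]

print(misread)  # a denormal around 5.4e-315 rather than 10.88...
```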

tcb

9:01 p.m.

Alright, may as well bookmark it here: you can do this conversion directly in python with float.fromhex()

>>> float.fromhex("0x1.5c71d0cb8d943p+3")
10.888893506588493


Tiago de Paula Peixoto

10 Feb

12:51 a.m.

On 02/09/2013 10:01 PM, tcb wrote:

...

alright, may as well bookmark it here- you can do this conversion directly in python with float.fromhex()

>>> float.fromhex("0x1.5c71d0cb8d943p+3")
10.888893506588493

Very nice! And indeed it seems to be possible to output a string in the same manner:

>>> float.hex(10.888893506588493)
'0x1.5c71d0cb8d943p+3'

Thanks for pointing this out.

Cheers,
Tiago

tcb

9 Feb

7:37 p.m.

OK, I think with the strictly correct approach to graphml it might be possible to read the attributes (if you also write a schema), but you are right that it is a lot of added complexity for the fairly simple extensions you are using (especially perhaps for reading). On the other hand, other tools then have to write custom readers, which kind of defeats one of the major benefits of graphml, but others seem to have taken this approach too.

The modifications to read the vector_* types might be easy enough to make on the networkx side. We'll see how it goes.

Alternatively, I suppose I could get by without modifying anything by making two 'float' property maps (pos.x and pos.y), then combining them again on the networkx side, but this is a bit cumbersome.

The other thing is that I have no idea how to write a graphml file from networkx that graph-tool could understand.

It would be nice if there was some format which could be easily used to work between graph-tool and networkx (and possibly other tools as well). Do you have any suggestions on what would be a better fit?

I'll keep you updated and I'm sure things on the networkx side will be tracked on that github issue.

thanks

Tiago de Paula Peixoto

10 Feb

1:22 a.m.

On 02/09/2013 08:37 PM, tcb wrote:

...

OK, I think with the strictly correct approach to graphml it might be possible to read the attributes (if you also write a schema)- but you are right that it is a lot of added complexity for the fairly simple extensions you are using (especially perhaps for reading). On the other hand, other tools then have to write custom readers, which kind of defeats one of the major benefits of graphml- but others seems to have taken this approach too.

Do you know of any current graphml readers which understand schemas as well? It seems that reader modification would be unavoidable.

...

The modifications to read the vector_* types might be easy enough to make on the networkx side- we'll see how it goes.

Given the very adequate fromhex() function you found, a trivial reader for the (double, float, long double) vector properties would be something like:

>>> prop = '0x1.0000000000000p+1, 0x1.8000000000000p+1'
>>> vec = [float.fromhex(x) for x in prop.split(",")]
>>> print(vec)
[2.0, 3.0]
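Applied to the node snippet from the start of the thread, that reader could look roughly like the sketch below. It uses the standard library's ElementTree on the bare snippet. A real graphml file declares an XML namespace, which the lookup would need to account for. The helper name `read_vector` is hypothetical, and treating an empty string as an empty vector is an assumption.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<node id="n1">
  <data key="key0">6</data>
  <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>
"""


def read_vector(text):
    """Decode a comma-separated list of hex floats (hypothetical helper)."""
    text = text.strip()
    if not text:
        return []  # assumption: empty text means an empty vector
    # float.fromhex tolerates the spaces left around each item.
    return [float.fromhex(item) for item in text.split(",")]


node = ET.fromstring(SNIPPET.strip())
pos = read_vector(node.find("data[@key='key1']").text)
print(pos)
```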

...

Alternatively, I suppose I could get by without modifying anything by making two 'float' property maps (pos.x and pos.y)- then combine them again on the networkx side- but this is a bit cumbersome.

This would be a way to ensure a strictly valid graphml file, and is feasible for users who require it. Another alternative would be to encode it as a string, and decode at the other end.

It is important to note however that in graph-tool regular float/double properties (not vectors) are also stored in hex format. I'm not sure if this is a violation of the standard, since I don't know if they specify exactly how a float is represented, but this may cause problems with interoperability as well (possibly networkx would also choke). I made this choice because for my own uses, it is very important that no information is lost during encoding.
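If a reader on the other side only accepts decimal floats, one possible workaround is a small conversion pass over the file before loading it. The sketch below assumes that properties are declared with standard `<key>` elements carrying `attr.type`, and that hex values always contain an `x`. The function name `hex_floats_to_decimal` is made up and belongs to neither graph-tool nor networkx. Python's `repr()` of a float round-trips, so the decimal text should still decode to the same double in Python.

```python
import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"


def hex_floats_to_decimal(root):
    """Rewrite hex-encoded float/double <data> values as decimal text."""
    float_keys = {
        key.get("id")
        for key in root.iter(NS + "key")
        if key.get("attr.type") in ("float", "double")
    }
    for data in root.iter(NS + "data"):
        text = (data.text or "").strip()
        # Only touch values that look hex-encoded; decimal text is left alone.
        if data.get("key") in float_keys and "x" in text.lower():
            data.text = repr(float.fromhex(text))
    return root


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="key0" for="node" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="key0">0x1.8000000000000p+1</data></node>
  </graph>
</graphml>"""

root = hex_floats_to_decimal(ET.fromstring(SAMPLE))
print(root.find(".//" + NS + "data").text)  # 3.0
```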

...

The other thing is that I have no idea how to write a graphml file from networkx that graph-tool could understand.

To simply write a vector type, it would suffice to do the following for a float type:

>>> prop = ", ".join(float.hex(x) for x in vec)
>>> print(prop)
0x1.0000000000000p+1, 0x1.8000000000000p+1

This should be completely readable by graph-tool.

More work would be required to support the python::object properties, if so desired, but not much. It is just a base64 encoding of a pickled object.
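Both directions can be wrapped into small helpers, sketched below. The vector part follows the one-liners in this thread. The object part is only a guess at "base64 of a pickle": the exact pickle protocol and base64 variant graph-tool uses are not stated here, so `encode_object` and `decode_object` are illustrative names and may not interoperate as written.

```python
import base64
import pickle


def write_vector(vec):
    """Encode a list of floats as comma-separated hex floats."""
    return ", ".join(float.hex(x) for x in vec)


def read_vector(text):
    """Decode the format produced by write_vector."""
    return [float.fromhex(x) for x in text.split(",")]


def encode_object(obj):
    # Assumption: standard base64 over pickle.dumps(); details may differ.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_object(text):
    # Unpickling runs arbitrary code, so only use this on trusted files.
    return pickle.loads(base64.b64decode(text))


vec = [0.1, 1.0 / 3.0, 2.0]
assert read_vector(write_vector(vec)) == vec  # exact round trip
print(write_vector([2.0, 3.0]))
```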

...

It would be nice if there was some format which could be easily used to work between graph-tool and networkx (and possibly other tools as well). Do you have any suggestions on what would be a better fit?

I think graphml is still better than anything else out there. GML, for instance, has an even cruder type system (basically only string and float), and in dot everything is a string.

In the end, I'm obviously biased towards the way it is implemented in graph-tool (graphml + custom types), since it is easy enough to implement, and one can guarantee perfect representation of the graph and its properties.

Cheers,
Tiago

tcb

12:51 p.m.

On Sun, Feb 10, 2013 at 1:22 AM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:

...

On 02/09/2013 08:37 PM, tcb wrote:

...
OK, I think with the strictly correct approach to graphml it might be possible to read the attributes (if you also write a schema)- but you are right that it is a lot of added complexity for the fairly simple extensions you are using (especially perhaps for reading). On the other hand, other tools then have to write custom readers, which kind of defeats one of the major benefits of graphml- but others seem to have taken this approach too.

Do you know of any current graphml readers which understand schemas as well? It seems that reader modification would be unavoidable.

You're right, it's definitely much more complexity. I must see how gephi deals with it, but I haven't yet...

I suppose I would be much happier with the complexity of graphml if it could guarantee seamless transfer of graphs between different tools. I would like to be able to work in graph-tool, networkx and occasionally some others (gephi perhaps) without having to maintain a bunch of conversion scripts.

...

...
The modifications to read the vector_* types might be easy enough to make on the networkx side- we'll see how it goes.

Given the very adequate fromhex() function you found, a trivial reader for the (double, float, long double) vector properties would be something like:

>>> prop = '0x1.0000000000000p+1, 0x1.8000000000000p+1'
>>> vec = [float.fromhex(x) for x in prop.split(",")]
>>> print(vec)
[2.0, 3.0]

...
Alternatively, I suppose I could get by without modifying anything by making two 'float' property maps (pos.x and pos.y)- then combine them again on the networkx side- but this is a bit cumbersome.

This would be a way to ensure a strictly valid graphml file, and is feasible for users which require it. Another alternative would be to encode it as a string, and decode at the other end.

It is important to note however that in graph-tool regular float/double properties (not vectors) are also stored in hex format. I'm not sure if this is a violation of the standard, since I don't know if they specify exactly how a float is represented, but this may cause problems with interoperability as well (possibly networkx would also choke). I made this choice because for my own uses, it is very important that no information is lost during encoding.

I don't think this will be an issue. In fact, it's a very good idea to preserve the exact float representation.

...

...
The other thing is that I have no idea how to write a graphml file from networkx that graph-tool could understand.

To simply write a vector type, it would suffice to do the following for a float type:

>>> prop = ", ".join(float.hex(x) for x in vec)
>>> print(prop)
0x1.0000000000000p+1, 0x1.8000000000000p+1

This should be completely readable by graph-tool.

More work would be required to support the python::object properties, if so desired, but not much. It is just a base64 encoding of a pickled object.

I'm not sure how much effort is warranted to get complete support- I haven't needed to store pickled python objects yet, so I'll stick with the vector_* and see how it goes with that first.

...

...
It would be nice if there was some format which could be easily used to work between graph-tool and networkx (and possibly other tools as well). Do you have any suggestions on what would be a better fit?

I think graphml is still better than anything else out there. GML, for instance, has an even cruder type system (basically only string and float), and in dot everything is a string.

In the end, I'm obviously biased towards the way it is implemented in graph-tool (graphml + custom types), since it is easy enough to implement, and one can guarantee perfect representation of the graph and its properties.

And for that it's quite a sensible approach.

For working with multiple tools I am leaning towards a json approach. The networkx node_link format is quite useful:

http://networkx.github.com/documentation/latest/reference/readwrite.json_gra...

and extremely easy to work with (at least for my purposes). You can easily encode pretty much every type of data. For graph-tool you would just need to include a small bit of information about the property maps, which networkx et al can easily ignore.

thanks
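To make the idea concrete, here is a rough sketch of such a document built with only the standard json module. The layout (a list of nodes and a list of links with `source` and `target`) is an assumed, simplified version of the node/link style and may not match what networkx actually writes. The `property_types` block and the `vector_float` type name are placeholders for the "small bit of information about the property maps".

```python
import json

graph = {
    "directed": False,
    # Hypothetical extra block describing property map types; readers that
    # do not know about it can simply skip it.
    "property_types": {"node": {"pos": "vector_float"}},
    "nodes": [
        {"id": "n0", "pos": [2.0, 3.0]},
        {
            "id": "n1",
            "pos": [
                float.fromhex("0x1.5c71d0cb8d943p+3"),
                float.fromhex("0x1.70db7f4083655p+3"),
            ],
        },
    ],
    "links": [{"source": "n0", "target": "n1", "weight": 0.1}],
}

text = json.dumps(graph, indent=2)
restored = json.loads(text)

# Python writes floats with repr(), so the values survive the round trip.
assert restored == graph
print(text)
```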


Tiago de Paula Peixoto

6:26 p.m.

On 02/10/2013 01:51 PM, tcb wrote:

...

I suppose I would be much happier with the complexity of graphml if it could guarantee seamless transfer of graphs between different tools. I would like to be able to work in graph-tool, networkx and occasionally some others (gephi perhaps) without having to maintain a bunch of conversion scripts.

I find this desirable as well, but it won't happen without some coordination between the projects. In the case of networkx and graph-tool, this should probably happen, since there is interest on both sides, and it is not a very difficult problem.

In the case of gephi, I expect them to be interested as well. At the very least it should be easy for them to simply ignore graphml types which they do not understand, such as vectors. I don't use gephi much, but I tried importing graphml files generated with graph-tool some time ago, and it seemed to work as expected.

...

I'm not sure how much effort is warranted to get complete support- I haven't needed to store pickled python objects yet, so I'll stick with the vector_* and see how it goes with that first.

Yes, this is more important than the pickled object type. However, I may be able to contribute with something towards full compatibility.

...

For working with multiple tools I am leaning towards a json approach. The networkx node_link format is quite useful:

http://networkx.github.com/documentation/latest/reference/readwrite.json_gra...

and extremely easy to work with (at least for my purposes). You can easily encode pretty much every type of data. For graph-tool you would just need to include a small bit of information about the property maps, which networkx et al can easily ignore.

Json may be interesting, but again, there must be an agreement on how to encode the data. It seems to me one would simply replace xml with json, and one would still need to define a graphml equivalent for it.

I have the following questions:

Is there any documentation on how exactly the networkx graph is stored as json, other than looking at the source code?

Are there other programs which currently read/write the same format used by networkx?

Can you parse it with Python in an event-based manner (this is important for large graphs)? Is this done in networkx?

Cheers,
Tiago

tcb

7:25 p.m.

On Sun, Feb 10, 2013 at 6:26 PM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:

...

On 02/10/2013 01:51 PM, tcb wrote:

...

Json may be interesting, but again, there must be an agreement on how to encode the data. It seems to me one would simply replace xml for json, and one would still need to define a graphml equivalent for it.

Yes and no. It's not much different from graphml, with a list of nodes, a list of edges and properties attached to each. But it seems like a simpler format (to read and write), easier to extend where necessary, and is a very direct representation of the data in your graph.

...

I have the following questions:

Is there any documentation on how exactly is the networkx graph stored as json, other than looking at the source code?

Not that I know of. But it's very simple: just a list of nodes and a list of edges, with the properties written as a dict for each.

...

Are there other programs which currently read/write the same format used by networkx?

Not sure about that either. I think there are some javascript libraries which read the node/link format, but there is no agreed standard for this that I am aware of. Since it's just json, there is a certain amount of flexibility with the actual data layout.

...

Can you parse it with Python in an event-based manner (this is important for large graphs)? Is this done in networkx?

Now that is a real problem. There is a python library I am aware of which reads json like that:

http://lloyd.github.com/yajl/

but networkx uses standard json routines which just slurp in all the json at once and then parse it. If you're going to read all the data anyway, then it may not make too much difference for speed, but memory usage will be an issue.

On the C++ side I have used the boost spirit parser for json:

http://www.codeproject.com/Articles/20027/JSON-Spirit-A-C-JSON-Parser-Genera...

which works really well.

So for the moment I just need something simple to work, but I'm interested in a better fix.

thanks
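One way to sidestep the memory problem with only the standard library is to change the layout instead of the parser: write one JSON record per line and decode the lines one at a time. This is not event-based parsing in the yajl sense, and it is not what networkx does. It is just a sketch of an alternative, and the record fields (`type`, `id`, `source`, `target`) are invented for the example.

```python
import io
import json


def iter_records(stream):
    """Yield one decoded record per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a large file opened with open(path); only one line is
# decoded at a time, so memory use does not grow with the graph size.
stream = io.StringIO(
    '{"type": "node", "id": "n0"}\n'
    '{"type": "node", "id": "n1"}\n'
    '{"type": "edge", "source": "n0", "target": "n1"}\n'
)

counts = {}
for record in iter_records(stream):
    counts[record["type"]] = counts.get(record["type"], 0) + 1

print(counts)  # {'node': 2, 'edge': 1}
```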


Tiago de Paula Peixoto

9:29 p.m.

On 02/10/2013 08:25 PM, tcb wrote:

...

Are there other programs which currently read/write the same format used by networkx?

not sure about that either- I think there are some javascript libraries which read node/link format, but there is no agreed standard for this I am aware of- since its just json there is a certain amount of flexibility with the actual data layout.

This might actually be a problem from the point of view of interoperability... One can easily imagine different programs with their own json representation which are mutually incompatible. Although, one must admit there are very few sane ways to do the same thing, i.e. represent a graph with node/edge properties.

...

Can you parse it with Python in an event-based manner (this is important for large graphs)? Is this done in networkx?

Now that is a real problem. There is a python library I am aware of which reads json like that:

http://lloyd.github.com/yajl/

but networkx uses standard json routines which just slurp in all the json at once and then parse it. If you're going to read all the data anyway, then it may not make too much difference for speed, but memory usage will be an issue.

Implementing it like this in graph-tool would make it automatically a second-grade citizen, since loading very large graphs efficiently memory-wise is supported by all other formats.

...

On the C++ side I have used the boost spirit parser for json:

http://www.codeproject.com/Articles/20027/JSON-Spirit-A-C-JSON-Parser-Genera...

which works really well.

Nice. A spirit parser such as this would do the trick for graph-tool, since not only would it be event-driven, but it would also be faster than a Python-based one. This actually satisfies me quite a bit.

...

So for the moment I just need something simple to work, but I'm interested in a better fix.

I'll take a look at the networkx format, and see what I can cook up with the json spirit parser, when I have some time. It should not be very difficult.

Cheers,
Tiago

ob

17 Aug

2:41 p.m.

It is important to note however that in graph-tool regular float/double properties (not vectors) are also stored in hex format. I'm not sure if this is a violation of the standard, since I don't know if they specify exactly how a float is represented, but this may cause problems with interoperability as well (possibly networkx would also choke). I made this choice because for my own uses, it is very important that no information is lost during encoding.

I'm not sure if this discussion led anywhere. It seems that there is still a compatibility issue with networkx for the basic data types float and double. The reason is that graph-tool uses hex format as the lexical representation. However, the XML Schema standard defines a decimal format for float (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#float) and double. Maybe we can have a parameter in the save function to enable this standard.

Tiago de Paula Peixoto

2:58 p.m.

On 17.08.2015 16:41, ob wrote:

...

It is important to note however that in graph-tool regular float/double properties (not vectors) are also stored in hex format. I'm not sure if this is a violation of the standard, since I don't know if they specify exactly how a float is represented, but this may cause problems with interoperability as well (possibly networkx would also choke). I made this choice because for my own uses, it is very important that no information is lost during encoding.

I'm not sure if this discussion led anywhere. It seems that there is still a compatibility issue with basic data types float and double to networkx. The reason is that graph-tool uses hex-format as lexical representation. However, XML standard defines decimal format for float (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#float) and double. Maybe we can have a parameter in the save function to enable this standard.

Implementing a parameter in the save function is easy, and I believe is the correct approach, since regardless of the ambiguity of the standard, most implementations expect a decimal format. Of course, it is hard to guarantee exactness with a decimal representation, hence I still think it is justifiable to keep the hex format as default.

I'll implement this soon. (If there is any urgency with this, please open an issue on the website.)

Best,
Tiago
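For reference, a small check of how the two representations behave in Python. It is only an illustration and says nothing about how graph-tool's save function is or will be implemented. The hex form round-trips by construction. A decimal form also round-trips when 17 significant digits are written, which is enough for an IEEE 754 double, whereas a shorter format such as `%.6g` does not.

```python
x = float.fromhex("0x1.5c71d0cb8d943p+3")

hex_text = x.hex()         # exact by construction
long_text = "%.17g" % x    # 17 significant decimal digits
short_text = "%.6g" % x    # a typical short decimal format

hex_ok = float.fromhex(hex_text) == x
long_ok = float(long_text) == x
short_ok = float(short_text) == x

print(hex_text, hex_ok)
print(long_text, long_ok)
print(short_text, short_ok)
```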
