On 19.08.2014 15:24, Dominik wrote:
Hi,
I am working with a lot of vertex property maps that amongst others need to be initialized. Here is a simple example:
I am profiling it with: time python2.7 -m cProfile -o stats test_pmap_speed.py then draw the callgraph with gprof2dot -f pstats stats | dot -Tpdf -o output.pdf
Here is an example: output.pdf <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4025725/output.pdf>
There seems to be a constant overhead associated with accessing the property maps. Fortunately, it appears to be independent of the number of vertices. It contributes quite significantly to the runtime of my tool. The initialization is an example, but in general, I also need to copy one map to another one, e.g. map.a = map1.a, where map1 belongs to the graph Sv, map belongs to S, and Sv is a GraphView of S.
Alright, after the context, here comes my question: how can I efficiently (i.e. fast) initialize the maps, copy the maps, etc.? Also, creating a new map seems to take quite long. I am not working with big graphs, but with a lot of maps.
Unfortunately, this use case (many very small graphs / properties) is not what graph-tool was designed to handle very efficiently... There is not much that can be changed in the property map creation code to improve performance, I'm afraid. If you create the property map only once, and access it many times, and also if your graph does not change in size (this is important), then you can keep an instance of the array returned by map.a instead of obtaining it every single time. This would remove the expensive call to __set_array() you are seeing in your call graph. You can copy its value in the usual way, i.e. a[:] = b, and it would be reflected in the property map. (This is because the property map does not keep a numpy array internally; it creates one on the fly when get_array() is called, which points to the internal buffer. This is because property maps can grow, whereas numpy arrays cannot.) I guess another option would be for you to keep arrays the whole time (maybe even a big multidimensional one), not property maps. You could transform them into property maps only when necessary.
I have seen gt.inline, however there is practically no documentation. Could that be a way to speed this up, and if so, how can I learn about the usage?
This would not help much, unless you move the entire loops to C++. I plan to expand this part of the documentation at some point when time permits. Best, tiago -- Tiago de Paula Peixoto <tiago@skewed.de>