Your point on the initialization is perfectly clear, thanks.
Could libgraph_tool_topology.get_dists suffer from the same memory re-allocation problems? If not, what could explain the execution time increase (approx. 23x) observed for libgraph_tool_topology.get_dists?
I'll try disabling OpenMP.
Best, François.
2015-03-10 16:33 GMT+01:00 Tiago Peixoto:
On 10.03.2015 15:39, François wrote:
Hello,
First of all, thanks for pushing this update out so quickly, that's awesome. I tested your fix; it seems that the vector initialization was not the only thing to blame. With the fix, the execution time is still around 300ms.
I checked that your fix is used, and timed the execution of the *shortest_distance* function up to the call to *libgraph_tool_topology.get_dists*.
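For reference, the timing was done roughly like this (a sketch: g, the source vertex and max_dist=3 are placeholders for my actual graph and cutoff):

import time
from graph_tool.all import shortest_distance

t0 = time.perf_counter()
dist = shortest_distance(g, source=g.vertex(0), max_dist=3)
print("shortest_distance: %.1f ms" % ((time.perf_counter() - t0) * 1e3))
# this times the whole call; separating the Python-side initialization
# from the C++ get_dists() call means adding timers inside
# shortest_distance itself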
In the "sub-sample" graph:
- the initialization takes in average 3ms;
- the the call to libgraph_tool_topology.get_dists* *takes in average
11ms.
In the full graph:
- the initialization takes in average 100ms;
- the call to libgraph_tool_topology.get_dists* *takes in average
230ms. Do you have openmp enabled? If yes, it is best to disable it with openmp_set_num_threads(1) before running the code.
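That is, something like this before the timed calls (a sketch; that the helper is importable from the top-level graph_tool module is an assumption):

import graph_tool

# run single-threaded, so that OpenMP thread start-up and
# synchronization do not distort the timings of these very short calls
# (openmp_set_num_threads at the graph_tool module level is assumed)
graph_tool.openmp_set_num_threads(1)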
Note, however, that you will always get slower code for a larger graph... A 10x slow-down in each function is expected if your graph is 10x larger.
Note also that the distance initialization is done inside get_dists().
Surprisingly (or not?), when I change the pmap initialization [link to source: https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/...] to:

vals = numpy.arange(g.num_vertices(), dtype='int')
pmap = g.new_vertex_property("int64_t", vals=vals)
The numpy call takes 36ms and the creation of the new vertex property takes 103ms.
These are very tight loops, so things are very sensitive to cache hits, memory re-allocation issues and so on. For instance, here I get:
In [8]: g = lattice([1000, 10000])

In [9]: g
Out[9]: <Graph object, undirected, with 10000000 vertices and 19989000 edges at 0x7fad583720b8>

In [10]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
Wall time: 19.6 ms

In [11]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 13.3 ms, sys: 960 ms, total: 973 ms
Wall time: 976 ms

In [12]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
Wall time: 32.6 ms

In [13]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 3.33 ms, sys: 950 ms, total: 953 ms
Wall time: 954 ms

In [14]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 10 ms, total: 30 ms
Wall time: 30 ms

In [15]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 43.3 ms, total: 63.3 ms
Wall time: 62.1 ms

In [16]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 23.3 ms, sys: 6.67 ms, total: 30 ms
Wall time: 28.9 ms

In [17]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 13.3 ms, sys: 173 ms, total: 187 ms
Wall time: 186 ms

In [18]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
Wall time: 30.7 ms

In [19]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 6.67 ms, sys: 853 ms, total: 860 ms
Wall time: 858 ms
Notice how the timing fluctuates wildly from ~30ms to ~900ms!
With the property map creation, things are similar:
In [20]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 50 ms, sys: 0 ns, total: 50 ms
Wall time: 48.2 ms

In [21]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 23.3 ms, sys: 20 ms, total: 43.3 ms
Wall time: 43.5 ms

In [22]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 40 ms, sys: 3.33 ms, total: 43.3 ms
Wall time: 42.3 ms

In [23]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 30 ms, sys: 490 ms, total: 520 ms
Wall time: 519 ms

In [24]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 50 ms, sys: 6.67 ms, total: 56.7 ms
Wall time: 56.6 ms

In [25]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 26.7 ms, sys: 1.12 s, total: 1.15 s
Wall time: 1.14 s

In [26]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 43.3 ms, sys: 13.3 ms, total: 56.7 ms
Wall time: 54.6 ms

In [27]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 20 ms, sys: 1.11 s, total: 1.13 s
Wall time: 1.12 s
These jumps in time are probably just memory re-allocations.
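That would also explain why the extra time shows up as sys time: the allocation itself is lazy, and the kernel only commits the pages when they are first written to. A quick way to see this (a sketch, with N matching the lattice above):

import time
import numpy

N = 10000000

t0 = time.perf_counter()
a = numpy.empty(N, dtype='int64')   # virtual allocation: near-instant
t1 = time.perf_counter()
a[:] = 0                            # first touch: the kernel faults the
t2 = time.perf_counter()            # pages in, which is where the sys
                                    # time goes
print("alloc: %.1f ms, first touch: %.1f ms"
      % ((t1 - t0) * 1e3, (t2 - t1) * 1e3))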
The actual initialization itself is basically the same whether you use a numpy array or a property map, since pmap.a is just a view on the property map's underlying storage:
In [33]: %time pmap.a[:] = 10
CPU times: user 30 ms, sys: 0 ns, total: 30 ms
Wall time: 20.7 ms

In [34]: %time pmap.a[:] = 10
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 18.9 ms

In [35]: %time pmap.a[:] = 10
CPU times: user 20 ms, sys: 0 ns, total: 20 ms
Wall time: 20.1 ms

In [36]: %time vals[:] = 10
CPU times: user 16.7 ms, sys: 3.33 ms, total: 20 ms
Wall time: 18.3 ms

In [37]: %time vals[:] = 10
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 21.7 ms

In [38]: %time vals[:] = 10
CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
Wall time: 15 ms
Therefore I'm not really sure there is any issue with the initialization.
Best, Tiago
--
Tiago de Paula Peixoto <tiago@skewed.de>