Your point on the initialization is perfectly clear, thanks. Could libgraph_tool_topology.get_dists suffer from the same memory re-allocation problems? If not, what could explain the execution time increase (approx. 23x) observed for libgraph_tool_topology.get_dists? I'll try disabling openmp.
Best, François.
2015-03-10 16:33 GMT+01:00 Tiago Peixoto [via Main discussion list for the graph-tool project]:
On 10.03.2015 15:39, François wrote:
Hello,
Firstly, thanks for pushing this update out so quickly, that's awesome. I tested your fix; it seems the vector initialization was not the only thing to blame. With the fix, the execution time is still around 300 ms.
I checked that your fix is used, and "timed" the execution of the shortest_distance function up to the call to libgraph_tool_topology.get_dists.
In the "sub-sample" graph:
* the initialization takes on average 3 ms;
* the call to libgraph_tool_topology.get_dists takes on average 11 ms.
In the full graph:
* the initialization takes on average 100 ms;
* the call to libgraph_tool_topology.get_dists takes on average 230 ms.

Do you have openmp enabled? If yes, it is best to disable it with openmp_set_num_threads(1) before running the code (a minimal example is sketched after the notes below).
Note, however, that you will always get slower code for a larger graph... A 10x slow-down in each function is expected if your graph is 10x larger.
Note also that the distance initialization is done inside get_dists().
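For reference, a minimal sketch of the OpenMP suggestion above (the lattice graph is only a stand-in for the actual graph, and the max_dist value is illustrative):

from graph_tool import openmp_set_num_threads
from graph_tool.generation import lattice
from graph_tool.topology import shortest_distance

openmp_set_num_threads(1)          # disable OpenMP parallelism before timing
g = lattice([1000, 1000])          # stand-in graph; use the real graph here
dist = shortest_distance(g, source=g.vertex(0), max_dist=10)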
Surprisingly (or not?), when I change the pmap initialization [link to source <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1248>] to:
vals = numpy.arange(g.num_vertices(), dtype='int')
pmap = g.new_vertex_property("int64_t", vals=vals)
The numpy call takes 36 ms and the creation of the new vertex property takes 103 ms.
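For reference, one way such per-step timings might be obtained outside of IPython (a sketch only; the lattice graph is a stand-in for the actual graph):

import time
import numpy
from graph_tool.generation import lattice

g = lattice([1000, 10000])                       # stand-in for the full graph
t0 = time.time()
vals = numpy.arange(g.num_vertices(), dtype='int')
t1 = time.time()
pmap = g.new_vertex_property("int64_t", vals=vals)
t2 = time.time()
print("numpy.arange:        %.1f ms" % ((t1 - t0) * 1e3))
print("new_vertex_property: %.1f ms" % ((t2 - t1) * 1e3))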
These are very tight loops, so things are very sensitive to cache hits, memory re-allocation issues and so on. For instance, here I get:
In [8]: g = lattice([1000, 10000])
In [9]: g
Out[9]: <Graph object, undirected, with 10000000 vertices and 19989000 edges at 0x7fad583720b8>

In [10]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
Wall time: 19.6 ms

In [11]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 13.3 ms, sys: 960 ms, total: 973 ms
Wall time: 976 ms

In [12]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
Wall time: 32.6 ms

In [13]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 3.33 ms, sys: 950 ms, total: 953 ms
Wall time: 954 ms

In [14]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 10 ms, total: 30 ms
Wall time: 30 ms

In [15]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 43.3 ms, total: 63.3 ms
Wall time: 62.1 ms

In [16]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 23.3 ms, sys: 6.67 ms, total: 30 ms
Wall time: 28.9 ms

In [17]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 13.3 ms, sys: 173 ms, total: 187 ms
Wall time: 186 ms

In [18]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
Wall time: 30.7 ms

In [19]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
CPU times: user 6.67 ms, sys: 853 ms, total: 860 ms
Wall time: 858 ms
Notice how the timing fluctuates wildly from ~30ms to ~900ms!
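One way to see past those one-off spikes is to repeat the measurement and look at the spread, e.g. with timeit (a sketch only; the repeat count is arbitrary):

import timeit
import numpy
from graph_tool.generation import lattice

g = lattice([1000, 10000])
n = g.num_vertices()
# number=1 times each run individually; the min filters out the occasional
# page-fault / re-allocation spikes, while the max shows how large they get
times = timeit.repeat(lambda: numpy.arange(n, dtype='int'), number=1, repeat=20)
print("min %.1f ms, max %.1f ms" % (min(times) * 1e3, max(times) * 1e3))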
With the property map creation, things are similar:
In [20]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 50 ms, sys: 0 ns, total: 50 ms
Wall time: 48.2 ms

In [21]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 23.3 ms, sys: 20 ms, total: 43.3 ms
Wall time: 43.5 ms

In [22]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 40 ms, sys: 3.33 ms, total: 43.3 ms
Wall time: 42.3 ms

In [23]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 30 ms, sys: 490 ms, total: 520 ms
Wall time: 519 ms

In [24]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 50 ms, sys: 6.67 ms, total: 56.7 ms
Wall time: 56.6 ms

In [25]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 26.7 ms, sys: 1.12 s, total: 1.15 s
Wall time: 1.14 s

In [26]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 43.3 ms, sys: 13.3 ms, total: 56.7 ms
Wall time: 54.6 ms

In [27]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
CPU times: user 20 ms, sys: 1.11 s, total: 1.13 s
Wall time: 1.12 s
These jumps in time are probably just memory re-allocations.
The actual initialization itself is basically the same when using a numpy array or a property map:
In [33]: %time pmap.a[:] = 10
CPU times: user 30 ms, sys: 0 ns, total: 30 ms
Wall time: 20.7 ms

In [34]: %time pmap.a[:] = 10
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 18.9 ms

In [35]: %time pmap.a[:] = 10
CPU times: user 20 ms, sys: 0 ns, total: 20 ms
Wall time: 20.1 ms

In [36]: %time vals[:] = 10
CPU times: user 16.7 ms, sys: 3.33 ms, total: 20 ms
Wall time: 18.3 ms

In [37]: %time vals[:] = 10
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 21.7 ms

In [38]: %time vals[:] = 10
CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
Wall time: 15 ms
Therefore I'm not really sure there is any issue with the initialization.
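A small sketch of why the two writes are comparable (assuming only that .a exposes the property map's values as a numpy array over its underlying storage, as used above):

import numpy
from graph_tool.generation import lattice

g = lattice([100, 100])                          # small graph, for illustration
pmap = g.new_vertex_property("int64_t")
vals = numpy.zeros(g.num_vertices(), dtype='int64')

# pmap.a is a numpy array over the property map's storage, so filling it
# is the same kind of contiguous write as filling a plain numpy array
pmap.a[:] = 10
vals[:] = 10
assert (pmap.a == vals).all()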
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>