Running all_shortest_path on large graph
Hi, I have a large graph (800K vertices and 6e6 edges) and I am running the all_shortest_paths function with edge weights. The problem is that due to the size of the graph, there are probably millions of paths that qualify and the function fails with a "MemoryError" code. Is there a way to limit the number of shortest paths returned, or even better a version that returns an iterator and compute the paths as needed? Thanks -- Sent from: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/
On 11.01.2018 01:05, Evangelos Petsalis wrote:
Hi,
I have a large graph (800K vertices and 6e6 edges) and I am running the all_shortest_paths function with edge weights. The problem is that due to the size of the graph, there are probably millions of paths that qualify and the function fails with a "MemoryError" code.
Is there a way to limit the number of shortest paths returned, or even better a version that returns an iterator and compute the paths as needed?
The function all_shortest_paths() *DOES* return an iterator to all the paths! It does *not* store them all in memory. You must be doing some list conversion... -- Tiago de Paula Peixoto <tiago@skewed.de>
I understand that it returns an iterator, but I assume it still does some preparatory work internally that potentially allocates memory. I don't do any list conversion. I simply call the function without even assigning the output to any local variable. The error message I get is as follow: Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2862, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-3-07c125e74096>", line 1, in <module> all_shortest_paths(g, v_start, v_dest, ep["sp_weight"], epsilon=10) File "/usr/lib/python3/dist-packages/graph_tool/topology/__init__.py", line 1978, in all_shortest_paths _prop("v", g, all_preds_map)) MemoryError On Tue, Jan 16, 2018 at 10:28 PM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
On 11.01.2018 01:05, Evangelos Petsalis wrote:
Hi,
I have a large graph (800K vertices and 6e6 edges) and I am running the all_shortest_paths function with edge weights. The problem is that due to the size of the graph, there are probably millions of paths that qualify and the function fails with a "MemoryError" code.
Is there a way to limit the number of shortest paths returned, or even better a version that returns an iterator and compute the paths as needed?
The function all_shortest_paths() *DOES* return an iterator to all the paths! It does *not* store them all in memory. You must be doing some list conversion...
-- Tiago de Paula Peixoto <tiago@skewed.de>
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 18.01.2018 06:05, Evangelos Petsalis wrote:
I understand that it returns an iterator, but I assume it still does some preparatory work internally that potentially allocates memory. I don't do any list conversion. I simply call the function without even assigning the output to any local variable.
The error message I get is as follow:
Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2862, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-3-07c125e74096>", line 1, in <module> all_shortest_paths(g, v_start, v_dest, ep["sp_weight"], epsilon=10) File "/usr/lib/python3/dist-packages/graph_tool/topology/__init__.py", line 1978, in all_shortest_paths _prop("v", g, all_preds_map)) MemoryError
As usual, without a minimal and self-contained (i.e. complete) example that shows the problem, it is not possible to say anything. An error message without any context is not very useful. -- Tiago de Paula Peixoto <tiago@skewed.de>
Tiago, I start to believe that the error is not related to the size of the graph but rather to the epsilon parameter that is passed to the function all_shortest_paths(). Based on the documentation the epsilon parameter is defined as "Maximum relative difference between distances to be considered equal". I want to compute not only the best path but also paths that are "fairly" close to the best one, so I set epsilon=5 to have the function return multiple "best paths". This messes up the function it either creates a segmentation error, or it crashes the whole virtual machine. The following code creates a 5x5 graph where each edge that does not lie to the periphery of the square is connected to all neighboring edges with a weight of 1. The best path from vertex "3_1" to "1_2" is obvious the diagonal path which has a weight of 2. I was expecting that by setting epsilon=5, other paths would also be returned, for example (3_1, 3_2, 2_2, 1_2) that has weight of 4. Instead, the function crashes and returns the following error: Traceback (most recent call last): File "<input>", line 3, in <module> File "/home/labuser/anaconda3/envs/py35/lib/python3.5/site-packages/graph_tool/topology/__init__.py", line 1978, in all_shortest_paths _prop("v", g, all_preds_map)) MemoryError I probably misuse the epsilon parameter, but is there a way to achieve what I described above? The source code that recreates the problem can be found at the end of the email. I am using version 2.26 on an Ubuntu 17.10 virtual machine. Thanks in advance for the help, Vaggelis [image: Inline image 2] import gi import numpy as np gi.require_version('Gtk', '3.0') from graph_tool.all import * import graph_tool as gt from graph_tool.topology import * import tqdm import math g = gt.Graph(directed=True) dict_v_name_to_v = {} g.vertex_properties["name"] = g.new_vertex_property("string") g.vertex_properties["pos"] = g.new_vertex_property("vector<float>") g.edge_properties["name"] = g.new_edge_property("string") # name of edge, if any comes from csf g.edge_properties["text"] = g.new_edge_property("string") # name of edge, if any comes from csf g.edge_properties["elev_gain"] = g.new_edge_property("double") # name of edge, if any comes from csf vp = g.vertex_properties ep = g.edge_properties N_LON=5 N_LAT=5 # Vertices print("\r\nCreating Vertices") for i in range(0, N_LAT): for j in range(0, N_LON): v = g.add_vertex() v_name = str(i) + '_' + str(j) dict_v_name_to_v[v_name] = v vp["name"][v] = v_name vp['pos'][v] = [float(j), float(i)] # Edges print("\r\nCreating Edges") for j in range(1, N_LON - 1): # long jm1 = j - 1 jp1 = j + 1 for i in range(1, N_LAT - 1): # lat ip1 = i + 1 im1 = i - 1 im1_s, i_s, ip1_s = [str(el) for el in (im1, i, ip1)] jm1_s, j_s, jp1_s = [str(el) for el in (jm1, j, jp1)] v_i_jm1 = dict_v_name_to_v['_'.join([i_s, jm1_s])] v_i_j = dict_v_name_to_v['_'.join([i_s, j_s])] v_i_jp1 = dict_v_name_to_v['_'.join([i_s, jp1_s])] v_ip1_jm1 = dict_v_name_to_v['_'.join([ip1_s, jm1_s])] v_ip1_j = dict_v_name_to_v['_'.join([ip1_s, j_s])] v_ip1_jp1 = dict_v_name_to_v['_'.join([ip1_s, jp1_s])] v_im1_jm1 = dict_v_name_to_v['_'.join([im1_s, jm1_s])] v_im1_j = dict_v_name_to_v['_'.join([im1_s, j_s])] v_im1_jp1 = dict_v_name_to_v['_'.join([im1_s, jp1_s])] e_i_jm1 = g.add_edge(v_i_j, v_i_jm1) e_i_jp1 = g.add_edge(v_i_j, v_i_jp1) e_ip1_jm1 = g.add_edge(v_i_j, v_ip1_jm1) e_ip1_j = g.add_edge(v_i_j, v_ip1_j) e_ip1_jp1 = g.add_edge(v_i_j, v_ip1_jp1) e_im1_jm1 = g.add_edge(v_i_j, v_im1_jm1) e_im1_j = g.add_edge(v_i_j, v_im1_j) e_im1_jp1 = g.add_edge(v_i_j, v_im1_jp1) ep["elev_gain"].a = 1 for e in g.edges(): ep["text"][e] = str(ep["elev_gain"][e]) # name of edge, if any comes from csf graph_draw(g, pos=g.vertex_properties['pos'], vertex_text=g.vertex_properties["name"], edge_text=ep["text"], update_layout=False) v_start = dict_v_name_to_v['3_1'] v_dest = dict_v_name_to_v['1_3'] ############################# # epsilon=0.0005 WORKS FINE ############################# all_paths = all_shortest_paths(g, v_start, v_dest, weights=ep["elev_gain"], epsilon=.0005) print(list(all_paths)) ############################# # epsilon=5 CRASHES ############################# all_paths = all_shortest_paths(g, v_start, v_dest, weights=ep["elev_gain"], epsilon=5) print(list(all_paths)) On Wed, Jan 17, 2018 at 9:07 PM, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
On 18.01.2018 06:05, Evangelos Petsalis wrote:
I understand that it returns an iterator, but I assume it still does some preparatory work internally that potentially allocates memory. I don't do any list conversion. I simply call the function without even assigning the output to any local variable.
The error message I get is as follow:
Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/IPython/core/ interactiveshell.py", line 2862, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-3-07c125e74096>", line 1, in <module> all_shortest_paths(g, v_start, v_dest, ep["sp_weight"], epsilon=10) File "/usr/lib/python3/dist-packages/graph_tool/topology/__init__.py", line 1978, in all_shortest_paths _prop("v", g, all_preds_map)) MemoryError
As usual, without a minimal and self-contained (i.e. complete) example that shows the problem, it is not possible to say anything. An error message without any context is not very useful.
-- Tiago de Paula Peixoto <tiago@skewed.de>
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 26.01.2018 18:55, Evangelos Petsalis wrote:
Tiago,
I start to believe that the error is not related to the size of the graph but rather to the epsilon parameter that is passed to the function all_shortest_paths(). Based on the documentation the epsilon parameter is defined as "Maximum relative difference between distances to be considered equal". I want to compute not only the best path but also paths that are "fairly" close to the best one, so I set epsilon=5 to have the function return multiple "best paths". This messes up the function it either creates a segmentation error, or it crashes the whole virtual machine.
The following code creates a 5x5 graph where each edge that does not lie to the periphery of the square is connected to all neighboring edges with a weight of 1. The best path from vertex "3_1" to "1_2" is obvious the diagonal path which has a weight of 2. I was expecting that by setting epsilon=5, other paths would also be returned, for example (3_1, 3_2, 2_2, 1_2) that has weight of 4. Instead, the function crashes and returns the following error: Traceback (most recent call last): File "<input>", line 3, in <module> File "/home/labuser/anaconda3/envs/py35/lib/python3.5/site-packages/graph_tool/topology/__init__.py", line 1978, in all_shortest_paths _prop("v", g, all_preds_map)) MemoryError
I probably misuse the epsilon parameter, but is there a way to achieve what I described above? The source code that recreates the problem can be found at the end of the email.
I am using version 2.26 on an Ubuntu 17.10 virtual machine.
Sorry for taking so long to reply. Indeed, you are not supposed to use the epsilon parameters in this way... If it is too large, it adds loops to the shortest path tree, creating an infinite loop in the algorithm. The only way to achieve what you want is to use all_paths() instead, and filter the length range you want. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (2)
-
Evangelos Petsalis -
Tiago de Paula Peixoto