When using graph-tool together with joblib, do we need to pass graph.copy() in the "Parallel" call, as in the code below, when filtering vertices with .set_vertex_filter()? graph.copy() makes memory usage extreme on large graphs (2M vertices, 4M edges), but as I understand it, it avoids any concurrency problems. (Or is passing 'graph' without '.copy()' OK?)
What is the best way to run parallel graph searches and filtering (different vertex per thread) with graph-tool and joblib? (or without joblib)
###
# defined and filled earlier
g_graph = graph_tool.Graph(directed=False)
eprop_ang = g_graph.new_edge_property("float")
###
from joblib import Parallel, delayed
import multiprocessing
import os
import tempfile
import shutil
import datetime
import numpy as np
import graph_tool
import graph_tool.search

path2 = tempfile.mkdtemp()
out_path2 = os.path.join(path2, 'z6path_out2.mmap')
out2 = np.memmap(out_path2, dtype=np.float32,
                 shape=(g_graph.num_vertices(), dims), mode='w+')

num_cores = 30
num_pre_workers = 60

def runparallel(graph, row, out2):
    dist, pred = graph_tool.search.dijkstra_search(graph, graph.vertex(row),
                                                   weight=eprop_ang)
    ## etc etc #####
    v_filter = graph.new_vertex_property('bool', val=False)
    for v in SOMETHING_LOCAL:
        v_filter[v] = True
    graph.set_vertex_filter(v_filter)
    # do something with the filtered 'graph' (subgraph)
    # and save output to out2
    out2[row] = RESULT
    ## graph.clear_filters()

Parallel(n_jobs=num_cores, pre_dispatch=num_pre_workers, verbose=1)(
    delayed(runparallel)(g_graph.copy(), r, out2)
    for r in range(g_graph.num_vertices()))
On 28.04.2018 at 16:26, Tasos wrote:
What is the best way to run parallel graph searches and filtering (different vertex per thread) with graph-tool and joblib? (or without joblib)
The best approach is to create a separate GraphView object for each filtering operation, instead of setting the filter on the main graph. Read about GraphViews here:
https://graph-tool.skewed.de/static/doc/quickstart.html#graph-views
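A minimal sketch of that approach, adapted from the worker function in the first message (SOMETHING_LOCAL, RESULT, eprop_ang and out2 are the placeholders and names from that post), might look like:

def runparallel(graph, row, out2):
    # build the boolean filter as before...
    v_filter = graph.new_vertex_property('bool', val=False)
    for v in SOMETHING_LOCAL:
        v_filter[v] = True
    # ...but apply it to a view; the main graph's filter state is never touched
    sub = graph_tool.GraphView(graph, vfilt=v_filter)
    # assumes vertex 'row' is among the vertices kept by the filter
    dist, pred = graph_tool.search.dijkstra_search(sub, sub.vertex(row),
                                                   weight=eprop_ang)
    out2[row] = RESULT

Since each GraphView is just a lightweight filtered view of the parent graph, the filters of different tasks cannot interfere with each other through the parent.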
Best, Tiago
Hi, I have the same question. Upon running code attempting to use GraphViews, I get an error during pickling:
error: 'i' format requires -2147483648 <= number <= 2147483647
More specifically, it looks like a line inside joblib is unhappy: CustomizablePickler(buffer, self._reducers).dump(obj)
And this takes us to a struct packing line: header = struct.pack("!i", n)
So, if I had to guess, I'd suspect joblib is trying to pickle the whole graph rather than the GraphView reference, or something like this. Was either of you able to get code to successfully parallelize using GraphViews to avoid copying?
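A rough way to check that hypothesis (assuming graphs and views pickle cleanly in the installed graph-tool version; the generated graph below is just an illustrative stand-in) is to compare pickled sizes:

import pickle
from graph_tool.all import GraphView, random_graph

g = random_graph(1000, lambda: 3, directed=False)  # small stand-in graph
print(len(pickle.dumps(g)))             # bytes for the full graph
print(len(pickle.dumps(GraphView(g))))  # a similar size would mean the view is
                                        # serialized as a full graph, not a reference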
This is on Python 3.6.5, with graph_tool version '2.26 (commit , )' and joblib version '0.11'.
Below is a minimal breaking example, if it helps. I am also happy to provide other information such as tracebacks.
def toy_func(g):
    return g.vertex_properties['skim'][0][1]

vmr = [0, 1]
g = load_graph(path)  # 22,000 vertex directed graph (a road network)
skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
g.properties['skim'] = skim_table
p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))
On 16.07.2018 at 03:34, cmos wrote:
Hi, I have the same question. Upon running code attempting to use GraphViews, I get an error during pickling:
error: 'i' format requires -2147483648 <= number <= 2147483647
More specifically, it looks like a line inside joblib is unhappy: CustomizablePickler(buffer, self._reducers).dump(obj)
And this takes us to a struct packing line: header = struct.pack("!i", n)
So, if I had to guess, I'd suspect joblib is trying to pickle the whole graph rather than the GraphView reference, or something like this. Was either of you able to get code to successfully parallelize using GraphViews to avoid copying?
It is impossible to say anything without a minimal and self-contained example that shows the problem.
Below is a minimal breaking example, if it helps. I am also happy to provide other information such as tracebacks.
def toy_func(g):
    return g.vertex_properties['skim'][0][1]

vmr = [0, 1]
g = load_graph(path)  # 22,000 vertex directed graph (a road network)
skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
g.properties['skim'] = skim_table
p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))
That is not a complete minimal example; the function 'p' is undefined and there are other errors. Please provide one that actually runs, and does not depend on external data.
Best, Tiago
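For reference, a self-contained skeleton along those lines might look like the sketch below. It swaps the external road network for a generated graph, replaces the undefined 'p' with an explicit joblib.Parallel call, and stores the distance table as an internal vertex property; the sizes are only illustrative, and reproducing the struct.pack overflow would require the pickled payload to exceed the 2 GiB limit implied by the error message.

import joblib
import numpy as np
from graph_tool.all import GraphView, random_graph, shortest_distance

# generated stand-in for the road network, so no external file is needed
g = random_graph(2000, lambda: (2, 2), directed=True)
weight = g.new_edge_property("double")
weight.a = np.random.rand(g.num_edges())
g.edge_properties["weight"] = weight

# all-pairs distances stored as an internal vertex property
skim = shortest_distance(g, weights=weight)
g.vertex_properties["skim"] = skim

def toy_func(gv):
    # distance from vertex 0 to vertex 1, read through the view
    return gv.vertex_properties["skim"][gv.vertex(0)][1]

if __name__ == "__main__":
    results = joblib.Parallel(n_jobs=2, verbose=1)(
        joblib.delayed(toy_func)(GraphView(g)) for i in range(10))
    print(results)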