Hello, I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start. def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache. :type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed. :rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest() I package a MWE, it is available here : https://drive.google.com/file/d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharin.... To run it simply do : tar xzf mwe.tar.gz # run the buggy version python3 -m mwe DO_HASH # run as expected python3 -m mwe The buggy output looks like : <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 The expected output looks like : <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ... How could I explain this behavior ? Bests, François.
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior. Bests, François. import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target) # We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target)) gtt.shortest_distance(self.graph, source=source, weights=self.graph.ep['weight'], max_dist=1400, pred_map=True) # We end the work. print('{} done.'.format(self)) def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache. :type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed. :rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest() if __name__ == '__main__': # Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt.gz') # Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph) # Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)] # Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)] for proc in procs: proc.start() for proc in procs: proc.join() On Thu, Nov 10, 2016 at 7:24 PM, François Kawala <francois.kawala@gmail.com> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/d/ 0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
As a side note, the MWE I provide calls graph_tool.topology.shortest_path without specifying the target., but the logs refer to the target, this is counterintuitive, sorry about that. However, the behavior is as described above when one does specify the target to graph_tool.topology.shortest_path. François On Sun, Nov 13, 2016 at 6:03 PM, François Kawala <francois.kawala@gmail.com> wrote:
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior.
Bests, François.
import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys
class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test
def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target)
# We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target))
gtt.shortest_distance(self.graph, source=source, weights=self.graph.ep['weight'], max_dist=1400, pred_map=True)
# We end the work. print('{} done.'.format(self))
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
if __name__ == '__main__':
# Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt.gz')
# Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph)
# Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)]
# Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)]
for proc in procs: proc.start()
for proc in procs: proc.join()
On Thu, Nov 10, 2016 at 7:24 PM, François Kawala < francois.kawala@gmail.com> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/ d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
-- François Kawala
I build the latest version *with debug enabled and openmp disable*, the bug is not present anymore. version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -ggdb3 -Wall -Wextra -ftemplate-backtrace-limit=0 -Wno-unknown-pragmas install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: False* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 For the record, the version below has the bug : version: *2.18* (commit 37996a0a, Thu Jul 21 17:56:14 2016 +0200) gcc version: *5.3.1* compilation flags: -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/python3.5m -I/usr/include -I/usr/lib/python3/dist-packages/numpy/core/include -Wall -Wextra -ftemplate-backtrace-limit=0 -flto=4 -ffunction-sections -fdata-sections --std=gnu++14 -DNDEBUG -ftemplate-depth-250 -Wno-deprecated -Wno-unknown-pragmas -O3 -fvisibility=default -fvisibility-inlines-hidden -fopenmp -Wl,--gc-sections install prefix: /usr python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 I'm building the head version with openmp enabled to know if it might be involved in the bug. François. On Sun, Nov 13, 2016 at 6:35 PM, François Kawala <francois.kawala@gmail.com> wrote:
As a side note, the MWE I provide calls graph_tool.topology.shortest_path without specifying the target., but the logs refer to the target, this is counterintuitive, sorry about that. However, the behavior is as described above when one does specify the target to graph_tool.topology.shortest_p ath.
François
On Sun, Nov 13, 2016 at 6:03 PM, François Kawala < francois.kawala@gmail.com> wrote:
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior.
Bests, François.
import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys
class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test
def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target)
# We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target))
gtt.shortest_distance(self.graph, source=source, weights=self.graph.ep['weight'], max_dist=1400, pred_map=True)
# We end the work. print('{} done.'.format(self))
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
if __name__ == '__main__':
# Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt.gz')
# Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph)
# Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)]
# Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)]
for proc in procs: proc.start()
for proc in procs: proc.join()
On Thu, Nov 10, 2016 at 7:24 PM, François Kawala < francois.kawala@gmail.com> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/ d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
-- François Kawala
-- François Kawala
I confirm I reproduce the bug with the version below : version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fopenmp -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -DNDEBUG -Wall -Wextra -ftemplate-backtrace-limit=0 install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 François. On Mon, Nov 14, 2016 at 3:12 PM, François Kawala <francois.kawala@gmail.com> wrote:
I build the latest version *with debug enabled and openmp disable*, the bug is not present anymore.
version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -ggdb3 -Wall -Wextra -ftemplate-backtrace-limit=0 -Wno-unknown-pragmas install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: False* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
For the record, the version below has the bug :
version: *2.18* (commit 37996a0a, Thu Jul 21 17:56:14 2016 +0200) gcc version: *5.3.1* compilation flags: -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/python3.5m -I/usr/include -I/usr/lib/python3/dist-packages/numpy/core/include -Wall -Wextra -ftemplate-backtrace-limit=0 -flto=4 -ffunction-sections -fdata-sections --std=gnu++14 -DNDEBUG -ftemplate-depth-250 -Wno-deprecated -Wno-unknown-pragmas -O3 -fvisibility=default -fvisibility-inlines-hidden -fopenmp -Wl,--gc-sections install prefix: /usr python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
I'm building the head version with openmp enabled to know if it might be involved in the bug.
François.
On Sun, Nov 13, 2016 at 6:35 PM, François Kawala < francois.kawala@gmail.com> wrote:
As a side note, the MWE I provide calls graph_tool.topology.shortest_path without specifying the target., but the logs refer to the target, this is counterintuitive, sorry about that. However, the behavior is as described above when one does specify the target to graph_tool.topology.shortest_p ath.
François
On Sun, Nov 13, 2016 at 6:03 PM, François Kawala < francois.kawala@gmail.com> wrote:
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior.
Bests, François.
import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys
class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test
def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target)
# We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target))
gtt.shortest_distance(self.graph, source=source, weights=self.graph.ep['weight'], max_dist=1400, pred_map=True)
# We end the work. print('{} done.'.format(self))
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
if __name__ == '__main__':
# Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt.gz')
# Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph)
# Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)]
# Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)]
for proc in procs: proc.start()
for proc in procs: proc.join()
On Thu, Nov 10, 2016 at 7:24 PM, François Kawala < francois.kawala@gmail.com> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/ d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
-- François Kawala
-- François Kawala
-- François Kawala
To disable openmp at the runtime with : OMP_NUM_THREADS=1 python3 -m mwe DO_HASH Also works. F. On Mon, Nov 14, 2016 at 4:42 PM, François Kawala <francois.kawala@gmail.com> wrote:
I confirm I reproduce the bug with the version below :
version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fopenmp -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -DNDEBUG -Wall -Wextra -ftemplate-backtrace-limit=0 install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
François.
On Mon, Nov 14, 2016 at 3:12 PM, François Kawala < francois.kawala@gmail.com> wrote:
I build the latest version *with debug enabled and openmp disable*, the bug is not present anymore.
version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -ggdb3 -Wall -Wextra -ftemplate-backtrace-limit=0 -Wno-unknown-pragmas install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: False* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
For the record, the version below has the bug :
version: *2.18* (commit 37996a0a, Thu Jul 21 17:56:14 2016 +0200) gcc version: *5.3.1* compilation flags: -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/python3.5m -I/usr/include -I/usr/lib/python3/dist-packages/numpy/core/include -Wall -Wextra -ftemplate-backtrace-limit=0 -flto=4 -ffunction-sections -fdata-sections --std=gnu++14 -DNDEBUG -ftemplate-depth-250 -Wno-deprecated -Wno-unknown-pragmas -O3 -fvisibility=default -fvisibility-inlines-hidden -fopenmp -Wl,--gc-sections install prefix: /usr python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
I'm building the head version with openmp enabled to know if it might be involved in the bug.
François.
On Sun, Nov 13, 2016 at 6:35 PM, François Kawala < francois.kawala@gmail.com> wrote:
As a side note, the MWE I provide calls graph_tool.topology.shortest_path without specifying the target., but the logs refer to the target, this is counterintuitive, sorry about that. However, the behavior is as described above when one does specify the target to graph_tool.topology.shortest_p ath.
François
On Sun, Nov 13, 2016 at 6:03 PM, François Kawala < francois.kawala@gmail.com> wrote:
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior.
Bests, François.
import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys
class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test
def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target)
# We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target))
gtt.shortest_distance(self.graph, source=source, weights=self.graph.ep['weight'], max_dist=1400, pred_map=True)
# We end the work. print('{} done.'.format(self))
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
if __name__ == '__main__':
# Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt.gz')
# Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph)
# Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)]
# Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)]
for proc in procs: proc.start()
for proc in procs: proc.join()
On Thu, Nov 10, 2016 at 7:24 PM, François Kawala < francois.kawala@gmail.com> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args: graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes()) graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/ d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
-- François Kawala
-- François Kawala
-- François Kawala
-- François Kawala
I don't think this has anything to do with graph-tool. Mixing multiprocessing (i.e. fork()) with openmp (i.e. threads) is a bad idea, and in general does not work, since it leaves mutexes in an inconsistent state. From Python's multiprocessing documentation: The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. You might have some luck with the other spawning modes. Best, Tiago On 15.11.2016 17:56, François Kawala wrote:
To disable openmp at the runtime with :
OMP_NUM_THREADS=1 python3 -m mwe DO_HASH
Also works. F.
On Mon, Nov 14, 2016 at 4:42 PM, François Kawala <francois.kawala@gmail.com <mailto:francois.kawala@gmail.com>> wrote:
I confirm I reproduce the bug with the version below :
version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fopenmp -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -DNDEBUG -Wall -Wextra -ftemplate-backtrace-limit=0 install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
François.
On Mon, Nov 14, 2016 at 3:12 PM, François Kawala <francois.kawala@gmail.com <mailto:francois.kawala@gmail.com>> wrote:
I build the latest version *with debug enabled and openmp disable*, the bug is not present anymore.
version: 2.19 (commit da041f33, Sat Nov 12 17:27:48 2016 +0100) gcc version: 5.4.0 compilation flags: -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include -I/usr/include/python3.5m -fvisibility=default -fvisibility-inlines-hidden -O3 -Wno-deprecated -Wno-unknown-pragmas -ftemplate-depth-250 -ggdb3 -Wall -Wextra -ftemplate-backtrace-limit=0 -Wno-unknown-pragmas install prefix: /usr/local python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: False* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
For the record, the version below has the bug :
version: *2.18* (commit 37996a0a, Thu Jul 21 17:56:14 2016 +0200) gcc version: *5.3.1* compilation flags: -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/include/python3.5m -I/usr/include -I/usr/lib/python3/dist-packages/numpy/core/include -Wall -Wextra -ftemplate-backtrace-limit=0 -flto=4 -ffunction-sections -fdata-sections --std=gnu++14 -DNDEBUG -ftemplate-depth-250 -Wno-deprecated -Wno-unknown-pragmas -O3 -fvisibility=default -fvisibility-inlines-hidden -fopenmp -Wl,--gc-sections install prefix: /usr python dir: /usr/lib/python3/dist-packages graph filtering: True *openmp: True* uname: Linux tungstene 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64
I'm building the head version with openmp enabled to know if it might be involved in the bug.
François.
On Sun, Nov 13, 2016 at 6:35 PM, François Kawala <francois.kawala@gmail.com <mailto:francois.kawala@gmail.com>> wrote:
As a side note, the MWE I provide calls graph_tool.topology.shortest_path without specifying the target., but the logs refer to the target, this is counterintuitive, sorry about that. However, the behavior is as described above when one does specify the target to graph_tool.topology.shortest_path.
François
On Sun, Nov 13, 2016 at 6:03 PM, François Kawala <francois.kawala@gmail.com <mailto:francois.kawala@gmail.com>> wrote:
To be comprehensive, I add here the MWE source. Note that I fixed the vertices so that the output is reproducible. However, one could select randomly the vertices and would end with the same behavior.
Bests, François.
import multiprocessing import graph_tool as gt import graph_tool.topology as gtt import hashlib import sys
class MyProcess(multiprocessing.Process): """ A process that computes shortest paths and shortest distances in a graph tool graph. """ def __init__(self, graph, test): super(MyProcess, self).__init__() self.graph = graph self.test = test
def run(self): while True: # Operation is repeated so that the bug is cristal clear. source, target = self.test source = self.graph.vertex(source) target = self.graph.vertex(target)
# We start the work. print('{} does shortest_distance from {} to {}'.format(self, source, target))
gtt.shortest_distance(self.graph, source=source,
weights=self.graph.ep['weight'], max_dist=1400, pred_map=True)
# We end the work. print('{} done.'.format(self))
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args:
graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes())
graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
if __name__ == '__main__':
# Unserialize the graph. graph = gt.load_graph('./mwe/graph.gt <http://graph.gt>.gz')
# Bug switch. if sys.argv[-1] == 'DO_HASH': graph_hash = hash_graphs(graph)
# Repetable inputs. tests = [(452946, 391015), (266188, 207342), (514127, 290838), (439705, 87897), (223098, 440593), (279880, 368550), (108357, 199593), (273888, 275937)]
# Actual work. procs = [MyProcess(graph, tests[i]) for i in range(8)]
for proc in procs: proc.start()
for proc in procs: proc.join()
On Thu, Nov 10, 2016 at 7:24 PM, François Kawala <francois.kawala@gmail.com <mailto:francois.kawala@gmail.com>> wrote:
Hello,
I observe a quite strange bug that involves python's multiprocessing library. I try to use (read only) one graph instance with several *Multithreading.Process*. The graph is unserialized in the parent process. Each child receives a reference to the graph. Then each child does simple repetitive calls to *graph_tool.topology.shortest_distance*. Everything great each child process works as fast as it can. However when the main process executes the *hash_graphs* function presented below, each child process hangs infinitely. The *hash_graphs* is executed prior to the children start.
def hash_graphs(*args): """ Provides an edge based graph digest that can be used to invalidate old cache.
:type args: tuple of :class:`graph_tool.GraphView` :param args: the graphs to be hashed.
:rtype: str :return: a hash digest of the input graph. """ graph_hash = hashlib.md5() for graph in args:
graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "source").a.tobytes())
graph_hash.update(gt.edge_endpoint_property(graph, graph.vp['id'], "target").a.tobytes()) return graph_hash.hexdigest()
I package a MWE, it is available here : https://drive.google.com/file/d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharin... <https://drive.google.com/file/d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing>. To run it simply do :
tar xzf mwe.tar.gz
# run the buggy version python3 -m mwe DO_HASH
# run as expected python3 -m mwe
The buggy output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
The expected output looks like :
<MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-6, started)> done. <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550 <MyProcess(MyProcess-4, started)> done. <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-2, started)> done. <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 <MyProcess(MyProcess-5, started)> done. <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593 <MyProcess(MyProcess-1, started)> done. <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015 <MyProcess(MyProcess-8, started)> done. <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937 <MyProcess(MyProcess-7, started)> done. <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593 <MyProcess(MyProcess-3, started)> done. <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838 ...
How could I explain this behavior ?
Bests, François.
-- François Kawala
-- François Kawala
-- François Kawala
-- François Kawala
-- François Kawala
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
-- Tiago de Paula Peixoto <tiago@skewed.de>
participants (2)
-
François Kawala -
Tiago de Paula Peixoto