I want to use graph-tool with the multiprocessing library in Python 2.6, but I keep running into issues when trying to share a full graph object. I can store the graph in a multiprocessing.Namespace, but it doesn't keep the dicts of properties. Example:
    def initNS( ns ):
        _g = Graph( directed = False)
        ns.graph = _g
        ns.edge_properties = {
            'genres': _g.new_edge_property("vector<string>"),
            'movieid': _g.new_edge_property("int"),
        }
        ns.vertex_properties = {
            'personid': _g.new_vertex_property("int32_t")
        }

        """
        Build property maps for edges and vertices to hold our data
        The graph vertices represent actors whereas movies represent edges
        """
        # Edges
        _g.edge_properties["genres"] = ns.edge_properties['genres']
        _g.edge_properties["movieid"] = ns.edge_properties['movieid']
        # Vertices
        _g.vertex_properties["personid"] = ns.vertex_properties['personid']

        ns.graph = _g
##########

This initializes 'ns', which is a multiprocessing.Namespace. The problem is that accessing, for example, ns.edge_properties[ * ] tells me that the type isn't pickle-able. I tried to skip that and use _g.edge_properties to access the maps instead, but those dicts aren't carried over to the other processes in the pool, presumably because they aren't pickle-able.
Any thoughts about how to fix this?
(For those interested, I'm attempting to use the IMDbPy library to do some graph analysis on the relationships among actors and movies. Each process has its own db connection and tries to populate the graph with actor and movie information in parallel, since it's a pretty large and dense graph: somewhere in the neighborhood of 250,000+ vertices for just a depth of three relationships.)
Thanks, -- Derek
On 12/03/2010 05:07 PM, Derek Ditch wrote:
Any thoughts about how to fix this?
The problem is that property maps cannot be pickled independently, since they need internal references to the graph object. However, the entire graph can be pickled, together with its internal properties (i.e. properties stored in g.vertex_properties and g.edge_properties). So, instead of keeping the properties in ns.edge_properties, for instance, you should keep them in ns.graph.edge_properties.
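The reasoning above can be pictured with a small toy model. This is not graph-tool code: `ToyGraph`, `ToyMap`, `internal_maps` and `new_map` are hypothetical stand-ins that use only the standard pickle module, and the sketch assumes nothing about how graph-tool implements pickling. It only illustrates why a map that depends on its owner fails to pickle on its own, while the same map registered inside the owner travels along with it.

```python
import pickle


class ToyMap(object):
    """Hypothetical stand-in for a property map: values plus a link to its owner."""

    def __init__(self, owner):
        self.owner = owner
        self.values = {}

    def __reduce__(self):
        # Mimic the behaviour described above: no pickling without the owner.
        raise TypeError("ToyMap cannot be pickled on its own")


class ToyGraph(object):
    """Hypothetical stand-in for a graph that owns its internal maps."""

    def __init__(self):
        self.internal_maps = {}

    def new_map(self):
        return ToyMap(self)

    def __getstate__(self):
        # Serialize only the plain values; the owner link is rebuilt on load.
        maps = dict((name, m.values) for name, m in self.internal_maps.items())
        return {"maps": maps}

    def __setstate__(self, state):
        self.internal_maps = {}
        for name, values in state["maps"].items():
            m = ToyMap(self)
            m.values = values
            self.internal_maps[name] = m


g = ToyGraph()
movieid = g.new_map()
movieid.values[(0, 1)] = 42
g.internal_maps["movieid"] = movieid  # register the map inside its owner

try:
    pickle.dumps(movieid)             # the map alone is rejected
    standalone_ok = True
except TypeError:
    standalone_ok = False

g2 = pickle.loads(pickle.dumps(g))    # the owner carries its maps along
restored = g2.internal_maps["movieid"].values[(0, 1)]
```

In terms of the original code, the analogous change would be to drop ns.edge_properties and ns.vertex_properties and read the maps back from ns.graph.edge_properties and ns.graph.vertex_properties in each process.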
Note that graph-tool is not thread safe... So any access or modification to the graph must be protected by a lock.
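A minimal sketch of the locking pattern, using only the standard library. The plain dict `shared_graph` is a hypothetical stand-in for the graph, and `add_edge` and `worker` are illustrative names, not graph-tool calls. The sketch uses threading.Lock; multiprocessing.Lock offers the same acquire/release interface for separate processes.

```python
import threading

# Hypothetical stand-in for a shared graph: vertex id -> set of neighbours.
shared_graph = {}
graph_lock = threading.Lock()


def add_edge(u, v):
    """Add an undirected edge; every read or write happens under the lock."""
    with graph_lock:
        shared_graph.setdefault(u, set()).add(v)
        shared_graph.setdefault(v, set()).add(u)


def worker(start, count):
    # Each worker links `start` to `count` distinct (made-up) vertex ids.
    for i in range(count):
        add_edge(start, start * 1000 + i + 1)


threads = [threading.Thread(target=worker, args=(s, 50)) for s in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Reads are protected by the same lock as writes.
with graph_lock:
    degrees = dict((u, len(shared_graph[u])) for u in range(1, 5))
```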
Cheers, Tiago
-- Tiago de Paula Peixoto tiago@skewed.de
Tiago Peixoto wrote
Note that graph-tool is not thread safe... So any access or modification to the graph must be protected by a lock.
Cheers, Tiago
Sorry for bringing this thread back from the dead, but I didn't want to start a new one if this is a known issue.
Above you describe graph-tool as not thread-safe. What about the parallel versions of the centralities, for example (the OpenMP versions)?
I tried to do some tests (I have to say I'm not sure if I'm missing something!) with the modified BGL files, with and without the OpenMP pragmas. The results are a bit different, both between the non-OpenMP and OpenMP builds and between two runs of the OpenMP build.
Is this something known? Am I missing something?
thanks
(I have included a csv export of the values.) export_openmp_test.dat http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4025209/export_openmp_test.dat
-- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
Possible patch: "centrality" is shared. Now my tests correlate perfectly with the non-threaded version.
diff --git a/src/boost-workaround/boost/graph/betweenness_centrality.hpp b/src/boost-workaround/boost/graph/betweenness_centrality.hpp
index 32b56b77a0e5023a068e0b6bb0e9b4a156b17fb9..d4c3536d705b2465dbb020c3f15aef3c93e93ea6 100644
--- a/src/boost-workaround/boost/graph/betweenness_centrality.hpp
+++ b/src/boost-workaround/boost/graph/betweenness_centrality.hpp
@@ -364,7 +364,10 @@ namespace detail { namespace graph {
         }

         if (u != s) {
-          update_centrality(centrality, u, get(dependency, u));
+          #pragma omp critical(globalupdate)
+          {
+            update_centrality(centrality, u, get(dependency, u));
+          }
         }
       }
On 11/20/2013 04:58 PM, Tasos wrote:
Possible patch: "centrality" is shared. Now my tests correlate perfectly with the non-threaded version.
Yes, it seems like you have found a race condition. Your patch does not completely solve it, but there is a complete fix for this now in the git version. Please test it.
Cheers, Tiago
I'll check the git version. But for sure my fix did at least half the job; a bit further down there is another update (divide by 2). Thanks
Without a machine to check I just have a couple of comments.
The fix (in git) will probably kill a lot of speed. Based on my tests with big graphs, where a race will happen, I fixed it like this:
You need to wrap only the "update_centrality()" functions (both vertex and edge) and one function further down, "divide_by_two", near the "if_undirected".
With these things everything works great and no problem with speed even in big graphs.
Best T
On 11/21/2013 11:04 PM, Tasos wrote:
Without a machine to check I just have a couple of comments.
The fix (in git) will probably kill a lot of speed. Based on my tests with big graphs, where a race will happen, I fixed it like this:
You need to wrap only the "update_centrality()" functions (both vertex and edge) and one function further down, "divide_by_two", near the "if_undirected".
I don't think so... Making several entries into a critical omp region will make things very slow. Furthermore most of the time is spent finding the shortest paths, which is outside the critical region.
The "divide_by_two" operation at the end is outside the parallel region, so it does not need to be protected.
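One way to picture the trade-off discussed here, as a sketch only and not the actual fix in the graph-tool git version: each worker runs the shortest-path search and dependency accumulation into a private buffer with no lock held, then enters the lock once per source to merge into the shared result, and the halving for undirected graphs happens after all workers have joined. The example is plain Python with threads (so it shows the locking structure, not real speedups); `source_dependencies`, `betweenness` and the adjacency-dict input are illustrative assumptions.

```python
import threading
from collections import deque


def source_dependencies(adj, s):
    """Brandes-style single-source step: BFS from s, then back-propagate."""
    sigma = dict((v, 0) for v in adj)    # number of shortest paths from s
    dist = dict((v, -1) for v in adj)
    preds = dict((v, []) for v in adj)
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:              # first visit
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = dict((v, 0.0) for v in adj)  # private buffer, not shared
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / float(sigma[w]) * (1.0 + delta[w])
    delta[s] = 0.0                       # the source itself gets no credit
    return delta


def betweenness(adj, n_threads=2):
    centrality = dict((v, 0.0) for v in adj)
    lock = threading.Lock()
    sources = list(adj)

    def worker(chunk):
        for s in chunk:
            delta = source_dependencies(adj, s)  # expensive part, no lock held
            with lock:                           # one short merge per source
                for v, d in delta.items():
                    centrality[v] += d

    threads = [threading.Thread(target=worker, args=(sources[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Undirected graph: every pair was counted from both endpoints. The halving
    # runs after all workers have finished, so it needs no lock.
    for v in centrality:
        centrality[v] /= 2.0
    return centrality


# Path graph 0 - 1 - 2 - 3 as an adjacency dict.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = betweenness(path)
```

On this path graph the two inner vertices each lie on two shortest paths between other pairs, while the endpoints lie on none.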
Cheers, Tiago