I want to use graph-tool with the multiprocessing library in Python 2.6, but I
keep running into issues when trying to share a full graph object. I can store
the graph in a multiprocessing.Namespace, but it doesn't keep the dicts of
properties. Example:
from graph_tool.all import Graph

def initNS(ns):
    _g = Graph(directed=False)

    # Build property maps for edges and vertices to hold our data.
    # The graph vertices represent actors whereas movies represent edges.
    ns.edge_properties = {
        'genres': _g.new_edge_property("vector<string>"),
        'movieid': _g.new_edge_property("int"),
    }
    ns.vertex_properties = {
        'personid': _g.new_vertex_property("int32_t"),
    }

    # Edges
    _g.edge_properties["genres"] = ns.edge_properties['genres']
    _g.edge_properties["movieid"] = ns.edge_properties['movieid']
    # Vertices
    _g.vertex_properties["personid"] = ns.vertex_properties['personid']

    ns.graph = _g
##########
This initializes 'ns', which is a multiprocessing.Namespace. The problem is
that accessing any entry of ns.edge_properties tells me that the type isn't
picklable. I tried to skip that and access the maps through _g.edge_properties
instead, but those dicts aren't carried over to the other processes in the
pool, presumably because they aren't picklable either.
Any thoughts about how to fix this?
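One workaround I've been sketching (generic, no graph-tool involved, so take it as an illustration only): have each worker return plain, picklable data and let the parent process, which owns the single Graph, write the property maps itself. 'fetch_movies' below is a hypothetical stand-in for the per-process IMDbPy/db query:

```python
# Sketch of the workaround: workers return plain picklable tuples;
# the parent merges them serially. With graph-tool, the merge step
# is where g.add_edge(...) and the property-map writes would happen.
from multiprocessing import Pool

def fetch_movies(person_id):
    # Hypothetical worker: queries the db and returns
    # (source, target, attrs) tuples -- never graph-tool objects,
    # since those can't be pickled back to the parent.
    return [(person_id, person_id + 1, {'movieid': person_id * 10})]

pool = Pool(2)
results = pool.map(fetch_movies, [1, 2, 3])
pool.close()
pool.join()

# Parent-side merge: flatten the per-worker chunks into one edge list.
edges = [e for chunk in results for e in chunk]
```

Only the db queries run in parallel; the parent then adds the edges and fills the property maps in a single process, so nothing unpicklable ever crosses a process boundary.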
(For those interested, I'm attempting to use the IMDbPy library to do some
graph analysis on the relationships among actors and movies. Each process
has its own db connection, and I'm trying to populate the graph with actor
and movie information in parallel, since it's a pretty large and dense graph:
somewhere in the neighborhood of 250,000+ vertices for just a depth of three
relationships.)
Thanks,
--
Derek