Hi all. Can anyone provide me with some insights here? I know it's quite an open question, and it might take some effort. Would anyone be available/willing to do an actual audit of the code that I have? This would be compensated, of course. Feel free to contact me to discuss.

Kind regards

On Tue, Jun 15, 2021 at 8:45 PM Mathias Versichele <mathias.versichele@gmail.com> wrote:
Hi, I've been using graph-tool for the last year or so to calculate shortest-path trees on large-scale road networks. We used to do this in a Postgres database with the pgrouting extension, but were continually confronted with unacceptable startup costs. The switch to a Python module using graph-tool has considerably sped up our routing queries, but the service is now using too much memory. I have the feeling I might be using graph-tool the wrong way, but before I dive into that, it would be good to know what the expected memory footprint is for my use case.

Take for example a road network with 30 million edges and 31 million nodes (the combined road network of Belgium, the Netherlands, France and Germany in OSM). For this road network, I need to calculate shortest paths using different edge weights (edge property maps). What would be a very rough estimate of how much memory this would use? For the network alone, plus per edge property map. In our setup, there would be one edge property map with edge weights per country. We're currently seeing usage of over 50 GB easily, spiking even higher when we're loading extra cost structures or networks. Is that expected? Or am I experiencing memory leaks somewhere?
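
For what it's worth, my own naive back-of-envelope looks something like this (the ~40 bytes/edge adjacency overhead and the 8-byte hashed-id values are pure guesses on my part, not numbers from the graph-tool docs):

n_edges = 30_000_000
n_nodes = 31_000_000

adjacency = n_edges * 40   # guessed adjacency-list overhead per directed edge
edge_id   = n_edges * 4    # "edge_id" int32 edge property map
country   = n_edges * 2    # "country_id" int16 edge property map
vertex_n  = n_nodes * 8    # hashed-id vertex property map, assuming int64 node ids
weights   = n_edges * 4    # one int32 cost property map (per country)

print((adjacency + edge_id + country + vertex_n + weights) / 1e9)  # ~1.7 GB

which is nowhere near the 50 GB we're observing, so either my assumptions are way off or something else is going on.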

How I'm using graph-tool right now:

1) loading network
# nw = pandas DataFrame with edge info, in the structure: startnode-id, endnode-id, edge-id, country-id

G = gt.Graph(directed=True)
G.ep["edge_id"] = G.new_edge_property("int")
G.ep["country_id"] = G.new_edge_property("int16_t")
eprops = [G.ep["edge_id"], G.ep["country_id"]]

n = G.add_edge_list(nw.to_numpy(), hashed=True, eprops=eprops)  # hashed=True returns a vertex property map with the original node ids
G.vertex_properties["n"] = n

2) loading edge costs: I'm using GraphViews

# countries = list of country codes
edge_filter = np.in1d(G.ep["country_id"].a, [get_country_id(c) for c in countries])
GV = gt.GraphView(G, efilt=edge_filter)

edges = GV.get_edges([GV.edge_index])               # columns: source, target, edge index
sources = G.vertex_properties["n"].a[edges[:, 0]]   # map back to the original node ids
targets = G.vertex_properties["n"].a[edges[:, 1]]
idxs = edges[:, 2]

# db_costs = pandas DataFrame with columns: source, target, c (the edge cost)

sti_df = pd.DataFrame({'idx': idxs, 'source': sources, 'target': targets})
m = pd.merge(sti_df, db_costs, on=['source', 'target'], how='left', sort=False)[['idx', 'c']]
wgts_list = m.sort_values(by='idx')['c'].to_numpy()
wgts_list = np.where(np.isnan(wgts_list), np.iinfo(np.int32).max, wgts_list)  # edges with no matching cost get a very large cost

wgts = GV.new_edge_property("int32_t")
wgts.fa = wgts_list
wgts.fa = np.where(wgts.fa == -2147483648, np.iinfo(np.int32).max, wgts.fa)  # safety net: remap any NaNs that got cast to int32 min
GV.edge_properties[cs_ids_str] = wgts   # cs_ids_str = string key identifying this cost structure

GV2 = gt.GraphView(GV, efilt=wgts.fa != np.iinfo(np.int32).max)   # drop edges that got no cost assigned

3) I then use GV2 for calculating Dijkstra and such...
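
Concretely, the query step is basically just something like this (simplified sketch; src_v, dst_v stand in for the per-request source/target vertices):

from graph_tool.topology import shortest_distance, shortest_path

# distance only
dist = shortest_distance(GV2, source=src_v, target=dst_v, weights=wgts)

# or the actual path (lists of vertices and edges)
vlist, elist = shortest_path(GV2, src_v, dst_v, weights=wgts)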


I could of course work on an MWE of some sort. But it would already be very nice to get a rough estimate of the memory footprint, and to see if I'm doing something really silly in the code above.

Thx!