Re: [graph-tool] efficient random sampling of paths between two nodes
Hello Tiago, Thanks again for the quick answers and feedback. In fact, the more I think about how to take the most advantage of the graph, the more I think that in my case what I would really like is to control the way the paths are being sampled. So, the process would become a biased sampling where I’m the one giving the bias in the form of the probability of choosing each edge when traversing from the origin to the destination. This probability would depend on information of the node and additional information that will depend on each iteration of my algorithm. It could also be an edge property, I guess. One caveat is that DFS will not be possible since I want to make each sampling independent from the previous one and so getting several paths would be less efficient. Although I’d assume that this is not too much of a problem given that I’d need a smaller sample if I know it’s probably going to be a better one. I was thinking on doing something on the lines of the following (I’ve only tested it with a small graph): def sample_path_from_nodes(node1, node2, all_weights, cutoff): # all_weights is a dictionary on the graph edges current = node1 path = [] visited = set() while True: if current == node2: # we finished! return path + [current] visited.add(current) # maybe we should add (current, len(path)) neighbors = set(current.out_neighbors()) - visited if len(path) >= cutoff or not len(neighbors): # this path does not reach node2 # we backtrack # and we will never visit this node again current = path.pop() continue if not len(path) and not len(neighbors): # there is no path to node2 # this should not be possible return None # we haven't reached node2 # but there is still hope path.append(current) weights = [all_weights.get((current, n), 1) for n in neighbors] _sum = sum(weights) weights = [w/_sum for w in weights] current = np.random.choice(a=list(neighbors), p=weights) I fear this will not be as efficient as just getting a lot of paths via all_paths. Do you think this makes sense for sampling with preferences on edges? And if so, do you think it will perform a lot better if I did it as a graph-tool C++ extension? I guess I can always test the python version and see how it goes… thanks! Franco
Date: Wed, 15 Apr 2020 10:59:51 +0200 From: Tiago de Paula Peixoto <tiago@skewed.de> To: graph-tool@skewed.de Subject: Re: [graph-tool] efficient random sampling of paths between two nodes Message-ID: <8ab80efc-e70b-58c8-b33f-54075880bfff@skewed.de> Content-Type: text/plain; charset="utf-8"
Am 15.04.20 um 10:42 schrieb Franco Peschiera:
I now would like to test giving edges a weight in order for that |cutoff| argument to use it. I know the |shortest_distance| function accepts weights of edges in order to do Dijkstra. Could the same be done with |all_paths| such that the search is stopped if an accumulated weighted-distance is reached?
I'm afraid not.
Is there any (alternative) way of controlling which paths I get from the |all_paths| besides that |cutoff| argument?
There is no other argument to the function, which is deterministic.
You could randomize the order of the edges in the graph (by removing and adding them again in a random order), which will change which paths are shown first, but would not give you a proper unbiased sampling.
Or whatever specialized logic regarding path sampling / filtering I would have to implemented myself (just like the examples you shared)? Would this be something you would consider adding to graph-tool?
I think adding an algorithm to perform uniform sampling of paths in DAGs would be useful, but I can't promise I will find the time anytime soon to implement one.
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>
------------------------------
Subject: Digest Footer
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
------------------------------
End of graph-tool Digest, Vol 147, Issue 24 *******************************************
Am 20.04.20 um 19:30 schrieb Franco Peschiera:
I fear this will not be as efficient as just getting a lot of paths via |all_paths|. Do you think this makes sense for sampling with preferences on edges?
I'll refrain from judging whether it makes sense. It's up to you to decide for yourself what you want to do.
And if so, do you think it will perform a lot better if I did it as a graph-tool C++ extension?
In general C++ will speed up any equivalent algorithm when compared to pure python, often by factors of up to 200x. But a good advice is to focus first on defining your problem well, and writing a correct implementation first, and worrying about optimization later. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (2)
-
Franco Peschiera -
Tiago de Paula Peixoto