efficient random sampling of paths between two nodes
Hello Tiago, First of all, thanks for your time. I see what you mean by having a biased logic that would prefer shorter paths to longer ones, I had not thought about that. Regarding the self-reference part, I think it would not be a problem because of the structure of my particular (directed) graph. In fact, each node represents an assignment *at some given time period* and the outward neighbors of a node represent assignments *in the future*. In this way, a path can never visit a previously visited node since there are no possible cycles. In fact I can easily calculate the shortest and longest possible path between two nodes (shortest: using graphql's `shortest_distance` method, longest= number of periods in between the two nodes). So the paths I want to create (or sample) are just the different ways one can go from a node N1 (in period P1) to node N2 (in period P2 > P1). I think that in my graph I could just sample neighbors with a weight that depends on how far they are (in number of periods) from the node: the farthest neighbor will have the least probability of being chosen. This way, I'd compensate the fact that shorter paths take less hops. What do you think? regards, Franco On Mon, Mar 23, 2020 at 4:14 PM <graph-tool-request@skewed.de> wrote:
Am 21.03.20 um 17:44 schrieb Franco Peschiera:
Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.
One option is to do the sampling during the construction of the paths. I have in mind to just use the |get_out_neighbors| method on the start node to do my own depth-first search, randomly choosing a neighbor at each moment until I get to the |cutoff| or the end node. This, I could do for each path until I get to my limit of paths to sample. I?m not sure, though, if this is 1) similar in logic to the |all_paths| method and 2) if this is close in efficiency (due to using python to iterate and sample instead of C++).
This would be heavily biased. It could be that there are many more paths that go through one of the neighbors, so if you want sample uniformly from all paths, you have to do a weighted sample of the neighbors at each step. And these weights would change at each step.
Another option, although a lot more far-fetched, is to create my own |all_paths| method in C++ and re-compile my own version of graph-tool. Is this realistic?
The problem here is not which language should be used.
Are there better options to doing this?
Counting all paths between two nodes is conjectured to be NP-Hard. This is also known as the self-avoiding-walk (SAW) problem. Sampling SAWs is not straightforward either, I believe most efficient approaches are based on MCMC. I recommend you to study the literature a little bit before attempting a solution.
(It would be a lot simpler if you would be seeking instead to sample *shortest* paths between two nodes. This can be done in linear time, is even implemented in graph-tool already in random_shortest_path().)
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de>
Am 23.03.20 um 22:14 schrieb Franco Peschiera:
Hello Tiago,
First of all, thanks for your time.
I see what you mean by having a biased logic that would prefer shorter paths to longer ones, I had not thought about that.
Regarding the self-reference part, I think it would not be a problem because of the structure of my particular (directed) graph. In fact, each node represents an assignment *at some given time period* and the outward neighbors of a node represent assignments *in the future*. In this way, a path can never visit a previously visited node since there are no possible cycles. In fact I can easily calculate the shortest and longest possible path between two nodes (shortest: using graphql's `shortest_distance` method, longest= number of periods in between the two nodes).
Well, for DAG (directed acyclic graphs) the situation is quite different, you should have said so in the beginning.
So the paths I want to create (or sample) are just the different ways one can go from a node N1 (in period P1) to node N2 (in period P2 > P1). I think that in my graph I could just sample neighbors with a weight that depends on how far they are (in number of periods) from the node: the farthest neighbor will have the least probability of being chosen. This way, I'd compensate the fact that shorter paths take less hops.
What do you think?
Why do I get the impression I'm using google more than you to answer your question? Here is an approach using rejection sampling: https://math.stackexchange.com/questions/2673132/a-procedure-for-sampling-pa... Another approach is to count the number of paths that go through each node (this is feasible for DAGs) and use this to sample directly, see: https://pdfs.semanticscholar.org/0d74/e82c41124f83c842d5432abcb914ed1f410f.p... Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (2)
-
Franco Peschiera -
Tiago de Paula Peixoto