Good day to all,
I wrote an initial message (through the issues site) asking about the possibility of using the all_paths to random sample paths between two nodes in a graph.
It was pointed out that the: 1. The algorithm is deterministic and that 2. It’s not possible to adapt it to iterate in a unbiased random way.
Now I have a couple more questions, namely searching for advice on how to achieve what I want. ------------------------------
*Context*:
I’ve modeled certain parts of combinatorial optimization problem as a set of graphs that I’m exploiting to efficiently generate variables on the fly during the optimization routine. A variable in my problem is represented by a path between two nodes in one of those graphs. ------------------------------
*Specifically, what I want*:
1. I have a big graph. 2. During each iteration I choose two nodes on the graph (based on some logic irrelevant to the question). 3. I get all the paths between those two nodes (or I randomly sample them if there are too many).
For the sampling in the third step *I’m currently doing* the following (assuming a sample of size X and a population of paths of Y):
1. I get the generator of paths using the function all_paths. 2. I iterate over the generator using Reservoir sampling https://en.wikipedia.org/wiki/Reservoir_sampling and I stop if 1) I explored all paths or 2) I reach a multiple of X (e.g., 3X).
Many times, the sample X is really small compared to the population Y). X can be 500-2000, and Y can be 100.000+. ------------------------------
*My problem is*:
Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.
One option is to do the sampling during the construction of the paths. I have in mind to just use the get_out_neighbors method on the start node to do my own depth-first search, randomly choosing a neighbor at each moment until I get to the cutoff or the end node. This, I could do for each path until I get to my limit of paths to sample. I’m not sure, though, if this is 1) similar in logic to the all_paths method and 2) if this is close in efficiency (due to using python to iterate and sample instead of C++).
Another option, although a lot more far-fetched, is to create my own all_paths method in C++ and re-compile my own version of graph-tool. Is this realistic?
Are there better options to doing this?
Thanks,
Franco Peschiera
Am 21.03.20 um 17:44 schrieb Franco Peschiera:
Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.
One option is to do the sampling during the construction of the paths. I have in mind to just use the |get_out_neighbors| method on the start node to do my own depth-first search, randomly choosing a neighbor at each moment until I get to the |cutoff| or the end node. This, I could do for each path until I get to my limit of paths to sample. I’m not sure, though, if this is 1) similar in logic to the |all_paths| method and 2) if this is close in efficiency (due to using python to iterate and sample instead of C++).
This would be heavily biased. It could be that there are many more paths that go through one of the neighbors, so if you want sample uniformly from all paths, you have to do a weighted sample of the neighbors at each step. And these weights would change at each step.
Another option, although a lot more far-fetched, is to create my own |all_paths| method in C++ and re-compile my own version of graph-tool. Is this realistic?
The problem here is not which language should be used.
Are there better options to doing this?
Counting all paths between two nodes is conjectured to be NP-Hard. This is also known as the self-avoiding-walk (SAW) problem. Sampling SAWs is not straightforward either, I believe most efficient approaches are based on MCMC. I recommend you to study the literature a little bit before attempting a solution.
(It would be a lot simpler if you would be seeking instead to sample *shortest* paths between two nodes. This can be done in linear time, is even implemented in graph-tool already in random_shortest_path().)
Best, Tiago