I'm attempting to use get_edges_prob to find the most likely missing edges out of every possible non-edge. I know the number of possible edges is O(n^2).
Currently I'm sampling them like this:
non_edges_probs = [[] for _ in range(len(non_edges))]

def collect_edge_probs(s):
    s = s.levels[0]  # lowest level of the nested state
    for i, non_edge in enumerate(non_edges):
        p = s.get_edges_prob([non_edge], [], entropy_args=dict(partition_dl=False))
        non_edges_probs[i].append(p)

gt.mcmc_equilibrate(nested_state, force_niter=100, mcmc_args=dict(niter=10),
                    callback=collect_edge_probs, verbose=True)
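Then I'd average the collected samples for each candidate and rank them, with something like this (just a sketch of my post-processing; the logsumexp averaging and the top-20 cutoff are my own choices):

import numpy as np
from scipy.special import logsumexp

# average the sampled log-probabilities for each candidate non-edge
mean_log_probs = np.array([logsumexp(ps) - np.log(len(ps))
                           for ps in non_edges_probs])

# rank candidates from most to least likely missing edge
ranking = np.argsort(-mean_log_probs)
top_candidates = [non_edges[i] for i in ranking[:20]]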
Is there a way to speed this up at all? If not, is there a heuristic I can use to reduce the number of possibilities?
Currently I'm using vertex similarity measures to cut down the possibilities, but I'm wondering if there is a heuristic involving just the NestedState.
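For illustration, the pruning I mean is roughly like this (a sketch only; the Jaccard measure and the cutoff of 10,000 candidates are just what I happen to be trying):

import numpy as np
import graph_tool.all as gt

# score every candidate non-edge by vertex similarity of its endpoints
sims = gt.vertex_similarity(g, sim_type="jaccard", vertex_pairs=non_edges)

# keep only the highest-scoring candidates before the expensive sampling step
keep = np.argsort(-sims)[:10000]
non_edges = [non_edges[i] for i in keep]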
Any help appreciated!
On 19.03.20 at 18:33, Deklan Webster wrote:
> Is there a way to speed this up at all? If not, is there a heuristic I can use to reduce the number of possibilities?
There is no way to avoid looking at all possibilities, but you could pass the actual list at once, instead of iterating through it and passing lists of size 1. The reason get_edges_prob() exists and accepts lists is precisely to speed things up in this case.
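I.e., something along these lines, adapting your snippet (a sketch):

def collect_edge_probs(s):
    # a single call with the whole candidate list, instead of one call per non-edge
    p = s.levels[0].get_edges_prob(non_edges, [],
                                   entropy_args=dict(partition_dl=False))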
Best, Tiago
I was under the impression that passing a list corresponds to getting the probability that *all* the edges are missing. Indeed, when I try it out I get back a scalar, not a np array. I want to collect the probability that each individual edge is missing.
Also, with respect to the heuristics I mentioned, I just saw the paper "Evaluating Overfit and Underfit in Models of Network Community Structure" use s_ij = θ_i θ_j ℓ_{g_i, g_j}.
If sampling is not computationally feasible, this is what I had in mind.
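In code, I imagine it would be something like this (a rough sketch only; I'm assuming θ_i = k_i divided by the total degree of i's group, and taking ℓ_rs from the block matrix, which may not be exactly the paper's convention):

import numpy as np

state = nested_state.levels[0]           # flat (lowest-level) partition
b = state.get_blocks().a                 # group label of each vertex
ers = state.get_matrix().toarray()       # edge counts between groups
k = g.degree_property_map("total").a     # vertex degrees

# degree-corrected parameters: theta_i = k_i / (total degree of i's group)
group_deg = np.bincount(b, weights=k, minlength=ers.shape[0])
theta = k / group_deg[b]

def score(i, j):
    # s_ij = theta_i * theta_j * l_{g_i, g_j}
    return theta[i] * theta[j] * ers[b[i], b[j]]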
1) Is there a way built into graph-tool to compute this similarity function efficiently? (i.e., without Python slowing me down)
2) Is there a hierarchical analog, like just summing this similarity at each level?
Thanks as always
On 01.04.20 at 02:40, Deklan Webster wrote:
> I was under the impression that passing a list corresponds to getting the probability that *all* the edges are missing. Indeed, when I try it out I get back a scalar, not a np array. I want to collect the probability that each individual edge is missing.
Yes, this is true.
> Also, with respect to the heuristics I mentioned, I just saw the paper "Evaluating Overfit and Underfit in Models of Network Community Structure" use s_ij = θ_i θ_j ℓ_{g_i, g_j}.
This is not substantially faster than what is actually computed in graph-tool; it is just less accurate.
> If sampling is not computationally feasible, this is what I had in mind.
> 1) Is there a way built into graph-tool to compute this similarity function efficiently? (i.e., without Python slowing me down)
You should switch to using
MeasuredBlockState.get_edge_prob()
which is implemented entirely in C++.
The whole approach described in https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#measu... should be preferred over the older binary classification scheme, as it completely subsumes it.
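In outline, that approach looks something like the following (a sketch; the n/x values just encode a single error-free measurement per observed edge, and the candidate loop mirrors your earlier snippet):

import graph_tool.all as gt

# each observed edge was measured once (n=1) and observed once (x=1);
# unobserved pairs get the defaults n_default=1, x_default=0
n = g.new_ep("int", 1)
x = g.new_ep("int", 1)

# nested=True is the default, so this uses the hierarchical model
state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)

probs = [[] for _ in range(len(non_edges))]

def collect_edge_probs(s):
    for i, (u, v) in enumerate(non_edges):
        probs[i].append(s.get_edge_prob(u, v))

gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10),
                    callback=collect_edge_probs)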
> 2) Is there a hierarchical analog, like just summing this similarity at each level?
Yes, any of the above approaches work in the exact same way with the hierarchical model.
Best, Tiago