Hi Tiago,
we noticed that on certain weighted graphs minimize_blockmodel_dl() tends to put hubs (vertices with many edges) into the same cluster. Please find a minimal example below, which produces the clustering shown in the attached plot. This happens even when all edges carry the same weight. Is this intended behavior?
We wonder whether a possible explanation is that the WSBM is fit to predict edge weights *as well as edge probabilities* (compare formulas (1) and (4) in [1]). Hence, vertices with similar degrees tend to end up in the same cluster as long as the edge weights do not contradict this. Is this correct?
In case the above makes any sense: is there a way to suppress the edge-probability part of the likelihood, as in [2], where the alpha parameter can be used to fit "only to the weight information"? (Compare formula (4) in [2].)
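To make our reading explicit, here is our own schematic paraphrase (not the exact notation of either paper): [1] seems to fit the joint distribution

  P(A, x \mid b) = P(x \mid A, b)\, P(A \mid b),

i.e. the placement of the edges A and the weights x on them together, whereas formula (4) in [2] interpolates between the two parts of the log-likelihood, roughly

  \log L_\alpha = \alpha \log L_{\text{edges}} + (1 - \alpha) \log L_{\text{weights}},

so that alpha = 0 would fit only to the weight information.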
This is also related to the question we asked here:
https://git.skewed.de/count0/graph-tool/-/issues/664
Best, Dominik (and Enrique)
[1] T. P. Peixoto. 2017. Nonparametric weighted stochastic block models. Physical Review E, 97.
[2] C. Aicher, A. Z. Jacobs, and A. Clauset. 2014. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248.
-----
import graph_tool.all as gt
# Toy graph with three hubs (3, 4 and 21); vertex 5 is attached to both 4 and 21.
h = gt.Graph(directed=False)
h.add_edge_list([(0, 3), (1, 3), (2, 3), (8, 3), (9, 3), (12, 3), (13, 3), (14, 3),
                 (15, 3), (3, 4), (5, 4), (6, 4), (7, 4), (10, 4), (11, 4), (16, 4),
                 (17, 4), (18, 4), (19, 4), (20, 4), (22, 21), (23, 21), (24, 21), (25, 21),
                 (26, 21), (27, 21), (28, 21), (29, 21), (30, 21), (31, 21), (32, 21), (5, 21)])
# All edges carry the same weight, so the weights are uninformative.
ew = h.new_edge_property("double")
ew.a = [4] * h.num_edges()
h.ep['weight'] = ew
# Fit the weighted SBM, passing the weights as a "discrete-binomial" covariate.
state = gt.minimize_blockmodel_dl(
    h, state_args=dict(recs=[h.ep.weight], rec_types=["discrete-binomial"]))
b = state.get_blocks()
b = gt.perfect_prop_hash([b])[0]
gt.graph_draw(h, edge_text=h.ep.weight, vertex_size=20, vertex_fill_color=b,
              ink_scale=0.9, bg_color=[1, 1, 1, 1])
On 16.07.20 at 00:49, Dominik Schlechtweg wrote:
Hi Tiago,
we noticed that on certain weighted graphs minimize_blockmodel_dl() tends to put hubs (vertices with many edges) into the same cluster. Please find a minimal example below, which produces the clustering shown in the attached plot. This happens even when all edges carry the same weight. Is this intended behavior?
We wonder whether a possible explanation is that the WSBM is fit to predict edge weights *as well as edge probabilities* (compare formulas (1) and (4) in [1]). Hence, vertices with similar degrees tend to end up in the same cluster as long as the edge weights do not contradict this. Is this correct?
This has nothing to do with having weights or not; if you use an unweighted SBM you get the same behavior.
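For instance, here is a minimal sketch on the toy graph from your example, simply dropping the recs/rec_types arguments so that the plain (unweighted) SBM is fit:

import graph_tool.all as gt
# Same toy graph as in the original post: hubs 3, 4 and 21.
h = gt.Graph(directed=False)
h.add_edge_list([(0, 3), (1, 3), (2, 3), (8, 3), (9, 3), (12, 3), (13, 3), (14, 3),
                 (15, 3), (3, 4), (5, 4), (6, 4), (7, 4), (10, 4), (11, 4), (16, 4),
                 (17, 4), (18, 4), (19, 4), (20, 4), (22, 21), (23, 21), (24, 21), (25, 21),
                 (26, 21), (27, 21), (28, 21), (29, 21), (30, 21), (31, 21), (32, 21), (5, 21)])
# No edge covariates at all: plain SBM. The hubs typically end up in the same
# group, for the reason explained below.
state = gt.minimize_blockmodel_dl(h)
print(state.get_blocks().a)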
This clustering makes sense under the model, because a random multigraph model with the same degree sequence would yield a larger number of connections between the hubs, and between the nodes with smaller degree.
See an explanation for this in this paper: https://arxiv.org/abs/2002.07803
Best, Tiago
Thanks, a short follow-up:
On 17.07.20 at 12:36, Tiago de Paula Peixoto wrote:
On 16.07.20 at 00:49, Dominik Schlechtweg wrote:
Hi Tiago,
we noticed that on certain weighted graphs minimize_blockmodel_dl() tends to put hubs (vertices with many edges) into the same cluster. Please find a minimal example below, which produces the clustering shown in the attached plot. This happens even when all edges carry the same weight. Is this intended behavior?
We wonder whether a possible explanation is that the WSBM is fit to predict edge weights *as well as edge probabilities* (compare formulas (1) and (4) in [1]). Hence, vertices with similar degrees tend to end up in the same cluster as long as the edge weights do not contradict this. Is this correct?
This has nothing to do with having weights or not; if you use an unweighted SBM you get the same behavior.
I see that we are probably mixing up two things here. Regarding this point:
is there a way to suppress the edge-probability part of the likelihood, as in [2], where the alpha parameter can be used to fit "only to the weight information"? (Compare formula (4) in [2].) [...] [2] C. Aicher, A. Z. Jacobs, and A. Clauset. 2014. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248.
How does the graph-tool implementation relate to the alpha parameter in formula (4)? Is it equivalent to giving equal weight to edge probabilities and weights (alpha = 0.5)?
This clustering makes sense under the model, because a random multigraph model with the same degree sequence would yield a larger number of connections between the hubs, and between the nodes with smaller degree.
See an explanation for this in this paper: https://arxiv.org/abs/2002.07803
Is it possible to use LatentMultigraphBlockState() with a weighted graph?
On 17.07.20 at 14:19, Dominik Schlechtweg wrote:
is there a way to suppress the edge-probability part of the likelihood, as in [2], where the alpha parameter can be used to fit "only to the weight information"? (Compare formula (4) in [2].) [...] [2] C. Aicher, A. Z. Jacobs, and A. Clauset. 2014. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248.
How does the graph-tool implementation relate to the alpha parameter in formula (4)? Is it equivalent to giving equal weight to edge probabilities and weights (alpha = 0.5)?
This parameter is not implemented in graph-tool.
Note that such a parameter does not have an obvious interpretation from a generative modelling point of view, especially in a Bayesian setting. We cannot just introduce ad-hoc parameters to cancel certain parts of the likelihood, without paying proper attention to issues of normalization, etc., and expect things to behave consistently.
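Schematically, weighting the two log-likelihood terms by alpha and 1 - alpha amounts to replacing the joint distribution by something like

  P_\alpha(A, x \mid b) \propto P(A \mid b)^{\alpha}\, P(x \mid A, b)^{1 - \alpha},

which in general is no longer a normalized distribution over (A, x).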
In other words, I do not fully agree with the alpha parameter of Aicher et al.
Is it possible to use LatentMultigraphBlockState() with a weighted graph?
Not yet.
Best, Tiago
On 17.07.20 at 19:44, Tiago de Paula Peixoto wrote:
On 17.07.20 at 14:19, Dominik Schlechtweg wrote:
is there a way to suppress the edge-probability part of the likelihood, as in [2], where the alpha parameter can be used to fit "only to the weight information"? (Compare formula (4) in [2].) [...] [2] C. Aicher, A. Z. Jacobs, and A. Clauset. 2014. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248.
How does the graph-tool implementation relate to the alpha parameter in formula (4)? Is it equivalent to giving equal weight to edge probabilities and weights (alpha = 0.5)?
This parameter is not implemented in graph-tool.
Note that such a parameter does not have an obvious interpretation from a generative modelling point of view, especially in a Bayesian setting. We cannot just introduce ad-hoc parameters to cancel certain parts of the likelihood, without paying proper attention to issues of normalization, etc., and expect things to behave consistently.
In other words, I do not fully agree with the alpha parameter of Aicher et al.
Thanks for clarifying this. Last question: Does your doubt also concern the special case where alpha = 0, i.e., ignoring edge probabilities completely? (This is the case that actually interests us; we are not interested in tuning this parameter in any way.)
Is it possible to use LatentMultigraphBlockState() with a weighted graph?
Not yet.
We will open a feature request then, in case there isn't one already.
On 17.07.20 at 20:35, Dominik Schlechtweg wrote:
Thanks for clarifying this. Last question: Does your doubt also concern the special case where alpha = 0, i.e., ignoring edge probabilities completely? (This is the case that actually interests us; we are not interested in tuning this parameter in any way.)
I'd have to look at this more closely, but I think this is strange, since the edges define the support of the distribution. The more correct thing to do, in my opinion, would be to marginalize over the edges. This might end up being equivalent to setting alpha = 0, but I would need to check.
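Schematically, instead of simply dropping the edge-placement term, one would compute something like

  P(x \mid b) = \sum_{A} P(x \mid A, b)\, P(A \mid b),

i.e. sum over the possible edge placements rather than condition on the observed one.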
On 17.07.20 at 20:44, Tiago de Paula Peixoto wrote:
On 17.07.20 at 20:35, Dominik Schlechtweg wrote:
Thanks for clarifying this. Last question: Does your doubt also concern the special case where alpha = 0, i.e., ignoring edge probabilities completely? (This is the case that actually interests us; we are not interested in tuning this parameter in any way.)
I'd have to look at this more closely, but I think this is strange, since the edges define the support of the distribution. The more correct thing to do, in my opinion, would be to marginalize over the edges. This might end up being equivalent to setting alpha = 0, but I would need to check.
You're probably quite busy, but since we would really benefit from this, I will go ahead and open a feature request, unless you stop me right now. :-)