Hi Davide,

On 03.07.20 at 11:31, Davide Cittaro wrote:
Hello,

I'm testing the new planted partition model in graph-tool on my data, and I'm finding interesting results. I have some questions/observations, though.

- PPBlockState returns a relatively large number of groups on large networks, which is fine and expected. When I use the NSBM, instead, I make use of the hierarchy not only because I can "abstract" partitions up to a certain level, but also because the hierarchy has a meaning in my case. Is there (or will there be) a hierarchical formulation of the PPBlockState?
A hierarchical prior for the PP model is certainly feasible, and it is something that could come up in the future, but I can't promise when.
- I tried multiple initialisations of PPBlockState on my graph and also increased the number of iterations of the initial MCMC sweep, and I'd say I get very consistent results. Is this expected? That is, is it known whether PPBlockState converges to a stable solution faster and more consistently?
This depends a lot on the underlying data. If the model is a good fit, then this consistency is expected; otherwise it is not. It will not necessarily behave like this for every dataset.
- Does the time required to converge scale with the number of edges, as it does for the SBM?
Like in the SBM, the MCMC sweeps take time proportional to the number of edges, but the multiplicative factor is smaller, since the model is simpler.
- As far as I understand, if assortativity is the dominant pattern, the difference between PP and NSBM is negligible. I don't know how to quantify "negligible", as the differences in entropies are at least on the order of 1e2 in the cases I tested (which seems pretty large to me). I would be happy to switch to PP, also given the shorter runtime so far, but I'm a bit concerned about these differences.
I do not recommend simply switching to PP for every analysis. As was described in the paper, the SBM is still a more powerful model, which is capable of better capturing the network structure in a wider variety of cases. To answer your question, you can test whether the two models give similar answers by comparing their partitions. You can use the partition_overlap() function for that. Comparing the description length is useful to select the best fitting model, but not to tell whether they give similar answers.

Best,
Tiago

-- 
Tiago de Paula Peixoto <tiago@skewed.de>