Tune the size of detected communities?
Hello,

I have a large bipartite network with ~10^4 vertices of type I and ~10^6 vertices of type II. I fitted a nested blockmodel, hoping to identify communities of type I. Unfortunately, the detected communities (at the lowest level of the hierarchy) have a median size about twice as large as the empirical evidence suggests (reminiscent of the resolution-limit problem).

Is there a way to tune the sizes of the communities at the lowest level? I'm thinking:

* Could forcing an extra hierarchy level help?
* Could I add another (non-nested) simple blockmodel at the lowest level?
* Is it possible to reduce the penalty on the description length?

Any ideas would be greatly appreciated... many thanks in advance!

Peter

Dr Peter Straka
Research Fellow (DECRA)
School of Physical Engineering and Mathematical Sciences | UNSW Canberra
Google Scholar: https://scholar.google.com.au/citations?user=o80TaWgAAAAJ
E: p.straka@unsw.edu.au
skype: straka.ps
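A minimal sketch of the setup described above (not Peter's actual code), assuming graph-tool: it fits a nested blockmodel and reports the median size of the lowest-level blocks containing type-I vertices. The file name "bipartite.gt" and the boolean vertex property "type" (True for type-II vertices) are hypothetical placeholders.

    import numpy as np
    import graph_tool.all as gt

    # Load the bipartite graph; "type" marks type-II vertices (placeholder names).
    g = gt.load_graph("bipartite.gt")
    is_type2 = g.vp["type"]

    # Fit the nested stochastic blockmodel by minimizing the description length.
    state = gt.minimize_nested_blockmodel_dl(g)

    # Lowest-level partition, and sizes of the blocks holding type-I vertices.
    b = state.get_levels()[0].get_blocks()
    sizes = {}
    for v in g.vertices():
        if not is_type2[v]:
            r = int(b[v])
            sizes[r] = sizes.get(r, 0) + 1

    print("number of type-I blocks:", len(sizes))
    print("median type-I block size:", np.median(list(sizes.values())))
    print("description length:", state.entropy())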
On 16.11.2016 02:13, Peter Straka wrote:
Hello,
I have a large bipartite network with ~10^4 vertices of type I and ~10^6 vertices of type II. I fitted a nested blockmodel, hoping to identify communities of type I. Unfortunately, the detected communities (at the lowest level of the hierarchy) have a median size about twice as large as the empirical evidence suggests (reminiscent of the resolution-limit problem).

Is there a way to tune the sizes of the communities at the lowest level? I'm thinking:

* Could forcing an extra hierarchy level help?
* Could I add another (non-nested) simple blockmodel at the lowest level?
* Is it possible to reduce the penalty on the description length?
Any ideas would be greatly appreciated... many thanks in advance!
Although it is possible to do what you want, I think this should be discouraged. You say that the groups found are larger than what the "empirical evidence suggests". However, the inference approach implemented attempts precisely to gauge the empirical evidence. It tries to avoid overfitting the data, where random fluctuations are mistaken for structure (like finding communities in completely random graphs). By forcing the number of groups to a higher value you may be risking overfitting, and you would be defeating the purpose of the algorithm.

Note that the algorithm implemented is stochastic in nature. Usually one needs to run it several times, in particular if the modular structure is hard to detect. It may be that the algorithm will find a partition that you judge more reasonable (and that has a lower description length) if you try many times. If it doesn't, then the algorithm is telling you something about the structure of the network: maybe what you judge more reasonable cannot be found in the data with good statistical significance.

However, if you absolutely insist on doing this (despite the consequences), the minimum number of groups can be specified as:

    state = minimize_nested_blockmodel_dl(g, B_min=B)

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
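A minimal sketch of the "run it several times" advice, keeping the fit with the smallest description length; the graph file name and the number of restarts are placeholders, not part of the original exchange.

    import graph_tool.all as gt

    g = gt.load_graph("bipartite.gt")        # placeholder input

    best = None
    for _ in range(10):                      # number of restarts is arbitrary
        state = gt.minimize_nested_blockmodel_dl(g)
        # entropy() returns the description length of the nested fit;
        # keep the partition that compresses the data best.
        if best is None or state.entropy() < best.entropy():
            best = state

    best.print_summary()                     # group counts at each hierarchy level

Under the minimum description length criterion used here, the run with the lowest entropy() value is the fit best supported by the data.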
participants (2)
- Peter Straka
- Tiago de Paula Peixoto