Problem with the partition into communities
Dear list, I am new with graph-tools, which looks pretty nice. However, I have the following problem. The result of the partition into communities gives one big community and two very small ones, where the vertices are almost no connected between them. I wonder if I am doing something wrong. My network is directed and unweighted. My algorithm is just: " g2 = load_graph("test.graphml") state = gt.minimize_blockmodel_dl(g2, deg_corr=False) b = state.get_blocks() for v in g2.vertices(): print v,b[v] " My network has only 50 vertex, is that a limitation? Thanks in advance, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
On 22.03.2017 09:59, yerali wrote:
Dear list, I am new with graph-tools, which looks pretty nice. However, I have the following problem. The result of the partition into communities gives one big community and two very small ones, where the vertices are almost no connected between them. I wonder if I am doing something wrong. My network is directed and unweighted. My algorithm is just: " g2 = load_graph("test.graphml") state = gt.minimize_blockmodel_dl(g2, deg_corr=False) b = state.get_blocks()
for v in g2.vertices(): print v,b[v] " My network has only 50 vertex, is that a limitation?
I don't understand what seems to be the problem. What makes you think that the partition the algorithm finds is wrong? I would help if you were more specific about your data, and what you were expecting. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Thanks for your reply. The input is test2.graphml: test2.graphml <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/test2.graphml> and the final partition (using the commands explained before) can be seen in: <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/dib.jpg> I was expecting each community to have connected nodes, but that is not the case in the blue one, for example. Thanks a lot, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
On 22.03.2017 15:22, yerali wrote:
Thanks for your reply.
The input is test2.graphml: test2.graphml <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/test2.graphml>
and the final partition (using the commands explained before) can be seen in: <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/dib.jpg>
I was expecting each community to have connected nodes, but that is not the case in the blue one, for example.
I don't see anything wrong. The approach is based on fitting the stochastic block model, which accepts any type of mixing pattern, not only the assortative one you are expecting. If you use the non-degree-corrected model (deg_corr=False), the model expects that nodes inside the same group will receive typically the same number of edges, and thus the inference procedure will often just split the nodes according to degree, if there is no structures that are more salient. Indeed, this is why the degree-corrected model (deg_corr=True) is often preferred. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Ok, Thanks for your answer. In that case not communities are detected (all the system is in the same community). Maybe the system is to small for SBM, while partitions in the same network can be done using other methods. :( Thanks again, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
On 22.03.2017 15:39, yerali wrote:
Ok, Thanks for your answer.
In that case not communities are detected (all the system is in the same community). Maybe the system is to small for SBM, while partitions in the same network can be done using other methods.
:(
No, I don't think this is the correct conclusion. If the method finds only one community, it means that there is no statistically significant community structure in the network! Indeed, a very powerful feature of this approach is that it can separate structure from noise, and it will only yield partitions that are substantiated by evidence. Certainly you will be able to find some algorithm that splits your network, but I would advise against using such partition, since it is probably meaningless. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Dear Tiago, Good point to take into account. Thank you very much. All the best, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
Hi Yerali, for assortative communities you would maybe want to find a partition with maximum modularity, which is a bit different to minimizing blockmodel description length. Peter On Thu, 23 Mar 2017 at 01:53 yerali <ygandica@gmail.com> wrote:
Dear Tiago,
Good point to take into account. Thank you very much.
All the best,
Yérali.
-- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com. _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 23.03.2017 03:14, Peter Straka wrote:
Hi Yerali, for assortative communities you would maybe want to find a partition with maximum modularity, which is a bit different to minimizing blockmodel description length.
I don't think this is a good idea for the reason that I have mentioned: Maximizing modularity ignores the statistical significance of the partition. Because of this, by maximizing modularity one can find partitions of fully random graphs with very high modularity, but which are completely meaningless: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.70.025101 This is not a problem with the SBM inference approach implemented in graph-tool, which is meant to solve this problem in a principled manner. So when the algorithm finds only one community in your data, it is telling you something important: You graph cannot be distinguished from a fully random one with the same size and density (or degree sequence). It is a bad idea (and bad practice) to ignore this, and use some heuristic just because it finds some partition. This is a recipe for overfitting. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Hi, thank you very much for both answers. This is how science works. My problem with the SBM is that the maximization of internal density is not a desired condition for the separation into blocks. The pattern is very clear in the partition of my figure (without degree correlation), where the nodes of the same colors are connected with the other colors in an equivalent way, but far from being a community. For example, in causality-based networks to know the set of vertex where some flow can be trapped makes sense. The same applies in the case of information flow. I think the network in my figure can be suitably partitioned into communities without being simply noise. Maybe this is a good example for the affirmation of that there is not (yet) a perfect method to detect communities in all kinds of networks. In this sense, I would rather think in the direction of that a partitioning has not a unique and “right” solution. Best, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
On 23.03.2017 11:16, yerali wrote:
My problem with the SBM is that the maximization of internal density is not a desired condition for the separation into blocks. The pattern is very clear in the partition of my figure (without degree correlation), where the nodes of the same colors are connected with the other colors in an equivalent way, but far from being a community.
That is because this is the most significant mixing pattern present in the data. One of the most central features of the SBM is that it admits _any_ mixing pattern, not only assortative ones. For example, it works beautifully well on networks with bipartite or multipartite structures. However, if the network does possess assortative structures, the method _will_ find them. I'd recommend taking a closer look into the SBM literature.
For example, in causality-based networks to know the set of vertex where some flow can be trapped makes sense. The same applies in the case of information flow.
And if these assortative modules exist, they will be picked up by the SBM as well. But see my point below.
I think the network in my figure can be suitably partitioned into communities without being simply noise.
And how do you make this assessment? If you randomize your network, while keeping the degree sequence intact, you will still be able to partition it into communities (that also trap random walks, has high modularity, etc). The point is that this structure is the result of mere statistical fluctuations, not meaningful properties of the generative mechanism behind the network. It is _very_ easy to find structure in random networks. You have to be careful. I recommend taking the SBM result seriously.
Maybe this is a good example for the affirmation of that there is not (yet) a perfect method to detect communities in all kinds of networks. In this sense, I would rather think in the direction of that a partitioning has not a unique and “right” solution.
This is besides my point. Although I agree that there is more than one unique way to represent a network (with the SBM being only one of them), methods that do not distinguish structure from noise result in spurious results, and should be avoided. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Dear Tiago, Thanks for your answer. Sorry for the delay of my reply but I was learning how to add a property from external data. I just wanted to answer:
And how do you make this assessment?
So, in this picture the separation into communities using partition stability, as an example: <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027150/dib_ps.jpg> I think colors well represent set of nodes densely connected, which is important at least in my current case of study, where links represent causality. Best, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
Dear Yerali, I think you are not really interested in the "communities" but just the dense subgraphs. As Tiago already very nicely explained, these two are quite different things. The existence of dense subgraphs might just be a result of statistical fluctuations in the underlying model or they may simply arise because of the particular form of a degree sequence. As he also warned, if you make a mistake of treating these as "real" communities in the underlying generative model, you may infer very wrong conclusions about the model that may not be valid about a different realization of the same model. On the other hand if you really care _only_ about a given realization, it is perhaps fine to ignore the underlying generative model. You might like to have a look at this which explains these things: http://link.springer.com/article/10.1007/s41109-017-0023-6 Regards Snehal Shekatkar On Fri, Mar 24, 2017 at 3:48 PM, yerali <ygandica@gmail.com> wrote:
Dear Tiago, Thanks for your answer. Sorry for the delay of my reply but I was learning how to add a property from external data.
I just wanted to answer:
And how do you make this assessment?
So, in this picture the separation into communities using partition stability, as an example: <http://main-discussion-list-for-the-graph-tool-project. 982480.n3.nabble.com/file/n4027150/dib_ps.jpg> I think colors well represent set of nodes densely connected, which is important at least in my current case of study, where links represent causality.
Best,
Yérali.
-- View this message in context: http://main-discussion-list- for-the-graph-tool-project.982480.n3.nabble.com/Problem- with-the-partition-into-communities-tp4027132p4027150.html Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com. _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
-- Snehal Madhukar Shekatkar Pune India
On 24.03.2017 11:18, yerali wrote:
Dear Tiago, Thanks for your answer. Sorry for the delay of my reply but I was learning how to add a property from external data.
I just wanted to answer:
And how do you make this assessment?
So, in this picture the separation into communities using partition stability, as an example: <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027150/dib_ps.jpg> I think colors well represent set of nodes densely connected, which is important at least in my current case of study, where links represent causality.
This is a perfect example of what I am trying to convey to you. What you are seeing are basically three groups of nodes with low degree around nodes with high in-degree, in addition to some smaller groups. Any random graph with the same degree sequence will admit similar partitions, although they carry no meaning from a generative point of view. You can check this by randomizing your graph with random_rewire() and then obtaining the partition again with the same method. You will probably see a similar division. In other words, these are not statistically significant communities; they arise out of random fluctuations. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Hello again, Thanks for your important answer. This is indeed a very good example. Maybe I should clarify from the beginning that I am interested, in my current study, in single realizations of the network. Being the links of my system based on causalities, the importance here is in the flow across the links. And, as in the case of networks for information flows, the time-scale separation works perfect as separator in communities. So, *I will be interested in the same partition happening in any random graph with the same degree sequence*. In this sense, colors well represent the set of nodes densely connected in which I am interested in. Sorry for all this thread, maybe for you everything was trivial from the beginning but for me it has been a very didactic and now intuitive point regarding how community detection depends on the particularities of the problem of interest. Thanks and Best Regards, Yérali. -- View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
participants (4)
-
Peter Straka -
Snehal Shekatkar -
Tiago de Paula Peixoto -
yerali