Hi Tiago and everyone,
I'm trying the community detection algorithm following the example from the docs.
Brifely, with a graph g, I'm running the following:
# Plot the full graph graph_draw(g, size=(15,15), output=output_folder+"/full_graph.png") # Detect communities spins = community_structure(g, n_iter=1000, n_spins=N) ### note the N parameter ng = condensation_graph(g, spins) size = ng[0].new_vertex_property("double") size.a = log(ng[1].a+1) # Plot the community graph graph_draw(ng[0], vsize=size, vcolor=size, splines=True, eprops={"len":20, "penwidth":10}, vprops={"penwidth":10}, output=output_folder+"/community_graph.png", size=(10,10))
I notice that when n_spins is small (say, n_spins=6), the resulting community graph shows links that do not appear on the full graph. See, for example, this pair of plots:
http://www.ebi.ac.uk/~spivakov/testX/full_graph.png http://www.ebi.ac.uk/~spivakov/testX/community_graph.png
I'm a little confused as to how this is possible - am I right that with this algorithm communities are non-overlapping?
With larger n_spins (such as n_spins=50), the ghost link disappears, but lots of very small communities appear that segment the data too much:
http://www.ebi.ac.uk/~spivakov/testX/community_graph50.png
More generally, my problem is that I'd like to use this algorithm in a tool that would need to have a good enough guess of n_spins without using a trial-and-error approach. It doesn't need to be the best possible community detection ever, though. So I would also very much appreciate your thoughts on estimating the most optimal n_spins from the data.
Thanks a lot!
Best wishes, Mikhail
Hi,
On 04/17/2012 04:14 PM, Mikhail Spivakov wrote:
I notice that when n_spins is small (say, n_spins=6), the resulting community graph shows links that do not appear on the full graph. See, for example, this pair of plots:
http://www.ebi.ac.uk/~spivakov/testX/full_graph.png http://www.ebi.ac.uk/~spivakov/testX/community_graph.png
I'm a little confused as to how this is possible - am I right that with this algorithm communities are non-overlapping?
It seems to me the community algorithm simply did not find the best partition. This can happen, since the simulated annealing may converge to a non-optimal solution. You have to make the cooling rate slower or start from different initial conditions.
From the image you sent, it is not possible to assess visually the quality of the result. You should plot the first graph, and make the vertex colors correspond to the community values.
More generally, my problem is that I'd like to use this algorithm in a tool that would need to have a good enough guess of n_spins without using a trial-and-error approach. It doesn't need to be the best possible community detection ever, though. So I would also very much appreciate your thoughts on estimating the most optimal n_spins from the data.
This is a hard problem to solve in general. It is usually the case that you should have some insight on the total number of communities, and AFAIK there is no known reliable method which will give you this information easily. If your networks are not too large, you can try increasing n_spins values, and stop if the modularity no longer changes appreciably.
Cheers, Tiago
-- Tiago de Paula Peixoto tiago@skewed.de