Community detection: how do 'ghost' edges appear?

17 Apr 2012

      Hi Tiago and everyone,

I'm trying the community detection algorithm following the example
from the docs.

Brifely, with a graph g, I'm running the following:

# Plot the full graph
graph_draw(g, size=(15,15), output=output_folder+"/full_graph.png")
# Detect communities
spins = community_structure(g, n_iter=1000, n_spins=N) ### note the N parameter
ng = condensation_graph(g, spins)
size = ng[0].new_vertex_property("double")
size.a = log(ng[1].a+1)
# Plot the community graph
graph_draw(ng[0], vsize=size, vcolor=size, splines=True,
eprops={"len":20, "penwidth":10}, vprops={"penwidth":10},
output=output_folder+"/community_graph.png", size=(10,10))

I notice that when n_spins is small (say, n_spins=6), the resulting
community graph shows links that do not appear on the full graph.
See, for example, this pair of plots:

http://www.ebi.ac.uk/~spivakov/testX/full_graph.png
http://www.ebi.ac.uk/~spivakov/testX/community_graph.png

I'm a little confused as to how this is possible - am I right that
with this algorithm communities are non-overlapping?

With larger n_spins (such as n_spins=50), the ghost link disappears,
but lots of very small communities appear that segment the data too
much:

http://www.ebi.ac.uk/~spivakov/testX/community_graph50.png

More generally, my problem is that I'd like to use this algorithm in a
tool that would need to have a good enough guess of n_spins without
using a trial-and-error approach. It doesn't need to be the best
possible community detection ever, though. So I would also very much
appreciate your thoughts on estimating the most optimal n_spins from
the data.

Thanks a lot!

Best wishes,
Mikhail

Mikhail Spivakov

Tiago de Paula Peixoto

tags

participants (2)