minimize_nested_blockmodel_dl performance
Hello, I am a new user of the library, so forgive me if my questions are very basic. I am using minimize_nested_blockmodel_dl to build a hierarchical clustering of a graph. The graph is created from a distance matrix over the nodes; there are 180 nodes. I have tried different transformations of the distances into weights, basically w = 1 - d and w = 1/d, after normalizing the distances to the range 0-1. Only with the second one have I been able to obtain a non-trivial partition, and the partition obtained is quite bad (low NMI and high NVI). Using other, much simpler methods (agglomerative clustering, for example) I have found much better partitions, so I know the data allows a better partition to be obtained.

The method is non-parametric once the weights are provided, except for the number of sweeps, the epsilons, etc.

How can I know whether I need to increase the number of sweeps, or whether I have reached the limit of the algorithm? So far I have run up to 10K sweeps, but there is not much difference in either the execution time or the result.

What other parameters can be adjusted, and under what criteria?

How must the data be normalized to obtain optimal results, i.e. to increase the algorithm's chances of finding a better partition?

Thanks in advance,
Jose
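A minimal sketch of the distance-to-weight transformations described above; the 180x180 distance matrix here is a random placeholder (an assumption), not the actual data:

```python
import numpy as np

# Placeholder 180x180 distance matrix (assumption; substitute real data).
rng = np.random.default_rng(0)
d = rng.random((180, 180))
d = (d + d.T) / 2.0        # symmetrize
np.fill_diagonal(d, 0.0)   # zero self-distance

# Normalize distances to the range 0-1, then convert to weights.
d_norm = d / d.max()
w_linear = 1.0 - d_norm    # w = 1 - d
# w = 1/d, leaving the (undefined) diagonal entries at zero
w_inverse = np.divide(1.0, d_norm, out=np.zeros_like(d_norm),
                      where=d_norm > 0)
```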
Hi Jose,

On 04/04/2014 01:30 AM, Jose Magaña wrote:
> Hello, I am a new user of the library, so forgive me if my questions are very basic. I am using minimize_nested_blockmodel_dl to build a hierarchical clustering of a graph. The graph is created from a distance matrix over the nodes; there are 180 nodes. I have tried different transformations of the distances into weights, basically w = 1 - d and w = 1/d, after normalizing the distances to the range 0-1. Only with the second one have I been able to obtain a non-trivial partition, and the partition obtained is quite bad (low NMI and high NVI). Using other, much simpler methods (agglomerative clustering, for example) I have found much better partitions, so I know the data allows a better partition to be obtained.
>
> The method is non-parametric once the weights are provided, except for the number of sweeps, the epsilons, etc.
>
> How can I know whether I need to increase the number of sweeps, or whether I have reached the limit of the algorithm? So far I have run up to 10K sweeps, but there is not much difference in either the execution time or the result.
>
> What other parameters can be adjusted, and under what criteria?
>
> How must the data be normalized to obtain optimal results, i.e. to increase the algorithm's chances of finding a better partition?
The stochastic blockmodel implemented only covers unweighted graphs, or multigraphs where the weights correspond to edge multiplicities. The edge weight parameter of the minimize_nested_blockmodel_dl() function corresponds to edge multiplicities in a multigraph, not to arbitrary real weights. If you pass real weights, they will be converted to discrete integers, e.g. to zero if the weight lies in the range [0, 1), which is probably why you are getting bad results.

You could discretize the weights by multiplying them by some large constant, but you have to do this carefully, since the results will depend on your quantization, which will arbitrarily change the density of the graph...

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
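A minimal sketch of the discretization suggested above, assuming the eweight keyword of minimize_nested_blockmodel_dl() carries the edge multiplicities (as in the graph-tool API of that era); the placeholder weights and the scale constant are illustrative assumptions:

```python
import numpy as np
import graph_tool.all as gt

n = 180
# Placeholder real-valued similarity weights in [0, 1) (assumption).
rng = np.random.default_rng(0)
w_real = rng.random((n, n))
w_real = (w_real + w_real.T) / 2.0

g = gt.Graph(directed=False)
g.add_vertex(n)
mult = g.new_edge_property("int")   # integer edge multiplicities

scale = 100   # quantization constant; the result depends on this choice
for i in range(n):
    for j in range(i + 1, n):
        m = int(round(scale * w_real[i, j]))
        if m > 0:                   # edges that quantize to zero are dropped
            e = g.add_edge(i, j)
            mult[e] = m

# eweight: the edge-multiplicity parameter discussed above (assumed name).
state = gt.minimize_nested_blockmodel_dl(g, eweight=mult)
```

Note that with scale = 100, any weight below 0.005 still rounds to zero and the edge disappears, so the effective density of the multigraph follows directly from the chosen constant, which is exactly the caveat raised above.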