On 08.06.2016 10:14, Andrea Briega wrote:
Thank you very much, your answers have been really helpful. I am now on the last step, model selection, and I would like to be sure that I'm doing it right. I compute the posterior odds ratio between two partitions as e^-(dl1 - dl2), with dl1 and dl2 the higher and lower description lengths, respectively. I have obtained the description lengths using 'state.entropy()' for nested models and 'state.entropy(dl=True)' for non-nested ones.
This is correct. Note that in current versions of graph-tool you can also just call state.entropy() for non-nested models, since dl=True is the default.
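For concreteness, a minimal sketch of such a comparison (assuming `g` is the graph being fitted; the two model choices here are just placeholders, and for large description length differences the ratio under/overflows, so it is better kept in log space):

    import graph_tool.all as gt

    # Sketch: assumes `g` is the graph being analyzed.
    state1 = gt.minimize_nested_blockmodel_dl(g)  # nested model
    state2 = gt.minimize_blockmodel_dl(g)         # non-nested model

    dl1 = state1.entropy()  # description length (dl=True is the default)
    dl2 = state2.entropy()

    # Posterior odds ratio Lambda = exp(-(dl1 - dl2)); for large
    # |dl1 - dl2| exp() under/overflows, so report the log instead.
    log_lambda = -(dl1 - dl2)
    print("log posterior odds:", log_lambda)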
I have doubts about this because small differences in description length already give odds ratios much lower than 0.01, so in most cases the evidence supporting one of the models is decisive. I only get values higher than 0.01 when the difference in description length is below 5 units. With my data (24,000 nodes and 5,000,000 edges) I always obtain decisive support, both when I compare different models and when I compare different runs of the same model. I wonder if this is right.
This is indeed expected if you have lots of data (i.e. large networks). Given sufficient data, the evidence for the better model should always become decisive, as long as the models being compared are in fact distinguishable. 5 million edges is quite a lot, and indeed I would expect the posterior odds to be very small in this situation. You just have to make sure that you have found the best fit (i.e. the smallest description length) for each model you are comparing, by running the algorithm as many times as you can, along the lines of the sketch below.
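A minimal sketch of this repeated-run selection (assuming `g` is the graph; the number of runs is arbitrary and only for illustration):

    import graph_tool.all as gt

    # Sketch: assumes `g` is the graph; `n_runs` is an arbitrary choice.
    n_runs = 10
    best_state, best_dl = None, float("inf")
    for _ in range(n_runs):
        state = gt.minimize_blockmodel_dl(g)  # independent minimization run
        dl = state.entropy()
        if dl < best_dl:                      # keep the smallest DL found
            best_dl, best_state = dl, state
    print("best (smallest) description length:", best_dl)

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>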