Questions about inference algorithms
Dear Dr Peixoto,

I would like to resolve some questions I have about the inference algorithms for the identification of large-scale network structure via the statistical inference of generative models.

The minimize_blockmodel algorithm takes an hour to finish on my network with 21000 nodes (and so does the hierarchical version), and it takes two and a half days with overlap. However, I have run a hierarchical analysis with overlap, and it has now been running for 14 days. So my first question is: is this time normal, or could there be a problem? Do you know how long it usually takes?

Secondly, I have repeated some of these analyses with exactly the same options but I get different solutions (similar, but different), so I wonder if the algorithm is heuristic (I thought it was exact).

My last question regards bipartite analysis. I have two types of nodes in my network and I wonder whether there is any analytical difference when running these algorithms with the bipartite option (clabel=True, with different labels in each group of nodes) or not, because it seems the program “knows” my network is bipartite in any case. If there are differences between the bipartite and “unipartite” analyses (clabel=False), is it possible to compare their description lengths for model selection?

Thank you very much for your help!

Best regards,
Andrea
On 09.05.2016 16:33, Andrea Briega wrote:
The minimize_blockmodel algorithm takes an hour to finish on my network with 21000 nodes (and so does the hierarchical version), and it takes two and a half days with overlap. However, I have run a hierarchical analysis with overlap, and it has now been running for 14 days. So my first question is: is this time normal, or could there be a problem? Do you know how long it usually takes?
The overlapping model is indeed much more time-consuming than the non-overlapping one. This is because in the overlapping case we need to partition the _half-edges_ instead of the nodes, as we do in the non-overlapping case. The number of half-edges is twice the number of edges, which is typically one or two orders of magnitude larger than the number of nodes. Because of this, and the fact that group mixtures are more expensive to represent algorithmically, the overlapping model tends to be slower. So, the time difference you are observing is expected. That said, it is difficult to say how long it will in fact take, since it depends on the number of edges, not nodes.

In my experience, comparatively few networks are better modelled by the overlapping model (see http://dx.doi.org/10.1103/PhysRevX.5.011033). Usually, the non-overlapping model provides a shorter description length, indicating that it has better explanatory power. Because of this, I generally tend to work with the non-overlapping model, unless there is a very specific reason not to.
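To put the size difference in perspective, here is a back-of-the-envelope illustration in plain Python. The node and edge counts are the ones mentioned later in this thread, and the ratio is only a rough proxy for the extra work per sweep, not a precise running-time estimate:

```python
# Rough illustration of why the overlapping model is slower: it partitions
# half-edges instead of nodes, and there are 2E half-edges for E edges.
n_nodes = 24000          # nodes in the network discussed in this thread
n_edges = 5_000_000      # edges in the same network
n_half_edges = 2 * n_edges

# The overlapping model therefore partitions hundreds of times more items
# than the non-overlapping one, which partitions the nodes:
ratio = n_half_edges / n_nodes
print(f"half-edges per node: {ratio:.0f}")  # half-edges per node: 417
```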
Secondly, I have repeated some of these analyses with exactly the same options but I get different solutions (similar, but different), so I wonder if the algorithm is heuristic (I thought it was exact).
No, it is not exact. This type of inference problem is NP-hard, hence exact algorithms are not feasible. In the default parametrization, the algorithm implemented is an agglomerative heuristic. The algorithm is probabilistic, so it will indeed return a slightly different fit every time. Thus the typical practice is to perform several fits and choose the one with the smallest description length. If more precision is desired, the algorithm can also be configured via the appropriate parameters to run a unity-temperature MCMC, Gibbs sampling, or simulated annealing. These, of course, are numerically more expensive.
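The "several fits, keep the best" practice can be sketched as follows in plain Python. Here fit_once() is a hypothetical stand-in for an actual stochastic minimization call such as gt.minimize_blockmodel_dl(g), replaced by a noisy dummy objective so the sketch is self-contained:

```python
import random

random.seed(1)  # only to make this sketch reproducible

def fit_once():
    # Stand-in for one stochastic minimization run, e.g.:
    #   state = gt.minimize_blockmodel_dl(g)
    #   return state.entropy()
    # Here we just return a noisy "description length" in nats.
    return random.uniform(12000.0, 12100.0)

# Run several independent fits and keep the one with the smallest
# description length:
results = [fit_once() for _ in range(10)]
best_dl = min(results)
```

With an actual graph-tool call one would keep the state object as well, e.g. `min(states, key=lambda s: s.entropy())`.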
My last question regards bipartite analysis. I have two types of nodes in my network and I wonder whether there is any analytical difference when running these algorithms with the bipartite option (clabel=True, with different labels in each group of nodes) or not, because it seems the program “knows” my network is bipartite in any case. If there are differences between the bipartite and “unipartite” analyses (clabel=False), is it possible to compare their description lengths for model selection?
The clabel parameter should not be Boolean; it needs to be a property map that marks the partition constraints for each node. Namely, vertices with different label values will not be clustered in the same group. This does not affect the computation of the description length; it simply guides the algorithm so that violating moves are not attempted.

Indeed, as you noticed, the algorithm will always learn that you have a bipartite network if that is the case, hence the 'clabel' parameter does not need to be used for that purpose. The clabel parameter is useful only in situations where there are other types of forbidden partitions that the algorithm would otherwise run into. Note that if the agglomerative heuristic is turned off, and MCMC is used instead, the algorithm may in fact put nodes belonging to different partitions of a bipartite network in the same group. The probability of this happening will depend on how "strongly" bipartite the network is. In this case, the clabel parameter is useful to forbid those non-bipartite divisions, if they are undesired.

Note that there is also a 'pclabel' parameter. This is almost identical to 'clabel', with the only difference that it _is_ in fact used to compute the description length. Namely, it assumes that this division is known a priori, and hence the description length for the partition will include only subdivisions thereof. This is useful, for example, when you know from context that your network is bipartite, and this is not a fact that you wish to learn. This generalizes also to other types of known divisions. I recommend using this, if that is the case.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
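To illustrate what the clabel constraint described above means operationally, here is a toy sketch in plain Python (not the actual graph-tool internals): a proposed move is simply rejected whenever it would put vertices with different labels in the same group.

```python
# Toy sketch of the clabel constraint (not graph-tool internals):
# vertices with different label values may never share a group.
clabel = {0: 1, 1: 1, 2: 2, 3: 2}          # e.g. the two sides of a bipartite network
groups = {0: "a", 1: "a", 2: "b", 3: "b"}  # current partition

def move_allowed(v, target_group):
    """A move of v into target_group is allowed only if every vertex
    already in that group carries the same clabel value as v."""
    return all(clabel[u] == clabel[v]
               for u, grp in groups.items() if grp == target_group)

print(move_allowed(1, "b"))  # False: group "b" holds label-2 vertices
print(move_allowed(1, "a"))  # True: same label throughout
```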
Thank you very much! Your answer has been really helpful; now I understand this much better. I'll think about the options you mentioned.

Thanks again,
Andrea
Hi again,

I have recently noticed the update of graph-tool and now I am a little bit confused about some of the changes. Sorry, I know my questions are very basic. I am not familiar with this language and I have some difficulties getting results.

I am running the inference algorithms to get the best model using different options for model selection. I want to set pclabel in the inference algorithms because I know a priori that my network is bipartite, and then I want to get the description length. Before the update I did this in the following way:

    vprop_double = g.new_vertex_property("int")  # g is my network
    for i in range(0, 11772):
        vprop_double[g.vertex(i)] = 1
    for i in range(11773, 214221):
        vprop_double[g.vertex(i)] = 2

    state = gt.minimize_blockmodel_dl(g, pclabel=True)
    state.entropy(dl=True)  # I am not sure this is the right way to get the description length.

But now I have some problems. First of all, minimize_blockmodel_dl doesn't have a pclabel argument, so I don't know how to indicate it in the inference algorithm. I have tried this:

    state.pclabel = vprop_double

But I get the same result when I do "state.entropy(dl=True)" as before. Also, I get the same result doing "state.entropy(dl=True)" or "state.entropy()", and I don't understand why either.

And finally, in NestedBlockState objects I don't know how to get the description length, because entropy doesn't have a "dl" argument. In these objects, are entropy and description length the same?

In conclusion, I don't know how to set pclabel and get the description length in hierarchical models, and I am not sure if I am getting it correctly in the non-hierarchical ones.

Sorry again for my basic questions, but I can't go on because of these problems.

Thank you very much!

Best regards,
Andrea
On 12.05.2016 10:56, Andrea Briega wrote:
    vprop_double = g.new_vertex_property("int")  # g is my network
    for i in range(0, 11772):
        vprop_double[g.vertex(i)] = 1
    for i in range(11773, 214221):
        vprop_double[g.vertex(i)] = 2
state = gt.minimize_blockmodel_dl(g, pclabel=True)
You should not pass "pclabel=True" here... the pclabel parameter expects the property map itself, not a Boolean value. Furthermore, you should not pass it to the minimize_blockmodel_dl() function directly. Instead, you should do

    state = gt.minimize_blockmodel_dl(g, state_args=dict(pclabel=vprop_double))
state.entropy(dl=True) # I am not sure this is the right way to get the description length.
It is, but if you look at the documentation, you will see that the default is "dl = True", hence calling only state.entropy() is sufficient.
But now I have some problems. First of all, minimize_blockmodel_dl doesn't have a pclabel argument, so I don't know how to indicate it in the inference algorithm. I have tried this:
state.pclabel = vprop_double
This is incorrect. Never try to modify the states directly like this. You have to do it like I described above.
And finally, in NestedBlockState objects I don't know how to get the description length, because entropy doesn't have a "dl" argument. In these objects, are entropy and description length the same?
Yes. Since the NestedBlockState corresponds to the regular BlockState plus a hierarchical prior, its full entropy always corresponds to the description length.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
Thank you very much! I was wrong, I meant "state = gt.minimize_blockmodel_dl(g, pclabel=vprop_double)"; it was a mistake made while writing the mail. So the key was the use of "state_args"; I had tried it, but with a different notation, and obviously it didn't work. Now I can go on!

Thanks again,
Andrea
Dear Mr Peixoto,

I have just run a few analyses with the new version of your package and my results change completely between v2.13 and v2.16. Minimize_nested_blockmodel is the one that shows the most relevant changes, and it is very difficult to get a biological explanation of the new results, mainly at the upper hierarchical levels. I would like to know the particular changes in these two analyses, to better understand my results. Is it possible to change any parameter to run this function in a way similar to v2.13? I used to run this function in this way:

    state = minimize_nested_blockmodel_dl(g, pclabel=vprop_double, overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)

And I have run the new version of the function in this way:

    state = minimize_nested_blockmodel_dl(g, state_args=dict(pclabel=vprop_double), overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)

Thank you very much,
Andrea
On 19.05.2016 09:55, Andrea Briega wrote:
I have just run a few analyses with the new version of your package and my results change completely between v2.13 and v2.16.
Minimize_nested_blockmodel is the one that shows the most relevant changes, and it is very difficult to get a biological explanation of the new results, mainly at the upper hierarchical levels.
There is nothing fundamental that changed between versions, only a re-organization of the code.
I would like to know the particular changes in these two analyses, to better understand my results. Is it possible to change any parameter to run this function in a way similar to v2.13? I used to run this function in this way:
state = minimize_nested_blockmodel_dl(g, pclabel=vprop_double, overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)
In that version, the pclabel option did not exist. Note also that passing all the other options is redundant, since you are using their default values.
And I have run the new version of the function in this way:
state = minimize_nested_blockmodel_dl(g, state_args=dict(pclabel=vprop_double), overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)
The difference here is that pclabel is now being used; in the previous version it was being ignored. Maybe you should try without the pclabel option, to see if you get results similar to the ones you got previously. This may help you understand the difference.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
Thank you very much, your answers have been really helpful. I am now on the last step, model selection, and I would like to be sure that I'm doing it right.

I get the posterior odds ratio to compare two partitions in this way: e^-(dl1-dl2), with dl1 and dl2 as the higher and lower description lengths, respectively. I have obtained the description length using 'state.entropy()' for nested models and 'state.entropy(dl=True)' for non-nested ones.

I have doubts about this because small differences in description length lead to values much lower than 0.01, so in most cases the evidence supporting one of the models is decisive. I only get values higher than 0.01 if the difference in description length is lower than 5 units. With my data (24,000 nodes and 5,000,000 edges) I always obtain decisive support, both when I compare different models and when I compare different runs of the same model. I wonder if this is right.

Thanks again,
Andrea
On 08.06.2016 10:14, Andrea Briega wrote:
Thank you very much, your answers have been really helpful. I am now on the last step, model selection, and I would like to be sure that I'm doing it right. I get the posterior odds ratio to compare two partitions in this way: e^-(dl1-dl2), with dl1 and dl2 as the higher and lower description lengths, respectively. I have obtained the description length using 'state.entropy()' for nested models and 'state.entropy(dl=True)' for non-nested ones.
This is correct. Note that in current versions of graph-tool you can also just call state.entropy() for non-nested models, since we have dl=True per default.
I have doubts about this because small differences in description length lead to values much lower than 0.01, so in most cases the evidence supporting one of the models is decisive. I only get values higher than 0.01 if the difference in description length is lower than 5 units. With my data (24,000 nodes and 5,000,000 edges) I always obtain decisive support, both when I compare different models and when I compare different runs of the same model. I wonder if this is right.
This is indeed expected if you have lots of data (i.e. large networks). For sufficient data, the evidence for the better model should always become decisive, as long as the models being compared are indeed distinguishable. 5 million edges is quite a lot, and indeed I would expect the posterior odds to be quite small in this situation. You just have to make sure that you have found the best fit (i.e. the smallest description length) for each model you are comparing, by running the algorithm as many times as you can.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
Ok, thanks! I'll run the algorithm more times then, to try to find the best fit.

Best regards,
Andrea
participants (2)

- Andrea Briega
- Tiago de Paula Peixoto