Getting edge probabilities for SBM.
Hi Tiago,

I am a new user of graph-tool. I am fitting a model to a graph, and to measure the goodness of fit I intend to use the log-likelihood scores of the generated graph and the real graph. For this, I need to calculate the probability of each edge existing in the generated graph. However, when I use the 'get_edge_prob' function of BlockState, I get values greater than 0 for an edge existing, which should not be possible, since log(prob) <= 0. It is supposed to return the 'unnormalised log probability' of the edge.

Can you please clarify?

Thanks,
Sukrit

--
View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...
Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.
On 12.10.2016 07:12, isukritgupta wrote:
> Hi Tiago, I am a new user of graph-tool. I am fitting a model to a graph, and to measure the goodness of fit I intend to use the log-likelihood scores of the generated graph and the real graph.
> For this, I need to calculate the probability of each edge existing in the generated graph. However, when I use the 'get_edge_prob' function of BlockState, I get values greater than 0 for an edge existing, which should not be possible, since log(prob) <= 0. It is supposed to return the 'unnormalised log probability' of the edge.
> Can you please clarify?
The exact quantity computed is explained here: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#predi...

It is the posterior likelihood of a missing edge, conditioned on the observed graph and model. It is unnormalized, because computing the normalization constant would require obtaining this value for every possible missing edge, which is typically of the order O(N^2). Since it is not normalized, it can also return log-probability values that are above zero.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
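[Editor's note: the point that an unnormalized log-probability can be positive, and that normalizing restores log(p) <= 0, can be illustrated without graph-tool. The candidate scores below are made-up numbers, not the output of get_edge_prob; a minimal sketch:]

```python
import math

# Made-up unnormalized log-probabilities for three candidate missing
# edges; note that the first one is above zero.
log_scores = [1.3, -0.7, -2.1]

# Normalization constant in log space: log(sum_i exp(log_scores[i])),
# computed stably by shifting by the maximum (log-sum-exp trick).
m = max(log_scores)
log_Z = m + math.log(sum(math.exp(s - m) for s in log_scores))

# Proper log-probabilities: each is now <= 0 and the probabilities sum to 1.
log_probs = [s - log_Z for s in log_scores]

assert all(lp <= 0 for lp in log_probs)
assert abs(sum(math.exp(lp) for lp in log_probs) - 1.0) < 1e-9
```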
Hi Tiago, how do we obtain the normalisation constant in this case? Please help with some sample code for it.

Regards,
Sukrit
On 12.10.2016 11:37, isukritgupta wrote:
> Hi Tiago, how do we obtain the normalisation constant in this case?
I just explained it. You need to sum the probabilities for every possible edge (which will be very slow, and I suspect is not really the quantity you are interested in). If you need help understanding the general framework, just scroll down in the page I sent you, and you will find the relevant literature.

--
Tiago de Paula Peixoto <tiago@skewed.de>
Please verify that I have understood this correctly: I calculate a normalisation constant for each iteration of the network generated by 'mcmc_equilibrate'. This constant is the sum of the probabilities of an edge existing between all pairs of nodes in the graph. To get the actual probability of an edge existing in the graph, I divide the unnormalised probability of that edge by the above-mentioned constant.

Thank you for being patient with me!
Sukrit
On 12.10.2016 11:59, isukritgupta wrote:
> I calculate a normalisation constant for each iteration of the network generated by 'mcmc_equilibrate'.
mcmc_equilibrate() will generate fits of the SBM, i.e. partitions of the network according to the posterior probability. It will not generate networks.
> This constant is the sum of the probabilities of an edge existing between all pairs of nodes in the graph.
No, you should consider only all pairs of nodes that are _not_ connected in the graph. This is about predicting _missing_ edges.
> To get the actual probability of an edge existing in the graph, I divide the unnormalised probability of that edge by the above-mentioned constant.
Yes. (Note that this probability will correspond to the question: assuming there is only one missing edge in the network, what is the likelihood of it being a particular edge? You have to be sure this is what you want to compute.)

--
Tiago de Paula Peixoto <tiago@skewed.de>
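[Editor's note: the procedure agreed on above can be sketched in pure Python. The log scores below are hypothetical stand-ins for what graph-tool's BlockState edge-prediction routine would return; everything else follows the thread: enumerate only the pairs that are *not* connected, then normalize over that candidate set.]

```python
import itertools
import math

# Toy 4-node undirected graph; in graph-tool the observed edges would
# come from the Graph object itself.
nodes = range(4)
edges = {(0, 1), (1, 2), (2, 3)}

# Candidate missing edges: all node pairs NOT connected in the graph.
non_edges = [(u, v) for u, v in itertools.combinations(nodes, 2)
             if (u, v) not in edges]

# Hypothetical unnormalized log-probabilities, one per non-edge; in
# graph-tool each value would come from the BlockState edge-prediction
# routine instead of being hard-coded.
log_scores = dict(zip(non_edges, [-0.5, 0.8, -1.2]))

# log-sum-exp normalization over the candidate set only.
m = max(log_scores.values())
log_Z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))

# P(the single missing edge is e | graph, model) for each candidate e.
probs = {e: math.exp(s - log_Z) for e, s in log_scores.items()}
```

The probabilities sum to one over the non-edges, matching the interpretation above: they answer "which edge is the missing one", assuming exactly one edge is missing.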
Hi Tiago, so this is what I want: I want to calculate the log-likelihood that the SBM fitted to a network models the network correctly, and compare it with the likelihoods of other model fits to the same network. I have the log-likelihood scores for the other model fits, but the SBM is pending. Until now, I was trying to get the edge probabilities for all the edges and calculate the log-likelihood from those, but this does not seem to be a good way to go. Can you suggest a way to do this with graph-tool?

I want to calculate the likelihood that the model produced by minimize_blockmodel_dl fits the input graph, without dropping any constants in the process. The constants are important because I want the real likelihood, so that I can compare it with fits of other models.

Sukrit
On 12.10.2016 12:37, isukritgupta wrote:
> Hi Tiago, so this is what I want: I want to calculate the log-likelihood that the SBM fitted to a network models the network correctly, and compare it with the likelihoods of other model fits to the same network. I have the log-likelihood scores for the other model fits, but the SBM is pending. Until now, I was trying to get the edge probabilities for all the edges and calculate the log-likelihood from those, but this does not seem to be a good way to go.
> Can you suggest a way to do this with graph-tool?
> I want to calculate the likelihood that the model produced by minimize_blockmodel_dl fits the input graph, without dropping any constants in the process. The constants are important because I want the real likelihood, so that I can compare it with fits of other models.
Ah, OK. That is something else entirely. You just want to do model selection. This is fully supported in graph-tool. There are two things you can compute:

1. The joint likelihood of the graph and partition. You get this simply via the state.entropy() function, which returns the negative logarithm of this value (also known as the description length). See: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#model...

2. The full marginal distribution, corresponding to the sum of the likelihood above over all partitions. This cannot be done exactly, but can be done approximately. This is explained here: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#model...

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
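[Editor's note: once the description lengths are in hand, the comparison in point 1 reduces to simple arithmetic. The sigma values below are made-up stand-ins for what state.entropy() would return for two competing fits of the same graph; a minimal sketch:]

```python
import math

# Made-up description lengths (negative log joint likelihoods, in nats)
# for two competing fits of the same graph; in graph-tool each would be
# the return value of state.entropy() for the corresponding state.
sigma_a = 1203.4   # model A
sigma_b = 1189.7   # model B

# A smaller description length means a larger joint likelihood, so
# model B is preferred here.  The log posterior odds ratio in favor of
# B over A is sigma_a - sigma_b.
log_odds_b_over_a = sigma_a - sigma_b
odds = math.exp(log_odds_b_over_a)
```

Because only the difference of the two description lengths enters the odds ratio, any constant dropped from *both* models would cancel; constants matter only when the competing likelihoods are computed under different conventions, which is exactly the concern raised above.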
participants (2)
- isukritgupta
- Tiago de Paula Peixoto