Hi there,
I am trying to include edge weights by taking into account an edge covariate matrix for the nested block model inference. Each time I run the code on my data set, I get slightly different results, both in terms of the number of blocks and the nodes in each block.
This is my code:

    state = minimize_nested_blockmodel_dl(g,
                state_args=dict(recs=[g.edge_properties["weight"]],
                                rec_types=["discrete-geometric"]))
    state.draw(edge_color=prop_to_size(g.edge_properties["weight"], power=1, log=True),
               ecmap=(matplotlib.cm.gist_heat, .6),
               eorder=g.edge_properties["weight"],
               edge_pen_width=prop_to_size(g.edge_properties["weight"], 1, 4, power=1, log=True),
               edge_gradient=[],
               vertex_text=g.vertex_properties["attribute"],
               vertex_text_position="centered",
               vertex_text_rotation=g.vertex_properties['text_rotation'],
               vertex_font_size=10,
               vertex_font_family='mono',
               vertex_anchor=0,
               output_size=[1024*2, 1024*2],
               output="DiscreteGeometric_%s.pdf" % (eventName))
I would appreciate it if you could explain what your approach would be, and how I can run graph-tool using the covariance matrix of edges in order to get statistically reliable results.
Is there also any way to get the full posterior of each node belonging to each block?
Thanks in advance.
On 26.04.2018 12:52, Zahra Sheikhbahaee wrote:
Hi there,
I am trying to include edge weights by taking into account an edge covariate matrix for the nested block model inference. Each time I run the code on my data set, I get slightly different results, both in terms of the number of blocks and the nodes in each block.
This is because the inference is made using MCMC, which is a stochastic algorithm. You have to run it multiple times, and select the result with the largest posterior probability (if you only want a point estimate).
This is my code:

    state = minimize_nested_blockmodel_dl(g,
                state_args=dict(recs=[g.edge_properties["weight"]],
                                rec_types=["discrete-geometric"]))
    state.draw(edge_color=prop_to_size(g.edge_properties["weight"], power=1, log=True),
               ecmap=(matplotlib.cm.gist_heat, .6),
               eorder=g.edge_properties["weight"],
               edge_pen_width=prop_to_size(g.edge_properties["weight"], 1, 4, power=1, log=True),
               edge_gradient=[],
               vertex_text=g.vertex_properties["attribute"],
               vertex_text_position="centered",
               vertex_text_rotation=g.vertex_properties['text_rotation'],
               vertex_font_size=10,
               vertex_font_family='mono',
               vertex_anchor=0,
               output_size=[1024*2, 1024*2],
               output="DiscreteGeometric_%s.pdf" % (eventName))
Although it is not important for the questions you have raised, it is not very useful to post incomplete code. Normally, for troubleshooting purposes, it is necessary for you to provide a _minimal_ and _self-contained_ program that anyone could execute to verify the problem you are reporting.
I would appreciate it if you could explain what your approach would be, and how I can run graph-tool using the covariance matrix of edges in order to get statistically reliable results.
This is covered in detail in the HOWTO:
https://graph-tool.skewed.de/static/doc/demos/inference/inference.html
and also in many papers, e.g.
https://arxiv.org/abs/1705.10225 https://arxiv.org/abs/1708.01432
However, I'm not sure what you mean by "covariance matrix of edges". The approach in question deals with graphs with edge covariates (a.k.a. weights). A covariance matrix usually refers to something else.
Is there also any way to get the full posterior of each node belonging to each block?
This is also explained in detail in the HOWTO:
https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#sampling-from-the-posterior-distribution
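For reference, one way to do this with graph-tool's sampling machinery looks roughly like the following (a sketch only, using a flat, non-nested state for brevity; the equilibration parameters are arbitrary illustrations, and the nested analogue is spelled out in the HOWTO):

    import graph_tool.all as gt

    state = gt.minimize_blockmodel_dl(g)                  # point estimate as a starting point
    gt.mcmc_equilibrate(state, wait=1000,                 # equilibrate the Markov chain first
                        mcmc_args=dict(niter=10))

    pv = None
    def collect_marginals(s):
        global pv
        pv = s.collect_vertex_marginals(pv)               # accumulate per-node group counts

    # sample from the posterior, accumulating the vertex marginals along the way
    gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
                        callback=collect_marginals)

    # pv[v] now holds, for node v, the number of samples in which it belonged to each
    # group; normalizing these counts gives the marginal posterior membership probabilities.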
Best, Tiago
In my network, besides the information about which two nodes form an edge, I also have the duration for which each edge has lasted. I included this information as a weight and used it as the edge covariate of the SBM. The results seem more reasonable than when no weights are considered. However, the number of blocks changes slightly each time I run my script with the piece of code given before. So I was wondering whether I should run the minimize_nested_blockmodel_dl function with a higher number of MCMC iterations as an argument, so that I get more accurate results with the highest confidence, or whether I just need to repeat this function in a loop and then compute the mean number of blocks? I hope my question makes sense.
Thanks again. Zahra
On 26.04.2018 15:29, Zahra Sheikhbahaee wrote:
In my network, besides the information about which two nodes form an edge, I also have the duration for which each edge has lasted. I included this information as a weight and used it as the edge covariate of the SBM. The results seem more reasonable than when no weights are considered. However, the number of blocks changes slightly each time I run my script with the piece of code given before. So I was wondering whether I should run the minimize_nested_blockmodel_dl function with a higher number of MCMC iterations as an argument, so that I get more accurate results with the highest confidence, or whether I just need to repeat this function in a loop and then compute the mean number of blocks? I hope my question makes sense.
You should run the algorithm multiple times, and choose the result with the smallest description length. You get this value via the method state.entropy().
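As an illustration, such a selection loop might look like the following (a sketch, assuming g and its "weight" edge property are defined as in the code posted earlier; the number of runs is an arbitrary choice):

    from graph_tool.all import minimize_nested_blockmodel_dl

    # run the minimization several times, since each run may end in a different local optimum
    states = [minimize_nested_blockmodel_dl(
                  g, state_args=dict(recs=[g.edge_properties["weight"]],
                                     rec_types=["discrete-geometric"]))
              for _ in range(10)]

    # keep the fit with the smallest description length (i.e. smallest state.entropy())
    best = min(states, key=lambda s: s.entropy())
    print("description length:", best.entropy())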
Best, Tiago
Thanks a lot for the advice!
Hi Tiago,
For the non-parametric weighted SBMs, how can I extract the "description length" from the state.entropy() method? Is it equivalent to taking the maximum entropy value after running the algorithm multiple times?
I also have a theoretical question: I have read most of your recent papers, and I see this statement, but I could not find a more detailed explanation of why it is the case. Why do you use the "micro-canonical formulation"? You stated that "it approaches the canonical distributions asymptotically". In case you have explained it in one of your papers, would you kindly refer me to the right paper?
Thanks in advance.
Best, Zahra
On 16.07.2018 15:15, Zahra Sheikhbahaee wrote:
For the non-parametric weighted SBMs, how can I extract the "description length" from the state.entropy() method? Is it equivalent to taking the maximum entropy value after running the algorithm multiple times?
The entropy() method returns the negative joint log-likelihood of the data and model parameters. For discrete data and model parameters, this equals the description length.
For the weighted SBM with continuous covariates, the data and model are no longer discrete, so this value can no longer be called a description length, although it plays the same role. However, for discrete covariates, it is the description length.
I also have a theoretical question: I have read most of your recent papers, and I see this statement, but I could not find a more detailed explanation of why it is the case. Why do you use the "micro-canonical formulation"? You stated that "it approaches the canonical distributions asymptotically". In case you have explained it in one of your papers, would you kindly refer me to the right paper?
The microcanonical model is identical to the canonical model, if the latter is integrated over its continuous parameters using uninformative priors, as explained in detail here:
https://arxiv.org/abs/1705.10225
Therefore, in a Bayesian setting, it makes no difference which one is used, as they yield the same posterior distribution.
The main reason to use the microcanonical formulation is that it makes it easier to extend the Bayesian hierarchy, i.e. include deeper priors and hyperpriors, thus achieving more robust models without a resolution limit, accommodating arbitrary group sizes and degree distributions, etc. Within the canonical formulation, this is technically more difficult.
Best, Tiago
Hi Tiago,
Thanks for the explanation. I have another question:
In the "Inferring the mesoscale structure of layered, edge-valued and time-varying networks", you compared two way of constructing layered structures: first approach: You assumed an adjacency matrix in each independent layer. The second method, the collapsed graph considered as a result of merging all the adjacency matrices together.
I am wondering how I can use graph_tool for the first method? Which method or class should I use? If there is a class, is it still possible to consider a graph with weighted edges?
Thanks again.
Regards, Zahra
On 16.07.2018 22:46, Zahra Sheikhbahaee wrote:
Hi Tiago,
Thanks for the explanation. I have another question:
In the "Inferring the mesoscale structure of layered, edge-valued and time-varying networks", you compared two way of constructing layered structures: first approach: You assumed an adjacency matrix in each independent layer. The second method, the collapsed graph considered as a result of merging all the adjacency matrices together.
I am wondering how I can use graph_tool for the first method. Which method or class should I use?
You have to pass the option "layers=True" to the LayeredBlockState constructor:
https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference.layered_blockmodel.LayeredBlockState
If there is a class, is it still possible to consider a graph with weighted edges?
Yes, it accepts 'recs/rec_types/rec_params' just like the regular BlockState.
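Putting the two together, the construction could look roughly like this (a sketch; the edge properties "layer" and "weight" are placeholder names for the layer index and the covariate, respectively):

    import graph_tool.all as gt

    state = gt.LayeredBlockState(g,
                                 ec=g.ep["layer"],        # layer membership of each edge
                                 layers=True,             # independent layers, not the collapsed variant
                                 recs=[g.ep["weight"]],   # edge covariates, as with BlockState
                                 rec_types=["discrete-geometric"])

    # The state can then be refined or sampled with the usual MCMC machinery
    # (mcmc_sweep, mcmc_equilibrate, etc.), as described in the inference HOWTO.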
Best, Tiago
Hi Tiago,
Thanks for the reply. In section VI of your paper "Inferring the mesoscale structure of layered, edge-valued and time-varying networks", you used the layered stochastic block model for a temporal network. I have a similar data set, for which I do not want to fix the membership of nodes to the same block across all layers (nodes can change their block memberships over time). I am wondering again how I can use graph-tool for this case. Which method or constructor should I use?
Regards, Zahra
On 19.07.2018 16:01, Zahra Sheikhbahaee wrote:
Thanks for the reply. In section VI of your paper "Inferring the mesoscale structure of layered, edge-valued and time-varying networks", you used the layered stochastic block model for a temporal network. I have a similar data set, for which I do not want to fix the membership of nodes to the same block across all layers (nodes can change their block memberships over time). I am wondering again how I can use graph-tool for this case. Which method or constructor should I use?
It's always the same constructor, LayeredBlockState. To allow the membership to change across layers, you need to set overlap=True.
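Concretely, continuing the earlier LayeredBlockState sketch, that would amount to something like (same assumed "layer" and "weight" edge properties as before):

    import graph_tool.all as gt

    state = gt.LayeredBlockState(g,
                                 ec=g.ep["layer"],
                                 layers=True,              # independent layers
                                 overlap=True,             # memberships may differ from layer to layer
                                 recs=[g.ep["weight"]],
                                 rec_types=["discrete-geometric"])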
Hi Tiago,
I have another naive question: it is still not clear to me how I can pass the different weighted graphs for each timestamp to the LayeredBlockState constructor, so that each of these weighted graphs is considered as one layer of the multilayer network, because the input is just a single graph.
Regards, Zahra
On 19.07.2018 19:17, Zahra Sheikhbahaee wrote:
I have another naive question: it is still not clear to me how I can pass the different weighted graphs for each timestamp to the LayeredBlockState constructor, so that each of these weighted graphs is considered as one layer of the multilayer network, because the input is just a single graph.
Right, you have to collapse them into a single graph with multiple edges. The time-stamp on the edges (i.e. the "layers") should be stored as a property map that you pass as the 'ec' parameter to LayeredBlockState.
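A sketch of how this collapse could look in practice (here gs is a hypothetical list of per-timestamp snapshot graphs that share the same vertex indexing, each carrying a "weight" edge property; undirected for concreteness):

    import graph_tool.all as gt

    u = gt.Graph(directed=False)              # the collapsed multigraph
    u.add_vertex(gs[0].num_vertices())
    weight = u.new_edge_property("int")       # edge covariate (e.g. duration)
    layer = u.new_edge_property("int")        # time stamp / layer index of each edge

    for t, g_t in enumerate(gs):
        for e in g_t.edges():
            ne = u.add_edge(int(e.source()), int(e.target()))
            weight[ne] = g_t.ep["weight"][e]
            layer[ne] = t

    u.ep["weight"] = weight                   # make the maps internal to the graph
    u.ep["layer"] = layer

    state = gt.LayeredBlockState(u, ec=u.ep["layer"], layers=True,
                                 recs=[u.ep["weight"]],
                                 rec_types=["discrete-geometric"])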