Hello,
I'm studying political coalitions over time (three time slices) and across discrete types of relationships (collaboration, ideological similarity, and discursive similarity). Since graph-tool offers (the only) principled approach that does not require arbitrary user-specified parameters (such as 'resolution' or 'coupling'), I would very much like to use it for my analysis.
I'm trying to fit a SBM to a graph with edges divided into distinct layers (three time slices X three types of relationships). Each edge has a discrete weight in the range of [0, 3], with weight referring to the 'strength' of the relationship. Not all nodes participate in all layers.
I'm interested in how/if the block membership varies across layers that I expect to be somewhat dependent on one another (hence I would like to avoid fitting a separate SBM to each layer). In my ideal scenario, I would be using the NestedBlockState with LayeredBlockState as base and allowing for overlaps across layers:
state=NestedBlockState(g, base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"]))
dS, nmoves=0, 0 for i in range(100): ret=state.multiflip_mcmc_sweep(niter=10) dS+=ret[0] nmoves+=ret[1]
print("Change in description length:", dS) print("Number of accepted vertex moves:", nmoves)
mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10), verbose=True)
bs=[] def collect_partitions(s): global bs bs.append(s.get_bs())
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)
pm=PartitionModeState(bs, nested=True, converge=True) pv=pm.get_marginal(g)
bs=pm.get_max_nested() state=state.copy(bs=bs)
Two questions:
Q1) This is taking very long (days) on a high performance cluster and actually I haven't seen it finishing up for once. Is my model specification correct or simply too complex? If the latter, is there a better strategy to perform my analysis? All I can think of is to revert to a non-nested model and/or reduce the number of layers and/or edges:
state=LayeredBlockState(g, deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])
dS, nattempts, nmoves=state.multiflip_mcmc_sweep(niter=1000) print("Change in description length:", dS) print("Number of accepted vertex moves:", nmoves)
mcmc_equilibrate(state, wait=1000, nbreaks=2, mcmc_args=dict(niter=10), verbose=True)
bs=[] def collect_partitions(s): global bs bs.append(s.b.a.copy())
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)
pm=PartitionModeState(bs, converge=True) pv=pm.get_marginal(g)
Q2) Say that the model runs and yields a neat NestedBlockState (or LayeredBlockState) object with X overlapping blocks, 9 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 220 vertices and 11322 edges, 2 internal vertex properties, 4 internal edge properties>.
Now, in previous posts there are mentions about the number of nodes multiplying by the number of layers when 'overlap=True', but in my experiments with only two layers, this does not seem to be the case and I struggle to find a way to extract the layer-specific block and respective memberships of nodes.
I've tried iteration over vertices, but what I eventually need is a 2d_array [y=all nodes in g, x=all layers in g, values=most likely block membership] in order to map how/if the block membership of each node varies across those layers in which they participate (ideally somehow denoting non-participation in the layer with NaN or some unique value). Is there some way to get there?
l=state.get_levels()[0] #NestedBlockState B=l.get_nonempty_B() bv=l.get_overlap_blocks()[0]
z=np.zeros((g.num_vertices(), B)) #to be filled with most likely block membership with potential overlaps across ALL layers, but I need the layer-specific information for i in range(g.num_vertices()): z[int(i), :len(z[v])]=bv[i].a+1
for v in range(g.num_vertices()*max(g.ep.layer)+1): #to see if the number of vertices multiplied by layers, which it didn't print(bv[v])
Many thanks, Arttu
Am 09.07.21 um 12:47 schrieb arttu.malkamaki@helsinki.fi:
Q1) This is taking very long (days) on a high performance cluster and actually I haven't seen it finishing up for once. Is my model specification correct or simply too complex? If the latter, is there a better strategy to perform my analysis? All I can think of is to revert to a non-nested model and/or reduce the number of layers and/or edges:
Very hard to say something, without more concrete information. How large is your network?
You are using the most complex combination possible, so indeed it should take longer than the simpler variants.
My overall suggestion is to always try the minimize_nested_blockmodel_dl() function first, and (time permitting) use that as a starting point to the MCMC.
Furthermore, I would suggest starting simple, using non-overlapping, unweighted and non-layered models first, and seeing how far you get, and building understanding from your data that way. You can then introduce more elaborate models step by step.
Q2) Say that the model runs and yields a neat NestedBlockState (or LayeredBlockState) object with X overlapping blocks, 9 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 220 vertices and 11322 edges, 2 internal vertex properties, 4 internal edge properties>.
Now, in previous posts there are mentions about the number of nodes multiplying by the number of layers when 'overlap=True', but in my experiments with only two layers, this does not seem to be the case and I struggle to find a way to extract the layer-specific block and respective memberships of nodes.
I've tried iteration over vertices, but what I eventually need is a 2d_array [y=all nodes in g, x=all layers in g, values=most likely block membership] in order to map how/if the block membership of each node varies across those layers in which they participate (ideally somehow denoting non-participation in the layer with NaN or some unique value). Is there some way to get there?
In the overlapping model, group membership is a property of _edges_ (actually, half-edges), not nodes directly. Take a look at OverlapBlockState.get_edge_blocks(), which will give you this information. From this, you should be able to decompose membership across layers, etc.
Best, Tiago
Oi!
The graph has 220 nodes and 11322 edges divided into 9 layers, so it is not huge. But this seems like a good way forward - I'll keep on experimenting.
Excellent - get_edge_blocks is the function I was looking for.
Very helpful, thank you so much!
Arttu
Am 09.07.21 um 14:29 schrieb arttu.malkamaki@helsinki.fi:
Oi!
The graph has 220 nodes and 11322 edges divided into 9 layers, so it is not huge. But this seems like a good way forward - I'll keep on experimenting.
It seems really strange that it would take _days_ for such a small network!
There must be something wrong going on.
Try the step-wise approach, maybe it will help you to track it down.
Best, Tiago
Starting with minimize_nested_blockmodel_dl indeed speeds up the process and now I'm more confident that the model will eventually run:
state=minimize_nested_blockmodel_dl(g, state_args=dict(base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])))
dS, nmoves=0, 0 for i in range(100): ret=state.multiflip_mcmc_sweep(niter=10) dS+=ret[0] nmoves+=ret[1]
print("Change in description length:", dS) print("Number of accepted vertex moves:", nmoves)
mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10), verbose=True)
bs=[] def collect_partitions(s): global bs bs.append(s.get_bs())
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)
pm=PartitionModeState(bs, nested=True, converge=True) pv=pm.get_marginal(g)
bs=pm.get_max_nested() state=state.copy(bs=bs)
Coming back to my initial question concerning the layer-specific partitions, it seems that I was too optimistic about being able to decompose this information from get_edge_blocks. I have already spent time reading the docs and the use of _be_ in source codes to convert the block membership of each half-edge to layer-specific block membership of each node, so I need to ask for more instructions.
There are a few previous posts about extracting this information but none of the answers addresses this decomposition issue beyond get_edge_blocks. I presume there is a relatively straightforward way to do this?
I also thought that there could have been a more convenient way of doing this by applying set_edge_filter to state.g and get_majority_blocks to extract this information, but yeah, didn't work.
Am 11.07.21 um 17:26 schrieb arttu.malkamaki@helsinki.fi:
Coming back to my initial question concerning the layer-specific partitions, it seems that I was too optimistic about being able to decompose this information from get_edge_blocks. I have already spent time reading the docs and the use of_be_ in source codes to convert the block membership of each half-edge to layer-specific block membership of each node, so I need to ask for more instructions.
There are a few previous posts about extracting this information but none of the answers addresses this decomposition issue beyond get_edge_blocks. I presume there is a relatively straightforward way to do this?
Do what? Please ask a specific question.
Best, Tiago
I was referring to the decomposition of layer-specific node membership from get_edge_blocks.
So, how do I process the _be_ returned by layers[0].get_edge_blocks() to yield this information as an np.array or vertex_property_map?
Thanks, Arttu
Am 11.07.21 um 17:49 schrieb arttu.malkamaki@helsinki.fi:
I was referring to the decomposition of layer-specific node membership from get_edge_blocks.
So, how do I process the _be_ returned by layers[0].get_edge_blocks() to yield this information as an np.array or vertex_property_map?
I'm sorry, but this is not helping. You are not making any substantial effort to be specific, despite my asking, and it's still not clear what you want that is not already given.
A half-edge points to a node. Each half-edge belongs to a group, as given by get_edge_blocks(). Each half-edge also belongs to a specific layer.
What else do you need?
Hi,
I'm sorry, I'll try to be more specific.
I eventually need a matrix that I can plot as done in this post by ulgenklc: https://nabble.skewed.de/Temporal-community-detection-with-DSBM-tt4028581.ht...
Here, I use a subgraph with only two layers to demonstrate this seemingly trivial issue:
np.random.seed(42) gt.seed_rng(42)
state = minimize_nested_blockmodel_dl(g, state_args=dict(base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])))
This yields:
<NestedBlockState object, with base <LayeredBlockState object with 4610 overlapping blocks, 2 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 82 vertices and 2305 edges, 2 internal vertex properties, 4 internal edge properties, and 8 levels of sizes [(82, 3), (3, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)] at 0x2b1aa8982dc0>
While there are 4610 overlapping blocks in this stage, state.draw() divides the nodes neatly into three sensible communities. Hence, I know that the snippet of code that I need already exists somewhere in draw_hierarchy.
Then, I filter the edges in state.g by layer and use get_edge_blocks() to extract the layer-specific half-edges:
lay = [] for e in g.edges(): lay.append(g.ep.layer[e]==0) #edges in the first layer lay = np.array(lay, dtype="int") ft = g.new_edge_property("bool") ft.a = lay state.g.set_edge_filter(ft) be = state.levels[0].get_edge_blocks()
This is where I struggle to find a way to extract the desired information. My best guess on how to extract the layer-specific node memberships from _be_ is the following:
s = OverlapBlockState(state.g, b=be) #state.g with filter on mb = s.get_majority_blocks() for v in g.vertices(): print(mb[v])
If I compare the _mb_ for the discrete layers, the memberships vary slightly (this is what I need), but I'm unsure whether this is the correct way to extract this information?
Many thanks, Arttu
Am 11.07.21 um 23:31 schrieb arttu.malkamaki@helsinki.fi:
Hi,
I'm sorry, I'll try to be more specific.
I eventually need a matrix that I can plot as done in this post by ulgenklc: https://nabble.skewed.de/Temporal-community-detection-with-DSBM-tt4028581.ht...
Here, I use a subgraph with only two layers to demonstrate this seemingly trivial issue:
np.random.seed(42) gt.seed_rng(42)
state = minimize_nested_blockmodel_dl(g, state_args=dict(base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])))
This yields:
<NestedBlockState object, with base <LayeredBlockState object with 4610 overlapping blocks, 2 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 82 vertices and 2305 edges, 2 internal vertex properties, 4 internal edge properties, and 8 levels of sizes [(82, 3), (3, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)] at 0x2b1aa8982dc0>
While there are 4610 overlapping blocks in this stage, state.draw() divides the nodes neatly into three sensible communities. Hence, I know that the snippet of code that I need already exists somewhere in draw_hierarchy.
Then, I filter the edges in state.g by layer and use get_edge_blocks() to extract the layer-specific half-edges:
lay = [] for e in g.edges(): lay.append(g.ep.layer[e]==0) #edges in the first layer lay = np.array(lay, dtype="int") ft = g.new_edge_property("bool") ft.a = lay state.g.set_edge_filter(ft) be = state.levels[0].get_edge_blocks()
This is where I struggle to find a way to extract the desired information. My best guess on how to extract the layer-specific node memberships from _be_ is the following:
s = OverlapBlockState(state.g, b=be) #state.g with filter on mb = s.get_majority_blocks() for v in g.vertices(): print(mb[v])
If I compare the _mb_ for the discrete layers, the memberships vary slightly (this is what I need), but I'm unsure whether this is the correct way to extract this information?
If "be" is the property map returned by state.levels[0].get_edge_blocks(), then
bm = g.new_vp("vector<int>", val=[0] * state.levels[0].get_B()) for e in g.edges(): if layer[e] != 0: continue u, v = e u, v = sorted((u, v)) r, s = be[e] bm[u][r] += 1 bm[v][s] += 1
will give you the memberships in each group at layer 0 in the property map "bm". Is this what you want?
Best, Tiago
Yep, this is what I needed - "bm" works fine by handing me the (possibly) overlapping node memberships across layers.
This'll save my week, many thanks!
Best, Arttu
Hi again,
With regard to the previous posts in this thread, I still have one more question.
My motivation to use the overlapping model with discrete layers (instead of fitting a separate model to each layer) was my understanding that this model will somehow account for dependencies between the layers, and return partitions that (potentially) vary across layers. After reading through https://journals.aps.org/prx/abstract/10.1103/PhysRevX.5.011033, I'm still not sure whether/how my model accounts for such dependencies.
What is the difference between A) fitting a separate model to each layer and B) fitting an overlapping model with layers=True? When interpreting my results, what can I say about the dependencies between layers with discrete types of relationships?
Many thanks, Arttu
Am 24.08.21 um 12:49 schrieb arttu.malkamaki@helsinki.fi:
What is the difference between A) fitting a separate model to each layer and B) fitting an overlapping model with layers=True? When interpreting my results, what can I say about the dependencies between layers with discrete types of relationships?
With B) the overlapping partition is encoded in a manner that takes correlations between the layers into account, so that the group memberships will vary between layers only if there is enough evidence requiring this. With A) the independent models do not take advantage of any similarity between layers.
Hi Tiago,
Still working on this same issue here... A quick question - is there any principled way to reconcile between the overlapping partitions (i.e., return majority blocks) stored in the new vertex property map "bm"?
1. I thought about passing bm to BlockState(b = bm) using a layer-specific GraphView and get_majority_blocks(), but the constructor doesn't accept overlapping partitions as b. 2. I also noticed that I can use state.layer_states[0].get_majority_blocks(), but for some reason the number of vertices in each of the 10 overlapping states differs from the ones they should have, thus not sure about how to interpret them.
gt.seed_rng(42); np.random.seed(42) state = gt.minimize_blockmodel_dl(g, state = gt.LayeredBlockState, state_args = dict(deg_corr = True, overlap = True, layers = True, ec = g.ep.layer, recs = [g.ep.weight], rec_types = ["real-exponential"]))
print(state) ... <LayeredBlockState object with 9012 overlapping blocks, 10 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 113 vertices and 4506 edges, 1 internal vertex property, 2 internal edge properties, at 0x2abef36cf6d0>, at 0x2abef48360a0> ...
be = state.get_edge_blocks() bm = g.new_vertex_property("vector<int>", val = [0] * state.get_B())
for e in g.edges(): if g.ep.layer[e] != 0: #the layer I want the memberships for continue u, v = e u, v = sorted((u, v)) r, s = be[e] bm[u][r] += 1 bm[v][s] += 1
for v in g.vertices(): bm[v] = [i for i in bm[v] if i != 0]
for v in g.vertices() print(bm[v]) ... array([9], dtype=int32) array([5], dtype=int32) array([22, 1], dtype=int32) array([15], dtype=int32) array([ 3, 12], dtype=int32) ...
All help is much appreciated.
Best, Arttu
Oh well, I noticed that passing the state object to mcmc_equilibrate() improves the partitions such that I'm now getting certain group memberships repeatedly, thus allowing me to extract the "majority" membership for each node.
This solves my issue.
a.