Temporal community detection with DSBM
This is more of like an implementation question. I'm PhD student in math and I've been trying out different dynamic community detection(DCD) algorithms to compare their performance. I was stuck on graph-tool's implementation for a few months because the documentation for the layered networks was really short and I noticed this mailing list just now. I have multiple questions: first, am I initiating the network correctly below? My network is a list of weighted adjacency matrices corresponding to each snapshots. g = Graph() e_weight = g.new_ep("double") e_layer = g.new_ep("int") g.add_edge_list(edge_list, hashed = True, eprops=[e_weight, e_layer]) ## edge_list has (v1,v2,w,t) format g.edge_properties["edge_weight"] = e_weight ## weights of the intralayer edges g.edge_properties["edge_layer"] = e_layer ## since this is a multilayer network, each edge has to come with a layer information that it belongs to(a layer is a snapshot of the temporal network and has nothing to do with nested model which I will use the word 'level' for) Then, when I run the optimizer with the below code, I feel like I'm doing something wrong. I want to use 'independent layers' model since this is a temporal network and edge weights within each layer are varying between [0,1]. for deg_corr in [True, False]: state = minimize_nested_blockmodel_dl(g, layers = True, deg_corr = deg_corr, state_args=dict(ec = g.ep.edge_layer, layers = True, recs=[g.ep.edge_weight], rec_types=["real-exponential"])) S1 = state.entropy() # we will pad the hierarchy with another four empty levels, to # give it room to potentially increase state = state.copy(bs=state.get_bs() + [np.zeros(1)] * 4, sampling = True) for i in range(100): # this should be sufficiently large state.multiflip_mcmc_sweep(beta=np.inf, niter=10) S2 = state.entropy() print(S1,S2, S2-S1) I have a serious problem with getting the node labels. The code below only returns the node membership information for N nodes(size of the aggregated network). But since this is an evolving network, nodes are expected to change communities over time, so below code should return NxT(number of nodes times number of layers) many community labels? levels = state.get_levels() for s in levels: print(s) This returns the network partition at different levels of the nested SBM but only for N many nodes. What am I missing? -- Sent from: https://nabble.skewed.de/
Am 26.02.21 um 20:00 schrieb ulgenklc:
I have a serious problem with getting the node labels. The code below only returns the node membership information for N nodes(size of the aggregated network). But since this is an evolving network, nodes are expected to change communities over time, so below code should return NxT(number of nodes times number of layers) many community labels?
levels = state.get_levels() for s in levels: print(s)
This returns the network partition at different levels of the nested SBM but only for N many nodes.
What am I missing?
You are missing the fact that the layered model only has a single partition shared between all the layers. If you want a model that gives a different partition for every layer you can either: 1. Split each layer into a separate graph, and fit a different SBM for each layer. 2. Fit an overlapping SBM (i.e. by passing overlap=True to LayeredBLockState), which will cluster the "half-edges", and hence give partitions that (potentially) vary between the layers. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Ahh, yes I noticed that when I scanned through the previous posts, sorry for replication. I have three more quick questions: 1) If I fitted a different SBM to splitted the graph, wouldn't the communities in individual layers be temporally discrete? That's not quite what I want because I want to track an evolution, but I'm open to process that result in a temporally overlapping fashion(some sort of set matching algorithms between layers.) 2) When I pass overlap =True to LayeredBlockState, I have NxT many nodes as a result like you said, that's all fine. However, now I can't get the node membership using .get_blocks(). levels = states[0].get_levels() levels[0].get_blocks() is a list of length M(total number of edges in the network) which should be a list of length NxT, isn't it? Again, it feels like I'm missing something very trivial here... 3) Also, is there a method for levels[0] that I can call to get the total number of communities up front? I can see the number of blocks when I do states[0].print_summary() but I need the integer value of this number for preprocessing.. Thank you!! -- Sent from: https://nabble.skewed.de/
Am 01.03.21 um 17:19 schrieb ulgenklc:
Ahh, yes I noticed that when I scanned through the previous posts, sorry for replication. I have three more quick questions:
1) If I fitted a different SBM to splitted the graph, wouldn't the communities in individual layers be temporally discrete? That's not quite what I want because I want to track an evolution, but I'm open to process that result in a temporally overlapping fashion(some sort of set matching algorithms between layers.)
I'm not sure exactly what you mean, but if you want to "align" the different partitions you can take a look at https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference.... There is an example of this alignment being done here: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#sampl...
2) When I pass overlap =True to LayeredBlockState, I have NxT many nodes as a result like you said, that's all fine. However, now I can't get the node membership using .get_blocks().
levels = states[0].get_levels()
levels[0].get_blocks() is a list of length M(total number of edges in the network) which should be a list of length NxT, isn't it? Again, it feels like I'm missing something very trivial here...
The overlapping block model consists of a labeling of the half-edges of the graph. Since each half-edge can belong to only one layer, you can then tell how the membership of the respective node has changed in the time slice. Please take a look at the documentation on how to extract this information, e.g. via https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference....
3) Also, is there a method for levels[0] that I can call to get the total number of communities up front? I can see the number of blocks when I do states[0].print_summary() but I need the integer value of this number for preprocessing..
It's worthwhile to peruse the documentation, which contains all the answers to questions such as this: https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference.... -- Tiago de Paula Peixoto <tiago@skewed.de>
Thank you very much! I experimented a little bit with overlap = True in the LayeredBlockState and I don't know if this is a common issue or it is just me but 1) solver takes way longer to equilibrate compared to BlockState(and networks are not very large, around N=100 with 4 layers on average, which I also threshold the weighted network edges) and 2) more interestingly, partitions in the layers that are towards the end of the temporal network are found perfectly whereas the communities I want to find in the first a few layers are partitioned into almost singleton communities. This is what I mean: <https://nabble.skewed.de/file/t496292/Screen_Shot_2021-03-17_at_11.png> <https://nabble.skewed.de/file/t496292/Screen_Shot_2021-03-17_at_11.png> I wasn't sure if this is because of the low number of iterations which I kept 100<niter<1000 (I guess reasonable given the network size and number of edges) but I kept getting similar results over different trials. One other thing I noticed is that DSBM fits non-communities(the upper left stairs in the ground truth) with %100 accuracy. I think I'm going to try fitting a different sbm to each layer as next step and share how it goes. Sorry for keep bugging you!! Best -- Sent from: https://nabble.skewed.de/
participants (2)
-
Tiago de Paula Peixoto -
ulgenklc