I am going to compare NestedBlockState.entropy() of the two runs, but I am not sure this is correct. How should I take into account the fact that the networks are slightly different?
Would normalization make the two entropies comparable? I'd be interested to hear opinions about using, for normalization, the entropy of a NestedBlockState where each node is in its own group.
Best
Haiko
On 27.04.21 at 11:50, Lietz, Haiko wrote:
I am going to compare NestedBlockState.entropy() of the two runs, but
I am not sure this is correct.
How should I take into account the fact that the networks are
slightly different?
Would normalization make the two entropies comparable? I’d be interested to hear opinions about using, for normalization, the entropy of a NestedBlockState where each node is in its own group.
The description length (DL) tells you how much information is needed to encode both the network and the model parameters. If we compare the DL of different models for the same network, this tells us which model compresses the data most. But if we compare two different networks with two different models, this tells us very little, because it conflates which network is more regular with how well each model fits.
The result of this kind of comparison is often trivial: the more nodes and edges, the higher the DL will be.
You *could* compute something like the DL per edge in order to compare two networks, but since the DL is not a linear function of the number of nodes or edges, it is difficult to put this evaluation on solid statistical grounds.
Best, Tiago
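To illustrate why a per-edge DL is hard to compare across sizes, here is a minimal sketch. It does not use the SBM description length that graph-tool computes; it assumes the simplest possible baseline, namely the number of bits needed to pick one simple undirected graph uniformly among all graphs with N nodes and E edges. The function names and the two network sizes are hypothetical, chosen only for illustration.

```python
import math


def log2_binom(n: int, k: int) -> float:
    """log2 of the binomial coefficient C(n, k), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


def random_graph_dl_bits(num_nodes: int, num_edges: int) -> float:
    """Bits needed to select one simple undirected graph with the given
    number of nodes and edges, uniformly at random (assumed baseline)."""
    num_pairs = num_nodes * (num_nodes - 1) // 2
    return log2_binom(num_pairs, num_edges)


# Two hypothetical, fully random networks with the same mean degree (8).
small_per_edge = random_graph_dl_bits(100, 400) / 400
large_per_edge = random_graph_dl_bits(1000, 4000) / 4000

print(f"bits per edge, N=100:  {small_per_edge:.2f}")
print(f"bits per edge, N=1000: {large_per_edge:.2f}")
```

Even though both networks are equally structureless, the larger one needs more bits per edge, so dividing by the number of edges does not by itself remove the size dependence.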
Thanks Tiago,
I see that this could be an option. But how about my proposal?
The 'polbooks' dataset has 105 nodes. An SBM with one block (B=1) has a DL of about 1550 bits. The DL is minimized (DL_min=1300) for B=5. When each node is in its own block (B=105), DL is maximized (DL_max=1950). Can't I make states of different graphs comparable by taking DL_min/DL_max? It seems like a straightforward application of normalized entropy (https://en.wikipedia.org/wiki/Entropy_(information_theory)#Efficiency_(norma...)) to me.
All, Tiago fixed a bug in the mailing list backend. It caused my email to arrive four times. I'm sorry for flooding your mailbox.
Best wishes
Haiko
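As a worked version of the normalization proposed above, the following sketch plugs in the approximate 'polbooks' values quoted in the message. The function name `normalized_dl` is hypothetical; in practice the inputs would presumably come from calls to `entropy()` on the fitted state and on a state with one node per group.

```python
def normalized_dl(dl_min: float, dl_max: float) -> float:
    """Ratio of the minimum DL to the DL with every node in its own block."""
    if dl_max <= 0:
        raise ValueError("dl_max must be positive")
    return dl_min / dl_max


# Approximate values quoted in the thread for 'polbooks' (in bits).
DL_MIN = 1300.0  # best fit, B=5
DL_MAX = 1950.0  # every node in its own block, B=105

ratio = normalized_dl(DL_MIN, DL_MAX)
print(f"DL_min / DL_max = {ratio:.3f}")
```

With these numbers the ratio is 1300/1950, i.e. two thirds.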
On 27.04.21 at 14:02, Lietz, Haiko wrote:
The 'polbooks' dataset has 105 nodes. An SBM with one block (B=1) has a DL of about 1550 bits. The DL is minimized (DL_min=1300) for B=5. When each node is in its own block (B=105), DL is maximized (DL_max=1950). Can't I make states of different graphs comparable by taking DL_min/DL_max? It seems like a straightforward application of normalized entropy (https://en.wikipedia.org/wiki/Entropy_(information_theory)#Efficiency_(norma...)) to me.
It's difficult to comment, because I don't know what the objective of the comparison is.
If you compute the ratio of the minimum DL to the DL for B=1, this gives you the compression ratio relative to a baseline random graph model.
If you compare this ratio between two networks of different sizes, it gives you an idea of how much more random one is than the other, relative to a fully random graph with the same density, but no deeper insight.
Best, Tiago
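The compression ratio described above can be sketched as follows. The 'polbooks' numbers are the approximate values quoted earlier in the thread; the second network and its DL values are entirely hypothetical and serve only to show a comparison. The helper name `compression_ratio` is likewise an assumption, not part of graph-tool.

```python
def compression_ratio(dl_min: float, dl_b1: float) -> float:
    """Minimum DL divided by the DL of the single-block (B=1) baseline.
    Values close to 1 mean the best model barely beats a random graph."""
    if dl_b1 <= 0:
        raise ValueError("dl_b1 must be positive")
    return dl_min / dl_b1


# (DL_min, DL for B=1); units cancel in the ratio.
networks = {
    "polbooks": (1300.0, 1550.0),            # approximate values from the thread
    "hypothetical_net": (9000.0, 9500.0),    # made-up values for illustration
}

ratios = {name: compression_ratio(*dls) for name, dls in networks.items()}

# List the networks from most to least compressible relative to B=1.
for name, r in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: DL_min / DL_B=1 = {r:.3f}")

most_compressible = min(ratios, key=ratios.get)
```

Because the ratio is dimensionless, it does not matter whether the DL values are expressed in bits or nats, as long as both values for a network use the same unit.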
My objective is to compare the extent to which given networks are in the ordered regime. In this sense, DL_min/DL_B=1 works because it measures the distance to disorder.
Thx for the input
Haiko