Hi team. I'm wondering whether you could help me understand what is happening with your reduced_mutual_information() function, because I found several mismatching outputs with this implementation.
1. RMI is a value in [0, 1], so why is the output in your example negative when I compare two partitions?
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
gt.reduced_mutual_information(x, y)
-0.065562...
2. In your example, you create two partitions from a random distribution. Isn't that exactly the case in which RMI should be zero, or very close to zero?
3. When I pass the exact partitions Newman provides (wine.txt) to your function, it gives 0.7890319931250596, but Newman's function gives "Reduced mutual information M = 1.21946279985 bits per object". Why are these results so different, and how can we relate them?
4. Finally, what is (or where is) the description of the format in which the partitions must be passed to the function? I'm confused about how the x (or y) variables should be arranged. Is each row index the node label? If so, how do I write nodes that belong to several partitions?
Thanks in advance for your answers, and congratulations on creating this tool! JM
On 30.04.21 at 18:53, JohannHM wrote:
Hi team. I'm wondering whether you could help me understand what is happening with your reduced_mutual_information() function, because I found several mismatching outputs with this implementation.
- RMI is a value in [0, 1], so why is the output in your example negative when I compare two partitions?
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
gt.reduced_mutual_information(x, y)
-0.065562...
RMI is _not_ restricted to [0, 1]. It can take negative values!
The *normalized* value of RMI is at most one, but it can still be negative.
- In your example, you create two partitions from a random distribution. Isn't that exactly the case in which RMI should be zero, or very close to zero?
-0.065562 is close to zero.
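As background for why a small nonzero value is expected here, the following is a minimal sketch that computes the plain, uncorrected mutual information of two random label arrays with NumPy. It does not call graph-tool and it does not compute RMI; the helper name mutual_information is made up for this illustration. As far as I understand the definition by Newman et al., RMI subtracts a correction term from this quantity, which is how it can end up slightly below zero for unrelated partitions.

```python
import numpy as np

def mutual_information(x, y):
    """Plain (uncorrected) mutual information of two label arrays, in nats.

    Illustrative helper only; this is NOT graph-tool's RMI.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # Map arbitrary labels to contiguous indices 0..R-1 and 0..S-1.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Contingency table: table[r, s] = number of nodes with labels (r, s).
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    p = table / n                      # joint distribution
    px = p.sum(axis=1, keepdims=True)  # marginal of x, shape (R, 1)
    py = p.sum(axis=0, keepdims=True)  # marginal of y, shape (1, S)
    nz = p > 0                         # skip empty cells (0 * log 0 = 0)
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 10, 1000)
y = rng.integers(0, 10, 1000)

mi_random = mutual_information(x, y)  # small, but not exactly zero
mi_self = mutual_information(x, x)    # equals the entropy of x
print(mi_random, mi_self)
```

With a finite sample, two independent random labelings still share a little information by chance, so values slightly away from zero in either direction are not a sign of a bug.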
- When I pass the exact partitions Newman provides (wine.txt) to your function, it gives
0.7890319931250596, but Newman's function gives "Reduced mutual information M = 1.21946279985 bits per object". Why are these results so different, and how can we relate them?
Newman's code returns the value in bits (base 2), whereas in graph-tool the convention is to return the value in nats (base e). Just divide the value obtained via graph-tool by log(2) and the results should match.
By the way, for the "wine.txt" example I get:
reduced_mutual_information(x, y) -> 0.8452672015130195
reduced_mutual_information(x, y) / log(2) -> 1.2194627998489254
I can only recover your value with norm=True:
reduced_mutual_information(x, y, norm=True) -> 0.7890319931250596
which is not what Newman's code returns. So please pay attention.
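The unit conversion described above can be checked with a few lines of plain Python. This is only a sketch: the helper name nats_to_bits is made up for illustration, and the input is the unnormalized value quoted above for the wine.txt example.

```python
import math

def nats_to_bits(value_in_nats):
    """Convert an information value from nats (base e) to bits (base 2)."""
    return value_in_nats / math.log(2)

# Unnormalized RMI for wine.txt as quoted in this thread (in nats).
rmi_nats = 0.8452672015130195

rmi_bits = nats_to_bits(rmi_nats)
# Should agree with the 1.21946279985 bits per object reported by Newman's code.
print(rmi_bits)
```

Note that the conversion only makes sense for the unnormalized value; the result obtained with norm=True is a ratio and has no units to convert.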
- Finally, what is (or where is) the description of the format in which the partitions must be passed to the function?
I'm confused about how the x (or y) variables should be arranged. Is each row index the node label? If so, how do I write nodes that belong to several partitions?
I honestly do not understand the source of confusion. A label partition is a 1D array containing the group labels for each node, indexed by the node index.
Best, Tiago
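The label format described in the reply above (a 1D array holding one group label per node, indexed by node index) can be sketched as follows. The helper labels_from_groups and the example groups are hypothetical and not part of graph-tool; the sketch assumes nodes are numbered 0 to n_nodes - 1 and that every node belongs to exactly one group.

```python
import numpy as np

def labels_from_groups(groups, n_nodes):
    """Turn a list of non-overlapping groups into a 1D label array.

    labels[i] is the index of the group that contains node i.
    """
    labels = np.full(n_nodes, -1, dtype=int)  # -1 marks "not assigned yet"
    for label, members in enumerate(groups):
        for node in members:
            if labels[node] != -1:
                raise ValueError(f"node {node} appears in more than one group")
            labels[node] = label
    if (labels == -1).any():
        raise ValueError("some nodes are not assigned to any group")
    return labels

# Three groups over six nodes (made-up example data).
groups = [[0, 2, 5], [1, 3], [4]]
x = labels_from_groups(groups, 6)
print(x)  # [0 1 0 1 2 0]
```

An array like x is the kind of input that reduced_mutual_information(x, y) is called with in the examples earlier in this thread.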
Dear Tiago, thanks for your helpful answers. My final question is about how to handle a node that belongs to different partitions. For example, if I have these communities:
cover = [ [0,1,2,3], [3,1,5,4] ]
what should the label partition be?
[0, ?, 0, ?, 1, 1]
The question marks are where I do not know which label to write. Which label should I write for node 1, which is in two communities (the communities with labels 0 and 1)? The same goes for node 3.
Any comment on how to handle the inputs of your function for covers would be very welcome. Thanks in advance for your answer. Johann
On 03.05.21 at 12:28, JohannHM wrote:
My final question is about how to handle a node that belongs to different partitions. For example, if I have these communities: cover = [ [0,1,2,3], [3,1,5,4] ] what should the label partition be?
This isn't a partition. A partition, by definition, is non-overlapping. RMI is not applicable to this case.
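To check whether a list of groups is a partition before building label arrays from it, one can look for nodes that occur in more than one group. The following is a minimal sketch in plain Python; the helper overlapping_nodes is hypothetical and not part of graph-tool, and the cover is the one from the question.

```python
from collections import Counter

def overlapping_nodes(groups):
    """Return the sorted list of nodes that occur in more than one group."""
    # set(members) ensures a node repeated inside one group counts only once.
    counts = Counter(node for members in groups for node in set(members))
    return sorted(node for node, count in counts.items() if count > 1)

cover = [[0, 1, 2, 3], [3, 1, 5, 4]]
shared = overlapping_nodes(cover)
print(shared)  # [1, 3]

# A partition has no shared nodes, so this cover is not a partition.
is_partition = not shared
print(is_partition)  # False
```

Nodes 1 and 3 are exactly the positions marked with question marks in the attempted label array, which is why no single label can be written there.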
Thanks for your comment, Tiago. Could you please recommend one of the methods included in gt that is useful for comparing:
a. a cover vs. a partition
b. a cover vs. a cover
Thanks in advance.
On 03.05.21 at 17:20, JohannHM wrote:
Thanks for your comment, Tiago. Could you please recommend one of the methods included in gt that is useful for comparing: a. a cover vs. a partition b. a cover vs. a cover
Nothing in particular comes to mind.
Best, Tiago