[graph-tool] Re: Strange outputs for Newman RMI? (graph_tool.inference.partition_centroid.reduced_mutual_information(x, y, norm=False))

2 May 2021

      On 30.04.21 at 18:53, JohannHM wrote:
...
Hi team.
I'm wondering whether you could help me understand what is happening with your reduced_mutual_information() function, because I found several mismatching outputs with this implementation.
1. RMI is a value in [0, 1], so why is the output in your example negative when I compare two partitions?
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
gt.reduced_mutual_information(x, y)
-0.065562...
RMI is _not_ restricted to [0, 1]. It can take negative values!

The *normalized* value of RMI can take a value of at most one, but it 
can still be negative.
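
As a rough illustration of why a slightly negative value is plausible for
random labels: as I understand Newman's definition, RMI is the ordinary
mutual information minus a non-negative correction term. The sketch below
computes only the ordinary mutual information (in nats) in plain Python;
the helper name mutual_information is made up for this example and is not
graph-tool's implementation. For two independent random labelings this
quantity is small but positive, so subtracting a correction can push the
result below zero.

```python
import math
import random
from collections import Counter

def mutual_information(x, y):
    """Plain (unreduced) mutual information of two label lists, in nats."""
    n = len(x)
    joint = Counter(zip(x, y))  # contingency table counts
    px = Counter(x)             # group sizes in x
    py = Counter(y)             # group sizes in y
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi

# Two independent random labelings, mirroring the shape of the example above
# (Python's random module is used here instead of NumPy).
random.seed(0)
x = [random.randrange(10) for _ in range(1000)]
y = [random.randrange(10) for _ in range(1000)]

mi = mutual_information(x, y)
print(mi)  # small and positive; RMI subtracts a correction from this
```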
...
2. In your example, you create two partitions from a random distribution. Isn't that exactly the case where RMI is zero, or very close to zero?
-0.065562 is close to zero.
...
3. When I use the exact partitions Newman offers with his own code (wine.txt), your function gives
0.7890319931250596
But the Newman function gives
Reduced mutual information M = 1.21946279985 bits per object
Why are these results so different, and how can we relate them?
Newman's code returns the value in bits (base 2), whereas in graph-tool 
the convention is to return the value in nats (base e). Just divide the 
value obtained via graph-tool by log(2) and the results should match.

By the way, for the "wine.txt" example I get:

    reduced_mutual_information(x, y)  -> 0.8452672015130195
    reduced_mutual_information(x, y) / log(2) -> 1.2194627998489254

I can only recover your value for norm=True

   reduced_mutual_information(x, y, norm=True) -> 0.7890319931250596

which is not what is returned by Newman's code. So please pay attention.
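
A minimal sketch of the unit conversion, using only the numbers quoted
above (the variable names are illustrative, not part of graph-tool):

```python
import math

# Value quoted above for wine.txt with norm=False, in nats.
rmi_nats = 0.8452672015130195

# Dividing by the natural logarithm of 2 converts nats to bits.
rmi_bits = rmi_nats / math.log(2)

print(rmi_bits)  # should agree with Newman's 1.21946279985 bits per object
```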
...
4. Finally, what is (or where is) the description of the format in which one must pass the partitions to the function?
I mean, I'm confused about how the x (or y) variables should be arranged. Is each row index the node label? If so, how do I write nodes that belong to several partitions?
I honestly do not understand the source of confusion. A label partition 
is a 1D array containing the group labels for each node, indexed by the 
node index.
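
A small sketch of that format, with a made-up five-node example (the
groups dictionary and node count are hypothetical): position i of the
array holds the group label of node i.

```python
# Hypothetical example: 5 nodes (indices 0..4) split into three groups.
groups = {0: [0, 1], 1: [2, 3], 2: [4]}  # group label -> node indices

n_nodes = 5
labels = [None] * n_nodes
for label, nodes in groups.items():
    for v in nodes:
        labels[v] = label  # exactly one label per node

print(labels)  # [0, 0, 1, 1, 2]
```

Since each position holds a single label, a node belongs to exactly one
group in this representation. A list like this (or the equivalent NumPy
array) is what would be passed as x or y.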

Best,
Tiago

-- 
Tiago de Paula Peixoto <tiago@skewed.de>