On 09.04.20 at 02:01, Deklan Webster wrote:
That is exactly what this does, but in a numerically stable manner.

I'm not seeing how the function you linked is incremental. It seems to require collecting every conditional probability in memory for every vertex pair. I'm collecting for ~6m pairs on my laptop; that would blow my RAM. (If I'm understanding correctly.)
It's a sum in log-space, so it's incremental. Doing x = logsumexp([x, y]) is the same as x = log(exp(x) + exp(y)) but is more accurate and does not overflow.
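For example, a minimal sketch of this kind of incremental accumulation (the iterator cond_log_probs of per-pair conditional log-probabilities is a placeholder, not part of graph-tool; the accumulation itself uses scipy.special.logsumexp):

import numpy as np
from scipy.special import logsumexp

def mean_log_prob(cond_log_probs):
    """Average a stream of log-probabilities without storing them all."""
    acc = -np.inf                      # log(0), the empty sum
    count = 0
    for lp in cond_log_probs:
        acc = logsumexp([acc, lp])     # acc = log(exp(acc) + exp(lp))
        count += 1
    return acc - np.log(count)         # log of the arithmetic mean

Only a single running value (plus a counter) needs to be kept per vertex pair, rather than the full list of sampled probabilities.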
If you don't supply anything, a flat prior will be used, which means the inference will be guided by the network structure alone.
When I use the conditional probability method it too only uses the network structure, correct?
If you use a flat prior, the number of removed edges is inferred from the network structure. If you supply it via the prior, then this will be used.
This smaller graph I'm testing on has 225,150 possible edges. There are 20,911 actual edges. So, there are 204,239 possible additional edges. Finding 13 potential edges out of 204,239 seems extremely off (even using network structure alone; indeed, all the other link prediction methods I'm using only use network structure). I could test this by removing 2,000 edges and then seeing how many more it predicts, but I'm already fairly certain it will be similarly low.
Most other link prediction methods only provide a ranking or scores of the edges. In order to actually predict edges, all these methods require you to determine how many edges should actually be placed. The method in MeasuredBlockState() actually attempts to perform reconstruction, with a full posterior distribution of networks conditioned on the observed data. This includes the actual number of missing/spurious edges.

In order for the reconstruction to be successful, there needs to be sufficient information in the data. For example, if the original network is completely random, then removing random edges leaves no trace of the original structure, and the reconstruction fails. In some cases (as I have shown in https://dx.doi.org/10.1103/PhysRevX.8.041011) the reconstruction works very well, but it may fail if the distortion is not noticeable with an SBM.

You should also consider using a different framework, where you leave the missing edges "unobserved", instead of transforming them into nonedges. The distinction is important, and changes the reconstruction problem. You can achieve this with MeasuredBlockState by setting the corresponding n value to zero, as sketched below.
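A rough sketch of that second setup (it follows the pattern of the graph-tool reconstruction HOWTO; the graph, the unobserved pairs, and the number of sweeps are placeholders):

import graph_tool.all as gt

g = gt.collection.data["polbooks"].copy()      # placeholder graph

# n[e]: how many times the pair was measured; x[e]: in how many of those
# measurements an edge was reported.
n = g.new_ep("int", val=1)
x = g.new_ep("int", val=1)

# Pairs whose status is unknown: add them to the graph with n = 0
# ("unobserved"), rather than leaving them as nonedges, which would count
# as observed absences.
unobserved = [(0, 5), (2, 7)]                  # placeholder vertex pairs
for u, v in unobserved:
    e = g.edge(u, v)
    if e is None:                              # only add if not already present
        e = g.add_edge(u, v)
    n[e] = 0
    x[e] = 0

state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)

# Sample the posterior and collect marginal edge probabilities.
gm = None
def collect(s):
    global gm
    gm = s.collect_marginal(gm)

gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                    callback=collect)
eprob = gm.ep.eprob                            # posterior probability of each edge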
In the context of exploratory link prediction, recommender systems for social networks, etc., I don't think there is a principled way to give a "deleted fraction". I don't know how many links are "missing", but it's certainly not 13. I assume I could just assert something more aggressive arbitrarily and get a better result. Would it not be more principled to have a single 'non-edge aggressiveness parameter' and then place a uniform prior (hyperprior?) on it over some reasonable range, perhaps that f ratio you mention? Say, uniformly over [0.01, 0.2]. Is that doable with graph-tool?
A bounded prior like this is not possible, but you can get something close with the beta prior, where it is centered on a specific value, say 0.1, with an arbitrary variance.
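One way to do the centering (a sketch, assuming the fn_params hyperparameters of MeasuredBlockState, i.e. the alpha/beta of the beta prior on the missing-edge probability, and a simple method-of-moments choice; the target mean and variance are illustrative):

import graph_tool.all as gt

def beta_from_moments(mean, var):
    # Method of moments for Beta(a, b):
    #   mean = a / (a + b),  var = a*b / ((a + b)**2 * (a + b + 1))
    s = mean * (1 - mean) / var - 1
    return mean * s, (1 - mean) * s

# Prior on the missing-edge probability centered on 0.1 with sd ~ 0.07
alpha, beta = beta_from_moments(0.1, 0.005)

g = gt.collection.data["polbooks"].copy()      # placeholder graph
n = g.new_ep("int", val=1)                     # measurements per edge
x = g.new_ep("int", val=1)                     # positive measurements per edge

state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0,
                              fn_params=dict(alpha=alpha, beta=beta))

Best,
Tiago

-- 
Tiago de Paula Peixoto <tiago@skewed.de>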