On 14.04.20 at 02:28, Deklan Webster wrote:
I found many minor typos, if you're interested in fixing them, but I also found some more confusing errors (I *think* I'm reading the latest version...):
Please send whatever typos you found by private email, so these can be collected.
In FIG 1 it says "probability q of missing edge", shouldn't that be "spurious edge"?
Yes.
At one point you say "If an entry is not observed, it has a value of n_ij= 0, which is different from it being observed with n_ij > 0 as a nonedge x_ij= 0." Shouldn't the terminology for n_ij=0 be "unmeasured" not "unobserved"? That really confused me in the following paragraphs.
That's what is meant.
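For concreteness, here is a minimal sketch of how that distinction is expressed in graph-tool's MeasuredBlockState (the example graph and the measurement counts are made up): n_ij counts how many times a pair of nodes was measured, x_ij counts how many of those measurements reported an edge, so n_ij = 0 means "never measured", while n_ij > 0 with x_ij = 0 is a measured non-edge.

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]      # placeholder graph

# For the pairs present in g: measured twice, reported as an edge both times.
n = g.new_ep("int", val=2)            # n_ij: number of measurements of the pair
x = g.new_ep("int", val=2)            # x_ij: number of measurements reporting an edge

# Pairs *not* in g get the defaults: with n_default=1, x_default=0 every
# non-edge counts as measured once and observed to be absent; setting
# n_default=0 instead would mean those pairs were never measured at all.
state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)
```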
I am doing something like 2, not something like 1. I agree that 1 involves an arbitrary choice; 2 doesn't. With 2 I can calibrate probabilities which *do* have some meaning: they're based on my ability to predict the edges I removed. This is what the Nearly Optimal Link Prediction paper does.
Indeed, selecting the optimal threshold can be done by cross-validation in a *supervised* setting, but even then this will depend in general on the fraction of removed edges, the size of the test set, etc. In any case, you are either being very generous in interpreting the combination of arbitrary scores + threshold as "probabilities" with "some meaning", or at the very least we are talking about different things. You seem to be talking about the "probability of predicting a missing edge with a binary classifier", whereas I was referring to the scores themselves not being proper probabilities.
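As a toy illustration of that dependence (my own sketch, with synthetic scores rather than anything from the paper): the threshold that maximizes F1 on a held-out set moves when the ratio of non-edges to removed edges in that set changes, even though the scores themselves are identical.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 500)       # scores assigned to held-out (removed) edges
neg = rng.normal(0.0, 1.0, 5000)      # scores assigned to sampled non-edges

def best_threshold(pos, neg):
    """Threshold maximizing F1 on this particular validation mix."""
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos), dtype=int),
                             np.zeros(len(neg), dtype=int)])
    candidates = np.quantile(scores, np.linspace(0.01, 0.99, 99))
    return max(candidates, key=lambda t: f1_score(labels, (scores >= t).astype(int)))

# Same scoring function, different class balance -> different "optimal" threshold.
print(best_threshold(pos, neg[:500]))   # balanced held-out set
print(best_threshold(pos, neg))         # 10:1 imbalanced held-out set
```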
Not sure what you mean by AUROC being 'relative'. As you say, you need a score threshold to make predictions in the end, but computing the AUROC itself doesn't require a threshold. There are other non-threshold-based metrics one can use.
The reason why the AUC does not need a threshold is that it is a relative measure: it is the probability of a true positive having a larger score than a true negative. It does not care about the absolute values of the scores, only their relative values. If you multiply every score by a constant, or add a constant to every score, you get the same AUC value. This means that even if you have a perfect AUC value of 1, you are still not ready to actually predict the edges until you find out where the threshold that separates the true positives from the true negatives lies.
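To make the rank-invariance concrete, a small sketch (my own, with synthetic scores): any strictly increasing transformation of the scores leaves the AUC unchanged, while obviously changing where a sensible threshold would sit.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = np.r_[np.ones(200, dtype=int), np.zeros(2000, dtype=int)]   # 1 = true edge
scores = np.r_[rng.normal(1, 1, 200), rng.normal(0, 1, 2000)]

# Only the relative ordering of the scores matters for the AUC.
print(roc_auc_score(y, scores))
print(roc_auc_score(y, 10 * scores + 3))   # rescaled and shifted
print(roc_auc_score(y, np.exp(scores)))    # monotonic but nonlinear
```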
For the equilibration I used wait=1000, epsilon=1e-5, multiflip=True. For sampling I used 5000 iterations.
Just an observation: these numbers do not mean much by themselves. MCMC equilibration times vary a lot between datasets. To be sure your sampling is good you have to run diagnostics, e.g. running the chain multiple times and checking that they give the same answer, and checking that your results do not change when you increase the number of iterations.
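A rough sketch of one such diagnostic, assuming the MeasuredBlockState setup discussed in this thread (the example graph, measurement counts and seeds are placeholders): run independent chains and compare their entropy traces.

```python
import graph_tool.all as gt
import numpy as np

g = gt.collection.data["polbooks"]        # placeholder graph

def run_chain(seed, wait=1000):
    gt.seed_rng(seed)
    state = gt.MeasuredBlockState(g, n=g.new_ep("int", val=1),
                                  x=g.new_ep("int", val=1),
                                  n_default=1, x_default=0)
    entropies = []
    gt.mcmc_equilibrate(state, wait=wait, epsilon=1e-5, multiflip=True,
                        mcmc_args=dict(niter=10),
                        callback=lambda s: entropies.append(s.entropy()))
    return state, np.array(entropies)

# Independent chains should settle around comparable entropy values; large,
# persistent differences suggest the chains have not equilibrated.
s1, h1 = run_chain(42)
s2, h2 = run_chain(7)
print(h1[-100:].mean(), h2[-100:].mean())
```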
So, slightly better, but still very poor. A binary classifier trained with just the simple similarity measures implemented in graph-tool (as described above) as features would give better results than this. Indeed, the conditional probability method is much, much better (based on SHAP importance in my binary classifier). How can that be? Surely I'm missing something or making a mistake somewhere?
It's difficult to say. I would investigate the MCMC equilibration in more detail before declaring defeat.
After reading the paper I see what you meant now. If I understand correctly, this doesn't apply to my situation. Maybe I'm wrong? If someone says "Here is a real-world graph. Tell me what the missing edges are", this method doesn't appear applicable?
I'm not going to tell you what your situation is, but in many scenarios that is applicable. There is a big difference between measuring a non-edge and not doing a measurement. "The absence of evidence is not evidence of absence." A good example is drug-drug interactions: A lack of an effect observed when two drugs are taken together is very different from not doing the experiment in the first place. Sadly, this important distinction is often ignored when network datasets are collected.
Of the 1000 random edges, 981 were found in the marginal graphs. 271/1000 = 0.271 had probability > 0.5. Of the 2000 random non-edges, 952 were found in the marginal graphs, and only 12 had probability > 0.5.
Assuming I did this right, this method seems to work much better! It didn't appear to be fooled by the unmeasured non-edges.
For one, the problem has now become more balanced, as the sets of true positives and true negatives have the same size.
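For reference, the marginal edge probabilities being discussed can be accumulated along the lines of the documented reconstruction examples; a minimal sketch (the graph, measurement counts and iteration numbers are placeholders):

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]          # placeholder graph
state = gt.MeasuredBlockState(g, n=g.new_ep("int", val=1),
                              x=g.new_ep("int", val=1),
                              n_default=1, x_default=0)

# Equilibrate first...
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-5, multiflip=True,
                    mcmc_args=dict(niter=10))

# ...then sample from the posterior, accumulating the marginal graph u.
u = None
def collect_marginals(s):
    global u
    u = s.collect_marginal(u)

gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

eprob = u.ep.eprob                        # posterior probability of each edge in u
print(sum(1 for e in u.edges() if eprob[e] > 0.5), "edges with probability > 0.5")
```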
Let's say I run this twice, once with alpha=50, beta=50 and once with alpha=10, beta=90. The former run will have more possible non-edges, right? Let's say that there are two non-edges, a and b, which were found in both runs. Is it not the case that if P(a) < P(b) in the first run, it will also be so in the second run? Is this an incorrect intuition? In other words, will adjusting alpha and beta change the relative ordering of the probabilities (for the common ones)?
No, it should not change the relative ordering by much, although this can happen for extreme choices, e.g. ones that make the inferred graph very dense.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>