On 14.04.20 at 02:28, Deklan Webster wrote:
I found many minor typos, if you're interested in fixing them, but I also found some more confusing errors (I *think* I'm reading the latest version...):
Please send whatever typos you found by private email, so these can be collected.
In FIG 1 it says "probability q of missing edge", shouldn't that be "spurious edge"?
Yes.
At one point you say "If an entry is not observed, it has a value of n_ij= 0, which is different from it being observed with n_ij > 0 as a nonedge x_ij= 0." Shouldn't the terminology for n_ij=0 be "unmeasured" not "unobserved"? That really confused me in the following paragraphs.
That's what is meant.
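For concreteness, here is a minimal sketch of how that distinction is expressed in graph-tool's MeasuredBlockState (the example graph and the measurement counts are made up): n_ij counts how many times a pair of nodes was measured, x_ij counts how many of those measurements reported an edge, so n_ij = 0 means "never measured", while n_ij > 0 with x_ij = 0 is a measured non-edge.

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]      # placeholder graph

# For the pairs present in g: measured twice, reported as an edge both times.
n = g.new_ep("int", val=2)            # n_ij: number of measurements of the pair
x = g.new_ep("int", val=2)            # x_ij: number of measurements reporting an edge

# Pairs *not* in g get the defaults: with n_default=1, x_default=0 every
# non-edge counts as measured once and observed to be absent; setting
# n_default=0 instead would mean those pairs were never measured at all.
state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)
```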
I am doing something like 2, not something like 1. I agree that 1 involves an arbitrary choice; 2 doesn't. With 2 I can calibrate probabilities which *do* have some meaning: they're based on my ability to predict the edges I removed. This is what the Nearly Optimal Link Prediction paper does.
Indeed, selecting the optimal threshold can be done by cross-validation in a *supervised* setting, but even then this will depend in general on the fraction of removed edges, the size of the test set, etc. In any case, you are either being very generous in interpreting the combination of arbitrary scores + threshold as "probabilities" with "some meaning", or at the very least we are talking about different things. You seem to be talking about the "probability of predicting a missing edge with a binary classifier", whereas I was referring to the scores themselves not being proper probabilities.
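As a toy illustration of that dependence (my own sketch, with synthetic scores rather than anything from the paper): the threshold that maximizes F1 on a held-out set moves when the ratio of non-edges to removed edges in that set changes, even though the scores themselves are identical.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 500)       # scores assigned to held-out (removed) edges
neg = rng.normal(0.0, 1.0, 5000)      # scores assigned to sampled non-edges

def best_threshold(pos, neg):
    """Threshold maximizing F1 on this particular validation mix."""
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos), dtype=int),
                             np.zeros(len(neg), dtype=int)])
    candidates = np.quantile(scores, np.linspace(0.01, 0.99, 99))
    return max(candidates, key=lambda t: f1_score(labels, (scores >= t).astype(int)))

# Same scoring function, different class balance -> different "optimal" threshold.
print(best_threshold(pos, neg[:500]))   # balanced held-out set
print(best_threshold(pos, neg))         # 10:1 imbalanced held-out set
```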
Not sure what you mean by AUROC being 'relative'. As you say, you need a score threshold to make predictions in the end, but computing the AUROC itself doesn't require a threshold. There are other non-threshold-based metrics one can use.
The reason why the AUC does not need a threshold is that it is a relative measure: it is the probability of a true positive having a larger score than a true negative. It does not care about the absolute values of the scores, only their relative values. If you multiply every score by a constant, or add a constant to every score, you get the same AUC value. This means that even if you have a perfect AUC value of 1, you are still not ready to actually predict the edges until you find out where the threshold that separates the true positives from the true negatives lies.
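To make the rank-invariance concrete, a small sketch (my own, with synthetic scores): any strictly increasing transformation of the scores leaves the AUC unchanged, while obviously changing where a sensible threshold would sit.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = np.r_[np.ones(200, dtype=int), np.zeros(2000, dtype=int)]   # 1 = true edge
scores = np.r_[rng.normal(1, 1, 200), rng.normal(0, 1, 2000)]

# Only the relative ordering of the scores matters for the AUC.
print(roc_auc_score(y, scores))
print(roc_auc_score(y, 10 * scores + 3))   # rescaled and shifted
print(roc_auc_score(y, np.exp(scores)))    # monotonic but nonlinear
```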
For the equilibration I used wait=1000, epsilon=1e-5, multiflip=True. For sampling I used 5000 iterations.
Just an observation: these numbers do not mean much by themselves. MCMC equilibration times vary a lot between datasets. To be sure your sampling is good you have to run diagnostics, e.g. running the chain multiple times and checking that they give the same answer, and checking that your results do not change when you increase the number of iterations.
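A rough sketch of one such diagnostic, assuming the MeasuredBlockState setup discussed in this thread (the example graph, measurement counts and seeds are placeholders): run independent chains and compare their entropy traces.

```python
import graph_tool.all as gt
import numpy as np

g = gt.collection.data["polbooks"]        # placeholder graph

def run_chain(seed, wait=1000):
    gt.seed_rng(seed)
    state = gt.MeasuredBlockState(g, n=g.new_ep("int", val=1),
                                  x=g.new_ep("int", val=1),
                                  n_default=1, x_default=0)
    entropies = []
    gt.mcmc_equilibrate(state, wait=wait, epsilon=1e-5, multiflip=True,
                        mcmc_args=dict(niter=10),
                        callback=lambda s: entropies.append(s.entropy()))
    return state, np.array(entropies)

# Independent chains should settle around comparable entropy values; large,
# persistent differences suggest the chains have not equilibrated.
s1, h1 = run_chain(42)
s2, h2 = run_chain(7)
print(h1[-100:].mean(), h2[-100:].mean())
```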
So, slightly better, but still very poor. A binary classifier trained with just the simple similarity measures implemented in graph-tool (as described above) as features would give better results than this. Indeed, the conditional probability method is much, much better (based on SHAP importance in my binary classifier). How can that be? Surely I'm missing something or making a mistake somewhere?
It's difficult to say. I would investigate the MCMC equilibration in more detail before declaring defeat.
After reading the paper I see what you meant now. If I understand correctly, this doesn't apply to my situation. Maybe I'm wrong? If someone says "Here is a real-world graph. Tell me what the missing edges are", this method doesn't appear applicable?
I'm not going to tell you what your situation is, but in many scenarios that is applicable. There is a big difference between measuring a non-edge and not doing a measurement. "The absence of evidence is not evidence of absence." A good example is drug-drug interactions: A lack of an effect observed when two drugs are taken together is very different from not doing the experiment in the first place. Sadly, this important distinction is often ignored when network datasets are collected.
Of the 1000 random edges, 981 were found in the marginal graphs. 271/1000 = 0.271 had probability > 0.5. Of the 2000 random non-edges, 952 were found in the marginal graphs, and only 12 had probability > 0.5.
Assuming I did this right, this method seems to work much better! It didn't appear to be fooled by the unmeasured non-edges.
For one, the problem has now become more balanced, as the sets of true positives and true negatives have the same size.
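For reference, the marginal edge probabilities being discussed can be accumulated along the lines of the documented reconstruction examples; a minimal sketch (the graph, measurement counts and iteration numbers are placeholders):

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]          # placeholder graph
state = gt.MeasuredBlockState(g, n=g.new_ep("int", val=1),
                              x=g.new_ep("int", val=1),
                              n_default=1, x_default=0)

# Equilibrate first...
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-5, multiflip=True,
                    mcmc_args=dict(niter=10))

# ...then sample from the posterior, accumulating the marginal graph u.
u = None
def collect_marginals(s):
    global u
    u = s.collect_marginal(u)

gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

eprob = u.ep.eprob                        # posterior probability of each edge in u
print(sum(1 for e in u.edges() if eprob[e] > 0.5), "edges with probability > 0.5")
```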
Let's say I run this twice, once with alpha=50, beta=50 and once with alpha=10, beta=90. The former run will have more possible non-edges, right? Let's say that there are two non-edges, a and b, which were found in both runs. Is it not the case that if P(a) < P(b) in the first run, it will also be so in the second run? Is this an incorrect intuition? In other words, will adjusting alpha and beta change the relative ordering of the probabilities (for the common ones)?
No, it should not change the relative ordering by much, although this can happen for extreme choices, e.g. ones that make the inferred graph very dense.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>