dear graph-tool mailing list,
do you have any recommendations for modelling highly skewed distributions of discrete edge weights?
my network is a multigraph, which i collapse to a simple graph whose edge weights represent the number of edges in the multigraph between two vertices
in my data, the modal edge weight is equal to 1, but the max is above 2000
if i fit a degree-corrected Poisson SBM to the multigraph, every pair of firms that shares a large number of edges ends up grouped in its own block. this makes sense: any poisson rate that is consistent with the otherwise sparse rate of edge formation assigns very low probability to such heavy edges.
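to make the argument above concrete, here is a rough sketch of the Poisson log-probabilities involved. the rates and counts are illustrative assumptions taken from the numbers in this post (modal weight 1, max around 2000), not values fitted to the actual data, and `poisson_log_pmf` is a hypothetical helper, not part of graph-tool.

```python
import math

def poisson_log_pmf(k, lam):
    """Log-probability of observing count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# Illustrative rates only: a rate that suits the sparse pairs (1.0)
# versus one that suits the heaviest pair (2000.0).
for lam in (1.0, 10.0, 2000.0):
    light = poisson_log_pmf(1, lam)     # a typical edge count
    heavy = poisson_log_pmf(2000, lam)  # the heaviest edge count
    print(f"rate={lam:7.1f}  log P(1)={light:10.1f}  log P(2000)={heavy:10.1f}")
```

no single rate gives both counts a reasonable log-probability, which is consistent with the heavy pairs being split off into blocks that get their own rates.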
while this is not necessarily a problem per se, the large number of blocks that this creates complicates my analysis considerably, and it would be useful to use edge-covariates with a distribution that can account for the skewness to get a smaller number of blocks.
wondering if Tiago or anyone else on the list can suggest any transformation-distribution combination that might help. i tried (without thinking too deeply) the transformation weight = log(weight) + 1 with real-geometric weights, but minimize_blockmodel_dl() was taking an unusually long time to fit so i escaped.
the other option that came to my mind was to use a hierarchical SBM and choose a higher level where the blocks are merged. i haven't read the papers on hierarchical SBM or used them in graph-tool yet.
thx, -sam
On 25.08.20 at 00:28, sam wrote:
> wondering if Tiago or anyone else on the list can suggest any transformation-distribution combination that might help. i tried (without thinking too deeply) the transformation weight = log(weight) + 1 with real-geometric weights, but minimize_blockmodel_dl() was taking an unusually long time to fit so i escaped.
It's difficult to say much without looking at the data. But I would try to keep the nature of the covariates the same, i.e. if they are discrete before the transform, they should also be discrete afterwards.
One option to reduce the variance may be to rank the values encountered, and take the rank index as the transformed covariate. YMMV.
Best, Tiago
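
The rank transform suggested above could be sketched roughly as follows. This is a minimal illustration, not code from the thread: it assumes "rank" means the dense rank among the distinct values encountered (ties share a rank, no gaps), the function name `rank_transform` is hypothetical, and the example weights are made up.

```python
def rank_transform(weights):
    """Replace each weight by the 1-based rank of its value among the
    distinct values encountered (dense ranking: ties share a rank)."""
    distinct = sorted(set(weights))
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return [rank_of[w] for w in weights]

# Made-up edge weights with the kind of skew described in the thread.
weights = [1, 1, 2, 1, 5, 2000, 37, 1, 2, 5]
ranks = rank_transform(weights)
print(ranks)  # integers between 1 and the number of distinct weights
```

The transformed values stay discrete, and their maximum equals the number of distinct weights rather than the largest weight. In graph-tool, the ranks could then be stored in an integer edge property map and supplied as a discrete edge covariate, for instance through the `recs` and `rec_types` entries of `state_args`; which discrete distribution suits the ranked values is left open here.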