When you sample from the posterior and take the vertex marginals, is it
proper to say that we can interpret the marginals for a given vertex as
being the degree of membership in the communities (fuzzy community
membership)?
If so, how does this differ from the overlapping blockstate? I saw in the
mailing list that overlapping is only supported at the base level.
But, even if it were supported at every level, what does this achieve that
the fuzzy model averaging doesn't? Could you do model averaging with the
overlapping state too? E.g., in sample 1 vertex A is in communities c1, c2.
In sample 2 vertex A is in communities c1, c4. Etc. Would this be in some
way a more accurate measure of multiple community membership than the fuzzy
marginals?
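To make the comparison concrete, this is roughly what I mean by the fuzzy
averaging; just a sketch, assuming a non-nested BlockState called state that
has already been fitted to a graph g, and using the vertex-marginal
collection pattern from the documentation:

import graph_tool.all as gt
import numpy as np

pv = None
def collect_marginals(s):
    global pv
    pv = s.collect_vertex_marginals(pv)   # accumulate per-vertex group counts

gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

counts = np.array(pv[g.vertex(0)])   # group counts for one vertex
print(counts / counts.sum())         # normalized "membership weights"

I would then read the normalized counts for a vertex as its degree of
membership in each group.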
I've been wrestling with long compile times in graph-tool while doing some
development work, and I noticed that template instantiations seem to be an
important factor. Even functions with modest individual compile times can
become very slow to build when they are multiplied by the product of several
type alternatives, each of which must be instantiated. For example, a single
function exposed to Python may dispatch to N graph types x M degree map
types x P weight map types, which can mean hundreds of instantiations.
At the same time I noticed a pattern in the code where a set of types
(represented by a Boost.MPL typelist) accompanies a boost::any argument. At
dispatch time the "any" object is interrogated to find out which of the
types it stores, and the corresponding instantiation is called. Dispatching
this way requires a linear search through the typelist for each call.
It seems to me that replacing (MPL typelist + boost::any) with std::variant
would improve both compile time and dispatch overhead:
1. Constant-time dispatch to the appropriate instantiation via std::visit.
2. The ability to reduce compile time by shrinking the N x M x ... type
product, depending on how well the argument use can be refactored.
I think the approach described in
be applied here.
Would there be any interest in exploring this kind of refactoring? I think
there could be substantial benefits in compile time, as well as some
runtime improvement (depending on how often the Python/C++ boundary is
crossed).
Hi, I'm running the nested version of the SBM (nSBM), and I'm collecting the
group marginals using the code from the gt documentation, basically counting
the number of non-empty blocks at each hierarchy level for each iteration:

import graph_tool.all as gt
import numpy as np

group_marginals = [np.zeros(g.num_vertices() + 1) for s in state.get_levels()]

def collect_num_groups(s):
    levels = s.get_levels()
    for l, sl in enumerate(levels):
        group_marginals[l][sl.get_nonempty_B()] += 1

gt.mcmc_equilibrate(state, force_niter=10000, callback=collect_num_groups)
At the end of the equilibration I look at the distributions and, in general,
the most probable number of blocks at each level is not the one that is
stored in the final state, although the final number of blocks is typically
the second most probable. I may be naive, but I expected the two to be the
same.
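Concretely, the comparison I have in mind is something like this (a sketch,
reusing group_marginals and state from the snippet above):

for l, sl in enumerate(state.get_levels()):
    posterior_mode = int(np.argmax(group_marginals[l]))  # most probable B at level l
    print(l, posterior_mode, sl.get_nonempty_B())         # vs. B in the final state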