March 2020 - graph-tool - archives.skewed.de

Re: [graph-tool] recent issue with circle-ci
by Zouhair Mahboubi 31 Mar '20

Thanks Tiago,

This is not an issue I’m seeing on my local machine (I’m on OSX), since I haven’t had to reinstall graph_tool. It only fails within Docker, which is relevant since gitlab uses Docker for CI. Aside from the change to the apt URL, the script I included worked fine before but is now failing.

I tried the following Dockerfile so as not to depend on gitlab and to make the problem easier to reproduce (you can use play-with-docker <https://labs.play-with-docker.com/>):

    FROM python:3.7-slim
    RUN apt-get update &&\
        apt-get install -y gnupg2 software-properties-common &&\
        apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25 &&\
        add-apt-repository "deb http://downloads.skewed.de/apt buster main" &&\
        apt-get update &&\
        apt-get install -y python3-graph-tool &&\
        pip3 install numpy &&\
        pip3 freeze &&\
        python3 -c 'from numpy import *' &&\
        python3 -c 'from graph_tool.all import *'

This is the output of the last three lines:

    Installing collected packages: numpy
    Successfully installed numpy-1.18.2
    numpy==1.18.2
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'graph_tool'

You can see that pip3 freeze shows only numpy, which imports fine, but importing graph_tool fails.

I ran the docker image to inspect, and I think this is the problem: apt-get install -y python3-graph-tool installs the package for /usr/bin/python3.7, but the python (and pip) found on the PATH inside the container is in /usr/local/bin. I am not sure why this would be behaving differently now…

    docker build . -t test
    docker run -it --rm test /bin/bash
    root@afe35d645652:/# which python3
    /usr/local/bin/python3
    root@afe35d645652:/# python3
    Python 3.7.7 (default, Mar 11 2020, 00:35:40)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import graph_tool
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'graph_tool'
    >>>
    root@afe35d645652:/# /usr/bin/python
    python             python2.7          python3-config     python3.7-config   python3.7m-config  python3m-config
    python2            python3            python3.7          python3.7m         python3m
    root@afe35d645652:/# /usr/bin/python
    python             python2.7          python3-config     python3.7-config   python3.7m-config  python3m-config
    python2            python3            python3.7          python3.7m         python3m
    root@afe35d645652:/# /usr/bin/python3.7
    Python 3.7.3 (default, Dec 20 2019, 18:57:59)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import graph_tool
    >>>
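
The mismatch described in this message (a package installed for /usr/bin/python3.7 while /usr/local/bin/python3 is the interpreter found on the PATH) can be checked from inside each interpreter. What follows is a minimal diagnostic sketch using only the standard library. The helper names `module_location` and `report` are hypothetical and introduced here for illustration only.

```python
import importlib.util
import sys


def module_location(name):
    """Return where `name` would be imported from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "namespace package"


def report(names=("numpy", "graph_tool")):
    """Collect the running interpreter's path and version, plus module locations."""
    info = {"executable": sys.executable, "version": sys.version.split()[0]}
    for name in names:
        info[name] = module_location(name)
    return info


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

Saving this to a file and running it once with /usr/local/bin/python3 and once with /usr/bin/python3.7 should show which interpreter, if any, can locate graph_tool.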

recent issue with circle-ci
by Zouhair Mahboubi 30 Mar '20

Hi,

It looks like some URLs and installation instructions have changed recently? I had a gitlab-ci job that started failing:

    E: The repository 'http://downloads.skewed.de/apt/buster buster Release' does not have a Release file.

I tried following the new instructions to update the tests. apt-get no longer complains and the installation appears to succeed, but when the tests run, graph_tool cannot be found.

Here is a MWE that’s failing. (I replaced my usual pytest with a simple attempt to import the library to illustrate. Note that I’m using add-apt-repository; previously I was using add-apt-repository -s, but it seems that having the deb-src was breaking things.)

    test:
      image: python:3.6-slim
      stage: test
      before_script:
        - apt-get update
        - apt-get install -y gnupg2 software-properties-common
        - apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25
        - add-apt-repository "deb http://downloads.skewed.de/apt buster main"
        - apt-get update
        - apt-get install -y gcc python3-dev python3-pip
        - apt-get install -y python3-graph-tool
        # - pip3 install -r requirements.txt
        - pip3 freeze
        - python3 -c 'from graph_tool.all import *'

Here is a snippet from the output. I didn’t see any error during installation (I tried this both on gitlab and locally with gitlab-runner):

    $ apt-get install -y python3-graph-tool
    Reading package lists...
    Building dependency tree...
    Reading state information...
    The following additional packages will be installed:
      adwaita-icon-theme at-spi2-core blt dbus-user-session dconf-gsettings-backend
      dconf-service fontconfig fontconfig-config ...
    Need to get 115 MB of archives.
    After this operation, 575 MB of additional disk space will be used.
    ...
    Unpacking python3-graph-tool (2.31) ...
    Setting up libxdot4 (2.40.1-6) ...
    Setting up javascript-common (11) ...
    ...
    $ pip3 freeze
    $ python3 -c 'from graph_tool.all import *'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'graph_tool'
    Running after script...
    ERROR: Job failed: exit code 1
    FATAL: exit code 1

efficient random sampling of paths between two nodes
by Franco Peschiera 23 Mar '20

Hello Tiago,

First of all, thanks for your time. I see what you mean about a biased logic that would prefer shorter paths to longer ones; I had not thought about that.

Regarding the self-reference part, I think it would not be a problem because of the structure of my particular (directed) graph. Each node represents an assignment *at some given time period*, and the outward neighbors of a node represent assignments *in the future*. A path can therefore never visit a previously visited node, since there are no possible cycles. In fact, I can easily calculate the shortest and longest possible path between two nodes (shortest: using graph-tool's `shortest_distance` method; longest: the number of periods between the two nodes).

So the paths I want to create (or sample) are just the different ways one can go from a node N1 (in period P1) to a node N2 (in period P2 > P1).

I think that in my graph I could just sample neighbors with a weight that depends on how far they are (in number of periods) from the node: the farthest neighbor would have the lowest probability of being chosen. This way, I'd compensate for the fact that shorter paths take fewer hops.

What do you think?

regards,
Franco

On Mon, Mar 23, 2020 at 4:14 PM <graph-tool-request(a)skewed.de> wrote:

> On 21.03.20 at 17:44, Franco Peschiera wrote:
>
> > Since I could potentially do many iterations (and samplings), I would
> > like to have an unbiased sampling method for the Y paths without having
> > to enumerate them all.
> >
> > One option is to do the sampling during the construction of the paths. I
> > have in mind to just use the |get_out_neighbors| method on the start
> > node to do my own depth-first search, randomly choosing a neighbor at
> > each moment until I get to the |cutoff| or the end node. This, I could
> > do for each path until I get to my limit of paths to sample. I'm not
> > sure, though, if this is 1) similar in logic to the |all_paths| method
> > and 2) if this is close in efficiency (due to using python to iterate
> > and sample instead of C++).
>
> This would be heavily biased. It could be that there are many more paths
> that go through one of the neighbors, so if you want to sample uniformly
> from all paths, you have to do a weighted sample of the neighbors at
> each step. And these weights would change at each step.
>
> > Another option, although a lot more far-fetched, is to create my own
> > |all_paths| method in C++ and re-compile my own version of graph-tool.
> > Is this realistic?
>
> The problem here is not which language should be used.
>
> > Are there better options to doing this?
>
> Counting all paths between two nodes is conjectured to be NP-Hard. This
> is also known as the self-avoiding-walk (SAW) problem. Sampling SAWs is
> not straightforward either; I believe the most efficient approaches are
> based on MCMC. I recommend that you study the literature a little bit
> before attempting a solution.
>
> (It would be a lot simpler if you were instead seeking to sample
> *shortest* paths between two nodes. This can be done in linear time, and is
> even implemented in graph-tool already in random_shortest_path().)
>
> Best,
> Tiago
>
> --
> Tiago de Paula Peixoto <tiago(a)skewed.de>
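
For the acyclic case described in this reply, the weighted neighbor sampling mentioned in the quoted text can be made concrete, because in a DAG the number of paths from each node to the target can be counted by dynamic programming. The following is a minimal sketch under that assumption only (it does not apply to graphs with cycles, and it ignores any cutoff). It uses a plain dict of adjacency lists instead of a graph-tool graph, and the names `count_paths` and `sample_path` are hypothetical.

```python
import random


def count_paths(adj, target):
    """Count the paths from every node to `target` in a DAG given as {node: [successors]}."""
    counts = {target: 1}  # the empty path from the target to itself

    def visit(v):
        # Memoized recursion; very deep graphs may need an iterative version.
        if v not in counts:
            counts[v] = sum(visit(w) for w in adj.get(v, ()))
        return counts[v]

    for v in list(adj):
        visit(v)
    return counts


def sample_path(adj, counts, source, target, rng=random):
    """Draw one source-to-target path uniformly at random, or None if none exists."""
    if counts.get(source, 0) == 0:
        return None
    path = [source]
    v = source
    while v != target:
        # Weight each successor by the number of paths that continue through it.
        nbrs = [w for w in adj.get(v, ()) if counts.get(w, 0) > 0]
        weights = [counts[w] for w in nbrs]
        v = rng.choices(nbrs, weights=weights, k=1)[0]
        path.append(v)
    return path
```

Choosing each successor with probability proportional to its path count makes the product of the step probabilities equal to one over the total number of paths, so every path is equally likely. With graph-tool, the adjacency lists could presumably be built from `get_out_neighbors`, the method named in the quoted message.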

bipartite graphs
by Davide Cittaro 23 Mar '20

Hello,

I would like to test nSBM on bipartite graphs, but before going on I need to be sure I'm able to build a bipartite graph in graph-tool starting from a matrix:

    import numpy as np
    import graph_tool.all as gt

    A_nodes = np.arange(data.shape[0])  # nodes for rows start from 0
    B_nodes = np.arange(data.shape[1]) + data.shape[0]  # nodes for columns start after the last A node
    g = gt.Graph(directed=True)  # directed or not directed... maybe not important at all
    g.add_vertex(len(A_nodes) + len(B_nodes))  # add all needed nodes
    partition = g.new_vertex_property('bool')  # create a property indicating the node type
    for x in A_nodes:
        partition[g.vertex(x)] = 0  # set all A nodes to 0
    for x in B_nodes:
        partition[g.vertex(x)] = 1  # set all B nodes to 1
    idx = np.nonzero(data)
    # take the edge values
    weights = adata.X[idx]
    idx = (idx[0], idx[1] + len(A_nodes))  # node numbers of columns need to be shifted by the offset
    g.add_edge_list(np.transpose(idx))
    # add weights
    ew = g.new_edge_property("double")
    ew.a = weights
    g.ep['weight'] = ew

Is there a more straightforward way to go?

d
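
The index bookkeeping in the snippet above (row i maps to node i, column j maps to node n_rows + j) can be isolated in one small helper that returns edges and weights together. This is only a sketch in plain Python, with a list of lists standing in for the matrix; the name `bipartite_edges` is hypothetical and not part of graph-tool.

```python
def bipartite_edges(matrix):
    """Turn a rows-by-columns weight matrix into bipartite edge triples.

    Row i becomes node i and column j becomes node n_rows + j.
    Returns (edges, partition): edges holds (source, target, weight) for every
    nonzero entry, and partition[v] is 0 for row nodes and 1 for column nodes.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    edges = [
        (i, n_rows + j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w != 0
    ]
    partition = [0] * n_rows + [1] * n_cols
    return edges, partition
```

Keeping each weight next to its edge avoids relying on the edge order when assigning `ew.a` afterwards. In graph-tool, such triples could presumably be passed in one call through the `eprops` argument of `add_edge_list`, together with an edge property map created beforehand; that usage is an assumption to verify against the installed version.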

efficient random sampling of paths between two nodes
by Franco Peschiera 23 Mar '20

Good day to all,

I wrote an initial message (through the issues site) asking about the possibility of using all_paths to randomly sample paths between two nodes in a graph. It was pointed out that:

1. The algorithm is deterministic.
2. It’s not possible to adapt it to iterate in an unbiased random way.

Now I have a couple more questions, namely seeking advice on how to achieve what I want.

------------------------------

*Context*: I’ve modeled certain parts of a combinatorial optimization problem as a set of graphs that I’m exploiting to efficiently generate variables on the fly during the optimization routine. A variable in my problem is represented by a path between two nodes in one of those graphs.

------------------------------

*Specifically, what I want*:

1. I have a big graph.
2. During each iteration I choose two nodes on the graph (based on some logic irrelevant to the question).
3. I get all the paths between those two nodes (or I randomly sample them if there are too many).

For the sampling in the third step, *I’m currently doing* the following (assuming a sample of size X and a population of Y paths):

1. I get the generator of paths using the function all_paths.
2. I iterate over the generator using Reservoir sampling <https://en.wikipedia.org/wiki/Reservoir_sampling> and I stop if 1) I have explored all paths or 2) I reach a multiple of X (e.g., 3X).

Often the sample size X is really small compared to the population Y: X can be 500-2000, and Y can be 100,000+.

------------------------------

*My problem is*: Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.

One option is to do the sampling during the construction of the paths. I have in mind to just use the get_out_neighbors method on the start node to do my own depth-first search, randomly choosing a neighbor at each step until I get to the cutoff or the end node. I could do this for each path until I reach my limit of paths to sample. I’m not sure, though, whether this is 1) similar in logic to the all_paths method and 2) close in efficiency (since it uses python to iterate and sample instead of C++).

Another option, although a lot more far-fetched, is to create my own all_paths method in C++ and re-compile my own version of graph-tool. Is this realistic?

Are there better options for doing this?

Thanks,
Franco Peschiera
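
The current approach described in this message (reservoir sampling over a path generator, stopping early after a multiple of X) can be sketched as follows. This is an illustrative version of the classic Algorithm R, not the author's actual code; the function name `reservoir_sample` and the `max_seen` parameter are hypothetical, and any iterable stands in for the generator returned by all_paths.

```python
import random
from itertools import islice


def reservoir_sample(items, k, max_seen=None, rng=random):
    """Keep a uniform random sample of size k from the first `max_seen` items.

    With max_seen=None the whole iterable is consumed.
    """
    sample = []
    for n, item in enumerate(islice(items, max_seen)):
        if n < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item n replaces a reservoir slot with probability k / (n + 1).
            j = rng.randrange(n + 1)
            if j < k:
                sample[j] = item
    return sample
```

A call such as `reservoir_sample(paths, X, max_seen=3 * X)` mirrors the stopping rule above. Note that with an early stop the sample is uniform only over the first `max_seen` items produced, so any ordering in the generator carries over into the sample.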

Problem extracting hierarchical blocks
by James Ruffle 23 Mar '20

Dear community / Tiago,

I have a hierarchical partition of a nested block state. The original network contained 4453 vertices and 50051 edges.

    state.print_summary()
    l: 0, N: 4453, B: 126
    l: 1, N: 126, B: 46
    l: 2, N: 46, B: 20
    l: 3, N: 20, B: 9
    l: 4, N: 9, B: 3
    l: 5, N: 3, B: 1

I want to extract the community label of each vertex at each possible hierarchical level. To do this I wrote a loop based upon the guide at https://graph-tool.skewed.de/static/doc/demos/inference/inference.html, where vertexblocksdf is simply a df populated with the vertex numbers 0-4452.

    for idx in range(len(vertexblocksdf)):
        r = levels[0].get_blocks()[idx]  # group membership of node idx in level 0
        vertexblocksdf.ix[idx, 'level0'] = r
        r = levels[0].get_blocks()[r]  # group membership of node idx in level 1
        vertexblocksdf.ix[idx, 'level1'] = r
        r = levels[0].get_blocks()[r]  # group membership of node idx in level 2
        vertexblocksdf.ix[idx, 'level2'] = r
        r = levels[0].get_blocks()[r]  # group membership of node idx in level 3
        vertexblocksdf.ix[idx, 'level3'] = r
        r = levels[0].get_blocks()[r]  # group membership of node idx in level 4
        vertexblocksdf.ix[idx, 'level4'] = r
        r = levels[0].get_blocks()[r]  # group membership of node idx in level 5
        vertexblocksdf.ix[idx, 'level5'] = r

But I am getting strange results. My level0 column values make sense, with 126 possibilities (as per l: 0 above). But my level1 column is a number between 0 and 13, and none of my levels has 14 blocks. My level2 output is either 0 or 1, which again doesn’t make sense! Level3-5 are all simply 0. (The same behaviour occurs if I do this manually without the loop.)

Any ideas??

James
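
In the linked guide, a vertex is projected through the hierarchy by looking up the label obtained at one level in the block map of the next level, so each step indexes a different level. The loop as posted indexes levels[0] at every step, which may be one explanation for labels that do not match the summary. The following is a minimal sketch of the level-by-level lookup, with plain lists standing in for the per-level block maps; the names `labels_per_level` and `block_maps` are hypothetical.

```python
def labels_per_level(block_maps, vertex):
    """Follow `vertex` up the hierarchy, returning its block label at each level.

    block_maps[l][i] is the block, at level l, of node i of that level
    (level-0 nodes are the original vertices; higher-level nodes are blocks).
    """
    labels = []
    r = vertex
    for blocks in block_maps:
        r = blocks[r]  # look up the current label in this level's own map
        labels.append(r)
    return labels


# Toy hierarchy: 4 vertices -> 3 blocks -> 2 blocks -> 1 block.
toy_maps = [[0, 0, 1, 2], [0, 1, 1], [0, 0]]
print(labels_per_level(toy_maps, 3))
```

With graph-tool, the per-level maps would presumably come from calling `get_blocks()` on each entry of `levels`, as in the loop above, using `levels[1]`, `levels[2]`, and so on for the higher levels.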

Beta-parameter value during equilibration
by Leonardo Morelli 23 Mar '20

Hi!

I'm a beginner in this field, so I do apologize if my vocabulary isn't precise enough. Basically, I'm trying to use the graph-tool library to perform the unsupervised clustering step on biological samples (networks with 10³-10⁴ nodes). The following is my current workflow:

    1) state = minimize_nested_blockmodel_dl(g)
    2) state.mcmc_sweep(niter=10000)
    3) mcmc_equilibrate(state)

Question 1) Since I'm performing the equilibration with mcmc_equilibrate, is the mcmc_sweep step necessary in my workflow, or can I just skip it?

Question 2) This question concerns the β parameter. I'm wondering whether performing two rounds of equilibration sequentially, changing the value of β, makes any sense. In other words:

    1) state = minimize_nested_blockmodel_dl(g)
    2) mcmc_equilibrate(state, mcmc_args=dict(niter=10, beta=1))
    3) mcmc_equilibrate(state, mcmc_args=dict(niter=10, beta=1000000))

Thanks for your attention.

Leonardo
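
As background for Question 2: in MCMC schemes of this kind, β usually plays the role of an inverse temperature that scales how strongly moves which worsen the objective are penalized. The sketch below shows a simplified Metropolis acceptance rule under that assumption. It ignores proposal asymmetries and is not graph-tool's actual implementation; the name `acceptance_probability` and the argument `delta_S` (the change in description length caused by a move) are hypothetical.

```python
import math


def acceptance_probability(delta_S, beta):
    """Simplified Metropolis rule: accept a move with probability min(1, exp(-beta * delta_S))."""
    if delta_S <= 0:
        return 1.0  # moves that do not worsen the objective are always accepted
    return math.exp(-beta * delta_S)


# A move that worsens the objective by 0.5 under the two values of beta from the question.
for beta in (1, 1000000):
    print(beta, acceptance_probability(0.5, beta))
```

Under this simplified rule, a very large β makes the chain reject essentially every move that increases the description length, so it behaves like a greedy optimizer, whereas β = 1 samples partitions in proportion to their posterior probability.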

Please, advise on interpreting SBM results
by santirdnd 23 Mar '20

Hi everybody,

I'm not an expert on graph theory, so forgive me if I’m misunderstanding something.

I have a dataset (V=2.5k; E=55k) representing biological entities, with edges linking them based on a similarity measure. This dataset is very heterogeneous, with a giant component just shy of 2k nodes and, at the same time, about 200 singletons. To ease the process I’ve filtered out the connected components with fewer than 4 nodes, leaving only 2.2k nodes.

Upon inspection, the graph seems to reveal many quasi-cliques, even in the giant component. Some of these “putative clusters” are mostly isolated while others have a lot of outward links, but usually each one has some unique biological properties.

My goal is to apply a more disciplined approach and, ideally, to define the different communities found. The big communities can be found easily with any algorithm, but graph-tool has proved really useful, as it has also detected a community of hub nodes that are instances wrongly entered into the dataset.

However, I get some blocks with mixed results. They are formed by mostly unconnected “sub-communities”, some of them even coming from different components of the original graph, with nothing in common except for their connectivity pattern. As these sub-communities have very few members (around a dozen nodes at most), I’m assuming that I’m hitting the resolution threshold even for nSBM. Is that correct? If so, is there some way to improve the analysis?

Best,

--
Sent from: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/

Finding cliques
by tshpak 23 Mar '20

Hello Tiago!

Thanks for the great library. I use the function triadic_census to find all combinations of subgraphs based on the clique of size 3 (K3). What would you suggest if I wanted to calculate the same values for all combinations based on K4 or bigger?

Thanks in advance.

efficient random sampling of paths between two nodes
by Franco Peschiera 21 Mar '20

Good day to all,

(Repeated because I chose the wrong email address.)

I wrote an initial message (through the issues site) asking about the possibility of using all_paths to randomly sample paths between two nodes in a graph. It was pointed out that:

1. The algorithm is deterministic.
2. It’s not possible to adapt it to iterate in an unbiased random way.

Now I have a couple more questions, namely seeking advice on how to achieve what I want.

------------------------------

*Context*: I’ve modeled certain parts of a combinatorial optimization problem as a set of graphs that I’m exploiting to efficiently generate variables on the fly during the optimization routine. A variable in my problem is represented by a path between two nodes in one of those graphs.

------------------------------

*Specifically, what I want*:

1. I have a big graph.
2. During each iteration I choose two nodes on the graph (based on some logic irrelevant to the question).
3. I get all the paths between those two nodes (or I randomly sample them if there are too many).

For the sampling in the third step, *I’m currently doing* the following (assuming a sample of size X and a population of Y paths):

1. I get the generator of paths using the function all_paths.
2. I iterate over the generator using Reservoir sampling <https://en.wikipedia.org/wiki/Reservoir_sampling> and I stop if 1) I have explored all paths or 2) I reach a multiple of X (e.g., 3X).

Often the sample size X is really small compared to the population Y: X can be 500-2000, and Y can be 100,000+.

------------------------------

*My problem is*: Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.

One option is to do the sampling during the construction of the paths. I have in mind to just use the get_out_neighbors method on the start node to do my own depth-first search, randomly choosing a neighbor at each step until I get to the cutoff or the end node. I could do this for each path until I reach my limit of paths to sample. I’m not sure, though, whether this is 1) similar in logic to the all_paths method and 2) close in efficiency (since it uses python to iterate and sample instead of C++).

Another option, although a lot more far-fetched, is to create my own all_paths method in C++ and re-compile my own version of graph-tool. Is this realistic?

Are there better options for doing this?

Thanks,
Franco Peschiera
