Dear Tiago,
I have a directed graph of about half a million nodes and approximately a
million edges, following scale-free behaviour with a power-law degree
distribution. To test some of my hypotheses, I would like to generate smaller
random graphs (about 50 up to 200 nodes) that are representative of the big
one. When I use a sample function that samples directly from the real degree
distribution of the big network, I have the following problems:
- I generate disconnected nodes with both in- AND out-degree 0.
- I generate small subgraphs of a few nodes that are not connected to the
main graph.
- If I only sample from nodes with at least degree 1, the generated graph is
coherent, but no longer representative, as I need a large portion of nodes
with either only one in- or one out-degree.
Here is the part of my script I used for that, where samples are drawn from
dictionaries of the degrees:
def sample_in():
    a = np.random.randint(num)
    k_in = in_degrees[a]
    return k_in

def sample_out():
    if sample_in() == 0:
        b = np.random.randint(num_out)
        # list() is needed in Python 3, where dict .values() is not indexable
        k_out = list(out_zero_zeros.values())[b]
        return k_out
    else:
        b = np.random.randint(num)
        k_out = out_degrees[b]
        return k_out

N = 200
g = gt.random_graph(N, lambda: (sample_in(), sample_out()),
                    model="constrained-configuration", directed=True)
I also tried sampling from a list of tuples, as you have mentioned before on
the forum, but I didn't get any results, as the tuples randomly drawn from my
list might not be combinable into a valid graph.
degs=[(7,1),(4,3),(5,6),(2,4),(6,8),(2,0),(3,5),(0,3),(2,7),(2,1)]
g = gt.random_graph(4, lambda i: degs[i], directed=True)
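One way around non-combinable tuples, sketched below with plain numpy (the function name is hypothetical, and the `observed` list stands in for the real degree data): sample the (in, out) pairs jointly from the big graph's degree sequence, so the in/out correlation is kept, and then repair the sampled sequence so that the in- and out-degree totals match, which any directed configuration model requires.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the observed joint degree sequence of the big
# graph: one (in_degree, out_degree) pair per vertex, kept together so the
# in/out correlation of the original network is preserved.
observed = [(7, 1), (4, 3), (5, 6), (2, 4), (6, 8),
            (2, 0), (3, 5), (0, 3), (2, 7), (2, 1)]

def sample_joint_degrees(observed, n, rng):
    """Sample n (in, out) pairs with replacement, then repair the sequence
    so that sum(in) == sum(out), as a directed configuration model needs."""
    idx = rng.integers(0, len(observed), size=n)
    degs = [list(observed[i]) for i in idx]
    diff = sum(d[0] for d in degs) - sum(d[1] for d in degs)
    if diff > 0:      # too many in-stubs: give one vertex extra out-degree
        degs[0][1] += diff
    elif diff < 0:    # too many out-stubs: give one vertex extra in-degree
        degs[0][0] -= diff
    return [tuple(d) for d in degs]

degs = sample_joint_degrees(observed, 50, rng)
assert sum(d[0] for d in degs) == sum(d[1] for d in degs)
```

The resulting `degs` list can then be fed to `gt.random_graph(50, lambda i: degs[i], directed=True)` as in the snippet above. The repair step is crude (it dumps the whole imbalance onto one vertex), so a gentler redistribution may be preferable for very small samples.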
- Is there any option I could activate that would help me in the cases I
described above?
- Is there a better way to create representative small networks?
Any help on that issue will be much appreciated.
Best wishes,
Jana
--
Sent from: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/

I am curious what is used to calculate the standard deviation of the average
in gt.vertex_average and gt.edge_average:
>>> t2=gt.Graph()
>>> t2.add_vertex(2)
>>> t2.add_edge(t2.vertex(0), t2.vertex(1))
>>> gt.vertex_average(t2, "in")
(0.5, 0.35355339059327373)
Now, shouldn't the std be σ(n) = sqrt(((0-0.5)^2 + (1-0.5)^2)/2) = 0.5?
Also, σ(n-1) = sqrt((0.5^2 + 0.5^2)/(2-1)) ≈ 0.70710.
0.3535 is sqrt(2)/4, which happens to be σ(n-1)/2, so there seems to be some
relation there.
A slightly bigger graph:
>>> t3=gt.Graph()
>>> t3.add_vertex(5)
>>> t3.add_edge(t3.vertex(0), t3.vertex(1))
>>> gt.vertex_average(t3, "in")
(0.2, 0.17888543819998318)
Now we should have the series 0, 1, 0, 0, 0 for the vertex in-degrees.
Windows calc gives σ(n) = 0.4 and σ(n-1) ≈ 0.44721, so where does 0.1788854
come from?
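For what it's worth, the hand calculations above can be checked with numpy; notably, σ(n)/√n reproduces both reported values exactly, so the second number returned may be the standard error of the mean rather than the plain standard deviation (my reading, not confirmed):

```python
import numpy as np

# The two in-degree series from the examples above, with the values
# reported by gt.vertex_average as the second tuple element.
for series, reported in [([0, 1], 0.35355339059327373),
                         ([0, 1, 0, 0, 0], 0.17888543819998318)]:
    a = np.asarray(series, dtype=float)
    n = len(a)
    sigma_n = a.std(ddof=0)        # population std, sigma(n)
    sigma_n1 = a.std(ddof=1)       # sample std, sigma(n-1)
    stderr = sigma_n / np.sqrt(n)  # standard error of the mean
    print(sigma_n, sigma_n1, stderr)
    assert np.isclose(stderr, reported)
```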
The reason I am asking is that I have a large graph where the average looks
quite alright but the std makes no sense: going by the histogram, the degree
values are quite a bit more spread out than the std would indicate.

Dear Graph-Tool Community,
I am interested in analysing the hierarchical partitions generated by the nested blockmodel. Specifically, after generating a nested SBM, I would like to post-process it and calculate measures such as eigenvector centrality for a given hierarchical node, save this as a property, and then, in visualisation, apply either a size or colormap constraint to said node, weighted by its centrality.
Using the collection data:
g = gt.collection.data["celegansneural"]
state = gt.minimize_nested_blockmodel_dl(g)
I can then ascertain what my levels are with:
l1state = state.levels[1].g
l2state = state.levels[2].g
etc.
I can then calculate the eigenvector centrality of a given hierarchical partition as follows:
ee1, x1 = gt.eigenvector(l1state)
ee2, x2 = gt.eigenvector(l2state)
1) This presumably then needs to be saved as a property map of some kind(?!). But I am unclear how to do this, not least in a way that guarantees that the correct hierarchical vertices within l1state and l2state align with the generated centrality measures x1 and x2, respectively.
2) Furthermore, if/when that is achieved, how can I call upon this in drawing, for example to size the level 1 hierarchical vertices according to centrality, or level 2 vertices by another measure, etc.?
Hugely grateful for any solutions!
James

Dear community / Tiago
I have a hierarchical partition of a nested block state.
The original network contained 4453 vertices and 50051 edges.
state.print_summary()
l: 0, N: 4453, B: 126
l: 1, N: 126, B: 46
l: 2, N: 46, B: 20
l: 3, N: 20, B: 9
l: 4, N: 9, B: 3
l: 5, N: 3, B: 1
I want to extract the community label of each vertex at every hierarchical level.
To do this I wrote a loop based upon the guide at https://graph-tool.skewed.de/static/doc/demos/inference/inference.html
Where vertexblocksdf is simply a df populated with the vertex numbers 0-4452.
for idx in range(len(vertexblocksdf)):
    r = levels[0].get_blocks()[idx]  # group membership of node idx in level 0
    vertexblocksdf.ix[idx, 'level0'] = r
    r = levels[0].get_blocks()[r]    # group membership of node idx in level 1
    vertexblocksdf.ix[idx, 'level1'] = r
    r = levels[0].get_blocks()[r]    # group membership of node idx in level 2
    vertexblocksdf.ix[idx, 'level2'] = r
    r = levels[0].get_blocks()[r]    # group membership of node idx in level 3
    vertexblocksdf.ix[idx, 'level3'] = r
    r = levels[0].get_blocks()[r]    # group membership of node idx in level 4
    vertexblocksdf.ix[idx, 'level4'] = r
    r = levels[0].get_blocks()[r]    # group membership of node idx in level 5
    vertexblocksdf.ix[idx, 'level5'] = r
But I am getting strange results. My level0 column values make sense, with 126 possibilities (as per l: 0 above). But my level1 column is a number between 0 and 13, and none of my levels has 14 blocks. My level2 output is either 0 or 1, which again doesn't make sense! Levels 3-5 are all simply 0.
*This also reproduces the same behaviour if done manually, without the loop.
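For reference, the lookup pattern from the inference guide can be sketched with plain arrays (the partition data below is made up; in graph-tool each level l has its own levels[l].get_blocks()):

```python
import numpy as np

# Hypothetical nested partition: blocks[l][i] is the group of node/block i
# at level l, mirroring levels[l].get_blocks() in graph-tool.
blocks = [
    np.array([0, 1, 2, 2, 1, 0, 3]),  # level 0: 7 vertices -> 4 blocks
    np.array([0, 1, 0, 1]),           # level 1: 4 blocks   -> 2 blocks
    np.array([0, 0]),                 # level 2: 2 blocks   -> 1 block
]

def labels_per_level(v, blocks):
    """Follow the guide's pattern: feed r through each successive level."""
    r = blocks[0][v]
    out = [int(r)]
    for l in range(1, len(blocks)):
        r = blocks[l][r]  # level l's own array, indexed by level-(l-1) labels
        out.append(int(r))
    return out

print(labels_per_level(3, blocks))  # vertex 3: block 2 at level 0 -> 0 -> 0
```

The key point in the guide's pattern is that every lookup after level 0 goes through that level's own blocks array, which is indexed by the block labels of the level below, not by vertex indices.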
Any ideas??
James

Hi,
I was wondering if there is any way to assign vertex properties while
adding edges to the graph. For example, using "add_edge_list" I can assign
edge properties, but afterwards I have to iterate through all the vertices
again to assign their properties.
I know this is not a problem when the vertex property is of type "int"
or "float", because then one can use "vprop.a = values", but in the case of
"string" and "object" properties this method doesn't work.
What would be the best/fastest way to handle this situation?
I guess it would be very helpful to extend the "add_edge_list" function to
accept vertex properties in some way.
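As an interim workaround, here is a generic pre-processing sketch in plain Python (no particular graph-tool API assumed; all names are hypothetical): build the integer edge list and the per-vertex string values in a single pass, so the property values can be written out afterwards without re-deriving them from the graph.

```python
# Hypothetical input: edges given as pairs of string names.
raw_edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]

index_of = {}  # name -> integer vertex index
names = []     # names[i] is the string value for vertex i
edges = []     # integer edge list

def vid(name):
    """Return the vertex index for a name, assigning a new one on first use."""
    if name not in index_of:
        index_of[name] = len(names)
        names.append(name)
    return index_of[name]

for s, t in raw_edges:
    edges.append((vid(s), vid(t)))

print(edges)  # [(0, 1), (1, 2), (0, 2)]
print(names)  # ['alice', 'bob', 'carol']
```

After `g.add_edge_list(edges)`, a single loop over `names` (whose index i corresponds to vertex i) fills the string property map, with no second pass over the edges needed.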
cheers,
--
Mohsen

Dear Tiago,
Thanks for building this wonderful package!
As a novice in Python, I'm struggling to install this package. My operating
system is macOS and I installed graph-tool via Homebrew.
It seems the installation was successful, but when compiling, the command
'./configure' doesn't work. Could you please give further instructions?
Many thanks,
Yingjie

>
>
> How are you doing steps 3.a and 3.e exactly?
>
> Are you calling g.add_vertex() and g.remove_vertex() repeatedly?
>
Yes I am. I assumed that's not the best thing, but compared to copying the
graph, adding and then removing the vertices performed better.
I investigated a different approach whereby I allow additional vertices to
exist, as long as I keep track of the original number of vertices (n_orig).
For a set of candidate vertices of size m:
- if the graph is not large enough, add new vertices
- create a graph view that only keeps the first (n_orig + m) vertices
- add edges as needed (this relies on vertex properties)
- do the graph computation
- when done, remove the edges (but not the vertices)
When finished with all candidate sets, remove the extra vertices.
Now I am only spending ~25% of the time resizing the graph (around 20% in
remove_edges, despite having called set_fast_edge_removal(True)). That's
still more than I'd like, but it's an improvement over the 80% I had with
adding / removing vertices. If you have other suggestions on how to tackle
this, that'd be great. (For example, I tried without remove_edges, but the
edges that get added to the graph view end up showing up in the original
graph. Is there a way to avoid that, or to filter such that only the
original edges are kept in the graph view?)
as a minimal working example:
```
import graph_tool.all as gt
import numpy as np

g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)
g.set_fast_edge_removal(True)
n_orig = g.num_vertices()

def foo(g, m, skipEdgeRemoval=False):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig + m)
    if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()
    filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0
    gv = gt.GraphView(g, filt)
    #print(f'filt = {filt}')
    new_vtx_ids = np.arange(n_orig, n_orig + m)
    print(f'indices of new vertices = {new_vtx_ids}')
    # Check that all edges are expected
    for e in list(gv.edges()):
        if [int(e.source()), int(e.target())] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in original graph')
    # Add a bunch of random edges connecting to the new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s, t])
    # (also updates some vertex properties, but that's easy/fast to undo;
    # not done in this MWE)
    gv.add_edge_list(edge_list)
    print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')
    # computation on the new graph goes here ...
    pass
    #print('Cleanup ...')
    # If cleanup is not done, this fails, as the edges that were added to the
    # graph view end up being added to the original graph as well :s
    if not skipEdgeRemoval:
        for e in edge_list:
            gv.remove_edge(gv.edge(e[0], e[1]))

foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
# above is all good
foo(g, 3, True)  # skip cleanup
foo(g, 4)  # fails
```
> From: Tiago de Paula Peixoto <tiago(a)skewed.de>
> Date: Tue, 21 Jan 2020 15:01:53 +0100
> Subject: Re: [graph-tool] Speeding up graph copy?
>
> Am 20.01.20 um 19:43 schrieb Zouhair:
> > Hi,
> >
> > I'm using graph-tool in my project, and my use case is as follows:
> > 1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
> > edges)
> > 2) attach some properties to the vertices
> > 3) for about ~5000 sets of *new* candidate vertices (each set has about
> > 2-30 vertices)
> >   3.a - add new vertices to graph
> >   3.b - add edges as necessary between new vertices and old ones
> >   3.c - update the properties
> >   3.d - run an algorithm which depends on properties and edges, save
> > result to a list
> >   3.e - remove new vertices from graph and restore properties (i.e. undo
> > 3.a-c)
> >
> > Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
> > runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
> > is 3.a and 3.e (about 80% of runtime is spent there)
>
>
> > iii- use GraphView but only filter on the original vertices in the
> > graph: this avoids having to delete the vertices, but making GraphView
> > with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
> > < n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
> > original graph keeps growing...
>
> This can be improved by passing a boolean-valued vertex property map
> instead of a lambda function. In this way the GraphView creation becomes
> O(1) instead of O(N).
>
> Best,
> Tiago
>
> --
> Tiago de Paula Peixoto <tiago(a)skewed.de>
>
>

Hi Paul,
From how I understand things, you can get the used colors this way:
from graph_tool.all import *

num_blocks = 14
for i in range(0, num_blocks):
    j = int(i*(255/num_blocks-1))
    print(default_cm(j))
Here ``num_blocks`` is the number of colors that are needed (in a celegans case: 14).
Is this correct, Tiago or others?
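An alternative sketch using matplotlib directly (assuming default_cm behaves like an ordinary matplotlib colormap; "viridis" is only a stand-in here): sampling the colormap at evenly spaced points in [0, 1] sidesteps the 0-255 integer index arithmetic.

```python
import numpy as np
from matplotlib import colormaps

num_blocks = 14
cmap = colormaps["viridis"]  # stand-in for graph-tool's default_cm
# num_blocks evenly spaced sample points covering the full colormap range
colors = [cmap(p) for p in np.linspace(0.0, 1.0, num_blocks)]
print(len(colors))  # one RGBA tuple per block
```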
Haiko

Hi,
I'm using graph-tool in my project, and my use case is as follows:
1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
edges)
2) attach some properties to the vertices
3) for about ~5000 sets of *new* candidate vertices (each set has about
2-30 vertices)
3.a - add new vertices to graph
3.b - add edges as necessary between new vertices and old ones
3.c - update the properties
3.d - run an algorithm which depends on properties and edges, save result
to a list
3.e - remove new vertices from graph and restore properties (i.e. undo
3.a-c)
Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d runs
fast, and 3.b and 3.c are also fast; however, the bottleneck I'm facing is
3.a and 3.e (about 80% of the runtime is spent there).
I've read through the docs and tried different approaches:
i - make a copy of the graph using g.copy() or Graph(g): this avoids step
3.e, but copying the graph is much slower
ii - use a GraphView: adding vertices to the graph view adds them to the
original graph, so I still need to do step 3.e
iii - use a GraphView but only filter on the original vertices in the graph:
this avoids having to delete the vertices, but making a GraphView with a
filter function is still slow (doing `gt.GraphView(g, lambda v: v <
n_nodes_orig)` takes 75% of the runtime), and the number of vertices in the
original graph keeps growing...
iv - do this using multiprocessing, with the hope that pickling the object
avoids having to copy the graph, but it appears that a different process
modifying the graph ends up affecting the graph that another process
receives (because multiprocessing doesn't pickle the underlying C++
object/memory?)
In all cases, it's surprising that the bulk of the runtime goes into making
graph copies as opposed to running the algorithm. The cleanest solution is
to take a copy of the graph and work on it, to avoid having to later delete
vertices and restore properties, so I'm hoping there is a way to speed up
graph copies?
I'm hoping for some input on whether there is a better way to achieve this
using graph-tool, or some pythonic magic to help speed up the copy
operation.
Thanks!
P.S. sorry if this is a repost - my initial post was rejected by nabble,
trying again directly.