Eigenvector centrality runtime/complexity
Hi all, I have run into a runtime/complexity question where I am wondering if someone can give me insights into why this might happen. I have conducted the following steps: 1.) Create some random network 2.) Calculate eigenvector centrality 3.) Sample 10% random nodes 4.) Filter graph by nodes 5.) Calculate eigenvector centrality on sample I have observed that for the sample, the eigenvector centrality calculation takes much longer, in some cases (dependent e.g., on block structure) it takes way longer (like 30 times longer). I am now trying to figure out why this is the case. I assume it has something to do with the convergence which might be probably because links are missing in the sample. If I do the same for e.g., PageRank the difference is not that drastic (it still takes longer in the sample). Does anyone have an idea what is going on here? Thanks, Philipp
HI Philipp, Can you provide minimal sample code that demonstrates your observations? That usually improves tenfold the chance that anyone will be able to help. Cheers, ale On Wed, Jul 27, 2016 at 3:14 PM, Philipp Singer <killver@gmail.com> wrote:
Hi all,
I have run into a runtime/complexity question where I am wondering if someone can give me insights into why this might happen.
I have conducted the following steps:
1.) Create some random network 2.) Calculate eigenvector centrality 3.) Sample 10% random nodes 4.) Filter graph by nodes 5.) Calculate eigenvector centrality on sample
I have observed that for the sample, the eigenvector centrality calculation takes much longer, in some cases (dependent e.g., on block structure) it takes way longer (like 30 times longer).
I am now trying to figure out why this is the case. I assume it has something to do with the convergence which might be probably because links are missing in the sample. If I do the same for e.g., PageRank the difference is not that drastic (it still takes longer in the sample).
Does anyone have an idea what is going on here?
Thanks, Philipp _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Sure, sorry for that: ------------- g = gt.price_network(10000,10,directed=False) %time gt.eigenvector(g) def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], 1000) vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt) %time gt.eigenvector(g_sample) ------------- This is run in an ipython environment. Cheers, Philipp On 27.07.2016 17:01, Alexandre Hannud Abdo wrote:
HI Philipp,
Can you provide minimal sample code that demonstrates your observations?
That usually improves tenfold the chance that anyone will be able to help.
Cheers, ale
On Wed, Jul 27, 2016 at 3:14 PM, Philipp Singer <killver@gmail.com <mailto:killver@gmail.com>> wrote:
Hi all,
I have run into a runtime/complexity question where I am wondering if someone can give me insights into why this might happen.
I have conducted the following steps:
1.) Create some random network 2.) Calculate eigenvector centrality 3.) Sample 10% random nodes 4.) Filter graph by nodes 5.) Calculate eigenvector centrality on sample
I have observed that for the sample, the eigenvector centrality calculation takes much longer, in some cases (dependent e.g., on block structure) it takes way longer (like 30 times longer).
I am now trying to figure out why this is the case. I assume it has something to do with the convergence which might be probably because links are missing in the sample. If I do the same for e.g., PageRank the difference is not that drastic (it still takes longer in the sample).
Does anyone have an idea what is going on here?
Thanks, Philipp _______________________________________________ graph-tool mailing list graph-tool@skewed.de <mailto:graph-tool@skewed.de> https://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 27.07.2016 17:06, Philipp Singer wrote:
g = gt.price_network(10000,10,directed=False) %time gt.eigenvector(g)
def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], 1000)
vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt)
%time gt.eigenvector(g_sample)
If you want to check if the difference is due to graph filtering, you can compare with: u = gt.Graph(g_sample, prune=True) gt.eigenvector(u) Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Thanks for the hint, I tried that but it is the same runtime as on the filtered graph. Best, Philipp On 27.07.2016 17:16, Tiago de Paula Peixoto wrote:
On 27.07.2016 17:06, Philipp Singer wrote:
g = gt.price_network(10000,10,directed=False) %time gt.eigenvector(g)
def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], 1000)
vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt)
%time gt.eigenvector(g_sample) If you want to check if the difference is due to graph filtering, you can compare with:
u = gt.Graph(g_sample, prune=True) gt.eigenvector(u)
Best, Tiago
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Hi Philipp, Not with the sample you provided. Trying here, time decreases below that for the original graph. (I had to change the sample to increase n and sample size, otherwise it was just too fast to capture) In [1]: %paste import random import graph_tool.all as gt n=100000 g = gt.price_network(n,10,directed=False) %time gt.eigenvector(g) def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], int(n/10)) vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt) %time gt.eigenvector(g_sample) u = gt.Graph(g_sample, prune=True) %time gt.eigenvector(u) ## -- End pasted text -- CPU times: user 747 ms, sys: 0 ns, total: 747 ms Wall time: 748 ms CPU times: user 12.2 s, sys: 0 ns, total: 12.2 s Wall time: 12.2 s CPU times: user 335 ms, sys: 0 ns, total: 335 ms Wall time: 335 ms Cheers .~´ On Wed, Jul 27, 2016 at 5:35 PM, Philipp Singer <killver@gmail.com> wrote:
Thanks for the hint, I tried that but it is the same runtime as on the filtered graph.
Best, Philipp On 27.07.2016 17:16, Tiago de Paula Peixoto wrote:
On 27.07.2016 17:06, Philipp Singer wrote:
g = gt.price_network(10000,10,directed=False) %time gt.eigenvector(g)
def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], 1000)
vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt)
%time gt.eigenvector(g_sample)
If you want to check if the difference is due to graph filtering, you can compare with:
u = gt.Graph(g_sample, prune=True) gt.eigenvector(u)
Best, Tiago
_______________________________________________ graph-tool mailing listgraph-tool@skewed.dehttps://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Okay, that is really odd. Your results indicate that it is really slow on the filtered graph, but fast on the pruned graph. For me, the results are always the same for the filtered and pruned. What I found out though is that my anaconda graph tool is much faster for the sample, different to the global graph tool installation. I am also running it with multiple threads. But anyways, the sample should seldomly be that much slower than the full dataset. Strange, I'll try to find out more. Best, Philipp On 27.07.2016 17:43, Alexandre Hannud Abdo wrote:
Hi Philipp,
Not with the sample you provided.
Trying here, time decreases below that for the original graph.
(I had to change the sample to increase n and sample size, otherwise it was just too fast to capture)
In [1]: %paste
import random import graph_tool.all as gt n=100000 g = gt.price_network(n,10,directed=False) %time gt.eigenvector(g) def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], int(n/10)) vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt) %time gt.eigenvector(g_sample) u = gt.Graph(g_sample, prune=True) %time gt.eigenvector(u)
## -- End pasted text -- CPU times: user 747 ms, sys: 0 ns, total: 747 ms Wall time: 748 ms CPU times: user 12.2 s, sys: 0 ns, total: 12.2 s Wall time: 12.2 s CPU times: user 335 ms, sys: 0 ns, total: 335 ms Wall time: 335 ms
Cheers .~´
On Wed, Jul 27, 2016 at 5:35 PM, Philipp Singer <killver@gmail.com <mailto:killver@gmail.com>> wrote:
Thanks for the hint, I tried that but it is the same runtime as on the filtered graph.
Best, Philipp
On 27.07.2016 17:16, Tiago de Paula Peixoto wrote:
On 27.07.2016 17:06, Philipp Singer wrote:
g = gt.price_network(10000,10,directed=False) %time gt.eigenvector(g)
def node_sample(g, g_vertices, n_samples): return random.sample(g_vertices, n_samples) sample = node_sample(g, [x for x in g.vertices()], 1000)
vfilt = g.new_vertex_property('bool') for s in sample: vfilt[s] = True g_sample = gt.GraphView(g, vfilt=vfilt)
%time gt.eigenvector(g_sample)
If you want to check if the difference is due to graph filtering, you can compare with:
u = gt.Graph(g_sample, prune=True) gt.eigenvector(u)
Best, Tiago
_______________________________________________ graph-tool mailing list graph-tool@skewed.de <mailto:graph-tool@skewed.de> https://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de <mailto:graph-tool@skewed.de> https://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 27.07.2016 17:35, Philipp Singer wrote:
Thanks for the hint, I tried that but it is the same runtime as on the filtered graph.
The time difference is because of the convergence of the algorithm. As mentioned in the documentation, the convergence speed of eigenvector() is a function of the spectral gap of the graph. By sub-sampling the network, and thus removing most of the edges, the spectral gap changes significantly. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (3)
-
Alexandre Hannud Abdo -
Philipp Singer -
Tiago de Paula Peixoto