Questions about output of "history" of graph_tool.inference.mcmc_equilibrate

P-M

13 Feb 2017

3:30 p.m.

I have just run the following snippet of code:

    mcmc_args = dict(parallel=True, niter=10)
    history = gt.mcmc_equilibrate(state, wait=1000, history=True, mcmc_args=mcmc_args)
    with open('history1.pkl', 'wb') as his1_pkl:
        pickle.dump(history, his1_pkl, -1)

According to the manual history is a "list of tuples of the form (iteration, entropy)". When unpickling it however I get a list of length 2000. Each element in the list is another list of length two containing `nan` as first entry and then a single-digit integer as second entry.

A couple of questions:

1) I would expect a tuple, not a list for each entry in the list. Is the manual wrong or is the code wrong? Or did I do something wrong?

2) Why am I receiving `nan` rather than a value for "iteration" as first entry of my list?

3) Is there a particular reason why the length of the list is precisely 2000 in this case? (Obviously there is, I just haven't quite figured it out yet.)

Best,
Philipp

--
View this message in context: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...
Sent from the Main discussion list for the graph-tool project mailing list archive at Nabble.com.

Tiago de Paula Peixoto

13 Feb 2017

4:48 p.m.

On 13.02.2017 15:30, P-M wrote:

A couple of questions: 1) I would expect a tuple, not a list for each entry in the list. Is the manual wrong or is the code wrong? Or did I do something wrong?

The point of the documentation was that two values for each step are returned, not that the actual type was a tuple. Most code should not care about this.
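As a minimal sketch of why the container type rarely matters: indexing and unpacking behave the same for a two-element list and a two-element tuple. The values and the helper name `column` below are made up for illustration and are not part of graph-tool.

```python
def column(history, index):
    """Return the index-th value of every history entry."""
    return [entry[index] for entry in history]


# Hypothetical data: the same two steps stored once as tuples, once as lists.
as_tuples = [(10.0, 3), (9.5, 1)]
as_lists = [[10.0, 3], [9.5, 1]]

# Indexing gives identical results for both representations.
same_first = column(as_tuples, 0) == column(as_lists, 0)
same_second = column(as_tuples, 1) == column(as_lists, 1)

# Unpacking in a loop also works for both.
unpacked = [(first, second) for first, second in as_lists]

print(same_first, same_second, unpacked)
```

Code that only reads the two values this way does not need to know which of the two types was returned.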

2) Why am I receiving `nan` rather than a value for "iteration" as first entry of my list?

I have no idea. I can't reproduce this. You have to send a complete example that shows the problem.

3) Is there a particular reason why the length of the list is precisely 2000 in this case? (Obviously there is, I just haven't quite figured it out yet.)

As stated in the documentation, this is a stochastic algorithm which will stop after equilibration has been detected (using a record-breaking heuristic). Hence, the length of the history will be different each time.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>
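To illustrate the idea of a record-breaking stopping rule, here is a toy sketch in plain Python. It is not graph-tool's implementation: the function name `run_until_no_record`, the sampler argument, and the exact counting convention are assumptions made for this example. The loop keeps drawing values and stops once `wait` consecutive draws have set neither a new minimum nor a new maximum.

```python
import random


def run_until_no_record(sample, wait):
    """Draw values until `wait` draws in a row set no new min or max.

    Toy illustration only; `sample` is any zero-argument callable
    returning a float.
    """
    history = []
    lowest = float("inf")
    highest = float("-inf")
    since_record = 0
    while since_record < wait:
        value = sample()
        history.append(value)
        if value < lowest or value > highest:
            # A record was broken: remember it and restart the countdown.
            lowest = min(lowest, value)
            highest = max(highest, value)
            since_record = 0
        else:
            since_record += 1
    return history


# With a random sampler the stopping point depends on the draws,
# so the history length generally differs from run to run.
rng = random.Random()
lengths = [len(run_until_no_record(rng.random, wait=100)) for _ in range(3)]
print(lengths)
```

In this toy version the history can never be shorter than `wait`, but how much longer it gets depends on when the last record happens to occur.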

P-M

5:50 p.m.

Tiago Peixoto wrote

1) I would expect a tuple, not a list for each entry in the list. Is the manual wrong or is the code wrong? Or did I do something wrong?

The point of the documentation was that two values for each step are returned, not that the actual type was a tuple. Most code should not care about this.

It is irrelevant to my code indeed. Seeing something different just threw me off.

Tiago Peixoto wrote

2) Why am I receiving `nan` rather than a value for "iteration" as first entry of my list?

I have no idea. I can't reproduce this. You have to send a complete example that shows the problem.

The graph file is 675 MB in size so probably not terribly amenable to sharing online. If I come across the issue with a smaller file I shall upload it.

Tiago Peixoto wrote

3) Is there a particular reason why the length of the list is precisely 2000 in this case? (Obviously there is, I just haven't quite figured it out yet.)

As stated in the documentation, this is a stochastic algorithm which will stop after equilibration has been detected (using a record-breaking heuristic). Hence, the length of the history will be different each time.

OK, thank you for the explanation.

Best wishes,
Philipp

P-M

21 Feb 2017

11:07 a.m.

Hi Tiago,

I have not reproduced the same problem yet, but I have found a different problem with the history on a smaller graph, which I can upload. I ran the following piece of code (this is a deliberately small network, so ignore the actual results of the code):

    import graph_tool.all as gt
    import timeit
    import random
    import cPickle as pickle

    def collect_edge_probs(s):
        for i in range(len(missing_edges)):
            p = s.get_edges_prob([missing_edges[i]], entropy_args=dict(partition_dl=False))
            probs[i].append(p)

    g = gt.load_graph('graph_no_multi_clean.gt')
    pub_years = [1800]
    vertex_filter = g.new_vertex_property("bool")
    edge_filter = g.new_edge_property("bool")

    for pub_year in pub_years:
        # Initialise parallel edges filter
        parallel_edges_filter = g.new_edge_property("int", val=0)

        # filter vertices by date
        for v in g.vertices():
            if g.vp.v_pub_year[v] <= pub_year:
                vertex_filter[v] = True
            else:
                vertex_filter[v] = False
        g.set_vertex_filter(vertex_filter)

        # now filter edges by date
        for e in g.edges():
            if g.ep.pub_year[e] <= pub_year:
                edge_filter[e] = True
            else:
                edge_filter[e] = False
        g.set_edge_filter(edge_filter)

        # cannot simply delete all parallel edges as that might prevent accurate
        # filtering of edges by date in the next step
        gt.label_parallel_edges(g, eprop=parallel_edges_filter)
        for e in g.edges():
            if parallel_edges_filter[e] != 0:
                edge_filter[e] = False
        g.set_edge_filter(edge_filter)

        remaining_v_indices = []
        for v in g.vertices():
            remaining_v_indices.append(int(g.vertex_index[v]))
        num_vertices = g.num_vertices()
        random_origins = random.sample(remaining_v_indices, int(0.01*num_vertices))
        random_targets = random.sample(remaining_v_indices, int(0.01*num_vertices))

        missing_edges = []
        for v1 in random_origins:
            for v2 in random_targets:
                if v1 == v2:
                    continue
                elif g.edge(v1, v2) == None:
                    missing_edges.append((v1, v2))

        state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
        state = state.copy(sampling=True)
        probs = [[] for _ in range(len(missing_edges))]
        mcmc_args = dict(niter=10)

        # Now we collect the probabilities for exactly 10,000 sweeps
        history = gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=mcmc_args,
                                      callback=collect_edge_probs, history=True)

        name = 'history' + str(g.num_vertices()) + '.pkl'
        with open(name, 'wb') as missing_edges_pkl:
            pickle.dump(history, missing_edges_pkl, -1)

        # undo filtering
        g.set_edge_filter(None)
        g.set_vertex_filter(None)

Now when looking at the output of `history` I find that every entry is [7842.8484318875344, a], where `a` is some single-digit integer. Given that the expected format is [iteration, entropy], I can't quite make sense of it: the first entry is always the same, and a decimal number wasn't quite what I expected for an iteration counter. The last number, however, also doesn't work as an iteration counter, as it doesn't seem to increment in a straightforward way. Do you know what is going wrong here? Is this maybe a similar issue to what I had observed previously?

I have attached the history output here ( history1023.pkl <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027051/history1023.pkl> ) and the graph as a zipped file here, as it was too large otherwise ( graph_no_multi_clean.zip <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027051/graph_no_multi_clean.zip> ).

Tiago de Paula Peixoto

11:37 a.m.

On 21.02.2017 11:07, P-M wrote:

Now when looking at the output of `history` I find that the output for every entry is [7842.8484318875344, a] where `a` is some single-digit integer.

It seems that the documentation is wrong; the history returns (entropy, nmoves), where nmoves is the number of vertices moved. Returning the number of iterations would be redundant anyways, since the length of the history gives you that already. I'll fix the documentation.

--
Tiago de Paula Peixoto <tiago@skewed.de>
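Given that layout, a history can be read as in the following minimal sketch. The helper name `split_history` is hypothetical, and the `nmoves` values in the sample data are invented; only the entropy value is taken from the post above.

```python
def split_history(history):
    """Split a list of (entropy, nmoves) pairs into two parallel lists."""
    entropies = [entry[0] for entry in history]
    nmoves = [entry[1] for entry in history]
    return entropies, nmoves


# Sample entries in the reported shape; the second values are made up.
history = [
    [7842.8484318875344, 3],
    [7842.8484318875344, 0],
    [7842.8484318875344, 5],
]

entropies, nmoves = split_history(history)

# The step count is simply the number of recorded entries.
n_steps = len(history)
total_moves = sum(nmoves)

print(n_steps, total_moves, entropies[-1])
```

Read this way, a constant first value means the entropy did not change between recorded steps, and the small integers count moved vertices rather than iterations.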
