I was trying to do a nested degree-corrected SBM on my laptop, but I was running out of RAM, 14-15GB before stopping with "Killed". The graph is directed with 20k nodes 2.8m edges. Is that a typical amount of memory usage or did I do something wrong? If it's normal, is there any way for me to work around this such that I can run it on my computer
This thread might answer your question: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/... Cheers, Dr. Felix Victor Münch Postdoc Researcher Leibniz Institut for Media Research | Hans-Bredow-Institut (HBI), Hamburg https://leibniz-hbi.de/ https://felixvictor.net (https://felixvictor.net/)
On Monday, Feb 10, 2020 at 5:30 PM, Deklan Webster <deklanw@gmail.com (mailto:deklanw@gmail.com)> wrote: I was trying to do a nested degree-corrected SBM on my laptop, but I was running out of RAM, 14-15GB before stopping with "Killed". The graph is directed with 20k nodes 2.8m edges.
Is that a typical amount of memory usage or did I do something wrong? If it's normal, is there any way for me to work around this such that I can run it on my computer _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Thanks, I got it setup. The memory usage is now about 2~GB! I was testing with a smaller graph and I noticed the sweeps are much slower with multiflip. Is this a fundamental tradeoff between speed and memory usage, or is there some setting to make it faster? Thanks On Tue, Feb 11, 2020 at 6:11 AM Felix Victor Münch <felix.v.muench@gmail.com> wrote:
This thread might answer your question:
http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...
Cheers,
*Dr. Felix Victor Münch* Postdoc Researcher Leibniz Institut for Media Research | Hans-Bredow-Institut (HBI), Hamburg https://leibniz-hbi.de/ https://felixvictor.net
On Monday, Feb 10, 2020 at 5:30 PM, Deklan Webster <deklanw@gmail.com> wrote: I was trying to do a nested degree-corrected SBM on my laptop, but I was running out of RAM, 14-15GB before stopping with "Killed". The graph is directed with 20k nodes 2.8m edges.
Is that a typical amount of memory usage or did I do something wrong? If it's normal, is there any way for me to work around this such that I can run it on my computer _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Hi, Am 11.02.20 um 21:30 schrieb Deklan Webster:
Thanks, I got it setup. The memory usage is now about 2~GB!
I was testing with a smaller graph and I noticed the sweeps are much slower with multiflip. Is this a fundamental tradeoff between speed and memory usage, or is there some setting to make it faster?
It's an illusion that multiflip is slower, it is in fact much master. It may take slightly longer to return per sweep, but it moves much further in state space. Overall, the sampling equilibrates much faster. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
I see. To initialize the NestedBlockState I'm doing this: bs = [np.zeros(1)] * 10 state = NestedBlockState(g=g, bs=bs, sampling=True, deg_corr=True) Is there a better way to initialize it such that this process is faster? Running "minimize_nested_blockmodel_dl" once is seemingly out of the question because of memory usage. Also, I notice that only one of my CPU cores is being utilized. Is there no multicore low-memory way of doing this? Thanks! On Wed, Feb 12, 2020 at 3:23 AM Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Hi,
Am 11.02.20 um 21:30 schrieb Deklan Webster:
Thanks, I got it setup. The memory usage is now about 2~GB!
I was testing with a smaller graph and I noticed the sweeps are much slower with multiflip. Is this a fundamental tradeoff between speed and memory usage, or is there some setting to make it faster?
It's an illusion that multiflip is slower, it is in fact much master. It may take slightly longer to return per sweep, but it moves much further in state space. Overall, the sampling equilibrates much faster.
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de> _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
In the thread linked (http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...) is written (Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code above. I had forgotten to make this change the last release...) Does this mean that one should also set "multiflip=True" in mcmc_equilbrate? d
On 11 Feb 2020, at 21:30, Deklan Webster <deklanw@gmail.com> wrote:
Thanks, I got it setup. The memory usage is now about 2~GB!
I was testing with a smaller graph and I noticed the sweeps are much slower with multiflip. Is this a fundamental tradeoff between speed and memory usage, or is there some setting to make it faster?
Thanks
On Tue, Feb 11, 2020 at 6:11 AM Felix Victor Münch <felix.v.muench@gmail.com> wrote: This thread might answer your question:
http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...
Cheers,
Dr. Felix Victor Münch Postdoc Researcher Leibniz Institut for Media Research | Hans-Bredow-Institut (HBI), Hamburg https://leibniz-hbi.de/ https://felixvictor.net
On Monday, Feb 10, 2020 at 5:30 PM, Deklan Webster <deklanw@gmail.com> wrote: I was trying to do a nested degree-corrected SBM on my laptop, but I was running out of RAM, 14-15GB before stopping with "Killed". The graph is directed with 20k nodes 2.8m edges.
Is that a typical amount of memory usage or did I do something wrong? If it's normal, is there any way for me to work around this such that I can run it on my computer _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Am 24.02.20 um 19:32 schrieb Davide Cittaro:
In the thread linked (http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/...) is written
(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code above. I had forgotten to make this change the last release...)
Does this mean that one should also set "multiflip=True" in mcmc_equilbrate?
Yes. -- Tiago de Paula Peixoto <tiago@skewed.de>
Dear community, I have been following this thread as have a similar predicament - trying to run the nested block model on a graph of 1mil nodes and 12 mil edges. Since I have read of the computational cost from others, I have got access to a server with 500Gb RAM with ~120 CPU threads. Based upon this graph complexity and your prior experience, which approach you would aim for: 1) minimize_nested_blockmodel_dl(g) or 2) prime an empty object and MCMC equilibrate with multi-flip: bs = [np.zeros(1)]*6 state=NestedBlockState(g=g,bs=bs,sampling=True) gt.mcmc_equilibrate(state,multiflip=True) Am very grateful for the guidance…! (I wonder how long this will take to run…??) James
On 24 Feb 2020, at 18:40, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Am 24.02.20 um 19:32 schrieb Davide Cittaro:
(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code above. I had forgotten to make this change the last release...)
Does this mean that one should also set "multiflip=True" in mcmc_equilbrate?
Yes.
-- Tiago de Paula Peixoto <tiago@skewed.de> _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.skewed.de%2Fmailman%2Flistinfo%2Fgraph-tool&data=02%7C01%7C%7C3b3a0d8d77904e2eb56e08d7b9591fc8%7C569df091b01340e386eebd9cb9e25814%7C0%7C0%7C637181664763877491&sdata=izkn35h55P2quQwzqI4AaSS7Zk%2BO%2FY6Hhc9Fdmo38D8%3D&reserved=0
I had success with the latter method. Although, it seems the hardware you have is a bit overkill (I did 3m~ edge graph with 16GB RAM laptop with enough RAM to spare). I'm not sure if it's possible to utilize multithreading for this, but if not those threads you have won't help much. Still hoping to get an answer about multithreading from someone more knowledgeable On Wed, Feb 26, 2020 at 10:35 AM James Ruffle <j.ruffle@qmul.ac.uk> wrote:
Dear community,
I have been following this thread as have a similar predicament - trying to run the nested block model on a graph of 1mil nodes and 12 mil edges. Since I have read of the computational cost from others, I have got access to a server with 500Gb RAM with ~120 CPU threads. Based upon this graph complexity and your prior experience, which approach you would aim for:
1) minimize_nested_blockmodel_dl(g)
or
2) prime an empty object and MCMC equilibrate with multi-flip: bs = [np.zeros(1)]*6 state=NestedBlockState(g=g,bs=bs,sampling=True) gt.mcmc_equilibrate(state,multiflip=True)
Am very grateful for the guidance…! (I wonder how long this will take to run…??)
James
On 24 Feb 2020, at 18:40, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Am 24.02.20 um 19:32 schrieb Davide Cittaro:
(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code above. I had forgotten to make this change the last release...)
Does this mean that one should also set "multiflip=True" in mcmc_equilbrate?
Yes.
-- Tiago de Paula Peixoto <tiago@skewed.de> _______________________________________________ graph-tool mailing list graph-tool@skewed.de
_______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
For the first version I needed 400-500GB RAM, so that wouldn't be overkill. The threads however seem pretty much useless, indeed. Hoiweveriwever, as the latter version seems so much more memory efficient, I am wondering why it's not implemented as the default in form of a function call with the same prominence in the documentation. Would love to hear Tiago or anyone else chime in on this. What's the advantage of the boiler-plate minimize_nested_blockmodel? Is there any except less lines of code? Cheers, Felix
On Wednesday, Feb 26, 2020 at 9:28 PM, Deklan Webster <deklanw@gmail.com (mailto:deklanw@gmail.com)> wrote: I had success with the latter method. Although, it seems the hardware you have is a bit overkill (I did 3m~ edge graph with 16GB RAM laptop with enough RAM to spare). I'm not sure if it's possible to utilize multithreading for this, but if not those threads you have won't help much. Still hoping to get an answer about multithreading from someone more knowledgeable
On Wed, Feb 26, 2020 at 10:35 AM James Ruffle <j.ruffle@qmul.ac.uk (mailto:j.ruffle@qmul.ac.uk)> wrote:
Dear community,
I have been following this thread as have a similar predicament - trying to run the nested block model on a graph of 1mil nodes and 12 mil edges. Since I have read of the computational cost from others, I have got access to a server with 500Gb RAM with ~120 CPU threads. Based upon this graph complexity and your prior experience, which approach you would aim for:
1) minimize_nested_blockmodel_dl(g)
or
2) prime an empty object and MCMC equilibrate with multi-flip: bs = [np.zeros(1)]*6 state=NestedBlockState(g=g,bs=bs,sampling=True) gt.mcmc_equilibrate(state,multiflip=True)
Am very grateful for the guidance…! (I wonder how long this will take to run…??)
James
On 24 Feb 2020, at 18:40, Tiago de Paula Peixoto <tiago@skewed.de (mailto:tiago@skewed.de)> wrote:
Am 24.02.20 um 19:32 schrieb Davide Cittaro:
(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code above. I had forgotten to make this change the last release...)
Does this mean that one should also set "multiflip=True" in mcmc_equilbrate?
Yes.
-- Tiago de Paula Peixoto <tiago@skewed.de (mailto:tiago@skewed.de)> _______________________________________________ graph-tool mailing list graph-tool@skewed.de (mailto:graph-tool@skewed.de) https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.skewed.de%2Fmailman%2Flistinfo%2Fgraph-tool&data=02%7C01%7C%7C3b3a0d8d77904e2eb56e08d7b9591fc8%7C569df091b01340e386eebd9cb9e25814%7C0%7C0%7C637181664763877491&sdata=izkn35h55P2quQwzqI4AaSS7Zk%2BO%2FY6Hhc9Fdmo38D8%3D&reserved=0
_______________________________________________ graph-tool mailing list graph-tool@skewed.de (mailto:graph-tool@skewed.de) https://lists.skewed.de/mailman/listinfo/graph-tool
graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
Am 27.02.20 um 10:19 schrieb Felix Victor Münch:
For the first version I needed 400-500GB RAM, so that wouldn't be overkill. The threads however seem pretty much useless, indeed.
500 GB for a network with 3M edges seems absurd. You should be able to do it with an order of magnitude less memory, at least.
Hoiweveriwever, as the latter version seems so much more memory efficient, I am wondering why it's not implemented as the default in form of a function call with the same prominence in the documentation. Would love to hear Tiago or anyone else chime in on this. What's the advantage of the boiler-plate minimize_nested_blockmodel? Is there any except less lines of code?
The algorithms are not the same, and incur different trade-offs. The one in minimize_nested_blockmodel_dl() attempts to bracket the model complexity with a one-dimensional bisection search, and requires more memory because it needs to keep several copies of the global state during the search. On the other hand, multiflip_mcmc_sweep() implements merge-split moves, and keeps only a single state around, hence the improved memory usage. Although it depends on the network, minimize_nested_blockmodel_dl() tends to find better estimates of the ground state (i.e. solutions with smallest description length) in a shorter time, for a cost of higher memory usage. The next version of the library will include an improved version of multiflip_mcmc_sweep() that I am preparing, and this alternative will gain prominence in the documentation. Best, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Is multiflip_mcmc_sweep() able to be done in parallel now or in the future (CPU or GPU)? On Thu, Feb 27, 2020 at 8:01 AM Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Am 27.02.20 um 10:19 schrieb Felix Victor Münch:
For the first version I needed 400-500GB RAM, so that wouldn't be overkill. The threads however seem pretty much useless, indeed.
500 GB for a network with 3M edges seems absurd. You should be able to do it with an order of magnitude less memory, at least.
Hoiweveriwever, as the latter version seems so much more memory efficient, I am wondering why it's not implemented as the default in form of a function call with the same prominence in the documentation. Would love to hear Tiago or anyone else chime in on this. What's the advantage of the boiler-plate minimize_nested_blockmodel? Is there any except less lines of code?
The algorithms are not the same, and incur different trade-offs.
The one in minimize_nested_blockmodel_dl() attempts to bracket the model complexity with a one-dimensional bisection search, and requires more memory because it needs to keep several copies of the global state during the search.
On the other hand, multiflip_mcmc_sweep() implements merge-split moves, and keeps only a single state around, hence the improved memory usage.
Although it depends on the network, minimize_nested_blockmodel_dl() tends to find better estimates of the ground state (i.e. solutions with smallest description length) in a shorter time, for a cost of higher memory usage.
The next version of the library will include an improved version of multiflip_mcmc_sweep() that I am preparing, and this alternative will gain prominence in the documentation.
Best, Tiago
-- Tiago de Paula Peixoto <tiago@skewed.de> _______________________________________________ graph-tool mailing list graph-tool@skewed.de https://lists.skewed.de/mailman/listinfo/graph-tool
On 27. Feb 2020, at 14:01, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Am 27.02.20 um 10:19 schrieb Felix Victor Münch:
For the first version I needed 400-500GB RAM, so that wouldn't be overkill. The threads however seem pretty much useless, indeed.
500 GB for a network with 3M edges seems absurd. You should be able to do it with an order of magnitude less memory, at least.
Yes, you’re absolutely right, sorry. It was > 60 million edges in my case. My bad for not reading closely enough (and not having provided that information in this thread before).
Hoiweveriwever, as the latter version seems so much more memory efficient, I am wondering why it's not implemented as the default in form of a function call with the same prominence in the documentation. Would love to hear Tiago or anyone else chime in on this. What's the advantage of the boiler-plate minimize_nested_blockmodel? Is there any except less lines of code?
The algorithms are not the same, and incur different trade-offs.
The one in minimize_nested_blockmodel_dl() attempts to bracket the model complexity with a one-dimensional bisection search, and requires more memory because it needs to keep several copies of the global state during the search.
On the other hand, multiflip_mcmc_sweep() implements merge-split moves, and keeps only a single state around, hence the improved memory usage.
Although it depends on the network, minimize_nested_blockmodel_dl() tends to find better estimates of the ground state (i.e. solutions with smallest description length) in a shorter time, for a cost of higher memory usage.
The next version of the library will include an improved version of multiflip_mcmc_sweep() that I am preparing, and this alternative will gain prominence in the documentation.
Looking forward to test it. Thanks for the explanation!
participants (5)
-
Davide Cittaro -
Deklan Webster -
Felix Victor Münch -
James Ruffle -
Tiago de Paula Peixoto