Correlation Histogram
I am trying to obtain the correlation histogram for a graph of mine, following the example given in the manual. I run:

    g = gt.load_graph('graph.gt')
    gt.remove_parallel_edges(g)
    h = gt.corr_hist(g, 'out', 'out')

My graph is relatively large, at 12,238,931 vertices and 24,884,365 edges. My problem is that as soon as I start the code, it runs on 20 processes and happily chomps through 252 GB of RAM before starting to spill over into swap, making my machine incredibly slow.

I presume the RAM usage is linked to the parallel processing, so it could presumably be tackled by running on fewer processes. Is there any way of reducing the RAM usage? Or would I need to implement the routine manually to achieve this?

Best wishes,
Philipp
Ni! Hi Philipp,

That's a feature of OpenMP, controlled by an environment variable: OMP_NUM_THREADS. So you can, for example, run

    export OMP_NUM_THREADS=4

before running your code.

.~´

On Wednesday, 8 February 2017 at 10:09 -0700, P-M wrote:
I am trying to obtain the correlation histogram for a graph of mine, following the example given in the manual. I run:
    g = gt.load_graph('graph.gt')
    gt.remove_parallel_edges(g)
    h = gt.corr_hist(g, 'out', 'out')
My graph is relatively large, at 12,238,931 vertices and 24,884,365 edges. My problem is that as soon as I start the code, it runs on 20 processes and happily chomps through 252 GB of RAM before starting to spill over into swap, making my machine incredibly slow.
I presume the RAM usage is linked to the parallel processing, so it could presumably be tackled by running on fewer processes. Is there any way of reducing the RAM usage? Or would I need to implement the routine manually to achieve this?
Best wishes,
Philipp
Thanks! I presume this won't impact already running processes, and is only valid for as long as my instance of PuTTY is running, with things reverting to normal after that?
Well, yes. Though you can configure your shell to make it permanent. Software Carpentry has some good tutorials on using the shell, for example: http://swcarpentry.github.io/shell-extras/08-environment-variables.html

You can also modify the environment from within Python, using the "os.environ" dictionary. You'll just have to set the value for 'OMP_NUM_THREADS' before importing graph-tool, because OpenMP reads the value at import time.
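For example, a minimal sketch of that approach (the value 4 is arbitrary):

    import os

    # Must be set before graph_tool is imported, because OpenMP
    # reads the variable at import time.
    os.environ['OMP_NUM_THREADS'] = '4'

    import graph_tool.all as gt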
[]s

On Thursday, February 9, 2017, P-M <pmj27@cam.ac.uk> wrote:

Thanks! I presume this won't impact already running processes, and is only valid for as long as my instance of PuTTY is running, with things reverting to normal after that?
On 09.02.2017 16:37, Alexandre Hannud Abdo wrote:
You can also modify the environment from within Python, using the "os.environ" dictionary. You'll just have to set the value for 'OMP_NUM_THREADS' before importing graph-tool, because OpenMP reads the value at import time.
graph-tool also provides some convenience functions for doing this from inside Python, independently of the environment variables:

    graph_tool.openmp_get_num_threads()
    graph_tool.openmp_set_num_threads()
    graph_tool.openmp_get_schedule()
    graph_tool.openmp_set_schedule()
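For instance, a minimal sketch restricting graph-tool to a single thread from within Python:

    import graph_tool.all as gt

    gt.openmp_set_num_threads(1)        # limit OpenMP to one thread
    print(gt.openmp_get_num_threads())  # confirm the setting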
-- 
Tiago de Paula Peixoto <tiago@skewed.de>

Those functions were very useful (as was the tip about the environment variable). I couldn't find them anywhere in the documentation, though. Would it be possible to add them?

Thank you for the help; alas, in this case even limiting the threads to 1 didn't work. The routine still uses up 252 GB of RAM and then carries on into the swap. I suppose the network is simply too large...

Best,
Philipp
On 10.02.2017 15:14, P-M wrote:
Those functions were very useful (as was the tip about the environment variable). I couldn't find them anywhere in the documentation, though. Would it be possible to add them?
It's now in git.
Thank you for the help; alas, in this case even limiting the threads to 1 didn't work. The routine still uses up 252 GB of RAM and then carries on into the swap. I suppose the network is simply too large...
I don't think this is necessarily true. The histogram constructs a D x D matrix, where D is the largest degree in the network. This is probably why you are running out of memory.
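As a back-of-the-envelope check of this explanation (assuming, for illustration, that the D x D histogram is stored densely with 8 bytes per bin):

    import graph_tool.all as gt

    g = gt.load_graph('graph.gt')
    D = int(g.degree_property_map('out').a.max())  # largest out-degree

    # A dense D x D matrix of 8-byte bins; D ~ 1e5 already
    # gives on the order of 75 GiB.
    print(D**2 * 8 / 2**30, 'GiB')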
-- 
Tiago de Paula Peixoto <tiago@skewed.de>

Yup, that would explain it. D is on the order of 10^5 in this case. Would I be able to write a more memory-efficient script manually with sparse matrices, or is the underlying code already fairly optimised in this regard?

Best,
Philipp
On 12.02.2017 12:53, P-M wrote:
Yup, that would explain it. D is on the order of 10^5 in this case. Would I be able to write a more memory-efficient script manually with sparse matrices, or is the underlying code already fairly optimised in this regard?
Of course, you could do a lot better with a sparse representation...
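A minimal sketch of such a sparse approach, counting (source out-degree, target out-degree) pairs in a dictionary instead of filling a dense D x D matrix (a hand-rolled substitute for corr_hist, assuming unweighted counts):

    from collections import Counter
    import graph_tool.all as gt

    g = gt.load_graph('graph.gt')
    gt.remove_parallel_edges(g)

    out_deg = g.degree_property_map('out').a
    edges = g.get_edges()  # columns 0 and 1 are source and target

    # Memory scales with the number of distinct degree pairs
    # actually present in the graph, not with D^2.
    hist = Counter(zip(out_deg[edges[:, 0]], out_deg[edges[:, 1]]))

If needed, the resulting counts can then be converted to a scipy.sparse matrix for plotting or further analysis.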
-- 
Tiago de Paula Peixoto <tiago@skewed.de>

participants (3):
- Alexandre Hannud Abdo
- P-M
- Tiago de Paula Peixoto