CFD Online (www.cfd-online.com), forum: OpenFOAM Running, Solving & CFD

Case running extremely slow on cluster in parallel mode


#1  Venky_94 (Venkat Ganesh), Member, November 10, 2021, 04:34
Hello,

This is my first time running OpenFOAM on a cluster, and I noticed that my simulation runs much slower on the cluster with multiple processors than on my local PC.

I did a scaling analysis by recording the execution time of around 50 iterations and averaging it, for different processor counts. Performance deteriorates sharply as soon as the processor count is increased to 4. I've attached an image of the timings and efficiency.
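A minimal sketch of the scaling calculation described above: it averages the per-iteration wall times recorded for each processor count and reports speedup S = T_ref / T_p and parallel efficiency E = S / (p / p_ref), taking the smallest processor count tested as the reference. The function name and the timings in the demo are hypothetical placeholders, not the values in the attached image.

```python
from statistics import mean


def scaling_table(iteration_times):
    """Return (procs, mean s/iteration, speedup, efficiency) rows.

    iteration_times maps a processor count to a list of wall-clock times (s)
    of individual iterations (e.g. ~50 of them). Speedup and efficiency are
    relative to the smallest processor count in the dict.
    """
    counts = sorted(iteration_times)
    p_ref = counts[0]
    t_ref = mean(iteration_times[p_ref])
    rows = []
    for p in counts:
        t_p = mean(iteration_times[p])
        speedup = t_ref / t_p
        # Ideal speedup going from p_ref to p processors is p / p_ref.
        efficiency = speedup / (p / p_ref)
        rows.append((p, t_p, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical per-iteration timings (s), for illustration only.
    demo = {1: [4.0, 4.2, 3.8], 2: [2.1, 2.0, 1.9], 4: [1.5, 1.4, 1.6]}
    for p, t, s, e in scaling_table(demo):
        print(f"{p:4d} procs  {t:6.3f} s/iter  speedup {s:5.2f}  efficiency {e:6.1%}")
```

An efficiency that falls well below 100% as p grows is the "sharp deterioration" described in the post.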

For comparison, the execution time is less than a second on my local PC using 8 cores. A single node on the cluster has 40 processors, and I'd like to use that computing power to speed up my simulation. I'm using the scotch decomposition method and a custom solver based on interFoam. I've also attached the case setup. Please let me know what I could change to improve the performance.
Attached image: Scaling Analysis.JPG (39.4 KB)
Attached file: VOF.zip (case setup, 16.1 KB)

#2  Santiago (Santiago Lopez Castano), Senior Member, November 10, 2021, 05:03
Pretty common when running unstructured codes on a single node. This is basically due to the number of memory channels and the L1/L2 cache of your blade. My suggestion: run the same performance analysis, but use the node as the smallest "CPU unit", that is, use 40, 80 and 120 processors.

#3  Venky_94 (Venkat Ganesh), Member, November 15, 2021, 11:56
Thanks for your suggestion. I tried it out and found that, while all three options are faster than my local PC, I got the best results with a single node. Sorry for the delayed response; the jobs had long queue wait times.

I ran all three options for 24 hours, and the simulated time reached was 2.5 s on a single node (40 processors), 2 s on 80 processors and 1.9 s on 120 processors.
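As a sketch of how to read these 24-hour runs, the snippet below treats the simulated time reached within the same wall-clock budget as a throughput and compares each run against the single-node (40-core) run, following the node-as-unit suggestion above. It assumes that "completed runtime" means simulated time reached and that all three runs used identical case settings; the function name is illustrative only.

```python
def relative_node_scaling(sim_time_reached, wall_hours):
    """Throughput, speedup and efficiency relative to the smallest core count.

    sim_time_reached maps core count -> simulated time (s) reached within
    the same wall-clock budget of wall_hours.
    """
    counts = sorted(sim_time_reached)
    base = counts[0]
    base_rate = sim_time_reached[base] / wall_hours
    rows = []
    for cores in counts:
        rate = sim_time_reached[cores] / wall_hours  # simulated s per wall-clock hour
        speedup = rate / base_rate
        efficiency = speedup / (cores / base)
        rows.append((cores, rate, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Values reported in the post: 24 h runs on 40, 80 and 120 cores.
    for cores, rate, s, e in relative_node_scaling({40: 2.5, 80: 2.0, 120: 1.9}, 24.0):
        print(f"{cores:4d} cores  {rate:.4f} sim-s/h  speedup {s:.2f}  efficiency {e:.0%}")
```

With the reported values, 80 and 120 cores reach only 0.80 and 0.76 times the single-node throughput, i.e. parallel efficiencies of 40% and about 25% relative to one node.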

While the single-node performance is good enough for my case for now, I'm genuinely curious about these results:
  1. Why was performance so poor on 20 cores but much better on 40 cores? Is it because the whole node is then at OpenFOAM's disposal, giving it access to more memory?
  2. Why do 40 cores perform better than higher core counts? Is it because of the higher communication time between cores (i.e. too few cells per core)?
  3. And finally, what would I have to do to speed up my simulation further?
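Question 2 can be checked with simple arithmetic: divide the cell count by the number of cores and compare the result with a lower limit. Figures of roughly 10k to 50k cells per core are often quoted for OpenFOAM, but the useful limit depends on the hardware and interconnect, so the threshold below is a user-chosen parameter (an assumption, not an OpenFOAM setting). The cell count of the original case is not given in the thread, so the demo uses the ~200k-cell cylinder case reported later in the thread; the function name is hypothetical.

```python
def cells_per_core(n_cells, core_counts, min_cells_per_core):
    """Return (cores, cells per core, below_threshold) for each core count.

    min_cells_per_core is an assumed rule-of-thumb limit; below it,
    inter-processor communication may start to outweigh the computation
    each core has to do.
    """
    rows = []
    for cores in core_counts:
        per_core = n_cells / cores
        rows.append((cores, per_core, per_core < min_cells_per_core))
    return rows


if __name__ == "__main__":
    # ~200k cells; the 10,000 cells/core threshold is illustrative only.
    for cores, per_core, low in cells_per_core(200_000, [2, 4, 8, 16, 32], 10_000):
        flag = "  <- below assumed threshold" if low else ""
        print(f"{cores:3d} cores: {per_core:9.0f} cells/core{flag}")
```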

Last edited by Venky_94; November 20, 2021 at 00:30.

#4  brahmsdr (Brahmanda), New Member, August 28, 2024, 20:10: Similar Issue
Hi,
I have a similar issue running my case with the icoFoam solver in parallel on a cluster. I also used scotch for the mesh decomposition. Here are some trial-and-error runs for flow past a cylinder with a mesh of ~200k cells:

32 cores (4 nodes), 256GB (64GB per node, 8GB per cpu/task): sim Time = 0.2 took 2927s (dt=0.005)

16 cores (2 nodes), 256GB (128GB per node, 16GB per cpu/task): sim Time = 0.2 took 2571s (dt=0.005)

8 cores (2 nodes), 256GB (128GB per node, 32GB per cpu/task): sim Time = 0.2 took 1285s (dt=0.005)

4 cores (2 nodes), 256GB (128GB per node, 64GB per cpu/task): sim Time = 0.2 took 784s (dt=0.005)

2 cores (2 nodes), 256GB (128GB per node, 128GB per cpu/task): sim Time = 0.2 took 34s (dt=0.005)
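The timings above can be normalized to wall time per time step (assuming the fixed dt given, 0.2 / 0.005 = 40 steps) and to efficiency relative to the 2-core run. This is a sketch using only the numbers reported in the post; the function name is illustrative.

```python
def per_step_report(runs, end_time, dt):
    """Wall time per step, speedup and efficiency relative to the fewest cores.

    runs maps core count -> wall-clock seconds needed to reach end_time
    with a fixed time step dt.
    """
    n_steps = round(end_time / dt)  # 0.2 / 0.005 -> 40 steps
    counts = sorted(runs)
    base = counts[0]
    rows = []
    for cores in counts:
        per_step = runs[cores] / n_steps
        speedup = runs[base] / runs[cores]
        efficiency = speedup / (cores / base)
        rows.append((cores, per_step, speedup, efficiency))
    return rows


if __name__ == "__main__":
    reported = {32: 2927.0, 16: 2571.0, 8: 1285.0, 4: 784.0, 2: 34.0}
    for cores, per_step, s, e in per_step_report(reported, end_time=0.2, dt=0.005):
        print(f"{cores:3d} cores: {per_step:7.2f} s/step  speedup {s:.3f}  efficiency {e:.2%}")
```

With these values the 2-core run takes 0.85 s per step and the 32-core run about 73 s per step. In these runs the core count and the memory per task change together, so the timings alone cannot tell which of the two is responsible.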

My trial-and-error runs suggested that the issue might be related to insufficient memory, because increasing the memory per CPU speeds up the simulation. That it runs faster with fewer processors still puzzles me. One possible explanation is parallel-communication overhead, which makes a case with a small mesh inefficient to run on many cores. However, I ran several tests with a mesh of ~3 million cells and the same issue arises:

72 cores (2 nodes), 288GB (144GB per node, 4GB per cpu/task): Simulation Time = 0.1 failed (not enough memory) (dt=0.005)

4 cores (4 nodes), 256GB (128GB per node, 64GB per cpu/task): Simulation Time = 0.1 took 365s (dt=0.005)

The smaller number of processors with large memory assigned per processor still gives a faster run.
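One way to test the insufficient-memory hypothesis is to compare a rough estimate of the mesh's memory footprint with the per-task allocation. The sketch below assumes a user-supplied figure for memory per million cells; about 1 GB per million cells is a commonly quoted ballpark for incompressible OpenFOAM cases, but treat it as an assumption, since actual usage depends on the solver, fields and decomposition. The function name is hypothetical.

```python
def memory_check(n_cells, tasks, gb_per_task, gb_per_million_cells):
    """Compare an assumed mesh-memory estimate with the per-task allocation.

    Returns (estimated total GB, estimated GB per task, fraction of the
    per-task allocation used). gb_per_million_cells is a rough,
    user-supplied rule of thumb, not a measured value.
    """
    total_gb = n_cells / 1e6 * gb_per_million_cells
    per_task_gb = total_gb / tasks
    return total_gb, per_task_gb, per_task_gb / gb_per_task


if __name__ == "__main__":
    # Failed 3M-cell run: 72 tasks with 4 GB each; assume ~1 GB per million cells.
    total, per_task, fraction = memory_check(3e6, 72, 4.0, 1.0)
    print(f"estimated total {total:.1f} GB, {per_task:.3f} GB per task "
          f"({fraction:.1%} of the allocation)")
```

Under that assumption, the failed 72-task run would need about 3 GB in total, roughly 0.04 GB per task, or about 1% of the 4 GB requested per task, so mesh size alone does not obviously explain the out-of-memory failure.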

Has anyone found hints or a solution for what might be causing this?
I am using the ESI version of OpenFOAM, v2212.

All times are GMT -4.