November 10, 2017, 10:16
Load Balancing Issue
#1
New Member
Kah Joon Yong
Join Date: Jul 2017
Location: Munich, Germany
Posts: 23
Rep Power: 9
Hello,
I am currently having a problem with my simulation crashing with a segmentation fault. I recently found out that this error is most likely caused by poor load balancing, as I am running the simulation on 16 parallel cores. Attached is the ranked cell count from an example of a failed run. I have compared all cases where the simulation crashed and concluded that every recent crash coincided with bad load balancing. Attachment 59497

According to some documentation of yours that I happened to come across, the suggested fix for this load balancing issue is to decrease the parallel scale in the input.in file. However, the cases I have simulated are already at a parallel scale of -1, and decreasing it further is not possible. Is there any other solution I could try to get a balanced load?

Additional info: The domain is a simple cylindrical slice. In this case I prescribed AMR scale 7. At scale 6 the load was perfectly balanced, but once I increased the scale it could no longer balance. Could it be that the AMR scale is simply too high and was never intended/recommended? For a real geometry the limit was scale 4; at scale 5 the exact same problem appears.

Regards,
Kah Joon Yong

Last edited by kahjoonyong; November 10, 2017 at 12:14.
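For reference, this is roughly how I judge the imbalance from the ranked cell counts. It is only a sketch in plain Python, not CONVERGE syntax, and the cell counts below are made-up placeholders rather than the attached data:

[CODE]
# Minimal sketch: quantify load imbalance from per-rank cell counts,
# e.g. copied by hand from the ranked cell count output.
# The numbers are placeholders, not the attached case.

rank_cells = [
    1_200_000, 150_000, 140_000, 135_000,  # one rank hoards a refined block
    130_000, 128_000, 125_000, 120_000,
    118_000, 115_000, 112_000, 110_000,
    108_000, 105_000, 102_000, 100_000,
]

mean_cells = sum(rank_cells) / len(rank_cells)
max_cells = max(rank_cells)

# Imbalance factor: 1.0 is perfect. The busiest rank sets the pace,
# so everything above 1.0 is time the other ranks spend waiting.
imbalance = max_cells / mean_cells   # roughly 6.4 with these placeholders

print(f"mean cells/rank:  {mean_cells:,.0f}")
print(f"max  cells/rank:  {max_cells:,}")
print(f"imbalance factor: {imbalance:.2f}")
[/CODE]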
November 10, 2017, 13:10

#2
Senior Member
Kislaya Srivastava
Join Date: Sep 2017
Location: Convergent Science, Northville MI
Posts: 165
Rep Power: 9
Hello Kah Joon,
Your AMR scale is too high and not something we recommend. As you may be aware, we load balance based on "blocks" (refer to our manual for more details). With such a high AMR scale, the cell count within a single block can increase drastically, and all cells within that block must go to the same processor. This hurts your load balancing wherever you have local areas of high cell count/high refinement, and if a processor cannot handle the jump in memory requirement, the case will crash.

Since you are already at a parallel scale of -1, my recommendation is to lower your base grid sizes (and in turn your AMR scales). If you really need such small cell sizes in your domain, this will give you better load balancing: the closer the cell sizes are to each other, the better the load balancing you will get.

Hope this helps,

Sincerely,
Srivastava
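If I understand the block-based balancing correctly, the point of the recommendation can be illustrated with the usual base-grid/embed-scale arithmetic (each AMR level halves the cell size). The sketch below is plain Python with placeholder grid sizes, not values from the original case: halving the base grid while dropping the AMR scale by one keeps the same minimum cell size, but the refined region then spans more, smaller blocks, which gives the balancer finer-grained pieces to distribute.

[CODE]
# Sketch of the base-grid / AMR-scale arithmetic.
# Placeholder dx_base values, not taken from the original case.

def min_cell_size(dx_base, amr_scale):
    """Smallest cell size produced by AMR: dx_base / 2**amr_scale."""
    return dx_base / 2 ** amr_scale

# Option A: coarse base grid, very deep refinement (big per-block jumps)
print(min_cell_size(dx_base=4.0e-3, amr_scale=7))   # 3.125e-05 m

# Option B: base grid halved, AMR scale reduced by one -> same minimum
# cell size, but the refinement is spread over more (smaller) blocks.
print(min_cell_size(dx_base=2.0e-3, amr_scale=6))   # 3.125e-05 m
[/CODE]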
November 10, 2017, 13:18

#3
New Member
Kah Joon Yong
Join Date: Jul 2017
Location: Munich, Germany
Posts: 23
Rep Power: 9
Hello Srivastava,
thanks for the confirmation. I had recently figured this out myself and suspected exactly what you describe. The problem on my side with a smaller base grid is the total cell count, which in turn puts pressure on the processors; it is like a tug of war. Unfortunately the maximum number of processors we are allowed to use is 16, and I cannot go higher. My experience with high cell counts had led me to assume that 16 cores simply could not handle the load. Now that I know some cores were doing almost no work because of the unbalanced load (one core was calculating 1.2 million cells, which bottlenecked the whole process), a finer base grid could actually speed things up. I will definitely try it out. Thanks for your solution.

Regards,
Kah Joon Yong
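To see why a finer base grid can be faster even though the total cell count grows, here is a back-of-the-envelope sketch. It assumes wall time scales roughly with the cell count on the busiest rank, and the totals are placeholder numbers, not measurements from my case:

[CODE]
# Back-of-the-envelope sketch: if runtime scales roughly with the cell
# count on the busiest rank, the effective speedup over a serial run is
# total_cells / max_cells_per_rank.  Numbers are placeholders.

def effective_speedup(total_cells, max_cells_per_rank):
    return total_cells / max_cells_per_rank

# Badly balanced: ~3M cells total, one rank stuck with 1.2M of them.
print(effective_speedup(total_cells=3_000_000, max_cells_per_rank=1_200_000))
# -> 2.5x, even on 16 cores

# Finer base grid: total grows to ~4M cells, but they split evenly,
# so the busiest rank only carries ~250k.
print(effective_speedup(total_cells=4_000_000, max_cells_per_rank=250_000))
# -> 16x, i.e. faster in wall time despite more cells
[/CODE]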
Tags |
amr, load balancing, load balance, segmentation fault