March 8, 2021, 23:02
Optimum way for running simulation in parallel
Senior Member
krishna kant
Join Date: Feb 2016
Location: Hyderabad, India
Posts: 133
Hello All

I am running a multiphase flow simulation in parallel, and there is a huge difference between the execution time and the clock time (ClockTime is about five times ExecutionTime in the log below). What could be the reason for it?

I am attaching an excerpt of my log and my system info here.

PIMPLE: iteration 1
Selected 0 split points out of a possible 0.
Number of isoAdvector surface cells = 0
isoAdvection: Before conservative bounding: min(alpha) = 0, max(alpha) = 1 + -1
isoAdvection: After conservative bounding: min(alpha) = 0, max(alpha) = 1 + -1
isoAdvection: time consumption = 1%
Phase-1 volume fraction = 0  Min(alpha.water) = 0  1 - Max(alpha.water) = 1
solve the reinitialization equation
Interpolation routine for interface normal
Curvature Calculation
Creating isoSurface
Interpolating Curvature from iso-surface to cell centers
smoothSolver:  Solving for Ux, Initial residual = 0.000593322427, Final residual = 1.70711359e-09, No Iterations 3
smoothSolver:  Solving for Uy, Initial residual = 0.00260626766, Final residual = 6.52222249e-09, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.000199399075, Final residual = 1.1910404e-09, No Iterations 3
GAMG:  Solving for p_rgh, Initial residual = 0.00639449018, Final residual = 3.29043924e-05, No Iterations 3
time step continuity errors : sum local = 3.53346848e-09, global = 4.19805384e-11, cumulative = 3.24236402e-08
GAMG:  Solving for p_rgh, Initial residual = 0.000340716515, Final residual = 3.28157441e-06, No Iterations 3
time step continuity errors : sum local = 3.52358875e-10, global = -6.40922139e-12, cumulative = 3.2417231e-08
GAMG:  Solving for p_rgh, Initial residual = 4.49729204e-05, Final residual = 7.00905977e-09, No Iterations 15
time step continuity errors : sum local = 7.51373052e-13, global = -1.42068391e-14, cumulative = 3.24172168e-08
smoothSolver:  Solving for k, Initial residual = 0.000333755776, Final residual = 6.7287304e-07, No Iterations 1
ExecutionTime = 24298.66 s  ClockTime = 121287 s
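The size of the gap can be read straight off the last log line. As far as I know, OpenFOAM's ExecutionTime is the CPU time used by the process and ClockTime is the wall-clock time, so their ratio is roughly the fraction of a core the process actually got. A minimal Python sketch, assuming the line has exactly the format shown above (the function name cpu_share is my own):

```python
import re

# Matches lines such as "ExecutionTime = 24298.66 s  ClockTime = 121287 s"
TIME_LINE = re.compile(
    r"ExecutionTime\s*=\s*([\d.eE+-]+)\s*s\s+ClockTime\s*=\s*([\d.eE+-]+)\s*s"
)


def cpu_share(log_line: str) -> float:
    """Return ExecutionTime / ClockTime for one OpenFOAM time-step line."""
    match = TIME_LINE.search(log_line)
    if match is None:
        raise ValueError("not an ExecutionTime/ClockTime line")
    execution, clock = (float(g) for g in match.groups())
    return execution / clock


line = "ExecutionTime = 24298.66 s  ClockTime = 121287 s"
print(f"CPU share per process: {cpu_share(line):.0%}")  # prints 20%
```

A value near 0.2 suggests the process spent most of the wall-clock time not running, for example waiting for a free core or for disk I/O.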

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               1200.000
BogoMIPS:              4589.05
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

I am running 10 simulations, each using 4 processors. The mesh has approximately 22K cells (2D case).
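The lscpu numbers above can be turned into a quick oversubscription check: physical cores = Socket(s) × Core(s) per socket, and logical CPUs = physical cores × Thread(s) per core. A small Python sketch, assuming the relevant lscpu lines are pasted in as a string (the helper names are my own):

```python
def lscpu_fields(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines of lscpu output into a dict."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def core_counts(fields: dict[str, str]) -> tuple[int, int]:
    """Return (physical cores, logical CPUs) from lscpu fields."""
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    physical = sockets * cores_per_socket
    return physical, physical * threads_per_core


LSCPU = """\
CPU(s):                40
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
"""

physical, logical = core_counts(lscpu_fields(LSCPU))
runs, ranks_per_run = 10, 4  # 10 simulations x 4 processors each
requested = runs * ranks_per_run
print(f"physical={physical} logical={logical} requested={requested}")
if requested > physical:
    print(f"oversubscribed: {requested / physical:.1f} ranks per physical core")
```

Here 10 × 4 = 40 ranks land on 20 physical cores, i.e. two ranks per core through Hyper-Threading; with 2 processors per run the total drops to 20.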
March 9, 2021, 06:00
New Member
Icaro Amorim de Carvalho
Join Date: Dec 2020
Posts: 24
Hi Krishna,

I sometimes get confused by the output of 'lscpu', which you used, so I hope I am not saying something wrong. I would suggest running these 10 simulations with 2 processors each and comparing the CPU time with the clock time. I say this because I suspect you actually have 20 physical cores, and the way you are running, you are using virtual cores, which OpenFOAM does not take advantage of.
Hope that helps.
March 9, 2021, 12:02
Senior Member
Domenico Lahaye
Join Date: Dec 2013
Posts: 773
ClockTime might be higher than ExecutionTime due to reading/writing files and communication between processors.
March 10, 2021, 10:56
Senior Member
Join Date: Mar 2009
Posts: 281
klausb
- If I understand correctly, you have two nodes, each with two sockets, and each socket has 10 physical cores?

- Make sure you switch off SMT/Hyper-Threading and use only physical cores!

- How fast is your link between the two nodes? (InfiniBand or something slower?)

- Maybe you are using too many cores for your small test case and wasting time on "unnecessary" communication (see the discussion "MPIRun How many processors").
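One way to judge whether a case is "small" for its core count is to look at cells per MPI rank. A minimal Python sketch for the 22K-cell 2D case; the threshold MIN_CELLS_PER_RANK is a hypothetical placeholder for illustration, not an OpenFOAM rule, and a sensible value depends on the case and the interconnect:

```python
def cells_per_rank(n_cells: int, n_ranks: int) -> float:
    """Average number of mesh cells handled by each MPI rank."""
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    return n_cells / n_ranks


# Hypothetical threshold for illustration only; tune it for your own case.
MIN_CELLS_PER_RANK = 10_000

cases = {
    "22K cells, 4 ranks": (22_000, 4),
    "22K cells, 2 ranks": (22_000, 2),
}
for name, (cells, ranks) in cases.items():
    per_rank = cells_per_rank(cells, ranks)
    verdict = "below threshold" if per_rank < MIN_CELLS_PER_RANK else "ok"
    print(f"{name}: {per_rank:,.0f} cells/rank -> {verdict}")
```

The fewer cells each rank owns, the larger the share of each time step spent exchanging processor-boundary data rather than computing.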
March 10, 2021, 13:32
Senior Member
Join Date: Apr 2020
Location: UK
Posts: 727
Tobermory
Have you tried running top? Just type it at the command line and it will tell you how busy the processors are. It is also a way to check Domenico's suggestion.

For example, if all is working smoothly, the processes for each run should be steaming away at 100% CPU. If they are always far below 100%, there is probably a communication bottleneck or you are overloading the cores. If they sit at 100% for a while and then drop to something small before returning to 100%, there may be a disk-writing bottleneck, and so on.
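The %CPU figure that top shows can also be sampled directly from /proc on Linux, which is handy for logging it over a whole run. A minimal Python sketch, Linux only; it assumes the standard /proc/PID/stat layout (utime and stime in fields 14 and 15, counted in clock ticks), and the script name cpu_sample.py in the usage comment is hypothetical:

```python
import os
import sys
import time


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces, so only
    # split what follows the last ')'. That part starts at field 3 (state),
    # which puts utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_line.rpartition(")")[2].split()
    return int(rest[11]) + int(rest[12])


def percent_from_ticks(delta_ticks: int, clk_tck: int, interval_s: float) -> float:
    """Convert CPU ticks used during an interval into top-style %CPU."""
    return 100.0 * delta_ticks / (clk_tck * interval_s)


def sample_cpu_percent(pid: int, interval_s: float = 1.0) -> float:
    """Measure the %CPU of one process over roughly interval_s seconds."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    elapsed = time.monotonic() - start
    return percent_from_ticks(after - before, os.sysconf("SC_CLK_TCK"), elapsed)


if __name__ == "__main__":
    # Usage: python3 cpu_sample.py PID [PID ...]
    for arg in sys.argv[1:]:
        print(f"PID {arg}: {sample_cpu_percent(int(arg)):.1f}% CPU")
```

A solver rank that is computing should read close to 100; persistently low values mean the process is spending its time waiting rather than running.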
April 11, 2021, 03:33
Senior Member
krishna kant
Join Date: Feb 2016
Location: Hyderabad, India
Posts: 133
Hello All

Thank you for all the suggestions, and I apologize for my late reply. I was very busy with ICLASS during those days and missed the email notifications of your replies.

Now I am running another set of simulations with 1.125M cells, each using 4 processors. The problem persists, and it seems to be related to multithreading and CPU utilization. The top command gives me this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16526 Rajesh    20   0 1822m 1.1g 7932 R 45.6  3.7 959:58.71 interFlowvAMR1
15585 Rajesh    20   0 1830m 993m 7788 R 43.8  3.1 945:34.24 interFlowvAMR1
15892 Rajesh    20   0 1797m 976m 8216 R 43.8  3.1 949:38.86 interFlowvAMR1
15893 Rajesh    20   0 1809m 1.0g 7600 R 43.8  3.2 945:08.07 interFlowvAMR1
16527 Rajesh    20   0 1813m 1.1g 8092 R 43.8  3.7 960:44.33 interFlowvAMR1
16524 Rajesh    20   0 1824m 1.2g 8120 R 42.0  3.7 958:22.57 interFlowvAMR1
15588 Rajesh    20   0 1826m 965m 7760 R 40.1  3.0 944:16.15 interFlowvAMR1
15894 Rajesh    20   0 1813m 1.0g 7844 R 40.1  3.2 947:49.50 interFlowvAMR1
16852 Rajesh    20   0 1815m 1.2g 7792 R 40.1  3.8 956:07.44 interFlowvAMR1
15586 Rajesh    20   0 1821m 1.0g 7824 R 38.3  3.3 946:58.00 interFlowvAMR1
16219 Rajesh    20   0 1808m 1.1g 8196 R 38.3  3.6 953:12.18 interFlowvAMR1
16221 Rajesh    20   0 1826m 1.1g 8180 R 38.3  3.6 954:00.33 interFlowvAMR1
15891 Rajesh    20   0 1817m 1.0g 8192 R 36.5  3.3 948:09.53 interFlowvAMR1
16218 Rajesh    20   0 1830m 1.1g 8220 R 36.5  3.4 947:10.34 interFlowvAMR1
16525 Rajesh    20   0 1795m 1.1g 8144 R 36.5  3.7 958:52.91 interFlowvAMR1
15587 Rajesh    20   0 1824m 1.1g 7600 R 34.7  3.5 945:53.87 interFlowvAMR1
16220 Rajesh    20   0 1830m 1.1g 8032 R 34.7  3.4 948:53.46 interFlowvAMR1
16851 Rajesh    20   0 1862m 1.3g 7760 R 34.7  4.0 955:12.77 interFlowvAMR1
16853 Rajesh    20   0 1830m 1.2g 7568 R 34.7  3.9 958:54.66 interFlowvAMR1
16854 Rajesh    20   0 1831m 1.2g 7772 R 31.0  3.9 956:42.15 interFlowvAMR1
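Totalling the %CPU column above shows how much of the machine the 20 ranks actually use. A small Python sketch; it assumes %CPU is the ninth column (index 8), as in top's default layout shown above, so rows from a batch-mode snapshot such as top -b -n 1 could be passed to cpu_column (both helper names are my own):

```python
def cpu_column(top_rows: str, column: int = 8) -> list[float]:
    """Extract %CPU values from top rows (column 8, 0-based, in the layout above)."""
    values = []
    for row in top_rows.splitlines():
        parts = row.split()
        # Process rows start with a numeric PID; this skips the header line.
        if len(parts) > column and parts[0].isdigit():
            values.append(float(parts[column]))
    return values


def summarize(cpu: list[float], physical_cores: int) -> str:
    """Report mean %CPU and how many cores' worth of work the processes use."""
    total = sum(cpu)
    return (f"{len(cpu)} processes, mean {total / len(cpu):.1f}% CPU, "
            f"total {total / 100:.1f} of {physical_cores} cores busy")


# %CPU values of the 20 interFlowvAMR1 rows in the top output above.
TOP_CPU = [45.6, 43.8, 43.8, 43.8, 43.8, 42.0, 40.1, 40.1, 40.1, 38.3,
           38.3, 38.3, 36.5, 36.5, 36.5, 34.7, 34.7, 34.7, 34.7, 31.0]

# Prints: 20 processes, mean 38.9% CPU, total 7.8 of 20 cores busy
print(summarize(TOP_CPU, physical_cores=20))
```

Twenty ranks at roughly 39% each add up to less than eight cores' worth of work on a machine with 20 physical cores.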
CPU utilization is only about 40%, even though I am using only 20 CPUs. So I checked for multithreading with
grep -i 'ht' /proc/cpuinfo
and got this:

flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm invpcid_single ssbd pti retpoline ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm cqm_llc cqm_occup_llc flush_l1d
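As far as I know, the ht flag in /proc/cpuinfo only says that the CPU supports Hyper-Threading; it does not show which logical CPUs share a physical core. On Linux that pairing is exposed in sysfs under /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. A minimal Python sketch that reads it and keeps one logical CPU per physical core (the helper names are my own):

```python
import glob
import re


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-9,20-29' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(siblings: dict[int, str]) -> list[int]:
    """Given {cpu: thread_siblings_list}, keep the lowest CPU of each core."""
    cores = {tuple(parse_cpu_list(s)) for s in siblings.values()}
    return sorted(min(core) for core in cores)


def read_siblings() -> dict[int, str]:
    """Read thread_siblings_list for every logical CPU on this machine."""
    siblings = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            siblings[cpu] = f.read()
    return siblings


if __name__ == "__main__":
    physical = one_cpu_per_core(read_siblings())
    print(f"{len(physical)} physical cores, one CPU each: {physical}")
```

Judging from the NUMA lines in the lscpu output (node0 holds CPUs 0-9,20-29), I would expect CPU n to pair with CPU n+20 on this machine, but that is worth confirming on the actual system.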
April 11, 2021, 03:48
Senior Member
krishna kant
Join Date: Feb 2016
Location: Hyderabad, India
Posts: 133
Is there a command to switch off multithreading in OpenFOAM? It uses virtual CPUs even when I try to use only physical CPUs by limiting the runs to 20 CPUs.
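As far as I know, OpenFOAM itself has no multithreading switch: each parallel rank is a separate MPI process, so Hyper-Threading is either disabled in the BIOS or avoided by controlling where the MPI launcher places the ranks. A minimal Python sketch of the placement side, assuming CPUs 0-19 are one logical CPU per physical core (verify this on the machine first); the function cpu_sets is my own and splits those CPUs into disjoint sets, one per simulation, so that no two runs share a core:

```python
def cpu_sets(physical_cpus: list[int], runs: int, ranks_per_run: int) -> list[str]:
    """Split physical CPUs into disjoint comma-separated sets, one per run."""
    needed = runs * ranks_per_run
    if needed > len(physical_cpus):
        raise ValueError(f"need {needed} cores, only {len(physical_cpus)} available")
    sets = []
    for run in range(runs):
        chunk = physical_cpus[run * ranks_per_run:(run + 1) * ranks_per_run]
        sets.append(",".join(str(cpu) for cpu in chunk))
    return sets


# Assumed layout: CPUs 0-19 are one logical CPU per physical core.
for run, cpus in enumerate(cpu_sets(list(range(20)), runs=5, ranks_per_run=4)):
    print(f"run {run}: {cpus}")
```

Each set could then be handed to whatever pinning mechanism your MPI provides (Open MPI's mpirun, for instance, has --bind-to core; check the mpirun documentation for your version), or to taskset -c for a single process. This naive split gives run 2 the CPUs 8,9,10,11, which straddle the two sockets; a NUMA-aware split would keep each run on one socket.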
