[PyFoam] multiprocessing

#1 | ashkan (Member) | December 25, 2017, 04:20
Hi All,
I would like to run a series of OpenFOAM simulations with PyFoam concurrently, i.e. several simulations running at the same time.

I created a Python script using PyFoam and joblib. The script works perfectly well when each individual simulation runs in serial. However, when I want to run each simulation in parallel, the script still works but the simulations get very slow.

For example, I have 64 cores and I want to use 2 processors for each simulation, so basically 32 two-processor simulations running concurrently.

I checked the CPU usage: when I run the script with parallel cases, some of the processors show very low usage, which does not happen when each case runs in serial.

Here is my script; I would highly appreciate it if anyone could give me some comments/hints to resolve the issue. I wonder whether it has something to do with the LAMMachine settings (maybe specifying node numbers)?

Many thanks in advance
Ashkan

Code:
import sys,os
import numpy as np
from os import path
from PyFoam.Execution.UtilityRunner import UtilityRunner
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Infrastructure.ClusterJob import SolverJob
from PyFoam.RunDictionary.SolutionFile import SolutionFile
from PyFoam.RunDictionary.SolutionDirectory import SolutionDirectory
from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile
from PyFoam.Execution.ParallelExecution import LAMMachine
from PyFoam.Basics.DataStructures import Vector
from PyFoam.Error import error

from joblib import Parallel, delayed
import multiprocessing
############################################################################################
num_cores = multiprocessing.cpu_count()   #number of cores on machine

OFcpus  = 2        #number of CPUs for each simulation
num_cases_paral = int(num_cores/OFcpus)
print(num_cases_paral)

solver="simpleFoam"        
OrigCase="BaseCase"
curpath=os.path.abspath('.')

nu   = 1.e-6   # kinematic viscosity
Leng = 8.0     # characteristic length used in the turbulence estimates

U0 = 0.0
Ue = 2.0
dU = 0.1
NumUin  = int((Ue - U0)/dU) + 1
CurrVel = np.linspace(U0, Ue, NumUin, endpoint=True)

roughness = np.array([5e-6, 4e-5, 2e-3, 4e-2])
############################################################################################

#---------------------defining the function for cloning and running--------------------

def Run_Case(iu, ir):
    flowVelocity = CurrVel[iu]
    ks           = roughness[ir]

    #-------Estimating initial values----------
    flowRe  = flowVelocity*Leng/nu
    TurbInt = 0.168*pow(flowRe, -1./8.)

    turbulentKE      = (3./2.)*pow(flowVelocity*TurbInt, 2.)
    turbulentEpsilon = pow(0.09, 3./4.)*pow(turbulentKE, 3./2.)/(0.07*Leng)
    turbulentOmega   = np.sqrt(turbulentKE)/(0.07*Leng)
    
    #-------Creating new case directory----------
    NewCase="CurrProfile"+"Uc_"+str(flowVelocity)+"Ks_"+str(ks)
    
    orig=SolutionDirectory(OrigCase,archive=None,paraviewLink=False)
    
    
    case=orig.cloneCase(NewCase)
    dire=SolutionDirectory(NewCase,archive=None,paraviewLink=False)
    

    #-------Modifying initial conditions----------
    velFile=ParsedParameterFile(path.join(dire.initialDir(),"U"))
    velFile["internalField"].setUniform(Vector(flowVelocity,0,0))
    velFile["boundaryField"]["inlet"]["average"]=Vector(flowVelocity,0,0)
    velFile.writeFile ()
    
    pressFile=ParsedParameterFile(path.join(dire.initialDir(),"p"))
    pressFile["internalField"].setUniform(0)
    pressFile.writeFile ()
    
    kFile=ParsedParameterFile(path.join(dire.initialDir(),"k"))
    kFile["internalField"].setUniform(turbulentKE)
    kFile["boundaryField"]["inlet"]["average"]=turbulentKE
    kFile.writeFile ()
    
    omegaFile=ParsedParameterFile(path.join(dire.initialDir(),"omega"))
    omegaFile["internalField"].setUniform(turbulentOmega)
    omegaFile["boundaryField"]["inlet"]["average"]=turbulentOmega
    omegaFile.writeFile ()
    
    epsilonFile=ParsedParameterFile(path.join(dire.initialDir(),"epsilon"))
    epsilonFile["internalField"].setUniform(turbulentEpsilon)
    epsilonFile["boundaryField"]["inlet"]["average"]=turbulentEpsilon
    epsilonFile.writeFile ()
    
    nutFile=ParsedParameterFile(path.join(dire.initialDir(),"nut"))
    nutFile["boundaryField"]["bottom"]["Ks"].setUniform(ks)
    nutFile.writeFile ()

    #-------Meshing----------
    os.system('m4 '+NewCase+'/system/blockMeshDict.m4'+' > '+NewCase+'/system/blockMeshDict')
    blockRun = BasicRunner(argv=["blockMesh","-case",NewCase],silent=True,logname="Blocky")
    print("Running blockMesh")
    blockRun.start()
    if not blockRun.runOK() :
        error("There was a problem with blockMesh in Case ",NewCase)

    #-------Decompose the case----------

    # note: each command-line token must be its own argv element
    decompRun = UtilityRunner(argv=["decomposePar","-force","-case",NewCase],
                              silent=True,logname="Decompose")
    print("Decomposing the case")
    decompRun.start()
    if not decompRun.runOK() :
        error("There was a problem with decomposePar in Case ",NewCase)
    
    #--------Run the simulation in parallel--------------
    machine = LAMMachine(nr=OFcpus)
    print("Running case",NewCase)
    theRun = BasicRunner(argv=[solver,"-case",NewCase], silent=True, lam=machine)
    theRun.start()
    if not theRun.runOK() :
        error("There was a problem running "+solver+" in Case ",NewCase)


#------------------------Running the Function in parallel-----------------------

Parallel(n_jobs=num_cases_paral)(
    delayed(Run_Case)(i, j)
    for j in range(len(roughness))
    for i in range(1, len(CurrVel)))   # i starts at 1, so the U0 = 0 case is skipped

#2 | Taataa (Taher Chegini, Senior Member) | December 25, 2017, 18:51
OpenFOAM comes with a lot of utilities that let you do this kind of scripting with bash alone, so you don't have to rely on third-party tools such as PyFoam. I would suggest exploring those options as well: if you want to move to a cluster in the future, setting up all this third-party software to work with OF is a lot of unnecessary, painful work!

Back to your question: do you have 64 physical cores, or are you counting hyper-threaded (virtual) cores? You can check by running lscpu and multiplying Core(s) per socket by Socket(s). For example, on my laptop that is 2 x 1 = 2, while CPU(s) is 4. OF only benefits from the physical cores; if you try to use the virtual cores, performance may drop noticeably.
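
For instance, a quick way to compare the two counts from Python (a sketch assuming the third-party psutil package is installed; it reports the same numbers as the lscpu arithmetic above):

Code:
import multiprocessing

import psutil  # third-party: pip install psutil

logical  = multiprocessing.cpu_count()      # counts hyper-threads as well
physical = psutil.cpu_count(logical=False)  # physical cores only

print("logical CPUs:", logical, "physical cores:", physical)
# on a hyper-threaded 2-core laptop: logical CPUs: 4, physical cores: 2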

Another question: do you have enough memory for all these cases to run simultaneously? That could be another bottleneck. If I remember correctly, a rule of thumb in OF is that you need about 1 GB of memory per 1 million cells, per running case.

#3 | ashkan | December 25, 2017, 20:32
Thanks for your comments, Taataa.
I personally believe PyFoam is a very useful tool for interacting with OpenFOAM, particularly when you need to run many simulations of the same problem under different conditions. I am still learning it, though.

Regarding OF only using physical cores, are you certain? I did a test a while ago with and without hyper-threading on my laptop, and the simulations were slightly faster with 8 cores (hyper-threading) than with only the 4 physical cores. So I think OF does use virtual cores as well, but I might be wrong!

I also suspect the problem lies in the physical/virtual core combination, but I believe it is Python's handling of the cores rather than OF's. That's why I thought I might need to define a node list, but I have no idea how to do that here.

Thanks again for your comments
Ashkan

#4 | Taataa | December 26, 2017, 00:43
Yes, I am sure. I asked Chris, one of the OF developers, and he confirmed it. It depends on the case, but it's better to use the physical ones.

Regarding PyFoam, I am not dismissing its benefits, but you can do everything it does with bash and the OF utilities, so why not exploit them? I usually use Python scripts for data analysis only.

Anyhow, cores and memory are usually the bottlenecks in situations like this, where you don't have to worry about IO communication.

#5 | gschaider (Bernhard Gschaider, Assistant Moderator) | December 31, 2017, 07:04
Quote:
Originally posted by ashkan
Regarding OF only using physical cores, are you certain? I did a test a while ago with and without hyper-threading on my laptop, and the simulations were slightly faster with 8 cores (hyper-threading) than with only the 4 physical cores. So I think OF does use virtual cores as well, but I might be wrong!
OF does not make any assumptions about the nature of the cores. It just tells MPI "start N instances", assumes that MPI knows what it is doing, and MPI usually hands the assignment of instances to cores over to the OS. You can start 200 instances on your 4-core machine and MPI and the OS will try to make sense of the situation as best they can (which won't be very effective). In your case this means that the 4 additional instances use parts of the processor that are currently not used by the other 4 (hence the small speed-up).

Quote:
Originally posted by ashkan
I also suspect the problem lies in the physical/virtual core combination, but I believe it is Python's handling of the cores rather than OF's. That's why I thought I might need to define a node list, but I have no idea how to do that here.

Thanks again for your comments
Ashkan
PyFoam does no core handling at all. It just starts mpirun with the appropriate parameters, usually "only" the "-n" parameter, which is determined by the "--procnr" or the "--autosense-parallel" option (more control is possible by specifying a machinefile with the --machinefile option). Assigning instances to cores/machines is done by MPI.
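
For reference, inside a script those options map onto the LAMMachine constructor. A minimal sketch (the case name is a placeholder, and that the machines argument takes a machinefile path is my reading of PyFoam's API, so verify it against your version):

Code:
from PyFoam.Execution.ParallelExecution import LAMMachine
from PyFoam.Execution.BasicRunner import BasicRunner

# ask MPI for 2 processes; PyFoam turns this into "mpirun -n 2 ..."
machine = LAMMachine(nr=2)

# assumed alternative: point MPI at a machinefile listing hosts/slots,
# analogous to the --machinefile option of the command-line utilities
# machine = LAMMachine(machines="machinefile.txt")

run = BasicRunner(argv=["simpleFoam", "-case", "someCase"],
                  silent=True, lam=machine)
run.start()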

#6 | ashkan | December 31, 2017, 21:52
Quote:
Originally posted by gschaider
OF does not make any assumptions about the nature of the cores. It just tells MPI "start N instances", assumes that MPI knows what it is doing, and MPI usually hands the assignment of instances to cores over to the OS. You can start 200 instances on your 4-core machine and MPI and the OS will try to make sense of the situation as best they can (which won't be very effective). In your case this means that the 4 additional instances use parts of the processor that are currently not used by the other 4 (hence the small speed-up).
Thanks Bernhard. That absolutely makes sense and matches what I was thinking as well.

Quote:
Originally posted by gschaider
PyFoam does no core handling at all. It just starts mpirun with the appropriate parameters, usually "only" the "-n" parameter, which is determined by the "--procnr" or the "--autosense-parallel" option (more control is possible by specifying a machinefile with the --machinefile option). Assigning instances to cores/machines is done by MPI.
Thanks for the comment, but I still don't understand why the speed-up drops so significantly when I run multiple instances (each a parallel OF run) from my PyFoam script via Python's joblib. Does what I am hoping to achieve make sense at all?

#7 | Taataa | January 1, 2018, 19:14
Quote:
Originally posted by gschaider
OF does not make any assumptions about the nature of the cores. It just tells MPI "start N instances", assumes that MPI knows what it is doing, and MPI usually hands the assignment of instances to cores over to the OS.
OF is pure MPI (no OpenMP), therefore the recommended setting for OMP_NUM_THREADS is 1, meaning no hyper-threading: one job per CPU, physical cores only. If you want to utilize hyper-threading you would have to implement OpenMP directives in the code, specifically in the linear solvers, which has partially been done before, for example this one and this one.

#8 | gschaider | January 2, 2018, 08:16
Quote:
Originally posted by ashkan
Thanks Bernhard. That absolutely makes sense and matches what I was thinking as well.

Thanks for the comment, but I still don't understand why the speed-up drops so significantly when I run multiple instances (each a parallel OF run) from my PyFoam script via Python's joblib. Does what I am hoping to achieve make sense at all?
If I interpret your code correctly then you're filling all the cores (including the multithreading "cores") with OF instances, since cpu_count() (at least on my machine) reports the multithreading "cores" in that number as well. This leads to the behaviour discussed above. Make sure that num_cores corresponds to the number of physical cores, and check with top that the expected number of OF solver instances is running (not more than the number of cores).
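
Applied to the posted script, that means deriving the job count from the physical cores instead of cpu_count(); a sketch (again assuming the third-party psutil package, as above):

Code:
import psutil  # third-party: pip install psutil

OFcpus = 2                                      # MPI processes per case
num_cores = psutil.cpu_count(logical=False)     # physical cores, not hyper-threads
num_cases_paral = max(1, num_cores // OFcpus)   # e.g. 32 physical cores -> 16 cases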

#9 | gschaider | January 2, 2018, 08:18
Quote:
Originally posted by Taataa
OF is pure MPI (no OpenMP), therefore the recommended setting for OMP_NUM_THREADS is 1, meaning no hyper-threading: one job per CPU, physical cores only. If you want to utilize hyper-threading you would have to implement OpenMP directives in the code, specifically in the linear solvers, which has partially been done before, for example this one and this one.
If he asks MPI to start more processes than there are physical cores then this won't help; the processes will simply take turns using the CPU.

#10 | ashkan | January 2, 2018, 23:01
Quote:
Originally posted by gschaider
If I interpret your code correctly then you're filling all the cores (including the multithreading "cores") with OF instances, since cpu_count() (at least on my machine) reports the multithreading "cores" in that number as well. This leads to the behaviour discussed above. Make sure that num_cores corresponds to the number of physical cores, and check with top that the expected number of OF solver instances is running (not more than the number of cores).
I think I figured out the issue. Basically, the Python parallelization decomposes the tasks and assigns each task (each OF case/instance) to a single processor. Because each task then calls mpirun, it needs more processors but only has one. That is why, when I look at the CPU percentages in top, not all of the expected cores are at 100%; only the cores used by the Python parallel loop are, hence the significant drop in performance.

I am trying to see if it is possible to assign multiple processors to each task in Python so that the mpirun call has sufficient processors. Any comments are highly appreciated.

Many thanks again for the comments.

#11 | gschaider | January 3, 2018, 20:23
Quote:
Originally posted by ashkan
I think I figured out the issue. Basically, the Python parallelization decomposes the tasks and assigns each task (each OF case/instance) to a single processor. Because each task then calls mpirun, it needs more processors but only has one. That is why, when I look at the CPU percentages in top, not all of the expected cores are at 100%; only the cores used by the Python parallel loop are, hence the significant drop in performance.

I am trying to see if it is possible to assign multiple processors to each task in Python so that the mpirun call has sufficient processors. Any comments are highly appreciated.

Many thanks again for the comments.
So some cores are not utilized because processes are "pinned" to certain cores?
Try running multiple runs (with pyFoamRunner) without your script and see if the same thing happens.
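
One way to script such a test is to start two pyFoamRunner.py instances side by side; a sketch (the case names are placeholders, and --procnr is taken from PyFoam's standard parallel options, so check pyFoamRunner.py --help on your installation):

Code:
import subprocess

# launch two 2-processor runs concurrently, bypassing joblib entirely
procs = [subprocess.Popen(["pyFoamRunner.py", "--procnr=2",
                           "simpleFoam", "-case", case])
         for case in ("caseA", "caseB")]
for p in procs:
    p.wait()  # the runs proceed concurrently; wait for both to finish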

#12 | ashkan | January 4, 2018, 00:41
Quote:
Originally posted by gschaider
So some cores are not utilized because processes are "pinned" to certain cores?
Try running multiple runs (with pyFoamRunner) without your script and see if the same thing happens.
I did manage to resolve the issue of assigning multiple processors to each case from Python. Here is a link to the discussion:

https://stackoverflow.com/questions/...parallel-cases

I have also attached the corrected script.

I also ran two instances simultaneously, each with 2 processors, using pyFoamRunner.py directly rather than my script. With the revised script the processor workload is now fine (4 cores at 100% CPU usage), but interestingly pyFoamRunner.py is still slightly faster than my Python-script approach.

I have also attached the log files of the PyFoam and Python-script runs for comparison.

Any comments are highly appreciated.
Attached files: results.zip (51.2 KB)

#13 | gschaider | January 4, 2018, 11:12
Quote:
Originally posted by ashkan
I did manage to resolve the issue of assigning multiple processors to each case from Python. Here is a link to the discussion:

https://stackoverflow.com/questions/...parallel-cases

I have also attached the corrected script.

I also ran two instances simultaneously, each with 2 processors, using pyFoamRunner.py directly rather than my script. With the revised script the processor workload is now fine (4 cores at 100% CPU usage), but interestingly pyFoamRunner.py is still slightly faster than my Python-script approach.

I have also attached the log files of the PyFoam and Python-script runs for comparison.

Any comments are highly appreciated.
I haven't had the time to look at the attachments yet.

Just one remark: if you have more than one layer (in your case: MPI, PyFoam, the multiprocessing library) above the OS, then processor pinning is a bad idea. Let the OS do the assignment (BTW: PyFoam doesn't mess with the pinning); it is quite good at it.
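
With Open MPI, letting the OS do the assignment usually comes down to disabling mpirun's default core binding. A sketch of what the underlying invocation would look like if launched by hand (--bind-to none is an Open MPI flag; calling mpirun directly here is an illustration, not PyFoam's mechanism, and the case name is a placeholder):

Code:
import subprocess

# Open MPI >= 1.8 binds ranks to cores by default; several independent
# mpirun instances started at once can then all bind to the same cores.
# "--bind-to none" leaves placement to the OS scheduler instead.
subprocess.run(["mpirun", "--bind-to", "none", "-n", "2",
                "simpleFoam", "-case", "someCase", "-parallel"],
               check=True)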


Tags: parallel, pyfoam

