|
March 24, 2019, 15:34 |
Issues with mpirun in HPC
|
#1 |
Member
Luis Eduardo
Join Date: Jan 2011
Posts: 85
Rep Power: 15 |
Hi all,
I've been trying to run an optimization tool that launches several OpenFOAM cases on an HPC cluster, but I'm having a hard time making it work; maybe someone can help me with this. I have two scenarios, and both should run 4 OpenFOAM cases using 4 processors each:
1- 16 processors allocated on a single node: in OpenFOAM I run all the parallel commands in the form "mpirun -np 4 --bind-to none simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1". The result is that the simulations all end up running on the same processors, which takes a very long time to finish. On my notebook I could use this command with no issues.
2- 16 processors allocated across 4 nodes (4 processors per node): I use the same kind of command, but now all the simulations run on the same node, which also takes a long time to finish.
Does anyone know how to solve either scenario? I'm submitting the job with SLURM and using OpenFOAM 4.1 and Dakota for the optimization. If you need any additional information, just let me know! Thanks! =)
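For context, here is a minimal sketch of one way the four decomposed cases could be launched concurrently on a single 16-core node while keeping their cores disjoint. It assumes Open MPI (the --cpu-set and --report-bindings options; exact behaviour varies between Open MPI versions), and the case directory names are placeholders:
Code:
#!/bin/bash
# Sketch: run 4 decomposed OpenFOAM cases at once on one 16-core node,
# restricting each mpirun to its own group of 4 cores (assumes Open MPI).
cases=(case1 case2 case3 case4)   # placeholder case directories
for i in 0 1 2 3; do
    cpus=$(seq -s, $(( i * 4 )) $(( i * 4 + 3 )))   # e.g. "0,1,2,3"
    (
        cd "${cases[$i]}" || exit 1
        mpirun -np 4 --cpu-set "$cpus" --bind-to core --report-bindings \
            simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1
    ) &
done
wait   # block until all four cases have finished
The --report-bindings flag prints each rank's binding to stderr (redirected into log.simpleFoam here), which makes it easy to check whether the cases really landed on different cores.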
|
March 24, 2019, 15:50 |
|
#2 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Quick answer/question:
|
|
March 24, 2019, 16:48 |
|
#3 |
Member
Luis Eduardo
Join Date: Jan 2011
Posts: 85
Rep Power: 15 |
Hi Bruno,
Thank you for your reply! I had actually already checked the links you sent; I found the bind options there, but not a solution to my problem. From your questions:
2- This is the job file (very simple; the action really happens when I call Dakota, and everything is driven by the Dakota input, including the generation of the cases to run): Code:
#!/bin/bash -v
#SBATCH --partition=SP2
#SBATCH --ntasks=16            # number of tasks / MPI processes
#SBATCH --ntasks-per-node=4    # number of tasks per node (test)
#SBATCH --cpus-per-task=1      # number of OpenMP threads per process
#SBATCH -J OpenFOAM
#SBATCH --time=15:00:00        # if not specified, the default is 8 hours; the limit is 480
#SBATCH --mem-per-cpu=2048     # 24 GB of RAM per CPU; the maximum is 480000 across all CPUs

#OpenMP settings:
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

echo $SLURM_JOB_ID         # ID of the job allocation
echo $SLURM_SUBMIT_DIR     # directory from which the job was submitted
echo $SLURM_JOB_NODELIST   # list of allocated hostnames
echo $SLURM_NTASKS         # total number of cores for the job

#run the application:
export PATH=/apps/gnu:$PATH
cd $HOME/opt_test/bump/18_WTGs
./initial_sol_run
dakota -i dakota_of.in
exit

What I found weird is that when I run the same case (using fewer processors) on my notebook, it works fine: all the OpenFOAM cases are distributed across the processors. The only solution I can think of is to use 4 nodes and specify, during the optimization, which node each OpenFOAM case should run on, but that will be a little difficult to do. This HPC is shared (it has 128 nodes), so I will only know which nodes were allocated once the job starts; I cannot set this up before submitting the job. Let me know if you need any additional information! Best Regards, Luis
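Side note: $SLURM_JOB_NODELIST is reported in a compressed form (e.g. node[01-04]). A small sketch of expanding it into one hostname per line inside the job, assuming SLURM's standard scontrol utility is available on the compute node; the output filename is just a placeholder:
Code:
# Expand the compressed SLURM nodelist into one hostname per line.
scontrol show hostnames "$SLURM_JOB_NODELIST" > allocated_nodes.txt
cat allocated_nodes.txt   # e.g. node01 ... node04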
|
March 24, 2019, 17:51 |
|
#4 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Quick answers:
|
|
March 24, 2019, 18:41 |
|
#5 |
Member
Luis Eduardo
Join Date: Jan 2011
Posts: 85
Rep Power: 15 |
Hi Bruno,
From your answers:
1- While I was writing this post I was also writing a script to extract the node names, and that part I was able to do: I save the contents of SLURM_JOB_NODELIST to a file, which gives me the names of all the nodes allocated to my job;
2- Dakota basically generates the input for my CFD cases; I have several scripts that read this input file and create the cases I need, including writing the Allrun scripts, where I use the mpirun commands to run OpenFOAM;
3- I was trying this option, but only passing a file with the names of all possible hosts, which didn't work... I guess I have to provide the specific host, right?
4- Makes a lot of sense to me, since a node can be shared with other users.
I'll keep working on a way to provide specific hosts to each CFD case; it will be a lot of work, but I don't see another way out. Anyway, if you have any other advice, I'm open to it!! Thank you for taking time out of your Sunday to help me!! =)
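For the record, a minimal sketch of the per-case host assignment discussed above, assuming Open MPI, one node per case, and an allocated_nodes.txt file holding one hostname per line (as in the scontrol sketch earlier); the case directory and hostfile names are placeholders:
Code:
#!/bin/bash
# Sketch: give each of the 4 decomposed OpenFOAM cases its own node through an
# Open MPI hostfile ("slots=4" matches the 4 ranks per case).
mapfile -t nodes < allocated_nodes.txt   # one hostname per line
cases=(case1 case2 case3 case4)          # placeholder case directories
for i in 0 1 2 3; do
    echo "${nodes[$i]} slots=4" > "hostfile_case$i"
    (
        cd "${cases[$i]}" || exit 1
        mpirun -np 4 --hostfile "../hostfile_case$i" \
            simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1
    ) &
done
wait   # block until all four cases have finished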
|
April 22, 2019, 18:20 |
|
#6 |
Member
Luis Eduardo
Join Date: Jan 2011
Posts: 85
Rep Power: 15 |
@Bruno,
The administrators of the HPC changed some configurations; I'm now using the --bind-to socket option, and I was able to confirm that each process runs on a different CPU. The speed is not as high as I was expecting, maybe because of the clock of the HPC CPUs, but that's something I cannot change. Thank you for the help!
[Moderator note: This post was split half-way; the first part was moved here: mpi killer code]
Last edited by wyldckat; April 22, 2019 at 18:23. Reason: see "Moderator note:"
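For anyone finding this thread later: with Open MPI, the placement produced by --bind-to socket can be checked with the --report-bindings flag, which writes one "bound to" line per rank to stderr. A minimal sketch using the same command shape as in the first post:
Code:
# Same launch as before, but asking Open MPI to report each rank's binding.
mpirun -np 4 --bind-to socket --report-bindings \
    simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1
grep -i "bound to" log.simpleFoam   # the report was redirected here via 2>&1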
|