Slow Progress in Cluster Nodes

vivirichard007 · May 12, 2022, 09:45

Hi,

I'm testing CONVERGE CFD on a cluster, using 4 nodes and 2 nodes. The simulation progresses slowly and CPU usage is low.
Model name: Water_pump_rotary_MRF_transient_RANS

Node configuration: AMD HBv3, 64 CPUs in each node

************************************************************
Slurm run script:

#!/bin/bash
#SBATCH --job-name=ConvCFD
#SBATCH --partition=hpc
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64

cd /home/ConvergeCFD_Test/Water_pump_rotary_MRF_transient_RANS

MACHINES=$(srun hostname | sort | uniq -c | awk '{print $2 ":" $1}' | paste -s -d "\n" -)

for i in "${MACHINES[@]}"
do echo "$i">hostfile
done

echo $MACHINES

#Run Converge CFD
export RLM_LICENSE="2765@10.32.0.4"
source /home/Convergent_Science/Environment/scripts/CONVERGE/CONVERGE-MPICH/3.1.5.sh
mpirun -hostfile hostfile converge-mpich --super &> logfile_4Node_Water_Pump.out

************************************************************
ASK:

I can see that all the CPUs are running CONVERGE, but not at 100%: each CPU is only about 30-40% utilized, and overall utilization is also around 40%.

Is there any way to use the full potential of the CPUs from the Convergent Science side?

Note: redirecting input with < /dev/null does not work. If we use it, the simulation does not start at all.

Kindly do the needful.
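
A side note on the hostfile step in the script above: MACHINES is a plain string, not an array, so the for loop runs once and writes the whole multi-line value to hostfile. It produces the right file, but only because there is a single iteration (the > redirection would overwrite the file on every pass otherwise), and the paste step does not change the output. A minimal sketch of the same idea without the loop follows; the function name make_hostfile and the node names in the demo are made up for illustration, and the host:count format is simply the one the original script writes.

```shell
#!/bin/bash
# Turn a list of hostnames (one line per task, as printed by `srun hostname`)
# into "host:count" lines, the same format the original script writes.
# make_hostfile is a hypothetical helper name, not part of Slurm or CONVERGE.
make_hostfile() {
    sort | uniq -c | awk '{print $2 ":" $1}'
}

# Inside a Slurm job this would be used as:
#   srun hostname | make_hostfile > hostfile
# Demo with made-up node names so the sketch runs anywhere:
printf '%s\n' node2 node1 node1 node2 | make_hostfile
```

Writing the pipeline output straight to the file avoids the intermediate variable, so there is no quoting or array behaviour to get wrong.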

vivirichard007 · May 16, 2022, 09:26

Hi,

I solved the issue by using Intel MPI instead of MPICH. I have an InfiniBand network, and Intel MPI works better with it.

Thanks.
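
For anyone hitting the same symptom, it can help to confirm that the compute nodes actually expose an InfiniBand device before switching MPI libraries. The sketch below reads the Linux sysfs entries under /sys/class/infiniband; the function name list_ib_ports and its optional root argument are hypothetical and exist only to make the check easy to try out. It says nothing about which interconnect a given MPI build really uses, only whether the hardware is visible.

```shell
#!/bin/bash
# Print each InfiniBand device, port number and port state found in sysfs.
# list_ib_ports is a hypothetical helper; the optional argument overrides the
# sysfs root and is there only so the function can be tried on a test tree.
list_ib_ports() {
    local root="${1:-/sys/class/infiniband}"
    local state_file dev port
    for state_file in "$root"/*/ports/*/state; do
        # If the glob matched nothing, the literal pattern is not readable: skip it.
        [ -r "$state_file" ] || continue
        port="$(basename "$(dirname "$state_file")")"
        dev="$(basename "$(dirname "$(dirname "$(dirname "$state_file")")")")"
        echo "$dev port $port: $(cat "$state_file")"
    done
}

# Prints nothing on a machine without InfiniBand devices.
list_ib_ports
```

Running it on one compute node (for example through srun) should be enough. With Intel MPI, setting the I_MPI_DEBUG environment variable to a value such as 5 before mpirun should additionally report the selected fabric provider and the process pinning at startup, which is a more direct way to see what the run is using.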

ChadTellier · May 25, 2022, 06:52

If we use this, the simulation is not at all starting.

psrikanth · May 26, 2022, 18:18

Hi,

Are you able to run your case with other MPI versions (MPICH?)? What error message do you get?

Praveen

vivirichard007 · June 3, 2022, 09:14

Quote:

Originally Posted by psrikanth

Hi,

Are you able to run your case with other MPI versions (MPICH?)? What error message do you get?

Praveen

Hi,

I'm not getting any error with MPICH; the CPUs are just not fully utilized. When I switched to Intel MPI, all the CPUs were fully utilized.
