Errors with openmpi/4.1.4 on Slurm HPC OF-5.x |
February 2, 2023, 22:29
Errors with openmpi/4.1.4 on Slurm HPC OF-5.x
#1
New Member
Jonathan
Join Date: Sep 2022
Posts: 6
Rep Power: 4
Hello,
My name is Jonathan, and I am a student working on my master's thesis, doing CFD for aerospace applications. I am getting a similar error on my school's HPC cluster, which runs Slurm instead. I am starting my runs on the new epyc128 nodes. The error is shown below:

No components were able to be opened in the pml framework.
This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.
Host: cn484
Framework: pml
--------------------------------------------------------------------------
[cn484:2795285] PML ucx cannot be selected
[cn484:2795291] PML ucx cannot be selected
[cn484:2795275] 1 more process has sent help message help-mca-base.txt / find-available:none found
[cn484:2795275] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

No matter what I change, I get the message above, with one or more lines at the bottom reading "PML ucx cannot be selected". The HPC center staff think my OF5.x environment file conflicts with the environment variables loaded by the other modules in my batch script, but unless I am mistaken there is no way to launch an OpenFOAM simulation without setting up the environment. I would appreciate any suggestions or help troubleshooting this, as I cannot find much to go on besides the post in this link.

The batch script I am attempting to submit is the following:

#!/bin/bash
#SBATCH --account=xiz14026
#SBATCH -J re6000               # job name
#SBATCH -o BFS_6000.o%j         # output file name (%j expands to jobID)
#SBATCH -e BFS_erro.%j          # error file name
#SBATCH --partition=priority    # allow 12 hours and parallel work
#SBATCH --constraint=epyc128
#SBATCH --ntasks=128
#SBATCH --nodes=1               # ensure all cores are from whole nodes
#SBATCH --time=12:00:00

module purge
module load slurm
module load gcc/11.3.0
module load zlib/1.2.12
module load ucx/1.13.1
module load openmpi/4.1.4
module load boost/1.77.0
module load cmake/3.23.2

cd /home/jcd17002
source OF5x.env
cd /scratch/xiz14026/jcd17002/BFS_6000
#srun -n 192 --mpi=openmpi pimpleFoam -parallel > my_prog.out
mpirun -np 16 -x UCX_NET_DEVICES=mlx5_0:1 pimpleFoam -parallel > my_prog.out

To explain further: I am trying to submit my case on an epyc128 node, using OpenFOAM-5.x and Open MPI 4.1.4 (based on the recommendation of the HPC staff), and all the modules I thought I should use are in the batch script. If there are any questions, I can clarify further.
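In case it helps narrow things down, these are the diagnostics I understand are worth running from the same job environment (a sketch; ompi_info and the --mca flags are standard Open MPI tooling, nothing specific to my cluster):

# Was this Open MPI build compiled with UCX support at all?
ompi_info | grep -i ucx

# Ask the PML framework to explain why ucx is being rejected
mpirun -np 2 --mca pml_base_verbose 10 pimpleFoam -parallel

# Rule UCX out entirely by falling back to the ob1 PML (TCP/shared memory)
mpirun -np 16 --mca pml ob1 pimpleFoam -parallel > my_prog.out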
The file that I source to set the environment variables in the batch script is the following:

export SYS_MPI_HOME=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4
export SYS_MPI_ARCH_PATH=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4
export IROOT=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4
export MPI_ROOT=$IROOT
export FOAM_INST_DIR=$HOME/OpenFOAM
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$FORT_COM_LIB64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SYS_MPI_ARCH_PATH/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SYS_MPI_ARCH_PATH/lib    #/apps2/openmpi/$mpi_version/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$FOAM_INST_DIR/ThirdParty/lib
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps2/intel/ips/2019u3/impi/2019.3.199/intel64/libfabric/lib
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps2/intel/ips/2019u3/impi/2019.3.199/intel64/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/cgal/4.0.2/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps2/boost/1.77.0/lib

foamDotFile=$FOAM_INST_DIR/OpenFOAM-5.x/etc/bashrc
#[ -f $foamDotFile ] && . $foamDotFile
. $FOAM_INST_DIR/OpenFOAM-5.x/etc/bashrc
echo "Sourcing Bashrc"
#source $FOAM_INST_DIR/OpenFOAM-5.x/etc/config.sh/settings
echo "Done"

export OF_ENVIRONMENT_SET=TRUE
alias cds="cd /scratch/xiz14026/jcd17002/"
unset mpi_version
unset fort_com_version
echo "Done."
#echo ""

This environment file was changed specifically to make the MPI paths from my HPC's openmpi/4.1.4 module agree with the OpenFOAM settings. I am wondering what kind of help anyone can offer with this issue; I am no expert with the openmpi module. Thank you for your time and consideration.
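After sourcing this file, I believe the quickest way to confirm whether OpenFOAM is actually picking up the module's Open MPI is something like the following (a sketch; FOAM_MPI and MPI_ARCH_PATH are the variables the OpenFOAM-5.x etc/bashrc normally sets):

# Everything below should point at .../apps/openmpi/4.1.4
which mpirun
mpirun --version                 # should report Open MPI 4.1.4
echo "$FOAM_MPI"                 # MPI flavour OpenFOAM was configured for
echo "$MPI_ARCH_PATH"            # should match SYS_MPI_ARCH_PATH above
ldd "$(which pimpleFoam)" | grep -i mpi   # which MPI libraries the solver actually links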
Tags |
mpi errors, openfoam-5, openmpi 4