March 20, 2009, 16:48 |
Problem running fluent with InfiniBand
|
#1 |
New Member
Anonymous
Join Date: Mar 2009
Posts: 4
Rep Power: 17 |
Good evening,
I hope someone can help me. I have a new small cluster, my first one with InfiniBand. I am trying to run Fluent over InfiniBand, but it always fails with:
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: dlopen failed: libmtl_common.so: cannot open shared object file: No such file or directory
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: vapi_resolve_entrypoints() failed
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: Can't initialize RDMA device
fluent_mpi.6.3.26: Rank 0:10: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
I can start Fluent over Ethernet without any problem. Where can I get the missing file, libmtl_common.so? The OS is CentOS 5.2.
Goodbye,
Blackpuma |
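The `dlopen failed` line in the log above means the dynamic loader on that node could not find libmtl_common.so, and the `vapi_resolve_entrypoints()` line suggests the MPI was trying to load the VAPI InfiniBand interface at that point. A minimal sketch, assuming Python is available on the compute nodes, that checks whether the loader can resolve a given library name. The helper name `can_load` is made up for illustration and is not part of Fluent or the MPI:

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Same failure class as the "cannot open shared object file" error.
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the error message in the post above.
    name = "libmtl_common.so"
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

Running this on each node would show which nodes lack the library or keep it outside the loader's search path (for example LD_LIBRARY_PATH or the directories listed in /etc/ld.so.conf).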
|
March 22, 2009, 03:26 |
|
#2 |
New Member
Gilad shainer
Join Date: Mar 2009
Posts: 2
Rep Power: 0 |
Have you started the subnet manager first, to bring the IB network up?
|
|
March 22, 2009, 07:13 |
|
#3 |
New Member
Anonymous
Join Date: Mar 2009
Posts: 4
Rep Power: 17 |
OpenSM is running on the head node, and an ibping worked. Do I have to install this program on every node?
Can someone tell me where I can get the file libmtl_common.so? Which package includes it? |
|
March 23, 2009, 13:13 |
|
#4 |
New Member
Gilad shainer
Join Date: Mar 2009
Posts: 2
Rep Power: 0 |
You can send an email to hpc@mellanox.com, and they will be able to help you. This is the address of the HPC Advisory Council help desk (free :-) ).
|
|
August 3, 2009, 15:22 |
|
#5 |
New Member
Join Date: Aug 2009
Posts: 4
Rep Power: 17 |
I have the same problem, with the following error:
Host spawning Node 0 on machine "cl1n004" (unix).
/home/cfd/FLUENT/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 3ddp -node -alnx86 -t16 -pib -mpi=hp -cnf=parallel -mport 10.0.1.4:10.0.1.4:38940:0
Starting /home/cfd/FLUENT/Fluent.Inc/fluent6.3.26/multiport/mpi/lnx86/hp/bin/mpirun -prot -vapi -e MPI_HASIC_VAPI=1 -e MPI_USE_MALLOPT_SBRK_PROTECTION=1 -e MPI_USE_MALLOPT_AVOID_MMAP=1 -f /tmp/fluent-appfile.25401
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:0: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 0 exited before MPI_Init() with status 1
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:8: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 8 exited before MPI_Init() with status 1
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes).
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: Error intializing pin/unpin structures
fluent_mpi.6.3.26: Rank 0:2: MPI_Init: MPI BUG: Cannot initialize RDMA protocol
MPI Application rank 1 killed before MPI_Init() with signal 15
MPI Application rank 2 exited before MPI_Init() with status 1
MPI Application rank 4 killed before MPI_Init() with signal 15
MPI Application rank 6 killed before MPI_Init() with signal 15
MPI Application rank 3 killed before MPI_Init() with signal 15
MPI Application rank 5 killed before MPI_Init() with signal 15
MPI Application rank 7 killed before MPI_Init() with signal 15
fluent_mpi.6.3.26: Rank 0:14: MPI_Init: ERROR: The total amount of memory that may be pinned (3355443 bytes), is insufficient to support even minimal rdma network transfers. This value was derived by taking 20% of physical memory (134217728 bytes) and dividing by the number of local ranks (8). A minimum of 14688256 bytes must be able to be pinned. These values can be changed by setting the environment variables MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY (Mbytes). |
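The numbers in the error message above can be reproduced by hand. The following is a small sketch, assuming the formula is exactly the one the message describes (a percentage of physical memory divided by the number of local ranks); the function names are hypothetical and the values are copied from the log:

```python
def pinnable_per_rank(physical_bytes, pin_percentage, local_ranks):
    """Bytes each rank may pin, following the formula quoted in the error text."""
    return physical_bytes * pin_percentage // 100 // local_ranks


def min_physical_bytes(min_pin_bytes, pin_percentage, local_ranks):
    """Smallest physical-memory value that satisfies the quoted minimum."""
    needed = min_pin_bytes * local_ranks * 100
    return -(-needed // pin_percentage)  # ceiling division


if __name__ == "__main__":
    # Values copied from the log above.
    print(pinnable_per_rank(134217728, 20, 8), "bytes pinnable per rank")
    print(134217728 // 2**20, "MiB reported as physical memory")
    print(min_physical_bytes(14688256, 20, 8), "bytes needed at 20% and 8 ranks")
```

The first line prints 3355443, matching the log. The reported physical memory of 134217728 bytes is only 128 MiB, so it may be worth comparing that figure with the RAM actually installed on cl1n004 before adjusting MPI_PIN_PERCENTAGE or MPI_PHYSICAL_MEMORY.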
|
August 4, 2009, 02:27 |
|
#6 |
New Member
Anonymous
Join Date: Mar 2009
Posts: 4
Rep Power: 17 |
Good morning Chinmay!
Are you starting Fluent over InfiniBand or Ethernet? Try setting the hard and soft memlock limits to unlimited. To do so, add these two lines to the file /etc/security/limits.conf: Code:
. . .
* soft memlock unlimited
* hard memlock unlimited
. . . |
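Entries in /etc/security/limits.conf are applied by PAM when a session is opened, so the limit a running process actually sees can differ from what the file says. A minimal sketch, using Python's standard `resource` module on Linux, that prints the locked-memory limits of the current process. The function names are hypothetical, and running the script through the same remote-shell mechanism the MPI uses to start its ranks is a suggestion rather than something stated in this thread:

```python
import resource


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % value


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft memlock:", describe_limit(soft))
    print("hard memlock:", describe_limit(hard))
```

If the output is not "unlimited" for a process started on a compute node, the limits.conf change has not reached that process yet.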
|
August 4, 2009, 13:50 |
|
#7 |
New Member
Join Date: Aug 2009
Posts: 4
Rep Power: 17 |
Hi,
Thanks for your help. I am trying to start Fluent over InfiniBand. The hard and soft limits are already set to unlimited. |
|
August 4, 2009, 14:51 |
|
#8 |
New Member
Anonymous
Join Date: Mar 2009
Posts: 4
Rep Power: 17 |
Are all InfiniBand devices active?
Try ibstat: Code:
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.5.0
    Hardware version: a0
    Node GUID: 0x001e0bffff8446a4
    System image GUID: 0x001e0bffff8446a7
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 5
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x001e0bffff8446a5
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
        Port GUID: 0x001e0bffff8446a6
Do you have opensm installed, and is it running? It's the subnet manager. |
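On a cluster with many nodes, reading ibstat output by eye gets tedious. The sketch below pulls the logical port state out of ibstat-style text. It assumes the output layout shown in the post above (a `CA '...'` line, then `Port N:` blocks each containing a `State:` line); the function name `port_states` and the shortened sample text are illustrative only:

```python
import re


def port_states(ibstat_text):
    """Map (CA name, port number) to the logical 'State:' value in ibstat output."""
    states = {}
    ca = None
    port = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        ca_match = re.match(r"CA '([^']+)'", line)
        port_match = re.match(r"Port (\d+):$", line)
        state_match = re.match(r"State:\s*(\S+)", line)
        if ca_match:
            ca, port = ca_match.group(1), None
        elif port_match:
            port = int(port_match.group(1))
        elif state_match and ca is not None and port is not None:
            states[(ca, port)] = state_match.group(1)
    return states


# Shortened copy of the output from the post above.
SAMPLE = """\
CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
    Port 2:
        State: Down
        Physical state: Polling
"""

if __name__ == "__main__":
    for (ca, port), state in sorted(port_states(SAMPLE).items()):
        print(ca, "port", port, state)
```

Any port that is cabled and meant to carry MPI traffic should show up as Active; a port that stays Down on a node is a candidate for the failing rank.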
|
August 8, 2009, 07:53 |
|
#9 |
New Member
Join Date: Aug 2009
Posts: 4
Rep Power: 17 |
Reply from ibstat:
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.2.0
    Hardware version: a0
    Node GUID: 0x0008f1040397e9f0
    System image GUID: 0x0008f1040397e9f3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a68
        Port GUID: 0x0008f1040397e9f1
I can run Fluent over Ethernet but not over InfiniBand. |
|
August 8, 2009, 07:58 |
|
#10 |
New Member
Join Date: Aug 2009
Posts: 4
Rep Power: 17 |
Initially HP-MPI was not installed. I have now installed it (ver. 2.3.1) on the master node and on two other nodes, but I still can't run Fluent over InfiniBand.
|
|
August 28, 2011, 02:16 |
|
#11 | |
New Member
Join Date: Dec 2010
Posts: 2
Rep Power: 0 |
Hi,
Have you solved your problem? I ran into the same problem recently. If you solved it, could you help me out? I would appreciate it. |