Running on 4 node error MPI Errors[320798736] |
April 1, 2020, 04:34 |
|
#1 |
New Member
Join Date: Sep 2012
Posts: 26
Rep Power: 14 |
Good morning everyone, I have a question. Until now I have been running on 3 nodes with 24 cores per node (Xeon v3), and everything worked well. Yesterday I bought another node, a dual Xeon (v4). At the moment only one of its CPUs is working, because the second one has no cooler yet. I started my system as usual with the new node included, but, as the subject line says, I get an MPI error.
The log is:
SEVERE [star.base.neo.ClientNotifyHandler]: MPI Errors[320798736] : MPI_Waitall: Internal MPI error: Filename too long
MPI Errors[320798736] : MPI_Waitall: Internal MPI error: Filename too long
I then ran some tests. Nodes 1, 2 and 3 are the old nodes; node 4 is the new one.
Tests:
Nodes 1, 2, 3, 4 with only 1 core per node -- test passed
Nodes 1, 2, 3, 4 with 16 cores per node -- test passed
Nodes 1, 2, 4 with 24 cores on nodes 1 and 2 and 16 cores on node 4 -- test passed
Nodes 1, 2, 3 with 24 cores per node and node 4 with 16 cores -- test failed
Nodes 1, 2, 3 with 24 cores per node and node 4 with 1 core -- test failed
Does anyone have an idea?
|
April 1, 2020, 12:49 |
|
#2 |
Senior Member
Sebastian Engel
Join Date: Jun 2011
Location: Germany
Posts: 567
Rep Power: 21 |
Is your cluster running a Windows system, by chance?
|
|
April 1, 2020, 13:14 |
|
#3 |
New Member
Join Date: Sep 2012
Posts: 26
Rep Power: 14 |
||
April 1, 2020, 14:03 |
|
#4 |
New Member
Join Date: Sep 2012
Posts: 26
Rep Power: 14 |
I changed the MPI from IBM to Intel. The error is still there, but it is no longer the same:
SEVERE [star.base.neo.ClientNotifyHandler]: Connection reset
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.read(BufferedReader.java:182)
    at star.base.neo.NeoProperty.readNextChar(NeoProperty.java:804)
    at star.base.neo.NeoProperty.input(NeoProperty.java:583)
[catch] at star.base.neo.ClientNotifyHandler.run(ClientNotifyHandler.java:427)
    at java.lang.Thread.run(Thread.java:748)
WARNING [org.netbeans.core.multiview.MultiViewTopComponent]: The MultiviewDescription instance class star.coremodule.ui.SimulationMultiviewDesc is not serializable. Cannot persist TopComponent.
WARNING [null]: Last record repeated again.
|
April 2, 2020, 05:59 |
|
#5 |
New Member
Join Date: Sep 2012
Posts: 26
Rep Power: 14 |
I have now added the second CPU, so my setup is 4 nodes with two CPUs per node:
Node 1: 2x E5-2658 v3 (24 cores)
Node 2: 2x E5-2658 v3 (24 cores)
Node 3: 2x E5-2658 v3 (24 cores)
Node 4: 2x E5-2683 v4 (32 cores)
Today I started a new session with two tests:
- 1: 4 nodes with 24 cores per node --- passed
- 2: 3 nodes with 24 cores each and node 4 with 32 cores --- failed
Is it possible that all nodes must use the same number of cores? But then how was I able, with 3 nodes, to use 2 nodes with 24 cores and one with 16 cores?
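
To make the pattern in the tests easier to see, here is a small Python sketch that tabulates every run reported in this thread (the five from the first post and the two above) and checks one candidate rule against them: a run fails when all four nodes are used with unequal core counts. The run labels and the names `Run`, `is_uniform` and `predicted_fail` are made up for illustration; only the core counts and pass/fail results come from the posts. A match here is only a correlation over seven runs and says nothing about the underlying cause.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str    # illustrative label, not from the original posts
    cores: dict   # node number -> cores used on that node
    passed: bool  # result as reported in the thread


# Core counts and results exactly as reported in posts #1 and #5.
RUNS = [
    Run("post 1, run a", {1: 1, 2: 1, 3: 1, 4: 1}, True),
    Run("post 1, run b", {1: 16, 2: 16, 3: 16, 4: 16}, True),
    Run("post 1, run c", {1: 24, 2: 24, 4: 16}, True),
    Run("post 1, run d", {1: 24, 2: 24, 3: 24, 4: 16}, False),
    Run("post 1, run e", {1: 24, 2: 24, 3: 24, 4: 1}, False),
    Run("post 5, run 1", {1: 24, 2: 24, 3: 24, 4: 24}, True),
    Run("post 5, run 2", {1: 24, 2: 24, 3: 24, 4: 32}, False),
]


def is_uniform(run: Run) -> bool:
    """True when every participating node uses the same number of cores."""
    return len(set(run.cores.values())) == 1


def predicted_fail(run: Run) -> bool:
    """Candidate rule (an assumption): four nodes with unequal core counts fail."""
    return len(run.cores) == 4 and not is_uniform(run)


for run in RUNS:
    total = sum(run.cores.values())
    agrees = predicted_fail(run) == (not run.passed)
    print(
        f"{run.label}: nodes={len(run.cores)} total_cores={total} "
        f"uniform={is_uniform(run)} passed={run.passed} rule_agrees={agrees}"
    )
```

If the rule agrees with every row, the next useful experiment would be a four-node run with a different unequal split (for example, fewer cores on one of the old nodes instead of node 4), to see whether the failure follows the new node or the unequal distribution itself.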