Parallel runs slower with MTU=9000 than MTU=1500 |
|
October 28, 2007, 23:30 |
Parallel runs slower with MTU=9000 than MTU=1500
#1 |
Guest
Posts: n/a
Hi,
I have been trying to build a small cluster with two dual-core Pentium D PCs. I've installed SUSE SLES 10, and the NIC cards are Gigabit. After two weeks of struggling with the network configuration, I'm finally able to run some benchmarks. The problem is that with the Jumbo Frames option enabled on the NIC cards (MTU=9000), my test case runs slower than with the standard option (MTU=1500). I've also noticed that the MTU=9000 option doesn't use as much CPU as the standard option. Does anyone have experience with this? Any comments would be helpful. I need to sort this out to request extra funds for my research project and build a bigger Beowulf cluster.

---- REFERENCE INFO ----
Case: 464,000 hex cells, 3D, PBNS, RNG k-e, multiphase mixture model (2 phases), multiple reference frames, unsteady.

MTU=9000
Performance Timer for 40 iterations on 4 compute nodes
  Average wall-clock time per iteration:    13.969 sec
  Global reductions per iteration:             223 ops
  Global reductions time per iteration:      0.000 sec (0.0%)
  Message count per iteration:                 854 messages
  Data transfer per iteration:              30.742 MB
  LE solves per iteration:                       7 solves
  LE wall-clock time per iteration:          5.445 sec
  LE global solves per iteration:                2 solves
  LE global wall-clock time per iteration:   0.085 sec (0.6%)
  AMG cycles per iteration:                      8 cycles
  Relaxation sweeps per iteration:             316 sweeps
  Relaxation exchanges per iteration:           76 exchanges
  Time-step updates per iteration:            0.05 updates
  Time-step wall-clock time per iteration:   0.015 sec (0.1%)
  Total wall-clock time:                   558.759 sec
  Total CPU time:                         1477.740 sec

MTU=1500
Performance Timer for 40 iterations on 4 compute nodes
  Average wall-clock time per iteration:     7.700 sec
  Global reductions per iteration:             223 ops
  Global reductions time per iteration:      0.000 sec (0.0%)
  Message count per iteration:                 854 messages
  Data transfer per iteration:              30.742 MB
  LE solves per iteration:                       7 solves
  LE wall-clock time per iteration:          0.605 sec (7.9%)
  LE global solves per iteration:                2 solves
  LE global wall-clock time per iteration:   0.003 sec (0.0%)
  AMG cycles per iteration:                      8 cycles
  Relaxation sweeps per iteration:             316 sweeps
  Relaxation exchanges per iteration:           76 exchanges
  Time-step updates per iteration:            0.05 updates
  Time-step wall-clock time per iteration:   0.016 sec (0.2%)
  Total wall-clock time:                   308.003 sec
  Total CPU time:                          949.780 sec

Cheers,
Javier
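For reference, this is roughly how jumbo frames can be enabled and verified on a Linux/SLES box. It is only a sketch: eth0, node2, and the exact ifcfg file name are assumptions, so adjust them to your own setup.

  # Temporarily raise the MTU on the cluster interface (eth0 assumed here)
  ifconfig eth0 mtu 9000

  # To make it persistent on SUSE, add MTU='9000' to the interface config,
  # e.g. /etc/sysconfig/network/ifcfg-eth-id-<mac>  (exact file name varies)

  # Verify that 9000-byte frames really pass end to end without fragmentation;
  # 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
  ping -M do -s 8972 node2

If the ping fails while both hosts are set to MTU=9000, the switch (or one of the NICs) is not passing jumbo frames, and traffic ends up fragmented or dropped, which could explain why the MTU=9000 runs come out slower than plain MTU=1500.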