January 16, 2024, 04:09 |
DL580 G9 or ASUS ESC4000 G4?
|
#1 |
Member
EM
Join Date: Sep 2019
Posts: 59
Rep Power: 7 |
Hello,
Does anyone have an opinion on the following? My current front-end system (HP ML350 G9, 2x Xeon E5-2690 v4, DDR4-2400) seems to be struggling computationally, holding GPU occupancy (4x MI50/60) below 50%, even though 75-80% of the FP64 operations are done on the GPUs. Buying it was a mistake: I can only stretch it to ~800M nodes (spectral) even though the GPUs are underused, and this holds things back quite a bit.
While looking for a cheap upgrade, it looks as if an HP DL580 G9 with 4x E7-8880 v4 and 16x 16 GB DDR4-2400 (which will default to 1800 MHz) may be a viable option, at a cost of ~1.5K euro (incl. VAT) from the UK. Forgetting the GPUs, is this a worthwhile upgrade for general CFD computations?
Another possible option is an ASUS ESC4000 G4 with 2x Xeon 6148 and 12x 16 GB DDR4-2666. This looks to be hovering at ~2.2K euro if the components are bought separately.
Is it possible to say which of the two systems (ASUS or DL580) would be the better option? Any alternatives? Thanks. |
|
January 16, 2024, 05:01 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,426
Rep Power: 49 |
Are we absolutely certain that it is the CPUs' FP64 performance that is holding back the GPUs?
From the docs, it looks like the HP ML350 G9 only has three PCIe 3.0 x16 slots. There is a fourth mechanical x16 slot, but the bus width on that is only x8. Maybe it's a bandwidth or locality issue? |
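To put rough numbers on the x16 versus x8 difference, here is a minimal sketch that computes the theoretical one-direction PCIe 3.0 bandwidth from the lane count. It only assumes the standard 8 GT/s per lane and 128b/130b line encoding; the function name `pcie3_gbs` is made up for illustration, and protocol overhead is ignored, so real transfer rates will be lower.

```shell
#!/bin/sh
# Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a given lane count.
# 8 GT/s per lane, 128b/130b encoding, 8 bits per byte.
# Packet/protocol overhead is NOT included, so treat this as an upper bound.
pcie3_gbs() {
    LC_ALL=C awk -v lanes="$1" 'BEGIN { printf "%.2f\n", lanes * 8 * 128 / 130 / 8 }'
}

pcie3_gbs 16   # full-width x16 slot
pcie3_gbs 8    # mechanical x16 slot that is only wired as x8
```

So the GPU in the x8 slot has about half the host link bandwidth of the other three. On Linux, the negotiated width per device can usually be read from the `LnkSta` lines of `lspci -vv` (root may be needed to see the full capability listing).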
|
January 16, 2024, 08:26 |
|
#3 | |
Member
EM
Join Date: Sep 2019
Posts: 59
Rep Power: 7 |
Quote:
Good point. I did not want to write an overlong post, so not every detail was given. Reads and writes between the GPU and host RAM are controlled by the card itself, so you can get a good idea of where the calculation spends most of its time by simply watching the output of htop for a few minutes. An implicit check is also to monitor the temperatures of the CPUs and GPUs.
More to the point, I use the GPU's asynchronous DMA transfer hardware (with pinned memory on the host), which enables data exchanges while calculations are being carried out. Only GPU priming entails idling hardware, but its effect is quite small.
I went to the Intel Fortran forum and asked whether sections of data can be allocated to a specific socket's RAM. The short answer was: not directly. The distribution of data between the sockets' RAM is determined automatically by the system when an array is first used, not when it is declared. The only control you have is to experiment with different first-use access patterns and hope to hit a favourable setup that you can reuse. |
||
January 16, 2024, 09:35 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,426
Rep Power: 49 |
There is not too much finger-crossing involved with data placement. Unless you instruct the OS to mess things up, first-touch data placement is fairly reliable, provided there is enough free memory on every NUMA node at startup. numactl -H is our friend here.
Dropping caches before running a code that is sensitive to data placement is a must (requires root privileges):
echo 3 > /proc/sys/vm/drop_caches
With that, data placement should be 100% repeatable. These issues get more pronounced with a more complicated NUMA topology, like having 4 sockets in a shared-memory system.
But yeah, the HP DL580 G9 should give you roughly twice the CPU performance, and full PCIe bandwidth for each of the 4 GPUs. As a nice bonus, each GPU can be attached to one of the 4 sockets, which might make some pinning easier and more straightforward.
1500€ sounds a bit expensive; this stuff is really old by now. Nobody really wants it any more, outside of very niche use cases. Power draw and heat generation will also be pretty spectacular. |
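The "enough free memory on every NUMA node" check can be scripted before a run. The sketch below is only an illustration: `numa_low_nodes` is a hypothetical helper name, the 32000 MB threshold is arbitrary, and the sample text is made up to stand in for real `numactl -H` output so the snippet runs without numactl installed. It prints every node whose free memory is below the threshold.

```shell
#!/bin/sh
# Print NUMA nodes whose free memory is below a threshold given in MB.
# Expects `numactl -H` style text on stdin and matches its
# "node <id> free: <n> MB" lines.
numa_low_nodes() {
    LC_ALL=C awk -v min_mb="$1" '
        $1 == "node" && $3 == "free:" && $4 + 0 < min_mb + 0 {
            printf "node %s: %s MB free\n", $2, $4
        }'
}

# Made-up sample (assumption), standing in for real `numactl -H` output.
sample='available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 64000 MB
node 0 free: 61000 MB
node 1 cpus: 4 5 6 7
node 1 size: 64000 MB
node 1 free: 9000 MB'

printf '%s\n' "$sample" | numa_low_nodes 32000
# On a real machine: numactl -H | numa_low_nodes 32000
```

If a node shows up in the output, first-touch allocations that were meant for it may spill onto another node, which is when dropping caches before the run helps.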
|
January 16, 2024, 23:33 |
|
#5 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 367
Rep Power: 14 |
Quote:
Will |
||
January 17, 2024, 05:34 |
|
#6 |
Member
EM
Join Date: Sep 2019
Posts: 59
Rep Power: 7 |
||