|
General recommendations for CFD hardware [WIP] |
|
February 22, 2021, 15:50 |
General recommendations for CFD hardware [WIP]
|
#1 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
*WORK IN PROGRESS*
The goal of this thread is to cover most of the basics that come up in almost every CFD hardware recommendation topic. There will be some examples with hardware available as of early 2021, but these are mostly to illustrate some of the points that are being made. It is specifically NOT a recommendation for any of the hardware mentioned, as new and sometimes better hardware gets released all the time. This thread should give you an idea what to look for in a CFD workstation or cluster node. The general concepts presented should allow you to make an informed decision, even with new hardware not mentioned here. And to be frank: it keeps me from sounding like a broken record, by simply linking this thread instead of explaining the same ideas over and over again. Laziness brought me here Without further ado: 0. Checklist before posting When posting a question about buying a new computer, please try to answer as many of these questions as you can in the first post:
Links:
- AMD Epyc 9004 "Genoa" buyers guide for CFD

1. CPU - solver performance

The most central piece of every computer indeed. Or is it? The initial reflex might be to just buy the newest processor with the highest core count and frequency. While that might get you the highest solver performance, you are probably over-spending. Or maybe you have a limited budget anyway...

Most CFD solvers have a low computational intensity. This is defined as the number of floating point operations divided by the amount of data transferred from and to RAM. Which means that in order to keep a high number of cores fed with enough data, the CPU also needs enough memory bandwidth. Otherwise, you end up with a memory bandwidth bottleneck, where running more and more threads does not increase performance. Memory bandwidth is the product of memory transfer rate (e.g. DDR4-3200) and the number of memory channels; a back-of-the-envelope estimate is sketched below.

Time for some actual numbers, taken from this thread: OpenFOAM benchmarks on various hardware

[Attachment: of_solver_scaling.png - normalized solver performance vs. number of threads]

Here we have a few popular choices for CPUs, ranging from a latest-gen mainstream CPU (AMD Ryzen 5600X) up to server-grade hardware (2x AMD Epyc 7302). The same benchmark was run with an increasing number of threads, shown on the x-axis. The y-axis indicates normalized solver performance. Normalization was done with the fastest single-core result from the Ryzen 5600X. The common theme, across all classes of CPUs: performance scales with thread count only until memory bandwidth is saturated; beyond that point, additional threads gain next to nothing.
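As a back-of-the-envelope guide when comparing CPUs, you can estimate theoretical peak memory bandwidth from the spec sheet alone. A minimal sketch in Python: each channel moves 8 bytes per transfer for DDR4 (and effectively 8 bytes per DIMM for DDR5 as well); channel counts and transfer rates below are taken from the examples above, so double-check them for your own candidate CPUs.

Code:
def peak_bandwidth_gbs(channels: int, transfer_mts: int) -> float:
    """Theoretical peak: channels * transfer rate (MT/s) * 8 bytes per transfer."""
    return channels * transfer_mts * 8 / 1000

# Examples from this post:
print(peak_bandwidth_gbs(2, 3200))   # Ryzen 5600X, dual-channel DDR4-3200:  51.2 GB/s
print(peak_bandwidth_gbs(8, 3200))   # one Epyc 7302, 8-channel DDR4-3200:  204.8 GB/s
print(peak_bandwidth_gbs(16, 3200))  # two Epyc 7302, 16 channels total:    409.6 GB/s

Real sustained bandwidth is lower than these theoretical numbers, but they are good enough for comparing CPUs against each other. The last two lines already hint at the next point.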
Also: 2 CPUs are usually better than one, because the memory controller resides within the CPU package. By choosing e.g. two 32-core CPUs instead of a single 64-core CPU, you effectively get twice the memory bandwidth.

2. System memory/RAM

While the choice of memory is closely tied to the CPU, two questions usually come up when discussing memory loadout: capacity and memory type. I will try my best to keep this as short as possible. If you want to know more, this is an excellent starting point: https://frankdenneman.nl/2015/02/18/...y-blog-series/

2a. Memory capacity

How much RAM you need mostly depends on two factors: maximum model size (i.e. how many cells the mesh consists of) and the solver type. General-purpose CFD software like Ansys Fluent requires in the range of 1-4 GB of RAM per million cells. A SIMPLE solver in single precision marks the lower end of that range; a coupled solver in double precision sits at the higher end. This should give you a rough idea of how much memory is enough; a small sketch further down in this chapter turns the rule of thumb into concrete numbers. There are always exceptions of course. If you are not sure about your specific application, you can run one or two smaller cases on hardware you already have, and extrapolate to the cell counts you want to run with your new machine.

And yes, you actually need enough memory to run your models properly. Unlike in some FEA solvers, out-of-core execution (persistent storage used to extend RAM) is not really a thing in CFD. Even the fastest SSDs are an order of magnitude slower than RAM. Avoid it at all costs.

Big caveat here: there is a lower limit for the amount of memory, dictated by the CPU and memory controller. Let's say you get a CPU with an 8-channel memory controller like an AMD Epyc 7302. To make use of these 8 memory channels, you need at least 8 DIMMs. The smallest compatible DIMMs -more on that later- come in 8GB capacity. Which means that 64GB, populated as 8x8GB, is the lower limit for such a CPU. Double that if you opt for two CPUs. Big OEMs and system integrators can be oblivious to this, so check your quote very carefully before pulling the trigger. You really don't want your new $15,000 CFD workstation with two high-end processors choked by single-channel memory.

2b. Memory type

The number of options and nuances here might be overwhelming. But if your goal is just a working system, as opposed to breaking world records, a few simple rules are enough. Your CPU choice dictates which memory you need.
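Back to capacity for a moment, here is the sketch promised in 2a. It combines the GB-per-million-cells rule of thumb with the minimum-population rule. The 3 GB per million cells in the example is a hypothetical value leaning towards a double-precision coupled solver; calibrate it with a small test case of your own solver and settings.

Code:
import math

def required_dimms(cells_million: float, gb_per_mcell: float,
                   channels: int, dimm_gb: int) -> int:
    """Smallest DIMM count that covers the capacity estimate while keeping
    every memory channel populated with the same number of DIMMs."""
    need_gb = cells_million * gb_per_mcell
    dimms = max(channels, math.ceil(need_gb / dimm_gb))
    return math.ceil(dimms / channels) * channels  # round up to full channel sets

# Example: 100 million cells at an assumed 3 GB per million cells,
# on a single 8-channel CPU with 32GB DIMMs:
n = required_dimms(cells_million=100, gb_per_mcell=3.0, channels=8, dimm_gb=32)
print(f"{n} x 32GB = {n * 32} GB")  # -> 16 x 32GB = 512 GB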
Let's assume you bought a platform that supports ECC, either officially or through board partners. Ask yourself: how often can you afford to get into the office in the morning, only to realize that your simulation has failed for no apparent reason? If the answer is "not at all", you probably want ECC. But then again, you probably also want redundant power supplies, a UPS to protect against short power outages, redundancy for your storage etc. In practice, the decision for ECC memory with non-server CPUs should come first, because it dictates which CPUs and motherboards you can get.

3. Graphics card/GPU

Graphics cards can serve two distinct purposes in a CFD workstation: rendering the image on the screen, and helping with the computations.

3a. Graphics cards as a display/rendering device

Hardware requirements for this aren't particularly high. Even an otherwise high-end CFD workstation doesn't necessarily need a high-end graphics card. The most important specification is the amount of memory on the graphics card. As of 2021, I consider 4GB the absolute minimum, and recommend at least 8GB. Even more can be helpful if you need to render complex scenes for meshes on the order of 50 million cells or more. Without enough VRAM, one of two things will happen: either performance while interacting with the scene drops to unacceptable levels, or the program stops working entirely. By contrast, if the GPU core of the graphics card just isn't very fast (i.e. you saved money by buying an entry-level or midrange card with enough VRAM), interacting with the model will just be slower than optimal. I consider this an acceptable compromise when on a budget.

"Professional" or "Gaming" graphics cards: there is usually no point in spending more money on a graphics card from the professional lines. They are made with the same chips as the consumer cards, which means similar performance and feature sets. The main differences are the drivers, which can make a performance difference in some CAD programs. Most professional applications come with a list of recommended or tested GPUs, and the only GPUs on these lists are from the professional lines. So if you need absolutely guaranteed compatibility for all features, and support if something doesn't work as intended, stick to the SKUs on the list. But these days, a GPU not being on the list of "compatible" devices doesn't necessarily mean things won't work as intended.

Integrated graphics: with this chapter being written during the great graphics card shortage of 2020/2021, a word on integrated graphics. Some mainstream CPUs come with a GPU integrated into the CPU. While they cannot replace a decent graphics card, you can get surprisingly far with these, for the same reason mentioned above: a graphics card doesn't need to be high-end, it mostly needs enough memory. But be warned: since the integrated GPU gets its VRAM from system memory, and also shares memory bandwidth with the CPU cores, doing anything graphically demanding at the same time as solving a CFD problem will tank performance. And you won't have all system memory available.

3b. Graphics cards and GPUs to accelerate CFD computations

GPU acceleration is still a topic with many caveats and pitfalls. See also: GPU acceleration in Ansys Fluent
In theory, GPUs have much higher raw floating point performance and memory bandwidth than CPUs, which is the whole appeal of GPU acceleration. In practice, leveraging these capabilities for CFD is not trivial.
To put it bluntly: with a limited hardware budget, GPU acceleration should not be a priority. Focus on CPU performance instead. If you are still determined to leverage GPU acceleration for your CFD workstation, you need to do your own research before buying a GPU for computing: whether your solver supports GPU acceleration at all for the kind of models you run, which GPU models it supports, and whether the licensing scheme makes it worthwhile.
That's all for now, more to come. But if I don't start somewhere, I'll never get this done. If you have any suggestions on how to structure this article, I really want to hear them. It is not intended as a deep-dive into all the nuances and edge-cases, but rather to cover 90% of the questions and misconceptions that come up regularly. And of course, contributions or ideas for topics are welcome.

Note to moderators: it would be nice if I could keep editing privileges for this post for a longer period of time, as I don't know when I will get around to adding more stuff. And maybe when it's polished enough at some point, we could pin it to the top.

Changelog
22.02.2021: thread started with chapter 1 on CPU solver performance
23.02.2021: added chapter 2 on memory
25.02.2021: added chapter 0, a checklist for posting questions. And added this changelog
11.11.2021: added chapter 3 on GPUs, pinned the thread
09.11.2023: started a section with links to other useful threads
|
February 22, 2021, 17:25 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Reserved for advanced topics
Beware of x16 memory modules

It's a complicated topic, but the gist of it: x16 DIMMs come with higher latency and lower bandwidth compared to the regular x8 and x4 DIMMs. This has not been a widespread issue with DDR4 memory; only some of the cheapest laptop memory (the kind of stuff that OEMs use) was x16. But with DDR5, x16 DIMMs seem to be becoming more common, even for desktop memory. Stick to x8 or x4 instead. Here is a pretty lengthy video that explains the difference: https://www.youtube.com/watch?v=w2bFzQTQ9aI&t=0s It covers DDR4 in particular, but my current understanding is that similar performance differences exist for DDR5. Still waiting for more conclusive DDR5 benchmarks, but in the meantime I would just avoid x16 modules.

Memory ranks per channel

Why it matters: given the same transfer rate, two ranks per memory channel consistently outperform configurations with a single rank per channel. So if you are limited to a certain maximum transfer rate, use one DIMM per channel, with dual-rank DIMMs. A selection of benchmarks, the internet is full of them:
https://downloads.dell.com/manuals/a...rs89_en-us.pdf
https://www.igorslab.de/en/performan...yberpunk-2077/
https://www.anandtech.com/show/17269...urers-matter/2
Based on the results from Anandtech, DDR5 seems to perform poorly with two single-rank DIMMs per channel. This was not the case with DDR4, where 2R-1DPC performed pretty much the same as 1R-2DPC. Whether these are teething issues of the new technology or an inherent feature of DDR5, time will tell. Until then, the same rule applies: one dual-rank DIMM per channel is your best bet.

Side-note: some platforms lower the maximum supported transfer rate with an increasing number of ranks per channel. You can sometimes ignore this limit and force the transfer rate back to a higher value. But be aware that the more ranks per channel you have, the lower your success rate.
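If you want to check whether a given memory population actually delivers the bandwidth you paid for, STREAM is the standard benchmark. As a crude stand-in, here is a minimal numpy sketch of a STREAM-style "add" kernel. Stock numpy runs this single-threaded, so expect only a fraction of the multi-core theoretical peak; treat the number as a sanity check for comparing configurations, not as a calibrated benchmark.

Code:
import time
import numpy as np

n = 50_000_000  # three float64 arrays, roughly 1.2 GB total: far larger than any cache
a = np.zeros(n)
b = np.random.rand(n)
c = np.random.rand(n)

best = float("inf")
for _ in range(5):  # best-of-5 to reduce timing noise
    t0 = time.perf_counter()
    np.add(b, c, out=a)  # STREAM-style "add": a = b + c, no temporaries
    best = min(best, time.perf_counter() - t0)

# two reads and one write of n * 8 bytes each per kernel call
print(f"~{24 * n / best / 1e9:.1f} GB/s")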
|
February 23, 2021, 09:43 |
|
#3 |
Member
dab bence
Join Date: Mar 2013
Posts: 48
Rep Power: 13 |
I agree that the broad principles of hardware selection are easily summarized. Perhaps two 1D tables would be useful.
The first table would be example hardware selections where the number of core licenses is controlled (e.g. Fluent), so an illustrative HW selection for 4, 8, 16 and 32 licenses. The second table would be illustrative HW selections where there are no core licenses (e.g. OpenFOAM) and HW cost is the controlling factor, so examples of 1K, 4K, 8K and 16K euros. The pace of change would only require this list to be updated every 2 years.
|
February 25, 2021, 12:44 |
|
#5 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Quote:
The most immediate reason: such tables would be largely outdated in a few months, when Epyc Milan hits the retail market. But I have a few other doubts about their usefulness.

For part 1 with fixed core counts due to licenses, you theoretically want the maximum performance money can buy - within reason. Licenses are expensive, and so is the time of a CFD engineer, and development time in general. Logically, the hardware costs should not matter a whole lot. If applicable, you can even throw in a few GPUs, because that's mostly what GPU acceleration with commercial CFD codes is all about: skewing the licensing scheme in order to make it a viable solution. So that's another variable that can't be covered in a simple table. By the way, I do intend to add a chapter on graphics cards and GPU acceleration. But in practice, at least from my personal experience, company money is way tighter than it should be, even if skimping on hardware makes no sense financially.

And for part 2... well, prices and availability vary greatly depending on the region. Fine, let's limit it to North America and Europe. There is still a huge difference between spending 4k on a workstation from Dell, HP, Lenovo and the likes, and buying 4k worth of parts and assembling it yourself. And let's not get started on used CPUs, which is where the real price/performance is hiding. And what's in the budget for each price bracket? Storage? That's highly dependent on the use-case. Same for the amount of RAM.

So I don't think I will compile such lists. If someone else wants to give it a go in another thread, I will certainly put in a link. And lastly: this thread is not supposed to replace opening a new thread and asking for advice. Taking into account the answers from the checklist, people can then be nudged towards a solution that best fits their needs. Just without explaining each time why 16-core Ryzen CPUs are a waste of money, and why a cheap GPU (or even an expensive one) won't magically make simulations run 5x faster.
||
March 19, 2021, 12:53 |
|
#6 |
New Member
Dominic
Join Date: Feb 2015
Posts: 12
Rep Power: 11 |
Does your chart represent constant CPU clock speed, or is it possible that the clock speed is boosted at lower core counts? That might explain some of the flattening of the curves.
|
|
March 19, 2021, 13:18 |
|
#7 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
No, clock speeds were not artificially limited in these tests.
While higher boost frequency with low core counts definitely does account for some of the less-than-ideal scaling, the contribution is rather small. Some of these CPUs like the Epyc 7302 have a very flat boosting behavior anyway. |
|
March 20, 2021, 11:09 |
|
#8 |
New Member
Dominic
Join Date: Feb 2015
Posts: 12
Rep Power: 11 |
OK, that is interesting. I see the Epyc line is very straight at low core counts, which fits the flat boosting you mention.
I'm quite interested in systems in the sub-8-core range due to licensing costs for commercial codes. Having read up on the importance of memory bandwidth, I'd like to see an example of a bandwidth-limited system: for example an Epyc 7262 with 8 vs 4 RAM slots populated, i.e. an identical CPU with half the memory bandwidth. What would you expect that to do to solver performance? I think I'll start an 8-core thread, as it feels like the gains around this license point are substantial but overlooked and under-reported.

For this thread, I wonder if you could comment on whether there is a minimum memory bandwidth requirement, beyond which extra bandwidth is redundant? Also, how significant are memory speed and CPU clock speed? There are scenarios where a higher-clocked CPU is available with fewer memory channels, or where more memory channels are available at lower memory speed. Is there a rule of thumb for memory channels/memory speed and CPU frequency?
|
March 20, 2021, 11:51 |
|
#9 | |
Senior Member
Join Date: May 2012
Posts: 551
Rep Power: 16 |
Quote:
Memory bandwidth is the most important parameter here. Neither the number of memory channels nor the memory speed is the deciding factor on its own; what matters is their product, the memory bandwidth. A quad-channel CPU (e.g. Xeon E5-2690) with DDR3-1600 memory will have roughly the same memory bandwidth as a dual-channel CPU (e.g. Ryzen 3600) with DDR4-3200 memory.

For non-enterprise CPUs you usually have the option to tweak settings and overclock the memory. The gain from memory overclocking is usually not linear, but it may still be well worth the investment. For a maximum 8-core license I would definitely look at the HEDT segment, or even the consumer segment (Ryzen 5000 series) if you go for high-speed memory. There are many better options performance-wise, but if you are interested in price/performance then that may be a good place to start.
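To put numbers on that comparison: 4 channels x 1600 MT/s x 8 bytes = 51.2 GB/s for the quad-channel Xeon, and 2 channels x 3200 MT/s x 8 bytes = 51.2 GB/s for the dual-channel Ryzen. Identical on paper, which is why neither channel count nor memory speed alone tells you much.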
||
March 21, 2021, 20:50 |
|
#10 |
New Member
Dominic
Join Date: Feb 2015
Posts: 12
Rep Power: 11 |
We're all reading the same specs and benchmarks, so how is it that there is no system configurator that takes available core licenses, required minimum RAM and price point as inputs and returns the ideal parts? How hard can it be? Relevant component data: CPU clock speed, memory bandwidth, perhaps some kind of CPU rating to reflect IPC and other generational changes beyond pure clock speed. Anything else?
Is the data from the OpenFOAM benchmark thread collated anywhere?
|
March 20, 2022, 17:07 |
|
#11 |
New Member
Klaus
Join Date: Dec 2021
Posts: 29
Rep Power: 4 |
||
March 20, 2022, 20:01 |
|
#12 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 371
Rep Power: 14 |
You can find the number of memory channels in the CPU specification. Most motherboards (but not all) aim to have all the memory channels connected, which means at least one memory slot per memory channel. A lot of the Chinese consumer motherboards support only 2 channels of memory even when the processor is a Xeon with 4 channels. Don't be fooled by the 4 DIMM slots, or by a 4 in the designation of the board.
The number of slots per channel ranges between 1 and 4. More slots allow for a higher total capacity when the largest supported DIMMs are used. However, with more slots in use, the maximum memory transfer rate is often lower. For a CPU with 4 memory channels, 4x16GB sticks are better than 2x32GB sticks, as long as the DIMMs are placed in the correct slots to take advantage of all channels. For a CPU with just 2 memory channels, 2x32GB sticks are better in most cases.
|
March 21, 2022, 02:30 |
|
#13 | |
New Member
Klaus
Join Date: Dec 2021
Posts: 29
Rep Power: 4 |
Quote:
Do all HP Z840 workstations have 16 memory slots? And is the goal to have all slots filled with memory modules?
||
March 21, 2022, 03:25 |
|
#14 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Yes, all HP Z840 workstations have 16 DIMM slots.
The goal should be to fill at least 8 of them with identical memory modules in the right order, which enables all memory channels on both CPUs. The CPUs for this generation have 4 memory channels each. All 16 slots filled is fine too. |
|
March 21, 2022, 11:31 |
|
#15 | |
New Member
Klaus
Join Date: Dec 2021
Posts: 29
Rep Power: 4 |
Quote:
If the PC has 16 slots and I want 256GB, is it better to use 8x32GB or 16x16GB? And if I want to add more RAM one day, can I mix modules of different sizes, or do they all have to be the same capacity?
||
March 21, 2022, 13:11 |
|
#16 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Specifically talking about this HP Z840 workstation, with 2 CPUs and 4 memory channels each: it doesn't matter a whole lot whether you fill 8 or 16 DIMM slots. There can be a minor performance impact, but more on that below.
Mixing memory modules of different capacities usually works, provided they are the same type. BUT: this always comes with a significant performance impact. I would not recommend it. The same goes for mixing DIMMs of identical capacity but different internals.

You can stop reading here if you don't want to get bogged down by minor details. But for the sake of answering your question conclusively: internally, memory modules are organized into "ranks". There are modules with 1, 2, 4 or even 8 ranks, though the latter are reserved for LRDIMMs. For registered memory, the number of ranks is right on the label: "2Rx4" for example denotes a module with 2 ranks. In order to get the maximum sequential memory throughput -which is what we want for CFD- at least 2 ranks are required on each memory channel. We are talking about a real-world performance difference on the order of 10%. How you get 2 ranks on each memory channel is up to you. Again, the HP Z840 has 2 DIMM slots for each memory channel. So you can either fill both slots with single-rank modules, or one of the two slots with a dual-rank module.

Caveat: older hardware generations, and OEM stuff like this in particular, can enforce a lower memory transfer rate with more than 1 rank per channel. The more ranks, the lower the transfer rate, thus negating most of the benefit of populating more ranks in the first place. You might be able to enforce a higher memory transfer rate in the BIOS, but there is no guarantee that the required options are even available, or that they work as intended.
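For anyone planning a population on paper, here is a minimal sketch of the rank bookkeeping described above. The label parsing follows the "2Rx4" convention mentioned; this is a hypothetical helper for illustration, not a vendor tool.

Code:
import re

def ranks(label: str) -> int:
    """Extract the rank count from a DIMM organization label like '2Rx4' or '1Rx8'."""
    m = re.match(r"(\d+)Rx\d+", label)
    if not m:
        raise ValueError(f"unrecognized label: {label}")
    return int(m.group(1))

def ranks_per_channel(dimms_on_channel: list[str]) -> int:
    """Total ranks on one memory channel (sum over its populated slots)."""
    return sum(ranks(lbl) for lbl in dimms_on_channel)

# HP Z840: 2 slots per channel. Both options below hit the 2-ranks-per-channel target:
print(ranks_per_channel(["2Rx4"]))          # one dual-rank DIMM    -> 2
print(ranks_per_channel(["1Rx8", "1Rx8"]))  # two single-rank DIMMs -> 2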
|
October 24, 2023, 19:38 |
|
#17 | |
New Member
Join Date: Oct 2020
Posts: 1
Rep Power: 0 |
@flotus1: Same question for a Dell T7910 with E5-2680 v4. I am upgrading from 16x16GB 2133 to 2400 memory, and wonder whether 8x32GB or 16x16GB is better for an OpenFOAM home lab?
Quote:
|
||
October 25, 2023, 04:49 |
|
#18 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Your best bet is 8x32GB. Make sure to buy dual-rank modules, i.e. 2Rx4, so you still get two ranks per channel.
|
|
February 29, 2024, 13:48 |
|
#19 | |
New Member
Join Date: Feb 2024
Location: Spain
Posts: 19
Rep Power: 2 |
Quote:
I was thinking about buying the following workstation, but after reading your statement I have some concerns about the RAM part. Do you think I should change the type of RAM?
- AMD EPYC 9354P processor (3.25 GHz, 256 MB cache, 32 cores, 64 threads, turbo up to 3.80 GHz)
- 384 GB DDR5-4800 RAM (12x RDIMM 32 GB PC5-38400 ECC Reg.)
- Server mainboard with AMD System-on-Chip
- 1x PNY Nvidia T1000 with 4 GB GDDR6 RAM
- 2x 4TB M.2 NVMe SSD WD Black SN850X (1,200,000 IOPS, 2400 TBW, PCIe 4.0 x4)
- AMD controller onboard
- 2x 1 Gbit LAN onboard
- Server tower case (black/silver)
Price: 7,689
||
June 23, 2024, 19:02 |
|
#20 |
New Member
Ben
Join Date: Jan 2024
Posts: 2
Rep Power: 0 |
Which software do you intend to use?
Ansys Fluent, Mechanical

Are you limited by license constraints? I.e. does your software license only allow you to run on N threads?
No license restrictions.

What type of simulations do you want to run? And what's the maximum cell count?
- Two-phase flow at high velocities, up to Mach 3, with fluid and solid particle injection
- Steady/transient
- External & internal flow
- Pressure-based and density-based, SIMPLE and coupled solvers
- 2nd-order discretization
- 1-20 million cells
- Static structural analysis

If there is a budget, how high is it?
$6,000 USD

What kind of setting are you in? Hobbyist? Student? Academic research? Engineer?
Engineer/Researcher

Where can you source your new computer? Buying a complete package from a large OEM? Assemble it yourself from parts? Are used parts an option?
Can build in-house, using existing parts and items from ebay (China).
- 12x 16GB DDR5-4800 ECC 2Rx8 RAM (192GB total)
- 1000W Corsair platinum (purchased new)
- Gigabyte MZ33-AR0
- Noctua air cooling NH-U14S TR5-SP6
- NZXT Phantom case (existing PC case)
- GTX 1080 8GB (existing)
- AMD EPYC 9554, L3 = 256 MB, 3.1-3.75 GHz, 64 cores ($3000), OR
- AMD EPYC 9654 QS, L3 = 384 MB, 2.15-3.5 GHz, 96 cores ($3000)

Which part of the world are you from? It's cool if you don't want to tell, but since prices and availability vary depending on the region, this can sometimes be relevant. Particularly if it's not North America or Europe.
Sydney, Australia.

Anything else that people should know to help you better?
- Parts will be second-hand (ebay; risks understood)
- First CFD build
- Wake-on-LAN, for remote login to initiate analyses

Additionally, I have researched core count and diminishing returns. I understand that beyond 48 cores the efficiency drops off quickly, but 9454 CPUs aren't common. With the 9554 and the 9654 QS at the same price, I am at a fork in the road. I can see good performance from 64 cores, but for the same money I can get 96 cores with a larger L3 cache. So I'm wondering whether the 9654's lower frequency and larger cache would be a significant improvement over the 9554's smaller cache and higher frequency.
|
|
|