*WORK IN PROGRESS*
The goal of this thread is to cover most of the basics that come up in almost every CFD hardware recommendation topic. There will be some examples with hardware available as of early 2021, but these are mostly to illustrate some of the points that are being made. It is specifically NOT a recommendation for any of the hardware mentioned, as new and sometimes better hardware gets released all the time.
This thread should give you an idea of what to look for in a CFD workstation or cluster node. The general concepts presented should allow you to make an informed decision, even with new hardware not mentioned here. And to be frank: it keeps me from sounding like a broken record, by simply linking this thread instead of explaining the same ideas over and over again. Laziness brought me here.
Without further ado:
0. Checklist before posting
When posting a question about buying a new computer, please try to answer as many of these questions as you can in the first post:
- Which software do you intend to use?
- Are you limited by license constraints? I.e. does your software license only allow you to run on N threads?
- What type of simulations do you want to run? And what's the maximum cell count?
- If there is a budget, how high is it?
- What kind of setting are you in? Hobbyist? Student? Academic research? Engineer?
- Where can you source your new computer? Buying a complete package from a large OEM? Assemble it yourself from parts? Are used parts an option?
- Which part of the world are you from? It's cool if you don't want to tell, but since prices and availability vary depending on the region, this can sometimes be relevant. Particularly if it's not North America or Europe.
- Anything else that people should know to help you better?
1. CPU - solver performance
The most central piece of every computer indeed. Or is it?
The initial reflex might be to just buy the newest processor, with the highest core count and frequency. While that might get you the highest solver performance, you are probably over-spending. Or maybe you have a limited budget anyway...
Most CFD solvers tend to have a low computational intensity. This is defined as the number of floating point operations, divided by the amount of data transferred from and to RAM. Which means that in order to keep a large number of cores fed with enough data, the CPU also needs enough memory bandwidth. Otherwise, you end up with a memory bandwidth bottleneck, where running more and more threads does not increase performance. Theoretical memory bandwidth is the product of memory transfer rate (e.g. DDR4-3200) and the number of memory channels.
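To put numbers on that last sentence, here is a minimal sketch of the theoretical peak. It assumes the standard 64-bit (8 byte) data bus per DDR4 channel. The function name is made up for illustration, and real-world sustained bandwidth is always lower than this peak.

```python
def peak_memory_bandwidth_gbs(transfer_rate_mts: float,
                              channels: int,
                              bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfer_rate_mts: the number in the memory name, e.g. 3200 for DDR4-3200.
    bus_width_bits: data width of one channel, 64 bits for DDR4 (assumption).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Same memory speed, different channel counts
for channels in (2, 4, 8, 16):
    bandwidth = peak_memory_bandwidth_gbs(3200, channels)
    print(f"DDR4-3200, {channels:2d} channels: {bandwidth:6.1f} GB/s peak")
```

With identical DIMMs, going from 2 to 8 channels quadruples the peak, which is why the channel count matters at least as much as the speed printed on the memory.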
Time for some actual numbers, taken from this thread:
OpenFOAM benchmarks on various hardware
Attachment 82913
Here we have a few popular choices for CPUs, ranging from a latest-gen mainstream CPU (AMD Ryzen 5600x) up to server-grade hardware (2x AMD Epyc 7302). The same benchmark was run with an increasing number of threads, shown on the x-axis. The y-axis indicates normalized solver performance. Normalization was done with the fastest single-core result from the Ryzen 5600x. Here are some of the conclusions we can draw from this chart, for various classes of CPUs:
- Mainstream CPUs with 2 memory channels (Ryzen 5600x): highest single-core performance, but quickly falls behind the more threads are used, because the memory subsystem can't keep up. At 6 threads, it has been overtaken by every other entry in the chart. We can further extrapolate that there will be some more scaling up to 8 threads, but not beyond that. Which means that variants of these CPUs with 12 or even 16 cores are a waste of money for our purpose.
- Entry-level HEDT parts with 4 memory channels (I7-9800x): much better balance of floating point and memory performance. Starts out slower, but ultimately beats the much newer and more expensive mainstream CPU at 6 or more threads.
- High-end HEDT parts with 4 memory channels (TR-3960x): A textbook example of a memory bandwidth bottleneck. There are just way too many cores for only 4 memory channels.
- Older dual-socket server CPUs with 2x4 memory channels (Xeon E5-2673v3): the budget-friendly choice for high solver performance. Thanks to 2x4 memory channels, albeit at a lower memory frequency, this type of hardware can still compete with newer and more expensive hardware.
- Top-of-the-line server CPUs with 2x8 memory channels (Epyc 7302): gets beaten by some of the other choices for low thread counts. But the abundance of memory bandwidth allows linear scaling up to 16 cores, resulting in superior performance for 8 threads and above. Also note that scaling is no longer linear above 16 threads. This is partially caused by the unconventional chiplet approach, but also hints at an important conclusion: while these CPUs are available with up to 64 cores, these high core count models are a waste of money for CFD. They would run into the same memory bandwidth bottleneck that the TR-3960x encounters.
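The plateau seen in the chart can be mimicked with a very crude model: every thread scales perfectly until the combined bandwidth demand hits what the platform can deliver. This is only a sketch. The per-thread demand of 8 GB/s and the platform figures below are made-up illustrative values, not measurements from the benchmark thread, and real scaling curves flatten gradually rather than at a sharp knee.

```python
def capped_speedup(threads: int,
                   per_thread_gbs: float,
                   platform_gbs: float) -> float:
    """Speedup over one thread when total memory traffic is capped.

    per_thread_gbs: bandwidth one thread needs to run at full speed
                    (hypothetical value, depends on solver and CPU).
    platform_gbs:   bandwidth the whole platform can sustain.
    """
    max_fed_threads = platform_gbs / per_thread_gbs
    return min(float(threads), max_fed_threads)


# Hypothetical platforms: (label, sustained bandwidth in GB/s)
platforms = [("2 channels", 40.0), ("4 channels", 80.0), ("2x8 channels", 320.0)]

for label, bandwidth in platforms:
    curve = [capped_speedup(n, 8.0, bandwidth) for n in (1, 4, 8, 16, 32)]
    print(f"{label:>12}: {curve}")
```

The point of the toy model is the shape: once the cap is reached, extra cores add cost but no performance.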
Conclusion: core count and number of memory channels should be balanced. The general rule of thumb is 2 to 4 CPU cores per memory channel. Aim for the lower end if your solver has an expensive per-core licensing scheme. After all, you want to get the most out of these expensive licenses. For software with low or no licensing cost, you can get a few more cores. They will be less effective, but solver times will still be lower overall.
Also: 2 CPUs are usually better than one, because the memory controller resides within the CPU package. By choosing e.g. two 32-core CPUs instead of a single 64-core CPU, you get effectively twice the memory bandwidth.
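The rule of thumb is easy to check for any configuration you are considering. A minimal sketch, with hypothetical function names, using the 64-core example above and assuming 8 memory channels per CPU:

```python
def cores_per_memory_channel(cores_per_cpu: int,
                             sockets: int,
                             channels_per_cpu: int) -> float:
    """Total core count divided by total memory channel count."""
    total_cores = cores_per_cpu * sockets
    total_channels = channels_per_cpu * sockets
    return total_cores / total_channels


def within_rule_of_thumb(ratio: float, upper_limit: float = 4.0) -> bool:
    """True if the ratio does not exceed the upper end of the 2 to 4 range."""
    return ratio <= upper_limit


# 64 cores total, assuming 8 memory channels per CPU
single = cores_per_memory_channel(cores_per_cpu=64, sockets=1, channels_per_cpu=8)
dual = cores_per_memory_channel(cores_per_cpu=32, sockets=2, channels_per_cpu=8)

print(f"1x 64 cores: {single:.1f} cores/channel, ok: {within_rule_of_thumb(single)}")
print(f"2x 32 cores: {dual:.1f} cores/channel, ok: {within_rule_of_thumb(dual)}")
```

For expensive per-core licenses, pass a lower `upper_limit` such as 2.0 to reflect the advice to aim for the lower end of the range.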
2. System memory/RAM
While the choice of memory is closely tied to the CPU, two questions usually come up when discussing memory loadout: capacity and memory type.
I will try my best to keep this as short as possible. If you want to know more, this is an excellent starting point:
https://frankdenneman.nl/2015/02/18/...y-blog-series/
2a. Memory capacity
How much RAM you need mostly depends on two factors: maximum model size (i.e. how many cells the mesh consists of) and the solver type.
General purpose CFD software like Ansys Fluent requires in the range of 1-4 GB of RAM per million cells. SIMPLE solver in single precision marks the lower end of that range, coupled solver in double precision sits at the higher end.
This should give you a rough idea of how much memory is enough. There are always exceptions of course. If you are not sure about your specific application, you can run one or two smaller cases on hardware you already have, and extrapolate to the cell counts you want to run with your new machine.
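Both approaches fit in a few lines. This is a rough sketch: the 1-4 GB per million cells range is the one quoted above, the linear extrapolation is an assumption that ignores fixed overhead, and the function names and example numbers are made up for illustration.

```python
def ram_range_gb(million_cells: float,
                 low_gb_per_mcell: float = 1.0,
                 high_gb_per_mcell: float = 4.0) -> tuple[float, float]:
    """Lower and upper RAM estimate from the 1-4 GB per million cells range."""
    return (million_cells * low_gb_per_mcell, million_cells * high_gb_per_mcell)


def extrapolate_ram_gb(measured_gb: float,
                       measured_million_cells: float,
                       target_million_cells: float) -> float:
    """Scale a measured memory footprint linearly with the cell count."""
    gb_per_mcell = measured_gb / measured_million_cells
    return gb_per_mcell * target_million_cells


low, high = ram_range_gb(20)
print(f"20 million cells: roughly {low:.0f} to {high:.0f} GB")

# Hypothetical test case: 3 million cells used 6 GB on existing hardware
needed = extrapolate_ram_gb(measured_gb=6.0,
                            measured_million_cells=3.0,
                            target_million_cells=30.0)
print(f"Extrapolated to 30 million cells: about {needed:.0f} GB")
```

Measure the peak memory of the solver process with the same solver settings and precision you plan to use later, otherwise the extrapolation is meaningless.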
And yes, you actually need enough memory to run your models properly. Unlike in some FEA solvers, out-of-core execution (persistent storage used to extend RAM) is not really a thing in CFD. Even the fastest SSDs are an order of magnitude slower than RAM. Avoid it at all costs.
Big caveat here: there is a lower limit for the amount of memory, dictated by the CPU and memory controller. Let's say you get a CPU with an 8-channel memory controller like an AMD Epyc 7302. To make use of these 8 memory channels, you need at least 8 DIMMs. The smallest compatible DIMMs -more on that later- come in 8GB capacity. Which means that 64GB, populated as 8x8GB, is the lower limit for such a CPU. Double that if you opt for two CPUs.
Big OEMs and system integrators can be oblivious to this, so check your quote very carefully before pulling the trigger. You really don't want your new 15000$ CFD workstation with two high-end processors choked by single-channel memory.
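A quick sanity check for a quote, as a minimal sketch. It assumes one DIMM per channel is enough to use that channel and takes the 8GB minimum DIMM size from the example above. The function names are hypothetical.

```python
def min_populated_memory_gb(channels_per_cpu: int,
                            sockets: int,
                            smallest_dimm_gb: int = 8) -> int:
    """Smallest memory capacity that still fills every memory channel."""
    return channels_per_cpu * sockets * smallest_dimm_gb


def uses_all_channels(dimm_count: int,
                      channels_per_cpu: int,
                      sockets: int) -> bool:
    """True if the DIMMs can be spread evenly over all memory channels."""
    total_channels = channels_per_cpu * sockets
    return dimm_count >= total_channels and dimm_count % total_channels == 0


print(min_populated_memory_gb(channels_per_cpu=8, sockets=1))  # 64
print(min_populated_memory_gb(channels_per_cpu=8, sockets=2))  # 128

# A quote offering 2x 32GB for a dual-socket, 8-channel-per-CPU machine
print(uses_all_channels(dimm_count=2, channels_per_cpu=8, sockets=2))   # False
print(uses_all_channels(dimm_count=16, channels_per_cpu=8, sockets=2))  # True
```

The second check is the one that catches bad quotes: 64GB as 2x32GB has the same capacity as 8x8GB, but leaves most memory channels empty.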
2b. Memory type
The amount of options and nuances here might be overwhelming. But if your goal is just a working system, as opposed to breaking world records, a few simple rules are enough. Your CPU choice dictates which memory you need.
- Server CPUs like AMD Epyc 7xxx or Intel Xeon Platinum/Gold/W should be paired with registered ECC memory (also called reg ECC or ECC RDIMM). There might be some rare occasions where unbuffered (UDIMM) might work, but let's leave that to adventurous folks. Memory transfer rate (DDR4-2666, DDR4-3200 etc.) is also determined by the CPU. The product page will list the maximum supported transfer rate, so stick to that.
If you need an extraordinarily large amount of memory, which cannot be achieved by filling all DIMM slots with the largest RDIMM available, you can switch to load-reduced LRDIMM.
- Virtually all other CPUs need unbuffered (UDIMM) memory. There are exceptions to this rule, but those are usually compatible with both types. Again, the product page for your CPU of choice lists the supported transfer rate, which is your first clue which speed bin is right for you. But in contrast to server CPUs, this spec is usually the minimum guaranteed frequency. As we saw in chapter 1, memory bandwidth is a key factor to high performance. So if you happened to buy a CPU and a motherboard that support memory overclocking, getting memory beyond the minimum guaranteed frequency is an easy way to squeeze some more performance out of your CFD workstation.
Of course, it must be mentioned that this is technically overclocking, albeit a very easy one thanks to XMP profiles, with very little risk of damaging your hardware. So always check for system stability. And if millions of dollars or even people's lives depend on the correctness of your results, maybe stick to guaranteed specifications instead.
One question remains with non-server CPUs: error checking and correction (ECC). Only some combinations of CPUs and motherboards do support it officially. Some have unofficial support, i.e. the CPU manufacturer did not disable the feature intentionally, and leaves implementation up to the motherboard manufacturers. And others don't support it at all. Side-note: Unbuffered ECC memory works on most platforms that do support UDIMM. You just don't get the ECC feature.
Let's assume you bought a platform that supports ECC, either officially or through board partners. Ask yourself the question: how often can you afford to get into the office in the morning, just to realize that your simulation has failed for no apparent reason? If the answer is not at all, you probably want ECC. But then again, you probably also want redundant power supplies, a UPS to protect against short power outages, redundancy for your storage etc.
In practice, the decision for ECC memory with non-server CPUs should come first, because it dictates which CPUs and motherboards you can get.
3. Graphics card/GPU
Graphics cards can serve two distinct purposes in a CFD workstation: render the image on the screen, and help with the computations.
3a. Graphics cards as a display/rendering device
Hardware requirements for this aren't particularly high. Even an otherwise high-end CFD workstation doesn't necessarily need a high-end graphics card.
The most important specification is the amount of memory on the graphics card. As of 2021, I consider 4GB as the absolute minimum, and recommend at least 8GB. Even more can be helpful if you need to render complex scenes for meshes on the order of 50 million cells or more. Without enough VRAM, one of two things will happen: either performance while interacting with the scene drops to unacceptable levels, or the program stops working entirely.
In contrast, if the GPU core of the graphics card just isn't very fast (i.e. you saved money by buying an entry-level or midrange card with enough VRAM), interacting with the model will just be slower than optimal. I do consider this an acceptable compromise when on a budget.
"Professional" or "Gaming" graphics cards: There is usually no point in spending more money on a graphics card from the professional lines.
They are made with the same chips as the consumer cards, which means similar performance and feature sets. The main differences are the drivers, which can make a performance difference in some CAD programs. Most professional applications come with a list of recommended or tested GPUs, and the only GPUs on these lists are from the professional lines. So if you need absolutely guaranteed compatibility for all features, and support if something doesn't work as intended, stick to the SKUs on the list. But these days, a GPU not being on the list of "compatible" devices doesn't necessarily mean things won't work as intended.
Integrated graphics: with this chapter being written during the great graphics card shortage of 2020/2021, a word on integrated graphics.
Some mainstream CPUs come with a GPU integrated into the CPU. While they cannot replace a decent graphics card, you can get surprisingly far with these, for the same reasons mentioned above: a graphics card doesn't need to be high-end, it mostly needs to have enough memory.
But be warned: since the integrated GPU gets its VRAM from system memory, and also shares memory bandwidth with the CPU cores, doing anything graphically demanding, at the same time as solving a CFD problem, will tank performance. And you won't have all system memory available.
3b. Graphics cards and GPUs to accelerate CFD computations
GPU acceleration is still a topic with many caveats and pitfalls. See also:
GPU acceleration in Ansys Fluent
In theory, GPUs have much higher raw floating point performance and memory bandwidth compared to CPUs, which is the whole appeal of GPU acceleration.
In practice, leveraging these capabilities for CFD is not trivial. To put it bluntly: with a limited hardware budget, GPU acceleration should not be a priority. Focus on CPU performance instead.
If you are still determined to leverage GPU acceleration for your CFD workstation, you need to do your own research. Important points to answer before buying a GPU for computing are:
- Does your CFD package support GPU acceleration?
- Do the solvers you intend to run benefit from GPU acceleration?
- Are your models small enough to fit into GPU memory? Or vice versa, how much VRAM would you need to run your models?
- Does GPU acceleration for your code work via CUDA (Nvidia only) or OpenCL (both AMD and Nvidia)?
- Which GPUs are allowed for GPU acceleration? Some commercial software comes with whitelists for supported GPUs for acceleration, and refuses to work with other GPUs.
- Single or double precision? All GPUs have tons of single precision floating point performance. But especially for Nvidia, only a few GPUs at the very top end also have noteworthy double precision floating point performance.
In addition to these points, my personal opinion on the current state of this matter: GPU acceleration for commercial CFD packages is artificially made viable with the licensing scheme. Using very few additional license tokens for adding GPUs, compared to adding more CPUs, skews the scale towards GPUs. Without this trick, GPU acceleration for commercial CFD codes would be even more of a niche than it is today. This also means that for CFD codes without license fees, making GPU acceleration viable is even harder.
That's all for now, more to come. But if I don't start somewhere, I'll never get this done.
If you have any suggestions on how to structure this article, I really want to hear them. It is not intended as a deep-dive into all the nuances and edge-cases, but rather to cover 90% of the questions and misconceptions that come up regularly.
And of course, contributions or ideas for topics are welcome.
Note to moderators: it would be nice if I could keep editing privileges to this post for a longer period of time, as I don't know when I will get around to adding more stuff. And maybe when it's polished enough at some point, we could pin it to the top.
Changelog
22.02.2021: thread started with chapter 1 on CPU solver performance
23.02.2021: added chapter 2 on memory
25.02.2021: added chapter 0, a checklist for posting questions. And added this changelog
11.11.2021: added chapter 3 on GPUs, pinned the thread