Building a "home workstation" for ANSYS Fluent

September 26, 2017, 09:41
#1
New Member
Join Date: Sep 2017
Posts: 4
Hello.
A colleague asked me to help him build a new home PC next month. Its sole purpose is to run flow simulations with large meshes, mainly in ANSYS Fluent or a corporate in-house code, but Fluent is the first concern. The license will also be corporate, for as many cores as needed. I have some expertise in PC hardware but almost none in flow simulation; my own field of research is heat transfer. When solving heat conduction problems in Fluent with just the energy equation, on meshes from very small (20K) to relatively large (40M cells), I noticed that parallel performance did not scale very well: e.g. 4 threads were just 2-2.5x faster than serial, and it was more time-efficient to run a few serial cases simultaneously. But I realize this may not be the case for the Navier-Stokes solver, which can behave differently in parallel. That's why I came to this forum for recommendations.
My colleague's budget is not very tight for a home PC, e.g. a platform with a Ryzen 7 1800X and 64 GB RAM is totally fine for him price-wise. However, I'd like to keep cost efficiency within reasonable limits and avoid $2000 high-end CPUs in favor of slightly slower but not overpriced models. Server platforms are not considered, since the extra features they offer are unnecessary (except maybe RAM capacity).
With that said, I'd like to ask a few questions. They are not about the exact config, which will be chosen later according to budget and availability (although suggestions are still welcome), but rather about the influence of different factors on performance in the specific task of flow simulation in ANSYS Fluent with large meshes (say 30M cells) using the density-based solver. The questions are:
1. How well does Fluent's Navier-Stokes solver scale in parallel with thread count? (And what is better then: more cores or higher per-core performance?)
2. How important is RAM bandwidth for this kind of workload? Is it worth buying a TR4 or LGA2066 platform for the extra memory channels, or high-frequency DIMMs for a 2-channel system? (The second question is more relevant to Intel platforms, since AM4/TR4 requires fast RAM anyway.)
3. How do Skylake-X processors with their new mesh topology and cache architecture compare to previous generations in terms of Fluent performance? (Reviews say they have higher inter-core data transfer latency, but does it matter that much in CFD?)
4. Similarly, do Threadripper CPUs suffer from non-uniform Infinity Fabric latency (different access times between cores on the same and on different dies) when used for CFD? (Again, I got this from reviews, but unfortunately they mostly focus on games.)
5. Maybe a stupid question that I've had for a while but couldn't find any info on: does Fluent (and CFD-Post) use GPU acceleration for 3D scenes, and is it worth getting a powerful graphics card for the new machine? (Forgot to say: GPGPU is not considered due to solver limitations and memory requirements.)
Any advice and clues are welcome, thank you in advance.
Last edited by Ep.Cygni; September 26, 2017 at 18:06. Reason: caught a few typos
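The 2-2.5x speedup on 4 threads mentioned above can be turned into a parallel efficiency and an effective serial fraction. The sketch below uses the Karp-Flatt metric, a standard way to back out an Amdahl's-law serial fraction from a single measured speedup; the function names are illustrative, and the "serial fraction" it reports lumps together truly serial work and any slowdown from cores competing for shared resources such as memory bandwidth.

```python
"""Parallel efficiency and Karp-Flatt serial fraction from measured speedups.

Illustrative sketch: the inputs are the 2x and 2.5x speedups on 4 threads
reported in the post; nothing here is Fluent-specific.
"""


def efficiency(speedup: float, workers: int) -> float:
    """Parallel efficiency E = S / p (1.0 means perfect linear scaling)."""
    return speedup / workers


def karp_flatt(speedup: float, workers: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup with unlimited workers: 1 / e."""
    return 1.0 / serial_fraction


if __name__ == "__main__":
    p = 4
    for s in (2.0, 2.5):
        e = karp_flatt(s, p)
        print(f"S = {s:.1f} on {p} threads: efficiency {efficiency(s, p):.1%}, "
              f"serial fraction {e:.2f}, Amdahl limit {amdahl_limit(e):.1f}x")
    # Throughput view: p independent serial runs complete p cases per serial
    # runtime, while one p-way run completes S cases; this assumes the serial
    # runs do not slow each other down (they still share memory bandwidth).
    print(f"throughput ratio, {p} serial runs vs one parallel run: {p / 2.5:.2f}")
```

With these numbers the effective serial fraction comes out at roughly 0.2-0.33, so four independent serial runs deliver about 1.6-2x the throughput of one 4-thread run. For a memory-bound solver, that "serial fraction" mostly reflects contention for shared bandwidth rather than literally serial code.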

September 26, 2017, 18:39
#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Nice, a list of well-formulated questions:
1. Fluent can scale pretty well on large numbers of cores: depending on the case and the computer architecture, up to hundreds or even thousands of cores. That being said, scaling on a single node is usually limited by one factor: memory bandwidth. Apart from that, higher per-core performance is usually more desirable than very high core counts, partly because most tasks in pre- and post-processing do not scale to many cores.
2. Core counts and memory bandwidth should be balanced. An extremely expensive 18-core desktop processor will be severely limited by a lack of bandwidth and is thus a waste of money, especially for CFD. The mainstream platforms with only two memory channels are not an option if you can afford more. Btw: in my opinion, the notion that Ryzen has to be paired with faster memory while Intel processors do not is to some extent a myth. Faster memory helps processors from both manufacturers in memory-intensive workloads.
3. For CFD applications, Skylake-X in its default configuration is slightly slower than its predecessor in terms of per-core performance. One of the reasons is indeed the cache architecture, which is once again slower than in the previous generation.
4. Fluent and its MPI parallelization do not suffer as much from the slower inter-CCX communication as many mainstream applications do, because MPI tends to minimize data transfer between cores to some extent.
5. You don't necessarily need an expensive graphics card for pre- and post-processing. I would recommend a GTX 1050 Ti 4GB as a minimum or a GTX 1060 6GB as a step up.
My advice would be to determine how many parallel licenses your colleague has available, how many cells his largest simulation models consist of, and whether he is using solver types or additional physical models that further increase the memory requirement. That should help narrow down which platform is best for him.
Edit: should have read the whole post. "Unlimited" parallel licenses and ~30M cells.
Last edited by flotus1; September 27, 2017 at 05:25.
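The point that single-node scaling is usually capped by memory bandwidth, and that cores and bandwidth should be balanced, can be illustrated with a very simple bound: if each solver process would stream d GB/s when running alone and the node can deliver at most B GB/s, the speedup cannot exceed min(p, B/d). The sketch below assumes theoretical peak bandwidth (channels x transfer rate x 8 bytes) and a purely hypothetical per-core demand of 8 GB/s; real sustained bandwidth is lower and the demand depends on the case and solver, so only the shape of the curve is meaningful.

```python
"""Roofline-style bound on single-node speedup for a memory-bound solver.

Assumptions (illustrative only): theoretical peak DRAM bandwidth and a
hypothetical, constant per-core bandwidth demand.
"""


def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x 64-bit bus."""
    return channels * mega_transfers * bus_bytes / 1000.0


def bandwidth_bound_speedup(cores: int, node_bw_gbs: float, demand_per_core_gbs: float) -> float:
    """Speedup grows with core count only until the memory bus saturates."""
    return min(float(cores), node_bw_gbs / demand_per_core_gbs)


if __name__ == "__main__":
    node_bw = peak_bandwidth_gbs(channels=4, mega_transfers=2666)  # 4 x DDR4-2666
    demand = 8.0  # GB/s per core: hypothetical value for illustration
    for cores in (1, 4, 8, 10, 12, 18):
        s = bandwidth_bound_speedup(cores, node_bw, demand)
        print(f"{cores:2d} cores: speedup <= {s:5.2f}, efficiency <= {s / cores:.0%}")
```

Under these assumptions an 18-core, 4-channel part tops out at roughly the speedup of 10-11 cores, so the remaining cores mostly add cost; doubling the channel count doubles the ceiling.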

September 28, 2017, 07:40
#3
New Member
Join Date: Apr 2016
Posts: 12
My experience with the incompressible SIMPLE solver in Fluent was that 8GB of memory was barely enough for cases with 4M cells, but I never solved "bigger" cases with Fluent. Recently I've been using OpenFOAM, and 64GB of memory is only enough for around 25M cells when using simpleFoam with a k-epsilon model. So maybe Ryzen is not the best option, as I think it is limited to 64GB of memory.
|
|
September 28, 2017, 16:03
#4
New Member
Join Date: Sep 2017
Posts: 4
Thank you, flotus1, for the informative and helpful answers.
Thanks to lac for the advice too.
At this point I have come to the following conclusions about platform choice:
- no mainstream, because RAM bandwidth and capacity are not enough;
- no TR4, due to an unnecessarily high core count (8 and higher) for the available RAM bandwidth, high cost (motherboards are rare and expensive) and high TDP;
- no old DDR3 platforms.
Thus, I am considering 5 options so far (including server ones, as you suggested):
- LGA2011-3 with a 6-core CPU and 4 RAM channels;
- LGA2066 with a 6-core CPU and 4 RAM channels (cheaper, and probably slower due to the new caches);
- Dual LGA2011-3 with 2x4- or 2x6-core CPUs and 8 RAM channels;
- LGA3647 with a 6- or 8-core CPU and 6 RAM channels (expensive and very hard to get at the moment);
- SP3 with an 8- or 16-core CPU and 8 RAM channels (nearly impossible to get at the moment).
Am I right about the cores-to-RAM-channels ratio, or should we consider an 8-core/4-channel option too (and then the TR4 socket as well)?
Regarding memory size, with HEDT platforms we will most likely get 64GB RAM (4x16GB), with the possibility to add another 64GB in the future. With Dual LGA2011-3 or SP3, we'll need 8 DIMMs from the beginning to use all channels. This is a disadvantage, because 8x16GB might be out of budget right now, and 8x8GB would not allow upgrading to maximum capacity later by adding more DIMMs (especially with cheaper dual-socket WS motherboards that have only 8 RAM slots). Finally, LGA3647 with 6 DIMMs is more of a hypothetical option which we most likely won't be able to afford.
Last edited by Ep.Cygni; September 29, 2017 at 04:43. Reason: added EPYC option
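One way to look at the cores-to-channels question is to compute cores per memory channel and theoretical peak bandwidth per core for each option. In the sketch below the core and channel counts come from the list above, while the DIMM speeds (DDR4-2400 for the LGA2011-3 platforms, DDR4-2666 for the newer ones) are assumptions; peak figures overstate what a solver actually sustains.

```python
"""Cores per channel and peak bandwidth per core for candidate platforms.

Core and channel counts follow the post; DIMM speeds are assumed values.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    cores: int
    channels: int
    mts: int  # assumed DDR4 transfer rate in MT/s

    @property
    def peak_gbs(self) -> float:
        # channels x MT/s x 8 bytes per transfer (64-bit channel)
        return self.channels * self.mts * 8 / 1000.0

    @property
    def cores_per_channel(self) -> float:
        return self.cores / self.channels

    @property
    def gbs_per_core(self) -> float:
        return self.peak_gbs / self.cores


PLATFORMS = [
    Platform("LGA2011-3, 6 cores", 6, 4, 2400),
    Platform("LGA2066, 6 cores", 6, 4, 2666),
    Platform("LGA2066/TR4, 8 cores", 8, 4, 2666),
    Platform("Dual LGA2011-3, 2x6 cores", 12, 8, 2400),
    Platform("LGA3647, 8 cores", 8, 6, 2666),
    Platform("SP3, 16 cores", 16, 8, 2666),
]

if __name__ == "__main__":
    for p in sorted(PLATFORMS, key=lambda x: x.gbs_per_core, reverse=True):
        print(f"{p.name:28s} {p.cores_per_channel:4.2f} cores/ch "
              f"{p.peak_gbs:6.1f} GB/s {p.gbs_per_core:5.1f} GB/s per core")
```

Read this way, an 8-core/4-channel option lands at the same peak bandwidth per core as 16 cores on 8 channels, so whether it is acceptable depends on how much per-core bandwidth one is willing to trade for more cores.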

September 29, 2017, 09:39
#5
New Member
Join Date: Apr 2016
Posts: 12
In my opinion, the HEDT platforms can be a good alternative to the older dual-CPU platforms (like E5 v3/v4) if you are on a tight budget, and for memory I'd also take into account that you can use faster modules (like 3200MHz). However, the bandwidth will still be lower than on any dual-CPU platform, as you can't overclock the memory much beyond this.
If I were you, I'd buy some second-hand E5 v3/v4 Xeons with a brand-new Supermicro board, and I'd get a board with 16 DIMM slots. This way you will have an upgrade path to more memory capacity, and second-hand CPUs are not very expensive these days.

September 29, 2017, 10:00
#6
Senior Member
Micael
Join Date: Mar 2009
Location: Canada
Posts: 157
Out of curiosity, are you going to buy those commercial licenses, or do you already have them? The system you are discussing will cost close to nothing compared to the licenses.
|
|
September 29, 2017, 15:42
#7
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Not really relevant to your question, but in my opinion this is only part of why fast memory is usually recommended for Ryzen. The main reason, in my opinion, is disappointed AMD fanboys who could not accept that the benchmark results for Ryzen were significantly lower in many CPU-bound scenarios. Overclocked memory improves performance in these cases, but to some extent Intel CPUs would also benefit from faster memory there. But I digress...
No mainstream -> true that.
No TR4 -> not necessarily, it has its pros and cons. I would not say that its power consumption is too high considering the performance. Higher core counts are not really a problem as long as you don't have to pay for parallel licenses. The thing is that the additional cores are less effective, but the simulation still runs faster. Only if you pay thousands of dollars for each additional parallel license do you have to use CPUs with fewer, faster cores.
No old DDR3 -> I strongly disagree. An old dual-socket workstation (Xeon E5-26xx "v1" or v2) is still one of the most cost-efficient ways to get a powerful CFD workstation, mainly for two reasons: the CPUs and the DDR3 reg ECC memory are pretty cheap as long as you buy them used, and these two components can safely be bought used because they fail very rarely. This is my go-to option for a cheap CFD workstation when enough parallel licenses are available. I recently put together a 16-core workstation with 256GB of RAM for less than 1000€.

October 2, 2017, 20:20
#8
New Member
Join Date: Sep 2017
Posts: 4
Thanks for the replies.
@lac: I agree, HEDT platforms seem to be a good and probably our only choice. I showed estimated prices of several example setups to my colleague, and he said he could afford high-end but not server hardware: even with used CPUs it is still too expensive (but we might consider getting a dual-socket motherboard, installing 1 CPU and 4 DIMMs, and upgrading later if necessary).
@Micael: The institute my colleague works at already has a license.
@flotus1: Good points. Indeed, while a Xeon v3/v4 platform is still expensive even with used CPUs and new RAM (it's hard to find any used DDR4), used v1/v2 Xeons and DDR3 DIMMs are ubiquitous and cheap: a 128GB/12(16)-core dual LGA2011 setup can fit in the same price range as a new HEDT system. However, I am concerned about a few things:
1. Most server DDR3 memory is 1333 or 1600 MHz, which won't allow for bandwidth as high as high-clocked DDR4 on an HEDT motherboard, or as the potential 8 DDR4 channels of dual LGA2011-3.
2. The CPUs are also a bit slower and have a higher TDP.
3. Since this will be a home machine, I want to make it as quiet as possible. This could be more difficult and expensive with a server platform, which requires an EATX case, a more powerful PSU, and two CPU coolers that are efficient, quiet and small enough to fit next to each other. For narrow ILM the only choices are quite costly Noctua models, and I wonder how noisy they are in real life.
4. There is no warranty for used parts and often no returns, while BIOS/CPU/MB/RAM compatibility issues or damaged hardware are always slightly possible.
These factors could make HEDT preferable, but it's too early to decide without knowing what is available. A bit later I will suggest a few possible setups with local prices and then ask for opinions again.
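Concern 1 can be put into numbers with theoretical peak bandwidth (channels x MT/s x 8 bytes). A dual-socket LGA2011 board has 4 DDR3 channels per socket, i.e. 8 in total, so the lower DIMM clocks are partly offset by twice the channel count. The configurations below are illustrative assumptions, and real dual-socket throughput also depends on each MPI process working mostly on memory attached to its own socket.

```python
"""Peak memory bandwidth: 8-channel DDR3 (dual LGA2011) vs 4-channel DDR4 (HEDT).

Configurations are illustrative; figures are theoretical peaks, not measurements.
"""

CONFIGS = {
    # name: (total channels, MT/s)
    "2x LGA2011, 8ch DDR3-1333": (8, 1333),
    "2x LGA2011, 8ch DDR3-1600": (8, 1600),
    "2x LGA2011, 8ch DDR3-1866": (8, 1866),
    "HEDT, 4ch DDR4-3200": (4, 3200),
    "HEDT, 4ch DDR4-3600": (4, 3600),
}


def peak_gbs(channels: int, mts: int) -> float:
    """Theoretical peak in GB/s for 64-bit (8-byte) channels."""
    return channels * mts * 8 / 1000.0


if __name__ == "__main__":
    for name, (ch, mts) in CONFIGS.items():
        print(f"{name:28s} {peak_gbs(ch, mts):6.1f} GB/s")
```

On paper, 8 channels of DDR3-1600 match 4 channels of DDR4-3200, and DDR3-1866 (which needs a v2 Xeon) slightly exceeds 4 channels of DDR4-3600, so the older platform is not necessarily behind on bandwidth.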

October 3, 2017, 06:27
#9
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
My choice for a dual-socket LGA2011 motherboard is the ASRock EP2C602-4L/D16, mainly for three reasons:
Standard DDR3-1600 reg ECC modules usually run at DDR3-1866 without any issues. The same applies to DDR3L-1333 once you increase the voltage to 1.5V, which is still within the specifications of the memory. Of course you will need a "v2" Xeon to reach this memory frequency. This way you can save a lot of money by avoiding expensive DDR3-1866.
A workstation like this can be quieter than anything Dell or HP sell you off the shelf; just don't cheap out on the important parts. You would have to buy in the same price range anyway, no matter whether you go for a single- or a dual-socket solution.
For CPU coolers, use the Noctua NH-U14S. They might seem expensive but are worth every cent. For cooling an 8- or 10-core X299 processor you would need a single cooler in the same price range as two of these. Btw, Noctua runs an outlet store on eBay, at least in Germany: http://www.ebay.de/itm/Noctua-NH-U14...0AAOSwqu9VN3Pn
For the power supply I would recommend e.g. a be quiet! Dark Power Pro P11 550W. Again, not the cheapest, but worth the money. A quality power supply like this can be re-used once you upgrade the other parts of the workstation, so the money is not wasted. The same applies to the case: get one in the price range of 100€ or above; it will last several hardware generations and allow for quiet cooling, for example a Nanoxia Deep Silence 5 rev. B.
The only used parts here are the CPUs and memory. They come cheap anyway and, again, they rarely fail. What you should avoid buying used are motherboards and power supplies.
Of course this is a bit slower than dual LGA2011-3, but also much less expensive. Once DDR4 and the newer Xeon processors get cheaper, you can still sell the old motherboard, CPUs and memory and upgrade to a newer generation while keeping most of the other hardware.
Since your budget seems to be limited and your parallel licenses are not, I thought you should know about this option. I'm not trying to convince you that dual LGA2011 is the only way to go; if you want to use a modern HEDT platform, that will also work. I just wanted to point out a less obvious option that is relatively cheap, especially if you need more than 64GB of RAM.
Last edited by flotus1; October 4, 2017 at 12:49.

December 23, 2017, 15:00
#10
New Member
Join Date: Sep 2017
Posts: 4
Hello again,
my colleague decided to postpone the PC purchase until Christmas (that is why I stopped posting here). At the moment we have narrowed our choice down to an LGA2066 system and picked all parts except the RAM, and this is where I need some expert advice again. Which option do you think would work better with an i7-7820X:
1) 64 GB @ 4400 MHz (4x CMK16GX4M2F4400C19 kits), or
2) 128 GB @ 3600 MHz (2x CMK64GX4M4B3600C18 kits)?
Both are the fastest DDR4 DIMMs of their size (8 and 16 GB per module) that are locally available. My colleague insists on 128 GB in order to have some headroom for bigger meshes. But I suspect that RAM bandwidth might become a bottleneck and that calculations, already slow due to the high cell count, would take ages to finish on slower memory (even when using less than 64 GB). The question therefore reduces to estimating the ratio of mesh size to required RAM bandwidth for ANSYS Fluent simulations (e.g. 3D Navier-Stokes with the SST turbulence model). I really wonder whether there are any benchmarks or other information available on that matter. Any advice would be welcome, thank you in advance.
P.S. I am aware that overclocking to such speeds might be unsuccessful, depending on the CPU and motherboard, and we will probably have to live with lower frequencies. But the price difference is not that large, and potentially faster memory may be able to run at tighter timings or lower voltages, which is also an advantage (not to mention that it can later be reused in another system with a chance of better results).
Last edited by Ep.Cygni; December 23, 2017 at 15:15. Reason: adding a remark about OC
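The trade-off can be framed as two separate numbers: the bandwidth gain of the faster kit, and whether the target mesh fits in RAM at all. The sketch below assumes the i7-7820X's 4 memory channels run at the kits' rated speeds (not guaranteed, as the P.S. notes) and uses the rough 1-4 GB per million cells range that flotus1 gives further down this thread for Fluent aerodynamics cases; the 30M-cell mesh size comes from the first post.

```python
"""64 GB @ DDR4-4400 vs 128 GB @ DDR4-3600 on a 4-channel CPU.

Assumptions: both kits run at rated speed on 4 channels, and Fluent needs
roughly 1-4 GB per million cells (rule of thumb, solver dependent).
"""

CHANNELS = 4
KITS = {
    "64 GB @ DDR4-4400": (64, 4400),
    "128 GB @ DDR4-3600": (128, 3600),
}
GB_PER_MCELL = (1.0, 4.0)  # low end (SP SIMPLE) and high end (DP coupled)
MESH_MCELLS = 30  # mesh size from the first post


def peak_gbs(mts: int, channels: int = CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s for 8-byte-wide channels."""
    return channels * mts * 8 / 1000.0


def ram_needed_gb(mcells: float) -> tuple[float, float]:
    """(low, high) RAM estimate in GB for a mesh of `mcells` million cells."""
    lo, hi = GB_PER_MCELL
    return mcells * lo, mcells * hi


def fit_verdict(size_gb: float, need: tuple[float, float]) -> str:
    """Classify whether installed RAM covers the estimated range."""
    lo, hi = need
    if size_gb >= hi:
        return "fits"
    if size_gb >= lo:
        return "fits only at the low end"
    return "does not fit"


if __name__ == "__main__":
    need = ram_needed_gb(MESH_MCELLS)
    print(f"{MESH_MCELLS}M cells need roughly {need[0]:.0f}-{need[1]:.0f} GB")
    for name, (size_gb, mts) in KITS.items():
        print(f"{name}: {peak_gbs(mts):.1f} GB/s peak, "
              f"{MESH_MCELLS}M-cell case {fit_verdict(size_gb, need)}")
    print(f"peak bandwidth ratio: {peak_gbs(4400) / peak_gbs(3600):.3f}")
```

If the model runs entirely in RAM either way, the faster kit can be at most about 22% faster on a fully bandwidth-bound run. If the case does not fit and the OS starts swapping, the slowdown is typically far larger than that, which is the argument for capacity.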
|
October 5, 2018, 08:20
#11
New Member
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
I'm currently looking for a rack server for CFD simulations, Fluent specifically. I'm torn between a Threadripper 2990WX and a pair of Xeon Gold 6142s. My VAR wrote to me about the AVX extensions and said Fluent makes little use of them. This, together with memory bandwidth, is the key point for me. I'm evaluating the AMD solution to save money, as you can imagine. In your opinion, are these two solutions comparable in terms of performance?

October 5, 2018, 08:36
#12
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
I cannot stress this enough: the TR 2990WX is a horrible choice for CFD. The best case is that you deactivate the 2 dies that have no direct access to memory and get the performance of a TR 2950X, but you still paid extra for those cores. The TR 2990WX is a niche product for workloads that can mostly run in cache, which in general does not apply to most CFD solvers.
If you want a cheap option from AMD for CFD workloads, get 1-2 Epyc 7301.

October 5, 2018, 11:55
#13
New Member
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
Instead, Epyc should have
Thanks a lot, I will think about it!

October 5, 2018, 12:13
#14
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Not having enough memory bandwidth for 32 cores is only part of the problem. But yes, 4 channels are not nearly enough to put 32 cores to use in CFD.
The other part of the problem is the "unconventional" NUMA topology. A TR 2990WX consists of 4 dies with 8 cores each. The Epyc variant of this CPU (e.g. Epyc 7601) has one memory controller with two channels for each die. In order to make the CPU suitable for X399 motherboards, AMD had to (or decided to) cut 2 of the dies off from their memory controllers. These dies have no direct access to memory: their memory traffic has to be routed over the Infinity Fabric and then processed by the remaining memory controllers on the other 2 dies. This imposes a huge penalty on memory bandwidth and latency. A CFD simulation running on all 32 cores of this CPU would run significantly slower than one using only the 16 cores on the dies that have direct memory access. And that CPU already exists in the form of the much cheaper TR 2950X.
Then again, even a TR 2950X only has 4 memory channels for 16 cores. For parallel CFD it gets beaten by an Epyc 7301, which runs at lower clock speeds but has 8 memory channels for its 16 cores, making it one of the best-value CPUs for this application.
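The NUMA argument above can be tallied per core. The sketch below uses the die and channel layout described in the post, assumes every chip runs DDR4-2666 (to keep the comparison about topology rather than DIMM speed), and counts only theoretical peak bandwidth. It does not model the extra latency of the two 2990WX dies that reach memory through the Infinity Fabric, which makes their real situation worse than the per-core average suggests.

```python
"""Memory channels, dies without local memory, and peak bandwidth per core.

Die/channel layout as described in the post; DDR4-2666 is an assumed speed
for all chips. Latency penalties of remote memory access are not modeled.
"""
from dataclasses import dataclass

MTS = 2666  # assumed DDR4 speed for every chip


@dataclass(frozen=True)
class Chip:
    name: str
    dies: int
    cores_per_die: int
    dies_with_memory: int
    channels_per_memory_die: int = 2

    @property
    def cores(self) -> int:
        return self.dies * self.cores_per_die

    @property
    def channels(self) -> int:
        return self.dies_with_memory * self.channels_per_memory_die

    @property
    def cores_without_local_memory(self) -> int:
        return (self.dies - self.dies_with_memory) * self.cores_per_die

    @property
    def gbs_per_core(self) -> float:
        # channels x MT/s x 8 bytes, shared evenly across all cores
        return self.channels * MTS * 8 / 1000.0 / self.cores


CHIPS = [
    Chip("TR 2990WX", dies=4, cores_per_die=8, dies_with_memory=2),
    Chip("TR 2950X", dies=2, cores_per_die=8, dies_with_memory=2),
    Chip("Epyc 7301", dies=4, cores_per_die=4, dies_with_memory=4),
]

if __name__ == "__main__":
    for c in CHIPS:
        print(f"{c.name:10s} {c.cores:2d} cores, {c.channels} channels, "
              f"{c.cores_without_local_memory:2d} cores without local memory, "
              f"{c.gbs_per_core:5.2f} GB/s per core")
```

Under these assumptions an Epyc 7301 has about twice the peak bandwidth per core of a 2950X and four times that of a 2990WX.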

October 8, 2018, 03:53
#15
New Member
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
Nothing else to ask; it confirms my thoughts: either an Epyc or a Xeon platform, nothing else, since the need for ECC memory rules out consumer platforms. Thanks a lot, sir!

October 16, 2018, 15:36
#16
New Member
Matt M
Join Date: Oct 2018
Posts: 2
I also have some questions about building a cost-effective machine for ANSYS Fluent, if anyone could give advice. I'm a college student on a solar vehicle team; we use Fluent to test many different shell designs for the car before we build it, and we are currently looking to improve our testing workflow with a faster computer. These are my questions:
1. After reading much discussion on the forum, it seems that an AMD Epyc system is the best way to go unless we want to go the used route with a Xeon v2 or v3. With an Epyc system, however, we were wondering which configuration would be most cost-efficient (lowest ratio of a given simulation's run time to the total cost of the system).
Possible components:
- Epyc 7351P or 7401P
- 8 sticks of 4 or 8GB DDR4 ECC memory
- Supermicro H11SSL-NC motherboard
- Power supply (Platinum, 750W+?)
- SP3 socket cooler
4 different configs: [attached image: EPYC workstation possible configs.PNG]
- The P refers to a processor variant that can't be used in a dual-CPU board (which is why it is cheaper than the non-P equivalent on the Newegg price page).
- Bottom line: how much performance would be gained by using 8GB sticks vs 4GB ones, and likewise for the 24- vs 16-core CPU? Are there things I'm not considering?
2. How fast can the final system run a simulation, if a good estimate of the kind of simulation our team usually runs is the sedan_4m benchmark on the ANSYS Fluent benchmark page? That page lists a 32 "core" CPU (I think the benchmark page lists the thread count, but I'm not sure) performing 4032 "benchmarks" per day according to their benchmarking terminology. Do "benchmarks" refer to iterations? There is a lot of confusion on this, but we just want to get an idea of how fast simulations will run on a new system.
3. There is an option for GPU acceleration in Fluent. With a proposed system such as this, would a GPU even contribute in any meaningful way (we have a 1080 Ti)? If it doesn't, should it still be kept in the system for post-processing?
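On question 2: assuming the rating follows the definition ANSYS gives for its Fluent benchmarks, i.e. the number of times the benchmark case could be run in a 24-hour period (86,400 s divided by the wall-clock time of one benchmark run), a rating converts directly into seconds per run. As far as I understand, one benchmark run is a fixed number of solver iterations rather than a single iteration; the actual count has to be taken from the benchmark's description, so the iteration numbers below are hypothetical placeholders, and the projection assumes your own case progresses at the same time per iteration.

```python
"""Convert a Fluent benchmark rating (runs per day) into wall-clock estimates.

Assumes rating = 86400 / (seconds per benchmark run). ITERS_PER_RUN and
ITERS_NEEDED are hypothetical placeholders, not values from ANSYS.
"""

SECONDS_PER_DAY = 86_400
RATING = 4032         # sedan_4m figure quoted in the post
ITERS_PER_RUN = 10    # hypothetical: check the benchmark description
ITERS_NEEDED = 2000   # hypothetical: iterations your own case needs


def seconds_per_run(rating: float) -> float:
    """Wall-clock seconds for one benchmark run."""
    return SECONDS_PER_DAY / rating


def projected_hours(rating: float, iters_per_run: int, iters_needed: int) -> float:
    """Hours for a job of `iters_needed` iterations at the benchmark's pace."""
    seconds_per_iter = seconds_per_run(rating) / iters_per_run
    return seconds_per_iter * iters_needed / 3600.0


if __name__ == "__main__":
    print(f"{seconds_per_run(RATING):.1f} s per benchmark run")
    hours = projected_hours(RATING, ITERS_PER_RUN, ITERS_NEEDED)
    print(f"~{hours:.2f} h for {ITERS_NEEDED} iterations "
          f"(if one run = {ITERS_PER_RUN} iterations)")
```

A rating of 4032 therefore corresponds to roughly 21 s per benchmark run, not to 4032 iterations per day.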
|
October 17, 2018, 05:51 |
|
#17 | |||
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49 |
4GB vs. 8GB DIMMs: this does not directly impact performance for cases where you have enough total system memory; larger DIMMs simply let you run larger cases in-core. I would definitely go with 8GB DIMMs, since 32GB of RAM really is not a lot for a workstation like this. And then there is the issue of NUMA topology: lightly threaded workloads, as they occur in pre- and post-processing, will have to use memory from other NUMA nodes sooner the less memory per node you have. This can also result in lower performance. If you absolutely have to give up one of the upgrades, I would go with the 16-core CPU but 8x8GB of RAM. What is the cell count of your largest simulations? Our Formula Student team was happy to get their hands on a machine with 256GB of RAM because their cell counts were that high.
For post-processing alone this graphics card is definitely overkill. Something in the range of a GTX 1050 Ti would be more than enough.

October 19, 2018, 18:21
#18
New Member
Matt M
Join Date: Oct 2018
Posts: 2
Thank you very much for the information, @flotus1! We typically run shell simulations with 4-6M cells. Would 32GB of RAM in this proposed system be enough to handle that (and would there be any headroom to add a few million more cells if needed)? If so, we could get 32GB now and wait until memory prices go down over the next few months or years before going all out with 64 or 128GB.
|
|
October 19, 2018, 18:41
#19
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
With Fluent, for standard aerodynamics simulations you are dealing with 1-4GB of RAM per million cells, with the single-precision SIMPLE solver at the lower end and the double-precision coupled solver at the upper end of that estimate.
If you get a single-socket motherboard with 16 DIMM slots, you can upgrade memory later while keeping the old modules, ideally by adding DIMMs very similar to the ones you already have. That Supermicro board only has 8 DIMM slots, so you would have to replace the RAM in order to upgrade.
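Applied to the 4-6M-cell cases from the previous post, the 1-4 GB per million cells rule of thumb gives a quick capacity check. The sketch below also sets aside some memory for the operating system and pre-/post-processing; that 4 GB reserve is an assumption, and the real per-cell figure depends on solver, precision and physical models.

```python
"""Rough Fluent RAM sizing from the 1-4 GB per million cells rule of thumb.

Assumptions: 1 GB/Mcell (single-precision SIMPLE) to 4 GB/Mcell
(double-precision coupled); a hypothetical 4 GB reserve for OS and GUI.
"""

GB_PER_MCELL = {"SP SIMPLE (low end)": 1.0, "DP coupled (high end)": 4.0}
RESERVE_GB = 4.0  # hypothetical headroom for OS, GUI, post-processing


def max_mcells(installed_gb: float, gb_per_mcell: float, reserve_gb: float = RESERVE_GB) -> float:
    """Largest mesh (million cells) that should still run in-core."""
    return max(installed_gb - reserve_gb, 0.0) / gb_per_mcell


if __name__ == "__main__":
    for installed in (32, 64):  # 8x4 GB and 8x8 GB on an 8-slot board
        for label, rate in GB_PER_MCELL.items():
            print(f"{installed:3d} GB, {label}: up to ~{max_mcells(installed, rate):.0f}M cells")
```

With these assumptions 32 GB covers a 6M-cell case even at the high end, but leaves only about 1M cells of headroom for a double-precision coupled setup.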

February 13, 2019, 08:01
#20
Senior Member
I strongly suggest you buy old Xeon workstations. For example, I now use a workstation with two Xeon E5-2650 v2 CPUs (2.60GHz, 8 cores/16 threads each) and 32GB RAM. It is very cheap now. You can buy two workstations to set up a small cluster; we now have a total of 32 cores and 64 threads to run our simulations.
Another important suggestion concerns your operating system: in my experience, Fluent runs clearly faster on Linux than on Windows. Hope this helps.