Building a "home workstation" for ANSYS Fluent

Old   September 26, 2017, 09:41
Default Building a "home workstation" for ANSYS Fluent
  #1
New Member
 
Join Date: Sep 2017
Posts: 4
Rep Power: 9
Ep.Cygni is on a distinguished road
Hello.
A colleague asked me to help him build a new home PC next month. Its sole purpose is to run flow simulations with large meshes, mainly using ANSYS Fluent or corporate in-house code, with Fluent being the primary concern. The license will also be corporate, covering as many cores as needed.

I have some expertise in PC hardware, but almost none in flow simulation - my own field of research is heat transfer. When solving heat conduction problems in Fluent, using just the energy equation, with mesh sizes from very small (20K) to relatively large (40M cells), I noticed that parallel performance did not scale very well - e.g. 4 threads were just 2-2.5x faster than serial, and it was more time-efficient to run a few serial cases simultaneously. But I realize this may not be the case for a Navier-Stokes solver, which can behave differently in parallel. That's why I came to this forum for recommendations.
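To put the scaling figure above into numbers, here is a minimal sketch that converts a measured speedup into parallel efficiency and into an implied serial fraction. It assumes Amdahl's law is the right model, which is only an assumption: on a single node the limiter may just as well be memory bandwidth. The function names are illustrative.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup: float, workers: int) -> float:
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / n)."""
    if workers < 2:
        raise ValueError("need at least 2 workers to infer a serial fraction")
    return (workers / speedup - 1.0) / (workers - 1.0)


if __name__ == "__main__":
    workers = 4
    for speedup in (2.0, 2.5):  # range reported in the post
        eff = parallel_efficiency(speedup, workers)
        frac = amdahl_serial_fraction(speedup, workers)
        print(f"{workers} threads at {speedup}x: "
              f"efficiency {eff:.1%}, implied serial fraction {frac:.2f}")
```

An efficiency of 50-62.5% on 4 threads is consistent with the observation that several simultaneous serial runs gave better throughput.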

My colleague's budget is not very tight for a home PC, e.g. a platform with a Ryzen 7 1800X and 64 GB RAM is totally fine for him price-wise. However, I'd like to keep cost efficiency within reasonable limits and avoid $2000 high-end CPUs in favor of slightly slower but not overpriced models. Server platforms are not considered since the extra features they offer are unnecessary (except RAM capabilities maybe).

With that said, I'd like to ask a few questions. They are not about the exact config, which will be chosen later according to budget and availability (although suggestions are still welcome), but rather about the influence of different factors on performance in the specific task of flow simulation in ANSYS Fluent with large meshes (say 30M cells) using the density-based solver.

The questions are:

1. How well does Fluent's Navier-Stokes parallel performance scale with thread count? (And what is better then - more cores or higher per-core performance?)

2. How important is RAM bandwidth for this kind of workload? Is it worth buying a TR4 or LGA2066 platform for extra memory channels, or high-frequency DIMMs for a 2-channel system? (The second question is more relevant to Intel platforms since AM4/TR4 requires fast RAM anyway.)

3. How do Skylake-X processors with their new mesh topology and cache architecture compare to previous generations in terms of Fluent performance? (Reviews say they have higher inter-core data transfer latency, but does it matter in CFD that much?)

4. Similarly, do Threadripper CPUs suffer from non-uniform Infinity Fabric latency (different access time between cores in same and different dies) when used in CFD? (Again got this info from reviews, but they mostly focus on games unfortunately.)

5. Maybe a stupid question that I've had for a while but couldn't find any info on it: does Fluent (and CFD-post) use GPU acceleration in 3D scenes and is it worth getting a powerful graphics card for the new machine? (Forgot to say - GPGPU is not considered due to solver limitations and memory requirements.)

Any advice and clues are welcome, thank you in advance.

Last edited by Ep.Cygni; September 26, 2017 at 18:06. Reason: caught a few typos

Old   September 26, 2017, 18:39
Default
  #2
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Nice, a list of well-formulated questions:

1. Fluent can scale pretty well on large numbers of cores: depending on the case and the computer architecture, up to hundreds or even thousands of cores. That being said, scaling on a single node is usually limited by one factor: memory bandwidth. Apart from that, higher per-core performance is usually more desirable than very high core counts, partly because most tasks in pre- and post-processing do not scale on many cores.

2. Core counts and memory bandwidth should be balanced. An extremely expensive 18-core desktop processor will be severely limited by a lack of bandwidth and is thus a waste of money, especially for CFD.
The mainstream platforms with only two memory channels are not an option if you can afford more.
Btw: in my opinion the notion that Ryzen has to be paired with faster memory and Intel processors do not is a myth to some extent. Faster memory helps processors from both manufacturers in memory-intensive workloads.

3. For CFD applications, Skylake-X in its default configuration is slightly slower than its predecessor in terms of per-core performance. One of the reasons is indeed the cache architecture that is once again slower than in the previous generation.

4. Fluent and its MPI parallelization do not suffer as much from the slower inter-CCX communication as many mainstream applications do. The reason being that MPI tends to minimize data transfer between cores to some extent.

5. You don't necessarily need an expensive graphics card for pre- and postprocessing. I would recommend a GTX 1050TI 4GB as a minimum or a GTX 1060 6GB as a step-up.
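The balance between core count and memory bandwidth from point 2 can be illustrated with a rough sketch. It computes the theoretical peak bandwidth of a DDR3/DDR4 memory system (channels x transfer rate x 8 bytes per 64-bit transfer) and divides it by the core count. The DDR4-2666 speed and the example configurations are assumptions chosen for illustration only, and sustained bandwidth in a real solver will be lower than this peak.

```python
BYTES_PER_TRANSFER = 8  # one DDR3/DDR4 channel is 64 bits wide


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (not what a solver actually sustains)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


def bandwidth_per_core_gbs(channels: int, mega_transfers_per_s: float, cores: int) -> float:
    """Peak bandwidth available to each core if all cores are busy."""
    return peak_bandwidth_gbs(channels, mega_transfers_per_s) / cores


if __name__ == "__main__":
    assumed_speed = 2666  # DDR4-2666, an assumed value for illustration
    examples = [
        ("2 channels,  8 cores (mainstream)", 2, 8),
        ("4 channels,  8 cores (HEDT)", 4, 8),
        ("4 channels, 18 cores (HEDT)", 4, 18),
    ]
    for label, channels, cores in examples:
        per_core = bandwidth_per_core_gbs(channels, assumed_speed, cores)
        print(f"{label}: {per_core:.2f} GB/s per core")
```

Under these assumptions the 18-core quad-channel part ends up with less peak bandwidth per core than an 8-core dual-channel mainstream chip, which is the imbalance described above.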


My advice would be to determine how many parallel licenses your colleague has available, how many cells his largest simulation models consist of and if he is using solver types or additional physical models that further increase the memory requirement.
That should help narrow down which platform is best for him.
Edit: should have read the whole post. "Unlimited" amount of parallel licenses and ~30M cells

Quote:
Server platforms are not considered since the extra features they offer are unnecessary (except RAM capabilities maybe)
You might want to reconsider this statement. Server platforms, especially dual-socket configurations, are very favorable for CFD due to the increased memory bandwidth from 2 CPUs.

Last edited by flotus1; September 27, 2017 at 05:25.

Old   September 28, 2017, 07:40
Default
  #3
lac
New Member
 
Join Date: Apr 2016
Posts: 12
Rep Power: 10
lac is on a distinguished road
My experience with the incompressible SIMPLE solver in Fluent was that 8GB of memory was barely enough for cases with 4M cells. But I never solved bigger cases with Fluent. Recently I have been using OpenFOAM, and 64GB of memory is only enough for around 25M cells when using simpleFoam with k-epsilon models. So maybe Ryzen is not the best option, as I think it is limited to 64GB of memory.
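The two data points above can be turned into a rough memory-per-cell rule of thumb. The sketch below does only that arithmetic; the dictionary keys and function names are made up for illustration, and extrapolating these figures to other solvers (e.g. the density-based solver) or physical models is an assumption, since those may need noticeably more memory.

```python
# GB of RAM per million cells, derived from the figures quoted in the post
GB_PER_MCELL = {
    "fluent_incompressible_simple": 8 / 4,    # 8 GB barely enough for 4M cells
    "openfoam_simplefoam_keps": 64 / 25,      # 64 GB enough for about 25M cells
}


def estimate_ram_gb(million_cells: float, gb_per_mcell: float) -> float:
    """Linear extrapolation of memory demand; a rough estimate, not a guarantee."""
    return million_cells * gb_per_mcell


if __name__ == "__main__":
    target_mcells = 30  # mesh size mentioned by the original poster
    for name, ratio in GB_PER_MCELL.items():
        needed = estimate_ram_gb(target_mcells, ratio)
        print(f"{name}: {ratio:.2f} GB/Mcell -> about {needed:.0f} GB for {target_mcells}M cells")
```

With either ratio a 30M-cell case lands at or above roughly 60 GB, which is why a 64GB ceiling looks tight.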

Old   September 28, 2017, 16:03
Default
  #4
New Member
 
Join Date: Sep 2017
Posts: 4
Rep Power: 9
Ep.Cygni is on a distinguished road
Thank you flotus1 for informative and helpful answers.
Thanks to lac for the advice too.

Quote:
Originally Posted by flotus1 View Post
Btw: in my opinion the notion that Ryzen has to be paired with faster memory and Intel processors do not is a myth to some extent. Faster memory helps processors from both manufacturers in memory-intensive workloads.
The need for high-speed RAM on new AMD platforms is a common recommendation given in Ryzen processor reviews. It is based on the fact that Infinity Fabric bus clock is fixed at RAM frequency, and with higher clocked DIMMs the overall CPU performance can notably improve in certain applications (particularly in games, as tests have shown). That's where I got my initial opinion from. However, as you mentioned in further answers, CPU's inter-core bus bandwidth is not as important for CFD as RAM bandwidth. Therefore fast RAM is required in any case, and I don't take CPU bus into account anymore.

At this point I came to the following conclusions about platform choice:
- no mainstream, because RAM bandwidth and size is not enough;
- no TR4, due to the unnecessarily high core count (8 and higher) for the available RAM bandwidth, high cost (motherboards are rare and expensive) and high TDP;
- no old DDR3 platforms.

Thus, I think about 5 options so far (including server ones as you suggested):
LGA2011-3 with a 6-core CPU and 4 RAM channels,
LGA2066 with a 6-core CPU and 4 RAM channels (cheaper and probably slower due to new caches),
Dual LGA2011-3 with 2x4/2x6-core CPUs and 8 RAM channels,
LGA3647 with a 6/8-core CPU and 6 RAM channels (expensive and very hard to get at the moment),
SP3 with an 8/16-core CPU and 8 RAM channels (nearly impossible to get at the moment).

Am I right about the cores-to-RAM-channels ratio, or should we consider an 8-core/4-channel option too (and then the TR4 socket as well)?
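For reference, the cores-per-channel ratio of the options listed above can be tabulated with a few lines. This sketch uses only the core and channel counts given in the post; it deliberately ignores memory speed and per-core performance, so it is a first-order comparison and nothing more. The labels are illustrative.

```python
# (label, total cores, total memory channels), taken from the list in the post
OPTIONS = [
    ("LGA2011-3, 6 cores", 6, 4),
    ("LGA2066, 6 cores", 6, 4),
    ("Dual LGA2011-3, 2x4 cores", 8, 8),
    ("Dual LGA2011-3, 2x6 cores", 12, 8),
    ("LGA3647, 6 cores", 6, 6),
    ("LGA3647, 8 cores", 8, 6),
    ("SP3, 8 cores", 8, 8),
    ("SP3, 16 cores", 16, 8),
    ("HEDT, 8 cores / 4 channels", 8, 4),
]


def cores_per_channel(cores: int, channels: int) -> float:
    """Lower values mean more memory channels available to each core."""
    return cores / channels


def rank_options(options):
    """Sort options from fewest to most cores per memory channel."""
    return sorted(options, key=lambda opt: cores_per_channel(opt[1], opt[2]))


if __name__ == "__main__":
    for label, cores, channels in rank_options(OPTIONS):
        print(f"{label}: {cores_per_channel(cores, channels):.2f} cores per channel")
```

All listed options fall between 1 and 2 cores per channel, so the 8-core/4-channel variant sits at the upper end of that range rather than outside it.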

Regarding memory size, in case of HEDT platforms we will most likely get 64GB RAM (4x16GB) with the possibility to add another 64 GB in the future. In case of Dual LGA2011-3 or SP3, we'll need to have 8 DIMMs from the beginning to use all channels. This is a disadvantage, because 8x16GB might be out of budget right now, and 8x8 will not allow a later upgrade to max capacity by adding more DIMMs (especially with cheaper dual-socket WS motherboards that have only 8 RAM slots). Finally, LGA3647 and 6 DIMMs is more of a hypothetical option which we most likely won't be able to afford.

Last edited by Ep.Cygni; September 29, 2017 at 04:43. Reason: added EPYC option

Old   September 29, 2017, 09:39
Default
  #5
lac
New Member
 
Join Date: Apr 2016
Posts: 12
Rep Power: 10
lac is on a distinguished road
In my opinion, the HEDT platforms can be a good alternative to the older dual-CPU platforms (like E5 v3-v4) if you are on a tight budget, and for the memory I'd also take into account the fact that you can use faster memory (like 3200MHz). However, the bandwidth will still be less than for any dual-CPU platform, as you can't OC the memory much more than this.

If I were you, I'd buy some second-hand E5-v3/v4 Xeons with a brand new Supermicro board. I'd also get a board with 16 DIMM slots. This way you will have an upgrade path to more memory capacity, and second-hand CPUs are not very expensive these days.

Old   September 29, 2017, 10:00
Default
  #6
Senior Member
 
Micael
Join Date: Mar 2009
Location: Canada
Posts: 157
Rep Power: 18
Micael is on a distinguished road
Out of curiosity, are you going to buy those commercial licences, or do you already have them? The system you are discussing will cost close to nothing compared to the licences.

Old   September 29, 2017, 15:42
Default
  #7
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Quote:
Originally Posted by Ep.Cygni View Post
The need for high-speed RAM on new AMD platforms is a common recommendation given in Ryzen processor reviews. It is based on the fact that Infinity Fabric bus clock is fixed at RAM frequency, and with higher clocked DIMMs the overall CPU performance can notably improve in certain applications (particularly in games, as tests have shown). That's where I got my initial opinion from. However, as you mentioned in further answers, CPU's inter-core bus bandwidth is not as important for CFD as RAM bandwidth. Therefore fast RAM is required in any case, and I don't take CPU bus into account anymore.
It is true that the inter-CCX communication speed in Ryzen is linked to the memory speed and thus faster memory equals better performance.
Not really relevant to your question, but in my opinion this is only part of why fast memory is usually recommended for Ryzen. The main reason, in my opinion, is disappointed AMD fanboys who could not accept that the benchmark results for Ryzen were significantly lower in many CPU-bound scenarios. Overclocked memory improves performance in these cases, but to some extent Intel CPUs would also benefit from faster memory here. But I digress...

Quote:
Originally Posted by Ep.Cygni View Post
At this point I came to the following conclusions about platform choice:
- no mainstream, because RAM bandwidth and size is not enough;
- no TR4, due to the unnecessarily high core count (8 and higher) for the available RAM bandwidth, high cost (motherboards are rare and expensive) and high TDP;
- no old DDR3 platforms.
No mainstream
-> true that

no TR4
-> not necessarily, it has its pros and cons. I would not say that its power consumption is too high considering the performance. Higher core counts are not really a problem as long as you don't have to pay for parallel licenses. The thing is that the additional cores are less effective, but the simulation still runs faster. Only if you pay thousands of dollars for each additional parallel license do you have to use more CPUs with lower core counts.

no old DDR3 ->
I strongly disagree. An old dual-socket workstation (Xeon E5-26xx "v1" or v2) is still one of the most cost-efficient ways to get a powerful CFD workstation, mainly for two reasons: the CPUs and the DDR3 reg ECC are pretty cheap as long as you buy them used, and these two components can be bought used because they fail very rarely. This is my go-to option for a cheap CFD workstation when enough parallel licenses are available. I recently put together a 16-core workstation with 256GB of RAM for less than 1000€.


Quote:
Originally Posted by Ep.Cygni View Post
Am I right about the cores-to-RAM-channels ratio, or should we consider an 8-core/4-channel option too (and then the TR4 socket as well)?
For the HEDT Platforms you can go for CPUs with at least 8 cores, the total price/performance ratio of the workstation will be better. Especially since DDR4 RAM is quite expensive these days.


Quote:
Originally Posted by Ep.Cygni View Post
Regarding memory size, in case of HEDT platforms we will most likely get 64GB RAM (4x16GB) with the possibility to add another 64 GB in the future. In case of Dual LGA2011-3 or SP3, we'll need to have 8 DIMMs from the beginning to use all channels. This is a disadvantage, because 8x16GB might be out of budget right now, and 8x8 will not allow a later upgrade to max capacity by adding more DIMMs (especially with cheaper dual-socket WS motherboards that have only 8 RAM slots). Finally, LGA3647 and 6 DIMMs is more of a hypothetical option which we most likely won't be able to afford.
Whatever dual-socket board you buy, just make sure it has 16 DIMM slots. This will allow for future memory upgrades. If dual LGA 3647 is too expensive you should consider used CPUs for dual-socket 2011-3. Their retail prices did not drop at all and you get less CFD performance/$ than with LGA 3647.

Old   October 2, 2017, 20:20
Default
  #8
New Member
 
Join Date: Sep 2017
Posts: 4
Rep Power: 9
Ep.Cygni is on a distinguished road
Thanks for the replies.

@lac:
I agree, HEDT platforms seem to be a good and probably our only choice. I showed estimated prices of several example setups to my colleague and he said he could afford high-end, but not server - even with used CPUs it is still too expensive (but we might consider getting a dual-socket motherboard and installing 1 CPU and 4 DIMMs, and upgrade later if necessary).

@Micael:
The institute my colleague works in already has a license.

@flotus1:
Good points. Indeed, while Xeon v3/v4 platform is still expensive with used CPUs and new RAM (it's hard to find any used DDR4), used v1/v2 Xeons and DDR3 DIMMs are ubiquitous and cheap - a 128GB/12(16)-core Dual LGA2011 setup can fit in the same price range as a new HEDT. However, I am concerned about a few things:

1. Most server DDR3 memory is 1333 or 1600 MHz, which won't allow for bandwidth as great as with high-clocked DDR4 in a HEDT motherboard, or with potential 8 DDR4 channels in Dual LGA2011-3.
2. The CPUs are also a bit slower and have higher TDP.
3. Since this will be a home machine, I want to make it as silent as possible. This could be more difficult and expensive with a server platform, which requires an EATX case, a more powerful PSU, and two CPU coolers that are efficient, quiet and small enough to fit near each other. For narrow ILM the only choices are quite costly Noctua models, and I wonder how noisy they are in real life.
4. There is no warranty for used parts and often no cashback, while BIOS/CPU/MB/RAM compatibility issues or damaged hardware are always a possibility.

These factors could make HEDT more preferable, but it's too early to decide without knowing what is available. A bit later I will suggest a few possible setups with local prices and then ask for opinions again.

Old   October 3, 2017, 06:27
Default
  #9
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
My choice for a dual-socket LGA2011 motherboard is the ASRock EP2C602-4L/D16, mainly for three reasons:
  • still available today and for a reasonable price of ~350€
  • uses standard square-ilm cooler mounts
  • allows memory tweaking which most competitors do not
So you can buy it new with full warranty, use most standard air coolers and use cheap memory.
Standard DDR3-1600 reg ECC modules usually run at DDR3-1866 without any issues. The same applies to DDR3L-1333 once you increase the voltage to 1.5V which is still within the specifications of the memory. Of course you will need a "v2" Xeon to reach this memory frequency. This way you can save a lot of money by avoiding expensive DDR3-1866.
A workstation like this can be more silent than anything DELL or HP sell you off the shelf, just don't cheap out on the important parts. You would have to buy in the same price range anyway no matter if you go for a single- or a dual-socket solution.

For CPU coolers use Noctua NH-U14s. They might seem expensive but are worth every cent. For cooling an 8- or 10-core X299 processor you would need a single cooler in the same price range as two of these. Btw. Noctua runs an outlet store on eBay, at least in Germany: http://www.ebay.de/itm/Noctua-NH-U14...0AAOSwqu9VN3Pn
For power supply I would recommend e.g. Bequiet Dark power Pro P11 550W. Again, not the cheapest but worth your money. A quality power supply like this can be re-used once you upgrade the other parts of the workstation, so the money is not wasted.
Same applies to the case. Get one in the price range 100€ or above, it will last several hardware generations and allow for quiet cooling. For example a Nanoxia Deep Silence 5 rev. B.

The only used parts here are CPUs and memory. They come cheap anyway and again, they rarely fail. What you should avoid buying used are Motherboards and power supplies.
Of course this is a bit slower than dual-LGA2011-3, but also much less expensive. Once DDR4 and the newer Xeon processors get cheaper you can still sell the old motherboard, CPUs and memory and upgrade to a newer generation while keeping most of the other hardware. Since your budget seems to be limited and your parallel licenses are not, I thought you should know about this option.

Not trying to convince you that dual-LGA2011 is the only way to go. If you want to use a modern HEDT platform that will also work. I just wanted to point out a less obvious option that is relatively cheap especially if you need more than 64GB of RAM.

Last edited by flotus1; October 4, 2017 at 12:49.

Old   December 23, 2017, 15:00
Default
  #10
New Member
 
Join Date: Sep 2017
Posts: 4
Rep Power: 9
Ep.Cygni is on a distinguished road
Hello again,
my colleague decided to wait with the PC purchase till Christmas (that is why I stopped posting here).

At the moment we have narrowed down our choice to an LGA2066 system and picked all parts except RAM - this is where I need some expert advice again.

Which option do you think would work better with an i7-7820X:
1) 64 GB @4400 MHz (4x CMK16GX4M2F4400C19 sets)
or
2) 128 GB @3600 MHz (2x CMK64GX4M4B3600C18 sets)?

Both are the fastest DDR4 DIMMs of their size (8 and 16 GB per module) that are locally available.

My colleague insists on 128 GB, in order to have some headroom for bigger meshes. But I suspect that RAM bandwidth might become a bottleneck and that calculations, slow already due to high cell count, would take ages to finish on slower memory (even if using less than 64 GB).

The question is therefore reduced to "mesh size vs required RAM bandwidth ratio" estimation for ANSYS Fluent simulations (e.g. 3D Navier-Stokes with SST turbulence model) - I really wonder if there are any benchmarks or other information available on that matter.
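As a rough way to frame this trade-off, the sketch below compares the two kits by theoretical peak bandwidth on a quad-channel platform and by the mesh size that would fit in memory. Several things here are assumptions: the rated speeds may not be reachable in practice, the 2.56 GB per million cells figure is the OpenFOAM estimate quoted earlier in the thread and may not match Fluent with the SST model, and peak bandwidth overstates what a solver sustains. The class and field names are illustrative.

```python
from dataclasses import dataclass

CHANNELS = 4            # the i7-7820X on LGA2066 is a quad-channel platform
BYTES_PER_TRANSFER = 8  # one DDR4 channel is 64 bits wide


@dataclass(frozen=True)
class RamOption:
    label: str
    capacity_gb: int
    rated_mts: int  # rated speed in MT/s; may not be achievable in practice

    def peak_bandwidth_gbs(self) -> float:
        """Theoretical peak bandwidth if the rated speed is actually reached."""
        return CHANNELS * self.rated_mts * BYTES_PER_TRANSFER / 1000.0

    def max_million_cells(self, gb_per_mcell: float) -> float:
        """Largest mesh that fits in RAM under a linear memory-per-cell assumption."""
        return self.capacity_gb / gb_per_mcell


if __name__ == "__main__":
    assumed_gb_per_mcell = 2.56  # assumption taken from the OpenFOAM figure in this thread
    options = [
        RamOption("64 GB @ 4400", 64, 4400),
        RamOption("128 GB @ 3600", 128, 3600),
    ]
    for opt in options:
        print(f"{opt.label}: peak {opt.peak_bandwidth_gbs():.1f} GB/s, "
              f"fits about {opt.max_million_cells(assumed_gb_per_mcell):.0f}M cells")
```

Under these assumptions the faster kit offers about 22% more peak bandwidth, while the larger kit doubles the mesh size that fits in memory; which matters more depends on whether the target meshes exceed what 64 GB can hold.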

Any advice would be welcome, thank you in advance.

P.S. I am aware that overclocking to such speeds might be unsuccessful, depending on CPU & MB, and we will probably have to live with lower frequencies. But the price difference is not that huge, and potentially faster memory may be able to run at lower timings/voltages, which is also an advantage (not to mention that it can be used later in another system with a chance of getting better results).

Last edited by Ep.Cygni; December 23, 2017 at 15:15. Reason: adding a remark about OC

Old   October 5, 2018, 08:20
Default
  #11
New Member
 
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
Rep Power: 8
Igino Leporati is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Nice, a list of well-formulated questions:

1. Fluent can scale pretty well on large numbers of cores: depending on the case and the computer architecture, up to hundreds or even thousands of cores. That being said, scaling on a single node is usually limited by one factor: memory bandwidth. Apart from that, higher per-core performance is usually more desirable than very high core counts, partly because most tasks in pre- and post-processing do not scale on many cores.

2. Core counts and memory bandwidth should be balanced. An extremely expensive 18-core desktop processor will be severely limited by a lack of bandwidth and is thus a waste of money, especially for CFD.
The mainstream platforms with only two memory channels are not an option if you can afford more.
Btw: in my opinion the notion that Ryzen has to be paired with faster memory and Intel processors do not is a myth to some extent. Faster memory helps processors from both manufacturers in memory-intensive workloads.

3. For CFD applications, Skylake-X in its default configuration is slightly slower than its predecessor in terms of per-core performance. One of the reasons is indeed the cache architecture that is once again slower than in the previous generation.

4. Fluent and its MPI parallelization do not suffer as much from the slower inter-CCX communication as many mainstream applications do. The reason being that MPI tends to minimize data transfer between cores to some extent.

5. You don't necessarily need an expensive graphics card for pre- and postprocessing. I would recommend a GTX 1050TI 4GB as a minimum or a GTX 1060 6GB as a step-up.


My advice would be to determine how many parallel licenses your colleague has available, how many cells his largest simulation models consist of and if he is using solver types or additional physical models that further increase the memory requirement.
That should help narrowing down which platform is best for him.
Edit: should have read the whole post. "Unlimited" amount of parallel licenses and ~30M cells


You might want to reconsider this statement. Server platforms, especially dual-socket configurations, are very favorable for CFD due to the increased memory bandwidth from 2 CPUs.
Hi,
I'm currently looking for a rack server for CFD simulations, specifically Fluent.
I'm torn between a Threadripper 2990WX and a pair of Xeon Gold 6142 CPUs.
My VAR wrote to me about the AVX extensions and said that Fluent makes little use of them.
For me, this is the key point, along with memory bandwidth.
As you can guess, I'm evaluating the AMD solution to save money.
In your opinion, are these two solutions comparable in terms of performance?

Old   October 5, 2018, 08:36
Default
  #12
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
I can not stress this enough: the TR 2990WX is a horrible choice for CFD. Best case is you deactivate the 2 dies that have no direct access to memory and get the performance of a TR 2950X. But you still paid extra for these cores. The TR 2990WX is a niche product for workloads that can mostly run in cache. In general, this does not apply for most CFD solvers.
If you want a cheap option from AMD for CFD workloads, get 1-2 Epyc 7301.

Old   October 5, 2018, 11:55
Default
  #13
New Member
 
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
Rep Power: 8
Igino Leporati is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I can not stress this enough: the TR 2990WX is a horrible choice for CFD. Best case is you deactivate the 2 dies that have no direct access to memory and get the performance of a TR 2950X. But you still paid extra for these cores. The TR 2990WX is a niche product for workloads that can mostly run in cache. In general, this does not apply for most CFD solvers.
If you want a cheap option from AMD for CFD workloads, get 1-2 Epyc 7301.
So you think the TR doesn't have enough memory bandwidth for all of its cores?
Epyc, on the other hand, should have enough.
Thanks a lot, I will think about it!

Old   October 5, 2018, 12:13
Default
  #14
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Not enough memory bandwidth for 32 cores is only part of the problem. But yes, 4 channels are not nearly enough to put 32 cores to use in CFD.
The other part of the problem is the "unconventional" NUMA topology. A TR 2990WX consists of 4 dies with 8 cores each. The Epyc variant of this CPU (e.g. Epyc 7601) has one memory controller with two channels for each die. In order to make the CPU suitable for X399 motherboards, AMD had to / decided to cut the memory controllers from 2 dies. These dies have no direct access to memory. Their memory traffic has to be routed over the infinity fabric and then processed by the remaining memory controllers on the other 2 dies. This imposes a huge penalty for memory bandwidth and latency. A CFD simulation running on all 32 cores of this CPU would run significantly slower than using only 16 cores on the dies that have direct memory access. And this CPU exists in the form of the much cheaper TR 2950X.
Then again, even a TR 2950X only has 4 memory channels for 16 cores. For parallel CFD it gets beaten by an Epyc 7301 which runs lower clock speeds but has 8 memory channels for its 16 cores, making it one of the best value CPUs for this application.
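On a Linux machine, the topology described above can be inspected through sysfs. The following is a minimal sketch that reads `/sys/devices/system/node/node*/meminfo` and `cpulist` and reports NUMA nodes that have CPUs but no local memory. It assumes the kernel exposes the compute-only dies as NUMA nodes with a MemTotal of zero; the function names are illustrative, and `numactl --hardware` gives a similar overview without any code.

```python
import re
from pathlib import Path


def read_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict:
    """Return {node_id: {"mem_kb": int, "cpulist": str}} from Linux sysfs."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}  # not Linux, or no NUMA information exposed
    nodes = {}
    for node_dir in root.glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        meminfo = (node_dir / "meminfo").read_text()
        match = re.search(r"MemTotal:\s+(\d+)\s*kB", meminfo)
        mem_kb = int(match.group(1)) if match else 0
        cpulist = (node_dir / "cpulist").read_text().strip()
        nodes[node_id] = {"mem_kb": mem_kb, "cpulist": cpulist}
    return nodes


def memoryless_nodes(nodes: dict) -> list:
    """IDs of nodes that own CPUs but have no directly attached memory."""
    return [nid for nid, info in sorted(nodes.items())
            if info["mem_kb"] == 0 and info["cpulist"]]


if __name__ == "__main__":
    found = read_numa_nodes()
    for nid, info in sorted(found.items()):
        print(f"node {nid}: {info['mem_kb'] / 1024 ** 2:.1f} GiB, CPUs {info['cpulist']}")
    print("nodes without local memory:", memoryless_nodes(found))
```

Cores on any node reported as memoryless would have to reach memory through another die, which is the situation described for two of the four dies of the TR 2990WX.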

Old   October 8, 2018, 03:53
Wink
  #15
New Member
 
Igino Leporati
Join Date: Oct 2018
Location: Reggio Emilia
Posts: 3
Rep Power: 8
Igino Leporati is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Not enough memory bandwidth for 32 cores is only part of the problem. But yes, 4 channels are not nearly enough to put 32 cores to use in CFD.
The other part of the problem is the "unconventional" NUMA topology. A TR 2990WX consists of 4 dies with 8 cores each. The Epyc variant of this CPU (e.g. Epyc 7601) has one memory controller with two channels for each die. In order to make the CPU suitable for X399 motherboards, AMD had to / decided to cut the memory controllers from 2 dies. These dies have no direct access to memory. Their memory traffic has to be routed over the infinity fabric and then processed by the remaining memory controllers on the other 2 dies. This imposes a huge penalty for memory bandwidth and latency. A CFD simulation running on all 32 cores of this CPU would run significantly slower than using only 16 cores on the dies that have direct memory access. And this CPU exists in the form of the much cheaper TR 2950X.
Then again, even a TR 2950X only has 4 memory channels for 16 cores. For parallel CFD it gets beaten by an Epyc 7301 which runs lower clock speeds but has 8 memory channels for its 16 cores, making it one of the best value CPUs for this application.
Applause for you.
Nothing else to ask; it confirms my thoughts:
so either an Epyc or a Xeon platform, nothing else.
The need for ECC memory rules out consumer platforms.
Thanks a lot, sir!

Old   October 16, 2018, 15:36
Default
  #16
New Member
 
Matt M
Join Date: Oct 2018
Posts: 2
Rep Power: 0
Mart4672 is on a distinguished road
I also have some questions relating to creating a cost-effective machine for ANSYS Fluent if anyone could give advice. I'm a collegiate student on a solar vehicle team- we use Fluent to test many different shell designs for the car before we build it- and we are currently looking to improve our testing flow with a faster computer. These are my questions:

1. After reading much discussion on the forum, it seems that an AMD Epyc system is the best way to go unless we want to go the used route with a Xeon v2 or v3. With an Epyc system, however, we were wondering which Epyc configuration would be most cost-efficient (lowest ratio of a given simulation's time over total cost of system).

Possible Components
Epyc 7351P or 7401P
8 sticks of 4 or 8GB DDR4 ECC memory
Supermicro H11SSL-NC Motherboard
Power supply (Platinum, 750W+?)
SP3 socket cooler

4 different configs:

[Attached image: EPYC workstation possible configs.PNG]


-the P refers to a processor variant that can't be used in a dual CPU board (which is why it is cheaper than a non-P equivalent when looking at the Newegg price page)
-Bottom line, how much performance would be gained by using 8GB sticks vs 4GB ones, and same thing for the 24 vs 16 core CPU.

Are there things I'm not considering?


2. How fast can the final system run a simulation, if a good estimate of the kind of simulation our team usually runs is the sedan_4m benchmark on the ANSYS Fluent benchmark page? That page lists a 32 "core" CPU (I think the benchmark page lists the thread number, not sure) performing 4032 "benchmarks" per day according to their benchmarking terminology... do "benchmarks" refer to iterations? Lots of confusion on this, but we just want to get an idea of how fast simulations will run on a new system.


3. There is an option for GPU acceleration in Fluent. With a proposed system such as this, would a GPU even contribute in any meaningful way (we have a 1080Ti)? If it doesn't, should it still be kept in the system for post-processing?

Old   October 17, 2018, 05:51
Default
  #17
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
Quote:
Bottom line, how much performance would be gained by using 8GB sticks vs 4GB ones, and same thing for the 24 vs 16 core CPU.
24 cores vs. 16 cores: from my point of view, potential performance gains are enough to justify the higher price of the single-socket 24-core SKU. It is still a really good value. You can have a look at the benchmarks you linked: they compared a system with 2 Epyc 7601 CPUs running 32 and 64 threads. The speedup going from 16 cores to 24 cores will be about the same.
4GB vs. 8GB DIMMs: DIMM size does not directly affect performance as long as the case fits into total system memory. Larger DIMMs simply let you run larger cases in-core. I would definitely go with 8GB DIMMs, since 32GB of RAM really is not a lot for a workstation like this. Then there is the issue of NUMA topology: the less memory each NUMA node has, the sooner lightly threaded workloads, such as pre- and post-processing, have to use memory from other NUMA nodes. This can also reduce performance.
If you absolutely have to give up one of the upgrades, I would say go with the 16-core CPU but 8x8GB of RAM. What is the cell count of your largest simulations? Our formula student team was happy to get their hands on a machine with 256GB of RAM because their cell counts were that high.

Quote:
That page lists a 32 "core" CPU (I think the benchmark page lists the thread number, not sure) performing 4032 "benchmarks" per day according to their benchmarking terminology... do "benchmarks" refer to iterations?
In Ansys terminology, one "benchmark" should refer to 20 or 25 iterations (not quite sure) of the given problem.
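As a rough way to turn that rating into wall-clock numbers, here is a minimal Python sketch. It assumes the rating really means "benchmark runs per 24 hours", as the question reads it, and it treats the 20 or 25 iterations per benchmark as the unconfirmed guess it is. The function names are made up for this illustration.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_benchmark(rating: float) -> float:
    """Wall-clock seconds for one benchmark run.

    Assumption: `rating` is the number of benchmark runs per 24 hours.
    """
    if rating <= 0:
        raise ValueError("rating must be positive")
    return SECONDS_PER_DAY / rating


def seconds_per_iteration(rating: float, iterations_per_benchmark: int) -> float:
    """Average seconds per solver iteration.

    Assumption: every benchmark run performs `iterations_per_benchmark`
    iterations (20 or 25 are guesses from this thread, not confirmed).
    """
    if iterations_per_benchmark <= 0:
        raise ValueError("iterations_per_benchmark must be positive")
    return seconds_per_benchmark(rating) / iterations_per_benchmark


if __name__ == "__main__":
    rating = 4032  # sedan_4m figure quoted in the question
    print(f"one benchmark run: {seconds_per_benchmark(rating):.1f} s")
    for n in (20, 25):
        print(f"{n} iterations per run: {seconds_per_iteration(rating, n):.2f} s per iteration")
```

Under these assumptions, a rating of 4032 corresponds to roughly 21 seconds per benchmark run, or about one second per iteration on the sedan_4m case.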

Quote:
3. There is an option for GPU acceleration in Fluent. With a proposed system such as this, would a GPU even contribute in any meaningful way (we have a 1080Ti)? If it doesn't, should it still be kept in the system for post-processing?
It depends on which solver you are using and whether your simulation fits into the card's VRAM; see: GPU acceleration in Ansys Fluent
For post-processing alone this graphics card is definitely overkill. Something in the range of a GTX 1050 Ti would be more than enough.

Old   October 19, 2018, 18:21
Default
  #18
New Member
 
Matt M
Join Date: Oct 2018
Posts: 2
Rep Power: 0
Mart4672 is on a distinguished road
Thank you very much for the information, @flotus1! We typically run shell simulations with 4-6M cells. Would 32GB of RAM in the proposed system be enough to handle that, and would there be any headroom to add a few million more cells if needed? If so, we could get 32GB now and wait until memory prices go down over the next few months or years before going all out with 64 or 128GB.

Old   October 19, 2018, 18:41
Default
  #19
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
flotus1 has a spectacular aura about
For standard aerodynamics simulations in Fluent, expect 1-4GB of RAM per million cells, with the single-precision SIMPLE solver at the lower end and the double-precision coupled solver at the upper end of that estimate.
If you get a single-socket motherboard with 16 DIMM slots, you can upgrade memory later while keeping the old DIMMs, ideally by adding DIMMs very similar to the ones you already have. That Supermicro board only has 8 DIMM slots, so you would have to replace the RAM in order to upgrade.
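To apply the 1-4GB per million cells rule of thumb to a specific case, a small Python sketch like the following may help. The function names and the optional `reserved_gb` parameter (memory set aside for the OS and other programs) are assumptions for illustration; only the 1 and 4 GB bounds come from the estimate above.

```python
GB_PER_MCELL_LOW = 1.0   # lower end: single-precision SIMPLE solver
GB_PER_MCELL_HIGH = 4.0  # upper end: double-precision coupled solver


def ram_range_gb(million_cells: float) -> tuple[float, float]:
    """Return the (low, high) RAM estimate in GB for a given mesh size."""
    if million_cells < 0:
        raise ValueError("million_cells must be non-negative")
    return million_cells * GB_PER_MCELL_LOW, million_cells * GB_PER_MCELL_HIGH


def max_million_cells(ram_gb: float, gb_per_mcell: float,
                      reserved_gb: float = 0.0) -> float:
    """Largest mesh (in million cells) that fits in RAM at a given cost per cell.

    `reserved_gb` is a hypothetical allowance for the OS and other software.
    """
    if gb_per_mcell <= 0:
        raise ValueError("gb_per_mcell must be positive")
    usable = ram_gb - reserved_gb
    return max(usable, 0.0) / gb_per_mcell


if __name__ == "__main__":
    for cells in (4, 6):
        low, high = ram_range_gb(cells)
        print(f"{cells}M cells: {low:.0f} to {high:.0f} GB")
    print(f"32 GB at the upper end: {max_million_cells(32, GB_PER_MCELL_HIGH):.0f}M cells")
```

By this estimate a 6M-cell case needs between 6 and 24GB, so 32GB covers the 4-6M range even at the upper end, with room for about 8M cells before any allowance for the operating system.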

Old   February 13, 2019, 08:01
Default
  #20
Senior Member
 
 
p ding
Join Date: Mar 2009
Posts: 427
Rep Power: 19
ztdep is on a distinguished road
I strongly suggest buying old Xeon workstations. For example, I now use a workstation with two Xeon E5-2650 v2 CPUs (2.60GHz, 8 cores and 16 threads each) and 32GB of RAM. These are very cheap now. You can buy two workstations and set up a small cluster. With two of them, we now have a total of 32 cores and 64 threads for our simulations.
Another important point is the operating system. In my experience, Fluent runs noticeably faster on Linux than on Windows.
This is my experience; I hope it helps.
