August 9, 2019, 02:21 |
#21
Member
Besides, it is reported that the memory WRITE bandwidth is halved for Zen 2 CPUs with a single CCD (3700X, 3600 etc.). At first I thought this might compromise CFD performance substantially, but Simbelmynė's test shows quite the opposite... So is CFD sensitive only to memory read bandwidth?
August 9, 2019, 03:25 |
#22
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
If you look at the schematics of the CPU, you will see that the distance between each memory controller and CCX is not the same. At the very least, this leads to different routing and trace lengths. The latency difference is not huge, but it is measurable, and apparently enough for AMD to justify developing this feature.
Edit: never mind, I just found it.
Last edited by flotus1; August 9, 2019 at 05:54.
August 9, 2019, 04:02 |
#23
Member
About the halved write bandwidth, from https://forums.anandtech.com/threads...ndrum.2567215/ two tests are given:
https://www.overclock3d.net/reviews/..._x570_review/9
https://www.tweaktown.com/reviews/90...ew/index3.html
Also from techreport.com: https://techreport.com/review/34672/...us-reviewed/3/
Last edited by aparangement; August 9, 2019 at 05:41.
August 9, 2019, 13:39 |
#24
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Interesting... I guess it remains to be seen how the cores in Epyc Rome CPUs are distributed among the CCDs. For every CPU with less than 256 MB of L3 cache it would theoretically be possible to have CCDs without any active cores on them.
Imagine a CFD code requests a value from memory, e.g. in order to update the value of a fluid cell it requests a neighbor value. It will get this value, but along with it at least a whole cache line. So more data is read than the single value that was actually needed, i.e. high read bandwidth usage. Of course, optimized codes make use of this and try to use the other values in the cache line too. But for general unstructured CFD, using all of them is not possible. Writing the value back to memory after it was updated requires none of this prefetching, just caching. Or to put it differently: unstructured CFD codes spend a lot of time reading data from memory that gets evicted from the caches before it is ever used.

Please don't quote me on this though; what happens inside a CPU when reading and writing RAM is way more complicated than this simplified explanation, and I am by no means a computer scientist.

Edit: not highly surprising, but this confirms that running a Ryzen 3000 CPU without a 1:1 ratio of RAM and Infinity Fabric (IF) clocks is a waste of time: https://www.youtube.com/watch?time_c...&v=nugwAOvijHQ
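As a rough illustration of that read pattern, here is a minimal sketch in C of such an "update from a neighbor" loop (hypothetical code, not taken from any real solver; the array names are made up):
Code:
/* Each indirect read p[nbr[i]] pulls a full 64-byte cache line (8 doubles)
 * from RAM, but typically only one of those doubles is used before the line
 * is evicted again. The write to p_new[i] is contiguous, so every byte that
 * is written back is useful -> reads dominate the bandwidth usage. */
void update_cells(double *restrict p_new, const double *restrict p,
                  const int *restrict nbr, int n_cells)
{
    for (int i = 0; i < n_cells; ++i)
        p_new[i] = 0.5 * (p[i] + p[nbr[i]]);  /* nbr[i] jumps around in memory */
}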
August 9, 2019, 17:20 |
#25
Member
Ivan
Join Date: Oct 2017
Location: 3rd planet
Posts: 34
Rep Power: 9
And what do you think about AVX-512 in Zen 2?
https://www.reddit.com/r/Amd/comment...rmed_by_cpuid/
https://www.kitguru.net/components/c...-cores-for-7k/
Will it actually help a lot in CFD? I sometimes play with the AVX settings in the BIOS, but I don't have a clear understanding of whether it actually reduces the computation time or not.
August 9, 2019, 17:38 |
#26
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
As far as I am aware, Zen 2 does not have AVX-512. They improved their AVX2 implementation, which is now on par with Intel's.
Lack of AVX-512 is no big deal in my opinion, especially not with a focus on CFD. It can help with some compute-heavy problems, but the benefit for typical CFD applications is negligible. In the past you may have stumbled upon some publications by Intel in partnership with Ansys that made it look like AVX-512 was responsible for xx% performance uplift over the previous CPU generation. But that's just marketing: the CPUs also got over 50% more memory bandwidth, which accounts for the majority of the generational improvement.
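For what it's worth, a quick way to check whether a code is bandwidth-bound is to compare how it scales with RAM speed against a simple STREAM-triad-style loop. A rough sketch with assumed array sizes, not the official STREAM benchmark:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 26)   /* ~64M doubles per array, far bigger than any cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; ++i) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; ++i)
        a[i] = b[i] + 3.0 * c[i];          /* 2 reads + 1 write per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    printf("check: %.1f, approx. bandwidth: %.1f GB/s\n",
           a[N / 2], 3.0 * N * sizeof(double) / sec / 1e9);
    free(a); free(b); free(c);
    return 0;
}
If this loop speeds up with faster RAM by roughly the same factor as the CFD solver does, the solver is memory-bound and wider SIMD units won't buy much.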
August 9, 2019, 18:39 |
#27
Member
Ivan
Join Date: Oct 2017
Location: 3rd planet
Posts: 34
Rep Power: 9
We rebuild our 3D model after each iteration (Ansys CFX + CF Turbo). For about 30% of our total calculation time (around 100 hours per run) the computer uses only 2-4 cores, rebuilding the 3D model with new angles etc.; for the other 70% it uses all 2x16 cores (we built the system on dual 7301, based on your positive experience).
In this case, can the improved floating-point units in the Zen 2 CPUs actually help reduce the time spent rebuilding the 3D model? From https://techreport.com/news/34242/am...d-128-threads/:

AMD also addressed a major competitive shortcoming of the Zen architecture for high-performance computing applications. The first Zen cores used 128-bit-wide registers to execute SIMD instructions, and in the case of executing 256-bit-wide AVX2 instructions, each Zen floating-point unit had to shoulder half of the workload. Compared to Intel's Skylake CPUs (for just one example), which have two 256-bit-wide SIMD execution units capable of independent operation, Ryzen CPUs offered half the throughput for floating-point and integer SIMD instructions.

Zen 2 addresses this shortcoming by doubling each core's SIMD register width to 256 bits. The floating-point side of the Zen 2 core has two 256-bit floating-point add units and two floating-point multiply units that can presumably be yoked together to perform two fused multiply-add operations simultaneously. That capability would bring the Zen 2 core on par with the Skylake microarchitecture for SIMD throughput (albeit not the Skylake Server core, which boasts even wider data paths and 512-bit-wide SIMD units to support AVX-512 instructions). To feed those 256-bit-wide execution engines, AMD also widened the load-store unit, load data path, and floating-point register file to support 256-bit chunks of data.
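To give an idea of what that change means in practice, below is a minimal sketch of an AVX2 fused multiply-add loop (hypothetical code, array names made up; compile with e.g. gcc -O2 -mavx2 -mfma). Each _mm256_fmadd_pd processes four doubles at once; on Zen 2 such 256-bit operations reportedly no longer have to be split across two 128-bit passes as on the first Zen cores:
Code:
#include <immintrin.h>

/* y[i] = a[i]*b[i] + y[i] for all i, using 256-bit fused multiply-adds. */
void fma_kernel(double *restrict y, const double *restrict a,
                const double *restrict b, long n)
{
    long i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(&a[i]);
        __m256d vb = _mm256_loadu_pd(&b[i]);
        __m256d vy = _mm256_loadu_pd(&y[i]);
        vy = _mm256_fmadd_pd(va, vb, vy);   /* 4 lanes per instruction */
        _mm256_storeu_pd(&y[i], vy);
    }
    for (; i < n; ++i)                      /* scalar remainder */
        y[i] = a[i] * b[i] + y[i];
}
That said, whether this helps the remeshing step depends on whether that step actually spends its time in SIMD-heavy code at all.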
August 10, 2019, 06:55 |
#28
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
Meshing and remeshing involves a lot of memory (re-)allocation and accesses memory in a non-predictable way. For these tasks, machines with fewer NUMA domains and lower memory latency, along with high single-threaded performance, are better suited. I can observe this with my own mostly single-threaded grid generator: it runs comparably to Intel CPUs as long as the total memory consumption does not exceed the size of a single NUMA node. Going beyond that, performance drops significantly on my Epyc 7301 workstation. It is no big issue for me because I don't need remeshing during a simulation. For your application, an upgrade to Epyc 7002, with sub-NUMA clustering disabled, could pay off.
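If it helps, here is a rough sketch (my assumption of how one could set it up, using standard libnuma calls; the node number and allocation size are just placeholders) of keeping a single-threaded mesher and its working set on one NUMA node, so allocations do not spill onto remote memory:
Code:
#include <numa.h>                          /* libnuma; link with -lnuma */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA API not available on this system\n");
        return 1;
    }
    int node = 0;                          /* placeholder: first NUMA node */
    numa_run_on_node(node);                /* keep the process on that node */

    size_t bytes = (size_t)8 << 30;        /* placeholder: 8 GiB working set */
    void *grid = numa_alloc_onnode(bytes, node);   /* memory on the same node */
    if (!grid) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... run the (mostly single-threaded) grid generator on 'grid' ... */

    numa_free(grid, bytes);
    return 0;
}
Once the working set grows beyond what that node can hold, allocations land on a remote node anyway and the latency penalty described above shows up.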
August 10, 2019, 07:27 |
#29
Member
Ivan
Join Date: Oct 2017
Location: 3rd planet
Posts: 34
Rep Power: 9
It is true: in some tasks a single 7980XE or a single TR 1950X is only 50% slower than dual 7301, simply because of the big difference in 3D-model remeshing time per iteration (the 7980XE handles the remeshing much faster).
Anyway, reducing the total time is one of our main problems right now. We are trying to evaluate what the actual time difference would be for our tasks between dual 7301 and dual 7302, with all the new controllers and architecture in the 7002 CPUs. The dependence of run time on the number of cores, clock speed, number of memory channels, etc. is more or less understood.
August 12, 2019, 05:33 |
#30
New Member
Join Date: Aug 2019
Posts: 2
Rep Power: 0 |
August 12, 2019, 05:51 |
#31
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49
It will take some time. Availability of CPUs and boards in the retail market seems to be a similar issue as it was with Naples. And DDR4-3200 RDIMM is not that easy to source either, at least in Europe.