January 23, 2015, 05:49 |
How to optimize OpenFOAM for Core i7 CPUs using extended instruction sets
|
#1 |
Senior Member
Join Date: Mar 2010
Location: Germany
Posts: 154
Rep Power: 16 |
Hi,
I tried to compile and optimize OpenFOAM for some newer Core i7 CPUs that support AVX2 and FMA. As far as I understand, the default settings target the generic x86_64 instruction set. I forced the compiler to optimize for the extended instruction set by adding the -march=corei7 flag in /wmake/rules/linux64Gcc/c++Opt and /wmake/rules/linux64Gcc/cOpt. The compiler accepted the settings, but my first benchmarks did not show any noticeable effect. I've been running my cases on a single thread in order to rule out MPI wait times and measure raw CPU performance.

I've got two questions regarding this issue:
1. Is this the best or correct way to set the compiler flags?
2. What performance gain can be expected from optimized binaries?

Many thanks,
Cutter |
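One caveat worth noting: on GCC, -march=corei7 corresponds to Nehalem and enables only up to SSE4.2; AVX2 and FMA require -march=core-avx2 (on GCC 4.8) or -march=native on a Haswell host. A minimal sketch of the rules edit, assuming the stock linux64Gcc rules files still carry a plain -O3 as in OpenFOAM 2.x (exact flag strings may differ between versions):

```shell
# Sketch: append a host-specific -march flag to the optimized build rules.
# -march=native lets GCC pick the full instruction set of the build host
# (AVX2/FMA on Haswell); -march=corei7 would stop at SSE4.2.
cd "$WM_PROJECT_DIR"
sed -i 's/-O3/-O3 -march=native/' wmake/rules/linux64Gcc/c++Opt
sed -i 's/-O3/-O3 -march=native/' wmake/rules/linux64Gcc/cOpt
# Verify the flag is now part of the optimized rules:
grep -H 'march' wmake/rules/linux64Gcc/c++Opt wmake/rules/linux64Gcc/cOpt
```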
|
January 24, 2015, 11:16 |
|
#2 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings Cutter,
In theory, AVX should increase performance in mathematical operations for any application, once it's compiled with the necessary options. But I'm not sure if, and by how much, OpenFOAM takes advantage of this, although the compiler usually handles this kind of optimization on its own. It also depends on the GCC version you're using: it's possible that your GCC is new enough to apply this optimization by default, which would explain why you don't see any difference with and without the option. Therefore, please provide some more details, namely which CPU models, GCC versions and operating systems you're using.
Best regards, Bruno
|
|
January 25, 2015, 03:26 |
|
#3 |
Senior Member
Francesco Del Citto
Join Date: Mar 2009
Location: Zürich Area, Switzerland
Posts: 237
Rep Power: 18 |
Hi all,
I have the same experience as Cutter. I have tried over time with many OpenFOAM versions, GCC versions, CPUs and operating systems, without getting any measurable improvement from machine-specific optimisation. My last test was a few weeks ago, with GCC 4.9.2 on very recent hardware with two different CPUs. The -march option was correctly applied in both cases, and the compilation itself took much longer (about 3 times longer), but the running time of the motorBike tutorial was almost exactly the same, both for meshing and for the solution.

It would be interesting to know if anyone has a different experience and could point out the compiler options used.

Best regards,
Francesco |
|
January 30, 2015, 10:19 |
|
#4 |
Senior Member
Join Date: Mar 2010
Location: Germany
Posts: 154
Rep Power: 16 |
Hi,
thanks to both of you for the initial feedback! I'm currently targeting the following two CPU models (obtained via cat /proc/cpuinfo and g++ --version):
* Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
* Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)

I'm currently doing the research on the first of the two machines, which runs Fedora release 19 (Schrödinger’s Cat) with a KDE desktop installation:
Code:
$ uname -a
Linux hostname 3.14.23-100.fc19.x86_64 #1 SMP Thu Oct 30 18:36:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

With the default settings, no AVX or FMA macros are defined:
Code:
$ g++ -dM -E -x c /dev/null | grep -i -e avx -e fma
<<no output here>>

With -march=core-avx2 they appear:
Code:
$ g++ -march=core-avx2 -dM -E -x c /dev/null | grep -i -e avx -e fma
#define __core_avx2__ 1
#define __AVX__ 1
#define __FP_FAST_FMAF 1
#define __FMA__ 1
#define __AVX2__ 1
#define __tune_core_avx2__ 1
#define __core_avx2 1
#define __FP_FAST_FMA 1

And -march=native selects the same set on this machine:
Code:
$ g++ -march=native -dM -E -x c /dev/null | grep -i -e avx -e fma
#define __core_avx2__ 1
#define __AVX__ 1
#define __FP_FAST_FMAF 1
#define __FMA__ 1
#define __AVX2__ 1
#define __tune_core_avx2__ 1
#define __core_avx2 1
#define __FP_FAST_FMA 1

Best regards,
Cutter |
|
February 2, 2015, 01:24 |
|
#5 |
Senior Member
Francesco Del Citto
Join Date: Mar 2009
Location: Zürich Area, Switzerland
Posts: 237
Rep Power: 18 |
Hi Cutter,
Nice checks! Now we know the compiler is doing its job, or at least that it enables the instruction sets specific to these CPUs, as I think we all expected. The questions now are: can it actually use them when compiling OpenFOAM, and does that make any difference to the execution time?

Francesco |
|
August 29, 2015, 11:27 |
|
#6 |
Member
Lianhua Zhu
Join Date: Aug 2011
Location: Wuhan, China
Posts: 35
Rep Power: 15 |
Hi Francesco,

Recently I compared the performance of OpenFOAM built with icc and with gcc. The two configurations were:
#1. icc 15.0.0, OpenFOAM-2.4.0, run on an E5-2680 v3 @ 2.5 GHz, compiled with the -xHost -O3 flags, OS: CentOS 6.5 x64, RAM: DDR4
#2. gcc 4.8.1, OpenFOAM-2.3.0, run on an E5-2697 v2 @ 2.7 GHz, compiled with the default -m64 flag, OS: CentOS 7.0 x64, RAM: DDR3

Note a): -xHost makes icc/icpc (or icl) check the CPU and use the highest level of extended instructions it supports.
Note b): the E5-2680 v3 supports AVX2 instructions while the E5-2697 v2 doesn't.

I ran the cavity flow case in $FOAM_TUT/incompressible/icoFoam/cavity without modifying any files in it, using only one process. Results:
* The icc configuration (#1) takes 0.16 s
* The gcc configuration (#2) takes 0.15 s

You see, almost the same! Hope this testing helps,
--
Lianhua |
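For anyone wanting to repeat this kind of single-process timing, the procedure is roughly the following sketch. It assumes a sourced OpenFOAM environment; recent versions name the tutorials variable $FOAM_TUTORIALS, and /tmp/cavity is just an arbitrary scratch location:

```shell
# Sketch: time a single-process run of the icoFoam cavity tutorial.
# Copy the case out of the installation so the original stays untouched.
cp -r "$FOAM_TUTORIALS/incompressible/icoFoam/cavity" /tmp/cavity
cd /tmp/cavity
blockMesh > log.blockMesh 2>&1   # generate the mesh first
time icoFoam > log.icoFoam 2>&1  # the solver wall time is what's compared
```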
December 28, 2015, 21:19 |
|
#7 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
I've had this thread on my to-do list and I haven't reached a solution yet. Nonetheless, I've done some basic tests that can at least give us a feeling for the speed-up we can hope for. The repository is available here: https://github.com/wyldckat/avxtest
The source code does not depend on OpenFOAM and only needs GCC (4.7 or newer) to build; the summary results were obtained on an AMD A10-7850K.
As for OpenFOAM, I still need to look into this in more detail. The compiler should be able to vectorize things on its own, but it seems the code must be written in a way that lets the compiler recognize "oh, this I can vectorize like so and so".

Best regards, Bruno