www.cfd-online.com

OpenFOAM with ICC - Intel's ICC is "allergic" to AMD CPUs

Posted July 20, 2012 at 12:17 by wyldckat
Updated August 1, 2012 at 06:32 by wyldckat
Tags icc, openfoam

This is another of the rants I post on my blog once in a while.

So today I was trying to reproduce this report: http://www.openfoam.org/mantisbt/view.php?id=590 - I was curious about it, because ICC is known for optimizing very well for Intel processors (or at least it was in the past), and it is completely free for non-commercial use on Linux.

The catch is that I don't have an Intel CPU in my home computer; I've got an AMD 1055T x6.
But that hasn't stopped me from compiling OpenFOAM on Ubuntu x86_64 with ICC. The other catch is that today was the first time I took the build for a spin.

To my surprise (and this is the origin of this rant), this message appeared in front of me:
Code:
Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel(R) Pentium(R) 4 and compatible Intel processors with Intel(R) Streaming SIMD Extensions 3 (Intel(R) SSE3) instruction support.
Wow... I knew it wouldn't work in optimized mode, because this is mentioned in many documents and is documented at Wikipedia: http://en.wikipedia.org/wiki/Intel_C...iler#Criticism

So I searched for more on this error and voilà: http://developer.amd.com/documentati...292005119.aspx - this was the first link to pop up when searching for the message:
Quote:
This program was not built to run on the processor in your system. The allowed processors are: Intel
So, what to do next?

Using a crowbar is out of the question... Which is why I then remembered this bug report: http://www.openfoam.org/mantisbt/view.php?id=322
Quote:
wmake/rules/linux64Icc/c++Opt should be edited to remove the -SSE3 option.
And rebuild!
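
The edit described in that bug report can be sketched as a small text transformation. This is only an illustration: the helper name `strip_flag` is hypothetical, and the example rule line (including the `-xSSE3` spelling and the other flags) is an assumption about what the file contains, so check your own `wmake/rules/linux64Icc/c++Opt` before changing anything.

```python
def strip_flag(rule_line: str, flag: str) -> str:
    """Return rule_line with every whitespace-separated token equal to flag removed."""
    name, sep, value = rule_line.partition("=")
    if not sep:
        # Not a "NAME = value" line, so leave it untouched.
        return rule_line
    kept = [tok for tok in value.split() if tok != flag]
    return f"{name.rstrip()} = {' '.join(kept)}".rstrip()


# Assumed content of the optimisation rule; the real file may differ.
example = "c++OPT      = -xSSE3 -O2 -no-prec-div"
print(strip_flag(example, "-xSSE3"))
```

After changing the rule file, the whole of OpenFOAM has to be rebuilt for the new flags to take effect.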

And why go through all of this trouble? Because I want to know!

PS: to answer the first comment:
Code:
icc --version
icc (ICC) 12.1.4 20120410
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.
(Non-commercial version for Linux.)
__________________________
Continued investigation has led to the following results:

Mental notes:
Benchmark conditions:
  • OpenFOAM 2.1.x
  • motorBike 2.1.x case from this page: http://code.google.com/p/bluecfd-sin...untimes202_211 - it uses snappyHexMesh in parallel as well
  • Processor: AMD 1055T x6, using DDR2 800 MHz memory
  • Only a single run was done for each configuration, so the timings may have a margin of error of 5 to 10 s
Run times:
  • Single core:
    • sHM 300s, simpleFoam 1614s <-- Gcc 4.6.1 -O3
    • sHM 300s, simpleFoam 1399s <-- Gcc 4.6.1 -march=amdfam10 -O2 -pipe
    • sHM 245s, simpleFoam 1409s <-- Icc -O2 -no-prec-div
    • sHM 242s, simpleFoam 1369s <-- Icc -O2 -no-prec-div -msse3
  • Parallel 2 cores:
    • sHM 222s, simpleFoam 1296s <-- Gcc 4.6.1 -O3
    • sHM 198s, simpleFoam 867s <-- Gcc 4.6.1 -march=amdfam10 -O2 -pipe
    • sHM 176s, simpleFoam 894s <-- Icc -O2 -no-prec-div
    • sHM 173s, simpleFoam 884s <-- Icc -O2 -no-prec-div -msse3
  • Parallel 4 cores:
    • sHM 160s, simpleFoam 1196s <-- Gcc 4.6.1 -O3
    • sHM 130s, simpleFoam 726s <-- Gcc 4.6.1 -march=amdfam10 -O2 -pipe
    • sHM 163s, simpleFoam 788s <-- Icc -O2 -no-prec-div
    • sHM 159s, simpleFoam 771s <-- Icc -O2 -no-prec-div -msse3
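
To make the simpleFoam numbers above easier to compare, here is a minimal sketch that computes the percentage of run time saved relative to the plain Gcc -O3 build, plus the speedup of each build against its own single-core run. The times are copied from the list above; the function and variable names are my own, and since each configuration was only run once, the results should be read with that 5 to 10 s margin in mind.

```python
# simpleFoam wall-clock times in seconds, copied from the run-time list above.
# Keys of the inner dicts are the number of cores used.
SIMPLEFOAM_S = {
    "Gcc -O3": {1: 1614, 2: 1296, 4: 1196},
    "Gcc -march=amdfam10 -O2 -pipe": {1: 1399, 2: 867, 4: 726},
    "Icc -O2 -no-prec-div": {1: 1409, 2: 894, 4: 788},
    "Icc -O2 -no-prec-div -msse3": {1: 1369, 2: 884, 4: 771},
}


def gain_over_baseline(times, baseline="Gcc -O3"):
    """Percentage of run time saved versus the baseline build, per core count."""
    base = times[baseline]
    return {
        build: {n: 100.0 * (base[n] - t) / base[n] for n, t in runs.items()}
        for build, runs in times.items()
    }


def parallel_speedup(runs):
    """Speedup of each run relative to the same build on one core."""
    return {n: runs[1] / t for n, t in runs.items()}


gains = gain_over_baseline(SIMPLEFOAM_S)
for build, runs in SIMPLEFOAM_S.items():
    saved = ", ".join(f"{n} core(s): {gains[build][n]:.1f}%" for n in sorted(runs))
    scale = ", ".join(
        f"{n}: {s:.2f}x" for n, s in sorted(parallel_speedup(runs).items())
    )
    print(f"{build}\n  time saved vs Gcc -O3: {saved}\n  speedup vs 1 core: {scale}")
```

With these numbers, Icc with -msse3 is the fastest build on a single core, while the Gcc build tuned with -march=amdfam10 is the fastest on 2 and 4 cores.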
Total Comments 10

Comments

  1. Old Comment
    Which icc version did you use?
    Posted July 28, 2012 at 02:02 by SergeAS
  2. Old Comment
    Good point, I forgot to mention it in the blog post. edit: I've added it now.

    Nonetheless, this issue with ICC and AMD processors has been around for several years now, as you can see from the references at Wikipedia. And I don't think they (Intel) have any plans to change this point of view.
    Posted July 28, 2012 at 05:10 by wyldckat
    Updated July 28, 2012 at 05:10 by wyldckat (see "edit:")
  3. Old Comment
    AFAIK, since icc 10 or 10.1 there has been a special option, -xO:

    Code:
    -x<codes>  generate specialized code to run exclusively on processors
               indicated by <codes> as described below
        W  Intel Pentium 4 and compatible Intel processors
        P  Intel(R) Core(TM) processor family with Streaming SIMD
           Extensions 3 (SSE3) instruction support
        T  Intel(R) Core(TM)2 processor family with SSSE3
        O  Intel(R) Core(TM) processor family.  Code is expected to run properly
           on any processor that supports SSE3, SSE2 and SSE instruction sets
        S  Future Intel processors supporting SSE4 Vectorizing Compiler and
           Media Accelerator instructions
    For several years I have been using this icc (v10.1) option for my code on my old cluster with AMD Opteron 285 processors:

    Code:
    icc --version
    icc (ICC) 10.1 20080801
    Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
    Posted July 28, 2012 at 10:27 by SergeAS
  4. Old Comment
    How did I not see this before... The last link I posted contains this other link: http://developer.amd.com/Assets/Comp...f-61004100.pdf (edit: the AMD website is being revamped and the PDF got lost... best to Google for "CompilerOptQuickRef-61004100.pdf")
    It recommends using the option "-msse3" (and avoiding "-ax"), which in Icc 12.1 is the same as the now deprecated option "-xO" you've written about!

    My machine is currently building another set of OpenFOAM 2.1.x with this option, and then I'll run some more tests to compare run times! If it fares well, it would be a good idea to propose including the crazy "WM_COMPILER" option "IccAMD" in OpenFOAM.

    Serge, many thanks for the info! Although I can almost bet that Gcc 4.6 can still outrun Icc.
    Posted July 28, 2012 at 14:01 by wyldckat
    Updated May 18, 2014 at 13:46 by wyldckat (see "edit:")
  5. Old Comment
    It would be interesting to see comparative benchmarks of gcc and icc on OF.

    PS: In my own density-based solver (for compressible reacting flows from subsonic to hypersonic) icc beat gcc by about 10%, but those were quite old compilers (icc 10.1 vs gcc 4.1) and processors (AMD Opteron 285 and Intel Xeon 5120).

    UPD: I ran a small test on my current problem to measure the "pure" performance of the solver (serial version) with 3 different compilers: gcc 4.1.0, gcc 4.5.1 and icc 10.1, and got the following results (one time step):

    gcc 4.1 - 14.7 seconds
    gcc 4.5.1 - 14.95 seconds
    icc 10.1 - 9.45 seconds

    In this case icc produces code that is faster by more than 30%.

    (The test was performed on an AMD Opteron 285.)
    Posted July 28, 2012 at 16:08 by SergeAS
    Updated July 29, 2012 at 08:59 by SergeAS
  6. Old Comment
    I've finally managed to complete the benchmarks! That "-msse3" option did bring good results! But I then went looking for the right flags for GCC on the 1055T, and those did even better.
    edit: I updated the last part of the blog post.

    For your Opteron 285, it looks like these are the perfect gcc options:
    Quote:
    Originally Posted by http://en.gentoo-wiki.com/wiki/Safe_Cflags/AMD#2xx.2F8xx_Opteron
    -march=opteron -O2 -pipe
    PS: really nice blog posts you've got about performance comparisons!!
    Posted August 1, 2012 at 06:37 by wyldckat
    Updated August 1, 2012 at 06:38 by wyldckat (see "edit:")
  7. Old Comment
    In my solver I'm using the following set of options for gcc:
    Code:
    -O3 -ffast-math  -funroll-loops -fprefetch-loop-arrays -frerun-loop-opt -ffloat-store  -frerun-cse-after-loop -fschedule-insns2 -fno-strength-reduce -fexpensive-optimizations -fcse-follow-jumps -falign-loops=32 -falign-jumps=32 -falign-functions=32 -falign-labels=32 -fcse-skip-blocks  -ftree-vectorize  -msse3 -msse2 -msse -mmmx -m3dnow -mfpmath=sse -mtune=opteron
    but it seems that, for the current version of the solver, GCC has no real chance against icc (in the serial version).

    PS: in the case of the OpenMP solver, both compilers have horrible performance/scaling on NUMA systems with Opterons, but gcc 4.5.1 is a little bit faster than icc 10.1.

    In the case of a dual Opteron 285 node with 16 GB (2x2 cores), on the same test (6 million nodes, ~10 GB overall task size), the results are as follows:

    icc 10.1 OpenMP - 8.15 sec (~29% parallel efficiency on 4 cores)
    gcc 4.5.1 - 7.95 sec (~47% parallel efficiency on 4 cores)

    For NUMA systems, MPI is a more appropriate programming model (my test task has ~41% parallel efficiency ... on 54 cores in the case of MPI).
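
As a quick check of the efficiency figures above, here is a minimal sketch using the usual definition: serial time divided by (number of cores times parallel time). It assumes the serial references are the one-time-step timings from the earlier comment (9.45 s for icc 10.1, 14.95 s for gcc 4.5.1), which the phrase "on the same test" suggests but does not state outright. The function name is my own.

```python
def parallel_efficiency(serial_s: float, parallel_s: float, cores: int) -> float:
    """Fraction of ideal speedup achieved: T_serial / (cores * T_parallel)."""
    return serial_s / (cores * parallel_s)


# Serial timings are assumed to come from the earlier serial test (one time step).
icc = parallel_efficiency(serial_s=9.45, parallel_s=8.15, cores=4)
gcc = parallel_efficiency(serial_s=14.95, parallel_s=7.95, cores=4)
print(f"icc 10.1: {icc:.0%}, gcc 4.5.1: {gcc:.0%}")
```

Under that assumption the quoted ~29% and ~47% are reproduced, which also shows why gcc looks better here: its OpenMP run is only slightly faster in absolute terms, but its serial baseline is much slower.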


    PPS: One small remark:
    the -pipe option speeds up the compilation of OF, but not the execution of the solver code.

    Quote:
    -pipe
    Use pipes rather than temporary files for communication between the
    various stages of compilation.
    so it is well loved in Gentoo.
    Posted August 1, 2012 at 16:16 by SergeAS
    Updated August 1, 2012 at 17:00 by SergeAS
  8. Old Comment
    Many thanks for the info! I did think that the "-pipe" option had something to do with optimizing the way cache is handled in the core... I should have read the manual.

    But why are you using "-ffast-math"? AFAIK, that option is hated in the CFD world, because it offers neither full precision nor reproducibility!

    edit: OpenMP is pointless in OpenFOAM, because OpenFOAM doesn't take advantage of multi-threading capabilities, nor is it thread-safe in some cases!
    Posted August 1, 2012 at 17:16 by wyldckat
  9. Old Comment
    Quote:
    But, why are you using "-ffast-math"? AFAIK, that option is hated in the CFD world, because it doesn't offer full precision nor reproducibility!
    It depends on the task, because it only fails to guarantee precision in some cases.
    Posted August 1, 2012 at 17:41 by SergeAS
  10. Old Comment
    Quote:
    Originally Posted by SergeAS
    it depends on task, because it only does not guarantee precision in some cases
    And how can you guarantee the precision in a new simulation case?
    Posted September 22, 2013 at 11:04 by Tobi
 
