
Segmentation violation on AMD computer. Works on Intel

Old   May 2, 2022, 07:29
Default Segmentation violation on AMD computer. Works on Intel
  #1
Member
 
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 14
Hi All,

I am running a multiphase analysis in CFX on a turbine in a channel.

I ran it on two computers. On the first one I get a segmentation violation. On the other computer it works well. See below for computer specs and the error.

Ansys suggested setting the memory allocation factor to 1.2 and turning off hyperthreading, which did not solve the problem.

Would anybody have an idea why the simulation runs well on one computer and gives an error on the other?


Thanks


---

Computer spec where I receive the error

- Windows 10 Pro 21H2
- AMD Ryzen Threadripper 3960X 24-Core Processor 3.79 GHz
- 128 GB

Computer spec where the simulation runs fine

- Windows 11 Pro 21H2
- 12th Gen Intel(R) Core(TM) i9-12900K 3.19 GHz
- 64 GB

---

Parallel run: Received message from follower
--------------------------------------------
Follower partition: 12
Follower routine : ErrAction
Leader location : Message Handler
Message label : 001100279
Message follows below - :

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Signal caught: Segmentation violation |
| |
| |
| |
| |
| |
+--------------------------------------------------------------------+

Parallel run: Received message from follower
--------------------------------------------
Follower partition: 12
Follower routine : ErrAction
Leader location : Message Handler
Message label : 001100279
Message follows below - :

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine FPX: SIG_HANDLER |
| |
| |
| |
| |
| |
+--------------------------------------------------------------------+

+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine MESG_RETRIEVE. |
| Message: |
| Stopping the run due to error(s) reported above |
| |
| |
| |
| |
| |
+--------------------------------------------------------------------+
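When an output file contains many of these boxed messages, it can help to pull out just the error number, subroutine, and message text before sending them to support. The following is a minimal sketch that assumes the box layout shown above; `extract_errors` is a hypothetical helper, not part of CFX.

```python
import re

# Matches the header line inside each error box, as seen in the log above.
HEADER = re.compile(r"ERROR #(\d+) has occurred in subroutine (\w+)\.")

def extract_errors(text):
    """Return (error_number, subroutine, message) tuples from solver output text."""
    errors = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = HEADER.search(lines[i])
        if match:
            body = []
            i += 1
            # Collect the box contents until the closing "+----" border.
            while i < len(lines) and not lines[i].lstrip().startswith("+-"):
                content = lines[i].strip().strip("|").strip()
                if content and content != "Message:":
                    body.append(content)
                i += 1
            errors.append((match.group(1), match.group(2), " ".join(body)))
        i += 1
    return errors
```

Calling `extract_errors(open("run.out").read())` (the file name is an assumption) would return one tuple per error box, in the order they appear.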

Old   May 2, 2022, 08:43
Default
  #2
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
No idea, and it is probably not anything we can help with much anyway. Only the ANSYS developers will be able to do much with it I suspect. If you want us to look into it please post the output file for both runs - but no guarantees we can do anything useful with it.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum.

Old   May 2, 2022, 14:04
Default
  #3
Member
 
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 14
Hi Glenn,

Thanks for replying. Please find attached the two *.out files.

The run that did not work stopped during the second timestep, but this was not repeatable. When trying again it would run for more timesteps before crashing.
Attached Files
File Type: zip CFX.zip (117.8 KB, 3 views)

Old   May 2, 2022, 17:23
Default
  #4
Senior Member
 
Join Date: Jun 2009
Posts: 1,880
Rep Power: 33
You may have to contact Ansys CFX for support.

Remember to supply the "trace" file left over from the failed run. It will help them pinpoint where the failure occurred.

Old   May 2, 2022, 17:53
Default
  #5
Senior Member
 
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28
Nice to see a comparison of these computers. I would be interested in the performance difference.

However, using the compare function of Notepad++, you can see that the cases are not completely the same.
There is a difference in expressions:
- "Not working" has deltaAngleBlade1 = blades dstep 10deg.blade1
- "Working" has deltaAngleBlade1 = blades dstep 6deg.blade1

- "Not working" has stepsPerRevolution = 36
- "Working" has stepsPerRevolution = 60

The grid is different and the moment in time is different. Why not copy one case to the other computer and show that it really is the computer? As it stands, I have the impression we are comparing apples and oranges.
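The same kind of comparison can be scripted when Notepad++ is not at hand. This is a rough sketch using Python's standard `difflib`; the function name `changed_lines` and the file names in the usage note are assumptions for illustration only.

```python
import difflib

def changed_lines(text_a, text_b):
    """Return only the lines that differ between two output files.

    Lines prefixed "- " come from the first text, "+ " from the second.
    """
    diff = difflib.ndiff(text_a.splitlines(), text_b.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

For example, `changed_lines(open("not_working.out").read(), open("working.out").read())` would list differing expressions such as the `stepsPerRevolution` values. Run-specific lines like host names and timings will also show up, so the result still needs a manual read.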

Regs, Gert-Jan

Old   May 3, 2022, 06:27
Default
  #6
Member
 
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 14
Thanks for pointing this out. At the time these cases were representative.

Attached two new files, which are like for like. The not working one failed after 8 timesteps. Still have no idea why
Attached Files
File Type: zip CFX.zip (88.7 KB, 2 views)

Old   May 3, 2022, 07:01
Default
  #7
Senior Member
 
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28
I see no difference indeed, and I don't understand why it fails. I would ask ANSYS.

Meanwhile, perform a few tests:
- increase the number of iterations within a timestep. With max 4, convergence is absent.
- switch between double and single precision
- increase the memory allocation

Also, I wonder what you are solving since the mesh displacement appears to do nothing.....

Old   May 4, 2022, 08:34
Default
  #8
Member
 
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 14
I am solving a variable pitch vertical axis tidal turbine in a channel.

I tried your suggestions. Increasing memory allocation (to 4, since 3 did not work) or increasing the iterations (4 to 6) did not work. But when I used single precision I got no error and it ran to the end. I have no idea why. Any idea?

Old   May 4, 2022, 09:17
Default
  #9
Senior Member
 
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28
I have no idea why it works in single precision.
However, it has nothing to do with your CFX-settings or physical models. It has to be something numerical and lies a level deeper.
Still, I would ask ANSYS. You have more information now.

Old   May 4, 2022, 13:21
Default
  #10
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,428
Rep Power: 49
If we are sure the cases run on both PCs are exactly the same, you could also try running them with the same number of threads. It could be that the simulation is borderline unstable, and some minor differences from domain decomposition push it over the edge.
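To illustrate why the partition count alone can change results: floating-point addition is not associative, so summing the same values in differently sized chunks can give different totals. The sketch below is a generic demonstration with made-up numbers, not a model of how CFX partitions or reduces data; `naive_sum` and `partitioned_sum` are hypothetical helpers.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def partitioned_sum(values, n_parts):
    """Sum contiguous chunks separately, then add the partial sums,
    loosely mimicking a reduction over n_parts partitions."""
    size = -(-len(values) // n_parts)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]
print(partitioned_sum(values, 1))  # one partition
print(partitioned_sum(values, 2))  # two partitions, different rounding
```

With these deliberately extreme values the two calls disagree. In a real solver the differences are far smaller, but in a borderline-stable run they may be enough to change the path the solution takes.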

Old   May 4, 2022, 20:27
Default
  #11
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
I think Alex's comment is correct for a useful path forwards. It is likely this simulation is borderline numerically stable and small things are tipping it over the edge. So improve numerical stability with better mesh quality (that's the most important one), smaller time steps, check the physical models are correct, tighter convergence and things like that.

Also as Alex suggests you can try different decomposition or even run it serial.

Old   May 4, 2022, 22:51
Default
  #12
Senior Member
 
Join Date: Jun 2009
Posts: 1,880
Rep Power: 33
Quote:
Originally Posted by bolus13 View Post
Thanks for pointing this out. At the time these cases were representative.

Attached two new files, which are like for like. The not working one failed after 8 timesteps. Still have no idea why
From the output files, and the additional information that it works in single precision on the same machine with the same number of partitions (correct?), I can only guess that the software attempted to access an invalid memory location based on logic not triggered in the single-precision simulation.

A "segmentation violation" or "access violation" is an illegal memory access, for example one caused by passing an invalid variable or function type. It should not be related to the numerical details of the model, except where those details trigger pieces of code that have a problem.
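As a generic illustration of how precision alone can change which code path runs, the sketch below rounds a value to single precision with the standard `struct` module and evaluates the same comparison both ways. The names `to_float32` and `branch_taken` and the numbers are invented for the example; nothing here reflects the actual CFX source.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def branch_taken(value, threshold, single_precision):
    """Return True if the 'value > threshold' branch would execute."""
    if single_precision:
        value = to_float32(value)
        threshold = to_float32(threshold)
    return value > threshold

value = 1.0 + 1e-9   # differs from 1.0 only below single-precision resolution
print(branch_taken(value, 1.0, single_precision=False))  # branch runs in double
print(branch_taken(value, 1.0, single_precision=True))   # branch skipped in single
```

If the code behind such a branch contains a memory bug, only the double-precision run would ever reach it, which is consistent with the guess above but does not prove it.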

Advice: contact Ansys CFX support and provide the "trace" file listed at the end of the output file.

Old   May 10, 2022, 09:43
Default Same Problems
  #13
New Member
 
Bingqi Wang
Join Date: Apr 2022
Posts: 2
Rep Power: 0
I have the same problem with CFX 2022 on an EPYC 7452 and Windows Server.

1. For the multiphase case:
It works fine on the Intel machine.
However, random segmentation violations occur on the AMD machine.
I tried a single-precision run on AMD and it works.
2. For a normal single-phase case:
All machines work fine.
3. I have tried Intel MPI and MS MPI, but the problem still occurs.

Old   May 10, 2022, 09:56
Default
  #14
New Member
 
Bingqi Wang
Join Date: Apr 2022
Posts: 2
Rep Power: 0
I have tried some cases, and hope this helps the developers find the problem:

1. Decreasing the overlap relaxation and using fewer cores helps the simulation hold for longer; however, it eventually crashes.

2. Turning off the multigrid solver makes the simulation converge much faster.

3. Single precision seems to work fine now.

4. The AMD machine has 512 GB of memory and the memory factor is sufficient. This machine runs in NPS4 mode.

5. I have tried changing MPI environment variables such as I_MPI_DEBUG and I_MPI_DEBUG_COREDUMP, but it doesn't change anything.

https://www.intel.com/content/www/us...variables.html

Old   May 10, 2022, 18:22
Default
  #15
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
No point in telling us, the developers do not watch this forum (not that they tell us, anyway).

You will have to report this to your ANSYS rep to pass it onto the developers.

Old   May 27, 2022, 05:07
Default
  #16
New Member
 
Denis Ruban
Join Date: May 2022
Location: Ukraine, Kyiv
Posts: 1
Rep Power: 0
- Hello. I was dealing with this error in ANSYS CFX on a desktop PC (LGA-1200, Intel Core i7 10700KF 5.0 GHz, 64 GB DIMM 3500 MHz / 1.35 V).
- I did not get the error on another server PC (LGA-2011v3, Intel Xeon E5-2697 v3 3.6 GHz, 64 GB RDIMM (ECC Reg.) 2133 MHz / 1.2 V). I realized that it was a matter of overclocked RAM on the desktop. Incidentally, the memory test in AIDA did not pass. After raising the RAM voltage to 1.38 V and reducing the frequency to 3465 MHz, the problem disappeared, and the memory stress test in AIDA now passes.
- That is, the problem lies solely in the stability of the RAM. I recommend using ECC Reg. memory with standard timings (for example, CL22 for 3200 MHz RDIMM) for your Ryzen Threadripper with its 4-channel topology.

Tags
amd ryzen, intel i9, multhiphase, segmentation violation
