Segmentation violation on AMD computer. Works on Intel |
May 2, 2022, 07:29 |
|
#1 |
Member
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 13 |
Hi All,
I am running a multiphase analysis in CFX on a turbine in a channel. I ran it on two computers. On the first one I get a segmentation violation. On the other computer it works well. See below for computer specs and the error.
Ansys suggested setting the memory allocation to 1.2 and turning off hyperthreading, which did not solve the problem.
Would anybody have an idea why the simulation runs well on one computer and gives an error on the other?
Thanks
---
Computer spec where I receive the error
- Windows 10 Pro 21H2
- AMD Ryzen Threadripper 3960X 24-Core Processor 3.79 GHz
- 128 GB
Computer spec where the simulation runs fine
- Windows 11 Pro 21H2
- 12th Gen Intel(R) Core(TM) i9-12900K 3.19 GHz
- 64 GB
---
Parallel run: Received message from follower
--------------------------------------------
Follower partition: 12
Follower routine : ErrAction
Leader location : Message Handler
Message label : 001100279
Message follows below - :
+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction.             |
| Message:                                                           |
| Signal caught: Segmentation violation                              |
+--------------------------------------------------------------------+
Parallel run: Received message from follower
--------------------------------------------
Follower partition: 12
Follower routine : ErrAction
Leader location : Message Handler
Message label : 001100279
Message follows below - :
+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction.             |
| Message:                                                           |
| Stopped in routine FPX: SIG_HANDLER                                |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| ERROR #001100279 has occurred in subroutine MESG_RETRIEVE.         |
| Message:                                                           |
| Stopping the run due to error(s) reported above                    |
+--------------------------------------------------------------------+ |
|
May 2, 2022, 08:43 |
|
#2 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,854
Rep Power: 144 |
No idea, and it is probably not anything we can help with much anyway. Only the ANSYS developers will be able to do much with it I suspect. If you want us to look into it please post the output file for both runs - but no guarantees we can do anything useful with it.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
May 2, 2022, 14:04 |
|
#3 |
Member
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 13 |
Hi Glenn,
Thanks for replying. Please find attached the two *.out files. The run that did not work stopped during the second timestep, but this was not repeatable. When trying again it would run for more timesteps before crashing. |
|
May 2, 2022, 17:23 |
|
#4 |
Senior Member
Join Date: Jun 2009
Posts: 1,873
Rep Power: 33 |
You may have to contact Ansys CFX for support.
Remember to supply the "trace" file left over from the failed run. It will help them pinpoint where the failure occurred.
|
May 2, 2022, 17:53 |
|
#5 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,913
Rep Power: 28 |
Nice to see a comparison of these computers. I would be interested in the performance difference.
However, using the compare function of Notepad++ you can see that the cases are not completely the same. There is a difference in expressions:
- "Not working" has deltaAngleBlade1 = blades dstep 10deg.blade1
- "Working" has deltaAngleBlade1 = blades dstep 6deg.blade1
- "Not working" has stepsPerRevolution = 36
- "Working" has stepsPerRevolution = 60
The grid is different, and the moment in time is different. Why not copy one case to the other computer and show that it is really the computer? Right now I have the impression we are comparing apples with pears.
Regards, Gert-Jan |
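The Notepad++ comparison described above can also be scripted, which is handy when the .out files are long. The following is a minimal sketch using Python's standard `difflib`; the function name `changed_lines` is made up for illustration, and the two sample texts contain only the expressions quoted in this post, not the real output files.

```python
import difflib
from itertools import islice


def changed_lines(text_a: str, text_b: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two texts."""
    diff = difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(), lineterm="", n=0
    )
    # The first two yielded lines are the '---' / '+++' file headers.
    body = islice(diff, 2, None)
    # Hunk markers start with '@@', so they are filtered out here.
    return [line for line in body if line.startswith(("+", "-"))]


# Sample texts: only the two expressions quoted in the thread.
not_working = (
    "deltaAngleBlade1 = blades dstep 10deg.blade1\n"
    "stepsPerRevolution = 36\n"
)
working = (
    "deltaAngleBlade1 = blades dstep 6deg.blade1\n"
    "stepsPerRevolution = 60\n"
)

for line in changed_lines(not_working, working):
    print(line)
```

For real runs, the two strings would be read from the attached .out files instead. An empty result would suggest the setups are textually identical, although run-specific lines such as timings will normally differ anyway.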
|
May 3, 2022, 06:27 |
|
#6 |
Member
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 13 |
Thanks for pointing this out. At the time these cases were representative.
Attached are two new files, which are like for like. The run that does not work failed after 8 timesteps. I still have no idea why. |
|
May 3, 2022, 07:01 |
|
#7 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,913
Rep Power: 28 |
I see no difference indeed, and I don't understand why it fails. I would ask ANSYS.
Meanwhile, perform a few tests:
- increase the number of iterations within a timestep. With max 4, convergence is absent.
- try double/single precision
- increase the memory allocation
Also, I wonder what you are solving, since the mesh displacement appears to do nothing..... |
|
May 4, 2022, 08:34 |
|
#8 |
Member
Ruud Caljouw
Join Date: Dec 2012
Posts: 45
Rep Power: 13 |
I am solving a variable pitch vertical axis tidal turbine in a channel.
I tried your suggestions. Increasing memory allocation (to 4, since 3 did not work) or increasing the iterations (4 to 6) did not work. But when I used single precision I got no error and it ran to the end. I have no idea why. Any idea? |
|
May 4, 2022, 09:17 |
|
#9 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,913
Rep Power: 28 |
I have no idea why it works in single precision.
However, it has nothing to do with your CFX-settings or physical models. It has to be something numerical and lies a level deeper. Still, I would ask ANSYS. You have more information now. |
|
May 4, 2022, 13:21 |
|
#10 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
If we are sure the cases run on both PCs are exactly the same, you could also try running them with the same number of threads. It could be that the simulation is borderline unstable, and some minor differences from domain decomposition push it over the edge.
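To illustrate why a different number of partitions can change results at all: floating-point addition is not associative, so summing the same values in a different grouping can give a different answer. The sketch below is a toy example only and says nothing about how CFX actually accumulates its sums; the function name `partitioned_sum` and the sample values are assumptions chosen to make the effect obvious.

```python
def partitioned_sum(values, n_parts):
    """Sum contiguous chunks separately, then add up the partial sums."""
    chunk = -(-len(values) // n_parts)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc += v  # plain left-to-right accumulation, no compensation
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# The exact sum of these values is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

serial = partitioned_sum(values, 1)     # one "partition"
two_parts = partitioned_sum(values, 2)  # two "partitions"
print(serial, two_parts)
```

In double precision the single-partition order gives 1.0 and the two-partition order gives 0.0, and neither matches the exact 2.0. In a well-conditioned simulation such differences stay at round-off level, but in a borderline unstable one they may be enough to send two runs down different paths.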
|
|
May 4, 2022, 20:27 |
|
#11 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,854
Rep Power: 144 |
I think Alex's comment is correct for a useful path forwards. It is likely this simulation is borderline numerically stable and small things are tipping it over the edge. So improve numerical stability with better mesh quality (that's the most important one), smaller time steps, check the physical models are correct, tighter convergence and things like that.
Also as Alex suggests you can try different decomposition or even run it serial.
|
May 4, 2022, 22:51 |
|
#12 | |
Senior Member
Join Date: Jun 2009
Posts: 1,873
Rep Power: 33 |
Quote:
A "segmentation violation" or "access violation" is an illegal memory access, caused for example by passing an invalid variable or function type. It should not be related to the numerical details of the model, except where the model's logic triggers pieces of code that have a problem. Advice: contact Ansys CFX support and provide the "trace" file listed at the end of the output file.
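As a small illustration of what an illegal memory access looks like from the outside, the sketch below starts a child Python process that deliberately reads from address 0 through `ctypes` and reports how the child ended. It is unrelated to the CFX solver itself, and the function name `run_illegal_access` is made up for this example. On Linux the child is expected to be killed by SIGSEGV (a negative return code); on Windows, `ctypes` may instead report the access violation as a Python exception, which still gives a non-zero exit code.

```python
import subprocess
import sys


def run_illegal_access() -> int:
    """Run a child Python process that reads address 0; return its exit code."""
    child_code = "import ctypes; ctypes.string_at(0)"  # read a C string at NULL
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


code = run_illegal_access()
print("child exit code:", code)
```

The point is that the crash comes from the operating system refusing a memory access, not from the numbers being computed, which is why a trace file is more useful to support than the convergence history.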
||
May 10, 2022, 09:43 |
Same Problems
|
#13 |
New Member
Bingqi Wang
Join Date: Apr 2022
Posts: 2
Rep Power: 0 |
I have the same problem with CFX 2022 on an EPYC 7452 with Windows Server.
1. For the multiphase case, it works fine on the Intel machine. However, random segmentation violations occur on the AMD machine. I tried a single precision run on AMD and it works.
2. For a normal single phase case, all machines work fine.
3. I have tried Intel MPI and MS MPI, but the problem still occurs. |
|
May 10, 2022, 09:56 |
|
#14 |
New Member
Bingqi Wang
Join Date: Apr 2022
Posts: 2
Rep Power: 0 |
I have tried some cases, and I hope this helps the developers find the problem:
1. Decreasing the overlap relaxation and using fewer cores helps the simulation hold for a longer time; however, it eventually crashes.
2. With the Multigrid solver turned off, the simulation converges much faster.
3. Single precision seems to work fine now.
4. The AMD machine has 512 GB of memory and the memory factor is sufficient. This machine runs in NPS4 mode.
5. I have tried changing MPI environment variables such as I_MPI_DEBUG and I_MPI_DEBUG_COREDUMP, but it doesn't change anything. https://www.intel.com/content/www/us...variables.html |
|
May 10, 2022, 18:22 |
|
#15 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,854
Rep Power: 144 |
No point in telling us, the developers do not watch this forum (not that they tell us, anyway).
You will have to report this to your ANSYS rep to pass it onto the developers.
|
May 27, 2022, 05:07 |
|
#16 | |
New Member
Denis Ruban
Join Date: May 2022
Location: Ukraine, Kyiv
Posts: 1
Rep Power: 0 |
Quote:
- I did not get such an error on another server PC (LGA-2011v3, Intel Xeon E5-2697 v3 3.6GHz, 64 GB RDIMM (ECC Reg.) 2133 MHz / 1.2 V). I realized that it was a matter of overclocked RAM on the desktop. By the way, the memory test in AIDA was not successful. After raising the RAM voltage to 1.38 V and reducing the frequency to 3465 MHz, the problem disappeared, and the desktop now passes the memory stress test in AIDA.
- That is, the error/problem lies solely in the stability of the RAM. It is recommended that you use ECC Reg. memory with standard timings (for example, CL22 for 3200 MHz RDIMM) for your Ryzen Threadripper with 4-channel topology. |
||