CFX Solver does not write the results file and returns with error code 1 |
May 3, 2016, 04:05 |
|
#1 |
New Member
Zeeshan Saeed
Join Date: May 2016
Posts: 3
Rep Power: 10 |
Hi,
I am running an unsteady simulation of a turbine blade passage using the Fourier Transformation method in ANSYS CFX 15.0. Some relevant parameters are:
- Frequency: 1294.85 Hz
- Interblade Phase Angle: -72 / Nodal Diameter: -16 (corresponding to a set of 80 blades)
- No. of Time Steps per Period: 16
- Total Number of Periods: 6
- No. of Fourier Coefficients: 3
I have prescribed the desired mode shape onto the CFD mesh and specified the steady-state solution as the initial condition. The solver starts fine and keeps solving until the last time step of the simulation, but when it comes to writing the results, it returns error code 1 with this message:
"An error has occurred in cfx5solve: The ANSYS CFX solver exited with return code 1. No results file has been created."
Then it says:
"The following user files have been saved in the directory zFas3A4b, mon"
I have made several attempts but face the same issue every time. Can anyone please suggest a way to overcome this issue? Also, how can I use the two files created at the end? I shall be very grateful. Regards, Zeeshan Saeed |
|
May 3, 2016, 04:30 |
|
#2 |
Senior Member
Maxim
Join Date: Aug 2015
Location: Germany
Posts: 413
Rep Power: 13 |
Please post the complete error message/out file.
|
|
May 3, 2016, 05:06 |
|
#3 |
New Member
Zeeshan Saeed
Join Date: May 2016
Posts: 3
Rep Power: 10 |
Hi, please see the error message here as well as the text file (read from the .out file). Since the file size was greater than the forum allows, I have deleted the time steps and coefficient loop iterations. I hope it serves the purpose.
+--------------------------------------------------------------------+
| An error has occurred in cfx5solve:                                |
|                                                                    |
| The ANSYS CFX solver exited with return code 1. No results file    |
| has been created.                                                  |
+--------------------------------------------------------------------+
End of solution stage.
+--------------------------------------------------------------------+
| The following user files have been saved in the directory          |
| C:\Users\zeeshans\Desktop\CoarseMesh_IBPA(72)\B_1B _IBPA(72)_001:  |
|                                                                    |
| zFas3A4b, mon                                                      |
+--------------------------------------------------------------------+
This run of the ANSYS CFX Solver has finished. |
|
May 3, 2016, 05:48 |
|
#4 |
Senior Member
Maxim
Join Date: Aug 2015
Location: Germany
Posts: 413
Rep Power: 13 |
That's quite strange - usually there's more information about the error. And your 'return code 1' shows up when CFX is writing the result file to your hard drive.
Do you have enough disk space? Are you allowed to write big files on that drive/folder? (not sure how big the result file will be in your simple case though) In case this is a computer from the university - some schools have a file size/space limit in the home folders such as the desktop... Besides that, I am out of ideas - might have to wait for the geniuses |
|
May 3, 2016, 12:02 |
|
#5 |
New Member
Zeeshan Saeed
Join Date: May 2016
Posts: 3
Rep Power: 10 |
Memory does not seem to be a problem.
Thanks for your comments, anyways. |
|
May 3, 2016, 22:22 |
|
#6 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,870
Rep Power: 144 |
As Maxim says, the run appears to have completed successfully but when it is writing the results file it fails. So I would suggest (and some of these are repeated from Maxim):
* You ran out of disk space, or your quota filled up, or something stopped you writing the file to disk.
* You lost the network at the end when writing the result file.
* Writing the results file involved calculating a variable which could not be evaluated (divide by zero, undefined variable, etc.). Could you have a user-defined variable which has a divide by zero? |
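The first suggestion above can be checked before committing to a long run. The following is only a rough sketch in Python: the function name `check_run_directory` and the size threshold passed to it are made up for illustration, and it covers free space and write permission only. It does not detect per-user quotas or network interruptions, which would need the site's own tools.

```python
import shutil
import tempfile


def check_run_directory(path, required_bytes):
    """Return a list of problems that could stop a results file being written.

    `required_bytes` is a guess at the results file size supplied by the
    caller; it is not something this sketch can work out for itself.
    """
    problems = []

    # Free space on the file system that holds `path`.
    usage = shutil.disk_usage(path)
    if usage.free < required_bytes:
        problems.append(
            f"only {usage.free} bytes free in {path}, {required_bytes} wanted"
        )

    # Write permission: try to create (and automatically delete) a scratch file.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot create files in {path}: {exc}")

    return problems


if __name__ == "__main__":
    # Example: ask for 5 GB of headroom in the current working directory.
    for problem in check_run_directory(".", 5 * 1024**3):
        print(problem)
```

An empty list means neither of the two checks found anything; it does not prove the results file will be written.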
|
June 22, 2016, 03:59 |
Problem with saving .mon file instead of .res file.
|
#7 |
New Member
Noukhez Ahmed
Join Date: Mar 2014
Posts: 9
Rep Power: 12 |
Hi guys,
Just to share my experience. I am working on the flow inside a centrifugal compressor stage under steady conditions in CFX. I had the same issue today: no .res or .bak file was saved for post-processing the results. I looked on the CFD forum for a solution, but could not find one.
There is also a huge file, 'zFas3A4b', saved in the working directory, which is very useful, I guess. What I did was copy that file, add a .res extension to the copy, and open it in CFD-Post, and the problem was solved. It has all the data saved in it, and post-processing works fine as well.
One more important thing is the solver memory allocation factor. I think it needs to be increased, to 1.5 for my simulation I guess, but it depends on the memory the solver needs for the simulation. The reason I suggest this is that I am running 5 other simulations with a solver memory allocation factor of 1.5 and they are working fine and saving their data. However, I forgot to specify the solver memory allocation factor for two other simulations, and those did not save their data. Therefore, specifying the solver memory allocation factor is important.
I hope this resolves your issue as well, Zeeshan. Regards, Noukhez |
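The copy-and-rename step described above could be scripted roughly as follows. This is a sketch only: the helper name `copy_leftover_as_res` is hypothetical, the file name zFas3A4b is simply the one reported in this thread, and nothing here guarantees the leftover file holds a complete solution (a later post reports that only X, Y and Z were available after doing the same thing).

```python
import shutil
from pathlib import Path


def copy_leftover_as_res(run_dir, leftover_name="zFas3A4b"):
    """Copy the leftover solver file to '<name>.res' next to it.

    The original file is left untouched, and an existing .res file is
    never overwritten.
    """
    src = Path(run_dir) / leftover_name
    if not src.is_file():
        raise FileNotFoundError(src)

    dst = src.with_name(leftover_name + ".res")
    if dst.exists():
        raise FileExistsError(dst)

    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst
```

Working on a copy keeps the original file available in case CFD-Post cannot read the result.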
|
July 25, 2018, 08:48 |
|
#8 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 9 |
Hi everyone,
Sorry to hijack this post, but it seems fairly relevant to an issue I am having. Similar to Zeeshan's original post, the solver for my problem runs perfectly smoothly until it comes to writing the results. I should note that this problem only occurs in parallel; the results file is written just fine in serial. I am using the Intel MPI Local Parallel method, as this is the only method that seems to "work" at all. Furthermore, the process never gives an error message. It simply hangs at this step, with no progress in the out file after it prints the Variable Range Information. Eventually I have to kill the processes manually using
Code:
kill -9 <PID>
Afterwards, I am left with the same zFas3A4b file as well as mon and pids files. I copied zFas3A4b to zFas3A4b.res to see if the results were there, as Noukhez suggested. However, when I open this file in Post, none of the variables have been written; only X, Y and Z are available for viewing.
I should also note that I checked whether any lock files were causing the program to stop. I noticed two such lock files: one was sm.<USER_ID>.<PID>.lock, present while the solver was running, and the other was a lock file on zFas3A4b. I deleted both to see if they were the cause, but the problem persisted.
Does anyone have any suggestions or experience with similar issues? This has been a problem for the last few days and it seems I am no closer to solving it. Any help is greatly appreciated. Regards, James |
|
July 25, 2018, 09:04 |
|
#9 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,870
Rep Power: 144 |
I cannot tell you what the problem is, only to suggest some ideas:
* If you run a tutorial example in serial, local parallel and distributed parallel - does that complete OK?
* If you run this example in distributed parallel - does that complete OK?
* Have you tried the other parallel options?
* If you save a backup file during a run does it crash when writing that?
* Are you sure your network is fast, stable and not flooded with junk packets from other users?
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
July 25, 2018, 12:43 |
|
#10 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 9 |
Hi Glenn,
Thank you for taking the time to reply to my post. I greatly appreciate your suggestions. Quote:
Code:
ccl
cfx5.mms
cfx5.tt
def
gui-err.txt
mms
mms.setup
mms.setup.attrb
mon
mon.old
mpd.hosts
out
par
pids
sm.jg847pc.14441.lock
zFas3A4b
zFas3A4b.lck
Quote:
Quote:
Code:
Permission denied, please try again.
/ansys_inc/v182/CFX/bin/linux-amd64/ifort/solver-mpi.exe: error while loading shared libraries: libmport.so: cannot open shared object file: No such file or directory
/ansys_inc/v182/CFX/bin/linux-amd64/ifort/solver-mpi.exe: error while loading shared libraries: libmport.so: cannot open shared object file: No such file or directory
MPI Application rank 0 exited before MPI_Init() with status 127
mpirun: Broken pipe
Quote:
Quote:
Is there anything else I could do to test what is wrong with my set up? It seems strange that this has been such a massive issue. Regards, James |
|
July 25, 2018, 20:26 |
|
#11 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,870
Rep Power: 144 |
Most importantly - CFX is a thoroughly tested piece of software. It definitely does not have this problem in CFX itself. There is something unique in your setup which is causing the problem. So you have to find what is unique in your setup and fix it.
You should always run the most recent version of CFX, as it fixes many bugs and adds new features. So definitely go to the most recent version. This problem may magically disappear if you do. But you appear to be running Linux, and debugging this sort of problem on Linux is always a nightmare. Linux has such a convoluted network of libraries and background applications to get this sort of stuff working that it is impossible for mere mortals to figure it out. So my only recommendation here is to do a full update of all your Linux libraries to the latest versions. Also, make sure you have read and followed the instructions in the CFX installation documentation. There are a few special tasks you need to do for parallel operation.
|
July 27, 2018, 05:54 |
|
#12 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 9 |
Hi Glenn,
Thank you for your response. Unfortunately, I am unable to upgrade to V19, as V18.2 seems to be the latest version my university has the installation media for. I will look through the Ansys installation guide again and see if there is something I missed. Regards, James |
|
July 27, 2018, 08:51 |
|
#13 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,870
Rep Power: 144 |
You download the installation from the ANSYS Customer website. I have not seen physical media for CFX or ANSYS for decades.
There is a special section in the installation notes about parallel setup, especially for distributed parallel.
|
July 27, 2018, 08:54 |
|
#14 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 9 |
We're not using physical installation media, but we have it on our shared server available to mount. I'm not sure why the newest version is not available there, but I will ask.
James |
|
July 27, 2018, 08:58 |
|
#15 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,870
Rep Power: 144 |
Oh yes: What Linux distribution are you using?
The CFX forum thread on getting CFX working on unsupported Linux distributions is the longest and most read thread on the forum: https://cfd-online.com/Forums/cfx/25...istros-14.html
|
July 27, 2018, 12:05 |
|
#16 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 9 |
I'm using Ubuntu 16.04. I used "Ansys 18.2 on Ubuntu 16.04: Installation Guide" as a guide for installation, but I will also sift through the thread you sent to see if anyone has had a similar issue.
Thanks again for your help Glenn. Hopefully I can sort something out. James |
|
February 21, 2020, 12:53 |
Same issue, any solution found?
|
#17 |
New Member
Sidharth K PIllai
Join Date: Aug 2019
Location: INDIA
Posts: 12
Rep Power: 7 |
Hello everyone,
I am facing the same problem now. I know this is an old post, but has anyone found a solution for this problem? The error message I get in the black graphics window while running CFX 18.2 on Windows 10 in local parallel MPI mode is this: "application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 An error has occurred in cfx5solve:" I tried initialising the case several times with many modifications, thinking I had done something wrong, but the same file runs smoothly on another workstation. |
|
November 16, 2023, 12:11 |
|
#18 |
New Member
Join Date: Feb 2015
Posts: 1
Rep Power: 0 |
Hello all, lonnnnnnnnng time reader, first time caller.
I know this is now many years too late, but it doesn't seem like a solution has been pinned down, and I wanted to chime in for posterity. A while ago I was having an issue very similar to the one described in this thread. I was fortunate to work with my university's very talented research computing personnel, and I was able to figure out the problem in my case. I no longer have much idea where all of the out files and logs are, but I will do my best from memory with some generalities.
I was attempting to run a multistage, full-annulus turbomachinery model (~600m elements). I am fortunate enough to have access to quite a lot of computational horsepower (clearly), and as a grad student my time is still very cheap, so literally over a hundred run attempts were made for troubleshooting. The crux of my issue was that once I finally got the memory correct for the behemoth to run, it would crash when trying to save the .res files. I believe this is the common thread here (zeeshans, jgross). I, too, was convinced it couldn't be a memory problem.
We use the Slurm scheduler for submitting batch jobs to our computing cluster. I was able to learn some things (with a lot of help) using some native Slurm commands and a few special ones included by our research computing staff. Every time the model crashed while writing results, I would get the incredibly frustrating and useless Ansys error messages that we all know and love, usually returning a code 1 or a code 2, always with the zFas3A4b (what an odd series of characters for this) "result/backup" file. I could always see (with the Slurm tools) that when my jobs failed, my memory usage was well under the maximum available. So I continued allocating more and more, trying to solve the issue. Same result over and over. I reduced the size of my model by half (to ~600m elements... did I mention I am a graduate student with stupid amounts of cores available and maybe not the best decision-making at the time... my embarrassment is relevant and funny, I suppose) and the issues continued.
What finally lifted the curtain on this incredibly frustrating issue was noticing in the Slurm out file (not the Ansys one...) that the jobs were exiting with "return code 9". Ansys continued with its terrible error reporting of the same thing over and over, with no description or documentation to help troubleshoot. Here was a section from one of these out files:
Code:
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 93089 RUNNING AT cluster-a052
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================
Here was the response from one of our research computing staff (I've obfuscated identifying details of institution, etc.):
"Process death with signal 9 (unequivocal hard kill) is most commonly associated with an out-of-memory watchdog. I agree with you that this is more of a Slurm issue than an Ansys issue (though both are at fault). Specifically, the way we configured Slurm to watch for per-core memory. You may have tamed your model to fit in the confines, but I bet there's something else inside Ansys that tries to reach for just a bit more - and it gets shot by Slurm. Looking at your recent jobs that touched cluster-a052, I see two:
Code:
$ sacct -X -u myusername -S 2021-07-01 -N cluster-a052
JobID   JobName    Partition  Account AllocCPUS State      ExitCode
------- ---------- ---------- ------- --------- ---------- --------
3979729 myjob+     clus-a     mygrp   2560      FAILED     2:0
3979731 myjob+     clus-a     mygrp   2560      OUT_OF_ME+ 0:125
Code:
3979729 Max Mem used: 241.20G (clus-a052,clus-a300,clus-a368)
3979731 Max Mem used: 242.78G (clus-a212,clus-a052,clus-a300)
You might try adding '--mem=250000M' or '--mem=249G' to your script (I am not sure which one gets converted into the larger value) to see if you could squeeze a couple of extra GBs for the garden, and whether this could be enough for things to proceed. If it works, this could be an easy (albeit potentially fragile) way out. If not... then you'd have to go back to the magic Ansys switches to tell it to limit its appetite somewhat."
This was the first real confirmation to me that there were, indeed, memory issues occurring. They were just not being reported by Ansys in any way, or when they were reported, I assumed that, as usual, I needed to bump the allocation factors. As a result of this exchange, I started over with my memory multipliers. I resubmitted after satisfying each memory issue (these finally started being reported correctly), iterating on the multipliers, e.g., bump -size-nr by 0.1, new error... bump -size-ni by 0.1... new error, bump -size-nr by 0.1... until.... it worked! The final memory allocation looked like this:
Code:
-size-cat 2.0x -size-nr 1.9x -size-ni 2.3x -single -size-interp-cat 10.0x -large
Each -size factor was added as the result of a new error that appeared after satisfying a previous one. When I watched the job as it saved, there was quite a massive jump in memory that was not recorded by the scheduler before the process crashed. When it finally began working and saving, I could see this happening: it would jump from about 180 GB/node to about 220-230 GB/node. Incredibly frustrating, but alas, I now know about this. Sorry for my long story, but I hope this can help someone else not spend months going down the wrong path... |
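The one-factor-at-a-time iteration described in this post can be kept track of with a few lines of Python. This is only a bookkeeping sketch: the helper names `bump` and `format_size_flags` are hypothetical, the starting value of 1.0 for a factor that has not been set yet is an assumption, and the flag names are simply those quoted in the post. It builds the argument string and does not launch the solver or read its error messages.

```python
def bump(factors, flag, step=0.1):
    """Return a copy of `factors` with one memory multiplier raised by `step`.

    A flag that is not present yet is assumed to start from 1.0
    (an assumption made for this sketch).
    """
    updated = dict(factors)
    updated[flag] = round(updated.get(flag, 1.0) + step, 2)
    return updated


def format_size_flags(factors):
    """Render the multipliers in the '<flag> <value>x' form quoted in the post."""
    return " ".join(f"{flag} {value:.1f}x" for flag, value in factors.items())


if __name__ == "__main__":
    # Each resubmission raises whichever factor the latest error complained about.
    factors = {"-size-cat": 2.0, "-size-nr": 1.8, "-size-ni": 2.2}
    factors = bump(factors, "-size-nr")
    factors = bump(factors, "-size-ni")
    print(format_size_flags(factors))
```

Keeping the factors in one place makes it easier to record which multiplier was raised for which run, which matters when each attempt takes hours.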
|