|
Fatal overflow in linear solver occurs when executing the solution in parallel
|
March 3, 2020, 11:27 |
Fatal overflow in linear solver occurs when executing the solution in parallel
|
#1 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
Hi, folks!
Let me describe the problem and the physics first. I am performing a sloshing analysis in a 2D rectangular box (256 x 256 mm) driven by an impulsive acceleration load: maximum acceleration 8 g, ramped, load duration 70 ms. The working fluids are air and water, both incompressible. The turbulence model is SST. I use the homogeneous multiphase model with the Standard free surface model option, the buoyancy model is included, and the surface tension model is off. Before this analysis I worked through the "Flow around a bump" tutorial and mostly reuse its solver settings. The computational domain is closed; the boundary conditions are four walls and two symmetry planes. I initialize the transient simulation with expressions so that the test tank is half filled with water. Under convergence control I set 3 to 5 coefficient loops and select the High Resolution scheme for all equations. I also enable Advanced Options -> Volume Fraction Coupling -> Coupled, and I use double precision for all runs.
Now to the failure I faced. At the very beginning of the mesh/timestep convergence study I hit a strange error. When I run the problem in serial I can solve it and obtain results. However, when I run the solution in parallel, it diverges at the first timestep. Even if I solve the first timesteps on a single core and then restart in parallel, I still get divergence. The error is Code:
+--------------------------------------------------------------------+
| ERROR #004100018 has occurred in subroutine FINMES.                 |
| Message:                                                            |
| Fatal overflow in linear solver.                                    |
+--------------------------------------------------------------------+
1) What am I doing wrong, and how can I fix the issue with the parallel run?
2) Are there any other mistakes in my physics/numerics setup?
I attach the CCL file, two meshes (4 and 8 mm element size) and two output files – a successful run and a failed one. Thanks in advance! |
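For reference, the kind of expression initialization I mean looks roughly like this (a sketch only – the expression names, the 0.128 m level and the water density are illustrative, not my exact CCL, and it assumes y is the vertical direction). Code:
LIBRARY:
  CEL:
    EXPRESSIONS:
      # illustrative names and values only
      WaterLevel = 0.128 [m]
      WaterVF = step((WaterLevel - y)/1 [m])
      AirVF = 1 - WaterVF
      DenWater = 997 [kg m^-3]
      HydroP = DenWater * g * (WaterLevel - y) * WaterVF
    END
  END
END
These expressions are then referenced as the initial values for the volume fractions and the relative pressure in the domain initialization.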
|
March 3, 2020, 13:46 |
|
#2 |
Senior Member
Join Date: Jun 2009
Posts: 1,880
Rep Power: 33 |
If you want to investigate what might be the problem, you can take advantage of the fact that you already solved the problem in serial (1 core).
Run the problem in parallel using the same initial conditions/guess as in the serial run. Set both simulations to stop before the timestep that previously failed in parallel. Now you have two results files, as well as two output files. Compare the two output files with a graphical file-difference tool to see what differs between them. Ignore the obvious things such as parallel settings, partitioning information (for now), etc. Are the diagnostics of the solution steps the same, or close enough? If not, you have something to investigate further. Both solutions should, in theory, proceed identically if the solutions of the linear equations are identical. Hope the above helps,
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
March 3, 2020, 14:11 |
|
#3 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
Thanks for the answer.
Unfortunately my simulation fails at the second coefficient loop of the first timestep. Code:
======================================================================
 TIME STEP =       1   SIMULATION TIME = 1.0000E-04   CPU SECONDS = 1.684E+01
 ----------------------------------------------------------------------
 |                       SOLVING : Wall Scale                         |
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.00 | 2.7E-04 | 2.7E-04 |  31.8  8.9E-02 OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.09 | 2.4E-05 | 1.6E-04 |  39.5  6.3E-02 OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.30 | 7.0E-06 | 4.8E-05 |  39.5  6.3E-02 OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1      CPU SECONDS = 1.750E+01
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk           | 0.00 | 9.6E-25 | 7.7E-24 |        8.2E+20 * |
 | V-Mom-Bulk           | 0.00 | 2.2E-23 | 1.8E-22 |        1.1E+21 * |
 | W-Mom-Bulk           | 0.00 | 0.0E+00 | 0.0E+00 |        0.0E+00 OK|
 | Mass-Water           | 0.00 | 2.5E-44 | 2.0E-43 |        5.2E+22 * |
 | Mass-Air             | 0.00 | 1.3E-45 | 1.0E-44 |  15.8  9.1E+22 * |
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 9.6E-16 | 2.8E-14 |  10.6  4.7E-10 OK|
 | O-TurbFreq-Bulk      | 0.00 | 6.2E-02 | 1.0E+00 |  17.3  8.9E-07 OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    2      CPU SECONDS = 1.878E+01
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 +--------------------------------------------------------------------+
 | ERROR #004100018 has occurred in subroutine FINMES.                |
 | Message:                                                           |
 | Fatal overflow in linear solver.                                   |
 +--------------------------------------------------------------------+
In addition, I ran the problem on another PC and the error is gone – the solution runs normally in parallel. Maybe I should reinstall ANSYS. |
|
March 3, 2020, 14:39 |
|
#4 |
Senior Member
Join Date: Jun 2009
Posts: 1,880
Rep Power: 33 |
I would compare the output files from the successful run and the failed run to understand what is different at the start of the run.
Similarly, the suggestion above applies to the diagnostics in the first coefficient loops. In theory they must be identical, but there are subtle differences in parallel that should go away once converged.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
March 3, 2020, 14:52 |
|
#5 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28 |
Did you try 2 or 3 partitions?
Did you try an alternative partitioning method, like recursive coordinate bisection (instead of MeTiS)? Regards, Gert-Jan |
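If it helps: the partitioning method ends up in the run's CCL under PARTITIONER STEP CONTROL, so switching it looks roughly like the sketch below (parameter names are from memory – please verify them against your own def/ccl file or the documentation; normally you would simply pick the method on the Partitioner tab of the Define Run / Solver Manager dialog). Code:
PARTITIONER STEP CONTROL:
  Runtime Priority = Standard
  PARTITIONING TYPE:
    # default is usually:
    #   Option = MeTiS
    #   MeTiS Type = k-way
    # alternative to try:
    Option = Recursive Coordinate Bisection
    Partition Size Rule = Automatic
  END
END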
|
March 3, 2020, 15:18 |
|
#6 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
Gert-Jan, Opaque.
I will try these recommendations tomorrow. |
|
March 4, 2020, 04:44 |
|
#7 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
Opaque
The output files differ from the beginning. It looks like the solution has already diverged during the first coefficient loop. Failed parallel output Code:
======================================================================
 TIME STEP =       1   SIMULATION TIME = 1.0000E-04   CPU SECONDS = 1.661E+01
 ----------------------------------------------------------------------
 |                       SOLVING : Wall Scale                         |
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.00 | 2.1E-04 | 2.1E-04 |  46.9  1.0E-01 ok|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.10 | 2.1E-05 | 1.7E-04 |  46.9  1.3E-01 ok|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.34 | 7.3E-06 | 6.2E-05 |  46.9  1.3E-01 ok|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1      CPU SECONDS = 1.734E+01
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk           | 0.00 | 7.0E-12 | 7.9E-11 |        1.8E+08 F |
 | V-Mom-Bulk           | 0.00 | 1.3E-10 | 1.5E-09 |        1.8E+08 F |
 | W-Mom-Bulk           | 0.00 | 0.0E+00 | 0.0E+00 |        0.0E+00 OK|
 | Mass-Water           | 0.00 | 2.0E-18 | 2.3E-17 |        2.5E+09 F |
 | Mass-Air             | 0.00 | 6.3E-21 | 7.5E-20 |  15.7  4.1E+09 F |
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 2.3E-06 | 3.4E-06 |  11.0  7.5E-10 OK|
 | O-TurbFreq-Bulk      | 0.00 | 1.2E-01 | 1.0E+00 |  12.8  3.5E-15 OK|
 +----------------------+------+---------+---------+------------------+
Successful output for comparison Code:
======================================================================
 TIME STEP =       1   SIMULATION TIME = 1.0000E-04   CPU SECONDS = 2.055E+00
 ----------------------------------------------------------------------
 |                       SOLVING : Wall Scale                         |
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.00 | 2.1E-04 | 2.1E-04 |  39.1  8.5E-02 OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.08 | 1.7E-05 | 3.6E-05 |  46.7  6.6E-02 OK|
 +----------------------+------+---------+---------+------------------+
 | Wallscale-Bulk       | 0.30 | 5.1E-06 | 1.1E-05 |  46.7  6.6E-02 OK|
 +----------------------+------+---------+---------+------------------+
 ----------------------------------------------------------------------
 COEFFICIENT LOOP ITERATION =    1      CPU SECONDS = 2.382E+00
 ----------------------------------------------------------------------
 | Equation             | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom-Bulk           | 0.00 | 2.9E-02 | 3.2E-01 |        2.0E-03 OK|
 | V-Mom-Bulk           | 0.00 | 4.1E-02 | 4.7E-01 |        1.8E-03 OK|
 | W-Mom-Bulk           | 0.00 | 0.0E+00 | 0.0E+00 |        0.0E+00 OK|
 | Mass-Water           | 0.00 | 5.8E-06 | 4.8E-05 |        5.1E-03 OK|
 | Mass-Air             | 0.00 | 2.6E-07 | 2.4E-05 |  15.8  1.5E-02 ok|
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE-Bulk        | 0.00 | 3.0E-07 | 2.1E-06 |   8.6  1.5E-15 OK|
 | O-TurbFreq-Bulk      | 0.00 | 1.2E-01 | 1.0E+00 |   9.3  1.3E-16 OK|
 +----------------------+------+---------+---------+------------------+
I have tried different partitioning methods, but the solution still diverges at the first timestep. Some meshes (I have 8, 4 and 2 mm variants) run on 3 cores, some run on 2 and 3 cores but fail on 4 cores. The coarse 8 mm case can run on all four cores. |
|
March 4, 2020, 05:13 |
|
#8 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28 |
This is strange. I would ask ANSYS.
Also, in CFD-Post I would check how the partitioning is done (look for the partition number variable). I would partition in the vertical or horizontal direction. By the way, do you currently have 1 element in the 3rd dimension? I would also run a test with 2 elements. |
|
March 4, 2020, 07:06 |
|
#9 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
I do use four elements through the thickness, but I still get divergence.
I checked the partition numbers on the mesh – they look adequate. |
|
March 4, 2020, 19:03 |
|
#10 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144 |
If areas of very high gradients (such as free surfaces) align with partition boundaries you can get convergence problems. It is best to make sure partition boundaries do not align with free surfaces. Based on your images of the partitions you are using it appears this is contributing.
I would try other partitioning algorithms (e.g. recursive bisection) and check that they give you a better partition pattern. I would think vertical stripes would probably be a good pattern for you. But as your free surface sloshes around all over the place, it might be challenging to find a partition shape which avoids the free surface for the entire run, so you will have to compromise a bit there.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
March 5, 2020, 05:45 |
|
#11 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
Thanks.
Today, in one of my test runs, I observed an error that may confirm your statement. I ran the model on 3 cores and the solution ran fine for some time, but then I got a sudden divergence (at one timestep the model runs as usual, and at the next everything diverges). When I switch back to the serial solver the error disappears. I will try different partition methods and report here if I have success.
By the way, if the problem is in large gradients, is it possible to reduce these gradients somehow? The goal of my calculation is to obtain a pressure time history to use in a finite element analysis, so I can neglect some physics that has a minor impact on the wall pressure. As I understand it, for this problem I should account for two main features:
* bulk flow of the water;
* pressure change inside the tank.
I have already performed a convergence study and can say that I can neglect turbulence effects and use a laminar viscous model. Next on the list is a study of the homogeneous vs. inhomogeneous multiphase model. Best practices recommend the inhomogeneous model for problems where the interface does not remain intact, but again – this interphase interaction may not affect the results I want to obtain. |
|
March 5, 2020, 06:37 |
|
#12 | |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144 |
Quote:
If your simulation is super-sensitive to the free surface lining up with the partition boundary, this suggests your model is very numerically unstable. A free surface simulation in a square box should not be very numerically unstable – so your problem is most likely a poor model setup causing the instability. To fix the root cause you should improve the numerical stability. Here are some tips:
* Double precision numerics
* Smaller timestep (how did you set the time step? Did you guess? If so, then you guessed wrong)
* Improve mesh quality
* Better initial conditions
* Check the physics is correctly configured
* Tighter convergence tolerance
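As a rough illustration of the last point, the convergence target lives under SOLVER CONTROL in the CCL – a sketch only, with an example value rather than a recommendation for your case (double precision is selected separately when you define the run). Code:
SOLVER CONTROL:
  CONVERGENCE CRITERIA:
    # tighter than the common default of 1.0E-4
    Residual Target = 1.0E-5
    Residual Type = RMS
  END
END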
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
||
March 5, 2020, 06:44 |
|
#13 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144 |
Just had a look at your setup.
* You have a fixed time step size. Unless this is the result of a time step sensitivity study, it will be wrong. I recommend you change to adaptive time stepping, converging in 3-5 coefficient loops per timestep. (Actually, your simulation reaches convergence later on in 3 or 4 coefficient loops, so your time step is probably not too far off for this convergence tolerance.)
* You have min 3, max 5 coefficient loops per timestep. Why have you done this? Set this to no minimum and a maximum of 10.
* Have you checked that your convergence tolerance is adequate? You should do a sensitivity check on this.
* I see this is a pseudo-2D simulation. In that case, make the thickness in the z direction equal to the element size in the x or y direction. This will bring your elements closer to aspect ratio 1.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
March 6, 2020, 04:08 |
|
#14 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
I tried partitioning the domain into four vertical stripes and it failed, but a solution with three horizontal partitions (so the whole initial free surface belongs to one partition) runs fine. However, I cannot be sure that the free surface location will not trigger this error again at some later time in the simulation.
* Increased the geometry thickness to bring the element aspect ratio close to 1:1 – done.
* Changed the timestepping control to adaptive timestepping. Here are my timestep controls. HTML Code:
TIME STEPS:
  First Update Time = 0.0 [s]
  Initial Timestep = 1e-6 [s]
  Option = Adaptive
  Timestep Update Frequency = 1
  TIMESTEP ADAPTION:
    Maximum Timestep = 0.001 [s]   # based on the time discretisation I need for the FEA analysis
    Minimum Timestep = 1e-10 [s]
    Option = Number of Coefficient Loops
    Target Maximum Coefficient Loops = 5
    Target Minimum Coefficient Loops = 3
    Timestep Decrease Factor = 0.8
    Timestep Increase Factor = 1.06
  END
END
...
SOLVER CONTROL:
  ...
  CONVERGENCE CONTROL:
    Maximum Number of Coefficient Loops = 10
    Minimum Number of Coefficient Loops = 1
    Timescale Control = Coefficient Loops
  END
END
* I ran a sensitivity study to determine an adequate RMS residual level. Results at 1e-3 and 1e-4 are pretty close. I use the time history of the force acting on a side wall and the pressure at one point as convergence parameters. The case with 5e-5 is still running; I will update this post once I have the results. By the way, I have noticed that the solver only checks convergence of the main flow quantities (mass, momentum, volume fraction), while CFX allows the turbulence residuals to stay much coarser. I did not assign any special controls to the turbulence residuals. Is this the intended solver behaviour? For example, with a target RMS of 1e-5 the flow parameters are converged, the turbulence residuals are 5e-5 for K and 3e-4 for Omega at the third coefficient loop, yet the solver does not iterate further and starts a new timestep. |
|
March 6, 2020, 04:28 |
|
#15 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28 |
My 50 cents:
If you run these kinds of multiphase simulations, convergence on residuals is quite hard, so 1e-4 may be difficult to reach. Better to add multiple monitor points, like your pressure point, and monitor pressure, velocity and volume fraction. To make sure you reach convergence within a timestep, switch on the option "Monitor Coefficient Loop Convergence"; you can find it at the top of the CFX-Pre tab Output Control > Monitor. This will give you the progression of the variables within a timestep – best results are obtained when you see flatliners everywhere. It also allows you to create a graph in the Solver Manager to plot these coefficient loops. I would also recommend plotting the time step size; you can do this by creating a monitor in CFX-Pre with the option "Expression" and the variable "Time Step Size". These things won't help the solver, but they show you graphically what the solver is doing and where it has difficulties. |
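In the CCL these monitors end up under OUTPUT CONTROL, roughly like the sketch below (coordinates and names are illustrative, and the exact parameter names – in particular the coefficient-loop switch and the expression monitor – may differ slightly between versions, so verify them in CFX-Pre). Code:
OUTPUT CONTROL:
  MONITOR OBJECTS:
    Monitor Coefficient Loop Convergence = On
    MONITOR POINT: Wall Pressure
      Option = Cartesian Coordinates
      Cartesian Coordinates = 0.0 [m], 0.05 [m], 0.002 [m]
      Output Variables List = Pressure, Water.Volume Fraction, Velocity
    END
    MONITOR POINT: Timestep Size
      Option = Expression
      Expression Value = Time Step Size
    END
  END
END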
|
March 6, 2020, 05:27 |
|
#16 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144 |
Note that for this sensitivity analysis, rather than the normal approach of comparing important variables (like what you seem to be doing quite nicely) you should look at how numerically stable the result is.
Maybe consider choosing the most unstable configuration – 4 partitions with MeTiS appears to crash your initial setup very early – then try tighter convergence and smaller time steps on that configuration and see whether it still crashes.
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
March 6, 2020, 06:31 |
|
#17 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
To Gert-Jan.
Thanks for the advice, I'll use the coefficient loop convergence monitors from now on. I already plot the timestep. For RMS 1e-4, most of the time the timestep is equal to or larger than 1e-4, but sometimes it falls to 1e-5. I also monitor the residual history of the main flow quantities (mass, momentum, volume fraction); most of the time convergence is OK and the residuals are below the desired level, but sometimes there are "spikes" where the solver cannot converge within 10 coefficient loops. As I mentioned before, it looks like the solver either does not consider the turbulence residuals or applies a much looser convergence criterion to them. Here are my additional convergence statistics.
To ghorrocks.
Unfortunately, with four partitions the solution diverges, in 95% of cases, at the second coefficient loop of the first timestep, before any residual metric can be applied. At this point I assume it is "unsafe" to launch the solution on many cores, even with three cores and "horizontal stripe" partitions. Some variants (different mesh size, physics, residuals) run fine on multiple cores, and some fail somewhere in the middle of the solution time. I cannot recognize a pattern in these failures. |
|
March 6, 2020, 06:39 |
|
#18 |
Senior Member
Gert-Jan
Join Date: Oct 2012
Location: Europe
Posts: 1,928
Rep Power: 28 |
You can also partition in a radial direction, or along a specified direction. Why not try (1,1,0) or (1,2,0)? Then the partition boundaries are not in line with your free surface. Alternatively, use more elements...
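A direction-based partitioning of that kind would sit in the same PARTITIONER STEP CONTROL block; something like the sketch below (I am not certain of the exact option and parameter names, so take them as an assumption and check the Solver Manager / documentation). Code:
PARTITIONER STEP CONTROL:
  PARTITIONING TYPE:
    Option = User Specified Direction
    # a direction not aligned with the free surface, e.g. (1,1,0) or (1,2,0)
    Partitioning Direction = 1, 1, 0
  END
END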
|
|
March 6, 2020, 09:55 |
|
#19 |
Senior Member
Join Date: Nov 2015
Posts: 246
Rep Power: 12 |
While trying to launch the solution on many cores I found another issue. Following the documentation I set the Pressure Level Information (placing the point inside the air phase), and my results changed dramatically. Yet when I check the pressure contour I don't see any difference. It is strange, because I had set the pressure distribution with an expression (hydrostatic pressure) and I assumed that initialization by expression would be enough.
|
|
March 6, 2020, 13:15 |
|
#20 |
Senior Member
Join Date: Jun 2009
Posts: 1,880
Rep Power: 33 |
Have you looked into the previous output file for a warning regarding the pressure level information?
If you have a closed system, the pressure level is undefined. Some setups may get away without specifying it, but the initial conditions alone are not guaranteed to define the level.
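For reference, the setting usually looks something like the sketch below in the CCL, nested under SOLVER CONTROL (the coordinates are illustrative and should point into the air region; the exact nesting and parameter names may differ between versions, so treat this as an assumption and check your own CCL). Code:
SOLVER CONTROL:
  PRESSURE LEVEL INFORMATION:
    Option = Cartesian Coordinates
    # a point inside the air phase, near the top of the box
    Cartesian Coordinates = 0.01 [m], 0.24 [m], 0.002 [m]
    Pressure Level = 0 [Pa]
  END
END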
__________________
Note: I do not answer CFD questions by PM. CFD questions should be posted on the forum. |
|
|
|