August 20, 2012, 11:54 |
Problem running OF on cluster
|
#1 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Dear all,
I am trying to run OF on a cluster. I installed centFOAM, which has OF-2.1.1. I have run the tutorial case pitzDaily, both parallel and non-parallel, on the cluster without a problem. But when I tried to run my own mesh, it gave me this error:

--> FOAM FATAL IO ERROR:
wrong token type - expected Scalar, found on line 3 the word 'nan'

file: /scratch/gpfs/hangdeng/FOAM_Run/test1/system/data::solverPerformance: at line 3.

From function operator>>(Istream&, Scalar&)
in file lnInclude/Scalar.C at line 91.

FOAM exiting

I thought my mesh might have a problem. However, I ran the same mesh and case set-up on my workstation, and everything is fine. My workstation has OF-2.0.X. I am not sure whether this is because of the version difference, or whether something more complicated went wrong with the cluster installation.

If anyone has any idea or suggestion, I would greatly appreciate it. Thank you so much for your help. Best, Hang |
|
August 21, 2012, 16:37 |
|
#2 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
On the cluster, the error happened at time step 99, this is how it looks:
Time = 98
smoothSolver: Solving for Ux, Initial residual = 0.521635, Final residual = 0.00282749, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.610594, Final residual = 0.00331592, No Iterations 8
smoothSolver: Solving for Uz, Initial residual = 0.42209, Final residual = 0.00283999, No Iterations 7
GAMG: Solving for p, Initial residual = 0.30105, Final residual = 7.36962e+55, No Iterations 100
GAMG: Solving for p, Initial residual = 0.849026, Final residual = 3.30525e+93, No Iterations 100
time step continuity errors : sum local = 9.55575e+155, global = 6.81616e+147, cumulative = 6.81616e+147
ExecutionTime = 439.68 s ClockTime = 444 s

Time = 99
smoothSolver: Solving for Ux, Initial residual = 0.566129, Final residual = 0.000374191, No Iterations 2
smoothSolver: Solving for Uy, Initial residual = 0.63567, Final residual = 0.00245801, No Iterations 2
smoothSolver: Solving for Uz, Initial residual = 0.50241, Final residual = 0.000574002, No Iterations 2
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100

It is obvious that the residuals for both p and U are too high, which generates this 'nan' error.
However, the log file on my workstation looks quite normal:

Time = 98
smoothSolver: Solving for Ux, Initial residual = 0.00059195, Final residual = 4.37997e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000782548, Final residual = 6.39388e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000572221, Final residual = 4.79188e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00922742, Final residual = 8.04101e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850465, Final residual = 7.19074e-06, No Iterations 5
time step continuity errors : sum local = 5.29957e-05, global = -1.78685e-07, cumulative = -0.000110236
ExecutionTime = 499.88 s ClockTime = 500 s

Time = 99
smoothSolver: Solving for Ux, Initial residual = 0.000572748, Final residual = 4.24225e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000761879, Final residual = 6.23747e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000557909, Final residual = 4.68015e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00920748, Final residual = 8.09545e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850217, Final residual = 7.26579e-06, No Iterations 5
time step continuity errors : sum local = 5.35504e-05, global = -2.14871e-07, cumulative = -0.000110451
ExecutionTime = 503.95 s ClockTime = 504 s

Given that the case set-ups are the same, I am not sure why the computation has gone wrong on the server. Can anyone give me some idea or suggestion? I truly appreciate it! Thank you. Best, Hang |
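One way to pinpoint where a run like the cluster one starts to diverge is to scan the solver log for the first residual that is `nan` or larger than some threshold. The following is only a sketch: it assumes the log layout shown in the two posts above ("Time = N" lines followed by "... residual = X," entries), and the function name `first_bad_residual` and the default threshold of 1 are illustrative choices, not part of OpenFOAM.

```shell
# Hypothetical helper: print the first log line whose residual is nan/inf
# or exceeds a threshold (default 1), prefixed with the current time step.
# Usage: first_bad_residual <logfile> [threshold]
first_bad_residual() {
    awk -v limit="${2:-1}" '
        /^Time = / { t = $3 }                    # remember the current time step
        /residual = / {
            for (i = 1; i <= NF; i++) {
                if ($i == "residual" && $(i + 1) == "=") {
                    v = $(i + 2)
                    sub(/,$/, "", v)             # drop the trailing comma
                    if (v ~ /nan|inf/ || v + 0 > limit) {
                        print "Time = " t ": " $0
                        exit
                    }
                }
            }
        }
    ' "$1"
}
```

Called as `first_bad_residual log`, it prints nothing for a log like the workstation one, where every residual stays well below 1. For the cluster log it would stop at the first pressure solve in Time = 98, one step before the `nan` values appear.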
|
August 21, 2012, 16:59 |
|
#3 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
From your first post, the address seems a bit strange: Code:
system/data::solverPerformance::p On the cluster:
The differences shown in the second post are indeed very large; the initial residuals are about 1000 times smaller on your own machine with 2.0.x. I believe CentFOAM still has an install option for 2.0.x as well. The other possibility would be to install 2.1.1 on your machine. Other than you testing things on your side, we'll need at least to know:
Bruno
|
|
August 21, 2012, 17:17 |
|
#4 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hi Bruno,
Thank you for the reply! I tried to run in parallel earlier (decomposed using the simple method) and it gave me similar errors. I thought the issue was related to parallel computation, so I tried running the mesh on a single core instead. The errors in the posts are for the single-core run. So,
Thank you~ Best, Hang |
|
August 22, 2012, 17:58 |
|
#5 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
I think you forgot to answer this question: Quote:
Although, the one detail that comes to mind is that the configuration of "fvSolution" might have some minor differences between the two versions. For example, if you run a command similar to this one: Code:
diff -Nur ~/OpenFOAM/OpenFOAM-2.0.x/tutorials/incompressible/simpleFoam/pitzDaily ~/OpenFOAM/OpenFOAM-2.1.x/tutorials/incompressible/simpleFoam/pitzDaily
The output should include something like this: Code:
@@ -1,7 +1,7 @@
 /*--------------------------------*- C++ -*----------------------------------*\
 | =========                 |                                                 |
 | \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
-|  \\    /   O peration     | Version:  2.0.0                                 |
+|  \\    /   O peration     | Version:  2.1.x                                 |
 |   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
 |    \\/     M anipulation  |                                                 |
 \*---------------------------------------------------------------------------*/
@@ -80,12 +80,18 @@
 relaxationFactors
 {
-    p               0.3;
-    U               0.7;
-    k               0.7;
-    epsilon         0.7;
-    R               0.7;
-    nuTilda         0.7;
+    fields
+    {
+        p               0.3;
+    }
+    equations
+    {
+        U               0.7;
+        k               0.7;
+        epsilon         0.7;
+        R               0.7;
+        nuTilda         0.7;
+    }
 }
By the way, you can safely have more than one version of OpenFOAM on your machines. For example, instead of having this in "~/.bashrc": Code:
source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc
you can define one alias per version: Code:
alias of20x='source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc'
alias of210='source $HOME/OpenFOAM/OpenFOAM-2.1.0/etc/bashrc'
Best regards, Bruno
|
||
August 23, 2012, 11:36 |
|
#6 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hi Bruno,
Thank you for the reply. I copied the system files from the OF21 tutorial and changed the values accordingly, but it is still giving me the same error. I will try installing OF20 to see if it works. About 'Does running checkMesh in parallel give the same output with both versions of OpenFOAM?': I am not sure how to run checkMesh in parallel. Could you elaborate on that a little bit? Thank you so much. Best, Hang |
|
August 23, 2012, 15:04 |
|
#7 | ||
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
Quote:
But still, my usual suggestion is to create a small and simple case that can reproduce the same error, then share it here on the forum. Usually, a modified tutorial does the trick. Of course the same steps should be taken for execution, whenever possible. For example, mapFields and so on. Quote:
For example, with foamJob: Code:
foamJob -s -p checkMesh
foamJob -s -p simpleFoam
Bruno
|
|||
August 23, 2012, 22:39 |
|
#8 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hi Bruno,
I ran checkMesh on my workstation and on the cluster; the logs are uploaded to the link below. Now the problem is that if I cut 1/10th of the mesh out and run it (using all the system files from OF20), it works on the cluster with and without -parallel. But when the mesh is larger, the problem starts to appear. I uploaded the case which failed on the cluster here: http://www.princeton.edu/~hangdeng/. I would appreciate it if you could take a look. Thank you. Best, Hang |
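When comparing checkMesh logs from two machines, a plain diff is dominated by header lines that always differ (host, date, process ID and so on). The sketch below filters those out before diffing. It assumes the usual OpenFOAM log header fields (Build, Exec, Date, Time, Host, PID, Case, nProcs); the helper name `compare_checkmesh` and the log file names in the usage line are hypothetical.

```shell
# Hypothetical helper: diff two checkMesh logs while ignoring run-specific
# header lines. Returns diff's exit status (0 = no remaining differences).
# Usage: compare_checkmesh <log_a> <log_b>
compare_checkmesh() (
    # Header fields assumed to look like "Host   : ..." at the start of a line.
    pattern='^(Build|Exec|Date|Time|Host|PID|Case|nProcs)[[:space:]]*:'
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || exit 2
    grep -Ev "$pattern" "$1" > "$tmp_a"
    grep -Ev "$pattern" "$2" > "$tmp_b"
    diff -u "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    exit "$status"
)
```

For example, `compare_checkmesh log.checkMesh.workstation log.checkMesh.cluster` would leave only the differences in mesh statistics and quality checks, plus the version line of the banner, which is expected to differ between 2.0.x and 2.1.1.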
|
August 24, 2012, 18:46 |
|
#9 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
Access is restricted on that link. I can see the list of files, but I don't have permissions for downloading. Cut 1/10th... do you mean that you're simulating only part of the whole volume, or the cell count is 1/10th (i.e., a coarser mesh)? Best regards, Bruno
|
|
August 24, 2012, 19:28 |
|
#10 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hi Bruno,
Sorry for the confusion, I meant part of the mesh, not a coarser mesh. Apologies, I didn't realize the link was restricted. Do you mind giving me your email address through private message so that I can share it with you through dropbox or google drive? Thank you~ Best, Hang |
|
August 24, 2012, 19:51 |
|
#11 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hi Bruno,
Never mind, I have changed the permissions so that you should be able to download the files from this link: http://www.princeton.edu/~hangdeng/ Thank you. Best, Hang |
|
August 25, 2012, 06:32 |
|
#12 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
I've confirmed that this problem is triggered as soon as we switch from OpenFOAM 2.0.x to 2.1.0. I've tried making some minor adjustments in "fvSchemes" and reducing the relaxation parameters, even tried re-decomposing with scotch, and tried converting the mesh using foamFormatConvert in case it was some sort of mesh incompatibility... And nothing worked! BUT! I've found an interesting solution: polyDualMesh! Here are the steps I've taken:
A few more notes on the changes needed from OpenFOAM 2.0 to 2.1:
Conclusion: if you want, you can/should report this bug to the OpenFOAM team, since this seems to be a very strange numerical discrepancy, mainly due to the tetrahedral mesh. Sharing the case with them is crucial, since this seems to be a very isolated problem. I think you already know, but in case you don't, the bug tracker for OpenFOAM is this one: http://www.openfoam.com/mantisbt/ Best regards, Bruno
|
|
August 25, 2012, 17:25 |
|
#13 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hello Bruno,
Thank you soooo much! polyDualMesh works! At least for the single-core case. But I was not able to run decomposePar with scotch. I followed this tutorial (http://web.student.chalmers.se/group...elLucchini.pdf) in setting up the dict file:

\*---------------------------------------------------------------------------*/
FoamFile
{
    version 2.0;
    format ascii;
    class dictionary;
    location "system";
    object decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

method scotch;

simpleCoeffs
{
    n ( 2 1 1 );
    delta 0.001;
}

hierarchicalCoeffs
{
    n ( 2 2 1 );
    delta 0.001;
    order xyz;
}

metisCoeffs
{
    processorWeights { 1 1 1 1 };
}

manualCoeffs
{
    dataFile "";
}

distributed no;

roots ( );

// ************************************************************************* //

but it gives me errors:

Selecting decompositionMethod scotch

--> FOAM FATAL ERROR:
You are trying to use scotch but do not have the scotchDecomp library loaded.
This message is from the dummy scotchDecomp stub library instead.
Please install scotch and make sure that libscotch.so is in your LD_LIBRARY_PATH.
The scotchDecomp library can then be built in $FOAM_SRC/parallel/decompose/decompositionMethods/scotchDecomp

Am I missing something?

Relating to polyDualMesh:
(1) Could you please elaborate on the number '30'? I actually posted a thread (http://www.cfd-online.com/Forums/ope...ydualmesh.html) a while ago about polyDualMesh, where I used 60 but failed to convert the mesh.
(2) After the conversion, my understanding is that the geometry of the object should not be changed, right?

Also, I have other even larger and more complex meshes. I will try them on the cluster and let you know whether they work as well! Thank you~ Best, Hang |
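The error message above asks for libscotch.so to be in LD_LIBRARY_PATH, so a first check is simply whether any directory on that path contains it. This is a minimal sketch for sh or bash, run after sourcing the OpenFOAM bashrc; the helper name `find_libscotch` is hypothetical, and finding the file only confirms the first half of the message, not that the scotchDecomp library itself has been built.

```shell
# Hypothetical helper: list every libscotch.so found in the directories of
# LD_LIBRARY_PATH. Exit status is 0 if at least one was found, 1 otherwise.
find_libscotch() (
    set -f          # no glob expansion while splitting the path
    IFS=:           # split LD_LIBRARY_PATH on colons (subshell only)
    found=1
    for dir in $LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -e "$dir/libscotch.so" ]; then
            echo "$dir/libscotch.so"
            found=0
        fi
    done
    [ "$found" -eq 0 ]
)
```

A typical use would be `find_libscotch || echo "libscotch.so not found in LD_LIBRARY_PATH"`.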
|
August 26, 2012, 08:33 |
|
#14 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Hang,
It looks like "scotch" isn't built for some reason. Perhaps on the cluster with OpenFOAM 2.1.1 it is working as intended. As for polyDualMesh: the value is the feature angle that the converter uses when looking at the mesh. I basically got lucky, because the other value I had tried was 150 and it was a lot worse. A few more indications on how to use it:
Best regards, Bruno
Last edited by wyldckat; August 26, 2012 at 08:34. Reason: see "edit:" |
||
August 26, 2012, 18:09 |
|
#15 |
Member
HD
Join Date: Jul 2011
Posts: 56
Rep Power: 15 |
Hello Bruno,
Thank you for your reply. That clears a lot of things up~ I tried scotch on the cluster, but it is not working either. I will see whether simple can be used as an alternative. Best, Hang |
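If simple is used as the fallback, one thing to watch in the decomposeParDict from post #13 is that `simpleCoeffs` has `n ( 2 1 1 )`, whose product is 2, while `numberOfSubdomains` is 4. decomposePar is expected to reject that mismatch, so the product of `n` would need to be 4, for example `n ( 2 2 1 )`. The sketch below is a pre-check under the assumption that the dictionary is laid out with one entry per line, as in that post; the helper name `check_simple_coeffs` is hypothetical.

```shell
# Hypothetical helper: verify that the product of simpleCoeffs "n" equals
# numberOfSubdomains. Exit status: 0 = consistent, 1 = mismatch, 2 = not parsed.
# Usage: check_simple_coeffs system/decomposeParDict
check_simple_coeffs() {
    awk '
        { gsub(/[();]/, " ") }                   # strip list and entry punctuation
        $1 == "numberOfSubdomains" { nsub = $2 }
        $1 == "simpleCoeffs"       { inblock = 1 }
        inblock && $1 == "}"       { inblock = 0 }
        inblock && $1 == "n"       { prod = $2 * $3 * $4; inblock = 0 }
        END {
            if (nsub == "" || prod == "") {
                print "could not find numberOfSubdomains or simpleCoeffs n"
                exit 2
            }
            if (prod == nsub) {
                print "ok: n product " prod " matches numberOfSubdomains"
                exit 0
            }
            print "mismatch: n product " prod ", numberOfSubdomains " nsub
            exit 1
        }
    ' "$1"
}
```

This only checks consistency of the dictionary; whether the simple method gives a well-balanced decomposition for a given geometry is a separate question.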
|