Problem running OF on cluster

#1 | Rebecca513 (HD) | August 20, 2012, 11:54
Dear all,

I am trying to run OpenFOAM (OF) on a cluster. I installed CentFOAM, which provides OF-2.1.1.

I have run the pitzDaily tutorial case on the cluster, both in serial and in parallel, without a problem.

But when I try to run my own mesh, I get this error:

--> FOAM FATAL IO ERROR:
wrong token type - expected Scalar, found on line 3 the word 'nan'

file: /scratch/gpfs/hangdeng/FOAM_Run/test1/system/data::solverPerformance: at line 3.


From function operator>>(Istream&, Scalar&)
in file lnInclude/Scalar.C at line 91.

FOAM exiting



I thought my mesh might be the problem. However, the same mesh and case set-up run fine on my workstation, which has OF-2.0.x.

I am not sure whether this is due to the version difference or whether something more complicated went wrong with the cluster installation.

If anyone has any ideas or suggestions, I would greatly appreciate them.

Thank you so much for your help.

Best,

Hang

#2 | Rebecca513 (HD) | August 21, 2012, 16:37
On the cluster, the error occurred at time step 99. This is what the log looks like:

Time = 98

smoothSolver: Solving for Ux, Initial residual = 0.521635, Final residual = 0.00282749, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.610594, Final residual = 0.00331592, No Iterations 8
smoothSolver: Solving for Uz, Initial residual = 0.42209, Final residual = 0.00283999, No Iterations 7
GAMG: Solving for p, Initial residual = 0.30105, Final residual = 7.36962e+55, No Iterations 100
GAMG: Solving for p, Initial residual = 0.849026, Final residual = 3.30525e+93, No Iterations 100
time step continuity errors : sum local = 9.55575e+155, global = 6.81616e+147, cumulative = 6.81616e+147
ExecutionTime = 439.68 s ClockTime = 444 s

Time = 99

smoothSolver: Solving for Ux, Initial residual = 0.566129, Final residual = 0.000374191, No Iterations 2
smoothSolver: Solving for Uy, Initial residual = 0.63567, Final residual = 0.00245801, No Iterations 2
smoothSolver: Solving for Uz, Initial residual = 0.50241, Final residual = 0.000574002, No Iterations 2
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100
GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100

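The residuals for both p and U are clearly too high: the U initial residuals are around 0.5, and the p solve has already diverged at Time = 98 (final residual of order 1e+55 after 100 iterations), which leads to the 'nan' error at Time = 99.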
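A log like this can also be scanned automatically for the first sign of divergence. The following is a minimal sketch in Python, assuming the solver lines keep the layout shown above; the names `find_divergence` and `SAMPLE_LOG` are made up for illustration and are not part of OpenFOAM.

```python
import math
import re

# Matches the per-field solver lines shown in the log above, e.g.
# "GAMG: Solving for p, Initial residual = 0.3, Final residual = 7e+55, No Iterations 100"
SOLVE_LINE = re.compile(
    r"^(?P<solver>\w+):\s+Solving for (?P<field>\w+), "
    r"Initial residual = (?P<init>[^,\s]+), "
    r"Final residual = (?P<final>[^,\s]+), "
    r"No Iterations (?P<iters>\d+)"
)


def find_divergence(log_text):
    """Return (time, field, reason) for every solve that looks diverged."""
    problems = []
    time = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Time = "):
            time = line.split("=", 1)[1].strip()
            continue
        match = SOLVE_LINE.match(line)
        if not match:
            continue
        init = float(match.group("init"))    # float("nan") parses fine
        final = float(match.group("final"))
        if math.isnan(init) or math.isnan(final):
            problems.append((time, match.group("field"), "nan residual"))
        elif final > init:
            problems.append(
                (time, match.group("field"), "final residual exceeds initial")
            )
    return problems


# A few lines copied from the cluster log in this post.
SAMPLE_LOG = """\
Time = 98

smoothSolver: Solving for Ux, Initial residual = 0.521635, Final residual = 0.00282749, No Iterations 7
GAMG: Solving for p, Initial residual = 0.30105, Final residual = 7.36962e+55, No Iterations 100

Time = 99

GAMG: Solving for p, Initial residual = nan, Final residual = nan, No Iterations 100
"""

for problem in find_divergence(SAMPLE_LOG):
    print(problem)
```

On these lines the first flagged entry is the p solve at Time = 98, one step before the 'nan' shows up.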

However, the log file on my workstation looks quite normal:

Time = 98

smoothSolver: Solving for Ux, Initial residual = 0.00059195, Final residual = 4.37997e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000782548, Final residual = 6.39388e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000572221, Final residual = 4.79188e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00922742, Final residual = 8.04101e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850465, Final residual = 7.19074e-06, No Iterations 5
time step continuity errors : sum local = 5.29957e-05, global = -1.78685e-07, cumulative = -0.000110236
ExecutionTime = 499.88 s ClockTime = 500 s

Time = 99

smoothSolver: Solving for Ux, Initial residual = 0.000572748, Final residual = 4.24225e-06, No Iterations 7
smoothSolver: Solving for Uy, Initial residual = 0.000761879, Final residual = 6.23747e-06, No Iterations 7
smoothSolver: Solving for Uz, Initial residual = 0.000557909, Final residual = 4.68015e-06, No Iterations 7
GAMG: Solving for p, Initial residual = 0.00920748, Final residual = 8.09545e-06, No Iterations 5
GAMG: Solving for p, Initial residual = 0.00850217, Final residual = 7.26579e-06, No Iterations 5
time step continuity errors : sum local = 5.35504e-05, global = -2.14871e-07, cumulative = -0.000110451
ExecutionTime = 503.95 s ClockTime = 504 s

Given that the case set-ups are the same, I am not sure why the computation goes wrong on the cluster.

Can anyone give me some ideas or suggestions? I would truly appreciate it!

Thank you.

Best,

Hang

#3 | wyldckat (Bruno Santos) | August 21, 2012, 16:59
Hi Hang,

From your first post, the address seems a bit strange:
Code:
system/data::solverPerformance::p
Are you using a customized solver?

On the cluster:
  • Does the error occur with only 1 machine, or does it not matter how many machines you use?
  • Is the main "system" folder of the case visible to all nodes?

The differences shown in the second post are indeed very large; the initial residuals are roughly 1000 times smaller on your own machine with 2.0.x.

I believe CentFOAM still has an install option for 2.0.x as well. The other possibility would be to install 2.1.1 on your machine.
Besides what you can test on your side, we'll need to know at least:
  • What solver are you using, or at least based on which solver?
    • If you're using a custom solver, can your case work with the original solver?
  • How many cells or points does your mesh have?
  • Does the mesh have any cyclic, mapped, wedge or any other special boundary condition?
    • If it does, which decomposition method did you use?
  • Was the mesh generated in parallel or in serial (single-core)? This is mostly relevant in case it was made with snappyHexMesh.
  • Does running checkMesh in parallel give the same output with both versions of OpenFOAM?
Best regards,
Bruno

#4 | Rebecca513 (HD) | August 21, 2012, 17:17
Hi Bruno,

Thank you for the reply!

I tried to run in parallel earlier (decomposed using the simple method) and got similar errors. I thought the issue was related to parallel computation, so I tried running the mesh on a single core instead. The errors in my posts are from the single-core run.

So,
  • Does the error occur with only 1 machine, or does it not matter how many machines you use?
    I guess it doesn't matter how many machines I use.
  • Is the main "system" folder of the case visible to all nodes?
    I think so.

  • What solver are you using, or at least based on which solver?
    simpleFoam, no change to the solver has been made
  • How many cells or points does your mesh have?
    it is an unstructured mesh (not generated by snappyHexMesh), 67562 points and 338756 cells.
  • Does the mesh have any cyclic, mapped, wedge or any other special boundary condition?
    No

Thank you~

Best,

Hang

#5 | wyldckat (Bruno Santos) | August 22, 2012, 17:58
Hi Hang,

I think you forgot to answer this question:
Quote:
Does running checkMesh in parallel give the same output with both versions of OpenFOAM?
Without an example case with which we can reproduce this very same problem, all that is left is for you to test this on your side, namely:
  • If you are using CentOS on your workstation, try installing 2.1.1.
  • Or try installing 2.0.x on your cluster.
I say this because there are simply too many changes between the two versions of OpenFOAM to be able to pinpoint the one change that might have caused this.

Although, the one detail that comes to mind is that the configuration of "fvSolution" might have some minor differences between the two versions. For example, if you run a command similar to this one:
Code:
diff -Nur ~/OpenFOAM/OpenFOAM-2.0.x/tutorials/incompressible/simpleFoam/pitzDaily ~/OpenFOAM/OpenFOAM-2.1.x/tutorials/incompressible/simpleFoam/pitzDaily
You'll get output similar to this:
Code:
@@ -1,7 +1,7 @@
 /*--------------------------------*- C++ -*----------------------------------*\
 | =========                 |                                                 |
 | \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
-|  \\    /   O peration     | Version:  2.0.0                                 |
+|  \\    /   O peration     | Version:  2.1.x                                 |
 |   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
 |    \\/     M anipulation  |                                                 |
 \*---------------------------------------------------------------------------*/
@@ -80,12 +80,18 @@
 
 relaxationFactors
 {
-    p               0.3;
-    U               0.7;
-    k               0.7;
-    epsilon         0.7;
-    R               0.7;
-    nuTilda         0.7;
+    fields
+    {
+        p               0.3;
+    }
+    equations
+    {
+        U               0.7;
+        k               0.7;
+        epsilon         0.7;
+        R               0.7;
+        nuTilda         0.7;
+    }
 }
As you can see, the relaxation parameters have been regrouped... er, wait, this does indeed look like what might be triggering the error you're getting! With the old layout, 2.1 might fall back to default relaxation factors, which could be 1 or at least higher than the ones you have in your case!
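The regrouping shown in the diff can be illustrated with a small Python sketch. It assumes, based only on the diff above, that p belongs under "fields" and everything else under "equations"; the helper names `regroup_relaxation_factors` and `render` are hypothetical, and other solvers may need a different split.

```python
def regroup_relaxation_factors(flat, field_names=("p",)):
    """Split a flat 2.0-style mapping into the 2.1-style fields/equations groups.

    Which names count as fields is an assumption taken from the
    pitzDaily diff above (only p); adjust it for other cases.
    """
    fields = {k: v for k, v in flat.items() if k in field_names}
    equations = {k: v for k, v in flat.items() if k not in field_names}
    return {"fields": fields, "equations": equations}


def render(grouped):
    """Format the grouped factors like the relaxationFactors block in fvSolution."""
    lines = ["relaxationFactors", "{"]
    for group in ("fields", "equations"):
        lines += [f"    {group}", "    {"]
        for name, value in grouped[group].items():
            lines.append(f"        {name:<16}{value};")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


# Values from the 2.0.0 side of the diff above.
old_style = {"p": 0.3, "U": 0.7, "k": 0.7, "epsilon": 0.7, "R": 0.7, "nuTilda": 0.7}
print(render(regroup_relaxation_factors(old_style)))
```

The printed block can then be compared by eye with the "+" side of the diff before pasting anything into a real case.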




By the way, you can safely have more than one version of OpenFOAM on your machines. For example, instead of having this in "~/.bashrc":
Code:
source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc
You can have this:
Code:
alias of20x='source $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc'

alias of210='source $HOME/OpenFOAM/OpenFOAM-2.1.0/etc/bashrc'
Then, in each new terminal, run of210 or of20x to start the desired environment.


Best regards,
Bruno

#6 | Rebecca513 (HD) | August 23, 2012, 11:36
Hi Bruno,

Thank you for the reply.

I copied the system files from the OF21 tutorial and changed the values accordingly, but I still get the same error.

I will try installing OF20 to see if it works.

About
'Does running checkMesh in parallel give the same output with both versions of OpenFOAM?'

I am not sure how to run checkMesh in parallel. Could you elaborate on that a little?

Thank you so much.

Best,

Hang

#7 | wyldckat (Bruno Santos) | August 23, 2012, 15:04
Hi Hang,

Quote:
Originally Posted by Rebecca513
I copied the system files from OF21 tutorial, and changed the values accordingly, but it is still giving me the same error.
Unfortunately, this is one of the reasons why switching between OpenFOAM versions isn't straightforward. You should compare all of the tutorials you've based your case on.

But still, my usual suggestion is to create a small and simple case that can reproduce the same error, then share it here on the forum. Usually, a modified tutorial does the trick. Of course the same steps should be taken for execution, whenever possible. For example, mapFields and so on.

Quote:
Originally Posted by Rebecca513
About
'Does running checkMesh in parallel give the same output with both versions of OpenFOAM?'

I am not sure how to run checkMesh in parallel, could you elaborate on that a little bit.
It's simple! The same way you run simpleFoam in parallel, you can run checkMesh as well!
For example, with foamJob:
Code:
foamJob -s -p checkMesh
foamJob -s -p simpleFoam
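Once both versions have produced a checkMesh log, the mesh statistics can be compared side by side. Here is a minimal Python sketch; it assumes the logs contain "name: integer" lines such as "points: 67562", which is how the mesh stats section is usually laid out, and the two sample logs below are shortened, made-up inputs rather than real output from this case.

```python
import re

# Assumed layout of a statistics line: "    cells:            338756"
STAT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s+(\d+)\s*$")


def mesh_stats(log_text):
    """Collect every 'name: integer' line of a checkMesh log into a dict."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def diff_stats(a, b):
    """Return {name: (value_in_a, value_in_b)} for entries that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical, shortened logs used only to exercise the functions.
LOG_WORKSTATION = """\
Mesh stats
    points:           67562
    cells:            338756
"""
LOG_CLUSTER = """\
Mesh stats
    points:           67562
    cells:            338000
"""

print(diff_stats(mesh_stats(LOG_WORKSTATION), mesh_stats(LOG_CLUSTER)))
```

An empty result would mean the counted quantities agree; quality metrics such as skewness are not covered by this pattern and would still need a manual look.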
Best regards,
Bruno

#8 | Rebecca513 (HD) | August 23, 2012, 22:39
Hi Bruno,

I ran checkMesh on my workstation and on the cluster; the logs are uploaded to the link below.

Now the problem is that if I cut out 1/10th of the mesh and run it (using all the system files from OF20), it works on the cluster with and without -parallel. But when the mesh is larger, the problem starts to appear.

The case that failed on the cluster is uploaded here: http://www.princeton.edu/~hangdeng/
I would appreciate it if you could take a look.

Thank you.

Best,

Hang

#9 | wyldckat (Bruno Santos) | August 24, 2012, 18:46
Hi Hang,

Access is restricted on that link. I can see the list of files, but I don't have permissions for downloading.

Cut 1/10th... do you mean that you're simulating only part of the whole volume, or the cell count is 1/10th (i.e., a coarser mesh)?

Best regards,
Bruno

#10 | Rebecca513 (HD) | August 24, 2012, 19:28
Hi Bruno,

Sorry for the confusion. I meant part of the mesh, not a coarser mesh.

Apologies, I didn't realize the link was restricted.

Would you mind giving me your email address by private message so that I can share it with you through Dropbox or Google Drive?


Thank you~

Best,

Hang

#11 | Rebecca513 (HD) | August 24, 2012, 19:51
Hi Bruno,

Never mind, I have changed the permissions, so you should be able to download the files from this link: http://www.princeton.edu/~hangdeng/

Thank you.

Best,

Hang

#12 | wyldckat (Bruno Santos) | August 25, 2012, 06:32
Hi Hang,

I've confirmed that this problem is triggered as soon as we switch from OpenFOAM 2.0.x to 2.1.0. I've tried making some minor adjustments in "fvSchemes", reducing the relaxation parameters, re-decomposing with scotch, and converting the mesh with foamFormatConvert in case it was some sort of mesh incompatibility...
And nothing worked!

BUT! I've found an interesting solution: polyDualMesh!

Here are the steps I've taken:
  1. Removed the processor folders.
  2. Changed decomposition from simple to scotch.
  3. Converted the mesh:
    Code:
    polyDualMesh 30 -overwrite
    This converted the mesh from "tetrahedra: 339501" to "polyhedra: 67664"
  4. Decomposed and ran:
    Code:
    decomposePar
    foamJob -p -s simpleFoam
  5. The case was solved at blazing speed: it converged very fast (in less than 100 iterations), even though the skew face count went from 1 to 17.
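The steps above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names are made up, the switch from simple to scotch in decomposeParDict (step 2) is left as a manual edit, and the commands are only listed here, not executed.

```python
import shutil
from pathlib import Path


def clean_processor_dirs(case_dir):
    """Step 1: delete every processor* directory in the case folder."""
    removed = []
    for path in sorted(Path(case_dir).glob("processor*")):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path.name)
    return removed


def conversion_commands(feature_angle=30):
    """Steps 3 and 4 as argument lists, in the order given in the post.

    Step 2 (method scotch; in system/decomposeParDict) is not automated here.
    """
    return [
        ["polyDualMesh", str(feature_angle), "-overwrite"],
        ["decomposePar"],
        ["foamJob", "-p", "-s", "simpleFoam"],
    ]


for command in conversion_commands():
    print(" ".join(command))
```

In a sourced OpenFOAM environment, each list could be passed to `subprocess.run(command, cwd=case_dir, check=True)`; working on a copy of the case is advisable because `-overwrite` replaces the original mesh.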

A few more notes on the changes needed from OpenFOAM 2.0 to 2.1:
  • As you've seen before, relaxation parameters were regrouped.
  • Convergence control for your case should be something like this:
    Code:
    SIMPLE
    {
        nNonOrthogonalCorrectors 1;
        residualControl
        {
            p               1e-5;
            U               1e-5;
        }
    }
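As a rough illustration of what the residualControl entries mean, here is a minimal Python sketch. It assumes the usual reading that the run is considered converged once the initial residual of every listed field drops below its tolerance, and it treats U as the largest of its component residuals; `criteria_satisfied` is a made-up name, not an OpenFOAM function.

```python
def criteria_satisfied(initial_residuals, residual_control):
    """True when every controlled field's initial residual is below its tolerance."""
    return all(
        initial_residuals[name] < tolerance
        for name, tolerance in residual_control.items()
    )


# Tolerances from the SIMPLE dictionary above.
residual_control = {"p": 1e-5, "U": 1e-5}

# Initial residuals at Time = 99 from the workstation log earlier in the thread;
# U is taken as the largest of Ux, Uy and Uz (an assumption of this sketch).
workstation_step_99 = {
    "p": 0.00920748,
    "U": max(0.000572748, 0.000761879, 0.000557909),
}

print(criteria_satisfied(workstation_step_99, residual_control))
```

With these numbers the check is not yet met at Time = 99, so under this reading the run would simply continue.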


Conclusion: if you want, you can/should report this bug to the OpenFOAM team, since this seems to be a very strange numerical discrepancy, mainly due to the tetrahedral mesh. Sharing the case with them is crucial, since this seems to be a very isolated problem.
I think you already know, but in case you don't, the bug tracker for OpenFOAM is this one: http://www.openfoam.com/mantisbt/

Best regards,
Bruno

#13 | Rebecca513 (HD) | August 25, 2012, 17:25
Hello Bruno,

Thank you soooo much!

polyDualMesh works! At least for the single-core case.

But I was not able to run decomposePar with scotch. I followed this tutorial (http://web.student.chalmers.se/group...elLucchini.pdf) in setting up the dict file:
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 4;

method          scotch;

simpleCoeffs
{
    n               ( 2 1 1 );
    delta           0.001;
}

hierarchicalCoeffs
{
    n               ( 2 2 1 );
    delta           0.001;
    order           xyz;
}

metisCoeffs
{
    processorWeights
    {
        1
        1
        1
        1
    };
}

manualCoeffs
{
    dataFile        "";
}

distributed     no;

roots           ( );


// ************************************************************************* //

but it gives me errors:

Selecting decompositionMethod scotch


--> FOAM FATAL ERROR:
You are trying to use scotch but do not have the scotchDecomp library loaded.
This message is from the dummy scotchDecomp stub library instead.

Please install scotch and make sure that libscotch.so is in your LD_LIBRARY_PATH.
The scotchDecomp library can then be built in $FOAM_SRC/parallel/decompose/decompositionMethods/scotchDecomp

Am I missing something?
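Since the message asks for libscotch.so to be on LD_LIBRARY_PATH, one quick check is to search each directory on that path for it. A minimal Python sketch follows; `find_library` is a made-up helper, and a hit only shows that a file with that name exists, not that the scotchDecomp library was built against it.

```python
import os
from pathlib import Path


def find_library(name, search_path):
    """Return all files starting with `name` in the directories of `search_path`.

    `search_path` uses the platform separator (':' on Linux), like LD_LIBRARY_PATH.
    """
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob(name + "*")))
    return hits


# Typical use in a shell where the OpenFOAM environment has been sourced:
#   find_library("libscotch.so", os.environ.get("LD_LIBRARY_PATH", ""))
```

An empty list would be consistent with the error above, which reports that only the dummy stub library was found.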

Relating to polyDualMesh:
(1) Could you please elaborate on the number '30'? I actually posted a thread (http://www.cfd-online.com/Forums/ope...ydualmesh.html) a while ago about polyDualMesh, where I used 60 but failed to convert the mesh.
(2) After the conversion, my understanding is that the geometry of the object should not be changed, right?

Also, I have other even larger and more complex meshes. I will try on the cluster, and let you know whether they work as well!

Thank you~

Best,

Hang

#14 | wyldckat (Bruno Santos) | August 26, 2012, 08:33
Hi Hang,

It looks like "scotch" isn't built for some reason.
Perhaps on the cluster with OpenFOAM 2.1.1 it works as intended.

As for polyDualMesh: the value is the feature angle the converter uses when looking at the mesh. I basically got lucky, because the other value I had tried was 150 and the result was a lot worse. A few more pointers on how to use it:
  • http://openfoamwiki.net/index.php/Po...esh_generation
  • From here: http://www.idurun.com/?p=367
    Quote:
    Where the feature angle is that the minimum angle between two faces.
  • Don't forget to run checkMesh after running polyDualMesh, to assess how good/bad the result is.
    edit: keep in mind that if the errors aren't very bad (skew faces <6 instead of 4), then it might work as intended!
  • And also check the other options:
    Code:
    polyDualMesh -help
And yes, the meshes should be identical, when it comes to geometrical representation. If not, then something went very wrong, possibly due to a bad feature angle.
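To give a feel for what a feature angle does, here is a minimal Python sketch of one common criterion: an edge counts as a feature when the angle between the normals of its two neighbouring boundary faces exceeds the threshold. Whether polyDualMesh applies exactly this test is an assumption here, and the function names are made up.

```python
import math


def angle_between_normals(n1, n2):
    """Angle in degrees between two face normals given as 3-component tuples."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against round-off
    return math.degrees(math.acos(cosine))


def is_feature_edge(n1, n2, feature_angle_deg):
    """Assumed criterion: the edge is a feature if the normals differ by more than the angle."""
    return angle_between_normals(n1, n2) > feature_angle_deg


# Two boundary faces meeting at a right-angled corner.
top = (0.0, 0.0, 1.0)
side = (1.0, 0.0, 0.0)

print(is_feature_edge(top, side, 30))   # the corner is kept as a feature
print(is_feature_edge(top, side, 150))  # the corner is not treated as a feature
```

Under this reading, a small angle such as 30 preserves sharp corners of the geometry, while a large one such as 150 lets them be smoothed over, which would fit the worse result reported for 150.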

Best regards,
Bruno

Last edited by wyldckat; August 26, 2012 at 08:34. Reason: see "edit:"

#15 | Rebecca513 (HD) | August 26, 2012, 18:09
Hello Bruno,

Thank you for your reply. That clears a lot of things up~

I tried scotch on the cluster, but it is not working. I will see whether simple can be used as an alternative.

Best,

Hang