April 16, 2005, 00:19 |
Parallel startup trouble!
|
#1 |
Guest
Posts: n/a
|
Hello all!
I have a definition file made using CFX-5.6 that has run successfully on a Sun SMP machine using the 5.6 solver. Now I am trying to run it on a Linux cluster with the 5.7.1 solver and the default PVM executable. The solver seems to have trouble starting. Running top on the compute nodes, I see the CPU usage surging up and down. This all occurs after the mesh has been partitioned without any problems. The compute processes (pvmsolve) die off one by one, and the output file ends before reporting the first iteration results. I thought that this might be due to a poorly defined mesh or partition, but I can't understand why it would run on one machine and not another. Any clues as to the nature of this problem would be greatly appreciated!
Thanks, John
|
April 17, 2005, 16:06 |
Re: Parallel startup trouble!
|
#2 |
Guest
Posts: n/a
|
I think your cluster or PVM installation has problems. I had a problem like this before when I meshed using Build in parallel; it started fine and then each node died off. Try this on Linux: on each of your nodes, run 'top'. It should list all the processes and their memory usage. See whether pvmsolve is there and how long it lasts before it is removed; I bet it won't be long.
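The per-node check described above can be scripted rather than done by hand. The following is a minimal sketch, not part of any CFX tooling: it assumes passwordless ssh to each node, a Linux `ps` that accepts `-eo comm`, and that the solver process name starts with `pvmsolve` as reported in this thread. The function names are hypothetical.

```python
import subprocess


def count_solver_processes(ps_output, name="pvmsolve"):
    """Count lines of `ps -eo comm` output whose command name starts with `name`."""
    return sum(
        1 for line in ps_output.splitlines() if line.strip().startswith(name)
    )


def poll_node(host, name="pvmsolve"):
    """Run ps on a node over ssh and count solver processes.

    Assumption: passwordless ssh to `host` is already configured.
    """
    result = subprocess.run(
        ["ssh", host, "ps", "-eo", "comm"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return count_solver_processes(result.stdout, name)
```

Calling `poll_node` for every node every few seconds and logging the counts with a timestamp would show how long each solver process survives before it disappears.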
|
|
April 17, 2005, 19:25 |
Re: Parallel startup trouble!
|
#3 |
Guest
Posts: n/a
|
Hi,
Moyo is right: there is something wrong with your machine. If the problem were a poor mesh or an incorrect problem setup, it would either give an error message or diverge.
Glenn Horrocks
|
April 17, 2005, 22:35 |
Re: Parallel startup trouble!
|
#4 |
Guest
Posts: n/a
|
I agree with both assertions... it seems unlikely that it would run on one machine and not another if there were a bad mesh or definition file.
On the other hand, other .def files of similar size run without any problems, with top showing pvmsolve steadily taking up ~99% CPU on all involved nodes. In this case, however, I can top a node and the CPU fluctuates between 1% and 90%. It did not behave this way on the multi-processor Sun machine. BTW, this 5.7.1 installation is on the new ROCKS distro. Are there any known bugs specific to running CFX on ROCKS? The ROCKS user groups are, of course, filled with instances of network troubles... after all, what do you want for free?
Thanks! John
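To put numbers on the difference between a steady run and the fluctuating one described above, a series of CPU readings noted from top can be summarised. This is only a sketch: the thresholds (a mean of at least 90% and a spread of at most 10 points) are arbitrary assumptions chosen for illustration, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def describe_cpu_trace(samples, steady_min=90.0, max_spread=10.0):
    """Summarise CPU percentages sampled from one solver process.

    A trace is labelled "steady" when its mean is high and its spread is
    small; anything else is labelled "unsteady". Thresholds are assumptions.
    """
    avg = mean(samples)
    spread = pstdev(samples)
    label = "steady" if avg >= steady_min and spread <= max_spread else "unsteady"
    return {"mean": avg, "stdev": spread, "label": label}
```

Comparing the summary for a case that runs well against the one that fails makes it easier to report the behaviour to support than "the CPU goes up and down".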
|
April 18, 2005, 04:11 |
Re: Parallel startup trouble!
|
#5 |
Guest
Posts: n/a
|
Hi,
What's the error message that you get when CFX exits? I experience the same problem, i.e. the CPU surges up and down, and then CFX exits with return code 255. This happens during the 1st iteration, and most of the time a bad mesh is the cause. I use WinXP.
Santhosh
|
April 18, 2005, 10:54 |
Re: Parallel startup trouble!
|
#6 |
Guest
Posts: n/a
|
That's the trouble... I get no error message at all. Like your case, it happens during the 1st iteration.
John |
|
April 18, 2005, 11:31 |
Re: Parallel startup trouble!
|
#7 |
Guest
Posts: n/a
|
You might have to wait an awfully long time to get that error message. I usually kill the job when this happens, especially if it is in the 1st iteration, and get back to meshing as soon as I can. Have you tried to improve your mesh?
Santhosh |
|
April 18, 2005, 22:13 |
Re: Parallel startup trouble!
|
#8 |
Guest
Posts: n/a
|
No, I haven't tried any mesh improvement. Is mesh improvement available in CFX-Build?
John |
|
April 22, 2005, 12:52 |
Re: Parallel startup trouble!
|
#9 |
Guest
Posts: n/a
|
John,
Just a thought. Have you installed CFX 5.7 and the PVM libraries on all of the nodes in the Linux cluster, or are you getting them from a server (i.e. could it be a dodgy NFS share)? Have you created the machines file (or is that for MPI?) in your home directory listing all of the machines that the code can run on? I ask because you say that it does the partitioning (serial) and then fails in iteration 1 (parallel). If PVM is not on every machine, the processes can't communicate with each other and the run will fail when they try to. Have you looked in the /tmp or /var/tmp directories of each machine in the cluster to see if there are any messages relating to either CFX or PVM?
Julian
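The last check above, looking in /tmp and /var/tmp on each machine, can be sketched as follows. The pattern `pvm*` is an assumption based on the PVM daemon's usual habit of writing files such as `pvml.<uid>` and `pvmd.<uid>` under /tmp; CFX may write its own messages elsewhere, and the function name is hypothetical.

```python
import glob
import os


def find_pvm_files(tmp_dirs=("/tmp", "/var/tmp")):
    """Return paths in the given directories whose names start with 'pvm'.

    Directories that do not exist are silently skipped.
    """
    found = []
    for directory in tmp_dirs:
        found.extend(sorted(glob.glob(os.path.join(directory, "pvm*"))))
    return found
```

Run on each node (for example over ssh), an empty result on one machine and not on the others would be a hint that PVM never started there.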
|
April 22, 2005, 21:17 |
Re: Parallel startup trouble!
|
#10 |
Guest
Posts: n/a
|
Thanks to everyone for their input. I believe that Santhosh has the best answer: it was most likely due to a poorly constructed mesh. Some planar surfaces were not well parameterized. I added some new edge points and split these ugly surfaces into a larger number of better-looking planar surfaces. I still get some CPU "surging", i.e. the processors do not run steadily at near 100%, but the solution does come out and the forces do converge rapidly. Now I'll try more mesh controls and up the element count in the wake region until the solution appears to be mesh independent.
Thanks again, everyone! John
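The mesh-independence check mentioned above can be made explicit by comparing the converged force between successive refinements. This is a minimal sketch under stated assumptions: the 2% tolerance is an arbitrary choice, the function names are hypothetical, and any element counts and force values fed to it are the user's own results.

```python
def relative_change(previous, current):
    """Relative change of `current` with respect to `previous`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return abs(current - previous) / abs(previous)


def first_mesh_independent(results, tolerance=0.02):
    """Find the coarsest mesh that further refinement barely changes.

    `results` is a list of (element_count, force) pairs sorted by increasing
    element count. Returns the element count of the first mesh whose force
    differs from the next finer mesh by less than `tolerance`, or None.
    """
    for (count, force), (_, finer_force) in zip(results, results[1:]):
        if relative_change(force, finer_force) < tolerance:
            return count
    return None
```

If the function returns None, the finest mesh tried so far still changes the force by more than the tolerance and another refinement is needed.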