January 2, 2007, 18:23 |
distributed mpich problem
|
#1 |
Guest
|
I am having problems with distributed MPICH between an XP32 box and an XP64 box. This is what I have done. I have the same user with admin rights logged onto both systems as a local user. The rsh and MPICH services are configured to use that user. I am running CFX10 SP1 on both boxes. On the XP64 box I installed CFX to c:\1\cfx so that I did not have to worry about spaces in the path, but the XP32 box has it installed in c:\program files\ansys inc\… etc. The correct short path names have been entered into the hosts.ccl files on both machines, and the hosts files are identical.
The system works fine when running distributed PVM, although I do get two warnings:
1. Warning! rsh connection to host X produces the following output before the output of the command: Terminal readThe handle is invalid. This may cause problems spawning parallel slaves, especially on Windows.
2. Warning! rsh connection to host X produces the following output after the output of the command: : It could indicate that an rshd service from a different vendor is running, which may not provide the necessary functionality. This may cause problems spawning parallel slaves.
I get the same warnings at the start of the MPICH run as well, and I would like to fix them, as they may be affecting the MPICH run. When I run distributed MPICH, though, I get the following error:
--------------
An error has occurred in cfx5solve: The ANSYS CFX solver has terminated without writing a results file. Command on host cluster exited with return code 0.
--------------
That is it, no more explanation than that. The services (rsh and MPICH) are running on both machines, and distributed PVM works OK apart from the warnings described above, which I would like to clear up. I don't know what the problem is. Can anyone help with this? Thanks Trevor P
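Both warnings are about extra text that the rsh service prints around a command's real output. One way to see exactly what that text is would be to run a command with a unique, known output (for example rsh <host> cmd /c echo CFXMARKER) and compare the captured result against the marker. The sketch below does only that comparison on captured text; the marker string and the function names are hypothetical, and this is not claimed to be how cfx5solve itself performs its check.

```python
def split_around_marker(output: str, marker: str) -> tuple[str, str]:
    """Return (text_before, text_after) surrounding the marker.

    Raises ValueError if the marker does not appear at all, which would mean
    the remote command itself did not run.
    """
    index = output.find(marker)
    if index == -1:
        raise ValueError("marker not found in rsh output")
    before = output[:index].strip()
    after = output[index + len(marker):].strip()
    return before, after


def describe_rsh_output(output: str, marker: str = "CFXMARKER") -> list[str]:
    """List any unexpected text the rsh service emitted around the marker."""
    problems = []
    before, after = split_around_marker(output, marker)
    if before:
        problems.append(f"extra output before command: {before!r}")
    if after:
        problems.append(f"extra output after command: {after!r}")
    return problems


if __name__ == "__main__":
    # Illustrative captured output resembling the first warning in the post.
    sample = "Terminal readThe handle is invalid.\r\nCFXMARKER\r\n"
    for line in describe_rsh_output(sample):
        print(line)
```

An empty list means the rsh service returned only the command's own output, which is the clean state you would want before trying distributed runs.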
|
January 2, 2007, 20:34 |
Re: distributed mpich problem
|
#2 |
Guest
|
I thought you could only run MPICH on homogeneous clusters. I'm not sure if XP32 and XP64 count as being the same.
Can you run MPICH local parallel, or MPICH distributed on XP32 or XP64 alone (not both)?
|
January 2, 2007, 21:39 |
Re: distributed mpich problem
|
#3 |
Guest
|
MPICH local parallel runs fine. The documentation only says (from memory) that you cannot run a mix of Unix/Linux and Windows; it does not mention any combination of different variants of Windows, such as XP32, XP64, Win2000 etc. If that is the problem, then it is easily fixed (oh, if things were that simple).
PVM runs fine between an XP32 master and an XP64 slave, but MPICH does not. Thanks Trevor P
|
January 3, 2007, 01:20 |
Re: distributed mpich problem
|
#4 |
Guest
|
The doc you read only applies to PVM.
MPICH must be homogeneous in hardware and OS, so no XP32 + XP64 parallel runs with it.
|
January 3, 2007, 01:47 |
Re: distributed mpich problem
|
#5 |
Guest
|
On page 57 of the ANSYS CFX 10.0: Installation and Overview doc, under Setting up MPICH for Windows, it states: "Important: Distributed parallel using MPICH cannot be set up using a mixture of UNIX and Windows machines."
Can you confirm that this also rules out XP32 to XP64? Also, has anyone got XP64-to-XP64 MPICH working with ANSYS CFX 10.0 SP1 for Win32, or is it just XP32 to XP32? Thanks for your help, I really appreciate it; I may be making headway. Thanks Trevor
|
January 3, 2007, 08:05 |
Re: distributed mpich problem
|
#6 |
Guest
|
OK, I just tried distributed MPICH between two XP32 machines and got exactly the same errors. I even changed my main machine's installation directory to c:\1\cfx to remove all spaces etc.
Interestingly enough, I also attempted a run with the MPICH daemon service turned off and got exactly the same error, "Command on host XX exited with return code 0". I still get the two warnings as before, and the thing still crashes. Both machines are running XP Pro SP2; one is a Dell 9400, the other a Toshiba Tecra A1. Do you have to go into MPIConfig.exe to configure anything? Is there any way to test that the MPICH service is working, just as there is for rsh with the rsh <machine name> cmd /c set command? I have also just accepted the defaults in the distributed parallel setup; I have not touched the advanced options tabs. Thanks Trevor
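As a crude first check on the "is the MPICH service working?" question, you can at least confirm from the master that something is accepting TCP connections on the service's port on the other machine. The sketch below assumes you know that port number; none is given in this thread, so look it up in your own MPICH configuration. An open port only shows that a listener is there; it says nothing about whether the registered user and password will be accepted.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name-resolution failures, refused connections and timeouts are all
    OSError subclasses, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("machine2", 12345) would test a hypothetical host machine2 on a placeholder port 12345; substitute the real machine name and the port your MPICH service is configured to use, and run it from each machine towards the other.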
|
January 3, 2007, 11:05 |
Re: distributed mpich problem
|
#7 |
Guest
|
In my experience, MPICH can be set up easily on several IDENTICAL machines (both OS and hardware), but easily runs into problems on others. Actually, distributed MPICH is unstable and has few advantages over distributed PVM.
If you want more speed, local MPICH is the best way. Distributed MPICH and PVM may give about the same speed under Windows.
|
January 3, 2007, 20:20 |
Re: distributed mpich problem
|
#8 |
Guest
|
Bian, I think I read somewhere that you got XP64-to-XP64 MPICH working. Were they identical machines, how did you set them up, and did you get past those two warnings?
Thanks Trev
|
January 4, 2007, 01:22 |
Re: distributed mpich problem
|
#9 |
Guest
|
Bian, could the problem be that I have a network of PCs that belong to a workgroup, all with local users, rather than a domain with a central repository of users? I have registered users with a local username and password (the same across all machines), but the MPI documentation says users have to be specified in the form "Domain\User". I have only supplied the User part.
If so, do I have to register the same user multiple times, in the form Machine1\CFDUser, Machine2\CFDUser etc.? Thanks Trevor
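The two account forms being discussed can be spelled out explicitly. In a Windows workgroup a local account is qualified by the machine name (Machine1\CFDUser), while in a domain every machine uses the same DOMAIN\user account. The helper below only generates those names for a list of hosts; the host, user and domain names are hypothetical, and whether cfx5parallel -register -mpich -user expects one registration per machine in this form is an assumption to verify against the MPICH and CFX documentation.

```python
from typing import Optional


def accounts_to_register(hosts: list[str], user: str,
                         domain: Optional[str] = None) -> dict[str, str]:
    """Return the qualified account name to use on each host.

    With a domain, every host uses the same DOMAIN\\user account.
    Without one (a workgroup), each host's local account is HOST\\user.
    """
    if "\\" in user:
        raise ValueError("pass the bare user name, without any prefix")
    return {host: f"{domain or host}\\{user}" for host in hosts}


if __name__ == "__main__":
    # Hypothetical machine and user names taken from the question above.
    for host, account in accounts_to_register(["Machine1", "Machine2"], "CFDUser").items():
        print(f"{host}: register {account}")
```

Passing domain="MYDOMAIN" (again hypothetical) would instead give MYDOMAIN\CFDUser for every host, which corresponds to the domain setup described in the next reply.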
|
January 4, 2007, 02:41 |
Re: distributed mpich problem
|
#10 |
Guest
|
Hi Trevor,
I just want to share my experience. I am running on an MPICH Windows cluster. All my machines are in the same domain and are accessed with the same domain password at Windows logon. When I set MPICH up, I use the domain name as "user" and the domain logon password as "password". I supply this information to each computer in the system through the command cfx5parallel -register -mpich -user. This worked for me. Best regards CFDworker
|
January 4, 2007, 06:49 |
Re: distributed mpich problem
|
#11 |
Guest
|
CFDworker, thanks for the reply. I think I am fast coming to the realisation that I need a domain, not just a group of PCs running in a workgroup.
This may take some time to set up. Thanks Trev
|
January 7, 2007, 06:48 |
Problem FIXED !
|
#12 |
Guest
|
OK, the PVM warnings were fixed by installing the patch described in Microsoft Knowledge Base article KB892099. I now get no warnings.
The MPICH problems were caused by the long path to the .def file. I only noticed this when I clicked on the command prompt window that appears when you run cfx5solve. I changed the path and the file name to standard 8.3 format and it worked fine. It runs on a domain as well as on a workgroup: just log in as the same user with the same password on the two machines and presto. I was even running over a wireless network with no problems. To all of those who tried to help, thanks heaps. Thanks Trev.
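Since the fix was renaming the path and .def file to 8.3 form, a quick pre-flight check that flags every path component which is not 8.3-style could save someone else the same hunt. The sketch below is a conservative approximation (it rejects some characters Windows actually allows in short names) and the function name and example paths are hypothetical. On Windows itself you can see the real short names with dir /x, or get them programmatically through the GetShortPathName API.

```python
import re
from pathlib import PureWindowsPath

# Conservative approximation of an 8.3 name: 1-8 characters, optionally a dot
# and 1-3 more, drawn from letters, digits and a few safe punctuation marks.
# Spaces are deliberately excluded.
_SAFE = r"[A-Za-z0-9_~!#$%&'(){}^@-]"
_EIGHT_DOT_THREE = re.compile(rf"{_SAFE}{{1,8}}(\.{_SAFE}{{1,3}})?")


def non_83_components(path: str) -> list[str]:
    """Return the components of a Windows path that are not 8.3-style."""
    pure = PureWindowsPath(path)
    # Skip the anchor (e.g. 'c:\\'); it is not a file or directory name.
    parts = pure.parts[1:] if pure.anchor else pure.parts
    return [p for p in parts if not _EIGHT_DOT_THREE.fullmatch(p)]


if __name__ == "__main__":
    for candidate in (r"c:\1\cfx\case.def",
                      r"C:\Documents and Settings\trevor\My Documents\pump run.def"):
        print(candidate, "->", non_83_components(candidate) or "OK")
```

Running it on both the CFX installation directory and the .def file location catches the case from this thread, where the root directory was fine but the path to the input file was not.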
|
January 9, 2007, 00:20 |
Re: Problem FIXED !
|
#13 |
Guest
|
I was going to mention the invalid read handle fix for rsh, sorry.
Did the path contain spaces? |
|
January 9, 2007, 00:22 |
Re: distributed mpich problem
|
#14 |
Guest
|
The doc was written before XP64 existed, and XP64 is a different OS.
CFX 10 was released before XP64, so it is not officially supported on it. However, it will probably work fine, and I think it does.
|
January 9, 2007, 10:57 |
Re: distributed mpich problem
|
#15 |
Guest
|
Yes, I got CFX11P6 MPICH working on two identical XP64 machines. However, it has other problems (affecting speed) and I am waiting for the official release with everything working.
|
|
January 9, 2007, 18:08 |
Re: Problem FIXED !
|
#16 |
Guest
|
Yes, the path to the .def file had heaps of spaces. I thought the spaces problem only affected the path to the CFX root directory, but that was not the case.
Trevor