Basics: Set-up for different desktop machines
January 4, 2010, 15:18 | #1
Basics: Set-up for different desktop machines
Stefan (Member)
Hello folks,

I'm setting up an OpenFOAM installation on three desktop PCs that are connected via LAN. After searching the forum for consistent information about the correct set-up for parallel running, I'm still a little confused about my problem. The system has to work for two separate users, say Usr1 and Usr2, so my idea was:

1st PC: head node as server
2nd PC: node for Usr1 as client
3rd PC: node for Usr2 as client

All machines are Core2Duo boxes with 12 GB RAM in total. I installed GNU/Linux Debian 64-bit and OF-1.6 64-bit on all nodes. The server runs permanently, but the clients only run while their user is working on them. (OF works fine on every single node.)

Now I'm facing the problem that the clients of course have different home directories:

/home/Usr1
/home/Usr2

When I try to run a case in parallel, say on client Usr1 using the server via mpirun, OF asks for the login data of Usr1@server. After entering the data I get an error message:

Code:
bash: orted: command not found
...

(I've tried to fix the problem with two accounts on the server for Usr1 and Usr2, including the OF installation in both accounts, but it didn't change anything.)

I've read in some threads that there have to be identical directories on all nodes to make sure OF works properly in combination with NFS/NIS. I need some details on that! Does anybody have helpful suggestions for setting this up? (I can't imagine that using the same user account on every node is the only way.)

thx, Stefan
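(A quick diagnostic, added here as a hedged sketch; "server" is just a placeholder for the head node's hostname. The point is to see what a non-interactive remote shell, which is what mpirun uses, actually finds in its PATH.)

Code:
# what does a non-interactive remote shell see?
ssh Usr1@server 'which orted; echo $PATH'
# if orted is not found here, mpirun will fail with exactly the
# "orted: command not found" message above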
January 5, 2010, 06:24 | #2
Mark Olesen (Senior Member)
It's always a bit difficult with remote diagnoses, but I'll give it a go.
Quote:
Quote:
Generate your keys on the user machine (ssh-keygen) and distribute the public part of the keys to the remote 'authorized_keys' files. Once you have this working, you can address the next part, namely getting your settings to work when logging in remotely and with a non-interactive remote shell. The simplest is to add the OpenFOAM (and thus the openmpi) bits into the respective ~/.bashrc files. Note that ~/.profile would be the wrong place, since it won't get seen by a non-interactive shell.

Another possibility is to forward the information about which files to source. This is what is done in mpirunDebug and in wmakeScheduler, but I would avoid that approach for now since it entails even more scripting.

To check that everything is getting found, you can simply test that the PATH, LD_LIBRARY_PATH, etc. are getting set properly when accessing remotely. Eg,

Code:
ssh $HOST 'echo $PATH'     # for this machine
ssh otherHost 'echo $PATH' # for another machine

After this is working okay, you can refocus on getting OpenFOAM working in parallel. For this, you either need NFS mounts between the machines (easy) or you need to transport the decomposed geometry to the remote machines and set the decompose roots (in decomposePar) accordingly (hard/annoying).

For your setup, the simplest would be to create an NFS export directory on each machine, for example "/data/export", where the user would place their calculation data, and then use an auto-mounter (eg, autofs) to access these data remotely as /net/MACH/data/export. This should work without any major issues.

/mark
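(A minimal sketch of the key-distribution step mentioned above. The user and host names are placeholders, and ssh-copy-id is assumed to be installed; otherwise append the public key to the remote ~/.ssh/authorized_keys by hand.)

Code:
# generate a key pair once per user (an empty passphrase keeps batch jobs password-less)
ssh-keygen -t rsa
# copy the public key to the remote machine's ~/.ssh/authorized_keys
ssh-copy-id Usr1@server
# verify: no password prompt, and the remote PATH already contains the OpenFOAM bits
ssh Usr1@server 'echo $PATH'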
January 6, 2010, 12:56 | #3
Stefan (Member)
Thanks a lot, /mark, for the helpful remarks! Some follow-up questions...

(Password-less ssh is set up, by the way.)

First: Is it necessary to have OF installed in each user account Usr1 and Usr2 on the server? (The accounts are still needed for access from the separate clients, right!?) I'm asking with later updates, configuration changes, adding users etc. for OF in mind; a centralized installation directory on the server would therefore be convenient.

Second: What do you mean by adding the "OpenFOAM (and thus the openmpi) bits" to "the respective ~/.bashrc files"? (Do I have to add the path to the OF installation on the server, e.g. in the ~/.bashrc files on the clients? An example would be helpful!)

Third: I'm not sure I understand the NFS mounts correctly. Here is my set-up so far:

Server:
/home/Usr1/OpenFOAM/Usr1-1.6/run...
/home/Usr2/OpenFOAM/Usr2-1.6/run...
Client Usr1:
/home/Usr1/OpenFOAM/Usr1-1.6/run...
Client Usr2:
/home/Usr2/OpenFOAM/Usr2-1.6/run...

Do I have to export the server directories and automount them on each client, or vice versa? (And why is that needed when no distributed data are set in the 'decomposeParDict' on the client where the calculation runs via mpirun?)

Thx, /Stefan
January 7, 2010, 06:59 | #4
Mark Olesen (Senior Member)
Quote:
FOAM_INST_DIR=/data/app/OpenFOAM

and then you have the following structure:

/data/app/OpenFOAM/OpenFOAM-1.6.x/
/data/app/OpenFOAM/ThirdParty-1.6.x/

The 'site' directory provides an ideal place for sharing custom settings, solvers, utilities and libraries. For versioned files:

/data/app/OpenFOAM/site/1.6.x/bin
/data/app/OpenFOAM/site/1.6.x/lib
/data/app/OpenFOAM/site/1.6.x/constant
/data/app/OpenFOAM/site/1.6.x/system

And for unversioned files:

/data/app/OpenFOAM/site/constant
/data/app/OpenFOAM/site/system

Quote:
export FOAM_INST_DIR=/data/app/OpenFOAM
foamDotFile=$FOAM_INST_DIR/OpenFOAM-1.6.x/etc/bashrc
[ -f $foamDotFile ] && . $foamDotFile

Of course, if you have a supported queuing system (we use GridEngine), the mpirun from openmpi takes care of doing the right thing and will be able to find itself on the various clients without needing to touch the ~/.bashrc.

Quote:
Ensuring that the user's calculation data can be found on each node entails NFS. How you resolve it depends on how you wish to work. I generally don't bother with the ~/OpenFOAM/User-Version/run structure that OpenFOAM provides by default, but you can get it to work provided that the ~/OpenFOAM/User-Version/run directories are non-local (ie, NFS-shared). One solution could be this:

ln -s /data/shared/Usr1 /home/Usr1/OpenFOAM/Usr1-1.6/run

where /data/shared/Usr1 is NFS exported/imported from somewhere (ie, head node, workstation, NAS, whatever).

Our workflow means that we instead prefer to use a structure that looks more like this:

/data/work/User/Customer/ProjectNr/catia/
/data/work/User/Customer/ProjectNr/hypermesh/
/data/work/User/Customer/ProjectNr/Component-Revision/OpenFOAM/
/data/work/User/Customer/ProjectNr/Component-Revision/starcd/

To track which application and OpenFOAM version was used and where, we have a sentinel file 'FOAM_SETTINGS' in each OpenFOAM/ directory with this sort of contents:

APPLICATION=rhoPorousSimpleFoam
FOAM_VERSION=OpenFOAM-1.6.x

This helps prevent accidental restarts with mismatched versions. I hope this helps you further.

BTW: does @TUB mean TU Berlin or somewhere else?
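(For illustration only: the FOAM_SETTINGS file is a site convention, not an OpenFOAM feature. A wrapper along these lines, assuming FOAM_INST_DIR is already set and with the process count as a placeholder, could source it before launching so a case is always run with the version it was set up for.)

Code:
#!/bin/bash
# read the sentinel file (defines APPLICATION and FOAM_VERSION)
. ./FOAM_SETTINGS
# load the matching OpenFOAM environment
. "$FOAM_INST_DIR/$FOAM_VERSION/etc/bashrc"
# launch the recorded solver in parallel
mpirun -np 4 "$APPLICATION" -parallel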
January 8, 2010, 04:09 | #5
Stefan (Member)
OK, things are getting clearer...

I'll try to get OF running on all nodes and let the forum know!

BTW: TU Berlin is indeed correct! Was it a guess, or do you have connections to TUB?
January 8, 2010, 06:14 | #6
Mark Olesen (Senior Member)
January 11, 2010, 09:59 | #7
Stefan (Member)
Hello again,
After doing all of the above, the mpirun command exited with the following message (the calculation was started on client Usr1 in connection with the server):

Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6-f802ff2d6c5a
Exec   : simpleFoam -parallel
Date   : Jan 11 2010
Time   : 13:34:49
Host   : head-node
PID    : 3753
[0]
[0]
[0] Cannot read "/home/Usr1/system/decomposeParDict"
[0]
FOAM parallel run exiting
[0]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3638 on
node head-node exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Why /home/Usr1/system/decomposeParDict? Due to my set-up (see above), shouldn't the path normally be /home/Usr1/OpenFOAM/Usr1-1.6/run...case123/system/decomposeParDict?

Another aspect I'm facing: I have password-less ssh from the clients to the server, so I'm able to mount directories exported by the server on the clients. The other way round doesn't work. But for an ssh connection from the server to the clients, I would need an account for the server on each client!? That is not what I want! Can I avoid such dependencies if I work with the suggested /data/app/OpenFOAM structure on every node and drop the server/client configuration? What about NIS accounts? I've never worked with NIS yet...

/Stefan
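(A hedged workaround sketch for the path problem: pass the case directory explicitly so it does not matter which directory the remote shells start in. The process count, machinefile name and case path are placeholders, and the case path must of course be visible under the same name on every node, e.g. via NFS.)

Code:
# run from anywhere; the case location is given explicitly
mpirun -np 4 -machinefile machines \
    simpleFoam -case /home/Usr1/OpenFOAM/Usr1-1.6/run/case123 -parallel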
January 20, 2010, 17:49 | #8
Stefan (Member)
So, after another two days of work on the three nodes, everything runs as expected. I made the changes Mark recommended: OF is now installed in /data/app on all nodes (as root). So there is no 'real' server/client configuration any more, but that doesn't matter for my usage so far.

In addition, every node has NFS export/import directories. That is needed so that the nodes involved can find the decomposed case data (i.e. processor0, processor1, ..., processorN) of the machine where the calculation is started.

A few notes to keep things working properly, since they confused me at first: when I set the option distributed no in decomposeParDict (because I do not want to distribute my data), the remote nodes expect their part of the data on a local drive (in an equivalent directory structure by default). To avoid this, I use the distributed option and set the roots to the respective NFS export/import directories (see the sketch after this post). Be careful with the node order in the <machinefile>, too: the host from which mpirun is started has to be first in the list. According to the specified CPUs, one becomes the master and the following ones are treated as slaves. When you run the decomposed case via PyFoam, the <machinefile> needed for e.g. pyFoamPlotRunner.py uses a different order of the slave CPUs (I haven't figured out why PyFoam doesn't use the <machinefile> the standard way openMPI is supposed to).

Finally, I have to remark that for future installations it would be particularly advantageous to use a server with NIS accounts. But I'm not sure whether OF has to be installed on every node of the cluster when using openMPI. Maybe someone can clarify this!?

/Stefan

PS: Thanks, Mark, for the opportunity to call you! I'll come back to you when necessary.
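(A minimal sketch of the distributed/roots entries discussed above, with an invented decomposition and invented NFS paths; the method, counts and paths have to be adapted to the actual case. roots takes one entry per slave processor, while the master uses the directory the run was started from.)

Code:
// system/decomposeParDict (fragment)
numberOfSubdomains 4;

method          simple;

simpleCoeffs
{
    n               (2 2 1);
    delta           0.001;
}

distributed     yes;

roots           // one root per slave (processors 1..3): the directory
(               // that contains the case on the respective node
    "/net/head-node/data/export/Usr1"
    "/net/client2/data/export/Usr1"
    "/net/client2/data/export/Usr1"
);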
January 21, 2010, 08:06 | #9
Bernhard Gschaider (Assistant Moderator)
Quote:
To find out how mpirun is actually called, set the following in ~/.pyFoam/pyfoamrc:

Code:
[Debug]
ParallelExecution: True

Quote:
Bernhard
January 21, 2010, 09:05 | #10
Mark Olesen (Senior Member)
Quote:
1. Export the directories from each machine. The resulting /etc/exports would resemble this:

Code:
/export/data *.mydomain.name(fsid=0,rw,root_squash,sync,no_subtree_check)

2. Laziness is a virtue. Be lazy and do not import these directories on each machine. That will drive you crazy, is a maintenance nightmare, scales very poorly and can bind more network resources than you wish.

3. Clean out all those NFS mounts from the various /etc/fstab files. Only include mounts that absolutely must exist for the system to run (eg, if you are NFS mounting the user home directories). If you can't umount some of the NFS mounts, try "lsof" to see which processes might still have access to them (or be brutal and just reboot the machines).

4. Use the automounter. The /etc/auto.net does a pretty good job of it. Activate it in /etc/auto.master, insert the service and start the autofs service as needed (a sketch of a possible auto.master entry follows at the end of this post). The NFS-exported directories (step 1 above) should now be addressable via the automounter. For example, the "/net/machine1/export/data/" mount should correspond to the "/export/data" NFS share from machine1. Note that a simple "ls /net" will only show directories that are already mounted; you'll need to use "ls /net/machine1" the first time.

If you set things up like this, you will now be able to address the same disk space in two different ways. On all of the machines, you can address this disk space as "/net/machine1/export/data". On machine1, you have the choice of using the real location ("/export/data") or the automounted NFS share ("/net/machine1/export/data"). Fortunately the kernel is smart enough to notice that the NFS share from machine1 -> machine1 is actually local, so there is no performance penalty on the local machine if you address it as "/net/machine1/export/data" instead of "/export/data".

If that is clear, then you can see that the only slight difficulty is to make sure that you are indeed on the "/net/machine1/export/data" path instead of the "/export/data" path when starting your jobs. This is not terribly difficult to overcome with something trivial like this:

Code:
[ "${PWD#/export/data/}" != "$PWD" ] && cd "/net/$HOST$PWD" Quote:
Likewise, OpenFOAM does not need to be installed on every node, but it must be accessible from every node. A NFS share is the most convenient, but you could also compile and install it on one machine and use rsync to distribute it about if that works better for you. |
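(A minimal sketch of the automounter step above. The exact auto.master syntax differs between autofs versions and distributions, so treat the lines below as an assumption to check against the local /etc/auto.master rather than a drop-in configuration.)

Code:
# /etc/auto.master -- enable the hosts map so NFS exports appear under /net
# (on some systems the equivalent entry is "/net  -hosts")
/net    /etc/auto.net    --timeout=60

# then (re)start the automounter, eg on Debian:
/etc/init.d/autofs restart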
January 21, 2010, 16:22 | #11
Stefan (Member)
@Bernhard
I just noticed something about using the <machinefile> in the way openMPI wants it, e.g.:

Code:
head-node cpu=X
client-node1 cpu=Y
...

With that I get a 'descending' order:

Code:
master: CPU_X.1 (head-node)
slaves: CPU_X.2 (head-node)
        ...
        CPU_X.N (head-node)
        CPU_Y.1 (client1-node)
        CPU_Y.2 (client1-node)
        ...
        CPU_Y.N (client1-node)

whereas via PyFoam (e.g. pyFoamPlotRunner.py) the slaves come up interleaved:

Code:
master: CPU_X.1 (head-node)
slaves: CPU_Y.1 (client1-node)
        CPU_X.2 (head-node)
        CPU_Y.2 (client1-node)
        ...
        CPU_X.N (head-node)
        CPU_Y.N (client1-node)

Maybe the difference in the ordering isn't caused by PyFoam after all!? (See also the sketch after this post.)

@Mark
Thanks once again! This is my first little cluster, so I'm still learning... I'm pleased that things are working now, for the second user as well. If there is some spare time soon, I'll keep busy with this stuff. More questions will appear for sure!

/Stefan
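(A possible explanation to check rather than a confirmed diagnosis: Open MPI can produce either ordering by itself, depending on its process-mapping option, so the interleaved order may simply come from a by-node mapping somewhere in the call chain. A minimal sketch with placeholder hostfile, solver and process count:)

Code:
# fill each host's slots before moving to the next host ("descending" order)
mpirun -np 4 -machinefile machines --byslot pisoFoam -parallel

# distribute ranks round-robin across hosts (interleaved order)
mpirun -np 4 -machinefile machines --bynode pisoFoam -parallel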
January 22, 2010, 08:11 | #12
Bernhard Gschaider (Assistant Moderator)
Quote:
Quote:
Bernhard
January 22, 2010, 08:32 | #13
Mark Olesen (Senior Member)
Quote:
January 22, 2010, 17:59 | #14
Stefan (Member)
@Bernhard
I forgot to say that the CPU order is taken from the output produced by OpenFOAM:

Code:
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 1.6-f802ff2d6c5a
Exec   : /data/app/OpenFOAM/OpenFOAM-1.6/applications/bin/linux64GccDPOpt/pisoFoam -parallel
Date   : Jan 22 2010
Time   : 18:24:47
Host   : client1
PID    : 6330
Case   : /net/host/data/
nProcs : 6
Slaves :
5
(
client1.6332
head-node.5024
head-node.5025
client2.4933
client2.4934
)

Pstream initialized with:
    floatTransfer     : 0
    nProcsSimpleSum   : 0
    commsType         : nonBlocking
SigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

head-node head-node client client

Code:
head-node cpu=2
client cpu=2

@Mark
I agree with you, but for this configuration I run the calculations from the same roots, so the settings remain unchanged!
January 27, 2010, 12:44 | #15
Bernhard Gschaider (Assistant Moderator)
Tags: cluster, nfs, parallel