bash script for pseudo-parallel usage of reconstructPar |
August 18, 2014, 03:06 |
|
#21 |
Member
hannes
Join Date: Mar 2013
Posts: 47
Rep Power: 13 |
Hi,
could you post some more information about your case? From what you write, I would assume that you only have two time steps to reconstruct: the way the script is written, it never starts more jobs than there are time steps left. Hannes |
|
September 11, 2014, 06:40 |
Error while running parReconstructPar
|
#22 |
Member
Join Date: May 2014
Posts: 31
Rep Power: 12 |
Hi,
I get the following error while trying to run parReconstructPar:
Starting Job 1 - reconstructing time = 0.1 through 166.6
Job started with PID 10452
Starting Job 2 - reconstructing time = 166.7 through 234.1
Job started with PID 10462
Starting Job 3 - reconstructing time = 234.2 through 31.7
Job started with PID 10472
Starting Job 4 - reconstructing time = 31.8 through 300
Job started with PID 10482
--> FOAM FATAL ERROR: No times selected
From function reconstructPar in file reconstructPar.C at line 210.
FOAM exiting
As you can see, the time ranges in jobs 3 and 4 are a bit strange too. Can someone please tell me how to correct this? Thanks, kcn |
|
September 11, 2014, 06:54 |
|
#23 |
Senior Member
|
Hi kcn!
I can only GUESS, but could it be related to the time step given in your controlDict? Otherwise: have you already tried whether it works with only two processors? Maybe the division of time steps gives values that do not work? Or maybe you could try to decompose time step 0 as well, so that it starts from time step 0 instead of 0.1? As I said, this is all only guessing, but these are the approaches I would take for further testing... Cheers, Bernhard |
|
September 11, 2014, 19:27 |
Minor Fix to get all numeric directories
|
#24 |
New Member
Will
Join Date: Dec 2011
Posts: 17
Rep Power: 15 |
The errors reported above are due to a bug in the way the script lists the time directories in processor0. They need to be ordered by value using the "-1v" flags; otherwise you can get the times in the wrong order, e.g. ... 0.7, 0.8, 1, 10, 1.1, 1.2 ....
If the number of processors is such that one job picks up the range from 10 to 1.9 (say), when it should be from 1.1 to 1.9, then you have a problem. Simply replace all occurrences of "ls processor0 | ...." with "ls processor0 -1v | ....". Modified script attached: parReconstructPar.txt. Otherwise a great script. Thank you. |
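To see why the order matters: a script like this presumably gives each job a contiguous slice of the sorted time list and passes the first and last entries of that slice to reconstructPar as a range. If the list is in text order (in many locales plain text sorting also ignores the dot, which is how 10 lands between 1 and 1.1), a slice can end below where it starts and select the wrong times or none at all ("No times selected"). The bash function below is a hypothetical sketch of that splitting step, not code from the attached script. It sorts with `sort -g` (general numeric sort; `LC_ALL=C` is an added precaution so "." is always read as the decimal point) and never creates more jobs than there are times.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split time names read from stdin (one per line) into
# contiguous "start:end" ranges, one per job, after sorting them numerically.
# Usage: ... | split_time_ranges <number-of-jobs>
split_time_ranges() {
    local njobs=$1 sorted
    # Numeric (not text or version) order; LC_ALL=C makes "." the decimal point.
    sorted=$(LC_ALL=C sort -g)
    if [[ -z $sorted || $njobs -lt 1 ]]; then
        return 0
    fi
    local -a times
    mapfile -t times <<< "$sorted"
    local total=${#times[@]}
    # Never start more jobs than there are time directories.
    if (( njobs > total )); then
        njobs=$total
    fi
    local base=$(( total / njobs )) extra=$(( total % njobs ))
    local job count first=0
    for (( job = 0; job < njobs; job++ )); do
        # The first $extra jobs take one extra time each, so sizes differ by at most one.
        count=$(( base + (job < extra ? 1 : 0) ))
        echo "${times[first]}:${times[first + count - 1]}"
        first=$(( first + count ))
    done
}
```

For example, `foamListTimes -processor | split_time_ranges 4` prints four increasing, non-overlapping ranges; each could then be handed to its own background run of `reconstructPar -time start:end`.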
|
September 18, 2014, 02:45 |
|
#25 |
Member
Join Date: May 2014
Posts: 31
Rep Power: 12 |
Dear Will,
Thank you very much for the corrected script. kcn |
|
October 7, 2014, 07:21 |
|
#26 | |
New Member
sd
Join Date: May 2014
Posts: 14
Rep Power: 12 |
Quote:
I tried putting your file in my case directory and executing ./parReconstructPar, but it doesn't work. I also tried putting it in my opt/bin directory, just in case, but that failed too. Can you tell me where I'm making a mistake? |
||
October 7, 2014, 08:08 |
|
#27 |
Senior Member
Joachim Herb
Join Date: Sep 2010
Posts: 650
Rep Power: 22 |
Have you set the executable flag?
Code:
chmod a+x parReconstructPar
Alternatively, run it through the shell:
Code:
sh parReconstructPar |
|
October 14, 2014, 05:52 |
|
#28 |
New Member
Jim KIT
Join Date: Aug 2012
Location: Germany
Posts: 25
Rep Power: 14 |
Hello,
thanks for sharing. I'm running a parallel simulation in OpenFOAM, and because of the limit on the number of files on my computer, I have to keep the amount of data from growing out of control. I have almost no experience in bash scripting, and I want to write a script that, at a fixed interval (say every minute), checks whether there is a new time directory in processor0. It should then reconstruct this new time and delete it from all the processor directories. Of course, it has to be able to run while the simulation is running. I would appreciate it if somebody could help me. Thanks |
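One way to approach this is a polling loop. The sketch below works under stated assumptions rather than being an established tool: it is run from the case directory with the OpenFOAM environment loaded, and every `interval` seconds (60 by default) it lists the numeric time directories in processor0, leaves out time 0 and the newest time (which the solver may still be writing), reconstructs each remaining time with `reconstructPar -time`, and deletes that time from all processor* directories only if reconstructPar exited successfully. The function names and log-file names are made up for illustration; try it on a copy of a case first.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: run from the case directory, OpenFOAM environment loaded.

# Print the numeric time directories of a processor directory in numeric
# order, excluding time 0 and the newest time (it may still be being written).
settled_times() {
    local dir=${1:-processor0}
    ls -1 "$dir" |
        grep -E '^[0-9]+([.][0-9]+)?([eE][-+]?[0-9]+)?$' |
        awk '$1 + 0 > 0' |
        LC_ALL=C sort -g |
        sed '$d'
}

# Every $1 seconds (default 60), reconstruct the settled times and remove them
# from all processor* directories. Runs until interrupted (Ctrl-C).
watch_and_reconstruct() {
    local interval=${1:-60} t
    while true; do
        for t in $(settled_times processor0); do
            # Delete the decomposed copy only if reconstruction succeeded.
            if reconstructPar -time "$t" > "log.reconstructPar.$t" 2>&1; then
                rm -rf processor*/"$t"
            fi
        done
        sleep "$interval"
    done
}
```

After sourcing these functions in the case directory, `watch_and_reconstruct 60` can run in a second terminal while the solver is running. Keeping the newest time in the processor directories also leaves the solver something to restart from.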
|
December 7, 2014, 11:56 |
|
#29 |
New Member
Jaap Stolk
Join Date: Nov 2014
Posts: 11
Rep Power: 12 |
Dear Will,
I'm very happy with this script, but I would like to point out that there is still a small overlap in the detected time ranges when decimals are involved:
Starting Job 1 - reconstructing time = 0.25 through 30.5
Starting Job 2 - reconstructing time = 30.25 through 60.5
(I can only run 2 reconstructions in parallel on a single machine because of its 16 GB RAM limit, but the script parameters make it very easy to divide the reconstruction manually over 2 or 3 machines.)
Edit: this example is a bit more problematic; I may have to run 30.5 manually:
Starting Job 1 - reconstructing time = 0.25 through 30.25
Starting Job 2 - reconstructing time = 30.75 through 61
Last edited by jwstolk; December 7, 2014 at 12:45. Reason: (another example) |
|
December 14, 2014, 22:53 |
|
#30 | ||
Member
ALLEN
Join Date: Aug 2014
Posts: 32
Rep Power: 12 |
Quote:
Hello jwstolk, I am using the tool on an HPC system; could you tell me how to run this application? I have put the script in the directory where the processor* directories are stored, and when I run "parReconstructPar 12", it simply gives me the error "Command not found". Quote:
I would very much appreciate any help. /Allen |
|||
December 15, 2014, 17:22 |
|
#31 |
New Member
Jaap Stolk
Join Date: Nov 2014
Posts: 11
Rep Power: 12 |
I don't know exactly what HPC is in this case, but I will assume it is some form of Linux.
You are on the right track. ./parReconstructPar should normally work, but since you downloaded the script, it is not marked as executable, which results in an error. If you run:
ls -l parReconstructPar
the "x" flag should be missing. In that case, run something like:
chmod a+x parReconstructPar
(oh, and possibly read scripts you download from the internet before giving them execute privileges :-)
Normally you only need to use the -n option, like:
./parReconstructPar -n 12
(see the USAGE line in your quote)
Note that reconstructing takes quite a bit of RAM. I recommend running first with 1 or 2 instead of 12, checking how much RAM it ends up using (for example with top or htop), and then deciding how many reconstructions you can run in parallel without running out of RAM. The OS can swap other programs to disk, but the RAM used by parReconstructPar is used continuously, and when even a small part of it needs to be swapped to disk, everything slows down to a crawl. With my current cases, I can run up to "-n 3" with 16 GB of RAM.
Since my files are on an NFS drive, I can use the "-t start,end" option to process only half of the time directories and run the other half from another computer with another 16 GB of RAM. (The standard reconstructPar tool can only rebuild all time directories, or a list of timestamps, and does not have the neat "start,end" option of this script.)
If you are using decimals in your saved timestamps, check that the script does not skip a time step between the split time ranges, because the sorting it uses has trouble with decimal numbers. |
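The sizing advice above is simple arithmetic: divide the memory you can spare by the peak memory of one reconstructPar run. A small bash helper for that could look like the sketch below (the function name is hypothetical; when no budget is given it reads `MemAvailable` from /proc/meminfo, so that default assumes Linux).

```shell
#!/usr/bin/env bash
# Estimate how many reconstructPar runs fit in memory at once.
#   $1: peak memory of ONE run in MB (measure it with top/htop on a -n 1 run)
#   $2: memory you are willing to use in MB; defaults to MemAvailable
#       from /proc/meminfo (Linux only)
max_parallel_jobs() {
    local per_job_mb=$1
    local avail_mb=$2
    if [[ -z $avail_mb ]]; then
        # /proc/meminfo reports kB; convert to MB.
        avail_mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
    fi
    local n=$(( avail_mb / per_job_mb ))
    # Always allow at least one run.
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}
```

For instance, if one run peaked at about 5000 MB (an illustrative figure) and you allow 16000 MB, `max_parallel_jobs 5000 16000` prints 3, which you would then pass as `./parReconstructPar -n 3`. Budgeting somewhat below the machine's total leaves headroom, for the swapping reason given above.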
|
February 16, 2015, 00:35 |
reconstructPar in parallel using GNU Parallel with a bash one-liner
|
#32 |
Member
Peter
Join Date: Feb 2015
Location: New York
Posts: 73
Rep Power: 11 |
Hi All,
I didn't know there was a script for this - really nice. I usually just do this with a bash one-liner:
Code:
$ foamListTimes -processor > log.foamTimes; awk 'NR%4==1' log.foamTimes | parallel --halt=0 -j8 reconstructPar -newTimes -time {}:
Code:
foamListTimes -processor
Code:
awk 'NR%4==1' log.foamTimes
Code:
parallel --halt=0 -j8 reconstructPar -newTimes -time {}:
Anyway, that's the solution I've been using - hope this helps. Peter |
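Our reading of the one-liner, written out step by step as commented bash functions (the function names are hypothetical; `foamListTimes`, `reconstructPar` and GNU `parallel` are called exactly as in the one-liner): `awk 'NR%4==1'` keeps every fourth time as a start point, `-time {}:` gives each job an open-ended range from that start, and `-newTimes` should make a job skip times that already exist in the case directory, so a job that runs into a later job's times does not redo them.

```shell
#!/usr/bin/env bash
# Print every n-th line of stdin, starting with the first line.
# (NR - 1) % n == 0 matches NR%4==1 for n = 4 and also works for n = 1.
every_nth_time() {
    awk -v n="${1:-4}" '(NR - 1) % n == 0'
}

# The one-liner as a function: $1 = spacing of start times (default 4),
# $2 = number of simultaneous jobs (default 8). Needs OpenFOAM and GNU parallel.
reconstruct_with_parallel() {
    local step=${1:-4} jobs=${2:-8}
    # Step 1: list the time directories of the decomposed case, one per line.
    foamListTimes -processor > log.foamTimes
    # Step 2: keep every $step-th time as the start time of one job.
    # Step 3: each job runs "reconstructPar -newTimes -time <start>:", i.e. it
    # reconstructs everything from <start> onwards, skipping times that already
    # exist in the case directory; --halt=0 keeps going if a job fails.
    every_nth_time "$step" < log.foamTimes |
        parallel --halt=0 -j"$jobs" reconstructPar -newTimes -time {}:
}
```

With 4 and 8 as defaults, `reconstruct_with_parallel` behaves like the original one-liner; a larger spacing gives each job a longer stretch of times and starts fewer jobs.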
|
July 19, 2015, 22:25 |
|
#33 |
Member
methma Rajamuni
Join Date: Jul 2015
Location: Victoria, Australia
Posts: 40
Rep Power: 11 |
Kwardly,
Thank you very much for sharing the parReconstructPar script. It works perfectly. Best, Meth. |
|
August 28, 2016, 06:07 |
|
#34 |
Senior Member
Taher Chegini
Join Date: Nov 2014
Location: Houston, Texas
Posts: 125
Rep Power: 13 |
Thanks Peter. Your one-line code works like a charm, concise and efficient.
|
|
May 12, 2017, 10:24 |
|
#35 |
Member
Ran
Join Date: Aug 2016
Posts: 69
Rep Power: 10 |
Thanks, it really helps.
It looks good, but I've noticed something that may be a bug. This is my input and the output:
$ sh reconPar 24 test
running reconstructPar -noZero in pseudo-parallel mode on 24 processors
reconstructing 134 time directories
making temp dir
Starting Job 1 - reconstructing time = 0 through 10.5
Starting Job 2 - reconstructing time = 10.8 through 1.2
Starting Job 3 - reconstructing time = 12.3 through 13.8
Starting Job 4 - reconstructing time = 14.1 through 15.3
Starting Job 5 - reconstructing time = 15.6 through 17.1
Starting Job 6 - reconstructing time = 17.4 through 18.6
Starting Job 7 - reconstructing time = 18.9 through 20.4
Starting Job 8 - reconstructing time = 20.7 through 21.9
Starting Job 9 - reconstructing time = 22.2 through 23.7
Starting Job 10 - reconstructing time = 24 through 25.2
Starting Job 11 - reconstructing time = 25.5 through 27
Starting Job 12 - reconstructing time = 2.7 through 28.5
Starting Job 13 - reconstructing time = 28.8 through 30
Starting Job 14 - reconstructing time = 30.3 through 31.8
Starting Job 15 - reconstructing time = 32.1 through 33.3
Starting Job 16 - reconstructing time = 33.6 through 35.1
Starting Job 17 - reconstructing time = 35.4 through 36.6
Starting Job 18 - reconstructing time = 36.9 through 38.4
Starting Job 19 - reconstructing time = 38.7 through 39.9
Starting Job 20 - reconstructing time = 4.2 through 5.7
Starting Job 21 - reconstructing time = 6 through 7.5
Starting Job 22 - reconstructing time = 7.8 through 9.3
Starting Job 23 - reconstructing time = 9.6 through
Starting Job 24 - reconstructing time = through 39.9
===============================================
What does "10.8 through 1.2" for Job 2 mean? Is this a bug?
Starting Job 2 - reconstructing time = 10.8 through 1.2
I also noticed these two lines:
Starting Job 23 - reconstructing time = 9.6 through
Starting Job 24 - reconstructing time = through 39.9
I cannot understand what they mean. Do they only deal with the timestamps 9.6 and 39.9?
After a while, the program got stuck at some time step. Here is the rest of the output after running for a long time:
--> FOAM FATAL ERROR: No times selected
From function int main(int, char**) in file reconstructPar.C at line 225.
FOAM exiting
134 directories remaining... 112 directories remaining... 110 directories remaining... 108 directories remaining... 105 directories remaining... 100 directories remaining... 89 directories remaining... 88 directories remaining... 87 directories remaining... 85 directories remaining... 83 directories remaining... 79 directories remaining... 77 directories remaining... 73 directories remaining... 63 directories remaining... 61 directories remaining... 58 directories remaining... 56 directories remaining... 54 directories remaining... 52 directories remaining... 49 directories remaining... 39 directories remaining... 37 directories remaining... 35 directories remaining... 33 directories remaining... 32 directories remaining... 28 directories remaining... 24 directories remaining... 20 directories remaining... 19 directories remaining... 17 directories remaining... 15 directories remaining... 11 directories remaining... 10 directories remaining... 9 directories remaining... 8 directories remaining... 7 directories remaining... 6 directories remaining...
E.O.F
I am using OpenFOAM v4.1 on the following hardware: 24 cores/node, 32 GB memory per node, InfiniBand, AMD CPUs @ 2.1 GHz, CentOS 6.8.
Third update: after 01:43:43 of running, it finished without error. Thanks, man! But from what I observed, the last several directories were much slower than the others. Does anybody have ideas about this? The output continued:
5 directories remaining... 4 directories remaining... 3 directories remaining... 2 directories remaining... 1 directories remaining...
cleaning up temp files
finished
E.O.F
Last edited by random_ran; May 12, 2017 at 11:58. Reason: Update output:3rd |
|
May 30, 2017, 15:09 |
|
#36 | |
Member
Ran
Join Date: Aug 2016
Posts: 69
Rep Power: 10 |
I found the way the creator of GNU parallel [O. Tange (2011)] asks for citations funny: "To silence the citation notice: run 'parallel --bibtex'."
That's a good way to remind users. Anyway, thanks. Quote:
|
||
June 8, 2018, 20:09 |
Another minor fix to get all time steps in order
|
#37 |
New Member
Guilherme Salvador Vieira
Join Date: Jun 2018
Location: Boston, MA
Posts: 1
Rep Power: 0 |
Dear all,
I just wanted to share a small fix for situations in which Will's last version still struggles (e.g. with outputs at times 0.125, 0.25, 0.375 and 0.5, where the proposed "ls -1v" on its own does not get the order right). The idea is simply to replace the occurrences of "ls processor0 -1v | ..." with "ls processor0 -1v | sort -g | ...", which orders the times correctly regardless of how many decimal digits the different directories use. In any case, this is a great script, very useful for a faster reconstruction. Thanks to all those who contributed. |
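The difference is easy to reproduce. Version sort (`ls -v`, `sort -V`) compares the digits after the decimal point as whole integers, so 0.5 (5) sorts before 0.25 (25), which sorts before 0.125 (125); `sort -g` compares the values as floating-point numbers. The helper below is a sketch of the fixed listing (the function name is hypothetical; the regular expression that keeps only numeric directory names and the `LC_ALL=C` setting, which makes `sort -g` read "." as the decimal point in any locale, are our additions).

```shell
#!/usr/bin/env bash
# List the numeric time directories of a processor directory in numeric order.
list_time_dirs() {
    local dir=${1:-processor0}
    # Keep only names that look like numbers (drops "constant" etc.), then
    # sort by value; LC_ALL=C so "." is always the decimal separator.
    ls -1 "$dir" |
        grep -E '^[0-9]+([.][0-9]+)?([eE][-+]?[0-9]+)?$' |
        LC_ALL=C sort -g
}
```

Running `list_time_dirs processor0` on a case with times 0.125, 0.25, 0.375 and 0.5 lists them in that order, which is what the range splitting needs.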
|
April 13, 2019, 13:16 |
|
#38 |
Member
Nat K
Join Date: Oct 2017
Posts: 68
Rep Power: 9 |
Is it possible to reconstruct specific timestamps using the -t option?
|
|
April 14, 2019, 03:35 |
|
#39 | |
New Member
Jaap Stolk
Join Date: Nov 2014
Posts: 11
Rep Power: 12 |
Quote:
This script assigns different reconstruct jobs to different processes. I have not used it for a while, but if you set the time range to include only a single timestamp, the script will only use a single process, and it should be identical to just running "reconstructPar -time x.xx". I now mostly use the paraFoam option to visualize a decomposed case without the need for reconstructing. |
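For a handful of specific times you can also skip the script and start the `reconstructPar -time` runs yourself, capping how many run at once so you stay within RAM. A minimal sketch using `xargs -P` (available in GNU and BSD xargs); the function name and the log-file naming are hypothetical:

```shell
#!/usr/bin/env bash
# Reconstruct the listed times with at most $1 reconstructPar runs at a time.
# Usage: reconstruct_times <max-jobs> <time> [<time> ...]
# Each run writes its output to log.reconstructPar.<time>.
reconstruct_times() {
    local jobs=$1
    shift
    # One time per line; xargs -P starts up to $jobs runs in parallel and
    # -I{} passes each time as a positional argument to a small sh script.
    printf '%s\n' "$@" |
        xargs -P "$jobs" -I{} sh -c 'reconstructPar -time "$1" > "log.reconstructPar.$1" 2>&1' _ {}
}
```

Run from the case directory, `reconstruct_times 2 0.5 1.25 3` reconstructs those three times, at most two at a time. Passing each time to `sh -c` as a positional argument, rather than pasting it into the command string, keeps unusual names from being interpreted by the shell.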
||
July 1, 2020, 10:13 |
|
#40 |
New Member
Wei Yao
Join Date: Jul 2015
Posts: 1
Rep Power: 0 |
Very useful scripts!
|
|
Tags |
parallel processing, reconstructpar |