[OpenFOAM] Paraview in Parallel (server-client) |
April 17, 2009, 02:05 |
Paraview in Parallel (server-client)
|
#1 |
Senior Member
Prapanch Nair
Join Date: Mar 2009
Location: Bangalore, India
Posts: 105
Rep Power: 17 |
Hi,
I use ParaView without the foam reader, so I convert my cases to VTK using foamToVTK. I have ParaView installed (built myself) on a multiprocessor machine (8 processors). Now I have these basic questions:
1. Do I have to run decomposePar on the case before opening it in ParaView?
2. If yes: when I run foamToVTK, the time directories in the processor* folders are not written into the VTK folder. Is there a way to write the subdomains to VTK?
3. When I run mpirun -np 8 paraview, does it work for a non-decomposed case too?
4. Or do I have to install the foam reader to do this?
PS: I have not installed any parallel reader after building ParaView. |
|
April 21, 2009, 07:24 |
|
#2 |
Senior Member
Wolfgang Heydlauff
Join Date: Mar 2009
Location: Germany
Posts: 136
Rep Power: 21 |
hi,
if you run mpirun -np 8 paraFoam (or paraview), ParaView will open (8 times), but there is no benefit from the extra processors: it still runs on one processor. Decomposing the case doesn't help either. Maybe someone knows how to do it (I also struggle with a huge case). |
|
April 25, 2009, 22:37 |
MPI version and Libraries?
|
#3 |
New Member
Bill Rosemurgy
Join Date: Mar 2009
Location: Ann Arbor, MI
Posts: 20
Rep Power: 17 |
Hi,
I'm trying to compile ParaView 3.5 with MPI support. I'm on a Mac and have tried Open MPI, MPICH, and MPICH2 with various combinations of MPI_LIBRARY, MPI_COMPILER, MPI_EXTRA_LIBRARY and MPI_INCLUDE_PATH. The build errors out at some point or other, depending on the combination of MPI settings. Does anyone have a tried and true way of installing on an Intel Mac? Thanks, Bill |
|
July 28, 2010, 13:38 |
Paraview parallel postprocessing
|
#4 |
New Member
Carl Berger
Join Date: Mar 2009
Location: Baden, Switzerland
Posts: 9
Rep Power: 17 |
Hello together,
I posted this somewhere else two weeks ago already, but moved it here because I feel it fits better.

Basically I am trying to postprocess a Star-CD case (actually saved as EnSight data) with ParaView 3.8.0, and I want to make it compute in parallel. I have ParaView compiled with MPI and Mesa support on Ubuntu 10.04, as described in the ParaView wiki:
http://www.paraview.org/Wiki/ParaView/Git
http://www.paraview.org/Wiki/ParaView:Build_And_Install

The first experiments were done on a 2-core machine. I got as far as this:
- in ParaView, connect to server
- server setup as follows: localhost, port 11111
  Command: "mpirun -np 2 pvserver (optional: --off-screen-rendering)"
This starts the 2 extra windows with (or without, respectively) the rendered image.

The ParaView processing works all right, yet the parallel job is not any faster than ParaView on a single core; it is rather slightly slower. Nor can I find a significant influence of the OpenGL rendering. OK, so maybe my case is too simple for that:
Case 1: paraview only: avg. 16 sec/frame
Case 2: paraview client, server on localhost, "mpirun -np 2 pvserver" (rendering via OpenGL): avg. 17 sec/frame
Case 3: paraview client, server on localhost, "mpirun -np 2 pvserver --off-screen-rendering": avg. 17 sec/frame
Timing was done via "save animation" and comparing the timestamps, so the values are rounded to seconds. The animation was 5 frames long; the time per frame did not vary by more than one second. For details, see below.

Ideas that I have so far:
- parallelizing does not help in my specific case (2-stroke engine, moving mesh, filters are "Cell Data to Point Data", slices and iso-surfaces), so I should use another test case?
- the EnSight reader is not parallelized?
- on a 2-core machine, the effects are too small to be visible?
- do I need to decompose the computational domain somehow to tell ParaView how to separate the data for the parallelization?
- computation time is very small compared to the time to fetch data from RAM, and since both CPUs rely on the same RAM, there is nearly no speedup?
- the whole thing is not properly parallelized and effectively uses only one thread?

Next steps: Concerning the parallelized reader, I have read somewhere that the OpenFOAM reader is parallelized, so I will at least try an OpenFOAM case. Do I need a decomposed case, or can I just take any case and take advantage of some automatic parallelization nonetheless? Is there any possibility to do some profiling of ParaView, like logging the time a filter takes, to see at least where ParaView spends the most time?

Am I missing something else? Any comments? Thanks! Carl

PS: Could you please fix the spelling in the title of this thread, so that it will be found by search engines? |
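
The timing method used in this post (saving an animation and comparing file timestamps) could be automated along these lines. This is only a minimal sketch: the function name `seconds_per_frame` and the `*.png` default pattern are assumptions, not anything ParaView provides, and it presumes the frames are written one after another into a single directory.

```python
from pathlib import Path


def seconds_per_frame(frame_dir, pattern="*.png"):
    """Average wall-clock seconds between consecutive saved frames.

    Uses file modification times, so N frames give N - 1 intervals and
    the time spent producing the very first frame is not captured.
    """
    # Order frames by the time they were written, not by file name.
    frames = sorted(Path(frame_dir).glob(pattern),
                    key=lambda p: p.stat().st_mtime)
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure an interval")
    elapsed = frames[-1].stat().st_mtime - frames[0].stat().st_mtime
    return elapsed / (len(frames) - 1)
```

Because the result is derived from file timestamps, its resolution depends on the file system; on short animations such as the 5-frame one here, a difference of one second per frame is within the measurement noise.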
|
July 28, 2010, 13:56 |
|
#5 |
New Member
Carl Berger
Join Date: Mar 2009
Location: Baden, Switzerland
Posts: 9
Rep Power: 17 |
All right, so I tried a few test cases concerning the EnSight reader, the foam reader and the VTK reader, all to no avail. In the end I found a note somewhere on the ParaView wiki stating that the readers (maybe except the "new VTK" reader) are not parallel, and that every instance of pvserver reads the whole data set from the hard disk. As a result the reader bottleneck is tightened even more, since the data has to be read once for each process, which degrades performance. Since most modern computers are multi-core (except the iPad maybe...), I think it's sensible for ParaView to use them, as also noted on: http://paraview.uservoice.com/forums...chin?ref=title For now, I see no other option than to wait or code the thing myself... So here are my benchmarks: |
|
July 28, 2010, 18:52 |
|
#6 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings Carl,
OK, let's examine one point at a time:
As for the ".OpenFOAM" reader from OpenFOAM, on the other hand, I'm not so sure about its capabilities... Best regards, Bruno
|
July 29, 2010, 13:06 |
|
#7 |
Super Moderator
Takuya OSHIMA
Join Date: Mar 2009
Location: Niigata City, Japan
Posts: 518
Blog Entries: 1
Rep Power: 20 |
Hi Carl, I'm the main developer of the built-in OpenFOAM reader in ParaView 3.8 (the reader that is invoked by opening a file with the .foam extension).
Quote:
Quote:
Hi Bruno, Quote:
Takuya Last edited by 7islands; July 29, 2010 at 13:14. Reason: Addenda |
|
July 30, 2010, 11:13 |
|
#8 |
New Member
Carl Berger
Join Date: Mar 2009
Location: Baden, Switzerland
Posts: 9
Rep Power: 17 |
Hi,
Thank you very much for your replies! It's always much easier when you get a new hint, and it's so good for the motivation...

Concerning the advantages of parallel processing: I understand that parallelizing only makes sense if most of the time is spent on computation. Neither fetching data from the HDD nor rendering will see any speedup. However, HDD speed (above 50 MB/s) does not seem to be the limiting factor: a timestep folder sums up to 1.2 MB, and the time per frame is around 3 sec/step. Rendering is rather fast; I shut off the LOD stuff because rotation is still quick enough. So at ~10 frames/sec this does not seem to be the limit either.

I will try some profiling and parallel execution of a decomposed case... The last tests were carried out with ParaView 3.9; I didn't compile the reader that is delivered with OpenFOAM. The file is named "<file>.foam". I will try the other one too. Regards, Carl |
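
The disk-speed argument in this post can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the post (1.2 MB per timestep, 50 MB/s as a lower bound for the disk, about 3 s per step); the function name `read_time_fraction` is made up for illustration, and the estimate ignores seek times and decompression.

```python
def read_time_fraction(data_mb, disk_mb_per_s, step_seconds):
    """Return (seconds needed to read the data, share of one step spent reading)."""
    read_seconds = data_mb / disk_mb_per_s
    return read_seconds, read_seconds / step_seconds


# Figures quoted in the post: 1.2 MB per timestep, >50 MB/s disk, ~3 s per step.
read_s, fraction = read_time_fraction(1.2, 50.0, 3.0)
print(f"{read_s:.3f} s to read, {fraction:.1%} of one step")
# prints "0.024 s to read, 0.8% of one step"
```

Under these assumptions, sequential reading accounts for under one percent of the time per step, which supports the conclusion that raw disk throughput is not the bottleneck for this case.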
|
August 25, 2010, 13:15 |
DepthCharge Benchmarks (8M cells)
|
#9 |
New Member
Carl Berger
Join Date: Mar 2009
Location: Baden, Switzerland
Posts: 9
Rep Power: 17 |
Hello everybody,
finally I got some time to do some further benchmarking...

Paraview parallelization benchmarks
===================================

1) Test Description

The OpenFOAM tutorial case "depthCharge3D" was chosen as the test case, since everybody can refer to it easily:
~/OpenFOAM-1.7.x/tutorials/multiphase/compressibleInterFoam/les/depthCharge3D/

Cell numbers have been doubled in each direction in blockMeshDict, so the whole case (depthCharge3Dfine) consists of about 8 million cells. This was done to increase the computation time of the postprocessing and also to check the scalability of the parallelization. 10 timesteps were used for the evaluation (0.08 s ... 0.17 s), with writeInterval=0.01 and purgeWrite=10 to keep the total size reasonable. As in the original, the decomposition was done for 4 regions.

Tests were carried out for the EnSight reader and the (native) OpenFOAM reader (opening *.foam files by calling paraview --data="$foldername.foam"). To check the EnSight reader, the data was converted beforehand using the foamToEnsight utility (unfortunately, I forgot to time this).

For the evaluation, a single ParaView state file was set up and adapted to the following three cases:
- reading the EnSight data
- reading the OF data as a reconstructed case (which shouldn't get faster with more server processes)
- reading the OF data from the decomposed case

Result file sizes:

reconstructed case, timestep 0.17:
  28M   alpha1.gz
  24M   p.gz
  122M  phi.gz
  24M   p_rgh.gz
  17M   rho.gz
  117M  U.gz
  8,0K  uniform

decomposed case, processor0, timestep 0.17:
  4,3M  alpha1.gz
  8,9M  p.gz
  31M   phi.gz
  8,6M  p_rgh.gz
  5,7M  rho.gz
  30M   U.gz
  8,0K  uniform

The evaluation is mainly a clip and a threshold showing alpha values greater than 0.5, so you see the water splashing.

2) Test Environment

server: Ubuntu 10.04, 64-bit, on a 12-core machine with 24G RAM; data is on a local HDD (hdparm reported 120 MB/s, so the ~350 MB of one timestep should be read in about 3 sec)
client: notebook, 2-core Core 2 Duo, NVIDIA GeForce 9600M GT, Ubuntu 10.04 (32-bit version)
paraview 3.9.0, 32-bit on client, 64-bit on server, compiled with MPI and Mesa support
execution via:
carl@server$ mpirun -np [1/2/4] pvserver
carl@client$ paraview -> connect to server ...
100 Mbit ethernet connection between the two.

3) Test Results

Similar to last time: 11 frames evaluated, time measured via "save animation" and the timestamps of the first and last png files.

EnSight reader
  mpirun -np 1 pvserver          => 214 s
  mpirun -np 2 pvserver          => 175 s
  mpirun -np 4 pvserver (run 1)  => 149 s
  mpirun -np 4 pvserver (run 2)  => 145 s
  mpirun -np 4 pvserver (run 3)  => 178 s

OF reader (native paraview, .foam)

new test, decomposed case
  mpirun -np 1 pvserver  => 537 s
  mpirun -np 2 pvserver  => 303 s
  mpirun -np 4 pvserver  => 145 s

old test (not sure if trustworthy: I ran the tests several times and they were all in this region, but the numbers seemed somehow strange. Maybe there were problems with the network or something else. Please note that the "new" test above is an identical postprocessing evaluation to this "old" one.)

decomposed case
  mpirun -np 1 pvserver  => 769 s
  mpirun -np 2 pvserver  => 580 s
  mpirun -np 4 pvserver  => 507 s

reconstructed case
  mpirun -np 1 pvserver  => 606 s
  mpirun -np 2 pvserver  => 642 s
  mpirun -np 4 pvserver  => 755 s

4) Comments

The EnSight reader does some domain splitting by itself, so I get 4 vertical slices (z-normal), while the OF reader uses the OF decomposition, which is horizontal (y-normal). There might be a small performance difference due to that, but probably not much.

Most important: the time improvements from parallelization are significant with either reader. With the OF reader, scaling is nearly linear, which may be due to the large number of cells. With the EnSight reader, the speedup is still significant. I am wondering why the EnSight reader is so much faster when using only one process?

I evaluated some of the time logs too, but only ever for the "load state" step, because for the animation the time logs became too lengthy. Additionally, as a non-developer, I find the time logs somewhat hard to understand. I have the impression that sometimes the hierarchical listing shows, in the parent entry, the sum of the CPU time consumed by the child entries below it in the tree. Sometimes, but not always. So merely summing up the time values does not really give the total computation time, not even when using only one process. Am I wrong? Additionally, it is hard to judge whether jobs are really executed in parallel, or whether job-1 is waiting for job-5 to finish. CPU load doesn't tell you that either; it's always at 100% (as described somewhere). The time logs are thus not particularly interesting; I wanted to post them, but they weren't accepted due to file size. It might help a bit if the steps in the time log had not only the duration but also a begin and end timestamp.

The processing was rather slow (which is OK at 8M cells). I found it strange that when running e.g. 4 server processes, 4 windows open on the client side, but during rotation not only the LOD but also the resolution is reduced, so I figure it's using Mesa on my client? That leads to two things:
i) when using Mesa, it could be run on the server (maybe I need --offScreenRendering for that),
ii) I thought that when using pvserver, it runs the data server and the render server on the server? Then why does it open the VTK windows on my client? And when it's opening the windows on my client, why doesn't it use GL? And what is the IceT dev renderer?

OK, maybe again too much documentation for little result, but anything to improve the tools! Greetings, Carl |
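
To put numbers on "nearly linear" versus "still significant", speedup and parallel efficiency can be computed directly from the timings reported in this post. The sketch below is an illustration only: the dictionary layout and the helper name `scaling` are assumptions, the three 4-process EnSight runs are averaged (a choice made here, not in the post), and the "old test" decomposed series is left out.

```python
from statistics import mean

# Wall-clock seconds for the 11-frame animation, copied from the post.
# Keys are the number of pvserver processes; values list every reported run.
timings = {
    "EnSight reader": {1: [214], 2: [175], 4: [149, 145, 178]},
    "OF reader, decomposed (new test)": {1: [537], 2: [303], 4: [145]},
    "OF reader, reconstructed": {1: [606], 2: [642], 4: [755]},
}


def scaling(runs_by_nproc):
    """Return {nproc: (speedup, efficiency)} relative to the 1-process mean."""
    base = mean(runs_by_nproc[1])
    result = {}
    for nproc, runs in sorted(runs_by_nproc.items()):
        speedup = base / mean(runs)          # T(1) / T(n)
        result[nproc] = (speedup, speedup / nproc)
    return result


for label, runs in timings.items():
    print(label)
    for nproc, (speedup, efficiency) in scaling(runs).items():
        print(f"  -np {nproc}: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

For the decomposed OpenFOAM case this gives a speedup of about 3.7 on four processes, while the reconstructed case comes out below 1, meaning that adding server processes made it slower.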
|
August 27, 2010, 04:02 |
|
#10 |
Super Moderator
Takuya OSHIMA
Join Date: Mar 2009
Location: Niigata City, Japan
Posts: 518
Blog Entries: 1
Rep Power: 20 |
Hi Carl,
Many thanks for sharing your observations. I am not familiar with the EnSight format, but on the whole your results look reasonable if the following hypotheses are true:
As to why the pvservers are opening the VTK windows on the client, I have no idea. Perhaps you can discuss it better on the ParaView list. Takuya |
|
August 27, 2010, 04:55 |
|
#11 |
Senior Member
Anonymous
Join Date: Mar 2009
Posts: 110
Rep Power: 17 |
This can be prevented by compiling ParaView yourself and applying a patch that can be obtained from the mailing list. There is no need for these windows to pop-up, as far as I'm aware.
|
|
September 24, 2010, 08:12 |
|
#12 |
New Member
Carl Berger
Join Date: Mar 2009
Location: Baden, Switzerland
Posts: 9
Rep Power: 17 |
Hello everybody,
lately I have reviewed my original case with the performance problems, the one I posted at the end of July. It is actually a 2-stroke scavenging simulation, quite a small model with about 200k cells, divided into several blocks. The parallelization strategy of the EnSight reader divides every block into N pieces for N server processes, so in the end there is a multitude of pieces with very few cells each. I guess that's the main limitation for the parallel performance. I found similar notes in some mailing lists. I haven't tried applying the patch yet, also because I don't really need the server/client setup of ParaView. I will have a look at it some time, though. Thanks again for all the suggestions, Carl |
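
The effect described in this post can be illustrated with a rough estimate of the piece size after splitting. The post only says "several blocks", so the block count of 20 used below is purely hypothetical, as is the assumption that all blocks are equally large; the function name `pieces_after_split` is likewise made up for this sketch.

```python
def pieces_after_split(total_cells, n_blocks, n_procs):
    """Estimate the piece count and cells per piece when every block
    is cut into n_procs pieces (assumes equally sized blocks)."""
    n_pieces = n_blocks * n_procs
    return n_pieces, total_cells / n_pieces


# 200k cells as stated in the post; 20 blocks is a hypothetical value.
for n_procs in (1, 2, 4):
    n_pieces, cells = pieces_after_split(200_000, 20, n_procs)
    print(f"-np {n_procs}: {n_pieces} pieces, about {cells:.0f} cells each")
```

The per-piece cell count shrinks with both the number of blocks and the number of server processes, so the fixed overhead per piece takes up a growing share of the work on a small model like this one.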
|