Large test case for running OpenFOAM in parallel
August 16, 2007, 15:44
#1
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi,
I am testing the parallel feature of OpenFOAM. The test case I used is icoFoam/cavity. However, I did not observe any speedup. The execution times were: sequential 0.27 s; 4 CPUs 0.63 s; 8 CPUs 0.7 s. It might be because it is such a small case. I am wondering whether there are any large cases I could try. I see there are quite a few cases in the tutorials directory. Can someone suggest one which is large enough to test the parallel feature? Or are there any publicly available OpenFOAM test cases for benchmarking purposes? Thanks, Huiyu
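For context, a minimal sketch of how such a parallel run is typically set up, assuming the 1.x-era utility syntax (solver <root> <case>, as used later in this thread) and a simple 2x2 decomposition; an illustration, not necessarily the poster's exact setup:
Code:
// system/decomposeParDict (excerpt)
numberOfSubdomains 4;
method simple;
simpleCoeffs
{
    n     (2 2 1);  // split the domain 2 x 2 in x and y
    delta 0.001;
}

# decompose the case, run in parallel, then reassemble the results
decomposePar . cavity
mpirun -np 4 icoFoam . cavity -parallel
reconstructPar . cavity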
August 16, 2007, 17:03
#2
Senior Member
Srinath Madhavan (a.k.a pUl|)
Join Date: Mar 2009
Location: Edmonton, AB, Canada
Posts: 703
Rep Power: 21
Just change the node density in constant/polyMesh/blockMeshDict. Increase the number of control volumes!
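Concretely, for the cavity case that means editing the blocks entry in constant/polyMesh/blockMeshDict; a sketch, assuming the standard single-block cavity mesh (this matches the (20,20,1) to (100,100,1) change reported below):
Code:
blocks
(
    // was (20 20 1): 400 cells; (100 100 1) gives 10000 cells,
    // i.e. more work per processor for a parallel test
    hex (0 1 2 3 4 5 6 7) (100 100 1) simpleGrading (1 1 1)
);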
August 17, 2007, 04:18
#3
Senior Member
Jens Klostermann
Join Date: Mar 2009
Posts: 117
Rep Power: 17
Hi,
We did some benchmarks based on the pyFoamBench cases from the wiki (thanks to Bernhard); see http://openfoamwiki.net/index.php/Be...ks_standard_v1 With OF-1.3 we saw, depending on the case, speedup up to 128 cores. Jens
August 17, 2007, 15:29
#4
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Srinath and Jens,
Thanks for your replies and suggestions! I increased the node density from (20,20,1) to (100,100,1) and decreased deltaT to 0.001 to satisfy the CFL condition (for the icoFoam/cavity case). With the longer runtime, I did observe speedup. However, parallel efficiency dropped below 50% with 8+ processors. I checked the benchmark wiki page; most parallel results only go up to 4 CPUs. Jens, can you point me to some cases that scale to a large number of processors? I am looking for test cases that scale to 64 CPUs. I understand that scaling depends on the application as well as the system. That is why I am looking for cases which have already shown good scaling on other systems; I want to make sure the bad scaling on my local system is not due to the application. Thanks, Huiyu
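For reference, a quick check of why deltaT had to shrink, assuming the standard cavity setup (0.1 m square domain, lid velocity 1 m/s):
Code:
Co = U * deltaT / deltaX
   = 1 m/s * 0.001 s / (0.1 m / 100 cells)
   = 1.0    (the same Courant number as the original 20x20 mesh with deltaT = 0.005)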
August 17, 2007, 15:50
#5
Senior Member
Join Date: Mar 2009
Posts: 225
Rep Power: 18
In my case it is the rasInterFoam solver. The cluster I'm using allows me to use 8 nodes, each having 8 processors.
Frankly speaking, the more processors I use, the better the efficiency I obtain. For example, on my Mac G5 with a single processor the computation takes about a couple of days. On the cluster, 4 nodes take 1.5 h, and 8 nodes take a bit more than 10 minutes. Try that case. Krystian
August 17, 2007, 18:13
#6
Guest
Posts: n/a
When I was in school, a grad student was doing cluster performance experiments on VHDL chip simulations. He found that going from 1 to 2 to 4 nodes, the total compute time was reduced. He attributed this to more cache hits due to the smaller per-node data sets.
August 20, 2007, 19:05
#7
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Krystian,
What test case did you run with the rasInterFoam solver? Is it the default damBreak case from the tutorials? Did you change anything in controlDict or blockMeshDict? Did you use the damBreak case for interFoam from standardBench_v1.cfg? Thanks, Huiyu
August 20, 2007, 19:22
#8
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Jens and Bernhard,
I downloaded PyFoam-0.4.0, but did not find benchFoam.py under examples/. Where can I find it? Thanks. I did find standardBench_v1.cfg under examples/data. Based on it, I modified interFoam/damBreak. However, the sequential run only takes about 180 s to finish, which is significantly less than the baseline (1605.82 s) in standardBench_v1.cfg. Something must be wrong here. I just want to make sure I am running the correct benchmark. The following is what I did, based on the configuration file; please let me know which steps are wrong.

Step 1: Modify the blocks section of blockMeshDict:
Code:
blocks
(
    hex (0 1 5 4 12 13 17 16) (46 16 1) simpleGrading (1 1 1)
    hex (2 3 7 6 14 15 19 18) (38 16 1) simpleGrading (1 1 1)
    hex (4 5 9 8 16 17 21 20) (46 84 1) simpleGrading (1 1 1)
    hex (5 6 10 9 17 18 22 21) (8 84 1) simpleGrading (1 1 1)
    hex (6 7 11 10 18 19 23 22) (38 84 1) simpleGrading (1 1 1)
);
Step 2: Modify controlDict:
Code:
endTime        0.5;
deltaT         0.0005;
writeControl   adjustableRunTime;
writeInterval  0.1;
Step 3: Generate the mesh:
Code:
blockMesh . damBreak
Step 4: Reset gamma:
Code:
setFields . damBreak
Step 5: Run it:
Code:
interFoam . damBreak
I am running OpenFOAM-1.4 on an AMD Opteron(tm) Processor 285, 2.6 GHz, 8 GB RAM, SLES 10. Thanks, Huiyu
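One way to sanity-check that the modified case matches the benchmark configuration is to verify the cell count implied by the blocks section above; the checkMesh utility (here assuming the same root/case syntax as the steps above) reports it directly:
Code:
checkMesh . damBreak
# expected total from the blocks above:
# 46*16 + 38*16 + 46*84 + 8*84 + 38*84 = 9072 cells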
August 21, 2007, 01:59
#9
Senior Member
Jens Klostermann
Join Date: Mar 2009
Posts: 117
Rep Power: 17
Hi Huiyu,
1. benchFoam.py is now pyFoamBench.py.
2. Suggested cases: oodles/pitzDaily and interFoam/damBreak should have good parallel efficiency.
Jens
August 21, 2007, 15:54
#10
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Jens,
Thanks for your suggestion. On the benchmark wiki page I saw you submitted a result for an Opti250 with OpenFOAM 1.2 standard. I am wondering whether you have tried the benchmark with OpenFOAM 1.4. My problem is that with OpenFOAM 1.4, compiled with gcc 4.1.0 on a SLES 10 machine (AMD Opteron(tm) Processor 285, 2.6 GHz, 8 GB RAM), the sequential interFoam/damBreak (as in the benchmark v1 configuration) finishes in around 180 s. Your submission is 588.91 s on an Opteron 250, 2.4 GHz, 4 GB RAM system. The big runtime difference cannot be explained by the difference in hardware alone, so I am wondering whether the different OpenFOAM versions contribute to it. However, it is still hard to believe that accounts for all of the remaining difference. Is there a way to verify the result of the benchmark? Huiyu
August 21, 2007, 17:57
#11
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51
Hi Huiyu!
As far as I know, interFoam was rewritten in a major way from 1.3 to 1.4 (a completely new algorithm). Probably this is the cause of the big difference. Bernhard
__________________
Note: I don't use the "Friend" feature on this forum out of principle. Ah, and by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request.
August 21, 2007, 19:00
#12
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Bernhard,
Thanks for the info! I wonder how many solvers were rewritten between 1.2 and 1.4. I just ran pitzDaily with oodles using the benchmark_v1 configuration and got 151.9 s (wall clock), while Jens's submission on an Opti252 dual Opteron with OF 1.2 is 232.47 s. Has anyone run the benchmark suite with OF 1.4? The reason I am so concerned about the runtime is that if it is too short, it won't be a good case for parallel runs. Although I can modify the case to make it run longer, there would then be no data on the benchmark wiki page to compare against. Huiyu
August 22, 2007, 03:42
#13
Senior Member
Jens Klostermann
Join Date: Mar 2009
Posts: 117
Rep Power: 17
Hi Huiyu,
I just started the benchmark_v1 again for the sequential interFoam/damBreak (as in the benchmark v1 configuration) and got 168.6 s on the same machine I mentioned on the wiki. So this is quite a speedup!! I will publish some more results in the wiki later this week. If there is interest in the community, I am willing to share my benchmarking experience! Maybe we should collaborate and form some kind of benchmark group? I think pyFoamBench is a good starting point. Jens
August 22, 2007, 14:48
#14
New Member
Huiyu Feng
Join Date: Mar 2009
Posts: 11
Rep Power: 17
Hi Jens,
Thanks a lot for testing! I think it is a great idea to form a benchmark group! I am very interested in benchmarking the parallel features of OpenFOAM. Reading through some posts in this forum, I found people interested in benchmarking for different reasons, such as InfiniBand vs. GigE comparisons, procurement references, and so on. I appreciate the benchmark wiki page and pyFoamBench, and will post benchmarking results when I finish some tests. The current benchmark suite is a good start, but given the significant speedup from version updates it may now be too small for parallel runs. Huiyu
August 27, 2009, 04:18
#15
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Hi, I hope I am not too late to join the discussion.
It seems there has been no follow-up on this topic since OpenFOAM-1.4. Why?
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China
August 27, 2009, 05:45
#16
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
My personal computer:
Case 9 (please correct me if I am wrong), http://openfoamwiki.net/index.php/Be...ks_standard_v1; that is, I modified the case according to the standardBench_v2.cfg file.
Version: OpenFOAM-1.6 (the precompiled official release)
Case: tutorials/incompressible/pisoFoam/pitzDaily
Application: pisoFoam
MPI: openmpi
OS: SUSE Linux Release 11.2, Gnome 2.26.2
Memory: 3.9 GB
Processors 0-3: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (that is, 4 cores)
Code:
NP   Time(s)   Speedup   Speedup|baseline(880s)
1    105       1          8.381
2     55       1.909     16
4     34       3.088     25.882
Any ideas?
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China
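Reading those numbers as parallel efficiency (speedup divided by the number of cores) makes the trend clearer:
Code:
E(N) = S(N) / N
E(2) = 1.909 / 2 = 95.5 %
E(4) = 3.088 / 4 = 77.2 %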
August 27, 2009, 05:57
#17
Senior Member
Anonymous
Join Date: Mar 2009
Posts: 110
Rep Power: 17
That seems normal to me.
You'll never get a 4x speedup from 4 cores, especially with quad-cores. All four cores are on one processor (or two dual-cores stuck together, as the Kentsfield quad-cores are), and that single processor accesses its RAM via a single front-side bus (FSB). The FSB carries data between the CPU cache and the RAM itself. If four separate cores are all trying to access the RAM, you will be limited by the FSB speed (I'm not sure what the baseline value is). If you had a dual-CPU motherboard with two dual-cores and the same core-to-RAM ratio, you would see improved performance, because the FSB limit has been relaxed. I have the same CPU at home, and one way to improve these numbers (without spending any money) is overclocking, though this can be hazardous if not done properly. You would then be able to increase the FSB speed (while keeping the same CPU speed) and should see an increase in performance.
August 27, 2009, 07:09
#18
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
No, that's my only CPU; you are not going to persuade me to overclock it. I would be burnt out if it got burnt out.
Quote:
And I want that! Super-linear speed-up of a parallel multigrid Navier-Stokes solver on Flosolver.
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China
August 30, 2009, 23:06
#19
Senior Member
Daniel WEI (老魏)
Join Date: Mar 2009
Location: Beijing, China
Posts: 689
Blog Entries: 9
Rep Power: 21
Has anyone experienced very, very good scalability with OpenFOAM, like superlinear speedup?
What is the best scalability report for OpenFOAM so far? According to Amdahl's law, the maximum speedup is restricted by the fraction of the code that cannot be parallelized, so I would very much like to know what this fraction is for OpenFOAM. Thank you!
__________________
~ Daniel WEI ------------- Boeing Research & Technology - China Beijing, China
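For reference, Amdahl's law: if a fraction s of the work is inherently serial, the speedup on N processors is bounded:
Code:
S(N) = 1 / (s + (1 - s) / N)   -->   S(N) <= 1 / s for any N
# e.g. s = 0.01 (1 % serial work) caps the achievable speedup at 100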
September 22, 2009, 05:13
#20
Member
Carsten Thorenz
Join Date: Mar 2009
Location: Germany
Posts: 34
Rep Power: 17
Hi lakeat!
Speedup is not very easy to define. It is not only a property of the program code (as you assume) but also of the machine you're using and of the test case. Amdahl's law is not really applicable here: OpenFOAM is based on the idea of domain partitioning and should not suffer from Amdahl's law, or only to a negligible extent.

For single-core clusters, scaling is mainly limited by the speed of the network between the nodes (latency being the culprit). The smaller the partitioned domains, the worse the impact. For multicore systems, scaling is additionally hindered by the transfer of data from the cores to main memory: multiple cores on a single CPU all have to share a bus to the memory, and this can hurt execution speed badly.

Furthermore, the size of the problem is relevant. Small meshes only run well on small numbers of CPUs, bigger meshes on larger numbers. There is usually a "sweet spot" (cells/core) where a code performs best, and it depends on the machine you're using (interconnects, cache sizes, cores per CPU, bus system, ...). For our machine (an HP Xeon cluster with Gigabit Ethernet), the code performs best at about 50000 cells/core. Setting 4 cores as the reference (speedup 4), we see a superlinear speedup of 42 on 32 cores for a test case with 1.6 million cells. This is mainly due to cache effects, i.e. a lot of the data fits into the CPU caches at this core count. For even larger numbers of cores the speedup is poorer (56 for 64 cores, 55!!! for 128 cores). This is due to the poor interconnect.

Hope this helps. Bye, Carsten Thorenz
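Carsten's figures line up with his stated sweet spot:
Code:
1,600,000 cells /  32 cores = 50,000 cells/core  -> best speedup (42, superlinear)
1,600,000 cells / 128 cores = 12,500 cells/core  -> scaling stalls (55)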