May 21, 2002, 07:38 |
Parallel speed up
|
#1 |
Guest
Posts: n/a
|
Hi,
Does anyone have experience with a dual-processor computer and CFX-5.5 under Linux? What kind of speed-up is common compared to a single processor? Thanks a lot. Regards, Soren |
|
May 21, 2002, 12:01 |
Re: Parallel speed up
|
#2 |
Guest
Posts: n/a
|
I seem to remember that you can obtain a fairly linear relationship, assuming you solve a problem large enough to dilute the effects of the partitioning (I saw a CFX presentation), but contact your vendor, since CFX is very likely to have done the comparison.
|
|
May 21, 2002, 13:02 |
Re: Parallel speed up
|
#3 |
Guest
Posts: n/a
|
CFX-5.5 gets speedups of 1.6-1.8 on Linux, depending on the problem size. This is better on high-end workstations, where 1.9-2.1 is typical. The memory and cache architectures on Intel/AMD Linux boxes are just not good enough to get comparable speedups.
Neale |
|
May 22, 2002, 03:14 |
Re: Parallel speed up
|
#4 |
Guest
Posts: n/a
|
Hi
Thanks for the reply. I know that under Windows NT/2000/XP the parallel performance of a dual-processor computer is very poor: the speed-up is about 1.1 to 1.2. That's why I am looking at Linux. Any comments? Regards, Soren |
|
May 22, 2002, 04:41 |
Re: Parallel speed up
|
#5 |
Guest
Posts: n/a
|
Using CFX-5.5 on a Pentium IV with Windows NT, we obtained a speed-up of about 1.8-2.0. But we have only tested it on up to 4 PCs.
Astrid |
|
May 22, 2002, 05:43 |
Re: Parallel speed up
|
#6 |
Guest
Posts: n/a
|
Hi Astrid
Is the computer single or dual processor? Regards, Soren |
|
May 22, 2002, 08:44 |
Re: Parallel speed up
|
#7 |
Guest
Posts: n/a
|
I use TASCflow and CFX-5.5 on a dual-PIII PC. I've noted a speed-up of about 1.4-1.6 in CFX-5 and 1.6-1.8 in TASCflow, depending on the problem size. I have only run local parallel with two partitions.
cfd guy |
|
May 22, 2002, 16:46 |
Re: Parallel speed up
|
#8 |
Guest
Posts: n/a
|
Linux generally seems to do a better job at dynamic process management (i.e., multitasking), so you usually see slightly better speedups there. I've typically seen on the order of 1.4-1.6 on NT, and 1.6-1.8 on Linux, for CFX-5.5.
Neale. |
|
May 22, 2002, 16:49 |
Re: Parallel speed up
|
#9 |
Guest
Posts: n/a
|
Astrid,
Do you mean you ran a 4-process job on 4 PCs and only got a 1.8-2.0 speedup? What problem size were you running? For a 4-process job you would need at least 400,000-600,000 elements to see a decent speedup. Neale |
|
May 23, 2002, 03:11 |
Re: Parallel speed up
|
#10 |
Guest
Posts: n/a
|
Hi
I am curious about these speed-ups. I am running indoor-airflow and HVAC problems with mesh sizes from 400k to 2,000k cells on a Windows NT box with dual P4 processors. The speed-up I am getting is below 1.2. Are you applying something special? Thanks. Regards, Jens |
|
May 23, 2002, 12:22 |
Re: Parallel speed up
|
#11 |
Guest
Posts: n/a
|
Hi Jens,
How much RAM usage do you have? For a 2-million-node problem, I'd be surprised if you were not running into swap space. In that case, you will see the best speedup if you run on multiple systems, at least enough to get the whole problem into RAM and out of swap. Robin |
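A rough sanity check on the swap point (a sketch, using the ~180-200 MB for 120k nodes that Neale quotes below, i.e. about 1.7 kB per node):

$$
M \approx 1.7\ \text{kB/node} \times 2 \times 10^{6}\ \text{nodes} \approx 3.4\ \text{GB},
$$

which is well beyond the 1.2 GB of RAM mentioned below, so the largest of these meshes would indeed be deep into swap on a single machine.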
|
May 23, 2002, 14:43 |
Re: Parallel speed up
|
#12 |
Guest
Posts: n/a
|
Hi
I have benchmarked using an HVAC problem with 600,000 cells. The speed-up was 1.15 on a dual P4 with 1.2 GB of RAM. Any hints? Regards, Jens |
|
May 24, 2002, 11:05 |
Re: Parallel speed up
|
#13 |
Guest
Posts: n/a
|
How were you calculating the speedup? You should use the CFD start and finish times in the output file.
600,000 cells means roughly 120,000 nodes (for a tet grid, I assume), which should only take about 180-200 MB for uvwp-k-eps. So swapping probably isn't an issue. Make sure you do your performance measurements on a "clean" machine, i.e., one that isn't running or doing anything other than the CFD calculation. Neale. |
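For what it's worth, a minimal sketch of the timing arithmetic Neale describes (the timestamp strings and their format here are assumptions; substitute whatever the start and finish lines in your output file actually show):

```python
from datetime import datetime

# Hypothetical start/finish timestamps copied from the solver output file.
# The format string is an assumption; adapt it to your output file.
FMT = "%Y-%m-%d %H:%M:%S"

def wall_clock_seconds(start: str, finish: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    return (datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)).total_seconds()

t_serial = wall_clock_seconds("2002-05-23 09:00:00", "2002-05-23 15:40:00")
t_parallel = wall_clock_seconds("2002-05-23 09:00:00", "2002-05-23 12:30:00")

print(f"speedup = {t_serial / t_parallel:.2f}")  # 1.90 for these example times
```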
|
May 27, 2002, 12:56 |
Architectures Benchmark
|
#14 |
Guest
Posts: n/a
|
Hi Jens,
As this discussion is very interesting, I'd like to propose the following benchmark; it would be very interesting if users could share their speedup figures. I've built a very simple case (a rectangular channel) with approximately 960k cells (hybrid mesh with inflation), and I've run this definition file on a Sun workstation with 4 processors, running Solaris 8. Some data about the case: 3D, turbulent (k-eps), incompressible (air), steady-state flow. Number of cells: almost 948,000.
Run ------ Speedup
Serial -----> 1.00
2 proc. -----> 2.08
3 proc. -----> 3.03
4 proc. -----> 4.02
Why don't you test it on your NT machine? I could send you the journal file so that you could easily obtain the definition file. If anyone else wants the journal file, please feel free to mail me.
PS1: Make sure you're not running any other applications on your machine.
PS2: Rebuilding the journal file on my NT machine, the resulting mesh has 947,916 elements; rebuilding it on my UNIX system, it has 948,161 elements. I believe that is no problem at all for benchmarking purposes.
PS3: I think it's the simplest case you could ever imagine: a simple geometry with no bad angles and no grid interfaces (monoblock). I believe the speedup also depends on some geometric properties.
Kind regards, cfd guy |
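For reading numbers like these, the standard definitions (a worked note, not part of the original post) are speedup and parallel efficiency:

$$
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p},
$$

so the 4-processor run above gives $E_4 = 4.02/4 \approx 1.005$: essentially ideal, and slightly superlinear, which usually points to cache effects once each partition fits better into cache.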
|
May 28, 2002, 14:02 |
Re: Architectures Benchmark
|
#15 |
Guest
Posts: n/a
|
About my previous post: I've tested two coarser grids in comparison with the first one. Here are the results:
Grid 2: 468,500 elements. Speedup: 2 processes = 1.92; 3 processes = 2.74; 4 processes = 3.41.
Grid 3: 109,400 elements. Speedup: 2 processes = 1.49; 3 processes = 2.15; 4 processes = 2.80.
I'm not trying to find the optimal mesh size for this problem, but it seems that, in this case, each processor needs more than about 200k cells to obtain a linear relation between the speedup and the number of processes. cfd guy |
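Applying the efficiency definition from above to the 4-process runs makes the trend explicit (a worked note on the posted numbers):

$$
E_4 = \frac{4.02}{4} \approx 1.00\ (948\text{k cells}), \qquad \frac{3.41}{4} \approx 0.85\ (468.5\text{k}), \qquad \frac{2.80}{4} = 0.70\ (109.4\text{k}),
$$

i.e. efficiency drops as the per-partition work shrinks relative to the fixed communication overhead, consistent with the ~200k-cells-per-process threshold inferred above.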
|
May 30, 2002, 17:38 |
Re: Parallel speed up
|
#16 |
Guest
Posts: n/a
|
Soren,
We used 4 distributed-parallel PCs on a 100BaseT network. Astrid |
|
May 30, 2002, 17:44 |
Re: Parallel speed up
|
#17 |
Guest
Posts: n/a
|
Sorry, I was wrong. I didn't mean to confuse you.
We ran a job with approximately 1.5M elements on 1 standalone PC and on 2 distributed-parallel PCs; the speed-up was then 1.8-2.0. With 4 PCs, the speed-up was about 3.6. Astrid |
|
May 30, 2002, 17:47 |
Re: Architectures Benchmark
|
#18 |
Guest
Posts: n/a
|
cfd guy,
The number of nodes is more relevant to parallel efficiency. Can you post the number of nodes in your mesh rather than elements? Typically, the best efficiency is achieved when the number of nodes per partition is greater than 100k. At less than 20k per partition the trend may reverse, taking longer with added partitions (due to increased communication). Robin |
|
May 31, 2002, 13:26 |
Re: Architectures Benchmark
|
#19 |
Guest
Posts: n/a
|
Actually, it's fine to quote by elements as well; the two are related anyway (roughly 1:1 for hex grids and 5-6:1 for tet/hybrid grids). In fact, the assembly really scales with the number of elements, as the CFX-5 solver uses an element-based assembly.
I'm not surprised by the results, though, as 50,000 vertices per partition translates into roughly 200k elements on a tet/hybrid grid. This is what we see in parallel results as well. Neale. |
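Putting Neale's ratios and Robin's thresholds together, here is a minimal partition-sizing sketch (the ratios and rule-of-thumb thresholds come from the two posts above; everything else, including the example figures, is illustrative only):

```python
# Rough partition-sizing estimate. Elements-to-nodes ratios follow Neale's
# figures (~1:1 for hex, ~5-6:1 for tet/hybrid); the >100k "good" and <20k
# "may slow down" thresholds are Robin's rules of thumb. Real meshes vary.

ELEMENTS_PER_NODE = {"hex": 1.0, "tet": 5.5}  # midpoint of Neale's 5-6:1

def nodes_per_partition(n_elements: int, n_partitions: int, mesh: str = "tet") -> float:
    """Estimate the number of mesh nodes each partition receives."""
    return n_elements / ELEMENTS_PER_NODE[mesh] / n_partitions

# Example: cfd guy's ~948k-cell benchmark mesh.
for n_proc in (2, 3, 4):
    npp = nodes_per_partition(948_000, n_proc)
    verdict = "good" if npp > 100_000 else ("may slow down" if npp < 20_000 else "fair")
    print(f"{n_proc} partitions: ~{npp:,.0f} nodes each -> {verdict}")
```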
|