January 31, 2008, 12:31
speedup questions
#1
Guest
Posts: n/a
Hi, all,
It seems that when I change from serial to parallel using 4 partitions of MPICH2 Local Parallel for Windows, the computational time increases (it almost doubles!), which is the opposite of what I expected. I am using CFX 11.

My machine: Fujitsu Siemens Celsius V830, AMD64 2.61 GHz, 2x dual core (4 processors total), 8 GB memory, Windows XP x64 Edition.
The mesh: 581,665 nodes, 2.7M elements (mostly tets).

I tried several cases with different meshes and different partition numbers, and I never saw a speedup; serial mode is always faster. Did I miss something in the solver setup or other software setup? Thank you for your comments.

January 31, 2008, 13:46
Re: speedup questions
#2
Guest
Posts: n/a
Hi Tony,
That sounds unusual. You certainly have a large enough problem to get reasonable scaling, and with 8 GB of RAM you have enough memory. Could you answer the following:
1. How many iterations are you running?
2. How long is the partitioning taking vs. the iterations? (compare the time reported just before the iterations start with the time after)
3. How does the CPU time compare to the wall clock time?
4. How does the performance compare for 2 and 3 partitions?
5. Do you have a lot of domain interfaces (GGIs)?
6. Do other cases display similar behavior?
-CycLone

January 31, 2008, 15:49
Re: speedup questions
#3
Guest
Posts: n/a
Hi, CycLone,
Thank you for your reply.
1. I haven't done any real comparison runs on a clean machine like most previous posts did. I noticed that the CPU time for one iteration stays almost the same (maybe I am wrong here), so I just compared the CPU seconds for one iteration on the same def file with different partition counts.
2. The summed CPU time for mesh partitioning is 19 s; the CPU time for each iteration is about 500 s.
3. For this run I stopped the solver at the 17th iteration, and the wall clock time was 4 minutes longer than the CPU time (it wrote 4 *_full.bak files).
4. I will check this later. I tried once before and didn't see any advantage from parallel computing.
5. Yes, I do have a lot of GGI interfaces, and one of the interfaces has a lot of 2D regions. Is this the reason?
6. I haven't tried other cases without interfaces.
Thanks.

January 31, 2008, 17:24
Re: speedup questions
#4
Guest
Posts: n/a
Hi,
Lots of GGI interfaces will reduce parallel efficiency. If you have lots of small domains, those will also parallelise poorly. Writing results files parallelises poorly as well, so if you stop writing results files it should improve.

Try the benchmark.def file located in the examples directory of CFX. Run it serial and parallel; you should get a reasonable speedup with that model. That is a good test of whether the issue is the model or your computer setup.
Glenn Horrocks

February 1, 2008, 11:58
Re: speedup questions
#5
Guest
Posts: n/a
Hi, Glenn,
Yes, that benchmark gives me a reasonable speedup:

partitions: speedup
4: 2.58
3: 2.32
2: 1.78

I think I have to minimize or stop writing backup files. Thanks.

February 3, 2008, 18:26
Re: speedup questions
#6
Guest
Posts: n/a
Hi,
It looks like the backup files are the cause then. It sounds like the simulation is spending more time writing backup files than solving, so you will get a big speedup by writing fewer backup files.
Glenn Horrocks
|