Setting parallelization in CFX

metaliat93 · January 19, 2016, 13:11

Please help me set up parallel calculation in CFX on a local computer.
I find that the performance gain from adding cores is small.
506549230.png
Y axis: time spent on computation.
X axis: number of cores.
Case: axial compressor rotor.

Opaque · January 19, 2016, 14:30

What kind of problem are you solving? Not all problems scale equally.

How many cores do you have available?

Is HyperThreading enabled?

metaliat93 · January 19, 2016, 15:41

I have 4 physical cores.

metaliat93 · January 19, 2016, 15:42

Quote:

Originally Posted by Opaque

What kind of problem are you solving? Not all problems scale equally.

How many cores do you have available?

Is HyperThreading enabled?

If you want, I can send my project.

ghorrocks · January 19, 2016, 16:59

Small meshes do not parallelise well. Some physical models can have problems in parallel: radiation modelling especially, but some multiphase models slow things down. Also lots of file IO from results files, monitor points or anything else will cause parallel problems.

Finally: The only part of the simulation which runs in parallel is the solver iterations. Partitioning, initial condition interpolation, set up at the start and pack up at the end (including writing the results files) are done in serial. To get a good speed up you will need a simulation where the majority of the time is spent in the solver iterations.
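
The effect of the serial stages can be estimated with Amdahl's law. The following is only a rough Python sketch, not anything CFX provides: the function name and the solver-share values are made up for illustration, and it assumes the solver iterations scale perfectly while everything else stays serial.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the run time scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial part takes the same time; parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical shares of run time spent in the solver iterations.
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):.2f}x" for n in (1, 2, 4)
        )
        print(f"solver share {fraction:.0%} -> {row}")
```

A run that spends only half its time in the solver iterations cannot reach even 2x, however many cores are used.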

metaliat93 · January 20, 2016, 02:17

Quote:

Originally Posted by ghorrocks

Small meshes do not parallelise well. Some physical models can have problems in parallel: radiation modelling especially, but some multiphase models slow things down. Also lots of file IO from results files, monitor points or anything else will cause parallel problems.

Finally: The only part of the simulation which runs in parallel is the solver iterations. Partitioning, initial condition interpolation, set up at the start and pack up at the end (including writing the results files) are done in serial. To get a good speed up you will need a simulation where the majority of the time is spent in the solver iterations.

I think CFX should parallelise better than this. Could you look at what the problem is in my project?
https://drive.google.com/file/d/0B5g...ew?usp=sharing

case number 5(3)

ghorrocks · January 20, 2016, 16:14

I do not have time to look at your project.

If you want us to look at your simulation, post some images of your mesh and your CCL (or output file).

metaliat93 · January 21, 2016, 03:21

Quote:

Originally Posted by ghorrocks

I do not have time to look at your project.

If you want us to look at your simulation, post some images of your mesh and your CCL (or output file).

506685918.png
506685770.png
506685896.png
506685896.png
506685746.png

These pictures show the mesh quality and limits, the solver settings, and the results of a series of runs.
My processor: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz, 3024 MHz
Link to my mesh in TG: https://drive.google.com/file/d/0B5g...ew?usp=sharing
Link to my results: https://drive.google.com/file/d/0B5g...ew?usp=sharing

ghorrocks · January 21, 2016, 17:05

Is this a simple, single phase simulation? No radiation or multiphase?

In the <CFX ROOT>/examples directory there is a file called Benchmark.def. Can you run this in serial and local parallel? Please report the solver wall clock time for the solver iterations (not the total time). This is in the output file, listed after the solver iterations have converged and before the output file lists the results file contents.

I use this benchmark simulation to debug parallel problems.
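
Once the wall clock times are collected, speedup and parallel efficiency follow directly. Here is a small Python sketch of that arithmetic; the function name is made up and the timings in the example are placeholders, not measurements from this thread.

```python
def scaling_table(wall_clock_seconds: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the 1-core run."""
    if 1 not in wall_clock_seconds:
        raise ValueError("a serial (1-core) timing is needed as the baseline")
    serial = wall_clock_seconds[1]
    rows = []
    for cores in sorted(wall_clock_seconds):
        speedup = serial / wall_clock_seconds[cores]
        # Efficiency of 1.0 means perfect linear scaling.
        rows.append((cores, speedup, speedup / cores))
    return rows


if __name__ == "__main__":
    # Hypothetical solver wall clock times in seconds; replace with measured values.
    timings = {1: 400.0, 2: 220.0, 4: 160.0}
    for cores, speedup, efficiency in scaling_table(timings):
        print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Using the solver iteration time rather than the total time keeps the serial set up and results writing out of the comparison.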

metaliat93 · January 30, 2016, 08:22

Quote:

Originally Posted by ghorrocks

Is this a simple, single phase simulation? No radiation or multiphase?

In the <CFX ROOT>/examples directory there is a file called Benchmark.def. Can you run this in serial and local parallel? Please report the solver wall clock time for the solver iterations (not the total time). This is in the output file, listed after the solver iterations have converged and before the output file lists the results file contents.

I use this benchmark simulation to debug parallel problems.

I used local parallel; there was no radiation or multiphase. I have an SSD.

ghorrocks · January 31, 2016, 04:25

OK thanks. Please run the Benchmark.def file I mentioned before in serial and local parallel modes.

metaliat93 · February 5, 2016, 03:45

Quote:

Originally Posted by ghorrocks

OK thanks. Please run the Benchmark.def file I mentioned before in serial and local parallel modes.

Please, can you explain where to get 'Benchmark.def'?

ghorrocks · February 5, 2016, 04:15

See post #9:

Quote:

In the <CFX ROOT>/examples directory

highorder_cfd · February 5, 2016, 05:28

To benefit from HPC the number of elements needs to be large; otherwise the time spent on process communication and I/O will be significant.

Ideally the computation time should fall in proportion to the number of cores (linear speedup); if it does not, you are not exploiting the parallelism effectively.

I hope this helps.

evcelica · February 5, 2016, 17:40

My guess is that you are running out of memory bandwidth. That CPU only has 2 memory channels at 1333 MHz. You scale quite well from 1-2 cores, but lose efficiency quickly after that.
On my computer with 4 memory channels @ 2133 MHz, I do not get too much speedup past 4 cores. 3 cores is actually the sweet spot where it is pretty linear from 1, then efficiency starts to drop off at 4 cores. I've tried 5 cores and got the same performance as 4 in some cases.
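
A crude way to picture this is a cap on speedup set by memory bandwidth. The Python sketch below is a toy model only: it assumes every core needs a fixed bandwidth to run at full speed and that scaling is perfect until the total is used up. The function name and the bandwidth numbers are hypothetical, not measured for either machine.

```python
def bandwidth_limited_speedup(cores: int, total_bandwidth: float, per_core_demand: float) -> float:
    """Speedup capped by how many cores the memory system can feed at full speed."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if total_bandwidth <= 0 or per_core_demand <= 0:
        raise ValueError("bandwidth values must be positive")
    # Number of cores (possibly fractional) that saturates the memory bandwidth.
    saturation_cores = total_bandwidth / per_core_demand
    return min(float(cores), saturation_cores)


if __name__ == "__main__":
    # Hypothetical figures in GB/s, chosen only to show the plateau.
    for n in range(1, 6):
        speedup = bandwidth_limited_speedup(n, total_bandwidth=10.0, per_core_demand=4.0)
        print(f"{n} cores: speedup {speedup:.2f}")
```

In this model, adding cores beyond the saturation point gives no further gain, which is the kind of plateau described above.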
