CFD Online (www.cfd-online.com), ANSYS CFX forum

Simulation speed increase

#1 Attesz, September 29, 2010, 11:50
Hi all,

I'm interested in the effect of the Memory Allocation Factor in Solver Manager (Define Run, Advanced tabs). Does setting a higher value speed up the simulation?

My other question is about the number of cores. I'm using an HP workstation with a 4-core Intel Xeon and 16 GB of RAM. My simulation is a steady-state turbomachinery case with 2.5 million elements. I've found that running on 2 cores is faster than running on 4 cores. Is there a rule of thumb for choosing the number of cores, or should I benchmark every case?

Thanks in advance,
Attila
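Benchmarking each case usually comes down to timing the same fixed-length run at several core counts. As a minimal sketch, assuming a hypothetical Python callable `run_case(cores)` that launches the solver for a fixed number of iterations and returns when it finishes, the harness below records wall-clock time per core count:

```python
import time
from typing import Callable, Dict, Iterable


def benchmark(run_case: Callable[[int], None],
              core_counts: Iterable[int]) -> Dict[int, float]:
    """Return wall-clock seconds for one fixed-length run at each core count.

    run_case is a hypothetical callable: it should start the solver on
    `cores` cores for a fixed number of iterations and block until done.
    """
    timings: Dict[int, float] = {}
    for cores in core_counts:
        start = time.perf_counter()  # wall clock, not per-process CPU time
        run_case(cores)
        timings[cores] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in workload for illustration only; replace with a real solver launch.
    def fake_run(cores: int) -> None:
        time.sleep(0.01 / cores)

    for cores, seconds in benchmark(fake_run, [1, 2, 4]).items():
        print(f"{cores} cores: {seconds:.3f} s")
```

Wall-clock time is used deliberately, since that is what the user waits for, regardless of how much CPU time the cores accumulate between them.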

#2 TX_Air, September 29, 2010, 14:31
Attila,

What Xeon processor are you using? I use a Xeon 5650, a 6-core chip, with 24 GB of RAM. I have noticed the same on my machine: there is no marginal speedup from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells.

Most of the time, the speed (or speedup) is bottlenecked by the bus speed. So even if you have good RAM and a good processor, adding extra cores does not help, simply because the bus cannot handle the I/O.
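The bus-bandwidth argument can be made concrete with a deliberately simple, roofline-style toy model: the arithmetic in each iteration splits across cores, but the memory traffic goes through one bus shared by all cores, so whichever takes longer sets the iteration time. All numbers below are hypothetical placeholders, not measurements of a Xeon 5650 or of CFX:

```python
def iteration_time(cores: int, flops: float, bytes_moved: float,
                   flops_per_core: float, bandwidth: float) -> float:
    """Toy estimate of seconds per iteration on one shared-memory machine.

    Compute time shrinks with more cores; memory time does not, because
    every core draws on the same memory bandwidth. The slower one wins.
    """
    compute_time = flops / (cores * flops_per_core)
    memory_time = bytes_moved / bandwidth  # independent of core count
    return max(compute_time, memory_time)


# Hypothetical workload: 40 GFLOP and 25 GB of memory traffic per iteration,
# on cores doing 4 GFLOP/s each, sharing a 10 GB/s memory bus.
for n in (1, 2, 4, 6):
    t = iteration_time(n, flops=40e9, bytes_moved=25e9,
                       flops_per_core=4e9, bandwidth=10e9)
    print(f"{n} cores: {t:.1f} s/iteration")
```

In this toy setting the time drops from 10 s to 5 s to 2.5 s up to 4 cores and then stays at 2.5 s on 6 cores: once the bus is saturated, extra cores only wait for memory.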

#3 joey2007, September 29, 2010, 16:43
1. Speed-up from the memory allocation factor: Support told me some time ago that it may give slightly better performance.

2. The multiple-core issue has much more impact. The cores share the bandwidth to memory, and depending on the machine this may limit scalability. It may even be much faster to use two cores on two machines than four cores on one machine.
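Reading the suggestion as two cores on each of two machines versus four cores on one machine, the reasoning is that each machine brings its own memory bus, at the price of some network communication. A toy sketch, with every parameter hypothetical and a fixed `comm_time` standing in crudely for the network exchange:

```python
def layout_time(nodes: int, cores_per_node: int, flops: float,
                bytes_moved: float, flops_per_core: float,
                bandwidth_per_node: float, comm_time: float) -> float:
    """Toy seconds per iteration for `nodes` machines with `cores_per_node` each.

    Work and memory traffic are split evenly across machines, and each
    machine has its own memory bus. A fixed comm_time is added whenever
    more than one machine takes part (hypothetical network cost).
    """
    total_cores = nodes * cores_per_node
    compute_time = flops / (total_cores * flops_per_core)
    memory_time = (bytes_moved / nodes) / bandwidth_per_node
    network_time = comm_time if nodes > 1 else 0.0
    return max(compute_time, memory_time) + network_time


# Hypothetical memory-heavy workload.
params = dict(flops=40e9, bytes_moved=40e9, flops_per_core=4e9,
              bandwidth_per_node=10e9, comm_time=0.3)
print("4 cores on 1 machine :", layout_time(1, 4, **params))
print("2 cores on 2 machines:", layout_time(2, 2, **params))
```

With these numbers the two-machine layout wins because it doubles the available memory bandwidth. Raising `comm_time` above 1.5 s, as on a slow network, makes the single machine win again, which matches the LAN experience described later in this thread.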

#4 joey2007, September 29, 2010, 16:45
Oops, point two was already answered. Sorry.

#5 Attesz, September 30, 2010, 06:25
In reply to TX_Air (#2):
Hello,
I'm using an X5450 (3 GHz, 4 cores). Thanks for your reply!

#6 Attesz, September 30, 2010, 06:26
In reply to joey2007 (#3):
Thanks Joey!

#7 CFD in my blood (Sharad Gupta), October 7, 2010, 07:51
In reply to TX_Air (#2):
Hi,

What you say is true. But when bus speed is not the limiting factor, there is another dimension to it. I use an 8-core machine with 32 GB of RAM, and I give the solver 6 cores, not all 8. Mainly, the machine needs some "free" cores to handle writing the result files to the hard drive. If you allocate all the cores to the solver, it has to pause every time it writes a result file or updates monitor-point data in a file, which slows it down.

Also, I have found that for parallel runs it is best to always use an even number of cores. For some reason (I am not sure why), an even number of cores runs faster than an odd number. For example, a solver running on 6 cores of an 8-core machine is faster than on 5 or even 7 cores.

Meanwhile, using 6 cores on an 8-core machine is faster than using 2 or 4. So, as a rule of thumb, I use the highest even number of cores that is lower than the total number of cores in the system.

I hope this helps.

Good luck.
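The rule of thumb in this post (the highest even core count strictly below the machine's total) is easy to encode. The sketch below is a direct transcription of that personal heuristic, not an ANSYS recommendation; the function name is hypothetical, and the fallback to 1 core for very small machines is an added assumption:

```python
def rule_of_thumb_cores(total_cores: int) -> int:
    """Highest even core count strictly below total_cores.

    Encodes the forum poster's heuristic: leave at least one core free for
    file I/O and prefer even counts. Falls back to 1 core when no even
    count of at least 2 is available (an assumption, not from the post).
    """
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    candidate = total_cores - 1
    if candidate % 2 == 1:
        candidate -= 1
    return candidate if candidate >= 2 else 1


for total in (2, 4, 6, 8, 12):
    print(total, "->", rule_of_thumb_cores(total))
```

For the 8-core machine described above this returns 6. As other posts in this thread show, results vary between machines, so a quick benchmark is still worth doing before relying on it.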

#8 Attesz, October 7, 2010, 08:08
Last edited by Attesz; October 7, 2010 at 10:49. Reason: mistake

#9 CFD in my blood (Sharad Gupta), October 7, 2010, 08:36
Hi Attila,

I understand what you mean about running multiple processes, and your data is definitely helpful. Thanks! When my mesh has fewer than 1-2 million elements and I have multiple cases to run, I tend to use 2 cores for each run. This way I use a total of 6 cores to run 3 different load cases. This works best provided you have enough RAM and hard disk space.
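Whether several small runs side by side beat running the same cases one after another on more cores is a throughput question. A minimal sketch, assuming you know the wall time of one case at each core count and, optimistically, that concurrent runs do not slow each other down (shared memory bandwidth, discussed earlier in this thread, may well break that assumption). The function name and the timings are hypothetical:

```python
import math
from typing import Dict


def batch_time(n_cases: int, cores_per_run: int, total_cores: int,
               time_per_case: Dict[int, float]) -> float:
    """Wall time to finish n_cases when runs of cores_per_run cores execute
    side by side in waves, assuming no interference between runs."""
    concurrent = total_cores // cores_per_run
    if concurrent < 1:
        raise ValueError("cores_per_run exceeds total_cores")
    waves = math.ceil(n_cases / concurrent)
    return waves * time_per_case[cores_per_run]


# Hypothetical single-case wall times (hours) at each core count,
# using 6 cores in total as in the post.
measured = {2: 10.0, 6: 5.0}
print("3 cases, 2 cores each:", batch_time(3, 2, 6, measured), "h")
print("3 cases, 6 cores each:", batch_time(3, 6, 6, measured), "h")
```

With these made-up timings, three concurrent 2-core runs finish in 10 h, while three sequential 6-core runs take 15 h, because parallel efficiency drops as cores are added to a single run.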

However, I did not quite understand your other data. You mentioned that each iteration took more time (almost double) as you doubled the number of cores. For example, going from 1 core to 4 cores, it took 5.88 min on 1 core and 18 min on 4 cores for just 1 iteration?!?! Is the data correct, or is it because you have a moving mesh? I think moving meshes and parallelization don't mix very well.

#10 Attesz, October 7, 2010, 09:20
In reply to CFD in my blood (#9):
Yes, the data is correct: on 4 cores a single iteration takes about 3 times as long. My machine is a workstation that is not used only for computation, so I'm running Windows XP. The simulation doesn't have a moving mesh, just a simple rotating frame. I think the slowdown is caused by communication between the processors (just a guess; I'm not an IT expert). We also have a Linux cluster here, and the phenomenon is the same: when I use 4 cores for one process instead of 2, the simulation time increases slightly.
On your machine, does your simulation speed up when you use more CPUs with 1-2 million elements?

Regards,
Attesz

#11 CFD in my blood (Sharad Gupta), October 7, 2010, 09:35
Yes. In my experience, with 1-2 million elements the simulation runs faster on multiple cores than on a single core. This is when all the cores belong to the same CPU. However, the slow-communication problem occurs when I use multiple CPUs in a cluster connected by LAN cables, where communication between the CPUs becomes the limiting factor. I have not yet seen the slow-communication problem when using multiple cores of the same CPU.

#12 Attesz, October 7, 2010, 09:40
Hmm, that's interesting. The communication slowdown over LAN cables is understandable, of course.
Maybe in my case it really is caused by the rotating frame. Anyway, it was useful to discuss this.

Regards,
Attesz

#13 FoxTwo (Alexander U.), October 7, 2010, 09:50
Just to mention: if you compare the CPU time between two separate iterations, you have to divide the time by the number of cores used! To compare speedup or time savings, it is better to run a simulation with a fixed number of iterations and compare the duration of the whole simulation.
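The correction described in this post can be applied directly to two consecutive iteration headers from the CFX output. A minimal sketch, assuming the header layout shown in the log excerpts posted later in this thread (`CPU SECONDS = a (b)`) and using the parenthesized value, which in those logs appears to be the time for the current run; as this post says, the difference is divided by the number of cores. The regex and function name are hypothetical helpers, not part of CFX:

```python
import re

# Matches e.g. "CPU SECONDS = 2.366E+05 (1.085E+05)" as printed in this thread's logs.
CPU_RE = re.compile(r"CPU SECONDS\s*=\s*([0-9.Ee+-]+)\s*\(\s*([0-9.Ee+-]+)\s*\)")


def wall_seconds_per_iteration(header_a: str, header_b: str, cores: int) -> float:
    """Approximate wall time of one iteration from two consecutive headers.

    The reported CPU time is summed over all cores, so the CPU-time
    difference between the two iterations is divided by the core count.
    """
    def current_run_cpu(line: str) -> float:
        match = CPU_RE.search(line)
        if match is None:
            raise ValueError(f"no CPU SECONDS field in: {line!r}")
        return float(match.group(2))  # the value in parentheses

    return (current_run_cpu(header_b) - current_run_cpu(header_a)) / cores


a = "OUTER LOOP ITERATION = 762 ( 99) CPU SECONDS = 2.366E+05 (1.085E+05)"
b = "OUTER LOOP ITERATION = 763 ( 100) CPU SECONDS = 2.377E+05 (1.096E+05)"
print(round(wall_seconds_per_iteration(a, b, cores=4)))  # prints 275
```

Timing a whole fixed-length run with a wall clock, as suggested above, avoids the question of how the solver accounts CPU time altogether.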

#14 Attesz, October 7, 2010, 09:56
Alexander,
thanks for the note, I did almost the same. The numbers I shared before are the time for one iteration, computed from the 99th and 100th iterations of a single process.

#15 FoxTwo (Alexander U.), October 7, 2010, 10:04
Well, then I don't understand why the simulation on two cores is slower than on only one core of a 4-core machine. In my experience, a parallel run (2 CPUs/cores) is always faster!

#16 Attesz, October 7, 2010, 10:06
I don't understand it either.

#17 FoxTwo (Alexander U.), October 7, 2010, 10:10
Can you please post the header with the CPU time (CPU SECONDS = ...) of the two iterations you are comparing, for all the core counts you have run so far? I have to see it with my own eyes.

#18 Attesz, October 7, 2010, 10:19
One process on 4 cores:


======================================================================
OUTER LOOP ITERATION = 762 ( 99) CPU SECONDS = 2.366E+05 (1.085E+05)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.36 | 8.0E-04 | 5.1E-01 | 1.2E-02 OK|
| V-Mom | 1.19 | 2.6E-04 | 1.6E-01 | 1.5E-02 OK|
| W-Mom | 1.26 | 2.1E-04 | 1.4E-01 | 1.7E-02 OK|
| P-Mass | 1.70 | 1.4E-05 | 1.0E-02 | 9.1 3.5E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.74 | 1.2E-04 | 6.0E-02 | 6.1 5.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 1.13 | 2.6E-04 | 1.6E-01 | 6.0 1.8E-02 OK|
| O-TurbFreq | 1.22 | 2.2E-05 | 7.9E-03 | 12.5 4.6E-05 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 763 ( 100) CPU SECONDS = 2.377E+05 (1.096E+05)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 0.70 | 5.6E-04 | 4.5E-01 | 2.0E-02 OK|
| V-Mom | 0.83 | 2.2E-04 | 1.3E-01 | 1.6E-02 OK|
| W-Mom | 0.84 | 1.7E-04 | 9.7E-02 | 1.8E-02 OK|
| P-Mass | 0.60 | 8.6E-06 | 5.3E-03 | 9.1 3.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 1.30 | 1.5E-04 | 8.6E-02 | 6.1 5.5E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.83 | 2.2E-04 | 1.2E-01 | 6.0 1.9E-02 OK|
| O-TurbFreq | 0.84 | 1.9E-05 | 7.4E-03 | 12.5 4.7E-05 OK|
+----------------------+------+---------+---------+------------------+


One process on 1 core (I've only just started it, so it will be much faster):

======================================================================
OUTER LOOP ITERATION = 930 ( 1) CPU SECONDS = 3.780E+05 (4.566E+01)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 7.31 | 3.6E-03 | 3.8E-01 | 1.5E-01 ok|
| V-Mom |14.93 | 3.6E-03 | 1.3E-01 | 1.3E-01 ok|
| W-Mom |99.99 | 2.2E-02 | 1.3E+00 | 1.5E-02 OK|
| P-Mass |99.99 | 1.6E-03 | 1.2E-01 | 4.9 5.8E-02 OK|
+----------------------+------+---------+---------+------------------+
+--------------------------------------------------------------------+
| ****** Notice ****** |
| A wall has been placed at portion(s) of an OUTLET |
| boundary condition (at 20.0% of the faces, 24.6% of the area) |
| to prevent fluid from flowing into the domain. |
| The boundary condition name is: Outlet. |
| The fluid name is: Air Ideal Gas. |
| If this situation persists, consider switching |
| to an Opening type boundary condition instead. |
+--------------------------------------------------------------------+
| H-Energy |58.25 | 6.0E-03 | 1.5E-01 | 5.8 7.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE |26.76 | 5.4E-03 | 1.9E-01 | 5.8 3.8E-02 OK|
| O-TurbFreq |99.99 | 3.0E-03 | 7.6E-02 | 12.3 1.6E-04 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 931 ( 2) CPU SECONDS = 3.785E+05 (4.598E+02)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+


And one process on 2 cores:
======================================================================
OUTER LOOP ITERATION = 1174 ( 57) CPU SECONDS = 6.893E+05 (3.785E+04)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.00 | 1.4E-04 | 2.5E-02 | 2.2E-02 OK|
| V-Mom | 0.98 | 1.7E-04 | 7.0E-02 | 1.9E-02 OK|
| W-Mom | 0.98 | 9.0E-05 | 2.3E-02 | 2.2E-02 OK|
| P-Mass | 0.99 | 3.2E-06 | 4.6E-04 | 9.0 3.6E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.95 | 5.6E-05 | 2.3E-02 | 5.9 5.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.97 | 5.1E-05 | 4.6E-03 | 5.8 1.9E-02 OK|
| O-TurbFreq | 1.06 | 2.2E-05 | 3.3E-03 | 12.4 1.4E-04 OK|
+----------------------+------+---------+---------+------------------+
======================================================================
OUTER LOOP ITERATION = 1175 ( 58) CPU SECONDS = 6.900E+05 (3.854E+04)
----------------------------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom | 1.06 | 1.5E-04 | 3.0E-02 | 2.2E-02 OK|
| V-Mom | 0.88 | 1.5E-04 | 3.8E-02 | 2.4E-02 OK|
| W-Mom | 1.09 | 9.8E-05 | 3.3E-02 | 2.2E-02 OK|
| P-Mass | 0.96 | 3.1E-06 | 3.2E-04 | 9.0 3.7E-02 OK|
+----------------------+------+---------+---------+------------------+
| H-Energy | 0.92 | 5.2E-05 | 1.4E-02 | 5.9 5.9E-02 OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE | 0.97 | 5.0E-05 | 4.5E-03 | 5.8 2.0E-02 OK|
| O-TurbFreq | 0.80 | 1.8E-05 | 2.5E-03 | 12.4 1.2E-04 OK|
+----------------------+------+---------+---------+------------------+

#19 FoxTwo (Alexander U.), October 7, 2010, 10:35
So if you are showing us the same case, my calculator and I get the following results:

cores; time per iteration (s), i.e. CPU seconds per iteration divided by the number of cores

1; 414
2; 345
4; 275

Well, it's a small speedup from adding cores, not great, but it is there!
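The usual way to summarize a table like this is speedup relative to one core, S(p) = T(1) / T(p), and parallel efficiency, E(p) = S(p) / p. A minimal sketch using the per-iteration times above (the function name is hypothetical):

```python
from typing import Dict, Tuple


def speedup_and_efficiency(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map core count -> (speedup vs. 1 core, parallel efficiency)."""
    base = times[1]
    return {p: (base / t, base / t / p) for p, t in sorted(times.items())}


# Seconds per iteration from the table above.
per_iteration = {1: 414.0, 2: 345.0, 4: 275.0}
for cores, (s, e) in speedup_and_efficiency(per_iteration).items():
    print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.0%}")
```

For these numbers the speedup is 1.2 on 2 cores and about 1.5 on 4 cores, i.e. parallel efficiencies of 60% and roughly 38%, which quantifies the "little speedup" described in this post.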

#20 Attesz, October 7, 2010, 10:48
Oh, I see now. The CPU time is the accumulated time of all the cores; I didn't know that. :S Thanks. Then my values are meaningless.