September 29, 2010, 11:50 |
Simulation speed increase
|
#1 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Hi all,
I'm interested in the effect of the Memory Allocation Factor in the Solver Manager (Define Run, Advanced tab). Does setting a higher value speed up the simulation? My other question is about the number of cores. I'm using an HP workstation with a 4-core Intel Xeon and 16 GB of RAM. My simulation is a steady-state turbomachinery case with 2.5 million elements. I have found that running on 2 cores is faster than running on 4. Is there a rule of thumb for choosing the number of cores correctly, or should I benchmark every case? Thanks in advance, Attila
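Since the answers below suggest the result is machine-dependent, the quickest route is a small benchmark loop. A minimal sketch in Python, assuming the standard cfx5solve launcher; the definition-file name is a placeholder, and the parallel flags should be checked against your CFX version with cfx5solve -help:
Code:
import subprocess
import time

DEF_FILE = "turbo_case.def"  # placeholder: your CFX definition file

# Run the same short, fixed-iteration case at each core count and
# compare wall-clock time; the iteration limit is set in the .def file.
for cores in (1, 2, 4):
    cmd = ["cfx5solve", "-def", DEF_FILE]
    if cores > 1:
        # typical flags for a local parallel run; verify on your install
        cmd += ["-par-local", "-partition", str(cores)]
    t0 = time.time()
    subprocess.run(cmd, check=True)
    print(f"{cores} core(s): {time.time() - t0:.0f} s wall time")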
|
September 29, 2010, 14:31 |
|
#2 |
New Member
Chrome
Join Date: Sep 2010
Location: WI, USA
Posts: 5
Rep Power: 16 |
Attila,
What Xeon processor are you using? I use a Xeon 5650: 6 cores, with 24 GB of RAM. I have noticed the same on my machine: there is hardly any speedup from 4 to 6 cores. That has been the case for most of my simulations, ranging from 800k to 2 million cells. Most of the time the speed is bottlenecked by bus speed, so even with a good processor and fast RAM, adding extra cores does not help, because the memory bus cannot keep up with the I/O.
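A back-of-the-envelope way to see this bottleneck; the traffic-per-cell and bandwidth figures below are illustrative assumptions, not measurements. Once the memory system's time per iteration exceeds the per-core compute time, extra cores on the same socket stop helping:
Code:
# All figures are illustrative assumptions, not measured values.
cells = 2.5e6            # mesh size (Attila's case)
bytes_per_cell = 2.0e3   # assumed memory traffic per cell per iteration
bandwidth = 10.0e9       # assumed usable memory bandwidth, bytes/s

traffic = cells * bytes_per_cell   # bytes moved each iteration
floor = traffic / bandwidth        # shared by all cores on the socket
print(f"bandwidth-bound floor: {floor:.2f} s per iteration")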
|
September 29, 2010, 16:43 |
|
#3 |
Senior Member
Join Date: Mar 2009
Location: Europe
Posts: 169
Rep Power: 17 |
1. Speed-up from the memory allocation factor: support told me some time ago that it may give slightly better performance.
2. Much bigger impact: the multiple-core issue. The cores share the bandwidth to memory, and depending on the machine this may limit scalability. It can be much faster to use two cores on two machines than four cores on one machine. |
|
September 29, 2010, 16:45 |
|
#4 |
Senior Member
Join Date: Mar 2009
Location: Europe
Posts: 169
Rep Power: 17 |
Oops, point two was already answered. Sorry.
|
|
September 30, 2010, 06:25 |
|
#5 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Quote:
I'm using an X5450, 3 GHz, 4 cores. Thanks for your reply! |
|
September 30, 2010, 06:26 |
|
#6 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Quote:
|
|
October 7, 2010, 07:51 |
|
#7 |
New Member
Sharad Gupta
Join Date: Oct 2010
Posts: 3
Rep Power: 16 |
Quote:
What you are saying is true, but when bus speed is not your limiting factor, there is another dimension to it. I use an 8-core, 32 GB RAM machine and give the solver 6 cores, not all 8. The CPU needs some "free" cores to help write the result files to the hard drive; if you allocate every core to the solver, it has to pause each time it writes a result file or updates the monitor-point data. This slows the solver down.

Also, I have found that in parallel runs you should always use an even number of cores. For some odd reason (I am not sure why), an even number of cores works faster than an odd number: a solver running on 6 cores of an 8-core machine is faster than on 5, or even 7, cores. Meanwhile, using 6 cores on an 8-core machine is faster than using 2 or 4. So as a rule of thumb I use the highest even number of cores below the machine's maximum. I hope this helps... Good luck.
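Sharad's heuristic is easy to state as a small hypothetical helper; the function name is ours, not from any library:
Code:
def solver_cores(total_cores: int) -> int:
    """Largest even core count strictly below the machine's total,
    keeping spare capacity free for result-file I/O."""
    n = total_cores - 1                       # leave at least one core free
    return max(n if n % 2 == 0 else n - 1, 1)

print(solver_cores(8))  # -> 6
print(solver_cores(4))  # -> 2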
|
October 7, 2010, 08:08 |
|
#8 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
--------------------
Last edited by Attesz; October 7, 2010 at 10:49. Reason: mistake |
|
October 7, 2010, 08:36 |
|
#9 |
New Member
Sharad Gupta
Join Date: Oct 2010
Posts: 3
Rep Power: 16 |
Hi Attila,
I understand what you mean when you run multiple processes, and your data is definitely helpful, thanks! When the mesh is below 1-2 million elements and I have multiple cases to run, I tend to use 2 cores per run; that way I use 6 cores in total for 3 different load cases. This works well provided you have enough RAM and hard disk space.

However, I did not quite understand your other data. You mention that every iteration took more time (almost double) as you doubled the number of cores. For example, going from 1 core to 4 cores, a single iteration took 5.88 min on 1 core and 18 min on 4 cores?! Is the data correct, or is it because you have a moving mesh? I think moving mesh and parallelization don't gel very well.
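Running several small cases side by side, as described above, can be scripted in the same spirit as the earlier benchmark sketch; the file names are placeholders and the cfx5solve flags are assumptions to verify on your install:
Code:
import subprocess

# Three independent 2-core runs in parallel, one per load case.
defs = ["load_case_1.def", "load_case_2.def", "load_case_3.def"]
procs = [
    subprocess.Popen(["cfx5solve", "-def", d, "-par-local", "-partition", "2"])
    for d in defs
]
for p in procs:
    p.wait()  # block until every load case has finished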
|
October 7, 2010, 09:20 |
|
#10 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Quote:
On your machine, does the simulation speed up when you use more cores on a 1-2 million element case? Regards, Attesz |
|
October 7, 2010, 09:35 |
|
#11 |
New Member
Sharad Gupta
Join Date: Oct 2010
Posts: 3
Rep Power: 16 |
Yes. In my experience with 1-2 million elements, the simulation runs faster on multiple cores than on a single core, as long as all the cores belong to the same CPU. The slow-communication problem appears when I use multiple machines in a cluster connected by LAN cables; then the communication between machines becomes the limiting factor. I have not yet seen slow communication when using multiple cores of the same CPU.
|
|
October 7, 2010, 09:40 |
|
#12 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Hmm, that's interesting. The communication slowdown over LAN cables is understandable, of course.
Maybe in my case it really is caused by the rotating frame. Anyway, it was useful to discuss this. Regards, Attesz |
|
October 7, 2010, 09:50 |
|
#13 |
New Member
Alexander U.
Join Date: Aug 2010
Posts: 7
Rep Power: 16 |
Just to mention: if you compare the CPU time between two separate iterations, you have to divide the time difference by the number of cores used! To measure the speed-up/time saving properly, it is better to run the simulation for a fixed number of iterations and compare the duration of the whole run.
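In other words (and as confirmed later in the thread, CFX's CPU SECONDS counter accumulates over all cores), the conversion is just two divisions; a small sketch:
Code:
def wall_seconds_per_iteration(cpu_start: float, cpu_end: float,
                               iterations: int, cores: int) -> float:
    """CPU SECONDS accumulates over all cores, so divide the delta by
    both the number of iterations and the number of cores."""
    return (cpu_end - cpu_start) / iterations / cores

# e.g. the 4-core log posted below, iterations 762 -> 763:
print(wall_seconds_per_iteration(1.085e5, 1.096e5, 1, 4))  # -> 275.0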
|
|
October 7, 2010, 09:56 |
|
#14 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Alexander,
thanks for the note, I did almost the same. The numbers I shared before are the time needed for one iteration, computed from the 99th and 100th iterations of a single run. |
|
October 7, 2010, 10:04 |
|
#15 |
New Member
Alexander U.
Join Date: Aug 2010
Posts: 7
Rep Power: 16 |
Well, then I don't understand why the simulation on two cores is slower than on a single core of a 4-core machine. In my experience a parallel run (2 CPUs/cores) is always faster!
|
|
October 7, 2010, 10:06 |
|
#16 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
I don't understand it either.
|
|
October 7, 2010, 10:10 |
|
#17 |
New Member
Alexander U.
Join Date: Aug 2010
Posts: 7
Rep Power: 16 |
Can you please post the header with the CPU time (CPU SECONDS = ...) of the two iterations you are comparing, for every core count you have tried? I have to see it with my own eyes.
|
|
October 7, 2010, 10:19 |
|
#18 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
One process on 4 cores:

======================================================================
 OUTER LOOP ITERATION =  762 ( 99)  CPU SECONDS = 2.366E+05 (1.085E+05)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.36 | 8.0E-04 | 5.1E-01 |      1.2E-02  OK|
| V-Mom                | 1.19 | 2.6E-04 | 1.6E-01 |      1.5E-02  OK|
| W-Mom                | 1.26 | 2.1E-04 | 1.4E-01 |      1.7E-02  OK|
| P-Mass               | 1.70 | 1.4E-05 | 1.0E-02 |  9.1 3.5E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.74 | 1.2E-04 | 6.0E-02 |  6.1 5.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 1.13 | 2.6E-04 | 1.6E-01 |  6.0 1.8E-02  OK|
| O-TurbFreq           | 1.22 | 2.2E-05 | 7.9E-03 | 12.5 4.6E-05  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
 OUTER LOOP ITERATION =  763 (100)  CPU SECONDS = 2.377E+05 (1.096E+05)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 0.70 | 5.6E-04 | 4.5E-01 |      2.0E-02  OK|
| V-Mom                | 0.83 | 2.2E-04 | 1.3E-01 |      1.6E-02  OK|
| W-Mom                | 0.84 | 1.7E-04 | 9.7E-02 |      1.8E-02  OK|
| P-Mass               | 0.60 | 8.6E-06 | 5.3E-03 |  9.1 3.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 1.30 | 1.5E-04 | 8.6E-02 |  6.1 5.5E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.83 | 2.2E-04 | 1.2E-01 |  6.0 1.9E-02  OK|
| O-TurbFreq           | 0.84 | 1.9E-05 | 7.4E-03 | 12.5 4.7E-05  OK|
+----------------------+------+---------+---------+------------------+

One process on 1 core (I've only just started this run, so these are the first iterations):

======================================================================
 OUTER LOOP ITERATION =  930 (  1)  CPU SECONDS = 3.780E+05 (4.566E+01)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 7.31 | 3.6E-03 | 3.8E-01 |      1.5E-01  ok|
| V-Mom                |14.93 | 3.6E-03 | 1.3E-01 |      1.3E-01  ok|
| W-Mom                |99.99 | 2.2E-02 | 1.3E+00 |      1.5E-02  OK|
| P-Mass               |99.99 | 1.6E-03 | 1.2E-01 |  4.9 5.8E-02  OK|
+----------------------+------+---------+---------+------------------+
+--------------------------------------------------------------------+
| ****** Notice ******                                               |
| A wall has been placed at portion(s) of an OUTLET                  |
| boundary condition (at 20.0% of the faces, 24.6% of the area)      |
| to prevent fluid from flowing into the domain.                     |
| The boundary condition name is: Outlet.                            |
| The fluid name is: Air Ideal Gas.                                  |
| If this situation persists, consider switching                     |
| to an Opening type boundary condition instead.                     |
+--------------------------------------------------------------------+
| H-Energy             |58.25 | 6.0E-03 | 1.5E-01 |  5.8 7.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             |26.76 | 5.4E-03 | 1.9E-01 |  5.8 3.8E-02  OK|
| O-TurbFreq           |99.99 | 3.0E-03 | 7.6E-02 | 12.3 1.6E-04  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
 OUTER LOOP ITERATION =  931 (  2)  CPU SECONDS = 3.785E+05 (4.598E+02)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+

And one process on 2 cores:

======================================================================
 OUTER LOOP ITERATION = 1174 ( 57)  CPU SECONDS = 6.893E+05 (3.785E+04)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.00 | 1.4E-04 | 2.5E-02 |      2.2E-02  OK|
| V-Mom                | 0.98 | 1.7E-04 | 7.0E-02 |      1.9E-02  OK|
| W-Mom                | 0.98 | 9.0E-05 | 2.3E-02 |      2.2E-02  OK|
| P-Mass               | 0.99 | 3.2E-06 | 4.6E-04 |  9.0 3.6E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.95 | 5.6E-05 | 2.3E-02 |  5.9 5.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.97 | 5.1E-05 | 4.6E-03 |  5.8 1.9E-02  OK|
| O-TurbFreq           | 1.06 | 2.2E-05 | 3.3E-03 | 12.4 1.4E-04  OK|
+----------------------+------+---------+---------+------------------+
======================================================================
 OUTER LOOP ITERATION = 1175 ( 58)  CPU SECONDS = 6.900E+05 (3.854E+04)
----------------------------------------------------------------------
|       Equation       | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+------------------+
| U-Mom                | 1.06 | 1.5E-04 | 3.0E-02 |      2.2E-02  OK|
| V-Mom                | 0.88 | 1.5E-04 | 3.8E-02 |      2.4E-02  OK|
| W-Mom                | 1.09 | 9.8E-05 | 3.3E-02 |      2.2E-02  OK|
| P-Mass               | 0.96 | 3.1E-06 | 3.2E-04 |  9.0 3.7E-02  OK|
+----------------------+------+---------+---------+------------------+
| H-Energy             | 0.92 | 5.2E-05 | 1.4E-02 |  5.9 5.9E-02  OK|
+----------------------+------+---------+---------+------------------+
| K-TurbKE             | 0.97 | 5.0E-05 | 4.5E-03 |  5.8 2.0E-02  OK|
| O-TurbFreq           | 0.80 | 1.8E-05 | 2.5E-03 | 12.4 1.2E-04  OK|
+----------------------+------+---------+---------+------------------+
|
October 7, 2010, 10:35 |
|
#19 |
New Member
Alexander U.
Join Date: Aug 2010
Posts: 7
Rep Power: 16 |
So if you are showing us the same case, my calculator and I get the following results:

cores; simulation time per iteration (s)
1; 414
2; 345
4; 275

Well, it's a small speed-up as the core count increases. Not great, but it is there!
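For anyone retracing the arithmetic, the per-iteration times follow directly from the CPU SECONDS deltas in the logs above, divided by the core count; a quick check in Python:
Code:
# CPU SECONDS deltas copied from the logs in post #18 (one iteration each).
runs = {
    1: (4.566e1, 4.598e2),   # outer loop iterations 930 -> 931
    2: (3.785e4, 3.854e4),   # outer loop iterations 1174 -> 1175
    4: (1.085e5, 1.096e5),   # outer loop iterations 762 -> 763
}
for cores, (t0, t1) in sorted(runs.items()):
    print(f"{cores} core(s): {(t1 - t0) / cores:.0f} s per iteration")
# prints 414, 345 and 275 s, matching the table above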
|
October 7, 2010, 10:48 |
|
#20 |
Senior Member
Attesz
Join Date: Mar 2009
Location: Munich
Posts: 368
Rep Power: 17 |
Oh, I see now: the CPU time is the accumulated time across all the cores. I didn't know that, thanks. Then my earlier values are meaningless.
|
|
Tags |
cores, memory allocation factor |
|
|