Concurrent runs slowing each other

#1, October 21, 2008, 13:40
John Deas (johndeas), Senior Member
Hi,

I wanted to launch several runs of one case on a multicore machine at the same time. I hoped that, since each process would be completely independent, the computation time of each run would not affect the others too much. On the contrary, I observed an increase in the calculation time spent on each timestep, as shown in the following chart:

[Chart: time per timestep versus number of concurrent OpenFOAM runs; image not preserved]

(The results are not very precise, since I used a timer to get them quickly, but the trends are represented.) There is a very large increase in computation time, and it seems to occur in steps (e.g. with 3 or 4 cores the results are about the same).
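A stopwatch measurement like this could also be scripted. The sketch below is a minimal, hypothetical harness (it is not what was used for the chart): it starts several copies of a command at once and reports the wall time until the last one finishes. The workload shown is a placeholder Python loop; a real test would substitute the solver command, with each copy working in its own case directory.

```python
import subprocess
import sys
import time


def time_concurrent_runs(command, n_runs):
    """Start n_runs copies of command at once and return the wall time
    in seconds until the last copy has finished."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(n_runs)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the real solver command.
    workload = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    for n in (1, 2, 4):
        elapsed = time_concurrent_runs(workload, n)
        print(n, "concurrent runs:", round(elapsed, 3), "s")
```

If the runs were truly independent, the reported time would stay roughly flat until the number of runs exceeds the number of cores. Note that the placeholder loop is compute-bound, so it is not expected to reproduce a memory-related slowdown.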

To find out whether the problem comes from OpenFOAM, I checked whether another program would give the same results. I chose to launch several runs of Scilab, each computing a basic matrix inversion. The command was: for i=1:10000, inv(rand(1000,1000));

The following results were obtained:

[Chart: run time versus number of concurrent Scilab processes; image not preserved]

As one can see, the performance of each run is relatively unaffected by the others until almost all the cores are occupied by Scilab processes.

I am using a Dell workstation with 2 Intel Xeon E5450 processors, each with 4 cores clocked at 3 GHz. The operating system is RHEL 5.

Is my problem coming from OpenFOAM? Should I adjust some parameters? I suspect that OpenFOAM accesses the RAM much more than Scilab does, and may slow down the memory accesses of the other processes. Has anybody seen the same behaviour with OpenFOAM?

Thank you,

JD

#2, October 21, 2008, 13:42
John Deas (johndeas), Senior Member
Sorry for the image sizes, I didn't check that!

#3, October 21, 2008, 14:16
Mark Olesen (olesen), Senior Member
Location: https://olesenm.github.io/
We have dual-core dual-cpu servers.
With both OpenFOAM and STAR-CD we get better performance if we use a single core from each CPU and split across more machines rather than using all available CPUs and cores.
Apparently the memory bottleneck is much more significant than the additional network traffic caused by splitting across more machines.

With quad-cores, the memory bottleneck will look even worse. I think your best chance is to run each case in parallel and run the cases sequentially.
In this case, you should check how the parallel speedup looks for a single case.
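For reference, parallel speedup and efficiency are usually computed from wall-clock times as in the short sketch below. The function names and the timings in the usage note are illustrative assumptions, not measurements from this thread.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-process wall time to parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the number of processes (1.0 means ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_procs
```

For example, a case taking 100 s on one core and 25 s on 8 cores would have a speedup of 4 and an efficiency of 0.5, which would suggest that half of the cores are effectively wasted.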

#4, October 21, 2008, 14:44
Juergen Neubauer (juergen), New Member
Location: Los Angeles, CA, USA
John and Mark,

just out of curiosity: Which processor architecture do you use? AMD or Intel? Xeon? Opteron? Is NUMA active?

What is your 'uname -a'?

I'm about to build a multiprocessor machine for work with OpenFOAM and I'd like to collect some experience from multi-core users.

Thanks a lot.

Ciao, Juergen

#5, October 21, 2008, 15:28
Srinath Madhavan (a.k.a pUl|) (msrinath80), Senior Member
Location: Edmonton, AB, Canada
If you search the forum, you'll find that many of us have experimented with dual/quad core offerings from AMD and Intel and posted the results here.

#6, October 22, 2008, 03:35
Markus Rehm (markusrehm), Senior Member
Location: Erlangen (Germany)
Hi John,

Dense matrix calculation is computationally very intensive, which means that memory access is small relative to the arithmetic and much of the work can be done in the CPU cache.
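A rough way to quantify this is arithmetic intensity, i.e. floating-point operations per byte moved to or from memory. The sketch below uses common order-of-magnitude estimates and several assumptions: about 2n³ operations for an LU-based inversion, 8-byte values, 4-byte column indices, and a sparse matrix-vector product standing in for the kind of kernel that dominates many CFD linear solvers.

```python
def dense_inverse_intensity(n, bytes_per_value=8):
    """Approximate flops per byte for inverting a dense n x n matrix.

    Assumes about 2*n**3 flops, and that the matrix is read once and
    the inverse written once (a lower bound on memory traffic).
    """
    flops = 2.0 * n ** 3
    bytes_moved = 2 * bytes_per_value * n ** 2
    return flops / bytes_moved


def sparse_matvec_intensity(bytes_per_value=8, bytes_per_index=4):
    """Approximate flops per byte for a sparse matrix-vector product.

    Each stored nonzero costs one multiply and one add, and requires
    loading at least its value and its column index.
    """
    flops_per_nonzero = 2.0
    bytes_per_nonzero = bytes_per_value + bytes_per_index
    return flops_per_nonzero / bytes_per_nonzero
```

Under these assumptions, a 1000×1000 inversion performs on the order of a hundred operations per byte, while the sparse kernel performs well under one, so the latter is far more likely to be limited by memory bandwidth than by the cores.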

By the way: which solver and problem size did you benchmark?

It depends a lot on the case and size. You should benchmark your cases and decide if it is better to fill up your whole cluster or leave half of the processors empty as Mark mentioned.

Regards Markus.

#7, October 29, 2008, 09:43
John Deas (johndeas), Senior Member
First, thank you for your answers, and sorry for not replying as promptly myself.

@Juergen : uname -a gave me: Linux lambda30 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Regarding NUMA being enabled or not, I have no idea.
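One way to check is to look at the NUMA nodes that the Linux kernel exposes under sysfs. The sketch below is a minimal example, assuming the usual /sys/devices/system/node layout; the function name is illustrative. A result of 0 or 1 would suggest that the kernel sees no NUMA topology.

```python
import os
import re


def count_numa_nodes(sysfs_node_dir="/sys/devices/system/node"):
    """Count the nodeN entries exposed by the kernel.

    Returns 0 if the directory is missing or unreadable, which is
    typical when the kernel has no NUMA support.
    """
    try:
        entries = os.listdir(sysfs_node_dir)
    except OSError:
        return 0
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


if __name__ == "__main__":
    print("NUMA nodes seen by the kernel:", count_numa_nodes())
```

Where the numactl package is installed, `numactl --hardware` reports similar information.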

@Mark : "I think your best chance is to run each case in parallel and run the cases sequentially." Doing this, am I not adding an inter-process communication slowdown on top of the existing memory bandwidth slowdown?

@Florian : What processors are you using?

I also have results from another calculation on this machine, performed with Fluent on an 8 million cell case. The case was run in parallel, and the results are reported below, together with the speedups obtained when running several OpenFOAM cases concurrently.

The OpenFOAM case is a plane channel with 300000 calculation points. I am running a modified version of icoFoam which allows me to sustain a constant pressure gradient and compute some statistics on the fly (like what channelOodles did before using controlDict functions). The Fluent case is an 8 million cell rectangular domain with a velocity inlet, a pressure outlet, and symmetries on the sides.

[Chart: speedups for the parallel Fluent case and the concurrent OpenFOAM runs; image not preserved]

One astonishing result is that Fluent run in parallel scales better than individual small OpenFOAM cases run simultaneously. As I said, the Scilab results were obtained by inverting 1000*1000 matrices. Since cat /proc/cpuinfo gave me a cache size of 6000 KB, I suspect the inversion is done in cache, which would explain the good speedup.
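That guess can be checked with a quick size estimate. The sketch below assumes 8-byte double-precision values and treats the reported cache size as kilobytes of 1024 bytes; the helper names are illustrative.

```python
def matrix_bytes(rows, cols, bytes_per_value=8):
    """Memory needed to store a dense rows x cols matrix."""
    return rows * cols * bytes_per_value


def fits_in_cache(n_bytes, cache_kb):
    """True if n_bytes fits in a cache of cache_kb kilobytes (1 KB = 1024 bytes)."""
    return n_bytes <= cache_kb * 1024
```

Under these assumptions a single 1000*1000 matrix takes 8,000,000 bytes, somewhat more than the 6000 KB reported, so the whole matrix would not sit in cache at once. The good scaling may therefore owe more to the high ratio of arithmetic to memory traffic in dense linear algebra, for instance through cache blocking in the underlying routines, than to the data fitting entirely in cache.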

Based on what I have read on this forum, memory bandwidth is currently the most limiting factor, because the datasets used in CFD are often large. Multi-core CPUs are therefore not well suited to these tasks when their cores share a single front side bus to access the memory. The Opteron architecture, however, is not subject to this limitation, as the memory controller is on the processor die and gives the processor a direct connection to memory.
As my laboratory is willing to invest in a cluster, I guess a good base could be a bunch of multicore Opterons connected using InfiniBand.

I don't understand, then, why so many clusters seem to use Xeon machines. Do they have some kind of improvement that makes their shared bandwidth efficient anyway?

#8, November 28, 2008, 14:54
John Deas (johndeas), Senior Member
I also posted this on CFD Online. Not everybody may follow posts there and, as I mainly do calculations with OpenFOAM, I value comments from this forum more.

CPUs have become much faster relative to memory access, which puts a lot of pressure on the FSB design, created long ago when memory access was comparatively faster. The CPU cache was developed to limit accesses to memory but, because of the large amount of data needed to solve the Navier-Stokes equations on a large domain, the cache becomes less effective, as it has to be refilled frequently with data from the RAM. HyperTransport from AMD is a solution, as it removes the FSB and allows the Opteron CPU to be connected directly to memory. However, recent Xeons provide several FSBs (one per core), which also circumvents the problem of saturating a single FSB and might explain why Xeons equip a large percentage of clusters despite their use of "old" FSB technology.
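The saturation argument can be illustrated with a deliberately simplified model: each run needs a fixed compute time and a fixed amount of memory traffic per timestep, the memory path is shared evenly between the runs, and compute overlaps fully with memory access. All names and numbers below are hypothetical and are not measurements from this thread.

```python
def step_time(n_runs, compute_s, bytes_per_step, bus_bandwidth_bytes_per_s):
    """Estimated time per timestep for each of n_runs concurrent runs.

    Every run receives an equal share of the bus bandwidth; a step takes
    as long as the slower of its compute part and its memory part.
    """
    memory_s = bytes_per_step * n_runs / bus_bandwidth_bytes_per_s
    return max(compute_s, memory_s)
```

With, say, 1 s of compute and 1 GB of traffic per step on a 4 GB/s bus, the model gives 1 s per step for up to 4 concurrent runs and 2 s per step for 8. Per-run time stays flat until the shared bus saturates and then grows in proportion to the number of runs.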

What is your take on this?

#9, January 15, 2009, 04:27
florian (floooo), Member
Location: Mannheim - Vincennes - Valenciennes, Deutschland - France
John Deas -> I use Dell workstations with 2 Intel Xeon quad-core processors and 64 GB of memory; the machines are connected together with a Gbit network.
My speedup is nearly linear.
I can't make a nice plot because I am never alone on the machines.

Here are the results obtained by ENEA (national agency for new technology, energy and environment - Italy):
http://www.eneagrid.enea.it/papers_presentations/papers/NapoliEScience2008_09_CRESCO.pdf

With InfiniBand their results are linear.
The study shows that the bandwidth of the interconnect becomes a major criterion when the number of cores increases.
For a Gbit connection the effect of this limitation appears after 20 cores.

Your machines may be connected with only 100 Mbit Ethernet.

#10, January 15, 2009, 04:44
VIJAYAKUMAR R (vijayakumar), New Member
Location: BANGALORE, KARNATAKA, INDIA
I need to run a combustion problem. I used XiFoam as the solver, but some errors occur while it is running. I would like to know which solver is good for combustion problems, and a few guidelines for solving them.

All times are GMT -4.