January 27, 2009, 13:15 |
parallel performance
|
#1 |
Guest
Posts: n/a
|
I am testing CFX-11 for parallel computing on an Intel quad-core machine with 4 GB of RAM. My domain is a pipe with about 10,000 elements in the cross section and 256 nodes streamwise (about 3 million nodes in total). I have found that the serial run performs very well, fully loading a single processor, while the parallel run is a disaster: the CPU load drops and the wall-clock time per timestep is higher than for the serial run! Could this depend on the domain shape, which is almost topologically cubic, or is there something else? Have I made a mistake? I use PVM and automatic partitioning.
|
|
January 27, 2009, 17:22 |
Re: parallel performance
|
#2 |
Guest
Posts: n/a
|
Hi,
Are you running on Windows? PVM does not run too well on Windows. Try MPICH instead.
Glenn Horrocks
|
January 27, 2009, 18:12 |
Re: parallel performance
|
#3 |
Guest
Posts: n/a
|
Here's one possibility. Each solver process needs some extra memory in addition to the memory for the matrices and variable values on the mesh. I'll call these two types the 'solver memory' and the 'job memory'. So:
a) For 1 CFX process you need (100% job memory + solver memory).
b) For 2 CFX processes, dividing the job 50% per process, you need 2*(50% job memory + solver memory).
c) For 3 CFX processes, dividing the job 33% per process, you need 3*(33% job memory + solver memory), etc.
So your parallel run will have a higher total memory requirement than the single-process version because of the extra copies of the 'solver memory'. Do you have enough memory? Are you hitting the pagefile in the parallel runs (but not in the single-processor case)? Cheers, andy
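A minimal sketch of that arithmetic, using made-up job and solver memory figures (not measured CFX values), just to show how the fixed per-process overhead adds up:

```python
# Hypothetical numbers: a job needing ~3 GB in serial plus an assumed
# ~0.2 GB of fixed 'solver memory' per process.
def total_memory_gb(n_processes, job_memory_gb=3.0, solver_overhead_gb=0.2):
    """Total RAM used when the job is split evenly across n_processes."""
    return n_processes * (job_memory_gb / n_processes + solver_overhead_gb)

for n in (1, 2, 4):
    print(f"{n} process(es): ~{total_memory_gb(n):.1f} GB")
# 1 process(es): ~3.2 GB
# 2 process(es): ~3.4 GB
# 4 process(es): ~3.8 GB
```

The total grows slowly with the process count, so on a 4 GB machine the parallel run can tip into swapping even though the serial run fits.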
|
January 28, 2009, 12:53 |
Re: parallel performance
|
#4 |
Guest
Posts: n/a
|
As Glenn suggested, use MPICH. PVM hangs on Windows, which is probably the greatest source of delay.
The case you are running is only 10k nodes, so it probably won't scale that well. Available memory won't be an issue, but there is additional computational overhead for the solver and some communication overhead. On large models these are amortized over a large number of nodes and aren't noticeable, but you will generally see parallel efficiency drop off as your partitions fall below about 100k nodes each. That said, with MPICH you should see some improvement in run time; I just wouldn't expect 2 processors to be twice as fast (maybe 1.2 to 1.5 times faster). Make your mesh bigger and you'll see better parallel efficiency. -CycLone
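A rough illustration of that trade-off, treating each timestep as per-partition compute plus a fixed per-process overhead (all constants here are invented for illustration, not CFX measurements):

```python
def speedup(total_nodes, n_procs, cost_per_node=1.0, overhead=2_700.0):
    """Serial time divided by parallel time under a fixed-overhead model."""
    serial_time = total_nodes * cost_per_node
    parallel_time = total_nodes / n_procs * cost_per_node + overhead
    return serial_time / parallel_time

for nodes in (10_000, 3_000_000):
    print(f"{nodes:>9} nodes on 2 cores: ~{speedup(nodes, 2):.2f}x")
#    10000 nodes on 2 cores: ~1.30x  (overhead eats much of the gain)
#  3000000 nodes on 2 cores: ~2.00x  (overhead is negligible)
```

The same fixed overhead that barely matters on a large partition dominates a small one, which is why efficiency falls off as partitions shrink.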
|
January 28, 2009, 17:44 |
Re: parallel performance
|
#5 |
Guest
Posts: n/a
|
You are probably right about PVM - I've never used Windows with CFX, and you have a knack for answering questions well here! However, I'll just point out that the OP's problem is 10,000 elements in *each* of 256 streamwise sections, which works out to roughly 2.6 million elements and about 3 million nodes in total, as the original post said. (It sounds like an extruded or structured mesh.)
I would certainly agree that 10,000 nodes is too small to scale well in parallel. However, the actual problem size of 3 million nodes does sound about the size that would use most of a 4 GB machine's memory. Hence my suggestion - but I don't have access to a suitable problem to estimate the actual memory consumption accurately just now, so I freely admit it's just a half-educated guess! Best wishes, andy2o
|
January 29, 2009, 12:33 |
Re: parallel performance
|
#6 |
Guest
Posts: n/a
|
Ah! I missed that. I thought it was 10k nodes total.
3 million nodes (hex) will probably require ~3 GB of RAM. The per-process solver memory overhead is still pretty small, so I doubt it would push him over the limit, but it may be close with other applications running on the same machine. PVM will definitely be an issue (it has been dropped from v12 beta altogether), so let's see how MPICH works for him. -CycLone
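For reference, a back-of-the-envelope check of that figure, assuming the common rule of thumb of roughly 1 GB of solver memory per million hex nodes (an assumed average; actual usage depends on the physics, precision and mesh type):

```python
nodes = 3_000_000
gb_per_million_nodes = 1.0   # assumed rule of thumb, not a measured CFX value
print(f"~{nodes / 1e6 * gb_per_million_nodes:.1f} GB")   # ~3.0 GB
```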
|
January 29, 2009, 16:26 |
Re: parallel performance
|
#7 |
Guest
Posts: n/a
|
On the Intel Core 2, the shared front-side bus can also become a bottleneck, since all cores compete for the same memory bandwidth. This might be part of the problem.
|