August 8, 2002, 18:58 |
p3 vs p4 for CFD: winner p3
|
#1 |
Guest
Posts: n/a
|
We've been benchmarking our new p4-2.26 GHz with 1 GB of DDR RAM against an older laptop with a p3-800 MHz and 512 MB SDRAM, and against a p4 1.4 GHz with 512 MB RDRAM. All computers were from Dell.
We designed a suite of CFD problems of different sizes, with the largest requiring slightly more than 512 MB so that the p3 and p4-1.5 had to page a little. The results are pretty amazing. For the smaller problems the CPUs scaled roughly in proportion to processor speed, e.g. the p3-800 was ~2.6 times slower than the p4-2.26, but as the problem size grew the p3 became 2 times(!) faster than the p4-1.5 and almost 50% faster than the p4-2.26. We tried to explain the performance of the p4-1.5 by its smaller cache (256 KB vs 512 KB on the other computers), but what would explain a computer with a 3x slower processor and slower memory beating the brand new p4 is beyond comprehension. Any ideas or experience? |
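One way to read these numbers is to compare the measured speedup with what clock frequency alone would predict. The sketch below does that arithmetic for the p4-2.26 against the p3-800, using only the ratios quoted in the post. The function name `per_clock_efficiency` is made up for illustration and is not part of the original benchmark.

```python
def per_clock_efficiency(speedup, clock_new_mhz, clock_old_mhz):
    """Measured speedup divided by the speedup the clock ratio alone predicts."""
    return speedup / (clock_new_mhz / clock_old_mhz)

# Ratios quoted in the post (p4-2.26 relative to p3-800).
small_speedup = 2.6        # p3-800 was ~2.6 times slower on the small problems
large_speedup = 1 / 1.5    # p3-800 was almost 50% faster on the largest problem

small_eff = per_clock_efficiency(small_speedup, 2260, 800)
large_eff = per_clock_efficiency(large_speedup, 2260, 800)

print(f"small-problem efficiency per clock: {small_eff:.2f}")
print(f"large-problem efficiency per clock: {large_eff:.2f}")
```

A value near 1 means the run scales with clock frequency; a value well below 1 suggests that something other than the core (memory traffic, paging) is setting the pace.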
|
August 9, 2002, 06:37 |
Re: p3 vs p4 for CFD: winner p3
|
#2 |
Guest
Posts: n/a
|
Paging for a CFD simulation! I know the efficiency of the Intel CISC chips is dreadful (i.e. the number of effective operations you get per clock tick for real problems) but surely this is a bit extreme.
* In-cache benchmark performance, although widely discussed and important for shifting hardware, is almost wholly irrelevant for real CFD predictions in my experience.
* What has always mattered for big CFD predictions since microprocessors replaced "proper" processors is getting the values from memory to the microprocessor. This involves the performance of the main board and support chips, which generally does not seem to get much attention. (I know effectively nothing about this for your hardware.)
If my recollection is not letting me down (don't trust it), I think the PIII collects significantly more values from memory on each visit than the P4, and that the product of number of values * memory speed is actually in favour of the PIII. Hence, if most of the extra values collected from memory are used, the PIII should go faster than the P4 when you are running a large job and the processors are mainly idling, waiting for values from memory. What is actually happening is probably slightly more involved than the above, but it may well be at the heart of what is being observed. |
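The argument above can be put as a crude two-term model: a run takes as long as the slower of the arithmetic and the memory traffic. The sketch below only illustrates that reasoning. `run_time`, `machine_a`, `machine_b` and every number in them are invented for the example; they are not measured PIII or P4 figures.

```python
def run_time(n_values, flops_per_value, machine):
    """Time bound: the slower of compute and memory traffic sets the pace."""
    compute_s = n_values * flops_per_value / machine["flops_per_s"]
    # Memory delivery rate = values per visit * visits per second (the "product" in the post).
    memory_s = n_values / (machine["values_per_fetch"] * machine["fetches_per_s"])
    return max(compute_s, memory_s)

# Hypothetical machines: A has the slower core but delivers more values per second.
machine_a = {"flops_per_s": 0.8e9, "values_per_fetch": 4, "fetches_per_s": 50e6}
machine_b = {"flops_per_s": 2.26e9, "values_per_fetch": 2, "fetches_per_s": 75e6}

n = 1_200_000
for flops_per_value in (50, 2):  # compute-heavy kernel, then memory-heavy kernel
    t_a = run_time(n, flops_per_value, machine_a)
    t_b = run_time(n, flops_per_value, machine_b)
    faster = "A" if t_a < t_b else "B"
    print(f"{flops_per_value:>2} flops/value: A={t_a * 1e3:.2f} ms  B={t_b * 1e3:.2f} ms  faster: {faster}")
```

With these made-up numbers the faster core wins when there is plenty of arithmetic per value, and the machine with the higher memory delivery rate wins when there is little, which is the shape of the behaviour being discussed.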
|
August 9, 2002, 10:44 |
Re: p3 vs p4 for CFD: winner p3
|
#3 |
Guest
Posts: n/a
|
We have some 1 GHz P3s and various speeds of P4 Xeons, and everything we have run shows that the P4s give speedups even better than the clock rate difference (i.e. a 2 GHz P4 is more than 2x faster than the P3). I think that the extra cache (Xeon) makes a lot of the difference, at least for our code.
|
|
August 9, 2002, 11:16 |
Re: p3 vs p4 for CFD: winner p3
|
#4 |
Guest
Posts: n/a
|
Steve - how much cache do your Xeons have? Their standard setup has the same 512 KB as any p4 faster than 2 GHz would. thx
|
|
August 9, 2002, 13:55 |
Re: p3 vs p4 for CFD: winner p3
|
#5 |
Guest
Posts: n/a
|
Could you provide more details of your benchmark project, such as array size, compiler, and compiler optimization options? Based on your description, though, it could be due to the different cache sizes.
|
|
August 9, 2002, 16:51 |
Re: p3 vs p4 for CFD: winner p3
|
#6 |
Guest
Posts: n/a
|
We ran problems from 140,000 grid cells up to 1.2M cells. As expected, performance depended not only on grid size but also on the problem structure. For the 140,000-cell problems the p4-2.26 was indeed nearly 3x faster than the p3-800. However, at 1.2M cells it was 50% slower.
The cache in both the p3 and the p4 is the same: 512 KB. On Intel's advice we have just recompiled the solver with their latest v6 compiler, and it did help: now the p4-2.26 is only 5-10% slower... |
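For a sense of scale, the figures in this thread allow a rough estimate of memory per cell. The sketch assumes that the 1.2M-cell case is the one needing "slightly more than 512 MB" (it is the largest problem mentioned) and that memory grows linearly with cell count. Both are assumptions, not statements from the solver's authors.

```python
MB = 2**20
KB = 2**10

# Assumption: the largest problem (1.2M cells) is the one that needs about 512 MB.
largest_cells = 1_200_000
largest_bytes = 512 * MB
bytes_per_cell = largest_bytes / largest_cells

# Assumption: memory scales linearly with the number of cells.
smallest_cells = 140_000
smallest_bytes = smallest_cells * bytes_per_cell

cache_bytes = 512 * KB  # cache size quoted for both the p3 and the p4-2.26

print(f"bytes per cell:          {bytes_per_cell:.0f}")
print(f"smallest problem:        {smallest_bytes / MB:.1f} MB")
print(f"smallest problem / cache: {smallest_bytes / cache_bytes:.0f}x")
```

Under these assumptions even the 140,000-cell case is far larger than a 512 KB cache, which suggests cache capacity alone may not separate the small and large cases.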
|
August 10, 2002, 11:45 |
Re: p3 vs p4 for CFD: winner p3
|
#7 |
Guest
Posts: n/a
|
If you're using Intel's compiler, you have a chance to exploit the full horsepower of the p3 and p4.
For the P3, try the following compiler options:
    ifc -O3 -xK -tpp6 -o executable_file source.f90
For the P4, use:
    ifc -O3 -xW -tpp7 -o executable_file source.f90
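If the same source has to be built for both processors, the two command lines can be generated from one small table. This is only a sketch: the helper `build_ifc_command` and the keys `"p3"` and `"p4"` are hypothetical, and the flags are exactly the ones quoted above, nothing more.

```python
# Per-CPU flags as quoted in the post; the dictionary keys are made up for this sketch.
CPU_FLAGS = {
    "p3": ["-xK", "-tpp6"],
    "p4": ["-xW", "-tpp7"],
}

def build_ifc_command(cpu, source, output):
    """Return the ifc command line for the given CPU as an argument list."""
    if cpu not in CPU_FLAGS:
        raise ValueError(f"no flags recorded for CPU {cpu!r}")
    return ["ifc", "-O3", *CPU_FLAGS[cpu], "-o", output, source]

for cpu in ("p3", "p4"):
    print(" ".join(build_ifc_command(cpu, "source.f90", "executable_file")))
```

The list form can be passed straight to `subprocess.run` on a machine where `ifc` is installed.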
|
August 11, 2002, 12:42 |
Re: p3 vs p4 for CFD: winner p3
|
#8 |
Guest
Posts: n/a
|
Clif, I looked again at the machines we ran our tests on and was surprised to find that the 1 GHz P3 and the 1.7 GHz P4 both had 256 KB of cache. I had thought the P4 had more, but was mistaken. The 1.7 GHz P4 was always significantly faster (esp. on a big problem) than the P3. For what it's worth, all our code is compiled with Absoft Fortran rather than Intel's compiler. Steve
|
|
August 12, 2002, 12:02 |
Re: p3 vs p4 for CFD: winner p3
|
#9 |
Guest
Posts: n/a
|
We had been using Intel compiler v5. We upgraded to v6 last Friday to see if that would help. It did: the p4-2.26 now runs not 50% slower than the p3-800, but about the same...
We are talking to Intel to see if there is something we missed. In all previous transitions, from P to P2 and from P2 to P3, there has never been anything like this. Processors simply scaled with their frequency. Thank you for your input. |
|
August 13, 2002, 03:34 |
Re: p3 vs p4 for CFD: winner p3
|
#10 |
Guest
Posts: n/a
|
This is not in line with our observations - we have tested both commercial codes and in-house codes (compiled with both Absoft Fortran and Intel's new Fortran compiler). The P4 runs all our CFD codes very well and often runs faster per MHz than our P3s. Note that memory is critical for the P4. PC800 RDRAM is still best. If you have PC600 RDRAM it will affect performance significantly.
|
|
August 13, 2002, 16:32 |
Re: p3 vs p4 for CFD: winner p3
|
#11 |
Guest
Posts: n/a
|
Jonas - we were using Intel 5 and are now trying Intel 6. The latter has helped a lot so far, except that we have been unable to compile the software for parallel computation (the same code compiled fine in v5). Here is a trivial sample code for which Intel 6 reports an error:
!----------------------------------------------------------------------!
      program test
      integer N, fi
      real eps, dt
      fi = 100
      open( fi, file = 'input.dat' )
      read( fi, '(i8)' ) N
      read( fi, '(e13.5)' ) dt
      read( fi, '(e13.5)' ) eps
      close( fi )
      end
!----------------------------------------------------------------------!
input.dat:
     100 ! N
  1.00000E-02 ! dt
  1.00000E-05 ! eps
-----
make file to compile it:
@echo off
ifl -ogood test.for
ifl -obad -Qopenmp -Qfpp2 test.for
|
|
|