Build times for OpenFOAM 2.0.x code on Ubuntu 10.10 with its gcc 4.4.5
Posted August 20, 2011 at 12:41 by wyldckat
Updated August 27, 2011 at 14:36 by wyldckat (Added a few notes)
This post gathers information about OpenFOAM 2.0.x build times, using the gcc version that comes with Ubuntu 10.10, namely 4.4.5.
_____________ This was initially posted here: http://www.openfoam.com/mantisbt/view.php?id=256 _____________
Since I was curious about using N+1 processes, I built OpenFOAM three times on my machine, which has 8 GB of DDR2 800 MHz RAM and an AMD 1055T X6, using Ubuntu 10.10 with its original gcc version (the timing procedure is sketched after this list):
- 1st build with 6 processes;
- 2nd build with 7 processes;
- 3rd build with 6 processes again, to account for the file cache in RAM.
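For reference, a minimal sketch of how such a run can be timed, assuming a bash shell and that the sources live in $HOME/OpenFOAM/OpenFOAM-2.0.x (the log file name is illustrative):
Code:
# the wmake scripts read WM_NCOMPPROCS to decide how many
# compilation processes to run in parallel
cd $HOME/OpenFOAM/OpenFOAM-2.0.x
source etc/bashrc
export WM_NCOMPPROCS=6   # 6, then 7, then 6 again

# GNU time (/usr/bin/time, not the shell built-in) prints the
# resource summaries quoted below
/usr/bin/time ./Allwmake > log.make 2>&1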
These three builds resulted in the following timings:
- 1st build, 6 processes:
Code:
10273.99user 372.12system 36:14.82elapsed 489%CPU (0avgtext+0avgdata 2001584maxresident)k 227072inputs+4583896outputs (452major+127455000minor)pagefaults 0swaps
- 2nd build, 7 processes:
Code:
10274.53user 368.04system 36:10.27elapsed 490%CPU (0avgtext+0avgdata 2001552maxresident)k 56inputs+4584104outputs (0major+127470259minor)pagefaults 0swaps
- 3rd build, 6 processes again:
Code:
10209.99user 366.84system 35:49.39elapsed 492%CPU (0avgtext+0avgdata 2001536maxresident)k 200inputs+4583880outputs (0major+127473965minor)pagefaults 0swaps
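As a reading aid (my arithmetic, not part of the original logs): in GNU time's summary, user and system are CPU seconds spent in user space and in the kernel, elapsed is wall-clock time, and %CPU is roughly (user + system) / elapsed. For the 1st build, (10273.99 + 372.12) / 2174.82 s elapsed ≈ 4.9, i.e. the 6 compilation processes kept about five cores busy on average, which time reports as 489%CPU.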
Additional timings for comparison:
- 1 core:
Code:
8213.30user 298.34system 2:24:07elapsed 98%CPU (0avgtext+0avgdata 2001648maxresident)k 218880inputs+4585496outputs (343major+127443737minor)pagefaults 0swaps
- 2 cores:
Code:
8437.93user 321.37system 1:16:39elapsed 190%CPU (0avgtext+0avgdata 2001584maxresident)k 16inputs+4584432outputs (0major+127456817minor)pagefaults 0swaps
- 3 cores:
Code:
8990.99user 363.41system 56:40.62elapsed 275%CPU (0avgtext+0avgdata 2001600maxresident)k 792inputs+4584168outputs (0major+127467703minor)pagefaults 0swaps
- 4 cores:
Code:
9963.10user 382.46system 48:36.42elapsed 354%CPU (0avgtext+0avgdata 2001584maxresident)k 8inputs+4584064outputs (0major+127476961minor)pagefaults 0swaps
- 5 cores:
Code:
10105.92user 385.89system 40:57.82elapsed 426%CPU (0avgtext+0avgdata 2001424maxresident)k 176inputs+4583944outputs (2major+127468657minor)pagefaults 0swaps
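Comparing the extremes (again my own arithmetic on the numbers above): 1 core took 2:24:07 of wall-clock time (8647 s), while the 6-process builds took about 36:15 (2175 s), a speed-up of only about 4x on six cores. Note also that the total user time grows from 8213 s on 1 core to about 10274 s on 6, so the cores do measurably more total work when building in parallel.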
More conclusions:
- Linking is mostly done on a single core, therefore it's normal that the relative gain drops as more CPUs are added to the computation pool (a rough estimate of this effect follows this list).
- The second line in each timing output doesn't reveal much about what really happened.
- Neither does the maxresident value say much about the real total memory used during the build.
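A back-of-the-envelope way to quantify the first point (my estimate, not a measurement): Amdahl's law puts the speed-up on N cores at 1 / (s + (1 - s)/N), where s is the serial fraction of the work. The observed ~4x on 6 cores matches s ≈ 0.1, i.e. roughly 10% of the build (largely the linking steps) running on a single core: 1 / (0.1 + 0.9/6) = 4.0.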
Things I haven't tested yet:
- Using more than one machine.
- Monitoring the maximum memory used with each set of cores. I already know that 4 cores need around 1.5 to 1.6 GB of RAM, but I don't know how much is needed for 6 cores (a monitoring sketch is shown after this list).
- Using limited RAM, to reduce the file cache, which might then improve the timings for N+1.
- Building inside a virtual machine on this very same real machine, to compare performance.
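On the memory-monitoring point, a minimal sketch, assuming a bash shell and the free(1) output format of that era (with the "-/+ buffers/cache" row; the log file name is illustrative): sample the memory actually in use, excluding the file cache, once per second while the build runs, then read the peak off the log.
Code:
# sample used memory in MB once per second, in the background;
# the "-/+ buffers/cache" row excludes the file cache
while true; do
    echo "$(date +%T) $(free -m | awk '/buffers\/cache/ {print $3}')"
    sleep 1
done > log.memory &
MONITOR_PID=$!

/usr/bin/time ./Allwmake > log.make 2>&1

kill $MONITOR_PID
# the largest sample approximates the peak memory used by the build
sort -k2,2n log.memory | tail -1
Relatedly, a cheaper alternative to physically limiting the RAM would be to drop the page cache between runs with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches, which should expose the cold-cache cost directly.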
Comments
Posted September 3, 2011 at 11:42 by 7islands
Posted September 12, 2011 at 15:59 by wyldckat
Hi Takuya!
Actually, I do believe it seriously depends on the speed at which the files are accessed and/or accessible. And since modern machines + Linux OS use the file cache in RAM abundantly, the necessity for N+1 is reduced to naught.
But indeed: experiments must be conducted to reach some sort of proof, even if it can be disproved in the future by other experiments.
Best regards,
Bruno