Comparison of F90 and F77 in terms of running speed

November 10, 1999, 06:50 | #1
Dear people of the CFD community,
Have you had any experience comparing the running speed of a code written in Fortran 90 with one written in Fortran 77? I recently converted my CFD code from F77 to F90. The big advantage I have gained is that the memory allocation is much smaller. But the running speed of the Fortran 90 code is about 2 times that of the Fortran 77 code. Do the overheads of F90 play a negative role? X. Ye

November 10, 1999, 10:20 | #2
(1). If it were 2 times faster than f77, everybody would be using it right away.
(2). If it is 2 times slower than f77, I think your codes are not identical. It makes no sense to invent a new language that is 2 times slower than f77.
(3). My suggestion is to run some typical test cases in a loop to check the speed using f90 and f77. (I am still using f77, so I can't run the test for you.)
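A minimal sketch of the kind of loop test suggested in (3), written in free-form Fortran 90. It is not taken from either poster's code: the program name, the array length `n`, the repeat count `nrep` and the smoothing kernel `sweep` are all hypothetical. It runs the same kernel on a fixed-size array and on an allocatable array and times each with the standard `system_clock` intrinsic.

```fortran
program time_fixed_vs_alloc
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical array length
  integer, parameter :: nrep = 200    ! hypothetical repeat count
  real, save :: a_fixed(n)            ! size fixed at compile time
  real, allocatable :: a_dyn(:)       ! size set at run time
  integer :: c0, c1, rate, irep

  allocate(a_dyn(n))
  a_fixed = 1.0
  a_dyn = 1.0

  ! Time the kernel on the fixed-size array.
  call system_clock(c0, rate)
  do irep = 1, nrep
     call sweep(a_fixed, n)
  end do
  call system_clock(c1)
  print *, 'fixed array [s]:', real(c1 - c0) / real(rate), sum(a_fixed)

  ! Time the same kernel on the allocatable array.
  call system_clock(c0)
  do irep = 1, nrep
     call sweep(a_dyn, n)
  end do
  call system_clock(c1)
  print *, 'allocatable [s]:', real(c1 - c0) / real(rate), sum(a_dyn)

  deallocate(a_dyn)

contains

  ! Same smoothing kernel for both arrays, so only the storage differs.
  subroutine sweep(a, m)
    integer, intent(in) :: m
    real, intent(inout) :: a(m)
    integer :: i
    do i = 2, m - 1
       a(i) = 0.5 * a(i) + 0.25 * (a(i-1) + a(i+1))
    end do
  end subroutine sweep

end program time_fixed_vs_alloc
```

The sums are printed only so that the compiler cannot discard the loops. Whether the two timings differ at all depends on the compiler and its options, so a test like this isolates the storage type rather than proving anything about a full CFD code.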

November 10, 1999, 10:27 | #3
Dear John,
Thanks for your message. I expressed myself wrongly. The computing time of the F90 code is 2 times that of the F77 code, i.e., F90 is slower. So far, the only advantage of F90 that I see is the dynamic memory allocation. Maybe I must wait for the next version of the F90 compiler. Best regards, X. Ye

November 10, 1999, 11:30 | #4
(1). There are two ways to write the code: one for maximum speed, and one for minimum memory.
(2). Many CFD researchers in the early days had to find various ways to minimize the memory requirement. You had to be a real CFD researcher to squeeze a 2-D turbulent flow problem into a 48k computer, or a 3-D explicit compressible flow code into an old mainframe.
(3). The same requirement holds in the computer game industry, where the number of polygons in a game object is reduced to reach a real-time frame rate.
(4). So, if you don't need a variable right away, you can always recompute it at a later time, use it, and delete it (you don't have to do this in old Fortran).
(5). To write a code for maximum speed, you would store all the temporary variables ahead of time, or reserve all the memory even if not all of it will be used. You can also unroll the do loop so that the logic check statement stays outside the loop. As a result, the code will be much faster than one written for minimum memory.
(6). I would say that, for 2-D codes, the maximum speed approach is practical. (A commercial code is a different story, because it must include many options to cover many grounds.)
(7). On the other hand, for 3-D flows, it is necessary to consider the minimum memory approach, since not every machine is equipped with one gigabyte of memory. A couple of hundred megabytes is required for one million cells or nodes. (A commercial code would require more than one gigabyte of RAM for a one-million-cell problem.)
(8). For complex 3-D flow through a turbine passage, it may take a 100x100x100 mesh to get a good prediction of the total pressure loss with a low Reynolds number turbulence model.
(9). So, for 3-D flows, a minimum memory approach to code development is the practical approach. If you don't have enough mesh points in a 3-D flow calculation, the result is not useful at all (even the trend could be wrong). The speed gain for the 3-D flow calculation will have to come from hardware improvement (parallel processing included).
(10). Since the typical computing time for a 3-D problem is measured in days and weeks, there is room for both hardware and software improvement.
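A small sketch of the idea in (5) of keeping the logic check outside the loop. Strictly speaking this is moving a loop-invariant test out of the loop rather than unrolling, and it is shown here only as an illustration. The flag `upwind`, the array names and the two difference formulas are hypothetical and are not taken from the post.

```fortran
program hoist_test_out_of_loop
  implicit none
  integer, parameter :: n = 10
  real :: u(n), d_inside(n), d_outside(n)
  logical :: upwind      ! hypothetical option, constant during the loop
  integer :: i

  upwind = .true.
  do i = 1, n
     u(i) = real(i)**2
  end do
  d_inside = 0.0
  d_outside = 0.0

  ! Version A: the option is tested on every pass through the loop.
  do i = 2, n - 1
     if (upwind) then
        d_inside(i) = u(i) - u(i-1)
     else
        d_inside(i) = 0.5 * (u(i+1) - u(i-1))
     end if
  end do

  ! Version B: the test is made once, and each branch has its own loop.
  if (upwind) then
     do i = 2, n - 1
        d_outside(i) = u(i) - u(i-1)
     end do
  else
     do i = 2, n - 1
        d_outside(i) = 0.5 * (u(i+1) - u(i-1))
     end do
  end if

  ! Both versions perform the same arithmetic on the same data.
  print *, 'max difference:', maxval(abs(d_inside - d_outside))
end program hoist_test_out_of_loop
```

Version B duplicates the loop body, which is the usual price of this style: more source code in exchange for a branch-free inner loop. Many optimizing compilers can perform this transformation themselves, so the gain from doing it by hand may be small.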

November 10, 1999, 12:05 | #5
Thank you, John, for your message. You are completely right that a code must be faster if the memory allocation is done ahead of time and not dynamically during the calculation. In the latter case, there is too much overhead. The large, fixed memory allocation is a problem for me only if several jobs are running in parallel or if the cell number exceeds the fixed array dimensions. Hence, for industrial use, F90 has its advantage. But the overhead problem should be solved.
Of course, there is a method of quasi-dynamic memory allocation with F77: you compile your CFD code as many times as necessary with different dimensioning; you write a shell script that runs a small program to read the mesh file and find the cell number; then, according to the cell number, the shell script selects the compiled solver with the smallest sufficient memory allocation and starts it. X. Ye
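The selection step of that scheme can be sketched as follows. The post describes a shell script; to stay in Fortran, the choice is written here as the "small program" instead. The size limits, the executable names and the hard-coded cell count are all hypothetical stand-ins, and reading the mesh file and launching the solver are left out.

```fortran
program pick_solver
  implicit none
  integer, parameter :: nsizes = 4
  ! Hypothetical cell limits of the precompiled solvers, ascending.
  integer, parameter :: maxcells(nsizes) = &
       (/ 10000, 100000, 500000, 1000000 /)
  ! Hypothetical executable names, one per limit.
  character(len=12), parameter :: solver(nsizes) = &
       (/ 'solver_0010k', 'solver_0100k', 'solver_0500k', 'solver_1000k' /)
  integer :: ncells, k, choice

  ncells = 120000   ! stand-in for the cell count read from the mesh file

  ! Pick the first (smallest) build whose limit covers the mesh.
  choice = 0
  do k = 1, nsizes
     if (ncells <= maxcells(k)) then
        choice = k
        exit
     end if
  end do

  if (choice == 0) then
     print *, 'mesh too large for any compiled solver'
  else
     print *, trim(solver(choice))
  end if
end program pick_solver
```

With the stand-in value of 120000 cells, the sketch selects the 500000-cell build. A wrapper script could capture the printed name and start that executable.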

November 10, 1999, 12:25 | #6
Hi,
What kind of compilation directives are you using? Make sure that you are using the same level of optimization for both compilers. Cheers, Neyval

November 10, 1999, 12:31 | #7
Yes, the optimization level is the same. I even compiled the F77 code with the F90 compiler. The only difference between my F90 and F77 codes is in the dynamic versus fixed memory allocation, i.e., the two codes are quite different, but the compiler and the optimization level are the same.
X. Ye

November 10, 1999, 12:35 | #8
You said that you compiled your old F77 code with the F90 compiler. How was the execution speed this time? Was it better or worse than that of the code compiled with F77?
Neyval

November 10, 1999, 12:41 | #9
Our F90 compiler is newer than the F77 compiler, hence the code compiled with the F90 compiler is faster than the code compiled with the old F77 compiler.
X. Ye

November 10, 1999, 12:46 | #10
This looks closer to normal to me.
I do not use dynamic memory allocation in Fortran, and I know that compilers have improved a lot since I started programming, but when I used Pascal, programs using dynamic memory allocation were always slower than those with 'normal' memory access. This was related to accessing memory through pointers. Therefore, I always assumed that it was a choice between speed and memory usage. Neyval

November 10, 1999, 13:01 | #11
Then it sounds like you've found the root of your problem, if I understand this correctly:
Old code written for F77 and compiled with F77 = baseline speed
New code taking advantage of dynamic memory allocation and compiled with F90 = half of baseline speed
Old code written for F77 but compiled under F90 = faster than baseline
The process of allocating and deallocating memory requires clock cycles and is slowing the new code down (by more than half, based on your test of the old code compiled under F90). If you've got enough RAM, and speed is more important, then I'd suggest not using the dynamic memory allocation. As engineers we tend to want to use the newest, best features of anything we touch, and in this case the new features seem to be slowing you down.
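If allocation and deallocation are the cost, then where those calls sit matters. The sketch below contrasts two placements of the same work array. It assumes, which the thread does not state, that the slow code allocates inside its iteration loop. The array length `n`, the step count `nsteps` and the dummy work are hypothetical.

```fortran
program alloc_placement
  implicit none
  integer, parameter :: n = 100000     ! hypothetical work-array length
  integer, parameter :: nsteps = 1000  ! hypothetical number of iterations
  real, allocatable :: work(:)
  real :: total
  integer :: istep

  ! Pattern A: allocate and deallocate on every iteration.
  total = 0.0
  do istep = 1, nsteps
     allocate(work(n))
     work = real(istep)
     total = total + work(n)
     deallocate(work)
  end do
  print *, 'pattern A total:', total

  ! Pattern B: allocate once, reuse the array, deallocate at the end.
  allocate(work(n))
  total = 0.0
  do istep = 1, nsteps
     work = real(istep)
     total = total + work(n)
  end do
  deallocate(work)
  print *, 'pattern B total:', total
end program alloc_placement
```

Both patterns compute the same total. Pattern B keeps run-time sizing but pays the allocation cost once, which may recover much of the speed without going back to fixed dimensions.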

November 11, 1999, 04:28 | #12
Hi,
Thank you for your message. Neyval C. Reis and Alton J. Reich have found the root of the overhead problem of Fortran 90. Normally speed is more important than memory allocation. As I wrote in reply to John C. Chien, there is a method of quasi-dynamic memory allocation with F77: you compile your CFD code with different dimensioning and produce a set of executable solvers with different memory allocations; you write a shell script that runs a small program to read the mesh file and find the cell number; then, according to the cell number, the shell script selects the executable solver with the smallest sufficient memory allocation and starts it. X. Ye

November 11, 1999, 05:06 | #13
(1). I think it can be done, and it has been done, and I have used such an approach (with scripts written by someone else). But it is hard to maintain.
(2). Normally, this happens when several different computer systems and different versions of operating systems exist in one place.
(3). There are two practical solutions. One is to use the maximum allowable dimension size and create the executable file; just one version is enough on one computer system. The other is to keep the systems uniform across the group or division, that is, to use only one brand of computer and one version of the operating system.
(4). Actually, there is a third one, which is to compile for each case. It is fairly straightforward to change a few dimension parameters and recompile the code.
(5). By the way, the difficulty in maintaining the script file is that it requires hard-wired file addresses, which can easily become a problem when the system changes.

November 12, 1999, 11:50 | #14
the source of your f90 code's slowness wrt the f77 code is quite likely the use of dynamic allocation. dynamic allocation slows down code in any language. i use C and this is what i see. you can use conditional compilation (ie with C type preprocessor directives) to vary the size of your arrays depending on your problem. this approach is similar in concept to what you said about using a unix shell file, except you don't need different versions of the source code. i'm sure there are other workarounds that will allow you to keep static allocation and still get some dynamic effect. or you could just keep working with the f77 code. oh, by the way, have you considered using C to allow dynamic allocation? you may find that the dynamic performance in C is better than the dynamic performance in f90. the static performance in either will still be better though. if all else fails you can go in and really rip your dynamic f90 code to shreds to find places to save time, profile the heck out of it, get comp sci guys to help you speed it up (this is a good tip: a comp sci professor once helped a friend of mine to make his f90 code go much faster; they have PhDs for a reason), use some esoteric compiler options, etc.
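A sketch of the conditional-compilation idea applied to Fortran source. It assumes a compiler that can run a C-style preprocessor over the file (for example, gfortran accepts `-cpp` and `-D` options). The macro names `IMAX` and `JMAX`, their defaults and the file name in the comment are hypothetical.

```fortran
! Example build (assumption): gfortran -cpp -DIMAX=200 -DJMAX=100 sizes.f90
#ifndef IMAX
#define IMAX 50
#endif
#ifndef JMAX
#define JMAX 50
#endif
program sized_by_cpp
  implicit none
  ! The preprocessor replaces IMAX and JMAX before compilation,
  ! so the arrays below are ordinary fixed-size arrays.
  integer, parameter :: imax = IMAX, jmax = JMAX
  real :: uvel(imax, jmax)

  uvel = 0.0
  print *, 'array shape:', shape(uvel)
end program sized_by_cpp
```

One source file then serves every problem size, and the size is chosen on the compile command line. The preprocessor is case sensitive even though Fortran is not, which is why the upper-case macros and the lower-case parameters do not collide.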

November 12, 1999, 16:42 | #15
The CFD code I used in my first incarnation (job) was home-grown. All of the array dimensioning that was required for compilation was handled in a single file. For each job we had to go into the file, enter appropriate dimensions for each array, and compile the code. There was much complaining about having to actually edit the array dimensions, but it didn't require much effort, and we knew that we weren't wasting memory.

November 12, 1999, 16:57 | #16
F77 has the parameter statement, which works something like:
      parameter (imax = 50, jmax=50)
      dimension uvel(imax,jmax), wvel(imax,jmax)
      ...
Just change the 50's to whatever you need to change the size of your arrays. This seems to work on my f90 compiler as well, although I don't know what it might do to execution speed.
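For comparison, a minimal free-form sketch that puts the parameter-sized arrays next to an F90 allocatable array. The names `uvel_dyn`, `ni` and `nj` and their values are hypothetical. Because `imax` and `jmax` are compile-time constants, the first two arrays are sized exactly as if the numbers had been written into the declarations directly.

```fortran
program fixed_and_allocatable
  implicit none
  integer, parameter :: imax = 50, jmax = 50   ! edit and recompile to resize
  real :: uvel(imax, jmax), wvel(imax, jmax)   ! sized at compile time
  real, allocatable :: uvel_dyn(:,:)           ! sized at run time
  integer :: ni, nj

  ni = 30   ! hypothetical; would normally come from the mesh file
  nj = 20
  allocate(uvel_dyn(ni, nj))

  uvel = 0.0
  wvel = 0.0
  uvel_dyn = 0.0
  print *, 'fixed elements:      ', size(uvel)
  print *, 'allocatable elements:', size(uvel_dyn)

  deallocate(uvel_dyn)
end program fixed_and_allocatable
```

The fixed arrays always occupy `imax*jmax` elements whatever the mesh, while the allocatable one takes only what the run needs.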