May 4, 2001, 18:07 |
C or Fortran: Facts
|
#1 |
Guest
Posts: n/a
|
This posting is prompted by the lack of facts in the C/C++ or Fortran discussion below.
The following is for the air benchmark at: http://www.polyhedron.co.uk/ The Fortran was converted to C using f2c from netlib: http://www.netlib.org/f2c/ The machine used is an old Sun Ultra 1/140 (a bottom-of-the-range workstation, about 5 years old). The native compilers are of the same vintage: version 4.0 (4.2, I think). The GNU compilers are current(ish): version 2.95.2. The times are in seconds:
f77 -O => 105.0
g77 -O => 167.0
cc -O => 300.0
cc -O5 => 138.0
gcc -O => 191.0
gcc -O3 => 173.0
The results are pretty much what I would expect. (I repeated the C compilations with the highest (simple) optimisation flag because the native C compiler was not doing much optimising.) I ought to raise the optimisation for the Fortran to be consistent, but then I ought to fix some of the translated C code to be fair, and I ought to use compilers of similar age, and ought to explore and use the optimisations that really work, and use a bigger 3D test case, and ... (this sort of exercise has no end).
Any more contributions? The exercise should take only a few minutes - half an hour tops unless you play around.
|
May 5, 2001, 15:03 |
Re: C or Fortran: Facts
|
#2 |
Guest
Posts: n/a
|
(1). Based on your test data, it seems to me that there is an average speed ratio of about 2 relative to the f77 code speed. I don't know whether that is very important for every application or not. But in my case, the selection of the language is a different story.
(2). In the old days, only the Fortran language was available, so it was the standard language.
(3). When the PC became available, I had to use BASIC, because it was the only language available to average users. Even though BASIC was slow (it is still slow today), I was able to run my CFD codes on the PC using BASIC and to plot the results using BASIC programs I also wrote.
(4). Later on, Fortran became available on the PC, and that made it easier to work on CFD code (it ran 5 to 10 times faster).
(5). But the drawback then was that I still had to use my BASIC code to do graphics. Fortran simply did not have graphics commands included.
(6). This was solved around the late 80's, when MS Fortran included graphics commands. At that point, I was able to create commercial-like CFD programs with graphics included. So I invested a lot of time there, thinking that it would be the standard of the future (using Fortran with graphics).
(7). This approach got interrupted again around the mid-90's, when MS decided to drop support for MS Fortran. (It is dangerous to depend on someone else's code, and this is one aspect of it.)
(8). In the late 80's, when I was looking for graphics for Fortran, I also ran into this MS Windows program, which was written in the C language. So the C language became attractive to me because of the Windows GUI, not because of its speed. (It was fast using C.)
(9). From there, I decided to learn the Windows API, which eventually led to VC++ today.
(10). So, from my point of view, even though the speed of the language is important, my decision to use C/C++ or VC++ was based on my programming need, that is, an easy-to-use GUI.
(11). I think, in view of this, the speed ratio of 2 is quite acceptable to me, because even today most of us are spending a lot (the major part) of our time doing modeling and trying to get the code to converge. In this case, a better user interface and better algorithm development seem to be the more rewarding areas of research in the future.
(12). And on the PC, the only languages available to me right now are VC++ and Visual Basic. The important aspect of this is: it has large market support and it can create Windows programs with graphics and even 3-D animations (DirectX 3D and OpenGL).
|
|
May 6, 2001, 13:01 |
Re: C or Fortran: Facts
|
#3 |
Guest
Posts: n/a
|
Repeating the same "air" benchmark on a 500/100 MHz Intel Pentium III (in a cheap motherboard) running FreeBSD 4.3.
f77 invokes g77 version 2.95.3 and cc invokes gcc version 2.95.3.
f77 -O => 38.7
f77 -O2 => 34.2
f77 -O2 -malign-double -funroll-loops -march=pentiumpro => 34.4
cc -O => 41.5
cc -O3 => 40.4
The extra f77 optimisation tuning flags were taken from the same source as the benchmark. Comments?
|
May 6, 2001, 13:35 |
Re: C or Fortran: Facts
|
#4 |
Guest
Posts: n/a
|
Try these flags:
-s -fomit-frame-pointer -Wall -mpentiumpro -march=pentiumpro -malign-functions=4 -fexpensive-optimizations -fschedule-insns2 -mwide-multiply -O2
plus these flags, whose benefit depends strongly on the application: -funroll-loops
Comment: with -O3 on Linux or FreeBSD I get worse results than with -O2.
|
May 6, 2001, 13:43 |
Re: C or Fortran: Facts
|
#5 |
Guest
Posts: n/a
|
C++ optimized and Fortran optimized. These results were computed on an AMD1000 running FreeBSD.
These are the times for 40 repetitions of each vector operation; the vectors have 1x10^6 elements:
sum of two vectors: Fortran (1.03) C++ (0.75)
dot product: Fortran (1.58) C++ (1.35)
y=2*z: Fortran (2.06) C++ (1.44)
|
May 6, 2001, 17:25 |
Re: C or Fortran: Facts
|
#6 |
Guest
Posts: n/a
|
I cannot check your results because you have provided insufficient information. I (and I suspect most people) would invoke machine optimised BLAS based routines (i.e. hand optimised machine code) to perform this type of operation.
However, performing a simple vector add 40 times on two arrays of a million elements in Fortran and C/C++ on the PC gave:
g77 -O2 => 2.2
gcc -O3 => 2.2
My suspicion is that the generated machine instructions are identical, but I cannot find an installed disassembler to check. The Fortran optimiser should have spotted that the outer iteration (the 40) was doing nothing and removed it. Altering 40 to 100 showed this was not occurring. I am slightly surprised at this because 15/20 years ago it was precisely this sort of optimisation which forced people to adopt "real world" benchmarks and drop small and simple tests such as this.
The timing routines (etime) for g77 were clearly interfering with the execution of the program. The times for just the vector add were erratic and exceeded the time for running the whole program. The timing routine for gcc (gettimeofday) appeared to be fine. I was forced to time the execution of the whole program, including the start-up and initialisation.
|
May 6, 2001, 18:34 |
Re: C or Fortran: Facts
|
#7 |
Guest
Posts: n/a
|
In order to get these results with C++ you have to use some register variables. Furthermore, I make use of pipelines: the vector operations are performed 4 at a time:
    template <class T>
    inline T Vecteur<T>::operator* (const Vecteur<T>& y) const
    {
        // p and m are presumably the data pointer and length of this vector
        T z = 0;
        register const T* q = y.adr();
        register const T* w = p;
        register const T* fin = &p[m-1-4];
        // main loop: four products per pass
        for (; w<=fin; ) {
            z += *w*(*q); w++; q++;
            z += *w*(*q); w++; q++;
            z += *w*(*q); w++; q++;
            z += *w*(*q); w++; q++;
        }
        // remainder loop: leftover elements one at a time
        for (fin = &p[m-1]; w<=fin; ) {
            z += *w*(*q); w++; q++;
        }
        return z;
    }
|
May 6, 2001, 19:35 |
Re: C or Fortran: Facts
|
#8 |
Guest
Posts: n/a
|
(1). Interesting, but they are still within a factor of two. (2). So, I would say that between Fortran, C and C++, the speed variation is a factor of two. That is perhaps the difference between the average user and the expert. (It is also problem dependent.)
|
|
May 7, 2001, 07:08 |
Re: C or Fortran: Facts
|
#9 |
Guest
Posts: n/a
|
Sebastien, do you really believe that if I needed to make a change in your program this could be done easily?
Then how could you share lines of CFD code? Regards, Sylvain |
|
May 7, 2001, 08:15 |
Re: C or Fortran: Facts
|
#10 |
Guest
Posts: n/a
|
I don't know if I understood the exact meaning of your question.
This code example is for a very low level class which is not designed to be changed by another user. This Vector (Vecteur) template class is designed to be used as it is. In Fortran, did you ever think of changing the dot product function or the sin function? I believe not. Furthermore, this class was first designed by my thesis director. Even though the programming is not easy to read, I was capable of optimizing parts of the code. For high level classes, I never use pointers or such a crazy design. If I can get some evenings off, I will have the time to finish commenting my CFD code for 3D incompressible flow. I think it is designed to be used by others. Cheers. |
|
May 7, 2001, 09:57 |
Re: C or Fortran: Facts
|
#11 |
Guest
Posts: n/a
|
That was exactly what I meant: you have to be very careful when you write programs in C++ to keep them understandable by other people.
As an f77 user, I admit that a lot of things are impossible (or difficult) to do with f77, but almost everybody who knows a little English is able to read f77. About C++, I have to admit that my only knowledge of C++ comes from a code that I had to modify. It took me days to understand what modifications had to be made (and where), just because it was written by some kind of C++ wizard. Since f90, it is now possible to define any function on any "kind" of variable with Fortran; I must admit that it is very useful. f90 also allows recursive functions and dynamic memory allocation. Cheers Sylvain |
|
May 7, 2001, 12:42 |
Re: C or Fortran: Facts
|
#12 |
Guest
Posts: n/a
|
(1). For the Fortran code, as long as you have the flow chart and understand the logic of what must be done first, then you can understand the whole process easily.
(2). In C++, there are at least two more concepts to learn: one is the class/object and the other is the pointer (which stores a memory address, like the mail box number in the post office).
(3). To read a C++ code is like reading a new novel. You need to learn each character in the novel, his name, his relationship with other characters, his personality, etc... And this part is normally defined in the class in the header file along with other definitions.
(4). So, it takes more time to learn the C++ code written by other researchers.
(5). The advantage is that, if you are using a pre-defined class/object, then it will save you a lot of time (code re-use), especially if the class/object is checked out fully.
(6). I think, at the class/object level, C++ and conventional Fortran are two different things. They are designed to do different things, in different ways.
(7). So, it is not meaningful to make a direct comparison, because you will be forcing them to do the same task, and that will destroy the original idea of making the code safer, easier to maintain, and reusable.
(8). So, to understand a Fortran code, it is like reading the city map. On the other hand, to understand a C++ code, you are trying to understand a community.
|
|
May 7, 2001, 16:27 |
Re: C or Fortran: Facts
|
#13 |
Guest
Posts: n/a
|
The code is obviously incomplete and mangled by the html. I think I have extracted a number of things from the above:
(1) You are not performing a vector add but summing the product of the elements. I have changed the test accordingly.
(2) The C++ness is essentially irrelevant. You are performing hand optimisations by using a register hint and taking a stride of 4 instead of 1. You can do this in Fortran just as easily.
Here are the results for the two machines I used previously. They are the average of 5 runs. (The scatter is OK on the Sun but fairly large on the PC - take the last figure with a pinch of salt.)
Sun:       stride 4   stride 1   (stride 1 no register)
f77 -O5    2.47       2.63
g77 -O2    2.59       4.63
cc -xO5    2.72       2.82       (2.82)
gcc -O3    2.81       3.19       (3.18)
PC:        stride 4   stride 1   (stride 1 no register)
g77 -O2    1.62       1.83
gcc -O3    1.57       1.70       (1.75)
Points to note:
(1) On the Sun the native compilers behave as expected: Fortran a bit quicker than C.
(2) I was disappointed to see that the stride improved the Fortran results (slightly) on the Sun. This simple sort of array optimisation should be performed by a reasonable Fortran optimiser. Curiously, it did little for the native C compiler, which I would expect to benefit from this hand optimisation because of aliasing knackering the reasoning of the optimiser. Hmm.
(3) gcc and g77 produced slower code than the native compilers, as expected.
(4) g77 with a stride of 1 is clearly suffering from something on the Sun. The stride of 4 is making a big improvement on the Sun and making g77 faster than gcc.
(5) On the PC for this test case g77 does lag gcc slightly and does not show a big improvement with a stride of 4. (This is exaggerated by comparing a gcc stride of 4 with a g77 stride of 1.)
(6) Comparing clock speeds, the PC is under-performing significantly.
(7) The register hint is having no effect on the Sun (plenty of registers) and the change on the PC (few registers) is within the accuracy of the timings.
I would conclude that there is little difference between g77 and gcc on the PC for this test case. On the Sun there is a bit more difference and it is in the expected direction. (I would add that testing a CFD prediction is a more worthwhile test.) |
|
May 7, 2001, 19:28 |
Re: C or Fortran: Facts
|
#14 |
Guest
Posts: n/a
|
The register variables on a PC make a bigger difference than the stride of 4.
These are the results for a stride of 4 (this stride usually takes advantage of multiple pipelines on a processor; with Fortran this shouldn't be necessary and should be handled by the compiler):
vector sum: with register (0.75), without (1.12), Fortran (1.03)
dot product: with (1.35), without (1.80), Fortran (1.58)
y=2*z: with (1.55), without (2.23), Fortran (2.06)
Conclusion: On a PC, if you don't optimize the C++ code with register variables, the vector operations are faster with Fortran. But, as I pointed out, the vector classes don't need to be designed by an average user; these are low level classes. |
|
May 8, 2001, 06:49 |
Re: C or Fortran: Facts
|
#15 |
Guest
Posts: n/a
|
I see no significant difference with the register hint. On a RISC processor (plenty of registers) with a reasonable native optimizing compiler this is only to be expected. On a CISC processor (few registers - 6 on the Intel I think?) with a general purpose optimizing compiler it is certainly possible. However, my attempts to reproduce the incompletely specified problems clearly show that your results are not general (and, I would suspect, unusual) and apply to whatever C/C++ and Fortran compilers you possess (you do not say which) and to your particular processor/motherboard combination. I am not disputing your results, but the generality of your conclusion is incorrect. For such a simple and common operation, the biggest influence is probably the quality of the Fortran optimization - it would not be unreasonable to expect this to match hand coded machine code, although this is clearly not the case.
|
|
May 8, 2001, 07:29 |
Re: C or Fortran: Facts
|
#16 |
Guest
Posts: n/a
|
These kinds of results can be obtained on FreeBSD with an AMD1000 and on Linux with a PIII. The compilers I used for these "benchmarks" are g77 and g++ (GNU compilers).
If I ever get the time, I will try these little vector product operations on the Solaris workstation I have in my office.
P.S. The problems are very specific; these are simple vector operations and can be reproduced:
1) sum of a vector's elements: a = sum (i) x_i
2) dot product: a = x cdot y
3) scalar product and assignment: y = z*a
These operations have to be implemented as functions (not just a loop in a main program). The vectors have one million elements and the operations were performed 40 times. I personally think that these operations give a good insight into the relative speed of vector operations in different languages.
Tonight, if I have the time, I will time (once again) my MHD program compiled with a native Fortran compiler (Digital Fortran) and see if there is a difference between its results and those obtained with f2c+gcc. Cheers. |
|
May 8, 2001, 09:57 |
Re: C or Fortran: Facts
|
#17 |
Guest
Posts: n/a
|
Taking the results for 40 dot products of two vectors of a million elements. This is 40 million sets of operations of something like: load, load, mult, add, test_loop.
On the Athlon using complicated hand-crafted C++: 1.35*1000M/40M = 33.75 ticks
On the Sun calling BLAS from Fortran: 1.8*140M/40M = 6.3 ticks
What is the Pentium doing with its complex instructions and all those ticks? Is this progress? Is there any point measuring anything other than memory access times for modern Pentiums? |
|
|
|