Best coding and benchmarking practices for Fortran
November 19, 2020, 07:49
#1
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
What are the best coding and benchmarking practices for Fortran?
I was surprised to see that x(i) = x(i)**4 was way faster than x(i) = x(i)**4.0 (about 0.049 s vs. 0.121 s per repetition in my results, roughly 2.5x). I expected the compiler to optimize this, since the exponent 4.0 is a compile-time constant: it could simply have converted 4.0 to 4. But for some reason it didn't. Kindly share any tips you know. Thanks!

Here's my benchmark code and compiler flags:
Code:
```fortran
! COMPILE : gfortran -g -O3 -mavx2 power.F90
! DISASM  : objdump -d -SC -Mintel a.out
program main
  implicit none
  integer :: n, r, rmax
  real    :: time_start, time_end

  n    = 8**8
  rmax = 10
  print *, "N : ", n
  print *, "REPITIONS : ", rmax
  print *, "--------------------------------------------"
  print *, "BENCHMARK x(i) = x(i)**4"
  print *, ""
  do r = 1, rmax
    call cpu_time(time_start)
    call power_integer(n)
    call cpu_time(time_end)
    print *, " - TIME ELAPSED ", time_end - time_start
  end do

  print *, "--------------------------------------------"
  print *, "BENCHMARK x(i) = x(i)**4.0"
  print *, ""
  do r = 1, rmax
    call cpu_time(time_start)
    call power_real(n)
    call cpu_time(time_end)
    print *, " - TIME ELAPSED ", time_end - time_start
  end do

contains

  ! Integer exponent: the compiler can expand x**4 into multiplications.
  subroutine power_integer(n)
    integer, intent(in) :: n
    real, allocatable   :: x(:)
    integer :: i
    allocate(x(n))
    do i = 1, iand(n, -8)      ! n rounded down to a multiple of 8
      x(i) = 2.0
    end do
    do i = 1, iand(n, -8)
      x(i) = x(i)**4
    end do
    deallocate(x)
  end subroutine power_integer

  ! Real exponent: without relaxed floating-point flags this is typically
  ! a call to the math library's single-precision pow routine.
  subroutine power_real(n)
    integer, intent(in) :: n
    real, allocatable   :: x(:)
    integer :: i
    allocate(x(n))
    do i = 1, iand(n, -8)
      x(i) = 2.0
    end do
    do i = 1, iand(n, -8)
      x(i) = x(i)**4.0
    end do
    deallocate(x)
  end subroutine power_real

end program main
```
Here's my result:
Code:
```text
N : 16777216
REPITIONS : 10
--------------------------------------------
BENCHMARK x(i) = x(i)**4

 - TIME ELAPSED 0.105583996
 - TIME ELAPSED 6.43480048E-02
 - TIME ELAPSED 4.91680056E-02
 - TIME ELAPSED 4.82190102E-02
 - TIME ELAPSED 4.89619970E-02
 - TIME ELAPSED 4.88489866E-02
 - TIME ELAPSED 4.95190024E-02
 - TIME ELAPSED 4.94549870E-02
 - TIME ELAPSED 4.96839881E-02
 - TIME ELAPSED 4.93779778E-02
--------------------------------------------
BENCHMARK x(i) = x(i)**4.0

 - TIME ELAPSED 0.121948004
 - TIME ELAPSED 0.121810973
 - TIME ELAPSED 0.121183991
 - TIME ELAPSED 0.120921016
 - TIME ELAPSED 0.120067954
 - TIME ELAPSED 0.121345043
 - TIME ELAPSED 0.120756984
 - TIME ELAPSED 0.120249987
 - TIME ELAPSED 0.120095015
 - TIME ELAPSED 0.120734930
```
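A few general benchmarking practices apply to a test like this. In the code above, the timed region also contains allocate, the initialisation loop and deallocate, and because x is never read after the power loop, an optimizing compiler is in principle allowed to remove that loop as dead code. Below is a minimal sketch (not the original poster's code) that keeps setup outside the timed region, measures wall-clock time with system_clock, keeps the best of several repetitions (which also discards warm-up effects such as the slower first repetition of the integer benchmark), and consumes the result through a checksum. The module name power_bench and the routine time_power are hypothetical; random_number is used so the input is not a constant the compiler could fold. This reduces, but does not fully rule out, the ways a compiler can distort a micro-benchmark.

```fortran
! Sketch of a more careful micro-benchmark for x**4 vs x**4.0.
! Assumption: module/routine names are hypothetical, chosen for illustration.
module power_bench
  use, intrinsic :: iso_fortran_env, only: int64, real32, real64
  implicit none
  private
  public :: time_power

contains

  ! Times `reps` passes of y = x**4 (integer exponent) or y = x**4.0
  ! (real exponent). Allocation and initialisation are outside the timed
  ! region; the minimum wall-clock time of one pass is returned.
  ! `checksum` uses y so the compiler cannot discard the loop as dead code.
  subroutine time_power(n, reps, real_exponent, best_seconds, checksum)
    integer, intent(in)       :: n, reps
    logical, intent(in)       :: real_exponent
    real(real64), intent(out) :: best_seconds, checksum
    real(real32), allocatable :: x(:), y(:)
    integer(int64) :: t0, t1, rate
    integer :: r, i

    allocate(x(n), y(n))
    call random_number(x)              ! values in [0,1): x**4 cannot overflow
    call system_clock(count_rate=rate)
    best_seconds = huge(1.0_real64)
    checksum = 0.0_real64

    do r = 1, reps
      call system_clock(t0)
      if (real_exponent) then
        do i = 1, n
          y(i) = x(i)**4.0
        end do
      else
        do i = 1, n
          y(i) = x(i)**4
        end do
      end if
      call system_clock(t1)
      best_seconds = min(best_seconds, real(t1 - t0, real64) / real(rate, real64))
      checksum = checksum + sum(real(y, real64))   ! consume the result
    end do

    deallocate(x, y)
  end subroutine time_power

end module power_bench
```

For example, call time_power(8**8, 10, .true., t, c) and print both t and c; printing the checksum is what keeps the work observable. The measured ratio will still depend on the compiler, its version and the flags used.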
November 19, 2020, 10:14

#2
Senior Member
The general reasoning is that you are in charge of these choices. Should the compiler analyze every occurrence of a real in your code to work out what you actually meant? Intel would probably do it (see here https://community.intel.com/t5/Intel...on/td-p/924287), but your mileage with other compilers might vary. So it is certainly good practice to use integer exponents in this case, and in general whenever the operation can be translated into a much cheaper one: here, integer exponentiation is just repeated multiplication. But this might actually be an extreme case.
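To make "integer exponentiation is just multiplication" concrete: for an integer exponent a compiler can use exponentiation by squaring, which needs only about log2(n) multiplications, whereas a real exponent such as 4.0 generally goes through a general-purpose pow routine. As far as I know, GCC only rewrites pow with an integer-valued real constant into multiplications under relaxed floating-point options such as -ffast-math, because the two results are not guaranteed to round identically. The function ipow below is a hypothetical illustration of the technique, not what any particular compiler emits.

```fortran
! Illustration only: `ipow` is a hypothetical helper, not a library routine.
module int_power
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: ipow
contains
  ! Exponentiation by squaring: walks the bits of |n|, squaring the base at
  ! each step and multiplying it into the result when the bit is set.
  pure function ipow(x, n) result(y)
    real(real64), intent(in) :: x
    integer, intent(in)      :: n
    real(real64) :: y, base
    integer :: k
    y = 1.0_real64
    base = x
    k = abs(n)                           ! assumes n > -huge(n)-1
    do while (k > 0)
      if (iand(k, 1) == 1) y = y * base  ! lowest bit set: multiply in
      base = base * base                 ! square for the next bit
      k = ishft(k, -1)                   ! move to the next bit
    end do
    if (n < 0) y = 1.0_real64 / y        ! negative exponent: reciprocal
  end function ipow
end module int_power
```

For n = 4 this performs only a handful of multiplications; a compiler expanding x**4 directly can do it in two, as (x*x)*(x*x).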
Tricks really depend on what you know or don't. Also, most things in Fortran are, purportedly, not standardized. For example, did you know that, even though arguments are passed by reference, most compilers (probably all of them) will make a copy of the input to a subroutine if it is a non-contiguous strided section of a larger array (at least when the dummy argument is an explicit-shape or assumed-size array)? This means that, sometimes, it is better to pass the whole array and do the indexing inside the subroutine, if that makes sense.
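A small sketch of the copy-in/copy-out situation described above, assuming a gfortran-style compiler; the module and function names are hypothetical. Passing a strided section such as a(1:10:2) to an explicit-shape dummy forces a contiguous temporary copy, an assumed-shape dummy (v(:)) receives a descriptor carrying the stride and can usually work in place, and sum_strided follows the suggestion of passing the whole array and indexing inside. With gfortran, -Warray-temporaries reports at compile time where such temporaries are created, and -fcheck=array-temps reports them at run time.

```fortran
! Illustration only: module and function names are hypothetical.
module stride_demo
  implicit none
  private
  public :: sum_explicit, sum_assumed, sum_strided
contains
  ! Explicit-shape dummy: the storage must be contiguous, so a strided
  ! actual argument such as a(1:10:2) is copied into a temporary first.
  function sum_explicit(m, v) result(s)
    integer, intent(in) :: m
    real, intent(in)    :: v(m)
    real :: s
    s = sum(v)
  end function sum_explicit

  ! Assumed-shape dummy: a descriptor with the stride is passed, so a
  ! strided section can usually be used in place without a copy.
  function sum_assumed(v) result(s)
    real, intent(in) :: v(:)
    real :: s
    s = sum(v)
  end function sum_assumed

  ! Pass the whole (contiguous) array and do the striding inside.
  function sum_strided(v, stride) result(s)
    real, intent(in)    :: v(:)
    integer, intent(in) :: stride
    real :: s
    integer :: i
    s = 0.0
    do i = 1, size(v), stride
      s = s + v(i)
    end do
  end function sum_strided
end module stride_demo
```

For example, with a = [1.0, 2.0, ..., 10.0], sum_explicit(5, a(1:10:2)), sum_assumed(a(1:10:2)) and sum_strided(a, 2) all return the same value, but compiling with -Warray-temporaries should flag only the first call.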