Best coding and benchmarking practices for Fortran

#1  aerosayan (Sayan Bhattacharjee), November 19, 2020, 07:49
What are the best coding and benchmarking practices for Fortran?


I was surprised to see that x(i) = x(i)**4 was WAAAAAYYYYYY faster than x(i) = x(i)**4.0 (about 0.049 s versus 0.12 s per run in the results below).


I expected the compiler to optimize this on its own, since the constant exponent 4.0 is visible at compile time and could simply have been converted to 4. But for some reason it didn't.


Kindly share any tips you know.


Thanks



Here's my benchmark code and compiler flags:


Code:
! COMPILE : gfortran -g -O3 -mavx2  power.F90
! DISASM  : objdump -d -SC -Mintel  a.out

program main
implicit none
integer *4 :: n, r, rmax
real       :: time_start, time_end

n = 8**8
rmax = 10

print *, "N            : ", n
print *, "REPITIONS    : ", rmax
print *, "--------------------------------------------"

print *, "BENCHMARK x(i) = x(i)**4"
print *, ""

do r=1,rmax
    call cpu_time(time_start)
    call power_integer(n)
    call cpu_time(time_end)

    print *, " - TIME ELAPSED      ", time_end - time_start
end do
print *, "--------------------------------------------"

print *, "BENCHMARK x(i) = x(i)**4.0"
print *, ""

do r=1,rmax
    call cpu_time(time_start)
    call power_real(n)
    call cpu_time(time_end)

    print *, " - TIME ELAPSED      ", time_end - time_start
end do

contains

subroutine power_integer(n)
implicit none
real    *4 , dimension(:), allocatable :: x
integer *4 :: n, i

allocate(x(n))

do i=1,AND(n,-8)
    x(i) = 2.0
end do

do i=1,AND(n,-8)
    x(i) = x(i)**4
end do

deallocate(x)
end subroutine

subroutine power_real(n)
implicit none
real    *4 , dimension(:), allocatable :: x
integer *4 :: n, i

allocate(x(n))

do i=1,AND(n,-8)
    x(i) = 2.0
end do

do i=1,AND(n,-8)
    x(i) = x(i)**4.0
end do

deallocate(x)
end subroutine

end program

Here's my result:


Code:
 N            :     16777216
 REPITIONS    :           10
 --------------------------------------------
 BENCHMARK x(i) = x(i)**4
 
  - TIME ELAPSED        0.105583996    
  - TIME ELAPSED         6.43480048E-02
  - TIME ELAPSED         4.91680056E-02
  - TIME ELAPSED         4.82190102E-02
  - TIME ELAPSED         4.89619970E-02
  - TIME ELAPSED         4.88489866E-02
  - TIME ELAPSED         4.95190024E-02
  - TIME ELAPSED         4.94549870E-02
  - TIME ELAPSED         4.96839881E-02
  - TIME ELAPSED         4.93779778E-02
 --------------------------------------------
 BENCHMARK x(i) = x(i)**4.0
 
  - TIME ELAPSED        0.121948004    
  - TIME ELAPSED        0.121810973    
  - TIME ELAPSED        0.121183991    
  - TIME ELAPSED        0.120921016    
  - TIME ELAPSED        0.120067954    
  - TIME ELAPSED        0.121345043    
  - TIME ELAPSED        0.120756984    
  - TIME ELAPSED        0.120249987    
  - TIME ELAPSED        0.120095015    
  - TIME ELAPSED        0.120734930

#2  sbaffini (Paolo Lampitella), November 19, 2020, 10:14
The general reasoning is that you are in charge of what you write. Should the compiler analyze every real constant in your code to figure out what you actually meant? Intel would probably do it (see here https://community.intel.com/t5/Intel...on/td-p/924287), but your mileage with other compilers may vary. So it is certainly good practice to use an integer exponent in this case, and whenever the integer form translates to a very different operation. Here, integer exponentiation is just repeated multiplication, whereas a real exponent generally calls for a general power routine. But this might actually be an extreme case.
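
As a rough illustration of the point about integer exponents (a sketch added for clarity, not code from the thread): the module name pow_demo and the three function names below are made up for this example. It only shows that x**4 and a hand-written pair of multiplications give the same values; how a given compiler actually lowers x**4.0 depends on the compiler and flags, so inspect the disassembly before drawing conclusions.

```fortran
module pow_demo
  implicit none
contains
  ! Integer exponent: compilers typically expand this into multiplications.
  elemental function pow4_int(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x**4
  end function pow4_int

  ! Real exponent: may be lowered to a general power routine,
  ! depending on the compiler and optimization flags.
  elemental function pow4_real(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x**4.0
  end function pow4_real

  ! Hand-written equivalent of the integer case: two multiplications.
  elemental function pow4_mul(x) result(y)
    real, intent(in) :: x
    real :: y, x2
    x2 = x*x
    y  = x2*x2
  end function pow4_mul
end module pow_demo

program main
  use pow_demo
  implicit none
  real :: x(4)

  x = [1.0, 2.0, 3.0, 0.5]

  print *, "x**4      : ", pow4_int(x)
  print *, "(x*x)**2  : ", pow4_mul(x)
  print *, "x**4.0    : ", pow4_real(x)

  ! The integer-exponent form should agree with explicit multiplication.
  if (any(abs(pow4_int(x) - pow4_mul(x)) > 1.0e-5*abs(pow4_mul(x)))) then
    error stop "integer power and explicit multiplication disagree"
  end if
end program main
```

If the exponent is known to be a whole number, writing it as an integer literal (or spelling out the multiplications) removes any dependence on the compiler spotting the special case.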

Which tricks matter really depends on what you already know. Also, many things in Fortran are purposely left unspecified by the standard. For example, did you know that, even though arguments are usually passed by reference, most compilers (probably all of them) will likely make a copy of your input to a subroutine if it is a non-contiguous section of a larger array? This means that it is sometimes better to pass the whole array and do the indexing inside the subroutine, if that makes sense.
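
A small sketch of the array-section point, under some assumptions: the module stride_demo and its procedure names are invented for this example, and whether a temporary copy is actually made depends on the compiler and on how the dummy argument is declared. As far as I know, the copy is most likely when the dummy is explicit-shape (it must look contiguous to the callee), while an assumed-shape dummy can usually take the strided section as is. With gfortran, the -Warray-temporaries option can report where such temporaries are created.

```fortran
module stride_demo
  implicit none
contains
  ! Assumed-shape dummy: normally receives a descriptor with strides,
  ! so a non-contiguous section can be passed without a copy.
  function sees_contiguous(x) result(c)
    real, intent(in) :: x(:)
    logical :: c
    c = is_contiguous(x)   ! Fortran 2008 intrinsic
  end function sees_contiguous

  ! Explicit-shape dummy: the callee expects contiguous storage, so the
  ! compiler may copy a strided actual argument into a temporary first.
  function sum_explicit(n, x) result(s)
    integer, intent(in) :: n
    real,    intent(in) :: x(n)
    real :: s
    s = sum(x)
  end function sum_explicit

  ! Alternative from the post: pass the whole array, index inside.
  function sum_row(a, irow) result(s)
    real,    intent(in) :: a(:,:)
    integer, intent(in) :: irow
    real    :: s
    integer :: j
    s = 0.0
    do j = 1, size(a, 2)
      s = s + a(irow, j)
    end do
  end function sum_row
end module stride_demo

program main
  use stride_demo
  implicit none
  real :: a(4,4)

  a = 1.0
  a(2,:) = 2.0

  ! a(2,:) is a row of a column-major array: elements are 4 reals apart.
  print *, "row section seen as contiguous?    ", sees_contiguous(a(2,:))
  ! a(:,1) is a column: elements are adjacent in memory.
  print *, "column section seen as contiguous? ", sees_contiguous(a(:,1))

  print *, "sum via explicit-shape dummy : ", sum_explicit(4, a(2,:))
  print *, "sum via whole array + index  : ", sum_row(a, 2)
end program main
```

Both sums give the same answer; the difference is only in whether the compiler has to build a temporary to make the call, which is worth checking for sections passed inside hot loops.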