|
[Sponsors] |
Fortran->Assembly : How to remove redundant non-vectorized code? |
|
LinkBack | Thread Tools | Search this Thread | Display Modes |
November 17, 2020, 10:47 |
Fortran->Assembly : How to remove redundant non-vectorized code?
|
#1 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
SUMMARY :
When gfortran vectorizes an equation such as x = y*z , the compiler will only use vector registers (such as YMM) and vector opcodes, if and only if the compiler knows that all of the data (x,y,z) can be loaded on said vector registers. However if the compiler doesn't know whether all of the data can be loaded onto the registers, it will generate redundant code such that the remaining data can be operated upon. This redundant code will generally not use the highest available SIMD vector registers and will bloat the executable causing ICACHE misses or preventing function/subroutine inlining optimizations. I would like to remove this redundant code since I know that all of my data (x,y,z) will fit perfectly on the YMM vector registers. DETAILS : Code compiled and run on Intel i3 CPU with AVX2 SIMD support. I observed that for the sample operation x = y*z, with array lengths of 8 each, the compiler is able to generate only vectorized code for the equation x = y*z. This happens since 8 integers can be perfectly fit onto a AVX2 YMM register. CODE : Code:
! COMPILE : gfortran -g -O3 -mavx2 -march=skylake fortran_vectorization_1a.dis.F90 ! DISASM : objdump -d -SC -Mintel a.out program main implicit none integer *4, dimension(8) :: x, y, z ! Initialize x = 0 y = (/10, 11, 12, 13, 14, 15, 16, 17/) z = (/1 , 2, 3, 4, 5, 6, 7, 8/) ! Vector multiplication x(:) = y*z ! Prevent dead code elimination print *, x end program ASSEMBLY : Slightly cleaned up Code:
00000000000011b0 <MAIN__>: program main 11b0: 55 push rbp program main 11b8: 48 89 e5 mov rbp,rsp 11bb: 41 54 push r12 11bd: 48 83 e4 e0 and rsp,0xffffffffffffffe0 11c1: 48 81 ec c0 02 00 00 sub rsp,0x2c0 y = (/10, 11, 12, 13, 14, 15, 16, 17/) 11c8: c5 f9 6f 0d 90 0e 00 vmovdqa xmm1,XMMWORD PTR [rip+0xe90] # 2060 <A.0.3879> 11cf: 00 11d0: c5 f9 6f 15 98 0e 00 vmovdqa xmm2,XMMWORD PTR [rip+0xe98] # 2070 <A.0.3879+0x10> 11d7: 00 z = (/1 , 2, 3, 4, 5, 6, 7, 8/) 11d8: c5 f9 6f 1d 60 0e 00 vmovdqa xmm3,XMMWORD PTR [rip+0xe60] # 2040 <A.1.3881> 11df: 00 11e0: c5 f9 6f 25 68 0e 00 vmovdqa xmm4,XMMWORD PTR [rip+0xe68] # 2050 <A.1.3881+0x10> 11e7: 00 y = (/10, 11, 12, 13, 14, 15, 16, 17/) 11e8: c5 f8 29 4c 24 20 vmovaps XMMWORD PTR [rsp+0x20],xmm1 11ee: c5 f8 29 54 24 30 vmovaps XMMWORD PTR [rsp+0x30],xmm2 x(:) = y*z 11f4: c5 fd 6f 6c 24 20 vmovdqa ymm5,YMMWORD PTR [rsp+0x20] z = (/1 , 2, 3, 4, 5, 6, 7, 8/) 120a: c5 f8 29 5c 24 40 vmovaps XMMWORD PTR [rsp+0x40],xmm3 1210: c5 f8 29 64 24 50 vmovaps XMMWORD PTR [rsp+0x50],xmm4 x(:) = y*z 1223: c4 e2 55 40 44 24 40 vpmulld ymm0,ymm5,YMMWORD PTR [rsp+0x40] x(:) = y*z 123d: c5 fd 7f 04 24 vmovdqa YMMWORD PTR [rsp],ymm0 However when we increase the length of the arrays to 9, one integer can't be fit on a YMM vector register. So, the compiler has to generate extra non-vectorized code to take care of the last integer. CODE : Code:
! COMPILE : gfortran -g -O3 -mavx2 -march=skylake fortran_vectorization_1b.dis.F90 ! DISASM : objdump -d -SC -Mintel a.out program main implicit none integer *4, dimension(9) :: x, y, z ! Initialize x = 0 y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) z = (/1 , 2, 3, 4, 5, 6, 7, 8, 9/) ! Vector multiplication x(:) = y*z ! Prevent dead code elimination print *, x end program ASSEMBLY : Slightly cleaned up... Code:
00000000000011b0 <MAIN__>: program main 11b0: 55 push rbp 11b1: 48 89 e5 mov rbp,rsp 11b4: 41 54 push r12 11b6: 48 83 e4 e0 and rsp,0xffffffffffffffe0 11ba: 48 81 ec 00 03 00 00 sub rsp,0x300 y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) 11c1: 8b 15 d9 0e 00 00 mov edx,DWORD PTR [rip+0xed9] # 20a0 <A.0.3879+0x20> z = (/1 , 2, 3, 4, 5, 6, 7, 8, 9/) 11c7: 8b 05 93 0e 00 00 mov eax,DWORD PTR [rip+0xe93] # 2060 <A.1.3881+0x20> y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) 11cd: c5 f9 6f 0d ab 0e 00 vmovdqa xmm1,XMMWORD PTR [rip+0xeab] # 2080 <A.0.3879> 11d4: 00 z = (/1 , 2, 3, 4, 5, 6, 7, 8, 9/) 11d5: 89 84 24 a0 00 00 00 mov DWORD PTR [rsp+0xa0],eax y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) 11dc: c5 f9 6f 15 ac 0e 00 vmovdqa xmm2,XMMWORD PTR [rip+0xeac] # 2090 <A.0.3879+0x10> 11e3: 00 x(:) = y*z 11e4: 0f af c2 imul eax,edx z = (/1 , 2, 3, 4, 5, 6, 7, 8, 9/) 11e7: c5 f9 6f 1d 51 0e 00 vmovdqa xmm3,XMMWORD PTR [rip+0xe51] # 2040 <A.1.3881> 11ee: 00 11ef: c5 f9 6f 25 59 0e 00 vmovdqa xmm4,XMMWORD PTR [rip+0xe59] # 2050 <A.1.3881+0x10> 11f6: 00 y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) 11f7: c5 f8 29 4c 24 40 vmovaps XMMWORD PTR [rsp+0x40],xmm1 11fd: c5 f8 29 54 24 50 vmovaps XMMWORD PTR [rsp+0x50],xmm2 x(:) = y*z 1203: 89 44 24 20 mov DWORD PTR [rsp+0x20],eax 1207: c5 fd 6f 6c 24 40 vmovdqa ymm5,YMMWORD PTR [rsp+0x40] z = (/1 , 2, 3, 4, 5, 6, 7, 8, 9/) 1224: c5 f8 29 9c 24 80 00 vmovaps XMMWORD PTR [rsp+0x80],xmm3 122b: 00 00 122d: c5 f8 29 a4 24 90 00 vmovaps XMMWORD PTR [rsp+0x90],xmm4 1234: 00 00 x(:) = y*z 1243: c4 e2 55 40 84 24 80 vpmulld ymm0,ymm5,YMMWORD PTR [rsp+0x80] 124a: 00 00 00 y = (/10, 11, 12, 13, 14, 15, 16, 17, 18/) 124d: 89 54 24 60 mov DWORD PTR [rsp+0x60],edx x(:) = y*z 1264: c5 fd 7f 04 24 vmovdqa YMMWORD PTR [rsp],ymm0 This is a good feature. However it becomes a problem when we use allocatable arrays. The compiler doesn't know the size of the array so it HAS TO generate both the vectorized and non-vectorized versions of the code. This can be seen in the next example. CODE : Code:
! COMPILE : gfortran -g -O3 -mavx2 -march=skylake fortran_vectorization_1c.dis.F90 ! DISASM : objdump -d -SC -Mintel --visualize-jumps a.out program main implicit none integer *4, dimension(:), allocatable :: x, y, z integer *4 :: n read *, n ! Allocate allocate(x(n)) allocate(y(n)) allocate(z(n)) ! Initialize x = 0 y(n/2) = n z(n/2) = n/2 ! Vector multiplication x(:) = y*z ! Prevent dead code elimination print *, x print *, y print *, z ! Scrub clean deallocate(x,y,z) end program ASSEMBLY : The generated code is too large. I'm only posting the x=y*z part. Code:
x(:) = y*z 1498: c4 c1 7e 6f 0c 07 vmovdqu ymm1,YMMWORD PTR [r15+rax*1] 149e: c4 c2 75 40 04 04 vpmulld ymm0,ymm1,YMMWORD PTR [r12+rax*1] 14a4: c4 c1 7e 7f 04 06 vmovdqu YMMWORD PTR [r14+rax*1],ymm0 14aa: 48 83 c0 20 add rax,0x20 14ae: 48 39 c2 cmp rdx,rax 14b1: 75 e5 jne 1498 <MAIN__+0x278> 14b3: 48 89 ca mov rdx,rcx 14b6: 48 83 e2 f8 and rdx,0xfffffffffffffff8 14ba: 48 8d 42 01 lea rax,[rdx+0x1] 14be: 48 39 ca cmp rdx,rcx 14c1: 0f 84 aa 01 00 00 je 1671 <MAIN__+0x451> 14c7: c5 f8 77 vzeroupper 14ca: 48 8d 50 ff lea rdx,[rax-0x1] 14ce: 41 8b 34 97 mov esi,DWORD PTR [r15+rdx*4] 14d2: 41 0f af 34 94 imul esi,DWORD PTR [r12+rdx*4] 14d7: 41 89 34 96 mov DWORD PTR [r14+rdx*4],esi 14db: 48 8d 50 01 lea rdx,[rax+0x1] 14df: 48 39 d1 cmp rcx,rdx 14e2: 7c 7b jl 155f <MAIN__+0x33f> 14e4: 41 8b 34 87 mov esi,DWORD PTR [r15+rax*4] 14e8: 41 0f af 34 84 imul esi,DWORD PTR [r12+rax*4] 14ed: 41 89 34 86 mov DWORD PTR [r14+rax*4],esi 14f1: 48 8d 70 02 lea rsi,[rax+0x2] 14f5: 48 39 f1 cmp rcx,rsi 14f8: 7c 65 jl 155f <MAIN__+0x33f> 14fa: 41 8b 3c 97 mov edi,DWORD PTR [r15+rdx*4] 14fe: 41 0f af 3c 94 imul edi,DWORD PTR [r12+rdx*4] 1503: 41 89 3c 96 mov DWORD PTR [r14+rdx*4],edi 1507: 48 8d 50 03 lea rdx,[rax+0x3] 150b: 48 39 d1 cmp rcx,rdx 150e: 7c 4f jl 155f <MAIN__+0x33f> 1510: 41 8b 3c b7 mov edi,DWORD PTR [r15+rsi*4] 1514: 41 0f af 3c b4 imul edi,DWORD PTR [r12+rsi*4] 1519: 41 89 3c b6 mov DWORD PTR [r14+rsi*4],edi 151d: 48 8d 70 04 lea rsi,[rax+0x4] 1521: 48 39 f1 cmp rcx,rsi 1524: 7c 39 jl 155f <MAIN__+0x33f> 1526: 41 8b 3c 97 mov edi,DWORD PTR [r15+rdx*4] 152a: 41 0f af 3c 94 imul edi,DWORD PTR [r12+rdx*4] 152f: 41 89 3c 96 mov DWORD PTR [r14+rdx*4],edi 1533: 48 8d 50 05 lea rdx,[rax+0x5] 1537: 48 39 d1 cmp rcx,rdx 153a: 7c 23 jl 155f <MAIN__+0x33f> 153c: 41 8b 3c b7 mov edi,DWORD PTR [r15+rsi*4] 1540: 48 83 c0 06 add rax,0x6 1544: 41 0f af 3c b4 imul edi,DWORD PTR [r12+rsi*4] 1549: 41 89 3c b6 mov DWORD PTR [r14+rsi*4],edi 154d: 48 39 c1 cmp rcx,rax 1550: 7c 0d jl 155f <MAIN__+0x33f> 1552: 41 8b 04 94 mov eax,DWORD PTR [r12+rdx*4] 1556: 41 0f af 04 97 imul eax,DWORD PTR [r15+rdx*4] 155b: 41 89 04 96 mov DWORD PTR [r14+rdx*4],eax We can clearly see how much bloat is generated. We know that we will only pass in an array of integers which has length of 8 or multiples of 8 to this vector operation. So, we don't actually need the rest of the 7 redundant IMUL operations. However since the arrays x,y,z are allocatalbe, the compiler doesn't know this. Is there anyway to tell the compiler to only generate vectorized code like we see in the first case? CONCLUSION: The three examples given above show that the compiler generates bloat when it doesn't know for sure that all of the data can be fit perfectly on AVX2 vector registers. This seems like a good opportunity for optimization. I would even call it necessary. Just look at the bloat generated for the following equation : Code:
! Find the area of a triangle by using cross product vols = 0.5*sqrt((x1*(y2-y3))**2.0 + (x2*(y3-y1))**2.0 + (x3*(y1-y2))**2.0) GODOT LINK : https://gcc.godbolt.org/z/oxcWzd Thanks Last edited by aerosayan; November 17, 2020 at 10:56. Reason: :) gets translated to smilies. Changed vols(:) to vols |
|
November 17, 2020, 15:26 |
|
#2 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
So I was able to use GODBOLT to try my code in Intel's Fortran compiler (x86-64 IFORT 21.1.9) and was able to generate satisfactory assembly code.
I don't have access to an IFORT compiler, so I can't use it, I'm still looking for how I could achieve such results in GFORTRAN compiler. Here's what I did : Looked up this Intel article "Data Allignment to Assist Vectorization" by Rakesh Krishnaiyer. URL : https://software.intel.com/content/www/us/en/develop/articles/data-alignment-to-assist-vectorization.html ARCHIVE : https://archive.vn/RHEIe I saw that we could give the compiler some information about the value N, so that it would be able to optimize the code based on it. As shown in the first example, we should get purely vectorized code when N%8==0. This is due to the YMM vector registers being able to hold 8 integers. So, I used !DIR$ ASSUME_ALIGNED X: 32 to tell the compiler that we have X aligned on 32. Then I used !DIR$ ASSUME (MOD(N,8) .EQ. 0) to tell the compiler that N will be a multiple of 8. I kept rest of the code same. Code:
program main implicitnone integer *4 :: n, i integer *4, dimension(:), allocatable :: x, y, z read *, n ! Allocate allocate(x(n)) allocate(y(n)) allocate(z(n)) ! Initialize x = 0 y(n/2) = n z(n/2) = n/2 ! Vector multiplication !DIR$ ASSUME_ALIGNED X: 32 !DIR$ ASSUME (MOD(N,8) .EQ. 0) do i=1,n x(i) = y(i)*z(i) enddo ! Prevent dead code elimination print *, x print *, y print *, z ! Scrub clean deallocate(x,y,z) endprogram Code:
..B1.15: vmovdqu ymm0, YMMWORD PTR [r13+r12*4] vpmulld ymm1, ymm0, YMMWORD PTR [r14+r12*4] vmovdqu YMMWORD PTR [rax+r12*4], ymm1 add r12, 8 cmp r12, rcx jb ..B1.15 Here's the live GODBOLT link for the code shown above : https://gcc.godbolt.org/z/PPf1zE Here's the live GODBOLT link for the code to calculate the volume of triangles using cross product : https://gcc.godbolt.org/z/Y5orh1 Now that I was able to optimize the code in IFORT, I know that it is a viable optimization. Unfortunately, I still don't know how to do it in GFORTRAN. Hopefully, someone will be able to help if they know more about GFORTRAN's and GCC's compiler directives. I will keep looking too. Last edited by aerosayan; November 17, 2020 at 15:46. Reason: Added godbolt link |
|
November 18, 2020, 03:26 |
|
#3 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
I optimized the solution for GFORTRAN compiler, but I'm not too happy with it.
The solution is to let the compiler know that the loop ends at a multiple of 8. Here's how I did it. Code:
! Vector multiplication do i=1,AND(n,-8) x(i) = y(i)*z(i) end do This lets the compiler know that it doesn't need to generate the redundant non-vectorized code. However I'm not too happy with it since gfortran should have compiler directives to make this easier. Anyway, here's the full code. Code:
! COMPILE : gfortran -g -O3 -mavx2 -march=skylake fortran_vectorization_1g.dis.F90 ! DISASM : objdump -d -SC -Mintel --visualize-jumps a.out program main implicit none integer *4, dimension(:), allocatable :: x, y, z integer *4 :: n, i read *, n ! Allocate allocate(x(n)) allocate(y(n)) allocate(z(n)) ! Initialize do i=1,n x(i) = 0 y(i) = i z(i) = 2*i end do ! Loop truncation math print *, "AND(N,-8) :", AND(N,-8) print *, "N - MOD(N,8) :", N - MOD(N,8) ! Vector multiplication do i=1,AND(n,-8) x(i) = y(i)*z(i) end do ! Prevent dead code elimination print *, "RESULT :" do i=1,n print *, y(i), " x", z(i) ,"=", x(i) end do ! Scrub clean deallocate(x,y,z) end program Here's the assembly code of the required operation x(i) = y(i)*z(i) : Code:
x(i) = y(i)*z(i) 15a8: 48 8b 4c 24 18 mov rcx,QWORD PTR [rsp+0x18] 15ad: c5 fe 6f 24 01 vmovdqu ymm4,YMMWORD PTR [rcx+rax*1] 15b2: 48 8b 4c 24 10 mov rcx,QWORD PTR [rsp+0x10] 15b7: c5 fd 7f 64 24 20 vmovdqa YMMWORD PTR [rsp+0x20],ymm4 15bd: c4 e2 5d 40 04 01 vpmulld ymm0,ymm4,YMMWORD PTR [rcx+rax*1] 15c3: 48 8b 4c 24 08 mov rcx,QWORD PTR [rsp+0x8] 15c8: c5 fe 7f 04 01 vmovdqu YMMWORD PTR [rcx+rax*1],ymm0 15cd: 48 83 c0 20 add rax,0x20 15d1: 48 39 d0 cmp rax,rdx 15d4: 75 d2 jne 15a8 <MAIN__+0x378> Running the executable and entering N=8 gives the result: Code:
8 AND(N,-8) : 8 N - MOD(N,8) : 8 RESULT : 1 x 2 = 2 2 x 4 = 8 3 x 6 = 18 4 x 8 = 32 5 x 10 = 50 6 x 12 = 72 7 x 14 = 98 8 x 16 = 128 Code:
31 AND(N,-8) : 24 N - MOD(N,8) : 24 RESULT : 1 x 2 = 2 2 x 4 = 8 3 x 6 = 18 4 x 8 = 32 5 x 10 = 50 6 x 12 = 72 7 x 14 = 98 8 x 16 = 128 9 x 18 = 162 10 x 20 = 200 11 x 22 = 242 12 x 24 = 288 13 x 26 = 338 14 x 28 = 392 15 x 30 = 450 16 x 32 = 512 17 x 34 = 578 18 x 36 = 648 19 x 38 = 722 20 x 40 = 800 21 x 42 = 882 22 x 44 = 968 23 x 46 = 1058 24 x 48 = 1152 25 x 50 = 0 26 x 52 = 0 27 x 54 = 0 28 x 56 = 0 29 x 58 = 0 30 x 60 = 0 31 x 62 = 0 This shows that the optimization was a success. Also observed HUGE optimization for the code that calculates volume of triangles : https://gcc.godbolt.org/z/1or59d However, I'm looking for input from other Fortran experts who know if we can achieve this result in an easier way as done in IFORT. Last edited by aerosayan; November 18, 2020 at 03:42. Reason: Added godbolt link |
|
November 18, 2020, 10:22 |
|
#4 |
Senior Member
|
You are certainly learning something, and I am certainly not an expert, but never were any of my codes so good that the next thing to optimize was at this level.
May I ask what sort of improvements are you achieving for your test code on triangle areas and what, if any, in your actual code? Is it actually worth the effort? |
|
November 18, 2020, 11:34 |
|
#5 | |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Quote:
I was just playing around when I observed this code bloat. I was implementing the triangle volume calculation for my CFD solver when I observed that the actual AVX2 vectorized assembly code (which uses YMM registers) was only around 14 lines while the generated SSE vectorized redundant code (which uses XMM registers) was around 7*14 lines. And the volume calculation equation wasn't even that complex. Imagine how much amount of bloat would be present in a generic number crunching function/subroutine. The method described here has tremendous advantages and applications: - Almost all of my vectorized operations can now be supercharged by removing the redundant code. All I need to do, is to ensure that I fill up the YMM registers completely. For example : If there are 31 cells, I will add an extra dummy cell to make the cell list have 32 elements. - This would allow less Instruction Cache misses and would allow the compiler to basically inline a lot of functions/subroutines, thus removing any cost of function/subroutine call. It would make sense to use such an optimization when we know that everything else will be accelerated. - I just used the volume calculation as an example, since it was sufficiently complicated and it was the first equation that I tried to optimize. Since ALL our cell data (eg, rho, u, v, E etc) is organized in 1D arrays, we can SIMD vectorize most of our CFD solver code (I agree it is tough to do and I need more experience with it). - The optimization is surprisingly portable. A YMM register would fit four 64 bit doubles. To get the same optimization, I just need to use AND(N,-4). For 32 bit floats, I would need AND(N,-8) since eight floats can be fit on a YMM register. When I need to run my code on a machine that supports AVX512, I would need to simply change the -8 and -4 part. This can be done using a #define macro. If the machine doesn't support AVX2, and only supports SSE, the code will still compile and run if we define the optimization as AND(N, MACRO_SIMD_MAX_PACKED_ELEMENTS_BLAH_BLAH). Last edited by aerosayan; November 18, 2020 at 11:40. Reason: Added last line. |
||
November 19, 2020, 13:24 |
|
#6 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
I did some testing to see if I could get the same optimization while continuing to use Fortran's amazing array slice operations.
Don't wanna write ugly code with xyz(i) 's everywhere. Here's how to do it : We just need to explicitly define the range of operation i.e 1,AND(N,-8) and the compiler understands that it can optimize. Code:
! Vector operartion vols(1:AND(N,-8)) = 0.5*sqrt((x1*(y2-y3))**2.0 + (x2*(y3-y1))**2.0 + (x3*(y1-y2))**2.0) GODBOLT LINK : https://gcc.godbolt.org/z/7PoKzf Assembly code generated : Code:
and ebx, -8 jle .L6 movsx rbx, ebx vmovaps ymm4, YMMWORD PTR .LC2[rip] sal rbx, 2 xor eax, eax .L9: vmovups ymm3, YMMWORD PTR [r9+rax] vmovups ymm1, YMMWORD PTR [r11+rax] vmovups ymm0, YMMWORD PTR [r15+rax] vsubps ymm2, ymm3, ymm1 vsubps ymm1, ymm1, ymm0 vsubps ymm0, ymm0, ymm3 vmulps ymm2, ymm2, YMMWORD PTR [r13+0+rax] vmulps ymm1, ymm1, YMMWORD PTR [r14+rax] vmulps ymm0, ymm0, YMMWORD PTR [r10+rax] vmulps ymm1, ymm1, ymm1 vfmadd231ps ymm1, ymm2, ymm2 vfmadd132ps ymm0, ymm1, ymm0 vsqrtps ymm0, ymm0 vmulps ymm0, ymm0, ymm4 vmovups YMMWORD PTR [r12+rax], ymm0 add rax, 32 cmp rax, rbx jne .L9 |
|
November 20, 2020, 06:37 |
|
#7 |
New Member
Pedro Costa
Join Date: Jul 2017
Posts: 9
Rep Power: 9 |
https://fortran-lang.discourse.group/
this forum may be better suited for a detailed discussion about the Fortran programming language. |
|
Tags |
assembly, fortran 90, solver deveopment |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Patankar CFD FORTRAN 90 Code FVM | siddiquesil | Main CFD Forum | 19 | October 25, 2022 06:21 |
Parallel Fortran 95 code using Openmp | Shiv1510 | Main CFD Forum | 5 | February 14, 2020 12:36 |
CGNS fortran code compiling with intel visual fortran xe | dokeun | Main CFD Forum | 1 | April 5, 2012 22:01 |
Comparison between C/C++ and Fortran? | rick | Main CFD Forum | 45 | September 6, 2011 01:52 |
CFD Code with Fortran | murat | Main CFD Forum | 0 | October 11, 2009 18:05 |