|
June 27, 2021, 03:59 |
share your best C/C++ trick/tips
|
#1 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Developers try to pick up C/C++ for coding scientific programs, and usually fall into one of the thousand different traps waiting for them. Time is short, and bugs are many. I'm starting this thread and inviting everyone to share their best tricks/tips. I'm starting with simple tips; maybe in future I will add more complex things.
In order to avoid spam, I will update it only once every month or so. Kindly mark your tips in order, so that anyone can find them easily in future.

>>> TIP 1 : std::cin, std::cout, and std::endl are generally very slow if used incorrectly.

std::cin, std::cout, and std::endl were made with user comfort in mind, not performance. So, they suck. By default, std::cin and std::cout sync with the old C functions printf and scanf. Due to this synchronization, cin and cout have to wait for any previous printf or scanf operations to finish. This is extremely slow. If you read or write a very large file, like a CFD solution file, after every N iterations, you're wasting a lot of time doing it with the default configuration of cin and cout. I have seen a lot of production code that wasted 4-5 seconds while loading or writing an extremely big solution from an ASCII file.

The trick is to tell cin and cout to never synchronize with printf or scanf. However, that means you can't use printf or scanf without re-activating the synchronization. Here's how to speed up cin and cout. Include this in your code, and see the results: Code:
// disable sync with C stdio, and untie cin from cout
ios::sync_with_stdio(false);
cin.tie(0);
cout.tie(0);
Code:
// std::endl is basically std::cout << "\n" << std::flush;

// slow
for(int i=0; ...) { cout << i << endl; }

// faster : flush once, after the loop
for(int i=0; ...) { cout << i << "\n"; }
cout << flush;

>>> TIP 2 : Loop macros for rapid development

Macros are evil. We should know that before we start. However, they can be used during development to make our life easier. I got severe hand pain from using a horrible keyboard at my office some 1.5 years ago, and I still haven't recovered. So, my keyboard shortcuts and aliases are probably the best optimized for speed and fewer keystrokes. One of the things I very much hate is writing for loops in C/C++. Fortran's do loops are significantly better. So, I came up with my own. And, by Gods, I love them. Code:
// forward loop
#define xdo(var, lo, hi) for(decltype(hi) var=(lo); var<(hi) ; ++var)
// reverse loop
#define xro(var, hi, lo) for(decltype(hi) var=(hi); var>=(lo); --var)
Code:
long long n = 10;
xdo(i,0,n)   cout << i << " "; cout << endl;
xro(i,n-1,0) cout << i << " "; cout << endl;

However, since these are macros, use them carefully and with judgement. If you mess up, everything's gonna blow up to high heaven. Additionally, there's a limitation: OpenMP can't detect these for loops, so you can't easily use #pragma omp parallel directives with this form. You have to write the loop out in normal form. But that's okay for me, as I only use it for rapid development. |
|
June 28, 2021, 06:53 |
|
#2 |
Super Moderator
|
I am very interested to know how people allocate multi-dimensional arrays.
For example, on a 3D structured grid, one needs to store data like this: double sol[nx][ny][nz][nvar]; This is easy in Fortran, but not so in C/C++, as there are no built-in multi-dimensional arrays. What are the best ways you have found? |
|
June 28, 2021, 07:24 |
|
#3 |
Senior Member
Uwe Pilz
Join Date: Feb 2017
Location: Leipzig, Germany
Posts: 744
Rep Power: 15 |
Cumbersome to write, but dynamic and located at the stack:
Code:
vector<vector<vector<double>>> myVar(X, vector<vector<double>>(Y, vector<double>(Z)));
__________________
Uwe Pilz -- Die der Hauptbewegung überlagerte Schwankungsbewegung ist in ihren Einzelheiten so hoffnungslos kompliziert, daß ihre theoretische Berechnung aussichtslos erscheint. (Hermann Schlichting, 1950) |
|
June 28, 2021, 08:32 |
|
#4 | |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Quote:
In case you're not using the data structure for performance-critical operations, you can follow the method provided by piu58. That seems correct, and a really good time-saving trick.

In case you're using the data structure for performance-critical operations, it's simple: you don't, unless you understand the underlying memory layout of nested std::vectors.

Fortran multi-dimensional arrays have the good property of using flat memory, i.e. the whole multi-dimensional array will be flattened into a single contiguous 1D block on the heap (if your array is dynamic and/or very large) or the stack (if your array is small). I'm not an expert on this topic, so I can't guarantee that's what happens in every implementation. However, in my experience this is how it works.

C++, however, doesn't have multi-dimensional array support; it only has 1D array support. So, in most general cases, you'll use std::vector to represent your dynamic 1D arrays. You can also use nested std::vectors, as std::vector<std::vector<double>>, to represent a 2-dimensional array or matrix. The problem is that the specification of std::vector only guarantees contiguous flat memory for one level of std::vector. That is, in your 2D nested std::vector, only the innermost layer, i.e. each std::vector<double>, will be laid out contiguously in a single chunk of memory. If we consider each std::vector<double> to be a row of std::vector<std::vector<double>>, there's only a guarantee of contiguous allocation within each row, and different rows can be far away from each other in memory.

The performance implication: you'll have severe cache misses if the different rows are far apart in memory. This is especially bad when your matrices are small (like 4x4 or 5x5), because due to the implementation of nested std::vectors, you might have severe cache misses when doing even simple mathematical operations on different rows of the matrices.

And since there's no guarantee of how far apart the different std::vector rows will be in memory, you have to consider the worst-case possibility: that they'll have to be loaded from RAM. That's why I manually allocate huge chunks of 1D memory of size NROWS*NCOLS for 2D matrices. This way, whether your matrix is small or big, your data will be contiguous in memory, so you'll have extremely good performance, provided you access the data correctly. However, data access is a little bit more complicated, so maybe I'll write about it in future.

Regarding the claim that nested vectors live on the stack: unfortunately, they don't. See https://stackoverflow.com/questions/8036474 . Only the header information of the std::vector, i.e. its basic housekeeping data, will be on the stack. Your data will be stored internally as a pointer to a huge chunk of contiguous 1D memory on the heap. This allows the vector to point to different locations on the heap in case it needs to change its size. Of course, the memory layout for multi-dimensional vectors is more complicated, as explained above. |
||
June 28, 2021, 10:46 |
|
#5 |
Senior Member
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 516
Rep Power: 20 |
Is there a reason not to use the boost library?
https://www.boost.org/doc/libs/1_63_.../doc/user.html |
|
June 28, 2021, 10:57 |
|
#6 | |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Quote:
If you can guarantee good/acceptable performance for your code, then there's nothing wrong with using any library. Personally, I don't like using any library except the most standard ones (and even there, I don't use every part of the STL for my critical sections), simply because I don't feel comfortable with the hidden abstractions of each library. Most of these libraries do a whole lot of template meta-programming and object-oriented programming, and that automatically worsens both compilation time and runtime performance. Other than that, boost seems like a well-used library for the portions of the code that aren't as performance-critical. It can be very useful for programming different things, like the GUI. Personally, I don't want the headache of downloading the correct boost version when deploying my code to a different machine. |
||
July 4, 2021, 04:34 |
|
#7 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
I remembered why I used to hate the boost library.
Here's a "design rationale" for a C++ program for calculating the distance between 2 points: https://archive.md/2014.04.28-125041...ry/design.html Read through it, and realize how badly they messed up even a function as simple as calculating the distance between 2 points. Their whole codebase is bloated, and is now a mental gymnastics routine. Realize that the boost developers are such geniuses, they've overflowed their integer IQ score variables, and now each have a negative IQ score. This is why I will never trust boost, or use it in my performance-critical code. |
|
July 4, 2021, 10:21 |
|
#8 | |
Senior Member
Uwe Pilz
Join Date: Feb 2017
Location: Leipzig, Germany
Posts: 744
Rep Power: 15 |
Dear aerosayan
that example is great! It shows how badly the code is bloated by a so-called well-structured library. Thank you for correcting my mistake (stack), and for your advice on using vector. It is not too hard to multiply the indices into a flat memory layout, as you mentioned.
__________________
Uwe Pilz -- Die der Hauptbewegung überlagerte Schwankungsbewegung ist in ihren Einzelheiten so hoffnungslos kompliziert, daß ihre theoretische Berechnung aussichtslos erscheint. (Hermann Schlichting, 1950) |
||
August 28, 2021, 18:27 |
|
#9 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
>>> TIP 3 : Stop calculating element indices for your matrix class
EDIT1 : After some thorough review, it seems like modern compilers are able to optimize index calculations very well. So, this trick isn't really necessary if your code is compiled with the latest and greatest compilers. However, if you're supporting older compilers and older hardware, this might be helpful. Still, do your own performance profiling and assembly analysis, and don't take everything given here as fact.

C++ doesn't have matrices, so many tend to create matrix classes that store the data inside long 1D arrays, access the matrix elements using row-major or column-major address formulas, and overload the parenthesis operator. That is a horrible way to do it, since each matrix element's address calculation requires many addition and multiplication operations, and they are not trivial. Here's a significantly better method that's easy to use, and performs well as per my preliminary analysis. It needs lots and lots of testing before I can say for sure that it performs great. Code:
#include <iostream>

int main()
{
    // linear 1d array containing data for a 4x4 2d matrix
    int array[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                     9, 10, 11, 12, 13, 14, 15, 16};

    // we store the beginning of each row in another array
    int* matrix[4] = { &array[0], &array[4], &array[8], &array[12] };

    // now access the 1d array as if it were a 2d matrix
    for(int i=0; i<4; ++i)
    {
        for(int j=0; j<4; ++j)
        {
            // courtesy of c style arrays, you can access pointers as arrays;
            // so we access the pointer to the head of each row as a new array,
            // and thus access the 1d array as a 2d matrix
            std::cout << matrix[i][j] << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Last edited by aerosayan; September 4, 2021 at 07:15. |
|
September 1, 2021, 20:17 |
|
#10 |
Senior Member
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8 |
Improved the previous code and made it more efficient: no extra memory required, and we can do a linear traversal through the flattened matrix, resulting in more efficient code.
Code:
#include <cstdio>
#include <iostream>
using namespace std;

template<typename tx>
struct dymatrix
{
    tx * head;
    int ecols, erows;
    inline tx*       operator[](int i)       { return &head[i]; }
    inline const tx* operator[](int i) const { return &head[i]; }
};

int main()
{
    printf("matrix use demonstration...\n");

    float a[100];
    for(int i=0; i<100; ++i) a[i] = i;

    dymatrix<float> h;
    h.head  = a; // array decays to float*, no cast needed
    h.ecols = 10;
    h.erows = 10;

    // i jumps by one full row each iteration; j walks along the row,
    // so h[i][j] == head[i + j] addresses element (i/ecols, j)
    for(int i=0; i<100; i+=h.ecols)
    {
        for(int j=0; j<10; ++j)
        {
            const float x = h[i][j]; // mark
            cout << x << " ";
        }
        cout << endl;
    }
    return 0;
}
matrix use demonstration... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 Code:
const float x = h[i][j]; // mark 401c99: vmovss xmm0,DWORD PTR [rsp+r14*4+0x28] // <- NOTICE THIS!!! Assembly generated here, uses register based maths, so most likely they'reextremely fast. I have seen fortran generate such code, and previously I thought that they were slow. But since the address calculation is using registers, they were actually extremely fast. obviously, more test required. i'm tired. bye. EDIT : Modern C++ compilers can optimize naive implementations very well. Although, I like to write code that works well with older compilers too. So, kindly do your own profiling, as you might not need this technique. But it's a good trick, and useful for writing matrix/tensor math libraries. |
|
Tags |
c++ tips and tricks |
|
|