July 28, 2023, 06:28 |
Self-built InfiniBand cluster?
#1 |
New Member
Join Date: Aug 2014
Posts: 18
Rep Power: 12 |
Hi all,
We have five workstations at hand, each with:
1. 1× 32-core AMD EPYC 7532 CPU
2. 8× 16 GB DDR4-3200 RAM
3. 1× 1 TB Samsung 980 Pro SSD for storage

Each configuration tries to make the most of the EPYC CPU's 8-channel memory support by keeping the core count relatively low and populating all eight RAM slots. Not too outdated so far, I hope?

Now we need to set up a small-scale cluster to run our own CFD code in parallel. The code is based on:
1. a finite-volume discretization of the Navier-Stokes equations with a SIMPLE-like algorithm on a structured mesh;
2. a distributed parallel strategy using the Message Passing Interface (MPI) to exchange the halo regions of the FV mesh after domain decomposition (a minimal sketch of this exchange pattern is given below). Each decomposed domain solves the equations independently on one or more CPU cores.

So far, all programming and testing of the code's MPI parallelism has been confined to a single machine of the kind described above, to mimic a cluster with intra-node data exchange only. We intend to break through this hardware limitation in the near future and expect good (near-linear) core-count-versus-speedup scaling up to ~100 CPU cores.

To achieve this goal, I know that the inter-node data-exchange rate would be the bottleneck (analogous to the RAM-bandwidth bottleneck within a node), so the InfiniBand (IB) network should be as fast as possible (i.e., as low-latency/high-bandwidth as possible). Therefore, what we want is a cluster consisting of the machines above (each serving as a node) connected by a small IB network. The budget shouldn't be too high (<$800 per node). A 100 or 200 Gbps IB network is perhaps a good choice.

Theoretically, there are several easy ways to set up the network:
Case 1: a three-node cluster with a ring topology;
Case 2: a three-node cluster with a star topology;
Case 3: a five-node cluster with a ring topology;
Case 4: a five-node cluster with a star topology.

See my illustrative image here for clarity: https://www.cfd-online.com/Forums/at...1&d=1690535873

Note that Cases 1 and 3, with a ring topology, avoid the use of an expensive 100/200G IB switch (Case 1 features direct node-to-node connections), while Cases 2 and 4, with a star topology, require an IB switch and can thus be ruled out.

I have no idea how an IB network works or how to build an efficient one on a limited budget. Here are my questions:
1. Can the approach of Cases 1 and 3 work without an IB switch, which is too expensive?
2. Can a 100 or 200 Gbps IB network, with or without a switch, achieve the goal of near-linear scaling up to ~100 cores (assuming the code's parallel algorithm is efficient enough)?
3. If inter-node latency matters more than bandwidth, would an older low-latency 40 or 56 Gbps network be sufficient?

Here are some Ansys demonstrations using AMD EPYC CPUs and a 100 or 200 Gbps IB network, which show good scaling:
https://www.cfd-online.com/Forums/at...1&d=1690535832
https://www.cfd-online.com/Forums/at...1&d=1690535908

Thanks!
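To make the communication pattern concrete, here is a minimal sketch of the halo exchange mentioned above (not our production code): a 1-D slab decomposition in C with MPI, where each rank swaps one ghost-cell layer with its neighbours every iteration. The grid size NX and the single scalar field are placeholder assumptions for illustration only.

Code:
/* Minimal 1-D halo-exchange sketch (illustrative only, not the actual CFD code).
 * Each MPI rank owns NX cells plus one ghost cell on each side and swaps the
 * ghost values with its left/right neighbour once per iteration. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 1000000  /* cells per rank along the decomposed axis (assumed value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Local field with one ghost cell on each side: indices 0 and NX+1. */
    double *u = malloc((NX + 2) * sizeof(double));
    for (int i = 1; i <= NX; ++i) u[i] = (double)rank;

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Swap halo values with the neighbours; in a pure halo-exchange scheme
     * this is the only traffic that ever crosses the interconnect. */
    MPI_Sendrecv(&u[1],      1, MPI_DOUBLE, left,  0,
                 &u[NX + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NX],     1, MPI_DOUBLE, right, 1,
                 &u[0],      1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0) printf("halo exchange completed on %d ranks\n", size);

    free(u);
    MPI_Finalize();
    return 0;
}

The point of the sketch is that only the ghost layers travel between ranks; whether those messages cross the IB link or stay in shared memory depends on how the ranks are mapped onto nodes.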
-----------------------------------------------
Prices of IB hardware for reference (US$, from eBay):
40G QDR: switch: Mellanox IS5023 18-port, US$100; NIC*: Mellanox ConnectX-3 dual-port, US$25
56G FDR: switch: Mellanox SX6036 36-port, US$100; NIC: Mellanox Connect-IB dual-port, US$50
100G EDR (PCIe 4.0 x16 / 3.0 x16): switch: Mellanox SB7800 36-port, US$1,700; NIC: Mellanox ConnectX-5 dual-port, US$300
200G HDR (PCIe 4.0 x16 / 3.0 x16): switch: NVIDIA Mellanox QM8700, US$5,000 each; NIC: NVIDIA Mellanox ConnectX-6 dual-port, US$600
400G NDR (PCIe 5.0 x16 / 4.0 x16): switch: NVIDIA Mellanox QM9700, US$19,000; NIC: NVIDIA Mellanox ConnectX-7 dual-port, US$800
*NIC: network interface card

Last edited by Freewill1; July 30, 2023 at 10:23. Reason: fix typo
July 28, 2023, 14:10 |
#2 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Q2: Yes, a 100 or 200 Gbps IB network can achieve near-linear scaling, because your cluster is very small.

Q3: Yes, the older 40 or 56 Gbps networks will be fine. In fact, you will probably approach your performance goal even with 1 Gbps Ethernet. This is because, normally, only the boundary (halo) vectors have to be exchanged between nodes, so the network bandwidth required per iteration is much lower than the local core-to-DRAM bandwidth. If you misconfigure the cluster so that each node has to re-read grid information from a single node at every iteration, the traffic would be much larger. Normally, though, repeated reads of the same data from a disk or shared volume end up cached in memory, so the data has to be read only once rather than at every iteration.
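To put rough numbers on that, assume for illustration a 200^3-cell subdomain per core and 5 double-precision variables per cell with one ghost layer per face (these figures are assumptions, not measurements). A quick estimate of halo traffic versus local data then looks like this:

Code:
/* Back-of-envelope estimate of per-exchange halo traffic for one core.
 * All numbers are illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    const double n         = 200.0;  /* cells per edge of a cubic subdomain (assumed) */
    const double nvars     = 5.0;    /* u, v, w, p plus one scalar (assumed)          */
    const double bytes_dbl = 8.0;    /* double precision                              */

    double halo_bytes  = 6.0 * n * n * nvars * bytes_dbl;  /* 6 faces, 1 ghost layer */
    double field_bytes = n * n * n * nvars * bytes_dbl;    /* whole local field      */

    printf("halo per core per exchange : %.1f MB\n", halo_bytes / 1e6);   /* ~9.6 MB */
    printf("local field per core       : %.1f MB\n", field_bytes / 1e6);  /* ~320 MB */
    printf("halo/field ratio           : %.3f\n", halo_bytes / field_bytes);
    return 0;
}

Only the faces that lie on a node boundary actually cross the network; exchanges between cores on the same node stay in shared memory, so the inter-node requirement is even lower than this per-core figure suggests.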
Tags |
cfd, cluster, infiniband, scaling, self-built |