CFX performance scaling on multicore local server
July 18, 2016, 17:35 | #1
New Member
Sachin Aggarwal
Join Date: Aug 2014
Posts: 4
Rep Power: 12
Hi,

I have been running a frozen rotor problem on a local server with 19 cores in parallel using Intel MPI Local Parallel. My company recently purchased a new server so we can run 32 cores and make full use of our HPC licenses. When I run the same simulation on the new server with 32 cores it runs slower, but when I run it with the same 19 cores it runs a little faster than on the old server. Is there a setting I am missing? My simulation is large enough (14+ million elements) that, in my understanding, it should not have any multi-threading issues. Can anyone give me some insight or guidance on this? I really appreciate the help.

Thank you,
Sachin Aggarwal
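For reference, here is roughly how I launch the runs, as a Python sketch. The flag names follow the cfx5solve conventions as I understand them; check `cfx5solve -help` on your install, and treat the definition-file name and the exact start-method string as assumptions.

```python
# Hypothetical sketch of launching CFX runs with a given partition count.
# Verify flag names and the start-method string against `cfx5solve -help`.
import subprocess

def run_cfx(def_file: str, partitions: int) -> None:
    subprocess.run(
        ["cfx5solve",
         "-def", def_file,                 # solver definition file
         "-part", str(partitions),         # number of parallel partitions
         "-start-method", "Intel MPI Local Parallel"],
        check=True,                        # raise if the solver exits non-zero
    )

run_cfx("frozen_rotor.def", 19)   # old server run
run_cfx("frozen_rotor.def", 32)   # new server run
```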
July 18, 2016, 21:31 | #2
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
Effective multi-processor simulations require good interconnects, memory buses and much more.

If you have 32 cores in a single machine it needs to be carefully designed for multi-processor operation or it will not give you the performance you expect. Also check that you have not crippled the machine. Check that:
* the BIOS is current
* the motherboard, hard drive, ethernet and other drivers are correct and current
* the firmware is current in the hard drive and other gizmos
* you have not run out of memory
* you are not sharing the machine with other users
* your antivirus or other background processes are not causing problems (a quick script like the sketch below can check the last three).
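A minimal sketch for those last three checks, assuming Python with the third-party psutil package is available on the server:

```python
# Spot-check free memory and competing load while a solve is running.
# Requires the third-party psutil package (pip install psutil).
import psutil

mem = psutil.virtual_memory()
print(f"RAM used: {mem.used / 2**30:.1f} GiB of {mem.total / 2**30:.1f} GiB "
      f"({mem.percent:.0f}%)")

# Near-100% CPU during a solve is expected; sustained load *before* launching
# suggests other users or background jobs are competing for the machine.
print(f"CPU load over 5 s: {psutil.cpu_percent(interval=5)}%")

# List the heaviest processes so antivirus or backup jobs stand out.
procs = sorted(psutil.process_iter(['name', 'memory_percent']),
               key=lambda p: p.info['memory_percent'] or 0.0, reverse=True)
for p in procs[:5]:
    print(f"{p.info['name']:<30} {p.info['memory_percent'] or 0.0:5.1f}% RAM")
```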
July 19, 2016, 15:32 | #3
New Member
Sachin Aggarwal
Join Date: Aug 2014
Posts: 4
Rep Power: 12
Hi Glenn,

Thank you for your reply. We only completed the installation of this machine last week, so the hardware and software are new and were configured by a Dell technician. The server has 192 GB of RAM and only 12-15 GB is used while the simulation is running. I do have virtualization turned on in the server; I cannot say whether that affects the solution time. I am not sure about antivirus, but I will check with IT. The server has 44 cores in an FC630 blade configuration installed in an FX2 blade chassis. I hope this helps.

Other than this, I ran a scaling study on my set-up, increasing the number of cores in steps of 4, and found that 28 cores is the most time-efficient, not 32 (I summarise the timings with the sketch below). Any thoughts on that?

Thank you very much for your help.

Regards,
Sachin Aggarwal
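To compare the runs I convert wall-clock times into speedup and parallel efficiency. A minimal sketch; the timing numbers below are placeholders, not my measured values:

```python
# Speedup and parallel efficiency from measured wall-clock times (seconds).
# The numbers below are illustrative placeholders, not real measurements.
timings = {4: 4000.0, 8: 2100.0, 16: 1150.0, 24: 840.0, 28: 760.0, 32: 810.0}

base_cores = min(timings)       # smallest run used as the reference point
base_time = timings[base_cores]

print(f"{'cores':>5} {'speedup':>8} {'efficiency':>10}")
for cores, t in sorted(timings.items()):
    speedup = base_time / t                       # relative to the base run
    efficiency = speedup * base_cores / cores     # 1.0 would be ideal scaling
    print(f"{cores:>5} {speedup:>8.2f} {efficiency:>10.0%}")
```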
July 19, 2016, 20:45 | #4
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
If 28 cores runs better than 32 it suggests there is a bottleneck in your system which is preventing it from scaling efficiently to larger numbers of cores (the sketch below shows why even a small bottleneck caps scaling).

Do not assume that because it was installed by a technician and it is the latest equipment, it is suitable for large multiprocessor simulations. Most large multi-processor systems are designed as servers and web servers, and those have very different demands compared to multi-processor simulations. Also make sure your simulation is suitable for lots of partitions: how many nodes per core? What physics are you modelling?

Here are some examples of things which have caught me out in the past on multiprocessor simulations:
1) A workstation straight from the vendor (Dell) ran at half the speed I expected based on spec.org results. I found the BIOS did not support the CPU; when I upgraded to the latest BIOS it supported the CPU and the speed doubled to the expected value.
2) A high-end workstation straight from the vendor ran a different simulation software at a fraction of the expected speed. It turned out the motherboard was unsuitable for multi-processor operation because the FSB was not fast enough for the memory throughput, despite the machine having the best CPU and lots of memory. We had to downgrade the machine to a CAD workstation and buy more suitable machines, whose technical details I checked carefully.

Also: how is the CPU-to-memory and CPU-to-CPU interconnect done on this machine?
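As a generic back-of-envelope model (not CFX-specific), Amdahl's law shows how even a small non-parallel fraction, whether serial work, interconnect waits or memory contention, caps the achievable speedup:

```python
# Amdahl's law: speedup on n cores when a fraction p of the work parallelises.
def amdahl_speedup(n_cores: int, parallel_fraction: float) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

for p in (0.99, 0.97, 0.95):   # illustrative parallel fractions
    print(f"p = {p:.0%}: speedup on 32 cores = {amdahl_speedup(32, p):.1f}x "
          f"(ideal would be 32.0x)")
```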
July 20, 2016, 03:31 | #5
Senior Member
Maxim
Join Date: Aug 2015
Location: Germany
Posts: 413
Rep Power: 13
My CFD hardware guy showed me benchmarks of multi-core CPUs where, apparently, two 4-core Xeons on one mainboard (with the same amount of RAM each) are faster than one 8-core Xeon with the same amount of RAM. So is your 32-core cluster four machines, each with two 4-core Xeons on one mainboard, plus InfiniBand for the connection?

Your question might also be suitable for the hardware section of this forum - that is where the hardware guys are hiding.
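A plausible reason for that benchmark result is memory bandwidth per core: two sockets mean two memory controllers. A rough sketch; the channel count and transfer rate here are illustrative assumptions, not any particular server's specification:

```python
# Rough memory bandwidth per core. CFD solvers tend to be bandwidth-bound,
# so two sockets (two memory controllers) can beat one bigger socket.
# The channel count and transfer rate below are assumed, not measured.
def socket_bw_gb_s(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1e3   # 8 bytes per 64-bit transfer

one_socket = socket_bw_gb_s(channels=4, mt_per_s=2133)        # 1 x 8-core Xeon
two_sockets = 2 * socket_bw_gb_s(channels=4, mt_per_s=2133)   # 2 x 4-core Xeons

print(f"1 socket,  8 cores: {one_socket / 8:.1f} GB/s per core")
print(f"2 sockets, 8 cores: {two_sockets / 8:.1f} GB/s per core")
```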
July 21, 2016, 17:14 | #6
New Member
Sachin Aggarwal
Join Date: Aug 2014
Posts: 4
Rep Power: 12
Thank you for your reply. I am working with my IT department to figure out the answers to your questions. They told me the BIOS is up to date. They will also look into the FSB, the motherboard and the interconnect. When I get the answers I will let you know.

About the problem itself: I am simulating a high-speed wind turbine with a frozen rotor interface. The model has 14+ million elements and 8+ million nodes. As far as I know, the rule of thumb is 50-100k nodes per core, which makes me believe I should be able to use 80 cores without any loss of performance (quick check below). I am using rotational periodicity to halve the problem size, and the default MeTiS partitioner to partition the model.

Thank You,
Sachin Aggarwal
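The arithmetic behind that, taking the 50-100k nodes-per-core rule of thumb as given:

```python
# Core counts implied by the 50,000-100,000 nodes-per-core rule of thumb.
nodes = 8_000_000                              # approximate model node count

for nodes_per_core in (50_000, 100_000):
    print(f"at {nodes_per_core:,} nodes/core -> up to "
          f"{nodes // nodes_per_core} cores")

print(f"32 cores -> {nodes // 32:,} nodes per partition")   # ~250k, well above
```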
July 21, 2016, 17:18 | #7
New Member
Sachin Aggarwal
Join Date: Aug 2014
Posts: 4
Rep Power: 12
We did not add a new CPU to the old one; we replaced the old server with a new machine. The old server was a 20-core machine and I was using 19 of the 20 cores for simulations, as using all 20 bogged it down. The new machine has 44 cores in total, and my intention was to use 32 of them. I hope this clears things up.

Thank You,
Sachin Aggarwal
July 21, 2016, 19:19 | #8
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,871
Rep Power: 144
You do not appear to be modelling any physics which causes multi-processor issues.

Can you show a graph of simulation speed versus number of cores? Also, how does your simulation speed compare to the spec.org result for your machine?
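To produce that graph, a minimal sketch assuming Python with matplotlib; the timings dict is a placeholder for the measured wall-clock times:

```python
# Plot measured speedup against the ideal linear-scaling line.
# Requires matplotlib; the timings below are placeholders.
import matplotlib.pyplot as plt

timings = {4: 4000.0, 8: 2100.0, 16: 1150.0, 24: 840.0, 28: 760.0, 32: 810.0}

cores = sorted(timings)
base = min(cores)
# Normalise so the smallest run counts as perfectly scaled at `base` cores.
speedup = [timings[base] / timings[c] * base for c in cores]

plt.plot(cores, cores, "k--", label="ideal linear scaling")
plt.plot(cores, speedup, "o-", label="measured")
plt.xlabel("cores")
plt.ylabel("speedup")
plt.legend()
plt.savefig("scaling.png", dpi=150)
```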
Tags: cfx 17.1, intel local parallel mpi, multi-cores, solve time