massive parallel run messed up lustre file system |
May 24, 2011, 12:21 | #1
massive parallel run messed up lustre file system
Member
Matthias Walter
Join Date: Mar 2009
Location: Rostock, Germany
Posts: 63
Rep Power: 17
Hello folks,
I have performed a parallel run with 2048 cores (in future more than 2048) on an HPC system with a Lustre file system. The simulation itself ran fine and the results were good, but a problem occurred that I had not expected. After some runs of my case, accompanied by heavy I/O traffic produced by the other users on the cluster, the storage system (Lustre) of the cluster was messed up. A detailed analysis by SGI revealed that massive simultaneous parallel access to the storage system was responsible for the damage.

For this reason, the admins of the cluster have laid down some rules concerning the usage of the HPC system. From now on, no more than ~600-800 processes (or tasks/threads/files) should read or write simultaneously (and in particular, no more than 6000 files should be written simultaneously by all users combined). The admins asked me whether it would be possible to serialize OpenFOAM's read/write access when using Lustre file systems and more than (let's say) 1500 cores. They suggested reading/writing a first block of 128 or 256 processes/files/threads, then the next one, and so on, until all data for a time step has been loaded or written. Time steps without I/O traffic would not be affected by this restriction.

I would therefore like to forward the question to the experts.

Best regards
Matthias
May 28, 2011, 17:18 | #2
massively parallel and lustre
New Member
Cliff white
Join Date: May 2011
Posts: 1
Rep Power: 0
600-800 threads is actually kinda small for Lustre; large sites routinely run >100k threads (see http://www.nccs.gov/jaguar/ for example).
If your backend storage cannot keep up with the volume of Lustre I/O requests, there are various ways to tune the Lustre clients to reduce the I/O load: you can reduce the number of RPCs in flight, reduce the amount of dirty memory cached per client, etc. Client tuning is quite easy - certainly simpler than forcing serialized I/O. See lustre.org or whamcloud.com for the Lustre manual, which has the tuning information. Also see the lustre-discuss mailing list. (Note: I work for Whamcloud.)
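For reference, a rough sketch of what lowering those two knobs could look like. On the Lustre versions of that era the client tunables were exposed under /proc/fs/lustre, and an admin would normally just run lctl set_param osc.*.max_rpcs_in_flight=8 and lctl set_param osc.*.max_dirty_mb=32 rather than a program; the procfs path and the chosen values below are assumptions for illustration, not a recommendation from the post.

Code:
// Rough sketch (assumed procfs layout, run as root on a client node):
// lower the per-OST RPC concurrency and dirty-cache limits mentioned
// in the reply above. Roughly equivalent to
//   lctl set_param osc.*.max_rpcs_in_flight=8
//   lctl set_param osc.*.max_dirty_mb=32
#include <filesystem>
#include <fstream>
#include <iostream>

int main()
{
    namespace fs = std::filesystem;
    const fs::path oscRoot = "/proc/fs/lustre/osc";  // assumed location on Lustre clients of that era

    if (!fs::exists(oscRoot))
    {
        std::cerr << "No Lustre client procfs found at " << oscRoot << "\n";
        return 1;
    }

    for (const auto& entry : fs::directory_iterator(oscRoot))
    {
        if (!entry.is_directory()) continue;
        std::ofstream(entry.path() / "max_rpcs_in_flight") << 8;  // fewer concurrent RPCs per OST
        std::ofstream(entry.path() / "max_dirty_mb") << 32;       // less dirty page cache per OST
    }
    return 0;
}

In practice one would simply put the two lctl commands in the node setup scripts; the program form is only to keep the example in the same language as the sketch above.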