August 8, 2020, 09:39
|
|
#21
|
Senior Member
MA
Join Date: Mar 2020
Posts: 163
Rep Power: 6
|
Quote:
Originally Posted by bluebase
If you have a cluster with time limits for jobs, you might like my script: https://wikis.ovgu.de/lss/doku.php?i...rccm#variant_1
It tells SLURM to send a signal when a job is about to reach its time limit. The script traps that signal and writes an ABORT file into the work directory, so STAR-CCM+ can stop gracefully.
The advantage of this method is that you do not lose the computation done since your last (auto)save.
As long as you have an active stop-file stopping criterion, this method works with macro-based runs as well as command-style runs.
Note that the script contains some lines that are specific to my university's cluster: the compute nodes' resources, queue names, work directories, availability and names of modules, custom prolog and epilog scripts, and MPI settings.
You will need to adapt this information to your system.
|
Thanks for your kind input. I have already written my SLURM file and sorted out the issues, but your script will surely help someone else.
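For anyone finding this thread later, here is a rough sketch of the mechanism described in the quote. It is not the linked script, just a minimal illustration under a few assumptions: the time limit, the 120 s warning margin, the `WORKDIR` variable, and the `run_solver` stand-in are all made up for the example. The stand-in only polls for the ABORT file so the snippet can be tried without STAR-CCM+; in a real job you would replace it with your own solver call and your cluster's modules, queues, and MPI settings.

```shell
#!/bin/bash
#SBATCH --time=00:10:00        # example time limit (assumption)
#SBATCH --signal=B:USR1@120    # ask SLURM to send SIGUSR1 to the batch shell ~120 s before the limit

# Work directory of the simulation (assumption: falls back to a scratch dir for this demo)
WORKDIR="${WORKDIR:-$(mktemp -d)}"

# Signal handler: create the stop file that the stop-file criterion looks for
request_stop() {
    touch "$WORKDIR/ABORT"
}
trap request_stop USR1

# Hypothetical stand-in for the real solver call; it just waits for the stop file.
# Replace this function body with your actual STAR-CCM+ command line.
run_solver() {
    while [ ! -e "$WORKDIR/ABORT" ]; do
        sleep 0.2
    done
    echo "stop file seen, saving and exiting"
}

# Run the solver in the background so the shell stays free to handle the signal
run_solver &
solver_pid=$!

# Demo only: outside of SLURM, simulate the warning signal after one second
if [ -z "${SLURM_JOB_ID:-}" ]; then
    ( sleep 1; kill -USR1 $$ ) &
fi

# 'wait' returns early when a trapped signal arrives, so loop until the solver is gone
while kill -0 "$solver_pid" 2>/dev/null; do
    wait "$solver_pid"
done
```

The solver has to run in the background with `wait`, because bash only runs a trap handler after the current foreground command finishes. The `B:` prefix sends the signal to the batch shell only, not to the solver processes themselves.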