A new interactive step in Slurm
A new version of the sh_dev tool has been released, which leverages a recently added Slurm feature.
Slurm 20.11 introduced a new “interactive step”, designed to be used with salloc to automatically launch a terminal on an allocated compute node. This new type of job step resolves a number of problems with the previous interactive job approaches, both in terms of accounting and resource allocation.
What is this about?
In previous versions, launching an interactive job with srun --pty bash would create a step 0 that consumed the job’s resources, especially Generic Resources (GRES, i.e. GPUs). Among other things, this made it impossible to use srun within that allocation to launch subsequent steps: any attempt would result in a “step creation temporarily disabled” error message.
Now, with this new feature, you can use salloc to directly open a shell on a compute node. The new interactive step won’t consume any of the allocated resources, so you’ll be able to start additional steps with srun within your allocation.
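As a rough sketch of what this makes possible, the function below launches two job steps in a row from the shell that salloc opens. The name run_steps and the commands it runs (hostname, date) are placeholders chosen for illustration, and checking SLURM_JOB_ID is just one simple way to confirm that the shell belongs to a job.

```shell
#!/bin/bash
# Hypothetical helper: run two commands as separate job steps.
# Meant to be called from the shell opened by salloc.
run_steps() {
    # Slurm exports SLURM_JOB_ID into shells that belong to a job
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "no allocation found, run salloc first" >&2
        return 1
    fi
    # Make sure the Slurm client tools are available on this host
    if ! command -v srun >/dev/null 2>&1; then
        echo "srun not found in PATH" >&2
        return 1
    fi
    # Each srun call creates a new step within the existing allocation
    srun hostname || return 1
    srun date
}
```

Calling run_steps outside of a job only prints a reminder and returns a non-zero status, so it is safe to keep in a shell startup file.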
sh_dev (aka sdev) has been updated to use interactive steps.
On the surface, nothing changes: you can continue to use sh_dev exactly like before, to start an interactive session on one of the compute nodes dedicated to that task (the default), or on a node in any partition (which is particularly popular among node owners). You’ll be able to use the same options, with the same features (including X11 forwarding).
Under the hood, though, you’ll be leveraging the new interactive step automatically.
If you use salloc on a regular basis, the main change is that the resulting shell will open on the first allocated node, instead of the node you ran salloc on:
[kilian@sh01-ln01 login ~]$ salloc
salloc: job 25753490 has been allocated resources
salloc: Granted job allocation 25753490
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh02-01n46 ~] (job 25753490) $
If you want to keep that initial shell on the submission host, you can simply specify a command as an argument, and the resulting command will continue to be executed as the calling user on the calling host:
[kilian@sh01-ln01 login ~]$ salloc bash
salloc: job 25752889 has been allocated resources
salloc: Granted job allocation 25752889
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh01-ln01 login ~] (job 25752889) $
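Since both forms of salloc leave you at a prompt, it can help to check where that prompt actually lives. The sketch below is one possible way to do it, under the assumption that SLURMD_NODENAME is only present for processes started on a compute node as part of a job step, while SLURM_JOB_ID is present in any shell tied to a job. The function name shell_location is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: report where the current shell is running.
shell_location() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        # No job ID in the environment: not inside an allocation
        echo "no allocation"
    elif [ -n "${SLURMD_NODENAME:-}" ]; then
        # Assumption: this variable is only set inside a job step
        echo "job ${SLURM_JOB_ID} on compute node ${SLURMD_NODENAME}"
    else
        # Job ID without a step node name: shell stayed where salloc ran
        echo "job ${SLURM_JOB_ID} on submission host $(uname -n)"
    fi
}

shell_location
```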
If you’re used to running srun --pty bash to get a shell on a compute node, you can continue to do so (as long as you don’t intend to run additional steps within the allocation).
But you can also just type salloc, get a more usable shell, and save 60% in keystrokes!
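For the curious, the 60% figure can be checked with a few lines of shell arithmetic. This is only a quick illustration: it counts the characters of each command and leaves out the final Enter key, and the variable names are arbitrary.

```shell
#!/bin/bash
old='srun --pty bash'
new='salloc'
# ${#var} expands to the length of the string, in characters
saved=$(( (${#old} - ${#new}) * 100 / ${#old} ))
echo "${#old} -> ${#new} keystrokes, ${saved}% saved"
```

With 15 characters for the old command and 6 for the new one, the script prints `15 -> 6 keystrokes, 60% saved`.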
Happy computing! And as usual, please feel free to reach out if you have comments or questions.