A new interactive step in Slurm
June 3rd, 2021 at 8:18 PM
A new version of the sh_dev tool has been released, which leverages a recently added Slurm feature.
Slurm 20.11 introduced a new “interactive step”, designed to be used with salloc to automatically launch a terminal on an allocated compute node. This new type of job step resolves a number of problems with the previous interactive job approaches, both in terms of accounting and resource allocation.
What is this about?
In previous versions, launching an interactive job with srun --pty bash would create a step 0 that consumed resources, especially Generic Resources (GRES, i.e. GPUs). Among other things, this made it impossible to use srun within that allocation to launch subsequent steps: any attempt would result in a “step creation temporarily disabled” error message.
Now, with this new feature, you can use salloc to directly open a shell on a compute node. The new interactive step won’t consume any of the allocated resources, so you’ll be able to start additional steps with srun within your allocation. sh_dev (aka sdev) has been updated to use interactive steps.
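As a rough illustration of what this makes possible, the sketch below wraps srun in a small helper that only launches a step when the current shell is inside an allocation. The function name run_step is hypothetical and not part of Slurm or sh_dev; it relies only on the SLURM_JOB_ID environment variable, which Slurm sets in shells started by salloc.

```shell
# Hypothetical helper (not part of Slurm): launch a job step only when
# the current shell is inside a Slurm allocation.
run_step() {
    # SLURM_JOB_ID is set by Slurm in shells started by salloc.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not inside an allocation: run salloc first" >&2
        return 1
    fi
    # With the interactive step, the shell itself holds no resources,
    # so srun can start an additional step in the same allocation.
    srun "$@"
}
```

From the shell opened by salloc, something like run_step hostname should then start a new step in the existing allocation, rather than failing with the “step creation temporarily disabled” message.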
What changes?
For sh_dev
On the surface, nothing changes: you can continue to use sh_dev exactly like before, to start an interactive session on one of the compute nodes dedicated to that task (the default), or on a node in any partition (which is particularly popular among node owners). You’ll be able to use the same options, with the same features (including X11 forwarding).
Under the hood, though, you’ll be leveraging the new interactive step automatically.
For salloc
If you use salloc on a regular basis, the main change is that the resulting shell will open on the first allocated node, instead of the node you ran salloc on:
[kilian@sh01-ln01 login ~]$ salloc
salloc: job 25753490 has been allocated resources
salloc: Granted job allocation 25753490
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh02-01n46 ~] (job 25753490) $
If you want to keep that initial shell on the submission host, you can simply specify a command as an argument, and the resulting command will continue to be executed as the calling user on the calling host:
[kilian@sh01-ln01 login ~]$ salloc bash
salloc: job 25752889 has been allocated resources
salloc: Granted job allocation 25752889
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh01-ln01 login ~] (job 25752889) $
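Since the shell can now land either on a compute node or on the submission host, it can be handy to check which one you are on. The sketch below is one possible way to do that, assuming SLURM_JOB_ID and SLURM_SUBMIT_HOST are both present in the shell's environment (salloc sets them, but whether they reach every shell depends on your setup). The function name shell_location is hypothetical.

```shell
# Hypothetical helper: report where the current shell is running.
shell_location() {
    # No job ID means this shell is not part of a Slurm allocation.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "no allocation"
        return 0
    fi
    # Compare short host names (everything before the first dot).
    # Assumption: SLURM_SUBMIT_HOST names the host salloc was run on;
    # if it is unset, this falls through to "compute node".
    here=$(uname -n)
    submit=${SLURM_SUBMIT_HOST:-}
    if [ "${here%%.*}" = "${submit%%.*}" ]; then
        echo "submission host"
    else
        echo "compute node"
    fi
}
```

Under these assumptions, calling shell_location after a plain salloc would report a compute node, while calling it after salloc bash would report the submission host.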
For srun
If you’re used to running srun --pty bash to get a shell on a compute node, you can continue to do so (as long as you don’t intend to run additional steps within the allocation).
But you can also just type salloc, get a more usable shell, and save 60% in keystrokes!
Happy computing! And as usual, please feel free to reach out if you have comments or questions.