Your Sherlock prompt just got a little smarter
May 14, 2021
Have you ever felt confused when running things on Sherlock and wondered if your current shell was part of a job? And if so, which one? Well, maybe you’ve noticed it already, but we’ve deployed a small improvement to the Sherlock shell prompt (the thing that displays your user name and the host name of the node you’re on) that will hopefully make things a little easier to navigate.
Now, when you’re in the context of a Slurm job, your shell prompt will automatically display that job’s id, so you always know where you’re at.
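For the curious, this sort of prompt can be put together in just a few lines of shell in a profile script. Here’s a minimal, hypothetical sketch (not Sherlock’s actual implementation), relying on the SLURM_JOB_ID environment variable that Slurm sets in job shells:
# hypothetical sketch: add the job id to the bash prompt
# when the current shell runs inside a Slurm job
if [[ -n "$SLURM_JOB_ID" ]]; then
    PS1='[\u@\h \W] (job '"$SLURM_JOB_ID"') \$ '
else
    PS1='[\u@\h \W]\$ '
fi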
For instance, when you submit an interactive job with sdev, your prompt will automatically be updated to not only display the host name of the compute node you’ve been allocated, but also the id of the job your new shell is running in:
[kilian@sh03-ln06 login ~]$ sdev
srun: job 24333698 queued and waiting for resources
srun: job 24333698 has been allocated resources
[kilian@sh02-01n58 ~] (job 24333698) $
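And if a script needs the same information, it can read the SLURM_JOB_ID environment variable, which Slurm sets in the job’s shell:
[kilian@sh02-01n58 ~] (job 24333698) $ echo $SLURM_JOB_ID
24333698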
Use cases
This additional information could prove particularly useful in situations where it’s not immediately obvious that you’re running in the context of a Slurm job.
Dynamic resource allocation
For instance, when allocating resources with salloc, the scheduler will start a new shell on the same node you’re on, but nothing will differentiate that shell from your login shell, so it’s pretty easy to forget that you’re in a job (and also that if you exit that shell, you’ll terminate your resource allocation).
So now, when you use salloc, your prompt will be updated as well, so you’ll always know you’re in a job:
[kilian@sh03-ln06 login ~]$ salloc -N 4 --time 2:0:0
salloc: Pending job allocation 24333807
[...]
[kilian@sh03-ln06 login ~] (job 24333807) $ srun hostname
sh03-01n25.int
sh03-01n28.int
sh03-01n27.int
sh03-01n30.int
[kilian@sh03-ln06 login ~] (job 24333807) $ exit
salloc: Relinquishing job allocation 24333807
[kilian@sh03-ln06 login ~]$
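And while the allocation is active, having the job id right in the prompt makes it easy to paste into Slurm commands, for instance to check how much time is left on it (the output below is just an illustration):
[kilian@sh03-ln06 login ~] (job 24333807) $ squeue -j 24333807 -O timeleft -h
1:58:27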
Connecting to compute nodes
Another case is when you need to connect via SSH to compute nodes where your jobs are running. The scheduler will automatically inject your SSH session into the context of the running job, and now, you’ll see that job id automatically displayed in your prompt, like this:
[kilian@sh03-ln06 login ~]$ sbatch sleep.sbatch
Submitted batch job 24334257
[kilian@sh03-ln06 login ~]$ squeue -j 24334257 -O nodelist -h
sh02-01n47
[kilian@sh03-ln06 login ~]$ ssh sh02-01n47
------------------------------------------
Sherlock compute node
>> deployed Fri Apr 30 23:36:45 PDT 2021
------------------------------------------
[kilian@sh02-01n47 ~] (job 24334257) $
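One detail worth noting: SSH sessions adopted into a job this way don’t necessarily inherit Slurm’s environment variables, so SLURM_JOB_ID may be empty there. On cgroup-based setups, the job id can usually still be recovered from the session’s cgroup; here’s a hypothetical check (assuming cgroup v1 paths):
[kilian@sh02-01n47 ~] (job 24334257) $ grep -o 'job_[0-9]*' /proc/self/cgroup | head -n1
job_24334257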
Step creation temporarily disabled
Have you ever encountered this message when submitting a job?
step creation temporarily disabled, retrying (Requested nodes are busy)
That usually means that you’re trying to run a job from within a job: the scheduler tries to allocate resources that are already allocated to your current shell, so it waits until those resources become available. Of course, that never happens, so it would wait forever. Or until your job’s time limit runs out…
Now, a quick glance at your prompt will show you that you’re already in a job, which will hopefully help you catch those situations:
[kilian@sh03-ln06 login ~]$ srun --pty bash
srun: job 24334422 queued and waiting for resources
srun: job 24334422 has been allocated resources
[kilian@sh02-01n47 ~] (job 24334422) $ srun --pty bash
srun: Job 24334422 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Job 24334422 step creation still disabled, retrying (Requested nodes are busy)
srun: Job 24334422 step creation still disabled, retrying (Requested nodes are busy)
srun: Job 24334422 step creation still disabled, retrying (Requested nodes are busy)
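If you really do need to launch new work from within a job, one workaround (sketched here with the sleep.sbatch script from earlier, and a made-up job id) is to submit it with sbatch, which queues a brand new job instead of trying to create a step inside your current allocation:
[kilian@sh02-01n47 ~] (job 24334422) $ sbatch sleep.sbatch
Submitted batch job 24334585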
We hope this small improvement will help make things a little clearer when navigating jobs on Sherlock. Sometimes, it’s the little things, they say. :)
As usual, please feel free to reach out if you have comments or questions!