How many jobs are running?
What partitions do I have access to?
How many CPUs can I use?
Where should I submit my jobs?
Any of those sound familiar?
We know it’s not always easy to navigate the native scheduler tools, their syntax, and the gazillion options they provide.
So today, we’re introducing sh_part, a new command on Sherlock that simplifies navigating Sherlock’s partitions and provides a user-focused, centralized view of its computing resources.
To run it, simply type sh_part at the prompt on any login or compute node, and you’ll be greeted by something like this:
    $ sh_part
         QUEUE   FREE  TOTAL   FREE  TOTAL RESORC  OTHER MAXJOBTIME    CORES       NODE GRES
     PARTITION  CORES  CORES  NODES  NODES PENDNG PENDNG  DAY-HR:MN  PERNODE     MEM-GB (COUNT)
       normal*     30   1600      0     76   2801   2278    7-00:00    20-24    128-191 -
        bigmem      0     88      0      2     90      1    1-00:00    32-56   512-3072 -
           dev     50     56      2      3     32      0    0-02:00    16-20        128 -
           gpu     62    140      0      7    121      0    7-00:00    16-24    191-256 gpu:8(1),gpu:4(6)
You’ll find a brief list of partitions you have access to, complete with information about the number of available nodes/cores and pending jobs.
- in the QUEUE PARTITION column, the * character indicates the default partition,
- the RESOURCE PENDING column (abbreviated RESORC PENDNG in the output) shows the core count of pending jobs that are waiting on resources,
- the OTHER PENDING column (OTHER PENDNG) lists core counts for jobs that are pending for other reasons, such as licenses, or user, group or any other limits,
- the GRES column shows the number and type of GRES available in that partition, with the number of nodes that feature that specific GRES combination in parentheses. For instance, in the output above, the gpu partition features 1 node with 8 GPUs, and 6 nodes with 4 GPUs each.
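To make the GRES notation concrete, here is a minimal Python sketch that decodes a value such as gpu:8(1),gpu:4(6). The `parse_gres` helper is hypothetical (it is not part of sh_part), and it assumes every entry has the exact `type:count(nodes)` shape shown in the sample output above; other GRES formats would need a different pattern.

```python
import re

def parse_gres(field):
    """Decode a GRES field like 'gpu:8(1),gpu:4(6)'.

    Returns a list of (gres_type, count_per_node, node_count) tuples.
    A '-' (or empty) field means the partition has no GRES.
    Assumption: entries always look like 'type:count(nodes)'.
    """
    if field.strip() in ("", "-"):
        return []
    entries = []
    for item in field.split(","):
        match = re.fullmatch(r"(\w+):(\d+)\((\d+)\)", item.strip())
        if match is None:
            raise ValueError(f"unrecognized GRES entry: {item!r}")
        name, per_node, nodes = match.groups()
        entries.append((name, int(per_node), int(nodes)))
    return entries

# The gpu partition from the sample output: 1 node with 8 GPUs, 6 nodes with 4 GPUs.
gres = parse_gres("gpu:8(1),gpu:4(6)")
total_gpus = sum(per_node * nodes for _, per_node, nodes in gres)
print(gres)        # [('gpu', 8, 1), ('gpu', 4, 6)]
print(total_gpus)  # 32
```

In other words, the sample gpu partition holds 8 × 1 + 4 × 6 = 32 GPUs across its 7 nodes.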
sh_part will make it easier to figure out cluster activity, and allow users to get a better understanding of what’s running and what’s available in the various Sherlock partitions.
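If you want to use these numbers in a script, for example to decide where to submit, the output is easy to split on whitespace. The sketch below is only an illustration: it works on a copy of the sample output shown earlier rather than calling sh_part, it assumes the 11-column layout of that sample, and the `FIELDS` names and the `parse_rows` helper are made up for this example.

```python
# Data rows copied from the sample sh_part output above (header lines omitted).
SAMPLE = """\
normal*     30   1600      0     76   2801   2278    7-00:00    20-24    128-191 -
bigmem       0     88      0      2     90      1    1-00:00    32-56   512-3072 -
dev         50     56      2      3     32      0    0-02:00    16-20        128 -
gpu         62    140      0      7    121      0    7-00:00    16-24    191-256 gpu:8(1),gpu:4(6)
"""

# Hypothetical field names, one per column of the sample layout.
FIELDS = ("partition", "free_cores", "total_cores", "free_nodes", "total_nodes",
          "resource_pending", "other_pending", "max_job_time",
          "cores_per_node", "node_mem_gb", "gres")

def parse_rows(text):
    """Turn sh_part-style data rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and header lines (their second field is not a number).
        if len(parts) != len(FIELDS) or not parts[1].isdigit():
            continue
        row = dict(zip(FIELDS, parts))
        # A trailing '*' marks the default partition.
        row["is_default"] = row["partition"].endswith("*")
        row["partition"] = row["partition"].rstrip("*")
        # Columns 2 to 7 are plain integer counts.
        for key in FIELDS[1:7]:
            row[key] = int(row[key])
        rows.append(row)
    return rows

rows = parse_rows(SAMPLE)
default = next(r["partition"] for r in rows if r["is_default"])
most_free = max(rows, key=lambda r: r["free_cores"])
with_free_nodes = [r["partition"] for r in rows if r["free_nodes"] > 0]

print(default)                 # normal
print(most_free["partition"])  # gpu
print(with_free_nodes)         # ['dev']
```

With the sample numbers, the default partition is normal, gpu has the most free cores (62), and dev is the only partition with entirely free nodes.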
As usual, if you have any questions or comments, please don’t hesitate to reach out at email@example.com.