A better view of Sherlock's resources
How many jobs are running?
What partitions do I have access to?
How many CPUs can I use?
Where should I submit my jobs?
Any of those sound familiar?
We know it’s not always easy to navigate the native scheduler tools, their syntax, and the gazillion options they provide.
Enter sh_part
So today, we’re introducing `sh_part`[1], a new command on Sherlock that will simplify navigating Sherlock’s partitions and provide a user-focused, centralized view of its computing resources.
To run it, simply type `sh_part` at the prompt on any login or compute node, and you’ll be greeted by something like this:
```
$ sh_part
     QUEUE   FREE  TOTAL   FREE  TOTAL  RESORC   OTHER  MAXJOBTIME    CORES      NODE  GRES
 PARTITION  CORES  CORES  NODES  NODES  PENDNG  PENDNG   DAY-HR:MN  PERNODE    MEM-GB  (COUNT)
   normal*     30   1600      0     76    2801    2278     7-00:00    20-24   128-191  -
    bigmem      0     88      0      2      90       1     1-00:00    32-56  512-3072  -
       dev     50     56      2      3      32       0     0-02:00    16-20       128  -
       gpu     62    140      0      7     121       0     7-00:00    16-24   191-256  gpu:8(1),gpu:4(6)
```
You’ll find a brief list of partitions you have access to, complete with information about the number of available nodes/cores and pending jobs.
- in the `QUEUE PARTITION` column, the `*` character indicates the default partition.
- the `RESOURCE PENDING` column shows the core count of pending jobs that are waiting on resources,
- the `OTHER PENDING` column lists core counts for jobs that are pending for other reasons, such as licenses, user, group or any other limit,
- the `GRES` column shows the number and type of GRES available in that partition, and the number of nodes that feature that specific GRES combination in parentheses. So for instance, in the output above, the `gpu` partition features 1 node with 8 GPUs, and 6 nodes with 4 GPUs each.
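For reference, here's a rough sketch of how some of that information could be pieced together from the native Slurm commands directly; the format strings below are purely illustrative, not what `sh_part` runs under the hood:

```
# Per-partition overview: name, CPUs (allocated/idle/other/total),
# node count, time limit, CPUs per node, memory per node, and GRES
$ sinfo -o "%12P %15C %6D %12l %8c %10m %G"

# Core counts of pending jobs in a given partition, with their pending reason
$ squeue -p normal -t PENDING -h -o "%8C %r"
```

`sh_part` rolls all of that up into a single table, limited to the partitions you actually have access to, so you don't have to remember any of those options.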
Hopefully `sh_part` will make it easier to figure out cluster activity, and allow users to get a better understanding of what’s running and what’s available in the various Sherlock partitions.
As usual, if you have any questions or comments, please don’t hesitate to reach out at [email protected].