A new tool to help optimize job resource requirements
Sat, 25 Mar 2023 - Kilian Cavalotti

It’s not always easy to determine the right amount of resources to request for a computing job: you want to make sure that the application will have enough resources to run properly, while avoiding over-requests that would make the job spend too much time waiting in queue for resources it won’t be using.

To help users inform those choices, we’ve just added a new tool to the module list on Sherlock. ruse is a command-line tool developed by Jan Moren that makes it easy to measure processes’ resource usage. It periodically samples the resource use of a process and its sub-processes, and can help users find out how many resources to allocate to their jobs. It reports the actual memory, execution time and cores that individual programs or MPI applications use, so those values can be requested in job submission options.

You’ll find more information and some examples in the Sherlock documentation at https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/#resource-requests
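
As a quick illustration, here’s how ruse could be wrapped around an application in a batch script (a minimal sketch: the job options and application name are placeholders, and the exact module name should be verified):

#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8GB
#SBATCH --time=01:00:00

# Module name assumed; verify with `module spider ruse`
module load ruse

# ruse periodically samples the memory, elapsed time and cores used by
# the command and its sub-processes, and writes a summary file (named
# after the command) in the working directory when the process ends.
# `my_application` is a placeholder for your own program.
ruse ./my_application

Comparing that summary with the #SBATCH values above then shows whether the next submission could request fewer (or needs more) resources.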

Hopefully ruse will make it easier to write job resource requests, and will give users a better understanding of their applications’ behavior, so they can take better advantage of Sherlock’s capabilities.

As usual, if you have any questions or comments, please don’t hesitate to reach out at [email protected].

A new interactive step in Slurm
Thu, 03 Jun 2021 - Kilian Cavalotti

A new version of the sh_dev tool has been released that leverages a recently added Slurm feature.

Slurm 20.11 introduced a new “interactive step”, designed to be used with salloc to automatically launch a terminal on an allocated compute node. This new type of job step resolves a number of problems with the previous interactive job approaches, both in terms of accounting and resource allocation.

What is this about?

In previous versions, launching an interactive job with srun --pty bash would create a step 0 that consumed resources, in particular Generic Resources (GRES, i.e. GPUs). Among other things, it made it impossible to use srun within that allocation to launch subsequent steps: any attempt would result in a “step creation temporarily disabled” error message.
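
For illustration, a session under that old behavior would go something like this (hypothetical transcript; the exact error wording may vary):

# The interactive shell itself runs as step 0 and holds the resources
$ srun --pty bash
# Any subsequent step launched from inside that shell is then blocked:
$ srun hostname
srun: Job step creation temporarily disabled, retrying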

Now, with this new feature, you can use salloc to directly open a shell on a compute node. The new interactive step won’t consume any of the allocated resources, so you’ll be able to start additional steps with srun within your allocation.
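
By contrast, the same workflow with the new interactive step could look like this (the resource options are purely illustrative):

# The shell now opens directly on the first allocated compute node,
# inside a resource-free interactive step
$ salloc --cpus-per-task=4 --time=00:30:00
# Additional steps start right away, since the interactive step
# consumes none of the allocated resources:
$ srun hostname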

sh_dev (aka sdev) has been updated to use interactive steps.

What changes?

For sh_dev

On the surface, nothing changes: you can continue to use sh_dev exactly like before, to start an interactive session on one of the compute nodes dedicated to that task (the default), or on a node in any partition (which is particularly popular among node owners). You’ll be able to use the same options, with the same features (including X11 forwarding).
Under the hood, though, you’ll be leveraging the new interactive step automatically.
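
For example (the flags below are assumptions for illustration; run sh_dev -h to confirm the exact options):

# Default session on one of the dedicated dev nodes
$ sh_dev
# Assumed flags: 2 CPUs, 8 GB of memory, 2 hours, in a specific partition
$ sh_dev -n 2 -m 8GB -t 2:00:00 -p owners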

For salloc

If you use salloc on a regular basis, the main change is that the resulting shell will open on the first allocated node, instead of the node you ran salloc on:

[kilian@sh01-ln01 login ~]$ salloc
salloc: job 25753490 has been allocated resources
salloc: Granted job allocation 25753490
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh02-01n46 ~] (job 25753490) $ 

If you want to keep that initial shell on the submission host, you can simply specify a command as an argument: that command will be executed as the calling user on the calling host:

[kilian@sh01-ln01 login ~]$ salloc bash
salloc: job 25752889 has been allocated resources
salloc: Granted job allocation 25752889
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh01-ln01 login ~] (job 25752889) $

For srun

If you’re used to running srun --pty bash to get a shell on a compute node, you can continue to do so (as long as you don’t intend to run additional steps within the allocation).

But you can also just type salloc, get a more usable shell, and save 60% in keystrokes!


Happy computing! And as usual, please feel free to reach out if you have comments or questions.
