A new tool to help optimize job resource requirements
Sat, 25 Mar 2023 - Kilian Cavalotti

It’s not always easy to determine the right amount of resources to request for a computing job: you want to make sure that the application will have enough resources to run properly, while avoiding over-requests that would make the job spend too much time waiting in queue for resources it won’t be using.

To help users inform those choices, we’ve just added a new tool to the module list on Sherlock. ruse is a command-line tool developed by Jan Moren that makes it easy to measure processes’ resource usage. It periodically samples the resource use of a process and its sub-processes, and can help users find out how many resources to allocate to their jobs. It reports the actual memory, execution time and cores that individual programs or MPI applications use, so those values can be requested in job submission options.

You’ll find more information and some examples in the Sherlock documentation at https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/#resource-requests
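
As a quick illustration, here’s how ruse could be wrapped around an application in a batch script (a minimal sketch: the job options and application name are placeholders, and the exact module name should be verified):

#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8GB
#SBATCH --time=01:00:00

# Module name assumed; verify with `module spider ruse`
module load ruse

# ruse periodically samples the memory, elapsed time and cores used by
# the command and its sub-processes, and writes a summary file (named
# after the command) in the working directory when the process ends.
# `my_application` is a placeholder for your own program.
ruse ./my_application

Comparing that summary with the #SBATCH values above then shows whether the next submission could request fewer (or needs more) resources.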

Hopefully ruse will make it easier to write job resource requests, and will give users a better understanding of their applications’ behavior, so they can take better advantage of Sherlock’s capabilities.

As usual, if you have any questions or comments, please don’t hesitate to reach out at [email protected].

A new interactive step in Slurm
Thu, 03 Jun 2021 - Kilian Cavalotti

A new version of the sh_dev tool has been released that leverages a recently added Slurm feature.

Slurm 20.11 introduced a new “interactive step”, designed to be used with salloc to automatically launch a terminal on an allocated compute node. This new type of job step resolves a number of problems with the previous interactive job approaches, both in terms of accounting and resource allocation.

What is this about?

In previous versions, launching an interactive job with srun --pty bash would create a step 0 that consumed resources, in particular Generic Resources (GRES, i.e. GPUs). Among other things, it made it impossible to use srun within that allocation to launch subsequent steps: any attempt would result in a “step creation temporarily disabled” error message.
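
For illustration, a session under that old behavior would go something like this (hypothetical transcript; the exact error wording may vary):

# The interactive shell itself runs as step 0 and holds the resources
$ srun --pty bash
# Any subsequent step launched from inside that shell is then blocked:
$ srun hostname
srun: Job step creation temporarily disabled, retrying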

Now, with this new feature, you can use salloc to directly open a shell on a compute node. The new interactive step won’t consume any of the allocated resources, so you’ll be able to start additional steps with srun within your allocation.
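
By contrast, the same workflow with the new interactive step could look like this (the resource options are purely illustrative):

# The shell now opens directly on the first allocated compute node,
# inside a resource-free interactive step
$ salloc --cpus-per-task=4 --time=00:30:00
# Additional steps start right away, since the interactive step
# consumes none of the allocated resources:
$ srun hostname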

sh_dev (aka sdev) has been updated to use interactive steps.

What changes?

For sh_dev

On the surface, nothing changes: you can continue to use sh_dev exactly like before, to start an interactive session on one of the compute nodes dedicated to that task (the default), or on a node in any partition (which is particularly popular among node owners). You’ll be able to use the same options, with the same features (including X11 forwarding).
Under the hood, though, you’ll be leveraging the new interactive step automatically.
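
For example (the flags below are assumptions for illustration; run sh_dev -h to confirm the exact options):

# Default session on one of the dedicated dev nodes
$ sh_dev
# Assumed flags: 2 CPUs, 8 GB of memory, 2 hours, in a specific partition
$ sh_dev -n 2 -m 8GB -t 2:00:00 -p owners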

For salloc

If you use salloc on a regular basis, the main change is that the resulting shell will open on the first allocated node, instead of the node you ran salloc on:

[kilian@sh01-ln01 login ~]$ salloc
salloc: job 25753490 has been allocated resources
salloc: Granted job allocation 25753490
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh02-01n46 ~] (job 25753490) $ 

If you want to keep that initial shell on the submission host, you can simply specify a command as an argument: that command will be executed as the calling user on the calling host:

[kilian@sh01-ln01 login ~]$ salloc bash
salloc: job 25752889 has been allocated resources
salloc: Granted job allocation 25752889
salloc: Nodes sh02-01n46 are ready for job
[kilian@sh01-ln01 login ~] (job 25752889) $

For srun

If you’re used to running srun --pty bash to get a shell on a compute node, you can continue to do so (as long as you don’t intend to run additional steps within the allocation).

But you can also just type salloc, get a more usable shell, and save 60% in keystrokes!


Happy computing! And as usual, please feel free to reach out if you have comments or questions.
