Sherlock changelog

Instant lightweight GPU instances are now available

by Kilian Cavalotti, Technical Lead & Architect, HPC
New
Hardware
We know that getting access to GPUs on Sherlock can be difficult and feel a little frustrating at times, which is why we are excited to announce the immediate availability of our new instant lightweight GPU instances!

A new tool to help optimize job resource requirements

by Kilian Cavalotti, Technical Lead & Architect, HPC
It’s not always easy to determine the right amount of resources to request for a computing job. Making sure that the application will have enough resources to run properly, but avoiding over-requests that would make the jobs spend too much
Documentation
Scheduler
Improvement

SRCF is expanding

by Kilian Cavalotti, Technical Lead & Architect, HPC
Maintenance
In order to bring up a new building that will increase data center capacity, a full SRCF power shutdown is planned for late June 2023. It’s expected to last about a week, and Sherlock will be unavailable during that time.

More free compute on Sherlock!

by Kilian Cavalotti, Technical Lead & Architect, HPC
Announce
Hardware
Improvement
We’re thrilled to announce that the free and generally available normal partition on Sherlock is getting an upgrade! With the addition of 24 brand new SH3_CBASE.1 compute nodes, each featuring one AMD EPYC 7543 Milan 32-core CPU and 256 GB

ClusterShell on Sherlock

by Kilian Cavalotti, Technical Lead & Architect, HPC
Software
New
Ever wondered how your jobs were doing while they were running? Keeping an eye on a log file is nice, but what if you could quickly gather process lists, usage metrics and other data points from all the nodes your multi-node jobs are running

Job #1, again!

by Kilian Cavalotti, Technical Lead & Architect, HPC
This is not the first time; we’ve been through this already (not so long ago, actually). But today, the Slurm job ID counter was reset and went from job #67043327 back to job #1.
Event
Scheduler

From Rome to Milan, a Sherlock catalog update

by Kilian Cavalotti, Technical Lead & Architect, HPC
Announce
Hardware
It’s been almost a year and a half since we first introduced Sherlock 3.0 and its major new features: brand new CPU model and manufacturer, 2x faster interconnect, much larger and faster node-local storage, and more! We’ve now reached an

3.3 PFlops: Sherlock hits expansion milestone

by Kilian Cavalotti, Technical Lead & Architect, High Performance Computing
Hardware
Event
Sherlock is a traditional High-Performance Computing cluster in many respects. But unlike most similarly sized clusters, where hardware is purchased all at once and refreshed every few years, it is in constant evolution. Almost like a

Tracking NFS problems down to the SFP level

by Kilian Cavalotti
Blog
Data
Hardware
When NFS problems turn out to be... not NFS problems at all.