Sherlock changelog

It’s been a long, long, way too long of a wait, but despite a global pandemic, heatwaves, thunderstorms, power shutoffs, fires and smoke, it’s finally here!

Today, we’re very excited to announce the immediate availability of Sherlock 3.0, the third generation of the Sherlock cluster.

What is Sherlock 3.0?

First, let’s take a quick step back for context.

The Sherlock cluster is built around core Infiniband fabrics, which connect compute nodes together and allow them to work as a single entity. As we expand Sherlock over time, more compute nodes are added to the cluster, and when a core fabric reaches capacity, a new one needs to be spun up. This is usually a good opportunity to refresh the compute node hardware characteristics, as well as continue expanding and renewing ancillary equipment and services, such as login nodes, DTNs, storage systems, etc. The collection of compute and service nodes connected to the same Infiniband fabric constitutes a sort of island, or generation, that could live on its own, but is actually an integral part of the greater, unified Sherlock cluster.

So far, since its inception in 2014, Sherlock has grown over two generations of nodes: the first one built around an FDR (56Gb/s) Infiniband fabric, and the second one, started in 2017, around an EDR (100Gb/s) fabric.

Late last year, that EDR fabric reached capacity. Today, after a long hiatus with many contributing factors, we’re introducing the third generation of Sherlock, built around a new Infiniband fabric and a completely refreshed compute node offering.

What does it look like?

Sherlock still looks like a bunch of black boxes with tiny lights, stuffed in racks 6ft high, and with an insane number of cables going everywhere.

In more technical detail, though, Sherlock 3.0 features:

a new, faster interconnect | Infiniband HDR, 200Gb/s
The new interconnect provides more bandwidth and lower latency to all the new nodes on Sherlock, for either inter-node communication in large parallel MPI applications, or for accessing the $SCRATCH and $OAK parallel file systems.
Sherlock is one of the first HPC clusters in the world to provide 200Gb/s to the nodes.
new and faster processors | AMD 2nd generation EPYC (Rome) CPUs
To take advantage of the doubled inter-node bandwidth, a brand new generation of CPUs was required, to provide enough internal bandwidth between the CPUs and the network interfaces. The AMD Rome CPUs are the first (and currently still the only) x86 CPU model to provide PCIe Gen4 connectivity, which enables faster local and remote I/O and can unlock 200Gb/s network speeds.
Those CPUs are also faster, draw less power, and provide more cores per socket than the ones found in the previous generations of Sherlock nodes, with a minimum of 32 CPU cores per node.
more (and faster) internal storage | 2TB NVMe per node
Sherlock 3.0 nodes now each feature a minimum of 2TB of local NVMe storage (over 10x the previous amount), for applications that are particularly sensitive to IOPS rates.
refreshed $HOME storage
More nodes means more computing power, but it also means more strain on the shared infrastructure. To absorb it, we’ve also refreshed and expanded the storage cluster that supports the $HOME and $GROUP_HOME storage spaces, to provide higher bandwidth, more IOPS, and better availability.
more (and faster) login and DTN nodes
Sherlock 3.0 also features 8 brand new login nodes, which are part of the login.sherlock.stanford.edu login pool and each feature a pair of AMD 7502 CPUs (for a total of 64 cores) and 512 GB of RAM, as well as a new pair of dedicated Data Transfer Nodes (DTNs).
refreshed and improved infrastructure
The list would be too long to go through exhaustively, but between additional service nodes to better scale the distributed cluster management infrastructure, improved Ethernet topology between the racks, and a refreshed hardware framework for the job scheduler, every aspect of Sherlock has been rethought and improved.
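To make the PCIe Gen4 point above a bit more concrete, here is a rough back-of-the-envelope sketch comparing the bandwidth of a PCIe slot with the 200Gb/s HDR link rate. It assumes the network adapter sits in a single x16 slot (the announcement doesn't say so), uses the standard per-lane transfer rates and 128b/130b encoding for Gen3 and Gen4, and ignores protocol overhead, so real-world throughput would be somewhat lower.

```python
# Rough per-direction bandwidth of a PCIe slot, in Gb/s.
# Assumption: the HDR adapter uses a single x16 slot (not stated in the announcement).
# Gen3 and Gen4 both use 128b/130b line encoding; packet and flow-control
# overhead is ignored, so actual throughput is lower than these figures.
TRANSFER_RATE_GT_S = {"Gen3": 8.0, "Gen4": 16.0}  # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130
LANES = 16
HDR_LINK_GBPS = 200  # InfiniBand HDR link rate


def slot_bandwidth_gbps(generation: str, lanes: int = LANES) -> float:
    """Return the post-encoding bandwidth of a PCIe slot in Gb/s."""
    return TRANSFER_RATE_GT_S[generation] * lanes * ENCODING_EFFICIENCY


for gen in TRANSFER_RATE_GT_S:
    bw = slot_bandwidth_gbps(gen)
    verdict = "enough" if bw >= HDR_LINK_GBPS else "not enough"
    print(f"PCIe {gen} x16: {bw:.0f} Gb/s ({verdict} for a {HDR_LINK_GBPS} Gb/s link)")
```

Under those assumptions, a Gen3 x16 slot tops out around 126 Gb/s, well short of what an HDR link can carry, while a Gen4 x16 slot offers about 252 Gb/s.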

What does it change for me?

In terms of habits and workflows: nothing. You don’t have to change anything and can continue to use Sherlock exactly the way you’ve been using it so far.

Sherlock is still a single cluster, with the same:

single point of entry at login.sherlock.stanford.edu,
single and ubiquitous data storage space (you can still access all of your data on all the file systems, from all the nodes in the cluster),
single application stack (you can load the same module and run the same software on all Sherlock nodes).

But it now features a third island, with a new family of compute nodes.

One thing you’ll probably notice pretty quickly is that your pending times in queue for the normal, bigmem and gpu partitions have been dropping. Considerably.

This is because, thanks to the generous sponsorship of the Stanford Provost, we’ve been able to add the following resources to Sherlock’s public partitions:

partition	#nodes	node specs
`normal`	72	32-core (1x 7502) w/ 256GB RAM
`normal`	2	128-core (2x 7742) w/ 1TB RAM
`bigmem`	1	64-core (2x 7502) w/ 4TB RAM
`gpu`	16	32-core (1x 7502P) w/ 256GB RAM and 4x RTX 2080 Ti GPUs
`gpu`	2	32-core (1x 7502P) w/ 256GB RAM and 4x V100S GPUs
Total	93	3,200 cores, 30TB RAM, 72 GPUs
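
If you'd like to double-check the totals in the table, the short sketch below recomputes them from the per-node specs. The only assumption is that 1TB is counted as 1024GB. With that convention the RAM adds up to about 28.5TB, which the table rounds to 30TB.

```python
# (partition, node count, cores per node, RAM per node in GB, GPUs per node)
# Values are copied from the table; 1TB is assumed to mean 1024GB.
NEW_NODES = [
    ("normal", 72, 32, 256, 0),
    ("normal", 2, 128, 1024, 0),
    ("bigmem", 1, 64, 4096, 0),
    ("gpu", 16, 32, 256, 4),
    ("gpu", 2, 32, 256, 4),
]

total_nodes = sum(n for _, n, _, _, _ in NEW_NODES)
total_cores = sum(n * cores for _, n, cores, _, _ in NEW_NODES)
total_ram_gb = sum(n * ram for _, n, _, ram, _ in NEW_NODES)
total_gpus = sum(n * gpus for _, n, _, _, gpus in NEW_NODES)

# Prints: 93 nodes, 3,200 cores, 28.5 TB RAM, 72 GPUs
print(f"{total_nodes} nodes, {total_cores:,} cores, "
      f"{total_ram_gb / 1024:.1f} TB RAM, {total_gpus} GPUs")
```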

Those new Sherlock 3.0 nodes add over twice the computing power previously available for free to every Sherlock user in the normal, bigmem and gpu partitions.

How can I use the new nodes?

It’s easy! You can keep submitting your jobs as usual, and the scheduler will automatically try to pick the new nodes that satisfy your job’s requirements if they’re available.

If you want to target the new nodes specifically, take a look at the output of sh_node_feat: all the new nodes have features defined that allow the scheduler to specifically select them when your job requests particular constraints.

For instance, if you want to select nodes:

with HDR IB connectivity, you can use -C IB:HDR
with AMD Rome CPUs, you can use -C CPU_GEN:RME
with 7742 CPUs, you can use -C CPU_SKU:7742
with Turing GPUs, you can use -C GPU_GEN:TUR
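
Constraints like the ones above can also be combined. As a minimal sketch, the hypothetical helper below builds an sbatch command line that requires all of the listed features at once, relying on Slurm's `&` operator (logical AND) in constraint expressions. The function name `sbatch_command` and the script name `job.sh` are made up for illustration; only the feature names and the `-C` option come from the examples above.

```python
import shlex


def sbatch_command(script, constraints, partition=None):
    """Build an sbatch argument list requesting nodes that have ALL given features.

    Hypothetical helper for illustration; not part of Sherlock or Slurm.
    """
    args = ["sbatch"]
    if partition:
        args += ["-p", partition]
    if constraints:
        # Slurm treats "&" between features as a logical AND.
        args += ["-C", "&".join(constraints)]
    args.append(script)
    return args


# "job.sh" is a placeholder for your own batch script.
cmd = sbatch_command("job.sh", ["IB:HDR", "CPU_GEN:RME"], partition="normal")

# shlex.join quotes the constraint so the shell doesn't interpret "&".
# Prints: sbatch -p normal -C 'IB:HDR&CPU_GEN:RME' job.sh
print(shlex.join(cmd))
```

The quoting matters if you type the command by hand: an unquoted `&` would be interpreted by the shell as a request to run the command in the background.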

Can I get more of it?

Absolutely! And we’re ready to take orders today.

If you’re interested in getting your own compute nodes on Sherlock, we’ve assembled a catalog of select configurations that you can choose from, and worked very hard with our vendors to maintain comparable price ranges with our previous generation offerings.

You’ll find the detailed configuration and pricing in the Sherlock Compute Nodes Catalog, and we’ve also prepared an Order Form that you can use to provide the required information to purchase those nodes.

Sherlock catalog
http://www.sherlock.stanford.edu/docs/overview/orders/catalog
Order form
http://www.sherlock.stanford.edu/docs/overview/orders/form

For complete details about the purchasing process, please take a look at https://www.sherlock.stanford.edu/docs/overview/orders/ and, as usual, please let us know if you have any questions.

Finally, we wanted to sincerely thank every one of you for your patience while we were working on bringing up this new cluster generation, in an unexpectedly complicated global context. We know it’s been a very long wait, but hopefully it will have been worthwhile.

Happy computing and don’t hesitate to reach out!

Oh, and Sherlock is on Slack now, so feel free to come join us there too!