<p><strong>Sherlock changelog</strong> · www.sherlock.stanford.edu</p>

<h1>Sherlock goes full flash</h1><p><em>February 7, 2024</em></p><p>What could be more frustrating than anxiously waiting for your computing job to finish? Slow I/O that makes it take even longer is certainly high on the list. But not anymore! <a href="https://news.sherlock.stanford.edu/publications/a-new-scratch"><strong>Fir</strong></a><strong>, Sherlock’s scratch file system, has just undergone a major tech face-lift: it’s now a 10 PB all-flash storage system, providing an aggregate bandwidth of 400 GB/sec</strong> (and &gt;800 kIOPS). Bringing Sherlock’s high-performance parallel scratch file system into the era of flash storage was not just a routine maintenance task, but a significant leap into the future of HPC and AI computing.</p><h2>But first, a little bit of context</h2><p>Traditionally, High-Performance Computing clusters face a challenge when dealing with modern, data-intensive applications. Existing HPC storage systems, long designed around spinning disks to provide efficient, parallel sequential read/write operations, often become bottlenecks for modern workloads generated by AI/ML or CryoEM applications. These demand substantial data storage and processing capabilities, putting a strain on traditional systems.</p><p>So, to accommodate those new needs and the future evolution of the HPC I/O landscape, we at Stanford Research Computing, with the generous support of the <a href="https://doresearch.stanford.edu/who-we-are/office-vice-provost-and-dean-research">Vice Provost and Dean of Research</a>, have been hard at work for over two years, revamping Sherlock’s scratch with an all-flash system.</p><p>And it was not just a matter of taking delivery of a new turn-key system.
As with most things we do, it was done entirely in-house: from the original vendor-agnostic design, upgrade plan, budget requests and procurement, through the gradual in-place hardware replacement at the Stanford Research Computing Facility (SRCF), deployment, validation and performance benchmarks, to the final production stages, every step was performed with minimal disruption for Sherlock users.</p><h2>The technical details</h2><p>The <code>/scratch</code> file system on Sherlock uses <a href="https://wiki.lustre.org/">Lustre</a>, an open-source, parallel file system that supports many requirements of leadership-class HPC environments. And as you probably know by now, Stanford Research Computing loves <a href="https://github.com/stanford-rc">open source</a>! We actively contribute to the Lustre community and are a proud member of <a href="https://opensfs.org/">OpenSFS</a>, a non-profit industry organization that supports vendor-neutral development and promotion of Lustre.</p><p>In Lustre, file metadata and data are stored separately, with Object Storage Servers (OSS) serving file data on the network. Each OSS pair and its associated storage devices forms an I/O cell, and Sherlock’s scratch has just bid farewell to its old HDD-based I/O cells. In their place, new flash-based I/O cells have taken the stage, each equipped with 96 x 15.35 TB SSDs, delivering mind-blowing performance.</p><p>Sherlock’s <code>/scratch</code> has 8 I/O cells, and the goal was to replace every one of them. Each new I/O cell has 2 OSS, each with an InfiniBand HDR link at 200 Gb/s (or 25 GB/s), connected to 4 storage chassis of 24 x 15.35 TB SSDs (dual-attached 12 Gb/s SAS), as pictured below. That is also where the aggregate frontend bandwidth comes from: 8 cells × 2 OSS × 25 GB/s = 400 GB/s.</p><figure><img src="https://lh7-us.googleusercontent.com/gI-D9jEmQeMntz4clh3TNYF60Q6Xep5cMcwQqHL3TGX_9H7L0m_6MgjDlPfSQrUtSBsh5l9bVa8Nddamm4BHzsQwk1S5Q5s9Wq_i8wdGGcXXnOD5wW_kqTJDQXjdwGEb7VYN1gSNPHccCYBc9iEzgTM" alt="Diagram of a new flash-based I/O cell: 2 OSS connected to 4 SAS-attached SSD chassis"></figure>
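<p>If you are curious what the new scratch looks like from a login or compute node, Lustre exposes its building blocks directly to clients. A quick way to list the metadata and object storage targets backing <code>/scratch</code>, along with their (now all-flash) capacity, assuming the standard Lustre client tools are in your path:</p><pre><code># show each MDT/OST behind /scratch, with capacity and usage
$ lfs df -h /scratch</code></pre>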
<p>Of course, you can’t just swap each individual rotating hard drive for an SSD: some infrastructure changes and some reconfiguration were required. The upgrade, executed between January 2023 and January 2024, was a seamless transition: old HDD-based I/O cells were gracefully retired, one by one, while flash-based ones progressively replaced them, gradually boosting performance for all Sherlock users throughout the year.</p><figure><img src="https://lh7-us.googleusercontent.com/B7lwfOxhKxKc-kDeQZkZ63exdm99PnDvete7-03-wD3906KQ_BaUOAGpzuNRa1nrZ_UdcCz_XcPusFZGA60zH6xWSMR60WDz-C6q-qg2BetwYGf1Ytpevnr0Hg5cN9kVPnEVRkeRRfqJBXje3AvmAXo" alt=""></figure><p>All of those replacements happened while the file system was up and running, serving data to the thousands of computing jobs that run on Sherlock every day. Driven by our commitment to minimize disruptions to users, our top priority was to ensure uninterrupted access to data throughout the upgrade. Data migration is never fun, and we wanted to avoid having to ask users to manually transfer their files to a new, separate storage system. This is why we developed and <a href="https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commit;h=1121816c4a4e1bb2ef097c4a9802362181c43800">contributed</a> a new feature in Lustre, which allowed us to seamlessly remove existing storage devices from the file system before the new flash drives were added. More technical details about the upgrade have been <a href="http://www.eofs.eu/wp-content/uploads/2024/02/2.5-stanfordrc_s_thiell.pdf">presented</a> during the <a href="https://www.eofs.eu/index.php/events/lad-22/">LAD’22</a> conference.</p>
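<p>For the Lustre-curious, that contribution is the <code>del_ost</code> command referenced in the commit above, which removes a retired OST from the file system configuration. Roughly, the operation looks like the sketch below; it is strictly an administrator-side command, the file system name and OST index are placeholders, and the exact syntax should be taken as approximate:</p><pre><code># permanently remove a decommissioned OST from the Lustre configuration (admin-only, approximate syntax)
lctl del_ost --target fir-OST0000</code></pre>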
<p><strong>Today, we are happy to announce that the upgrade is officially complete, and Sherlock stands proud with a whopping 9,824 TB of solid-state storage in production. No more spinning disks in sight!</strong></p><h2>Key benefits</h2><p>For users, the immediately visible benefits are quicker access to their files, faster data transfers, and shorter job execution times for I/O-intensive applications. More specifically, every key metric has been improved:</p><ul><li><p>IOPS: over <strong>100x</strong> (results may vary, see below)</p></li><li><p>Backend bandwidth: <strong>6x</strong> (128 GB/s to 768 GB/s)</p></li><li><p>Frontend bandwidth: <strong>2x</strong> (200 GB/s to 400 GB/s)</p></li><li><p>Usable volume: <strong>1.6x</strong> (6.1 PB to 9.8 PB)</p></li></ul><p>In terms of measured improvement, the graph below shows the impact of moving to full-flash storage for reading data from 1, 8 and 16 compute nodes, compared to the previous <code>/scratch</code> file system:</p><figure><img src="https://lh7-us.googleusercontent.com/a1wBmS1DW--_SfmLz5iyYRChlTp8MSuE7VKNKinX2nBgzb6iRiNeiSqa5zuXQrTvN1YztMqTLBVPdc_gqA1lrqOpQh7ZA1FzsNdS4VToP_okzXIhbWdzS2rWtUD33joDAaFV4m7eSMQp6DB8se6PY_Y" alt="Read bandwidth from 1, 8 and 16 compute nodes, before and after the upgrade"></figure><p>And we even tried to replicate the I/O patterns of <a href="https://github.com/google-deepmind/alphafold">AlphaFold</a>, a well-known AI model for protein structure prediction, and the benefits are quite significant, with up to 125x speedups in some cases:</p><figure><img src="https://lh7-us.googleusercontent.com/4qvJD4MDJwjdlyKLcE4F24ZaaqanbQHjS1CkxPVWvzBKHphgLLAfa0QoepWrbOYOtwLFnYLrwLHTyS1NatKDItsDI63mlC1mxhac6RSFKSHCLyiEOykLBnHw7ziqM5uQ0VTVmmLd5BPPJpNF6bNUN70" alt="AlphaFold-like I/O patterns: speedups of up to 125x on the new scratch"></figure><p>This upgrade is a major improvement that will benefit all Sherlock users, and Sherlock’s enhanced I/O capabilities will allow them to approach data-intensive tasks with unprecedented efficiency. We hope it will help support the ever-increasing computing needs of the Stanford research community, and enable even more breakthroughs and discoveries.</p><p>As usual, if you have any questions or comments, please don’t hesitate to reach out to Research Computing at <a href="mailto:[email protected]">[email protected]</a>.
🚀🔧</p><p><em>Stéphane Thiell &amp; Kilian Cavalotti</em></p>

<h1>A brand new Sherlock OnDemand experience</h1><p><em>November 16, 2023</em></p><p>Following a long tradition of <a href="https://news.sherlock.stanford.edu/publications/sherlock-on-demand">announcements</a> and <a href="https://news.sherlock.stanford.edu/publications/sherlock-goes-container-native">releases</a> during the <a href="https://supercomputing.org/">SuperComputing</a> conference, and while <a href="https://sc23.supercomputing.org/">SC23</a> is underway in Denver, CO, <strong>Stanford Research Computing is proud to unveil Sherlock OnDemand 3.0,</strong> a cutting-edge enhancement to its computing and data storage resources, revolutionizing user interaction and efficiency.<br><br><strong>The upgraded Sherlock OnDemand is available immediately at </strong><a href="https://ondemand.sherlock.stanford.edu"><strong>https://ondemand.sherlock.stanford.edu</strong></a></p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/tkzeo34ezqhztdmSbO5B/01hfaqwynskjp4v9s198vs7ppg-image.png" alt="Sherlock OnDemand 3.0 dashboard"></figure><p>This new release brings a host of transformative changes.
A lot happened under the hood, but the visible changes are significant as well.</p><p><strong>Infrastructure upgrades:</strong></p><ul><li><p><strong>A new URL:</strong> Sherlock OnDemand is now accessible at <a href="https://ondemand.sherlock.stanford.edu">https://ondemand.sherlock.stanford.edu</a>, in line with our other instances, for a more homogeneous user experience across Research Computing systems. The previous URL will still work for a time, and redirections will be progressively deployed to ease the transition.</p></li><li><p><strong>New engine, same feel:</strong> a lot of internal components have undergone substantial updates, but the familiar interface remains intact, ensuring a seamless transition for existing users.</p></li><li><p><strong>Streamlined authentication:</strong> Sherlock OnDemand now uses <a href="https://openid.net/">OIDC</a> via the Stanford central Identity Provider instead of SAML, resulting in a lighter, more robust configuration for enhanced security.</p></li><li><p><strong>Enhanced performance:</strong> expect a more responsive interface and improved reliability with the eradication of 422 HTTP errors.</p></li></ul><h2>User-centric features:</h2><ul><li><p><strong>Expanded file access:</strong> all your <a href="https://uit.stanford.edu/service/oak-storage">Oak</a> groups are now conveniently listed in the embedded file browser, for easier and more comprehensive access to your data.
And if you have <code>rclone</code> remotes already configured on Sherlock, you’ll find them there as well!</p></li><li><p><strong>Effortless support tickets:</strong> you can now send support tickets directly from the OnDemand interface, which will automatically include contextual information about your interactive sessions, to simplify issue resolution.</p></li><li><p><strong>New interactive apps:</strong> in addition to the existing apps, VS Code server, MATLAB, and JupyterLab join the platform, offering expanded functionality, like the ability to load and unload modules directly within JupyterLab.<br><em>Yes, you read that right: we now have <strong>VS Code</strong> and <strong>MATLAB</strong> in Sherlock OnDemand!</em><br>The RStudio app has also been rebuilt from the ground up, providing a much better and more reliable experience.</p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/tkzeo34ezqhztdmSbO5B/01hfaxb83938p5dwqfxj532jp6-image.png" alt="New interactive apps in Sherlock OnDemand"></figure></li><li><p><strong>Customizable working directories:</strong> users can now select a working directory across all interactive apps, for easier customization of their work environment.</p></li></ul><p>For more details and guidance on using the new features, check out the updated documentation at <a href="https://www.sherlock.stanford.edu/docs/user-guide/ondemand/">https://www.sherlock.stanford.edu/docs/user-guide/ondemand/</a>.<br><br><strong>This update delivers a brand new computing experience, designed to empower you in your work.</strong> Sherlock OnDemand 3.0 marks a significant milestone in optimizing user access to computing resources, lowering the barrier to entry for new users, and empowering researchers with an unparalleled computing environment. We’re excited to see how it will enhance your productivity and efficiency, so dive into this transformative experience today and elevate your computing endeavors to new heights with Sherlock OnDemand 3.0!<br><br>And as usual, if you have any questions, comments or suggestions, don’t hesitate to reach out at <a href="mailto:[email protected]">[email protected]</a>.</p><p><em>Kilian Cavalotti</em></p>

<h1>Final hours announced for the June 2023 SRCF downtime</h1><p><em>May 12, 2023</em></p>
<p>As <a href="https://news.sherlock.stanford.edu/publications/srcf-is-expanding">previously announced</a>, the Stanford Research Computing Facility (SRCF), where Sherlock is hosted, will be powered off during the last week of June, in order to safely bring up power to the new SRCF2 datacenter.</p><blockquote><p><strong>Sherlock will not be available for login, to submit jobs or to access files</strong> from <strong>Saturday, June 24th, 2023 at 00:00 PST</strong> to <strong>Monday, July 3rd, 2023 at 18:00 PST.</strong></p></blockquote><p>Jobs will stop running and access to login nodes will be closed at 00:00 PST on Saturday, June 24th, to allow sufficient time for shutdown and pre-downtime maintenance tasks on the cluster before the power actually goes out. If everything goes according to plan, and barring issues or delays with power availability, access will be restored on Monday, July 3rd at 18:00 PST.</p><p>We will use this opportunity to perform necessary maintenance operations on Sherlock that can’t be done while jobs are running, which will avoid having to schedule a whole separate downtime. Sherlock will go offline in advance of the actual electrical shutdown to ensure that all equipment is properly powered off, and to minimize the risks of disruption and failures when power is restored.<br><br>A reservation will be set in the scheduler for the duration of the downtime: if you submit a job on Sherlock and the time you request exceeds the time remaining until the start of the downtime, your job will be queued until the maintenance is over, and the <code>squeue</code> command will report a status of <code>ReqNodeNotAvailable</code> (“Required Node Not Available”).</p>
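<p>If you want to check where you stand, the scheduler can tell you directly. A quick sketch (the actual reservation name and window will appear in the output):</p><pre><code># list scheduler reservations, including the upcoming maintenance window and its start time
$ scontrol show reservation

# check why a pending job is still waiting (the last column shows the reason)
$ squeue -u $USER -o "%.12i %.10P %.8T %r"</code></pre>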
<p><em>The hours leading up to a downtime are an excellent time to submit shorter, smaller jobs that can complete before the maintenance begins: as the queues drain there will be many nodes available, and your wait time may be shorter than usual.</em></p><p>As previously mentioned, in anticipation of this week-long downtime, we encourage all users to plan their work accordingly, and ensure that they have contingency plans in place for their computing and data accessibility needs during that time. <strong>If you have important data that you need to be able to access while Sherlock is down, we strongly recommend that you start transferring your data to off-site storage systems ahead of time, to avoid last-minute complications.</strong> Similarly, if you have deadlines around the time of the shutdown that require computation results, make sure to anticipate those and submit your jobs to the scheduler as early as possible.<br><br>We understand that this shutdown will have a significant impact on users who rely on Sherlock for their computing and data processing needs, and we appreciate your cooperation and understanding as we work to improve our Research Computing infrastructure.<br><br>For help transferring data, or for any questions or concerns, please do not hesitate to reach out to <a href="mailto:[email protected]">[email protected]</a>.</p><p><em>Kilian Cavalotti</em></p>

<h1>Instant lightweight GPU instances are now available</h1><p><em>April 27, 2023</em></p><p>We know that getting access to GPUs on Sherlock can be difficult and feel a little frustrating at times. Demand has been steadily growing, leading to long pending times, and waiting in line rarely feels great, especially when you have important work to do.</p><p>Which is why we are excited to announce the immediate availability of our latest addition to the Sherlock cluster: <strong>instant lightweight GPU instances</strong>! Every user can now get immediate access to a GPU instance, for a quick debugging session or to explore new ideas in a Notebook.<br><br>GPUs are the backbone of high-performance computing. They’ve become an integral component of the toolbox for many users, and are essential for deep learning, scientific simulations, and many other applications. But you don’t always need a full-fledged, top-of-the-line GPU for all your tasks. Sometimes all you want is to run a quick test to prototype an idea, debug a script, or explore new data in an interactive Notebook. For this, the new lightweight GPU instances on Sherlock will give you instant access to a GPU, without having to wait in line and compete with other jobs for resources you don’t need.<br><br>Sherlock’s instant lightweight GPU instances leverage NVIDIA’s <a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/">Multi-Instance GPU</a> (MIG) to provide multiple fully isolated GPU instances on the same physical GPU, each with their own high-bandwidth memory, cache, and compute cores.</p>
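<p>From inside a job, a MIG instance looks just like a dedicated GPU with its own slice of memory and compute. A quick, generic way to see what you were actually allocated (not specific to Sherlock, it works anywhere the NVIDIA drivers are installed):</p><pre><code># list the devices visible to your job; a lightweight instance shows up as a "MIG ..." device
$ nvidia-smi -L</code></pre>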
<p>Those lightweight instances are ideal for small to medium-sized jobs, and lower the barrier to entry for all users.<br><br>Similar to the interactive sessions available through the <code>dev</code> partition, Sherlock users can now request a GPU via the <code>sh_dev</code> command and get immediate access with:</p><pre><code>$ sh_dev -g 1</code></pre><p>For interactive apps in the <a href="https://www.sherlock.stanford.edu/docs/user-guide/ondemand/">Sherlock OnDemand</a> interface, requesting a GPU in the <code>dev</code> partition will initiate an interactive session with access to a lightweight GPU instance.</p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/fVC8v76vTKAPzyy0I0Lh/01h55ta3gsgn6y7qksqsnbat6e-image.png" alt="Requesting a GPU in the dev partition from Sherlock OnDemand"></figure><p>So now, everyone gets a GPU, no questions asked! 😁</p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/fVC8v76vTKAPzyy0I0Lh/01h55ta3gsjw21hcsf0z70n9he-image.png" alt=""></figure><p>We hope those new instances will improve access to GPUs on Sherlock, enable a wider range of use cases with all the flexibility and performance you need to get your work done, and lead to even more groundbreaking discoveries!</p><p>As always, thanks to all of our users for your continuous support and patience as we work to improve Sherlock, and if you have any questions or comments, please don’t hesitate to reach out at <a href="mailto:[email protected]">[email protected]</a>.</p><p><em>Kilian Cavalotti</em></p>

<h1>A new tool to help optimize job resource requirements</h1><p><em>March 25, 2023</em></p><p>It’s not always easy to determine the right amount of resources to request for a computing job: you want to make sure that the application will have enough resources to run properly, while avoiding over-requests that would make jobs spend too much time waiting in queue for resources they won’t be using.<br><br>To help users inform those choices, we’ve just added a new tool to the <a href="https://www.sherlock.stanford.edu/docs/software/list/">module list</a> on Sherlock. <code><a href="https://github.com/JanneM/Ruse">ruse</a></code> is a command-line tool developed by <a href="https://github.com/JanneM">Jan Moren</a> which facilitates measuring processes’ resource usage. It periodically measures the resource use of a process and its sub-processes, and can help users find out how much of each resource to allocate to their jobs. It will determine the actual memory, execution time and cores that individual programs or MPI applications need to request in their job submission options.</p>
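<p>In practice, you simply prefix the command you want to measure. A minimal sketch (the module name and the program being profiled are placeholders; check <code>module spider</code> and the ruse documentation for specifics):</p><pre><code># load the tool (module name assumed) and wrap your application with it
$ ml load ruse
$ ruse ./my_simulation input.dat
# ruse samples the run and reports peak memory, cores actually used, and elapsed time:
# numbers you can feed back into your #SBATCH --mem, --cpus-per-task and --time requests</code></pre>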
<code><a href="https://github.com/JanneM/Ruse?utm_source=noticeable&amp;utm_campaign=sherlock.a-new-tool-to-help-optimize-job-resource-requirements&amp;utm_content=publication+link&amp;utm_id=bYyIewUV308AvkMztxix.GtmOI32wuOUPBTrHaeki.SAz2fLkjN80X6CGoMnHX&amp;utm_medium=newspage" rel="noopener nofollow" target="_blank" title="Ruse project webpage">ruse</a></code> is command-line tool developed by <a href="https://github.com/JanneM?utm_source=noticeable&amp;utm_campaign=sherlock.a-new-tool-to-help-optimize-job-resource-requirements&amp;utm_content=publication+link&amp;utm_id=bYyIewUV308AvkMztxix.GtmOI32wuOUPBTrHaeki.SAz2fLkjN80X6CGoMnHX&amp;utm_medium=newspage" rel="noopener nofollow" target="_blank" title="Jan Moren GitHub profile page">Jan Moren</a> which facilitates measuring processes’ resource usage. It periodically measures the resource use of a process and its sub-processes, and can help users find out how much resource to allocate to their jobs. It will determine the actual memory, execution time and cores that individual programs or MPI applications need to request in their job submission options.<br><br>You’ll find more information and some examples in the Sherlock documentation at <a href="https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/?utm_source=noticeable&amp;utm_campaign=sherlock.a-new-tool-to-help-optimize-job-resource-requirements&amp;utm_content=publication+link&amp;utm_id=bYyIewUV308AvkMztxix.GtmOI32wuOUPBTrHaeki.SAz2fLkjN80X6CGoMnHX&amp;utm_medium=newspage#resource-requests" rel="noopener nofollow" target="_blank">https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/#resource-requests</a> <br><br>Hopefully <code>ruse</code> will make it easier to write job resource requests , and allow users to get a better understanding of their applications’ behavior to take better advantage of Sherlock’s capabilities.</p><p>As usual, if you have any question or comment, please don’t hesitate to reach out at <a href="mailto:[email protected]" rel="noopener" target="_blank">[email protected]</a>.</p>Kilian Cavalotti[email protected]urn:noticeable:publications:sfVys1ZofGziZcYKUEhR2023-02-24T02:00:00Z2023-03-01T18:49:02.984ZSRCF is expandingIn order to bring up a new building that will increase data center capacity, a full SRCF power shutdown is planned for late June 2023. It’s expected to last about a week, and Sherlock will be unavailable during that time.<p>The <a href="SRCF" rel="noopener nofollow" target="_blank" title="https://srcc.stanford.edu/facilities">Stanford Research Computing Facility</a> (SRCF), where Sherlock is hosted, has been a highly effective data center since its opening in January of 2014, and demand has grown so much that we’re expanding it! Another identical building (SRCF2) is under construction at SLAC, which will increase our data center capacity when it opens this summer.</p><p>In order to bring power to the new building, the entire existing SRCF data center will need to be shut down. The 12kV electrical infrastructure is so pervasive that for the new building to be connected safely, everything needs to be powered off, including the backup generators. 
This unfortunately means that all servers and equipment will need to be shut down for this event, including Sherlock.<br><br><strong>The full building power shutdown is planned for late June 2023; it’s expected to last for about a week, and Sherlock will be unavailable during that time.</strong></p><p>During the power outage, Sherlock will be entirely powered down, meaning that it will not allow login or data transfer, the <a href="https://www.sherlock.stanford.edu/docs/user-guide/ondemand/">Sherlock OnDemand</a> interface will be down, jobs will not run, and data will not be accessible (including <code>$HOME</code>, <code>$SCRATCH</code> and <code>$OAK</code>). We expect all services to resume normally once power is back up, and jobs that were in queue before the downtime should resume being scheduled normally.<br><br><a href="https://itcommunity.stanford.edu/news/stanford-research-computing-facility-planned-shutdown-june-26-july-3-2023">The power outage is currently scheduled for the last week of June 2023</a>. Specific dates and times have not been finalized yet, but we will share more detailed information as the shutdown date gets closer.<br><br>In anticipation of this week-long downtime, we encourage all users to plan their work accordingly, and ensure that they have contingency plans in place for their computing and data accessibility needs during that time. If you have important data that you need to be able to access while Sherlock is down, we strongly recommend that you start transferring your data to off-site storage systems ahead of time, to avoid last-minute complications. Similarly, if you have deadlines around the time of the shutdown that require computation results, make sure to anticipate those and submit your jobs to the scheduler as early as possible.<br><br>We understand that this shutdown will have a significant impact on users who rely on Sherlock for their computing and data processing needs, and we appreciate your cooperation and understanding as we work to improve our Research Computing infrastructure.<br><br>For help transferring data, or for any questions or concerns, please do not hesitate to reach out to <a href="mailto:[email protected]">[email protected]</a>.</p><p><em>Kilian Cavalotti</em></p>

<h1>More free compute on Sherlock!</h1><p><em>December 14, 2022</em></p>
<p>We’re thrilled to announce that the free and generally available <code>normal</code> partition on Sherlock is getting an upgrade!<br><br>With the addition of 24 brand new <a href="https://www.sherlock.stanford.edu/docs/orders/?h=cbase#configurations">SH3_CBASE.1</a> compute nodes, each featuring one <a href="https://www.amd.com/en/products/cpu/amd-epyc-7543">AMD EPYC 7543</a> Milan 32-core CPU and 256 GB of RAM, Sherlock users now have 768 more CPU cores (24 nodes × 32 cores) at their disposal. These new nodes complement the existing 154 compute nodes and 4,032 cores in that partition, for a <strong>new total of 178 nodes and 4,800 CPU cores.</strong><br><br>The <code>normal</code> partition is Sherlock’s shared pool of compute nodes, which is available <a href="https://www.sherlock.stanford.edu/#how-much-does-it-cost">free of charge</a> to all Stanford Faculty members and their research teams, to support their wide range of computing needs.<br><br>In addition to this free set of computing resources, Faculty can supplement these shared nodes by <a href="https://www.sherlock.stanford.edu/docs/orders/">purchasing additional compute nodes</a>, and become Sherlock owners.
By investing in the cluster, PI groups not only receive exclusive access to the nodes they purchased, but also get access to all of the other owner compute nodes when they’re not in use, thus giving them access to the <a href="https://www.sherlock.stanford.edu/docs/tech/facts/">whole breadth of Sherlock resources</a>: currently over 1,500 compute nodes, 46,000 CPU cores and close to 4 PFLOPS of computing power.<br><br>We hope that this new expansion of the <code>normal</code> partition, made possible thanks to additional funding provided by the University Budget Group as part of the FY23 budget cycle, will help support the ever-increasing computing needs of the Stanford research community, and enable even more breakthroughs and discoveries.<br><br>As usual, if you have any questions or comments, please don’t hesitate to reach out at <a href="mailto:[email protected]">[email protected]</a>.</p><p><em>Kilian Cavalotti</em></p>

<h1>ClusterShell on Sherlock</h1><p><em>December 3, 2022</em></p><p>Ever wondered how your jobs were doing while they were running? Keeping an eye on a log file is nice, but what if you could quickly gather process lists, usage metrics and other data points from all the nodes your multi-node jobs are running on, all at once?<br><br>Enter <a href="https://cea-hpc.github.io/clustershell/">ClusterShell</a>, the best parallel shell application (and library!) of its kind.<br><br>With ClusterShell on Sherlock, you can quickly run a command on all the nodes your job is running on, to gather information about your applications and processes in real time, and collect live output without having to wait for your job to end to see how it did. And with its tight integration with the job scheduler, no need to fiddle with manual node lists anymore: all it needs is a job id!<br><br>You allocated a few nodes in an interactive session and want to distribute some files on each node’s local storage devices? Check: ClusterShell has a <a href="https://clustershell.readthedocs.io/en/latest/tools/clush.html#file-copying-mode">copy mode</a> just for this.</p>
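<p>For example, to push an input file to the node-local scratch space of every node in a running job (a sketch: the file name is a placeholder, and <code>$L_SCRATCH</code> is Sherlock’s per-job, node-local scratch directory):</p><pre><code># copy input.dat to $L_SCRATCH on every node allocated to job $JOBID
$ clush -w @job:$JOBID --copy input.dat --dest $L_SCRATCH/</code></pre>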
<p>Want to double-check that your processes are correctly laid out? Check: you can run a quick command to inspect the process tree across the nodes allocated to your job with:</p><pre><code>$ clush -w @job:$JOBID pstree -au $USER</code></pre><p>and verify that all your processes are running correctly.<br><br>You’ll find more details and examples in our Sherlock documentation, at <a href="https://www.sherlock.stanford.edu/docs/software/using/clustershell/#local-storage">https://www.sherlock.stanford.edu/docs/software/using/clustershell</a><br><br>Questions, ideas, or suggestions? Don’t hesitate to reach out to <a href="mailto:[email protected]">[email protected]</a> to let us know!</p><p><em>Kilian Cavalotti</em></p>

<h1>Job #1, again!</h1><p><em>November 6, 2022</em></p><p>This is not the first time; we’ve been through this already (not so long ago, actually), but today the Slurm job id counter was reset and went from job #67043327 back to job #1.</p><pre><code>       JobID  Partition               Start
------------ ---------- -------------------
    67043327     normal 2022-11-05T10:18:32
           1     normal 2022-11-05T10:18:32</code></pre><p>The largest job id that the scheduler can assign on Sherlock is 67,043,327, so when that number is reached, the next submitted job is assigned job id #1.<br><br>This is the second time this job id reset has happened in Sherlock’s history, since it debuted in 2014. The <a href="https://news.sherlock.stanford.edu/publications/job-1">first occurrence</a> happened on May 11th, 2020, just a little under 2.5 years ago.<br><br>It took about 6 years to submit the first 67 million jobs on Sherlock, but it’s incredible to realize that it took less than half that time to get to that staggering number of submitted jobs once again, and that it all happened since the beginning of the pandemic.<br><br>This is a humbling illustration of Sherlock’s central role and its importance to the Stanford research community, especially over the last few months. It gives us once again the opportunity to thank each and every one of you, Sherlock users, for your continuous support, your extraordinary motivation and all of your patience and understanding when things break. We’ve never been so proud of supporting your amazing work, especially during those particularly trying times. Stay safe and happy computing!</p><p><em>Kilian Cavalotti</em></p>

<h1>From Rome to Milan, a Sherlock catalog update</h1><p><em>November 30, 2021</em></p>
<p>It’s been almost a year and a half since we first <a href="https://news.sherlock.stanford.edu/publications/sherlock-3-0-is-here">introduced Sherlock 3.0</a> and its major new features: brand new CPU model and manufacturer, 2x faster interconnect, much larger and faster node-local storage, and more! We’ve now reached an inflection point in Sherlock’s current generation and it’s time to update the hardware configurations available for purchase in the <a href="https://www.sherlock.stanford.edu/catalog">Sherlock catalog</a>.<br><br>So today, <strong>we’re introducing a new Sherlock catalog refresh</strong>, a Sherlock 3.5 of sorts.</p><h2>The new catalog</h2><p>So, what changes? What stays the same?<br>In a nutshell, you’ll continue to be able to purchase the existing node types that you’re already familiar with:</p><p><strong>CPU configurations:</strong></p><ul><li><p><code>CBASE</code>: base configuration ($)</p></li><li><p><code>CPERF</code>: high core-count configuration ($$)</p></li><li><p><code>CBIGMEM</code>: large-memory configuration ($$$$)</p></li></ul><p><strong>GPU configurations:</strong></p><ul><li><p><code>G4FP32</code>: base GPU configuration ($$)</p></li><li><p><code>G4TF64</code>: HPC GPU configuration ($$$)</p></li><li><p><code>G8TF64</code>: best-in-class GPU configuration ($$$$)</p></li></ul><p>But they now come with better and faster components!<br><br><em>To avoid confusion, the configuration names in the catalog will be suffixed with an index to indicate the generational refresh, but will keep the same overall naming. For instance, the previous <code>SH3_CBASE</code> configuration is now replaced with a <code>SH3_CBASE.1</code> configuration that still offers 32 CPU cores and 256 GB of RAM.</em></p><h3>A new CPU generation</h3><p>The main change in the existing configurations is the introduction of the new <a href="https://www.amd.com/en/processors/epyc-7003-series">AMD 3rd Gen EPYC Milan</a> CPUs.
In addition to the advantages of the previous Rome CPUs, this new generation brings:</p><ul><li><p>a new micro-architecture (Zen3)</p></li><li><p>a ~20% increase in instructions completed per clock cycle (IPC)</p></li><li><p>enhanced memory performance, with a unified 32 MB L3 cache</p></li><li><p>improved CPU clock speeds</p></li></ul><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/Hdh5qDe3icyS6vJXdQpt/01h55ta3gsmvbh16hzp4z34xt9-image.jpg" alt="AMD 3rd Gen EPYC Milan processor"></figure><p>More specifically, for Sherlock, the following CPU models are now used:</p><table><tbody><tr><th><p>Model</p></th><th><p>Sherlock 3.0 (Rome)</p></th><th><p>Sherlock 3.5 (Milan)</p></th></tr><tr><td><p><code>CBASE</code></p></td><td><p>1× 7502 (32-core, 2.50GHz)</p></td><td><p>1× 7543 (32-core, 2.75GHz)</p></td></tr><tr><td><p><code>CPERF</code></p></td><td><p>2× 7742 (64-core, 2.25GHz)</p></td><td><p>2× 7763 (64-core, 2.45GHz)</p></td></tr><tr><td><p><code>CBIGMEM</code></p></td><td><p>2× 7502 (32-core, 2.50GHz)</p></td><td><p>2× 7543 (32-core, 2.75GHz)</p></td></tr><tr><td><p><code>G4FP32</code></p></td><td><p>1× 7502 (32-core, 2.50GHz)</p></td><td><p>1× 7543 (32-core, 2.75GHz)</p></td></tr><tr><td><p><code>G4TF64</code></p></td><td><p>2× 7502 (32-core, 2.50GHz)</p></td><td><p>2× 7543 (32-core, 2.75GHz)</p></td></tr><tr><td><p><code>G8TF64</code></p></td><td><p>2× 7742 (64-core, 2.25GHz)</p></td><td><p>2× 7763 (64-core, 2.45GHz)</p></td></tr></tbody></table><p>In addition to the IPC and L3 cache improvements, the new CPUs also bring a frequency boost that will provide a substantial performance improvement.</p><h3>New GPU options</h3><p>On the GPU front, the two main changes are the re-introduction of the <code>G4FP32</code> model, and the doubling of GPU memory across the board.<br><br>GPU memory is quickly becoming the constraining factor for training deep-learning models that keep increasing in size. Having large amounts of GPU memory is now key for running medical imaging workflows, computer vision models, or anything that requires processing large images.</p><p>The entry-level <code>G4FP32</code> model is back in the catalog, with a new <a href="https://www.nvidia.com/en-us/data-center/a40/">NVIDIA A40 GPU</a> in an updated <code>SH3_G4FP32.2</code> configuration.
The A40 GPU not only provides higher performance than the previous model it replaces, but it also comes with twice as much GPU memory, a whopping 48 GB of GDDR6.</p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/Hdh5qDe3icyS6vJXdQpt/01h55ta3gsf1ph3715608pjxhk-image.png" alt="NVIDIA A40 GPU"></figure><p>The higher-end <code>G4TF64</code> and <code>G8TF64</code> models have also been updated with newer AMD CPUs, as well as updated versions of the <a href="https://www.nvidia.com/en-us/data-center/a100/">NVIDIA A100 GPU</a>, now each featuring a massive 80 GB of HBM2e memory.</p><figure><img src="https://storage.noticeable.io/projects/bYyIewUV308AvkMztxix/publications/Hdh5qDe3icyS6vJXdQpt/01h55ta3gsx86d48cxj1sx0zf1-image.png" alt="NVIDIA A100 GPU"></figure><h2>Get yours today!</h2><p>For more details and pricing, please check out the <a href="https://www.sherlock.stanford.edu/catalog">Sherlock catalog</a> <em>(SUNet ID required)</em>.<br><br>If you’re interested in <a href="https://www.sherlock.stanford.edu/docs/orders/">getting your own compute nodes</a> on Sherlock, all the new configurations are available for purchase today, and can be ordered online through the <a href="https://www.sherlock.stanford.edu/order">Sherlock order form</a> <em>(SUNet ID required)</em>.<br><br>As usual, please don’t hesitate to <a href="mailto:[email protected]">reach out</a> if you have any questions!</p><p><em>Kilian Cavalotti</em></p>