# Sherlock changelog

*News and updates from the Sherlock team · https://www.sherlock.stanford.edu*

---

## Job #1, again!

*November 6, 2022 · Kilian Cavalotti*

This is not the first time; we've been through this already (not so long ago, actually). But today, the Slurm job id counter was reset and went from job #67,043,327 back to job #1.

```
       JobID  Partition                Start
------------ ---------- -------------------
    67043327     normal 2022-11-05T10:18:32
           1     normal 2022-11-05T10:18:32
```

The largest job id that the scheduler can assign on Sherlock is 67,043,327. So when that number is reached, the next submitted job is assigned job id #1.
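Incidentally, a listing like the one above can be pulled from Slurm's accounting database with `sacct`. A minimal sketch, assuming the accounting records for both jobs are still retained (all flags are standard `sacct` options):

```
# -j selects specific job ids, -X hides individual job steps,
# -o picks the fields shown in the table above
$ sacct -j 67043327,1 -X -o JobID,Partition,Start
```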
This is the second time this job id reset has happened in Sherlock's history since it debuted in 2014. The [first occurrence](https://news.sherlock.stanford.edu/publications/job-1) happened on May 11th, 2020, just a little under 2.5 years ago.

It took about 6 years to submit the first 67 million jobs on Sherlock, and it's incredible to realize that it took less than half that time to reach that staggering number of submitted jobs once again, and that it all happened since the beginning of the pandemic.

This is a humbling illustration of Sherlock's central role and its importance to the Stanford research community, especially over the last few months. It gives us once again the opportunity to thank each and every one of you, Sherlock users, for your continuous support, your extraordinary motivation and all of your patience and understanding when things break. We've never been so proud of supporting your amazing work, especially during these particularly trying times. Stay safe and happy computing!

---

## 3.3 PFlops: Sherlock hits expansion milestone

*April 3, 2021 · Kilian Cavalotti*

[Sherlock](https://www.sherlock.stanford.edu) is a traditional High-Performance Computing cluster in many respects. But unlike most similarly-sized clusters, where hardware is purchased all at once and refreshed every few years, it is in constant evolution. Almost like a living organism, it changes all the time: mostly expanding, as individual PIs, research groups, labs and even whole [Schools](https://www.stanford.edu/academics/schools/) contribute computing resources to the system; but also sometimes contracting, when older equipment is retired.

### A significant expansion milestone

A few days ago, Sherlock reached a major expansion milestone, largely owing to significant purchases from the [School of Earth, Energy & Environmental Sciences](https://earth.stanford.edu), but also thanks to multiple existing owner groups who decided to renew their investment in Sherlock by purchasing additional hardware.

With these recent additions, Sherlock reached a theoretical peak of over **3 Petaflops**: three thousand million million (3×10¹⁵) floating-point operations per second. That would place it around the 150th position in the most recent [TOP500](https://top500.org) list of the most powerful computer systems in the world.

Among the newly added nodes are a number of [`SH3_G8TF64` nodes](https://news.sherlock.stanford.edu/publications/new-gpu-options-in-the-sherlock-catalog), each featuring 128 CPU cores, 1 TB of RAM, 8x [NVIDIA A100 SXM4 GPUs](https://www.nvidia.com/en-us/data-center/a100/) (NVLink) and two InfiniBand HDR interfaces providing 400 Gb/s of interconnect bandwidth, both for storage and inter-node communication. Those nodes alone provide over half a Petaflop of computing power.
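As a back-of-the-envelope check, the per-node peak implied by NVIDIA's published A100 figures (9.7 TFLOPS FP64, 19.5 TFLOPS FP64 Tensor Core; vendor specs we're assuming here, not numbers from the announcement) works out to:

```
# peak TFLOPS for one 8-GPU node: plain FP64, then FP64 Tensor Core
$ awk 'BEGIN { print 8 * 9.7, 8 * 19.5 }'
77.6 156
```

At the Tensor Core peak, even a handful of such nodes exceeds half a Petaflop (for instance, 4 × 156 = 624 TFLOPS), consistent with the figure above.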
Sherlock now features over **1,700 compute nodes**, occupying 45 data-center racks and consuming close to half a megawatt of power. Over **44,000 CPU cores**, more than 120 InfiniBand switches and close to 20 miles of cables help support the daily computing activities of over 5,000 users. *For even more facts and numbers, check out the [Sherlock Facts](https://www.sherlock.stanford.edu/docs/overview/tech/facts/) page!*

### A steady growth

Since its first days in 2014, and its initial 120 nodes, Sherlock has been growing at a steady pace. Three generations and as many InfiniBand fabrics later, and after a few months of slowdown at the beginning of 2020, expansion has resumed and is going stronger than ever:

*(figures: charts of Sherlock's expansion over time)*

### The road ahead

To keep expanding Sherlock and continue to serve the computing needs of the Stanford research community, the rack space used by first-generation Sherlock nodes needs to be reclaimed to make room for the next generation. Those 1st-gen nodes have been running well past their initial service life of 4 years, and in most cases we've even been able to keep them going for an extra year. But data-center space being the hot property it has now become, and since demand for new nodes is not exactly dwindling, we'll be starting to retire the older Sherlock nodes to accommodate the ever-increasing requests for more computing power. We've started working on renewal plans with those node owners, and the process is already underway.

So for a while, Sherlock will shrink in size as old nodes are retired, before it can start growing again!
### Catalog changes

As we move forward, the [Sherlock Compute Nodes Catalog](https://www.sherlock.stanford.edu/catalog/) is also evolving, to follow the latest technological trends and to adapt to the computing needs of our research community.

As part of this evolution, the [recently announced](https://news.sherlock.stanford.edu/publications/sh-3-g-4-fp-32-nodes-are-back-in-the-catalog) `SH3_G4FP32` configuration is sadly no longer available, as vendors suddenly and globally discontinued the consumer-grade GPU model that powered it. They have no plans to bring back anything comparable, so that configuration unfortunately had to be pulled from the catalog.

On a more positive note, a significant and exciting catalog refresh is coming up, and will be announced soon. Stay tuned! 🤫

As usual, we want to sincerely thank every one of you, Sherlock users, for your patience when things break, your extraordinary motivation and your continuous support. We're proud of supporting your amazing work, and Sherlock simply wouldn't exist without you.

Happy computing, and don't hesitate to reach out if you have any questions!

---

## Job #1

*May 11, 2020 · Kilian Cavalotti*

If you've been submitting jobs on Sherlock over the last couple of days, you probably noticed something different about your job ids… They lost a couple of digits!

If you submitted a job last week, its job id was likely in the 67,000,000s. Today, it's back in the 100,000s. What happened? Did we reset anything? Did we start simplifying job ids because there were too many numbers to keep track of?

Not really.

It's just that so many jobs are submitted to Sherlock these days (and even more so since the beginning of the stay-at-home directives) that we reached the maximum job id the scheduler can use.

Those job ids are roughly 26 bits in length, with a little headroom for special cases, and the largest job id that the scheduler can assign on Sherlock is 67,043,327.
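For the curious, those numbers line up with Slurm's defaults. A minimal shell sketch (the `0x03FF0000` constant is Slurm's default `MaxJobId`, an assumption on our part, as the post doesn't quote it):

```
# a full 26 bits would allow job ids up to:
$ echo $(( (1 << 26) - 1 ))
67108863

# job ids wrap when they reach Slurm's default MaxJobId of 0x03FF0000,
# so the largest id actually assigned is:
$ echo $(( 0x03FF0000 - 1 ))
67043327

# which leaves 65,536 ids of headroom for special cases:
$ echo $(( (1 << 26) - 0x03FF0000 ))
65536
```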
It means that when that number is reached, the next submitted job is assigned job id #1.

And that's exactly what happened: job #67,043,327 and job #1 were both submitted on Friday night, and started running Saturday morning:

```
       JobID  Partition              Submit               Start
------------ ---------- ------------------- -------------------
           1     normal 2020-05-08T22:21:28 2020-05-09T06:04:50
    67043327     normal 2020-05-08T22:21:28 2020-05-09T06:05:16
```

A few months ago, we [celebrated job #50,000,000](https://news.sherlock.stanford.edu/posts/job-50-000-000). Today, we're celebrating job #1, the beginning of a new cycle. :)

Thanks to each and every one of you, Sherlock users, for your continuous support, your extraordinary motivation and all of your patience and understanding when things break. We've never been so proud of supporting your amazing work, especially during these particularly trying times. Stay safe and happy computing!

---

## 🎉 Job #50,000,000!

*September 12, 2019 · Kilian Cavalotti*

We just wanted to share that Sherlock recently ran job #50,000,000! 🎈🎉

This is a significant milestone since Sherlock, in its current form[^1], started running its first job in January 2017. Fifty million jobs in less than 3 years is no small feat, and it wouldn't have been achieved without the trust and confidence of all of our users.

Thanks to each and every one of you, Sherlock users, for your continuous support, your extraordinary motivation and all of your patience during times when things break. We're proud of supporting your amazing work, and Sherlock simply wouldn't exist without you.

And also, kudos to Xiang Zhu, from [Prof. Wing Wong](https://statistics.stanford.edu/people/wing-hung-wong)'s group in the [Department of Statistics](https://statistics.stanford.edu) for submitting job #50,000,000: you won a $50 Coupa card! ☕

[^1]: Sherlock was born in 2014, and its first instance ran close to 30 million jobs between 2014 and 2017. It was reborn in its current Sherlock 2.0 form in 2017.