almost 2 years ago
A new tool to help optimize job resource requirements
by Kilian Cavalotti, Technical Lead & Architect, HPC
It’s not always easy to determine the right amount of resources to request for a computing job. Making sure that the application will have enough resources to run properly, but avoiding over-requests that would make the jobs spend too much...

almost 2 years ago
SRCF is expanding
by Kilian Cavalotti, Technical Lead & Architect, HPC
Tags: Maintenance
In order to bring up a new building that will increase data center capacity, a full SRCF power shutdown is planned for late June 2023. It’s expected to last about a week, and Sherlock will be unavailable during that time.

about 2 years ago
ClusterShell on Sherlock
by Kilian Cavalotti, Technical Lead & Architect, HPC
Tags: Software, New
Ever wondered how your jobs were doing while they were running? Keeping an eye on a log file is nice, but what if you could quickly gather process lists, usage metrics and other data points from all the nodes your multi-node jobs are running...

over 2 years ago
Job #1, again!
by Kilian Cavalotti, Technical Lead & Architect, HPC
This is not the first time: we’ve been through this already (not so long ago, actually). But today, the Slurm job id counter was reset and went from job #67043327 back to job #1.

over 3 years ago
Keep up to date with software updates
by Kilian Cavalotti, Technical Lead & Architect, HPC
To help users stay on top of software changes on Sherlock, we’ve recently introduced a new software updates RSS feed. It’s available from the Sherlock software list page, and you can directly add it to your RSS reader of choice. And if...

over 3 years ago
A new interactive step in Slurm
by Kilian Cavalotti, Technical Lead & Architect, HPC
A new version of the sh_dev tool has been released that leverages a recently added Slurm feature. Slurm 20.11 introduced a new “interactive step”, designed to be used with salloc to automatically launch a terminal on an allocated compute...

about 4 years ago
Tracking NFS problems down to the SFP level
by Kilian Cavalotti
Tags: Blog, Data, Hardware
When NFS problems turn out to be... not NFS problems at all.

almost 5 years ago
Job #1
by Kilian Cavalotti
If you've been submitting jobs on Sherlock over the last couple of days, you probably noticed something different about your job ids... They lost a couple digits! If you submitted a job last week, its job id was likely in the 67,000...

almost 5 years ago
Sherlock is hard at work against COVID-19
by Kilian Cavalotti
About a month ago, we announced that we were dedicating a portion of Sherlock's computing resources to research projects around COVID-19. Since then, more than 15 PIs and research groups have reached out to share their projects, and...