HPC & Quantum
HPC Guru (Twitter)

RT @johnclinford: The AHUG Cloud Hackathon for #Arm #HPC has started! 36 teams, 61 clusters, 110 apps/mini-apps, 5 days, 4 AWS, #graviton2, #efa, #spack, #reframe, #slurm ... it's awesome!
HPC Guru (Twitter)

RT @hpcjoe: Fellow #HPC ers who use #SLURM for their schedulers, I'm curious about something. How many of you are using REST API based tooling to interact with it (as in submit/delete/status jobs) to any great extent? Or are you using just CLI/C-API?
HPC Guru (Twitter)

#AWS believes it has finally created a cloud service that will break through with skeptical #HPC and #supercomputing customers

Parallel Computing Service: @awscloud adds support for the #Slurm scheduler to ease the transition to #cloud

https://www.hpcwire.com/2024/08/29/aws-perfects-cloud-service-for-supercomputing-customers/

via @HPCwire
HPC Guru (Twitter)

ML clusters @Meta:

o Use #Slurm on top of bare-metal allocations

o MTTF of 1024-GPU jobs is 7.9 hours - ~2 orders-of-magnitude lower than 8-GPU jobs (47.7 days)

o Restart time is 5-20 minutes after a failure

o Meta rediscovers idea of “forward progress” as defined by NNSA
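
A quick check of the MTTF figures quoted above, as a minimal sketch. The function names are hypothetical, and the inverse-scaling comparison (MTTF proportional to 1 / GPU count) is an assumption added here for illustration, not a claim made in the post.

```python
import math

HOURS_PER_DAY = 24


def mttf_ratio(small_job_mttf_days, large_job_mttf_hours):
    """How many times longer the small job runs between failures."""
    return small_job_mttf_days * HOURS_PER_DAY / large_job_mttf_hours


def inverse_scaling_estimate(small_job_mttf_days, small_gpus, large_gpus):
    """Large-job MTTF in hours, ASSUMING MTTF scales as 1 / GPU count."""
    return small_job_mttf_days * HOURS_PER_DAY * small_gpus / large_gpus


# Figures from the post: 47.7 days for 8-GPU jobs, 7.9 hours for 1024-GPU jobs.
ratio = mttf_ratio(47.7, 7.9)
orders = math.log10(ratio)
predicted = inverse_scaling_estimate(47.7, 8, 1024)

print(f"ratio: {ratio:.1f}x")                    # ratio: 144.9x
print(f"orders of magnitude: {orders:.2f}")      # orders of magnitude: 2.16
print(f"1/N scaling predicts: {predicted:.2f} h")  # 1/N scaling predicts: 8.94 h
```

The ratio of about 145x matches the "~2 orders of magnitude" in the post. Under the assumed 1/N scaling, 128 times more GPUs would give roughly 8.9 hours, close to the reported 7.9 hours.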