The introduction of SuperBench marks a significant advancement in proactive system validation. By addressing the challenges of gray failures and improving the reliability of cloud AI infrastructure, SuperBench not only enhances system performance but also contribute...
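My first ATC submission was accepted with OveMer 4434 (RevExp 1312); the scores did not change after the rebuttal. The topics we selected were cloud computing and storage. The paper targets encrypted container images and improves on the existing deduplication approach, which first decompresses the image layers and then applies message-locked encryption. All four reviewers appreciated our motivation and techniques. In the rebuttal phase we were asked to focus on explaining how the system's warm-up is done. ATC ...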
The Chinese translation of https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf - mapleFU/zookeeper_paper_cn
To reproduce the performance, we suggest using the same software versions as specified in Section 6.1, "Experimental Settings", of the paper. We also provide a Docker image that includes all the suggested software: docker pull yuantailing/megatron-kwai:atc24ae-1.0.0...
system for cloud AI infrastructure that mitigates hidden degradation caused by hardware redundancies and enhances overall reliability. The paper on SuperBench has been accepted by USENIX ATC 2024, one of the top academic conferences in the field of computer systems, ...