```python
    and self._torchao_fp8_handler.precompute_scale
):
    warnings.warn(
        f"Parameters are kept in float32 as {self.distributed_mode=} and fp8 dynamic scaling precompute is enabled"
    )
else:
    model.to(dtype=torch.bfloat16)
model = self._torchao_fp8_handler.convert_...
```
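The fragment above is truncated, but the pattern it shows is: when distributed training with fp8 dynamic-scaling precompute is active, the parameters stay in float32 (with a warning); otherwise the model is cast to bfloat16, and the fp8 handler then converts the eligible layers. Below is a minimal, self-contained sketch of that branch, not the original code. It assumes torchao's float8 training API (`convert_to_float8_training`), and the `prepare_model_for_fp8` helper, its `distributed_mode` string, and the `precompute_scale` flag are hypothetical stand-ins for the handler's configuration.

```python
# Sketch only: illustrates the bf16-cast vs. fp32-keep branch before fp8 conversion.
import warnings

import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training  # assumed torchao API


def prepare_model_for_fp8(model: nn.Module, distributed_mode: str, precompute_scale: bool) -> nn.Module:
    """Cast to bf16 unless fp8 dynamic-scaling precompute requires float32 params."""
    if distributed_mode == "fsdp" and precompute_scale:
        # With FSDP fp8 all-gather and dynamic scale precompute, the sharded
        # parameters are expected to stay in float32, so skip the bf16 cast.
        warnings.warn(
            f"Parameters are kept in float32 as {distributed_mode=} and "
            "fp8 dynamic scaling precompute is enabled"
        )
    else:
        model.to(dtype=torch.bfloat16)
    # Swap eligible nn.Linear modules for float8 training linears.
    return convert_to_float8_training(model)
```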