bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    has custom scale
    bf16 is only supported on A100+ GPUs
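These messages come from attention-backend selection (xFormers / PyTorch memory-efficient attention) rejecting bf16 inputs on pre-Ampere hardware. A minimal sketch, assuming a plain PyTorch setup (the tensor shapes and names are placeholders, not taken from the report above), of guarding the dtype choice at runtime so the backend check does not fail:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Prefer bf16 only when the GPU actually supports it (Ampere / compute
# capability >= 8.0); otherwise fall back to fp32, which every backend accepts.
if device == "cuda" and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float32

# Toy attention inputs: (batch, heads, seq_len, head_dim); shapes are placeholders.
q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)

# scaled_dot_product_attention dispatches to whichever backend accepts these
# dtypes/shapes; feeding bf16 to a pre-Ampere GPU is what produces messages
# like "bf16 is only supported on A100+ GPUs".
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(out.dtype, out.shape)
```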
Because memory capacity is not the only relevant metric? I see. The RTX 4090 is based on the Ada Lovelace architecture, which is newer than the Ampere architecture, which in turn is newer than the Turing architecture that the Quadro RTX 6000 uses. I understand. e.g. is “for example”...
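To make the generation comparison concrete, a small sketch (the device examples in the comment are the ones from the discussion, used only as an assumption) that reads the CUDA compute capability, which is what distinguishes these architectures in practice:

```python
import torch

# Compute capability by generation: Turing (e.g. Quadro RTX 6000) reports 7.5,
# Ampere (e.g. A100) reports 8.0, Ada Lovelace (e.g. RTX 4090) reports 8.9.
# bf16 kernels generally require capability 8.0 or newer.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}")
```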
RuntimeError: grad_scale is too small when training Large scale Zipformer model (streaming) #1463
additional instruction for the `grad_scale is too small` error #1550
Disadvantages of bf16 training: Limited hardware support (only supports Ampere and afterwards, so V100 not supported) ...
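The "grad_scale is too small" error is characteristic of fp16 training with a gradient/loss scaler whose scale keeps shrinking toward zero; bf16 has the same exponent range as fp32, so it can usually train without a scaler at all, which is why it is attractive despite the hardware limitation. A minimal sketch contrasting the two setups under standard PyTorch AMP (the tiny model, optimizer, and loss below are placeholders, not the Zipformer recipe from the issues above):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 16).cuda(), torch.randn(8, 4).cuda()

# fp16 path: needs a GradScaler; its scale can underflow over time, which is
# what recipes report as "grad_scale is too small".
scaler = torch.cuda.amp.GradScaler()
with torch.autocast("cuda", dtype=torch.float16):
    loss = nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.zero_grad()

# bf16 path: same exponent range as fp32, so no scaler is needed, but it only
# runs on Ampere-or-newer GPUs (hence V100 is not supported).
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()
```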