However, when I load the pretrained weights directly from the PyTorch API with the following code, the model trains successfully:

    def vit_h_14():
        pretrained_vit_weights = torchvision.models.ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1
        pretrained_vit = torchvision.models.vit_h_14(weights=pretrained_vit_weights).to(device)
        for parameter in pretrained_vit...
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. This error only happens with the `vit_h_14` model on a CUDA device (the CPU is fine). I also cannot reproduce the error on an AWS cluster machine. The error seems to be either machine- or environment-dependent and likely to be pyto...
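
Since the failure appears on one machine but not another, it may help to diff the two environments and to make CUDA errors surface at the call that caused them. The following is only a minimal sketch, not a confirmed fix: `environment_report` is a hypothetical helper name, and it assumes nothing beyond standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.get_device_name`, `torch.backends.cudnn.version`). Setting `CUDA_LAUNCH_BLOCKING=1` before the first CUDA call makes kernel launches synchronous, so the traceback should point at the failing operation.

```python
import os
import platform

# Must be set before the first CUDA call so kernel launches run synchronously
# and the Python traceback points at the operation that actually failed.
# setdefault keeps any value the user has already exported.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def environment_report():
    """Collect version details worth diffing between two machines.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "CUDA_LAUNCH_BLOCKING": os.environ.get("CUDA_LAUNCH_BLOCKING"),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report

    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this on both the failing machine and the AWS machine and comparing the output line by line is one way to narrow down whether the PyTorch build, the CUDA version, or the GPU model differs. PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report.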