Describe the bug
AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size: 16 != 2 * 1 * 1. This error only occurs when using DeepSpeed v0.9.0 and ZeRO stage 2. M...
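For context, DeepSpeed enforces `train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. Below is a minimal sketch of a config that satisfies the check for the single-GPU case in the error message; the accumulation value of 8 is just one way to make 16 work out (an assumption, not taken from the report — setting `train_batch_size` to 2 instead would also resolve it):

```python
# Sketch of a self-consistent DeepSpeed config dict; the keys are
# DeepSpeed's standard config fields, the values are illustrative.
ds_config = {
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 8,  # assumption: chosen so 2 * 8 * 1 == 16
    "zero_optimization": {"stage": 2},
}

world_size = 1  # single GPU, as in the failing run above

# This mirrors the assertion DeepSpeed performs at startup.
assert ds_config["train_batch_size"] == (
    ds_config["train_micro_batch_size_per_gpu"]
    * ds_config["gradient_accumulation_steps"]
    * world_size
)
```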
I can reproduce this with the above code on my 3090 Ti with xformers 0.0.21, and on the T4 GPU on free Google Colab with xformers 0.0.22.dev599.
The benchmarking method comes from Mu Li (沐神): github.com/mli/transfor

|  | A100 | A6000 | V100 | 3090 Ti | 4090 |
| --- | --- | --- | --- | --- | --- |
| Theory TF32 (FP32) / FP16 (TFLOPS) | 156 / 312 | 75 / 150 | 16 / 125 |  | 80 / 160 |
| Memory (GB) / Bandwidth (GB/s) | 80 / 2039 | 48 / 768 | 32 / 900 |  | 24 / 1008 |
| Approximate price ($) | 16,000 | 4,000 | 3,500 |  | 1,500 |

Matrix Multiplication...
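To make the matrix-multiplication comparison concrete, here is a minimal sketch in the spirit of the linked benchmark (assuming PyTorch and a CUDA GPU; the size n = 8192 and iteration count are arbitrary choices, not taken from the post). It times `a @ b` and divides the 2n³ FLOP count by the elapsed time to get achieved FP16 TFLOPS, which can then be compared against the theory numbers in the table:

```python
import torch

def matmul_tflops(n=8192, dtype=torch.float16, iters=10):
    """Time an n x n matmul on the GPU and report achieved TFLOPS."""
    a = torch.randn(n, n, dtype=dtype, device="cuda")
    b = torch.randn(n, n, dtype=dtype, device="cuda")
    # Warm up so kernel selection and lazy initialization are not timed.
    for _ in range(3):
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000 / iters  # elapsed_time is in ms
    return 2 * n**3 / seconds / 1e12  # one n x n matmul costs 2n^3 FLOPs

print(f"{matmul_tflops():.1f} TFLOPS")
```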
deephog: network.get_input(0).shape = (4, 3, 224, 224) — I tried your method, and I can see that the input shape of the TRT engine changed as I specified. However, the inference time of the engine doesn't change no matter what batch size I use when building the engine. I have the engine...
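For reference, a minimal sketch of where that shape assignment sits in an explicit-batch build, assuming TensorRT 8.x and an ONNX model at "model.onnx" (a hypothetical path, not from the thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network: the batch dimension is part of the tensor shape.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

# Pin a static input shape before building, as in the comment above.
network.get_input(0).shape = (4, 3, 224, 224)

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)
```

Note that with a static shape like this, every execution processes a full batch of 4 regardless of how many inputs are actually needed, so it may be worth verifying that the timing being compared actually covers differently sized builds.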
❓ Questions & Help
When running evaluation, why am I getting slightly different output with a batch size of 1 compared to a batch size greater than 1?
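Small numeric drift between batched and single-sample runs usually comes from floating-point reduction order in the batched kernels (and, for padded sequence models, from attention-mask handling). A minimal sketch for quantifying the gap on a toy model — the model and sizes here are illustrative, not from the original question:

```python
import torch

torch.manual_seed(0)

# Any network works for this check; a small MLP keeps it simple.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

x = torch.randn(8, 256)

with torch.no_grad():
    out_batched = model(x)  # one forward pass with batch size 8
    out_single = torch.cat(
        [model(x[i : i + 1]) for i in range(8)]  # eight passes with batch size 1
    )

# Any nonzero difference comes from kernel/reduction order,
# not from the model's weights.
print((out_batched - out_single).abs().max())
```

If the maximum difference is on the order of 1e-6 for float32, it is ordinary floating-point nondeterminism; substantially larger gaps usually point at padding or attention-mask handling rather than the batching itself.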