Together, these optimizations enable a throughput of 4550 images/s on a sparse ResNet-50 model at a batch size of 1, which is nearly 4x the throughput of NVIDIA's fastest machine-learning-targeted GPU, the V100, and outperforms all prior work on FPGAs.