(WG) update.WG takes about one-third of the computation in the whole training process.Current training accelerators usually ignore the special computation property of WG and process it in a way similar to FF/BP.Besides, the extensive data sparsity existing in WG, which brings opportunities to ...