Quite a few works that show BERT+NMT is effective almost never compare against back-translation (BT); their only value is demonstrating that the proposed method lets BERT improve NMT. From a practical standpoint, though, if BT works just as well, why use BERT at all? It feels like firing a cannon at a mosquito, especially for resource-rich language pairs.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiahui/workspace/nmt/bert_nmt/fairseq/fairseq/modules/__init__.py", line 33, in <module>
    from .multihead_attention import MultiheadAttention
  File "/home/jiahui/workspace/nmt/bert_nmt/fairseq/fairseq/modules/multihead_attention.py", line 25, ...
As the name suggests, the BERT Simple Seq2Seq Model is the simplest structural way to apply BERT to an NLG task. It follows the Transformer architecture; the only difference is that it uses BERT as the encoder. The BERT Fused Seq2Seq Model follows the methodology suggeste...
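To make the "BERT as encoder" idea above concrete, here is a minimal PyTorch sketch, not the repository's actual implementation: a pretrained BERT encodes the source sentence, and a standard Transformer decoder cross-attends to its hidden states. The class name BertSimpleSeq2Seq, the checkpoint "bert-base-uncased", and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the "BERT as encoder" Simple Seq2Seq idea, assuming
# HuggingFace transformers and PyTorch are available. Hyperparameters and
# the checkpoint name are illustrative, not taken from the original work.
import torch
import torch.nn as nn
from transformers import BertModel


class BertSimpleSeq2Seq(nn.Module):
    def __init__(self, tgt_vocab_size, d_model=768, nhead=8, num_decoder_layers=6):
        super().__init__()
        # BERT replaces the Transformer encoder stack.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_decoder_layers)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_attention_mask, tgt_ids):
        # Encode the source once with BERT; the decoder cross-attends to it.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_attention_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)  # positional encoding omitted for brevity
        # Causal mask so each target position only sees earlier positions.
        seq_len = tgt_ids.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tgt_ids.device),
            diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal_mask,
                           memory_key_padding_mask=~src_attention_mask.bool())
        return self.out_proj(out)  # logits over the target vocabulary
```

Note that this sketch only reuses BERT on the source side and leaves the decoder randomly initialized, which matches the "BERT as encoder" description above; it does not implement the fused variant.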