We compare a simple Transformer Encoder-Decoder model built on the work of [4] with an Encoder-Decoder model whose encoder is BERT [1], a Transformer with weights pre-trained to learn a bidirectional language model over a large dataset. Our architectures have been trained on semantic ...
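A minimal sketch of the two architectures being compared, assuming PyTorch and the HuggingFace Transformers library; the hyperparameters, the `bert-base-uncased` checkpoint, and the class/variable names are illustrative assumptions, not the exact configuration used here.

```python
# Sketch only: frameworks, checkpoint, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import BertModel

# Baseline: a plain Transformer Encoder-Decoder in the style of [4], trained from scratch.
baseline = nn.Transformer(d_model=512, nhead=8,
                          num_encoder_layers=6, num_decoder_layers=6,
                          batch_first=True)

# Variant: reuse BERT's pre-trained bidirectional encoder [1], keep a Transformer decoder.
class BertEncoderDecoder(nn.Module):
    def __init__(self, tgt_vocab_size: int, d_model: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")  # pre-trained weights
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, input_ids, attention_mask, tgt_ids):
        # Encode the source with BERT; its hidden states serve as decoder memory.
        memory = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1)).to(tgt.device)
        hidden = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.out_proj(hidden)  # logits over the target vocabulary
```

In this sketch the baseline learns all of its parameters from the task data alone, while the BERT-encoder variant starts from pre-trained encoder weights and only trains the decoder (and optionally fine-tunes the encoder).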