Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
While previous vision-language (VL) research has focused mainly on improving the vision-language fusion model and has left the object detection model untouched, we show that visual features matter significantly in VL models. In our experiments, we feed the visual features generated by the new object detection ...