For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given ...
Dataset    Num.   #Hop  text-davinci-003        ChatGPT                 BARD
                        0-shot  1-shot  COT     0-shot  1-shot  COT     0-shot  1-shot  COT
NeuLR      3,000  1~5   50.93   59.17   67.90   37.27   48.13   48.00   63.67   65.07   66.00
Deductive  1,000  2     59.00   69.40   86.10   85.20   69.10   68.30   87.40   93.10   91.90
Inductive  1,000  3     86.90   89.60   95.60   15.10   68.60   69.60   96.00   92.60   96.30
...
PARARULE Plus is a deep multi-step reasoning dataset over natural language. It can be seen as an improvement on the PARARULE dataset (Peter Clark et al., 2020). The motivation is to generate deeper PARARULE training samples. We add more training samples for the case where the depth ...
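GitHub: https://github.com/csitfun/ConTRoL-dataset ConTRoL is a natural language inference dataset that probes contextual reasoning in depth, especially logical reasoning. It contains 8,325 expert-designed "context-hypothesis" pairs, each labeled "entailment/neutral/contradiction". As a passage-level natural language inference dataset, ConTRoL tests complex contextual logical reasoning and is sourced from ...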
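Paper: Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment] Recently, the University of California, Berkeley, Meta AI, and the University of California, Los Angeles jointly proposed Reinforcement Learning from Contrast Distillation (RLCD), which makes language models follow natural language principles without using human feedback ...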
Cater: a diagnostic dataset for compositional actions and temporal reasoning. In Proc. of 8th International Conference on Learning Representations (2020). James, S., Ma, Z., Arrojo, D. R. & Davison, A. J. Rlbench: the robot learning benchmark & learning environment. IEEE Robot. Autom. ...
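Hello everyone. The paper I am sharing today is "MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos", published at MM 2020. The paper defines a new task called Multimodal Emotion Reasoning, proposes MEmoR, a dataset that helps address it, and then proposes AMER, an attention-based model for the task. 1. Motivation The paper argues that enabling AI systems ...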
(like GPT-4 or Llama-2) pre-trained on enormous amounts of data and capable of performing a wide variety of language tasks in a zero-shot manner (or at least of being fine-tuned on a specific dataset). These days, multimodal FMs even support language, vision, audio, and other modalities...
consider SPARQL query number nine from the LUBM test suite, which turned out to be one of the most challenging of the 14 given queries. The query asks for students and their advisors who teach courses taken by those students, a triangular relationship pattern over most of the dataset:...
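To make the triangular pattern concrete, here is a minimal Python sketch over a handful of toy triples. It is not the LUBM query text and not a SPARQL engine. The data, the predicate names (`advisor`, `takesCourse`, `teacherOf`) and the helper functions are assumptions chosen for illustration only.

```python
# Toy (subject, predicate, object) triples. All names are hypothetical.
triples = [
    ("alice", "advisor", "prof_x"),
    ("alice", "takesCourse", "course_1"),
    ("prof_x", "teacherOf", "course_1"),
    ("bob", "advisor", "prof_y"),
    ("bob", "takesCourse", "course_1"),
    ("prof_y", "teacherOf", "course_2"),
]


def index(triples, predicate):
    """Map each subject to the set of objects linked to it by `predicate`."""
    out = {}
    for s, p, o in triples:
        if p == predicate:
            out.setdefault(s, set()).add(o)
    return out


def triangles(triples):
    """Return (student, advisor, course) where the advisor teaches a course the student takes."""
    advisor = index(triples, "advisor")
    takes = index(triples, "takesCourse")
    teaches = index(triples, "teacherOf")
    result = []
    for student, advisors in advisor.items():
        for adv in advisors:
            # The third edge closes the triangle: the course must appear on both sides.
            shared = takes.get(student, set()) & teaches.get(adv, set())
            for course in shared:
                result.append((student, adv, course))
    return sorted(result)


print(triangles(triples))
```

In this toy data only `alice` closes the triangle. `bob` has an advisor and takes a course, but his advisor teaches a different one. All three edges must agree on the same bindings, so no single edge can be evaluated in isolation.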
The Touchdown Dataset (c) 2018 The Touchdown Dataset is licensed under a Creative Commons Attribution 4.0 International License. You should have received a copy of the license along with this work. If not, see http://creativecommons.org/licenses/by/4.0/. ...