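Retrieval-augmented generation (RAG) is a promising approach for mitigating hallucination in large language models (LLMs). However, existing research lacks a rigorous evaluation of how retrieval-augmented generation affects different large language models, which makes it challenging to identify the potential capability bottlenecks of RAG across different LLMs. In this paper, we systematically investigate the impact of retrieval-augmented generation on large language models. We analyze different large language models across 4 types of RAG-required...
Paper reading: BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling. In LLM alignment there are two fundamental questions: What is the upper bound of RLHF optimization? And how can a model be trained to approach that bound without significantly shifting the model's distribution (since distribution shift can cause unintended model behavior, such as responses growing ever longer)… Didi Paper title: Multi-Task Lea...
Close reading: TASKBENCH: BENCHMARKING LARGE LANGUAGE MODELS FOR TASK AUTOMATION. LLMs have driven the development of task automation, which decomposes complex tasks described by user instructions into subtasks and invokes external tools to execute them, playing a central role in agents. However, a systematic, standardized benchmark for advancing LLM task automation is still lacking.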
Large Language Models (LLMs) continue to demonstrate their utility through a variety of emergent capabilities in different fields. One area of cybersecurity that could benefit from effective language understanding is the analysis of log files. This work explores LLMs with different architectures (BERT, RoBE...
This study investigates the strategic decision-making abilities of large language models (LLMs) via the game of Tic-Tac-Toe, renowned for its straightforward rules and definitive outcomes. We developed a mobile application coupled with web services, facilitating gameplay am...
Can large models handle clinical diagnosis tasks? Interactive medical diagnosis simulation and evaluation. AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator. Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, ...
Large language models are effective in aiding threat hunting and incident investigation. However, they would still require some guardrails and guidance. We believe that this potential application can be implemented using LLMs out of the box, with careful prompt engineering. ...
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer. Authors: Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, Hannaneh Hajishirzi. Despite remarkable advancements in few-shot generalization in n...
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems 15 Apr 2024 · Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma · Programming often involves converting detailed and complex specifications into...
CriticBench: Evaluating Large Language Models as Critic Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao 2024 Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models Chenyang Lyu, Minghao Wu, Alham Fikri Aji ...