llm+evaluation+survey

2025-03-15 10:50:16

A Survey of LLM Evaluation Methods - Zhihu

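A Survey on Evaluation of Large Language Models. Original paper: A Survey on Evaluation of Large Language Models. GitHub: https://github.com/MLGroupJLU/LLM-eval-survey. Large language models (LLMs) are increasingly popular in both academia and industry because of their unprecedented performance across a wide range of applications. As LLMs continue to play an important role in research and everyday use, how to evaluate them...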
...Survey: Understanding the planning of LLM agents: A survey...

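...8 Evaluation; 9 Paper Summary. Paper overview: As large language models (LLMs) have shown remarkable intelligence, progress in using LLMs as the planning module of autonomous agents has attracted growing attention. This survey offers the first systematic view of LLM-based agent planning, covering recent work aimed at improving planning ability. We categorize existing LLM-agent planning work into: Task Decomposi...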
[LLM Survey] A Survey on Using Large Language Models for Natural Language Generation Evaluation

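In addition, Jiang et al. (2023) sampled data from various text generation datasets (including summarization, translation, and data2text), whose system outputs comprise both real system outputs and outputs synthesized by GPT-4, and prompted GPT-4 to curate error analyses that were then used to fine-tune LLaMA for fine-grained evaluation. Paper title: Leveraging Large Language Models for NLG Evaluation: A Survey. Paper link: https://arxiv.org/abs/2401.07103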
Hung-yi Lee (Taiwan, China): How Can LLMs Evaluate Text Quality Better? - Tencent Cloud Developer Community...

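Large language models (LLMs) are used more and more widely in natural language processing, but how best to use them to evaluate text quality has remained a challenge. A recent study examines in depth how to maximize evaluation performance when LLMs are used to assess natural language generation, and offers several important guidelines. Let's take a look! Paper: A Closer Look into Automatic Evaluation Using Large Language Models Lin...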
GitHub - tjunlp-lab/Awesome-LLMs-Evaluation-Papers: The...

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. - tjunlp-lab/Awesome-LLMs-Evaluation-Papers
a survey on evaluation of llms - Baidu Wenku

a survey on evaluation of llms: Chinese translation. "a survey on evaluation of llms" is rendered in Chinese as "A Review of Research on the Evaluation of Distance Learning Management Systems."
A Brief Roundup of Large Language Model (LLM) Evaluation Metrics - bonelee - cnblogs

When Neural Model Meets NL2Code: A Survey [J]. arXiv preprint arXiv:2212.09420, 2022. (This paper was previously titled "Large Language Models Meet NL2Code: A Survey" and has since been renamed.) [5] Liang P, Bommasani R, Lee T, et al. Holistic evaluation of language models [J]. arXiv preprint arXiv:2211.09110, 2022. [6] Lees ...
A 6,000-Word Breakdown: The 10 Major Challenges in Current Large Language Model (LLM) Research

Survey of Hallucination in Natural Language Generation (Ji et al., 2022); How Language Model Hallucinations Can Snowball (Zhang et al., 2023); A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (Bang et al., 2023) ...
GitHub - Paitesanshi/LLM-Agent-Survey

we conduct a comprehensive survey study, focusing on the construction, application, and evaluation of LLM-based autonomous agents. In particular, we first explore the essential components of an AI agent, including a profile module, a memory module, a planning module, and an action module. We fu...
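The four components named in this snippet (profile, memory, planning, action) can be pictured as a minimal agent skeleton. This is a hypothetical sketch, not the survey's implementation: the class names, the keyword-overlap memory retrieval, and the rule-based planner that stands in for an LLM call are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Profile module: the role the agent plays (hypothetical fields)."""
    name: str
    role: str


@dataclass
class Memory:
    """Memory module: a list of past tasks with naive keyword retrieval (assumption)."""
    records: list = field(default_factory=list)

    def write(self, text: str) -> None:
        self.records.append(text)

    def retrieve(self, query: str) -> list:
        # Return records that share at least one lowercase word with the query.
        words = set(query.lower().split())
        return [r for r in self.records if words & set(r.lower().split())]


class Planner:
    """Planning module: a rule-based stand-in for an LLM planner (assumption)."""

    def plan(self, task: str, context: list) -> list:
        steps = [f"gather info for: {task}"]
        if context:
            steps.append(f"reuse {len(context)} memory record(s)")
        steps.append(f"answer: {task}")
        return steps


class Actor:
    """Action module: 'executes' a step by returning a log string."""

    def act(self, step: str) -> str:
        return f"done -> {step}"


@dataclass
class Agent:
    profile: Profile
    memory: Memory = field(default_factory=Memory)
    planner: Planner = field(default_factory=Planner)
    actor: Actor = field(default_factory=Actor)

    def run(self, task: str) -> list:
        context = self.memory.retrieve(task)          # memory -> planning
        steps = self.planner.plan(task, context)      # planning -> action
        results = [self.actor.act(s) for s in steps]
        self.memory.write(task)                       # store for later retrieval
        return results


if __name__ == "__main__":
    agent = Agent(Profile("a1", "research assistant"))
    print(agent.run("summarize llm evaluation"))
    print(agent.run("llm evaluation metrics"))
```

On the second call the planner sees one related memory record and adds a step for it, which is the kind of memory-to-planning interaction the survey's module breakdown describes; a real agent would replace the rule-based `Planner` with model calls.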
Safety Evaluation of Large Models: LLMs Evaluation in Safety - Zhihu

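Representative work on safety evaluation benchmarks. 1.1 SafetyBench. The first is SafetyBench, proposed by Tsinghua University at ACL 2024. Its data has the following overall characteristics: the paper defines 7 categories of safety risk and constructs 11,435 single-choice questions covering both Chinese and English. The data comes from three sources: (a) merging existing public datasets.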
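Because SafetyBench consists of single-choice questions grouped by safety-risk category, a natural way to score a model is accuracy, both overall and per category. The sketch below is not SafetyBench's official evaluation script: the record format, the placeholder category names (`category_1`, `category_2`), the sample responses, and the answer-extraction rule (take the first standalone capital letter A to D) are all assumptions.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical items: (question_id, risk_category, gold_option_letter).
ITEMS = [
    ("q1", "category_1", "B"),
    ("q2", "category_1", "A"),
    ("q3", "category_2", "C"),
    ("q4", "category_2", "D"),
]

# Hypothetical raw model responses, keyed by question id.
RESPONSES = {
    "q1": "Answer: B",
    "q2": "The correct option is A.",
    "q3": "C",
    "q4": "I am not sure.",
}

# A standalone capital A-D (the word boundaries skip letters inside words like "Answer").
OPTION_RE = re.compile(r"\b([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the first standalone option letter A-D, or None if there is none."""
    match = OPTION_RE.search(response)
    return match.group(1) if match else None


def score(items, responses):
    """Overall and per-category accuracy; unparseable answers count as wrong."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, category, gold in items:
        total[category] += 1
        if extract_choice(responses.get(qid, "")) == gold:
            correct[category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category


if __name__ == "__main__":
    overall, per_category = score(ITEMS, RESPONSES)
    print(overall, per_category)
```

With these sample responses, three of four answers match the gold letter, so overall accuracy is 0.75, with 1.0 for `category_1` and 0.5 for `category_2`. Reporting per-category accuracy alongside the overall figure shows which risk categories a model handles poorly.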