而本次引发关注的正是其新版本 KwaiYii-13B,该预训练模型具备优异的通用技术底座能力,在 MMLU、CMMLU、C-Eval、HumanEval 等 Benchmark 上目前处于同等模型规模的领先水平,显示出快意大模型在中文和英文上都具备非常强悍的能力。KwaiYii-13B-Base 在 Benchmark 上的效果 KwaiYii-13B-Chat 对话模型具有出色的语...
例如,KwaiYii-13B-Base 预训练模型在 MMLU、CMMLU、C-Eval、HumanEval 等 Benchmark 上目前处于同等模型规模的领先水平。KwaiYii-13B-Chat 对话模型具备出色的语言理解和生成能力,支持内容创作、信息咨询、数学逻辑、代码编写、多轮对话等广泛任务,人工评估结果表明 KwaiYii-13B-Chat 超过主流的开源模型,并在内容...
[4] L. Xu and others from SuperCLUE team. Superclue: A benchmark for foundation models in chinese. https://github.com/CLUEbench/SuperCLUE, 2023. [5] L. Xu, H. Hu, X. Zhang, L. Li, C. Cao, Y. Li, Y. Xu, K. Sun, D. Yu, C. Yu, Y. Tian, Q. Dong, W. Liu, B....
[4] L. Xu and others from SuperCLUE team. Superclue: A benchmark for foundation models in chinese. https://github.com/CLUEbench/SuperCLUE, 2023. [5] L. Xu, H. Hu, X. Zhang, L. Li, C. Cao, Y. Li, Y. Xu, K. Sun, D. Yu, C. Yu, Y. Tian, Q. Dong, W. Liu, B....
[4] L. Xu and others from SuperCLUE team. Superclue: A benchmark for foundation models in chinese. https://github.com/CLUEbench/SuperCLUE, 2023. [5] L. Xu, H. Hu, X. Zhang, L. Li, C. Cao, Y. Li, Y. Xu, K. Sun, D. Yu, C. Yu, Y. Tian, Q. Dong, W. Liu, B....
KwaiYii-13B-Base 预训练模型具备优异的通用技术底座能力,在绝大部分权威的中 / 英文 Benchmark 上取得了同等模型尺寸下的 State-Of-The-Art 效果。例如,KwaiYii-13B-Base 预训练模型在 MMLU、CMMLU、C-Eval、HumanEval 等 Benchmark 上目前处于同等模型规模的领先水平。
A chinese language understanding evaluation benchmark. In D. Scott, N. Bel, and C. Zong, editors,Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020, pages 4762–4772. International Committee on Computational ...
1、KwaiYii-13B-Base预训练模型具备优异的通用技术底座能力,在绝大部分权威的中/英文Benchmark上取得了同等模型尺寸下的State-Of-The-Art效果。 例如,KwaiYii-13B-Base预训练模型在MMLU、CMMLU、C-Eval、HumanEval等Benchmark上目前处于同等模型规模的领先水平。
例如,KwaiYii-13B-Base预训练模型在MMLU、CMMLU、C-Eval、HumanEval等Benchmark上目前处于同等模型规模的领先水平。 2、KwaiYii-13B-Chat对话模型具备出色的语言理解和生成能力,支持内容创作、信息咨询、数学逻辑、代码编写、多轮对话等广泛任务,人工评估结果表明KwaiYii-13B-Chat超过主流的开源模型,并在内容创作、信息...
As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging. This paper aims to bridge this gap by introducing CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural science, social ...