@misc{li2023cmmlu, title={CMMLU: Measuring massive multitask language understanding in Chinese}, author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin}, year={2023}, eprint={2306.09212}, archivePrefix={arXiv}, prim...
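A small 0.2B Chinese dialogue model (ChatLM-Chinese-0.2B), with all code open-sourced for the full pipeline: dataset sources, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning for downstream tasks. - add cmmlu · Ma-Dan/ChatLM-mini-Chinese@5f39fe7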
This paper aims to bridge this gap by introducing CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities. We conduct a thorough evaluation of 18 advanced multilingual- and Chinese-oriented LLMs, assessing their ...
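Compared with the Mengzi large model GPT V2 series released in January of this year, Mengzi3-13B shows a marked improvement in dataset quality. Its Mengzi-3 dataset reaches 3T tokens and covers diverse, high-quality sources including web pages, code, books, and papers. In evaluations on public datasets such as MMLU, Chinese-MMLU, GSM8K, and HUMAN-EVAL, Mengzi3-13B performs strongly. At a parameter scale of 20B...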
🎉 According to the results from C-Eval and CMMLU, the performance of Llama3-70B-Chinese-Chat in Chinese significantly exceeds that of ChatGPT and is comparable to GPT-4! Developed by: Shenzhi Wang (王慎执) and Yaowei Zheng (郑耀威)
Our goal was to provide a fast evaluation library that is easy to set up, to guide the performance and use of existing Chinese LLMs. Currently, we only support evaluation for TMMLU+; however, we are exploring more domains for the future, i.e., knowledge-extensive datasets (CMMLU, C-Eval) as well as ...
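CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese-language context. CMMLU covers 67 topics ranging from basic subjects to advanced professional level. It includes natural sciences that require calculation and reasoning, humanities and social sciences that require knowledge, and topics such as Chinese driving rules that require everyday common sense. In addition, many tasks in CMMLU have China-specific answers that may not be universally applicable in other regions or languages...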