const Parser = require('ceval')
const analysis = new Parser({
  /**
   * @desc Allow operators
   * @type {boolean}
   */
  endableOperators?: boolean = true;
  /**
   * @desc number enable multi bit base
   * @type {boolean}
   */
  endableBitNumber?: boolean = true;
  /**
   * @desc Allow access to ...
Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023] - hkust-nlp/ceval
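A Chinese large language model evaluation benchmark: C-EVAL. C-EVAL: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models arxiv.org/pdf/2305.0832 github.com/SJTU-LIT/cev cevalbenchmark.com/stat Part 1: Preface. How should a large language model be evaluated? Evaluate it on a broad range of NLP tasks. Evaluate it on advanced LLM abilities, such as reasoning, ...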
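GitHub: https://github.com/SJTU-LIT/ceval C-Eval leaderboard: https://cevalbenchmark.com/static/leaderboard.html Dataset: https://huggingface.co/datasets/ceval/ceval-exam Subject coverage and difficulty design of C-Eval: C-Eval consists of multiple-choice questions at four difficulty levels: middle school, high school, college, and professional. C-Eval also comes with C-Eval HARD,...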
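Because C-Eval is made up of multiple-choice questions grouped by difficulty level, scoring comes down to comparing a predicted option letter with the reference letter and averaging per level. The following is a minimal sketch of that idea, not the official evaluation code. The record layout (`question`, `A` to `D`, `answer`, `level`), the prompt wording, and the toy items are all assumptions made for illustration; check the dataset card for the actual fields.

```python
from collections import defaultdict

CHOICES = ("A", "B", "C", "D")

# Toy records in an ASSUMED layout; these are not real C-Eval questions.
SAMPLE_ITEMS = [
    {"question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4",
     "answer": "B", "level": "middle school"},
    {"question": "2 * 3 = ?", "A": "3", "B": "4", "C": "5", "D": "6",
     "answer": "D", "level": "middle school"},
    {"question": "d/dx x^2 = ?", "A": "x", "B": "2x", "C": "x^2", "D": "2",
     "answer": "B", "level": "college"},
]


def format_question(item):
    """Render one multiple-choice record as a plain-text prompt (hypothetical wording)."""
    lines = [item["question"]]
    lines += [f"{c}. {item[c]}" for c in CHOICES]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy_by_level(items, predictions):
    """Return {level: accuracy}, comparing predicted letters to reference letters."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        level = item["level"]
        total[level] += 1
        # Normalize whitespace and case before comparing option letters.
        correct[level] += int(pred.strip().upper() == item["answer"])
    return {level: correct[level] / total[level] for level in total}


if __name__ == "__main__":
    print(format_question(SAMPLE_ITEMS[0]))
    # Stand-in predictions; a real run would take these from a model.
    print(accuracy_by_level(SAMPLE_ITEMS, ["B", "a", " b "]))
```

Reporting accuracy per level rather than as one overall number keeps the harder levels from being hidden by the easier ones.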
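Evaluating large models with C-Eval - Juejin https://github.com/llmeval/llmeval-1/tree/master Evaluation results of the ChatGLM2-6B model on MMLU (English), C-Eval (Chinese), GSM8K (math), and BBH (English). ChatGLM-6B upgraded to V2: major performance gains, 8-32k context, 42% faster inference | QbitAI. ChatGLM related - Zhihu (zhihu.com) ...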
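Code URL: https://github.com/SJTU-LIT/ceval Blog URL: https://yaofu.notion.site/C-Eval-6b79edd91b454e3d8ea41c59ea2af873 TL;DR: A Chinese large language model test set jointly developed by Shanghai Jiao Tong University and Tsinghua University; it is currently one of the most popular Chinese test sets. Introduction. Background: With the OpenAI GPT series / Google PaLM series / DeepMind Chinchilla series / ...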
See https://github.com/csebuetnlp/xl-sum/tree/master/multilingual_rouge_scoring
Run the scripts for inference and metric calculation (with Qwen-2.5 as an example):
cd scripts
bash run_eval.sh
bash calculate_metrics.sh
The evaluation results will be saved in the output directory.
Pretraining...