Corresponding leaderboard: MMLU Pro - a Hugging Face Space by TIGER-Lab

II. Abstract

Throughout the development of LLMs, benchmarks such as MMLU have played a key role in driving progress in AI language understanding and reasoning across diverse domains. However, as models have continued to improve, performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in capability between models. The authors therefore created MMLU-Pro, which is a...
MMLU-Pro | 🤗 Dataset | 🏆 Leaderboard | 📖 Paper |

This repo contains the evaluation code for the NeurIPS-24 paper "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark"

Introduction

We introduce MMLU-Pro, an enhanced benchmark designed to evaluate language und...
- **Tasks for instruct models**: Math-Hard, IFeval, GPQA, and MMLU-Pro. These tasks are common evaluations, many of which overlap with the Hugging Face [Open LLM Leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Here, we aim to get the benchmark ...