MMLU-Pro, the benchmark even Hugging Face relies on, has been found to use an evaluation methodology that favors closed-source models, and a user has raised the issue directly in a GitHub Issue. The original MMLU had long since been saturated by the major LLMs: everyone scores high, so it no longer discriminates between frontier models. MMLU-Pro, billed as a stronger and more challenging multi-task language benchmark, became an important industry reference for model performance. But unexpectedly, someone has now dug into its sam...
MMLU-Pro was created to provide language models with a more challenging and robust benchmark, pushing the boundaries of what these models can achieve in terms of expert-level knowledge and reasoning. Please refer to our Hugging Face 🤗 Dataset for more details.

Evaluation

To run local inference,...
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024] - MMLU-Pro/README.md at main · TIGER-AI-Lab/MMLU-Pro
MMLU-Pro | 🤗 Dataset | 🏆 Leaderboard | 📖 Paper

This repo contains the evaluation code for the NeurIPS-24 paper "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark"

Introduction

We introduce MMLU-Pro, an enhanced benchmark designed to evaluate language und...
git diff mmlu-pro..main -- run_openai.py

Usage

Change the config.toml according to your setup. You can also override settings in the configuration file with command line flags like --model, --category, etc. For example, if you specify --model phi3, all the settings from the configuration file...
Repo files: Ollama_MMLU_Pro.ipynb, README.md, config.toml, requirements.txt, run_openai.py. Latest commit to requirements.txt (chigkim, d6b3680, Jul 10, 2024): "Changed default configuration based on the comment: TIGER-AI-Lab/MMLU…" ...
parser.add_argument("--file", type=str, default="TIGER-Lab/MMLU-Pro",
                    help="Path to the mmlu.jsonl file")
parser.add_argument("--result", type=str, default="./mmlu_pro.json",
                    help="Path to save the result JSON file")
parser.add_argument("--log", type=str, default="./mmlu...
This is a modified version of TIGER-AI-Lab/MMLU-Pro, and it lets you run the MMLU-Pro benchmark via the OpenAI Chat Completion API. It's tested on Ollama and Llama.cpp, but it should also work with LMStudio, Koboldcpp, Oobabooga with the openai extension, etc. ...
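The "OpenAI Chat Completion API against a local server" setup described above can be sketched with the request shape alone. A minimal sketch, assuming an OpenAI-compatible endpoint such as Ollama's default at localhost:11434; the model name and question text are illustrative:

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, question: str) -> request.Request:
    """Build an OpenAI-style /chat/completions request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Local servers like Ollama accept any placeholder API key.
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )

req = build_chat_request("http://localhost:11434/v1", "llama3",
                         "Answer with the option letter only.")
print(req.full_url)
# http://localhost:11434/v1/chat/completions
```

Because the payload follows the OpenAI chat schema, pointing `base_url` at Llama.cpp, LMStudio, or any other compatible server is just a URL change; actually sending the request (e.g. with `request.urlopen(req)`) requires a running server.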