Use this flag to provide arguments for initializing a wandb run (`wandb.init`) as a comma-separated string of `key=value` pairs, e.g. `--wandb_args project=lm-eval-harness-integration`.
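A minimal sketch of how such a comma-separated string maps onto `wandb.init` keyword arguments (the `parse_wandb_args` helper below is hypothetical and for illustration only; the harness's own parsing may differ):

```python
def parse_wandb_args(arg_string: str) -> dict:
    """Split a comma-separated 'key=value' string into kwargs,
    e.g. for forwarding as wandb.init(**kwargs)."""
    kwargs = {}
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = value.strip()
    return kwargs

print(parse_wandb_args("project=lm-eval-harness-integration,name=phi-2"))
# → {'project': 'lm-eval-harness-integration', 'name': 'phi-2'}
```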
```python
task_manager = lm_eval.tasks.TaskManager(
    include_path="/path/to/custom/yaml"
)

# To get a task dict for `evaluate`
task_dict = lm_eval.tasks.get_task_dict(
    [
        "mmlu",            # A stock task
        "my_custom_task",  # A custom task
        {
            "task": ...,   # A dict that configures a task
            ...
        },
    ],
    task_manager,
)
```
To avoid conflicts, each task must be registered under a unique name. Because of this, slight variations of a task still count as distinct tasks and need to be named uniquely. This can be done by appending a suffix that identifies the variation, as in MMLU where ...
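The MMLU-style suffixing can be sketched as follows (the subject list below is an illustrative subset, not the harness's full registry):

```python
# Register each variation under a unique name by appending a suffix,
# in the style of MMLU's per-subject subtasks.
BASE_TASK = "mmlu"
subjects = ["abstract_algebra", "anatomy", "astronomy"]  # illustrative subset

task_names = [f"{BASE_TASK}_{subject}" for subject in subjects]
print(task_names)
# → ['mmlu_abstract_algebra', 'mmlu_anatomy', 'mmlu_astronomy']
```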
> [!NOTE]
> Currently, loglikelihood- and MCQ-based tasks (such as MMLU) are only supported for completion endpoints, not for chat-completion endpoints (those that expect a list of dicts). Completion APIs which support instruct-tuned models can be evaluated with the `--apply_chat_template` option...
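The distinction comes down to request shape: a completion endpoint takes a single prompt string, whose continuation tokens can be scored, while a chat-completion endpoint expects a list of message dicts. A sketch of the two payloads, assuming the common OpenAI-style field names (the model name is a placeholder):

```python
# Completion-style request: one prompt string; logprobs over a fixed
# continuation can be requested, which loglikelihood/MCQ tasks need.
completion_payload = {
    "model": "my-model",  # placeholder
    "prompt": "Question: What is 2 + 2?\nAnswer:",
    "max_tokens": 1,
    "logprobs": 5,
}

# Chat-completion-style request: a list of role/content dicts; the server
# applies its own template, so per-choice loglikelihoods are unavailable.
chat_payload = {
    "model": "my-model",  # placeholder
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
}

print(type(completion_payload["prompt"]), type(chat_payload["messages"]))
```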
```bash
lm_eval \
    --model hf \
    --model_args pretrained=microsoft/phi-2,trust_remote_code=True \
    --tasks hellaswag,mmlu_abstract_algebra \
    --device cuda:0 \
    --batch_size 8 \
    --output_path output/phi-2 \
    --limit 10 \
    --wandb_args project=lm-eval-harness-integration \
    --log_samples
```
```bash
... arc_challenge \
    --batch_size auto \
    --output_path ./eval_out/openbuddy13b \
    --use_cache ./eval_cache

# Use the accelerate launcher, which supports multiple GPUs
accelerate launch -m lm_eval --model hf \
    --model_args pretrained=./openbuddy-llama2-13b-v11.1-bf16 \
    --tasks mmlu,nq_open,triviaqa,truthfulqa...
```