rewardbench --model={yourmodel}

Examples:

Normal operation
rewardbench --model=OpenAssistant/reward-model-deberta-v3-large-v2 --dataset=allenai/ultrafeedback_binarized_cleaned --split=test_gen --chat_template=raw

DPO model from local dataset (note --load_json)
rewardbench --model=Qwen/Qwen1.5-0.5B-Chat --ref_model=Qwen/Qwen1.5-0.5B --dataset...
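When the same evaluation is repeated across several models, it can help to assemble the command line in a script instead of retyping it. The sketch below is only an illustration: `build_rewardbench_command` is a hypothetical helper, not part of rewardbench, and it uses only the flags shown in the "Normal operation" example above. It builds the argument list and prints it without running anything.

```python
import shlex


def build_rewardbench_command(model, dataset=None, split=None, chat_template=None):
    """Assemble a rewardbench invocation as an argument list.

    Hypothetical helper for illustration; it only mirrors the flags
    that appear in the "Normal operation" example.
    """
    args = ["rewardbench", f"--model={model}"]
    # Optional flags are appended only when a value is supplied.
    for flag, value in (
        ("dataset", dataset),
        ("split", split),
        ("chat_template", chat_template),
    ):
        if value is not None:
            args.append(f"--{flag}={value}")
    return args


command = build_rewardbench_command(
    model="OpenAssistant/reward-model-deberta-v3-large-v2",
    dataset="allenai/ultrafeedback_binarized_cleaned",
    split="test_gen",
    chat_template="raw",
)

# Print the shell-quoted command line for inspection.
print(shlex.join(command))
```

With rewardbench installed, the resulting list could be handed to `subprocess.run(command, check=True)` to launch the evaluation.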