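", "speech": "", "vote": "孙五", "model": "gpt-4.1-nano" }, /* 4.1-nano: the Seer only revealed their role on the last day and was voted out */ { "player": "user", "role": "Moderator", "think": "", "speech": "It is now daytime; please open your eyes. The players still in the game are: ['郑八', '张三', '孙五']. Please speak.", "vote": [ "郑八...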
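Test each project three times and name each output {benchmark_name}-{model_name}-{turn-n}. Pick the output with the highest score and append the suffix -high-score to its name, for example: benchmark-ball-bouncing-inside-spinning-hexagon-Claude-3.7-Sonnet-Thinking-turn-3-high-score.py. Run make all in the project directory to generate the score images; the build environment needs python-3.1...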
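
The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names `output_name` and `label_runs`, the `scores` dictionary, and the score values are hypothetical and not part of the project, and the file extension (`.py` in the example above) is assumed to be added separately for each benchmark.

```python
def output_name(benchmark_name: str, model_name: str, turn: int,
                high_score: bool = False) -> str:
    """Build {benchmark_name}-{model_name}-turn-{n}, optionally tagged -high-score."""
    name = f"{benchmark_name}-{model_name}-turn-{turn}"
    if high_score:
        name += "-high-score"
    return name


def label_runs(benchmark_name: str, model_name: str,
               scores: dict[int, float]) -> list[str]:
    """Return one name per turn; only the highest-scoring turn gets the suffix.

    `scores` maps turn number -> score (hypothetical input format).
    On a tie, the first turn in the dict's insertion order wins.
    """
    best_turn = max(scores, key=scores.get)
    return [
        output_name(benchmark_name, model_name, turn, high_score=(turn == best_turn))
        for turn in sorted(scores)
    ]


if __name__ == "__main__":
    # Made-up scores for three runs of one benchmark.
    names = label_runs(
        "benchmark-ball-bouncing-inside-spinning-hexagon",
        "Claude-3.7-Sonnet-Thinking",
        {1: 70.0, 2: 65.0, 3: 90.0},
    )
    for n in names:
        print(n + ".py")
```

With the made-up scores shown, only the third run receives the -high-score suffix, which matches the example file name given above.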