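Every machine must have a consistent environment configuration, with matching versions of all installed libraries. For example, during testing, a mismatch in the Transformers library version led to an exits with return code = -6 error. deepspeed does not pick up library paths inside a conda environment: if ninja is already installed in the conda environment but deepspeed still reports that it cannot be found, add the directory where conda installed ninja to the PATH environment variable...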
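The two checks described above can be sketched in Python. This is a minimal sketch, not taken from the original post: the helper names ensure_env_bin_on_path and installed_versions are hypothetical, and it assumes that ninja lives in the same bin directory as the environment's python interpreter, which is the usual layout for a conda environment but should be verified on each node.

```python
import os
import shutil
import sys
from importlib import metadata


def ensure_env_bin_on_path(env=None):
    """Prepend the running interpreter's directory to PATH if it is missing.

    Assumption: in a conda environment, tools such as ninja are installed
    next to the python executable (envs/<name>/bin).
    """
    env = os.environ if env is None else env
    bin_dir = os.path.dirname(sys.executable)
    parts = env.get("PATH", "").split(os.pathsep)
    if bin_dir not in parts:
        env["PATH"] = os.pathsep.join([bin_dir] + [p for p in parts if p])
    return bin_dir


def installed_versions(packages):
    """Return {package: version or None} so nodes can be compared by eye."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    ensure_env_bin_on_path()
    # shutil.which returns None when ninja is still not visible on PATH.
    print("ninja found at:", shutil.which("ninja"))
    print(installed_versions(["torch", "transformers", "deepspeed", "ninja"]))
```

Running the script on every node and comparing the printed versions is one way to spot the kind of mismatch described above before launching a multi-node job.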
784] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 861021 [2022-01-30 20:09:26,784] [ERROR] [launch.py:184:sigkill_handler] ['/home/aadelucia/miniconda3/envs/fda_cersi_tobacco/bin/python', '-u', 'hf_zero_example.py', '--local_rank=1'] exits with return code = ...
0', '--model_name_or_path', 'facebook/opt-1.3b', '--gradient_accumulation_steps', '2', '--lora_dim', '128', '--zero_stage', '0', '--deepspeed', '--output_dir', '/DeepSpeed/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b'] exits with return code = 1...
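Environment configuration: the virtual environment path must be the same on every node, and conda must be installed at the same path on every node (otherwise exits with return code = 127 is reported); the versions of all installed libraries must also match closely (otherwise exits with return code = -6 is reported). Environment path loading: on the master node, the system may not load the default virtual environment path correctly, so the python environment fails to load; you therefore need to add the following to the entry python file: (this issue currently has no...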
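As a reading aid for the return codes mentioned in these excerpts (this is not the elided snippet from the post above), the following sketch decodes a code using general conventions: Python's subprocess module reports a negative code when a child is killed by a signal, and shells conventionally use 127 for "command not found". The function name describe_return_code is hypothetical, and whether a given launcher log follows these conventions is an assumption.

```python
import signal


def describe_return_code(code):
    """Give a human-readable guess at what a process return code means."""
    if code == 0:
        return "success"
    if code < 0:
        # subprocess convention: -N means the child was killed by signal N.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    if code == 127:
        # Shell convention: the command (e.g. the python path) was not found.
        return "command not found (shell convention)"
    return f"exited with error status {code}"


if __name__ == "__main__":
    for rc in (-6, 127, 1):
        print(rc, "->", describe_return_code(rc))
```

Under these conventions, -6 corresponds to signal 6 (an abort on Linux), which points to a native-level crash, and 127 is consistent with an interpreter path that does not exist on some node. A plain 1 only says the script exited with an error, so the traceback earlier in the log is the place to look.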
', '--local_rank=0', '--model_name_or_path', 'facebook/opt-1.3b', '--gradient_accumulation_steps', '8', '--lora_dim', '128', '--zero_stage', '0', '--deepspeed', '--output_dir', '/home/zhangxiaoyu/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b'] exits with return code = 1...
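Hi @CxsGhost, please read more about deadlocks here and here. Essentially, the problem will occur if you do not have enough memory to compute...
We currently do not support the 70B llama-2 model (its model architecture differs from that of the smaller llama-2 variants). We are working to add support as soon as possible!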
172.27.221.56: [2024-06-07 22:13:00,183] [ERROR] [launch.py:325:sigkill_handler] ['/home/hy/anaconda3/envs/algmnode1/bin/python', '-u', 'test.py', '--local_rank=7'] exits with return code = 1
hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IG51bGwsICJkZXBsb3lfcmFuayI6IFswXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiaGZfYXV0aF90b2tlbiI6IG51bGwsICJyZXBsYWNlX3dpdGhfa2VybmVsX2luamVjdCI6IHRydWUsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code = 1...
'--model_name_or_path', 'facebook/opt-1.3b', '--gradient_accumulation_steps', '8', '--lora_dim', '128', '--zero_stage', '0', '--deepspeed', '--output_dir', '/home/zhangxiaoyu/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b'] exits with return code = 1...