[BUG] exits with return code = -11 #4063 (Closed) DogeWatch opened on Jul 31, 2023: I ran this repo https://github.com/yangjianxin1/Firefly on my own data; the ds config looks like { "gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print...
I am trying to fine-tune an LLM by running a fine-tuning script (https://github.com/PKU-YuanGroup/Video-LLaVA/blob/main/scripts/v1_5/finetune.sh) with zero2_offload.json. Shortly after launch, the script terminates on its own with return code = -11. This is the finetune...
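Environment setup: the virtual environment path must be identical on every node, and conda must be installed at the same path on every node (otherwise you get "exits with return code = 127"). The versions of all installed libraries must also match closely (otherwise you get "exits with return code = -6"). Loading the environment path: on the master node the system may not load the default virtual environment path correctly, so the Python environment fails to load, and you need to add the following to the entry Python file: (this problem currently has no...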
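The return codes quoted in these reports (-11, -6, -7, 127, 1) are easier to triage once they are mapped to their usual meaning. The sketch below assumes the launcher prints the raw value that Python's `subprocess` module reports, where a negative value -N means the child was terminated by signal N, and that 127 follows the shell convention for "command not found". The helper name `describe_return_code` is hypothetical, and signal numbers can differ between platforms.

```python
import signal


def describe_return_code(code: int) -> str:
    """Give a best-effort reading of a child-process return code."""
    if code == 0:
        return "success"
    if code < 0:
        # Python's subprocess module reports death-by-signal as -N.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"unknown signal {-code}"
        return f"killed by {name}"
    if code == 127:
        # Shell convention: the command (e.g. the python binary) was not found.
        return "command not found (check that paths match on every node)"
    return f"exited with error status {code}"


if __name__ == "__main__":
    for rc in (-11, -6, 127, 1):
        print(rc, "->", describe_return_code(rc))
```

A signal name only says how the process died, not why. For a segmentation fault or abort, the traceback or native crash output earlier in the log is still needed to find the cause.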
False [2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False [2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True [2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_tag_validation_...
py', '--local_rank=0', '--model_name_or_path', 'facebook/opt-1.3b', '--gradient_accumulation_steps', '8', '--lora_dim', '128', '--zero_stage', '0', '--deepspeed', '--output_dir', '/home/zhangxiaoyu/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b'] exits with return code =...
Hi @CxsGhost, please read more about deadlocks here and here. Essentially, the problem will occur if you do not have enough memory to compute...
ZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IG51bGwsICJkZXBsb3lfcmFuayI6IFswXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiaGZfYXV0aF90b2tlbiI6IG51bGwsICJyZXBsYWNlX3dpdGhfa2VybmVsX2luamVjdCI6IHRydWUsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code =...
['/opt/conda/bin/python3.8', '-u', './dtu_denovo_sequencing/train.py', '--local_rank=7', 'train_data_path=./data/denovo_dataset_v1/', 'batch_size=12', 'distributed.n_gpus_per_node=8', '--deepspeed', '--deepspeed_config=deepspeed_cfg.json'] exits with return code = -7...
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2024-07-10 23:25:59,357] [INFO] [launch.py:351:main] Process 9200 exits successfully
8. Server resource usage (edited 2024-07-11 15:04) ...
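The tokenizers warning quoted above asks for TOKENIZERS_PARALLELISM to be set explicitly. One way to do that, sketched here, is to set it at the very top of the training entry script, before `tokenizers` or `transformers` is imported. Choosing "false" is an assumption for illustration; the warning accepts either value.

```python
import os

# Set the variable before any tokenizer is imported or used, so that
# worker processes forked later inherit an explicit setting.
# "false" is an illustrative choice; "true" is also accepted.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

if __name__ == "__main__":
    print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```

Exporting the variable in the shell before running the launcher has the same effect without touching the script.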
3b'] exits with return code = 1. The error message says that accelerate must be version >=0.20.3, but the installed version is 0.19.0.
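A version mismatch like this one can be caught before launching a multi-GPU job. The following is a minimal sketch using only the standard library; the helper names (`version_tuple`, `meets_minimum`, `check_package`) are hypothetical. The comparison is deliberately simplified: it looks only at the leading numeric components and ignores suffixes such as `rc1` or `.dev0`.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '0.20.3' into (0, 20, 3); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_package(name: str, minimum: str) -> str:
    """Report whether the installed distribution satisfies the minimum."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name} is not installed"
    verdict = "satisfies" if meets_minimum(installed, minimum) else "does not satisfy"
    return f"{name} {installed} {verdict} >={minimum}"


if __name__ == "__main__":
    print(check_package("accelerate", "0.20.3"))
```

On a multi-node setup, running the check on every node also helps confirm that library versions match across machines.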