()
  File "/root/miniconda3/envs/szsys_py39/lib/python3.9/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 54, in next_rendezvous
    self._store = TCPStore( # type: ignore[call-arg]
RuntimeError: Stop_waiting response is expected

Error on the secondary node:
/root...
When I run pipeline.py, I get this error:
Traceback (most recent call last):
  File "pipeline.py", line 99, in <module>
    spawn(main, args = (), nprocs = world_size, join = True)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    re...
    self._store = TCPStore( # type: ignore[call-arg]
RuntimeError: Stop_waiting response is expected

For node1:
Traceback (most recent call last):
  File "/data/data/anaconda3/envs/pytorch/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, ...