Describe the bug: In a single-node training run, the command deepspeed --enable_each_rank_log logdir <training command here> will cause each rank to write its stderr/stdout to a unique file in logdir/. However, in a multinode training run using the default launcher (PDSH), e.g. deepspeed --ho...
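For readers unfamiliar with the single-node behavior described above, the following is a minimal sketch of per-rank log redirection, not DeepSpeed's actual launcher code. The function names (rank_log_path, launch_ranks), the rank_<N>.log file naming, and the use of a RANK environment variable are all assumptions made for illustration.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def rank_log_path(log_dir: str, rank: int) -> Path:
    # Hypothetical naming scheme; the real launcher may name files differently.
    return Path(log_dir) / f"rank_{rank}.log"


def launch_ranks(cmd: Sequence[str], world_size: int, log_dir: str) -> List[Path]:
    """Start one local process per rank, sending stdout and stderr to its own file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    running = []
    paths = []
    for rank in range(world_size):
        path = rank_log_path(log_dir, rank)
        log_file = open(path, "w")
        # Assumption: each process learns its rank from the RANK variable.
        env = {**os.environ, "RANK": str(rank)}
        proc = subprocess.Popen(
            list(cmd),
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same per-rank file
            env=env,
        )
        running.append((proc, log_file))
        paths.append(path)
    # Wait for every rank to finish, then close its log file.
    for proc, log_file in running:
        proc.wait()
        log_file.close()
    return paths


if __name__ == "__main__":
    demo_cmd = [sys.executable, "-c", "import os; print('rank', os.environ['RANK'])"]
    for p in launch_ranks(demo_cmd, world_size=2, log_dir="logdir"):
        print(p)
```

The sketch only covers processes started on the local machine. It does not model the multinode PDSH case that this report concerns, where the processes are started on remote hosts.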
When client.config.enableLoggerErrorToTrace = true; is set in code, or "enableLoggerErrorToTrace": true is set in a JSON configuration file, Bunyan log.err calls are exported as traces, preserving the information passed to log.err in the msg field. This also extends to console and win...