"gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print": 100, "train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "wall_clock_breakdown": false } 36 changes: 6 additions & 30 deletions 36 examples/pytorch/llama2/fine_tuning.py ...
File c:\Users\Tian\Documents\code\mindnlp\mindnlp\engine\trainer\base.py:986, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, ignore_keys_for_eval)
    983 if step % args.gradient_accumulation_steps == 0:
    984     self.control = self.callback_handler.on_step_begi...
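For context on the lines shown in the traceback: with gradient accumulation, the step-begin callback fires only on the first micro-batch of each accumulated optimizer step. The sketch below illustrates that bookkeeping; it is not mindnlp's actual implementation, and the values are illustrative.

```python
# Sketch of gradient-accumulation bookkeeping in a trainer loop (illustrative).
gradient_accumulation_steps = 4   # assumed value
num_micro_batches = 12            # assumed number of micro-batches in the epoch

for step in range(num_micro_batches):
    if step % gradient_accumulation_steps == 0:
        print(f"step {step}: step-begin callback (new optimizer step)")
    # ... forward / backward on the micro-batch would go here ...
    if (step + 1) % gradient_accumulation_steps == 0:
        print(f"step {step}: optimizer.step() after accumulating gradients")
```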
Gradient backpropagation through time can be regulated to effectively address the vanishing and exploding gradient problems. IndRNNs can keep long-term memory for processing long sequences. Experiments have demonstrated that an IndRNN can process sequences of over 5,000 steps. ...
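A minimal sketch of the IndRNN recurrence behind these claims: h_t = relu(W x_t + u * h_{t-1} + b), where u is an element-wise (per-neuron) recurrent weight rather than a full matrix, and clamping |u| bounds the recurrent gradient through time. Layer sizes and the clamping bound here are illustrative, not from the original.

```python
import torch
from torch import nn

class IndRNNCell(nn.Module):
    """Sketch of an IndRNN cell: h_t = relu(W x_t + u * h_{t-1} + b).
    The recurrent weight u acts element-wise per neuron; constraining its
    magnitude regulates vanishing/exploding gradients over long sequences."""

    def __init__(self, input_size: int, hidden_size: int, recurrent_max: float = 1.0):
        super().__init__()
        self.input_weights = nn.Linear(input_size, hidden_size)   # W x_t + b
        self.recurrent_weight = nn.Parameter(
            torch.empty(hidden_size).uniform_(-recurrent_max, recurrent_max)
        )
        self.recurrent_max = recurrent_max  # illustrative bound

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # Clamp the element-wise recurrent weight to keep the gradient bounded.
        u = self.recurrent_weight.clamp(-self.recurrent_max, self.recurrent_max)
        return torch.relu(self.input_weights(x_t) + u * h_prev)

# Usage: unroll the cell over a toy sequence (seq_len, batch, input_size).
cell = IndRNNCell(input_size=8, hidden_size=16)
x = torch.randn(100, 4, 8)
h = torch.zeros(4, 16)
for x_t in x:
    h = cell(x_t, h)
```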