Step 3: Apply both the diffusion training loss and the reward loss:

```python
# reward model inference
if args.task_name == 'canny':
    outputs = reward_model(image.to(accelerator.device), low_threshold, high_threshold)
else:
    outputs = reward_model(image.to(accelerator.device))

# Determine which samples in the current batch need ...
```
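For context, the reward model's output is compared against the input condition and the resulting reward loss is added to the usual denoising loss before backpropagation. The sketch below illustrates that combination only; the function name, the `reward_weight` argument, and the use of a plain MSE reward loss are illustrative assumptions, not the repository's exact implementation (which may apply per-sample masking and task-specific losses).

```python
import torch
import torch.nn.functional as F


def combined_training_loss(model_pred: torch.Tensor,
                           noise_target: torch.Tensor,
                           reward_pred: torch.Tensor,
                           condition: torch.Tensor,
                           reward_weight: float = 1.0) -> torch.Tensor:
    """Sketch: combine the diffusion (denoising) loss with a reward loss.

    model_pred / noise_target: the denoiser's prediction and its target.
    reward_pred: the reward model's output on the decoded image
                 (e.g. extracted canny edges), resized to match `condition`.
    condition: the input control signal the generation should respect.
    """
    # Standard diffusion objective: MSE between predicted and target noise.
    diffusion_loss = F.mse_loss(model_pred.float(), noise_target.float(),
                                reduction="mean")
    # Reward (consistency) term: penalize mismatch between the condition
    # re-extracted from the generated image and the original condition.
    reward_loss = F.mse_loss(reward_pred.float(), condition.float(),
                             reduction="mean")
    # Weighted sum; `reward_weight` is a hypothetical hyperparameter here.
    return diffusion_loss + reward_weight * reward_loss


# Example usage with dummy tensors (shapes are placeholders):
if __name__ == "__main__":
    pred = torch.randn(2, 4, 64, 64)
    target = torch.randn(2, 4, 64, 64)
    reward_out = torch.rand(2, 1, 512, 512)
    cond = torch.rand(2, 1, 512, 512)
    loss = combined_training_loss(pred, target, reward_out, cond, reward_weight=0.5)
    loss.backward() if loss.requires_grad else None
```

In practice the reward term is usually weighted well below the diffusion term so that controllability improves without degrading image quality.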