Wouldn't it be great if we used human feedback on generated text as a measure of performance, or went even one step further and used that feedback as a loss to optimize the model? That's the idea of Reinforcement Learning from Human Feedback (RLHF): using methods from reinforcement learning...
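To make that idea concrete before diving in, here is a minimal, hypothetical sketch of "feedback as a training signal": a toy policy generates token sequences, a stand-in scoring function plays the role of human feedback, and a REINFORCE-style policy-gradient update turns that score into a loss. Everything here (`TinyPolicy`, `toy_reward`, the vocabulary size) is an illustrative assumption, not a real RLHF pipeline.

```python
# Illustrative sketch: treat a scalar "human feedback" score for generated
# text as a reward, and optimize the generator with a policy-gradient update.
import torch
import torch.nn as nn

VOCAB_SIZE = 16   # toy vocabulary (assumption for illustration)
SEQ_LEN = 8       # length of each generated "text"


class TinyPolicy(nn.Module):
    """Stand-in for a language model: predicts the next token from the previous one."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 32)
        self.head = nn.Linear(32, VOCAB_SIZE)

    def forward(self, token):
        return self.head(self.embed(token))  # logits over the next token


def toy_reward(tokens):
    """Hypothetical 'human feedback': here we simply prefer even-valued tokens."""
    return (tokens % 2 == 0).float().mean(dim=-1)


policy = TinyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    # Sample sequences from the current policy, tracking log-probabilities.
    token = torch.zeros(64, dtype=torch.long)  # batch of 64 sequences, start token 0
    log_probs, generated = [], []
    for _ in range(SEQ_LEN):
        dist = torch.distributions.Categorical(logits=policy(token))
        token = dist.sample()
        log_probs.append(dist.log_prob(token))
        generated.append(token)

    tokens = torch.stack(generated, dim=-1)  # (batch, seq_len)
    reward = toy_reward(tokens)              # scalar feedback per sequence

    # REINFORCE: scale summed log-probs by the (centered) reward so the
    # feedback signal directly drives the loss used to update the model.
    advantage = reward - reward.mean()
    loss = -(torch.stack(log_probs, dim=-1).sum(-1) * advantage).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the feedback comes from people (or a reward model trained on their preferences) rather than a hand-written scoring rule, and the optimization uses more careful RL machinery, but the core loop of "generate, score, update toward higher-scored text" is the same.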