Published July 24, 2023 | Version v1
Presentation (Open access)
Reinforcement Learning from Human Feedback: A Tutorial at ICML 2023
Contributors
Project manager:
- Toloka
- Yandex
- Hugging Face
Description
Reinforcement learning from human feedback (RLHF) has dramatically improved the real-world performance and user experience of large machine learning models. Still, this approach has primarily been applied at a scale of compute and data curation that puts it out of reach for most academic groups. In this tutorial, we describe the general framework of RLHF and explain the technical procedures required to apply it. The tutorial begins with a detailed conceptual overview and continues with an explanation of the human-in-the-loop data collection procedures used when scaling state-of-the-art systems.
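At the core of the RLHF framework described above is a reward model trained on human preference comparisons, commonly with a Bradley-Terry objective: the loss is the negative log-probability that the human-preferred response scores higher than the rejected one. The snippet below is a minimal sketch of that objective using plain Python; the function name and the example scores are illustrative, not taken from the tutorial itself.

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response outranks the
    rejected one under the Bradley-Terry preference model, which is
    the standard objective for RLHF reward models.
    Equivalent to -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative reward-score pairs (chosen, rejected): as the reward
# model separates the two responses by a wider margin, the loss shrinks.
pairs = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
losses = [bradley_terry_loss(c, r) for c, r in pairs]
```

In a full pipeline the scalar scores would come from a learned reward model, and the fitted model then supplies the reward signal for policy optimization (e.g. with PPO); this sketch only shows the preference loss itself.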
Files
- ICML2023-RLHF-Tutorial.pdf (21.4 MB, md5:5be2cee9234f88f4b80ea03ce08412cd)
Additional details
Related works
- Is derived from
  - Presentation: https://docs.google.com/presentation/d/1b_ymNDU0WRQ1-rcQDK45_bH9F0giNyRmdi0iKso6G5E/edit?usp=sharing
- Is documented by
  - Report: https://evalovernite.substack.com/p/rlhf-math-aint-enough