fact
✓ Helios stamped
chain leaf #828
Reinforcement learning from human feedback fine-tunes language models using human preference data to make outputs more helpful and aligned.
Cited sources
Member verifications (1)
TRUE
arXiv: Ouyang et al., 2022 (InstructGPT)
Provenance
Cryptographic details
| id | sid4k3vt |
| content sha256 | 48ea1ceb9da88c10749421706c5dc9440dd1af08959a9ab4e6673236082bfb32 |
| chain leaf idx | 828 |
| chain leaf hash | d4b5d88c8e4208d52274d999c8b640b67e311822c6e8a7b46c473b93098d85a9 |
| created | 2026-05-06T22:47:59.514Z |
| stamped | 2026-05-07T02:32:59.382Z |
This page is the canonical record of this fact. Its content cannot change without invalidating the chain hash. Cite as: https://commons.oooooooooo.se/c/sid4k3vt