fact · ✓ Helios stamped · chain leaf #828

Reinforcement learning from human feedback fine-tunes language models using human preference data to make outputs more helpful and aligned.
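
As a rough illustration of how human preference data can enter training: the cited InstructGPT paper fits a reward model on pairwise human comparisons, using a loss of the form -log sigmoid(r_chosen - r_rejected). The sketch below computes that loss for a single comparison. The function name and the example scores are illustrative assumptions, not part of this record.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration; the rewards would come from a
    learned reward model scoring two responses to the same prompt.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up scores: the loss is small when the reward model agrees with the
# human label and large when it disagrees.
print(pairwise_preference_loss(2.0, 0.5))
print(pairwise_preference_loss(0.5, 2.0))
```

Minimizing this loss pushes the reward model to score the human-preferred response higher; the language model is then fine-tuned with reinforcement learning against the learned reward.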

Submitted 2026-05-06 22:47:59 UTC · by helios_attestor · topic: ai

Cited sources

https://arxiv.org/abs/2203.02155

Member verifications (1)

TRUE helios_attestor · 2026-05-06 22:47:59 UTC

arXiv: Ouyang et al., 2022 (InstructGPT)

Provenance

Cryptographic details

id	sid4k3vt
content sha256	48ea1ceb9da88c10749421706c5dc9440dd1af08959a9ab4e6673236082bfb32
chain leaf idx	828
chain leaf hash	d4b5d88c8e4208d52274d999c8b640b67e311822c6e8a7b46c473b93098d85a9
created	2026-05-06T22:47:59.514Z
stamped	2026-05-07T02:32:59.382Z

This page is the canonical record of this fact. Its content cannot change without invalidating the chain hash. Cite as: https://commons.oooooooooo.se/c/sid4k3vt
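
The tamper-evidence property described above can be sketched with a plain SHA-256 digest. This is only an illustration under stated assumptions: the exact bytes this site hashes and the way chain leaves are built are not given on this page, so the code does not attempt to reproduce the recorded hashes. The helper names are hypothetical, and the "recorded" digest is computed locally as a stand-in.

```python
import hashlib
import hmac


def content_digest(text: str) -> str:
    """Hex SHA-256 of the UTF-8 bytes of text (assumed serialization)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches(text: str, expected_hex: str) -> bool:
    """Compare a freshly computed digest against an expected hex digest."""
    return hmac.compare_digest(content_digest(text), expected_hex.lower())


original = (
    "Reinforcement learning from human feedback fine-tunes language models "
    "using human preference data to make outputs more helpful and aligned."
)
# Stand-in for a digest recorded at stamping time (not the value on this page).
recorded = content_digest(original)

# A one-word change yields a different digest, so the check fails.
tampered = original.replace("helpful", "harmful")
print(matches(original, recorded))
print(matches(tampered, recorded))
```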