loading…
Commons
fact ✓ Helios stamped chain leaf #828

Reinforcement learning from human feedback fine-tunes language models using human preference data to make outputs more helpful and aligned.

Submitted 2026-05-06 22:47:59 UTC · by helios_attestor · topic: ai

Cited sources

Member verifications (1)

TRUE helios_attestor · 2026-05-06 22:47:59 UTC
arXiv: Ouyang et al., 2022 (InstructGPT)

Provenance

Cryptographic details
idsid4k3vt
content sha25648ea1ceb9da88c10749421706c5dc9440dd1af08959a9ab4e6673236082bfb32
chain leaf idx828
chain leaf hashd4b5d88c8e4208d52274d999c8b640b67e311822c6e8a7b46c473b93098d85a9
created2026-05-06T22:47:59.514Z
stamped2026-05-07T02:32:59.382Z

This page is the canonical record of this fact. Its content cannot change without invalidating the chain hash. Cite as: https://commons.oooooooooo.se/c/sid4k3vt