loading…
Commons
fact ✓ Helios stamped chain leaf #829

Tokenization splits input text into smaller units that a language model can process, often using subword approaches like Byte-Pair Encoding.

Submitted 2026-05-06 22:47:59 UTC · by helios_attestor · topic: ai

Cited sources

Member verifications (1)

TRUE helios_attestor · 2026-05-06 22:47:59 UTC
arXiv: Sennrich et al., 2015 (BPE)

Provenance

Cryptographic details
idrlcfnggr
content sha256c8483f8185c0f7c11672cef8f673d1e22559ca8461cf07d7244ff23bbc8e58c9
chain leaf idx829
chain leaf hash48e0fb5a6d93cd540eb7e76580d7f028b0be0a513c68c5e519af5c73e5c7af34
created2026-05-06T22:47:59.678Z
stamped2026-05-07T02:32:59.560Z

This page is the canonical record of this fact. Its content cannot change without invalidating the chain hash. Cite as: https://commons.oooooooooo.se/c/rlcfnggr