Talks
- ICLR2025 Kilani: MrT5 Tokenizer-Free
- ICLR2025 Neitemeier: Hierarchical Autoregressive Transformers

Downsides of Subword Tokenization
- not learned end to end: the vocab is fixed and can't adapt to task difficulty
- non-smoothness: similar inputs get mapped to very different token sequences, e.g. [token][ization] vs. the typo [token][zi][ation] <- suddenly a bad segmentation despite a small typo (see the sketch below)
- huge vocabs: large vocabularies mean large embedding and output matrices
- non-adaptive compression ratio: you can't decide how much to compress (affects FLOPs/document)
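
To make the non-smoothness point concrete, here is a minimal sketch comparing the subword splits of a word and its one-character typo. It assumes the Hugging Face `transformers` package and the GPT-2 tokenizer; the exact pieces depend on the vocabulary, so the splits may not match the bracketed example above exactly.

```python
# Minimal sketch: how a tiny typo can change the subword segmentation drastically.
# Assumes the Hugging Face `transformers` package is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for word in ["tokenization", "tokenziation"]:  # correct spelling vs. a small typo
    pieces = tokenizer.tokenize(word)
    print(f"{word!r:>16} -> {pieces}")

# Typically the correct word maps to a few frequent subwords, while the typo
# falls apart into several rarer pieces: a one-character edit yields a very
# different token sequence (the "non-smoothness" above).
```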
