https://nathangodey.github.io/posts/manta/
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling - Nathan Godey