https://oneruby.dev/accelerating-large-language-models-with-cuda-and-python/