thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
Language: Python
#attention #inference_acceleration #llm #quantization
Stars: 145 Issues: 6 Forks: 3
https://github.com/thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
Language: Python
#attention #inference_acceleration #llm #quantization
Stars: 145 Issues: 6 Forks: 3
https://github.com/thu-ml/SageAttention
GitHub
GitHub - thu-ml/SageAttention: Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers,…
Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers, without lossing end-to-end metrics across language, image, and video models. - thu-ml/SageAttention