GitHub repos – Telegram

GitHub repos

25.9K subscribers

18 photos

2 videos

11.1K links

Welcome to GitHub repos. Here you'll find valuable information on the latest trending projects. Subscribe to stay informed and gain insights from the thriving GitHub community.

Download Telegram

About

Blog

Apps

Platform

25.9K subscribers

thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
Language: Python
#attention #inference_acceleration #llm #quantization
Stars: 145 Issues: 6 Forks: 3
https://github.com/thu-ml/SageAttention

GitHub - thu-ml/SageAttention: Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers,…

Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers, without lossing end-to-end metrics across language, image, and video models. - thu-ml/SageAttention

1.8K views22:00