ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Kimi Linear: An Expressive, Efficient Attention Architecture

📝 Summary:
Kimi Linear is a hybrid linear attention architecture that outperforms full attention in both quality and efficiency across diverse scenarios. It combines Kimi Delta Attention with Multi-Head Latent Attention, reducing the KV cache by up to 75% and boosting decoding throughput by up to 6x.
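
Below is a minimal sketch of the gated delta-rule recurrence behind Kimi Delta Attention, assuming per-channel decay gates, a single head, and a fixed-size matrix state in place of a growing KV cache; the hybrid interleaving with MLA layers and the real chunked kernels are not shown.

```python
# Gated delta-rule recurrence in the spirit of Kimi Delta Attention (illustrative sketch).
# The state S is a fixed d x d matrix, so memory does not grow with sequence length.
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """q, k, v: (T, d); alpha: (T, d) per-channel decay in (0, 1); beta: (T,) write strength."""
    T, d = q.shape
    S = torch.zeros(d, d)                                 # recurrent state replaces the KV cache
    outputs = []
    for t in range(T):
        S = S * alpha[t].unsqueeze(0)                     # channel-wise forgetting gate
        pred = S @ k[t]                                   # what the state currently predicts for k_t
        S = S + beta[t] * torch.outer(v[t] - pred, k[t])  # delta-rule correction toward v_t
        outputs.append(S @ q[t])                          # read out with the query
    return torch.stack(outputs)

# Toy usage
T, d = 8, 16
out = gated_delta_rule(torch.randn(T, d), torch.randn(T, d), torch.randn(T, d),
                       torch.sigmoid(torch.randn(T, d)), torch.sigmoid(torch.randn(T)))
print(out.shape)  # torch.Size([8, 16])
```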

🔹 Publication Date: Published on Oct 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26692
• PDF: https://arxiv.org/pdf/2510.26692
• GitHub: https://github.com/MoonshotAI/Kimi-Linear

🔹 Models citing this paper:
https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base
https://huggingface.co/aiqtech/Kimi-Linear-48B-A3B-Instruct

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Speedofmastery/orynxml-agents

==================================

For more data science resources:
https://t.me/DataScienceT

#AttentionMechanisms #LLM #AIResearch #DeepLearning #ModelEfficiency
Virtual Width Networks

📝 Summary:
Virtual Width Networks (VWN) expand representational width without increasing computational cost, improving model efficiency. VWN accelerates optimization and improves loss reduction, exhibiting a log-linear scaling relation between virtual width and loss.
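
One plausible reading of the summary, sketched below: token representations live in a wider "virtual" space while the transformer backbone keeps its original hidden size, so per-layer compute stays roughly constant. The projection placement, widths, and class names here are assumptions based only on the summary, not the paper's implementation.

```python
# Hedged sketch: a wide virtual embedding feeding a narrow backbone (names and widths assumed).
import torch
import torch.nn as nn

class VirtualWidthEmbedding(nn.Module):
    def __init__(self, vocab_size=32000, hidden=1024, virtual_mult=4):
        super().__init__()
        virtual = hidden * virtual_mult                      # expanded representational width
        self.embed = nn.Embedding(vocab_size, virtual)       # wide embedding table
        self.down = nn.Linear(virtual, hidden, bias=False)   # project into the narrow backbone

    def forward(self, token_ids):
        wide = self.embed(token_ids)       # (B, T, virtual)
        narrow = self.down(wide)           # (B, T, hidden): fed to the unchanged backbone
        return narrow, wide

emb = VirtualWidthEmbedding()
narrow, wide = emb(torch.randint(0, 32000, (2, 16)))
print(narrow.shape, wide.shape)  # torch.Size([2, 16, 1024]) torch.Size([2, 16, 4096])
```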

🔹 Publication Date: Published on Nov 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11238
• PDF: https://arxiv.org/pdf/2511.11238

==================================

For more data science resources:
https://t.me/DataScienceT

#NeuralNetworks #DeepLearning #ModelEfficiency #MachineLearning #AI
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

📝 Summary:
OmniZip is a training-free framework that addresses the computational bottleneck in omnimodal LLMs by dynamically compressing audio-visual tokens. It uses audio retention scores to guide video token pruning, achieving a 3.42x inference speedup and a 1.4x memory reduction without performance loss.
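
A hedged sketch of the audio-guided pruning idea: per-frame audio retention scores set how many video tokens each frame keeps. The saliency proxy (token norm) and the budget formula are illustrative assumptions; only the high-level idea comes from the summary above.

```python
# Audio-guided video token pruning, illustrative sketch (not the OmniZip implementation).
import torch

def prune_video_tokens(video_tokens, audio_scores, keep_ratio=0.3):
    """video_tokens: (T, N, d) per-frame tokens; audio_scores: (T,) retention scores in [0, 1]."""
    T, N, d = video_tokens.shape
    # Give frames with higher audio scores a larger token budget (assumed allocation rule).
    budgets = (keep_ratio * N * (audio_scores / audio_scores.mean())).clamp(1, N).long()
    kept = []
    for t in range(T):
        saliency = video_tokens[t].norm(dim=-1)          # cheap within-frame saliency proxy (assumption)
        idx = saliency.topk(int(budgets[t])).indices
        kept.append(video_tokens[t, idx])
    return kept  # list of (budget_t, d) tensors fed to the LLM instead of all T*N tokens

tokens = torch.randn(4, 64, 256)
scores = torch.tensor([0.9, 0.2, 0.7, 0.4])
pruned = prune_video_tokens(tokens, scores)
print([p.shape[0] for p in pruned])  # larger budgets where audio scores are high
```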

🔹 Publication Date: Published on Nov 18, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14582
• PDF: https://arxiv.org/pdf/2511.14582
• GitHub: https://github.com/KD-TAO/OmniZip

==================================

For more data science resources:
https://t.me/DataScienceT

#OmnimodalLLM #TokenCompression #LLMs #AI #ModelEfficiency
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

📝 Summary:
SSA is a new training framework for sparse attention in LLMs that aligns sparse and full attention outputs in feature space. It achieves state-of-the-art performance with stronger sparsity, improves long-context extrapolation, and allows flexible compute-performance trade-offs.
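
A hedged sketch of the alignment objective suggested by the title: during training, the output of a top-k sparse attention is pushed toward the full-attention output in feature space. The top-k sparsity pattern and MSE alignment loss are illustrative assumptions, not the paper's exact recipe.

```python
# Aligning sparse and full attention outputs in feature space (illustrative sketch).
import torch
import torch.nn.functional as F

def attention(q, k, v, mask=None):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if mask is not None:
        scores = scores.masked_fill(~mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

def ssa_alignment_loss(q, k, v, topk=8):
    full_out = attention(q, k, v)                        # dense reference output
    scores = q @ k.transpose(-2, -1)
    idx = scores.topk(topk, dim=-1).indices              # keep only top-k keys per query (assumed pattern)
    mask = torch.zeros_like(scores).scatter_(-1, idx, 1.0).bool()
    sparse_out = attention(q, k, v, mask)
    return F.mse_loss(sparse_out, full_out.detach())     # align sparse output to the full output

q, k, v = (torch.randn(2, 32, 64) for _ in range(3))
print(ssa_alignment_loss(q, k, v).item())
```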

🔹 Publication Date: Published on Nov 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20102
• PDF: https://arxiv.org/pdf/2511.20102

==================================

For more data science resources:
https://t.me/DataScienceT

#LLM #SparseAttention #DeepLearning #AIResearch #ModelEfficiency
CosineGate: Semantic Dynamic Routing via Cosine Incompatibility in Residual Networks

📝 Summary:
CosineGate enables dynamic routing in residual networks, using cosine incompatibility to skip redundant blocks. This cuts computation by up to 28.5% while matching or exceeding ResNet-20 accuracy, without auxiliary supervision.
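
A hedged sketch of cosine-incompatibility gating for a residual block: if the block's output is nearly collinear with its input (low incompatibility), its contribution is treated as redundant. The threshold, gate placement, and hard gating are assumptions; a real implementation would skip the block's compute entirely, whereas this sketch only zeroes its contribution.

```python
# Cosine-incompatibility gating of a residual block (illustrative sketch, assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineGatedBlock(nn.Module):
    def __init__(self, block, threshold=0.1):
        super().__init__()
        self.block = block
        self.threshold = threshold  # minimum incompatibility needed to keep the block's output

    def forward(self, x):
        residual = self.block(x)
        # Incompatibility = 1 - cosine similarity between input and block output.
        cos = F.cosine_similarity(x.flatten(1), residual.flatten(1), dim=1)   # (B,)
        incompat = 1.0 - cos
        gate = (incompat > self.threshold).float().view(-1, *([1] * (x.dim() - 1)))
        return x + gate * residual   # redundant blocks contribute nothing

block = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
gated = CosineGatedBlock(block)
print(gated(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```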

🔹 Publication Date: Published on Dec 21, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22206
• PDF: https://arxiv.org/pdf/2512.22206
• GitHub: https://github.com/thotayogeswarreddy/CosineGate

==================================

For more data science resources:
https://t.me/DataScienceT

#DeepLearning #NeuralNetworks #DynamicRouting #ModelEfficiency #AIResearch