✨UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning
📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning
✨Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {pm 1, pm i}
📝 Summary:
Fairy2i converts pre-trained real-valued LLMs to a complex form, enabling efficient low-bit quantization while reusing existing checkpoints. It achieves near full-precision performance for LLaMA-2 7B at 2-bit, significantly outperforming real-valued binary methods.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/pdf/2512.02901
• PDF: https://arxiv.org/pdf/2512.02901
• Github: https://github.com/PKULab1806/Fairy2i-W2
🔹 Models citing this paper:
• https://huggingface.co/PKU-DS-LAB/Fairy2i-W2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AIResearch
📝 Summary:
Fairy2i converts pre-trained real-valued LLMs to a complex form, enabling efficient low-bit quantization while reusing existing checkpoints. It achieves near full-precision performance for LLaMA-2 7B at 2-bit, significantly outperforming real-valued binary methods.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/pdf/2512.02901
• PDF: https://arxiv.org/pdf/2512.02901
• Github: https://github.com/PKULab1806/Fairy2i-W2
🔹 Models citing this paper:
• https://huggingface.co/PKU-DS-LAB/Fairy2i-W2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AIResearch
❤2
✨BitNet Distillation
📝 Summary:
BitNet Distillation fine-tunes LLMs to 1.58-bit precision using SubLN, attention distillation, and continual pre-training. It achieves comparable performance to full-precision models, offering 10x memory savings and 2.65x faster inference.
🔹 Publication Date: Published on Oct 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13998
• PDF: https://arxiv.org/pdf/2510.13998
• Github: https://github.com/microsoft/BitNet
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AI
📝 Summary:
BitNet Distillation fine-tunes LLMs to 1.58-bit precision using SubLN, attention distillation, and continual pre-training. It achieves comparable performance to full-precision models, offering 10x memory savings and 2.65x faster inference.
🔹 Publication Date: Published on Oct 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13998
• PDF: https://arxiv.org/pdf/2510.13998
• Github: https://github.com/microsoft/BitNet
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AI