✨UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
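As a rough illustration of the general idea (a low-rank factor plus a low-bit part; the split below uses a plain SVD and a simple symmetric quantizer, not UniQL's actual algorithm):
```python
import torch

def quantize_sym(w, bits=4):
    # Simple symmetric per-tensor quantizer (illustrative, not UniQL's scheme).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def compress(w, rank=64, bits=4):
    # A low-rank factor captures the dominant structure of the weight matrix...
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    # ...and the residual is stored at low precision.
    return low_rank, quantize_sym(w - low_rank, bits)

w = torch.randn(1024, 1024)
low_rank, residual_q = compress(w)
rel_err = torch.norm(w - (low_rank + residual_q)) / torch.norm(w)
print(f"relative reconstruction error: {rel_err:.4f}")
```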
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning
✨ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
📝 Summary:
ToolOrchestra uses reinforcement learning to train small orchestrators that coordinate intelligent tools. This method enables an 8B model to outperform GPT-5 on complex tasks like Humanity's Last Exam, achieving higher accuracy at significantly lower cost and with greater efficiency.
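The orchestration loop itself is easy to picture; below is a hypothetical sketch (the `decide` interface and tool dictionary are illustrative, not the released code; the paper's contribution is training the orchestrator with RL to make these routing decisions well):
```python
def run_orchestrator(task, orchestrator, tools, max_steps=8):
    # A small orchestrator model repeatedly picks which (possibly expensive)
    # tool or model to call next, then folds the result back into its context.
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = orchestrator.decide(history, list(tools))   # e.g. {"tool": "search", "args": {...}}
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = tools[action["tool"]](**action["args"])      # invoke the selected tool/model
        history.append({"role": "tool", "name": action["tool"], "content": str(result)})
    return None  # give up once the step budget is exhausted
```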
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21689
• PDF: https://arxiv.org/pdf/2511.21689
• Project Page: https://research.nvidia.com/labs/lpr/ToolOrchestra/
• Github: https://github.com/NVlabs/ToolOrchestra/
🔹 Models citing this paper:
• https://huggingface.co/nvidia/Orchestrator-8B
• https://huggingface.co/Mungert/Orchestrator-8B-GGUF
• https://huggingface.co/cyankiwi/Orchestrator-8B-AWQ-4bit
✨ Datasets citing this paper:
• https://huggingface.co/datasets/nvidia/ToolScale
• https://huggingface.co/datasets/victor/ToolScale
• https://huggingface.co/datasets/FranckAbgrall/ToolScale
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ToolOrchestra #ModelOrchestration #ReinforcementLearning #LLMs #AI
✨Deep Research: A Systematic Survey
📝 Summary:
This survey systematically reviews Deep Research systems that integrate LLMs with external tools to enhance complex problem-solving. It lays out a roadmap and reviews key components, optimization techniques, and open challenges for these advanced research agents.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02038
• PDF: https://arxiv.org/pdf/2512.02038
• Project Page: https://deep-research-survey.github.io/
• Github: https://github.com/mangopy/Deep-Research-Survey
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DeepResearch #LLMs #AI #ResearchAgents #SystematicSurvey
✨Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
📝 Summary:
The Adversarial Confusion Attack systematically disrupts multimodal LLMs, causing incoherent or confidently incorrect outputs. This simple adversarial technique transfers to diverse models, including proprietary ones, potentially undermining the reliability of AI agents.
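One common way to realize such an attack is a PGD-style image perturbation that maximizes the model's output entropy; the sketch below assumes that objective and a hypothetical `model(image, text)` forward returning next-token logits, and is not the paper's exact loss or code:
```python
import torch

def confusion_attack(model, image, text_inputs, steps=50, eps=8 / 255, alpha=1 / 255):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta, text_inputs)            # hypothetical forward signature
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
        entropy.backward()                                    # ascend on output entropy
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)                           # keep the perturbation small
            delta.grad.zero_()
    return (image + delta).detach()
```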
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20494
• PDF: https://arxiv.org/pdf/2511.20494
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AdversarialAttack #MultimodalAI #LLMs #AISecurity #AIResearch
✨SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs
📝 Summary:
SignRoundV2 is a post-training quantization framework for LLMs. It combines a sensitivity metric for bit allocation with pre-tuned quantization scales, achieving competitive accuracy even at 2-bit quantization and closing the gap with full-precision models.
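A toy version of sensitivity-driven bit allocation, assuming a simple reconstruction-error proxy rather than the paper's actual metric:
```python
import torch

def quant_error(w, bits):
    # Reconstruction error under plain symmetric quantization (a stand-in sensitivity proxy).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return torch.norm(w - w_q).item()

def allocate_bits(named_weights, avg_budget=2.5, low=2, high=4):
    # Start every layer at the low precision, then promote the most sensitive
    # layers until the average-bit budget is spent.
    bits = {name: low for name, _ in named_weights}
    by_sensitivity = sorted(named_weights, key=lambda nw: quant_error(nw[1], low), reverse=True)
    for name, _ in by_sensitivity:
        if sum(bits.values()) / len(bits) >= avg_budget:
            break
        bits[name] = high
    return bits

weights = [(f"layer{i}.weight", torch.randn(256, 256) * (i + 1)) for i in range(8)]
print(allocate_bits(weights))
```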
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04746
• PDF: https://arxiv.org/pdf/2512.04746
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #Quantization #DeepLearning #AI #MachineLearning
✨SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs
📝 Summary:
The SQ-format is a unified sparse-quantized data format for LLM post-training quantization. It improves the accuracy-efficiency trade-off by combining sparse and low-precision matrix multiplications, enabling better performance and throughput, especially for outlier activations.
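The underlying split is easy to sketch: keep the largest-magnitude outliers in a sparse full-precision tensor and quantize the dense remainder (an illustrative decomposition, not the exact SQ-format layout):
```python
import torch

def sq_decompose(w, bits=4, outlier_frac=0.01):
    k = max(1, int(outlier_frac * w.numel()))
    thresh = w.abs().flatten().topk(k).values.min()
    outlier_mask = w.abs() >= thresh
    sparse_part = (w * outlier_mask).to_sparse()              # outliers kept in full precision
    dense = w * ~outlier_mask
    qmax = 2 ** (bits - 1) - 1
    scale = dense.abs().max() / qmax
    dense_q = (dense / scale).round().clamp(-qmax - 1, qmax) * scale
    return sparse_part, dense_q                               # w ≈ sparse_part.to_dense() + dense_q

sparse_part, dense_q = sq_decompose(torch.randn(512, 512))
print(sparse_part._nnz(), dense_q.dtype)
```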
🔹 Publication Date: Published on Dec 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05409
• PDF: https://arxiv.org/pdf/2512.05409
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #Quantization #SparseML #HardwareAcceleration #AIResearch
✨MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
📝 Summary:
MemLoRA and MemLoRA-V enable efficient on-device memory-augmented AI by equipping small language and vision-language models with specialized, distilled memory adapters. This allows accurate local memory operations and native visual understanding, outperforming larger baselines on both text and visual tasks.
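The general recipe (a small base model plus a specialized adapter) can be set up with standard tooling; the checkpoint name and LoRA hyperparameters below are placeholders, not the paper's configuration:
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")  # placeholder small model
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
memory_adapter = get_peft_model(base, config)
# The adapter would then be trained, e.g. by distilling a larger "memory expert" model's
# outputs on memory read/write/update examples, and shipped as a small on-device module.
```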
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04763
• PDF: https://arxiv.org/pdf/2512.04763
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#OnDeviceAI #LLMs #VLMs #AIAdapters #MemoryAugmentedAI
🤖🧠 How to Run and Fine-Tune Kimi K2 Thinking Locally with Unsloth
🗓️ 11 Dec 2025
📚 AI News & Trends
The demand for efficient and powerful large language models (LLMs) continues to rise as developers and researchers seek new ways to optimize reasoning, coding, and conversational AI performance. One of the most impressive open-source AI systems available today is Kimi K2 Thinking, created by Moonshot AI. Through collaboration with Unsloth, users can now fine-tune and run the model locally.
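The usual Unsloth loading pattern looks like the sketch below; the checkpoint id is a placeholder (check Unsloth's Hugging Face page for the actual Kimi K2 Thinking release), and whether the model fits locally depends heavily on your hardware:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Kimi-K2-Thinking",    # placeholder id, verify the real repo name
    max_seq_length=4096,
    load_in_4bit=True,                         # 4-bit loading to cut memory use
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                      # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, fine-tune with e.g. TRL's SFTTrainer on your instruction data.
```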
#KimiK2Thinking #Unsloth #LLMs #LargeLanguageModels #AI #FineTuning
✨Thinking with Images via Self-Calling Agent
📝 Summary:
sCoT is a novel visual reasoning paradigm that reformulates interleaved multimodal chain-of-thought (CoT) reasoning as a language-only CoT with self-calling subagents. It improves reasoning performance and efficiency by avoiding explicit multimodal interleaving and using group-relative policy optimization.
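The self-calling pattern can be sketched as follows (the `vlm.generate` / `vlm.generate_with_image` helpers and the CALL: convention are hypothetical, not the released interface):
```python
def solve(question, image, vlm, max_calls=4):
    # The main agent reasons in plain text and delegates visual look-ups
    # to sub-calls of the same model instead of interleaving images into its own CoT.
    thoughts = [f"Question: {question}"]
    for _ in range(max_calls):
        step = vlm.generate("\n".join(thoughts) + "\nNext step:")       # language-only CoT
        if step.startswith("CALL:"):
            sub_query = step.removeprefix("CALL:").strip()
            answer = vlm.generate_with_image(sub_query, image)          # subagent sees the image
            thoughts.append(f"Sub-agent({sub_query}) -> {answer}")
        else:
            return step                                                  # final answer
    return thoughts[-1]
```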
🔹 Publication Date: Published on Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08511
• PDF: https://arxiv.org/pdf/2512.08511
• Github: https://github.com/YWenxi/think-with-images-through-self-calling
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisualReasoning #MultimodalAI #LLMs #AIagents #AIResearch
✨Sliding Window Attention Adaptation
📝 Summary:
Sliding Window Attention Adaptation (SWAA) allows pretrained LLMs to use efficient sliding window attention for long contexts without retraining. SWAA combines five adaptation methods, with specific synergistic combinations effectively recovering the original long-context performance.
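For intuition, a boolean sliding-window attention mask with a few globally visible "sink" tokens looks like this (attention sinks are one common adaptation trick; treating them as part of SWAA here is an assumption, not the paper's exact recipe):
```python
import torch

def sliding_window_mask(seq_len, window=1024, sink=4):
    i = torch.arange(seq_len).unsqueeze(1)    # query positions
    j = torch.arange(seq_len).unsqueeze(0)    # key positions
    causal = j <= i
    in_window = (i - j) < window
    is_sink = j < sink                        # first few tokens stay globally visible
    return causal & (in_window | is_sink)     # True = position may be attended to

print(sliding_window_mask(8, window=3, sink=1).int())
```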
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10411
• PDF: https://arxiv.org/pdf/2512.10411
🔹 Models citing this paper:
• https://huggingface.co/yuyijiong/Qwen3-SWA-adaptation
✨ Datasets citing this paper:
• https://huggingface.co/datasets/yuyijiong/LongMemEval_24k
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #SlidingWindowAttention #LongContextAI #NLP #AIResearch