✨Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
📝 Summary:
Youtu-LLM is a lightweight 1.96B LLM, pre-trained from scratch with a compact architecture and a multi-stage curriculum focused on commonsense, STEM, and agentic tasks. It achieves state-of-the-art performance for sub-2B models, demonstrating strong intrinsic agentic capabilities.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24618
• PDF: https://arxiv.org/pdf/2512.24618
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #AgenticAI #LightweightLLM #DeepLearning
✨SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time-embedding, temporal-warping training, and a synthetic dataset.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
✨Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
📝 Summary:
This paper improves respiratory sound classification by training an Audio Spectrogram Transformer (AST) with Sharpness-Aware Minimization (SAM). Steering optimization toward flatter minima on the loss surface yields a state-of-the-art 68.10% score and a crucial 68.31% sensitivity on the ICBHI 2017 benchmark.
🔹 Publication Date: Published on Dec 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22564
• PDF: https://arxiv.org/pdf/2512.22564
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#RespiratoryHealth #MedicalAI #DeepLearning #SoundClassification #AIHealthcare
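SAM is the key ingredient above: instead of descending the raw gradient, it first perturbs the weights toward the locally worst-case point and then descends along the gradient taken there, which biases training toward flat minima. A minimal single-parameter sketch, not the paper's implementation; the toy quadratic loss and hyperparameters are illustrative:

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step for a scalar weight w:
    1) ascend to the worst-case point inside an L2 ball of radius rho,
    2) descend along the gradient evaluated at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (abs(g) + 1e-12)  # normalized ascent direction
    g_adv = grad_fn(w + eps)          # gradient at the perturbed weight
    return w - lr * g_adv

# Toy quadratic loss with its minimum at w = 2
grad = lambda w: 2.0 * (w - 2.0)

w = 5.0
for _ in range(100):
    w = sam_step(w, grad)
# w settles into a tight neighborhood of the minimum at 2
```

In real training the same two-pass structure is applied per parameter tensor, with the perturbation normalized over the whole gradient vector.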
✨mHC: Manifold-Constrained Hyper-Connections
📝 Summary:
Manifold-Constrained Hyper-Connections (mHC) resolve the training instability and scalability issues of Hyper-Connections (HC). mHC restores the identity mapping via manifold projection and infrastructure optimization, enabling effective large-scale training with improved performance.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24880
• PDF: https://arxiv.org/pdf/2512.24880
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MachineLearning #DeepLearning #NeuralNetworks #ManifoldLearning #AI
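The core idea is to constrain the HC mixing weights to a manifold on which the identity mapping is preserved. One classic projection of that flavor is Sinkhorn-Knopp normalization toward doubly stochastic matrices, sketched below purely as an illustration; the paper's exact projection operator may differ:

```python
def sinkhorn(M, iters=50):
    """Alternately normalize rows and columns (Sinkhorn-Knopp) to push a
    positive square matrix toward the doubly stochastic manifold."""
    n = len(M)
    A = [row[:] for row in M]  # copy so the input is untouched
    for _ in range(iters):
        for row in A:                        # make each row sum to 1
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):                   # make each column sum to 1
            s = sum(A[i][j] for i in range(n))
            for i in range(n):
                A[i][j] /= s
    return A
```

Doubly stochastic mixing weights keep the total signal magnitude across residual streams constant, which is one way to retain an identity-like path.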
✨Kronos: A Foundation Model for the Language of Financial Markets
📝 Summary:
Kronos is a novel foundation model for financial K-line data. It uses a specialized tokenizer and autoregressive pre-training on a vast dataset to significantly outperform existing models in price and volatility forecasting as well as synthetic data generation, establishing it as a versatile tool for f...
🔹 Publication Date: Published on Aug 2, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02739
• PDF: https://arxiv.org/pdf/2508.02739
• Github: https://github.com/shiyu-coder/Kronos
🔹 Models citing this paper:
• https://huggingface.co/NeoQuasar/Kronos-base
• https://huggingface.co/NeoQuasar/Kronos-Tokenizer-base
• https://huggingface.co/NeoQuasar/Kronos-mini
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ByronWang2005/Kronos-CS2-Skins-Forecast-Demo
• https://huggingface.co/spaces/yangyang158/kronos
• https://huggingface.co/spaces/heyunfei/crypt
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModel #FinancialAI #DeepLearning #QuantitativeFinance #Forecasting
✨Guiding a Diffusion Transformer with the Internal Dynamics of Itself
📝 Summary:
This paper introduces Internal Guidance (IG) for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates their outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
✨FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
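The schedule described above is easy to sketch: a sampling loop that dispatches each timestep to a large or a small denoiser depending on its stage. The function name and the early/late split sizes are illustrative placeholders, not the paper's configuration:

```python
def stage_aware_sample(x, large_step, small_step, n_steps=50, early=10, late=10):
    """Denoising loop that routes the critical early and late timesteps to
    the large model and the intermediate ones to the small model."""
    calls = []
    for t in range(n_steps):
        if t < early or t >= n_steps - late:
            x = large_step(x, t)   # high-capacity model at the boundaries
            calls.append("L")
        else:
            x = small_step(x, t)   # cheap model in the middle
            calls.append("S")
    return x, calls
```

With a 10/30/10 split, 60% of the steps run on the small model, which is where the FLOP savings come from.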
✨Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
📝 Summary:
Dolphin is a novel multimodal model for document image parsing. It uses an analyze-then-parse approach with heterogeneous anchor prompting, achieving state-of-the-art performance and superior efficiency.
🔹 Publication Date: Published on May 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.14059
• PDF: https://arxiv.org/pdf/2505.14059
• Github: https://github.com/bytedance/dolphin
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DocumentParsing #MultimodalAI #DeepLearning #ComputerVision #AI
✨NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
📝 Summary:
NeoVerse is a 4D world model for reconstruction and video generation. It scales to in-the-wild monocular videos using pose-free feed-forward reconstruction and online degradation simulation, achieving state-of-the-art performance.
🔹 Publication Date: Published on Jan 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00393
• PDF: https://arxiv.org/pdf/2601.00393
• Project Page: https://neoverse-4d.github.io/
• Github: https://neoverse-4d.github.io
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#4DWorldModel #VideoGeneration #ComputerVision #DeepLearning #AI
✨MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing
📝 Summary:
MorphAny3D offers a training-free framework for high-quality 3D morphing, even across categories. It leverages Structured Latent representations with novel attention mechanisms (MCA, TFSA) for structural coherence and temporal consistency. This achieves state-of-the-art results and supports advance...
🔹 Publication Date: Published on Jan 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00204
• PDF: https://arxiv.org/pdf/2601.00204
• Project Page: https://xiaokunsun.github.io/MorphAny3D.github.io
• Github: https://github.com/XiaokunSun/MorphAny3D
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DMorphing #ComputerGraphics #DeepLearning #StructuredLatent #AIResearch
✨Nested Learning: The Illusion of Deep Learning Architectures
📝 Summary:
Nested Learning (NL) models machine learning as nested optimization problems. It enables expressive algorithms for higher-order learning and continual adaptation, introducing new optimizers, self-modifying models, and continuum memory systems.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24695
• PDF: https://arxiv.org/pdf/2512.24695
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#NestedLearning #MachineLearning #DeepLearning #Optimization #AI
✨InfoSynth: Information-Guided Benchmark Synthesis for LLMs
📝 Summary:
InfoSynth automatically generates novel and diverse coding benchmarks for LLMs. It uses information-theoretic metrics and genetic algorithms to create scalable self-verifying problems, overcoming manual effort and training data contamination.
🔹 Publication Date: Published on Jan 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00575
• PDF: https://arxiv.org/pdf/2601.00575
• Project Page: https://ishirgarg.github.io/infosynth_web/
• Github: https://github.com/ishirgarg/infosynth
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #Benchmarking #GenerativeAI #DeepLearning
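The generate-score-select loop can be caricatured in a few lines: mutate candidate problems, rank them by a novelty score, and keep the best. Everything here (`mutate`, `novelty`, the population sizes) is a hypothetical stand-in for the paper's actual components:

```python
import random

def evolve(seed_problems, mutate, novelty, rounds=5, pop=8, keep=4):
    """Toy generate-score-select loop: mutate candidates drawn from the
    pool and retain the highest-novelty ones each round."""
    pool = list(seed_problems)
    for _ in range(rounds):
        children = [mutate(random.choice(pool)) for _ in range(pop)]
        # dedup, then keep the candidates scoring highest under `novelty`
        pool = sorted(set(pool + children), key=novelty, reverse=True)[:keep]
    return pool
```

In the real system the selection signal would be an information-theoretic metric over generated problems, and each candidate would carry its own verification tests.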
✨OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
📝 Summary:
OmniVCus introduces a system for feedforward multi-subject video customization with multimodal controls. It proposes a data pipeline, VideoCus-Factory, and a diffusion Transformer framework with novel embedding mechanisms. This enables more subjects and precise editing, significantly outperformin...
🔹 Publication Date: Published on Jun 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.23361
• PDF: https://arxiv.org/pdf/2506.23361
• Project Page: https://caiyuanhao1998.github.io/project/OmniVCus/
• Github: https://github.com/caiyuanhao1998/Open-OmniVCus
🔹 Models citing this paper:
• https://huggingface.co/CaiYuanhao/OmniVCus
✨ Datasets citing this paper:
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Test
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Train
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #DiffusionModels #MultimodalAI #DeepLearning #ComputerVision
✨Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
📝 Summary:
Bitnet.cpp enhances edge inference for ternary LLMs using a novel mixed-precision matrix multiplication library. This system incorporates Ternary Lookup Tables and Int2 with a Scale for efficient, lossless inference, achieving up to a 6.25x speed increase over baselines.
🔹 Publication Date: Published on Feb 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.11880
• PDF: https://arxiv.org/pdf/2502.11880
• Github: https://github.com/microsoft/BitNet/tree/paper
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #EdgeAI #MachineLearning #DeepLearning #AI
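The reason ternary weights admit such fast kernels is that every weight is -1, 0, or +1, so a matrix-vector product reduces to additions and subtractions; the library's lookup tables then go further and precompute partial sums for small groups of weights. A naive Python illustration of the add/sub reduction (not the optimized C++ kernel):

```python
def ternary_matvec(W, x):
    """Multiply a ternary weight matrix (entries in {-1, 0, +1}) by a
    vector using only additions and subtractions -- no multiplies."""
    out = []
    for row in W:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v
            elif w == -1:
                acc -= v
            # w == 0 contributes nothing
        out.append(acc)
    return out
```

For example, `ternary_matvec([[1, -1, 0], [0, 1, 1]], [2.0, 3.0, 4.0])` computes `2 - 3` and `3 + 4` without a single multiplication.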
✨BitNet b1.58 2B4T Technical Report
📝 Summary:
BitNet b1.58 2B4T is the first open-source 1-bit Large Language Model with 2 billion parameters. It matches full-precision LLM performance while offering significant improvements in computational efficiency like reduced memory and energy. The model weights are openly released for research.
🔹 Publication Date: Published on Apr 16, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.12285
• PDF: https://arxiv.org/pdf/2504.12285
• Github: https://github.com/microsoft/bitnet
🔹 Models citing this paper:
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
✨ Spaces citing this paper:
• https://huggingface.co/spaces/suayptalha/Chat-with-Bitnet-b1.58-2B-4T
• https://huggingface.co/spaces/aizip-dev/SLM-RAG-Arena
• https://huggingface.co/spaces/Tonic/Native_1-bit_LLM
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #Quantization #OpenSourceAI #DeepLearning
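For readers curious what 1.58-bit weights look like: the commonly described recipe quantizes each weight tensor with an absmean scale and rounds to {-1, 0, +1}. A sketch of that scheme, offered as an approximation of the published method rather than its exact implementation:

```python
def absmean_ternary(weights):
    """Quantize a list of weights to {-1, 0, +1} using an absmean scale:
    scale = mean(|w|), then round(w / scale) clipped to [-1, 1]."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # guard all-zero
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale
```

Dequantization multiplies each ternary value back by `scale`; log2(3) ≈ 1.58 bits per weight is where the model's name comes from.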
✨BitNet Distillation
📝 Summary:
BitNet Distillation fine-tunes LLMs to 1.58-bit precision using SubLN, attention distillation, and continual pre-training. It achieves comparable performance to full-precision models, offering 10x memory savings and 2.65x faster inference.
🔹 Publication Date: Published on Oct 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13998
• PDF: https://arxiv.org/pdf/2510.13998
• Github: https://github.com/microsoft/BitNet
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AI
✨InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams
📝 Summary:
InfiniteVGGT enables continuous 3D visual geometry understanding for infinite streams. It uses a causal transformer with adaptive rolling memory for long-term stability, outperforming existing streaming methods. A new Long3D benchmark is introduced for rigorous evaluation of such systems.
🔹 Publication Date: Published on Jan 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02281
• PDF: https://arxiv.org/pdf/2601.02281
• Github: https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisualGeometry #3DVision #Transformers #StreamingAI #DeepLearning
✨DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
📝 Summary:
DiffProxy generates multi-view consistent human proxies using diffusion models to improve human mesh recovery. This bridges synthetic training and real-world generalization, achieving state-of-the-art performance on real benchmarks.
🔹 Publication Date: Published on Jan 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02267
• PDF: https://arxiv.org/pdf/2601.02267
• Project Page: https://wrk226.github.io/DiffProxy.html
• Github: https://github.com/wrk226/DiffProxy
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#HumanMeshRecovery #DiffusionModels #ComputerVision #DeepLearning #AI
✨CPPO: Contrastive Perception for Vision Language Policy Optimization
📝 Summary:
CPPO improves vision-language model fine-tuning by detecting perception tokens through entropy shifts. It then applies a Contrastive Perception Loss to enhance multimodal reasoning, outperforming prior methods more efficiently.
🔹 Publication Date: Published on Jan 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00501
• PDF: https://arxiv.org/pdf/2601.00501
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #MultimodalAI #ContrastiveLearning #DeepLearning #AIResearch
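An entropy-shift detector of the kind described can be sketched directly: compare each token position's predictive entropy with and without the visual input, and flag positions where it moves sharply. The threshold and the detection rule below are illustrative assumptions, not the paper's exact criterion:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def perception_tokens(dists_with_image, dists_without_image, tau=0.5):
    """Flag token positions whose predictive entropy shifts by more than
    tau nats when the visual input is removed (illustrative rule)."""
    return [i for i, (p, q) in enumerate(zip(dists_with_image, dists_without_image))
            if abs(entropy(p) - entropy(q)) > tau]
```

A position whose next-token distribution collapses from uniform to peaked once the image is present is exactly the kind of token the contrastive loss would then target.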
✨Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping
📝 Summary:
Prithvi-CAFE improves flood mapping by integrating a pretrained Geo-Foundation Model encoder with a parallel CNN branch featuring attention modules. This hybrid approach effectively captures both global context and critical local details, achieving state-of-the-art results on Sen1Flood11 and Floo...
🔹 Publication Date: Published on Jan 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02315
• PDF: https://arxiv.org/pdf/2601.02315
• Github: https://github.com/Sk-2103/Prithvi-CAFE
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FloodMapping #DeepLearning #GeoAI #RemoteSensing #ComputerVision