ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Scaling Behavior of Discrete Diffusion Language Models

📝 Summary:
Research on discrete diffusion language models (DLMs) shows that their scaling behavior depends on the noise type. Uniform diffusion is more parameter- and data-efficient than masked diffusion, making it promising for data-bound settings. A 10B-parameter model confirmed this trend.
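
🔹 Illustrative sketch (not from the paper's code): the two noise types the summary contrasts, shown as token-level corruption in plain Python. Vocabulary size, mask token id, and noise level are arbitrary placeholders.

import random

VOCAB_SIZE = 100
MASK_ID = VOCAB_SIZE  # extra [MASK] token, used only by masked diffusion

def masked_corrupt(tokens, noise_level):
    # Masked (absorbing) diffusion: each token is replaced by [MASK]
    # with probability equal to the noise level.
    return [MASK_ID if random.random() < noise_level else t for t in tokens]

def uniform_corrupt(tokens, noise_level):
    # Uniform diffusion: each corrupted token is resampled uniformly
    # from the vocabulary instead of being masked.
    return [random.randrange(VOCAB_SIZE) if random.random() < noise_level else t
            for t in tokens]

tokens = [7, 42, 13, 99, 5]
print(masked_corrupt(tokens, 0.5))   # e.g. [100, 42, 100, 99, 5]
print(uniform_corrupt(tokens, 0.5))  # e.g. [7, 61, 13, 24, 5]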

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10858
• PDF: https://arxiv.org/pdf/2512.10858
• Github: https://github.com/dvruette/gidd-easydel

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #LanguageModels #NLP #AIResearch #DeepLearning
FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering

📝 Summary:
FrameDiffuser is an autoregressive neural rendering framework. It generates temporally consistent, photorealistic frames from G-buffer data and its own previous output, running at interactive speed with higher quality than prior methods.
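
🔹 Illustrative sketch (not the paper's code): a minimal autoregressive rendering loop in PyTorch that conditions each frame on its G-buffer and the previously generated frame. The denoiser (model), the G-buffer channel layout, and the step count are placeholders.

import torch

def render_sequence(model, g_buffers, num_steps=4):
    # g_buffers: (T, C, H, W) per-frame G-buffer stack (e.g. albedo, normals, depth).
    # The previous rendered frame is fed back as conditioning so frames stay
    # temporally consistent; few denoising steps keep the loop interactive.
    T, _, H, W = g_buffers.shape
    prev_frame = torch.zeros(1, 3, H, W)
    frames = []
    for t in range(T):
        cond = torch.cat([g_buffers[t:t + 1], prev_frame], dim=1)
        frame = torch.randn(1, 3, H, W)      # start each frame from noise
        for _ in range(num_steps):
            frame = model(frame, cond)       # placeholder denoiser call
        frames.append(frame)
        prev_frame = frame
    return torch.cat(frames, dim=0)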

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16670
• PDF: https://arxiv.org/pdf/2512.16670

==================================

For more data science resources:
https://t.me/DataScienceT

#NeuralRendering #DiffusionModels #ComputerGraphics #RealtimeRendering #DeepLearning
RadarGen: Automotive Radar Point Cloud Generation from Cameras

📝 Summary:
RadarGen synthesizes realistic automotive radar point clouds from camera images using diffusion models. It incorporates depth, semantic, and motion cues for physical plausibility, enabling scalable multimodal simulation and improving perception models.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17897
• PDF: https://arxiv.org/pdf/2512.17897

==================================

For more data science resources:
https://t.me/DataScienceT

#AutomotiveRadar #PointClouds #DiffusionModels #ComputerVision #AutonomousDriving
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

📝 Summary:
MatSpray integrates 2D PBR materials from diffusion models onto 3D Gaussian Splatting geometry. Using projection and neural refinement, it enables accurate relighting and photorealistic rendering from reconstructed scenes. This boosts asset creation efficiency.
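
🔹 Illustrative sketch (not the paper's code): the projection step only. Each Gaussian center samples the 2D PBR material of the pixel it projects to; camera intrinsics, pose, and map layout are assumed inputs, and the paper's neural refinement is omitted.

import torch

def splat_material_lookup(splat_xyz, K, cam_T, material_map):
    # splat_xyz:    (N, 3) Gaussian centers in world space.
    # K:            (3, 3) camera intrinsics; cam_T: (4, 4) world-to-camera pose.
    # material_map: (C, H, W) 2D PBR maps (albedo, roughness, ...) from a diffusion model.
    N = splat_xyz.shape[0]
    homog = torch.cat([splat_xyz, torch.ones(N, 1)], dim=1)   # (N, 4)
    cam = (cam_T @ homog.T).T[:, :3]                          # camera-space points
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                               # perspective divide
    u = uv[:, 0].long().clamp(0, material_map.shape[2] - 1)
    v = uv[:, 1].long().clamp(0, material_map.shape[1] - 1)
    return material_map[:, v, u].T                            # (N, C) per-splat material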

🔹 Publication Date: Published on Dec 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18314
• PDF: https://arxiv.org/pdf/2512.18314
• Project Page: https://matspray.jdihlmann.com/
• Github: https://github.com/cgtuebingen/MatSpray

==================================

For more data science resources:
https://t.me/DataScienceT

#MatSpray #GaussianSplatting #DiffusionModels #3DRendering #ComputerGraphics
SkyReels-V2: Infinite-length Film Generative Model

📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...
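
🔹 Illustrative sketch (not the paper's code) of the diffusion forcing idea: each frame gets its own noise level instead of one shared timestep, so nearly clean frames can serve as context while later frames are still noisy, which is what allows rolling a video out indefinitely. The linear noise schedule is a simplification.

import torch

def diffusion_forcing_noise(frames):
    # frames: (T, C, H, W) clean video frames.
    T = frames.shape[0]
    t = torch.rand(T)                              # independent per-frame noise levels
    alphas = (1.0 - t).view(T, 1, 1, 1)
    noise = torch.randn_like(frames)
    noisy = alphas.sqrt() * frames + (1.0 - alphas).sqrt() * noise
    return noisy, t                                # train the denoiser on (noisy, t)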

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2

🔹 Models citing this paper:
https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
https://huggingface.co/Skywork/SkyCaptioner-V1
https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P

Spaces citing this paper:
https://huggingface.co/spaces/fffiloni/SkyReels-V2
https://huggingface.co/spaces/Dudu0043/SkyReels-V2
https://huggingface.co/spaces/14eee109giet/SkyReels-V2

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.
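
🔹 Illustrative sketch (my reading, not the paper's code) of on-policy distillation: the student samples with its own few-step sampler and the teacher provides targets on those student-generated samples, so training matches the distribution the student actually produces at inference time. Both models and the loss are placeholders.

import torch

def on_policy_distill_step(student, teacher, cond, num_student_steps=4):
    x = torch.randn(1, 3, 64, 64)                  # student rolls out its own trajectory
    for _ in range(num_student_steps):
        x = student(x, cond)
    with torch.no_grad():
        target = teacher(x, cond)                  # teacher correction of the student's sample
    return torch.nn.functional.mse_loss(x, target)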

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

📝 Summary:
Transparent objects are hard for perception systems. This work observes that video diffusion models can synthesize transparent phenomena, so the authors repurpose one. Their DKT model, trained on a new dataset, achieves zero-shot SOTA for depth and normal estimation of transparent objects, proving diffusion knows tr...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23705
• PDF: https://arxiv.org/pdf/2512.23705
• Project Page: https://daniellli.github.io/projects/DKT/
• Github: https://github.com/Daniellli/DKT

==================================

For more data science resources:
https://t.me/DataScienceT

#ComputerVision #DiffusionModels #DepthEstimation #TransparentObjects #AIResearch
SpotEdit: Selective Region Editing in Diffusion Transformers

📝 Summary:
SpotEdit is a training-free framework for selective image editing in diffusion transformers. It avoids reprocessing stable regions by reusing their features, combining them with edited areas. This reduces computation and preserves unchanged regions, enhancing efficiency and precision.
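
🔹 Illustrative sketch (not the paper's code) of the feature-reuse idea: features for stable regions are cached from the original image, only the user-masked region is recomputed, and the two are blended with the edit mask. Shapes and names are placeholders.

import torch

def reuse_features(cached_feats, edit_mask, recompute_fn):
    # cached_feats: (C, H, W) features saved from the unedited image.
    # edit_mask:    (1, H, W) binary mask, 1 where the edit happens.
    new_feats = recompute_fn(edit_mask)            # fresh features for the edited area only
    return edit_mask * new_feats + (1 - edit_mask) * cached_feats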

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22323
• PDF: https://arxiv.org/pdf/2512.22323
• Project Page: https://biangbiang0321.github.io/SpotEdit.github.io

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageEditing #DiffusionModels #ComputerVision #AIResearch #DeepLearning
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.
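
🔹 Illustrative sketch (not the paper's code) of why a diffusion backbone changes action generation: the whole action chunk is refined from noise in a few denoising passes, conditioned on the observation, rather than decoded token by token. The policy, horizon, and dimensions are placeholders.

import torch

def denoise_action_chunk(policy, obs_embed, horizon=16, action_dim=7, steps=8):
    actions = torch.randn(1, horizon, action_dim)  # start from pure noise
    for _ in range(steps):
        actions = policy(actions, obs_embed)       # placeholder denoiser call
    return actions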

🔹 Publication Date: Published on Dec 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX

==================================

For more data science resources:
https://t.me/DataScienceT

#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

📝 Summary:
GRAN-TED improves text encoders for diffusion models by addressing evaluation and adaptation challenges. It introduces TED-6K, an efficient text-only benchmark that predicts generation quality 750x faster. Using this, GRAN-TED develops a superior encoder via a two-stage training method, enhancing...

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15560
• PDF: https://arxiv.org/pdf/2512.15560

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #TextEmbeddings #AIResearch #MachineLearning #NLP