✨FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering
📝 Summary:
FrameDiffuser is an autoregressive neural rendering framework. It generates temporally consistent, photorealistic frames from G-buffer data and its own previous output, achieving interactive speeds with higher quality than prior methods.
🔹 Publication Date: Published on Dec 18, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16670
• PDF: https://arxiv.org/pdf/2512.16670
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#NeuralRendering #DiffusionModels #ComputerGraphics #RealtimeRendering #DeepLearning
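To make the autoregressive conditioning concrete, here is a minimal sketch of the loop as the summary describes it: each frame is denoised from the current G-buffer plus the previously generated frame. The `denoise` stub, channel layout, and step count are illustrative assumptions, not the paper's network.

```python
import torch

def denoise(latent, cond):
    # Placeholder one-step "denoiser"; the real FrameDiffuser network is not shown here.
    return latent * 0.5 + cond.mean(dim=1, keepdim=True) * 0.5

H, W, num_frames = 64, 64, 4
prev_frame = torch.zeros(1, 3, H, W)                # bootstrap with an empty frame
frames = []
for t in range(num_frames):
    gbuffer = torch.rand(1, 7, H, W)                # e.g. albedo(3) + normal(3) + depth(1)
    cond = torch.cat([gbuffer, prev_frame], dim=1)  # condition on G-buffer AND last output
    latent = torch.randn(1, 3, H, W)
    for _ in range(4):                              # a few denoising steps (count illustrative)
        latent = denoise(latent, cond)
    prev_frame = latent                             # feed the output back in: autoregressive
    frames.append(latent)
```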
✨RadarGen: Automotive Radar Point Cloud Generation from Cameras
📝 Summary:
RadarGen synthesizes realistic automotive radar point clouds from camera images using diffusion models. It incorporates depth, semantic, and motion cues for physical plausibility, enabling scalable multimodal simulation and improving perception models.
🔹 Publication Date: Published on Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17897
• PDF: https://arxiv.org/pdf/2512.17897
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AutomotiveRadar #PointClouds #DiffusionModels #ComputerVision #AutonomousDriving
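A rough sketch of the conditioning idea, under assumed shapes and cue channels: depth, semantic, and motion maps are fused into a conditioning signal that steers iterative denoising of radar point coordinates. The `denoise_step` stand-in is not RadarGen's actual score network.

```python
import torch

def denoise_step(points, cond):
    # Stand-in for the score network: nudge noisy points using a
    # global feature pooled from the camera-derived cues.
    return points * 0.95 + cond[:3] * 0.05

depth    = torch.rand(1, 1, 128, 256)    # monocular depth cue
semantic = torch.rand(1, 19, 128, 256)   # per-class semantic scores
motion   = torch.rand(1, 2, 128, 256)    # optical-flow / motion cue
cond = torch.cat([depth, semantic, motion], dim=1).mean(dim=(0, 2, 3))  # (22,)

points = torch.randn(512, 3)             # start from Gaussian noise in xyz
for _ in range(8):                       # iterative denoising toward a plausible sweep
    points = denoise_step(points, cond)
```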
✨MatSpray: Fusing 2D Material World Knowledge on 3D Geometry
📝 Summary:
MatSpray integrates 2D PBR materials from diffusion models onto 3D Gaussian Splatting geometry. Using projection and neural refinement, it enables accurate relighting and photorealistic rendering from reconstructed scenes. This boosts asset creation efficiency.
🔹 Publication Date: Published on Dec 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18314
• PDF: https://arxiv.org/pdf/2512.18314
• Project Page: https://matspray.jdihlmann.com/
• Github: https://github.com/cgtuebingen/MatSpray
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MatSpray #GaussianSplatting #DiffusionModels #3DRendering #ComputerGraphics
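The "spray" step reduces to projecting Gaussian centers into the image and looking up the generated PBR maps; MatSpray then refines the result neurally. A toy numpy sketch of just the projection/lookup, with made-up intrinsics and random material maps:

```python
import numpy as np

K = np.array([[200., 0., 128.], [0., 200., 128.], [0., 0., 1.]])  # toy intrinsics
pts = np.random.rand(1000, 3) + np.array([0., 0., 2.])            # points in front of camera
albedo = np.random.rand(256, 256, 3)                              # generated 2D albedo map
rough = np.random.rand(256, 256)                                  # generated roughness map

uvw = pts @ K.T                                 # pinhole projection (camera frame)
uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
valid = (uv >= 0).all(1) & (uv < 256).all(1)    # keep points landing inside the image
pt_albedo = np.zeros((len(pts), 3))
pt_rough = np.zeros(len(pts))
pt_albedo[valid] = albedo[uv[valid, 1], uv[valid, 0]]  # nearest-pixel lookup (row=v, col=u)
pt_rough[valid] = rough[uv[valid, 1], uv[valid, 0]]
```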
✨SkyReels-V2: Infinite-length Film Generative Model
📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...
🔹 Publication Date: Published on Apr 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2
🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P
✨ Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
• https://huggingface.co/spaces/14eee109giet/SkyReels-V2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
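Diffusion forcing, the ingredient that enables unbounded rollouts, lets every frame carry its own noise level, so early frames can be nearly clean while later ones are still noisy. A minimal sketch of per-frame noise annealing, with a placeholder denoiser standing in for the real video model:

```python
import torch

T, steps = 8, 8
noise_level = torch.arange(1, T + 1).float() / T    # later frames start noisier
frames = torch.randn(T, 3, 32, 32) * noise_level.view(T, 1, 1, 1)

def denoise(x, level):
    # Placeholder video denoiser; attenuation scales with each frame's own level.
    return x * (1 - 0.2 * level.view(-1, 1, 1, 1))

for _ in range(steps):
    frames = denoise(frames, noise_level)
    noise_level = (noise_level - 1.0 / steps).clamp(min=0.0)  # anneal per frame
```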
✨InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D-aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.
🔹 Publication Date: Published on Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
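The compositing side of such a pipeline can be sketched as masked latent blending: inside the insertion mask the model's prediction is kept, outside it the source video is preserved. The box mask below stands in for the paper's 4D-aware masks, and the denoiser is a stub:

```python
import torch

T, C, H, W = 4, 4, 32, 32
source = torch.rand(T, C, H, W)                 # latents of the original video
mask = torch.zeros(T, 1, H, W)                  # stand-in for the 4D-aware mask
mask[:, :, 8:24, 8:24] = 1.0                    # hand-made box, purely illustrative

latent = torch.randn(T, C, H, W)
for step in range(6):
    pred = latent * 0.8 + source * 0.2          # placeholder denoiser prediction
    latent = mask * pred + (1 - mask) * source  # edit inside the mask, keep the rest
```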
✨LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.
🔹 Publication Date: Published on Dec 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
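On-policy distillation, in miniature: the few-step student is trained on its own rollouts, with the slow teacher supplying targets at exactly those states. Both networks below are linear stand-ins for the real video diffusion models, so only the training pattern is faithful:

```python
import torch

teacher = torch.nn.Linear(16, 16)    # stand-in for the many-step teacher
student = torch.nn.Linear(16, 16)    # stand-in for the few-step student
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for it in range(100):
    x = torch.randn(32, 16)
    with torch.no_grad():
        rollout = student(x)          # on-policy: states come from the student itself
        target = teacher(rollout)     # teacher supplies the correction at those states
    loss = (student(rollout) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```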
✨Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
📝 Summary:
Transparent objects are hard for perception. Observing that video diffusion models can synthesize transparent phenomena, this work repurposes one. Their DKT model, trained on a new dataset, achieves zero-shot SOTA for depth and normal estimation of transparent objects, proving diffusion knows transparency.
🔹 Publication Date: Published on Dec 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23705
• PDF: https://arxiv.org/pdf/2512.23705
• Project Page: https://daniellli.github.io/projects/DKT/
• Github: https://github.com/Daniellli/DKT
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ComputerVision #DiffusionModels #DepthEstimation #TransparentObjects #AIResearch
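The repurposing recipe can be sketched as: keep a pretrained video backbone and swap the output head from RGB to depth-plus-normal channels, then fine-tune on the new dataset. Layer choices and shapes below are illustrative, not DKT's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv3d(3, 64, kernel_size=3, padding=1)  # stand-in for the video UNet/DiT
head = nn.Conv3d(64, 4, kernel_size=1)                 # new head: 1 depth + 3 normal channels

video = torch.rand(1, 3, 8, 64, 64)                    # (B, C, T, H, W)
feat = torch.relu(backbone(video))
out = head(feat)
depth = out[:, :1]                                     # per-pixel depth
normals = F.normalize(out[:, 1:], dim=1)               # unit-length surface normals
```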
✨SpotEdit: Selective Region Editing in Diffusion Transformers
📝 Summary:
SpotEdit is a training-free framework for selective image editing in diffusion transformers. It avoids reprocessing stable regions by reusing their features, combining them with edited areas. This reduces computation and preserves unchanged regions, enhancing efficiency and precision.
🔹 Publication Date: Published on Dec 26, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22323
• PDF: https://arxiv.org/pdf/2512.22323
• Project Page: https://biangbiang0321.github.io/SpotEdit.github.io
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #DiffusionModels #ComputerVision #AIResearch #DeepLearning
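The caching idea, sketched under assumed token layout: features for the source image are computed once, and on the editing pass only tokens under the edit mask go through the block again, with everything else reused from the cache:

```python
import torch

tokens, dim = 256, 64
cached = torch.rand(tokens, dim)                  # block features from the unedited pass
edit_mask = torch.zeros(tokens, dtype=torch.bool)
edit_mask[40:80] = True                           # tokens covered by the edit region

def block(x):
    # Placeholder transformer block; only the reuse pattern matters here.
    return x * 1.01

fresh = block(cached[edit_mask])                  # recompute only the edited tokens
out = cached.clone()
out[edit_mask] = fresh                            # stitch fresh features into the cache
```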
✨Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.
🔹 Publication Date: Published on Dec 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
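One reason a diffusion backbone suits action generation: action tokens are predicted in parallel and refined jointly rather than decoded one by one. A toy confidence-based unmasking loop (random logits in place of a real model) illustrates the decoding pattern only:

```python
import torch

L, V, rounds = 8, 32, 4           # action length, vocab size, refinement rounds
actions = torch.full((L,), -1)    # -1 marks still-masked action tokens
for _ in range(rounds):
    logits = torch.randn(L, V)    # placeholder for model(observation, actions)
    conf, pred = logits.softmax(-1).max(-1)
    conf[actions != -1] = -1.0    # never re-pick already-committed tokens
    k = L // rounds
    idx = conf.topk(k).indices    # commit the k most confident predictions
    actions[idx] = pred[idx]
```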
✨GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models
📝 Summary:
GRAN-TED improves text encoders for diffusion models by addressing evaluation and adaptation challenges. It introduces TED-6K, an efficient text-only benchmark that predicts generation quality 750x faster. Using this, GRAN-TED develops a superior encoder via a two-stage training method, enhancing...
🔹 Publication Date: Published on Dec 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15560
• PDF: https://arxiv.org/pdf/2512.15560
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #TextEmbeddings #AIResearch #MachineLearning #NLP
✨DiRL: An Efficient Post-Training Framework for Diffusion Language Models
📝 Summary:
DiRL is an efficient post-training framework for Diffusion Language Models, integrating online updates and introducing DiPO for unbiased policy optimization. It achieves state-of-the-art math performance for dLLMs, surpassing comparable models.
🔹 Publication Date: Published on Dec 23, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22234
• PDF: https://arxiv.org/pdf/2512.22234
• Github: https://github.com/OpenMOSS/DiRL
🔹 Models citing this paper:
• https://huggingface.co/OpenMOSS-Team/DiRL-8B-Instruct
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #LLM #ModelOptimization #MachineLearning #AI
✨UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.
🔹 Publication Date: Published on Dec 24, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
🔹 Models citing this paper:
• https://huggingface.co/infinith/UltraShape
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
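The coarse-then-refined pipeline can be sketched on a voxel occupancy grid: stage one produces a low-resolution shape, stage two upsamples and refines it. Both stages below are toys; UltraShape's actual representation and networks differ:

```python
import torch
import torch.nn.functional as F

coarse = (torch.rand(1, 1, 16, 16, 16) > 0.7).float()          # stage 1: low-res occupancy
up = F.interpolate(coarse, scale_factor=4.0, mode="trilinear")  # lift to a 64^3 grid
refined = (up + 0.1 * torch.randn_like(up)).clamp(0.0, 1.0)     # stage 2: residual refinement stub
```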
✨GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
📝 Summary:
GaMO improves sparse-view 3D reconstruction by using geometry-aware multi-view outpainting. It expands existing views to enhance scene coverage and consistency. This achieves state-of-the-art quality 25x faster than prior methods, with reduced computational cost.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25073
• PDF: https://arxiv.org/pdf/2512.25073
• Project Page: https://yichuanh.github.io/GaMO/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DReconstruction #ComputerVision #DiffusionModels #GaMO #AI
✨Guiding a Diffusion Transformer with the Internal Dynamics of Itself
📝 Summary:
This paper introduces Internal Guidance (IG) for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
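The sampling-time extrapolation, as we read the summary: take the auxiliary prediction from an intermediate layer and the final prediction, then push past the final one. The weight w and shapes below are illustrative guesses, not the paper's settings:

```python
import torch

def guided_output(inter_pred, final_pred, w=1.5):
    # Extrapolate past the final prediction, away from the intermediate one.
    return final_pred + w * (final_pred - inter_pred)

inter = torch.rand(1, 4, 32, 32)   # prediction decoded from an intermediate layer
final = torch.rand(1, 4, 32, 32)   # prediction from the full network
x0 = guided_output(inter, final)
```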
✨Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation
📝 Summary:
DiffusionGS is a novel single-stage 3D diffusion model that directly generates 3D Gaussian point clouds from a single image. It ensures strong view consistency from any prompt view. This method achieves superior quality and is over 5x faster than state-of-the-art techniques.
🔹 Publication Date: Published on Nov 21, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2411.14384
• PDF: https://arxiv.org/pdf/2411.14384
• Project Page: https://caiyuanhao1998.github.io/project/DiffusionGS/
• Github: https://github.com/caiyuanhao1998/Open-DiffusionGS
🔹 Models citing this paper:
• https://huggingface.co/CaiYuanhao/DiffusionGS
✨ Datasets citing this paper:
• https://huggingface.co/datasets/CaiYuanhao/DiffusionGS
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DGeneration #DiffusionModels #GaussianSplatting #ComputerVision #AIResearch
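The single-stage idea in shapes only: the denoiser's output at each step is a set of 3D Gaussian parameters (position, scale, opacity, color) rather than a multi-view image, so view consistency follows from the shared 3D state. The step rule and feature pooling below are stubs:

```python
import torch

N = 2048
gauss = torch.randn(N, 10)             # per-Gaussian: xyz(3), scale(3), opacity(1), rgb(3)
image_feat = torch.rand(1, 10)         # pooled feature of the single input image

def denoise(g, feat):
    # Stub update; the real model is a transformer denoiser over Gaussian tokens.
    return g * 0.9 + feat * 0.1

for _ in range(5):
    gauss = denoise(gauss, image_feat)
xyz, scale, opacity, rgb = gauss.split([3, 3, 1, 3], dim=1)
```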
✨OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
📝 Summary:
OmniVCus introduces a system for feedforward multi-subject video customization with multimodal controls. It proposes a data pipeline, VideoCus-Factory, and a diffusion Transformer framework with novel embedding mechanisms. This enables more subjects and precise editing, significantly outperformin...
🔹 Publication Date: Published on Jun 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.23361
• PDF: https://arxiv.org/pdf/2506.23361
• Project Page: https://caiyuanhao1998.github.io/project/OmniVCus/
• Github: https://github.com/caiyuanhao1998/Open-OmniVCus
🔹 Models citing this paper:
• https://huggingface.co/CaiYuanhao/OmniVCus
✨ Datasets citing this paper:
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Test
• https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Train
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #DiffusionModels #MultimodalAI #DeepLearning #ComputerVision
✨Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch
✨DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
📝 Summary:
DreamID-V is a novel video face swapping framework that uses diffusion transformers and curriculum learning. It achieves superior identity preservation and visual realism by bridging the image-to-video gap, outperforming existing methods and enhancing temporal consistency.
🔹 Publication Date: Published on Jan 4, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01425
• PDF: https://arxiv.org/pdf/2601.01425
• Project Page: https://guoxu1233.github.io/DreamID-V/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FaceSwapping #DiffusionModels #ComputerVision #GenerativeAI #VideoAI
✨M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
📝 Summary:
Existing concept erasure methods in diffusion models are vulnerable to non-text inputs. M-ErasureBench is a new multimodal evaluation framework, and IRECE is a module to restore robustness against these attacks, reducing concept reproduction.
🔹 Publication Date: Published on Dec 28, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22877
• PDF: https://arxiv.org/pdf/2512.22877
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #ConceptErasure #MultimodalAI #AISafety #MachineLearning
✨DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
📝 Summary:
DiffProxy generates multi-view consistent human proxies using diffusion models to improve human mesh recovery. This bridges synthetic training and real-world generalization, achieving state-of-the-art performance on real benchmarks.
🔹 Publication Date: Published on Jan 5, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02267
• PDF: https://arxiv.org/pdf/2601.02267
• Project Page: https://wrk226.github.io/DiffProxy.html
• Github: https://github.com/wrk226/DiffProxy
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#HumanMeshRecovery #DiffusionModels #ComputerVision #DeepLearning #AI