✨Exploring MLLM-Diffusion Information Transfer with MetaCanvas
📝 Summary:
MetaCanvas uses MLLMs as latent-space planners for diffusion models to enable precise and structured image and video generation. This approach bridges the gap between multimodal understanding and generation, outperforming global-conditioning methods.
🔹 Publication Date: Published on Dec 12, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11464
• PDF: https://arxiv.org/pdf/2512.11464
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MLLM #DiffusionModels #GenerativeAI #ComputerVision #AIResearch
✨Directional Textual Inversion for Personalized Text-to-Image Generation
📝 Summary:
Directional Textual Inversion (DTI) enhances text-to-image personalization by fixing the learned token's magnitude and optimizing only its direction. This prevents the norm-inflation issues of standard Textual Inversion, improving prompt conditioning and enabling smooth interpolation.
🔹 Publication Date: Published on Dec 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13672
• PDF: https://arxiv.org/pdf/2512.13672
• Project Page: https://kunheek.github.io/dti
• Github: https://github.com/kunheek/dti
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TextualInversion #TextToImage #GenerativeAI #DeepLearning #AI
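💡 A toy sketch of the direction-only update described above (our own illustration, not the authors' code; the diffusion loss is replaced by a stand-in regression target so the snippet runs on its own):
```python
import torch

emb_dim = 768
fixed_norm = 0.4                       # pinned magnitude (assumed value)
direction = torch.randn(emb_dim, requires_grad=True)
target = torch.randn(emb_dim)          # stand-in for the real TI diffusion objective
opt = torch.optim.Adam([direction], lr=5e-2)

for step in range(200):
    # Re-normalize every step: only the direction is free, the magnitude
    # stays pinned, which is what prevents norm inflation.
    emb = fixed_norm * direction / direction.norm()
    loss = ((emb - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final norm:", (fixed_norm * direction / direction.norm()).norm().item())
```
Because the learned token keeps a pre-set norm, interpolating between two such tokens stays on a sphere of typical embedding magnitudes, which is plausibly where the smooth-interpolation property comes from.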
✨Bidirectional Normalizing Flow: From Data to Noise and Back
📝 Summary:
Bidirectional Normalizing Flow (BiFlow) improves generative modeling by learning an approximate noise-to-data inverse, removing the need for exact invertibility. This allows flexible architectures, yielding better generation quality and accelerating sampling by up to two orders of magnitude.
🔹 Publication Date: Published on Dec 11, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10953
• PDF: https://arxiv.org/pdf/2512.10953
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#NormalizingFlows #GenerativeAI #MachineLearning #DeepLearning #DataScience
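💡 A minimal sketch of the bidirectional idea, under our own assumptions (toy 2-D data, crude prior term; not the paper's architecture):
```python
import torch
import torch.nn as nn

def sample_data(n=256):                    # toy 2-D data distribution
    return torch.randn(n, 2) * 0.3 + torch.tensor([2.0, -1.0])

# Unconstrained encoder (data -> noise) and a separate decoder trained as
# an *approximate* inverse, so exact invertibility is never required.
enc = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))
dec = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

for step in range(500):
    x = sample_data()
    z = enc(x)
    recon = ((dec(z) - x) ** 2).mean()     # approximate-inverse objective
    prior = (z ** 2).mean()                # crudely push latents toward N(0, I)
    loss = recon + 0.1 * prior
    opt.zero_grad(); loss.backward(); opt.step()

samples = dec(torch.randn(64, 2))          # one-pass sampling from noise
```
Sampling is a single decoder pass instead of numerically inverting the flow, which is the kind of shortcut behind the reported speedups.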
✨Animate Any Character in Any World
📝 Summary:
AniX extends controllable-entity models to enable diverse, user-defined character interactions in static 3D environments via natural language. It synthesizes temporally coherent videos through conditional autoregressive video generation, allowing characters to perform open-ended actions.
🔹 Publication Date: Published on Dec 18, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17796
• PDF: https://arxiv.org/pdf/2512.17796
• Project Page: https://snowflakewang.github.io/AniX/
• Github: https://github.com/snowflakewang/AniX
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GenerativeAI #VideoGeneration #CharacterAnimation #NLP #3D
✨3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework
📝 Summary:
3D-RE-GEN reconstructs single images into modifiable 3D textured mesh scenes with comprehensive backgrounds. It uses a compositional generative framework and novel optimization for artist-ready, physically realistic layouts, achieving state-of-the-art performance.
🔹 Publication Date: Published on Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17459
• PDF: https://arxiv.org/pdf/2512.17459
• Project Page: https://3dregen.jdihlmann.com/
• Github: https://github.com/cgtuebingen/3D-RE-GEN
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DReconstruction #GenerativeAI #ComputerVision #DeepLearning #ComputerGraphics
✨MineTheGap: Automatic Mining of Biases in Text-to-Image Models
📝 Summary:
MineTheGap automatically finds prompts that cause Text-to-Image models to generate biased outputs. It uses a genetic algorithm and a novel bias score to identify and rank biases, aiming to reduce redundancy and improve output diversity.
🔹 Publication Date: Published on Dec 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13427
• PDF: https://arxiv.org/pdf/2512.13427
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIbias #TextToImage #GenerativeAI #ResponsibleAI #MachineLearning
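💡 A toy version of the genetic search loop (the bias score below is a random stand-in for the paper's measure of output skew, and the mutation operator is invented for illustration):
```python
import random

rng = random.Random(0)
WORDS = ["doctor", "nurse", "engineer", "teacher", "ceo", "artist"]

def bias_score(prompt):
    # Stand-in: the real system would generate images for the prompt and
    # measure how skewed the outputs are along some attribute.
    return rng.random()

def mutate(prompt):
    toks = prompt.split()
    toks[rng.randrange(len(toks))] = rng.choice(WORDS)
    return " ".join(toks)

population = [f"a photo of a {w}" for w in WORDS]
for generation in range(10):
    ranked = sorted(population, key=bias_score, reverse=True)
    parents = ranked[: len(ranked) // 2]      # keep the most biased prompts
    population = parents + [mutate(p) for p in parents]

print(sorted(population, key=bias_score, reverse=True)[:3])
```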
✨Over++: Generative Video Compositing for Layer Interaction Effects
📝 Summary:
Over++ introduces augmented compositing, a framework that generates realistic, text-prompted environmental effects for videos. It synthesizes effects like shadows onto video layers while preserving the original scene, outperforming prior methods without dense annotations.
🔹 Publication Date: Published on Dec 22, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19661
• PDF: https://arxiv.org/pdf/2512.19661
• Project Page: https://overplusplus.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GenerativeAI #VideoCompositing #VFX #ComputerGraphics #AIResearch
✨T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
📝 Summary:
T2AV-Compass introduces a unified benchmark for text-to-audio-video generation evaluation. It features 500 diverse prompts and a dual-level framework. Evaluations reveal current T2AV models struggle significantly with realism and cross-modal consistency.
🔹 Publication Date: Published on Dec 24, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21094
• PDF: https://arxiv.org/pdf/2512.21094
• Project Page: https://nju-link.github.io/T2AV-Compass/
• Github: https://github.com/NJU-LINK/T2AV-Compass/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TextToAudioVideo #MultimodalAI #AIEvaluation #GenerativeAI #AIResearch
✨Spatia: Video Generation with Updatable Spatial Memory
📝 Summary:
Spatia is a video generation framework that improves long-term consistency by using an updatable 3D scene point cloud as persistent spatial memory. It iteratively generates video clips and updates this memory via visual SLAM, enabling realistic videos and 3D-aware interactive editing.
🔹 Publication Date: Published on Dec 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15716
• PDF: https://arxiv.org/pdf/2512.15716
• Project Page: https://zhaojingjing713.github.io/Spatia/
• Github: https://github.com/ZhaoJingjing713/Spatia
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #3DReconstruction #SLAM
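💡 The generate-then-update pattern from the summary, with toy stubs standing in for the video model, the renderer, and visual SLAM (none of this is Spatia's actual interface):
```python
import numpy as np

rng = np.random.default_rng(0)

def render_memory(points):            # project the point cloud into a conditioning map
    bias = 0.0 if len(points) == 0 else float(points.mean())
    return np.full((64, 64, 3), bias)

def generate_clip(cond):              # toy "video model": 8 frames of noise + condition
    return rng.normal(size=(8, 64, 64, 3)) + cond

def slam_points(clip):                # toy "visual SLAM": recover a few 3-D points
    return rng.normal(size=(32, 3))

memory = np.empty((0, 3))             # persistent 3-D scene point cloud
clips = []
for _ in range(4):
    cond = render_memory(memory)      # condition each clip on the current memory
    clip = generate_clip(cond)
    memory = np.vstack([memory, slam_points(clip)])   # memory only grows
    clips.append(clip)

print(len(clips), "clips;", memory.shape[0], "memory points")
```
Because the point cloud persists across clips, geometry seen early on can still constrain clips generated much later, which is where the long-term consistency comes from.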
✨SkyReels-V2: Infinite-length Film Generative Model
📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by combining MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness.
🔹 Publication Date: Published on Apr 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2
🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
• https://huggingface.co/spaces/14eee109giet/SkyReels-V2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
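💡 "Diffusion forcing" in one screen, as we read it: each frame in the training window gets its own noise level, so leading frames can be almost clean while trailing ones stay noisy, letting generation roll forward without a fixed length. Shapes and schedule below are our own:
```python
import torch

frames = torch.randn(16, 3, 32, 32)              # a window of 16 video frames
t = torch.linspace(0.05, 1.0, frames.shape[0])   # per-frame noise level
noise = torch.randn_like(frames)
tt = t.view(-1, 1, 1, 1)
noisy = (1 - tt) * frames + tt * noise           # frame i corrupted at level t[i]
# A denoiser trained on (noisy, t) can then be rolled autoregressively:
# fully denoise the leading frames, shift the window, append fresh noise.
```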
✨InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D-aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.
🔹 Publication Date: Published on Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
✨Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
🔹 Publication Date: Published on Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220
🔹 Models citing this paper:
• https://huggingface.co/MindscapeRAG/MiA-Emb-8B
• https://huggingface.co/MindscapeRAG/MiA-Emb-4B
• https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
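💡 A rough sketch of the hierarchical summarization that builds the "mindscape" (the summarize function is a trivial stub standing in for an LLM call):
```python
def summarize(texts):
    return " | ".join(t[:30] for t in texts)     # toy stub, not a real LLM

def build_mindscape(chunks, fanout=4):
    levels = [chunks]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([summarize(prev[i:i + fanout])
                       for i in range(0, len(prev), fanout)])
    return levels    # leaves -> intermediate summaries -> one global summary

chunks = [f"chunk {i}: some passage text" for i in range(16)]
mindscape = build_mindscape(chunks)
print(len(mindscape), "levels; global summary:", mindscape[-1][0])
```
Retrieval can then consult the upper levels for global context before drilling down to the leaf chunks that actually contain the evidence.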
✨Yume-1.5: A Text-Controlled Interactive World Generation Model
📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.
🔹 Publication Date: Published on Dec 26, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME
🔹 Models citing this paper:
• https://huggingface.co/stdstu123/Yume-5B-720P
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
✨UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes via a two-stage, coarse-to-fine geometry process. It also includes a novel data pipeline that improves dataset quality, enabling strong geometric results on public data.
🔹 Publication Date: Published on Dec 24, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
🔹 Models citing this paper:
• https://huggingface.co/infinith/UltraShape
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
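💡 Coarse-then-refine in toy form (our stand-ins, not UltraShape's code): stage 1 produces a low-resolution occupancy grid, and stage 2 conditions on its upsampled output to add geometric detail:
```python
import torch
import torch.nn.functional as F

coarse = (torch.rand(1, 1, 16, 16, 16) > 0.5).float()   # stage-1 shape, 16^3
up = F.interpolate(coarse, scale_factor=4, mode="trilinear", align_corners=False)
residual = 0.05 * torch.randn_like(up)                  # stand-in for the stage-2 refiner
fine = (up + residual).clamp(0.0, 1.0)                  # refined 64^3 occupancy
print(tuple(coarse.shape), "->", tuple(fine.shape))
```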
✨SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time embedding, temporal-warping training, and a synthetic dataset.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
✨Guiding a Diffusion Transformer with the Internal Dynamics of Itself
📝 Summary:
This paper introduces Internal Guidance (IG) for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates their outputs during sampling. This simple strategy significantly improves training efficiency and generation quality.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
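💡 Our reading of the two ingredients, as a toy sketch (not the paper's code): an auxiliary head on an intermediate block receives the same denoising target during training, and at sampling time the final and intermediate predictions are extrapolated, guidance-style:
```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(d, d), nn.SiLU())
        self.block2 = nn.Sequential(nn.Linear(d, d), nn.SiLU())
        self.head = nn.Linear(d, d)
        self.aux_head = nn.Linear(d, d)    # auxiliary supervision on block1

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.head(h2), self.aux_head(h1)

model, w = TinyDiT(), 1.5                  # w: extrapolation weight (our choice)
x = torch.randn(4, 64)
out_final, out_aux = model(x)

# Training: both heads regress the same denoising target.
target = torch.randn_like(out_final)       # stand-in target
loss = ((out_final - target) ** 2).mean() + 0.5 * ((out_aux - target) ** 2).mean()

# Sampling: extrapolate away from the weaker intermediate prediction,
# analogous to classifier-free guidance but using the model's own internals.
guided = out_aux + w * (out_final - out_aux)
```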
✨FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each sampling stage. It uses large models for the critical early and late timesteps and small models for intermediate ones, achieving faster inference and fewer FLOPs with no loss of large-model fidelity.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
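💡 The stage-aware routing in toy form (the 20%/80% cut points and the stub denoisers are our assumptions, not the paper's):
```python
import torch

class StubDenoiser:
    def __init__(self, name):
        self.name = name
    def step(self, x, t):
        return 0.9 * x                    # placeholder denoising update

big, small, T = StubDenoiser("big"), StubDenoiser("small"), 50

def pick_model(t):
    s = t / T
    return big if s < 0.2 or s > 0.8 else small   # big model early and late only

x = torch.randn(1, 3, 8, 8)
for t in range(T):
    x = pick_model(t).step(x, t)          # most steps run the cheap model
```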
✨Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
📝 Summary:
Avatar Forcing creates real-time interactive talking-head avatars. It uses diffusion forcing for low-latency reactions to user input and label-free preference optimization for expressive motion, achieving a 6.8x speedup.
🔹 Publication Date: Published on Jan 2, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
✨Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
📝 Summary:
MLLMs struggle with hallucinations on counterfactual videos. DualityForge synthesizes counterfactual video data and QA pairs through diffusion-based editing to address this. This method significantly reduces model hallucinations and improves general performance.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24271
• PDF: https://arxiv.org/pdf/2512.24271
• Project Page: https://amap-ml.github.io/Taming-Hallucinations/
• Github: https://github.com/AMAP-ML/Taming-Hallucinations
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MLLMs #VideoUnderstanding #AIHallucinations #GenerativeAI #MachineLearning
✨InfoSynth: Information-Guided Benchmark Synthesis for LLMs
📝 Summary:
InfoSynth automatically generates novel and diverse coding benchmarks for LLMs. It uses information-theoretic metrics and genetic algorithms to create scalable self-verifying problems, overcoming manual effort and training data contamination.
🔹 Publication Date: Published on Jan 2, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00575
• PDF: https://arxiv.org/pdf/2601.00575
• Project Page: https://ishirgarg.github.io/infosynth_web/
• Github: https://github.com/ishirgarg/infosynth
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #Benchmarking #GenerativeAI #DeepLearning
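💡 What a "self-verifying" benchmark item could look like, per our reading: each generated task ships with a reference answer and a checker, so no human grading is needed (the task family below is invented for illustration):
```python
import random

def make_task(rng):
    xs = [rng.randint(-9, 9) for _ in range(6)]
    prompt = f"Write f(xs) returning the sum of squares of {xs}."
    reference = sum(v * v for v in xs)
    def check(candidate_fn):
        return candidate_fn(xs) == reference      # automatic verification
    return prompt, check

rng = random.Random(0)
prompt, check = make_task(rng)
print(prompt)
print("passes:", check(lambda xs: sum(v * v for v in xs)))
```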