ML Research Hub
32.8K subscribers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Animate Any Character in Any World

📝 Summary:
AniX extends controllable-entity models to enable diverse, user-defined character interactions in static 3D environments via natural language. It synthesizes temporally coherent videos through conditional autoregressive video generation, allowing characters to perform open-ended actions.
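
🧪 Code sketch: a minimal illustration of conditional autoregressive clip generation, not AniX's actual API. `generate_video` chains clips so each clip is conditioned on the previous clip's last frame plus an action prompt; `toy_generator` is a hypothetical stand-in for the video model.

```python
def generate_video(initial_frame, actions, generator, frames_per_clip=4):
    """Chain clips so each is conditioned on the previous clip's last frame."""
    video = [initial_frame]
    context = initial_frame
    for action in actions:
        clip = generator(context, action, frames_per_clip)
        video.extend(clip)
        context = clip[-1]  # temporal coherence: carry state into the next clip
    return video

# Toy generator: frames are (frame_index, action) labels.
def toy_generator(context, action, n):
    base = context[0]
    return [(base + i + 1, action) for i in range(n)]

video = generate_video((0, "start"), ["wave", "jump"], toy_generator)
```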

🔹 Publication Date: Published on Dec 18, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17796
• PDF: https://arxiv.org/pdf/2512.17796
• Project Page: https://snowflakewang.github.io/AniX/
• Github: https://github.com/snowflakewang/AniX

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #VideoGeneration #CharacterAnimation #NLP #3D
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

📝 Summary:
3D-RE-GEN reconstructs single images into modifiable 3D textured mesh scenes with comprehensive backgrounds. It uses a compositional generative framework and novel optimization for artist-ready, physically realistic layouts, achieving state-of-the-art performance.

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17459
• PDF: https://arxiv.org/pdf/2512.17459
• Project Page: https://3dregen.jdihlmann.com/
• Github: https://github.com/cgtuebingen/3D-RE-GEN

==================================

For more data science resources:
https://t.me/DataScienceT

#3DReconstruction #GenerativeAI #ComputerVision #DeepLearning #ComputerGraphics
MineTheGap: Automatic Mining of Biases in Text-to-Image Models

📝 Summary:
MineTheGap automatically finds prompts that cause Text-to-Image models to generate biased outputs. It uses a genetic algorithm and a novel bias score to identify and rank biases, aiming to reduce redundancy and improve output diversity.
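
🧪 Code sketch: a toy version of the genetic search idea. `bias_score` and `mutate_fn` are hypothetical stand-ins for the paper's bias metric over generated images and its prompt mutation operator; here the "score" is just prompt length so the loop is deterministic.

```python
def mine_biases(seed_prompts, bias_score, mutate, generations=5, keep=2):
    """Evolve prompts toward higher bias scores via elitism + mutation."""
    population = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(population, key=bias_score, reverse=True)[:keep]
        population = ranked + [mutate(p) for p in ranked]
    return sorted(population, key=bias_score, reverse=True)

# Deterministic toy: "bias" grows with prompt length; mutation appends a word.
bias_score = len
mutate_fn = lambda p: p + " person"
top = mine_biases(["a doctor", "a nurse at work"], bias_score, mutate_fn)
```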

🔹 Publication Date: Published on Dec 15, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13427
• PDF: https://arxiv.org/pdf/2512.13427

==================================

For more data science resources:
https://t.me/DataScienceT

#AIbias #TextToImage #GenerativeAI #ResponsibleAI #MachineLearning
Over++: Generative Video Compositing for Layer Interaction Effects

📝 Summary:
Over++ introduces augmented compositing, a framework that generates realistic, text-prompted environmental effects for videos. It synthesizes effects like shadows onto video layers while preserving the original scene, outperforming prior methods without dense annotations.

🔹 Publication Date: Published on Dec 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19661
• PDF: https://arxiv.org/pdf/2512.19661
• Project Page: https://overplusplus.github.io/

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #VideoCompositing #VFX #ComputerGraphics #AIResearch
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

📝 Summary:
T2AV-Compass introduces a unified benchmark for text-to-audio-video generation evaluation. It features 500 diverse prompts and a dual-level framework. Evaluations reveal current T2AV models struggle significantly with realism and cross-modal consistency.

🔹 Publication Date: Published on Dec 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21094
• PDF: https://arxiv.org/pdf/2512.21094
• Project Page: https://nju-link.github.io/T2AV-Compass/
• Github: https://github.com/NJU-LINK/T2AV-Compass/

==================================

For more data science resources:
https://t.me/DataScienceT

#TextToAudioVideo #MultimodalAI #AIEvaluation #GenerativeAI #AIResearch
Spatia: Video Generation with Updatable Spatial Memory

📝 Summary:
Spatia is a video generation framework that improves long-term consistency by using an updatable 3D scene point cloud as persistent spatial memory. It iteratively generates video clips and updates this memory via visual SLAM, enabling realistic videos and 3D-aware interactive editing.
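
🧪 Code sketch: the generate-then-update loop in miniature. `generate_clip` and `update_memory` are hypothetical placeholders for the video model and the visual-SLAM memory update; the "memory" here is just a set of observed point IDs.

```python
def roll_out(n_clips, generate_clip, update_memory):
    """Alternate clip generation with updates to a persistent spatial memory."""
    memory = set()
    clips = []
    for t in range(n_clips):
        clip = generate_clip(t, memory)       # condition on persistent 3D memory
        memory = update_memory(memory, clip)  # fold new geometry back in
        clips.append(clip)
    return clips, memory

# Toy stand-ins: clip t "observes" points t and t+1.
gen = lambda t, mem: {t, t + 1}
upd = lambda mem, clip: mem | clip

clips, memory = roll_out(3, gen, upd)
```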

🔹 Publication Date: Published on Dec 17, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15716
• PDF: https://arxiv.org/pdf/2512.15716
• Project Page: https://zhaojingjing713.github.io/Spatia/
• Github: https://github.com/ZhaoJingjing713/Spatia

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #ComputerVision #3DReconstruction #SLAM
SkyReels-V2: Infinite-length Film Generative Model

📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...

🔹 Publication Date: Published on Apr 17, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2

🔹 Models citing this paper:
https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
https://huggingface.co/Skywork/SkyCaptioner-V1
https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P

🔹 Spaces citing this paper:
https://huggingface.co/spaces/fffiloni/SkyReels-V2
https://huggingface.co/spaces/Dudu0043/SkyReels-V2
https://huggingface.co/spaces/14eee109giet/SkyReels-V2

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D-aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
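
🧪 Code sketch: hierarchical summarization in miniature. Chunks are summarized in groups, and summaries are merged level by level into a single global node, a rough analogue of the "mindscape". `toy_summarize` is a stand-in for an LLM summarizer.

```python
def build_mindscape(chunks, summarize, fanout=2):
    """Merge chunk summaries bottom-up until one global summary remains."""
    level = list(chunks)
    tree = [level]
    while len(level) > 1:
        level = [summarize(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        tree.append(level)
    return tree  # tree[-1][0] is the global summary

toy_summarize = lambda parts: "+".join(parts)
tree = build_mindscape(["c1", "c2", "c3", "c4"], toy_summarize)
```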

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220

🔹 Models citing this paper:
https://huggingface.co/MindscapeRAG/MiA-Emb-8B
https://huggingface.co/MindscapeRAG/MiA-Emb-4B
https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B

==================================

For more data science resources:
https://t.me/DataScienceT

#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
Yume-1.5: A Text-Controlled Interactive World Generation Model

📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME

🔹 Models citing this paper:
https://huggingface.co/stdstu123/Yume-5B-720P

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.

🔹 Publication Date: Published on Dec 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/

🔹 Models citing this paper:
https://huggingface.co/infinith/UltraShape

==================================

For more data science resources:
https://t.me/DataScienceT

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time-embedding, temporal-warping training, and a synthetic dataset.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

📝 Summary:
This paper introduces Internal Guidance (IG) for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
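
🧪 Code sketch: one way the sampling-time extrapolation could look, combining the final output with an intermediate-layer output in a CFG-style update. The combination rule and the weight `w` are assumptions for illustration, not the paper's exact formula.

```python
def internally_guided(final_out, intermediate_out, w=0.5):
    """Push the prediction away from the weaker intermediate estimate."""
    return [f + w * (f - i) for f, i in zip(final_out, intermediate_out)]

# Toy 2-dim example with an exaggerated guidance weight.
guided = internally_guided([1.0, 2.0], [0.5, 2.5], w=2.0)
```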

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
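
🧪 Code sketch: stage-aware model routing in miniature. A large model handles the earliest and latest timesteps and a small model the middle; the head/tail fractions here are illustrative assumptions, not the paper's schedule.

```python
def route_models(num_steps, large, small, head=0.2, tail=0.2):
    """Assign a model to each timestep based on its position in the trajectory."""
    schedule = []
    for t in range(num_steps):
        frac = t / max(num_steps - 1, 1)
        model = large if (frac < head or frac > 1 - tail) else small
        schedule.append(model)
    return schedule

schedule = route_models(10, "large", "small")
```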

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing

==================================

For more data science resources:
https://t.me/DataScienceT

#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

📝 Summary:
MLLMs struggle with hallucinations on counterfactual videos. DualityForge synthesizes counterfactual video data and QA pairs through diffusion-based editing to address this. This method significantly reduces model hallucinations and improves general performance.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24271
• PDF: https://arxiv.org/pdf/2512.24271
• Project Page: https://amap-ml.github.io/Taming-Hallucinations/
• Github: https://github.com/AMAP-ML/Taming-Hallucinations

==================================

For more data science resources:
https://t.me/DataScienceT

#MLLMs #VideoUnderstanding #AIHallucinations #GenerativeAI #MachineLearning
InfoSynth: Information-Guided Benchmark Synthesis for LLMs

📝 Summary:
InfoSynth automatically generates novel and diverse coding benchmarks for LLMs. It uses information-theoretic metrics and genetic algorithms to create scalable self-verifying problems, overcoming manual effort and training data contamination.
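
🧪 Code sketch: one information-theoretic selection step, preferring the candidate benchmark set with the highest Shannon entropy over problem "topics". The entropy-over-topics criterion is an illustrative assumption, not InfoSynth's exact metric.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def pick_diverse(candidates):
    return max(candidates, key=entropy)

best = pick_diverse([["sort", "sort", "sort"], ["sort", "graph", "dp"]])
```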

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00575
• PDF: https://arxiv.org/pdf/2601.00575
• Project Page: https://ishirgarg.github.io/infosynth_web/
• Github: https://github.com/ishirgarg/infosynth

==================================

For more data science resources:
https://t.me/DataScienceT

#LLM #AI #Benchmarking #GenerativeAI #DeepLearning
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

📝 Summary:
DreamID-V is a novel video face swapping framework that uses diffusion transformers and curriculum learning. It achieves superior identity preservation and visual realism by bridging the image-to-video gap, outperforming existing methods and enhancing temporal consistency.

🔹 Publication Date: Published on Jan 4, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01425
• PDF: https://arxiv.org/pdf/2601.01425
• Project Page: https://guoxu1233.github.io/DreamID-V/

==================================

For more data science resources:
https://t.me/DataScienceT

#FaceSwapping #DiffusionModels #ComputerVision #GenerativeAI #VideoAI
Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery

📝 Summary:
Materiomusic links matter's hierarchical structures to music's compositional logic through vibrational principles. Sound serves as a scientific probe, revealing how selective imperfection drives novelty in both. AI models can leverage this framework for creative invention beyond interpolation.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00863
• PDF: https://arxiv.org/pdf/2601.00863
• Github: https://github.com/lamm-mit/MusicAnalysis

🔹 Datasets citing this paper:
https://huggingface.co/datasets/lamm-mit/scales-12tet-defects

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #ComputationalMusic #ComplexSystems #Creativity #Interdisciplinary