ML Research Hub
32.8K subscribers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Animate Any Character in Any World

📝 Summary:
AniX extends controllable-entity models to enable diverse, user-defined character interactions in static 3D environments via natural language. It synthesizes temporally coherent videos through conditional autoregressive video generation, allowing characters to perform open-ended actions.
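
🧪 Code sketch: a minimal illustration of conditional autoregressive clip generation, not AniX's actual API. `generate_video` chains clips so each clip is conditioned on the previous clip's last frame plus an action prompt; `toy_generator` is a hypothetical stand-in for the video model.

```python
def generate_video(initial_frame, actions, generator, frames_per_clip=4):
    """Chain clips so each is conditioned on the previous clip's last frame."""
    video = [initial_frame]
    context = initial_frame
    for action in actions:
        clip = generator(context, action, frames_per_clip)
        video.extend(clip)
        context = clip[-1]  # temporal coherence: carry state into the next clip
    return video

# Toy generator: frames are (frame_index, action) labels.
def toy_generator(context, action, n):
    base = context[0]
    return [(base + i + 1, action) for i in range(n)]

video = generate_video((0, "start"), ["wave", "jump"], toy_generator)
```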

🔹 Publication Date: Published on Dec 18, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17796
• PDF: https://arxiv.org/pdf/2512.17796
• Project Page: https://snowflakewang.github.io/AniX/
• Github: https://github.com/snowflakewang/AniX

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #VideoGeneration #CharacterAnimation #NLP #3D
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

📝 Summary:
3D-RE-GEN reconstructs single images into modifiable 3D textured mesh scenes with comprehensive backgrounds. It uses a compositional generative framework and novel optimization for artist-ready, physically realistic layouts, achieving state-of-the-art performance.

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17459
• PDF: https://arxiv.org/pdf/2512.17459
• Project Page: https://3dregen.jdihlmann.com/
• Github: https://github.com/cgtuebingen/3D-RE-GEN

==================================

For more data science resources:
https://t.me/DataScienceT

#3DReconstruction #GenerativeAI #ComputerVision #DeepLearning #ComputerGraphics
MineTheGap: Automatic Mining of Biases in Text-to-Image Models

📝 Summary:
MineTheGap automatically finds prompts that cause Text-to-Image models to generate biased outputs. It uses a genetic algorithm and a novel bias score to identify and rank biases, aiming to reduce redundancy and improve output diversity.
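
🧪 Code sketch: a toy version of the genetic search idea. `bias_score` and `mutate_fn` are hypothetical stand-ins for the paper's bias metric over generated images and its prompt mutation operator; here the "score" is just prompt length so the loop is deterministic.

```python
def mine_biases(seed_prompts, bias_score, mutate, generations=5, keep=2):
    """Evolve prompts toward higher bias scores via elitism + mutation."""
    population = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(population, key=bias_score, reverse=True)[:keep]
        population = ranked + [mutate(p) for p in ranked]
    return sorted(population, key=bias_score, reverse=True)

# Deterministic toy: "bias" grows with prompt length; mutation appends a word.
bias_score = len
mutate_fn = lambda p: p + " person"
top = mine_biases(["a doctor", "a nurse at work"], bias_score, mutate_fn)
```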

🔹 Publication Date: Published on Dec 15, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13427
• PDF: https://arxiv.org/pdf/2512.13427

==================================

For more data science resources:
https://t.me/DataScienceT

#AIbias #TextToImage #GenerativeAI #ResponsibleAI #MachineLearning
Over++: Generative Video Compositing for Layer Interaction Effects

📝 Summary:
Over++ introduces augmented compositing, a framework that generates realistic, text-prompted environmental effects for videos. It synthesizes effects like shadows onto video layers while preserving the original scene, outperforming prior methods without dense annotations.

🔹 Publication Date: Published on Dec 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19661
• PDF: https://arxiv.org/pdf/2512.19661
• Project Page: https://overplusplus.github.io/

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #VideoCompositing #VFX #ComputerGraphics #AIResearch
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

📝 Summary:
T2AV-Compass introduces a unified benchmark for text-to-audio-video generation evaluation. It features 500 diverse prompts and a dual-level framework. Evaluations reveal current T2AV models struggle significantly with realism and cross-modal consistency.

🔹 Publication Date: Published on Dec 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21094
• PDF: https://arxiv.org/pdf/2512.21094
• Project Page: https://nju-link.github.io/T2AV-Compass/
• Github: https://github.com/NJU-LINK/T2AV-Compass/

==================================

For more data science resources:
https://t.me/DataScienceT

#TextToAudioVideo #MultimodalAI #AIEvaluation #GenerativeAI #AIResearch
Spatia: Video Generation with Updatable Spatial Memory

📝 Summary:
Spatia is a video generation framework that improves long-term consistency by using an updatable 3D scene point cloud as persistent spatial memory. It iteratively generates video clips and updates this memory via visual SLAM, enabling realistic videos and 3D-aware interactive editing.
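
🧪 Code sketch: the generate-then-update loop in miniature. `generate_clip` and `update_memory` are hypothetical placeholders for the video model and the visual-SLAM memory update; the "memory" here is just a set of observed point IDs.

```python
def roll_out(n_clips, generate_clip, update_memory):
    """Alternate clip generation with updates to a persistent spatial memory."""
    memory = set()
    clips = []
    for t in range(n_clips):
        clip = generate_clip(t, memory)       # condition on persistent 3D memory
        memory = update_memory(memory, clip)  # fold new geometry back in
        clips.append(clip)
    return clips, memory

# Toy stand-ins: clip t "observes" points t and t+1.
gen = lambda t, mem: {t, t + 1}
upd = lambda mem, clip: mem | clip

clips, memory = roll_out(3, gen, upd)
```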

🔹 Publication Date: Published on Dec 17, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15716
• PDF: https://arxiv.org/pdf/2512.15716
• Project Page: https://zhaojingjing713.github.io/Spatia/
• Github: https://github.com/ZhaoJingjing713/Spatia

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #ComputerVision #3DReconstruction #SLAM
SkyReels-V2: Infinite-length Film Generative Model

📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...

🔹 Publication Date: Published on Apr 17, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2

🔹 Models citing this paper:
https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
https://huggingface.co/Skywork/SkyCaptioner-V1
https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P

🔹 Spaces citing this paper:
https://huggingface.co/spaces/fffiloni/SkyReels-V2
https://huggingface.co/spaces/Dudu0043/SkyReels-V2
https://huggingface.co/spaces/14eee109giet/SkyReels-V2

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D-aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
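
🧪 Code sketch: hierarchical summarization in miniature. Chunks are summarized in groups, and summaries are merged level by level into a single global node, a rough analogue of the "mindscape". `toy_summarize` is a stand-in for an LLM summarizer.

```python
def build_mindscape(chunks, summarize, fanout=2):
    """Merge chunk summaries bottom-up until one global summary remains."""
    level = list(chunks)
    tree = [level]
    while len(level) > 1:
        level = [summarize(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        tree.append(level)
    return tree  # tree[-1][0] is the global summary

toy_summarize = lambda parts: "+".join(parts)
tree = build_mindscape(["c1", "c2", "c3", "c4"], toy_summarize)
```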

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220

🔹 Models citing this paper:
https://huggingface.co/MindscapeRAG/MiA-Emb-8B
https://huggingface.co/MindscapeRAG/MiA-Emb-4B
https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B

==================================

For more data science resources:
https://t.me/DataScienceT

#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
Yume-1.5: A Text-Controlled Interactive World Generation Model

📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME

🔹 Models citing this paper:
https://huggingface.co/stdstu123/Yume-5B-720P

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.

🔹 Publication Date: Published on Dec 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/

🔹 Models citing this paper:
https://huggingface.co/infinith/UltraShape

==================================

For more data science resources:
https://t.me/DataScienceT

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time-embedding, temporal-warping training, and a synthetic dataset.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

📝 Summary:
This paper introduces Internal Guidance (IG) for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
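
🧪 Code sketch: one way the sampling-time extrapolation could look, combining the final output with an intermediate-layer output in a CFG-style update. The combination rule and the weight `w` are assumptions for illustration, not the paper's exact formula.

```python
def internally_guided(final_out, intermediate_out, w=0.5):
    """Push the prediction away from the weaker intermediate estimate."""
    return [f + w * (f - i) for f, i in zip(final_out, intermediate_out)]

# Toy 2-dim example with an exaggerated guidance weight.
guided = internally_guided([1.0, 2.0], [0.5, 2.5], w=2.0)
```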

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
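
🧪 Code sketch: stage-aware model routing in miniature. A large model handles the earliest and latest timesteps and a small model the middle; the head/tail fractions here are illustrative assumptions, not the paper's schedule.

```python
def route_models(num_steps, large, small, head=0.2, tail=0.2):
    """Assign a model to each timestep based on its position in the trajectory."""
    schedule = []
    for t in range(num_steps):
        frac = t / max(num_steps - 1, 1)
        model = large if (frac < head or frac > 1 - tail) else small
        schedule.append(model)
    return schedule

schedule = route_models(10, "large", "small")
```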

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing

==================================

For more data science resources:
https://t.me/DataScienceT

#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

📝 Summary:
MLLMs struggle with hallucinations on counterfactual videos. DualityForge synthesizes counterfactual video data and QA pairs through diffusion-based editing to address this. This method significantly reduces model hallucinations and improves general performance.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24271
• PDF: https://arxiv.org/pdf/2512.24271
• Project Page: https://amap-ml.github.io/Taming-Hallucinations/
• Github: https://github.com/AMAP-ML/Taming-Hallucinations

==================================

For more data science resources:
https://t.me/DataScienceT

#MLLMs #VideoUnderstanding #AIHallucinations #GenerativeAI #MachineLearning
InfoSynth: Information-Guided Benchmark Synthesis for LLMs

📝 Summary:
InfoSynth automatically generates novel and diverse coding benchmarks for LLMs. It uses information-theoretic metrics and genetic algorithms to create scalable self-verifying problems, overcoming manual effort and training data contamination.
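
🧪 Code sketch: one information-theoretic selection step, preferring the candidate benchmark set with the highest Shannon entropy over problem "topics". The entropy-over-topics criterion is an illustrative assumption, not InfoSynth's exact metric.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def pick_diverse(candidates):
    return max(candidates, key=entropy)

best = pick_diverse([["sort", "sort", "sort"], ["sort", "graph", "dp"]])
```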

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00575
• PDF: https://arxiv.org/pdf/2601.00575
• Project Page: https://ishirgarg.github.io/infosynth_web/
• Github: https://github.com/ishirgarg/infosynth

==================================

For more data science resources:
https://t.me/DataScienceT

#LLM #AI #Benchmarking #GenerativeAI #DeepLearning
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146

==================================

For more data science resources:
https://t.me/DataScienceT

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

📝 Summary:
DreamID-V is a novel video face swapping framework that uses diffusion transformers and curriculum learning. It achieves superior identity preservation and visual realism by bridging the image-to-video gap, outperforming existing methods and enhancing temporal consistency.

🔹 Publication Date: Published on Jan 4, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01425
• PDF: https://arxiv.org/pdf/2601.01425
• Project Page: https://guoxu1233.github.io/DreamID-V/

==================================

For more data science resources:
https://t.me/DataScienceT

#FaceSwapping #DiffusionModels #ComputerVision #GenerativeAI #VideoAI
Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery

📝 Summary:
Materiomusic links matter's hierarchical structures to music's compositional logic through vibrational principles. Sound serves as a scientific probe, revealing how selective imperfection drives novelty in both. AI models can leverage this framework for creative invention beyond interpolation.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00863
• PDF: https://arxiv.org/pdf/2601.00863
• Github: https://github.com/lamm-mit/MusicAnalysis

🔹 Datasets citing this paper:
https://huggingface.co/datasets/lamm-mit/scales-12tet-defects

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #ComputationalMusic #ComplexSystems #Creativity #Interdisciplinary