ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

📝 Summary:
Stable Video Infinity (SVI) generates infinite-length videos with high consistency and controllable storylines. It introduces Error-Recycling Fine-Tuning, which teaches the Diffusion Transformer to correct its own self-generated errors and thereby closes the training-test discrepancy.
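
💻 Sketch: a toy Python rendition of the error-recycling idea, assuming a generic denoiser; during fine-tuning the model is sometimes conditioned on its own imperfect output instead of the ground-truth history, while the regression target stays clean. ToyDenoiser and all shapes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the Diffusion Transformer; maps (noisy, t, context) -> clean."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim * 2 + 1, dim)

    def forward(self, noisy, t, context):
        return self.net(torch.cat([noisy, context, t[:, None]], dim=-1))

def error_recycling_step(model, opt, gt_prev, gt_next, recycle_prob=0.5):
    with torch.no_grad():
        # Self-generated (imperfect) history: one denoising pass from pure noise.
        self_prev = model(torch.randn_like(gt_prev), torch.ones(gt_prev.shape[0]), gt_prev)
    # Recycle the model's own errors as conditioning with some probability.
    context = self_prev if torch.rand(()) < recycle_prob else gt_prev
    t = torch.rand(gt_next.shape[0])                       # per-sample noise level
    noisy = (1 - t[:, None]) * gt_next + t[:, None] * torch.randn_like(gt_next)
    loss = nn.functional.mse_loss(model(noisy, t, context), gt_next)  # target stays clean
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = error_recycling_step(model, opt, torch.randn(4, 64), torch.randn(4, 64))
```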

🔹 Publication Date: Published on Oct 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.09212
• PDF: https://arxiv.org/pdf/2510.09212
• Project Page: https://stable-video-infinity.github.io/homepage/
• Github: https://github.com/vita-epfl/Stable-Video-Infinity

🔹 Models citing this paper:
https://huggingface.co/vita-video-gen/svi-model

Datasets citing this paper:
https://huggingface.co/datasets/vita-video-gen/svi-benchmark

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #AI #DiffusionModels #DeepLearning #ComputerVision
BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

📝 Summary:
This paper presents a video diffusion framework that decouples scene dynamics from camera pose, enabling precise 4D control over time and viewpoint for high-quality video generation and outperforming prior models in controllability.
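
💻 Sketch: one plausible reading of the decoupling, with separate embedding branches for scene time and camera extrinsics that can be varied independently. Module names and dimensions are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DecoupledCondition(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.pose_mlp = nn.Sequential(nn.Linear(12, dim), nn.SiLU(), nn.Linear(dim, dim))  # 3x4 extrinsics

    def forward(self, scene_time, cam_pose):
        # scene_time: (B, 1) normalized timestamp; cam_pose: (B, 12) flattened [R|t].
        # The two embeddings are injected into the backbone independently.
        return self.time_mlp(scene_time), self.pose_mlp(cam_pose)

cond = DecoupledCondition()
t_emb, p_emb = cond(torch.rand(2, 1), torch.rand(2, 12))
# "Bullet time": advance cam_pose along a trajectory while scene_time stays frozen.
```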

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05076
• PDF: https://arxiv.org/pdf/2512.05076

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #DiffusionModels #GenerativeAI #ComputerVision #AICameraControl
EgoLCD: Egocentric Video Generation with Long Context Diffusion

📝 Summary:
EgoLCD addresses content drift in long egocentric video generation by combining sparse long-term memory, attention-based short-term memory, and narrative prompting. It achieves state-of-the-art perceptual quality and temporal consistency, mitigating generative forgetting.
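
💻 Sketch: a minimal stand-in for the memory design, keeping a sparse subset of past frame tokens as long-term anchors plus the most recent frames as dense short-term context. Strides and window sizes are invented for illustration.

```python
import torch

def build_context(frame_tokens, long_stride=8, short_window=4):
    """frame_tokens: (T, N, D). Keep every `long_stride`-th frame as sparse
    long-term memory plus the last `short_window` frames densely."""
    T = frame_tokens.shape[0]
    long_term = frame_tokens[::long_stride]                # sparse anchors against drift
    short_term = frame_tokens[max(0, T - short_window):]   # dense recent context
    return torch.cat([long_term.flatten(0, 1), short_term.flatten(0, 1)], dim=0)

ctx = build_context(torch.randn(64, 16, 128))  # context keys/values for the next clip
```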

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04515
• PDF: https://arxiv.org/pdf/2512.04515
• Project Page: https://aigeeksgroup.github.io/EgoLCD/
• Github: https://github.com/AIGeeksGroup/EgoLCD

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #VideoGeneration #DiffusionModels #ComputerVision #EgocentricVision
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

📝 Summary:
A new metric evaluates human action in generated videos by using a learned latent space of real-world actions, fusing skeletal geometry and appearance features. It significantly improves temporal and visual correctness assessment, outperforming existing methods and correlating better with human p...
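
💻 Sketch: a toy version of the evaluation idea, fusing skeletal and appearance features into one latent and scoring a generated clip by its similarity to a bank of real-action embeddings. The encoders are placeholders; the paper uses a learned latent space.

```python
import torch

def action_score(skel_feat, app_feat, real_bank):
    """skel_feat, app_feat: (D,) features of one clip; real_bank: (N, 2D)."""
    z = torch.cat([skel_feat, app_feat])                   # fused latent
    z = z / z.norm()
    bank = real_bank / real_bank.norm(dim=1, keepdim=True)
    # Higher max cosine similarity to real actions -> more plausible motion.
    return (bank @ z).max().item()

score = action_score(torch.randn(64), torch.randn(64), torch.randn(100, 128))
```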

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01803
• PDF: https://arxiv.org/pdf/2512.01803
• Project Page: https://xthomasbu.github.io/video-gen-evals/

Datasets citing this paper:
https://huggingface.co/datasets/dghadiya/TAG-Bench-Video

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #HumanMotion #ComputerVision #AIMetrics #DeepLearning
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

📝 Summary:
Deep Forcing is a training-free method that enhances real-time video diffusion for high-quality, long-duration generation. It uses Deep Sink for stable context and Participative Compression for efficient KV cache pruning, achieving over 12x extrapolation and improved consistency.
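
💻 Sketch: sink-preserving KV-cache pruning in miniature; the first few "sink" entries are always kept for stable context, and the rest of the budget goes to recent entries. The paper's participative (score-based) selection is simplified to recency here.

```python
import torch

def prune_kv(keys, values, n_sink=4, budget=256):
    # keys/values: (T, D) cached entries in generation order.
    if keys.shape[0] <= budget:
        return keys, values
    sink_k, sink_v = keys[:n_sink], values[:n_sink]        # stable anchor context
    recent = budget - n_sink                               # remaining slots for new entries
    return (torch.cat([sink_k, keys[-recent:]]),
            torch.cat([sink_v, values[-recent:]]))

k, v = prune_kv(torch.randn(500, 64), torch.randn(500, 64))  # both now (256, 64)
```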

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05081
• PDF: https://arxiv.org/pdf/2512.05081
• Project Page: https://cvlab-kaist.github.io/DeepForcing/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #DiffusionModels #TrainingFreeAI #DeepLearning #ComputerVision
Light-X: Generative 4D Video Rendering with Camera and Illumination Control

📝 Summary:
Light-X is a video generation framework for controllable rendering from monocular videos with joint viewpoint and illumination control. It disentangles geometry and lighting using synthetic data for robust training, outperforming prior methods in both aspects.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05115
• PDF: https://arxiv.org/pdf/2512.05115
• Project Page: https://lightx-ai.github.io/
• Github: https://github.com/TQTQliu/Light-X

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #ComputerVision #AI #NeuralRendering #GenerativeAI
ProPhy: Progressive Physical Alignment for Dynamic World Simulation

📝 Summary:
ProPhy is a two-stage framework that enhances video generation by explicitly incorporating physics-aware conditioning and anisotropic generation. It uses a Mixture-of-Physics-Experts mechanism to extract fine-grained physical priors, improving physical consistency and realism in dynamic world simulation.
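
💻 Sketch: a minimal mixture-of-experts gate over physics-expert embeddings, soft-routing scene features to experts and mixing their outputs into a physics-aware conditioning signal. Expert count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PhysicsMoE(nn.Module):
    def __init__(self, dim=256, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, h):
        # h: (B, dim) scene features; soft-route to experts, mix their priors.
        w = torch.softmax(self.gate(h), dim=-1)              # (B, n_experts)
        prior = torch.stack([e(h) for e in self.experts], 1) # (B, n_experts, dim)
        return (w.unsqueeze(-1) * prior).sum(1)              # physics-aware conditioning

cond = PhysicsMoE()(torch.randn(2, 256))
```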

🔹 Publication Date: Published on Dec 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05564
• PDF: https://arxiv.org/pdf/2512.05564

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #PhysicsAI #DynamicSimulation #DeepLearning #ComputerVision
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.
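
💻 Sketch: a hypothetical modality switcher; a learned tag token per modality is prepended to the token sequence so a single backbone can train across modalities and tasks. The names and the set of modalities are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ModalitySwitcher(nn.Module):
    def __init__(self, dim=256, modalities=("rgb", "depth", "pose")):
        super().__init__()
        self.tags = nn.ParameterDict(
            {m: nn.Parameter(torch.randn(1, 1, dim)) for m in modalities})

    def forward(self, tokens, modality):
        # tokens: (B, N, dim); prepend the learned tag so the backbone knows
        # which modality/task this sequence carries.
        tag = self.tags[modality].expand(tokens.shape[0], -1, -1)
        return torch.cat([tag, tokens], dim=1)

x = ModalitySwitcher()(torch.randn(2, 16, 256), "depth")  # (2, 17, 256)
```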

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

📝 Summary:
OneStory generates coherent multi-shot videos by modeling global cross-shot context. It uses a Frame Selection module and an Adaptive Conditioner for next-shot generation, leveraging pretrained models and a new dataset. This achieves state-of-the-art narrative coherence for long-form video storytelling.
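
💻 Sketch: a toy take on frame selection, picking the past frames most similar to the next shot's prompt embedding to serve as cross-shot memory. The similarity scoring here is invented; the paper's module is learned.

```python
import torch

def select_memory_frames(frame_embs, prompt_emb, k=4):
    # frame_embs: (T, D) embeddings of all previously generated frames;
    # prompt_emb: (D,) embedding of the next shot's description.
    sims = frame_embs @ prompt_emb / (
        frame_embs.norm(dim=1) * prompt_emb.norm() + 1e-8)
    idx = sims.topk(min(k, frame_embs.shape[0])).indices
    return frame_embs[idx]  # condition the next shot on these frames

mem = select_memory_frames(torch.randn(120, 256), torch.randn(256))
```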

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07802
• PDF: https://arxiv.org/pdf/2512.07802
• Project Page: https://zhaochongan.github.io/projects/OneStory/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #AI #DeepLearning #ComputerVision #GenerativeAI
VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

📝 Summary:
VideoSSM proposes a hybrid state-space memory model for long video generation. It unifies autoregressive diffusion with global state-space memory and local context to achieve state-of-the-art temporal consistency and motion stability. This enables scalable, interactive minute-scale video synthesis.
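
💻 Sketch: a minimal linear state-space recurrence standing in for the global memory; a hidden state is updated as each clip is generated and carried into the next clip's conditioning, while local attention (not shown) covers the recent window. The decay constants are illustrative.

```python
import torch

def ssm_update(state, chunk_feat, A=0.95, B=0.05):
    # state: (D,) global memory; chunk_feat: (D,) summary of the newest clip.
    # Exponential-decay state update: old content fades, new content enters.
    return A * state + B * chunk_feat

state = torch.zeros(512)
for chunk in torch.randn(10, 512):   # ten generated clips
    state = ssm_update(state, chunk) # carried into conditioning for the next clip
```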

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04519
• PDF: https://arxiv.org/pdf/2512.04519

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #DiffusionModels #StateSpaceModels #DeepLearning
GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

📝 Summary:
GimbalDiffusion offers precise text-to-video camera control by using absolute, gravity-aligned coordinates. The framework defines interpretable camera trajectories, improving robustness and enabling more diverse motion than relative-pose methods.
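
💻 Sketch: building a camera rotation from absolute, gravity-aligned angles (heading about the gravity axis, pitch, roll), so a level horizon is an explicit coordinate rather than an accumulation of relative deltas. The axis conventions are assumptions.

```python
import numpy as np

def gravity_aligned_rotation(heading, pitch, roll):
    """Angles in radians; world z is the gravity axis (pointing up)."""
    cz, sz = np.cos(heading), np.sin(heading)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])   # yaw about gravity
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # pitch
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])   # roll
    return Rz @ Ry @ Rx

R = gravity_aligned_rotation(np.pi / 4, 0.1, 0.0)  # 45° pan, slight tilt, level roll
```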

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09112
• PDF: https://arxiv.org/pdf/2512.09112
• Project Page: https://lvsn.github.io/GimbalDiffusion/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #AI #DiffusionModels #ComputerVision #DeepLearning
Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

📝 Summary:
SAM2VideoX improves realistic video motion by distilling structure-preserving priors from a tracking model into a bidirectional diffusion model. It uses novel feature fusion and local alignment, achieving significant performance gains over prior methods.

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11792
• PDF: https://arxiv.org/pdf/2512.11792
• Project Page: https://sam2videox.github.io/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #DiffusionModels #ComputerVision #DeepLearning #MotionTracking
EgoX: Egocentric Video Generation from a Single Exocentric Video

📝 Summary:
EgoX generates egocentric videos from single exocentric inputs. It uses video diffusion models with LoRA adaptation, unified conditioning, and geometry-guided self-attention for coherent and realistic results.
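
💻 Sketch: a standard LoRA adapter of the kind applied to pretrained backbones, a frozen base weight plus a low-rank trainable update initialized to zero. The rank and scaling are generic defaults, not the paper's settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # keep pretrained weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)               # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(2, 512))
```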

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08269
• PDF: https://arxiv.org/pdf/2512.08269
• Project Page: https://keh0t0.github.io/EgoX/
• Github: https://github.com/KEH0T0/EgoX

==================================

For more data science resources:
https://t.me/DataScienceT

#EgocentricVideo #VideoGeneration #DiffusionModels #ComputerVision #DeepLearning
Animate Any Character in Any World

📝 Summary:
AniX extends controllable-entity models to enable diverse, user-defined character interactions in static 3D environments via natural language. It synthesizes temporally coherent videos through conditional autoregressive video generation, allowing characters to perform open-ended actions.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17796
• PDF: https://arxiv.org/pdf/2512.17796
• Project Page: https://snowflakewang.github.io/AniX/
• Github: https://github.com/snowflakewang/AniX

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #VideoGeneration #CharacterAnimation #NLP #3D
Spatia: Video Generation with Updatable Spatial Memory

📝 Summary:
Spatia is a video generation framework that improves long-term consistency by using an updatable 3D scene point cloud as persistent spatial memory. It iteratively generates video clips and updates this memory via visual SLAM, enabling realistic videos and 3D-aware interactive editing.
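
💻 Sketch: a toy persistent point-cloud memory; after each generated clip, new points are merged and near-duplicates dropped by voxel hashing. The SLAM front end that produces the points is outside this sketch.

```python
import numpy as np

def update_memory(memory_pts, new_pts, voxel=0.05):
    pts = np.vstack([memory_pts, new_pts]) if memory_pts.size else new_pts
    keys = np.floor(pts / voxel).astype(np.int64)         # voxel coordinates
    _, keep = np.unique(keys, axis=0, return_index=True)  # one point per voxel
    return pts[np.sort(keep)]

memory = np.empty((0, 3))
memory = update_memory(memory, np.random.rand(1000, 3))  # points from clip 1
memory = update_memory(memory, np.random.rand(1000, 3))  # merged after clip 2
```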

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15716
• PDF: https://arxiv.org/pdf/2512.15716
• Project Page: https://zhaojingjing713.github.io/Spatia/
• Github: https://github.com/ZhaoJingjing713/Spatia

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #ComputerVision #3DReconstruction #SLAM
SkyReels-V2: Infinite-length Film Generative Model

📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...
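
💻 Sketch: the diffusion-forcing trick in miniature, giving each frame in a training window its own independent noise level so the model sees every mix of clean history and noisy future, which is what enables open-ended rollout. The linear noising path is a simplifying assumption.

```python
import torch

def diffusion_forcing_noise(frames):
    # frames: (B, T, C, H, W) clean video window.
    B, T = frames.shape[:2]
    t = torch.rand(B, T, device=frames.device)           # independent per-frame noise level
    t_ = t.view(B, T, 1, 1, 1)
    noise = torch.randn_like(frames)
    noisy = (1 - t_) * frames + t_ * noise               # linear interpolation path
    return noisy, t                                      # model is trained to denoise

noisy, t = diffusion_forcing_noise(torch.randn(2, 8, 3, 32, 32))
```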

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2

🔹 Models citing this paper:
https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
https://huggingface.co/Skywork/SkyCaptioner-V1
https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P

Spaces citing this paper:
https://huggingface.co/spaces/fffiloni/SkyReels-V2
https://huggingface.co/spaces/Dudu0043/SkyReels-V2
https://huggingface.co/spaces/14eee109giet/SkyReels-V2

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.
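
💻 Sketch: the on-policy distillation idea in miniature; the student generates from its own trajectory and a frozen teacher supervises on those same inputs, so training matches what the student actually sees at inference. Both networks are toy stand-ins, not the paper's models.

```python
import torch
import torch.nn as nn

student = nn.Linear(64, 64)
teacher = nn.Linear(64, 64).requires_grad_(False)   # frozen, pretrained stand-in
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 64)                              # noise / conditioning
with torch.no_grad():
    rollout = student(x)                            # student's own on-policy sample
target = teacher(rollout)                           # teacher corrects that sample
loss = nn.functional.mse_loss(student(rollout), target)
opt.zero_grad(); loss.backward(); opt.step()
```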

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.
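
💻 Sketch: a toy reading of 3D object flow as a control interface; waypoints reconstructed from a generated video are followed by a simple proportional controller. The flow here is random data standing in for the actual reconstruction.

```python
import numpy as np

flow = np.cumsum(np.random.randn(30, 3) * 0.01, axis=0)  # (T, 3) dreamed waypoints

def tracking_action(current_pos, waypoint, gain=0.5):
    return gain * (waypoint - current_pos)                # simple P-controller step

pos = np.zeros(3)
for wp in flow:
    pos = pos + tracking_action(pos, wp)                  # follow the dreamed flow
```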

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
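
💻 Sketch: stage-aware sampling with two models; a large model handles the first and last denoising steps and a small one the middle. The switch points and the toy "models" are illustrative, not the paper's tuned setup.

```python
import torch

def stage_aware_sample(big_step, small_step, x, n_steps=50, early=10, late=10):
    for i in range(n_steps):
        # Large model for the critical early/late stages, small model otherwise.
        step = big_step if (i < early or i >= n_steps - late) else small_step
        x = step(x, i, n_steps)   # one denoising update with the chosen model
    return x

# Usage with toy "models" (each returns a slightly denoised latent):
big = lambda x, i, n: x * 0.98
small = lambda x, i, n: x * 0.99
out = stage_aware_sample(big, small, torch.randn(1, 4, 8, 32, 32))
```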

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization