ML Research Hub
32.8K subscribers
4.09K photos
237 videos
23 files
4.41K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
πŸ€–πŸ§  Grok AI Chatbot (2025): Elon Musk’s Bold Answer to Real-Time, Intelligent Conversation

πŸ—“οΈ 12 Oct 2025
πŸ“š AI News & Trends

The year 2025 marks a new era in the evolution of conversational AI, and at the center of this transformation stands Grok AI, the innovative chatbot developed by Elon Musk's company xAI. Grok isn't just another virtual assistant; it's a real-time intelligent system that combines deep reasoning with a unique, witty personality. What truly sets ...

#GrokAI #xAI #ConversationalAI #ElonMusk #RealTimeAI #IntelligentChatbot
✨Real-Time Reasoning Agents in Evolving Environments

πŸ“ Summary:
AI agents struggle with real-time reasoning in dynamic environments, failing to balance logical judgments with timely responses. This paper introduces Real-Time Reasoning Gym and AgileThinker. AgileThinker combines reactive and planning approaches to effectively balance reasoning depth and responsiveness.
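
The gist is sketchable in a few lines of Python: a fast reactive policy always answers within the tick budget, while a slower planner runs in a background thread and replaces the answer only if it finishes in time. This is a minimal illustration of the reactive-plus-planning idea, not the paper's code; all function names are placeholders.

```python
# Minimal sketch (not the paper's implementation): a reactive policy answers every
# tick, while a slower planner runs in a background thread and upgrades the answer
# whenever it beats the deadline. `reactive_policy` and `deliberate_plan` are
# hypothetical stand-ins for the two reasoning modes.
import threading
import time
import queue

def reactive_policy(obs):
    # Fast heuristic decision; always returns within the tick budget.
    return {"action": "safe_default", "based_on": obs}

def deliberate_plan(obs, result_q):
    # Slow, deeper reasoning; may take several ticks to finish.
    time.sleep(0.3)  # stand-in for long chain-of-thought / search
    result_q.put({"action": "planned_move", "based_on": obs})

def agile_step(obs, tick_budget_s=0.1):
    result_q = queue.Queue()
    planner = threading.Thread(target=deliberate_plan, args=(obs, result_q), daemon=True)
    planner.start()
    deadline = time.monotonic() + tick_budget_s
    action = reactive_policy(obs)                         # guaranteed timely answer
    remaining = deadline - time.monotonic()
    try:
        action = result_q.get(timeout=max(remaining, 0))  # upgrade if the planner finished in time
    except queue.Empty:
        pass                                              # otherwise keep the reactive action
    return action

if __name__ == "__main__":
    print(agile_step({"t": 0}))
```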

πŸ”Ή Publication Date: Published on Nov 7

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.04898
β€’ PDF: https://arxiv.org/pdf/2511.04898
β€’ Project Page: https://realtimegym.saltlab.stanford.edu
β€’ Github: https://github.com/SALT-NLP/RealtimeGym

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#AI #RealTimeAI #AutonomousAgents #DynamicEnvironments #MachineLearning
✨FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

πŸ“ Summary:
FlashVSR introduces the first real-time, one-step streaming diffusion framework for video super-resolution. It addresses high latency and heavy computation through distillation, sparse attention, and a tiny decoder. FlashVSR achieves state-of-the-art performance with up to a 12x speedup.
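
As a rough mental model (not the released code), a one-step streaming setup looks like this: each incoming chunk of low-res frames is upscaled by a single denoiser call and handed on as causal context for the next chunk. `OneStepDenoiser` is a hypothetical placeholder for the distilled model.

```python
# Conceptual sketch only: a one-step streaming loop that upscales each incoming
# chunk of low-resolution frames with a single denoiser call. The module below is
# a toy stand-in, not FlashVSR's architecture.
import torch
import torch.nn as nn

class OneStepDenoiser(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.net = nn.Conv3d(3, 3, kernel_size=3, padding=1)  # stand-in for the distilled network

    def forward(self, lr_chunk, prev_hr_chunk=None):
        # lr_chunk: (B, 3, T, H, W) low-resolution frames.
        # prev_hr_chunk would condition the denoiser for temporal consistency; elided in this toy.
        hr = nn.functional.interpolate(
            lr_chunk, scale_factor=(1, self.scale, self.scale),
            mode="trilinear", align_corners=False,
        )
        return hr + self.net(hr)  # one refinement step on the upsampled chunk

@torch.no_grad()
def stream_vsr(frame_chunks, model):
    prev = None
    for lr_chunk in frame_chunks:   # chunks arrive as the stream is produced
        hr_chunk = model(lr_chunk, prev)
        prev = hr_chunk             # reuse as causal context for the next chunk
        yield hr_chunk

if __name__ == "__main__":
    model = OneStepDenoiser()
    chunks = (torch.rand(1, 3, 4, 64, 64) for _ in range(3))
    for out in stream_vsr(chunks, model):
        print(out.shape)            # torch.Size([1, 3, 4, 256, 256])
```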

πŸ”Ή Publication Date: Published on Oct 14

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2510.12747
β€’ PDF: https://arxiv.org/pdf/2510.12747
β€’ Project Page: https://zhuang2002.github.io/FlashVSR/
β€’ Github: https://github.com/OpenImagingLab/FlashVSR

πŸ”Ή Models citing this paper:
β€’ https://huggingface.co/JunhaoZhuang/FlashVSR
β€’ https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#FlashVSR #VideoSuperResolution #RealTimeAI #DiffusionModels #ComputerVision
✨Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

πŸ“ Summary:
Inferix is a next-gen inference engine for immersive world simulation, generating high-quality interactive videos. It uses semi-autoregressive block-diffusion with LLM-style KV Cache for efficient, stable generation, enabling real-time world dynamics.
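
A toy sketch of semi-autoregressive block-diffusion with an LLM-style cache (this is not Inferix's implementation; all module names are placeholders): each new block of frames is denoised over a few steps while attending to already-generated context, which is cached rather than re-encoded.

```python
# Illustrative sketch: block-wise generation where earlier blocks are kept in a
# cache and reused as attention context for later blocks. In a real engine the
# cache would hold attention K/V tensors, not raw latents.
import torch
import torch.nn as nn

class TinyBlockDenoiser(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Linear(dim, dim)

    def forward(self, block, kv_cache):
        # block: (B, T_block, dim) noisy latents; kv_cache: (B, T_ctx, dim) clean context.
        ctx = torch.cat([kv_cache, block], dim=1) if kv_cache.numel() else block
        out, _ = self.attn(block, ctx, ctx)   # queries = new block; keys/values include the cache
        return block - self.ff(out)           # one denoising update (placeholder)

@torch.no_grad()
def generate(num_blocks=4, block_len=8, dim=64, steps=4):
    model = TinyBlockDenoiser(dim)
    cache = torch.empty(1, 0, dim)             # cache of finished blocks
    for _ in range(num_blocks):
        x = torch.randn(1, block_len, dim)      # start each block from noise
        for _ in range(steps):                  # a few diffusion steps per block
            x = model(x, cache)
        cache = torch.cat([cache, x], dim=1)    # append the clean block for future blocks
    return cache

print(generate().shape)  # torch.Size([1, 32, 64])
```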

πŸ”Ή Publication Date: Published on Nov 25

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.20714
β€’ PDF: https://arxiv.org/pdf/2511.20714
β€’ Github: https://github.com/alibaba-damo-academy/Inferix

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#WorldSimulation #DiffusionModels #GenerativeAI #AIResearch #RealtimeAI
✨VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

πŸ“ Summary:
VLASH is an asynchronous inference framework for VLAs (vision-language-action models). It achieves fast, accurate, low-latency robotic control by estimating future robot states, bridging the prediction-execution gap. This enables VLAs to perform high-precision tasks such as ping-pong with significant speedup and reduced latency.
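
The core trick can be illustrated with a toy control loop (hypothetical names, not the VLASH code): while the previous action chunk is still executing, the policy is queried on a predicted future state, i.e. where the robot is expected to be once inference finishes, rather than on the stale current state.

```python
# Rough sketch of future-state-aware inference. The state, dynamics, and policy
# here are toy scalars; in the paper the query and the execution run asynchronously.
import time

def predict_future_state(state, executing_chunk, inference_latency_s, tick_s=0.01):
    # Roll the current state forward through the actions that will run
    # while inference is in flight (trivial integration as a stand-in).
    steps = int(inference_latency_s / tick_s)
    for a in executing_chunk[:steps]:
        state = state + a
    return state

def vla_policy(future_state, horizon=5):
    time.sleep(0.05)                                   # stand-in for slow VLA inference
    return [0.01 * (i + 1) for i in range(horizon)]    # next action chunk

def control_loop(state=0.0, n_chunks=3, latency_s=0.05):
    chunk = [0.0] * 5                                  # initial idle chunk
    for _ in range(n_chunks):
        future = predict_future_state(state, chunk, latency_s)
        next_chunk = vla_policy(future)                # in the paper this runs asynchronously
        for a in chunk:                                # meanwhile, execute the current chunk
            state += a
        chunk = next_chunk
    return state

print(control_loop())
```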

πŸ”Ή Publication Date: Published on Nov 30

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.01031
β€’ PDF: https://arxiv.org/pdf/2512.01031
β€’ Github: https://github.com/mit-han-lab/vlash

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
✨RELIC: Interactive Video World Model with Long-Horizon Memory

πŸ“ Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.

πŸ”Ή Publication Date: Published on Dec 3

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.04040
β€’ PDF: https://arxiv.org/pdf/2512.04040
β€’ Project Page: https://relic-worldmodel.github.io/

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
✨Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

πŸ“ Summary:
Live Avatar uses a 14-billion-parameter diffusion model to achieve real-time, high-fidelity, infinite-length audio-driven avatar generation. It employs Timestep-forcing Pipeline Parallelism and Rolling Sink Frame Mechanism for efficiency and consistency, reaching 20 FPS on 5 H800 GPUs.

πŸ”Ή Publication Date: Published on Dec 4

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.04677
β€’ PDF: https://arxiv.org/pdf/2512.04677

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#LiveAvatar #GenerativeAI #RealtimeAI #DiffusionModels #AvatarGeneration
✨Real-Time Object Detection Meets DINOv3

πŸ“ Summary:
DEIMv2 extends DEIM with DINOv3 features, achieving superior real-time object detection across GPU, edge, and mobile. It uses a Spatial Tuning Adapter and pruned HGNetv2 for diverse models, setting new state of the art with impressive performance-cost trade-offs.
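
One way to picture the adapter idea, as a hypothetical sketch rather than DEIMv2's actual module: a lightweight residual block refines frozen foundation-model features for detection, so only the adapter and the detector head need training.

```python
# Hypothetical sketch of a "spatial tuning adapter" over frozen backbone features.
# Dimensions and layer choices are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class SpatialTuningAdapter(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Conv2d(dim, dim // 4, 1), nn.GELU(),
            nn.Conv2d(dim // 4, dim // 4, 3, padding=1), nn.GELU(),  # spatial mixing
            nn.Conv2d(dim // 4, dim, 1),
        )

    def forward(self, frozen_feats):                      # (B, dim, H/16, W/16) from a frozen backbone
        return frozen_feats + self.adapter(frozen_feats)  # residual refinement for the detector head

backbone_feats = torch.randn(1, 384, 40, 40)  # stand-in for frozen DINOv3 patch features
print(SpatialTuningAdapter()(backbone_feats).shape)
```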

πŸ”Ή Publication Date: Published on Sep 25

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2509.20787
β€’ PDF: https://arxiv.org/pdf/2509.20787
β€’ Project Page: https://intellindust-ai-lab.github.io/projects/DEIMv2/
β€’ Github: https://github.com/Intellindust-AI-Lab/DEIMv2

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#ObjectDetection #RealTimeAI #ComputerVision #MachineLearning #EdgeAI
✨PersonaLive! Expressive Portrait Image Animation for Live Streaming

πŸ“ Summary:
PersonaLive is a diffusion framework for real-time portrait animation, overcoming latency issues in live streaming. It uses multi-stage training, implicit signals for motion control, and appearance distillation for efficiency. This achieves state-of-the-art performance with up to 7-22x speedup ov...

πŸ”Ή Publication Date: Published on Dec 12

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.11253
β€’ PDF: https://arxiv.org/pdf/2512.11253
β€’ Github: https://github.com/GVCLab/PersonaLive

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#PortraitAnimation #LiveStreaming #DiffusionModels #RealtimeAI #ComputerVision
✨Sharp Monocular View Synthesis in Less Than a Second

πŸ“ Summary:
SHARP synthesizes photorealistic 3D views from a single image using a 3D Gaussian representation. It achieves state-of-the-art quality with rapid processing, taking less than a second, and supports metric camera movements.
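
A toy illustration of the feed-forward single-image-to-Gaussians idea (placeholder architecture, not Apple's model): a network maps one image to a fixed set of 3D Gaussian parameters that a splatting renderer could then rasterize from new viewpoints.

```python
# Toy sketch: predict position, scale, color, and opacity for a fixed number of
# 3D Gaussians directly from an image. The encoder and parameterization are
# illustrative assumptions, not SHARP's architecture.
import torch
import torch.nn as nn

class ImageToGaussians(nn.Module):
    def __init__(self, num_gaussians=2048):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 3 (xyz) + 3 (scale) + 3 (rgb) + 1 (opacity) = 10 values per Gaussian
        self.head = nn.Linear(64, num_gaussians * 10)
        self.num_gaussians = num_gaussians

    def forward(self, image):                   # image: (B, 3, H, W)
        feats = self.encoder(image)
        g = self.head(feats).view(-1, self.num_gaussians, 10)
        xyz, scale, rgb, opacity = g.split([3, 3, 3, 1], dim=-1)
        return {"xyz": xyz, "scale": scale.exp(), "rgb": rgb.sigmoid(), "opacity": opacity.sigmoid()}

out = ImageToGaussians()(torch.rand(1, 3, 256, 256))
print({k: v.shape for k, v in out.items()})
```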

πŸ”Ή Publication Date: Published on Dec 11

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.10685
β€’ PDF: https://arxiv.org/pdf/2512.10685
β€’ Project Page: https://apple.github.io/ml-sharp/
β€’ Github: https://github.com/apple/ml-sharp

πŸ”Ή Models citing this paper:
β€’ https://huggingface.co/apple/Sharp

πŸ”Ή Spaces citing this paper:
β€’ https://huggingface.co/spaces/ronedgecomb/ml-sharp

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#ViewSynthesis #3DVision #ComputerVision #RealtimeAI #GaussianSplats
✨TimeBill: Time-Budgeted Inference for Large Language Models

πŸ“ Summary:
TimeBill is a framework for LLMs in time-critical systems. It predicts execution time and adaptively adjusts KV cache eviction to balance inference efficiency and response performance within given time budgets, improving task completion rates.
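
A hand-rolled illustration of time-budgeted decoding under toy assumptions (not TimeBill's actual algorithm): estimate per-token latency as a function of KV-cache size, then evict enough cache entries that the expected generation fits the budget.

```python
# Toy latency model and eviction rule; both are assumptions for illustration only.
def estimate_token_latency(kv_len, base_ms=8.0, per_kv_ms=0.004):
    # Decoding slows roughly linearly with the number of cached tokens.
    return base_ms + per_kv_ms * kv_len

def plan_kv_budget(kv_len, expected_new_tokens, budget_ms, base_ms=8.0, per_kv_ms=0.004):
    per_token_budget = budget_ms / expected_new_tokens
    # Largest cache length whose per-token latency still fits the budget.
    max_kv = int((per_token_budget - base_ms) / per_kv_ms)
    return max(0, min(kv_len, max_kv))

def evict_kv(kv_cache, keep):
    # Simple recency heuristic: keep the most recent `keep` entries.
    # (The paper adapts eviction more carefully; this is just for illustration.)
    return kv_cache[len(kv_cache) - keep:]

cache = list(range(4000))                                   # pretend KV entries
keep = plan_kv_budget(len(cache), expected_new_tokens=64, budget_ms=1200)
cache = evict_kv(cache, keep)
print(len(cache), estimate_token_latency(len(cache)))       # ~2687 entries, ~18.7 ms/token
```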

πŸ”Ή Publication Date: Published on Dec 26

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.21859
β€’ PDF: https://arxiv.org/pdf/2512.21859

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#LLM #AI #RealTimeAI #InferenceOptimization #DeepLearning
✨LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

πŸ“ Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.

πŸ”Ή Publication Date: Published on Dec 29

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.23576
β€’ PDF: https://arxiv.org/pdf/2512.23576
β€’ Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
✨YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

πŸ“ Summary:
YOLO-Master proposes an Efficient Sparse Mixture-of-Experts (ES-MoE) block for real-time object detection. It adaptively allocates computational resources based on scene complexity using a dynamic routing network, overcoming static computation limits. This improves accuracy and speed, especially on...
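
For intuition, here is a generic sparse-MoE block with top-k routing (illustrative only; not the paper's ES-MoE design): a router picks a couple of experts per feature token, so compute follows scene complexity instead of being fixed.

```python
# Generic sparse mixture-of-experts block with top-k routing. Dimensions,
# expert design, and routing are illustrative assumptions.
import torch
import torch.nn as nn

class SparseMoEBlock(nn.Module):
    def __init__(self, dim=256, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (N, dim) flattened feature tokens
        logits = self.router(x)                 # (N, num_experts)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # only the selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return x + out                          # residual connection

tokens = torch.randn(100, 256)                  # e.g. flattened detector neck features
print(SparseMoEBlock()(tokens).shape)           # torch.Size([100, 256])
```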

πŸ”Ή Publication Date: Published on Dec 29

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.23273
β€’ PDF: https://arxiv.org/pdf/2512.23273

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#ObjectDetection #YOLO #MixtureOfExperts #Transformers #RealTimeAI