ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

📝 Summary:
Gen3R combines reconstruction and video diffusion models to generate 3D scenes. It produces RGB videos and 3D geometry by aligning geometric and appearance latents. This achieves state-of-the-art results and improves reconstruction robustness.

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04090
• PDF: https://arxiv.org/pdf/2601.04090
• Project Page: https://xdimlab.github.io/Gen3R/
• Github: https://xdimlab.github.io/Gen3R/

==================================

For more data science resources:
https://t.me/DataScienceT

#3DGeneration #DiffusionModels #ComputerVision #3DReconstruction #DeepLearning
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset

📝 Summary:
This paper introduces IMDD-1M, a large dataset of 1 million industrial defect image-text pairs. It enables training a vision-language foundation model tailored for industrial use. This model achieves comparable performance with less data for specialized tasks, promoting data-efficient quality ins...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24160
• PDF: https://arxiv.org/pdf/2512.24160

==================================

#IndustrialAI #VisionLanguageModel #DefectDetection #MultimodalAI #ComputerVision
ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting

📝 Summary:
ProFuse enhances open-vocabulary 3DGS understanding via an efficient, context-aware framework. It uses a pre-registration phase to fuse semantic features onto Gaussians for cross-view coherence, completing semantic attachment twice as fast as SOTA.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04754
• PDF: https://arxiv.org/pdf/2601.04754
• Project Page: https://chiou1203.github.io/ProFuse/
• Github: https://chiou1203.github.io/ProFuse/

==================================

#3DGaussianSplatting #ComputerVision #OpenVocabulary #3DReconstruction #DeepLearning
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework for nighttime auto white balance and the first RL approach for color constancy. It combines statistical methods with deep reinforcement learning, mimicking expert tuning to improve color constancy in low-light scenes. The method shows superior generalization across various lighting conditions and includes a new mu...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/
• Github: https://github.com/BrianChen1120/RL-AWB

==================================

#ReinforcementLearning #DeepLearning #ComputerVision #ImageProcessing #AWB
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

📝 Summary:
Collecting diverse robot manipulation data is challenging. This paper introduces visual identity prompting, using exemplar images to guide diffusion models for generating multi-view, temporally coherent data. This augmented data improves robot policy performance in both simulation and real-world ...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05241
• PDF: https://arxiv.org/pdf/2601.05241
• Project Page: https://robovip.github.io/RoboVIP/
• Github: https://robovip.github.io/RoboVIP/

==================================

#Robotics #AI #GenerativeAI #ComputerVision #MachineLearning
Plenoptic Video Generation

📝 Summary:
PlenopticDreamer addresses multi-view video re-rendering inconsistency by synchronizing generative hallucinations. It uses an autoregressive model with camera-guided retrieval to ensure spatio-temporal coherence, achieving state-of-the-art results with high fidelity.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05239
• PDF: https://arxiv.org/pdf/2601.05239
• Project Page: https://research.nvidia.com/labs/dir/plenopticdreamer/

==================================

#PlenopticVideo #GenerativeAI #VideoGeneration #ComputerVision #DeepLearning
ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers

📝 Summary:
ViTNT-FIQA is a training-free method for face image quality assessment using Vision Transformers. It measures the stability of patch embeddings across intermediate blocks with a single forward pass. High-quality images show stable feature evolution, and the method achieves competitive results efficiently.
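
The stability idea in the summary can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not say which stability metric the paper uses, so cosine similarity between consecutive blocks is an assumption here, and `stability_score` and the tiny embeddings are hypothetical names and data chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def stability_score(block_embeddings):
    """Mean cosine similarity of each patch between consecutive blocks.

    block_embeddings[b][p] is the embedding of patch p after block b,
    all collected from one forward pass. Higher means more stable.
    The choice of cosine similarity is an assumption, not the paper's metric.
    """
    if len(block_embeddings) < 2:
        raise ValueError("need embeddings from at least two blocks")
    sims = []
    for prev_block, next_block in zip(block_embeddings, block_embeddings[1:]):
        for prev_patch, next_patch in zip(prev_block, next_block):
            sims.append(cosine(prev_patch, next_patch))
    return sum(sims) / len(sims)


# Toy data: two patches tracked over three blocks.
stable = [[[1.0, 0.0], [0.0, 1.0]]] * 3
unstable = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 1.0], [1.0, 0.0]],
            [[1.0, 0.0], [0.0, 1.0]]]
print(stability_score(stable) > stability_score(unstable))
```

In practice the per-block patch embeddings would come from hooks on a pretrained ViT, which is outside the scope of this sketch.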

🔹 Publication Date: Published on Jan 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05741
• PDF: https://arxiv.org/pdf/2601.05741
• Github: https://github.com/gurayozgur/ViTNT-FIQA

==================================

#VisionTransformers #FaceQuality #ComputerVision #DeepLearning #AI
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

📝 Summary:
Laser introduces Dynamic Windowed Alignment Learning (DWAL) for visual reasoning. The method maintains a global feature superposition and achieves state-of-the-art performance at significantly reduced computational cost.

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06803
• PDF: https://arxiv.org/pdf/2601.06803

==================================

#VisualReasoning #MachineLearning #AIResearch #ComputerVision #EfficientAI
FlyPose: Towards Robust Human Pose Estimation From Aerial Views

📝 Summary:
FlyPose is a lightweight, real-time aerial human pose estimation system. It achieves significantly improved accuracy through multi-dataset training and performs efficiently on UAVs. A new challenging dataset, FlyPose-104, is also released.

🔹 Publication Date: Published on Jan 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05747
• PDF: https://arxiv.org/pdf/2601.05747
• Github: https://github.com/farooqhassaan/FlyPose

==================================

#HumanPoseEstimation #UAV #ComputerVision #DeepLearning #AI
3AM: Segment Anything with Geometric Consistency in Videos

📝 Summary:
3AM enhances video object segmentation by integrating 3D-aware features from MUSt3R into SAM2. This improves viewpoint consistency and geometric recognition using only RGB input at inference, significantly outperforming prior methods on challenging datasets.

🔹 Publication Date: Published on Jan 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08831
• PDF: https://arxiv.org/pdf/2601.08831
• Project Page: https://jayisaking.github.io/3AM-Page/

==================================

#VideoSegmentation #ComputerVision #DeepLearning #GeometricAI #AI
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

📝 Summary:
ViDoRe v3 is a new multimodal RAG benchmark for complex queries over visually rich, multi-language documents. It shows visual retrievers and late-interaction models improve performance, though models struggle with non-textual elements and visual grounding.
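
For readers unfamiliar with the term, late-interaction retrieval is commonly scored with a ColBERT-style MaxSim: each query token embedding is matched to its best document token (or image patch) embedding, and the matches are summed. The sketch below shows that general technique only; it is not taken from the ViDoRe v3 paper, and `maxsim_score`, `rank`, and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document vector."""
    if not query_vecs or not doc_vecs:
        raise ValueError("query and document must each have at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs, docs):
    """Return document names ordered from highest to lowest MaxSim score."""
    scored = [(maxsim_score(query_vecs, vecs), name) for name, vecs in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy embeddings: one vector per query token / per document patch.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "page_b": [[1.0, 0.0]],              # covers only the first
}
print(rank(query, docs))
```

Keeping one vector per token or patch, instead of pooling a page into a single vector, is what lets this kind of scoring pick up fine-grained matches.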

🔹 Publication Date: Published on Jan 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08620
• PDF: https://arxiv.org/pdf/2601.08620

🔹 Datasets citing this paper:
https://huggingface.co/datasets/vidore/vidore_v3_physics
https://huggingface.co/datasets/vidore/vidore_v3_computer_science
https://huggingface.co/datasets/vidore/vidore_v3_finance_en

==================================

#RAG #MultimodalAI #AIResearch #NLP #ComputerVision
Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

📝 Summary:
Omni-R1 proposes a unified generative paradigm for multimodal reasoning. It uses intermediate image generation to enable diverse skills across tasks. Omni-R1-Zero, which needs no multimodal data, matches or exceeds Omni-R1's performance, pointing to a promising direction.

🔹 Publication Date: Published on Jan 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09536
• PDF: https://arxiv.org/pdf/2601.09536

🔹 Models citing this paper:
https://huggingface.co/ModalityDance/Omni-R1
https://huggingface.co/ModalityDance/Omni-R1-Zero

🔹 Datasets citing this paper:
https://huggingface.co/datasets/ModalityDance/Omni-Bench

==================================

#MultimodalAI #GenerativeAI #DeepLearning #ComputerVision #AIResearch