ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

📝 Summary:
InternVideo-Next proposes a two-stage Encoder-Predictor-Decoder framework for general video representation learning without text supervision. It uses a conditional diffusion decoder to bridge pixel fidelity with semantics in Stage 1, then a latent world model in Stage 2 to learn world knowledge a...

🔹 Publication Date: Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01342
• PDF: https://arxiv.org/pdf/2512.01342

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoFoundationModels #VideoAI #DeepLearning #UnsupervisedLearning #DiffusionModels
How Much 3D Do Video Foundation Models Encode?

📝 Summary:
A new framework quantifies 3D understanding in Video Foundation Models (VidFMs). VidFMs trained only on video show strong 3D awareness and often surpass expert 3D models, a finding that offers insights for building 3D AI.

🔹 Publication Date: Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19949
• PDF: https://arxiv.org/pdf/2512.19949
• Project Page: https://vidfm-3d-probe.github.io/

==================================

#VideoFoundationModels #3DUnderstanding #ComputerVision #AIResearch #DeepLearning