ML Research Hub
32.8K subscribers
4.29K photos
258 videos
23 files
4.63K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

📝 Summary:
InfiniteVGGT enables continuous 3D visual geometry understanding for infinite streams. It uses a causal transformer with adaptive rolling memory for long-term stability, outperforming existing streaming methods. A new Long3D benchmark is introduced for rigorous evaluation of such systems.

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02281
• PDF: https://arxiv.org/pdf/2601.02281
• Github: https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT

==================================

For more data science resources:
https://t.me/DataScienceT

#VisualGeometry #3DVision #Transformers #StreamingAI #DeepLearning
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

📝 Summary:
SWE-Lego achieves state-of-the-art software issue resolution through a lightweight supervised fine-tuning approach. It uses a high-quality dataset and refined training procedures like error masking and a difficulty-based curriculum, outperforming complex methods. Performance is further boosted by...

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01426
• PDF: https://arxiv.org/pdf/2601.01426
• Project Page: https://github.com/SWE-Lego/SWE-Lego
• Github: https://github.com/SWE-Lego/SWE-Lego

🔹 Models citing this paper:
https://huggingface.co/SWE-Lego/SWE-Lego-Qwen3-8B
https://huggingface.co/SWE-Lego/SWE-Lego-Qwen3-32B

Datasets citing this paper:
https://huggingface.co/datasets/SWE-Lego/SWE-Lego-Real-Data
https://huggingface.co/datasets/SWE-Lego/SWE-Lego-Synthetic-Data

#SoftwareEngineering #MachineLearning #LLM #FineTuning #AIforCode
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

📝 Summary:
Existing concept erasure methods in diffusion models are vulnerable to non-text inputs. M-ErasureBench is a new multimodal evaluation framework, and IRECE is a module to restore robustness against these attacks, reducing concept reproduction.

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22877
• PDF: https://arxiv.org/pdf/2512.22877

#DiffusionModels #ConceptErasure #MultimodalAI #AISafety #MachineLearning
Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery

📝 Summary:
Materiomusic links matter's hierarchical structures to music's compositional logic through vibrational principles. Sound serves as a scientific probe, revealing how selective imperfection drives novelty in both. AI models can leverage this framework for creative invention beyond interpolation.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00863
• PDF: https://arxiv.org/pdf/2601.00863
• Github: https://github.com/lamm-mit/MusicAnalysis

Datasets citing this paper:
https://huggingface.co/datasets/lamm-mit/scales-12tet-defects

#GenerativeAI #ComputationalMusic #ComplexSystems #Creativity #Interdisciplinary
Confidence Estimation for LLMs in Multi-turn Interactions

📝 Summary:
This paper presents the first systematic study of confidence estimation in multi-turn LLM interactions, introducing a formal evaluation framework, novel metrics, and a Hinter-Guesser dataset paradigm. It reveals that current confidence techniques struggle with calibration and monotonicity in mult...
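For readers new to calibration, here is a minimal sketch of expected calibration error (ECE), a standard single-turn calibration metric. It is not one of the paper's new multi-turn metrics; the function name, the equal-width binning, and the sample numbers are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin-size-weighted mean gap between confidence and accuracy.

    Illustrative only; this is NOT the multi-turn metric proposed in the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical answers: two confident and right, two unconfident and wrong.
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [True, True, False, False])
print(f"ECE = {ece:.3f}")
```

A lower value means stated confidence tracks observed accuracy more closely. In a multi-turn setting, one simple option (again an assumption) is to compute it separately per turn.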

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02179
• PDF: https://arxiv.org/pdf/2601.02179

#LLM #ConfidenceEstimation #ConversationalAI #NLP #AIResearch
DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

📝 Summary:
DiffProxy generates multi-view consistent human proxies using diffusion models to improve human mesh recovery. This bridges synthetic training and real-world generalization, achieving state-of-the-art performance on real benchmarks.

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02267
• PDF: https://arxiv.org/pdf/2601.02267
• Project Page: https://wrk226.github.io/DiffProxy.html
• Github: https://github.com/wrk226/DiffProxy

#HumanMeshRecovery #DiffusionModels #ComputerVision #DeepLearning #AI
CPPO: Contrastive Perception for Vision Language Policy Optimization

📝 Summary:
CPPO improves vision-language model fine-tuning by detecting perception tokens through entropy shifts. It then applies a Contrastive Perception Loss to enhance multimodal reasoning, outperforming prior methods while being more efficient.
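As a rough illustration of what "entropy shift" could mean at the token level, the sketch below flags positions where the entropy of the next-token distribution changes sharply between two conditions. The function names, the threshold, and the choice of conditions (for example, original versus perturbed image) are assumptions for illustration, not the paper's actual procedure.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_shift_tokens(dists_a, dists_b, threshold=0.5):
    """Return positions whose entropy differs by more than `threshold`.

    dists_a / dists_b: per-position distributions under two conditions
    (hypothetically, with the original image and with a perturbed one).
    """
    if len(dists_a) != len(dists_b):
        raise ValueError("both conditions must cover the same positions")
    shifts = [token_entropy(b) - token_entropy(a) for a, b in zip(dists_a, dists_b)]
    return [i for i, shift in enumerate(shifts) if abs(shift) > threshold]


# Toy two-word vocabulary: only the last position becomes uncertain.
original = [[0.9, 0.1], [0.5, 0.5], [0.99, 0.01]]
perturbed = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
print(entropy_shift_tokens(original, perturbed))
```

In this toy case only the last position is flagged, since its entropy rises from about 0.06 to about 0.69 nats.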

🔹 Publication Date: Published on Jan 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00501
• PDF: https://arxiv.org/pdf/2601.00501

#VisionLanguageModels #MultimodalAI #ContrastiveLearning #DeepLearning #AIResearch
Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping

📝 Summary:
Prithvi-CAFE improves flood mapping by integrating a pretrained Geo-Foundation Model encoder with a parallel CNN branch featuring attention modules. This hybrid approach effectively captures both global context and critical local details, achieving state-of-the-art results on Sen1Flood11 and Floo...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02315
• PDF: https://arxiv.org/pdf/2601.02315
• Github: https://github.com/Sk-2103/Prithvi-CAFE

#FloodMapping #DeepLearning #GeoAI #RemoteSensing #ComputerVision
LTX-2: Efficient Joint Audio-Visual Foundation Model

📝 Summary:
LTX-2 is an open-source audiovisual diffusion model generating synchronized video and audio content. It uses a dual-stream transformer to achieve state-of-the-art quality, producing rich audio tracks efficiently.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03233
• PDF: https://arxiv.org/pdf/2601.03233

#AudiovisualAI #DiffusionModels #GenerativeAI #FoundationModels #VideoGeneration
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

📝 Summary:
InfiniDepth represents depth as neural implicit fields using a local implicit decoder, enabling continuous 2D coordinate querying for arbitrary-resolution depth estimation and superior performance in ...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03252
• PDF: https://arxiv.org/pdf/2601.03252
• Project Page: https://zju3dv.github.io/InfiniDepth

#AI #DataScience #MachineLearning #HuggingFace #Research
FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

📝 Summary:
A new large-scale video dataset and framework are presented that enable effective first-frame propagation without runtime guidance through adaptive spatio-temporal positional encoding and self-distill...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01720
• PDF: https://arxiv.org/pdf/2601.01720
• Project Page: https://ffp-300k.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

📝 Summary:
MOSS Transcribe Diarize is a unified multimodal large language model for end-to-end speaker-attributed, time-stamped transcription, with an extended context window and strong generalization across benchmarks.

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01554
• PDF: https://arxiv.org/pdf/2601.01554
• Project Page: https://mosi.cn/models/moss-transcribe-diarize

Spaces citing this paper:
https://huggingface.co/spaces/OpenMOSS-Team/MOSS-transcribe-diarize

#AI #DataScience #MachineLearning #HuggingFace #Research
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

📝 Summary:
Visual mathematical problem solving remains challenging for multimodal large language models, prompting the development of CogFlow, a cognitive-inspired three-stage framework that enhances perception,...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01874
• PDF: https://arxiv.org/pdf/2601.01874
• Project Page: https://shchen233.github.io/cogflow/

#AI #DataScience #MachineLearning #HuggingFace #Research
NitroGen: An Open Foundation Model for Generalist Gaming Agents

📝 Summary:
NitroGen is a vision-action foundation model trained on extensive gameplay data that demonstrates strong cross-game generalization and effective transfer learning capabilities.

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02427
• PDF: https://arxiv.org/pdf/2601.02427
• Project Page: https://nitrogen.minedojo.org/
• Github: https://github.com/MineDojo/NitroGen

#AI #DataScience #MachineLearning #HuggingFace #Research
X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

📝 Summary:
A novel explainability-guided training framework for hate speech detection in Indic languages that combines large language models with attention-enhancing techniques and provides human-annotated ratio...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03194
• PDF: https://arxiv.org/pdf/2601.03194
• Github: https://github.com/ziarehman30/X-MuTeST

Datasets citing this paper:
https://huggingface.co/datasets/UVSKKR/X-MuTeST

#AI #DataScience #MachineLearning #HuggingFace #Research
Parallel Latent Reasoning for Sequential Recommendation

📝 Summary:
The Parallel Latent Reasoning framework improves sequential recommendation by exploring multiple diverse reasoning trajectories simultaneously through learnable trigger tokens and adaptive aggregation.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03153
• PDF: https://arxiv.org/pdf/2601.03153

#AI #DataScience #MachineLearning #HuggingFace #Research
DreamStyle: A Unified Framework for Video Stylization

📝 Summary:
DreamStyle is a unified video stylization framework that supports multiple style conditions while addressing style inconsistency and temporal flicker through a specialized data curation pipeline and L...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02785
• PDF: https://arxiv.org/pdf/2601.02785
• Project Page: https://lemonsky1995.github.io/dreamstyle/

#AI #DataScience #MachineLearning #HuggingFace #Research
MiMo-V2-Flash Technical Report

📝 Summary:
MiMo-V2-Flash is a sparse Mixture-of-Experts model with hybrid attention architecture and efficient distillation technique that achieves strong performance with reduced parameters and improved inferen...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02780
• PDF: https://arxiv.org/pdf/2601.02780
• Project Page: https://mimo.xiaomi.com/blog/mimo-v2-flash
• Github: https://github.com/XiaomiMiMo/MiMo-V2-Flash

#AI #DataScience #MachineLearning #HuggingFace #Research