ML Research Hub
32.8K subscribers
4.31K photos
260 videos
23 files
4.65K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

📝 Summary:
Existing feedforward subject-driven video customization methods mainly study single-subject scenarios due to the difficulty of constructing multi-subject training data pairs. Another challenging probl...

🔹 Publication Date: Published on Jun 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.23361
• PDF: https://arxiv.org/pdf/2506.23361
• Project Page: https://caiyuanhao1998.github.io/project/OmniVCus/
• Github: https://github.com/caiyuanhao1998/Open-OmniVCus

🔹 Models citing this paper:
https://huggingface.co/CaiYuanhao/OmniVCus

Datasets citing this paper:
https://huggingface.co/datasets/CaiYuanhao/OmniVCus
https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Test
https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Train

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

📝 Summary:
mmGRPO, a multi-module extension of GRPO, enhances accuracy in modular AI systems by optimizing LM calls and prompts across various tasks.

🔹 Publication Date: Published on Aug 6, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04660
• PDF: https://arxiv.org/pdf/2508.04660
• Project Page: https://dspy.ai
• Github: https://github.com/stanfordnlp/dspy

==================================

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

📝 Summary:
InternVL3 is a multimodal pre-trained language model that jointly learns from both multimodal data and text, improving performance and scalability through advanced techniques and setting a new state-o...

🔹 Publication Date: Published on Apr 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxivlens.com/PaperView/Details/internvl3-exploring-advanced-training-and-test-time-recipes-for-open-source-multimodal-models-4439-1c8e76a9
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/

🔹 Models citing this paper:
https://huggingface.co/OpenGVLab/InternVL3-78B
https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
https://huggingface.co/OpenGVLab/InternVL3-8B

Datasets citing this paper:
https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts

Spaces citing this paper:
https://huggingface.co/spaces/AntResearchNLP/ViLaBench
https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
https://huggingface.co/spaces/developer0hye/InternVL3-8B

==================================

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

📝 Summary:
Dolphin, a multimodal document image parsing model, uses heterogeneous anchor prompting to achieve state-of-the-art performance on diverse page-level and element-level tasks through an efficient analy...

🔹 Publication Date: Published on May 20, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.14059
• PDF: https://arxiv.org/pdf/2505.14059
• Github: https://github.com/bytedance/dolphin

==================================

LightRAG: Simple and Fast Retrieval-Augmented Generation

📝 Summary:
LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.

🔹 Publication Date: Published on Oct 8, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.05779
• PDF: https://arxiv.org/pdf/2410.05779
• Github: https://github.com/hkuds/lightrag

Spaces citing this paper:
https://huggingface.co/spaces/rm-lht/lightrag

==================================

SAM Audio: Segment Anything in Audio

📝 Summary:
SAM Audio, a diffusion transformer-based foundation model, achieves superior performance in general audio separation using unified text, visual, and temporal span prompts across various audio types.

🔹 Publication Date: Published on Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxivlens.com/PaperView/Details/sam-audio-segment-anything-in-audio-1718-de85c75a
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio

🔹 Models citing this paper:
https://huggingface.co/facebook/sam-audio-large
https://huggingface.co/facebook/sam-audio-small
https://huggingface.co/facebook/sam-audio-base

Datasets citing this paper:
https://huggingface.co/datasets/facebook/sam-audio-bench

Spaces citing this paper:
https://huggingface.co/spaces/lpeterl/sam-audio-webui
https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
https://huggingface.co/spaces/chippie1/SAM-Audio-Demo

==================================

GigaBrain-0: A World Model-Powered Vision-Language-Action Model

📝 Summary:
GigaBrain-0, a VLA foundation model, uses world model-generated data to enhance cross-task generalization and policy robustness, improving real-world performance on complex manipulation tasks.

🔹 Publication Date: Published on Oct 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19430
• PDF: https://arxiv.org/pdf/2510.19430
• Project Page: https://gigabrain0.github.io/
• Github: https://github.com/open-gigaai/giga-brain-0

🔹 Models citing this paper:
https://huggingface.co/open-gigaai/GigaBrain-0-3.5B-Base

==================================

PDFMathTranslate: Scientific Document Translation Preserving Layouts

📝 Summary:
PDFMathTranslate enables layout-preserving scientific document translation using large language models and precise layout detection, offering improved precision, flexibility, and efficiency.

🔹 Publication Date: Published on Jul 2, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.03009
• PDF: https://arxiv.org/pdf/2507.03009
• Github: https://github.com/byaidu/pdfmathtranslate

==================================

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

📝 Summary:
The PyTorch distributed data parallel module optimizes large-scale model training using techniques like gradient bucketing, computation-communication overlap, and selective synchronization to achieve ...

🔹 Publication Date: Published on Jun 28, 2020

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2006.15704
• PDF: https://arxiv.org/pdf/2006.15704
• Github: https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py

==================================

Video Generation Models Are Good Latent Reward Models

📝 Summary:
PRFL optimizes video generation preferences in latent space, improving alignment with human preferences while reducing memory consumption and training time.

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21541
• PDF: https://arxiv.org/pdf/2511.21541
• Project Page: https://hy-video-prfl.github.io/HY-VIDEO-PRFL/
• Github: https://github.com/Tencent-Hunyuan/HY-Video-PRFL

🔹 Models citing this paper:
https://huggingface.co/tencent/HY-Video-PRFL

==================================

RAG-Anything: All-in-One RAG Framework

📝 Summary:
RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.

🔹 Publication Date: Published on Oct 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/rag-anything-all-in-one-rag-framework
• PDF: https://arxiv.org/pdf/2510.12323
• Github: https://github.com/HKUDS/RAG-Anything

==================================

SAM 3: Segment Anything with Concepts

📝 Summary:
Segment Anything Model 3 achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization.

🔹 Publication Date: Published on Nov 20, 2025

🔹 Paper Links:
• arXiv Page: https://arxivlens.com/PaperView/Details/sam-3-segment-anything-with-concepts-8758-14547cc3
• PDF: https://arxiv.org/pdf/2511.16719
• Project Page: https://ai.meta.com/sam3/
• Github: https://github.com/facebookresearch/sam3

Spaces citing this paper:
https://huggingface.co/spaces/kith777/rag_agent

==================================

Very Large-Scale Multi-Agent Simulation in AgentScope

📝 Summary:
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendl...

🔹 Publication Date: Published on Jul 25, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2407.17789
• PDF: https://arxiv.org/pdf/2407.17789
• Github: https://github.com/modelscope/agentscope

==================================

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

📝 Summary:
olmOCR is an open-source toolkit using a fine-tuned vision language model to process PDFs into clean text while preserving structure, optimized for large-scale batch processing.

🔹 Publication Date: Published on Feb 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr

Datasets citing this paper:
https://huggingface.co/datasets/davanstrien/test-olmocr2
https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297

==================================

AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

📝 Summary:
AgentScope enhances agentic applications by providing flexible tool-based interactions, unified interfaces, and advanced infrastructure based on the ReAct paradigm, supporting efficient and safe devel...

🔹 Publication Date: Published on Aug 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16279
• PDF: https://arxiv.org/pdf/2508.16279
• Github: https://github.com/agentscope-ai/agentscope

==================================

MediaPipe: A Framework for Building Perception Pipelines

📝 Summary:
MediaPipe is a framework for building perception applications. It helps developers combine components, prototype, and measure performance across platforms, addressing key development challenges. This allows focusing on algorithm improvement with reproducible results.

🔹 Publication Date: Published on Jun 14, 2019

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/1906.08172
• PDF: https://arxiv.org/pdf/1906.08172
• Github: https://github.com/google-ai-edge/mediapipe

Spaces citing this paper:
https://huggingface.co/spaces/Jha-Pranav/PixelCare

==================================

Multi-Agent Software Development through Cross-Team Collaboration

📝 Summary:
Cross-Team Collaboration improves software quality by enabling multiple LLM agent teams to propose and communicate decisions.

🔹 Publication Date: Published on Jun 13, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.08979
• PDF: https://arxiv.org/pdf/2406.08979
• Github: https://github.com/OpenBMB/ChatDev

Spaces citing this paper:
https://huggingface.co/spaces/shanghengdu/LLM-Agent-Optimization-PaperList

==================================
