ML Research Hub
32.8K subscribers
4.09K photos
237 videos
23 files
4.41K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
LISA: Reasoning Segmentation via Large Language Model

A new segmentation task: reasoning segmentation. Given a complex and implicit query text, the model must output a segmentation mask.

🖥 Github: https://github.com/dvlab-research/lisa

📕 Paper: https://arxiv.org/abs/2308.00692v2

☑️ Dataset: https://github.com/dvlab-research/lisa#dataset

https://t.me/DataScienceT
🌟 MiraData: Large, long-duration video dataset with structured annotations.

When training generative models, the training dataset largely determines the quality of the resulting model's outputs.
A good source is MiraData from Tencent: a ready-made dataset with roughly 16 thousand hours of total video, designed for training text-to-video generation models. It contains long videos (72.1 seconds on average) with high motion intensity and detailed structured annotations (318 words per video on average).

To assess dataset quality, the authors also built a dedicated benchmark suite, MiraBench, consisting of 17 metrics that evaluate temporal consistency, in-frame motion, video quality, and other properties. By these metrics, MiraData outperforms other well-known openly available datasets, which consist mainly of short videos of uneven quality with brief descriptions.
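Temporal consistency, one of the properties MiraBench measures, is commonly computed as the mean cosine similarity between feature embeddings of consecutive frames. A minimal sketch of that idea (the frame features below are toy vectors, and MiraBench's exact formulation may differ):

```python
import numpy as np

def temporal_consistency(frame_features: np.ndarray) -> float:
    """Mean cosine similarity between consecutive frame feature vectors.

    frame_features: array of shape (num_frames, feature_dim).
    Returns a value in [-1, 1]; higher means a smoother, more consistent video.
    """
    # L2-normalize each frame's feature vector.
    norms = np.linalg.norm(frame_features, axis=1, keepdims=True)
    unit = frame_features / np.clip(norms, 1e-12, None)
    # Cosine similarity of each frame with the next one.
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    return float(sims.mean())

# Identical frames score ~1.0; orthogonal consecutive frames score ~0.0.
static = np.ones((4, 8))
score = temporal_consistency(static)
```

In practice the per-frame features would come from a pretrained visual encoder rather than raw pixels, so the score reflects semantic rather than pixel-level stability.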

🟡 Project page
🟡 Arxiv
🤗 Hugging Face
🖥 GitHub

#Text2Video #Dataset #ML

https://t.me/DataScienceT ⭐️
🍄 4D Mocap Human-Object 🍄

Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts, a rich resource for researchers and developers in the field.


⚡️ Review: https://t.ly/lCof3
⚡️ Paper: https://lnkd.in/dVVBDd_c
⚡️ Project: https://lnkd.in/dwBcseDf

#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

📝 Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• Github: https://github.com/apple/pico-banana-400k

🔹 Models citing this paper:
https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

📝 Summary:
GUI-360 is a large dataset and benchmark for computer-using agents, addressing gaps in real-world tasks and unified evaluation. It contains over 1.2M action steps in Windows apps for GUI grounding, screen parsing, and action prediction. Benchmarking reveals significant shortcomings in current mod...

🔹 Publication Date: Published on Nov 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04307
• PDF: https://arxiv.org/pdf/2511.04307

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #ComputerAgents #GUIAgents #Dataset #Benchmark
CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios

📝 Summary:
CATS-V2V is a new real-world dataset for V2V cooperative perception, focusing on complex adverse traffic scenarios. It provides extensive synchronized sensor data, including LiDAR and cameras, from two vehicles across diverse conditions. This dataset supports autonomous driving research.
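Working with synchronized streams from two vehicles means temporally aligning their sensor frames. A common approach, shown here as an illustrative sketch (not CATS-V2V's actual tooling), is nearest-timestamp matching within a tolerance:

```python
from bisect import bisect_left

def match_timestamps(ts_a, ts_b, tolerance=0.05):
    """Pair each timestamp in ts_a with the nearest timestamp in ts_b.

    ts_a, ts_b: sorted lists of timestamps in seconds (e.g. LiDAR sweeps
    from vehicle A and camera frames from vehicle B).
    Returns (index_a, index_b) pairs whose time difference is within
    `tolerance`; frames with no close-enough partner are dropped.
    """
    if not ts_b:
        return []
    pairs = []
    for i, t in enumerate(ts_a):
        j = bisect_left(ts_b, t)
        # Candidates: the insertion point and its left neighbor.
        best = min(
            (k for k in (j - 1, j) if 0 <= k < len(ts_b)),
            key=lambda k: abs(ts_b[k] - t),
        )
        if abs(ts_b[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs
```

For example, with 20 Hz LiDAR and 30 Hz cameras, a 0.05 s tolerance pairs most sweeps with a camera frame while discarding gaps caused by dropped frames.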

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11168
• PDF: https://arxiv.org/pdf/2511.11168

==================================

For more data science resources:
https://t.me/DataScienceT

#V2V #AutonomousDriving #CooperativePerception #Dataset #ADAS
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

📝 Summary:
An analysis of miniF2F found that errors in the problem statements capped AI systems at 36% accuracy. Correcting these errors produced miniF2F-v2, raising accuracy to 70%. High-quality benchmarks like miniF2F-v2 are crucial for evaluating progress in formal reasoning.

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03108
• PDF: https://arxiv.org/pdf/2511.03108
• Github: https://github.com/roozbeh-yz/miniF2F_v2

Datasets citing this paper:
https://huggingface.co/datasets/roozbeh-yz/miniF2F_v2

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #FormalReasoning #Benchmarks #MachineLearning #Dataset
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model

📝 Summary:
MicroVQA++ is a new high-quality microscopy VQA dataset built via a three-stage process. This includes HiCQA-Graph, a novel filtering method using NLI, CLIP, and MLLM signals. The dataset enables strong microscopy reasoning performance for MLLMs.
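The core idea behind combining NLI and CLIP signals for filtering is to keep only samples where independent checks agree. As an illustrative sketch of that idea (the field names and thresholds here are invented, not the paper's actual HiCQA-Graph formulation):

```python
def passes_filter(sample, nli_min=0.7, clip_min=0.25):
    """Keep a (caption, question, answer) sample only when both signals agree.

    sample: dict with precomputed scores, e.g.
      'nli_entailment'  - NLI probability that the caption entails the answer,
      'clip_similarity' - CLIP image-text similarity for the question context.
    Thresholds are illustrative placeholders, not the paper's values.
    """
    return (sample["nli_entailment"] >= nli_min
            and sample["clip_similarity"] >= clip_min)

samples = [
    {"id": 1, "nli_entailment": 0.91, "clip_similarity": 0.31},
    {"id": 2, "nli_entailment": 0.42, "clip_similarity": 0.33},
]
kept = [s["id"] for s in samples if passes_filter(s)]
```

Requiring agreement between text-only (NLI) and image-text (CLIP) signals catches different failure modes than either check alone.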

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11407
• PDF: https://arxiv.org/pdf/2511.11407
• Github: https://github.com/ieellee/MicroVQA-PlusPlus

==================================

For more data science resources:
https://t.me/DataScienceT

#MLLM #Microscopy #VQA #AIResearch #Dataset
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits

📝 Summary:
This paper tackles image editing model performance gaps due to data scarcity by introducing UnicEdit-10M, a 10M-scale high-quality dataset from a lightweight verified pipeline. It also proposes UnicBench, a new benchmark with novel metrics to diagnose reasoning limitations in models.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02790
• PDF: https://arxiv.org/pdf/2512.02790

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageEditing #AI #Dataset #Benchmark #ComputerVision
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

📝 Summary:
MultiMed-ST, a large-scale multilingual medical speech translation dataset, is introduced. With 290,000 samples in five languages, it is the largest medical MT and multilingual ST dataset. This work also provides an extensive comparative analysis.
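Many-to-many here means every ordered source-target pair among the five languages. A quick sketch of how those translation directions enumerate (the language codes below are assumptions for illustration; the paper lists the actual five):

```python
from itertools import permutations

def translation_directions(langs):
    """All ordered (source, target) pairs for many-to-many translation."""
    return list(permutations(langs, 2))

# Illustrative codes, not necessarily the paper's five languages.
langs = ["en", "vi", "fr", "de", "zh"]
directions = translation_directions(langs)
# 5 languages yield 5 * 4 = 20 ordered translation directions.
```

Each direction needs its own evaluation, which is why a unified many-to-many corpus simplifies benchmarking compared with assembling 20 separate bilingual sets.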

🔹 Publication Date: Published on Apr 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.03546
• PDF: https://arxiv.org/pdf/2504.03546
• Project Page: https://github.com/leduckhai/MultiMed-ST
• Github: https://github.com/leduckhai/MultiMed-ST

🔹 Models citing this paper:
https://huggingface.co/leduckhai/MultiMed-ST

Datasets citing this paper:
https://huggingface.co/datasets/leduckhai/MultiMed-ST

Spaces citing this paper:
https://huggingface.co/spaces/HaoVuong/MedicalASR

==================================

For more data science resources:
https://t.me/DataScienceT

#SpeechTranslation #MedicalAI #MultilingualNLP #MachineTranslation #Dataset
MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation

📝 Summary:
MeViS is a multi-modal dataset for referring motion expression video segmentation, addressing the need to segment and track objects based on their motion descriptions. It provides text and audio annotations for complex videos, enabling research into motion-guided video understanding.

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10945
• PDF: https://arxiv.org/pdf/2512.10945
• Project Page: https://henghuiding.com/MeViS/

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoSegmentation #MultiModalAI #ComputerVision #Dataset #MotionUnderstanding
SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models

📝 Summary:
SecureCode v2.0 is a production-grade dataset of 1215 security-focused coding examples. It trains AI models to generate secure code by providing real-incident examples with vulnerable and secure implementations, attacks, defense, and operational security context across 11 languages, using a conve...

🔹 Publication Date: Published on Dec 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18542
• PDF: https://arxiv.org/pdf/2512.18542
• Project Page: https://perfecxion.ai/
• Github: https://github.com/scthornton/securecode-v2
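The dataset pairs vulnerable and secure implementations of the same task. As an illustrative example of the kind of pair it describes (not taken from the dataset itself), compare string-formatted SQL, which permits injection, with a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_vulnerable(name):
    # VULNERABLE: attacker-controlled `name` is spliced into the SQL text,
    # so input like "' OR '1'='1" matches every row.
    return conn.execute(
        f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()

def find_user_secure(name):
    # SECURE: the driver binds `name` as data, never as SQL syntax.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()
```

The injection payload `' OR '1'='1` dumps the whole table through the vulnerable version but returns nothing from the parameterized one, since the payload is compared literally as a name.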

==================================

For more data science resources:
https://t.me/DataScienceT

#Cybersecurity #CodeSecurity #AI #CodeGeneration #Dataset