ML Research Hub

✨GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

📝 Summary:
GUI-360 is a large-scale dataset and benchmark for computer-using agents that addresses gaps in real-world task coverage and unified evaluation. It contains over 1.2M action steps collected in Windows apps, covering GUI grounding, screen parsing, and action prediction. Benchmarking reveals significant shortcomings in current mod...

🔹 Publication Date: Published on Nov 6, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04307
• PDF: https://arxiv.org/pdf/2511.04307

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#AI #ComputerAgents #GUIAgents #Dataset #Benchmark

✨OmniParser for Pure Vision Based GUI Agent

📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.

🔹 Publication Date: Published on Aug 1, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser

🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser

✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k

✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2

==================================

#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning

✨HiconAgent: History Context-aware Policy Optimization for GUI Agents

📝 Summary:
HiconAgent introduces History Context-aware Policy Optimization (HCPO) for GUI agents. HCPO leverages historical context efficiently through dynamic sampling and compression, outperforming larger models at reduced computational cost and with significant speedups.

🔹 Publication Date: Published on Dec 1, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01763
• PDF: https://arxiv.org/pdf/2512.01763
• Github: https://github.com/JiuTian-VL/HiconAgent

==================================

#HiconAgent #GUIAgents #AIResearch #ReinforcementLearning #ContextAwareAI

✨GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

📝 Summary:
GUI Exploration Lab is a simulation environment for training GUI agents in screen navigation. The authors find that supervised fine-tuning establishes basic capabilities, single-turn reinforcement learning improves generalization, and multi-turn RL enhances exploration, yielding superior navigation performance.

🔹 Publication Date: Published on Dec 2, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02423
• PDF: https://arxiv.org/pdf/2512.02423

==================================

#ReinforcementLearning #GUIAgents #AINavigation #MachineLearning #AIResearch

✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

📝 Summary:
MAI-UI introduces a family of foundation GUI agents that tackle real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly boosting performance and privacy.

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047

==================================

#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI
