✨GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
📝 Summary:
GUI-360 is a large dataset and benchmark for computer-using agents, addressing gaps in real-world tasks and unified evaluation. It contains over 1.2M action steps in Windows apps for GUI grounding, screen parsing, and action prediction. Benchmarking reveals significant shortcomings in current mod...
🔹 Publication Date: Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04307
• PDF: https://arxiv.org/pdf/2511.04307
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ComputerAgents #GUIAgents #Dataset #Benchmark
✨OmniParser for Pure Vision Based GUI Agent
📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.
🔹 Publication Date: Aug 1, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• GitHub: https://github.com/microsoft/omniparser
🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2
#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning
✨HiconAgent: History Context-aware Policy Optimization for GUI Agents
📝 Summary:
HiconAgent introduces History Context-aware Policy Optimization (HCPO) for GUI agents. HCPO leverages historical context efficiently through dynamic sampling and compression, outperforming larger models at lower computational cost and with significant speedups.
🔹 Publication Date: Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01763
• PDF: https://arxiv.org/pdf/2512.01763
• GitHub: https://github.com/JiuTian-VL/HiconAgent
#HiconAgent #GUIAgents #AIResearch #ReinforcementLearning #ContextAwareAI
✨GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
📝 Summary:
GUI Exploration Lab is a simulation environment for training GUI agents in screen navigation. The study finds that supervised fine-tuning establishes basic capabilities, single-turn reinforcement learning improves generalization, and multi-turn RL strengthens exploration, yielding better navigation performance.
🔹 Publication Date: Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02423
• PDF: https://arxiv.org/pdf/2512.02423
#ReinforcementLearning #GUIAgents #AINavigation #MachineLearning #AIResearch
✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
📝 Summary:
MAI-UI introduces a family of foundation GUI agents that tackle real-world deployment challenges. It combines a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly improving performance and privacy.
🔹 Publication Date: Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047
#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI