✨UFO^3: Weaving the Digital Agent Galaxy
📝 Summary:
UFO^3 unifies diverse digital devices into a single orchestration fabric, enabling AI agents to collaborate seamlessly across platforms. It models tasks dynamically for asynchronous execution, achieving efficient, resilient, and accurate cross-device task orchestration with improved parallelism a...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11332
• PDF: https://arxiv.org/pdf/2511.11332
• Project Page: https://microsoft.github.io/UFO/
• Github: https://github.com/microsoft/UFO/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #TaskOrchestration #DistributedSystems #EdgeAI #MultiAgentSystems
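A minimal sketch of the idea described above: model a cross-device task as a dependency graph and dispatch subtasks to device agents asynchronously as soon as their prerequisites finish. All names and structure here are illustrative assumptions, not the UFO^3 implementation.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    device: str                      # e.g. "laptop", "phone"
    deps: set[str] = field(default_factory=set)

async def run_on_device(sub: Subtask) -> str:
    # Placeholder for sending the subtask to the agent on the target device.
    await asyncio.sleep(0.1)
    return f"{sub.name} done on {sub.device}"

async def orchestrate(subtasks: list[Subtask]) -> list[str]:
    done: set[str] = set()
    pending = {s.name: s for s in subtasks}
    results: list[str] = []
    while pending:
        # All subtasks whose dependencies are satisfied can run in parallel.
        ready = [s for s in pending.values() if s.deps <= done]
        if not ready:
            raise RuntimeError("dependency cycle or unsatisfiable task graph")
        results.extend(await asyncio.gather(*(run_on_device(s) for s in ready)))
        for s in ready:
            done.add(s.name)
            del pending[s.name]
    return results

if __name__ == "__main__":
    task = [
        Subtask("export_report", "laptop"),
        Subtask("photo_backup", "phone"),
        Subtask("merge_and_send", "laptop", deps={"export_report", "photo_backup"}),
    ]
    print(asyncio.run(orchestrate(task)))
```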
✨Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
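A hypothetical sketch of tool-driven visual reasoning in the spirit of Orion: a controller routes each step of a visual workflow to a specialized vision tool. The tool names and outputs below are invented placeholders, not the framework's actual components.

```python
from typing import Callable

# Registry of specialized vision tools (stubs standing in for real models).
TOOLS: dict[str, Callable[[str], str]] = {
    "detect_objects": lambda img: f"objects in {img}: [car, person]",
    "read_text":      lambda img: f"text in {img}: 'EXIT'",
    "estimate_depth": lambda img: f"depth map computed for {img}",
}

def run_workflow(image: str, plan: list[str]) -> list[str]:
    """Execute a sequence of tool calls chosen by a (here, fixed) plan."""
    return [TOOLS[tool_name](image) for tool_name in plan]

if __name__ == "__main__":
    # In the real system the plan would be produced by the agent itself.
    for obs in run_workflow("street.jpg", ["detect_objects", "read_text"]):
        print(obs)
```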
✨Agent READMEs: An Empirical Study of Context Files for Agentic Coding
📝 Summary:
This study analyzed 2303 agent context files, finding them complex and evolving like config code. Developers prioritize functional details but rarely specify non-functional requirements like security or performance. This suggests a gap in guardrails for agent-written code quality.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12884
• PDF: https://arxiv.org/pdf/2511.12884
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #SoftwareEngineering #CodeQuality #LLMs #AIResearch
✨OmniParser for Pure Vision Based GUI Agent
📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.
🔹 Publication Date: Published on Aug 1, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser
🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser
✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k
✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning
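A hypothetical sketch of the screen-parsing idea: turn a raw screenshot into a structured list of interactable elements (bounding boxes plus semantic labels) that a multimodal model can reference by ID instead of raw pixels. The detector and captioner below are stubs, not OmniParser's actual models.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    element_id: int
    box: tuple[int, int, int, int]    # x, y, width, height
    label: str                         # semantic description of the element

def detect_icons(screenshot_path: str) -> list[tuple[int, int, int, int]]:
    # Stand-in for an interactable-region detection model.
    return [(10, 10, 40, 40), (100, 10, 80, 30)]

def describe(box: tuple[int, int, int, int]) -> str:
    # Stand-in for a captioning model that explains what the element does.
    return "settings icon" if box[2] == box[3] else "search field"

def parse_screen(screenshot_path: str) -> list[UIElement]:
    boxes = detect_icons(screenshot_path)
    return [UIElement(i, b, describe(b)) for i, b in enumerate(boxes)]

if __name__ == "__main__":
    for el in parse_screen("home_screen.png"):
        print(f"[{el.element_id}] {el.label} at {el.box}")
```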
✨What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
📝 Summary:
Ideation diversity significantly enhances AI research agent performance. Higher ideation diversity leads to stronger results on the MLE-bench benchmark across different models and scaffolds. This finding holds across various performance metrics.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15593
• PDF: https://arxiv.org/pdf/2511.15593
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIResearch #IdeationDiversity #MachineLearning #AIagents #AIPerformance
✨GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
📝 Summary:
GeoVista is a new agentic model for geolocalization that integrates tool invocation and reinforcement learning. It achieves high performance on the new GeoBench benchmark, surpassing open-source models and matching closed-source models.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15705
• PDF: https://arxiv.org/pdf/2511.15705
• Project Page: https://ekonwang.github.io/geo-vista/
• Github: https://github.com/ekonwang/GeoVista
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Geolocalization #AI #ReinforcementLearning #ComputerVision #AIAgents
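A hypothetical sketch of web-augmented agentic geolocalization: the agent alternates between zooming into image regions and issuing web queries about visible clues before committing to a location. Every function here is a placeholder, not GeoVista's implementation.

```python
def zoom(image: str, region: str) -> str:
    # Stand-in for cropping/zooming into a region and describing it with a VLM.
    return f"zoomed view of {region} in {image}: signage in Portuguese"

def web_search(query: str) -> str:
    # Stand-in for a web search tool returning text snippets.
    return f"search('{query}') -> results mentioning Lisbon tram lines"

def geolocate(image: str) -> str:
    clues = [
        zoom(image, "street sign"),                       # gather visual evidence
        web_search("yellow tram Portuguese signage city"),# augment with web facts
    ]
    # A real agent would reason over the clues with a VLM; here we return a
    # canned answer derived from the placeholder observations.
    return f"Predicted location: Lisbon, Portugal ({len(clues)} clues collected)"

if __name__ == "__main__":
    print(geolocate("query_photo.jpg"))
```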
✨Computer-Use Agents as Judges for Generative User Interface
📝 Summary:
This paper introduces a framework where Computer-Use Agents (CUAs) act as judges for a coding language model (the Coder) that automatically designs GUIs. The goal is to optimize interfaces for CUA efficiency and task solvability, rather than human aesthetics, using a new benchmark called AUI-Gym.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15567
• PDF: https://arxiv.org/pdf/2511.15567
• Project Page: https://showlab.github.io/AUI/
• Github: https://github.com/showlab/AUI/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #GUIDesign #GenerativeAI #AIevaluation #LanguageModels
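A hypothetical sketch of the judge loop: a coder model proposes GUI variants and a computer-use agent scores each one by whether it can finish a target task and in how many steps. Both roles are stubbed out here; this is not the paper's code.

```python
import random

def coder_generate_gui(spec: str, seed: int) -> str:
    # Stand-in for a coding LLM emitting a GUI implementation.
    return f"<gui spec='{spec}' variant={seed}>"

def cua_attempt_task(gui: str, task: str) -> tuple[bool, int]:
    # Stand-in for a real CUA rollout; returns (solved, number_of_steps).
    random.seed(hash((gui, task)) % 2**32)
    return random.random() > 0.3, random.randint(3, 12)

def select_best_gui(spec: str, task: str, n_candidates: int = 4) -> str:
    def score(gui: str) -> float:
        solved, steps = cua_attempt_task(gui, task)
        return (1.0 if solved else 0.0) - 0.01 * steps   # prefer fewer steps
    candidates = [coder_generate_gui(spec, i) for i in range(n_candidates)]
    return max(candidates, key=score)

if __name__ == "__main__":
    print(select_best_gui("invoice form", "submit a filled invoice"))
```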
✨Budget-Aware Tool-Use Enables Effective Agent Scaling
📝 Summary:
Tool-augmented agents struggle to scale with more tool calls due to a lack of budget awareness. This paper introduces Budget Tracker for continuous budget awareness and BATS for adaptive planning, dynamically adjusting strategy based on remaining resources. These methods significantly improve cos...
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17006
• PDF: https://arxiv.org/pdf/2511.17006
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #ToolUse #ResourceManagement #AgentScaling #AIResearch
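A minimal sketch of budget-aware tool use as summarized above: a tracker deducts the cost of every tool call, and the agent switches to a cheaper strategy once the remaining budget falls below a threshold. The names BudgetTracker and plan_step are illustrative, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class BudgetTracker:
    remaining: float                      # e.g. dollars or a token budget

    def charge(self, cost: float) -> None:
        if cost > self.remaining:
            raise RuntimeError("budget exhausted")
        self.remaining -= cost

def expensive_search(query: str) -> str:
    return f"deep search results for '{query}'"

def cheap_lookup(query: str) -> str:
    return f"cached snippet for '{query}'"

def plan_step(query: str, tracker: BudgetTracker) -> str:
    # Adaptive planning: pick the tool based on what the budget still allows.
    if tracker.remaining > 1.0:
        tracker.charge(1.0)
        return expensive_search(query)
    tracker.charge(0.1)
    return cheap_lookup(query)

if __name__ == "__main__":
    tracker = BudgetTracker(remaining=2.2)
    for q in ["topic A", "topic B", "topic C"]:
        print(f"${tracker.remaining:.1f} left ->", plan_step(q, tracker))
```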
✨PRInTS: Reward Modeling for Long-Horizon Information Seeking
📝 Summary:
PRInTS is a generative process reward model that improves AI agents' information seeking. It provides dense scores for step quality and summarizes long trajectories to manage context. PRInTS enhances agent performance, matching or surpassing frontier models with a smaller backbone.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19314
• PDF: https://arxiv.org/pdf/2511.19314
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#RewardModeling #InformationSeeking #AIagents #GenerativeAI #MachineLearning
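A hypothetical sketch of a process reward model for information seeking: score each intermediate step densely and summarize the trajectory so far to keep the context short. The scorer and summarizer below are trivial stand-ins, not PRInTS itself.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str
    observation: str

def score_step(step: Step, goal: str) -> float:
    # Stand-in for a generative reward model: reward steps whose observation
    # mentions the goal, lightly penalize everything else.
    return 1.0 if goal.lower() in step.observation.lower() else -0.1

def summarize(trajectory: list[Step]) -> str:
    # Stand-in for trajectory summarization used to manage long contexts.
    return f"{len(trajectory)} steps taken; last action: {trajectory[-1].action}"

if __name__ == "__main__":
    goal = "release date"
    traj = [
        Step("search('library X')", "homepage of library X"),
        Step("open_changelog()", "changelog mentions release date 2024-05-01"),
    ]
    print("dense step rewards:", [score_step(s, goal) for s in traj])
    print("context summary:", summarize(traj))
```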
✨PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
📝 Summary:
PC-Agent is a hierarchical multi-agent framework improving MLLM-based GUI agents for complex PC tasks. It uses an Active Perception Module and a hierarchical decision-making architecture with Manager, Progress, and Decision agents. A Reflection agent provides feedback. It achieved a 32% task succ...
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.14282
• PDF: https://arxiv.org/pdf/2502.14282
• Github: https://github.com/X-PLUG/MobileAgent/tree/main/PC-Agent
✨ Spaces citing this paper:
• https://huggingface.co/spaces/junyangwang0410/PC-Agent
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultiAgentSystems #AIAgents #MLLMs #PCAutomation #DeepLearning
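A minimal sketch of the hierarchical roles described above: a Manager decomposes the instruction, a Progress record tracks what is finished, a Decision agent picks the next GUI action, and a Reflection agent checks the outcome. All four are trivial stand-ins, not PC-Agent's actual prompts or modules.

```python
def manager_decompose(instruction: str) -> list[str]:
    # Manager agent: break the instruction into ordered subtasks.
    return [f"{instruction}: open app",
            f"{instruction}: perform action",
            f"{instruction}: verify result"]

def decision_act(subtask: str) -> str:
    # Decision agent: choose and execute a concrete GUI action.
    return f"clicked element for '{subtask}'"

def reflection_check(subtask: str, outcome: str) -> bool:
    # Reflection agent: give feedback on whether the action matched the subtask.
    return subtask in outcome

def run(instruction: str) -> None:
    progress: list[str] = []                      # Progress agent's memory
    subtasks = manager_decompose(instruction)
    for subtask in subtasks:
        outcome = decision_act(subtask)
        if reflection_check(subtask, outcome):    # feedback loop
            progress.append(subtask)
        else:
            print(f"retry needed for: {subtask}")
    print(f"completed {len(progress)} / {len(subtasks)} subtasks")

if __name__ == "__main__":
    run("export a chart from the spreadsheet")
```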
✨Fara-7B: An Efficient Agentic Model for Computer Use
📝 Summary:
FaraGen creates synthetic datasets for computer use agents, solving a data scarcity problem. This data trains Fara-7B, a small on-device model that perceives computers via screenshots and outperforms larger models on diverse web tasks.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19663
• PDF: https://arxiv.org/pdf/2511.19663
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #OnDeviceAI #SyntheticData #MachineLearning #ComputerVision
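A hypothetical sketch of the screenshot-in, action-out interface implied by the summary above: the agent takes raw screen pixels plus a goal and returns a structured action. The model call is a stub; Fara-7B's real I/O format is not shown in this post.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", ...
    target: str        # element description or coordinates
    text: str = ""

def predict_action(screenshot_png: bytes, goal: str) -> Action:
    # Stand-in for running a small on-device model over the raw screenshot.
    if "search" in goal.lower():
        return Action("type", target="search box", text=goal)
    return Action("click", target="first result link")

if __name__ == "__main__":
    fake_screenshot = b"\x89PNG..."          # bytes of a captured screen
    print(predict_action(fake_screenshot, "search for train tickets"))
```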
✨Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
📝 Summary:
Agent0-VL is a self-evolving vision-language agent that integrates tool usage into both reasoning and self-evaluation. It uses a Solver and Verifier in a self-evolving cycle for continuous improvement without human annotation or external rewards, achieving a 12.5% performance gain.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19900
• PDF: https://arxiv.org/pdf/2511.19900
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AIAgents #VisionLanguage #SelfEvolvingAI #ToolAugmentedAI #AIResearch
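A minimal sketch of the Solver/Verifier cycle: the Solver proposes a tool-assisted answer, the Verifier scores it, and low-scoring attempts are revised with the Verifier's feedback, with no human labels involved. Both roles are placeholders, not Agent0-VL's components.

```python
def solver(question: str, feedback: str = "") -> str:
    # Stand-in for the tool-integrated Solver; revises when given feedback.
    return f"answer to '{question}'" + (" (revised)" if feedback else "")

def verifier(question: str, answer: str) -> float:
    # Stand-in for tool-integrated self-evaluation; returns a score in [0, 1].
    return 0.9 if "revised" in answer else 0.4

def self_evolve(question: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    answer = solver(question)
    for _ in range(max_rounds):
        score = verifier(question, answer)
        if score >= threshold:
            return answer
        feedback = f"score {score:.1f}, please revise"
        answer = solver(question, feedback)
    return answer

if __name__ == "__main__":
    print(self_evolve("how many red objects are in the image?"))
```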
✨Latent Collaboration in Multi-Agent Systems
📝 Summary:
LatentMAS enables LLM agents to collaborate directly in latent space, surpassing text-based communication. This boosts reasoning quality and accuracy while improving efficiency in both speed and token usage, without extra training.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20639
• PDF: https://arxiv.org/pdf/2511.20639
• Github: https://github.com/Gen-Verse/LatentMAS
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #MultiAgentSystems #LatentSpace #AIAgents #ArtificialIntelligence
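A hypothetical sketch of latent-space collaboration: instead of decoding a text message, one agent hands a hidden-state vector directly to the next agent, which conditions on it. Plain Python lists stand in for model hidden states; this illustrates the interface only, not LatentMAS itself.

```python
def agent_a_encode(task: str) -> list[float]:
    # Stand-in for agent A's final hidden state after reading the task.
    return [float(len(word)) for word in task.split()]

def agent_b_consume(latent: list[float], task: str) -> str:
    # Agent B conditions on A's latent message; no tokens are exchanged.
    strength = sum(latent) / max(len(latent), 1)
    return f"agent B answer for '{task}' (latent signal strength {strength:.1f})"

if __name__ == "__main__":
    task = "summarize the shared document"
    message = agent_a_encode(task)   # latent message, never decoded to text
    print(agent_b_consume(message, task))
```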