The Unreasonable Effectiveness of Scaling Agents for Computer Use
Authors: Gonzalo Gonzalez-Pumariega, Vincent Tu, Chih-Lun Lee, Jiachen Yang, Ang Li, and Xin Eric Wang
Publication: arXiv, October 2, 2025
Summary: This paper introduces Behavior Best-of-N (bBoN), a method for improving the reliability of AI agents that carry out tasks on a computer. bBoN generates multiple complete attempts (rollouts) at a task, describes each one in a “behavior narrative”, and selects the best attempt by comparing those narratives.
Why it matters: More reliable agents could automate complex digital tasks with higher success rates, bringing truly useful autonomous assistants closer. The paper demonstrates a practical way to improve agent performance: judge each attempt by what it actually did and achieved, then keep the best one.
Key technical insight: Instead of relying on a single attempt, the agent makes several attempts at the same task. Each attempt is summarized as a “behavior narrative”, and the narratives are compared to pick the attempt most likely to have succeeded. This is like a supervisor who watches several attempts and keeps the best one, which significantly improves the chances of success.
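The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Rollout`, `behavior_best_of_n`, `toy_agent`, `toy_narrate`, and `toy_judge` are hypothetical names, and the toy agent, narrator, and judge are simple stand-ins for whatever components a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    """One complete attempt at a task: the ordered actions the agent took."""
    actions: List[str]


def behavior_best_of_n(
    task: str,
    run_agent: Callable[[str, int], Rollout],
    narrate: Callable[[Rollout], str],
    judge: Callable[[str, Sequence[str]], int],
    n: int = 4,
) -> Rollout:
    """Run n attempts, narrate each, and return the attempt the judge prefers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # 1. Generate n attempts at the same task (the seed varies the attempt).
    rollouts = [run_agent(task, seed) for seed in range(n)]
    # 2. Summarize what each attempt did as a short text narrative.
    narratives = [narrate(rollout) for rollout in rollouts]
    # 3. The judge compares all narratives and returns the index of the best.
    best_index = judge(task, narratives)
    return rollouts[best_index]


# --- Toy stand-ins (assumptions, for illustration only) ---

def toy_agent(task: str, seed: int) -> Rollout:
    """Pretend agent: only odd seeds complete the final step."""
    actions = ["open editor", "type text"]
    if seed % 2 == 1:
        actions.append("save file")
    return Rollout(actions)


def toy_narrate(rollout: Rollout) -> str:
    """Turn the action list into a one-sentence narrative."""
    return "The agent did: " + ", then ".join(rollout.actions) + "."


def toy_judge(task: str, narratives: Sequence[str]) -> int:
    """Prefer the first narrative that mentions saving the file."""
    scores = [int("save file" in narrative) for narrative in narratives]
    return scores.index(max(scores))


if __name__ == "__main__":
    best = behavior_best_of_n(
        "write and save a note", toy_agent, toy_narrate, toy_judge, n=4
    )
    print(best.actions)
```

With `n=1` the sketch reduces to a single attempt, so the toy agent never saves the file. With `n=4` at least one attempt completes the task and the judge can select it, which is the intuition behind scaling the number of attempts.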