2026

iOSWorld: A Benchmark for Personally Intelligent Phone Agents
Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, and 3 more authors
In ArXiv Preprint, 2026

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, and 1 more author
In ArXiv Preprint, 2026

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks
Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, and 1 more author
In ArXiv Preprint, 2026

2025

Agent Learning via Early Experience
Kai Zhang, Xiangchao Chen, Bo Liu, and 27 more authors
In ICML, 2025

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu, Yufan Song, Boxuan Li, and 18 more authors
In NeurIPS, 2025

The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, and 17 more authors
In TMLR, 2025

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Charles Ding, and 5 more authors
In ICLR, 2025

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti, Dan Zhao, Francesco Bonacci, and 9 more authors
In ICML, 2025

2024

ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights
Gabriel Sarch, Lawrence Jang, Michael Tarr, and 3 more authors
In NeurIPS Spotlight, 2024

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
Haofei Yu, Zhengyang Qi, Lawrence Jang, and 3 more authors
In EMNLP, 2024

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, Lawrence Jang, and 7 more authors
In ACL, 2024