2024

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu, Yufan Song, Boxuan Li, and 18 more authors
In arXiv Preprint, 2024

The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, and 17 more authors
In arXiv Preprint, 2024

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Charles Ding, and 5 more authors
In NeurIPS Open World Agents Workshop, 2024

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti, Dan Zhao, Francesco Bonacci, and 9 more authors
In NeurIPS Safe and Trustworthy Agents Workshop (Oral), 2024

ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights
Gabriel Sarch, Lawrence Jang, Michael Tarr, and 3 more authors
In NeurIPS (Spotlight), 2024

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
Haofei Yu, Zhengyang Qi, Lawrence Jang, and 3 more authors
In EMNLP, 2024

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, Lawrence Jang, and 7 more authors
In ACL, 2024