ST-WEBAGENTBENCH: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
- Ido Levy
- Ben Wiesel
- et al.
- 2025
- ICML 2025
Ido Levy is an AI Research Scientist at IBM Research–Haifa, where he designs and builds generalist computer-use agents that reason, plan, and act autonomously. He co-created IBM CUGA, the first enterprise-ready agent to outperform OpenAI Operator on standard web-navigation benchmarks, and created ST-WebAgentBench, the field’s reference suite for safety and trust evaluation.
Before joining IBM, he was an NLP data scientist at GE Healthcare, developing drift-detection models and MLOps pipelines for clinical text. Ido is also a graduate student in Data Science (M.Sc., Technion, advisers Yonatan Belinkov & Ron Meir) and holds a fast-track B.Sc. in Data Science & Engineering.
Research interests: generative AI · multi-agent orchestration · emergent communication · trustworthy AI · large-language-model tooling.