Zhiyuan He, Yijun Yang, et al.
ICML 2024
The emergence of agentic AI (reasoning AI agents that can connect to tools and take actions) offers enormous potential for performing tasks that currently require highly skilled humans. In this position paper, we discuss AI agents in one such role: performance engineer. A performance engineer is typically highly trained and highly trusted to run performance diagnostic tools, which more often than not require root or administrator privileges, on production machines to diagnose performance issues. Critically, performance engineers are trusted not to harm the production systems they are investigating, whether by crashing or hanging them, extracting sensitive information from them, or degrading their performance. In this paper, we argue that current AI agents have the training but lack the trust to be performance engineers. We outline four components (prevention, detection/auditing, aborting/rollback, and retry/refocus) and highlight gaps where the approaches taken for human performance engineers fall short.
Teryl Taylor, Frederico Araujo, et al.
Big Data 2020
Anisa Halimi, Leonard Dervishi, et al.
PETS 2022
Chengkun Wei, Shouling Ji, et al.
IEEE TIFS