Chih-kai Ting, Karl Munson, et al.
AAAI 2023
The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and non-transferable across LLMs or tasks. Therefore, this paper proposes AutoPDL, an automated approach to discover good LLM agent configurations. Our method frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and six LLMs (ranging from 8B to 70B parameters) show consistent accuracy gains (9.5 ± 17.5 percentage points), up to 68.9 percentage points, and reveal that selected prompting strategies vary across models and tasks.
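The successive-halving search mentioned in the abstract can be sketched roughly as below. This is a generic illustration only: the candidate structure (a prompting pattern paired with demonstrations), the budget schedule, and the `evaluate` callback are assumptions for exposition, not AutoPDL's actual interface.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate: (prompting pattern name, few-shot demonstrations).
Candidate = Tuple[str, list]

def successive_halving(
    candidates: List[Candidate],
    validation_set: Sequence,
    evaluate: Callable[[Candidate, Sequence], float],
    initial_budget: int = 32,
    eta: int = 2,
) -> Candidate:
    """Evaluate all surviving candidates on a growing sample of validation
    examples, keeping only the top 1/eta fraction each round until one
    configuration remains."""
    budget = initial_budget
    while len(candidates) > 1:
        # Score each candidate on a random sample whose size grows per round.
        sample = random.sample(list(validation_set), min(budget, len(validation_set)))
        scored = sorted(candidates, key=lambda c: evaluate(c, sample), reverse=True)
        # Keep the best 1/eta fraction and increase the per-candidate budget.
        candidates = scored[: max(1, len(candidates) // eta)]
        budget *= eta
    return candidates[0]
```

This keeps the evaluation cost low for the many weak configurations that are eliminated early, while spending most of the budget on the few promising ones.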
Sahil Suneja, Yufan Zhuang, et al.
ACM TOSEM
Ziv Nevo, Orna Raz, et al.
ASE 2025
Saurabh Pujar, Yunhui Zheng, et al.
Empirical Software Engineering