DARE to Diversify: DAta Driven and Diverse LLM REd Teaming
Abstract
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's overnight popularity, and are integrated into products used by millions of people every day, such as search engines and productivity suites. Yet the societal impact of LLMs, encompassing both benefits and harms, is not well understood. Inspired by cybersecurity practices, red-teaming is emerging as a technique to uncover model vulnerabilities. Despite increasing attention from industry, academia, and government centered around red-teaming LLMs, such efforts remain limited in the diversity of their focus, approaches, and participants. Importantly, given that LLMs are becoming ubiquitous, it is imperative that red-teaming efforts scale out to include large segments of the research community, practitioners, and the people who are directly affected by the deployment of these systems. The goal of this tutorial is twofold. First, we introduce the topic of LLM red-teaming by reviewing the state of the art in red-teaming practices, from ad-hoc events to automated AI-focused approaches, exposing gaps in both the techniques and the coverage of targeted harms. Second, we plan to engage the audience in a hands-on, interactive exercise in LLM red-teaming to showcase the ease (or difficulty) of exposing model vulnerabilities, contingent on both the targeted harm and model capabilities. We believe that the KDD community of researchers and practitioners is in a unique position to address the existing gaps in red-teaming approaches, given its longstanding research and practice of extracting knowledge from data.

Why this tutorial?

Nowadays, access to a Large Language Model (LLM) with billions of parameters is a website away. Given the demonstrated high performance of LLMs on a variety of tasks (e.g., summarization, content creation), there is widespread excitement to use these systems in a myriad of downstream applications. Concurrently, there are growing efforts to understand and categorize the risks associated with LLMs. Red-teaming, or interactive probing, is one such area of focus. To most effectively uncover potential risks via red-teaming, we strongly believe that a participatory effort is paramount. In particular, with this tutorial, we seek to leverage the diverse set of skills and lived experiences of KDD conference attendees in order to discover LLM failures. In doing so, we hope to affirm the notion that red-teaming is not a point-in-time endeavor; rather, it is an ongoing process. Effective red-teaming is the result of interactive exploration and iterative improvement.