Publication
EMNLP 2024
Short paper
Don’t be my Doctor! Recognizing Healthcare Advice in Large Language Models
Abstract
Large language models (LLMs) have seen increasing popularity in daily use, with widespread adoption as virtual assistants, chatbots, predictors, and more. This raises the need for safeguards and guardrails to ensure that LLM outputs do not mislead or harm users, especially in highly regulated domains such as healthcare, where misleading advice may lead users to unknowingly commit malpractice. Despite this vulnerability, the majority of guardrail benchmarking datasets do not focus sufficiently on medical advice. In this paper, we present the HeAL benchmark (HEalth Advice in LLMs), a health-advice benchmark dataset that has been manually curated and annotated to evaluate LLMs' capability in recognizing health advice, which we use to safeguard LLMs deployed in industrial settings.