Developing value-aligned agents is a complex undertaking and an ongoing challenge in the field of AI. Indeed, designing Large Language Models (LLMs) that can balance multiple, possibly conflicting moral values depending on context is a problem of paramount importance. In this paper, we propose a system that performs contextual value alignment through contextual aggregation of possible responses. This aggregation is achieved by integrating the subset of candidate LLM responses best suited to a user's input, taking into account features extracted about the user's moral preferences. The proposed system, trained on the Moral Integrity Corpus, displays better alignment to human values than state-of-the-art baselines.

Index Terms—value alignment, contextual alignment
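To make the aggregation step concrete, here is a minimal sketch of one plausible reading of "integrating the subset of candidate LLM responses best suited to a user's input, taking into account features extracted about the user's moral preferences." All function names, the feature dimensionality, and the scoring rule are illustrative assumptions, not the paper's actual method; the feature extractors are stubbed with random placeholders where a real system would use learned models.

```python
import numpy as np

# Hypothetical sketch of contextual response aggregation. None of these
# names come from the paper; the two feature extractors are placeholders
# standing in for learned models.

def moral_preference_features(user_input: str) -> np.ndarray:
    """Placeholder: map a user's input to a weight vector over moral
    value dimensions (e.g., care, fairness, loyalty, authority, purity)."""
    rng = np.random.default_rng(abs(hash(user_input)) % (2**32))
    w = rng.random(5)
    return w / w.sum()

def value_profile(response: str) -> np.ndarray:
    """Placeholder: score a candidate response along the same value
    dimensions (in practice, a learned classifier)."""
    rng = np.random.default_rng(abs(hash(response)) % (2**32))
    return rng.random(5)

def aggregate(user_input: str, candidates: list[str], k: int = 2) -> str:
    """Retain the k candidates whose value profiles best match the user's
    moral-preference features, then return the top match. A real system
    might instead fuse the retained subset into a single response."""
    prefs = moral_preference_features(user_input)
    scored = sorted(candidates,
                    key=lambda r: float(prefs @ value_profile(r)),
                    reverse=True)
    return scored[:k][0]

if __name__ == "__main__":
    candidates = [
        "Response emphasizing honesty.",
        "Response emphasizing harm avoidance.",
        "Response emphasizing loyalty.",
    ]
    print(aggregate("Should I report a friend's mistake?", candidates))
```

The design choice sketched here, scoring each candidate by the inner product between the user's preference weights and the response's value profile, is one simple instantiation of contextual aggregation; the paper's trained system may use a different selection or fusion rule.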