Better Bias Benchmarking of Language Models via Multi-factor Analysis. Hannah Powers, Ioana Baldini Soares, et al. NeurIPS 2024.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. NeurIPS 2024.
Towards Collecting Royalties for Copyrighted Data for Generative Models. Heiko Ludwig, Yi Zhou, et al. ICWS 2024.
Prompt Templates: A Methodology for Improving Manual Red Teaming Performance. Brandon Dominique, David Piorkowski, et al. CHI 2024.
Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions. Hyo Jin Do, Rachel Ostrand, et al. CHI 2024.
FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs. Swanand Ravindra Kadhe, Anisa Halimi, et al. NeurIPS 2023.
Cost-Aware Counterfactuals for Black Box Explanations. Natalia Martinez Gil, Kanthi Sarpatwar, et al. NeurIPS 2023.
Influence Based Approaches to Algorithmic Fairness: A Closer Look. Soumya Ghosh, Prasanna Sattigeri, et al. NeurIPS 2023.
Weakly Supervised Detection of Hallucinations in LLM Activations. Miriam Rateike, Celia Cintas, et al. NeurIPS 2023.
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset. Anna Richter, Brooklyn Sheppard, et al. NeurIPS 2023.