Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning. Zhiyuan He, Yijun Yang, et al. 2024. ICML 2024.
Larimar: Large Language Models with Episodic Memory Control. Payel Das, Subhajit Chaudhury, et al. 2024. ICML 2024.
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs. Swanand Ravindra Kadhe, Farhan Ahmed, et al. 2024. ICML 2024.
CharED: Character-wise Ensemble Decoding for Large Language Models. Kevin Gu, Eva Tuecke, et al. 2024. ICML 2024.
Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation. Tomas Bueno Momcilovic, Beat Buesser, et al. 2024. xAI 2024.
AUTOLYCUS: Exploiting Explainable Artificial Intelligence (XAI) for Model Extraction Attacks against Interpretable Models. Abdullah Caglar Oksuz, Anisa Halimi, et al. 2024. PETS 2024.
Identifying Homogeneous and Interpretable Groups for Conformal Prediction. Natalia Martinez Gil, Dhaval Patel, et al. 2024. UAI 2024.
Quantifying Representation Reliability in Self-Supervised Learning Models. Young Jin Park, Hao Wang, et al. 2024. UAI 2024.
Exploring Vulnerabilities in LLMs: A Red Teaming Approach to Evaluate Social Bias. Yuya Jeremy Ong, Jay Pankaj Gala, et al. 2024. IEEE CISOSE 2024.
Privacy-Preserving Verification of Preprocessing in Machine Learning Models. Wenbiao Li, Anisa Halimi, et al. 2024. PETS 2024.