Manipulating stock prices with an adversarial tweet · Research · Kim Martineau · 13 Jul 2022 · Adversarial Robustness and Privacy, Trustworthy AI
Securing AI systems with adversarial robustness · Deep Dive · Pin-Yu Chen · 15 Dec 2021 · 8 minute read · Adversarial Robustness and Privacy, AI, Data and AI Security
Researchers develop defenses against deep learning hack attacks · Release · Ambrish Rawat, Killian Levacher, and Mathieu Sinn · 05 Aug 2021 · 7 minute read · Adversarial Robustness and Privacy, Data and AI Security, Generative AI, Security, Trustworthy AI
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models · Xiaomeng Xu, Pin-Yu Chen, et al. · 2025 · AAAI 2025
Retention Score: Quantifying Jailbreak Risks for Vision Language Models · Zhaitang Li, Pin-Yu Chen, et al. · 2025 · AAAI 2025
Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training · Kristjan Greenewald, Yuancheng Yu, et al. · 2024 · NeurIPS 2024
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI · Ambrish Rawat, Stefan Schoepf, et al. · 2024 · NeurIPS 2024
Membership Inference Attacks Against Time-Series Models · Noam Koren, Abigail Goldsteen, et al. · 2024 · ACML 2024
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks · Giandomenico Cornacchia, Kieran Fraser, et al. · 2024 · AIES 2024