Effective Red-Teaming of Policy-Adherent Agents. Itay Nakash, George Kour, et al. (2025). EMNLP 2025, conference paper.
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models. George Kour, Itay Nakash, et al. (2025). ACL 2025, conference paper.
Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. (2025). NAACL 2025, conference paper.
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. Itay Nakash, George Kour, et al. (2025). NAACL 2025, conference paper.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, et al. (2023). EMNLP 2023, workshop paper.
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora. George Kour, Samuel Ackerman, et al. (2022). EMNLP 2022, workshop paper.
Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation. George Kour, Marcel Zalmanovici, et al. (2022). AAAI 2022, workshop paper.