Curiosity-driven Red-teaming for Large Language Models
- Zhang-wei Hong
- Idan Shenfeld
- et al.
- 2024
- ICLR 2024
I am a PI at the MIT-IBM AI Research Lab, Research Manager at IBM Research, and the Chief Scientist at EBI within the Exploratory Sciences unit. I lead the AI Models Alignment Team, which is responsible for tuning and alignment of IBM foundation models. I also serve as the lead for the synthetic data generation efforts at IBM. My lab is located on campus, at 314 Main St (the new MIT Museum building) in Cambridge, where my group works on topics including foundation models, deep generative models, self-supervised learning, differential and statistical privacy, density ratio estimation, machine common sense, model calibration, and uncertainty quantification. Before joining MIT-IBM, I obtained my PhD at the University of Edinburgh, where I worked with Prof Charles Sutton and Prof Michael U. Gutmann on variational inference for generative models and deep learning.