Trustworthy Generation
Data is key to technological innovation. We develop theoretical and algorithmic frameworks for generative AI that synthesize realistic, diverse, and targeted data. Our methods support data augmentation for trustworthy machine learning and accelerate novel designs in drug and materials discovery, and beyond.
Our work
What is retrieval-augmented generation?
Explainer | Kim Martineau
- AI
- Explainable AI
- Generative AI
- Natural Language Processing
- Trustworthy Generation
Accelerating molecular optimization with AI
Deep Dive | Payel Das, Samuel Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, and Pin-Yu Chen | 11 minute read
- Accelerated Discovery
- Generative AI
- Healthcare
- Materials Discovery
- Trustworthy AI
- Trustworthy Generation
AI boosts the discovery of metamaterials vital for next-gen gadgets
Research | Youssef Mroueh, Karthikeyan Shanmugam, and Payel Das | 10 minute read
- AI
- Materials Discovery
- Trustworthy Generation
- Uncertainty Quantification
IBM AI finds new peptides – paving the way to better drug design
Research | Aleksandra Mojsilovic and Payel Das | 4 minute read
- Accelerated Discovery
- AI
- Generative AI
- Materials Discovery
- Trustworthy Generation
DualTKB: A Dual Learning Bridge between Text and Knowledge Base
Research | Pierre Dognin | 6 minute read
- Knowledge and Reasoning
- Natural Language Processing
- Trustworthy Generation
Image captioning as an assistive technology
News | Youssef Mroueh | 5 minute read
- Computer Vision
- Trustworthy AI
- Trustworthy Generation
Tools + code
CLaSS: Controlled Latent attribute Space Sampling
Code for an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled using a deep generative autoencoder.
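The CLaSS description above (classifiers trained on a latent space guiding attribute-controlled generation) can be illustrated with a minimal sketch of classifier-guided rejection sampling in a latent space. This is not the released CLaSS code. It assumes a standard normal latent prior (as in a VAE), replaces the trained attribute classifiers with a hypothetical toy function, treats attributes as conditionally independent given z, and omits the encoder and decoder. The names `rejection_sample_latents` and `hypothetical_attribute_clf` are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence

# Maps a latent code z to p(attribute | z) in [0, 1]; stands in for a trained latent classifier.
Classifier = Callable[[Sequence[float]], float]


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def rejection_sample_latents(
    classifiers: List[Classifier],
    latent_dim: int,
    n_samples: int,
    max_draws: int = 100_000,
    seed: int = 0,
) -> List[List[float]]:
    """Draw z ~ N(0, I) and keep it with probability prod_i p(a_i | z).

    Accepted codes follow p(z | a) proportional to p(z) * prod_i p(a_i | z).
    The rejection-sampling bound M = 1 holds because each classifier output is at most 1.
    Stops early (possibly with fewer than n_samples codes) after max_draws proposals.
    """
    rng = random.Random(seed)
    accepted: List[List[float]] = []
    draws = 0
    while len(accepted) < n_samples and draws < max_draws:
        draws += 1
        # Assumption: the latent prior is a standard normal.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        accept_prob = 1.0
        for clf in classifiers:
            # Assumption: attributes are conditionally independent given z.
            accept_prob *= clf(z)
        if rng.random() < accept_prob:
            accepted.append(z)
    return accepted


def hypothetical_attribute_clf(z: Sequence[float]) -> float:
    """Hypothetical classifier: the attribute is likely when the first latent coordinate is large."""
    return sigmoid(10.0 * z[0])


if __name__ == "__main__":
    codes = rejection_sample_latents([hypothetical_attribute_clf], latent_dim=4, n_samples=200, seed=1)
    mean_first = sum(z[0] for z in codes) / len(codes)
    print(f"accepted {len(codes)} codes; mean of z[0] = {mean_first:.2f}")
    # In a full pipeline, each accepted z would be passed to the autoencoder's decoder.
```

Sampling in the latent space keeps the generative model fixed: adding a new attribute only requires training another (cheap) latent classifier rather than retraining the autoencoder, which is the efficiency the project description alludes to.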
TabFormer
PyTorch source code and data for tabular transformers that model multivariate time series, showcased on the synthetic generation and analysis of card transaction data.
Sobolev Independence Criterion
Code for nonlinear feature selection with provable false discovery rate control, using generative models and holdout randomization tests.
Fair Mixup
Code for training fair classifiers across different modalities such as tabular, language and image data, using fair mixup augmentation as a regularizer.
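To make "fair mixup augmentation as a regularizer" concrete, here is a minimal sketch of a demographic-parity style mixup penalty for a logistic classifier, in plain Python. It assumes the penalty integrates the absolute rate of change of the mean prediction along straight-line interpolations t·x0 + (1 - t)·x1 between paired samples from the two groups. The logistic model, the midpoint-rule integration, and the helper names `fair_mixup_dp_penalty`, `dp_gap`, and `regularized_loss` are illustrative choices, not the project's API.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def predict(w: Vector, b: float, x: Vector) -> float:
    """Logistic classifier f(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fair_mixup_dp_penalty(w: Vector, b: float, group0: Sequence[Vector],
                          group1: Sequence[Vector], n_steps: int = 50) -> float:
    """Midpoint-rule estimate of  int_0^1 | d/dt mean_i f(t*x0_i + (1-t)*x1_i) | dt.

    Pairs are formed by zipping equal-size batches from the two groups.
    """
    pairs = list(zip(group0, group1))
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) / n_steps
        mean_deriv = 0.0
        for x0, x1 in pairs:
            mixed = [t * a + (1.0 - t) * c for a, c in zip(x0, x1)]
            p = predict(w, b, mixed)
            # Chain rule for a logistic model: d/dt f(mixed) = p(1-p) * w . (x0 - x1)
            mean_deriv += p * (1.0 - p) * sum(wi * (a - c) for wi, a, c in zip(w, x0, x1))
        total += abs(mean_deriv / len(pairs))
    return total / n_steps


def dp_gap(w: Vector, b: float, group0: Sequence[Vector], group1: Sequence[Vector]) -> float:
    """Demographic-parity gap |E[f(x) | group 0] - E[f(x) | group 1]|."""
    m0 = sum(predict(w, b, x) for x in group0) / len(group0)
    m1 = sum(predict(w, b, x) for x in group1) / len(group1)
    return abs(m0 - m1)


def regularized_loss(w: Vector, b: float, xs: Sequence[Vector], ys: Sequence[int],
                     group0: Sequence[Vector], group1: Sequence[Vector], lam: float = 1.0) -> float:
    """Binary cross-entropy plus lam times the fair mixup penalty."""
    eps = 1e-12  # guards log(0)
    bce = 0.0
    for x, y in zip(xs, ys):
        p = predict(w, b, x)
        bce -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return bce / len(xs) + lam * fair_mixup_dp_penalty(w, b, group0, group1)
```

Because the penalty integrates |dm/dt| along the whole path, it is at least as large as the end-point demographic-parity gap (up to integration error) and also penalizes gaps at intermediate interpolation points, which gives a smoother training signal. For neural networks, the closed-form derivative used here would be replaced by automatic differentiation.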
Fold2Seq
Code for designing protein sequences conditioned on a specific target 3D fold using a novel transformer-based generative framework.
Unbalanced Sobolev Descent
Code for Unbalanced Sobolev Descent, a method for generating unbalanced data using birth and death processes.
ReGen
Code for bidirectional generation between text and knowledge bases using pretrained language models.
Publications
- Pablo Navarro, Celia Cintas, et al. (2023). IJCAI 2023.
- Keerthiram Murugesan, Sarathkrishna Swaminathan, et al. (2023). ACL 2023.
- Ella Neeman, Roee Aharoni, et al. (2023). ACL 2023.
- Karthikeyan Natesan Ramamurthy, Aldo Guzmán-Sáenz, et al. (2023). ICASSP 2023.
- Matteo Manica, Jannis Born, et al. (2023). npj Computational Materials.
- Fengjie Wang, Xuye Liu, et al. (2023). CHI 2023.