Trustworthy Generation
Data is key to technological innovations. We develop theoretical and algorithmic frameworks for generative AI to synthesize realistic, diverse, and targeted data. Our methods facilitate data augmentation for trustworthy machine learning and accelerate novel designs for drug and material discovery, and beyond.
Our work
- Accelerating molecular optimization with AI (Deep Dive)
- AI boosts the discovery of metamaterials vital for next-gen gadgets (Research)
- IBM AI finds new peptides – paving the way to better drug design (Research)
- DualTKB: A Dual Learning Bridge between Text and Knowledge Base (Research)
- Image captioning as an assistive technology (News)
Tools + code
CLaSS: Controlled Latent attribute Space Sampling
Code for an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled using a deep generative autoencoder.
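As a rough illustration of classifier-guided sampling in a latent space, the sketch below uses plain rejection sampling: draw a latent vector, score it with an attribute classifier, and keep it with probability equal to that score. Everything here is an assumption made for illustration and is not taken from the project's code: the standard normal prior stands in for a fitted latent density, the linear classifier and its weights are hypothetical, and the decoder that would map accepted latent vectors back to molecules is omitted.

```python
import math
import random


def sample_latent(dim, rng):
    """Draw z from a standard normal prior (stand-in for a fitted latent density)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def attribute_probability(z, weights, bias):
    """Hypothetical linear classifier on the latent space: p(attribute | z)."""
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def controlled_samples(n, dim, weights, bias, seed=0):
    """Rejection sampling: accept each z with probability p(attribute | z)."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        z = sample_latent(dim, rng)
        if rng.random() < attribute_probability(z, weights, bias):
            accepted.append(z)
    return accepted


# Toy setup: the attribute depends only on the first latent coordinate.
weights, bias = [3.0, 0.0], 0.0
samples = controlled_samples(200, 2, weights, bias)
mean_first = sum(z[0] for z in samples) / len(samples)
```

Accepted samples concentrate where the classifier is confident, so `mean_first` shifts toward positive values even though the prior is centered at zero. In a real pipeline each accepted `z` would then be passed through the autoencoder's decoder.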
TabFormer
PyTorch source code and data for tabular transformers that model multivariate time series, showcasing synthetic generation and analysis of card transaction data.
Sobolev Independence Criterion
Code for non-linear feature selection with provable false discovery rate control, using generative models and holdout randomization tests.
Fair Mixup
Code for training fair classifiers across different modalities such as tabular, language and image data, using fair mixup augmentation as a regularizer.
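One way to picture mixup used as a fairness regularizer is sketched below: interpolate paired samples from two groups and measure how quickly the model's mean prediction changes along that interpolation path. This is a minimal sketch under stated assumptions, not the project's implementation. The function names, the pairing of samples, and the finite-difference estimate are all hypothetical choices made for illustration.

```python
def mixup(x_a, x_b, t):
    """Interpolate two feature vectors: t * x_a + (1 - t) * x_b."""
    return [t * a + (1.0 - t) * b for a, b in zip(x_a, x_b)]


def mean_prediction(model, group_a, group_b, t):
    """Average model output over paired interpolated samples at mixing level t."""
    preds = [model(mixup(a, b, t)) for a, b in zip(group_a, group_b)]
    return sum(preds) / len(preds)


def fairness_penalty(model, group_a, group_b, t, eps=1e-3):
    """Finite-difference estimate of how fast the mean prediction
    changes along the path between the two groups at mixing level t."""
    upper = mean_prediction(model, group_a, group_b, t + eps)
    lower = mean_prediction(model, group_a, group_b, t - eps)
    return abs(upper - lower) / (2.0 * eps)


# Toy data: the groups differ only in the first feature.
group_a = [[1.0, 0.0], [3.0, 1.0]]
group_b = [[0.0, 0.0], [0.0, 1.0]]

# Hypothetical models: one relies on the group-dependent feature, one ignores it.
biased_model = lambda x: 0.5 * x[0] + x[1]
neutral_model = lambda x: x[1]

biased_penalty = fairness_penalty(biased_model, group_a, group_b, t=0.5)
neutral_penalty = fairness_penalty(neutral_model, group_a, group_b, t=0.5)
```

The model that relies on the group-dependent feature receives a larger penalty than the one that ignores it. During training, a term like this would typically be added to the task loss with a weighting coefficient, with `t` sampled at each step.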
Fold2Seq
Code for designing protein sequences conditioned on a specific target 3D fold using a novel transformer-based generative framework.
Unbalanced Sobolev Descent
Code for Unbalanced Sobolev Descent, which generates unbalanced data using birth and death processes.
ReGen
Code for bi-directional text and knowledge base generation using pretrained language models.
Publications
- MRS Spring Meeting 2023
- EMNLP 2022
- EMNLP 2022
- NeurIPS 2022
- BPM 2022
- KDD 2022