Tabular Data Synthesis with GANs for Adaptive AI Models
Abstract
When the deployment environment changes, for example through a demographic shift, ML models often perform poorly because the training data no longer represents that environment. Privacy concerns worsen the problem by severely limiting the training data that is available. In this paper, we present a framework that uses a GAN-based synthesizer to generate synthetic data that satisfies user-defined constraints, expressed as marginal distributions of selected columns, while striving to preserve the distributions observed in the original data. The framework takes as input an original dataset and a set of user-defined constraints, and synthesizes data that adheres to these constraints while capturing the underlying distributions present in the given data. The result is a customizable and realistic data generation solution that balances constraint satisfaction against preservation of data distributions. We validate and demonstrate the effectiveness of our technique through experimentation.
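To make the notion of a constraint "expressed as a marginal distribution of a selected column" concrete, the following is a minimal sketch of how such a constraint might be written down and how far a table is from satisfying it. It is not the GAN-based synthesizer itself. The column names, the toy tables, the helper functions `empirical_marginal` and `total_variation`, and the choice of total variation distance as the gap measure are all assumptions made for illustration.

```python
from collections import Counter


def empirical_marginal(rows, column):
    """Return the empirical distribution of one column as {value: probability}."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Hypothetical user-defined constraint: a target marginal for one selected column.
target = {"age_group": {"young": 0.5, "old": 0.5}}

# Toy stand-ins for the original table and for a synthesized table (illustrative only).
original = (
    [{"age_group": "young", "income": "low"}] * 8
    + [{"age_group": "old", "income": "high"}] * 2
)
synthetic = (
    [{"age_group": "young", "income": "low"}] * 5
    + [{"age_group": "old", "income": "high"}] * 5
)

# Constraint satisfaction: distance between a table's marginal and the target marginal.
constraint_gap_original = total_variation(
    empirical_marginal(original, "age_group"), target["age_group"]
)
constraint_gap_synthetic = total_variation(
    empirical_marginal(synthetic, "age_group"), target["age_group"]
)

# Preservation: how far an unconstrained column's marginal moved from the original.
income_shift = total_variation(
    empirical_marginal(original, "income"), empirical_marginal(synthetic, "income")
)

print(constraint_gap_original, constraint_gap_synthetic, income_shift)
```

In this toy setup the synthetic table meets the target marginal on the constrained column, while the marginal of the unconstrained column shifts as a side effect. That is one simple way to see the balance between constraint satisfaction and preservation of the original distributions.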