Conditionally Independent Data Generation
Conditional independence (CI) is a fundamental concept with wide applications in machine learning and causal inference. Although the problems of testing CI and estimating divergences have been extensively studied, the complementary problem of generating data that satisfies CI has received much less attention. Given samples from an input data distribution, we formulate the problem of generating samples from a distribution that is close to the input distribution and satisfies CI. We establish a general characterization of CI in terms of a divergence identity that holds for a large class of divergences satisfying separability and strict convexity properties. Based on the Jensen-Shannon version of this identity, we propose an architecture that leverages the capabilities of generative adversarial networks (GANs) to enforce CI in an end-to-end differentiable manner. As one illustration of the problem formulation and architecture, we consider applications to notions of fairness that can be expressed as CI constraints. We demonstrate the generation of data that trades off adherence to fairness criteria against classification accuracy.
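To make the divergence characterization concrete, the following sketch (our own illustration, not the paper's implementation) works with discrete distributions: X ⊥ Y | Z holds exactly when the joint p(x, y, z) coincides with its CI factorization p(x | z) p(y | z) p(z), i.e., when the Jensen-Shannon divergence between the two is zero. The distributions, variable names, and helper functions below are hypothetical examples chosen for illustration.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ci_factorization(p):
    """Project a joint p(x, y, z) onto the CI form p(x|z) p(y|z) p(z)."""
    pz = p.sum(axis=(0, 1))     # marginal p(z)
    pxz = p.sum(axis=1)         # marginal p(x, z)
    pyz = p.sum(axis=0)         # marginal p(y, z)
    return np.einsum('xz,yz,z->xyz', pxz / pz, pyz / pz, pz)

# Build a joint over binary X, Y, Z that satisfies X ⊥ Y | Z by construction.
pz = np.array([0.4, 0.6])
px_given_z = np.array([[0.3, 0.7], [0.8, 0.2]])   # rows indexed by z
py_given_z = np.array([[0.5, 0.5], [0.1, 0.9]])
p_ci = np.einsum('z,zx,zy->xyz', pz, px_given_z, py_given_z)

# A CI distribution has (near-)zero divergence to its own factorization...
d_ci = js_divergence(p_ci.ravel(), ci_factorization(p_ci).ravel())

# ...while a generic random joint does not.
rng = np.random.default_rng(0)
p_dep = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)
d_dep = js_divergence(p_dep.ravel(), ci_factorization(p_dep).ravel())
print(d_ci, d_dep)
```

The GAN architecture described in the abstract can be thought of as driving the analogue of `d_dep` toward zero for the generated distribution, with a discriminator estimating the Jensen-Shannon divergence rather than computing it in closed form.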