About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Abstract
Unlike traditional supervised learning, in many settings only partial feedback is available. Such settings encompass a wide variety of applications including pricing, online marketing and precision medicine. We approach this task as a domain adaptation problem and propose a self-training algorithm which imputes outcomes with finite discrete values for finite unseen actions in the observational data to simulate a randomized trial. We offer a theoretical motivation for this approach by providing an upper bound on the generalization error defined on a randomized trial under the self-training objective. We empirically demonstrate the effectiveness of the proposed algorithms.