In this talk, we present a framework for learning interpretable, optimal policies from observational data. The framework consists of a causal teacher model, which produces counterfactual outcomes for different treatment actions, and a prescriptive student model, which distills a set of optimized policies into a tree. We show that the resulting prescriptive tree can be learned greedily for swift deployment. Because the greedy heuristic cannot incorporate the constraints that are often critical in enterprise applications, we introduce a scalable mixed-integer program that solves the constrained policy prescription problem via column generation. We will highlight results from an online test in which we applied this solution to premium seat upsell at a large US airline; the test showed a 7% increase in revenue over the legacy pricing benchmark.
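
To make the greedy student step concrete, here is a minimal Python sketch, not the implementation from the talk. It assumes the teacher has already been fitted and has produced a table `outcomes[i][t]` of predicted outcomes for unit `i` under treatment `t`. The function names (`best_action`, `grow`, `prescribe`), the toy data, and the split criterion (total predicted outcome when each child receives its single best treatment) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def best_action(rows: List[int], outcomes: List[List[float]]) -> Tuple[int, float]:
    """Return the single treatment with the highest total predicted outcome on `rows`."""
    n_actions = len(outcomes[0])
    totals = [sum(outcomes[i][t] for i in rows) for t in range(n_actions)]
    t_best = max(range(n_actions), key=totals.__getitem__)
    return t_best, totals[t_best]

def grow(X: List[List[float]], outcomes: List[List[float]],
         rows: List[int], depth: int, min_leaf: int = 1) -> Dict:
    """Greedily grow a prescriptive tree over the units indexed by `rows`."""
    action, value = best_action(rows, outcomes)
    node: Dict = {"action": action, "value": value}
    if depth == 0:
        return node
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so the right child is never empty.
        for thr in sorted({X[i][j] for i in rows})[:-1]:
            left = [i for i in rows if X[i][j] <= thr]
            right = [i for i in rows if X[i][j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Gain: improvement from letting each child pick its own treatment.
            gain = best_action(left, outcomes)[1] + best_action(right, outcomes)[1] - value
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, j, thr, left, right)
    if best is None:
        return node  # no split improves on a single treatment for this node
    _, j, thr, left, right = best
    node.update(
        feature=j,
        threshold=thr,
        left=grow(X, outcomes, left, depth - 1, min_leaf),
        right=grow(X, outcomes, right, depth - 1, min_leaf),
    )
    return node

def prescribe(node: Dict, x: List[float]) -> int:
    """Walk the tree and return the treatment prescribed for feature vector `x`."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["action"]

# Toy data (made up for illustration): one feature, two treatments.
X = [[1.0], [2.0], [3.0], [4.0]]
outcomes = [[5.0, 1.0], [4.0, 2.0], [1.0, 6.0], [2.0, 7.0]]
tree = grow(X, outcomes, rows=list(range(len(X))), depth=1)
```

On this toy data the depth-1 tree splits the single feature at 2.0 and prescribes treatment 0 on the left and treatment 1 on the right. The sketch has no way to express cross-leaf constraints (for example, a cap on how many units receive a given treatment), which is the gap the mixed-integer formulation in the abstract is meant to close.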