There has been a surge of interest in learning optimal decision trees using mixed-integer programs (MIP) as heuristic-based methods do not guarantee optimality and find it challenging to incorporate practical operational constraints. Existing MIP methods rely on an arc-based binary-tree formulations that are limited to datasets having a few thousand samples, sample-level constraints, and linear metrics. We propose a novel path-based MIP model that can be solved using a scalable column generation framework to produce a multiway-split tree that is more interpretable due to its shorter rules. Our method can optimize nonlinear metrics like F1 score and satisfy a broader class of constraints. We demonstrate its efficacy with extensive experiments including a million-sample dataset and report up to a 24X reduction in runtime compared to the state-of-art MIP-based methods.