The emergence of numerous modalities for data generation necessitates the development of machine learning techniques that can perform efficient inference with multi-modal data. In this paper, we present an approach for learning discriminant low-dimensional projections from supervised multi-modal data. We construct intra- and inter-class similarity graphs for each modality and optimize for consensus projections in the kernel space. The features obtained with these projections can then be used to train a classifier for consensus inference. We also provide methods for out-of-sample extension to novel test data. Classification results on standard multi-modal data sets demonstrate the efficacy of our method.
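
The pipeline summarized above can be illustrated with a minimal sketch. It is not the formulation of this paper; it is one plausible graph-embedding reading of it, under several assumptions that are ours alone: Gaussian kernels per modality, binary k-nearest-neighbor graphs, a single coefficient matrix `A` shared by all modalities as the "consensus" projection, a trace-ratio objective solved as a regularized generalized eigenproblem, and an out-of-sample rule that averages the projected test-versus-train kernels. All function names (`rbf_kernel`, `class_graphs`, `fit_consensus`, `embed`) and parameters (`gamma`, `k`, `dim`, `reg`) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def class_graphs(K, y, k=3):
    """Symmetric binary k-NN graphs built from kernel similarities.

    W_in links each sample to its k most similar same-class samples,
    W_out to its k most similar samples from other classes.
    """
    n = len(y)
    W_in, W_out = np.zeros((n, n)), np.zeros((n, n))
    idx = np.arange(n)
    for i in range(n):
        same = idx[(y == y[i]) & (idx != i)]
        diff = idx[y != y[i]]
        W_in[i, same[np.argsort(-K[i, same])[:k]]] = 1.0
        W_out[i, diff[np.argsort(-K[i, diff])[:k]]] = 1.0
    return np.maximum(W_in, W_in.T), np.maximum(W_out, W_out.T)

def laplacian(W):
    """Unnormalized graph Laplacian D - W."""
    return np.diag(W.sum(axis=1)) - W

def fit_consensus(kernels, y, dim=2, k=3, reg=1e-3):
    """Shared kernel-space coefficients A (assumption of this sketch).

    Maximizes the summed inter-class scatter relative to the summed
    intra-class scatter over all modalities.
    """
    n = len(y)
    S_in, S_out = np.zeros((n, n)), np.zeros((n, n))
    for K in kernels:
        W_in, W_out = class_graphs(K, y, k)
        S_in += K @ laplacian(W_in) @ K
        S_out += K @ laplacian(W_out) @ K
    # Regularize so that S_in is positive definite.
    S_in += reg * max(np.trace(S_in) / n, 1e-12) * np.eye(n)
    # Solve S_out a = lambda S_in a by Cholesky whitening of S_in.
    C_inv = np.linalg.inv(np.linalg.cholesky(S_in))
    M = C_inv @ S_out @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)
    # eigh sorts eigenvalues in ascending order; keep the largest `dim`.
    return C_inv.T @ vecs[:, ::-1][:, :dim]

def embed(kernels, A):
    """Average the per-modality projections K_m @ A.

    Each K_m has one row per sample to embed and one column per
    training sample, so the same call handles novel test data.
    """
    return sum(K @ A for K in kernels) / len(kernels)

# Toy usage with two synthetic modalities and two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X1 = rng.normal(size=(40, 5)) + y[:, None]
X2 = rng.normal(size=(40, 3)) - y[:, None]
train_kernels = [rbf_kernel(X1, X1, 0.2), rbf_kernel(X2, X2, 0.3)]
A = fit_consensus(train_kernels, y, dim=2)
Z_train = embed(train_kernels, A)  # features for training a classifier

T1, T2 = rng.normal(size=(4, 5)), rng.normal(size=(4, 3))
Z_test = embed([rbf_kernel(T1, X1, 0.2), rbf_kernel(T2, X2, 0.3)], A)
```

In this sketch, `Z_train` would be passed to any standard classifier, and `Z_test` shows how unseen samples are mapped by evaluating each modality's kernel against the training set. Sharing one `A` across modalities is the simplest way to force agreement; per-modality projections with an explicit agreement penalty would be an equally reasonable reading.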