In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum entropy framework. All features within the same mixture component are tied and share the same mixture weight, and the mixture weights are trained discriminatively to maximize translation performance. This approach aims to bridge the gap between maximum-likelihood training and discriminative training for SMT. We show that the feature space can be partitioned in a variety of ways for different applications, for example by feature type, by word alignment, or by domain. The proposed approach significantly improves translation performance on a large-scale Arabic-to-English MT task. © 2011 Association for Computational Linguistics.
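The tying of features within a component can be illustrated with a minimal scoring sketch. It assumes that each component's contribution is the max-ent inner product of its feature weights θ with the hypothesis's feature values, and that the overall score is the λ-weighted sum of these component scores. The component names (`lexical`, `reordering`), feature names, and all numeric weights are hypothetical, and neither the maximum entropy training of θ nor the discriminative tuning of λ is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical types: feature name -> value, and component name -> features.
Features = Dict[str, float]
ComponentFeatures = Dict[str, Features]


def component_score(features: Features, theta: Features) -> float:
    """Inner max-ent score of one component: sum_i theta_i * f_i."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())


def mixture_score(feats: ComponentFeatures,
                  thetas: Dict[str, Features],
                  lambdas: Dict[str, float]) -> float:
    """Log-linear mixture score: sum_k lambda_k * component_score_k.

    Every feature in component k is scaled by the same tied weight lambda_k.
    """
    return sum(lambdas[k] * component_score(feats.get(k, {}), thetas[k])
               for k in lambdas)


# Hypothetical partition of the feature space by feature type.
thetas = {
    "lexical": {"w:house": 0.5, "w:home": 0.2},
    "reordering": {"swap": -0.3},
}
lambdas = {"lexical": 1.0, "reordering": 2.0}

# Two hypothetical translation hypotheses with their active features.
hypotheses: List[Tuple[str, ComponentFeatures]] = [
    ("A", {"lexical": {"w:house": 1.0}, "reordering": {"swap": 1.0}}),
    ("B", {"lexical": {"w:home": 1.0}, "reordering": {}}),
]

best = max(hypotheses, key=lambda h: mixture_score(h[1], thetas, lambdas))
for label, feats in hypotheses:
    print(label, round(mixture_score(feats, thetas, lambdas), 3))
print("best:", best[0])
```

Because λ multiplies a whole component, retuning it rescales every feature in that component at once without retraining the individual θ values; this is the sense in which the small set of mixture weights can be tuned discriminatively while the large feature set is trained by maximum entropy.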