This paper revisits a classical task of learning probabilistic mixture models. Our major goal is to sparsely learn the mixture weights to automatically determine the right number of clusters. The key idea is to use a novel Bernoulli prior on the mixture weights in a Bayesian learning framework, and formalize the task of determining the mixture weights as an ℓ0-regularized optimization problem. By leveraging a specific mathematical structure, we derive a quadratic time algorithm for efficiently solving the non-convex ℓ0-based problem. In experiments, we evaluate the performance of our proposed approach over existing methods in recovery capability and anomaly detection for synthetic as well as real-world data sets.