Many applications of practical interest rely on time evolution of Hamiltonians that are given by a sum of Pauli operators. Quantum circuits for exact time evolution of single Pauli operators are well known, and can be extended trivially to sums of commuting Paulis by concatenating the circuits of individual terms. In this paper we reduce the circuit complexity of Hamiltonian simulation by partitioning the Pauli operators into mutually commuting clusters and exponentiating the elements within each cluster after applying simultaneous diagonalization. We provide a practical algorithm for partitioning sets of Paulis into commuting subsets, and show that the proposed approach can help to significantly reduce both the number of cnot operations and circuit depth for Hamiltonians arising in quantum chemistry. The algorithms for simultaneous diagonalization are also applicable in the context of stabilizer states; in particular we provide novel four- and five-stage representations, each containing only a single stage of conditional gates.