Recent work has explored using the stabilizer formalism to classically simulate quantum circuits containing a few non-Clifford gates. The computational cost of such methods is directly related to the notion of stabilizer rank, which for a pure state ψ is defined to be the smallest integer χ such that ψ is a superposition of χ stabilizer states. Here we develop a comprehensive mathematical theory of the stabilizer rank and the related approximate stabilizer rank. We also present a suite of classical simulation algorithms with broader applicability and significantly improved performance over the previous state-of-the-art. A new feature is the capability to simulate circuits composed of Clifford gates and arbitrary diagonal gates, extending the reach of a previous algorithm specialized to the Clifford+T gate set. We implemented the new simulation methods and used them to simulate quantum algorithms with 40-50 qubits and over 60 non-Clifford gates, without resorting to high-performance computers. We report a simulation of the Quantum Approximate Optimization Algorithm in which we process superpositions of χ ∼ 106 stabilizer states and sample from the full n-bit output distribution, improving on previous simulations which used ∼ 103 stabilizer states and sampled only from single-qubit marginals. We also simulated instances of the Hidden Shift algorithm with circuits including up to 64 T gates or 16 CCZ gates; these simulations showcase the performance gains available by optimizing the decomposition of a circuit's non-Clifford components.