This paper presents a robust deep in-memory machine learning classifier with a stochastic gradient descent (SGD)-based on-chip trainer using a standard 16-kB 6T SRAM array. The deep in-memory architecture (DIMA) enhances both energy efficiency and throughput over conventional digital architectures by reading multiple bits per bit line (BL) per read cycle and by employing mixed-signal processing in the periphery of the bit-cell array. Although these techniques improve energy efficiency and latency, DIMA's analog nature makes it sensitive to process, voltage, and temperature (PVT) variations, especially under reduced BL swings. On-chip training enables DIMA to adapt to chip-specific variations in PVT as well as to data statistics, thereby further enhancing its energy efficiency. A 65-nm CMOS prototype IC demonstrates this improvement by realizing an on-chip trainable support vector machine. By learning chip-specific weights, on-chip training enables robust operation under reduced BL swing, leading to a 2.4× reduction in energy over an off-chip trained DIMA. During inference, the prototype consumes 42 pJ/decision at 32 M decisions/s, corresponding to 3.12 TOPS/W (1 OP = one 8-b × 8-b MAC), a 21× reduction in energy and a 100× reduction in energy-delay product compared with a conventional digital architecture. The energy overhead of training is less than 26% per decision for SGD batch sizes of 128 and higher.
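As a rough consistency check, the headline figures above can be combined to obtain a few implied quantities. The sketch below is illustrative arithmetic only. It assumes that all quoted numbers refer to the same inference operating point, that the training overhead is measured relative to the 42 pJ inference energy, and that the energy-delay gain factors simply as energy gain times delay gain. The function and constant names are hypothetical, and the derived values are not figures reported in the text.

```python
# Figures quoted in the abstract (inference operating point).
ENERGY_PER_DECISION_J = 42e-12   # 42 pJ/decision
DECISIONS_PER_S = 32e6           # 32 M decisions/s
EFFICIENCY_OPS_PER_J = 3.12e12   # 3.12 TOPS/W is 3.12e12 OP/J (1 OP = one 8-b x 8-b MAC)
ENERGY_GAIN = 21.0               # energy reduction vs. conventional digital
EDP_GAIN = 100.0                 # energy-delay product reduction vs. conventional digital
TRAINING_OVERHEAD_MAX = 0.26     # upper bound on training overhead (batch size >= 128)


def derived_figures():
    """Return quantities implied by the quoted figures (assumptions, not reported data)."""
    return {
        # Average power while classifying continuously: energy/decision * decisions/s.
        "inference_power_w": ENERGY_PER_DECISION_J * DECISIONS_PER_S,
        # MACs per decision implied by the TOPS/W figure: (OP/J) * (J/decision).
        "macs_per_decision": EFFICIENCY_OPS_PER_J * ENERGY_PER_DECISION_J,
        # Delay gain implied if EDP gain = energy gain * delay gain (assumption).
        "implied_delay_gain": EDP_GAIN / ENERGY_GAIN,
        # Energy of the digital baseline implied by the 21x figure.
        "digital_energy_per_decision_j": ENERGY_GAIN * ENERGY_PER_DECISION_J,
        # Upper bound on added training energy, assuming the overhead is
        # relative to the 42 pJ inference energy (assumption).
        "max_training_energy_per_decision_j": TRAINING_OVERHEAD_MAX * ENERGY_PER_DECISION_J,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the quoted figures imply an inference power of about 1.3 mW and roughly 131 MACs per decision, which suggests the abstract's numbers are mutually consistent for a classifier with on the order of a hundred 8-b features.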