The data rate of a Tomlinson-Harashima (TH) precoder is limited by the delay in the feedback path of its infinite impulse response (IIR) filter, which cancels the channel's post-cursor intersymbol interference (ISI). One reason for long loop delays is that the ISI must be subtracted from the data signal in binary arithmetic so that the modulo (MOD) operation, which stabilises the IIR filter and limits the transmitter amplitude, can be executed correctly. This restriction to binary arithmetic is unfortunate, because some filter subcomponents, such as adder trees, are often implemented in a faster data format that has no slow carry propagation, for example the carry-save adder (CSA) format. This paper proposes a novel architecture for TH precoders in which the MOD operation is taken out of the loop, so that the remaining IIR filtering is performed entirely in CSA arithmetic. This reduces the feedback delay, because no conversion to binary format is required within the loop. With pipelining applied, the filter's feedback delay reduces to the propagation delay of a multiply-accumulate unit operating in CSA arithmetic. The proposed concept was verified in a 14 nm CMOS test chip with multi-level signalling (NRZ, 4-/8-/16-PAM).
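For orientation, the conventional TH precoder loop that the abstract takes as its starting point can be sketched in a few lines. This is a minimal behavioural model of the textbook structure with the MOD operation inside the feedback loop, not the proposed architecture and not a model of CSA arithmetic. The function names (`th_modulo`, `th_precode`, `channel`), the tap values, and the 4-PAM symbol sequence are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def th_modulo(v, M):
    """Fold v into the interval [-M, M); this is the MOD operation."""
    return (v + M) % (2 * M) - M


def th_precode(symbols, post_cursors, M):
    """Conventional TH precoder: x[k] = MOD(a[k] - sum_i h[i] * x[k-i]).

    The MOD result x[k] is fed back, so the MOD sits inside the IIR loop.
    """
    history = [0] * len(post_cursors)  # x[k-1], x[k-2], ...
    out = []
    for a in symbols:
        isi = sum(h * x for h, x in zip(post_cursors, history))
        x = th_modulo(a - isi, M)
        out.append(x)
        history = [x] + history[:-1]
    return out


def channel(tx, post_cursors):
    """Noise-free channel with unit main cursor and the given post-cursors."""
    history = [0] * len(post_cursors)
    out = []
    for x in tx:
        out.append(x + sum(h * p for h, p in zip(post_cursors, history)))
        history = [x] + history[:-1]
    return out


# Assumed example: 4-PAM symbols {-3, -1, 1, 3}, hence M = 4.
M = 4
symbols = [3, -3, 1, -1, 3, 3, -1, 1]
# Hypothetical post-cursor taps, kept as exact fractions to avoid rounding.
taps = [Fraction(1, 2), Fraction(-1, 4)]

tx = th_precode(symbols, taps, M)             # bounded transmit samples
rx = [th_modulo(y, M) for y in channel(tx, taps)]  # receiver-side MOD
```

In this sketch every transmit sample stays within [-M, M), and the receiver recovers the data by applying the same MOD to the channel output, because the channel re-adds exactly the ISI that the precoder subtracted, up to a multiple of 2M. The dependency of `x[k]` on `th_modulo` of the previous outputs is the loop constraint the abstract refers to.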