Memristive crossbar arrays can realize matrix-vector multiplication (MVM) operations with constant time complexity by exploiting Kirchhoff's circuit laws. This is enabled by reading the entire array in parallel in a single time step. However, parallel writing is prohibitive in such arrays because of limits on the current that can accumulate along the wires. Hence, loading the matrix elements into such an array still incurs a significant time penalty. Another key challenge is the achievable computational precision. To overcome these challenges, we propose a unit-cell array design in which each unit cell comprises four memristive devices, each attached to a selection transistor. Moreover, the array is organized such that the selection transistors can be turned on in a diagonal fashion. We experimentally demonstrate this concept by fabricating a $2\times 2$ unit-cell array based on projected phase-change memory (PCM) devices in 90 nm CMOS technology. We show that, using the diagonal connections, the write operations can be parallelized while staying within the current limit of the back-end-of-line metallization. Moreover, the increase in write time caused by having more devices per unit cell is minimized through a combination of single-shot and iterative programming schemes. Finally, we present experimental results on MVM operations that demonstrate improved computational precision, exceeding that of a 4-bit fixed-point implementation.
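
To illustrate how a crossbar read maps onto an MVM, the following is a minimal sketch of an idealized array, not the design proposed here. It assumes one device per crosspoint, no wire resistance, and no device noise or drift; the function name `crossbar_mvm` and the conductance and voltage values are hypothetical. Each device contributes a current given by Ohm's law, and Kirchhoff's current law sums those contributions along each column.

```python
# Idealized crossbar read (assumptions: one device per crosspoint,
# no wire resistance, no device noise or drift).

def crossbar_mvm(conductances, voltages):
    """Return the column currents I[j] = sum_i G[i][j] * V[i]."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one read voltage per row is required")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law gives the current through device (i, j);
            # Kirchhoff's current law adds it to the total on column j.
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 2x2 example: conductances in siemens, read voltages in volts.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.2, 0.1]
I = crossbar_mvm(G, V)  # column currents in amperes
```

The nested loop is only a software stand-in: in the physical array all column currents settle simultaneously, which is why the read takes a single time step regardless of the matrix size.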