Int. J. Parallel Program

Generating ASIPs with Reduced Number of Connections to the Register-File



We propose the automatic synthesis of application-specific instruction-set processors (ASIPs). We use pipelined execution of multi-op machine instructions. For example, *(reg1 * reg2) = (*reg3) + (*reg4) (in C syntax) is an instruction with three memory pipeline stages and two arithmetic stages. The problem is as follows: for a given set of loops, find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing both the resource usage and the cost of interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph G by a small set of convex sub-graphs g_1, …, g_n that are consistent with a given set of pipeline units. Unlike previous work, the g_i are not synthesized into circuits that are executed in co-processor mode. Instead, both the g_i and the rest of the program are executed by the same set of multi-op pipeline units. This eliminates the overhead associated with the co-processor mode of regular ASIPs while retaining the high IPC values of these ASIPs. The main advantage of pipelined execution of multi-op instructions over VLIW instructions is shown to be the lower cost of interconnections between the CPU's execution units and the register file. Once the pipeline configuration and the cover g_1 ∪ ⋯ ∪ g_n = G have been computed, the Verilog RTL of the corresponding CPU (extended with branch instructions) is generated and synthesized to an FPGA. The results show that, for a set of selected kernels, the resulting ASIP (called Ocpu) obtains higher IPC values than an equivalent compilation for an ARM CPU while achieving similar clock frequencies.
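To make the covering condition concrete, the following is a minimal sketch, not the authors' algorithm. It only checks a given cover and does not search for an efficient one. It models the example instruction *(reg1 * reg2) = (*reg3) + (*reg4) as a small dataflow graph. It then checks two things: that a set of sub-graphs is convex (no path leaves a sub-graph and later re-enters it) and that the sub-graphs together cover G. The node names (ld3, ld4, mul, add, st) and the "mem"/"alu" labels are hypothetical illustrations, not notation from the paper.

```python
# Hypothetical dataflow graph for the example multi-op instruction
#   *(reg1 * reg2) = (*reg3) + (*reg4)
# Node names and the "mem"/"alu" labels are illustrative assumptions.
OPS = {
    "ld3": "mem",  # load *reg3
    "ld4": "mem",  # load *reg4
    "mul": "alu",  # address computation reg1 * reg2
    "add": "alu",  # (*reg3) + (*reg4)
    "st": "mem",   # store the sum to *(reg1 * reg2)
}
EDGES = {
    "ld3": ["add"],
    "ld4": ["add"],
    "mul": ["st"],  # the store needs the computed address
    "add": ["st"],  # the store needs the computed value
    "st": [],
}


def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def is_convex(graph, sub):
    """A node set is convex if no path leaves it and later re-enters it."""
    sub = set(sub)
    for u in sub:
        for v in graph[u]:
            # An edge leaving `sub` must not lead back into `sub`.
            if v not in sub and reachable(graph, v) & sub:
                return False
    return True


def is_convex_cover(graph, subgraphs):
    """Check that g_1 ∪ ... ∪ g_n = G and that every g_i is convex."""
    covered = set().union(*subgraphs)
    return covered == set(graph) and all(is_convex(graph, g) for g in subgraphs)


def stage_counts(ops, sub):
    """Count (memory, arithmetic) operations in a sub-graph."""
    kinds = [ops[n] for n in sub]
    return kinds.count("mem"), kinds.count("alu")


if __name__ == "__main__":
    print(stage_counts(OPS, OPS))                # (3, 2): three memory, two arithmetic
    print(is_convex(EDGES, {"ld3", "st"}))       # False: ld3 -> add -> st re-enters
    print(is_convex_cover(EDGES, [{"ld3", "ld4", "add"}, {"mul", "st"}]))  # True
```

The convexity test relies on the fact that any path that leaves a sub-graph and re-enters it must begin with an edge from inside to outside. Checking reachability from each such outgoing edge is therefore sufficient. In the paper, each g_i must additionally be consistent with the available pipeline units. That constraint is not modeled here beyond counting memory and arithmetic operations.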


13 Feb 2017

