FPGA programming for the masses: The programmability of FPGAs must improve if they are to be part of mainstream computing
Abstract
BRAMs, which are specialized memory structures distributed throughout the FPGA fabric in columns, are of particular importance. Each BRAM can hold up to 36 Kbits of data. BRAMs can be used in various form factors and can be cascaded to form a larger logical memory structure. Because BRAMs are distributed across the fabric and can be accessed in parallel, together they can provide terabytes per second of aggregate bandwidth for memory bandwidth-intensive applications.

The contrast in performance between processors and FPGAs lies in the architecture itself. Processors rely on the von Neumann paradigm, in which an application is compiled and stored in instruction and data memory. They typically work on an instruction and data fetch-decode-execute-store pipeline, which means that both instructions and data have to be fetched from an external memory into the processor pipeline. Although caches are used to alleviate the cost of expensive fetch operations from external memory, each cache miss incurs a severe penalty. The bandwidth between processor and memory is often the critical factor in determining the overall performance.
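The two bandwidth arguments above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 36-Kbit BRAM size comes from the text, but the BRAM count, port count, port width, clock rate, and cache parameters are hypothetical values chosen for the example, not figures for any particular device or processor. The cache estimate uses the standard average-memory-access-time formula (hit time plus miss rate times miss penalty).

```python
# Illustrative arithmetic only. Every numeric input in the example at the
# bottom is an assumption, except the 36-Kbit BRAM size quoted in the text.

BRAM_BITS = 36 * 1024  # capacity of one BRAM in bits (36 Kbits)


def cascaded_capacity_bits(num_brams: int) -> int:
    """Capacity of a logical memory built by cascading num_brams BRAMs."""
    return num_brams * BRAM_BITS


def aggregate_bandwidth_bytes_per_s(
    num_brams: int, ports_per_bram: int, port_width_bits: int, clock_hz: int
) -> float:
    """Peak bandwidth if every port of every BRAM moves one word per cycle."""
    bits_per_s = num_brams * ports_per_bram * port_width_bits * clock_hz
    return bits_per_s / 8


def average_access_time_ns(
    hit_time_ns: float, miss_rate: float, miss_penalty_ns: float
) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Hypothetical FPGA: 1,000 BRAMs, 2 ports each, 36-bit ports, 500 MHz.
    bw = aggregate_bandwidth_bytes_per_s(1000, 2, 36, 500_000_000)
    print(f"aggregate BRAM bandwidth: {bw / 1e12:.1f} TB/s")

    # Hypothetical cache: 1 ns hit, 2% miss rate, 100 ns miss penalty.
    amat = average_access_time_ns(1.0, 0.02, 100.0)
    print(f"average memory access time: {amat:.1f} ns")
```

Under these assumed numbers the on-chip memories together peak at about 4.5 TB/s, because every BRAM port can be driven in the same cycle. On the processor side, even a 2 percent miss rate dominates the average access time when a miss costs 100 times as much as a hit.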