Architectural Optimizations and Synthesis Tools for Improved Energy Efficiency and Faster Design Closure for FPGAs

Somsubhra Mondal

doi:https://doi.org/10.21985/N2XX4Q

FPGAs are evolving at a rapid pace, with improved performance and logic density. However, the power efficiency of FPGAs has continuously lagged behind, and hence power optimization of FPGAs is crucial. Trends in technology scaling make leakage power a serious concern for designers; at the same time, routing power is the dominant component of total power consumption in FPGAs. We propose a hierarchical look-up table (LUT) structure for FPGAs to reduce leakage power consumption. We present an analysis of the number of inputs actually used by LUTs and, depending on that number, shut down the SRAM cells, transistors, and multiplexers associated with the unused LUT inputs. Based on this technique, for 180nm technology, we report an average savings of 22.94% (as high as 64.22%) in leakage power for logic blocks. The savings will be even greater for technologies of 90nm or below that are currently in use. We also propose a Dual-Vdd-dual-Vt interconnect architecture, where voltage scaling is applied within the programmable interconnect structure of the FPGA to reduce routing power consumption. Our experiments reveal that an average reduction of 23.45% (as high as 47%) in total interconnect power is achievable with an 11.75% worst-case delay penalty. Another major challenge in FPGA-based design is staying within the resource and storage capacity of the target device. We propose a framework that estimates the hardware cost early, before synthesis of a streaming accelerator on reconfigurable logic is attempted. Specifically, our proposed framework tackles the problem of pre-synthesis estimation of functional unit area cost, while incorporating the potential impact of resource constraints and different operator bitwidths on the final implementation. We evaluated our estimation technique by comparing the estimated area with the area of the synthesized design; the average estimation error is 9.3%.
We also present a global resource sharing technique for pipelined control data flow graph (CDFG) synthesis, as well as optimization techniques to reduce the area requirement of the stream queue buffers in reconfigurable accelerators.
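
As a rough illustration of why unused LUT inputs leave room for leakage savings, the sketch below counts how many configuration SRAM cells sit idle when a K-input LUT (2^K cells) implements a function of only k inputs (2^k cells needed). This is not the thesis's hierarchical LUT design or its power model: the function names and the usage histogram are hypothetical, and the fraction of idle cells is only a proxy, not a leakage-power figure.

```python
def unused_sram_cells(lut_size: int, inputs_used: int) -> int:
    """SRAM cells not needed when only `inputs_used` of `lut_size` inputs are connected."""
    if not 0 <= inputs_used <= lut_size:
        raise ValueError("inputs_used must be between 0 and lut_size")
    # A function of k inputs needs 2**k truth-table entries; the LUT provides 2**K.
    return 2 ** lut_size - 2 ** inputs_used


def idle_cell_fraction(lut_size: int, usage_histogram: dict[int, int]) -> float:
    """Fraction of all LUT SRAM cells that are idle across a mapped design.

    `usage_histogram` maps "number of inputs used" to "number of LUTs"
    (hypothetical input; a real flow would extract it from the mapped netlist).
    """
    total_luts = sum(usage_histogram.values())
    if total_luts == 0:
        return 0.0
    total_cells = total_luts * 2 ** lut_size
    idle_cells = sum(
        count * unused_sram_cells(lut_size, used)
        for used, count in usage_histogram.items()
    )
    return idle_cells / total_cells


# Made-up example: two 4-input LUTs, one using 2 inputs and one using all 4.
example_fraction = idle_cell_fraction(4, {2: 1, 4: 1})
```

In this made-up example, the LUT that uses two inputs needs only 4 of its 16 cells, so 12 of the 32 cells in total are candidates for shutdown.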