Abstract
Convolutional Neural Networks (CNNs) are widely used in many computer vision applications. The growth of deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex, requiring millions of multiply-accumulate (MACC) operations. To cope with these massive computing requirements, accelerating CNNs on FPGAs has become a viable way to balance power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages which, in principle, allows power, area and speed to be traded off against accuracy for specific data using different configurations. We present initial experiments in which the approximate multiplier is configured in different ways, varying the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. The architecture's error is evaluated using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. Experimental results based on 10,000 random samples show that the proposed approximate architecture is up to 10.7× faster, and that residual errors after two iterations average 1.6 and 0.0012 for the two tested input bit widths.
Original language | English |
---|---|
Title of host publication | 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC) |
Pages | 35-42 |
Number of pages | 8 |
DOIs | |
Publication status | Published - 1 Jul 2019 |
Keywords
- Field programmable gate arrays
- Computer architecture
- Kernel
- Power demand
- Logic gates
- Hardware
- Biological neural networks
- FPGA
- Approximate Computing
- Convolutional Neural Networks
- Neural Network Accelerator