Approximating Deep Learning Accelerators – AdequateDL
AdequateDL explores how approximate computing can improve the performance of deep-learning hardware accelerators. Deep learning is very relevant in this context, since playing with the accuracy to reach adequate computations will significantly enhance energy efficiency, while keeping quality of results in a user-constrained range. Outcomes include a framework for accuracy exploration and evaluation of gains in performance per watt of the proposed adequate accelerators over CPU/GPU platforms.
How can approximation techniques improve the performance of hardware accelerators for deep-learning applications?
The computational workload involved with convolutional neural networks (CNNs) for deep learning (DL) is often out of reach for low-power embedded devices and is still very costly when run in data centers. By relaxing the need for fully precise operations, approximate computing substantially improves performance and energy efficiency. DL is very relevant in this context, since playing with the accuracy to reach adequate computations will significantly enhance performance, while keeping the quality of results within a user-constrained range.
The goal of this project is to explore how approximation techniques can improve the performance and energy efficiency of hardware accelerators for CNNs in DL applications. In particular, we will study how custom floating-point and fixed-point arithmetic, adequate number representations, and even algorithmic-level transformations can improve the efficiency of CNN computations while keeping the classification and learning phases at a very high success rate. The aim is to go further than current state-of-the-art research studies, in which only the inputs and outputs of the neural network layers are quantized to low precision. The ambition of AdequateDL is to explore this new way to obtain order-of-magnitude improvements in performance and energy efficiency, and therefore to influence the design of future computing systems dedicated to deep learning applications in both embedded and cloud markets.
A framework for precision exploration that brings together the expertise of INRIA, CEA LIST and LIRMM is being developed. This framework, based on N2D2 (Neural Network Design & Deployment), explores the impact of approximations and precision reduction on the training and classification of CNNs. N2D2’s backend will also be adapted and used to generate C++ or OpenCL code amenable to high-level synthesis (HLS) of the approximated hardware accelerators. The generated code will take as input a library of reduced-precision operators in floating-point and fixed-point arithmetic and will also be optimized through precision-reduction techniques and source-level transformations. Moreover, AdequateDL will exploit weight quantization methods during the learning phase to obtain the lowest accuracy loss for a given precision reduction. Finally, the solution will be synthesized and run on a target hardware platform to demonstrate the gains in performance and energy of the automatically generated accelerators.
The developed framework will be validated through a prototype of an FPGA accelerator for CNNs. Precision exploration through controlled approximate computing will be the main focus to increase energy efficiency. Comparisons with other platforms, such as embedded GPUs and existing FPGA implementations of CNNs, will be performed.
A library of custom floating-point (FlP) operators was designed, and it demonstrated the benefit of such reduced-precision FlP operators in terms of cost-accuracy trade-off. The ctfloat library was released as open source. Initial experiments explored the use of the custom FlP operators inside the N2D2 framework.
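To illustrate what a reduced-precision FlP format means in practice, the following minimal Python sketch rounds a value to a format with a chosen number of exponent and mantissa bits. It is not the ctfloat API: the function name quantize_float and its parameters are hypothetical, and the sketch assumes an IEEE-like layout with no subnormal numbers (underflow is flushed to zero) and saturation on overflow.

```python
import math


def quantize_float(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a custom floating-point format.

    Hypothetical helper for illustration only. Assumptions: IEEE-like
    bias, no subnormals (flush to zero), saturation instead of infinity.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    e_min, e_max = 1 - bias, bias  # exponent range of normal numbers
    max_value = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max

    # frexp returns x = m * 2**e with 0.5 <= |m| < 1, so |x| = (2|m|) * 2**(e-1)
    m, e = math.frexp(x)
    exponent = e - 1
    if exponent < e_min:
        return math.copysign(0.0, x)  # underflow: flush to zero
    if exponent > e_max:
        return math.copysign(max_value, x)  # overflow: saturate

    # Keep man_bits fractional bits of the significand (round half to even)
    scaled = round(math.ldexp(abs(m), man_bits + 1))
    value = math.ldexp(scaled, exponent - man_bits)
    return math.copysign(min(value, max_value), x)


if __name__ == "__main__":
    for fmt in [(5, 10), (4, 3)]:
        approx = quantize_float(3.14159, *fmt)
        print(f"exp={fmt[0]} man={fmt[1]}: {approx} (error {abs(approx - 3.14159):.5f})")
```

Sweeping exp_bits and man_bits in this way gives a first feel for the accuracy side of the cost-accuracy trade-off; the hardware cost side requires synthesis of the actual operators.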
We also investigated the opportunities offered by weight sharing (WS). In particular, we focused on how WS can be exploited to reduce the memory footprint of CNNs, on the best granularity at which to apply it (i.e., network-, layer-, channel- or kernel-wise) and, finally, on developing a framework for design space exploration of approximated CNNs. Thanks to the proposed sensitivity metrics, we were able to implement a multi-objective design space exploration to identify trade-offs between compression and accuracy.
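The idea behind weight sharing can be sketched as follows: the weights of a group (here, a single layer) are clustered into k shared values, and each weight is then stored as a small index into that table. The code below is a minimal illustration using a plain one-dimensional k-means, not the project's framework; the function names, the deterministic initialization and the 32-bit word size are assumptions made for the example.

```python
import math


def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: (value - centroids[c]) ** 2)


def share_weights(weights, k, iters=20):
    """Cluster weights into k shared values with a simple 1-D k-means.

    Returns (centroids, assignment, inertia), where inertia is the sum of
    squared errors between each weight and its shared value.
    """
    lo, hi = min(weights), max(weights)
    # Deterministic initialization: centroids spread evenly over the range
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        assign = [nearest(w, centroids) for w in weights]
        for c in range(k):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    assign = [nearest(w, centroids) for w in weights]
    inertia = sum((w - centroids[a]) ** 2 for w, a in zip(weights, assign))
    return centroids, assign, inertia


def footprint_bits(n_weights, k, word_bits=32):
    """Storage needed for per-weight indices plus the table of shared values."""
    index_bits = max(1, math.ceil(math.log2(k)))
    return n_weights * index_bits + k * word_bits


if __name__ == "__main__":
    layer = [0.11, 0.09, -0.52, -0.48, 0.10, 0.91, 0.89, -0.50]
    centroids, assign, inertia = share_weights(layer, k=3)
    print("shared values:", centroids)
    print("indices:", assign)
    print("inertia:", inertia)
    print("bits before:", len(layer) * 32, "bits after:", footprint_bits(len(layer), 3))
```

Applying the same routine per network, per layer, per channel or per kernel changes both the number of tables to store and the clustering error, which is the granularity trade-off discussed above.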
We also integrated a post-training quantization module in N2D2. This method relies on a dataset of representative data points to calculate an approximation of the range of the outputs of each network layer. Further, new features (weight, activation, and threshold quantization) have been fully integrated in the open-source repository of N2D2. A version of N2D2 was also released that can generate C/HLS code for a quantized DNN. This code opens the way to automatically estimating the cost of a quantized DNN topology on an FPGA.
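The calibration step described above can be sketched in a few lines: representative inputs are run through the network, the outputs of each layer are recorded, and the largest observed magnitude sets the quantization scale of that layer. This is only an illustration under simplifying assumptions (symmetric signed integers, a max-abs range estimate, recorded outputs given as plain lists); the function names are hypothetical and do not reflect the N2D2 implementation.

```python
def calibrate_ranges(layer_outputs):
    """Estimate the output range of each layer from calibration data.

    layer_outputs maps a layer name to a list of recorded output vectors,
    one per representative data point.
    """
    return {
        name: max(abs(v) for sample in samples for v in sample)
        for name, samples in layer_outputs.items()
    }


def make_scale(max_abs, n_bits=8):
    """Scale that maps [-max_abs, max_abs] onto signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, n_bits=8):
    """Round to the integer grid and clamp values outside the calibrated range."""
    qmax = 2 ** (n_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    # Hypothetical recorded outputs for two layers and two calibration samples
    recorded = {
        "conv1": [[0.5, -1.27, 0.3], [1.0, 0.2, -0.6]],
        "fc1": [[2.54, -0.1], [0.0, 1.0]],
    }
    ranges = calibrate_ranges(recorded)
    for name, max_abs in ranges.items():
        scale = make_scale(max_abs)
        codes = quantize(recorded[name][0], scale)
        print(name, "range:", max_abs, "codes:", codes, "restored:", dequantize(codes, scale))
```

Because the range is only an approximation obtained from a finite calibration set, values seen later that fall outside it are clamped, which is why the representativeness of the calibration data matters.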
We expect the work on reduced-precision floating-point operators to be integrated in the near future into the recently developed quantization module for N2D2. We recently started exploring low-precision acceleration of DNN training through a PhD funded by the project since October 2020. We are exploring custom number representations based on floating-point and their adaptation during the epochs of the training. The work is progressing fast and we expect relevant results during 2021.
Our efficient exploration method reduced to a few hundred the number of CNN scoring operations required to optimize weight sharing for a CNN, relative to the initial number. With promising results on small CNNs, we look forward to applying it to larger, state-of-the-art models, such as ResNet and MobileNet, targeting larger datasets such as CIFAR-10/100 and ImageNet. We will explore the use of heuristic methods to avoid the costly scoring step and will, in particular, look at using the inertia (sum of the squared errors) of the clustering as a proxy for the accuracy loss.
We plan to extend the compression process to include a pruning step.
We plan to evaluate the benefit with a state-of-the-art backend; we are currently exploring Xilinx’s FINN framework for implementing quantized neural network accelerators on FPGA targets.
We plan to integrate new modules in N2D2 to apply quantization-aware training to network parameters. These new features should compensate for the accuracy loss induced by moving state-of-the-art DNNs to 8-bit integer numerical precision. Moreover, it will be possible to evaluate the performance of these modules at more aggressive quantization levels (< 8 bits). The robustness and scalability of these methods will be evaluated on state-of-the-art DNNs such as MobileNet and ResNet, trained on CIFAR-10/100 and ImageNet. An evaluation on a segmentation task, trained on the Cityscapes dataset, should give a good idea of the robustness of the method in the industrial field of ADAS.
1. E. Dupuis, D. Novo, I. O’Connor and A. Bosio, “On the Automatic Exploration of Weight Sharing for Deep Neural Network Compression,” IEEE/ACM Design, Automation & Test in Europe Conference (DATE), 2020, pp. 1319-1322.
2. E. Dupuis, D. Novo, I. O’Connor and A. Bosio, “Sensitivity Analysis and Compression Opportunities in DNNs Using Weight Sharing,” 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2020, pp. 1-6.
3. Alberto Bosio, “Security in an Approximated World”, Tutorial HIPEAC 2020.
4. Alberto Bosio, “Making AI Applications Robust, Secure and Efficient: AxC to the Rescue”, Tutorial HIPEAC 2021.
5. Alberto Bosio, Olivier Sentieys, Daniel Menard, “A Comprehensive Analysis of Approximate Computing Techniques: From Component- to Application-Level”, Tutorial DATE 2019.
6. E. Dupuis, D. Novo, I. O’Connor and A. Bosio, “Fast Exploration of Weight Sharing Opportunities for CNN Compression,” Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA 2021) in conjunction with DATE 2021.
7. V. Ha, T. Yuki, and O. Sentieys. “Towards Generic and Scalable Word-Length Optimization”. 23rd IEEE/ACM Design, Automation and Test in Europe (DATE), Mar. 9, 2020, pp. 1–6.
8. V. Ha and O. Sentieys. “Leveraging Bayesian Optimization to Speed Up Automatic Precision Tuning”. 24th IEEE/ACM Design, Automation and Test in Europe (DATE), 2021.
A joint book chapter entitled “Approximations in Deep Learning” is being written. It should be published in the summer of 2021 by Springer.
Several software packages have been released as open source.
The design and implementation of convolutional neural networks (CNNs) for deep learning is currently receiving a lot of attention from both industry and academia. However, the computational workload involved with CNNs is often out of reach for low-power embedded devices and is still very costly when run in data centers. By relaxing the need for fully precise operations, approximate computing substantially improves performance and energy efficiency. Deep learning is very relevant in this context, since playing with the accuracy to reach adequate computations will significantly enhance performance, while keeping the quality of results within a user-constrained range. AdequateDL will explore how approximations can improve the performance of hardware accelerators in deep-learning applications. Outcomes include a framework for accuracy exploration and the demonstration of performance gains of several orders of magnitude for the proposed adequate accelerators with respect to conventional CPU/GPU computing platforms.
Project coordination
Olivier Sentieys (Centre de Recherche Inria Rennes - Bretagne Atlantique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.
Partners
LIST Laboratoire d'Intégration des Systèmes et des Technologies
Inria Rennes - Bretagne Atlantique Centre de Recherche Inria Rennes - Bretagne Atlantique
UM-LIRMM Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
INL Institut des Nanotechnologies de Lyon
ANR funding: 559,126 euros
Beginning and duration of the scientific project:
January 2019 - 42 months