Resource-Aware DYnamically Adaptable machine Learning – RADYAL
In this project we propose an original interdisciplinary approach that allows DNN models to be dynamically configurable at run-time on a given reconfigurable hardware accelerator architecture, depending on the external environment, following an approach based on feedback loops and control theory.
Run-time configurable and controllable DNN SW and HW accelerators
At the software (SW) level, for a given DNN model, variants with incremental precision levels can be obtained by setting parameters along different dimensions: (i) data precision or quantization (increasing/decreasing the bit-width of activations and/or weights), (ii) degree of sparsification (e.g., pruning, tensor decomposition), (iii) depth of the network (number and type of layers to execute). The chosen SW precision level affects the mean output accuracy, as well as energy consumption and timing.

The key observation is that, for particularly “easy” inputs, high-precision, energy-hungry computations are overkill; conversely, for “hard” inputs, low-precision, energy-efficient computations are not enough. Being able to dynamically change the SW precision is therefore key to enabling energy-efficient and accurate NN computations. At the same time, at the hardware (HW) level, the DNN accelerator needs to be configurable at runtime to satisfy SW processing requirements. Several HW configuration choices have an impact on energy consumption and runtime: the operation mapping strategy (e.g., Input-Stationary, Weight-Stationary, Output-Stationary, Row-Stationary), the number of active Processing Elements (PEs) inside an accelerator (e.g., a 16x16 or 8x8 systolic array), their data precision (e.g., 8 bits, 4 bits), and the sparsification support (e.g., weight pruning, element-wise pruning, structured pruning, dynamic pruning, activation sparsification).

We want to explore the different configuration opportunities at the SW and HW levels – independently and jointly – in order to build a light surrogate model of each configuration’s impact on the system’s attributes. This model will enable a control-theory-oriented approach that combines SW and HW configurations to deliver accurate DNN outputs while ensuring the most efficient computation possible, given the runtime conditions.
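To make the "easy vs. hard input" idea concrete, here is a minimal, illustrative sketch of input-dependent precision selection: weights are requantized at increasing bit-widths, and the cheapest precision whose output is already confident is accepted. All names (`quantize`, `confidence`, `adaptive_infer`) and the softmax-margin heuristic are hypothetical stand-ins for the project's actual controller, not part of RADYAL.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight tensor to a given bit-width.

    Returns de-quantized values, i.e. what a low-precision kernel would compute with.
    """
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale

def confidence(logits):
    """Softmax top-2 margin, used here as a cheap 'input difficulty' signal."""
    e = np.exp(logits - np.max(logits))
    p = e / e.sum()
    top2 = np.sort(p)[-2:]
    return top2[1] - top2[0]

def adaptive_infer(x, weights, levels=(4, 8, 16), threshold=0.5):
    """Try increasing precision levels; stop as soon as the output is confident.

    `x @ quantize(...)` stands in for a full DNN forward pass at that precision.
    """
    for bits in levels:
        logits = x @ quantize(weights, bits)
        if confidence(logits) >= threshold:
            return logits, bits
    return logits, levels[-1]
```

An "easy" input exits at the lowest precision; an ambiguous one escalates through the levels, trading energy for accuracy, which is exactly the trade-off the runtime controller is meant to manage.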
The RADYAL research methodology is organized into three Work Packages (WPs). First, design space explorations will be carried out independently in WP1 and WP2 to provide suitable SW and HW configurations and control variables for the runtime controller. In a second phase, the results will be combined in a HW/SW design space exploration (involving all WPs) to pave the way for optimal runtime control strategies. Control strategies for HW/SW dynamic precision, in relation to system status and field conditions, will then be proposed in WP3. Finally, a computer vision application will be used to showcase the whole approach.
(Scientific results to come.)
Research on machine learning and Deep Neural Networks (DNN) has made considerable progress over the past decades. State-of-the-art DNN models usually require large amounts of training data and contain a tremendous number of parameters, leading to high overall resource requirements in terms of computation, memory, and thus energy. In recent years, this has given rise to approaches that reduce these requirements: for example, parts of the model are removed during or after training (pruning), parameters are stored with lower precision (quantization), surrogate models are trained (knowledge distillation), or the best configuration is searched for by testing different parameters (Neural Architecture Search, NAS). On the hardware side as well, many optimisations have been proposed to accelerate DNN inference on different architectures.
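Of the compression techniques listed above, magnitude pruning is the simplest to illustrate. The sketch below (the name `magnitude_prune` is hypothetical) zeroes out the smallest-magnitude fraction of a weight tensor, which is the basic unstructured-pruning step that sparsity-aware accelerators then exploit.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of weights.

    Unstructured (element-wise) pruning; values tied at the cut-off
    threshold are pruned because the mask uses a strict comparison.
    """
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy; structured variants instead remove whole rows, channels, or blocks so the resulting sparsity maps onto regular hardware.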
These accelerators, however, are usually specific to a given hardware platform and are optimised to satisfy static performance criteria. For many applications, the performance requirements of a DNN model deployed on a given hardware platform are not static but evolve dynamically as its operating conditions and environment change. In this project we therefore propose an original interdisciplinary approach that allows DNN models to be dynamically configured at run-time on a given reconfigurable hardware accelerator architecture, depending on the external environment, following an approach based on feedback loops and control theory.
Project coordination
Stefan Duffner (UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Partnership
LIRIS UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION
GIPSA-lab Grenoble Images Paroles Signal Automatique
Centre Inria de l'Université de Rennes
ANR funding: 608,687 euros
Beginning and duration of the scientific project:
September 2023
- 42 Months