Low-power neural network accelerators: advancements in custom floating-point techniques
Publication date
2024-05-22
Authors
Supervisor
Reviewers
Abstract
This dissertation investigates design techniques involving custom Floating-Point (FP) computation for low-power neural network accelerators in resource-constrained embedded systems. It aims to make the anticipated ubiquity of Artificial Intelligence (AI) sustainable by developing efficient hardware engines, emphasizing the balance between energy-efficient computation, inference quality, application versatility, and cross-platform compatibility.
The research presents a hardware design methodology for low-power inference of Spike-by-Spike (SbS) neural networks. Although SbS networks offer reduced complexity and noise robustness, their deployment on constrained embedded devices is challenging due to high memory and computational costs. The dissertation proposes a novel Multiply-Accumulate (MAC) hardware module that optimizes the balance between computational accuracy and resource efficiency in FP operations. This module employs a hybrid approach, combining standard FP with custom 8-bit FP and 4-bit logarithmic numerical representations, enabling customization based on application-specific constraints and providing the first hardware acceleration of SbS inference on embedded systems.
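To make the hybrid numerical representation concrete, the sketch below emulates an 8-bit FP weight format and a 4-bit logarithmic weight format in software and uses them inside a MAC loop with a standard FP accumulator. The bit allocations (1 sign, 4 exponent, and 3 mantissa bits for the 8-bit format; a sign plus a 3-bit power-of-two exponent for the logarithmic format) and the helper names quantize_fp8, quantize_log4, and hybrid_mac are illustrative assumptions, not the exact layouts defined in the dissertation.

```python
import numpy as np

# Hypothetical 8-bit FP layout: 1 sign, 4 exponent, 3 mantissa bits.
# The dissertation's exact field split may differ.
def quantize_fp8(x, exp_bits=4, man_bits=3, bias=7):
    sign = np.sign(x)
    mag = np.where(np.abs(x) == 0, 1e-38, np.abs(x))        # avoid log2(0)
    exp = np.clip(np.floor(np.log2(mag)), -bias, (1 << exp_bits) - 1 - bias)
    man = np.round(mag / 2.0**exp * 2**man_bits) / 2**man_bits
    return sign * man * 2.0**exp

# Hypothetical 4-bit logarithmic layout: sign plus a 3-bit power-of-two
# exponent, so a multiply degenerates to a shift in hardware.
def quantize_log4(x, exp_min=-6, exp_max=1):
    sign = np.sign(x)
    mag = np.where(np.abs(x) == 0, 1e-38, np.abs(x))
    exp = np.clip(np.round(np.log2(mag)), exp_min, exp_max)
    return sign * 2.0**exp

def hybrid_mac(activations, weights, use_log4=False):
    """Multiply-accumulate with quantized weights and a float32 accumulator."""
    quant = quantize_log4 if use_log4 else quantize_fp8
    acc = np.float32(0.0)
    for a, w in zip(activations, weights):
        acc += np.float32(a) * np.float32(quant(w))
    return acc
```

In hardware, the appeal of the 4-bit logarithmic path is that the multiplication reduces to a shift, while the float32 accumulator preserves dynamic range across the dot product; this sketch only emulates that behavior numerically.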
Additionally, the study introduces a hardware design for low-power inference in Convolutional Neural Networks (CNNs), targeting sensor analytics applications. It comprises a Hybrid-Float6 (HF6) quantization scheme and a dedicated hardware accelerator. The proposed Quantization-Aware Training (QAT) method demonstrates improved inference quality despite the numerical quantization. The design ensures compatibility with standard ML frameworks such as TensorFlow Lite, highlighting its potential for practical deployment in real-world applications.
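The following sketch shows one way a 6-bit hybrid FP format could be fake-quantized during training in TensorFlow, using a straight-through estimator so gradients bypass the rounding. The 1-sign / 3-exponent / 2-mantissa split, the exponent bias, and the fake_quant_hf6 and HF6Dense names are assumptions for illustration; they are not taken from the dissertation's HF6 definition or QAT procedure.

```python
import tensorflow as tf

# Hypothetical HF6-style layout: 1 sign, 3 exponent, 2 mantissa bits.
# Weights are rounded to this grid on the forward pass only.
def fake_quant_hf6(w, exp_bits=3, man_bits=2, bias=3):
    sign = tf.sign(w)
    mag = tf.maximum(tf.abs(w), 1e-12)                       # avoid log(0)
    log2_mag = tf.math.log(mag) / tf.math.log(2.0)
    exp = tf.clip_by_value(tf.floor(log2_mag),
                           -float(bias), float(2 ** exp_bits - 1 - bias))
    man = tf.round(mag / 2.0 ** exp * 2 ** man_bits) / 2 ** man_bits
    wq = sign * man * 2.0 ** exp
    # Straight-through estimator: quantized values forward, identity gradient.
    return w + tf.stop_gradient(wq - w)

class HF6Dense(tf.keras.layers.Dense):
    """Dense layer whose kernel is fake-quantized to the HF6-style format."""
    def call(self, inputs):
        wq = fake_quant_hf6(self.kernel)
        out = tf.matmul(inputs, wq)
        if self.use_bias:
            out = tf.nn.bias_add(out, self.bias)
        return self.activation(out) if self.activation is not None else out
```

A layer such as HF6Dense(128, activation="relu") could then stand in for a regular Dense layer during quantization-aware training, before the quantized weights are exported to an accelerator or a TensorFlow Lite model.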
This dissertation addresses the critical challenge of harmonizing computational accuracy with energy efficiency in AI hardware engines, guided by a design philosophy that treats inference quality, application versatility, and cross-platform compatibility as first-class goals.
Keywords
CNN; TinyML; Sensor Analytics; Spike-by-Spike; HLS; Floating-Point; Hardware Accelerator
Institution
Department
Document type
Dissertation
Language
English
Files
Name
Thesis_YaribNevarez.pdf
Description
Low-Power Neural Network Accelerators: Advancements in Custom Floating-Point Techniques
Size
8.66 MB
Format
Adobe PDF
Checksum (MD5)
2ed2272da6e3e0084cad399134aaf242