Resource-demand Estimation for Edge Tensor Processing Units

Publication by Benedict Herzog, Stefan Reif, Judith Hemp, Timo Hönig, Wolfgang Schröder-Preikschat
Related to the Energy-, Latency- And Resilience-aware Networking (e.LARN) project
Published in ACM Transactions on Embedded Computing Systems, 2022

Paper

Abstract:

Machine learning has shown tremendous success in a large variety of applications. The evolution of machine-learning applications from cloud-based systems to mobile and embedded devices has shifted the focus from only quality-related aspects towards the resource demand of machine learning. For embedded systems, dedicated accelerator hardware promises the energy-efficient execution of neural network inferences. Their precise resource demand in terms of execution time and power demand, however, is undocumented. Developers, therefore, face the challenge to fine-tune their neural networks such that their resource demand matches the available budgets. This article presents Precious, a comprehensive approach to estimate the resource demand of an embedded neural network accelerator. We generate randomised neural networks, analyse them statically, execute them on an embedded accelerator while measuring their actual power draw and execution time, and train estimators that map the statically analysed neural network properties to the measured resource demand. In addition, this article provides an in-depth analysis of the neural networks’ resource demands and the responsible network properties. We demonstrate that the estimation error of Precious can be below 1.5% for both power draw and execution time. Furthermore, we discuss what estimator accuracy is practically achievable and how much effort is required to achieve sufficient accuracy.