Embedded System Design for Machine Learning

Module Number: EI71104

Duration: 1 Semester

Occurrence: Winter and Summer Semester

Language: English

Number of ECTS: 6


Professors in charge: Daniel Müller-Gritschneder, Wolfgang Ecker

Amount of work

Contact hours: 90

Self-study hours: 90

Total: 180

Description of achievement and assessment methods

1)    Code submissions for the three lab parts (coursework)
2)    Written final exam (90min) (Examination, 100% of grade)

1)    The students will hand in code submissions for the three lab parts, each of which will be graded as pass or fail. All three submissions must be passed in order to participate in the final exam.
2)    In the final exam (90 min written or 30 min oral), the students answer questions on the lecture content and the lab part to test their understanding of the theoretical and practical aspects of embedded machine learning.

Recommended requirements

Basic knowledge of embedded C and micro-controllers (e.g. in the form of a micro-controller programming lab) is assumed.
Basic knowledge of hardware design in VHDL or Verilog (e.g. in the form of a hardware design lab or the lecture "Entwurf Digitaler Systeme mit VHDL und SystemC" (Prof. Ecker)) is assumed.
Basic knowledge of machine learning algorithms (e.g. in the form of the Computational Intelligence lecture or the lecture Machine Learning: Methods and Tools) is recommended.

Intended Learning Outcome

Upon completion of this module, students are able to:

*    Understand the design flow and design steps for deploying machine learning workloads on embedded devices.
*    Evaluate the trade-offs involved in executing machine learning workloads such as neural network inference in software and hardware on embedded processors and dedicated accelerators.
*    Apply model compression methods effectively to embedded machine learning workloads and understand the theory behind the methods.
*    Apply hardware acceleration principles (SIMD, vector instructions, 2D systolic arrays) to accelerate ML workloads and know about the influence of the memory system.

*    Apply a state-of-the-art machine learning deployment flow to a simple machine learning application such as keyword recognition.
*    Implement the deployment code on an embedded processor platform (micro-controller board) and design a simple hardware accelerator for the application that works in simulation.


Content

*    Introduction to the design flow and design steps to deploy machine learning workloads on embedded devices
*    Machine learning theory to understand the typical structure, operators and trade-offs in accuracy, memory and performance demands of machine learning workloads
*    Neural Network Model Compression Methods: Number systems, Integer and sub-byte Quantization, Quantization-aware training, Pruning, Rank Reduction
*    Software Optimization Methods: Memory planning, target-aware operator optimization, operator fusing and tiling
*    Methods and basic HW blocks for embedded HW-acceleration (SIMD, Vector Instructions, loosely-coupled accelerators, memory systems)
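To make the integer quantization topic concrete, the following is a minimal sketch of affine (asymmetric) int8 quantization, the basic scheme behind most integer compression methods. The function names `quantize_int8` and `dequantize` are illustrative, not taken from the lecture material:

```python
import numpy as np

def quantize_int8(x):
    """Affine int8 quantization: x ≈ scale * (q - zero_point)."""
    qmin, qmax = -128, 127
    # The representable range must include 0 so that zero maps exactly.
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to float32 to inspect the quantization error."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.array([-1.5, 0.0, 0.3, 2.1], dtype=np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)  # reconstruction error is bounded by ~one step
```

Storing `q` instead of `w` cuts memory 4x versus float32; the accuracy/memory trade-off discussed in the lecture comes from the rounding error, which is bounded by roughly one quantization step (`scale`).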

The lab part will cover the following contents to transfer the theory into practice:

*    Introduction to the Machine Learning Deployment Toolchain TVM
*    Training, model optimization and deployment of a keyword recognition application onto a low-power micro-controller board using TVM
*    Design of a simple HW accelerator to offload machine learning workload into hardware with test by simulation
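The accelerator task builds on the 2D systolic-array principle from the lecture. As a purely illustrative sketch (not the lab's actual design), the dataflow of an output-stationary systolic array can be simulated in a few lines: each processing element (PE) owns one output accumulator, A-operands stream in from the left skewed by row, and B-operands from the top skewed by column, so that A[i, k] and B[k, j] meet in PE (i, j) at cycle i + k + j:

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary 2D systolic array computing C = A @ B."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    a_reg = np.zeros((M, N))  # operand currently held in each PE (from the left)
    b_reg = np.zeros((M, N))  # operand currently held in each PE (from the top)
    for t in range(M + N + K - 2):  # cycles until the last operands drain through
        # Operands advance one PE per cycle: A moves right, B moves down.
        new_a, new_b = np.zeros_like(a_reg), np.zeros_like(b_reg)
        new_a[:, 1:] = a_reg[:, :-1]
        new_b[1:, :] = b_reg[:-1, :]
        # Boundary PEs receive the skewed input streams.
        for i in range(M):
            if 0 <= t - i < K:
                new_a[i, 0] = A[i, t - i]
        for j in range(N):
            if 0 <= t - j < K:
                new_b[0, j] = B[t - j, j]
        a_reg, b_reg = new_a, new_b
        C += a_reg * b_reg  # all PEs multiply-accumulate in parallel
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 2))
C = systolic_matmul(A, B)
```

The simulation also shows why such arrays are attractive for ML workloads: an M x N grid performs M * N multiply-accumulates per cycle while each operand is loaded from memory only once and then reused as it moves through the array.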


Teaching and learning methods

The module consists of two parts: a weekly lecture and a parallel lab with three parts.
The lecture will consist of classroom sessions with slide presentations. The exercises will be integrated into the lecture flow in order to apply the learned content directly to example problems. This will be done using activating methods such as group work.
The lab will be split into three major tutored tasks, each introduced in one classroom lab session. The students will work on these tasks in small groups on their own schedule, which also trains teamwork and independent work. The lab tasks directly put the lecture content into practice, hence following a problem-oriented learning approach.


The course will be taught based on lecture material in the form of slides and additional exercises. The lab part will involve working with a state-of-the-art open-source simulation and deployment flow (TVM) that can be used either on university lab PCs or on private machines. Additionally, low-power micro-controller boards will be used to demonstrate the application on real hardware.

Reading List

So far, no single textbook covers the full content of the lectures. It is planned to provide a lecture script based on the lecture contents in the future. The following books cover parts of the lecture's content and provide related information:

*    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer; Efficient Processing of Deep Neural Networks; Morgan & Claypool Publishers
*    David Patterson, John Hennessy; Computer Organization and Design RISC-V Edition - The Hardware Software Interface; Elsevier