Applied Reinforcement Learning

Lecturer: Hao Shen
Assistant: Martin Gottwald
Target Audience: Elective, supplementary lecture (Master)
Workload: 2/2 (SWS, i.e. weekly hours per semester, Lecture/Tutorial)
Term: Summer
Start registration: 15.02.2021
Time & Place:  
Lecture: 06 - 09.04.2021 (4 days in total)
Online lecture, 9:00 - 17:00 h
Tutorial/Exercise: Thursdays during the semester, 13:15 - 14:45, online
Question Session: Thursdays during the semester, 10:00 - 12:00 (online if requested)
First session of the semester: see TUMonline calendar


Despite the coronavirus pandemic, the course will take place, in a purely online format.

Unfortunately, this means you will not be able to use our (physical) robots. We replace them with a simulator and adapt the projects accordingly.


Reinforcement learning (RL) is one of the most powerful approaches to solving sequential decision-making problems. A reinforcement learning agent interacts with its environment and uses its experience to make decisions towards solving the problem. The technique has succeeded in various applications in operations research, robotics, game playing, network management, and computational intelligence.
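As an illustration of this interaction loop, here is a minimal sketch of tabular Q-learning in Python. It is not course material: the five-state chain environment, the function names `step` and `q_learning`, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move one state to the left or to the right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values purely from interaction."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best next action value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for every non-terminal state after learning.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

The agent never sees the transition rule inside `step`; it only observes states and rewards, which is the sense in which it "uses its experience" to improve its decisions.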

This lecture provides an overview of basic concepts, practical techniques, and programming tools used in reinforcement learning. Specifically, it focuses on the applied aspects of the subject, such as problem solving and implementation. By design, it complements the theoretical treatment of the subject (mathematical derivations, convergence proofs, and bound analysis), which is covered in the lecture "Approximate Dynamic Programming and Reinforcement Learning" in winter semesters.

In this lecture, we will cover the following topics (the list is not exhaustive):

  • Reinforcement learning problems as Markov decision processes
  • Dynamic programming (value iteration and policy iteration)
  • Monte Carlo reinforcement learning methods
  • Temporal difference learning (SARSA and Q-learning)
  • Simulation-based reinforcement learning algorithms
  • Linear value function approximation, e.g. tile coding
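To give a flavor of the dynamic programming topic listed above, the following is a minimal sketch of value iteration in Python. The five-state chain, the `model` function, and the discount factor are hypothetical choices made for illustration and are not taken from the course.

```python
GAMMA = 0.9           # discount factor (assumed value)
N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)    # move one state to the left or to the right


def model(state, action):
    """Known deterministic model (an assumption): returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            outcomes = [model(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best           # in-place update within the sweep
        if delta < tol:
            return values


values = value_iteration()
print(values)
```

Unlike the temporal difference methods in the list, this sketch requires the transition model to be known in advance; the values simply decay by the factor `GAMMA` with each additional step needed to reach the goal.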

We will not cover:

  • Deep Reinforcement Learning in any flavor
  • Deep function approximation architectures that change during the learning process

The extensive hyperparameter tuning that these methods require exceeds the time and computational constraints of the lecture.

The course project is done in groups of three; each group works on a physical robot. Currently we can provide:

  • Poppy Humanoid
  • Poppy Ergo
  • Stem Kit Level 1 & 2
  • Turtlebot
  • Metabot V2
  • E-Puck

It is possible to extend the existing robots during the project (e.g. add new sensors, more construction parts, or additional equipment required for the project).

On completion of this course, students are able to:

  • describe classic scenarios of reinforcement learning problems;
  • explain basics of reinforcement learning methods;
  • model real engineering problems using reinforcement learning methods;
  • compare, in practice, the performance of the reinforcement learning algorithms covered in the course on the specific projects;
  • select proper reinforcement learning algorithms in accordance with specific problems, and justify their choices;
  • construct and implement reinforcement learning algorithms to solve simple robotics problems on physical systems.

Registration Details

Due to the limited number of available robots, the number of participants has to be restricted. Please mind the following procedure:

  • If you are interested in the course, sign up on TUMonline
  • Attend the block course before the semester starts (mandatory)
  • Places in the practical part of the lecture are allocated among those who attend the last lecture day. The order is determined by TUMonline.
  • Once you have attended the complete lecture and signed up for the practical part, you are committed to the course and occupy a robot. If you drop the course later on, you will have prevented another student from taking it. Only sign up if you are sure to stay in the course for the whole semester!

Lecture Details

The lecture consists of two phases, complemented by optional sessions:

  1. a six-day block lecture before the semester starts (frontal teaching sessions in the morning, practical part with discussions after lunch)
  2. weekly tutorial sessions (two hours per week) throughout the semester
  3. additional practical and question sessions if requested

Time and location are at the top of this page.


  • Sutton, R. S. & Barto, A. G., Reinforcement Learning: An Introduction. The MIT Press, 1998 (or the second edition, 2018)
  • Bertsekas, D. P. & Tsitsiklis, J., Neuro-dynamic programming. Athena Scientific, 1996
  • Bertsekas, D. P., Dynamic Programming and Optimal Control Vol. 1 & 2.
  • Szepesvári, C., Algorithms for Reinforcement Learning. Morgan & Claypool, 2010 (a draft)

Target Audience and Signup

Students in a Master's degree program. Registration via TUMonline.