Student research projects

MMK always has topics available for student research projects (Bachelor's and Master's theses, Research Internship, IDP).

Once you have found a topic, please contact the research assistant who supervises it.

Ingenieurpraxis: The aim of the Ingenieurpraxis (engineering internship) is to give you insight into industrial processes. For this reason we do not offer Ingenieurpraxis positions here at MMK, but we can supervise you if you find a position in a company.

Additionally, we do not offer any internships to students from outside TUM. Because of the volume of requests we receive, it is not possible for us to answer all emails with internship requests.

Current schedule of the MMK student research project talks

Topics for Student Projects

Area: Speech Recognition

Deep Neural Networks for Speech Recognition

Topic: Deep Neural Networks for Speech Recognition
Type: Research Internship, IDP, Master's Thesis
Supervisor: Ludwig Kürzinger, Dipl.-Ing. (Univ.)
Tel.: +49 (0)89 289-28562
E-Mail: ludwig.kuerzinger@tum.de
Subject area: Speech Recognition
Description:
Motivation:
Speech recognition enables a machine to understand human speech and convert it to text. Conventional speech recognition systems are based on a combination of neural networks and hidden Markov models. With the advent of deep learning and increasing computational power, deep neural networks can match the performance of these traditional systems without requiring complex feature engineering.
Your work will focus on key concepts of deep neural networks that are not yet fully understood. For example, attention [1], inspired by the human ability to concentrate on important information, is a simple but powerful technique that can be used to transform an audio signal directly into a sequence of characters.
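As a rough illustration of the attention idea from [1], the following minimal sketch computes scaled dot-product attention for a single query over a short sequence of encoded audio frames. The function names, the toy vectors and the use of plain Python lists are assumptions made for illustration only; real speech recognition systems work with learned, high-dimensional representations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of d floats
    keys:   list of n vectors with d floats each
    values: list of n vectors (any common length)
    Returns the context vector and the attention weights.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return context, weights


# Toy example: three encoded audio frames (hypothetical values)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 1.0]

context, weights = attention(query, keys, values)
print(weights)   # the third frame receives the largest weight
print(context)
```

The weights always sum to one, so the context vector is a weighted average of the value vectors, dominated by the frames most similar to the query.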

Task Description:
The main task is to apply or analyze neural networks for speech recognition. The project can be carried out in English or German. For more information about the topic, please contact the supervisor.

References:
[1] Vaswani, Ashish, et al. "Attention is all you need." 2017.
[2] Graves, Alex, et al. "Connectionist temporal classification." 2006.
Prerequisites:
- Experience with Python and/or C++
- Experience with machine learning
- Independent work style
- Motivation to learn new concepts
Application: If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "[type of application] application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly explain in the message text why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Area: Computer Vision

Lightweight User Identification System

Topic: Lightweight User Identification System
Type: Research Internship, Bachelor’s Thesis, IDP
Supervisor: Torben Teepe, M.Sc.
E-Mail: t.teepe@tum.de
Subject area: Computer Vision
Description: More and more systems can authenticate and identify users by their face [1]. At this chair, we develop models that match faces to a given gallery of identities. First, the face of an unknown person is localized within an image by a face detection model. Next, the face is aligned such that eyes, mouth and nose are always in a similar position within the image [2]. The normalized face is then passed to the face identification model, which can be interpreted as a feature extractor that maps the face to a feature vector. This feature vector is compared to the feature vectors in the gallery using a distance metric.
The goal of this work is to showcase this pipeline in a simple sales system. Using a Raspberry Pi, a camera and a touchscreen, we want to allow users to log their coffee consumption. The user is identified by the camera and then confirms the purchase on the touchscreen. The face identification model will be provided to the student. The task is to use this model to implement the pipeline, build a GUI and evaluate the results in real-world use.
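The last step of the pipeline, comparing a feature vector to the gallery, can be sketched as follows. This is a minimal illustration, not the chair's implementation: the choice of cosine distance, the rejection threshold of 0.3, the function names and the three-dimensional toy features are all assumptions. In the actual system the feature vectors would come from the provided face identification model.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identify(probe, gallery, threshold=0.3):
    """Find the gallery identity closest to the probe feature vector.

    Returns (name, distance). The name is None when even the closest
    gallery entry is farther away than the (assumed) threshold, i.e.
    the person is treated as unknown.
    """
    best_name, best_dist = None, float("inf")
    for name, feature in gallery.items():
        dist = cosine_distance(probe, feature)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


# Hypothetical gallery with made-up feature vectors
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.8, 0.6],
}

known = identify([0.8, 0.2, 0.1], gallery)
unknown = identify([0.0, 0.0, 1.0], gallery)
print(known[0])    # alice
print(unknown[0])  # None
```

The threshold controls the trade-off between rejecting registered users and accepting strangers, so it would have to be tuned as part of the real-world evaluation.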
Depending on your personal interests you can choose from these additional tasks:
  • Design a case in CAD for the system (that will be 3D printed)
  • Fraud detection with photoplethysmography
[1] Wang, Mei, and Weihong Deng. "Deep face recognition: A survey." arXiv preprint arXiv:1804.06655 (2018).
[2] Zhang, Kaipeng, et al. "Joint face detection and alignment using multitask cascaded convolutional networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
Prerequisites:
  • Good programming skills, ideally in Python
  • Prior knowledge in Machine/Deep Learning is helpful
  • Experience with OpenCV, Qt5 & Raspbian OS is a plus
Application: If you are interested in a topic in this area, we welcome applications via the email address above. Please set the email subject to "[type of application] application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly explain in the message text why you are interested in the topic. Also make sure to attach your most recent CV and grade report.

Distracted Driver Dataset

Topic: Distracted Driver Dataset
Type: Master's Thesis
Supervisor: Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Subject area: Computer Vision
Description:
Motivation: According to a National Highway Traffic Safety Administration (NHTSA) report, one in ten fatal crashes and two in ten injury crashes in the United States in 2014 were reported as distracted driver crashes. Detecting the driver's distraction state is therefore of utmost importance for reducing driver-related accidents. This task requires a properly annotated dataset of driver actions. With such a dataset, state-of-the-art deep learning architectures can be used to recognize the driver's distraction state.

Task: The main task is to collect a “Distracted Driver Dataset” and to use a light-weight Convolutional Neural Network (CNN) architecture to detect the driver’s distracting actions. The dataset should contain the following annotations:
1. Predefined distracting actions performed by the driver
2. The driver's hand states (whether the hands are on the wheel or not)
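One possible way to represent these two annotations per video frame is sketched below. The field names, the per-hand split of the hand state and the list of action labels are hypothetical; the actual annotation format and action set would be defined during the thesis.

```python
from dataclasses import dataclass
from enum import Enum


class HandState(Enum):
    ON_WHEEL = "on_wheel"
    OFF_WHEEL = "off_wheel"


# Hypothetical action labels, for illustration only
ACTIONS = ("safe_driving", "texting", "phone_call", "drinking")


@dataclass(frozen=True)
class FrameAnnotation:
    video_id: str
    frame_index: int
    action: str            # one of the predefined distracting actions
    left_hand: HandState
    right_hand: HandState

    def both_hands_on_wheel(self):
        """True only if both hands are annotated as being on the wheel."""
        return (self.left_hand is HandState.ON_WHEEL
                and self.right_hand is HandState.ON_WHEEL)


def validate(annotation):
    """Reject annotations with unknown actions or negative frame indices."""
    if annotation.action not in ACTIONS:
        raise ValueError(f"unknown action: {annotation.action!r}")
    if annotation.frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return annotation


example = validate(FrameAnnotation(
    video_id="video_001",
    frame_index=42,
    action="texting",
    left_hand=HandState.ON_WHEEL,
    right_hand=HandState.OFF_WHEEL,
))
print(example.both_hands_on_wheel())  # False
```

Validating labels at annotation time keeps typos out of the dataset before any CNN is trained on it.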

The thesis will generally follow these steps:
1. Review of the state of the art
2. Dataset collection and preparation (i.e. labeling and formatting)
3. Design of a light-weight CNN architecture
4. Evaluation of the CNN architecture on the prepared dataset
5. Demonstration of the working system

References:
[1] Baheti, B., Gajre, S., & Talbar, S. (2018). Detection of Distracted Driver using Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1032-1038).
[2] Hssayeni, M. D., Saxena, S., Ptucha, R., & Savakis, A. (2017). Distracted driver detection: Deep learning vs handcrafted features. Electronic Imaging, 2017(10), 20-26.
[3] G. Borghi, E. Frigieri, R. Vezzani and R. Cucchiara, "Hands on the wheel: A Dataset for Driver Hand Detection and Tracking," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, 2018, pp. 564-570.
Prerequisites:
1. Excellent coding skills, preferably in Python
2. Experience with deep learning frameworks, preferably Torch/PyTorch
3. Motivation to work on deep learning
Application: If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "[type of application] application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly explain in the message text why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Real-time Detection and Classification of Dynamic Hand Gestures

Joint Segmentation and Tracking of Targets in Video Using Deep Learning

Topic: Joint Segmentation and Tracking of Targets in Video Using Deep Learning
Type: Master's Thesis, Research Internship
Supervisor: Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Subject area: Computer Vision
Description: Some video surveillance applications, such as activity recognition, require both segmenting and tracking objects in video. Segmentation and tracking of multiple targets in video are both challenging problems in computer vision. Joint segmentation and tracking approaches use much more detailed information, at the pixel or superpixel level, than approaches based on detection boxes. To track people in a video, the mapping between observations in consecutive frames can be formulated as a probabilistic graphical model such as a Conditional Random Field (CRF). CRFs are a powerful framework for solving discrete optimization problems such as tracking and segmentation.
Research on semantic image segmentation [1] has shown that a CRF model can be cast as a Recurrent Neural Network (RNN). The goal is to extend this deep learning technique to the joint segmentation and multi-person tracking problem. To do this, the two problems will first be formulated as a unified CRF model, and then a deep RNN that mimics the proposed CRF will be developed.
[Figure: three frames of a video sequence captured at different times, together with their corresponding segmentations.]
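To give an intuition for how CRF inference can look like a recurrent computation, the sketch below runs mean-field inference on a tiny CRF, where every iteration applies the same update to the current label distributions, much like one step of an RNN. This is a generic illustration under simplifying assumptions (a three-node chain, a Potts pairwise term with a single weight, made-up unary costs); it is neither the formulation used in [1] nor the unified model to be developed in this project.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field_step(q, unary, neighbors, weight):
    """One parallel mean-field update with a Potts pairwise term.

    q:         current label distribution per node
    unary:     unary cost per node and label (lower is better)
    neighbors: list of neighbor indices per node
    weight:    penalty for neighboring nodes taking different labels
    """
    new_q = []
    for i, costs in enumerate(unary):
        logits = []
        for label, cost in enumerate(costs):
            # Expected penalty from neighbors that disagree with `label`
            penalty = sum(weight * (1.0 - q[j][label]) for j in neighbors[i])
            logits.append(-cost - penalty)
        new_q.append(softmax(logits))
    return new_q


def mean_field(unary, neighbors, weight=1.0, iterations=5):
    """Run a fixed number of mean-field iterations (the 'unrolled RNN')."""
    q = [softmax([-c for c in costs]) for costs in unary]
    for _ in range(iterations):
        q = mean_field_step(q, unary, neighbors, weight)
    return q


# Chain of three nodes with two labels; the middle node is ambiguous
unary = [[0.0, 2.0], [1.0, 0.9], [0.0, 2.0]]
neighbors = [[1], [0, 2], [1]]

q = mean_field(unary, neighbors)
labels = [dist.index(max(dist)) for dist in q]
print(labels)  # the middle node follows its neighbors
```

Taken alone, the middle node slightly prefers label 1, but the pairwise term pulls it toward the label of its two confident neighbors.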



References:
[1] http://www.robots.ox.ac.uk/~szheng/crfasrnndemo
Prerequisites: Basic knowledge of probabilistic graphical models and neural networks as well as solid programming skills are required. If you have any questions, please write me an email.
Application: If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “[type of application] application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly explain in the message text why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

CNN Application to Video Saliency

Topic: CNN Application to Video Saliency
Type: Master's Thesis, Research Internship, Bachelor's Thesis, Ingenieurpraxis
Supervisor: Mikhail Startsev
Tel.: +49 (0)89 289-28550
E-Mail: mikhail.startsev@tum.de
Subject area: Computer Vision
Description: One of the important questions in computer vision is how to determine which information in a scene (represented by an image or a video) is relevant. So-called “saliency models” [1] have been used to predict informativeness in images. For videos, however, the ways of incorporating the temporal component of a frame sequence into an attention prediction model range from extremely computationally intensive ones (e.g. deep neural networks using 3D convolution operators) to hand-crafted ones (e.g. using optical flow or two subsequent frames as input).
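The simplest of these options, feeding two subsequent frames to a 2D CNN, amounts to concatenating the frames along the channel axis. The sketch below shows only this input preparation, using nested Python lists in place of image tensors; the function name and the toy frames are assumptions for illustration.

```python
def stack_frame_pairs(frames):
    """Concatenate each frame with its successor along the channel axis.

    frames: list of T frames, each an H x W x C nested list.
    Returns T - 1 inputs of shape H x W x 2C.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    pairs = []
    for current, following in zip(frames, frames[1:]):
        stacked = [
            # List concatenation joins the channels of both pixels
            [pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
            for row_a, row_b in zip(current, following)
        ]
        pairs.append(stacked)
    return pairs


# Three toy 2 x 2 grayscale frames whose pixel value is the frame index
frames = [[[[t], [t]], [[t], [t]]] for t in range(3)]

pairs = stack_frame_pairs(frames)
print(len(pairs))        # 2
print(pairs[0][0][0])    # [0, 1]
```

A 2D CNN can consume such inputs unchanged apart from the doubled number of input channels, which is why this approach is cheap compared to 3D convolutions.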

To avoid or reduce the “hand-engineered” aspect of the features in use, different modifications of traditional 2D CNNs can be employed. Deep learning methods have already proven their worth in the image saliency task [2], and some results for videos are starting to appear as well. In this project the candidate will work with various CNN models that operate on video data in order to compare their performance. Depending on the progress, training several models from scratch on pre-recorded data may be beneficial.

[1] https://en.wikipedia.org/wiki/Salience_(neuroscience)#Visual_saliency_modeling
[2] http://saliency.mit.edu/results_mit300.html
Prerequisites: Understanding of machine learning concepts and solid programming skills are desirable.
Application: If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “[type of application] application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly explain in the message text why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.